Passing long Strings as parameters (by ref and by value)

Sujeet

If there are long strings (like 1MB or 2MB) is it more performant to pass
those by ref to methods or by value?
 
Sujeet, string objects in C# and VB are reference-type objects. They're
allocated on the heap, and your variable contains a reference to the actual
string data on the heap. So when you pass a string of any length to a method
you're always passing a "pointer" (so to speak) to the actual string data,
rather than the actual string data itself, no matter whether you pass it by
value or by reference.
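
A minimal sketch of that point, assuming a hypothetical helper named Echo (not something from the original question): the callee hands back exactly the object it was given, so only the reference travels, never the megabytes of character data.

```csharp
using System;

static class PassDemo
{
    // Hypothetical helper: returns the very reference it received.
    public static string Echo(string received)
    {
        return received;
    }

    static void Main()
    {
        // About 1M characters, i.e. roughly 2 MB of UTF-16 data on the heap.
        string big = new string('x', 1024 * 1024);

        // Passing by value copies only the reference, so the callee sees
        // the same object rather than a copy of the characters.
        Console.WriteLine(object.ReferenceEquals(big, Echo(big))); // True
    }
}
```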

So the performance implications of passing that "pointer" by reference
versus by value are small compared to what would be the cost if the actual
string data were being passed on the stack. When you pass by value, the
compiled code merely sends the "pointer" off to the called method and
forgets about it. When you pass by reference, the compiled code sends the
"pointer" off to the called method, and after the method has completed its
processing the (possibly updated) value is returned to the calling method's
variable. So it's just a matter of one or two IL instructions, probably.

Instead, you should primarily consider whether or not the called method has
any reason to change the "pointer" to point to a different string during its
processing. If not, pass it by value. If so, pass it by reference, but
document in the calling method that the called method can potentially change
the variable's contents to point to a different string.
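
As a rough illustration of that choice, here is a sketch using two hypothetical methods, ReassignByValue and ReassignByRef (names made up for this example). Only the by-reference version can make the caller's variable point to a different string.

```csharp
using System;

static class RefDemo
{
    // By value: the callee gets its own copy of the reference, so
    // reassigning the parameter does not affect the caller's variable.
    public static void ReassignByValue(string s)
    {
        s = "changed";
    }

    // By reference: the callee can repoint the caller's variable.
    public static void ReassignByRef(ref string s)
    {
        s = "changed";
    }

    static void Main()
    {
        string a = "original";
        ReassignByValue(a);
        Console.WriteLine(a); // original

        string b = "original";
        ReassignByRef(ref b);
        Console.WriteLine(b); // changed
    }
}
```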

For other languages the performance differences can be extreme. But for C#
and VB we get a free pass.

"performant" - jeez, I hate that word. Even the guy who coined it a long
time ago later got defensive about it. Wish I could remember his name. I
used to see him speak at the VBITS conferences back in the early days of
.Net. He worked on the CLR.

HTH,
Tom Dacon
Dacon Software Consulting
 
The other posters have answered this fine - but just to add that this
only applies within a single AppDomain; all bets are off when talking
between different AppDomains (remoting / IPC) or processes (IPC) or
COM-interop or over the network. Maybe obvious, but worth noting.

Marc
 
Tom Dacon said:
Sujeet, string objects in C# and VB are reference-type objects. They're
allocated on the heap, and your variable contains a reference to the actual
string data on the heap. So when you pass a string of any length to a method
you're always passing a "pointer" (so to speak) to the actual string data,
rather than the actual string data itself, no matter whether you pass it by
value or by reference.

So the performance implications of passing that "pointer" by reference
versus by value are small compared to what would be the cost if the actual
string data were being passed on the stack. When you pass by value, the
compiled code merely sends the "pointer" off to the called method and
forgets about it. When you pass by reference, the compiled code sends the
"pointer" off to the called method, and after the method has completed its
processing the (possibly updated) value is returned to the calling method's
variable. So it's just a matter of one or two IL instructions, probably.

That's not quite the way it works. The value isn't "copied back" to the
variable after the method is executed. The variable *itself* ends up
being passed, rather than the variable's value. In other words, every
time the method changes the value of the parameter, that change is
*immediately* visible in the variable, because that's what's being
updated. That's hard to see if it's a local variable, but very visible
if it's an instance/static variable.

Here's an example:

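using System;

class Test
{
    static string text;

    static void Main()
    {
        // Pass the static variable itself, not a copy of its value.
        Method(ref text);
    }

    static void Method(ref string foo)
    {
        // text is still null here, so this prints "text = "
        Console.WriteLine("text = " + text);
        foo = "hi";
        // The assignment to foo is already visible through text:
        // this prints "text = hi"
        Console.WriteLine("text = " + text);
    }
}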
 
Sorry, Jon - that's what I get for not checking the IL before I reply off
the top of my head.

I've seen so much prolog/epilog code in so many languages - and written
some, for that matter - that they all sort of merge into one another after
a while. Basically, they all have to solve the same problem, but the details
tend to differ.

Tom
 
Jon said:
That's not quite the way it works. The value isn't "copied back" to
the variable after the method is executed. The variable *itself* ends
up being passed, rather than the variable's value. In other words,
every time the method changes the value of the parameter, that change
is *immediately* visible in the variable, because that's what's being
updated. That's hard to see if it's a local variable, but very visible
if it's an instance/static variable.

Unless you call using BeginInvoke/EndInvoke, or it's a remote object, or
called through COM, or...

In any case where marshalling is involved, the update is propagated back
exactly once, at the end of the call.
 
Ben Voigt said:
Unless you call using BeginInvoke/EndInvoke, or it's a remote object, or
called through COM, or...

In any case where marshalling is involved, the update is propagated back
exactly once, at the end of the call.

True - but at that point it's not really a language feature any
more; it's to do with the remoting involved. Any number of things
become "odd" when marshalling is involved...
 
Jon Skeet said:
True - but at that point it's not really a language feature any
more; it's to do with the remoting involved. Any number of things
become "odd" when marshalling is involved...

And that's one of the worst features of new languages... you can address
remote objects and "odd" behavior and the syntax never even hints you're
doing so.

At least with standard C++, if something looks like a variable reference, it
is a variable reference. The worst performance hit you could have is
dealing with a VM cache miss. Course that only helps a little, since
functions still vary in complexity from "almost free" to "wait for a
response from the moon", but I guess you can't have everything.

VB6 was the worst, because of the whole Let/Set syntax. So:

I = I + 1

could invoke two remote calls (one to I's get, one to I's let). Ugh!

Best thing to do is consider that anytime you pass something by reference,
you've transferred ownership of that variable to the called function for the
duration.
 
Ben said:
At least with standard C++, if something looks like a variable reference, it
is a variable reference. The worst performance hit you could have is
dealing with a VM cache miss.

Strongly disagree. operator= could be doing anything.

-- Barry
 
To continue the educational nature of this thread for advanced newbies
like me :)...
So what happens if the string is passed by value and the callee
changes the string in a way that involves the string's contents (for
example, replacing all occurrences of ' with a space)? Does this create a
modified copy of the (entire 2 MB) string on the heap and repoint the local
variable to that new copy (a pointer which is then discarded when the
callee ends)?
I guess that is not relevant to the initial question of the performance of
byref vs. byval, because the same thing happens if the string is
passed by reference and modified that way (the only difference
being that the caller's variable (pointer) is updated upon return).
 
G.S. said:
To continue the educational nature of this thread for advanced newbies
like me :)...
So what happens if the string is passed by value

The *reference* to the string is passed by value, just to keep it
clear. The object isn't passed at all.

and the callee changes the string in a way that involves the string's
contents (for example, replacing all occurrences of ' with a space)?

You can't do that - strings are immutable. That's why methods like
Replace return a new string.

Does this create a modified copy of the (entire 2 MB) string on the
heap and repoint the local variable to that new copy (a pointer which
is then discarded when the callee ends)?

If you do:

parameter = parameter.Replace("'", " ");

then that would indeed create a new string, and change the value of the
parameter to refer to that new string.

I guess that is not relevant to the initial question of performance of
byref vs. by val, because the same things happen if the string is
passed by reference and modified that way (with the only difference
being that the caller variable (pointer) is updated upon return.)

Indeed.
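
To make that concrete, here is a small sketch with two hypothetical methods, StripQuotes and StripQuotesByRef (names invented for illustration). In both cases Replace builds a new string; the only difference is whether the caller's variable ends up referring to it.

```csharp
using System;

static class ReplaceDemo
{
    // By value: the new string is assigned to the callee's local copy of
    // the reference and is simply dropped when the method returns.
    public static void StripQuotes(string parameter)
    {
        parameter = parameter.Replace("'", " ");
    }

    // By reference: the same new string is created, but the caller's
    // variable is the one that gets repointed to it.
    public static void StripQuotesByRef(ref string parameter)
    {
        parameter = parameter.Replace("'", " ");
    }

    static void Main()
    {
        string text = "it's";

        StripQuotes(text);
        Console.WriteLine(text); // it's

        StripQuotesByRef(ref text);
        Console.WriteLine(text); // it s
    }
}
```

Either way the original string object is never modified, since strings are immutable.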
 
