Efficient fixed width string substition puzzle

  • Thread starter Thread starter Jami Bradley
  • Start date Start date
J

Jami Bradley

Hi,

I'm looking for an efficient way to do this, because I know it will be heavily used :-)

I have a fixed width string and I need to substitute a substring of characters with new values. I
can do this with 2 substring calls, but it will need to rebuild the string just to write a few
characters.

Here is the simple, but inefficient, version:
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

// Replace a substring with one of equal length, based on offset and length:
Console.WriteLine("Substring: " + s.Substring(Offset, Length)); // Displays "345"
Console.WriteLine("Original: [" + s + "]");

s = s.Substring(0, Offset) + r.PadLeft(3, ' ') + s.Substring(Offset + Length);

Console.WriteLine("Result: [" + s + "]");

This will take the string "0123456789" and replace the characters starting at offset 3 with "abc".
The result is "012abc6789"

I am guaranteeing that the lengths are the same, so in C/C++ I could do something like this with a
memcpy, but that isn't a very friendly way :-)

TIA,

Jami
 
Use the StringBuilder class; it's optimized for things like this.

Tom Dacon
Dacon Software Consulting
 
Jami,
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.



Mattias
 
Mattias Sjögren said:
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.

Of these, I'd go for the StringBuilder option, creating it with the
right buffer size to start with, and then using:

builder.Append (s, 0, Offset);
builder.Append (r);
builder.Append (s, Offset+Length, s.Length-(Offset+Length));

This should avoid creating any temporary objects other than the builder
itself.
 
Thanks everyone for their tips. I decided to try all three options and time them to see what it
would take. I tried doing a simple 3 byte copy into the middle of the string - similar to what I
would expect during production. With 10M iterations, I had the following results:

// Test results on 2.4GHz P4:
// TestString: 10000000 Iterations in 3.8946357707474 seconds
// TestStringBuilder: 10000000 Iterations in 5.4614639570113 seconds
// TestCharArray: 10000000 Iterations in 2.0478365267094 seconds

Some interesting notes:
1. I needed to copy the StringBuilder back to a string so that I could loop - otherwise I would be
stepping on myself.
2. StringBuilder is slower than string! I presume this is mostly due to the extra copy.
3. As expected, the character array is the fastest.
4. All of these are extremely fast, I don't think the efficiency gains will matter! :-)

I hope this is useful to others. I've included the source below for those interested. Timing was
done with the PerformanceCounter.

Jami
----------------------------------------------------------------------------------

static void TestString(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
//Console.WriteLine("Substring: " + s.Substring(Offset, Length)); // Displays "345"
//Console.WriteLine("Original: [" + s + "]");
s = s.Substring(0, Offset) + r.PadLeft(3, ' ') + s.Substring(Offset + Length);
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestStringBuilder(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
StringBuilder sb = new StringBuilder(s.Length);
sb.Append(s, 0, Offset);
sb.Append(r);
sb.Append(s, Offset+Length, s.Length-(Offset+Length));
s = sb.ToString();
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestCharArray(int Count)
{
char[] s = "0123456789".ToCharArray();
char[] r = "abc".ToCharArray(); // Value to substitute

int Offset = 3; // Starting index of substring to change

for (int Idx = 0; Idx < Count; ++Idx)
{
r.CopyTo(s, Offset);
}
return;
}





Mattias Sjögren said:
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.

Of these, I'd go for the StringBuilder option, creating it with the
right buffer size to start with, and then using:

builder.Append (s, 0, Offset);
builder.Append (r);
builder.Append (s, Offset+Length, s.Length-(Offset+Length));

This should avoid creating any temporary objects other than the builder
itself.
 
And one more note :-)

I tried increasing the starting string to 300 bytes, so that it would be more like my problem, and
the timing results changed to the following:

TestString: 10000000 Iterations in 11.4631644524653 seconds
TestStringBuilder: 10000000 Iterations in 10.010672026752 seconds
TestCharArray: 10000000 Iterations in 2.07768892415097 seconds

Not surprisingly, the character array moves ahead. It is interesting to see the StringBuilder pass
the string - makes some sense.

Enjoy,

Jami

Thanks everyone for their tips. I decided to try all three options and time them to see what it
would take. I tried doing a simple 3 byte copy into the middle of the string - similar to what I
would expect during production. With 10M iterations, I had the following results:

// Test results on 2.4GHz P4:
// TestString: 10000000 Iterations in 3.8946357707474 seconds
// TestStringBuilder: 10000000 Iterations in 5.4614639570113 seconds
// TestCharArray: 10000000 Iterations in 2.0478365267094 seconds

Some interesting notes:
1. I needed to copy the StringBuilder back to a string so that I could loop - otherwise I would be
stepping on myself.
2. StringBuilder is slower than string! I presume this is mostly due to the extra copy.
3. As expected, the character array is the fastest.
4. All of these are extremely fast, I don't think the efficiency gains will matter! :-)

I hope this is useful to others. I've included the source below for those interested. Timing was
done with the PerformanceCounter.

Jami
----------------------------------------------------------------------------------

static void TestString(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
//Console.WriteLine("Substring: " + s.Substring(Offset, Length)); // Displays "345"
//Console.WriteLine("Original: [" + s + "]");
s = s.Substring(0, Offset) + r.PadLeft(3, ' ') + s.Substring(Offset + Length);
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestStringBuilder(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
StringBuilder sb = new StringBuilder(s.Length);
sb.Append(s, 0, Offset);
sb.Append(r);
sb.Append(s, Offset+Length, s.Length-(Offset+Length));
s = sb.ToString();
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestCharArray(int Count)
{
char[] s = "0123456789".ToCharArray();
char[] r = "abc".ToCharArray(); // Value to substitute

int Offset = 3; // Starting index of substring to change

for (int Idx = 0; Idx < Count; ++Idx)
{
r.CopyTo(s, Offset);
}
return;
}





Mattias Sjögren said:
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.

Of these, I'd go for the StringBuilder option, creating it with the
right buffer size to start with, and then using:

builder.Append (s, 0, Offset);
builder.Append (r);
builder.Append (s, Offset+Length, s.Length-(Offset+Length));

This should avoid creating any temporary objects other than the builder
itself.
 
Jami Bradley said:
Thanks everyone for their tips. I decided to try all three options
and time them to see what it would take. I tried doing a simple 3
byte copy into the middle of the string - similar to what I would
expect during production. With 10M iterations, I had the following
results:

// Test results on 2.4GHz P4:
// TestString: 10000000 Iterations in 3.8946357707474 seconds
// TestStringBuilder: 10000000 Iterations in 5.4614639570113 seconds
// TestCharArray: 10000000 Iterations in 2.0478365267094 seconds

Some interesting notes:
1. I needed to copy the StringBuilder back to a string so that I could loop - otherwise I would be
stepping on myself.
2. StringBuilder is slower than string! I presume this is mostly due to the extra copy.
3. As expected, the character array is the fastest.
4. All of these are extremely fast, I don't think the efficiency gains will matter! :-)

I hope this is useful to others. I've included the source below for
those interested. Timing was done with the PerformanceCounter.

Your test isn't really fair:

1) You don't end up with a string at the end of the TestCharArray
method, which I thought was the point. Just adding a
string s = new string(r); at the end of the loop makes the
TestCharArray version the slowest on my box.

2) You're only allocating the char array (and copying the original
contents) once in TestCharArray - which is no good unless you know
ahead of time what size all the strings you need to work with will be,
*and* that the "surrounding" string doesn't change between iterations -
and if that's the case, the StringBuilder case can be improved as well,
I suspect. (If it's not the case, you need to call ToCharArray on each
iteration, or use String.CopyTo if the first condition is true but not
the second.)
 
It is true that the tests aren't quite identical.

In my case, I am essentially dealing with data records similar to DBase 2 (fixed width records).

My usage will be a small class which owns a record (string or other type) and has get/set methods
to update pieces of the record by field name. The record is fixed length for each record type,
guaranteed.

I really don't care how the record is stored internally, whether it is a string, StringBuilders, or
character array. At the end of the manipulations, I want to grab a copy of the record as a string.
So the typical usage would be 1) create empty fixed length record, 2) set a bunch of fields (about
50 calls), 3) get the record as a string type.

I'm not sure how to improve StringBuilder, because even if I keep the record in a StringBuilder,
I'll need to copy it to make the Append calls.

Thanks,

Jami
 
Jami Bradley said:
It is true that the tests aren't quite identical.

In my case, I am essentially dealing with data records similar to
DBase 2 (fixed width records).

My usage will be a small class which owns a record (string or other
type) and has get/set methods to update pieces of the record by field
name. The record is fixed length for each record type, guaranteed.

I really don't care how the record is stored internally, whether it
is a string, StringBuilders, or character array. At the end of the
manipulations, I want to grab a copy of the record as a string. So
the typical usage would be 1) create empty fixed length record, 2)
set a bunch of fields (about 50 calls), 3) get the record as a string
type.

I'm not sure how to improve StringBuilder, because even if I keep the
record in a StringBuilder, I'll need to copy it to make the Append
calls.

Okay. It sounds like keeping it in a char array is indeed going to be
the fastest way of doing things. If you're going to be doing lots of
manipulations with a single record, it probably doesn't matter if you
create a new char array for each record - if it were a case of millions
of records with a couple of manipulations each, and efficiency were
*really* an issue, you could have kept just one char array and copied
to it at the start of each set of manipulations.
 
Back
Top