Efficient fixed width string substition puzzle

J

Jami Bradley

Hi,

I'm looking for an efficient way to do this, because I know it will be heavily used :)

I have a fixed width string and I need to substitute a substring of characters with new values. I
can do this with 2 substring calls, but it will need to rebuild the string just to write a few
characters.

Here is the simple, but inefficient, version:
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

// Replace a substring with one of equal length, based on offset and length:
Console.WriteLine("Substring: " + s.Substring(Offset, Length)); // Displays "345"
Console.WriteLine("Original: [" + s + "]");

s = s.Substring(0, Offset) + r.PadLeft(3, ' ') + s.Substring(Offset + Length);

Console.WriteLine("Result: [" + s + "]");

This will take the string "0123456789" and replace the characters starting at offset 3 with "abc".
The result is "012abc6789"

I am guaranteeing that the lengths are the same, so in C/C++ I could do something like this with a
memcpy, but that isn't a very friendly way :)

TIA,

Jami
 
T

Tom Dacon

Use the StringBuilder class; it's optimized for things like this.

Tom Dacon
Dacon Software Consulting
 
M

Mattias Sjögren

Jami,
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.



Mattias
 
J

Jon Skeet [C# MVP]

Mattias Sjögren said:
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.

Of these, I'd go for the StringBuilder option, creating it with the
right buffer size to start with, and then using:

builder.Append (s, 0, Offset);
builder.Append (r);
builder.Append (s, Offset+Length, s.Length-(Offset+Length));

This should avoid creating any temporary objects other than the builder
itself.
 
J

Jami Bradley

Thanks everyone for their tips. I decided to try all three options and time them to see what it
would take. I tried doing a simple 3 byte copy into the middle of the string - similar to what I
would expect during production. With 10M iterations, I had the following results:

// Test results on 2.4GHz P4:
// TestString: 10000000 Iterations in 3.8946357707474 seconds
// TestStringBuilder: 10000000 Iterations in 5.4614639570113 seconds
// TestCharArray: 10000000 Iterations in 2.0478365267094 seconds

Some interesting notes:
1. I needed to copy the StringBuilder back to a string so that I could loop - otherwise I would be
stepping on myself.
2. StringBuilder is slower than string! I presume this is mostly due to the extra copy.
3. As expected, the character array is the fastest.
4. All of these are extremely fast, I don't think the efficiency gains will matter! :)

I hope this is useful to others. I've included the source below for those interested. Timing was
done with the PerformanceCounter.

Jami
----------------------------------------------------------------------------------

static void TestString(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
//Console.WriteLine("Substring: " + s.Substring(Offset, Length)); // Displays "345"
//Console.WriteLine("Original: [" + s + "]");
s = s.Substring(0, Offset) + r.PadLeft(3, ' ') + s.Substring(Offset + Length);
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestStringBuilder(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
StringBuilder sb = new StringBuilder(s.Length);
sb.Append(s, 0, Offset);
sb.Append(r);
sb.Append(s, Offset+Length, s.Length-(Offset+Length));
s = sb.ToString();
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestCharArray(int Count)
{
char[] s = "0123456789".ToCharArray();
char[] r = "abc".ToCharArray(); // Value to substitute

int Offset = 3; // Starting index of substring to change

for (int Idx = 0; Idx < Count; ++Idx)
{
r.CopyTo(s, Offset);
}
return;
}





Mattias Sjögren said:
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.

Of these, I'd go for the StringBuilder option, creating it with the
right buffer size to start with, and then using:

builder.Append (s, 0, Offset);
builder.Append (r);
builder.Append (s, Offset+Length, s.Length-(Offset+Length));

This should avoid creating any temporary objects other than the builder
itself.
 
J

Jami Bradley

And one more note :)

I tried increasing the starting string to 300 bytes, so that it would be more like my problem, and
the timing results changed to the following:

TestString: 10000000 Iterations in 11.4631644524653 seconds
TestStringBuilder: 10000000 Iterations in 10.010672026752 seconds
TestCharArray: 10000000 Iterations in 2.07768892415097 seconds

Not surprisingly, the character array moves ahead. It is interesting to see the StringBuilder pass
the string - makes some sense.

Enjoy,

Jami

Thanks everyone for their tips. I decided to try all three options and time them to see what it
would take. I tried doing a simple 3 byte copy into the middle of the string - similar to what I
would expect during production. With 10M iterations, I had the following results:

// Test results on 2.4GHz P4:
// TestString: 10000000 Iterations in 3.8946357707474 seconds
// TestStringBuilder: 10000000 Iterations in 5.4614639570113 seconds
// TestCharArray: 10000000 Iterations in 2.0478365267094 seconds

Some interesting notes:
1. I needed to copy the StringBuilder back to a string so that I could loop - otherwise I would be
stepping on myself.
2. StringBuilder is slower than string! I presume this is mostly due to the extra copy.
3. As expected, the character array is the fastest.
4. All of these are extremely fast, I don't think the efficiency gains will matter! :)

I hope this is useful to others. I've included the source below for those interested. Timing was
done with the PerformanceCounter.

Jami
----------------------------------------------------------------------------------

static void TestString(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
//Console.WriteLine("Substring: " + s.Substring(Offset, Length)); // Displays "345"
//Console.WriteLine("Original: [" + s + "]");
s = s.Substring(0, Offset) + r.PadLeft(3, ' ') + s.Substring(Offset + Length);
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestStringBuilder(int Count)
{
string s = "0123456789";
string r = "abc"; // Value to substitute

int Offset = 3; // Starting index of substring to change
int Length = 3; // Length of substring

for (int Idx = 0; Idx < Count; ++Idx)
{
// Replace a substring with one of equal length, based on offset and length:
StringBuilder sb = new StringBuilder(s.Length);
sb.Append(s, 0, Offset);
sb.Append(r);
sb.Append(s, Offset+Length, s.Length-(Offset+Length));
s = sb.ToString();
//Console.WriteLine("Result: [" + s + "]");
}
return;
}

static void TestCharArray(int Count)
{
char[] s = "0123456789".ToCharArray();
char[] r = "abc".ToCharArray(); // Value to substitute

int Offset = 3; // Starting index of substring to change

for (int Idx = 0; Idx < Count; ++Idx)
{
r.CopyTo(s, Offset);
}
return;
}





Mattias Sjögren said:
but it will need to rebuild the string just to write a few characters.

Since strings are immutable, you'll always have to create a new string
one way or another. I'd use a StringBuilder or a char[] to reduce the
number of intermediate strings created.

Of these, I'd go for the StringBuilder option, creating it with the
right buffer size to start with, and then using:

builder.Append (s, 0, Offset);
builder.Append (r);
builder.Append (s, Offset+Length, s.Length-(Offset+Length));

This should avoid creating any temporary objects other than the builder
itself.
 
J

Jon Skeet [C# MVP]

Jami Bradley said:
Thanks everyone for their tips. I decided to try all three options
and time them to see what it would take. I tried doing a simple 3
byte copy into the middle of the string - similar to what I would
expect during production. With 10M iterations, I had the following
results:

// Test results on 2.4GHz P4:
// TestString: 10000000 Iterations in 3.8946357707474 seconds
// TestStringBuilder: 10000000 Iterations in 5.4614639570113 seconds
// TestCharArray: 10000000 Iterations in 2.0478365267094 seconds

Some interesting notes:
1. I needed to copy the StringBuilder back to a string so that I could loop - otherwise I would be
stepping on myself.
2. StringBuilder is slower than string! I presume this is mostly due to the extra copy.
3. As expected, the character array is the fastest.
4. All of these are extremely fast, I don't think the efficiency gains will matter! :)

I hope this is useful to others. I've included the source below for
those interested. Timing was done with the PerformanceCounter.

Your test isn't really fair:

1) You don't end up with a string at the end of the TestCharArray
method, which I thought was the point. Just adding a
string s = new string(r); at the end of the loop makes the
TestCharArray version the slowest on my box.

2) You're only allocating the char array (and copying the original
contents) once in TestCharArray - which is no good unless you know
ahead of time what size all the strings you need to work with will be,
*and* that the "surrounding" string doesn't change between iterations -
and if that's the case, the StringBuilder case can be improved as well,
I suspect. (If it's not the case, you need to call ToCharArray on each
iteration, or use String.CopyTo if the first condition is true but not
the second.)
 
J

Jami Bradley

It is true that the tests aren't quite identical.

In my case, I am essentially dealing with data records similar to DBase 2 (fixed width records).

My usage will be a small class which owns a record (string or other type) and has get/set methods
to update pieces of the record by field name. The record is fixed length for each record type,
guaranteed.

I really don't care how the record is stored internally, whether it is a string, StringBuilders, or
character array. At the end of the manipulations, I want to grab a copy of the record as a string.
So the typical usage would be 1) create empty fixed length record, 2) set a bunch of fields (about
50 calls), 3) get the record as a string type.

I'm not sure how to improve StringBuilder, because even if I keep the record in a StringBuilder,
I'll need to copy it to make the Append calls.

Thanks,

Jami
 
J

Jon Skeet [C# MVP]

Jami Bradley said:
It is true that the tests aren't quite identical.

In my case, I am essentially dealing with data records similar to
DBase 2 (fixed width records).

My usage will be a small class which owns a record (string or other
type) and has get/set methods to update pieces of the record by field
name. The record is fixed length for each record type, guaranteed.

I really don't care how the record is stored internally, whether it
is a string, StringBuilders, or character array. At the end of the
manipulations, I want to grab a copy of the record as a string. So
the typical usage would be 1) create empty fixed length record, 2)
set a bunch of fields (about 50 calls), 3) get the record as a string
type.

I'm not sure how to improve StringBuilder, because even if I keep the
record in a StringBuilder, I'll need to copy it to make the Append
calls.

Okay. It sounds like keeping it in a char array is indeed going to be
the fastest way of doing things. If you're going to be doing lots of
manipulations with a single record, it probably doesn't matter if you
create a new char array for each record - if it were a case of millions
of records with a couple of manipulations each, and efficiency were
*really* an issue, you could have kept just one char array and copied
to it at the start of each set of manipulations.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top