Algorithm: Reverse a string

A

Aamir Mahmood

What could be the fastest method to reverse a string.
Speed really matters for my application. Average size of string that comes
to my function is around 200K - 300K.

Currently i am using the following algorithm:
------------------------------------------
public static string Reverse(string s) {
if (s == null || s.Length < 2) {
return s;
}

int len = s.Length;
int loop = (len >> 1) + 1;
int j;
char[] ca = new char[len];
for (int i = 0; i < loop; i++) {
j = len - i - 1;
ca = s[j];
ca[j] = s;
}
return new string(ca);
}

------------------------------------------

Can it be improved? Or some new idea?

-
AM
 
T

Thomas P. Skinner [MVP]

I am not sure it would be faster, but instead of building a character array
you might try building the string directly with a StringBuilder class.
Construct with the size of the string, set the length, and index to set
each character. You could also use pointers and unmanaged code. That would
probably be the fastest. You might want to use ILDASM and examine the code
generated for each method.

Thomas P. Skinner [MVP]
 
T

Thomas P. Skinner [MVP]

Just thought of another way. Try the Reverse method of the Array class. I
think you would need to run some timing tests since you wouldn't know what
is going on inside the method.

Thomas P. Skinner [MVP]

Thomas P. Skinner said:
I am not sure it would be faster, but instead of building a character array
you might try building the string directly with a StringBuilder class.
Construct with the size of the string, set the length, and index to set
each character. You could also use pointers and unmanaged code. That would
probably be the fastest. You might want to use ILDASM and examine the code
generated for each method.

Thomas P. Skinner [MVP]

Aamir Mahmood said:
What could be the fastest method to reverse a string.
Speed really matters for my application. Average size of string that
comes to my function is around 200K - 300K.

Currently i am using the following algorithm:
------------------------------------------
public static string Reverse(string s) {
if (s == null || s.Length < 2) {
return s;
}

int len = s.Length;
int loop = (len >> 1) + 1;
int j;
char[] ca = new char[len];
for (int i = 0; i < loop; i++) {
j = len - i - 1;
ca = s[j];
ca[j] = s;
}
return new string(ca);
}

------------------------------------------

Can it be improved? Or some new idea?

-
AM

 
J

Jon Skeet [C# MVP]

Aamir Mahmood said:
What could be the fastest method to reverse a string.
Speed really matters for my application. Average size of string that comes
to my function is around 200K - 300K.

<snip>

I had a look at this a while ago, and came up with the following:

static string ReverseUnsafe (string x)
{
if (x==null || x.Length < 2)
{
return x;
}
string y = x.Substring(0, x.Length);
unsafe
{
fixed (char *start = y)
{
char *p = start;
for (int i=x.Length-1; i >= 0; i--)
{
*p++ = x;
}
}
}
return y;
}

I'm not an unsafe code expert by a long chalk, but I believe it's okay
- and was about twice as fast as anything I could come up with without
using unsafe code. You might find the thread I came up with that code
for instructive - it was in the Compact Framework group, with the
subject line "To All: String reverse problem".
 
A

Aamir Mahmood

Won't a half-way loop be faster than a full length loop? As I did in my
code.

I have not bench marked your code, but this was the first thought that came
to my mind.

-
AM

Jon Skeet said:
Aamir Mahmood said:
What could be the fastest method to reverse a string.
Speed really matters for my application. Average size of string that
comes
to my function is around 200K - 300K.

<snip>

I had a look at this a while ago, and came up with the following:

static string ReverseUnsafe (string x)
{
if (x==null || x.Length < 2)
{
return x;
}
string y = x.Substring(0, x.Length);
unsafe
{
fixed (char *start = y)
{
char *p = start;
for (int i=x.Length-1; i >= 0; i--)
{
*p++ = x;
}
}
}
return y;
}

I'm not an unsafe code expert by a long chalk, but I believe it's okay
- and was about twice as fast as anything I could come up with without
using unsafe code. You might find the thread I came up with that code
for instructive - it was in the Compact Framework group, with the
subject line "To All: String reverse problem".
 
A

Aamir Mahmood

Won't a half-way loop be faster than a full length loop? As I did in my
code.

I have not bench marked your code, but this was the first thought that came
to my mind.

-
AM

Jon Skeet said:
Aamir Mahmood said:
What could be the fastest method to reverse a string.
Speed really matters for my application. Average size of string that
comes
to my function is around 200K - 300K.

<snip>

I had a look at this a while ago, and came up with the following:

static string ReverseUnsafe (string x)
{
if (x==null || x.Length < 2)
{
return x;
}
string y = x.Substring(0, x.Length);
unsafe
{
fixed (char *start = y)
{
char *p = start;
for (int i=x.Length-1; i >= 0; i--)
{
*p++ = x;
}
}
}
return y;
}

I'm not an unsafe code expert by a long chalk, but I believe it's okay
- and was about twice as fast as anything I could come up with without
using unsafe code. You might find the thread I came up with that code
for instructive - it was in the Compact Framework group, with the
subject line "To All: String reverse problem".
 
J

Jon Skeet [C# MVP]

Aamir Mahmood said:
Won't a half-way loop be faster than a full length loop? As I did in my
code.

Not necessarily - it's doing twice as much in each iteration.
Furthermore, I expect the JIT may well be able to do some bounds
checking elimination on the simpler version, but not on the more
complex version.

It's worth trying, I suppose, but I doubt be a significant improvement.
 
T

Thomas P. Skinner [MVP]

Jon,

Why wouldn't you use pointers for both source and destination? I would think
array indexing would have much more overhead.

Thomas P. Skinner [MVP]

Jon Skeet said:
Aamir Mahmood said:
What could be the fastest method to reverse a string.
Speed really matters for my application. Average size of string that
comes
to my function is around 200K - 300K.

<snip>

I had a look at this a while ago, and came up with the following:

static string ReverseUnsafe (string x)
{
if (x==null || x.Length < 2)
{
return x;
}
string y = x.Substring(0, x.Length);
unsafe
{
fixed (char *start = y)
{
char *p = start;
for (int i=x.Length-1; i >= 0; i--)
{
*p++ = x;
}
}
}
return y;
}

I'm not an unsafe code expert by a long chalk, but I believe it's okay
- and was about twice as fast as anything I could come up with without
using unsafe code. You might find the thread I came up with that code
for instructive - it was in the Compact Framework group, with the
subject line "To All: String reverse problem".
 
J

Jon Skeet [C# MVP]

Thomas P. Skinner said:
Why wouldn't you use pointers for both source and destination? I
would think array indexing would have much more overhead.

Yes, I'd have thought so too. It doesn't look like it though, having
done a quick benchmark.

Making the loop do twice as much work for half as many iterations looks
like it slows things down too - I can't get it any faster than the
version posted (unless you don't mind reversing the actual original
string, which is a horrendously bad idea).

I'm open for more ideas though :)
 
T

Thomas P. Skinner [MVP]

Ahhh for the good old days of inline assembler. At least then you knew
exactly what was going on. .NET and managed code is wonderful in many
respects, but does present challenges in writing optimized loops.

Thomas P. Skinner [MVP]
Boston University
 
J

Jon Skeet [C# MVP]

Thomas P. Skinner said:
Ahhh for the good old days of inline assembler. At least then you knew
exactly what was going on. .NET and managed code is wonderful in many
respects, but does present challenges in writing optimized loops.

You mean the days of inline assembler with simple, understandable
pipelines, and either no cache or a very easy to predict cache. Oh, and
single-threaded, single-process, for preference, too :)
 
J

jeff_louie

Thomas.... You _can_ write in assembler, or disassemble the C# IL.

Here is some fun with bit level code that takes an int32 and outputs the bit value by left shifting the
binary representation of the integer 32 times. This code demonstrates the use of try catch IL and the
use of local variables to store temporary values. The CLR (Common Language Runtime) will complain
if the try catch logic can lead to invalid states. We initialize the local variable n_input before we try to
assign it a user value in try. This makes the CLR happy!

// TestILAssembler.il
// 11.28.04 Jeff Louie
..assembly extern mscorlib {}
..assembly TestILAssembler {.ver 1:0:1:0}

..method private static void Main(string[] args) cil managed
{
.entrypoint
.custom instance void [mscorlib]System.STAThreadAttribute::.ctor() =
( 01 00 00 00 )
.maxstack 4
.locals init ([0] int32 n_counter,[1] int32 n_input)
.try // get user input
{
ldstr "Enter number: "
call void [mscorlib]System.Console::Write(string)
call string [mscorlib]System.Console::ReadLine()
call int32 [mscorlib]System.Int32::parse(string)
// parse may throw exception
stloc.1 // place valid user input into n_input
leave.s got_value // leave don't branch from try
} // end .try
catch [mscorlib]System.Object
{
pop // remove exception from stack
ldstr "Invalid Number."
call void [mscorlib]System.Console::WriteLine(string)
leave exit // invalid input so exit program
} // end handler

got_value: ldloc.1 // push value n_input onto stack
dup
call void [mscorlib]System.Console::Write(int32)
ldstr " [Input value] "
call void [mscorlib]System.Console::WriteLine(string)
// **** (for int i=0; i<32; i++) ****
load_counter: ldc.i4 0 // initialize n_counter 0
stloc.0 // store in local variable n_count
begin_loop: ldloc.0 // get current counter value
ldc.i4 32
bge end_loop // break loop if counter is >=32
dup
ldc.i4 0x80000000
// bit mask 10000000000000000000000000000000
and // clear all bits except most significant
ldc.i4 0x80000000
ceq // push 1 if most sig bit is true, else 0
call void [mscorlib]System.Console::Write(int32)
ldc.i4 1 // push 1
shl // *** shift left 1, zero least sig bit ****
ldloc.0 // get n_counter
ldc.i4 1
add // add 1 to n_counter, postfix increment
stloc.0 // store new n_counter value
br begin_loop // branch to loop again
// **** end for loop ****
end_loop: pop // remove input value should be all zeros
ldstr " [32 bit value] "
end_algo: call void [mscorlib]System.Console::WriteLine(string)
exit: ldstr "JAL 11.25.04"
call void [mscorlib]System.Console::WriteLine(string)
ret
} // end of method Main


Here is the output [input]:


Enter number: 10
10 [Input value]
00000000000000000000000000001010 [32 bit value]
JAL 11.28.04


Pretty cool!
Ahhh for the good old days of inline assembler. At least then you knew
exactly what was going on. .NET and managed code is wonderful in many
respects, but does present challenges in writing optimized loops.<


**********************************************************************
Sent via Fuzzy Software @ http://www.fuzzysoftware.com/
Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...
 
T

Thomas P. Skinner [MVP]

Jeff,

You may have missed my point. I was referring to inline assembler using
"native" code a la the good old C programming days. . I am well familiar
with the ability to write in MSIL. In a previous post in this thread I
mentioned using the ILDASM. I also have a Microsoft Press book on the IL
that I hope to spend more time on someday.

However, thanks for providing this interesting example and I can only hope
more people will take a look at the IL generated for their C# code.

Thomas P. Skinner [MVP]

Jeff Louie said:
Thomas.... You _can_ write in assembler, or disassemble the C# IL.

Here is some fun with bit level code that takes an int32 and outputs the
bit value by left shifting the
binary representation of the integer 32 times. This code demonstrates the
use of try catch IL and the
use of local variables to store temporary values. The CLR (Common Language
Runtime) will complain
if the try catch logic can lead to invalid states. We initialize the local
variable n_input before we try to
assign it a user value in try. This makes the CLR happy!

// TestILAssembler.il
// 11.28.04 Jeff Louie
.assembly extern mscorlib {}
.assembly TestILAssembler {.ver 1:0:1:0}

.method private static void Main(string[] args) cil managed
{
.entrypoint
.custom instance void [mscorlib]System.STAThreadAttribute::.ctor()
=
( 01 00 00 00 )
.maxstack 4
.locals init ([0] int32 n_counter,[1] int32 n_input)
.try // get user input
{
ldstr "Enter number: "
call void [mscorlib]System.Console::Write(string)
call string [mscorlib]System.Console::ReadLine()
call int32 [mscorlib]System.Int32::parse(string)
// parse may throw exception
stloc.1 // place valid user input into n_input
leave.s got_value // leave don't branch from try
} // end .try
catch [mscorlib]System.Object
{
pop // remove exception from stack
ldstr "Invalid Number."
call void [mscorlib]System.Console::WriteLine(string)
leave exit // invalid input so exit program
} // end handler

got_value: ldloc.1 // push value n_input onto stack
dup
call void [mscorlib]System.Console::Write(int32)
ldstr " [Input value] "
call void
[mscorlib]System.Console::WriteLine(string)
// **** (for int i=0; i<32; i++) ****
load_counter: ldc.i4 0 // initialize n_counter 0
stloc.0 // store in local variable n_count
begin_loop: ldloc.0 // get current counter value
ldc.i4 32
bge end_loop // break loop if counter is >=32
dup
ldc.i4 0x80000000
// bit mask 10000000000000000000000000000000
and // clear all bits except most significant
ldc.i4 0x80000000
ceq // push 1 if most sig bit is true, else 0
call void [mscorlib]System.Console::Write(int32)
ldc.i4 1 // push 1
shl // *** shift left 1, zero least sig bit ****
ldloc.0 // get n_counter
ldc.i4 1
add // add 1 to n_counter, postfix increment
stloc.0 // store new n_counter value
br begin_loop // branch to loop again
// **** end for loop ****
end_loop: pop // remove input value should be all zeros
ldstr " [32 bit value] "
end_algo: call void
[mscorlib]System.Console::WriteLine(string)
exit: ldstr "JAL 11.25.04"
call void
[mscorlib]System.Console::WriteLine(string)
ret
} // end of method Main


Here is the output [input]:


Enter number: 10
10 [Input value]
00000000000000000000000000001010 [32 bit value]
JAL 11.28.04


Pretty cool!
Ahhh for the good old days of inline assembler. At least then you knew
exactly what was going on. .NET and managed code is wonderful in many
respects, but does present challenges in writing optimized loops.<


**********************************************************************
Sent via Fuzzy Software @ http://www.fuzzysoftware.com/
Comprehensive, categorised, searchable collection of links to ASP &
ASP.NET resources...
 
T

Thomas P. Skinner [MVP]

Oh yes. Things are not like they used to be. Remember when you could
determine a loop's execution time by adding up the cycles for all the
instructions and multiplying by the clock period? In addition to the items
you mention, don't forget that the way you write a loop can affect
performance in the presence of "paging." Optimizing "locality of reference"
can be important. It's getting hard to predict a "best" algorithm without
running a timing test.

Thomas P. Skinner [MVP]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top