Binary BitString to Int32 : Bit fiddlers only! Can you improve this?

R

Russell Mangel

Hi,

This programs converts a BitString "0101110" to an Int32 value.
Can anyone improve, or possibly shorten the following program?

public static int BitStringToInt32(String s)
{
int j, k, m = 0;
int len = s.Length - 1;

for (int i = 0; i <= len; i++)
{
j = 1;
k = (s - '0');

if ((k > 1) || (k < 0))
{
throw new System.IO.InvalidDataException("Invalid input. Must be 0 or
1");
}

j = j << (len - i);
m = m + k * j;
}

return m;
}

Thanks
Russell Mangel
Las Vegas, NV
 
P

Paul E Collins

Russell Mangel said:
Can anyone improve, or possibly shorten the following program?

public static int BitStringToInt32(String s)
{
return Convert.ToInt32(s, 2);
}

Eq.
 
J

Jon Skeet [C# MVP]

Russell Mangel said:
This programs converts a BitString "0101110" to an Int32 value.
Can anyone improve, or possibly shorten the following program?

Sure:

Convert.ToInt32(s, 2);
 
?

=?ISO-8859-1?Q?G=F6ran_Andersson?=

Russell said:
Hi,

This programs converts a BitString "0101110" to an Int32 value.
Can anyone improve, or possibly shorten the following program?

public static int BitStringToInt32(String s)
{
int j, k, m = 0;
int len = s.Length - 1;

for (int i = 0; i <= len; i++)
{
j = 1;
k = (s - '0');

if ((k > 1) || (k < 0))
{
throw new System.IO.InvalidDataException("Invalid input. Must be 0 or
1");
}

j = j << (len - i);
m = m + k * j;
}

return m;
}

Thanks
Russell Mangel
Las Vegas, NV


Disregarding the obvious use of Convert, perhaps this is rather what you
were expecting?

public static int BitStringToInt32(String s) {
int result = 0;
foreach (char c in s) {
if (c != '0' && c != '1') {
throw new ArgumentException("Invalid input. Must be 0 or 1.");
}
result <<= 1;
result += c & 1;
}
return result;
}
 
P

Peter Duniho

Disregarding the obvious use of Convert, perhaps this is rather what you
were expecting?

public static int BitStringToInt32(String s) {
int result = 0;
foreach (char c in s) {
if (c != '0' && c != '1') {
throw new ArgumentException("Invalid input. Must be 0 or 1.");
}
result <<= 1;
result += c & 1;
}
return result;
}

IMHO, it seems to me that the answers using Convert() _are_ the best way.
They are not only more readable and more maintainable, it's likely they
actually perform better as well.

However, if you're going to try to do better, I would change the code you
posted to be something more like this:

public static int BitStringToInt32(String s)
{
int result = 0;
foreach (char c in s)
{
result <<= 1;
if (c == '1')
{
result += 1;
}
else if (c != '0')
{
throw new ArgumentException("Invalid input. Must be 0 or 1.");
}
}
return result;
}

Why waste time adding zero to the accumulator? Or calculating the zero in
the first place, or doing the type conversion from char to int, for that
matter? :)

Of course, technically the input string should be checked for length as
well. I leave that as an exercise for the reader. :)

Pete
 
G

Guest

Peter said:
IMHO, it seems to me that the answers using Convert() _are_ the best
way.

Yes, of course. :)
They are not only more readable and more maintainable, it's likely
they actually perform better as well.

However, if you're going to try to do better, I would change the code
you posted to be something more like this:

public static int BitStringToInt32(String s)
{
int result = 0;
foreach (char c in s)
{
result <<= 1;
if (c == '1')
{
result += 1;
}
else if (c != '0')
{
throw new ArgumentException("Invalid input. Must be 0 or 1.");
}
}
return result;
}

Why waste time adding zero to the accumulator? Or calculating the zero
in the first place, or doing the type conversion from char to int, for
that matter? :)

Yes, that is a bit better. :)
 
G

Gaz

Göran Andersson said:
Yes, of course. :)


Yes, that is a bit better. :)

I would say it depends on how the compiler will optimize your code. To me
the first version looks more efficient. You don't want any conditional
branch statements in the assembler code if you can help it, branching is
expensive in assembler (resets the pipeline).

Gaz
 
P

Peter Duniho

I would say it depends on how the compiler will optimize your code.

Well, if you want to get that pedantic about it (and apparently you do :)
), it also depends on how the CPU will optimize the code.
To me the first version looks more efficient.

Unless you're a compiler, I'm not sure that's a useful metric. :)
You don't want any conditional
branch statements in the assembler code if you can help it, branching is
expensive in assembler (resets the pipeline).

Branching in and of itself does not "reset the pipeline". Between having
multiple pipelines and branch prediction, simple branching by itself is
not necessarily a problem at all.

Beyond that, there's not really any difference with respect to branching
between the two versions. The compiler can optimize either into an
assumed "straight-through" case if it wants to. Though, the last time I
looked at this sort of optimization (granted, years ago), the "assumed"
case was the "true" if() statement, which means that in the former, the
optimized version will optimize for the failure case. Probably not the
best thing to do.

Besides, as far as "conditional branch statements" go, both versions have
the same nominal number of conditionals. The main difference is that the
latter version takes advantage of information generated by them to avoid
later computations. Avoiding unnecessary computations is a fundamental
technique in optimization.

(There is, of course, some chance that the compiler would notice the
opportunity to remove the unnecessary computations. But that would depend
on it having specific knowledge of the char to int conversion and relating
that to the constraint of the possible values of c by the time it gets to
the computation. I would be pretty surprised if it did pick up on that,
even as good as optimizing compilers are these days).

Of course, if the performance of this method is so critical it's worth
anywhere close to the scrutiny it's receiving here, the real issue is to
figure out why an application spends so much time converting strings to
their binary equivalent. :)

Pete
 
R

Russell Mangel

Thanks for your reply. I missed that overload for Convert.
I wonder how it will perform.
Russ
 
R

Russell Mangel

Thanks for your reply. I missing that overload for Convert.

I see that both you and Paul replied at almost the same time.

Russ.
 
R

Russell Mangel

Peter Duniho said:
Of course, if the performance of this method is so critical it's worth
anywhere close to the scrutiny it's receiving here, the real issue is to
figure out why an application spends so much time converting strings to
their binary equivalent. :)

Pete

Excellent observation, it is nice to see that you are capable of optimizing
code, but only if it makes sense to do so. As you stated, converting strings
to Int32 values in production code would be inefficient.

Russell
 
R

Russell Mangel

Thanks to all who replied.

Thanks to Göran for the optimization. (See I even got the ö right.)
Thanks to Peter for optimizing Göran's version.

Russell Mangel
Las Vegas, NV
 
G

Guest

Peter said:
Branching in and of itself does not "reset the pipeline". Between
having multiple pipelines and branch prediction, simple branching by
itself is not necessarily a problem at all.

Beyond that, there's not really any difference with respect to branching
between the two versions. The compiler can optimize either into an
assumed "straight-through" case if it wants to. Though, the last time I
looked at this sort of optimization (granted, years ago), the "assumed"
case was the "true" if() statement, which means that in the former, the
optimized version will optimize for the failure case. Probably not the
best thing to do.

Besides, as far as "conditional branch statements" go, both versions
have the same nominal number of conditionals. The main difference is
that the latter version takes advantage of information generated by them
to avoid later computations. Avoiding unnecessary computations is a
fundamental technique in optimization.

And how does that analysis explain that the latter version is
actually slower ?

Arne
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Russell said:
Thanks to Göran for the optimization. (See I even got the ö right.)
Thanks to Peter for optimizing Göran's version.

Before you consider something an optimization, then I suggest
that you make a little test to verify ...

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top