Assembler in c#

C

Clive Tooth

Here is some code that I would like to speed up...

=================================
unchecked
{
for (int i = startI; i < stopI; i++)
{
uint m = (a << 8) | carry;
carry = m/myBase;
if ((a = m%myBase) == 0)
{
WorkerFoundZero(myNumber, i);
}
} // i loop
} // unchecked
=================================

a[], carry, myBase are uints. It seems that I need to do *two* divide
instructions "m/myBase" and "m%myBase". Of course, the processor has a
DIV instruction that returns both the quotient and the remainder. How
can I get access to both results? Math.DivRem does not seem
particularly fast, also it only accepts int and long operands.

Ideally, I would like to be able to write a couple of lines of
assembler in my c# program.
 
C

Clive Tooth

Here is some code that I would like to speed up...

=================================
unchecked
{
for (int i = startI; i < stopI; i++)
{
uint m = (a << 8) | carry;
carry = m/myBase;
if ((a = m%myBase) == 0)
{
WorkerFoundZero(myNumber, i);
}
} // i loop
} // unchecked
=================================

a[], carry, myBase are uints. It seems that I need to do *two* divide
instructions "m/myBase" and "m%myBase". Of course, the processor has a
DIV instruction that returns both the quotient and the remainder. How
can I get access to both results? Math.DivRem does not seem
particularly fast, also it only accepts int and long operands.

Ideally, I would like to be able to write a couple of lines of
assembler in my c# program.


C# is good at a lot of different things, but inline assembly isn't one of
them.

That said, you have already done the division. Division is way more
expensive than addition or multiplication. So why not take advantage of
the result you already have:

carry = m / myBase;
remainder = m - carry * myBase;
if ((a = remainder) == 0)
{
...

?

I didn't write a benchmark to actually compare, but it's possible that
would be significantly faster than the two divisions.

Of course, that assumes that the JIT compiler is already not optimizing
your two operations into a single DIV instruction. Unless you've looked at
the generated machine code, you should not assume it's not.


Wow! That that change virtually doubled the speed of the loop. Thanks,
Pete.

Background: This is my runs-of-zeros-in-a-power-of-two program.
I am searching for a run of 18 consecutive zeros. I had changed the
program so that it would detect if the current power of two has a run
of at least 13 consecutive zeros. [I changed the internal
represenation from base one billion to base ten million.] If such a
run is present the program simply proceeds to the next power of two by
doubling the current one. If such a run is not present the program
multiplies the current power by 256. The effect of that change was to
make the program about 2.5 times faster. [Reducing the picoseconds per
digit metric to about 36.] For powers of two in the region of 2^(10^9)
it turns out that the program can multiply by 256 about 99.99% of the
time. Your change has further reduced the ps/digit figure to about 19.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top