C# Unsafe Bug

J

JS

I was writing some routines which could do bitwise boolean operations
on byte arrays, and I ran into what I think is a bug with C#'s unsafe
code. I am pasting a console application below. Can anyone give an
explanation, or has this type of problem been reported already?
Thanks.

using System;
using System.Collections.Generic;
using System.Text;

namespace UnsafeBug
{
class Program
{
static void Main(string[] args)
{
byte[] dest = new byte[] { 0x01, 0xff };
byte[] sdata1 = new byte[] { 0x01, 0xff };
UnsafeOr(sdata1, dest, dest, 7);

for (int ii = 0; ii < dest.Length; ii++)
{
Console.WriteLine("Byte {0} = 0x{1:X2}", ii + 1, dest[ii]);
}
Console.WriteLine("Hit Enter to quit..."); Console.ReadLine();
}

// OR's together 2 byte arrays into 'dest' without touching the
first few bits of 'dest'.
// it is allowed for sources/destinations to be the same.
static void UnsafeOr(byte[] src1, byte[] src2, byte[] dest, int
firstBit)
{
byte mask = (byte)(0x80 >> firstBit);
int nbytes = Math.Min(Math.Min(src1.Length, src2.Length),
dest.Length);
unsafe
{
fixed (byte* s1data = &src1[0], s2data = &src2[0], ddata =
&dest[0])
{
byte* s1ptr = s1data;
byte* s2ptr = s2data;
byte* dptr = ddata;

#if false
// this version works
byte val = (byte)(*dptr & ~mask);
*dptr++ = (byte)(val | ((*s1ptr++ | *s2ptr++)&mask));
#else
// this version does not work
*dptr++ = (byte)((*dptr & ~mask) | ((*s1ptr++ |
*s2ptr++)&mask));
#endif
for (int ii = 1; ii < nbytes; ii++)
{
*dptr++ = (byte)(*s1ptr++ | *s2ptr++);
}
}
}
}
}
}
 
J

Jon Skeet [C# MVP]

JS said:
I was writing some routines which could do bitwise boolean operations
on byte arrays, and I ran into what I think is a bug with C#'s unsafe
code. I am pasting a console application below. Can anyone give an
explanation, or has this type of problem been reported already?

<snip>

I don't believe this is a bug, and I don't believe it has anything to
do with unsafe code. Here's a smaller app which shows the same effect
without any unsafe code.

using System;

class Test
{
static void Main()
{
int[] x = {10};
int i=0;
x[i++] = i;
Console.WriteLine (x[0]);
}
}

That will print out 1, not 0 (which is what you'd have expected, I
believe).

What you're seeing is the postfix increment being executed before the
evaluation of the right hand side of the assignment operator.

This is entirely correct according to the specification.
 
S

Stefan Simek

JS said:
I was writing some routines which could do bitwise boolean operations
on byte arrays, and I ran into what I think is a bug with C#'s unsafe
code. I am pasting a console application below. Can anyone give an
explanation, or has this type of problem been reported already?
Thanks.

using System;
using System.Collections.Generic;
using System.Text;

namespace UnsafeBug
{
class Program
{
static void Main(string[] args)
{
byte[] dest = new byte[] { 0x01, 0xff };
byte[] sdata1 = new byte[] { 0x01, 0xff };
UnsafeOr(sdata1, dest, dest, 7);

for (int ii = 0; ii < dest.Length; ii++)
{
Console.WriteLine("Byte {0} = 0x{1:X2}", ii + 1, dest[ii]);
}
Console.WriteLine("Hit Enter to quit..."); Console.ReadLine();
}

// OR's together 2 byte arrays into 'dest' without touching the
first few bits of 'dest'.
// it is allowed for sources/destinations to be the same.
static void UnsafeOr(byte[] src1, byte[] src2, byte[] dest, int
firstBit)
{
byte mask = (byte)(0x80 >> firstBit);
int nbytes = Math.Min(Math.Min(src1.Length, src2.Length),
dest.Length);
unsafe
{
fixed (byte* s1data = &src1[0], s2data = &src2[0], ddata =
&dest[0])
{
byte* s1ptr = s1data;
byte* s2ptr = s2data;
byte* dptr = ddata;

#if false
// this version works
byte val = (byte)(*dptr & ~mask);
*dptr++ = (byte)(val | ((*s1ptr++ | *s2ptr++)&mask));
#else
// this version does not work
*dptr++ = (byte)((*dptr & ~mask) | ((*s1ptr++ |
*s2ptr++)&mask));
#endif
for (int ii = 1; ii < nbytes; ii++)
{
*dptr++ = (byte)(*s1ptr++ | *s2ptr++);
}
}
}
}
}
}

The problem lies in the fact that left side of the assignment is
evaluated before the right side. First, *dptr++ is evaluated,
incrementing dptr but still resulting in the correct destination
address. When the right side is evaluated afterwards, the dptr is
already incremented so it points to a different location.

So you can either change your code to

*dptr = (byte)((byte)(*dptr++ & ~mask) | ((*s1ptr++ | *s2ptr++)&mask));

or (maybe safer)

*dptr = (byte)((byte)(*dptr & ~mask) | ((*s1ptr++ | *s2ptr++)&mask));
dptr++

HTH,
Stefan
 
J

JS

OK, I guess. It was my understanding that postfix was done after the
statement was executed. I suppose I should look at the language
specifications.

Thanks for the quick responses.
 
J

Jon Skeet [C# MVP]

JS said:
OK, I guess. It was my understanding that postfix was done after the
statement was executed. I suppose I should look at the language
specifications.

Well, while I would never suggest that it's not a good idea to know the
language well, I think it would also be better to split your current
statement up into potentially several, to make it more readable. It
certainly wasn't obvious to me what it should be doing, partly due to
the number of side-effects involved.

I assume the reason for using unsafe code is performance - I would
benchmark various possible versions of the code (including safe code)
and record details of those benchmarks (source and result). You can
then make a trade-off between performance and readability (if, indeed,
the most readable code doesn't also perform the best).

(Also, shouldn't the mask be 0xff >> firstBit, not 0x80 >> firstBit?)
 
J

JS

Yes, it should have been 0xff>>firstBit. It got messed up when I
created the small project to demonstrate my 'problem'.

As you surmised we need to fully optimize this function. Our
applications sometimes call these image processing operations many
times, with images up to 8192x8192. If we don't optimize, it gets very
slow.

I will try benchmarking the performance using safe code, as you
suggest, although for these low-level functions readability is
secondary to speed.
 
J

JS

I have done a quick benchmark:

Average time to do simple binary image OR (1000x1000 pixels = 125000
bytes):
Unsafe code: 3.0 milliseconds
Safe code: 2.5 milliseconds

This is significant but not too significant. I will consider having 2
versions of each function, the default function being safe and an
unsafe version where it makes sense.
 
S

Stefan Simek

JS said:
I have done a quick benchmark:

Average time to do simple binary image OR (1000x1000 pixels = 125000
bytes):
Unsafe code: 3.0 milliseconds
Safe code: 2.5 milliseconds

This is significant but not too significant. I will consider having 2
versions of each function, the default function being safe and an
unsafe version where it makes sense.
The safe version seems to be faster in this case, so I think there is no
real need to use unsafe code at all ;)
 
J

Jon Skeet [C# MVP]

JS said:
I have done a quick benchmark:

Average time to do simple binary image OR (1000x1000 pixels = 125000
bytes):
Unsafe code: 3.0 milliseconds
Safe code: 2.5 milliseconds

This is significant but not too significant. I will consider having 2
versions of each function, the default function being safe and an
unsafe version where it makes sense.

Have you got those the wrong way round? If the safe code is faster, why
not use that all the time?
 
J

Josh Sebastian

Jon said:
using System;

class Test
{
static void Main()
{
int[] x = {10};
int i=0;
x[i++] = i;
Console.WriteLine (x[0]);
}
}

That will print out 1, not 0 (which is what you'd have expected, I
believe).

What you're seeing is the postfix increment being executed before the
evaluation of the right hand side of the assignment operator.

This is entirely correct according to the specification.

I'm very new to C#, so if I'm way off-base, please let me know. I asked
myself the other day what the effect of the snippet

int i=0;
i = i++;

would be, and I couldn't find any suitable answer in my book (Deitel
and Deitel) or in an MSDN article. So I downloaded a copy of the spec
and started reading.

I'm a C++ programmer, and I'm well-aware of sequence points and that
the result of that snippet (and yours) in C or C++ is undefined. I
thought C# might have a similar concept, and indeed it does. Section
3.10 defines a "critical execution point", between two of which side
effects can be reordered. There are two side effects in that statement:
incrementing i, and assigning the (unincremented) value of i to i. As
far as I can tell, 3.10 says that those can occur in any order.

Basically, I'm saying that my reading of the spec says your snippet
might print 0 sometimes, and it might print 1 other times.

Josh
 
J

Jon Skeet [C# MVP]

Josh Sebastian said:
I'm very new to C#, so if I'm way off-base, please let me know. I asked
myself the other day what the effect of the snippet

int i=0;
i = i++;

would be, and I couldn't find any suitable answer in my book (Deitel
and Deitel) or in an MSDN article. So I downloaded a copy of the spec
and started reading.

I'm a C++ programmer, and I'm well-aware of sequence points and that
the result of that snippet (and yours) in C or C++ is undefined. I
thought C# might have a similar concept, and indeed it does. Section
3.10 defines a "critical execution point", between two of which side
effects can be reordered. There are two side effects in that statement:
incrementing i, and assigning the (unincremented) value of i to i. As
far as I can tell, 3.10 says that those can occur in any order.

Basically, I'm saying that my reading of the spec says your snippet
might print 0 sometimes, and it might print 1 other times.

No, that's not true. It will *always* print 1. The spec states that the
assignment operator evaluates the LHS first, then the RHS. The
evaluation of the LHS includes incrementing i. See section 7.13.1 of
the spec (14.13.1 in the ECMA numbering) for that.

Critical execution points are about what becomes visible to other
threads when. There's no ambiguity about what the "original program
order" is (which is the order the program must be perceived to be run
in from the point of view of the executing thread).
 
J

Josh Sebastian

Jon said:
Critical execution points are about what becomes visible to other
threads when. There's no ambiguity about what the "original program
order" is (which is the order the program must be perceived to be run
in from the point of view of the executing thread).

OK, thanks Jon.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top