UNION in struct

  • Thread starter: Ken Allen

Ken Allen

I have some code from C/C++ that I am attempting to port to C#. I have come
across an interesting problem that is quite common in complex C/C++ code:
the use of UNION in structure definitions to permit the same piece of memory
to be referenced as different data types. This is often used to save space
and permit a single piece of memory to contain different types of data,
typically based on some other flag in the structure. In other cases this
simply permits the data to be viewed in different ways -- for example, as an
array of 16 32-bit integers or as an array of 64 bytes. Since this is
defined at compile time, there is no runtime overhead in performing any
conversions or the like.

How can I achieve the same thing in C#? I am more interested in the latter
situation right now -- the ability to view the same piece of memory as
either an array of 16 "int" or 64 "byte" objects.

-ken
 
there's no union in C#, you can try to use the StructLayout attribute and FieldOffset attribute

[StructLayout( LayoutKind.Explicit)]
class MyUnion
{
    [FieldOffset(0)] byte byte1;
    [FieldOffset(1)] byte byte2;
    [FieldOffset(2)] byte byte3;
    [FieldOffset(3)] byte byte4;
    [FieldOffset(0)] int myInt;
}

but I don't think it will work with array since it's a reference type.
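
As a rough sketch of how the overlapping fields behave, the following uses a struct with public fields instead of the class above. The names IntBytes and LayoutDemo are made up for illustration. The first printed value assumes a little-endian machine, so it is guarded by BitConverter.IsLittleEndian.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical illustration type: four bytes overlaid on one int.
[StructLayout(LayoutKind.Explicit)]
public struct IntBytes
{
    [FieldOffset(0)] public byte Byte0;
    [FieldOffset(1)] public byte Byte1;
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(3)] public byte Byte3;
    [FieldOffset(0)] public int Value;   // shares storage with Byte0..Byte3
}

public static class LayoutDemo
{
    public static void Main()
    {
        IntBytes u = new IntBytes();
        u.Byte0 = 0x78;
        u.Byte1 = 0x56;
        u.Byte2 = 0x34;
        u.Byte3 = 0x12;

        // Which byte is the low-order one depends on the machine's byte order.
        if (BitConverter.IsLittleEndian)
        {
            Console.WriteLine(u.Value.ToString("X8")); // 12345678
        }

        // Writing the int changes all four bytes, whatever the byte order.
        u.Value = -1;
        Console.WriteLine(u.Byte0); // 255
    }
}
```

This covers individual fields only. An int[] or byte[] field would still hold a reference, so the array elements themselves would not overlap this way.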
 
Daniel Jin said:
there's no union in C#, you can try to use the StructLayout attribute and FieldOffset attribute

[StructLayout( LayoutKind.Explicit)]
class MyUnion
{
[FieldOffset(0)] byte byte1;
[FieldOffset(1)] byte byte2;
[FieldOffset(2)] byte byte3;
[FieldOffset(3)] byte byte4;
[FieldOffset(0)] int myInt;
}

but I don't think it will work with array since it's a reference type.

I do not want to actually use a union, and I am aware that it does not exist.
This structure is not being shared with any code, it is simply used in some
calculations.

I need a way to declare an array of integers and then be able to reference
this as either an array of bytes or longs. The length of this structure is
always 64 bytes in my example, so I want to be able to reference it as 64
bytes, 4 ints or 2 longs.

-ken
 
On this same note, how does one declare an array in a struct and have it
take space within the struct -- my impression is that the declaration in
the struct will be a reference to the array, which is allocated elsewhere.

-ken
 
like I said, I don't think you can do that with an array because reference fields are not stored inline with the struct/class that contains them

btw, 64 bytes is more than just 4 ints, but you probably knew that

 
----- Ken Allen wrote: -----
On this same note, how does one declare an array in a struct and have it
take space within the struct --

you can't.

my impression is that the declaration in
the struct will be a reference to the array, which is allocated elsewhere.

that's correct.

-ken
 
[StructLayout(LayoutKind.Explicit, Size=64)]
struct Union64Bytes
{
    [FieldOffset(0)]
    public byte Bytes;
    [FieldOffset(0)]
    public int Ints;
    [FieldOffset(0)]
    public long Longs;
}

To manipulate it, use pointers (the code must be in an unsafe context and
compiled with /unsafe):

unsafe
{
    Union64Bytes u=new Union64Bytes();
    byte* b=&u.Bytes;
    int* i=&u.Ints;
    long* l=&u.Longs;

    b[0]=255;
    b[1]=255;
    b[2]=255;
    b[3]=255;
    Console.WriteLine(i[0].ToString()); //should print out -1, as all its bits are set.
}
 
On this same note, how does one declare an array in a struct and have
it take space within the struct -- my impression is that the
declaration in the struct will be a reference to the array, which is
allocated elsewhere.

The only place where this would be useful is when dumping data to a stream,
or using interop.

For interop, you can specify attributes on the array member that make the
marshalling code able to place the array into the marshalled data as though
it "were taking up space" in the struct.

Look at the MarshalAs attribute class for more details.

For dumping the struct to disk, you can do this with marshalling too. Just
define the struct as though you were going to marshal it for interop, then
convert it to a byte array before dumping it to the stream.
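
A minimal sketch of that marshalling approach, assuming a 64-byte block of 16 ints. The names Block64, MarshalDemo and ToBytes are hypothetical. The managed field is still a reference to a separate array; only the marshalled copy has the array inline.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct: marshalled as 16 inline ints (64 bytes).
[StructLayout(LayoutKind.Sequential)]
public struct Block64
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    public int[] Words;
}

public static class MarshalDemo
{
    // Copies the marshalled form of the struct into a managed byte array.
    public static byte[] ToBytes(Block64 block)
    {
        int size = Marshal.SizeOf(typeof(Block64));
        IntPtr buffer = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(block, buffer, false);
            byte[] bytes = new byte[size];
            Marshal.Copy(buffer, bytes, 0, size);
            return bytes;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);   // always release the unmanaged block
        }
    }

    public static void Main()
    {
        Block64 block = new Block64();
        block.Words = new int[16];         // length matches SizeConst
        block.Words[0] = -1;               // all four bytes become 0xFF

        byte[] bytes = ToBytes(block);
        Console.WriteLine(bytes.Length);   // 64
        Console.WriteLine(bytes[0]);       // 255
    }
}
```

Every call allocates and copies, so this suits serialization or interop rather than a tight inner loop.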
 
No, you are missing my point. This struct is not used for serialization or
communication between different components. The issue is that this specific
algorithm, which was written in C originally, uses a data structure to hold
the context for the algorithm at different phases. Within this structure is
a union of two arrays that occupy the same physical memory -- some
of the code refers to the contents of this union as an array of bytes and
manipulates individual bytes, and other parts of the same code refer
to the same union content as an array of integers and manipulate the
individual integer values.

In fact, I do not even need the structure if I convert this into a C# class
object, since the context variables can be data members of the class. But I
still need to have the ability to refer to a sequence of
128 bits as either an array of 16 bytes or an array of 4 ints. The algorithm
is extremely compute intensive, and so I cannot afford any extra processing (such as
marshalling the data into and out of an IntPtr buffer) -- what I would truly
like is a way to allocate the memory and then cast it as either an array of
16 bytes or 4 int values.

-ken
 
Hello Ken,

The only way to do what you really want is to abandon C# and use C++.
However, if you are willing to pay a perf penalty, you can store the data as
16 bytes, either as hardcoded fields (byte0, byte1, ...) or as an array
(larger), and write an indexer or function for your C# struct or class that
takes the index of the integer you wish to get or set, and dynamically
performs the conversion.

To perform the conversion you can use bitshifting and combining ((byteX <<
(X * 8)) | ...) or the BitConverter functions (ToInt32).

Regards,
Frank Hileman

check out VG.net: www.vgdotnet.com
Animated vector graphics system
Integrated Visual Studio .NET graphics editor
 
Drat!

The structure issue is a complete red herring, but the need to be able to
reference the same piece of memory in different ways and to be able to
manipulate it accordingly is critical to being able to port some
mathematical algorithms into C#! We wanted to use a single programming
language in our approach, not because the languages do not play well
together, but because we wanted a single assembly as the result -- since
different languages cannot be used in the same assembly, this forces us to
create an assembly (DLL) in managed C++ that is called from the large amount
of code already written in C#!

Drat and double drat!

-Ken
 
Let me take another tack on this. I have been decomposing the specific
algorithm that we have been translating, or attempting to translate, from C
to C#. The original algorithm used a structure that contained a union of an
array of bytes and an array of ints -- it turns out that the array of bytes
is only used to populate the array with information read from a file.
Basically the algorithm is this:

1. While not end of file
2.     Read 64KB from file
3.     While more in buffer
4.         Copy 64 bytes into union.byte array
5.         Call processing routine with union.word (UInt32) array
6.     End While
7. End While

The code is reading 64KB at a time for performance reasons to ensure that
the data is being buffered in a meaningful way.

If the file does not contain a multiple of 64 bytes, then the 'last' segment
of the file is simply copied to the byte array portion of the union and then
treated as an array of UInt32 values.

Assuming I want to replicate this algorithm as closely as possible, how can
I read 64KB into some buffer and then transfer 4-64 bytes at a time into an
array of UInt32?

I suspect I need this since the file length may not be a multiple of 4
bytes.
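
One possible shape for this loop, as a sketch only: read 64KB at a time and use Buffer.BlockCopy to move each 64-byte block into a UInt32 array. The names BlockReader, BlockHandler, ProcessStream and FillBuffer are hypothetical. Zero-padding a short final block is an assumption, since the original C code is not shown. BlockCopy copies raw bytes, so the resulting word values follow the machine's byte order, just as the C union does.

```csharp
using System;
using System.IO;

public static class BlockReader
{
    public const int BlockBytes = 64;          // one block = 16 UInt32 words
    public const int BufferBytes = 64 * 1024;  // read 64KB at a time

    // Hypothetical callback standing in for the real processing routine.
    public delegate void BlockHandler(uint[] words);

    public static void ProcessStream(Stream input, BlockHandler process)
    {
        byte[] buffer = new byte[BufferBytes];
        uint[] words = new uint[BlockBytes / 4];
        int filled;

        while ((filled = FillBuffer(input, buffer)) > 0)
        {
            for (int offset = 0; offset < filled; offset += BlockBytes)
            {
                int count = Math.Min(BlockBytes, filled - offset);
                if (count < BlockBytes)
                {
                    // Assumption: a short final block is padded with zeros.
                    Array.Clear(words, 0, words.Length);
                }

                // Offsets and count are in bytes, even for the uint[] target.
                Buffer.BlockCopy(buffer, offset, words, 0, count);
                process(words);
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int FillBuffer(Stream input, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = input.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }

    public static void Main()
    {
        byte[] data = new byte[70];            // deliberately not a multiple of 64
        for (int i = 0; i < data.Length; i++) data[i] = (byte)i;

        int blocks = 0;
        ProcessStream(new MemoryStream(data), delegate(uint[] words) { blocks++; });
        Console.WriteLine(blocks);             // 2: one full block, one padded block
    }
}
```

Because the buffer is always filled completely before processing, a partial block can only occur at the very end of the file.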

-ken
 
Ken said:
I have some code from C/C++ that I am attempting to port to C#. I have come
across an interesting problem that is quite common in complex C/C++ code:
the use of UNION in structure definitions to permit the same piece of memory
to be referenced as different data types. This is often used to save space
and permit a single piece of memory to contain different types of data,
typically based on some other flag in the structure. In other cases this
simply permits the data to be viewed in different ways -- for example, as an
array of 16 32-bit integers or as an array of 64 bytes. Since this is
defined at compile time, there is no runtime overhead in performing any
conversions or the like.

How can I achieve the same thing in C#? I am more interested in the latter
situation right now -- the ability to view the same piece of memory as
either an array of 16 "int" or 64 "byte" objects.

Here's an implementation of a Union class that you can pass a byte array
into via the constructor. You can then access it as bytes, Int16's or
Int32's via a few properties.

This might do what you need, or at least give you a starting point.
Sorry for the lack of comments; this was just a quick thing I did sometime
over the weekend:


//===================================================================
using System;

// Plain byte view of the shared array.
public class ByteIndexer {
    private byte[] _data;

    public ByteIndexer( byte[] data) {
        _data = data;
    }

    public byte this[int index] {
        get {
            return( _data[index]);
        }
        set {
            _data[index] = value;
        }
    }
}

// Int16 view: element i covers bytes 2*i and 2*i+1 of the shared array.
public class Int16Indexer {
    private byte[] _data;
    private const int _elementSize = 2;

    public Int16Indexer( byte[] data) {
        if ((data.Length % _elementSize) != 0) {
            throw new ArgumentException( String.Format( "Size of an Int16Indexer array must be a multiple of {0}", _elementSize));
        }

        _data = data;
    }

    public short this[int index] {
        get {
            if ((index < 0) || ((index+1) * _elementSize > _data.Length)) {
                throw new IndexOutOfRangeException();
            }

            return( BitConverter.ToInt16( _data, index * _elementSize));
        }
        set {
            if ((index < 0) || ((index+1) * _elementSize > _data.Length)) {
                throw new IndexOutOfRangeException();
            }

            Array.Copy( BitConverter.GetBytes( value), 0, _data, index * _elementSize, _elementSize);
        }
    }
}

// Int32 view: element i covers bytes 4*i through 4*i+3 of the shared array.
public class Int32Indexer {
    private byte[] _data;
    private const int _elementSize = 4;

    public Int32Indexer( byte[] data) {
        if ((data.Length % _elementSize) != 0) {
            throw new ArgumentException( String.Format( "Size of an Int32Indexer array must be a multiple of {0}", _elementSize));
        }

        _data = data;
    }

    public int this[int index] {
        get {
            if ((index < 0) || ((index+1) * _elementSize > _data.Length)) {
                throw new IndexOutOfRangeException();
            }

            return( BitConverter.ToInt32( _data, index * _elementSize));
        }
        set {
            if ((index < 0) || ((index+1) * _elementSize > _data.Length)) {
                throw new IndexOutOfRangeException();
            }

            Array.Copy( BitConverter.GetBytes( value), 0, _data, index * _elementSize, _elementSize);
        }
    }
}


// Owns (or wraps) one byte array and exposes it through the three indexers,
// which all share a reference to the same array.
public class Union {
    private byte[] _data;
    private const int _defaultSize = 16;
    private const int _elementSize = 4;

    private ByteIndexer _byteIndexer;
    private Int16Indexer _int16Indexer;
    private Int32Indexer _int32Indexer;


    private void initMembers( byte[] data) {
        _byteIndexer = new ByteIndexer( data);
        _int16Indexer = new Int16Indexer( data);
        _int32Indexer = new Int32Indexer( data);
    }

    public Union(): this( _defaultSize) {}

    public Union( int size) {
        // size must be a multiple of 4 to cleanly handle ints

        if ((size % _elementSize) != 0) {
            throw new ArgumentOutOfRangeException( "size", size,
                String.Format( "Size of a Union must be a multiple of {0}", _elementSize));
        }

        _data = new byte[size];
        initMembers( _data);
    }

    public Union( byte[] data) {
        if ((data.Length % _elementSize) != 0) {
            throw new ArgumentException( String.Format( "Size of a Union must be a multiple of {0}", _elementSize));
        }

        _data = data;   // keep the reference so GetArray() does not return null
        initMembers( data);
    }

    public byte[] GetArray() {
        //TODO: should a copy of the array be returned instead of a reference?
        return( _data);
    }


    public ByteIndexer ByteAccess {
        get {
            return( _byteIndexer);
        }
    }

    public Int16Indexer Int16Access {
        get {
            return( _int16Indexer);
        }
    }

    public Int32Indexer Int32Access {
        get {
            return( _int32Indexer);
        }
    }

}


/// <summary>
/// Small demo of the Union class.
/// </summary>
class MainClass
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main(string[] args)
    {
        int i;
        Union myUnion = new Union( 32);

        for (i = 0; i < 32; i++) {
            myUnion.ByteAccess[i] = (byte) i;
        }

        myUnion.Int32Access[3] = unchecked( (int) 0xdeadbeef);

        Console.WriteLine( "DWord access...");
        Console.WriteLine();

        for (i = 0; i < 32/4; i++) {
            Console.WriteLine( myUnion.Int32Access[i].ToString( "X"));
        }

        // now Word sized access
        Console.WriteLine( "Word access...");
        Console.WriteLine();

        Console.WriteLine( myUnion.Int16Access[15].ToString( "X"));

        try {
            myUnion.Int16Access[16] = 255;
        }
        catch (Exception ex) {
            // the above should throw an exception
            Console.WriteLine( ex);
        }
    }
}
//===================================================================
 
Mike,

This looks pretty good, but I have some questions.

1. Would it make sense to make the individual indexers inner classes of the
Union class?

2. Should not the limit checks in the Int32 indexer class be different from
those in the Int16 indexer class? Both only check index and index+1.

3. I can see that this works since all of the indexer classes share a
reference to the same byte array internally, but if initialized using a
constructor that passes in a byte[] reference, then external code will also
have a direct reference to the same data. Would it not be better to have the
Union constructor initialize the inner classes using a copy of that array?

4. How efficient are the BitConverter.ToInt16 and BitConverter.ToInt32
methods? I need to be able to call this inside a rather tight loop, so if
this is not efficient then the performance will really suffer. I have been
thinking about using a byte array and then using Buffer.BlockCopy to
transfer the results into an Int32 array -- this works, but I have not yet
checked the performance -- the cost is in the single copy and from then on
the array references run at full speed, whereas your classes, which are quite
clever, require a BitConverter call for every access.

-ken

 
Hello Ken,

Yes, to make secure code easy to verify they only allow one interpretation
of memory contents. However, the bit shifting and combining method I
mentioned is very fast in C# -- I have used similar methods with no
noticeable perf hit, at least in my scenarios.

Here is a way to make it easier to use: create a struct, called Int4. In
that put four byte fields:
byte byte0;
byte byte1;
byte byte2;
byte byte3;

You can make the fields public or wrap them in props. Then add another
property, IntValue, where you do the bit shifting and | operation to convert
to and from the 4 bytes.

Now you have encapsulated all the ugliness in a struct. Structs are very
fast, and your property code should be inlined, so you can reuse your struct
in the same way you would use a similar set of functions in C++. You can
create an array of these structs, or put temporaries on the stack, and it
will all be very compact and fast. We often wrap bit manipulation this way
in structs.
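
A sketch of what such a struct might look like, assuming little-endian ordering of the four bytes. The Int4Demo class is hypothetical, and the field and property names follow the description above.

```csharp
using System;

// Four bytes stored inline, with a computed int view over them.
public struct Int4
{
    public byte byte0;
    public byte byte1;
    public byte byte2;
    public byte byte3;

    // Little-endian view: byte0 is the least significant byte.
    public int IntValue
    {
        get
        {
            return byte0 | (byte1 << 8) | (byte2 << 16) | (byte3 << 24);
        }
        set
        {
            // unchecked: the casts deliberately keep only the low 8 bits.
            unchecked
            {
                byte0 = (byte)value;
                byte1 = (byte)(value >> 8);
                byte2 = (byte)(value >> 16);
                byte3 = (byte)(value >> 24);
            }
        }
    }
}

public static class Int4Demo
{
    public static void Main()
    {
        Int4[] block = new Int4[4];   // 16 bytes, stored inline in the array

        block[0].IntValue = unchecked((int)0xDEADBEEF);
        Console.WriteLine(block[0].byte0.ToString("X2")); // EF
        Console.WriteLine(block[0].byte3.ToString("X2")); // DE

        block[1].byte0 = 1;
        block[1].byte1 = 1;
        Console.WriteLine(block[1].IntValue);             // 257
    }
}
```

Unlike a byte[] field, an array of these structs holds the bytes themselves, so no per-element reference is involved.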

In the framework there is a similar type for bit vectors.

I think you will be pleasantly surprised when you perform some perf tests
with your struct.

- Frank
 
I believe that I may attempt something along these lines. The specific
algorithm I am considering at the moment is an SHA-256 hash generator. The
unmanaged C code (in Visual Studio 6) executes on a large file in 12
seconds, but the built-in System.Security.Cryptography.SHA256Managed class
takes more than 74 seconds!

My only concern now is with byte ordering. While this code will likely
only be used on Windows systems, I am still concerned that the algorithm do
the right thing. The C code that casts the two array types handles this
auto-magically, but I am concerned that if I use the class structure you
defined it may be processor specific.

-Ken
 
Hello Ken,

Best way to handle byte ordering problem: in the constructor for the struct.
For example, suppose you create a function that takes an array of bytes, and
returns an array of Int4 (the struct we speak of). As a parameter to the
function you specify the byte order used in the array of bytes. This depends
on your input -- you could use a byte order mark (2 bytes, 0xFEFF would do)
or bool in the beginning of the array when you save it to determine the
saved byte order. So if the function receives big-endian, you construct
your structs one way, if little-endian, you construct the other.

If your byte array will only be created on one type of platform, you will
always interpret the same way. BitConverter.IsLittleEndian can tell you the
byte order on any platform. Intel is little-endian, so I would use that by
default. I would experiment both with the BitConverter calls and your own
shift/combine to see which is faster.

If you read byte0, byte1, byte2, byte3 in order, and it is stored
little-endian, convert to an int as follows:

byte0 | (byte1 << 8) | (byte2 << 16) | (byte3 << 24)
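
As a sketch, both byte orders can be handled by one helper that takes the order as a parameter. The names Endian and ReadUInt32 are hypothetical. For the SHA-256 case mentioned earlier, the published specification (FIPS 180) treats message words as big-endian, so that is probably the order to check the port against.

```csharp
using System;

public static class Endian
{
    // Reads four bytes starting at offset as an unsigned 32-bit value,
    // using the byte order the caller asks for.
    public static uint ReadUInt32(byte[] data, int offset, bool littleEndian)
    {
        uint b0 = data[offset];
        uint b1 = data[offset + 1];
        uint b2 = data[offset + 2];
        uint b3 = data[offset + 3];

        return littleEndian
            ? b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
            : (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    public static void Main()
    {
        byte[] data = { 0x12, 0x34, 0x56, 0x78 };

        Console.WriteLine(ReadUInt32(data, 0, true).ToString("X8"));  // 78563412
        Console.WriteLine(ReadUInt32(data, 0, false).ToString("X8")); // 12345678

        // BitConverter follows the machine's own byte order.
        bool same = BitConverter.ToUInt32(data, 0)
                    == ReadUInt32(data, 0, BitConverter.IsLittleEndian);
        Console.WriteLine(same); // True
    }
}
```

The explicit shifts give the same result on any processor, whereas a raw memory copy or BitConverter call depends on the machine.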

For processing a big buffer, with good coding the C compiler optimization
probably cannot be beat, except with some custom assembly coding. The C/C++
compiler has extra optimizations you don't get in C#. Still, you never know
till you try.

- Frank
 
Ken said:
Mike,

This looks pretty good, but I have some questions.

1. Would it make sense to make the individual indexers inner classes of the
Union class?

Sure, why not?

2. Should not the limit checks in the Int32 indexer class be different from
those in the Int16 indexer class? Both only check index and index+1.

Note that the check for index+1 is scaled by _elementSize before
checking against the Length of the data byte array. For a 32-byte array,
the Int16 indexer accepts indices 0 to 15 and the Int32 indexer 0 to 7.

3. I can see that this works since all of the indexer classes share a
reference to the same byte array internally, but if initialized using a
constructor that passes in a byte[] reference, then external code will also
have a direct reference to the same data. Would it not be better to have the
Union constructor initialize the inner classes using a copy of that array?

That depends on whether you want the Union to have Value semantics or
reference semantics. If you look at my TODO comment in the GetArray()
method, you'll see that I wasn't really sure which it should have. I
think that value semantics would probably make more sense, in which case
your comment should be acted on (and the GetArray() method should
return a copy instead of a reference to the internal array).

4. How efficient are the BitConverter.ToInt16 and BitConverter.ToInt32
methods? I need to be able to call this inside a rather tight loop, so if
this is not efficient then the performance will really suffer. I have been
thinking about using a byte array and then using Buffer.BlockCopy to
transfer the results into an Int32 array -- this works, but I have not yet
checked the performance -- the cost is in the single copy and from then on
the array references run at full speed, whereas your classes, which are quite
clever, require a BitConverter call for every access.

When I wrote this, I was under the impression that the intended access
to the data would be a mix of various types (byte, Int16 and Int32) over
the life of the application. I see now that it's really an Int32 array
that gets read into a byte array initially. I think there's no doubt
that once you get the data you should copy it to an Int32 array for
subsequent use to get the best performance.
 