generate 20 byte int from GUID ?

  • Thread starter: John Grandy

John Grandy

Is it possible to generate a 20-byte integer from a GUID that is "unique
enough" (just as a GUID is not truly unique, but is "unique enough")?

We identify transactions with GUIDs, but a partner web service has a 20-byte
limit on the transaction ID passed.
 
John said:
Is it possible to generate a 20-byte integer from a GUID that is "unique
enough" (just as a GUID is not truly unique, but is "unique enough")?

We identify transactions with GUIDs, but a partner web service has a 20-byte
limit on the transaction ID passed.

20 bytes = 160 bits. A GUID is 128 bits, so a 20-byte limit should be
more than enough to hold a standard GUID.
 
The 3rd party's 20-byte representation of a TransactionID must be a string,
so each byte contains one character (ASCII coding). Although 20 bytes are
available, if each character holds one hex digit this corresponds to only
20 nibbles, or 10 bytes. So we need to convert a 32-nibble GUID to a
20-nibble "pseudo-GUID".
 
John said:
The 3rd party's 20-byte representation of a TransactionID must be a string,
so each byte contains one character (ASCII coding). Although 20 bytes are
available, if each character holds one hex digit this corresponds to only
20 nibbles, or 10 bytes. So we need to convert a 32-nibble GUID to a
20-nibble "pseudo-GUID".

Ah, OK. I guess it all depends on how the GUIDs are created. I believe
GUIDs are generated using a combination of the adapter address and a
timestamp; that said, I doubt that uniqueness will be a problem if you
truncate a section of the GUID.

If you are really concerned, you may want to look into generating your
own 20-nibble GUIDs according to criteria that will guarantee uniqueness.
 
John Grandy said:
Is it possible to generate a 20-byte integer from a GUID that is "unique
enough" (just as a GUID is not truly unique, but is "unique enough")?

We identify transactions with GUIDs, but a partner web service has a 20-byte
limit on the transaction ID passed.

Just figured I would ask the obvious.
Does the Transaction ID need to be numeric or can it be Alphanumeric?
I'm assuming numeric, because otherwise there is no problem.

Bill
 
The 3rd party's 20-byte representation of a TransactionID must be a string,
so each byte contains one character (ASCII coding). Although 20 bytes are
available, if each character holds one hex digit this corresponds to only
20 nibbles, or 10 bytes. So we need to convert a 32-nibble GUID to a
20-nibble "pseudo-GUID".

Hang on though - you can get an awful lot more than one nybble in a
single ASCII character!

If you can use *any* ASCII character (including the ones which aren't
really in ASCII, but everyone talks about as if they are - bell,
carriage return, etc - the control characters, basically) then you
should only need to drop one bit per byte, giving you 20*7=140 bits,
which is still enough to hold a full GUID.

Otherwise, you could use Base64 to give you 6 bits per byte. That means
dropping one byte from the GUID, but that would probably still be good
enough, I suspect. It may be worth finding out how the GUIDs are
generated by default in .NET - there could be one byte which is better
to get rid of than others.
 
Problem is that the string must be URL encoded. The 3rd party can only
accept 20 chars *including* any potential extra chars created by URL
encoding.

So, to ensure we never go over the 20-char limit, that leaves only 0-9, a-z,
A-Z = 62 chars.

62 ^ 20 = 7.044 x 10^35

Not enough to hold a full GUID (2^128 = 3.403 x 10^38). So how to hash it
down is the question.
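
To put numbers on that comparison, here is a minimal C# sketch that counts how many distinct 20-character IDs a given alphabet allows and expresses that capacity in bits. The `IdCapacity` class and its method names are made up for illustration; only `System.Numerics.BigInteger` and `Math.Log` come from the framework.

```csharp
using System;
using System.Numerics;

public static class IdCapacity
{
    // Number of distinct strings of the given length over an alphabet
    // of the given size: alphabetSize ^ length.
    public static BigInteger Count(int alphabetSize, int length)
    {
        return BigInteger.Pow(alphabetSize, length);
    }

    // The same capacity expressed in bits: length * log2(alphabetSize).
    public static double Bits(int alphabetSize, int length)
    {
        return length * Math.Log(alphabetSize, 2);
    }
}
```

With this, `IdCapacity.Bits(62, 20)` comes out at roughly 119.08 bits, and a 64-character alphabet gives 120 bits. Both fall short of the 128 bits in a GUID.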
 
Problem is that the string must be URL encoded. The 3rd party can only
accept 20 chars *including* any potential extra chars created by URL
encoding.

So, to ensure we never go over the 20-char limit, that leaves only 0-9, a-z,
A-Z = 62 chars.

URL encoding shouldn't touch * and -, so replace the normal Base64
encoding with one which uses * instead of + and - instead of /, and you
end up with 64 URL-safe characters. Twenty of them hold 120 bits: not
enough for a full GUID, but enough for 15 of its 16 bytes. Ditch one of
the bytes, as I suggested, to get 120 useful bits.
 
Where can I find information on how
Convert.ToBase64String(Guid.ToByteArray()) performs its magic?

Then I write a replacement function following your idea... correct?
 
Hate to say this but what about just creating a table in your DB or where
ever your GUIDs are stored and point them to a IDENTITY column in SQL, or
the equivalent in the database you are using.

In SQL you could easily create a Stored Procedure that would accept the GUID
as a parameter and return you your "Transaction ID" to send to the other
provider. This way you also can easily look up the transaction ID for your
GUID if you needed to check status.

Just a thought.

Eric Renken
 
Where can I find information on how
Convert.ToBase64String(Guid.ToByteArray()) performs its magic?

Then I write a replacement function following your idea... correct?

Well, an alternative would be to just call Convert.ToBase64String, then
call Replace on the result and trim it to 20 characters.
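
A minimal C# sketch of that suggestion, for illustration only. The `ShortId` class and `FromGuid` method are hypothetical names. The character swap follows the `*` and `-` choice proposed earlier in the thread. Whether those two characters really pass through unescaped depends on the URL encoder in use, so it is worth checking against the partner's actual encoder.

```csharp
using System;

public static class ShortId
{
    // Hypothetical helper: keeps the first 15 of the GUID's 16 bytes and
    // encodes them as 20 characters from a URL-friendly Base64 alphabet.
    public static string FromGuid(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();           // always 16 bytes
        string b64 = Convert.ToBase64String(bytes);  // 24 chars: 22 data chars + "=="

        // The first 20 Base64 chars encode exactly the first 15 bytes (120 bits),
        // so trimming drops the last byte and the padding.
        return b64.Substring(0, 20)
                  .Replace('+', '*')
                  .Replace('/', '-');
    }
}
```

Calling `ShortId.FromGuid(Guid.NewGuid())` returns a 20-character string. Two GUIDs that differ only in the last byte of `ToByteArray()` map to the same ID, which is the factor-of-256 loss discussed in this thread.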
 
That's exactly what I did.

But it looks like the loss of "uniqueness" is severe ...

loss of uniqueness = 1 - (64 ^ 20) / (2 ^ 128) = 1 - 0.00390625 = 99.61%

What I am wondering is whether, in terms of uniqueness, there is any
difference between this method and randomly generating 20 base-64 chars
(using a random generation technique at the same level of quality as the
technique used by the Guid.NewGuid() method).
 
I did think of that. We do create our transaction IDs with a SQL IDENTITY
column, and keeping track of the mapping between our transaction IDs and
their transaction IDs is definitely a good idea (although I don't presently
see any need for it, it's better to be safe than sorry).

Since it's a deterministic relationship, I believe that you could even do it
as a computed column in a view.

The problem with adding another column (even a computed column) to the
Transactions table, and then populating and retrieving that column's value
when adding a record, is that it would have a severe performance impact in
a web-farm scenario, would it not? It seems better to calculate their
transaction ID in the CLR.

Any thoughts, anyone?
 
Where can I find information on how
Convert.ToBase64String(Guid.ToByteArray()) performs its magic?

In Rotor, check out \sscli\clr\src\vm\comutilnative.cpp, line 1973, to
see the magic. The Force is strong in there :-).

Ab.
http://joehacker.blogspot.com


 
That's exactly what I did.

But it looks like the loss of "uniqueness" is severe ...

loss of uniqueness = 1 - (64 ^ 20) / (2 ^ 128) = 1 - 0.00390625 = 99.61%

What I am wondering is whether, in terms of uniqueness, there is any
difference between this method and randomly generating 20 base-64 chars
(using a random generation technique at the same level of quality as the
technique used by the Guid.NewGuid() method).

Well, the difficulty is getting the same level of quality. If you take
120 bits of the GUID, you do indeed lose out by a factor of 256 - but
there are still 2^120 possibilities, which is a vast number!
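
For a rough sense of scale, here is a small C# sketch of the standard birthday-bound approximation for the chance that any two of n IDs collide. The `CollisionEstimate` class is a made-up name. The sketch assumes every retained bit is uniformly random, which is optimistic: a version 4 GUID carries 6 fixed bits, so a 120-bit slice may contain as few as 114 random bits.

```csharp
using System;

public static class CollisionEstimate
{
    // Birthday-bound approximation: p ~ n(n-1) / 2^(bits+1).
    // Only meaningful while the result is small (well below 1).
    public static double Probability(double ids, int bits)
    {
        return ids * (ids - 1) / Math.Pow(2, bits + 1);
    }
}
```

Under that assumption, `CollisionEstimate.Probability(1e9, 120)` is on the order of 10^-19 for a billion transaction IDs, and still on the order of 10^-17 with `bits` set to 114.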
 
Oh man, I'm terrible at C++. Is the code for Guid creation available in C#
anywhere?
 
Abubakar said:
I looked at the Mono (open source .NET for almost all platforms!) sources
and this is what I found:

Well, that shows Base64 conversion, but it doesn't show GUID creation
as far as I can see.
 
I looked at the Mono (open source .NET for almost all platforms!) sources
and this is what I found:

--------------------
Inside \mcs\class\corlib\System\Convert.cs:

public static string ToBase64String (byte[] inArray)
{
    if (inArray == null)
        throw new ArgumentNullException ("inArray");

    return ToBase64String (inArray, 0, inArray.Length);
}

public static string ToBase64String (byte[] inArray, int offset, int length)
{
    if (inArray == null)
        throw new ArgumentNullException ("inArray");
    if (offset < 0 || length < 0)
        throw new ArgumentOutOfRangeException ("offset < 0 || length < 0");
    // avoid integer overflow
    if (offset > inArray.Length - length)
        throw new ArgumentOutOfRangeException ("offset + length > array.Length");

    // note: normally ToBase64Transform doesn't support multiple block processing
    byte[] outArr = toBase64Transform.InternalTransformFinalBlock (inArray, offset, length);

    return (new ASCIIEncoding ().GetString (outArr));
}

Inside \mcs\class\corlib\System.Security.Cryptography\ToBase64Transform.cs:

// Mono System.Convert depends on the ability to process multiple blocks
internal byte[] InternalTransformFinalBlock (byte[] inputBuffer, int inputOffset, int inputCount)
{
    int blockLen = this.InputBlockSize;
    int outLen = this.OutputBlockSize;
    int fullBlocks = inputCount / blockLen;
    int tail = inputCount % blockLen;

    byte[] res = new byte [(inputCount != 0)
        ? ((inputCount + 2) / blockLen) * outLen
        : 0];

    int outputOffset = 0;

    for (int i = 0; i < fullBlocks; i++) {

        TransformBlock (inputBuffer, inputOffset,
            blockLen, res, outputOffset);

        inputOffset += blockLen;
        outputOffset += outLen;
    }

    byte[] lookup = Base64Constants.EncodeTable;
    int b1,b2;

    // When fewer than 24 input bits are available
    // in an input group, zero bits are added
    // (on the right) to form an integral number of
    // 6-bit groups.
    switch (tail) {
    case 0:
        break;
    case 1:
        b1 = inputBuffer [inputOffset];
        res [outputOffset] = lookup [b1 >> 2];
        res [outputOffset+1] = lookup [(b1 << 4) & 0x30];

        // padding
        res [outputOffset+2] = (byte)'=';
        res [outputOffset+3] = (byte)'=';
        break;

    case 2:
        b1 = inputBuffer [inputOffset];
        b2 = inputBuffer [inputOffset + 1];
        res [outputOffset] = lookup [b1 >> 2];
        res [outputOffset+1] = lookup [((b1 << 4) & 0x30) | (b2 >> 4)];
        res [outputOffset+2] = lookup [(b2 << 2) & 0x3c];

        // one-byte padding
        res [outputOffset+3] = (byte)'=';
        break;
    }

    return res;
}


--------------------
from mono-1.1.9.1.

Ab.
http://joehacker.blogspot.com


John A Grandy said:
Oh man, I'm terrible at C++. Is the code for Guid creation available in C#
anywhere?
 