Compress ASCII text as Hex?

Ben Bloom · Nov 10, 2004

Hi -

I was speaking with someone who mentioned that it's possible to encode
an ASCII string as hex(?) in order to fit more data into the same number
of characters. Can anyone enlighten me?

The scenario is - I've got a CSV with a field that has a 16 character
limit. I need to fit potentially 24 ASCII characters into it.

Thanks.
-Ben

Nicholas Paldino [.NET/C# MVP] · Nov 10, 2004

Ben,

You can't do that unless you limit the range of characters that can be
used in the 24 character string. Without doing that, you have to accept the
full range of characters and you can't just squeeze them in there without
some loss.
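
To put a rough number on that limit, here is a minimal sketch (not from the thread) that finds the largest input alphabet for which every possible 24-character string can still get its own 16-character string. The class and method names are hypothetical, and the output alphabet sizes used below (256 for raw bytes, 95 for printable ASCII) are assumptions about what the CSV field can actually hold.

```csharp
using System;
using System.Numerics;

public static class FieldCapacity
{
    // Largest input alphabet size n such that
    //   n^inputLength <= outputAlphabet^outputLength,
    // i.e. every possible input string can be given a distinct output string.
    // Assumes inputLength >= 1 and outputAlphabet >= 1.
    public static int MaxInputAlphabet(int inputLength, int outputLength, int outputAlphabet)
    {
        BigInteger capacity = BigInteger.Pow(outputAlphabet, outputLength);
        int n = 1;
        while (BigInteger.Pow(n + 1, inputLength) <= capacity)
        {
            n++;
        }
        return n;
    }
}
```

Under those assumptions, FieldCapacity.MaxInputAlphabet(24, 16, 256) gives 40 and FieldCapacity.MaxInputAlphabet(24, 16, 95) gives 20, so the full ASCII range cannot be squeezed in without loss.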

Hope this helps.

Ben Bloom · Nov 10, 2004

Thanks Nicholas,

The 24 character string is a concatenation of a number (8-10 digits, I
believe) and two other string fields. Would I have more success if I
tried to shrink the number only?
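
As a rough illustration of how much shrinking the number alone can buy, the sketch below (not from the thread) rewrites a non-negative number in base 36. Base36 is a hypothetical helper, and it assumes the field may contain digits and uppercase letters. Hex is available directly via value.ToString("X").

```csharp
using System;
using System.Text;

public static class Base36
{
    private const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Writes a non-negative number using the 36 symbols 0-9 and A-Z.
    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return "0";

        var sb = new StringBuilder();
        while (value > 0)
        {
            sb.Insert(0, Digits[(int)(value % 36)]); // most significant digit ends up first
            value /= 36;
        }
        return sb.ToString();
    }

    // Reverses Encode; accepts lowercase letters as well.
    public static long Decode(string text)
    {
        long result = 0;
        foreach (char c in text)
        {
            int digit = Digits.IndexOf(char.ToUpperInvariant(c));
            if (digit < 0) throw new FormatException("Not a base-36 digit: " + c);
            result = result * 36 + digit;
        }
        return result;
    }
}
```

A 10-digit value such as 9999999999 becomes 9 hex digits (2540BE3FF) or 7 base-36 digits, so shrinking only the number saves between 1 and 3 of the 8 characters that need to go.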

-Ben

Guest · Nov 11, 2004

If you are using a subset of characters, try to fit 2 characters into each
character written to the CSV file.
Say, for example, you were only interested in the character codes 0-127;
you could write the string "me", i.e. hex codes 6d and 65, into one character:
[pseudo]
char c = (char)0x6d65; // 'm' (0x6d) in the high byte, 'e' (0x65) in the low byte
[/pseudo]

and write that single char to the text file.
Then, when you read it back, you break it up again.
Hope that helps.
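
A minimal sketch of that pairing idea, assuming the input only contains character codes 1-127 and that the file is written in an encoding that can carry any 16-bit char. PairPacker is a hypothetical name, and the NUL padding for odd-length input is a choice made here, not something from the post.

```csharp
using System;
using System.Text;

public static class PairPacker
{
    // Packs two 7-bit characters into one 16-bit char:
    // first character in the high byte, second in the low byte.
    public static string Pack(string ascii)
    {
        var sb = new StringBuilder((ascii.Length + 1) / 2);
        for (int i = 0; i < ascii.Length; i += 2)
        {
            int hi = ascii[i];
            int lo = i + 1 < ascii.Length ? ascii[i + 1] : 0; // pad odd length with NUL
            if (hi > 127 || lo > 127)
                throw new ArgumentException("Only character codes 0-127 are supported.", nameof(ascii));
            sb.Append((char)((hi << 8) | lo));
        }
        return sb.ToString();
    }

    // Splits each packed char back into its two halves.
    // A zero low byte is treated as padding, so input containing NUL is ambiguous.
    public static string Unpack(string packed)
    {
        var sb = new StringBuilder(packed.Length * 2);
        foreach (char c in packed)
        {
            sb.Append((char)(c >> 8));
            char lo = (char)(c & 0xFF);
            if (lo != 0) sb.Append(lo);
        }
        return sb.ToString();
    }
}
```

With this sketch a 24-character string packs into 12 chars, but those chars mostly land in the CJK range, so whether they count as 12 "characters" depends entirely on how the field limit is enforced.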

Jon Skeet [C# MVP] · Nov 11, 2004

Brian Keating said:
If you are using a subset of characters, try to fit 2 characters into each
character written to the CSV file.
Say, for example, you were only interested in the character codes 0-127;
you could write the string "me", i.e. hex codes 6d and 65, into one character:
[pseudo]
char c = (char)0x6d65; // 'm' (0x6d) in the high byte, 'e' (0x65) in the low byte
[/pseudo]

and write that single char to the text file.
Then, when you read it back, you break it up again.

Note that that will only work if your CSV file is written in a Unicode-
supporting encoding. There's also no absolute guarantee that it won't
end up forming invalid characters, or characters which the reader might
normalize to a different but equivalent form as far as Unicode is
concerned. I doubt that it'll be a problem, but it's worth bearing in
mind.
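
The encoding caveat can be checked with a small sketch like the one below. EncodingCheck is a hypothetical helper; it only wraps Encoding.GetByteCount, Encoding.GetBytes and Encoding.GetString.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Number of bytes the text occupies once written with the given encoding.
    public static int ByteCount(string text, Encoding encoding) =>
        encoding.GetByteCount(text);

    // What a reader gets back after the text has gone through the given encoding.
    public static string RoundTrip(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text));
}
```

For the packed character U+6D65, Encoding.ASCII round-trips it to "?", so the data is lost; Encoding.UTF8 preserves it but spends 3 bytes on it, and Encoding.Unicode (UTF-16) spends 2. In other words the trick saves characters, not bytes.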

Guest · Nov 11, 2004

Yes, you're right.
Encoding could present a problem, but my description was slightly (actually,
more than slightly) incorrect.
If we were limited to the 0-127 characters of the ASCII table, then we
would be using 7 bits to represent a character, so for every 7
characters we could squeeze in an extra one (8 seven-bit characters fit in
7 bytes).
More trouble than it's worth, I guess.

regards
Brian.
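
For reference, a minimal sketch of that 7-bit packing, assuming all input character codes are in the range 0-127. SevenBitPacker is a hypothetical name, and Unpack assumes the caller knows the original character count.

```csharp
using System;

public static class SevenBitPacker
{
    // Writes the low 7 bits of each character back to back, most significant bit first.
    public static byte[] Pack(string ascii)
    {
        var packed = new byte[(ascii.Length * 7 + 7) / 8];
        int bitPos = 0;
        foreach (char c in ascii)
        {
            if (c > 127)
                throw new ArgumentException("Only 7-bit ASCII is supported.", nameof(ascii));
            for (int bit = 6; bit >= 0; bit--)
            {
                if (((c >> bit) & 1) != 0)
                    packed[bitPos / 8] |= (byte)(0x80 >> (bitPos % 8));
                bitPos++;
            }
        }
        return packed;
    }

    // Reads charCount groups of 7 bits back out of the packed bytes.
    public static string Unpack(byte[] packed, int charCount)
    {
        var chars = new char[charCount];
        int bitPos = 0;
        for (int i = 0; i < charCount; i++)
        {
            int value = 0;
            for (int bit = 0; bit < 7; bit++)
            {
                int b = (packed[bitPos / 8] >> (7 - bitPos % 8)) & 1;
                value = (value << 1) | b;
                bitPos++;
            }
            chars[i] = (char)value;
        }
        return new string(chars);
    }
}
```

Packed this way, 24 characters take 21 bytes, which is still more than 16, and the resulting bytes are not guaranteed to be printable.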


Jon Skeet [C# MVP] · Nov 11, 2004

Brian Keating said:
Yes, you're right.
Encoding could present a problem, but my description was slightly (actually,
more than slightly) incorrect.
If we were limited to the 0-127 characters of the ASCII table, then we
would be using 7 bits to represent a character, so for every 7
characters we could squeeze in an extra one (8 seven-bit characters fit in
7 bytes).
More trouble than it's worth, I guess.

Certainly, when the only necessity is to squeeze 24 characters into 16.

James Curran · Nov 11, 2004

There's a method, but it's a bit snarky....

There's an encoding format called Base64 (a relative of UUEncoding, though
not the same thing). It takes full binary data (byte values 0-255) and
converts it to a set of 64 printable characters (digits, uppercase,
lowercase, plus the two symbols + and /). Since email messages are required
to be pure printable text (due to some ancient hardware, which is almost
certainly no longer on the 'net), attachments are typically Base64 encoded.
It converts 3 binary bytes into 4 characters, so encoded blocks increase
33% in size.

So, how does this affect you? Well, as long as your "encoded" string meets
the criteria of Base64 encoding, you can "decode" it into a smaller block of
binary data. 4 characters will become 3 bytes, or in your case, 20
characters can become 15 bytes.

using System;

string origString = "123456,abcdef,ghijkl"; // 20 character CSV text

// Replace commas with plus signs so every character is in the Base64 alphabet
string prepareText = origString.Replace(',', '+');
// "Decoding" turns each group of 4 characters into 3 bytes
byte[] compressedText = Convert.FromBase64String(prepareText);
Console.WriteLine("Length of Compressed text = {0}", compressedText.Length);
// Save compressedText to your store.
// :
// Later read it back
string alteredText = Convert.ToBase64String(compressedText);
string finalString = alteredText.Replace('+', ',');

Console.WriteLine("Text: {0}, this {1} the same as the original",
    finalString, finalString == origString ? "IS" : "IS NOT");

Running the above, I get:
Length of Compressed text = 15
Text: 123456,abcdef,ghijkl, this IS the same as the original

--
Truth,
James Curran
[erstwhile VC++ MVP]
Home: www.noveltheory.com Work: www.njtheater.com
Blog: www.honestillusion.com Day Job: www.partsearch.com

Jon Skeet [C# MVP] · Nov 12, 2004

James Curran said:
There's a method, but it's a bit snarky....

There's an encoding format called Base64 (a relative of UUEncoding, though
not the same thing). It takes full binary data (byte values 0-255) and
converts it to a set of 64 printable characters (digits, uppercase,
lowercase, plus the two symbols + and /). Since email messages are required
to be pure printable text (due to some ancient hardware, which is almost
certainly no longer on the 'net), attachments are typically Base64 encoded.
It converts 3 binary bytes into 4 characters, so encoded blocks increase
33% in size.

So, how does this affect you? Well, as long as your "encoded" string meets
the criteria of Base64 encoding, you can "decode" it into a smaller block of
binary data. 4 characters will become 3 bytes, or in your case, 20
characters can become 15 bytes.

Yes... it does mean you can only have 64 distinct characters though
(IIRC, '=' is used for end padding, which you also need to work out).

It also doesn't get 24 characters down to 16: 24 Base64 characters
decode to 18 bytes.

Possibly a combination of that (if it all applies appropriately) with
something clever to do with the 8 digits (which can be represented as a
4 byte integer, which should help) could help.

It all sounds like something which should be redesigned rather than
munged like this though...
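
One way such a combination might look, as a sketch only: the number goes into 4 big-endian bytes and the remaining text goes through the Base64 "decode" trick. CombinedPacker and its layout are hypothetical. It assumes the number fits in a uint, the text length is a multiple of 4, the text uses only Base64 characters plus commas (and no literal '+'), and the store can hold 16 arbitrary bytes rather than 16 printable characters.

```csharp
using System;

public static class CombinedPacker
{
    // Hypothetical layout: bytes 0-3 hold the number (big-endian),
    // the rest holds the text after the comma/plus swap and Base64 "decoding".
    public static byte[] Pack(uint number, string text)
    {
        byte[] textBytes = Convert.FromBase64String(text.Replace(',', '+'));
        var result = new byte[4 + textBytes.Length];
        result[0] = (byte)(number >> 24);
        result[1] = (byte)(number >> 16);
        result[2] = (byte)(number >> 8);
        result[3] = (byte)number;
        Array.Copy(textBytes, 0, result, 4, textBytes.Length);
        return result;
    }

    // Reverses Pack: rebuilds the number, then re-encodes the remaining bytes as text.
    public static (uint Number, string Text) Unpack(byte[] packed)
    {
        uint number = ((uint)packed[0] << 24) | ((uint)packed[1] << 16)
                    | ((uint)packed[2] << 8) | packed[3];
        string text = Convert.ToBase64String(packed, 4, packed.Length - 4).Replace('+', ',');
        return (number, text);
    }
}
```

With an 8-digit number and 16 characters of text, this comes to 4 + 12 = 16 bytes. Those bytes can be any value, including commas, quotes and line breaks, so they would still need a CSV-safe representation, which is part of why a redesign looks more attractive.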
