string to byte[] back to string + Compression Failed!


jeremyje

I'm writing some code that will convert a regular string to a byte[]
for compression, and then be able to convert that compressed string back
into its original form.

Conceptually, I have:

For compression:
string -> (Unicode conversion) byte[] -> (compression + Unicode conversion) string

For decompression:
string -> (Unicode conversion) byte[] -> (decompression + Unicode conversion) string

The problem is that one piece of the code fails, probably because of a
bad conversion somewhere in my code.



The line that keeps failing is:

int size = s.Read(write_data, 0, 8);

With GZip I get an ArrayIndexOutOfBounds exception; with Deflate I get a
data corruption error. I compared the byte[] right after compression with
the byte[] right before decompression, and they DO NOT MATCH! What is the
problem here?
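A minimal sketch of why the two arrays can differ (the class and method names are hypothetical): compressed output is arbitrary binary, and `Encoding.Unicode` (UTF-16) cannot represent every byte sequence. An odd trailing byte or a lone surrogate code unit is replaced by U+FFFD when decoding, so `GetBytes(GetString(x))` no longer equals `x`.

```csharp
using System;
using System.Text;

public static class UnicodeRoundTripCheck
{
    // Returns true only if 'data' comes back unchanged after
    // byte[] -> string -> byte[] using UTF-16 (Encoding.Unicode).
    public static bool SurvivesRoundTrip(byte[] data)
    {
        string text = Encoding.Unicode.GetString(data);
        byte[] back = Encoding.Unicode.GetBytes(text);

        if (back.Length != data.Length)
            return false;
        for (int i = 0; i < data.Length; i++)
        {
            if (back[i] != data[i])
                return false;
        }
        return true;
    }
}
```

Bytes produced from ordinary text survive, but a three-byte array starting with the GZip magic number `0x1F 0x8B` does not (its odd last byte is replaced), and neither does `{ 0x00, 0xD8 }`, which decodes to a lone high surrogate.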

What I need from this process are these two functions:

string CompressString(string in, Algorithm.GZip or Algorithm.Deflate);
string DecompressString(string in, Algorithm.GZip or Algorithm.Deflate);

I've found similar code on Google, but no fixes.


My code is below:

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.IO.Compression;

namespace Jeremyje.Utility
{
    public class StringTransforms
    {
        public enum StringCompressionAlgorithm
        {
            GZip,
            Deflate
        }

        public static byte[] UnicodeStringToByteArray(string str)
        {
            UnicodeEncoding enc = new UnicodeEncoding();
            return enc.GetBytes(str);
        }

        public static string ByteArrayToUnicodeString(byte[] str_arr)
        {
            UnicodeEncoding enc = new UnicodeEncoding();
            return enc.GetString(str_arr);
        }

        public static bool DecompressString(string in_string, out string out_string)
        {
            return DecompressString(in_string, out out_string,
                StringCompressionAlgorithm.GZip);
        }

        public static bool DecompressString(string in_string, out string out_string,
            StringCompressionAlgorithm alg)
        {
            bool status = false;
            out_string = in_string;

            switch (alg)
            {
                case StringCompressionAlgorithm.GZip:
                {
                    try
                    {
                        out_string = "";
                        int total_length = 0;
                        byte[] write_data = new byte[4096];
                        byte[] bData = UnicodeStringToByteArray(in_string);

                        GZipStream s = new GZipStream(new MemoryStream(bData),
                            CompressionMode.Decompress);

                        while (true)
                        {
                            int size = s.Read(write_data, 0, 8);
                            if (size > 0)
                            {
                                total_length += size;
                                out_string += Encoding.Unicode.GetString(write_data, 0, size);
                            }
                            else
                            {
                                break;
                            }
                        }
                        s.Close();
                        status = true;
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e);
                        status = false;
                    }

                    return status;
                }
                case StringCompressionAlgorithm.Deflate:
                {
                    try
                    {
                        out_string = "";
                        int total_length = 0;
                        byte[] write_data = new byte[4096];
                        byte[] bData = UnicodeStringToByteArray(in_string);

                        DeflateStream s = new DeflateStream(new MemoryStream(bData),
                            CompressionMode.Decompress);

                        while (true)
                        {
                            int size = s.Read(write_data, 0, 8);
                            if (size > 0)
                            {
                                total_length += size;
                                out_string += Encoding.Unicode.GetString(write_data, 0, size);
                            }
                            else
                            {
                                break;
                            }
                        }
                        s.Close();
                        status = true;
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e);
                        status = false;
                    }

                    return status;
                }

                default:
                    break;
            }

            return status;
        }

        public static bool CompressString(string in_string, out string out_string)
        {
            return CompressString(in_string, out out_string,
                StringCompressionAlgorithm.GZip);
        }

        public static bool CompressString(string in_string, out string out_string,
            StringCompressionAlgorithm alg)
        {
            bool status = false;
            out_string = in_string;

            switch (alg)
            {
                case StringCompressionAlgorithm.GZip:
                {
                    try
                    {
                        MemoryStream ms = new MemoryStream();
                        Stream s = new GZipStream(ms, CompressionMode.Compress);
                        byte[] bData = UnicodeStringToByteArray(in_string);

                        s.Write(bData, 0, bData.Length);
                        s.Close();
                        byte[] compressed_data = (byte[])ms.ToArray();
                        out_string = ByteArrayToUnicodeString(compressed_data);
                        status = true;
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e);
                        status = false;
                    }

                    return status;
                }
                case StringCompressionAlgorithm.Deflate:
                {
                    try
                    {
                        MemoryStream ms = new MemoryStream();
                        Stream s = new DeflateStream(ms, CompressionMode.Compress);
                        byte[] bData = UnicodeStringToByteArray(in_string);

                        s.Write(bData, 0, bData.Length);
                        s.Close();
                        byte[] compressed_data = (byte[])ms.ToArray();
                        out_string = ByteArrayToUnicodeString(compressed_data);
                        status = true;
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e);
                        status = false;
                    }

                    return status;
                }

                default:
                    break;
            }

            return false;
        }
    }
}
 

Morten Wennevik [C# MVP]

Hi Jeremy,

Your problem is converting the compressed byte[] to a string. Compressed
data is arbitrary binary, and a string can't hold everything the byte[]
holds, so you lose data and get an exception when you try to decompress
it.

Having Compress return a byte[] and Decompress take a byte[] will solve
your problem. If you need the byte[] to be represented as a string, you
can use Base64.

[DecompressString]
byte[] bData = Convert.FromBase64String(in_string);

[CompressString]
out_string = Convert.ToBase64String(compressed_data);
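Putting both suggestions together, a minimal sketch might keep the compressed data as `byte[]` and use Base64 only at the string boundary. It also decodes the decompressed bytes once at the end rather than in 8-byte chunks, since a chunk boundary can fall in the middle of a UTF-16 character. The names `StringCompressor` and `CompressionKind` are hypothetical, and `Stream.CopyTo` assumes .NET Framework 4 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public enum CompressionKind { GZip, Deflate }

public static class StringCompressor
{
    // Wraps 'inner' in a GZip or Deflate stream for the given mode.
    private static Stream Wrap(Stream inner, CompressionMode mode, CompressionKind kind)
    {
        return kind == CompressionKind.GZip
            ? (Stream)new GZipStream(inner, mode)
            : new DeflateStream(inner, mode);
    }

    // Compresses raw bytes; the result is binary and should stay a byte[].
    public static byte[] Compress(byte[] data, CompressionKind kind)
    {
        MemoryStream output = new MemoryStream();
        // Disposing the compression stream flushes the final block into 'output'.
        using (Stream z = Wrap(output, CompressionMode.Compress, kind))
        {
            z.Write(data, 0, data.Length);
        }
        return output.ToArray(); // ToArray works even after the MemoryStream is closed
    }

    // Decompresses bytes produced by Compress with the same algorithm.
    public static byte[] Decompress(byte[] data, CompressionKind kind)
    {
        using (Stream z = Wrap(new MemoryStream(data), CompressionMode.Decompress, kind))
        using (MemoryStream output = new MemoryStream())
        {
            z.CopyTo(output);
            return output.ToArray();
        }
    }

    // Text -> UTF-16 bytes -> compressed bytes -> Base64 text.
    public static string CompressString(string text, CompressionKind kind)
    {
        byte[] raw = Encoding.Unicode.GetBytes(text);
        return Convert.ToBase64String(Compress(raw, kind));
    }

    // Base64 text -> compressed bytes -> UTF-16 bytes -> text.
    public static string DecompressString(string base64, CompressionKind kind)
    {
        byte[] raw = Decompress(Convert.FromBase64String(base64), kind);
        return Encoding.Unicode.GetString(raw);
    }
}
```

Callers that can store bytes directly would use `Compress`/`Decompress` and skip Base64 entirely.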
 

Jon Skeet [C# MVP]

> I'm writing some code that will convert a regular string to a byte[]
> for compression and then be able to convert that compressed string back
> into original form.

Don't try to encode arbitrary binary data as a string directly using
Encoding.Unicode. As Morten suggested, use Base64 instead.
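To see the difference, a minimal sketch (the helper names are hypothetical) can push all 256 byte values through both conversions. Base64 returns the original array; the `Encoding.Unicode` route does not, because some adjacent byte pairs (for example `0xD8 0xD9`) form unpaired UTF-16 surrogates that get replaced.

```csharp
using System;
using System.Text;

public static class BinaryAsTextCheck
{
    // All 256 possible byte values, in order.
    public static byte[] AllByteValues()
    {
        byte[] data = new byte[256];
        for (int i = 0; i < data.Length; i++)
            data[i] = (byte)i;
        return data;
    }

    // byte[] -> Base64 string -> byte[]: lossless for any input.
    public static byte[] ViaBase64(byte[] data)
    {
        return Convert.FromBase64String(Convert.ToBase64String(data));
    }

    // byte[] -> UTF-16 string -> byte[]: the approach from the original code.
    public static byte[] ViaUnicode(byte[] data)
    {
        return Encoding.Unicode.GetBytes(Encoding.Unicode.GetString(data));
    }
}
```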
 

jeremyje

>> I'm writing some code that will convert a regular string to a byte[]
>> for compression and then be able to convert that compressed string back
>> into original form.
>
> Don't try to encode arbitrary binary data as a string directly using
> Encoding.Unicode. As Morten suggested, use Base64 instead.

Yeah, that's what I ended up doing, but it's not much of a compression,
since Base64 makes the data a lot larger. Is there an encoding other
than Base64 that would give better results?
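For a rough sense of "a lot larger": Base64 writes 4 characters for every 3 input bytes (padding the last group), so the text is about 4/3 the size of the compressed data, roughly 33% larger. A small sketch checking that formula against `Convert.ToBase64String` (the class name is hypothetical):

```csharp
using System;

public static class Base64Size
{
    // Predicted Base64 length for n input bytes: 4 * ceil(n / 3).
    public static int PredictedLength(int n)
    {
        return 4 * ((n + 2) / 3);
    }

    // Actual length from the framework (no line breaks by default).
    public static int ActualLength(int n)
    {
        return Convert.ToBase64String(new byte[n]).Length;
    }
}
```

For example, 300 bytes of compressed output become a 400-character Base64 string.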
 

Jon Skeet [C# MVP]

> Yeah, that's what I ended up doing but that's not much of a
> compression since base64 makes the data a lot larger. Is there another
> encoding other than base64 that will yield better results?

Not with the same degree of safety. If you could store the raw
compressed data instead of converting it back to a string, you'd be
okay. But to convert arbitrary binary data into text that is "safe" in
many situations (i.e. won't be subject to Unicode normalization, can be
expressed in many encodings, etc.), Base64 is a very good choice.
 

Arne Vajhøj

>>> I'm writing some code that will convert a regular string to a byte[]
>>> for compression and then be able to convert that compressed string back
>>> into original form.
>> Don't try to encode arbitrary binary data as a string directly using
>> Encoding.Unicode. As Morten suggested, use Base64 instead.
> Yeah, that's what I ended up doing but that's not much of a
> compression since base64 makes the data a lot larger. Is there another
> encoding other than base64 that will yield better results?

Nothing standard.

Using a home-made Base128 together with a single-byte encoding like
ISO-8859-1 will reduce the overhead slightly.
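As an illustration of the kind of home-made Base128 described here, a minimal sketch might pack 7 bits into each output character instead of Base64's 6, so n bytes become ceil(8n/7) characters (about 14% overhead rather than about 33%). The 128-character alphabet is an assumption of this sketch: the 94 printable ASCII characters `!` through `~` plus the 34 Latin-1 characters U+00C0 through U+00E1, all single bytes in ISO-8859-1. The output is only safe where the text really is stored as ISO-8859-1, which is the safety trade-off mentioned above.

```csharp
using System;
using System.Text;

public static class Base128
{
    // Hypothetical alphabet: '!'..'~' (94 chars), then U+00C0..U+00E1 (34 chars).
    private static char ToChar(int v)
    {
        return v < 94 ? (char)(0x21 + v) : (char)(0xC0 + (v - 94));
    }

    private static int FromChar(char c)
    {
        if (c >= 0x21 && c <= 0x7E) return c - 0x21;
        if (c >= 0xC0 && c <= 0xE1) return 94 + (c - 0xC0);
        throw new FormatException("Character outside the Base128 alphabet: " + (int)c);
    }

    // Emits one character per 7 bits; the last character is zero-padded.
    public static string Encode(byte[] data)
    {
        StringBuilder sb = new StringBuilder((data.Length * 8 + 6) / 7);
        int buffer = 0, bits = 0;
        foreach (byte b in data)
        {
            buffer = (buffer << 8) | b;
            bits += 8;
            while (bits >= 7)
            {
                bits -= 7;
                sb.Append(ToChar((buffer >> bits) & 0x7F));
            }
            buffer &= (1 << bits) - 1; // keep only the unused low bits
        }
        if (bits > 0)
            sb.Append(ToChar((buffer << (7 - bits)) & 0x7F));
        return sb.ToString();
    }

    // Reverses Encode; the trailing padding bits (fewer than 8) are dropped.
    public static byte[] Decode(string text)
    {
        byte[] result = new byte[text.Length * 7 / 8];
        int buffer = 0, bits = 0, pos = 0;
        foreach (char c in text)
        {
            buffer = (buffer << 7) | FromChar(c);
            bits += 7;
            if (bits >= 8)
            {
                bits -= 8;
                result[pos++] = (byte)((buffer >> bits) & 0xFF);
                buffer &= (1 << bits) - 1;
            }
        }
        return result;
    }
}
```

For 300 compressed bytes this gives 343 characters, versus 400 for Base64.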

Arne
 
