Problem with the GZipStream class and small streams

Guest

Hello!

I have a problem using the System.IO.Compression.GZipStream class. I wrote
the following methods to compress and decompress arrays of bytes.

private static byte[] Compress(byte[] array)
{
    MemoryStream stream = new MemoryStream();
    GZipStream gZipStream = new GZipStream(stream, CompressionMode.Compress);
    gZipStream.Write(array, 0, array.Length);
    return stream.ToArray();
}

private static byte[] Decompress(byte[] array)
{
    MemoryStream stream = new MemoryStream();
    GZipStream gZipStream = new GZipStream(new MemoryStream(array),
        CompressionMode.Decompress);
    byte[] b = new byte[4096];
    while (true)
    {
        int n = gZipStream.Read(b, 0, b.Length);
        if (n > 0)
            stream.Write(b, 0, n);
        else
            break;
    }
    return stream.ToArray();
}

In the Decompress method, if the array of bytes is small (apparently smaller
than 4096 bytes), the Read method returns 0 regardless of the size of the b
buffer. Likewise, if I replace the while block with the following one and the
array of bytes is small, the ReadByte method returns -1.

while (true)
{
    int n = gZipStream.ReadByte();
    if (n != -1)
        stream.WriteByte((byte)n);
    else
        break;
}

Is this happening because the GZipStream class internally uses a 4 KB
buffer? Anyway, how could I solve the problem?

Thank you,
Fabio
 
Walter Wang [MSFT]

Hi,

In the Compress code:

private static byte[] Compress(byte[] array)
{
    MemoryStream stream = new MemoryStream();
    GZipStream gZipStream = new GZipStream(stream, CompressionMode.Compress);
    gZipStream.Write(array, 0, array.Length);
    return stream.ToArray();
}

You need to close the GZipStream before reading from the underlying
MemoryStream, because the GZip footer is only written in
GZipStream.Dispose.
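
A minimal sketch of that fix, assuming nothing beyond the original Compress method: the using block closes the GZipStream, which flushes any pending compressed data and writes the footer, before ToArray is called. The class name GZipSketch is only illustrative.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipSketch
{
    public static byte[] Compress(byte[] array)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // Disposing the GZipStream flushes pending data and writes the footer.
            using (GZipStream gZipStream = new GZipStream(stream, CompressionMode.Compress))
            {
                gZipStream.Write(array, 0, array.Length);
            }
            // MemoryStream.ToArray still works after the stream has been closed.
            return stream.ToArray();
        }
    }
}
```

Calling gZipStream.Close() explicitly before stream.ToArray() should have the same effect; the using block just makes sure it also happens if Write throws.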

Sincerely,
Walter Wang ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.
If you are using Outlook Express, please make sure you clear the
check box "Tools/Options/Read: Get 300 headers at a time" to see your reply
promptly.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.
 
Guest

Thank you, Walter. That does solve the problem. Still, I cannot explain the
different behavior with larger streams.
 
Walter Wang [MSFT]

Hi Fabio,

Thanks for the update.

I've run some tests using larger buffers; however, they all show incorrect
results if the GZipStream is not closed before reading the underlying
MemoryStream. Can you help me confirm the behavior on your side? Thanks.

static void Main(string[] args)
{
    TestBySize(10);
    TestBySize(4095);
    TestBySize(4096);
    TestBySize(409700);
}

private static void TestBySize(int C)
{
    byte[] buf1 = new byte[C];
    for (int i = 0; i < C; i++)
    {
        buf1[i] = (byte) i;
    }
    byte[] buf2 = Compress(buf1);
    byte[] buf3 = Decompress(buf2);
    Console.WriteLine(CompareBuffer(buf1, buf3));
}

private static bool CompareBuffer(byte[] buf1, byte[] buf2)
{
    if (buf1 == null || buf2 == null) return false;
    if (buf1.Length != buf2.Length) return false;
    for (int i = 0; i < buf1.Length; i++)
    {
        if (buf1[i] != buf2[i]) return false;
    }
    return true;
}

private static byte[] Compress(byte[] array)
{
    MemoryStream stream = new MemoryStream();
    GZipStream gZipStream = new GZipStream(stream,
        CompressionMode.Compress);
    gZipStream.Write(array, 0, array.Length);
    return stream.ToArray();
}

private static byte[] Decompress(byte[] array)
{
    MemoryStream stream = new MemoryStream();
    GZipStream gZipStream = new GZipStream(new MemoryStream(array),
        CompressionMode.Decompress);
    byte[] b = new byte[4096];
    while (true)
    {
        int n = gZipStream.Read(b, 0, b.Length);
        if (n > 0)
            stream.Write(b, 0, n);
        else
            break;
    }
    return stream.ToArray();
}


Regards,
Walter Wang ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
==================================================

 
Walter Wang [MSFT]

Hi Fabio,

I am interested in this issue. Would you mind letting me know the result of
the suggestions? If you need further assistance, feel free to let me know.
I will be more than happy to be of assistance.

Have a great day!

Regards,
Walter Wang ([email protected], remove 'online.')
Microsoft Online Community Support

 
Guest

Hi Walter,

Sorry for my delayed response. I have been in training for the last few days,
and only today did I find some time to investigate this issue further.

I first noticed this behavior while developing the following sample
application.

http://fabioscagliola.spaces.live.com/blog/cns!919F8FCDE3DC9AC4!160.entry

In the beginning I was not closing the GZipStream in my Compress method as
you suggested. Nonetheless, the application was able to correctly handle
files larger than 32576 bytes (I had guessed 4096, assuming the GZipStream
class internally uses a 4 KB buffer, but I was wrong).

Running your test code I get the same results as you: if I close the
GZipStream in my Compress method as you suggested, your CompareBuffer
method always returns true; if I do not close it, it always returns false.

However, the reasons why your CompareBuffer method returns false when I do
not close the GZipStream differ based on the size of the array of bytes. Here
is what I found out.

(1) If the size of the array is 32575 bytes or less (852 compressed), then
the Read method of the GZipStream fails.

(2) If the size of the array is 32576 bytes or more (854 compressed), then
the array obtained by compressing and then decompressing is one byte larger
than the original one, but, except for the last byte, the contents of the two
arrays are identical.

Please, give the following code a try. I still cannot explain the different
behavior.


using System;
using System.IO;
using System.IO.Compression;

public class ConsoleApplication
{
    private static byte[] Compress(byte[] array)
    {
        MemoryStream stream = new MemoryStream();
        GZipStream gZipStream = new GZipStream(stream,
            CompressionMode.Compress);
        gZipStream.Write(array, 0, array.Length);
        //gZipStream.Close();
        return stream.ToArray();
    }

    private static byte[] Decompress(byte[] array)
    {
        MemoryStream stream = new MemoryStream();
        GZipStream gZipStream = new GZipStream(new MemoryStream(array),
            CompressionMode.Decompress);
        byte[] b = new byte[4096];
        while (true)
        {
            int n = gZipStream.Read(b, 0, b.Length);
            if (n > 0)
                stream.Write(b, 0, n);
            else
            {
                if (stream.Length == 0)
                    throw new Exception("Cannot read from GZipStream.");
                else
                    break;
            }
        }
        gZipStream.Close();
        return stream.ToArray();
    }

    public static void Test(int size)
    {
        try
        {
            Console.WriteLine(string.Format("Using a {0} bytes array...", size));
            byte[] b1 = new byte[size];
            for (int i = 0; i < size; i++)
                b1[i] = (byte)i;
            Console.WriteLine("Compressing...");
            byte[] b2 = Compress(b1);
            Console.WriteLine("Decompressing...");
            byte[] b3 = Decompress(b2);
            Console.WriteLine("Done.");
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }

    public static void Main()
    {
        Test(32574);
        Test(32575);
        Test(32576);
        Test(32577);
    }
}


Thank you very much, and I wish you a great day too!

Regards,
Fabio
 
Walter Wang [MSFT]

Hi Fabio,

Thank you very much for following up.

Based on my understanding, your current question is about:

when not closing the GZipStream in Compress(), the decompressed
MemoryStream in Decompress() sometimes has data, sometimes doesn't, and you
want to know why.


Well, I think it's related to the GZip compression algorithm and the
internal implementation details of the .NET class library. In my opinion,
when we don't close the GZipStream before getting the underlying
MemoryStream's content, the content passed to Decompress() is already
invalid GZip content. This means the data returned from Decompress() is
unpredictable: it could be zero-length or otherwise incomplete. In either
case, the resulting data is wrong if you use my CompareBuffer() method to
compare it with the original buffer before compressing.
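
To make this concrete, here is a rough sketch (not the code used earlier in this thread) that measures how many bytes reach the MemoryStream with and without closing the GZipStream. The names TruncationSketch and CompressedLength are made up for illustration. Per RFC 1952, the GZip trailer (CRC-32 plus uncompressed size) takes 8 bytes and is emitted only when the stream is finished, so the unclosed output should always be shorter. How much compressed data has been flushed by that point depends on the runtime's internal buffering, so the exact sizes will vary.

```csharp
using System.IO;
using System.IO.Compression;

public static class TruncationSketch
{
    // Returns the number of bytes present in the MemoryStream after writing,
    // optionally closing the GZipStream first.
    public static int CompressedLength(byte[] data, bool close)
    {
        MemoryStream stream = new MemoryStream();
        GZipStream gZipStream = new GZipStream(stream, CompressionMode.Compress);
        gZipStream.Write(data, 0, data.Length);
        if (close)
            gZipStream.Close(); // flushes pending data and writes the trailer
        return stream.ToArray().Length;
    }
}
```

Comparing CompressedLength(data, false) with CompressedLength(data, true) for a few input sizes shows how much of the stream is missing in each case, which is presumably why the decompressor fails in different ways at different sizes.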

I hope I didn't misunderstand your question and that my answer makes sense.
Please let me know whether you need further information. Thanks.

Regards,
Walter Wang ([email protected], remove 'online.')
Microsoft Online Community Support

 
Guest

Hi Walter,

You understood my question perfectly (by the way, sorry that I did not really
formulate it :) and your answer definitely makes sense. After all, it was
just curiosity on my part because, as you pointed out from the beginning, the
correct way to handle compression and decompression is to close the stream,
which I had forgotten in the first implementation of my methods.

Thank you,
Fabio
 
