compression - dropping the last byte

Marc Gravell

It might just be my tired eyes, but I can't see what is wrong in the
following:

I have a byte array filled with random data (with a fixed seed to make it
reproducible); I then compress and decompress this using the 2.0 GZip
compression classes - however, as you see, the last byte is getting dropped.

Any ideas? Almost certainly me having a blond moment ;-(

Marc

Overview:
GetByteData : returns an array of random data
Describe : writes the first / last 5 values to the console (first : last,
second : penultimate, etc...)
CopyStream : reads / writes between two streams until the source is
exhausted (is there an easier way of doing this?)
Compress / Decompress : use the GZip classes to manipulate the data
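
On the "easier way" question for CopyStream: the 2.0 framework used in this thread has no built-in stream copy, but .NET Framework 4 and later added Stream.CopyTo, which does the same read / write loop internally. The following is only a minimal sketch for readers on a later framework; the class and method names (CopyToSketch, Copy) are made up for illustration and are not part of the original sample.

```csharp
using System;
using System.IO;

static class CopyToSketch {
    // Hypothetical helper: copies a byte array through two streams using
    // Stream.CopyTo (available from .NET Framework 4 onwards).
    public static byte[] Copy(byte[] data) {
        using (MemoryStream source = new MemoryStream(data))
        using (MemoryStream destination = new MemoryStream()) {
            source.CopyTo(destination); // reads until the source is exhausted
            return destination.ToArray();
        }
    }

    public static void Main() {
        byte[] copy = Copy(new byte[] { 1, 2, 3 });
        Console.WriteLine(copy.Length);
    }
}
```

Note that CopyTo returns void, so unlike the hand-written CopyStream it does not report how many bytes were copied.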

Results:

* 50000
0=170 49999=187
1=200 49998=183
2=81 49997=158
3=81 49996=216
4=66 49995=154

* 49999
0=170 49998=183
1=200 49997=158
2=81 49996=216
3=81 49995=154
4=66 49994=133


Code : watch for wrap
================

using System;
using System.IO;
using System.IO.Compression;
namespace ConsoleApplication1 {
    static class Program {
        public static void Main() {
            byte[] data = GetByteData();
            Describe(data);
            Console.WriteLine();
            byte[] compressed = Compress(data);
            byte[] decompressed = Decompress(compressed);
            Describe(decompressed);
            Console.ReadLine();
        }

        // Prints the length, then the first 5 and last 5 values side by side.
        private static void Describe(byte[] data) {
            Console.WriteLine("* " + data.Length.ToString());
            int length = data.Length;
            for (int i = 0; i < 5; i++) {
                int otherEnd = length - 1 - i;
                Console.WriteLine(string.Format("{0}={1}\t{2}={3}", i, data[i], otherEnd, data[otherEnd]));
            }
        }

        private static byte[] Compress(byte[] data) {
            using (Stream input = new MemoryStream(data))
            using (MemoryStream output = new MemoryStream())
            using (Stream zipper = new GZipStream(output, CompressionMode.Compress, true)) {
                CopyStream(input, zipper);
                return output.ToArray();
            }
        }

        private static byte[] Decompress(byte[] data) {
            using (Stream input = new MemoryStream(data))
            using (Stream unzipper = new GZipStream(input, CompressionMode.Decompress, false))
            using (MemoryStream output = new MemoryStream()) {
                CopyStream(unzipper, output);
                return output.ToArray();
            }
        }

        private const int SEED = 141566, SIZE = 50000;

        public static byte[] GetByteData() {
            Random rand = new Random(SEED); // fixed seed so the data is reproducible
            byte[] data = new byte[SIZE];
            rand.NextBytes(data);
            return data;
        }

        // Reads from source and writes to destination until source is exhausted;
        // returns the number of bytes copied.
        public static long CopyStream(Stream source, Stream destination) {
            const int BUFFER_SIZE = 512;
            long copied = 0;
            byte[] buffer = new byte[BUFFER_SIZE];
            int bytes;
            while ((bytes = source.Read(buffer, 0, BUFFER_SIZE)) > 0) {
                destination.Write(buffer, 0, bytes);
                copied += bytes;
            }
            return copied;
        }
    }
}
 
Jon Skeet [C# MVP]

Marc said:
| It might just be my tired eyes, but I can't see what is wrong in the
| following:
|
| I have a byte array filled with random data (with a fixed seed to make it
| reproducible); I then compress and decompress this using the 2.0 GZip
| compression classes - however, as you see, the last byte is getting dropped.
|
| Any ideas? Almost certainly me having a blond moment ;-(

Move the return output.ToArray(); call *outside* the "using (Stream
zipper...)" block. That way the GZipStream is disposed of (and
therefore flushed and terminated appropriately) before you return the
data.

Making that change (which requires extra bracing) makes your sample
work.

Jon
 
Marc Gravell

A huge thanks to you Jon - that does indeed fix it. Quite a subtle one! Even
calling zipper.Flush() (just before ToArray) doesn't fix it...
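
A rough way to see why Flush() is not enough is to compare how many bytes have reached the underlying MemoryStream after Flush() with how many are there after the GZipStream has been disposed. As far as I understand the gzip format (RFC 1952), the compressed stream ends with a trailer holding a CRC-32 and the uncompressed length, and that can only be written once the compressor knows no more data is coming, i.e. on Close / Dispose. The sketch below assumes nothing beyond the classes already used in this thread; the class and method names (FlushVersusDispose, Measure) are invented for illustration. Exact byte counts are deliberately not shown, because Flush() behaviour has differed between framework versions.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class FlushVersusDispose {
    // Hypothetical helper: returns { bytes written after Flush, bytes written after Dispose }.
    public static long[] Measure(byte[] data) {
        using (MemoryStream output = new MemoryStream()) {
            long afterFlush;
            // leaveOpen = true so that output is still usable once zipper is disposed
            using (GZipStream zipper = new GZipStream(output, CompressionMode.Compress, true)) {
                zipper.Write(data, 0, data.Length);
                zipper.Flush();
                afterFlush = output.Length; // snapshot while the GZipStream is still open
            }
            long afterDispose = output.Length; // final block and gzip trailer are now written
            return new long[] { afterFlush, afterDispose };
        }
    }

    public static void Main() {
        byte[] data = new byte[50000];
        new Random(141566).NextBytes(data); // same seed and size as the original sample

        long[] lengths = Measure(data);
        Console.WriteLine("after Flush:   {0} bytes", lengths[0]);
        Console.WriteLine("after Dispose: {0} bytes", lengths[1]);
    }
}
```

The second number should be larger than the first, which is the data that a ToArray() call inside the using block never gets to see.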

I guess there's a cautionary lesson in there for all of us regarding
bracing etc : at a casual glance the following look functionally identical -
but one works and one doesn't... a bug that I think got introduced when
"tidying" the braces because they obviously [sic] do the same thing...

// works
using (Stream input = new MemoryStream(data))
using (MemoryStream output = new MemoryStream()) {
    using (Stream zipper = new GZipStream(output, CompressionMode.Compress, true)) {
        CopyStream(input, zipper);
    }
    return output.ToArray();
}

and

// fails
using (Stream input = new MemoryStream(data))
using (MemoryStream output = new MemoryStream())
using (Stream zipper = new GZipStream(output, CompressionMode.Compress, true)) {
    CopyStream(input, zipper);
    return output.ToArray();
}

Marc
 
William Stacey [MVP]

One would think Flush() should have solved this. Is this by design?

--
William Stacey [MVP]

| A huge thanks to you Jon - that does indeed fix it. Quite a subtle one! Even
| calling zipper.Flush() (just before ToArray) doesn't fix it...
|
| I guess there's a cautionary lesson in there for all of us regarding
| bracing etc : at a casual glance the following look functionally identical -
| but one works and one doesn't... a bug that I think got introduced when
| "tidying" the braces because they obviously [sic] do the same thing...
|
| // works
| using (Stream input = new MemoryStream(data))
| using (MemoryStream output = new MemoryStream()) {
| using (Stream zipper = new GZipStream(output, CompressionMode.Compress,
| true)) {
| CopyStream(input, zipper);
| }
| return output.ToArray();
| }
|
| and
|
| // fails
| using (Stream input = new MemoryStream(data))
| using (MemoryStream output = new MemoryStream())
| using (Stream zipper = new GZipStream(output, CompressionMode.Compress,
| true)) {
| CopyStream(input, zipper);
| return output.ToArray();
| }
|
| Marc
|
|
 
