Built-in compression on .NET loses data, maybe?

TerryStone

Thanks to anyone who reads this.

Below is some C# that compresses an array of bytes, and then
decompresses, and compares the original data with the new.

Firstly, the length of the decompressed data is shorter than the
original. So some loss of data has occurred. But the content up until
the early truncation matches. So am I flushing correctly? This error
only occurs for particular combinations of bytes in the original
buffer.

Secondly, when I read the decompressed data from the zip-stream, the
first read returns zero bytes. After that I perform a second read and
the data can be read. Why is that?

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace lab02
{

class Program
{

static void Main(string[] args)
{

// declaration of local variables
int i1 = 0;

// create a buffer of data for compressing
byte[] bufferData = new byte[10];
for (i1 = 0; i1 < 10; i1++)
bufferData[i1] = Convert.ToByte(i1);

// PART 1 - Compression
byte[] bufferCompressed = null;
{

// compress buffer into a memory stream (ms)
MemoryStream msCompressed = new MemoryStream();
DeflateStream zipStream = new DeflateStream(msCompressed,
CompressionMode.Compress);
zipStream.Write(bufferData, 0, bufferData.Length);
msCompressed.Flush();

// get the compressed memory stream into a buffer
bufferCompressed = new byte[msCompressed.Length];
msCompressed.Position = 0;
msCompressed.Read(bufferCompressed, 0, bufferCompressed.Length);

// close zip stream
zipStream.Close();

}

// PART 2 - Decompression
byte[] bufferDecompressed = null;
{

// put the compressed data (bufferCompressed) into a memory stream (msCompressed)
MemoryStream msCompressed = new MemoryStream();
msCompressed.Write(bufferCompressed, 0, bufferCompressed.Length);
msCompressed.Position = 0;

// decompress buffer
DeflateStream zipStream = new DeflateStream(msCompressed,
CompressionMode.Decompress);
msCompressed.Flush();

// read the de-compressed data into a buffer
MemoryStream msDecompressed = new MemoryStream();
int iBytesRead = 0;
byte[] bufferSub = new Byte[1024];
do
{

// read next bytes (problem with the first read, always read zero first)
iBytesRead = zipStream.Read(bufferSub, 0, bufferSub.Length);
if ((msDecompressed.Length == 0) && (iBytesRead == 0))
iBytesRead = zipStream.Read(bufferSub, 0, bufferSub.Length);

// if some data was read...
if (iBytesRead > 0)
{

// add to stream
msDecompressed.Write(bufferSub, 0, iBytesRead);

}

} while (iBytesRead == bufferSub.Length);

// close zip stream
zipStream.Close();

// load buffer with unzipped data
bufferDecompressed = new byte[msDecompressed.Length];
msDecompressed.Position = 0;
msDecompressed.Read(bufferDecompressed, 0,
bufferDecompressed.Length);

}

// PART 3 - Comparison of what was and now is, or is it?
if(bufferData.Length!=bufferDecompressed.Length)
Trace.TraceInformation("Length mismatch!!!");
else
{

// compare contents
bool bMatch = true;
for (i1 = 0; i1 < bufferData.Length; i1++)
{

// if bytes do not match...
if (bufferData[i1] != bufferDecompressed[i1])
{

// update flag
bMatch = false;

// break out of loop
break;

}

}
if(!bMatch)
Trace.TraceInformation("Content does not match!!!");

}

}

}

}
 
Marc Gravell

First - you are flushing the wrong stream; it is zipStream that needs
flushing. However, I have seen the compression classes refuse to flush
completely until Close is called; presumably the compressor holds back
a few bytes for use in the compression algorithm. So my compression code
would be (note the constructor overload that leaves the underlying stream open):

using (MemoryStream msCompressed = new MemoryStream()) {
    using (DeflateStream zipStream = new DeflateStream(
            msCompressed, CompressionMode.Compress, true)) {
        zipStream.Write(bufferData, 0, bufferData.Length);
        zipStream.Close();
    }
    bufferCompressed = msCompressed.ToArray();
}

Second - I don't know why your code reports zero (I didn't even run it to
find out, I'm afraid). However, I would do as follows; note the trick is in
the while condition, which captures the count and tests it in one go.

using (MemoryStream msDecompressed = new MemoryStream()) {
    using (MemoryStream msCompressed = new MemoryStream(bufferCompressed))
    using (DeflateStream zipStream = new DeflateStream(
            msCompressed, CompressionMode.Decompress)) {
        int bytesRead;
        const int BUFFER_SIZE = 1024;
        byte[] buffer = new byte[BUFFER_SIZE];
        while ((bytesRead = zipStream.Read(buffer, 0, BUFFER_SIZE)) > 0) {
            msDecompressed.Write(buffer, 0, bytesRead);
        }
    }
    bufferDecompressed = msDecompressed.ToArray();
}
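
As a possible shortcut for the read loop above: on .NET Framework 4 or later (an assumption; the loop is still needed on older runtimes), Stream.CopyTo does the same read-until-zero work. This is only a sketch; the class name CopyToSketch and its helper methods are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class, not part of the original thread.
static class CopyToSketch
{
    // Compress the way the earlier block does: the DeflateStream is
    // disposed before the MemoryStream contents are copied out.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream msCompressed = new MemoryStream())
        {
            using (DeflateStream zipStream = new DeflateStream(
                    msCompressed, CompressionMode.Compress, true))
            {
                zipStream.Write(data, 0, data.Length);
            }
            return msCompressed.ToArray();
        }
    }

    // Decompress with Stream.CopyTo instead of a hand-written loop.
    // Assumes .NET Framework 4+ (or .NET Core), where CopyTo exists.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream msDecompressed = new MemoryStream())
        using (MemoryStream msCompressed = new MemoryStream(compressed))
        using (DeflateStream zipStream = new DeflateStream(
                msCompressed, CompressionMode.Decompress))
        {
            zipStream.CopyTo(msDecompressed);
            return msDecompressed.ToArray();
        }
    }

    static void Main()
    {
        byte[] original = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        byte[] roundTripped = Decompress(Compress(original));
        Console.WriteLine("original length:     " + original.Length);
        Console.WriteLine("decompressed length: " + roundTripped.Length);
    }
}
```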

Third - in this scenario, since I used the constructor overload that does
not close the stream (first block of code), I could actually re-use my
MemoryStream between the two blocks simply by rewinding (.Position = 0);
but the above works fine, and illustrates the point. Also note that short
or fairly random blocks of data can get longer during compression. Ironic,
but that's life, especially with single-pass compression. Double-pass
compression can use a "don't bother" flag.
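
A rough sketch of the rewinding idea, assuming a single MemoryStream is shared by both steps. The names ReuseStreamSketch and RoundTrip are invented for this example. It also reports the compressed length, so the "can get longer" effect is easy to check for any given input.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class, not part of the original thread.
static class ReuseStreamSketch
{
    public static byte[] RoundTrip(byte[] data, out long compressedLength)
    {
        using (MemoryStream shared = new MemoryStream())
        {
            // leaveOpen = true keeps 'shared' usable after the compressor is disposed.
            using (DeflateStream compressor = new DeflateStream(
                    shared, CompressionMode.Compress, true))
            {
                compressor.Write(data, 0, data.Length);
            }
            compressedLength = shared.Length;

            // Rewind so the decompressor starts at the first compressed byte.
            shared.Position = 0;

            using (MemoryStream result = new MemoryStream())
            using (DeflateStream decompressor = new DeflateStream(
                    shared, CompressionMode.Decompress))
            {
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = decompressor.Read(buffer, 0, buffer.Length)) > 0)
                {
                    result.Write(buffer, 0, bytesRead);
                }
                return result.ToArray();
            }
        }
    }

    static void Main()
    {
        byte[] original = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        long compressedLength;
        byte[] roundTripped = RoundTrip(original, out compressedLength);
        Console.WriteLine("original:     " + original.Length + " bytes");
        Console.WriteLine("compressed:   " + compressedLength + " bytes");
        Console.WriteLine("decompressed: " + roundTripped.Length + " bytes");
    }
}
```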

Marc
 
TerryStone

Thanks for the help, Marc. Especially the

MemoryStream.ToArray();

method, which makes the code a lot more compact and readable.

I have discovered from another group that the zip stream must be
flushed AND closed before the compressed data can be read.

zipStream.Flush();
zipStream.Close();
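
One way to see the effect, as a sketch only: measure the length of the underlying MemoryStream before and after closing the zip stream. The names CloseBeforeReadSketch and Measure are made up here, and the exact byte counts will depend on the runtime version, so none are quoted.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class, not part of the original thread.
static class CloseBeforeReadSketch
{
    public static void Measure(byte[] data, out long beforeClose, out long afterClose)
    {
        using (MemoryStream msCompressed = new MemoryStream())
        {
            // leaveOpen = true so msCompressed can still be inspected after Close.
            DeflateStream zipStream = new DeflateStream(
                msCompressed, CompressionMode.Compress, true);
            zipStream.Write(data, 0, data.Length);

            // The compressor may still be holding data back at this point.
            beforeClose = msCompressed.Length;

            // Close writes out whatever the compressor was still holding.
            zipStream.Close();
            afterClose = msCompressed.Length;
        }
    }

    static void Main()
    {
        byte[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        long before, after;
        Measure(data, out before, out after);
        Console.WriteLine("bytes in msCompressed before Close: " + before);
        Console.WriteLine("bytes in msCompressed after Close:  " + after);
    }
}
```

Reading msCompressed at the "before" point, as the original code did, would pick up only part of the compressed data, if any.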

Solved!

Terry.
 
