Adding compression

chance · Apr 4, 2007

Hello,
I want to add compression to a memory stream and save it in an Oracle
database. This is the code I have so far:

//save the Word document to a binary field,
MemoryStream dataStream = new MemoryStream();
doc.Save(dataStream, SaveFormat.Doc);

//now compress it
GZipStream compressedZipStream = new GZipStream(dataStream,
CompressionMode.Compress);

//now store to document attachment
row["DOCUMENT"] = compressedZipStream. <---------How can I
dump all the bytes here?

I need help with the fourth line. I was using stream.ToArray() to make
the assignment but that is not available for the compressedZipStream.

How can I store the compressed zip stream in the row?

tia,
chance.

Jon Skeet [C# MVP] · Apr 4, 2007

You're not ordering things correctly. You want to set the GZipStream
up to write into the MemoryStream, then tell your document to save
into the GZipStream, then close both streams, *then* call ToArray.
Currently you're not compressing anything.

Jon
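Here is a minimal sketch of the ordering described above. It assumes the document can save itself into any writable Stream. The `CompressionSketch.CompressToBytes` name and its `Action<Stream>` parameter are hypothetical stand-ins for calling `doc.Save(stream, SaveFormat.Doc)`. `MemoryStream`, `GZipStream` and `ToArray` are the standard .NET types and methods.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionSketch
{
    // Hypothetical helper: 'writeContent' stands in for something like
    // doc.Save(stream, SaveFormat.Doc).
    public static byte[] CompressToBytes(Action<Stream> writeContent)
    {
        MemoryStream output = new MemoryStream();

        // 1. The GZipStream writes its compressed output into 'output'.
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            // 2. The producer writes uncompressed bytes into the GZipStream.
            writeContent(gzip);
        }
        // 3. Disposing the GZipStream writes the last compressed block and
        //    also closes 'output'.

        // 4. MemoryStream.ToArray still works on a closed MemoryStream.
        return output.ToArray();
    }
}
```

Applied to the original code, this might read `row["DOCUMENT"] = CompressionSketch.CompressToBytes(s => doc.Save(s, SaveFormat.Doc));`. That assumes `doc.Save` can write to a stream that does not support seeking. If it cannot, save into a MemoryStream first and copy that into the GZipStream.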

chance · Apr 4, 2007

What call actually does the compression?

Jon Skeet [C# MVP] · Apr 4, 2007

What call actually does the compression?

When you write to a GZipStream, it writes the compressed data (after
buffering etc) to the stream you give it in the constructor. The
compression effectively happens behind the scenes, without you ever
having to say "compress now". You do, however, have to close the
stream so it can write the final buffered data out.

Jon

chance · Apr 4, 2007

Can you show an example where we try and compress a file called
c:\\report.doc and then store it in a row on a table?

thanks.

Jon Skeet [C# MVP] · Apr 4, 2007

chance said:
Can you show an example where we try and compress a file called
c:\\report.doc and then store it in a row on a table?

I'm afraid I haven't got the time, but your original code was very
close - just make the changes I suggested and it should be fine.
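No example was posted in the thread, so the following is only a rough sketch of how the suggested changes might apply to a file on disk. The helper names and the one-column `DOCUMENT` table are assumptions for illustration. In the real application, the row would come from a `DataTable` filled from the Oracle table. `Stream.CopyTo` needs .NET 4 or later.

```csharp
using System.Data;
using System.IO;
using System.IO.Compression;

public static class FileToRowSketch
{
    // Hypothetical helper: gzip-compresses the file at 'path' into memory.
    public static byte[] CompressFile(string path)
    {
        MemoryStream output = new MemoryStream();
        using (FileStream input = File.OpenRead(path))
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            // CopyTo writes in blocks rather than one byte per call.
            input.CopyTo(gzip);
        }
        // Valid even though disposing the GZipStream closed 'output'.
        return output.ToArray();
    }

    // Hypothetical table with one binary column, standing in for the real one.
    public static DataTable CreateDocumentTable()
    {
        DataTable table = new DataTable("DOCUMENTS");
        table.Columns.Add("DOCUMENT", typeof(byte[]));
        return table;
    }
}
```

For example, `DataRow row = table.NewRow(); row["DOCUMENT"] = FileToRowSketch.CompressFile(@"c:\report.doc");` followed by `table.Rows.Add(row);`.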

D. Yates · Apr 5, 2007

Chance,

There is an example here:
http://msdn2.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx

Dave

D. Yates · Apr 5, 2007

Chance,

The only other thing that I would add is that you should not stuff bytes
into the GZipStream one byte at a time. In my experience this has resulted
in almost NO compression. Try shoving 1K or 2K worth of data at a time into
the GZipStream until you reach the end of your file stream, then write the final
partial buffer before closing the GZipStream. Just keep this fact
in mind.

Dave

D. Yates · Apr 5, 2007

That should be:
http://msdn2.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx

Jon Skeet [C# MVP] · Apr 5, 2007

D. Yates said:
The only other thing that I would add is that you should not stuff bytes
into the GZipStream one byte at a time. In my experience this has resulted
in almost NO compression. Try shoving 1K or 2K worth of data at a time into
the GZipStream until you reach the end of your file stream, then write the final
partial buffer before closing the GZipStream. Just keep this fact
in mind.

It shouldn't make any difference - I would expect GZipStream to buffer
things up appropriately. One of the points of a stream is that it
shouldn't normally make a difference (other than performance) how you
put the data in - you should get the same data out.

I can write a test program for this if you're really sure you've seen
it make a difference, but as I say it shouldn't.
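A test program along the lines Jon offers might look like the following sketch. It is illustrative only and is not code from the thread. It checks that the decompressed bytes are identical whether the input was written one byte at a time or in 1 KB blocks. It deliberately makes no assertion about compressed sizes, since those depend on the runtime's GZipStream implementation.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GranularityTest
{
    // Compresses 'data' with one WriteByte call per byte when blockSize is 1,
    // otherwise with Write calls of up to blockSize bytes each.
    public static byte[] Compress(byte[] data, int blockSize)
    {
        if (blockSize < 1) throw new ArgumentOutOfRangeException("blockSize");

        MemoryStream output = new MemoryStream();
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            if (blockSize == 1)
            {
                foreach (byte b in data)
                {
                    gzip.WriteByte(b);
                }
            }
            else
            {
                for (int offset = 0; offset < data.Length; offset += blockSize)
                {
                    gzip.Write(data, offset, Math.Min(blockSize, data.Length - offset));
                }
            }
        }
        return output.ToArray();
    }

    // Inflates a complete gzip byte array back to the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (GZipStream gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (MemoryStream result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }
}
```

Print `GranularityTest.Compress(data, 1).Length` next to `GranularityTest.Compress(data, 1024).Length` for a large, repetitive input such as a log file. This shows whether write granularity affects the compressed size on the framework version you are running.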

D. Yates · Apr 5, 2007

Jon,

I zipped a 3,760 KB firewall log text file and it only compresses to 3,750 KB
using code like this:

private void Compress_Click(object sender, EventArgs e)
{
    using (FileStream oldFile = File.OpenRead("Test.log"))
    using (FileStream newFile = File.Create("Test.gz"))
    using (GZipStream compression = new GZipStream(newFile,
        CompressionMode.Compress))
    {
        int data = oldFile.ReadByte();
        while (data != -1)
        {
            compression.WriteByte((byte) data);
            data = oldFile.ReadByte();
        }

        compression.Close();
    }
}

However, I can zip the same 3,760 KB firewall log text file and it compresses
to 233 KB using code like this:

private void Compress_Click(object sender, EventArgs e)
{
    using (FileStream oldFile = File.OpenRead("Test.log"))
    using (FileStream newFile = File.Create("Test.gz"))
    using (GZipStream compression = new GZipStream(newFile,
        CompressionMode.Compress))
    {
        byte[] buffer = new byte[1024];
        int numberOfBytesRead = oldFile.Read(buffer, 0, buffer.Length);
        while (numberOfBytesRead > 0)
        {
            compression.Write(buffer, 0, numberOfBytesRead);
            numberOfBytesRead = oldFile.Read(buffer, 0, buffer.Length);
        }

        compression.Close();
    }
}

Decompress works if I do it one byte at a time like this:

private void Decompress_Click(object sender, EventArgs e)
{
    using (FileStream compressFile = File.Open("Test.gz", FileMode.Open))
    using (FileStream uncompressedFile = File.Create("Test-gz.log"))
    using (GZipStream compression = new GZipStream(compressFile,
        CompressionMode.Decompress))
    {
        int data = compression.ReadByte();
        while (data != -1)
        {
            uncompressedFile.WriteByte((byte) data);
            data = compression.ReadByte();
        }

        compression.Close();
    }
}

Dave
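Decompression can also avoid per-byte calls. The following sketch assumes .NET 4 or later, where `Stream.CopyTo` handles the buffering. `DecompressSketch.DecompressFile` is a hypothetical helper, and the caller supplies the paths.

```csharp
using System.IO;
using System.IO.Compression;

public static class DecompressSketch
{
    // Hypothetical helper: inflates the .gz file at 'gzPath' into 'outputPath'.
    public static void DecompressFile(string gzPath, string outputPath)
    {
        using (FileStream compressedFile = File.OpenRead(gzPath))
        using (GZipStream gzip = new GZipStream(compressedFile, CompressionMode.Decompress))
        using (FileStream uncompressedFile = File.Create(outputPath))
        {
            // CopyTo reads and writes through an internal buffer
            // instead of one ReadByte/WriteByte pair per byte.
            gzip.CopyTo(uncompressedFile);
        }
    }
}
```

With the file names used above, `DecompressSketch.DecompressFile("Test.gz", "Test-gz.log");` should produce the same output as the byte-at-a-time loop.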

Jon Skeet [C# MVP] · Apr 5, 2007

D. Yates said:
I zipped a 3,760 KB firewall log text file and it only compresses to 3,750 KB
using code like this:

<snip>

Good grief. I view that as a significant flaw in the GZipStream class.
Fortunately it can also be fixed by wrapping a BufferedStream round it,
but I'm astonished that it doesn't perform appropriate buffering
itself.

I do apologise for doubting you - thanks for the simple sample code.

(In my case I only took a 40K file, but it went down to 35K without
buffering and 5K when a BufferedStream was wrapped around the
GZipStream.)
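Here is a sketch of the BufferedStream workaround Jon mentions. The loop still writes one byte per call, but the BufferedStream collects the bytes and passes larger blocks to the GZipStream. The class and method names are hypothetical, and the buffer size is a parameter you would tune. The three-argument `GZipStream` constructor (`leaveOpen` set to `true`) keeps the caller's destination stream open.

```csharp
using System.IO;
using System.IO.Compression;

public static class BufferedGzipSketch
{
    // Copies 'source' to 'destination' one byte at a time, as in the slow
    // example above, but through a BufferedStream so that the GZipStream
    // receives larger writes.
    public static void CompressByteWise(Stream source, Stream destination, int bufferSize)
    {
        using (GZipStream gzip = new GZipStream(destination, CompressionMode.Compress, true))
        using (BufferedStream buffered = new BufferedStream(gzip, bufferSize))
        {
            int data;
            while ((data = source.ReadByte()) != -1)
            {
                buffered.WriteByte((byte)data);
            }
        } // The BufferedStream flushes into the GZipStream, then the GZipStream finishes.
    }
}
```

Disposing the BufferedStream flushes its remaining bytes into the GZipStream and then disposes it, which writes the final compressed block. The outer `using` then disposes the already-closed GZipStream a second time, which is harmless.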

D. Yates · Apr 5, 2007

Chance,

You might also want to read this:
http://www.madskristensen.dk/blog/PermaLink,guid,8804590e-9ab8-422e-a8db-f9f64e924fa0.aspx

On Mads Kristensen's blog, he says he tested GZipStream against
DeflateStream and found DeflateStream to be 41% faster than GZipStream.

You might want to do your own tests as well.

Dave
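For your own tests, a rough timing harness could look like this. The `GzipVsDeflate.Time` helper is hypothetical. It assumes the input fits in memory and measures only compression. For context, gzip output is Deflate data wrapped in a small header and a CRC-32 trailer (RFC 1952), so GZipStream does a little extra work. Only measurements on your own machine can show how much that contributes to any particular speed difference.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class GzipVsDeflate
{
    // Compresses 'data' 'iterations' times with the stream built by 'factory',
    // returning the total elapsed time and the last compressed size.
    public static TimeSpan Time(Func<Stream, Stream> factory, byte[] data, int iterations, out int compressedSize)
    {
        compressedSize = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            MemoryStream output = new MemoryStream();
            using (Stream compressor = factory(output))
            {
                compressor.Write(data, 0, data.Length);
            }
            // ToArray works even though disposing the compressor closed 'output'.
            compressedSize = output.ToArray().Length;
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Call it once with `s => new GZipStream(s, CompressionMode.Compress)` and once with `s => new DeflateStream(s, CompressionMode.Compress)` on the same data, then compare the returned times and sizes. Data compressed with DeflateStream has to be read back with DeflateStream, not GZipStream.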

chance · Apr 6, 2007

I can't even get a non-corrupt zip file. This is my code. What gives?

//compress it
MemoryStream uncompressedStream = new MemoryStream();
doc.Save(uncompressedStream, SaveFormat.Doc);

MemoryStream compressedStream = new MemoryStream();
GZipStream compressor = new GZipStream(compressedStream,
CompressionMode.Compress);

uncompressedStream.Position = 0;
uncompressedStream.WriteTo(compressor);

row["DOCUMENT"] = compressedStream.ToArray();

D. Yates · Apr 6, 2007

Chance,

You are going to have to create a compressed version of the file on disk,
load the compressed version and then stream it to the database. If you try
to compress the file directly to a MemoryStream it will not work because the
compression stream will CLOSE the MemoryStream when it is disposed/closed.

Sooo... use the example given earlier (maybe with DeflateStream instead of
GZipStream) to compress the document on disk, then load the compressed
document and send it to the database. Afterwards, you can delete the
compressed disk file and you are good to go.

Dave

PS - You should use the GZipStream to do the writing since it holds a
reference to the destination stream and it compresses the data as it writes.
Look to the examples posted earlier for more information.
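As an alternative to the temporary file, here is a sketch of an in-memory version. The corrupt output in the code above most likely occurs because `compressedStream.ToArray()` is called while the compressor is still open, before the final block and the gzip trailer have been written. The three-argument constructor `GZipStream(Stream, CompressionMode, bool leaveOpen)` keeps the MemoryStream open after the compressor is closed. `MemoryStream.ToArray` also works on a closed stream, so a disk round trip should not be needed. `InMemorySketch.CompressStream` is a hypothetical name, and its parameter plays the role of the stream `doc.Save` wrote into.

```csharp
using System.IO;
using System.IO.Compression;

public static class InMemorySketch
{
    // 'uncompressedStream' plays the role of the stream doc.Save wrote into.
    public static byte[] CompressStream(MemoryStream uncompressedStream)
    {
        MemoryStream compressedStream = new MemoryStream();
        using (GZipStream compressor = new GZipStream(compressedStream, CompressionMode.Compress, true))
        {
            // WriteTo copies the entire contents, regardless of Position.
            uncompressedStream.WriteTo(compressor);
        } // Closing the compressor writes the final block and the gzip trailer.

        // compressedStream is still open because leaveOpen was true.
        return compressedStream.ToArray();
    }
}
```

In the code above, the last line would then become `row["DOCUMENT"] = InMemorySketch.CompressStream(uncompressedStream);`.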

D. Yates · Apr 6, 2007

Jon,

I'm interested in why you would use a BufferedStream for reading data in and
then writing data back to a file? I can see its benefits if you don't know
how much data is coming down the pipe (the MSDN example uses a NetworkStream
with sockets...I get that...) and you want to gradually feed data into the
BufferedStream till it hits its preset size limit and then flushes data, but
in a case like this are there any advantages?

Dave

Jon Skeet [C# MVP] · Apr 7, 2007

D. Yates said:
I'm interested in why you would use a BufferedStream for reading data in and
then writing data back to a file? I can see its benefits if you don't know
how much data is coming down the pipe (the MSDN example uses a NetworkStream
with sockets...I get that...) and you want to gradually feed data into the
BufferedStream till it hits its preset size limit and then flushes data, but
in a case like this are there any advantages?

By wrapping a BufferedStream round the GZipStream, if you *do* write in
small blocks the effect is mitigated by the buffering.

Nicer to read and write whole blocks at a time, of course. Indeed, I've
got code in my MiscUtil library to do exactly that, copying the
contents of one stream into another...
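Here is a sketch of the kind of single-buffer copy Jon describes. It is an illustrative helper with a hypothetical name, not the actual MiscUtil code. On .NET 4 and later, `Stream.CopyTo` provides the same behavior.

```csharp
using System;
using System.IO;

public static class StreamCopySketch
{
    // Copies everything from 'input' to 'output' using one reusable buffer
    // and returns the number of bytes copied.
    public static long Copy(Stream input, Stream output, int bufferSize)
    {
        if (input == null) throw new ArgumentNullException("input");
        if (output == null) throw new ArgumentNullException("output");
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException("bufferSize");

        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // Read may return fewer bytes than requested; only 0 means end of stream.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

For example, `StreamCopySketch.Copy(oldFile, compression, 4096);` could replace the read/write loop in the block-based compression example.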

D. Yates · Apr 7, 2007

Jon,

Hey, I couldn't find any code that uses BufferedStream in the Miscellaneous
Utilities here: http://www.yoda.arachsys.com/csharp/miscutil/

The MiscUtil.IO.StreamUtil class is mentioned in the contents section, but the
source download does not contain this class.

Dave

Jon Skeet [C# MVP] · Apr 7, 2007

D. Yates said:
Hey, I couldn't find any code that uses BufferedStream in the Miscellaneous
Utilities here: http://www.yoda.arachsys.com/csharp/miscutil/

No, that doesn't use BufferedStream - but it provides a way of copying
the contents of one stream to another easily using a single buffer.

The MiscUtil.IO.StreamUtil class is mentioned in the contents section, but the
source download does not contain this class.

Eek, does it not? It certainly should do!

<sfx: tappety tappety>

Hmm. Not sure where that all went wrong. Okay, I've got a unit test to
fix (from a bug reported by someone else) and then I'll upload a new
version. Thanks for pointing out the inconsistency!

Jon Skeet [C# MVP] · Apr 7, 2007

Hmm. Not sure where that all went wrong. Okay, I've got a unit test to
fix (from a bug reported by someone else) and then I'll upload a new
version. Thanks for pointing out the inconsistency!

Okay, found out what was wrong. I'd got all the code, but not committed
it to svn. My build process for the code that ends up on the website
fetches a clean copy from svn...

It's up there now.
