System.IO.Compression

Mark Rae

Hi,

In v1.1, I used the Chilkat component for compression support, but am now
looking at using the new System.IO.Compression namespace in .NET 2.0.

I found this article on the web:
http://www.developer.com/net/net/article.php/3510026 which explains how to
compress and decompress files, and the code certainly works, but the author
mentions that the decompression method is not very efficient because it has
to read the compressed file twice in order to know how large it is.

Can anyone tell me if there is a better / more efficient way to do that?

Any assistance gratefully received.

Mark
 
Jon Skeet [C# MVP]

Mark said:
In v1.1, I used the Chilkat component for compression support, but am now
looking at using the new System.IO.Compression namespace in .NET2

I found this article on the web:
http://www.developer.com/net/net/article.php/3510026 which explains how to
compress and decompress files, and the code certainly works, but the author
mentions that the decompression method is not very efficient because it has
to read the compressed file twice in order to know how large it is.

Can anyone tell me if there is a better / more efficient way to do that?

You could decompress by writing each decompressed chunk you read into a
MemoryStream instead, then get the array after that. You might waste a
bit of time/memory copying things around, but it would only need to
read the data once.

Jon
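
For reference, here is a minimal sketch of the single-pass approach described above, using GZipStream from System.IO.Compression. The class and method names (GZipSketch, Compress, Decompress) are made up for illustration and are not taken from the article; the 16 KB buffer size is an arbitrary choice.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GZipSketch
{
    // Compresses a byte array into a gzip-format byte array.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // ToArray is still valid after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decompresses in a single pass: each chunk read from the GZipStream is
    // appended to a MemoryStream, so the final size never has to be known
    // up front and the compressed data is only read once.
    public static byte[] Decompress(Stream compressed)
    {
        using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (MemoryStream result = new MemoryStream())
        {
            byte[] buffer = new byte[16 * 1024];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                result.Write(buffer, 0, read);
            }
            return result.ToArray();
        }
    }
}
```

The cost is the extra copying mentioned above: the MemoryStream grows its internal buffer as needed, and ToArray makes one more copy at the end. For a file on disk, pass File.OpenRead(path) as the stream.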
 
Mark Rae

Jon said:
You could decompress by writing each decompressed chunk you read into a
MemoryStream instead, then get the array after that. You might waste a
bit of time/memory copying things around, but it would only need to
read the data once.

Yes, I see what you're saying.

I'm also reading quite a few postings around the net that the new
System.IO.Compression functionality "isn't very good", at least not for file
compression / decompression - is this the general consensus around here?

The 3rd-party component I've been using up to now is available for .NET2, so
I might just continue to use it anyway...
 
Jon Skeet [C# MVP]

Mark said:
Yes, I see what you're saying.

I'm also reading quite a few postings around the net that the new
System.IO.Compression functionality "isn't very good", at least not for file
compression / decompression - is this the general consensus around here?

Well, it's somewhat limited in terms of it being solely stream
compression - zip files themselves can't be extracted, etc. I don't
know much more about it than that though.

Mark said:
The 3rd-party component I've been using up to now is available for .NET2, so
I might just continue to use it anyway...

SharpZipLib by any chance? I've used that too, and it works well -
although there's a bug which I reported a while ago which worried me
somewhat. (It was unable to decompress its own compressed data.) It may
have been fixed by now, of course...

Jon
 
Mark Rae

Jon said:
SharpZipLib by any chance?

No - I tried that, but didn't like it.

I'm using this: http://www.chilkatsoft.com/ChilkatDotNet.asp

I originally started using it back in v1.0 days, because the contract I was
working on needed to support compression, FTP and encryption, and this did
all three plus loads more besides.

Their support is excellent too. Recently, I had a requirement to interface
with an ancient VMS machine at a company in Canada via FTP, and they worked
with me to iron out the various issues, specifically with the way VMS stores
more than one file of the same name, differentiated by version numbers e.g.

JON.SKEET;1
JON.SKEET;2
JON.SKEET;3

Basically, I sent them all of the error logs, and they sent me an updated
version of their DLL with updated VMS support.
 
Guest

I've used Mike Krueger et al.'s SharpZipLib for all kinds of compression
needs, both Zip and in-memory, and I'm sure the bug that Jon referred to has
long since been fixed. Something about having all the source code handy just
gives me a better comfort level. And once, when I was first starting out, I had
a question and Mike was gracious enough to supply me with some sample code.
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com
 
Jon Skeet [C# MVP]

Peter said:
I've used Mike Krueger, et al SharpZipLib for all kinds of compression
needs, both Zip and in-memory, and I'm sure the bug that Jon referred to has
long since been fixed. Something about having all the source code handy just
gives me a better comfort level. And once when first started out I had a
question and Mike was gracious enough to supply me with some sample code.

I wouldn't be so sure it'll have been fixed by now. I submitted it less
than a year ago, and haven't heard anything about the bug being fixed,
despite providing a "short but complete example" in the normal way.

Unfortunately the forums (where I reported the bug) don't have anything
earlier than last August, so I can't easily check. I've found the files
involved though, so I'll have a go at compressing them from what I
remember of the problem, and see if I can reproduce it again.

Of course, I tried to fix it myself, but I rapidly got into fairly deep
water where an in-depth knowledge of compression was required.

Jon
 
Jon Skeet [C# MVP]

Jon said:
Unfortunately the forums (where I reported the bug) don't have anything
earlier than last August, so I can't easily check. I've found the files
involved though, so I'll have a go at compressing them from what I
remember of the problem, and see if I can reproduce it again.

Just managed to reproduce it again with release 0.84. I don't know
whether the #ZipLib team have an internal bug database beyond "the
forums", but if so hopefully they've got it as a "to do" some time.
Rather worrying though.

Jon
 
Guest

Jon,
Can you provide one of your "Short but completes" on this? I'd like to check
it out myself as I am particularly fond of their library. I suspect they are
pretty busy on SharpDevelop 2.0 (which looks pretty interesting BTW) but I'd
be happy to add another voice to the roar of the crowd.
Peter

 
Jon Skeet [C# MVP]

Peter Bromberg said:
Can you provide one of your "Short but completes" on this? I'd like to check
it out myself as I am particularly fond of their library. I suspect they are
pretty busy on SharpDevelop 2.0 (which looks pretty interesting BTW) but I'd
be happy to add another voice to the roar of the crowd.

Sure. Here's the compression code:

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip.Compression;
using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

class Comp
{
    static void Main(string[] args)
    {
        // args[0] = file to compress, args[1] = compressed output file
        using (Stream input = File.OpenRead(args[0]))
        using (Stream output = File.OpenWrite(args[1]))
        using (DeflaterOutputStream dos = new DeflaterOutputStream(
            output, new Deflater(Deflater.BEST_SPEED)))
        {
            byte[] buffer = new byte[16*1024];

            // Copy the input through the deflater in 16 KB chunks.
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                dos.Write(buffer, 0, read);
            }
            dos.Flush();
            dos.Finish();
        }
    }
}


Here's the decompression code:

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip.Compression;
using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

class Decomp
{
    static void Main(string[] args)
    {
        // args[0] = compressed file, args[1] = decompressed output file
        using (Stream input = File.OpenRead(args[0]))
        using (Stream output = File.OpenWrite(args[1]))
        using (InflaterInputStream iis = new InflaterInputStream(input))
        {
            byte[] buffer = new byte[16*1024];

            // Copy the inflated data to the output in 16 KB chunks.
            int read;
            while ((read = iis.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
    }
}

Compile:
csc Comp.cs /r:ICSharpCode.SharpZipLib.dll
csc Decomp.cs /r:ICSharpCode.SharpZipLib.dll

Download test data (important - it only fails on certain data)
http://www.pobox.com/~skeet/original.dat

Compress:
comp original.dat compressed.dat

Decompress:
decomp compressed.dat decompressed.dat

Exception:
Unhandled Exception: ICSharpCode.SharpZipLib.SharpZipBaseException:
ICSharpCode.SharpZipLib.SharpZipBaseException: Adler chksum doesn't match: -1805756378 vs. -768451840
   at ICSharpCode.SharpZipLib.Zip.Compression.Inflater.DecodeChksum()
   at ICSharpCode.SharpZipLib.Zip.Compression.Inflater.Decode()
   at ICSharpCode.SharpZipLib.Zip.Compression.Inflater.Inflate(Byte[] buf, Int32 offset, Int32 len)
   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream.Read(Byte[] b, Int32 off, Int32 len)
   at ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream.Read(Byte[] b, Int32 off, Int32 len)
   at Comp.Main(String[] args)


The code looks correct to me...
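
Since the failure only shows up on certain data, a small round-trip check can help when trying other inputs or other libraries. The sketch below is an assumption-laden illustration, not part of the repro above: RoundTripSketch, Check, Deflate and Inflate are hypothetical names, and the built-in DeflateStream is used as a stand-in so that the example runs without SharpZipLib. To test SharpZipLib, pass delegates that wrap DeflaterOutputStream and InflaterInputStream instead.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class RoundTripSketch
{
    // Returns true if decompress(compress(data)) reproduces data byte for byte.
    public static bool Check(Converter<byte[], byte[]> compress,
                             Converter<byte[], byte[]> decompress,
                             byte[] data)
    {
        byte[] restored = decompress(compress(data));
        if (restored.Length != data.Length)
        {
            return false;
        }
        for (int i = 0; i < data.Length; i++)
        {
            if (restored[i] != data[i])
            {
                return false;
            }
        }
        return true;
    }

    // Stand-in compressor based on System.IO.Compression.DeflateStream.
    public static byte[] Deflate(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (DeflateStream ds = new DeflateStream(output, CompressionMode.Compress))
            {
                ds.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Stand-in decompressor, reading in chunks until the stream is exhausted.
    public static byte[] Inflate(byte[] data)
    {
        using (DeflateStream ds = new DeflateStream(new MemoryStream(data), CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[16 * 1024];
            int read;
            while ((read = ds.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

A call such as RoundTripSketch.Check(RoundTripSketch.Deflate, RoundTripSketch.Inflate, File.ReadAllBytes("original.dat")) then reports whether a given file survives the trip.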
 
Guest

Jon,
Got it all, thanks. I'll keep you posted.
Peter

Jon Skeet [C# MVP]

Peter Bromberg said:
Got it all, thanks. I'll keep you posted.

Thanks - did it reproduce the problem for you? I'd hate it to be
processor-specific somehow...
 
Guest

Jon,
this is happening in the DecodeChksum method of class Inflater (excerpt):

private bool DecodeChksum()
{
    while (neededBits > 0) {
        int chkByte = input.PeekBits(8);
        if (chkByte < 0) {
            return false;
        }
        input.DropBits(8);
        readAdler = (readAdler << 8) | chkByte;
        neededBits -= 8;
    }
    if ((int) adler.Value != readAdler) {
        throw new SharpZipBaseException("Adler chksum doesn't match: " +
            (int)adler.Value + " vs. " + readAdler +" times: " +numTimes.ToString());
    }
    mode = FINISHED;
    return false;
}

The exception is thrown because adler.Value and readAdler don't match.
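
For anyone following along, here is a rough sketch of what the two values being compared are. It is not SharpZipLib's code: Adler32Sketch, Compute and ReadTrailer are hypothetical names, and the checksum follows the Adler-32 definition in the zlib format specification (RFC 1950). Compute corresponds to the running checksum over the decompressed bytes, and ReadTrailer to the four bytes stored big-endian at the end of the compressed stream.

```csharp
using System;

static class Adler32Sketch
{
    const uint Modulus = 65521; // largest prime below 2^16

    // Adler-32: two running sums, each kept modulo 65521.
    public static uint Compute(byte[] data)
    {
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % Modulus;
            b = (b + a) % Modulus;
        }
        return (b << 16) | a;
    }

    // Reads the 4-byte big-endian trailer of a zlib stream one byte at a
    // time, using the same shift-and-or step as the loop in the excerpt.
    public static uint ReadTrailer(byte[] zlibStream)
    {
        uint stored = 0;
        for (int i = zlibStream.Length - 4; i < zlibStream.Length; i++)
        {
            stored = (stored << 8) | zlibStream[i];
        }
        return stored;
    }
}
```

The negative numbers in the exception message come from casting these 32-bit values to int before printing. A mismatch means that either the compressor wrote a wrong trailer or the decompressor produced different bytes from the ones that were compressed.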

Before I send this off to Krueger and friends, I want to know this isn't one
of those cruel "Hunchback in the Nut Shoppe" jokes! The original.dat file
you provided seems not to compress at all - the output is the same size. So
what's this file? Is it already compressed? If so, with what program?
Thanks,
Peter
 
Guest

Jon,
If I use these 2 methods on your original.dat file, it not only compresses
to about 140K, it decompresses fine (note that the decompress method looks
all fudged up since it was originally for a string, but you won't have any
trouble figuring it out):

private byte[] Compress(string strInput)
{
    try
    {
        byte[] bytData = System.Text.Encoding.UTF8.GetBytes(strInput);
        MemoryStream ms = new MemoryStream();
        ICSharpCode.SharpZipLib.Zip.Compression.Deflater defl = new
            ICSharpCode.SharpZipLib.Zip.Compression.Deflater(9, false);
        Stream s = new
            ICSharpCode.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream(ms, defl);
        s.Write(bytData, 0, bytData.Length);
        s.Close();
        byte[] compressedData = ms.ToArray();
        MessageBox.Show("Original: " + bytData.Length.ToString() + ": "
            + "Compressed: " + compressedData.Length.ToString());
        return compressedData;
    }
    catch (Exception e)
    {
        MessageBox.Show(e.ToString());
        return null;
    }
}

private string DeCompress(byte[] bytInput)
{
    MemoryStream ms = new MemoryStream(bytInput, 0, bytInput.Length);
    byte[] writeData = new byte[4096];
    Stream s2 = new
        ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream(ms);
    MemoryStream outMs = new MemoryStream();
    try
    {
        while (true)
        {
            int size = s2.Read(writeData, 0, writeData.Length);
            if (size <= 0)
            {
                break;
            }
            // Write only the bytes actually read, not the whole buffer.
            outMs.Write(writeData, 0, size);
        }
        s2.Close();

        // Save the decompressed bytes so they can be compared with the original.
        byte[] bFinal = outMs.ToArray();
        FileStream fs = new FileStream(@"C:\temp\out.dat", FileMode.Create);
        fs.Write(bFinal, 0, bFinal.Length);
        fs.Close();

        // Decode once at the end so that multi-byte UTF-8 sequences which
        // straddle a buffer boundary are not corrupted.
        return System.Text.Encoding.UTF8.GetString(bFinal);
    }
    catch (Exception e)
    {
        MessageBox.Show(e.ToString());
        return null;
    }
}

--Peter
 
Guest

Jon,
this definitely revolves around the compression level that is passed. In
fact I remember when I was testing compression of DataSets for remoting over
the wire, it would blow up if I used a level less than 5. If I change your
code to, for example:

using (DeflaterOutputStream dos = new DeflaterOutputStream (output, new
Deflater(Deflater.DEFAULT_COMPRESSION)))

//( DEFAULT_COMPRESSION = 6)

It doesn't blow up.
Peter

 
Jon Skeet [C# MVP]

Peter Bromberg said:
this definitely revolves around the compression level that is passed. In
fact I remember when I was testing compression of DataSets for remoting over
the wire, it would blow up if I used a level less than 5. If I change your
code to, for example:

using (DeflaterOutputStream dos = new DeflaterOutputStream (output, new
Deflater(Deflater.DEFAULT_COMPRESSION)))

//( DEFAULT_COMPRESSION = 6)

It doesn't blow up.

Yes - I do apologise, I should certainly have stated that using a
different compression level causes it not to blow up.

As for what the file is - it's a file generated by a product I was
working on at the time. I believe it's compressed, but I couldn't say
exactly how and probably wouldn't even if I knew off-hand, just in case
it's sensitive :)

My *guess* is that there's an error somewhere due to Java bytes being
signed and C# bytes being unsigned, but it's hard to say for sure
whether it's the compressor providing the wrong checksum or the
decompressor reading it incorrectly :(
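
To illustrate the kind of porting slip guessed at above (this is only an illustration of the general pitfall, not a diagnosis of the SharpZipLib bug), here is a sketch of an Adler-32 style checksum where each byte is widened either as an unsigned C# byte or as a signed Java byte would be without an & 0xff mask. SignedByteSketch and Checksum are hypothetical names.

```csharp
using System;

static class SignedByteSketch
{
    const uint Modulus = 65521;

    // Adler-32 style running sums. When asSigned is true, each byte is
    // widened the way a Java byte is when it is not masked with & 0xff,
    // so values of 0x80 and above become negative.
    public static uint Checksum(byte[] data, bool asSigned)
    {
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            int v = asSigned ? (int)unchecked((sbyte)value) : (int)value;
            // Add the modulus before reducing so a negative v cannot
            // push the sum below zero.
            long sum = (long)a + v + Modulus;
            a = (uint)(sum % Modulus);
            b = (b + a) % Modulus;
        }
        return (b << 16) | a;
    }
}
```

For input made up only of bytes below 0x80 the two modes agree, which would fit a bug that only appears on certain data; a single 0xFF byte is enough to make them differ.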
 
