Using in-memory zlib deflate from C# (with max performance :-)


tombrogan3

Hi, I need to implement in-memory zlib compression in C# to replace an
old C++ app.

Prerequisites:
1) The performance must be FAST (with source memory sizes from a few k
to a meg).
2) The output must exactly match the compression generated by the C++
zlib.org code with default compression.

The zlib.org C# code is slow as hell! I'm not sure if I'm doing
anything wrong, but it crawls - on the plus side, it exactly matches
what's generated with the C++ libraries.

System.IO.Compression's DeflateStream is fast, as is SharpZipLib, but
the output is different from zlib.org's.
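
(For reference, the zlib format is just a 2-byte header, the raw
deflate data, and a big-endian Adler-32 checksum of the uncompressed
input, so DeflateStream's output can be wrapped into a valid zlib
stream - though the compressed bytes themselves still won't be
byte-identical to zlib.org's. A minimal sketch; the ZlibWrapper name
is illustrative:

using System;
using System.IO;
using System.IO.Compression;

static class ZlibWrapper
{
    // Wrap DeflateStream's raw deflate output in a zlib (RFC 1950)
    // envelope: 2-byte header, deflate data, then a big-endian
    // Adler-32 of the *uncompressed* input.
    public static byte[] CompressZlib(byte[] input)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            ms.WriteByte(0x78); // CMF: deflate, 32K window
            ms.WriteByte(0x9C); // FLG: default compression level
            using (DeflateStream ds =
                new DeflateStream(ms, CompressionMode.Compress, true))
            {
                ds.Write(input, 0, input.Length);
            }
            uint adler = Adler32(input);
            ms.WriteByte((byte)(adler >> 24));
            ms.WriteByte((byte)(adler >> 16));
            ms.WriteByte((byte)(adler >> 8));
            ms.WriteByte((byte)adler);
            return ms.ToArray();
        }
    }

    static uint Adler32(byte[] data)
    {
        const uint Mod = 65521; // largest prime below 2^16
        uint a = 1, b = 0;
        foreach (byte x in data)
        {
            a = (a + x) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }
})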

Any help would be GREATLY appreciated!

Cheers,

Tom
 

Jon Skeet [C# MVP]

Hi, I need to implement in-memory zlib compression in C# to replace an
old C++ app.

Prerequisites:
1) The performance must be FAST (with source memory sizes from a few k
to a meg).
2) The output must exactly match the compression generated by the C++
zlib.org code with default compression.

That second point is an odd one, and likely to give you issues. What's
the basis of that requirement? Obviously the decompressed data should
be the same, but do you really need the compressed version to be
identical?

Jon
 

tombrogan3

That second point is an odd one, and likely to give you issues. What's
the basis of that requirement? Obviously the decompressed data should
be the same, but do you really need the compressed version to be
identical?

Jon

Hi, unfortunately yes.

I'm compressing the data and then writing it to a file (the file has an
extremely proprietary format, which means I can't just compress into it
directly).

The file will then be read by another C++ process.

Obviously if both match the deflate spec then the C++ side will be able
to read it, but my solution will be a lot more "acceptable" if the
output files are the same for C# and C++.

Thanks,

Tom
 

Jon Skeet [C# MVP]

Hi, unfortunately yes.

I'm compressing the data and then writing it to a file (the file has an
extremely proprietary format, which means I can't just compress into it
directly).

The file will then be read by another C++ process.

Obviously if both match the deflate spec then the C++ side will be able
to read it, but my solution will be a lot more "acceptable" if the
output files are the same for C# and C++.

In that case you may find yourself digging into the zlib.org code in
the normal profiling kind of way. It's unlikely that other compressors
will produce *exactly* the same output, although you can try tweaking
options (window sizes etc) to see if that will help.
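
(If byte-identical output really is non-negotiable, one option worth
sketching is to P/Invoke the native zlib DLL from C# - the bytes then
come from the same code the C++ app uses, and at C speed. A rough
shape, assuming a standard zlib1.dll build is on the DLL search path;
the NativeZlib wrapper name is illustrative:

using System;
using System.Runtime.InteropServices;

static class NativeZlib
{
    // zlib's compress2(): fills dest and updates destLen in place.
    // zlib uses the cdecl calling convention; uLong is 32 bits on
    // Windows, hence uint here.
    [DllImport("zlib1.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int compress2(byte[] dest, ref uint destLen,
                                byte[] source, uint sourceLen, int level);

    // Worst-case compressed size for sourceLen input bytes.
    [DllImport("zlib1.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern uint compressBound(uint sourceLen);

    const int Z_OK = 0;
    const int Z_DEFAULT_COMPRESSION = -1;

    public static byte[] Compress(byte[] input)
    {
        uint destLen = compressBound((uint)input.Length);
        byte[] dest = new byte[destLen];
        int rc = compress2(dest, ref destLen, input,
                           (uint)input.Length, Z_DEFAULT_COMPRESSION);
        if (rc != Z_OK)
            throw new InvalidOperationException("compress2 failed: " + rc);
        Array.Resize(ref dest, (int)destLen);
        return dest;
    }
}

Since the compression is then done by the same library, the output
should match the C++ side by construction.)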

Personally I'd try to push back on the "identical output" requirement,
satisfying myself instead with a comprehensive set of tests for the
"compress and then uncompress" cycle. I realise that may be futile in
some situations, but it may be worth pointing out that if the C++ zlib
code is ever patched, that may well change the output in a harmless
manner too.
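
(A minimal shape for that round-trip test, reusing the illustrative
ZlibWrapper sketch from earlier in the thread:

using System;
using System.IO;
using System.IO.Compression;

static class RoundTripTest
{
    // Inflate a zlib-wrapped buffer: skip the 2-byte header and the
    // 4-byte Adler-32 trailer, then decompress the raw deflate data.
    static byte[] DecompressZlib(byte[] compressed)
    {
        using (MemoryStream src =
            new MemoryStream(compressed, 2, compressed.Length - 6))
        using (DeflateStream ds =
            new DeflateStream(src, CompressionMode.Decompress))
        using (MemoryStream dst = new MemoryStream())
        {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = ds.Read(buffer, 0, buffer.Length)) > 0)
                dst.Write(buffer, 0, read);
            return dst.ToArray();
        }
    }

    // Compress then decompress must reproduce the input byte-for-byte.
    public static bool RoundTripOk(byte[] original)
    {
        byte[] restored = DecompressZlib(ZlibWrapper.CompressZlib(original));
        if (restored.Length != original.Length)
            return false;
        for (int i = 0; i < original.Length; i++)
            if (restored[i] != original[i])
                return false;
        return true;
    }
})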

Jon
 

tombrogan3

In that case you may find yourself digging into the zlib.org code in
the normal profiling kind of way. It's unlikely that other compressors
will produce *exactly* the same output, although you can try tweaking
options (window sizes etc) to see if that will help.

Personally I'd try to push back on the "identical output" requirement,
satisfying myself instead with a comprehensive set of tests for the
"compress and then uncompress" cycle. I realise that may be futile in
some situations, but it may be worth pointing out that if the C++ zlib
code is ever patched, that may well change the output in a harmless
manner too.

Jon

Cheers Jon, I'll do that.

You don't know of any way to get maximum performance, do you?

I'm having to iterate through many structures (up to 1 million),
compressing them one at a time, with a source data size ranging from a
few kbytes to a meg.

Do you think threading would help? (it will run on multi processor
machines).

Thanks,

Tom
 

Jon Skeet [C# MVP]

Cheers Jon, I'll do that.

You don't know of any way to get maximum performance, do you?

Find a bottleneck, squish it. Lather, rinse, repeat :)

The exact details of squishing the bottleneck depend on the kind of
bottleneck, but basically profiling is your friend. Don't expect a
profiler to necessarily give you accurate results - the various
techniques used by different profilers always skew the numbers - but
they can still help a lot. (Basically, you need to make sure you've
got a benchmark which runs in release mode, not under a profiler, to
see the *actual* improvements gained by making the changes the
profiler suggests.)
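
(A minimal shape for such a benchmark using Stopwatch - the
NativeZlib.Compress call below just stands in for whichever
implementation is being measured:

using System;
using System.Diagnostics;

static class CompressBenchmark
{
    static void Main()
    {
        // Random data compresses poorly; representative real data
        // makes a much better sample.
        byte[] sample = new byte[512 * 1024];
        new Random(42).NextBytes(sample);

        NativeZlib.Compress(sample); // warm-up so JIT time isn't counted

        const int iterations = 100;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            NativeZlib.Compress(sample);
        sw.Stop();

        Console.WriteLine("{0:F2} ms per call",
            sw.Elapsed.TotalMilliseconds / iterations);
    }
})
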
I'm having to iterate through many structures (up to 1 million),
compressing them one at a time, with a source data size ranging from a
few kbytes to a meg.

Do you think threading would help? (it will run on multi processor
machines).

Threading should help in that case, if you've got a naturally parallel
system - if you can compress two data sources independently, without
caring about which ends up being written first, for instance. If it's
not naturally parallel it may be harder, but still feasible.

If you're not close to release and don't mind using beta software,
Parallel Extensions makes life a lot simpler in my experience.
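
(For the naturally parallel case, the shape with Parallel Extensions
is roughly this - results stay index-matched to the sources, so the
file can still be written out in the original order; Compress is
whichever single-block compressor you settle on:

using System.Threading.Tasks; // Parallel lives here in .NET 4;
                              // the earlier CTPs used System.Threading

static class ParallelCompressor
{
    // Farm independent blocks out across all cores; results land in
    // an index-matched array so write order is preserved.
    public static byte[][] CompressAll(byte[][] sources)
    {
        byte[][] results = new byte[sources.Length][];
        Parallel.For(0, sources.Length, i =>
        {
            results[i] = NativeZlib.Compress(sources[i]);
        });
        return results;
    }
})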

Jon
 
