Compression size

Guest

I compressed a file with the GZipStream class and it is larger than the original
file... how can this be? The original file is 737 KB and the "compressed"
file is 1.1 MB. Did I miss something, or is this normal with that compression class?
 
What type of file are you compressing? Some highly compressed files, such as
images, may grow in size when compressed a second time. Also, the Microsoft
algorithms are not ideal, but rather make an attempt to steer clear of patent
issues.
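
For reference, here is a minimal sketch of the usual way to gzip a file with
GZipStream (assuming .NET 4 or later for Stream.CopyTo; the paths are
placeholders and error handling is omitted):

using System;
using System.IO;
using System.IO.Compression;

class GzipFileExample
{
    // Compresses sourcePath to sourcePath + ".gz" and reports both sizes.
    static void Main(string[] args)
    {
        string sourcePath = args[0];
        string destPath = sourcePath + ".gz";

        using (FileStream input = File.OpenRead(sourcePath))
        using (FileStream output = File.Create(destPath))
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            input.CopyTo(gzip); // stream the whole file through the compressor
        }

        Console.WriteLine("original:   {0:N0} bytes", new FileInfo(sourcePath).Length);
        Console.WriteLine("compressed: {0:N0} bytes", new FileInfo(destPath).Length);
    }
}

If the input is already compressed (an MP3, a JPEG, most PDFs), output from
code along these lines can easily come out larger, as discussed below.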
 
First I compressed a txt file; I read that if a file is very small, compression
can make it larger. So then I tried with an MP3 file (not sure if the file type
matters) of 3.4 MB, but it turned into 5.3 MB... so... what's wrong??
 
VBA said:
First I compressed a txt file; I read that if a file is very small, compression
can make it larger. So then I tried with an MP3 file (not sure if the file type
matters) of 3.4 MB, but it turned into 5.3 MB... so... what's wrong??

MP3 is a compressed file... I bet you'd get better behavior with a 3.5
MB text file.

Scott
 
But can I only compress text files?? Because I tried a while ago with a
PDF... and the result was the same: a bigger size. But I don't know if a PDF
file is somehow compressed already.

By the way, when compressing a file, should the resulting compressed file keep
the same file extension? Or must I use something like *.Z????
 
PDF can contain compressed graphics (and, IIRC, sometimes text), and
if it is encrypted the data can appear relatively random. Both of
these make it a poor choice for compression.

Put simply: some files compress very well indeed, and some don't. In
particular, those that are already compressed (or highly random) don't
tend to compress (and can get bigger).

Marc
 
VBA said:
I compressed a file with the GZipStream class and it is larger than the original
file... how can this be? The original file is 737 KB and the "compressed"
file is 1.1 MB. Did I miss something, or is this normal with that compression class?

Hi VBA,

Random data is hard to compress, as compression techniques often work on
probabilities (e.g. Huffman encoding). So, encrypted files and already
compressed files, such as MP3s, JPEGs, GIFs, etc., will hardly compress at all.

Text documents written in English, or files containing sparse data (such as
BMPs and certain executables) will compress fairly well. It all depends on
the compression algorithm.

You should choose an algorithm that's appropriate to the type of data you're
trying to compress... a bad algorithm will almost certainly result in
larger files.

But like I said at the start, random data is hard, if not damn near impossible,
to compress.
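
A quick way to see this is to gzip two same-sized buffers, one highly
repetitive and one random (a rough sketch; exact output sizes will vary by
framework version):

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class CompressibilityDemo
{
    // Gzips a buffer in memory and returns the compressed length.
    static long GzipLength(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so output.Length is still readable afterwards
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the final gzip block
            return output.Length;
        }
    }

    static void Main()
    {
        byte[] repetitive = Encoding.ASCII.GetBytes(new string('a', 100000));
        byte[] random = new byte[100000];
        new Random().NextBytes(random);

        Console.WriteLine("repetitive: {0} -> {1} bytes", repetitive.Length, GzipLength(repetitive));
        Console.WriteLine("random:     {0} -> {1} bytes", random.Length, GzipLength(random));
    }
}

The repetitive buffer shrinks to a tiny fraction of its size; the random one
typically comes out slightly larger than the input.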
 
[...]
By the way, when compressing a file, should the resulting compressed file keep
the same file extension? Or must I use something like *.Z????

You can name the compressed file whatever you like. Of course, when using the
GZipStream class, it's common to use the ".gz" extension for the output. But
there's no requirement that you do so.

Pete
 
[...] Also, the Microsoft algorithms are not ideal, but rather make an attempt
to steer clear of patent issues.

GZipStream may not implement an ideal algorithm, but since gzip itself is
an open format, I doubt that patent issues are part of the question.
 
All that you are telling me sounds very interesting :)
I just thought of a new question related to it... how does WinZip work?? I
mean, you can put any file in a WinZip archive and compress it, and I read in a
book that it uses a similar compression algorithm. Is that another type of
compression, or could you write similar software in .NET using GZipStream????
 
All that you are telling me sounds very interesting :)
I just thought of a new question related to it... how does WinZip work??

Two standard algorithms on which much (nearly all, actually, as far as I know)
of our lossless compression tooling is built are Huffman encoding and the
Lempel-Ziv-Welch (LZW) algorithm. I don't have specifics on the exact
implementation of WinZip, but I gather that, like all "zip" variations, it
uses some forms of these algorithms.

If you want to have a better idea of how various compression schemes work,
the place to start is reading about these basic algorithms.
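
As a taste of how this works, here is a toy Huffman-code builder (a sketch
only; real formats such as deflate use canonical, length-limited codes),
showing how frequent symbols end up with shorter codes:

using System;
using System.Collections.Generic;
using System.Linq;

class HuffmanSketch
{
    class Node
    {
        public char? Symbol;   // null for internal nodes
        public int Weight;
        public Node Left, Right;
    }

    // Walk the tree, assigning "0" for left and "1" for right.
    static void Assign(Node n, string prefix, Dictionary<char, string> codes)
    {
        if (n.Symbol.HasValue) { codes[n.Symbol.Value] = prefix == "" ? "0" : prefix; return; }
        Assign(n.Left, prefix + "0", codes);
        Assign(n.Right, prefix + "1", codes);
    }

    static void Main()
    {
        string text = "this is an example of huffman coding";

        // One leaf per distinct symbol, weighted by frequency.
        List<Node> nodes = text.GroupBy(c => c)
            .Select(g => new Node { Symbol = g.Key, Weight = g.Count() })
            .ToList();

        // Repeatedly merge the two lightest nodes until one tree remains.
        while (nodes.Count > 1)
        {
            nodes.Sort((a, b) => a.Weight.CompareTo(b.Weight));
            Node left = nodes[0], right = nodes[1];
            nodes.RemoveRange(0, 2);
            nodes.Add(new Node { Weight = left.Weight + right.Weight, Left = left, Right = right });
        }

        var codes = new Dictionary<char, string>();
        Assign(nodes[0], "", codes);

        int bits = text.Sum(c => codes[c].Length);
        Console.WriteLine("fixed 8-bit encoding: {0} bits", text.Length * 8);
        Console.WriteLine("Huffman encoding:     {0} bits (plus the code table)", bits);
    }
}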
I mean, you can put any file in a WinZip archive and compress it, and I read
in a book that it uses a similar compression algorithm. Is that another type of
compression, or could you write similar software in .NET using GZipStream????

You can't "put any file in a Winzip file and compress it". Typically,
something like WinZip will try a variety of specific compression
algorithms to see which performs best (each variation of a given algorithm
may perform differently, depending on the content and structure of the
data). In some cases, no compression algorithm will reduce the size, or
will reduce it significantly, and the original data will be used. But
inclusion of file headers and other information will increase the file
size at least a little.

Note that the GZipStream class does not have the entire data in hand before it
must make decisions about how to compress it. As far as I know, it
just uses a single "best general case" version of the "deflate" algorithm
(based on Huffman and LZW). In any case, it's guaranteed that GZipStream
doesn't have the ability to pick from a variety of algorithms and use the
best-performing one, as something like WinZip can.

Again, I don't know specifically how WinZip works, but all compression
tools have this basic behavior. There is not a single compression tool
that is guaranteed to reduce the size of the data.
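
To put a rough number on that header overhead (a sketch; exact sizes vary a
little by framework version), gzip's fixed header and trailer alone account
for about 18 bytes:

using System;
using System.IO;
using System.IO.Compression;

class HeaderOverheadDemo
{
    static void Main()
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(new byte[] { 42 }, 0, 1); // one byte of input
            }
            // Expect roughly 20+ bytes out: a 10-byte gzip header, the
            // deflate data, and an 8-byte trailer (CRC32 plus length).
            Console.WriteLine("1 byte in -> {0} bytes out", output.Length);
        }
    }
}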

Pete
 
Tom said:
But like I said at the start random data is hard if not damn near impossible
to compress.

Some define random data as data that are incompressible ...

:-)

Arne
 
Peter said:
Two standard algorithms on which much (nearly all, actually, as far as I know)
of our lossless compression tooling is built are Huffman encoding and the
Lempel-Ziv-Welch (LZW) algorithm. I don't have specifics on the exact
implementation of WinZip, but I gather that, like all "zip" variations, it
uses some forms of these algorithms.

Absolutely untrue.

LZ78 (LZW) is used in traditional Unix compress.

But ZIP and GZip use LZ77.

Both are often combined with either Huffman or arithmetic encoding.

BZip uses Burrows-Wheeler.

You can't simply "put any file in a WinZip archive and compress it". Typically,
something like WinZip will try a variety of specific compression
algorithms to see which performs best (each variation of a given
algorithm may perform differently, depending on the content and
structure of the data). In some cases, no compression algorithm will
reduce the size (or at least not significantly), and the original data
will be stored as-is. But the inclusion of file headers and other
bookkeeping information will still increase the file size at least a little.

Note that the GZipStream class does not have the entire data in hand before it
must make decisions about how to compress it. As far as I know,
it just uses a single "best general case" version of the "deflate"
algorithm (based on Huffman and LZW). In any case, it's guaranteed that
GZipStream doesn't have the ability to pick from a variety of algorithms
and use the best-performing one, as something like WinZip can.

I would assume that WinZip only uses the possibilities within the
ZIP format and not some custom format.

And deflate is still LZ77, not LZ78 (LZW).

Arne
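
To make the distinction concrete, here is a toy LZ77 tokenizer (a sketch only;
real deflate limits the search window, uses hash chains, and Huffman-codes the
tokens). LZ77 emits back-references into the text already seen, rather than
building the explicit string dictionary that LZW uses:

using System;
using System.Collections.Generic;

class Lz77Sketch
{
    static void Main()
    {
        string text = "abracadabra abracadabra";
        var tokens = new List<string>();
        int pos = 0;

        while (pos < text.Length)
        {
            int bestLen = 0, bestOff = 0;
            // Find the longest match starting anywhere in the already-seen text.
            for (int start = 0; start < pos; start++)
            {
                int len = 0;
                while (pos + len < text.Length - 1 && text[start + len] == text[pos + len])
                    len++;
                if (len > bestLen) { bestLen = len; bestOff = pos - start; }
            }
            // Emit a (back-offset, match-length, next-literal) triple and advance.
            char next = text[pos + bestLen];
            tokens.Add(string.Format("({0},{1},'{2}')", bestOff, bestLen, next));
            pos += bestLen + 1;
        }

        Console.WriteLine(string.Join(" ", tokens));
    }
}

The second "abracadabra" collapses into a single long back-reference; no
dictionary of strings is ever built.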
 
Absolutely untrue.

Okay.

LZ78 (LZW) is used in traditional Unix compress.

But ZIP and GZip use LZ77.

Both are often combined with either Huffman or arithmetic encoding.

That's what I said. I thought you said what I said was "absolutely
untrue".

Maybe the word "absolutely" means something different in your native
language? Here, it's used to emphasize, rather than to negate.

Pete
 
Peter said:
That's what I said. I thought you said what I said was "absolutely
untrue".

Maybe the word "absolutely" means something different in your native
language? Here, it's used to emphasize, rather than to negate.

????

You said that nearly all lossless compression tools are built on LZW.

That is absolutely untrue, or complete bullshit, or whatever you want
to call it.

I even explained why: ZIP and GZip do not use LZW. And they are far more
widely used than good old Unix compress.

Arne
 
You said that nearly all lossless compression tools are built on LZW.

I wrote (and you quoted) "WinZip...uses some forms of these algorithms".

In what way is LZ77 (the algorithm you wrote is used with the ZIP format)
_not_ "some form" of the LZW algorithm?
That is absolute untrue or complete bullshit or whatever you want
to call it.

My statement was just fine, and your own claims even confirm that. You
can continue to write asinine things like "absolutely untrue" and "complete
bullshit" as much as you like; there was nothing wrong with my post.
Furthermore, your posts continue to insult without educating.

If you have an actual point, try making it without being such an ass.

Thanks,
Pete
 
Peter said:
I wrote (and you quoted) "WinZip...uses some forms of these algorithms".

In what way is LZ77 (the algorithm you wrote is used with the ZIP
format) _not_ "some form" of the LZW algorithm?

No.

Not code-wise. Not patent-wise. Not in any way.

My statement was just fine, and your own claims even confirm that.

Bullshit.

Furthermore, your posts continue to insult without educating.

I have tried multiple times to explain to you that the most
widely used compression algorithms do not use LZW; they use
LZ77.

That is educational.

That you refuse to understand it does not make it less educational.

If you have an actual point, try making it without being such an ass.

It seems as if you just have difficulties understanding the point.

Arne
 
It seems as if you just have difficulties understanding the point.

When you make a point that is comprehensible, then I will start worrying
about whether I understand it.
 
Peter said:
When you make a point that is comprehensible, then I will start worrying
about whether I understand it.

So you did not understand the following:

#> In what way is LZ77 (the algorithm you wrote is used with the ZIP
#> format) _not_ "some form" of the LZW algorithm?
#
#No.
#
#Not code-wise. Not patent-wise. Not in any way.

LZW is a completely different algorithm than LZ77. An implementation
will be different code. The infamous LZW patent does not apply to LZ77.

Is that difficult to understand?

Arne
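
And for contrast with the LZ77 sketch above, here is a toy LZW encoder (again,
only a sketch): it grows an explicit dictionary of strings and emits dictionary
indices, with no back-references into earlier output:

using System;
using System.Collections.Generic;

class LzwSketch
{
    static List<int> Encode(string text)
    {
        var dict = new Dictionary<string, int>();
        for (int i = 0; i < 256; i++)
            dict[((char)i).ToString()] = i;   // seed with all single bytes

        var output = new List<int>();
        string w = "";
        foreach (char c in text)
        {
            string wc = w + c;
            if (dict.ContainsKey(wc))
            {
                w = wc;                       // keep extending the current match
            }
            else
            {
                output.Add(dict[w]);          // emit the code for the longest known string
                dict[wc] = dict.Count;        // learn the new string
                w = c.ToString();
            }
        }
        if (w != "") output.Add(dict[w]);
        return output;
    }

    static void Main()
    {
        List<int> codes = Encode("abababababab");
        Console.WriteLine(string.Join(",", codes)); // repeats collapse into few codes
    }
}

Run on "abababababab", this emits 97,98,256,258,257,260: six codes for twelve
characters, with the growing dictionary doing the work that LZ77's
back-references do.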
 
[...]
LZW is a completely different algorithm than LZ77. An implementation
will be different code. The infamous LZW patent does not apply to LZ77.

You have a very strange concept of these absolute terms you're using:
"absolutely untrue", "complete bullshit", "completely different
algorithm", etc.

LZW is _not_ a COMPLETELY different algorithm. A COMPLETELY different
algorithm would share absolutely zero similarities.

All of the algorithms spawned by Lempel and Ziv, including the LZW
algorithm, share various similarities. Some have more similarities in
common than others, but they are ALL "some form" of each other. They all
share the same heritage, and in many ways address similar problems with
similar approaches. All of the LZ-based algorithms, being
dictionary-based, are much more similar to each other than they are to,
for example, Huffman encoding.

The question of a patent is completely irrelevant, by the way. Even
assuming that software patents make sense in the first place, it doesn't
take much for a patent to be inapplicable to closely related code. Most
software patents are written narrowly, for the very reason that it's too
easy to invalidate a broadly written patent. As such, relatively minor
variations can result in two otherwise closely related algorithms not
sharing patent protection (see MP3 versus other similar
psychoacoustics-based audio compression algorithms, for example).

You seem to have this pathological need to find fault in whatever has been
written, at least with respect to my own posts, no matter how contrived and
narrow an interpretation of what was actually written that requires, even
to the point of completely ignoring whatever intent actually existed in
what was written.

Frankly, I find _that_ to be "complete bullshit", and I'm sick and tired
of it. I go to a lot of trouble to make what I write as correct as I can,
and to make it clear where my first-hand knowledge of something is vague
or incomplete. When someone posts a _valid_ correction to something I've
written, I have no problem acknowledging my mistake, and I've posted my
share of "mea culpas" here in this newsgroup and others.

I find your insistence on finding fault with my posts where no fault
exists to be idiotic. I wish you would cut it out.

Pete
 