ZIP and solid archives

 
 
morten.skarstad@sapphire.no
      15th Mar 2006
ZIP has become somewhat of an industry standard, and is supported by
everything and everybody. It has been overtaken by other formats in
terms of performance and features, but it still remains the most widely
used and supported archive format.

The benefit of alternative formats is not only that their compression
ratios are higher; a lot of them also support something
called solid archives. That is, redundancy is removed not only within
each single file, but also between all files in the archive. In some
cases, the differences in compression ratios caused by solid archiving
can be extreme.

My question is: Can solid archives somehow be emulated without breaking
compatibility with the ZIP standard?

Case in point: I have a bunch of similar files. Let's look at only a
couple of them, say file1 and file2. They are both 464 kB in size.
Compressed to individual ZIP files they both shrink to 403 kB each.
ZIPping both into one single ZIP gives me a larger ZIP file of 806 kB,
which is to be expected.

Now, let's try something else. I compress file1 and file2 individually
into 7-Zip files. The resulting files are 399 kB, i.e. slightly smaller
than the ZIP files. However, compressing file1 and file2 into one
single 7z file takes only 400 kB! Adding even more of these files (I
have a bunch of them) only seems to increase the 7z archive by about 1
kB each.

All examples are made using TugZip, maximum compression, unless
otherwise stated.

Next, I tried to first put file1 and file2 into a container without
compressing them (i.e. ZIP with no compression), resulting in a single
uncompressed 929 kB file, and then compressing this. I was disappointed
to find that compressing this container with ZIP gave me an 805 kB file,
only slightly smaller than the standard ZIP. Compressing the container
using 7-Zip yet again produced a 400 kB file. Why this difference? Does
it have something to do with search span or dictionary size of the two
algorithms? Can this difference somehow be worked around?

Out of curiosity I also tried making tgz (.tar.gz) and tbz (.tar.bz2)
archives of file1 and file2, since these formats are also solid. The
resulting archives were 805 kB and 505 kB respectively.
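
For anyone who wants to repeat the tgz/tbz comparison without TugZip, here is a minimal sketch using Python's standard tarfile module. The payload is synthetic: two identical pseudo-random blobs standing in for file1 and file2, which is an assumption and not my real data. The helper names `pseudo_random_bytes` and `tarball_size` are made up for illustration, and xz is included only as a rough stand-in for an LZMA-based format such as 7z.

```python
import io
import random
import tarfile


def pseudo_random_bytes(n, seed):
    """Return n reproducible bytes that do not compress on their own."""
    return random.Random(seed).getrandbits(8 * n).to_bytes(n, "little")


def tarball_size(files, mode):
    """Pack files into an in-memory tar archive and return its size in bytes.

    mode is a tarfile write mode such as "w:gz", "w:bz2" or "w:xz".
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


# Hypothetical stand-ins for file1 and file2: identical 464 KiB payloads.
payload = pseudo_random_bytes(464 * 1024, seed=0)
files = {"file1": payload, "file2": payload}

# Build the same solid tarball with three different compressors.
sizes = {mode: tarball_size(files, mode) for mode in ("w:gz", "w:bz2", "w:xz")}
for mode, size in sizes.items():
    print(mode, size // 1024, "KiB")
```

All three archives are solid in the sense that the tar stream is compressed as one unit, so any size difference between them comes from the compressor, not from the container.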

The reason for my concern with this is that I routinely receive and
send lots of files to various recipients in my work, either via e-mail
or from closed web download sites. In particular, mails bouncing due to
attachment sizes are a common problem. I have tried convincing some of
my contacts to consider the possibility of using something like .7z, so
far without results. From what I can gather, people are either using
WinZip or the built-in shell extension in Windows XP. Self-extracting
executables are also out of the question, since these are commonly
blocked due to security policies of various companies.

 
Fran
      15th Mar 2006
zip doesn't support solid archives, but 7z is also a widely supported format
and 7z archives can be unpacked by most archivers. The simplest solution to
your problem would be: use 7z.

 
morten.skarstad@sapphire.no
      15th Mar 2006
Fran wrote:
> zip doesn't support solid archives, but 7z is also a widely supported format
> and 7z archives can be unpacked by most archivers. The simplest solution to
> your problem would be: use 7z.


I know that zip does not support solid archives. I know that 7z is
supported by a lot of archivers. That is not the issue. Please read my
entire post.

In case my point was unclear: I wanted to emulate solid archiving by
using the following technique:
1) Put all the files I want to compress into a single ZIP using _no_
compression. The result is a single file with size equal to the sum of
the original files.
2) Compress the uncompressed ZIP. Since I am now compressing one single
file rather than several small ones, redundancy throughout should be
removed even if the compressor does not support solid archiving.
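
The two steps above can be sketched with Python's standard zipfile module. This is only an illustration under assumptions: the sample data is synthetic (14 small files sharing about 3 KiB of content), the function names and the inner archive name `inner.zip` are hypothetical, and Python's Deflate implementation will not produce the same byte counts as TugZip.

```python
import io
import random
import zipfile


def plain_zip(files):
    """Ordinary ZIP: every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def nested_zip(files, inner_name="inner.zip"):
    """Step 1: store all members uncompressed. Step 2: deflate that container."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w", zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(inner_name, inner.getvalue())
    return outer.getvalue()


def unpack_nested(blob, inner_name="inner.zip"):
    """Recover the original files with two ordinary unzip passes."""
    with zipfile.ZipFile(io.BytesIO(blob)) as outer:
        inner_bytes = outer.read(inner_name)
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return {name: inner.read(name) for name in inner.namelist()}


# Hypothetical sample: 14 small files sharing about 3 KiB of common content.
shared = random.Random(0).getrandbits(8 * 3072).to_bytes(3072, "little")
files = {f"doc{i}.txt": shared + f"variant {i}\n".encode() for i in range(14)}

print("plain ZIP: ", len(plain_zip(files)), "bytes")
print("nested ZIP:", len(nested_zip(files)), "bytes")
```

Both the outer and the inner archive are standard ZIP files, so a recipient only needs to unzip twice with whatever ZIP tool they already have.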

The task performed by most archivers is actually a two step process:
Joining several files into a single container, and compression. The
difference between solid and non-solid archiving is basically in which
order these two tasks are performed. Non-solid archivers compress the
files first, before joining the compressed files. Solid archivers join
the files first, before compressing the full container.
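
The difference in ordering can be sketched in a few lines. This is a simplified model, not how any particular archiver is implemented: it ignores container headers entirely, uses Python's lzma module as an assumed stand-in for a compressor with a large dictionary, and the test data (two 464 KiB files differing in 16 bytes) is made up.

```python
import lzma
import random


def non_solid_size(files):
    """Compress each file on its own, then join: sum of the compressed sizes."""
    return sum(len(lzma.compress(data)) for data in files)


def solid_size(files):
    """Join the files first, then compress the joined stream once."""
    return len(lzma.compress(b"".join(files)))


# Hypothetical test data: two 464 KiB files that differ in only a few bytes.
size = 464 * 1024
base = random.Random(0).getrandbits(8 * size).to_bytes(size, "little")
variant = bytearray(base)
variant[1000:1016] = b"x" * 16
files = [base, bytes(variant)]

print("non-solid:", non_solid_size(files) // 1024, "KiB")
print("solid:    ", solid_size(files) // 1024, "KiB")
```

The solid variant only pays for the second file's differences because the compressor can still see the first file when it reaches the second.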

I have previously achieved good results using the above technique with
LhA, which does not support solid archiving either. Actually, I
originally picked up this tip from the LhA manual more than a decade
ago. However, using this technique with ZIP seems to yield little to
none of the potential gain, and I do not understand why.

 
FirstName LastName
      15th Mar 2006
(E-Mail Removed) wrote:
> Fran wrote:
>
> In case my point was unclear: I wanted to emulate solid archiving by
> using the following technique:
> 1) Put all the files I want to compress into a single ZIP using _no_
> compression. The result is a single file with size equal to the sum of
> the original files.
> 2) Compress the uncompressed ZIP. Since I am now compressing one single
> file rather than several small ones, redundancy throughout should be
> removed even if the compressor does not support solid archiving.
>


Tar and gzip

http://www.gzip.org/

<quote>
The gzip file format holds a single compressed file. On Unix systems,
compressed archives are typically created by rolling collections of
files into a tar archive, and then compressing that archive with gzip.
The final .tar.gz or .tgz file is usually called a "compressed tarball."
<quote>

http://www.irnis.net/soft/wingzip/
WinGZip utility

----

http://advancemame.sourceforge.net/comp-readme.html
Advance Projects - AdvanceCOMP

<quote>
The main purpose of this utility is to recompress and test the zip
archives to get the smallest possible size.
<quote>

----

http://www.bzip.org/
The bzip2 and libbzip2 home page

<quote>
bzip2 can be used combined or independently of tar: bzip2 file to
compress and bzip2 -d file.bz2 to uncompress (the alias bunzip2 for
decompression may also be used).
<quote>

http://gnuwin32.sourceforge.net/packages/bzip2.htm
bzip2 for Windows
 
morten.skarstad@sapphire.no
      15th Mar 2006
FirstName LastName wrote:
> Tar and gzip


I am aware of tar + gzip, as well as tar + bzip2. I also tried these,
see my original post. tbz did provide better compression, tgz did not.

However, neither brings me nearer to my goal, which is better
compression ratios without breaking ZIP compatibility. As mentioned in
my original post, I need to be able to exchange files with contacts
who, due to their own unwillingness and/or company policies, rely solely
on WinZip and/or zipfldr.dll.

> http://advancemame.sourceforge.net/comp-readme.html
> Advance Projects - AdvanceCOMP
>
> <quote>
> The main purpose of this utility is to recompress and test the zip
> archives to get the smallest possible size.
> <quote>


This is somewhat interesting. Not really related to solid archiving,
but rather an attempt to increase compression using an alternative ZIP
implementation. The gain achieved (less than 1% off the original ZIP
size in my initial tests) leaves a lot to be desired, but I'll look closer
into this one. Thanks.

 
morten.skarstad@sapphire.no
      15th Mar 2006
I wrote:
> My question is: Can solid archives somehow be emulated without breaking
> compatibility with the ZIP standard?


After a little digging following up on Mr. LastName's post, I discovered
that the procedure I have been describing is commonly known as nested
zipping. Further digging revealed that there actually exists a tool
that does this job without manual rezipping: VelcroFly
(http://www.randelshofer.ch/velcrofly/download.html)

The description fits my needs, but unfortunately the performance seems
to be identical to that of my manually nested ZIP files: file1+file2 in
a nested ZIP still takes up 805 kB.

However, results for much smaller files seem to be good. For instance,
the docs for AdvanceCOMP (the program suggested by Mr. LastName) take
up 43.6 kB in 14 files. Compressed to a plain ZIP they occupy 16 kB,
but in a nested ZIP they only take up 8 kB. In other words, solid
archiving with ZIP can be done. The problem seems to be that once the
original files exceed a certain size the ZIP deflate algorithm is
useless. This is further supported by my experience with tgz, which
uses the same algorithm and achieves the same mediocre results.

Is there really no way around this?

 
Vic Dura
      15th Mar 2006
On 15 Mar 2006 03:40:17 -0800, (E-Mail Removed) wrote:

>I have previously achieved good results using the above technique with
>LhA, which does not support solid archiving either. Actually, I
>originally picked up this tip from the LhA manual more than a decade
>ago. However, using this technique with ZIP seems to yield little to
>none of the potential gain, and I do not understand why.


I seem to recall reading when zip compression was released into the
public domain after the PKarc lawsuit, that the compression of several
files in a zip file was performed individually on each file. I never
tested it because I didn't care. However your tests seem to confirm
it.

I don't know why they chose to do it that way. Possibly to make
splitting files off from a .zip easier? That's just a guess.
--
To email me directly, remove CLUTTER.
 
H-Man
      15th Mar 2006
On 15 Mar 2006 01:24:10 -0800, (E-Mail Removed) wrote:

<SNIPPED>
> My question is: Can solid archives somehow be emulated without breaking
> compatibility with the ZIP standard?
>

<SNIPPED>

There's really a lot more to it than just making an archive solid. As
you've experienced, the Unix world has been doing things this way for some
time with tar+gzip. The method described is really the only way to make a
solid ZIP archive, that is, archive first, then zip the archive. Better
compression ratios are achieved using better compression algorithms. It most
certainly is possible to improve the ZIP algorithm to provide better
compression, the trick is to leave it compatible with the existing UNZIP
algorithm. This may not be possible at all, and even if it were, the
tradeoff to keep it compatible might not provide enough gain to be worth
the effort.

In order to build a better ZIP it almost certainly is necessary to break
the "ZIP" compatibility. I suggest that what you do is create a .7z SFX and
then ZIP it as a ZIP file. The receiving party can then unzip it and
execute it. This is certainly not ideal but it won't get dumped because it
is an EXE and the receiving party won't need 7z to open the archive.
Another trick I've often used is to make the archive a 7z SFX and simply
rename it to filename.xyz. In the body of the email, instruct the receiving
party to save it to a folder and rename it to filename.exe, then execute
it. Your choice, but better compression requires better algorithms, end of
story.

--
HK
 
Matth
      20th Mar 2006

On 15 Mar 2006 05:50:06 -0800, (E-Mail Removed) wrote:

>I wrote:
>> My question is: Can solid archives somehow be emulated without breaking
>> compatibility with the ZIP standard?

>
>After a little digging following up on Mr. LastName's post, I discovered
>that the procedure I have been describing is commonly known as nested
>zipping. Further digging revealed that there actually exists a tool
>that does this job without manual rezipping: VelcroFly
>(http://www.randelshofer.ch/velcrofly/download.html)
>
>The description fits my needs, but unfortunately the performance seems
>to be identical to that of my manually nested ZIP files: file1+file2 in
>a nested ZIP still takes up 805 kB.
>
>However, results for much smaller files seem to be good. For instance,
>the docs for AdvanceCOMP (the program suggested by Mr. LastName) take
>up 43.6 kB in 14 files. Compressed to a plain ZIP they occupy 16 kB,
>but in a nested ZIP they only take up 8 kB. In other words, solid
>archiving with ZIP can be done. The problem seems to be that once the
>original files exceed a certain size the ZIP deflate algorithm is
>useless. This is further supported by my experience with tgz, which
>uses the same algorithm and achieves the same mediocre results.
>
>Is there really no way around this?


The ZIP (and GZIP) dictionary size is only 32k, far too small to take
advantage of inter-file similarity except for small files. Nested zip
was a bit of a false dawn. Not sure if it would work any better under
"deflate64" - not sure if they extended the dictionary size (or even
added a "solid" option).
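
The window effect is easy to reproduce with Python's zlib module, which implements Deflate. The sketch below is only an illustration under assumptions: the data is pseudo-random (so a single copy is incompressible and any saving must come from the second copy), the helper names are hypothetical, and Python's lzma module with default settings stands in for 7-Zip's LZMA.

```python
import lzma
import random
import zlib


def pseudo_random_bytes(n, seed):
    """Return n reproducible, essentially incompressible bytes."""
    return random.Random(seed).getrandbits(8 * n).to_bytes(n, "little")


def deflate_size(data):
    """Size of data after zlib/Deflate compression at the highest level."""
    return len(zlib.compress(data, 9))


def lzma_size(data):
    """Size of data after LZMA compression with default settings."""
    return len(lzma.compress(data))


# Two identical "files" placed back to back, as in a solid or nested archive.
small = pseudo_random_bytes(8 * 1024, seed=1)    # 8 KiB: fits in the 32 KiB window
large = pseudo_random_bytes(464 * 1024, seed=2)  # 464 KiB: far beyond the window

for name, blob in (("small", small), ("large", large)):
    print(name,
          "| deflate, one copy:", deflate_size(blob),
          "| deflate, two copies:", deflate_size(blob + blob),
          "| lzma, two copies:", lzma_size(blob + blob))
```

With the 8 KiB blob the second copy costs Deflate almost nothing, because it lies within 32 KiB of the first. With the 464 KiB blob the first copy has long since left the window by the time the second one starts, so Deflate pays for both in full, while LZMA's much larger dictionary still finds the match.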

ZIP has had its day, and it's a pity Microsoft integrated handling of
such an old compression system, as it props up the old relic for
longer.

Either persuade your contacts to get 7-Zip or another (often freeware
too) program that can handle 7Z, or as suggested, bundle a 7Z self
extractor inside a ZIP to prevent it being blocked.

If they insist on commercial software, then the answer is RAR (with a
free UNRAR).

7-Zip's "7Z" format needs to take its place as the new, de facto
archiving format, as an open format that is far superior to legacy
zip.

--
I may be dozzzy, but take the ZZZ's out to mail me
http://www.junkroom.freeserve.co.uk/jvc2080.htm - 2x2x24 CD-RW troubles

If you drop a cactus, don't try to catch it!
 
Morten Skarstad
      21st Mar 2006
Matth wrote:
> The ZIP (and GZIP) dictionary size is only 32k, far too small to take
> advantage of inter-file similarity except for small files. Nested zip
> was a bit of a false dawn.


Ah well. Dead end, then.

Thanks for the info.
 