ZIP and solid archives

Discussion in 'Freeware' started by morten.skarstad@sapphire.no, Mar 15, 2006.

  1. Guest

    ZIP has become somewhat of an industry standard, and is supported by
    everything and everybody. It has been overtaken by other formats in
    terms of performance and features, but it still remains the most widely
    used and supported archive format.

    The benefit of alternative formats is not only that their compression
    ratios are higher; many of them also support something called solid
    archives. That is, redundancy is removed not only within
    each single file, but also between all files in the archive. In some
    cases, the differences in compression ratios caused by solid archiving
    can be extreme.

    My question is: Can solid archives somehow be emulated without breaking
    compatibility with the ZIP standard?

    Case in point: I have a bunch of similar files. Let's look at only a
    couple of them, say file1 and file2. They are both 464 kB in size.
    Compressed to individual ZIP files they both shrink to 403 kB each.
    ZIPping both into one single ZIP gives me a larger ZIP file of 806 kB,
    which is to be expected.

    Now, let's try something else. I compress file1 and file2 individually
    into 7-Zip files. The resulting files are 399 kB, i.e. slightly smaller
    than the ZIP files. However, compressing file1 and file2 into one
    single 7z file takes only 400 kB! Adding even more of these files (I
    have a bunch of them) only seems to increase the 7z archive by about 1
    kB each.

    All examples are made using TugZip, maximum compression, unless
    otherwise stated.

    Next, I tried to first put file1 and file2 into a container without
    compressing them (i.e. ZIP with no compression), resulting in a single
    uncompressed 929 kB file, and then compressing this. I was disappointed
    to find that compressing this container with ZIP gave me an 805 kB file,
    only slightly smaller than the standard ZIP. Compressing the container
    using 7-Zip yet again produced a 400 kB file. Why this difference? Does
    it have something to do with search span or dictionary size of the two
    algorithms? Can this difference somehow be worked around?

    Out of curiosity I also tried making tgz (.tar.gz) and tbz (.tar.bz2)
    archives of file1 and file2, since these formats are also solid. The
    resulting archives were 805 kB and 505 kB respectively.
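
    A comparison like the one above can be scripted. The following is a
    minimal sketch using Python's standard tarfile module, not the tools
    used in the post. The helper names and the synthetic test data (two
    nearly identical, hard-to-compress files of roughly 464 kB) are
    assumptions made for illustration, and .tar.xz stands in for an
    LZMA-based format such as 7z.

```python
import io
import random
import tarfile


def tar_size(files: dict[str, bytes], mode: str) -> int:
    """Return the size in bytes of a tar archive written with the given mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


def make_similar_files(size: int = 464_000, seed: int = 0) -> dict[str, bytes]:
    """Hypothetical stand-in data: two files that differ only in 16 bytes."""
    rng = random.Random(seed)
    base = bytes(rng.getrandbits(8) for _ in range(size))
    variant = bytearray(base)
    variant[:16] = b"different header"
    return {"file1": base, "file2": bytes(variant)}


if __name__ == "__main__":
    files = make_similar_files()
    # gzip uses Deflate, bz2 uses block sorting, xz uses LZMA.
    for mode in ("w:gz", "w:bz2", "w:xz"):
        print(mode, tar_size(files, mode) // 1000, "kB")
```

    The exact figures depend entirely on the input data, so this only
    shows the general pattern: all three archives are solid, yet only a
    compressor that can look far enough back benefits from the second
    file being almost a copy of the first.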

    The reason for my concern with this is that I routinely receive and
    send lots of files to various recipients in my work, either via e-mail
    or from closed web download sites. In particular, mails bouncing due to
    attachment sizes are a common problem. I have tried convincing some of
    my contacts to consider the possibility of using something like .7z, so
    far without results. From what I can gather, people are either using
    WinZip or the built-in shell extension in Windows XP. Self-extracting
    executables are also out of the question, since these are commonly
    blocked due to security policies of various companies.
     
    , Mar 15, 2006
    #1

  2. Fran Guest

    ZIP doesn't support solid archives, but 7z is also a widely supported
    format, and 7z archives can be unpacked by most archivers. The simplest
    solution to your problem would be: use 7z.

     
    Fran, Mar 15, 2006
    #2

  3. Guest

    Fran wrote:
    > ZIP doesn't support solid archives, but 7z is also a widely supported
    > format, and 7z archives can be unpacked by most archivers. The simplest
    > solution to your problem would be: use 7z.


    I know that zip does not support solid archives. I know that 7z is
    supported by a lot of archivers. That is not the issue. Please read my
    entire post.

    In case my point was unclear: I wanted to emulate solid archiving by
    using the following technique:
    1) Put all the files I want to compress into a single ZIP using _no_
    compression. The result is a single file with size equal to the sum of
    the original files.
    2) Compress the uncompressed ZIP. Since I am now compressing one single
    file rather than several small ones, redundancy throughout should be
    removed even if the compressor does not support solid archiving.

    The task performed by most archivers is actually a two step process:
    Joining several files into a single container, and compression. The
    difference between solid and non-solid archiving is basically in which
    order these two tasks are performed. Non-solid archivers compress the
    files first, before joining the compressed files. Solid archivers join
    the files first, before compressing the full container.
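
    The two orderings can be compared directly. This is a minimal sketch
    that ignores container metadata and only measures the compressed
    payload; the function names and the sample data are hypothetical,
    and zlib and lzma from the Python standard library stand in for the
    archivers discussed here.

```python
import lzma
import zlib
from typing import Callable, Sequence

Compressor = Callable[[bytes], bytes]


def non_solid_size(files: Sequence[bytes], compress: Compressor) -> int:
    """Compress each file on its own, then join: sum of the individual sizes."""
    return sum(len(compress(data)) for data in files)


def solid_size(files: Sequence[bytes], compress: Compressor) -> int:
    """Join the files first, then compress the result as one stream."""
    return len(compress(b"".join(files)))


if __name__ == "__main__":
    # Hypothetical sample data: ten small text files sharing most content.
    shared = b"".join(b"setting_%d = value_%d\n" % (i, i * 7) for i in range(200))
    files = [shared + b"# file %d\n" % n for n in range(10)]
    for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
        print(name, non_solid_size(files, compress), solid_size(files, compress))
```

    With small, similar files the joined stream should come out smaller
    for both compressors, because each file can be encoded largely as
    references to the one before it.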

    I have previously achieved good results using the above technique with
    LhA, which does not support solid archiving either. Actually, I
    originally picked up this tip from the LhA manual more than a decade
    ago. However, using this technique with ZIP seems to yield little or
    none of the potential gain, and I do not understand why.
     
    , Mar 15, 2006
    #3
  4. wrote:
    > Fran wrote:
    >
    > In case my point was unclear: I wanted to emulate solid archiving by
    > using the following technique:
    > 1) Put all the files I want to compress into a single ZIP using _no_
    > compression. The result is a single file with size equal to the sum of
    > the original files.
    > 2) Compress the uncompressed ZIP. Since I am now compressing one single
    > file rather than several small ones, redundancy throughout should be
    > removed even if the compressor does not support solid archiving.
    >


    Tar and gzip

    http://www.gzip.org/

    <quote>
    The gzip file format holds a single compressed file. On Unix systems,
    compressed archives are typically created by rolling collections of
    files into a tar archive, and then compressing that archive with gzip.
    The final .tar.gz or .tgz file is usually called a "compressed tarball."
    <quote>

    http://www.irnis.net/soft/wingzip/
    WinGZip utility

    ----

    http://advancemame.sourceforge.net/comp-readme.html
    Advance Projects - AdvanceCOMP

    <quote>
    The main purpose of this utility is to recompress and test the zip
    archives to get the smallest possible size.
    <quote>

    ----

    http://www.bzip.org/
    The bzip2 and libbzip2 home page

    <quote>
    bzip2 can be used combined or independently of tar: bzip2 file to
    compress and bzip2 -d file.bz2 to uncompress (the alias bunzip2 for
    decompression may also be used).
    <quote>

    http://gnuwin32.sourceforge.net/packages/bzip2.htm
    bzip2 for Windows
     
    FirstName LastName, Mar 15, 2006
    #4
  5. Guest

    FirstName LastName wrote:
    > Tar and gzip


    I am aware of tar + gzip, as well as tar + bzip2. I also tried these,
    see my original post. tbz did provide better compression, tgz did not.

    However, neither brings me nearer to my goal, which is better
    compression ratios without breaking ZIP compatibility. As mentioned in
    my original post, I need to be able to exchange files with contacts
    who, due to their own unwillingness and/or company policies, rely
    solely on WinZip and/or zipfldr.dll.

    > http://advancemame.sourceforge.net/comp-readme.html
    > Advance Projects - AdvanceCOMP
    >
    > <quote>
    > The main purpose of this utility is to recompress and test the zip
    > archives to get the smallest possible size.
    > <quote>


    This is somewhat interesting. Not really related to solid archiving,
    but rather an attempt to increase compression using an alternative ZIP
    implementation. The results so far (a saving of less than 1% compared
    with the original ZIP in my initial tests) leave a lot to be desired,
    but I'll look closer into this one. Thanks.
     
    , Mar 15, 2006
    #5
  6. Guest

    I wrote:
    > My question is: Can solid archives somehow be emulated without breaking
    > compatibility with the ZIP standard?


    After a little digging following up on Mr. LastName's post, I discovered
    that the procedure I have been describing is commonly known as nested
    zipping. Further digging revealed that there actually exists a tool
    that does this job without manual rezipping: VelcroFly
    (http://www.randelshofer.ch/velcrofly/download.html)

    The description fits my needs, but unfortunately the performance seems
    to be identical to that of my manually nested ZIP files: file1+file2 in
    a nested ZIP still takes up 805 kB.

    However, results for much smaller files seem to be good. For instance,
    the docs for AdvanceCOMP (the program suggested by Mr. LastName) take
    up 43.6 kB in 14 files. Compressed to a plain ZIP they occupy 16 kB,
    but in a nested ZIP they only take up 8 kB. In other words, solid
    archiving with ZIP can be done. The problem seems to be that once the
    original files exceed a certain size, the ZIP deflate algorithm no
    longer picks up the redundancy between them. This is further supported
    by my experience with tgz, which uses the same algorithm and achieves
    the same mediocre results.
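
    For reference, nested zipping can be sketched with Python's standard
    zipfile module. This is an illustration only and not how VelcroFly
    works internally; the function names and the inner archive name
    "inner.zip" are assumptions.

```python
import io
import zipfile


def plain_zip(files: dict[str, bytes]) -> bytes:
    """Ordinary ZIP: every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def nested_zip(files: dict[str, bytes], inner_name: str = "inner.zip") -> bytes:
    """Store the members uncompressed in an inner ZIP, then deflate that ZIP."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w", zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        zf.writestr(inner_name, inner.getvalue())
    return outer.getvalue()


def read_nested(blob: bytes, inner_name: str = "inner.zip") -> dict[str, bytes]:
    """Unpack both layers, as a recipient would by unzipping twice."""
    with zipfile.ZipFile(io.BytesIO(blob)) as outer:
        inner_bytes = outer.read(inner_name)
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return {name: inner.read(name) for name in inner.namelist()}
```

    Both layers are ordinary ZIP files, so any ZIP tool can open them;
    the recipient just has to unzip twice. Whether the nested form is
    smaller depends on the members being small and similar, as the
    figures above suggest.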

    Is there really no way around this?
     
    , Mar 15, 2006
    #6
  7. Vic Dura Guest

    On 15 Mar 2006 03:40:17 -0800, wrote Re
    Re: ZIP and solid archives:

    >I have previously achieved good results using the above technique with
    >LhA, which does not support solid archiving either. Actually, I
    >originally picked up this tip from the LhA manual more than a decade
    >ago. However, using this technique with ZIP seems to yield little or
    >none of the potential gain, and I do not understand why.


    I seem to recall reading when zip compression was released into the
    public domain after the PKarc lawsuit, that the compression of several
    files in a zip file was performed individually on each file. I never
    tested it because I didn't care. However, your tests seem to confirm
    it.

    I don't know why they chose to do it that way. Possibly to make
    splitting files off from a .zip easier? That's just a guess.
    --
    To email me directly, remove CLUTTER.
     
    Vic Dura, Mar 15, 2006
    #7
  8. H-Man Guest

    On 15 Mar 2006 01:24:10 -0800, wrote:

    <SNIPPED>
    > My question is: Can solid archives somehow be emulated without breaking
    > compatibility with the ZIP standard?
    >

    <SNIPPED>

    There's really a lot more to it than just making an archive solid. As
    you've experienced, the Unix world has been doing things this way for some
    time with tar+gzip. The method described is really the only way to make a
    solid ZIP archive: archive first, then zip the archive. Better
    compression rates are achieved using better compression algorithms. It
    most certainly is possible to improve the ZIP algorithm to provide better
    compression; the trick is to leave it compatible with the existing UNZIP
    algorithm. This may not be possible at all, and even if it were, the
    tradeoff to keep it compatible might not provide enough gain to be worth
    the effort.

    In order to build a better ZIP it almost certainly is necessary to break
    "ZIP" compatibility. I suggest that you create a .7z SFX and then put it
    in a ZIP file. The receiving party can then unzip it and execute it. This
    is certainly not ideal, but it won't get dumped for being an EXE, and the
    receiving party won't need 7z to open the archive. Another trick I've
    often used is to make the archive a 7z SFX and simply rename it to
    filename.xyz. In the body of the email, instruct the receiving party to
    save it to a folder, rename it to filename.exe, and then execute it.
    Your choice, but better compression requires better algorithms, end of
    story.

    --
    HK
     
    H-Man, Mar 15, 2006
    #8
  9. Matth Guest

    On 15 Mar 2006 05:50:06 -0800, wrote:

    >I wrote:
    >> My question is: Can solid archives somehow be emulated without breaking
    >> compatibility with the ZIP standard?

    >
    >After a little digging following up on Mr. LastName's post, I discovered
    >that the procedure I have been describing is commonly known as nested
    >zipping. Further digging revealed that there actually exists a tool
    >that does this job without manual rezipping: VelcroFly
    >(http://www.randelshofer.ch/velcrofly/download.html)
    >
    >The description fits my needs, but unfortunately the performance seems
    >to be identical to that of my manually nested ZIP files: file1+file2 in
    >a nested ZIP still takes up 805 kB.
    >
    >However, results for much smaller files seem to be good. For instance,
    >the docs for AdvanceCOMP (the program suggested by Mr. LastName) take
    >up 43.6 kB in 14 files. Compressed to a plain ZIP they occupy 16 kB,
    >but in a nested ZIP they only take up 8 kB. In other words, solid
    >archiving with ZIP can be done. The problem seems to be that once the
    >original files exceed a certain size the ZIP deflate algorithm is
    >useless. This is further supported by my experience with tgz, which
    >uses the same algorithm and achieves the same mediocre results.
    >
    >Is there really no way around this?


    The ZIP (and GZIP) dictionary size is only 32k, far too small to take
    advantage of inter-file similarity except for small files. Nested zip
    was a bit of a false dawn. Not sure if it would work any better under
    "deflate64" - not sure if they extended the dictionary size (or even
    added a "solid" option).
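
    The window limit is easy to observe with zlib, which implements the
    same Deflate algorithm. The sketch below measures how many extra
    compressed bytes a second copy of a random block costs, depending on
    how far behind the first copy lies. The function name, block size,
    and test distances are assumptions chosen for illustration. For what
    it's worth, Deflate64 extends the window to 64 KB, which would still
    be far short of the 464 kB files in this thread.

```python
import random
import zlib

WINDOW = 32 * 1024  # Deflate's maximum back-reference distance in bytes


def cost_of_second_copy(distance: int, block_len: int = 4096, seed: int = 0) -> int:
    """Extra compressed bytes needed for a repeat of a random block.

    The input is: block, random filler, the same block again. The second
    copy starts `distance` bytes after the start of the first copy.
    """
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    filler = bytes(rng.getrandbits(8) for _ in range(distance - block_len))
    without_copy = len(zlib.compress(block + filler, 9))
    with_copy = len(zlib.compress(block + filler + block, 9))
    return with_copy - without_copy


if __name__ == "__main__":
    for distance in (WINDOW // 2, WINDOW * 2):
        print(distance, cost_of_second_copy(distance))
```

    Inside the window the repeat should cost only a small fraction of
    the block size; outside it, the repeat costs about as much as the
    block itself, which is what happens to file2 when file1 is hundreds
    of kilobytes long.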

    ZIP has had its day, and it's a pity Microsoft integrated handling of
    such an old compression system, as it props up the old relic for
    longer.

    Either persuade your contacts to get 7-Zip or another (often freeware
    too) program that can handle 7Z, or as suggested, bundle a 7Z self
    extractor inside a ZIP to prevent it being blocked.

    If they insist on commercial software, then the answer is RAR (with a
    free UNRAR).

    7-Zip's "7Z" format needs to take its place as the new, de facto
    archiving format, as an open format that is far superior to legacy
    zip.

    --
    I may be dozzzy, but take the ZZZ's out to mail me
    http://www.junkroom.freeserve.co.uk/jvc2080.htm - 2x2x24 CD-RW troubles

    If you drop a cactus, don't try to catch it!
     
    Matth, Mar 19, 2006
    #9
  10. Matth wrote:
    > The ZIP (and GZIP) dictionary size is only 32k, far too small to take
    > advantage of inter-file similarity except for small files. Nested zip
    > was a bit of a false dawn.


    Ah well. Dead end, then.

    Thanks for the info.
     
    Morten Skarstad, Mar 21, 2006
    #10