ZIP and solid archives

Discussion in 'Freeware' started by, Mar 15, 2006.

  1. Guest

    ZIP has become somewhat of an industry standard, and is supported by
    everything and everybody. It has been overtaken by other formats in
    terms of performance and features, but it still remains the most widely
    used and supported archive format.

    The benefit of alternative formats is not only that their compression
    ratios are higher: many of them also support something called solid
    archives. That is, redundancy is removed not only within each single
    file, but also between all files in the archive. In some cases, the
    difference in compression ratio caused by solid archiving can be
    extreme.
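    A minimal sketch of the difference, assuming Python's standard lzma
    module as a stand-in for a solid-capable compressor (7-Zip's default
    method belongs to the LZMA family, but this is not 7-Zip itself). The
    helper names non_solid_size and solid_size and the test data are
    hypothetical: two files sharing a random 200 kB body, so any saving
    must come from redundancy between the files.

```python
import lzma
import os

def non_solid_size(files):
    """Non-solid: each file is compressed on its own, with no shared history."""
    return sum(len(lzma.compress(data)) for data in files)

def solid_size(files):
    """Solid: the files are joined first and compressed as one stream."""
    return len(lzma.compress(b"".join(files)))

# Hypothetical "similar files": a shared random (incompressible) 200 kB body
# with slightly different headers.
common = os.urandom(200_000)
file1 = b"header-1\n" + common
file2 = b"header-2\n" + common

print("non-solid:", non_solid_size([file1, file2]))
print("solid:    ", solid_size([file1, file2]))
```

    With LZMA's default 8 MiB dictionary the second copy can be encoded as
    back-references to the first, so the solid size should come out at
    roughly half the non-solid total.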

    My question is: Can solid archives somehow be emulated without breaking
    compatibility with the ZIP standard?

    Case in point: I have a bunch of similar files. Let's look at only a
    couple of them, say file1 and file2. They are both 464 kB in size.
    Compressed to individual ZIP files they both shrink to 403 kB each.
    ZIPping both into one single ZIP gives me a larger ZIP file of 806 kB,
    which is to be expected.

    Now, let's try something else. I compress file1 and file2 individually
    into 7-Zip files. The resulting files are 399 kB, i.e. slightly smaller
    than the ZIP files. However, compressing file1 and file2 into one
    single 7z file takes only 400 kB! Adding even more of these files (I
    have a bunch of them) only seems to increase the 7z archive by about 1
    kB each.

    All examples are made using TugZip, maximum compression, unless
    otherwise stated.

    Next, I tried to first put file1 and file2 into a container without
    compressing them (i.e. ZIP with no compression), resulting in a single
    uncompressed 929 kB file, and then compressing this. I was disappointed
    to find that compressing this container with ZIP gave me a 805 kB file,
    only slightly smaller than the standard ZIP. Compressing the container
    using 7-Zip yet again produced a 400 kB file. Why this difference? Does
    it have something to do with search span or dictionary size of the two
    algorithms? Can this difference somehow be worked around?

    Out of curiosity I also tried making tgz (.tar.gz) and tbz (.tar.bz2)
    archives of file1 and file2, since these formats are also solid. The
    resulting archives were 805 kB and 505 kB respectively.

    The reason for my concern with this is that I routinely receive and
    send lots of files to various recipients in my work, either via e-mail
    or from closed web download sites. In particular, mails bouncing due to
    attachment sizes are a common problem. I have tried convincing some of
    my contacts to consider the possibility of using something like .7z, so
    far without results. From what I can gather, people are either using
    WinZip or the built-in shell extension in Windows XP. Self-extracting
    executables are also out of the question, since these are commonly
    blocked due to security policies of various companies.
    , Mar 15, 2006

  2. Fran Guest

    zip doesn't support solid archives, but 7z is also a widely supported format
    and 7z archives can be unpacked by most archivers. The simplest solution to
    your problem would be: use 7z.
    Fran, Mar 15, 2006

  3. Guest

    I know that zip does not support solid archives. I know that 7z is
    supported by a lot of archivers. That is not the issue. Please read my
    entire post.

    In case my point was unclear: I wanted to emulate solid archiving by
    using the following technique:
    1) Put all the files I want to compress into a single ZIP using _no_
    compression. The result is a single file with size equal to the sum of
    the original files.
    2) Compress the uncompressed ZIP. Since I am now compressing one single
    file rather than several small ones, redundancy throughout should be
    removed even if the compressor does not support solid archiving.
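    The two-step technique above can be sketched with Python's standard
    zipfile module. This is a minimal sketch, not TugZip's or WinZip's
    behaviour; the function names plain_zip and nested_zip and the test
    data are hypothetical. Both outputs are ordinary ZIP files (the nested
    one simply has to be unzipped twice).

```python
import io
import os
import zipfile

def plain_zip(files):
    """Ordinary (non-solid) ZIP: every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def nested_zip(files, inner_name="inner.zip"):
    """Step 1: store all files uncompressed in an inner ZIP.
    Step 2: deflate that single inner ZIP inside an outer ZIP."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        zf.writestr(inner_name, inner.getvalue())
    return outer.getvalue()

# Hypothetical test data: files that share a random (incompressible) body.
small_body = os.urandom(2_000)    # copies ~2 kB apart in the inner ZIP
small = {f"doc{i}.txt": small_body + bytes([i]) for i in range(14)}
large_body = os.urandom(100_000)  # copies ~100 kB apart in the inner ZIP
large = {f"file{i}.bin": large_body + bytes([i]) for i in range(2)}

for label, files in (("small", small), ("large", large)):
    print(label, "plain:", len(plain_zip(files)),
          "nested:", len(nested_zip(files)))
```

    On this kind of data the nested ZIP is expected to win clearly for the
    small files and gain essentially nothing for the large ones, which
    matches the pattern reported in this thread.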

    The task performed by most archivers is actually a two step process:
    Joining several files into a single container, and compression. The
    difference between solid and non-solid archiving is basically in which
    order these two tasks are performed. Non-solid archivers compress the
    files first, before joining the compressed files. Solid archivers join
    the files first, before compressing the full container.

    I have previously achieved good results using the above technique with
    LhA, which does not support solid archiving either. Actually, I
    originally picked up this tip from the LhA manual more than a decade
    ago. However, using this technique with ZIP seems to yield little to
    none of the potential gain, and I do not understand why.
    , Mar 15, 2006
  4. Tar and gzip

    The gzip file format holds a single compressed file. On Unix systems,
    compressed archives are typically created by rolling collections of
    files into a tar archive, and then compressing that archive with gzip.
    The final .tar.gz or .tgz file is usually called a "compressed tarball."
    WinGZip utility
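    The tarball workflow described above maps onto Python's standard
    tarfile and gzip modules. A minimal sketch, with hypothetical function
    names make_tgz and read_tgz: tar the files into one uncompressed
    stream first, then gzip that stream as a whole. Since gzip uses the
    same Deflate method as ZIP, the result is solid in structure but
    limited by the same 32 KB window.

```python
import gzip
import io
import tarfile

def make_tgz(files):
    """Step 1: roll the files into an uncompressed tar archive.
    Step 2: gzip the whole tarball as a single stream."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return gzip.compress(tar_buf.getvalue(), compresslevel=9)

def read_tgz(blob):
    """Reverse the process: gunzip, then unpack the tar members."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}

# Hypothetical example files.
files = {"file1.txt": b"some text\n" * 500, "file2.txt": b"more text\n" * 500}
tgz = make_tgz(files)
print(len(tgz), "bytes")
```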

    Advance Projects - AdvanceCOMP

    The main purpose of this utility is to recompress and test the zip
    archives to get the smallest possible size.

    The bzip2 and libbzip2 home page

    bzip2 can be used together with tar or independently of it: "bzip2 file"
    to compress and "bzip2 -d file.bz2" to decompress (the alias bunzip2 for
    decompression may also be used).
    bzip2 for Windows
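    bzip2's block size plays the role that the window plays for Deflate:
    redundancy is only found within one block, and compresslevel 1 to 9
    selects blocks of 100 kB to 900 kB. A minimal sketch using Python's
    standard bz2 module, with hypothetical data (a random 200 kB chunk
    repeated once). As a hedged aside, two 464 kB files together exceed
    bzip2's maximum 900 kB block, which may be part of why tbz came out at
    505 kB rather than closer to the 7z result.

```python
import bz2
import os

# Hypothetical data: a random 200 kB chunk followed by an exact copy (400 kB total).
chunk = os.urandom(200_000)
data = chunk + chunk

# compresslevel N selects a block size of N x 100 kB; blocks are compressed independently.
big_blocks = bz2.compress(data, compresslevel=9)    # one ~900 kB block holds both copies
small_blocks = bz2.compress(data, compresslevel=1)  # ~100 kB blocks: the copies never share one

assert bz2.decompress(big_blocks) == data
print("900 kB blocks:", len(big_blocks), " 100 kB blocks:", len(small_blocks))
```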
    FirstName LastName, Mar 15, 2006
  5. Guest

    I am aware of tar + gzip, as well as tar + bzip2. I also tried these,
    see my original post. tbz did provide better compression, tgz did not.

    However, neither brings me nearer to my goal, which is better
    compression ratios without breaking ZIP compatibility. As mentioned in
    my original post, I need to be able to interchange files with contacts
    who, due to their own unwillingness and/or company policies, rely solely
    on WinZip and/or zipfldr.dll.

    AdvanceCOMP is somewhat interesting. Not really related to solid
    archiving, but rather an attempt to increase compression using an
    alternative ZIP implementation. The results I achieved (a saving of less
    than 1% over the original ZIP in my initial tests) leave a lot to be
    desired, but I'll look closer into this one. Thanks.
    , Mar 15, 2006
  6. Guest

    After a little digging following up Mr. LastName's post, I discovered
    that the procedure I have been describing is commonly known as nested
    zipping. Further digging revealed that there actually exists a tool
    that does this job without manual rezipping: VelcroFly

    The description fits my needs, but unfortunately the performance seems
    to be identical to that of my manually nested ZIP files: file1+file2 in
    a nested ZIP still takes up 805 kB.

    However, results for much smaller files seem to be good. For instance,
    the docs for AdvanceCOMP (the program suggested by Mr. LastName) take
    up 43.6 kB in 14 files. Compressed to a plain ZIP they occupy 16 kB,
    but in a nested ZIP they only take up 8 kB. In other words, solid
    archiving with ZIP can be done. The problem seems to be that once the
    original files exceed a certain size, the ZIP deflate algorithm no
    longer gains anything from the similarity between them. This is further
    supported by my experience with tgz, which uses the same algorithm and
    achieves the same mediocre results.

    Is there really no way around this?
    , Mar 15, 2006
  7. Vic Dura Guest

    I seem to recall reading, when zip compression was released into the
    public domain after the PKarc lawsuit, that the compression of several
    files in a zip file was performed individually on each file. I never
    tested it because I didn't care. However, your tests seem to confirm it.
    I don't know why they chose to do it that way. Possibly to make
    splitting files off from a .zip easier? That's just a guess.
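    Vic's guess can be illustrated, without claiming it was the designers'
    actual reason: because each ZIP member is compressed independently and
    listed with its own offset in the central directory, a reader can
    inflate one member without decompressing the others. A minimal sketch
    with Python's standard zipfile module; the member names and contents
    are hypothetical.

```python
import io
import zipfile

# Hypothetical archive with three independently deflated members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"alpha " * 1000)
    zf.writestr("b.txt", b"beta " * 1000)
    zf.writestr("c.txt", b"gamma " * 1000)

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        # The central directory records each member's own offset and sizes ...
        print(info.filename, info.header_offset, info.compress_size, info.file_size)
    # ... so one member can be read by seeking straight to it.
    b_data = zf.read("b.txt")
```

    In a solid archive, by contrast, reaching a file near the end of a
    solid block typically means decompressing everything before it.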
    Vic Dura, Mar 15, 2006
  8. H-Man Guest

    On 15 Mar 2006 01:24:10 -0800, wrote:


    There's really a lot more to it than just making an archive solid. As
    you've experienced, the Unix world has been doing things this way for some
    time with tar+gzip. The method described is really the only way to make a
    solid ZIP archive, that is, archive first, then zip the archive. Better
    compression rates are achieved using better compression algorithms. It most
    certainly is possible to improve the ZIP algorithm to provide better
    compression, the trick is to leave it compatible with the existing UNZIP
    algorithm. This may not be possible at all, and even if it were, the
    tradeoff to keep it compatible might not provide enough gain to be worth
    the effort.

    In order to build a better ZIP it almost certainly is necessary to break
    "ZIP" compatibility. I suggest you create a 7z SFX and then put it in a
    ZIP file. The receiving party can then unzip it and execute it. This is
    certainly not ideal, but it won't get blocked as an EXE, since it is
    wrapped in a ZIP, and the receiving party won't need 7z to open the
    archive.
    Another trick I've often used is to make the archive a 7z SFX and simply
    rename it to [...]. In the body of the email, instruct the receiving
    party to save it to a folder and rename it to filename.exe, then execute
    it. Your choice, but better compression requires better algorithms, end
    of story.
    H-Man, Mar 15, 2006
  9. Matth Guest

    The ZIP (and GZIP) dictionary size is only 32 KB, far too small to take
    advantage of inter-file similarity except for small files. Nested zip
    was a bit of a false dawn. Not sure if it would work any better under
    "deflate64" - not sure if they extended the dictionary size (or even
    added a "solid" option).
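    The 32 KB point can be checked with a short experiment, a sketch under
    stated assumptions rather than a benchmark: Python's zlib implements
    Deflate with a window of at most 32 KB, while lzma (xz preset 6) uses
    an 8 MiB dictionary. The helper doubled and its random data are
    hypothetical. For reference, Deflate64 does enlarge the window to
    64 KB, which would still be far short of the 464 kB files in the
    original post, and Python's zlib does not implement it.

```python
import lzma
import os
import zlib

def doubled(n):
    """Hypothetical test data: n random bytes followed by an exact copy.
    The copy can only be encoded as a back-reference if the compressor
    can look n bytes back."""
    block = os.urandom(n)
    return block + block

for n in (16_000, 64_000):
    data = doubled(n)
    deflated = len(zlib.compress(data, 9))  # Deflate: window of at most 32 KB
    xz = len(lzma.compress(data))           # xz preset 6: 8 MiB dictionary
    print(f"copy {n:>6} bytes back: input {len(data)}, deflate {deflated}, lzma {xz}")
```

    The expected pattern is that Deflate roughly halves the 16 000-byte
    case but leaves the 64 000-byte case essentially uncompressed, while
    LZMA halves both.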

    ZIP has had its day, and it's a pity Microsoft integrated handling of
    such an old compression system, as it props up the old relic for

    Either persuade your contacts to get 7-Zip or another (often freeware
    too) program that can handle 7Z, or as suggested, bundle a 7Z self
    extractor inside a ZIP to prevent it being blocked.

    If they insist on commercial software, then the answer is RAR (with a
    free UNRAR).

    7-Zip's "7Z" format needs to take its place as the new de facto
    archiving format, as an open format that is far superior to legacy
    Matth, Mar 19, 2006
  10. Matth wrote:
    Ah well. Dead end, then.

    Thanks for the info.
    Morten Skarstad, Mar 21, 2006
