File Compression: No-Brainer?

(PeteCresswell)

I'm troubleshooting some problems with my media server and
something that just dawned on me is blocking. The discs are
blocked 4k and the media server's recommendation is 64k.

That being the case, I will reformat/reblock the discs.

The Question: Should I leave compression enabled?

Somewhere along the line, I got the idea that there is a
crossover point between physical disc speed and memory speed
where compression saves net time once memory speed is above a
certain point for a given physical disc speed... and that, with
2+ GHz processors and SATA drives, that crossover point has long
been passed... and, accordingly, turning on compression should be SOP.

??
 

Paul

(PeteCresswell) said:
I'm troubleshooting some problems with my media server and
something that just dawned on me is blocking. The discs are
blocked 4k and the media server's recommendation is 64k.

That being the case, I will reformat/reblock the discs.

The Question: Should I leave compression enabled?

Somewhere along the line, I got the idea that there is a
crossover point between physical disc speed and memory speed
where compression saves net time once memory speed is above a
certain point for a given physical disc speed... and that, with
2+ GHz processors and SATA drives, that crossover point has long
been passed... and, accordingly, turning on compression should be SOP.

??

http://www.ntfs.com/ntfs-compressed.htm

"The compression algorithms in NTFS are designed to support
cluster sizes of up to 4 KB. When the cluster size is greater
than 4 KB on an NTFS volume, none of the NTFS compression
functions are available."

I don't know if that's sensitive to OS version or not (like, changed
at all, over the years).
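
If you want to check what a volume is using now, running this from an
(elevated) command prompt should report it as "Bytes Per Cluster":

fsutil fsinfo ntfsinfo C: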

The algorithm used could be a good one. One article says
NTFS compression is LZ77, and the description of GZIP says
it uses some flavor of LZ77 as well. And GZIP is about
the best, in my limited testing, for getting some degree of compression
with good processing speed. There are compression methods which
achieve higher compression, but they are much more computationally
expensive. Also, for GZIP, there is a multi-threaded version,
which can further speed things up. I doubt the NTFS compression
is that fancy (i.e. uses a few cores).

You could bench with GZIP, and see what kind of speed your
machine can manage. With the 7ZIP package, I think if you do
an "Add To Archive" and compress something, you can choose
the GZIP algorithm. If you needed a single-core compressor run
with GZIP, that might be a way to do it (short of getting
a copy of the actual GZIP for Windows, and using that, of course).
And that would give you some idea of how fast an LZ77 compressor
could do its job.

Then the next question becomes: what kind of file do you test
with? A file which is trivially compressible (a file of zeros)?
A file which can't be compressed? (If you can't think of a
way to make test files, here is one possibility. I'm hoping
each of these would make a 1GB test file.)

dd if=/dev/zero of=C:\easycompress.bin bs=1m count=1000
dd if=/dev/random of=C:\hardcompress.bin bs=1m count=1000
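
If you'd rather drive 7ZIP from the command line than the GUI, something
roughly like this should do a GZIP-format compression of each test file
(assuming 7z.exe is on the PATH; time it with a stopwatch or similar):

7z a -tgzip C:\easycompress.bin.gz C:\easycompress.bin
7z a -tgzip C:\hardcompress.bin.gz C:\hardcompress.bin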

I would think normally, you choose to use compression, when space
saving is paramount, and performance is a secondary issue. For example,
I had a 500GB disk, and needed to do some temporary work with
600GB of files. It wasn't much of a debate, as to whether I needed
to enable compression or not on the drive, since the project wasn't
going anywhere without it. That's the only time I've used the
compression. I didn't really care whether it ran at 20MB/sec or
125MB/sec - it was just going to take as long as was needed,
until it was finished.
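
(For reference, compression doesn't have to be per-volume. Something like
the following, with D:\work standing in for whatever folder you mean, should
compress a directory tree in place; /u instead of /c undoes it.)

compact /c /s:D:\work /i /q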

Paul
 

(PeteCresswell)

Per Paul:
http://www.ntfs.com/ntfs-compressed.htm

"The compression algorithms in NTFS are designed to support
cluster sizes of up to 4 KB. When the cluster size is greater
than 4 KB on an NTFS volume, none of the NTFS compression
functions are available."

I shot off my mouth too soon on that one.

Just formatted the drive and, as soon as I selected 64k blocking,
the "Compression" checkbox became disabled.
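
For what it's worth, the command-line format appears to work the same way.
Something like

format E: /FS:NTFS /A:64K

(E: being a placeholder drive letter) accepts the 64K allocation unit, and
I'd expect it to reject the /C compress-by-default switch at that size,
matching the greyed-out checkbox.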
 

VanguardLH

(PeteCresswell) said:
I'm troubleshooting some problems with my media server and
something that just dawned on me is blocking. The discs are
blocked 4k and the media server's recommendation is 64k.

That being the case, I will reformat/reblock the discs.

The Question: Should I leave compression enabled?

Somewhere along the line, I got the idea that there is a
crossover point between physical disc speed and memory speed
where compression saves net time once memory speed is above a
certain point for a given physical disc speed... and that, with
2+ GHz processors and SATA drives, that crossover point has long
been passed... and, accordingly, turning on compression should be SOP.

??

What type of compression are you talking about? Where you compress a
file into an archive file (e.g., .zip) and then transfer that over the
network? That requires the time to compress the file in the first place,
and then everyone that retrieves that compressed file is going to have
to decompress it: one compression plus multiple decompressions. Since
you're even considering compressing, the file must get retrieved by LOTS
of users, like hundreds or, more likely, thousands of them, or you (or a
few users) are retrieving it repeatedly a LOT of times. Since the original
compression and every subsequent decompression is performed completely
independently of the network transfer, there is no trade-off between disk
speed and "net time". The static compress and decompress sessions aren't
network functions. Your compressing the file won't involve the network.
Their decompressing the file won't involve the network. You save on disk
space (maybe) with the compressed file, and network transfers for it will
be shorter because the file is smaller.

If you're talking about compression within the file system, then you gain
nothing regarding "net time". The OS handling a file system that
employs compression is going to decompress the file upon reading it,
BEFORE it gets transferred over the network. Whether the file was
compressed or not, it will still be sent decompressed (or never
compressed) over the network. You'll only save disk space on the
server. The file over the network will be the same size whether you
used compression in the file system on the media server or not.
If you're using NTFS compression, you will not be transferring a
compressed file over the network. Before it gets transferred, the file
has to get decompressed. The savings of compression (in disk space) are
of benefit only within the host where it is used.

Note that I'm assuming networking is involved since, after all, you are
asking about a media *server*, not a workstation. Is this a server, or
are you using a server version of Windows as a workstation? Are you
serving the files to other hosts or just to that host? If you're asking
about NTFS file compression for files you have and use on the same host,
then it doesn't matter that you're using a "server" version of Windows.
You're using it as a workstation on vacuous steroids (the server
potential is not realized). In that case, no matter how fast your
hard disk or memory is, compression will ALWAYS slow access to a file.
Before you can read the file, the OS has to decompress it to
present the decompressed data it contains to whatever process opened
that file. Say you have a computer with a slow hard disk and slow memory:
adding compression of files in NTFS will slow access to those files.
Now replace the hard disk and memory (and probably the mobo to make use
of the fastest memory you can get) with the fastest you can afford.
Adding compression of files in NTFS will *still* slow access to those
files. Compression adds overhead no matter how fast your hard disk
(even if you go with an SSD) or memory is.

http://directedge.us/content/to-compress-or-not-to-compress-part-ii

Google can find you other compressed vs. non-compressed benchmarks. You
mentioned *media* server but does that mean you only read the compressed
files? Or do you also write to them (which includes creating them since
that's a write operation)? If you write, look how much overhead gets
added for NTFS compression in the above article.

Faster hard disks and faster memory mean the read and write operations
may go faster (shorter time), but compression is still going to add
overhead onto those operations, so on that faster hardware it will still
take longer to use NTFS compression. It ALWAYS adds overhead. To look
into the file (read) means having to decompress. To change the file
(write) means having to decompress and recompress. Your apps don't work
with the compressed file; they work with the decompressed version. If you
want the most speed then don't compress. If you're desperate for disk space
then maybe you might want to compress - but not if you're writing to
those files.

Something else to consider (if saving hard disk space is the actual
criterion for using NTFS compression) is that NTFS compression is
designed for best speed, not for best compression. The more compressed
a file is, the longer it takes to decompress. Not only does that take
longer, it also eats up more CPU (unless you can throttle CPU usage, which
then lengthens the compression time).

If you are truly using the host as a media server then you'll find media
files don't compress much. That's because many media file formats
already incorporate compression, which makes the media files
[nearly] incompressible. You'll be trying to compress a file that is
already compressed. You get no benefit from trying to doubly compress
the file. In fact, when creating compressed files, like .zip files, you
could end up making the .zip file larger due to the overhead of adding
an archive container around another archive container with the
compressed file.

See some comments at http://en.wikipedia.org/wiki/NTFS#File_compression.
 

Paul

Paul said:
<snip>

dd if=/dev/zero of=C:\easycompress.bin bs=1m count=1000
dd if=/dev/random of=C:\hardcompress.bin bs=1m count=1000

I selected GZIP "Normal" compression for a test (as done inside 7ZIP).

The file full of zeros compressed at 19MB/sec. The disk
can easily keep up with that write rate. The file shrinks by
close to a factor of 1000.

The random file compressed at 12MB/sec. Which is a bit surprising:
I thought the compression process would "scream" on the zeros file,
but it didn't.

The hard-to-compress file ends up slightly larger than the original
file, so it doesn't compress at all (and, being filled with
random numbers, this isn't surprising). If you're storing video
files, they likely won't compress much either, as modern video
formats already have around 100:1 compression.

If I use a copy of gzip.exe from 1993, it compresses the 1GB file of
zeros in 11 seconds. That's using its default compression level setting.
So that's about 90MB/sec. The output file is a bit smaller than the one from 7ZIP.
Using the multithreaded compressor (pigz.exe) gets that down to 7 seconds.
My processor has two cores, and both run pretty well during that short test.
They don't go to 100%, but they still load up pretty well.

gzip -c easycompress.bin > easycompress.bin.gz (11 seconds)
pigz -c easycompress.bin > easycompress.bin.gz (7 seconds)

If I run the random file, that runs at about 20MB/sec. (Using the
multithreaded compressor, that one does it at 27MB/sec)

gzip -c hardcompress.bin > hardcompress.bin.gz
pigz -c hardcompress.bin > hardcompress.bin.gz

So there's some difference between implementations. The non-7ZIP
ones run a bit faster.

I don't know how well the NTFS compression works, but those
are some examples of LZ77 at work. My worst case is around
12MB/sec and best case around 143MB/sec.

The source file, in each case, I checksum first, to
cause the system file cache to load it up. That's why the
source disk isn't a limitation: the 1GB file is
completely in memory for the test. That's how I can get a
read rate of 143MB/sec.

If I run md5sum on the source file, the summation takes 3 seconds,
or ~333MB/sec from the system file cache. So the system file
cache can deliver data fast enough for the other tests.
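
(The warm-up itself is nothing fancy, just something like

md5sum easycompress.bin

run once before the timed compression pass.)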

If you store mostly video files on a compressed NTFS, then
I'd expect the 20MB/sec kind of speed.

Paul
 

Paul

J. P. Gilliver (John) said:
VanguardLH <[email protected]> said:
The Question: Should I leave compression enabled?

Somewhere along the line, I got the idea that there is a
crossover point between physical disc speed and memory speed
where compression saves net time once memory speed is above a
certain point for a given physical disc speed... and that, with
2+ GHz processors and SATA drives, that crossover point has long
been passed... and, accordingly, turning on compression should be SOP.
[]
Faster hard disks and faster memory mean the read and write operations
may go faster (shorter time), but compression is still going to add
overhead onto those operations, so on that faster hardware it will still
take longer to use NTFS compression. It ALWAYS adds overhead. []

So does reading a file. I think he was asking: with today's fast
processors (and memory), does reading a compressed file and then
decompressing it take less time than just reading the uncompressed file?

(I think he's talking about the compression offered by the OS [I didn't
know it still was: I first became aware of that around I think Windows
3.1 or 95 or so, and never used it then as I didn't like the idea of how
it was done then, which was have all of your drive C: compressed into
one vulnerable file. I presume modern compression is on a file basis],
rather than any .zip or similar - i. e. such that it's transparent to
the user.)

When I tested it here, it was a property that applied to a whole partition.
I set up a data partition (not C:). And compression is applied to all the
objects on that data partition.

If the best case was 143MB/sec from the decompressor, that's faster than the
135MB/sec raw read rate from the drive. So a "super-compressible" file can
actually be read out faster than if it had been written directly to disk uncompressed.
But in most realistic situations, the file is only marginally compressible,
and the performance level is closer to 20MB/sec for compress/decompress. In
which case, it's slower than if the file was written straight to disk at
135MB/sec. Even the end of the disk still handles around 70MB/sec, and is
still faster than the 20MB/sec case.

So you only really want to turn on compression if you're desperate for
space. And it only creates more space for you if the data is actually
compressible (by the equivalent of LZ77).

Paul
 

Paul

J. P. Gilliver (John) said:
J. P. Gilliver (John) wrote: []
So does reading a file. I think he was asking: with today's fast
processors (and memory), does reading a compressed file and then
decompressing it take less time than just reading the uncompressed file?
(I think he's talking about the compression offered by the OS [I
didn't know it still was: I first became aware of that around I
think Windows 3.1 or 95 or so, and never used it then as I didn't
like the idea of how it was done then, which was have all of your
drive C: compressed into one vulnerable file. I presume modern
compression is on a file basis], rather than any .zip or similar -
i. e. such that it's transparent to the user.)

When I tested it here, it was a property that applied to a whole
partition.
I set up a data partition (not C:). And compression is applied to all the
objects on that data partition.

Interesting. Presumably, however, it treats the objects individually,
not rewriting the whole partition every time something is changed.

I think it's just the clusters currently being written.

The stuff I was writing was compressible, and wasn't pathologically bad.
If I'd known in advance that the data couldn't be compressed, it would
have been pretty dumb to find out the hard way that it wasn't going to save
any space. (I did some tests first, to see how much individual files would
compress. So I knew my 600GB of files would easily fit in the 500GB of
available space.)

I don't know if the NTFS compression scheme is clever enough to leave
uncompressed the things that don't compress well, or whether it goes ahead anyway.
It takes slightly more space to store something that doesn't compress
well. And the I/O rate would be pretty "lumpy" if the file system
had to make decisions on the fly about how to store things. So my guess would
be that the compressor does its thing in any case.

*******

http://en.wikipedia.org/wiki/Doublespace

"DoubleSpace is, however, different from such programs in other aspects.
For instance, it compresses whole discs rather than select files.
Furthermore, it hooks into the file routines in the operating system so
that it can handle the compression/decompression (which operates on a
per-cluster basis) transparently to the user and to programs running
on the system."

http://en.wikipedia.org/wiki/NTFS_Compression#File_compression

"NTFS can compress files using LZNT1 algorithm (a variant of the LZ77).
Files are compressed in 16-cluster chunks. With 4 kB clusters, files
are compressed in 64 kB chunks. If the compression reduces 64 kB of data
to 60 kB or less, NTFS treats the unneeded 4 kB pages like empty sparse
file clusters - they are not written. This allows not unreasonable
random-access times."

Those descriptions sound different, but I'm not sure they are. When I
enabled the tick box on that NTFS partition, it applied to the whole partition,
so you could claim it applies to the "whole disk". I don't know if the DoubleSpace
concept was really different, in that it was "visible", or whether it's just the
description of the thing that isn't clear about how it worked.
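
If you want to see what NTFS compression actually did to particular files,
the compact tool with no switches reports per-file sizes and the compression
ratio; something like this (the path being whatever you compressed):

compact D:\data\*.bmp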

Windows has the ability to unzip a ZIP archive (a thing ending in .zip), but
that's different from a file system compression scheme. The files on my disk
didn't end up with a new extension of .zip or anything. They still had their
original file names.

And it's not something I left enabled. After the experiment was finished,
and I had my answer, I returned the disk to uncompressed mode. I didn't
leave it that way. If compression takes a whole CPU core and handles 20MB/sec
with something like video content, it wouldn't be a very good permanent choice.
A highly (pathologically) compressible file is worthwhile to compress, if
you count keeping a CPU core pegged as a good use of CPU. It would allow
a write rate faster than the raw disk alone. But not many things are going
to be that compressible. I don't generally work with data files that are all
zeros (or some other repetitive value besides zero).

What I was doing at the time, was converting a Camstudio screen capture movie, into
individual BMP files, for analysis. To discover, that the stupid thing duplicates
captured frames, if it hasn't finished processing the current frame. In Camstudio,
the default screen capture speed is set to 200 frames per second, leaving the
impression the program can actually do that. It turns out, in fact, that the
same frame is repeated about 30 times, before a new frame has been finished
processing and can be written. It means the output "screen movie", contains
30x more data than it's really got. It was actually only capturing at 6 to 7
frames per second. So again, I'd be stupid to leave the tool set at the
default 200, if it had no intention of actually capturing 200 frames per second
and making a "smooth" movie. The movie was far from smooth, and didn't succeed
in capturing mouse movement well. By converting the movie to BMP format,
then computing a checksum on each frame, I was able to determine how many
frames contained identical content, and I could easily see how the Camstudio
algorithm worked. But to get there, I needed 600GB of space, while the
BMP files were being written out. The individual BMPs were compressible, so
the whole thing fit easily on a 500GB compressing drive. And once I understood
what the thing was doing, I could dump the 600GB of BMP files as they were
no longer needed. If I was to use Camstudio again, I'd simply adjust the
default to something more realistic (perhaps a setting of 12 to 14 FPS, if
the actual hardware can only manage to capture 6 to 7 real frames - there'd be
no sense to keep the 200 setting). The wasted space is only evident, if you
attempt to convert the movie to something else. Then it balloons.

Paul
 

VanguardLH

J. P. Gilliver (John) said:
I think he was asking: with today's fast processors (and memory), does
reading a compressed file and then decompressing it take less time
than just reading the uncompressed file?

My point is no app "reads a compressed file". The OS has to first
decompress [a portion of] the file before the app can read that file.
Surely that depends on how often the applications (and/or OS, and/or
caching hardware on the drive) do an actual write? As you say, they
work with the decompressed version; it's how often that is saved back
to disc that needs determining, which probably isn't easy.

As noted by example, reading will always be slower, but sometimes not by
much. The savings in disk space (and not anything to do with network
transfer, since the file would be decompressed to do the transfer) might
carry greater weight than the slight delay to read a file. Binary files
don't compress well, so they receive little compression and decompression
is very quick. However, some file types, like documents, get heavily
compressed, so even reading them (which first requires decompression)
would result in a significant delay. If writing is involved at all,
the delay due to compression is very significant.
I don't think he is desperate for space - he's just wondering about
speed.

Compression, even if performed only once (i.e., no writes after
creating the file), along with every decompression to access the
contents of the file, will always add overhead. So speed will always be
slower. Sometimes by just a little, sometimes by a lot, but always by
some amount.

I can see using NTFS compression for files within that file
system and on that same host, but only if trying to conserve disk
space. If speed is the ultimate criterion then never compress.
But it's an interesting question: for a file that _does_ benefit from
compression, and is large enough for the difference to be
significant, is the compressing/decompressing going to add more time
than the writing/reading of the smaller result saves?

Compression (to write) and decompression (to read) always add
overhead. Both read and write will be slowed. The native file
(uncompressed) would write and read faster.
Given that processors and memory (especially multicore, if the
compressor can use that) are so fast now?

Being fast doesn't preclude adding the overhead. Whatever the
overhead is, it will be there and it will slow throughput. With a slow setup,
maybe the overhead adds only 5 seconds to decompress a highly
compressible huge file. With a much faster hardware setup, the
overhead might only take 1 second; however, accessing the native
(uncompressed) file would've also benefited from the faster hardware,
so it's likely the percentage of overhead remains the same (using the
same compress/decompress algorithms). The overhead takes less
time, but so does accessing the native file.
It might be of interest to those who work with raw audio, and
particularly raw video, especially if their systems have enough RAM,
and their software works in such a way, that the files are only read
and written at the beginning/end of the editing process.

Read-only access to highly compressible file formats is about the only time
when compression makes sense, and only when there is a dearth of disk
space. If disk space is not a concern then compression with its
overhead makes no sense to employ. If the files are nearly
incompressible, there is also no advantage in using compression. If
the files are written to, then compression can severely slow access to
highly compressible files.

To sum up, the criteria are:
- Read only (no writes)
- Highly compressible files
- Low on disk space

If any of those criteria are not met then I see no point in
employing compression. I'm sure special cases can be contrived where
compression benchmarks out higher for read access than a
non-compressed file, but what use is such a benchmark to a real-world
server setup, where files vary in size (resulting in varying amounts of
slack space) along with varying file types, some of which won't compress
and others that will compress highly? It's not likely any server is
going to operate under the limited conditions of a special-case
benchmark.

As for "net speed" as mentioned by the OP, compression won't help with
that at all. The compressed file will have to get decompressed before
it gets put out on the network. The compression is only within the
NTFS file system on the host where it is used. It is unknown what the
destination uses. Might not even be Windows on the other end.
 

Paul

Bill said:
J. P. Gilliver (John) said:
In message <[email protected]>, VanguardLH <[email protected]>
writes:
[]
Compression (to write) and decompression (to read) always add
overhead. Both read and write will be slowed. The native file
(uncompressed) would write and read faster.
Compression/decompression will always add overhead. BUT SO WILL
READING/WRITING A BIGGER FILE (i. e. the uncompressed one).

In practice, it seems highly likely that there will rarely if ever be a
file that compresses/decompresses in less time than will be saved by it
being smaller to actually write to disc. But it could, in theory.
[]
It might be of interest to those who work with raw audio, and
particularly raw video, especially if their systems have enough RAM,
and their software works in such a way, that the files are only read
and written at the beginning/end of the editing process.
Read-only access to highly compressible file formats is about the only time
Raw audio and video _are_ quite highly compressible. (Text considerably
more so, but I suspect the files will rarely be big enough for it to be
relevant.) Whether raw audio/video files are compressible _enough_ for
this to be relevant to this discussion, I don't know, but I suspect not.

Let's take a specific example (audio) that can perhaps illustrate a point:

We can compare a mp3 to its wav counterpart. Typically the mp3 file (using
128 kbps) is about 1/10 or 1/11 the size of its WAV counterpart.
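
(The arithmetic behind that, assuming CD-quality audio: 44,100 samples/sec
x 16 bits x 2 channels is about 1,411 kbps for the WAV, and 1411 / 128 is
roughly 11, hence the ~1/11 size.)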

It would be much faster to "transfer" the smaller (compressed) file over a
network than its WAV counterpart, obviously, since no "decompression" is
needed in such a case. (If such on the fly conversions were necessary
during the process, that would be a different ballgame, altogether)

However, if we want to edit or work on the audio file in an audio editor,
the WAV file opens much more quickly (and saves more quickly) in an audio
editing program than its mp3 counterpart, since no conversions are needed.

I expect a similar thing could be said about video files, too. But I don't
recall working with a raw uncompressed video file directly (and I don't own
a video cam) since they are so huge to begin with. I just play around with
the compressed MP4, WMV, and MPEG2 formats, and let the video editing
program(s) do the conversions "in the background".

<snip>

Remembering the difference between lossy and lossless compression methods...

The NTFS compression being applied in this example, is the lossless kind.
Meaning it won't damage any content you store on the disk.

Lossless compression is limited in terms of the compression ratio, for
most real content. (The "file of zeros" pathological case doesn't count.)

Lossy compression, such as your 10:1 MP3 or a 100:1 video compressor like
H.264, changes the content. Those formats take advantage of
human perception, and modify the content in a way that doesn't bother
humans too much. If you decompress and recompress files like that,
there can be generational loss. So if you were doing some kind of
editing, and not finishing your edits in one session, you could be
degrading the content still further (that is, if you saved out to
MP3 between sessions or whatever). And yes, I've run into editing
people who do stuff like that (they aren't really aware of
generational loss, or don't care).

When a video file is already compressed 100:1 by a lossy compressor,
you would not expect that video file, stored on an NTFS compressed
partition, to compress further. And the NTFS compressor, while fast
compared to other things, is probably going to be limited to around
20MB/sec (it's still going to try to compress an incompressible
thing). So if you know *all* your files are already compressed, there's not
much advantage to using NTFS compression on the storage device.

The same thing happens with network compression. For example, at one
time we'd be using dialup, and dialup had compression options. If
a person downloaded a PDF file, the compression capability in the
dialup modem was rendered useless (for any properly crafted PDF).
Generally speaking, as time goes on, the opportunity for a
"free lunch" is reduced, as the formats of things already involve
compression of one sort or another. Even some downloadable
executables come in "packed" format, and have been compressed already.

My WinTV card, captures input in uncompressed form. A movie might be
a file in the 100-120GB range. I would never consider transferring
that over a network, even GbE on my LAN. If the content was of some
value, I'd store it on a local disk, until all edits were finished
or whatever. Then produce a compressed format video file. And then
consider whether there was a need to send that over a network. If I
needed to start the edits over again, I could go back to the 120GB file.
And if I wanted to store the 120GB file for a couple years, I'd use
the 7Zip compressor, which only compresses at around 5MB/sec, but achieves
about as good as you can get for a lossless format (it's better than LZ77).
It might take all day, with that running in the background, to compress the
file. It would likely take more time, than making the compressed format
video file (DVD quality). But being lossless, I can be assured that none
of the information in the original capture is lost. It would probably take
less than 24 hours, to move that 120GB file over my LAN, but it's also
easier to use SneakerNet and just walk the hard drive with the file on it,
over to another computer.
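
The archival step itself would be roughly this (capture.avi standing in for
the real capture file; -mx=9 is the slowest, highest-compression setting):

7z a -t7z -mx=9 capture.7z capture.avi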

Instead of storing the raw WinTV file, I'd also have the option
of using Huffyuv. I don't have many CODECs installed on my PC,
but this is one of the ones I did install. And this can be used
to reduce the size of the WinTV file. It helps reduce the
I/O rate, if doing edits in uncompressed --> uncompressed format.
And it probably doesn't help with respect to long-term archival
storage (where you'd be using 7ZIP anyway).

http://en.wikipedia.org/wiki/Huffyuv

Paul
 

Paul

Bill said:
Well yeah. I don't generally mind the loss. The advantage gained in disk
space is so substantial that I'm willing to forego the small loss (up to a
point, of course). And since I only have a SD TV set here, and only watch SD
programs, and even then just watch those on a 20" Sony CRT TV, I can also
live with the inherent limitations of SD (standard def) TV signals too. :)
So, MP4's are just fine by me. :)

Oh, just as another comment on the topic of audio compression:

I've taken a 128 kbps Joint Stereo MP3, and recompressed it again (the
reason isn't relevant here) a couple of times to the same bitrate, and I'm
hard pressed to tell any difference, and I have pretty good ears. In
principle, each time you do this, you add generational loss, of course, but
that loss is not always or necessarily noticeable, in practice.

And there seems to be this "old wives' tale" floating around that if you've
already compressed it once, and have to do it again (for any reason), you'll
destroy it. Which ain't really the case, in practice! I would respectfully
suggest that those who perpetuate this story (about how it really ruins it)
do some of their own listening tests, firsthand, instead of just
regurgitating it. :)

But just to be clear: yes, each time you do this, there is some generational
loss. But that's missing my point. I've been stuck in some situations
where all I had was the mp3 source file, and later did some further
improvements in its restoration (even if it ended up being a second or third
generational copy), and the results were well worth it!

Have you popped both of them into a waveform editor and compared them?

If you need an editor, there's Audacity. It's going to be missing lots
of I/O formats, so you may need to convert to WAV before reading in the two
sample files. At least this will allow you to look at the waveforms from
each generation of file.

http://audacity.sourceforge.net/

I don't use that for things like music content. Sometimes, a person will
post a sample file on the web, and I do a few commands with that tool,
to try and figure out where a source of noise might be coming from. The
tool may be missing certain filters, and one of my favorite add-ons is
"notch", for filtering out things like A.C. hum. You can use the spectrum
analyser and "notch", to hack at things. Filename: notch.ny

http://wiki.audacityteam.org/wiki/Nyquist_Effect_Plug-ins#Notch_Filter

The "notch.ny" is a text file, so you can see the language it's written in
if you open it with Wordpad.

Paul
 
