Unicode Compression in Access 2007


Allen Browne

Originally, computers used 1 byte (8 bits) to store a character. This
allowed for up to 256 characters, which is plenty for upper- and
lower-case letters, digits, punctuation, and special characters (space,
line feed, etc.). However, it is not enough for character-based
languages (such as Chinese).

Access 2000 and later therefore use Unicode: 2 bytes per character,
which copes with 65,536 characters but takes twice the space to store.
For languages like English, you can use Unicode Compression to reduce
the wasted space.
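
You can see the 2 bytes per character from the VBA Immediate window,
since VBA strings are also stored as Unicode. A quick sketch (just an
illustration, not tied to any particular table):

    ' VBA strings are Unicode, so LenB (bytes) returns twice what
    ' Len (characters) returns, even for plain English text:
    Debug.Print Len("Unicode"), LenB("Unicode")   ' 7, then 14
    ' AscW returns the Unicode code point; values above 255 are the
    ' ones a single-byte character set cannot represent:
    Debug.Print AscW("A"), AscW(ChrW(&H4E2D))     ' 65, then 20013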

The issue is not just reducing the number of bytes being used on the hard
disk: large disks are inexpensive. More significant is the time it takes to
read all the data off the drive, or to write the data onto the drive. Since
most databases are disk-bound, it makes sense to use Unicode Compression to
reduce the reads and writes.

Of course, there's no such thing as a free lunch, so when you turn
Unicode Compression on, your CPU must compress the text (during a
write) and decompress it (during a read).

There are too many factors to give a definitive answer on when a
database will benefit from Unicode Compression. But if we assume you
have a fast processor, ordinary drives, lots of text, no terribly
complex data matching, and more reads than writes, my guess is that
your application will be disk-bound, and hence Unicode Compression
will be useful.
 

David W. Fenton

> Of course, there's no such thing as a free lunch, so when you turn
> Unicode Compression on, your CPU must compress the text (during a
> write) and decompress it (during a read).

Back in the old days when disk compression software was in use, it
was found that for large files, the compressed version was faster to
read and write, because the disk was slow enough that having half
the data or less to write made up for the compression time.

I don't think that's the case these days, nor that data page reads
in Access are big enough to have benefited from it then, but I just
wanted to point out that the reduced amount of bits being pushed
around can outweigh the compression/decompression time.
 

Allen Browne

David, if you (or anyone else) have figures on this, it would be
interesting.

My impression is that CPU speeds have increased dramatically since those
days, whereas disk speeds have increased only marginally. So in theory,
reading the uncompressed data from twice the disk space should be even more
disk-bound than it used to be.

Of course, much of this will change in the next decade or so as we move away
from platters and heads to solid state memory. (I haven't really had a
chance to work through what this will mean for the way we design and store
databases.)
 

David W. Fenton

> My impression is that CPU speeds have increased dramatically since
> those days, whereas disk speeds have increased only marginally. So
> in theory, reading the uncompressed data from twice the disk space
> should be even more disk-bound than it used to be.

Oh, no, I think disk speeds have vastly improved. The time frame I'm
talking about is c. 1990, when a 40MB hard drive was a pretty
standard configuration. I had a laptop back then with a 42MB hard
drive that was compressed, and the hard drive was *very* slow in
comparison to modern drives.

> Of course, much of this will change in the next decade or so as we
> move away from platters and heads to solid state memory. (I
> haven't really had a chance to work through what this will mean
> for the way we design and store databases.)

I think that the entire design of RDBMS software needs to change, as
increasingly we'll be working with the data loaded into memory, with
complete random access at very high speeds. That changes everything.

Flash disks have the same random access properties (though not the
same speed as RAM), but are slow to write. Flash disks plus
in-memory caching would be very, very fast, seems to me.

I recently saw a review of a Sony VAIO laptop that had a Flash drive
instead of a hard drive. It was apparently *very* fast. The problem
now is that these large Flash memory drives are very expensive, so
they can't make them nearly as big as cheap hard drives. But
according to the reviewer it definitely made a huge difference in
the speed of the machine (he compared it to the same laptop with a
conventional hard drive).

Laptops generally have very slow hard drives: because of power
consumption issues you hardly ever see anything faster than 5400RPM,
which is considered a dog of a drive in a desktop. (Lenovo is now
offering 7200RPM drives in the ThinkPad T61 line, but apparently they
have a really negative impact on battery life because of the NVIDIA
video chips used; apparently the Intel video chipsets used with these
faster drives don't have the same power drain.) So these Flash drives
could be a way to really speed up the performance of portable
computers, and simultaneously extend battery life (no moving parts
for the primary storage drive saves a lot of power).

I definitely think that the cheapness of hard drive storage in terms
of density has made compression pretty much useless. Much more
important these days is encryption, which has its own issues in
terms of eating up CPU cycles. I'm currently working on a borrowed
laptop and have encrypted all my data files, since that's the only
real way to protect your data on a laptop (see
http://www.practicalpc.co.uk/computing/windows/xpencrypt1.htm for
fabulously good instructions on how to do it). I have the impression
that it slightly slows things down, but not a whole lot (and it's
not a high-powered laptop, having only a single-core CPU). I also
ran into issues with having to change the user for the Apache system
service, since the local System account didn't have the key to
decrypt the files.

Anyway, I'm rambling, so I'll stop now.
 

Jamie Collins

> Back in the old days when disk compression software was in use, it
> was found that for large files, the compressed version was faster to
> read and write, because the disk was slow enough that having half
> the data or less to write made up for the compression time.
>
> I don't think that's the case these days, nor that data page reads
> in Access are big enough to have benefited from it then, but I just
> wanted to point out that the reduced amount of bits being pushed
> around can outweigh the compression/decompression time.

I just wanted to point out that most of the regulars, including David
W. Fenton, will use Unicode compression most of the time. See:

Description of the new features that are included in Microsoft Jet 4.0
http://support.microsoft.com/kb/275561
"It should be noted that when going through the Access User Interface
(UI), Access will always add the Unicode compression attribute
whenever applicable. The only time that the end-user needs to be
concerned about adding the compression attribute is when creating a
table with the CREATE TABLE SQL syntax. The compression attribute is
not accessible via DAO"

To create a table, most regulars use either the Access User Interface
in ANSI-89 ('traditional') Query Mode or DAO in VBA code, so
essentially it's a 'lifestyle choice' in which the avoidance of
Unicode compression does not figure.
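
For anyone who does want explicit control from code, one route (a
minimal sketch, with placeholder table and field names) is to execute
the Jet 4.0 DDL through ADO, which always runs in ANSI-92 query mode:

    ' Run the DDL via ADO so WITH COMPRESSION is accepted; the same
    ' statement through DAO in the default ANSI-89 mode would fail.
    ' Table and field names here are placeholders.
    Sub CreateCompressedTable()
        CurrentProject.Connection.Execute _
            "CREATE TABLE tblNotes (" & _
            "NoteID COUNTER CONSTRAINT pkNotes PRIMARY KEY, " & _
            "NoteText TEXT(255) WITH COMPRESSION)"
    End Sub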

Jamie.
