Unicode Compression?

Paul Fenton · Oct 23, 2004

What is it and what does it do? I see it as a property in text and
memo fields. Sometimes it's 'Yes' and sometimes it's 'No'. What are
the rules for determining those parameters? (Acces 2000)

Paul Fenton

Allen Browne · Oct 23, 2004

In the old days, we used one byte (8 bits) to store a character. 255
possible characters was plenty for all upper and lower case characters,
numbers, punctuation, symbols, and control characters. But it was not enough
for foreign languages such as Chinese characters.

Unicode uses two bytes to store a character, so it can handle 64k different
characters, but it also means that you are using twice as much storage
space. For English, every second character is unused. Unicode compression
addresses this so it does not use two bytes of disk space for every text
character in your database.

I'm not aware of any rules on which setting to use. Without testing, I would
expect the extra processor time (uncompressing the data) to be almost
completely offset by the smaller disk read. I have no idea whether that
would hold for disk writes as well. There are several other factors that
completely change the picture, such as whether the database is encrypted,
whether data is coming across a network, whether the processing is being
done on the server (e.g. Terminal Server), ...

You also allude to the fact that Access does not set this property
consistently. Whether it defaults to Yes or No depends whether you created
the field through the interface, via DAO, with a DDL query, or through ADOX.
Unfortunately, there is just no consistency with this, but it really does
not matter to the integrity of the data in the same way as the
AllowZeroLength setting does (which is also set inconsistently).

Sylvain Lafontaine · Oct 23, 2004

The first time that I have seen that, it was my understanding that the
uncompressed Unicode format was UTF16 and the compressed one was UTF8.
(UTF8 will compress the data for latin alphabets with no or few accents,
like english or french, but it will have the opposite effect for other
languages. Also, contrary to UTF16, UTF8 is not really a universally
adopted standard and may not be understood by many Unicode applications.)

However, I never took the time to verify that assumption.

S. L.

Paul Fenton · Oct 26, 2004

Thank you Allen. I did find that MS recommends the parameter be set
to "Yes" to improve performance.

Paul Fenton

Unicode Compression?

Paul Fenton

Allen Browne

Sylvain Lafontaine

Paul Fenton