Reduce size of data

T

Tom Ellison

I have the assignment to imporve a database's performance.

The data is in a separate back end database that has just under 3 million
rows. It has no index. All queries are running as table scans, of course.

I want to add some indexes that should fix this nicely. But there's no
room. It's almost 2 gigabytes now.

I've found that several columns are much larger than necessary. Here are
the stats:

Datatype Length Number Change
text 50 1 15
decimal 18 27 7
decimal 18 2
date/time 1
integer 2

The above Number column is the number of such columns in the table. I am
expecting to impact the size of the table by changing a 50 character text
column to 15 characters, and by reducing the precision of the 29 decimal
columns to 7.

I could not do this by just changing the datatypes on a full table.
Something about not enough memory. (I like Jet less and less. I would
never have this kind of mess with MSDE!)

So, I made a copy of the table, cleared out all the records, compacted this,
and made the changes to the 27 decimal columns. The table was just over
100K bytes in size. I then linked to the database containing the copy of
the data and created an append query to put the data in.

When this finished, I had saved less than 100K bytes, a fraction of 1% of
the data size. Does not the size of a Decimal datatype depend on the
precision? If not, then why is there a precision. If you're going to store
10 bytes all the time, why not allow them all to be used to hold data?
Allowing two digits (nibbles) per byte, a decimal of precision 18 should
require 10 bytes to store (add one nibble for a sign and divide by two,
rounding up). I chose 7, although 6 would have been sufficient for all
existing data, since I expect it's going to add one for a sign anyway, so 6
+ 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there
would be no savings there. Just my theory.

Anyway, after completing the append, the table had reduced in size by less
than 100K bytes (out of just under 2 GB). By my calculation, I was
expecting a reduction of just over 50%. Does Jet not store the decimal
datatype in varying sized pieces, according to the precision? Is my formula
of (Precision + 1) / 2 not what you would expect?

I then repeated all the steps, reducing the text column from 50 to 15. This
I expected would save 105MB (35 bytes X 3 million). Bottom line: the file
is now larger than ever!

Does anyone know how a database can be reduced in size to accomplish this?

Tom Ellison
 
T

Tom Ellison

Tried another approach. Created a new database and typed in the table's
entire definition, but with the reduced sizes. Still not smaller. I'm
expecting it to be half the size, but almost no change. Not even 1%.

Tom Ellison
 
T

Tom Ellison

I changed them to "single" and it cut the file size by 75%. Go figure!

Tom Ellison
 
J

John Nurick

AFAIK Jet stores Decimals like dBase does: no nibbles, but one byte per
digit, so at least 18 bytes for a decimal of precision 18. However, if
you only need 7 digits precision you could replace your Decimals with
Longs and hard-code the position of the decimal point.

OTOH Jet does not store empty space if a Text field is not full: a
10-character string will occupy the same amount of space whether the
field size is 10 or 255. (This of course means storing a size and a
pointer for each value.)

Bottom line: you're very near the limits of what Jet's designed for, and
some flavour of SQL client/server looks like the sensible way to go.
 
T

Tom Ellison

Dear John:

MSDE has same limits, but probably better storage algorithms. I'm not sure
there would be enough benefit to switching to MSDE. Client will not upgrade
to full SQL Server. Expense of server OS, SQL Server, plus server hardware
would be nearly $4K I expect. No where close to what he would spend.

I've used the same procedure in changing to a Single datatype, and the
database dropped by 80% in size. Go figure! It didn't drop even 1%
changing from precision 18 to 7. No explanation from me!

So, as you can see, I had already discussed SQL Server as an alternative.
It's one I certainly like.

My initial results are in. Indexing just one numeric column causes a 83%
increase in performance. Pretty good start. Maybe enough to finish the
project. I just hope Single datatype isn't a problem. It's not dollars and
cents, but some kind of ratios and such. Probably this will work, though.
I'll have to explain why < 9.17 can result in records that show 9.17,
although it's stored as 9.169993. Life is a series of trade-offs, eh?

I understand about the text pointer thing. This app doesn't search for that
text column. But surely the area where the text is stored makes the overall
database sensitive to the length of the strings. But I guess that's the
actual length, not the maximum allocated length. So now that makes sense
also. If the text were all padded to full length with spaces, that would be
different then.

Thanks for your participation.

Tom Ellison
 
T

Tom Ellison

Dear Allen:

I had never run into it before. It's "new" isn't it. I mean, I knew it was
there since the 2000 version, but I've never had to work with it. Was it in
'97? That's the second version of Access I learned, and the first I learned
thoroughly (oh for the good ole days).

The equivalent in MSDE is very useful. I like it. Data size increases one
byte at a time, and it even tells you the size of storage in bytes when
you're working on it. That's where I got my idea of how I thought this
should work. Apparently, it doesn't work that way at all.

Thanks again, Allen. I'm absorbing as fast as I can tonight.

Tom Ellison
 
A

Allen Browne

Correct: JET 3.x (Access 97) did not have the Decimal data type. VBA did
support a Variant of subtype Decimal, but had serious trouble displaying it
on a form back then.

I find it extremely embarrassing that JET 4 gets this query wrong:

SELECT MyField FROM MyTable ORDER BY MyField DESC;

If MyField is an unindexed Decimal field, the negative values appear first,
then the positive ones, and the zeros and nulls sort unpredictibly!

Can you believe that this remains unfixed after more than 5 years? Info,
sample, and k.b. reference:
http://allenbrowne.com/bug-08.html
 
T

Tom Ellison

Dear Allen:

I have been using Currency/Money for almost all numeric columns for some
time (other than good old integer and long, of course). I believe I'll
stick with that.

What I cannot conceive is that any such error would be tolerated in SQL
Server. No such error would ever be released, let alone escape correction
for years. That speaks for itself.

Tom Ellison
 
J

Jeff Boyce

Tom

Have you looked into the new Express SQL Server 2005 -- download for free.

Regards

Jeff Boyce
Microsoft Office/Access MVP
Microsoft IT Academy Program Mentor
 
G

Guest

Hi Tom,

About the text field. Making it's size smaller won't reduce the database
size UNLESS it also truncated data. If you put one character in a text field
set to 255, Access only stores one character. No waste. Now if you had 255
characters in that field and reduce the size to 5, the database file would
get smaller, but then you've lost 250 characters from that record.

Then why not just set all text fields to 255 characters? (1) The wizards use
the field size to attempt to size controls on forms and reports. It could get
ugly fast if your zipcode fields were set to 255! (2) Speaking of zipcode
fields, by setting the field size to 5 it would prevent someone from typing
in something grossly incorrect or prevent the Zip+4 format. Not as good as an
input mask, but it works.
 
G

Guest

I hate to ask this, but I didn't see it on a quick scan through the thread:
Have you done a Compact and Repair on your database, after making changes?
 
T

Tom Ellison

Dear Jeff:

Unless they have changed their minds, this has a 1 GB database size limit,
half that of MSDE. Not helpful for this application. I'll be getting into
this some time soon, for sure! But not with respect to the current project.

Thanks, Jeff!

Tom Ellison
 
T

Tom Ellison

Dear MNature:

I mentioned in the thread that I, "compacted this". By that I did mean
compact and repair. That's necessary to get the file size down, as you
said. Then it goes back up when appending.

So, yes, I did this.

Thanks for your contribution!

Tom Ellison
 
T

Tom Ellison

Back at the Chicago regional summit in January 2004 (I think that's the one)
this was announced. I objected, as this would break my existing
installations. Hearing that it's now 4 GB is glorious news. I wish I'd
known that a week ago. I decided against recommending MSDE because it has
the 2 GB limit. A 4 GB limit on Express would be about perfect for them!
Well, I'll have to correct myself. First, research, though. It must run on
non-server platform, and I'll have to check other details. Ignorance is
such a downer. You have made my day!

Tom Ellison
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top