Large dataset/import problems

Daniel Collison

I am importing a large dataset of approximately 720,000 records, with each
record consisting of 25-30 data elements. The raw dataset in Microsoft
Access is in excess of 1,400,000 KB (roughly 1.4 gigabytes). When the raw
dataset is imported into an MS Access database with queries/reports, the
size exceeds 2,000,000 KB (2.0 gigabytes). The size of the dataset makes it
exceedingly difficult to manipulate data and run queries and reports.

One of the problems with the raw dataset is that many of the individual
data elements, though numbers, are imported as text with upwards of 40
blank spaces attached to each number. For instance, "3" will import as "3"
with 40 spaces to the right of it. I imagine that these spaces account for
the large size of the raw dataset. An update query could eliminate the
spaces, for example by replacing " " with "". But I would need to do this
for 15 to 20 fields, and because of the number of records, I would probably
have to work with subsets of any particular field (15,000 to 20,000 records
per update).
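
As a rough sketch of the kind of update query I mean (RawData and the field
names below are just placeholders for my actual table and fields):

    UPDATE RawData
    SET Field1 = Trim([Field1]),
        Field2 = Trim([Field2]);

Trim() strips the leading and trailing spaces, but repeating this for 15 to
20 fields across 720,000 records is what I am hoping to avoid.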

Does anyone know of a way that I can import records without all the spaces?

Thanks,
 

John W. Vinson

Daniel Collison said:
I am importing a large dataset of approximately 720,000 records, with each
record consisting of 25-30 data elements. The raw dataset in Microsoft
Access is in excess of 1,400,000 KB (roughly 1.4 gigabytes). When the raw
dataset is imported into an MS Access database with queries/reports, the
size exceeds 2,000,000 KB (2.0 gigabytes). The size of the dataset makes it
exceedingly difficult to manipulate data and run queries and reports.

Since an Access database has a hard, no-way-around limit of 2 GB, it's
worse than that: a file that size will make the database completely
unusable.
One of the problems with the raw dataset is that many of the individual
data elements, though numbers, are imported as text with upwards of 40
blank spaces attached to each number. For instance, "3" will import as "3"
with 40 spaces to the right of it. I imagine that these spaces account for
the large size of the raw dataset. An update query could eliminate the
spaces, for example by replacing " " with "". But I would need to do this
for 15 to 20 fields, and because of the number of records, I would probably
have to work with subsets of any particular field (15,000 to 20,000 records
per update).

Does anyone know of a way that I can import records without all the spaces?

Access *trims* trailing blanks in text fields, and doesn't store blanks in
number fields at all, so that's not the problem.

I would suggest using File... Get External Data... Link to *link* to the
dataset (rather than importing it), and running an Append query from the
linked data into a local table. Compact and repair the database before and
after doing so, and *keep good backups*.
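
A minimal sketch of that append query, assuming the linked table comes in
as RawData_Linked and the local table is tblRecords (both names are
placeholders):

    INSERT INTO tblRecords ( Field1, Field2 )
    SELECT Field1, Field2
    FROM RawData_Linked;

Since a linked table leaves the source data in the external file, only the
appended copy counts against the local database's 2 GB limit.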
 

Hans Up

Daniel said:
One of the problems with the raw dataset is that many of the individual
data elements, though numbers, are imported as text with upwards of 40
blank spaces attached to each number. For instance, "3" will import as "3"
with 40 spaces to the right of it. I imagine that these spaces account for
the large size of the raw dataset.

What is the source for the raw dataset?
An update query could eliminate the spaces, for example by replacing " "
with "". But I would need to do this for 15 to 20 fields, and because of
the number of records, I would probably have to work with subsets of any
particular field (15,000 to 20,000 records per update).

Perhaps you could import the raw dataset to a new blank database and
perform your manipulations there ... at the least that should give you more
room to work before hitting the 2 GB file size limit.

If that approach is practical you could then link the new table to your
old database. Or you could import the new table into the old database,
if the file size allows.
 

Dirk Goldgar

John W. Vinson said:
Access *trims* trailing blanks in text fields, and doesn't store blanks in
number fields at all, so that's not the problem.

In the case of imported text data, I don't think Access trims the trailing
blanks; I believe it is the text box control that trims the spaces when the
value is displayed. However, I would expect a run of trailing spaces to
compress fairly well with Unicode compression, so this may still not be the
problem.
I would suggest using File... Get External Data... Link to *link* to the
dataset (rather than importing it), and running an Append query from the
linked data into a local table. Compact and repair the database before and
after doing so, and *keep good backups*.

I agree with this suggestion, but I would modify it as follows: in your
append query, use the Trim() function to trim the spaces from each text
field, and the appropriate numeric conversion function to convert each
field that should be a number type, before the values are appended to the
target table.
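
Something along these lines, where RawData_Linked, tblRecords, and the
field names are placeholders, and CLng() stands in for whichever conversion
function (CLng, CDbl, CCur, and so on) suits each numeric field:

    INSERT INTO tblRecords ( CustomerName, Quantity )
    SELECT Trim([CustomerName]),
           CLng(Trim([Quantity]))
    FROM RawData_Linked;

Any rows whose values won't convert are reported as type conversion
failures when the append runs, so it's worth checking the numeric fields
for blanks or stray characters first.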
 

Daniel Collison

Your suggestion regarding the linked table was especially helpful. I was
able to link to an external file and then append the records to a local
table. Thanks for the assist!

Daniel
 

John W. Vinson

Daniel Collison said:
Your suggestion regarding the linked table was especially helpful. I was
able to link to an external file and then append the records to a local
table. Thanks for the assist!

You're welcome; glad I was able to help!
 
