Very large database

qaz

I have created a very large database of stock prices and company financial
information. The queries are PAINFULLY slow. For example, I queried the
database for a Count of the number of records in the table, and it took 10
minutes to execute. I asked it to return all stocks from a list of stocks
that had a capitalization of greater than $25,000,000 and it took 38 minutes
to execute! Furthermore, I have only added 2 years' worth of data so far; I
want to add considerably more data for other years.

Can anyone point me to a decent tutorial on the correct methods of creating
and searching very large databases, so I can start to learn how serious
database administrators handle the problem of the very large database?

For example, one question I'd like to find an answer to is this: One table
of company information has 137 fields containing company financial
information. I have a basic understanding of normalization and it appears
to me that all the fields are appropriately placed in a single table.
However, in order to speed things up, would it be better to "artificially"
split up those fields and place them in separate tables and then, one would
hope, my searches could often be run against smaller tables? In other
words, for very large databases, would that kind of "fragmentation" of
company financial data amongst different tables be a better approach?
Another approach I thought of was to separate company data in different
tables by year, viz., all 2005 data in one table, all 2004 data in another,
etc. Then if I needed to run a query that compared data from year to year I
would have to do successive searches against the different tables.

Well, you guys get the idea. Any help is appreciated.

Thanks.
 
John Spencer (MVP)

First, how do you define a very large database?
Second, have you indexed the fields? Although with 137 fields in one table,
you cannot index every field.
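
For example, an index on the field used in the capitalization filter can be
added in table design view, or with a one-line DDL statement run from VBA.
The table and field names below are only placeholders:

Dim db As DAO.Database
Set db = CurrentDb()
' Index the field used in the "> $25,000,000" filter (names are placeholders).
db.Execute "CREATE INDEX idxMarketCap ON tblCompanyFinancials (MarketCap);"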

I doubt that the table is anywhere near normalized. If it were normalized, it
might be considerably faster.

Is the database located on your computer? Or is it on the network? If on the
network, what anti-virus software are you using? Some of the anti-virus
packages insist on checking each record as it is transmitted across the network
- that leads to extremely slow performance.

I just slammed together a million-record table with no index and counted the
last one hundred thousand records in 4 seconds, on an emulator on a Macintosh.
(Only one field, but that should make little difference.)

I did a query to count each segment of 100 records using the VBA Mod operator
("Mod 100"), and that took 38 seconds. With an index on the field, the time
was cut in half. I then did a sum over each segment of records (still indexed)
and that took 24 seconds.
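
A query of that general shape, with placeholder table and field names, would
look like:

Dim db As DAO.Database
Dim rs As DAO.Recordset
Set db = CurrentDb()
' Count how many records fall into each Mod-100 bucket of the key.
Set rs = db.OpenRecordset( _
    "SELECT ID Mod 100 AS Segment, Count(*) AS Recs " & _
    "FROM tblMillionRows GROUP BY ID Mod 100;")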

Quote from Tony Toews (Access MVP)

The three most common performance problems in Access 2000 are:
- LDB locking which a persistent recordset connection fixes
- sub datasheet Name property set to [Auto] should be [None]
- Track name AutoCorrect should be off
(Tools: Options: General: uncheck all Name AutoCorrect features)
Other reasons are
- New format of Access 2000 MDB
- Place backend MDB on the root of the network share rather than several
folders down
- Shorten the name of the backend MDB
- Miscellaneous Performance Suggestions
End quote
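
On the first item in that list: a "persistent recordset connection" just means
holding a database (or recordset) object open against the back-end MDB for the
life of the application, so the .ldb lock file is not created and torn down on
every query. A minimal sketch, with a made-up path:

' Module-level variable: keeps the back-end connection alive between calls.
Private dbPersistent As DAO.Database

Sub OpenPersistentConnection()
    ' Open once at startup and leave open; the path is a placeholder.
    Set dbPersistent = DBEngine.OpenDatabase("X:\Data\StocksBackend.mdb")
End Sub

Sub ClosePersistentConnection()
    ' Close at shutdown.
    If Not dbPersistent Is Nothing Then dbPersistent.Close
    Set dbPersistent = Nothing
End Sub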

Check out the following URL for a more complete discussion.

http://www.granite.ab.ca/access/a2kslow.htm
 
qaz

You obviously are able to crunch 1,000,000 records in 4 seconds because you
are much better at this stuff than I am!

Unfortunately, because the DB is so large, I have to keep it on a large
external HD accessed via USB 2.0. I had it on the C:\ drive of my computer,
but it was taking up too much space and I had to move it off to the 160 GB
external. I expected that access would be slower, but not this slow.

Since I've started reconstructing the DB on the external drive, I have about
350,000 records in the table so far (with only 2 years' worth of data). I
have 137 fields of data for each company: each record consists of a date, a
company name, and 135 fields of company financial information for that
particular date.

I suppose I could move the company names and dates to separate tables and
then link those tables to the one containing the financial info. But I had
simply assumed up to this point that it would not make much difference,
because I am not usually searching on company names, though dates are
important in many of my searches. At this point, though, the DB is so slow as
to be unusable, so I need to go ahead and see if that will help.

Thank you for your suggestions. They are informative and helpful and will
give me some direction as I try to get a useful system running to aid me in
my investment activities.


Jeff Boyce

On the off chance that John doesn't get the opportunity to return to this
thread for a while, I will point out that a table with over 100 fields is
probably not particularly well-normalized. What you described sounds more
like a spreadsheet than a relational database structure.

I think you will find that a survey through the tablesdbdesign newsgroup
will show that tables with more than 30 (?!?) fields are unusual, and
perhaps in need of further normalization.

Consider spending some time looking into normalization and relational
database design. I think you will be pleasantly surprised at how much
faster Access performs when it has data in a structure it expects to find.
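
As one purely illustrative restructuring (this is not the only reading of
normalization, and every name here is invented), some designers store this
kind of data as one row per company, per date, per financial measure, instead
of 135 measure columns:

Dim db As DAO.Database
Set db = CurrentDb()
' One row per company.
db.Execute "CREATE TABLE tblCompany (" & _
    "CompanyID COUNTER CONSTRAINT pkCompany PRIMARY KEY, " & _
    "CompanyName TEXT(100));"
' One row per company, per reporting date, per financial measure.
db.Execute "CREATE TABLE tblFinancial (" & _
    "CompanyID LONG, ReportDate DATETIME, " & _
    "Measure TEXT(50), Amount CURRENCY, " & _
    "CONSTRAINT pkFinancial PRIMARY KEY (CompanyID, ReportDate, Measure));"

A capitalization filter then becomes a query against a narrow, indexable
table (for example, Measure = "MarketCap" and Amount > 25000000).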

Good luck!

Jeff Boyce
<Access MVP>
 
Chris2

qaz,

See:

Information about query performance in an Access database (2000, 2002,
2003): http://support.microsoft.com/kb/209126


Sincerely,

Chris O.
 
david epsom dot com dot au

How fast is your USB 2.0 connection? Note that a "USB 2.0" device may still
run at Low Speed or Full Speed, which are no faster than USB 1.0/1.1; you
want USB 2.0 High Speed (480 Mbit/s). How long does it take to copy the file
onto or off of the USB drive?

Also, you may get better speed if you open the file in
exclusive mode (requires coding because the link manager
only does shared links).
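
Roughly, the code for an exclusive open looks like this (the path is a
placeholder, and nothing else can have the file open at the time):

Dim dbExclusive As DAO.Database
' Second argument True = exclusive; third argument False = not read-only.
Set dbExclusive = DBEngine.OpenDatabase("X:\Data\StocksBackend.mdb", True, False)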

(david)



John Spencer (MVP)

You do realize that the MAXIMUM size of an Access database is 2 gigabytes (1 gig
in Access 97)?

How fast was it when you had the database on your C drive? Can you compare the
speed between the C drive and the external drive with duplicate copies of the
database or at least a limited set of data?

Anyway, good luck. I did not mean to imply that you were doing something wrong.
I only wanted to point out that the performance you were seeing might be an
anomaly and needed investigation to figure out the cause.

By the way, complex queries can be slow in Access. Have you tried some simple
queries - select all fields from the table where one record would be returned?

SELECT *
FROM TABLENAME
WHERE PrimaryKeyField = "SomeUniqueValue"
