Can Access handle 50 million rows of data or more?


Z

Zoop

I have a database of intra-day stock tick price/volume data that exceeds 50
million rows. I want to analyze and backtest various trading strategies on
this data. Right now I am using Excel 2003 and I am considering buying Access
2007.

Questions:

1 - Can Access import and handle 50 million rows of data, or would I need
to go for some kind of Oracle (or other) solution?

2- Will I be able to use Excel 2003 with Access 2007 to manipulate the
database from Excel 2003?

Thanks for any feedback!
 
Ad

Advertisements

L

Larry Daugherty

The number of rows is irrelevant. A single Access MDB can store
2Gbytes of data. Until you get to the point where the data in a
single table approaches that size you have different strategies you
can use. The next half step upward would be SQL Express 2005. It's
free.

I don't use Access 2007 but interacting with the tables from Excel
*should* be the same as interacting with tables in an earlier version.

HTH
 
Z

Zoop

Thanks, Larry. I took a quick look at SQL Express 2005. I like the fact that
it's free for one. Two, it looks like it might be as easy to learn as Access.

Not sure if you have any experience with SQL Express 2005, but do you think
that might be a better way for me to go? I know basic programming in C type
languages and am familiar with Excel, Windows, and other basic stuff.
 
T

Tony Toews [MVP]

Zoop said:
I have a database of intra-day stock tick price/volume data that exceeds 50
million rows.

50 million is a lot of records for Access. Access allows for a
maximum of 2 Gb MDB so that's 40 bytes per record. Not including
overhead and indexes.

SQL Server Express has a 4 gb size limit so that's not a lot better.
http://msdn.microsoft.com/en-us/library/ms345154.aspx

BTW just how big is your Excel spreadsheet? In terms of rows, columns
and size of the file.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
L

Larry Daugherty

Haven't used it, shame on me. SQL Express is equivalent to JET, the
underlying RDBMS that is the default for Access. Just about any RDBMS
will do. However, Access is a whole bunch of RAD tools and User
Interface components.

If you intend to do *all* of your data manipulation and display from
within Excel then you probably don't need Access. I can't imagine
doing any data management application without it. A few years ago my
client insisted that I provide them with an Excel solution rather than
Access. They were all familiar with Excel wanted output they could
play with in familiar ways. I did as they asked and gave them an
application that was low on User Interface niceties but highly
functional in terms of data manipulated and presented; charts, etc.
They were ecstatic. I was still grumbling "cudda dun it better in
Access". :)

HTH
--
-Larry-
--

Zoop said:
Thanks, Larry. I took a quick look at SQL Express 2005. I like the fact that
it's free for one. Two, it looks like it might be as easy to learn as Access.

Not sure if you have any experience with SQL Express 2005, but do you think
that might be a better way for me to go? I know basic programming in C type
languages and am familiar with Excel, Windows, and other basic
stuff.
 
Z

Zoop

Hi Tony,

My data is only 8 columns across but something like 50 million or more rows
down. Just one of the files is 12 GB in CSV format. It's so big I can only
open tiny chunks of it in Excel. That is why I was looking at Access or SQL.

Below is a sample of the data. Any idea how best I should approach this,
keeping in mind that my programming and computer skills are only basic? I
feel that I could learn Access or SQL but if they can only handle 2GB and 4GB
file sizes, maybe I am better off trying something else?


"Date","Time","O","H","L","C","U","D"
04/09/1998,0334,0.8318,0.8318,0.8318,0.8318,0,0
04/09/1998,0335,0.8317,0.8317,0.8317,0.8317,0,0
04/09/1998,0335,0.8317,0.8317,0.8317,0.8317,0,0
04/09/1998,0336,0.8319,0.8319,0.8319,0.8319,0,0
04/09/1998,0336,0.8318,0.8318,0.8318,0.8318,0,0
04/09/1998,0336,0.8317,0.8317,0.8317,0.8317,0,0
 
Ad

Advertisements

A

Armen Stein

Haven't used it, shame on me. SQL Express is equivalent to JET, the
underlying RDBMS that is the default for Access. Just about any RDBMS
will do. However, Access is a whole bunch of RAD tools and User
Interface components.

To clarify, SQL Express is the same database engine as the "bigger"
versions of SQL Server, but with CPU, memory and disk space
contraints. It is a "client-server" database, rather than a
"file-server" database like JET. So it isn't really equivalent to
JET, beyond the fact that both are relational databases that can
process SQL statements.

With that many records, you're going to be pushing the limits of SQL
Express also. You many need to look at the other editions (which
aren't free of course):

http://www.microsoft.com/sql/prodinfo/features/compare-features.mspx

Also, Access can make a great front-end to SQL Server, so that you
still get the RAD tools and features that Larry mentioned.

Armen Stein
Microsoft Access MVP
www.JStreetTech.com
 
T

Tony Toews [MVP]

Zoop said:
My data is only 8 columns across but something like 50 million or more rows
down. Just one of the files is 12 GB in CSV format. It's so big I can only
open tiny chunks of it in Excel. That is why I was looking at Access or SQL.

Below is a sample of the data. Any idea how best I should approach this,
keeping in mind that my programming and computer skills are only basic? I
feel that I could learn Access or SQL but if they can only handle 2GB and 4GB
file sizes, maybe I am better off trying something else?

If the CSV file is 12 Gb then the Access file certainly won't be any
smaller and could be quite a bit larger depending on indexing.

There is SQL Server Express, which is free, but has a 4 gb size limit.
SQL Server (paid version) will easily handle your data. Cost though?
I have no idea.

There are open source database systems out there such as mySQL that
are free. How good? I have no idea. They may not have the
features that I would require but they may work very well for your
application. Although their feature list is being added to
regularly. I'd suggest searching at sourceforge.net.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
G

Guest

In a Jet 4.0 MDB, by my count,

132,406,200 records, one Boolean field.

87,486,165 records, indexed autonumber with one Boolean field.


MDB back end ran slow when I tried it with that many records,
but it doesn't look like an MDB backend is an option any way
- 1 or 2 million is a more realistic maximum.


Normally SQL Server express doesn't run any faster than MDB's,
but it handles indexes and memory differently, so this might be
a case.


(david)
 
A

a a r o n . k e m p f

bullshit it's not a lot better

you can PARTITION data in SQL Server and it's plenty fast.

Furthermore, SQL Server has these thigns called 'DataTypes' where you
have _MUCH_ better control over how 40 bytes.

40 bytes = 3 char, a couple of money and a handful of smallIntegers.
That should be enough for almost anything you want to do.

-Aaron
 
Ad

Advertisements

A

a a r o n . k e m p f

NOT TRUE

you can use Access -- with SQL Server -- to rightsize your data.

12gb unicode, stored as INTEGERS could be a whole heck of a lot
smaller

-Aaron
 
A

a a r o n . k e m p f

nothing costs tens of millions of dollars.
except a bunch of Access retards.

of _COURSE_ you should use SQL Server.
Maybe you shoudl spend a whopping $49 and use a developers edition?

Seriously

-Aaron



Most of the tools you're thinking of using can't even begin to work with that
much data unless you link tables in multiple databases and have multiple
instances running simultaneously.  You probably don't have the hardwareto
process the data in reasonable amounts of time, either.

Since they're free, look at MySQL 5.0 and IBM DB2 Express-C for your
feasibility study.  MySQL holds up to 16 TB, depending on the OS.  There's no
data file size limit on IBM DB2.

http://dev.mysql.com/downloads/mysql/5.0.html

http://www-128.ibm.com/developerworks/downloads/im/udbexp/index.html?...

I don't know about IBM DB2, but MySQL has limits on table sizes.  Seehttp://dev.mysql.com/doc/refman/5.0/en/full-table.html.

That said, the hurdles you face with only basic computer skills aren't
insurmountable, but realize that professional brokers spend tens of millions
of dollars building their backtesting systems with very, very experienced
software professionals.  Unless you're a quick study and can spend a minimum
of 15 to 18 months learning to build and use relational databases and data
warehouses daily, you'll write queries that takes days to execute - if they
ever finish - and the results will be wrong.  Your trading strategies will be
based on wishful thinking, and you won't have the skills to recognize this
until you've spent a lot of time building the project and lost a lot of money
trading with what you thought were winning strategies.

Just looking at your data structure on the one file, you have date and time,
open, high, low, close, up and down intervals, yet you don't even have the
stock symbol in the row of data.  Which means the table name is probably the
stock symbol.  Very poor data structure for efficient queries.

Don't build this from the ground up on your PC.  Get yourself hired as a
trainee where they build backtesting systems, or else hire someone who's had
the training.

Chris
Microsoft MVP


My data is only 8 columns across but something like 50 million or more rows
down. Just one of the files is 12 GB in CSV format. It's so big I can only
open tiny chunks of it in Excel. That is why I was looking at Access or SQL.
Below is a sample of the data. Any idea how best I should approach this,
keeping in mind that my programming and computer skills are only basic? I
feel that I could learn Access or SQL but if they can only handle 2GB and 4GB
file sizes, maybe I am better off trying something else?
"Date","Time","O","H","L","C","U","D"
04/09/1998,0334,0.8318,0.8318,0.8318,0.8318,0,0
04/09/1998,0335,0.8317,0.8317,0.8317,0.8317,0,0
04/09/1998,0335,0.8317,0.8317,0.8317,0.8317,0,0
04/09/1998,0336,0.8319,0.8319,0.8319,0.8319,0,0
04/09/1998,0336,0.8318,0.8318,0.8318,0.8318,0,0
04/09/1998,0336,0.8317,0.8317,0.8317,0.8317,0,0
I have a database of intra-day stock tick price/volume data that exceeds 50
million rows.  
[quoted text clipped - 10 lines]
 
A

a a r o n . k e m p f

bullshit

you could partition the data just fine using SQL Express.

I eat 100gb databases for lunch using single processor systems.

-Aaron
 
A

a a r o n . k e m p f

I've recently built a single SQL Server database that has 350 million
rows.

in 4gb.

Stick a fork in it kids.
It can be done.

But you should probably use something like SQL Server and a real ETL
Tool.

-Aaron
 
Ad

Advertisements

D

Douglas J. Steele

It's 2 Gb in Access 2000, 2002, 2003 and 2007.

--
Doug Steele, Microsoft Access MVP

(no private e-mails, please)


Is the limit in Access 2003 also 2GB ?
Thanks,
Yossi
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top