Most of the tools you're thinking of using can't even begin to work with that
much data unless you link tables in multiple databases and have multiple
instances running simultaneously. You probably don't have the hardware to
process the data in reasonable amounts of time, either.
Since they're free, look at MySQL 5.0 and IBM DB2 Express-C for your
feasibility study. MySQL holds up to 16 TB, depending on the OS. There's no
data file size limit on IBM DB2.
http://dev.mysql.com/downloads/mysql/5.0.html
http://www-128.ibm.com/developerworks/downloads/im/udbexp/index.html?...
I don't know about IBM DB2, but MySQL has limits on table sizes. See http://dev.mysql.com/doc/refman/5.0/en/full-table.html.
That said, the hurdles you face with only basic computer skills aren't
insurmountable, but realize that professional brokers spend tens of millions
of dollars building their backtesting systems with very, very experienced
software professionals. Unless you're a quick study and can spend a minimum
of 15 to 18 months learning to build and use relational databases and data
warehouses daily, you'll write queries that take days to execute (if they
ever finish), and the results will be wrong. Your trading strategies will be
based on wishful thinking, and you won't have the skills to recognize this
until you've spent a lot of time building the project and lost a lot of money
trading with what you thought were winning strategies.
Just looking at your data structure on the one file, you have date and time,
open, high, low, close, up and down intervals, yet you don't even have the
stock symbol in the row of data. That means the table name is probably the
stock symbol, which is a very poor data structure for efficient queries.
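To make the point about the missing symbol concrete, here is a minimal sketch of the alternative: one table for every symbol, with the symbol stored in each row and a composite index on (symbol, timestamp) so a query for one symbol over a date range doesn't scan everything. It uses Python's bundled sqlite3 module purely as a stand-in for MySQL or DB2; the table name `ticks`, the column names and the symbol `XYZ` are hypothetical. Timestamps are stored as ISO 8601 text because MM/DD/YYYY strings don't sort in date order.

```python
import sqlite3

# Hypothetical schema: one table for all instruments, with the symbol stored
# in every row instead of being encoded in the table name.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ticks (
    symbol TEXT NOT NULL,
    ts     TEXT NOT NULL,   -- ISO 8601, e.g. '1998-04-09 03:34', sorts correctly as text
    o REAL, h REAL, l REAL, c REAL,
    u INTEGER, d INTEGER
);
-- Not UNIQUE: the sample data repeats the same minute several times.
CREATE INDEX IF NOT EXISTS ix_ticks_symbol_ts ON ticks (symbol, ts);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table and index exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def bars_between(conn, symbol, start, end):
    """Return (ts, o, h, l, c) rows for one symbol with start <= ts <= end."""
    cur = conn.execute(
        "SELECT ts, o, h, l, c FROM ticks "
        "WHERE symbol = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (symbol, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_db()
    rows = [
        ("XYZ", "1998-04-09 03:34", 0.8318, 0.8318, 0.8318, 0.8318, 0, 0),
        ("XYZ", "1998-04-09 03:35", 0.8317, 0.8317, 0.8317, 0.8317, 0, 0),
        ("ABC", "1998-04-09 03:35", 1.0, 1.0, 1.0, 1.0, 0, 0),
    ]
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    print(len(bars_between(conn, "XYZ", "1998-04-09 00:00", "1998-04-09 23:59")))  # 2
```

With the symbol in the row, adding another instrument is an INSERT rather than a new table, and a query across several symbols is a single WHERE clause instead of a UNION over many tables.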
Don't build this from the ground up on your PC. Get yourself hired as a
trainee where they build backtesting systems, or else hire someone who's had
the training.
Chris
Microsoft MVP
My data is only 8 columns across but something like 50 million or more rows
down. Just one of the files is 12 GB in CSV format. It's so big I can only
open tiny chunks of it in Excel. That is why I was looking at Access or SQL.
Below is a sample of the data. Any idea how best I should approach this,
keeping in mind that my programming and computer skills are only basic? I
feel that I could learn Access or SQL but if they can only handle 2GB and 4GB
file sizes, maybe I am better off trying something else?
"Date","Time","O","H","L","C","U","D"
04/09/1998,0334,0.8318,0.8318,0.8318,0.8318,0,0
04/09/1998,0335,0.8317,0.8317,0.8317,0.8317,0,0
04/09/1998,0335,0.8317,0.8317,0.8317,0.8317,0,0
04/09/1998,0336,0.8319,0.8319,0.8319,0.8319,0,0
04/09/1998,0336,0.8318,0.8318,0.8318,0.8318,0,0
04/09/1998,0336,0.8317,0.8317,0.8317,0.8317,0,0
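Because the 12 GB file can't be opened in Excel, one way to get it into a database at all is to stream it: read one line at a time and insert in fixed-size batches, so memory use doesn't grow with the file. This is a minimal sketch, assuming the dates in the sample are MM/DD/YYYY and the times HHMM, and again using Python's sqlite3 as a stand-in for whichever database you choose. The function names, the table name `ticks` and the symbol passed in are hypothetical; the symbol has to come from the caller because the file doesn't contain it.

```python
import csv
import io
import sqlite3
from datetime import datetime
from itertools import islice


def parse_rows(lines, symbol):
    """Yield (symbol, iso_ts, o, h, l, c, u, d) tuples from the CSV format above.

    Assumes MM/DD/YYYY dates and HHMM times (an assumption about the sample).
    """
    reader = csv.reader(lines)
    next(reader)  # skip the "Date","Time",... header line
    for row in reader:
        if not row:  # tolerate blank lines, e.g. at end of file
            continue
        date_s, time_s, o, h, l, c, u, d = row
        ts = datetime.strptime(date_s + " " + time_s, "%m/%d/%Y %H%M")
        yield (symbol, ts.strftime("%Y-%m-%d %H:%M"),
               float(o), float(h), float(l), float(c), int(u), int(d))


def load(conn, rows, batch_size=50_000):
    """Insert rows in fixed-size batches; returns the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks (symbol TEXT, ts TEXT, "
        "o REAL, h REAL, l REAL, c REAL, u INTEGER, d INTEGER)"
    )
    rows = iter(rows)  # islice must consume one shared iterator, not restart a list
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?, ?, ?, ?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total


if __name__ == "__main__":
    sample = (
        '"Date","Time","O","H","L","C","U","D"\n'
        "04/09/1998,0334,0.8318,0.8318,0.8318,0.8318,0,0\n"
        "04/09/1998,0335,0.8317,0.8317,0.8317,0.8317,0,0\n"
    )
    conn = sqlite3.connect(":memory:")
    print(load(conn, parse_rows(io.StringIO(sample), "XYZ")))  # 2
```

For a real file, something like `with open("ticks.csv", newline="") as f: load(sqlite3.connect("ticks.db"), parse_rows(f, "XYZ"))` (file names hypothetical) would load it without ever holding more than one batch in memory. The table here deliberately has no index; creating the index once after the bulk load is usually faster than maintaining it during 50 million inserts.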
I have a database of intra-day stock tick price/volume data that exceeds 50
million rows.