What's a fast way of doing millions of SQL queries


dave

Hi,
I have a job that involves checking large volumes of addresses
against a master address table.

The master address table contains approx 1.7 million addresses.
The files (normally CSV) I must check against will contain anything
between 100,000 records and 40 million records.

Due to spelling mistakes, different address formats and other errors I
can't just load the data into SQL and run simple queries to get
matches. So I am writing some VB.NET code to pre-process the data in
order to match it to the SQL master address table.

At the moment I've played around with an open SQL connection and a
SqlDataReader that I'm feeding new SqlCommands to. It works, but as I'm a
relative beginner to VB.NET it's probably not the most efficient method.

Would loading the master address table into a DataSet or DataTable be
effective? Bear in mind that the master address table will be approx
600 MB in size.


Any help/advice/suggestions greatly appreciated.


thanks
Dave
 

steve

uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.

imho, this should all be done as a dts package and any other option, unless
for better reasons than given here, should be summarily dismissed.
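As a rough illustration of the expansion steve describes, here is a minimal Python sketch. The abbreviation map is a small made-up subset for illustration only; a real job would take its standardized forms from the USPS addressing guidelines, and ambiguous tokens (e.g. "st" as street vs. saint) need more care than shown here.

```python
# Minimal sketch of expanding address abbreviations to full standardized
# words. The mapping below is a small illustrative subset only.
SUFFIXES = {
    "blvd": "boulevard",
    "ave": "avenue",
    "st": "street",
    "ste": "suite",
    "w": "west",
    "n": "north",
}

def standardize(address):
    """Lowercase, strip punctuation, and expand each abbreviated token."""
    words = []
    for token in address.lower().replace(",", " ").split():
        token = token.rstrip(".")
        words.append(SUFFIXES.get(token, token))
    return " ".join(words)

print(standardize("123 W Main St., Ste. 4"))
```

Once every row (master and input alike) has passed through the same standardization, exact joins become possible and the database can do the heavy lifting.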


 

dave

steve said:
uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.

My problem is that the addresses are not standardized. They are a mess,
and in many cases they are invalid and unusable in their current form.
Hence the need to reference a master address table to clean them up.

Dave
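For what it's worth, the kind of fuzzy pre-matching dave is after can be sketched with Python's standard difflib; the master addresses below are hypothetical and assumed to be standardized already.

```python
import difflib

# Hypothetical slice of the master address table, assumed pre-standardized.
master = [
    "123 west main street springfield",
    "456 oak avenue springfield",
]

def best_match(dirty, candidates, cutoff=0.8):
    """Return the closest master address, or None if nothing scores above cutoff."""
    hits = difflib.get_close_matches(dirty.lower(), candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(best_match("123 W Main Stret, Springfield", master))
```

At tens of millions of input rows this pairwise scoring gets expensive fast, which is why steve's standardize-then-join approach tends to win in practice.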
 

JT Lovell

Steve's correct: you need to get the data into the SQL table and then
process it from there. If the CSV file requires some basic preprocessing to
get it ready to load into SQL, you can code that in VB, but get it into
SQL and then do the majority of your cleanup work there.
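To illustrate the point JT and steve are making about set-based work beating one query per record, here is a minimal Python sketch (table contents hypothetical): load the master keys once, then do a cheap in-memory membership test per incoming row, which is the same principle behind doing the join inside SQL rather than round-tripping from VB.

```python
# Load the master addresses once into a set (one bulk read), then test each
# incoming record against it in memory -- instead of issuing one query per record.
master = {"123 west main street", "456 oak avenue"}  # hypothetical contents

def match_all(records):
    """Return (record, found_in_master) pairs using a single in-memory pass."""
    return [(r, r in master) for r in records]

print(match_all(["123 west main street", "789 pine road"]))
```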
 
