CSV File Import

W

William Foster

Good morning all,

I have an issue where I am importing a large CSV into Visual Studio 2005
- Visual Basic; as always the creators of the CSV have included the
details of the data at the end file (Some 1.6 million records from the
beginning). I am doing some evaluation on the file before I undertake
the importation, currently this is being done by reading all rows until
I get to the end then returning the last 10 or so rows; does anyone know
how I can read from the bottom of the file up to negate the 1.6 million
cycles of the read procedure before I can get to the data I need.

Any assistance you may be able to provide would be greatly appreciated.

Yours sincerely,

William Foster
 
S

ShaneO

William said:
Good morning all,

I have an issue where I am importing a large CSV into Visual Studio 2005
- Visual Basic; as always the creators of the CSV have included the
details of the data at the end file (Some 1.6 million records from the
beginning). I am doing some evaluation on the file before I undertake
the importation, currently this is being done by reading all rows until
I get to the end then returning the last 10 or so rows; does anyone know
how I can read from the bottom of the file up to negate the 1.6 million
cycles of the read procedure before I can get to the data I need.

Any assistance you may be able to provide would be greatly appreciated.

Yours sincerely,

William Foster

There's probably a simpler way, but I've done it before by first opening
the file in Binary mode, and as CSV files are generally line-delimited
by CrLf (Hex 0D 0A) characters, I start by obtaining the file length and
reading backwards (using SEEK) to find the second-last CrLf.

Between the second-last and the last CrLf would be your header
information you were looking for.

I trust this helps, although as already said, maybe someone else has a
better way.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who
don't.
 
H

Homer J Simpson

William Foster said:
Good morning all,

I have an issue where I am importing a large CSV into Visual Studio 2005
- Visual Basic; as always the creators of the CSV have included the
details of the data at the end file (Some 1.6 million records from the
beginning). I am doing some evaluation on the file before I undertake
the importation, currently this is being done by reading all rows until
I get to the end then returning the last 10 or so rows; does anyone know
how I can read from the bottom of the file up to negate the 1.6 million
cycles of the read procedure before I can get to the data I need.

Have you tried MSWord?
 
W

William Foster

Shane,

Thanks for your assistance, I will give that a go, it sounds a bit
awkward, but it is much better than the way I am doing it.

Thank you !

Yours sincerely,

William Foster
 
W

William Foster

Homer,

I was previously using Microsoft Excel, but I need to move to Visual
Studio for a number of reasons. In Microsoft Word I would still need to
seek the last record by reading all rows anyway wouldn't I ? Or is there
a quick way ?

Thanks for you assistance.

Yours sincerely,

William Foster
 
S

ShaneO

William said:
Shane,

Thanks for your assistance, I will give that a go, it sounds a bit
awkward, but it is much better than the way I am doing it.

Thank you !

Yours sincerely,

William Foster

Hello William,

For what it's worth, I've found the code I mentioned earlier.

You'll have to excuse the syntax, it was just a quick and dirty routine
I needed for a one-off.

If you copy & paste it (watch for line-wrapping!), then set the
filename, you'll find it will display your Header data as required.


Dim blPossibleEOL As Boolean = False
Dim byTemp As Byte = 0
Dim iFileNum As UInt16 = FreeFile()
Dim sA As String = "YOUR FILENAME HERE!!!"
Dim sHeader As String = ""

Dim iFileEnd As Integer = FileLen(sA) - 2
Dim iFileSeekPosition As Integer = iFileEnd
FileOpen(iFileNum, sA, OpenMode.Binary, OpenAccess.Read)

Do
FileGet(iFileNum, byTemp, iFileSeekPosition)
If blPossibleEOL Then
If byTemp = &HD Then
sHeader = StrDup(iFileEnd - (iFileSeekPosition + 1), " ")
FileGet(iFileNum, sHeader, iFileSeekPosition + 2)
Exit Do
End If
ElseIf byTemp = &HA Then
blPossibleEOL = True
Else
blPossibleEOL = False
End If
iFileSeekPosition -= 1
Loop
FileClose(iFileNum)

Debug.Print(sHeader)


Hope this helps.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who
don't.
 
H

Homer J Simpson

Homer,

I was previously using Microsoft Excel, but I need to move to Visual
Studio for a number of reasons. In Microsoft Word I would still need to
seek the last record by reading all rows anyway wouldn't I ? Or is there
a quick way ?

I believe you could just go <CTRL>-<END> and get the last several lines.
 
W

William Foster

Shane,

Thanks a lot ! That will save me a fair bit of work.

Yours sincerely,

William Foster
 
W

William Foster

Homer,

Thanks for the tip, however, I need to do it progammatically as it is
part of a larger program to clean a file, this first bit just assesses
the data. There are in excess of 1.6 million records, with
73 fields per record so opening it up into Microsoft Word and then
getting the program to scroll to the bottom with your method will result
in a computer crash. Kind of the way that Microsoft Excel is currently
dying whcih is why I am moving to Visual Studio.

Thanks anyway !

Yours sincerely,

William Foster
 
H

Homer J Simpson

Thanks for the tip, however, I need to do it progammatically as it is
part of a larger program to clean a file, this first bit just assesses
the data. There are in excess of 1.6 million records, with
73 fields per record so opening it up into Microsoft Word and then
getting the program to scroll to the bottom with your method will result
in a computer crash. Kind of the way that Microsoft Excel is currently
dying whcih is why I am moving to Visual Studio.

OK. If it was a one off I'd try Word. You could also do this with vi on a
Unix system.

Otherwise the best method is to open the file, go to the end, back up some
and then read the data in. This is what Word and vi can do as both can
handle files far larger than main memory.
 
W

William Foster

Shane,

I tried that routine out and it worked beautifully; it was amazingly
quick to search through on of my data files which was in excess of
700,000 rows.

I have just one more question, now that I have the last row, do you have
any ideas on how to capture the last sixteen rows? I have tried a few
different things with your existing code but found that the FileGet must
work on some type of algorithm as it doesn't read sequentially and in
order to find the end of the file it only did 26 loops within the whole
file.

Thank you very much for your assistance so far it has been invaluable.

Yours sincerely,

William Foster
 
S

ShaneO

William said:
Shane,

I tried that routine out and it worked beautifully; it was amazingly
quick to search through on of my data files which was in excess of
700,000 rows.

I have just one more question, now that I have the last row, do you have
any ideas on how to capture the last sixteen rows? I have tried a few
different things with your existing code but found that the FileGet must
work on some type of algorithm as it doesn't read sequentially and in
order to find the end of the file it only did 26 loops within the whole
file.

Thank you very much for your assistance so far it has been invaluable.

Yours sincerely,

William Foster

William,

The following code will do what you want. (Watch for line-wrapping!) -


Dim blPossibleEOL As Boolean = False
Dim byTemp As Byte = 0
Dim iFileNum As UInt16 = FreeFile()
Dim sA As String = "YOUR FILENAME HERE!!"
Dim sHeader As String = ""
Dim iCounter As UInt16 = 0

Dim iFileEnd As Integer = FileLen(sA) - 2
Dim iFileSeekPosition As Integer = iFileEnd
FileOpen(iFileNum, sA, OpenMode.Binary, OpenAccess.Read)

Do
FileGet(iFileNum, byTemp, iFileSeekPosition)
If blPossibleEOL Then
If byTemp = &HD Then
iCounter += 1
If iCounter = 26 Then 'Change this for whatever number of
lines you are wanting.
sHeader = StrDup(iFileEnd - (iFileSeekPosition + 1), " ")
FileGet(iFileNum, sHeader, iFileSeekPosition + 2)
Exit Do
Else
blPossibleEOL = False
End If
Else
blPossibleEOL = False
End If
ElseIf byTemp = &HA Then
blPossibleEOL = True
Else
blPossibleEOL = False
End If
iFileSeekPosition -= 1
Loop
FileClose(iFileNum)

Debug.Print(sHeader)

One thing to be careful of is that I have declared both "iFileEnd" and
"iFileSeekPosition" as Integer variables. If the CSV file you're
reading is larger that approx 2GB you'll need to change that to Long
Integer (UInt64).

Hope this helps.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who
don't.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top