PC Review


Reply
Thread Tools Rate Thread

CSV decoding line-by-line

 
 
Phil Hibbs
Guest
Posts: n/a
 
      15th Nov 2010
I have a large CSV file that I need to read and filter into various
sheets based on some rules that test values in particular columns.
There are more than 64k rows, so I can't load it into one sheet first
and then split it up from there as this needs to be 2003-compatible.
I'm currently reading it line by line in a macro, breaking up each
line into its fields, applying the rules, and then writing each row to
the appropriate sheet. This is quite slow, I think mostly it's the
rules part of the code, but I was wondering if there was any way of
parsing a single row of CSV other than doing it the hard way? I
suppose I should try writing the entire line to a single cell and
calling Text To Columns on it and then sending it off to the
appropriate sheet. Maybe I could do, say, 10000 rows at a time this
way, and maybe use AutoFilter to apply the rules. Any other
suggestions?

Phil Hibbs.
 
Reply With Quote
 
 
 
 
Pete_UK
Guest
Posts: n/a
 
      15th Nov 2010
Chip Pearson has a macro here:

http://www.cpearson.com/excel/ImportBigFiles.aspx

which allows you to import files with more than 64k records - it just
splits them up into different sheets. This will be quicker than
reading the file one line at a time. Perhaps you can sort the data in
the imported sheets to reflect your rules (or apply autofilter), and
then copy blocks of data into the sheets you need.

Hope this helps.

Pete

On Nov 15, 11:11*am, Phil Hibbs <sna...@gmail.com> wrote:
> I have a large CSV file that I need to read and filter into various
> sheets based on some rules that test values in particular columns.
> There are more than 64k rows, so I can't load it into one sheet first
> and then split it up from there as this needs to be 2003-compatible.
> I'm currently reading it line by line in a macro, breaking up each
> line into its fields, applying the rules, and then writing each row to
> the appropriate sheet. This is quite slow, I think mostly it's the
> rules part of the code, but I was wondering if there was any way of
> parsing a single row of CSV other than doing it the hard way? I
> suppose I should try writing the entire line to a single cell and
> calling Text To Columns on it and then sending it off to the
> appropriate sheet. Maybe I could do, say, 10000 rows at a time this
> way, and maybe use AutoFilter to apply the rules. Any other
> suggestions?
>
> Phil Hibbs.


 
Reply With Quote
 
Jim Cone
Guest
Posts: n/a
 
      15th Nov 2010
Also, you can import up to ~98000 lines using the Excel Text Import Wizard.
("import external data" on the Data menu)
You do this by specifying the start line of the imported data.
See... http://support.microsoft.com/kb/119770
--
Jim Cone
Portland, Oregon USA
http://www.mediafire.com/PrimitiveSoftware

..
..
..

"Phil Hibbs" <(E-Mail Removed)>
wrote in message
news:56f18086-0c0f-45b0-b556-(E-Mail Removed)...
I have a large CSV file that I need to read and filter into various
sheets based on some rules that test values in particular columns.
There are more than 64k rows, so I can't load it into one sheet first
and then split it up from there as this needs to be 2003-compatible.
I'm currently reading it line by line in a macro, breaking up each
line into its fields, applying the rules, and then writing each row to
the appropriate sheet. This is quite slow, I think mostly it's the
rules part of the code, but I was wondering if there was any way of
parsing a single row of CSV other than doing it the hard way? I
suppose I should try writing the entire line to a single cell and
calling Text To Columns on it and then sending it off to the
appropriate sheet. Maybe I could do, say, 10000 rows at a time this
way, and maybe use AutoFilter to apply the rules. Any other
suggestions?

Phil Hibbs.
 
Reply With Quote
 
Phil Hibbs
Guest
Posts: n/a
 
      15th Nov 2010
On Nov 15, 1:36*pm, Pete_UK <pashu...@auditel.net> wrote:
> Chip Pearson has a macro here:
> http://www.cpearson.com/excel/ImportBigFiles.aspx
> Hope this helps.
> Pete


That suffers from the same problem that my code does (sorry, should
have mentioned that), which is that it does not respect quoted strings
that contain the separator character. I need proper quoted CSV
parsing.

Phil Hibbs.
 
Reply With Quote
 
GS
Guest
Posts: n/a
 
      15th Nov 2010
Phil Hibbs formulated on Monday :
> I have a large CSV file that I need to read and filter into various
> sheets based on some rules that test values in particular columns.
> There are more than 64k rows, so I can't load it into one sheet first
> and then split it up from there as this needs to be 2003-compatible.
> I'm currently reading it line by line in a macro, breaking up each
> line into its fields, applying the rules, and then writing each row to
> the appropriate sheet. This is quite slow, I think mostly it's the
> rules part of the code, but I was wondering if there was any way of
> parsing a single row of CSV other than doing it the hard way? I
> suppose I should try writing the entire line to a single cell and
> calling Text To Columns on it and then sending it off to the
> appropriate sheet. Maybe I could do, say, 10000 rows at a time this
> way, and maybe use AutoFilter to apply the rules. Any other
> suggestions?
>
> Phil Hibbs.


If it's a properly constructed CSV then you could load it into an array
of arrays, then loop the array for the desired value at its expected
position in each element.

Also, if it's a properly constructed CSV having the first line
containing the field names, you could use ADO and filter the data via a
WHERE clause and specify the field name and value to filter for. --If
the first line doesn't contain the field names then simply prepend a
line to the file contents and save it b efore running an ADO query.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


 
Reply With Quote
 
Phil Hibbs
Guest
Posts: n/a
 
      16th Nov 2010
GS wrote:
> Also, if it's a properly constructed CSV having the first line
> containing the field names, you could use ADO and filter the data via a
> WHERE clause and specify the field name and value to filter for. --If
> the first line doesn't contain the field names then simply prepend a
> line to the file contents and save it b efore running an ADO query.


Unfortunately, the 4th line contains the column headings. That's one
of the reasons I need to process it line-by-line in VBA and decode
each line separately. I was kind of hoping there was some easy way to
take a string and parse it as a quoted CSV (or tab-separated, in the
case of another file I need to do something similar with). The SPLIT
function is not good enough as the data contains commas within quoted
values.

Phil Hibbs.
--
 
Reply With Quote
 
GS
Guest
Posts: n/a
 
      16th Nov 2010
It happens that Phil Hibbs formulated :
> GS wrote:
>> Also, if it's a properly constructed CSV having the first line
>> containing the field names, you could use ADO and filter the data via a
>> WHERE clause and specify the field name and value to filter for. --If
>> the first line doesn't contain the field names then simply prepend a
>> line to the file contents and save it b efore running an ADO query.

>
> Unfortunately, the 4th line contains the column headings. That's one
> of the reasons I need to process it line-by-line in VBA and decode
> each line separately. I was kind of hoping there was some easy way to
> take a string and parse it as a quoted CSV (or tab-separated, in the
> case of another file I need to do something similar with). The SPLIT
> function is not good enough as the data contains commas within quoted
> values.
>
> Phil Hibbs.


This begs me to ask what the first 3 lines contain. Can these be
eliminated so the file starts with the headings? -Perhaps a temp file
that starts at line 4! If so, filtering by heading is a good way to
capture records for dumping to a worksheet in one shot. That way it
shouldn't matter what each line contains because the entire recordset
will be only lines that meet your WHERE clause criteria.

I'm thinking that you could dump the entire file into a variant using
the Split function and specifying vbCrLf as the delimiter. Then empty
the first 3 elements and Filter() the array to get rid of those. Then
dump the array back into a temp file using Join and vbCrLf as the
delimiter. Then run your ADO query.

May seem like a lot of work but the coding is rather simple and
straight forward. The process should be fairly fast, and (I suspect)
much easier to manage than your current parsing process.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


 
Reply With Quote
 
Phil Hibbs
Guest
Posts: n/a
 
      16th Nov 2010
GS wrote:
> This begs me to ask what the first 3 lines contain. Can these be
> eliminated so the file starts with the headings? -Perhaps a temp file
> that starts at line 4!


Client name, date and time of report, and a blank line. I guess I
could write a temp file and then use the built-in facilities. I'm
reluctant to do individual ADO queries, though, as I need to split the
data out into 31 different sheets, and I don't want to have to read
through a 70,000 line file 31 times.

Phil.
 
Reply With Quote
 
GS
Guest
Posts: n/a
 
      16th Nov 2010
Phil Hibbs pretended :
> GS wrote:
>> This begs me to ask what the first 3 lines contain. Can these be
>> eliminated so the file starts with the headings? -Perhaps a temp file
>> that starts at line 4!

>
> Client name, date and time of report, and a blank line. I guess I
> could write a temp file and then use the built-in facilities. I'm
> reluctant to do individual ADO queries, though, as I need to split the
> data out into 31 different sheets, and I don't want to have to read
> through a 70,000 line file 31 times.
>
> Phil.


Well, the Tab-delimited files would be easy to dump into an array of
arrays. It's the Comma-delimited files that are going to be a problem
if, as you say, some individual values also contain commas. The easiest
way <IMO> to handle these is with loading lines into ADO recordsets,
which you'd only have to process if the recordset is not empty.

As it stands now, you have to evaluate each one of those 70,000 lines
to determine which sheet to put the data on, then redirect code to the
appropriate place to do that. Making 31 recordsets sounds easier and
quicker to me!<g>

Here's some code to restructure the CSVs:

Sub RestructureCSVs(ByVal FileIn As String, _
ByVal LinesToRemove As Long, _
Optional FileOut As String = "tmpCSV.dat")
' Opens a CSV file for editing.
' (Used to remove auxilliary lines before the line containing headings)
' Removes a specified number of lines from the start of the file.
' Dumps the filtered array into a reusable temp file.
' Requires ReadTextFileContents(), WriteTextFileContents()

Dim saLines() As String, i As Long

'Get all lines from the file
saLines() = Split(ReadTextFileContents(FileIn), vbCrLf)

'To quickly delete unwanted/empty lines, load them with vbNullChar
'and use the Filter() function to delete them
If LinesToRemove > 0 Then
For i = 0 To LinesToRemove - 1
saLines(i) = vbNullChar
Next
saLines() = Filter(saLines(), vbNullChar, False)

'Dump the contents into a temp file
FileOut = ThisWorkbook.Path & "\" & FileOut
WriteTextFileContents Join(saLines, vbCrLf), FileOut
End If
End Sub

Function ReadTextFileContents(Filename As String) As String
' Reuseable proc to read large amounts of data from a text file

Dim iNum As Integer, bFileIsOpen As Boolean

On Error GoTo ErrHandler
iNum = FreeFile()
Open Filename For Input As #iNum
bFileIsOpen = True '//if we got here the file opened successfully
ReadTextFileContents = Space$(LOF(iNum)) '//size our return string
'Read the entire contents in one single step
ReadTextFileContents = Input(LOF(iNum), iNum)

ErrHandler:
'Close the file
If bFileIsOpen Then Close #iNum
If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFileContents()

Sub WriteTextFileContents(Text As String, _
Filename As String, _
Optional AppendMode As Boolean = False)
' Reuseable proc to write/append large amounts of data to a text file

Dim iNum As Integer, bFileIsOpen As Boolean

On Error GoTo ErrHandler
iNum = FreeFile()
If AppendMode Then
Open Filename For Append As #iNum
Else
Open Filename For Output As #iNum
End If
bFileIsOpen = True '//if we got here the file opened successfully
Print #iNum, Text '//print to the file in one single step


ErrHandler:
'Close the file
If bFileIsOpen Then Close #iNum
If Err Then Err.Raise Err.Number, , Err.Description
End Sub 'WriteTextFileContents()

HTH

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


 
Reply With Quote
 
Phil Hibbs
Guest
Posts: n/a
 
      16th Nov 2010
GS wrote:
> Well, the Tab-delimited files would be easy to dump into an array of
> arrays. It's the Comma-delimited files that are going to be a problem
> if, as you say, some individual values also contain commas.


You'd think so. However, it gets better, it turns out the tab-
delimited file also contains tabs within quoted fields.

Phil.
--
 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
I format line 1 in bold; when I unbold line 2, line 1 UNBOLDS Jack Microsoft Powerpoint 0 21st Sep 2009 05:17 PM
How to convert a dotted line to a solid line in a line graph Sharlz Microsoft Excel Charting 1 14th Jan 2009 04:51 AM
How to store/read listbox items in a text file line by line (with line break) ? kimiraikkonen Microsoft VB .NET 6 2nd Nov 2007 05:27 PM
Different line border width between Line control and Line method Vensia Microsoft Access Reports 4 24th Oct 2005 12:38 PM
Macro problem on, Yellowed line - previous line or next line. =?Utf-8?B?RWQ=?= Microsoft Excel Programming 8 29th Mar 2005 11:55 PM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 06:10 AM.