load/parse large text file

K

Keith R

I have a data file that I'm importing into Excel (starts as a VMS report
that is dumped into the body of an outlook message, which is then saved as a
text file). I parse it based on fixed width columns (no delimiter) the same
code is used for several different files, so the parse widths are stored in
an array.

When the file was smaller, it was plenty fast- but now the file has grown to
13MB, and it takes many minutes to import. I suspect there is a better way.
Right now I'm opening the file, grabbing each line, parsing it, then pasting
it to the worksheet (code below).

Can someone recommend the best practice to do this as quickly as possible?
I've already got calculations set to manual, and screenupdating to false. I
was considering loading the entire file first, then parsing each line in
memory, then pasting all the lines, but while that groups like operations
together, I wasn't sure how it would save time if all of the same operations
were still being done a line at a time.

Thanks!!
Keith

Open LongFN For Input As #1
Do While Not EOF(1)
Line Input #1, LineofText
If Len(LineofText) > 3 Then
rw = rw + 1
' now parse LineofText according to the column
widths and
' put the values in an array.
For j = 1 To 30
ParseStart = TotalFileArray(WhichFile, j)
ParseEnd = TotalFileArray(WhichFile, j + 1)
If ParseEnd > 0 Then
TotalDataArray(WhichFile, j, rw) =
Trim(Mid(LineofText, ParseStart, ParseEnd - ParseStart))
ConvertCol (j)
mywrksht.Range(Usecol & Trim(Str(rw))).Value
= TotalDataArray(WhichFile, j, rw)
End If
Next
End If
Loop
Close #1 'Close the file
 
P

Peter Grebenik

Keith

My understanding is that each "write" to the spreadsheet carries an
overhead.

So, you could try parsing something like 100 rows of data into a
variant array and then writing that to the spreadsheet by simply
setting the appropriate range equal to the array. This will also cut
out your call to your ConvertCol routine.

HTH

Peter
 
K

Keith R

I've updated my code to read the entire file first, one line at a time into
an array (wow, that does seem faster, now down to about 30 seconds), parse
each line in memory (almost instant), and now I'm ready to set the worksheet
range equal to the array. I now have a few more questions. Using Excel 2003.

1) I'm never sure how many rows the data file will have, so I start with an
oversized array so I won't have to continually redim/preserve. My array is
dimmed as testarray(1 To 32, 1 To 50000). I'm not worried about memory, so
is it ok to dim an oversized array or will that cause some other problem I
don't know about yet?

2) I'm not sure how many rows are in the data file in advance, so I
increment a counter variable, rw, as I read each line. what is the
appropriate syntax to set a range equal to the array? I tried the following,
but it didn't work out.
Sheet1.range("A1") = testArray() 'in case Excel could autoexpand the range
to match the array size
Sheet1.range("A1:AF50000") = testArray() 'to paste the whole array at once,
risk overwriting surrounding areas with blanks
Sheet1.range("A1").resize(rw,32) = testArray() ' to paste only the same
number of rows as were in the raw file?

I keep getting a type mismatch error :(

3) I'm actually reading the raw file into (32, 1 to 50000), then parsing
each string back into fields 1-31. When I write the range back, is there an
easy way to just write the first 31 fields of the array? My alternative is
to erase field 32 before writing the whole array (probably with a
redim/preserve to make the array smaller), but I wasn't sure if I could just
write the desired section of the array- it might be useful if I later find
out I need to go back to the original string for any reason.

Many thanks!!
Keith
 
P

Peter Grebenik

Keith

I think that you need to swap your rows and columns so that it becomes
testArray(rows,columns).

To put your array into the sheet, you need to set the range size to be
equal to the array size as the following sample code shows. I used
this to copy a range of 40000 rows by 31 columns in about 10 seconds.

Sub t()
Dim r As Long, c As Long, a As Variant
a = Selection

r = UBound(a, 1) - LBound(a, 1) + 1
c = UBound(a, 2) - LBound(a, 2) + 1

Range("ah2").Resize(r, c) = a

End Sub


Peter
 
P

Peter T

As Peter said you need to swap your rows/columns

Re not dumping unnecessary columns into cells with your potentially
oversized array, while populating your array track the max number of used
columns, then redim preserve down to the max used. Can't do that of course
for the first dimension, the rows.

Regards,
Peter T
 
K

Keith R

Peter- I apologize for being so dense- I've not taken this approach before,
so the syntax is killing me. I switched the rows and columns, so that part
is done.

I believe the Ubound and LBound would need to refer to the array, so
therefore 'a' has to refer to my array. I changed the line a=selection to
a=myarray()

It gets through the Ubound/LBound lines and returns correct size paramters,
but when it tries to assign the array to my worksheet, I get a runtime error
1004 application- or object-defined error.

Sheet17.Range("a1").Resize(r, c) = a
values=
Sheet17.range("a1").Resize(5000,32)=a
I also reversed it in case the resize was based on my original (switched)
array order
Sheet17.range("a1").resize(32,5000)=a
but got the same error.

I tried both a=myarray() and a=myarray

Thank you for your continued help and patience,
Keith
 
K

Keith R

Ok, I have it working now-

The reason I was getting the error was I had not yet redimmed my array to
truncate the last field (the one the contained the original text string, in
case I needed it later). The error was occuring when it tried to paste the
original strings into cells. Once I redimmed the array and shrank the paste
range accordingly, everything pasted just fine.

I know there is a visible character limit of something like 255 characters
in a cell, but I thought that the cell could actually hold a good bit more
than that. I'm now assuming that I hit an upper limit on the actual cell
string length, whatever that limit is.

Thanks again to Peter G and Peter T for the help,
Keith
 
P

Peter Grebenik

Keith

I hope you got your improvement in speed.

A little experimentation seems to indicate that the maximum string
length that can be copied by this method is about 900 characters in
Excel 2003.

Helpfully, Microsoft have arranged different limits in different
versions of Excel!

Best wishes

Peter
 
P

Peter T

A cell can display the first 1024 characters but store 32K, whether as the
result of a formula, a value or after copy/paste

[a1].Formula = "=REPT(""a"",2^15-1)"
[a2].Formula = "=LEN(A1)"
[a3].Value = [a1].Value
[a4].Formula = "=LEN(A1)"
[a3].Copy [a5] ' copy/paste
[a6].Formula = "=LEN(A5)"

The above is a bit artificial and the respective limits may be slightly
less. Max length of a formula (beginning with an =) is 1024 or a bit less.

Regards,
Peter T
 
P

Peter Grebenik

Peter

I was merely pointing out the limitations of copying data by setting a
range equal to an array. It means that this method cannot be used to
copy cells containing very long text strings; they have to be copied
using the copy method.

Peter
 
P

Peter T

Hi Peter,

I have to confess I only scan read parts of the thread before posting the
previous two times, eg I didn't pick up on "this method" in your adjacent
post

I see your point, for me in a very light test the array to cells method
truncates to 1823 characters

[a1].Formula = "=Rept(""a"", 20000)"
[a1:d5] = [a1].Value

varr = [a1:d5].Value
Debug.Print Len(varr(2, 2)) ' 20000, picks up fine

Range("A11:d16") = varr
Debug.Print Len(Range("A11")) ' 1823, dumps truncated

Regards,
Peter T
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top