How to put lines with certain text (from a file) in an array


Varun

Guys,

I'd like to open and parse a file so that only lines containing certain
text get included in my array. How can I accomplish this?
For example, let's say the file contents are as follows:

text in line 1
text in line 2
layer_1
layer_2
layer_3

I'd like to save the lines with layer_1, layer_2, and layer_3 in my array
named line.

Here's what I have so far - what should I do next?

Sub geomsasciiparse()

Dim Buf() As String
Dim logical_layer As Variant
Dim line() As String
Dim strBuffer As String

Dim objFSO As Object
Dim objGeomsAsciiFile As Object
Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objGeomsAsciiFile = objFSO.OpenTextFile(MentDesContPath & "\geoms_ascii")

Do While Not objGeomsAsciiFile.AtEndOfStream

strBuffer = objGeomsAsciiFile.ReadLine

If InStr(strBuffer, "layer") = 1 Then



End If

Loop

End Sub

Thanks for the help.
 

RB Smissaert

Function OpenTextFileToString(strFile As String) As String
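' read the whole file into a single string with a binary Get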

Dim hFile As Long

On Error GoTo ERROROUT

hFile = FreeFile
Open strFile For Binary As #hFile
OpenTextFileToString = Space(LOF(hFile))
Get hFile, , OpenTextFileToString
Close #hFile

Exit Function
ERROROUT:

If hFile > 0 Then
Close #hFile
End If

End Function


Sub Test()

Dim i As Long
Dim n As Long
Dim str As String
Dim arr1
Dim arr2() As String

str = OpenTextFileToString("C:\testfile.txt")

arr1 = Split(str, vbCrLf)

ReDim arr2(0 To UBound(arr1)) As String

For i = 0 To UBound(arr1)
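' keep only the lines containing "layer_" (vbBinaryCompare makes this case-sensitive)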
If InStr(1, arr1(i), "layer_", vbBinaryCompare) > 0 Then
arr2(n) = arr1(i)
n = n + 1
End If
Next i

ReDim Preserve arr2(0 To n - 1) As String

'to check we got it right
For i = 0 To UBound(arr2)
MsgBox arr2(i), , i
Next i

End Sub


RBS
 

Rick Rothstein

The following code will output all the lines containing the text "layer_"
**anywhere** within them. Notice that you can't pick and choose a subset of
all the "layer_" lines; that is, you can't use this method to only output
"layer_1" and "layer_2" skipping over "layer_3"... the search text passed to
the Filter function is matched the same way InStr matches it, anywhere within
the element. Oh, and change the file names for the input and output files.

Sub ReadProcessOutput()
Dim FileNum As Long
Dim TotalFile As String
Dim LinesOut As String
Dim LinesIn() As String
FileNum = FreeFile
Open "d:\temp\Test.txt" For Binary As #FileNum
TotalFile = Space(LOF(FileNum))
Get #FileNum, , TotalFile
Close #FileNum
LinesIn = Split(TotalFile, vbCrLf)
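' Filter keeps only the elements that contain "layer_"; vbTextCompare makes the match case-insensitive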
LinesOut = Join(Filter(LinesIn, "layer_", True, vbTextCompare), vbCrLf)
FileNum = FreeFile
Open "d:\temp\OutTest.txt" For Output As #FileNum
Print #FileNum, LinesOut
Close #FileNum
End Sub
 

Peter T

That's pretty cool Rick!
you can't use this method to only output "layer_1" and "layer_2" skipping
over "layer_3"...

maybe -

LinesOut = Join(Filter(Filter(LinesIn, "layer_", True, vbTextCompare), _
"layer_3", False, vbTextCompare), vbCrLf)

(the inner Filter keeps only the elements containing "layer_"; the outer one,
with Include set to False, then drops any of those that also contain "layer_3")

Regards,
Peter T
 

RB Smissaert

LinesOut = Join(Filter(LinesIn, "layer_", True, vbTextCompare), vbCrLf)

OK, that is another way, but as you say you still may need Instr if you want
layer_1, layer_2, layer_3, but not layer_4.
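
Just as a rough sketch of that with InStr (LinesIn and Wanted are only
made-up names here, assuming the lines are already split into LinesIn):

Dim i As Long, n As Long
Dim Wanted() As String

ReDim Wanted(0 To UBound(LinesIn)) As String
For i = 0 To UBound(LinesIn)
' keep lines that contain "layer_" but not "layer_4"
If InStr(1, LinesIn(i), "layer_", vbTextCompare) > 0 And _
InStr(1, LinesIn(i), "layer_4", vbTextCompare) = 0 Then
Wanted(n) = LinesIn(i)
n = n + 1
End If
Next i
If n > 0 Then ReDim Preserve Wanted(0 To n - 1) As String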

RBS
 

Rick Rothstein

I actually meant to post my message directly under the OP's posting, not
yours... sorry.
 

Rick Rothstein

Yes, very good Peter, that does seem to work. "Pretty cool" back at you.
 

RB Smissaert

OK, it is a one-liner, but is it faster than Instr in a loop?
Will test in a bit, unless somebody else will do that ...

RBS
 

RB Smissaert

I found the method with Join and Filter to be about twice as slow.
This was tested on a 1 MB test file made of the 5-line repeating sequence
from the OP:

Option Explicit
Private Declare Function timeGetTime Lib "winmm.dll" () As Long
Private lStartTime As Long

Function OpenTextFileToString(strFile As String) As String

Dim hFile As Long

On Error GoTo ERROROUT

hFile = FreeFile
Open strFile For Binary As #hFile
OpenTextFileToString = Space(LOF(hFile))
Get hFile, , OpenTextFileToString
Close #hFile

Exit Function
ERROROUT:

If hFile > 0 Then
Close #hFile
End If

End Function

Sub Test()

Dim i As Long
Dim n As Long
Dim str As String
Dim arr1
Dim str2 As String
Dim arr2
Dim bJoin As Boolean

bJoin = True

str = OpenTextFileToString("C:\testfile.txt")

arr1 = Split(str, vbCrLf)

StartSW

If bJoin Then
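' method 1: the double Filter (plus the unnecessary Join and Split round trip)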
str2 = Join(Filter(Filter(arr1, "layer_", True, vbTextCompare), _
"layer_3", False, vbTextCompare), vbCrLf)
arr2 = Split(str2, vbCrLf)
Else
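' method 2: the loop with InStr, skipping lines that contain layer_3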
ReDim arr2(0 To UBound(arr1)) As String
For i = 0 To UBound(arr1)
If InStr(1, arr1(i), "layer_", vbBinaryCompare) > 0 And _
InStr(1, arr1(i), "layer_3", vbBinaryCompare) = 0 Then
arr2(n) = arr1(i)
n = n + 1
End If
Next i
ReDim Preserve arr2(0 To n - 1) As String
End If

StopSW

'to check we got it right
For i = 0 To 3
MsgBox arr2(i), , i
Next i

End Sub

Sub StartSW()
lStartTime = timeGetTime()
End Sub

Function StopSW(Optional bMsgBox As Boolean = True, _
Optional vMessage As Variant, _
Optional lMinimumTimeToShow As Long = -1) As Variant

Dim lTime As Long

lTime = timeGetTime() - lStartTime

If lTime > lMinimumTimeToShow Then
If IsMissing(vMessage) Then
StopSW = lTime
Else
StopSW = lTime & " - " & vMessage
End If
End If

If bMsgBox Then
If lTime > lMinimumTimeToShow Then
MsgBox "Done in " & lTime & " msecs", , vMessage
End If
End If

End Function


RBS
 

Peter T

Hi Bart,

For your test the Join and the Split are not necessary, simply

arr2 = Filter(Filter(arr1, "layer_", True, vbTextCompare), _
"layer_3", False, vbTextCompare)

With small files (up to say 0.3 MB) and large files of 10 MB I didn't find
much difference between the two methods: barely any measurable difference
with the small files, although the loop was always slightly faster with the
large ones.

Curiously though, the loop was much faster with medium-sized files of about
1 MB. The Filter method was only slightly slower with a 1 MB file vs a 10 MB
file (not pro-rata at all). I don't understand the timing anomalies I got.

Regards,
Peter T
 

RB Smissaert

Hi Peter,

Ah, yes, that was a bit silly, joining first and then splitting again.
I did it all very quickly and didn't look carefully at what it was doing.
So, on the whole the loop is still somewhat faster, particularly for
medium-sized files. Overall I prefer it, as it is also clearer about what
it is doing.

RBS
 
R

Rick Rothstein

Out of curiosity, how much faster was "much faster" for a single loop if
your test involved multiple loops (total time divided by number of loops)?
 

Peter T

I didn't save the original test. I've made a new test with somewhat
different data and seem to be getting a very different set of results this
time. In one sense it is all consistent, but now the Filter approach is
taking about 2x longer than the loop with InStr at all sizes. (Previously
the 10 MB file was only about 25% slower with the Filter method, but the
1 MB file was an odd 3x slower.)

I'm pretty sure I double-checked my results last time. Maybe I somehow got
it wrong or, as I suspect, I've simply had inconsistent results with large
strings in the past; who knows. Here's what I tested this time -

Option Explicit
Private Declare Function GetTickCount Lib "kernel32.dll" () As Long
Const cFILE As String = "c:\temp\TestFile#.txt"

Sub MakeTestFiles()
Dim i As Long
Dim sFile As String, sText As String
Dim a(1 To 5) As String
Dim ff As Integer

a(1) = "This is layer_1"
a(2) = "this line does not have any layers"
a(3) = "Embedded at the end of this line is Layer_3"
a(4) = "A layer_4 in this fourth line"
a(5) = "This will be the last line with layer_5"

sText = Join(a, vbCrLf)
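' keep doubling the text; once it is past 20,000 characters, write a test file at each doubling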
Do
sText = sText & vbCrLf & sText
If Len(sText) > 20000 Then
i = i + 1
sFile = Replace(cFILE, "#", i)
ff = FreeFile
Open sFile For Output As #ff
Print #ff, sText
Close #ff
Debug.Print i, Len(sText), sFile
End If

Loop Until Len(sText) > 10000000
' 10 files from 22Kb to 11Mb

End Sub

Sub CompareFilterLoop()
Dim ff As Integer
Dim i As Long, k As Long, n As Long, nSize As Long
Dim tFilter As Long, tLoop As Long
Dim sFile As String, sText As String
Dim arr1, arr2

For k = 1 To 10

ff = FreeFile
sFile = Replace(cFILE, "#", k)
Open sFile For Binary As #ff
nSize = LOF(ff)
sText = Space(nSize)
Get #ff, , sText
Close #ff

arr1 = Split(sText, vbCrLf)
If IsArray(arr2) Then Erase arr2

tFilter = GetTickCount
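' method 1: the double Filter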
arr2 = Filter(Filter(arr1, "layer_", True, vbTextCompare), _
"layer_3", False, vbTextCompare)
tFilter = GetTickCount - tFilter

Erase arr2

tLoop = GetTickCount
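' method 2: the loop with InStr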
ReDim arr2(0 To UBound(arr1)) As String
n = 0
For i = 0 To UBound(arr1)
If InStr(1, arr1(i), "layer_", vbBinaryCompare) > 0 And _
InStr(1, arr1(i), "layer_3", vbBinaryCompare) = 0 Then
arr2(n) = arr1(i)
n = n + 1
End If
Next i
ReDim Preserve arr2(0 To n - 1) As String

tLoop = GetTickCount - tLoop
Debug.Print tFilter, tLoop, nSize, UBound(arr1), UBound(arr2)

Next

End Sub

For me the Filter method was roughly 2x slower at all sizes above 300 KB,
where timings are meaningful.

Regards,
Peter T
 

Rick Rothstein

My question was referring to physical elapsed time per loop, not relative
percentage speed. The reason I asked is that if the entire process (read,
process, save) takes, say, 5 seconds to complete, and the part of the code
in question takes either 1/8 of a second for the fast code or 1/4 of a
second for the slow code, I would not consider that a significant time
difference, even though one is half as fast as the other, when compared to
the entire process the code is part of. In other words, reading a file and
then saving a file will more than likely take up the bulk of the time, and
that is what the user will notice, not the relative time difference for one
portion of the entire process.

--
Rick (MVP - Excel)


 
P

Peter T

I didn't understand exactly what you were asking, and actually I still don't -
"total time divided by number of loops"?
There were no loops in the Filter method.

The relative timings I referred to in my last two posts are the times to
"process" the in-array (irrespective of where it came from) and produce a
processed out-array, respectively for the double-Filter method and for
looping InStr over each element of the in-array.

IOW, the timings relate purely to the different "process" methods, exclusive
of, say, reading and saving. This is consistent with the timings Bart
demonstrated (albeit with the unnecessary Split & Join).

I fully accept your point that the relevant time for the user is the overall
time, and that a significant difference in a small part of the overall
process may be insignificant overall, but that depends on the overall
process.

If you try the demo I posted, it should be easy to adapt it to time "read,
process, save" vs merely "process".
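
Something along these lines could slot into the k loop of CompareFilterLoop
(tOverall and the output file name here are just made up for the sketch):

Dim tOverall As Long ' declare with the other Dims at the top of the Sub

tOverall = GetTickCount
' read
ff = FreeFile
Open sFile For Binary As #ff
sText = Space(LOF(ff))
Get #ff, , sText
Close #ff
' process
arr2 = Filter(Filter(Split(sText, vbCrLf), "layer_", True, vbTextCompare), _
"layer_3", False, vbTextCompare)
' save
ff = FreeFile
Open "c:\temp\OutTest.txt" For Output As #ff
Print #ff, Join(arr2, vbCrLf)
Close #ff
tOverall = GetTickCount - tOverall
Debug.Print "read + process + save: " & tOverall & " ms"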

Regards,
Peter T






 

RB Smissaert

How about running the test yourself? Then you will see.
Just add some extra StartSW and StopSW calls and it will all be revealed.

RBS


 
