IO function in Vb.Net slower than in Vb6.0

  • Thread starter: hillcountry74
Stephany,

Thanks for the code. I tried your sample, it doesn't seem to work. I'm
assuming _s variable is the string to be parsed and need not
necessarily have the fieldnames like Lastname etc, right?

How does the regex engine know to take 26 characters for extracting
City and that it is not the first 26 chrs. Please explain. And excuse
me for my ignorance. Never used reg exprs.
 
Imports System.Text
Imports System.IO
Imports System.Text.RegularExpressions
Module Module1

Sub Main()

Dim aStreamReader As TextReader
aStreamReader = New StreamReader("C:\SAMPLE FILE.txt")
Dim _s As String = aStreamReader.ReadToEnd
aStreamReader.Close()

Dim _exp As String = "((?<ActionCode>.{1})" & _
"(?<CarrierID>.{25})" & _
"(?<LastName>.{60})" & _
"(?<FirstName>.{30})" & _
"(?<MiddleName>.{15})" & _
"(?<Addr1>.{60})" & _
"(?<Addr2>.{60})" & _
"(?<City>.{30})" & _
"(?<State>.{2})" & _
"(?<Zip>.{10})" & _
"(?<BenefitOption>.{60})" & _
"(?<EmployerGroup>.{15})" & _
"(?<OptionEffDate>.{8})" & _
"(?<HPEffDate>.{8})" & _
"(?<TermDate>.{8})" & _
"(?<Sex>.{1})" & _
"(?<DOB>.{8})" & _
"(?<SSN>.{9})" & _
"(?<Phone>.{12})" & _
"(?<EmployerGroupAnivDate>.{8})" & _
"(?<HeadOfHouse>.{9})" & _
"(?<PrimaryStatus>.{1})" & _
"(?<MaritalStatus>.{1}))"

Dim r As Regex = New Regex(_exp)

Dim g As MatchCollection = r.Matches(_s)
Dim m As Match

Dim _sInActionCode As String
Dim _sInCarrierID As String
Dim _sInLastName As String
Dim _sInFirstName As String
Dim _sInMiddleName As String
Dim _sInAddr1 As String
Dim _sInAddr2 As String
Dim _sInCity As String
Dim _sInState As String
Dim _sInZip As String
Dim _sInBenefitOption As String
Dim _sInEmployerGroup As String
Dim _sInOptionEffDate As String
Dim _sInHPEffDate As String
Dim _sInTermDate As String
Dim _sInSex As String
Dim _sInDOB As String
Dim _sInSSN As String
Dim _sInPhone As String
Dim _sInEmployerGroupAnivDate As String
Dim _sInHeadOfHouse As String
Dim _sInPrimaryStatus As String
Dim _sInMaritalStatus As String
Dim d As New DateTime
Dim dt As Double

d = DateTime.Now

For i As Int32 = 0 To g.Count - 1
m = g.Item(i)

_sInActionCode = m.Groups("ActionCode").ToString.Trim
_sInCarrierID = m.Groups("CarrierID").ToString.Trim
_sInLastName = m.Groups("LastName").ToString.Trim
_sInFirstName = m.Groups("FirstName").ToString.Trim
_sInMiddleName = m.Groups("MiddleName").ToString.Trim
_sInAddr1 = m.Groups("Addr1").ToString.Trim
_sInAddr2 = m.Groups("Addr2").ToString.Trim
_sInCity = m.Groups("City").ToString.Trim
_sInState = m.Groups("State").ToString.Trim
_sInZip = m.Groups("Zip").ToString.Trim
_sInBenefitOption = m.Groups("BenefitOption").ToString.Trim
_sInEmployerGroup = m.Groups("EmployerGroup").ToString.Trim
_sInOptionEffDate = m.Groups("OptionEffDate").ToString.Trim
_sInHPEffDate = m.Groups("OptionEffDate").ToString.Trim
_sInTermDate = m.Groups("HPEffDate").ToString.Trim
_sInSex = m.Groups("TermDate").ToString.Trim
_sInDOB = m.Groups("DOB").ToString.Trim
_sInSSN = m.Groups("SSN").ToString.Trim
_sInPhone = m.Groups("Phone").ToString.Trim
_sInEmployerGroupAnivDate = m.Groups("EmployerGroupAnivDate").ToString.Trim()
_sInHeadOfHouse = m.Groups("HeadOfHouse").ToString.Trim
_sInPrimaryStatus = m.Groups("PrimaryStatus").ToString.Trim
_sInMaritalStatus = m.Groups("MaritalStatus").ToString.Trim
'Console.WriteLine()
Console.WriteLine(i)
'Console.WriteLine()
'Console.WriteLine("_sInActionCode = " & _sInActionCode)
'Console.WriteLine("_sInCarrierID = " & _sInCarrierID)
'Console.WriteLine("_sInLastName = " & _sInLastName)
'Console.WriteLine("_sInFirstName = " & _sInFirstName)
'Console.WriteLine("_sInMiddleName = " & _sInMiddleName)
'Console.WriteLine("_sInAddr1 = " & _sInAddr1)
'Console.WriteLine("_sInAddr2 = " & _sInAddr2)
'Console.WriteLine("_sInCity = " & _sInCity)
'Console.WriteLine("_sInState = " & _sInState)
'Console.WriteLine("_sInZip = " & _sInZip)
'Console.WriteLine("_sInBenefitOption = " & _sInBenefitOption)
'Console.WriteLine("_sInEmployerGroup = " & _sInEmployerGroup)
'Console.WriteLine("_sInOptionEffDate = " & _sInOptionEffDate)
'Console.WriteLine("_sInHPEffDate = " & _sInHPEffDate)
'Console.WriteLine("_sInTermDate = " & _sInTermDate)
'Console.WriteLine("_sInSex = " & _sInSex)
'Console.WriteLine("_sInDOB = " & _sInDOB)
'Console.WriteLine("_sInSSN = " & _sInSSN)
'Console.WriteLine("_sInPhone = " & _sInPhone)
'Console.WriteLine("_sInEmployerGroupAnivDate = " & _sInEmployerGroupAnivDate)
'Console.WriteLine("_sInHeadOfHouse = " & _sInHeadOfHouse)
'Console.WriteLine("_sInPrimaryStatus = " & _sInPrimaryStatus)
'Console.WriteLine("_sInMaritalStatus = " & _sInMaritalStatus)
Next
Dim dt2 = DateTime.Now.Subtract(d).TotalSeconds

Console.WriteLine(dt2)
Console.ReadLine()
End Sub

End Module

Sorry the code is a little messy. But it works. Parsed 8064 records in 2
seconds flat. Simulate some other work by outputting everything to a
console window and it takes 34 seconds.

To answer your question: RegEx uses a position marker *similar* to that
of reading a file, where the position is incremented by the amount just
read (for comparison's sake). Each group starts where the previous one
stopped, so just telling it how much to read is good enough.

Thank you Stephany for writing that all out:)

A note about your sample file: I hope fields were left blank, and things
like HeadOfHouse is a number, otherwise this isn't working.
Sample:
_sInActionCode =
_sInCarrierID = 00000050101
_sInLastName = SMITH
_sInFirstName = VICKI
_sInMiddleName =
_sInAddr1 = C/O SUE EDDY - MISD BENEFITS
_sInAddr2 = 405 EAST DAVIS
_sInCity = MESQUITE
_sInState = TX
_sInZip = 75149
_sInBenefitOption = 001
_sInEmployerGroup = 2002MISD
_sInOptionEffDate = 20050301
_sInHPEffDate = 20050301
_sInTermDate = 20040401
_sInSex = 20050331
_sInDOB = 19510125
_sInSSN = 000010009
_sInPhone =
_sInEmployerGroupAnivDate =
_sInHeadOfHouse = 464088770
_sInPrimaryStatus = P
_sInMaritalStatus = I

Let me know,
MP
 
Yes, you are correct, _s is a string to simulate a record read from your
input file and my use of fieldnames etc. was just to create some
placeholders. Where it says e.g. <7 spaces>, you need to replace that bit
(including the angle brackets) with that number of space characters. Because
of the way newsgroup readers wrap text etc., it was difficult to show the
actual values.

The numbers in the curly brackets, e.g. {8}, in the regex expression tell it
how many character positions each element takes up. If you add them all up
you will find that they come to 442, which is why your input record must
be a minimum of 442 characters long. If it is shorter then it won't work. If
it is longer then only the first 442 characters are utilised.
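
To illustrate how consecutive fixed-width groups carve up a record, here is a minimal sketch. The three-field layout (Code, Name, City) and the sample string are made up for the example and are not the real record layout.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FixedWidthDemo
    Sub Main()
        ' Hypothetical layout: Code (1) + Name (5) + City (4) = 10 characters.
        Dim pattern As String = "(?<Code>.{1})(?<Name>.{5})(?<City>.{4})"
        Dim record As String = "ASMITHAUST"

        Dim m As Match = Regex.Match(record, pattern)
        If m.Success Then
            ' Each group begins where the previous group stopped,
            ' so City is characters 7-10, not the first 4.
            Console.WriteLine(m.Groups("Code").Value)   ' A
            Console.WriteLine(m.Groups("Name").Value)   ' SMITH
            Console.WriteLine(m.Groups("City").Value)   ' AUST
        End If
    End Sub
End Module
```

The order of the groups in the pattern is what fixes each field's position; the names are only labels for pulling the values back out.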

I have been assuming that your input records are, in fact, fixed-length
fields in a fixed-length record. Is this actually the case or are there some
records that are shorter? It would be nice if you could show a sample record
(doctored to hide sensitive information of course).

In addition, what did you find about my other comments regarding the stray
'unicode' character etc.?
 
Thanks a lot MP. Really appreciate your help.

Can you please paste the regular expression for this? Can't find it in
the code.

Also, on the headofhouse, it could be alphanumeric. And yes, some
fields would be blank.

There could be files of size 400MB. In such a case, reading till
endoffile might not work. Instead, if it is changed to reading one line
at a time, do you think the speed will reduce?

Thanks again.
 
In your output, note that you've got some fields out of whack.
_sInHPEffDate = m.Groups("OptionEffDate").ToString.Trim
_sInTermDate = m.Groups("HPEffDate").ToString.Trim
_sInSex = m.Groups("TermDate").ToString.Trim

This shows in that the display of _sInSex is not 1 character.

I don't understand your last comment:
A note about your sample file: I hope fields were left blank, and things
like HeadOfHouse is a number, otherwise this isn't working.

What do you mean by 'I hope fields were left blank'? If you mean, for
example, ActionCode being left blank if it is not supplied, i.e. a space
character as a place holder for it, then I would assume yes because
otherwise the original parsing routine would never work in the first place.
When I refer to a value being 'missing' then I am really saying that the
character positions that it would normally occupy are filled with spaces.

I don't think that you can assume (unless you are privy to something that
I'm not) that HeadOfHouse is numeric. From an earlier post - Head of House
Validation - chars "A-Z,.-'0-9" so this indicates any combination of
characters in the list. The only other thing that can be implied about it is
that if it is 'missing' then it is assigned the first 9 characters of
CarrierId and there is no indication that CarrierId should be numeric.
Anyway, checking that is the role of validation rather than parsing.
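
As a rough sketch of keeping validation separate from parsing, the character list quoted above can be checked with a character class once the field has been extracted. The pattern assumes that "A-Z,.-'0-9" means uppercase letters, digits, comma, period, hyphen and apostrophe, with no spaces allowed; the function name IsValidHeadOfHouse is hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions

Module HeadOfHouseCheck
    ' Assumed reading of the rule: one or more of A-Z, 0-9, comma, period,
    ' apostrophe or hyphen (the hyphen is last so it is taken literally).
    Private ReadOnly _allowed As New Regex("^[A-Z0-9,.'-]+$")

    Function IsValidHeadOfHouse(ByVal value As String) As Boolean
        Return _allowed.IsMatch(value)
    End Function

    Sub Main()
        Console.WriteLine(IsValidHeadOfHouse("464088770"))  ' True
        Console.WriteLine(IsValidHeadOfHouse("O'NEIL-J."))  ' True
        Console.WriteLine(IsValidHeadOfHouse("smith"))      ' False (lowercase is not in the list)
    End Sub
End Module
```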
 
MP,
Posting this msg for the 2nd time.

Thanks a lot for the code. I really appreciate your help and time.

Can you please post the regular expr for this as I can't find it in the
code?

As there could be files of size 400MB, reading till endoffile might not
work. Instead, if it is changed to reading one line at a time, will it
slowdown the parsing?

Also, on the headofhouse, it can be alphanumeric. And yes, some fields
could be blank.

Thanks.
 
This is the Regular Expression:
Dim _exp As String = "((?<ActionCode>.{1})" & _
"(?<CarrierID>.{25})" & _
"(?<LastName>.{60})" & _
"(?<FirstName>.{30})" & _
"(?<MiddleName>.{15})" & _
"(?<Addr1>.{60})" & _
"(?<Addr2>.{60})" & _
"(?<City>.{30})" & _
"(?<State>.{2})" & _
"(?<Zip>.{10})" & _
"(?<BenefitOption>.{60})" & _
"(?<EmployerGroup>.{15})" & _
"(?<OptionEffDate>.{8})" & _
"(?<HPEffDate>.{8})" & _
"(?<TermDate>.{8})" & _
"(?<Sex>.{1})" & _
"(?<DOB>.{8})" & _
"(?<SSN>.{9})" & _
"(?<Phone>.{12})" & _
"(?<EmployerGroupAnivDate>.{8})" & _
"(?<HeadOfHouse>.{9})" & _
"(?<PrimaryStatus>.{1})" & _
"(?<MaritalStatus>.{1}))"

A pretty good definition can be found on MSDN. Search For RegEx or
Regular Expressions. :)

I'll try to "simulate" (wink wink:) a 400mb file and check performance.
Reading one line at a time is out of the question for this experiment,
as it would require a couple of million reads (guessing); reading in
442 bytes * nRecords would be better, if not the best way to do it. BUT
this and ReadToEnd both REQUIRE every record to be 442 bytes (or
whatever it is). Off by one byte, and kiss your records goodbye.

MP
 
The speed when dealing with IO devices (disks, networks, etc.) is largely
subjective because it depends on things like disk rpm, network bandwidth,
network usage, processor type and speed, memory size and a lot of other
factors, so it is really not worth losing any sleep over.

Can we digress back to your original post and address your perception of
'slowness'.

You said that your VB.NET version takes 15 seconds longer than your VB6
version.

Now, that is 15 seconds longer in relation to what?

- If you take a specific file and run it through the VB6 version then how
long does it take?

- If you run that same file through the VB.NET version then how long does
that take?

- How many records were in the file?

- If you run that same file through the VB.NET version again almost
immediately, then is the run time any different than the first time?

Then we come to some usage scenario questions:

At what time of day does the VB6 version run?

- Is it run by a user during the course of the business day?

- Is it run as a 'batch process' at an 'off peak' time?

- At what time of day were you running the VB.NET version?

- Was the VB.NET version run on the same hardware as the VB6 version?

Do you see what I'm driving at? The question really is - Are we comparing
apples with apples and is a '15 second' difference really relevant?

If the VB6 version takes, for example, more than 2 minutes to process, say,
10000 records (approx 4MB), then I would suggest that an additional 15
seconds to be insignificant.

If, however, the VB6 version takes, for example, less than 10 seconds to
process, say, 10000 records (approx 4MB), then, obviously, an additional 15
seconds is highly significant.

You say that files could be up to 400MB, which indicates somewhere around
1000000 records.

- Is this file size a regular occurrence or does this size occur only
occasionally?

- How long does the VB6 version take to process a file of this size and
how long does the VB.NET version take?

- If it takes longer, is the time difference proportional to the number of
records?

For example, if it takes 15 seconds longer to process 10000 records, does it
take 1500 seconds longer to process 1000000 records (100 times the records
ergo 100 times longer)?

If, for instance, it always takes 15 seconds longer, regardless of the
number of records, then that would indicate that it's nothing to do with
your processing code at all, rather the 'problem' would lie in the general
program overhead under .NET.

It would be interesting to hear your comments and/or findings on any/all of
these factors, remembering, of course, that the factors I have thrown
into the ring are really only scratching the surface.
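
The distinction between a fixed overhead and a per-record cost can be put into numbers from two timed runs of different sizes. The sketch below uses hypothetical measurements (the "always 15 seconds longer" case) and assumes a simple linear model; real timings would need several runs each.

```vb
Imports System

Module OverheadEstimate
    Sub Main()
        ' Hypothetical measurements: extra seconds the VB.NET run takes over VB6.
        Dim records1 As Double = 10000
        Dim extra1 As Double = 15
        Dim records2 As Double = 1000000
        Dim extra2 As Double = 15

        ' Assumed model: extra = fixedOverhead + perRecord * records
        Dim perRecord As Double = (extra2 - extra1) / (records2 - records1)
        Dim fixedOverhead As Double = extra1 - perRecord * records1

        Console.WriteLine("Per-record cost (s): {0}", perRecord)
        Console.WriteLine("Fixed overhead (s): {0}", fixedOverhead)
    End Sub
End Module
```

With these made-up figures the per-record cost comes out as 0 and the fixed overhead as 15, which would point at program start-up rather than the parsing loop. If the second run were 1500 seconds slower instead, the overhead would come out as 0 and the cost would be entirely per record.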
 
I'm just a little concerned that you might have missed a critical point here,
MeltingPoint.

Each record in the file is terminated by a LF or a CR/LF pair. This is shown
by the use of the ReadLine method in the original code fragment.

A line is defined as a sequence of characters followed by a line feed or a
carriage return immediately followed by a line feed. The string that is
returned does not contain the terminating carriage return or line feed. The
returned value is a null reference (Nothing in Visual Basic) if the end of
the input stream is reached.

If you use the ReadToEnd method then you have to identify what the record
delimiter is and split the input into 'records' based on that delimiter
before you can apply the RegEx anyway. Unless, of course, the RegEx is
preceded by a '^' (with RegexOptions.Multiline) to indicate start at the
beginning of each line.

If you don't handle this then each record, subsequent to the first, will be
off by 1 or 2 characters compounding.
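
A minimal sketch of anchoring each match at the start of a line, so that record delimiters cannot shift the fields. The two 6-character records and the field names Id and Num are invented for the example; in .NET, '^' only matches at every line start when RegexOptions.Multiline is set.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MultilineDemo
    Sub Main()
        ' Two hypothetical 6-character records separated by a CR/LF pair.
        Dim text As String = "AB1234" & vbCrLf & "CD5678"

        ' With Multiline, ^ matches at the start of the input and after every LF.
        Dim r As New Regex("^(?<Id>.{2})(?<Num>.{4})", RegexOptions.Multiline)

        For Each m As Match In r.Matches(text)
            Console.WriteLine(m.Groups("Id").Value & " / " & m.Groups("Num").Value)
        Next
        ' Prints:
        ' AB / 1234
        ' CD / 5678
    End Sub
End Module
```

Note that '.' never matches LF but does match CR, so without the anchor a CR in the delimiter could be swallowed into a field.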
 
In your output, note that you've got some fields out of whack.


This shows in that the display of _sInSex is not 1 character.

Sorry Stephany, post order is getting screwed up. The above is copied from
one of your posts (and thank you for it again), but thanks for pointing it
out, I was wondering why sex was "45738495". As for the rest of your
comment, I was sent some sample data from hillcountry74, and thought I was
replying under his thread. Sorry for the confusion. By the way, do you mind
if I ask what field you're in?

Cheers,
MP
 
I'm an IT Consultant, with close to 30 years' experience in the industry.

Since 1994 I have specialised in VB related software and have been using
VB.NET and C#.NET since their first 'retail' release.

I still have a few applications that I support in Vb4, VB5 and VB6 but all
new development is in VB.Net or C#.NET.
 
I just thought of something. How can you be using ReadLine if there are no
delimiters? The answer is: There is a delimiter. A carriage return at
the end of every record. However, I don't think this helps the regex
thing. But 'just so ya know' a delimiter can be anything, not just a
comma! :)

I'm just cleaning up the code so you can read in chunks at a time,(1.5
gigs of ram wasn't even enough to read in 400mb of text) will post back
soon. By the by, are you using VB.NET?

MP
 
I'm just a little concerned that you might have missed a critical
point here, MeltingPoint.

Each record in the file is terminated by a LF or a CR/LF pair. This is
shown by the use of the ReadLine method in the original code fragment.

A line is defined as a sequence of characters followed by a line feed
or a carriage return immediately followed by a line feed. The string
that is returned does not contain the terminating carriage return or
line feed. The returned value is a null reference (Nothing in Visual
Basic) if the end of the input stream is reached.

If you use the ReadToEnd method then you have to identify what the
record delimiter is and split the input into 'records' based on that
delimiter before you can apply the RegEx anyway. Unless, of course, the
RegEx is preceded by a '^' (with RegexOptions.Multiline) to indicate
start at the beginning of each line.

If you don't handle this then each record, subsequent to the first,
will be off by 1 or 2 characters compounding.

I figured it out. Either way, if it is a fixed record then the cr would
be included in the record size. So the above would be 443*nRecords. No
harm no foul. The point is the file can evenly be divided by the number
of bytes in a record plus the delimiter (which is the first thing I
asked him 10 posts ago and was told there was no delimiter).

MP
 
I'm an IT Consultant, with close to 30 years' experience in the industry.

Since 1994 I have specialised in VB related software and have been
using VB.NET and C#.NET since their first 'retail' release.

I still have a few applications that I support in Vb4, VB5 and VB6 but
all new development is in VB.Net or C#.NET.

I had a feeling you were. You've got that dry reply that says 'I've been
dealing with this shit for 30 years'. Not in a bad way of course. I find it
humorous, only because I've recently begun spending a lot more time with
'interesting' clients, and I can feel myself doing it:)

Cheers, relax, enjoy life while you're still "Young"

Sorry, couldn't resist,
MP
 
What was said was that the fields were not delimited.

The fact that there is a record delimiter is a given because of the use of
the ReadLine method.

Remember that the code in VB6 works and the 'ReadLine' method is a straight
conversion of the 'Line Input' statement which does, ostensibly, the same
thing.

Anyway the detectives have been at work.

Parsing a 100000-record file of 422 characters per record in a line by line
read using regex on my workstation takes approx 73 seconds.

Parsing the same file in a line by line read using the Trim and Mid
functions takes approx 15 seconds.

Parsing the same file in a line by line read using the String.SubString and
String.Trim methods takes approx 17 seconds.

The VB6 equivalent takes approx 40 seconds.

As I said in my first post, it is highly likely that the '15 second'
difference was due to one of the other methods that is executed on a per
record basis, rather than the reading and parsing of the file, and these
results bear that out.

Although it has been an interesting exercise, I don't think that regex is
the way to go in this case.
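
For comparison, a line-by-line read with String.Substring and String.Trim might look like the sketch below. The code behind the timings above was not posted, so this is an assumption about its general shape: the three-field layout, the sample data and the GetField helper are all hypothetical, and a StringReader stands in for the StreamReader over the real file.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module SubstringDemo
    ' Hypothetical helper: returns the trimmed field at a 0-based offset.
    Function GetField(ByVal line As String, ByVal start As Integer, ByVal length As Integer) As String
        Return line.Substring(start, length).Trim()
    End Function

    Sub Main()
        ' Made-up layout: Code (1) + Name (5) + City (4) = 10 characters per record.
        Dim reader As TextReader = New StringReader("ASMITHAUST" & vbCrLf & "CLEE  NYC ")

        Dim line As String = reader.ReadLine()
        While Not line Is Nothing
            ' Skip short lines rather than let Substring throw.
            If line.Length >= 10 Then
                Console.WriteLine(GetField(line, 0, 1) & "|" & GetField(line, 1, 5) & "|" & GetField(line, 6, 4))
            End If
            line = reader.ReadLine()
        End While
        reader.Close()
        ' Prints:
        ' A|SMITH|AUST
        ' C|LEE|NYC
    End Sub
End Module
```

ReadLine strips the record delimiter, so the offsets never drift from one record to the next.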
 
Inside the IDE - Not being displayed to console.
Processed 83265 records in 5.6875 seconds. At 2 Records per pass.
Processed 83265 records in 4.28125 seconds. At 20 Records per pass.
Processed 83265 records in 4.046875 seconds. At 50 Records per pass.
Processed 83265 records in 4.046875 seconds. At 75 Records per pass.
Processed 83265 records in 4.765625 seconds. At 100 Records per pass. Breaking Point Reached.

Compiled Application
Processed 83265 records in 3.53125 seconds. At 75 Records per pass.
Processed 83265 records in 3.625 seconds. At 100 Records per pass.
Processed 83265 records in 3.59375 seconds. At 200 Records per pass.
Processed 83265 records in 3.609375 seconds. At 500 Records per pass.
Processed 83265 records in 3.625 seconds. At 1000 Records per pass.
Processed 83265 records in 3.609375 seconds. At 10000 Records per pass.
Processed 83265 records in 3.59375 seconds. At 50000 Records per pass.

You be the judge.
Here's the source code.
Let me know if you need help with the verify routines.

Imports System.Text
Imports System.IO
Imports System.Text.RegularExpressions
Module Module1

Sub Main()
'File path and number of records to parse per pass
ReadAndParse("C:\SAMPLE FILE.txt", 1000)
End Sub

#Region " Expression Definition "
Dim _exp As String = "((?<ActionCode>.{1})" & _
"(?<CarrierID>.{25})" & _
"(?<LastName>.{60})" & _
"(?<FirstName>.{30})" & _
"(?<MiddleName>.{15})" & _
"(?<Addr1>.{60})" & _
"(?<Addr2>.{60})" & _
"(?<City>.{30})" & _
"(?<State>.{2})" & _
"(?<Zip>.{10})" & _
"(?<BenefitOption>.{60})" & _
"(?<EmployerGroup>.{15})" & _
"(?<OptionEffDate>.{8})" & _
"(?<HPEffDate>.{8})" & _
"(?<TermDate>.{8})" & _
"(?<Sex>.{1})" & _
"(?<DOB>.{8})" & _
"(?<SSN>.{9})" & _
"(?<Phone>.{12})" & _
"(?<EmployerGroupAnivDate>.{8})" & _
"(?<HeadOfHouse>.{9})" & _
"(?<PrimaryStatus>.{1})" & _
"(?<MaritalStatus>.{1}))"
#End Region

#Region " Label Definitions "
Dim _sInActionCode As String
Dim _sInCarrierID As String
Dim _sInLastName As String
Dim _sInFirstName As String
Dim _sInMiddleName As String
Dim _sInAddr1 As String
Dim _sInAddr2 As String
Dim _sInCity As String
Dim _sInState As String
Dim _sInZip As String
Dim _sInBenefitOption As String
Dim _sInEmployerGroup As String
Dim _sInOptionEffDate As String
Dim _sInHPEffDate As String
Dim _sInTermDate As String
Dim _sInSex As String
Dim _sInDOB As String
Dim _sInSSN As String
Dim _sInPhone As String
Dim _sInEmployerGroupAnivDate As String
Dim _sInHeadOfHouse As String
Dim _sInPrimaryStatus As String
Dim _sInMaritalStatus As String
#End Region

#Region " Timing "
Dim startTime As New DateTime
Dim finishTime As Double
#End Region



Sub ReadAndParse(ByVal inFilePath As String, ByVal numRecordsPerBlock As Int32)
Const RECORD_SIZE As Int32 = 443
Dim inputFile As New FileInfo(inFilePath)
Dim inputFileLen As Int64 = inputFile.Length
Dim iterations As Int32
Dim bytesPerIteration As Int32
Dim totalRecords As Int32
Dim moreRecords As Boolean

'Verify Length
If Not inputFileLen Mod RECORD_SIZE = 0 Then
Throw New ApplicationException("File Length Error")
End If

'Figure out how many times to loop
iterations = inputFileLen \ (numRecordsPerBlock * RECORD_SIZE)
'Bytes(records) per loop
bytesPerIteration = numRecordsPerBlock * RECORD_SIZE
'Check to see if we got lucky (file length is an exact multiple of the block size)
moreRecords = ((CLng(iterations) * bytesPerIteration) <> inputFileLen)
'reset total records
totalRecords = 0

'Get input stream
Dim inStream As New StreamReader(inputFile.FullName)
Dim inputBlock As String
Dim buf(bytesPerIteration - 1) As Char 'Upper bound is inclusive, so this holds exactly bytesPerIteration chars

'Set up regex
Dim regExp As New Regex(_exp, RegexOptions.Compiled) ' I think this speeds it up'
Dim mc As MatchCollection
Dim record As Match

'Set up and loop
startTime = Now()
For i As Int32 = 1 To iterations

Dim charsRead As Int32 = inStream.ReadBlock(buf, 0, bytesPerIteration)
inputBlock = New String(buf, 0, charsRead) 'Only use the characters actually read

'Parse it
mc = regExp.Matches(inputBlock)
For j As Int32 = 0 To mc.Count - 1
record = mc.Item(j)
'Verify record proc here
totalRecords += 1
_sInActionCode = record.Groups("ActionCode").ToString.Trim
_sInCarrierID = record.Groups("CarrierID").ToString.Trim
_sInLastName = record.Groups("LastName").ToString.Trim
_sInFirstName = record.Groups("FirstName").ToString.Trim
_sInMiddleName = record.Groups("MiddleName").ToString.Trim
_sInAddr1 = record.Groups("Addr1").ToString.Trim
_sInAddr2 = record.Groups("Addr2").ToString.Trim
_sInCity = record.Groups("City").ToString.Trim
_sInState = record.Groups("State").ToString.Trim
_sInZip = record.Groups("Zip").ToString.Trim
_sInBenefitOption = record.Groups("BenefitOption").ToString.Trim
_sInEmployerGroup = record.Groups("EmployerGroup").ToString.Trim
_sInOptionEffDate = record.Groups("OptionEffDate").ToString.Trim
_sInHPEffDate = record.Groups("HPEffDate").ToString.Trim
_sInTermDate = record.Groups("TermDate").ToString.Trim
_sInSex = record.Groups("Sex").ToString.Trim
_sInDOB = record.Groups("DOB").ToString.Trim
_sInSSN = record.Groups("SSN").ToString.Trim
_sInPhone = record.Groups("Phone").ToString.Trim
_sInEmployerGroupAnivDate = record.Groups("EmployerGroupAnivDate").ToString.Trim()
_sInHeadOfHouse = record.Groups("HeadOfHouse").ToString.Trim
_sInPrimaryStatus = record.Groups("PrimaryStatus").ToString.Trim
_sInMaritalStatus = record.Groups("MaritalStatus").ToString.Trim
'REMOVE
DisplayToConsole(record)
'END REMOVE

Next
Next

'One last time through
If moreRecords Then
inputBlock = inStream.ReadToEnd() 'Finish off reading
inStream.Close()
mc = regExp.Matches(inputBlock)
For j As Int32 = 0 To mc.Count - 1
record = mc.Item(j)
'Verify record proc here
totalRecords += 1
_sInActionCode = record.Groups("ActionCode").ToString.Trim
_sInCarrierID = record.Groups("CarrierID").ToString.Trim
_sInLastName = record.Groups("LastName").ToString.Trim
_sInFirstName = record.Groups("FirstName").ToString.Trim
_sInMiddleName = record.Groups("MiddleName").ToString.Trim
_sInAddr1 = record.Groups("Addr1").ToString.Trim
_sInAddr2 = record.Groups("Addr2").ToString.Trim
_sInCity = record.Groups("City").ToString.Trim
_sInState = record.Groups("State").ToString.Trim
_sInZip = record.Groups("Zip").ToString.Trim
_sInBenefitOption = record.Groups("BenefitOption").ToString.Trim
_sInEmployerGroup = record.Groups("EmployerGroup").ToString.Trim
_sInOptionEffDate = record.Groups("OptionEffDate").ToString.Trim
_sInHPEffDate = record.Groups("HPEffDate").ToString.Trim
_sInTermDate = record.Groups("TermDate").ToString.Trim
_sInSex = record.Groups("Sex").ToString.Trim
_sInDOB = record.Groups("DOB").ToString.Trim
_sInSSN = record.Groups("SSN").ToString.Trim
_sInPhone = record.Groups("Phone").ToString.Trim
_sInEmployerGroupAnivDate = record.Groups("EmployerGroupAnivDate").ToString.Trim()
_sInHeadOfHouse = record.Groups("HeadOfHouse").ToString.Trim
_sInPrimaryStatus = record.Groups("PrimaryStatus").ToString.Trim
_sInMaritalStatus = record.Groups("MaritalStatus").ToString.Trim
'REMOVE
DisplayToConsole(record)
'END REMOVE
Next
Else
inStream.Close()
End If

Dim finishTime = DateTime.Now.Subtract(startTime).TotalSeconds
Console.WriteLine()
Console.WriteLine("Processed {0} records in {1} seconds.", totalRecords, finishTime)
Console.ReadLine()
End Sub
Sub DisplayToConsole(ByVal record As Match)
_sInActionCode = record.Groups("ActionCode").ToString.Trim
_sInCarrierID = record.Groups("CarrierID").ToString.Trim
_sInLastName = record.Groups("LastName").ToString.Trim
_sInFirstName = record.Groups("FirstName").ToString.Trim
_sInMiddleName = record.Groups("MiddleName").ToString.Trim
_sInAddr1 = record.Groups("Addr1").ToString.Trim
_sInAddr2 = record.Groups("Addr2").ToString.Trim
_sInCity = record.Groups("City").ToString.Trim
_sInState = record.Groups("State").ToString.Trim
_sInZip = record.Groups("Zip").ToString.Trim
_sInBenefitOption = record.Groups("BenefitOption").ToString.Trim
_sInEmployerGroup = record.Groups("EmployerGroup").ToString.Trim
_sInOptionEffDate = record.Groups("OptionEffDate").ToString.Trim
_sInHPEffDate = record.Groups("HPEffDate").ToString.Trim
_sInTermDate = record.Groups("TermDate").ToString.Trim
_sInSex = record.Groups("Sex").ToString.Trim
_sInDOB = record.Groups("DOB").ToString.Trim
_sInSSN = record.Groups("SSN").ToString.Trim
_sInPhone = record.Groups("Phone").ToString.Trim
_sInEmployerGroupAnivDate = record.Groups("EmployerGroupAnivDate").ToString.Trim()
_sInHeadOfHouse = record.Groups("HeadOfHouse").ToString.Trim
_sInPrimaryStatus = record.Groups("PrimaryStatus").ToString.Trim
_sInMaritalStatus = record.Groups("MaritalStatus").ToString.Trim

Console.WriteLine()
Console.WriteLine("_sInActionCode = " & _sInActionCode)
Console.WriteLine("_sInCarrierID = " & _sInCarrierID)
Console.WriteLine("_sInLastName = " & _sInLastName)
Console.WriteLine("_sInFirstName = " & _sInFirstName)
Console.WriteLine("_sInMiddleName = " & _sInMiddleName)
Console.WriteLine("_sInAddr1 = " & _sInAddr1)
Console.WriteLine("_sInAddr2 = " & _sInAddr2)
Console.WriteLine("_sInCity = " & _sInCity)
Console.WriteLine("_sInState = " & _sInState)
Console.WriteLine("_sInZip = " & _sInZip)
Console.WriteLine("_sInBenefitOption = " & _sInBenefitOption)
Console.WriteLine("_sInEmployerGroup = " & _sInEmployerGroup)
Console.WriteLine("_sInOptionEffDate = " & _sInOptionEffDate)
Console.WriteLine("_sInHPEffDate = " & _sInHPEffDate)
Console.WriteLine("_sInTermDate = " & _sInTermDate)
Console.WriteLine("_sInSex = " & _sInSex)
Console.WriteLine("_sInDOB = " & _sInDOB)
Console.WriteLine("_sInSSN = " & _sInSSN)
Console.WriteLine("_sInPhone = " & _sInPhone)
Console.WriteLine("_sInEmployerGroupAnivDate = " & _sInEmployerGroupAnivDate)
Console.WriteLine("_sInHeadOfHouse = " & _sInHeadOfHouse)
Console.WriteLine("_sInPrimaryStatus = " & _sInPrimaryStatus)
Console.WriteLine("_sInMaritalStatus = " & _sInMaritalStatus)
End Sub
End Module
 
Using your code, I am seeing similar results give or take a few
milliseconds. I noticed that the best results were achieved using about 40
records per block. Maybe RegEx uses some sort of optimisation based on
around about 16KB.

We obviously have differing amounts of RAM because I can do a ReadToEnd on a
200MB + file quite happily. My machine spits its dummy at just over 260 MB.
That level, of course, will vary depending on whatever else is running at the
time.

Using the ReadLine method on an 83265 record file, and using different
combinations of Mid, Trim, String.SubString and String.Trim, I am seeing
results of between 1.5 and 2 seconds to parse the entire file, depending on
the combination. The difference between running it compiled for debug
configuration in the IDE and release configuration is insignificant (less
than 100 milliseconds).

Unfortunately you haven't provided any comparative results for your machine.

The evidence I see is that RegEx is actually a poor performer compared to
more conventional string parsing in this particular case.

I am still of the opinion that the 'perceived slowness' is in one of the
other functions that is called on a per record basis rather than in the file
IO/record parsing area per se.
 
So...... are we saying that the core of his parsing code was the fastest to
begin with, give or take a trim? I won't argue that. I started with the
impression that that many reads was causing a significant overhead, and
looked for a solution that required fewer reads. As far as comparative
results, I think we have them: if your best time was 1.5 and mine 3.5 then
the results are in:) For a box to box test, just post the code and I'll run
it here and let you know the results (and hillcountry if he's still reading
this thread:)

Night,
MP
 
Stephany, MP

I'm still reading the thread. I guess it's the time diff and so I'm not
around when you guys are discussing.

Stephany, to answer some of your questions:

I've tested the same file in both Vb6 and VB.Net. This specific file
has 4564 records and Vb6 processes it in avg of 35 secs and VB.Net
takes an avg of 45 secs. If I process the same file again in Vb.Net,
it's about the same speed +/- 1 sec.

On the usage:
1. Yes, it is run by a user during the business day and sometimes, when
the file is too big, like 400MB, it runs through the following day, and this
is for the VB6 ver.
2. No, it is not run as a batch. Basically, the user selects a file,
processes it and then, if there are additional files, continues to
process them one at a time.
3. I've been testing the .Net ver throughout the day to check if time
makes a diff. But I've noticed in the Vb6 ver that at times (no
specific time of the day) it is processing real fast and for the same
file it slows down and then again picks up the speed. Note that no
other application is run. Not sure what causes this. On the other hand,
the .Net ver always processes at about the same speed.

Well, the 400MB files have more than 350K records.

Exactly; I was also wondering whether I was comparing apples to
apples here or not.

400MB file size is regular. Basically, this file is sent by our client
and we process the files, convert them to our format and then run a
backend job to update the database with this info. Ours is the
healthcare industry.

I've not tested the VB.Net version for a 400MB size. I just found out
from the user who runs the VB6 ver and he said it takes about 7 1/2
hrs. So, I don't know if the there will be significant diff or not. To
begin with, I started testing a smaller file. Since this is 15 secs
slower, I decided to debug and try and optimize if necessary before
testing a bigger file.

We are re-writing this in Vb.Net as most of our other appls are already
in .net and this is one of the older apps.

I'm planning to test the 400MB file sometime today. Will keep you guys
posted.

Thanks.
 
HillCountry,

I told you before that you should test this clean.
In your sample I have seen a dataset, generic VBNet stuff and more that is
not possible in VB6, and in that part (from a quick look) you have probably
not used the most optimal methods to load bulk data.

Therefore, when you ask if the VBNet IO function is slower than VB6, then you
should in my opinion test the most common VB6 IO functions against the most
common VBNet IO functions to write files.

What you are doing now is in my opinion comparing apples with fishes.

Just my thought,

Cor
 
