IO function in Vb.Net slower than in Vb6.0

hillcountry74

Hi,

I'm re-writing a VB6 app in Vb.Net. This basically reads a text file
using streamreader one line at a time, parses the string using
substring, trim functions and writes the parsed string to an output
text file using streamwriter. I've noticed while testing that this is
15 secs slower than the VB6 app. Wonder why it is slow. Can someone
give me some pointers?

Thanks. Appreciate your time.
 
Herfried K. Wagner [MVP]

hillcountry74 said:
I'm re-writing a VB6 app in Vb.Net. This basically reads a text file
using streamreader one line at a time, parses the string using
substring, trim functions and writes the parsed string to an output
text file using streamwriter. I've noticed while testing that this is
15 secs slower than the VB6 app. Wonder why it is slow. Can someone
give me some pointers?

VB.NET applications are stored in IL (Intermediate Language) instead of
native code. At runtime, the CLR's JIT compiler converts the methods' IL to
native code. This process will take some time and can influence the runtime
of your application. However, I think that there might be a different
reason for the performance differences. Could you post the VB6 code and the
corresponding VB.NET version of this code?
 
hillcountry74

Thanks for your response

Here is the code:

Public Overrides Sub PreferredInputProcessing()
    Dim InputFileReader As StreamReader
    Dim UndefinedBenefitsFileWriter As StreamWriter
    Dim DataBlock As MHNet.ApplicationBlocks.Data.SqlHelper
    Dim ds As New DataSet()
    Dim dr As SqlClient.SqlDataReader
    Dim sSql As New StringBuilder()
    Dim iNewOptionCount As Integer
    Dim sPreferredInputFile As System.String
    Dim iRecordsProcessed As System.Int32

    'Open input file
    InputFileReader = New StreamReader(_sInputFileLocation)

    'Open output file
    OpenOutputFiles()

    'Create UndefinedBenefits file
    UndefinedBenefitsFileWriter = New StreamWriter("C:\EDIFiles\" & _
        _sRateCode & "UndefinedBenefits.csv")

    'Zero out the number of records processed.
    ProcessReport.RecordsProcessed = 0

    'Set validation properties
    _bValidateEnrollType = True
    _bValidateHeadofHouse = True
    'The Celanese benefits don't have a value for PrimaryStatus in the elig file,
    'but are actually = Primary. So, clsMain defaults it to primary status.
    Select Case _sRateCode
        Case "CELANESEMBH", "MMSI", "GLOBALHEALTH"
            _bValidatePrimaryStatus = False
        Case Else
            _bValidatePrimaryStatus = True
    End Select
    _bValidateMaritalStatus = False

    Do While InputFileReader.Peek > -1

        InitializeInputVariables()
        sPreferredInputFile = Nothing
        sPreferredInputFile = InputFileReader.ReadLine
        '/ skip blank lines
        If sPreferredInputFile.Trim <> "" AndAlso _
           sPreferredInputFile.Trim(""c) <> "" AndAlso _
           sPreferredInputFile.Trim.Length >= 439 Then
            iRecordsProcessed = iRecordsProcessed + 1

            '/ update display every 100 records
            If iRecordsProcessed Mod 100 = 0 Then
                Status = "Records processed: " & iRecordsProcessed
                RaiseEvent ProcessStatus(Me, New System.EventArgs())
            End If

            'Set the input properties by extracting specific
            'values from the input record.
            _sInActionCode = Trim(Mid(sPreferredInputFile, 1, 1))
            _sInCarrierMemId = Trim(Mid(sPreferredInputFile, 2, 25))
            _sInLastName = Trim(Mid(sPreferredInputFile, 27, 60))
            _sInFirstName = Trim(Mid(sPreferredInputFile, 87, 30))
            _sInMiddleName = Trim(Mid(sPreferredInputFile, 117, 15))
            _sInAddr1 = Trim(Mid(sPreferredInputFile, 132, 60))
            _sInAddr2 = Trim(Mid(sPreferredInputFile, 192, 60))
            _sInCity = Trim(Mid(sPreferredInputFile, 252, 30))
            _sInState = Trim(Mid(sPreferredInputFile, 282, 2))
            _sInZip = Trim(Mid(sPreferredInputFile, 284, 10))
            _sInBenefitOption = Trim(Mid(sPreferredInputFile, 294, 60))
            _sInEmployerGroup = Trim(Mid(sPreferredInputFile, 354, 15))
            _sInOptionEffDate = Trim(Mid(sPreferredInputFile, 369, 8))
            _sInHPEffDate = Trim(Mid(sPreferredInputFile, 377, 8))
            _sInTermDate = Trim(Mid(sPreferredInputFile, 385, 8))

            If _sInTermDate = "" Or Not IsDateValid(AddDateDashes(_sInTermDate)) Then
                _sInTermDate = _sMagicTermDate
            End If
            'TERMING PLANS to set date to manual date
            Select Case _sRateCode
                Case "MMSI"
                    If _sInTermDate > "20041231" Then
                        _sInTermDate = "20041231"
                    End If
            End Select
            _sInSex = sPreferredInputFile.Substring(392, 1).Trim
            Dim sTmp As System.String
            sTmp = Trim(Mid(sPreferredInputFile, 394, 8))
            If sTmp <> "" Then
                _sInDOB = Trim(Mid(sTmp, 1, 4)) & "-" & _
                          Trim(Mid(sTmp, 5, 2)) & _
                          "-" & Trim(Mid(sTmp, 7, 2))
            End If
            _sInSSN = Trim(Mid(sPreferredInputFile, 402, 9))
            _sInPhone = Trim(Mid(sPreferredInputFile, 411, 12))
            If _sInPhone.Length = 12 Then
                _sInPhone = Trim(Mid(_sInPhone, 1, 3)) & _
                            Trim(Mid(_sInPhone, 5, 3)) & Trim(Mid(_sInPhone, 9, 4))
            End If
            sTmp = sPreferredInputFile.Substring(422, 8).Trim
            If sTmp <> "" Then
                _sInEmployerGroupAnivDate = Trim(Mid(sTmp, 1, 4)) & _
                                            "-" & Trim(Mid(sTmp, 5, 2)) & _
                                            "-" & Trim(Mid(sTmp, 7, 2))
            End If

            _sInHeadOfHouse = Trim(Mid(sPreferredInputFile, 431, 9))
            If _sInHeadOfHouse = "" Then
                _sInHeadOfHouse = Trim(Mid(_sInCarrierMemId, 2, 9))
            End If
            _sInPrimaryStatus = Trim(Mid(sPreferredInputFile, 440, 1))
            _sInEnrollType = Trim(Mid(sPreferredInputFile, 441, 1))

            Try
                _sInMaritalStatus = Trim(Mid(sPreferredInputFile, 442, 1))
            Catch ex As System.ArgumentOutOfRangeException
                If ex.Message.IndexOf("Index and length must refer to a location within the string") > 0 Then _sInMaritalStatus = ""
            End Try
            'Validate the incoming record.

            Validate()
            If _bValidated Then
                BuildOutputRecord()
                WriteOutputRecord()
                ProcessReport.TotalSuccessfulRecords = _
                    ProcessReport.TotalRecordsProcessed - ProcessReport.TotalErrorRecords
            Else
                WriteOutputErrorRecord()
                ProcessReport.TotalErrorRecords = ProcessReport.TotalErrorRecords + 1
            End If
        End If 'skip blank lines
    Loop

This is the main parsing routine.
Thanks for your help.
 
Herfried K. Wagner [MVP]

hillcountry74 said:
Try
    _sInMaritalStatus = Trim(Mid(sPreferredInputFile, 442, 1))
Catch ex As System.ArgumentOutOfRangeException
    If ex.Message.IndexOf("Index and length must refer to a location within the string") > 0 Then _sInMaritalStatus = ""
End Try

How often is this exception thrown? Instead of catching an exception, make
sure that the indices are valid. In addition to that, check the performance
of the release version (not the debug version) of the application when it's
started outside the IDE.
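
A minimal sketch of the index check suggested above. The helper name GetField and its 0-based offsets are assumptions for illustration, not part of the posted application. It returns an empty string when the requested field starts beyond the end of the line, so a short record never raises an exception. (As far as I know, the VB Mid function already returns an empty string when the start position is past the end of the string, so the check matters mainly when Substring is used.)

```vbnet
Imports System

Module FieldDemo
    ' Hypothetical helper: returns the trimmed field, or "" if the line is too short.
    ' Assumes start and length are non-negative.
    Function GetField(ByVal line As String, ByVal start As Integer, ByVal length As Integer) As String
        If line Is Nothing OrElse start >= line.Length Then
            Return ""
        End If
        ' Clamp the length so Substring never reads past the end of the string.
        Dim available As Integer = Math.Min(length, line.Length - start)
        Return line.Substring(start, available).Trim()
    End Function

    Sub Main()
        Dim record As String = "A12345   "                       ' made-up short record
        Console.WriteLine("[" & GetField(record, 1, 5) & "]")    ' prints [12345]
        Console.WriteLine("[" & GetField(record, 441, 1) & "]")  ' prints []
    End Sub
End Module
```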
 
hillcountry74

This exception is thrown maybe 1 out of 1000 times. I tried commenting
out this piece of code to see whether it makes a difference, but it made
none.

I've compiled in release mode and run the resulting exe. But there is
also a .pdb file, which I think is created when I run the app in debug
mode.

Is there anything else I'm missing? Do you think the Substring and Trim
functions slow it down? Or is the IO the cause?

Thanks for your time.
 
MeltingPoint

I've spent a year or so on vb.net and still consider myself new, but in
my opinion it is the many small reads that are slowing you down. Since I
don't know the structure of the file (but it sounds like a text file with
the records all smashed together) I'll suggest a couple of ways *I THINK*
will speed it up.
1) (If file has delimiters) - Read the whole file into a string, and use
String.Split() to create an array that you can then map to variables or
just write it straight out ->outFile.Write(array(elementNumber))

2) Read the whole file into a string, and use RegularExpressions.Regex
and RegularExpressions.MatchCollection to break the string into parts
and process from there (done right this should solve the "Trim" problem).

3) If you have control over the file format (which I assume you don't)
fix the file format so you can read it in line by line without further
processing.

4) Out of ideas :) Let me know if it helps or you need help with one of
the above :)

MP
 
Stephany Young

At face value there does not appear to be anything that is an obvious
bottleneck; however, you do call a number of methods that you have not
described (OpenOutputFiles, InitializeInputVariables, Validate,
BuildOutputRecord, WriteOutputRecord, WriteOutputErrorRecord), and it is
possible that there is a bottleneck in any of those. In addition you raise
the ProcessStatus event regularly and it would be prudent to ensure that
whatever is handling that event is not blocking the process for an
inordinate length of time.

From my point of view the number of calls to Trim could be a factor and
perhaps some of them are redundant. For example, take the line:

_sInActionCode = Trim(Mid(sPreferredInputFile, 1, 1))

If the first character of a 'record' always contains a non-space character
then Trim is redundant. In this case there are 3 new strings being created
(remember that strings are immutable), and there is an overhead, albeit
small, involved in the creation of each string. Removing the Trim from this
line would mean that there are only 2 new strings being created, thus
reducing the overhead accordingly. With the number of string operations in
your PreferredInputProcessing method this could be significant.

You might also try modifying the string parsing to the '.NET way', for
example:

_sInActionCode = sPreferredInputFile.Substring(0, 1).Trim

or

_sInActionCode = sPreferredInputFile.Substring(0, 1)

I do not have any benchmarking data but it is possible that you might find a
performance increase.
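
One rough way to get such benchmarking data is to time both styles on the same string. This is only a sketch under assumptions: the record contents and iteration count are made up, and Stopwatch requires .NET 2.0 or later (on older frameworks Environment.TickCount could stand in). Results will vary by machine and should be measured on a release build started outside the IDE.

```vbnet
Imports System
Imports System.Diagnostics
Imports Microsoft.VisualBasic

Module MidVersusSubstring
    Sub Main()
        ' Made-up 450-character record; real data would come from the input file.
        Dim record As String = New String("x"c, 100) & New String(" "c, 350)
        Const Iterations As Integer = 1000000
        Dim result As String = ""

        Dim watch As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            result = Trim(Mid(record, 27, 60))          ' VB6 style, 1-based start
        Next
        watch.Stop()
        Console.WriteLine("Trim(Mid(...)):     " & watch.ElapsedMilliseconds & " ms")

        watch = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            result = record.Substring(26, 60).Trim()    ' .NET style, 0-based start
        Next
        watch.Stop()
        Console.WriteLine("Substring().Trim(): " & watch.ElapsedMilliseconds & " ms")
    End Sub
End Module
```

Both loops extract the same 60-character field, so any difference in the two timings comes from the call style rather than the data.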

Another place where, in my view, there is extraneous overhead is:

If sPreferredInputFile.Trim <> "" AndAlso _
sPreferredInputFile.Trim(""c) <> "" AndAlso _
sPreferredInputFile.Trim.Length >= 439 Then

Note here that you are using the System.String.Trim method rather than the
Microsoft.VisualBasic.Trim function. The Microsoft.VisualBasic.Trim function
returns the source string with leading and trailing space (&H20) characters
removed while the System.String.Trim method returns the source string after
white space characters are removed from the beginning and end. Note that
there is a difference between 'space' characters and 'white space'
characters. It is unclear what actual character is being specified in the
sPreferredInputFile.Trim(""c) clause but it is highly likely that it
qualifies as 'white space' and is therefore being removed by the first
clause. I would be inclined to code the test like this:

sPreferredInputFile = sPreferredInputFile.Trim()
If sPreferredInputFile.Length >= 439 Then

The 3 string operations are now reduced to 1 and the number of comparison
operations is also reduced from three to one. Given the above you might be
able to refine your parsing code and identify further redundancies.


Cor Ligthert

HillCountry,

hillcountry74 said:
I'm re-writing a VB6 app in Vb.Net. This basically reads a text file
using streamreader one line at a time, parses the string using
substring, trim functions and writes the parsed string to an output
text file using streamwriter. I've noticed while testing that this is
15 secs slower than the VB6 app. Wonder why it is slow. Can someone
give me some pointers?

When you want to test this, then you should use comparable code.

That means

Read inputline
outputline = inputline
Write outputline.
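
A sketch of such a baseline in VB.NET, assuming hypothetical file paths and .NET 2.0 or later (for Stopwatch, Using and IsNot). It copies the file line by line with no parsing or validation, so its running time approximates the pure IO cost that the full routine can be compared against.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module CopyBaseline
    Sub Main()
        ' Hypothetical paths; substitute the real input file and a scratch output file.
        Dim inputPath As String = "C:\EDIFiles\input.txt"
        Dim outputPath As String = "C:\EDIFiles\copy.txt"

        Dim watch As Stopwatch = Stopwatch.StartNew()
        Using reader As New StreamReader(inputPath), writer As New StreamWriter(outputPath)
            ' ReadLine returns Nothing at end of file.
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                writer.WriteLine(line)
                line = reader.ReadLine()
            Loop
        End Using
        watch.Stop()
        Console.WriteLine("Plain copy took " & watch.ElapsedMilliseconds & " ms")
    End Sub
End Module
```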

Because I don't have VB6 installed, I cannot test that.

However it looks strange to me.

Cor
 
hillcountry74

Thanks guys for your suggestions.

MP,
The file does not have delimiters but follows a specific format and
hence I used Mid to parse.

Can you please give me more info on using regular expressions as a
replacement for the Trim function?


Stephany,
The file might contain a valid character in position 1. So, I still
need to use Trim. Even I'm suspecting Trim to be the cause. I read this
article on MSDN,
http://msdn.microsoft.com/library/d...tml/vbtchmicrosoftvisualbasicnetinternals.asp
which recommends using Mid instead of Substring, but I couldn't see a
difference.

I will change the If condition as you have mentioned and let you know
the results. As for "", I think it is a Unicode character. Earlier, I
was not checking for it; in one of the files it appeared after the last
record, and when the code tried to do a Substring, it threw an
exception. I was previously using Substring instead of Mid and changed
it after reading the above article.

Guys, more suggestions would be really appreciated. I've been stuck on
this issue for the past week. Please help!!

Thanks again for your time.
 
hillcountry74

I wanted to add to the above: the parsing routine is in a DLL and is
called from the front-end app, which is a separate project in the same
solution. Could this architecture be a problem?
 
hillcountry74

Guys,

I commented out the call to the Validate method and it was faster by 10
secs. Here is the Validate method code. Please let me know how I can
optimize this. (Please note that ValidateLastName, ValidateFirstName,
ValidateStateName etc. are all the same.) I'm using the IndexOf method in
the DoesBadCharExist method. Is there a better way? Thanks

Protected Sub Validate()
    '**********************************************************
    ' Validate the current input record
    '**********************************************************
    Dim sTmpHold As System.String
    '/ Set the validated flag to True.
    _bValidated = True


    '/ Initialize the output error record.
    BuildOutputErrorRecord()

    '/ Member ID Validation
    If _sInCarrierMemId = "" Then
        Throw New InvalidFieldException.MissingMemberIDException()
    End If

    '/ Mhnet Member Validation
    If Not _bMhnetMember Then
        Select Case _sRateCode
            Case "HCUSA", "HUMANAFLHMO", "HUMANAFLPPO", "HUMANA"
                'Commented out BB 2003-03-04 line from criteria because added humanafl
                'And frmMain.cboRateCode = "HCUSA" Then
                Throw New InvalidFieldException.MHNetMemberException()
            Case Else
        End Select
    End If

    '/ Last name Validation - chars "A-Z,.-'0-9"
    Select Case ValidateLastName(_sInLastName, _sLastnameChars)
        Case 0
            Throw New InvalidFieldException.MissingLastnameException()
        Case 1
            Throw New InvalidFieldException.BadFormatLastnameException()
        Case Else
    End Select

    '/ First name validation - chars "A-Z.'"
    Select Case ValidateFirstName(_sInFirstName, _sFirstNameChars)
        Case 0
            Throw New InvalidFieldException.MissingFirstnameException()
        Case 1
            Throw New InvalidFieldException.BadFormatFirstnameException()
        Case Else
    End Select

    '/ Middle name validation - chars "A-Z"
    If ValidateMiddleName(_sInMiddleName, _sMiddleNameChars) <> True Then
        Throw New InvalidFieldException.BadFormatMiddlenameException()
    End If

    '/ City name validation - chars "A-Z.-'"
    If _sInCity <> "" Then
        _sInCity = _sInCity.Replace("/", "") 'added 20050221 BB
        _sInCity = _sInCity.Replace("\", "") 'added 20050221 BB
        _sInCity = _sInCity.Replace(",", "") 'added 20050221 BB
        If ValidateCityName(_sInCity, _sCityNameChars) <> True Then
            Throw New InvalidFieldException.BadFormatCityException()
        End If
    End If

    '/ State name validation - chars "A-Z"
    If ValidateStateName(_sInState, _sStateNameChars) <> True Then
        Throw New InvalidFieldException.BadFormatStateException()
    End If

    '/ SSN validation - make sure SSN is only numeric if it exists
    If _sInSSN <> "" Then
        _sInSSN = Mid(_sInSSN, 1, 9)
        If Not IsNumeric(_sInSSN) Then
            Throw New InvalidFieldException.BadFormatSSNException()
        End If
    End If

    '/ Phone validation - make sure Phone is only numeric if it exists
    If _sInPhone <> "" Then
        _sInPhone = _sInPhone.Replace("-", "")
        _sInPhone = _sInPhone.Replace(" ", "") 'added 20050221 BB
        _sInPhone = _sInPhone.Replace("*", "") 'added 20050221 BB
        _sInPhone = _sInPhone.Replace(".", "") 'added 20050221 BB
        _sInPhone = _sInPhone.Replace("/", "") 'added 20050221 BB
        If Not IsNumeric(_sInPhone) Then
            Throw New InvalidFieldException.BadFormatPhoneException()
        End If
    End If
    '/ Date of Birth Validation
    sTmpHold = AddDateDashes(_sInDOB)
    If Not IsDateValid(sTmpHold) Then
        Throw New InvalidFieldException.DateOfBirthException()
    Else
        _sInDOB = System.String.Format("{0:yyyyMMdd}", CType(sTmpHold, Date))
    End If
    If sTmpHold > _sMagicTermDateWithDashes Then
        _sInDOB = _sMagicTermDate
    End If
    '_sInDOB = CheckMaxDate(_sInDOB)

    sTmpHold = AddDateDashes(_sInOptionEffDate)
    If IsDateValid(sTmpHold) Then
        If System.String.Format("{0:yyyyMMdd}", _sInOptionEffDate) < _
           System.String.Format("{0:yyyyMMdd}", _sInceptionDate) Then
            _sInOptionEffDate = System.String.Format("{0:yyyyMMdd}", _sInceptionDate)
        Else
            _sInOptionEffDate = System.String.Format("{0:yyyyMMdd}", _sInOptionEffDate)
        End If
    Else ' _sInOptionEffDate in not valid
        Throw New InvalidFieldException.OptionEffDateException()
    End If
    'End If
    'Commented out above Code, goes with above comment out code block bb 2002-12-12
    If sTmpHold > _sMagicTermDateWithDashes Then
        _sInOptionEffDate = _sMagicTermDate
    End If
    '_sInOptionEffDate = CheckMaxDate(_sInOptionEffDate)

    '/ if this contains "        " (8 blanks) then this allows the
    'code to choose inception or filedate for hpeffdate
    'Commented code below while re-writing in .Net
    'as for an invalid date it was always defaulting to an empty string and was processed
    'when it should actually be errored.
    'If Not IsDateValid(AddDateDashes(_sInHPEffDate)) Then
    ' _sInHPEffDate = ""
    'End If

    sTmpHold = AddDateDashes(_sInHPEffDate)
    If _sInHPEffDate = "" Then
        'If the file date is before the inception date, use the inception date.
        'Otherwise, use the file date.
        If System.String.Format("{0:yyyyMMdd}", FileDateTime(_sInputFileLocation).ToString) < _
           System.String.Format("{0:yyyyMMdd}", _sInceptionDate) Then
            _sInHPEffDate = System.String.Format("{0:yyyyMMdd}", _sInceptionDate)
        Else
            _sInHPEffDate = System.String.Format("{0:yyyyMMdd}", _
                FileDateTime(_sInputFileLocation).ToString)
        End If
    Else
        If IsDateValid(sTmpHold) Then
            _sInHPEffDate = System.String.Format("{0:yyyyMMdd}", CType(sTmpHold, Date))
        Else
            Throw New InvalidFieldException.HPEffDateException()
        End If
    End If
    '_sInHPEffDate = CheckMaxDate(_sInHPEffDate)
    If sTmpHold > _sMagicTermDateWithDashes Then
        _sInHPEffDate = _sMagicTermDate
    End If
    '/ Benefit Option Validation
    If _sInBenefitOption = "" Then
        Throw New InvalidFieldException.MissingBenefitOptionException()
    End If

    '/ Employer Group Validation
    If _sInEmployerGroup = "" And _bValidateEmployerGroup Then
        Throw New InvalidFieldException.MissingEmployerGroupException()
    End If

    '/ Set the Term Date to 12.31.2078 if the Term date is not a valid date.
    sTmpHold = AddDateDashes(_sInTermDate)
    If IsDateValid(sTmpHold) Then
        _sInTermDate = System.String.Format("{0:yyyyMMdd}", CType(sTmpHold, Date))
    Else
        _sInTermDate = _sMagicTermDate
    End If

    ''/ Term Date Validation
    ''/ If (msInTermDate < Format(Now(), "yyyymmdd")) Then
    If (_sInTermDate < System.String.Format("{0:yyyyMMdd}", CType(_sInceptionDate, Date))) Then
        'msOutErrorRec = msOutErrorRec & " Invalid Term Date Error: " & msInTermDate
        '/ changed per Kit on 4-17-2001
        _sInTermDate = System.String.Format("{0:yyyyMMdd}", CType(_sInceptionDate, Date))
        Throw New InvalidFieldException.TermDateException()
    End If
    '_sInTermDate = CheckMaxDate(_sInTermDate)
    If _sInTermDate > _sMagicTermDateWithDashes Then
        _sInTermDate = _sMagicTermDate
    End If

    '/ Employer Group Aniversary date validation
    If Not IsDateValid(AddDateDashes(_sInEmployerGroupAnivDate)) Then
        _sInEmployerGroupAnivDate = ""
    Else
        _sInEmployerGroupAnivDate = System.String.Format("{yyyymmdd}", _
            AddDateDashes(_sInEmployerGroupAnivDate))
    End If
    '_sInEmployerGroupAnivDate = CheckMaxDate(_sInEmployerGroupAnivDate)
    If _sInEmployerGroupAnivDate > _sMagicTermDateWithDashes Then
        _sInEmployerGroupAnivDate = _sMagicTermDate
    End If

    '/ If the Head of House is blank and the element was NOT supplied in the
    '/ submitted positive enrollment file, use the left nine characters
    '/ of the Carrier Member ID.
    If _sInHeadOfHouse = "" And _bValidateHeadofHouse = False Then
        _sInHeadOfHouse = _sInCarrierMemId.Substring(0, 9)
    End If

    '/ Head of House Validation - chars "A-Z,.-'0-9"
    Select Case ValidateHeadHouse(_sInHeadOfHouse, _sHeadOfHouseChars)
        Case 0 'If the Head of House element was supplied and was blank, reject the record.
            Throw New InvalidFieldException.MissingHeadofHouseException()
        '/ If Head of House contains garbage chars reject the record
        Case 1
            Throw New InvalidFieldException.BadFormatHeadofHouseException()
        Case Else
    End Select

    '/ Primary Status Validation
    '/ If the Primary Status is blank and the Primary Status element was NOT
    '/ submitted as an element of the positive enrollment file, use "P"
    If _sInPrimaryStatus = "" And _bValidatePrimaryStatus = False Then
        _sInPrimaryStatus = "P"
    '/ If the Primary Status element was supplied and was blank, reject the record.
    ElseIf _sInPrimaryStatus = "" And _bValidatePrimaryStatus = True Then
        Throw New InvalidFieldException.MissingPrimaryStatusException()
    Else '/ it was supplied, make sure it is a P or S
        Select Case _sInPrimaryStatus.ToUpper
            Case "P", "S"
            Case Else
                Throw New InvalidFieldException.BadFormatPrimaryStatusException()
        End Select
    End If

    '/ Enroll Type Validation
    '/ If Enroll Type is blank and it was not one of the supplied elements in
    '/ the health plans positive enrollment file, set Enroll Type to "I".
    If _sInEnrollType = "" And _bValidateEnrollType = False Then
        _sInEnrollType = "I"
    '/ If the Enroll Type element was supplied and was blank, reject the record.
    ElseIf _sInEnrollType = "" And _bValidateEnrollType = True Then
        Throw New InvalidFieldException.MissingEnrollTypeException()
    Else '/ it was supplied, make sure it it a I,S,D,or C
        Select Case _sInEnrollType.ToUpper
            Case "I", "S", "D", "C"
            Case Else
                Throw New InvalidFieldException.BadFormatEnrollTypeException()
        End Select
    End If

    '/ If Marital status is supplied and was blank reject
    If _sInMaritalStatus = "" And _bValidateMaritalStatus = True Then
        Throw New InvalidFieldException.MissingMaritalStatusException()
    '/ assure that only "S" and "M" are passed
    Else
        Select Case _sInMaritalStatus.ToUpper
            Case "S", "M", ""
            Case Else
                Throw New InvalidFieldException.BadFormatMaritalStatusException()
        End Select
    End If
End Sub

Protected Overridable Function ValidateLastName(ByVal sSuspect As String, ByVal sGoodChars As String) As Integer
    If sSuspect.Length = 0 Then
        Return 0
    End If
    If DoesBadCharExist(sSuspect, sGoodChars) = True Then
        Return 1
    Else
        Return 2
    End If
End Function

Protected Overridable Function ValidateFirstName(ByVal sSuspect As String, ByVal sGoodChars As String) As Integer
    If sSuspect.Length = 0 Then
        Return 0
    End If
    If DoesBadCharExist(sSuspect, sGoodChars) = True Then
        Return 1
    Else
        Return 2
    End If
End Function

Protected Overridable Function AddDateDashes(ByVal sSuspect As String) As String
    '/ add dashes to dates so that Isdate function willl work properly
    '/ 2000-12-26 rlt
    Dim sCached As String

    sCached = sSuspect.Trim
    If sCached.Length = 8 Then
        Return sCached.Substring(0, 4) & "-" & sCached.Substring(4, 2) & "-" & sCached.Substring(6, 2)
    Else
        Return sCached
    End If
End Function

Protected Overridable Function DoesBadCharExist(ByVal sSuspect As String, ByVal sGoodChars As String) As Boolean
    Dim iCount As Integer
    For iCount = 0 To sSuspect.Length - 1
        If sGoodChars.IndexOf(sSuspect.ToUpper.Chars(iCount)) < 0 Then
            Return True
        End If
    Next iCount
    Return False
End Function
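
One small change worth trying in DoesBadCharExist: the posted loop calls sSuspect.ToUpper on every iteration, and each call builds a new upper-cased copy of the string. The sketch below upper-cases once before the loop and otherwise keeps the same contract. The allowed-character set in Main is made up for the demonstration, and whether the change matters for the real data would have to be measured.

```vbnet
Imports System

Module BadCharDemo
    ' Same contract as the posted DoesBadCharExist, but ToUpper runs once per call
    ' instead of once per character.
    Function DoesBadCharExist(ByVal sSuspect As String, ByVal sGoodChars As String) As Boolean
        Dim sUpper As String = sSuspect.ToUpper()
        For iCount As Integer = 0 To sUpper.Length - 1
            If sGoodChars.IndexOf(sUpper.Chars(iCount)) < 0 Then
                Return True
            End If
        Next
        Return False
    End Function

    Sub Main()
        Dim goodChars As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ.'"   ' made-up allowed set
        Console.WriteLine(DoesBadCharExist("O'Brien", goodChars))  ' prints False
        Console.WriteLine(DoesBadCharExist("Sm1th", goodChars))    ' prints True
    End Sub
End Module
```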
 
Stephany Young

I could be wrong, but I'm sure that MeltingPoint didn't realise that you are
dealing with a 'fixed' record when he alluded to using RegEx for the
parsing.

However, RegEx would certainly be of assistance in the validation. Careful
construction of RegEx expressions would yield efficiencies in this method,
e.g. it would make DoesBadCharExist obsolete.

Now, don't take this the wrong way here, but from the fragments you have
supplied and the obvious complexity of the operation, it is getting into the
area where you might be better off engaging a consultant to review the
project and make recommendations. Analysing the overall operation and making
the appropriate recommendations would take a number of hours, if not days,
and it would be unfair to expect those who donate their time and expertise,
quite freely I might add, to advise on something with the scope of your
project without being given the full picture.

My analysis of your fragments is that there is a lot more to your
'problem' than meets the eye and I consider that if you try to get advice
'piecemeal' then you won't end up getting the performance boost you are
looking for and/or you will get advice that is entirely appropriate for the
fragment in question but might cause problems for you in the 'bigger
picture'.

That said, feel free to post 'questions' about specific things that you
would like advice on, like 'How would I go about doing a benchmark test to
see if Mid is more efficient than Substring' or 'How would I construct a
Regex expression to make sure a string contains only certain characters'.
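
As an illustration of the second kind of question, here is a minimal sketch of a Regex that accepts only a fixed set of characters. The allowed set is an assumption taken from the "A-Z,.-'0-9" comment in the posted Validate code (the real _sLastnameChars value was not shown), and the names LastNamePattern and IsValidLastName are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexValidationDemo
    ' Assumed allowed set: letters, digits, comma, period, apostrophe, hyphen.
    ' The pattern is built once and reused for every record.
    Private ReadOnly LastNamePattern As New Regex("^[A-Z0-9,.'-]+$", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    Function IsValidLastName(ByVal value As String) As Boolean
        Return LastNamePattern.IsMatch(value)
    End Function

    Sub Main()
        Console.WriteLine(IsValidLastName("O'Neil-Smith"))  ' prints True
        Console.WriteLine(IsValidLastName("Sm#th"))          ' prints False
        Console.WriteLine(IsValidLastName(""))               ' prints False (empty)
    End Sub
End Module
```

Unlike the posted ValidateLastName, this returns a single Boolean, so a missing name and a badly formatted name would need to be told apart with a separate length check.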
 
MeltingPoint

<lots o code>

Some good ideas so far. I've started to put the regex expression together
for you, could have it done in a few hours. If you want to send me one of
these files, (important info changed of course) I could fine tune the
expression. macmanic(zero)(zero)atHotmail.com

Note to anyone else reading this thread: any ideas on the speed of regex as
opposed to Substring/IndexOf? I can say for sure that I've parsed a 4mb
file with regex in a few hundred milliseconds.
 
MeltingPoint

++Just saw Stef's comment. I'm not sure what difference it makes as to
whether it's fixed or not. RegEx still works and it's a lot easier on the
eyes:)
((?<ActionCode>.)
(?<CarrierID>\d{0,25})
(?<LastName>\w{0,60}\s*\b)
(?<FirstName>\w{0,30}\s*\b)
(?<MiddleName>\w{0,15}\s*\b)
(?<Addr1>.{0,60}\s*\b)
(?<Addr2>.{0,60}\s*\b)
(?<City>.{0,30}\s*\b)
(?<State>.{0,2}\s*\b)
(?<Zip>.{0,10}\s*\b))
Actually the fact that it's fixed makes it easier.
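
For a strictly fixed-width record, an alternative to the pattern above is to give each named group an exact width. The sketch below is an assumption-laden illustration rather than a drop-in replacement: it covers only the first three fields, uses the widths from the Mid calls posted earlier in the thread (1, 25 and 60), and runs against a made-up record.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FixedWidthRegexDemo
    Sub Main()
        ' Exact-width named groups for the first three fields only.
        Dim pattern As New Regex("^(?<ActionCode>.{1})(?<CarrierID>.{25})(?<LastName>.{60})")

        ' Made-up record padded to the fixed widths.
        Dim record As String = "A" & "12345".PadRight(25) & "Smith".PadRight(60)

        Dim m As Match = pattern.Match(record)
        If m.Success Then
            Console.WriteLine("[" & m.Groups("ActionCode").Value.Trim() & "]")  ' prints [A]
            Console.WriteLine("[" & m.Groups("CarrierID").Value.Trim() & "]")   ' prints [12345]
            Console.WriteLine("[" & m.Groups("LastName").Value.Trim() & "]")    ' prints [Smith]
        End If
    End Sub
End Module
```

Each group still needs a Trim afterwards, so this mainly changes how the fields are located, not how many strings are created.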

And a note as to how close I was paying attention:
sPreferredInputFile.Trim.Length >= 439
does not 'allude' to me that it is totally fixed.

However, I don't know Stef, she probably knows more than me, considering
I just started using RegEx a month ago. But the above Regex does match
the following:

e8374837463784958473627495Sc9ott 8nglis
Micheal 554 sdf sdf
667 rtert ertwert Hell
FL90210

Which I think is what the record looks like (at least so far)

Let me know, both of you :)
MP
 

MeltingPoint

Sorry about all the posts (xnews acting up)


<lots o code>

Some good ideas so far. I've started to put the regex expression
together for you, could have it done in a few hours. If you want to send
me one of these files, (important info changed of course) I could fine
tune the expression. macmanic(zero)(zero)atHotmail.com

Note to anyone else reading this thread: any ideas on the speed of regex
as opposed to Substring/IndexOf? I can say for sure that I've parsed a
4 MB file with regex in a few hundred milliseconds.

++Just saw Stef's comment. I'm not sure what difference it makes as to
whether it's fixed or not. RegEx still works and it's a lot easier on the
eyes:)
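
One way to answer the speed question is to time both approaches on the same record. The following is only a rough sketch: the two-field layout (Code, Name), the module name and the helper names are made up for illustration, and the numbers will vary with machine, runtime version and record length, so no result is claimed here.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module ParseTiming
    ' Hypothetical layout for illustration: Code (1 character), Name (60 characters).
    Private ReadOnly Pattern As New Regex("^(?<Code>.{1})(?<Name>.{60})", RegexOptions.Compiled)

    ' Extracts the Name field with a named regex group.
    Function ParseWithRegex(record As String) As String
        Return Pattern.Match(record).Groups("Name").Value.Trim()
    End Function

    ' Extracts the same field by position (zero-based offset 1, length 60).
    Function ParseWithSubstring(record As String) As String
        Return record.Substring(1, 60).Trim()
    End Function

    ' Runs the parser repeatedly and returns the elapsed time in milliseconds.
    Function TimeIt(parse As Func(Of String, String), record As String, iterations As Integer) As Long
        Dim last As String = ""
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            last = parse(record)
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    Sub Main()
        Dim record As String = "A" & "Smith".PadRight(60)
        Const Iterations As Integer = 100000
        Console.WriteLine("Regex:     " & TimeIt(AddressOf ParseWithRegex, record, Iterations) & " ms")
        Console.WriteLine("Substring: " & TimeIt(AddressOf ParseWithSubstring, record, Iterations) & " ms")
    End Sub
End Module
```

Keeping console output outside the timed loop matters, because writing to the console can easily cost more than the parsing itself.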
((?<ActionCode>.)
(?<CarrierID>\d{0,25})
(?<LastName>\w{0,60}\s*\b)
(?<FirstName>\w{0,30}\s*\b)
(?<MiddleName>\w{0,15}\s*\b)
(?<Addr1>.{0,60}\s*\b)
(?<Addr2>.{0,60}\s*\b)
(?<City>.{0,30}\s*\b)
(?<State>.{0,2}\s*)
(?<Zip>.{0,10}\s*\b))
Actually the fact that it's fixed makes it easier.

And a note as to how close I was paying attention:
sPreferredInputFile.Trim.Length >= 439
does not 'allude' to me that it is totally fixed.

However, I don't know Stef, she probably knows more than me, considering
I just started using RegEx a month ago. But the above Regex does match
the following:

e8374837463784958473627495Tomlin Nilmot
Micheal 554 Some Street
667 Some Other Street Hell
FL90210

Which I think is what the record looks like (at least so far)

Let me know, both of you :)
MP
 
S

Stephany Young

I surrender. I was having an aberration and thinking of Regex for simple
pattern matching rather than its 'extraction' capability.

_sInHeadOfHouse = Trim(Mid(sPreferredInputFile, 431, 9))
....
_sInPrimaryStatus = Trim(Mid(sPreferredInputFile, 440, 1))
_sInEnrollType = Trim(Mid(sPreferredInputFile, 441, 1))
Try
_sInMaritalStatus = Trim(Mid(sPreferredInputFile, 442, 1))
....

This stuff here indicates to me that the record is, more than likely, fixed.
Note the Try ... Catch ... End Try to catch if there are not 442 characters,
but there is no matching construct for position 440 and 441. The earlier
test is for a record length of 439 characters or more, so the record might
be 439, 440, 441 or 442 characters. The catcher on position 442 implies that
characters 440 and 441 are always present. I read between the lines and
decided that 442 'should' always be present. Given that hillcountry74 hasn't
provided all the information this was a 50/50 call but for the purposes of
the exercise is largely irrelevant.

I have no problem with being proved wrong, but I don't think that your regex
will work for parsing here.

In your example thus far you rely on there being exactly 25 digits for
CarrierID. If there are fewer, then your match attempt for LastName won't
start at position 27. Remember that the start position for each component of
the string is specifically defined. Also there is no indication that
CarrierID is numeric which means that it should use . instead of \d. To read
the correct number of characters, the quantifier must be {25} rather than
{0,25} and this means that you have read any trailing spaces as well which
still have to be trimmed off when the matches are read out.
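
The drift described above can be seen on a much smaller record. This is a minimal sketch with a made-up three-field layout (Code 1, Id 5, Name 6); the layout, module name and helper name are assumptions for illustration, not the real file format.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuantifierDemo
    ' Returns whatever the Name group captured for the given pattern.
    Function NameWith(pattern As String, record As String) As String
        Return Regex.Match(record, pattern).Groups("Name").Value
    End Function

    Sub Main()
        ' The Id field holds "12" padded with spaces to 5 characters.
        Dim record As String = "A" & "12".PadRight(5) & "Smith".PadRight(6)

        ' \d{0,5} stops after the two digits, so Name starts inside the Id padding.
        Console.WriteLine("[" & NameWith("^(?<Code>.)(?<Id>\d{0,5})(?<Name>.{6})", record) & "]")

        ' .{5} always consumes the whole field, padding included, so Name starts on time.
        Console.WriteLine("[" & NameWith("^(?<Code>.)(?<Id>.{5})(?<Name>.{6})", record) & "]")
    End Sub
End Module
```

With the ranged quantifier the Name capture begins three characters early; with the fixed quantifier it lines up, at the cost of trimming the padding afterwards.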

(?<LastName>\w{0,60}\s*\b) will only handle simple names - those with no
imbedded spaces or punctuation characters like "van Allen", "O'Brien",
"Mandeville-Brown". Also it is common for company names to be stored in a
LastName field and other name fields left blank like "Acme Inc.". \w will
miss imbedded spaces, apostrophes, hyphens and periods. Another factor is
that you get idiots hitting the spacebar just as they are starting to type a
name and never correcting it so you can get " Smith". The \w will report no
match at all in this case. Use of the \b will only make things worse in such
cases.
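
A quick way to check this is to run the LastName sub-pattern on its own against a few awkward names. This is only a sketch: the pattern is anchored with ^ and tested in isolation, which is an assumption made for the demo, so the exact failure may differ inside the full expression. Here the capture is cut off at the first non-word character.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamePatternDemo
    ' Returns whatever the LastName group captured for the given pattern.
    Function LastNameWith(pattern As String, field As String) As String
        Return Regex.Match(field, pattern).Groups("LastName").Value
    End Function

    Sub Main()
        Dim names() As String = {"van Allen", "O'Brien", "Mandeville-Brown", "Acme Inc.", " Smith"}
        For Each name As String In names
            ' Pad to the 60-character LastName width used in the thread.
            Dim field As String = name.PadRight(60)
            Dim wordOnly As String = LastNameWith("^(?<LastName>\w{0,60}\s*\b)", field).Trim()
            Dim fixedWidth As String = LastNameWith("^(?<LastName>.{60})", field).Trim()
            Console.WriteLine("[" & name & "] \w -> [" & wordOnly & "]  .{60} -> [" & fixedWidth & "]")
        Next
    End Sub
End Module
```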

In this case I think that the Mid or SubString methods are best for the
actual parsing, however regex will certainly make the validation routine
more compact and efficient because here you are operating on each individual
string rather than trying to pick the character sequence from position x to
position y and therefore 2nd guessing what is actually there or not there as
the case may be.
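
For the Substring route, a small helper can stand in for Trim(Mid(...)) and also cover the short-record case without a Try ... Catch. This is a sketch only: the helper name Field and the sample values are made up, and only the start positions (1, 2, 27 and 442) come from the thread.

```vbnet
Imports System

Module FixedWidthDemo
    ' Returns the trimmed field at a 1-based start position, like Trim(Mid(...)).
    ' Yields an empty string when the record is too short to reach the field.
    Function Field(record As String, start As Integer, length As Integer) As String
        Dim zeroBased As Integer = start - 1
        If zeroBased >= record.Length Then Return String.Empty
        Dim available As Integer = Math.Min(length, record.Length - zeroBased)
        Return record.Substring(zeroBased, available).Trim()
    End Function

    Sub Main()
        ' Hypothetical short record: ActionCode (1), CarrierID (25), LastName (60).
        Dim record As String = "A" & "CARRIER01".PadRight(25) & "van Allen".PadRight(60)

        Console.WriteLine(Field(record, 1, 1))     ' ActionCode
        Console.WriteLine(Field(record, 2, 25))    ' CarrierID
        Console.WriteLine(Field(record, 27, 60))   ' LastName, embedded space kept
        Console.WriteLine("[" & Field(record, 442, 1) & "]")  ' past the end: empty, no exception
    End Sub
End Module
```

Checking the length up front is generally cheaper than letting Substring throw and catching the exception on every short record.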

BTW: I have a perfectly good name - there is no need to assume that it needs
contracting or that the spelling needs changing.
 
M

MeltingPoint

I surrender. I was having an aberration and thinking of Regex for
simple pattern matching rather than its 'extraction' capability.

_sInHeadOfHouse = Trim(Mid(sPreferredInputFile, 431, 9))
...
_sInPrimaryStatus = Trim(Mid(sPreferredInputFile, 440, 1))
_sInEnrollType = Trim(Mid(sPreferredInputFile, 441, 1))
Try
_sInMaritalStatus = Trim(Mid(sPreferredInputFile, 442, 1))
...

This stuff here indicates to me that the record is, more than likely,
fixed. Note the Try ... Catch ... End Try to catch if there are not
442 characters, but there is no matching construct for position 440
and 441. The earlier test is for a record length of 439 characters or
more, so the record might be 439, 440, 441 or 442 characters. The
catcher on position 442 implies that characters 440 and 441 are always
present. I read between the lines and decided that 442 'should' always
be present. Given that hillcountry74 hasn't provided all the
information this was a 50/50 call but for the purposes of the exercise
is largely irrelevant.

I have no problem with being proved wrong, but I don't think that your
regex will work for parsing here.

In your example thus far you rely on there being exactly 25 digits for
CarrierID. If there are fewer, then your match attempt for LastName
won't start at position 27. Remember that the start position for each
component of the string is specifically defined. Also there is no
indication that CarrierID is numeric which means that it should use .
instead of \d. To read the correct number of characters, the
quantifier must be {25} rather than {0,25} and this means that you
have read any trailing spaces as well which still have to be trimmed
off when the matches are read out.

(?<LastName>\w{0,60}\s*\b) will only handle simple names - those with
no imbedded spaces or punctuation characters like "van Allen",
"O'Brien", "Mandeville-Brown". Also it is common for company names to
be stored in a LastName field and other name fields left blank like
"Acme Inc.". \w will miss imbedded spaces, apostrophes, hyphens and
periods. Another factor is that you get idiots hitting the spacebar
just as they are starting to type a name and never correcting it so
you can get " Smith". The \w will report no match at all in this case.
Use of the \b will only make things worse in such cases.

In this case I think that the Mid or SubString methods are best for
the actual parsing, however regex will certainly make the validation
routine more compact and efficient because here you are operating on
each individual string rather than trying to pick the character
sequence from position x to position y and therefore 2nd guessing what
is actually there or not there as the case may be.

BTW: I have a perfectly good name - there is no need to assume that it
needs contracting or that the spelling needs changing.

I knew I would catch it for that :) Force of habit from my personal
life:)

OK, just checked it: embedded spaces screw it up. And there's nothing I
can think of readily. I've seen some funky regexes, and I'm sure it can
be done, but not by me:) I tried just doing:

((?<ActionCode>.{1})" _
& "(?<CarrierID>.{25})" _
& "(?<LastName>.{60})" _
& "(?<FirstName>.{30})" _
& "(?<MiddleName>.{15})" _
& "(?<Addr1>.{60})" _
& "(?<Addr2>.{60})" _
& "(?<City>.{30)" _
& "(?<State>.{2})" _
& "(?<Zip>.{10}))"

....and my computer actually laughed at me!!

Back to the drawing board...
 
S

Stephany Young

You have a typo in your "(?<City>.{30)" - a missing }

Anyway, this works a treat with the caveat that the target string has to be
the expected length (442) or longer.

On my machine 10000 iterations take 1 second, give or take a few
milliseconds, and 100000 iterations take 10 seconds, give or take a few
milliseconds. It is fair to say that, as writ and on my machine, as a parser
it will handle approx 1000 records per second.

So, I stand educated, you can do rudimentary parsing with Regex so long as
the expression is very carefully constructed.

Dim _s As String = "ACarrierID<16 spaces>" & _
"LastName<52 spaces>" & _
"FirstName<21 spaces>" & _
"MiddleName<5 spaces>" & _
"Addr1<55 spaces>" & _
"Addr2<55 spaces>" & _
"City<26 spaces>" & _
"StZip<7 spaces>" & _
"BenefitOption<47 spaces>" & _
"EmployerGroup<2 spaces>OptionEfHPEffDatTermDate" & _
"SDOB<5 spaces>" & _
"SSN<6 spaces>" & _
"Phone<7 spaces>" & _
"EmployerHeadOfHouPM"

Dim _exp As String = "(?<ActionCode>.{1})" & _
"(?<CarrierID>.{25})" & _
"(?<LastName>.{60})" & _
"(?<FirstName>.{30})" & _
"(?<MiddleName>.{15})" & _
"(?<Addr1>.{60})" & _
"(?<Addr2>.{60})" & _
"(?<City>.{30})" & _
"(?<State>.{2})" & _
"(?<Zip>.{10})" & _
"(?<BenefitOption>.{60})" & _
"(?<EmployerGroup>.{15})" & _
"(?<OptionEffDate>.{8})" & _
"(?<HPEffDate>.{8})" & _
"(?<TermDate>.{8})" & _
"(?<Sex>.{1})" & _
"(?<DOB>.{8})" & _
"(?<SSN>.{9})" & _
"(?<Phone>.{12})" & _
"(?<EmployerGroupAnivDate>.{8})" & _
"(?<HeadOfHouse>.{9})" & _
"(?<PrimaryStatus>.{1})" & _
"(?<MaritalStatus>.{1})"

Dim r As Regex = New Regex(_exp)

Dim m As Match = r.Match(_s)

Dim _sInActionCode As String = m.Groups("ActionCode").ToString.Trim
Dim _sInCarrierID As String = m.Groups("CarrierID").ToString.Trim
Dim _sInLastName As String = m.Groups("LastName").ToString.Trim
Dim _sInFirstName As String = m.Groups("FirstName").ToString.Trim
Dim _sInMiddleName As String = m.Groups("MiddleName").ToString.Trim
Dim _sInAddr1 As String = m.Groups("Addr1").ToString.Trim
Dim _sInAddr2 As String = m.Groups("Addr2").ToString.Trim
Dim _sInCity As String = m.Groups("City").ToString.Trim
Dim _sInState As String = m.Groups("State").ToString.Trim
Dim _sInZip As String = m.Groups("Zip").ToString.Trim
Dim _sInBenefitOption As String = m.Groups("BenefitOption").ToString.Trim
Dim _sInEmployerGroup As String = m.Groups("EmployerGroup").ToString.Trim
Dim _sInOptionEffDate As String = m.Groups("OptionEffDate").ToString.Trim
Dim _sInHPEffDate As String = m.Groups("HPEffDate").ToString.Trim
Dim _sInTermDate As String = m.Groups("TermDate").ToString.Trim
Dim _sInSex As String = m.Groups("Sex").ToString.Trim
Dim _sInDOB As String = m.Groups("DOB").ToString.Trim
Dim _sInSSN As String = m.Groups("SSN").ToString.Trim
Dim _sInPhone As String = m.Groups("Phone").ToString.Trim
Dim _sInEmployerGroupAnivDate As String = _
m.Groups("EmployerGroupAnivDate").ToString.Trim
Dim _sInHeadOfHouse As String = m.Groups("HeadOfHouse").ToString.Trim
Dim _sInPrimaryStatus As String = m.Groups("PrimaryStatus").ToString.Trim
Dim _sInMaritalStatus As String = m.Groups("MaritalStatus").ToString.Trim

Console.WriteLine("_sInActionCode = " & _sInActionCode)
Console.WriteLine("_sInCarrierID = " & _sInCarrierID)
Console.WriteLine("_sInLastName = " & _sInLastName)
Console.WriteLine("_sInFirstName = " & _sInFirstName)
Console.WriteLine("_sInMiddleName = " & _sInMiddleName)
Console.WriteLine("_sInAddr1 = " & _sInAddr1)
Console.WriteLine("_sInAddr2 = " & _sInAddr2)
Console.WriteLine("_sInCity = " & _sInCity)
Console.WriteLine("_sInState = " & _sInState)
Console.WriteLine("_sInZip = " & _sInZip)
Console.WriteLine("_sInBenefitOption = " & _sInBenefitOption)
Console.WriteLine("_sInEmployerGroup = " & _sInEmployerGroup)
Console.WriteLine("_sInOptionEffDate = " & _sInOptionEffDate)
Console.WriteLine("_sInHPEffDate = " & _sInHPEffDate)
Console.WriteLine("_sInTermDate = " & _sInTermDate)
Console.WriteLine("_sInSex = " & _sInSex)
Console.WriteLine("_sInDOB = " & _sInDOB)
Console.WriteLine("_sInSSN = " & _sInSSN)
Console.WriteLine("_sInPhone = " & _sInPhone)
Console.WriteLine("_sInEmployerGroupAnivDate = " & _
_sInEmployerGroupAnivDate)
Console.WriteLine("_sInHeadOfHouse = " & _sInHeadOfHouse)
Console.WriteLine("_sInPrimaryStatus = " & _sInPrimaryStatus)
Console.WriteLine("_sInMaritalStatus = " & _sInMaritalStatus)
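
Hand-written lists of m.Groups(...) lines are easy to mismatch, so one alternative is to loop over the group names the expression itself reports. This is a sketch under assumptions: ExtractFields and the three-field pattern are made up for illustration. Regex.GetGroupNames also returns the implicit group "0" (the whole match), which is skipped here.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module GroupLoopDemo
    ' Returns every named group of a successful match, trimmed, keyed by group name.
    ' Returns an empty dictionary when the record does not match.
    Function ExtractFields(r As Regex, record As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)()
        Dim m As Match = r.Match(record)
        If Not m.Success Then Return fields

        For Each name As String In r.GetGroupNames()
            ' Skip group "0", which is the entire match rather than a field.
            If name = "0" Then Continue For
            fields(name) = m.Groups(name).Value.Trim()
        Next
        Return fields
    End Function

    Sub Main()
        ' Hypothetical three-field slice of the record for illustration.
        Dim r As New Regex("^(?<Sex>.{1})(?<DOB>.{8})(?<SSN>.{9})")
        Dim record As String = "F" & "19700101" & "123456789"

        For Each pair As KeyValuePair(Of String, String) In ExtractFields(r, record)
            Console.WriteLine(pair.Key & " = " & pair.Value)
        Next
    End Sub
End Module
```

An empty result also gives a single place to detect records shorter than the expression expects.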
 