VB pgmr needs help with Regex for C#

  • Thread starter Mortimer Schnurd

Mortimer Schnurd

Hi All,
I am a VB 6 programmer who is now trying to learn C#. In doing so, I
am trying to convert some of my VB modules to C#. I routinely use regular
expressions in VB and am having some trouble using Regex in
C#. Basically, I have a fixed-format text file which I need to
validate before using it in a program. The validation ensures the data
format matches what the program expects to find in the file. The
pattern I am trying to match across multiple lines is "^[0-9]{4}.{74}01$",
or IOW, 4 digits at the start of a line, followed by 74 characters,
and ending with a literal "01" at the end of the line. This pattern
works fine in my VB code and correctly identifies all of the lines
with this pattern. My C# code, on the other hand, finds 0 matches for
the same file.
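To make the pattern concrete: each record is 4 + 74 + 2 = 80 characters long. The sketch below is illustrative only; the class `RecordPattern` and its helpers `MakeRecord` and `CountRecords` are hypothetical names, and the test data is made up. It uses "\n"-only line endings, where the pattern behaves as described.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread) illustrating the record pattern:
// 4 digits, any 74 characters, then a literal "01", i.e. 80 characters per record.
public static class RecordPattern
{
    // Same pattern and option as in the question; Multiline lets ^ and $
    // match at the start and end of each line instead of the whole string.
    public static readonly Regex Record =
        new Regex("^[0-9]{4}.{74}01$", RegexOptions.Multiline);

    // Builds a made-up record line: 4-digit id, 74 filler characters, "01".
    public static string MakeRecord(string id, char filler)
    {
        return id + new string(filler, 74) + "01";
    }

    // Number of record matches found in the text.
    public static int CountRecords(string text)
    {
        return Record.Matches(text).Count;
    }
}
```

For example, `RecordPattern.CountRecords("HEADER\n" + RecordPattern.MakeRecord("1234", 'x') + "\n")` skips the header line and counts the one record.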

I'm quite sure I am missing something simple, but I just can't
see what it is! Can some kind soul please point out where I am going
wrong? I am including the code snippets for my VB app and for the C#
app. BTW, the first several lines of the file do NOT contain matching
data, but the file does contain 345 matching lines, all of which VB
finds.

VB
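' Requires project references to "Microsoft VBScript Regular Expressions 5.5"
' (RegExp, MatchCollection) and "Microsoft Scripting Runtime" (FileSystemObject).
Private Sub Command1_Click()
    Const FILENAME As String = "D:\Johnw\Data\RPLs\V2 RPLs\alldata.rpl"
    Dim regx1 As New RegExp
    Dim mc As MatchCollection
    Dim sText As String
    Dim fs As New FileSystemObject
    Dim ts As TextStream

    ' Read the whole file into one string.
    Set ts = fs.OpenTextFile(FILENAME)
    sText = ts.ReadAll
    ts.Close

    With regx1
        .Global = True      ' find all matches, not just the first
        .MultiLine = True   ' ^ and $ match at line boundaries
        .Pattern = "^[0-9]{4}.{74}01$"
        Set mc = .Execute(sText)
        Debug.Print mc.Count
    End With
End Sub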

C# App
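// Requires: using System; using System.IO; using System.Text.RegularExpressions;
private string ValidateFile(string filename)
{
    // Read the whole file; the using block closes the reader afterwards.
    string alltext;
    using (StreamReader reader = new StreamReader(filename))
    {
        alltext = reader.ReadToEnd();
    }

    // Same pattern and Multiline option as the VB 6 version.
    Regex re = new Regex("^[0-9]{4}.{74}01$", RegexOptions.Multiline);

    MatchCollection mc = re.Matches(alltext);
    Console.WriteLine("Found " + mc.Count + " matches");

    // An empty string signals that the file failed validation.
    return mc.Count == 0 ? "" : filename;
}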
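When a pattern that works elsewhere finds nothing, it can help to check which line terminator the file actually uses; files written on Windows typically end each line with "\r\n". The helper below is a hypothetical diagnostic (the name `LineEndingCheck.Describe` is made up for this sketch) that classifies the terminators in a string. It does not count lone "\r" terminators.

```csharp
// Hypothetical diagnostic (not from the thread): reports which line endings a text uses.
public static class LineEndingCheck
{
    // Returns "CRLF", "LF", "mixed" or "none". Only '\n' is treated as a line break;
    // a '\n' preceded by '\r' counts as CRLF, otherwise as a bare LF.
    public static string Describe(string text)
    {
        int crlf = 0, lf = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] != '\n') continue;
            if (i > 0 && text[i - 1] == '\r') crlf++;
            else lf++;
        }
        if (crlf == 0 && lf == 0) return "none";
        if (lf == 0) return "CRLF";
        if (crlf == 0) return "LF";
        return "mixed";
    }
}
```

Calling it on `alltext` inside `ValidateFile` before running the regex would show whether "\r" characters are present.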

Mortimer Schnurd

Well, after reducing my pattern to its most elemental state and then
iteratively adding back to it, I found the answer to my problem: it
seems that, when using MultiLine, VB 6's RegExp doesn't give a
rat's-a$$ about the carriage-return/line-feed pair; its "$" matches
right before the "\r", so the pair acts as a single line break.
C# does care: in .NET's Multiline mode, "$" matches only before the "\n"
(or at the very end of the text), so the "\r" has to be accounted for
within the pattern. Changing my pattern to "^[0-9]{4}.{74}01\r$" now
finds all occurrences of the pattern in my file.
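One caveat with "\r$": it only matches record lines that end with "\r\n", so a last record with no trailing line break would be missed. Making the "\r" optional ("\r?$") covers CRLF, LF, and an unterminated last line. Another option is to validate line by line, since `StringReader.ReadLine` strips the terminator. The sketch below compares these approaches; the class and method names are hypothetical, and the test strings are made up.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Sketch (not from the thread) comparing the original pattern with CRLF-tolerant options.
public static class CrlfAwarePattern
{
    // Original pattern. In .NET Multiline mode, $ matches only before '\n' or at the
    // end of the string, so "01\r\n" fails: the '\r' sits between "01" and '\n'.
    public static readonly Regex Original =
        new Regex("^[0-9]{4}.{74}01$", RegexOptions.Multiline);

    // Optional '\r' before $: matches CRLF lines, LF lines, and a last line
    // that has no terminator at all.
    public static readonly Regex Tolerant =
        new Regex(@"^[0-9]{4}.{74}01\r?$", RegexOptions.Multiline);

    public static int Count(Regex re, string text)
    {
        return re.Matches(text).Count;
    }

    // Alternative: validate line by line. StringReader.ReadLine strips the
    // "\r\n", "\n" or "\r" terminator, so the original pattern works unchanged.
    public static int CountLineByLine(string text)
    {
        Regex record = new Regex("^[0-9]{4}.{74}01$");
        int count = 0;
        using (StringReader reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (record.IsMatch(line)) count++;
            }
        }
        return count;
    }
}
```

The line-by-line version also makes it easy to report which line failed validation, at the cost of a few more lines than a single regex call.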

Robert Linder

Take a look at the O'Reilly book "Mastering Regular Expressions", ISBN
0-596-00289-0. It covers .NET Regex.
