[Regular Expression] extraction when bounds are vbCr and vbLf

teo

Hello,

I need to extract a subtext from a text.
The subtext must contain a given word.

The subtext can be bounded by any of the following:

vbCr (carriage return)
vbLf (line feed)
vbCrLf (carriage return + line feed)
the very beginning of the text
the very end of the text


I tried with:

^
\n
\r
$

which gave me:

Dim myText As String
Dim myPattern As String = "^\n\r" & myWord & "\n\r$"

Dim match As Match = Regex.Match(myText, myPattern, RegexOptions.Multiline Or RegexOptions.IgnoreCase)

but I had problems.
 
Chris

Try this, where Text is the subtext:

Dim FoundMatch As Boolean
Try
    'Anchor the subtext between the start (^) and end ($) of a line
    FoundMatch = Regex.IsMatch(SubjectString, "^" & Text & "$", RegexOptions.Multiline)
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
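
One caveat with the anchors: in .NET, with RegexOptions.Multiline, $ matches only before a vbLf (or at the very end of the text), not before a vbCr. On text that uses vbCrLf, an exact line match therefore needs an optional \r in front of the $. Here is a minimal sketch of the difference; the sample text and the names subject and word are made up for illustration, and Regex.Escape is added on the assumption that the word should be taken literally:

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module AnchorSketch
    Sub Main()
        ' Hypothetical sample data: three lines separated by vbCrLf.
        Dim subject As String = "alpha" & vbCrLf & "beta" & vbCrLf & "gamma"
        Dim word As String = "beta"

        ' Here "beta" is followed by vbCr, and $ does not match before a vbCr.
        Dim strict As Boolean = Regex.IsMatch(subject, _
            "^" & Regex.Escape(word) & "$", RegexOptions.Multiline)

        ' \r? lets the pattern step over the vbCr that precedes the vbLf.
        Dim tolerant As Boolean = Regex.IsMatch(subject, _
            "^" & Regex.Escape(word) & "\r?$", RegexOptions.Multiline)

        Console.WriteLine(strict)
        Console.WriteLine(tolerant)
    End Sub
End Module
```

With this sample the first check should come back False and the second True. Note that neither form treats a lone vbCr as a line boundary, since ^ only matches after a vbLf.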

HTH

Chris
 
Chris

Hi Teo,

Just to clarify, are you trying to find all the lines in a given file that
contain a particular word?

What does your data look like? Are these strictly text files? Can you give
me an example that I can test on? Wherever there is a VbLf, VbCr, or
VbCrLf, you can just mark it in the text, like this:

This is some text VbCrLf
that I want to test VbCrLf
against.

Regulazy or the Regulator by Roy Osherove might help as well.
http://tools.osherove.com/

Chris
 
Chris

Hi Teo,

Thanks for putting that up there. It helped nicely.


Try the following code:

Imports System.Text.RegularExpressions
Imports System.Windows.Forms
Imports System.IO

Public Module Module1

    Public Sub Main()
        Dim fileName As String = InputBox("Give me the file to parse", _
                                          "File name input box")
        CheckContents(fileName)
    End Sub

    ''' <summary>
    ''' Check the contents of a file
    ''' </summary>
    ''' <param name="Filename"></param>
    ''' <remarks>Could be expanded to check against multiple
    ''' keywords by adding another argument that contains the
    ''' keyword and inserting it in place of the DIO characters</remarks>
    Public Sub CheckContents(ByVal Filename As String)

        'Declare RegExp: any line containing DIO, up to and including its line break
        Dim dioRegex As New Regex(".*DIO.*(\n|\r|\r\n)", _
                                  RegexOptions.IgnoreCase)

        'Make sure the file is really there
        Dim fileExists As Boolean
        fileExists = My.Computer.FileSystem.FileExists(Filename)

        'Throw exception if the file is not there
        If Not fileExists Then Throw New FileNotFoundException

        'Get the contents of the file
        Dim fileContents As String
        fileContents = My.Computer.FileSystem.ReadAllText(Filename)

        'Check file contents against the regex
        Dim dioMatches As MatchCollection = dioRegex.Matches(fileContents)

        'Loop through all of the matches and do something cool with them
        For Each dioMatch As Match In dioMatches

            'Your cool code goes here :)

            'I'm just going to print the results to a message box
            MsgBox(dioMatch.Value)

        Next

    End Sub

End Module


Please keep in mind that some of the RTF formatting characters are left in. I
didn't know whether you wanted to keep them, but you should be able to strip
out the /p and other character combinations easily using
Str.Replace(oldValue, newValue), where Str is your data.

Best regards,

Chris
 
teo

I ran a few tests and hit one problem:

the last sentence is never matched.

(That is, if the word is in the last sentence,
I cannot extract that sentence,
while if it is in the first sentence, everything works.)
 
Chris

Hi Teo,

I missed the case where the line does not end with a line feed, a carriage
return, or the combination of both (as happens on the last line of the text).

Try replacing the dioRegex, in the CheckContents sub, with the following:

Dim dioRegex As New Regex(".*DIO.*((\n|\r|\r\n)|.*)", _
                          RegexOptions.IgnoreCase)
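
An alternative that may be worth trying, because in .NET the dot matches every character except vbLf and so .* can run straight across a lone vbCr: use a character class that excludes both line-break characters. Each match is then bounded by a vbCr, a vbLf, a vbCrLf, the start of the text, or the end of the text, with no special case for the last line. This is only a sketch; the helper name LinesContaining and the sample text are hypothetical, and Regex.Escape is added on the assumption that the keyword should be matched literally:

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineExtractSketch

    ' Hypothetical helper: returns every line of text that contains word.
    Function LinesContaining(ByVal text As String, ByVal word As String) As MatchCollection
        ' [^\r\n]* never crosses a vbCr or vbLf, so a match cannot
        ' spill over into the neighbouring lines.
        Dim pattern As String = "[^\r\n]*" & Regex.Escape(word) & "[^\r\n]*"
        Return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Made-up sample mixing all three kinds of line break.
        Dim sample As String = "first dio line" & vbCr & _
                               "nothing here" & vbLf & _
                               "Dio again" & vbCrLf & _
                               "last line with DIO"

        For Each m As Match In LinesContaining(sample, "dio")
            Console.WriteLine(m.Value)
        Next
    End Sub

End Module
```

With this sample it should return the first, third, and last lines, without the trailing line-break characters. Like the original pattern, it also matches the keyword inside longer words; wrapping the keyword in \b on both sides would restrict it to whole words.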

Hope that helps,

Chris
 
