regular expressions

JFB · Oct 17, 2009

Hi All,
What is the pattern for a regular expression if I want to get the first
paragraph in a string between "" tags?
String = "match sample testmatch2 sample2 test2"
Get Result = "match sample test"
Thanks

JFB

JFB · Oct 17, 2009

BTW
With this pattern I got all the text between the first and the last 
<(?<tag>\w*)>(?<text>.*)<(?<tag>\w*)>
How can I get until the next 
Thanks

Martin Honnen · Oct 17, 2009

JFB said:
BTW
With this pattern I got all the text between the first and the last 
<(?<tag>\w*)>(?<text>.*)<(?<tag>\w*)>
How can I get until the next

Try non-greedy matching with
.*?
instead of
.*
Or use
[^<]*
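
The difference between the greedy and non-greedy forms can be sketched as follows. This is a minimal illustration in Python rather than VB.NET, and the `<p>` markup in the sample string is an assumption, since the tags in the original post were not preserved.

```python
import re

# Hypothetical input: the <p> tags are an assumption for illustration only.
text = "<p>match sample test</p><p>match2 sample2 test2</p>"

# Greedy: .* runs to the end of the string and backtracks to the LAST </p>.
greedy = re.search(r"<p>(.*)</p>", text).group(1)

# Non-greedy: .*? stops at the FIRST </p> that allows a match.
lazy = re.search(r"<p>(.*?)</p>", text).group(1)

# Negated class: [^<]* cannot cross any '<', so it also stops at the first tag.
negated = re.search(r"<p>([^<]*)</p>", text).group(1)

print(greedy)
print(lazy)
print(negated)
```

Note that `[^<]*` only works when the wanted text contains no nested tags. If it does, the non-greedy form is the safer choice.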

JFB · Oct 17, 2009

Thanks for your reply and help.
Using <(?<tag>\w*)>(?<text>[^<]*)<(?<tag>\w*)>
it jumps and gets the last one:

"match2 sample2 test2"
Any other idea?

JFB · Oct 17, 2009

Never mind, it looks like the string is different. Now I'm confused.
String= " match sample test match2 sample2 test2 "
How can I get the result?
result= "match sample test"
Thanks!

JFB · Oct 17, 2009

This is getting better, the string now is
String = "" match sample test match2 sample2 test2 "
Please help, now I can't match at all.
Thanks

Martin Honnen · Oct 17, 2009

JFB said:
This is getting better , the string now is
String = "" match sample test match2 sample2 test2 "
Please help, now i can't match at all.

That looks like an XML fragment now so you could parse it as XML e.g.

Dim xml As String = " match sample test match2 sample2 test2 "
Dim settings As New XmlReaderSettings()
settings.ConformanceLevel = ConformanceLevel.Fragment
Dim doc As New XPathDocument(XmlReader.Create(New StringReader(xml), settings))
Dim text As XPathNavigator = doc.CreateNavigator().SelectSingleNode("br/following-sibling::text()")
If text IsNot Nothing Then
    Console.WriteLine(text.Value)
End If

would output "match sample test".

Your earlier samples however were not XML fragments or documents so the
above approach would not work with them.
But if you know the input is an XML document or fragment then I wouldn't
bother to try to parse it with regular expressions but instead exploit
the power of XPath.
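
The same idea, taking the text that follows the first br element, can be sketched outside .NET as well. The Python example below is only an illustration: the fragment is hypothetical (the original markup was not preserved), and it uses ElementTree's `.tail` attribute instead of the `following-sibling::text()` XPath axis, which ElementTree does not support.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the placement of <br/> here is an assumption.
fragment = "<br/>match sample test<br/>match2 sample2 test2"

# A fragment has no single root element, so wrap it in one before parsing.
root = ET.fromstring("<root>" + fragment + "</root>")

# ElementTree stores the text that directly follows an element in .tail.
first_br = root.find("br")
result = first_br.tail.strip() if first_br is not None and first_br.tail else None

print(result)
```

Wrapping the fragment in a dummy root plays the role that ConformanceLevel.Fragment plays in the XmlReader code above.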

eBob.com · Oct 17, 2009

If you don't have it, get Expresso from UltraPico. It's a FREE tool which
makes it very easy to experiment with regular expressions.

Bob

JFB · Oct 18, 2009

Thanks again for your reply and help.
When I run this code, I'm getting this error:
' ', hexadecimal value 0x0B

It looks like the data from this doc file is not correct, but when I open the Word
file in Notepad it looks OK, with HTML format.
Maybe XML has a problem reading my text?
The character shows as a square.
Do you have an idea how to solve this?
Regards

J

hnny

Martin Honnen · Oct 18, 2009

JFB said:
When I run this code. I'm getting this error:
' ', hexadecimal value 0x0B

Looks like the data from this doc file is not correct, but I open the word
file in notepad and looks ok with html format.
Maybe xml have problem reading my text?
The shows as square.
Do you have an idea how to solve this?

Which code exactly do you run that gives that error, and for which statement
exactly? How does the input exactly look? Does it contain characters
that are not allowed in XML, such as control characters?

So far you have shown only variables with strings of markup.
If you have a file instead, then you will need to show how you read the
file contents into a string. With XML, however, you would normally let
the XML parser do all that work: if you have a file file1.xml, you would
simply change the code I posted to

Dim settings As New XmlReaderSettings()
settings.ConformanceLevel = ConformanceLevel.Fragment
Dim doc As New XPathDocument(XmlReader.Create("file1.xml", settings))
Dim text As XPathNavigator = doc.CreateNavigator().SelectSingleNode("br/following-sibling::text()")
If text IsNot Nothing Then
    Console.WriteLine(text.Value)
End If

If you still have problems then you need to provide more details as to
where the file comes from, how it is encoded.

JFB · Oct 19, 2009

Which code exactly do you run that gives that error for which statement
exactly?

Error:{"'', hexadecimal value 0x0B, is an invalid character. Line 1,
position 1."}
Line of code where the error shows:
Dim doc As New XPath.XPathDocument(XmlReader.Create(New StringReader(tempcontent), settings))

How does the input exactly look?

I have a Word doc file that I need to read to get the name from the address
block.
The paragraph looks like this when I edit the file with Notepad:

SHLOMI HELWA
563 ELTINGVILLE BLVD.
STATEN ISLAND, NY 10312

Does it contain characters that are not allowed in XML, such as control
characters?

So far you have shown only variables with strings of markup.
If you have a file instead then you will need to show how you read the
file contents into a string respectively in terms of XML you would
normally let the XML parser do all that work meaning if you have a file
file1.xml then you would simply change the code I posted to

Please send me an email at jfb00(at)hotmail.com and I can send you the Word
file.
I have many Word files from which I need to collect only the name in the
address block, so I am reading each document and getting the paragraph that
contains the address block.
Here is my code:
Try
    'for office xp
    wordApp = CreateObject("Word.Application")
    wordDoc = CreateObject("Word.document")
Catch
    'for office 2000 and 97
    wordApp = New Word.Application
    wordDoc = New Word.Document
End Try

wordApp.Visible = False
wordDoc = wordApp.Documents.Open(FileName:=docName.ToString)

Dim tempcontent As String = ""
Dim subPara As Word.Paragraph
Dim paraCount As Integer
paraCount = 0

For Each subPara In wordDoc.Paragraphs
    tempcontent = subPara.Range.Text
    paraCount = paraCount + 1
    If paraCount = 5 Then ''Here I get the address block
        Exit For
    End If
Next

Dim settings As New XmlReaderSettings()
settings.ConformanceLevel = ConformanceLevel.Fragment
settings.CheckCharacters = True

Dim doc As New XPath.XPathDocument(XmlReader.Create(New StringReader(tempcontent), settings))
Dim text As XPath.XPathNavigator = doc.CreateNavigator().SelectSingleNode("br/following-sibling::text()")

If text IsNot Nothing Then
    MsgBox(text.Value)
End If

thanks for your help!

JFB · Oct 19, 2009

Thanks for your reply, Bob.
I already have that, but it doesn't help in my case because I have some
special characters in my file.
Rgds

Martin Honnen · Oct 19, 2009

JFB said:
Error:{"'', hexadecimal value 0x0B, is an invalid character. Line 1,
position 1."}
Line Code when the error show:
Dim doc As New XPath.XPathDocument(XmlReader.Create(New
StringReader(tempcontent), settings))

I have a word doc file that I need to read and get the name of address
block.

I am afraid a Word document can contain characters that are not allowed
in XML documents so using an XML parser on the contents will not work
unless you strip any not allowed characters first.
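
Stripping the disallowed characters before parsing could look roughly like this. It is a Python sketch, not the .NET code from this thread: the helper name `strip_invalid_xml_chars` and the sample text are made up for illustration, and the character ranges follow the Char production of the XML 1.0 specification.

```python
import re

# XML 1.0 allows only: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
# Everything else (including 0x0B, the vertical tab from the error) is matched here.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Return text with every character that XML 1.0 forbids replaced."""
    return _INVALID_XML_CHARS.sub(replacement, text)

# Hypothetical paragraph text: assumes the address lines are separated by 0x0B.
raw = "\x0bSHLOMI HELWA\x0b563 ELTINGVILLE BLVD.\x0bSTATEN ISLAND, NY 10312\r"

# Replace with a space so the address lines do not run together.
cleaned = strip_invalid_xml_chars(raw, " ").strip()
print(cleaned)
```

Cleaning the string only removes the parser error. If the paragraph text contains no markup at all, an XPath query such as br/following-sibling::text() still has nothing to select.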

JFB · Oct 19, 2009

I used arrays and it works:
Dim ArrayCadenas() As String
ArrayCadenas = Split(" SHLOMI HELWA 563 ELTINGVILLE BLVD. STATEN ISLAND, NY 10312 "," ")
MsgBox(ArrayCadenas(0).ToString)

Thanks for your reply and help!
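
The splitting approach can be sketched as below, in Python rather than VB.NET. It assumes the address lines in the Word paragraph are separated by the 0x0B character from the earlier error message (Word uses it for manual line breaks). The sample text and variable names are hypothetical.

```python
VERTICAL_TAB = "\x0b"  # the 0x0B character reported by the XML parser

# Hypothetical paragraph text, assuming 0x0B separates the address lines.
paragraph = "SHLOMI HELWA\x0b563 ELTINGVILLE BLVD.\x0bSTATEN ISLAND, NY 10312\r"

# Split on the separator and drop empty or whitespace-only pieces.
lines = [part.strip() for part in paragraph.split(VERTICAL_TAB) if part.strip()]

# The name is the first non-empty line of the address block.
name = lines[0] if lines else None
print(name)
```

Dropping the empty pieces matters when the text starts with a separator, because the first array element would otherwise be an empty string.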
