Regular Expression Help on syntax

Jason · Jan 8, 2010

I need help with a regular expression. We are trying to grab numeric values,
including the minus sign when a number is negative. Currently we grab all the
numbers correctly, except that for negative numbers we get only the digits and
not the minus sign.

Example: for -10 we get 10, but of course we want -10.

I am new to regular expressions and have played with the syntax, but cannot
get it to work properly. Currently we are using
"\D+(\d+)"

Ron Rosenfeld · Jan 8, 2010

I note that your expression will only capture integers. It also will not
capture an integer at the start of a line, because \D+ requires at least one
non-digit character before the digits.

For integers with an optional sign, and to allow capture of positive integers
at the beginning of a line, try:

"[-+]?\b\d+\b"
--ron

Colbert Zhou [MSFT] · Jan 8, 2010

Hello Dan,

Based on my research, if we want to identify a negative number, we can
use the following regular expression:
^(-)?\D+(\d+)

Best regards,
Ji Zhou
Microsoft Online Community

Colbert Zhou [MSFT] · Jan 8, 2010

For more regular expression patterns, as well as a list of the symbols and
their usage, you can refer to the following MSDN link:
http://msdn.microsoft.com/en-us/library/ms974570.aspx

Hope this helps!

Best regards,
Ji Zhou
Microsoft Online Community Support

Jason · Jan 8, 2010

Neither of these suggestions would grab the negative sign, so maybe there is
more to it. I have posted what I believe to be the more relevant portions of
the code below:

c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb")

Private Function RegexMid(s As String, sDate As String, sTempType As String) _
        As String
    Dim re As Object, mc As Object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.MultiLine = True
    re.Global = True
    re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)"

    If re.test(s) = True Then
        Set mc = re.Execute(s)
        RegexMid = mc(0).submatches(0)
    End If
    Set re = Nothing
End Function

***Here is an example of the data from the source page that we are trying to
pull from

</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Thank you!

Jason · Jan 8, 2010

Joel,

I tried using your suggestion, but it won't pull any numbers, much less the
negatives.

The closest I got was using:
"(\D+(-)?\d+)"

but it pulls
">-8
instead of
-8

or

-8

instead of
-8

joel said:
You need to OR two expressions like I did below

"(\d+|([-+]\d+))"

the pipe character is an or function to either have the minus sign or
not have the minus sign

--
joel

Ron Rosenfeld · Jan 8, 2010

I'm just curious about what happened with my suggestion?

"[-+]?\b\d+\b"
--ron

Jason · Jan 8, 2010

Still will not pull any numbers with either syntax you suggested.

Jason · Jan 8, 2010

"[-+]?\b\d+\b"

This doesn't seem to grab any numbers at all. The script and sample data are
the same as in my earlier post.

Thank you!

Jason · Jan 8, 2010

Ron,

Sorry I posted this reply earlier.

"[-+]?\b\d+\b"

This doesn't seem to grab any numbers at all. The script and sample data are
unchanged from my earlier post.

Thank you!

Ron Rosenfeld · Jan 9, 2010

There is no question that the regex I provided should return all signed and
unsigned integers, as I understand things.

There is also no question that your regex of \D+(\d+) will never capture a
negative integer into the capturing group, since the "-" will always be
captured by the \D.
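
A quick way to see this for yourself is sketched below. It assumes Excel VBA with the same late-bound VBScript.RegExp object used elsewhere in this thread; the helper name FirstSubMatch and the one-line sample string are made up for illustration and are not part of the OP's code.

```vba
Option Explicit

' Hypothetical helper: returns capture group 1 of the first match, or "" if none.
Private Function FirstSubMatch(ByVal s As String, ByVal pat As String) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    If re.Test(s) Then
        Set mc = re.Execute(s)
        FirstSubMatch = mc(0).SubMatches(0)
    End If
End Function

Sub CompareSignCapture()
    ' Illustrative sample only, modelled on one cell of the OP's HTML.
    Const SAMPLE As String = "<td class=""br gb"">-8</td>"

    ' "-" is a non-digit, so the greedy \D+ consumes it before the group starts.
    Debug.Print FirstSubMatch(SAMPLE, "gb\D+(\d+)")               ' prints 8

    ' With the optional sign inside the group, the sign is kept.
    Debug.Print FirstSubMatch(SAMPLE, "gb[\s\S]+?(-?\b\d+)\b")    ' prints -8
End Sub
```

Run CompareSignCapture and look at the Immediate window; only the second pattern keeps the minus sign.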

The problem is in the rest of your regex. It could be that either sDate or
sTempType is not properly formed.

For example, the following three regexes return the desired numbers in
capturing group 1 when run against your test data:

"\b/2010/1/1/DailyHistory[\s\S]+?""bl gb""[\s\S]+?-?\b(\d+)\b"

--> 8

"\b/2010/1/1/DailyHistory[\s\S]+?""gb""[\s\S]+?-?\b(\d+)\b"

--> 0

"\b/2010/1/1/DailyHistory[\s\S]+?""br gb""[\s\S]+?(-?\b\d+)\b"

--> -8

Above is what your regex should look like after you finish concatenating things
together.

So in addition to your problem with the part of your regex that captures the
integer, which should be corrected by using my suggestion, you need to evaluate
the rest of the regex and how it is being constructed.
--ron

Ron Rosenfeld · Jan 9, 2010

joel said:
I tested Ron's code and it didn't return what you really need. Ron's
string returns everything to the end of the number you are looking for.

My "code" was merely a regex which is quite different from your code.

If implemented properly, it returns only the number. If you are returning only
what you say, then you probably are not implementing it properly.

To demonstrate some code segments, we could set up the following, with the data
in an Excel worksheet. And code similar to what the OP posted, with a few
minor changes to correct his errors.

I just put MyStr into A1 for testing, not having access to the rest of the OP's
code. And, by setting c=A2 and leaving i=0, the OP's sub would place the
results in B2, C2 and D2.

A1: (the sample text given by the OP)
</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Then use this routine -- quite similar to that of the OP:

====================================
Option Explicit
Sub TestExtract()
    Dim i As Long
    Dim c As Range
    Dim myStr As String
    Dim sURLdate As String

    Set c = Range("A2")
    myStr = Range("A1").Value
    sURLdate = Format(CDate("1/1/2010"), "/yyyy/m/d/")

    c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
    c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
    c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=""gb""")
End Sub

'------------------------------------------------------------------
Private Function RegexMid(s As String, sDate As String, sTempType As String) _
        As String
    Dim re As Object, mc As Object

    Set re = CreateObject("vbscript.regexp")
    re.ignorecase = True
    ' Lazily skip from the date link to the cell class, then to the first signed integer.
    re.Pattern = sDate & "DailyHistory[\s\S]+?" & _
                 sTempType & "[\s\S]+?(-?\b\d+)\b"

    If re.test(s) = True Then
        Set mc = re.Execute(s)
        RegexMid = mc(0).submatches(0)
    End If
End Function
=================================

As expected, this returns into B2, C2 and D2 the signed integers following the
sTempType variables.

Note also the extra quote marks required in the third call, compared with the OP's.

Also note the terminal regex representing the signed integer.

joel said:
I finally figured it out. You need to put parentheses around the two sub
parts of the search string. The first part is the tag and the second
part is the number. Then use the submatches property to get the 2nd
submatch.

Note that your code for a signed integer:

(\d+|([-+]\d+))

could be more simply expressed as

([-+]?\d+)

Why did you choose to use alternation?
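
For anyone who wants to compare the two forms, here is a minimal sketch, assuming Excel VBA and the late-bound VBScript.RegExp object. The helper name FirstMatch and the sample strings are invented for this illustration.

```vba
Option Explicit

' Hypothetical helper: returns the text of the first match of pat in s, or "" if none.
Private Function FirstMatch(ByVal s As String, ByVal pat As String) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    If re.Test(s) Then
        Set mc = re.Execute(s)
        FirstMatch = mc(0).Value
    End If
End Function

Sub CompareSignedIntegerPatterns()
    Dim v As Variant
    ' Made-up samples: unsigned, negative, explicit plus, and a number inside text.
    For Each v In Array("8", "-8", "+8", "temp: -12 F")
        ' Left column: alternation form. Right column: optional-sign form.
        Debug.Print FirstMatch(CStr(v), "(\d+|([-+]\d+))"), _
                    FirstMatch(CStr(v), "([-+]?\d+)")
    Next v
End Sub
```

On these samples both columns in the Immediate window should show the same text, which is the sense in which the shorter form can stand in for the alternation.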

--ron

Lars-Åke Aspelin · Jan 9, 2010

In order to get a proper match I have to use the \ character before
the / characters in the Format function, like this

sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/")

Maybe this has something to do with the Regional and Language
settings.

I also have a question. What is the function of the two \b in the
regexp? I get a match even without them.
And the negative number is also matched.

Lars-Åke

Lars-Åke Aspelin · Jan 9, 2010

I found the explanation of \b (word boundary) here:
http://msdn.microsoft.com/en-us/library/ms974570.aspx

Lars-Åke

Ron Rosenfeld · Jan 10, 2010

Lars-Åke Aspelin said:
In order to get a proper match I have to use the \ character before
the / characters in the Format function, like this

sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/")

Maybe this has something to do with the Regional and Language
settings.

I don't know. AFAIK, all characters except

[\^$.|?*+()

should get matched literally.

Preceding a character that has no special meaning with a backslash ("\") simply
matches that character literally.

If you type, into the immediate window:

?CDate("1/1/2010")

what is returned?

Lars-Åke Aspelin said:
I also have a question. What is the function of the two \b in the
regexp? I get a match even without them.
And the negative number is also matched.

\b represents a word boundary, or, more specifically, it matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or end
of the string (or line) if the first and/or last characters in the string (or
line) are word characters.
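
As a rough illustration of where those boundaries fall, the sketch below assumes Excel VBA with the late-bound VBScript.RegExp object. The helper name FirstHit and the test strings are hypothetical.

```vba
Option Explicit

' Hypothetical helper: returns the text of the first match of pat in s, or "" if none.
Private Function FirstHit(ByVal s As String, ByVal pat As String) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    If re.Test(s) Then
        Set mc = re.Execute(s)
        FirstHit = mc(0).Value
    End If
End Function

Sub ShowWordBoundary()
    ' "-" is a non-word character, so a boundary exists between "-" and "8".
    Debug.Print FirstHit("-8", "\b\d+\b")       ' prints 8
    Debug.Print FirstHit("-8", "-?\b\d+\b")     ' prints -8

    ' Letters and digits are both word characters: no boundary around the 6.
    Debug.Print "[" & FirstHit("x6y", "\b\d+\b") & "]"   ' prints []

    ' The end of the string counts as a boundary after a word character.
    Debug.Print FirstHit("total 42", "\b\d+\b") ' prints 42
End Sub
```

This is why the \b tokens make no difference on the OP's sample data (every number there is already surrounded by non-word characters) but would matter for digits embedded in identifiers.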
--ron

Lars-Åke Aspelin · Jan 10, 2010

If I write ?CDate("1/1/2010") in the immediate window the following is
returned
2010-01-01

If I write ?Format(CDate("1/1/2010"), "/yyyy/m/d/") the following is
returned
-2010-1-1-

Only if I write ?Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/") is the wanted
result
/2010/1/1/
achieved.

So, with my settings the / character seems to generate a - .
To have a / generated the / has to be escaped with \.

I found the explanation in the Excel help for Format:

"(/) Date separator. In some locales, other characters may be used to
represent the date separator. The date separator separates the day,
month, and year when date values are formatted. The actual character
used as the date separator in formatted output is determined by your
system settings."

And in my settings - (hyphen) is used as the date separator, in
accordance with ISO 8601 extended format
http://en.wikipedia.org/wiki/ISO_8601

Lars-Åke

Walter Briscoe · Jan 10, 2010

In microsoft.public.excel.programming (Sun, 10 Jan 2010 04:55:10), joel said:
Ron: have you ever looked at the 1970's unix manual volume 2b under the
YACC topic. See this webpage. Look for the link for the PDF files and
use the link : v7vol2b.pdf (819KB)

Webpage
'7th Edition Manual PDF'
(http://plan9.bell-labs.com/7thEdMan/bswv7.html)

pdf file
http://plan9.bell-labs.com/7thEdMan/v7vol2b.pdf

It is the only good description of pattern matching that I have ever
seen.

The question mark indicates any single character.

Joel, How do you conclude that? (I expect "." to match any single
character within a line and ".|\s" to match any single character).

I found this table on page 49/250 (Joel is probably looking elsewhere):

Regular expressions in Lex use the following operators:

....
x? an optional x.
x* 0,1,2, ... instances of x.
x+ 1,2,3, ... instances of x.
x?y an x or a y.
....

I am confused by that x?y. I think it means an optional x followed by a
literal y. I think there may be a glyph confusion. I suspect it is
intended to be x|y where the character - for which I know no name -
between x and y means "or" as & - ampersand - means "and".

The symbols used by Set RE = CreateObject("VBScript.RegExp")
RE.Pattern = ... are a superset of the BRE described in
<http://opengroup.org/onlinepubs/007908775/xbd/re.html>. My own view is that
things like \d are an unnecessary shorthand for [0-9]. I concede it is
reasonable if a range is not allowed in a character set, i.e. if [0123456789]
is needed. In VBA, I refer to
<http://msdn.microsoft.com/en-us/library/ms974570.aspx>, which specifies
everything I want to know other than the meanings of the exceptions given.
<http://msdn.microsoft.com/en-us/library/xe43cc8d(VS.85).aspx>
specifies error 5019, which I hit yesterday.

For my part, I would start with "-?\d+" to grab a whole number. i.e a
whole number is a minus which is optional followed by a digit one or
more times. I found the OP's situation too complicated to want to follow
and offer a suggestion.

Ron Rosenfeld · Jan 10, 2010

Lars-Åke Aspelin said:
So, with my settings the / character seems to generate a - .
To have a / generated the / has to be escaped with \.

Well, that is an interesting difference, and clearly related to
Excel/VBA/Regional Settings and not to the regex engine per se, as I originally
thought.

When I place ?Cdate("1/1/2010") in the immediate window, it returns
1/1/2010.

So it would seem that for code that will run properly, regardless of the
Windows Regional settings, it might be best to use "\/yyyy\/m\/d\/" in the
Format command to properly format the date string to match with the URL.
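
A minimal sketch of that idea follows, assuming Excel VBA. The helper name UrlDatePart is hypothetical, and using DateSerial instead of CDate is a choice made for this example rather than anything from the code posted earlier.

```vba
Option Explicit

' Hypothetical helper: builds the "/yyyy/m/d/" fragment used in the history URL.
Private Function UrlDatePart(ByVal d As Date) As String
    ' "\/" forces a literal slash; an unescaped "/" is replaced by the
    ' date separator from the Windows regional settings.
    UrlDatePart = Format$(d, "\/yyyy\/m\/d\/")
End Function

Sub ShowUrlDatePart()
    ' DateSerial avoids parsing a date string, whose day/month order
    ' can also depend on the regional settings.
    Debug.Print UrlDatePart(DateSerial(2010, 1, 1))   ' prints /2010/1/1/
End Sub
```

The result can be passed as the sDate argument in place of the value built with the unescaped format string.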

Here in the US, I don't generally have to deal with international requirements.
Thanks for pointing that out.
--ron

Ron Rosenfeld · Jan 10, 2010

joel said:
The question mark indicates any single character.

I have not looked at that reference. However, if that is what it says, it is
wrong.

The question mark "?": makes the preceding item optional. Greedy, so the
optional item is included in the match if possible.

=================================
Looking at the reference you provided, I see that you have misread it. It does
*NOT* read, as you claim, that "the question mark indicates any single
character". It states almost the SAME definition as *I* posted above:

"Optional Expressions . The operator ? indicates an optional element of an
expression. Thus ab?c matches either ac or abc."
===================================

There are some instances where "?" represents any single character, but it does
NOT do this with regard to regular expressions. Perhaps that is where you got
confused.

joel said:
There are a number of problems with the above search string

1) Why do you care that a blank occurs immediately after the last
digit? What happens if there is a return, tab or end of file?

I don't know what you mean by a "blank". Perhaps you are confusing the use of
the "\b" token? Check that definition (which I previously posted). It also
does NOT seem to be part of the LEX regular expression flavor which you are
quoting. Rather, in LEX, it would mean a backspace; but that is not the case
with more commonly used flavors.

joel said:
2) The match is looking for a positive or negative sign followed by any
character followed by a blank. Most people don't have a blank.

No it is not. See above. You are misinterpreting "\b".

\d+|([-+]\d+)

It is where I found the saying

If you can't bring Mohammed to the mountain, bring the mountain to
Mohammed.

If Mohammed goes to the wrong mountain, he may get the wrong information.

You might try some of these "mountains", if the links are still valid:

Regular Expressions
http://www.regular-expressions.info/reference.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;818802&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
http://msdn2.microsoft.com/en-us/library/ms974619.aspx
http://www.regex-guru.info/
--ron

Ron Rosenfeld · Jan 10, 2010

Walter Briscoe said:
For my part, I would start with "-?\d+" to grab a whole number. i.e a
whole number is a minus which is optional followed by a digit one or
more times. I found the OP's situation too complicated to want to follow
and offer a suggestion.

I agree with a lot of what you wrote.

I would point out that the above construct will also extract whole numbers that
are embedded within other strings. That may or may not be desirable.

In other words, it can extract both integers from the string below:

abc123abc 123
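
To make the difference concrete, here is a small sketch, assuming Excel VBA with the late-bound VBScript.RegExp object. The helper name AllMatches is invented for this example.

```vba
Option Explicit

' Hypothetical helper: joins every match of pat in s with " | ".
Private Function AllMatches(ByVal s As String, ByVal pat As String) As String
    Dim re As Object, m As Object, result As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True            ' collect all matches, not just the first
    re.Pattern = pat
    For Each m In re.Execute(s)
        If Len(result) > 0 Then result = result & " | "
        result = result & m.Value
    Next m
    AllMatches = result
End Function

Sub ShowEmbeddedNumbers()
    Const SAMPLE As String = "abc123abc 123"

    ' Without boundaries, the digits inside "abc123abc" are extracted too.
    Debug.Print AllMatches(SAMPLE, "-?\d+")         ' prints 123 | 123

    ' With \b on both sides, only the standalone number matches.
    Debug.Print AllMatches(SAMPLE, "-?\b\d+\b")     ' prints 123
End Sub
```

Which behaviour is wanted depends on the data; for the OP's table cells either form should find the same numbers.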
--ron
