Help with Regexp, please

R

Raj

Hi,

The regular expression (\d{15,16}) matches a substring in a cell. I
want to extract the remaining part of the cell ie. from the character
after the matched substring till the end of the string in the cell
using a regular expression.

Is it possible to do this?

Thanks in advance for the help.

Regards,
Raj
 
R

Ron Rosenfeld

Hi,

The regular expression (\d{15,16}) matches a substring in a cell. I
want to extract the remaining part of the cell ie. from the character
after the matched substring till the end of the string in the cell
using a regular expression.

Is it possible to do this?

Thanks in advance for the help.

Regards,
Raj

(\d{15,16})([\s\S]*)

will capture everything in the cell and after your "match" into Group 2




--ron
 
L

Lars-Åke Aspelin

Hi,

The regular expression (\d{15,16}) matches a substring in a cell. I
want to extract the remaining part of the cell ie. from the character
after the matched substring till the end of the string in the cell
using a regular expression.

Is it possible to do this?

Thanks in advance for the help.

Regards,
Raj

(\d{15,16})([\s\S]*)

will capture everything in the cell and after your "match" into Group 2




--ron

Can't [\s\S] be replaced by . like this
(\d{15,16})(.*)

Lars-Åke
 
R

Raj

To clarify, the regex should return " Raffles Traders" from the string
below:

NRK2D 986123456789312 Raffles Traders

The regex (\d{15,16})([\s\S]*) is returning "986123456789312 Raffles
Traders"

Regards,
Raj



Hi,
The regular expression (\d{15,16}) matches a substring in a cell. I
want to extract the remaining part of the cell ie. from the character
after the matched substring till the end of the string in the cell
using a regular expression.
Is it possible to do this?
Thanks in advance for the help.
Regards,
Raj
(\d{15,16})([\s\S]*)

will capture everything in the cell and after your "match" into Group 2

Can't [\s\S] be replaced by . like this
(\d{15,16})(.*)

Lars-Åke
 
R

Ron Rosenfeld

Can't [\s\S] be replaced by . like this
(\d{15,16})(.*)

Lars-Åke

Your suggestion will work IF and ONLY IF there are no line feeds or carriage
returns in the cell. In some flavors, there is an option to have Dot match
newline, but such does not exist in VBA (or Javascript).

If the OP, rather than wanting to extract everything to " ... the end of the
string in the cell" only wanted to extract everything to the end of the line,
and ignore anything in the cell after a newline character, then (.*) would be
appropriate.
--ron
 
R

Raj

I have made another post before Ron's last post. I have explained the
requirement with an example.

Regards,
Raj



Can't [\s\S] be replaced by . like this
(\d{15,16})(.*)
Lars-Åke

Your suggestion will work IF and ONLY IF there are no line feeds or carriage
returns in the cell.  In some flavors, there is an option to have Dot match
newline, but such does not exist in VBA (or Javascript).

If the OP, rather than wanting to extract everything to " ... the end of the
string in the cell"  only wanted to extract everything to the end of the line,
and ignore anything in the cell after a newline character, then (.*) would be
appropriate.
--ron
 
R

Rick Rothstein

Just in case you are interested, here is some non-RegEx code that finds the
same part of your text (the result is returned in the variable named
TailEnd)...

Text = "NRK2D 986123456789312 Raffles Traders"
For X = 1 To Len(Text)
If Mid(Text, X, 15) Like String(15, "#") Then
TailEnd = Mid(Text, X + 15)
Exit For
End If
Next

--
Rick (MVP - Excel)



Raj said:
To clarify, the regex should return " Raffles Traders" from the string
below:

NRK2D 986123456789312 Raffles Traders

The regex (\d{15,16})([\s\S]*) is returning "986123456789312 Raffles
Traders"

Regards,
Raj



The regular expression (\d{15,16}) matches a substring in a cell. I
want to extract the remaining part of the cell ie. from the character
after the matched substring till the end of the string in the cell
using a regular expression.
Is it possible to do this?
Thanks in advance for the help.
Regards,
Raj
(\d{15,16})([\s\S]*)

will capture everything in the cell and after your "match" into Group 2

Can't [\s\S] be replaced by . like this
(\d{15,16})(.*)

Lars-Åke
 
R

Ron Rosenfeld

To clarify, the regex should return " Raffles Traders" from the string
below:

NRK2D 986123456789312 Raffles Traders

The regex (\d{15,16})([\s\S]*) is returning "986123456789312 Raffles
Traders"

Regards,
Raj

Obviously, you are not doing what I suggested which was to return the *SECOND*
matching group. You are returning the ENTIRE match. Here's an example as to
returning the *SECOND* match using VBA:

===============================
Option Explicit
Function Part2(s As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(\d{15,16})([\s\S]*)"
If re.test(s) = True Then
Set mc = re.Execute(s)
Part2 = mc(0).submatches(1)
End If
End Function
==================================
--ron
 
R

Ron Rosenfeld

Just in case you are interested, here is some non-RegEx code that finds the
same part of your text (the result is returned in the variable named
TailEnd)...

Text = "NRK2D 986123456789312 Raffles Traders"
For X = 1 To Len(Text)
If Mid(Text, X, 15) Like String(15, "#") Then
TailEnd = Mid(Text, X + 15)
Exit For
End If
Next

Rick,

The OP's original regex (\d{15,16}) will capture the first 15 *OR 16* digit
string into capture group 1. So when the *rest* of the string is returned, it
will omit digit 16 if present.

Your code will return the 16th digit, if present, as a part of "TailEnd"
--ron
 
R

Rick Rothstein

Okay, I wasn't entirely sure what the "15,16" meant in the pattern. If the
numbers are always followed by a space (is that what the "\s\S" part of your
expression is for?), my code could be modified to this...

Text = "NRK2D 3298613456378931 Raffles Traders"
For X = 1 To Len(Text)
If Mid(Text, X, 16) Like String(15, "#") & " " Then
TailEnd = Mid(Text, X + 15)
Exit For
End If
Next
 
R

Ron Rosenfeld

Okay, I wasn't entirely sure what the "15,16" meant in the pattern. If the
numbers are always followed by a space (is that what the "\s\S" part of your
expression is for?), my code could be modified to this...

Whatever it is that follows the string of digits is undefined in the OP's
specifications. It could be a space; it could be another digit; it could be an
alpha character; it could be nothing. That is a problem with the
specification.

Also, whatever it is that precedes the string of digits is ALSO unspecified. It
could even be another digit!

If, as in the OP's example, the string of digits is ALWAYS preceded and
followed by a space, then his regex should have been something like:

\s\d{15,16}\s

If he only wanted to capture the standalone string of digits, then

\s(\d{15,16})\s would capture just the digits into Group 1.

and, expanding on that,

\s\d{15,16}\s+([\s\S]*) would also
capture everything after the string of digits into group 2 except for
the leading <space>'s before group 2.

In my suggestion, the [\s\S] will match every character that is either a
<space> or not a <space>. In other words, it captures everything.

If all I wanted to do was return everything in the string that came after a 15
or 16 digit number, that was bounded by spaces, I would just replace the
beginning of the string with nothing. It's much simpler, and probably faster.
===================================
Option Explicit
Function Part2(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "^[\s\S]+\s\d{15,16}\s+"
Part2 = re.Replace(s, "")
End Function
====================================

If I wanted to include the space prior to " Raffles Traders" as the OP did in
his example, then perhaps I would use this pattern:

re.Pattern = "^[\s\S]+\s\d{15,16}\b"

Now this would return the unaltered string if there was no match, but we could
easily test for that, depending on what the OP wanted to do in that instance.

-----------------
if re.test(s) = true then
part2 = re.replace(s,"")
else
part2 = "no pattern match"
end if
 
R

Raj

Thanks for the solved problem and the learning about submatches that I
was not aware of.

Regards,
Raj




Okay, I wasn't entirely sure what the "15,16" meant in the pattern. If the
numbers are always followed by a space (is that what the "\s\S" part of your
expression is for?), my code could be modified to this...

Whatever it is that follows the string of digits is undefined in the OP's
specifications.  It could be a space; it could be another digit; it could be an
alpha character; it could be nothing.  That is a problem with the
specification.

Also, whatever it is that precedes the string of digits is ALSO unspecified. It
could even be another digit!

If, as in the OP's example, the string of digits is ALWAYS preceded and
followed by a space, then his regex should have been something like:

\s\d{15,16}\s

If he only wanted to capture the standalone string of digits, then

\s(\d{15,16})\s  would capture just the digits into Group 1.

and, expanding on that,

\s\d{15,16}\s+([\s\S]*)  would also
        capture everything after the string of digits into group 2 except for
the leading <space>'s before group 2.

In my suggestion, the [\s\S] will match every character that is either a
<space> or not a <space>.  In other words, it captures everything.

If all I wanted to do was return everything in the string that came aftera 15
or 16 digit number, that was bounded by spaces, I would just replace the
beginning of the string with nothing.  It's much simpler, and probably faster.
===================================
Option Explicit
Function Part2(s As String) As String
 Dim re As Object
Set re = CreateObject("vbscript.regexp")
 re.Pattern = "^[\s\S]+\s\d{15,16}\s+"
 Part2 = re.Replace(s, "")
End Function
====================================

If I wanted to include the space prior to " Raffles Traders" as the OP did in
his example, then perhaps I would use this pattern:

re.Pattern = "^[\s\S]+\s\d{15,16}\b"

Now this would return the unaltered string if there was no match, but we could
easily test for that, depending on what the OP wanted to do in that instance.

-----------------
if re.test(s) = true then
        part2 = re.replace(s,"")
else
        part2 = "no pattern match"
end if
 
R

Ron Rosenfeld

Thanks for the solved problem and the learning about submatches that I
was not aware of.

Regards,
Raj

Glad to help. Thanks for the feedback.

In your initial posting, you had your regex within parentheses:

(\d{15,16})
^ ^

That captures that result into a capturing group, which, in VBA, is referenced
as a submatch. That was why I assumed that you were aware of that concept. I
should have been more explicit.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top