Regular expressions

Dave

I was wondering if someone could help me with a regular expression problem.
I have a string of letters and digits that contains information about
chemical formulas. I need to parse out of the string which elements it
contains and which numbers are associated with those elements. The problem
I'm having is that some element symbols have two characters and some just
one, for example Br and B. I had originally set up a search pattern for the
Regex object containing a complete list of the periodic table, "(B|Br|
etc.)", and was going to use the Matches method to get a collection of all
elements matched in the string.

' Assumes "Imports System.Text" at the top of the file.
Dim myRegex As New RegularExpressions.Regex( _
    "(B|Br| the rest of periodic table here )", _
    RegularExpressions.RegexOptions.IgnoreCase)
Dim matchesMade As RegularExpressions.MatchCollection
matchesMade = myRegex.Matches(aString)
For Each matchMade As RegularExpressions.Match In matchesMade
    ' do stuff to the match here
Next

The problem is that the logic above would get a match on B even if the
formula really contains Br, and I need to distinguish between the two.

Any suggestions on how I do this would be appreciated.

Thanks,
Dave
 
Dave

I think a way around this is to change the order of the alternatives in my
pattern so that the two-character symbols show up first in the list (i.e.
"(Br|Bk|Bi|Be|Ba|B)"). I'm not sure if this is the best way around the
problem, but it seems to work.
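
A rough sketch of how that ordering could be generated rather than typed by hand, and how the number after each element might be captured at the same time. The symbol list here is a small hypothetical subset, the group names "element" and "count" are made up for illustration, and the match is deliberately case-sensitive (unlike the IgnoreCase option above) on the assumption that formulas are written with standard capitalization:

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module FormulaParserSketch
    Sub Main()
        ' Hypothetical subset; a real list would cover the whole periodic table.
        Dim symbols As String() = {"B", "Br", "Ba", "Be", "Bi", "Bk", "C", "Ca", "H", "O"}

        ' Sort longer symbols first so that "Br" is tried before "B".
        Dim ordered As String() = symbols.OrderByDescending(Function(s) s.Length).ToArray()

        ' Named groups: the element symbol, then an optional run of digits.
        Dim pattern As String = "(?<element>" & String.Join("|", ordered) & ")(?<count>[0-9]*)"

        For Each m As Match In Regex.Matches("CaBr2", pattern)
            Dim countText As String = m.Groups("count").Value
            ' Assumption: a missing number means a count of 1.
            Dim count As Integer = If(countText = "", 1, Integer.Parse(countText))
            Console.WriteLine(m.Groups("element").Value & ": " & count)
        Next
        ' Prints:
        ' Ca: 1
        ' Br: 2
    End Sub
End Module
```

Sorting by length means new symbols can be added to the list in any order without reintroducing the B versus Br problem.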

Dave
 
Jay B. Harlow [MVP - Outlook]

Dave,
That might be the "best way"; at least it's the "easiest way", as the
description of |, the alternation operator, states: "Matches any one of the
terms... The leftmost successful match wins."

Seeing as it's going to use the first (leftmost) alternative that matches,
it will do what you need...
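
A minimal sketch of that behavior, assuming case-sensitive matching and a made-up test string "BBr3" (the module name is just for illustration):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AlternationOrderDemo
    Sub Main()
        Dim formula As String = "BBr3"

        ' Short symbol listed first: "B" succeeds wherever "Br" could,
        ' so "Br" is never reported.
        For Each m As Match In Regex.Matches(formula, "B|Br")
            Console.WriteLine("B|Br -> " & m.Value)
        Next
        ' Prints:
        ' B|Br -> B
        ' B|Br -> B

        ' Longer symbol listed first: "Br" is tried before "B".
        For Each m As Match In Regex.Matches(formula, "Br|B")
            Console.WriteLine("Br|B -> " & m.Value)
        Next
        ' Prints:
        ' Br|B -> B
        ' Br|B -> Br
    End Sub
End Module
```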

Hope this helps
Jay

 
Dave

Thanks, Jay. Yes, that does help.

Dave

