Regular Expression Question

Guest · Jan 28, 2007

I am a newbie to regular expressions and want to extract a number from the
end of a string. The string would have these formats:

image/4567
image/45678
image/456789

I would also want to extract the name if possible from this string too:
"image/4567">name</a>

Thanks.

Daniel · Jan 28, 2007

Use String.Substring().

It takes two arguments: the start index and the length of the substring. To
find the start index, use String.IndexOf().

Both are documented on MSDN under the String class.

So for the first problem you'll look for the character "/" with IndexOf and
take the substring that starts just after it. The substring runs to
String.Length, so the one-argument Substring overload is enough.

For the second one, the name comes after ">" and before "<", so call IndexOf
for both of those to get the start and end of the substring, then call
Substring.
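
A minimal sketch of the IndexOf/Substring approach described above. The class and method names are made up for illustration, and it assumes the delimiters are always present (IndexOf returns -1 otherwise, and Substring would then throw or return the wrong part).

```csharp
using System;

static class SubstringSketch {
    // Returns everything after the first '/' (e.g. "4567" from "image/4567").
    public static string NumberAfterSlash(string s) {
        int slash = s.IndexOf('/');
        return s.Substring(slash + 1); // one-argument overload runs to the end
    }

    // Returns the text between the first '>' and the next '<'.
    public static string NameBetweenTags(string s) {
        int start = s.IndexOf('>') + 1;
        int end = s.IndexOf('<', start);
        return s.Substring(start, end - start); // second argument is a length
    }

    static void Main() {
        Console.WriteLine(NumberAfterSlash("image/4567"));             // 4567
        Console.WriteLine(NameBetweenTags("\"image/4567\">name</a>")); // name
    }
}
```

Note that NumberAfterSlash only works for the bare samples. On the longer sample it would return everything after the slash, including the closing quote and the tag text.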

Guest · Jan 28, 2007

Thanks for the post Daniel. I neglected to mention that this is going to be
an HTML document. I am not sure the substring method will work for that.

Oliver Sturm · Jan 28, 2007

Hello JP,

The possibility of using an algorithm without regular expressions has
already been discussed. If you decide you want to go for regular
expressions, here's how (using the more complex example you mentioned):

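using System;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        string sourceString = @"""image/4567"">name</a>";
        // The named group "number" captures the digits after "image/"
        string regex = @"image/(?<number>\d+)";
        Match match = Regex.Match(sourceString, regex);
        if (match.Success)
            Console.WriteLine(match.Groups["number"].Value);
        Console.ReadLine();
    }
}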

Oliver Sturm

Oliver Sturm · Jan 28, 2007

Hello Daniel,

Use String.SubString();

Takes 2 arguments for start and end of substring. To find that
String.IndexOf();

Re-reading this, I'm not so sure this is really a good suggestion for this
case. After all there are several different possibilities for the final
delimiter of the number that is to be extracted - SubString always
requires you to find out where your string starts (not a problem here) and
how long it is (something of a problem here). Unless performance is really
an issue here (and of course tests prove that the SubString approach is
still more performant in its full complexity), I would go for the regex
approach for the sake of simplicity.
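
To illustrate the delimiter point, here is a small sketch that runs both approaches over two of the sample formats from the original question. The class and method names are hypothetical. The regex stops at the first non-digit whatever follows, while a plain "everything after the slash" Substring call needs extra work to find the end of the number.

```csharp
using System;
using System.Text.RegularExpressions;

static class DelimiterSketch {
    // Regex approach: \d+ ends at the first non-digit or at the end of the string.
    public static string ViaRegex(string s) {
        Match m = Regex.Match(s, @"image/(?<number>\d+)");
        return m.Success ? m.Groups["number"].Value : null;
    }

    // Naive Substring approach: everything after the first '/'.
    public static string ViaSubstring(string s) {
        return s.Substring(s.IndexOf('/') + 1);
    }

    static void Main() {
        string[] samples = { "image/4567", "\"image/45678\">name</a>" };
        foreach (string s in samples) {
            Console.WriteLine(ViaRegex(s));     // 4567, then 45678
            Console.WriteLine(ViaSubstring(s)); // 4567, then 45678">name</a>
        }
    }
}
```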

Oliver Sturm

Daniel · Jan 28, 2007

My reply below:

Re-reading this, I'm not so sure this is really a good suggestion for this
case. After all there are several different possibilities for the final
delimiter of the number that is to be extracted - SubString always
requires you to find out where your string starts (not a problem here)

"not a problem here"...so no need to mention any of the above.

how long it is (something of a problem here).

No, it isn't. From what I can see, he wants the inner part of the tags
between the '>' and '<' as well as the image source location from his
example. All of that is easy to do using Substring and IndexOf: get the first
one, then start at the end of the first one and repeat until the end of the
file.

Unless performance is really an issue here (and of course tests prove that
the SubString approach is still more performant in its full complexity), I
would go for the regex approach for the sake of simplicity.

Performance should always be an issue, in my opinion. Yes, regex is a viable
alternative, but it is not as versatile: if images/myPic.jpg or
images/myfolder were used later, the regex would fail and Substring would not.

Oliver Sturm · Jan 28, 2007

Hello Daniel,

No it isn't. From what i can see he wants the inner part of tags between
the '>' and '<' as well as the image source location from his example. All
of which is easy to do using substring and indexof. Get the first one,
then start at the end of the first one and repeat until end of file.

Hm... I may be a bit unclear after all on the intentions of the OP. Still,
I see him mentioning samples without any delimiter at all, plus the one
where he shows the string in question to be an attribute to what may be an
XML tag. That makes at least two different delimiters - the '"' and the
end of the line or word. That's what I was referring to, I'm not sure
whether it was the OP's intention or just the fact that the samples were
given out of context.

I also think we're having a misunderstanding here - your comment above
seems to relate solely to the part of the problem where the "name" has to
be found, while my own solution, admittedly, focused solely on the part
where the number is to be found.

Performance should always be an issue in my opinion

I don't agree, sorry. I wouldn't be using .NET as a development platform
if performance was a general concern of mine. I'm not saying it's not
important - I'm saying that acceptable performance is something that can
usually be reached easily these days while relying on the convenience
functionality provided to us by advanced programming features like regular
expressions or indeed managed code.

but yes regex is a viable alternative but not as versatile, if later
images/myPic.jpg were used or images/myfolder regex would fail and
substring would not.

I don't agree once more. There's no technology as versatile for pattern
matching as regular expressions - writing an algorithm manually may in any
single case be more performant or actually enable features that wouldn't
easily be possible with regular expressions. But as a general purpose
toolset regular expressions are much easier to use and to maintain than
any such algorithm could ever be.

Obviously the regular expression I was using in my example took into
account the precise samples the OP was giving, so it would miss the
additional samples you're now mentioning. In the same way I could
criticize that your SubString idea would by default accept "myPic.jpg",
while only names consisting of digits were to be allowed.
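
As a rough sketch of that trade-off, the two patterns below differ only in what they accept after the slash. The second pattern, the "target" group name, and the class name are assumptions made for illustration; neither pattern comes from the posts above.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternSketch {
    // Digits only, as in the original samples.
    public const string DigitsOnly = @"images?/(?<target>\d+)";
    // Hypothetical broader pattern: any run of characters up to a quote,
    // whitespace, or angle bracket.
    public const string AnyName = @"images?/(?<target>[^""\s<>]+)";

    public static string Extract(string input, string pattern) {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["target"].Value : null;
    }

    static void Main() {
        string[] samples = { "image/4567", "images/myPic.jpg", "images/myfolder" };
        foreach (string s in samples) {
            Console.WriteLine("{0} -> digits only: {1}, any name: {2}", s,
                Extract(s, DigitsOnly) ?? "(no match)", Extract(s, AnyName));
        }
    }
}
```

Which of the two is "right" depends on what the input is allowed to contain, which is the question the thread has not settled at this point.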

Oliver Sturm

Oliver Sturm · Jan 28, 2007

Following up on this discussion, I noticed I had missed some of what your
original post was asking. Here's a fixed sample program that assumes a
complete tag as the source text and extracts both the number and the name
from it.

using System;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        string sourceString = @"<a href=""image/4567"">name</a>";
        // "number" captures the digits in the href, "name" the link text
        string regex = @"\<a.*?""image/(?<number>\d+)"".*?\>(?<name>[^<]+)\<";
        Match match = Regex.Match(sourceString, regex);
        if (match.Success) {
            Console.WriteLine("Number = " + match.Groups["number"].Value);
            Console.WriteLine("Name = " + match.Groups["name"].Value);
        }

        Console.ReadLine();
    }
}

As my discussion with Daniel in the other part of the thread has also
shown, we are guessing somewhat as to the actual purpose of the code
you're trying to write. So if our solutions don't help you solve your
problem, I think it would be helpful to get some more information about
your intent.

Oliver Sturm

Daniel · Jan 28, 2007

lol i disagree obviously but i don't have the time to a) read all your reply
in detail and no doubt spark another reply and b) debate with someone over
such a minor requirement. OP you decide and good luck.

dave · Jan 28, 2007

You have a delimiter.
If you have 2 parts in your string and know the delimiter,
Substring does just fine:

int pos = 0;
string myString;

myString = "abc/123";
pos = myString.IndexOf("/");
string newMyString;
newMyString = myString.Substring(0, pos); // the part before the delimiter: "abc"
myString = myString.Substring(pos + 1); // continues to the end: "123"

dave

Guest · Jan 29, 2007

Oliver,

Thanks for the responses. What I want to do is read the html to pull out
specific text and eventually save to a database. I was testing using an
image link to see if I could do it with an image. What I really want to do
is access a page like this:

http://www.nfl.com/gamecenter/live/NFL_20020922_DAL@PHI

and get the player number and player name. For example if you hover over
D. McNabb, you will see this link in your status bar -
http://www.nfl.com/players/playerpage/133361. So basically I want to grab
all the playerpage numbers from this page and the name associated with the
playerpage number.

Hopefully this makes sense.

Oliver Sturm · Jan 29, 2007

Hello JP,

What I really want to do
is access a page like this:

http://www.nfl.com/gamecenter/live/NFL_20020922_DAL@PHI

and get the player number and player name. For example if you hover over
D. McNabb, you will see this link in your status bar -
http://www.nfl.com/players/playerpage/133361. So basically I want to grab
all the playerpage numbers from this page and the name associated with the
playerpage number.

Yes, I think it does. Try this:

using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        WebClient webClient = new WebClient();
        Stream stream =
            webClient.OpenRead("http://www.nfl.com/gamecenter/live/NFL_20020922_DAL@PHI");
        StreamReader reader = new StreamReader(stream);
        string content = reader.ReadToEnd();

        // The regex is specific to the slightly broken tag format on these pages
        string regex =
            @"\<a.*?href=""?/players/playerpage/(?<number>\d+)""?.*?\>(?<name>[^<]*)\</a\>";

        MatchCollection matches = Regex.Matches(content, regex);

        foreach (Match match in matches)
            Console.WriteLine(String.Format("Name: {0}, Number: {1}",
                match.Groups["name"].Value, match.Groups["number"].Value));

        Console.ReadLine();
    }
}

Oliver Sturm

Guest · Jan 29, 2007

Oliver,

Yes, that does it, thanks! I was trying to do this using an aspx page and
throwing the output into a GridView. It was giving me all sorts of
extraneous HTML code with it, but the console app does exactly what I am
after. I'll try to apply it to my aspx page as well.
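
One possible way to reuse the console logic outside a console app is to collect the matches into a list instead of writing them out. The sketch below assumes the same link format as in the program above and uses a short inline HTML string as a stand-in for the downloaded page. The class name, the method name, and the second sample link are made up for illustration, and how the list is then bound to a GridView is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PlayerLinkSketch {
    // Same idea as the pattern in the console program: optional quotes around the href.
    const string Pattern =
        @"<a.*?href=""?/players/playerpage/(?<number>\d+)""?.*?>(?<name>[^<]*)</a>";

    // Returns (number, name) pairs for every player link found in the HTML.
    public static List<KeyValuePair<string, string>> Extract(string html) {
        List<KeyValuePair<string, string>> result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(html, Pattern))
            result.Add(new KeyValuePair<string, string>(
                m.Groups["number"].Value, m.Groups["name"].Value));
        return result;
    }

    static void Main() {
        // Stand-in for the page content; the second link is a hypothetical entry.
        string html =
            "<td><a href=/players/playerpage/133361>D. McNabb</a></td>" +
            "<td><a href=\"/players/playerpage/99999\">Example Player</a></td>";
        foreach (KeyValuePair<string, string> player in Extract(html))
            Console.WriteLine("Name: {0}, Number: {1}", player.Value, player.Key);
    }
}
```

Because only the captured groups end up in the list, none of the surrounding markup is carried along.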

JP

