Regular Expression Question


Guest

I am a newbie to regular expressions and want to extract a number from the
end of a string. The string would have these formats:

image/4567
image/45678
image/456789

I would also want to extract the name if possible from this string too:
"image/4567">name</a>

Thanks.
 

Daniel

Use String.Substring().

It takes two arguments: the start index and the length of the substring.
To find the start index, use String.IndexOf().

Both are documented on MSDN under the String class.

For the first problem, you'll be looking for the "/" character: use IndexOf
to find its position, then take the substring that starts right after it
and runs to the end of the string (its length is String.Length minus that
start position).

For the second problem, the name comes after ">" and before "<", so call
IndexOf for both characters to get the start and end of the substring, then
call Substring.
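A minimal C# sketch of the IndexOf/Substring approach described above, assuming the inputs look exactly like the samples in the question. The class and method names (SubstringExtractor, NumberAfterSlash, NameBetweenTags) are hypothetical. Note that the second argument of Substring is a length, not an end index.

```csharp
using System;

// Hypothetical helper illustrating the IndexOf/Substring approach
static class SubstringExtractor {
    // Everything after the first '/', assuming the number runs to the end of the string
    public static string NumberAfterSlash(string s) {
        int slash = s.IndexOf('/');
        if (slash < 0) return null;
        // With one argument, Substring continues to the end of the string
        return s.Substring(slash + 1);
    }

    // Text between the first '>' and the next '<'
    public static string NameBetweenTags(string s) {
        int start = s.IndexOf('>');
        if (start < 0) return null;
        int end = s.IndexOf('<', start + 1);
        if (end < 0) return null;
        // The second argument of Substring is a length, not an end index
        return s.Substring(start + 1, end - start - 1);
    }
}
```

For example, NumberAfterSlash("image/45678") returns "45678" and NameBetweenTags applied to "image/4567">name</a> (with its leading quote) returns "name". Applied to that same tag string, however, NumberAfterSlash returns everything after the slash, including the quote and the tag, so once the number is embedded in HTML it needs its own end delimiter.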
 

Guest

Thanks for the post Daniel. I neglected to mention that this is going to be
an HTML document. I am not sure the substring method will work for that.
 

Oliver Sturm

Hello JP,

The possibility of using an algorithm without regular expressions has
already been discussed. If you decide you want to go for regular
expressions, here's how (using the more complex of the examples you mentioned):

using System;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        string sourceString = @"""image/4567"">name</a>";
        // "number" captures the run of digits following "image/"
        string regex = @"image/(?<number>\d+)";
        Match match = Regex.Match(sourceString, regex);
        if (match.Success)
            Console.WriteLine(match.Groups["number"].Value);
        Console.ReadLine();
    }
}


Oliver Sturm
 

Oliver Sturm

Hello Daniel,

> Use String.SubString();
>
> Takes 2 arguments for start and end of substring. To find that
> String.IndexOf();

Re-reading this, I'm not so sure this is really a good suggestion for this
case. After all, there are several possible delimiters at the end of the
number that is to be extracted. Substring always requires you to find out
where your string starts (not a problem here) and how long it is (something
of a problem here). Unless performance really is an issue here (and tests
confirm that the Substring approach is still faster once it handles all of
those cases), I would go for the regex approach for the sake of simplicity.
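To make the "how long is it" problem concrete, here is a hedged C# sketch comparing a manual scan with the regex from the earlier post. The manual version finds the start with IndexOf and then walks forward over digits to determine the length, which works whether the number is followed by a quote, a tag, or the end of the string. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper contrasting a manual scan with the regex approach
static class NumberExtraction {
    const string Prefix = "image/";

    // Manual: locate the start with IndexOf, then scan forward to find the length
    public static string ManualNumber(string s) {
        int start = s.IndexOf(Prefix, StringComparison.Ordinal);
        if (start < 0) return null;
        start += Prefix.Length;
        int end = start;
        // Stop at the first non-digit: a quote, a tag, or the end of the string
        while (end < s.Length && char.IsDigit(s[end]))
            end++;
        return end > start ? s.Substring(start, end - start) : null;
    }

    // Regex: the pattern itself states where the number ends
    public static string RegexNumber(string s) {
        Match m = Regex.Match(s, @"image/(?<number>\d+)");
        return m.Success ? m.Groups["number"].Value : null;
    }
}
```

Both return "4567" for the tag sample; the manual version simply has to spell out the end-of-number rule that the regex expresses with \d+.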


Oliver Sturm
 

Daniel

My reply below:

> Re-reading this, I'm not so sure this is really a good suggestion for this
> case. After all there are several different possibilities for the final
> delimiter of the number that is to be extracted - SubString always
> requires you to find out where your string starts (not a problem here)

"Not a problem here"... so there was no need to mention any of the above.

> how long it is (something of a problem here).

No, it isn't. From what I can see, he wants the inner text of the tags,
between '>' and '<', as well as the image source location from his example.
All of this is easy to do using Substring and IndexOf: get the first one,
then start at the end of the first one and repeat until the end of the file.

> Unless performance is really an issue here (and of course tests prove that
> the SubString approach is still more performant in its full complexity), I
> would go for the regex approach for the sake of simplicity.

In my opinion, performance should always be a concern. But yes, regex is a
viable alternative, although not as versatile: if images/myPic.jpg or
images/myfolder were used later, the regex would fail and Substring would
not.
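The loop described above (find the first '>'...'<' pair, then continue searching from the end of it until the end of the input) might look like this in C#. This is a sketch: the helper name is hypothetical, it treats every '>' and '<' as a tag boundary, and it will also collect whitespace-only text that sits between tags.

```csharp
using System.Collections.Generic;

// Hypothetical helper: repeated IndexOf/Substring over the whole input
static class InnerTextScanner {
    // Collects every non-empty run of text between a '>' and the next '<'
    public static List<string> AllInnerTexts(string html) {
        var results = new List<string>();
        int pos = 0;
        while (true) {
            int open = html.IndexOf('>', pos);
            if (open < 0) break;
            int close = html.IndexOf('<', open + 1);
            if (close < 0) break;
            if (close > open + 1)
                results.Add(html.Substring(open + 1, close - open - 1));
            // Resume searching just after the '<' that ended this run
            pos = close + 1;
        }
        return results;
    }
}
```

On a string holding two links whose texts are "one" and "two", this returns exactly those two strings, because the empty gap between "</a>" and the next "<a" is skipped.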
 

Oliver Sturm

Hello Daniel,

> No it isn't. From what i can see he wants the inner part of tags between
> the '>' and '<' as well as the image source location from his example. All
> of which is easy to do using substring and indexof. Get the first one,
> then start at the end of the first one and repeat until end of file.

Hm... I may be a bit unclear after all on the intentions of the OP. Still,
I see him mentioning samples without any delimiter at all, plus the one
where he shows the string in question to be an attribute of what may be an
XML tag. That makes at least two different delimiters: the '"' and the end
of the line or word. That's what I was referring to; I'm not sure whether
that was the OP's intention or just a result of the samples being given
out of context.

I also think we're having a misunderstanding here - your comment above
seems to relate solely to the part of the problem where the "name" has to
be found, while my own solution, admittedly, focused solely on the part
where the number is to be found.

> Performance should always be an issue in my opinion

I don't agree, sorry. I wouldn't be using .NET as a development platform
if performance were a general concern of mine. I'm not saying it's
unimportant; I'm saying that acceptable performance can usually be reached
easily these days while relying on the convenience functionality provided
by advanced programming features like regular expressions or, indeed,
managed code.

> but yes regex is a viable alternative but not as versatile, if later
> images/myPic.jpg were used or images/myfolder regex would fail and
> substring would not.

Again, I don't agree. There's no technology as versatile for pattern
matching as regular expressions. Writing an algorithm manually may, in any
single case, be faster or enable features that wouldn't easily be possible
with regular expressions. But as a general-purpose toolset, regular
expressions are much easier to use and maintain than any such algorithm
could ever be.

Obviously the regular expression I used in my example took into account
the precise samples the OP gave, so it would miss the additional samples
you're now mentioning. In the same way, I could criticize your Substring
idea for accepting "myPic.jpg" by default, while only names consisting of
digits were to be allowed.
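As an illustration of that trade-off, the digits-only pattern can be loosened when non-numeric names such as myPic.jpg must be accepted. The broader rule below (any run of characters up to a quote, slash, angle bracket, or whitespace) is an assumption for illustration, not something anyone in the thread proposed, and the class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical patterns contrasting a strict and a looser match after "image/"
static class PathPatterns {
    // Digits only, matching the samples in the original question
    public static readonly Regex DigitsOnly = new Regex(@"image/(?<id>\d+)");

    // Assumed broader rule: any characters up to a quote, slash, angle bracket or whitespace
    public static readonly Regex AnySegment = new Regex(@"image/(?<id>[^""/<>\s]+)");

    public static string Extract(Regex pattern, string s) {
        Match m = pattern.Match(s);
        return m.Success ? m.Groups["id"].Value : null;
    }
}
```

With DigitsOnly, "image/myPic.jpg" yields no match; with AnySegment, the tag "image/myPic.jpg">pic</a> yields "myPic.jpg". Which one is right depends on whether non-numeric names should be accepted at all.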


Oliver Sturm
 

Oliver Sturm

Following up on this discussion, I noticed I had missed some of what your
original post was asking. Here's a fixed sample program that assumes a
complete tag as the source text and extracts both the number and the name
from it.

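using System;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        string sourceString = @"<a href=""image/4567"">name</a>";
        // "number": the digits after image/ inside the quoted attribute;
        // "name": the text between the tag's closing '>' and the next '<'
        string regex = @"\<a.*?""image/(?<number>\d+)"".*?\>(?<name>[^<]+)\<";
        Match match = Regex.Match(sourceString, regex);
        if (match.Success) {
            Console.WriteLine("Number = " + match.Groups["number"].Value);
            Console.WriteLine("Name = " + match.Groups["name"].Value);
        }

        Console.ReadLine();
    }
}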

As my discussion with Daniel in the other part of the thread has also
shown, we are guessing somewhat as to the actual purpose of the code
you're trying to write. So if our solutions don't help you solve your
problem, I think it would be helpful to get some more information about
your intent.


Oliver Sturm
 

Daniel

lol, I disagree, obviously, but I don't have the time to (a) read your whole
reply in detail and no doubt spark another reply, or (b) debate with someone
over such a minor requirement. OP, you decide, and good luck.
 

dave

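You have a delimiter. If you have two parts in your string and know the
delimiter, Substring does just fine:

using System;

class Program {
    static void Main() {
        string myString = "abc/123";
        // Position of the delimiter
        int ipos = myString.IndexOf('/');
        // Part before the delimiter: start at index 0, length ipos
        string newMyString = myString.Substring(0, ipos);
        // Part after the delimiter: continues to the end
        myString = myString.Substring(ipos + 1);
        Console.WriteLine(newMyString); // abc
        Console.WriteLine(myString);    // 123
    }
}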

dave
 

Guest

Oliver,

Thanks for the responses. What I want to do is read the HTML to pull out
specific text and eventually save it to a database. I was testing with an
image link to see if I could do it with an image. What I really want to do
is access a page like this:

http://www.nfl.com/gamecenter/live/NFL_20020922_DAL@PHI

and get the player number and player name. For example, if you hover over
D. McNabb, you will see this link in your status bar:
http://www.nfl.com/players/playerpage/133361. So basically I want to grab
all the playerpage numbers from this page and the name associated with each
playerpage number.

Hopefully this makes sense.
 

Oliver Sturm

Hello JP,

> What I really want to do
> is access a page like this:
>
> http://www.nfl.com/gamecenter/live/NFL_20020922_DAL@PHI
>
> and get the player number and player name. For example if you hover over
> D. McNabb, you will see this link in your status bar -
> http://www.nfl.com/players/playerpage/133361. So basically I want to grab
> all the playerpage numbers from this page and the name associated with the
> playerpage number.

Yes, I think it does. Try this:

using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        string content;
        // Download the page and read the whole response into a string
        using (WebClient webClient = new WebClient())
        using (Stream stream = webClient.OpenRead(
            "http://www.nfl.com/gamecenter/live/NFL_20020922_DAL@PHI"))
        using (StreamReader reader = new StreamReader(stream)) {
            content = reader.ReadToEnd();
        }

        // The regex is specific to the slightly broken tag format on these pages
        string regex =
            @"\<a.*?href=""?/players/playerpage/(?<number>\d+)""?.*?\>(?<name>[^<]*)\</a\>";

        MatchCollection matches = Regex.Matches(content, regex);

        foreach (Match match in matches)
            Console.WriteLine("Name: {0}, Number: {1}",
                match.Groups["name"].Value, match.Groups["number"].Value);

        Console.ReadLine();
    }
}
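Because the page above may no longer be available, one way to try the same regex offline is to run it over a small HTML string. In the test markup, the first link reuses the number and name from the question, while the second entry (12345, "Example Player") is made up; the class name PlayerLinkParser is likewise hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the playerpage regex from the program above to an HTML string
static class PlayerLinkParser {
    const string Pattern =
        @"\<a.*?href=""?/players/playerpage/(?<number>\d+)""?.*?\>(?<name>[^<]*)\</a\>";

    // Returns (number, name) pairs in the order the links appear
    public static List<KeyValuePair<string, string>> Parse(string html) {
        var players = new List<KeyValuePair<string, string>>();
        foreach (Match match in Regex.Matches(html, Pattern))
            players.Add(new KeyValuePair<string, string>(
                match.Groups["number"].Value, match.Groups["name"].Value));
        return players;
    }
}
```

The optional quotes in the pattern let it match both href="/players/playerpage/133361" and an unquoted href=/players/playerpage/12345, which is what the "slightly broken tag format" comment refers to.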


Oliver Sturm
 

Guest

Oliver,

Yes, that does it, thanks! I was trying to do this from an ASPX page and
bind the output to a GridView. It was giving me all sorts of extraneous
HTML along with it, but the console app does exactly what I am after. I'll
try to apply it to my ASPX page as well.

JP

 
