Anyone know how...

O-('' Q)

> You could explain it better by being more specific than "looks for
> certain characters". Heck, my own description of what I *think* it's
> doing was more specific than yours!
>
> As I asked for before, please give us example inputs and desired
> outputs.

Rather than showing us you're having a bad day, simply tell me what
information you ARE looking for and I will try to explain it. I am not
sure what part of "it parses the HTML and returns the IP address within
the HTML" you are getting lost on, but let me know and I will be MORE
than happy to try to explain it to you.

You seem more argumentative than helpful. If the former is the case,
then by all means change the channel as no one is forcing you to sit
here and watch the drama unfold on someone who is simply trying to
learn and asking for help.

My description may have been vague to you, but others seem to have
understood it well enough to offer help instead of incessant badgering
and claims that telling you what the code SPECIFICALLY does aren't good
enough.
 
Jon Skeet [C# MVP]

O-('' Q) said:
> Rather than showing us you're having a bad day,

I'm not having a bad day. I'm trying to help you, and you're making it
difficult.

> simply tell me what
> information you ARE looking for and I will try to explain it. I am not
> sure what part of "it parses the HTML and returns the IP address within
> the HTML" you are getting lost on, but let me know and I will be MORE
> than happy to try to explain it to you.

I gave a good example of the kind of description I'm looking for:

"[...] It *looks* like it's trying to skip 'v' characters at the start
of the line, skip the next 7 characters, then take everything up to the
next quote."

That's a nice precise description of what it's doing - *if* I
understood the Delphi correctly.

You also still haven't given any samples, which would be really good
ways of describing what you're after. For a start, they'd give unit
test cases...

> You seem more argumentative than helpful.

I think you'll find I'm very helpful once I've been given appropriate
information.

Put it this way - if you were given the task of writing something to
"Parse the HTML and return the IP address within the HTML", what would
your first question be? Would it by any chance be "What's the format of
the HTML?"

> My description may have been vague to you, but others seem to have
> understood it well enough to offer help instead of incessant badgering
> and claims that telling you what the code SPECIFICALLY does aren't good
> enough.

You haven't said what the code specifically does. You've said what the
code does in very general terms.

Let's try this again:
1) I don't think that translating directly from the Delphi is going to
give you the best way of tackling the problem. This really looks like a
job for regular expressions.
2) It's hard to be sure I fully understand the problem from vague
descriptions and code in a language I'm not familiar with.
3) If you provide a detailed breakdown of what the code is meant to do,
and some test cases (sample input and output) I'm quite happy to try to
write a correct regular expression for you.

If you want to keep claiming you've given enough information though,
that's fine. Just remember that getting a working solution benefits
*you* far more than it benefits me.

Jon
 
Jon Skeet [C# MVP]

<snip>

Okay, having put in more effort than I should have had to, here's how
you could have replied to my question:

"The code takes the following steps:
1) If the HTML is empty, it returns that the address is unknown
2) It looks for the first occurrence of "host" within the HTML
3) It looks for the first 'v' character after that point (the start of
the value attribute)
4) It then skips 7 characters (the value=" bit)
5) It finds the next " character (skipping two characters first - not
sure why)
6) It returns the substring between the position at the end of value="
and the position from step 5.

Here's a sample piece of HTML:
<p align="center"><input type="text" name="host" size="45"
value="213.146.158.130"><small><span style="font-family: Arial"><font
face="Arial"><br>

I would like the above to return the string 213.146.158.130."


Here's how I would have replied:

Thanks for the information. As I thought, a regular expression is more
appropriate here - for one thing, the current code seems to assume that
if the HTML is non-empty, then the right data will be there. Here's
some code which does what you want, including a Main method which
prints out the result from your test data:

using System;
using System.Text.RegularExpressions;

public class Test
{
    static void Main()
    {
        // Sample HTML, split across lines only for readability.
        string html = "<p align=\"center\"><input type=\"text\" name=\"host\" "+
            "size=\"45\" value=\"213.146.158.130\"><small>"+
            "<span style=\"font-family: Arial\"><font face=\"Arial\"><br>";

        Console.WriteLine (ParseAddress(html));
    }

    static string ParseAddress(string html)
    {
        Regex re = new Regex ("name=\"host\" .*value=\"([^\"]*)\"");

        Match match = re.Match(html);
        if (match.Success)
        {
            // Group 1 is the text captured between the quotes after value=
            return match.Groups[1].Value;
        }
        else
        {
            return "IP address unavailable";
        }
    }
}

Some notes:
1) This only picks out the first match; is that okay?
2) The regular expression is really:
name="host" .*value="([^"]*)"
The backslashes are for escaping the double-quote characters.
3) You could create the regular expression once and tell it to compile
it in memory for more speed, if performance is an issue for you.
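
A minimal sketch of note 3, assuming the same pattern as above: the Regex is built once in a static field with RegexOptions.Compiled and reused on every call. The class names AddressParser and CompiledDemo are made up for this illustration, and this version returns null rather than a message string when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressParser
{
    // Built once and reused. RegexOptions.Compiled costs more at start-up
    // in exchange for faster matching afterwards.
    private static readonly Regex HostValue =
        new Regex("name=\"host\" .*value=\"([^\"]*)\"", RegexOptions.Compiled);

    // Returns the captured value, or null when the pattern does not match.
    public static string ParseAddress(string html)
    {
        Match match = HostValue.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }
}

public static class CompiledDemo
{
    public static void Main()
    {
        string html = "<input type=\"text\" name=\"host\" size=\"45\" value=\"213.146.158.130\">";
        Console.WriteLine(AddressParser.ParseAddress(html) ?? "IP address unavailable");
    }
}
```

Whether compiling is worth it depends on how often the method is called; for a single lookup the plain constructor is probably fine.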



Jon
 
O-('' Q)

> Let's try this again:

Sure. Let's.

> 1) I don't think that translating directly from the Delphi is going to
> give you the best way of tackling the problem. This really looks like a
> job for regular expressions.

I am more than open to using your method, but doing so in c#.net is new
for me. Like I said, it was easy for me in Delphi just as it would be
easy for YOU in c#, correct? If you knew what you wanted to do, that
is. (Yeah, the latch)

> 2) It's hard to be sure I fully understand the problem from vague
> descriptions and code in a language I'm not familiar with.

I understand you're not familiar with Delphi. I can work with that if I
get better requests than "tell me what it does." Then, when I tell you
what it does in layman's terms, I get "that isn't good enough. Tell me
more." Can you see where that throws someone for a loop?

> 3) If you provide a detailed breakdown of what the code is meant to do,
> and some test cases (sample input and output) I'm quite happy to try to
> write a correct regular expression for you.

Sample Input: It downloads the HTML from the site listed in the code.
Sample Output: Your IP Address is 192.168.1.103

But, like I have said, it NEEDS to be able to parse the html and
disregard everything except the IP address. This is where I am lost
with you, I guess. I am telling you what it does, how it does so and
why I need it to do. You keep saying "use regular expressions." Fine.
Good choice since you're the MVP. But for a total newbie in this
language, it is not so cut and dry. How about an example of your
regular expressions?

> If you want to keep claiming you've given enough information though,
> that's fine. Just remember that getting a working solution benefits
> *you* far more than it benefits me.

Like I said earlier; I am not sure how to explain it better. I said
what it does, verbatim. I may not have given a total and complete
breakdown of the code line-for-line, but I explained what it does and
what I need it to do in c#.

I am not trying to benefit you in any way. I am trying to learn, not
even for my own benefit, really, but because I find learning new things
fun and challenging. I will give you that much, however. You've made
this MUCH more challenging.
 
Jon Skeet [C# MVP]

O-('' Q) said:
> Sure. Let's.
>
> I am more than open to using your method, but doing so in c#.net is new
> for me. Like I said, it was easy for me in Delphi just as it would be
> easy for YOU in c#, correct? If you knew what you wanted to do, that
> is. (Yeah, the latch)

Indeed.


> I understand you're not familiar with Delphi. I can work with that if I
> get better requests than "tell me what it does." Then, when I tell you
> what it does in layman's terms, I get "that isn't good enough. Tell me
> more." Can you see where that throws someone for a loop?

I would if I hadn't given you a good example of the kind of thing I'm
after - namely "It looks for the first 'host' string within the data,
then skips 7 characters, then [...]". I think that should have given
you a pretty good idea of the kind of reply which would have been
helpful. I don't think any of what I've asked for is unreasonable.

> Sample Input: It downloads the HTML from the site listed in the code.

That's not sample input. Sample input for the bit you're actually
worried about is a string.

> Sample Output: Your IP Address is 192.168.1.103

That wouldn't be the correct output unless you'd specified some sample
input which included 192.168.1.103.

> But, like I have said, it NEEDS to be able to parse the html and
> disregard everything except the IP address. This is where I am lost
> with you, I guess. I am telling you what it does, how it does so and
> why I need it to do. You keep saying "use regular expressions." Fine.
> Good choice since you're the MVP. But for a total newbie in this
> language, it is not so cut and dry. How about an example of your
> regular expressions?

You've got to know what you're looking for before you can provide a
regular expression which looks for it.

> Like I said earlier; I am not sure how to explain it better. I said
> what it does, verbatim. I may not have given a total and complete
> breakdown of the code line-for-line, but I explained what it does and
> what I need it to do in c#.

Which bit exactly is "verbatim"? To me, "verbatim" in this case really
would mean "line-for-line" or at least "step-for-step". Here are a few
of your descriptions:

<quote>
It retrieves your external IP by parsing the results from
network-tools.com and
displays it to the user in a text area.
</quote>

<quote>
it parses the HTML to extract the IP address which is displayed
</quote>

<quote>
Uhh... well... hehe, it parses through the HTML, looks for certain
characters and skips them until it finds what it is looking for.
</quote>

None of those are verbatim descriptions. They're very high level
descriptions.

My other post (with the actual answer) provides a *real* verbatim
description of what the Delphi code does.

> I am not trying to benefit you in any way. I am trying to learn, not
> even for my own benefit, really, but because I find learning new things
> fun and challenging. I will give you that much, however. You've made
> this MUCH more challenging.

I *hope* you'll look at my other reply and learn not just how to
perform the task you're interested in, but how to ask questions the
smart way (as Eric Raymond puts it).

Please read

I think you'll find you can get answers a lot quicker if you follow the
guidelines contained there.

Jon
 
O-('' Q)

> Some notes:
> 1) This only picks out the first match; is that okay?
> 2) The regular expression is really:
> name="host" .*value="([^"]*)"
> The backslashes are for escaping the double-quote characters.
> 3) You could create the regular expression once and tell it to compile
> it in memory for more speed, if performance is an issue for you.

I appreciate your time and effort, Jon. Really I do. You simply seemed
more like you were out for an argument than to help and if I took you
wrong, I apologize. I really do appreciate any help I can get when I
learn and I realize everyone does so at the cost of their own time.

Again, if I took you wrong, I am sorry.

Thank you for the help. I will use this to see if I can get it working
now that I see how regular expression is handled in c#.net for a
change.

As you can see... Delphi code relied on me parsing the strings myself,
whereas this code was INCREDIBLY more simple and yet does the same
thing. This floors me, really. All of those delphi lines compressed
into what looks like, to me, something that should not work, yet does.

Anyway, thanks again Jon.
 
Steve Barnett

Jon Skeet is offering you the best solution really. Regular Expressions seem
to be the way to go (though they all look strange to me - haven't read that
book yet).

If you want to pursue this option though... see inline comments.

O-('' Q) said:
> string sHTML;
> char[] sIPAddr = new char[256];
> int i;
>
> txtIP.Clear();
> Indy.Sockets.HTTP IdHTTP = new Indy.Sockets.HTTP();
> sHTML = IdHTTP.Get("http://www.network-tools.com/");
>
> if ( sHTML.Length == 0 )
> {
> return;
> }
>
> i = sHTML.IndexOf( "value=" );
> if (i == 0)
> {
> return;
> }

"i" will tell you where the string "value=" begins. You need to add the
length of "value=" to i to skip over it. If you do that, then the next while
loop serves no purpose.

> while ( sHTML[i]=='v' ) {
> i++;
> }
> if (i >= sHTML.Length)
> {
> return;
> }
> i++;

Ok, assuming that there are no spaces between "value=" and the first quote,
and you have added 6 to it (the length of value=) then "i" now points at the
first quote.

> int x = i + 2;

Don't know what "x" is now intended to point at... "i" should be pointing at
the opening quote, so you should probably be adding 1 to it, skipping the
first quote.

The next loop is still wrong, however, as it loops while the indexed
character IS a quote. You need to add the exclamation point to the
comparison, as you're trying to loop while the indexed character IS NOT a
quote. Note also that sHTML[x] is a char, so it should be compared with the
char '"' rather than the string "\"" (a char never Equals a string).
[ while (!sHTML[x].Equals('"')) ]

> while ( sHTML[x].Equals("\"") ) {
> x++;
> }
> if (x >= sHTML.Length)
> {
> return;
> }
> try
> {

(You'll need to check the syntax but) how about using the Substring
method? Something like
sHTML = sHTML.Substring(i, x-i);
If I remember rightly, the syntax is (start index, length) so it should be
something like that. (I'm like a fish out of water without popup syntax
help - sad isn't it).

> sHTML.CopyTo(i, sHTML, x - 1, sHTML.Length);
> }
> catch (Exception except)
> {
> // Show error
> txtIP.Text = except.ToString();
> }
> txtIP.Text = sHTML.ToString();
>
> That is my current code. What this NEEDS to do is what this program
> does:
> http://mysite.verizon.net/unclelugzy/MCDIPsentry.exe

Sorry, nothing personal, but I don't download programs that people post in
news groups.

> Basically, I am trying to port this program to C#.NET in order to help
> me learn more about the language.

We're all learning, which is why having people like Jon on-side is very
useful.

> I hope this helps some. As for returned data... the code above works,
> but it just kicks back ALL of the HTML on the site, rather than
> disregarding everything except the IP address.

I think that, if you fix the things above, you'll find that you're now
selecting the right bits. As I said before, it's best to put a break in the
code and single step through it - you'll learn a lot that way.
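
Putting those corrections together, the loop-based approach might look something like the sketch below. It is only an illustration under a few assumptions: the method name ExtractValue and the class LoopFix are invented here, the HTML is passed in rather than downloaded, and "not found" is tested as -1, which is what String.IndexOf returns.

```csharp
using System;

public static class LoopFix
{
    // Returns the text between the quotes that follow the first value=,
    // or null if the expected pieces are not present.
    public static string ExtractValue(string sHTML)
    {
        const string marker = "value=";

        int i = sHTML.IndexOf(marker, StringComparison.Ordinal);
        if (i == -1) return null;              // IndexOf signals "not found" with -1

        i += marker.Length;                    // skip over value=
        if (i >= sHTML.Length || sHTML[i] != '"') return null;
        i++;                                   // skip the opening quote

        int x = i;
        while (x < sHTML.Length && sHTML[x] != '"')
        {
            x++;                               // advance while the character is NOT a quote
        }
        if (x >= sHTML.Length) return null;    // no closing quote

        return sHTML.Substring(i, x - i);      // (start index, length)
    }

    public static void Main()
    {
        string html = "<input type=\"text\" name=\"host\" size=\"45\" value=\"213.146.158.130\">";
        Console.WriteLine(ExtractValue(html) ?? "IP address unavailable");
    }
}
```

Like the original code, this looks at the first value= anywhere in the page, so it relies on the host field being the first element that carries a value attribute.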

Steve
 
Jon Skeet [C# MVP]

O-('' Q) said:
> Some notes:
> 1) This only picks out the first match; is that okay?
> 2) The regular expression is really:
> name="host" .*value="([^"]*)"
> The backslashes are for escaping the double-quote characters.
> 3) You could create the regular expression once and tell it to compile
> it in memory for more speed, if performance is an issue for you.
>
> Anyway, thanks again Jon.

My pleasure - in the end :)

Now that the actual task is done, here's a *relatively* direct
translation of the Delphi into C#. It's clearly not the best way of
doing it, but it might help you in the future. Note that I'm using
IndexOf instead of manually stepping through each character:

const string Unavailable = "IP Unavailable";

static string ParseAddress(string html)
{
    if (html.Length==0)
    {
        return Unavailable;
    }

    // Find "host", then the first 'v' after it (the start of value=")
    int i = html.IndexOf("host");
    if (i==-1)
    {
        return Unavailable;
    }
    i = html.IndexOf('v', i);
    if (i==-1)
    {
        return Unavailable;
    }

    // Skip the 7 characters of value=" and look for the closing quote
    i += 7;
    int x = html.IndexOf('\"', i+2);
    if (x==-1)
    {
        return Unavailable;
    }
    return html.Substring(i, x-i);
}

(Note that I'd normally return null instead of a constant string - the
above is just to keep it closer to your original interface.)

I made a mistake in my previous analysis of your code, by the way - my
claim that "the current code seems to assume that if the HTML is
non-empty, then the right data will be there" is entirely wrong.

It *does* assume a certain amount - like that the first v will be for
the value attribute - but that's not so bad :)

Jon
 
Jon Skeet [C# MVP]

O-('' Q) wrote:

> As you can see... Delphi code relied on me parsing the strings myself,
> whereas this code was INCREDIBLY more simple and yet does the same
> thing. This floors me, really. All of those delphi lines compressed
> into what looks like, to me, something that should not work, yet does.

Oops - forgot to say: do you want me to go through *how* the regex
works? Regular expressions can often look like "magic" (which is why I
don't like using them when there *is* a more straightforward way with
simple string operations) but they're not too bad in the end. This one
is actually quite simple. Here's what it looks like when broken up a
bit:

name="host"   <-- Look for the literal name="host" (with a space after
                  the quote)
.*            <-- Look for any character (.) repeated any number of times (*).
value="       <-- Look for the literal value="
([^"]*)       <-- This is the trickiest bit, which I'll expand below. It's
                  the grouping part
"             <-- Look for the literal "


The grouping part consists of the open and close brackets, which just
mean "this is a capturing group", and the stuff inside them, which says
what to capture.
[^"] means "any character other than double quote"
* means "repeated any number of times"


Now, the above on its own probably isn't good enough to help you
understand it if you're completely new to regular expressions. However,
if you find a regular expressions tutorial, read it, then come back,
the above should seem almost trivial!
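
To see the capturing group at work, here is a small sketch that prints both the whole match and the group. It assumes the same pattern, written as a C# verbatim string (where "" stands for one double quote) so that no backslashes are needed. The class name RegexBreakdown and the shortened sample HTML are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBreakdown
{
    // Same pattern as in the thread, as a verbatim string: "" is one double quote.
    public const string Pattern = @"name=""host"" .*value=""([^""]*)""";

    public static void Main()
    {
        string html = "<input type=\"text\" name=\"host\" size=\"45\" value=\"213.146.158.130\"><small>";
        Match match = Regex.Match(html, Pattern);

        if (match.Success)
        {
            // Group 0 is everything the pattern matched;
            // group 1 is only what the brackets captured.
            Console.WriteLine("Whole match: " + match.Groups[0].Value);
            Console.WriteLine("Group 1:     " + match.Groups[1].Value);
        }
    }
}
```

The whole match runs from name="host" through the closing quote of the value attribute, while group 1 holds only the address between the quotes, which is why the earlier code returns Groups[1].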

Jon
 
Steve Barnett

My go at the parsing routine (not using RegEx) is shown below. It assumes
that you retrieved the html and passed it as a parameter to this function.

string ParseHtml(string sHtml)
{
    int i;

    // Where does the host name appear
    i = sHtml.IndexOf("host");
    if (i == 0) return null;

    // Where is its value?
    i = sHtml.IndexOf("value=", i);
    if (i == 0) return null;

    // Point after the first quote after value=
    i += 7;

    // start parsing for the next quote after i
    int x = sHtml.IndexOf("\"", i);
    if (x == 0) return null;

    // Return the ip address
    return sHtml.Substring(i, x - i);
}

It worked with the HTML from that web site.

Steve


 
O-('' Q)

> Now that the actual task is done, here's a *relatively* direct...

Wow... you've shown me a lot, Jon. I think once I get over the initial
hurdles, I will really enjoy c# overall. That translation is nice but,
like you said, the regular expression just really seems the better
route to do this.

I just wish I began learning this long ago, when it first presented
itself to me. I thought I would stick it out with Delphi, but when MS
bought out the original dev team over at Borland, things just turned
ugly in Delphi.

The VS2005 has a lot of the old Delphi feel to it, so it's becoming
more and more comfortable for me thanks, in part, to people like you,
Jon.
 
Jon Skeet [C# MVP]

Steve said:
> My go at the parsing routine (not using RegEx) is shown below. It assumes
> that you retrieved the html and passed it as a parameter to this function.

<snip>

Very close. The only problem is that IndexOf returns -1 if it can't
find the string, not 0. Other than that, I think it's correct.
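
For reference, a sketch of the routine with that one change applied, so the not-found checks compare against -1. Two things here are assumptions rather than part of the posted code: the wrapper class HtmlParsing, and searching for value=" (quote included) so that the skip length comes from the marker itself instead of a hard-coded 7.

```csharp
using System;

public static class HtmlParsing
{
    // The parsing routine with the not-found checks changed from 0 to -1.
    public static string ParseHtml(string sHtml)
    {
        // Where does the host name appear?
        int i = sHtml.IndexOf("host", StringComparison.Ordinal);
        if (i == -1) return null;

        // Where is its value? The marker includes the opening quote.
        const string marker = "value=\"";
        i = sHtml.IndexOf(marker, i, StringComparison.Ordinal);
        if (i == -1) return null;

        // Point just after the opening quote.
        i += marker.Length;

        // Find the closing quote.
        int x = sHtml.IndexOf('"', i);
        if (x == -1) return null;

        // Return the IP address.
        return sHtml.Substring(i, x - i);
    }

    public static void Main()
    {
        string html = "<input type=\"text\" name=\"host\" size=\"45\" value=\"213.146.158.130\">";
        Console.WriteLine(ParseHtml(html) ?? "IP address unavailable");
    }
}
```

An index of 0 is a perfectly valid result (the text was found at the very start of the string), which is why it cannot double as the "not found" signal.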

Jon
 
Steve Barnett

That's Ok, "O-('' Q)" seems to have stopped listening to anything I have to say
anyway.
Steve
 
O-('' Q)

> That's Ok, "O-('' Q)" seems to have stopped listening to anything I have to say
> anyway.
> Steve

Not that I have stopped listening, hehe, just that I have become sorta
sidetracked. I have been meaning to reply to you and thank you for your
help, Steve. I do appreciate it and notice it. Sorry for my delay. :)

Also, just call me Kirby.

O-("Q)
 
