Regular Expressions Help

JoeW

Sorry if I am asking something that has already been asked, but I am
somewhat stuck on a problem with regular expressions. My task is
simple: take an HTTP URL, remove all the characters after the .com, .org,
or .edu TLD, and return the resulting string. So far all my attempts
have been unsuccessful. Basically, all I want to do is take the URL
passed in, strip out the unwanted parts, and return the new string so I
can add it to a database for link tracking. Any pointers or maybe a
kick in the right direction?

Thanks in advance for any and all suggestions.
 
JoeW

Disregard this; I found a solution here:

http://regexlib.com/RETester.aspx?regexp_id=1577

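The code I used was as follows:

Regex r = new Regex(
    @"(?<protocol>http(s)?|ftp)://" +
    @"(?<server>([A-Za-z0-9-]+\.)*(?<basedomain>[A-Za-z0-9-]+\.[A-Za-z0-9]+))+" +
    @"((/?)(?<path>(?<dir>[A-Za-z0-9\._\-]+)(/){0,1}[A-Za-z0-9.-/]*)){0,1}",
    RegexOptions.Compiled);
// .NET numbers named groups after the unnamed ones, so "$6://$7" is the
// same as the protocol and server groups referenced by name here.
return r.Match(url).Result("${protocol}://${server}");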
 
Alberto Poblacion

You don't need to use Regex for that. It's much simpler to use the
System.Uri class, which has a variety of properties that will give you the
various parts of your URL. In your case, I believe that the .Host property
will give you exactly what you want.
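
Here is a minimal sketch of that approach, assuming the goal is the scheme plus host, as in the regex result above. The class and method names (LinkTracker, GetSiteRoot) are made up for illustration; only Uri.TryCreate, Uri.Host, and Uri.GetLeftPart come from the framework.

```csharp
using System;

public static class LinkTracker
{
    // Hypothetical helper: returns the scheme and authority of an absolute
    // URL (for example "http://www.example.com"), or null if the input
    // cannot be parsed as an absolute URI.
    public static string GetSiteRoot(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return null;
        }

        // uri.Host alone would be just the host name, without the scheme.
        // GetLeftPart(UriPartial.Authority) keeps the scheme as well and
        // drops the path, query string, and fragment.
        return uri.GetLeftPart(UriPartial.Authority);
    }
}
```

If only the host name is needed for the tracking table, reading uri.Host directly should be enough; GetLeftPart is the closer match when the stored value has to include the protocol.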
 
JoeW

Alberto,

Really? I completely forgot about System.Uri. Regular expressions
rack my brain, so I guess I overlooked the solution you gave. I will
give that a shot. Thanks again for your help.
 
