Splitting a string with a string

CSharper

I have an HTML page that I retrieved and stored in a string, and I want
to split it on its <td> elements. As far as I know, String.Split can only
split on characters. The other option is to traverse the string and split
it by hand. Is there any other way to split a string using a string as
the delimiter?
Thanks,
 
Jon Skeet [C# MVP]

Use Regex.Split.

Jon
 
Pavel Minaev

Yes; use String.Split. It has an overload which takes String (not
char) delimiters:

public string[] Split(
    string[] separator,
    StringSplitOptions options
)
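
A minimal sketch of that overload in use. The sample string and variable names are made up for illustration, and the code is written with top-level statements (C# 9 or later); on an older compiler, wrap the statements in a Main method.

```csharp
using System;

// Hypothetical sample input; any string containing the delimiter would do.
string html = "<tr><td>one</td><td>two</td></tr>";

// The separators are whole strings, not individual characters.
string[] separators = { "<td>" };

// RemoveEmptyEntries drops the empty strings produced when the input
// starts with a delimiter or contains two delimiters in a row.
string[] parts = html.Split(separators, StringSplitOptions.RemoveEmptyEntries);

foreach (string part in parts)
{
    Console.WriteLine(part);
}
```

Note that this only splits on the literal text "<td>". An opening tag that carries attributes, such as <td class="x">, would not be treated as a delimiter.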
 
Nicholas Paldino [.NET/C# MVP]

CSharper,

Have you taken a look at the Regex class? Specifically, the Split
method on the Regex class?
 
Göran Andersson

As suggested, the Regex class also has a Split method, but you can do
better than that with a regular expression.

You can use the pattern "<td[^>]*>([\w\W]*?)</td>" with the Regex.Matches
method to find the contents of all td elements in the string.

<td[^>]*> matches the opening tag even if it has attributes
[^>] matches any character except >
* means zero or more matches
() captures the value
[\w\W] matches any character, including line breaks
*? makes a non-greedy match, so that it ends at the first </td>, not the
last

Note: This doesn't work well if you have nested tables.
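
The pattern above can be tried out with something like the following. This is only a sketch: the sample HTML and variable names are invented, it uses Regex.Matches to walk every match, and it is written with top-level statements (C# 9 or later), so on an older compiler the statements would go inside a Main method.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input with one plain cell and one cell that has
// an attribute and a line break in its contents.
string html = "<table><tr><td>one</td><td class=\"x\">two\nlines</td></tr></table>";

// The pattern quoted in the post; a verbatim string avoids double escaping.
string pattern = @"<td[^>]*>([\w\W]*?)</td>";

var cells = new List<string>();
foreach (Match m in Regex.Matches(html, pattern))
{
    // Groups[1] is the first parenthesised capture: the cell contents.
    cells.Add(m.Groups[1].Value);
}

foreach (string cell in cells)
{
    Console.WriteLine(cell);
}
```

The match is case-sensitive as written; passing RegexOptions.IgnoreCase to Regex.Matches would also pick up <TD> tags.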
 
Jon Skeet [C# MVP]


Which part didn't you understand? In the Regex class, there's a Split
method. Construct an appropriate regex, and call the Split method.
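
As a rough sketch of what that might look like: the regex below is just one possible choice (an opening td tag with optional attributes), the sample input is made up, and the code assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input.
string html = "<tr><td>one</td><td class=\"x\">two</td></tr>";

// Assumed pattern: "<td", then any characters other than '>', then '>'.
// It has no capture groups, so the delimiters are not included in the result.
string[] pieces = Regex.Split(html, @"<td[^>]*>");

foreach (string piece in pieces)
{
    Console.WriteLine(piece);
}
```

If the input begins with a match, the first element of the result is an empty string, so callers may want to skip empty entries.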

As Pavel mentioned, String now also has a Split overload that takes an
array of delimiter strings instead of chars. It's "new" in .NET 2.0, but
hopefully that won't be an issue for you.

Jon
 
Nicholas Paldino [.NET/C# MVP]

Haha, I believe he was talking about himself, as in "Duh, why didn't I
figure that out?"


--
- Nicholas Paldino [.NET/C# MVP]
 
Maxwell

I actually used this functionality quite heavily recently, to home in on
an encoded URL in a web page's source. I split the string at a "<td
id=\"...\">" element, or something similar, that occurred once and was
unique, and took the second part.
Then I took the first part of a split at "</td>".
Then I took the second part of a split at "<a href=\"".
Then I took the first part of a split at ">".
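
The steps above could be sketched roughly as follows. The page fragment, the id value "link" and the URL are all hypothetical, the trailing quote is trimmed as an extra step not described above, and the code assumes top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical page fragment; the id, URL and markup are invented.
string page = "<table><tr><td id=\"link\">"
            + "<a href=\"http://example.com/a%20b\">go</a>"
            + "</td></tr></table>";

// StringSplitOptions.None keeps empty entries, so the indexes below stay
// predictable even when a delimiter sits at the very start of the string.
string afterCell = page.Split(new[] { "<td id=\"link\">" }, StringSplitOptions.None)[1];
string cellBody = afterCell.Split(new[] { "</td>" }, StringSplitOptions.None)[0];
string afterHref = cellBody.Split(new[] { "<a href=\"" }, StringSplitOptions.None)[1];

// Splitting at ">" leaves the closing quote of the attribute behind,
// so it is trimmed here (an assumption about the markup).
string url = afterHref.Split(new[] { ">" }, StringSplitOptions.None)[0].TrimEnd('"');

Console.WriteLine(url);
```

Indexing with [1] throws if the delimiter is missing, so real code would want to check the length of each split result first.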


 
