Search for values in between two values in a string?


pat

The subject may not be written well, but what I'm trying to do is
search for a value in a string by searching for its span class. The
string is an HTML file, so what I want is the value between <span
class="myclass"> and </span>.

Is there anything in VB.NET that can do that? C#.NET has a function,
but I can't seem to find it in VB.NET. Thanks
 

Tom Shelton

The subject may not be written well but what I'm trying to do is
search for a value in a string by searching for its span class. The
string is an HTML file so what I want is the value in between <span
class="myclass"> and </span>

Is there anything in vb.net that can do that? c#.net has a function,
but I can't seem to find it in vb.net. Thanks

What function were you thinking existed in C#? C# has no intrinsic functions
like VB. It only uses methods of objects in the framework...

But, if I were going to do this, I would probably look at using
System.Text.RegularExpressions, or look at some combination of String.IndexOf
and String.Substring...
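As a rough illustration of the regular-expression route, here is a minimal sketch in JavaScript (the other language that comes up in this thread) rather than VB.NET; the same pattern should carry over to .NET's System.Text.RegularExpressions.Regex class with RegexOptions.IgnoreCase. The helper name getSpanValues is hypothetical. The sketch assumes the attribute is written exactly as class="myclass" (double quotes, a single class) and that these spans are not nested; regular expressions get brittle on arbitrary HTML.

```javascript
// Hypothetical helper: return the inner text of every
// <span class="NAME">...</span> found in an HTML string.
// Assumes class="NAME" is written exactly like that and spans are not nested.
function getSpanValues(html, className) {
  // Escape regex metacharacters in the class name before building the pattern.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // [^>]* tolerates other attributes after class="..."; ([\s\S]*?) lazily
  // captures everything up to the first closing </span>.
  // "i" makes tag/attribute names case-insensitive (and the class name too).
  const pattern = new RegExp(
    '<span\\s+class="' + escaped + '"[^>]*>([\\s\\S]*?)</span>',
    "gi"
  );
  const values = [];
  for (const match of html.matchAll(pattern)) {
    values.push(match[1]); // group 1 = text between the tags
  }
  return values;
}

const html =
  '<span class="myclass">text</span>\n' +
  '<span class="myclass2">text2</span>\n' +
  '<span class="myclass2">text2.2</span>';

console.log(getSpanValues(html, "myclass"));  // [ 'text' ]
console.log(getSpanValues(html, "myclass2")); // [ 'text2', 'text2.2' ]
```

Returning an array rather than a single string means a class that appears several times is handled the same way as one that appears once; take the first element if you only expect one.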
 

pat

What function were you thinking existed in C#?  C# has no intrinsic functions
like VB.  It only uses method of objects in the framework....

But, if I was going to do this, I would probably look at using
System.Text.RegularExpressions, or look at some combination of String.IndexOf
and String.Substring...

I'll check those out. Here is the code I found for C#; it looks like I
misread it, because it was written by someone else and just named
GetStringInBetween. Oops!


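// requires: using System;
public static string[] GetStringInBetween(string strBegin,
    string strEnd, string strSource,
    bool includeBegin, bool includeEnd)
{
    // result[0]: the text found between strBegin and strEnd
    // result[1]: what is left of strSource after strEnd
    string[] result = { "", "" };

    int iIndexOfBegin = strSource.IndexOf(strBegin, StringComparison.Ordinal);

    if (iIndexOfBegin != -1)
    {
        // include the Begin string if desired
        if (includeBegin)
            iIndexOfBegin -= strBegin.Length;

        strSource = strSource.Substring(iIndexOfBegin + strBegin.Length);

        // search for End only after the Begin string, even when it is included
        int iSearchFrom = includeBegin ? strBegin.Length : 0;
        int iEnd = strSource.IndexOf(strEnd, iSearchFrom, StringComparison.Ordinal);

        if (iEnd != -1)
        {
            // position just past the End string; computed once so that
            // includeEnd does not skip strEnd.Length characters twice
            int iAfterEnd = iEnd + strEnd.Length;

            // include the End string if desired
            if (includeEnd)
                iEnd = iAfterEnd;

            result[0] = strSource.Substring(0, iEnd);

            // advance beyond this segment
            if (iAfterEnd < strSource.Length)
                result[1] = strSource.Substring(iAfterEnd);
        }
    }
    else
    {
        // stay where we are
        result[1] = strSource;
    }

    return result;
}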
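For comparison, here is a minimal JavaScript sketch of the same helper, built on indexOf and substring just as Tom suggested for .NET. The name getStringInBetween only mirrors the C# method; it is not a built-in. It computes the position just past the end marker once, so the returned remainder is right whether or not includeEnd is set, and it searches for the end marker only after the begin marker.

```javascript
// Hypothetical port of GetStringInBetween.
// Returns [found, rest]: the text between the markers and the remainder
// of the source after the end marker.
function getStringInBetween(begin, end, source, includeBegin, includeEnd) {
  const start = source.indexOf(begin);
  if (start === -1) {
    return ["", source]; // begin marker not found: stay where we are
  }
  // The captured text starts at the marker or just after it.
  const valueStart = includeBegin ? start : start + begin.length;
  // Look for the end marker only after the begin marker.
  const endIndex = source.indexOf(end, start + begin.length);
  if (endIndex === -1) {
    return ["", ""]; // end marker not found (same as the C# version)
  }
  const afterEnd = endIndex + end.length; // first character after the end marker
  const valueEnd = includeEnd ? afterEnd : endIndex;
  return [source.substring(valueStart, valueEnd), source.substring(afterEnd)];
}

const sample = '<p><span class="myclass">text</span> more</p>';
const [value, rest] = getStringInBetween(
  '<span class="myclass">', "</span>", sample, false, false);
console.log(value);                 // text
console.log(JSON.stringify(rest));  // " more</p>"
```

Feeding `rest` back in as the new source lets you walk through every occurrence in turn.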
 
 

Mike

pat said:
The subject may not be written well but what I'm trying to do is
search for a value in a string by searching for its span class. The
string is an HTML file so what I want is the value in between <span
class="myclass"> and </span>

Is there anything in vb.net that can do that? c#.net has a function,
but I can't seem to find it in vb.net. Thanks

You can use XPath or LINQ to perform XML or HTML lookups. I'm not
familiar with LINQ, and I believe it requires .NET 3.5, which I haven't
upgraded to yet. But with XPath, it's fairly simple:

Example HTML file:

<!-- File: c:\spantest.htm-->
<html>
<body>
<span class="myclass">text</span>
<span class="myclass1">text1</span>
<span class="myclass2">text2</span>
<span class="myclass2">text2.2</span>
<span class="myclass2">text2.3</span>
<span class="myclass2">text2.4</span>
<span class="myclass3">text3</span>
<span class="myclass4">text4</span>
<span class="myclass5">text5</span>
</body>
</html>

VB.NET example using XPath:

Imports System
Imports System.Collections.Generic
Imports System.Xml.XPath

Class Test_Xpath

    '' Load the (well-formed) HTML file, run the XPath query and
    '' return the string value of every matching node.
    Shared Function GetHtmlValue( _
        ByVal xpq As String, _
        ByVal xmlfn As String) As String()
        Dim result As New List(Of String)
        Try
            Dim xmldoc As New XPathDocument(xmlfn)
            Dim nav As XPathNavigator = xmldoc.CreateNavigator()
            Dim iterator As XPathNodeIterator = nav.Select(xpq)
            Do While iterator.MoveNext()
                result.Add(iterator.Current.Value)
            Loop
        Catch ex As Exception
            ' Report load/query errors (e.g. a missing file or HTML that is
            ' not well-formed XML) instead of silently returning nothing.
            Console.WriteLine("Query failed: {0}", ex.Message)
        End Try
        Return result.ToArray()
    End Function

    Shared Sub DoTest1(ByVal query As String, ByVal fn As String)
        Dim res As String() = GetHtmlValue(query, fn)
        Console.WriteLine("Total DOM elements found: {0}", res.Length)
        For Each s As String In res
            Console.WriteLine("{0}", s)
        Next
        Console.ReadKey(True)
    End Sub

    Shared Sub Main()

        ' expecting one
        DoTest1("//child::*/span[@class='myclass']", "c:\spantest.htm")

        ' expecting multiple
        DoTest1("//child::*/span[@class='myclass2']", "c:\spantest.htm")

        ' expecting none
        DoTest1("//child::*/span[@class='myclassXX']", "c:\spantest.htm")

    End Sub
End Class

Of course, learning XPath expressions is the trick. They are well
documented in MSDN.

--
 
 

Mike

pat said:
I'll check those out, here is the code i found for C#, looks like I
misread it because its was created by someone else and just named
GetStringInBetween. Oops!


public static string[] GetStringInBetween(string strBegin,

Hi pat, you are overthinking the solution :) The technology is
already there for this.

When it comes to HTML, understanding DOM is useful.

If the application is on the client side (browser), then you can just
use pure JavaScript, JScript or VBScript to access the document.*
methods, properties and events.

var tag = document.getElementById("myclass");

but that finds a tag with id=value, i.e.

<span id="myclass">...</span>

The DOM has getElementsByTagName and getElementsByName, but (at least in
older browsers such as Internet Explorer 8 and earlier) no
getElementsByClassName().

So there are advanced JavaScript libraries, built with prototype-based
techniques, that are specifically designed for searching and
manipulating the HTML DOM. This is what's making Web 2.0 happen (along
with AJAX). There are hundreds of open-source JavaScript libraries; the
ones I know and use are:

prototype.js, which started the idea (I think)
jquery.js, probably the best and fastest

There are others too but I think they are bulky.

In fact, jQuery is so small, fast and sweet that Microsoft has announced
full jQuery support in ASP.NET. It is being added for VS2010 (or already
was for VS2008).

jQuery will allow you to find almost anything using a CSS-style lookup
(early versions also accepted some XPath syntax). Offhand, to get the
contents of the <span class="myclass">, the command would be:

var value = $("span.myclass")[0].innerHTML;

I think that is the correct syntax; if not, it's close. :)

Again, that's the client side.

For the server side, you can use the .NET System.Xml.XPath namespace,
as I showed in the previous post.

Keep in mind that class is an attribute that other span tags, or any
tag, can share. In general, an ID or name is more likely to be unique.
Nonetheless, in all cases the search can return a list, as shown in
the example I provided. If you expect only one match, use the first element.

Finally, as Tom has highlighted in similar XML/HTML query questions,
Microsoft's LINQ is a new, simple query language with SQL-like syntax
for finding things. So you might want to explore LINQ as well.

--
 
 

Mike

Mike said:
pat wrote:
jQuery will allow you to find anything using a CSS style lookup and
XPATH as well. Off hand, to find the <span class="myclass"> the command
would be:

var value = $("span.class")[0].innerHTML;

I think that is correct syntax, if not, its close. :)

I didn't want to leave this buggy :) The CSS style search syntax is:

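// return list of all span tags with a class="myclass"
var list = $("span.myclass");

// inner HTML of the first match, or null if nothing matched
var value = list.length > 0 ? list[0].innerHTML : null;

The selector uses the same syntax as a CSS style sheet: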

<style>
span.myclass {
color: yellow;
background: blue;
}
</style>
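One detail about the span.myclass selector worth knowing: the class attribute is a whitespace-separated list of names, so the selector also matches <span class="big myclass">, which an exact search for the text <span class="myclass"> would miss. Here is a small sketch of that matching rule (hasClass is a hypothetical helper, not part of jQuery or the DOM), assuming standards-mode HTML, where class names are compared case-sensitively.

```javascript
// Hypothetical helper showing how a class selector treats the class
// attribute: split it into whitespace-separated tokens and require an
// exact, case-sensitive match on one of them.
function hasClass(classAttribute, className) {
  return classAttribute
    .split(/\s+/)                       // "big  myclass" -> ["big", "myclass"]
    .filter((token) => token.length > 0) // drop empties from leading/trailing spaces
    .includes(className);
}

console.log(hasClass("myclass", "myclass"));     // true
console.log(hasClass("big myclass", "myclass")); // true
console.log(hasClass("myclass2", "myclass"));    // false
```

This is also why Mike's advice to prefer an id holds: an id is meant to identify a single element, while a class is designed to be shared.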
 
 

Cor Ligthert[MVP]

Mike,

In .NET and COM, the DOM is available through the MSHTML namespace (and DLL).

Cor

Mike said:
When it comes to HTML, understanding DOM is useful.

If the application is on the client side (browser), then you can just pure
JavaScript or JSCRIPT or VBSCRIPT to access the Document.* methods and
properties and events.

var tag = document.GetElementById("myclass")

but that finds a tag id=value, i.e.

<span id="myclass">...</span>

DOM has GetElementByTag and GetElementByName, but no GetElementByClass().

 
 

Herfried K. Wagner [MVP]

Mike said:
You can use XPATH or LINQ to perform XML or HTML lookups. I'm not
familar with LINQ and I believe its for .NET 3.0 with I haven't upgraded
to yet. But in XPATH, its fairly simple;

Note that this will only work with XHTML, or HTML that is "compatible" with
XHTML (i.e. well-formed XML); it will not work with every HTML document. For
the latter, using an SGML parser might be an option.
 