Parsing some info out of HTML - can you help me please?

almurph

Hi everyone,


I have an XML document (shown below) and I am trying to parse out the
information beginning at "Member Type:" and ending at "Email:".

Can you help me, please? Any suggestions, comments, or code samples
are much appreciated.

Thanking you,
Al.



****** XML Document Below ******

- <html xmlns="http://www.w3.org/1999/xhtml">
- <!-- #BeginTemplate "/Templates/faseb_main.dwt"
-->
- <!-- DW6
-->
- <head>
- <!-- #BeginEditable "doctitle"
-->
<title>Federation of American Societies for Experimental Biology
(FASEB)</title>
- <!-- #EndEditable
-->
<link rel="STYLESHEET" type="text/css" href="faseb_included/
style.css" />
- <style type="text/css">
- <!--
div.Section1
{page:Section1;}
.style1 {font-size: .8em; color: #0E469B; font-family: Tahoma, sans-serif;}


-->
</style>
- <!-- #BeginEditable "javainsert"
-->
- <!-- #EndEditable
-->
- <script language="JavaScript" type="text/JavaScript">
- <![CDATA[ // mmLoadMenus()

]]>
- <!--
function popUp(URL) {
day = new Date();
id = day.getTime();
eval("page" + id + " = window.open(URL, '" + id + "',
'toolbar=0,scrollbars=0,location=0,statusbar=0,menubar=0,resizable=1,width=760,height=550,left
= 452.5,top = 414.5');");
}
//

-->
- <script type="text/javascript">
- <![CDATA[
if(ns4)_d.write("<scr"+"ipt type=text/javascript src=Faseb_included/
mmenuns4.js><\/scr"+"ipt>");
else _d.write("<scr"+"ipt type=text/javascript src=Faseb_included/
mmenudom.js><\/scr"+"ipt>");


]]>
</script>
<script type="text/javascript" src="Faseb_included/menu_data.js" />
</head>
- <body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0"
marginheight="0">
- <div align="center">
- <div align="center">
- <table border="0" cellspacing="0" cellpadding="0" id="table72">
- <tr>
- <td style="height: 103px; width: 790px;" id="remove1">
- <div align="center">
- <table width="790" border="0" cellpadding="0" cellspacing="0"
id="table73">
- <tr>
- <td>
- <map id="FPMap0" name="FPMap0">
<area alt="" href="http://www.faseb.org/" shape="rect" coords="12,
29, 171, 95" />
</map>
<img alt="" border="0" src="Faseb_Included/fasebheader.gif"
width="790" height="103" usemap="#FPMap0" id="Header" />
</td>
</tr>
</table>
</div>
</td>
</tr>
- <tr>
- <td style="height: 1px; width: 790px;" id="remove">
- <table border="0" cellpadding="0" cellspacing="0" height="25"
width="100%" bgcolor="#000000" id="table74">
- <tr>
- <td id="menu">
<script type="text/javascript" src="Faseb_Included/
menu_selections.js" />
</td>
</tr>
</table>
</td>
</tr>
</table>
</div>
- <table width="790" border="0" cellspacing="0" cellpadding="0"
id="table53">
- <tr valign="top">
- <div align="left">
- <div align="left">
- <!-- Search Google
-->
- <form id="searchbox_011027697650713911188:krkcv01jvpi"
action="http://www.faseb.org/search/results.htm">
<input type="hidden" name="cx"
value="011027697650713911188:krkcv01jvpi" />
<input name="q" type="text" size="20" />
<input type="submit" name="sa" value="Search" style="color: #000000;
font-size: 10px; font-family: Tahoma; border: 1px solid #000000;
background-color: #C0C0C0" class="main" />
<input type="hidden" name="cof" value="FORID:11" />
</form>
<script type="text/javascript" src="Faseb_included/search.js" />
- <!-- Search Google
-->
- <!-- #BeginEditable "Address"
-->
- <p>
<img alt="" src="Faseb_Included/address.gif" />
</p>
- <!-- #EndEditable
-->
<p style="azimuth:left" />
<p style="azimuth:left" />
<p> </p>
</div>
</div>
</td>
- <td rowspan="3" background="Faseb_Included/leftbg.gif"
id="sidegoogle" style="width: 10px">
<img alt="" src="Faseb_Included/spacer.gif" width="10" height="1" />
</td>
- <td rowspan="2" style="background-color: #FFFFFF; width: 615px;
border-right: black thin solid; border-top: black thin solid; border-
left: black thin solid; border-bottom: black thin solid;">
- <table border="0" cellspacing="0" cellpadding="3" id="table65"
style="width: 100%">
- <tr>
- <td style="height: 30px; width: 602px;" id="ads">
- <table border="0" cellpadding="5" cellspacing="0" id="table66"
style="background-color: #e7e7e7; width: 101%;">
- <tr>
- <td style="width: 613px">
- <p>
- <span class="style1">
- <!-- #BeginEditable "navbar"
-->
Membership Directory
- <!-- #EndEditable
-->
</span>
</p>
</td>
</tr>
</table>
</td>
</tr>
</table>
- <table width="100%" border="0" cellspacing="0" cellpadding="3"
id="table70">
- <tr valign="top">
- <td style="width: 618px; height: 208px;">
- <form name="aspnetForm" method="post" action="MemberInfo.aspx?
DirID=100000" id="aspnetForm">
- <div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET"
value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT"
value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/
wEPDwUKMTM0OTUyOTI4Nw9kFgJmD2QWAgIBD2QWAgIBD2QWBgIBDw8WAh4EVGV4dAUWSnVhbiBQYXRyaWNpbyBOb2d1ZWlyYWRkAgsPDxYIHglGb3JlQ29sb3IKJR8ABSNUaGlzIE1lbWJlciBiZWxvbmdzIHRvIG9uZSBzb2NpZXR5Lh4EXyFTQgIEHgdWaXNpYmxlZ2RkAg0PPCsACQEADxYGHgpEYXRhTWVtYmVyBQVUYWJsZR4IRGF0YUtleXMWAB4LXyFJdGVtQ291bnQCAWQWAmYPZBYCZg8VFBFFbmRvY3JpbmUgU29jaWV0eRhGZWxsb3cvU3R1ZGVudCBBc3NvY2lhdGUCTUQESnVhbghOb2d1ZWlyYQ9FbmRvY3Jpbm9sb2dpc3QiVW5pdiBvZiBNZWRpdGVycmFuZWUsIERlcHQgb2YgRW5kbw8xMTkgYmQgamVhbm5lIGQAAAlNYXJzZWlsbGUABTEzMDA1BkZyYW5jZQ4zMy0wNi0xOTc2NjgxNAASanVhbnBub2dAeWFob28uY29tEmp1YW5wbm9nQHlhaG9vLmNvbQoxMi8zMC8yMDA2ZEp1YW4gUGF0cmljaW8gTm9ndWVpcmENVW5pdiBvZiBNZWRpdGVycmFuZWUsIERlcHQgb2YgRW5kbw0xMTkgYmQgamVhbm5lIGQNDQ1NYXJzZWlsbGUgLCAxMzAwNQ1GcmFuY2VkZJ9HcE0erIMB8F4iFJFSAD4qNiqs" />
</div>
- <script type="text/javascript">
- <![CDATA[
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}



]]>
</script>
- <div>
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION"
value="/wEWAwL/l/7/AQKfgeHyDAL1m+DgCyWmSkpMB7WyVHbfYdw77D3CY8pO" />
</div>
- <table id="Header" style="width: 592px; height: 56px">
- <tr>
- <td id="welcome" colspan="2" style="font-weight: bold; width: 462px;
font-family: Arial">
<span id="ctl00_MainContent_lblWelcome">Juan Patricio Nogueira</
span>
</td>
- <td id="options" style="vertical-align: middle; width: 270px; text-
align: right">
<a id="ctl00_MainContent_lnkProfile" href="javascript:__doPostBack
('ctl00$MainContent$lnkProfile','')" style="display:inline-block;width:
100px;">Edit my profile</a>
<a id="ctl00_MainContent_hplSearch" href="javascript:history.back
();" style="display:inline-block;width:72px;">Search</a>
<a id="ctl00_MainContent_lnkLogout" href="javascript:__doPostBack
('ctl00$MainContent$lnkLogout','')" style="display:inline-block;text-
decoration:underline;width:72px;">Log Out</a>
</td>
</tr>
- <tr>
- <td id="title" colspan="3" style="vertical-align: middle; text-
align: center">
<span id="ctl00_MainContent_lblTitle" style="font-size:X-Large;font-
weight:bold;">Member Info</span>
</td>
</tr>
</table>
- <table style="width: 592px">
- <tr>
- <td style="vertical-align: middle; width: 100px; text-align: right">
<span id="ctl00_MainContent_lblInfo" style="display:inline-
block;color:Blue;width:580px;">This Member belongs to one society.</
span>
</td>
</tr>
- <tr>
- <td style="width: 100px">
- <table id="ctl00_MainContent_dlMemberInfo" cellspacing="0"
cellpadding="3" rules="rows" border="1" style="background-
color:White;border-color:#0E469B;border-width:1px;border-
style:Solid;font-family:Tahoma;height:200px;width:591px;border-
collapse:collapse;">
- <tr>
- <td style="color:#0E469B;background-color:#E7E7E7;">
- <div style="padding-right: 0px; padding-left: 0px; font-weight:
normal; font-size: 12pt; padding-bottom: 0px; line-height: normal;
padding-top: 0px; font-style: normal; font-variant: normal">
<i>Endocrine Society</i>
</div>
- <div style="padding-right: 0px; padding-left: 0px; font-size: 10pt;
padding-bottom: 0px; padding-top: 0px">
<br />
<br />
<b>Member Type:</b>
Fellow/Student Associate
<br />
<b>Degrees:</b>
MD
<br />
<b>First Name:</b>
Juan
<br />
<b>Last Name:</b>
Nogueira
<br />
<b>Title:</b>
Endocrinologist
<br />
<b>Institution:</b>
Univ of Mediterranee, Dept of Endo
<br />
<b>Address 1:</b>
119 bd jeanne d
<br />
<b>Address 2:</b>
<br />
<b>Address 3:</b>
<br />
<b>City:</b>
Marseille
<br />
<b>State/Province:</b>
<br />
<b>Zip/Postal:</b>
13005
<br />
<b>Country:</b>
France
<br />
<b>Work Phone:</b>
33-06-19766814
<br />
<b>Fax:</b>
<br />
<b>Email:</b>
<a id="lblemailsoc" href="mailto:[email protected]" style="color:
navy; text-decoration: underline">[email protected]</a>
<br />
<b>Member Since:</b>
12/30/2006
<br />
</div>
- <div style="padding-right: 0px; padding-left: 0px; font-size: 10pt;
padding-bottom: 0px; padding-top: 0px">
<textarea cols="1" rows="7" style="WIDTH: 496px; overflow: auto;
height: 128px;" readonly="readonly" wrap="off">Juan Patricio Nogueira
Univ of Mediterranee, Dept of Endo 119 bd jeanne d Marseille , 13005
France</textarea>

</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
</form>
- <!-- #BeginEditable "main"
-->
</td>
<td class="main" style="margin-top: 0; margin-bottom: 0; height:
208px;" id="rightbg" />
</tr>
</table>
- <table width="100%" border="0" cellpadding="0" cellspacing="0"
background="Faseb_Included/hrback.gif" id="table71" style="height:
1px">
- <tr>
- <td style="height: 5px">
<img alt="" src="Faseb_Included/spacer.gif" height="5" width="5" />
</td>
</tr>
</table>
- <p align="center">
<font color="#000000" size="2" face="Tahoma">The Federation of
American Societies for Experimental Biology (FASEB) advances
biological science through collaborative advocacy for research
policies that promote scientific progress and education and lead to
improvements in human health.</font>
</p>
- <p align="center">
<span style="font-size: 10.0pt; font-family: Tahoma; color:
black">©2007 Federation of American Societies for Experimental
Biology</span>
<br />
- <span class="main">
<a href="mailto:[email protected]?subject=FASEB.org%20Technical
%20Issue">For Technical Issues</a>
</span>

- <span class="main">
<a href="mailto:[email protected]?subject=FASEB%20website
%20query">For other questions or comments</a>
</span>
</p>
</td>
<td rowspan="3" style="width: 1px; background-image: url
(Faseb_included/rightbg1.gif);" />
</tr>
</table>
</div>
- <div align="center">
- <table width="790" border="0" cellpadding="0" cellspacing="0">
- <tr>
- <td height="20">
<img alt="" border="0" src="Faseb_Included/footer.gif" width="790"
height="28" />
</td>
</tr>
</table>
</div>
- <script language="javascript" type="text/javascript">
- <![CDATA[
if(document.getElementById("ctl00$MainContent$HiddenField")!
=null)
{
window.scrollBy(0,document.getElementById
("ctl00$MainContent$HiddenField").value);
}


]]>
</script>
</body>
- <!-- #EndTemplate
-->
</html>
 
J.B. Moreno

Mark Rae said:
Two methods spring immediately to mind: String manipulation and Regular
Expressions.

The one that springs immediately to my mind is XmlDocument and its
various node-selection methods (such as SelectSingleNode).
 
Helmut Giese

Regular Expressions represent a much better solution, IMO:
http://dotnetperls.com/regex-match-use
Hm,
as powerful as REs are, they are _not_ particularly good at working
with nested data, and XML can be arbitrarily nested.
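
To make the nesting point concrete, here is a minimal sketch in Python (not .NET; the two sample strings are made up for illustration and are not taken from the posted page). It shows a non-greedy and a greedy pattern each capturing the wrong span, while a parser reports each element with its own content.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
nested = "<div>outer <div>inner</div> tail</div>"
siblings = "<p><div>a</div><div>b</div></p>"

# Non-greedy: stops at the first </div>, cutting the outer element short.
lazy = re.search(r"<div>(.*?)</div>", nested).group(1)

# Greedy: runs to the last </div>, swallowing both sibling elements.
greedy = re.search(r"<div>(.*)</div>", siblings).group(1)

# A parser tracks nesting, so the inner element is found on its own.
root = ET.fromstring(nested)
inner_text = root.find("div").text

print(repr(lazy))        # 'outer <div>inner'
print(repr(greedy))      # 'a</div><div>b'
print(repr(inner_text))  # 'inner'
```

For flat, predictable fragments a regular expression can still be adequate; the trouble starts once the same tag can appear inside itself.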

Al,
in my experience parsing XML "by hand" is tedious and error-prone, and
it does not help you in the long run.
Unless you are absolutely certain that you will never have to deal
with XML again, I would strongly suggest that you learn
to use a _real_ XML parser. For your task at hand this might take a
bit longer, but the knowledge will pay off in the future.
Just my 0.02, of course.
Good luck
Helmut Giese
 
Christoph Basedau

I have an XML document (with the XML shown
below) and i am trying to parse out the information beginning "Member
Type:" and ending "Email:"

If it's well-formed X(HT)ML, using System.Xml.XmlDocument is probably
the easiest way to grab the data, because you let .NET do
the parsing.
Having loaded the doc, you call SelectNodes, then
check InnerText and NextSibling on each node, and you're done.
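
The same load, select, and read-the-sibling idea can be sketched in Python with the standard xml.etree.ElementTree module (an analogue for illustration, not the .NET API). The sample below is a trimmed, well-formed copy of the member-info block; the e-mail address is a placeholder, and the function name `extract_fields` is hypothetical. In ElementTree the text that follows a `<b>` label is its `tail`, which plays the role of the NextSibling text node.

```python
import xml.etree.ElementTree as ET

# The page declares the XHTML namespace, so element names must be qualified.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Trimmed sample; the e-mail address is a placeholder, not real data.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div>
<b>Member Type:</b>
Fellow/Student Associate
<br />
<b>Degrees:</b>
MD
<br />
<b>Address 2:</b>
<br />
<b>Email:</b>
<a id="lblemailsoc" href="mailto:someone@example.org">someone@example.org</a>
<br />
<b>Member Since:</b>
12/30/2006
<br />
</div>
</body></html>"""


def extract_fields(xhtml, first="Member Type:", last="Email:"):
    """Return {label: value} for the <b> labels from `first` through `last`."""
    root = ET.fromstring(xhtml)
    fields = {}
    collecting = False
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != XHTML_NS + "b":
                continue
            label = (child.text or "").strip()
            if label == first:
                collecting = True
            if not collecting:
                continue
            # The value is the text right after the <b> element.
            value = (child.tail or "").strip()
            # For "Email:" the value sits inside the following <a> element.
            if not value and i + 1 < len(children):
                nxt = children[i + 1]
                if nxt.tag == XHTML_NS + "a":
                    value = (nxt.text or "").strip()
            fields[label.rstrip(":")] = value
            if label == last:
                return fields
    return fields


fields = extract_fields(SAMPLE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

This only works on input that parses as XML. The text as pasted in the thread, with its leading "- " tree-view markers, would have to be replaced by the raw page source first.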

Christoph

 
