Is RegEx a good choice for reading malformed xml?

Terry Olsen

I download xml logs from several servers every day and read the data out of
them using the XmlTextReader. But about 10% of them each day throw
exceptions because they are not well formed. I don't want to lose the data
in the files that won't load into an XmlDocument. So I was thinking maybe
using a RegEx function, sending a Node Name to the function and having it
return the InnerText.

Is this a good use for RegEx, or is there a better way to do what I want?
I'm not versed in RegEx either, so what would a RegEx expression look like
for this?

Thanks.
 
Jay B. Harlow [MVP - Outlook]

Terry,
| Is this a good use for RegEx, or is there a better way to do what I want?
IMHO The "better" way, i.e. the *correct* way, would be to correct the
program that allegedly is writing Xml to *actually write* Xml, (have it use
a "parser" & write well formed Xml) then your program would not (should not)
have an issue reading valid Xml!

For details see "Item 29 - Always Use a Parser" in Elliotte Rusty Harold's
excellent book "Effective XML - 50 Specific Ways to Improve Your XML" from
Addison-Wesley.



Although RegEx could possibly parse the malformed Xml, what's to say the
source program will write its bad Xml consistently enough that your regex
can read it?
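
As for what such an expression might look like, here is a minimal sketch. It
is written in Python for brevity; the pattern itself should carry over to the
.NET Regex class if you use RegexOptions.Singleline so that "." also matches
newlines. The function names are made up for illustration. It tries a real
parser first and only falls back to the regex when the file is not well
formed, and the regex assumes simple, non-nested elements.

```python
import re
import xml.etree.ElementTree as ET


def inner_texts_regex(text, node_name):
    """Return the text between each <node_name ...> ... </node_name> pair.

    Salvage tool only: assumes the element is not nested inside itself
    and ignores comments, CDATA sections and entities.
    """
    name = re.escape(node_name)
    pattern = re.compile(
        r"<" + name          # opening tag name
        + r"(?:\s[^>]*)?"    # optional attributes
        + r"(?<!/)>"         # end of opening tag, but not a self-closing "/>"
        + r"(.*?)"           # shortest possible run of content
        + r"</" + name + r"\s*>",
        re.DOTALL,           # let "." span line breaks
    )
    return [m.strip() for m in pattern.findall(text)]


def inner_texts(text, node_name):
    """Use a real XML parser when possible; fall back to the regex otherwise."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well formed: recover whatever complete elements we can.
        return inner_texts_regex(text, node_name)
    return ["".join(el.itertext()).strip() for el in root.iter(node_name)]
```

Note the limitation this illustrates: if an element such as
<status>FAIL is never closed, the regex simply skips it without any error, so
you can lose data silently. That is exactly the risk with salvaging bad Xml
this way.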

Before using RegEx to parse out enough info to throw an exception, I would
consider using alternate Xml Parsers/readers, such as the SgmlReader from
Got Dot Net:

http://www.gotdotnet.com/Community/...mpleGuid=b90fddce-e60d-43f8-a5c4-c3bd760564bc


Some RegEx resources:

Expresso:
http://www.ultrapico.com/Expresso.htm

RegEx Workbench:
http://www.gotdotnet.com/Community/...mpleGuid=c712f2df-b026-4d58-8961-4ee2729d7322

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/d...l/cpconRegularExpressionsLanguageElements.asp

Expresso & RegEx Workbench are helpful tools for learning regular
expressions & testing them.

I use regular-expressions.info as a general regex reference, then fall
back to MSDN for the specifics. The above link is .NET 1.x; I don't have the
.NET 2.0 link handy; not sure if anything changed in 2.0.

--
Hope this helps
Jay B. Harlow [MVP - Outlook]
.NET Application Architect, Enthusiast, & Evangelist
T.S. Bradley - http://www.tsbradley.net


 
