Regex Help!

c

Hello,

I have a question about a Regex I'm trying to write. I'm trying to
strip the DOCTYPE declaration out of an XML document I'm receiving.
I've noticed that the files are delivered with different DOCTYPEs,
though. Some look like this:

<!DOCTYPE nitf SYSTEM "nitf.dtd">

which I can strip out with the Regex <!DOCTYPE [^>]*>

However, some of the files are delivered like this:

<!DOCTYPE nitf SYSTEM "http://dtd.dtd" [
<!ENTITY % xhtml SYSTEM "http://dtd.dtd">
%xhtml;
]>

which forced me to write another Regex <!ENTITY [^>]*> to strip out the
ENTITY declarations. I've also noticed that a DOCTYPE can contain several
other kinds of declaration, such as ELEMENT, ATTLIST and NOTATION.
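
To make the problem concrete, here is a minimal sketch in Python (used only for illustration; it assumes the whole file has been read into one string, and the <nitf/> root element is a hypothetical stand-in for the real document body). It applies the first pattern to both DOCTYPE forms shown above:

```python
import re

# The simple pattern from above: <!DOCTYPE followed by everything up to the first '>'.
SIMPLE_DOCTYPE = re.compile(r"<!DOCTYPE [^>]*>")

# First form: no internal subset. <nitf/> is a hypothetical stand-in for the body.
plain = '<!DOCTYPE nitf SYSTEM "nitf.dtd">\n<nitf/>'

# Second form: the DOCTYPE carries an internal subset in [ ... ].
with_subset = (
    '<!DOCTYPE nitf SYSTEM "http://dtd.dtd" [\n'
    '<!ENTITY % xhtml SYSTEM "http://dtd.dtd">\n'
    '%xhtml;\n'
    ']>\n'
    '<nitf/>'
)

plain_result = SIMPLE_DOCTYPE.sub("", plain)
subset_result = SIMPLE_DOCTYPE.sub("", with_subset)

print(repr(plain_result))   # the DOCTYPE is gone
print(repr(subset_result))  # the tail of the internal subset is still there
```

The match ends at the first '>' it meets, which in the second form belongs to the ENTITY declaration rather than to the DOCTYPE, so the rest of the internal subset and the closing ]> are left in the output.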

Does anyone know a way to write one Regex that will strip out the
entire DTD, regardless of whether it contains any subdeclarations?

Thanks in advance.
 
c

I found the answer:

<!DOCTYPE[^>]*?(\[(.|\n)*?\]\s*?)*?>

Unfortunately, this doesn't work with vbscript.dll version
5.1.0.7426.
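
For anyone who wants to try the pattern outside VBScript, here is a minimal sketch using Python's re module (an assumption for illustration, not the environment I am running in). The helper name strip_doctype and the <nitf/> sample body are hypothetical; the pattern itself is unchanged from above.

```python
import re

# The pattern from this post, unchanged. (.|\n) is used so that the
# internal subset may span several lines.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*?(\[(.|\n)*?\]\s*?)*?>")


def strip_doctype(xml_text: str) -> str:
    """Remove the first DOCTYPE declaration, including any internal subset."""
    return DOCTYPE_RE.sub("", xml_text, count=1)


# Sample inputs: the two DOCTYPE forms from the question, each followed by
# a hypothetical <nitf/> root element.
plain = '<!DOCTYPE nitf SYSTEM "nitf.dtd">\n<nitf/>'
with_subset = (
    '<!DOCTYPE nitf SYSTEM "http://dtd.dtd" [\n'
    '<!ENTITY % xhtml SYSTEM "http://dtd.dtd">\n'
    '%xhtml;\n'
    ']>\n'
    '<nitf/>'
)

print(repr(strip_doctype(plain)))
print(repr(strip_doctype(with_subset)))
```

Both calls should leave only the document body. One caveat: the pattern treats the first ] followed by optional whitespace and > as the end of the internal subset, so a ]> sequence inside a comment or a quoted literal in the subset would end the match too early.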
 
