Hello,
I have a question about a Regex I'm trying to write. I'm trying to
strip out the DOCTYPE declaration from an XML document I'm receiving.
I've noticed that the files are delivered with different DOCTYPEs
though. Some look like this:
<!DOCTYPE nitf SYSTEM "nitf.dtd">
which I can strip out with the Regex <!DOCTYPE [^>]*>
However, some of the files are delivered like this:
<!DOCTYPE nitf SYSTEM "http://dtd.dtd" [
<!ENTITY % xhtml SYSTEM "http://dtd.dtd">
%xhtml;
]>
which forced me to write another Regex <!ENTITY [^>]*> to strip out the
ENTITY declarations. I've also noticed that a DOCTYPE can contain several
other kinds of declarations, such as ELEMENT, ATTLIST and NOTATION.
Does anyone know of a way to write a single Regex that will strip out
the entire DTD, regardless of whether it contains any sub-declarations?
Thanks in advance.
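
For reference, here is a minimal sketch of one pattern that may cover both shapes shown above, written in Python only for illustration (the names DOCTYPE_RE and strip_doctype are hypothetical). It treats the bracketed internal subset as a single optional group, so the kind of declaration inside it (ENTITY, ELEMENT, ATTLIST, NOTATION) should not matter. It assumes that no "]" appears inside the internal subset and that no ">" or "[" appears inside the quoted identifiers before it; input that breaks those assumptions would need a real XML parser instead.

```python
import re

# <!DOCTYPE, then anything up to the first '[' or '>',
# then an optional [ ... ] internal subset, then the closing '>'.
# Assumption: the internal subset itself contains no ']' character.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*(?:\[[^\]]*\])?\s*>")


def strip_doctype(xml_text):
    """Remove the first DOCTYPE declaration, with or without an internal subset."""
    return DOCTYPE_RE.sub("", xml_text, count=1)


if __name__ == "__main__":
    sample = (
        '<!DOCTYPE nitf SYSTEM "http://dtd.dtd" [\n'
        '<!ENTITY % xhtml SYSTEM "http://dtd.dtd">\n'
        "%xhtml;\n"
        "]>\n"
        "<nitf/>"
    )
    print(strip_doctype(sample))
```

The negated character class [^\]] also matches newlines, so no multi-line flag is needed for the subset to span several lines.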