Regex to find new line chars within XML tags?

Burt

I'm pulling back contact data from Exchange using WebDAV. The problem
is that a third-party app is inserting random "\r\n" sequences into the
XML data, corrupting it.

I want to remove these, but only where they occur inside the XML start or
end tags (between the "<" and the ">"), since some of the actual data does
contain newline characters. So for:

<address \r\n>15 Pine St \r\n Suite 400</address>

I'd want to remove only the first newline.

I think a regex is the way to go, but I can't get the right expression.
I've tried (?<=[<])\r\n(?=>) and similar patterns, but with no luck.
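The lookarounds in that attempt are each exactly one character wide, so the pattern only matches a "\r\n" sitting directly between "<" and ">". A quick check, written in Python purely for illustration (the thread doesn't say which regex engine is in use), shows it never fires on the sample:

```python
import re

# The attempted pattern: the character just before \r\n must be '<'
# and the character just after it must be '>'.
ATTEMPT = re.compile(r"(?<=[<])\r\n(?=>)")

sample = "<address \r\n>15 Pine St \r\n Suite 400</address>"

# No match: in the sample the \r\n is preceded by a space, not '<'.
print(ATTEMPT.search(sample))                 # None

# It only matches when the newline is the entire tag body.
print(ATTEMPT.search("<\r\n>") is not None)   # True
```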

Can anyone help?

Many thanks,

Burt
 
Niki Estner

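I think either of these should be enough: "(?<=<[^>]*)\r?\n" (a newline
preceded by a "<" with no ">" in between), or "\r?\n(?=[^<]*>)" (a newline
followed by a ">" with no "<" in between). In well-formed XML, I can't think
of a case where you actually need to check both ends.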
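A minimal sketch of the second (lookahead) pattern, again in Python for illustration; the helper name strip_tag_newlines is hypothetical. The first pattern relies on an unbounded lookbehind, which engines such as .NET's Regex accept but Python's built-in re module rejects, so only the lookahead form is used for the actual substitution here:

```python
import re

# Lookahead pattern: a CR/LF counts as "inside a tag" when a '>'
# appears after it before the next '<'.
NEWLINE_IN_TAG = re.compile(r"\r?\n(?=[^<]*>)")


def strip_tag_newlines(xml_text):
    """Remove newlines that sit between a '<' and its closing '>' (hypothetical helper)."""
    return NEWLINE_IN_TAG.sub("", xml_text)


sample = "<address \r\n>15 Pine St \r\n Suite 400</address>"
print(repr(strip_tag_newlines(sample)))
# '<address >15 Pine St \r\n Suite 400</address>'

# The lookbehind pattern needs variable-length lookbehind,
# which Python's re module refuses to compile.
try:
    re.compile(r"(?<=<[^>]*)\r?\n")
except re.error as exc:
    print("re.error:", exc)
```

The newline in the element's text survives because a "<" (from "</address>") comes before any ">".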

(I'm not sure this handles quoted attribute values correctly, since they
may legally contain ">", but you should be able to add that.)
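One way to "add that" is to match whole tags while treating quoted attribute values as opaque, then drop newlines only outside the quotes. This is a sketch under stated assumptions, not a full XML parser: it assumes newlines inside attribute values are real data to keep, and it does not handle comments, CDATA sections, or processing instructions. The helper names are hypothetical:

```python
import re

# A start/end tag; quoted attribute values are consumed as a unit so a
# '>' inside "..." or '...' does not end the tag early.
TAG = re.compile(r"""<(?:[^>"']|"[^"]*"|'[^']*')*>""")

# Within one tag: a quoted value (group 1, kept) or a bare newline (dropped).
QUOTED_OR_NEWLINE = re.compile(r"""("[^"]*"|'[^']*')|\r?\n""")


def _clean_tag(match):
    # Keep quoted values verbatim; remove CR/LF between them.
    return QUOTED_OR_NEWLINE.sub(lambda m: m.group(1) or "", match.group(0))


def strip_tag_newlines(xml_text):
    """Remove newlines inside tags but outside attribute values (hypothetical helper)."""
    return TAG.sub(_clean_tag, xml_text)


print(repr(strip_tag_newlines('<a title="x > y"\r\n>t\r\n</a>')))
# '<a title="x > y">t\r\n</a>'
```

Unlike the lookahead pattern, this keeps a newline that appears inside a quoted value such as title="x\r\ny".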

Niki