Regex to find new line chars within XML tags?


Burt

I'm pulling back contact data from Exchange using WebDAV. The problem
is that a third-party app is inserting stray "\r\n" sequences into the
XML data, corrupting it.

I want to remove these, but only if they occur within the XML start or
end tags (between the "<" and the ">"), since some actual data does
have new line characters. So for:

<address \r\n>15 Pine St \r\n Suite 400</address>

I'd want to remove only the first new line.

I think regex is the way to go, but I can't get the right expression.
I've tried (?<=[<])\r\n(?=>) and similar, but no luck.

Can anyone help?

Many thanks,

Burt
 
I think either of these should be enough: "(?<=<[^>]*)\r?\n", or
"\r?\n(?=[^<]*>)". In well-formed XML, I can't think of a case where you
actually need to check both ends.
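
For anyone who wants to try the second pattern, here is a minimal sketch in Python (my choice for illustration; the thread doesn't say which language is in use). I've used the lookahead form because Python's re module only accepts fixed-width lookbehind, so the first pattern won't compile there. The function name strip_newlines_in_tags is made up for this example.

```python
import re

# Lookahead pattern from the reply: a line break is treated as "inside a tag"
# when a ">" follows it with no "<" in between.
NEWLINE_IN_TAG = re.compile(r"\r?\n(?=[^<]*>)")

def strip_newlines_in_tags(xml_text: str) -> str:
    """Remove line breaks that sit between '<' and '>' (hypothetical helper)."""
    return NEWLINE_IN_TAG.sub("", xml_text)

# The example from the question, with real CR/LF characters.
sample = "<address \r\n>15 Pine St \r\n Suite 400</address>"
cleaned = strip_newlines_in_tags(sample)
print(repr(cleaned))
```

Only the break inside the start tag is removed; the one between "15 Pine St" and "Suite 400" is followed by a "<" before any ">", so the lookahead fails and it stays. Note that this assumes the element text itself never contains a literal ">".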

(I'm not sure whether this handles attribute strings correctly, but you
should be able to add that.)
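
One possible way to add the attribute handling, again as a Python sketch rather than a definitive fix: match each tag as a whole, letting quoted attribute values contain ">", and strip line breaks only inside the match. The names TAG, LINE_BREAK and strip_newlines_in_tags_quoted are hypothetical, and the sketch assumes that breaks inside attribute values should be removed too, since they also fall between "<" and ">". It does not try to handle comments or CDATA sections.

```python
import re

# A tag is "<", then any mix of plain characters and quoted strings, then ">".
# Quoted attribute values may contain ">" without ending the tag.
TAG = re.compile(r"""<(?:[^<>"']|"[^"]*"|'[^']*')*>""")
LINE_BREAK = re.compile(r"\r?\n")

def strip_newlines_in_tags_quoted(xml_text: str) -> str:
    """Remove line breaks inside tags only, leaving element text untouched."""
    # For every matched tag, delete the line breaks within that tag's text.
    return TAG.sub(lambda m: LINE_BREAK.sub("", m.group(0)), xml_text)

sample = '<a title="x > y"\r\n>line1\r\nline2</a\r\n>'
print(repr(strip_newlines_in_tags_quoted(sample)))
```

Because the text between tags is never part of a match, a literal ">" in the data (for example "1\r\n2 > 1") no longer causes the preceding line break to be removed.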

Niki
 