Regex Replace Help

Discussion in 'Microsoft Dot NET Framework' started by barry, Apr 29, 2008.

  1. barry

    barry Guest

    Hi

    I have a file which contains
    &
    &amp;
    &quote;

    I want to replace & with &amp; , but not &amp; or &quote;

    Will someone please help with the Regular Expression.

    TIA
    Barry
     
    barry, Apr 29, 2008
    #1

  2. barry

    barry Guest

    Strange, no one has replied.

    Looks like I have crossed the limit for asking questions on the same topic.
    Is there some limit, maybe (10 or 15 per topic)?
     
    barry, Apr 30, 2008
    #2

  3. eBob.com

    eBob.com Guest

    Hi Barry,

    Actually I wanted to play around with this for you but just haven't gotten
    around to it. (And, btw, I don't recall any previous posts from you on this
    subject.)

    Part of my response to you, the part I can provide without actually playing
    with a regex, is the following. Regular Expressions are extremely useful.
    If you do any programming the effort you put into learning regular
    expressions will be worth it. Several of us here use Expresso (from
    UltraPico) and recommend it. I just became aware of something similar
    called Regular Expression Workbench available from MSDN. I've installed it
    but have not yet played with it.

    I'll try to play with it later today but no promises.

    Bob
     
    eBob.com, Apr 30, 2008
    #3
  4. barry

    barry Guest

    Thanks for your reply

    Imagine the following string
    string str = "The Quick Black&Fox &amp; Jumped Over &quote; The & Lazy Dog";

    should be

    string str = "The Quick Black&amp;Fox &amp; Jumped Over &quote; The & Lazy Dog";

    This is a problem with a larger .xml file in which xx&xx is causing
    problems in IE.

    In fact I have just spent over 50 minutes and managed to get some results
    like this

    str = Regex.Replace(str, @"\b\s*(?=&[^&amp;|&quote;| & ])\b", "&amp;",
    RegexOptions.None);
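
One likely reason an attempt like this gives only partial results: square brackets define a character class, so [^&amp;|&quote;| & ] excludes a set of single characters (&, a, m, p, ;, | and so on) rather than the strings "&amp;" or "&quote;". The sketch below contrasts a character class with a negative look-ahead on a fragment of the problem URL; the class name CharClassVsLookahead and the helper Matches are made up for illustration and are not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassVsLookahead
{
    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // [^amp;] excludes the single characters a, m, p and ;
        // so "&pf" is rejected merely because 'p' is in the set.
        Console.WriteLine(Matches("clk=1&pf=zone", @"&[^amp;]"));   // False

        // (?!amp;) rejects only the literal sequence "amp;".
        Console.WriteLine(Matches("clk=1&pf=zone", @"&(?!amp;)"));  // True
        Console.WriteLine(Matches("a &amp; b", @"&(?!amp;)"));      // False
    }
}
```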

    And last but not least, I collect all answers posted to my Regex queries
    for later use.
     
    barry, Apr 30, 2008
    #4
  5. eBob.com

    eBob.com Guest

    Well ... I did know before I looked at this that I was far from an expert in
    regular expressions. But I know that even better now! But I think that I
    did eventually find a solution.

    I began by ignoring the replace aspect of the problem and tried to just find
    a regular expression that would find the right ampersands. At first I did
    not see how to find an "&" NOT followed by a specific STRING. But after
    some research in the great Balena book I learned that I could use what is
    called a "zero-width negative look-ahead assertion". The syntax for one of
    those is "(?!subexpr)". So using this expression
    (?<desiredamp>&(?!amp;))
    I was able to find ampersands except for those followed by "amp;". Great!
    I thought I was on my way and leapt, without sufficient thought, to
    (?<desiredamp>&((?!amp;)|(?!quote;)))
    but that catches all ampersands. It catches &amp; because that's an & not
    followed by "quote;". And catches &quote; because that's an & not followed
    by "amp;".

    After more thought I came up with
    (?<desiredamp>&(?!(amp;)|(quote;)))
    which I think finds the ampersands which you want to find.
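
To make the difference between the three expressions concrete, here is a minimal C# sketch that counts their matches against Barry's sample string. The names LookaheadDemo and CountMatches are made up for illustration, and the named group (?<desiredamp>...) is left out because it does not change which ampersands are matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Sample string taken from the thread.
    public const string Sample =
        "The Quick Black&Fox &amp; Jumped Over &quote; The & Lazy Dog";

    // Hypothetical helper: number of matches of the pattern in the input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string[] patterns =
        {
            @"&(?!amp;)",               // skips only &amp;
            @"&((?!amp;)|(?!quote;))",  // either look-ahead may succeed, so every & matches
            @"&(?!(amp;)|(quote;))"     // one look-ahead containing both alternatives
        };

        // Prints 3, 4 and 2 matches respectively for the sample string.
        foreach (string p in patterns)
        {
            Console.WriteLine("{0} -> {1} match(es)", p, CountMatches(p, Sample));
        }
    }
}
```

The second pattern fails because an alternation of two negative look-aheads succeeds when either one succeeds; putting the alternation inside a single look-ahead requires that neither string follows.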

    Another plug for Expresso, it was absolutely invaluable in researching this!

    The expression above does find the & in "The & Lazy Dog". But if you don't
    want that one I am sure you can see how to alter the expression to eliminate
    it. I also did not worry about the replace aspect of the problem, I'm sure
    you don't need help with that.

    Good Luck, Bob
     
    eBob.com, May 1, 2008
    #5
  6. Jesse Houwing

    Hello Barry,

    Let's try to write out the pattern you're looking for.

    You're specifically looking for a pattern that consists of an &, not followed
    by any alphanumeric characters and a ;. Now if you write it out like that,
    it becomes quite simple:

    &(?![a-z0-9]+;)

    That's it... Now, replace those with &amp; and you're done.
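
As a minimal sketch, the replace step could look like this in C#. ThreadPattern is the expression from this post. BroaderPattern is a hypothetical variant that is not from the thread: it assumes the input may also contain upper-case entity names and numeric character references such as &#38; or &#x26;, which the original pattern would escape a second time. The class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandEscaper
{
    // Pattern from the post above.
    public const string ThreadPattern = @"&(?![a-z0-9]+;)";

    // Hypothetical broader variant (an assumption, not from the thread):
    // also leaves &#38; and &#x26; style references and mixed-case names alone.
    public const string BroaderPattern =
        @"&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)";

    // Replaces every ampersand matched by the pattern with "&amp;".
    public static string Escape(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "&amp;");
    }

    public static void Main()
    {
        string sample = "The Quick Black&Fox &amp; Jumped Over &quote; The & Lazy Dog";

        // The Quick Black&amp;Fox &amp; Jumped Over &quote; The &amp; Lazy Dog
        Console.WriteLine(Escape(sample, ThreadPattern));

        // a &amp;#38; b   (the numeric reference gets escaped again)
        Console.WriteLine(Escape("a &#38; b", ThreadPattern));

        // a &#38; b
        Console.WriteLine(Escape("a &#38; b", BroaderPattern));
    }
}
```

Note that both patterns also escape the ampersand in "The & Lazy Dog", which the expected output posted earlier left untouched; a bare & followed by a space is not well-formed XML either, so escaping it is usually what is wanted.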

    Jesse

    --
    Jesse Houwing
    jesse.houwing at sogeti.n
     
    Jesse Houwing, May 2, 2008
    #6
  7. barry

    barry Guest

    Well, I work on freelancer sites and one buyer had posted 3 XML files which
    he/she could not read in IE. I tried them myself; it would fail on some lines
    with IE giving the following error message:

    A semi colon character was expected. Error processing resource
    'file:///C:/3Xmls/canales.es_9159529468.xml'. Line 16590, P...

    Once the & was replaced with &amp; it would move further and show an error on
    another line.

    The buyer wanted the errors corrected in the entire files. It was possible
    to do a find/replace (carefully) in a text editor. I have no intention of
    hacking and do not have the time to do so.

    If you want I can send you one of those files (I do not have the permission
    to do so, but that does not matter).
    Following is one such problem node; link is the problem node:

    <video>
    <idvideo>Publicidad</idvideo>
    <nombre>Publicidad</nombre>
    <descripcion>Publicidad</descripcion>
    <url>http://www.xxxxxxxxx.tv/xxx/redir.php?pf=zoneid__18;n__ae371c90;cb__786592291</url>
    <link>http://www.xxxxxxxxx.tv/ads/redir.php?clk=1&pf=zoneid__18;n__ae371c90;cb__786592291</link>
    <category>preroll</category>
    <thumbnail></thumbnail>
    </video>

    "Tigger" <> wrote in message
    news:D...
    > Is this a case of correcting badly encoded data? Is the source data
    > expected to be correctly encoded html/xml?
    >
    > It seems encoding certain "&"s while ignoring others is hacking around a
    > problem instead of sorting out why the source data is incorrect.
    >
    > Also, in your example you encode one "&" at "Black&Fox" while ignoring
    > another at "The & Lazy". So what are the rules?
    >
    > --
    > Tigger
    > http://www.mccreath.org.uk
     
    barry, May 2, 2008
    #7
  8. barry

    barry Guest

    Hello Jesse

    Will this work on an entire XML file (it has over 20,000 lines)? There are
    many lines with such problems. The actual job is long over; I am only trying
    to understand regex in such cases.

    Barry
     
    barry, May 2, 2008
    #8
  9. barry

    barry Guest

    Thanks Jesse

    It does the replace in the entire xml file.
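
For later reference, a rough sketch of how a whole-file replace might be done in C#. It assumes the file fits comfortably in memory (20,000 lines should) and that File.ReadAllText picks the right encoding, which may not hold for files that are not UTF-8. The class XmlAmpersandFixer, its methods, and the example.invalid URL are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlAmpersandFixer
{
    // Pattern suggested earlier in the thread.
    private static readonly Regex BareAmpersand = new Regex(@"&(?![a-z0-9]+;)");

    // Reads the whole file, escapes bare ampersands, writes the result to outputPath.
    public static void FixFile(string inputPath, string outputPath)
    {
        string text = File.ReadAllText(inputPath);
        File.WriteAllText(outputPath, BareAmpersand.Replace(text, "&amp;"));
    }

    // Returns true if the file loads as well-formed XML.
    public static bool IsWellFormed(string path)
    {
        try
        {
            new XmlDocument().Load(path);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Small stand-in for the problem node; the real files are not available here.
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllText(input,
            "<video><link>http://example.invalid/redir.php?clk=1&pf=zoneid__18</link></video>");

        FixFile(input, output);

        Console.WriteLine(IsWellFormed(input));   // False: bare & in the link
        Console.WriteLine(IsWellFormed(output));  // True after the replace
    }
}
```

Loading the result with XmlDocument is a cheap way to confirm nothing else is wrong. One caveat: XML predefines only amp, lt, gt, apos and quot, so a reference such as &quote; is skipped by the regex but would still be rejected by a parser unless the document declares it.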

    Barry
     
    barry, May 2, 2008
    #9
  10. Jesse Houwing

    Hello Barry,

    > Thanks Jesse
    >
    > It does the replace in the entire xml file.
    >


    You're welcome :)

    Jesse
    --
    Jesse Houwing
    jesse.houwing at sogeti.nl
     
    Jesse Houwing, May 2, 2008
    #10