Parsing the output using String.Split and current locale

Prasad Dabak · Apr 5, 2005

Hello,

I have a legacy unmanaged application that returns property=value
pairs separated by chr(252), and I am trying to parse this output from
C# using the String.Split method.

This works fine as long as the default locale is en-US. However, the
moment I change it to, say, Greek or Portuguese, the parsing logic goofs
up. I don't have access to the source code of the legacy application to
make changes.

Is there any way I can change the parsing logic so that
it works irrespective of the locale?

Thanks.
-Prasad

Morten Wennevik · Apr 5, 2005

Hi Prasad,

When dealing with multiple languages you should treat strings as Unicode.

The code below gets you the character that byte 252 maps to in a given
encoding, regardless of the current locale.

// Encoding lives in the System.Text namespace.
Encoding e = Encoding.GetEncoding("ISO-8859-1");
char splitCharacter = e.GetString(new byte[]{252})[0];

Replace "ISO-8859-1" with whatever encoding your system uses.
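
To show how the decoded character could then be fed to Split, here is a minimal sketch. The class name DelimiterSplit and its two methods are made up for illustration, and it assumes the encoding name you pass in is the same one that was used to turn the legacy output into a string.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original post.
public static class DelimiterSplit
{
    // Decode the single byte 252 with the named encoding to get the
    // separator character as it appears in the decoded string.
    public static char GetDelimiter(string encodingName)
    {
        Encoding e = Encoding.GetEncoding(encodingName);
        return e.GetString(new byte[] { 252 })[0];
    }

    // Split the property=value pairs on that character.
    public static string[] SplitPairs(string output, string encodingName)
    {
        return output.Split(GetDelimiter(encodingName));
    }
}
```

With "ISO-8859-1" the delimiter comes out as ü (U+00FC). If the output was decoded with some other encoding, the same byte can map to a different character, so the name passed in has to match.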

Alex Passos · Apr 5, 2005

Is the parsing breaking because your delimiter is showing up as part of the
"property" or the "value"? That would cause Split to break and not
return the correct values. Is there anything unique about the property names
that you can verify against? What I am thinking is this:

1) You have delimiters showing up in "value"

Pre-scan the string and replace every delimiter that is not immediately
followed by a known property with its hex value (as in HTTP encoding with
the % sign). If you know what all the properties are, this can
be implemented. You can then split the string on the delimiter to get
property = <encoded> values, and decode the hex values back
to char(252).

2) If your properties have delimiters and you are receiving known values,
then implement something similar to #1.

3) If both, a more creative solution needs to be thought up, more than I
can manage in this 5-minute post.
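
Option 1 above could look roughly like the sketch below. It assumes the full list of property names is known in advance. The class KnownPropertySplit, the "%FC" escape and the extra "%25" escape for literal percent signs are illustrative choices, not something taken from the legacy application.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating the pre-scan idea.
public static class KnownPropertySplit
{
    // True when the text at 'index' starts with "<name>=" for a known name.
    private static bool StartsKnownProperty(string s, int index, string[] names)
    {
        foreach (string name in names)
        {
            string key = name + "=";
            if (string.CompareOrdinal(s, index, key, 0, key.Length) == 0)
                return true;
        }
        return false;
    }

    public static string[] Split(string output, char delimiter, string[] knownNames)
    {
        // Pass 1: escape '%' and every delimiter that does not introduce
        // a known property, so only real separators remain.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < output.Length; i++)
        {
            char c = output[i];
            if (c == '%')
                sb.Append("%25");
            else if (c == delimiter && !StartsKnownProperty(output, i + 1, knownNames))
                sb.Append("%FC");
            else
                sb.Append(c);
        }

        // Pass 2: split on the remaining delimiters, then undo the escapes.
        string[] parts = sb.ToString().Split(delimiter);
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Replace("%FC", delimiter.ToString()).Replace("%25", "%");
        return parts;
    }
}
```

A value that happens to contain a delimiter followed by a known property name and "=" would still be split in the wrong place, so this only helps when that cannot occur.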

Prasad Dabak · Apr 6, 2005

Hello,

I am not sure I understand this.

The output received from the legacy unmanaged application is as
follows:

p1=v1üp2=v2üp3=v3

where ü is chr(252).

The C# application splits the output using ü as the separator.

This all works fine as long as the locale is en-US. The moment I
change it to Portuguese, the output returned is as follows, and hence
the split logic in the C# application goofs up:

p1=v1³p2=v2³p3=v3

Is there any way to solve this irrespective of locale and without
making any changes to the legacy unmanaged application?

Thanks.
-Prasad
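
Byte 252 is ü in Windows-1252 and ³ in code page 850, so the two outputs above are consistent with the same byte being decoded under different code pages. That is an inference, not something confirmed in this thread. If the C# side can get hold of the raw bytes before they are turned into a string (an assumption, since the thread does not say how the output is received), one locale-independent option is to split on the byte value itself. The sketch below uses made-up names (RawByteSplit, SplitOnByte, SplitPairs) and assumes a single-byte code page, where byte 252 is never part of a longer character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: split before decoding, so the locale cannot
// change what the separator looks like.
public static class RawByteSplit
{
    // Cut the buffer at every occurrence of 'delimiter'.
    public static List<byte[]> SplitOnByte(byte[] raw, byte delimiter)
    {
        List<byte[]> parts = new List<byte[]>();
        int start = 0;
        for (int i = 0; i <= raw.Length; i++)
        {
            if (i == raw.Length || raw[i] == delimiter)
            {
                byte[] part = new byte[i - start];
                Array.Copy(raw, start, part, 0, part.Length);
                parts.Add(part);
                start = i + 1;
            }
        }
        return parts;
    }

    // Split on byte 252, then decode each pair with the given encoding.
    public static string[] SplitPairs(byte[] raw, Encoding encoding)
    {
        List<byte[]> parts = SplitOnByte(raw, 252);
        string[] result = new string[parts.Count];
        for (int i = 0; i < parts.Count; i++)
            result[i] = encoding.GetString(parts[i]);
        return result;
    }
}
```

The encoding passed to SplitPairs then only affects how the property names and values are decoded, not where the string is cut.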
