Reading Unicode escape sequences from File

  • Thread starter: John Ztwin

John Ztwin

Hello,

I have a file that contains ordinary text and some special characters as
Unicode escape sequences (\uxxxx).

When I read the file using e.g. StreamReader, the Unicode escape sequences are
not converted to their character representation. They are shown exactly the
same way as in the file. Literals in variables in C# code are shown correctly.

Can anyone tell me how to read Unicode escape sequences from a file so that
they are presented like literals?

Thanks,
 
John Ztwin said:
I have a file that contains ordinary text and some special characters as
Unicode escape sequences (\uxxxx).

When I read the file using e.g. StreamReader, the Unicode escape sequences are
not converted to their character representation.

No, I wouldn't expect them to be. That's done by the C# compiler - it
would be a big mistake for it to be done by StreamReader.
They are shown exactly the same way as in the file. Literals in variables in
C# code are shown correctly.

Can anyone tell me how to read Unicode escape sequences from a file so that
they are presented like literals?

You basically need to parse the text you've read, just like the C#
compiler does. You can search for \u fairly easily, then take the next
four digits, complain if they're not all hex, convert the hex to a
char, then replace the whole section with the character value.
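
The steps described above can be sketched roughly as follows. This is only a minimal illustration, not a complete parser: the class name EscapeDecoder and the method name Decode are made up for this example, and it assumes every backslash followed by 'u' is meant as an escape (an escaped backslash such as \\u is not treated specially).

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EscapeDecoder
{
    // Hypothetical helper: replaces every \uXXXX in the input with its character.
    public static string Decode(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // Look for a backslash immediately followed by 'u'.
            if (input[i] == '\\' && i + 1 < input.Length && input[i + 1] == 'u')
            {
                // Exactly four characters must follow the "\u" prefix.
                if (i + 6 > input.Length)
                    throw new FormatException($"Truncated \\u escape at index {i}.");

                string hex = input.Substring(i + 2, 4);

                // AllowHexSpecifier accepts hex digits only (no sign, no whitespace),
                // so this call also serves as the "complain if not all hex" step.
                if (!ushort.TryParse(hex, NumberStyles.AllowHexSpecifier,
                                     CultureInfo.InvariantCulture, out ushort code))
                    throw new FormatException($"Invalid \\u escape at index {i}: '{hex}'.");

                sb.Append((char)code);
                i += 6; // skip past "\uXXXX"
            }
            else
            {
                sb.Append(input[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Text read with StreamReader (or File.ReadAllText) could then be passed through EscapeDecoder.Decode before use.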
 
John Ztwin said:
A little bit more work than in Java if I remember right,

Well, not if you use the normal BufferedReader and InputStreamReader in
Java.

Java's Properties class will do the unescaping for properties files,
but it isn't general purpose.
 
John said:
I have a file that contains ordinary text and some special characters as
Unicode escape sequences (\uxxxx).

When I read the file using e.g. StreamReader, the Unicode escape sequences are
not converted to their character representation. They are shown exactly the
same way as in the file. Literals in variables in C# code are shown correctly.

Can anyone tell me how to read Unicode escape sequences from a file so that
they are presented like literals?

You will need to do a text replacement.

Example code:

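// Requires: using System.Globalization; using System.Text.RegularExpressions;

// Replaces each \uXXXX escape in s with the character it denotes.
public static string U2U(string s)
{
    // Regex.Replace handles each match exactly once, so text that has
    // already been decoded is never scanned again. Hex digits may be
    // upper or lower case.
    return Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})",
        m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
}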

Arne
 
John said:
I have a file that contains ordinary text and some special characters as
Unicode escape sequences (\uxxxx).

If the file only ever uses \u, then handling just that form is safe. However,
some standards (like the C# spec) also allow \U (uppercase) escape sequences,
which take eight hex digits instead of four:

unicode-escape-sequence:
    \u hex-digit hex-digit hex-digit hex-digit
    \U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

http://msdn.microsoft.com/en-us/library/aa664812.aspx
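
If the file might contain both forms, a decoder could handle them along these lines. This is only a sketch under assumptions: the class name LongEscapeDecoder is invented for illustration, and it assumes the file follows the two forms in the grammar quoted above. The eight-digit form can name a code point outside the Basic Multilingual Plane, which does not fit in a single char, so the sketch uses char.ConvertFromUtf32 to produce the surrogate pair.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LongEscapeDecoder
{
    // \u followed by 4 hex digits, or \U followed by 8 hex digits.
    private static readonly Regex Escape =
        new Regex(@"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})");

    // Hypothetical helper: decodes both escape forms in the input.
    public static string Decode(string input)
    {
        return Escape.Replace(input, m =>
        {
            if (m.Groups[1].Success)
            {
                // Four-digit form: a single UTF-16 code unit.
                int unit = int.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier);
                return ((char)unit).ToString();
            }

            // Eight-digit form: a full code point, possibly above U+FFFF.
            int codePoint = int.Parse(m.Groups[2].Value, NumberStyles.AllowHexSpecifier);

            // ConvertFromUtf32 throws ArgumentOutOfRangeException for surrogate
            // values and for anything outside the range 0 to 0x10FFFF.
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

For example, LongEscapeDecoder.Decode applied to the text \U0001F600 yields a two-char string (a surrogate pair), while \u0041 yields the single character A.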


Best regards
 
