ASCII files

Alex Leduc

I'm trying to load ASCII files that contain characters from the French
language in a way that is independent of whatever Locale the machine is
configured to use.

So if I have a machine whose default Locale is "en-US" and I open some
French text like this:

[C# example that has the same behaviour in any .NET language]

StreamReader sr = new StreamReader("C:\\someFrenchFile.txt");
string strInput = sr.ReadToEnd();

Suppose the file contains this:
"Le Québec en été."
the characters that I get in strInput are:
"Le Qu?bec en ?t?."

If I change the default Locale in the Control Panel and use
Encoding.Default in the StreamReader's constructor parameters, I get the
right characters in strInput:
"Le Québec en été."

What I'd like to be able to do is load the French string with the right
characters regardless of the machine's default Locale. What's the
way to programmatically decide what Locale to use with all ASCII strings?

Alexandre Leduc
 
Dave Quigley[work]

Your stream reader is missing something important: for the second parameter,
use System.Text.Encoding.ASCII.
Otherwise it should be Unicode, I believe.

This should help you.
 
Jack Hanebach

Alex Leduc said:
I'm trying to load ASCII files that contain characters from the French
language in a way that is independent of whatever Locale the machine is
configured to use. [snip]
What I'd like to be able to do is load the french string with the right
characters regardless of what's the machine's default Locale. What's the
way to programmatically decide what Locale to use with all ASCII strings?

If you know what's the code page of the file you can try to set
StreamReader's CurrentEncoding property to ASCIIEncoding with the CodePage
set to file's code page. [Warning! haven't tried it myself :)]

OTOH if you want to read an arbitrary file in an arbitrary language, I'm
afraid it's not possible... (or, at least, I don't know the way...)
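
For what it's worth, here is a minimal sketch of the code-page idea. As far as I know, StreamReader.CurrentEncoding is read-only, so the encoding has to be passed to the constructor rather than set afterwards. The sketch assumes the file is ISO-8859-1 (code page 28591) and uses a MemoryStream as a stand-in for the real file; CodePageReader and ReadAll are made-up names for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class CodePageReader
{
    // Decodes the whole stream using the given code page number.
    public static string ReadAll(Stream input, int codePage)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);
        using (StreamReader reader = new StreamReader(input, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // "Le Québec en été." as single-byte text: 0xE9 is 'é' in ISO-8859-1.
        byte[] raw = { 0x4C, 0x65, 0x20, 0x51, 0x75, 0xE9, 0x62, 0x65, 0x63,
                       0x20, 0x65, 0x6E, 0x20, 0xE9, 0x74, 0xE9, 0x2E };

        string text = ReadAll(new MemoryStream(raw), 28591);

        // Compare against the expected string instead of printing it,
        // so the result does not depend on the console's own encoding.
        Console.WriteLine(text == "Le Qu\u00E9bec en \u00E9t\u00E9."); // prints True
    }
}
```

The code page number is the only thing that depends on the file; if the file came from a different source, a different number would be needed.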
 
Alex Leduc

Dave said:
Your stream reader is missing something important: for the second parameter,
use System.Text.Encoding.ASCII.
Otherwise it should be Unicode, I believe.

I forgot to mention that I've tried that and the result I get is:

"Le Qubec en t."

It removes all accented characters from the string.
 
Alex Leduc

Bruno said:
ASCII is a 7-bit code set and it does not cover accented characters.

What you want is probably ISO Latin-1, also known as ISO-8859-1, which
contains the French accented characters. So, you should specify this
encoding when you open the StreamReader.

Bruno.

Could you tell me how to do that in code? I find the SDK documentation
on this topic to be a bit confusing.
 
Michael A. Covington

Alex Leduc said:
I'm trying to load ASCII files that contain characters from the French
language in a way that is independent of whatever Locale the machine is
configured to use.

If it contains anything non-English (such as accented letters), it's not
ASCII.

What you have is some kind of extension of ASCII, and there are many such.
 
Anthony Christianson

Try:

string strInput;
using (StreamReader sr = new StreamReader("C:\\someFrenchFile.txt",
    System.Text.Encoding.GetEncoding("ISO-8859-1")))
{
    strInput = sr.ReadToEnd(); // the using block closes the file afterwards
}
 
Marc Scheuner [MVP ADSI]

Your stream reader is missing something important for the second parameter
I forgot to mention that I've tried that and the result I get is:
"Le Qubec en t."
It removes all accented characters from the string.

Is it really ASCII (as in DOS / OEM), or is it ANSI (as in a regular
Windows file)??

If it's ANSI / Windows, try using System.Text.Encoding.Default. Works
for German umlauts for me :)

Marc

================================================================
Marc Scheuner May The Source Be With You!
Bern, Switzerland m.scheuner(at)inova.ch
 
Alex Leduc

Yeah I think what I was talking about is ANSI. I never understood the
difference between the two so I assumed they were two different names
for the same thing.
 
Alex Leduc

Thanks a lot. That worked fine.

Now what I'd like to know is if there's a way to tell my application to
always use this encoding for whatever string related methods/types it
has to use.

Kind of like in C

char *loc = setlocale(LC_ALL, "French_Canada.1252");

which can set the application's locale at a global scope.

Alexandre Leduc
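
I'm not aware of a direct counterpart to setlocale for file encodings: a StreamReader created without an encoding argument uses UTF-8 whatever the locale is. One possible workaround, sketched below, is to keep the choice in a single place and open every file through it. FrenchText, FileEncoding, OpenReader and OpenWriter are hypothetical names, and ISO-8859-1 is assumed to be the right encoding for the files.

```csharp
using System;
using System.IO;
using System.Text;

static class FrenchText
{
    // The one place where the file encoding is decided (assumed: ISO-8859-1).
    public static readonly Encoding FileEncoding = Encoding.GetEncoding("ISO-8859-1");

    public static StreamReader OpenReader(string path)
    {
        return new StreamReader(path, FileEncoding);
    }

    public static StreamWriter OpenWriter(string path)
    {
        // false = overwrite instead of append
        return new StreamWriter(path, false, FileEncoding);
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        string original = "Le Qu\u00E9bec en \u00E9t\u00E9.";

        using (StreamWriter writer = OpenWriter(path))
        {
            writer.Write(original);
        }

        string roundTripped;
        using (StreamReader reader = OpenReader(path))
        {
            roundTripped = reader.ReadToEnd();
        }

        Console.WriteLine(roundTripped == original);     // prints True
        Console.WriteLine(new FileInfo(path).Length);    // prints 17: one byte per character
        File.Delete(path);
    }
}
```

This only covers file I/O that goes through the helper; strings already in memory are Unicode and need no encoding at all.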
 
M

Marc Scheuner [MVP ADSI]

Yeah I think what I was talking about is ANSI. I never understood the
difference between the two so I assumed they were two different names
for the same thing.

No, not really - the ASCII stuff is "old" DOS age thingies - the ASCII
character set is standardized up to ASCII 127 and country-specific
above that - it usually contains things like French accented
characters, German Umlauts (ö ä ü) and so forth, plus line drawing
characters and a few mathematical symbols.

ANSI is the Windows base character set, which tossed out the
line-drawing characters and math stuff, and added extra characters.
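
To make the difference concrete, here is a small sketch comparing an OEM code page (850) with the Windows ANSI code page for Western Europe (1252). It assumes .NET Core or .NET 5+, where the legacy code pages have to be registered first; on the .NET Framework they are built in and the RegisterProvider line can be dropped. CodePageCompare and FirstByte are made-up names.

```csharp
using System;
using System.Text;

static class CodePageCompare
{
    // Returns the first byte produced when encoding the text in the given code page.
    public static byte FirstByte(int codePage, string text)
    {
        return Encoding.GetEncoding(codePage).GetBytes(text)[0];
    }

    static void Main()
    {
        // Assumption: modern .NET, where code pages 850 and 1252 are opt-in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The same letter 'é' is stored as a different byte in each code page.
        Console.WriteLine("cp850:  0x{0:X2}", FirstByte(850, "\u00E9"));   // 0x82
        Console.WriteLine("cp1252: 0x{0:X2}", FirstByte(1252, "\u00E9"));  // 0xE9

        // Reading an ANSI byte as if it were OEM text gives the wrong letter.
        string misread = Encoding.GetEncoding(850).GetString(new byte[] { 0xE9 });
        Console.WriteLine("0xE9 read as cp850: U+{0:X4}", (int)misread[0]); // U+00DA
    }
}
```

So a file written by a DOS program and a file written by a Windows program can both be "plain text" and still need different encodings to read correctly.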

Marc
 
Marc Scheuner [MVP ADSI]

Alex Leduc said:
Assuming you mean accented characters, that's impossible. ASCII doesn't
contain any accented characters.

8-bit ASCII (e.g. codepage 850) does contain accented chars and German
umlauts etc - ASCII doesn't always stop at 7 bit, you know! There's a
whole wide world outside of English-speaking 7 bits! :)

Marc
 
