printing ASCII character greater than 0x7f

A

auldh

i have come across a situation in my project where i read a text file with
some characters greater than hex 0x7f.

i need to write a character (0xE0) to a new file as an exception. however, when
i attempt to write this via "Console.Write" or "FileStream.Write" it seems
the value changes. most of the output file is in text mode.

if i view the original file in binary mode i see the character i'm having
an issue with as "e0 00", but when i re-write it i get "C3 A0".

i just cannot reproduce the read information in my output file. is there
a way to do this and if so how?
 
M

Marc Gravell

Well, ASCII doesn't define any characters above this. You need to know
the codepage of the file, and use the correct encoding - via (for
example) Encoding.GetEncoding(int codepage) [and pass this encoding into
whichever StreamReader etc you are using].

Otherwise, translations will occur. And whether they represent your
original data is anyone's guess.

Marc
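
A minimal sketch of the approach described above: pass an explicit encoding to a StreamReader so the bytes are decoded the way you intend. ISO-8859-1 is used here only as a stand-in for the file's real code page, and the class and method names are made up for illustration. On newer .NET runtimes, code pages such as 1250 may need the code pages encoding provider registered before Encoding.GetEncoding will return them.

```csharp
using System;
using System.IO;
using System.Text;

public static class CodePageReadSketch
{
    // Reads a whole text file, decoding its bytes with the given encoding.
    public static string ReadAllTextWith(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the file's real code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Create a one-byte sample file containing 0xE0.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xE0 });

        string text = ReadAllTextWith(path, latin1);
        Console.WriteLine("U+{0:X4}", (int)text[0]); // prints U+00E0
        File.Delete(path);
    }
}
```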
 
J

Jon Skeet [C# MVP]

auldh said:
is there a way to find the codepage? is there a good reference source?

There's no hard and fast rule for determining the codepage of a file.
A single file (i.e. a sequence of bytes) may be valid (but with
different meanings) for several different codepages.

Jon
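
To illustrate why the bytes alone cannot tell you the encoding, here is a small sketch that decodes the same two bytes ("E0 00") in two ways. Both decodings succeed, but they produce different strings. The class and helper names are hypothetical.

```csharp
using System;
using System.Text;

public static class AmbiguousBytesSketch
{
    // Decodes the bytes with whichever encoding the caller believes applies.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    // Formats each UTF-16 code unit of a string as U+XXXX for inspection.
    public static string Describe(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        byte[] bytes = { 0xE0, 0x00 };

        // As UTF-16 (little-endian): one character.
        Console.WriteLine(Describe(Decode(bytes, Encoding.Unicode)));
        // prints U+00E0

        // As ISO-8859-1: two characters, the second being NUL.
        Console.WriteLine(Describe(Decode(bytes, Encoding.GetEncoding("iso-8859-1"))));
        // prints U+00E0 U+0000
    }
}
```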
 
A

auldh

ok, i got the codepage and i see in MSDN how to set the codepage.

i see how to write the bytes but how can i write the character to the output
file and not just the bytes?

is there a good sample on how to read a string with codepage 1250 (E0 01),
then write it via Console.Write and FileStream.Write as UTF16? the string is
from a registry key and not another file per se.
 
J

Jon Skeet [C# MVP]

auldh said:
ok, i got the codepage and i see in MSDN how to set the codepage.

i see how to write the bytes but how can i write the character to the output
file and not just the bytes?

Use a StreamWriter (either directly or around a stream) and specify
Encoding.GetEncoding(1250).

auldh said:
is there a good sample on how to read a string with codepage 1250 (E0 01),
then write it via Console.Write and FileStream.Write as UTF16? the string is
from a registry key and not another file per se.

If you're reading the string from the registry, I'd expect it to be in
Unicode already. However, if you read it as bytes, just call
GetString(bytes) on the Encoding instance that matches those bytes.
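
A rough sketch of that two-step idea, assuming you do have raw bytes and know their encoding: decode them to a string, then let a StreamWriter encode the string as UTF-16. ISO-8859-1 is only a stand-in for the source code page, and the class and method names are invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class TranscodeSketch
{
    // Decodes sourceBytes with sourceEncoding, then writes the text as UTF-16.
    public static void WriteAsUtf16(byte[] sourceBytes, Encoding sourceEncoding, string path)
    {
        string text = sourceEncoding.GetString(sourceBytes);
        using (var writer = new StreamWriter(path, false, Encoding.Unicode))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        // Assumption: the source bytes are ISO-8859-1.
        Encoding source = Encoding.GetEncoding("iso-8859-1");
        string path = Path.GetTempFileName();

        WriteAsUtf16(new byte[] { 0xE0 }, source, path);

        // The file starts with the UTF-16 byte order mark, then the character.
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
        // prints FF-FE-E0-00
        File.Delete(path);
    }
}
```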
 
A

auldh

Jon,
the key that i'm reading is "reg_sz" so it should not be byte. i guess the
value is corrupted because like you said the registry should be in Unicode
already.

and the base language is US english.
the tool i'm building is a registry export to "reg" format. i want to do 2
things:
1) create an exception reporting the key that is in trouble. (i got that
handled)
2) write the key and key value to the export file just like it appears in the
registry.

i'm getting the key name and key value as strings. however, i can't
get FileStream.Write to rebuild the value correctly; it is converting to ???

how do i set the string to codepage 1250 on the read?
how do i set FileStream.Write to codepage UTF16 to write?
i'm dizzy trying to get this done and not seeing straight.
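
For step 1 above, one possible way to find the characters "in trouble" is to scan the string for code units above 0x7F. This is only a sketch; the class and method names are hypothetical, and the poster's actual exception reporting is not shown in the thread.

```csharp
using System;
using System.Collections.Generic;

public static class NonAsciiReportSketch
{
    // Returns one line per character whose code unit is above 0x7F.
    public static List<string> FindNonAscii(string value)
    {
        var findings = new List<string>();
        for (int i = 0; i < value.Length; i++)
        {
            if (value[i] > 0x7F)
            {
                findings.Add(string.Format("index {0}: U+{1:X4}", i, (int)value[i]));
            }
        }
        return findings;
    }

    public static void Main()
    {
        // Sample value containing U+00E0 at index 3.
        foreach (string finding in FindNonAscii("caf\u00E0"))
        {
            Console.WriteLine(finding); // prints index 3: U+00E0
        }
    }
}
```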
 
J

Jon Skeet [C# MVP]

auldh said:
the key that i'm reading is "reg_sz" so it should not be byte. i guess the
value is corrupted because like you said the registry should be in Unicode
already.

Right. Have a look with regedit and see what it shows.

auldh said:
and the base language is US english.
the tool i'm building is a registry export to "reg" format. i want to do 2
things:
1) create an exception reporting the key that is in trouble. (i got that
handled)
2) write the key and key value to the export file just like it appears in the
registry.

Well, that's tricky - because as far as I know you'll only get the key
value as a string.

auldh said:
i'm getting the key name and key value as strings. however, i can't
get FileStream.Write to rebuild the value correctly; it is converting to ???

If you've got garbage in the registry, you'll have a hard time
"fixing" it.

auldh said:
how do i set the string to codepage 1250 on the read?

You don't - the registry correctly reads whatever is there.

auldh said:
how do i set FileStream.Write to codepage UTF16 to write?

Don't use a FileStream, use a StreamWriter and pass in the right
encoding.

auldh said:
i'm dizzy trying to get this done and not seeing straight.

See if http://pobox.com/~skeet/csharp/unicode.html helps at all.
 
Q

qglyirnyfgfo

Looks like you are reading the Unicode UTF16 character “00E0” 'LATIN
SMALL LETTER A WITH GRAVE' http://www.fileformat.info/info/unicode/char/00e0/index.htm
and then you are writing the character to a file using the default
encoding of UTF8, which ends up translating the UTF16 bytes
“E0 00” to the UTF8 bytes “C3 A0”.

If you want to have an exact copy of the bytes you are reading, then
you will need to set your StreamWriter encoding to UTF16 (.Net calls
it Encoding.Unicode). That way, if you open the file in binary mode you
will see “E0 00” and not “C3 A0”.

At least I think that is what’s going on…..

Rene
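
A quick way to check the byte values discussed above is to encode the character directly and print the result. This sketch uses only the built-in encodings; the class and method names are made up.

```csharp
using System;
using System.Text;

public static class EncodingBytesSketch
{
    // Encodes text and returns the bytes as a dash-separated hex string.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string aGrave = "\u00E0"; // LATIN SMALL LETTER A WITH GRAVE

        Console.WriteLine(Hex(aGrave, Encoding.UTF8));             // prints C3-A0
        Console.WriteLine(Hex(aGrave, Encoding.Unicode));          // prints E0-00
        Console.WriteLine(Hex(aGrave, Encoding.BigEndianUnicode)); // prints 00-E0
    }
}
```

Encoding.Unicode is little-endian, which is why the low byte E0 comes first in the file.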
 
A

auldh

it sounds right. but in the registry key it looks like an "a" with a "`"
over it, and it comes closer to codepage 1250.

i just can't figure out how to copy it exactly to a new output file.

if i could just find a way to write this one character via
"StreamWriter.Write" using a different codepage.
 
R

Rene

using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Your non-ASCII character (U+00E0).
        char charFromRegistry = (char)0xE0;

        // Save the char to a file as UTF-16 (little-endian) bytes: E0 00.
        using (FileStream fs = new FileStream(@"C:\Err.bin", FileMode.Create))
        {
            byte[] uniBytes = Encoding.Unicode.GetBytes(new char[] { charFromRegistry });
            fs.Write(uniBytes, 0, uniBytes.Length);
        }

        // Read the bytes back from the file and decode them to the same char.
        byte[] readBytes = File.ReadAllBytes(@"C:\Err.bin");
        char roundTripped = Encoding.Unicode.GetChars(readBytes)[0];
    }
}

Perhaps I am missing something????
 
A

auldh

Rene,
sorry i guess this would work if i did create a ".bin" file.

the output file is a "text" file. the program reads a given registry hive
and enumerates it.

the program emulates the "regedit" export but it will read local and remote
machine.

the program creates an output file in "text" format, then in "reg" format.
the latter can be imported via regedit.
it also validates a specific hive to see if there are missing keys, missing
values and corrupt entries, as it did in this case.

i guess i'm realizing there are too many issues to overcome, unless i'm wrong.
1) in this test run on a machine i found a key with the wrong codepage
being used.
2) i don't think i can change the output codepage at run-time. meaning if i
create the output file in "reg" mode i'm using default ASCII,
and the format of the file cannot be changed to something else for a given
line.
3) i need to alter my plan to exclude the corrupted key and create an
error.txt file with exceptions.

if i'm wrong i look forward to input.

i would like to thank all of you who volunteer your input. well done.


 
H

Hans Kesting

auldh said:
Rene,
sorry i guess this would work if i did create a ".bin" file.

Just rename the file to "something.txt". The
"Encoding.Unicode.GetBytes" part translates the characters in the
string to the bytes that should be in the file according to that
encoding. The filesystem sees no difference between "text" and
"binary" files: they all consist of a lot of "bytes".

You can either use a plain FileStream where you write bytes that you
got from passing a string through some encoding (as in the example) or
use a StreamWriter with some encoding where you can "just" write a
string and have the exact same string-to-byte[] conversion take place
"below the covers".

Hans Kesting
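
The equivalence of the two routes can be checked with a small sketch. A MemoryStream stands in for the file here (an assumption made to keep the example self-contained), and the class and method names are hypothetical. Note that a StreamWriter also emits the encoding's byte order mark at the start of a fresh stream, so the hand-encoded route writes the preamble too.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class SameBytesSketch
{
    // Route 1: encode by hand, then write raw bytes (byte order mark first).
    public static byte[] ViaGetBytes(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            byte[] bom = encoding.GetPreamble();
            byte[] body = encoding.GetBytes(text);
            stream.Write(bom, 0, bom.Length);
            stream.Write(body, 0, body.Length);
            return stream.ToArray();
        }
    }

    // Route 2: let a StreamWriter do the string-to-bytes conversion.
    public static byte[] ViaStreamWriter(string text, Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }
        return stream.ToArray(); // ToArray still works after the stream is closed
    }

    public static void Main()
    {
        byte[] a = ViaGetBytes("\u00E0", Encoding.Unicode);
        byte[] b = ViaStreamWriter("\u00E0", Encoding.Unicode);

        Console.WriteLine(BitConverter.ToString(a)); // prints FF-FE-E0-00
        Console.WriteLine(a.SequenceEqual(b));       // prints True
    }
}
```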
 
A

auldh

ok, found the issue between my registry output file and the one created by
Windows.

regedit uses a specific encoding; i just used StreamWriter without specifying
one. it seems regedit uses the "Encoding.Unicode" specifier.

now i can get my character, and a binary compare also shows a perfect replica.

thanks everyone for your help.
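
For anyone landing here later, a rough sketch of what that fix might look like: a StreamWriter opened with Encoding.Unicode writing a small "reg" style file. The key path, value name, and class and method names are hypothetical, and the escaping shown covers only backslashes and double quotes; the poster's real export code is not shown in the thread.

```csharp
using System;
using System.IO;
using System.Text;

public static class RegExportSketch
{
    // Escapes backslashes and double quotes inside a quoted .reg string.
    public static string Escape(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"");
    }

    // Writes one key with one string value as UTF-16 text.
    public static void WriteRegFile(string path, string keyPath, string valueName, string value)
    {
        using (var writer = new StreamWriter(path, false, Encoding.Unicode))
        {
            writer.NewLine = "\r\n"; // keep Windows line endings on any platform
            writer.WriteLine("Windows Registry Editor Version 5.00");
            writer.WriteLine();
            writer.WriteLine("[" + keyPath + "]");
            writer.WriteLine("\"" + Escape(valueName) + "\"=\"" + Escape(value) + "\"");
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();

        // Hypothetical key path and value name, for illustration only.
        WriteRegFile(path, @"HKEY_CURRENT_USER\Software\ExampleVendor", "Sample", "\u00E0");

        // The file begins with the UTF-16 little-endian byte order mark.
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(BitConverter.ToString(bytes, 0, 2)); // prints FF-FE
        File.Delete(path);
    }
}
```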
 
