character encoding

Guest

Is there any way in C# to determine a file's character encoding? I am trying
to modify a file, but the original program rejects my version, and I think
the cause is character encoding. I suspect this because even when I just copy
the file through my program (not modifying a single thing), the program still
rejects the result. I need to match the original file's character encoding but
don't know how...
 
Jon Skeet [C# MVP]

There's no guaranteed way of doing it - there are various encodings
which could all match a given file, for instance. There are heuristic
ways of guessing, but nothing 100%.
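
As a rough sketch of one such heuristic, you can look for a byte order mark at the start of the file. The names BomSniffer and GuessFromBom are made up for illustration, and the absence of a BOM tells you nothing definite: plain ASCII, UTF-8 without a BOM and the various ANSI code pages all start straight away with text.

```csharp
using System;
using System.IO;

static class BomSniffer
{
    // Best guess from the leading bytes only. This is a heuristic: a file
    // without a BOM can still be in any of these encodings (or another one).
    public static string GuessFromBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        // Check UTF-32 LE before UTF-16 LE, because both start with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return "UTF-32 LE";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE";
        return "unknown (no BOM)";
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: BomSniffer <file>");
            return;
        }

        // Read at most the first four bytes of the file.
        byte[] head = new byte[4];
        int read;
        using (FileStream fs = File.OpenRead(args[0]))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read);

        Console.WriteLine(GuessFromBom(head));
    }
}
```

A file starting with FF FE would be reported as "UTF-16 LE" here, which is what Encoding.Unicode produces.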
 
Guest

Damn, thought this was going to be easy...

I used a hex editor to compare the files. The original file appears
'tighter': its characters are not separated by null bytes (00), while the
file I write has a 00 null byte between every character (so I guess my file
is 32 bit while the original is not?).

Also, my file has some kind of small header (FF FE), while the original file
starts right away with text.

Any idea how I can write such a file using C#? What output stream?
 
Jon Skeet [C# MVP]

Which file? The one with no 00 bytes? Just use a StreamWriter with the
relevant encoding, e.g. Encoding.UTF8, Encoding.Default or
Encoding.ASCII.

It sounds like you're currently using Encoding.Unicode (UTF-16) with a
BOM (byte order mark).
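
To make the difference visible, here is a minimal sketch that writes the same text through a StreamWriter with a few encodings and dumps the resulting bytes. The WriteDemo class and its Encode helper are invented for illustration, and it writes to a MemoryStream rather than a real file. One thing worth noting: Encoding.UTF8 passed to a StreamWriter also emits a BOM (EF BB BF), so if the target file must start right away with text, new UTF8Encoding(false) is probably the safer choice.

```csharp
using System;
using System.IO;
using System.Text;

static class WriteDemo
{
    // Writes text through a StreamWriter and returns the bytes that would
    // end up in the file, including any BOM the writer emits.
    public static byte[] Encode(string text, Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, encoding))
            {
                writer.Write(text);
            }
            return ms.ToArray();
        }
    }

    static void Main()
    {
        // UTF-16 LE: FF FE header, then a 00 byte after every ASCII character.
        Console.WriteLine(BitConverter.ToString(Encode("Hi", Encoding.Unicode)));          // FF-FE-48-00-69-00
        // ASCII: one byte per character, no header.
        Console.WriteLine(BitConverter.ToString(Encode("Hi", Encoding.ASCII)));            // 48-69
        // UTF-8 with the BOM suppressed: identical to ASCII for this text.
        Console.WriteLine(BitConverter.ToString(Encode("Hi", new UTF8Encoding(false))));   // 48-69
    }
}
```

For a real file, the same encodings can be passed to the StreamWriter constructor that takes a path, an append flag and an Encoding.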
 
Guest

OK, I used UTF7 and it looks like the closest match (the file starts right
away with text and there are no 00 null bytes between characters), but
when I write a semicolon to the file it comes out as "+ADs-". What's
that all about?
 
Jon Skeet [C# MVP]

You almost certainly *don't* want UTF7 - it's almost exclusively used
in mail. See http://www.faqs.org/rfcs/rfc2152.html for the details.

It's not a good idea to just arbitrarily guess about what encoding you
should use. What is this file, what's going to use it, etc?
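
As for where "+ADs-" comes from: in UTF-7, a character that is not written directly is emitted as '+', then the Base64 form of its UTF-16 big-endian bytes without '=' padding, then '-'. The semicolon is one of the characters RFC 2152 lets an encoder either write directly or escape. The sketch below reproduces the escape for a single character by hand; Utf7Demo and Utf7Escape are hypothetical names, and this is not a full UTF-7 encoder (for example, it ignores the special "+-" form for a literal plus sign).

```csharp
using System;
using System.Text;

static class Utf7Demo
{
    // Escapes one character the way UTF-7 escapes a non-direct character:
    // '+', Base64 of the UTF-16 big-endian bytes (padding removed), '-'.
    public static string Utf7Escape(char c)
    {
        byte[] utf16be = Encoding.BigEndianUnicode.GetBytes(new[] { c });
        string base64 = Convert.ToBase64String(utf16be).TrimEnd('=');
        return "+" + base64 + "-";
    }

    static void Main()
    {
        // ';' is U+003B, i.e. bytes 00 3B, which Base64-encode to "ADs=".
        Console.WriteLine(Utf7Escape(';')); // +ADs-
    }
}
```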
 
