
character encoding

 
 
MrNobody
 
      6th Jan 2005
Is there any way in C# to determine a file's character encoding? I am trying
to modify a file, but the program that reads it rejects my output, and I think
the cause is character encoding: when I just copied the file through my
program (without modifying a single thing), the program still rejected the
copy. I need to match the original file's character encoding but don't know
how...
 
Jon Skeet [C# MVP]
 
      6th Jan 2005
There's no guaranteed way of doing it - the same bytes can be valid in
several different encodings, for instance. There are heuristic ways of
guessing, but nothing is 100% reliable.
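
One such heuristic is to look for a byte order mark at the start of the file. The sketch below is only an illustration: the names `EncodingSniffer` and `GuessFromBom` are made up for this example, it does not check the UTF-32 big-endian mark, and a file with no BOM tells you nothing (it could be ASCII, UTF-8, an ANSI code page, and so on).

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingSniffer
{
    // Returns the encoding implied by a leading byte order mark, or null if none is found.
    // A null result does NOT mean "ASCII"; it only means no BOM was present.
    public static Encoding GuessFromBom(byte[] head)
    {
        // Test the 4-byte UTF-32 LE mark first, because it also starts with FF FE.
        // (Heuristic: a UTF-16 LE file whose first character is U+0000 would be misread here.)
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16, little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16, big-endian
        return null;
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: pass the path of the file to inspect");
            return;
        }

        // Read at most the first four bytes of the file named on the command line.
        byte[] head = new byte[4];
        int read;
        using (FileStream fs = File.OpenRead(args[0]))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read);

        Encoding guess = GuessFromBom(head);
        Console.WriteLine(guess == null ? "No BOM found" : guess.WebName);
    }
}
```

If this reports no BOM, the remaining options have to be narrowed down from what the consuming program expects rather than from the bytes alone.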

--
Jon Skeet - <(E-Mail Removed)>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
MrNobody
 
      6th Jan 2005
Damn, thought this was going to be easy...

I used a hex editor to compare the files, and the original file appears
'tighter': its characters are not separated by null bytes (00), while the
characters in the file I write are (so I guess my file is 32 bit while the
original is not??). There is a 00 null byte between every character in my
file.

Also, my file has some kind of small header (FF FE), while the original file
just starts right away with text.

Any idea how I can write such a file using C#? What output stream?
 
Jon Skeet [C# MVP]
 
      6th Jan 2005
Which file? The one with no 00 bytes? Just use a StreamWriter and the
relevant encoding, e.g. Encoding.UTF8, Encoding.Default or
Encoding.ASCII.
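
A minimal sketch of that, assuming the target program wants plain single-byte text. The class name `PlainTextWriter`, the temp-file name and the sample text are all placeholders for this example; which encoding is actually right depends on the program that reads the file.

```csharp
using System;
using System.IO;
using System.Text;

static class PlainTextWriter
{
    // Writes text to path with an explicitly chosen encoding (overwriting any existing file).
    public static void Write(string path, string text, Encoding encoding)
    {
        using (StreamWriter writer = new StreamWriter(path, false, encoding))
        {
            writer.Write(text);
        }
    }

    static void Main()
    {
        // Hypothetical output location, used only for this demonstration.
        string path = Path.Combine(Path.GetTempPath(), "encoding-demo.txt");

        Write(path, "key=value;", Encoding.ASCII);

        // Dump the bytes so the result can be compared with a hex editor view.
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
    }
}
```

With Encoding.ASCII the file holds one byte per character and no header. Note that passing Encoding.UTF8 to a StreamWriter puts a three-byte BOM (EF BB BF) at the start of a new file; `new UTF8Encoding(false)` is the usual way to get UTF-8 without it.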

It sounds like you've currently got Encoding.Unicode (UTF-16, little-endian)
with a BOM (byte order mark): FF FE is the BOM, and each 00 is the second
byte of a 16-bit character.
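
To see where those bytes come from, here is a small sketch that prints what Encoding.Unicode produces next to Encoding.ASCII. The class name `Utf16Demo` and the helper `Hex` are invented for the example.

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Formats a byte array the way a hex editor would show it, e.g. "FF-FE".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        // The preamble is the BOM that StreamWriter puts at the start of a new file.
        Console.WriteLine(Hex(Encoding.Unicode.GetPreamble()));  // FF-FE

        // Two bytes per character, low byte first, so ASCII-range text gets 00 after each letter.
        Console.WriteLine(Hex(Encoding.Unicode.GetBytes("Hi")));  // 48-00-69-00

        // One byte per character and an empty preamble.
        Console.WriteLine(Hex(Encoding.ASCII.GetBytes("Hi")));    // 48-69
    }
}
```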

 
MrNobody
 
      6th Jan 2005

OK, I used UTF7 and it looks like the closest match (the file starts right
away with text and there are no 00 null bytes between characters), but
when I write a semicolon to the file it comes out as "+ADs-". What's
that all about??
 
Jon Skeet [C# MVP]
 
      6th Jan 2005
You almost certainly *don't* want UTF7 - it's almost exclusively used
in mail. See http://www.faqs.org/rfcs/rfc2152.html for the details.
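
As for "+ADs-": UTF-7 writes any character it does not pass through directly as a run of base64-encoded big-endian UTF-16 between '+' and '-', and by default .NET's UTF-7 encoder treats the semicolon that way. The sketch below decodes such a run by hand rather than calling the UTF-7 encoder itself; `Utf7Peek` and `DecodeShiftedRun` are names made up for this illustration, and the method assumes a well-formed run.

```csharp
using System;
using System.Text;

static class Utf7Peek
{
    // Decodes a single UTF-7 shifted run such as "+ADs-".
    // Assumes the run starts with '+', ends with '-', and is not the "+-" escape for a literal plus.
    public static string DecodeShiftedRun(string run)
    {
        // Strip the leading '+' and trailing '-'.
        string b64 = run.Substring(1, run.Length - 2);

        // UTF-7 omits base64 padding, so restore it before decoding.
        b64 = b64.PadRight((b64.Length + 3) / 4 * 4, '=');

        // The payload is UTF-16 in big-endian byte order.
        byte[] utf16be = Convert.FromBase64String(b64);
        return Encoding.BigEndianUnicode.GetString(utf16be);
    }

    static void Main()
    {
        // "ADs" decodes to the bytes 00 3B, which is U+003B, the semicolon.
        Console.WriteLine(DecodeShiftedRun("+ADs-"));  // ;
    }
}
```

So the file only looked like a match because letters and digits pass through UTF-7 unchanged; punctuation such as the semicolon does not.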

It's not a good idea to just arbitrarily guess about what encoding you
should use. What is this file, what's going to use it, etc?

 