Text encodings

  • Thread starter: Andy Burchill

Andy Burchill

I am using a StreamReader to read some text from a plain .txt file
and then display it in a text box.

When I look at the text file in Notepad and in my program, the text looks
all messed up. When I look at it in WordPad, the spacing is correct and
there are no funny characters.

How can I make sure the right text encoding is used so the file displays
properly?

Any help appreciated.
 
Andy said:
> I am using a StreamReader to read some text from a plain .txt
> file and then display it in a text box.
>
> When I look at the text file in Notepad and in my program, the text looks
> all messed up. When I look at it in WordPad, the spacing is correct
> and there are no funny characters.

Apart from encodings that use a preamble allowing them to be
identified (e.g. UTF-8 with a BOM, or UTF-16), there's no good way to
identify the correct encoding, so applications need to guess. It seems
Notepad is guessing wrong ;-)

> How can I make sure the right text encoding is used so the file
> displays properly?

Silly answer: Use the right encoding. If you want to create an
application that supports multiple encodings, you have to provide some
means for the user to select a specific encoding.
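
For what it's worth, here is a minimal sketch of that idea: honour a BOM when the file has one, and otherwise fall back to whatever encoding the user picked. The class and method names (EncodingAwareReader, ReadAllText) are made up for illustration; only the StreamReader constructor is the real API.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingAwareReader
{
    // Reads the whole file. If the file starts with a BOM, StreamReader
    // switches to the encoding the BOM indicates; otherwise it decodes
    // with the fallback encoding supplied by the caller (i.e. the user).
    public static string ReadAllText(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

You would call it with something like EncodingAwareReader.ReadAllText(path, Encoding.GetEncoding("iso-8859-1")), where the second argument comes from your encoding picker.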


Cheers,
 
> Silly answer: Use the right encoding. If you want to create an
> application that supports multiple encodings, you have to provide some
> means for the user to select a specific encoding.

Thanks for that. What is the easiest way to find out which encoding a
text file is using? I tried all the available encodings in my program,
but none displayed properly.
 
Andy Burchill said:
> Thanks for that. What is the easiest way to find out which encoding a
> text file is using? I tried all the available encodings in my program,
> but none displayed properly.

There's no guaranteed way to find out which encoding a text file is
using. For instance, practically any sequence of bytes will decode as
code page 1252 without an error, but that doesn't mean the result is
what you want.
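
What you can do is rule encodings out. As a rough sketch (the Utf8Check class and IsValidUtf8 method are names I've made up), a strict UTF-8 decoder throws on malformed input instead of silently substituting replacement characters. Passing the check doesn't prove the file is UTF-8 (plain ASCII passes too), but failing it means UTF-8 is definitely the wrong choice.

```csharp
using System.Text;

static class Utf8Check
{
    // Returns true if the bytes are well-formed UTF-8.
    // throwOnInvalidBytes: true makes the decoder throw a
    // DecoderFallbackException rather than emit U+FFFD.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```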

Are you absolutely sure it's a plain text file in the first place,
rather than something like a Word document?
 
Andy Burchill wrote:

> Thanks for that. What is the easiest way to find out which encoding a
> text file is using? I tried all the available encodings in my
> program, but none displayed properly.

As I wrote before, at the end of the day it's trial & error with an
editor that allows you to switch between encodings on the fly.

Cheers,
 
> Are you absolutely sure it's a plain text file in the first place,
> rather than something like a Word document?

I think so. I am currently using the StreamReader's ReadToEnd method;
I'll try reading a line at a time to see if this preserves the original
spacing.

I am enjoying C# so far, but I have a silly question I can't find the
answer to: what's the difference between a class and a namespace?
 
OK, just to elaborate a little: I am trying to rewrite a Java program I
wrote last year. I checked the Java source code and found out that, for
some reason, I was using the UTF-8 encoding when saving .txt files.

Unfortunately, setting the C# StreamReader to use UTF-8 hasn't solved the
problem, so now I am really stuck. If it makes any difference, the Java
program used an OutputStreamWriter wrapped around a BufferedOutputStream
wrapped around a FileOutputStream.

Any help appreciated.
 
Andy Burchill said:
> I think so. I am currently using the StreamReader's ReadToEnd method;
> I'll try reading a line at a time to see if this preserves the original
> spacing.

Well, in what way is it "all messed up" at the moment? If the problem
is at the end of each line, it could well be that your Java code was
writing "\n" rather than the "\r\n" that a textbox would want.
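
One quick way to check is to count the line feeds that aren't preceded by a carriage return. A small sketch (LineEndings and CountBareLineFeeds are hypothetical names, not framework API):

```csharp
static class LineEndings
{
    // Counts '\n' characters that are NOT part of a "\r\n" pair.
    // A non-zero result means the text contains Unix-style line breaks.
    public static int CountBareLineFeeds(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n' && (i == 0 || text[i - 1] != '\r'))
            {
                count++;
            }
        }
        return count;
    }
}
```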

> I am enjoying C# so far, but I have a silly question I can't find the
> answer to: what's the difference between a class and a namespace?

A namespace is just a way of disambiguating names - for instance,
there's System.Windows.Forms.Control and System.Web.UI.Control.
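
A class, by contrast, is an actual type you can create instances of. A tiny illustration, with made-up namespaces (Shapes, Audio) that each contain their own class called Control:

```csharp
using System;

namespace Shapes
{
    // A class: a type with members that you can instantiate.
    public class Control
    {
        public string Describe() { return "Shapes.Control"; }
    }
}

namespace Audio
{
    // Same simple name, different namespace, so there is no clash.
    public class Control
    {
        public string Describe() { return "Audio.Control"; }
    }
}
```

Code elsewhere can then refer to Shapes.Control and Audio.Control by their full names, or bring one namespace in with a using directive.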
 
Jon said:
> Well, in what way is it "all messed up" at the moment? If the problem
> is at the end of each line, it could well be that your Java code was
> writing "\n" rather than the "\r\n" that a textbox would want.

The spacing was wrong: where it should have started a new line it just
wrapped, and there were some funny non-alphanumeric characters here and
there.

I've sorted the problem out now by using a RichTextBox instead. I don't
actually think it was ever a problem with the text encoding at all, just
that the standard TextBox can't handle any formatting whatsoever (not
even new lines). Thinking about it, this seems a little obvious. Oh well,
you live and learn.

Thanks for all your help.
 
Andy Burchill said:
> The spacing was wrong: where it should have started a new line it just
> wrapped, and there were some funny non-alphanumeric characters here and
> there.

Here and there? If it's wherever there should be a newline, that sounds
like your problem.

> I've sorted the problem out now by using a RichTextBox instead. I don't
> actually think it was ever a problem with the text encoding at all, just
> that the standard TextBox can't handle any formatting whatsoever (not
> even new lines). Thinking about it, this seems a little obvious. Oh well,
> you live and learn.

It certainly *can* handle new lines, but it has to be CRLF (\r\n).
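
If you'd rather keep the plain TextBox, you can normalise the line endings before assigning the text. A minimal sketch, assuming the file uses bare "\n" (or a mixture); NewlineNormalizer and ToCrLf are names I've invented:

```csharp
using System.Text.RegularExpressions;

static class NewlineNormalizer
{
    // Rewrites every line break as "\r\n". The alternation tries "\r\n"
    // first, so existing CRLF pairs are matched whole and not doubled.
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, @"\r\n|\r|\n", "\r\n");
    }
}
```

Something like textBox1.Text = NewlineNormalizer.ToCrLf(reader.ReadToEnd()); should then display correctly, where textBox1 and reader stand in for your own control and StreamReader, and the TextBox has Multiline set to true.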
 