reading 8BIT chars from old DOS file

Nick · Nov 3, 2003

Hi!
I want to load an old Pascal DOS file that
contains records. When I view the file
in a hex editor it's clear how to access the
strings and chars in that file. Since these
are old 8-bit chars (C# uses 16-bit) I read
the file bytewise and convert the bytes
to chars using ENCODER.getChars(). From the chars
I make a big string which should be the file as
I see it in the hex editor. But there are many errors
in it, as the getChars() method seems not to
convert all the chars (bytes) properly.
Any suggestions?

Please Help,
Thanks,
nick

Jon Skeet [C# MVP] · Nov 3, 2003

Nick said:
I want to load an old Pascal DOS file that
contains records. When I view the file
in a hex editor it's clear how to access the
strings and chars in that file. Since these
are old 8-bit chars (C# uses 16-bit) I read
the file bytewise and convert the bytes
to chars using ENCODER.getChars(). From the chars
I make a big string which should be the file as
I see it in the hex editor. But there are many errors
in it, as the getChars() method seems not to
convert all the chars (bytes) properly.
Any suggestions?

You need to use the right Encoding instance when grabbing the string
(you can usually use GetString rather than GetChars). What encoding was
it written in?

(Alternatively, just use a StreamReader with the appropriate encoding
to start with - you'll still need to know what encoding was used.)
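
A minimal sketch of the GetString approach, assuming a current .NET runtime. The class name DecodeBytesSketch and the use of ISO-8859-1 in Main are illustrative placeholders only; the encoding the file was really written in still has to be determined.

```csharp
using System;
using System.IO;
using System.Text;

static class DecodeBytesSketch
{
    // Decodes the whole byte array in one call; the caller chooses the encoding.
    public static string Decode(byte[] data, Encoding encoding)
    {
        return encoding.GetString(data);
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: DecodeBytesSketch <file>");
            return;
        }

        // args[0] is a hypothetical path; ISO-8859-1 is only a placeholder encoding.
        byte[] data = File.ReadAllBytes(args[0]);
        Encoding placeholder = Encoding.GetEncoding("iso-8859-1");
        string text = Decode(data, placeholder);
        Console.WriteLine(text.Length + " characters decoded from " + data.Length + " bytes");
    }
}
```

With a single-byte encoding, the decoded string has exactly one char per byte, so offsets seen in the hex editor carry over to string indices.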

Note that strings are character data, whereas when you're viewing the
file with a hex editor you're viewing it as binary data, possibly with
a text interpretation (in whatever encoding the hex editor feels is
appropriate) available too.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information about this whole topic.

Nick · Nov 3, 2003

Hi!

You mentioned this text interpretation of the
hex editor. So in the hex editor every byte
is a character, and that's exactly what I want,
because in the hex editor I can read all the info
I want to read out of the file, but with C# I
can't convert every byte. Can the hex editor check
the encoding, or is this "no" encoding when every byte
is a character?

thanks,
nick

Jon Skeet [C# MVP] · Nov 3, 2003

Nick said:
You mentioned this text interpretation of the
hex editor. So in the hex editor every byte
is a character, and that's exactly what I want,
because in the hex editor I can read all the info
I want to read out of the file, but with C# I
can't convert every byte. Can the hex editor check
the encoding, or is this "no" encoding when every byte
is a character?

The hex editor can't really check the encoding because (say) *every*
file is a valid ISO-8859-1 file, just as an example.

When every byte is a character you still need to interpret that byte as
a character - and that depends on the encoding. The hex editor may well
be assuming an encoding such as Cp1252, or (for an old DOS file) Cp437.
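
To illustrate why the encoding matters even when every byte is one character, here is a small sketch that decodes the same byte under three single-byte encodings. It assumes .NET Core 3.0 or later, where the DOS and Windows code pages have to be registered through CodePagesEncodingProvider first (on the .NET Framework they are built in); the class and method names are made up for the example.

```csharp
using System;
using System.Text;

static class SameByteSketch
{
    static SameByteSketch()
    {
        // Assumption: modern .NET, where code pages 437 and 1252 need this registration.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes one byte under the given code page and returns its Unicode value as "U+XXXX".
    public static string CodePointOf(byte value, int codePage)
    {
        string decoded = Encoding.GetEncoding(codePage).GetString(new[] { value });
        return "U+" + ((int)decoded[0]).ToString("X4");
    }

    static void Main()
    {
        byte b = 0x84;
        Console.WriteLine("Cp437:      " + CodePointOf(b, 437));   // U+00E4 (a with diaeresis)
        Console.WriteLine("Cp1252:     " + CodePointOf(b, 1252));  // U+201E (low double quote)
        Console.WriteLine("ISO-8859-1: " + CodePointOf(b, 28591)); // U+0084 (a control character)
    }
}
```

All three decodings succeed without an exception, which is why a program cannot tell from the bytes alone which one was intended.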

Nick · Nov 3, 2003

Okay :)
I have the following code:

StreamReader sr = new StreamReader(fileName,
    System.Text.UnicodeEncoding.ASCII);
String s = sr.ReadToEnd();
sr.Close();

for (int i=0;i<3000;i++)
{
    Console.WriteLine("Byte "+i+": "+s[i]);
}

and, of course because of the wrong encoding,
i get the following error output:
Byte 0: Byte 86: 244:
Byte 323: 1: te 480: 637: Byte 716: : e 873:
Byte 874:
Byte 875:

It leaves some bytes undecoded,
and with the few encodings in System.Text.Encoding
I can't get it right.
How can I set other encodings, for example this
Cp1252 or Cp437?

thanks,
nick

Jon Skeet [C# MVP] · Nov 3, 2003

Nick said:
Okay :)
I have the following code:

StreamReader sr = new StreamReader(fileName,
System.Text.UnicodeEncoding.ASCII);

UnicodeEncoding.ASCII is a very weird way of writing that - I've seen
people use ASCIIEncoding.ASCII or UnicodeEncoding.Unicode, but never
mixing the two. I just write Encoding.ASCII, as that's where it's
actually declared.

String s = sr.ReadToEnd();
sr.Close();

for (int i=0;i<3000;i++)
{
    Console.WriteLine("Byte "+i+": "+s[i]);
}

and, of course because of the wrong encoding,
i get the following error output:
Byte 0: Byte 86: 244:
Byte 323: 1: te 480: 637: Byte 716: : e 873:
Byte 874:
Byte 875:

It leaves some bytes undecoded,
and with the few encodings in System.Text.Encoding
I can't get it right.
How can I set other encodings, for example this
Cp1252 or Cp437?

Use Encoding.GetEncoding - see the documentation for it for more
information. You might use Encoding.GetEncoding(1252) for instance, or
Encoding.GetEncoding(437).
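
As a rough sketch, reading the whole file through a StreamReader with code page 437 could look like the following. Two assumptions are baked in: the file really is Cp437, and the code runs on .NET Core 3.0 or later, where the code page must be registered via CodePagesEncodingProvider (the .NET Framework does not need that step). The class name ReadDosFileSketch is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadDosFileSketch
{
    static ReadDosFileSketch()
    {
        // Assumption: modern .NET, where code page 437 is not available by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Reads the whole file as text, decoding each byte with the given code page.
    public static string ReadAllText(string path, int codePage)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);

        // 'false' turns off byte-order-mark detection, so the reader never
        // switches away from the encoding that was asked for.
        using (StreamReader reader = new StreamReader(path, encoding, false))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: ReadDosFileSketch <file>");
            return;
        }

        string text = ReadAllText(args[0], 437);
        Console.WriteLine(text.Length + " characters read");
    }
}
```

Passing 1252 instead of 437 gives the Windows "ANSI" interpretation used in Western European locales.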

Nick · Nov 3, 2003

Hi,
all these GetEncoding methods seem to work,
I get no exception, but I still can't read
the text file the way I want?!
I've now tried it with plain Notepad
(which could/should be easily implemented)
and it works with ANSI and UTF8. I can
read the few things I need.
But when I read it with the StreamReader
and use Encoding.UTF8 or ANSI I get the same
broken output as if I used ASCII
or Unicode (neither of which displays
the data the way I see it in Notepad).

What could be so hard for C# about reading the file
like Notepad does, or just interpreting every
byte as an ASCII character like the hex editor
does?

thanks,
nick

Jon Skeet [C# MVP] · Nov 3, 2003

Nick said:
All these GetEncoding methods seem to work,
I get no exception, but I still can't read
the text file the way I want?!
I've now tried it with plain Notepad
(which could/should be easily implemented)
and it works with ANSI and UTF8. I can
read the few things I need.

Reading it as UTF8 should give radically different results to reading
it as "ANSI" (where "ANSI" means different things depending on your
region).

But when I read it with the StreamReader
and use Encoding.UTF8 or ANSI I get the same
broken output as if I used ASCII
or Unicode (neither of which displays
the data the way I see it in Notepad).

What could be so hard for C# about reading the file
like Notepad does, or just interpreting every
byte as an ASCII character like the hex editor
does?

Interpreting every byte as an ASCII character is easy - just use
Encoding.ASCII. You then run into trouble with bytes > 127 though, as
ASCII doesn't have values > 127.
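
One way to check whether this applies to a given file is to count the bytes that fall outside the ASCII range before decoding anything. The sketch below does only that; the class name is invented for the example, and what Encoding.ASCII actually produces for such bytes (typically some replacement character) depends on the runtime version, so it is deliberately not shown.

```csharp
using System;
using System.IO;

static class AsciiRangeSketch
{
    // Counts the bytes with values above 127, which have no ASCII meaning.
    public static int CountNonAscii(byte[] data)
    {
        int count = 0;
        foreach (byte b in data)
        {
            if (b > 127)
            {
                count++;
            }
        }
        return count;
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: AsciiRangeSketch <file>");
            return;
        }

        byte[] data = File.ReadAllBytes(args[0]);
        Console.WriteLine(CountNonAscii(data) + " of " + data.Length + " bytes are outside ASCII");
    }
}
```

If the count is zero, Encoding.ASCII loses nothing; otherwise a single-byte code page such as 437 or 1252 is needed to keep those bytes.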

One thing to consider is whether or not you're accurately examining the
data after reading it in. I find the best way to find out exactly what
characters I've got in a string is to print out their Unicode values
one at a time, as numbers, and look them up at http://www.unicode.org
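
A small sketch of that kind of check: it lists each char's index together with its Unicode value in hexadecimal, so the result does not depend on what the console can display. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

static class DumpCharsSketch
{
    // Returns one line per char in the form "index: U+XXXX".
    public static List<string> Describe(string text)
    {
        List<string> lines = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            lines.Add(i + ": U+" + ((int)text[i]).ToString("X4"));
        }
        return lines;
    }

    static void Main()
    {
        // Sample input: 'A', a NUL character, and U+00E4.
        foreach (string line in Describe("A\0\u00E4"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Characters such as U+0000 show up clearly in this listing even though they are invisible, or disruptive, when written to the console directly.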

Jon Skeet [C# MVP] · Nov 3, 2003

nick said:
Now I have read a lot of articles on the net
and tried a lot of things; nothing helped.
Now I've opened the file with Notepad using
ANSI encoding and saved it with the extension .txt.
NOW it worked in C#, and I could read the info.
So what's the difference?

Without knowing exactly what you're doing or what's in the file, it's
hard to say.

In Explorer both files have the same size.
I opened them with the hex editor and the ONLY difference
is that every "00" hex pair is replaced by "20" hex.
Hmmm.

And that's the problem with C# reading the file?

I doubt it - but it may be part of the problem with how you were
displaying it.
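
If the 00 bytes are the issue on the display side, one possible workaround is to replace NUL characters with spaces only when printing, leaving the decoded string itself untouched. This is a sketch under the assumption that the NULs are mere padding between fields, which the thread does not confirm; the names are hypothetical.

```csharp
using System;

static class DisplaySketch
{
    // Returns a copy of the text in which every NUL character is shown as a space.
    public static string ForDisplay(string text)
    {
        return text.Replace('\0', ' ');
    }

    static void Main()
    {
        // Sample data: two fields separated by NUL padding.
        string decoded = "AB\0\0\0CD";
        Console.WriteLine("[" + ForDisplay(decoded) + "]"); // prints "[AB   CD]"
    }
}
```

The replacement keeps the string length, so character positions still line up with the byte offsets in the original file.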
