Encoding problem on VB.Net 2005

Lorenzo Puglioli · Jan 23, 2008

Hi all,

In my Windows application I have to read a text file into a string array. The
file contains a û character (code 251).
On my PC everything works fine, but on the production machine the û characters
disappear, leaving the resulting string one character shorter.

The code that does the reading is shown here:

Using sr As New System.IO.StreamReader(Me.File.OpenRead, System.Text.Encoding.UTF8)
    Dim s As New List(Of String)
    ' Read the file line by line until the end of the stream.
    Do While Not sr.EndOfStream
        s.Add(sr.ReadLine)
    Loop
    Return s.ToArray
End Using

Why is the behavior different on the two PCs, even though I force UTF-8
decoding?
My app runs on .NET Framework 2.0; the operating system is Windows XP Pro in
Italian on both PCs.
The Italian language pack was installed on my PC. I also installed it on
the production one, but nothing changed.

Could someone help me?

Thank you in advance.
Lorenzo Puglioli

kimiraikkonen · Jan 23, 2008

Have you tried "System.Text.Encoding.Default" instead of
System.Text.Encoding.UTF8?

SurturZ · Jan 23, 2008

UTF-8 is an encoding scheme so it won't store ASCII 251 as &HFB (By my
calculation it stores it as &HC3BB see http://en.wikipedia.org/wiki/Utf-8).
But I'm assuming you are simply losing a character rather than seeing
complete gibberish.

Perhaps the Byte Order Mark is confusing things? What does the underlying
byte stream look like?

SurturZ · Jan 23, 2008

I mean Unicode code point 251 (U+00FB), not ASCII 251. Duh.
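
The byte calculation above can be checked directly. The following is a minimal sketch, not code from the thread; the module name is hypothetical. It builds U+00FB from its code point and prints the bytes that UTF-8 encoding produces for it.

```vbnet
Imports System
Imports System.Text

Module Utf8BytesDemo ' hypothetical name, for illustration only
    Sub Main()
        ' Build U+00FB (û) from its code point so the encoding of this
        ' source file cannot affect the result.
        Dim text As String = ChrW(251)

        ' Encode the character as UTF-8 and print the bytes in hex.
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(text)
        Console.WriteLine(BitConverter.ToString(bytes)) ' prints C3-BB
    End Sub
End Module
```

A file that stores û as the single byte &HFB is therefore not a UTF-8 file, which may be relevant to the behavior described in the question.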

Cor Ligthert[MVP] · Jan 23, 2008

Lorenzo,

Probably you mean this one

\\\
Dim Str As New StreamReader(FilePath)
Dim arrInput As Byte() = _
System.Text.Encoding.GetEncoding(437).GetBytes(Str.ReadToEnd)
Str.Close()
///

Have a look at my reply to Gino for the rest of your problem.

Cor

Lorenzo Puglioli · Jan 23, 2008

Thanks Cor,

I can try the encoding you suggest (437, MS-DOS), but I still can't
understand what is happening.
Why do I get different results on different PCs, even when I force a
specific encoding?

In addition, one thing I didn't mention in my first post: I don't really
care whether the (251) character itself is read correctly, because it is in a
part of the file that I ignore. But I read some other characters after it by
position, so the problem is that they end up shifted left.

Let me explain.
For simplicity, I reduced the input file to 5 bytes:
(32)(251)(32)(13)(10)
On my PC, the resulting string is 3 characters long: a space, a strange
character (which I don't care about) and another space.
On the other PC it is only 2 characters long: a space and another
space. The second space (and all subsequent characters, if any) is
shifted left!

UPDATE:
In the meantime, I ran another test: if I read the file into a
Char array (using the UTF-8 decoder), the result is the same on both PCs. The
difference appears when I convert the Char array to a string.
For example, using:

Dim s As New String(chars)

(where chars is the Char array)

I get the two different results shown above, even though the Char arrays are
identical.
I think this is due to the framework's internal representation of strings.
So why are they different? Can I control this?

Thank you
Lorenzo
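
One way to test the 5-byte file described above without depending on either PC's settings is to decode the bytes in memory. This is a minimal sketch under stated assumptions: the thread never says which encoding the file was saved in, so Windows code page 1252 is only a guess based on the Encoding.Default suggestion, and the module name is hypothetical. It decodes the bytes once as code page 1252 and once with a strict UTF-8 decoder that throws on invalid input instead of silently substituting or dropping bytes.

```vbnet
Imports System
Imports System.Text

Module DecodeDemo ' hypothetical name, for illustration only
    Sub Main()
        ' The 5-byte test file from the post: space, 251, space, CR, LF.
        Dim raw As Byte() = {32, 251, 32, 13, 10}

        ' Assumption: the file is Windows-1252 (ANSI). That code page maps
        ' byte 251 to U+00FB, so all five bytes become five characters.
        Dim ansi As String = Encoding.GetEncoding(1252).GetString(raw)
        Console.WriteLine(ansi.Length) ' prints 5

        ' Strict UTF-8: the second constructor argument (throwOnInvalidBytes)
        ' makes the decoder raise an error on bytes that are not valid UTF-8.
        Dim strict As New UTF8Encoding(False, True)
        Try
            strict.GetString(raw)
            Console.WriteLine("Valid UTF-8")
        Catch ex As ArgumentException
            ' Byte &HFB can never appear in well-formed UTF-8.
            Console.WriteLine("Not valid UTF-8")
        End Try
    End Sub
End Module
```

If the strict decoder rejects the real file, passing the file's actual encoding to the StreamReader instead of Encoding.UTF8 should keep the character positions stable. On .NET Core and later, code page 1252 is not available until the code pages encoding provider is registered; on .NET Framework 2.0, as used in this thread, it is built in.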

Herfried K. Wagner [MVP] · Jan 23, 2008

SurturZ said:
UTF-8 is an encoding scheme so it won't store ASCII 251

There is not even an ASCII 251 because ASCII is a 7-bit encoding ;-).

Cor Ligthert[MVP] · Jan 24, 2008

Lorenzo,

I had the same kind of problem some time ago.

(Copying text through a TextBox gave me unexpected results.)

I had forgotten that I had set my computer to the Polish language; after
setting it back, my problem was gone.

Cor

kimiraikkonen · Jan 24, 2008

In another thread, I was having difficulty displaying and saving non-English
characters; setting the encoding to System.Text.Encoding.Default made
the problem go away. Note that if you're using Notepad, ANSI is the
default encoding, so, as suggested in my first reply, setting the encoding
to Default may solve your problem.

Hope this helps.
