Encoding problem on VB.Net 2005

Lorenzo Puglioli

Hi all,

In my Windows application I have to read a text file into a string array. The
file contains a û character (code 251).
On my PC everything works fine, but on the production machine the û characters
disappear, leaving the resulting string 1 character shorter.

The code that does the reading is shown here:

' Me.File is the file being read; decode it as UTF-8, line by line.
Using sr As New System.IO.StreamReader(Me.File.OpenRead(), System.Text.Encoding.UTF8)
    Dim s As New List(Of String)
    Do While Not sr.EndOfStream
        s.Add(sr.ReadLine())
    Loop
    Return s.ToArray()
End Using

Why, even if I force UTF-8 decoding, is the behavior not the same on
different PCs?
My app runs on .NET Framework 2.0; the operating system is Windows XP Pro in
Italian (on both PCs).
The Italian language pack was installed on my PC. I also installed it on
the production one, but nothing changed.

Could someone help me?

Thank you in advance.
Lorenzo Puglioli
 
kimiraikkonen

Have you tried "System.Text.Encoding.Default" instead of
System.Text.Encoding.UTF8?
 
SurturZ

UTF-8 is an encoding scheme so it won't store ASCII 251 as &HFB (By my
calculation it stores it as &HC3BB see http://en.wikipedia.org/wiki/Utf-8).
But I'm assuming you are simply losing a character rather than seeing
complete gibberish.
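
To check the arithmetic above, here is a minimal sketch that prints the bytes .NET produces for û (U+00FB) under UTF-8 and under Windows-1252. The module name is made up for illustration, and code page 1252 is only an assumption about how the file was written (it is available out of the box on .NET Framework).

```vbnet
Imports System
Imports System.Text

Module EncodeDemo
    Sub Main()
        ' û is U+00FB; build it from its code point so the encoding of
        ' this source file does not matter.
        Dim uCirc As String = ChrW(&HFB)

        ' UTF-8 needs two bytes for any code point from U+0080 to U+07FF.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(uCirc)
        Console.WriteLine(BitConverter.ToString(utf8Bytes))   ' C3-BB

        ' Windows-1252 (assumed here) stores the same character as the
        ' single byte 251.
        Dim ansiBytes As Byte() = Encoding.GetEncoding(1252).GetBytes(uCirc)
        Console.WriteLine(BitConverter.ToString(ansiBytes))   ' FB
    End Sub
End Module
```

Read the other way round, a lone byte 251 followed by a space is not a valid UTF-8 sequence, so a UTF-8 decoder has to either drop it or substitute a replacement character.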

Perhaps the Byte Order Mark is confusing things? What does the underlying
byte stream look like?
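
One way to answer that question is to dump the raw bytes before any decoding happens. The following is only a sketch: the module name and the use of a command-line argument for the path are assumptions, not part of the original code.

```vbnet
Imports System
Imports System.IO

Module DumpBytes
    Sub Main(ByVal args As String())
        ' args(0) is the path of the file to inspect (hypothetical usage).
        Dim data As Byte() = File.ReadAllBytes(args(0))

        ' Print every byte as hex, in the form 20-FB-20-0D-0A.
        Console.WriteLine(BitConverter.ToString(data))

        ' A UTF-8 byte order mark is the three bytes EF BB BF at the start.
        Dim hasBom As Boolean = data.Length >= 3 AndAlso _
            data(0) = &HEF AndAlso data(1) = &HBB AndAlso data(2) = &HBF
        Console.WriteLine("UTF-8 BOM present: " & hasBom.ToString())
    End Sub
End Module
```

Running it on both PCs against the same file shows whether the bytes themselves differ or only the decoding does.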
 
Cor Ligthert[MVP]

Lorenzo,

Probably you mean this one

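\\\
' FilePath is assumed to hold the path of the text file.
Dim Str As New System.IO.StreamReader(FilePath)
' Convert the text that was read into code page 437 (MS-DOS) bytes.
Dim arrInput As Byte() = _
    System.Text.Encoding.GetEncoding(437).GetBytes(Str.ReadToEnd())
Str.Close()
///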

Have a look at my reply to Gino for the rest of your problem.

Cor
 
Lorenzo Puglioli

Thanks Cor,

I can try using the encoding you suggest (437, MS-DOS), but I still can't
understand what is happening.
Why, even if I force a specific encoding, do I get different results on
different PCs?

In addition, something I didn't say in my first post: I don't really care
whether the (251) character itself is read, because it is in a part of the
file that I ignore. But I read some other characters after it by position, so
the problem is that they end up shifted left.

Let me explain.
For simplicity, I reduced the input file to 5 bytes:
(32)(251)(32)(13)(10)
On my PC, the resulting string is 3 characters long: a space, a strange
character (that I don't care about) and a space again.
On the other PC it is only 2 characters long: a space and another
space. The second space (and all subsequent characters, if any) is
shifted left!
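
The 5-byte sample can be reproduced without a file. The sketch below decodes the same bytes from a MemoryStream, first assuming the file was written in Windows-1252 (where byte 251 is û) and then with UTF-8 as in the original code. The code page is a guess, not something confirmed in this thread, and the module and function names are made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module SampleDecode
    ' Decode the bytes with the given encoding and return the first line.
    Function FirstLine(ByVal data As Byte(), ByVal enc As Encoding) As String
        Using sr As New StreamReader(New MemoryStream(data), enc)
            Return sr.ReadLine()
        End Using
    End Function

    Sub Main()
        ' The 5-byte sample: space, byte 251, space, CR, LF.
        Dim data As Byte() = {32, 251, 32, 13, 10}

        ' Windows-1252 is an assumption about how the file was written.
        Dim ansiLine As String = FirstLine(data, Encoding.GetEncoding(1252))
        Console.WriteLine(ansiLine.Length)      ' 3
        Console.WriteLine(AscW(ansiLine(1)))    ' 251

        ' With UTF-8 the lone byte 251 is malformed input; the length
        ' printed here is the value to compare between the two PCs.
        Dim utf8Line As String = FirstLine(data, Encoding.UTF8)
        Console.WriteLine(utf8Line.Length)
    End Sub
End Module
```

With a single-byte code page every input byte maps to exactly one character, so positional reads after the û cannot shift.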

UPDATE:
In the meantime, I have made another test: if I read the file into a
Char array (using the UTF-8 decoder), the result is the same on both PCs. The
difference appears when I turn the Char array into a string.
For example, using:

Dim s As New String(chars)

(where chars is the Char array)

I obtain the two different results described above, even though the Char
arrays are identical.
I think this is due to the framework's internal representation of strings.
So why are they different? Can I control this?
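
A way to verify that claim is to print the numeric code of every element, both for the Char array and for the string built from it, and compare the output from the two PCs. This is a minimal sketch with stand-in data; the module and function names are hypothetical, and in the real test `chars` would be the array filled by the decoder.

```vbnet
Imports System
Imports System.Text

Module DumpChars
    ' Format each UTF-16 code unit as four hex digits, separated by spaces.
    Function CodeUnits(ByVal chars As Char()) As String
        Dim sb As New StringBuilder()
        For Each c As Char In chars
            If sb.Length > 0 Then sb.Append(" ")
            sb.Append(AscW(c).ToString("X4"))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Stand-in data: space, û (U+00FB), space.
        Dim chars As Char() = {" "c, ChrW(&HFB), " "c}
        Dim s As New String(chars)

        Console.WriteLine(CodeUnits(chars))             ' 0020 00FB 0020
        Console.WriteLine(CodeUnits(s.ToCharArray()))   ' 0020 00FB 0020
        Console.WriteLine(s.Length)                     ' 3
    End Sub
End Module
```

Comparing code units rather than what a debugger or text box displays rules out differences that come only from fonts or rendering.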


Thank you
Lorenzo
 
Herfried K. Wagner [MVP]

SurturZ said:
UTF-8 is an encoding scheme so it won't store ASCII 251

There is not even an ASCII 251 because ASCII is a 7-bit encoding ;-).
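
As a small illustration of that point, the sketch below asks .NET's ASCII encoder to encode character 251; the module name is made up for the example.

```vbnet
Imports System
Imports System.Text

Module AsciiDemo
    Sub Main()
        ' Character 251 (û) lies outside the 7-bit ASCII range 0..127.
        Dim uCirc As String = ChrW(&HFB)
        Dim bytes As Byte() = Encoding.ASCII.GetBytes(uCirc)

        ' The encoder substitutes a question mark for it.
        Console.WriteLine(BitConverter.ToString(bytes))   ' 3F
    End Sub
End Module
```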
 
Cor Ligthert[MVP]

Lorenzo,


I had the same kind of problem as you some time ago.

(When copying text using a textbox, I got unexpected results.)

I had forgotten that I had set my computer to the Polish language. After
setting it back, my problem was gone.

Cor
 
kimiraikkonen

In another thread, I was having difficulty displaying and saving non-English
characters, and setting the encoding to System.Text.Encoding.Default made
the problem go away. Note that if you are using Notepad, ANSI is the
default encoding, so, as suggested in my first reply, setting the encoding
to Default may make your problem go away too.

Hope this helps.
 
