TextFieldParser - reading tab delimited file

al jones

I'm using TextFieldParser to read a tab-delimited data file which contains, for example:

Amondó Szegi Amondo Szegi
andré nossek André Nossek
© Characte Character

Note the vowels with diacritics and the copyright symbol. The parser is
dropping these (and other similar) characters, which apparently fall outside
the ASCII range.

The code is simple and looks like:
    Using MyReader As New TextFieldParser(Application.StartupPath & "\designers.txt")
        MyReader.TextFieldType = FileIO.FieldType.Delimited
        MyReader.CommentTokens = New String() {"#"}
        MyReader.Delimiters = New String() {vbTab}
        MyReader.TrimWhiteSpace = True
        Dim currentRow As String()
        intElement = 0
        While Not MyReader.EndOfData
            Try
                currentRow = MyReader.ReadFields()
                ' An "UNKNOWN" row only supplies the fallback designer name.
                If Microsoft.VisualBasic.Left(currentRow(0), 7) = "UNKNOWN" Then
                    strUnknownDesigner = currentRow(1)
                    Continue While
                End If
                arDesigner(intElement, 0) = currentRow(0)
                arDesigner(intElement, 1) = currentRow(1)
                arDesignerCounter(intElement) = 0
                intElement += 1
            Catch ex As MalformedLineException
                MsgBox("Designer Line " & ex.Message & " is not valid and will be skipped.")
            End Try
        End While
    End Using

I can't see any reason in the documentation for it dropping the copyright
symbol or the accented French and German (etc.) vowels.

Comments or suggestions anyone??

Thanks //al
 
Andrew Morton

al said:
I'm using textfieldparser to read a data file. which contains, for
example:

Amondó Szegi Amondo Szegi
andré nossek André Nossek
© Characte Character

Note the vowels with diacriticals and the copyright symbol - it is
dropping these (and other similar) characters which fall outside
ascii range (apparently)

It appears to be an encoding problem where the file uses (I'm guessing)
ISO-8859-1 or maybe Windows-1252 whereas the .NET framework defaults to
Unicode. Does a TextFieldParser have a setting for that (or have a
.BaseClass that does)?

Or perhaps you can arrange for the file to be encoded with Unicode?
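
For reference, TextFieldParser does have constructor overloads that accept a System.Text.Encoding. The following is only a minimal sketch, assuming the file really is ISO-8859-1 (that is the guess above, not something confirmed in this thread). The sample file name and its contents are hypothetical and are written out first so the example is self-contained:

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.FileIO

Module EncodingSketch
    Sub Main()
        ' Assumption: the data file is ISO-8859-1. If it was saved as
        ' Windows-1252 instead, Encoding.GetEncoding(1252) would be the
        ' candidate to try.
        Dim fileEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")

        ' Hypothetical sample file, created in the temp folder in the assumed encoding.
        Dim samplePath As String = Path.Combine(Path.GetTempPath(), "designers-sample.txt")
        File.WriteAllText(samplePath, "andré nossek" & vbTab & "André Nossek" & vbCrLf, fileEncoding)

        ' Passing the encoding tells the parser how to decode bytes above 127
        ' instead of leaving it to fall back on its default.
        Using parser As New TextFieldParser(samplePath, fileEncoding)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(vbTab)
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                Console.WriteLine(fields(0) & " -> " & fields(1))
            End While
        End Using
    End Sub
End Module
```

If the accented characters survive with one of these encodings, that would point to the file not being stored as Unicode in the first place.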

Andrew
 
al jones

Andrew Morton said:
It appears to be an encoding problem where the file uses (I'm guessing)
ISO-8859-1 or maybe Windows-1252 whereas the .NET framework defaults to
Unicode. Does a TextFieldParser have a setting for that (or have a
.BaseClass that does)?

Or perhaps you can arrange for the file to be encoded with Unicode?

Andrew

Possibly my confusion comes from the fact that I maintain these files (there
are three of them) within VS 2005, so I would have expected them to be
Unicode. The characters exist within the files (the three-line example is
cut & paste from the file itself), so I don't understand why reading them
would literally eliminate the characters.

I've been over the TextFieldParser docs and see nothing that indicates that
it shouldn't take the data as presented.
 
