upper 128 ASCII chars in a text file

Sean Kirkpatrick

As part of my ongoing effort to provide a set of .Net wrappers for DAO,
I'm writing a simple parser in VB.Net that searches a collection of VB6
source files and adds explicit qualifiers to existing variables.

existing: dim DB as Database
...
DB(0)

new: DB.Tabledefs(0)

For the most part it's working, except that I've discovered that the
original authors of the VB6 code (my boss and the guy that I replaced)
have embedded upper 128 ASCII characters in the VB6 files, i.e., the
copyright symbol, etc. Most of them are included as part of a comment
somewhere, and so if I trash it it's not a big deal. However, there are
a couple that are actually embedded in a string literal as part of the
code - can't trash them.

I've mucked around with some of the various IO namespace file classes to
see if I could easily figure out how to handle this, but while I play, I
figured I'd ask if anyone else has encountered a similar problem and
come up with a solution that I could leverage? Google isn't helpful so far.

Thanks!

Sean
 
Cor Ligthert [MVP]

Sean,

Normally that should not be a problem, provided they used characters
from the code page that is local for you. Your name sounds quite English,
so that is probably code page 1252.

Have a look at these two links to get some insight into that.

Unicode
http://www.geocities.com/Athens/Academy/4038/graph/fontset.htm#b

OS systems
http://www.microsoft.com/globaldev/reference/oslocversion.mspx

Otherwise, search MSDN for "Encoding" (not "encode").

http://search.microsoft.com/search/results.aspx?qu=encoding&View=msdn&st=b&c=0&s=1&swc=0

I hope this helps

Cor
 
Sean Kirkpatrick

Thanks, Cor.

I find that the characters are lost during the oReader.ReadLine call -
they simply are not present in the string read from the file. Hence,
when I write out the converted file, they're not present there
either. It's obvious what's happened when I run a diff on the original
and the converted file.

Could it be that I'm using the wrong type of reader? The examples I've
seen show something like

Dim oReader As System.IO.StreamReader = System.IO.File.OpenText(psModuleName)

Sean
 
Sean Kirkpatrick

Yup, on input, it's throwing away anything above 127

my input line -> gvProductType1 = 1 ' comment with ® (174)

when read from the file, the ® is missing from the string variable used
in the input operation

sVar = oReader.Readline

Very strange - I thought the StreamReader could handle UTF-8 characters.
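
A small sketch of why the ® goes missing, assuming the file stores it as the single byte 174 (which is how a Windows-1252 file would store it). The module name and variable names are made up for illustration. It decodes that one byte twice, once as code page 1252 and once as UTF-8, and prints the length of each result so the difference is visible without relying on how the console renders the character.

```vb
Imports System
Imports System.Text

Module Byte174Demo
    Sub Main()
        ' One byte with value 174 (&HAE), as it would sit in the VB6 source file.
        Dim raw() As Byte = {174}

        ' Assumption: the file was saved in Windows-1252, where 174 is the (R) sign.
        Dim asCp1252 As String = Encoding.GetEncoding(1252).GetString(raw)

        ' In UTF-8 a lone byte 174 is not a complete character.
        Dim asUtf8 As String = Encoding.UTF8.GetString(raw)

        Console.WriteLine("1252:  length {0}, char code {1}", _
                          asCp1252.Length, AscW(asCp1252(0)))
        Console.WriteLine("UTF-8: length {0}", asUtf8.Length)
    End Sub
End Module
```

What the UTF-8 line shows depends on the framework version: the invalid byte may be dropped outright or swapped for a replacement character, but in neither case does the original ® come back.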

Sean
 
Norman Diamond

Sean Kirkpatrick said:
Yup, on input, it's throwing away anything above 127

I didn't look at the DAO stuff yet, but the Encoding default is not much
use here. As far as I know, File.OpenText and StreamReader assume UTF-8
unless told otherwise, and UTF-8 agrees with ASCII only for codepoints 0
through 127. (There is no such thing as upper 128 ASCII; you mean the upper
128 of some 256-character code page, but you didn't say which code page.)

my input line -> gvProductType1 = 1 ' comment with ® (174)

when read from the file, the ® is missing from the string variable used in
the input operation

sVar = oReader.Readline

Very strange - I thought the StreamReader could handle UTF-8 characters.

It can, and that is the problem: your file is not UTF-8. In UTF-8, codepoint
174 (®) is stored as two bytes, 0xC2 0xAE. A lone byte 174 is a continuation
byte with no lead byte in front of it, so it is an invalid sequence and the
decoder drops or replaces it. You need to figure out which code page the
files were saved in (probably 1252, as Cor suggested) and tell the Encoding
stuff to use that code page, for both reading and writing.
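
One way to follow that advice, as a minimal sketch that assumes the VB6 sources were saved in Windows-1252. The file names are hypothetical, and the loop copies each line unchanged where the real parser would rewrite it. The same Encoding object is passed to both the reader and the writer so that bytes above 127 survive the round trip.

```vb
Imports System.IO
Imports System.Text

Module Cp1252RoundTrip
    Sub Main()
        ' Assumption: the files use code page 1252; substitute the real one if it differs.
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        ' Hypothetical file names, for illustration only.
        Dim inputPath As String = "Module1.bas"
        Dim outputPath As String = "Module1.converted.bas"

        Using reader As New StreamReader(inputPath, cp1252)
            ' False = overwrite rather than append.
            Using writer As New StreamWriter(outputPath, False, cp1252)
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    ' The parser's rewriting of the line would go here.
                    writer.WriteLine(line)
                    line = reader.ReadLine()
                End While
            End Using
        End Using
    End Sub
End Module
```

On the .NET Framework, code page 1252 is available out of the box. On .NET Core and later it is not, and Encoding.GetEncoding(1252) fails unless the code-pages encoding provider has been registered first.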
 
