Additional characters when reading htm text file

John

Hi

I have an htm text file which contains just one word, 'test' (without quotes).
When I try to read it with the code below:

Dim fso
Dim Ts
Dim emessage1 As String

Set fso = CreateObject("Scripting.FileSystemObject")
' 1 = ForReading; no format argument is given, so the file is opened as ASCII
Set Ts = fso.OpenTextFile("myfile.htm", 1)
emessage1 = Ts.ReadAll

The value read into emessage1 is not just 'test': the system adds additional
characters in front of the word. What is the problem and how can I get rid
of these additional characters?

Thanks

Regards
 
Paul Shapiro

What do you see in the file if you open it with Notepad? If it looks correct
in Notepad, then it could be an issue with the character encoding. That's
usually what produces strange characters. If the file is UTF-8 encoded, you
have to specify the encoding when opening the file. The ADO Stream object
allows you to specify a file encoding.
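
A minimal sketch of the ADO Stream approach, assuming the file is UTF-8 encoded (the thread does not confirm this, so treat the charset as a guess). It uses late binding, so no project reference should be needed; the variable name stm is only illustrative.

```vb
Dim stm As Object
Dim emessage1 As String

Set stm = CreateObject("ADODB.Stream")
stm.Type = 2                  ' 2 = adTypeText
stm.Charset = "utf-8"         ' assumed encoding; "unicode" would be the guess for UTF-16
stm.Open
stm.LoadFromFile "myfile.htm"
emessage1 = stm.ReadText      ' with no argument, reads the whole stream
stm.Close
```

If the charset matches the file, the extra leading characters should no longer show up in emessage1. If they still do, the encoding guess is probably wrong.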
 
John

Hi Paul

In Notepad I see test.

How can I figure out what the encoding of the htm file is, so I can set it
accordingly in ADO Stream?

Thanks

Regards
 
Paul Shapiro

I think the first few bytes of a Unicode file might specify the encoding,
but I don't know the details. You could try searching the web, because there
should be a reasonable way, or experiment to see what works. I would try
UTF-8 and Unicode or UTF-16 as first guesses.
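
For what it's worth, those first few bytes are usually called a byte order mark (BOM): EF BB BF for UTF-8, FF FE for UTF-16 little-endian, and FE FF for UTF-16 big-endian. A rough sketch of checking for them follows. GuessEncodingFromBom is a hypothetical helper name, not an existing function, and it assumes the file exists. A file without a BOM can still be UTF-8, and other encodings such as UTF-32 are not handled, so an empty result only means "unknown".

```vb
' Hypothetical helper: guesses the encoding from a leading byte order mark.
Function GuessEncodingFromBom(ByVal filePath As String) As String
    Dim f As Integer
    Dim b(0 To 2) As Byte
    Dim size As Long

    f = FreeFile
    Open filePath For Binary Access Read As #f
    size = LOF(f)
    If size > 0 Then Get #f, 1, b     ' read up to the first three bytes
    Close #f

    If size >= 3 And b(0) = &HEF And b(1) = &HBB And b(2) = &HBF Then
        GuessEncodingFromBom = "UTF-8"
    ElseIf size >= 2 And b(0) = &HFF And b(1) = &HFE Then
        GuessEncodingFromBom = "UTF-16 LE"
    ElseIf size >= 2 And b(0) = &HFE And b(1) = &HFF Then
        GuessEncodingFromBom = "UTF-16 BE"
    Else
        GuessEncodingFromBom = ""     ' no BOM found; encoding unknown
    End If
End Function
```

Calling GuessEncodingFromBom("myfile.htm") from the Immediate window gives a first hint about which charset to try.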

You might try opening it in Notepad again. From the File menu choose Save As
and see what it is suggesting as the encoding type. I would hope it suggests
the current encoding.
 