The first line of the files I'm getting is fouled up, so I cannot open or
read them at all using any of the XML features in VB. The first line is not
recognizable. The header says the file is UTF-8 but it's not, and the
double quotes in the header are not coming through as double quotes.
When I use StreamReader, alter the first line and then save it as a new
file, that almost works, but the characters that need the correct encoding
actually get changed to something else in the save process. I'm guessing
the StreamReader is interpreting them funny, so it doesn't really matter
what I change the header to: the characters themselves change (I checked
in a hex editor to be sure).
So since it works to manually open these files in notepad and simply change
the header to the correct encoding, the characters themselves MUST have the
correct binary values. All that needs to be done is to change that header to
the right encoding without fouling up the characters in the body.
So how can I open the file in the most raw form of text, replace that first
line and save it without changing the characters in question in the process?
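One way this might be done is to skip text decoding altogether and work on the raw bytes. The following is only a sketch, under these assumptions: the XML declaration sits entirely on the first line, that line ends with a line feed (byte 10), and the file is in a single-byte encoding with no byte order mark. The module and function names (HeaderSwap, ReplaceFirstLine) are made up for illustration.

```vbnet
Imports System
Imports System.Text

Module HeaderSwap
    ' Replaces everything before the first line break with newHeader.
    ' All bytes from the line break onward are copied through unchanged.
    Function ReplaceFirstLine(data As Byte(), newHeader As String) As Byte()
        Dim lf As Integer = Array.IndexOf(data, CByte(10))
        If lf < 0 Then Return data   ' no line break found: leave the data alone

        ' Keep a preceding carriage return so CR LF line endings survive.
        Dim start As Integer = lf
        If lf > 0 AndAlso data(lf - 1) = 13 Then start = lf - 1

        ' The new header is plain ASCII, so ASCII encoding is sufficient for it.
        Dim headerBytes As Byte() = Encoding.ASCII.GetBytes(newHeader)
        Dim result(headerBytes.Length + data.Length - start - 1) As Byte
        Buffer.BlockCopy(headerBytes, 0, result, 0, headerBytes.Length)
        Buffer.BlockCopy(data, start, result, headerBytes.Length, data.Length - start)
        Return result
    End Function
End Module
```

It would be called with something like File.ReadAllBytes on the input path, passing "<?xml version=""1.0"" encoding=""ISO-8859-1""?>" as the new header, and the result written with File.WriteAllBytes. Because nothing is ever decoded into a String, the body bytes cannot be reinterpreted along the way.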
I made some progress with this:
Dim sr As New StreamReader(xmlFilesLocation & "\" & sArticleToPost, Encoding.UTF7)
Dim text As String = sr.ReadToEnd
Dim text2() As String
ReDim text2(1)
text2(0) = text.Replace("<?xml version=1.0 encoding=UTF-8?>", "<?xml version=""1.0"" encoding=""ISO-8859-1""?>")
System.IO.File.WriteAllLines(xmlFilesLocation & "\x" & sArticleToPost, text2)
The text2 variable shows the correct characters and when I copy its value
into notepad it's fine. But it doesn't save right. I still get weirder
characters than I want. It's supposed to have characters like N with a
tilde, O with a tilde, O with an accent mark, etc. There are about 6 or 7 I
expect to see in this file. But when I open the newly saved files, those
characters are converted into very strange characters that I'd have to show
you.
I have a question regarding all of this. The encoding header merely tells
the program that's opening the file how to read the characters that are in
it. The characters are of course ultimately stored in binary so the encoding
knows how to interpret the binary into readable characters. If I open a file
using one encoding and the characters look a certain way, and then save it
using another, the bytes on disk change. Is this all true? Am I
understanding this or not? I mean, the 0's and 1's that are stored on disk
don't change just because of the way you open the file. If you open it using
one interpreter (encoding) and they look one way, then open it using another
encoding, you'll see different characters. That makes sense to me. So the
only way I could see the binary changing is if the encoding used when saving
turns the characters into a different string of 1's and 0's. Yes?
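A small sketch that seems to bear this out: the same byte is decoded once, then encoded back with two different encodings. It assumes the body really is ISO-8859-1, where the byte F1 is N with a tilde (lowercase). The names EncodingDemo and Transcode are hypothetical.

```vbnet
Imports System
Imports System.Text

Module EncodingDemo
    ' Decodes raw bytes with one encoding, then re-encodes the resulting
    ' string with another. Reading alone never touches the bytes on disk;
    ' only this re-encoding step can produce different bytes.
    Function Transcode(raw As Byte(), readAs As Encoding, saveAs As Encoding) As Byte()
        Dim s As String = readAs.GetString(raw)
        Return saveAs.GetBytes(s)
    End Function

    Sub Demo()
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Dim raw As Byte() = {&HF1}   ' the single byte for ñ in ISO-8859-1

        ' Same encoding in and out: the byte survives.
        Console.WriteLine(BitConverter.ToString(Transcode(raw, latin1, latin1)))         ' F1
        ' Saved as UTF-8: the same character becomes two bytes.
        Console.WriteLine(BitConverter.ToString(Transcode(raw, latin1, Encoding.UTF8)))  ' C3-B1
    End Sub
End Module
```

So the character is the same in both cases, but the bytes written depend entirely on the encoding used at save time.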
Okay, so when I choose the "encoding" parameter of StreamReader, there are
only about 5 options (UTF-7, UTF-8, UTF-32, ASCII, Default, ...). How do I
tell it I want it to read AND SAVE as ISO-8859-1?
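The static properties on Encoding are only shortcuts. Encoding.GetEncoding accepts a name such as "ISO-8859-1" and returns an Encoding that can be passed to both StreamReader and StreamWriter. A minimal sketch, assuming the body really is ISO-8859-1 and the declaration is on the first line; Latin1Rewrite and RewriteHeader are hypothetical names.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module Latin1Rewrite
    ' Reads inPath as ISO-8859-1, swaps the first line for newHeader,
    ' and writes the result to outPath using the same encoding.
    Sub RewriteHeader(inPath As String, outPath As String, newHeader As String)
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")

        Dim text As String
        Using reader As New StreamReader(inPath, latin1)
            text = reader.ReadToEnd()
        End Using

        ' Find the end of the first line, keeping a CR LF pair intact.
        Dim eol As Integer = text.IndexOf(ChrW(10))
        If eol < 0 Then eol = text.Length
        If eol > 0 AndAlso text(eol - 1) = ChrW(13) Then eol -= 1

        ' False = overwrite rather than append.
        Using writer As New StreamWriter(outPath, False, latin1)
            writer.Write(newHeader)
            writer.Write(text.Substring(eol))
        End Using
    End Sub
End Module
```

Every ISO-8859-1 byte maps to exactly one character and back, so reading and writing with this same encoding should leave the body bytes as they were. By contrast, File.WriteAllLines called without an encoding argument writes UTF-8, which would turn each accented character into two bytes.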
Opening UTF-7 seems to help but OMG when I save using UTF-7 things are a big
mess.
Thanks,
Keith