Missing characters after file rewrite using File.OpenText

Zark3

Hi all,
Unsure if this is the best group to place this, but here it is anyway
;).
I've got a large text file that needs rewriting into a different
format, and decided to try it using C#, which usually does my
programming tricks... However, this time I've got a difference of
opinion with the result :(
In words using accents and special chars (e.g. façade [c cedilla] or
één [e acute]) the result of my efforts just omits these characters.
Not the words entirely, just those letters (e.g. façade turns into
faade). Pretty much, my question is why? I'm probably just forgetting
to set a text-encoding variable somewhere, but I can't seem to find out
where it should go.

Algorithm: (and yes, I know it's not the most elegant solution...)
---
StreamReader input = File.OpenText("C:\\ERUIT.txt");
string fields="", values="";
ArrayList lines = new ArrayList();
string inputline = input.ReadLine();
while(inputline != null) {
    if(inputline.Substring(0, 1) == "$") {
        fields = fields.Substring(0, fields.Length-2);
        values = values.Substring(0, values.Length-2);
        lines.Add(String.Format("INSERT INTO InMagic ({0}) VALUES ({1});",
            fields, values));
        fields = ""; values = "";
    }
    else {
        string tla = inputline.Substring(0, 3);
        if(tla == "SUB" || tla == "ODN" || tla == "RTY" || tla == "LOG" ||
           tla == "SCG" || tla == "NOT" || tla == "DAT" || tla == "SCS" ||
           tla == "QAL" || tla == "MAT" || tla == "LOA" || tla == "SCP" ||
           tla == "SID" || tla == "RGN" || tla == "REF" || tla == "COL" ||
           tla == "PHG" || tla == "MIS" || tla == "CON" || tla == "RIN" ||
           tla == "COD" || tla == "HEI" || tla == "RPN") {
            if(tla == "NOT") tla = "_NOT";
            fields += String.Format("{0}, ", tla);
            values += String.Format("\"{0}\", ", inputline.Substring(4));
        }
        else {
            values = values.Substring(0, values.Length-3);
            values += String.Format("{0}\", ", inputline);
        }
    }
    inputline = input.ReadLine();
}
input.Close();
StreamWriter output = File.CreateText("C:\\OutputKern.txt");
for(int i=0; i<lines.Count; i++) {
    output.WriteLine(lines[i]);
}
output.Close();
---

PS: Config: WinXP having framework 1.0, 1.1 and 2.0 installed using VS
2k3

Thanks for every pointer anyone can give me,

Chris
 
Jon Skeet [C# MVP]

Zark3 said:
Unsure if this is the best group to place this, but here it is anyway
;).
I've got a large text file that needs rewriting into a different
format, and decided to try it using C#, which usually does my
programming tricks... However, this time I've got a difference of
opinion with the result :(
In words using accents and special chars (e.g. façade [c cedilla] or
één [e acute]) the result of my efforts just omits these characters.
Not the words entirely, just those letters (e.g. façade turns into
faade). Pretty much, my question is why? I'm probably just forgetting
to set a text-encoding variable somewhere, but I can't seem to find out
where it should go.

If your input file isn't in UTF-8, you should specify the encoding when
you create your StreamReader.

If your output file isn't meant to be UTF-8, you should specify the
encoding when you create your StreamWriter.
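
A minimal sketch of that advice, not code from the thread: File.OpenText and File.CreateText always use UTF-8, so the encoding has to be passed to the StreamReader and StreamWriter constructors instead. The names EncodingDemo and Rewrite, the temporary files, and the choice of ISO-8859-1 are assumptions made purely for illustration. On a Western European Windows machine the file may well be Windows-1252 instead, in which case Encoding.Default (on the .NET Framework) or Encoding.GetEncoding(1252) is probably the better fit.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: copies a text file line by line, decoding the
    // input with an explicit encoding and writing the output as UTF-8.
    public static void Rewrite(string inputPath, string outputPath, Encoding inputEncoding)
    {
        using (StreamReader input = new StreamReader(inputPath, inputEncoding))
        using (StreamWriter output = new StreamWriter(outputPath, false, Encoding.UTF8))
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                output.WriteLine(line);
            }
        }
    }

    static void Main()
    {
        // Assumption: the input is a single-byte Western European encoding.
        // ISO-8859-1 is only an example; substitute the file's real encoding.
        Encoding inputEncoding = Encoding.GetEncoding("iso-8859-1");

        // Temporary files stand in for the real input and output paths.
        string inputPath = Path.GetTempFileName();
        string outputPath = Path.GetTempFileName();

        // Build a sample non-UTF-8 file: c cedilla (U+00E7) is stored as the
        // single byte 0xE7, which is not valid UTF-8 on its own.
        File.WriteAllBytes(inputPath, inputEncoding.GetBytes("fa\u00E7ade\n\u00E9\u00E9n\n"));

        Rewrite(inputPath, outputPath, inputEncoding);

        // True if the accented characters survived the round trip.
        string result = File.ReadAllText(outputPath, Encoding.UTF8);
        Console.WriteLine(result.Contains("fa\u00E7ade") && result.Contains("\u00E9\u00E9n"));

        File.Delete(inputPath);
        File.Delete(outputPath);
    }
}
```

In the original program the equivalent change would be to replace File.OpenText("C:\\ERUIT.txt") with new StreamReader("C:\\ERUIT.txt", someEncoding), and likewise File.CreateText with a StreamWriter constructor that takes an encoding, if the output should not be UTF-8.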
 