Writing out text with nulls

tshad

I have a program in 2005 that reads a text file, removes text, and then
writes it back out again. It removes lines that start with PRINT.

This program has worked fine for months. Now, all of a sudden, it is reading
a plain text file and adding a null after each character it reads in.
Why is that?

The original file doesn't have any nulls in it. The code is:
********************************************
using System;
using System.IO;
using System.Collections.Generic;
using System.Text;

namespace DeletePrintStatements
{
    class Program
    {
        static void Main(string[] args)
        {
            string lineDisplay;
            string oldLineDisplay;
            FileStream fs = null;

            StreamReader sr = null;
            fs = new FileStream(@"D:\Database Scripts\CurrentSchema101408.sql", FileMode.Open, System.IO.FileAccess.Read);
            sr = new StreamReader(fs);

            StreamWriter sw = null;
            sw = File.CreateText(@"D:\Database Scripts\CurrentSchemaNoPrint101408.sql");

            string stemp = null;
            sw.WriteLine("set nocount on");

            while (sr.Peek() >= 0)
            {
                lineDisplay = sr.ReadLine();

                if (lineDisplay.Length >= 4) stemp = lineDisplay.Substring(0, 4);

                if ((lineDisplay.Length < 5) || (lineDisplay.Substring(0, 5) != "PRINT"))
                    sw.WriteLine(lineDisplay);
                else
                {
                    // Since last line was not a Print statement make sure next line is = "GO" and if so ignore it

                    if (sr.Peek() >= 0)
                    {
                        oldLineDisplay = lineDisplay;
                        lineDisplay = sr.ReadLine();
                        if ((lineDisplay.Length < 2) || (lineDisplay.Substring(0, 2) != "GO"))
                        {
                            sw.WriteLine(oldLineDisplay);  // Should only be the "Update Succeeded" line
                                                           // or a print statement inside of a SP
                            sw.WriteLine(lineDisplay);
                        }
                    }
                }
                Console.WriteLine(lineDisplay);
            }
            fs.Close();
            sr.Close();
            sw.Close();
            Console.ReadLine();
        }
    }
}
********************************************

I have tried closing and reopening the program, but it keeps doing the same
thing.

Thanks,

Tom
 

jimbrown


The output you describe is what Unicode characters would look like.
Maybe your project changed from multi-byte to Unicode.
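To see the pattern Jim describes, it helps to encode the same text both ways. A minimal C# sketch (top-level statements, so .NET 5 or later) using only System.Text.Encoding, where Encoding.Unicode is .NET's name for UTF-16 little-endian:

```csharp
using System;
using System.Text;

// Encode the same ASCII text as UTF-8 and as UTF-16 little-endian.
// Encoding.GetBytes does not prepend a byte order mark.
byte[] utf8 = Encoding.UTF8.GetBytes("SET");
byte[] utf16 = Encoding.Unicode.GetBytes("SET");

Console.WriteLine(BitConverter.ToString(utf8));   // 53-45-54
Console.WriteLine(BitConverter.ToString(utf16));  // 53-00-45-00-54-00
```

In UTF-16 every ASCII character takes two bytes, the second one being 00, which is the same 53 00 45 00 54 00 run that shows up in Tom's hex dump of the output.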
 

tshad

Here is the file I am reading:

SET NUMERIC_ROUNDABORT OFF
GO
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON
GO

Here is what it comes up with:

0: 73 65 74 20 6E 6F 63 6F 75 6E 74 20 6F 6E 0D 0A set nocount on..
10: 53 00 45 00 54 00 20 00 4E 00 55 00 4D 00 45 00 S.E.T. ..N.U.M.E.
20: 52 00 49 00 43 00 5F 00 52 00 4F 00 55 00 4E 00 R.I.C._.R.O.U.N.
30: 44 00 41 00 42 00 4F 00 52 00 54 00 20 00 4F 00 D.A.B.O.R.T. .O.
40: 46 00 46 00 0D 0A 00 0D 0A 00 47 00 4F 00 0D 0A F.F.......G.O...
50: 00 0D 0A 00 53 00 45 00 54 00 20 00 41 00 4E 00 .....S.E.T. .A.N.
60: 53 00 49 00 5F 00 50 00 41 00 44 00 44 00 49 00 S.I._.P.A.D.D.I.
70: 4E 00 47 00 2C 00 20 00 41 00 4E 00 53 00 49 00 N.G.,. ..A.N.S.I.
80: 5F 00 57 00 41 00 52 00 4E 00 49 00 4E 00 47 00 _.W.A.R.N.I.N.G.
90: 53 00 2C 00 20 00 43 00 4F 00 4E 00 43 00 41 00 S.,. ..C.O.N.C.A.
A0: 54 00 5F 00 4E 00 55 00 4C 00 4C 00 5F 00 59 00 T._.N.U.L.L._.Y.
B0: 49 00 45 00 4C 00 44 00 53 00 5F 00 4E 00 55 00 I.E.L.D.S._.N.U.
C0: 4C 00 4C 00 2C 00 20 00 41 00 52 00 49 00 54 00 L.L.,. ..A.R.I.T.
D0: 48 00 41 00 42 00 4F 00 52 00 54 00 2C 00 20 00 H.A.B.O.R.T.,. .
E0: 51 00 55 00 4F 00 54 00 45 00 44 00 5F 00 49 00 Q.U.O.T.E.D._.I.
F0: 44 00 45 00 4E 00 54 00 49 00 46 00 49 00 45 00 D.E.N.T.I.F.I.E.
100: 52 00 2C 00 20 00 41 00 4E 00 53 00 49 00 5F 00 R.,. ..A.N.S.I._.
110: 4E 00 55 00 4C 00 4C 00 53 00 20 00 4F 00 4E 00 N.U.L.L.S. .O.N.
120: 0D 0A 00 0D 0A 00 47 00 4F 00 0D 0A 00 0D 0A 00 .......G.O.......
130: 0D 0A ..


As you can see, the line that was added (set nocount on) doesn't have nulls, but the lines it read do.

What would cause this?
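The dump is consistent with the input file being UTF-16LE without a byte order mark, read by a StreamReader that was given no encoding; that is an inference from the bytes, not something the thread confirms. The C# sketch below (top-level statements, .NET 5 or later) reproduces it in memory, with a hypothetical two-line string standing in for the .sql file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for the .sql file: UTF-16LE text with no byte order
// mark (Encoding.GetBytes never writes one).
byte[] input = Encoding.Unicode.GetBytes("SET NUMERIC_ROUNDABORT OFF\r\nGO\r\n");

// Constructed like the original program: no Encoding argument. With no BOM to
// detect, the reader decodes as UTF-8 and each 00 byte becomes a '\0' char.
var reader = new StreamReader(new MemoryStream(input));
var lines = new List<string>();
string line;
while ((line = reader.ReadLine()) != null)
    lines.Add(line);
reader.Close();

// Show each '\0' as '.', the way a hex viewer's text column does.
foreach (string text in lines)
    Console.WriteLine(text.Replace('\0', '.'));
```

The first line comes back as 52 characters (26 letters plus 26 '\0' chars), and each 0D 00 0A 00 line break is split at the CR and again at the LF, leaving short lines that hold a lone '\0'. Writing those lines back out with WriteLine gives the 0D 0A 00 0D 0A 00 runs seen in the dump.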

Thanks,

Tom

 

tshad

Peter Duniho said:
Please do not post HTML. Use plain text. As for the question...

Here is the file I am reading: [...]

Where did that file come from? As Jim suggested, the text with the 0
bytes does in fact look like Unicode characters (UTF-16, to be specific).
The bytes you posted are a mix of UTF-8 and UTF-16 (UTF-8 is the default for
StreamWriter, and as long as the characters are all in the 0-127 range it
is indistinguishable from ASCII), because you're reading UTF-16 data
from the original file and emitting that data as if it were UTF-8 (along
with the other UTF-8 stuff you've added, such as the first line and the
line breaks).
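The mixed output Peter describes can be checked the same way. In this C# sketch (top-level statements, .NET 5 or later) the string "S\0E\0T\0" is a hypothetical stand-in for a line that was mis-decoded on the way in, and the writer uses StreamWriter's default encoding, UTF-8 without a BOM, which is also what File.CreateText uses:

```csharp
using System;
using System.IO;
using System.Text;

var output = new MemoryStream();
// new StreamWriter(Stream) writes UTF-8 without a byte order mark.
var writer = new StreamWriter(output);
writer.NewLine = "\r\n";              // Windows line breaks on any platform
writer.WriteLine("set nocount on");   // the program's own line: ASCII-identical UTF-8
writer.WriteLine("S\0E\0T\0");        // hypothetical mis-decoded input line
writer.Flush();

string hex = BitConverter.ToString(output.ToArray());
Console.WriteLine(hex);
// 73-65-74-20-6E-6F-63-6F-75-6E-74-20-6F-6E-0D-0A-53-00-45-00-54-00-0D-0A
```

The added line and the line breaks come out as single bytes, while the '\0' chars carried over from the input are written faithfully as 00 bytes.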

Whatever the problem is, it's related to whatever outputs the file you're
reading. Somewhere along the line, it apparently got changed to output
UTF-16. You can either fix your program to read the input as UTF-16
instead, or you can go smack upside the head whatever person it was that
changed the output format without consulting the people it would affect
(such as yourself). And then get them to change it back so that they are
writing UTF-8 or ASCII again (whatever it was that was being written in
the first place).
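Peter's first option, reading the input as UTF-16, comes down to passing an Encoding to the StreamReader constructor. A minimal C# sketch (top-level statements, .NET 5 or later), assuming the input really is UTF-16LE; the byte array is a hypothetical stand-in for the file:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the .sql file: UTF-16LE text with no BOM.
byte[] input = Encoding.Unicode.GetBytes("SET NUMERIC_ROUNDABORT OFF\r\nGO\r\n");

// Encoding.Unicode is used when no BOM is found; BOM detection stays on, so a
// file that does start with a UTF-8 or UTF-16 BOM is still decoded correctly.
var reader = new StreamReader(new MemoryStream(input), Encoding.Unicode);
string first = reader.ReadLine();
string second = reader.ReadLine();
reader.Close();

Console.WriteLine(first);   // SET NUMERIC_ROUNDABORT OFF
Console.WriteLine(second);  // GO
```

In the original program this would be sr = new StreamReader(fs, Encoding.Unicode); the StreamReader(Stream, Encoding) constructor already exists in .NET 2.0. Since File.CreateText writes UTF-8, the output would then contain no 00 bytes for ASCII content.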

Found out what was going on. Just not sure why.

It seems to be written out in Unicode (the hex shows it that way), but the
program sees it as ANSI (UTF-8, I assume). And the program handles it fine.
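A file can look like Unicode in a hex viewer while a StreamReader treats it as 8-bit text when it has no byte order mark: StreamReader's automatic detection only looks for a BOM and otherwise uses its default, UTF-8. Whether that is what happened here is an assumption. DescribeBom below is a hypothetical helper (C# top-level statements, .NET 5 or later) that reports which BOM, if any, starts a byte array:

```csharp
using System;
using System.Text;

// Hypothetical helper: name the byte order mark at the start of the data.
// UTF-32LE is checked before UTF-16LE because both start with FF FE;
// UTF-32BE (00 00 FE FF) is not checked and reports "none".
static string DescribeBom(byte[] b)
{
    if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) return "UTF-32LE";
    if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
    if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE";
    if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE";
    return "none";
}

byte[] noBom = Encoding.Unicode.GetBytes("GO");   // 47 00 4F 00
byte[] withBom = new byte[] { 0xFF, 0xFE, 0x47, 0x00, 0x4F, 0x00 };

Console.WriteLine(DescribeBom(noBom));    // none
Console.WriteLine(DescribeBom(withBom));  // UTF-16LE
```

A file in the first state is UTF-16 on disk but gives StreamReader nothing to detect, which fits the NUL-per-character symptom.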

But if I make any change (in TextPad or Notepad), the program now writes each
character out with a blank character between them. When you look at the
output in TextPad it shows a black box between each character, and Notepad
shows a blank between each character.

Not sure why they are different. In both cases there were nulls between
each character, but the editors treated them differently.

Tom
 

Michael B. Trausch


The text editor is probably set up to use UTF-16 encoding for
characters. Per MSDN, UTF-16 is the internal encoding used in Windows
and .NET,[1] and Java uses it as well, IIRC. It could be saving the
file that way if the system configuration has somehow changed, but I
don't know what would be involved in such a thing.
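Mike's remark that .NET uses UTF-16 internally is easy to confirm: a System.String is a sequence of 16-bit UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two chars (a surrogate pair). This concerns strings in memory; what lands in a file is decided by the Encoding given to the reader or writer. A small C# sketch (top-level statements, .NET 5 or later):

```csharp
using System;
using System.Text;

string ascii = "A";
string emoji = "\U0001F600";   // a single code point above U+FFFF

// String.Length counts UTF-16 code units, not characters as a reader sees them.
Console.WriteLine(ascii.Length);                          // 1
Console.WriteLine(emoji.Length);                          // 2
Console.WriteLine(char.IsHighSurrogate(emoji[0]));        // True
// Encoding.Unicode (UTF-16LE) spends two bytes on "A": 41 00.
Console.WriteLine(Encoding.Unicode.GetByteCount(ascii));  // 2
```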

In any case, if you can manage it, you should probably try to
detect the character set of the file before processing it, so that your
program can handle it appropriately. UTF-16 is pretty easy to detect
for documents whose characters mostly or completely fit in the
ASCII character set, and most ASCII-compatible charsets are detectable
if you know their rules; ASCII-compatible charsets use 0-127
identically to ASCII. You could, in theory, detect UTF-16 and
compensate for that, and otherwise just read bytes in the range of
33-127, as a (very simple, but not terribly robust) way of dealing
with files that may have an arbitrary charset.
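The UTF-16 detection Mike mentions can be sketched as a zero-byte count: in UTF-16LE the high byte of each ASCII character is 00 and sits at an odd offset, in UTF-16BE at an even offset, and ordinary 8-bit text has almost no zero bytes. GuessEncoding and its 30% threshold are illustrative assumptions rather than a standard algorithm, and a BOM, where present, is a more reliable signal. C# top-level statements, .NET 5 or later:

```csharp
using System;
using System.Text;

// Hypothetical heuristic for BOM-less, mostly-ASCII text: count 00 bytes at
// even and odd offsets. The 0.3 threshold is an arbitrary illustrative choice.
static string GuessEncoding(byte[] b)
{
    const double threshold = 0.3;
    int pairs = b.Length / 2;
    if (pairs == 0) return "8-bit";

    int zerosEven = 0, zerosOdd = 0;
    for (int i = 0; i + 1 < b.Length; i += 2)
    {
        if (b[i] == 0) zerosEven++;
        if (b[i + 1] == 0) zerosOdd++;
    }

    if (zerosOdd > zerosEven && (double)zerosOdd / pairs > threshold) return "UTF-16LE";
    if (zerosEven > zerosOdd && (double)zerosEven / pairs > threshold) return "UTF-16BE";
    return "8-bit";
}

Console.WriteLine(GuessEncoding(Encoding.Unicode.GetBytes("SET NOCOUNT ON")));          // UTF-16LE
Console.WriteLine(GuessEncoding(Encoding.BigEndianUnicode.GetBytes("SET NOCOUNT ON"))); // UTF-16BE
Console.WriteLine(GuessEncoding(Encoding.UTF8.GetBytes("SET NOCOUNT ON")));             // 8-bit
```

The result could then pick the Encoding passed to StreamReader; text that is mostly non-ASCII would defeat this check.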

--- Mike
 

Franck

Your problem seems to be the file format.

Try:
sw = new StreamWriter(fs, System.Text.Encoding.UTF8);
You can do the same with the reader. Always try to specify the encoding you
are reading when it's not a binary file. System.Text.Encoding also offers other
encodings, such as ASCII, Unicode (UTF-16) and more; choose one and stick with it.

But those are SQL queries, so they shouldn't be using anything other than
ASCII or UTF-8. And right now your code seems to be reading UTF-16.
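Franck's advice, naming the encoding on both the reader and the writer, applied to the original filter might look like the sketch below. It assumes the input is UTF-16LE and that UTF-8 without a BOM is wanted for the output; StripPrint is a hypothetical name, and it takes streams so it can be exercised in memory. The PRINT/GO logic mirrors the original program. C# top-level statements, .NET 5 or later (the leaveOpen constructors need .NET 4.5+):

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical rewrite of the original loop with explicit encodings.
// Assumptions: input is UTF-16LE (a BOM, if present, still wins), output is
// UTF-8 without a BOM. A PRINT line and a following GO line are both dropped;
// if the next line is not GO, both lines are kept, as in the original.
static void StripPrint(Stream input, Stream output)
{
    using var reader = new StreamReader(input, Encoding.Unicode, true, 1024, leaveOpen: true);
    using var writer = new StreamWriter(output, new UTF8Encoding(false), 1024, leaveOpen: true);
    writer.NewLine = "\r\n";
    writer.WriteLine("set nocount on");

    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (!line.StartsWith("PRINT", StringComparison.Ordinal))
        {
            writer.WriteLine(line);
            continue;
        }
        string next = reader.ReadLine();
        if (next == null)
            break;  // PRINT on the last line is dropped, as before
        if (!next.StartsWith("GO", StringComparison.Ordinal))
        {
            writer.WriteLine(line);
            writer.WriteLine(next);
        }
    }
}

byte[] source = Encoding.Unicode.GetBytes("PRINT 'x'\r\nGO\r\nSET ANSI_NULLS ON\r\nGO\r\n");
var target = new MemoryStream();
StripPrint(new MemoryStream(source), target);
string result = Encoding.UTF8.GetString(target.ToArray());
Console.Write(result);
```

With this input the output is set nocount on, SET ANSI_NULLS ON and GO, each on its own line and free of 00 bytes; disposing the writer at the end of StripPrint flushes it while leaving the target stream open.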
 
