reading in text

  • Thread starter: Guest

I was wondering if it is possible to read in a text file as a data type other
than string. I would like to read it in as some type that handles numbers,
like double, or float.
 

You read each part as a string (or a whole line as a string and then
break it up), and then call the appropriate Parse method (or
Double.TryParse).
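
As a rough illustration of that advice, here is a minimal sketch that splits a line and converts each piece with double.TryParse. The class and method names are made up for this example, and the use of CultureInfo.InvariantCulture is an assumption: it forces '.' to be read as the decimal separator whatever the machine's regional settings are.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class NumberParsing
{
    // Hypothetical helper: converts space-separated tokens to doubles,
    // skipping any token that is not a valid number.
    public static List<double> ParseDoubles(string line)
    {
        List<double> values = new List<double>();

        foreach (string token in line.Split(' '))
        {
            double value;
            // TryParse returns false instead of throwing, so empty or
            // malformed tokens are simply skipped.
            // InvariantCulture is an assumption made for this sketch.
            if (double.TryParse(token, NumberStyles.Float,
                                CultureInfo.InvariantCulture, out value))
            {
                values.Add(value);
            }
        }
        return values;
    }

    public static void Main()
    {
        foreach (double d in ParseDoubles("5.62 1.49 abc 6.53"))
        {
            Console.WriteLine(d.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

If bad input should be reported rather than skipped, double.Parse with the same arguments throws a FormatException instead.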
 
My problem with that is that it is not working on my machine. I do not know
why, but my code is not running correctly when it comes to parsing the text.
I have had other people run the exact same code that I have and it works for
them but not for me. For some reason when I parse the text, I still get
empty values. But like I said, the exact same code, copied and pasted, works
differently for other people. So I am trying to find out how to get around
this problem.
 

You can't get an "empty" value for an int or a double though.

Could you post a short but complete program which demonstrates the
problem, even if it only demonstrates it on your computer?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
 
Ok I guess this one uses split, but it is the code I know works elsewhere.

using System;
using System.IO;
namespace csConsole
class NULL
static void Main(string[] args)
StreamReader reader = new StreamReader("C:\\ChristineWork\\test1.txt");
string text = string.Empty;
while( reader.Peek() != -1 )
text += reader.ReadLine();
reader.Close();
string[] parts = text.Split(' ');
for(int i = 0; i < (parts.Length - 1); i++)
Console.WriteLine( parts[i] + ";" );
Console.Read();
 


Well, that's not a valid C# program, but some problems with it:

1) Using Peek is generally a bad idea. Just call ReadLine until it
returns null.

2) *Don't* use string concatenation like that - use a StringBuilder.
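
Both points can be applied to the reading loop in the posted code. The following is only a sketch: the class and method names are hypothetical, a StringReader stands in for the StreamReader over the real file, and joining the lines with a space is an assumption (the posted code joins them with nothing in between).

```csharp
using System;
using System.IO;
using System.Text;

public static class LineJoiner
{
    // Hypothetical helper: reads every line from the reader and joins
    // consecutive lines with a single space.
    public static string ReadAllJoined(TextReader reader)
    {
        StringBuilder builder = new StringBuilder();
        bool first = true;
        string line;

        // ReadLine returns null at the end of the input, so no Peek is needed.
        while ((line = reader.ReadLine()) != null)
        {
            if (!first)
            {
                // Assumption: a separator is wanted so that the last value on
                // one line does not run into the first value on the next.
                builder.Append(' ');
            }
            builder.Append(line);
            first = false;
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // A StringReader stands in for the StreamReader over the real file.
        using (StringReader reader = new StringReader("1.5 2.5\n3.5"))
        {
            Console.WriteLine(ReadAllJoined(reader));
        }
    }
}
```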


Now, you've said this code works elsewhere - is that the actual code
which fails on your box? If so, in what way does it fail?
 
Why not just say

StreamReader reader = new StreamReader("C:\\ChristineWork\\test1.txt");
text = reader.ReadToEnd().Replace ("\r\n","" );
reader.Close();


 
Also, I know this is going to sound bad, and I don't recycle my rubbish as
much as I should... but with machines now running at 1 GHz+ with 256 MB of
memory as standard, will StringBuilder give a noticeable performance
advantage over string1 + string2?

I know the other unknown is the length of string1 and string2... but any
idea when it becomes noticeable?




 
Also, I know this is going to sound bad, and I don't recycle my rubbish as
much as I should... but with machines running 1Ghz+ now as standard with
256MB, will StringBuilder give you a noticeable amount of performance
advantage over string1 + string2?
Absolutely.

I know the other unknown there is depends on the length of string1 and
string2... but any ideas when it becomes noticeable?

The problem isn't that it becomes noticeable - the problem is that it
becomes unbearable. Consider this simulation of the previous program,
reading in a file with 50,000 very short lines - not a huge file:

using System;

class Test
{
    static void Main()
    {
        DateTime start = DateTime.Now;
        string total="";

        for (int i=0; i < 50000; i++)
        {
            total += "hello";
        }
        DateTime end = DateTime.Now;
        Console.WriteLine (end-start);
    }
}

On my laptop (3GHz, 1GB memory) that takes about 30 seconds.

Change it to a StringBuilder:

using System;
using System.Text;

class Test
{
    static void Main()
    {
        DateTime start = DateTime.Now;
        StringBuilder builder = new StringBuilder();

        for (int i=0; i < 50000; i++)
        {
            builder.Append ("hello");
        }
        string total = builder.ToString();
        DateTime end = DateTime.Now;
        Console.WriteLine (end-start);
    }
}

It takes 0.015 seconds.

Using repeated concatenation gets worse and worse. It needs to create a
copy of *all* the data read so far for *every* line in the file.
StringBuilder copies the data each time the buffer overflows, so
creating it with a large enough buffer speeds things up even further,
but I think 30 vs 0.015 is a noticeable performance difference :)
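
To put rough numbers on that explanation, the sketch below counts how many characters get copied in each approach for the same 50,000 appends of "hello". It is a simplified model, not a measurement: the doubling buffer is an assumption about how a growable buffer behaves and is not StringBuilder's actual implementation, and the initial capacity of 16 is likewise just a value picked for the example.

```csharp
using System;

public static class CopyCost
{
    // Characters written when the whole string is rebuilt on every append.
    public static long ConcatenationCopies(int appends, int pieceLength)
    {
        long copied = 0;
        long length = 0;
        for (int i = 0; i < appends; i++)
        {
            length += pieceLength;
            copied += length; // the entire new string is written out
        }
        return copied;
    }

    // Characters written with a buffer that doubles when it overflows.
    // Simplified model only; not StringBuilder's real growth policy.
    public static long DoublingBufferCopies(int appends, int pieceLength,
                                            int initialCapacity)
    {
        long copied = 0;
        long length = 0;
        long capacity = initialCapacity;
        for (int i = 0; i < appends; i++)
        {
            while (length + pieceLength > capacity)
            {
                copied += length; // existing data moves to the bigger buffer
                capacity *= 2;
            }
            copied += pieceLength; // the new piece is written once
            length += pieceLength;
        }
        return copied;
    }

    public static void Main()
    {
        Console.WriteLine(ConcatenationCopies(50000, 5));
        Console.WriteLine(DoublingBufferCopies(50000, 5, 16));
    }
}
```

Under this model, repeated concatenation writes 5 * (1 + 2 + ... + 50000) = 6,250,125,000 characters, while the doubling buffer stays under a million.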
 
Ok first I want to apologize for my disorganized thinking. I have tried so
many things in trying to get the code to work and give the correct output
that they are getting a bit confused in my brain.
Ok below is the complete code without any cuts. The file I am trying to read
in looks like:
5.62   1.49   6.53   3.91   3.26   3.04   4.47   2.58   2.01   2.00   2.68
4.17
2.85   5.78   6.02   4.65   3.25   4.45   4.73   1.60   2.76   1.75   6.82
0.29
4.41   5.52   12.51   6.89   1.84   2.31   3.70   3.07   0.08   1.10   2.74
6.83
8.22   1.58   4.21   5.29   1.93   3.34   4.38   3.54   4.68   3.56   3.94
2.30
5.86   6.02   8.18   3.69   2.56   2.17   4.68   2.24   0.98   2.03   2.02
5.95
2.95   6.51   3.88   5.42   2.61   11.27   3.32   1.80   4.54   5.07   6.01
2.65
4.58   2.32   4.22   5.78   4.31   3.02   1.77   7.21   4.29   1.00   1.16
7.69
5.22   4.02   8.81   3.06   2.60   3.51   1.24   3.07   5.00   2.59   4.70
6.60
in the text file. When I run this program I get the following:
5.62;
;
;
1.49;
;
;
6.53;
;
;
etc.
The other person running this code gets
5.62;
1.49;
etc.

using System;
using System.IO;

namespace csConsole
{
    class NULL
    {
        static void Main(string[] args)
        {
            //open the file
            StreamReader reader = new StreamReader("C:\\ChristineWork\\test1.txt");

            string text = string.Empty; //to store the text

            while( reader.Peek() != -1 )//while we're not at the end of the file
            {
                text += reader.ReadLine(); //append the line of text
            }

            reader.Close(); //close file

            string[] parts = text.Split(' '); //split text by space

            //the last element in parts is a NULL
            //so I wrote Length-1
            //the ';' I added to look if there are really no spaces left.
            for(int i = 0; i < (parts.Length - 1); i++)
            {
                Console.WriteLine( parts[i] + ";" );
            }

            Console.Read();
        }
    }
}


 
Christine said:
[snip]
When I run this program I get the following:
5.62;
;
;
1.49;
;
;
6.53;
;
;
etc.

As you should.

Christine said:
The other person running this code gets
5.62;
1.49;
etc.

I suspect they've got a different file then. Note that you've got three
spaces between each number. Do they have that in their copy of the
file?
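
To see why three spaces produce those empty entries, and one way to get rid of them, here is a small sketch. The class and method names are invented for the example, and it assumes a framework version where the String.Split overload taking StringSplitOptions is available (.NET 2.0 or later).

```csharp
using System;

public static class SplitDemo
{
    // Splitting on a single space yields an empty string for every
    // extra space between two values.
    public static string[] SplitRaw(string text)
    {
        return text.Split(' ');
    }

    // RemoveEmptyEntries drops those empty strings from the result.
    public static string[] SplitClean(string text)
    {
        return text.Split(new char[] { ' ' },
                          StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Three spaces between the numbers, as in the file discussed above.
        string text = "5.62   1.49   6.53";

        Console.WriteLine(SplitRaw(text).Length);   // 7 entries, 4 of them empty
        Console.WriteLine(SplitClean(text).Length); // 3 entries

        foreach (string part in SplitClean(text))
        {
            Console.WriteLine(part + ";");
        }
    }
}
```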
 
Hi, Christine.

You pretty much already answered your own question in that you read a text
file as text and you read a binary file as binary. That is, both TextReader
and StreamReader return strings.

One thing that is quite straightforward to do is to read in a line of text,
parse the text, and then use the Convert methods. In the simplest form
where you have a single number on each line, you can simply use code like:

FileStream filStream = new FileStream(sFilename, FileMode.Open,
    FileAccess.Read);
BufferedStream bufStream = new BufferedStream(filStream, 10240);
StreamReader sReader = new StreamReader(bufStream,
    System.Text.Encoding.UTF7);

while (sReader.Peek() >= 0)
{
    string line = sReader.ReadLine();
    // of course, you can merge this with the above line
    double dMyDouble = Convert.ToDouble(line);
}

// Then close the streams

--- Bob
 
Bob Sillett said:
[snip]

FileStream filStream = new FileStream(sFilename, FileMode.Open,
FileAccess.Read);

BufferedStream bufStream = new BufferedStream(filStream, 10240);

StreamReader sReader = new StreamReader(bufStream,
System.Text.Encoding.UTF7);


Or you can just use:

using (StreamReader reader = new StreamReader (sFileName,
                                               Encoding.Whatever))
{
    string line;
    while ( (line = reader.ReadLine()) != null)
    {
        ...
    }
}

No need to wrap a BufferedStream round it, no need to explicitly create
the FileStream, and no need to explicitly close anything.

Note that Encoding.UTF7, which I know you only used as an example, is
almost never the correct encoding to use.
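
For a version of that pattern that can be run as-is, here is a minimal sketch. The helper name is hypothetical, a temporary file stands in for the real data file, and UTF-8 is only an assumption: pass whichever encoding the file was actually written in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FileLines
{
    // Hypothetical helper: returns every line of a text file.
    // UTF-8 is an assumption made for this sketch.
    public static List<string> ReadAllLinesFrom(string path)
    {
        List<string> lines = new List<string>();

        // The using statement disposes the reader, which closes the file,
        // even if an exception is thrown inside the loop.
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // A temporary file stands in for the real data file.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "5.62 1.49\n6.53 3.91\n", Encoding.UTF8);
            foreach (string line in ReadAllLinesFrom(path))
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```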
 
Jon Skeet said:
string line;
while ( (line = reader.ReadLine()) != null)
{
...
}

Hi Jon,

Since I value your opinions, how do you feel about the following construct
compared to the one above?

while (true)
{
    string line = reader.ReadLine();
    if (line == null)
    {
        break;
    }
    ...
}
 

I prefer the brevity of the version above - it's such a common idiom
(in my code, anyway) that the slight problem of having an assignment
within the rest of a statement is okay.

It does expose the "line" variable to the rest of the scope,
admittedly. That could be fixed (in the rare cases where it's really a
problem) just by putting it in a block on its own:

{
    string line;
    while ( (line=reader.ReadLine()) != null)
    {
    }
}
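
Another way to keep the variable out of the enclosing scope, which nobody in the thread suggested, is a for loop that declares it in the loop header. This is only a sketch with made-up names, and it has its own cost: the ReadLine call is written twice.

```csharp
using System;
using System.IO;

public static class ScopedLoop
{
    // Hypothetical helper: counts the lines available from the reader.
    public static int CountLines(TextReader reader)
    {
        int count = 0;

        // "line" is declared in the for statement, so it is not visible
        // once the loop has finished.
        for (string line = reader.ReadLine();
             line != null;
             line = reader.ReadLine())
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // A StringReader stands in for a reader over a real file.
        using (StringReader reader = new StringReader("a\nb\nc"))
        {
            Console.WriteLine(CountLines(reader));
        }
    }
}
```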
 
[snip]
I prefer the brevity of the version above - [...]

The brevity is nice. I find mine slightly more readable, but maybe that's
somewhat due to the fact that I'm not accustomed to your version. I'll
ponder it.

Thanks
 