File.ReadAllLines doesn't read last blank line? Weird?

ralpugan

Hi,
I don't know whether anyone else has run into this. Using .NET 2.0, I'm
loading a text file into a string array, then adding it to a listbox (or
looping over the array) to see the content:

Dim content() As String = System.IO.File.ReadAllLines("c:\nonblank.txt")
Listbox1.Items.AddRange(content)

My text file's last TWO lines are blank, but after loading it into a
listbox, or when I loop through it, I see ONLY ONE blank line in the
whole output, which is weird.

My text file's content:
' I have marked the blank lines with comments to show which lines are really blank

line1
line2
line3
line4
line6
line7
line8
line9
line10
' Here is blank-line11
' Here is blank-line12

Now, to test, create a text file that consists of the 12 lines above,
remove the comments on the last 2 lines and leave those lines blank
(no characters at all, completely blank), then load the file into a
string array using IO.File.ReadAllLines and add the array to a listbox
(preferably with the AddRange method).

In the output, I always see one blank line after line 10 (that is,
line 11), and line 12 doesn't appear. So why doesn't ReadAllLines
read all the lines, including all the blank lines up to the end?

I'm uploading a screenshot to make it more clear:
http://img26.imageshack.us/img26/4906/readalllines.jpg

Output on listbox:
http://img26.imageshack.us/img26/6100/listbox.jpg

Does this indicate a problem with ReadAllLines? I hope someone can give
a clear explanation.

Thanks!

ralpugan
 
Onur Güzel

Minor correction, the text file's content is:
(I missed line5)

line1
line2
line3
line4
line5
line6
line7
line8
line9
line10
' Here is blank-line11
' Here is blank-line12

However, the ReadAllLines method still doesn't read blank line 12; it
treats line 11 as the final line, as stated above.

Please see the screenshots.

Thanks,

ralpugan
 
Armin Zingler

ralpugan said:
I'm uploading a screenshot to make it more clear:
http://img26.imageshack.us/img26/4906/readalllines.jpg

The cursor (caret) location is _after_ the last character. The last two
characters are CRLF, terminating line 11. Between these characters, there is
nothing, so there is no line 12. ReadAllLines works correctly, IMO.

If you start notepad and save the file, it has 0 bytes, so there is no line
1 even if the caret is located in line 1.
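
The point above can be checked with a small console program. This is only a sketch: it writes a temporary file (a stand-in for c:\nonblank.txt, not the original file) containing line1 to line10, each ended by CRLF, plus one extra CRLF, and then counts what ReadAllLines returns. The module name and the temp-file approach are assumptions made for illustration.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module ReadAllLinesDemo
    Sub Main()
        ' Hypothetical temp file standing in for the poster's c:\nonblank.txt.
        Dim tempFile As String = Path.GetTempFileName()

        ' line1..line10, each terminated by CRLF, then one more CRLF
        ' (the ENTER pressed on the empty line 11).
        Dim content As String = ""
        For i As Integer = 1 To 10
            content &= "line" & i.ToString() & vbCrLf
        Next
        content &= vbCrLf
        File.WriteAllText(tempFile, content)

        Dim lines() As String = File.ReadAllLines(tempFile)
        Console.WriteLine(lines.Length)   ' 11: ten text lines plus one empty line

        ' A zero-byte file, like a new Notepad document saved as-is, has no lines.
        File.WriteAllText(tempFile, "")
        Console.WriteLine(File.ReadAllLines(tempFile).Length)   ' 0

        File.Delete(tempFile)
    End Sub
End Module
```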


Armin
 
ralpugan

Thanks for your reply; however, that raises 2 more questions:

1) In notepad, I went to line12 by pressing ENTER key on line11, so
doesn't it mean that line12 was created?

2) If there's no line12, why does notepad show line12 on the status
bar of Notepad(bottom-right) while caret flashes on it?
http://img26.imageshack.us/img26/4906/readalllines.jpg

ralpugan
 
Armin Zingler

ralpugan said:
Thanks for your reply, however that arises 2 more questions:

1) In notepad, I went to line12 by pressing ENTER key on line11, so
doesn't it mean that line12 was created?

No, CR+LF was created. While being in line11, there were only 10 lines.
Notepad puts the caret _after_ the last char.

You can also use a hex editor to view the file.

ralpugan said:
2) If there's no line12, why does notepad show line12 on the status
bar of Notepad(bottom-right) while caret flashes on it?
http://img26.imageshack.us/img26/4906/readalllines.jpg

If you could place the caret only into existing lines, how were you ever
able to add a new line? So, "St 12" is a "virtual" line 12. It does not
really exist until you enter anything in it.
As I wrote, start Notepad. With a new document there are 0 lines. Still,
Notepad shows line 1.
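
If no hex editor is at hand, the same check can be approximated in code. The sketch below is an illustration only: it writes a short temporary file that ends the way the poster's file is described as ending (text, CRLF, then one more CRLF) and prints its bytes in hex. The file contents and module name are assumptions, not the original file.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module TailBytesDemo
    Sub Main()
        ' Hypothetical sample: the last two text lines followed by the extra ENTER.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, "line9" & vbCrLf & "line10" & vbCrLf & vbCrLf)

        ' BitConverter.ToString renders the bytes as hyphen-separated hex pairs.
        Dim bytes() As Byte = File.ReadAllBytes(tempFile)
        Console.WriteLine(BitConverter.ToString(bytes))

        File.Delete(tempFile)
    End Sub
End Module
```

The printed dump ends in 0D-0A-0D-0A: the first pair terminates line10 and the second pair terminates the empty line 11. Nothing follows it, so there is nothing for a line 12 to contain.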


Armin
 
mayayana

1) In notepad, I went to line12 by pressing ENTER key on line11, so
doesn't it mean that line12 was created?
Yes. CrLf is actually 2 characters, with the Lf on
the new line. So a string that ends in CrLf should really
be treated differently than one that doesn't. But if the
function doesn't count an ending CrLf then that's that. :)
You just have to deal with the way it works and be aware
of that in the future.
 
James Hahn

LF is the last character of the line - it is not on the new line.

A line is defined as zero or more characters ending in a CrLf character
pair. So a line can consist of CrLf on its own.

Although the cursor will sit on a 12th line in Notepad, there is no ending
CrLf for that line. The file has 11 lines. The 11th line has zero characters
before the CrLf. The function _does_ count the ending CrLf.
 
Michael Williams

A line is defined as zero or more characters
ending in a CrLf character . . . Although the
cursor will sit on a 12th line in Notepad,
there is no ending CrLf for that line.

By that definition if the last line of a NotePad document is "A nice glass
of rum and Coke®" and it does not have a CrLf after it, which is exactly
what I have in this NotePad document sitting in front of me, then it does
not exist! Shame that, because I like a nice glass of rum and Coke®
occasionally. Luckily for me you are wrong and not all "lines" in a text
file need a CrLf. Certainly the last line does not. I've just saved such a
file and then loaded it as raw data and my "Nice glass of rum and Coke®"
definitely does exist, and it definitely does not end in CrLf. Perhaps you
might like to revise your definition while I get on with this refreshing
drink :)
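
A quick way to see this with ReadAllLines itself is sketched below. The sample text and the temporary file are made up for the illustration; the only point is that the final line has no CrLf after it and is still returned.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module UnterminatedLastLineDemo
    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()

        ' Two lines of text; there is deliberately no CrLf after the second one.
        File.WriteAllText(tempFile, "first line" & vbCrLf & "last line without CrLf")

        Dim lines() As String = File.ReadAllLines(tempFile)
        Console.WriteLine(lines.Length)              ' 2
        Console.WriteLine(lines(lines.Length - 1))   ' last line without CrLf

        File.Delete(tempFile)
    End Sub
End Module
```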

Mike
 
Michel Posseth [MCP]

Michael Williams is right: if you move a file stream to the end and read
backwards, you will encounter the newline only after you have processed the
last line's characters, so the last line does not have a newline.

IMHO it is obvious behavior:

bla|
hello|
bladiebla|

would give on a split
bla , hello , bladiebla , "empty string"

bla|
hello|
bladiebla

would give on a split
bla , hello , bladiebla

which is correct
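
The two split results above can be reproduced with String.Split, using "|" as a stand-in for the line break exactly as in the example. This is only a sketch of the analogy; it says nothing about how ReadAllLines is implemented.

```vb
Imports System

Module SplitDemo
    Sub Main()
        ' "|" plays the role of the line break in both strings.
        Dim withTrailing() As String = "bla|hello|bladiebla|".Split("|"c)
        Dim withoutTrailing() As String = "bla|hello|bladiebla".Split("|"c)

        Console.WriteLine(withTrailing.Length)      ' 4: the last element is an empty string
        Console.WriteLine(withoutTrailing.Length)   ' 3
    End Sub
End Module
```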


Michel
 
Michel Posseth [MCP]

The discussion should rather be about whether it is expected behavior that
Notepad uses a "virtual" line.
Well, look at a datagrid in add mode: the last line is a "virtual" line. It is
there because you can see it, but it doesn't exist in the database.

So I guess the behavior is common behavior, and thought out by a smart
person ;-) for a very good reason.
 
Michael Williams

I think you misunderstood. Notepad will put the cursor on a
non-existent line if the previous line has a CRLF combination
at the end.

I understand perfectly well, but James Hahn did not say what you have just
said, and it was James to whom I was responding. James said, "A line is
defined as zero or more characters ending in a CrLf character". By that
definition a string of text which does not end in a CrLf is not a line,
which of course is not true because the last line of text in a text file
exists whether it ends in a CrLf or not, and it is on the next line AFTER
the last item that would qualify as a line using James Hahn's definition! It
is, therefore, another line regardless of the fact that it may or may not
end in CrLf. If you want to be really pedantic about it, which I'm sure you
do [;-)] then you could say that nobody actually knows whether the author
intended it to be a FULL line of text if it does not end in CrLf, but it is
very definitely at the very least part of a line of text that may, or may
not, be added to in a future editing session and that may, or may not, be a
full line in the eyes of the author. In the meantime though, until the
author or someone else reloads that document and adds some more text or
control characters, any final string of text, whether it is followed by any
CrLf or not, must be regarded as a valid line of the document as it
currently stands and it very definitely MUST be loaded by any application
which reads that document line by line or in any other way. If it does not
end in CrLf then only the author of the text itself knows whether he
intended it to be a full line or not, and whether he intended to add some
more text to it at a later editing stage or whether he perhaps intended to
"close it as a full line" in the next editing stage by adding a CrLf, but at
the moment, with the text file existing as I have currently described it,
that last string of text very definitely exists AFTER what would have been
declared the last "line" according to James's definition, and it will be on
a DIFFERENT line to it, so it very definitely DOES exist as a line.

Have a nice day.

Mike
 
mayayana

LF is the last character of the line - it is not on the new line.

I posted what I did because in some cases, in my
experience, returning text programmatically will work
in the way I've described. So it seems that the Lf actually
marks the start of a line. Though that behavior may not
be universal.
A line is defined as zero or more characters ending in a CrLf character
pair. So a line can consist of CrLf on its own.

Isn't that contrary to what the OP has been finding?
He's got lines with only vbCrLf that are not being returned
when he checks lines. And in a RichEdit window, for instance,
a new line is only vbCr. If one parses the RichEdit content
directly the line returns must be fixed.

All of which is to stress my original intention with my
post: The important thing is to be aware of how it works
and act accordingly. Perhaps it's human nature to find
a good reason why things are as they are, after the fact,
but that's not particularly useful. It's more like a faint
reflection of existential doubt, clenching its fist in empty
space in hopes of grasping the dream of true certainty. :)
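
For anyone who does want an entry for the empty position after the final CrLf, one possible way to act accordingly is to read the whole file and split it on vbCrLf. The sketch below is one option under stated assumptions, not a recommended fix: it assumes the file uses CrLf line breaks throughout, and the temporary file and its contents are invented for the demonstration.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module TrailingLineWorkaround
    Sub Main()
        ' Hypothetical sample: one text line, its CrLf, and one extra CrLf.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, "line10" & vbCrLf & vbCrLf)

        ' ReadAllLines treats CrLf as a line terminator.
        Dim viaReadAllLines() As String = File.ReadAllLines(tempFile)

        ' Split treats CrLf as a separator, so the empty piece after the
        ' final CrLf is kept when StringSplitOptions.None is used.
        Dim viaSplit() As String = File.ReadAllText(tempFile).Split( _
            New String() {vbCrLf}, StringSplitOptions.None)

        Console.WriteLine(viaReadAllLines.Length)   ' 2
        Console.WriteLine(viaSplit.Length)          ' 3

        File.Delete(tempFile)
    End Sub
End Module
```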
 
