Regular Expression Help...Line Break

jsummit

I'm trying to parse a rather large string from a .txt file that breaks
into 2 lines (the string is a fixed-length string of 6694 characters)
into a dataset using VB.NET. The issue I am having is with the line
breaks. I am able to parse the string into the dataset up until the
line break. My pattern is as follows:
"^(?<n0>.{1})(?<n1>.{8})(?<n2>.{2})..... (?<n404>.{2})$"
My code is as follows:

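Dim re As New Regex(pattern, RegexOptions.Multiline)
Dim ma As Match = re.Match(sr.ReadLine())
Dim i As Integer = 0
Do While i <= 404
    ' n0 .. n404 are the named groups, one per fixed-width field
    Dim value As String = ma.Groups("n" & i).Value
    i += 1
Loop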

My issue is when I hit a line break in the txt file. The string breaks
because it is so long that it reaches the maximum width of the txt file
and has to continue on the next line. The next string then starts on
the line after that.
Example:
I33445 89 u000x00000 900000 x00000 ....... (on and on until the
max length of a txt file).......0000 00000000 (End of Txt file)
TOWN88989 000000000000 909 (end of String)
NEW Line Starts Here.

I have tried different options with RegexOptions but to no avail.

Thank you for your time.
 
Andrew Morton

I'm trying to parse a rather large string from a .txt file that breaks
into 2 lines (the string is a fixed-length string of 6694 characters)
into a dataset using VB.NET. The issue I am having is with the line
breaks.

How about removing all the line breaks first?
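
A minimal sketch of that idea, assuming the whole file fits in memory and that carriage returns and line feeds never occur as real data. The path "records.txt" and the helper name RemoveLineBreaks are placeholders, not anything from the original code:

```vbnet
Imports System
Imports System.IO

Module StripBreaksSketch
    ' Hypothetical helper: returns the text with every CR and LF removed.
    Function RemoveLineBreaks(text As String) As String
        Return text.Replace(vbCr, "").Replace(vbLf, "")
    End Function

    Sub Main()
        ' "records.txt" is a placeholder path.
        Dim raw As String = File.ReadAllText("records.txt")
        Dim joined As String = RemoveLineBreaks(raw)
        ' If only line breaks were in the way, this length should now be
        ' a multiple of the record length.
        Console.WriteLine(joined.Length)
    End Sub
End Module
```

Note that this also joins separate records together, so it only helps if the records can be split again afterwards by their known length.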

Andrew
 
jsummit

I have tried that. It's not an option. The string is so large that it
extends the whole length of the txt file and then some. I didn't think
there was an end or a line break in a txt file. I thought you would be
able to extend as far right as you would like. Not so.

Thanks for your idea.

Jim
 
Andrew Morton

I have tried that - It's not an option. The string is so large that it
extends the whole length of the txt file and then some.

It cannot. Past the end of the file there is nothing. Or do you mean there
is an EOF character in there?

I didn't think there was an end or a line break in a txt file.

Any byte can be in a text file; text file means only that it is meant to be
human-readable with minimal processing (e.g. tab characters).

Look at the supposed text file in a hex editor. Do not trust Notepad.

I thought you would be able to extend as far right as you would like.
Not so.

That must be a limitation of the program displaying/writing the text file.

(the string is a fixed-length string reaching 6694 characters)

That's a tiny string compared to the limit of something like 2^31-1 chars.
When you say "reaching 6694 characters", do you mean it is /always/ 6694
chars, and, if so, does that mean that each field is of a fixed width too?
In that case, you could just extract each substring with
System.String.Substring(Int32, Int32).
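
As a rough illustration of the Substring approach, here is a sketch that walks a list of field widths and cuts one field per width. Only the first three widths (1, 8, 2) are taken from the pattern at the top of the thread; the helper name SplitFixedWidth and the short sample record are made up for the example:

```vbnet
Imports System
Imports System.Collections.Generic

Module FixedWidthSketch
    ' Hypothetical helper: cuts one record into fields of the given widths.
    Function SplitFixedWidth(record As String, widths As Integer()) As List(Of String)
        Dim fields As New List(Of String)
        Dim start As Integer = 0
        For Each width As Integer In widths
            ' Substring(startIndex, length) throws if the record is too short.
            fields.Add(record.Substring(start, width))
            start += width
        Next
        Return fields
    End Function

    Sub Main()
        ' Widths 1, 8, 2 match groups n0, n1, n2 of the pattern; the real
        ' list would continue up to n404.
        Dim widths As Integer() = {1, 8, 2}
        Dim record As String = "I33445 89 u"
        For Each field As String In SplitFixedWidth(record, widths)
            Console.WriteLine("[" & field & "]")
        Next
    End Sub
End Module
```

Keeping the widths in an array means the field layout lives in one place instead of in a very long regular expression.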

Andrew
 
jsummit

The answer to both of your questions is yes. It is /always/ 6694 chars,
and each field is a fixed length, although each field's length is
different. I have been reading the file through NoteTab, which is
probably why the break is always at the same spot.
I'm going to look into System.String.Substring. That may work. If a
string can stretch to 2^31-1 chars in a txt file, then how come I'm
having issues with my short string breaking? I have no control over how
the string is extracted and dumped into the txt file. The file I
have received is a .txt file.
Thanks again for your help.

-Jim
 
jsummit

Which object would you suggest I use: StreamReader, StringReader, or
TextReader? I want to use Substring and write the string values out to
a DataSet. I was using a StreamReader.

Thanks
 
Andrew Morton

I'm going to look into System.String.Substring. That may work. If a
string can stretch to 2^31-1 chars in a txt file, then how come I'm
having issues with my short string breaking? I have no control over how
the string is extracted and dumped into the txt file. The file I
have received is a .txt file.

Have you looked at the file in a hex editor to see if there are any
unexpected characters in it? The fact it has an extension of .txt is
irrelevant.

In the case of what I think your file looks like, unexpected characters
would be anything with a decimal value in the range 0-31. Possibly you would
find 13,10 (CRLF) at the end.
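
If no hex editor is at hand, a few lines of code can do the same check. This is only a sketch: "records.txt" is a placeholder path, and it assumes a single-byte encoding so that one byte corresponds to one character. It lists every byte below 32 with its offset, showing the value in both decimal and hexadecimal:

```vbnet
Imports System
Imports System.IO

Module ControlCharScan
    Sub Main()
        ' "records.txt" is a placeholder path.
        Dim bytes As Byte() = File.ReadAllBytes("records.txt")
        For offset As Integer = 0 To bytes.Length - 1
            If bytes(offset) < 32 Then
                ' Prints the position plus the value in decimal and hexadecimal.
                Console.WriteLine("offset {0}: decimal {1}, hex {1:X2}", offset, bytes(offset))
            End If
        Next
    End Sub
End Module
```

A CRLF pair would show up as decimal 13 (hex 0D) followed by decimal 10 (hex 0A).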

Note that files and strings are not connected. The maximum possible size of
a file (disregarding hardware) is determined by the OS.

Andrew
 
Andrew Morton

Which object would you suggest I use: StreamReader, StringReader, or
TextReader?

See the help for StreamReader and for StringReader to determine the
appropriate choice.
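
For what it is worth, both classes derive from TextReader, so code written against TextReader works with either: StreamReader when the characters come from a file, StringReader when the text is already in memory. The sketch below assumes records are a fixed number of characters once CR and LF are skipped; ReadRecord is a hypothetical helper, and the record length of 6 is only there to keep the sample short (the thread's records are 6694 characters):

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReaderSketch
    ' Reads up to recordLength characters from any TextReader, skipping
    ' CR (13) and LF (10). Returns Nothing when no characters are left.
    Function ReadRecord(reader As TextReader, recordLength As Integer) As String
        Dim sb As New StringBuilder()
        Do While sb.Length < recordLength
            Dim c As Integer = reader.Read()
            If c = -1 Then Exit Do
            If c <> 13 AndAlso c <> 10 Then sb.Append(Convert.ToChar(c))
        Loop
        If sb.Length = 0 Then Return Nothing
        Return sb.ToString()
    End Function

    Sub Main()
        ' StringReader: the text is already in memory.
        Using reader As New StringReader("ABCDE" & vbCrLf & "FGHIJKL")
            Dim rec As String = ReadRecord(reader, 6)
            Do While rec IsNot Nothing
                Console.WriteLine(rec)
                rec = ReadRecord(reader, 6)
            Loop
        End Using
        ' StreamReader: the same calls, but the characters come from a file,
        ' e.g. Using reader As New StreamReader("records.txt") ... End Using
    End Sub
End Module
```

Each returned record could then be cut into fields with Substring before the values are written to the DataSet.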

Andrew
 
jsummit

The hexadecimal value at the line break in the txt file is 30 and the
decimal value is 0. Are the 13, 10 values you mentioned decimal or
hexadecimal?

Thanks
 
Andrew Morton