String builder (Parsing vertically presented records)

ILCSP

Hi, I just started learning Visual Basic (VB.NET 2003) and I need to write
a small program that reads a text file of survey data we get from another
company, parses it, and flattens each record into a single string. The
major difference between this space-delimited file and the many examples
in these groups is that the data is presented vertically, not
horizontally. You can see an example below.

First, the procedure needs to ignore the first line of the text file
(the headings). Then it should extract the RegID from the first survey
record, get all 25 item numbers with their answers (some answers are
blank and some are numbers), and convert them into a single string.
After that, I need a carriage return so the next survey record can be
flattened out. One thing that might make this easier is that I get a
line that says "NewRegID" whenever a new survey record starts.

I need the final data to look like this:

214555134,1,Y,2,N,3,Y,4,1,5,Y,6,Y,7,Y,8,Y,9,1,10,Y,11, ... all the way
to 25,2.
214016421,1,Y,2,Y,3,Y, .. and so on.

and it will be saved as a text file.

So far, this is what I have in my command button. It opens the text
file with a StreamReader, reads it line by line, and places each line
in a text box called txtStrings. I'm missing the most important part:
the record builder and string creation.


Dim StrFileName As String
StrFileName = "C:\Survey01.txt"
If Not System.IO.File.Exists(StrFileName) Then
    MsgBox("File does not exist.")
    Exit Sub
End If

' Open the survey file for reading.
Dim strRdr As System.IO.StreamReader = _
    System.IO.File.OpenText(StrFileName)

' Copy each line into the text box until the end of the file.
Dim StrLine As String
StrLine = strRdr.ReadLine()
Do Until StrLine Is Nothing
    txtStrings.AppendText(StrLine & vbCrLf)
    StrLine = strRdr.ReadLine()
Loop
strRdr.Close()



Survey1.txt sample:

RegID ItemName Response
214555134 NewRegID
214555134 1 Y
214555134 2 N
214555134 3 Y
214555134 4 1
214555134 5 Y
214555134 6 Y
214555134 7 Y
214555134 8 Y
214555134 9 1
214555134 10 Y
214555134 11 Y
214555134 12 Y
214555134 13 Y
214555134 14 Y
214555134 15 1
214555134 16
214555134 17 Y
214555134 18 Y
214555134 19 Y
214555134 20 Y
214555134 21 1
214555134 22 N
214555134 23 N
214555134 24 1
214555134 25 2
214016421 NewRegID
214016421 1 Y
214016421 2 Y
214016421 3 Y
214016421 4 1
214016421 5 Y
214016421 6
214016421 7 Y
214016421 8 Y
214016421 9 1
214016421 10 Y
214016421 11 N
214016421 12 Y
214016421 13 Y
214016421 14 Y
214016421 15 1
214016421 16 Y
214016421 17
214016421 18 Y
214016421 19 Y
214016421 20 Y
214016421 21 1
214016421 21 1
214016421 22 Y
214016421 23 N
214016421 24 2
214016421 25 3
213565432 1 Y
213565432 2 N
213565432 3 N
...
EOF

Any help would be greatly appreciated.

Thanks!
 
Chris

Maybe this will help you. You probably want to use the String.Split
function to get the individual items.

Dim StrFileName As String
StrFileName = "C:\Survey01.txt"
If Not System.IO.File.Exists(StrFileName) Then
    MsgBox("File does not exist.")
    Exit Sub
End If

Dim strRdr As System.IO.StreamReader = _
    System.IO.File.OpenText(StrFileName)
Dim StrLine As String
Dim StringArray() As String
Do While strRdr.Peek() <> -1
    StrLine = strRdr.ReadLine()
    If StrLine.EndsWith("NewRegID") Then
        'This is a new record
        StringArray = StrLine.Split(" "c)
        'StringArray(0) is your new ID
    Else
        'This is a continuing record
        StringArray = StrLine.Split(" "c)
        'StringArray(0) has Column1
        'StringArray(1) has Column2
        'StringArray(2) has Column3 (if there is one)
        For ii As Integer = 1 To StringArray.GetUpperBound(0)
            'Append your string here
        Next
    End If
Loop
strRdr.Close()
 
Cor Ligthert [MVP]

Hi EOF

Do you mean something like this? (Typed here and not checked, so watch for typos or whatever.)
Dim StrLine As String
StrLine = strRdr.ReadLine()
dim sb as new text.stringbuilder(StrLine.Substring(0,9))
Do Until StrLine Is Nothing
dim fields() = StrLine.Split()
sb.append(fields(1) & fields(2))
StrLine = strRdr.ReadLine()
if Is StrLine Not Nothing AndAlso
strline.indexof("NewRegID") > -1 then
strWrt.WriteLine(sb.ToString)
sb = new text.Stringbuilder(StrLine.Substring(0,9))
end if
Loop
strRdr.Close()

I hope this helps,

Cor
 
Cerebrus

Hi,

Another implementation :

===================
Private Sub ReadText()
    Dim fsr As New FileStream("VertFile.txt", FileMode.Open, _
        FileAccess.Read)
    Dim fsw As New FileStream("ResultFile.txt", FileMode.Create, _
        FileAccess.Write)
    Dim sr As New StreamReader(fsr)
    Dim sw As New StreamWriter(fsw)

    Dim thisLine As String
    Dim lineContents() As String

    'Ignore the first line of the text file
    sr.ReadLine()
    While sr.Peek > -1
        thisLine = sr.ReadLine()
        lineContents = thisLine.Split(New Char() {" "c})
        If lineContents(1).Trim = "NewRegID" Then
            'We have a new record. Write current value of sb and insert a
            'new line.
            If sb.Length > 0 Then
                'Replace the last comma with a period.
                sb.Replace(",", ".", sb.Length - 1, 1)
                sw.WriteLine(sb.ToString())
            End If
            sb = New StringBuilder()
            sb.Append(lineContents(0))
            sb.Append(",")
        Else
            'This is the same record. Keep adding text
            sb.Append(lineContents(1))
            sb.Append(",")
            If sb.Length > 2 Then
                sb.Append(lineContents(2))
                sb.Append(",")
            End If
        End If
    End While
    'Replace the last comma with a period.
    sb.Replace(",", ".", sb.Length - 1, 1)
    sw.WriteLine(sb.ToString())
    sw.Flush()
    'Clean up
    sw.Close()
    sr.Close()
    fsr.Close()
    fsw.Close()
End Sub
===================

Note that your third set of records (213565432) does not have the line
"NewRegID".
When that line is inserted in the sample text, then it works.

Regards,

Cerebrus.
 
ILCSP

Hi again, thanks for replying.

I'm trying Cerebrus code right now, but when I debug I get these
errors:

Name 'sb' is not declared
Type 'FileStream' is not defined
Type 'StreamReader' is not defined

Am I missing something, since the FileStream and StreamReader names
don't seem to be recognized by VB? Is the variable "sb" actually
meant to be "sr", or is it for the text box?

I'm running VB.net version 2003.

The third and all subsequent records do have the "NewRegID" so that's
not a problem.

Thanks.
 
Cor Ligthert [MVP]

You need some Imports statements at the top of your program:

Imports System.Text
Imports System.IO

Or prefix the class names with Text and IO wherever StreamReader and
StringBuilder are used, for example:
Text.StringBuilder
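
A minimal sketch of where the Imports lines belong, assuming a plain console module (the module name ImportsDemo is made up for illustration). Imports statements go at the top of the source file, above any Class or Module, and are not valid inside a Sub or Function:

```vbnet
' Imports statements sit at the very top of the source file,
' before any Class or Module declaration, never inside a procedure.
Imports System.IO
Imports System.Text

Module ImportsDemo
    Sub Main()
        ' With the Imports above, the short class names resolve.
        Dim sb As New StringBuilder()
        sb.Append("214555134").Append(",")

        ' Without an Imports line, the fully qualified name works anywhere.
        Dim sb2 As New System.Text.StringBuilder("214016421")

        Console.WriteLine(sb.ToString() & sb2.ToString())
        Console.WriteLine(Path.GetFileName("Survey01.txt"))
    End Sub
End Module
```

Either style works; the fully qualified form is handy when you only use a class once.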

Cor
 
ILCSP

Hi, when I added the lines to the top of my procedures:

imports system.text
imports system.io

I got a Syntax error for the 2 of them.

Therefore, I had to change the first line to this:
Dim fsr As New System.IO.FileStream("c:\surv060313a.txt", _
    IO.FileMode.Open)

I still get a "Name 'sb' is not declared" error everywhere sb appears
in the code.
 
Cor Ligthert [MVP]

Yes, it is missing. If you look at the sample I showed you, you will see:

Dim sb As New Text.StringBuilder(StrLine.Substring(0, 9))

In this case you can put it with the declarations, in the style Cerebrus
uses:

Dim sb As New Text.StringBuilder()

Cor
 
ILCSP

Hello Cor, now I am trying your code. I'm sorry if I sound way too
frustrated, but I've been trying to do this for days now.

This is what I have in my command button:

==========================

Dim StrFileName As String
StrFileName = "C:\Survey01.txt"
If Not System.IO.File.Exists(StrFileName) Then
MsgBox("File does not exists.")
Exit Sub
End If

Dim StrLine As String
Dim strRdr As System.IO.StreamReader =
System.IO.File.OpenText(StrFileName)
Dim sb As New System.text.StringBuilder(StrLine.Substring(0, 9))

StrLine = strRdr.ReadLine()


Do Until StrLine Is Nothing
Dim fields() = StrLine.Split()
sb.append(fields(1) & fields(2))

if Is StrLine Not Nothing AndAlso strline.indexof("NewRegID") > -1
then
strWrt.WriteLine(sb.ToString)
sb = New System.text.StringBuilder(StrLine.Substring(0, 9))
End If

Loop
strRdr.Close()

==========================


I am getting an "expression expected" error in the last IF Statement:
if Is StrLine Not Nothing AndAlso strline.indexof("NewRegID") > -1 then

strWrt.WriteLine(sb.ToString)
sb = New System.text.StringBuilder(StrLine.Substring(0, 9))
End if


Also, I am getting a "Name 'strWrt' is not declared" error because
strWrt was never declared.
I tried to declare it as a String, but then I got "'WriteLine' is not a
member of 'String'".

Finally, when am I supposed to append the output to my text box
(txtStrings) so I can see it?


Please advise me what to do next.

Thanks.
 
ILCSP

It's me again, I took another look at those survey text files and I
realized that what separates the columns is not a space, but a (^T)
character. Consequently, this is what one line is made of


214555134^T1^T Y^P

Where ^T is the "space" between the first, second and third columns.
The last character for each line is a paragraph break.

Hope this helps.
 
Cor Ligthert [MVP]

ILCSP,

Why did you not go on with Cerebrus's code? It is another approach; I only
wanted to point out that the sb declaration was in my sample as well.

My sample was, by the way, full of typing and other errors. Because you
have been busy with this for so long already, here is a better one in case
you don't succeed with Cerebrus's.
\\\
Dim strRdr As New IO.StreamReader("C:\test.txt")
Dim strWrt As New IO.StreamWriter("C:\TestO.txt")
Dim StrLine As String = strRdr.ReadLine()
StrLine = strRdr.ReadLine()
Dim sb As New System.Text.StringBuilder(StrLine.Substring(0, 9))
Do Until StrLine Is Nothing
Dim fields() As String = StrLine.Split(" "c)
If fields.Length = 2 Then
sb.Append(fields(1))
Else
sb.Append(fields(1) & "," & fields(2))
End If
StrLine = strRdr.ReadLine()
If Not StrLine Is Nothing Then
If StrLine.IndexOf("NewRegID") > -1 Then
strWrt.WriteLine(sb.ToString)
sb = New System.Text.StringBuilder(StrLine.Substring(0, 9))
StrLine = strRdr.ReadLine()
Else
sb.Append(",")
End If
End If
Loop
strWrt.WriteLine(sb.ToString)
strRdr.Close()
strWrt.Close()
///


The result, without that crazy third record that you showed, is:

2145551341,Y,2,N,3,Y,4,1,5,Y,6,Y,7,Y,8,Y,9,1,10,Y,11,Y,12,Y,13,Y,14,Y,15,1,16,17,Y,18,Y,19,Y,20,Y,21,1,22,N,23,N,24,1,25,2
2140164211,Y,2,Y,3,Y,4,1,5,Y,6,7,Y,8,Y,9,1,10,Y,11,N,12,Y,13,Y,14,Y,15,1,16,Y,17,18,Y,19,Y,20,Y,21,1,21,1,22,Y,23,N,24,2,25,3

I hope this helps,

Cor
 
Cor Ligthert [MVP]

In addition,

this is probably what you want in my sample:
If fields.Length = 2 Then
sb.Append(fields(1) & ",")
Else
sb.Append(fields(1) & "," & fields(2))
End If

Try that yourself, however.

Cor
 
Cerebrus

First of all, Lol !

I'm very sorry guys, in my sample code, I declared sb as a
StringBuilder globally (outside the procedure). Consequently, when
copying it here, I forgot to add that in. I seem to have caused more
problems than solved ! He he ! I assumed you would be familiar with
importing Namespaces. Also, All you needed to add to my code was a
declaration saying :
--------------------------------------------
Dim sb as New StringBuilder()
--------------------------------------------
Well... my mistake.

Now, since you have a "^T" instead of a space as a separator in your
source file, I imagine you have 2 options :

First, you could replace all "^T" in the file with a " " (or with
commas, to make it a CSV file), then proceed with my code.
Second, you could simply change the statement that splits the string
(the 2nd statement inside the While loop) to:
--------------------------------------------
lineContents = Split(thisLine, "^T", , CompareMethod.Text)
--------------------------------------------

Also, please change another statement that was erroneous in the
original code :
:
:
'The following line should be changed from "If sb.Length > 2 Then"
If lineContents.Length > 2 Then
sb.Append(lineContents(2))
sb.Append(",")
End If
End If
End While

Hope that will get you going. I have tested this and it works. The
following was the output in the result file:

214555134,1,Y,2,N,3,Y,4,1,5,Y,6,Y,7,Y,8,Y,9,1,10,Y,11,Y,12,Y,13,Y,14,Y,15,1,16,17,Y,18,Y,19,Y,20,Y,21,1,22,N,23,N,24,1,25,2.
214016421,1,Y,2,Y,3,Y,4,1,5,Y,6,7,Y,8,Y,9,1,10,Y,11,N,12,Y,13,Y,14,Y,15,1,16,Y,17,18,Y,19,Y,20,Y,21,1,21,1,22,Y,23,N,24,2,25,3.
213565432,1,Y,2,N,3,N.

Regards,

Cerebrus.
 
ILCSP

Hi Cor and Cerebrus. First of all I would like to say a million thanks
for guiding me through this ordeal. I was able to do this finally.

If I changed the text file and replaced the ^T characters with spaces " ",
both of your code samples worked perfectly.

However, I'm afraid I am not allowed to modify these survey files, so I
tried to use Cerebrus's Split line:
lineContents = Split(thisLine, "^T", , CompareMethod.Text)

This also gave me an error:

An unhandled exception of type 'System.IndexOutOfRangeException'
occurred in SurveyStringer.exe
Additional information: Index was outside the bounds of the array.

Therefore, since I realized the ^T were tabs, I changed the line to
this:
lineContents = Split(thisLine, Chr(9), , CompareMethod.Binary)

Where Chr(9) is the code for a Tab (^T)

This made it work.
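
For reference, here is a minimal sketch of the whole loop with the tab split in place. It is not anyone's exact code from this thread. It assumes every record starts with a "NewRegID" line, it omits the trailing period, and Flatten and FlattenSketch are made-up names for illustration. ControlChars.Tab is the same character as Chr(9).

```vbnet
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module FlattenSketch
    ' Reads tab-delimited vertical records from reader and writes one
    ' comma-separated line per RegID to writer.
    Sub Flatten(ByVal reader As TextReader, ByVal writer As TextWriter)
        Dim sb As New StringBuilder()
        Dim line As String = reader.ReadLine()   ' heading line, ignored
        line = reader.ReadLine()
        Do Until line Is Nothing
            Dim fields() As String = line.Split(ControlChars.Tab)
            If fields.Length > 1 AndAlso fields(1).Trim() = "NewRegID" Then
                ' A new record starts: write out the previous one, if any.
                If sb.Length > 0 Then writer.WriteLine(sb.ToString())
                sb = New StringBuilder(fields(0).Trim())
            ElseIf fields.Length > 1 Then
                sb.Append(",").Append(fields(1).Trim())
                ' A blank answer may show up as a missing or empty third field.
                If fields.Length > 2 AndAlso fields(2).Trim().Length > 0 Then
                    sb.Append(",").Append(fields(2).Trim())
                End If
            End If
            line = reader.ReadLine()
        Loop
        ' Write the last record.
        If sb.Length > 0 Then writer.WriteLine(sb.ToString())
    End Sub

    Sub Main()
        ' Tiny in-memory sample so the sketch runs without a file.
        Dim sample As String = _
            "RegID" & vbTab & "ItemName" & vbTab & "Response" & vbCrLf & _
            "214555134" & vbTab & "NewRegID" & vbCrLf & _
            "214555134" & vbTab & "1" & vbTab & " Y" & vbCrLf & _
            "214555134" & vbTab & "16" & vbCrLf & _
            "214555134" & vbTab & "25" & vbTab & "2" & vbCrLf
        Flatten(New StringReader(sample), Console.Out)
    End Sub
End Module
```

With the sample in Main this writes 214555134,1,Y,16,25,2. For real files, pass a StreamReader and a StreamWriter instead of the StringReader and Console.Out.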

Again, I truly appreciate your hard work.

Sincerely,

JR.
 
ILCSP

Hi again, I have one last question about this. Is there a way to
process a whole directory (folder) of raw survey data files
using this code?

What I want is to check that the folder containing the survey text
files is not empty and, if it's not, get the first file's name
(without the extension), store it in a variable, and use it to
create the name of the new string output file. The new file name would
be the original file name plus "strings.txt". The output files would
be saved in a separate folder. Then, after the first text file is
processed, the next one would be read and converted, and so on.

I'm also thinking of popping up a message box telling the user the
number of files processed.

For example, if I have this file as the one to be read:
c:\surveys06\Survey01.txt

the outcome would be saved as this:
c:\SurveyStrings06\Survey01string.txt

and so on.

Thanks in advance.
 
Cerebrus

Hi,

As Cor directed you, the DirectoryInfo class is the one to use.

1. Create a DirectoryInfo object, supplying the directory you want to look
in.
2. The DirectoryInfo.GetFiles() method will return all files in the
directory as a FileInfo array.
3. Iterate through the members of this array using For Each, using the
Name property to get the file name and the Extension property to append
the extension after adding "strings".
4. For each file, let the above logic run.
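
The steps above can be sketched roughly as follows. This is only an outline under some assumptions: the folder names come from the example paths earlier in the thread, the "string" suffix follows the Survey01string.txt example, OutputPathFor and BatchSketch are made-up names, and the actual flattening call is left as a placeholder comment.

```vbnet
Imports System.IO
Imports Microsoft.VisualBasic

Module BatchSketch
    ' Builds the output path for one survey file, for example
    ' Survey01.txt becomes <outputDir>\Survey01string.txt.
    Function OutputPathFor(ByVal source As FileInfo, ByVal outputDir As String) As String
        Dim baseName As String = Path.GetFileNameWithoutExtension(source.Name)
        Return Path.Combine(outputDir, baseName & "string" & source.Extension)
    End Function

    Sub Main()
        Dim inputDir As New DirectoryInfo("c:\surveys06")
        Dim outputDir As String = "c:\SurveyStrings06"
        If Not inputDir.Exists Then
            Console.WriteLine("Input folder not found.")
            Return
        End If
        ' Creates the output folder if it is not there yet.
        Directory.CreateDirectory(outputDir)

        Dim processed As Integer = 0
        For Each f As FileInfo In inputDir.GetFiles()
            Dim target As String = OutputPathFor(f, outputDir)
            ' Placeholder: run the flattening routine on f.FullName,
            ' writing its result to target.
            Console.WriteLine(f.FullName & " -> " & target)
            processed += 1
        Next
        Console.WriteLine(processed & " file(s) processed.")
    End Sub
End Module
```

In a Windows Forms program the last line could be a MsgBox call instead of Console.WriteLine.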

HTH,

Regards,

Cerebrus.
 
ILCSP

Hi Cor and Cerebrus. Thanks for all your help. I followed your advice
and the DirectoryInfo and FileInfo classes really came in handy.

I also had to write some code to skip the string process if the
current survey file contained no records, and to make the process work
only with surv*.txt files (in case there were other files in the
directory).

You guys rock!

Thanks again.
 
Cerebrus

Hi,

The DirectoryInfo.GetFiles() method has an overload in which you can
specify a search pattern for the files to be selected. Here you can
simply use DirectoryInfo.GetFiles("surv*.txt") to make sure only files
matching that pattern are returned. No additional code is necessary.
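
A short sketch of that overload, assuming the c:\surveys06 folder from the earlier example (PatternSketch is a made-up module name). The zero-length check is only one possible way to skip a file with no records; a file that contains just the heading line would still need its own check.

```vbnet
Imports System.IO

Module PatternSketch
    Sub Main()
        Dim surveyDir As New DirectoryInfo("c:\surveys06")
        If Not surveyDir.Exists Then Return

        ' Only files matching the pattern are returned.
        For Each f As FileInfo In surveyDir.GetFiles("surv*.txt")
            If f.Length = 0 Then
                ' A zero-length file has nothing to flatten.
                Console.WriteLine("Skipping empty file " & f.Name)
            Else
                Console.WriteLine("Would process " & f.Name)
            End If
        Next
    End Sub
End Module
```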

Happy coding,

Regards,

Cerebrus.
 
