Loopy over loops


Guest

Using VB.NET in VS 2003.
This should be a simple routine, but it has me flummoxed.
I have to compare strings in two text files:
FILE1 (srAR) consists of lines of book titles.
FILE2 (srAC) consists of multi-line "book records", separated by blank rows.
The number of rows in each "book record" varies, as in this diagram:

1xxxxxx
xxxxxx
TitleRow
xxxxxx

2xxxxxx
TitleRow
xxxxxx
xxxxxx
xxxxxx


All I'm trying to do is first read a title in File1, then read through the
entire File2, one "book record" at a time, looking for a matching title.
After other processing (not relevant here), once all book records have been
searched, go to the next File1 title and read all the book records in File2
again. In short: a typical loop within a loop.

The problem is that I cannot get it to work correctly! I know the issue
involves the blank rows and how the streamreader works. Here are core
excerpts of my code. Any enlightenment is greatly appreciated!

Dim RowCntr, ARBldr, ARCntr, MARCRowCntr, z, StartPos, TabPos As Long
Dim C1, C2, BkTitle, ArTitle, Arline, Acline, ArWtr, AcWtr, arbk As String
Dim srAR As System.IO.StreamReader = New StreamReader("FILE1.txt", _
    Encoding.GetEncoding(1252))
Dim srAC As System.IO.StreamReader = New StreamReader("FILE2.txt", _
    Encoding.GetEncoding(1252))

' Loop through the AR file (FILE1), reading each book title
For ARCntr = 0 To RdrArray.GetUpperBound(1)
    MarcCntr = 1 ' initialize MARC record counter variable
    ' Read through FILE2, one "book record" at a time.
    Do
        RowCntr = 1 ' reset variable for next book record row counter
        ' Now, the inner loop is supposed to read all lines for a single
        ' book record.
        Do
            Acline = srAC.ReadLine
            LibList.Add(Acline) ' new field row
            If Acline = "" Then ' found the blank row
                LibList.Add(vbCr) ' new field row
                RowCntr += 1 ' set value of RowCntr
            End If
        Loop Until srAC.Peek = -1
    Loop Until srAC.Peek = -1 ' of first DO. Get another book record
Next ARCntr ' of the original FOR loop. Get another book title

I hope this is enough information to work with. I'll reply with more info if
necessary.
Thanks again for any help!

George
 
Hi,

Can't give you code (I'm learning VB.NET myself!), but I would:

Read the second file into memory, ignoring all blank rows and, if possible,
any non-title rows.
Save the possible title rows into a collection you define to hold all the
data you need.
Then you can search the list, letting the code behind the collection do all
the searching for you. It should also run a lot faster.
 
Thanks for the reply, Rod.
Unfortunately, there are reasons I cannot do this (otherwise, it would have
been a no-brainer). And I probably didn't go into sufficient detail about
this, so I apologize:

1. These "book records" must be written back out to a text file to be
imported into a database; hence, removing blank rows, extracting only certain
rows, etc., will do no good.
2. As for the blank rows, they must remain, as they are the legal
"separators" between the records. Otherwise, the import process (which also
involves a file format change that we don't need to talk about here) will
fail. However, I have tried stopping the loop when it hits a blank row and
programmatically putting the blank row back in before it is written out to a
file.

And the reason I'm trying to read these records one at a time is that I'm
dealing with tens of thousands of them and I fear running out of memory if I
try to read them all at one time. Remember each "record" contains numerous
rows of data.

In fact, after I finish processing a single record, I clear the array and
read in another record to process.

Perhaps another way for me to ask this question is this:

With the outside For loop, looking through the AR book title files, what is
the most effective way for me to run an inner loop to read through the book
records file, one "book record" of rows at a time?

If anything else comes to mind, don't hesitate to respond! Thanks for your
ideas and time, Rod!

George
 
Your logic is strange.

Best to do this as pseudo-code first, then convert to the language. I also
suggest that you pull out a separate function to get the group of records
from the second file and return only the book title.

Open File1
Read a record from File1
Do while not EOF(File1)
    FoundIt = false
    Open File2
    BookTitle = GetBookTitle(File2)
    Do while not EOF(File2)
        if (File1.Record == BookTitle) then
            FoundIt = true
            break out of inner loop
        end if
        BookTitle = GetBookTitle(File2)
    Loop
    if (FoundIt) then
        ' do your processing
    end if
    Close File2
    Read a record from File1
Loop


The inner call, GetBookTitle, is pretty simple at this point:

Function GetBookTitle(file File2) as string
    GetBookTitle = blank ' make sure you return a valid value
    read a line from File2
    do while line is not blank
        save line to array or other object
        if line is the title
            GetBookTitle = line read
        end if
        read a line from File2
    loop
End Function
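Translating that pseudo-code into VB.NET might look roughly like this. This
is a sketch only: IsTitleRow is a hypothetical helper, since recognizing the
title line inside a MARC record depends on the actual data.

' Sketch: read one blank-row-delimited "book record" and return its title.
' IsTitleRow() is a placeholder -- the real test depends on the data.
Function GetBookTitle(ByVal srAC As System.IO.StreamReader, _
                      ByVal recordRows As ArrayList) As String
    Dim title As String = ""       ' return a valid value even with no match
    Dim rowText As String = srAC.ReadLine()
    ' Read until the blank separator row or end of file.
    Do While Not (rowText Is Nothing) AndAlso rowText <> ""
        recordRows.Add(rowText)    ' keep every row so the record can be
                                   ' written back out unchanged
        If IsTitleRow(rowText) Then
            title = rowText
        End If
        rowText = srAC.ReadLine()
    Loop
    Return title
End Function

The key difference from the original code is that this inner loop stops at
the blank separator row, not at end of file, so each call consumes exactly
one record.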

Hopefully this will clear up the log jam.

BTW: if you are searching for items from the first set in the second set,
why not just load both of them into database tables and then do a simple
join? It's a LOT more efficient and would go much quicker!

You mention also that these files can get big. This algorithm, while it
will solve the immediate problem, will present you with a new one: it is not
efficient at all. You will reread the entire large File2 for each record
in File1. I don't know how many records are in File1 or File2, but if
you are afraid of running out of memory, I'd mention that this method will
have you running out of processor time and will put an amazing strain on the
garbage collector!

If you must do this kind of processing, is there any way you can presort
both files? If so, you won't have to read each file from the beginning.
You can read from File2 until you pass where the first record would occur
alphabetically. Then, if you don't find record 1, you move forward to
record 2 and read forward from there. Each file is read once. This is
considerably more efficient. (It's how we used to do this in the old days,
when every cycle on a computer was counted and charged back to the person
running the job.)
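That presorted, single-pass approach is essentially a merge join. A minimal
VB.NET sketch, assuming both title lists have already been extracted,
normalized, and sorted; the titles and recordTitles arrays are hypothetical
stand-ins for those sorted lists:

' Sketch of a single-pass merge over two presorted inputs.
' titles() holds sorted titles from FILE1; recordTitles() holds sorted
' titles from FILE2 -- both are assumptions for illustration.
Dim i As Integer = 0
Dim j As Integer = 0
Do While i < titles.Length AndAlso j < recordTitles.Length
    Dim cmp As Integer = String.Compare(titles(i), recordTitles(j))
    If cmp = 0 Then
        ' Match: process this MARC record, then advance both sides.
        i += 1
        j += 1
    ElseIf cmp < 0 Then
        i += 1   ' title not present in FILE2; move to the next title
    Else
        j += 1   ' record's title sorts first; move to the next record
    End If
Loop

Because both inputs only move forward, each file is scanned once instead of
File2 being rescanned for every title.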

The best approach is still to load both tables into a DB and do a join.

HTH,
--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
 
A great reply, Nick. Thanks! You have some good ideas to consider.

However, let me try to clarify a few things. Sorry I did not mention all of
this earlier, but I didn't want my message to go on and on, and I thought
that my problem was simply one of finding the right syntax for the code.

Everything you say about using a DB or sorting data makes sense; however, I
have to deal with a big constraint: the file I am reading (so-called FILE2)
is composed of what are called MARC records (I referred to them earlier as
"book records" composed of a variable number of rows).

Each MARC record must be kept in (or at least returned to) its original state.
The reason is that I have to take the updated MARC records and import them
back into their library database program, with my updates. (And the library
program is a proprietary, closed system, alas.)

Thus, it seems to me that extracting and sorting out the titles will still
force me to keep track of their relationship back to the MARC records in
order that I do my other processing (which is to edit a specific "field"
value in the MARC record of a matched book title).

As for creating joins on titles, that is a good idea; however, here is the
rub: the book titles in File1 may not exactly match the titles in File2
(containing the MARC records). There can be variations based on data-entry
errors, abbreviations, use of subtitles, etc. What I've done is to
"normalize" titles on both sides by stripping out all spaces, same-casing all
text, substituting normal alphabetic characters for "foreign" ones (a for á,
etc.), and then doing substring matches.
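That normalization might be sketched like this in VB.NET; the accent-folding
list is only an illustrative sample, not a complete mapping:

' Sketch: normalize a title for fuzzy comparison.
' The replacement list is a tiny sample -- a real mapping would need to
' cover every accented character that appears in the data.
Function NormalizeTitle(ByVal title As String) As String
    Dim s As String = title.ToLower()
    s = s.Replace(" ", "")        ' strip all spaces
    s = s.Replace("á", "a")       ' fold accented characters...
    s = s.Replace("é", "e")
    s = s.Replace("í", "i")
    s = s.Replace("ó", "o")
    s = s.Replace("ú", "u")
    Return s
End Function

' A match test then becomes a substring check, e.g.:
' If NormalizeTitle(marcTitle).IndexOf(NormalizeTitle(BkTitle)) >= 0 Then ...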

I could read all of the data into database tables, but I'm not sure what it
buys me in the end, given the data "normalizing", pattern matching and
importing back into the library system that has to be done, in any event.
But, I'll take a look at it. Perhaps I'm just dense or too close to the
project.

Thanks again for the advice!

George Atkins
 
Hello George,

You still get some value out of reading the data from both files into
tables.
Look at it this way: after all of your normalizing and tweaking, you still
have to compare the value of the book title from one table to the title of
another.
So, when you are loading your data into the tables in SQL, create a new
column with a normalized title, just for comparison.

You can do this for both tables. You can even create "many" normalized
possibilities for a single title by creating a detailed table.

Then the join is easy.
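For illustration, with a normalized-title column on both tables, the
comparison could become a single join. The table and column names here are
made up, and the substring match mirrors George's approach:

-- Hypothetical schema: Titles(NormTitle) loaded from FILE1,
-- MarcRecords(RecordId, NormTitle, ...) loaded from FILE2.
SELECT m.RecordId, t.NormTitle
FROM Titles t
INNER JOIN MarcRecords m
        ON m.NormTitle LIKE '%' + t.NormTitle + '%'

A plain equality join on NormTitle would be faster, but the LIKE form keeps
the substring-matching behavior described above.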

Either way, you have a hard problem. Good luck.

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

--
 
Well, you make a compelling argument, Nick. Looks like I have a new approach
to try out. Thanks for the insights and methodology!

George
 
