Text Manipulation

Wedgie

Hi everyone,

I was looking for a freeware program to help me with a problem I'm
having. I don't really know what to search by though, so I was hoping
if I explained the problem, someone may be able to suggest something.
To start with, I have a large text file. The format is one entry of a
number of lines, followed by another entry of a number of lines,
followed by more entries. The format is roughly:

(First entry)
A code associated with first entry
Another code associated with first entry
some data associated with first entry
some data associated with first entry
some data associated with first entry
QA {number}
(Second entry)
A code associated with second entry
Another code associated with second entry
some data associated with second entry
QA {number}
(Third entry)
A code associated with third entry
Another code associated with third entry
some data associated with third entry
some data associated with third entry
some data associated with third entry
some data associated with third entry
some data associated with third entry
QA {number}
etc
etc

Now, QA {number} above is the important bit. This is literally the
letters 'QA' followed by a couple of spaces, followed by a number (can
be any number, 99% of the time it will be in the range 0-9). Now, what
I want to do is hopefully find a program that will scan that QA line
for {number} > 0 and then return the data above it (from 'A code' to
'QA {number}') into a new file. That is, if the number just after QA is
not a zero for any entry in this text file, I want it to copy the data
above it (ie belonging to that entry), into a new file (I want all non
zero entries in the original file to be copied to a single, new file).

The reason why I want it to do this is that I currently go through
this file manually, making note of which entries have a non zero QA
value and copying their details down. Every day this file has a few
hundred entries in it (which takes a lot of time to go through), but
the great majority of them are zero-value QAs, so I only end up using
a few of the entries. If I can find a program that can copy the half
dozen or so non zero values (and the data just above them), to a new
file, it would make things so much easier for me. As I said though, I
have no idea what sort of program to search for. Text manipulation?
Text editing? Text search? Can anyone help?
 
Canetoad

Now, QA {number} above is the important bit. This is literally the
letters 'QA' followed by a couple of spaces, followed by a number (can
be any number, 99% of the time it will be in the range 0-9). Now, what
I want to do is hopefully find a program that will scan that QA line
for {number} > 0 and then return the data above it (from 'A code' to
'QA {number}') into a new file. That is, if the number just after QA is
not a zero for any entry in this text file, I want it to copy the data
above it (ie belonging to that entry), into a new file (I want all non
zero entries in the original file to be copied to a single, new file).

The reason why I want it to do this is that I currently go through
this file manually, making note of which entries have a non zero QA
value and copying their details down. Every day this file has a few
hundred entries in it (which takes a lot of time to go through), but
the great majority of them are zero-value QAs, so I only end up using
a few of the entries. If I can find a program that can copy the half
dozen or so non zero values (and the data just above them), to a new
file, it would make things so much easier for me. As I said though, I
have no idea what sort of program to search for. Text manipulation?
Text editing? Text search? Can anyone help?

--

Regards,

Wedgie

Hi
From your description it seems that you might be better off keeping
this data in a spreadsheet instead of a text file. Each entry is a row,
use separate columns for QA, number, code, data etc. Then it's easy to
search by column for entries >0 and copy that whole row with all its
data cells to a new SS.

I know this is not what you asked for; maybe someone else knows if this
is possible in a text file.
 
Bernard Peek

Wedgie said:
Hi everyone,

I was looking for a freeware program to help me with a problem I'm
having. I don't really know what to search by though, so I was hoping
if I explained the problem, someone may be able to suggest something.
To start with, I have a large text file. The format is one entry of a
number of lines, followed by another entry of a number of lines,
followed by more entries. The format is roughly:

The usual tools for handling text files like this are sed, awk and perl.
They can be used to create scripts that will do what you want, but it
will take time to learn how to use them. I think awk is probably the one
you want.
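
For anyone who would rather see the idea before learning awk, here is a minimal sketch of the same kind of filter written in Python (swapped in for awk purely for illustration). It assumes the simplified layout from the first post: every entry ends with a line made of QA, some spaces and a number. The names QA_LINE, nonzero_entries and filter_file are made up for this example.

```python
import re

# Assumption: an entry ends with a line made of "QA", whitespace and digits.
QA_LINE = re.compile(r"^QA\s+(\d+)\s*$")


def nonzero_entries(lines):
    """Yield each entry (a list of lines) whose closing QA number is > 0."""
    entry = []
    for line in lines:
        entry.append(line)
        match = QA_LINE.match(line)
        if match:
            if int(match.group(1)) > 0:
                yield entry
            entry = []  # start collecting the next entry


def filter_file(src_path, dst_path):
    """Copy only the non-zero entries from src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for entry in nonzero_entries(src):
            dst.writelines(entry)
```

Calling filter_file with the daily file and a new file name (both hypothetical) would leave the original untouched and write only the non-zero entries to the new file.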
 
John Fitzsimons

The reason why I want it to do this is that I currently go through
this file manually, making note of which entries have a non zero QA
value and copying their details down. Every day this file has a few
hundred entries in it (which takes a lot of time to go through), but
the great majority of them are zero-value QAs, so I only end up using
a few of the entries. If I can find a program that can copy the half
dozen or so non zero values (and the data just above them), to a new
file, it would make things so much easier for me. As I said though, I
have no idea what sort of program to search for. Text manipulation?
Text editing? Text search? Can anyone help?

(1) Put each record on a single line of text.

(2) Use Ted notepad to sort by the QA number.

(3) Cut/paste to a new/existing file all the non zero QA lines.
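
The three steps above can also be sketched in a few lines of Python, in case no editor is handy. This is only an illustration under the assumption that each record ends with a line made of QA, spaces and a number, as in the first post; the function names are invented for the example, and any lines after the last QA line are simply dropped.

```python
import re

# Assumed shape of the closing line of each entry: "QA", spaces, digits.
QA_LINE = re.compile(r"^QA\s+(\d+)$")


def one_line_records(lines):
    """Step 1: join each multi-line entry into a single line."""
    records, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines between entries
        current.append(text)
        if QA_LINE.match(text):
            records.append(" ".join(current))
            current = []
    return records


def qa_number(record):
    """Return the QA number, which is the last field of a one-line record."""
    return int(record.rsplit(None, 1)[-1])


def nonzero_sorted(records):
    """Steps 2 and 3: keep the non-zero records, sorted by QA number."""
    return sorted((r for r in records if qa_number(r) > 0), key=qa_number)
```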

Regards, John.
--
****************************************************
,-._|\ (A.C.F FAQ) http://clients.net2000.com.au/~johnf/faq.html
/ Oz \ John Fitzsimons - Melbourne, Australia.
\_,--.x/ http://www.vicnet.net.au/~johnf/welcome.htm
v http://clients.net2000.com.au/~johnf/
 
Wedgie

Hi
From your description it seems that you might be better off keeping
this data in a spreadsheet instead of a text file. Each entry is a row,
use separate columns for QA, number, code, data etc. Then it's easy to
search by column for entries >0 and copy that whole row with all its
data cells to a new SS.

I know this is not what you asked for; maybe someone else knows if this
is possible in a text file.

Hi Canetoad,

I can't recall now if I have tried it in a spreadsheet or not. Anyway,
I will give it a go when I get to work tomorrow.

Thanks.

Wedgie
 
Wedgie

The usual tools for handling text files like this are sed, awk and perl.
They can be used to create scripts that will do what you want, but it
will take time to learn how to use them. I think awk is probably the one
you want.

Hi Bernard,

I've found a couple of awk tutorials on the web, I will have a good
read through them and see if I can see a way to do what I want.

Thanks.

Wedgie
 
Wedgie

(1) Put each record on a single line of text.

(2) Use Ted notepad to sort by the QA number.

(3) Cut/paste to a new/existing file all the non zero QA lines.

Regards, John.

Hi john,

By 'put each record on a single line of text', are you referring to:

(First entry)
A code associated with first entry Another code associated with first
entry some data associated with first entry some data associated with
first entry some data associated with first entry QA {number}
(Second entry)
A code associated with second entry Another code associated with
second entry some data associated with second entry QA {number}
(Third entry)
A code associated with third entry Another code associated with third
entry some data associated with third entry some data associated with
third entry some data associated with third entry some data associated
with third entry some data associated with third entry QA {number}

or

(First entry)
QA {number} A code associated with first entry Another code
associated with first entry some data associated with first entry some
data associated with first entry some data associated with first entry
(Second entry)
QA {number} A code associated with second entry Another code
associated with second entry some data associated with second entry
(Third entry)
QA {number} A code associated with third entry Another code
associated with third entry some data associated with third entry
some data associated with third entry some data associated with third
entry some data associated with third entry some data associated with
third entry

(obviously each entry is on a single line, but word wrap makes it look
like a paragraph)

Also, what would be the fastest way to get each record on one line
(the idea of going to the end of each line in the file, pressing
space, and then pressing the delete key to bring the next line back a
line doesn't appeal to me)?

Thanks,

Wedgie.
 
jack horsfield

(First entry)
A code associated with first entry
Another code associated with first entry
some data associated with first entry
some data associated with first entry
some data associated with first entry
QA {number}

even without getting to the program you can probably speed up your manual
method if you start by extracting a list of all the qa numbers. in a DOS
window, enter:

find "QA" myQAfile.txt | sort

then you can just search for the ones with numbers, without going through
the whole file.
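
For anyone without a DOS window to hand, roughly the same listing can be produced with a short Python sketch. It is an approximation rather than an exact equivalent of find and sort, and the function name is invented for the example.

```python
def sorted_matches(lines, needle="QA"):
    """Return every line containing `needle`, sorted, without line endings."""
    return sorted(line.rstrip("\n") for line in lines if needle in line)
```

Passing it an open file, for example `sorted_matches(open("myQAfile.txt"))`, gives the QA lines grouped by value, so the zero lines sort together ahead of the non-zero ones.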


jack
the best thing you have going for you is your willingness to humiliate
yourself
 
Bernard Peek

Wedgie said:
Hi Bernard,

I've found a couple of awk tutorials on the web, I will have a good
read through them and see if I can see a way to do what I want.

The other alternative is to use a word-processor with a macro language.
You can do a lot with find and replace functions. I am confident that
awk is able to do the job. If you are comfortable working with VBA then
MS Word would also be able to do it.
 
John Fitzsimons

On Tue, 19 Apr 2005 09:34:47 +1000, John Fitzsimons
By 'put each record on a single line of text', are you referring to:
(First entry)
A code associated with first entry Another code associated with first
entry some data associated with first entry some data associated with
first entry some data associated with first entry QA {number}


(First entry)
QA {number} A code associated with first entry Another code
associated with first entry some data associated with first entry some
data associated with first entry some data associated with first entry

< snip >

Either will do. Ted can sort from the R.H.S. or L.H.S. and pretty well
every other text editor does the latter, only.
(obviously each entry is on a single line, but word wrap makes it look
like a paragraph)

Not so obviously. I don't know how the original data looks.
Also, what would be the fastest way to get each record on one line
(the idea of going to the end of each line in the file, pressing
space, and then pressing the delete key to bring the next line back a
line doesn't appeal to me)?

Me neither. What happens when you put this in a text editor and turn
off "word wrap" ? Doesn't that work ? Don't the records go to one line
each ?

Perhaps giving an example of a complete "record" would make it easier
for people to help you ? My first thought is that you need to unwrap
then re-wrap after each number "string".

Regards, John.
 
Wedgie

Not so obviously. I don't know how the original data looks.

Sorry, I wasn't referring to the original data, I was referring to the
example of 'all on one line' in my reply to your original message.
Because of the word wrap in my news program, it made the two examples
look like they were on several separate lines, rather than the single
line per entry that I had typed.
Me neither. What happens when you put this in a text editor and turn
off "word wrap" ? Doesn't that work ? Don't the records go to one line
each ?

No - each line in the file is ended with a return character, so no
matter whether word wrap is on or off, each record is on a separate
line.
Perhaps giving an example of a complete "record" would make it easier
for people to help you ? My first thought is that you need to unwrap
then re-wrap after each number "string".

From: Smith, John
Sent: Tuesday, 19 April 2005 16:13
To: Jones, Bill
Subject: Report050419-0046

19 Apr 2005 Imported date
10 Number



REPORT

Base information:
Name: XYZ
Operator: jjones
Date: 19 Apr 2005
Batch Number: 100000
Code used:
Indexing information:
Operation: jsmith
Date: 19 Apr 2005
Number: 10

Seq EmpID Folio Source Last Comments Action
Nums Num Code Date Taken

1 1234567 ZZ123 NOWHE 11Apr2005
2 1234567 ZZ456 NOWHE 10Apr2005
3 1234567 ZZ456 NOWHE 10Apr2005
4 1234567 ZZ456 NOWHE 10Apr2005
5 1234567 ZZ159 NOWHE 10Apr2005
6 1234567 ZZ159 NOWHE 11Apr2005
7 1234567 ZZ15 NOWHE 11Apr2005
8 1234567 ZY516 NOWHE 11Apr2005
9 1234567 ZZ19 NOWHE 11Apr2005
10 1234567 ZZ357 NOWHE 11Apr2005

Number of folios committed: 10
Number of folios send to QA: 0

PLEASE FILE

This is one complete record. The text file consists of a number of
records, all one after the other. Imagine copying this to a text file,
making a couple of blank lines below it, and copying another record
and so on. Basically, I have no interest in any records that have a
'number of folios send to QA' value of 0. All I want is the ones that
have a non zero value, preferably copied to a new text file to sort
out the dozen or so from the few hundred in the file that are zero
(new file is generated daily, so doing it manually each and every day
takes way too much time). Once in a new text file, I can manipulate
them to the way I want. As an added complication, I need the text
immediately above the QA line (as long as it is > 0 of course),
copied as well (basically get rid of the header info, everything below
'report' down to 'number of folios send to qa').
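
The extraction described above can be sketched in Python as follows. It is a minimal illustration, not a finished tool, and it rests on two assumptions drawn from the sample record: a line reading exactly REPORT starts the part worth keeping, and the line beginning "Number of folios send to QA:" ends it. The function names and any file names passed to them are hypothetical.

```python
QA_PREFIX = "Number of folios send to QA:"


def extract_nonzero_reports(lines):
    """Return the body of every record whose QA count is greater than zero.

    The body runs from the line after REPORT through the QA line itself,
    so the e-mail style header and the PLEASE FILE line are left out.
    """
    kept, body, in_body = [], [], False
    for line in lines:
        stripped = line.strip()
        if stripped == "REPORT":
            body, in_body = [], True  # a new record body starts here
        elif in_body:
            body.append(line)
            if stripped.startswith(QA_PREFIX):
                value = stripped[len(QA_PREFIX):].strip()
                if value.isdigit() and int(value) > 0:
                    kept.extend(body)
                in_body = False  # ignore lines until the next REPORT
    return kept


def filter_file(src_path, dst_path):
    """Write the non-zero record bodies from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(extract_nonzero_reports(src))
```

Because the daily file is only read, never modified, a wrong guess about the layout costs nothing more than an empty or odd-looking output file.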
 
John Fitzsimons


Number of folios send to QA: 0
PLEASE FILE
This is one complete record. The text file consists of a number of
records, all one after the other. Imagine copying this to a text file,
making a couple of blank lines below it, and copying another record
and so on. Basically, I have no interest in any records that have a
'number of folios send to QA' value of 0. All I want is the ones that
have a non zero value, preferably copied to a new text file to sort
out the dozen or so from the few hundred in the file that are zero
(new file is generated daily, so doing it manually each and every day
takes way too much time). Once in a new text file, I can manipulate
them to the way I want. As an added complication, I need the text
immediately above the QA line (as long as it is > 0 of course),
copied as well (basically get rid of the header info, everything below
'report' down to 'number of folios send to qa').

Okay. You have two types of "Report".


(1)

REPORT

< snipped stuff >

Number of folios send to QA: 0

and

(2) REPORT

< snipped stuff >

Number of folios send to QA: >0




Now you haven't said what your "QA" number range is. Is it 1-9 ?
1- 1,000,000 ?

In any case one can ignore that variable if you like. Just get rid of
all records with QA of 0. Then everything that is left will be QA >0.

Here is how you do things.

(1) Get a program like MicroGenius (Multiple-File Recursive Search and
Replace) . It has a "Between" search and replace function. Both SOS's
site at ;

http://www.anycities.com/user1/sonofspy/Text.html

and CoMa's site at ;

http://www.algonet.se/~hubbabub/freeware/text2.html

mention it but both appear to have dead home page and/or download
links.

If you cannot find the program, or mfrsr.zip, then another program
with variable search and replace should work. Maybe take a trip to ;

http://www.inforapid.com/html/english.htm ?

(2) Enter your "Search" START TEXT criteria as "Report"

(3) Enter your "Search" FINISH WITH criteria as "QA: 0"

(4) Enter nothing as your "Replace with", effectively deleting
every record with QA 0.

(5) Run the S/R. You should now only have records with a QA >0.

(6) "Save" as a different filename.

(7) To now get rid of the "Header" info do the above again but this
time your START TEXT is "Report" and your FINISH WITH is
"Action Taken".
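
Steps (2) to (4) amount to deleting everything between a start marker and an end marker. A minimal Python sketch of that idea follows, with one addition that is not taken from any description of MicroGenius: a guard that stops a match from running past a non-zero QA line into the next record. Like the manual steps, it removes only the span between the markers, so the lines before REPORT and after the QA line of a zero record are left behind. The names are invented for the example.

```python
import re

QA_PREFIX = "Number of folios send to QA:"

# Match from a line reading REPORT up to the first QA line after it, but
# only when that QA value is exactly 0. The (?!...) guard keeps the match
# from crossing a QA line, so a non-zero record is never swallowed.
ZERO_QA_BLOCK = re.compile(
    r"^REPORT[ \t]*\n"
    r"(?:(?!" + re.escape(QA_PREFIX) + r").)*?"
    + re.escape(QA_PREFIX) + r"[ \t]*0[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def drop_zero_qa_blocks(text):
    """Delete every REPORT ... 'QA: 0' span, i.e. replace it with nothing."""
    return ZERO_QA_BLOCK.sub("", text)
```

Without the guard, a plain "shortest match" between the two markers could start at a record with a non-zero QA and run on to the next "QA: 0", deleting a record that should have been kept.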

Regards, John.
 
Wedgie

(1) Get a program like MicroGenius (Multiple-File Recursive Search and
Replace) . It has a "Between" search and replace function. Both SOS's
site at ;

http://www.anycities.com/user1/sonofspy/Text.html

and CoMa's site at ;

http://www.algonet.se/~hubbabub/freeware/text2.html

mention it but both appear to have dead home page and/or download
links.

If you cannot find the program, or mfrsr.zip, then another program
with variable search and replace should work. Maybe take a trip to ;

http://www.inforapid.com/html/english.htm ?

Hi John,

Genius! That looks so easy. I've searched for mfrsr (google, yahoo and
altavista, even the Wayback machine), but I have found no trace of it.
I have posted a message to this group asking if anyone has a copy of
the file. I did try some other programs first though - BK ReplaceEm,
NodeSoft Search and Replace, InfoRapid Search & Replace, but none of
them work like this. BK comes the closest, but the problem with it is
that it seems to ignore the "QA: 0" at the end, and even if this is
"QA: 1" or "QA: 2", it replaces it :-( InfoRapid is a bit harder to
use, but I eventually got it to produce the correct output in the
search...but the replace will only replace the found text - it will
not replace the text in between (ie in a range), so the 'Index' is
replaced, as is the "QA: 0" (though like BK, it ignores the 0 and
replaces 1's and 2's as well), but it leaves the text in the middle
alone. Anyway, hopefully someone in this group has a copy of the mfrsr
program. *cross my fingers, knock on wood, go looking for a 4 leaf
clover etc*
 
Adrian Carter

Hi Wedgie,
When I first saw this post I thought my program TextWedge could do it,
but wasn't sure until you posted your more detailed example in a later
post. I have just tried it, after extending your example data, and yes,
there is a process by which you can get the results you want with less
angst than the method you currently use.

TextWedge at http://www.homestead.com/adriancarter/Index.html
is a program for splitting text files according to very flexible splitting
criteria.

How I did it:
1. Split the file by getting it to split at any line that contains
"Number of folios send to QA", and to split in such a way that
each "hit line" is at the bottom of each chunk.
2. Before splitting, choose a naming convention for the chunks
so they are all in a new empty folder.
3. After splitting, search in this folder for files that contain
"Number of folios send to QA: 0".
4. Delete everything from the search results window.
5. What remains is a collection of files containing the info you want.
If you want it all in a single file, use DOS COPY to join them up again.
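
The same split, discard and rejoin process can be sketched in Python for comparison. This is an illustration of the idea only, not of how TextWedge works internally, and the function names are invented. One difference is flagged in the comments: a trailing chunk with no QA line is dropped here, whereas the file-based steps above would leave it in the folder.

```python
QA_PREFIX = "Number of folios send to QA:"


def split_into_chunks(lines):
    """Split so that each QA line is the last line of its chunk."""
    chunks, current = [], []
    for line in lines:
        current.append(line)
        if QA_PREFIX in line:
            chunks.append(current)
            current = []
    if current:  # lines left over after the final QA line
        chunks.append(current)
    return chunks


def has_nonzero_qa(chunk):
    """True if the chunk ends in a QA line whose value is greater than 0."""
    last = chunk[-1]
    if QA_PREFIX not in last:
        return False  # assumption: chunks without a QA line are dropped
    value = last.split(QA_PREFIX, 1)[1].strip()
    return value.isdigit() and int(value) > 0


def keep_nonzero(lines):
    """Rejoin the lines of every chunk that has a non-zero QA value."""
    kept = [c for c in split_into_chunks(lines) if has_nonzero_qa(c)]
    return [line for chunk in kept for line in chunk]
```

As with the file-based method, each chunk after the first starts with whatever followed the previous QA line, so a kept chunk carries the previous record's PLEASE FILE line and its own header with it.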

HTH,
Adrian Carter
 
Susan Bugher

Wedgie said:
On Thu, 21 Apr 2005 11:44:39 +1000, John Fitzsimons
Hi John,

Genius!

Agree. :)

That looks so easy. I've searched for mfrsr (google, yahoo and
altavista, even the Wayback machine), but I have found no trace of it.

Wrong link - http://www.anycities.com/user1/sonofspy/Text.html - this is
an old *DEAD* SOS site. Try an active SOS site. ;)

The description is here:

http://www.sover.net/~wysiwygx/Text.html

The *GOOD* download link is:

http://www.pricelesswarehome.org/SOS/mfrsr.zip

Enjoy.

Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://google.ca/advanced_group_search?q=+group:alt.comp.freeware
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)
 
John Fitzsimons

On Thu, 21 Apr 2005 15:04:33 GMT, "Adrian Carter"

Hi Adrian,

TextWedge at http://www.homestead.com/adriancarter/Index.html
is a program for splitting text files according to very flexible splitting
criteria.
How I did it:
1. Split the file by getting it to split at any line that contains
"Number of folios send to QA", and to split in such a way that
each "hit line" is at the bottom of each chunk.

< snip >

According to that page you can "split on "chunk size" or
"number of lines per chunk" criteria, or you can split at lines
containing any one of multiple search strings .."

But if the size of the "chunk" varies will it still work ? Suppose
one chunk is ten lines and another twenty lines.

Also, would it be possible for your program to do its work on a
directory of files ? Without needing to open them one by one
first ?

Regards, John.
 
Adrian Carter

John Fitzsimons wrote:

< snip >

According to that page you can "split on "chunk size" or
"number of lines per chunk" criteria, or you can split at lines
containing any one of multiple search strings .."

But if the size of the "chunk" varies will it still work ? Suppose
one chunk is ten lines and another twenty lines.

The splitting criteria are not mutually exclusive. It splits at every line
where *any* one of the criteria is satisfied. eg it might split at the 2nd
line if it found one of the search strings, and then at the 22nd if no other
search string was found before then, & you also had "number of lines = 20".
Though I can't imagine why anyone would want to mix size criteria with
search strings. Naturally, using search strings means it can spit out
variable sized chunks.
Also, would it be possible for your program to do its work on a
directory of files ? Without needing to open them one by one
first ?

A nice idea. If splitting multiple files this way, I can see users possibly
wanting to know which original file a chunk came from. I will have to
think of a way to specify a naming convention that could incorporate
such a requirement. I'll put this one on the to do list.

Regards,
Adrian
 
Wedgie

How I did it:
1. Split the file by getting it to split at any line that contains
"Number of folios send to QA", and to split in such a way that
each "hit line" is at the bottom of each chunk.
2. Before splitting, choose a naming convention for the chunks
so they are all in a new empty folder.
3. After splitting, search in this folder for files that contain
"Number of folios send to QA: 0".
4. Delete everything from the search results window.
5. What remains is a collection of files containing the info you want.
If you want it all in a single file, use DOS COPY to join them up again.

HTH,
Adrian Carter

Hi Adrian,

I think it will help. I downloaded your program, but I must be doing
something wrong, as I can't get the correct result. Ok, first I looked
into the 'text break splitting' dialog. I discarded it as an option,
as it looked to me to be for files with the same number of lines in
each record, which mine doesn't have. So, I went to the 'standard
splitting' dialog. There are 3 options here - phrase in first line of
chunk, phrase in last line of chunk and dump lines containing phrase.
I copied the phrase "Number of folios send to QA: 0" from my file,
and went to work.

First, the first line option. It cuts out that line, so that the last
line in each record is now the one above the 'Number of folios send to
QA: 0' line. Ok, so I tried the wrong option. No problem, I then try
the last line option, thinking this was the right one. Nope, in this
case, the very top line of the report is removed, but every other line
is untouched. So, I try the dump lines option. This works as it states
- every 'Number of folios send to QA: 0' line was simply deleted from
the file. I've run out of ideas on how to configure your program to
get the right result - can you give me some pointers?
 
Adrian Carter

Hi Wedgie,

Text break splitting is for data files that are sorted - you can elect
(for example) to split whenever the text in the first 18 characters
of a line changes. On to your problem. I'll describe the exact
sequence of steps I followed, using a test file that I built by
copying your example 14 times and modifying each group a bit.

1. Load or paste file.

2. Using menu sequence Dialogs -> Configure Standard Splitting,
I did as follows:
a) Select radiobutton "Phrase in LAST line of chunk"
b) Enter the text "Number of folios send to QA" in the
first line of the grid.
c) Left UNCHECKED the box "Phrases are regular expressions"
d) In box labelled "Pattern for filenames of chunks",
I put "C:\Dump\Result<#>.Txt". C:\Dump was an empty folder,
and it later filled with files Result01.Txt - Result14.Txt.
e) Press OK button.

3. Chose menu sequence Action -> Split, and Save Chunks.
The Dump folder then filled with 14 files.

4. Go to Windows Explorer, open C:\Dump and search for
"Number of folios send to QA: 0".

5. Delete all files from search results window.

When I opened C:\Dump again, there were just 3 files remaining,
and they all had a non-zero after "QA:".

If you want to take this to email, my unmunged address is in the
2nd para of the ReadMe.Txt file that you would have downloaded
with TextWedge. I'm happy to assist with any problems or
questions about my software.

Regards,
Adrian
 
