Record parsing

R

RD

I have a new one. My gf always manages to send me a problem that
bends my head. :)

I have a table (imported from a HUGE text file) with a field for phone
numbers, another for Carriers, another for Groups, another for Company
Name and a couple other fields (descriptions or something).

The phone numbers can be sequential or they may not be. In the middle
of a sequence of numbers the Carrier may change or the Company may
change or the Group may change or any combination may occur. Or not.

What we need is a way to iterate through the numbers and write records
to a new table where non-sequential numbers are pretty much written as
they are but sequential numbers get parsed into a range in one cell.
She needs to do this to make it easier to move whole blocks of numbers
between Carriers or Groups or Companies.

A parsed record would look kind of like this:

PhnNum Grp Carr Cmpy
2013955700-2013955799 covadnj gobeam Focal Comm

With the last fields written as well.

I'm messing with a "little" sample table of over 11,000 lines. Now, I
can loop with the best of them and I've parsed out plenty of fields
and text files and such in the past but it's always been in the other
direction. Y'know, pulling info *out* of ranges and bad
concatenations not putting them *into* a range.

So, I don't feel like re-inventing the wheel if somebody has a routine
that does something similar I can adapt. How about it? Has anyone
had to do anything like this?

Thanks,
RD
 
J

John Nurick

You need to keep track of three rows:

3) the one you have just read ("next")
2) the one before that ("latest")
1) the row after the last change in carrier, group or company (i.e. the
"first" row with the "latest" combinatoin of carrier, group and
company).

If "next" has a different carrier, group, or company from "latest", you
need to output a row. If the "latest" phone number is different from the
"first" one, concatenate the two; otherwise just output the single
number, along with the other fields. Then update "latest" from "next",
and - each first iteration after outputting a row - "first" from
"latest".

This sort of thing is much easier to code in Perl (reading and writing
text files) than in VBA:

use strict;
my ( @first, @next, @latest );

while (<>) {
@next = split /\t/, $_ ;

@first = @latest unless @first;

if ( $first[1] ne $next[1]
|| $first[2] ne $next[2]
|| $first[3] ne $next[3] ) { #values have changed

#append latest number if a range is required
$first[0] .= "-$latest[0]" if ($first[0] ne $latest[0]);

print join("\t", @first);
undef @first;
}
@latest = @next;
}

#print last line
if (@first) { #there's a range to print
$first[0] .= "-$latest[0]" ;
print join("\t", @first) ;
} else { #single phone number
print join("\t", @latest);
}
 
R

RD

Wow.

I always wanted to learn Perl. I even bought "Perl for Dummies" :)
but it's all Greek to me.

Anyway, John, thanks for the logic. I guess I better get to coding.

Regards,
RD


You need to keep track of three rows:

3) the one you have just read ("next")
2) the one before that ("latest")
1) the row after the last change in carrier, group or company (i.e. the
"first" row with the "latest" combinatoin of carrier, group and
company).

If "next" has a different carrier, group, or company from "latest", you
need to output a row. If the "latest" phone number is different from the
"first" one, concatenate the two; otherwise just output the single
number, along with the other fields. Then update "latest" from "next",
and - each first iteration after outputting a row - "first" from
"latest".

This sort of thing is much easier to code in Perl (reading and writing
text files) than in VBA:

use strict;
my ( @first, @next, @latest );

while (<>) {
@next = split /\t/, $_ ;

@first = @latest unless @first;

if ( $first[1] ne $next[1]
|| $first[2] ne $next[2]
|| $first[3] ne $next[3] ) { #values have changed

#append latest number if a range is required
$first[0] .= "-$latest[0]" if ($first[0] ne $latest[0]);

print join("\t", @first);
undef @first;
}
@latest = @next;
}

#print last line
if (@first) { #there's a range to print
$first[0] .= "-$latest[0]" ;
print join("\t", @first) ;
} else { #single phone number
print join("\t", @latest);
}


I have a new one. My gf always manages to send me a problem that
bends my head. :)

I have a table (imported from a HUGE text file) with a field for phone
numbers, another for Carriers, another for Groups, another for Company
Name and a couple other fields (descriptions or something).

The phone numbers can be sequential or they may not be. In the middle
of a sequence of numbers the Carrier may change or the Company may
change or the Group may change or any combination may occur. Or not.

What we need is a way to iterate through the numbers and write records
to a new table where non-sequential numbers are pretty much written as
they are but sequential numbers get parsed into a range in one cell.
She needs to do this to make it easier to move whole blocks of numbers
between Carriers or Groups or Companies.

A parsed record would look kind of like this:

PhnNum Grp Carr Cmpy
2013955700-2013955799 covadnj gobeam Focal Comm

With the last fields written as well.

I'm messing with a "little" sample table of over 11,000 lines. Now, I
can loop with the best of them and I've parsed out plenty of fields
and text files and such in the past but it's always been in the other
direction. Y'know, pulling info *out* of ranges and bad
concatenations not putting them *into* a range.

So, I don't feel like re-inventing the wheel if somebody has a routine
that does something similar I can adapt. How about it? Has anyone
had to do anything like this?

Thanks,
RD
 
J

John Nurick

Good luck. The logic in VBA will be pretty much the same as in Perl,
though the details will be different. Here are a few clues:

@first is an array (like a VBA variant array, but you'll
need to iterate through a recordset instead)
$first[0] is the first element in the array, i.e. the phone number
$first[1] is the second element, i.e. the carrier
ne translates as <>
|| translates as Or
unless @first tests whether @first contains a value
undef @first wipes out all the elements of @first


Wow.

I always wanted to learn Perl. I even bought "Perl for Dummies" :)
but it's all Greek to me.

Anyway, John, thanks for the logic. I guess I better get to coding.

Regards,
RD


You need to keep track of three rows:

3) the one you have just read ("next")
2) the one before that ("latest")
1) the row after the last change in carrier, group or company (i.e. the
"first" row with the "latest" combinatoin of carrier, group and
company).

If "next" has a different carrier, group, or company from "latest", you
need to output a row. If the "latest" phone number is different from the
"first" one, concatenate the two; otherwise just output the single
number, along with the other fields. Then update "latest" from "next",
and - each first iteration after outputting a row - "first" from
"latest".

This sort of thing is much easier to code in Perl (reading and writing
text files) than in VBA:

use strict;
my ( @first, @next, @latest );

while (<>) {
@next = split /\t/, $_ ;

@first = @latest unless @first;

if ( $first[1] ne $next[1]
|| $first[2] ne $next[2]
|| $first[3] ne $next[3] ) { #values have changed

#append latest number if a range is required
$first[0] .= "-$latest[0]" if ($first[0] ne $latest[0]);

print join("\t", @first);
undef @first;
}
@latest = @next;
}

#print last line
if (@first) { #there's a range to print
$first[0] .= "-$latest[0]" ;
print join("\t", @first) ;
} else { #single phone number
print join("\t", @latest);
}


I have a new one. My gf always manages to send me a problem that
bends my head. :)

I have a table (imported from a HUGE text file) with a field for phone
numbers, another for Carriers, another for Groups, another for Company
Name and a couple other fields (descriptions or something).

The phone numbers can be sequential or they may not be. In the middle
of a sequence of numbers the Carrier may change or the Company may
change or the Group may change or any combination may occur. Or not.

What we need is a way to iterate through the numbers and write records
to a new table where non-sequential numbers are pretty much written as
they are but sequential numbers get parsed into a range in one cell.
She needs to do this to make it easier to move whole blocks of numbers
between Carriers or Groups or Companies.

A parsed record would look kind of like this:

PhnNum Grp Carr Cmpy
2013955700-2013955799 covadnj gobeam Focal Comm

With the last fields written as well.

I'm messing with a "little" sample table of over 11,000 lines. Now, I
can loop with the best of them and I've parsed out plenty of fields
and text files and such in the past but it's always been in the other
direction. Y'know, pulling info *out* of ranges and bad
concatenations not putting them *into* a range.

So, I don't feel like re-inventing the wheel if somebody has a routine
that does something similar I can adapt. How about it? Has anyone
had to do anything like this?

Thanks,
RD
 
R

RD

Thanks again. Once I get it worked out in VBA I'll post it. Since I
have to do it in my spare time it may take a few days. The paying job
has to take priority over my gf's 'little projects".

Think I'll go home and crack open Perl for Dummies, again.


Good luck. The logic in VBA will be pretty much the same as in Perl,
though the details will be different. Here are a few clues:

@first is an array (like a VBA variant array, but you'll
need to iterate through a recordset instead)
$first[0] is the first element in the array, i.e. the phone number
$first[1] is the second element, i.e. the carrier
ne translates as <>
|| translates as Or
unless @first tests whether @first contains a value
undef @first wipes out all the elements of @first


Wow.

I always wanted to learn Perl. I even bought "Perl for Dummies" :)
but it's all Greek to me.

Anyway, John, thanks for the logic. I guess I better get to coding.

Regards,
RD


You need to keep track of three rows:

3) the one you have just read ("next")
2) the one before that ("latest")
1) the row after the last change in carrier, group or company (i.e. the
"first" row with the "latest" combinatoin of carrier, group and
company).

If "next" has a different carrier, group, or company from "latest", you
need to output a row. If the "latest" phone number is different from the
"first" one, concatenate the two; otherwise just output the single
number, along with the other fields. Then update "latest" from "next",
and - each first iteration after outputting a row - "first" from
"latest".

This sort of thing is much easier to code in Perl (reading and writing
text files) than in VBA:

use strict;
my ( @first, @next, @latest );

while (<>) {
@next = split /\t/, $_ ;

@first = @latest unless @first;

if ( $first[1] ne $next[1]
|| $first[2] ne $next[2]
|| $first[3] ne $next[3] ) { #values have changed

#append latest number if a range is required
$first[0] .= "-$latest[0]" if ($first[0] ne $latest[0]);

print join("\t", @first);
undef @first;
}
@latest = @next;
}

#print last line
if (@first) { #there's a range to print
$first[0] .= "-$latest[0]" ;
print join("\t", @first) ;
} else { #single phone number
print join("\t", @latest);
}



I have a new one. My gf always manages to send me a problem that
bends my head. :)

I have a table (imported from a HUGE text file) with a field for phone
numbers, another for Carriers, another for Groups, another for Company
Name and a couple other fields (descriptions or something).

The phone numbers can be sequential or they may not be. In the middle
of a sequence of numbers the Carrier may change or the Company may
change or the Group may change or any combination may occur. Or not.

What we need is a way to iterate through the numbers and write records
to a new table where non-sequential numbers are pretty much written as
they are but sequential numbers get parsed into a range in one cell.
She needs to do this to make it easier to move whole blocks of numbers
between Carriers or Groups or Companies.

A parsed record would look kind of like this:

PhnNum Grp Carr Cmpy
2013955700-2013955799 covadnj gobeam Focal Comm

With the last fields written as well.

I'm messing with a "little" sample table of over 11,000 lines. Now, I
can loop with the best of them and I've parsed out plenty of fields
and text files and such in the past but it's always been in the other
direction. Y'know, pulling info *out* of ranges and bad
concatenations not putting them *into* a range.

So, I don't feel like re-inventing the wheel if somebody has a routine
that does something similar I can adapt. How about it? Has anyone
had to do anything like this?

Thanks,
RD
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top