File Parsing and Creating Error Report : using MS Access

S

Sandy

Hi
1. I have a flat file (.txt) which I need to parse . Basically file is
of size 146 and has 4 different types of records namely
C01,C10,C20,C30.
File contains one record reach of type C01 , C10 and multiple C20,C30
records.
Giving below sample records (each new line is new record)

C01121036200529400100000 0000000278
C10121036200529400100000 5500000001 010


2. Record structure of each record type is different.
3. I need to parse the .txt file,identify record type
4. Based on corresponding record structure apply validation rules
say , I extract c01 record
C01121036200529400100000 0000000278

then 121036 shows time
2005294 shows date (yyyyddd) (field type:char)
001 shows version number (field type:char)
000000 shows sequence number (field type:num)
00000 shows legacy system id
0000000278 shows no of records in file. (field type:num)

char type fields should be checked as not null or spaces
while number field should be checked whether they contains number
fields at all.

also need to check whether total number of records in file is equal to
32

5. Generate error report where data violates validation rules.

can anybody give me sample code or provide me way to do this
 
J

John Nurick

Hi Sandy,

I'd probably do this by pre-processing the text file to split it into
four files each containing one type of record. These can then easily be
linked or imported using the built-in TransferText functionality, using
an import specification or schema.ini entry to store the record
structure. I use a Perl script to do this and have pasted it at the end
of this message.

Having got the four files for the four record types, I'd then link them
rather than import them. That way, it's easy to use queries to move the
records into the tables where you really want them - and do field type
conversions and validation at the same time.

The stucture you use for the linked tables (which is of course defined
in the import specifications or schema.ini) need not be the same as in
the final tables. For example, your "yyyyddd" date field is easier to
handle if you link it as two fields TheYear (e.g. 2005) and TheDay (e.g.
294), and then convert them into a single Access date/time value using
something like this:

TheDate: DateSerial(TheYear, 1, 1) - 1 + TheDay

For the time (e.g. 121036) the idea's the same: link it as three fields
TheHour, TheMinute, TheSecond and assemble them into a date/time value:

TheDateTime: DateSerial(TheYear, 1, 1) - 1 + TheDay +
TimeSerial(TheHour, TheMinute, TheSecond)

And so on.


Hi
1. I have a flat file (.txt) which I need to parse . Basically file is
of size 146 and has 4 different types of records namely
C01,C10,C20,C30.
File contains one record reach of type C01 , C10 and multiple C20,C30
records.
Giving below sample records (each new line is new record)

C01121036200529400100000 0000000278
C10121036200529400100000 5500000001 010


2. Record structure of each record type is different.
3. I need to parse the .txt file,identify record type
4. Based on corresponding record structure apply validation rules
say , I extract c01 record
C01121036200529400100000 0000000278

then 121036 shows time
2005294 shows date (yyyyddd) (field type:char)
001 shows version number (field type:char)
000000 shows sequence number (field type:num)
00000 shows legacy system id
0000000278 shows no of records in file. (field type:num)

char type fields should be checked as not null or spaces
while number field should be checked whether they contains number
fields at all.

also need to check whether total number of records in file is equal to
32

5. Generate error report where data violates validation rules.

can anybody give me sample code or provide me way to do this
#SplitByFirstCharacters.pl
#Splits a large text file into smaller files based on the
#first characters of each record
#Syntax:
# Perl SplitByFirstCharacters.pl File Chars

my $inputfile = shift @ARGV
or die "Please specify input file on command line.\n" ;
open "INFILE", "<$inputfile"
or die "Can't open input file $inputfile\n" ;
my $chars = shift @ARGV
or die "Please specify the number of characters to examine
as the second argument on the command line (with a space,
not a comma, between arguments)\n" ;

#Get basic name for output files
my $filestem = $inputfile ;
$filestem =~ s|\.[^\.]*$|| ;
my $fileext = ".txt";
my %outfiles ; # hash to store output filenames and filehandles
my $group ;

while (<INFILE>) {
# get the "group" - i.e. first 2 chars of record
$group = substr $_, 0, $chars ;

# create an output file for the group unless it already exists
unless (defined $outfiles{$group}) {
# create entry in hash for filehandle
$outfiles{$group} = "OUT$group" ;
# open corresponding otput file
open $outfiles{$group}, ">$filestem$group$fileext"
or die "Couldn't open output file $filestem$group$fileext";
print "Created $filestem$group$fileext for $group\n";
}

# write the record to the appropriate file
print { $outfiles{$group} } $_ ;
# progress message
print "processing record $.\n" unless $. % 10_000 ;
}
#Files will be closed automatically when the script ends
#End of script
 
S

Sandy

Hi John

Thanks for details. Some Qs about this.

1. Is it possible to write code to split files in MsAccess only based
on guideline code in Perl provided by u?

2.Also can you elaborate on "I'd then link them rather than import
them" .
Please explain this in detail

3. Also about generating error report, Can you throw some light on this?
 
J

John Nurick

Hi John

Thanks for details. Some Qs about this.

1. Is it possible to write code to split files in MsAccess only based
on guideline code in Perl provided by u?

Yes. Use the VBA file I/O statements such as Open, Line Input # and
Print # .
2.Also can you elaborate on "I'd then link them rather than import
them" .
Please explain this in detail

Access allows you either to import data from external sources (such as
text files) into Access tables, or to link to the external sources so
that they can be used in most ways as if they were Access tables. Try it
for yourself, using the menu commands File|Get External Data|Import, and
File, Get External Data|Link. (The VBA equivalents are
DoCmd.TransferText acImport and DoCmd.TransferText acLink.)

3. Also about generating error report, Can you throw some light on this?

Having created the linked table, you can run queries on it to identify
and report on any invalid data. You can do this before actually
importing the data to your "real" tables, so you can correct the data
before trying to import it. Or you can write an append query to import
only the "good" records, and one or more other queries to report on the
"bad" records.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top