Data Records Formats Testing Tool

Mark Jerde

(If these are the wrong groups please suggest the right one(s). Thanks.)

I need to come up with a way to test potentially thousands of data (files /
records / streams) to determine if they match one of about thirty defined
data formats. If a record partially matches one of the formats I need to
log why it failed.

The formats are byte-oriented. Byte 0 is the type, byte 1 is the subtype,
bytes 2-5 give the total record length, etc. There are two wrinkles.
First, some of the formats allow 1..n subrecords, like a person listing her
home phone, cell phone, fax number, ICQ #, the dog's cell phone, etc.
Second, some of the formats allow other formats to be wholly contained in
them, like an "inventory" format being made up of many separate items of
different "item" format types.
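
For illustration only, the fixed header described above (type, subtype, 4-byte total length) could be checked like this in Python. The byte order of the length field is not stated in the post, so big-endian is an assumption here, and `KNOWN_TYPES` is a made-up placeholder for the real list of roughly thirty formats. The function returns the reasons a record failed rather than a bare yes/no, since the failure has to be logged.

```python
import struct

HEADER_SIZE = 6  # type (1 byte) + subtype (1 byte) + total length (4 bytes)

# Hypothetical (type, subtype) pairs; the real list comes from the format specs.
KNOWN_TYPES = {(1, 0), (1, 1), (2, 0)}


def check_header(record: bytes) -> list[str]:
    """Return the reasons the header is invalid (empty list if it is fine)."""
    if len(record) < HEADER_SIZE:
        return [f"record has {len(record)} bytes, fewer than the "
                f"{HEADER_SIZE}-byte header"]
    # ">BBI" = big-endian, two unsigned bytes, one unsigned 32-bit integer.
    # Big-endian is an assumption; use "<BBI" if the formats are little-endian.
    rec_type, subtype, total_length = struct.unpack_from(">BBI", record, 0)
    problems = []
    if (rec_type, subtype) not in KNOWN_TYPES:
        problems.append(f"unknown type/subtype {rec_type}/{subtype}")
    if total_length != len(record):
        problems.append(f"bytes 2-5 declare length {total_length}, "
                        f"but the record has {len(record)} bytes")
    return problems
```

An empty list means the header passed; anything else can go straight into the log.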

In the history of computers this *can't* be the first need for this kind of
program. ;-) New formats are approved periodically so hard-coding
everything in C# or VB.NET is a sub-optimal solution. ISTM it should be
possible to write the permissible format "rules" in (XML / ASN.1 / RegEx /
etc.), present the rules to a tried and true program, and smash data files
against the program all day long.
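
One way to picture the "rules as data" idea is a small table-driven checker: the rules live in a plain data structure (which could equally be loaded from XML or JSON) and a generic function applies them. This is only a sketch; the format name, field names, rule keys (`equals`, `one_of`) and byte order are all invented for the example and are not taken from any real specification or tool.

```python
import struct

# Hypothetical rule set. New formats would be added here, not in code.
FORMATS = {
    "example-item": [
        {"name": "type", "offset": 0, "fmt": "B", "equals": 1},
        {"name": "subtype", "offset": 1, "fmt": "B", "one_of": [0, 1, 2]},
        {"name": "length", "offset": 2, "fmt": ">I"},  # big-endian assumed
    ],
}


def validate(record: bytes, fields: list[dict]) -> list[str]:
    """Apply each field rule and collect human-readable failure reasons."""
    failures = []
    for field in fields:
        end = field["offset"] + struct.calcsize(field["fmt"])
        if end > len(record):
            failures.append(f"{field['name']}: record ends before byte {end - 1}")
            continue
        (value,) = struct.unpack_from(field["fmt"], record, field["offset"])
        if "equals" in field and value != field["equals"]:
            failures.append(
                f"{field['name']}: expected {field['equals']}, got {value}")
        if "one_of" in field and value not in field["one_of"]:
            failures.append(
                f"{field['name']}: {value} not in {field['one_of']}")
    return failures


def classify(record: bytes) -> dict[str, list[str]]:
    """Check a record against every format; an empty list means a match."""
    return {name: validate(record, fields) for name, fields in FORMATS.items()}
```

A flat table like this covers fixed offsets only. Repeated subrecords and nested formats would need rules that refer to other rules.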

Suggestions? Windows preferred but not required.

Thanks.

-- Mark
 
Ken Tucker [MVP]

Hi,

Convert the stream to a string and use regular expressions to
match the format. Not sure how you will be able to tell if the phone number
is a home number, fax, or dog's cell phone.
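
As a rough illustration of the regular-expression approach (in Python rather than .NET), a pattern can be matched against the raw bytes directly, with no conversion to a string. The type and subtype values below are invented for the example. A pattern like this can pin down fixed positions, but it cannot use the value of a length byte to decide how many bytes follow.

```python
import re

# Hypothetical format: type 0x01, subtype 0x00-0x02, then a 4-byte length field.
# re.DOTALL lets "." match every byte value, including 0x0A (newline).
HEADER = re.compile(rb"\x01[\x00-\x02].{4}", re.DOTALL)


def header_matches(record: bytes) -> bool:
    """True if the first six bytes look like the hypothetical header."""
    return HEADER.match(record) is not None
```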

http://msdn.microsoft.com/library/d...s/cpguide/html/cpconcomregularexpressions.asp

Library of regular expressions.
http://www.regexlib.com/


Ken
 
Mark Jerde

Ken said:
Hi,

Convert the stream to a string and use regular expressions
to match the format.

Thanks, I'll look into this if we decide to write something. I don't know
much about regular expressions yet but I'm concerned about the calculated
offsets and regex complexity (and validation). See the phones example
below.

There are some advantages for this project to use a commercial or open
source product. A "drag & drop" interface like Visio would be ideal.

Ken said:
Not sure how you will be able to tell if the
phone number is a home number, fax, or dog's cell phone.

(My addition may be off...)
Byte 10 - Length of the phone text description
Bytes 11 to 11+val(Byte10)-1 - Phone text description
Byte 11+val(Byte10) - Length of phone number
Bytes 12+val(Byte10) to 12+val(Byte10)+val(Byte(11+val(Byte10)))-1 -
Phone number
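
Read as two length-prefixed fields, the layout above can be walked with a running offset instead of nested arithmetic. The sketch below is Python, and the function name and error wording are invented for illustration. It assumes each length is a single unsigned byte, as in the layout, and returns the fields as raw bytes because the text encoding is not stated.

```python
def parse_phone_entry(record: bytes, start: int = 10) -> tuple[bytes, bytes, int]:
    """Parse one (description, number) pair beginning at byte `start`.

    Returns (description, number, next_offset). Raises ValueError with a
    reason suitable for logging if the record is too short.
    """
    if start >= len(record):
        raise ValueError(f"byte {start}: missing description length")
    desc_len = record[start]
    desc_start = start + 1
    desc_end = desc_start + desc_len  # exclusive; also the number-length byte
    if desc_end >= len(record):
        raise ValueError(
            f"description of {desc_len} bytes or the number length at byte "
            f"{desc_end} runs past the end of the record")
    num_len = record[desc_end]
    num_start = desc_end + 1
    num_end = num_start + num_len  # exclusive
    if num_end > len(record):
        raise ValueError(
            f"phone number of {num_len} bytes at byte {num_start} runs past "
            f"the end of the record ({len(record)} bytes)")
    return record[desc_start:desc_end], record[num_start:num_end], num_end
```

Because the function returns the offset just past the entry, calling it in a loop with `start` set to the previous `next_offset` would handle the 1..n phone subrecords.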

-- Mark
 
