Regex: Could this pattern be more efficient?

sklett

I have an Intel hex file I need to parse. I want to run a regex on each
line to get the separate sections.
the format is like this:
:llaaaatt[d...]cc
where:
: - starts the record
ll - is the length of the data section([d...]) in hex
aaaa - is the address of the data in hex
tt - is the type in hex
[d...] are the data bytes in hex, this is a variable length section
cc - checksum in hex

So I need a pattern that will separate all the sections. I can get most of
them, but I'm not sure about the variable data section. It basically starts
at index 9 and is ll bytes long.

I'm thinking something like this (*note: I don't need section tt):
@":(?<ll>(\w{2}))(?<aaaa>(\w{4}))\w{2}(?<d>(\w+))(?<cc>(\w{2}))";

It works, but I'm very new to Regex and not sure if I could do this a better
way. Do you see any improvements that could be made?
Thanks for reading!
Steve
 
Greg Bacon

: I have an Intel hex file I need to parse. I want to run a regex on each
: line to get the separate sections.
: the format is like this:
: :llaaaatt[d...]cc
: where:
: : - starts the record
: ll - is the length of the data section([d...]) in hex
: aaaa - is the address of the data in hex
: tt - is the type in hex
: [d...] are the data bytes in hex, this is a variable length section
: cc - checksum in hex
:
: So I need a pattern that will separate all the sections. I can get
: most of them, but I'm not sure about the variable data section. It
: basically starts at index 9 and is ll bytes long.
:
: I'm thinking something like this (*note: I don't need section tt):
: @":(?<ll>(\w{2}))(?<aaaa>(\w{4}))\w{2}(?<d>(\w+))(?<cc>(\w{2}))";
:
: It works, but I'm very new to Regex and not sure if I could do this a
: better way. Do you see any improvements that could be made?

If you upcase your input, you could use

Regex pattern = new Regex(
    @"
      ^
      :
      (?<ll>   [\dA-F][\dA-F])
      (?<aaaa> [\dA-F][\dA-F][\dA-F][\dA-F])
      (?<tt>   0[0124])
      (?<dd>   ([\dA-F][\dA-F])+)
      (?<cc>   [\dA-F][\dA-F])
      $
    ",
    RegexOptions.IgnorePatternWhitespace |
    RegexOptions.ExplicitCapture);

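Note that you'd still need to verify the checksum, and the pattern
doesn't check that dd is actually ll bytes long. Also, as written, dd
requires at least one data byte, so a record with ll = 00 (such as the
end-of-file record :00000001FF) won't match; change the + to * if you
need those. Likewise tt only accepts types 00, 01, 02, and 04.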
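
For the checksum, here is a minimal sketch rather than a definitive
implementation. It relies on the Intel HEX rule that all bytes of a
record, checksum included, sum to zero modulo 256. The class and method
names (HexRecordChecksum, IsValid) are made up for illustration, and the
code assumes one record per string.

```csharp
using System;
using System.Globalization;

static class HexRecordChecksum
{
    // Returns true when the bytes after ':' (length, address, type,
    // data, checksum) add up to 0 modulo 256.
    public static bool IsValid(string line)
    {
        // Smallest record is ':' plus 5 bytes (10 hex digits), and the
        // digits must come in pairs.
        if (string.IsNullOrEmpty(line) || line[0] != ':' ||
            line.Length < 11 || (line.Length - 1) % 2 != 0)
            return false;

        int sum = 0;
        for (int i = 1; i < line.Length; i += 2)
        {
            byte b;
            // AllowHexSpecifier accepts hex digits only (no sign, no spaces).
            if (!byte.TryParse(line.Substring(i, 2),
                               NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out b))
                return false;
            sum += b;
        }
        return (sum & 0xFF) == 0;
    }
}
```

You could call HexRecordChecksum.IsValid(line) either before or after
running the regex; doing it on the raw line keeps it independent of the
capture groups.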

The technique here is to specify "bookends" to bracket the portion
whose length you don't know ahead of time, and the data field has
to be whatever is in between.

The left bookend is the beginning of string, the colon, the length,
the address, and the type -- all with known lengths.

Then the plus quantifier in the dd subpattern (which matches one or
more of the preceding pattern -- pairs of hex digits in this case)
allows enough elasticity to grab only the variable-length portion of
the record.

Finally, the right bookend is the last byte in the record.
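
To show the bookends in use, here is a rough sketch of pulling the fields
out of a match. It deliberately differs from the pattern above in three
ways, all of them my assumptions about what you want: it uses [0-9A-F]
instead of \d (in .NET, \d also matches non-ASCII digits by default), it
lets tt be any hex byte, and it uses * for dd so zero-length records
match. HexRecordParser and TryParse are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class HexRecordParser
{
    // Left bookend: ^ : ll aaaa tt.  Right bookend: cc $.
    // dd is whatever pairs of hex digits sit in between.
    static readonly Regex Record = new Regex(
        @"^ :
          (?<ll>   [0-9A-F]{2})
          (?<aaaa> [0-9A-F]{4})
          (?<tt>   [0-9A-F]{2})
          (?<dd>   ([0-9A-F]{2})*)
          (?<cc>   [0-9A-F]{2})
          $",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

    public static bool TryParse(string line, out int address,
                                out int type, out byte[] data)
    {
        address = 0;
        type = 0;
        data = new byte[0];

        // Trim stray line endings and upcase so A-F is enough.
        Match m = Record.Match(line.Trim().ToUpperInvariant());
        if (!m.Success) return false;

        // The regex cannot count, so check dd against ll by hand.
        int length = Convert.ToInt32(m.Groups["ll"].Value, 16);
        string dd = m.Groups["dd"].Value;
        if (dd.Length != 2 * length) return false;

        address = Convert.ToInt32(m.Groups["aaaa"].Value, 16);
        type = Convert.ToInt32(m.Groups["tt"].Value, 16);
        data = new byte[length];
        for (int i = 0; i < length; i++)
            data[i] = Convert.ToByte(dd.Substring(2 * i, 2), 16);
        return true;
    }
}
```

This sketch does not look at cc at all, so the checksum still has to be
verified separately.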

I hope this helps.

Greg
 
sklett

Very cool, Greg! Thank you for this thorough explanation and example, I
appreciate it!
Have a great weekend.
 
