Regex: Could this pattern be more efficient?

sklett · Apr 18, 2006

I have an Intel hex file I need to parse. I want to run a regex on each
line to get the separate sections.
The format is like this:
:llaaaatt[d...]cc
where:
: - starts the record
ll - is the length of the data section([d...]) in hex
aaaa - is the address of the data in hex
tt - is the type in hex
[d...] are the data bytes in hex, this is a variable length section
cc - checksum in hex

So I need a pattern that will separate all the sections. I can get most of
them, but I'm not sure about the variable data section. It basically starts
at index 9 and is ll bytes (2 x ll hex characters) long.

I'm thinking something like this (*note: I don't need section tt):
@":(?<ll>(\w{2}))(?<aaaa>(\w{4}))\w{2}(?<d>(\w+))(?<cc>(\w{2}))";

It works, but I'm very new to Regex and not sure if I could do this a better
way. Do you see any improvements that could be made?
Thanks for reading!
Steve

Greg Bacon · Apr 20, 2006

: I have an Intel hex file I need to parse. I want to run a regex on each
: line to get the separate sections.
: the format is like this:
: :llaaaatt[d...]cc
: where:
: : - starts the record
: ll - is the length of the data section([d...]) in hex
: aaaa - is the address of the data in hex
: tt - is the type in hex
: [d...] are the data bytes in hex, this is a variable length section
: cc - checksum in hex
:
: So I need a pattern that will separate all the sections. I can get
: most of them, but I'm not sure about the variable data section. It
: basically starts at index 9 and is ll bytes (2 x ll hex characters) long.
:
: I'm thinking something like this (*note: I don't need section tt):
: @":(?<ll>(\w{2}))(?<aaaa>(\w{4}))\w{2}(?<d>(\w+))(?<cc>(\w{2}))";
:
: It works, but I'm very new to Regex and not sure if I could do this a
: better way. Do you see any improvements that could be made?

If you upcase your input, you could use

Regex pattern = new Regex(
    @"
      ^
      :
      (?<ll>   [\dA-F][\dA-F])
      (?<aaaa> [\dA-F][\dA-F][\dA-F][\dA-F])
      (?<tt>   0[0124])
      (?<dd>   ([\dA-F][\dA-F])+)
      (?<cc>   [\dA-F][\dA-F])
      $
    ",
    RegexOptions.IgnorePatternWhitespace |
    RegexOptions.ExplicitCapture);

Note that you'd still need to verify the checksum.
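
A minimal sketch of that checksum test, assuming the usual Intel HEX rule
that all bytes of a record (length, address, type, data, and checksum) sum
to zero modulo 256. The class and method names (IntelHexChecksum, IsValid)
are made up for illustration and are not part of any library.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, named for this example only.
public static class IntelHexChecksum
{
    // Returns true when every byte after the leading ':' (length, address,
    // type, data, checksum) adds up to 0 modulo 256.
    public static bool IsValid(string record)
    {
        // Smallest possible record is ':' + ll + aaaa + tt + cc = 11 characters,
        // and ':' followed by whole bytes always gives an odd total length.
        if (record == null || record.Length < 11 || record.Length % 2 == 0 || record[0] != ':')
            return false;

        int sum = 0;
        for (int i = 1; i < record.Length; i += 2)
        {
            byte value;
            // AllowHexSpecifier accepts hex digits only (no sign, no whitespace).
            if (!byte.TryParse(record.Substring(i, 2), NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out value))
                return false;
            sum += value;
        }
        return (sum & 0xFF) == 0;
    }
}
```

For example, IntelHexChecksum.IsValid(":00000001FF") accepts the end-of-file
record, because 0x00 + 0x00 + 0x00 + 0x01 + 0xFF is 0x100.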

The technique here is to specify "bookends" to bracket the portion
whose length you don't know ahead of time, and the data field has
to be whatever is in between.

The left bookend is the beginning of string, the colon, the length,
the address, and the type -- all with known lengths.

Then the plus quantifier in the dd subpattern (which matches one or
more of the preceding pattern -- pairs of hex digits in this case)
allows enough elasticity to grab only the variable-length portion of
the record.

Finally, the right bookend is the last byte in the record.
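
Here is a rough sketch of how the bookend pattern might be used to pull the
fields out of one line. It makes a few assumptions that go beyond the
pattern above: the names IntelHexRecord and TryParse are invented for the
example, [0-9A-F] replaces \d (which in .NET also matches non-ASCII
digits), and the dd group uses * rather than + so that a record with no
data bytes, such as the end-of-file record :00000001FF, still matches.
Because the regex cannot count, the code also checks that dd really is ll
bytes long.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, named for this example only.
public static class IntelHexRecord
{
    static readonly Regex Pattern = new Regex(@"
        ^ :
        (?<ll>   [0-9A-F]{2} )      # data length in bytes
        (?<aaaa> [0-9A-F]{4} )      # address
        (?<tt>   0[0124] )          # record type
        (?<dd>   ([0-9A-F]{2})* )   # data bytes, possibly none
        (?<cc>   [0-9A-F]{2} )      # checksum
        $",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

    static int Hex(string digits)
    {
        return int.Parse(digits, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
    }

    // Splits one record into its fields. Returns false when the line does not
    // match the pattern or when the data field is not exactly ll bytes long.
    // The checksum is captured but not verified here.
    public static bool TryParse(string line, out int address, out int type, out string data)
    {
        address = 0;
        type = 0;
        data = null;
        if (line == null)
            return false;

        // Upcase the input so the pattern only has to list A-F.
        Match m = Pattern.Match(line.Trim().ToUpperInvariant());
        if (!m.Success)
            return false;

        // Each data byte is two hex characters.
        if (m.Groups["dd"].Value.Length != 2 * Hex(m.Groups["ll"].Value))
            return false;

        address = Hex(m.Groups["aaaa"].Value);
        type = Hex(m.Groups["tt"].Value);
        data = m.Groups["dd"].Value;
        return true;
    }
}
```

The greedy dd group first swallows every remaining pair of hex digits and
then gives back one pair so that cc can match, which is the elasticity the
bookends rely on.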

I hope this helps.

Greg

sklett · Apr 21, 2006

Very cool, Greg! Thank you for this thorough explanation and example, I
appreciate it!
Have a great weekend.
