Oliver Sturm
Hello,
Replied by email. Just a quick summary of what was wrong with your
previous code.
Looking at the regex I had posted previously:
[ ]([A-Z][A-Z])([A-Z0-9]+)(\1)([A-Z0-9]+)(\1)([A-Z0-9]+)[ ]
This regex has a number of groups that I had added for test purposes only.
It can be stripped down to this without changing what it matches:
[ ]([A-Z][A-Z])[A-Z0-9]+\1[A-Z0-9]+\1[A-Z0-9]+[ ]
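A quick way to convince yourself that the two patterns are interchangeable is to run both over the same text and compare what they match. This is only a sketch: it assumes C#, and the class name, helper method and test string are made up for illustration, not taken from your data.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // The two patterns from this post, copied verbatim.
    public const string Verbose =
        @"[ ]([A-Z][A-Z])([A-Z0-9]+)(\1)([A-Z0-9]+)(\1)([A-Z0-9]+)[ ]";
    public const string Stripped =
        @"[ ]([A-Z][A-Z])[A-Z0-9]+\1[A-Z0-9]+\1[A-Z0-9]+[ ]";

    // Hypothetical helper: returns the full text of every match, in order.
    public static string[] MatchedText(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Invented sample text containing two separators.
        string input = "first record AB12AB34AB56 second record CD7CD8CD9 third record";

        string[] a = MatchedText(Verbose, input);
        string[] b = MatchedText(Stripped, input);

        // The extra groups only change what is captured, not what is matched.
        Console.WriteLine(a.SequenceEqual(b));
    }
}
```

The extra parentheses add capture groups 2 to 6 but do not alter the overall match, which is why they can go.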
It still has one capture group, which is necessary to make the
back reference work. The Regex.Split method has the peculiar behaviour of
adding the text captured by that group to the string array it
returns, and there doesn't seem to be a way around that. So in the sample
program I sent you, I used the matching functionality of the Regex class
instead and picked out the pieces from the string "manually".
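For anybody reading along without the emailed sample, here is a minimal sketch of the difference. It is not the program I sent: it assumes C#, it assumes the goal is the text between the separator matches, and the class name, method name and input string are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // The stripped-down pattern from this post.
    public const string Pattern =
        @"[ ]([A-Z][A-Z])[A-Z0-9]+\1[A-Z0-9]+\1[A-Z0-9]+[ ]";

    // Hypothetical helper: collects the text between matches by slicing
    // the input with Match.Index and Match.Length, so nothing captured
    // by group 1 ends up in the result.
    public static List<string> SplitOnMatches(string input)
    {
        var pieces = new List<string>();
        int start = 0;
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            pieces.Add(input.Substring(start, m.Index - start));
            start = m.Index + m.Length;
        }
        pieces.Add(input.Substring(start)); // text after the last separator
        return pieces;
    }

    public static void Main()
    {
        // Invented sample text containing two separators.
        string input = "first record AB12AB34AB56 second record CD7CD8CD9 third record";

        // Regex.Split also returns the captured "AB" and "CD".
        Console.WriteLine(string.Join("|", Regex.Split(input, Pattern)));
        // prints: first record|AB|second record|CD|third record

        // Slicing around the matches returns only the pieces in between.
        Console.WriteLine(string.Join("|", SplitOnMatches(input)));
        // prints: first record|second record|third record
    }
}
```

Filtering the Split result afterwards would also be possible, but only if a real piece can never look like the captured two letters, so slicing around the matches seemed the safer choice.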
All this is probably not the most efficient algorithm in the world -
including the idea of reading the whole 14MB file into a string - but I
wouldn't expect any big performance problems on a modern system... if
performance is important, there are certainly lots of optimizations that
can be done.
Oliver Sturm
Emailed a sample, thanks very much.