Must be more elegant solution than this?

M

Mike Davies

Hi Everyone,

Is there a better way of doing the following?

I have 2 lists. List 1 is a list of MAC addresses and List 2 is a list
of regular expressions. A user is only allowed to view devices that
match their list of regular expressions.

Small example:

List1
-------
0011223344
0022334455
0011225544

List2
-------
001122.*
002233.*

Now the only way I can see of doing this is to have a foreach loop for
List 1 and then nest a foreach loop for list2.

Pseudo Code

foreach(item in list1)
{

foreach(item in list2)
{
if (list1.item is a match on list2 regular expression)
{
Write(This device is allowed!)
break;
}
}
}

Seems a very inefficient way of doing it but can;t think of any othe
way to match list1 to list2? BTW, List1 could get quite large and
List2 could be a fair size as well.

Any ideas,

Thanks
 
M

Mark Wilden

Now the only way I can see of doing this is to have a foreach loop for
List 1 and then nest a foreach loop for list2.

I can't think of an easy way to improve on this, either.
Seems a very inefficient way of doing it

Try it, measure it, and see if it's worth spending any time trying to
optimize.

"Premature optimization is the root of all evil." (Hoare via Knuth)
 
B

Ben Voigt

Mark Wilden said:
I can't think of an easy way to improve on this, either.

Quickly, it would be far better to put the strings loop inside the regex
loop, because reusing a regex is much more efficient than creating a new
one.

If this turns out to be a performance bottleneck, some more ideas:
You might also get a win by extracting the leading literal characters from
the regex. Now you will only need to do a regex test on the addresses that
start with the right digits. Note that this may not help certain regexs at
all. Sorting your address list, and now getting the subset of potential
matches requires a binary search. This is oriented toward address lists
orders of magnitude larger than the regex list. Also flag addresses on the
first successful match to avoid testing them against each successive regex.
 
G

Greg Young

Another thing you should add is to remove items that have already been
matched from the items you continue trying to match :) think about the
following data ..

100000
..
..
123450


first item is 1*, there are then 70 other regexps to run ... the first
expression will in fact select every node.

Cheers,

Greg Young
MVP - C#
http://codebetter.com/blogs/gregyoung
 
L

Larry Lard

Mike said:
Hi Everyone,

Is there a better way of doing the following?

I have 2 lists. List 1 is a list of MAC addresses and List 2 is a list
of regular expressions. A user is only allowed to view devices that
match their list of regular expressions.

Small example:

List1
-------
0011223344
0022334455
0011225544

List2
-------
001122.*
002233.*

Now the only way I can see of doing this is to have a foreach loop for
List 1 and then nest a foreach loop for list2.

Pseudo Code

foreach(item in list1)
{

foreach(item in list2)
{
if (list1.item is a match on list2 regular expression)
{
Write(This device is allowed!)
break;
}
}
}

Seems a very inefficient way of doing it but can;t think of any othe
way to match list1 to list2? BTW, List1 could get quite large and
List2 could be a fair size as well.

As Greg Young suggests, your method can be improved by noting that at
the moment, you continue to check a device even when it has already
been matched!

Remembering that we can't modify a collection while enumerating it(*),
here is some pseudo code:

Create all the regexes up front
Create an address collection named ToCheck that contains all the
addresses
Create empty address collections named OkAddresses and StillToCheck

foreach regex
foreach address in ToCheck
if current regex matches address
add current address to OkAddresses
else
add current address to StillToCheck

set ToCheck to StillToCheck, and clear out StillToCheck

now OkAddresses is the ok addresses

(*) is why we can't just remove ok addresses out of ToCheck. Also, if
possible order your regexes so that the most inclusive will be run
first.
 
K

Kevin Spencer

Hi Mike,

First of all, it looks from your example, that you don't necessarily need to
use Regular Expressions at all. Unless there is something you need to match
other than the beginning of the string, you certainly do not need to use
Regular Expressions, and they might, in fact, impede performance.

If all you need to do is match the beginning of the string, first, sort both
lists. Then start with the first item in the user's list, and read all
members of the second list using IndexOf(list1) to determine if that entry
starts with the string from the first list. At the point where it returns 0,
you have found the first match. When it returns -1, you have found the last
match for that entry. The end point is now the starting point of the second
loop though the second list, and you repeat this process until you reach the
end of the first list.

User's Sorted List (listA):

001122
002233
002245

Sorted Available Mac Addresses (listB):

0000000000 - no match
0011112345 - no match
0011234567 - no match
0011223344 - matches [0]
0011223456 - matches [0]
0011234567 - no match
0012345678 - no match
0022334455 - matches [1]
0022334566 - matches [1]
0022345678 - no match
0022346678 - no match
0022455667 - matches [2]
0022456789 - matches [2]

bool[] matches = new bool[listB.Length]

int i = 0, ct = 0;
while (i < listA.Length)
{
while(listB[ct].IndexOf(listA) != 0); // skip non-matches
{
listB[ct] = false;
ct++;
}
while(listB[ct].IndexOf(listA) == 0) // find all matches
{
listB[ct] = true;
ct++:
}
i++;
}

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

Big thicks are made up of lots of little thins.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top