RegEx how do I do unique?

D · Feb 8, 2005

My first attempt at this and I'm searching formulas like so

RIGHT(TEXT(A15,'yy'),1)*1000+A15-CONCATENATE(1,'-','jan','-',TEXT(A15,'yy'))+1

I want to extract the row / col coordinates (A15 in above)

so I'm using this [a-z]+\d+\d*

however I want unique ones and not 3 copies of A15 as the above returns.

How do I do that? I was reading that (?!) is used for uniqueness but I
cannot seem to get it to work here

Thanks

Chris R. Timmons · Feb 9, 2005

My first attempt at this and I'm searching formulas like so

RIGHT(TEXT(A15,'yy'),1)*1000+A15-CONCATENATE(1,'-','jan','-',TEXT
(A15,'yy'))+1

I want to extract the row / col coordinates (A15 in above)

so I'm using this [a-z]+\d+\d*

however I want unique ones and not 3 copies of A15 as the above
returns.

How do I do that? I was reading that (?!) is used for uniqueness
but I cannot seem to get it to work here

Dave,

(?!) is the negative lookahead construct. I don't think it would be
applicable here.

What might work best is to get all of the matches, then put them into
a hash table to get just the unique ones.

The regex "[a-z]+\d+\d*" probably matches too much (also the last \d*
isn't needed). If this is for an Excel spreadsheet, note that Excel
has a limit of 256 columns (A thru IV) and 65,536 rows per worksheet.
So a regular expression to match Excel cell references should take
this into account so it doesn't match things that can't be valid cell
references.

For example:

string regex = @"
#
# This regex matches strings like A15, IV3491, C9 or DP138.
#

\b # Match a word boundary.
(?:[a-z]|[a-i][a-z]?) # Match cell IDs A thru IV.
\d{1,5} # Match between one and five numbers.
\b # Match a word boundary.
";

string s =
"RIGHT(TEXT(A15,'yy'),1)*1000+A15-CONCATENATE(1,'-','jan'," +
"'-',TEXT(A15,'yy'))+1";

// Get all of the matches.
MatchCollection mc = Regex.Matches(s, regex,
RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace |
RegexOptions.IgnoreCase);

Console.WriteLine("All cell IDs = {0}", mc.Count);

// Put all of the non-unique matches into the keys of
// a hash table.
Hashtable uniqueCells = new Hashtable(mc.Count);
foreach (Match m in mc)
uniqueCells[m.Value.ToUpper()] = null;

Console.WriteLine("Unique cell IDs = {0}", uniqueCells.Count);

// Print the unique cell IDs.
foreach (DictionaryEntry de in uniqueCells)
Console.WriteLine(de.Key.ToString());

D · Feb 9, 2005

----- Original Message -----
From: "Chris R. Timmons" <crtimmons@X_NOSPAM_Xcrtimmonsinc.com>
Newsgroups: microsoft.public.dotnet.languages.csharp
Sent: Wednesday, February 09, 2005 1:30 AM
Subject: Re: RegEx how do I do unique?

My first attempt at this and I'm searching formulas like so

RIGHT(TEXT(A15,'yy'),1)*1000+A15-CONCATENATE(1,'-','jan','-',TEXT
(A15,'yy'))+1

I want to extract the row / col coordinates (A15 in above)

so I'm using this [a-z]+\d+\d*

however I want unique ones and not 3 copies of A15 as the above
returns.

How do I do that? I was reading that (?!) is used for uniqueness
but I cannot seem to get it to work here

Click to expand...

Dave,

(?!) is the negative lookahead construct. I don't think it would be
applicable here.

What might work best is to get all of the matches, then put them into
a hash table to get just the unique ones.

The regex "[a-z]+\d+\d*" probably matches too much (also the last \d*
isn't needed). If this is for an Excel spreadsheet, note that Excel
has a limit of 256 columns (A thru IV) and 65,536 rows per worksheet.
So a regular expression to match Excel cell references should take
this into account so it doesn't match things that can't be valid cell
references.

For example:

string regex = @"
#
# This regex matches strings like A15, IV3491, C9 or DP138.
#

\b # Match a word boundary.
(?:[a-z]|[a-i][a-z]?) # Match cell IDs A thru IV.
\d{1,5} # Match between one and five numbers.
\b # Match a word boundary.
";

string s =
"RIGHT(TEXT(A15,'yy'),1)*1000+A15-CONCATENATE(1,'-','jan'," +
"'-',TEXT(A15,'yy'))+1";

// Get all of the matches.
MatchCollection mc = Regex.Matches(s, regex,
RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace |
RegexOptions.IgnoreCase);

Console.WriteLine("All cell IDs = {0}", mc.Count);

// Put all of the non-unique matches into the keys of
// a hash table.
Hashtable uniqueCells = new Hashtable(mc.Count);
foreach (Match m in mc)
uniqueCells[m.Value.ToUpper()] = null;

Console.WriteLine("Unique cell IDs = {0}", uniqueCells.Count);

// Print the unique cell IDs.
foreach (DictionaryEntry de in uniqueCells)
Console.WriteLine(de.Key.ToString());

Thanks.

I found the (?!) operator on this page
http://www.thescarms.com/dotNet/RegExUnique.asp where he was doing this

Dim pattern As String = "(?\b\w+\b)(?!.+\b\k\b)"

to find the unique occurrences of words in a string. Prior to that way he
also suggest the hash table which I have been using for testing while I
figured out the regex method.

I'll definately use your suggestions about the bounds of excel, that's a
good point I didn't think of.

Thank you.

Regards,
Dave