Regex items in any order

  • Thread starter Greg Collins [Microsoft MVP]

Greg Collins [Microsoft MVP]

I've done a bit of research on this topic in the newsgroups and on MSDN, and though it sounds possible, I still don't understand how I would make it work.

I want to be able to use named groups to capture items from a path when those items can appear in any order.

For example, the path might be:

/root/item1/item2/item2/item3/item4

or it might be mixed up:

/root/item4/item3/item1/item2

and it is possible that certain items might not even be present:

/root/item3/item1

/root will always be present, though its name will vary depending on which type of path I am trying to match--so I'm mainly concerned with matching named groups <Item1> through <Item4> when they can appear in any order.

I'd prefer to be able to do this WITHOUT needing to write an expression for each combination, as there could be upwards of 10 items I want to match (too many combinations!!)

Thanks ahead of time for your help!
 
Dave Sexton

Hi Greg,

It's probably easier to just split the path into an array on the "/"
character, then iterate the array and for each element check whether it
exists in a list of values.
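
A minimal sketch of that split-and-check idea, written in Python rather than .NET for brevity. The names KNOWN_ITEMS, split_path and classify are hypothetical, and the item list is simply taken from the sample paths in the question:

```python
# Hypothetical set of recognised item names, taken from the question's examples.
KNOWN_ITEMS = {"item1", "item2", "item3", "item4"}


def split_path(path):
    """Return (root, items): the first segment and the remaining segments."""
    # Drop empty strings caused by leading, trailing, or doubled slashes.
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("path has no root segment")
    return segments[0], segments[1:]


def classify(path):
    """Split the path, then sort the items into recognised and unrecognised."""
    root, items = split_path(path)
    found = [s for s in items if s in KNOWN_ITEMS]
    unknown = [s for s in items if s not in KNOWN_ITEMS]
    return root, found, unknown
```

Because each segment is checked on its own, the order of the items does not matter, and anything that is not in the list ends up in the third return value.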

--
Dave Sexton
 
Oliver Sturm

Hello Greg,

I'm not entirely sure I understand what you want to do. For instance, this
expression here matches all your examples and allows you to get to each
single element of the path:

\/[a-z0-9]+

Of course this could be a bit stricter, leaving fewer options for the
names of path elements... maybe like this:

\/(root|item\d)

This matches all your samples as well.
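
A quick way to see what the two expressions return, sketched in Python rather than .NET (an assumption made for brevity; the slash needs no escaping in Python). GENERIC and STRICT are hypothetical names for the two patterns above:

```python
import re

GENERIC = re.compile(r"/[a-z0-9]+")     # any lowercase alphanumeric segment
STRICT = re.compile(r"/(root|item\d)")  # only "root" or "item" plus one digit

path = "/root/item4/item3/item1/item2"

# With no capture group, findall returns each whole match.
generic_segments = GENERIC.findall(path)
# -> ['/root', '/item4', '/item3', '/item1', '/item2']

# With exactly one capture group, findall returns the group's text.
strict_names = STRICT.findall(path)
# -> ['root', 'item4', 'item3', 'item1', 'item2']
```

Note that the stricter pattern is not anchored at the end of a segment, so a segment such as /item10 would still produce the match item1.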

From your question I'm getting the impression that you want to do more
than this, but I don't really understand what exactly you need. Can you
explain a bit more on this basis?


Oliver Sturm
 
Greg Collins [Microsoft MVP]

Thanks. This sounds like it might be a viable option... I'll look into this.
 
Greg Collins [Microsoft MVP]

I was using a pretty generic sample... So if I follow down one particular root, in this example "articles", I might have the following possibilities for valid paths:

/articles
/articles/new
/articles/updated
/articles/coming
/articles/author/{author}
/articles/date/{year}[/{month}[/{day}]]
/articles/level/{level}
/articles/mrv
/articles/mfv

Most of these can be combined in any order. So I might have paths like:

/articles/level/3/author/Greg/new
/articles/date/2006/12/level/2-4/
/articles/author/Greg/mfv/

etc.

Hope that helps.
 
Oliver Sturm

Hello Greg,
I was using a pretty generic sample... So if I follow down one particular
root, in this example "articles", I might have the following possibilities
for valid paths:

<snip>

Well, all of these should be matched by the first of the two alternatives
I posted. Have you tried it?

My previous post was targeted more at additional functionality that I
thought you wanted, not necessarily at additional samples. IOW, from your
original post I got the impression that you weren't asking for somebody to
write a regular expression for you, but rather to solve a particular
problem you were having, beyond writing the expression in the first place.

Finally, what Dave said in the parallel thread may very well be true - if
all you know about your path is that it consists of elements that are
separated by slash characters, you might get a more efficient algorithm by
splitting first and analyzing second, if analyzing is required at all. As
far as I understand, you don't have any syntactical requirements for the
path elements at all (except maybe that they can't contain the slash
character), in which case the second step could fall away completely.

The regular expression for this case would be (if you decide to go that way)

\/[^\/]+
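
As a rough check that this expression and a plain split give the same segments, here is a small Python sketch (Python is used here for brevity, not the .NET API under discussion; the function names are hypothetical):

```python
import re

SEGMENT = re.compile(r"/[^/]+")  # a slash followed by one or more non-slash characters


def segments_by_regex(path):
    """Find every /segment and strip the leading slash from each match."""
    return [m[1:] for m in SEGMENT.findall(path)]


def segments_by_split(path):
    """Split on the slash and drop the empty strings between adjacent slashes."""
    return [s for s in path.split("/") if s]
```

Both functions ignore a trailing slash, so either one can feed the same follow-up analysis.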


Oliver Sturm
 
Greg Collins [Microsoft MVP]

Hi Oliver, thanks for your help! I haven't had time to play with any of the solutions yet. I expect I will have time later today or tomorrow to do that.
 
Dave Sexton

Hi Greg,

I assumed that you wanted something like Oliver's second example but with a
variable number of possible elements (upwards of 10 was stated):

/(new|date|coming|updated|author|(\d{4}(/\d{2}(/\d{2})?)?))

Calling Regex.Matches using the expression above might work (I didn't test
it myself); however, if there are a variable number of elements then I
suggest splitting and iterating instead since that will be much less cryptic
than dynamically creating a Regex pattern, IMO.

Also, if it's really "named groups" that you're interested in then you might
need to iterate the Matches anyway. That is, if each of "new", "date",
"coming", etc. has specific semantics that your application must be able to
identify, then using a Regex is probably not the best approach, especially if
the number of possible elements may vary.
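
For comparison, here is roughly what the named-group route could look like: one alternation of named groups, applied repeatedly, with the results collected while iterating the matches. This is a sketch in Python rather than .NET (Python spells named groups (?P<name>...)); the TOKEN grammar and the capture function are hypothetical and based only on the /articles paths posted earlier in the thread:

```python
import re

# Hypothetical token grammar; each alternative consumes one item and its value.
TOKEN = re.compile(
    r"/(?:"
    r"(?P<flag>new|updated|coming|mrv|mfv)"
    r"|author/(?P<author>[^/]+)"
    r"|level/(?P<level>[^/]+)"
    r"|date/(?P<year>\d{4})(?:/(?P<month>\d{1,2})(?:/(?P<day>\d{1,2}))?)?"
    r")(?=/|$)"  # each item must end at a slash or at the end of the path
)


def capture(path):
    """Collect named-group values from every match, in whatever order they appear."""
    found = {}
    for match in TOKEN.finditer(path):
        for name, value in match.groupdict().items():
            if value is not None:
                found.setdefault(name, []).append(value)
    return found
```

This handles any ordering without one expression per combination, but segments that match no alternative (including the root) are silently skipped, so the pattern alone does not detect an invalid path.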
 
Greg Collins [Microsoft MVP]

Thanks Dave.

I think I prefer splitting the string to using a generic regex. I also need to identify invalid paths and take appropriate measures. Although the generic regex and the split string would ultimately get me to the same set of data, I think the split string would be easier to work with in the long run (at least for me).
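
A minimal sketch of what that split-based approach with validation might look like, in Python rather than .NET for brevity. The FLAGS and VALUED sets, the parse function and the shape of the result are all assumptions, based only on the /articles paths listed earlier in the thread:

```python
FLAGS = {"new", "updated", "coming", "mrv", "mfv"}  # items that take no value
VALUED = {"author", "level"}                        # items followed by one value


def parse(path, root="articles"):
    """Split the path and walk the segments, raising ValueError if it is invalid."""
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] != root:
        raise ValueError(f"expected path to start with /{root}")
    result = {"flags": []}
    i = 1
    while i < len(segments):
        name = segments[i]
        i += 1
        if name in FLAGS:
            result["flags"].append(name)
        elif name in VALUED:
            if i >= len(segments):
                raise ValueError(f"missing value after {name!r}")
            result[name] = segments[i]
            i += 1
        elif name == "date":
            # A year, then an optional month and day, all numeric.
            parts = []
            while i < len(segments) and len(parts) < 3 and segments[i].isdigit():
                parts.append(int(segments[i]))
                i += 1
            if not parts:
                raise ValueError("missing year after 'date'")
            result["date"] = tuple(parts)
        else:
            raise ValueError(f"unexpected segment {name!r}")
    return result
```

For example, parse("/articles/level/3/author/Greg/new") returns the level, author and flag regardless of their order, while a path containing an unknown segment raises ValueError, which is where the "appropriate measures" for invalid paths could hook in.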

Thanks to both you and Oliver for your assistance. It would be great if regex had a way of doing what I wanted (random-order matches), but since it doesn't, I'll just have to hack it out of the string! :)
 
Oliver Sturm

Hello Greg,
Thanks to both you and Oliver for your assistance. It would be great if regex
had a way of doing what I wanted (random-order matches), but since it
doesn't, I'll just have to hack it out of the string! :)

Hm... that's pretty amazing. Since my first reply I've been trying to
understand what you were getting at with that "random order match"
request, but I haven't understood it yet :) If you can explain, I'll have
another shot.


Oliver Sturm
 
