My apologies, Nightcrawler.
Revised Standard Version:
(?<artist>.+)(?=(\s+-\s+))\1(?:(?<title>.+)??(?<remix>(?:\([^\)]+\)|\[[^\]]+\]))|(?<title>.+))
Part of the problem with my first attempt was that it didn't account for
spaces in the Artist or Title. Another was that I was not aware of the rules,
which include the possibility that there might be hyphens (or other
characters) in the Artist, Title, or Remix, and finally, that the Title might
contain parenthesized groups of characters, just like the Remix. Your examples
were very helpful!
A short explanation of the above:
(?<artist>.+)(?=(\s+-\s+))\1
This indicates that "artist" should be any characters that MUST be followed
by 1 or more spaces, a hyphen, and 1 or more spaces. A hyphen which does NOT
have a space on both the left and right side is okay anywhere in the line.
The assertion is that the hyphen between "artist" and "title" will have
spaces on BOTH sides. Because ".+" is greedy, the split is made at the LAST
such hyphen in the line, so the result will be wrong if the Title contains a
hyphen which has 1 or more spaces on both sides.
I put the "space-space" sequence into an unnamed capturing group, because it
has to be consumed after the assertion, which does NOT consume it, in order
to match the rest of the line. Thus, the first part ends with "\1", a
backreference which consumes the "space-space" sequence. (In .NET, unnamed
groups are numbered before named groups, which is why it is group 1.)
(?:(?<title>.+)??(?<remix>(?:\([^\)]+\)|\[[^\]]+\]))|(?<title>.+))
This was the tricky part, since the "title" may have parenthesized character
groups in it, which look just like the "remix," further complicated by the
fact that "remix" may be absent. Note that this is not perfect, and I will
explain why in a bit.
It puts 2 possible combinations into an OR-ing non-capturing group. The
first possible combination is:
(?<title>.+)??(?<remix>(?:\([^\)]+\)|\[[^\]]+\]))
This uses a double-question-mark quantifier, which makes the first ("title")
part optional, and matches it lazily, a rare construct, but necessary in
this case. We assume that the title WILL be there, but the engine first tries
to match the "remix" right away, and only then tries the "title." The ".+"
inside the group is still greedy, so it takes the rest of the line and backs
off until the "remix" group can match, which is why the LAST parenthesized
group in the line becomes the "remix." The "remix" group is defined as either
a '(' followed by 1 or more non-')' characters, followed by a ')', or a '['
followed by 1 or more non-']' characters, followed by a ']'. This ensures
that if the remix is present, it will be captured. Note that the brackets are
part of the captured "remix," and that the captured "title" keeps the space
in front of them.
However, if the remix is NOT present, we need an alternative:
(?<title>.+)
Captures the rest of the string, if the first alternative fails.
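As a rough check of the walkthrough above, here is a minimal Python sketch of the same pattern. It is a port, not the original .NET code: Python writes named groups as (?P<name>...) and does not allow two groups with the same name, so the group names "sep" and "title_only" and the helper parse() are assumptions introduced only for this illustration.

```python
import re

# Python port of the .NET pattern discussed above (a sketch, not the original).
# "sep" names the unnamed separator group so the backreference does not depend
# on group numbering; "title_only" stands in for the second (?<title>...)
# because Python rejects duplicate group names.
PATTERN = re.compile(
    r"(?P<artist>.+)(?=(?P<sep>\s+-\s+))(?P=sep)"
    r"(?:(?P<title>.+)??(?P<remix>(?:\([^\)]+\)|\[[^\]]+\]))"
    r"|(?P<title_only>.+))"
)


def parse(name):
    """Return (artist, title, remix); remix is None when it is absent."""
    m = PATTERN.search(name)
    if m is None:
        return None
    # Prefer the fallback group when the second alternative matched.
    title = m.group("title_only")
    if title is None:
        title = m.group("title")
    if title is not None:
        # The captured title keeps the space in front of the remix.
        title = title.strip()
    return m.group("artist"), title, m.group("remix")


for sample in (
    "From P-60 - Sinking With The Fall",
    "JP Conley - Karma Moods [Soul Mix]",
    "Ananda Project - Universal Love [Jay-J's Shifted Up Mix]",
):
    print(sample, "->", parse(sample))
```

The remix comes back with its brackets included, exactly as the pattern captures it; stripping them would be a separate step.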
Now, as to why these rules are not perfect, let's have a look at one of the
items in your list:
Air - Cherry Blossom Girl (Because You Blossom) [DJ AM Mix]
Obviously, the [DJ AM Mix] is the Remix. Why obviously? Well, it is the last
parenthesized expression in the string. But what if you left the Remix off?
Air - Cherry Blossom Girl (Because You Blossom)
NOW, "Cherry Blossom Girl" becomes the title, and "(Because You Blossom)"
becomes the Remix. Why? Because it is the last parenthesized expression in
the string. Even a human being could not tell the difference, because you
are using a rule that states that the last parenthesized expression in the
string is the Remix. In other words, your rules for "remix" overlap your
rules for "title." The only solution to this would be to further qualify the
rules. That is, you would have to either restrict the rules for "title" to a
certain pair of brackets, or restrict the rules for "remix" to a certain pair
of brackets.
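One way such a qualification could look is sketched below in Python. It assumes, purely for illustration, that a Remix is always written in square brackets, so anything in round brackets stays with the Title. The pattern is a Python port of the one above with the round-bracket branch removed; the names "sep", "title_only" and parse_square_only() are hypothetical.

```python
import re

# Assumption for this sketch: only [square brackets] ever denote a Remix.
# Round brackets are therefore always treated as part of the Title.
SQUARE_ONLY = re.compile(
    r"(?P<artist>.+)(?=(?P<sep>\s+-\s+))(?P=sep)"
    r"(?:(?P<title>.+)??(?P<remix>\[[^\]]+\])|(?P<title_only>.+))"
)


def parse_square_only(name):
    """Return (artist, title, remix) under the square-brackets-only rule."""
    m = SQUARE_ONLY.search(name)
    if m is None:
        return None
    title = m.group("title_only")
    if title is None:
        title = m.group("title")
    if title is not None:
        title = title.strip()
    return m.group("artist"), title, m.group("remix")


print(parse_square_only("Air - Cherry Blossom Girl (Because You Blossom)"))
print(parse_square_only("Air - Cherry Blossom Girl (Because You Blossom) [DJ AM Mix]"))
```

The cost of this rule is that the "Artist - Title (Remix)" naming convention no longer yields a Remix, so it only helps if the files can be renamed to use square brackets consistently.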
Thanks for the challenge!
--
HTH,
Kevin Spencer
Microsoft MVP
Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net
(?<artist>\w+)\s+-\s+(?<title>\w+)(?:\s+[\(\[](?<remix>\w+)[)\]])?
Explanation:
There are 4 distinct parts to this:
(?<artist>\w+) Find a string of word characters. Captures to group "artist"
\s+-\s+ Followed by 1 or more spaces, followed by a hyphen, followed by 1 or
more spaces
(?<title>\w+) Find a string of word characters. Captures to group "title"
(?:\s+[\(\[](?<remix>\w+)[)\]])?
Non-capturing group, of which there may be 0 or 1. Begins with 1 or more
spaces, followed by 1 of the characters '(' or '['. This is followed by a
named capturing group called "remix" which is defined as 1 or more word
characters. This is followed by 1 of the characters ')' or ']'.
This assumes that there will always be an artist and a title, but that remix
may be omitted.
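For a quick sanity check, the four parts can be tried out with a small Python sketch. The only changes from the pattern above are Python's (?P<name>...) spelling for named groups and the hypothetical helper groups(). It also shows the limit of "\w+": a run of word characters stops at the first space, so multi-word names are cut down to the words next to the hyphen.

```python
import re

# Python spelling of the pattern above; search() scans the line for the first
# match, as an unanchored .NET Regex.Match would.
SIMPLE = re.compile(
    r"(?P<artist>\w+)\s+-\s+(?P<title>\w+)(?:\s+[\(\[](?P<remix>\w+)[)\]])?"
)


def groups(name):
    """Return (artist, title, remix) from the first match, or None."""
    m = SIMPLE.search(name)
    if m is None:
        return None
    return m.group("artist"), m.group("title"), m.group("remix")


for name in (
    "Artist - Title [Remix]",
    "Artist - Title (Remix)",
    "Artist - Title",
    "JP Conley - Karma Moods [Soul Mix]",  # multi-word names get truncated
):
    print(name, "->", groups(name))
```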
--
HTH,
Kevin Spencer
Microsoft MVP
Hi all,
I am trying to use regular expressions to parse out mp3 titles into
three different groups (artist, title and remix). I currently have
three ways to name a mp3 file:
Artist - Title [Remix]
Artist - Title (Remix)
Artist - Title
I have approached the problem the following way.
First I start by looking to see if the following regex matches:
(?<artist>.*?) - (?<title>.*?) \[(?<remix>.*?)\]
If not, I move on to see if this one matches:
(?<artist>.*?) - (?<title>.*?) \((?<remix>.*?)\)
If not, I move on to see if this one matches:
(?<artist>.*?) - (?<title>.*?)
However, I run into two problems.
1. The last regex does not work.
2. I have to execute these regular expressions in the above order for
it to be correct. If I would execute a working version of the last
regex it would match every time.
So my two questions are:
1. Is there a better way to do this? Do I have to execute the regular
expressions in order for this to work? It could be problematic if I
introduce more naming conventions.
2. How do I get the last regular expression to work?
Any help is appreciated.
Thanks
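A likely reason the last regex in the question "does not work" is that its final lazy ".*?" is satisfied by zero characters when nothing follows it, so the title comes back empty. The Python sketch below illustrates this and one possible repair, anchoring each pattern with ^ and $. The anchors, the names LOOSE and ORDERED, and the helper parse_in_order() are assumptions for illustration and were not part of the question.

```python
import re

# As written in the question (Python group syntax): nothing follows the lazy
# title, so it stops after matching zero characters.
LOOSE = re.compile(r"(?P<artist>.*?) - (?P<title>.*?)")

# Anchored variants, tried from most specific to least specific.
ORDERED = [
    re.compile(r"^(?P<artist>.*?) - (?P<title>.*?) \[(?P<remix>.*?)\]$"),
    re.compile(r"^(?P<artist>.*?) - (?P<title>.*?) \((?P<remix>.*?)\)$"),
    re.compile(r"^(?P<artist>.*?) - (?P<title>.*?)$"),
]


def parse_in_order(name):
    """Return (artist, title, remix) from the first pattern that matches."""
    for pattern in ORDERED:
        m = pattern.match(name)
        if m:
            found = m.groupdict()
            # The last pattern has no "remix" group, hence .get().
            return found["artist"], found["title"], found.get("remix")
    return None


print(repr(LOOSE.search("Cool Touch - Gravity").group("title")))
print(parse_in_order("Cool Touch - Gravity"))
print(parse_in_order("JP Conley - Karma Moods [Soul Mix]"))
```

The order still matters with this approach, because the last pattern matches any line that contains " - ".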
Thank you. I tried your regex on a sample of 10 titles and it didn't
really work. Here are my ten samples that I used:
From P-60 - Sinking With The Fall
JP Conley - Karma Moods [Soul Mix]
Soul Beats - Wherever You Go... [Love Mix]
Thievery Corporation - Doors Of Perception
Thievery Corporation - Holographic Universe
Ananda Project - Universal Love [Jay-J's Shifted Up Mix]
Collective Sound Members - Switch
Cool Touch - Gravity
Dennis Ferrer - Church Lady Part 2 [Bryan Cox Remix]
Air - Cherry Blossom Girl (Because You Blossom) [DJ AM Mix]
After I ran the regular expression on the titles above, here is what the
groups caught:
Remix
Please let me know what is wrong.
Thanks