S
skavan
Use Case:
We have music files that describe, in their filename, attributes of the
music.
We do not know a general pattern that applies to all filenames -- but
we do know that filenames that are clustered together (by for example
directory) will, most likely, have the same filename pattern.
Here is an example:
10,000 Maniacs - MTV Unplugged - 01 - These are Days.mp3
on its own, we can deduce little, except perhaps that the 01 is a
tracknumber (see my previous post). But if we know that all filenames
in the same directory, belong to the same album, but will have
different titles, then maybe, just maybe we can xform this filename
into its component parts.
So, if we have in our possession say, 2 more filenames:
10,000 Maniacs - MTV Unplugged - 02 - Eat for Two.mp3
10,000 Maniacs - MTV Unplugged - 03 - Candy Everybody Wants.mp3
Then we should be able to parse all the filenames into:
<var1>,<var2>,<tracknum>,<title>
Even if we don't know whether 10,000 Maniacs or MTV Unplugged is the
name of the Album or the Artists (this bit can be deduced by other
heuristics).
Now, it may be that a cluster of filenames has fewer properties:
4 Non Blondes - Train.mp3
4 Non Blondes - No Place Like Home.mp3
In this case the 4 is part of the Artist name which can be deduced,
since two tracks wouldn't have the same tracknumber.
As more example clusters:
01 What's the Matter Here.mp3
02 Hey Jack Kerouac.mp3
50 Cent - Back Down.mp3
50 Cent - Gotta Get.mp3
10 Just Like Anyone Aimee Mann Bachelor No. 2 Rock.mp3
09 Driving Sideways Aimee Mann Bachelor No. 2 Rock.mp3
This is doing my head in. I just can't figure out how to attack the
problem - even though my instincts say Regex should be the
tool....H-E-L-P!! Any ideas anyone?
We have music files that describe, in their filename, attributes of the
music.
We do not know a general pattern that applies to all filenames -- but
we do know that filenames that are clustered together (by for example
directory) will, most likely, have the same filename pattern.
Here is an example:
10,000 Maniacs - MTV Unplugged - 01 - These are Days.mp3
on its own, we can deduce little, except perhaps that the 01 is a
tracknumber (see my previous post). But if we know that all filenames
in the same directory, belong to the same album, but will have
different titles, then maybe, just maybe we can xform this filename
into its component parts.
So, if we have in our possession say, 2 more filenames:
10,000 Maniacs - MTV Unplugged - 02 - Eat for Two.mp3
10,000 Maniacs - MTV Unplugged - 03 - Candy Everybody Wants.mp3
Then we should be able to parse all the filenames into:
<var1>,<var2>,<tracknum>,<title>
Even if we don't know whether 10,000 Maniacs or MTV Unplugged is the
name of the Album or the Artists (this bit can be deduced by other
heuristics).
Now, it may be that a cluster of filenames has fewer properties:
4 Non Blondes - Train.mp3
4 Non Blondes - No Place Like Home.mp3
In this case the 4 is part of the Artist name which can be deduced,
since two tracks wouldn't have the same tracknumber.
As more example clusters:
01 What's the Matter Here.mp3
02 Hey Jack Kerouac.mp3
50 Cent - Back Down.mp3
50 Cent - Gotta Get.mp3
10 Just Like Anyone Aimee Mann Bachelor No. 2 Rock.mp3
09 Driving Sideways Aimee Mann Bachelor No. 2 Rock.mp3
This is doing my head in. I just can't figure out how to attack the
problem - even though my instincts say Regex should be the
tool....H-E-L-P!! Any ideas anyone?