regular expression with linq to get files from a dir

  • Thread starter: Matthijs de Z

Matthijs de Z

Hi,

After googling and reading some pages (among others
http://msdn.microsoft.com/en-us/library/bb882639.aspx), I was trying
to get some code running, but I have a problem with the regular
expression.

I would like to look for files whose names are built like this:
defaulName20100223.zip

Therefore I made a regular expression string:

string regExpression = "("+myDefaultName+@"[0-9]{8}\.[zZ][iI][pP])";

where myDefaultName is just a string (it needs to be dynamic).

The fileList I will query will look something like this:

c:\\myDir1\\mydir2\\defaulName20100223.zip
c:\\myDir1\\mydir2\\extra backup defaulName20100223.zip
c:\\myDir1\\mydir2\\defaulName20100224.zip
c:\\myDir1\\mydir2\\defaulName20100225.zip
c:\\myDir1\\mydir2\\copy defaulName20100223.zip
c:\\myDir1\\mydir2\\defaulName20100226.zip
c:\\myDir1\\mydir2\\defaulName20100226-copy.zip

The only files I want in the result set are:
c:\\myDir1\\mydir2\\defaulName20100223.zip
c:\\myDir1\\mydir2\\defaulName20100224.zip
c:\\myDir1\\mydir2\\defaulName20100225.zip
c:\\myDir1\\mydir2\\defaulName20100226.zip

I tried to add Path.DirectorySeparatorChar.ToString() to the front of
the regular expression string, but then I get it twice.

string myDefaultName = @Path.DirectorySeparatorChar + "defaultName";
string regExpression = "("+myDefaultName+@"[0-9]{8}\.[zZ][iI][pP])";

Gives me "(\\defaultName[0-9]{8}\\.[zZ][iI][pP])" as a regular
expression, while DirectorySeparatorChar is actually just one \
(although I can see it twice again in the result set)

How can I write the regular expression so that it does what it should?
I pasted the rest of the code below this message.
Kind regards, and I hope you can help me out,

Matthijs


-----------------------------------
private void worker()
{
    string startFolder = @"c:\myDir1\mydir2\";

    IEnumerable<System.IO.FileInfo> fileList = GetFiles(startFolder);

    string myDefaultName = @Path.DirectorySeparatorChar + "defaultName";
    string regExpression = "(" + myDefaultName + @"[0-9]{8}\.[zZ][iI][pP])";

    System.Text.RegularExpressions.Regex searchTerm =
        new System.Text.RegularExpressions.Regex(regExpression);

    var queryMatchingFiles =
        from file in fileList
        where file.Extension == ".zip"
        let matches = searchTerm.Matches(file.FullName)
        where searchTerm.Matches(file.FullName).Count > 0
        select new
        {
            name = file.FullName,
            matches = from System.Text.RegularExpressions.Match match in matches
                      select match.Value
        };

    // just for debug mode, so I can hover over it and check the content
    queryMatchingFiles = queryMatchingFiles;
}

static IEnumerable<System.IO.FileInfo> GetFiles(string path)
{
    if (!System.IO.Directory.Exists(path))
        throw new System.IO.DirectoryNotFoundException();

    string[] fileNames = null;
    List<System.IO.FileInfo> files = new List<System.IO.FileInfo>();

    fileNames = System.IO.Directory.GetFiles(path, "*.*");
    foreach (string name in fileNames)
    {
        string onlyTheName =
            name.Substring(name.LastIndexOf(Path.DirectorySeparatorChar) + 1);
        files.Add(new System.IO.FileInfo(name));
    }
    return files;
}
 
I tried to add Path.DirectorySeparatorChar.ToString() to the front of
the regular expression string, I get it twice.

string myDefaultName = @Path.DirectorySeparatorChar + "defaultName";
string regExpression = "("+myDefaultName+@"[0-9]{8}\.[zZ][iI][pP])";

Gives me "(\\defaultName[0-9]{8}\\.[zZ][iI][pP])" as a regular
expression, while DirectorySeparatorChar is actually just one \
(although I can see it twice again in the result set)

When you say "I get it twice," what are you using to make this
determination? Are you simply looking at the tooltip display in the
debugger? If so, that shows you the "C# view" of the string, with all
characters that need escaping escaped. In other words, if your string
actually contains "C:\Temp" what you'll see in the debug view is "C:\\Temp".
It's basically the IDE showing you exactly what you would need to enter in
code (without the @"" syntax) if you wanted to make this string a constant,
i.e., if you wanted to have

string myString = "C:\\Temp";

in code.
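
To check this outside the debugger, here is a minimal sketch. The class and method names (EscapeDemo, Regular, Verbatim) are made up for illustration. It compares the two ways of writing the literal and counts the characters actually stored:

```csharp
using System;

static class EscapeDemo
{
    // Both literals describe the same 7-character string: C : \ T e m p
    public static string Regular() => "C:\\Temp";   // backslash escaped for the C# compiler
    public static string Verbatim() => @"C:\Temp";  // verbatim literal, no escaping needed

    static void Main()
    {
        Console.WriteLine(Regular() == Verbatim()); // True
        Console.WriteLine(Regular().Length);        // 7
        Console.WriteLine(Regular());               // C:\Temp
    }
}
```

Printing the string, or putting it in a text box, shows the stored characters. The debugger tooltip shows the escaped source form.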
 
Matthijs said:
Hi,

After googling and reading some pages (among others
http://msdn.microsoft.com/en-us/library/bb882639.aspx), I was trying
to get some code running, but I have a problem with the regular
expression.

I would like to look for files whose names are built like this:
defaulName20100223.zip

Therefore I made a regular expression string:

string regExpression = "("+myDefaultName+@"[0-9]{8}\.[zZ][iI][pP])";

[...]
I tried to add Path.DirectorySeparatorChar.ToString() to the front of
the regular expression string, I get it twice.

If you are trying to match on the filename only, it seems to me you'd be
better off preprocessing the path before you hand it to the regex. Just
use the Path class, with the GetFileName() method, to obtain only the
filename portion of the path, then match that against the regex.

There's probably a way to handle the directory separator characters in
the regex, but doing so seems overly complicated to me, given that .NET
already has path-specific support for manipulating strings.

Pete
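
A rough sketch of that suggestion, not necessarily what Pete had in mind in detail: FileNameDemo and NameMatches are hypothetical names, the name "defaulName" is taken from the sample file list, and the sample paths are built with Path.Combine so the separator fits whatever platform runs the code.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class FileNameDemo
{
    // The pattern from the question, without any directory separator in it.
    static readonly Regex SearchTerm =
        new Regex(@"defaulName[0-9]{8}\.[zZ][iI][pP]");

    // Strip the directory part first, then match only the file name.
    public static bool NameMatches(string fullPath)
    {
        string fileName = Path.GetFileName(fullPath);
        return SearchTerm.IsMatch(fileName);
    }

    static void Main()
    {
        string dir = Path.Combine("myDir1", "mydir2");
        string[] names =
        {
            "defaulName20100223.zip",
            "copy defaulName20100223.zip",
            "defaulName20100226-copy.zip"
        };
        foreach (string name in names)
        {
            Console.WriteLine(name + " -> " + NameMatches(Path.Combine(dir, name)));
        }
    }
}
```

Because the pattern is not anchored, "copy defaulName20100223.zip" is still accepted here; only the "-copy" variant is rejected.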
 
Matthijs de Z said:
I tried to add Path.DirectorySeparatorChar.ToString() to the front of
the regular expression string, I get it twice.
string myDefaultName = @Path.DirectorySeparatorChar + "defaultName";
string regExpression = "("+myDefaultName+@"[0-9]{8}\.[zZ][iI][pP])";
Gives me "(\\defaultName[0-9]{8}\\.[zZ][iI][pP])" as a regular
expression, while DirectorySeparatorChar is actually just one \
(although I can see it twice again in the result set)

When you say "I get it twice," what are you using to make this
determination? Are you simply looking at the tooltip display in the
debugger?

When I hover over the variable that contains the string
(myDefaultName) I see \\, but when I add the string to a richTextBox I
see just one. So I suppose it's just one \. But still, it doesn't
work.
Any suggestions?
Regards,

Matthijs

 
Matthijs said:
After googling and reading some pages (among others
http://msdn.microsoft.com/en-us/library/bb882639.aspx), I was trying
to get some code running, but I have a problem with the regular
expression.
I would like to look for files whose names are built like this:
defaulName20100223.zip
Therefore I made a regular expression string:
string regExpression = "("+myDefaultName+@"[0-9]{8}\.[zZ][iI][pP])";
[...]
I tried to add Path.DirectorySeparatorChar.ToString() to the front of
the regular expression string, I get it twice.

If you are trying to match on the filename only, it seems to me you'd be
better off preprocessing the path before you hand it to the regex.  Just
use the Path class, with the GetFileName() method, to obtain only the
filename portion of the path, then match that against the regex.

If I do that, I think I will still have a problem with, for
instance, 'copy defaulName20100223.zip'.
How can I make sure I only get names like defaulName20100223.zip?
regards,

Matthijs
 
I tried to add Path.DirectorySeparatorChar.ToString() to the front of
the regular expression string, I get it twice.
string myDefaultName = @Path.DirectorySeparatorChar + "defaultName";
string regExpression = "("+myDefaultName+@"[0-9]{8}\.[zZ][iI][pP])";
Gives me "(\\defaultName[0-9]{8}\\.[zZ][iI][pP])" as a regular
expression, while DirectorySeparatorChar is actually just one \
(although I can see it twice again in the result set)

When you say "I get it twice," what are you using to make this
determination? Are you simply looking at the tooltip display in the
debugger?

when I hover over the variable that contains the string
(myDefaultName) I see \\ but when I add the string to a richTextBox I
just see one. So I suppose it's just one \.

So then you're seeing exactly what I described.

But still....it doesn't work...

See my other reply.
 
If I do that, I think I will still have a problem with, for
instance, 'copy defaulName20100223.zip'.
How can I make sure I only get the names like defaulName20100223.zip?
regards,

Put ^ at the beginning of the regex so that it only matches if the string
starts with the default name.
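
A small sketch of the anchored pattern, assuming the input is the bare file name (no directory part). AnchorDemo and Build are hypothetical names, and the Regex.Escape call is my addition to keep metacharacters in a dynamic name from being read as regex syntax:

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Builds the pattern for a dynamic default name.
    public static Regex Build(string defaultName)
    {
        // ^ only matches at the start of the input, so nothing may precede the name.
        return new Regex("^" + Regex.Escape(defaultName) + @"[0-9]{8}\.[zZ][iI][pP]");
    }

    static void Main()
    {
        Regex searchTerm = Build("defaulName");
        Console.WriteLine(searchTerm.IsMatch("defaulName20100223.zip"));      // True
        Console.WriteLine(searchTerm.IsMatch("copy defaulName20100223.zip")); // False
    }
}
```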
 
Put ^ at the beginning of the regex so that it only matches if the string
starts with the default name.

when I use
string regExpression = @"^([0-9]{8}\.[zZ][iI][pP])";

It doesn't work unless I trim down the filename, cutting off all
directory info. But I actually need that.
Regards,

Matthijs
 
Put ^ at the beginning of the regex so that it only matches if the string
starts with the default name.

when I use
string regExpression = @"^([0-9]{8}\.[zZ][iI][pP])";

It doesn't work unless I trim down the filename, cutting off all
directory info. But I actually need that.

Well, I was building on what Pete said, and he suggested that you strip off
the directory information. I didn't realize it was important to you.

Does this work:

string regExpression = @".*\\" + myDefaultName +
"(\d{8}\.[zZ][iI][pP])$";

(I replaced [0-9] with \d, since they're equivalent here. Also, you should
consider just setting the case-insensitive option on the regex and testing for
"zip" instead of the way you're doing it now, unless case in the rest of the
file name is important--but why would it be?)
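
For illustration, a sketch of the case-insensitive variant using RegexOptions.IgnoreCase, applied to a bare file name. IgnoreCaseDemo is a made-up name and "defaulName" comes from the sample file list. One caveat: in .NET, \d by default also matches non-ASCII Unicode decimal digits, so it is slightly broader than [0-9].

```csharp
using System;
using System.Text.RegularExpressions;

static class IgnoreCaseDemo
{
    // IgnoreCase applies to the whole pattern, including the default name.
    public static readonly Regex SearchTerm =
        new Regex(@"^defaulName\d{8}\.zip$", RegexOptions.IgnoreCase);

    static void Main()
    {
        Console.WriteLine(SearchTerm.IsMatch("defaulName20100223.zip")); // True
        Console.WriteLine(SearchTerm.IsMatch("defaulName20100223.ZIP")); // True
        Console.WriteLine(SearchTerm.IsMatch("defaulName2010022.zip"));  // False (7 digits)
    }
}
```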
 
    string regExpression = @".*\\" + myDefaultName +
"(\d{8}\.[zZ][iI][pP])$";

Adding an @ to the "(\d{8}\.[zZ][iI][pP])$" part was the final thing.
Now it works fine.
Thanks all!
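
For anyone landing here later, a sketch of how the working pattern can be used on full paths. FinalDemo and Filter are hypothetical names, and Regex.Escape is my addition, not something from the thread. Without the @, the C# compiler rejects \d and \. in an ordinary string literal as unrecognized escape sequences, which is why the second literal has to be verbatim too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class FinalDemo
{
    // Keeps only the paths whose file name is <myDefaultName><8 digits>.zip
    public static IEnumerable<string> Filter(IEnumerable<string> fullPaths, string myDefaultName)
    {
        // Both literals are verbatim, so \\ reaches the regex engine as an
        // escaped backslash and \d as the digit class.
        string regExpression =
            @".*\\" + Regex.Escape(myDefaultName) + @"(\d{8}\.[zZ][iI][pP])$";
        Regex searchTerm = new Regex(regExpression);
        return fullPaths.Where(p => searchTerm.IsMatch(p));
    }

    static void Main()
    {
        var fileList = new[]
        {
            @"c:\myDir1\mydir2\defaulName20100223.zip",
            @"c:\myDir1\mydir2\copy defaulName20100223.zip",
            @"c:\myDir1\mydir2\defaulName20100226-copy.zip",
            @"c:\myDir1\mydir2\defaulName20100226.zip"
        };
        foreach (string path in Filter(fileList, "defaulName"))
        {
            Console.WriteLine(path); // the first and the last entry
        }
    }
}
```

This also suggests why the first attempt failed: the regex engine saw a single unescaped backslash in front of "defaultName", and \d in a pattern means "a digit", not a backslash followed by d.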
 