regex question

CSharper

I have a file containing all the find-and-replace values. I read it
and store the values in a dictionary. Next I have to read every file
in a given directory, read each line of each file, and apply a regex
to it. This is roughly how I am doing it (not the real code, just to
give the idea):

foreach(file in directory)
{
    filehandle = file.open(file)
    foreach(line in filehandle)
    {
        RegEx = new Regex()
        ....
    }
}

This part of the program takes a long time to execute (roughly 10
minutes). I think the problem is the regex instantiation for each
line in the file. Is there a better way to do this? Would an
extension method, a lambda, or LINQ resolve the problem?

Thanks.
 
I suspect the first thing to try is using a single compiled Regex,
rather than re-creating it constantly:

Regex re = new Regex(pattern, RegexOptions.Compiled);
foreach(file in directory)
{
    filehandle = file.open(file)
    foreach(line in filehandle)
    {
        // ... use "re"
    }
}

It isn't clear what mechanism you are using to read each line of the
file... but try the regex trick first.

Marc
 
Other considerations: how many files? how big are the files? how
complex is the regex?

Simple fact: more work takes more time...

Marc
 
Let me make one correction: there is one more loop inside the loops :)

foreach(file in directory)
{
    filehandle = file.open(file)
    foreach(line in filehandle)
    {
        foreach(pattern in patternList)
        {
            // a new Regex is built for every pattern on every line
            Regex re = new Regex(pattern, RegexOptions.Compiled)
            //perform replacement
        }
    }
}

The problem is that for each line I need to create an instance of
every available pattern to check whether it matches. Since the
patterns don't change, one thing I thought of is to create the
ArrayList of all the Regex objects once and then reuse it for every
line.
Any better way of doing it?
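
The "build the regexes once, reuse them for every line" idea could be sketched roughly as below. This is only an illustration, not the poster's real code: the names PrecompiledReplacer, BuildRules and ApplyRules are hypothetical, and it assumes the dictionary maps each pattern to its replacement text. Note that a Dictionary does not guarantee enumeration order, so if the replacements must run in a particular order the rules would need to come from an ordered source instead.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: names and shape are assumptions for illustration.
public static class PrecompiledReplacer
{
    // Build one Regex per pattern, once, before any file is read.
    // Assumes findReplace maps pattern -> replacement text.
    public static List<KeyValuePair<Regex, string>> BuildRules(
        IDictionary<string, string> findReplace)
    {
        List<KeyValuePair<Regex, string>> rules =
            new List<KeyValuePair<Regex, string>>();
        foreach (KeyValuePair<string, string> entry in findReplace)
        {
            Regex re = new Regex(entry.Key, RegexOptions.Compiled);
            rules.Add(new KeyValuePair<Regex, string>(re, entry.Value));
        }
        return rules;
    }

    // Apply every prebuilt rule to a single line, in list order.
    public static string ApplyRules(
        string line, List<KeyValuePair<Regex, string>> rules)
    {
        foreach (KeyValuePair<Regex, string> rule in rules)
        {
            line = rule.Key.Replace(line, rule.Value);
        }
        return line;
    }
}
```

With something like this, BuildRules would be called once before the file loop, and ApplyRules once per line inside it, so no Regex is constructed in the inner loops.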
 
Any better way of doing it?

For starters, how about running your code through a profiler to see
where the time is *actually* going? We can only guess...
Yes, re-using (and compiling) regular expressions will usually be
quicker (as long as it isn't a different pattern every time), but
without profiling this could be the difference between 5 seconds and 1
second in your reported 10 minute slice; or it could be the difference
between 10 minutes and 3 minutes...
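
Short of a real profiler, a crude first measurement can be taken with System.Diagnostics.Stopwatch. The sketch below is an assumption-laden illustration (the class RegexTiming and its method names are hypothetical, and the replacement text "x" is arbitrary): it times the same number of replacements with a Regex constructed on every iteration versus one constructed once. It measures only the regex work, not file I/O, so it cannot by itself say where the 10 minutes go.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical timing helper; a rough check, not a substitute for a profiler.
public static class RegexTiming
{
    // Time `iterations` replacements, constructing a new Regex each time.
    public static TimeSpan TimeRecreate(string pattern, string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Regex re = new Regex(pattern);
            re.Replace(input, "x");
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Time `iterations` replacements with a single compiled Regex.
    // Construction is inside the timed region so its one-off cost is counted.
    public static TimeSpan TimeReuse(string pattern, string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Regex re = new Regex(pattern, RegexOptions.Compiled);
        for (int i = 0; i < iterations; i++)
        {
            re.Replace(input, "x");
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Running both with the real patterns and a representative line, at an iteration count close to the real line count, gives a rough idea of how much of the total the regex construction can account for.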

Also - ArrayList: are you using .NET 1.1? If you are, I'd suggest
moving up to .NET 2.0 (or ideally 3.5 SP1) - there have been many
performance tweaks over the years. If you are already using .NET 2.0
or above, I would recommend using List<T> instead of ArrayList.
ArrayList is rarely (if ever) going to be a major performance
factor - but List<T> does make the code easier to work with.

Marc
 
Marc,

Thank you. I haven't run the profiler yet, but I will. I am coding
against .NET 3.5 SP1 in C# 3.0.
 
Perhaps some threading might be useful as well.
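
A minimal sketch of what that could look like on .NET 3.5, using plain threads that each pull the next unprocessed file. Everything named here (ParallelFileRewriter, RewriteAll, workerCount) is hypothetical, and the sketch assumes the files are rewritten in place, which the thread does not actually state. It uses a single Regex for brevity; a Regex instance is documented as thread safe, so the same prebuilt instance can be shared by all workers. If the job is limited by disk rather than CPU, extra threads may not help.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical sketch: spreads whole files across a fixed number of threads.
public static class ParallelFileRewriter
{
    // Assumption: each file is small enough to load fully and is rewritten in place.
    public static void RewriteAll(
        string[] files, Regex re, string replacement, int workerCount)
    {
        int next = -1; // index of the last file claimed by any worker
        Thread[] workers = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = new Thread(() =>
            {
                while (true)
                {
                    // Atomically claim the next file index.
                    int i = Interlocked.Increment(ref next);
                    if (i >= files.Length) break;

                    string[] lines = File.ReadAllLines(files[i]);
                    for (int n = 0; n < lines.Length; n++)
                    {
                        lines[n] = re.Replace(lines[n], replacement);
                    }
                    File.WriteAllLines(files[i], lines);
                }
            });
            workers[w].Start();
        }
        // Wait for every worker to finish before returning.
        foreach (Thread t in workers) t.Join();
    }
}
```

There is no error handling here: an exception on a worker thread (for example a locked file) would go unhandled, so real code would need a try/catch inside the loop.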

regards
A.G.
 