regex question

CSharper

I have a file containing all the find-and-replace values; I read it
and put it in a dictionary. Next I have to read every file in a given
directory, go through each line of each file, and apply a regex to it.
This is not the real code, but it gives an idea of what I am doing:

foreach (string file in Directory.GetFiles(directory))
{
    foreach (string line in File.ReadAllLines(file))
    {
        Regex re = new Regex(pattern);   // created again for every line
        // ...
    }
}

This part of the program takes a long time to execute (roughly 10
minutes). I think the problem is the regex being instantiated for
each line in the file. What is a better way to do this? Do you think
an extension method, a lambda, or LINQ would resolve the problem?

Thanks.
 
Marc Gravell

I suspect the first thing to try is using a single compiled Regex,
rather than re-creating it constantly:

Regex re = new Regex(pattern, RegexOptions.Compiled);   // built once
foreach (string file in Directory.GetFiles(directory))
{
    foreach (string line in File.ReadAllLines(file))
    {
        // ... use "re"
    }
}

It isn't clear what mechanism you are using to read each line of the
file... but try the regex trick first.

Marc
 
Marc Gravell

Other considerations: how many files? how big are the files? how
complex is the regex?

Simple fact: more work takes more time...

Marc
 
CSharper

Let me make one correction: there is one more loop inside the loops :)

foreach (string file in Directory.GetFiles(directory))
{
    foreach (string line in File.ReadAllLines(file))
    {
        foreach (string pattern in patternList)
        {
            Regex re = new Regex(pattern, RegexOptions.Compiled);
            // perform replacement
        }
    }
}

The problem is that for each line I have to create an instance of
every available pattern to check whether it matches. One thing I
thought about: since the patterns don't change, I could create an
ArrayList of all the Regex objects once and then reuse them for every
line...
Any better way of doing it?
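
A rough sketch of that "create them once" idea, under some assumptions: the find-and-replace values are already in a dictionary of pattern to replacement text, and every rule is applied to every line. The names ReplaceSketch, BuildRules and ApplyAll, and the sample pattern, are made up for illustration and are not from the real program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ReplaceSketch
{
    // Build every Regex once, up front, instead of once per line.
    public static List<KeyValuePair<Regex, string>> BuildRules(
        IDictionary<string, string> replacements)
    {
        var rules = new List<KeyValuePair<Regex, string>>();
        foreach (KeyValuePair<string, string> pair in replacements)
        {
            var re = new Regex(pair.Key, RegexOptions.Compiled);
            rules.Add(new KeyValuePair<Regex, string>(re, pair.Value));
        }
        return rules;
    }

    // Apply each prebuilt rule to one line, in list order.
    public static string ApplyAll(
        List<KeyValuePair<Regex, string>> rules, string line)
    {
        foreach (KeyValuePair<Regex, string> rule in rules)
        {
            line = rule.Key.Replace(line, rule.Value);
        }
        return line;
    }

    static void Main()
    {
        // Hypothetical sample data standing in for the real replacement file.
        var replacements = new Dictionary<string, string>();
        replacements.Add("colou?r", "color");

        List<KeyValuePair<Regex, string>> rules = BuildRules(replacements);
        Console.WriteLine(ApplyAll(rules, "The colour red")); // prints "The color red"
    }
}
```

The per-file, per-line loops would then call ApplyAll for each line and never construct a Regex themselves. If the order of the replacements matters, note that a Dictionary does not promise any particular enumeration order, so the rules may need to be loaded into an ordered list instead.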
 
Marc Gravell

Any better way of doing it?

For starters, how about running your code through a profiler to see
where the time is *actually* going. We can only guess...
Yes, re-using (and compiling) regular expressions will usually be
quicker (as long as it isn't a different pattern every time), but
without profiling this could be the difference between 5 seconds and 1
second in your reported 10 minute slice; or it could be the difference
between 10 minutes and 3 minutes...
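
If no profiler is to hand, a crude first check might be to time the two approaches with System.Diagnostics.Stopwatch. This is only a sketch: the pattern, the sample line and the iteration count below are invented, and the numbers it prints say nothing about where the real program spends its 10 minutes.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class TimingSketch
{
    static void Main()
    {
        const string pattern = @"\d+";              // hypothetical pattern
        const string line = "order 12345 shipped";  // hypothetical input line
        const int iterations = 100000;

        // Case 1: construct a new Regex for every "line".
        Stopwatch perLine = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            new Regex(pattern).Replace(line, "#");
        }
        perLine.Stop();

        // Case 2: construct one compiled Regex and reuse it.
        Regex shared = new Regex(pattern, RegexOptions.Compiled);
        Stopwatch reused = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            shared.Replace(line, "#");
        }
        reused.Stop();

        Console.WriteLine("new Regex per line: {0} ms", perLine.ElapsedMilliseconds);
        Console.WriteLine("single reused Regex: {0} ms", reused.ElapsedMilliseconds);
    }
}
```

A real profiler will still tell you more, for example whether the time is going into file I/O rather than into the regex work at all.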

Also - ArrayList: are you using .NET 1.1? If you are, I'd suggest
moving up to .NET 2.0 (or ideally 3.5 SP1) - there have been many
performance tweaks over the years. If you are already using .NET 2.0
or above, I would recommend dropping ArrayList in favour of List<T>.
ArrayList is rarely (if ever) going to be a major performance
factor - but List<T> does make the code easier to work with.

Marc
 
CSharper

Marc,

Thank you. I haven't run the profiler yet, but I will. Also, I am
coding against .NET 3.5 SP1 in C# 3.0.
 
Registered User

Perhaps some threading might be useful as well.
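
As a hedged sketch of what that could look like: one option is to process the files in parallel while sharing a single Regex, since Regex instances are documented as thread safe for matching. Note the assumptions: Parallel.ForEach needs .NET 4 or later (on 3.5 you would have to fall back to ThreadPool or explicit threads), the pattern is invented, and this only helps if the work is CPU-bound rather than waiting on the disk.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

static class ParallelSketch
{
    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: ParallelSketch <directory>");
            return;
        }

        // Hypothetical pattern: trailing whitespace at the end of a line.
        Regex re = new Regex(@"\s+$", RegexOptions.Compiled);
        int changedLines = 0;

        // Each file is handled by whichever worker thread picks it up.
        Parallel.ForEach(Directory.GetFiles(args[0]), file =>
        {
            int changedInFile = 0;
            foreach (string line in File.ReadAllLines(file))
            {
                string replaced = re.Replace(line, "");
                if (replaced != line)
                {
                    changedInFile++;
                }
            }
            // Combine per-file counts without a data race.
            Interlocked.Add(ref changedLines, changedInFile);
        });

        Console.WriteLine("lines that would change: {0}", changedLines);
    }
}
```

The sketch only counts the lines a replacement would touch; writing results back is left out on purpose. Profiling first is still the safer order, because if the time is going into reading the files, extra threads may not buy much.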

regards
A.G.
 
