Quickly Finding Items by Multiple Indices

jehugaleahsa

Hello:

This is probably more of an algorithms question than a C# question.

We have these classes that represent lines on a report. They contain
two fields:

1) Line Print - a string that indicates the type of line; the lines
are ordered lexicographically by this field.
2) ModCode - a string that indicates that a special action should be
taken on the line.

Since Line Print determines the order in which the lines are
calculated and displayed, I have a List<Line> sorted by Line Print.
However, the existing code commonly searches for all lines with a
given mod code, which requires a linear search.

I can use binary search to find a line print, no problem. However, I
want to eliminate the common occurrence of linear searches for
multiple mod codes.

First of all, is it bad to have all these linear searches? Supposing
it is, what would storing the Lines in both a List and a
Dictionary<string, List<Line>> mean for performance? Performance does
matter here. Sure, my search times would decrease, but I am afraid the
extra memory could be problematic as well.
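One way to avoid the repeated scans is to build the mod-code index
once and reuse it; a minimal sketch, again assuming a Line class with
string LinePrint and ModCode properties ("MOD42" is an illustrative
key):

```csharp
using System.Collections.Generic;

// Build the index once, after the sorted list is populated: O(n).
// The dictionary holds references to the same Line objects as the
// list, so the extra memory is the index structure itself, not
// duplicated lines.
Dictionary<string, List<Line>> byModCode =
    new Dictionary<string, List<Line>>();
foreach (Line line in lines)
{
    List<Line> bucket;
    if (!byModCode.TryGetValue(line.ModCode, out bucket))
    {
        bucket = new List<Line>();
        byModCode[line.ModCode] = bucket;
    }
    bucket.Add(line);
}

// Each later lookup is O(1) plus the size of the result:
List<Line> matches;
if (byModCode.TryGetValue("MOD42", out matches))
{
    foreach (Line match in matches)
    {
        // take the special action
    }
}
```

The caveat is that the index must be rebuilt (or incrementally
updated) whenever lines are added to or removed from the list.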

Suggestions?
 
Lasse Vågsæther Karlsen

First of all, is it bad to have all these linear searches? Supposing
it is, what would storing the Lines in both a List and a
Dictionary<string, List<Line>> mean for performance? Performance does
matter here. Sure, my search times would decrease, but I am afraid the
extra memory could be problematic as well.

Suggestions?

At the moment, do you have any problems running this code against a
large dataset, like takes too long, etc.?

If not, I wouldn't prematurely optimize something that works, just
because you feel something could be done with it.

If you can justify the time taken to optimize the code against the
time needed to retest and recertify it and the increased potential for
bugs, then it could be worth it.
 
jehugaleahsa

(e-mail address removed) wrote:

At the moment, do you have any problems running this code against a
large dataset, like takes too long, etc.?

If not, I wouldn't prematurely optimize something that works, just
because you feel something could be done with it.

If you can justify the time taken to optimize the code against the
time needed to retest and recertify it and the increased potential for
bugs, then it could be worth it.

There is an N^3 operation that occurs for every line. Each line will
appear 744 times (once for every hour of the month). This is run
almost 200 times for our customers. Saving milliseconds quickly cuts
minutes off the runtime.

I will obviously test the outcome. I was only wondering whether, if it
becomes a requirement, I would have to do a lot in terms of data
structures.
 
Jon Skeet [C# MVP]

On Mar 25, 4:23 pm, "(e-mail address removed)" <[email protected]>
wrote:

There is an N^3 operation that occurs for every line.

At what point though? During searching? If not, changing searching
isn't going to change much.

It's still not clear what your dataset size is. Is it 200*744? If so,
a linear search of that will be *slightly* painful, but still not
hugely bad.

The overhead of a dictionary isn't likely to be particularly nasty in
terms of memory, but obviously it will be present.

Jon
 
