Caching Queries in DataTable


jehugaleahsa

Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?

Second, right now I use a struct called Result that looks like this:

private struct Result
{
    public object[] ParameterValues;
    public DataRow[] Rows;
}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Result> value is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop through all the Results in the
List<Result> and compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Essentially, this class is worthless if it takes longer to find the
cached results than it would to just re-hit the database.

Thanks,
Travis
 

Pavel Minaev

Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?

If you're doing it solely for performance reasons, then yes. The
database itself is likely to cache query results.

Second, right now I use a struct called Result that looks like this:

private struct Result
{
    public object[] ParameterValues;
    public DataRow[] Rows;
}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Result> value is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop through all the Results in the
List<Result> and compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Since you're effectively looking up by CommandText and Parameters, why
not aggregate them together into a single class, and use that as a key
of your Dictionary? Then your search will be just a single hashtable
lookup, without the following linear scan (though GetHashCode() would
be somewhat slower).
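A sketch of that suggestion (the class and member names here are my own invention, not from the thread or any library): wrap the command text and parameter values in one key type whose Equals() and GetHashCode() use data equality, then use it as the dictionary key.

```csharp
using System;
using System.Linq;

// Hypothetical composite key: command text + parameter values.
public sealed class QueryKey : IEquatable<QueryKey>
{
    private readonly string commandText;
    private readonly object[] parameterValues;

    public QueryKey(string commandText, object[] parameterValues)
    {
        this.commandText = commandText;
        this.parameterValues = parameterValues ?? new object[0];
    }

    public bool Equals(QueryKey other)
    {
        return other != null
            && commandText == other.commandText
            && parameterValues.SequenceEqual(other.parameterValues);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as QueryKey);
    }

    public override int GetHashCode()
    {
        // Combine the hash codes; collisions are legal, they only cost speed.
        int hash = commandText.GetHashCode();
        foreach (object value in parameterValues)
            hash = unchecked(hash * 31 + (value == null ? 0 : value.GetHashCode()));
        return hash;
    }
}
```

With a key like this, the cache becomes a Dictionary<QueryKey, DataRow[]> and each lookup is one hash computation plus (typically) one Equals() call, with no linear scan.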
 

Ignacio Machin ( .NET/ C# MVP )

Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?

Second, right now I use a struct called Result that looks like this:

private struct Result
{
    public object[] ParameterValues;
    public DataRow[] Rows;
}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Result> value is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop through all the Results in the
List<Result> and compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Essentially, this class is worthless if it takes longer to find the
cached results than it would to just re-hit the database.

Thanks,
Travis

Honestly, unless you are seeing a serious performance hit from
accessing the DB, you should just go to it.

Whether your approach is faster depends on several things, like how
many combinations of command/parameters you will have, how fast your
DB connection is, etc.
 

jehugaleahsa

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.
First of all, is there something that does this already?

If you're doing it solely for performance reasons, then yes. The
database itself is likely to cache query results.




Second, right now I use a struct called Result that looks like this:
private struct Result
{
    public object[] ParameterValues;
    public DataRow[] Rows;
}
I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Result> value is a collection of all the parameter
values and the generated DataRows that have been queried thus far.
Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop through all the Results in the
List<Result> and compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Since you're effectively looking up by CommandText and Parameters, why
not aggregate them together into a single class, and use that as a key
of your Dictionary? Then your search will be just a single hashtable
lookup, without the following linear scan (though GetHashCode() would
be somewhat slower).

How do I implement such a class? How can I be sure that my hash codes
don't overlap? I mean, I don't know how many parameter values there
may be. Won't this be an issue?

Or is it safe to iterate over each parameter and the command text and
just xor the hash codes?
 

Alvin Bruney [ASP.NET MVP]

You have a design issue here that you aren't addressing. Caching the queries
brings no conceivable advantage - your rdbms already does that for you by
preserving and optimizing the query. Also, query construction is simply a
matter of string concatenation which should, all things considered, rival a
hash table look up and load.

You should cache the results of the query since that is where the expense
lies. Opening a connection and transferring data is what is expensive.

--

Regards,
Alvin Bruney [MVP ASP.NET]

[Shameless Author plug]
Download OWC Black Book, 2nd Edition
Exclusively on www.lulu.com/owc $15.00
Need a free copy of VSTS 2008 w/ MSDN Premium?
http://msmvps.com/blogs/alvin/Default.aspx
-------------------------------------------------------


I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.
First of all, is there something that does this already?

If you're doing it solely for performance reasons, then yes. The
database itself is likely to cache query results.




Second, right now I use a struct called Result that looks like this:
private struct Result
{
public object[] ParameterValues;
public DataRow[] Rows;

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Result> value is a collection of all the parameter
values and the generated DataRows that have been queried thus far.
Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop through all the Results in the
List<Result> and compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Since you're effectively looking up by CommandText and Parameters, why
not aggregate them together into a single class, and use that as a key
of your Dictionary? Then your search will be just a single hashtable
lookup, without the following linear scan (though GetHashCode() would
be somewhat slower).

How do I implement such a class? How can I be sure that my hash codes
don't overlap? I mean, I don't know how many parameter values there
may be. Won't this be an issue?

Or is it safe to iterate over each parameter and the command text and
just xor the hash codes?
 

jehugaleahsa

You have a design issue here that you aren't addressing. Caching the queries
brings no conceivable advantage - your rdbms already does that for you by
preserving and optimizing the query. Also, query construction is simply a
matter of string concatenation which should, all things considered, rival a
hash table look up and load.

You should cache the results of the query since that is where the expense
lies. Opening a connection and transferring data is what is expensive.

I am caching the result of the query in a DataTable. But, how do you
know which results belong to which query? I could dump down everything
and then work entirely in my DataTable, but that is not a generic
solution. The table could be huge or I don't want to spend a
considerable time building the table at start up.

If I can reduce the time it takes to find older results, then I should
be able use a little more memory and save a hit or two to the database.
 

jehugaleahsa

An interesting thing to note: I went ahead and took Pavel Minaev's
suggestion to make a Hashable class. It was a great idea, once I
realized clashing didn't matter (just implement Equals). When running
a few examples, I was able to get some statistics. The cacher really
only helps if there is a moderate number of parameters (which is
usually the case) and the parameter values fall within a moderately-
sized range (which is also usual). The memory usage was actually very
low. I might enact an expiration policy.

So really, the value of the cacher hinges on the rate at which
parameter values repeat. It makes sense since the cacher can't cache
something unless it has already been queried. Duh.
 

Obaum1

Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?

Second, right now I use a struct called Result that looks like this:

private struct Result
{
    public object[] ParameterValues;
    public DataRow[] Rows;
}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Result> value is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop through all the Results in the
List<Result> and compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Essentially, this class is worthless if it takes longer to find the
cached results than it would to just re-hit the database.

Thanks,
Travis

Hello

If I understand right, you need LINQ to something; the select in LINQ
is very fast. I need more details.
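If the suggestion is LINQ to DataSet (my reading; the post doesn't say which LINQ provider), a cached DataTable can be queried in memory without hitting the database. A sketch, with an assumed table shape invented for the example:

```csharp
using System;
using System.Data;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Assumed table shape, purely for illustration.
        var table = new DataTable("Orders");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("CustomerId", typeof(int));
        table.Rows.Add(1, 10);
        table.Rows.Add(2, 20);

        // LINQ to DataSet: filter the cached rows in memory.
        DataRow[] rows = table.AsEnumerable()
                              .Where(r => r.Field<int>("CustomerId") == 10)
                              .ToArray();

        Console.WriteLine(rows.Length); // 1
    }
}
```

AsEnumerable() and Field<T>() come from the System.Data.DataSetExtensions assembly; the filter is still a linear scan over the table, so this helps convenience more than asymptotic lookup cost.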
 

Pavel Minaev

An interesting thing to note: I went ahead and took Pavel Minaev's
suggestion to make a Hashable class. It was a great idea, once I
realized clashing didn't matter (just implement Equals).

Erm... you _do_ need to implement GetHashCode() if you provide your
own Equals(). Yes, your GetHashCode() can return equal values for non-
equal objects without breaking anything - but it should never return
non-equal values for equal objects, otherwise any hash-based container
will break! And if you override Equals() to use data equality, but
leave the default GetHashCode(), then that is precisely what will
happen...
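A small illustration of that contract (both key classes are hypothetical, written just for this demo): with Equals() overridden but GetHashCode() left at the reference-based default, a Dictionary lookup with an equal-but-distinct key lands in the wrong bucket and misses; override both and it hits.

```csharp
using System;
using System.Collections.Generic;

// Overrides Equals() but NOT GetHashCode() -- the mistake described above.
class BrokenKey
{
    public int Value;
    public override bool Equals(object obj)
    {
        return obj is BrokenKey && ((BrokenKey)obj).Value == Value;
    }
    // Default GetHashCode(): equal keys hash to unrelated values.
}

// Keeps the contract: equal objects return equal hash codes.
class FixedKey
{
    public int Value;
    public override bool Equals(object obj)
    {
        return obj is FixedKey && ((FixedKey)obj).Value == Value;
    }
    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }
}

class Demo
{
    static void Main()
    {
        var broken = new Dictionary<BrokenKey, string>();
        broken[new BrokenKey { Value = 1 }] = "rows";
        // Equal key, but almost certainly a different default hash code,
        // so the lookup misses even though Equals() would return true.
        Console.WriteLine(broken.ContainsKey(new BrokenKey { Value = 1 }));

        var fixedCache = new Dictionary<FixedKey, string>();
        fixedCache[new FixedKey { Value = 1 }] = "rows";
        // Equal key, equal hash: the lookup hits.
        Console.WriteLine(fixedCache.ContainsKey(new FixedKey { Value = 1 }));
    }
}
```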
 

jehugaleahsa

Erm... you _do_ need to implement GetHashCode() if you provide your
own Equals(). Yes, your GetHashCode() can return equal values for non-
equal objects without breaking anything - but it should never return
non-equal values for equal objects, otherwise any hash-based container
will break! And if you override Equals() to use data equality, but
leave the default GetHashCode(), then that is precisely what will
happen...

I went ahead and did some additional analysis. The class itself is
really only useful in a very small number of cases. The class comes
with the requirement that the data doesn't change a lot, since keeping
a local copy around will eventually lead to outdated data. The class
also has to have an extremely limited number of inputs: the values of
parameters must cluster. As the values approach being truly random,
the range of values must be a small fraction of the number of reads.
Also, the number of parameters has to be small, because as each
parameter is added, the probability of missing the cache grows
substantially. Say parameter one usually tends to fall between 0-99
and the next parameter falls between 0-9; the number of combinations
the cache has to see in a random environment approaches 99 * 9. Take
into account that missed caches still require a hash table hit. From
my analysis, in a random environment half the processing time is spent
in the caching algorithm performing searches (which are amortized
constant), rather than simply hitting the database and allowing it to
perform the majority of the work.

Finally, deleted records need to be handled appropriately. It doesn't
make sense to return a deleted row, so the cacher probably should
remove them before returning the results. This requires a linear
search every time results are returned, since the cache keeps track of
the DataRows directly and doesn't have another means of keeping their
values intact. The only other workaround is to create an expression
for searching within the DataTable that would yield the same results.
However, creating such an expression would probably have even more of
a negative effect on runtime performance (as well as only being
partially implementable).

I think what I am getting at is this: it probably makes more sense to
use an Identity Map (a class for caching results by primary key) when
possible and to just hit the database directly otherwise. With the
primary key, searches can be performed against the DataTable, which
means it will probably be fast and deal with deleted DataRows
automatically. Of course, I would like to do some more research on a
cacher that works with foreign keys, since that seems slightly more
useful.
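A minimal sketch of that identity-map idea, under the assumption of a single-value primary key (the class and method names here are mine, not an established API): rows are cached by key, and rows that have since been deleted are evicted instead of returned.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical identity map: caches one DataRow per primary-key value.
public class IdentityMap
{
    private readonly Dictionary<object, DataRow> rowsByKey =
        new Dictionary<object, DataRow>();

    public void Store(object key, DataRow row)
    {
        rowsByKey[key] = row;
    }

    public DataRow Find(object key)
    {
        DataRow row;
        if (rowsByKey.TryGetValue(key, out row))
        {
            // Evict rows that were deleted since they were cached,
            // instead of handing back stale data.
            if (row.RowState == DataRowState.Deleted ||
                row.RowState == DataRowState.Detached)
            {
                rowsByKey.Remove(key);
                return null;
            }
            return row;
        }
        return null; // miss: caller hits the database, then calls Store()
    }
}
```

On a miss the caller queries the database and calls Store(); on a hit the RowState check handles deleted rows without any linear scan.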

Thanks for everyone's input.
 
