Ok, I totally agree that a dataset is not a database and should not be
treated as such, and so it is usually not a good idea to load a 1 GB
database into memory.
However, I think my situation is different, although if someone can
think of a more elegant solution, I'd be happy to hear it - maybe I'm
just overlooking something obvious.
My Windows application (WinForms, VB.NET) lets a user point at any ODBC
data source, create a query, and then operate on the results. Because
the data source might be anything from SQL Server to text files to Excel
to Sybase to Oracle and so on, I can't write any dynamic SQL in code, at
least nothing more complex than a simple select. So if the user wants to
operate on 200,000 rows of a table and that works out to a 700 MB
DataTable, there's not much I can do besides handle it.
Once I have the table, I need to loop through each row and do some stuff
with it - doesn't really matter what.
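Roughly, the processing looks like the sketch below (connectionString, userQuery and the per-row work are placeholders); the Fill call is where the whole result set lands in memory:

```vbnet
Imports System.Data
Imports System.Data.Odbc

Dim table As New DataTable()
Using conn As New OdbcConnection(connectionString)
    Using adapter As New OdbcDataAdapter(userQuery, conn)
        ' Entire result set is materialized here - this is what blows up.
        adapter.Fill(table)
    End Using
End Using
For Each row As DataRow In table.Rows
    ' do some stuff with row...
Next
```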
One option is to make them create two queries: first a simple table of
primary key values, and second a query defining the rest of the fields
needed, which can then be executed once per key using "where [key] =
[value from first table]". But besides making a lot of unnecessary db
calls, this has three pitfalls.
1. If the primary key is compound, it is very hard to write generic
dynamic SQL for.
2. If the user aliases their primary key (because it is compound, or
for whatever reason), there could be problems creating the dynamic query
(ok, this one is worth throwing an error back at the user for…).
3. If the user has any other SQL after their where clause in the dynamic
query, such as Having, Group By, etc., it gets ugly. It is theoretically
possible to scan for all those keywords and dissect the query, but that
is not something I want to do!
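For pitfall 1, the closest thing to generic SQL I can come up with is ODBC positional parameters, roughly like the sketch below (keyColumns, keyValues, detailQuery and conn are placeholders, and pitfall 3 still applies as soon as the user's query has anything after its own where clause):

```vbnet
' Build "col1 = ? And col2 = ?" for a possibly-compound key.
Dim clauses As New List(Of String)
For Each col As String In keyColumns
    clauses.Add(col & " = ?")
Next
Dim sql As String = detailQuery & " Where " & String.Join(" And ", clauses.ToArray())

Using cmd As New OdbcCommand(sql, conn)
    ' ODBC parameters are positional, so add them in the same order as the clauses.
    For Each col As String In keyColumns
        cmd.Parameters.AddWithValue(col, keyValues(col))
    Next
    Using reader As OdbcDataReader = cmd.ExecuteReader()
        While reader.Read()
            ' one row's worth of fields for this key
        End While
    End Using
End Using
```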
I could do this in passes if I could limit the number of rows returned
in each pass, but again it’s a problem of generic SQL compatibility –
Top X versus Set RowCount X, etc.
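The only dialect-neutral paging I can find in ADO.NET itself is the DataAdapter.Fill overload that takes a start record and a max record count. It pages on the client, so every pass still streams and discards the skipped rows over the wire, but at least memory stays bounded. A sketch (connectionString/userQuery are placeholders again):

```vbnet
Dim pageSize As Integer = 5000
Dim start As Integer = 0
Using adapter As New OdbcDataAdapter(userQuery, connectionString)
    Do
        Dim page As New DataTable()
        ' Fill(startRecord, maxRecords, table) skips 'start' rows, keeps up to 'pageSize'.
        Dim fetched As Integer = adapter.Fill(start, pageSize, page)
        If fetched = 0 Then Exit Do
        For Each row As DataRow In page.Rows
            ' do some stuff with row...
        Next
        start += fetched
    Loop
End Using
```

It works, but re-reading the front of the result set on every pass feels wasteful, which is why I'm asking.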
So does anyone have ideas about how to handle this? Currently, if the
query is too big I end up with a System.OutOfMemoryException.