Object oriented method is inefficient with database calling

TS · Aug 10, 2006

Say i have a class car with properties: Color, Make, Model, Year, DriverID
And a Driver class with properties: DriverID, Name

The driverID PRIVATE property is the id of the driver from say a driver
table (t_driver). This has a PUBLIC property accessor called Driver

My understanding of OO using the composition model is that when you want to
load up a car class, you would access the DB to get Color, make, Model,
Year, DriverID and load the Car class. When the PUBLIC property Driver is
accessed, it would use the PRIVATE property DriverID to call the DB and load
the Driver class.

In this example, 2 separate DB calls would be made to load the Car and
Driver classes. I have DB/network people at work saying that these calls
should be done at once to cut down on network traffic and DB calls.

Any comments as to why my way is not OO sound or firepower I could use to
tell to my peers?

Thanks so much!

Jon Skeet [C# MVP] · Aug 10, 2006

TS said:
Say i have a class car with properties: Color, Make, Model, Year, DriverID
And a Driver class with properties: DriverID, Name

The driverID PRIVATE property is the id of the driver from say a driver
table (t_driver). This has a PUBLIC property accessor called Driver

My understanding of OO using the composition model is that when you want to
load up a car class, you would access the DB to get Color, make, Model,
Year, DriverID and load the Car class. When the PUBLIC property Driver is
accessed, it would use the PRIVATE property DriverID to call the DB and load
the Driver class.

In this example, 2 separate DB calls would be made to load the Car and
Driver classes. I have DB/network people at work saying that these calls
should be done at once to cut down on network traffic and DB calls.

Any comments as to why my way is not OO sound or firepower I could use to
tell to my peers?

Depending on the ORM system involved, you can often eagerly fetch
things - issue a query which will load the car and the driver at the
same time using a join.

Linda Liu [MSFT] · Aug 11, 2006

Hi,

I think you should take different strategies depending on the amount of
records a database query to the table t_car returns.

If a database query to the table t_car returns a great deal of records a
time, I think you'd better load a Car object for each record and leave the
Driver property in the Car object aside. As you said, when the public
property Driver is accessed, use the private property DriverID to access
the database and load a Driver object.

In the above scenario, if you load the Driver object when you load the Car
object, the network traffic would be very heavy because there're a great
deal of Car objects after the query and you are going to load a Driver
object for each Car object.

I don't think you should use a join to query the database in this instance
either, because the join operation will consume a lot of resources as well.

The advantage of loading a Driver object when the public property Driver is
accessed is that this avoids the amount of querying database being too much
a time.

On the other hand, if you query the table t_car and get one record a time,
you could load a Car object for the record and then access the table
t_driver by the value of DriverID field in the Car object and load a Driver
object. This won't cause network traffic being heavy.

Hope this helps.
If you have anything unclear, please feel free to let me know.

Sincerely,
Linda Liu
Microsoft Online Community Support

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
ications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.

Jon Skeet [C# MVP] · Aug 11, 2006

Linda Liu said:
I think you should take different strategies depending on the amount of
records a database query to the table t_car returns.

If a database query to the table t_car returns a great deal of records a
time, I think you'd better load a Car object for each record and leave the
Driver property in the Car object aside. As you said, when the public
property Driver is accessed, use the private property DriverID to access
the database and load a Driver object.

In the above scenario, if you load the Driver object when you load the Car
object, the network traffic would be very heavy because there're a great
deal of Car objects after the query and you are going to load a Driver
object for each Car object.

I don't think you should use a join to query the database in this instance
either, because the join operation will consume a lot of resources as well.

You think it's better to do 501 queries to retrieve 500 cars rather
than 1 which has a join? I have to disagree. If you're pretty sure
you're going to need to use the driver, then joining the two tables is
the logical approach IMO.

The advantage of loading a Driver object when the public property Driver is
accessed is that this avoids the amount of querying database being too much
a time.

On the other hand, if you query the table t_car and get one record a time,
you could load a Car object for the record and then access the table
t_driver by the value of DriverID field in the Car object and load a Driver
object. This won't cause network traffic being heavy.

It'll cause far more queries though. I would have thought that would
actually mean heavier network traffic than fewer queries returning the
same total amount of data in effectively bigger chunks.

Bruce Wood · Aug 11, 2006

TS said:
Say i have a class car with properties: Color, Make, Model, Year, DriverID
And a Driver class with properties: DriverID, Name

The driverID PRIVATE property is the id of the driver from say a driver
table (t_driver). This has a PUBLIC property accessor called Driver

My understanding of OO using the composition model is that when you want to
load up a car class, you would access the DB to get Color, make, Model,
Year, DriverID and load the Car class. When the PUBLIC property Driver is
accessed, it would use the PRIVATE property DriverID to call the DB and load
the Driver class.

In this example, 2 separate DB calls would be made to load the Car and
Driver classes. I have DB/network people at work saying that these calls
should be done at once to cut down on network traffic and DB calls.

We use exactly this system where I work, but we have some tricks to cut
down on the number of queries.

1. If we know that the number of records will always be small (say, a
few hundred at most) then we design the data O-O layer to fetch the
entire table when the first item is requested for that table. So, if
you know that you're only ever going to have a hundred or so Drivers,
then the first request for a driver via a DriverId fetches all drivers
and caches them. This transfers more data, but the query is fast (no
conditions) and there is one query and then that's it.

2. We have collection fetches: "get me the drivers for all of these
cars" that result in a single query. For example:

CarCollection cars = new CarCollection();
.... add five cars to the collection ...
Drivers driversForCars = cars.GetDrivers();

results in a query like this:

SELECT * FROM DRIVERS WHERE DRIVER_ID IN ( 15, 24, 17, 2, 63 );

Again, one query gets you the drivers for all cars in the collection.
Compare this with:

foreach (Car c in cars)
{
Driver d = c.Driver;
...
}

Here, we either do this:

Drivers driversForCars = cars.GetDrivers();
foreach (Driver d in driversForCars) { ... do something ... }

or this:

cars.GetDrivers();
foreach (Car c in cars)
{
... do something with c.Driver ...
}

The latter, of course, assumes caching: once you have fetched a car's
driver you don't fetch it again. Caching may or may not be viable
depending upon whether multiple users need to be able to see each
others' changes in a timely manner.

Frans Bouma [C# MVP] · Aug 11, 2006

Jon said:
You think it's better to do 501 queries to retrieve 500 cars rather
than 1 which has a join? I have to disagree. If you're pretty sure
you're going to need to use the driver, then joining the two tables
is the logical approach IMO.

Joins aren't the solution. It might be in a simple 2 node graph with a
m:1 fetch (this particular situation) but once you have multiple graph
paths, it's not the way to go as a join will make it impossible to
formulate several common scenario's of prefetch paths in 1 query.

What's faster are 2 queries: one for the cars and one for the drivers.
You then use a small trick.
- if the # of master rows (cars in this case) is below a given
threshold, do:
select ... from drivers where driverid in (@D1, @D2, @D3.... @Dn)
where @Dx is a driverid from the cars.
- if the # of master rows is equal or above a given threshold, do:
select ... from drivers where driverid in (select driverid from cars
where <filter on cars>)

and you merge them on the client with hashvalues, which is pretty
simple and straightforward and fast.

This gives the least amount of roundtrips and overhead and the queries
always succeed, no matter how complex the graph of paths is, so you can
create generic code to produce these queries.

Joins seem a logical choice, eventually with UNIONs, though they
aren't.

Also, when pulling master-detail data in a 1:n scenario from the db,
it's often more efficient to use a subquery, due to the duplicates on
the master-side it will give. (one of the serious performance issues in
DLinq, if they fail to implement a subquery directive hint)

It'll cause far more queries though. I would have thought that would
actually mean heavier network traffic than fewer queries returning
the same total amount of data in effectively bigger chunks.

I agree, lazy loading on the car object to pull the driver from the db
is not the way to go if you want to load ALL drivers associated with a
loaded set of cars.

Lazy loading is only useful if you want to pull related data of a
small subset of entities on an occasional basis from the db, not in
graph-oriented fetches.

Frans

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Frans Bouma [C# MVP] · Aug 11, 2006

TS said:
Say i have a class car with properties: Color, Make, Model, Year,
DriverID And a Driver class with properties: DriverID, Name

The driverID PRIVATE property is the id of the driver from say a
driver table (t_driver). This has a PUBLIC property accessor called
Driver

My understanding of OO using the composition model is that when you
want to load up a car class, you would access the DB to get Color,
make, Model, Year, DriverID and load the Car class. When the PUBLIC
property Driver is accessed, it would use the PRIVATE property
DriverID to call the DB and load the Driver class.

In this example, 2 separate DB calls would be made to load the Car
and Driver classes. I have DB/network people at work saying that
these calls should be done at once to cut down on network traffic and
DB calls.

Any comments as to why my way is not OO sound or firepower I could
use to tell to my peers?

'why my way isn't OO' is a question for a discussion between pundits

, why or why isn't something OO and if it's not, it therefore must be
bad is something not useful to waste time on. So let's drop that.

the thing you're worrying about is that because you're targeting the
problem with an OO approach will result in a lot of queries. This in
general is the case, IF you use the easy-way-out method where you
simply fetch a related driver when you're asked to do that from a car
instance.

Though you can also use a little more advanced approach: define fetch
paths for prefetching, or in short prefetch paths. These are paths
along which related entities have to be fetched, eventually with
filters.

So if you want to fetch a set of car entities, and their associated
driver entities, you effectively define a path: Car - Driver. This
should effectively result in 2 queries: one for Car and one for Driver,
which resultsets are then merged on the client. There are some tricks
you can use to do this even more efficiently, please see my other post
in this thread.

Frans

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Jon Skeet [C# MVP] · Aug 11, 2006

Frans Bouma said:
Joins aren't the solution. It might be in a simple 2 node graph with a
m:1 fetch (this particular situation) but once you have multiple graph
paths, it's not the way to go as a join will make it impossible to
formulate several common scenario's of prefetch paths in 1 query.

Absolutely - there are certainly situations where it doesn't work.
However, where it *does* work I don't see why it's not an appropriate
solution - and as you've said, the situation specified in the question
will work.

Of course, it helps to have an ORM solution which allows you to specify
this kind of thing.

Bruce Wood · Aug 11, 2006

Frans said:
the thing you're worrying about is that because you're targeting the
problem with an OO approach will result in a lot of queries. This in
general is the case, IF you use the easy-way-out method where you
simply fetch a related driver when you're asked to do that from a car
instance.

Note that this isn't an O-O problem at all. Once you add in tricks like
caching and prefetching, you have to start wrestling with problems of
data going stale: "If I prefetch / cache, how do I keep up with changes
in the database? How do I make sure that the data within my application
is current with the data in the database?"

Depending upon your application, the answer may range from "I don't
care. I don't need to keep current with changes to the DB," to "I have
to fetch every time because I need to be as current as possible with
the DB." At this stage, the questions and problems are the same whether
you're writing an O-O application or a 3GL application.

The difference I would submit, is that O-O requires more careful design
in this regard, as it's harder to immediately understand what
operations will result in database fetches. If you design your objects
(as I did mine) so that a property reference:

Driver d = car.Driver;

may result in a database fetch, it's very difficult to know, looking at
the code, where trips back to the database may occur. This doesn't make
the O-O approach somehow inferior. All it means is that you have to
take more care in architecting your solution. We're using a horribly
slow ODBC connection, and we've managed to optimize our trips back to
the DB. It can be done.

One change I would recommend is this: don't use a property, use a
method:

Driver d = car.GetDriver();

I suggest this because in several cases we discovered that we wanted to
be able to regulate the amount of information coming back for an
object. In the case of a Driver, you may want merely a minimal object
that stands in for a driver and knows the driver ID, but contains no
additional information. Or, you might want everything about the driver.
The first requires no trip to the database, while the second does.
Wouldn't it be ugly if your architecture required client code to force
a trip to the database when all it wanted was a marker representing a
driver? So, we did something like this:

Driver d = car.GetDriver(DriverPart.Minimal);

or to get a full driver:

Driver d = car.GetDriver(DriverPart.All);

We have other objects that allow fetching of various subsets for
various purposes. Of course, if you're doing a full read up front and
caching all drivers, then this level of control isn't necessary.
However, for objects that are always fetched on first read, it can be
very useful. It also provides a visual clue of potential trips back to
the DB.

Frans Bouma [C# MVP] · Aug 12, 2006

Bruce said:
Note that this isn't an O-O problem at all. Once you add in tricks
like caching and prefetching, you have to start wrestling with
problems of data going stale: "If I prefetch / cache, how do I keep
up with changes in the database? How do I make sure that the data
within my application is current with the data in the database?"

I don't think caching and prefetching are related. They're completely
different things, and I don't thing what applies to caching applies to
prefetching as well. UNLESS! you're defining 'prefetching' as 'fetching
data way before it's perhaps needed'. In that case we're talking about
two different things

. I call 'prefetching' the fetch action you're
doing together with another fetch action because you know you'll need
it in the very near future, e.g.: you need both car and driver, thus
you fetch both up front.

Stale data is always a problem where data is consumed outside the
system where it's stored/kept. Though as that's a given, a developer
has to realize that as soon as s/he fetches data from a table / view in
the db, the data IS stale.

Depending upon your application, the answer may range from "I don't
care. I don't need to keep current with changes to the DB," to "I have
to fetch every time because I need to be as current as possible with
the DB." At this stage, the questions and problems are the same
whether you're writing an O-O application or a 3GL application.

true, but that wasn't the OO problem at hand I think. The OO problem
at hand was more in the form of:
- in an OO world you would focus on fetching each Car object
individually and every Driver object individually
vs.
- in a set oriented world, you'd focus on the set of data you have to
fetch, e.g. a joined list or 2 sets with a mapping between them (e.g.
dataset with two datatables and a datarelation).

If you want to do the latter, but want to use OO objects, there's a
friction, because they're not equal.

The difference I would submit, is that O-O requires more careful
design in this regard, as it's harder to immediately understand what
operations will result in database fetches. If you design your objects
(as I did mine) so that a property reference:

Driver d = car.Driver;

may result in a database fetch, it's very difficult to know, looking
at the code, where trips back to the database may occur.

trips back as in, persisting data?

I see it like this: say the algorithm A consumes an X amount of data.
That data is consumed in a period of time P. If it's ok for A that X is
available before P starts, you could opt for an optimization to fetch X
up front, and then proceed into P. If it's not ok for A to have X up
front, you have to fetch X during P, and perhaps the amount you need at
that given moment.

A is then efficient if A has the data amount X' available at time T
when it needs X'. This thus means that when X' is needed, no delay in
fetching should occur. IF fetching-on-demand (lazy loading) is enough
to feed A with X' amount of data at any given time T, why bother with
prefetching? IF fetching-on-demand isn't enough, prefetching is
recommended.

You see, in the statement you gave, it's not important for the
developer what happens below the hood. What's important is that after
that line, 'd' points to the Driver entity of car, as that's what
'reality' is on the abstraction level the developer works on.

This doesn't
make the O-O approach somehow inferior. All it means is that you have
to take more care in architecting your solution. We're using a
horribly slow ODBC connection, and we've managed to optimize our
trips back to the DB. It can be done.

Oh, I didn't want to imply that an OO solution would be inferior, not
at all. I just wanted to imply that the hard-core 'This is not OO and
therefore crap!'-slogans aren't practical to work with

so it should
be a basis to base a decision on.

One change I would recommend is this: don't use a property, use a
method:

Driver d = car.GetDriver();

It should be a method, indeed, because the operation can be expensive
(if the related driver hasn't been read yet) and therefore the MS
guidelines say you should create a method.
The sad thing is though, what would you suggest for the other side of
the relation:
Cars c = d.Cars;

It then should be logical to have a method as well, correct? However,
then you run into the problem with databinding. What if you want to
bind the Cars collection of the selected driver in a grid to another
grid? You can't do that with design time databinding or other simple
constructs, it requires ugly glue code, not something you want to
write, trust me

Therefore, I think a property which calls the method (which is also
there) is best. (I use that scheme). The method then can also have
overloads which accept additional filters and other bells / whistles,
and the property is there for easy access and databinding access.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Registered User · Aug 12, 2006

Say i have a class car with properties: Color, Make, Model, Year, DriverID
And a Driver class with properties: DriverID, Name

The driverID PRIVATE property is the id of the driver from say a driver
table (t_driver). This has a PUBLIC property accessor called Driver

My understanding of OO using the composition model is that when you want to
load up a car class, you would access the DB to get Color, make, Model,
Year, DriverID and load the Car class. When the PUBLIC property Driver is
accessed, it would use the PRIVATE property DriverID to call the DB and load
the Driver class.

In this example, 2 separate DB calls would be made to load the Car and
Driver classes. I have DB/network people at work saying that these calls
should be done at once to cut down on network traffic and DB calls.

Any comments as to why my way is not OO sound or firepower I could use to
tell to my peers?

This is more about the relationship between Car and Driver than
anything else. First I question the Driver type. it would seem a car's
driver property would be null if the car's state property
IsBeingDriven returns false. Although only one person at a time can
drive a car [1] through its lifetime a car can have many drivers.
Perhaps a type Owner might be more suitable.

In any case I would argue that the car type should know nothing about
its owner or driver. In the real world the VIN [2] is used as the key
to ownership while the driver's key is the physical key. In both cases
there is a layer between the car and owner or driver. This suggests
Car table
Ownership table
Owner table
where the Ownership table contains the pink slip [3].

But this doesn't resolve the query question.

Now it might be wise to consider two car types keeping the existing
Car type with the VIN. The second car type would be derived from Car
and named FullyQualifiedCar.The FullyQualifiedCar type would be
additionally populated with desired values from the owner table. What
a particular piece of code does can determine which type of object to
use. Overloading a FullyQualifiedCar c'tor to take a Car as an
argument will be useful for situations where a Car exists and a
FullyQualifiedCar may be conditionally needed.

regards
A.G.

[1] excluding driver education vehicles with redundant controls. Isn't
there is always a contrary real world example?
[2] vehicle identification number, unique identifier
[3] vehicle title, CarID, OwnerID

Chris Darnell · Aug 13, 2006

The truely OO approach is that each class has the knowledge to load and
save its own information. As an object is filled (vehicle, for
example), it tells its Driver subclass to load its data based on the
provided Driver ID. This is a very sound OO design, but can adversely
impact the performance of the system.

There is no shame in having the Vehicle class also load/save the Driver
information, also. The Vehicle class's Fill() method would also
instantiate and fill a Driver object.

Arguments for the truely OO approach will depend upon the real design
of your system. A simple Vehicle/Driver sample can easily be argued
both ways. Consider the "bigger picture" that your system represents.
What are the real tradeoffs for your design if you don't do the true OO
approach? How many users will be using the system at the same time?
How often is any given vehicle loaded? How many vehicles are returned
at a time? Will the users only be working with one vehicle at a time?

Chris

Registered User · Aug 13, 2006

The truely OO approach is that each class has the knowledge to load and
save its own information. As an object is filled (vehicle, for
example), it tells its Driver subclass to load its data based on the
provided Driver ID. This is a very sound OO design, but can adversely
impact the performance of the system.

You probably chose the wrong word with subclass but I pretty much
agree with you.

There is no shame in having the Vehicle class also load/save the Driver
information, also. The Vehicle class's Fill() method would also
instantiate and fill a Driver object.

Arguments for the truely OO approach will depend upon the real design
of your system. A simple Vehicle/Driver sample can easily be argued
both ways.

Absolutely correct.

Consider the "bigger picture" that your system represents.
What are the real tradeoffs for your design if you don't do the true OO
approach? How many users will be using the system at the same time?
How often is any given vehicle loaded? How many vehicles are returned
at a time? Will the users only be working with one vehicle at a time?

Excellent points. All too often the first design consideration becomes
the only design consideration.

regards
A.G.

Chris Darnell · Aug 14, 2006

Registered said:
On 12 Aug 2006 20:57:01 -0700, "Chris Darnell" <[email protected]>
wrote:

You probably chose the wrong word with subclass but I pretty much
agree with you.

regards
A.G.

Thanks, A.G. I was a little tired when I wrote that. I reckon the
Driver would NOT be a subclass of Vehicle. It would probably be
considered a "child class".

Chris

Object oriented method is inefficient with database calling

TS

Jon Skeet [C# MVP]

Linda Liu [MSFT]

Jon Skeet [C# MVP]

Bruce Wood

Frans Bouma [C# MVP]

Frans Bouma [C# MVP]

Jon Skeet [C# MVP]

Bruce Wood

Frans Bouma [C# MVP]

Registered User

Chris Darnell

Registered User

Chris Darnell