Most efficient way to join two collections fetched from a database

C

cpp.mbs

Hi all,

I have two tables in a database: invoices and invoiceItems.
I need to fetch them from a database and create a collection of
invoice entities, each of them associated whith its corresponding
invoice items.

This is what I am doing now:

1. Fech the invoices in a list.
2. Fech the invoiceItems in another list.
3. Loop through the invoice items list and for each item do a inner
loop to search the corresponding invoice and add the item to the
invoice.

To optimize a little I fetch the invoice items sorted by invoiceId
from the database and then in the invoiceItems loop, store always the
last invoice and if the current invoiceItem has the same invoiceId
than the previous, then assign it directly to the last invoice.

But I wonder if there is a more efficient way.

Would it be better to store all the invoices in a dictionary and then
loop through the items and get directly the invoice by its id key from
the dictionary or would be this method a waist of memmory?

thanks!
 
C

cpp.mbs

I mind "memory waste", Sorry for the spelling mistakes, I hit the send
button too soon, before running the spell checker :)
 
S

sloan

See my
Order(s)
OrderDetail(s) example at:


http://sholliday.spaces.live.com/Blog/cns!A68482B9628A842A!140.entry
http://sholliday.spaces.live.com/Blog/cns!A68482B9628A842A!139.entry

.......

But 1 IDataReader with 2 results sets is the short answer.

I could code up
Invoice
InvoiceCollection : List<Invoice>
InvoiceItem
InvoiceItemCollection : List<InvoiceItem>

and put a "sub" collection as a property on Invoice
as in
Invoice.AllInvoiceItems ( which would be a InvoiceItemCollection of course).

...
 
I

Ignacio Machin ( .NET/ C# MVP )

Have you considered using dataset?

Hi,

DataSet are a good idea here.

Otherwise create an Order class, that expose a OrderDetail collection.
Now the good thing about this escenrio is that you only load the
OrderDetails collection if needed, this will save you a bunch of
memory usage
 
C

cpp.mbs

Hi,

DataSet are a good idea here.

Otherwise create an Order class, that expose a OrderDetail collection.
Now the good thing about this escenrio is that you only load the
OrderDetails collection if needed, this will save you a bunch of
memory usage


Hi,

but what method does the DataSet uses internally to relate the
dataTables? Does it loops through all the related table searching the
related items?

This system or having a property OrderDetails that is only populated
when it is called sounds great if you are going to access it in a few
orders, but what if I know that I am going call it in many orders?
Wouldn't be more efficient something like the following code?

(Assuming that the orderlist is sorted by id and the orderDetails list
is sorted by orderId):

int orderIndex = 0;
int ordersLenght = orders.Length;

foreach(OrderDetail orderDetail in orderDetailsList)
{
for(; orderIndex < ordersLength; orderIndex++)
{
Order order = orders[orderIndex];

if(order.Id == orderDetail.OrderId)
{
order.OrderDetails.Add(orderDetail);
break;
}
}
}
 
S

sloan

Did you read my post? And blog entries?

...


The complete code samples are downloadable, and have exactly what you need.
Both 1.1 and 2.0 versions.


...

I use a "Controller" class to handle multiple Orders or Single Orders.





Hi,

DataSet are a good idea here.

Otherwise create an Order class, that expose a OrderDetail collection.
Now the good thing about this escenrio is that you only load the
OrderDetails collection if needed, this will save you a bunch of
memory usage


Hi,

but what method does the DataSet uses internally to relate the
dataTables? Does it loops through all the related table searching the
related items?

This system or having a property OrderDetails that is only populated
when it is called sounds great if you are going to access it in a few
orders, but what if I know that I am going call it in many orders?
Wouldn't be more efficient something like the following code?

(Assuming that the orderlist is sorted by id and the orderDetails list
is sorted by orderId):

int orderIndex = 0;
int ordersLenght = orders.Length;

foreach(OrderDetail orderDetail in orderDetailsList)
{
for(; orderIndex < ordersLength; orderIndex++)
{
Order order = orders[orderIndex];

if(order.Id == orderDetail.OrderId)
{
order.OrderDetails.Add(orderDetail);
break;
}
}
}
 
C

cpp.mbs

Did you read my post? And blog entries?


Hi sloan,

thanks for answering. Yes I did, I dowloaded your code and I this is
the system you use for each order:


// Create a new person location filter,
that will look for the specified CustomerID
Filters.CustomerFilter custFilt = new
Filters.CustomerFilter(currentOrder.CustomerID);
// Create a new predicate, that uses the
FilterByLocation method to determine whether the Customer has been
found
Predicate<BusinessObjects.Customer>
filterByLocation = new
Predicate<BusinessObjects.Customer>(custFilt.FilterByCustomerId );
// Find the Customer in the collection
BusinessObjects.Customer foundCustomer =
coll.Find(filterByLocation);


But this is just looping through all the orders for each customer. I
am looking for something more efficient if there is any. what I would
like to know is the most efficient algorithm to do that if there is
any.
 
J

John Daragon

Hi all,

I have two tables in a database: invoices and invoiceItems.
I need to fetch them from a database and create a collection of
invoice entities, each of them associated whith its corresponding
invoice items.

This is what I am doing now:

1. Fech the invoices in a list.
2. Fech the invoiceItems in another list.
3. Loop through the invoice items list and for each item do a inner
loop to search the corresponding invoice and add the item to the
invoice.

I'm probably being a bit dumb here, but assuming that the database
you're using is relational, I don't understand why you're not using an
SQL join ?

jd
 
S

sloan

Well....I guess the biggest thing is that there is no magic voodoo.

Whether MS does it for you under the covers, or you do it...somewhere there
is a "find the right one" and "now add".

I believe you will find the below code better performing than a (strong)
DataSet with constraints.
My 1.1 example actually shows that.

The looping on the IDataReader is probably your biggest performance killer.
But if you want to talk to a database, that's about the best way to do it.

You should also give a range on how many parent and child records you're
dealing with.

...

But again. Just because you pass off the work to under the covers MS
code..doesn't necessarily mean it is going to behave better.

I almost never use strong datasets anymore (esp one with constraints) since
I find it slows down a bulk population.

...

You can go here an look at (what should be a mimic of) MS code.
http://www.koders.com/zeitgeist/csharp/
 
C

cpp.mbs

yes, I also found that standard datasets perform really well. But the
database part was just an example to explain what I am really curious
about:

having two collection of objects, like order- orderDetails, find an
efficient algorithm to distribute the orderDetails between the orders.

If the collections are unsorted then I guess that, as you say, there
is no magic, you must loop through all items, but assuming that both
are sorted then you could use some technique to find items like
"...compare with the item in the middle, if lower the compare with the
item in the middle of the first half, ....etc " or some code like in
my previous post.

Also it would be interesting to know if it could be faster to sort
before the collections and then distribute the items than looping
trough all the orders with an inner loop for all the orderDetails.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top