Latency problem with linq queries, when having a big DataTable inmemory

M

Matthijs de Z

Hi,


I’ve got a “strange” problem.

Because I wanted to reduce unnecessary DB interaction between C# and
MySQL, I load all the data from a mySQL table in a C# DataTable
(AllData). From thereon I use this DataTabe AllData to get my
information.

I Query AllData to get some information I need. The output of that
query is again used to get the max of myDate.

When there are just two instruments in the table, with about 1250 rows
each, all of the code runs smooth.

EndTicker1 is between 0 and 16;

EndTicker2 is between 0 and 16;

EndTicker3 is between 32 and 48;



If I make AllData bigger, by not inserting 2 instruments, but let say
1000….all of a sudden…the used linq queries are slow as hell, while
they still use the result of the join linq, which still contains 1250
rows (because NumberOfROwsPerInstruments is set at 1250). There is
more then enough memory in the system available, so I do not really
understand why it takes like forever (about 2300 ms) to do some simple
querying on a set that is equally big as before.

Below I’ve pasted some example code I used for testing and figuring
out when the problems occurred.

How can I keep up the pace of the min / max queries (and any other
that use the result of myPairInfoData) with a “big” DataTable in the
memory?

By the way: The figures are in debug mode. I don’t log the latency
results while in release mode, but it doesn’t look significantly
better.
Kind regards,



Matthijs





using System;

using System.Collections.Generic;

using System.ComponentModel;

using System.Data;

using System.Drawing;

using System.Linq;

using System.Text;

using System.Windows.Forms;



namespace LinqJoinTest

{

public partial class Form1 : Form

{

private int NumberOfInstruments = 2;

private int NumberOfROwsPerInstruments = 1250;



public Form1()

{

InitializeComponent();

}



private void button1_Click(object sender, EventArgs e)

{

DataTable MyTable = new DataTable();

MyTable.Columns.Add("myDate", typeof(int));

MyTable.Columns.Add("InstrID", typeof(int));

MyTable.Columns.Add("NormPrice", typeof(double));



for (int j = 1; j < NumberOfInstruments+1; j++)

{

for (int i = 1; i < NumberOfROwsPerInstruments+1; i++)

{

DataRow MyNewRow = MyTable.NewRow();

MyNewRow["myDate"] = 20090901 + i;

MyNewRow["InstrID"] = j;

MyNewRow["NormPrice"] = 10 + i * 0.1 * Math.Sin
(j*i);

MyTable.Rows.Add(MyNewRow);

}

}



//join solution provided by Alberto Poblacion. Works fine.

var DataInstr1 = from x in MyTable.AsEnumerable()

where x.Field<int>("instrID") == 1

select new

{

myDate = x.Field<int>("myDate"),

NormPrice = x.Field<double>
("NormPrice")

};





var DataInstr2 = from x in MyTable.AsEnumerable()

where x.Field<int>("instrID") == 2

select new

{

myDate = x.Field<int>("myDate"),

NormPrice = x.Field<double>
("NormPrice")

};



var result = from v1 in DataInstr1

join v2 in DataInstr2 on v1.myDate equals
v2.myDate

select new

{

MyDate = v1.myDate,

PriceInstr1 = v1.NormPrice,

PriceInstr2 = v2.NormPrice,

Ratio = v1.NormPrice / v2.NormPrice

};



var myTest = (from x in result

orderby x.MyDate descending

select new

{

x.MyDate

}).Take(1);





int startTicker1 = Environment.TickCount;

int myIntForeachTest1 = 0;

foreach (var x in myTest)

{

myIntForeachTest1 = x.MyDate;

}

int EndTicker1 = Environment.TickCount-startTicker1;

EndTicker1 = EndTicker1;





IEnumerable<myPairInfo>

myPairInfoData = from v1 in DataInstr1.AsEnumerable()

join v2 in DataInstr2.AsEnumerable() on
v1.myDate equals v2.myDate

select new myPairInfo

{

myDate = v1.myDate,

PriceInstr1 = v1.NormPrice,

PriceInstr2 = v2.NormPrice,

Ratio = v1.NormPrice / v2.NormPrice

};





int startTicker2 = Environment.TickCount;

int myMaxDateint = (from PairInfo in myPairInfoData

orderby PairInfo.myDate descending

select PairInfo.myDate).Take(1).Single
();

int EndTicker2 = Environment.TickCount - startTicker2;

EndTicker2 = EndTicker2;





int startTicker3 = Environment.TickCount;

var myMaxDateVar = (from PairInfo in myPairInfoData

orderby PairInfo.myDate descending

select PairInfo.myDate).Take(1);



foreach (int x in myMaxDateVar)

{

myIntForeachTest1 = x;

}

int EndTicker3 = Environment.TickCount - startTicker3;

EndTicker3 = EndTicker3;





}

}



class myPairInfo

{

public int myDate { get; set; }

public double PriceInstr1 { get; set; }

public double PriceInstr2 { get; set; }

public double Ratio { get; set; }

}



class RawDataFromDB

{

public int myDate { get; set; }

public int InstrID { get; set; }

public double NormPrice { get; set; }

}

}
 
G

Gregory A. Beamer

Because I wanted to reduce unnecessary DB interaction between C# and
MySQL, I load all the data from a mySQL table in a C# DataTable
(AllData). From thereon I use this DataTabe AllData to get my
information.

The problem is when you mix the words "unnecessary data interaction" with
"data interaction". In general, you are better to have more small data
trips than one big one. the reason is, with perhaps the exception of
reporting, a user can only deal with a small amount of data at one time.

Whether you are taking one trip at the beginning or a huge trip every time,
you are wasting time pulling some data that will not be used in your app.

A better strategy, at least in most cases, is caching the small trip so you
do not have to pull from the database when the same question is asked. The
trip is still small, but you have eliminated the "often" part of the
equation (which seems to be your real issue) by caching.

Peace and Grace,

--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 
M

Matthijs de Z

Thank you both for your comments / suggestions. I'll give it a try
with List or use the database again.
regards,

Matthijs
 
M

Matthijs de Z

Just did a test using a List... it's about 100 times faster than a
DataTable (containing the same data). nice ;-)
 
M

Matthijs de Z

Just did a test using a List... it's about 100 times faster than a
DataTable (containing the same data). nice ;-)
 
M

Matthijs de Z

Just did a test using a List... it's about 100 times faster than a
DataTable (containing the same data). nice ;-)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

SQL to Linq with join 2
Linq in C# 3

Top