Any performance tips?

colin

Hi,
I have several gigabytes of CSV files (comma-separated values, in text).
I find converting them to numbers a bit slow;
converting to double or int seems about the same.
Using Split seems to take longer than Substring,
and there seems to be no alternative to Parse/TryParse,
although I haven't tested to see which of those is faster.
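A minimal sketch of parsing one line of five comma-separated doubles without string.Split, assuming a modern .NET (Core 2.1 or later, well after this thread) where double.Parse accepts a ReadOnlySpan<char>. On older frameworks the same IndexOf loop works with Substring, at the cost of one small string per field. CsvLineParser and ParseDoubles are hypothetical names.

```csharp
using System;
using System.Globalization;

static class CsvLineParser
{
    // Hypothetical helper: fills `fields` from a line such as "1.5,2,-3.25,4e2,0".
    // Walks the line with IndexOf instead of string.Split, so no string[] or
    // per-field strings are allocated; each field is parsed straight from a span.
    public static void ParseDoubles(string line, double[] fields)
    {
        ReadOnlySpan<char> rest = line.AsSpan();
        for (int i = 0; i < fields.Length; i++)
        {
            int comma = rest.IndexOf(',');
            ReadOnlySpan<char> token = comma < 0 ? rest : rest.Slice(0, comma);

            // InvariantCulture pins '.' as the decimal separator regardless of
            // the machine's regional settings.
            fields[i] = double.Parse(token, NumberStyles.Float, CultureInfo.InvariantCulture);

            rest = comma < 0 ? ReadOnlySpan<char>.Empty : rest.Slice(comma + 1);
        }
    }
}
```

If a line has fewer fields than expected, the empty remainder makes double.Parse throw a FormatException; extra fields are ignored.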

It reads the file at about 1 million lines a second, which is probably
as fast as the system will allow,
but it only converts about 100k lines a second with 5 numbers in each line.
That seems rather slow: 500k numbers a second on a ~2 GHz CPU works out to
roughly 4,000 clock cycles per conversion.
I'm running on a 64-bit AMD at ~2 GHz with 3 GB RAM.
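If the fields are plain integers, one way to shave cycles is a hand-rolled parser that skips the culture, whitespace, and overflow handling that int.Parse and double.Parse have to do. This is a sketch only: FastInt.ParseInt is a hypothetical helper that accepts nothing but an optional leading minus sign and ASCII digits, and it does not detect overflow. Whether it actually beats int.Parse on a given runtime is worth measuring rather than assuming.

```csharp
using System;

static class FastInt
{
    // Hypothetical helper: parses an optionally signed decimal integer from
    // s[start..end). No culture, whitespace, or overflow handling.
    public static int ParseInt(string s, int start, int end)
    {
        bool negative = false;
        if (start < end && s[start] == '-')
        {
            negative = true;
            start++;
        }
        if (start >= end) throw new FormatException("Empty number.");

        int value = 0;
        for (int i = start; i < end; i++)
        {
            int digit = s[i] - '0';
            // Casting to uint turns any non-digit (including negatives) into a value > 9.
            if ((uint)digit > 9) throw new FormatException("Not a digit: " + s[i]);
            value = value * 10 + digit;
        }
        return negative ? -value : value;
    }
}
```

The field boundaries would come from the same kind of IndexOf(',') scan, so no substrings are created at all.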

I do this in assembler on my PIC MCU in a lot less, lol.

The number crunching, which is quite involved, can also do about 100k lines a
second.

I have separate threads for each step, so I can delay the start of the next
thread to see how fast each step is.
Otherwise it's processed and displayed as it comes in.
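The thread-per-step arrangement is a producer/consumer pipeline. A minimal sketch using BlockingCollection<T> (available from .NET 4.0, so newer than this thread) with a bounded queue, so a fast reader blocks instead of running arbitrarily far ahead of a slower converter. The in-memory string[] stands in for the file reader, and TwoStagePipeline and Run are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;
using System.Threading.Tasks;

static class TwoStagePipeline
{
    // Returns the sum of every field so the result can be checked.
    public static double Run(string[] sourceLines)
    {
        // Bounded: Add blocks once 10,000 lines are waiting to be converted.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 10000))
        {
            // Stage 1: reading (an array here; File.ReadLines(path) in practice).
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string line in sourceLines)
                        queue.Add(line);
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumer's loop finish
                }
            });

            // Stage 2: converting text to numbers.
            double total = 0;
            Task converter = Task.Run(() =>
            {
                foreach (string line in queue.GetConsumingEnumerable())
                    foreach (string field in line.Split(','))
                        total += double.Parse(field, CultureInfo.InvariantCulture);
            });

            Task.WaitAll(reader, converter);
            return total;
        }
    }
}
```

Timing each stage separately is then a matter of measuring how long each task runs, without having to delay the next thread by hand.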

I could do with speeding up the number crunching too.
I found it considerably faster to use structs instead of classes
to organize the data, which is a couple of complex numbers per point plus a
DateTime.
The data is stored in chunks in a List<>.
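One likely reason structs helped: a List<T> of structs keeps every record inline in one backing array, whereas a List<T> of classes holds references to separately allocated objects, each with its own object header for the garbage collector to track. A sketch of such a record, using System.Numerics.Complex (added in .NET 4.0) for the two complex values; Sample, SampleChunk, and their members are hypothetical names. One catch with structs in a List<T>: the indexer returns a copy, so an element has to be read, changed, and written back.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical record: one timestamp plus two complex values, stored by value.
readonly struct Sample
{
    public readonly DateTime Time;
    public readonly Complex A;
    public readonly Complex B;

    public Sample(DateTime time, Complex a, Complex b)
    {
        Time = time;
        A = a;
        B = b;
    }

    // Returns a modified copy; the original is unchanged.
    public Sample WithA(Complex a) => new Sample(Time, a, B);
}

static class SampleChunk
{
    // Preallocating roughly one hour's worth (~150k records) avoids repeated regrowth.
    public static List<Sample> Create(int expectedCount) => new List<Sample>(expectedCount);

    // chunk[i] returns a copy of the struct, so the change must be written back.
    public static void ScaleA(List<Sample> chunk, double factor)
    {
        for (int i = 0; i < chunk.Count; i++)
            chunk[i] = chunk[i].WithA(chunk[i].A * factor);
    }
}
```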

But there are lots of function calls, as it's quite structured, with a complex
number class I copied from somewhere.
How come C# doesn't have its own complex class, or even a complex variable type?
Do simple functions get inlined, or is this not happening in debug?
I read the JIT makes up its own mind about inlining.
Structures get passed by value on the stack, though.
I tried using ref but got into trouble, as then you can't pass values from
function calls directly.
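On the inlining and ref questions: the JIT does decide inlining itself, and a debug build (or starting under the debugger) normally turns JIT optimizations off, so timings are best taken from a release build run without the debugger attached. Two later additions address the rest: MethodImplOptions.AggressiveInlining (.NET 4.5) asks the JIT to inline a method, though it remains a hint, and C# 7.2 "in" parameters pass a struct by read-only reference while, unlike ref, still accepting a value such as a method's return value directly (the compiler copies it into a temporary). A sketch; ComplexMath, MulConj, and Demo are hypothetical names.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

static class ComplexMath
{
    // Returns a * conj(b). "in" passes each 16-byte Complex by read-only
    // reference instead of copying it; AggressiveInlining is only a hint.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Complex MulConj(in Complex a, in Complex b) =>
        new Complex(a.Real * b.Real + a.Imaginary * b.Imaginary,
                    a.Imaginary * b.Real - a.Real * b.Imaginary);

    public static Complex Demo()
    {
        Complex x = new Complex(1, 2);
        // Unlike ref, an "in" argument may be a call result or other expression.
        return MulConj(x, Complex.FromPolarCoordinates(1.0, 0.0));
    }
}
```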

I also run into memory problems. I need to keep all the data I read in, as it
takes a long time to convert it,
then do statistical noise reduction and more than one FFT,
and be able to change parameters and see the results quickly.
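For reference, a minimal in-place radix-2 Cooley-Tukey FFT over System.Numerics.Complex, assuming the block length is a power of two. It is a sketch for checking results rather than a tuned implementation, and a maintained numerics library is likely to be faster. Fft.Forward is a hypothetical name; the transform is unnormalized and uses the e^(-2*pi*i*n*k/N) sign convention.

```csharp
using System;
using System.Numerics;

static class Fft
{
    // In-place forward FFT (no 1/N scaling). data.Length must be a power of two.
    public static void Forward(Complex[] data)
    {
        int n = data.Length;
        if (n == 0 || (n & (n - 1)) != 0)
            throw new ArgumentException("Length must be a power of two.");

        // Bit-reversal permutation so the butterflies can run in place.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) (data[i], data[j]) = (data[j], data[i]);
        }

        // Combine sub-transforms of size 2, 4, ..., n.
        for (int len = 2; len <= n; len <<= 1)
        {
            double angle = -2 * Math.PI / len;
            Complex wLen = new Complex(Math.Cos(angle), Math.Sin(angle));
            int half = len / 2;
            for (int start = 0; start < n; start += len)
            {
                Complex w = Complex.One;
                for (int k = 0; k < half; k++)
                {
                    Complex u = data[start + k];
                    Complex v = data[start + k + half] * w;
                    data[start + k] = u + v;
                    data[start + k + half] = u - v;
                    w *= wLen;
                }
            }
        }
    }
}
```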

I store the files in 1-hour chunks, which is about 150k records.
I found it slightly better to use an array[] and then resize the array when it
is finished;
changing the initial size of the array, and the increment if it's too small,
considerably affects the maximum memory it uses.
I managed to get the memory down to about 115% of the size of the data I have.
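The effect of the initial size and increment has a simple cause: a resize allocates the new array and copies into it while the old one is still alive, so peak memory during a resize is roughly old plus new. A sketch of a growable buffer, assuming the expected chunk size (about 150k records per hour file) is known up front: start at that estimate, double only if it is exceeded, and copy to an exact-size array at the end. ChunkBuilder<T> is a hypothetical name. List<T> with an initial capacity behaves similarly, though its TrimExcess() only trims when less than about 90% of the capacity is in use.

```csharp
using System;

// Hypothetical growable buffer for one chunk of records.
sealed class ChunkBuilder<T>
{
    private T[] _items;
    private int _count;

    public ChunkBuilder(int estimatedCapacity)
    {
        _items = new T[Math.Max(1, estimatedCapacity)];
    }

    public int Count => _count;

    public void Add(T item)
    {
        if (_count == _items.Length)
        {
            // Old and new arrays coexist during the copy: peak is about 3x the old size.
            Array.Resize(ref _items, Math.Max(4, _items.Length * 2));
        }
        _items[_count++] = item;
    }

    // Copies into an array of exactly Count elements; the oversized buffer becomes garbage.
    public T[] ToTrimmedArray()
    {
        var result = new T[_count];
        Array.Copy(_items, result, _count);
        return result;
    }
}
```

With a good estimate the buffer never grows, so the only transient overhead is the final trim copy.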

Although I only have a few days' data at the moment,
with a year's worth I think I'm going to run into problems;
it's probably going to be more than 4 GB.
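A rough check of the year's-worth estimate, assuming each record holds two complex doubles (16 bytes each) plus an 8-byte timestamp, i.e. 40 bytes, and ignoring any list or object overhead:

```latex
\begin{aligned}
N &= 150\,000~\tfrac{\text{records}}{\text{h}} \times 24~\tfrac{\text{h}}{\text{day}} \times 365~\text{days} = 1.314 \times 10^{9}~\text{records} \\
M &\approx 1.314 \times 10^{9} \times 40~\text{B} \approx 5.3 \times 10^{10}~\text{B} \approx 53~\text{GB}
\end{aligned}
```

Under that assumption a full year is more than ten times 4 GB, and far beyond 3 GB of RAM, which suggests keeping most of it on disk in binary form and loading only the range being worked on.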

Maybe I should store the input data as binary numbers, or is there a way to
speed this up?
I would need to store both.
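Storing both is straightforward: convert the CSV once, then write a binary copy that can be reloaded without any text parsing. A sketch using BinaryWriter and BinaryReader with a hypothetical Record layout (the timestamp as DateTime ticks plus two complex values as four doubles), so each record is a fixed 40 bytes on disk; Record, RecordFile, Save, and Load are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical fixed-size record: 8 + 4 * 8 = 40 bytes on disk.
struct Record
{
    public long Ticks;          // DateTime.Ticks of the sample time
    public double ARe, AIm;     // first complex value
    public double BRe, BIm;     // second complex value
}

static class RecordFile
{
    const int RecordSize = 40;

    public static void Save(string path, Record[] records)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            foreach (Record r in records)
            {
                writer.Write(r.Ticks);
                writer.Write(r.ARe); writer.Write(r.AIm);
                writer.Write(r.BRe); writer.Write(r.BIm);
            }
        }
    }

    public static Record[] Load(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            // Fixed record size means the count follows from the file length.
            var records = new Record[(int)(reader.BaseStream.Length / RecordSize)];
            for (int i = 0; i < records.Length; i++)
            {
                records[i].Ticks = reader.ReadInt64();
                records[i].ARe = reader.ReadDouble(); records[i].AIm = reader.ReadDouble();
                records[i].BRe = reader.ReadDouble(); records[i].BIm = reader.ReadDouble();
            }
            return records;
        }
    }
}
```

A DateTime can be rebuilt from the stored value with new DateTime(record.Ticks).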

Maybe I've just thought of an idea: a binary cache.
Could I simply map the file into memory and make it an array of structs?
I've done something like this before in C++.
I guess it would need to be unsafe code.
Is there much scope for speeding things up with unsafe code?
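Mapping the file does not need unsafe code on .NET 4.0 or later: System.IO.MemoryMappedFiles can map the binary cache, and a view accessor can copy a run of fixed-layout structs out in one call, so only the range being analysed has to be in RAM. A sketch, assuming a hypothetical 40-byte Record stored as a long followed by four doubles in declaration order on a little-endian machine; MappedRecords and ReadRange are hypothetical names. Unsafe pointers could avoid even this copy, but the bigger saving is probably skipping text parsing altogether.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical 40-byte record; fields laid out in declaration order, matching the file.
[StructLayout(LayoutKind.Sequential)]
struct Record
{
    public long Ticks;                 // DateTime.Ticks of the sample
    public double ARe, AIm, BRe, BIm;  // two complex values
}

static class MappedRecords
{
    const int RecordSize = 40; // one 8-byte long + four 8-byte doubles

    // Copies records [first, first + count) out of the mapped file.
    public static Record[] ReadRange(string path, long first, int count)
    {
        // Check against the real file length: the view's Capacity can be
        // rounded up to a page boundary, so it is not a reliable record count.
        long available = new FileInfo(path).Length / RecordSize;
        if (first < 0 || count < 0 || first + count > available)
            throw new ArgumentOutOfRangeException(nameof(count), "Range is outside the file.");

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
        {
            var result = new Record[count];
            view.ReadArray(first * RecordSize, result, 0, count);
            return result;
        }
    }
}
```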

I'm also wondering if it's worth going for a 64-bit version of WinXP.
Has anyone been down this route with C# and knows if it's that much better,
or whether it has any disadvantages? I know some things aren't compatible.
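Whichever OS is installed, a .NET process only gets the larger address space if it actually runs as 64-bit: an AnyCPU build (without "Prefer 32-bit") runs as 64-bit on a 64-bit OS, while an x86 build does not. A small check, assuming .NET 4.0 or later for Environment.Is64BitProcess; PlatformInfo is a hypothetical name. Note that on .NET Framework a single array is still limited to 2 GB by default even in a 64-bit process (the gcAllowVeryLargeObjects setting, from 4.5, lifts that), so keeping the data in chunks still helps.

```csharp
using System;

static class PlatformInfo
{
    // True when running as a 64-bit process (8-byte pointers).
    public static bool Is64Bit() => Environment.Is64BitProcess;

    public static string Describe() =>
        $"OS 64-bit: {Environment.Is64BitOperatingSystem}, " +
        $"process 64-bit: {Environment.Is64BitProcess}, pointer size: {IntPtr.Size} bytes";
}
```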

Thanks,
Colin =^.^=
 
Nicholas Paldino [.NET/C# MVP]

Colin,

If you can do this in assembler, then create that code, compile it into
a DLL which exports a function, and then call that function through the
P/Invoke layer in .NET. That would be the best solution. Why rewrite code
when you already have a solution you can just plug into?

Also, you asked why C# doesn't have a complex class or complex variable
type. What exactly do you mean by that? A class definition can be as
simple or as complex as one needs it to be, or are you thinking of something
else.

It seems like you have the solutions already in other code bases which
satisfy your needs. I would find a way to interact with those code bases,
instead of trying to reinvent the wheel, which is what you are trying to
do here.

Hope this helps.

--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

colin

Erm, the assembler code is in fact for a different machine:
a small PIC microcontroller.

The PIC generates the data and sends it to the PC;
the PC does the heavy number crunching and display.

It kinda makes my head spin switching between assembler and C#, lol.

Oh wait, confusion here: by complex I mean a number type consisting of a real
and an imaginary part.
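A built-in type for exactly that arrived later: System.Numerics.Complex, added in .NET Framework 4.0, is a struct holding a real and an imaginary double, with the arithmetic operators plus members such as Magnitude, Phase, Conjugate, and FromPolarCoordinates. A short sketch of the kind of per-point calculation it covers; ComplexDemo and PowerSpectrum are hypothetical names.

```csharp
using System;
using System.Numerics;

static class ComplexDemo
{
    // Hypothetical helper: |z|^2 for each FFT bin, computed as z * conj(z).
    // The imaginary part of z * conj(z) is zero, so only Real is kept.
    public static double[] PowerSpectrum(Complex[] bins)
    {
        var power = new double[bins.Length];
        for (int i = 0; i < bins.Length; i++)
            power[i] = (bins[i] * Complex.Conjugate(bins[i])).Real;
        return power;
    }
}
```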

I'm trying to get my head round how to transfer an array of structs to and
from a file.
I've got it to write to the file OK using
Marshal.UnsafeAddrOfPinnedArrayElement(), copying it to a byte[] and using
File.Write(), which I found on the web, but is there any way to bypass the
copy to/from the byte[] array?
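On current .NET (Core 2.1 or later, so well after this thread) the intermediate byte[] can be skipped: MemoryMarshal.AsBytes reinterprets a span of structs as a span of bytes over the same memory, and FileStream.Write and Read accept spans directly. A sketch, assuming a hypothetical fixed-layout Record struct with no reference-type fields (AsBytes throws if the struct contains references); RawRecordIO is a hypothetical name. The file holds the struct's in-memory bytes, so it is only portable between machines with the same endianness and layout.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical 40-byte record with no reference-type fields.
[StructLayout(LayoutKind.Sequential)]
struct Record
{
    public long Ticks;
    public double ARe, AIm, BRe, BIm;
}

static class RawRecordIO
{
    // Writes the array's memory straight to the file: no pinning, no byte[] copy.
    public static void Write(string path, Record[] records)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
            fs.Write(MemoryMarshal.AsBytes(records.AsSpan()));
    }

    // Reads the file directly into a Record[] viewed as bytes.
    public static Record[] Read(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var records = new Record[(int)(fs.Length / Unsafe.SizeOf<Record>())];
            Span<byte> bytes = MemoryMarshal.AsBytes(records.AsSpan());

            // Read may return fewer bytes than requested, so loop until full.
            int total = 0;
            while (total < bytes.Length)
            {
                int n = fs.Read(bytes.Slice(total));
                if (n == 0) throw new EndOfStreamException();
                total += n;
            }
            return records;
        }
    }
}
```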

Colin =^.^=
 
Guest

Colin,
If you need to do complex math, FFTs, etc., take a look at MathNet:
http://mathnet.opensourcedotnet.info/Default.aspx
Peter
--
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short urls & more: http://ittyurl.net
