need for speed

colin

Hi,
I have written a simple app which analyses a large data set:
100 samples a second, spanning many days and up to a year,
which is a lot of points, each point being 6 floats.

So far I've not focused too much on speed, except for the
conversion of the comma-separated values, which was initially painfully slow.

I'm trying to make it a bit faster,
so I can change things like noise rejection and see the results.
It's basically demodulating a signal and then doing an FFT,
but the signal is way below the noise.

So I've run some simple tests.
I'm using an Athlon 64 3200 with WinXP SP2.
With a simple loop I can get about 100 Mflops:

for (double x = 0; x < 1000000; x++)
    result = x * x;

The slowest math part of my code appears to be the use of atan2, which gives me
about 8M/s
when run in a similar loop to the one above. I'm not sure if it's worth trying to get
round this.
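
For anyone wanting to reproduce the atan2 figure, here is a minimal timing sketch in C++. The function name `atan2_calls_per_second` and the `sink` parameter are made up for illustration; the results are accumulated and handed back so the compiler cannot throw the loop away, which is a risk with a loop that only overwrites `result`. The number it reports will depend on the compiler, flags and machine.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical helper: times n calls to std::atan2 and returns calls per second.
// The accumulated value is written to `sink` so the work cannot be optimised out.
double atan2_calls_per_second(std::size_t n, double& sink) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = static_cast<double>(i) + 1.0;
        acc += std::atan2(x, 2.0 * x + 1.0);  // arguments change every iteration
    }
    const auto stop = clock::now();
    sink = acc;
    const double seconds = std::chrono::duration<double>(stop - start).count();
    // Guard against a clock too coarse to measure a very short run.
    return seconds > 0.0 ? static_cast<double>(n) / seconds : 0.0;
}
```

Calling it with a few million iterations and printing the return value gives a rate that can be compared with the 8M/s above.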

The way the points are stored in the list can slow it down a lot
if each is allocated separately, so I have used structs and
then passed the structs by ref in function calls, although using ref
is awkward and I'm not sure it's of any benefit.
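
The same storage idea, sketched in C++ rather than in the app's own language: one contiguous array of small structs, with each struct passed by const reference so nothing is copied or allocated per point. The names `Sample`, `channel_sum` and `total_of_all` are hypothetical, and summing the channels is only a stand-in for the real per-point work.

```cpp
#include <vector>

// Hypothetical point type: 6 floats stored inline, with no per-point allocation.
struct Sample {
    float ch[6];
};

// Takes the struct by const reference, so the 24 bytes are not copied per call.
float channel_sum(const Sample& s) {
    float total = 0.0f;
    for (float v : s.ch) {
        total += v;
    }
    return total;
}

// Walks a contiguous vector of samples in memory order.
double total_of_all(const std::vector<Sample>& samples) {
    double total = 0.0;
    for (const Sample& s : samples) {
        total += channel_sum(s);
    }
    return total;
}
```

For a struct this small, whether passing by reference beats passing by value is something to measure rather than assume; the contiguous layout is likely to be the bigger win.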

Are there any reasonably easy ways to ensure getting the best math
performance?

I know the FP unit can go faster, but only if it is fed with
operations which don't rely on the previous result.
A MAC instruction would be handy, I guess.
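
One common way to feed the FP unit with independent operations is to split a multiply-accumulate loop across several accumulators, so each add depends only on its own running sum. A rough C++ sketch follows; the function name `dot_four_accumulators` and the choice of four accumulators are assumptions for illustration, not something measured on this machine.

```cpp
#include <cstddef>

// Multiply-accumulate over two arrays using four independent running sums.
// Each sN only depends on its own previous value, so the adds can overlap.
double dot_four_accumulators(const double* a, const double* b, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    // Handle the leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        s0 += a[i] * b[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Note that the summation order differs from a single-accumulator loop, so the result can differ in the last few bits for general floating-point data.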

Colin =^.^=
 
colin

Well, I've dropped down to using SSE2 instructions in C++.

If I keep the data set to less than 1 MB it does 1800 Mflops.
If I go much above this it drops to 1/4 of that,
which is then about the same as not using SSE2 at all.

The prefetch instructions seem to have no effect whatsoever.
Any ideas how they're used to good effect?
Or does it just mean I've reached the memory bandwidth?

It's a loop doing the standard deviation of an array of complex numbers.
If I process 1 minute's worth at a time, that would
keep it within the CPU cache.
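
A plain C++ sketch of the block-at-a-time idea, without the SSE2 part: each block gets its mean pass and its deviation pass back to back, so the second pass should find the block still in cache. The function name `blockwise_stddev` is hypothetical, the block length is left to the caller (one minute at 100 samples a second would be 6000), and the population form (divide by N) is an assumption about which standard deviation is wanted.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Returns one population standard deviation per block of `block` samples.
// The final block may be shorter than the others.
std::vector<double> blockwise_stddev(const std::vector<std::complex<double>>& data,
                                     std::size_t block) {
    std::vector<double> out;
    if (block == 0) {
        return out;
    }
    for (std::size_t start = 0; start < data.size(); start += block) {
        const std::size_t end = std::min(start + block, data.size());
        const double count = static_cast<double>(end - start);

        // Pass 1: complex mean of this block.
        std::complex<double> mean(0.0, 0.0);
        for (std::size_t i = start; i < end; ++i) {
            mean += data[i];
        }
        mean /= count;

        // Pass 2: mean squared distance from the mean (std::norm gives |z|^2).
        double sum_sq = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            sum_sq += std::norm(data[i] - mean);
        }
        out.push_back(std::sqrt(sum_sq / count));
    }
    return out;
}
```

Whether this actually recovers the in-cache speed depends on the block fitting in cache alongside anything else the loop touches, so it is worth timing against the single large pass.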

Colin =^.^=
 
