Simple optimization

Bob Cummings

Not sure if this is the correct place or not. Anyhow, in school we were
taught that when trying to calculate the efficiency of an algorithm we
should focus on something called FLOPs, or floating-point operations,
and disregard all integer operations.

My question is this. I am writing a simulation for animal dispersal
through large landscapes. We are loading data from several different
files to simulate the environment the animals will be traveling through.
So for every time step, for every day, for every year, for every animal,
I will have to figure out the animal's new location on the map, then
check what the chance is that the animal was killed by a predator, then,
if it survives that, see what type of food is around and roll a chance
of getting something to eat. Then what type of environment it is moving
through (open prairie might have some species move relatively straight
while others might circle around and stay in the woods), and on and on
and on. Most of these decisions will be determined by comparing a value
against a random number from some (undetermined as of now) random number
generator. So should I have the values be between 0 and 100 and generate
a random integer? Or have the values 0.0 to 1.0 and then generate a
random double in that range?

The scientists are not too worried about how fine a level of detail they
have available for setting the value of finding food or whatever (1 to
100 will work, I am told), so efficiency is the name of the game for
this round.
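For what it's worth, both options are a one-liner with the framework's System.Random class. A minimal sketch (the threshold values and variable names here are made up for illustration, not from the project):

```csharp
using System;

class ChanceCheck
{
    static void Main()
    {
        var rng = new Random();

        // Option 1: integer scale. Next(101) returns an int in 0..100.
        int foodChancePercent = 42;   // hypothetical value from the scientists
        bool foundFood = rng.Next(101) < foodChancePercent;

        // Option 2: double scale. NextDouble() returns a double in [0.0, 1.0).
        double foodChance = 0.42;     // the same chance on the 0.0-1.0 scale
        bool foundFoodToo = rng.NextDouble() < foodChance;

        Console.WriteLine($"{foundFood} {foundFoodToo}");
    }
}
```

The comparison itself costs about the same either way; any difference is in what the generator call does internally.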

Any suggestions or ideas welcome

thanks for your time

cheers

bob
 
Göran Andersson

Bob said:
Not sure if this is the correct place or not. Anyhow, in school we were
taught that when trying to calculate the efficiency of an algorithm we
should focus on something called FLOPs, or floating-point operations,
and disregard all integer operations.

That's a strange way to look at it... What if an algorithm doesn't use
floating point operations at all? Does that mean that you should
consider it to complete instantaneously?
My question is this. I am writing a simulation for animal dispersal
through large landscapes. We are loading data from several different
files to simulate the environment the animals will be traveling through.
So for every time step, for every day, for every year, for every animal,
I will have to figure out the animal's new location on the map, then
check what the chance is that the animal was killed by a predator, then,
if it survives that, see what type of food is around and roll a chance
of getting something to eat. Then what type of environment it is moving
through (open prairie might have some species move relatively straight
while others might circle around and stay in the woods), and on and on
and on. Most of these decisions will be determined by comparing a value
against a random number from some (undetermined as of now) random number
generator. So should I have the values be between 0 and 100 and generate
a random integer? Or have the values 0.0 to 1.0 and then generate a
random double in that range?

The scientists are not too worried about how fine a level of detail they
have available for setting the value of finding food or whatever (1 to
100 will work, I am told), so efficiency is the name of the game for
this round.

Whether you use integers or floating point doesn't really matter that
much. You should be more concerned about how efficient the algorithm is.

The random generator in the framework generates a floating point random
number, so to get an integer random number some more calculations are
needed. On the other hand, integer calculations are slightly faster than
floating point calculations, so it may prove faster once you use the
random number in some calculations.
 
Bruce Wood

Every random number generator I've ever seen uses floating point, which
you can convert to an integer if you want to, but there probably isn't
much point.

If I were you I wouldn't pay any attention to floating point versus
integer. I would pay attention to that time step: what do I have to
calculate for each animal for each "tick" of the clock, and is there
any way to:

1) Design my data structures so that the calculations are really easy.
(For example, if you want to know whether the animal was eaten by a
predator, if you compare the animal against every predator animal on
the whole "board" every time, it will be slow. If you can design your
data so that there is a fast way to determine which predators are
nearby, you can speed things up substantially.)

2) Design my data structures so that it's possible to skip steps for
certain animals. This may not be possible, but the basic idea is that
if an animal has enough to eat in a location for, say, seven days, and
there are no predators nearby, then design some way to skip processing
that animal for seven days. Something like that.
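The first idea, a fast "which predators are nearby" lookup, can be sketched with a coarse grid of cells: each animal scans only its own cell and the neighbouring ones instead of the whole board. The class name and the choice of cell size below are hypothetical, not something from the thread:

```csharp
using System;
using System.Collections.Generic;

class PredatorGrid
{
    private readonly double cellSize;
    private readonly Dictionary<(int, int), List<(double X, double Y)>> cells
        = new Dictionary<(int, int), List<(double X, double Y)>>();

    public PredatorGrid(double cellSize) { this.cellSize = cellSize; }

    private (int, int) CellOf(double x, double y)
        => ((int)Math.Floor(x / cellSize), (int)Math.Floor(y / cellSize));

    public void AddPredator(double x, double y)
    {
        var key = CellOf(x, y);
        if (!cells.TryGetValue(key, out var list))
            cells[key] = list = new List<(double X, double Y)>();
        list.Add((x, y));
    }

    // True if any predator lies within radius of (x, y);
    // only cells that could contain one are scanned.
    public bool PredatorNearby(double x, double y, double radius)
    {
        var (cx, cy) = CellOf(x, y);
        int reach = (int)Math.Ceiling(radius / cellSize);
        for (int dx = -reach; dx <= reach; dx++)
            for (int dy = -reach; dy <= reach; dy++)
                if (cells.TryGetValue((cx + dx, cy + dy), out var list))
                    foreach (var p in list)
                        if ((p.X - x) * (p.X - x) + (p.Y - y) * (p.Y - y)
                            <= radius * radius)
                            return true;
        return false;
    }
}
```

Each tick, an animal would call PredatorNearby with its own coordinates and its "sight" radius; keeping the grid updated as predators move is cheap compared with checking every predator against every animal.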

The bottom line is that real efficiencies come from looking at the
problem and your solution in the large, and seeing where you can play
tricks to avoid doing calculations at all. Once you're in a situation
where you have to do the calculations, you're often past the point
where you can make significant efficiency gains.

That said, there are always exceptions. Your program may be one of
them. However, the best place to start is always the large-scale problem
and algorithm / data structures used to solve it. Once you're convinced
that you can't possibly design your program to avoid work... only then
worry about small-scale efficiencies like whether to use integers or
floats.
 
Bob Cummings

Bruce said:
Every random number generator I've ever seen uses floating point, which
you can convert to an integer if you want to, but there probably isn't
much point.

If I were you I wouldn't pay any attention to floating point versus
integer. I would pay attention to that time step: what do I have to
calculate for each animal for each "tick" of the clock, and is there
any way to:

OK, this is a side job, and I have only been out of school for two
years. Most of my "official" work is not like this. So as I am going
through my design phase and trying to decide what type of data to store,
this seemed like it could make a huge difference. I guess I was wrong;
not the first time or the last.
1) Design my data structures so that the calculations are really easy.
(For example, if you want to know whether the animal was eaten by a
predator, if you compare the animal against every predator animal on
the whole "board" every time, it will be slow. If you can design your
data so that there is a fast way to determine which predators are
nearby, you can speed things up substantially.)

I think we are there already by the way we are loading the data using
shape files. I will only have the animal compare his exact location on
one map then get the values from the other maps at that exact location
for comparison. That way I am not looking all over the entire landscape.
2) Design my data structures so that it's possible to skip steps for
certain animals. This may not be possible, but the basic idea is that
if an animal has enough to eat in a location for, say, seven days, and
there are no predators nearby, then design some way to skip processing
that animal for seven days. Something like that.

My animal does have different behavior modes to simulate some of that.
For example if he just missed getting nailed by a predator he goes into
flee mode for a certain time. No feeding, no looking for a mate, no
nothing but beat feet to get out of here now. If he has enough energy
reserves then he will not be looking for food. I think this is what you
are talking about? One of the parameters we are passing in will be how
far can the animal "see" around him to make some of those decisions, but
I think he is going to have to move every time step if only a little bit.
The bottom line is that real efficiencies come from looking at the
problem and your solution in the large, and seeing where you can play
tricks to avoid doing calculations at all. Once you're in a situation
where you have to do the calculations, you're often past the point
where you can make significant efficiency gains.

Good point. I will bring that up with the scientists who are funding the
project and see if there is any time I can do that and let them make the
decision. I have already discussed this issue with them once and they
are not too concerned about efficiency. Rather they want to try and
find a balance between mimicking real life and gathering data in a
simple enough form to make decisions in the future.
That said, there are always exceptions. Your program may be one of
them. However, the best place to start is always the large-scale problem
and algorithm / data structures used to solve it. Once you're convinced
that you can't possibly design your program to avoid work... only then
worry about small-scale efficiencies like whether to use integers or
floats.

Thanks for the advice. And be careful: with great help like this, I will
be back asking for more.
 
Nick Hounsome

Bob Cummings said:
Not sure if this is the correct place or not. Anyhow, in school we were
taught that when trying to calculate the efficiency of an algorithm we
should focus on something called FLOPs, or floating-point operations,
and disregard all integer operations.

This is only true where you are doing multiplications or divisions; for
all other operations on ordinary machines, integers should be faster.

The advice about only counting FLOPs comes about at least partly because
all the really big number-crunching programs, such as weather
forecasting, atomic physics, and astrophysics, use lots of complex FLOPs
and use vector processors to do multiple FLOPs in parallel, and hence
tend to be rated in FLOPS (FLOPs per second). [If you are using one of
these machines, you should not be asking these questions.]

Regarding your simulation: you clearly haven't thought out exactly what
you want, because the integer range 0-100 contains only 101 discrete
values, whereas 0.0-1.0 contains on the order of 2^62 distinct values in
a double. Your random number generation could well be the most expensive
operation, so knowing what you need is very important. Ideally you could
get away with just using a relatively small array of precomputed random
numbers.
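The precomputed-table idea might look something like this; the table size, seed, and the 0-100 scale are assumptions for illustration only:

```csharp
using System;

class RandomTable
{
    private readonly int[] table;
    private int next;

    public RandomTable(int size, int seed)
    {
        // Fill the table once, up front, using the framework generator.
        var rng = new Random(seed);
        table = new int[size];
        for (int i = 0; i < size; i++)
            table[i] = rng.Next(101);   // 0..100, matching the scientists' scale
    }

    public int Next()
    {
        int value = table[next];
        next = (next + 1) % table.Length;   // wrap around and reuse the table
        return value;
    }
}
```

The catch is that reusing a too-small table risks visible patterns in the simulation, so the size needs some thought relative to how many draws the simulation makes.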

Beware the pitfalls of using just the low bits of a simple integer random
number generator.
 
Bob Cummings

Nick said:
Regarding your simulation: you clearly haven't thought out exactly what
you want, because the integer range 0-100 contains only 101 discrete
values, whereas 0.0-1.0 contains on the order of 2^62 distinct values in
a double.

I understand the idea of more possible values in 0.0-1.0 vs. 0-100; I
just do not have the knowledge and experience to express myself. I guess
the point I was trying to make was that the scientists do not require a
finer scale than 1-100. Hope that makes sense.

Your random number generation could well be the most expensive
operation, so knowing what you need is very important. Ideally you could
get away with just using a relatively small array of precomputed random
numbers.

Again, my ignorance is showing here. I think you are suggesting that at
the beginning of the program I generate, say, 10,000 random numbers and
put them in an array. Then as the program progresses, just grab the next
number in the array. When I run out, fill the array again?


I can see how that would work, especially since between each year's time
period there are maintenance issues such as writing output files and
loading the next year's maps. I am not very versed in using multiple
threads and callbacks, but this might be a case for that?
Beware the pitfalls of using just the low bits of a simple integer random
number generator.

Here I am completely lost on your advice.

Anyhow thanks for you time and thoughts

cheers

bob
 
Göran Andersson

Bob said:
I understand the idea of more possible values in 0.0-1.0 vs. 0-100; I
just do not have the knowledge and experience to express myself. I guess
the point I was trying to make was that the scientists do not require a
finer scale than 1-100. Hope that makes sense.

Again, my ignorance is showing here. I think you are suggesting that at
the beginning of the program I generate, say, 10,000 random numbers and
put them in an array. Then as the program progresses, just grab the next
number in the array. When I run out, fill the array again?

I think that Nick was thinking more along the lines of creating the
array once and using it over and over again. If you just refill the
array after using it once, there is no performance gain. On the
contrary, it uses more operations and more memory to do the same thing.

There is a risk with reusing random numbers, though. If the size of the
array is too small so that you reuse the same numbers many times, you
might begin to see patterns in the simulation.
I can see how that would work, especially since between each year's time
period there are maintenance issues such as writing output files and
loading the next year's maps. I am not very versed in using multiple
threads and callbacks, but this might be a case for that?

As the task in itself doesn't need multithreading to work, using it
would only add complexity and overhead.
Here I am completely lost on your advice.

That also has to do with patterns. A simple integer random generator has
a simple and predictable algorithm. If you use the lower bits of the
value, you will see recurring series of numbers.
 
Bob Cummings

Göran Andersson said:
I think that Nick was thinking more along the lines of creating the
array once and using it over and over again. If you just refill the
array after using it once, there is no performance gain. On the
contrary, it uses more operations and more memory to do the same thing.

There is a risk with reusing random numbers, though. If the size of the
array is too small so that you reuse the same numbers many times, you
might begin to see patterns in the simulation.

That would not do at all. I thought he was suggesting to just initialize
the generator once, make as many numbers as needed, then get rid of it.
And that would be more efficient than asking the generator to make a
number on demand throughout the simulation.
As the task in itself doesn't need multithreading to work, using it
would only add complexity and overhead.

OK, I am clueless about threading except for some examples in
schoolwork. I just thought that if I started a worker thread to generate
the numbers, another to write the files, and another to load up the new
year's data, it might have been better than doing one thing, then the
next, and so forth. I really need a better understanding of threads and
what they would be good for.

That also has to do with patterns. A simple integer random generator has
a simple and predictable algorithm. If you use the lower bits of the
value, you will see recurring series of numbers.


OK, I did encounter that problem in the past on another simulation. I
think I ended up using another number generator.

Thanks again for every ones time and expertise.

cheers

bob
 
Göran Andersson

Lucian said:
Really? Not me. And here, every one of these uses integers:
http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators

I even used a variation of a linear congruential generator that uses a
byte. In C# it would be something like:

byte rnd(byte seed) {
    return (byte)((seed * 251 + 1) & 255);
}

Well, it wasn't in C# that I used it, but in a size optimized Tetris
game written in assembler, weighing in at less than 600 bytes... :)
 
Bruce Wood

You're right: now that I harken back, all of the ones I wrote myself in
university were integer-based. I was thinking only of the commercial
ones that I've seen shipped with frameworks like .NET. They all seem to
be based on floating point and return values within the range 0.0 <=
value < 1.0, although looking at .NET I see that the Random object
offers both.
 
Lucian Wischik

Bob Cummings said:
Not sure if this is the correct place or not. Anyhow, in school we were
taught that when trying to calculate the efficiency of an algorithm we
should focus on something called FLOPs, or floating-point operations,
and disregard all integer operations.

Maybe they taught that just for certain situations? It doesn't apply
here.

So should I have the values be between 0 and 100 and generate a random
integer? Or have the values 0.0 to 1.0 and then generate a random double
in that range?

It costs a certain amount to generate each random bit. If you only
need 7 bits (numbers 0-100) then it would be wasteful to calculate 64
bits (double) and use only the most significant ones.


But really you're approaching this from the wrong angle. For a
stochastic scientific simulation, you need a scientifically acceptable
pseudo-random generator. That means you (or your scientists) need to
review the literature, find what's considered acceptable in the field,
and use that. Later on you can post to sci.maths or some other
appropriate newsgroup and say "I am using the Mersenne random number
generator but it's too slow. Are there any less costly generators
which have equivalent randomness properties?"
 
Göran Andersson

Internally it uses the floating point value to calculate an integer value.
 
Nick Hounsome

Bob Cummings said:
That would not do at all. I thought he was suggesting to just initiate the
generator once, make as many numbers as needed, then get rid of it. And
that would be more efficient then asking the generator to make a number on
demand as needed through out the simulation.

Why wouldn't it do?

Given a reasonably large set and a reasonably complicated simulation to
eliminate any chance of repetition I expect that this will work just fine.

People get too hung up on the word "random". There is no such thing
available, only approximations. You are saying that this is not a good
enough approximation, but without actually saying what your requirements
are.

The only true random number generators that I know of require dedicated
hardware to measure random noise in a physical circuit, and even then a
true random number generator is rarely desirable on its own because,
perversely, it is not guaranteed to generate a "random" sequence of
numbers.

In the old days before cheap computers, people got by perfectly well
using tables of random numbers. The advantage then and now is that you
can check that they actually meet the distribution that you require.
(Incidentally, your simulation should probably be using different
probability distributions in certain areas; the real world doesn't go in
for normal distributions as much as programmers would like.)
OK, I am clueless about threading except for some examples in
schoolwork. I just thought that if I started a worker thread to generate
the numbers, another to write the files, and another to load up the new
year's data, it might have been better than doing one thing, then the
next, and so forth. I really need a better understanding of threads and
what they would be good for.




OK, I did encounter that problem in the past on another simulation. I
think I ended up using another number generator.

For a "simple" integer generator, Random()&0x01 generates
10101010101010101..., which is unlikely to be useful. The general rule
is to use bits from the middle.

If you have a good table of random numbers and you don't need many
choices, you can effectively increase its size by looking at different
bits - e.g. a table of 100 random 32-bit integers can be viewed as 3200
random binary choices.
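The low-bit problem is easy to demonstrate. For a power-of-two linear congruential generator with an odd multiplier and odd increment, the parity flips on every step, so bit 0 strictly alternates. The constants below are the well-known Numerical Recipes pair, used here purely as an example:

```csharp
using System;
using System.Text;

class LowBitDemo
{
    static void Main()
    {
        uint x = 12345;                  // arbitrary seed
        var bits = new StringBuilder();
        for (int i = 0; i < 16; i++)
        {
            // LCG step; the modulo 2^32 comes from uint overflow.
            x = x * 1664525u + 1013904223u;
            bits.Append(x & 1);          // keep only the lowest bit
        }
        Console.WriteLine(bits);         // prints 0101010101010101
    }
}
```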
 
Larry Lard

Nick Hounsome wrote:
[snip]
For a "simple" integer generator, Random()&0x01 generates
10101010101010101..., which is unlikely to be useful. The general rule
is to use bits from the middle.

I thought the rule was to use the top bits? ie, floor(n*R) where R is
in [0..1) and you want an integer in [0..n)
 
Nick Hounsome

Larry Lard said:
Nick Hounsome wrote:
[snip]
For a "simple" integer generator, Random()&0x01 generates
10101010101010101..., which is unlikely to be useful. The general rule
is to use bits from the middle.

I thought the rule was to use the top bits? ie, floor(n*R) where R is
in [0..1) and you want an integer in [0..n)

I'm talking linear congruential INTEGER random number generators here.

See http://en.wikipedia.org/wiki/Linear_congruential_generator
 
Larry Lard

Nick said:
Larry Lard said:
Nick Hounsome wrote:
[snip]
That also has to do with patterns. A simple integer random generator has
a simple and predictable algorithm. If you use the lower bits of the
value, you will see recurring series of numbers.


OK, I did encounter that problem in the past on another simulation. I
think I ended up using another number generator.

For a "simple" integer generator, Random()&0x01 generates
10101010101010101..., which is unlikely to be useful. The general rule
is to use bits from the middle.

I thought the rule was to use the top bits? ie, floor(n*R) where R is
in [0..1) and you want an integer in [0..n)

I'm talking linear congruential INTEGER random number generators here.

See http://en.wikipedia.org/wiki/Linear_congruential_generator

Well, my understanding applies there also. I extremely vaguely recall
some text in Seminumerical Algorithms along the lines of: regard the
output of your integer RNG as binary digits to the *right* of the binary
point, then multiply by n and floor. That Wikipedia article mentions not
using the low bits, but doesn't mention anything about using the middle
bits.
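The multiply-and-floor idea can even be done purely in integers: treat the raw 32-bit output as a binary fraction in [0,1), multiply by n, and keep the high 32 bits of the 64-bit product. A sketch, not tied to any particular generator (n is assumed positive):

```csharp
using System;

class TopBitScaling
{
    // floor(raw / 2^32 * n) computed as (raw * n) >> 32 in 64-bit
    // arithmetic; the result depends mostly on the top bits of raw.
    public static int Scale(uint raw, int n)
    {
        return (int)(((ulong)raw * (ulong)n) >> 32);
    }

    static void Main()
    {
        Console.WriteLine(Scale(0u, 101));            // 0
        Console.WriteLine(Scale(uint.MaxValue, 101)); // 100
    }
}
```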
 
Nick Hounsome

Göran Andersson said:
That's a strange way to look at it... What if an algorithm doesn't use
floating point operations at all? Does that mean that you should consider
it to complete instantaneously?


Whether you use integers or floating point doesn't really matter that
much. You should be more concerned about how efficient the algorithm is.

The random generator in the framework generates a floating point random
number, so to get an integer random number some more calculations are
needed. On the other hand, integer calculations are slightly faster than
floating point calculations, so it may prove faster once you use the
random number in some calculations.

This is just wrong.
Integer multiplication and division is often done either in assembler or
in microcode; in either case it is much slower than using an FPU. Other
operations are indeed always faster.
 
Bob Cummings

Lucian said:
Maybe they taught that just for certain situations? It doesn't apply
here.





It costs a certain amount to generate each random bit. If you only
need 7 bits (numbers 0-100) then it would be wasteful to calculate 64
bits (double) and use only the most significant ones.


But really you're approaching this from the wrong angle. For a
stochastic scientific simulation, you need a scientifically acceptable
pseudo-random generator. That means you (or your scientists) need to
review the literature, find what's considered acceptable in the field,
and use that. Later on you can post to sci.maths or some other
appropriate newsgroup and say "I am using the Mersenne random number
generator but it's too slow. Are there any less costly generators
which have equivalent randomness properties?"


Lucian

thanks for the reply. I actually did use a Mersenne random number
generator in the last simulation I developed. That simulation was
implemented in C and did not cause a problem at all. I did that
simulation as an intern (4 years ago) before the class in school that
started this whole discussion.
 
