Random Extension

S

shapper

Hello,

I am using the following extension to get one random item from an
IEnumerable<T>:

public static T Random<T>(this IEnumerable<T> collection) {

if (collection.Count<T>() == 0) return default(T);
Int32 position = random.Next(collection.Count<T>());
return collection.ElementAt(position);

} // Random


Is it possible, and does it make sense (I think it does), to transform
my extension to be able to use it as follows:

.... collection.Random(p => p.Country == "US");

So I would like to get one random element but only from the ones which
country is equal to US.

Another extra option would be select how many random records to be
returned (if not one) but in this case I am not sure if this is
possible.

Thank you,
Miguel
 
A

Alberto Poblacion

shapper said:
[...]
public static T Random<T>(this IEnumerable<T> collection) {
[...]
Is it possible, and does it make sense (I think it does), to transform
my extension to be able to use it as follows:

... collection.Random(p => p.Country == "US");

So I would like to get one random element but only from the ones which
country is equal to US.

You could just filter the collection by means of the Where extension
method before taking a random element:

public static T Random<T>(this IEnumerable<T> collection, Func<T, bool>
predicate) {
IEnumerable<T> filteredCollection = collection.Where(predicate);
if (filteredCollection.Count<T>() == 0) return default(T);
Int32 position = random.Next(filteredCollection.Count<T>());
return filteredCollection.ElementAt(position);
}

Another extra option would be select how many random records to be
returned (if not one) but in this case I am not sure if this is
possible.

Why wouldn't it be possible? Just add another parameter with the number
of records, and repeat the previous function as needed, building a new
collection. Of course, the return type would need to be changed from a
single element into a type that can return various elements (such as, again,
IEnumerable<T>):

public static IEnumerable<T> Random<T>(this IEnumerable<T> collection,
Func<T, bool> predicate, int numrecords) {
foreach (int i=0; i<numrecords; i++)
{
yield return collection.Random(predicate);
}
}

I've built this one in terms of the preceding overload for the same
function. This will not be particularly efficient, but it is easy to write.
Not that this version can return the same element of the collection several
times over, which may or may not be what you want.
 
P

Peter Duniho

shapper said:
[...]
Is it possible, and does it make sense (I think it does), to transform
my extension to be able to use it as follows:

.... collection.Random(p => p.Country == "US");

So I would like to get one random element but only from the ones which
country is equal to US.

IMHO, it makes more sense to just pass the extension method an
enumeration that's already been filtered. For example:

collection.Where(p => p.Country == "US").Random();

The Random() method should be simple, do one thing, and do it well (in
fact, I would have your extension method throw an exception if the
collection is empty...even returning the "default(T)" is somewhat of an
overload in behavior, as there's no way to distinguish between an
existing default value returned because it was actually in the
collection, and returned because the collection was empty).
Another extra option would be select how many random records to be
returned (if not one) but in this case I am not sure if this is
possible.

It certainly is possible. It just depends on what you really want the
method to do. As was explained last time you were asking about this
particular issue, first you need to decide what it means to return more
than one element. That is, do you want to select a random element N
times, or do you want N unique random elements?

Pete
 
S

shapper

    Why wouldn't it be possible? Just add another parameter with the number
of records, and repeat the previous function as needed, building a new
collection. Of course, the return type would need to be changed from a
single element into a type that can return various elements (such as, again,
IEnumerable<T>):

    public static IEnumerable<T> Random<T>(this IEnumerable<T> collection,
Func<T, bool> predicate, int numrecords) {
      foreach (int i=0; i<numrecords; i++)
      {
          yield return collection.Random(predicate);
      }
    }

    I've built this one in terms of the preceding overload for the same
function. This will not be particularly efficient, but it is easy to write.
Not that this version can return the same element of the collection several
times over, which may or may not be what you want.

The functionally I was looking would be:

collection.Random(10);

If the number of records in the collection is 10 or fewer then return
all records.
If the collection contains more than 10 records then return 10 random
records but all different. Not repeated.
 
S

shapper

IMHO, it makes more sense to just pass the extension method an
enumeration that's already been filtered.  For example:

     collection.Where(p => p.Country == "US").Random();

Yes, I understand. But consider for example FirstOrDefault.
If I am not worng it can used:

collection.Where(p => p.Country == "US").FirstOrDefault();

or

collection.FirstOrDefault(p => p.Country == "US");

This is why I was considering this approach.
The Random() method should be simple, do one thing, and do it well (in
fact, I would have your extension method throw an exception if the
collection is empty ...

That makes sense. I didn't think about it.
even returning the "default(T)" is somewhat of an
overload in behavior, as there's no way to distinguish between an
existing default value returned because it was actually in the
collection, and returned because the collection was empty).


It certainly is possible.  It just depends on what you really want the
method to do.  As was explained last time you were asking about this
particular issue, first you need to decide what it means to return more
than one element.  That is, do you want to select a random element N
times, or do you want N unique random elements?

Please, just see my previous answer to Alberto.

Thank You,
Miguel
 
P

Peter Duniho

shapper said:
[...]
If the collection contains more than 10 records then return 10 random
records but all different. Not repeated.

Then you want a shuffle. Take the original collection, shuffle it,
return the first N records in the shuffled result. See the previous
discussion for more details.
 
G

Göran Andersson

shapper said:
Hello,

I am using the following extension to get one random item from an
IEnumerable<T>:

public static T Random<T>(this IEnumerable<T> collection) {

if (collection.Count<T>() == 0) return default(T);
Int32 position = random.Next(collection.Count<T>());
return collection.ElementAt(position);

} // Random


Is it possible, and does it make sense (I think it does), to transform
my extension to be able to use it as follows:

... collection.Random(p => p.Country == "US");

So I would like to get one random element but only from the ones which
country is equal to US.

Another extra option would be select how many random records to be
returned (if not one) but in this case I am not sure if this is
possible.

Thank you,
Miguel

Excuse me for saying it, but that's pretty inefficient. Do you realise
that you are looping through the entire collection by average two and a
half times to get that random item? If the source happened to be a
database query, it would run the query three times against the database...

I just happen to have an extension lying around that does this in a more
efficient way:

public static T PickRandomOne<T>(this IEnumerable<T> list, Random rnd) {
T picked = default(T);
int cnt = 0;
foreach (T item in list) {
if (rnd.Next(++cnt) == 0) {
picked = item;
}
}
return picked;
}

It loops through the items and check for each item if it should keep it
or not. As it doesn't have to know how many items there are, it only has
to loop through the items once.

To get more than one item from the collection, you just store them in a
list instead of a single variable:

public static IEnumerable<T> PickRandom<T>(this IEnumerable<T> list, int
max, Random rnd) {
List<T> picked = new List<T>(max);
int cnt = 0;
foreach (T item in list) {
cnt++;
if (picked.Count < max) {
picked.Add(item);
} else {
int index = rnd.Next(cnt);
if (index < max) {
picked.RemoveAt(index);
picked.Add(item);
}
}
}
return picked;
}

To get random items using a condition, you just use the Where extension
method to filter the result before feeding it to the random extension
method:

var one = collection.Where(p => p.Country == "US").PickRandomOne(rnd);
 
P

Peter Duniho

Göran Andersson said:
Excuse me for saying it, but that's pretty inefficient. Do you realise
that you are looping through the entire collection by average two and a
half times to get that random item? [...]

I would hope he does. All of that was explained last time this
particular topic came up.
 
A

Alberto Poblacion

Göran Andersson said:
I just happen to have an extension lying around that does this in a more
efficient way:

public static T PickRandomOne<T>(this IEnumerable<T> list, Random rnd) {
T picked = default(T);
int cnt = 0;
foreach (T item in list) {
if (rnd.Next(++cnt) == 0) {
picked = item;
}
}
return picked;
}

It loops through the items and check for each item if it should keep it or
not. As it doesn't have to know how many items there are, it only has to
loop through the items once.

However, note that your method does not return all the elements in the
collection with equal probability. The first element has a 50% chance of
being picked, the second one chance in 6, the third a smaller number and so
on. And there's no guarantee that any of the members will be chosen at all,
so default(T) could be returned even if the collection is not empty.

In order to achieve an equal probability, I think that the only way to
do it requires to know the number of elements in advance, and if you only
have an IEnumerable interface, this requires looping through the entire
collection before being able to choose the "random" item to be returned.

Certainly, the Original Poster's code calls Count() twice, which can be
avoided, but we still need one Count() plus one ElementAt(), which requires
looping through the entire collection by average one and a half times.
 
G

Göran Andersson

Alberto said:
However, note that your method does not return all the elements in
the collection with equal probability.

Yes, it does.


I happened to have kept this code for testing the distribution of the
method:

int[] cnt = new int[10];
Random rnd = new Random();
List<int> source = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
for (int i = 0; i < 50000000; i++) {
foreach (int c in source.PickRandom(2, rnd)) cnt[c]++;
}
for (int i = 0; i < 10; i++) Console.WriteLine("{0} : {1,8}", i, cnt);

An example of the output looks like this:

0 : 10005414
1 : 9998340
2 : 9999069
3 : 9999868
4 : 10000630
5 : 9997909
6 : 9995578
7 : 10003262
8 : 10002942
9 : 9996988

As you see, the distribution is as close to completely even that you
could ever expect. If it was more even, it wouldn't be random.
The first element has a 50%
chance of being picked,

No. It has a 100% chance of being picked. The expression rnd.Next(1)
always returns zero, so the first element will always be placed in the
variable.
the second one chance in 6,

Where did you pull that number from? The second item has 50% chance of
replacing the first item. This will also make the probability for the
first item to survive this step 50%.
the third a smaller number and so on.

Yes, that's the point. The third item has 1/3 chance of being picked,
because at that time we know that there are three items or more. The
fourth item has 1/4 chance of being picked, and so on.
And there's no guarantee that any of the members will
be chosen at all, so default(T) could be returned even if the collection
is not empty.

No, the default value is only returned if the collection is empty.
In order to achieve an equal probability, I think that the only way
to do it requires to know the number of elements in advance, and if you
only have an IEnumerable interface, this requires looping through the
entire collection before being able to choose the "random" item to be
returned.

Certainly, the Original Poster's code calls Count() twice, which can
be avoided, but we still need one Count() plus one ElementAt(), which
requires looping through the entire collection by average one and a half
times.

No, we don't need to know beforehand how many items there are. We only
need to know how many items has been processed so that we can calculate
the probability of each added item to be picked.


Obviously you didn't understand how the method works before writing that
it doesn't. Take another look at it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top