Random Record

shapper · Jun 27, 2009

Hello,

How can I get a random record using linq?

Thanks,
Miguel

shapper · Jun 27, 2009

See Count(), Skip(), and Take().

What about using an extension?

private static Random random = new Random();
public static T GetRandom<T>(this IList<T> list) {
if (list.Count == 0) {
return default(T);
}
return list[random.Next(0, list.Count)];
}

But would it possible, or even make sense, to make this extension work
with List, IEnumerable, etc?

Thanks,
Miguel

shapper · Jun 28, 2009

Take your pick. Do you want to use LINQ or don't you?

I mentioned "using Linq" because I though this was the way to do
it ... Sorry.

With your suggestion I understood that it was not ...
And since lately I have been creating some String and DateTime
extensions which have become really useful then why not an extension
for this to?

When I ask advice on the code I post is because I think it would be
good this extension to be applicable to collections in general. Or
not?

vanderghast · Jul 13, 2009

Take the first 'row(s)' after a random sort (which implies you can implement
a sort delegate function wich randomly returns -1, 0, or 1, as example, for
IComparer<T>).

Vanderghast, Access MVP

Peter Duniho · Jul 13, 2009

Take the first 'row(s)' after a random sort (which implies you can
implement a sort delegate function wich randomly returns -1, 0, or 1, as
example, for IComparer<T>).

But that implication is false. You can't simply return random values from
a comparison function. Each element within the collection must be ordered
consistently with every other element. If the comparison function returns
-1 when elements A and B are passed to it, and it returns -1 when elements
B and C are passed to it, it must also return -1 when elements A and C are
passed to it. There's no way to ensure that if the comparison function
just returns a number selected randomly from -1, 0, and 1.

Besides, sorting the entire collection costs way more than simply skipping
the first N elements.

Pete

vanderghast · Jul 13, 2009

"But that implication is false."

You tried it?

Vanderghast, Access MVP

vanderghast · Jul 13, 2009

Explicitly, HERE, it is about getting a RANDOM sort, so if element B is > ,
= or < than element C can be determined 'ad hoc', rather than 'once for all
at the start' of the operations. Furthermore, if a test is required, it is
(logical) to assume that 'up to now' the order is/was unknown/unused by the
algorithm making the sort, so returning a 'random' value (among -1, 0 and 1)
is strictly equivalent to have that value pre-determined at the start.

Vanderghast, Access MVP

Random · Jul 13, 2009

"But that implication is false."

You tried it?

What does that matter? If you break the contract like this and it
works for you once or a hundred times on your dev box, then telling
the customer it works for you when it locks up or throws and exception
on them is useless. Don't code by trial-and-error, rather try to
actually follow the documentation.

Don't believe me? Run this bit of code, whatch it work once then
break the second time through the seed loop.

for (int seed = 0; seed < 1000; seed++)
{
List<int> ints = new List<int>();
for (int i = 0; i < 1000; i++)
{
ints.Add(i);
}
Random r = new Random(seed);
ints.Sort(delegate(int a, int b) {
return r.Next(0, 3) - 1; });
Debug.Print("It worked for seed = {0}", seed);
}

vanderghast · Jul 13, 2009

Your example is not relevant to the discussion. By the way, I don't meant to
return "a number" between -1 and +1, but one of the three numbers, -1, 0 and
1, randomly, 'ad hoc', as 'asked' and the observation is based on the fact
that the algorithm would ask JUST ONCE, for two given 'objects', A and B,
if A<B, if A=B or if A>B. So, ANSWERING to that QUESTION randomly once for
all before the process starts, or 'ad hoc', AS I PROPOSED, is logically
equivalent.

Vanderghast, Access MVP

"But that implication is false."

You tried it?

What does that matter? If you break the contract like this and it
works for you once or a hundred times on your dev box, then telling
the customer it works for you when it locks up or throws and exception
on them is useless. Don't code by trial-and-error, rather try to
actually follow the documentation.

Don't believe me? Run this bit of code, whatch it work once then
break the second time through the seed loop.

for (int seed = 0; seed < 1000; seed++)
{
List<int> ints = new List<int>();
for (int i = 0; i < 1000; i++)
{
ints.Add(i);
}
Random r = new Random(seed);
ints.Sort(delegate(int a, int b) {
return r.Next(0, 3) - 1; });
Debug.Print("It worked for seed = {0}", seed);
}

vanderghast · Jul 13, 2009

Besides, sorting the entire collection costs way more than simply skipping

the first N elements.

maybe, for ONE element, but if you want to simulate a TOP 2 random, or a TOP
10 random, skipping the first N element method repeated 2, or 10 times,
become slower, and since N would be a generated random pick, you will have
to add the required logic to be sure you don't pick the same record twice.

Vanderghast, Access MVP

vanderghast · Jul 13, 2009

And I failed to answer specifically to your objection about

- if A > B (by previous comparison between A and B)
- and if B > C (again by a previous comparison)

Your objection was that then, nothing would insure us that a randombly
generated answer betwenn A and C would return A > C . While your objection
is right about the nothingness (while is it also possible to store the
generated random value, once it is generated), it is somehow irrelevant
since the algorithm won't need to compare A with C in the first place,
since it already knows the relative position of C to B to A... unless you
use a very innefficient sorting algorithm.

Vanderghast, Access MVP

Random · Jul 13, 2009

Your example is not relevant to the discussion.

Why not? It does exactly what you suggested when you said: "which
implies you can implement a sort delegate function wich randomly

returns -1 said:
By the way, I don't meant to
return "a number" between -1 and +1, but one of the three numbers, -1, 0 and
1

Expect for -1, 0, and 1, how many ints are there that are greater than
or equal to -1 and less than or equal to 1?

randomly, 'ad hoc', as 'asked' and the observation is based on the fact
that the algorithm would ask JUST ONCE, for two given 'objects', A and B,
if A<B, if A=B or if A>B. So, ANSWERING to that QUESTION randomly once for
all before the process starts, or 'ad hoc', AS I PROPOSED, is logically
equivalent.

Bzzt. Thanks for playing.

Here's code that proves you wrong on two counts:

Dictionary<int, int> seen = new Dictionary<int, int>();
List<int> ints = new List<int>();
for (int i = 0; i < 1000; i++)
{
ints.Add(i);
}
Random r = new Random(0);
ints.Sort(delegate(int a, int b)
{
if (!seen.ContainsKey(a * 10000 + b))
seen[a * 10000 + b] = r.Next(0, 3) - 1;
else
Debug.Print("The algorithim asked me again!");
return seen[a * 10000 + b];
});

If you run through this, you'll see the Sort() algorithm is clearing
asking for the same comparison more than once and further more, even
if you satisfy your criteria of always returning the same thing for
each sort, if you violate the rule that was mentioned elsewhere in the
thread of returning illogical results for A vs B, B vs C, then A vs C,
then the algorithm falls over (in this case, it appears to lock up. I
didn't have the patience to see if it'll eventually throw a stack
overflow exception).

Peter Duniho · Jul 13, 2009

And I failed to answer specifically to your objection about

- if A > B (by previous comparison between A and B)
- and if B > C (again by a previous comparison)

Your objection was that then, nothing would insure us that a randombly
generated answer betwenn A and C would return A > C . While your
objection is right about the nothingness (while is it also possible to
store the generated random value, once it is generated), it is somehow
irrelevant since the algorithm won't need to compare A with C in the
first place, since it already knows the relative position of C to B to
A... unless you use a very innefficient sorting algorithm.

You should learn a little more about sorting algorithms before you make
statements like that.

If it were possible to write a sort algorithm that only had to compare two
elements exactly once, and perfectly resolved any transitive information,
then we would not have to suffer through using the O(n log n) quicksort
that is the de facto standard for general-purpose sorting.

Even a merge sort, which is O(n), depends on consistent comparisons
between different elements.

As far as your question "have you tried it?" goes, yes...a very long time
ago when I first learned about sort algorithms. You should consider what
your own answer to that question would be were it posed to you. It's
obvious you haven't tried it, because if you had, you would have noticed
that you can't get a truly random sort (for that matter, many sort
implementations will either throw an exception or simply never complete,
depending on the implementation and algorithm...the .NET implementation
definitely can hang if you provide it invalid comparisons, though
sometimes if you're lucky you'll just get an unexpected ordering).

As far as your comment about picking more than one random element goes...

If you want more than one random element, it's still sufficient and more
efficient to pick a series of random indices, and select those from the
original collection. You only have to sort the indices, not the entire
collection, so as long there are fewer indices than there are elements in
the collection, it's faster to do that.

Even if you do want the entire collection in random order, it would be
more efficient to simply shuffle the collection (which would be O(n))
rather than try to _sort_ the collection using random indices (which at
best would be O(n log n)).

Of course, Miguel never said anything at all about wanting to select more
than one element at random anyway, so that's a moot point anyway, even if
it supported your suggestion (which it doesn't).

Pete

vanderghast · Jul 14, 2009

After these checks, it appears that indeed, the Framework does not use the
algorithm I was assuming it used, and that some pair of elements may be
tested more than once. A memory-less solution, as I proposed is thus clearly
not appropriate, neither the most efficient one, in general , since a
shuffling would be more efficient than ordering on a random value(exception
seems to exist, such as rows in a database table, where shuffling is not an
appropriate concept).

Vanderghast, Access MVP

vanderghast · Jul 14, 2009

You probably meant the quick sort is O(n2) in the worst case, and a Merge
sort O(n log n), but you are right about using a shuffling algorithm would
be more effective than sorting on a random value to start with, assuming
'position' for the elements are available (as example, they are not exposed
for rows in a database table).

I recognize that I was wrong about the possibility to use a memoryless sort
with the sorting algorithm used by the Framework.

Vanderghast, Access MVP

Peter Duniho · Jul 14, 2009

You probably meant the quick sort is O(n2) in the worst case, and a
Merge sort O(n log n)

No, I'm talking about average case. And the merge sort scenario I'm
talking about is where the separate collections are already sorted; i.e.
there's only the merge to do, which is O(n). The point in either,
however, is that neither will work correctly if the comparison function is
just returning random values.

but you are right about using a shuffling algorithm would be more
effective than sorting on a random value to start with, assuming
'position' for the elements are available (as example, they are not
exposed for rows in a database table).

I recognize that I was wrong about the possibility to use a memoryless
sort with the sorting algorithm used by the Framework.

"Memoryless"? What's that supposed to mean? Quicksort (for example)
doesn't _require_ any additional memory beyond the original data; it can
be done in-place. "Memoryless" doesn't seem to have anything to do with
your original suggestion, which was to simply return random values from a
comparison function.

In any case, I'm not aware of any sort algorithm that will work correctly
with a broken comparison function. Please, educate me. Show me a
_sorting_ algorithm that takes as input a collection of data and a
comparison function, and which provides the desired results when used with
a comparison function that simply returns random results.

Note that even the lowly bubble sort, while it should run to completion
with a randomized comparison function, still won't produce correctly
random results. It would essentially be a slower, less-correct shuffle
algorithm (i.e. the elements will be somewhat randomized, but the
distribution won't be perfectly even).

Pete

vanderghast · Jul 15, 2009

Memoryless as in not remembering the random value generated to compare
element A with element B. I could have been possible to generates a random
floating point number for each element and use that random value to compare
A and B. Sorry it that was not crystal clear.

The problem was not to compare correctly but to 'suffle' the elements, as
you pointed out, so the memoryless (as here up) random generated value is
quite possible. Check the 'animation' at
http://www.softpanorama.org/Algorithms/Sorting/quicksort.shtml where you
can change randomly the values in the middle of the algorithm execution.
Sure, then, the result is not strictly properly sorted, but is randomly
shuffled. Your objections are stick on the general case of sorting, which is
not the case, indeed.

Vanderghast, Access MVP

Göran Andersson · Jul 15, 2009

shapper said:
See Count(), Skip(), and Take().

Click to expand...

What about using an extension?

private static Random random = new Random();
public static T GetRandom<T>(this IList<T> list) {
if (list.Count == 0) {
return default(T);
}
return list[random.Next(0, list.Count)];
}

But would it possible, or even make sense, to make this extension work
with List, IEnumerable, etc?

Thanks,
Miguel

You can do it using IEnumerable<T> instead:

public static T GetRandom<T>(this IEnumerable<T> list, Random rnd) {
T picked = default(T);
int cnt = 0;
foreach (T item in list) {
if (rnd.Next(++cnt) == 0) {
picked = item;
}
}
return picked;
}

That way you wouldn't need to actually have the entire list in memory at
once, as it can process the items as they are created. You could do
something like this (or something more meaningful) without running out
of memory:

int n = Enumerable.Range(0,int.MaxValue).GetRandom(new Random())

Random Record

shapper

shapper

shapper

vanderghast

Peter Duniho

vanderghast

vanderghast

Random

vanderghast

vanderghast

vanderghast

Random

Peter Duniho

vanderghast

vanderghast

Peter Duniho

vanderghast

Göran Andersson