random sorting a collection

A

Adam Sandler

Hello,

I have a question about sorting a collection randomly. All the items
in the collection are unique. I've looked at
http://groups.google.com/group/micr...gst&q=random+sort+collection#fdd035444e560c40
and roughly implemented something like this:

int count = a.Length;
for (int i = count - 1; i >= 1; --i)
{
int j = r.Next(i + 1);

// something like a sort but instead of ordering
by value, order by random number
String temp = a;
a = a[j];
a[j] = temp;
}

The problem is, the sorted array can now contain duplicates! I'm
assuming the problem can come from using Random, which isn't
guaranteed to provide a unique number every time.

How do I sort an array randomly where the array items should remain
unique and I don't introduce duplicate items? Suggestions are greatly
appreciated.

Thanks!
 
I

Ignacio Machin ( .NET/ C# MVP )

Hello,

I have a question about sorting a collection randomly.  All the items
in the collection are unique.  I've looked athttp://groups.google.com/group/microsoft.public.dotnet.languages.csha...
and roughly implemented something like this:

            int count = a.Length;
            for (int i = count - 1; i >= 1; --i)
            {
                int j = r.Next(i + 1);

                    // something like a sort but instead of ordering
by value, order by random number
                    String temp = a;
                    a = a[j];
                    a[j] = temp;
            }

The problem is, the sorted array can now contain duplicates!  I'm
assuming the problem can come from using Random, which isn't
guaranteed to provide a unique number every time.

How do I sort an array randomly where the array items should remain
unique and I don't introduce duplicate items?  Suggestions are greatly
appreciated.

Thanks!


Hi,

First it strike me as a contradiction that of "sorting" and "random"
in the same sentence.
Maybe if you talk about ordering it make sense.

I do not see where you use a Random number in your code though.

What you can do is something like
-create a temp list
- generate a random num in 0 .. origList.Count
-add that elem in the temp list
- REMOVE that elem from the orig list
repit until orig list is empty
 
J

Jon Skeet [C# MVP]

On May 28, 3:39 pm, "Peter Duniho" <[email protected]>
wrote:

Then you avoid all the -1 and +1 and off-by-one. For the purposes of
shuffling the input, it doesn't matter that you might shuffle an item that
you'd already shuffled. It's still random.

It ends up being less random though. It's less likely that something
will end up in its initial position than with a true shuffle, IIRC. It
takes a *little* more work to get rid of the off-by-one errors, but
it's not that bad.

I'd personally write it as:

// i=position we're going to replace (possibly with same element)
// Everything less than i is already shuffled
for (int i=0; i < a.Length; i++)
{
// target is in range [i, a.Length)
int target = i+rng.Next(a.Length-i);

String temp = a;
a = a[target];
a[target] = temp;
}

Jon
 
B

Brian Gideon

I have a question about sorting a collection randomly.  All the items
in the collection are unique.  I've looked at
http://groups.google.com/group/microsoft.public.dotnet.languages.csha...
and roughly implemented something like this:
           int count = a.Length;
           for (int i = count - 1; i >= 1; --i)
           {
               int j = r.Next(i + 1);
                   // something like a sort but instead of ordering
by value, order by random number
                   String temp = a;
                   a = a[j];
                   a[j] = temp;
           }

The problem is, the sorted array can now contain duplicates!  I'm
assuming the problem can come from using Random, which isn't
guaranteed to provide a unique number every time.
How do I sort an array randomly where the array items should remain
unique and I don't introduce duplicate items?  Suggestions are greatly
appreciated.

Knuth, Volume 2, Algorithm 3.4.2 P

rossum


The C# code for that algorithm is:

for (int j = array.Length - 1; j > 0; j--)
{
int k = random.Next(j)
string temp = array[k];
array[k] = array[j];
array[j] = temp;
}
 
C

Chris Dunaway

Hello,

I have a question about sorting a collection randomly. All the items
in the collection are unique. I've looked

How do I sort an array randomly where the array items should remain
unique and I don't introduce duplicate items? Suggestions are greatly
appreciated.

If you are using C# 3.0 and .Net 3.5, I came up with this LINQ example
to process a list in a random order without reordering the original
list:

void randomProcess() {
List<int> integers = new List<int>() { 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12 };

Random rnd = new Random();

var ints = from i in integers
orderby rnd.Next(integers.Count)
select i;
}

Chris
 
A

Arne Vajhøj

Adam said:
I have a question about sorting a collection randomly. All the items
in the collection are unique.
The problem is, the sorted array can now contain duplicates! I'm
assuming the problem can come from using Random, which isn't
guaranteed to provide a unique number every time.

How do I sort an array randomly where the array items should remain
unique and I don't introduce duplicate items?

The following code does both array and collection.

It is not the most efficient way of doing it, but the
algorithm is easy to understand.

public class Util<T>
{
private static Random rng = new Random();
public static T[] Shuffle(T[] a)
{
T[] a2 = (T[])a.Clone();
int[] b = new int[a.Length];
for(int i = 0; i < b.Length; i++) b = rng.Next();
Array.Sort(b, a2);
return a2;
}
public static ICollection<T> Shuffle(ICollection<T> c)
{
T[] a = new T[c.Count];
c.CopyTo(a, 0);
return new List<T>(Shuffle(a));
}
}

Arne
 
C

Chris Dunaway

Interesting. Are you guaranteed that the expression that goes with the
"orderby" statement is only ever executed once per element? A naïve
implementation of the OrderBy() extension method (which I assume is what
the "orderby" statement uses) would simply sort the input according to the
passed in function, evaluating it as needed for each comparison during the
sort.

If LINQ creates a second collection, or some kind of combined collection,
in which case the elements of each source collection are only evaluated
once, the above should work. Does it? If not, I would be concerned. :)

Pete

I ran the code several times and by visual inspection, the resulting
list seemed reasonably random to me, but I cannot answer your
questions about how the orderby statement works in this case. And I
don't what happens if the rnd.Next call returns a duplicate number in
the range and how that affects the ordering of the results.

Chris
 
J

Jon Skeet [C# MVP]

Peter Duniho said:
As a side-note: I was intrigued to find that when the target of the
"orderby" clause is more complex -- for example, a lamba expression taking
a parameter and returning a value -- the compiler doesn't like that,
complaining that it can't do type inference. In spite of the fact that
simply replacing "orderby" with an explicit call to the "OrderBy()" method
fixes it.

Could you give an example of what you mean? I'll try to work out what's
going on and whether there's a way round it.
I don't recall if the sort used in .NET is stable.

Enumerable.OrderBy is stable. List.Sort isn't.
 
J

Jon Skeet [C# MVP]

Peter Duniho said:
The original example was:

orderby rnd.Next(integers.Count)

I tried this:

orderby () => { Console.WriteLine("orderby called"); return
rnd.Next(integers.Count); }

Okay, that will expand to:

..OrderBy(i => () => { Console.WriteLine("orderby called");
return rnd.Next(integers.Count); });

Note how there are two lambda expressions there - you're trying to
order by a lambda expression now! Query expression expansion basically
puts in the i => part and then the rest is whatever you've got.


**** WARNING: Unpleasant code ahead ****


My first attempt was this:

orderby { Console.WriteLine("orderby called");
return rnd.Next(integers.Count); }

which also didn't work - something I found mildly surprising, until I
looked at the spec and saw that each ordering requires an *expression*
- and a block isn't an expression. However, what you *can* do is write
an expression which executes a lambda expression which has a block as
its body. Let's do this one step at a time - first we'll put the
delegate outside the query expression:

Func<int> randomizer = () => { Console.WriteLine("orderby called");
return rng.Next(integers.Count); };

var shuffled = from i in integers
orderby randomizer()
select i;


That works, but we can do the whole thing in a single query expression.
It's not pretty though...

var shuffled = from i in integers
orderby ( (Func<int>) (() =>
{ Console.WriteLine("orderby called");
return rng.Next(integers.Count); })) ()
select i;

You need the cast to tell the compiler what to try to convert the
lambda expression to. You need to execute it in order to end up with an
expression of an appropriate type (rather than the lambda expression
itself, which isn't comparable to anything).

I tried to space things out a bit to make it easier to read. It's not
exactly pleasant though!

So anyway, that's why it didn't work to start with, and how to make it
work, should you absolutely, absolutely have to. But please don't ;)
 
J

Jon Skeet [C# MVP]

Peter Duniho said:
[...]
Note how there are two lambda expressions there - you're trying to
order by a lambda expression now! Query expression expansion basically
puts in the i => part and then the rest is whatever you've got.

Okay...it was the expansion rule that I didn't know. The rest makes more
sense now. :)
Phew!
[...] Let's do this one step at a time - first we'll put the
delegate outside the query expression:

Func<int> randomizer = () => { Console.WriteLine("orderby called");
return rng.Next(integers.Count); };
var shuffled = from i in integers
orderby randomizer()
select i;

Yeah, from Chris's original example it was pretty clear that I _could_ get
it to work just by moving all my logic out into a separate method that
could be executed. But without understanding the expansion rule for
"orderby" (why this isn't in the documentation for "orderby", or at least
referenced by that page if it's fully elaborated elsewhere, I don't know),
I didn't see that the target of "orderby" wasn't an expression so much as
it needed to be an actual piece of code that could be executed.

Right. The whole business of query expression expansion is fairly
poorly understood at the moment, I suspect - and that's something which
is going to have to change (IMO) for LINQ to succeed.

I know Frans disagrees with me fundamentally on this, and I honestly
don't expect people to in as deep as I've had to, but a little bit more
understanding and a little bit less magic would be helpful :)

To avoid plugging my book yet again, Jamie King and Bruce Eckel's free
chapter on query expressions goes into this thoroughly:

http://www.mindviewinc.com/Books/CSharp/Index.php
I know, that's not really the right thing to say either. As you say, not
only is it an expression, it _must_ be an expression that can be
evaluated.

What kind of expression can't be evaluated? Or do you mean in terms of
the context (your next sentence).
It's that it gets evaluated in the context of a lambda
expression that means a lambda expression (which is by itself not really
an expression...odd, now that I think about :) ) won't work.

Well, sort of. One thing to understand about lambda expressions is that
there *has* to be a conversion involved to either a delegate type or an
expression tree type - otherwise the compiler doesn't know what to do
with it, basically. Once the cast was there, you'd have a syntactically
valid orderby clause - but it wouldn't do anything useful because it
would try to compare the delegates rather than comparing the result of
executing the delegates.
 
J

Jon Skeet [C# MVP]

Peter Duniho said:
The latter. And just to complicate things, part of the issue is the
dual-purpose of lambda expressions as both a delegate type and an
expression tree type. No doubt this is part of the power of lambda
expressions, but it also complicates them.

It complicates things a bit, but even without them you'd still need the
cast.

Consider C# 2 - you couldn't write:

Delegate d = delegate(int x) { Console.WriteLine(x); }

The compiler needs to know which concrete delegate type to create an
instance of. Arguably it could try the various Action and Func
delegates if nothing more specified were provided, I suppose...
 
J

Jon Skeet [C# MVP]

On May 30, 1:31 am, "Peter Duniho" <[email protected]>
wrote:

I understand the example you give, but not how it relates.

I'll attempt to make things clearer :)
While the compiler is complaining about not being able to perform
inference on the type, it's my opinion that the expression "(int iElement)
=> return rnd.Next(integers.Count);" needs no additional information.

It does. That expression in itself doesn't have a known type. It can
be converted to a delegate type in itself.
This is borne out by the fact that using that directly in the OrderBy()
method works fine. (Note: I realize this isn't actually true...it's just
how I think it _ought_ to be...see below).

The reason it works for OrderBy is that OrderBy's parameter is a
Func<TSource,TKey> (or something similar - I've lost my network
connection at the minute). It can convert

(iElement) => "foo"

into a Func<int, string> very easily. However, when you put in an
extra level of lambda expressions, it won't manage it - if you call

OrderBy(first => (second => "foo"))

you'll get the same type inference problem - it can't infer the TKey
type.

(It's possible you understood all of this bit already - just thought
I'd restate it for clarity.)
It seems to me that the real issue here is how the expansion works, in
that it's a straight "search and replace" sort of expansion in which a
lambda expression is simply not an appropriate input.

It's not an appropriate input when the compiler needs to do type
inference. If you wrote a type with the appropriate LINQ operators
where OrderBy was declared to accept a
I (think) I get why the compiler says what it says. Different parts of
the compiler are probably responsible for different parts of the process;
there's effectively a pre-processor doing the "search and replace" part,
and by the time that then is tried to be compiled into IL, it's not only
just plain wrong, there's no enough type information for any inference to
be done. Hence the actual error message.

Sort of. The query expression expansion part is indeed relatively
plain. But I don't think that's the root cause - the root cause is the
inability to infer a whole type directly from a lambda expression (as
opposed to inferring some of the parameter types or the return type,
based on knowledge of the general shape of what it's meant to be
converted to).
So, sure...the cast is required...but the real gimmick in the code you
posted is the final "()" that actually _calls_ the delegate. That changes
the lambda expression, after casting, into something that can then be
inserted into yet another lambda expression for actual evaluation. Even
with a cast, a lambda expression by itself wouldn't work.

It wouldn't actually *work*, but I think it would compile.
Finally, I do admit to remaining a little confused about the lack of type
inference for un-cast delegates (this is the "see below" I mentioned :)
). In particular, I'm not really sure why the language won't create some
sort of implicit delegate type for the purposes of doing type inference.
A common statement is one that looks like:

myControl.Invoke((MethodInvoker) delegate { // some code });

The cast is required because of the reason you state. But Invoke()
actually just takes a Delegate. Why can't the compiler generate an
anonymous type (sort of like it does with "var") that can then be used?

Just to be clear for any other readers - it's not "var" that creates
anonymous types, it's expressions of the form new { Property =
Value }. "var" is just how you cope with the fact that you can't tell
the compiler what the type's name is. You can in fact use anonymous
types without var, and var without anonymous types.

However, yes, it would be possible for the compiler to create a new
delegate type for you - or try to use one of the existing generic
ones, as mentioned before. Personally I don't think the extra
complexity in the language (particularly with respect to overloading -
there are a few nasty cases I can think of off-hand) but I would
imagine it would be doable.
It's not like the compiler doesn't know the method signature, is it? I
mean, I _think_ it should know the signature...if there's some reason it
doesn't, it's not obvious to me.

In the MethodInvoker example you gave it knows one potential
signature. It doesn't know what parameters it could use, because
you've used the syntax which says "I don't care about parameters."
With the more explicit syntax (e.g. delegate() { ... }) it would have
more information.

Jon
 
J

Jon Skeet [C# MVP]

On May 30, 1:31 am, "Peter Duniho" <[email protected]>
wrote:

I understand the example you give, but not how it relates.

I'll attempt to make things clearer :)
While the compiler is complaining about not being able to perform
inference on the type, it's my opinion that the expression "(int iElement)
=> return rnd.Next(integers.Count);" needs no additional information.

It does. That expression in itself doesn't have a known type. It can
be converted to a delegate type in itself.
This is borne out by the fact that using that directly in the OrderBy()
method works fine. (Note: I realize this isn't actually true...it's just
how I think it _ought_ to be...see below).

The reason it works for OrderBy is that OrderBy's parameter is a
Func<TSource,TKey> (or something similar - I've lost my network
connection at the minute). It can convert

(iElement) => "foo"

into a Func<int, string> very easily. However, when you put in an
extra level of lambda expressions, it won't manage it - if you call

OrderBy(first => (second => "foo"))

you'll get the same type inference problem - it can't infer the TKey
type.

(It's possible you understood all of this bit already - just thought
I'd restate it for clarity.)
It seems to me that the real issue here is how the expansion works, in
that it's a straight "search and replace" sort of expansion in which a
lambda expression is simply not an appropriate input.

It's not an appropriate input when the compiler needs to do type
inference. If you wrote a type with the appropriate LINQ operators
where OrderBy was declared to accept a
I (think) I get why the compiler says what it says. Different parts of
the compiler are probably responsible for different parts of the process;
there's effectively a pre-processor doing the "search and replace" part,
and by the time that then is tried to be compiled into IL, it's not only
just plain wrong, there's no enough type information for any inference to
be done. Hence the actual error message.

Sort of. The query expression expansion part is indeed relatively
plain. But I don't think that's the root cause - the root cause is the
inability to infer a whole type directly from a lambda expression (as
opposed to inferring some of the parameter types or the return type,
based on knowledge of the general shape of what it's meant to be
converted to).
So, sure...the cast is required...but the real gimmick in the code you
posted is the final "()" that actually _calls_ the delegate. That changes
the lambda expression, after casting, into something that can then be
inserted into yet another lambda expression for actual evaluation. Even
with a cast, a lambda expression by itself wouldn't work.

It wouldn't actually *work*, but I think it would compile.
Finally, I do admit to remaining a little confused about the lack of type
inference for un-cast delegates (this is the "see below" I mentioned :)
). In particular, I'm not really sure why the language won't create some
sort of implicit delegate type for the purposes of doing type inference.
A common statement is one that looks like:

myControl.Invoke((MethodInvoker) delegate { // some code });

The cast is required because of the reason you state. But Invoke()
actually just takes a Delegate. Why can't the compiler generate an
anonymous type (sort of like it does with "var") that can then be used?

Just to be clear for any other readers - it's not "var" that creates
anonymous types, it's expressions of the form new { Property =
Value }. "var" is just how you cope with the fact that you can't tell
the compiler what the type's name is. You can in fact use anonymous
types without var, and var without anonymous types.

However, yes, it would be possible for the compiler to create a new
delegate type for you - or try to use one of the existing generic
ones, as mentioned before. Personally I don't think the extra
complexity in the language (particularly with respect to overloading -
there are a few nasty cases I can think of off-hand) but I would
imagine it would be doable.
It's not like the compiler doesn't know the method signature, is it? I
mean, I _think_ it should know the signature...if there's some reason it
doesn't, it's not obvious to me.

In the MethodInvoker example you gave it knows one potential
signature. It doesn't know what parameters it could use, because
you've used the syntax which says "I don't care about parameters."
With the more explicit syntax (e.g. delegate() { ... }) it would have
more information.

Jon
 
J

Jon Skeet [C# MVP]

(Apologies for the previous duplication, btw. Network issues.)

I'm snipping a load of stuff here, just to get to interesting
things... a lot of the rest is just looking at things from different
points of view, and possibly a little miscommunication. Nothing
particularly significant though.

How would that work? As you wrote before (maybe not in so many words, but
I think it was basically the same thing), there's the issue of being able
to use a Func<TSource, TKey> as a comparison object. Unless Func<TSource,
TKey> implemented IComparable (or whatever the "orderby" requirement
happens to be), would that actually work? (Actually, now that I've played
with the casting variations -- see below -- I think it wouldn't, or else I
misunderstand what you're saying).

You could write your own ordering routine which first called the
"outer" function to return the "inner" function, then called the
"inner" function to return the sorting key.

I'm not saying it would be a particularly sane thing to do, but type
inference would cope (I believe).

Exactly right. :) I tried it. With the cast, the compiler decided it
had enough type information to match the outer lambda expression and
compiled fine. Of course, what it was returning was the delegate itself,
not the evaluation of the delegate, and of course the delegate isn't
comparable, so BOOM! :)

If it were possible to create a delegate type which implemented
[...]
However, yes, it would be possible for the compiler to create a new
delegate type for you - or try to use one of the existing generic
ones, as mentioned before. Personally I don't think the extra
complexity in the language (particularly with respect to overloading -
there are a few nasty cases I can think of off-hand) but I would
imagine it would be doable.

Before anonymous types, I would have said "sure, keep the language
simple". But C# is getting much more complicated with each revision. If
other people get their pet features, why not me? :)

:) I'd argue that as C# acquires more features, the bar for adding yet
another feature should be going up and up as well. I think the C# team
realise that in C# 3 they added quite a lot of stuff and need to let
it settle down for a while before they do too much more :)

But in terms of this particular feature, I think there's a big
difference between using an expression which is explicitly an
anonymous type, and using an expression which implicitly creates an
anonymous type just because of the way you've used it. Maybe I just
need to see more places it would be useful to convince me though :)

Jon
 
J

Jon Skeet [C# MVP]

Peter Duniho said:
I see. Yes, seems icky to me. :)

Oh indeed. There are lots of "fun" (read: evil) things you can do to
query expression syntax. At one geek night we even started creating
anonymous types with properties called "Select", "Where" etc which were
of suitable delegate types... the query expression expansion worked
fine with them. Totally useless, of course, but fun.
[...]
But in terms of this particular feature, I think there's a big
difference between using an expression which is explicitly an
anonymous type, and using an expression which implicitly creates an
anonymous type just because of the way you've used it. Maybe I just
need to see more places it would be useful to convince me though :)

Well, for me the most obvious example is the Control.Invoke() method,
which is used on a very regular basis. Casting works fine, of course, but
the casting is always to some arbitrary type that the Invoke() method
itself doesn't really care about anyway.

Just as a thought, I wonder what would happen if you added an extension
method to Control.Invoke taking a MethodInvoker (and then calling the
"real" Control.Invoke with it)... that might well work around the
problem quite easily.
Other than that, I admit...I don't run into the issue much, if at all.
Most of the time, for a delegate there's a very specific signature and an
explicit type needed that the compiler can use for inference. I guess the
question becomes, to some extent, is frequency of use sufficient cause, or
does there need to be a threshold met of variety of use as well.

I'm happy to leave that debate up to the language designers. I've done
great by them so far, and I'm not about to mess with a good thing (even if
I could, which I can't :) ).

:) Yup, they've done a very good job so far. I'm still not wild about
the way extension methods are discovered, and I wish that classes were
sealed by default, but hey...
 
A

Arne Vajhøj

Chris said:
I ran the code several times and by visual inspection, the resulting
list seemed reasonably random to me, but I cannot answer your
questions about how the orderby statement works in this case. And I
don't what happens if the rnd.Next call returns a duplicate number in
the range and how that affects the ordering of the results.

There are a couple of possibilities:
A) the docs specify that it does what you hope it does
B) the docs does specify what it does but current implementation does
what you hope it does
C) it does not actually do what you hope it does

Note that B is actually worse than C. B will pass any careful
test, but may break the day .NET is updated (and who remember
to check this specific issue as careful as it was original done).

As a rule of thumb: if it is not possible to verify that it is A
in seconds or max a few minutes, then the time is better spend
finding an alternative solution that is more straight forward
than to investigate the original issue.

Somebody will have to maintain that code. And having to read
tons of blogs on the internet and study the framework source code
is not something that makes the life of maintenance programmers
easier.

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top