Data structure suggestion wanted...

  • Thread starter Thread starter sprash
  • Start date Start date
S

sprash

I am writing a program to get the mode of a given set of numbers.

As a refresher, mode is the number occuring most often in a list and
1. if 'n' numbers have the same maximum frequency in the list, all
those 'n' numbers are modes
2. If the maximum frequency is 1, the list does not have a mode.

Understandably, this list can be potentially huge and so the list of
modes can be huge. However, the number of modes in a list that contains
'n' elements cannot be more than 'n/2' due to the above rules.

My algorithm simply sorts the array and does a linear scan to measure
the length of all adjacent runs.

The question is, what data structure should I use to hold the mode list
keeping in mind that my list can grow as much as n/2 and can also be as
short as 1. Also note that my mode list can suddenly shrink from a
large number of elements to as small as 1 if it discovers a number
which occurs more than those contained in a list.

I am considering an ArrayList with initial capacity n/2 to hold the
mode, but I do realize that this is inefficient because:
1. The data it will hold will be a value type, hence the performance
penalty in boxing my Int32s
2. I might not even use n/2 values. But then how do I decide when to
use TrimToSize?

Any suggestions?
 
sprash said:
I am considering an ArrayList with initial capacity n/2 to hold the
mode, but I do realize that this is inefficient because:
1. The data it will hold will be a value type, hence the performance
penalty in boxing my Int32s
2. I might not even use n/2 values. But then how do I decide when to
use TrimToSize?

I can see 3 options:
1) Upgrade to dotnet 2.0 and use generics (if you already have 2.0 then
forget 2 and 3 below :-)
2) Use an array and resize it as needed.
3) Use the arraylist

With 2 & 3 you should define your own class with a Value and Frequency
member. That will avoid boxing.

If you don't have 2.0 then i'd use no 2. Start with 10 elements and double
each time it is exceeded. You should probably sort the array as you go to
make lookups faster.

Michael
 
If I understand the requirements correctly try the code below. It's not well
tested, I'll leave that to you ...

//////////////////////////////

int[] numbers = {1,2,3,4,5,2,3,4,5,3,4,5,4,5}; // Modes are 4 & 5 in this set

int max = 2; // Minimum occurances to be a mode

Dictionary<int,int> counts = new Dictionary<int,int>(numbers.Length);
Dictionary<int,int> modes = new Dictionary<int,int>(numbers.Length);

foreach (int n in numbers)
{
counts[n] = counts.ContainsKey(n) ? (counts[n] + 1) : 1;

if (counts[n] == max)
{
modes[n] = max;
}
else if (counts[n] > max)
{
modes.Clear();
max = counts[n];
modes[n] = max;
}
}

foreach(KeyValuePair<int,int> kvp in modes)
{
Console.WriteLine("mode={0}, count={1}",kvp.Key,kvp.Value);
}

//////////////////////////////
 
Back
Top