How to count value in a ArrayList

Tony · May 23, 2008

Hello!

I have an ArrayList with many double that is sorted some of these double
value exist more then once in the ArrayList.
Is it possible to use some feature in C# or from the framework to count each
unique double value or is the only
solution to run through the list and count every unique double value.

//Tony

Family Tree Mike · May 23, 2008

If you can use LINQ, then this page is an excelent resource
http://msdn.microsoft.com/en-us/vcsharp/aa336747.aspx.

Look at the example for Aggregate Operators - Count - Grouped.

j1mb0jay · May 23, 2008

Hello!

I have an ArrayList with many double that is sorted some of these double
value exist more then once in the ArrayList. Is it possible to use some
feature in C# or from the framework to count each unique double value or
is the only
solution to run through the list and count every unique double value.

//Tony

Maybe try using a hash table rather that an array list, searching of a
hash table is O(1) rather than a list O(n).

Regards j1mb0jay

Jon Skeet [C# MVP] · May 23, 2008

I have an ArrayList with many double that is sorted some of these double
value exist more then once in the ArrayList.
Is it possible to use some feature in C# or from the framework to count each
unique double value or is the only
solution to run through the list and count every unique double value.

Leaving aside the other responses, I'd note that comparing double
values against each other directly is usually a bad thing. Two
sequences of calculations which are "logically" identical can have
different results, sometimes for very odd reasons to do with JIT
optimisations etc. It's usually better to compare within a certain
tolerance. What are these doubles representing, and where have they
come from? Different situations merit different approaches.

Jon

Ignacio Machin ( .NET/ C# MVP ) · May 23, 2008

Hello!

I have an ArrayList with many double that is sorted some of these double
value exist more then once in the ArrayList.
Is it possible to use some feature in C# or from the framework to count each
unique double value or is the only
solution to run through the list and count every unique double value.

//Tony

Hi,

You will have to run through the list, there is no way to avoid that.

j1mb0jay · May 24, 2008

Hi,

You will have to run through the list, there is no way to avoid that.

If you input the the data into a hash table rather than a ArrayList, that
will clone itself when it gets full and is rather slow. Then when you
want to look for an item you will only have to look at one item to see if
you have it, rather than the whole list.

Regards j1mb0jay

j1mb0jay · May 24, 2008

And in addition to Jon's excellent point, I'll point out that if the
list is in fact sorted as you say, then you should use a binary search
to find the first instance of the value you're looking for. Then you
only need to examine from that point forward until you reach the next
value not "equal" (using whatever definition is appropriate).

For lists of any significant size, this would be _much_ more efficient
than enumerating the entire list.

Pete

Trees are a very fast at what they do, but i do not think this is the job
that they were made for, putting the list is order is what the tree is
for O( n log n ) is very good ( assuming you root value if a the 1/2
value in the list.)

I think that the OP wants to stop having duplicate entries in his list,
so a hash table that checks for an already exists instance of the current
node will be by far the quicker.

Regards j1mb0jay

Arne VajhÃ¸j · May 24, 2008

j1mb0jay said:
If you input the the data into a hash table rather than a ArrayList, that
will clone itself when it gets full and is rather slow. Then when you
want to look for an item you will only have to look at one item to see if
you have it, rather than the whole list.

Is that different from ArrayList ??

Arne

j1mb0jay · May 24, 2008

Is that different from ArrayList ??

Arne

A well distributed hash table has a BigO(1) for all operations (Insert,
Search and Remove).

http://en.wikipedia.org/wiki/Hash_table

Regards j1mb0jay

Arne VajhÃ¸j · May 24, 2008

j1mb0jay said:
A well distributed hash table has a BigO(1) for all operations (Insert,
Search and Remove).

So does an array list for operations at the tail.

Until the backing array is full.

And a hash table has the same issue.

Arne

j1mb0jay · May 24, 2008

So does an array list for operations at the tail.

Until the backing array is full.

And a hash table has the same issue.

Arne

How can an array list have a O(1) Search, when it has to look at each
item and compare. Dont be so silly !!!

j1mb0jay

Arne VajhÃ¸j · May 24, 2008

j1mb0jay said:
How can an array list have a O(1) Search, when it has to look at each
item and compare. Dont be so silly !!!

Both inserting at the tail and removing from the tail is O(1).
Technically speaking searching at the tail is also O(1), but there
is not much point in that.

Arne

j1mb0jay · May 24, 2008

It can't. However, I don't blame Arne for responding as he did, given
that you explicitly called out "will clone itself when it gets full and
is rather slow". To me (and presumably to him) that clearly implies
that the operation you're concerned with at that particular moment is
adding, not searching. It seems reasonable to me to point out that the
addition operation for both data structures is constant order, and that
the cost of "cloning" is present in both.

But even a hash table is slower for searching than simply enumerating a
list that's already been sorted (as in the OP's question). So,
please...tell us why the hash table is the preferred solution here?

Pete

From reading the OP, i thought that they where inserting double values
into a data structure and wanted to make sure that the value that they
were adding was not already stored in the data structure. To me this
means they will need to compare every insert item with ever single item
in the list O(n). Doing this with a ArrayList is a lot slower than a hash
table, as you could hash the double value to find out which linked list
the value would be stored in, then just check the few values in the list.

When an array list is full is needs to be cloned to another larger array
list. I thought that hash tables where are array of linked lists (Also
known as a bucket array) this means they can never be full (will use SWAP/
Virtual Memory when RAM is full), but the more items that are in each
linked list the slower the search time becomes, the insert time of a
linked list is always O(1) if you are adding to the tail of the list.

If the OP is using a ArrayList and would like to keep the items in order
when inserting a new item this would mean moving all of items along one
in the list, again another slow process, maybe a tree of some kind would
be a better option.

I was always taught to stay away from ArrayList at all costs. Arrays are
fine if you know the amount of items that you would like to store, and
they have a fast data access compared to a list as you do not have to
iterate all of the items to find the one you require.

j1mb0jay

Arne VajhÃ¸j · May 24, 2008

j1mb0jay said:
From reading the OP, i thought that they where inserting double values
into a data structure and wanted to make sure that the value that they
were adding was not already stored in the data structure. To me this
means they will need to compare every insert item with ever single item
in the list O(n). Doing this with a ArrayList is a lot slower than a hash
table, as you could hash the double value to find out which linked list
the value would be stored in, then just check the few values in the list.

That makes sense, but that was not the statement of yours we
disagreed with.

We disagreed with the suggestion that ArrayList was slower than
Hashtable due to the reallocate and copy at expansion of the backing
array.

When an array list is full is needs to be cloned to another larger array
list. I thought that hash tables where are array of linked lists

It is not.

And I suspect it would be significant slower doing it that way. The
reason Hashtable is fast is the ability to lookup directly based on
hash value.

If the OP is using a ArrayList and would like to keep the items in order
when inserting a new item this would mean moving all of items along one
in the list, again another slow process, maybe a tree of some kind would
be a better option.

ArrayList is not a good choice to insert sorted into.

I was always taught to stay away from ArrayList at all costs.

You were taugth wrong.

ArrayList (or List<> in newer .NET versions) is in most cases
a very good choice of data structure.

Arne

j1mb0jay · May 25, 2008

That makes sense, but that was not the statement of yours we disagreed
with.

We disagreed with the suggestion that ArrayList was slower than
Hashtable due to the reallocate and copy at expansion of the backing
array.

It is not.

And I suspect it would be significant slower doing it that way. The
reason Hashtable is fast is the ability to lookup directly based on hash
value.

ArrayList is not a good choice to insert sorted into.

You were taugth wrong.

ArrayList (or List<> in newer .NET versions) is in most cases a very
good choice of data structure.

Arne

I thought the new ArrayList<TYPE> was just to stop unwanted items been
added item your list which in turn would be processed by your program
that maybe "harmful", and of course you no longer need to cast objects
when pulling them out structure. ArrayLists are simple to use and perfect
for most simple programs. Although you may argue that the more complex
programs be written in C of Assembly, for example Folding@Home.

I still stick with my argument, you do not have to re hash a hash table
when a large amount of collisions happen, you can just keep adding more
items, once a array list is full you have to clone it to a larger table.
But of course you may already know the amount of items that you are going
to be adding which would make this discussion null and void.

j1mb0jay

Arne VajhÃ¸j · May 25, 2008

j1mb0jay said:
I thought the new ArrayList<TYPE> was just to stop unwanted items been
added item your list which in turn would be processed by your program
that maybe "harmful", and of course you no longer need to cast objects
when pulling them out structure.

It is more type safe.

Type safeness is a very good thing.

It also has a positive impact on performance of lists
of value types.

ArrayLists are simple to use and perfect
for most simple programs.

Also for complex programs.

Although you may argue that the more complex
programs be written in C of Assembly, for example Folding@Home.

It is the other way around. You can do simple programs in assembler
(and to some extent C). For complex programs you need a more
high level language.

I still stick with my argument, you do not have to re hash a hash table
when a large amount of collisions happen, you can just keep adding more
items, once a array list is full you have to clone it to a larger table.

You are free to stick with your argument.

But everybody can lookup in the .NET code and see that you are wrong -
the Hashtable/Dictionary also reallocates and copy data when it
is full.

Arne

Rudy Velthuis · May 25, 2008

j1mb0jay said:
Then
when you want to look for an item you will only have to look at one
item to see if you have it, rather than the whole list.

If a hash table is full of collisions, as you put it, it is in fact
just a list of linked lists, and not that much faster anymore. You must
most definitely look at more than one item, although of course only for
the bucket corresponding to the hash value you are looking for.

So rehashing is most definitely something that must be done, sometimes,
if you keep on having collisions, or if the list is rather full.

And reallocations in an ArrayList do not occur for each new item
either. If it is full, the capacity is doubled and the data is copied
over. From then on, you can add quite a few more items before it is
full.
--
Rudy Velthuis http://rvelthuis.de

"Your Highness, I have no need of this hypothesis."
-- Pierre Laplace (1749-1827), to Napoleon on why his works on
celestial mechanics make no mention of God.

j1mb0jay · May 25, 2008

It is more type safe.

Type safeness is a very good thing.

It also has a positive impact on performance of lists of value types.

Also for complex programs.

It is the other way around. You can do simple programs in assembler (and
to some extent C). For complex programs you need a more high level
language.

You are free to stick with your argument.

But everybody can lookup in the .NET code and see that you are wrong -
the Hashtable/Dictionary also reallocates and copy data when it is full.

Arne

I never said anything about the .net version !!!

j1mb0jay

Arne VajhÃ¸j · May 25, 2008

j1mb0jay said:
I never said anything about the .net version !!!

There are not much point in posting about non-.NET in a
..NET newsgroup.

Arne

Arne Vajhøj · May 26, 2008

Tony said:
I have an ArrayList with many double that is sorted some of these double
value exist more then once in the ArrayList.
Is it possible to use some feature in C# or from the framework to count each
unique double value or is the only
solution to run through the list and count every unique double value.

You can not avoid the loop (with a list as data structure).

If you used List<double> instead of ArrayList you could use an
implicit loop as in:

int ndistinct = 0;
double lastx = double.NaN;
lst.ForEach(delegate(double x) { if(x != lastx) ndistinct++; lastx = x; });

But if you consider that nicer than a simple for loop is up to you.

And please pay attention to Jon's note about floating point
comparison.

Arne

How to count value in a ArrayList

Tony

Family Tree Mike

j1mb0jay

Jon Skeet [C# MVP]

Ignacio Machin ( .NET/ C# MVP )

j1mb0jay

j1mb0jay

Arne VajhÃ¸j

j1mb0jay

Arne VajhÃ¸j

j1mb0jay

Arne VajhÃ¸j

j1mb0jay

Arne VajhÃ¸j

j1mb0jay

Arne VajhÃ¸j

Rudy Velthuis

j1mb0jay

Arne VajhÃ¸j

Arne Vajhøj