Best way to compare two sets of data

  • Thread starter: Terry

Terry

I've got a situation where I have a set of data, and later take another
snapshot to obtain a second set of data. There will be one or more
changes in the second set of data, and I need to be able to tell which
items from the first set are missing from the second set and which items
were added to the second set.

Can anyone recommend an algorithm for this, or a collection class in C#
that may be of help?

My first inclination is to take two arrays and just start looping
through one at a time, but it seems like this is something that would
have to be done all the time and that there would be a more efficient
algorithm for doing so.

In the end, I'd like to be able to say that between that time and this
time, "x" and "y" were added, and "z" was removed.

Thanks!
 
Terry,

Is this stored in a DataSet? If it is, then you can use the GetChanges
method on the DataSet to create another DataSet that has only the changes
that have occurred since the last time AcceptChanges (or the creation of the
dataset) was called.

Hope this helps.
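
For readers following along, here is a minimal sketch of the GetChanges / AcceptChanges pattern described above. The table name "Items", the column name "Key", and the helper names CreateBaseline and DescribeChanges are made up for illustration. It also assumes the second snapshot is applied to the DataSet as row additions and deletions, which may not match how the data actually arrives.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataSetChangesSketch
{
    // Loads the first snapshot into a one-table DataSet and marks it as the baseline.
    // "Items" and "Key" are hypothetical names chosen for this sketch.
    public static DataSet CreateBaseline(string[] keys)
    {
        DataSet ds = new DataSet("Snapshots");
        DataTable table = ds.Tables.Add("Items");
        table.Columns.Add("Key", typeof(string));
        foreach (string key in keys)
        {
            table.Rows.Add(key);
        }
        ds.AcceptChanges(); // every row is now in the Unchanged state
        return ds;
    }

    // Reports rows added or deleted since the last AcceptChanges call.
    public static List<string> DescribeChanges(DataSet ds)
    {
        List<string> lines = new List<string>();

        // GetChanges returns null when no rows match the requested state.
        DataSet added = ds.GetChanges(DataRowState.Added);
        if (added != null)
        {
            foreach (DataRow row in added.Tables["Items"].Rows)
            {
                lines.Add("added: " + row["Key"]);
            }
        }

        DataSet deleted = ds.GetChanges(DataRowState.Deleted);
        if (deleted != null)
        {
            foreach (DataRow row in deleted.Tables["Items"].Rows)
            {
                // A deleted row only exposes its original values.
                lines.Add("removed: " + row["Key", DataRowVersion.Original]);
            }
        }
        return lines;
    }
}
```

To try it, build a baseline from { "x", "y", "z" }, call Rows.Add("w") and Rows[2].Delete() on the "Items" table, then pass the DataSet to DescribeChanges.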
 
Nicholas said:
Terry,

Is this stored in a DataSet? If it is, then you can use the GetChanges
method on the DataSet to create another DataSet that has only the changes
that have occurred since the last time AcceptChanges (or the creation of the
dataset) was called.

Hope this helps.

No. Currently the data is coming back as a simple array. These aren't
large sets of data, 64 max and most of the time less than 20, so I was
thinking that "DataSet" would be kind of a heavyweight control for this.
I just noticed "SortedList" which may be useful.

Or is DataSet not as big as I'm assuming it is? By "big" I mean a lot
of overhead for a simple operation?

Terry
 
How's this for a solution? This might be good enough, unless anyone
sees a better way of handling this?

private void processDifferences(SortedList ar1, SortedList ar2)
{
    ArrayList itemsRemoved = new ArrayList();
    ArrayList itemsAdded = new ArrayList();
    foreach (DictionaryEntry de in ar1)
    {
        // If it's not in second set it was removed
        if (!ar2.ContainsKey(de.Key))
        {
            itemsRemoved.Add(de.Key);
        }
        else // Otherwise it is still there, remove it from ar2
        {
            ar2.Remove(de.Key);
        }

        // Everything that's left in ar2 was added
        itemsAdded.AddRange(ar2);
    }
}
 
Terry,

Well, that's something you will have to decide for yourself (whether or
not it is too big). Obviously, there is going to be some overhead, but only
you can determine if that overhead is tolerable. For this kind of
functionality, I would say it is, but then again, I don't know the scope of
its use.

If you don't want to use a data set, then you could easily determine
which elements changed between the two snapshots. The problem stems
from how you want to indicate there was a change. For example, if the
element at index 14 was deleted, is everything else shifted down or not?
Does the position matter? What if the array has more than one element with
the same value in it?

These questions pile up pretty quickly, which is why I opt for the data
set =)
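
To make the duplicate-value question concrete, here is one possible sketch that ignores position and compares how many times each value occurs in the two snapshots. It assumes string items and a runtime with generic collections. The names MultisetDiff and CountChanges are invented for this example, and it is only one of several ways to answer the questions above.

```csharp
using System;
using System.Collections.Generic;

public static class MultisetDiff
{
    // Counts how many times each value occurs in a snapshot.
    private static Dictionary<string, int> Count(string[] items)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (string item in items)
        {
            int n;
            counts.TryGetValue(item, out n); // n is 0 when the item is not present yet
            counts[item] = n + 1;
        }
        return counts;
    }

    // Returns value -> change in occurrence count.
    // Positive means added that many times, negative means removed.
    // Values whose count did not change are left out.
    public static Dictionary<string, int> CountChanges(string[] first, string[] second)
    {
        Dictionary<string, int> before = Count(first);
        Dictionary<string, int> after = Count(second);
        Dictionary<string, int> changes = new Dictionary<string, int>();

        foreach (KeyValuePair<string, int> pair in after)
        {
            int old;
            before.TryGetValue(pair.Key, out old);
            if (pair.Value != old)
            {
                changes[pair.Key] = pair.Value - old;
            }
        }
        foreach (KeyValuePair<string, int> pair in before)
        {
            if (!after.ContainsKey(pair.Key))
            {
                changes[pair.Key] = -pair.Value;
            }
        }
        return changes;
    }
}
```

With a first snapshot of { "a", "a", "b" } and a second of { "a", "c" }, this reports one "a" removed, one "b" removed, and one "c" added. Whether position should matter is a separate decision that this sketch does not address.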
 
Terry said:
How's this for a solution? This might be good enough, unless anyone
sees a better way of handling this?

 
Terry,

Can your lists have multiple entries of the same value? If so, then
that might not work (unless you don't care which duplicate value is
retained).
 
Nicholas said:
Terry,

Can your lists have multiple entries of the same value? If so, then
that might not work (unless you don't care which duplicate value is
retained).

No, they can't have duplicate values. The keys must be unique, and
since I'm checking for existence of keys, wouldn't that take care of it?
Even if two values were the same, the keys would mean that they are
two different items. The values can really be anything, but the keys
are how things are tracked, so it should be ok (I think :-)

Also, I realized I need to move that "AddRange()" call outside of the
"foreach" loop. :-X

Terry
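
For reference, a sketch of how the method might look with that fix folded in. This is an interpretation, not Terry's final code: it assumes the caller wants the keys back, so it takes the two result lists as parameters, and it leaves the second list untouched by checking each side against the other instead of removing entries. The names SnapshotDiff, before, and after are made up here.

```csharp
using System;
using System.Collections;

public static class SnapshotDiff
{
    // Fills itemsRemoved with keys that are only in 'before'
    // and itemsAdded with keys that are only in 'after'.
    // Neither input list is modified.
    public static void ProcessDifferences(SortedList before, SortedList after,
                                          ArrayList itemsRemoved, ArrayList itemsAdded)
    {
        foreach (DictionaryEntry de in before)
        {
            // In the first set but not the second: it was removed.
            if (!after.ContainsKey(de.Key))
            {
                itemsRemoved.Add(de.Key);
            }
        }

        foreach (DictionaryEntry de in after)
        {
            // In the second set but not the first: it was added.
            if (!before.ContainsKey(de.Key))
            {
                itemsAdded.Add(de.Key);
            }
        }
    }
}
```

With keys "a" and "z" in the first list and "a", "x", and "y" in the second, itemsRemoved ends up holding "z" and itemsAdded holds "x" and "y". For lists of 64 items or fewer, the second pass costs very little.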
 