need suggestion on data structure for this data

  • Thread starter Thread starter bahoo
  • Start date Start date
B

bahoo

Hi,

My data is organized like this, where A always goes from 1 to some
number N:
A 1 2 3 ...
B 53,26 42,18 15,86 ...
C 59 43 31 ...


I often need to look up in this way:
Given B=18, what is the corresponding value of A and C? (Answer is 2
and 43).
My current implementation is a for-loop that goes through all A's and
checks if the B value is the query, and if yes, return the
corresponding A and C. But this is very inefficient.
Could anyone please suggest me a better data structure for doing this,
better be a built-in one?
Observe that B always appears in pairs, and when I look up, I may use
either value within the pair.
Thanks!
bahoo
 
bahoo said:
[...]
My current implementation is a for-loop that goes through all A's and
checks if the B value is the query, and if yes, return the
corresponding A and C. But this is very inefficient.
Could anyone please suggest me a better data structure for doing this,
better be a built-in one?

Are the values of B unique? I assume they are, otherwise even your
brute-force algorithm seems to have problems.

The usual way to address something like this is to create an index. In
..NET, the Dictionary<> generic class would work well for this. Using
the values from B as your index, and the corresponding column number (A)
as your value.

You would have to scan through all of the data once, adding two new
entries to the Dictionary for each column (one for each value of B).
Then given a value for B to look up, you would simply use that value as
the index for the Dictionary<>, which would return your A value, which
you could then use to also get the C value.

There are lots of other ways to implement an index; don't feel you have
to use Dictionary<>. It just happens to be relatively convenient. An
alternative might be to maintain an array sorted based on the B values,
and containing a data structure that includes the A value. Then a
binary search would quickly find the right array element based on the B
value, which would then yield the A value.

Any index will, of course, improve performance at the cost of memory
usage. You don't say how many columns of data you might have, but if
it's small, your brute-force iterative method might well be just fine.

Pete
 
bahoo said:
My data is organized like this, where A always goes from 1 to some
number N:
A 1 2 3 ...
B 53,26 42,18 15,86 ...
C 59 43 31 ...


I often need to look up in this way:
Given B=18, what is the corresponding value of A and C? (Answer is 2
and 43).
My current implementation is a for-loop that goes through all A's and
checks if the B value is the query, and if yes, return the
corresponding A and C. But this is very inefficient.
Could anyone please suggest me a better data structure for doing this,
better be a built-in one?
Observe that B always appears in pairs, and when I look up, I may use
either value within the pair.

Make a class Info with 4 fields/properties A, B1, B2 and C.

Have a Dictionary<int,Info> instance where you insert each
object twice using B1 and B2 as keys.

Arne
 
bahoo said:
[...]
My current implementation is a for-loop that goes through all A's and
checks if the B value is the query, and if yes, return the
corresponding A and C. But this is very inefficient.
Could anyone please suggest me a better data structure for doing this,
better be a built-in one?

Are the values of B unique? I assume they are, otherwise even your
brute-force algorithm seems to have problems.

The usual way to address something like this is to create an index. In
.NET, the Dictionary<> generic class would work well for this. Using
the values from B as your index, and the corresponding column number (A)
as your value.

You would have to scan through all of the data once, adding two new
entries to the Dictionary for each column (one for each value of B).
Then given a value for B to look up, you would simply use that value as
the index for the Dictionary<>, which would return your A value, which
you could then use to also get the C value.

There are lots of other ways to implement an index; don't feel you have
to use Dictionary<>. It just happens to be relatively convenient. An
alternative might be to maintain an array sorted based on the B values,
and containing a data structure that includes the A value. Then a
binary search would quickly find the right array element based on the B
value, which would then yield the A value.

Any index will, of course, improve performance at the cost of memory
usage. You don't say how many columns of data you might have, but if
it's small, your brute-force iterative method might well be just fine.

Pete

Thanks Pete, but sorry that I didn't mention the B values are NOT
unique: different A's can have the same B value. When I do look up, I
wish to retrieve all these A's and C's that correspond to the query
B.
What I did in my brute-force approach is to scan through all the B's,
and every time I find a match, I record the A's and C's in a list.
If "Dictionary" doesn't apply here, what would you suggest next?

Thanks!!
bahoo
 
bahoo said:
[...]
My current implementation is a for-loop that goes through all A's and
checks if the B value is the query, and if yes, return the
corresponding A and C. But this is very inefficient.
Could anyone please suggest me a better data structure for doing this,
better be a built-in one?

Are the values of B unique? I assume they are, otherwise even your
brute-force algorithm seems to have problems.

The usual way to address something like this is to create an index. In
.NET, the Dictionary<> generic class would work well for this. Using
the values from B as your index, and the corresponding column number (A)
as your value.

You would have to scan through all of the data once, adding two new
entries to the Dictionary for each column (one for each value of B).
Then given a value for B to look up, you would simply use that value as
the index for the Dictionary<>, which would return your A value, which
you could then use to also get the C value.

There are lots of other ways to implement an index; don't feel you have
to use Dictionary<>. It just happens to be relatively convenient. An
alternative might be to maintain an array sorted based on the B values,
and containing a data structure that includes the A value. Then a
binary search would quickly find the right array element based on the B
value, which would then yield the A value.

Any index will, of course, improve performance at the cost of memory
usage. You don't say how many columns of data you might have, but if
it's small, your brute-force iterative method might well be just fine.

Pete

Thanks Pete, but sorry that I didn't mention the B values are NOT
unique: different A's can have the same B value. When I do look up, I
wish to retrieve all these A's and C's that correspond to the query
B.
What I did in my brute-force approach is to scan through all the B's,
and every time I find a match, I record the A's and C's in a list.
If "Dictionary" doesn't apply here, what would you suggest next?

Thanks!!
bahoo
 
bahoo said:
Thanks Pete, but sorry that I didn't mention the B values are NOT
unique: different A's can have the same B value. When I do look up, I
wish to retrieve all these A's and C's that correspond to the query
B.
What I did in my brute-force approach is to scan through all the B's,
and every time I find a match, I record the A's and C's in a list.
If "Dictionary" doesn't apply here, what would you suggest next?

You can use a Dictionary<int, List<...>> where you put the items in the
lists.

For each item you add you check if the dictionary contains the key. If
it does, just add the item to the existing list, if it doesn't, create a
list, add the item to it and add the list to the dictionary.
 
bahoo said:
[...]
What I did in my brute-force approach is to scan through all the B's,
and every time I find a match, I record the A's and C's in a list.
If "Dictionary" doesn't apply here, what would you suggest next?

Well, as I mentioned, using an array sorted by the B values, with a
binary search to look them up, is an alternative. With duplicated
values, you'd have to scan forward and back in the array once you've
found a match, to ensure that you retrieve all of the records. But it
would be reasonably fast.

There may be some hashtable-based structure in .NET that allows for
duplicate values. If there is, I don't know it off the top of my head
and I don't have time to look it up. However, generally speaking that
would be an option too (whether you have to implement it yourself or
not, I don't know).

I will also reiterate my suggestion that if the brute-force method works
for the size of data you have, there may be no need to do anything more
complicated than that.

Pete
 
Göran Andersson said:
You can use a Dictionary<int, List<...>> where you put the items in the
lists.

Yup, that's a pretty good approach too. I'm embarrassed to not have
mentioned it myself, if only because I have used that exact technique in
at least one of my own programs. :)

Pete
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top