Best data structure?

James

I am using VB.NET and need to keep a large data structure in memory, so I am
looking for the best option. After several tests I am pretty confused, so I
would be grateful if anyone can help me.



My basic need is:



Keep an "array" (or other structure) of pointers to "arrays" (or other
structure) of bytes. Example:



arrayPointer(1,000,000), where each element points to arrayByte1(), arrayByte2(),
arrayByte3(), ..., arrayByte1000000().

So it is possible to ReDim each arrayByte() to the size I need.

I assumed that a configuration like the above (supposing each arrayByte holds
1 byte) needs:



Pointers: 4 bytes x 1 million = 4 MB

Arrays: 1 byte per array x 1 million arrays = 1 MB



I have tried different options and got very high memory consumption. Is there
any way to keep memory low and achieve what I need?

I am using GC.GetTotalMemory(True) to find out the memory usage; is that
right?
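
For reference, a minimal sketch of that measurement pattern, assuming the usual
GC calls (GC.GetTotalMemory(True) already forces a collection; the extra
GC.Collect/WaitForPendingFinalizers calls just steady the baseline, and the
allocation in the middle is only a placeholder):

Dim before As Long
Dim after As Long

GC.Collect()                            ' settle the heap before taking a baseline
GC.WaitForPendingFinalizers()
before = GC.GetTotalMemory(True)        ' True forces a full collection first

' ... allocate the structure under test here ...

after = GC.GetTotalMemory(True)
Console.WriteLine("Allocated bytes: " & (after - before).ToString())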





Array of structures

Public Structure stbyte
    Public byt() As Byte
End Structure

Dim Memory As Double
Dim intI As Integer

Memory = GC.GetTotalMemory(True)
Dim intPo(1000000) As stbyte            ' 1 million structures
For intI = 1 To 1000000
    ReDim intPo(intI - 1).byt(0)        ' ReDim each to 1 byte
Next
Memory = GC.GetTotalMemory(True) - Memory
Memory = Memory / 1000000               ' approximate MB



I obtained Memory = 20 MB.





Jagged Array of Bytes

Memory = GC.GetTotalMemory(True)
Dim bytes(1000000)() As Byte            ' a 1-million-element jagged array
For intI = 1 To 1000000
    ReDim bytes(intI - 1)(0)            ' each inner array holds 1 byte
Next
Memory = GC.GetTotalMemory(True) - Memory
Memory = Memory / 1000000               ' approximate MB



I obtained Memory = 20 MB.





Rectangular Array of Bytes

Memory = GC.GetTotalMemory(True)
Dim bytes(1000000, 0) As Byte           ' 1 million rows of 1 byte each
Memory = GC.GetTotalMemory(True) - Memory
Memory = Memory / 1000000               ' approximate MB



I obtained Memory = 1 MB (but in this case I cannot ReDim each row).





Array of BitArrays

Memory = GC.GetTotalMemory(True)
Dim bitArray(1000000) As BitArray       ' 1 million BitArrays (System.Collections)
For intI = 1 To 1000000
    bitArray(intI - 1) = New BitArray(8)    ' each BitArray holds 8 bits
Next
Memory = GC.GetTotalMemory(True) - Memory
Memory = Memory / 1000000               ' approximate MB



I obtained Memory = 40 MB.
 
James,

The first question you always have to ask yourself with arrays is: "Are all
elements of the array used?" If not, then how many are not?

Maybe, for instance, a Hashtable would be a better choice.

However, in my opinion there is no general answer to give based on your
question as it stands.

Cor
 
Probably the question is not well defined. What I need:

I need the data structure that consumes the least memory and allows me to:

1) Keep a list of "pointers" or "references" to data structures (for this a
Hashtable is fine).
2) Resize each data structure dynamically, each one with a different size.

An example could be:
An array of pointers, where each pointer points to an array of bytes. That is:
one array of M pointers, and M arrays of bytes (each one can hold a different
number of bytes).
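
A minimal sketch of that shape using a Hashtable, as suggested above; the keys,
values, and the grow-by-one step are purely illustrative:

' One Hashtable entry per key; each value is an independently resizable Byte().
Dim table As New Hashtable()

table("item1") = New Byte() {1}             ' starts as a 1-byte array
table("item2") = New Byte() {7, 9}          ' entries can have different lengths

' Growing one entry later: copy out, ReDim Preserve, store back.
Dim ids As Byte() = DirectCast(table("item1"), Byte())
ReDim Preserve ids(ids.Length)              ' room for one more byte
ids(ids.Length - 1) = 42
table("item1") = ids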

Thanks,

james
 
The big picture may help (is this for things like spreadsheet cells? What is
the nature of that information?).

For example, a "sparse array" structure is a combination of elementary data
structures that stores only the cells that are actually filled in (i.e. you can
have a 200x200 cell array, but unused cells won't take any storage).

Also, as a side note, there is AFAIK no single solution for this (for example,
one basic approach uses a list of rows/columns, and each list holds a list of
the cells on that row/column), but as long as the external interface is
exposed as an array (cells(x, y)) you can change the underlying implementation
with little or no impact on the main code...
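
A minimal sketch of that idea, here backed by a single Hashtable keyed on the
(x, y) pair instead of per-row lists; the class and member names are just
illustrative assumptions:

Imports System.Collections

' Exposes a Cells(x, y) interface while storing only the cells that were set.
Public Class SparseArray
    Private ReadOnly _cells As New Hashtable()

    Public Property Cells(ByVal x As Integer, ByVal y As Integer) As Byte
        Get
            Dim key As String = x.ToString() & ":" & y.ToString()
            If _cells.ContainsKey(key) Then
                Return DirectCast(_cells(key), Byte)
            End If
            Return 0                        ' unset cells read as 0 and cost nothing
        End Get
        Set(ByVal value As Byte)
            _cells(x.ToString() & ":" & y.ToString()) = value
        End Set
    End Property
End Class

The main code keeps writing sparse.Cells(10, 20) = 5 no matter how the storage
underneath is implemented.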
 
Ok! These are my needs:

I have a list of words (more than a million different words); I keep these
words in a Hashtable for easy access. For each word I need to keep, in
compressed (binary) form, the IDs of the documents the word belongs to (it can
be hundreds of documents for each word). Sometimes I need to add more document
IDs to some of the words. That is:

Word       Document IDs
Paris  ->  doc123, doc234, doc345, ...
London ->  doc2, doc34, doc234, ...
...

So I need to keep in memory (that is what I am looking for: low memory usage)
the words (actually I use a Hashtable) and some structure to hold the document
IDs for each word. Sometimes I need to add new documents to some of the words
(normally after the last document for that word), so the data structure that
holds the documents must be able to grow.

My first thought was to use a jagged array, but it consumes a lot of overhead
memory:

Dim bytes(1000000)() As Byte        ' a 1-million-element jagged array
For intI = 1 To 1000000
    ReDim bytes(intI - 1)(0)        ' each inner array holds 1 byte
Next

This consumes 20 MB, when I was expecting it to consume about 1 MB.

Any help??
 
Have you considered using a class that has a "Word" property and a collection
of document IDs? You would have a class instance for each word.
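
A minimal sketch of that suggestion; the class name WordEntry is made up, and
an ArrayList stands in for whatever collection type is preferred:

' One instance per word; the document IDs live in a per-word list.
Public Class WordEntry
    Public Word As String
    Public DocumentIds As New ArrayList()

    Public Sub New(ByVal w As String)
        Me.Word = w
    End Sub

    Public Sub AddDocument(ByVal docId As Integer)
        DocumentIds.Add(docId)
    End Sub
End Class

Usage would be along the lines of Dim entry As New WordEntry("Paris") followed
by entry.AddDocument(123).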
 
Dennis,

That might not be a bad idea. That class could encapsulate the
compression/decompression of the word and its associated document IDs for
more efficient storage. Sure, there'll be a performance hit, but with
1,000,000 word associations the OP is going to have to get creative with the
data structure. It'll be an exercise in balancing space vs. performance.
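
Purely as an illustration of that kind of encapsulation (none of this comes
from the thread; the class name PackedDocList is made up), such a class could
pack each word's document IDs into a single Byte() so the only per-word
allocation is one array, with real compression layered behind the same two
methods later:

' Packs each word's document IDs into one Byte() (4 bytes per ID).
' ReDim Preserve reallocates on every add, which is fine for a sketch
' but costly for bulk loads: the space vs. performance trade-off above.
Public Class PackedDocList
    Private _data As Byte() = New Byte() {}

    Public Sub AddDocument(ByVal docId As Integer)
        Dim idBytes As Byte() = BitConverter.GetBytes(docId)
        Dim oldLen As Integer = _data.Length
        ReDim Preserve _data(oldLen + idBytes.Length - 1)
        Array.Copy(idBytes, 0, _data, oldLen, idBytes.Length)
    End Sub

    Public Function GetDocuments() As Integer()
        Dim count As Integer = _data.Length \ 4
        Dim ids(count - 1) As Integer
        For i As Integer = 0 To count - 1
            ids(i) = BitConverter.ToInt32(_data, i * 4)
        Next
        Return ids
    End Function
End Class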

That's one thing you didn't tell us, James. I think we have a pretty good
handle on your memory requirements, but what about performance? Is that
important? What is the typical scenario for accessing the data structure?

Brian
 