14 million lexicons

ryu

Hi,

I am just curious. In the paper by the Google founders, they said they were
able to load a lexicon of 14 million words into 256MB of memory. How did they
do that? Is there any way it can be done using .NET or C++?

Regards
Ryu
 
Hi

People used to be able to fit a whole game, a word processor and a
spreadsheet on one floppy...

Do the math and you will see you still have more than 19 bytes available for
each word: 256MB = 268,435,456 bytes, and 268,435,456 / 14,000,000 is about
19.2 bytes per word.
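
For illustration, a rough sketch of why the naive approach eats that budget
before storing a single character (exact sizes vary by compiler and standard
library, so treat the numbers as typical, not definitive):

#include <iostream>
#include <string>

int main() {
    // The bookkeeping of one std::string object alone (pointer, size,
    // capacity / small-string buffer) is typically 24-32 bytes on a
    // 64-bit platform - more than the whole ~19-byte-per-word budget,
    // before counting any heap block or container node.
    std::cout << "sizeof(std::string) = " << sizeof(std::string) << '\n';
}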

But, indeed, I would also like to know how memory alignment is done in .NET
with classes. Does anybody know of a (correct) in-depth article about the
internal memory layout?
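
I don't have a definitive .NET reference either, but the padding effect
itself is easy to demonstrate in C++ (the sizes below are what a typical
64-bit compiler produces; note the CLR, unlike C++, may also reorder class
fields):

#include <iostream>

struct Padded { char a; int b; char c; };  // holes after 'a' and 'c'
struct Packed { int b; char a; char c; };  // same fields, reordered

int main() {
    std::cout << sizeof(Padded) << '\n';  // typically 12
    std::cout << sizeof(Packed) << '\n';  // typically 8
}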

kind regards

Alexander
 
256MB = 256,000,000 bits (apprx)
14 million = 14,000,000.

Divide the two - apprx 18 bits per entry.

If the information for an individual entry fits in less than 18 bits, then
it's possible; otherwise it's not. (Were they simply storing pointers to the
information?)

Yes, it's possible to do the above in C++. The real question is: why would
you want to? Memory is cheap!

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
 
Sahil Malik said:
256MB = 256,000,000 bits (apprx)

Nope - 256MB is 256,000,000 *bytes* - so you get about 18 *bytes* per
entry.

18 *bits* per entry would have been far harder to do.
 
How am I able to put 14 million terms into 256MB? I am only able to put 2
million into approximately 120MB, and I am using a hashtable. What can I use
besides a hash table?
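
The usual culprit is per-string overhead: every entry in a hashtable of
string objects pays for an object header, a separate heap block and a bucket
or node, which easily triples the cost of the characters themselves. If I
remember the paper right, they stored the words concatenated and
null-separated, with a table of pointers into that list. Below is a sketch
of mine in that spirit (not the paper's exact structure): one contiguous
buffer plus a sorted offset array, costing word length + 1 null byte +
4 offset bytes per word, well inside the ~19-byte budget.

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

// Compact lexicon: every word lives in one contiguous buffer, and a
// sorted array of 32-bit offsets replaces per-string objects.
class Lexicon {
    std::string pool_;                    // words, each ending in '\0'
    std::vector<std::uint32_t> offsets_;  // start of each word
public:
    // Words must be added in sorted order for find() to work.
    void add(std::string_view w) {
        offsets_.push_back(static_cast<std::uint32_t>(pool_.size()));
        pool_.append(w);
        pool_.push_back('\0');
    }
    std::string_view word(std::size_t i) const {
        return std::string_view(pool_.data() + offsets_[i]);  // up to '\0'
    }
    // Binary search; returns the word's index, or -1 if absent.
    long find(std::string_view w) const {
        long lo = 0, hi = static_cast<long>(offsets_.size()) - 1;
        while (lo <= hi) {
            long mid = lo + (hi - lo) / 2;
            int c = word(mid).compare(w);
            if (c == 0) return mid;
            if (c < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }
};

int main() {
    Lexicon lex;
    for (std::string_view w : {"apple", "banana", "cherry"}) lex.add(w);
    std::cout << lex.find("banana") << '\n';  // 1
    std::cout << lex.find("durian") << '\n';  // -1
}

A binary search is O(log n) rather than a hashtable's O(1), but for a
dictionary that is built once and then only read, the memory saved usually
matters more.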
 
Jon said:
Nope - 256MB is 256,000,000 *bytes* - so you get about 18 *bytes* per
entry.

Correct me if I'm wrong, but 256MB is 256*1024*1024 bytes ;)
That is 268,435,456 (a few more words fit that way) :)
 