Working with Hashtables

David · Jan 29, 2006

Hi all,

I am new to Hashtables, so I am not yet fully familiar with them.

I have been experimenting with a search engine spider written in C#. It uses
Hashtables to hold the catalog.

Now, if I have a large site, or I want to scan many websites, then the
Hashtables would get very large. I am looking at writing them to disk and
reading them back, though I am not sure how this would work.

I have found this on the net (VB code; I can convert it, so no need to
worry about that):

' At the top of the file:
Imports System.Collections
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

Private Sub cmdSave_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSave.Click
    'Save the hashtable. FileMode.Create overwrites an existing file,
    'so no separate File.Exists/File.Delete step is needed.
    Using fs As New FileStream(Application.StartupPath & "\data.dat", FileMode.Create)
        Dim bf As New BinaryFormatter()
        bf.Serialize(fs, HashTest)
    End Using
End Sub

Private Sub cmdLoad_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdLoad.Click
    'Load the hashtable, if a saved file exists
    If Not File.Exists(Application.StartupPath & "\data.dat") Then Exit Sub
    Using fs As New FileStream(Application.StartupPath & "\data.dat", FileMode.Open)
        Dim bf As New BinaryFormatter()
        'Deserialize returns Object, so cast it back to Hashtable
        HashTest = DirectCast(bf.Deserialize(fs), Hashtable)
    End Using
    cmdIterate_Click(Nothing, Nothing)
End Sub

This looks like it might be suitable, but my questions are:
1. Can I incrementally add to the hashtable file, so that I don't run out of
memory when scanning for files to catalog?
2. When reading it back off disk, do I have to read the whole lot into
memory in order to search through it?

If I can't do either of these, then what would you suggest?

The way I want to use it is somewhat like a sql database, where I can
quickly select the records I need.

--
Best regards,
Dave Colliver.
http://www.AshfieldFOCUS.com
~~
http://www.FOCUSPortals.com - Local franchises available

Piotr Dobrowolski · Jan 29, 2006

On 29-01-2006 at 17:20:07, David wrote:

1. Can I incrementally add to the hashtable file, so that I don't run out of
memory when scanning for files to catalog?
2. When reading it back off disk, do I have to read the whole lot into
memory in order to search through it?

[PD] With the solution above, I'm afraid you can do neither: you cannot add
to the file incrementally, and you cannot search it without loading
everything. You would have to serialize and deserialize the whole Hashtable
every time.
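
For contrast with the all-or-nothing serialization above, here is a minimal
sketch of what incremental writes and streaming reads could look like with a
plain tab-separated text file. The CatalogFile class, the word/URL record
layout and the file format are all assumptions made for illustration; none of
them come from the spider's code. It also assumes that words never contain a
tab character.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only: one "word<TAB>url" record per line.
public static class CatalogFile
{
    // Append one record without reading or rewriting what is already on disk.
    public static void Append(string path, string word, string url)
    {
        using (StreamWriter writer = new StreamWriter(path, true)) // true = append
        {
            writer.WriteLine(word + "\t" + url);
        }
    }

    // Scan the file one line at a time; only the current line and the
    // matches found so far are held in memory.
    public static List<string> FindUrls(string path, string word)
    {
        List<string> urls = new List<string>();
        if (!File.Exists(path))
        {
            return urls;
        }
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                int tab = line.IndexOf('\t');
                if (tab > 0 && line.Substring(0, tab) == word)
                {
                    urls.Add(line.Substring(tab + 1));
                }
            }
        }
        return urls;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        Append(path, "hashtable", "http://example.com/a");
        Append(path, "spider", "http://example.com/b");
        foreach (string url in FindUrls(path, "hashtable"))
        {
            Console.WriteLine(url);
        }
        File.Delete(path);
    }
}
```

This keeps memory use flat, but every lookup is a linear scan of the whole
file, so it gets slower as the catalog grows. That is one reason a database
is suggested further down the thread.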

If I can't do either of these, then what would you suggest?

The way I want to use it is somewhat like a sql database, where I can
quickly select the records I need.

[PD] If you want database functionality, why not use a database? Some SQL
servers (e.g. Firebird) can be embedded within your application, so you
don't need to install them separately.

Bruce Wood · Jan 30, 2006

A hashtable is basically like a database table with a single primary
key. (Actually, it is more analogous to an in-memory indexed file, but
not everyone these days remembers what an indexed file is.)

You can store an object in the hashtable by specifying the key under
which to store it. You can retrieve the object again by giving its key.
I use them all the time: if you read a table full of business
information from a database, a Hashtable is often a natural fit.

Take a table full of invoices, for example. Each invoice usually has a
unique invoice number. Put the invoices in a Hashtable, keyed by
invoice number. Later, when you get a foreign key in another table that
refers to an invoice number, just use that number to look up the
invoice object in your in-memory hash table.
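
The invoice lookup described above might look something like the following
sketch. The Invoice class, its fields and the InvoiceLookup helper are
hypothetical names invented for the example; only Hashtable itself comes from
the framework.

```csharp
using System;
using System.Collections;

// Hypothetical invoice type, for illustration only.
public class Invoice
{
    public int Number;
    public string Customer;
    public decimal Total;

    public Invoice(int number, string customer, decimal total)
    {
        Number = number;
        Customer = customer;
        Total = total;
    }
}

public static class InvoiceLookup
{
    // Build an in-memory index keyed by invoice number.
    public static Hashtable BuildIndex(Invoice[] invoices)
    {
        Hashtable byNumber = new Hashtable();
        foreach (Invoice invoice in invoices)
        {
            byNumber[invoice.Number] = invoice;
        }
        return byNumber;
    }

    // Resolve a foreign key; returns null when no such invoice is loaded.
    public static Invoice Find(Hashtable byNumber, int invoiceNumber)
    {
        return (Invoice)byNumber[invoiceNumber];
    }

    public static void Main()
    {
        Invoice[] loaded =
        {
            new Invoice(1001, "Acme", 250.00m),
            new Invoice(1002, "Globex", 75.50m)
        };
        Hashtable byNumber = BuildIndex(loaded);

        // A row in another table refers to invoice 1002.
        Invoice hit = Find(byNumber, 1002);
        Console.WriteLine(hit.Customer);
    }
}
```

The indexer returns null for a missing key, so callers should check the
result before using it.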

In your case, why not store the information in a database? I believe
that Microsoft's new SQL Express is out (others will correct me if I'm
wrong about that). It's a free mini-database that will run on any
laptop / desktop / notebook machine. That way your app is scalable.

I wouldn't bother using a flat-file serialization like the one you
posted here. It may be fine for a quick test hack, but not for a real
application.

David · Jan 30, 2006

Thank you both, Bruce and Piotr,

I thought what I was looking at would not be the best solution. I will look
at writing it to a database.

It is for a search engine and spider that uses Hashtables to store the
catalog. I don't like the idea of storing it this way, but I need quick
access.

I am looking at alternative search engine code as well, so I will start
another thread regarding that.

Best regards,
Dave Colliver.
