In-memory whitelist of 1M phrases

A

AV

Hi,

What is the best way to handle the following:

* List of 1M unique phrases
* These are "whitelist" phrases. Meaning that if an input string
matches one of these phrases it is "GOOD" if it does not match any of
these phrases it is "BAD".
* The process gets hit 200-500 times per SECOND
* I want to avoid making this many look up requests to a database
with
clustered index that stores these phrases
* I want to somehow detect the BAD phrase with an in-memory lookup
* The problem is that in my environment (ASP.NET) the regular
hashtable takes up too much space in memory for these 1M records
(since it actually stores the values of each record).
* I do not care about the value of the record, I only want to store
it's hash and in the minimalistic way so that the looks that happe
200-500 requests per second can use the in-memory struct.


Any suggestions for C#/ASP.NET environment?


Thanks,
AV
 
A

Arne Vajhøj

AV said:
What is the best way to handle the following:

* List of 1M unique phrases
* These are "whitelist" phrases. Meaning that if an input string
matches one of these phrases it is "GOOD" if it does not match any of
these phrases it is "BAD".
* The process gets hit 200-500 times per SECOND
* I want to avoid making this many look up requests to a database
with
clustered index that stores these phrases
* I want to somehow detect the BAD phrase with an in-memory lookup
* The problem is that in my environment (ASP.NET) the regular
hashtable takes up too much space in memory for these 1M records
(since it actually stores the values of each record).
* I do not care about the value of the record, I only want to store
it's hash and in the minimalistic way so that the looks that happe
200-500 requests per second can use the in-memory struct.

Any suggestions for C#/ASP.NET environment?

1) create a hash function that creates a reasonable good hash
of the first 24 characters and the last 8 characters to let us
say the full range of a ulong

2) make a database table with:
id,int,pk
phrase,varchar(n)
hash,bigint,ix

3) create a class that hold a fast searchable list of ulong hashes
(HashSet or sorted array)

4) implement code that populates the database table

5) implement code to load from table to in-memory structure

6) implement logic as:

search(phrase) {
h = hash(phrase)
if h in in-memory structure {
query = SELECT phrase FROM tabel WHERE hash=h
if phrase found in query results {
return GOOD
} else {
return BAD
}
} else {
return BAD
}

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top