A
almurph
Hi,
Wondering if you can you help me here. I'm looking for suggestions
essentially (though reasoning would be appreciated too i.e. the why).
I'm looking for a good search algorithm that can handle (i.e. scale
well with) a gradually increasing (very long) string.
For example, say the search pattern is: "ADGH" and the corpus is
"AFSDWFSDTERTDFVDFGDFHHJCFHDFGWEWEGDFGDGSDGSDGDFGHDFDFGDADGHWETERGDFGEDHDFH"
Now, I know that there are many search algorithms out there can can
find this pattern. But I have a number of other requirements:
1. The string is gradually getting longer (from the right) - hence if
building an intermediate object (like a suffix tree) for example - it
has to be able to scale easily and in linear time (like, Ukkonen's
algorithm - which build from the left i.e. using prefixes in near
linear time)
2. There could be multiple occurances of the search pattern in the
corpus
3. I'm also looking for the ability to spot *close* matches, like, say
the string "ADFGH" would be suggested for the search pattern "ADGH" as
it's close...
4. If building any intermediate object the ability to store easily n a
database _may_ be useful, hence the ability of the object to be
serialisable would be handy.
A tall order I know, it may require more than 1 type of algorithm.
So, I would greatly appreciate any suggestions/comments/ideas that you
may want to make.
Thanking you
Al.
Wondering if you can you help me here. I'm looking for suggestions
essentially (though reasoning would be appreciated too i.e. the why).
I'm looking for a good search algorithm that can handle (i.e. scale
well with) a gradually increasing (very long) string.
For example, say the search pattern is: "ADGH" and the corpus is
"AFSDWFSDTERTDFVDFGDFHHJCFHDFGWEWEGDFGDGSDGSDGDFGHDFDFGDADGHWETERGDFGEDHDFH"
Now, I know that there are many search algorithms out there can can
find this pattern. But I have a number of other requirements:
1. The string is gradually getting longer (from the right) - hence if
building an intermediate object (like a suffix tree) for example - it
has to be able to scale easily and in linear time (like, Ukkonen's
algorithm - which build from the left i.e. using prefixes in near
linear time)
2. There could be multiple occurances of the search pattern in the
corpus
3. I'm also looking for the ability to spot *close* matches, like, say
the string "ADFGH" would be suggested for the search pattern "ADGH" as
it's close...
4. If building any intermediate object the ability to store easily n a
database _may_ be useful, hence the ability of the object to be
serialisable would be handy.
A tall order I know, it may require more than 1 type of algorithm.
So, I would greatly appreciate any suggestions/comments/ideas that you
may want to make.
Thanking you
Al.