Needleman-Wunsch - producing more than the optimal solution -- pleasehelp

A

almurph

Folks,

I want to modify the Needleman Wunsch algorithm (pseudo-code here:
http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm) so that it
produces more than the optimal solution.

What I want is the optimal solution, second optimal solution, third
and so on do the chain.

Problem is though I don't know how to do this. Can anyone help
please? Any comments/suggestions/code-sample would be most
appreciated.

Thanks,
Al.
 
E

Ethan Strauss

Hi Chris,
I am not sure exactly what you have right now, but I have been using a
Smith Waterman alignment algorithm I got from SourceForge
(http://jaligner.sourceforge.net/naligner/) which seems to work pretty well.
I have thought about trying to get it to do pretty much what you have said
you are looking for, but have not yet done so. If it would be helpful,
perhaps we could work on this together? I never have quite wrapped my brain
around these dynamic programming algorithms and perhaps it is time.
Let me know if you are interested.
Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
(e-mail address removed)
 
A

almurph

Ethan/Peter,


Firstly thank you both for your comments. Peter, I am familiar with
Smith-Waterman. The implementation that you have indicated (Java I
think) is for Gotoh's improvement. I have seen this before and found
it a bit complex. With that in mind could you have a look at the
following URL:
http://mlab.cb.k.u-tokyo.ac.jp/~moris/LSGSP/Chapter6_ApproximateStringMatching.java

There are several algorithms here (including smith-Waterman-Gotoh)
and I prefer this implementation. I think it uses Affile Gap Penalties
(cf. http://en.wikipedia.org/wiki/Gap_penalty)


As for doing a callaboration together, this takes me to my next
point. Peter, thank you for you comments. It's quite a response. I am
aware of the danger here - the problem is exponential, this is true.
However, the website: http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html
(particularly the bit at the bottom concerning "alternative
solutions") leads me to believe that it may be possible - in the sense
of working at the traceback level and using the fact that the
solutions must never exceed the maximum value (smith waterman) or the
bottom right value (needleman-wunsch)

However I do not have a nice metric to offer to rank any solutions
that may be obtained. This still eludes me.

I am particularly interested in solutions that minimise any
introduction of a gap in the corpus text (i.e. the text that you are
searching through). This condition could help to reduce the sheer
number of possible permutations, I think...


I would like to offer one more thing. This is slightly off topic but
I would like to offer it tentatively. I obtained the optimal solution
and compared and contrasted it with the corpus in an effort to extract
a regular expression. I then use this regular expression to search for
similar patterns. Something like this:

CORPUS: ZYGZBCZ
PATTERN: ZYY-BCZ

Hence I extract a regEx like: "ZY[GY]?BCZ" and see if patterns like
this exist elsewhere. Its off topic and a bit of a dodge to be quite
honest but I tentatively offer it.

Al.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top