A function of reverse complement

M

M H

I've written an excel function aim to change a RNA sequence to its
reverse complement (e.g. from "ACGUUGUA" to "UACAACGU") as:

Function ReverseComplement(Rcell As Range, Optional IsText As Boolean)

Dim i As Integer
Dim strReverseSeq As String
Dim strSenseSeq As String

strSenseSeq = Trim(Rcell)

For i = 1 To Len(strSenseSeq)
strReverseSeq = Mid(strSenseSeq, i, 1) & _ strReverseSeq
Next i

If IsText = False Then
ReverseComplement = CLng(strReverseSeq)
Else
ReverseComplement = strReverseSeq
End If

strSeqA1 = Replace(strReverseSeq, "A", "1")
strSeqC2 = Replace(strSeqA1, "C", "2")
strSeqG3 = Replace(strSeqC2, "G", "3")
strSeqT4 = Replace(strSeqG3, "U", "4")
strSeq1T = Replace(strSeqT4, "1", "U")
strSeq2G = Replace(strSeq1T, "2", "G")
strSeq3C = Replace(strSeq2G, "3", "C")
strSeq4A = Replace(strSeq3C, "4", "A")

End Function

The first half works fine, which reverse the sequence, but the second
part doesn't work, which cannot make the comoplements Please help for
debug. Much thanks.
 
T

Tom Ogilvy

As an example
Assume you had a reversed string like AAA

your code would convert it like this

AAA => 111 => UUU

If that is what you want, then Ok. If the problem is you would want 111 as
a result, then I think you need to go to an intermediate stage (two step
process)

AAA => MMM => 111

Where the middle stage would use unique identifiers so you wouldn't convert
any value after it was converted to a final value.

A to M, M to 1
C to N, N to 2
G to O, O to 3
U to P, P to 4
1 to Q, Q to U
2 to R, R to G
3 to S, S to C
4 to T, T to A
 
J

JE McGimpsey

Perhaps

Public Function ReverseComplement(sInput As String)
Dim sTemp As String
sTemp = UCase(Replace(Replace(Replace(Replace( _
sInput, "A", "u"), "C", "g"), "G", "c"), "U", "a"))
ReverseComplement = sTemp
End Function
 
J

JE McGimpsey

And I'm not sure what you were after with the IsText, but if you want
ACGU to return 4321:

Public Function ReverseComplement(sInput As String, _
Optional bText As Boolean = True) As String
Dim sTemp As Variant
If bText Then
sTemp = UCase(Replace(Replace(Replace(Replace( _
sInput, "A", "u"), "C", "g"), "G", "c"), "U", "a"))
Else
sTemp = Replace(Replace(Replace(Replace( _
sInput, "A", "4"), "C", "3"), "G", "2"), "U", "1")
End If
ReverseComplement = sTemp
End Function
 
M

Myrna Larson

Nice one, JE! I think he also wants the characters in reverse order, so
perhaps this slight modification will take care of that:

Public Function ReverseComplement(sInput As String)
Dim sTemp As String
sTemp = UCase(Replace(Replace(Replace(Replace( _
sInput, "A", "u"), "C", "g"), "G", "c"), "U", "a"))
ReverseComplement = StrReverse(sTemp)
End Function
 
T

Tom Ogilvy

After looking more closely, (you didn't say what you error/problem was
exactly and I don't know anything about RNA), I see you have just failed to
assign the result to the function name.

Function ReverseComplement(Rcell As Range)

Dim i As Integer
Dim strReverseSeq As String
Dim strSenseSeq As String

strSenseSeq = Trim(Rcell)

For i = 1 To Len(strSenseSeq)
strReverseSeq = Mid(strSenseSeq, i, 1) & _
strReverseSeq
Next i


strSeqA1 = Replace(strReverseSeq, "A", "1")
strSeqC2 = Replace(strSeqA1, "C", "2")
strSeqG3 = Replace(strSeqC2, "G", "3")
strSeqT4 = Replace(strSeqG3, "U", "4")
strSeq1T = Replace(strSeqT4, "1", "U")
strSeq2G = Replace(strSeq1T, "2", "G")
strSeq3C = Replace(strSeq2G, "3", "C")
strSeq4A = Replace(strSeq3C, "4", "A")
ReverseComplement = strSeq4A
End Function


I don't see any reason for the istext argument as there is no reason to
convert to clng.
 
T

Tom Ogilvy

If the objective is to pack everything together, why stop there

Public Function ReverseComplement(sInput As String)
ReverseComplement = _
StrReverse(UCase(Replace(Replace(Replace(Replace( _
sInput, "A", "u"), "C", "g"), "G", "c"), "U", "a")))
End Function

---
Regards,
Tom Ogilvy

Myrna Larson said:
Nice one, JE! I think he also wants the characters in reverse order, so
perhaps this slight modification will take care of that:

Public Function ReverseComplement(sInput As String)
Dim sTemp As String
sTemp = UCase(Replace(Replace(Replace(Replace( _
sInput, "A", "u"), "C", "g"), "G", "c"), "U", "a"))
ReverseComplement = StrReverse(sTemp)
End Function
 
J

JE McGimpsey

The objective (in my case) was to avoid a series of explicit transition
steps (e.g, A->1->U, C->2->G,...), reducing 8 Replace's to 4 + a UCase().

Whether StrReverse is applied to an named or unnamed temporary variable
is probably irrelevant.

For clarity, if nothing else, I generally use temporary variables in the
function, making an explicit assignment to the function's returned
variable at the very end.
 
M

Myrna Larson

I guess my objective in this case is more to show the OP what steps are to be
taken. To demonstrate as a learning tool, I would have done the replacements
as 4 steps (as the OP had done originally), a 5th line to to the UCase, a 6th
for the StrReverse, and a 7th to assign to the function output. I don't know
whether in the real world that is slower or faster than combining everything
into one statement.

If the objective is to pack everything together, why stop there

Public Function ReverseComplement(sInput As String)
ReverseComplement = _
StrReverse(UCase(Replace(Replace(Replace(Replace( _
sInput, "A", "u"), "C", "g"), "G", "c"), "U", "a")))
End Function
 
M

Myrna Larson

I was impressed not by JE's use of a single statement, but by replacing with a
lower case letter instead of a number, symbol or different letter altogether.
The first approach requires 4 replacements plus the UCase$; the latter
requires 8 replacements.

If the objective is to pack everything together, why stop there

Public Function ReverseComplement(sInput As String)
ReverseComplement = _
StrReverse(UCase(Replace(Replace(Replace(Replace( _
sInput, "A", "u"), "C", "g"), "G", "c"), "U", "a")))
End Function
 
G

GysdeJongh

Hi,
sorry did not find the original posting so I answer this one.
I see how you reverse the strand .
I do not understand how you want to complement the strand.
So I can not comment your solution.
But have a solution for the problem if you want.
Feel free to adapt it to your needs.

'This function converts a DNA string to its
'reverse complement
'Gys de Jongh 16-jun-1999
Function Rev(Forw As String) As String
Dim ForwA(1 To 100) As String * 1
Dim RevA(1 To 100) As String * 1
N = Len(Forw)
If N > 100 Then
Rev = "Input too long (max 100)"
Exit Function
End If
For I = 1 To N
ForwA(I) = Mid(Forw, I, 1)
Next I
For I = 1 To N
Select Case ForwA(N - I + 1)
Case "A"
RevA(I) = "T"
Case "a"
RevA(I) = "t"
'From A to T
Case "C"
RevA(I) = "G"
Case "c"
RevA(I) = "g"
'From C to G
Case "G"
RevA(I) = "C"
Case "g"
RevA(I) = "c"
'From G to C
Case "T"
RevA(I) = "A"
Case "t"
RevA(I) = "a"
'From T to A
Case Else
RevA(I) = "N"
'Rest is N
End Select
Next I
For I = 1 To N
Rev = Rev & RevA(I)
Next I
End Function

Short explanation for Tom :
DNA contains the information for cells
It is a double helix with 2 strings , molecules
The two strings contain the same information
They are redundant , one can be used to construct the other
In each string A only pairs with T and C only with G
But :
the Sense strand reads from left to right and
the anti Sense strand from right to left
so :

ACGTCCCTGAAATT <====Sense strand
TGCAGGGACTTTAA <====anti Sense strand

Finding the reverse complement means finding the other if one is known

hth
Gys




Tom Ogilvy said:
As an example
Assume you had a reversed string like AAA

your code would convert it like this

AAA => 111 => UUU

If that is what you want, then Ok. If the problem is you would want 111 as
a result, then I think you need to go to an intermediate stage (two step
process)

AAA => MMM => 111

Where the middle stage would use unique identifiers so you wouldn't convert
any value after it was converted to a final value.

A to M, M to 1
C to N, N to 2
G to O, O to 3
U to P, P to 4
1 to Q, Q to U
2 to R, R to G
3 to S, S to C
4 to T, T to A

here strSeqA1 etc are just a not declared variables which don't seem to reach
the output of the function ????
 
G

GysdeJongh

Hi,
sorry did not find the original posting so I answer this one.
I see how you reverse the strand .
I do not understand how you want to complement the strand.
So I can not comment your solution.
But have a solution for the problem if you want.
Feel free to adapt it to your needs.

'This function converts a DNA string to its
'reverse complement
'Gys de Jongh 16-jun-1999
Function Rev(Forw As String) As String
Dim ForwA(1 To 100) As String * 1
Dim RevA(1 To 100) As String * 1
N = Len(Forw)
If N > 100 Then
Rev = "Input too long (max 100)"
Exit Function
End If
For I = 1 To N
ForwA(I) = Mid(Forw, I, 1)
Next I
For I = 1 To N
Select Case ForwA(N - I + 1)
Case "A"
RevA(I) = "T"
Case "a"
RevA(I) = "t"
'From A to T
Case "C"
RevA(I) = "G"
Case "c"
RevA(I) = "g"
'From C to G
Case "G"
RevA(I) = "C"
Case "g"
RevA(I) = "c"
'From G to C
Case "T"
RevA(I) = "A"
Case "t"
RevA(I) = "a"
'From T to A
Case Else
RevA(I) = "N"
'Rest is N
End Select
Next I
For I = 1 To N
Rev = Rev & RevA(I)
Next I
End Function

Short explanation for Tom :
DNA contains the information for cells
It is a double helix with 2 strings , molecules
The two strings contain the same information
They are redundant , one can be used to construct the other
In each string A only pairs with T and C only with G
But :
the Sense strand reads from left to right and
the anti Sense strand from right to left
so :

ACGTCCCTGAAATT <====Sense strand
TGCAGGGACTTTAA <====anti Sense strand

Finding the reverse complement means finding the other if one is known

hth
Gys
 
M

Myrna Larson

I believe By Complement, he means interchange A and T and C and G, as you (and
he) are doing.

But why do you not want to use Replace, StrReverse and UCase$? The code is
much shorter -- it can be reduced to just 7 lines.
 
T

Tom Ogilvy

Bad choice of words.

--
Regards,
Tom Ogilvy

Myrna Larson said:
I was impressed not by JE's use of a single statement, but by replacing with a
lower case letter instead of a number, symbol or different letter altogether.
The first approach requires 4 replacements plus the UCase$; the latter
requires 8 replacements.
 
M

M H

Much thanks to JE's original idea, precise and concise, and Myrna's
essential amendment. Actually I realize my stupidity of missing one
line as Tom suggested when I was on the bed, and I'm really amazed to
see the how several brains led to the one-line code!

This is finally what I've to achieve, with including the degenerate
alphabets and distinguishing DNA from RNA:

Public Function RevCompl(sInput As String)

Dim sTemp As String

sUInput = UCase(sInput)
sTemp1 = Replace(sUInput, "T", "")

'If it is DNA
If Len(sTemp1) < Len(sUInput) Then
sTemp = Replace(Replace(Replace(Replace(Replace(Replace( _
Replace(Replace(Replace(Replace(Replace(Replace( _
sUInput, "R", "y"), "M", "k"), "Y", "r"), "K", "m"), _
"H", "d"), "D", "h"), "B", "v"), "V", "b"), _
"A", "t"), "C", "g"), "G", "c"), "T", "a")
'if it is RNA
Else
sTemp = Replace(Replace(Replace(Replace(Replace(Replace( _
Replace(Replace(Replace(Replace(Replace(Replace( _
sUInput, "R", "y"), "M", "k"), "Y", "r"), "K", "m"), _
"H", "d"), "D", "h"), "B", "v"), "V", "b"), _
"A", "u"), "C", "g"), "G", "c"), "U", "a")
End If

RevCompl = StrReverse(UCase(sTemp))

End Function


rgds,
Maurice
 
G

GysdeJongh

Hi,
you are right , thank you very much for this suggestion. The job is to reverse
the string and replace A by T etc. It is called : calculating the Reverse
Complement in bioinformatics. There is of course dedicated software. Look here
if you like to see an example :

http://www.mbio.ncsu.edu/BioEdit/bioedit.html


I made this little piece of VBA to calculate the reverse complement for a very
large number of short sequences.

Your solution is more elegant and probably faster. It did not occur to me that
it can be done that way because I am in genetics and statistics . I am not a
programmer.
Thanks again for your time
Gys
 
G

GysdeJongh

Hi,
very nice solution. I have one question : should there be a test on the lenght
of the sInput srting or is there no 65k restriction ?

Gys
 
P

Peter T

Hi Gys

Someone may advise what the ultimate limit is but try for your self:

Sub test()
Dim s As String, s2 As String, n As Long, s2

s = "ACGT"
For n = 1 To 15 ' 15 > len 64k, 17 > len 512k
s = s & s
Next

s2 = RevCompl(s)

MsgBox Len(s) & " " & Left(s, 12) & vbCr & _
Len(s2) & " " & Left(s2, 12)

End Sub

But note cells can only store strings up to 32k.

Regards,
Peter T
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top