Count consecutive characters within a cell

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I have a genetic sequence such as:
AATTCAGTTACTTTTGCA

I need a formula that will tell me if this cell has a run of 4 or more
consecutive letters. The run can consist of 4 or more A, T, C, or G.

The above example can return simply as "yes" or can be as complex as "this
cell has 4 consecutive T's"

Thanks,
Josh
 
One way, using a User Define Function:

Public Function XOrMore( _
ByVal sTest As String, _
ByVal X As Long) As Boolean
Dim nCount As Long
Dim i As Long
For i = 2 To Len(sTest)
If Mid(sTest, i - 1, 1) = Mid(sTest, i, 1) Then
nCount = nCount + 1
If nCount = X - 1 Then Exit For
Else
nCount = 0
End If
Next i
XOrMore = nCount = X - 1
End Function

Call as =XorMore(A1, 4)

if you're not familiar with UDFs, see

http://www.mvps.org/dmcritchie/excel/getstarted.htm
 
How about just True or False

=LEN(A1)<>LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(
SUBSTITUTE(UPPER(A1),"GGGG",""),"AAAA",""),"TTTT",""),"CCCC",""))

Or

=TRIM(
IF(LEN(A1)=LEN(SUBSTITUTE(UPPER(A1),"GGGG",""))," ","Consecutive G's ")
&IF(LEN(A1)=LEN(SUBSTITUTE(UPPER(A1),"AAAA",""))," ","Consecutive A's ")
&IF(LEN(A1)=LEN(SUBSTITUTE(UPPER(A1),"TTTT",""))," ","Consecutive T's ")
&IF(LEN(A1)=LEN(SUBSTITUTE(UPPER(A1),"CCCC",""))," ","Consecutive C's "))
 
Here's one way:

=IF(OR(LEN(SUBSTITUTE($A1,REPT(E1,4),""))<LEN(A1),(LEN(SUBSTITUTE($A1,REPT(F1,4),""))<LEN(A1)),(LEN(SUBSTITUTE($A1,REPT(G1,4),""))<LEN(A1)),(LEN(SUBSTITUTE($A1,REPT(H1,4),""))<LEN(A1))),"Yes","No")

Cells E1,F1,G1 and H1 represent the letters that you want to test against 4
consecutive.
E1= A
F1=T
G1=C
H1=G

HTH,
Paul
 
assume your data in A1

Create a lookup list
A5= AAAA
A6= CCCC
A7= GGGG
A8= TTTT

=IF(SUMPRODUCT(--(ISNUMBER(SEARCH(A5:A8,A1))))>0,"yes","no")
 
Back
Top