Palindromes and repeats

  • Thread starter Luciano Paulino da Silva
  • Start date
L

Luciano Paulino da Silva

[snip]
Dear John,
Perfect, it is exactly this! It sounds good. After that we could think
other algorithms (rules).
Thanks in advance,
Luciano- Hide quoted text -

DearLuciano,
See if this works for you. As before, it uses Main() as the driver
sub, so you should delete all of my old code or create a new workbook
and put this in a new code module so as
to avoid possible name conflicts. I've tried it on a number of strings
and it seems to work as intended, but you should give it a full
battery of tests as well in case I overlooked something.

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Dim palindromes As Object
Dim centers As Variant
Dim SearchString As String
Dim PalinCount As Long
Dim anchor As Range

Sub Initialize()
'initializes module-level variables
'centers contains, for each possible
'center of a palindrome (which may fall
'between two characters), the indices and
'length of the largest palindrome centered there.
'For arithmetical convience, centers will
'be represented by integers between 3 and
'2n-1. palindromes will be a dictionary
'object to store found palindromes and
'their frequencies

Dim i As Long, j As Long, k As Long, n As Long

Set palindromes = CreateObject("Scripting.Dictionary")
Set anchor = Range("A3")
SearchString = Range("A1").Value
n = Len(SearchString)
PalinCount = 0
ReDim centers(3 To 2 * n - 1, 1 To 3)

For i = 3 To 2 * n - 1
    If i Mod 2 = 0 Then
        'searching for palindromes like BAB
        j = i / 2 - 1
        k = j + 2
    Else
        'searching for palindromes like BAAB
        j = Int(i / 2)
        k = j + 1
    End If
    Do While j >= 1 And k <= n And _
      Mid(SearchString, IIf(j > 0, j, 1), 1) = _
      Mid(SearchString, k, 1)
      '(kludge for lack of short-cicuit evaluation in VBA)
        j = j - 1
        k = k + 1
    Loop
    'last pass through the loop caused trouble, so correct:
    j = j + 1
    k = k - 1
    '(this has weird effect of making j > k (j = k+1)
    'if center cuts between two characters but no
    'palindrome was found. Nevertheless, this gives
    'correct length of 0 in this case, so ok)
    centers(i, 1) = k - j + 1 'length
    centers(i, 2) = j
    centers(i, 3) = k
Next i
End Sub
Sub FindLargest(j As Long, k As Long)
'finds largest palindrome in range j-k
'after finding leftmost such
'it recursively calls itself on left and right remainders,
'printing the string if needed between calls

Dim i As Long, MinI As Long, MaxI As Long
Dim delta As Long
Dim MaxLen As Long, MaxStart As Long, MaxEnd As Long
Dim NewJ As Long, NewK As Long
Dim palindrome As String

MinI = 2 * j + 1
MaxI = 2 * k - 1

For i = MinI To MaxI
    If centers(i, 1) > 1 Then
        'first contract radius of palindrome
        'to make it fit in j-k if needed
        If centers(i, 2) < j Or centers(i, 3) > k Then
            delta = j - centers(i, 2)
            If delta < centers(i, 3) - k Then
                delta = centers(i, 3) - k
            End If
            centers(i, 2) = centers(i, 2) + delta
            centers(i, 3) = centers(i, 3) - delta
            centers(i, 1) = centers(i, 3) - centers(i, 2) +1
        End If
        If centers(i, 1) > MaxLen Then
            MaxLen = centers(i, 1)
            MaxStart = centers(i, 2)
            MaxEnd = centers(i, 3)
        End If
    End If
Next i

If MaxLen <= 1 Then Exit Sub
'else
'first process any string remaining to left
NewK = MaxStart - 1
NewJ = j
If NewK - NewJ > 0 Then FindLargest NewJ, NewK
'process current node
palindrome = Mid(SearchString, MaxStart, MaxLen)
If Not palindromes.Exists(palindrome) Then
    'found new palindrome - display it
    PalinCount = PalinCount + 1
    anchor.Offset(PalinCount) = palindrome
    palindromes.Add palindrome, 1
Else
    palindromes.Item(palindrome) = palindromes.Item(palindrome) + 1
End If
'now process any remaining right substring
NewJ = MaxEnd + 1
NewK = k
If NewK - NewJ > 0 Then FindLargest NewJ, NewK
End Sub

Sub Main()
Dim i As Long, palindrome As String

If Len(Range("A1").Value) < 2 Then
    MsgBox "A string of length > 1 must be entered in A1"
    Exit Sub
End If
Initialize
anchor.CurrentRegion.ClearContents
FindLargest 1, Len(SearchString)
If palindromes.count = 0 Then
    anchor.Value = "No palindromes found"
Else
    anchor.Value = "Palindrome"
    anchor.Offset(0, 1) = "Frequency"
    anchor.Offset(0, 2) = "Length"
    For i = 1 To PalinCount
        palindrome = anchor.Offset(i).Value
        anchor.Offset(i, 1).Value = _
            palindromes.Item(palindrome)
        anchor.Offset(i, 2).Value = Len(palindrome)
    Next i
End If
End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Hope this works for you.

-John

Dear John,
I have performed some tests and it sounds good. I just did not
understand well the initialize module. I will perform an additional
battery of tests and tell you. Do you have an idea about how could we
perform similar test for serach repeats?
Thanks in advance,
Luciano
 
J

John Coleman

[snip]
Dear John,
I have performed some tests and it sounds good. I just did not
understand well the initialize module. I will perform an additional
battery of tests and tell you. Do you have an idea about how could we
perform similar test for serach repeats?
Thanks in advance,
Luciano- Hide quoted text -

Dear Luciano,

The algorithm I suggested yesterday is naturally recursive, but I
feared that a naive recursive implementation would slow to a crawl for
larger strings since it would be repeatedly rediscovering the same
palindromes again and again. The centers array in the initialization
sub allows for a partial memoization ( http://en.wikipedia.org/wiki/Memoization
) for greater efficiency. I decided to focus on the "center" of a
palindrome since it is easier to first of all find the largest
palindrome centered at a given position (which is either at a
character or midway between two characters) and, more importantly, it
seemed to be easier to adjust for the fact that remaining palindromes
must be shortened as larger palindromes are removed earlier in the
call-sequence. If you really want to get a feel for the information
that centers agai, type something into A1 then run:

Sub Display()
'displays initialized centers
Initialize
Dim i As Long, j As Long, k As Long, l As Long
With anchor
.Value = "i"
.Offset(0, 1).Value = "i/2"
.Offset(0, 2).Value = "Palindrome"
.Offset(0, 3).Value = "Length"
.Offset(0, 4).Value = "j"
.Offset(0, 5).Value = "k"
For i = LBound(centers) To UBound(centers)
l = centers(i, 1)
j = centers(i, 2)
k = centers(i, 3)
.Offset(i - 2).Value = i
.Offset(i - 2, 1).Value = i / 2
If l > 0 Then .Offset(i - 2, 2).Value = Mid(SearchString, j,
l)
.Offset(i - 2, 3).Value = l
.Offset(i - 2, 4).Value = j
.Offset(i - 2, 5).Value = k
Next i
End With

End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
The second column is the crucial one. Note j and k are always
symmetric about that number. If i/2 is an integer then
any palindrome discovered has odd length and if i/2 is like 3.5 then
it is of even length.

hth

-John

p.s. - I'll think about the finding repeats problem a bit tomorrow,
but it might be several days before I get anything since I'm going to
turn my attention back to my paper as well.
 
J

John Coleman

p.s.

In general I don't like writing code that makes assumptions about the
layout of a spreadsheet. When I was testing my code it became tedious
always manually clearing the output region before rerunning it, so I
included the line

anchor.CurrentRegion.ClearContents

(in the middle of Main())

But if you have data in column D or in A2 (connecting A3 and A1) that
line could wipe out something that
you want to keep. You might want to replace that line with:

Range(anchor, anchor.End(xlDown).Offset(0, 2)).ClearContents

This has the effect of clearing Main's last output but never anything
larger (unless maybe if for some odd reason you have data further down
in Columns A:C , but such data is liable to be overwritten anyway).
 
L

Luciano Paulino da Silva

[snip]
Dear John,
I have performed some tests and it sounds good. I just did not
understand well the initialize module. I will perform an additional
battery of tests and tell you. Do you have an idea about how could we
perform similar test for serach repeats?
Thanks in advance,
Luciano- Hide quoted text -

DearLuciano,

The algorithm I suggested yesterday is naturally recursive, but I
feared that a naive recursive implementation would slow to a crawl for
larger strings since it would be repeatedly rediscovering the same
palindromes again and again. The centers array in the initialization
sub allows for a partial memoization (http://en.wikipedia.org/wiki/Memoization
) for greater efficiency. I decided to focus on the "center" of a
palindrome since it is easier to first of all find the largest
palindrome centered at a given position (which is either at a
character or midway between two characters) and, more importantly, it
seemed to be easier to adjust for the fact that remaining palindromes
must be shortened as larger palindromes are removed earlier in the
call-sequence. If you really want to get a feel for the information
that centers agai, type something into A1 then run:

Sub Display()
'displays initialized centers
Initialize
Dim i As Long, j As Long, k As Long, l As Long
With anchor
    .Value = "i"
    .Offset(0, 1).Value = "i/2"
    .Offset(0, 2).Value = "Palindrome"
    .Offset(0, 3).Value = "Length"
    .Offset(0, 4).Value = "j"
    .Offset(0, 5).Value = "k"
    For i = LBound(centers) To UBound(centers)
        l = centers(i, 1)
        j = centers(i, 2)
        k = centers(i, 3)
        .Offset(i - 2).Value = i
        .Offset(i - 2, 1).Value = i / 2
        If l > 0 Then .Offset(i - 2, 2).Value = Mid(SearchString, j,
l)
        .Offset(i - 2, 3).Value = l
        .Offset(i - 2, 4).Value = j
        .Offset(i - 2, 5).Value = k
    Next i
End With

End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
The second column is the crucial one. Note j and k are always
symmetric about that number. If i/2 is an integer then
any palindrome discovered has odd length and if i/2 is like 3.5 then
it is of even length.

hth

-John

p.s. - I'll think about the finding repeats problem a bit tomorrow,
but it might be several days before I get anything since I'm going to
turn my attention back to my paper as well.

Dear John,
I have tried use your new optimized code but it fail due to an error.
I`m trying understand what is happening.
Thanks in advance,
Luciano
 
J

John Coleman

[snip]
Dear John,
I have performed some tests and it sounds good. I just did not
understand well the initialize module. I will perform an additional
battery of tests and tell you. Do you have an idea about how could we
perform similar test for serach repeats?
Thanks in advance,
Luciano- Hide quoted text -
DearLuciano,

The algorithm I suggested yesterday is naturally recursive, but I
feared that a naive recursive implementation would slow to a crawl for
larger strings since it would be repeatedly rediscovering the same
palindromes again and again. The centers array in the initialization
sub allows for a partial memoization (http://en.wikipedia.org/wiki/Memoization
) for greater efficiency. I decided to focus on the "center" of a
palindrome since it is easier to first of all find the largest
palindrome centered at a given position (which is either at a
character or midway between two characters) and, more importantly, it
seemed to be easier to adjust for the fact that remaining palindromes
must be shortened as larger palindromes are removed earlier in the
call-sequence. If you really want to get a feel for the information
that centers agai, type something into A1 then run:
Sub Display()
'displays initialized centers
Initialize
Dim i As Long, j As Long, k As Long, l As Long
With anchor
    .Value = "i"
    .Offset(0, 1).Value = "i/2"
    .Offset(0, 2).Value = "Palindrome"
    .Offset(0, 3).Value = "Length"
    .Offset(0, 4).Value = "j"
    .Offset(0, 5).Value = "k"
    For i = LBound(centers) To UBound(centers)
        l = centers(i, 1)
        j = centers(i, 2)
        k = centers(i, 3)
        .Offset(i - 2).Value = i
        .Offset(i - 2, 1).Value = i / 2
        If l > 0 Then .Offset(i - 2, 2).Value = Mid(SearchString, j,
l)
        .Offset(i - 2, 3).Value = l
        .Offset(i - 2, 4).Value = j
        .Offset(i - 2, 5).Value = k
    Next i
End With
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
The second column is the crucial one. Note j and k are always
symmetric about that number. If i/2 is an integer then
any palindrome discovered has odd length and if i/2 is like 3.5 then
it is of even length.


p.s. - I'll think about the finding repeats problem a bit tomorrow,
but it might be several days before I get anything since I'm going to
turn my attention back to my paper as well.

Dear John,
I have tried use your new optimized code but it fail due to an error.
I`m trying understand what is happening.
Thanks in advance,
Luciano- Hide quoted text -

- Show quoted text -

Dear Luciano,

What input led to an error, and what output did you expect? By "error"
did you mean incorrect output or did you mean some sort of run-time
error? If the later, what is the error code and what line is
hightlighted if you go into debug mode?

-John
 
J

John Coleman

On Oct 25, 6:05 pm, Luciano Paulino da Silva
I have performed some tests and it sounds good. I just did not
understand well the initialize module. I will perform an additional
battery of tests and tell you. Do you have an idea about how could we
perform similar test for serach repeats?
Thanks in advance,
Luciano- Hide quoted text -

- Show quoted text -

Dear Luciano,

I became interested in the repeat problem. I wrote the following code.
It lists all substrings of length >= 2that are repeated at least once
(with the repeat not overlapping itself) but doesn't list repeats
which have the property that each occurence of those repeats is
contained inside of the occurence of a larger repeat since, for
example, in ABCDABC the fact that AB is repeated is implicit in the
fact that ABC is repeated (on the other hand, my criteria imply that
AB isn't listed even in ABCABDABCAB, although it might be of
significance to note that AB is an "internal" repeat inside of the
repeat ABCAB. On the other hand, you wouldn't want to list a whole
bunch of strings like SS, SSS, SSSS in SSSSSSSSSSTSSSSSSSSSS, hence my
current approach). I list the repeat, its length, frequency and
starting position of the first occurence. It is sorted by order of
descending length and (as a tie-breaker) in order of ascending
position:

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Sub FindRepeats()

Dim repeatDict As Object
Dim repeatArray As Variant
Dim repeat As Variant
Dim ext As Variant 'extension
Dim i As Long, j As Long, k As Long
Dim n As Long
Dim myLen As Long, maxLen As Long
Dim SearchString As String
Dim occurences As String
Dim anchor As Range, sortRange As Range
Dim start As Double
Dim A As Variant
Dim RepeatFound As Boolean

'start = Timer
Set anchor = Range("A3")
Range(anchor, anchor.End(xlDown).Offset(0, 3)).ClearContents
Application.ScreenUpdating = False
Set repeatDict = CreateObject("Scripting.Dictionary")
SearchString = Range("A1").Value
n = Len(SearchString)
'load dictionary:
'initialize with repeats of length 2 since longer
'repeats are extensions of shorter ones
myLen = 2
With repeatDict 'this block isn't indented since confusion
'is unlikely
For i = 1 To n - myLen + 1
repeat = Mid(SearchString, i, myLen)
If .Exists(repeat) Then
.Item(repeat) = .Item(repeat) & "," & i
Else
.Add repeat, i
'i is first occurence
End If
Next i
'clean up dict - deleting "repeats" that only
'occur once or only repeat by overlapping first occurence
For Each repeat In .Keys
A = Split(.Item(repeat), ",")
If UBound(A) = 0 Then
.Remove repeat
ElseIf CInt(A(UBound(A))) < A(0) + myLen Then
.Remove repeat
Else
RepeatFound = True
End If
Next repeat
myLen = 3
Do While myLen <= Int(n / 2) And RepeatFound
RepeatFound = False ' have to find additional repeats
For Each repeat In .Keys
If Len(repeat) + 1 = myLen Then
A = Split(.Item(repeat), ",")
For i = LBound(A) To UBound(A)
j = A(i)
ext = Mid(SearchString, j, myLen)
If Len(ext) = myLen Then 'might have been cut off
If .Exists(ext) Then
.Item(ext) = .Item(ext) & "," & j
Else
.Add ext, j
End If
End If
Next i
End If
Next repeat
'cleanup dictionary
For Each repeat In .Keys
If Len(repeat) = myLen Then
A = Split(.Item(repeat), ",")
If UBound(A) = 0 Then
.Remove repeat
ElseIf CInt(A(UBound(A))) < A(0) + myLen Then
.Remove repeat
Else
RepeatFound = True
End If
End If
Next repeat
myLen = myLen + 1
Loop

'Now clean up the dictionary another way
'delete from it all repeats with the property that
'each occurences of that repeat is a substring of
'an occurence of a larger repeat
'Note that the maximum size of a
'repeat is myLen - 1, hence the maximum size of
'a repeat which can be contained in another is myLen-2

maxLen = myLen - 2
For myLen = 2 To maxLen
For Each repeat In .Keys
If Len(repeat) = myLen Then
A = Split(.Item(repeat), ",")
RepeatFound = True 'to enter first pass
i = 0
Do While i <= UBound(A) And RepeatFound
j = A(i)
RepeatFound = False 'is the occurence at j
'part of a larger repeat?
'two ways this could be so
If j > 1 Then
ext = Mid(SearchString, j - 1, myLen + 1)
If .Exists(ext) Then
occurences = .Item(ext)
If occurences Like (j - 1) & ",*" _
Or occurences Like "*," & (j - 1) & ",*" _
Or occurences Like "*," & (j - 1) Then
RepeatFound = True
End If
End If
End If
If j < n Then
ext = Mid(SearchString, j, myLen + 1)
If .Exists(ext) Then
occurences = .Item(ext)
If occurences Like j & ",*" _
Or occurences Like "*," & j & ",*" _
Or occurences Like "*," & j Then
RepeatFound = True
End If
End If
End If
i = i + 1
Loop
If RepeatFound Then .Remove repeat
End If
Next repeat
Next myLen

'transfer data to spreadsheet
If repeatDict.count = 0 Then
anchor.Value = "No repeats found"
Exit Sub
End If
anchor.Value = "Repeat"
anchor.Offset(0, 1).Value = "Length"
anchor.Offset(0, 2).Value = "Frequency"
anchor.Offset(0, 3).Value = "First Index"
i = 1
For Each repeat In .Keys
anchor.Offset(i, 0) = repeat
anchor.Offset(i, 1) = Len(repeat)
A = Split(.Item(repeat), ",")
anchor.Offset(i, 2) = UBound(A) + 1
anchor.Offset(i, 3) = A(0)
i = i + 1
Next repeat
Set sortRange = Range(anchor.Offset(1), _
anchor.Offset(.count, 3))
sortRange.Sort Key1:=anchor.Offset(1, 1), _
Order1:=xlDescending, _
Key2:=anchor.Offset(1, 3), _
Order2:=xlAscending
End With
Set repeatDict = Nothing
Application.ScreenUpdating = True
'MsgBox Timer - start
End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Hope that helps

-John

p.s. Did you clarify what the issue with the palindrome program is?
 
L

Luciano Paulino da Silva

[snip]
Dear John,
I have performed some tests and it sounds good. I just did not
understand well the initialize module. I will perform an additional
battery of tests and tell you. Do you have an idea about how could we
perform similar test for serach repeats?
Thanks in advance,
Luciano- Hide quoted text -
DearLuciano,
The algorithm I suggested yesterday is naturally recursive, but I
feared that a naive recursive implementation would slow to a crawl for
larger strings since it would be repeatedly rediscovering the same
palindromes again and again. The centers array in the initialization
sub allows for a partial memoization (http://en.wikipedia.org/wiki/Memoization
) for greater efficiency. I decided to focus on the "center" of a
palindrome since it is easier to first of all find the largest
palindrome centered at a given position (which is either at a
character or midway between two characters) and, more importantly, it
seemed to be easier to adjust for the fact that remaining palindromes
must be shortened as larger palindromes are removed earlier in the
call-sequence. If you really want to get a feel for the information
that centers agai, type something into A1 then run:
Sub Display()
'displays initialized centers
Initialize
Dim i As Long, j As Long, k As Long, l As Long
With anchor
    .Value = "i"
    .Offset(0, 1).Value = "i/2"
    .Offset(0, 2).Value = "Palindrome"
    .Offset(0, 3).Value = "Length"
    .Offset(0, 4).Value = "j"
    .Offset(0, 5).Value = "k"
    For i = LBound(centers) To UBound(centers)
        l = centers(i, 1)
        j = centers(i, 2)
        k = centers(i, 3)
        .Offset(i - 2).Value = i
        .Offset(i - 2, 1).Value = i / 2
        If l > 0 Then .Offset(i - 2, 2).Value = Mid(SearchString, j,
l)
        .Offset(i - 2, 3).Value = l
        .Offset(i - 2, 4).Value = j
        .Offset(i - 2, 5).Value = k
    Next i
End With
End Sub
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
The second column is the crucial one. Note j and k are always
symmetric about that number. If i/2 is an integer then
any palindrome discovered has odd length and if i/2 is like 3.5 then
it is of even length.
hth
-John
p.s. - I'll think about the finding repeats problem a bit tomorrow,
but it might be several days before I get anything since I'm going to
turn my attention back to my paper as well.
Dear John,
I have tried use your new optimized code but it fail due to an error.
I`m trying understand what is happening.
Thanks in advance,
Luciano- Hide quoted text -
- Show quoted text -

DearLuciano,

What input led to an error, and what output did you expect? By "error"
did you mean incorrect output or did you mean some sort of run-time
error? If the later, what is the error code and what line is
hightlighted if you go into debug mode?

-John

Dear John,
I performed the folloing input string:

GQGAGQGAGAAAAAAGGAGQGGYGGLGNQGAGRGGQGAAAAAAGGAGQGGYGGLGSQGAGRGGLGGQGAGAAAAAAGGAGQGGYGGLGGQGAGQGAGQGGYGGLGIQGSGRGGLGGQGAGAAAAAAGGAGQGGLGGQGAGQGAGAAAAAAGGVRQGGYGGLGSQGAGRGGQGAGAAAAAAGGAGQGGYGGLGGQGVGRGGLGGQGAGAAAAGGAGQGGYGGVGSGASAASAAASRLSSPQASSRVSSAVSNLVASGPTNSAALSSTISNVVSQIGASNPGLSGCDVLIQALLEVVSALIQILGSSSIGQVNYGSAGQATQIVGQSVYQALG

and the output of the sub Display was:

i i/2 Palindrome Length j k
3 1.5 7 0 2 1
4 2 GQG 3 1 3
5 2.5 3 0 3 2
6 3 G 1 3 3
7 3.5 2 0 4 3
8 4 GQGAGQG 7 1 7
9 4.5 13 0 5 4
10 5 G 1 5 5
11 5.5 3 0 6 5
12 6 GAGQGAG 7 3 9
13 6.5 3 0 7 6
14 7 G 1 7 7
15 7.5 6 0 8 7
16 8 GAG 3 7 9
17 8.5 2 0 9 8
18 9 AGA 3 8 10
19 9.5 2 0 10 9
20 10 A 1 10 10
21 10.5 AA 2 10 11
22 11 AAA 3 10 12
23 11.5 AAAA 4 10 13
24 12 AAAAA 5 10 14
25 12.5 GAAAAAAG 8 9 16
26 13 AAAAA 5 11 15
27 13.5 AAAA 4 12 15
28 14 AAA 3 13 15
29 14.5 AA 2 14 15
30 15 A 1 15 15
31 15.5 0 16 15
32 16 G 1 16 16
33 16.5 AGGA 4 15 18
34 17 G 1 17 17
35 17.5 0 18 17
36 18 GAG 3 17 19
37 18.5 0 19 18
38 19 G 1 19 19
39 19.5 0 20 19
40 20 GQG 3 19 21
41 20.5 0 21 20
42 21 G 1 21 21
43 21.5 GG 2 21 22
44 22 G 1 22 22
45 22.5 0 23 22
46 23 GGYGG 5 21 25
47 23.5 0 24 23
48 24 G 1 24 24
49 24.5 GG 2 24 25
50 25 G 1 25 25
51 25.5 0 26 25
52 26 GLG 3 25 27
53 26.5 0 27 26
54 27 G 1 27 27
55 27.5 0 28 27
56 28 N 1 28 28
57 28.5 0 29 28
58 29 Q 1 29 29
59 29.5 0 30 29
60 30 G 1 30 30
61 30.5 0 31 30
62 31 GAG 3 30 32
63 31.5 0 32 31
64 32 G 1 32 32
65 32.5 0 33 32
66 33 GRG 3 32 34
67 33.5 0 34 33
68 34 G 1 34 34
69 34.5 GG 2 34 35
70 35 G 1 35 35
71 35.5 0 36 35
72 36 GQG 3 35 37
73 36.5 0 37 36
74 37 G 1 37 37
75 37.5 0 38 37
76 38 A 1 38 38
77 38.5 AA 2 38 39
78 39 AAA 3 38 40
79 39.5 AAAA 4 38 41
80 40 AAAAA 5 38 42
81 40.5 GAAAAAAG 8 37 44
82 41 AAAAA 5 39 43
83 41.5 AAAA 4 40 43
84 42 AAA 3 41 43
85 42.5 AA 2 42 43
86 43 A 1 43 43
87 43.5 0 44 43
88 44 G 1 44 44
89 44.5 AGGA 4 43 46
90 45 G 1 45 45
91 45.5 0 46 45
92 46 GAG 3 45 47
93 46.5 0 47 46
94 47 G 1 47 47
95 47.5 0 48 47
96 48 GQG 3 47 49
97 48.5 0 49 48
98 49 G 1 49 49
99 49.5 GG 2 49 50
100 50 G 1 50 50
101 50.5 0 51 50
102 51 GGYGG 5 49 53
103 51.5 0 52 51
104 52 G 1 52 52
105 52.5 GG 2 52 53
106 53 G 1 53 53
107 53.5 0 54 53
108 54 GLG 3 53 55
109 54.5 0 55 54
110 55 G 1 55 55
111 55.5 0 56 55
112 56 S 1 56 56
113 56.5 0 57 56
114 57 Q 1 57 57
115 57.5 0 58 57
116 58 G 1 58 58
117 58.5 0 59 58
118 59 GAG 3 58 60
119 59.5 0 60 59
120 60 G 1 60 60
121 60.5 0 61 60
122 61 GRG 3 60 62
123 61.5 0 62 61
124 62 G 1 62 62
125 62.5 GG 2 62 63
126 63 G 1 63 63
127 63.5 0 64 63
128 64 GGLGG 5 62 66
129 64.5 0 65 64
130 65 G 1 65 65
131 65.5 GG 2 65 66
132 66 G 1 66 66
133 66.5 0 67 66
134 67 GQG 3 66 68
135 67.5 0 68 67
136 68 G 1 68 68
137 68.5 0 69 68
138 69 GAG 3 68 70
139 69.5 0 70 69
140 70 AGA 3 69 71
141 70.5 0 71 70
142 71 A 1 71 71
143 71.5 AA 2 71 72
144 72 AAA 3 71 73
145 72.5 AAAA 4 71 74
146 73 AAAAA 5 71 75
147 73.5 GAAAAAAG 8 70 77
148 74 AAAAA 5 72 76
149 74.5 AAAA 4 73 76
150 75 AAA 3 74 76
151 75.5 AA 2 75 76
152 76 A 1 76 76
153 76.5 0 77 76
154 77 G 1 77 77
155 77.5 AGGA 4 76 79
156 78 G 1 78 78
157 78.5 0 79 78
158 79 GAG 3 78 80
159 79.5 0 80 79
160 80 G 1 80 80
161 80.5 0 81 80
162 81 GQG 3 80 82
163 81.5 0 82 81
164 82 G 1 82 82
165 82.5 GG 2 82 83
166 83 G 1 83 83
167 83.5 0 84 83
168 84 GGYGG 5 82 86
169 84.5 0 85 84
170 85 G 1 85 85
171 85.5 GG 2 85 86
172 86 G 1 86 86
173 86.5 0 87 86
174 87 GGLGG 5 85 89
175 87.5 0 88 87
176 88 G 1 88 88
177 88.5 GG 2 88 89
178 89 G 1 89 89
179 89.5 0 90 89
180 90 GQG 3 89 91
181 90.5 0 91 90
182 91 G 1 91 91
183 91.5 0 92 91
184 92 GQGAGQG 7 89 95
185 92.5 0 93 92
186 93 G 1 93 93
187 93.5 0 94 93
188 94 GGQGAGQGAGQGG 13 88 100
189 94.5 0 95 94
190 95 G 1 95 95
191 95.5 0 96 95
192 96 GQGAGQG 7 93 99
193 96.5 0 97 96
194 97 G 1 97 97
195 97.5 0 98 97
196 98 GQG 3 97 99
197 98.5 0 99 98
198 99 G 1 99 99
199 99.5 GG 2 99 100
200 100 G 1 100 100
201 100.5 0 101 100
202 101 GGYGG 5 99 103
203 101.5 0 102 101
204 102 G 1 102 102
205 102.5 GG 2 102 103
206 103 G 1 103 103
207 103.5 0 104 103
208 104 GLG 3 103 105
209 104.5 0 105 104
210 105 G 1 105 105
211 105.5 0 106 105
212 106 I 1 106 106
213 106.5 0 107 106
214 107 Q 1 107 107
215 107.5 0 108 107
216 108 G 1 108 108
217 108.5 0 109 108
218 109 GSG 3 108 110
219 109.5 0 110 109
220 110 G 1 110 110
221 110.5 0 111 110
222 111 GRG 3 110 112
223 111.5 0 112 111
224 112 G 1 112 112
225 112.5 GG 2 112 113
226 113 G 1 113 113
227 113.5 0 114 113
228 114 GGLGG 5 112 116
229 114.5 0 115 114
230 115 G 1 115 115
231 115.5 GG 2 115 116
232 116 G 1 116 116
233 116.5 0 117 116
234 117 GQG 3 116 118
235 117.5 0 118 117
236 118 G 1 118 118
237 118.5 0 119 118
238 119 GAG 3 118 120
239 119.5 0 120 119
240 120 AGA 3 119 121
241 120.5 0 121 120
242 121 A 1 121 121
243 121.5 AA 2 121 122
244 122 AAA 3 121 123
245 122.5 AAAA 4 121 124
246 123 AAAAA 5 121 125
247 123.5 GAAAAAAG 8 120 127
248 124 AAAAA 5 122 126
249 124.5 AAAA 4 123 126
250 125 AAA 3 124 126
251 125.5 AA 2 125 126
252 126 A 1 126 126
253 126.5 0 127 126
254 127 G 1 127 127
255 127.5 AGGA 4 126 129
256 128 G 1 128 128
257 128.5 0 129 128
258 129 GAG 3 128 130
259 129.5 0 130 129
260 130 G 1 130 130
261 130.5 0 131 130
262 131 GQG 3 130 132
263 131.5 0 132 131
264 132 G 1 132 132
265 132.5 GG 2 132 133
266 133 G 1 133 133
267 133.5 0 134 133
268 134 GAGQGGLGGQGAG 13 128 140
269 134.5 0 135 134
270 135 G 1 135 135
271 135.5 GG 2 135 136
272 136 G 1 136 136
273 136.5 0 137 136
274 137 GQG 3 136 138
275 137.5 0 138 137
276 138 G 1 138 138
277 138.5 0 139 138
278 139 GQGAGQG 7 136 142
279 139.5 0 140 139
280 140 G 1 140 140
281 140.5 0 141 140
282 141 GAGQGAG 7 138 144
283 141.5 0 142 141
284 142 G 1 142 142
285 142.5 0 143 142
286 143 GAG 3 142 144
287 143.5 0 144 143
288 144 AGA 3 143 145
289 144.5 0 145 144
290 145 A 1 145 145
291 145.5 AA 2 145 146
292 146 AAA 3 145 147
293 146.5 AAAA 4 145 148
294 147 AAAAA 5 145 149
295 147.5 GAAAAAAG 8 144 151
296 148 AAAAA 5 146 150
297 148.5 AAAA 4 147 150
298 149 AAA 3 148 150
299 149.5 AA 2 149 150
300 150 A 1 150 150
301 150.5 0 151 150
302 151 G 1 151 151
303 151.5 GG 2 151 152
304 152 G 1 152 152
305 152.5 0 153 152
306 153 V 1 153 153
307 153.5 0 154 153
308 154 R 1 154 154
309 154.5 0 155 154
310 155 Q 1 155 155
311 155.5 0 156 155
312 156 G 1 156 156
313 156.5 GG 2 156 157
314 157 G 1 157 157
315 157.5 0 158 157
316 158 GGYGG 5 156 160
317 158.5 0 159 158
318 159 G 1 159 159
319 159.5 GG 2 159 160
320 160 G 1 160 160
321 160.5 0 161 160
322 161 GLG 3 160 162
323 161.5 0 162 161
324 162 G 1 162 162
325 162.5 0 163 162
326 163 S 1 163 163
327 163.5 0 164 163
328 164 Q 1 164 164
329 164.5 0 165 164
330 165 G 1 165 165
331 165.5 0 166 165
332 166 GAG 3 165 167
333 166.5 0 167 166
334 167 G 1 167 167
335 167.5 0 168 167
336 168 GRG 3 167 169
337 168.5 0 169 168
338 169 G 1 169 169
339 169.5 GG 2 169 170
340 170 G 1 170 170
341 170.5 0 171 170
342 171 GQG 3 170 172
343 171.5 0 172 171
344 172 G 1 172 172
345 172.5 0 173 172
346 173 GAG 3 172 174
347 173.5 0 174 173
348 174 AGA 3 173 175
349 174.5 0 175 174
350 175 A 1 175 175
351 175.5 AA 2 175 176
352 176 AAA 3 175 177
353 176.5 AAAA 4 175 178
354 177 AAAAA 5 175 179
355 177.5 GAAAAAAG 8 174 181
356 178 AAAAA 5 176 180
357 178.5 AAAA 4 177 180
358 179 AAA 3 178 180
359 179.5 AA 2 179 180
360 180 A 1 180 180
361 180.5 0 181 180
362 181 G 1 181 181
363 181.5 AGGA 4 180 183
364 182 G 1 182 182
365 182.5 0 183 182
366 183 GAG 3 182 184
367 183.5 0 184 183
368 184 G 1 184 184
369 184.5 0 185 184
370 185 GQG 3 184 186
371 185.5 0 186 185
372 186 G 1 186 186
373 186.5 GG 2 186 187
374 187 G 1 187 187
375 187.5 0 188 187
376 188 GGYGG 5 186 190
377 188.5 0 189 188
378 189 G 1 189 189
379 189.5 GG 2 189 190
380 190 G 1 190 190
381 190.5 0 191 190
382 191 GGLGG 5 189 193
383 191.5 0 192 191
384 192 G 1 192 192
385 192.5 GG 2 192 193
386 193 G 1 193 193
387 193.5 0 194 193
388 194 GQG 3 193 195
389 194.5 0 195 194
390 195 G 1 195 195
391 195.5 0 196 195
392 196 GVG 3 195 197
393 196.5 0 197 196
394 197 G 1 197 197
395 197.5 0 198 197
396 198 GRG 3 197 199
397 198.5 0 199 198
398 199 G 1 199 199
399 199.5 GG 2 199 200
400 200 G 1 200 200
401 200.5 0 201 200
402 201 GGLGG 5 199 203
403 201.5 0 202 201
404 202 G 1 202 202
405 202.5 GG 2 202 203
406 203 G 1 203 203
407 203.5 0 204 203
408 204 GQG 3 203 205
409 204.5 0 205 204
410 205 G 1 205 205
411 205.5 0 206 205
412 206 GAG 3 205 207
413 206.5 0 207 206
414 207 AGA 3 206 208
415 207.5 0 208 207
416 208 A 1 208 208
417 208.5 AA 2 208 209
418 209 AAA 3 208 210
419 209.5 GAAAAG 6 207 212
420 210 AAA 3 209 211
421 210.5 AA 2 210 211
422 211 A 1 211 211
423 211.5 0 212 211
424 212 G 1 212 212
425 212.5 AGGA 4 211 214
426 213 G 1 213 213
427 213.5 0 214 213
428 214 GAG 3 213 215
429 214.5 0 215 214
430 215 G 1 215 215
431 215.5 0 216 215
432 216 GQG 3 215 217
433 216.5 0 217 216
434 217 G 1 217 217
435 217.5 GG 2 217 218
436 218 G 1 218 218
437 218.5 0 219 218
438 219 GGYGG 5 217 221
439 219.5 0 220 219
440 220 G 1 220 220
441 220.5 GG 2 220 221
442 221 G 1 221 221
443 221.5 0 222 221
444 222 GVG 3 221 223
445 222.5 0 223 222
446 223 G 1 223 223
447 223.5 0 224 223
448 224 GSG 3 223 225
449 224.5 0 225 224
450 225 G 1 225 225
451 225.5 0 226 225
452 226 A 1 226 226
453 226.5 0 227 226
454 227 ASA 3 226 228
455 227.5 0 228 227
456 228 A 1 228 228
457 228.5 ASAASA 6 226 231
458 229 A 1 229 229
459 229.5 0 230 229
460 230 AASAA 5 228 232
461 230.5 0 231 230
462 231 A 1 231 231
463 231.5 AA 2 231 232
464 232 SAAAS 5 230 234
465 232.5 AA 2 232 233
466 233 A 1 233 233
467 233.5 0 234 233
468 234 S 1 234 234
469 234.5 0 235 234
470 235 R 1 235 235
471 235.5 0 236 235
472 236 L 1 236 236
473 236.5 0 237 236
474 237 S 1 237 237
475 237.5 SS 2 237 238
476 238 S 1 238 238
477 238.5 0 239 238
478 239 P 1 239 239
479 239.5 0 240 239
480 240 Q 1 240 240
481 240.5 0 241 240
482 241 A 1 241 241
483 241.5 0 242 241
484 242 S 1 242 242
485 242.5 SS 2 242 243
486 243 S 1 243 243
487 243.5 0 244 243
488 244 R 1 244 244
489 244.5 0 245 244
490 245 V 1 245 245
491 245.5 0 246 245
492 246 S 1 246 246
493 246.5 SS 2 246 247
494 247 S 1 247 247
495 247.5 0 248 247
496 248 A 1 248 248
497 248.5 0 249 248
498 249 V 1 249 249
499 249.5 0 250 249
500 250 S 1 250 250
501 250.5 0 251 250
502 251 N 1 251 251
503 251.5 0 252 251
504 252 L 1 252 252
505 252.5 0 253 252
506 253 V 1 253 253
507 253.5 0 254 253
508 254 A 1 254 254
509 254.5 0 255 254
510 255 S 1 255 255
511 255.5 0 256 255
512 256 G 1 256 256
513 256.5 0 257 256
514 257 P 1 257 257
515 257.5 0 258 257
516 258 T 1 258 258
517 258.5 0 259 258
518 259 N 1 259 259
519 259.5 0 260 259
520 260 S 1 260 260
521 260.5 0 261 260
522 261 A 1 261 261
523 261.5 AA 2 261 262
524 262 A 1 262 262
525 262.5 0 263 262
526 263 L 1 263 263
527 263.5 0 264 263
528 264 S 1 264 264
529 264.5 SS 2 264 265
530 265 S 1 265 265
531 265.5 0 266 265
532 266 T 1 266 266
533 266.5 0 267 266
534 267 I 1 267 267
535 267.5 0 268 267
536 268 S 1 268 268
537 268.5 0 269 268
538 269 N 1 269 269
539 269.5 0 270 269
540 270 V 1 270 270
541 270.5 VV 2 270 271
542 271 V 1 271 271
543 271.5 0 272 271
544 272 S 1 272 272
545 272.5 0 273 272
546 273 Q 1 273 273
547 273.5 0 274 273
548 274 I 1 274 274
549 274.5 0 275 274
550 275 G 1 275 275
551 275.5 0 276 275
552 276 A 1 276 276
553 276.5 0 277 276
554 277 S 1 277 277
555 277.5 0 278 277
556 278 N 1 278 278
557 278.5 0 279 278
558 279 P 1 279 279
559 279.5 0 280 279
560 280 G 1 280 280
561 280.5 0 281 280
562 281 L 1 281 281
563 281.5 0 282 281
564 282 S 1 282 282
565 282.5 0 283 282
566 283 G 1 283 283
567 283.5 0 284 283
568 284 C 1 284 284
569 284.5 0 285 284
570 285 D 1 285 285
571 285.5 0 286 285
572 286 V 1 286 286
573 286.5 0 287 286
574 287 L 1 287 287
575 287.5 0 288 287
576 288 I 1 288 288
577 288.5 0 289 288
578 289 Q 1 289 289
579 289.5 0 290 289
580 290 A 1 290 290
581 290.5 0 291 290
582 291 L 1 291 291
583 291.5 LL 2 291 292
584 292 L 1 292 292
585 292.5 0 293 292
586 293 E 1 293 293
587 293.5 0 294 293
588 294 V 1 294 294
589 294.5 VV 2 294 295
590 295 V 1 295 295
591 295.5 0 296 295
592 296 S 1 296 296
593 296.5 0 297 296
594 297 A 1 297 297
595 297.5 0 298 297
596 298 L 1 298 298
597 298.5 0 299 298
598 299 I 1 299 299
599 299.5 0 300 299
600 300 LIQIL 5 298 302
601 300.5 0 301 300
602 301 I 1 301 301
603 301.5 0 302 301
604 302 L 1 302 302
605 302.5 0 303 302
606 303 G 1 303 303
607 303.5 0 304 303
608 304 S 1 304 304
609 304.5 SS 2 304 305
610 305 SSS 3 304 306
611 305.5 SS 2 305 306
612 306 S 1 306 306
613 306.5 0 307 306
614 307 I 1 307 307
615 307.5 0 308 307
616 308 G 1 308 308
617 308.5 0 309 308
618 309 Q 1 309 309
619 309.5 0 310 309
620 310 V 1 310 310
621 310.5 0 311 310
622 311 N 1 311 311
623 311.5 0 312 311
624 312 Y 1 312 312
625 312.5 0 313 312
626 313 G 1 313 313
627 313.5 0 314 313
628 314 S 1 314 314
629 314.5 0 315 314
630 315 A 1 315 315
631 315.5 0 316 315
632 316 G 1 316 316
633 316.5 0 317 316
634 317 Q 1 317 317
635 317.5 0 318 317
636 318 A 1 318 318
637 318.5 0 319 318
638 319 T 1 319 319
639 319.5 0 320 319
640 320 Q 1 320 320
641 320.5 0 321 320
642 321 I 1 321 321
643 321.5 0 322 321
644 322 V 1 322 322
645 322.5 0 323 322
646 323 G 1 323 323
647 323.5 0 324 323
648 324 Q 1 324 324
649 324.5 0 325 324
650 325 S 1 325 325
651 325.5 0 326 325
652 326 V 1 326 326
653 326.5 0 327 326
654 327 Y 1 327 327
655 327.5 0 328 327
656 328 Q 1 328 328
657 328.5 0 329 328
658 329 A 1 329 329
659 329.5 0 330 329
660 330 L 1 330 330
661 330.5 0 331 330

Do you know what is happening?
Thanks in advance,
Luciano
 
J

John Coleman

Dear John,
I performed the folloing input string:
GQGAGQGAGAAAAAAGGAGQGGYGGLGNQGAGRGGQGAAAAAAGGAGQGGYGGLGSQGAGRGGLGGQGAGAAAAA­AGGAGQGGYGGLGGQGAGQGAGQGGYGGLGIQGSGRGGLGGQGAGAAAAAAGGAGQGGLGGQGAGQGAGAAAAAA­GGVRQGGYGGLGSQGAGRGGQGAGAAAAAAGGAGQGGYGGLGGQGVGRGGLGGQGAGAAAAGGAGQGGYGGVGSG­ASAASAAASRLSSPQASSRVSSAVSNLVASGPTNSAALSSTISNVVSQIGASNPGLSGCDVLIQALLEVVSALIQ­ILGSSSIGQVNYGSAGQATQIVGQSVYQALG

[snip]

Dear Luciano,

The purpose of sub Display() was not to replace Main() - it was simply
to show what sort of data Initialize() was collecting since you
indicated that you couldn't figure out what that subroutine was doing
(which isn't surprising since I made little effort to comment the code
and the algorithm is somewhat invovled). If you find the output of
Display() confusing then it is maybe more trouble than it is worth. I
wrote it mostly for myself for debugging purposes anyway. You can
delete it from the project with no ill-effect.

If you want to understand the output a bit more, consider the
following bit:

[snip]
18      9       AGA     3       8       10
19      9.5     2       0       10      9
20      10      A       1       10      10
21      10.5    AA      2       10      11
22      11      AAA     3       10      12
23      11.5    AAAA    4       10      13
24      12      AAAAA   5       10      14
25      12.5    GAAAAAAG        8       9      16
26      13      AAAAA   5       11      15
27      13.5    AAAA    4       12      15
28      14      AAA     3       13      15
29      14.5    AA      2       14      15
30      15      A       1       15      15
31      15.5            0       16      15

[snip]

AGA is a plaindrome of length 3 which runs from string position 8 to
10. It is thus centered at 9 = 18/2

GAAAAAAG is a palindrome of length 8 which extends from 9 to 16.
Unlike AGA, it has no middle character. Instead, the "center" of the
palindrome falls between GAAA and AAAG, which I refer to as position.

Don't worry about this output too much. It is probably a better use of
time to give Main() a battery of tests. On the other hand, if either
you or someone on your team is going to maintain or modify the code,
then I could try to write up a brief description of the algorithm and
comment the code more.

Sorry for any confusion.

-John
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top