Text manipulation

paulinoluciano

"Is there some VBA code that could delete the first, second or third
characters of a text? Could the same be done to the last three characters
of this same text, with the results displayed in reverse order?"

Example:
AAAASAHDASK
AAASAHDASK
AASAHDASK
ASAHDASK
SADHASAAAA
ADHASAAAA
DHASAAAA
 
Barb Reinhardt

I'm guessing you could use the LEFT, MID and RIGHT functions, but I've not
used them in VBA.
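
For what it's worth, these functions exist in VBA under the same names, along with StrReverse for reversing a string. A minimal sketch using the first string from the example; the procedure name DemoLeftMidRight is made up for illustration and the results go to the Immediate window:

```vba
Sub DemoLeftMidRight()
    Const s As String = "AAAASAHDASK"

    Debug.Print Left(s, 3)      'first three characters: AAA
    Debug.Print Mid(s, 2)       'everything from the 2nd character on: AAASAHDASK
    Debug.Print Right(s, 3)     'last three characters: ASK
    Debug.Print StrReverse(s)   'whole string reversed: KSADHASAAAA
End Sub
```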
 
paulinoluciano

You are right, Barb Reinhardt. But in this case I would need some kind
of VBA code, because I would not like to have to enter each sequence of
characters and the functions for each cell individually. I would like
an automatic approach because I have a lot of files to process.
Thank you anyway!
Luciano
 
Bob Phillips

Sub Test()
    Dim iLastRow As Long
    Dim i As Long
    Dim s As String
    Const nIndex As Long = 3 'delete the third character, change to suit

    iLastRow = Cells(Rows.Count, "A").End(xlUp).Row
    For i = 1 To iLastRow
        s = Cells(i, "A").Value
        'skip cells too short to have an nIndex-th character
        If Len(s) >= nIndex Then
            'Left(s, 0) is "", so nIndex = 1 needs no special case
            Cells(i, "A").Value = Left(s, nIndex - 1) & Mid(s, nIndex + 1)
        End If
    Next i

End Sub


--
HTH

Bob Phillips

(remove nothere from email address if mailing direct)
 
paulinoluciano

Thank you, Bob Phillips! It solves my problem in part. However, the main
question is a bit more complex. I explained it better in the topic
"Text subsequences". In the present topic, instead of removing one letter
each time, I need a macro that leaves the first cell (e.g. A1) as it is
and puts in the next row (A2) a text identical to the previous one
without its first letter.
Luciano
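
One way this could be sketched, assuming the text is in A1 and the layout follows the example in the first post (three rows with leading characters stripped, then three rows of the reversed text stripped the same way). The names DropFirst and FillTruncations and the output rows are assumptions made for illustration, not something specified in the thread:

```vba
'Returns s without its first n characters
Function DropFirst(ByVal s As String, ByVal n As Long) As String
    DropFirst = Mid$(s, n + 1)
End Function

Sub FillTruncations()
    Const nSteps As Long = 3            'how many characters to strip, one per row
    Dim src As String, rev As String
    Dim i As Long

    src = Cells(1, "A").Value           'A1 itself is left untouched
    rev = StrReverse(src)

    For i = 1 To nSteps
        If i >= Len(src) Then Exit For  'stop before running out of characters
        Cells(1 + i, "A").Value = DropFirst(src, i)            'A2, A3, A4
        Cells(1 + nSteps + i, "A").Value = DropFirst(rev, i)   'A5, A6, A7
    Next i
End Sub
```

With AAAASAHDASK in A1 this should reproduce the seven lines of the original example in A1:A7.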
 
Bob Phillips

I didn't understand the other one!
 
paulinoluciano

In fact, the other topic is just a bit more complex to explain.
Let me try to explain it better. I have a sequence of characters like:

AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ

This sequence must be put in cell A2.
Then I have to perform some specific operations on this text:

Example 1:
Rules:
a) Fragment the sequence before K, but not always (there can be lost
cuts).
b) The sequence is not cut if K is found before FP.

Results:

AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ

0 lost cuts = cutting the sequence every time K is present
(the subsequences of this process should be put in column B):
AASSASDK
ASASDASFAFSASASADK
ASASAFPKQREWEAQEOK
SPADAOEK
OQPPDAOPSK
AEPQ

1 lost cut = cutting the sequence after the first K present in the
sequence (the subsequences of this process should be put in column C):
AASSASDKASASDASFAFSASASADK
ASASAFPKQREWEAQEOKSPADAOEK
OQPPDAOPSKAEPQ
AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
SPADAOEKOQPPDAOPSKAEPQ

2 lost cuts = cutting the sequence after the second K (just for the
third and following) present in the sequence (the subsequences of this
process should be put in column D):
AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
SPADAOEKOQPPDAOPSKAEPQ

Note that in some cases I need lost cuts in which the cut happens after
1, 2, 3, 4, ... specific characters.
I have to specify such rules somewhere in the sheet containing the
precursor text.
The rules are:

Cut after "XXX" (in this example I have used K, but some cell in the
sheet must contain the character at which the sequence will be
fragmented; in some cases it could be more than one character,
e.g. K and R, not necessarily together)
Cut before "XXX" (the cut may be after the character, as in the previous
example, or before it)

Never before "XXX" (in some cases there are prohibited situations; e.g.
the sequence must not be cut at K if K is preceded by P or by RP)
Never after "XXX" (same for after)

Number of times that the character could be missed before cutting at
"XXX" (somewhere in the sheet I must state explicitly how many
characters could be "lost" before a cut; see the example).
 
paulinoluciano

Oh, sorry! This is applied to proteomics research (biology). In that
case, amino acid sequences are fragmented into small parts by proteases.
There are a lot of non-Excel programs devoted to doing that, but it would
be easier and more robust for my current applications if I could use
Excel for this purpose.
Best regards,
Luciano
 
Ron Rosenfeld

You may want to look into "regular expressions" to do what you are trying to
describe. If you download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr/ you will see that you can use them as worksheet
functions and also call them from a VBA module.

What you write is a bit confusing. For example, one rule you give is:
"Sequence is not cut if K is found before FP" but in your example you seem to
be acting as if the rule applies if K is found AFTER FP.

I am assuming the output starts in B1; if it starts in a different row, then
adjust the ROW() function to result in a 1 as the output:

seq is the character sequence (Insert/Name/Define and set seq = "your string")

For the "0 lost cuts"

B1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){"&COLUMN()-1&"}",ROW())

ROW() resolves to a '1' which means take the 'first' sequence that matches the
pattern. As you copy/drag the formula down, ROW() will resolve to '2', '3',
etc. which means match the 2nd, 3rd, etc sequence that matches the pattern.

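The basic pattern is defined by "(\w+?([^FP]K|$)){" which means look for a
sequence of letters that ends with a K whose preceding character is neither
F nor P (the class [^FP] excludes each letter individually, which is enough
to skip the FPK in this sequence), or that is at the end of the string.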

The {"&COLUMN()-1&"}" resolves, in Column B, to {1} which means look for one
occurrence of the preceding pattern.

If you copy/drag the formula down until you get blanks for the results, you
will see what you posted in your previous message.

If you copy/drag across to columns C and D, you will see the results of
"1 lost cut" and "2 lost cuts" respectively.

I think once you understand the formula construction and the regular
expressions, it will be simple to use this for your other rules.

Without the COLUMN and ROW functions, the formulas would look like:

B1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){1}",1)
B2: =REGEX.MID(seq,"(\w+?([^FP]K|$)){1}",2)

C1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){2}",1)
C2: =REGEX.MID(seq,"(\w+?([^FP]K|$)){2}",2)

To use this in VBA, you would use the RUN method which is outlined in HELP for
morefunc.xll
--ron
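
If installing an add-in is not an option, the same pattern can probably be run from VBA through the VBScript regular expression object that ships with Windows. This is only a sketch under that assumption (it will not work on a Mac), and the function name SplitKRegExp is made up for illustration; it is not part of morefunc.xll:

```vba
'Returns the pieces of seq as a 1-based array of strings.
'LostCut = 0 cuts after every eligible K, 1 after every second one, and so on.
Function SplitKRegExp(ByVal seq As String, ByVal LostCut As Long) As Variant
    Dim re As Object, matches As Object
    Dim result() As String
    Dim i As Long

    If LostCut < 0 Then
        SplitKRegExp = CVErr(xlErrNum)
        Exit Function
    End If

    Set re = CreateObject("VBScript.RegExp")   'late bound, Windows only
    re.Global = True                           'return all matches, not just the first
    re.Pattern = "(\w+?([^FP]K|$)){" & (LostCut + 1) & "}"

    Set matches = re.Execute(seq)
    If matches.Count = 0 Then
        SplitKRegExp = CVErr(xlErrNA)
        Exit Function
    End If

    ReDim result(1 To matches.Count)
    For i = 1 To matches.Count
        result(i) = matches.Item(i - 1).Value  'the match collection is 0-based
    Next i
    SplitKRegExp = result
End Function
```

On a worksheet it could be used like =INDEX(SplitKRegExp(seq,0),ROW()) copied down from row 1. Note that with LostCut > 0 the pattern only returns complete groups, so a trailing piece containing fewer K's than the group size is dropped.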
 
Ron Rosenfeld

1 lost cut = Cutting the sequence after the first K present in the
sequence (The subsequences of this process should be put in C column::
AASSASDKASASDASFAFSASASADK
ASASAFPKQREWEAQEOKSPADAOEK
OQPPDAOPSKAEPQ
AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
SPADAOEKOQPPDAOPSKAEPQ

See my other answer. But I did not understand how you obtained the last two
lines in the "1 lost cut" sequence.

They are identical to the two lines in the "2 lost cut" sequence, so I thought
this might be a typo. But perhaps I am missing something?


--ron
 
paulinoluciano

Hi Ron Rosenfeld,
Thank you very much for your help.
Could I use your first function as VBA code?
In the second case, it is possible that we have a typo. In this case,
1 lost cut means that you cut the sequence only after the first K has
appeared, never at the first one. But you will have intermediates in
such a process because you never know where the first cut will be
performed.
Luciano
 
Ron Rosenfeld

Hi Ron Rosenfeld,
Thank you very much for your help.
Could I use your first function as a VBA code?

Yes, you can. The morefunc.xll add-in functions can be used in VBA by using
the RUN method. See HELP for those add-ins for more details.

In the second case, it is possible that we have a typo. In this case,
1 lost cut means that you cut the sequence only after the first K has
appeared, never at the first one. But you will have intermediates in
such a process because you never know where the first cut will be
performed.

How do you determine, if you specify ONE lost cut, whether the first cut will
occur after the SECOND 'K', or after the THIRD 'K', or ???

The formulas assumed that with ONE lost cut, the first cut would occur after
the SECOND 'K'.

Here is a UDF written in VBA to do the same thing, using the REGEX.MID function
from the morefunc.xll add-in. The variables should be self-explanatory. The
return value is an array, and the individual components can be obtained using
the INDEX worksheet function.

e.g. with the sequence stored in A1:

=INDEX(SplitK($A$1,0),1) would return the first item from the '0 lost cuts'
split.

===================================
Option Explicit
Function SplitK(ByVal seq As String, LostCut As Long) As Variant
    Dim i As Long
    Dim KCount As Long
    Dim Temp() As String

    If LostCut < 0 Then
        SplitK = CVErr(xlErrNum)
        Exit Function
    End If

    'count the K's; n cuts can leave up to n + 1 pieces
    KCount = Len(seq) - Len(Replace(seq, "K", ""))
    ReDim Temp(1 To KCount + 1)

    For i = 1 To KCount + 1
        'i-th match of the pattern, via morefunc's REGEX.MID
        Temp(i) = Run([regex.mid], seq, "(\w+?([^FP]K|$)){" & LostCut + 1 & "}", i)
    Next i

    SplitK = Temp
End Function
=========================

This could also be written as a SUB to automatically place the results into
specified cells, but it would be less flexible.

To write results into columns B, C, D:

=====================================
Option Explicit
'Note: if the SplitK function above is in the same module, rename one of the two
Sub SplitK()
    Const seq As String = _
        "AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ"
    Const MaxLostCuts As Long = 2
    Const ResultColumn As Long = 2 'Column B

    Dim i As Long
    Dim LostCut As Long
    Dim KCount As Long

    KCount = Len(seq) - Len(Replace(seq, "K", ""))

    For LostCut = 0 To MaxLostCuts
        'KCount + 1: n cuts can leave up to n + 1 pieces
        For i = 1 To KCount + 1
            Cells(i, ResultColumn + LostCut) = _
                Run([regex.mid], seq, "(\w+?([^FP]K|$)){" & LostCut + 1 & "}", i)
        Next i
    Next LostCut

End Sub
====================================


--ron
 
paulinoluciano

Hi Ron,
When we are talking about "lost cut" it means that inside the sequence
there will be 1 "K" or 2 "K" or 3 "K" that will not be detected as a
place to cut.
Do you understand?
Luciano
 
Ron Rosenfeld

I understood that to mean that if there is
ZERO lost cuts then
cut after every K (that is not preceded by an FP)

if there is ONE lost cut then
cut after every second K that is not preceded by an FP

if there are TWO lost cuts then
cut after every third K that is not preceded by an FP




--ron
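
That reading of the rule can also be sketched as a plain VBA loop with no regular expressions at all. The function name SplitKLoop is hypothetical, and one assumption differs from the regex version: here a K is skipped only when it is directly preceded by the two letters "FP", whereas the [^FP] class skips a K preceded by F or by P alone. For the sequence in this thread both give the same pieces:

```vba
'Cut after every (LostCut + 1)-th K that is not directly preceded by "FP".
'Returns the pieces as a 1-based array of strings.
Function SplitKLoop(ByVal seq As String, ByVal LostCut As Long) As Variant
    Dim pieces As New Collection
    Dim result() As String
    Dim i As Long, startPos As Long, skipped As Long

    If LostCut < 0 Or Len(seq) = 0 Then
        SplitKLoop = CVErr(xlErrNum)
        Exit Function
    End If

    startPos = 1
    For i = 1 To Len(seq)
        If Mid$(seq, i, 1) = "K" Then
            'Assumption: a K directly preceded by "FP" is never a cut site
            If Right$(Left$(seq, i - 1), 2) <> "FP" Then
                If skipped = LostCut Then
                    pieces.Add Mid$(seq, startPos, i - startPos + 1)
                    startPos = i + 1
                    skipped = 0
                Else
                    skipped = skipped + 1   'this K is a "lost" cut
                End If
            End If
        End If
    Next i

    'whatever is left after the last cut is the final piece
    If startPos <= Len(seq) Then pieces.Add Mid$(seq, startPos)

    ReDim result(1 To pieces.Count)
    For i = 1 To pieces.Count
        result(i) = pieces(i)
    Next i
    SplitKLoop = result
End Function
```

Unlike a fixed-count regex group, this version always returns the trailing remainder, even when it contains fewer K's than LostCut + 1.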
 
paulinoluciano

Yes, it is almost this. However, once a cut is performed, the sequence
to be considered when searching for the next possible cut is the
remaining subsequence. In such a case, you could expect, anywhere in
the text, a subsequence containing two (or three) K, with the second or
third at the end of that subsequence.
Luciano
 
Ron Rosenfeld

I don't understand how what you are writing is different from the results that
my algorithm produces.

Perhaps if you gave some examples of the results of my formula on a text string
vs what you expect to have as a result.

For example, with ONE lost cut, and using your original seq, I get:

AASSASDKASASDASFAFSASASADK
ASASAFPKQREWEAQEOKSPADAOEK
OQPPDAOPSKAEPQ

There has been ONE cut missed in each string:

AASSASDKASASDASFAFSASASADK
       ^
ASASAFPKQREWEAQEOKSPADAOEK
     ^^^         ^
OQPPDAOPSKAEPQ
         ^

The FPK sequence in the second string is also not cut based on your initial
specifications.

What kind of output are you expecting from this, and why??


--ron
 
paulinoluciano

It is exactly this I'm waiting for. However, I could not apply your
algorithm yet. It gives me an error: "Run-time error
'1004': Method 'Run' of object '_Global' failed".
I do not understand what is happening.
Regards,
Luciano
 
Ron Rosenfeld

1. Perhaps you did not follow all of the instructions.
2. Perhaps there is a problem with line-wrapping in your newsgroup reader, so
that coding is not copied precisely as I posted.

Suggestion:

Write me exactly what you did to reproduce what I recommended.
Copy/Paste the code you are using into your post.
--ron
 