Macro to apply parsing rules for strings and list the substrings

  • Thread starter Luciano Paulino da Silva
  • Start date
L

Luciano Paulino da Silva

Dear All,
I'm looking for detect and list the substrings (bellow cell "A3") for
a given string (on cell "A1") that must be generated for one or more
general rule(s). I have almost 100 different rules that can be applied
alone or together one to the other. Do you have any idea about how
could I do that? The rules are listed bellow and I put an example.

Rules:
According to these rules, letters in a string undergoing parse are
designated in the left or right direction from the parsed letter.
There are some exceptions related to the presence of one or more
letters sorrounding some specific point of parse.

Parsing rules:

Rule Parse where? Exceptions
1 Right side of K or R if P is Right to K or
R
2 Right side of K or R
3 Right side of K or R if P is Right to
K or R; after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, CRK,
DRD, RRF, KRR
4 Right side of K
5 Left side of K
6 Right side of M
7 Right side of R if P is Right to R
8 Left side of D
9 Left side of D, Right side of K
10 Left side of D or E
11 Right side of E if P is Right to E, or if
E is Right to E
12 Right side of D or E if P is Right to D or
E, or if E is Right to D or E
13 Right side of D, E and K if P is Right to D or E,
or if E is Right to D or E
14 Right side of F, L, M, W, Y if P is Right to F,
L, M, W, Y, if P is Left to Y
15 Right side of F, Y, W if P is Right to F,
Y, W, if P is Left to Y
16 Right side of K, R, F, Y, W if P is Right to K, R,
F, Y, W, if P is Left to Y
17 Right side of F, L
18 Right side of F, L, W, Y, A, E, Q
19 Right side of A, F, Y, W, L, I, V
20 Left side of A, F, I, L, M, V if D or E is Left to
A, F, I, L, M, V


Examples:
String:

AGFSAFSAHASGASHSGHHSRASAKSASFDDAKPASASAFDAGSRPASSDADASAPSASDASDASSRADSKADSKK

Using Rule 1:

AGFSAFSAHASGASHSGHHSR
ASAK
SASFDDAKPASASAFDAGSRPASSDADASAPSASDASDASSR
ADSK
ADSK
K

Using Rule 8:
AGFSAFSAHASGASHSGHHSRASAKSASF
D
DAKPASASAF
DAGSRPASS
DA
DASAPSAS
DAS
DASSRA
DSKA
DSKK

Using Rules 1 and 17
AGF
SAF
SAHASGASHSGHHSR
ASAK
SASF
DDAKPASASAF
DAGSRPASSDADASAPSASDASDASSR
ADSK
ADSK
K
 
B

Bernie Deitrick

Luciano,

A lot of these simpler rules would be easy to implement

For example, rule 2

Function Rule2(strVal As String) As Variant
Rule2 = Application.Transpose(Split(Replace(Replace(strVal, "K", "K "), "R", "R "), " "))
End Function


If your string is in cell A1, your could select A2:A10 and enter using Ctrl-Shift-Enter

=Rule2(A1)

Other rules would require stepping through and checking each instance, but before I do an example of
that, if you could answer one question:

Is there a letter that will NEVER appear in your strings?

HTH,
Bernie
MS Excel MVP
 
L

Luciano Paulino da Silva

Bernie,

OK, We will have just 20 types of letters:

A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, X

For some situations, I will have some new future rules and it is
possible that in several cases I will have to apply more than one rule
simultaneously.
Thanks in advance,
Luciano
 
L

Luciano Paulino da Silva

I`m very happy that you said that a lot of the rules are simple to
implemnt. I have successfuly tested that one you send me.
Thank for your help with this.
Luciano
 
R

Ron Rosenfeld

Dear All,
I'm looking for detect and list the substrings (bellow cell "A3") for
a given string (on cell "A1") that must be generated for one or more
general rule(s). I have almost 100 different rules that can be applied
alone or together one to the other. Do you have any idea about how
could I do that? The rules are listed bellow and I put an example.

Rules:
According to these rules, letters in a string undergoing parse are
designated in the left or right direction from the parsed letter.
There are some exceptions related to the presence of one or more
letters sorrounding some specific point of parse.

Parsing rules:

Rule Parse where? Exceptions
1 Right side of K or R if P is Right to K or
R
2 Right side of K or R
3 Right side of K or R if P is Right to
K or R; after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, CRK,
DRD, RRF, KRR
4 Right side of K
5 Left side of K
6 Right side of M
7 Right side of R if P is Right to R
8 Left side of D
9 Left side of D, Right side of K
10 Left side of D or E
11 Right side of E if P is Right to E, or if
E is Right to E
12 Right side of D or E if P is Right to D or
E, or if E is Right to D or E
13 Right side of D, E and K if P is Right to D or E,
or if E is Right to D or E
14 Right side of F, L, M, W, Y if P is Right to F,
L, M, W, Y, if P is Left to Y
15 Right side of F, Y, W if P is Right to F,
Y, W, if P is Left to Y
16 Right side of K, R, F, Y, W if P is Right to K, R,
F, Y, W, if P is Left to Y
17 Right side of F, L
18 Right side of F, L, W, Y, A, E, Q
19 Right side of A, F, Y, W, L, I, V
20 Left side of A, F, I, L, M, V if D or E is Left to
A, F, I, L, M, V


Examples:
String:

AGFSAFSAHASGASHSGHHSRASAKSASFDDAKPASASAFDAGSRPASSDADASAPSASDASDASSRADSKADSKK

Using Rule 1:

AGFSAFSAHASGASHSGHHSR
ASAK
SASFDDAKPASASAFDAGSRPASSDADASAPSASDASDASSR
ADSK
ADSK
K

Using Rule 8:
AGFSAFSAHASGASHSGHHSRASAKSASF
D
DAKPASASAF
DAGSRPASS
DA
DASAPSAS
DAS
DASSRA
DSKA
DSKK

Using Rules 1 and 17
AGF
SAF
SAHASGASHSGHHSR
ASAK
SASF
DDAKPASASAF
DAGSRPASSDADASAPSASDASDASSR
ADSK
ADSK
K

You could implement each rule as a Regular Expression, and then select the
rule(s) you wish to apply.

For example, a routine with the Regular Expressions for rules 1,2,8,17 already
figured out for you, and which allows the application of multiple rules on a
given string, might look like the code below.

You select a cell which contains your string and then execute the macro. You
can enter one or more rules -- enter them by number <space> separated.

The macro will then apply each rule in order, and output the results in the
rows below your selected cell, skipping the row immediately below (so that if
your data is in A1, the strings will start in A3.

There are other methods of implementing your rules than Regular Expressions,
but with so many rules, this would be the simplest for me.

=================================
Option Explicit
Sub ParseSpecial()
Dim c As Range
Dim i As Long, j As Long
Dim vRule As Variant
Dim aRule(1 To 100) As String
Dim aResRule1() As String
Dim aResRule2() As String
Dim re As Object, mc As Object, m As Object

'Rules
'construct so that all strings will have at least
'one match
'store each rule number in the corresponding element of
'the aRule array

aRule(1) = "([^KR]|[KR]P)+[KR]?|[KR]"
aRule(2) = "[^KR]+[KR]?|[KR]"
aRule(8) = "D?[^D]+|D"
aRule(17) = "[^FL]+[FL]?|[FL]"

vRule = _
Split(InputBox("Rule Number (for multiple rules, separate with space): "))

Set c = Selection 'or whatever

If c.Count <> 1 Then
MsgBox ("Can only select one cell")
'but could add code to iterate through a
' bunch of cells
Exit Sub
End If

'add more code to ensure validity of rule number
' --------------------------
ReDim aResRule1(0)
aResRule1(0) = c.Value
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = False
re.Global = True

For j = 0 To UBound(vRule)
re.Pattern = aRule(vRule(j))
ReDim aResRule2(UBound(aResRule1))
'move current results to aResRule2
For i = 0 To UBound(aResRule1)
aResRule2(i) = aResRule1(i)
Next i
'clear out aResRule1
ReDim aResRule1(0)
For i = 0 To UBound(aResRule2)
Set mc = re.Execute(aResRule2(i))
For Each m In mc
If Len(aResRule1(0)) > 0 Then
ReDim Preserve aResRule1(UBound(aResRule1) + 1)
End If
aResRule1(UBound(aResRule1)) = m
Next m
Next i
Next j
'clear and write results below
WriteResults aResRule1, c.Offset(2, 0)
End Sub
Sub WriteResults(res, rDest As Range)
'clear out range for results
Range(rDest, rDest.End(xlDown)).Clear
Dim i As Long
For i = 0 To UBound(res)
rDest(i + 1, 1).Value = res(i)
Next i
With rDest(i + 1, 1)
.Value = "End of List of Strings"
.Font.Italic = True
.Font.Bold = True
.Font.Color = vbRed
End With
End Sub
=============================
--ron
 
L

Luciano Paulino da Silva

Dear All,
I'm looking for detect and list the substrings (bellow cell "A3") for
a given string (on cell "A1") that must be generated for one or more
general rule(s). I have almost 100 different rules that can be applied
alone or together one to the other. Do you have any idea about how
could I do that? The rules are listed bellow and I put an example.
Rules:
According to these rules, letters in a string undergoing parse are
designated in the left or right direction from the parsed letter.
There are some exceptions related to the presence of one or more
letters sorrounding some specific point of parse.
Parsing rules:
Rule        Parse where?                                  Exceptions
1   Right side of K or R                           if P is Right to K or
R
2       Right side of K or R
3       Right side of K or R                               if P is Right to
K or R; after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, CRK,
DRD, RRF, KRR
4   Right side of K
5   Left side of K
6   Right side of M
7   Right side of R                                if P is Right to R
8   Left side of D
9   Left side of D, Right side of K
10  Left side of D or E
11  Right side of E                                if P is Right to E, or if
E is Right to E
12  Right side of D or E                           if P is Right to D or
E, or if E is Right to D or E
13  Right side of D, E and K                       if P is Right to D or E,
or if E is Right to D or E
14          Right side of F, L, M, W, Y                    if P is Right to F,
L, M, W, Y, if P is Left to Y
15          Right side of F, Y, W                          if P is Right to F,
Y, W, if P is Left to Y
16  Right side of K, R, F, Y, W                    if P is Right to K, R,
F, Y, W, if P is Left to Y
17  Right side of F, L
18  Right side of F, L, W, Y, A, E, Q
19  Right side of A, F, Y, W, L, I, V
20  Left side of A, F, I, L, M, V                  if D or E is Left to
A, F, I, L, M, V


Using Rule 1:

Using Rule 8:
AGFSAFSAHASGASHSGHHSRASAKSASF
D
DAKPASASAF
DAGSRPASS
DA
DASAPSAS
DAS
DASSRA
DSKA
DSKK
Using Rules 1 and 17
AGF
SAF
SAHASGASHSGHHSR
ASAK
SASF
DDAKPASASAF
DAGSRPASSDADASAPSASDASDASSR
ADSK
ADSK
K

You could implement each rule as a Regular Expression, and then select the
rule(s) you wish to apply.

For example, a routine with the Regular Expressions for rules 1,2,8,17 already
figured out for you, and which allows the application of multiple rules on a
given string, might look like the code below.

You select a cell which contains your string and then execute the macro.  You
can enter one or more rules -- enter them by number <space> separated.

The macro will then apply each rule in order, and output the results in the
rows below your selected cell, skipping the row immediately below (so that if
your data is in A1, the strings will start in A3.

There are other methods of implementing your rules than Regular Expressions,
but with so many rules, this would be the simplest for me.

=================================
Option Explicit
Sub ParseSpecial()
Dim c As Range
Dim i As Long, j As Long
Dim vRule As Variant
Dim aRule(1 To 100) As String
Dim aResRule1() As String
Dim aResRule2() As String
Dim re As Object, mc As Object, m As Object

'Rules
'construct so that all strings will have at least
'one match
'store each rule number in the corresponding element of
'the aRule array

aRule(1) = "([^KR]|[KR]P)+[KR]?|[KR]"
aRule(2) = "[^KR]+[KR]?|[KR]"
aRule(8) = "D?[^D]+|D"
aRule(17) = "[^FL]+[FL]?|[FL]"

vRule = _
  Split(InputBox("Rule Number (for multiple rules, separate with space): "))

Set c = Selection 'or whatever

If c.Count <> 1 Then
    MsgBox ("Can only select one cell")
    'but could add code to iterate through a
    '  bunch of cells
    Exit Sub
End If

'add more code to ensure validity of rule number
' --------------------------
ReDim aResRule1(0)
aResRule1(0) = c.Value
Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = False
    re.Global = True

For j = 0 To UBound(vRule)
    re.Pattern = aRule(vRule(j))
    ReDim aResRule2(UBound(aResRule1))
    'move current results to aResRule2
        For i = 0 To UBound(aResRule1)
            aResRule2(i) = aResRule1(i)
        Next i
    'clear out aResRule1
    ReDim aResRule1(0)
    For i = 0 To UBound(aResRule2)
        Set mc = re.Execute(aResRule2(i))
            For Each m In mc
                If Len(aResRule1(0)) > 0 Then
                    ReDim Preserve aResRule1(UBound(aResRule1) + 1)
                End If
                    aResRule1(UBound(aResRule1)) = m
            Next m
     Next i
Next j
'clear and write results below
WriteResults aResRule1, c.Offset(2, 0)
End Sub
Sub WriteResults(res, rDest As Range)
'clear out range for results
Range(rDest, rDest.End(xlDown)).Clear
Dim i As Long
    For i = 0 To UBound(res)
        rDest(i + 1, 1).Value = res(i)
    Next i
With rDest(i + 1, 1)
    .Value = "End of List of Strings"
    .Font.Italic = True
    .Font.Bold = True
    .Font.Color = vbRed
End With
End Sub
=============================
--ron

Dear Ron,
Excelent!!! Thank you very much!
Luciano
 
R

Ron Rosenfeld

R

Ron Rosenfeld

I`m trying but for several situations I could not implement the
rules... :-/

Well, post them and I or someone will give it a try. In addition to the rules,
also post the source string and the expected results.

I don't mind doing a few.
--ron
 
L

Luciano Paulino da Silva

I`m trying but for several situations I could not implement the
rules... :-/

Dear Ron,
I had forgotten to say that in some circunstances it is necessary to
specify that the rule could avoid parse 1, 2, 3... of the letters.
For example, we have the string bellow and following a rule that parse
after K. If I specify the possibility of one (1) lost parsing, we
should have as result:

ASADASDASKASSASADASASADKASADASASAKDDAASASKASSADADASKSASA

ASADASDASK
ASSASADASASADK
ASADASASAK
DDAASASK
ASSADADASK
SASA
ASADASDASKASSASADASASADK
ASSASADASASADKASADASASAK
ASADASASAKDDAASASK
DDAASASKASSADADASK
ASSADADASKSASA

Do you have any idea how to implement that in your script?
Thanks in advance,
Luciano
 
L

Luciano Paulino da Silva

Dear Ron,
I had forgotten to say that in some circunstances it is necessary to
specify that the rule could avoid parse 1, 2, 3... of the letters.
For example, we have the string bellow and following a rule that parse
after K. If I specify the possibility of one (1) lost parsing, we
should have as result:

ASADASDASKASSASADASASADKASADASASAKDDAASASKASSADADASKSASA

ASADASDASK
ASSASADASASADK
ASADASASAK
DDAASASK
ASSADADASK
SASA
ASADASDASKASSASADASASADK
ASSASADASASADKASADASASAK
ASADASASAKDDAASASK
DDAASASKASSADADASK
ASSADADASKSASA

Do you have any idea how to implement that in your script?
Thanks in advance,
Luciano

I`m having some problems with rules 3, 9, 12, 13, 14, 15, 16, 20
 
L

Luciano Paulino da Silva

Well, post them and I or someone will give it a try. In addition to the rules,
also post the source string and the expected results.

I don't mind doing a few.
--ron

I`m having some problems with rules 3, 9, 12, 13, 14, 15, 16, 20.
 
R

Ron Rosenfeld

I`m having some problems with rules 3, 9, 12, 13, 14, 15, 16, 20.

You didn't give any examples of what the results would be of these rules when
applied to a target string. By that I mean to do, as you did in your initial
posting, to give an example of the input string, and what you expect as output.

That makes it more difficult to debug.

But try this:

Rule 3:

"(CKY|DKD|CKH|CKD|KKR|RRH|RRR|CRK|DRD|RRF|KRR|[^KR]|[KR]P)+[KR]?|[KR]"

Rule 9:

"D?[^KD]+K?|[KD]"
 
R

Ron Rosenfeld

Dear Ron,
I had forgotten to say that in some circunstances it is necessary to
specify that the rule could avoid parse 1, 2, 3... of the letters.
For example, we have the string bellow and following a rule that parse
after K. If I specify the possibility of one (1) lost parsing, we
should have as result:

ASADASDASKASSASADASASADKASADASASAKDDAASASKASSADADASKSASA

ASADASDASK
ASSASADASASADK
ASADASASAK
DDAASASK
ASSADADASK
SASA
ASADASDASKASSASADASASADK
ASSASADASASADKASADASASAK
ASADASASAKDDAASASK
DDAASASKASSADADASK
ASSADADASKSASA

Do you have any idea how to implement that in your script?
Thanks in advance,
Luciano

It would just be a matter of combining the results from the array we generate.

Examining your example result, it appears as if you want to have both the
original output with the output relevant to the number of "skipped" parsing's
below it.

Since this is getting more complex, I have taken the liberty of also outputting
the rule(s) being used; and separating the original output from the output with
the "skipped" parsing's.

This has required some modifications so I am posting the entire macro as it
presently exists.

Eventually, it might be useful to input the parameters (rule(s) and number of
skips) via a user form, instead of multiple Input Box's as I've done so far.

======================================================
Option Explicit
Dim aRule(0 To 1, 1 To 100) As String
Sub ParseSpecial()
Dim c As Range
Dim i As Long, j As Long
Dim vRule As Variant
Dim lSkips As Long

Dim aResRule1() As String
Dim aResRule2() As String
Dim re As Object, mc As Object, m As Object

'Rules
'construct so that all strings will have at least
'one match
'store each rule number in the corresponding element of
'the aRule array
aRule(0, 1) = "Right side of K or R; NOT if P is Right to K or R"
aRule(1, 1) = "([^KR]|[KR]P)+[KR]?|[KR]"

aRule(0, 2) = "Right side of K or R"
aRule(1, 2) = "[^KR]+[KR]?|[KR]"

aRule(0, 3) = "Right side of K or R; NOT if P is Right to K or R; " & _
"after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, " & _
"CRK, DRD, RRF, KRR"
aRule(1, 3) = "(CKY|DKD|CKH|CKD|KKR|RRH|RRR|CRK|DRD|RRF|KRR|[^KR]|" & _
"[KR]P)+[KR]?|[KR]"

aRule(0, 4) = "Right side of K"
aRule(1, 4) = "[^K]+K?|K"

aRule(0, 8) = "Left side of D"
aRule(1, 8) = "D?[^D]+|D"

aRule(0, 9) = "Left side of D, Right side of K"
aRule(1, 9) = "D?[^KD]+K?|[KD]"

aRule(0, 17) = "Right side of F, L"
aRule(1, 17) = "[^FL]+[FL]?|[FL]"

vRule = _
Split(InputBox("Rule Number (for multiple rules, separate with space): "))

lSkips = InputBox(Prompt:="Number to Skip", Default:="0")

Set c = Selection 'or whatever

If c.Count <> 1 Then
MsgBox ("Can only select one cell")
'but could add code to iterate through a
' bunch of cells
Exit Sub
End If

'add more code to ensure validity of rule number
' --------------------------
ReDim aResRule1(0)
aResRule1(0) = c.Value
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = False
re.Global = True

For j = 0 To UBound(vRule)
re.Pattern = aRule(1, vRule(j))
ReDim aResRule2(UBound(aResRule1))
'move current results to aResRule2
For i = 0 To UBound(aResRule1)
aResRule2(i) = aResRule1(i)
Next i
'clear out aResRule1
ReDim aResRule1(0)
For i = 0 To UBound(aResRule2)
Set mc = re.Execute(aResRule2(i))
For Each m In mc
If Len(aResRule1(0)) > 0 Then
ReDim Preserve aResRule1(UBound(aResRule1) + 1)
End If
aResRule1(UBound(aResRule1)) = m
Next m
Next i
Next j

'clear and write results below
WriteResults aResRule1, c.Offset(2, 0), vRule, lSkips
End Sub
'------------------------------------------------------------------------------------
Sub WriteResults(res, rDest As Range, Rules As Variant, lSkips As Long)
Dim i As Long, j As Long, k As Long
Dim res2()
'clear out range for results
Range(rDest, rDest.End(xlDown)).Clear
'write rules
With rDest.Offset(-1, 0)
.Clear
For i = 0 To UBound(Rules)
.Value = .Value & aRule(0, Rules(i)) & _
IIf(i < UBound(Rules), vbLf, "")
Next i
.Font.Italic = True
.Font.Color = vbRed
i = 1
Do While InStr(i, .Value, "NOT", vbBinaryCompare) > 0
With .Characters(InStr(i, .Value, "NOT", vbBinaryCompare), 3).Font
.Bold = True
.Color = vbBlack
End With
i = i + 3
Loop
End With
For i = 0 To UBound(res)
rDest(i + 1, 1).Value = res(i)
Next i

'check for skips
For j = 1 To lSkips Step lSkips 'won't execute if lSkips = 0
'move current results to res2
ReDim res2(UBound(res))
For i = 0 To UBound(res)
res2(i) = res(i)
Next i
'clear out res
ReDim res(0)
'combine
For i = 0 To UBound(res2) - lSkips
If Len(res(0)) > 0 Then
ReDim Preserve res(UBound(res) + 1)
End If
For k = i To i + lSkips
res(UBound(res)) = res(UBound(res)) & _
res2(k)
Next k
Next i
Next j

If lSkips > 0 Then
Set rDest = rDest.End(xlDown)(2, 1)
With rDest
.Value = "With " & lSkips & " Skip" & _
IIf(lSkips > 1, "s", "") & ":"
.Font.Color = vbRed
.Font.Bold = True
For i = 0 To UBound(res)
.Offset(i + 1, 0).Value = res(i)
Next i
End With
End If

With rDest.End(xlDown)(2, 1)
.Value = "End of List of Strings"
.Font.Italic = True
.Font.Bold = True
.Font.Color = vbRed
End With
End Sub
================================================
--ron
 
L

Luciano Paulino da Silva

I`m having some problems with rules 3, 9, 12, 13, 14, 15, 16, 20.

You didn't give any examples of what the results would be of these rules when
applied to a target string.  By that I mean to do, as you did in your initial
posting, to give an example of the input string, and what you expect as output.

That makes it more difficult to debug.

But try this:

Rule 3:

"(CKY|DKD|CKH|CKD|KKR|RRH|RRR|CRK|DRD|RRF|KRR|[^KR]|[KR]P)+[KR]?|[KR]"

Rule 9:

"D?[^KD]+K?|[KD]"

I`m sending some examples of than, sorry...
Thanks in advance,

3 Right side of K or R if P is Right to K or R; except after K
in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, CRK, DRD, RRF, KRR

AAKASASRAAAKASASKPASASASRPSASDFCKYDSADSDKDASCKHSGAHSACKDFHASFKKRGHAHGRRHGSAHGSAHRRRFSDAGFSCRKASDRDRRFKRRS

AAK
ASASR
AAAK
ASASKPASASASRPSASDFCKYDSADSDKDASCKHSGAHSACKDFHASFKKRGHAHGRRHGSAHGSAHRRRFSDAGFSCRKASDRDRRFKRRS

9 Left side of D, Right side of K

ASASADSASASKASSASASHASASKSASASDSAAS

ASASA
DSASASK
ASSASASHASASK
SASAS
DSAAS

12 Right side of D or E except if P is Right to D or E, or if E
is Right to D or E

ASADSSASASAESASASADPASASASAEPASAASDEASASASAEESASAS

ASAD
SSASASAE
SASASADPASASASAEPASAASDEASASASAEESASAS

13 Right side of D, E and K except if P is Right to D or E, or if
E is Right to D or E

SASASDASASAESASASASKASAKPSADPASASASAEPASASASDEASASASEEASSAS
SASASD
ASASAE
SASASASK
ASAKPSADPASASASAEPASASASDEASASASEEASSAS

14 Right side of F, L, M, W, Y except if P is Right to F, L, M,
W, Y, if P is Left to Y

SASFASASLASASMASASWASASYASASFPASASASLPASASMPASSAWPASASYPASASAPYASS
SASF
ASASL
ASASM
ASASW
ASASY
ASASFPASASASLPASASMPASSAWPASASYPASASAPYASS

15 Right side of F, Y, W except if P is Right to F, Y, W, if P is
Left to Y

ASSAFASASASYASASWASSASFPASASAYPASAASWPASASAPY
ASSAF
ASASASY
ASASW
ASSASFPASASAYPASAASWPASASAPY

16 Right side of K, R, F, Y, W except if P is Right to K, R, F,
Y, W, if P is Left to Y

ASSASKASASASRASASAFASSAYASASSWSASASKPASASASRPASSAFPSASASYPASASWPASSASPY
ASSASK
ASASASR
ASASAF
ASSAY
ASASSW
SASASKPASASASRPASSAFPSASASYPASASWPASSASPY

20 Left side of A, F, I, L, M, V except if D or E is Left to A,
F, I, L, M, V

TTAWFNNICCMSSVASDATTYEQTTDFQQEFQQDIQQEINNDLQWELQWQEMQWDMWQWEVQWQWDVQWQLSTSS
TT
AW
FNN
ICC
MSS
V
ASDATTYEQTTDFQQEFQQDIQQEINNDLQWELQWQEMQWDMWQWEVQWQWDVQWQ
LSTSS
 
R

Ron Rosenfeld

3 Right side of K or R if P is Right to K or R; except after K
in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, CRK, DRD, RRF, KRR

AAKASASRAAAKASASKPASASASRPSASDFCKYDSADSDKDASCKHSGAHSACKDFHASFKKRGHAHGRRHGSAHGSAHRRRFSDAGFSCRKASDRDRRFKRRS

AAK
ASASR
AAAK
ASASKPASASASRPSASDFCKYDSADSDKDASCKHSGAHSACKDFHASFKKRGHAHGRRHGSAHGSAHRRRFSDAGFSCRKASDRDRRFKRRS

9 Left side of D, Right side of K

ASASADSASASKASSASASHASASKSASASDSAAS

ASASA
DSASASK
ASSASASHASASK
SASAS
DSAAS

12 Right side of D or E except if P is Right to D or E, or if E
is Right to D or E

ASADSSASASAESASASADPASASASAEPASAASDEASASASAEESASAS

ASAD
SSASASAE
SASASADPASASASAEPASAASDEASASASAEESASAS


I think these will do: (The others will have to wait, if you can't figure it
out).

Rule3:

"(CKY|DKD|CKH|CKD|KKR|RRH|RRR|CRK|DRD|RRF|KRR|[^KR]|[KR]P)+[KR]?|[KR]"

Rule9:

"D?[^KD]+K?|[KD]"

Rule12:

"([^DE]|[DE][EP])+[DE]?|[DE]"
--ron
 
L

Luciano Paulino da Silva

Dear Ron,
I had forgotten to say that in some circunstances it is necessary to
specify that the rule could avoid parse 1, 2, 3... of the letters.
For example, we have the string bellow and following a rule that parse
after K. If I specify the possibility of one (1) lost parsing, we
should have as result:


Do you have any idea how to implement that in your script?
Thanks in advance,
Luciano

It would just be a matter of combining the results from the array we generate.

Examining your example result, it appears as if you want to have both the
original output with the output relevant to the number of "skipped" parsing's
below it.

Since this is getting more complex, I have taken the liberty of also outputting
the rule(s) being used; and separating the original output from the output with
the "skipped" parsing's.

This has required some modifications so I am posting the entire macro as it
presently exists.

Eventually, it might be useful to input the parameters (rule(s) and number of
skips) via a user form, instead of multiple Input Box's as I've done so far.

======================================================
Option Explicit
Dim aRule(0 To 1, 1 To 100) As String
Sub ParseSpecial()
Dim c As Range
Dim i As Long, j As Long
Dim vRule As Variant
Dim lSkips As Long

Dim aResRule1() As String
Dim aResRule2() As String
Dim re As Object, mc As Object, m As Object

'Rules
'construct so that all strings will have at least
'one match
'store each rule number in the corresponding element of
'the aRule array
aRule(0, 1) = "Right side of K or R; NOT if P is Right to K or R"
aRule(1, 1) = "([^KR]|[KR]P)+[KR]?|[KR]"

aRule(0, 2) = "Right side of K or R"
aRule(1, 2) = "[^KR]+[KR]?|[KR]"

aRule(0, 3) = "Right side of K or R; NOT if P is Right to K or R; " & _
    "after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, " & _
    "CRK, DRD, RRF, KRR"
aRule(1, 3) = "(CKY|DKD|CKH|CKD|KKR|RRH|RRR|CRK|DRD|RRF|KRR|[^KR]|" & _
    "[KR]P)+[KR]?|[KR]"

aRule(0, 4) = "Right side of K"
aRule(1, 4) = "[^K]+K?|K"

aRule(0, 8) = "Left side of D"
aRule(1, 8) = "D?[^D]+|D"

aRule(0, 9) = "Left side of D, Right side of K"
aRule(1, 9) = "D?[^KD]+K?|[KD]"

aRule(0, 17) = "Right side of F, L"
aRule(1, 17) = "[^FL]+[FL]?|[FL]"

vRule = _
  Split(InputBox("Rule Number (for multiple rules, separate with space): "))

lSkips = InputBox(Prompt:="Number to Skip", Default:="0")

Set c = Selection 'or whatever

If c.Count <> 1 Then
    MsgBox ("Can only select one cell")
    'but could add code to iterate through a
    '  bunch of cells
    Exit Sub
End If

'add more code to ensure validity of rule number
' --------------------------
ReDim aResRule1(0)
aResRule1(0) = c.Value
Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = False
    re.Global = True

For j = 0 To UBound(vRule)
    re.Pattern = aRule(1, vRule(j))
    ReDim aResRule2(UBound(aResRule1))
    'move current results to aResRule2
        For i = 0 To UBound(aResRule1)
            aResRule2(i) = aResRule1(i)
        Next i
    'clear out aResRule1
    ReDim aResRule1(0)
    For i = 0 To UBound(aResRule2)
        Set mc = re.Execute(aResRule2(i))
            For Each m In mc
                If Len(aResRule1(0)) > 0 Then
                    ReDim Preserve aResRule1(UBound(aResRule1) + 1)
                End If
                    aResRule1(UBound(aResRule1)) = m
            Next m
     Next i
Next j

'clear and write results below
WriteResults aResRule1, c.Offset(2, 0), vRule, lSkips
End Sub
'------------------------------------------------------------------------------------
Sub WriteResults(res, rDest As Range, Rules As Variant, lSkips As Long)
Dim i As Long, j As Long, k As Long
Dim res2()
'clear out range for results
Range(rDest, rDest.End(xlDown)).Clear
'write rules
With rDest.Offset(-1, 0)
    .Clear
    For i = 0 To UBound(Rules)
        .Value = .Value & aRule(0, Rules(i)) & _
            IIf(i < UBound(Rules), vbLf, "")
    Next i
        .Font.Italic = True
        .Font.Color = vbRed
    i = 1
    Do While InStr(i, .Value, "NOT", vbBinaryCompare) > 0
        With .Characters(InStr(i, .Value, "NOT", vbBinaryCompare), 3).Font
            .Bold = True
            .Color = vbBlack
        End With
        i = i + 3
    Loop
End With
    For i = 0 To UBound(res)
        rDest(i + 1, 1).Value = res(i)
    Next i

'check for skips
For j = 1 To lSkips Step lSkips 'won't execute if lSkips = 0
  'move current results to res2
        ReDim res2(UBound(res))
        For i = 0 To UBound(res)
            res2(i) = res(i)
        Next i
    'clear out res
    ReDim res(0)
    'combine
    For i = 0 To UBound(res2) - lSkips
        If Len(res(0)) > 0 Then
            ReDim Preserve res(UBound(res) + 1)
        End If
        For k = i To i + lSkips
            res(UBound(res)) = res(UBound(res)) & _
                res2(k)
        Next k
    Next i
Next j

If lSkips > 0 Then
    Set rDest = rDest.End(xlDown)(2, 1)
    With rDest
        .Value = "With " & lSkips & " Skip" & _
            IIf(lSkips > 1, "s", "") & ":"
        .Font.Color = vbRed
        .Font.Bold = True
    For i = 0 To UBound(res)
        .Offset(i + 1, 0).Value = res(i)
    Next i
    End With
End If

With rDest.End(xlDown)(2, 1)
    .Value = "End of List of Strings"
    .Font.Italic = True
    .Font.Bold = True
    .Font.Color = vbRed
End With
End Sub
================================================
--ron

Dear Ron,
You are correct that it would be better to input the parameters (rule
(s) and number of
skips) via a user form, instead of multiple Box. It would be great to
select the rules and eventually skips. Could you help me with this?
Thanks in advance,
Luciano
 
L

Luciano Paulino da Silva

Dear Ron,
I had forgotten to say that in some circunstances it is necessary to
specify that the rule could avoid parse 1, 2, 3... of the letters.
For example, we have the string bellow and following a rule that parse
after K. If I specify the possibility of one (1) lost parsing, we
should have as result:


Do you have any idea how to implement that in your script?
Thanks in advance,
Luciano

It would just be a matter of combining the results from the array we generate.

Examining your example result, it appears as if you want to have both the
original output with the output relevant to the number of "skipped" parsing's
below it.

Since this is getting more complex, I have taken the liberty of also outputting
the rule(s) being used; and separating the original output from the output with
the "skipped" parsing's.

This has required some modifications so I am posting the entire macro as it
presently exists.

Eventually, it might be useful to input the parameters (rule(s) and number of
skips) via a user form, instead of multiple Input Box's as I've done so far.

======================================================
Option Explicit
Dim aRule(0 To 1, 1 To 100) As String
Sub ParseSpecial()
Dim c As Range
Dim i As Long, j As Long
Dim vRule As Variant
Dim lSkips As Long

Dim aResRule1() As String
Dim aResRule2() As String
Dim re As Object, mc As Object, m As Object

'Rules
'construct so that all strings will have at least
'one match
'store each rule number in the corresponding element of
'the aRule array
aRule(0, 1) = "Right side of K or R; NOT if P is Right to K or R"
aRule(1, 1) = "([^KR]|[KR]P)+[KR]?|[KR]"

aRule(0, 2) = "Right side of K or R"
aRule(1, 2) = "[^KR]+[KR]?|[KR]"

aRule(0, 3) = "Right side of K or R; NOT if P is Right to K or R; " & _
    "after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, " & _
    "CRK, DRD, RRF, KRR"
aRule(1, 3) = "(CKY|DKD|CKH|CKD|KKR|RRH|RRR|CRK|DRD|RRF|KRR|[^KR]|" & _
    "[KR]P)+[KR]?|[KR]"

aRule(0, 4) = "Right side of K"
aRule(1, 4) = "[^K]+K?|K"

aRule(0, 8) = "Left side of D"
aRule(1, 8) = "D?[^D]+|D"

aRule(0, 9) = "Left side of D, Right side of K"
aRule(1, 9) = "D?[^KD]+K?|[KD]"

aRule(0, 17) = "Right side of F, L"
aRule(1, 17) = "[^FL]+[FL]?|[FL]"

vRule = _
  Split(InputBox("Rule Number (for multiple rules, separate with space): "))

lSkips = InputBox(Prompt:="Number to Skip", Default:="0")

Set c = Selection 'or whatever

If c.Count <> 1 Then
    MsgBox ("Can only select one cell")
    'but could add code to iterate through a
    '  bunch of cells
    Exit Sub
End If

'add more code to ensure validity of rule number
' --------------------------
ReDim aResRule1(0)
aResRule1(0) = c.Value
Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = False
    re.Global = True

For j = 0 To UBound(vRule)
    re.Pattern = aRule(1, vRule(j))
    ReDim aResRule2(UBound(aResRule1))
    'move current results to aResRule2
        For i = 0 To UBound(aResRule1)
            aResRule2(i) = aResRule1(i)
        Next i
    'clear out aResRule1
    ReDim aResRule1(0)
    For i = 0 To UBound(aResRule2)
        Set mc = re.Execute(aResRule2(i))
            For Each m In mc
                If Len(aResRule1(0)) > 0 Then
                    ReDim Preserve aResRule1(UBound(aResRule1) + 1)
                End If
                    aResRule1(UBound(aResRule1)) = m
            Next m
     Next i
Next j

'clear and write results below
WriteResults aResRule1, c.Offset(2, 0), vRule, lSkips
End Sub
'------------------------------------------------------------------------------------
Sub WriteResults(res, rDest As Range, Rules As Variant, lSkips As Long)
Dim i As Long, j As Long, k As Long
Dim res2()
'clear out range for results
Range(rDest, rDest.End(xlDown)).Clear
'write rules
With rDest.Offset(-1, 0)
    .Clear
    For i = 0 To UBound(Rules)
        .Value = .Value & aRule(0, Rules(i)) & _
            IIf(i < UBound(Rules), vbLf, "")
    Next i
        .Font.Italic = True
        .Font.Color = vbRed
    i = 1
    Do While InStr(i, .Value, "NOT", vbBinaryCompare) > 0
        With .Characters(InStr(i, .Value, "NOT", vbBinaryCompare), 3).Font
            .Bold = True
            .Color = vbBlack
        End With
        i = i + 3
    Loop
End With
    For i = 0 To UBound(res)
        rDest(i + 1, 1).Value = res(i)
    Next i

'check for skips
For j = 1 To lSkips Step lSkips 'won't execute if lSkips = 0
  'move current results to res2
        ReDim res2(UBound(res))
        For i = 0 To UBound(res)
            res2(i) = res(i)
        Next i
    'clear out res
    ReDim res(0)
    'combine
    For i = 0 To UBound(res2) - lSkips
        If Len(res(0)) > 0 Then
            ReDim Preserve res(UBound(res) + 1)
        End If
        For k = i To i + lSkips
            res(UBound(res)) = res(UBound(res)) & _
                res2(k)
        Next k
    Next i
Next j

If lSkips > 0 Then
    Set rDest = rDest.End(xlDown)(2, 1)
    With rDest
        .Value = "With " & lSkips & " Skip" & _
            IIf(lSkips > 1, "s", "") & ":"
        .Font.Color = vbRed
        .Font.Bold = True
    For i = 0 To UBound(res)
        .Offset(i + 1, 0).Value = res(i)
    Next i
    End With
End If

With rDest.End(xlDown)(2, 1)
    .Value = "End of List of Strings"
    .Font.Italic = True
    .Font.Bold = True
    .Font.Color = vbRed
End With
End Sub
================================================
--ron

Dear Ron,
You are correct that it would be better to input the parameters (rule
(s) and number of
skips) via a user form, instead of multiple Box. It would be great to
select the rules and eventually skips. Could you help me with this?
Thanks in advance,
Luciano
 
L

Luciano Paulino da Silva

Dear Ron,
I had forgotten to say that in some circunstances it is necessary to
specify that the rule could avoid parse 1, 2, 3... of the letters.
For example, we have the string bellow and following a rule that parse
after K. If I specify the possibility of one (1) lost parsing, we
should have as result:


Do you have any idea how to implement that in your script?
Thanks in advance,
Luciano

It would just be a matter of combining the results from the array we generate.

Examining your example result, it appears as if you want to have both the
original output with the output relevant to the number of "skipped" parsing's
below it.

Since this is getting more complex, I have taken the liberty of also outputting
the rule(s) being used; and separating the original output from the output with
the "skipped" parsing's.

This has required some modifications so I am posting the entire macro as it
presently exists.

Eventually, it might be useful to input the parameters (rule(s) and number of
skips) via a user form, instead of multiple Input Box's as I've done so far.

======================================================
Option Explicit
Dim aRule(0 To 1, 1 To 100) As String
Sub ParseSpecial()
Dim c As Range
Dim i As Long, j As Long
Dim vRule As Variant
Dim lSkips As Long

Dim aResRule1() As String
Dim aResRule2() As String
Dim re As Object, mc As Object, m As Object

'Rules
'construct so that all strings will have at least
'one match
'store each rule number in the corresponding element of
'the aRule array
aRule(0, 1) = "Right side of K or R; NOT if P is Right to K or R"
aRule(1, 1) = "([^KR]|[KR]P)+[KR]?|[KR]"

aRule(0, 2) = "Right side of K or R"
aRule(1, 2) = "[^KR]+[KR]?|[KR]"

aRule(0, 3) = "Right side of K or R; NOT if P is Right to K or R; " & _
    "after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, " & _
    "CRK, DRD, RRF, KRR"
aRule(1, 3) = "(CKY|DKD|CKH|CKD|KKR|RRH|RRR|CRK|DRD|RRF|KRR|[^KR]|" & _
    "[KR]P)+[KR]?|[KR]"

aRule(0, 4) = "Right side of K"
aRule(1, 4) = "[^K]+K?|K"

aRule(0, 8) = "Left side of D"
aRule(1, 8) = "D?[^D]+|D"

aRule(0, 9) = "Left side of D, Right side of K"
aRule(1, 9) = "D?[^KD]+K?|[KD]"

aRule(0, 17) = "Right side of F, L"
aRule(1, 17) = "[^FL]+[FL]?|[FL]"

vRule = _
  Split(InputBox("Rule Number (for multiple rules, separate with space): "))

lSkips = InputBox(Prompt:="Number to Skip", Default:="0")

Set c = Selection 'or whatever

If c.Count <> 1 Then
    MsgBox ("Can only select one cell")
    'but could add code to iterate through a
    '  bunch of cells
    Exit Sub
End If

'add more code to ensure validity of rule number
' --------------------------
ReDim aResRule1(0)
aResRule1(0) = c.Value
Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = False
    re.Global = True

For j = 0 To UBound(vRule)
    re.Pattern = aRule(1, vRule(j))
    ReDim aResRule2(UBound(aResRule1))
    'move current results to aResRule2
        For i = 0 To UBound(aResRule1)
            aResRule2(i) = aResRule1(i)
        Next i
    'clear out aResRule1
    ReDim aResRule1(0)
    For i = 0 To UBound(aResRule2)
        Set mc = re.Execute(aResRule2(i))
            For Each m In mc
                If Len(aResRule1(0)) > 0 Then
                    ReDim Preserve aResRule1(UBound(aResRule1) + 1)
                End If
                    aResRule1(UBound(aResRule1)) = m
            Next m
     Next i
Next j

'clear and write results below
WriteResults aResRule1, c.Offset(2, 0), vRule, lSkips
End Sub
'------------------------------------------------------------------------------------
Sub WriteResults(res, rDest As Range, Rules As Variant, lSkips As Long)
Dim i As Long, j As Long, k As Long
Dim res2()
'clear out range for results
Range(rDest, rDest.End(xlDown)).Clear
'write rules
With rDest.Offset(-1, 0)
    .Clear
    For i = 0 To UBound(Rules)
        .Value = .Value & aRule(0, Rules(i)) & _
            IIf(i < UBound(Rules), vbLf, "")
    Next i
        .Font.Italic = True
        .Font.Color = vbRed
    i = 1
    Do While InStr(i, .Value, "NOT", vbBinaryCompare) > 0
        With .Characters(InStr(i, .Value, "NOT", vbBinaryCompare), 3).Font
            .Bold = True
            .Color = vbBlack
        End With
        i = i + 3
    Loop
End With
    For i = 0 To UBound(res)
        rDest(i + 1, 1).Value = res(i)
    Next i

'check for skips
For j = 1 To lSkips Step lSkips 'won't execute if lSkips = 0
  'move current results to res2
        ReDim res2(UBound(res))
        For i = 0 To UBound(res)
            res2(i) = res(i)
        Next i
    'clear out res
    ReDim res(0)
    'combine
    For i = 0 To UBound(res2) - lSkips
        If Len(res(0)) > 0 Then
            ReDim Preserve res(UBound(res) + 1)
        End If
        For k = i To i + lSkips
            res(UBound(res)) = res(UBound(res)) & _
                res2(k)
        Next k
    Next i
Next j

If lSkips > 0 Then
    Set rDest = rDest.End(xlDown)(2, 1)
    With rDest
        .Value = "With " & lSkips & " Skip" & _
            IIf(lSkips > 1, "s", "") & ":"
        .Font.Color = vbRed
        .Font.Bold = True
    For i = 0 To UBound(res)
        .Offset(i + 1, 0).Value = res(i)
    Next i
    End With
End If

With rDest.End(xlDown)(2, 1)
    .Value = "End of List of Strings"
    .Font.Italic = True
    .Font.Bold = True
    .Font.Color = vbRed
End With
End Sub
================================================
--ron

Dear Ron,
You are correct that it would be better to input the parameters (rule
(s) and number of
skips) via a user form, instead of multiple Box. It would be great to
select the rules and eventually skips. Could you help me with this?
Thanks in advance,
Luciano
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top