Split and Count text in a field

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go about
it.

Thanks for any help,
Simon
 
Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.
 
This is great.
Now all the values are listed in the query, how am I able to count every
occurrence within the field? Are they not still complete strings?

John Nurick said:
Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.





I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go about
it.

Thanks for any help,
Simon
 
This is great.
Now all the values are listed in the query, how am I able to count every
occurrence within the field? Are they not still complete strings?

Simon

John Nurick said:
Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.





I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go about
it.

Thanks for any help,
Simon
 
Did you miss the sentence at the bottom of my previous message about using a
totals query to count them?


Simon said:
This is great.
Now all the values are listed in the query, how am I able to count every
occurrence within the field? Are they not still complete strings?

Simon

John Nurick said:
Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.





I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go
about
it.

Thanks for any help,
Simon
 
Simon -

Try something like this. I did and it worked on your sample
You can use numbers strings etc but if you use strings
then you have to "normalize" them by removing leading and
trailing blanks which can cause errors.

this example uses 3 functions:

CountValues - which retuns a variant
which contains an Nx2 array with the
resulting answer

UniqueElementArray - which returns an array
that contains only 1 instance of any value
in the input - in other words the duplicated
values are eliminated.

IndexArray - which returns the position of any
instance in an array of values. or -1 if not
contained in the target array

You can write your own dumparray sub for enumerating the
resultant of CountValues which will either be NULL if an error
or the Nx2 array where n is the number of unique instances
and the columns are:

Column(0) = the unique instance
Column(1) = the count of the instance in the input array

Example:
CountValues("03, 04, 05, 03, 07, 05, 20 ,03, 06, 07")

Produces:
Row Column (0) Column (1)
---------- ---------
0) 03 3
1) 04 1
2) 05 2
3) 07 2
4) 20 1
5) 06 1

------------------------
Public Function CountValues(ByVal pstrInput) As Variant
'Counts occurances of single values in a comma delimited string
'retuns an array as a variant

Dim strLastChar As String
Dim astrInstances() As String
Dim avUniqueInstances As Variant
Dim lngPosition As Long

Dim lngInstanceCounts() As Long
Dim vElmt As Variant
Dim I As Long
Dim lngMaxInstances As Long

Dim avOutput As Variant

'First be sure that your input length is > 0
If Len(pstrInput) = 0 Then
CountValues = Null
GoTo CommonExit
Else
strLastChar = mID(pstrInput, Len(pstrInput), 1)
If strLastChar = "," Then
pstrInput = mID(pstrInput, 1, Len(pstrInput) - 1)
End If
End If

'Then create an array using the SPLIT function
astrInstances = Split(pstrInput, ",")
lngMaxInstances = UBound(astrInstances)

'eliminate leading and trailing blanks
For I = 0 To lngMaxInstances
astrInstances(I) = Trim(astrInstances(I))
Next I

'Create an Array of unique instances (See UniqueElementArray - Below)
avUniqueInstances = UniqueElementArray(astrInstances)

'now count the number of Each Element
ReDim lngInstanceCounts(UBound(avUniqueInstances))

For I = 0 To UBound(astrInstances)
lngPosition = IndexArray(avUniqueInstances, astrInstances(I))
If lngPosition > -1 Then
lngInstanceCounts(lngPosition) = lngInstanceCounts(lngPosition) + 1
End If
Next I

'load the unique instance and the counts into a nx2 array
ReDim avOutput(UBound(avUniqueInstances), 1)
For I = 0 To UBound(avUniqueInstances)
avOutput(I, 0) = avUniqueInstances(I)
avOutput(I, 1) = lngInstanceCounts(I)
Next I

'return the result as an nx2 array
CountValues = avOutput

CommonExit:
Exit Function

Errorhandler:
CountValues = Null
Resume CommonExit

End Function

---

You'll need these 2 functions

Public Function UniqueElementArray(aIN As Variant) As Variant
'creates a Unique Element array
'Array must be a one dimensioned array

Dim objDictionary As Dictionary
Dim vElmt As Variant
Dim aVars As Variant
Dim aLngs As Long
Dim aInts As Integer
Dim aSngs As Single
Dim aDbls As Double
Dim aCurs As Currency
Dim aObjs As Object
Dim aStrs As String
Dim lngCount As Long
Dim aDts As Date
Dim aBytes As Byte
Dim I As Long
Dim lngDim As Long
Dim lngVarType As Long
Dim aOut As Variant

On Error GoTo Errorhandler

'use the Dictionary object to force uniqueness
Set objDictionary = New Dictionary
With objDictionary
For Each vElmt In aIN
.Add vElmt, vElmt
NextElement:
Next vElmt

'now output the array into "aout"
lngCount = .Count
lngDim = lngCount - 1
ReDim aOut(lngDim)
'vartype returns a value vbArray + Scalar Object type
lngVarType = varType(aOut) - vbArray

Select Case lngVarType
Case vbInteger
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aInts = .Items(I)
aOut(I) = aInts
Next I

Case vbVariant
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aVars = .Items(I)
aOut(I) = aVars
Next I

Case vbLong
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aLngs = .Items(I)
aOut(I) = aLngs
Next I

Case vbString
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aStrs = .Items(I)
aOut(I) = aStrs
Next I

Case vbDouble
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aDbls = .Items(I)
aOut(I) = aDbls
Next I

Case vbSingle
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aSngs = .Items(I)
aOut(I) = aSngs
Next I
End Select
End With
UniqueElementArray = aOut

CommonExit:
Exit Function
Errorhandler:
Select Case Err.Number
Case 457 'Key already There
Err.Clear
Resume NextElement
Case Else
WTO Err.Number & " - " & Err.Description & vbCrLf &
"UniqueElementArray"
Resume CommonExit
End Select
End Function

----------

Public Function IndexArray(vArray As Variant, _
vSearch As Variant) As Long
Dim I As Long
For I = 0 To UBound(vArray, 1)
If vArray(I) = vSearch Then
IndexArray = I
GoTo CommonExit
End If
Next I
IndexArray = -1
CommonExit:

End Function

Happy Coding!
 
On Thu, 15 Jun 2006 21:49:02 -0700, Ken Higgins

[snip]
the Nx2 array where n is the number of unique instances
and the columns are:

Column(0) = the unique instance
Column(1) = the count of the instance in the input array

[lots of useful VBA code snipped]

Hi Ken,

I'm not sure this does what Simon needs, which AIUI is to count the
occurrences of each value not just in one list but in a whole column.

Setting that aside: I looked at your 150+ lines of code and then thought
how one might do the same job in Perl:

sub getCountsOfItemsInList($) {
my %items;
$items{$_}++ foreach split /,\s*/ , $_[0];
return %items;
}

which translates into VBA as

Public Function GetCountsOfItemsInList(List As String) As Variant
Dim AllItems As Variant
Dim Item As Variant
Dim Uniques As Object 'Scripting.Dictionary

Set Uniques = CreateObject("Scripting.Dictionary")

AllItems = Split(List, ",")
For Each Item In AllItems
Item = Trim(Item)
If Uniques.Exists(Item) Then 'increment count
Uniques(Item) = Uniques(Item) + 1
Else 'add new item
Uniques.Add Item, 1
End If
Next
Set GetCountsOfItemsInList = Uniques

End Function
 
Simon said:
I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

In the test data you posted the deliminator appears to be comma+space.

If you want an SQL solution, try something like this. Ironically, I'm
using VBA to create the tables and data required to demo my SQL
solution <g>. Rather than create sequence tables (i.e. tables of
incrementing integers) on the fly as I have, you could create a
permanent auxillary table in your database:

Sub test5()
Dim cat
Set cat = CreateObject("ADOX.Catalog")
With cat
.Create _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=C:\DropMe.mdb"
With .ActiveConnection
.Execute _
"CREATE TABLE Test1 (" & _
" aa_comp NVARCHAR(255)" & _
" NOT NULL)"

.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 04, 05');"
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 07, 05, 20');"
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 06, 07');"

Dim sql
sql = _
"SELECT DT1.aa_comp_parsed, COUNT(*) AS count_of" & _
" FROM ( SELECT T1.aa_comp, S1.seq as pos," & _
" MID$(T1.aa_comp, S1.seq, MIN(S2.seq) -" & _
" S1.seq - 2) AS aa_comp_parsed FROM Test1" & _
" AS T1, (SELECT Units.nbr + Tens.nbr AS" & _
" seq FROM ( SELECT nbr FROM ( SELECT DISTINCT" & _
" 0 AS nbr FROM Test1 UNION ALL SELECT DISTINCT" & _
" 1 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 2 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 3 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 4 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 5 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 6 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 7 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 8 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 9 FROM Test1 ) AS Digits ) AS Units, (" & _
" SELECT nbr * 10 AS nbr FROM ( SELECT DISTINCT" & _
" 0 AS nbr FROM Test1 UNION ALL SELECT DISTINCT" & _
" 1 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 2 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 3 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 4 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 5 FROM Test1 UNION ALL SELECT DISTINCT"
sql = sql & _
" 6 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 7 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 8 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 9 FROM Test1 ) AS Digits ) AS Tens) AS" & _
" S1, (SELECT Units.nbr + Tens.nbr AS seq" & _
" FROM ( SELECT nbr FROM ( SELECT DISTINCT" & _
" 0 AS nbr FROM Test1 UNION ALL SELECT DISTINCT" & _
" 1 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 2 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 3 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 4 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 5 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 6 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 7 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 8 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 9 FROM Test1 ) AS Digits ) AS Units, (" & _
" SELECT nbr * 10 AS nbr FROM ( SELECT DISTINCT" & _
" 0 AS nbr FROM Test1 UNION ALL SELECT DISTINCT" & _
" 1 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 2 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 3 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 4 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 5 FROM Test1 UNION ALL SELECT DISTINCT"
sql = sql & _
" 6 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 7 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 8 FROM Test1 UNION ALL SELECT DISTINCT" & _
" 9 FROM Test1 ) AS Digits ) AS Tens) AS" & _
" S2 WHERE MID$(', ' & T1.aa_comp & ', '," & _
" S1.seq, 2) = ', ' AND MID$(', ' & T1.aa_comp" & _
" & ', ', S2.seq, 2) = ', ' AND S1.seq <" & _
" S2.seq AND S1.seq > 0 AND S2.seq > 0 GROUP" & _
" BY T1.aa_comp, S1.seq) AS DT1 GROUP BY" & _
" DT1.aa_comp_parsed ORDER BY DT1.aa_comp_parsed;"

Dim rs
Set rs = .Execute(sql)
MsgBox rs.GetString
rs.Close

End With
Set .ActiveConnection = Nothing
End With
End Sub

Jamie.

--
 
John:

I tried out that VB module - what a nice clean algorithm! Hats off for
simplification
I dont know Pearl, but i do know that the Dictionary object in VB had its
genesis in
Pearl. Thanks for offering the simplification. Nice work.

Ken

John Nurick said:
On Thu, 15 Jun 2006 21:49:02 -0700, Ken Higgins

[snip]
the Nx2 array where n is the number of unique instances
and the columns are:

Column(0) = the unique instance
Column(1) = the count of the instance in the input array

[lots of useful VBA code snipped]

Hi Ken,

I'm not sure this does what Simon needs, which AIUI is to count the
occurrences of each value not just in one list but in a whole column.

Setting that aside: I looked at your 150+ lines of code and then thought
how one might do the same job in Perl:

sub getCountsOfItemsInList($) {
my %items;
$items{$_}++ foreach split /,\s*/ , $_[0];
return %items;
}

which translates into VBA as

Public Function GetCountsOfItemsInList(List As String) As Variant
Dim AllItems As Variant
Dim Item As Variant
Dim Uniques As Object 'Scripting.Dictionary

Set Uniques = CreateObject("Scripting.Dictionary")

AllItems = Split(List, ",")
For Each Item In AllItems
Item = Trim(Item)
If Uniques.Exists(Item) Then 'increment count
Uniques(Item) = Uniques(Item) + 1
Else 'add new item
Uniques.Add Item, 1
End If
Next
Set GetCountsOfItemsInList = Uniques

End Function
 
If you want an SQL solution, try something like this. Ironically, I'm
using VBA to create the tables and data required to demo my SQL
solution <g>. Rather than create sequence tables (i.e. tables of
incrementing integers) on the fly as I have, you could create a
permanent auxillary table in your database:

[Ingenious SQL snipped].

Interesting... I'll have to spend part of my weekend working out how it
works. Thank you.
 
John said:
[Ingenious SQL snipped].

Don't decorate the messenger <g>; the ingenuity is not mine:

http://groups.google.com/group/microsoft.public.sqlserver.programming/msg/03a528ab01d4ea9d

The basic premise here is that Jet is a SQL product and SQL is a set
based, declarative language, therefore we can best solve 'data'
problems in Jet with set based solutions. Of course, Access has VBA and
VBA is a 'traditional' procedural language; most programmers are used
to seeing procedural solutions (e.g. traversing ordered recordsets from
BOF to EOF, flagging values along the way).
Interesting... I'll have to spend part of my weekend working out how it
works. Thank you.

I'm sure you figure it out in no time but for fun here's a breakdown
(long post).

First, a 'Sequence' table of integers. Here's a quick way (three lines
of SQL and a couple of seconds of time) to create a 100K row auxiliary
table, better than materializing on the fly:

CREATE TABLE [Sequence] (
seq INTEGER NOT NULL CONSTRAINT pk__Sequence PRIMARY KEY)
;
INSERT INTO [Sequence] (seq) VALUES (-1)
;
INSERT INTO [Sequence] (seq)
SELECT Units.nbr + Tens.nbr + Hundreds.nbr + Thousands.nbr +
TenThousands.nbr AS seq
FROM
(
SELECT nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Units,
(
SELECT nbr * 10 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Tens,
(
SELECT nbr * 100 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Hundreds,
(
SELECT nbr * 1000 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Thousands,
(
SELECT nbr * 10000 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS TenThousands
;

Here's the OP's test table and data:

CREATE TABLE Test1 (
aa_comp VARCHAR(100) NOT NULL)
;
INSERT INTO Test1 (aa_comp) VALUES ('03, 04, 05')
;
INSERT INTO Test1 (aa_comp) VALUES ('03, 07, 05, 20')
;
INSERT INTO Test1 (aa_comp) VALUES ('03, 06, 07')
;

We can create a 'cross join' to return all the possible combinations of
rows between the OP's table and the sequence table and plug the seq
value into Jet's MID$ function to parse out each character individually
(best to limit the rows in the sequence table based on the maximum
character width of the column):

SELECT T1.aa_comp,
S1.seq as pos,
MID$(T1.aa_comp, S1.seq, 1) AS aa_comp_parsed
FROM Test1 AS T1,
[sequence] AS S1
WHERE S1.seq BETWEEN 1 AND 100;

I'm sure you get the basic idea. The following parses out all the
possible strings of consecutive characters, this time using a search
condition to remove the null strings:

SELECT T1.aa_comp,
S1.seq as pos1,
S2.seq as pos2,
MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed
FROM Test1 AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 100
AND S2.seq BETWEEN 1 AND 100
AND S1.seq < S2.seq
AND LEN(MID$(T1.aa_comp, S1.seq, S2.seq)) > 0
ORDER BY T1.aa_comp, S1.seq, S2.seq

It is immediately clear we are only interested in a small subset of
these results. Note that to stand any chance of identifying the first
and last substring we need to prefix and suffix the whole string with
the delimiting string:

SELECT T1.aa_comp,
S1.seq as pos1,
S2.seq as pos2,
MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed
FROM
(
SELECT ', ' & aa_comp & ', ' AS aa_comp
FROM Test1
)
AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 100
AND S2.seq BETWEEN 1 AND 100
AND S1.seq < S2.seq
AND LEN(MID$(T1.aa_comp, S1.seq, S2.seq)) > 0
ORDER BY T1.aa_comp, S1.seq, S2.seq

We are only interested in substrings that are delimited by the
delimiting string:

SELECT T1.aa_comp,
S1.seq as pos1,
S2.seq as pos2,
MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed
FROM
(
SELECT ', ' & aa_comp & ', ' AS aa_comp
FROM Test1
)
AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 104
AND S2.seq BETWEEN 1 AND 104
AND S1.seq < S2.seq
AND MID$(T1.aa_comp, S1.seq, 2) = ', '
AND MID$(T1.aa_comp, S2.seq, 2) = ', '
ORDER BY T1.aa_comp, S1.seq, S2.seq;

OK, the above is still returning too many rows. We want to remove the
rows which have the delimiting string in addition to each end of the
substring.

For me, the real genius is realization that

MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed

may be re-written as

MID$(T1.aa_comp, S1.seq, S2.seq - S1.seq) AS aa_comp_parsed

There are various ways of parsing/filtering out superfluous rows but
the MIN is particularly well optimized.

Here's the final resultset:

SELECT T1.aa_comp,
S1.seq as pos1,
MID$(T1.aa_comp, S1.seq + 2, MIn(S2.seq) - S1.seq - 2) AS
aa_comp_parsed
FROM
(
SELECT ', ' & aa_comp & ', ' AS aa_comp
FROM Test1
)
AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 104
AND S2.seq BETWEEN 1 AND 104
AND S1.seq < S2.seq
AND MID$(T1.aa_comp, S1.seq, 2) = ', '
AND MID$(T1.aa_comp, S2.seq, 2) = ', '
GROUP BY T1.aa_comp, S1.seq
ORDER BY T1.aa_comp, S1.seq;

As ever, here's my test code:

Sub parmlist()
Kill "C:\DropMe.mdb"
Dim cat
Set cat = CreateObject("ADOX.Catalog")
With cat
.Create _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=C:\DropMe.mdb"
With .ActiveConnection

' Create 100K row 'sequence' auxilary table
.Execute _
"CREATE TABLE [Sequence] (seq INTEGER NOT" & _
" NULL CONSTRAINT pk__Sequence PRIMARY KEY);"
.Execute _
"INSERT INTO [Sequence] (seq) VALUES (-1)"

Dim sql
sql = sql & _
"INSERT INTO [Sequence] (seq) SELECT Units.nbr" & _
" + Tens.nbr + Hundreds.nbr + Thousands.nbr" & _
" + TenThousands.nbr AS seq FROM ( SELECT" & _
" nbr FROM ( SELECT 0 AS nbr FROM [Sequence]" & _
" UNION ALL SELECT 1 FROM [Sequence] UNION" & _
" ALL SELECT 2 FROM [Sequence] UNION ALL" & _
" SELECT 3 FROM [Sequence] UNION ALL SELECT" & _
" 4 FROM [Sequence] UNION ALL SELECT 5 FROM" & _
" [Sequence] UNION ALL SELECT 6 FROM [Sequence]" & _
" UNION ALL SELECT 7 FROM [Sequence] UNION" & _
" ALL SELECT 8 FROM [Sequence] UNION ALL" & _
" SELECT 9 FROM [Sequence] ) AS Digits )" & _
" AS Units, ( SELECT nbr * 10 AS nbr FROM" & _
" ( SELECT 0 AS nbr FROM [Sequence] UNION" & _
" ALL SELECT 1 FROM [Sequence] UNION ALL" & _
" SELECT 2 FROM [Sequence] UNION ALL SELECT" & _
" 3 FROM [Sequence] UNION ALL SELECT 4 FROM" & _
" [Sequence] UNION ALL SELECT 5 FROM [Sequence]" & _
" UNION ALL SELECT 6 FROM [Sequence] UNION" & _
" ALL SELECT 7 FROM [Sequence] UNION ALL" & _
" SELECT 8 FROM [Sequence] UNION ALL SELECT" & _
" 9 FROM [Sequence] ) AS Digits ) AS Tens," & _
" ( SELECT nbr * 100 AS nbr FROM ( SELECT"
sql = sql & _
" 0 AS nbr FROM [Sequence] UNION ALL SELECT" & _
" 1 FROM [Sequence] UNION ALL SELECT 2 FROM" & _
" [Sequence] UNION ALL SELECT 3 FROM [Sequence]" & _
" UNION ALL SELECT 4 FROM [Sequence] UNION" & _
" ALL SELECT 5 FROM [Sequence] UNION ALL" & _
" SELECT 6 FROM [Sequence] UNION ALL SELECT" & _
" 7 FROM [Sequence] UNION ALL SELECT 8 FROM" & _
" [Sequence] UNION ALL SELECT 9 FROM [Sequence]" & _
" ) AS Digits ) AS Hundreds, ( SELECT nbr" & _
" * 1000 AS nbr FROM ( SELECT 0 AS nbr FROM" & _
" [Sequence] UNION ALL SELECT 1 FROM [Sequence]" & _
" UNION ALL SELECT 2 FROM [Sequence] UNION" & _
" ALL SELECT 3 FROM [Sequence] UNION ALL" & _
" SELECT 4 FROM [Sequence] UNION ALL SELECT" & _
" 5 FROM [Sequence] UNION ALL SELECT 6 FROM" & _
" [Sequence] UNION ALL SELECT 7 FROM [Sequence]" & _
" UNION ALL SELECT 8 FROM [Sequence] UNION" & _
" ALL SELECT 9 FROM [Sequence] ) AS Digits" & _
" ) AS Thousands, ( SELECT nbr * 10000 AS" & _
" nbr FROM ( SELECT 0 AS nbr FROM [Sequence]" & _
" UNION ALL SELECT 1 FROM [Sequence] UNION" & _
" ALL SELECT 2 FROM [Sequence] UNION ALL" & _
" SELECT 3 FROM [Sequence] UNION ALL SELECT"
sql = sql & _
" 4 FROM [Sequence] UNION ALL SELECT 5 FROM" & _
" [Sequence] UNION ALL SELECT 6 FROM [Sequence]" & _
" UNION ALL SELECT 7 FROM [Sequence] UNION" & _
" ALL SELECT 8 FROM [Sequence] UNION ALL" & _
" SELECT 9 FROM [Sequence] ) AS Digits )" & _
" AS TenThousands;"
.Execute sql

' Create test table
.Execute _
"CREATE TABLE Test1 (" & _
" aa_comp VARCHAR(100)" & _
" NOT NULL)"

' Create ', ' delimted data
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 04, 05');"
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 07, 05, 20');"
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 06, 07');"

Dim rs
Set rs = .Execute( _
"SELECT T1.aa_comp, S1.seq as pos, MID$(T1.aa_comp," & _
" S1.seq, 1) AS aa_comp_parsed FROM Test1" & _
" AS T1, [sequence] AS S1 WHERE S1.seq BETWEEN" & _
" 1 AND 100 AND LEN(MID$(T1.aa_comp, S1.seq," & _
" 1)) > 0;")
MsgBox _
"Example 1: parse out each character" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos" & vbTab & _
"parsed" & vbCr & _
rs.GetString
rs.Close

With rs
.CursorLocation = 3
.ActiveConnection = cat.ActiveConnection
.Source = _
"SELECT T1.aa_comp, S1.seq as pos1, S2.seq" & _
" as pos2, MID$(T1.aa_comp, S1.seq, S2.seq)" & _
" AS aa_comp_parsed FROM ( SELECT ', ' &" & _
" aa_comp & ', ' AS aa_comp FROM Test1 )" & _
" AS T1, [sequence] AS S1, [sequence] AS" & _
" S2 WHERE S1.seq BETWEEN 1 AND 100 AND S2.seq" & _
" BETWEEN 1 AND 100 AND S1.seq < S2.seq AND" & _
" LEN(MID$(T1.aa_comp, S1.seq, S2.seq)) >" & _
" 0 ORDER BY T1.aa_comp, S1.seq, S2.seq "
.Open
MsgBox _
"Example 2: parse out each substring" & vbCr & _
"(" & rs.RecordCount & " rows)" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos1" & vbTab & _
"pos2" & vbTab & "parsed" & vbCr & _
.GetString
.Close
End With

Set rs = .Execute( _
"SELECT T1.aa_comp, S1.seq as pos1, S2.seq" & _
" as pos2, MID$(T1.aa_comp, S1.seq, S2.seq)" & _
" AS aa_comp_parsed FROM ( SELECT ', ' &" & _
" aa_comp & ', ' AS aa_comp FROM Test1 )" & _
" AS T1, [sequence] AS S1, [sequence] AS" & _
" S2 WHERE S1.seq BETWEEN 1 AND 104 AND S2.seq" & _
" BETWEEN 1 AND 104 AND S1.seq < S2.seq AND" & _
" MID$(T1.aa_comp, S1.seq, 2) = ', ' AND" & _
" MID$(T1.aa_comp, S2.seq, 2) = ', ' ORDER" & _
" BY T1.aa_comp, S1.seq, S2.seq;")
MsgBox _
"Example 3: only interested in substrings" & vbCr & _
"prefixed and suffixed by delimiter:" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos1" & vbTab & _
"pos2" & vbTab & "parsed" & vbCr & _
rs.GetString
rs.Close

Set rs = .Execute( _
"SELECT T1.aa_comp, S1.seq as pos1, MID$(T1.aa_comp," & _
" S1.seq + 2, MIn(S2.seq) - S1.seq - 2) AS" & _
" aa_comp_parsed FROM ( SELECT ', ' & aa_comp" & _
" & ', ' AS aa_comp FROM Test1 ) AS T1, [sequence]" & _
" AS S1, [sequence] AS S2 WHERE S1.seq BETWEEN" & _
" 1 AND 104 AND S2.seq BETWEEN 1 AND 104" & _
" AND S1.seq < S2.seq AND MID$(T1.aa_comp," & _
" S1.seq, 2) = ', ' AND MID$(T1.aa_comp," & _
" S2.seq, 2) = ', ' GROUP BY T1.aa_comp," & _
" S1.seq ORDER BY T1.aa_comp, S1.seq;")
MsgBox _
"Final resultset: removed rows containing" & vbCr & _
"delimiting characters:" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos1" & vbTab & _
"parsed" & vbCr & _
rs.GetString
rs.Close

End With
Set .ActiveConnection = Nothing
End With
End Sub

Jamie.

--
 
Back
Top