John said:
Don't decorate the messenger <g>; the ingenuity is not mine:
http://groups.google.com/group/microsoft.public.sqlserver.programming/msg/03a528ab01d4ea9d
The basic premise here is that Jet is a SQL product and SQL is a set
based, declarative language, therefore we can best solve 'data'
problems in Jet with set based solutions. Of course, Access has VBA and
VBA is a 'traditional' procedural language; most programmers are used
to seeing procedural solutions (e.g. traversing ordered recordsets from
BOF to EOF, flagging values along the way).
Interesting... I'll have to spend part of my weekend working out how it
works. Thank you.
I'm sure you figure it out in no time but for fun here's a breakdown
(long post).
First, a 'Sequence' table of integers. Here's a quick way (three lines
of SQL and a couple of seconds of time) to create a 100K row auxiliary
table, better than materializing on the fly:
CREATE TABLE [Sequence] (
seq INTEGER NOT NULL CONSTRAINT pk__Sequence PRIMARY KEY)
;
INSERT INTO [Sequence] (seq) VALUES (-1)
;
INSERT INTO [Sequence] (seq)
SELECT Units.nbr + Tens.nbr + Hundreds.nbr + Thousands.nbr +
TenThousands.nbr AS seq
FROM
(
SELECT nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Units,
(
SELECT nbr * 10 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Tens,
(
SELECT nbr * 100 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Hundreds,
(
SELECT nbr * 1000 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS Thousands,
(
SELECT nbr * 10000 AS nbr FROM
(
SELECT 0 AS nbr FROM [Sequence]
UNION ALL
SELECT 1 FROM [Sequence]
UNION ALL
SELECT 2 FROM [Sequence]
UNION ALL
SELECT 3 FROM [Sequence]
UNION ALL
SELECT 4 FROM [Sequence]
UNION ALL
SELECT 5 FROM [Sequence]
UNION ALL
SELECT 6 FROM [Sequence]
UNION ALL
SELECT 7 FROM [Sequence]
UNION ALL
SELECT 8 FROM [Sequence]
UNION ALL
SELECT 9 FROM [Sequence]
) AS Digits
) AS TenThousands
;
Here's the OP's test table and data:
CREATE TABLE Test1 (
aa_comp VARCHAR(100) NOT NULL)
;
INSERT INTO Test1 (aa_comp) VALUES ('03, 04, 05')
;
INSERT INTO Test1 (aa_comp) VALUES ('03, 07, 05, 20')
;
INSERT INTO Test1 (aa_comp) VALUES ('03, 06, 07')
;
We can create a 'cross join' to return all the possible combinations of
rows between the OP's table and the sequence table and plug the seq
value into Jet's MID$ function to parse out each character individually
(best to limit the rows in the sequence table based on the maximum
character width of the column):
SELECT T1.aa_comp,
S1.seq as pos,
MID$(T1.aa_comp, S1.seq, 1) AS aa_comp_parsed
FROM Test1 AS T1,
[sequence] AS S1
WHERE S1.seq BETWEEN 1 AND 100;
I'm sure you get the basic idea. The following parses out all the
possible strings of consecutive characters, this time using a search
condition to remove the null strings:
SELECT T1.aa_comp,
S1.seq as pos1,
S2.seq as pos2,
MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed
FROM Test1 AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 100
AND S2.seq BETWEEN 1 AND 100
AND S1.seq < S2.seq
AND LEN(MID$(T1.aa_comp, S1.seq, S2.seq)) > 0
ORDER BY T1.aa_comp, S1.seq, S2.seq
It is immediately clear we are only interested in a small subset of
these results. Note that to stand any chance of identifying the first
and last substring we need to prefix and suffix the whole string with
the delimiting string:
SELECT T1.aa_comp,
S1.seq as pos1,
S2.seq as pos2,
MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed
FROM
(
SELECT ', ' & aa_comp & ', ' AS aa_comp
FROM Test1
)
AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 100
AND S2.seq BETWEEN 1 AND 100
AND S1.seq < S2.seq
AND LEN(MID$(T1.aa_comp, S1.seq, S2.seq)) > 0
ORDER BY T1.aa_comp, S1.seq, S2.seq
We are only interested in substrings that are delimited by the
delimiting string:
SELECT T1.aa_comp,
S1.seq as pos1,
S2.seq as pos2,
MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed
FROM
(
SELECT ', ' & aa_comp & ', ' AS aa_comp
FROM Test1
)
AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 104
AND S2.seq BETWEEN 1 AND 104
AND S1.seq < S2.seq
AND MID$(T1.aa_comp, S1.seq, 2) = ', '
AND MID$(T1.aa_comp, S2.seq, 2) = ', '
ORDER BY T1.aa_comp, S1.seq, S2.seq;
OK, the above is still returning too many rows. We want to remove the
rows which have the delimiting string in addition to each end of the
substring.
For me, the real genius is realization that
MID$(T1.aa_comp, S1.seq, S2.seq) AS aa_comp_parsed
may be re-written as
MID$(T1.aa_comp, S1.seq, S2.seq - S1.seq) AS aa_comp_parsed
There are various ways of parsing/filtering out superfluous rows but
the MIN is particularly well optimized.
Here's the final resultset:
SELECT T1.aa_comp,
S1.seq as pos1,
MID$(T1.aa_comp, S1.seq + 2, MIn(S2.seq) - S1.seq - 2) AS
aa_comp_parsed
FROM
(
SELECT ', ' & aa_comp & ', ' AS aa_comp
FROM Test1
)
AS T1,
[sequence] AS S1,
[sequence] AS S2
WHERE S1.seq BETWEEN 1 AND 104
AND S2.seq BETWEEN 1 AND 104
AND S1.seq < S2.seq
AND MID$(T1.aa_comp, S1.seq, 2) = ', '
AND MID$(T1.aa_comp, S2.seq, 2) = ', '
GROUP BY T1.aa_comp, S1.seq
ORDER BY T1.aa_comp, S1.seq;
As ever, here's my test code:
Sub parmlist()
Kill "C:\DropMe.mdb"
Dim cat
Set cat = CreateObject("ADOX.Catalog")
With cat
.Create _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=C:\DropMe.mdb"
With .ActiveConnection
' Create 100K row 'sequence' auxilary table
.Execute _
"CREATE TABLE [Sequence] (seq INTEGER NOT" & _
" NULL CONSTRAINT pk__Sequence PRIMARY KEY);"
.Execute _
"INSERT INTO [Sequence] (seq) VALUES (-1)"
Dim sql
sql = sql & _
"INSERT INTO [Sequence] (seq) SELECT Units.nbr" & _
" + Tens.nbr + Hundreds.nbr + Thousands.nbr" & _
" + TenThousands.nbr AS seq FROM ( SELECT" & _
" nbr FROM ( SELECT 0 AS nbr FROM [Sequence]" & _
" UNION ALL SELECT 1 FROM [Sequence] UNION" & _
" ALL SELECT 2 FROM [Sequence] UNION ALL" & _
" SELECT 3 FROM [Sequence] UNION ALL SELECT" & _
" 4 FROM [Sequence] UNION ALL SELECT 5 FROM" & _
" [Sequence] UNION ALL SELECT 6 FROM [Sequence]" & _
" UNION ALL SELECT 7 FROM [Sequence] UNION" & _
" ALL SELECT 8 FROM [Sequence] UNION ALL" & _
" SELECT 9 FROM [Sequence] ) AS Digits )" & _
" AS Units, ( SELECT nbr * 10 AS nbr FROM" & _
" ( SELECT 0 AS nbr FROM [Sequence] UNION" & _
" ALL SELECT 1 FROM [Sequence] UNION ALL" & _
" SELECT 2 FROM [Sequence] UNION ALL SELECT" & _
" 3 FROM [Sequence] UNION ALL SELECT 4 FROM" & _
" [Sequence] UNION ALL SELECT 5 FROM [Sequence]" & _
" UNION ALL SELECT 6 FROM [Sequence] UNION" & _
" ALL SELECT 7 FROM [Sequence] UNION ALL" & _
" SELECT 8 FROM [Sequence] UNION ALL SELECT" & _
" 9 FROM [Sequence] ) AS Digits ) AS Tens," & _
" ( SELECT nbr * 100 AS nbr FROM ( SELECT"
sql = sql & _
" 0 AS nbr FROM [Sequence] UNION ALL SELECT" & _
" 1 FROM [Sequence] UNION ALL SELECT 2 FROM" & _
" [Sequence] UNION ALL SELECT 3 FROM [Sequence]" & _
" UNION ALL SELECT 4 FROM [Sequence] UNION" & _
" ALL SELECT 5 FROM [Sequence] UNION ALL" & _
" SELECT 6 FROM [Sequence] UNION ALL SELECT" & _
" 7 FROM [Sequence] UNION ALL SELECT 8 FROM" & _
" [Sequence] UNION ALL SELECT 9 FROM [Sequence]" & _
" ) AS Digits ) AS Hundreds, ( SELECT nbr" & _
" * 1000 AS nbr FROM ( SELECT 0 AS nbr FROM" & _
" [Sequence] UNION ALL SELECT 1 FROM [Sequence]" & _
" UNION ALL SELECT 2 FROM [Sequence] UNION" & _
" ALL SELECT 3 FROM [Sequence] UNION ALL" & _
" SELECT 4 FROM [Sequence] UNION ALL SELECT" & _
" 5 FROM [Sequence] UNION ALL SELECT 6 FROM" & _
" [Sequence] UNION ALL SELECT 7 FROM [Sequence]" & _
" UNION ALL SELECT 8 FROM [Sequence] UNION" & _
" ALL SELECT 9 FROM [Sequence] ) AS Digits" & _
" ) AS Thousands, ( SELECT nbr * 10000 AS" & _
" nbr FROM ( SELECT 0 AS nbr FROM [Sequence]" & _
" UNION ALL SELECT 1 FROM [Sequence] UNION" & _
" ALL SELECT 2 FROM [Sequence] UNION ALL" & _
" SELECT 3 FROM [Sequence] UNION ALL SELECT"
sql = sql & _
" 4 FROM [Sequence] UNION ALL SELECT 5 FROM" & _
" [Sequence] UNION ALL SELECT 6 FROM [Sequence]" & _
" UNION ALL SELECT 7 FROM [Sequence] UNION" & _
" ALL SELECT 8 FROM [Sequence] UNION ALL" & _
" SELECT 9 FROM [Sequence] ) AS Digits )" & _
" AS TenThousands;"
.Execute sql
' Create test table
.Execute _
"CREATE TABLE Test1 (" & _
" aa_comp VARCHAR(100)" & _
" NOT NULL)"
' Create ', ' delimted data
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 04, 05');"
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 07, 05, 20');"
.Execute _
"INSERT INTO Test1 (aa_comp)" & _
" VALUES ('03, 06, 07');"
Dim rs
Set rs = .Execute( _
"SELECT T1.aa_comp, S1.seq as pos, MID$(T1.aa_comp," & _
" S1.seq, 1) AS aa_comp_parsed FROM Test1" & _
" AS T1, [sequence] AS S1 WHERE S1.seq BETWEEN" & _
" 1 AND 100 AND LEN(MID$(T1.aa_comp, S1.seq," & _
" 1)) > 0;")
MsgBox _
"Example 1: parse out each character" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos" & vbTab & _
"parsed" & vbCr & _
rs.GetString
rs.Close
With rs
.CursorLocation = 3
.ActiveConnection = cat.ActiveConnection
.Source = _
"SELECT T1.aa_comp, S1.seq as pos1, S2.seq" & _
" as pos2, MID$(T1.aa_comp, S1.seq, S2.seq)" & _
" AS aa_comp_parsed FROM ( SELECT ', ' &" & _
" aa_comp & ', ' AS aa_comp FROM Test1 )" & _
" AS T1, [sequence] AS S1, [sequence] AS" & _
" S2 WHERE S1.seq BETWEEN 1 AND 100 AND S2.seq" & _
" BETWEEN 1 AND 100 AND S1.seq < S2.seq AND" & _
" LEN(MID$(T1.aa_comp, S1.seq, S2.seq)) >" & _
" 0 ORDER BY T1.aa_comp, S1.seq, S2.seq "
.Open
MsgBox _
"Example 2: parse out each substring" & vbCr & _
"(" & rs.RecordCount & " rows)" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos1" & vbTab & _
"pos2" & vbTab & "parsed" & vbCr & _
.GetString
.Close
End With
Set rs = .Execute( _
"SELECT T1.aa_comp, S1.seq as pos1, S2.seq" & _
" as pos2, MID$(T1.aa_comp, S1.seq, S2.seq)" & _
" AS aa_comp_parsed FROM ( SELECT ', ' &" & _
" aa_comp & ', ' AS aa_comp FROM Test1 )" & _
" AS T1, [sequence] AS S1, [sequence] AS" & _
" S2 WHERE S1.seq BETWEEN 1 AND 104 AND S2.seq" & _
" BETWEEN 1 AND 104 AND S1.seq < S2.seq AND" & _
" MID$(T1.aa_comp, S1.seq, 2) = ', ' AND" & _
" MID$(T1.aa_comp, S2.seq, 2) = ', ' ORDER" & _
" BY T1.aa_comp, S1.seq, S2.seq;")
MsgBox _
"Example 3: only interested in substrings" & vbCr & _
"prefixed and suffixed by delimiter:" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos1" & vbTab & _
"pos2" & vbTab & "parsed" & vbCr & _
rs.GetString
rs.Close
Set rs = .Execute( _
"SELECT T1.aa_comp, S1.seq as pos1, MID$(T1.aa_comp," & _
" S1.seq + 2, MIn(S2.seq) - S1.seq - 2) AS" & _
" aa_comp_parsed FROM ( SELECT ', ' & aa_comp" & _
" & ', ' AS aa_comp FROM Test1 ) AS T1, [sequence]" & _
" AS S1, [sequence] AS S2 WHERE S1.seq BETWEEN" & _
" 1 AND 104 AND S2.seq BETWEEN 1 AND 104" & _
" AND S1.seq < S2.seq AND MID$(T1.aa_comp," & _
" S1.seq, 2) = ', ' AND MID$(T1.aa_comp," & _
" S2.seq, 2) = ', ' GROUP BY T1.aa_comp," & _
" S1.seq ORDER BY T1.aa_comp, S1.seq;")
MsgBox _
"Final resultset: removed rows containing" & vbCr & _
"delimiting characters:" & vbCr & vbCr & _
"aa_comp " & vbTab & "pos1" & vbTab & _
"parsed" & vbCr & _
rs.GetString
rs.Close
End With
Set .ActiveConnection = Nothing
End With
End Sub
Jamie.
--