Count letter string, e.g. h/EL/p

G

Guest

I want to count how many times a string of letters occurs in each row in a
column. Column A has 1000 rows with 1 word in each row (1000 most frequent
English words). e.g. I would like to count how many words contain the string
of letters "ee" (sEEn).

I also want to count how frequent a string such as "A, any letter, E" (mAdE)
occurs.

Thanks

Lee
 
R

Ron Rosenfeld

I want to count how many times a string of letters occurs in each row in a
column. Column A has 1000 rows with 1 word in each row (1000 most frequent
English words). e.g. I would like to count how many words contain the string
of letters "ee" (sEEn).

I also want to count how frequent a string such as "A, any letter, E" (mAdE)
occurs.

Thanks

Lee

Just use the COUNTIF worksheet function with wild cards if you want a case
insensitive count.

e.g:

=COUNTIF(A:A,"*ee*")
=COUNTIF(A:A,"*A?E*)

--ron
 
G

Guest

try:

=SUM(N(ISNUMBER(SEARCH("a?e",A2:A1000))))

=SUM(N(ISNUMBER(SEARCH("ee",A2:A1000))))

Enter with Ctrl+Shift+Enter
 
S

Sandy Mann

One way:

With the string that you are looking for in D1 try:

=SUM(LEN(A1:A1000)-LEN(SUBSTITUTE(LOWER(A1:A1000),D1,"")))/LEN(D1)

Array entered with Ctrl + Shift + Enter

--
HTH

Sandy
In Perth, the ancient capital of Scotland
and the crowning place of kings

(e-mail address removed)
Replace @mailinator.com with @tiscali.co.uk
 
G

Guest

..... why do I make life difficult!!

Ron Rosenfeld said:
Just use the COUNTIF worksheet function with wild cards if you want a case
insensitive count.

e.g:

=COUNTIF(A:A,"*ee*")
=COUNTIF(A:A,"*A?E*)

--ron
 
T

T. Valko

Try this:

For "ee" :

=COUNTIF(A1:A10,"*ee*")

For "A, any letter, E" (mAdE) :

=COUNTIF(A1:A10,"*a?e*")
 
R

Ron Rosenfeld

One way:

With the string that you are looking for in D1 try:

=SUM(LEN(A1:A1000)-LEN(SUBSTITUTE(LOWER(A1:A1000),D1,"")))/LEN(D1)

Array entered with Ctrl + Shift + Enter


How does that work with wildcards? e.g. a?e for mAdE as the OP requested?
--ron
 
S

Sandy Mann

The <> operator is better but Ron has us both beaten hands down if the OP
only want to count words not multiple occurrences within words.
--
Regards,


Sandy
In Perth, the ancient capital of Scotland
and the crowning place of kings

(e-mail address removed)
Replace @mailinator.com with @tiscali.co.uk
 
S

Sandy Mann

Ron Rosenfeld said:
How does that work with wildcards? e.g. a?e for mAdE as the OP requested?

mmmm....

Didn't think it through did I?

--

Sandy
In Perth, the ancient capital of Scotland
and the crowning place of kings

(e-mail address removed)
Replace @mailinator.com with @tiscali.co.uk
 
R

Ron Rosenfeld

The <> operator is better but Ron has us both beaten hands down if the OP
only want to count words not multiple occurrences within words.

For ease of formulating, I'd probably use a regular expression UDF if he wanted
multiple occurrences within words


--ron
 
R

Ron Rosenfeld

Thanks,

How would I count OU not OUGH (count OUT not THOUGH)?

How much variability might there be in your patterns?

Will it always be either an unqualified series of letters or a series of
letters that is NOT followed by another series?

Or are there other possibilities also?
--ron
 
G

Guest

I am counting how often specific digraphs (e.g. er, ou, ought, ow) occur in
the most common 1000 words. Sometimes the digraph will be embedded within a
word and sometimes it occurs at the end.
 
R

Ron Rosenfeld

I am counting how often specific digraphs (e.g. er, ou, ought, ow) occur in
the most common 1000 words. Sometimes the digraph will be embedded within a
word and sometimes it occurs at the end.

If my understanding of digraphs is correct, then the only variation you will
need is to separate out digraphs from trigraphs and tetragraphs, so a routine
which allows you to specify a character sequence that may not be followed by
other character(s) should be sufficient.

I also presume that if a word should have two identical digraphs (and I don't
know if any exist), that you would only want to count it once.

That being the case, this UDF should suffice:

To use the UDF, enter a formula of the type:


=SUMPRODUCT(ReCount(range,string_to_find,[string_not_to_follow],[search_entire_string]))

The arguments within brackets [...] are optional.

The [search_entire_string] option will, is set to TRUE, will count multiple
matches within a single string. In other words, looking for "ee" in
beekeeping, the function will return two; otherwise it will return one.

To return all the "ou", you would use:

=SUMPRODUCT(recount(A1:A1000,"ou"))

To return only "ou" not followed by "gh", you would use:

=SUMPRODUCT(recount(A1:A1000,"ou","gh"))

and to count all of the "ou" not followed by "gh" in each word as a separate
entry, you would use:

=SUMPRODUCT(recount(A1:A1000,"ou","gh",TRUE))

As written, the UDF is case insensitive. This could be made an optional
argument, if required.

To enter the VBA code, <alt-F11> opens the VB Editor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and enter the code below into the window that opens:

============================================================
Option Explicit

Function ReCount(s, p, Optional NF = "", Optional g As Boolean = False)
's: the string being searched. It can be a string, _
a cell reference, or a range of cells
'p: the pattern to search for
'NF: an optional pattern that should NOT FOLLOW p
'g: if TRUE, will cause multiple matches within a single string

If NF <> "" Then
p = p & "(?!" & NF & ")"
End If
Select Case VarType(s)
Case Is = vbString
ReCount = Cnt(s, p, g)
Case Is > vbArray
Dim c As Range, i As Long
Dim t
ReDim t(0 To s.Count)
For Each c In s
t(i) = Cnt(c.Text, p, g)
i = i + 1
Next c
ReCount = t
End Select
End Function

Function Cnt(str, sPattern, g)
Dim re As Object
Dim mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPattern
re.Global = g
re.ignorecase = True

Set mc = re.Execute(str)
Cnt = mc.Count
End Function
================================
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top