CountIf on Upper/Lowercase

Harlan Grove · Dec 4, 2003

Alan Beban said:
More's the pity that Excel has no expression for a null value, which is
the source of the problem. . . .

Actually it does. Write a udf that returns the variant value Empty. Pass
that udf's result to the worksheet function ISBLANK, and ISBLANK returns
TRUE. The problem is that Excel can't propagate Empty - it can only convert
it into 0 or "" depending on context. What's needed is something akin to an
error value literal. If we start asking now, Microsoft may implement this at
the same blistering pace that they did for colored worksheet tabs, i.e., we
may see it in 5-10 years.

. . . I consider the "violation" a nit; not only
is it pointed out in the description of the function, the syntax for
ArrayCountIf is the same as the COUNTIF syntax for counting empty
blanks, and shouldn't therefore send users astray.

....

Granted, but you picked on COUNTIF syntax before. Looks like you approve of
COUNTIF syntax when it happens to agree with the syntax you're using in
ArrayCountIf, but disapprove of it when it doesn't agree with your udf,
e.g., string representations of error value literals.

Harlan Grove · Dec 4, 2003

Dave Peterson said:
Just a comment about sorting fields with dashes:
Excel just ignores the dash when it sorts.

....

There's a pattern that Excel maintains when sorting text containing dashes,
so I can't accept the assertion that Excel ignores them, but it's clear that
Excel treats them differently than most other characters. FWLIW, Excel
appears to treat single quotes the same as dashes.

Sorted in ascending order,

-
--
---
Z
-Z
--Z
Z-
-Z-
Z--
ZZ
-ZZ
Z-Z
ZZ-
ZZZ

Ditto

'
''
'''
Z
'Z
''Z
Z'
'Z'
Z''
ZZ
'ZZ
Z'Z
ZZ'
ZZZ

Who knows what other surprises lurk in Excel's collation sequence. Has
Microsoft bothered to document it?

Harlan Grove · Dec 4, 2003

Harlan Grove said:
. . . More general and perhaps more efficient would be a udf
wrapper around VBA's StrComp. On my to-do list.

And now off my to-do list because here it is.

Function astrcmp( _
a As Variant, _
b As Variant, _
Optional cm As Long = 1 _
) As Variant
'-----------------------------------------------------------
'intended to be used as a udf wrapper around VBA's StrComp
'-----------------------------------------------------------
Dim aa As Variant, bb As Variant, rv As Variant
Dim d(1 To 2, 1 To 2) As Long, e(1 To 2, 1 To 2) As Long
Dim i As Long, j As Long

'quickly handle two scalars
If Not (IsArray(a) Or IsArray(b)) Then
If IsError(a) Then
astrcmp = a
ElseIf IsError(b) Then
astrcmp = b
Else
astrcmp = StrComp(a, b, cm)
End If

Exit Function
End If

'only use first area of range arguments
If TypeOf a Is Range Then a = a.Areas(1).Value
If TypeOf b Is Range Then b = b.Areas(1).Value

'get dimensions to be used with 0-based result
On Error Resume Next
d(1, 1) = -1
d(1, 1) = UBound(a, 1) - LBound(a, 1)
d(1, 2) = -1
d(1, 2) = UBound(a, 2) - LBound(a, 2)
d(2, 1) = -1
d(2, 1) = UBound(b, 1) - LBound(b, 1)
d(2, 2) = -1
d(2, 2) = UBound(b, 2) - LBound(b, 2)
Err.Clear
On Error GoTo 0

'load a into a possibly degenerate 2D array aa
If d(1, 1) = -1 Then
ReDim aa(0 To 0, 0 To 0)
aa(0, 0) = a
d(1, 1) = 0
d(1, 2) = 0
ElseIf d(1, 2) = -1 Then
ReDim aa(0 To 0, 0 To d(1, 1))
For j = 0 To d(1, 1)
aa(0, j) = a(j + LBound(a, 1))
Next j
d(1, 2) = d(1, 1)
d(1, 1) = 0
Else
ReDim aa(0 To d(1, 1), 0 To d(1, 2))
For i = 0 To d(1, 1)
For j = 0 To d(1, 2)
aa(i, j) = a(i + LBound(a, 1), j + LBound(a, 2))
Next j
Next i
End If

'load b into a possibly degenerate 2D array bb
If d(2, 1) = -1 Then
ReDim bb(0 To 0, 0 To 0)
bb(0, 0) = b
d(2, 1) = 0
d(2, 2) = 0
ElseIf d(2, 2) = -1 Then
ReDim bb(0 To 0, 0 To d(2, 1))
For j = 0 To d(2, 1)
bb(0, j) = b(j + LBound(b, 1))
Next j
d(2, 2) = d(2, 1)
d(2, 1) = 0
Else
ReDim bb(0 To d(2, 1), 0 To d(2, 2))
For i = 0 To d(2, 1)
For j = 0 To d(2, 2)
bb(i, j) = b(i + LBound(b, 1), j + LBound(b, 2))
Next j
Next i
End If

'set return value array's dimensions to the larger of
'aa's and bb's row and column dimensions
ReDim rv(0 To IIf(d(1, 1) > d(2, 1), d(1, 1), d(2, 1)), _
0 To IIf(d(1, 2) > d(2, 2), d(1, 2), d(2, 2)))

For i = 0 To UBound(rv, 1)
For j = 0 To UBound(rv, 2)
If (i > d(1, 1) And d(1, 1) > 0) Or (i > d(2, 1) And d(2, 1) > 0) _
Or (j > d(1, 2) And d(1, 2) > 0) Or (j > d(2, 2) And d(2, 2) > 0) _
Then
rv(i, j) = CVErr(xlErrNA)
Else
e(1, 1) = LBound(aa, 1) + IIf(d(1, 1) > 0, i, 0)
e(1, 2) = LBound(aa, 2) + IIf(d(1, 2) > 0, j, 0)
e(2, 1) = LBound(bb, 1) + IIf(d(2, 1) > 0, i, 0)
e(2, 2) = LBound(bb, 2) + IIf(d(2, 2) > 0, j, 0)

rv(i, j) = StrComp(aa(e(1, 1), e(1, 2)), bb(e(2, 1), e(2, 2)), cm)
End If
Next j
Next i

astrcmp = rv
End Function

Not elegant, but it seems to reproduce Excel's range/array size semantics
when used as operands. Well, not quite - if fed a 1D array and either a
scalar or another 1D array, it returns a 2D array with degenerate row
dimension, e.g., rv(0 to 0, 0 to whatever). I could have dealt with that as
a special case, but I didn't see the point.

Anyway, with this udf, counting text in case-sensitive fashion becomes a
simple matter of using this inside a SUMPRODUCT call.

COUNTIF(X,"=Y")
=SUMPRODUCT(--(astrcmp(X,"Y",0)=0))

COUNTIF(X,"<>Y")
=SUMPRODUCT(--(astrcmp(X,"Y",0)<>0))

COUNTIF(X,"<Y")
=SUMPRODUCT(--(astrcmp(X,"Y",0)<0))

COUNTIF(X,"<=Y")
=SUMPRODUCT(--(astrcmp(X,"Y",0)<=0))

COUNTIF(X,">=Y")
=SUMPRODUCT(--(astrcmp(X,"Y",0)>=0))

COUNTIF(X,">Y")
=SUMPRODUCT(--(astrcmp(X,"Y",0)>0))

Alan Beban · Dec 4, 2003

Harlan said:
...

Granted, but you picked on COUNTIF syntax before. Looks like you approve of
COUNTIF syntax when it happens to agree with the syntax you're using in
ArrayCountIf, but disapprove of it when it doesn't agree with your udf,
e.g., string representations of error value literals.

I'm not sure what the point of this comment is. Nevertheless, I "picked
on COUNTIF syntax" when it gives quirky, arguably erroneous results,
like =COUNTIF(A1:A2,A2) returning 0 when A2 is an empty blank, or not
sorting error values as though they are of equal value. I didn't
approve or disapprove of the syntax =COUNTIF(rng,"=") to count empty
blanks but simply pointed out that the syntax of ArrayCountIf to count
blanks was the same as COUNTIF's and thus unlikely to cause difficulty
for users.

Alan Beban

Harlan Grove · Dec 4, 2003

...
...

I'm not sure what the point of this comment is. Nevertheless, I "picked
on COUNTIF syntax" when it gives quirky, arguably erroneous results,
like =COUNTIF(A1:A2,A2) returning 0 when A2 is an empty blank, or not
sorting error values as though they are of equal value. I didn't
approve or disapprove of the syntax =COUNTIF(rng,"=") to count empty
blanks but simply pointed out that the syntax of ArrayCountIf to count
blanks was the same as COUNTIF's and thus unlikely to cause difficulty
for users.

You griped about COUNTIF treating "#DIV/0!" and #DIV/0! the same as 2nd,
criteria arguments. You're failing to appreciate that the criteria arguments to
COUNTIF (and SUMIF) are *expressions*, not plain text. That it takes the
criteria argument "#DIV/0!*" to match literal text "#DIV/0!" in a range is *NOT*
an error any more than needing to write SEARCH("~?","Who's there?") to get the
result 12 rather than SEARCH("?","Who's there?") returning 1 is an error. The
next thing you'll be telling the ng is that it's an error that
COUNTIF(Range,"#DIV/0!~*") is an error if one wants to match literal strings
"#DIV/0!*" in Range.

Actually, the expressions that COUNTIF takes as 2nd argument can't represent
everything that could appear in a cell. A large part of the problem is that the
criteria expression is converted to text. That means there's no way to use
COUNTIF to distinguish between literal text "#DIV/0!" alone or followed by any
other text without using two COUNTIF calls,

=COUNTIF(Range,"#DIV/0!*")-COUNTIF(Range,"#DIV/0!?*")

So I'll agree that there's awkwardness using COUNTIF criteria which is made
worse by Microsoft's continuing failure to recognize and comprehend the manifest
superiority of text processing methodologies invented by users of other
operating systems (if tildes were treated the same as backslashes in Unix, it'd
be very simple indeed to sort out "#DIV/0!" as an error value literal token and
"~#DIV/0!" as the literal text '#DIV/0!', but Microsoft just can't help making
stupid design decisions - in this case, tilde is only special if it precedes *,
? or itself).

Alan Beban · Dec 4, 2003

I didn't "gripe" about anything. Nor did I say that requiring the
criteria argument "#DIV/0!*" to match the literal text "#DIV/0!" in a
range is an error. What I said exactly, in connection with
COUNTIF(S1:S8,"#DIV/0!") interpreting the second argument as the error
value rather than a text string, and why ArrayCountIf does not, was:

"As one result, with 1 in A1 and the text string (not the error value)
#DIV/0! in A2, =COUNTIF(A1:A2,A2) erroneously (I submit) returns 0."

Rationalize it however you will, it is anomalous that a function that
purports to indicate the number of occurrences of the contents of A2 in
the range A1:A2 reports that the contents of A2 don't occur in A1:A2 at
all. I chose to avoid that anomaly in ArrayCountIf, that's all.

Alan Beban

Harlan Grove · Dec 5, 2003

...
...

"As one result, with 1 in A1 and the text string (not the error value)
#DIV/0! in A2, =COUNTIF(A1:A2,A2) erroneously (I submit) returns 0."

Rationalize it however you will, it is anomalous that a function that
purports to indicate the number of occurrences of the contents of A2 in
the range A1:A2 reports that the contents of A2 don't occur in A1:A2 at
all. I chose to avoid that anomaly in ArrayCountIf, that's all.

But you choose to retain the anomaly of using "=" as 2nd argument to match
'empty blanks'. Consistency is a good thing in programming, so it's not a
surprise you don't strive for it. Nothing like militant amateurism.

It's not rationalizing, and it's not an anomaly. You just don't get it about
expressions. I'll try one more time with a simpler example. If each cell in
A1:A5 contained a single character, how would you count up the '<' and '>'? Not
with ">" or "<". Same general reason text that's indistinguishable from error
value tokens isn't treated as plain text.

Now, I also need to correct your misunderstanding of what COUNTIF does. As it
says in online help,

"[COUNTIF] Counts the number of cells within a range that meet the given
criteria."

Apparently you're incapable of distinguishing between satisfying criteria and
literal textual matching. Again, the second argument to COUNTIF (and SUMIF) is
an *EXPRESSION*, *_N_O_T_* literal text. Excel does exactly what it claims to
do. That you don't seem capable of understanding what it does doesn't come as a
complete surprise.

Alan Beban · Dec 5, 2003

Harlan said:
It's not rationalizing, and it's not an anomaly. You just don't get

it > about expressions. I'll try one more time with a simpler example.
If > each cell in A1:A5 contained a single character, how would you
count > up the '<' and '>'? Not with ">" or "<".

How would I? Funny you should ask: =ArrayCountIf(A1:A5, "<") and
=ArrayCountIf(A1:A5,">").

Alan Beban

Dave Peterson · Dec 5, 2003

I don't recall seeing anything from MS, but I haven't looked too hard, either.

And your modified list after =SUBSTITUTE(A1,"-","") is dragged down

(looks blank)
(looks blank)
(looks blank)
Z
Z
Z
Z
Z
Z
ZZ
ZZ
ZZ
ZZ
ZZZ

(It's easier for me to see the sorted order.)

Alan Beban · Dec 5, 2003

From Microsoft Excel Help:

Default sort orders
Microsoft Excel uses specific sort orders to arrange data according to
the value, not the format, of the data.

In an ascending sort, Excel uses the following order. (In a descending
sort, this sort order is reversed except for blank cells, which are
always placed last.)

Numbers Numbers are sorted from the smallest negative number to the
largest positive number.

Alphanumeric sort When you sort alphanumeric text, Excel sorts left to
right, character by character. For example, if a cell contains the text
"A100," Excel places the cell after a cell that contains the entry "A1"
and before a cell that contains the entry "A11."

Text and text that includes numbers are sorted in the following order:

0 1 2 3 4 5 6 7 8 9 (space) ! " # $ % & ( ) * , . / : ; ? @ [ \ ] ^ _ `
{ | } ~ + < = > A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Apostrophes (') and hyphens (-) are ignored, with one exception: If two
text strings are the same except for a hyphen, the text with the hyphen
is sorted last.

Logical values In logical values, FALSE is placed before TRUE.

Error values All error values are equal.

Blanks Blanks are always placed last.

Alan Beban

Alan Beban · Dec 5, 2003

Harlan said:
. . .
It's not rationalizing, and it's not an anomaly. You just don't get it about
expressions. I'll try one more time with a simpler example. If each cell in
A1:A5 contained a single character, how would you count up the '<' and '>'? Not
with ">" or "<". Same general reason text that's indistinguishable from error
value tokens isn't treated as plain text.

You're right, I don't get it. With COUNTIF you'd count the '<' with
=COUNTIF(rng,"="&"<"),interpreting "<" as plain text. What does that
have to do with =COUNTIF(rng,"="&"#DIV/0!") not interpreting "#DIV/0!"
as plain text?

Alan Beban

Harlan Grove · Dec 5, 2003

it > about expressions. I'll try one more time with a simpler example.
If > each cell in A1:A5 contained a single character, how would you
count > up the '<' and '>'? Not with ">" or "<".

How would I? Funny you should ask: =ArrayCountIf(A1:A5, "<") and
=ArrayCountIf(A1:A5,">").

No doubt you would. More sensible people would use COUNTIF(A1:A5,"=<") and
COUNTIF(A1:A5,"=>").

Harlan Grove · Dec 5, 2003

...
...

You're right, I don't get it. With COUNTIF you'd count the '<' with
=COUNTIF(rng,"="&"<"),interpreting "<" as plain text. What does that
have to do with =COUNTIF(rng,"="&"#DIV/0!") not interpreting "#DIV/0!"
as plain text?

The point is that the second argument to COUNTIF is a criteria expression.

. Criteria expressions may also contain the wildcards * and ?, and it's possible to force these wildcards to be interpretted as literal text by preceding them with ~.

[For the most part, tilde processing in COUNTIF matches that of SEARCH and
Edit>Find, but there's one REALLY ANNOYING exception, e.g., if A1 contained a
single tilde, COUNTIF(A1,"~"), SEARCH("~",A1) and SEARCH("~~",A1) all return 1,
but COUNTIF(A1,"~~") returns 0. At least you share one characteristic with
Excel's programmers: minimal concern for consistency.]

In addition to that, COUNTIF criteria expressions may be text or not, but they
are converted to text when processed. To me, it makes sense that Excel would
treat the error value tokens as error values even if they're in text arguments.
That provides consistency with how it treats numeric- and boolean-valued
criteria arguments.

The reason preceding error value tokens with equal signs doesn't change its
meaning to literal text matching is, simply, that the semantics of the error
value token are unchanged by preceding them with operators. With the sole
exception of "" and "=" criteria with respect to empty cells and cells
evaluating to "", there's no difference between criteria beginning with an equal
sign and the same criteria without the initial equal sign. That is, "#DIV/0!"
and "=#DIV/0!" mean the same thing, and also mean the same thing as #DIV/0! and
"="&#DIV/0! (which actually evaluate to the same thing), and are consistent with
how Excel interprets 1, "="&1, "1" and "=1" or FALSE, "="&FALSE, "FALSE" and
"=FALSE". If COUNTIF didn't work this way, it'd be impossible to count the cells
in a range that didn't match a particular error value, e.g., to exclude #DIV/0!
results from a count, use

=COUNTIF(Range,"<>#DIV/0!")

but *NOT*

=COUNTIF(Range,"<>"&#DIV/0!)

because the criteria argument in the latter is "<>"&#DIV/0!, which will evaluate
to #DIV/0!, so be interpretted as =#DIV/0!, the opposite of what's intended.

Fortunately, if error value tokens in strings are *followed* by other text in
criteria expressions, they are treated as literal text rather than as error
tokens. This is likely because error values as such would always appear without
any other text. That is, a true #DIV/0! error value will be only #DIV/0!, not
+++#DIV/0!+++ or anything else.

So the lesson you need to learn is that preceding a text string A with a string
containing a comparison operator C won't change the interpretation of A. Only
adding other text to A will do that.

Harlan Grove · Dec 5, 2003

From Microsoft Excel Help:

Default sort orders ...
Apostrophes (') and hyphens (-) are ignored, with one exception: If two
text strings are the same except for a hyphen, the text with the hyphen
is sorted last. ...

<quoting removed for clarity>

-
--
---
Z
-Z
--Z
Z-
-Z-
Z--
ZZ
-ZZ
Z-Z
ZZ-
ZZZ

Interesting definition of 'sorted last'. Why does -Z appear before Z- ? Both are
the same if the dashes were removed. It appears that while Excel may sort them
last, it does so by looking for dashes and single quotes right to left. Weird!

Dave Peterson · Dec 6, 2003

I guess I could have looked more closely <bg>.

thanks for looking and finding it.

Alan said:
From Microsoft Excel Help:

Default sort orders
Microsoft Excel uses specific sort orders to arrange data according to
the value, not the format, of the data.

In an ascending sort, Excel uses the following order. (In a descending
sort, this sort order is reversed except for blank cells, which are
always placed last.)

Numbers Numbers are sorted from the smallest negative number to the
largest positive number.

Alphanumeric sort When you sort alphanumeric text, Excel sorts left to
right, character by character. For example, if a cell contains the text
"A100," Excel places the cell after a cell that contains the entry "A1"
and before a cell that contains the entry "A11."

Text and text that includes numbers are sorted in the following order:

0 1 2 3 4 5 6 7 8 9 (space) ! " # $ % & ( ) * , . / : ; ? @ [ \ ] ^ _ `
{ | } ~ + < = > A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Apostrophes (') and hyphens (-) are ignored, with one exception: If two
text strings are the same except for a hyphen, the text with the hyphen
is sorted last.

Logical values In logical values, FALSE is placed before TRUE.

Error values All error values are equal.

Blanks Blanks are always placed last.

Alan Beban

Dave said:

I don't recall seeing anything from MS, but I haven't looked too hard, either.

And your modified list after =SUBSTITUTE(A1,"-","") is dragged down

(looks blank)
(looks blank)
(looks blank)
Z
Z
Z
Z
Z
Z
ZZ
ZZ
ZZ
ZZ
ZZZ

(It's easier for me to see the sorted order.)

Click to expand...