Nick said:
First off, I apologize if my post wasn't clear. I think that we may be
discussing the same thing using slightly different words.
ok
It does matter how long a token is in this case, since the operator ||= can
be scanned in many ways:
1) It can be scanned as three valid tokens t('|') t('|') t('=')
2) It can be scanned as two valid tokens t('||') t('=')
3) It can be scanned as two valid tokens t('|') t('|=')
4) It can be scanned as a single token t('||=')
no, it works with states. An NFA is a state machine (a nondeterministic
one). Scanning a text stream for tokens is just a state machine, albeit a
big one. If the input is a||=b (let's make it complicated, as a ||= b is
easy: whitespace delimits the operands from the operator), the states
could be:
.a||=b
-> identifier path start
a.||=b
-> identifier end, a is identifier or keyword
-> operator path start
a|.|=b
-> operator path continue
a||.=b
-> operator path continue
a||=.b
-> operator path end. ||= is operator
-> identifier path start
it doesn't have to look ahead more than 1 character. The state machine
automatically follows the path of the right operator recognition, based
on the current state and the current input character. If it has seen a
'|' and the current input is again '|', it doesn't recognize '|' twice,
it recognizes a single '||'. As there are two tokens that start with
'||', it doesn't have a unique match yet and thus can't create a token
yet. If the '=' had been omitted, the token would have been '||', but it
wasn't, so the token is '||='.
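In concrete terms (a minimal Java sketch rather than C#, with token names
and classes that are purely illustrative, not the mono compiler's or any
real lexer's code), that state machine for the '|' family boils down to
something like this; it collapses the four possibilities listed above
into the single longest match, deciding at every step from the current
state and the current input character only:

import java.util.ArrayList;
import java.util.List;

// Hypothetical token kinds, for illustration only.
enum Kind { IDENT, PIPE, PIPE_ASSIGN, OR, OR_ASSIGN }

final class Token {
    final Kind kind; final String text;
    Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
}

final class PipeScanner {
    // Scans identifiers and the '|' operator family using longest match.
    static List<Token> scan(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {        // identifier path start
                int start = i;
                while (i < input.length()
                        && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENT, input.substring(start, i)));
            } else if (c == '|') {              // operator path start
                i++;                            // consumed '|'
                if (i < input.length() && input.charAt(i) == '|') {
                    i++;                        // consumed '||'
                    if (i < input.length() && input.charAt(i) == '=') {
                        i++;                    // consumed '||='
                        tokens.add(new Token(Kind.OR_ASSIGN, "||="));
                    } else {
                        tokens.add(new Token(Kind.OR, "||"));
                    }
                } else if (i < input.length() && input.charAt(i) == '=') {
                    i++;                        // consumed '|='
                    tokens.add(new Token(Kind.PIPE_ASSIGN, "|="));
                } else {
                    tokens.add(new Token(Kind.PIPE, "|"));
                }
            } else {
                i++;                            // skip whitespace etc. in this sketch
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [IDENT(a), OR_ASSIGN(||=), IDENT(b)]: possibility 4, the
        // longest match, and nothing else.
        System.out.println(scan("a||=b"));
    }
}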
The logic needed to get to the third scan is more difficult to produce.
This is not a simple (deterministic) finite state automaton. As you
pointed out, it is non-deterministic.
yes, these state machines are not easy to write by hand; that's why
there are tools like Lex.
I have created a lexical analyzer for a production system (not a student
exercise) using symbol tables. I personally found it challenging to debug
any changes to the symbol table for two-symbol lookahead, and three-symbol
lookahead introduced so much concern that I withdrew three-symbol tokens
from the language. The trickiest part is distinguishing between items 2
and 3 above, because most lexical analyzers that aren't hand-coded would
have a terrible time distinguishing between them on a routine basis.
That's the problem you get when you merge a lexical analyzer with a
parser. A parser parses tokens, not text streams. However, if you
integrate lexical-analyzer logic with the parser (this is often done;
don't worry, I'm not criticizing you, just look at the C# mono compiler
for example), you'll get problems, because you want to make decisions
based on text input, which is, IMHO, not correct: in the parser you
should only make decisions based on tokens. That way you can write LL(n)
or LR(n) parsers.
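To make that separation concrete, here is a rough sketch (it reuses the
hypothetical Token and Kind types from the sketch above, and the one-rule
grammar is invented) of a parser whose single-token lookahead works on
token kinds, never on raw characters:

import java.util.List;

// Builds on the Token/Kind types from the scanner sketch above.
final class AssignmentParser {
    private final List<Token> tokens;
    private int pos = 0;

    AssignmentParser(List<Token> tokens) { this.tokens = tokens; }

    private Kind peek() {
        return pos < tokens.size() ? tokens.get(pos).kind : null;
    }

    private Token consume(Kind expected) {
        if (peek() != expected) {
            throw new IllegalStateException("expected " + expected + " at token " + pos);
        }
        return tokens.get(pos++);
    }

    // assignment := IDENT (OR_ASSIGN | PIPE_ASSIGN) IDENT
    // The lookahead here is one *token*, never one character.
    void parseAssignment() {
        consume(Kind.IDENT);
        if (peek() == Kind.OR_ASSIGN) consume(Kind.OR_ASSIGN);
        else consume(Kind.PIPE_ASSIGN);
        consume(Kind.IDENT);
    }
}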
It is as simple as this: the more complex the logic needed, the greater
the likelihood of bugs and the lower the support from tools (like tools
that generate two-symbol lookahead tables). Each bug costs money to find
and fix. Therefore, increasing complexity for the sake of elegance at the
expense of the project is simply poor judgement on the part of the
project manager.
That's great, but it has nothing to do with this. I can use a variable
called bools, which matches the keyword bool right up to its fifth
character, yet the C# compiler is perfectly able to handle it. Why can it
do that, but not handle an operator of 3 characters? (Btw, what about
custom operators defined by the developer?)
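The bools/bool case is handled by the same longest-match idea: the
scanner runs the identifier path to the end of the lexeme and only then
asks a keyword table whether that exact spelling is reserved. A small
Java sketch, with an invented keyword table:

import java.util.Set;

final class KeywordLookup {
    // Tiny illustrative keyword table; a real compiler's is much larger.
    private static final Set<String> KEYWORDS =
            Set.of("bool", "if", "else", "return");

    // Scan an identifier-shaped lexeme starting at 'start', then classify it.
    static String classify(String input, int start) {
        int i = start;
        while (i < input.length()
                && Character.isLetterOrDigit(input.charAt(i))) i++;
        String lexeme = input.substring(start, i);
        return (KEYWORDS.contains(lexeme) ? "KEYWORD(" : "IDENT(") + lexeme + ")";
    }

    public static void main(String[] args) {
        System.out.println(classify("bool", 0));   // KEYWORD(bool)
        System.out.println(classify("bools", 0));  // IDENT(bools): the 's' keeps
                                                   // the identifier path going
    }
}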
I wasn't discussing bit operators. I'm concerned that this discussion
would jump off the rails quickly if I were to judge the language itself,
which I am not doing. I am only discussing the difficulty of debugging
the lexical scanner.
... which I appreciate, but IMHO that was not the point, i.e.: the
lexical analyzer can perfectly well tokenize a 3-character operator like
'||='. Remember, the lexical analyzer doesn't know that '||=' is an
operator; it could also be an identifier or whitespace, it doesn't care
about that. It simply checks its current state and the current input
character and moves on to the next state, either recognizing a token or
just pushing a new state. A lookahead for the lexical analyzer (parsers
also use lookaheads for tokens, but that's a totally different thing!) is
ONLY needed if the lexical analyzer has to decide what the next state is
and it can't do that based on the current state + the current input
character. This only happens if you have text input which looks the same
in ASCII but has different meanings based on the following character.
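For contrast, here is the kind of case where a lexer genuinely does need
one character of lookahead. It is a hypothetical language (think Pascal-
or Rust-style '..' ranges, not C#) where '1.5' is a float but '1..10' is
an int followed by a range operator; after reading the '1', the current
character '.' alone does not determine the next state:

final class RangeOrFloat {
    // Illustrative only. After the digits, seeing '.' is ambiguous: the
    // scanner must peek one more character to choose between the FLOAT
    // path (a digit follows) and stopping at INT (a second '.' follows).
    static String scanNumber(String input) {
        int i = 0;
        while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
        if (i < input.length() && input.charAt(i) == '.') {
            boolean digitFollows = i + 1 < input.length()
                    && Character.isDigit(input.charAt(i + 1));
            if (digitFollows) {                 // continue on the FLOAT path
                i++;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                return "FLOAT(" + input.substring(0, i) + ")";
            }
            // otherwise stop here; '..' is left for the next token
        }
        return "INT(" + input.substring(0, i) + ")";
    }

    public static void main(String[] args) {
        System.out.println(scanNumber("1.5"));   // FLOAT(1.5)
        System.out.println(scanNumber("1..10")); // INT(1); the range token comes next
    }
}

The '|', '|=', '||', '||=' family never runs into this, because every
prefix of the longer operators ('|', '||') is itself a valid token.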
I am also not making a case for how useful this particular three-token
construct may seem to be. I would state, however, that there are many
_useful_ things that currently require more than a single operator. We
could debate endlessly on the "comparative usefulness" of a myriad of
operators that are designed to make the language more useful, in our own
personal opinion. Remember that your opinion about "what is useful" may not
be universally shared, as I'm sure that mine wouldn't either.
Please go back to what is likely the reason for the set of operators
like +=, -=, *=, etc.: easier usage of the language. Why is it that there
isn't an operator &&= but there is an operator &=? And no, not because of
lexical-analyzer issues: if I can write a lex + parser that can do it,
why can't MS do it?

As Jon said, it's not very common, so I then conclude: adding the
operator is not worth the effort based on that. IF it's not very common,
I can live with it. Problem is, why did I run into it in various cases
where I have to use a&=b, although it's theoretically 'dirty' IMHO, even
though the documentation says: "The & operator performs a bitwise AND
operation on integral operands and logical AND on bool operands."
(I.o.w.: for logical operands we don't do bitwise operations, we do
logical ones. Which is IMHO dirty, because as a reader of the code you
have to determine what the operands are to understand the code: a|=b.
Will that do a bitwise a OR b, or will it do a logical a OR b?)
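For what it's worth, Java behaves exactly the way that documentation
quote describes for C#: & and &= are bitwise on integral operands and
(non-short-circuit) logical on boolean operands, and there is no &&=
either. A small illustration of the readability point, in Java rather
than C#:

final class AndAssignDemo {
    public static void main(String[] args) {
        int flags = 0b1100;
        flags &= 0b1010;          // bitwise AND on integral operands
        System.out.println(Integer.toBinaryString(flags));  // prints 1000

        boolean ok = true;
        ok &= (args.length > 0);  // logical (non-short-circuit) AND on booleans
        System.out.println(ok);   // prints false when run with no arguments

        // ok &&= (args.length > 0);  // no such operator in Java (or C#)
    }
}

The two &= lines compile to very different operations, and the reader has
to know the operand types to tell which one they are looking at.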
An interesting question, but I don't see how this is salient to my point. I
hope you don't mind if I don't attempt to respond to it.
oh no problem, it was more of a response to the other reactions in the
thread as well (as I did with a couple of other snippets above). This
example was to illustrate how some constructs are said to be 'uncommon'
while they are just 'unknown' to these people.
Frans
--