Performance of switch{} block.

  • Thread starter Thread starter Smithers
  • Start date Start date
S

Smithers

Just wondering if the sequence of the case(s) in the switch block might
impact performance. Suppose I switch on an int, and I have 4 expected cases.
Lets say I expect case 3 to happen 95% of the time. Would the switch block
be expected to execute faster (meaning "start executing the code for case 3
sooner") when case 3 appears first, as opposed to appearing as the 3rd
case:, as in the following code?

switch (someInt)
{
case 3:
// Do stuff here
break;
case 1:
// Do stuff here
break;
case 2:
// Do stuff here
break;
case 4:
// Do stuff here
break;
default:
// Do stuff here
break;
}

Just curious - and thought someone might already know offhand.

Thanks!
 
Smithers said:
Just wondering if the sequence of the case(s) in the switch block might
impact performance. Suppose I switch on an int, and I have 4 expected cases.
Lets say I expect case 3 to happen 95% of the time. Would the switch block
be expected to execute faster (meaning "start executing the code for case 3
sooner") when case 3 appears first, as opposed to appearing as the 3rd
case:, as in the following code?

switch (someInt)
{
case 3:
// Do stuff here
break;
case 1:
// Do stuff here
break;
case 2:
// Do stuff here
break;
case 4:
// Do stuff here
break;
default:
// Do stuff here
break;
}

In principle, the compiler can build an array of code offsets (a
"vector table") and simply use the switch variable to index into. I am
under the impression that the C# compiler will do this, when the case
tags are all contiguous.

In practice, every switch statement I've ever unassembled is compiled
to an if-then-else-if tree. I do not know if this is a function of
debug mode, or whether the compiler will only use a vector table when
there are the "right" number of case statements.

In other words, yes, your case ordering probably does matter. Of
course, it would be a good idea to benchmark this (in Release mode,
without the debugger involved) to see if it makes any significant
difference. I'd be surprised if this was a big deal, outside of a loop
that's repeated millions of times a second.
 
Mark R. Dawson said:
Hi Smithers,
the case statements are evaluated sequentially in the order they are
defined [...]

Is this really true?

I don't know how the C# compiler/run-time handles things, but a good C
compiler will often convert a switch() statement into a jump-table based
solution. If the cases are contiguous integers, a simple table can be
created with jump offsets to the correct place in code. With that solution,
handling the condition of the switch occurs in linear time, with each case
take equal amounts of time (not counting secondary issues like cache or
pipeline flushes, of course).

If C# does something like that, then the order of each case is irrelevant.
Each possible value takes the exact same time to process.

Of course, if C# naively just turns a switch() statement into a sequence of
if()/else if()... statements, then yes...the first one is the quickest. But
I haven't seen any documentation that suggests this is always what C# does
with a switch().

Pete
 
Look at the generated IL with ILDASM, and you'll get your answer right away.
Hint: code several switch statements with varying numbers of cases; a good
compiler, as you say, might use different techniques depending on how many
cases there are.

Tom Dacon
Dacon Software Consulting

Peter Duniho said:
Mark R. Dawson said:
Hi Smithers,
the case statements are evaluated sequentially in the order they are
defined [...]

Is this really true?

I don't know how the C# compiler/run-time handles things, but a good C
compiler will often convert a switch() statement into a jump-table based
solution. If the cases are contiguous integers, a simple table can be
created with jump offsets to the correct place in code. With that
solution, handling the condition of the switch occurs in linear time, with
each case take equal amounts of time (not counting secondary issues like
cache or pipeline flushes, of course).

If C# does something like that, then the order of each case is irrelevant.
Each possible value takes the exact same time to process.

Of course, if C# naively just turns a switch() statement into a sequence
of if()/else if()... statements, then yes...the first one is the quickest.
But I haven't seen any documentation that suggests this is always what C#
does with a switch().

Pete
 
Hi

1. If you are very much sure about 95% of time case 3 (int =3) excutes,
it is quite reasonable to put it as a first case. (atleast for the sake
of readability and maintenace).

2. As I know switch statement is nothing but a if-else if - else if
...... else ladder. In which case placeing case 3 (int =3) as first case
makes sense.

3. Even if both of above statements ae wrong..... there is no harm to
your application ... I suppose.


Thanks
-Srinivas.
 
Smithers said:
Just wondering if the sequence of the case(s) in the switch block might
impact performance.

Question: will the JIT ever optimize the switch statement at runtime,
based on the behaviour it has been observing so far during execution
of the program?
 
Lucian Wischik said:
Question: will the JIT ever optimize the switch statement at runtime,
based on the behaviour it has been observing so far during execution
of the program?

The .NET JIT is a "one time" JIT - it doesn't recompile code after it's
done it once. (This is in contrast to Hotspot, Sun's Java JIT, which
recompiles based on changes to optimisation strategies etc.)
 
In practice, every switch statement I've ever unassembled is compiled
to an if-then-else-if tree. I do not know if this is a function of
debug mode, or whether the compiler will only use a vector table when
there are the "right" number of case statements.

I suspect it will happen if you've got "sparse" case statements. In
this case, however, it looks to me like it's using a table (as one
would hope, to be honest).

Here's a C# program:

using System;

class Test
{
static void Main()
{
int someInt = 3;

switch (someInt)
{
case 3:
Console.WriteLine ("3");
// Do stuff here
break;
case 1:
Console.WriteLine ("1");
// Do stuff here
break;
case 2:
Console.WriteLine ("2");
// Do stuff here
break;
case 4:
Console.WriteLine ("4");
// Do stuff here
break;
default:
// Do stuff here
break;
}
}
}

and here's the disassembled code in cordbg with JitOptimizations turned
on:

[0000] push edi
[0001] push esi
[0002] cmp dword ptr ds:[001C2DECh],0
[0009] je 00000007
[000b] call 792DD123
[0010] xor esi,esi
[0012] xor edi,edi
[0014] mov esi,3
[0019] mov edi,esi
[001b] lea eax,[edi-1]
[001e] cmp eax,4
[0021] jae 00000009
[0023] jmp dword ptr [eax*4+00DB9C18h]
[002a] nop
[002b] jmp 0000003E
[002d] mov ecx,dword ptr ds:[02412098h]
[0033] call dword ptr ds:[00DF0ECCh]
[0039] nop
[003a] jmp 0000002F
[003c] mov ecx,dword ptr ds:[0241209Ch]
[0042] call dword ptr ds:[00DF0ECCh]
[0048] nop
[0049] jmp 00000020
[004b] mov ecx,dword ptr ds:[024120A0h]
[0051] call dword ptr ds:[00DF0ECCh]
[0057] nop
[0058] jmp 00000011
[005a] mov ecx,dword ptr ds:[024120A4h]
[0060] call dword ptr ds:[00DF0ECCh]
[0066] nop
[0067] jmp 00000002
[0069] pop esi
[006a] pop edi
[006b] ret

I'm not an expert in x86, but this looks to me like a straight jump
rather than a series of comparisons.
 
Smithers said:
Just wondering if the sequence of the case(s) in the switch block might impact
performance. Suppose I switch on an int, and I have 4 expected cases. Lets say I expect
case 3 to happen 95% of the time. Would the switch block be expected to execute faster
(meaning "start executing the code for case 3 sooner") when case 3 appears first, as
opposed to appearing as the 3rd case:, as in the following code?

switch (someInt)
{
case 3:
// Do stuff here
break;
case 1:
// Do stuff here
break;
case 2:
// Do stuff here
break;
case 4:
// Do stuff here
break;
default:
// Do stuff here
break;
}

Just curious - and thought someone might already know offhand.

Thanks!



No, it doesn't matter where you put case, the JIT builds a jump table and uses the switch
"value" as an index into that table and uses the contents of (index*4 + table_start_address)
as the address to jump to.

The jump table For your switch block on X86, would look something like this:
003601b8 00360105 < - index 0
00360133 <- index 1
003600d2 <- index 2
00360162 <- index 3

and the jump instruction like this:
jmp dword ptr [eax*4+3601B8h]

the index in eax is the calculated value of the switch expression-1, so for the case 1 the
jmp will be made to 00360105 , for case 2 a jump is made to 00360133 etc...


Willy.


..
 
Jon Skeet said:
In practice, every switch statement I've ever unassembled is compiled
to an if-then-else-if tree. I do not know if this is a function of
debug mode, or whether the compiler will only use a vector table when
there are the "right" number of case statements.

I suspect it will happen if you've got "sparse" case statements. In
this case, however, it looks to me like it's using a table (as one
would hope, to be honest).

Here's a C# program:

using System;

class Test
{
static void Main()
{
int someInt = 3;

switch (someInt)
{
case 3:
Console.WriteLine ("3");
// Do stuff here
break;
case 1:
Console.WriteLine ("1");
// Do stuff here
break;
case 2:
Console.WriteLine ("2");
// Do stuff here
break;
case 4:
Console.WriteLine ("4");
// Do stuff here
break;
default:
// Do stuff here
break;
}
}
}

and here's the disassembled code in cordbg with JitOptimizations turned
on:

[0000] push edi
[0001] push esi
[0002] cmp dword ptr ds:[001C2DECh],0
[0009] je 00000007
[000b] call 792DD123
[0010] xor esi,esi
[0012] xor edi,edi
[0014] mov esi,3
[0019] mov edi,esi
[001b] lea eax,[edi-1]
[001e] cmp eax,4
[0021] jae 00000009
[0023] jmp dword ptr [eax*4+00DB9C18h]
[002a] nop
[002b] jmp 0000003E
[002d] mov ecx,dword ptr ds:[02412098h]
[0033] call dword ptr ds:[00DF0ECCh]
[0039] nop
[003a] jmp 0000002F
[003c] mov ecx,dword ptr ds:[0241209Ch]
[0042] call dword ptr ds:[00DF0ECCh]
[0048] nop
[0049] jmp 00000020
[004b] mov ecx,dword ptr ds:[024120A0h]
[0051] call dword ptr ds:[00DF0ECCh]
[0057] nop
[0058] jmp 00000011
[005a] mov ecx,dword ptr ds:[024120A4h]
[0060] call dword ptr ds:[00DF0ECCh]
[0066] nop
[0067] jmp 00000002
[0069] pop esi
[006a] pop edi
[006b] ret

I'm not an expert in x86, but this looks to me like a straight jump
rather than a series of comparisons.



Yep, the important part is this:

[001b] lea eax,[edi-1]
[001e] cmp eax,4
[0021] jae 00000009
[0023] jmp dword ptr [eax*4+00DB9C18h]
[002a] nop

here the *index*, that is the number of 'cases'is caluclated and compared against the
maximum valid value, 4 in this sample.
If the value is larger a jump s made to current IP + 000000009 (to > [002a] nop) else, a
jump is made to the address in the jump table at address eax*4., where eax = index and
00DB9C18h is the start of the table. Note that on 64 bit the index is obviously multiplied
by 8.

Note the > [002a] nop, this means that the code is not JIT optimized!!

Willy.
 
Note the > [002a] nop, this means that the code is not JIT optimized!!

Does it, for sure? I definitely had JIT optimisations on... Might it
not be optimising in terms of only jumping to aligned addresses? (As I
say, I'm no expert on this...)
 
Jon Skeet said:
Note the > [002a] nop, this means that the code is not JIT optimized!!

Does it, for sure? I definitely had JIT optimisations on... Might it
not be optimising in terms of only jumping to aligned addresses? (As I
say, I'm no expert on this...)

No need to align instructions, not only the nop's are an indication you aren't running
optimized code, I don't see the call to the Console.WriteLine method either, you I guess
this code block is taken too early and the call you see are to the JIT thunks for the not
yet compiled Console.Write, else the calls would go to the mscorlib_ni native code library
loaded at the top of the VAS. (addresses 79xxxxxx). I never used cordbg or it's latest
incarnation mdbg for this, but I'm afraid it's not the right tool to look at optimized
JIT'd code, the managed debugger is know to be attached by the CLR and I know he is taking
some weird decisions because of this, if you really need the assembly code, just attach a
native debugger like windbg or the VS debugger in "native" only mode, you will get a
different picture really.

Willy.
 
Back
Top