Why do we need the DataRowExtensions.Field method?

Carl Johansson

Why do we need the DataRowExtensions.Field method?

According to the online help, the answer is because it: "Provides
strongly-typed access to each of the column values in the DataRow."

Nevertheless, I don't understand the difference between this
statement:

int i = dataRow.Field<int>("SomeIntColumn");

and this statement:

int j = ((int)dataRow["SomeIntColumn"]);

As far as I can see, whether compile time or run time, there is no
difference at all?! So again, why do we need the
DataRowExtensions.Field method?

Regards Carl Johansson
 
Felix Palmen

* Carl Johansson said:
int i = dataRow.Field<int>("SomeIntColumn");

and this statement:

int j = ((int)dataRow["SomeIntColumn"]);

Without even talking about the DataRow class, you should always be aware
that a direct cast is expensive. In the example using a simple value
type (int), it has to be boxed in an object by DataRow's default indexer
and unboxed by the caller.
As far as I can see, whether compile time or run time, there is no
difference at all?! So again, why do we need the
DataRowExtensions.Field method?

Also read the remarks in the MSDN. In your example, boxing and casting
is avoided. According to MSDN remarks, there is also support for null
values, so you don't have to use the awkward DBNull stuff.
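
A minimal sketch of the null-handling point, assuming a one-column table. FieldDemo and MakeRow are made-up names for illustration; only DataTable, DataRow, DBNull and Field<T> come from the framework, and on .NET Framework a reference to System.Data.DataSetExtensions is assumed:

```csharp
using System;
using System.Data;

static class FieldDemo
{
    // Hypothetical helper: builds a one-column table and returns a row holding 'value'.
    public static DataRow MakeRow(object value)
    {
        DataTable table = new DataTable();
        table.Columns.Add("SomeIntColumn", typeof(int));
        return table.Rows.Add(value);
    }

    static void Main()
    {
        DataRow nullRow = MakeRow(DBNull.Value);

        // Field<T> with a nullable T maps DBNull to null.
        int? viaField = nullRow.Field<int?>("SomeIntColumn");
        Console.WriteLine(viaField.HasValue);   // False

        // The indexer hands back DBNull.Value, so the check has to be explicit.
        object raw = nullRow["SomeIntColumn"];
        int? viaIndexer = raw == DBNull.Value ? (int?)null : (int)raw;
        Console.WriteLine(viaIndexer.HasValue); // False
    }
}
```

With the indexer, the DBNull comparison has to be repeated at every call site; with Field<int?> it is handled inside the call.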

Regards,
Felix
 
Felix Palmen

* Peter Duniho said:
Except that the Field<T>() method is an extension method:
http://msdn.microsoft.com/en-us/library/bb360891.aspx

As such, it has no special access to the internal data. It has to do
the same cast that directly casting the column's object value does.

Sure, I overlooked that.
In any case, casting is not really _that_ expensive. If you've got some
code that accounts for 99% of your runtime and it's got a tight loop
with some casting you can avoid in the middle, then sure…it might be
worthwhile to try to change that.

There are two reasons casting is expensive. One is the conversion of the
actual data that is sometimes needed (float, int, decimal). The second
is checking for a compatible type. For reference types, this can be
mitigated using an "as-cast", so there's at least no overhead for
generating an exception. If there's a chance to avoid casting
(especially "direct" casts), you should do so. Unfortunately, there is
no cast operator without any runtime checks in .NET (like e.g. the
reinterpret_cast<T> in C++).

Of course, as you pointed out, this doesn't apply /here/.

Regards,
Felix
 
Felix Palmen

* Peter Duniho said:
Other kinds of casts are actually "conversions" in CLR parlance. They
can be expensive because of the reinterpretation of the data that
happens (which can involve arbitrarily complex operations).

Point is, they can happen in the background, depending on what types you
are casting (e.g. decimal to float, the internal representations are
fundamentally different).
But that's not what happens when unboxing. The cast for unboxing is
successful only if the boxed type is exactly the type being requested.
Indeed.
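
A minimal sketch of that exact-type rule for unboxing. UnboxDemo and CanUnboxToLong are made-up names for illustration, and the helper assumes a non-null argument:

```csharp
using System;

static class UnboxDemo
{
    // Returns true if unboxing 'boxed' straight to long succeeds.
    public static bool CanUnboxToLong(object boxed)
    {
        try
        {
            long ignored = (long)boxed;
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    static void Main()
    {
        object boxedInt = 42;
        Console.WriteLine(CanUnboxToLong(boxedInt)); // False: a boxed int is not a boxed long
        Console.WriteLine(CanUnboxToLong(42L));      // True

        // Unbox to the exact type first; the widening to long is then a plain conversion.
        long widened = (int)boxedInt;
        Console.WriteLine(widened);                  // 42
    }
}
```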


As long as the cast is successful, a plain cast shouldn't be any more
expensive than "as". In both cases, the CLR is doing the same checks.

Unfortunately, this isn't true. See here:
http://www.codeproject.com/KB/cs/csharpcasts.aspx

Apparently, there's a CIL opcode for reinterpreting a reference as a
different type and returning null on incompatibility; it is only used with
the "as" syntax.
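
A minimal sketch of the observable difference between the two forms. The C# compiler emits castclass for a direct reference cast and isinst for "as". CastDemo and its method names are made up for illustration:

```csharp
using System;

static class CastDemo
{
    // Direct cast: throws InvalidCastException on a type mismatch.
    public static string DirectCast(object value)
    {
        return (string)value;
    }

    // "as" cast: yields null on a type mismatch and does not throw.
    public static string AsCast(object value)
    {
        return value as string;
    }

    static void Main()
    {
        object number = 123;
        Console.WriteLine(AsCast(number) == null); // True

        try
        {
            DirectCast(number);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("direct cast threw");
        }
    }
}
```
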
I disagree that that's unfortunate. :) One of the main reasons that
managed code is robust and secure is that all casts involve run-time checks.

A matter of taste. I like some high performance features in the language
that require me to know exactly what I'm doing. They're normally rarely
used because they require very careful programming, but for critical
optimizations, it's sometimes worth the effort.

Regards,
Felix
 
Felix Palmen

* Peter Duniho said:
I did my own test (see below).
[...]

I modified the test a little to compare what's fair to compare (both
techniques once WITH error checking/handling and once without). Indeed,
on .NET 4, there's a ~ 10% advantage for direct casts. Of course,
throwing the exception will always be very costly, so it's probably best
now to use direct casts when you are sure the cast will succeed,
"as-casts" when you know you have to check. That also matches the
general idea that exceptions should be reserved to error conditions.
Except that that preference is directly in conflict with the design
goals of C# and managed code generally. If you have that preference,
then C# is not the appropriate language for you to be using. It's not
"unfortunate" that C# doesn't have that particular feature. It's just
that C# isn't the right tool for the job.

So, the unsafe keyword got into the language accidentally?

C# tries to be the right tool for as many jobs as possible -- a cast
without runtime checks would fit perfectly into unsafe mode.

Regards,
Felix
 
Felix Palmen

* Peter Duniho said:
Feel free to ignore the third example if you like. The "as" version is
still 40% to 100% slower than the direct cast.

Wrong. Your test code for example measures pointless comparisons in each
loop. Tested on a XP machine with .NET 4, the direct cast indeed is
faster, but it's just 10% and the relative difference stays the same
when adding code to check for error conditions.
The performance difference between casting and a theoretical feature
with no casting is negligible, while the performance difference between
going through individual array elements as managed types versus (for
example) using Buffer.BlockCopy() or pointers via unsafe code is huge.

You're mixing completely unrelated things. There's no need for "unsafe"
features in order to copy a block. Collection types may have a huge
performance penalty over unmanaged data ... System.Arrays do not
necessarily.
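
A minimal sketch of a block copy from purely managed calling code, with no "unsafe" context in the caller. CopyDemo and CopyInts are made-up names for illustration; Buffer.BlockCopy takes its offsets and count in bytes:

```csharp
using System;

static class CopyDemo
{
    // Copies an int[] with Buffer.BlockCopy; offsets and count are given in bytes.
    public static int[] CopyInts(int[] source)
    {
        int[] target = new int[source.Length];
        Buffer.BlockCopy(source, 0, target, 0, source.Length * sizeof(int));
        return target;
    }

    static void Main()
    {
        int[] copy = CopyInts(new[] { 1, 2, 3 });
        Console.WriteLine(copy[2]); // 3
    }
}
```
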
It tries to support reasonable desires. It doesn't support pointless ones.

Believe whatever you like.
 
Felix Palmen

* Peter Duniho said:
I'm afraid you're going to have to stop hand-waving and be more
specific. In the two test cases that are of the most interest, there
aren't any comparisons at all, except that which is required to manage
the loop. That's hardly pointless.

You're checking for your warmup value in every iteration, that is
pointless if you want to measure the cost for the cast. But well ... I
can't show my test code right now because I hacked it together at work
and I'm lacking remote access.
In any case, the only thing that extra code around the actual code of
interest can result in is a _smaller_ tested difference between the two
test cases, with the actual difference being larger than measured.
Removing cruft around the actual statement of interest isn't going to
make the difference smaller, as you suggest.

That's what I'd assume, too, but my tests showed consistently 10%
difference (windows xp, .net 4). Maybe the difference is caused by the
underlying OS. Maybe, the article I quoted was right for CLR 2.0?
The BlockCopy() method accesses the data as bytes, no matter what the
actual underlying type is. The caller doesn't need to be "unsafe", but
the underlying copy does. See "memcpy" in the BlockCopy() class.

Well, that IS something different. There are even quite a few calls to
native API functions in the framework. As long as I'm using the
framework's interface, I don't really need the "unsafe" mode.

BTW, I think it's a misleading nomenclature. The code is not unsafe,
it's just not guaranteed to be safe. And even this way to express it is
flawed. Just because the code is managed doesn't mean it can't contain
loads of weird and nasty bugs...

I would've called it "rawpointer" mode or something like this.
You can test it yourself if you don't believe me. The overhead of
managed arrays is still significant, though less than the other
collection types. Much more significant than the difference between a
type-checked cast and an unchecked one.

I was referring to the indexing operation, compared to a plain pointer
offset. What else would you use raw pointers for in C#? The memory area
occupied by a managed array is contiguous, so the operations really are
basically the same (definitely in the same complexity class).

Regards,
Felix
 
Felix Palmen

* Felix Palmen said:
That's what I'd assume, too, but my tests showed consistently 10%
difference (windows xp, .net 4). Maybe the difference is caused by the
underlying OS. Maybe, the article I quoted was right for CLR 2.0?

For the sake of completeness, these tests were done with a debugging
build running inside VS 2010. Release builds behave quite differently. I
compiled two release executables, one against .NET 2.0, one against 4.0,
both using VS 2010's compiler -- these are the results on windows xp:

#v+
X:\>casttest-2.exe
Running plain tests ...

trial# |  direct cast |      as cast
-------+--------------+--------------
     1 |       3436ms |       4810ms
     2 |       3436ms |       4810ms
     3 |       3435ms |       4808ms
     4 |       3436ms |       4810ms
     5 |       3435ms |       4812ms

Running safe tests ...

trial# |  direct cast |      as cast
-------+--------------+--------------
     1 |       6020ms |       5495ms
     2 |       6022ms |       5497ms
     3 |       6022ms |       5496ms
     4 |       6028ms |       5496ms
     5 |       6022ms |       5496ms

X:\>casttest-4.exe
Running plain tests ...

trial# |  direct cast |      as cast
-------+--------------+--------------
     1 |       3513ms |       4465ms
     2 |       3513ms |       4466ms
     3 |       3512ms |       4465ms
     4 |       3513ms |       4466ms
     5 |       3513ms |       4464ms

Running safe tests ...

trial# |  direct cast |      as cast
-------+--------------+--------------
     1 |       5727ms |       5156ms
     2 |       5754ms |       5155ms
     3 |       5726ms |       5154ms
     4 |       5739ms |       5155ms
     5 |       5716ms |       5152ms
#v-

It's interesting to see that even without any exception thrown, just
checking for one adds cost so that the direct cast is slower than the as
cast. This was different in the debugging build. Also, the difference is
bigger in release builds.

I didn't do a test with CLR 1 -- maybe that's where the author of the
article I found got his data (or he just made some mistake...)

So to conclude, following the language semantics for casts (using
direct casts when a wrong type is an error condition, as casts
otherwise) also gives the best performance.

Regards,
Felix

PS -- Testcode:

#v+
using System;
using System.Diagnostics;

namespace CastTest
{
    class Test
    {
        private int _count;

        public static void UpdateDirect(object instance)
        {
            ++((Test)instance)._count;
        }

        public static void UpdateAs(object instance)
        {
            ++(instance as Test)._count;
        }

        public static bool SafeUpdateDirect(object instance)
        {
            bool result = true;
            try
            {
                ++((Test)instance)._count;
            }
            catch (InvalidCastException)
            {
                result = false;
            }
            return result;
        }

        public static bool SafeUpdateAs(object instance)
        {
            Test testObject = instance as Test;
            if (testObject == null) return false;
            ++testObject._count;
            return true;
        }
    }

    static class Program
    {
        private static readonly int warmupIterations = 1024;
        private static readonly int iterations = 1000000000;
        private static readonly int trials = 5;

        private static object testDirect = new Test();
        private static object testAs = new Test();
        private static Stopwatch watchDirect = new Stopwatch();
        private static Stopwatch watchAs = new Stopwatch();

        static void Main(string[] args)
        {
            Console.WriteLine("Running plain tests ...");
            Console.WriteLine();
            Console.WriteLine("trial# | direct cast | as cast");
            Console.WriteLine("-------+--------------+--------------");

            for (int trial = 1; trial <= trials; ++trial)
            {
                watchDirect.Reset();
                watchAs.Reset();

                for (int i = 0; i < warmupIterations; ++i)
                {
                    Test.UpdateDirect(testDirect);
                }
                watchDirect.Start();
                for (int i = 0; i < iterations; ++i)
                {
                    Test.UpdateDirect(testDirect);
                }
                watchDirect.Stop();

                for (int i = 0; i < warmupIterations; ++i)
                {
                    Test.UpdateAs(testAs);
                }
                watchAs.Start();
                for (int i = 0; i < iterations; ++i)
                {
                    Test.UpdateAs(testAs);
                }
                watchAs.Stop();

                Console.WriteLine(string.Format(
                    "{0,6} | {1,10}ms | {2,10}ms", trial,
                    watchDirect.ElapsedMilliseconds,
                    watchAs.ElapsedMilliseconds));
            }

            Console.WriteLine();
            Console.WriteLine("Running safe tests ...");
            Console.WriteLine();
            Console.WriteLine("trial# | direct cast | as cast");
            Console.WriteLine("-------+--------------+--------------");

            for (int trial = 1; trial <= trials; ++trial)
            {
                watchDirect.Reset();
                watchAs.Reset();

                for (int i = 0; i < warmupIterations; ++i)
                {
                    Test.SafeUpdateDirect(testDirect);
                }
                watchDirect.Start();
                for (int i = 0; i < iterations; ++i)
                {
                    Test.SafeUpdateDirect(testDirect);
                }
                watchDirect.Stop();

                for (int i = 0; i < warmupIterations; ++i)
                {
                    Test.SafeUpdateAs(testAs);
                }
                watchAs.Start();
                for (int i = 0; i < iterations; ++i)
                {
                    Test.SafeUpdateAs(testAs);
                }
                watchAs.Stop();

                Console.WriteLine(string.Format(
                    "{0,6} | {1,10}ms | {2,10}ms", trial,
                    watchDirect.ElapsedMilliseconds,
                    watchAs.ElapsedMilliseconds));
            }
#if DEBUG
            Console.ReadLine();
#endif
        }
    }
}
#v-
 
Felix Palmen

* Peter Duniho said:
Of course it does. I already showed that to be true. And of course,
you also repeated my result that showed that casts don't really matter
anyway, since they cost so little (in your tests, even the slowest test
was only 6 seconds for a _billion_ casts).

It also showed something your test completely missed. Checking for the
exception thrown by a direct cast adds significant cost, even if it
never occurs.

And no, I disagree that it is completely irrelevant. Right now I'm
working on code that parses a lot of awful data (I won't go into details
here...) in order to create some kind of batch import for it. The batch
import's speed doesn't matter much, the preparation does -- and it
requires LOTS of casts.
Your test is also technically invalid in the sense that your "warmup"
doesn't actually warmup all of the code that's actually doing the test.

The only thing it misses is the loop variable. I could have reused it.
But with a billion iterations, that's not really going to matter anyway.
But thanks for the confirmation.

You'll always feel confirmed. But the only thing I confirmed was the
major part of your test results.

Regards,
Felix
 
Felix Palmen

* Peter Duniho said:
No sane C# programmer would use a try/catch around a direct cast for
casts that are known to be possible to fail. It's such a fundamental
element of .NET programming, that it goes without saying and there's no
need to include that scenario in performance testing.

Misuse of exceptions is one of the problems you see most often in live
code. If you measure performance of the as cast including the check for
null, the "equivalent" direct cast method needs to catch the exception.
If the performance of your code depends significantly on the difference
of using a direct cast (no try/catch…try/catch around a direct cast is
stupid) and using the "as" operator, then your code is seriously broken.

That's not what I said. When I'm absolutely sure the cast is valid, it
runs a little quicker with a direct cast, as far as I know now. It
would run much quicker with an unchecked cast (which has zero runtime
cost, hard to beat), but THAT one is not available on .NET except for
casting to a base type.

Regards,
Felix
 
Felix Palmen

* Peter Duniho said:
You can keep saying that an unchecked cast would be useful in C# until
you're blue in the face. But no matter how many times you say it, the
facts just don't bear that out.

You're just acting defiantly, so no use to discuss this any further. The
arguments for its usefulness were given earlier in this thread, and if
you refuse to recognize them, I just don't care.
 
