For vs. For Each


Alvin Bruney [MVP]

You are right.

I am guilty of skimming over the code and didn't catch the ArrayList inside
the loop construct. This code is a gem, by the way; it is the best approach I
have seen for this problem, and I have been keeping an eye out for a
feasible solution for a while (see a thread in the csharp newsgroups about
8-10 months ago).

Tomorrow I'm going to re-read the thread you posted a while back on that
topic. I believe you may be on to something.

Thanks for your vigilance.
 

Alvin Bruney [MVP]

And the case where you want to alter the iteration based on
actions during the iteration is

a) ambiguous and generally domain-specific, so not suitable for the CLR;
and
b) probably a really bad idea.

There are issues with your approach. It really isn't domain-specific, or
rather the domain is large enough to be common to a lot of applications. From
what I understand of your solution, the problem is that this approach may
work well for small collections but will not scale, because you force a copy
of the collection per request.

Consider a large dataset where rows are to be removed. For n items, you
introduce n copies. This effectively doubles your memory allocation. In a
high-concurrency environment, or in applications which run on finite memory
resources, this approach is not feasible because the cost of iteration is
prohibitive. Also, the cost is higher if the copy involves live objects, or
objects in the collection which contain children, as in the case of
hierarchical data. So it definitely is not trivial.

This is where a non-readonly collection would be more scalable and quicker.

I do agree that it is a good approach for small collections.
 

Alvin Bruney [MVP]

There's a deeper issue too, which I won't really get into. But IMHO
there's a tendency for .Net developers to overuse dumb collections

We need to hash this one out. Provide all the details (get into it).
 

David

And the case where you want to alter the iteration based on

There are issues with your approach. It really isn't domain-specific, or
rather the domain is large enough to be common to a lot of applications.

The basic question upfront here is this: should changes to the
collection made during an enumeration affect the enumeration? If so,
how exactly should the enumeration be affected?

foreach(DictionaryEntry entry in MyDictionary)
{
    MyDictionary.Add(CalculateNewKey(), "New Entry");
    MyDictionary.Remove(CalculateKeyToRemove());
}

Is that an infinite loop? Or does it depend on the values returned by
CalculateNewKey()? Or does it run once for each entry that exists in
the dictionary at the start of the loop? Or does it run once for each
entry that exists at the start of the loop, unless that entry has been
removed from the dictionary? There are a large number of possible
definitions, and each is significantly flawed in its own way.
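For reference, the .NET 1.x BCL sidesteps this ambiguity entirely: its dictionary enumerators are fail-fast, so mutating during enumeration simply throws rather than committing to any of those definitions. A small sketch (using Hashtable as a stand-in for the hypothetical MyDictionary):

```csharp
using System;
using System.Collections;

class FailFastDemo
{
    static void Main()
    {
        Hashtable table = new Hashtable();
        table["a"] = 1;
        table["b"] = 2;

        try
        {
            // Mutating the collection invalidates the live enumerator;
            // the next MoveNext throws rather than guessing at semantics.
            foreach (DictionaryEntry entry in table)
            {
                table.Remove(entry.Key);
            }
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("enumerator invalidated");
        }
    }
}
```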
From what I
understand of your solution, the problem is that this approach may work well
for small collections but will not scale, because you force a copy of the
collection per request.

I do copy the collection, but I don't copy the objects contained in the
collection, only the references to those objects. There's a huge
difference between those two things.
Consider a large dataset where rows are to be removed. For n items, you
introduce n copies. This effectively doubles your memory allocation.

No. Because the rows aren't being copied, only references to the rows.
Memory allocation is significantly less than double.
In a
high-concurrency environment, or in applications which run on finite memory
resources, this approach is not feasible because the cost of iteration is
prohibitive. Also, the cost is higher if the copy involves live objects, or
objects in the collection which contain children, as in the case of
hierarchical data. So it definitely is not trivial.

No, again because only references are being copied. There's a second
iteration, but it's impossible to know the performance effect of that
without knowing the internals of the collection (e.g., what is the cost
of a removal?).
This is where a non-readonly collection would be more scalable and quicker.

As I said to Nick, an enumerator over a non-readonly collection would
have to do the *exact same thing* at a minimum. The fact that some
enumerator type in the CLR does it instead of you doing it explicitly
doesn't affect scalability or performance at all.

In fact, though, performance in the CLR would be much worse than this;
how much worse depends entirely on how you chose to define the behavior
you're requesting. Copying the references as I did is trivial, but it
only works if I limit myself to very specific types of edits to the
underlying collection.
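To make the distinction concrete, here is a minimal sketch of the reference-copy approach under discussion. The ArrayList of boxed ints is a stand-in (the actual collection isn't shown in the thread); only an array of references is copied, never the objects themselves:

```csharp
using System;
using System.Collections;

class SnapshotRemoval
{
    static void Main()
    {
        ArrayList rows = new ArrayList();
        for (int i = 0; i < 10; i++)
        {
            rows.Add(i); // each int is boxed once at insertion
        }

        // Copy only the references (one array of 10 object pointers),
        // not copies of the objects themselves.
        object[] snapshot = rows.ToArray();

        foreach (object row in snapshot)
        {
            if ((int)row % 2 == 0)
            {
                rows.Remove(row); // safe: we enumerate the snapshot
            }
        }

        Console.WriteLine(rows.Count); // the 5 odd values remain
    }
}
```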
 

David

We need to hash this one out. Provide all the details (get into it).

If you really want to get into it, see my reply to Nick on the subject
and/or my recent post in the "Design/Implementation Considerations"
thread.

In a nutshell, it generally violates encapsulation to enumerate a
non-member collection. If you want domain-specific information from a
collection, or if you want to perform a domain-specific action on
members of a collection, then the collection class should know how to do
those things.

As a simple example, consider...

foreach(Stock stock in portfolio) {
    if(stock.ticker == "MSFT") {
        stock.Sell();
    }
}

... as opposed to

portfolio.FindByTicker("MSFT").Sell(); // ignore the possible null ref for now

What if my stock lookup is taking too much time? In the second example,
I could add a Hash lookup for tickers and correct the issue, but the
first example will always be O(n). What if I want to move the portfolio
to a webservice? What if my portfolio becomes too large to keep in
available memory (hey, I can dream), and I want to keep it in the
database? Or whatever; there are a lot of possibilities here.

The point is that portfolio manipulation should be encapsulated in a
class, and shouldn't be spread out across the app. Sure, the first
implementation of portfolio is probably a simple member ArrayList, but
there's no reason that other classes should need to know that.
Enumerations seem benign, but in fact they make a lot of assumptions
about the inner implementation of the collection.
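A hedged sketch of what that encapsulation might look like (the class and member names below are hypothetical, not from the thread): callers ask the Portfolio to act, and never see the inner ArrayList.

```csharp
using System;
using System.Collections;

class Stock
{
    public readonly string Ticker;
    public bool Sold;
    public Stock(string ticker) { Ticker = ticker; }
    public void Sell() { Sold = true; }
}

class Portfolio
{
    private ArrayList stocks = new ArrayList(); // implementation detail

    public void Add(Stock s) { stocks.Add(s); }

    // The domain-specific action lives here; if the backing store later
    // becomes a hashtable, a web service, or a database, no caller changes.
    public void SellByTicker(string ticker)
    {
        foreach (Stock s in stocks)
        {
            if (s.Ticker == ticker) s.Sell();
        }
    }
}

class Demo
{
    static void Main()
    {
        Portfolio p = new Portfolio();
        Stock msft = new Stock("MSFT");
        p.Add(msft);
        p.Add(new Stock("IBM"));
        p.SellByTicker("MSFT");
        Console.WriteLine(msft.Sold); // True
    }
}
```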

Well, I guess that wasn't really a nutshell. I've been putting a lot of
thought into these types of issues lately because I'm putting together a
curriculum for an OO design class.
 

Scott Allen

foreach(Stock stock in portfolio) {
    if(stock.ticker == "MSFT") {
        stock.Sell();
    }
}
</snip>

David et al:

Just curious what you would think of:

Stock stock =
    Array.BinarySearch(portfolio, "MSFT", stock.TickerComparer)

Where the comparisons are taken care of by IComparer implementations
nested into the domain class.

I've taken this approach a few times. I feel the collection does what
it does best (search), while the domain specific logic (the different
ways to compare two stocks) stays encapsulated into the domain
objects.
 

Jay B. Harlow [MVP - Outlook]

Scott,
I think you are making too many assumptions in your code that can cause
obscure problems later. :)
Supports multiple MSFT stocks in the portfolio that could be sold, and doesn't
require the portfolio to be sorted in ticker order.
Array.BinarySearch(portfolio, "MSFT", stock.TickerComparer)
Assumes (restricts) a single MSFT stock in the portfolio, plus requires that
the portfolio be sorted by ticker. It also requires portfolio to be an array
(rather than some abstraction of an actual Portfolio with its own behaviors
& attributes).

I would consider making "sell stock" a behavior of the portfolio object,
then how it was implemented would be immaterial. I would implement it as a
for each per the 80/20 rule.

portfolio.SellStock("MSFT");

Hope this helps
Jay
 

David

Just curious what you would think of:

Stock stock =
Array.BinarySearch(portfolio, "MSFT", stock.TickerComparer)

Where the comparisons are taken care of by IComparer implementations
nested into the domain class.

I've taken this approach a few times. I feel the collection does what
it does best (search), while the domain specific logic (the different
ways to compare two stocks) stays encapsulated into the domain
objects.

Jay makes some salient points in another post, but I'd just add that
IMHO the code above could be fine as long as a) it's encapsulated in a
single place, and b) you understand the trade-offs you've made.

A sorted array might be the best implementation of a portfolio for now,
but that choice forces you into a lot of assumptions about behavior,
assumptions which are likely to change in the future. The question I'd
have is how many places in your code have knowledge of a portfolio's
internal implementation?
 

Scott Allen

Hi Jay:

I agree this does make many assumptions as a specific example.

Speaking more generally, however, I like the fact that item retrieval
from a collection is abstracted away (the details of using a
BinarySearch or a hashing algorithm will be hidden if you add another
layer of indirection - which I didn't show).

I'm sure you can imagine scenarios where foreach doesn't scale well
enough to use for all lookup scenarios (the 20%). This is not the
difference between for and foreach but the difference between O(n) and
O(log n).

I also have the flexibility of using IComparer to search based on
ticker symbol, or price, or any other public member of the object. As
long as the selection of the IComparer instance is kept encapsulated


Which is all too complex for the 80% case I agree... :)

One note: BinarySearch wouldn't restrict the array to holding a single
MSFT ticker. BinarySearch doesn't guarantee that it will return the
object with the lowest index in the array, but if there are multiples
it does give back an index that is 'in the ballpark', so to speak.
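That 'ballpark' behavior can be demonstrated with a toy sorted array (the tickers here are made up for illustration): walking outward from the returned index recovers the full run of duplicates.

```csharp
using System;

class NeighborScan
{
    static void Main()
    {
        // Already sorted; contains a run of duplicate tickers.
        string[] tickers = { "AAPL", "IBM", "MSFT", "MSFT", "MSFT", "ORCL" };

        // BinarySearch returns the index of *some* matching element,
        // not necessarily the first one in the run.
        int hit = Array.BinarySearch(tickers, "MSFT");

        // Walk outward from the hit to find the whole run of matches.
        int first = hit;
        int last = hit;
        while (first > 0 && tickers[first - 1] == "MSFT") first--;
        while (last < tickers.Length - 1 && tickers[last + 1] == "MSFT") last++;

        Console.WriteLine(last - first + 1); // number of MSFT entries
    }
}
```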
 

Jay B. Harlow [MVP - Outlook]

Scott,
I'm sure you can imagine scenarios where foreach doesn't scale well
enough to use for all look up scenarios (the 20%). This is not the
difference between for and foreach but the difference between O(n) and
O(log n).
As I stated earlier, I code for the 80% and only use a special case when the
20% is proven...

If the implementation within Portfolio warranted using BinarySearch instead
of For Each then I would consider using it.

My point is: I consider BinarySearch versus For Each an implementation
detail, and I'm sure you know the importance of hiding implementation
details.
I also have the flexibility of using IComparer to search based on
ticker symbol, or price, or any other public member of the object. As
long as the selection of the IComparer instance is kept encapsulated
As long as you realize that BinarySearch requires your collection to be sorted
by the "field" you are searching on. I hope you also realize this sorting
may wreak havoc with your scalability.

Which is where I find (pun intended) Whidbey's Array.Find, Array.FindAll,
and Array.FindIndex might be a better choice. Again, implementation details.

Of course there is nothing stopping Portfolio from having tickerIndex,
priceIndex, and symbolIndex sorted arrays that you use BinarySearch on
individually, based on which "field" you are searching; again, an
implementation detail...
One note: BinarySearch wouldn't restrict the array to holding a single
MSFT ticker. BinarySearch doesn't guarantee that it will return the
It's not a literal restriction as much as a conceptual one, in that
BinarySearch is only able to return a single Stock. You would not know which
stock was returned. Your sample effectively says to sell 1 of N MSFT stocks,
whereas David's code says to sell N of N MSFT stocks. I hope you agree 1 of
N is very different than N of N!

Also, as you pointed out, you may not know which MSFT stock you sold; it may
be the first, last, or middle shares in your portfolio...

Hope this helps
Jay
 

Scott Allen

On Sun, 15 Aug 2004 13:47:36 -0500, "Jay B. Harlow [MVP - Outlook]"

Hi Jay:

I hope you also realize this sorting
may wreak havoc with your scalability.

If you are suggesting I might be choosing algorithms at random, then I
want to assure you I have specific performance and scalability metrics
to meet. The applications are tested and measured, then specific areas
are targeted for tuning to achieve the goals.

It's not a literal restriction as much as a conceptual one, in that
BinarySearch is only able to return a single Stock. You would not know which
stock was returned. Your sample effectively says to sell 1 of N MSFT stocks,
whereas David's code says to sell N of N MSFT stocks. I hope you agree 1 of
N is very different than N of N!
<snip>

Being nitpicky, but BinarySearch doesn't return an object reference;
it returns an index value. You can use the index to find other
occurrences of what you are looking for. *If* you are in a
performance-critical section, this *may* still be a win over straight
iteration. The devil is always in the details.
 

Jay B. Harlow [MVP - Outlook]

Scott,
If you are suggesting I might be choosing algorithms at random, then I
I was not suggesting you were choosing algorithms at random, please re-read
what I stated!

BinarySearch requires that the array be sorted. Correct?

Sorting the array each time you search on a different "field" may negate the
benefit of the BinarySearch. Correct?

If you feel either of the above statements is false, outside my suggestion
of multiple arrays, I would like to see a working sample of how you get it
to work while being performant & scalable! (as I may find it useful in my
solutions).

I was suggesting that using For Each or BinarySearch is IMHO an
implementation detail. I would pick the appropriate one for the requirements
of the application.
Being nit picky, but BinarySearch doesn't return an object reference,
it returns an index value. You can use the index to find other
Yes, it returns an index. It returns only a single index, and this index is
able to retrieve a single stock. Correct?

In your original example you only showed changing a single stock, if you
want to show an example of using BinarySearch to change N stocks, then we
can continue this discussion...

Thanks for understanding
Jay






 

Scott Allen

Hi again Jay:

Scott,
I was not suggesting you were choosing algorithms at random, please re-read
what I stated!

You stated:

"As long as you realize that BinarySearch requires your collection to be
sorted by the "field" you are searching on. I hope you also realize
this sorting may wreak havoc with your scalability."

Yes, I am aware the collection has to be in sorted order. I also
wanted to make it clear I'm not doing binary searches because I
watched Oprah Winfrey's show in the morning and Oprah said binary
searches were cool, so I wanted to try it too. I took this as a
condescending statement, but please accept my apologies if it wasn't
meant that way.
I was suggesting that using For Each or BinarySearch is IMHO an
implementation detail. I would pick the appropriate one for the requirements
of the application.

Agreed.

Yes, it returns an index. It returns only a single index, and this index is
able to retrieve a single stock. Correct?

Correct, but also note that index + 1 and index - 1 can also contain a
match. Checking in the vicinity can, in the right scenario, be faster
than iterating an entire collection, and is non-trivial to
implement; in fact .....
In your original example you only showed changing a single stock, if you
want to show an example of using BinarySearch to change N stocks, then we
can continue this discussion...

..... I'm sure you've done this yourself by using the Select method of
the DataTable, or the Find method of the DataView, and then
manipulating the rows the methods return. One doesn't need to know
that Select uses a binary search, or that arrays (indexes, if you
will) are built to do the binary search. These details are all
nicely encapsulated - someone tells the container what to search for
and the container does the specialized work. (Though, like everything,
there can be a need to know in exceptional cases.)
 

Guest

Hi,

I may have misunderstood what was said, but at a recent Tech Ed 2004 seminar
(in NZ) we were told that foreach is not a good idea on collections
containing primitive data types (e.g., integers).
Apparently this is because it causes a lot of unnecessary "boxing"
(converting the primitive data type into an object) and "unboxing" (the
reverse). Apparently if you have a collection of ints you should use for.
Why anyone would foreach through a collection of ints I don't know.

Apparently this is a bigger problem when developing with the Compact .NET
Framework (for mobile devices), because this sort of application normally
runs on devices with smaller amounts of memory available. If too much
boxing/unboxing goes on, the garbage collector has a lot of work to do
and runs quite often, slowing down the app.

In a desktop environment it's presumably less of a problem.
 

Jon Skeet [C# MVP]

Troy said:
I may have misunderstood what was said, but at a recent Tech ED 2004 seminar
(in NZ) we were told that FOREACH is not a good idea on collections
containing primitive data types (i.e integers etc).

I don't believe that's not generally true. If the collection already
has the data in boxed form, it needs to be unboxed anyway. If you're
dealing with an array, no boxing needs to occur.

The only time I can see it being true is with a collection which does
actually have a strongly-typed interface to it and stores the ints
without boxing, where an iterator would need to box. I think that's a
relatively rare situation, myself.
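A sketch of the distinction Jon draws: an ArrayList boxes each int at insertion regardless of which loop reads it back, while a plain int[] holds the values directly and never boxes.

```csharp
using System;
using System.Collections;

class BoxingDemo
{
    static void Main()
    {
        // ArrayList stores object references, so each Add below boxes
        // an int. The boxing happens at insertion, not at iteration,
        // regardless of whether we later read with for or foreach.
        ArrayList list = new ArrayList();
        for (int i = 0; i < 3; i++) list.Add(i);

        int sum = 0;
        foreach (int n in list) sum += n; // each iteration unboxes

        // A plain int[] holds the values inline; iterating it never
        // boxes, whichever loop construct is used.
        int[] array = { 0, 1, 2 };
        foreach (int n in array) sum += n;

        Console.WriteLine(sum);
    }
}
```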
 

David

I don't believe that's not generally true.

I'm not sure I'm parsing that correctly.
If the collection already
has the data in boxed form, it needs to be unboxed anyway. If you're
dealing with an array, no boxing needs to occur.

The only time I can see it being true is with a collection which does
actually have a strongly-typed interface to it and stores the ints
without boxing, where an iterator would need to box. I think that's a
relatively rare situation, myself.

My first thought was "Wouldn't that be the case with any array of value
type?"

But, a few seconds later...

OK, strike that, I'm wrong of course. I just glanced at the IL and using
foreach on an array of int doesn't call GetEnumerator, it inlines the
loop instead, avoiding any boxing. That's interesting.

Do you know if language matters here? Is this an optimization of the C#
compiler, or something inherent to the CLR?
 

Jon Skeet [C# MVP]

David said:
I'm not sure I'm parsing that correctly.

Nah, I just made a typo. It should have been "I'm not sure that's
generally true."
My first thought was "Wouldn't that be the case with any array of value
type?"

But, a few seconds later...

OK, strike that, I'm wrong of course. I just glanced at the IL and using
foreach on an array of int doesn't call GetEnumerator, it inlines the
loop instead, avoiding any boxing. That's interesting.

It's pretty necessary, to be honest - the performance penalties
otherwise would be awful, and completely unnecessary.
Do you know if language matters here? Is this an optimization of the C#
compiler, or something inherent to the CLR?

It's the C# compiler - the CLR itself doesn't know about foreach at
all.
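Roughly, the C# compiler treats foreach over an array as sugar for an indexed loop. The sketch below shows the approximate expansion (the exact emitted IL may differ in detail):

```csharp
using System;

class ForeachExpansion
{
    static void Main()
    {
        int[] data = { 1, 2, 3 };

        // What the programmer writes:
        int sumForeach = 0;
        foreach (int n in data) sumForeach += n;

        // Approximately what the C# compiler emits for an array: a
        // hoisted reference and an indexed loop, with no enumerator
        // object and no boxing.
        int sumExpanded = 0;
        int[] temp = data;
        for (int i = 0; i < temp.Length; i++) sumExpanded += temp[i];

        Console.WriteLine(sumForeach == sumExpanded);
    }
}
```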
 

Guest

Is there a performance difference between this:

\\\
Dim i As Integer
For i = 0 to myObject.Controls.Count - 1
myObject.Controls(i) = ...
Next
///

and this:

\\\
Dim ctl As Control
For Each ctl In myObject.Controls
ctl = ...
Next
///

Or is For Each just "prettier"?

Thanks,

Eric
 

Fredrik Wahlgren

Snipped from this ng:

"I came across a reference on a web site
(http://www.personalmicrocosms.com/html/dotnettips.html#richtextbox_lines )
that said to speed up access to a rich text box's lines that you needed to
use a "foreach" loop instead of a "for" loop. This made absolutely no sense
to me, but the author had posted his code and timing results. The "foreach"
(a VB and other languages construct) took 0.01 seconds to access 1000 lines
in a rich text box, whereas the "for" loop (a traditional C++ construct) took
an astounding 25 seconds (on a not very fast PC).

I recreated a test file using the partial source code posted by the author
and verified that there is a SIGNIFICANT performance difference between the
two constructs (although on my PC it was 0.01 seconds vs 3.6 seconds - still
a noticeable delay). Unfortunately, there was no explanation as to why this
was the case, and I couldn't see anything as to why one loop construct would
be different. Looking at the generated IL code with Lutz Roeder's Reflector
tool, I see that the real culprit is not the loop structure but the
get_Lines() function, which is hoisted out of the loop in the "foreach" code
and not in the "for" loop code. Which leads me to post this question about
the differences in compiler code generation/optimization, and whether there
is any setting that can change this.

Interestingly, this is true for both Debug and Release builds. The compiler
generated code that called that function twice for each pass of the loop
(once for the loop index check and then again for the length calculation).
Pulling out unnecessary function calls is pretty basic optimization, and I'm
surprised that the compiler didn't detect this.

With the IDE's IntelliSense and auto-completion features, the "for" loop
construct shown in the code below seems like something that someone might
actually code up, and of course who would have figured out that the
get_Lines method would be so performance-intensive.

Makes me wonder if there are any other gotchas like this.

Thanks, Mike L.

--------------------------------------------------------------------------

// Simple Windows Form with a RichTextBox control, initialized w/1000 lines
// of text (e.g., "line #101", etc.).

private void ForLoopButton_Click(object sender, System.EventArgs e)
{
    Cursor.Current = Cursors.WaitCursor;
    int Len = 0;
    int Start = Environment.TickCount;
    // Note: TheRichTextBox.Lines (get_Lines) is re-evaluated on every
    // pass, both in the loop condition and in the body.
    for (int i = 0; i < TheRichTextBox.Lines.Length; i++)
    {
        Len += TheRichTextBox.Lines[i].Length;
    }
    int ElapsedTime = Environment.TickCount - Start;
    ResultsTextBox.Clear();
    ResultsTextBox.Text = "for loop\r\n\r\nElapsed time = " + ((double)
        ElapsedTime / (double) 1000.0).ToString() + " seconds\r\n\r\nResult = " +
        Len.ToString();
    Cursor.Current = Cursors.Arrow;
}

private void ForEachLoopButton_Click(object sender, System.EventArgs e)
{
    Cursor.Current = Cursors.WaitCursor;
    int Len = 0;
    int Start = Environment.TickCount;
    foreach (String Line in TheRichTextBox.Lines)
    {
        Len += Line.Length;
    }
    int ElapsedTime = Environment.TickCount - Start;
    ResultsTextBox.Clear();
    ResultsTextBox.Text = "foreach loop\r\n\r\nElapsed time = " + ((double)
        ElapsedTime / (double) 1000.0).ToString() + " seconds\r\n\r\nResult = " +
        Len.ToString();
    Cursor.Current = Cursors.Arrow;
}

private void ForLoopButton2_Click(object sender, System.EventArgs e)
{
    // Performance results are now the same as ForEachLoopButton_Click:
    // the Lines array is fetched once, outside the loop.
    Cursor.Current = Cursors.WaitCursor;
    int Len = 0;
    int Start = Environment.TickCount;
    string[] lines = TheRichTextBox.Lines;
    for (int i = 0; i < lines.Length; i++)
    {
        Len += lines[i].Length;
    }
    int ElapsedTime = Environment.TickCount - Start;
    ResultsTextBox.Clear();
    ResultsTextBox.Text = "for loop (hoisted)\r\n\r\nElapsed time = " + ((double)
        ElapsedTime / (double) 1000.0).ToString() + " seconds\r\n\r\nResult = " +
        Len.ToString();
    Cursor.Current = Cursors.Arrow;
}"
 

Chad Z. Hower aka Kudzu

Fredrik Wahlgren said:
Makes me wonder if there are any other gotchas like this.

This is well known. It's because many (but not all) list-type collections are
maintained internally without an index (linked structures, for example),
while the for loop requires access by index; in such structures, access by
index is slow.
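Chad's point can be illustrated with a hand-rolled singly-linked list (hypothetical code, not any particular BCL type): indexed access must re-walk the list from the head on every call, while a sequential traversal, which is what an enumerator does, visits each node exactly once.

```csharp
using System;

// Minimal singly-linked list; no index is maintained internally.
class Node
{
    public int Value;
    public Node Next;
    public Node(int value, Node next) { Value = value; Next = next; }
}

class LinkedAccessDemo
{
    // O(index) per call: must re-walk from the head every time.
    static int At(Node head, int index)
    {
        Node n = head;
        for (int i = 0; i < index; i++) n = n.Next;
        return n.Value;
    }

    static void Main()
    {
        // Build the list 0 -> 1 -> 2 -> 3 -> 4.
        Node head = null;
        for (int i = 4; i >= 0; i--) head = new Node(i, head);

        // Indexed access: O(n^2) total, because each At() re-walks.
        int sumIndexed = 0;
        for (int i = 0; i < 5; i++) sumIndexed += At(head, i);

        // Sequential traversal (what an enumerator does): O(n) total.
        int sumSequential = 0;
        for (Node n = head; n != null; n = n.Next) sumSequential += n.Value;

        Console.WriteLine(sumIndexed == sumSequential);
    }
}
```

Same answer, very different cost once n gets large.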


--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Empower ASP.NET with IntraWeb
http://www.atozed.com/IntraWeb/
 
