Is it better: CLEAR or NEW?

  • Thread starter: pamela fluente
Pamela,

Have you ever tried to avoid class scope, or even better, to avoid
Static/Shared members?

It will help you to make more maintainable programs in an OOP way.

Cor
 
Jon's right a pragmatic approach is what is required - I have been there on a
couple of occasions :)

Of course I could just prolong the discussion by disagreeing with the MVP's :)

Ollie Riches
 
If a collection is not referenced any more, wouldn't its destructor/
finalizer do the equivalent of Clear anyhow, so that the objects it
references are dereferenced?

Otherwise there's a potential for circular references that the GC can't
handle.

If I am correct, then speed-wise, doing a Clear will be faster than
discarding + new.

Trying to time this stuff is always hard as a large part of the cost
could be in the GC, which could happen at any time.
 
If a collection is not referenced any more, wouldn't its destructor/
finalizer do the equivalent of Clear anyhow, so that the objects it
references are dereferenced?

No, the finalizer doesn't need to do anything. I'd be very surprised
if ArrayList/List<T> etc *had* finalizers. The backing array won't be
treated as a "live" variable any more, so the previously referenced
objects can become eligible for GC (if there are no other references).

There's no real concept of "dereferencing" an object, because .NET
doesn't do reference counting.
Otherwise there's a potential for circular references that the GC can't
handle.

The GC can handle circular references just fine. It uses mark and
sweep, not reference counting.
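A small sketch (my own, not from the thread) that demonstrates this with a WeakReference; assuming the usual .NET behavior, a forced blocking collection reclaims the mutually-referencing pair once no rooted reference remains:

```csharp
using System;

class Node
{
    public Node Other; // used to form the reference cycle
}

class CycleDemo
{
    // Builds two objects that reference each other and returns only a
    // weak reference, so nothing keeps the cycle alive.
    public static WeakReference MakeCycle()
    {
        var a = new Node();
        var b = new Node();
        a.Other = b;
        b.Other = a;
        return new WeakReference(a);
    }

    static void Main()
    {
        WeakReference weak = MakeCycle();

        // A tracing (mark-and-sweep style) collector reclaims the cycle
        // even though the two objects still reference each other.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine(weak.IsAlive); // normally prints False
    }
}
```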

Jon
 
The answer to this question is tied to the usage of Capacity. If you
store a thousand items and then make a new list and do the same thing
without setting the capacity on either list, refilling will be twice as
fast as making a new list, simply because the memory is already
allocated for a thousand items. In general, clearing and then refilling
should save you an allocation, unless you are adding more items than
the existing set contained.
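A quick sketch (mine, not from the thread) showing why: List&lt;T&gt;.Clear resets Count but leaves Capacity, and hence the backing array, in place:

```csharp
using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var list = new List<int>();
        for (int i = 0; i < 1000; i++)
            list.Add(i);

        int grownCapacity = list.Capacity; // >= 1000 after the adds

        // Clear() resets Count to 0 but keeps the backing array,
        // so refilling needs no new allocations...
        list.Clear();
        Console.WriteLine(list.Count);                     // 0
        Console.WriteLine(list.Capacity == grownCapacity); // True

        // ...whereas a brand-new list starts from a small capacity
        // and must reallocate as it grows again.
        var fresh = new List<int>();
        Console.WriteLine(fresh.Capacity < grownCapacity); // True
    }
}
```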
 
Patrick Steele said:
As you develop your app, keep testing it for your desired performance
benchmarks (speed, memory usage, etc.). Wait until you start to
actually see a problem before fixing it.

That's exactly why Pac-Man once ran on a Commodore 64 with almost no CPU
and/or memory, and why when it's developed today it takes a Core2Duo 2.8GHz
with 1GB RAM and a 256MB video card to run it... Well, I'm exaggerating a bit,
but you know what I mean... Some programmers don't care about their code's
performance anymore. Yes, it's a bit stupid to over-optimize, but it's
extremely important to use the right way first and not wait to see problems
appear before fixing them. If you can do it right at once, do it right. I
like to do things right once and never touch it again, because I know it's
the best way possible. Well, it's my way of thinking; you're free to agree
or not...

pamela fluente said:
In such cases, where it is functionally "equivalent" and I can
choose, what is more advisable?

Is it better to clear a collection and reuse it, or just instantiate a
brand-new one, leaving the old one to the GC?

As for Pamela's question: I've seen from experience that Clear is
often slower when you have a lot of items in the collection, but if you only
have a few, go for Clear... Actually, try to estimate the item count
you will have in the collection at any time, and test to find the most
effective way.

I hope it helps

ThunderMusic
 
Pamela,

Have you ever tried to avoid class scope, or even better, to avoid
Static/Shared members?

It will help you to make more maintainable programs in an OOP way.

Cor
 
For those who ask why I have sent this twice: something strange was
happening. All my answers to the VB.NET newsgroup from today were gone. When
I looked later, I saw that this one had arrived in the C# newsgroup, but
never in the VB languages newsgroup.

Sorry for that,

Cor
 
The answer to this question is tied to the usage of Capacity. If you
store a thousand items and then make a new list and do the same thing
without setting the capacity on either list, refilling will be twice as
fast as making a new list, simply because the memory is already
allocated for a thousand items. In general, clearing and then refilling
should save you an allocation, unless you are adding more items than
the existing set contained.

It does depend on the context. However, I have found that clearing is
almost always preferable, if it is possible, though it typically makes
the code longer.

One way to think of it is to compare memory allocation/deallocation
against simply resetting the size (assigning the count member to 0) and
overwriting the existing values. I have found that systems under stress
typically end up with a lot of page faults when simple memory reuse is
overlooked. However, on systems with lots of free memory, or lots of
user interaction, letting the GC handle the orphaned memory is easier to
write and read.

Blah blah = new Blah();
while (blehBleh)
{
    blah.Clear();
    blah.AddRange(blipityDoDah);
}

vs.

while (blehBleh)
{
    Blah blah = new Blah(blipityDoDah);
}
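The placeholder sketch above, rewritten as a runnable List&lt;int&gt; version (the names and data are mine; it only illustrates the two patterns, not their relative speed):

```csharp
using System;
using System.Collections.Generic;

class ReusePatternDemo
{
    static void Main()
    {
        int[] batch = { 1, 2, 3, 4, 5 };

        // Reuse pattern: one allocation, cleared and refilled each pass.
        var reused = new List<int>();
        for (int pass = 0; pass < 3; pass++)
        {
            reused.Clear();
            reused.AddRange(batch);
        }
        Console.WriteLine(reused.Count); // 5

        // Recreate pattern: a fresh list (and backing array) every pass.
        List<int> recreated = null;
        for (int pass = 0; pass < 3; pass++)
        {
            recreated = new List<int>(batch);
        }
        Console.WriteLine(recreated.Count); // 5
    }
}
```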

I recently wrote an application that built Oracle arrays dynamically
using List<Decimal>s. Since there were 26 lists that held multiples of
32, there was a lot of memory being moved around for the 26MB file.
Clearing actually was significantly faster; however, that was due to
the nature of my application (the capacity was known in advance).

Blah,
Travis
 
Jon said:
I'm with Patrick. There's little point in micro-optimising *all* the
code (and often making it less readable at the same time) when
bottlenecks are often in unexpected places.

Jon,

I never said we should micro-optimize all the code. However, I do think we
should continually address performance issues so that they don't jump up and
bite us at the end of the project, or worse, when the engineer working on
the code is no longer on the project and both the bugs and performance
issues are left to someone new on the project.

Hilton
 
Hilton said:
I never said we should micro-optimize all the code. However, I do think we
should continually address performance issues so that they don't jump up and
bite us at the end of the project, or worse, when the engineer working on
the code is no longer on the project and both the bugs and performance
issues are left to someone new on the project.

You should certainly test performance at appropriate times rather than
at the end of the project - but the *testing* part of it is the
important one. Rather than write code assuming that the performance of
that section is important, you should test it so that you know, as far
as is reasonably possible.

Performance is very often achieved at the cost of readability, clarity
and simplicity. Those are not things I give up lightly - so I will only
do so where I can see a significant improvement.

Yes, your overall architecture and algorithms should be designed with
performance in mind, but at the "implementing a method" level it's
relatively unimportant if it can be fixed in one small area later on.

In particular, with the example in this thread, the decision about
whether to clear an existing list or create a new one should usually be
made on the basis of the most appropriate semantics, *not* performance.
 
Peter said:
Everyone has to think about performance, regardless of platform. Everyone
should write code that performs well.

But for most code, all this really means is to not write inefficient
_algorithms_. Don't use an algorithm that is O(N^2), or even O(N log N)
for that matter, when an O(N) algorithm will do.
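A concrete illustration (my own example, not from the thread): removing duplicates with List.Contains is O(N^2) overall, while a HashSet makes the same job O(N):

```csharp
using System;
using System.Collections.Generic;

class DedupDemo
{
    // O(N^2): List.Contains scans the whole list for every item.
    public static List<int> DistinctQuadratic(int[] items)
    {
        var result = new List<int>();
        foreach (int item in items)
            if (!result.Contains(item))
                result.Add(item);
        return result;
    }

    // O(N): HashSet.Add is amortized constant time per item.
    public static List<int> DistinctLinear(int[] items)
    {
        var seen = new HashSet<int>();
        var result = new List<int>();
        foreach (int item in items)
            if (seen.Add(item)) // Add returns false for duplicates
                result.Add(item);
        return result;
    }

    static void Main()
    {
        int[] data = { 3, 1, 3, 2, 1 };
        Console.WriteLine(string.Join(",", DistinctQuadratic(data))); // 3,1,2
        Console.WriteLine(string.Join(",", DistinctLinear(data)));    // 3,1,2
    }
}
```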

Yeah, true, but not always. O(N^2) can sometimes be faster than O(N log N), so
sometimes the inefficient algorithm is faster. Anyway, I digress. Let me
give you an example: on the Pocket PC, if you combine the Location and Size
calls into a single Bounds call and change Controls.Add to
Parent = this, you get a significant performance increase. These aren't
algorithmic changes, just very trivial code changes which provide huge
increases in performance.

Differences in implementation of the same algorithm are not likely to
produce a performance difference that the user will notice in most cases,
while other aspects of the implementation such as overall code
maintainability and obviousness of the implementation details often do, in
the form of code that actually _works_ and doesn't have unanticipated
complications.

See above, also there was my DateTimePicker experience. Adding a bunch of
these to a page required 45 seconds! After optimizing the code,
it now takes about 5 seconds. No algorithm change. It's just not true that
only changing algorithms gets you big changes.

[zap stuff we pretty much agree on; i.e. readability, don't optimize
something if it makes no difference, etc]
As Patrick and Jon have both said, once you have a complete
implementation, then it makes sense to identify and address any potential
performance problems. At that point, you will know what areas of the code
are actually affecting the user experience, and you will be able to
measure changes in the implementation in a way that takes into account the
context of those changes.

Well, this argument breaks down in many ways; here are a few:
1. This breaks the whole OOP concept; i.e. the object might work fine now,
but when I come to reusing it, its performance sucks.
2. You might not be working on the project when the performance issues
become apparent.
3. You move the code to a different platform and it is painfully slow. You
could argue that this is a 'complete implementation' and you can now
optimize. But why not just do it right the first time?

Let me summarize here, because I think I'm being a little misunderstood. I
do not suggest micro-optimizing everything. I strongly encourage
readability. I try to comment every method and non-obvious lines of code. I
even line up the "=" on different lines (a topic for another thread). My
point is that we need to 'build in' both performance and quality, and not
leave it to suddenly rear its ugly head when the project is nearing
completion, deadlines aren't being met, and QA gets thrown a huge project to
test, and then we have to start putzing around with the performance issues as
well as the bugs. Do it right the first time. If some method is run once
at startup and optimizing it changes it from 0.2 seconds to 0.1 seconds, of
course keep the most readable, least bug-susceptible version.

Hilton
 
Hilton said:
Yeah, true, but not. O(N^2) can sometimes be faster than O(N log N), so
sometimes the inefficient algorithm is faster. Anyway, I digress. Let me
give you an example, on the Pocket PC if you combine the Location and Size
calls into a single Bounds method call and change Controls.Add to
Parent=this, you get significant performance increase. These aren't
algorithmic changes, just very trivial code changes which provide huge
increases in performance.

And that's fine - because they're *significant* and presumably you've
found them due to testing.

Note that Peter did explicitly say "for most code". Things like this
are common enough on the Compact Framework (and there are other things
you can do to the designer-generated code which speed things up again,
IIRC) but don't affect *most* code.
See above, also there was my DateTimePicker experience. Adding a bunch of
these to a page required 45 seconds! After optimizing the code,
it now takes about 5 seconds. No algorithm change. It's just not true that
only changing algorithms gets you big changes.

Could you have made the change later without breaking anything else? If
so, and if you'd originally been working fast enough on a PC, would
there have been any benefit in performing that optimisation at that
point rather than at the point of moving it to a PocketPC?
[zap stuff we pretty much agree on; i.e. readability, don't optimize
something if it makes no difference, etc]
As Patrick and Jon have both said, once you have a complete
implementation, then it makes sense to identify and address any potential
performance problems. At that point, you will know what areas of the code
are actually affecting the user experience, and you will be able to
measure changes in the implementation in a way that takes into account the
context of those changes.

Well, this argument breaks down in many ways; here are a few:
1. This breaks the whole OOP concept; i.e. the object might work fine now,
but when I come to reusing it, its performance sucks.

So improve the performance at that point, in a way which doesn't break
the previous code.
2. You might not be working on the project when the performance issues
become apparent.

If your code is clear enough, others should be able to maintain it,
including performance improvements.
3. You move the code to a different platform and it is painfully slow. You
could argue that this is a 'complete implementation' and you can now
optimize.
But why not just do it right the first time?

Because there's a cost involved in making things faster - whether it's
in readability, clarity, or just effort taken in development.

Most of the code I write is never going to run on a PocketPC, so why on
earth would I want to waste time optimising it so that it's fast enough
for that platform (or another one)? Who's to say that the optimisations
you apply on one platform won't make it even *worse* on another?
Let me summarize here because I think I'm being a little misunderstood. I
do not suggest micro-optimizing everything.

It still sounds like you are doing exactly that though, if your idea of
"doing it right" includes "making it go as fast as possible" even if
after writing it in the simplest way to start with works easily *fast
enough*. I often write code which is simple but could be made faster. I
choose *not* to make it faster in order to keep the simplicity.
I strongly encourage
readability. I try to comment every method and non-obvious lines of code. I
even line up the "=" on different lines (a topic for another thread). My
point is that we need to 'build in' both performance and quality, and not
leave it to suddenly rear its ugly head when the project is nearing
completion, deadlines aren't being met, and QA gets thrown a huge project to
test, and then we have to start putzing around with the performance issues as
well as the bugs.

No-one has suggested leaving performance to the last minute. Indeed,
Patrick's statement you originally disagreed with explicitly stated:

<quote>
As you develop your app, keep testing it for your desired performance
benchmarks
</quote>

Note the "as you develop your app". Not "when you've done the whole
thing, then check that the performance is okay".
Do it right the first time. If some method is run once at startup and
optimizing it changes it from 0.2 seconds to 0.1 seconds, of course
keep the most readable, least bug-susceptible version.

If it's only taking 0.2 seconds to start up, why would you want to
spend the time investigating if you can reduce it in the first place?
Why not just keep the simplest code that you write initially?
 
Jon,

I think this is one of those disagreements that occur on the NGs where we
actually agree with each other perhaps 99%, and we could argue it out over
days, but if we were sitting down over a cold Diet Coke, we would agree,
be done with it, and move on in a few minutes.

Hilton
 
Hilton said:
Yeah, true, but not always. O(N^2) can sometimes be faster than O(N log N)

Differences in absolute times aside, I never said one should always use
O(N log N) over O(N^2).
, so sometimes the inefficient algorithm is faster.

Yes, and often the efficient algorithm costs more to implement in
developer hours. But these are highly dependent on the specific
scenario. They cannot be resolved as a general solution; a general test
of performance isn't going to answer the question as to what the best
implementation for a specific situation is.

If anything, that simply reinforces my point: the first tier of
optimization has a lot more to do with overall use of programmer time
than it does with specific performance scenarios.
Anyway, I digress. Let me
give you an example: on the Pocket PC, if you combine the Location and Size
calls into a single Bounds call and change Controls.Add to
Parent = this, you get a significant performance increase. These aren't
algorithmic changes, just very trivial code changes which provide huge
increases in performance.

But in situations where the user never notices those increases, it's not
worth your time to investigate the performance differences.

In the Location/Size example, it should be immediately apparent to the
programmer who is paying attention that setting two properties, each of
which might force an update of the instance, is going to cost more than
setting a single property that encapsulates both. You don't need to
profile the code to know that it's less expensive to batch up
layout-related assignments.

In the Add() method vs Parent property, the difference is less apparent,
but again in many cases the user will never know the difference. It's
my opinion that it's a problem that two apparently equivalent techniques
produce significantly different performance results (assuming they
do...you're not specific enough for me to comment on that). But it's
not practical to go around performance-testing all of the possible
mechanisms by which you might use the framework.

The framework itself _should_ minimize these differences. But even
inasmuch as it doesn't, it's not practical to write a performance test
every time you add some new use of a framework element.
See above, also there was my DateTimePicker experience. Adding a bunch of
these to a page required 45 seconds! After optimizing the code,
it now takes about 5 seconds. No algorithm change. It's just not true that
only changing algorithms gets you big changes.

I never said it was.

Furthermore, what led you to optimize the code? Did you actually
profile all of the possible implementations in a separate test harness
to determine precise performance differences between the various
techniques available to you, before even implementing the overall
behavior desired?

Or did you, as I think is more likely, write the code, identify a
performance issue, and investigate how to improve the issue?

I would almost never do the former. I've never argued against the latter.
[...]
As Patrick and Jon have both said, once you have a complete
implementation, then it makes sense to identify and address any potential
performance problems. At that point, you will know what areas of the code
are actually affecting the user experience, and you will be able to
measure changes in the implementation in a way that takes into account the
context of those changes.

Well, this argument breaks down in many ways; here are a few:
1. This breaks the whole OOP concept; i.e. the object might work fine now,
but when I come to reusing it, its performance sucks.

That does not "break the whole OOP concept". The OOP concept works just
fine, even if you don't address every possible performance scenario in
your initial implementation.

In fact, it is impossible to anticipate every use of an object, and one
must be prepared to resolve potential issues in the future through
fixing the existing implementation. Not only does this not "break the
whole OOP concept", the "whole OOP concept" is based around
encapsulation that allows for such repairs without affecting existing
users of an object.
2. You might not be working on the project when the performance issues
become apparent.

So? All the more reason for the code to be maintainable first, performant
second.
3. You move the code to a different platform and it is painfully slow. You
could argue that this is a 'complete implementation' and you can now
optimize. But why not just do it right the first time?

Because you have no way to know what is "right". If there's a
performance difference that is platform dependent, there is every
possibility that different techniques are required for different
platforms. Optimizing the code on a given platform could very well
result in the least-optimal results on another.
Let me summarize here, because I think I'm being a little misunderstood. I
do not suggest micro-optimizing everything. I strongly encourage
readability. I try to comment every method and non-obvious lines of code. I
even line up the "=" on different lines (a topic for another thread). My
point is that we need to 'build in' both performance and quality, and not
leave it to suddenly rear its ugly head when the project is nearing
completion, deadlines aren't being met, and QA gets thrown a huge project to
test, and then we have to start putzing around with the performance issues as
well as the bugs.

If you've "done it right the first time", performance issues are easily
addressed. High-level architecture, correct abstractions, maintainable
implementations all lead to flexible, easily-fixable code.

Obviously, there are certain aspects of performance that can be
addressed during the initial implementation. No one is suggesting
otherwise. We're talking about a specific class of performance
optimizations here though. Such as the difference between instantiating
a new collection versus clearing an existing one.

These kinds of "optimizations" cannot even necessarily be predicted
accurately outside of the real-world application, but beyond that they
often optimize areas of code that represent a tiny fraction of the total
cost of execution. It's inefficient and in many cases
counter-productive to worry about those kinds of optimizations until
they have been demonstrated to be actual problems in the final user
experience.

Pete
 
I can see you are discussing optimization.

Actually, I did not have an optimization purpose in mind.

I just wanted to find the *best* way to do something I encounter often
and was asking myself about..

That is, given a *class scope* collection, it's CLEAR vs. NEW,

[my current preference, I confess, being towards Clear, while I
consider using New a "dissennate" practice, potentially very, very
dangerous ... :-) ]



-P
 
pamela said:
I can see you are discussing optimization.

Actually, I did not have an optimization purpose in mind.

All due respect, your original post practically limited the discussion
_only_ to optimizations. Your own qualification:

"I mean to the purpose of memory usage, speed, GC, etc."
I just wanted to find the *best* way to do something I encounter often
and was asking myself about..

There is no "best". You have to qualify the question (and you did) in
order for anyone to answer the question what is "best".
That is, given a *class scope* collection, it's CLEAR vs. NEW,

You did not restrict your question to "class scoped" variables. You did
mention them, but you didn't state that as a strict narrowing of the
question.
[my current preference, I confess, being towards Clear, while I
consider using New a "dissennate" practice, potentially very, very
dangerous ... :-) ]

I don't know what "dissennate" is supposed to mean. That said, I've
already pointed out that the question regarding which is "safer" depends
wholly on the quality of your design otherwise.

If you have a collection that is to be used by multiple objects, you had
better set some very specific rules regarding how that collection is to
be used and maintained. Furthermore, these rules will depend on the
nature of the collection; sometimes it will be appropriate for the
"owner" of the collection to reinstantiate it and sometimes it will be
appropriate for the owner of the collection to clear it.

It is not true that in all designs the multiple users of the collection
will _want_ to see changes made to the collection made by the "owner",
nor is it true that the collection is truly "orphaned" (which generally
implies a memory leak).

Which brings us back to the question of "best". Your original post
stipulated that you wanted "best" evaluated on the basis of performance,
and those are the answers you got. If you instead really wanted the
answer on the basis of the overall design of your code, it really just
depends on that design. There is no single "best" choice, since what's
"best" really depends on the behavior you want the users of the
collection to observe.

Since you didn't share that design or its requirements, there's no way
for anyone to answer the question from that perspective.

Pete
 