High CPU Usage

Damien

Hi guys,

I'm looking for ideas for troubleshooting the following. We've tried
a few things to treat the symptoms, but none seems robust enough to rely
on when we go live, and we'd rather find the root cause:

We've got an ASP.NET application running on .NET Framework 1.1 on
Windows Server 2003 (IIS 6). Under default settings, with just two users
testing, we're seeing the CPU usage of w3wp.exe shoot up and stay up,
which makes pages extremely slow to serve. Pages do continue to work, so
we're fairly sure we don't have an infinite loop in there somewhere.

The only two things that seem to be new in this application, compared to
previous ones, are that it uses remoting in a few places and uses
datagrids on most pages. In two places, the remoting code makes a call
that can be fairly slow (it blocks internally until the operation is
complete). In the remaining place, a first remoting call kicks off a
long-running process, and then polling (via meta-refresh or AJAX,
depending on whether JS is available) is used to determine when the
operation is complete. The remoting channel is configured in
Application_Start. This is just for information and may be unrelated to
our problem: the high CPU usage does not seem to be directly triggered by
any of these calls, and the server end of the remoting runs in a separate
service (whose CPU usage doesn't change).
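
The start-then-poll arrangement described above can be sketched roughly as follows, in Python rather than .NET purely for illustration. `JobService`, `start_job`, `is_complete` and `wait_by_polling` are hypothetical names standing in for the remoting service and the meta-refresh/AJAX loop; the point is that each poll is a short status check instead of one request thread blocked for the whole operation.

```python
import threading
import time
import uuid


class JobService:
    """Hypothetical stand-in for the remote service: start_job() returns
    at once with a job id, and the work continues on a background thread."""

    def __init__(self):
        self._done = {}
        self._lock = threading.Lock()

    def start_job(self, work_seconds):
        job_id = str(uuid.uuid4())
        with self._lock:
            self._done[job_id] = False
        worker = threading.Thread(
            target=self._run, args=(job_id, work_seconds), daemon=True)
        worker.start()
        return job_id

    def _run(self, job_id, work_seconds):
        time.sleep(work_seconds)  # simulates the long-running operation
        with self._lock:
            self._done[job_id] = True

    def is_complete(self, job_id):
        with self._lock:
            return self._done[job_id]


def wait_by_polling(service, job_id, interval=0.05, timeout=5.0):
    """Client side: plays the role of the meta-refresh/AJAX loop. Each poll
    is a quick status check; returns how many polls were needed."""
    deadline = time.monotonic() + timeout
    polls = 0
    while time.monotonic() < deadline:
        polls += 1
        if service.is_complete(job_id):
            return polls
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")


if __name__ == "__main__":
    svc = JobService()
    job = svc.start_job(0.2)
    print("completed after", wait_by_polling(svc, job), "polls")
```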

The datagrids use custom databinding, since the number and type of
controls on each page are determined at runtime from the database. For
instance, a grid might consist of three columns, each containing a
textbox, a radio-button list, a drop-down list, etc. During databinding,
the unneeded controls are hidden, and the correct control has any
necessary validation attached to it.

I hadn't used IISState before, but tried it when the CPU maxed out.
After a while (and well before the dump was complete), IIS decided to
recycle the process.

Restarting IIS every thirty minutes is allowing testing to proceed, but
obviously this isn't a long-term solution.

Running under IIS 5 isolation mode resulted in no difference in
behaviour.

Letting IIS run more worker processes (we've tried values of 1, 3, 5
and 1000) seems to work for a while, but eventually one of the processes
climbs to high CPU usage again.

So, any ideas on how to diagnose this problem further? As I said, it
doesn't appear to be triggered by any particular page, and all requests
still succeed. Thanks in advance for any help,

Damien
 

Well, we had someone come in and help us today, and we've now resolved
our issues, mostly with the help of tools we already had. A memory
profiler led us to three memory leaks (essentially, places where objects
hooked themselves up to static events and never unhooked themselves).
But that wasn't the cause...
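
A rough illustration of this kind of leak, sketched in Python rather than .NET: a process-wide (static) handler list holds a strong reference to every subscriber, so an object that subscribes and never unsubscribes can never be collected. `StaticEvent`, `PageHelper` and `is_leaked` are hypothetical names; in .NET the corresponding fix would be a matching `-=` in C# (or `RemoveHandler` in VB.NET) once the object is finished.

```python
import gc
import weakref


class StaticEvent:
    """Hypothetical stand-in for a static .NET event: one class-level
    handler list that lives as long as the process."""
    _handlers = []

    @classmethod
    def subscribe(cls, handler):
        cls._handlers.append(handler)

    @classmethod
    def unsubscribe(cls, handler):
        cls._handlers.remove(handler)

    @classmethod
    def raise_event(cls, arg):
        for handler in list(cls._handlers):
            handler(arg)


class PageHelper:
    """Short-lived, per-request object that listens to the static event."""

    def __init__(self):
        self.seen = []
        # The bound method keeps a strong reference to self, so from here on
        # the static handler list keeps this object alive.
        StaticEvent.subscribe(self.on_changed)

    def on_changed(self, arg):
        self.seen.append(arg)

    def close(self):
        # The fix: unhook once the object's work is done.
        StaticEvent.unsubscribe(self.on_changed)


def is_leaked(unhook):
    """Create a helper, drop our reference, and report whether it survived."""
    helper = PageHelper()
    probe = weakref.ref(helper)
    if unhook:
        helper.close()
    del helper
    gc.collect()
    return probe() is not None
```

`is_leaked(unhook=False)` returns `True` because the handler list still references the helper; `is_leaked(unhook=True)` returns `False`.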

We also found some older database access code that was using the
"single connection, SyncLock it and pass it around" approach to
database access, and some of the code was spending a fair amount of time
there. We thought we'd migrated all of that code, but the single shared
connection was still available and some code was still using it. We
think this was the major cause...
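
Why a single locked connection hurts under load can be sketched as follows, in Python with hypothetical `FakeConnection`, `SharedConnection` and `max_overlap` names and `time.sleep` standing in for a query. With one shared connection behind a lock (the role `SyncLock` plays in VB.NET), concurrent requests queue up and at most one query runs at a time; with a connection per unit of work, queries can overlap.

```python
import threading
import time
from functools import partial


class Stats:
    """Tracks how many simulated queries are running at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.max_active = 0

    def enter(self):
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)

    def leave(self):
        with self._lock:
            self.active -= 1


class FakeConnection:
    """Hypothetical connection; execute() sleeps to simulate a query."""

    def __init__(self, stats):
        self.stats = stats

    def execute(self, seconds):
        self.stats.enter()
        try:
            time.sleep(seconds)
        finally:
            self.stats.leave()


class SharedConnection:
    """Old pattern: one connection for the whole app, guarded by a lock."""

    def __init__(self, stats):
        self._conn = FakeConnection(stats)
        self._lock = threading.Lock()

    def query(self, seconds):
        with self._lock:  # every concurrent request queues up here
            self._conn.execute(seconds)


def query_own_connection(stats, seconds):
    """New pattern: grab a connection, do the work, let it go."""
    FakeConnection(stats).execute(seconds)


def max_overlap(shared, requests=4, seconds=0.1):
    """Run `requests` concurrent requests; return the peak query overlap."""
    stats = Stats()
    if shared:
        work = partial(SharedConnection(stats).query, seconds)
    else:
        work = partial(query_own_connection, stats, seconds)
    threads = [threading.Thread(target=work) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats.max_active
```

`max_overlap(shared=True)` is always 1, while `max_overlap(shared=False)` will normally report several queries running at once.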

Once we'd switched to "grab a connection, do your work, close the
connection", we started getting connection pool timeouts. It turns out
that two pieces of code were doing "grab a connection, do your work,
return" without ever closing the connection, which was quickly
exhausting the connection pool. This may also have contributed to the
problems we were experiencing.
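
The pool exhaustion can be reproduced with a minimal bounded pool. This is a Python sketch, not ADO.NET's actual implementation, and `ConnectionPool`, `leaky_request`, `correct_request` and `requests_before_timeout` are hypothetical names. Like a real pool, it hands out at most `size` connections and gives up after a timeout when none is free. A request that never returns its connection permanently removes one from the pool; wrapping the work so the connection is always released (a `try`/`finally`, or C#'s `using` block, in .NET terms) avoids that.

```python
import queue
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Raised when no pooled connection becomes free in time."""


class ConnectionPool:
    """Minimal bounded pool: at most `size` connections exist, and
    acquire() waits up to `timeout` seconds for a free one."""

    def __init__(self, size, timeout):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")
        self._timeout = timeout

    def acquire(self):
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout("timed out waiting for a pooled connection") from None

    def release(self, conn):
        self._free.put(conn)

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)  # returned even if the work raises


def leaky_request(pool):
    """Bug: grab a connection, do the work, return without releasing it."""
    conn = pool.acquire()
    # ... work with conn ...
    return


def correct_request(pool):
    """Fix: the connection goes back to the pool however the work ends."""
    with pool.connection() as conn:
        pass  # ... work with conn ...


def requests_before_timeout(pool, request, limit):
    """Issue up to `limit` sequential requests; count how many succeed
    before the first pool timeout."""
    for served in range(limit):
        try:
            request(pool)
        except PoolTimeout:
            return served
    return limit
```

With a pool of three, `requests_before_timeout(ConnectionPool(3, 0.05), leaky_request, 10)` returns 3 (the fourth request times out), while the same call with `correct_request` serves all 10.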

We also found a few minor bugs, but I don't think any of them were
contributing.

The main thing I'm still trying to work out is why we weren't seeing the
connection pool timeouts in the past.

But we're now a lot happier that our code is "correct", and the speed
issues have gone away. I'm also a lot more comfortable using a profiler
now, both for performance and for memory usage.

Sorry if anyone spent time trying to work this out.

Damien
 