GC having spikes that take more then 60% cpu time

G

Gotisch

Hi,

we are currently writing an mostly singlethreaded (game)server
application in c#. But we are encountering problems with the Garbage
Collector: After a certain time but also apparently randomly the GC
starts working Alot and taking up to 60-70% of the cpu time, which
results in huge lags in responce time of the server.

The server uses (depending on the setting) around 350-800 MB of memory
( the servers we tested this on had from 1gb to 3gb of memory) most of
which is in gen2 heap. That itself is inevitably since its loads
spawners (object that will create mobiles or gameobjects over the time)
and gameobjects / mobiles into the world.

Now we think the problem might be comming from the gc having to move
through those when a GC occurs. ( Can you clear me up on this point?
afaik worst case the gc has to check all objects to see if they hold a
reference to objects on gen0, or am i misunderstanding this?)

Removing the long-living objects ofcourse stops this problem, but then
the server takes nearly no memory at all.

Another weird thing is, that our gen0 seems to be maxed out all the
time. This could come from the network part, which stores incomming
data into a packet that is enqueued and then processed by the mainloop
every 20 milliseconds(for every client). Inserting a GC.GetGeneration
debug output shows that from 50 packets around 5 make it to generation1
and 1 to generation2, could this be cause for the problem?

I doubt this though because the spikes don't occur at the same time a
gen2 collection occurs. The count of gen0 - gen1 - gen2 isnt that bad
either its approximatly 1000 - 500 - 1.

We though moving all the very long living objects that stay through the
whole applications lifetime to the large object heap could avoid those
making problems, but apparently you cant move small objects there
anyway?

Or it could be that gen0 / 1 are to fragmented. From a dump we get:
Gen0 is size 423592 with 71148 free - 16% fragmentation?
Gen1 is size 1134596 with 71160 free - 6% fragmentation?
Gen2 is size 530020272 with 6989736 free - 1.3% fragmentation?

Is it true that fragmentation above 10% is a nono in gen0 heap? if yes
how can we find out what is causing it?

What would really help me are some hints how to figure out what causes
the GC to use that much cpu time. At the moment we can only guess,
which isn't really bringing forth anything.

How should we start investigating what is causing this problem? I
assume there is no way to see what the garbage collector is doing when
he is taking that much cpu? if its marking objects for collection,
moving them or whatever?

We also tried to get dumps from the heap before and after a "spike" but
with about 8 million objects its hard to see any relation between
before and after a spike.

I would be greatfull for any tipps you could supply.

Thank you
 
G

Gotisch

Laura said:
Your memory pattern matches quite nicely to what Rico Mariani calls a
"Mid-life Crisis".
You can check this for some recommendations:
http://blogs.msdn.com/ricom/archive/2003/12/04/41281.aspx

In his article he writes:

"To get overall promotion rates, use the GC Performance counters, there
are counters that will tell you how much stuff is getting promoted into
generation 2. You want that number to be as small as possible - zero
is ideal and even achievable in steady state, but as long as the rate
of generation 2 collects is staying low, you'll be fine."

We have like 298245 gen0 collections 100455 gen1 collections and 1751
gen2 collections at the moment. So i cant really say we have to many
gen2 collections. It more looks like we have to many gen1 collections.
but those shouldn't be that time consuming should they? 300 - 100 - 2
is the proportion so this either means we dont have enough gen0
collections or to many gen1 ( i read somewhere the perfect proportion
is (100gen0 10gen1 1gen2). But gen1 collections are not
"midlife-crisis" collections are they?
 
N

Nicholas Paldino [.NET/C# MVP]

Gotisch,

Something you said in your original post stood out to me. You said that
your app is a single threaded game server application. Is your server
running on a machine where you can set the runtime to gc on a separate
thread (in other words, not the workstation edition of the CLR). You might
be able to amortize the costs of GC out if you enable this (given that it is
possible given the hardware and OS that your app is running on).

Hope this helps.
 
C

Christof Nordiek

Gotisch said:
In his article he writes:

"To get overall promotion rates, use the GC Performance counters, there
are counters that will tell you how much stuff is getting promoted into
generation 2. You want that number to be as small as possible - zero
is ideal and even achievable in steady state, but as long as the rate
of generation 2 collects is staying low, you'll be fine."

We have like 298245 gen0 collections 100455 gen1 collections and 1751
gen2 collections at the moment. So i cant really say we have to many
gen2 collections. It more looks like we have to many gen1 collections.
but those shouldn't be that time consuming should they? 300 - 100 - 2
is the proportion so this either means we dont have enough gen0
collections or to many gen1 ( i read somewhere the perfect proportion
is (100gen0 10gen1 1gen2). But gen1 collections are not
"midlife-crisis" collections are they?
Hi Gotisch,

reading the article and the passage you cited the problem seemes to be the
number
of objects promoted to gen2, not the number of gen2 collections.

in your first post you are talking about packets with incoming data that can
survive to gen2.
this could be the mid-life-object rico is talking about.
You should look if they are referencing any objects wich aren't needed
anymore. this references should
be nulled prior to enqueuing the packets.

hth
 
W

Willy Denoyette [MVP]

| Hi,
|
| we are currently writing an mostly singlethreaded (game)server
| application in c#. But we are encountering problems with the Garbage
| Collector: After a certain time but also apparently randomly the GC
| starts working Alot and taking up to 60-70% of the cpu time, which
| results in huge lags in responce time of the server.
|
| The server uses (depending on the setting) around 350-800 MB of memory
| ( the servers we tested this on had from 1gb to 3gb of memory) most of
| which is in gen2 heap. That itself is inevitably since its loads
| spawners (object that will create mobiles or gameobjects over the time)
| and gameobjects / mobiles into the world.
|
| Now we think the problem might be comming from the gc having to move
| through those when a GC occurs. ( Can you clear me up on this point?
| afaik worst case the gc has to check all objects to see if they hold a
| reference to objects on gen0, or am i misunderstanding this?)
|
| Removing the long-living objects ofcourse stops this problem, but then
| the server takes nearly no memory at all.
|
| Another weird thing is, that our gen0 seems to be maxed out all the
| time. This could come from the network part, which stores incomming
| data into a packet that is enqueued and then processed by the mainloop
| every 20 milliseconds(for every client). Inserting a GC.GetGeneration
| debug output shows that from 50 packets around 5 make it to generation1
| and 1 to generation2, could this be cause for the problem?
|
| I doubt this though because the spikes don't occur at the same time a
| gen2 collection occurs. The count of gen0 - gen1 - gen2 isnt that bad
| either its approximatly 1000 - 500 - 1.
|
| We though moving all the very long living objects that stay through the
| whole applications lifetime to the large object heap could avoid those
| making problems, but apparently you cant move small objects there
| anyway?
|
| Or it could be that gen0 / 1 are to fragmented. From a dump we get:
| Gen0 is size 423592 with 71148 free - 16% fragmentation?
| Gen1 is size 1134596 with 71160 free - 6% fragmentation?
| Gen2 is size 530020272 with 6989736 free - 1.3% fragmentation?
|
| Is it true that fragmentation above 10% is a nono in gen0 heap? if yes
| how can we find out what is causing it?
|


| What would really help me are some hints how to figure out what causes
| the GC to use that much cpu time. At the moment we can only guess,
| which isn't really bringing forth anything.
|
| How should we start investigating what is causing this problem? I
| assume there is no way to see what the garbage collector is doing when
| he is taking that much cpu? if its marking objects for collection,
| moving them or whatever?
|
| We also tried to get dumps from the heap before and after a "spike" but
| with about 8 million objects its hard to see any relation between
| before and after a spike.
|
| I would be greatfull for any tipps you could supply.
|
| Thank you
|

Well, you have a very high allocation rate, this combined with gen0/gen1
fragmentation, can lead to a higher GC rate than normal because part of the
gen0 cannot be compacted due to the fragmentation. While gen0 collections
are cheap, gen2 collections are expensive, especially when a lot of objects
reach gen2 where they die. So it's not uncommon to see spikes in % Time in
GC reaching 70%.
First thing you should inspect is the # Induced GC's, this counter should be
at 0 (preferably), to keep this counter as low as possible:
- don't ever call GC.Collect() or GC.Collect(2) or GC.Collect(2)
- Don't call GC methods like GC.GetTotalMemory(true), that force GC sweeps
(these are needed to update the counters).

Second thing you should take care of is the pinned object allocations, try
to pre-allocate the buffers from the LOH if you can, or you need to change
your design such that you release pinned buffers as soon as possible (watch
out for asynchronous receives).

Don't know what server you are running this on, but from what I see from
your usage scenario, you should not run this on a single CPU box, you will
need at least two CPU's possibly more, speed is not as important as the
number of.
Also, you should not run this on V1.1, you need v2.

Willy.
 
C

Chris Mullins

Gotisch said:
we are currently writing an mostly singlethreaded (game)server
application in c#.

Ouch.

You may want to give some serious thought to re-thinking this approach. A
canonical example of "When to use threads" is "when writing a game server".
But we are encountering problems with the Garbage
Collector: After a certain time but also apparently randomly the GC
starts working Alot and taking up to 60-70% of the cpu time, which
results in huge lags in responce time of the server.

Time to memory profile your application.
The server uses (depending on the setting) around 350-800 MB of memory
( the servers we tested this on had from 1gb to 3gb of memory) most of
which is in gen2 heap. That itself is inevitably since its loads
spawners (object that will create mobiles or gameobjects over the time)
and gameobjects / mobiles into the world.

As a quick aside, if you have a number of these Spawners, and their lifetime
is infinite, you may want to try loading them into the Large Object Heap.
 
C

Chris Mullins

[Fragmention only response]

Gotisch said:
Or it could be that gen0 / 1 are to fragmented. From a dump we get:
Gen0 is size 423592 with 71148 free - 16% fragmentation?
Gen1 is size 1134596 with 71160 free - 6% fragmentation?
Gen2 is size 530020272 with 6989736 free - 1.3% fragmentation?

Is it true that fragmentation above 10% is a nono in gen0 heap? if yes
how can we find out what is causing it?

The way your're doing network I/O is almost certainly causing your heap
fragmentation.

The blogs post here:
http://www.coversant.net/dotnetnuke/Coversant/Blogs/tabid/88/EntryID/9/Default.aspx

goes into this in great detail. You're sending memory into the socket for
receive (and for send), and that memory gets pinned due to Interop issues.
Because it's pinned, it can't be compacted and you start to get
fragmentation.
 
C

Chris Mullins

[How to fix the problem response]

Gotisch said:
Or it could be that gen0 / 1 are to fragmented. From a dump we get:
Gen0 is size 423592 with 71148 free - 16% fragmentation?
Gen1 is size 1134596 with 71160 free - 6% fragmentation?
Gen2 is size 530020272 with 6989736 free - 1.3% fragmentation?

What would really help me are some hints how to figure out what causes
the GC to use that much cpu time. At the moment we can only guess,
which isn't really bringing forth anything.

How should we start investigating what is causing this problem?

The best option by far is the Scitech memory profiler. You can find this at:
http://memprofiler.com/

It's going to take you 4 or 5 hours to figure out how make sense of the
output that it gives you, but once you do, your memory issues will jump
right out at ya.
I assume there is no way to see what the garbage collector is doing when
he is taking that much cpu? if its marking objects for collection,
moving them or whatever?

It's probably a combination of many little things:
- forgetting to call dispose on a number of object, so the GC has to
Finalize them instead, thereby decresing performacne
- Probably doing way more memory allocations than you really need to. Adding
in some low-level cache code often helps with memory pressure.
- Pinned object being sent into Windows (via Socket reads/write) causes
fragmentation and makes the GC work much harder.
- Leaking memory. Big systems do this, even in a GC world.
- Allocating too much from the heap - if there's a LOT of small stuff, turn
it into structures that get stack allocated.
We also tried to get dumps from the heap before and after a "spike" but
with about 8 million objects its hard to see any relation between
before and after a spike.

The SciTech profiler has a nice way to compare heap-dumps.
 
C

Chris Mullins

Gotisch said:
we are currently writing an mostly singlethreaded (game)server
application in c#.

Ouch.

You may want to give some serious thought to re-thinking this approach. A
canonical example of "When to use threads" is "when writing a game server".
But we are encountering problems with the Garbage
Collector: After a certain time but also apparently randomly the GC
starts working Alot and taking up to 60-70% of the cpu time, which
results in huge lags in responce time of the server.

Time to memory profile your application.
The server uses (depending on the setting) around 350-800 MB of memory
( the servers we tested this on had from 1gb to 3gb of memory) most of
which is in gen2 heap. That itself is inevitably since its loads
spawners (object that will create mobiles or gameobjects over the time)
and gameobjects / mobiles into the world.

As a quick aside, if you have a number of these Spawners, and their lifetime
is infinite, you may want to try loading them into the Large Object Heap.You
may need to play some weird tricks to do this, but getting them out of the
main Gen0/Gen1/Gen2 heap may help things out.

Just be SURE that you don't get much churn in there, or things will get
worse.
Another weird thing is, that our gen0 seems to be maxed out all the
time. This could come from the network part, which stores incomming
data into a packet that is enqueued and then processed by the mainloop
every 20 milliseconds(for every client).

You're doing this single-threaded? You really, really want to switch over to
an asynchronous model of network programming. You'll gain lots of
performance and your application responsiveness will shoot way up.

Inserting a GC.GetGeneration
debug output shows that from 50 packets around 5 make it to generation1
and 1 to generation2, could this be cause for the problem?

It's certainly contributing to the problem, but unlikley to be the cause.
You really want to avoid these things being promoted into Gen2 if you can
help it, as a Gen2 GC takes a very long time. There are ways to do this:

- pre-create a number of packets at application startup (say 500 or 1000, or
whatever number you think is big enough) and store them in a data structure
of some sort. When new data comes in, check a pre-allocated packet out of
the pool, and copy your data into it. When you're all done processing the
packet, check the packet back into the available pool. Essentially, you just
want to create a Packet Pool. Doing this will make sure the packet pool gets
into the Gen2 heap, and that your standard packets during processing don't
have any impact on GC.
We though moving all the very long living objects that stay through the
whole applications lifetime to the large object heap could avoid those
making problems, but apparently you cant move small objects there
anyway?

Just make your object bigger. That's always easy. It's making them smaller
that's tough! :)
I would be greatfull for any tipps you could supply.

As I posted in another response to this thread, time to break out the
Scitech Memory Profiler...
 
C

Chris Mullins

Nicholas Paldino said:
You said that your app is a single threaded game server application. Is
your server running on a machine where you can set the runtime to gc on a
separate thread (in other words, not the workstation edition of the CLR).
You might be able to amortize the costs of GC out if you enable this
(given that it is possible given the hardware and OS that your app is
running on).

I don't think he's quite far enough along yet.

Until he's "pretty close" in terms of having profiled his application and
tuning it to run on the standard CLR GC, introducing the complexity of the
Server GC will only make things more difficult. I've personally made this
mistake in the past with GC ("Oh - if only I switch over to that CLR that I
read about, all my GC problems will go away!), and regretted it. I ended up
having to fix all the problems anyway, and this just made things more
difficult.
 
G

gotisch

Hi,

i though it might be something with the network code and while doing
research i saw something else that seemed pretty weird:

http://img208.imageshack.us/img208/6929/gcoh5.png

first i ran the app with on beginaccept and everything seemed ok.
but somewhere on the web i read that using multiple beginaccept would
speed up the connecting of other clients.

though when i run it with multiple accepts as you can see the gc cpu
time goes up to a minimum of 2.5% even when the server isnt doing
anything ( 1 connection active). and gen0 heap size is pushed to 3,8mB
while gen1 heap size starts to switch bewteen 400kB and 900kB.
this seems pretty wrong since it makes me lots of gen1 collections (1
every 2-3 seconds)


any ideas on what is causing this?

is it bad to use mutliple beginaccept calls?

thanks for you time.
 
L

Laura T

There is a resolved bug for using multiple BeginAccept with TCPListeners.
(https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=94417)
It's not exactly your case, but as the workaround is to set
AlwaysUseCompletionPortsForAccept true for the socket, you might try it
anyway.

Using multiple accepts should be ok.

Hi,

i though it might be something with the network code and while doing
research i saw something else that seemed pretty weird:

http://img208.imageshack.us/img208/6929/gcoh5.png

first i ran the app with on beginaccept and everything seemed ok.
but somewhere on the web i read that using multiple beginaccept would
speed up the connecting of other clients.

though when i run it with multiple accepts as you can see the gc cpu
time goes up to a minimum of 2.5% even when the server isnt doing
anything ( 1 connection active). and gen0 heap size is pushed to 3,8mB
while gen1 heap size starts to switch bewteen 400kB and 900kB.
this seems pretty wrong since it makes me lots of gen1 collections (1
every 2-3 seconds)


any ideas on what is causing this?

is it bad to use mutliple beginaccept calls?

thanks for you time.
 
M

Mat Polutta

Sounds like you are on the right Track. I had a very similar GC issue
with a TCP socket (C#) app several years ago. I spent much time with
the Intel Profiler, which was available as a free download for a one
month test drive. Ultimately, what I discovered is that the way that I
was instancing my listener objects (or packet objects? - been awhile
since I looked at the code) kept the GC from knowing that these objects
should expire immediately after use. As far as the GC was concerned,
the packet/socket handlers were to live forever, meaning that whatever
data arrived et al lived on as well. Our app used IO Completion Ports.

Good luck. This is the kind of bug that you don't discover until you
are in the final stage of Acceptance Testing with a a production load.
Deadline looming darkly.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top