I have a large multi-threaded C# application running on a multi-core 4-way server. Currently we're using "server mode" garbage collection. However, testing has shown that workstation mode GC is quicker.
MSDN says:
Managed code applications that use the server API receive significant benefits from using the server-optimized garbage collector (GC) instead of the default workstation GC.
Workstation is the default GC mode and the only one available on single-processor computers. Workstation GC is hosted in console and Windows Forms applications. It performs full (generation 2) collections concurrently with the running program, thereby minimizing latency. This mode is useful for client applications, where perceived performance is usually more important than raw throughput.
The server GC is available only on multiprocessor computers. It creates a separate managed heap and thread for each processor and performs collections in parallel. During collection, all managed threads are paused (threads running native code are paused only when the native call returns). In this way, the server GC mode maximizes throughput (the number of requests per second) and improves performance as the number of processors increases. Performance especially shines on computers with four or more processors.
But we're not seeing performance shine!!!! Has anyone got any advice?
It's not explained very well, but as far as I can tell, server mode pauses the managed threads and collects in parallel, one heap per core, while workstation mode collects concurrently with the running application.
In other words, the workstation mode is intended for a small number of long running applications that need consistent performance. The garbage collection tries to "stay out of the way" but, as a result, is less efficient on average.
The server mode is intended for applications where each "job" is relatively short-lived and handled by a single core (edit: think multi-threaded web server). The idea is that each "job" gets all the CPU power and gets done quickly, but that occasionally the core stops handling requests and cleans up memory. So in this case the hope is that GC is more efficient on average, but the core is unavailable while it's running, so the application needs to be able to adapt to that.
In your case it sounds like, because you have a single application whose threads are relatively coupled, you're fitting better into the model expected by the first mode rather than the second.
But that's all just after-the-fact justification. Measure your system's performance (as ammoQ said, not your GC performance, but how well your application behaves) and use what you measure to decide what is best.
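To put some numbers behind that, a rough way to compare the two modes is to run a representative workload under each configuration and record elapsed time plus collection counts. A minimal sketch follows; RunWorkload is a placeholder for your real request mix, not part of the original question:

using System;
using System.Diagnostics;

class GcComparison
{
    static void Main()
    {
        // Snapshot collection counts before the workload.
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);
        Stopwatch sw = Stopwatch.StartNew();

        RunWorkload(); // placeholder: drive your real application here

        sw.Stop();
        Console.WriteLine("Elapsed: {0}", sw.Elapsed);
        Console.WriteLine("Gen0: {0}, Gen1: {1}, Gen2: {2}",
            GC.CollectionCount(0) - gen0,
            GC.CollectionCount(1) - gen1,
            GC.CollectionCount(2) - gen2);
        Console.WriteLine("Heap after workload: {0} MB", GC.GetTotalMemory(false) / (1024 * 1024));
    }

    static void RunWorkload()
    {
        // Stand-in allocation loop; replace with the application's actual work.
        for (int i = 0; i < 1000000; i++)
        {
            string s = new string('x', 100);
        }
    }
}

Run it once with <gcServer enabled="true"/> and once without, under realistic load, and compare the numbers rather than guessing.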
.NET 4.5 introduces concurrent server garbage collection.
http://msdn.microsoft.com/en-us/library/ee787088.aspx
specify <gcServer enabled="true"/>
specify <gcConcurrent enabled="true"/> (this is the default so can be omitted)
And there is the new SustainedLowLatency mode:
In the .NET Framework 4.5, SustainedLowLatency mode is available for both workstation and server GC. To turn it on, set the GCSettings.LatencyMode property to GCLatencyMode.SustainedLowLatency.
Server: Your program is the only significant application on the machine and needs the lowest possible latency for GCs.
Workstation: You have a UI or share the machine with other important processes.
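For illustration, switching to the sustained low-latency mode in code could look something like this; a minimal sketch using the property and enum named above, with the placement at startup being just an assumption:

using System;
using System.Runtime;

class Program
{
    static void Main()
    {
        // Ask the GC to sustain low-latency behaviour for the life of the process.
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
        Console.WriteLine("Latency mode now: " + GCSettings.LatencyMode);

        // ... run the server / application workload here ...
    }
}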
I have a test of my DB engine on .NET 6 where I compare performance under different GC settings. In general, I did not see more than a 300% improvement, and that was with 6 heaps.
https://vimeo.com/711964445
runtimeconfig.template.json
{
  "configProperties": {
    "System.GC.HeapHardLimit": 8000000000,
    "System.GC.Server": true,
    "System.GC.HeapCount": 6
  }
}
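To double-check at runtime which settings actually took effect, GCSettings can be queried; a small sketch:

using System;
using System.Runtime;

class GcInfo
{
    static void Main()
    {
        // Confirms whether server GC is in effect and what the runtime sees.
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
        Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
    }
}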
Some facts:
We have developed a WCF service that acts as a layer between clients and the database.
It's self-hosted and runs as a Windows service.
The service keeps several caches, the largest of which are about 1-2 GB in memory. Total memory usage is usually about 5-8 GB.
Connections are duplex, use the TCP protocol, and serialization is done with protobuf-net. Our connected client count usually ranges from 1000-1500.
The server is an 8-core Xeon of a newish model with 64 GB of memory and runs nothing other than the service.
The problem: After some amount of time, anywhere from a day to a week, the service gets extremely slow. Requests that normally take 0.5 seconds can take over a minute. This behaviour goes on for 15-40 minutes or until the service is restarted.
What we have done:
We have checked the network and the network connection to the server and there is no problem. CPU utilization goes up somewhat during this time, e.g. from a 30% average to a 40-50% average.
We have taken memory dumps and there are no logical locks in code that block the users, and not much activity at all.
Our latest lead is the garbage collector. In perfmon we can see that "% time in GC" is constantly over 90% (90-97%) and the collection counts rise, both gen 0 and gen 1. We suspect there is a blocking gen 2 collection running as well, but we had to restart the service as this is in production, so it didn't count up during the 5-minute window we ran perfmon. Memory usage was 7.6 GB.
Note: the number of outstanding calls rises, so the calls get there but the service does not handle them.
My questions are: can the garbage collector get into a state where it runs and blocks constantly for over 15 minutes? Or is the problem probably related to some other issue?
Our service ran GC in workstation mode with latency mode: Interactive.
We have now changed this to Server and SustainedLowLatency and hope this will help somewhat. Is there anything else we can do if it's the garbage collector?
Edit: The large memory usage is by design; the data in the caches really is that large, and there is plenty more memory available.
Excessive garbage collection is often caused by code issues. You either create too many objects in a short time, or you keep allocating memory without releasing it.
There is actually an extensive checklist available on MSDN that should help you diagnose the problem.
A very large gen 2 heap means that the objects in it survived multiple garbage collections, which means they are kept in memory for a longer period of time. That could be the root cause of your issue. Maybe there is a caching mechanism that could use some tuning or a retention policy (remove data that isn't used for a long time).
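If the cache turns out to be the culprit, a retention policy can be as simple as a sliding expiration. A rough sketch using System.Runtime.Caching follows, assuming that assembly is referenced; the GetOrLoad helper and the 30-minute window are made-up illustrations, not part of the original service:

using System;
using System.Runtime.Caching;

static class SlidingCache
{
    static readonly MemoryCache Cache = MemoryCache.Default;

    public static object GetOrLoad(string key, Func<object> load)
    {
        object value = Cache.Get(key);
        if (value == null)
        {
            value = load();
            // Evict entries that have not been touched for 30 minutes.
            Cache.Set(key, value, new CacheItemPolicy
            {
                SlidingExpiration = TimeSpan.FromMinutes(30)
            });
        }
        return value;
    }
}

Anything not read within the window gets evicted, so long-lived gen 2 objects that nobody uses don't keep piling up.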
I have a similar situation. Large database data cache in a service using protobuf with WCF for client communication. The cache is not purely just for clients, the business layer uses the cache to perform operations. The memory footprint of the service can be anywhere between 2 and 10 GB. I release a segment of the cache after 8 hours of inactivity. The machine has 8 virtual cores and 32 GB of memory. I am using .Net 4.5.1.
The GC would consume 98% of the CPU for an hour as soon as I loaded the cache from the database. The interesting point here is that in both our cases there is no memory pressure whatsoever.
I think the GC runs regardless because something was changed so that the GC tries to keep memory available for all threads. Since one thread allocated a large amount of memory when loading the cache, the GC kicked in. I had to do several things to fix it:
1) Removed Tuples from the cache. I was using them as dictionary keys and their implementation of structural equality is horrible. It compares all properties as objects, so there is a lot of boxing going on for properties that are value types, and those boxed values have to be garbage collected at some point.
2) When replacing the Tuples used as keys, I could not simply swap in structures without implementing Equals, as the default value comparison uses reflection and is too expensive, so I ended up creating a generic Pair structure. I used structures to reduce the number of objects when they were stored in arrays.
3) To remove the Tuples I had to create my own Pair structure that compares the properties using the default equality comparer for the property types. Essentially the same thing that PowerCollections created; a rough sketch is below.
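For reference, a rough sketch of what such a structure might look like (illustrative only, not the actual PowerCollections type):

using System;
using System.Collections.Generic;

// Value-type pair with value-based equality and no boxing in the common case,
// suitable as a dictionary key in place of Tuple<TFirst, TSecond>.
public struct Pair<TFirst, TSecond> : IEquatable<Pair<TFirst, TSecond>>
{
    public readonly TFirst First;
    public readonly TSecond Second;

    public Pair(TFirst first, TSecond second)
    {
        First = first;
        Second = second;
    }

    public bool Equals(Pair<TFirst, TSecond> other)
    {
        // EqualityComparer<T>.Default avoids boxing for value types that implement IEquatable<T>.
        return EqualityComparer<TFirst>.Default.Equals(First, other.First)
            && EqualityComparer<TSecond>.Default.Equals(Second, other.Second);
    }

    public override bool Equals(object obj)
    {
        return obj is Pair<TFirst, TSecond> && Equals((Pair<TFirst, TSecond>)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int h1 = EqualityComparer<TFirst>.Default.GetHashCode(First);
            int h2 = EqualityComparer<TSecond>.Default.GetHashCode(Second);
            return (h1 * 397) ^ h2;
        }
    }
}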
I have a C# server developed in both Visual Studio 2010 and MonoDevelop 2.8, targeting .NET Framework 4.0.
It looks like this server behaves much better (in terms of scalability) on Windows than on Linux.
I tested the server's scalability on native Windows (12 physical cores), and on 8- and 12-core Windows and Ubuntu virtual machines, using Apache's ab tool.
The Windows response time is pretty much flat. It only starts climbing when the concurrency level approaches or exceeds the number of cores.
For some reason the Linux response times are much worse. They grow pretty much linearly starting from a concurrency level of 5, and the 8- and 12-core Linux VMs behave similarly.
So my question is: why does it perform worse on Linux? (And how can I fix that?)
Please take a look at the graph attached; it shows the average time to fulfill 75% of the requests as a function of the request concurrency (the range bars are set at 50% and 100%).
I have a feeling that this might be due to Mono's garbage collector. I tried playing around with the GC settings but had no success. Any suggestions?
Some additional background information: the server is based on an HTTP listener that quickly parses the requests and queues them on a thread pool. The thread pool takes care of replying to those requests with some intensive math (computing an answer in ~10 seconds).
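For context, a minimal sketch of the kind of setup described, an HttpListener front end handing work to the ThreadPool; the prefix, port and Compute method are placeholders, not the actual server code:

using System;
using System.Net;
using System.Threading;

class Server
{
    static void Main()
    {
        HttpListener listener = new HttpListener();
        listener.Prefixes.Add("http://+:8080/");
        listener.Start();

        while (true)
        {
            // Block until a request arrives, then hand it off immediately.
            HttpListenerContext context = listener.GetContext();
            ThreadPool.QueueUserWorkItem(_ =>
            {
                // Placeholder for the ~10 s of intensive math described above.
                string answer = Compute(context.Request.RawUrl);
                byte[] body = System.Text.Encoding.UTF8.GetBytes(answer);
                context.Response.OutputStream.Write(body, 0, body.Length);
                context.Response.Close();
            });
        }
    }

    static string Compute(string input)
    {
        // Stand-in for the real computation.
        return "result for " + input;
    }
}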
You need to isolate where the problem is first. Start by monitoring your memory usage with HeapShot. If it's not memory, then profile your code to pinpoint the time consuming methods.
This page, Performance Tips: Writing better performing .NET and Mono applications, contains some useful information including using the mono profiler.
Excessive String manipulation and Boxing are often 'hidden' culprits of code that doesn't scale well.
Try the sgen garbage collector (and for that, Mono 2.11.x is recommended). Look at the mono man page for more details.
I don't believe it's because of the GC. AFAIK the GC side effects should be more or less evenly distributed across the threads.
My blind guess is: you can fix it by playing with the ThreadPool.SetMinThreads/SetMaxThreads APIs.
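Something along these lines, for instance; the worker count of 50 is just a number to experiment with, not a recommendation:

using System;
using System.Threading;

class ThreadPoolTuning
{
    static void Main()
    {
        int workers, iocp;
        ThreadPool.GetMinThreads(out workers, out iocp);
        Console.WriteLine("Current minimums: {0} workers, {1} IOCP threads", workers, iocp);

        // Pre-grow the pool so bursts of requests don't wait on the pool's
        // gradual thread-injection heuristic before new threads are created.
        ThreadPool.SetMinThreads(50, iocp);
    }
}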
I would strongly recommend you do some profiling of the code with regard to how long individual methods are running. It's quite likely that you're seeing some locking or similar multi-threading difficulties that are not being handled perfectly by Mono. CPU and RAM usage figures would also help.
I believe this may be the same problem that we tracked down involving the thread pool and the starting behavior for new threads, as well as a bug in the mono implementation of setMinThreads. Please see my answer on that thread for more information: https://stackoverflow.com/a/12371795/1663096
If your code throws a lot of exceptions, then Mono is 10x faster than .NET.
This is a VERY open question.
Basically, I have a computing application that launches test combinations for N Scenarios.
Each test is conducted in a single dedicated thread, and involves reading large binary data, processing it, and dropping results to DB.
If the number of threads is too large, the app goes rogue, eats up all available memory, and hangs.
What is the most efficient way to exploit all CPU+RAM capabilities (high-performance computing, i.e. 12 cores / 16 GB RAM) without bringing the system to its knees (which happens if "too many" simultaneous threads are launched, "too many" being a relative notion of course)?
I have to specify that I have a worker buffer queue with N workers; every time one finishes and dies, a new one is launched via the queue. This works pretty fine as of now. But I would like to avoid "manually" and "empirically" setting the number of simultaneous threads, and instead have an intelligent, scalable system that launches as many threads at a time as the system can properly handle and stops at a "reasonable" memory usage (the target server is dedicated to the app, so there is no problem regarding other applications, except the system itself).
PS: I know that .NET 3.5 comes with thread pools and .NET 4 has interesting TPL capabilities, which I am still considering right now (I never went very deep into this so far).
PS 2: After reading this post I was a bit puzzled by the "don't do this" answers. Though I think such a request is fair for a memory-demanding computing program.
EDIT
After reading this post I will try to use WMI features.
None of the built-in threading capabilities in .NET support adjusting according to memory usage. You need to build this yourself.
You can either predict memory usage or react to low memory conditions. Alternatives:
Look at the amount of free memory on the system before launching a new task. If it is below 500 MB, wait until enough has been freed.
Launch tasks as they come and throttle as soon as some of them start to fail because of OOM. Restart them later. This alternative sucks big time, because your process will do garbage collections like crazy to avoid the OOMs.
I recommend (1).
You can either look at free system memory or your own process's memory usage. In order to get the memory usage, I recommend looking at private bytes using the Process class; a rough sketch of option (1) follows below.
If you set aside 1 GB of buffer on your 16 GB system you run at 94% efficiency and are pretty safe.
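A minimal sketch of alternative (1), gating new work on free physical memory and also reporting the process's own private bytes; the 500 MB threshold mirrors the suggestion above, and MemoryGate/LaunchTask are made-up names:

using System;
using System.Diagnostics;
using System.Threading;

class MemoryGate
{
    // "Available MBytes" reports free physical memory for the whole machine.
    static readonly PerformanceCounter AvailableMb =
        new PerformanceCounter("Memory", "Available MBytes");

    // Block until the machine has at least minimumFreeMb of free physical memory.
    public static void WaitForHeadroom(int minimumFreeMb)
    {
        while (AvailableMb.NextValue() < minimumFreeMb)
        {
            Thread.Sleep(1000); // let running tasks finish and free memory
        }
    }

    public static void LaunchTask(Action work)
    {
        WaitForHeadroom(500); // the 500 MB threshold from alternative (1)

        // Private bytes of this process, as suggested for tracking your own usage.
        long privateBytes = Process.GetCurrentProcess().PrivateMemorySize64;
        Console.WriteLine("Private bytes before launch: {0} MB", privateBytes / (1024 * 1024));

        ThreadPool.QueueUserWorkItem(_ => work());
    }
}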
We have a 64-bit C#/.NET 3.0 application that runs on a 64-bit Windows server. From time to time the app can use a large amount of memory, which is available. In some instances the application stops allocating additional memory and slows down significantly (500+ times slower). When I check the memory from the task manager, the amount of memory used barely changes. The application keeps on running very slowly and never gives an out of memory exception.
Any ideas? Let me know if more data is needed.
You might try enabling server mode for the Garbage Collector. By default, all .NET apps run in Workstation Mode, where the GC tries to do its sweeps while keeping the application running. If you turn on server mode, it temporarily stops the application so that it can free up memory (much) faster, and it also uses different heaps for each processor/core.
Most server apps will see a performance improvement using the GC server mode, especially if they allocate a lot of memory. The downside is that your app will basically stall when it starts to run out of memory (until the GC is finished).
To enable this mode, insert the following into your app.config or web.config:
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
The moment you are hitting the physical memory limit, the OS will start paging (that is, write memory to disk). This will indeed cause the kind of slowdown you are seeing.
Solutions?
Add more memory - this will only help until you hit the new memory limit
Rewrite your app to use less memory
Figure out if you have a memory leak and fix it
If memory is not the issue, perhaps your application is hitting CPU very hard? Do you see the CPU hitting close to 100%? If so, check for large collections that are being iterated over and over.
As with 32-bit Windows operating systems, there is a 2GB limit on the size of an object you can create while running a 64-bit managed application on a 64-bit Windows operating system.
Investigating Memory Issues (MSDN article)
There is an awful lot of good stuff mentioned in the other answers. However, I'm going to chip in my two pence (or cents - depending on where you're from!) anyway.
Assuming that this is indeed a 64-bit process as you have stated, here's a few avenues of investigation...
Which memory usage are you checking? Mem Usage or VMem Size? VMem size is the one that actually matters, since that applies to both paged and non-paged memory. If the two numbers are far out of whack, then the memory usage is indeed the cause of the slow-down.
What's the actual memory usage across the whole server when things start to slow down? Does the slow down also apply to other apps? If so, then you may have a kernel memory issue - which can be due to huge amounts of disk accessing and low-level resource usage (for example, create 20000 mutexes, or load a few thousand bitmaps via code that uses Win32 HBitmaps). You can get some indication of this on the Task Manager (although Windows 2003's version is more informative directly on this than 2008's).
When you say that the app gets significantly slower, how do you know? Are you using vast dictionaries or lists? Could it not just be that the internal data structures are getting so big as to complicate the work any internal algorithms are performing? When you get to huge numbers, some algorithms can start to become slower by orders of magnitude.
What's the CPU load of the application when it's running at full pelt? Is it actually the same as when the slow-down occurs? If the CPU usage decreases as the memory usage goes up, then whatever it's doing is taking the OS longer to fulfill, meaning that it's probably putting too much load on the OS. If there's no difference in CPU load, then my guess is that it's the internal data structures getting so big as to slow down your algorithms.
I would certainly be looking at running a Perfmon on the application - starting off with some .Net and native memory counters, Cache hits and misses, and Disk Queue length. Run it over the course of the application from startup to when it starts to run like an asthmatic tortoise, and you might just get a clue from that as well.
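If you want to watch the GC counter from code rather than the Perfmon UI, it can be polled directly. A minimal sketch using the standard ".NET CLR Memory" category and "% Time in GC" counter, with the instance name assumed to be the process name:

using System;
using System.Diagnostics;
using System.Threading;

class GcCounterSample
{
    static void Main()
    {
        string instance = Process.GetCurrentProcess().ProcessName;
        using (PerformanceCounter timeInGc =
            new PerformanceCounter(".NET CLR Memory", "% Time in GC", instance))
        {
            for (int i = 0; i < 10; i++)
            {
                // The first NextValue() reading is usually 0; sample repeatedly.
                Console.WriteLine("% Time in GC: {0:F1}", timeInGc.NextValue());
                Thread.Sleep(1000);
            }
        }
    }
}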
Having skimmed through the other answers, I'd say there's a lot of good ideas. Here's one I didn't see:
Get a memory profiler, such as SciTech's MemProfiler. It will tell you what's being allocated, by what, and it will show you the whole slice n dice.
It also has video tutorials in case you don't know how to use it. In my case, I discovered I had IDisposable instances that I wasn't Using(...)
I have a C# Windows service acting as a server; the service holds some large (>8 GB) data structures in memory and exposes search methods to clients via remoting.
The average search operation executes in <200 ms and the service handles up to 20 requests/sec.
I'm noticing some serious performance degradation (>6000 ms) on a regular basis, for a few seconds at a time.
My best guess is that the server threads are stopped by a gen2 garbage collection from time to time.
I'm considering switching from server gc to workstation gc and wrap my search method in this to prevent GC during requests.
protected static void DoLowLatencyAction(Action action)
{
    GCLatencyMode oldMode = GCSettings.LatencyMode;
    try
    {
        GCSettings.LatencyMode = GCLatencyMode.LowLatency;
        // perform time-sensitive actions here
        action();
    }
    finally
    {
        GCSettings.LatencyMode = oldMode;
    }
}
Is this a good idea?
Under what conditions will the GC run anyway inside the low-latency block?
Note: I'm running on an x64 server with 8 cores.
Thanks
I've not used GCLatencyMode before so cannot comment on whether using it is a good idea or not.
However, are you sure you are using server GC? By default, Windows services use workstation GC.
I've had a similar problem before in a Windows service, and setting server GC mode using:
<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
in the service's app.config file solved it.
Have a read of this post from Tess Ferrandez's blog for more details.
I am surprised that your search method even triggers a GC. If the 8 GB data structure is static (not modified a lot by adding to or removing from it) and all you do is search it, then you should just try to avoid allocating temporary objects within your search method (if you can). Creating objects is what triggers a GC, and if you have a data structure that is rarely modified, then it makes sense to avoid the GC altogether (or delay it as much as possible).
What GCLatencyMode.LowLatency does is give a hint to the GC to do eager collections so that gen 2 collections are avoided while LowLatency mode is set. A side effect is that the GC becomes eager to collect outside the region where LowLatency mode is set, which could result in lower performance (in that case you can try server GC mode).
This doesn't sound like a great idea. At some point the GC will ignore your hint and do the collection anyway.
I believe you will have much better success by actually profiling and optimizing your service. Run PerfView (a free, awesome Microsoft tool) on your server while it is running. See who owns the troublesome objects and how long specific long-running GC events are taking.