Control Memory-Hungry Multi-Threaded App - C#

This is a VERY open question.
Basically, I have a computing application that launches test combinations for N Scenarios.
Each test is conducted in a single dedicated thread, and involves reading large binary data, processing it, and dropping results to DB.
If the number of threads is too large, the app goes rogue, eats up all available memory, and hangs.
What is the most efficient way to exploit all CPU and RAM capacity (a high-performance computing box, i.e. 12 cores / 16 GB RAM) without bringing the system to its knees (which happens if "too many" simultaneous threads are launched, "too many" being a relative notion of course)?
I should specify that I have a worker buffer queue with N workers; every time one finishes and dies, a new one is launched via a Queue. This works pretty well as of now. But I would like to avoid setting the number of simultaneous threads "manually" and "empirically", and instead have an intelligent, scalable system that launches as many threads at a time as the system can properly handle and stops at a "reasonable" memory usage (the target server is dedicated to the app, so there is no problem regarding other applications except the system itself).
PS: I know that .NET 3.5 comes with thread pools and .NET 4 has interesting TPL capabilities, which I am still considering right now (I have never gone very deep into this so far).
PS 2: After reading this post I was a bit puzzled by the "don't do this" answers, though I think such a request is fair for a memory-demanding computing program.
EDIT
After reading this post I will try to use WMI features.

None of the built-in threading capabilities in .NET support adjusting concurrency according to memory usage. You need to build this yourself.
You can either predict memory usage or react to low memory conditions. Alternatives:
1. Look at the amount of free memory on the system before launching a new task. If it is below, say, 500 MB, wait until enough has been freed.
2. Launch tasks as they come and throttle as soon as some of them start to fail because of OOM. Restart them later. This alternative sucks big time because your process will do garbage collections like crazy to avoid the OOMs.
I recommend (1).
You can either look at free system memory or your own process's memory usage. In order to get the memory usage I recommend looking at private bytes using the Process class.
If you set aside 1 GB as a buffer on your 16 GB system, you run at about 94% utilization and are pretty safe.
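A minimal sketch of option (1), assuming the standard "Memory / Available MBytes" performance counter; the 500 MB floor, the 15 GB private-bytes cap, the poll interval and the work callback are illustrative numbers, not recommendations:

using System;
using System.Diagnostics;
using System.Threading;

class MemoryGatedLauncher
{
    // System-wide free memory, in MB.
    private readonly PerformanceCounter _freeMem =
        new PerformanceCounter("Memory", "Available MBytes");

    // Block until there is headroom, then run the work item on its own thread.
    public void Launch(Action work)
    {
        while (_freeMem.NextValue() < 500 ||
               Process.GetCurrentProcess().PrivateMemorySize64 > 15L * 1024 * 1024 * 1024)
        {
            Thread.Sleep(1000);   // wait for earlier workers to finish and free memory
        }
        new Thread(() => work()).Start();   // or hand it to your existing worker queue
    }
}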

Related

High-performance TCP Socket programming in .NET C#

I know this topic has been asked about before, and I have read almost all the threads and comments, but I still haven't found the answer to my problem.
I'm working on a high-performance network library that must have a TCP server and client, has to be able to accept even 30,000+ connections, and the throughput has to be as high as possible.
I know very well that I have to use async methods, and I have already implemented and tested all kinds of solutions that I have found.
In my benchmarking, only minimal code was used to avoid any overhead, I used profiling to minimize the CPU load, there is no more room for simple optimization, and on the receiving socket the buffered data was always read, counted and discarded so the socket buffer never filled up completely.
The case is very simple: one TCP socket listens on localhost, another TCP socket connects to the listening socket (from the same program, on the same machine, of course), then one infinite loop starts to send 256 kB packets from the client socket to the server socket.
A timer with a 1000 ms interval prints the byte counters from both sockets to the console to make the bandwidth visible, then resets them for the next measurement.
I've realized the sweet spot is a 256 kB packet size with a 64 kB socket buffer size for maximum throughput.
With the async/await type methods I could reach
~370MB/s (~3.2gbps) on Windows, ~680MB/s (~5.8gbps) on Linux with mono
With the BeginReceive/EndReceive/BeginSend/EndSend type methods I could reach
~580MB/s (~5.0gbps) on Windows, ~9GB/s (~77.3gbps) on Linux with mono
With the SocketAsyncEventArgs/ReceiveAsync/SendAsync type methods I could reach
~1.4GB/s (~12gbps) on Windows, ~1.1GB/s (~9.4gbps) on Linux with mono
Problems are the following:
async/await methods were the slowest, so I will not work with them
The BeginReceive/EndReceive methods start a new async thread together with the BeginAccept/EndAccept methods; under Linux/mono every new instance of the socket was extremely slow (when there were no more threads in the ThreadPool, mono started up new threads, but creating 25 connection instances took about 5 minutes, and creating 50 connections was impossible: the program just stopped doing anything after ~30 connections).
Changing the ThreadPool size did not help at all, and I would not want to change it anyway (it was just a debugging move).
The best solution so far is SocketAsyncEventArgs, which gives the highest throughput on Windows, but on Linux/mono it is slower than on Windows, whereas before it was the opposite.
I've benchmarked both my Windows and Linux machine with iperf,
Windows machine produced ~1GB/s (~8.58gbps), Linux machine produced ~8.5GB/s (~73.0gbps)
The weird thing is that iperf produced a weaker result than my application on Windows, but on Linux its result is much higher.
First of all, I would like to know if the results are normal, or can I get better results with a different solution?
If I decide to use the BeginReceive/EndReceive methods (they produced relatively the highest result on Linux/mono), then how can I fix the threading problem, so that creating connection instances is fast and the stalled state after creating multiple instances is eliminated?
I am continuing with further benchmarks and will share the results if there is anything new.
================================= UPDATE ==================================
I promised code snippets, but after many hours of experimenting the overall code is kind of a mess, so I will just share my experience in case it helps someone.
I had to realize that under Windows 7 the loopback device is slow; I could not get a result higher than 1 GB/s with iperf or NTttcp. Only Windows 8 and newer versions have fast loopback, so I don't care about Windows results anymore until I can test on a newer version. SIO_LOOPBACK_FAST_PATH should be enabled via Socket.IOControl, but it throws an exception on Windows 7.
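For reference, a hedged sketch of enabling that fast loopback path; the control code 0x98000010 is the commonly cited value of SIO_LOOPBACK_FAST_PATH, and the try/catch covers Windows 7 and older where the call throws:

using System;
using System.Net.Sockets;

static void TryEnableFastLoopback(Socket socket)
{
    const int SIO_LOOPBACK_FAST_PATH = unchecked((int)0x98000010);
    try
    {
        socket.IOControl(SIO_LOOPBACK_FAST_PATH, BitConverter.GetBytes(1), null);
    }
    catch (SocketException)
    {
        // Windows 7 and older: fast path not supported, keep the normal loopback.
    }
}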
It turned out the most powerful solution is the Completed-event-based SocketAsyncEventArgs implementation, both on Windows and Linux/Mono. Creating a few thousand client instances never messed up the ThreadPool, and the program did not stop suddenly as I mentioned above. This implementation is very easy on the threading.
Creating 10 connections to the listening socket and feeding data from 10 separate ThreadPool threads through the clients could together produce ~2 GB/s of data traffic on Windows, and ~6 GB/s on Linux/Mono.
Increasing the client connection count did not improve the overall throughput; the total traffic just became distributed among the connections. This might be because the CPU load was at 100% on all cores/threads even with 5, 10 or 200 clients.
I think the overall performance is not bad: 100 clients could produce around ~500 Mbit/s of traffic each. (Of course this was measured over local connections; a real-life scenario on a network would be different.)
The only observation I would share is that experimenting with both the socket in/out buffer sizes and the program read/write buffer sizes/loop cycles affected performance greatly, and very differently on Windows and on Linux/Mono.
On Windows the best performance has been reached with 128kB socket-receive, 32kB socket-send, 16kB program-read and 64kB program-write buffers.
On Linux the previous settings produced very weak performance, but 512kB socket-receive and -send both, 256kB program-read and 128kB program-write buffer sizes worked the best.
Now my only problem is that if I try to create 10,000 connecting sockets, after around 7,005 it just stops creating instances, does not throw any exceptions, and the program keeps running as if there were no problem at all. I don't know how it can exit a specific for loop without a break, but it does.
Any help would be appreciated regarding anything I was talking about!
Because this question gets a lot of views, I decided to post an "answer", although technically it isn't an answer but my final conclusion for now, so I will mark it as the answer.
About the approaches:
The async/await functions tend to produce awaitable async Tasks assigned to the TaskScheduler of the dotnet runtime, so thousands of simultaneous connections, and therefore thousands of read/write operations, will start up thousands of Tasks. As far as I know this creates thousands of state machines stored in RAM and countless context switches on the threads they are assigned to, resulting in very high CPU overhead. With a few connections/async calls it is better balanced, but as the awaitable Task count grows it gets slow quickly.
The BeginReceive/EndReceive/BeginSend/EndSend socket methods are technically async methods with no awaitable Tasks, but with callbacks at the end of the call, which actually handles the multithreading better. Still, the dotnet design of these socket methods is poor in my opinion, but for simple solutions (or a limited number of connections) it is the way to go.
The SocketAsyncEventArgs/ReceiveAsync/SendAsync type of socket implementation is the best on Windows for a reason. It utilizes Windows IOCP in the background to achieve the fastest async socket calls and uses overlapped I/O and a special socket mode. This solution is the "simplest" and fastest under Windows. But under mono/Linux it will never be that fast, because mono emulates Windows IOCP using Linux epoll (which is actually much faster than IOCP), and having to emulate IOCP for dotnet compatibility adds some overhead.
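To make the SocketAsyncEventArgs approach concrete, here is a minimal receive-loop sketch; it assumes an already-connected Socket and, as in the benchmarks above, just counts and discards the bytes:

using System;
using System.Net.Sockets;

class SaeaReceiver
{
    private readonly Socket _socket;
    private readonly SocketAsyncEventArgs _args;
    private long _bytesReceived;

    public SaeaReceiver(Socket socket, int bufferSize = 128 * 1024)
    {
        _socket = socket;
        _args = new SocketAsyncEventArgs();
        _args.SetBuffer(new byte[bufferSize], 0, bufferSize);
        _args.Completed += OnCompleted;   // raised only when the call went asynchronous
    }

    public void Start()
    {
        // ReceiveAsync returns false when it completed synchronously;
        // in that case Completed is NOT raised and we handle the result inline.
        while (!_socket.ReceiveAsync(_args))
        {
            if (!ProcessReceive(_args)) return;
        }
    }

    private void OnCompleted(object sender, SocketAsyncEventArgs e)
    {
        if (ProcessReceive(e)) Start();   // keep the receive loop going
    }

    private bool ProcessReceive(SocketAsyncEventArgs e)
    {
        if (e.SocketError != SocketError.Success || e.BytesTransferred == 0)
            return false;                          // error or peer closed
        _bytesReceived += e.BytesTransferred;      // count and discard
        return true;
    }
}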
About buffer sizes:
There are countless ways to handle data on sockets. Reading is straightforward: data arrives, You know its length, You just copy the bytes from the socket buffer to Your application and process them.
Sending data is a bit different.
You can pass Your complete data to the socket and it will cut it into chunks, copying the chunks to the socket buffer until there is no more to send, and the socket's send method will return when all data is sent (or when an error happens).
You can take Your data, cut it into chunks, and call the socket's send method with one chunk at a time, sending the next chunk when the previous call returns, until there is no more.
In either case You should consider what socket buffer size to choose. If You are sending a large amount of data, then the bigger the buffer is, the fewer chunks have to be sent, therefore fewer calls have to be made in Your (or in the socket's internal) loop, fewer memory copies, less overhead.
But allocating large socket buffers and program data buffers will result in large memory usage, especially if You have thousands of connections, and allocating (and freeing) large amounts of memory repeatedly is always expensive.
On the sending side a 1-2-4-8 kB socket buffer size is ideal for most cases, but if You are preparing to send large files (over a few MB) regularly then a 16-32-64 kB buffer size is the way to go. There is usually no point in going over 64 kB.
But this only has an advantage if the receiver side also has relatively large receive buffers.
Usually over internet connections (not a local network) there is no point in going over 32 kB; even 16 kB is ideal.
Going under 4-8 kB can result in a greatly increased call count in the reading/writing loop, causing a large CPU load and slow data processing in the application.
Go under 4 kB only if You know Your messages will usually be smaller than 4 kB, or only very rarely over 4 kB.
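As an illustration, applying this kind of tuning in code is just two properties per socket; the numbers below are the Windows values quoted earlier in this post, and would need re-benchmarking on Your own machine:

using System.Net.Sockets;

static void TuneBuffers(Socket s)
{
    s.ReceiveBufferSize = 128 * 1024;   // socket receive buffer (Windows sweet spot above)
    s.SendBufferSize    = 32 * 1024;    // socket send buffer
}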
My conclusion:
Based on my experiments, the built-in socket classes/methods/solutions in dotnet are OK, but not efficient at all. My simple Linux C test programs using non-blocking sockets could outperform the fastest, "high-performance" dotnet socket solution (SocketAsyncEventArgs).
This does not mean it is impossible to do fast socket programming in dotnet, but under Windows I had to make my own implementation of Windows IOCP: communicating directly with the Windows kernel via InteropServices/Marshaling, directly calling Winsock2 methods, using a lot of unsafe code to pass the context structs of my connections as pointers between my classes/calls, creating my own ThreadPool, creating I/O event handler threads, and creating my own TaskScheduler to limit the number of simultaneous async calls and avoid pointlessly many context switches.
This was a lot of work, with a lot of research, experimenting, and testing. If You want to do it on Your own, do it only if You really think it is worth it. Mixing unsafe/unmanaged code with managed code is a pain in the ass, but in the end it was worth it, because with this solution I could reach about 36,000 HTTP requests/sec with my own HTTP server on a 1 Gbit LAN, on Windows 7, with an i7 4790.
That is a level of performance I could never reach with dotnet's built-in sockets.
When running my dotnet server on an i9 7900X on Windows 10, connected to a 4c/8t Intel Atom NAS on Linux via a 10 Gbit LAN, I can use the full bandwidth (thus copying data at 1 GB/s) no matter whether I have 1 or 10,000 simultaneous connections.
My socket library also detects if the code is running on Linux, and then instead of Windows IOCP (obviously) it uses Linux kernel calls via InteropServices/Marshalling to create and use sockets, and handles the socket events directly with Linux epoll, which managed to max out the performance of the test machines.
Design tip:
As it turned out, it is difficult to design a networking library from scratch, especially one that is meant to be universal for all purposes. You have to design it to have many settings, or tailor it to the specific task You need.
This means finding the proper socket buffer sizes, the I/O processing thread count, the worker thread count, and the allowed async task count; all of these have to be tuned to the machine the application is running on, to the connection count, and to the type of data You want to transfer over the network. This is why the built-in sockets do not perform that well: they must be universal, and they do not let You set these parameters.
In my case, assigning more than 2 dedicated threads to I/O event processing actually made the overall performance worse, because only 2 RSS queues were in use and the extra threads just caused more context switching than is ideal.
Choosing wrong buffer sizes will result in performance loss.
Always benchmark different implementations of the simulated task You need, to find out which solution or setting is best.
Different settings may produce different performance results on different machines and/or operating systems!
Mono vs Dotnet Core:
Since I've written my socket library in an FW/Core-compatible way, I could test it under Linux both with mono and with a native Core compilation. Most interestingly, I could not observe any remarkable performance differences; both were fast, but of course leaving mono and compiling with Core should be the way to go.
Bonus performance tip:
If Your network card is capable of RSS (Receive Side Scaling), then enable it in Windows in the network device settings under the advanced properties, and set the RSS Queue count from 1 to as high as You can / as high as is best for Your performance.
If it is supported by Your network card, it is usually set to 1, which means the kernel assigns network events to be processed by only one CPU core. If You can increase this queue count to higher numbers, the network events will be distributed among more CPU cores, resulting in much better performance.
In Linux it is also possible to set this up, but in different ways; it is better to search for Your Linux distro/LAN driver information.
I hope my experience will help some of You!
I had the same problem. You should take a look at:
NetCoreServer
Every thread in the .NET CLR thread pool can handle one task at a time. So to handle more async connects/reads etc., you have to change the thread pool size by using:
ThreadPool.SetMinThreads(Int32, Int32)
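A minimal illustration of that call; the counts are placeholders, not tuned values:

using System;
using System.Threading;

ThreadPool.GetMinThreads(out int workerThreads, out int ioThreads);
ThreadPool.SetMinThreads(Math.Max(workerThreads, 200), Math.Max(ioThreads, 200));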
Using EAP (the event-based asynchronous pattern) is the way to go on Windows. I would use it on Linux too, because of the problems you mentioned, and accept the performance hit.
The best would be I/O completion ports on Windows, but they are not portable.
PS: when it comes to serializing objects, you are highly encouraged to use protobuf-net. It binary-serializes objects up to 10x faster than the .NET binary serializer and saves a little space too!
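A hedged example of that protobuf-net suggestion (the Message type and its fields are placeholders; requires the protobuf-net package):

using System.IO;
using ProtoBuf;

[ProtoContract]
class Message
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Payload { get; set; }
}

static byte[] Pack(Message m)
{
    using (var ms = new MemoryStream())
    {
        Serializer.Serialize(ms, m);   // compact binary encoding
        return ms.ToArray();
    }
}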

Best way to throttle an external application's CPU Usage

Ok - here is the scenario:
I host a server application on Amazon AWS-hosted Windows instances. (I do not have access to the source code, so I cannot resolve the issue from within the application itself.)
These specific instances are able to build up CPU credits during times of idle cpu (less than 10-20% usage) and then spend those CPU credits during times of increased compute requirement.
My server application, however, typically runs at around 15-20% CPU usage when no clients are connected. This is the time when I would rather lower the CPU usage to around 5% by throttling the CPU, while maintaining enough CPU throughput to accept a TCP socket from incoming clients.
When a connected client is detected, I would like to remove the throttle and allow full access to the reserve of AWS CPU Credits.
I have got code in place that can Suspend and Resume processes via C# using Windows API calls.
I am, however, a bit fuzzy on how to accurately attain a target CPU usage for that process.
What I am doing so far, which is having moderate success:
Looping inside another application
check the CPU usage of the server application using performance counters (I don't like these - they require a 100-1000 ms wait in order to return a % value)
I determine if the current value is above or below the target value - if above, I increase an int value called 'sleep' by 10ms
If below - 'sleep' is decreased by 10ms.
Then the application will call
Process.Suspend();   // my wrapper around the Windows API suspend call
Thread.Sleep(sleep);
Process.Resume();    // my wrapper around the Windows API resume call
Like I said - this is having moderate success.
But there are several reasons I don't like it:
1. It requires a semi-rapid loop in an external application: this might end up just shifting CPU usage to that application.
2. I'm sure there are better mathematical solutions for working out the ideal sleep time.
I came across this application : http://mion.faireal.net/BES/
It seems to do everything I want, except that I need to be able to control it, and I am not a C++ developer.
It also seems to be able to achieve accurate CPU throttling without consuming much CPU itself.
Can someone suggest CPU throttling techniques?
Remember - I cannot modify the source code of the application being throttled. At most, I could inject code into it, but it occurs to me that if I inject suspend code into it, then the resume code could never fire, etc.
An external agent program might be the best way to go.
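For what it's worth, here is a hedged sketch of the duty-cycle throttle described above, written as an external agent that uses the undocumented ntdll calls NtSuspendProcess/NtResumeProcess; the 100 ms period and the duty-cycle parameter are illustrative assumptions, not measured values:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

static class ProcessThrottler
{
    [DllImport("ntdll.dll")] static extern int NtSuspendProcess(IntPtr hProcess);
    [DllImport("ntdll.dll")] static extern int NtResumeProcess(IntPtr hProcess);

    // dutyCycle = fraction of each period the target process is allowed to run (0..1).
    public static void Throttle(Process target, double dutyCycle, CancellationToken ct)
    {
        const int periodMs = 100;                    // one throttle period
        int runMs   = (int)(periodMs * dutyCycle);
        int sleepMs = periodMs - runMs;

        while (!ct.IsCancellationRequested)
        {
            Thread.Sleep(runMs);                     // let the process run
            NtSuspendProcess(target.Handle);         // freeze all of its threads
            Thread.Sleep(sleepMs);
            NtResumeProcess(target.Handle);          // thaw it again
        }
    }
}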

Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU bound, but I/O bound. It doesn't help to state that your disk I/O is "only at 10%". I'm not sure such a performance counter even exists.
The reason it gets slower when using more threads is that those threads are all trying to get at their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with a modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a bit that's been read, while on the background the next bit is being read in, but I think you'll see very little performance improvement there.
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive, and I found that the CPU is way too fast compared to the storage system; I got similar results when trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
First of all, on a 24-core box, if you are using only 4 threads the most CPU they could ever use is 16.7%, so really you are getting 60% utilization, which is fairly good.
It is hard to tell if your program is I/O bound at this point; my guess is that it is. You need to run a profiler on your project and see which sections of code it is spending most of its time in. If it is sitting on a read/write operation, it is I/O bound.
It is possible you have some form of inter-thread locking in use. That would cause the program to slow down as you add more threads; and yes, running a second process would work around that, but fixing your locking would too.
What it all boils down to is that without profiling information we cannot say whether using a second process will speed things up or slow them down; we need to know if the program is hanging on an I/O operation, a locking operation, or just taking a long time in a function that could be parallelized better.
I think what you are seeing is that the file cache is not ideal when one process writes data to many files concurrently. The file cache syncs to disk when the number of dirty pages exceeds a threshold, and it seems that concurrent writers in one process hit that threshold faster than a single-threaded writer does. You can read about the file system cache here: File Cache Performance and Tuning
Try using the Task library from .NET 4 (System.Threading.Tasks). This library has built-in optimizations for different numbers of processors.
I have no clue what your problem is, maybe because your code snippet is not really informative.
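A sketch of what that suggestion might look like for the file-summarizing scenario in the question; the file paths and the Summarize method are placeholders, and MaxDegreeOfParallelism is capped so the disk isn't hit by too many concurrent readers:

using System.IO;
using System.Threading.Tasks;

class Summarizer
{
    public void ProcessAll(string[] inputFiles)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        Parallel.ForEach(inputFiles, options, path =>
        {
            string summary = Summarize(File.ReadAllText(path));   // CPU-bound string work
            File.WriteAllText(path + ".summary", summary);
        });
    }

    private string Summarize(string content) =>
        $"{content.Length} characters, {content.Split('\n').Length} lines";
}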

Does multi-threading equal less CPU?

I have a small list of rather large files that I want to process, which got me thinking...
In C#, I was thinking of using Parallel.ForEach from the TPL to take advantage of modern multi-core CPUs, but my question is more hypothetical in character:
Does the use of multi-threading in practice mean that it would take longer to load the files in parallel (using as many CPU cores as possible), as opposed to loading each file sequentially (but with probably less CPU utilization)?
Or to put it another way (:
What is the point of multi-threading? More tasks in parallel but at a slower rate, as opposed to focusing all computing resources on one task at a time?
In order to not increase latency, parallel computational programs typically only create one thread per core. Applications which aren't purely computational tend to add more threads so that the number of runnable threads is the number of cores (the others are in I/O wait, and not competing for CPU time).
Now, parallelism in disk-I/O-bound programs may well cause performance to decrease: if the disk has a non-negligible seek time, then much more time will be wasted performing seeks and less time actually reading. This is called "churning" or "thrashing". Elevator sorting helps somewhat; true random access (such as solid-state memory) helps more.
Parallelism does almost always increase the total raw work done, but this is only important if battery life is of foremost importance (and by the time you account for power used by other components, such as the screen backlight, completing quicker is often still more efficient overall).
You asked multiple questions, so I've broken up my response into multiple answers:
Multithreading may have no effect on loading speed, depending on what your bottleneck during loading is. If you're loading a lot of data off disk or a database, I/O may be your limiting factor. On the other hand if 'loading' involves doing a lot of CPU work with some data, you may get a speed up from using multithreading.
Generally speaking you can't focus "all computing resources on one task." Some multicore processors have the ability to overclock a single core in exchange for disabling other cores, but this speed boost is not equal to the potential performance benefit you would get from fully utilizing all of the cores using multithreading/multiprocessing. In other words it's asymmetrical -- if you have a 4 core 1Ghz CPU, it won't be able to overclock a single core all the way to 4ghz in exchange for disabling the others. In fact, that's the reason the industry is going multicore in the first place -- at least for now we've hit limits on how fast we can make a single CPU run, so instead we've gone the route of adding more CPUs.
There are 2 reasons for multithreading. The first is that you want two tasks to run at the same time simply because it's desirable for both to be able to happen simultaneously -- e.g. you want your GUI to continue to respond to clicks or keyboard presses while it's doing other work (event loops are another way to accomplish this, though). The second is to utilize multiple cores to get a performance boost.
For loading files from disk, this is likely to make things much slower. What happens is the operating system tries to lay out files on disk such that you should only need to do an expensive disk seek once for each file. If you have a lot of threads reading a lot of files, you're gonna have contention over which thread has access to the disk, and you'll have to seek back to the right place in the file every time the next thread gets a turn.
What you can do is use exactly two threads. Set one to load all of the files in the background, and let the other remain available for other tasks, like handling user input. In C# winforms, you can do this easily with a BackgroundWorker control.
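A minimal sketch of that two-thread approach; LoadFile and OnAllFilesLoaded are hypothetical placeholders for your own loading and completion logic, and this is assumed to run inside a WinForms form so the completion handler comes back on the UI thread:

using System.ComponentModel;

void LoadInBackground(string[] files)
{
    var worker = new BackgroundWorker();
    worker.DoWork += (s, e) =>
    {
        foreach (string path in (string[])e.Argument)
            LoadFile(path);                    // sequential reads: at most one seek per file
    };
    worker.RunWorkerCompleted += (s, e) => OnAllFilesLoaded();   // back on the UI thread
    worker.RunWorkerAsync(files);
}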
Multi-threading is useful for highly parallelizable tasks. CPU-intensive tasks are perfect: your CPU has many cores, and many threads can use many cores. They'll use more CPU time, but in the end they'll take less time as perceived by the user. If your app is I/O bound, then multithreading isn't always the solution (but it COULD help).
It might be helpful to first understand the difference between Multithreading and Parallelism, as more often than not I see them being used rather interchangeably. Joseph Albahari has written a quite interesting guide about the subject: Threading in C# - Part 5 - Parallelism
As with all great programming endeavors, it depends. By and large, you'll be requesting files from one physical store, or one physical controller which will serialize the requests anyhow (or worse, cause a LOT of head back-and-forth on a classical hard drive) and slow down the already slow I/O.
OTOH, if the controllers and the media are separate, multiple cores loading data from them should be an improvement over a sequential method.

Hitting a memory limit slows down the .Net application

We have a 64-bit C#/.NET 3.0 application that runs on a 64-bit Windows server. From time to time the app can use a large amount of the available memory. In some instances the application stops allocating additional memory and slows down significantly (500+ times slower). When I check the memory in Task Manager, the amount of memory used barely changes. The application keeps running very slowly and never throws an out-of-memory exception.
Any ideas? Let me know if more data is needed.
You might try enabling server mode for the Garbage Collector. By default, all .NET apps run in Workstation Mode, where the GC tries to do its sweeps while keeping the application running. If you turn on server mode, it temporarily stops the application so that it can free up memory (much) faster, and it also uses different heaps for each processor/core.
Most server apps will see a performance improvement using the GC server mode, especially if they allocate a lot of memory. The downside is that your app will basically stall when it starts to run out of memory (until the GC is finished).
To enable this mode, insert the following into your app.config or web.config:
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
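If in doubt whether the setting took effect, the runtime exposes it directly; a small sanity check, not part of the fix itself:

using System;
using System.Runtime;

Console.WriteLine("Server GC enabled: " + GCSettings.IsServerGC);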
The moment you are hitting the physical memory limit, the OS will start paging (that is, write memory to disk). This will indeed cause the kind of slowdown you are seeing.
Solutions?
Add more memory - this will only help until you hit the new memory limit
Rewrite your app to use less memory
Figure out if you have a memory leak and fix it
If memory is not the issue, perhaps your application is hitting CPU very hard? Do you see the CPU hitting close to 100%? If so, check for large collections that are being iterated over and over.
As with 32-bit Windows operating systems, there is a 2GB limit on the size of an object you can create while running a 64-bit managed application on a 64-bit Windows operating system.
Investigating Memory Issues (MSDN article)
There is an awful lot of good stuff mentioned in the other answers. However, I'm going to chip in my two pence (or cents - depending on where you're from!) anyway.
Assuming that this is indeed a 64-bit process as you have stated, here's a few avenues of investigation...
Which memory usage are you checking? Mem Usage or VMem Size? VMem size is the one that actually matters, since that applies to both paged and non-paged memory. If the two numbers are far out of whack, then the memory usage is indeed the cause of the slow-down.
What's the actual memory usage across the whole server when things start to slow down? Does the slow down also apply to other apps? If so, then you may have a kernel memory issue - which can be due to huge amounts of disk accessing and low-level resource usage (for example, create 20000 mutexes, or load a few thousand bitmaps via code that uses Win32 HBitmaps). You can get some indication of this on the Task Manager (although Windows 2003's version is more informative directly on this than 2008's).
When you say that the app gets significantly slower, how do you know? Are you using vast dictionaries or lists? Could it not just be that the internal data structures are getting so big as to complicate the work that any internal algorithms are performing? When you get to huge numbers, some algorithms can start to become slower by orders of magnitude.
What's the CPU load of the application when it's running at full pelt? Is it actually the same as when the slow-down occurs? If the CPU usage decreases as the memory usage goes up, then that means that whatever it's doing is taking the OS longer to fulfill, meaning that it's probably putting too much load on the OS. If there's no difference in CPU load, then my guess is it's internal data structures getting so big as to slow down your algos.
I would certainly be looking at running a Perfmon on the application - starting off with some .Net and native memory counters, Cache hits and misses, and Disk Queue length. Run it over the course of the application from startup to when it starts to run like an asthmatic tortoise, and you might just get a clue from that as well.
Having skimmed through the other answers, I'd say there's a lot of good ideas. Here's one I didn't see:
Get a memory profiler, such as SciTech's MemProfiler. It will tell you what's being allocated, by what, and it will show you the whole slice n dice.
It also has video tutorials in case you don't know how to use it. In my case, I discovered I had IDisposable instances that I wasn't Using(...)
