One big byte buffer or several small ones? - c#

I'm learning C# asynchronous socket programming, and I've learned that it's a good idea to reuse byte buffers in some sort of pool, and then just check one out as needed when receiving data from a socket.
However, I have seen two different methods of implementing a byte array pool: one uses a simple queue system, adding and removing arrays from the queue as needed. If one is requested and there are none left in the queue, a new byte array is created.
The other method I've seen uses one big byte array for the entire program. The idea of a queue still applies, but instead it's a queue of integers which determine the slice (offset) of the byte array to use. If one is requested and there are none left in the queue, the array must be resized.
Which one of these is a better solution for a highly scalable server? My instinct is it would be cheaper to just use many byte arrays because I'd imagine resizing the array as needed (even if we allocate it in large chunks) would be pretty costly, especially when it gets big. Using multiple arrays seems more intuitive too - is there some advantage to using one massive array that I'm not thinking of?

Your gut feeling is correct. Every time you need to make the array bigger, you will be recreating it and copying the existing bytes over. Since we are talking about bytes here, the array can get large very quickly, and each resize asks for a contiguous piece of memory, which, depending on how your program uses memory, may or may not be viable. The single array also only becomes a pool in a loose sense: a pool, by definition, is a set of multiple items that are managed and shared by various clients.
The one-array solution is also much more complex to implement. Its one advantage is that it lets you hand out variable-sized chunks, but that comes at the cost of essentially reimplementing malloc: dealing with fragmentation and so on, which you shouldn't get into.
A multiple-array solution lets you initialize a pool with N buffers and manage them in a straightforward fashion. It is definitely the approach I'd recommend.

I wouldn't suggest the resizing option. Start simple and work your way up. A queue of byte buffers, with a new buffer added when the queue is exhausted, would be a good start. You will probably have to pay attention to threading issues, so my advice would be to use somebody else's thread-safe queue implementation.
Next you could look at the more complex "pointers" into a big byte array chunk, except my advice would be to have a queue of 4 KB/16 KB blocks (some power-of-two multiple of the page size) that you index into, and when it is full you add another big chunk to the queue. Actually, I don't recommend this at all due to the complexity and the dubious gain in performance.
Start simple, work your way up. Pool of buffers, make it thread safe, see if you need anything more.
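As a rough sketch of that starting point, here is a minimal pool built on the framework's ConcurrentQueue<T> as the thread-safe queue. The BufferPool name, the 4 KB buffer size, and the pre-fill count are illustrative assumptions, not anything from the answer above.

using System.Collections.Concurrent;

// Minimal buffer pool sketch: check a byte[] out for a receive, check it back in afterwards.
public sealed class BufferPool
{
    private readonly ConcurrentQueue<byte[]> _buffers = new ConcurrentQueue<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int initialCount, int bufferSize = 4096)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < initialCount; i++)
            _buffers.Enqueue(new byte[bufferSize]);
    }

    // If the pool is empty, allocate a fresh buffer rather than resizing anything.
    public byte[] CheckOut() =>
        _buffers.TryDequeue(out var buffer) ? buffer : new byte[_bufferSize];

    // Return the buffer once the receive has been processed so it can be reused.
    public void CheckIn(byte[] buffer) => _buffers.Enqueue(buffer);
}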

One more vote for multiple buffers, but with the addition that since you're doing things asynchronously you need to make sure your queue is thread-safe. The default Queue<T> collection is definitely not thread-safe.
SO user and MS employee JaredPar has a good threadsafe queue implementation here:
http://blogs.msdn.com/jaredpar/archive/2009/02/16/a-more-usable-thread-safe-collection.aspx

If you use the single buffer, you need a strategy for how fast it should grow when needed. If you grow it in small increments, you may have to do it often and copy all the data each time. If you grow it in large increments (say, 1.5 times the previous size), you risk hitting an "out of memory" condition simply by trying to grow the buffer. It's a lose-lose choice for a scalable system; that's why reusing small buffers is preferable.

With a garbage-collected heap, you should always favor small, right-sized buffers that have a short lifetime. The .NET heap allocator is very fast, and generation 0 collections are very cheap.
When you keep a static buffer around, you use up system resources for the life of the program. The worst-case scenario is when it gets big enough to be moved to the Large Object Heap, where it becomes a permanent obstacle that cannot be moved.

Related

C#, byte arrays, avoid memory fragmentation

I'm writing a data structure whose main purpose is to contain binary data (byte arrays) organized so that the user can insert data in the middle without performance penalties (i.e., without moving large chunks of memory).
For that purpose, I'll need to create a lot of smaller byte arrays, which will serve as buffers for parts of the bigger structure. The problem is that I might need to juggle those small arrays frequently, and I'm worried about possible memory fragmentation.
My idea for avoiding it is to ensure that all arrays I use within this structure have one of two sizes, say 1024 or 2048 bytes (or similar; generally n and 2*n). I hope this simplifies the work of the memory manager, which should be able to reuse memory from discarded arrays.
Will it work that way? Or should I design some kind of array pool and implement the array-reusing mechanism myself?

Simple algorithm to determine when to free some memory in .NET

Our system keeps hold of lots of large objects for performance. However, when running low on memory, we want to drop some of them. The objects are prioritized, so I know which ones to drop. Is there a simple way of determining when to free memory? Also, dropping one object may not be enough, so I guess I need a loop to drop, check, and drop again if necessary. But in C#, I won't necessarily see the effect of dropping an object immediately, so how do I avoid kicking too much stuff out?
I guess it's just a simple function of used vs total physical & virtual memory. But what function?
Edit: Some clarifications
"Large objects" was misleading. I meant logical "package" of objects (the objects should be small enough individually to avoid the LOB - that's the intention certainly) that together are large (~ 100MB?)
A request can come in which requires the use of one such package. If it is in memory, the response is rapid. If not, it needs to be reconstructed, which is very slow. So I want to keep stuff in memory as long as possible, but can ditch the least requested ones when necessary.
We have no sensible way to serialize these packages. We should probably do that, but it's a lot of work and there's a lot of resistance to doing so.
Our original simple approach is to periodically compare the following to a configurable threshold.
// ComputerInfo lives in Microsoft.VisualBasic.Devices (reference Microsoft.VisualBasic).
var c = new ComputerInfo();
// Cast to double: both properties are ulong, so integer division would truncate to zero.
return (double)c.AvailablePhysicalMemory / c.TotalPhysicalMemory;
There are a lot of different topics in this question, and I think it's best to clarify them before actually answering.
First off, you say your app holds a lot of "large objects". Define large object: anything larger than about 85 KB goes into the LOH, which only gets collected as part of a generation 2 collection (the most expensive of them all). Anything smaller than that, even if you think of it as a "big" object, is not, and is treated like any other kind of object.
Secondly, there are two problems in terms of "managing memory".
One is managing the amount of space you're using inside your virtual address space. That is, on 32-bit systems, making sure you can address all the memory you're asking for, which on 32-bit Windows tends to be around 1.5 GB in practice.
The second is managing the release of that memory when it's needed, which is part of the garbage collector's job: it triggers when there's a shortage of memory (although that doesn't mean you can't get an OutOfMemoryException if you don't give the GC enough time to do its job).
With that said, I think you should forget about taking the place of the GC... just let it do its job and, if you're worried, find the critical paths that may fail (on a memory request) and protect yourself against OutOfMemoryExceptions.
There are a lot of different patterns for handling the case you're describing, and most of them really depend on your business scenario. One example is a state machine that can go into an "out of memory" state, in which the system switches to freeing memory before doing anything else (that includes disposing of old objects and invoking the GC to clean everything up, while you patiently wait for it to happen).
Other techniques involve saving the data to the disk and then manually swapping in and out objects based on some algorithm when you reach certain levels. That means stopping all your threads (or some, depending on business) and moving the data back and forth.
If your large objects are all controlled in terms of location you can also declare a facade over their creation, so that the facade can check whether it needs to free objects or not based on the amount of memory (virtual memory) your process is using. BTW, use the PerformanceInfo API call as quoted in the other answer as this will include the amount of memory used by unmanaged code, which is, nonetheless, located inside the virtual memory space of your process.
Don't worry too much about "real" memory, as the operating system will make sure the most appropriate pages are located in memory.
Then there are hundreds of other optimizations that depend entirely on your business scenario. For example, databases "know" how to bring data into memory depending on the query, predicting in advance the data you're going to use so that it's ready, and they evict objects that are not used... but that's another topic.
Edit: Based on your edits to the question.
Checking memory in the facade will not add a significant overhead in terms of performance.
If you start getting low on memory, you should decide how many objects / how much space you're going to free. Don't do it one at a time; take a bunch of them and free enough memory that you don't have to collect again right away.
If you go with the previous approach, you can service the request after you've freed enough space and continue cleaning in the background.
One of the fastest ways of handling memory / disk swapping is by using memory mapped files.
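For illustration only, here is a rough sketch of the memory-mapped-file idea; the file name and the 100 MB capacity are made-up placeholders.

using System.IO;
using System.IO.MemoryMappedFiles;

// Back a large "package" with a memory-mapped file so the OS, not the managed heap,
// decides which pages stay resident in physical memory.
using (var mmf = MemoryMappedFile.CreateFromFile(
           "package.dat", FileMode.OpenOrCreate, null, 100L * 1024 * 1024))
using (var view = mmf.CreateViewAccessor())
{
    view.Write(0, (byte)42);          // write a byte at offset 0
    byte value = view.ReadByte(0);    // read it back; paging is handled by the OS
}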
Use GC.GetTotalMemory, and if it exceeds your threshold, null out the references to the objects you want to release and call GC.Collect.
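A minimal sketch of that suggestion follows; the MemoryTrimmer class, the 500 MB threshold, and the assumption that 'packages' is ordered least-requested first are hypothetical choices, not from the answer itself.

using System;
using System.Collections.Generic;

static class MemoryTrimmer
{
    // Hypothetical threshold; tune it to your own configuration.
    private const long Threshold = 500L * 1024 * 1024;

    // 'packages' is assumed to be ordered least-requested first.
    public static void TrimIfNeeded(List<object> packages)
    {
        while (GC.GetTotalMemory(false) > Threshold && packages.Count > 0)
        {
            packages.RemoveAt(0);            // drop our reference to the least-used package
            GC.Collect();                    // ask the GC to reclaim it
            GC.WaitForPendingFinalizers();   // give finalizers a chance before re-checking
        }
    }
}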
Have a look at the accepted answer to this question. It uses the GetPerformanceInfo Windows API to determine memory consumption of all sorts. Task Manager is using the same information. This should help you writing a class that observes memory consumption periodically.
Once memory runs low you can fill a FIFO queue with soon-to-be deleted tasks.
The observer will delete the first object in the queue and maybe call GC.Collect manually; I'm not too sure about this.
Give the collection some time before you recheck the memory consumption of your application. If there is still not enough free memory, delete the next object from the queue, and so on...

What's the Best Concurrent Thread Shared Memory Architecture Without Locking?

I have a 2d Array of memory. I have multiple threads reading and writing to single elements in the array spontaneously, arbitrarily, and concurrently.
What is the fastest way or best practice to construct my memory access code? I don't like the idea of locking because it blocks other threads.
Data integrity is actually not that important, but it should be (mostly) consistent. My code can handle a few memory errors.
It needs to be really, really fast!
Thanks for feedback.
If data integrity is not important, you can just access the data without caring about multithreading at all.
No one can predict the result, though.
I wouldn't call this approach "best practice", however. IMHO best practice is caring about multithreading and protecting the data with appropriately grained mutexes. My opinion is that every application should first be correct, and only then fast. Inconsistent results are just wrong, no matter how fast they arrive.
Use the Interlocked class to CAS (compare-and-swap, via Interlocked.CompareExchange) the objects/values in your array. It makes the operation atomic, which ensures the data is not corrupted. That's about the fastest thing you can do (aside from accessing/modifying the data directly without interlocking). However, if you're modifying the size of the 2D array (growing/shrinking), then you will have some serious problems unless you use some kind of locking mechanism on the array.
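A small sketch of what that looks like for a shared int[,]; the AddToCell helper and the retry-loop structure are illustrative, not part of the answer above.

using System.Threading;

static class LockFreeGrid
{
    // Lock-free update of a single cell via a compare-and-swap retry loop.
    public static void AddToCell(int[,] grid, int row, int col, int delta)
    {
        while (true)
        {
            int current = Volatile.Read(ref grid[row, col]);
            int desired = current + delta;

            // Publish only if no other thread changed the cell since we read it; otherwise retry.
            if (Interlocked.CompareExchange(ref grid[row, col], desired, current) == current)
                return;
        }
    }
}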
Declare the array field as volatile and ensure it's scoped so that it's visible to all your threads. I generally like to avoid statics, so either pass the array by reference, or set up all your threads to run methods of an instance class that has the array defined as an instance field.
However, I strongly urge you to rethink what "volatile access" means in terms of data integrity. Best practice is NOT to do what you are attempting without good locking mechanics. You may think it's a small problem, but you can find yourself with a very non-deterministic system, so much so that its data isn't reliable in the slightest.
Let's say you have 8 threads running, and all of them will get a value from an index of the array, do some calculation, then add the result back to the index of the array. Thread 1 starts first and gets the value of the index, 0. Then threads 2-7 all start and get the same value. Thread 1 performs its calculation, gets the index again to ensure it has the "latest" value, then tries to update the value. However, other threads are waiting for that memory, and due to some scheduling implementation you know nothing about, in between Thread 1 getting the index (still zero) and writing its result, threads 2-7 have ALL written their values. Then Thread 1 writes its value, overwriting everything the other 7 threads have done. The other 7 threads, in turn, probably had similar "races" with each other such that the value overwritten by Thread 1 probably overwrote the results of half the threads anyway.
I guarantee you that this behavior is NOT what you want, no matter how much you think you can get away with it; it WILL cause data corruption, which WILL affect other areas of the system, and you WILL be forced to implement proper locking.
If you are interested solely in performance, then the way in which you order your memory accesses can play a big role. Spend an hour or so reading through the slides from Lecture 1 of MIT's Performance Engineering class. The other lectures may also be interesting to you (such as Lecture 6).
Basically, you can optimize your use of the cache to greatly improve performance, depending on your read/write patterns, given the workload you are using.
This should not stop you from doing something that is correct, however.
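To make the cache point concrete, here is the classic row-major traversal of a rectangular array (the 4096 by 4096 size is arbitrary). C# int[,] arrays are stored row by row, so keeping the row loop outermost walks memory sequentially; swapping the loops strides through memory and is typically much slower.

// Row-major traversal: good cache locality because consecutive iterations
// touch adjacent memory. Swapping the loops would stride 4096 ints at a time.
int[,] data = new int[4096, 4096];
long sum = 0;
for (int row = 0; row < data.GetLength(0); row++)
    for (int col = 0; col < data.GetLength(1); col++)
        sum += data[row, col];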

.NET Binary File Read Performance

I have a very large set of binary files where several thousand raw frames of video are being sequentially read and processed, and I’m now looking to optimize it as it appears to be more CPU-bound than I/O-bound.
The frames are currently being read in this manner, and I suspect this is the biggest culprit:
private byte[] frameBuf;
BinaryReader binRead = new BinaryReader(FS);
// Initialize a new buffer of sizeof(frame)
frameBuf = new byte[VARIABLE_BUFFER_SIZE];
//Read sizeof(frame) bytes from the file
frameBuf = binRead.ReadBytes(VARIABLE_BUFFER_SIZE);
Would it make much of a difference in .NET to re-organize the I/O to avoid creating all these new byte arrays with each frame?
My understanding of .NET’s memory allocation mechanism is weak as I am coming from a pure C/C++ background. My idea is to re-write this to share a static buffer class that contains a very large shared buffer with an integer keeping track of the frame’s actual size, but I love the simplicity and readability of the current implementation and would rather keep it if the CLR already handles this in some way I am not aware of.
Any input would be much appreciated.
You don't need to initialize frameBuf if you use binRead.ReadBytes -- you'll get back a new byte array, which replaces the one you just created. This does create a new array for each read, though.
If you want to avoid creating a bunch of byte arrays, you could use binRead.Read, which will put the bytes into an array you supply to it. If other threads are using the array, though, they'll see the contents of it change right in front of them. Be sure you can guarantee you're done with the buffer before reusing it.
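As a sketch of that alternative (VARIABLE_BUFFER_SIZE and FS are from the original snippet; ProcessFrame is a hypothetical stand-in for the per-frame processing):

using System.IO;

// Reuse a single buffer across frames instead of allocating a new array per read.
byte[] frameBuf = new byte[VARIABLE_BUFFER_SIZE];
var binRead = new BinaryReader(FS);

int bytesRead;
while ((bytesRead = binRead.Read(frameBuf, 0, frameBuf.Length)) > 0)
{
    // Read may return fewer bytes than requested, so pass bytesRead along
    // (or keep reading until a full frame is buffered) before processing.
    ProcessFrame(frameBuf, bytesRead);
}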
You need to be careful here. It is very easy to get completely bogus test results on code like this, results that never repro in real use. The problem is the file system cache: it caches the data you read from a file. The trouble starts when you run your test over and over again, tweaking the code and looking for improvements.
The second and subsequent times you run the test, the data no longer comes off the disk. It is still present in the cache, and it only takes a memory-to-memory copy to get it into your program. That's very fast: a microsecond or so of overhead plus the time needed to copy, which runs at bus speeds, at least 5 gigabytes per second on modern machines.
Your test will then show that you spend a lot of time allocating the buffer and processing the data, relative to the amount of time spent reading the data.
This will rarely repro in real use. The data won't be in the cache yet; now the sluggish disk drive needs to seek to the data (many milliseconds) and it needs to be read off the disk platter (a couple of dozen megabytes per second, at best). Reading the data now takes a good three or four orders of magnitude longer. If you managed to make the processing step twice as fast, your program would actually only run about 0.05% faster. Give or take.

Large Arrays, and LOH Fragmentation. What is the accepted convention?

I have another active question HERE regarding some hopeless memory issues that possibly involve LOH fragmentation, among other possible unknowns.
What my question now is, what is the accepted way of doing things?
If my app needs to be done in Visual C#, and needs to deal with large arrays to the tune of int[4000000], how can I not be doomed by the garbage collector's refusal to deal with the LOH?
It would seem that I am forced to make any large arrays global, and never use the word "new" around any of them. So, I'm left with ungraceful global arrays with "maxindex" variables instead of neatly sized arrays that get passed around by functions.
I've always been told that this was bad practice. What alternative is there?
Is there some kind of function to the tune of System.GC.CollectLOH("Seriously") ?
Is there possibly some way to outsource garbage collection to something other than System.GC?
Anyway, what are the generally accepted rules for dealing with large (>85Kb) variables?
Firstly, the garbage collector does collect the LOH, so do not be immediately scared by its presence. The LOH gets collected when generation 2 gets collected.
The difference is that the LOH does not get compacted, which means that if you have an object in there that has a long lifetime, you will effectively be splitting the LOH into two sections — the area before and the area after this object. If this behaviour continues to happen then you could end up with the situation where the space between long-lived objects is not sufficiently large for subsequent allocations, and .NET has to allocate more and more memory in order to place your large objects, i.e. the LOH gets fragmented.
Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).
Starting from .NET 4.5.1, LOH could be compacted, see GCSettings.LargeObjectHeapCompactionMode property.
Strategies to avoid LOH fragmentation are:
Avoid creating large objects that hang around. Basically this just means large arrays, or objects which wrap large arrays (such as the MemoryStream which wraps a byte array), as nothing else is that big (components of complex objects are stored separately on the heap so are rarely very big). Also watch out for large dictionaries and lists as these use an array internally.
Watch out for double arrays — the threshold for these going into the LOH is much, much smaller — I can't remember the exact figure, but it's only a few thousand.
If you need a MemoryStream, consider making a chunked version that backs onto a number of smaller arrays rather than one huge array. You could also make custom versions of IList and IDictionary which use chunking to avoid stuff ending up in the LOH in the first place.
Avoid very long Remoting calls, as Remoting makes heavy use of MemoryStreams which can fragment the LOH during the length of the call.
Watch out for string interning — for some reason interned strings are stored as pages on the LOH and can cause serious fragmentation if your application continues to encounter new strings to intern, i.e. avoid using string.Intern unless the set of strings is known to be finite and the full set is encountered early on in the application's life. (See my earlier question.)
Use Son of Strike to see what exactly is using the LOH memory. Again see this question for details on how to do this.
Consider pooling large arrays.
Edit: the LOH threshold for double arrays appears to be 8k.
It's an old question, but I figure it doesn't hurt to update answers with changes introduced in .NET. It is now possible to defragment the Large Object Heap. Clearly the first choice should be to make sure the best design choices were made, but it is nice to have this option now.
https://msdn.microsoft.com/en-us/library/xe0c2357(v=vs.110).aspx
"Starting with the .NET Framework 4.5.1, you can compact the large object heap (LOH) by setting the GCSettings.LargeObjectHeapCompactionMode property to GCLargeObjectHeapCompactionMode.CompactOnce before calling the Collect method, as the following example illustrates."
GCSettings can be found in the System.Runtime namespace.
// Request a one-time LOH compaction on the next blocking full collection.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
The first thing that comes to mind is to split the array up into smaller ones, so they stay below the size at which the GC puts them in the LOH. You could split the arrays into smaller ones of, say, 10,000 elements, and build an object which would know which array to look in based on the indexer you pass.
Now I haven't seen the code, but I would also question why you need an array that large. I would potentially look at refactoring the code so all of that information doesn't need to be stored in memory at once.
You have it wrong. You do NOT need an array of size 4000000, and you definitely do not need to call the garbage collector.
Write your own IList implementation, like a "PagedList":
Store items in arrays of 65536 elements.
Create an array of arrays to hold the pages.
This allows you to access basically all your elements with ONE redirection only. And, as the individual arrays are smaller, fragmentation is not an issue...
...if it is... then REUSE pages. Don't throw them away on dispose; put them on a static "PageList" and pull from there first. All this can be done transparently within your class.
The really good thing is that this list is pretty dynamic in its memory usage. You may want to resize the holder array (the redirector); even if you don't, it is only about 512 KB of data per page.
Second-level arrays hold 65536 elements each, which is 8 bytes per element for a class reference (512 KB per page on 64-bit, 256 KB on 32-bit), or 64 KB per page for a one-byte struct.
Technically: turn int[] into int[][].
Decide whether 32-bit or 64-bit suits you better as you want ;) Both have advantages and disadvantages.
Dealing with ONE large array like that is unwieldy in any language. If you have to, then basically allocate it at program start and never recreate it. That's the only solution.
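A minimal sketch of that paged-list idea (the 65536-element page size is the one suggested above; page reuse, growth of the page table, and the full IList surface are omitted here):

// Fixed-size pages behind an array of arrays: every element access costs exactly
// one extra indirection, and no single array has to grow or move.
public sealed class PagedList<T>
{
    private const int PageSize = 65536;
    private readonly T[][] _pages;

    public PagedList(int capacity)
    {
        _pages = new T[(capacity + PageSize - 1) / PageSize][];
        for (int i = 0; i < _pages.Length; i++)
            _pages[i] = new T[PageSize];
    }

    public T this[int index]
    {
        get { return _pages[index / PageSize][index % PageSize]; }
        set { _pages[index / PageSize][index % PageSize] = value; }
    }
}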
This is an old question, but with .NET Standard 1.1 (.NET Core, .NET Framework 4.5.1+) there is another possible solution:
Using ArrayPool<T> in the System.Buffers package, we can pool arrays to avoid this problem.
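For illustration, renting and returning a buffer from the shared pool looks like this (the 4096-byte request size is arbitrary):

using System;
using System.Buffers;

// Rent a buffer from the shared pool instead of allocating a fresh array each time.
byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);   // may return an array larger than requested
try
{
    // ... fill and use the buffer here ...
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);           // hand it back so it can be reused
}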
I'm adding an elaboration to the answer above, in terms of how the issue can arise. Fragmentation of the LOH is not only dependent on objects being long-lived. If you have multiple threads and each of them creates big lists that go onto the LOH, you can have the situation where the first thread needs to grow its list but the next contiguous bit of memory is already taken up by a list from a second thread, so the runtime allocates new memory for the first thread's list, leaving behind a rather big hole. This is what's currently happening on one project I've inherited: even though the LOH is approximately 4.5 MB, the runtime has a total of 117 MB of free memory, but the largest free segment is 28 MB.
Another way this can happen without multiple threads is if you have more than one list being added to in some kind of loop; as each expands beyond the memory initially allocated to it, they leapfrog each other as they grow beyond their allocated space.
A useful link is: https://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
I'm still looking for a solution to this; one option may be to use some kind of object pool and request from the pool when doing the work. If you're dealing with large arrays, another option is to develop a custom collection, e.g. a collection of collections, so that you don't have one huge list but instead break it up into smaller lists, each of which avoids the LOH.
