NOTE: My case is in the ecosystem of an old API that only works with strings, with none of the modern .NET additions.
So I have a strong need for a mutable string that causes no allocations. The string is updated every X ms, so you can imagine how much garbage it would produce in just a few minutes (StringBuilder is not even close to being relevant here). My current approach is to pre-allocate a string of fixed size and mutate it via pinning, writing characters directly, and either failing silently or throwing when capacity is reached.
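For illustration, here is a minimal sketch of the approach (names are mine and the helper is hypothetical; it requires compiling with /unsafe and deliberately violates string immutability, with all the caveats discussed below):

using System;

class MutableStringDemo
{
    // Pre-allocated fixed-size string, padded with '\0' as described.
    static readonly string Buffer = new string('\0', 1024);

    static unsafe void Overwrite(string target, string source)
    {
        if (source.Length > target.Length)
            throw new ArgumentException("Capacity reached."); // or fail silently
        fixed (char* p = target) // pin the string, then write characters directly
        {
            int i = 0;
            for (; i < source.Length; i++) p[i] = source[i];
            for (; i < target.Length; i++) p[i] = '\0'; // pad the tail
        }
    }
}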
This works fine. The allocated string is long-lived, so eventually the GC will promote it to Gen 2, where pinning won't bother it that much, minimizing overhead. There are 2 major issues though:
Because the string is fixed-size, I have to pad it with \0. While this has worked fine so far with all default .NET/Mono functionality and third-party code, there is no telling how something else will react when the string is 1024 characters long but the last 100 are \0.
I can't resize it, because that would incur an allocation. I could take one allocation once in a blue moon, but since the string is fairly dynamic I can't be sure when it will try to expand or shrink further. I COULD use an "expand only" approach, so that I allocate only when expansion is needed; however, this has the disadvantage of padding overhead (if the string expanded to 5k characters but the next string is just 3k, then 2k characters are padded for extra cycles) and also of extra memory usage. I'm also not sure how the GC will feel about a huge, often-pinned string sitting in Gen 2 but not in the LOH. Another option would be to pool reusable string objects; however, that has higher memory and GC overhead, plus lookup overhead.
Since the target string has to live for quite some time, I was thinking about moving it into unmanaged memory, via a byte buffer. This would remove the burden from the GC (the pinning penalty) and I could resize/reallocate at less cost than in the managed heap.
What I'm having a hard time understanding is: how can I slice a specific part of the allocated unmanaged buffer and wrap it as a normal .NET string to use in managed code? For example, to pass it to Console.WriteLine, or to some third-party library that draws a UI label on screen and accepts a string. Is this even doable?
P.S. As far as I know, the plan for .NET 5 (to be finalized in .NET 6, I think) is that you will no longer be able to mutate things like strings (either blocked at runtime, or an undefined failure). Their solution seems to be the POH, which is essentially what I describe, with the same limitations.
how can I slice a specific part of the allocated unmanaged buffer and wrap it as a normal .NET string to use in managed code
As far as I know this is not possible. .NET has its own way of defining objects (object headers, etc.); you cannot treat an arbitrary memory region as a .NET object. Pinning and mutating a string seems dangerous, since strings are intended to be immutable and some things might not work correctly (using the string as a dictionary key, for example).
The correct way would be (as Canton7 mentions) to use a char[] buffer and Span<char> / Memory<char> for slicing it. When passing to other methods, you can convert a slice of the buffer to an actual string object. When calling methods like Console.WriteLine or UI methods, the overhead of allocating the string object will be irrelevant compared to everything else that is going on.
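As a rough sketch of what that looks like (buffer size and names are illustrative):

char[] buffer = new char[1024]; // reusable, never reallocated
"score: 42".AsSpan().CopyTo(buffer); // mutate in place, allocation-free
int length = 9;

ReadOnlySpan<char> slice = buffer.AsSpan(0, length); // slice without copying

// allocate a real string only at the boundary that demands one
Console.WriteLine(slice.ToString());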
If you have old code that only accepts string you would either need to accept the limitations this entails, or rewrite the code to accept memory/span representations.
I would highly recommend profiling to see if the frequent allocations are an actual problem. As long as the string fits in the small object heap (SOH, i.e. less than 85,000 bytes) and is not promoted to gen 2, the overhead might not be huge. Allocations on the SOH are fast, and the time to run a gen 0 GC does not scale directly with the amount allocated. So updating every few milliseconds might not be terrible. I would be more worried if you were talking about microseconds.
Related
What is the big O time complexity of allocating an array in .NET?
I'm guessing that if the array is small enough to fit on the ephemeral segment it should be O(1), but that as n gets larger it gets more difficult to find enough memory so it may change.
Also, the large object heap may be fragmented, so if n is large enough for the array to go on the LOH, it probably won't be O(1).
A new array will be allocated in one of two distinct heaps, as most are probably aware, depending on its size (the size threshold is 85,000 bytes):
Small Object Heap - allocation here happens in a so-called allocation context, which is a pre-zeroed region of memory located inside an ephemeral segment. Two scenarios may happen here:
there is enough space for the new array in the current allocation context - in that case we can treat it as an O(1) operation of just returning an address for the array (and bumping the pointer for the next objects)
there is not enough space there - the allocation context will be enlarged by an allocation quantum (usually around 8 kB) if possible (i.e. if it lies at the end of the ephemeral segment). Here we hit the cost of zeroing those 8 kB, so it is significantly bigger. Even worse, it may not be possible to enlarge the allocation context at all, because it may lie between already-allocated objects. In that case a new allocation context will be created somewhere inside the ephemeral segment with the help of the free-list, to make use of fragmentation. Here the cost is bigger still: traversing the free-list to find a proper place and then zeroing it. Still, the cost does not depend on the array size directly and is "constant", so we can treat it as O(1) as before.
Large Object Heap - because allocations here are by default much less frequent, it uses "ad-hoc" allocation contexts - each time an allocation happens here, the GC searches for an appropriate place with the help of the free-list and zeroes it. Again, both the cost of free-list traversal and of memory zeroing apply, but as objects here are big, it is dominated by the zeroing cost. Here we can talk about O(n) cost.
In the case of LOH allocation, one should be aware of an additional hidden "cost": such allocations cannot proceed during some parts of background GCs (because both operate on the free-list). So if you happen to have a lot of long background GCs, LOH allocations will be paused waiting for the GC to end. This obviously introduces unwanted delays for your threads.
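If you want to observe the threshold from code, one quick (runtime-dependent) check relies on freshly allocated LOH objects reporting generation 2, while SOH objects start in generation 0:

byte[] small = new byte[80_000]; // below 85,000 bytes: SOH, ephemeral segment
byte[] large = new byte[90_000]; // above the threshold: LOH
Console.WriteLine(GC.GetGeneration(small)); // 0
Console.WriteLine(GC.GetGeneration(large)); // 2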
Objects in the ephemeral segment (SOH; small object heap) are allocated after the last known object on that segment. It should really just be a pointer bump.
The "empty" space in between will not be considered, since there is no empty space. Even if the object has no reference any more, it will still be there until it's garbage collected. Then, the SOH will be compacted, so again, there are no free spaces.
If the current segment is not large enough, then a different one has to be chosen or a new segment has to be created. This takes longer, but is still O(1).
The LOH is a bit more complex, since it will usually not be compacted. There are websites stating that the LOH has a "free list", but I'm not sure whether it's really a list-style implementation. I guess it has smarter management and works like a dictionary, so it should not be worse than O(log n).
What needs to be done?
perhaps get new memory from the kernel. If so, the memory is already zeroed and memset() is not needed.
if that new memory is not available in RAM, swap something to disk first. This part may become really expensive but unpredictable.
If memory is already available in .NET, it might need to be initialized to zero. But the implementation of memset() is optimized (e.g. using rep stos)
Initialize the array with values from somewhere (e.g. a file). This will likely be a .NET loop and, apart from swapping, one of the expensive parts.
Usually, I would not consider the allocation of memory something to worry about, unless you have used a profiler (like dotMemory) that told you about memory throughput issues. Trust Donald Knuth: "premature optimization is the root of all evil".
I wonder why C# does not have a version of long.Parse accepting an offset into the string and a length. In effect, I am forced to call string.Substring first.
This is unlike C's strtol, where one does not need to extract the substring first.
If I need to parse millions of rows I have a feeling there will be overhead creating those small strings that immediately become garbage.
Is there a way to parse a string into numbers efficiently without creating temporary short lived garbage strings on the heap? (Essentially doing it the C way)
Unless I'm reading this wrong, strtol doesn't take an offset into the string. It takes a memory address, which the caller can set to any position within a character buffer (or outside the buffer, if they aren't paying attention).
This presents a couple issues:
Computation of the offset requires an understanding of how the string is encoded. I believe C# uses UTF-16 for in-memory strings, currently anyway. If that were ever to change, your offsets would be off, possibly with disastrous results.
Computation of the address could easily go stale for managed objects since they are not pinned in memory-- they could be moved around by memory management at any time. You'd have to pin it in memory using something like GCHandle.Alloc. When you're done, you'd better unpin it, or you could have serious problems!
If you get the address wrong, e.g. outside your buffer, your program is likely going to blow up.
I think C programmers are more accustomed to managing memory mapped objects themselves and have no issue computing offsets and addresses and monkeying around with them like you would with assembly. With a managed language like c# those sorts of things require more work and aren't typically done-- the only time we pin things in memory is when we have to pass objects off to unmanaged code. When we do it, it incurs overhead. I wouldn't advise it if your overall goal is to improve performance.
But if you are hell-bent on getting down to the bare metal on this, you could try this solution, where one clever C# programmer reads the string as an array of ASCII-encoded bytes and computes the numbers from that. With his solution you can specify start and length to your heart's content. You'd have to write something different if your strings are encoded in UTF. I would go this route rather than trying to hack the string object's memory mapping.
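In that spirit, here is a hedged sketch of an allocation-free parser that reads digits directly from the string at a given offset/length (minimal validation, no overflow handling); on newer runtimes, long.Parse(s.AsSpan(offset, length)) achieves the same without hand-rolling:

static long ParseLong(string s, int offset, int length)
{
    long result = 0;
    bool negative = false;
    int i = offset, end = offset + length;
    if (i < end && s[i] == '-') { negative = true; i++; }
    if (i == end) throw new FormatException("No digits.");
    for (; i < end; i++)
    {
        char c = s[i];
        if (c < '0' || c > '9') throw new FormatException("Unexpected character.");
        result = result * 10 + (c - '0'); // unchecked: overflow not handled in this sketch
    }
    return negative ? -result : result;
}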
I have a large application which averages about 30 MB/sec in memory allocations (per the performance monitor's bytes allocated/sec measurement). I am trying to cut this down substantially, and the source of the allocations is not obvious.
To instrument things I have recorded ETW traces for the CLR / GC and exported the AllocationTick event, which fires every time an additional 100 kilobytes is allocated and records the type of the most recently allocated object. This produces a nicely sized sample set. Three object types account for 70% of my allocations, but they are a bit of a mystery:
System.Int64 30%
System.Int32 28%
System.Runtime.CompilerServices.CallSite`1[System.Func`3[System.Runtime.CompilerServices.CallSite,System.Object,System.Object]] 12%
The dataset was ~70 minutes and a million events, so I am quite confident in the numbers.
I am guessing this is somehow indicating that I am creating a lot of pointers on the heap in some unexpected way? (this is an x64 application)
I use some linq and foreach loops, but these should only be creating increment variables on the stack, not the heap.
I am running everything on top of the TPL / Dataflow library as well, which could be generating these.
I am looking for any advice on what may be causing so many heap allocations of int32/64, and perhaps some techniques to isolate these allocations (call stacks would be great, but may be performance prohibitive).
I am guessing this is somehow indicating that I am creating a lot of pointers on the heap in some unexpected way?
It sounds to me more likely that you're boxing a lot of int and long values.
The CallSite part sounds like you're using dynamic a lot (or in one very heavily-used part of the code), which can easily lead to more boxing than statically typed code.
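For example, a loop like the following (illustrative, not from the question's code) allocates a boxed long on every iteration, plus the call-site machinery on first use:

dynamic total = 0L;
for (int i = 0; i < 1_000_000; i++)
{
    total = total + 1; // each dynamic '+' boxes its result into a new object
}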
I would try to isolate the particular areas of the code which allocate a lot of your objects - if you can exercise just specific code paths, for example, that would give you a much clearer idea of which of those paths generates more garbage than you'd expect. Have a look at anywhere that uses dynamic and see whether you actually need to - although you shouldn't feel you have to remove all uses of dynamic by any means; there may well be one particular "hot spot" that could be micro-optimized.
The other thing to consider is how much this allocation is actually costing you. You say you're trying to cut down on it substantially - do you really need to? If all of these objects are very short-lived, you may well find that you don't improve performance much by reducing the allocation rate. You should measure time spent in garbage collection to work out how effective this is likely to be.
Just like almost any other big .NET application, my current C# project contains many .NET collections.
Sometimes I don't know, from the beginning, what the size of a Collection (List/ObservableCollection/Dictionary/etc.) is going to be.
But there are many times when I do know what it is going to be.
I often get an OutOfMemoryException, and I've been told it can happen not only because of process size limits, but also because of fragmentation.
So my question is this - will setting a collection's size (using the capacity argument in the constructor) every time I know its expected size help me prevent at least some of the fragmentation problems?
This quote is from MSDN:
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List.
But still, I don't want to start changing big parts of my code for something that might not be the real problem.
Has it ever helped any of you to solve out of memory problems ?
Specifying an initial size will rarely if ever get rid of an OutOfMemory issue - unless your collection size is in the millions of objects, in which case you should really not keep such a collection anyway.
Resizing a collection involves allocating a completely new array of the new, larger size and then copying the memory over. If you are already close to out of memory, yes, this can trigger an OutOfMemory, since the new array cannot be allocated.
However, 99 times out of 100, you have a memory leak in your app, and collection-resizing issues are only a symptom of it.
If you are hitting OOM, then you may be being overly aggressive with the data, but to answer the question:
Yes, this may help some - if it has to keep growing the collection by doubling, it can end up allocating and copying twice as much memory for the underlying array (or more precisely, for the earlier, smaller copies that are discarded). Most of these intermediate arrays will be collected promptly, but when they get big you are using the "large object heap", which is harder to compact.
Starting with the correct size prevents all the intermediate copies of the array.
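A small sketch of the difference (sizes are illustrative):

// Without capacity: the backing array grows 4 -> 8 -> 16 -> ...;
// each doubling allocates a new array and copies the old one,
// and the later discarded copies are big enough to land on the LOH.
var grown = new List<int>();
for (int i = 0; i < 1_000_000; i++) grown.Add(i);

// With capacity: one backing array, no intermediate copies.
var sized = new List<int>(1_000_000);
for (int i = 0; i < 1_000_000; i++) sized.Add(i);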
However, what is in the array also matters. Typically, for classes, there is more data in each object (plus overhead for references, etc.), meaning the list is not necessarily the biggest culprit for memory use; you might be burning up most of the memory on the objects themselves.
Note that x64 will allow more overall space, but arrays are limited to 2GB - and if each reference doubles in size this halves the maximum effective length of the array.
Personally I would look at breaking the huge sets into smaller chains of lists; jagged lists, for example.
.NET has a compacting garbage collector, so you probably won't run into fragmentation problems on the normal .NET heap. You can, however, get memory fragmentation if you're using lots of unmanaged memory (e.g. through GDI+, COM, etc.). Also, the large object heap isn't compacted, so that can get fragmented, too. An object is put into the LOH if it's bigger than 85,000 bytes. So if you have many collections that contain more than about 20k objects, you might get fragmentation problems.
But instead of guessing where the problem might be, it might be better to narrow the problem down some more: When do you get the OutOfMemoryExceptions? How much memory is the application using at that time? Using a tool like WinDbg or memory profilers you should be able to find out how much of that memory is on the LOH.
That said, it's always a good idea to set the capacity of List and other data structures in advance if you know it. Otherwise, the List will double its capacity every time you add an item and hit the capacity limit, which means lots of unnecessary allocation and copy operations.
In order to solve this, you have to understand the basics and pinpoint the problem in your code.
It is always a good idea to set the initial capacity, if you have a sensible estimate. If you only have an approximate guess, allocate more.
Fragmentation can only occur on the LOH (objects over 85,000 bytes). To prevent it, try to allocate blocks of the same size. Paradoxically, the solution might sometimes be to allocate more memory than you actually need.
The answer is that, yes, pre-defining a size on collections will improve performance and memory use and reduce fragmentation. See my answer here to see why - If I set the initial size of a .NET collection and then add some items OVER this initial size, how does the collection determine the next resize?
However, without analyzing a memory dump or memory profiling on the app, it's impossible to say exactly what the cause of the OOM is. Thus, impossible to conjecture if this optimization will solve the problem.
I have another active question HERE regarding some hopeless memory issues that possibly involve LOH fragmentation, among possibly other unknowns.
My question now is: what is the accepted way of doing things?
If my app needs to be done in Visual C#, and needs to deal with large arrays to the tune of int[4000000], how can I not be doomed by the garbage collector's refusal to deal with the LOH?
It would seem that I am forced to make any large arrays global, and never use the word "new" around any of them. So, I'm left with ungraceful global arrays with "maxindex" variables instead of neatly sized arrays that get passed around by functions.
I've always been told that this was bad practice. What alternative is there?
Is there some kind of function to the tune of System.GC.CollectLOH("Seriously")?
Is there possibly some way to outsource garbage collection to something other than System.GC?
Anyway, what are the generally accepted rules for dealing with large (>85 KB) variables?
Firstly, the garbage collector does collect the LOH, so do not be immediately scared by its presence. The LOH gets collected when generation 2 gets collected.
The difference is that the LOH does not get compacted, which means that if you have an object in there that has a long lifetime then you will effectively be splitting the LOH into two sections — the area before and the area after this object. If this behaviour continues to happen then you could end up with the situation where the space between long-lived objects is not sufficiently large for subsequent assignments and .NET has to allocate more and more memory in order to place your large objects, i.e. the LOH gets fragmented.
Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).
Starting from .NET 4.5.1, the LOH can be compacted; see the GCSettings.LargeObjectHeapCompactionMode property.
Strategies to avoid LOH fragmentation are:
Avoid creating large objects that hang around. Basically this just means large arrays, or objects which wrap large arrays (such as the MemoryStream which wraps a byte array), as nothing else is that big (components of complex objects are stored separately on the heap so are rarely very big). Also watch out for large dictionaries and lists as these use an array internally.
Watch out for double arrays: the threshold for these going into the LOH is much, much smaller. I can't remember the exact figure, but it's only a few thousand.
If you need a MemoryStream, consider making a chunked version that backs onto a number of smaller arrays rather than one huge array. You could also make custom versions of IList and IDictionary that use chunking, to avoid stuff ending up in the LOH in the first place.
Avoid very long Remoting calls, as Remoting makes heavy use of MemoryStreams which can fragment the LOH during the length of the call.
Watch out for string interning — for some reason these are stored as pages on the LOH and can cause serious fragmentation if your application continues to encounter new strings to intern, i.e. avoid using string.Intern unless the set of strings is known to be finite and the full set is encountered early on in the application's life. (See my earlier question.)
Use Son of Strike to see what exactly is using the LOH memory. Again see this question for details on how to do this.
Consider pooling large arrays.
Edit: the LOH threshold for double arrays appears to be 8 kB (around 1,000 elements).
It's an old question, but I figure it doesn't hurt to update answers with changes introduced in .NET. It is now possible to defragment the Large Object Heap. Clearly the first choice should be to make sure the best design choices were made, but it is nice to have this option now.
https://msdn.microsoft.com/en-us/library/xe0c2357(v=vs.110).aspx
"Starting with the .NET Framework 4.5.1, you can compact the large object heap (LOH) by setting the GCSettings.LargeObjectHeapCompactionMode property to GCLargeObjectHeapCompactionMode.CompactOnce before calling the Collect method, as the following example illustrates."
GCSettings can be found in the System.Runtime namespace
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // this blocking collection compacts the LOH; afterwards the setting reverts to Default
The first thing that comes to mind is to split the array up into smaller ones, so they don't reach the size needed for the GC to put them in the LOH. You could split the arrays into smaller ones of, say, 10,000 elements, and build an object which would know which array to look in based on the indexer you pass.
Now I haven't seen the code, but I would also question why you need an array that large. I would potentially look at refactoring the code so all of that information doesn't need to be stored in memory at once.
You've got it wrong. You do NOT need to have an array of size 4000000, and you definitely do not need to call the garbage collector.
Write your own IList implementation. Like "PagedList"
Store items in arrays of 65536 elements.
Create an array of arrays to hold the pages.
This allows you to access basically all your elements with ONE redirection only. And, as the individual arrays are smaller, fragmentation is not an issue...
...if it is... then REUSE pages. Don't throw them away on dispose; put them on a static "PageList" and pull from there first. All this can be done transparently within your class.
The really good thing is that this list is pretty dynamic in its memory usage. You may want to resize the holder array (the redirector). Even when you don't, it is only about 512 kB of data per page.
Second-level arrays hold 65536 elements each, which is 8 bytes per element for a class (512 kB per page on 64 bit, 256 kB on 32 bit), or 64 kB per byte of struct size.
Technically:
Turn
int[]
into
int[][]
Decide whether 32 or 64 bit is better as you want ;) Both have advantages and disadvantages.
Dealing with ONE large array like that is unwieldy in any language - if you have to, then... basically... allocate it at program start and never recreate it. Only solution.
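For what it's worth, here is a minimal sketch of that PagedList idea (Add and indexer only; the IList plumbing, bounds checks and the page-reuse pool are left out). Since pages are uniform in size, even ones that land on the LOH can be reused without fragmenting it:

using System;

public sealed class PagedList
{
    private const int PageSize = 65536;     // elements per page, as suggested
    private int[][] _pages = new int[16][]; // the redirector array
    public int Count { get; private set; }

    public void Add(int value)
    {
        int page = Count / PageSize;
        if (page == _pages.Length)
            Array.Resize(ref _pages, _pages.Length * 2); // redirector stays small
        if (_pages[page] == null)
            _pages[page] = new int[PageSize];
        _pages[page][Count % PageSize] = value;
        Count++;
    }

    public int this[int index] // ONE redirection, as described above
    {
        get => _pages[index / PageSize][index % PageSize];
        set => _pages[index / PageSize][index % PageSize] = value;
    }
}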
This is an old question, but with .NET Standard 1.1 (.NET Core, .NET Framework 4.5.1+) there is another possible solution:
Using ArrayPool<T> in the System.Buffers package, we can pool arrays to avoid this problem.
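A small usage sketch (Rent may hand back a larger array than requested, and you must Return it, ideally in a finally block):

using System.Buffers;

int[] buffer = ArrayPool<int>.Shared.Rent(4_000_000);
try
{
    // work with buffer[0 .. 3_999_999]; note buffer.Length may be larger
}
finally
{
    ArrayPool<int>.Shared.Return(buffer); // recycled for later Rent calls, not left to the GC
}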
I am adding an elaboration to the answer above, in terms of how the issue can arise. Fragmentation of the LOH is not only dependent on the objects being long-lived. If you've got multiple threads and each of them is creating big lists going onto the LOH, you can have the situation where the first thread needs to grow its list, but the next contiguous bit of memory is already taken up by a list from a second thread, so the runtime allocates new memory for the first thread's list, leaving behind a rather big hole. This is what's happening on one project I've inherited: even though the LOH is approx 4.5 MB, the runtime has a total of 117 MB of free memory, but the largest free segment is 28 MB.
Another way this can happen without multiple threads is if you've got more than one list being added to in some kind of loop: as each expands beyond the memory initially allocated to it, each leapfrogs the other as they grow beyond their allocated spaces.
A useful link is: https://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
Still looking for a solution to this, one option may be to use some kind of pooled objects and request from the pool when doing the work. If you're dealing with large arrays, another option is to develop a custom collection, e.g. a collection of collections, so that you don't have just one huge list but break it up into smaller lists, each of which avoids the LOH.