I want to know whether Array.Resize deletes the old allocated array, and if so, when.
I assumed that it deletes it as soon as the values are copied.
But my teacher says that it only does so at the end of the program, meaning that memory could fill up with the old allocated arrays.
Is that so?
The old array is not used in my code after the resize, so this should trigger the GC, shouldn't it?
When objects are garbage-collected is nondeterministic, and you shouldn’t worry about it too much.
However, what is deterministic is when the array becomes eligible for GC: when it goes out of scope or, more precisely, when there are no more references to it. This happens, for example, once you’re outside the method that contains the array. Being eligible for GC, however, won’t delete it; there needs to be some memory pressure on the GC, which makes the GC clean up resources.
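A minimal sketch of the difference between "eligible" and "collected", assuming a .NET runtime with a precise GC: a WeakReference (which does not keep its target alive) tracks the original array across Array.Resize, and a forced GC.Collect() afterwards makes the outcome observable. The class and method names are hypothetical, and GC.Collect() is used only for the demonstration, not as something production code needs.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ResizeDemo
{
    // Creates an array, records a weak reference to it, then resizes it.
    // NoInlining keeps the original array's only strong reference ('a')
    // confined to this stack frame, which is gone once the method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static (byte[] Resized, WeakReference OldRef) ResizeAndTrack()
    {
        byte[] a = new byte[1000];
        var oldRef = new WeakReference(a);   // does not keep the old array alive
        Array.Resize(ref a, 2000);           // 'a' now refers to a new 2000-byte array
        return (a, oldRef);
    }

    // Returns true if the original array was reclaimed by a forced full collection.
    public static bool OldArrayWasCollected()
    {
        var (resized, oldRef) = ResizeAndTrack();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        GC.KeepAlive(resized);               // the new array is still in use
        return !oldRef.IsAlive;
    }

    public static void Demo() =>
        Console.WriteLine($"Old array collected: {OldArrayWasCollected()}");
}
```

On a typical .NET runtime this should report that the old array was collected; without the forced collection, nothing guarantees when (or whether) that would have happened.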
HimBromBeere and erikallen already explained what happens. We can also easily verify this experimentally.
Consider the following code:
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        byte[] a = new byte[] { };
        long total = 0;
        Console.WriteLine("Iteration | current array size (KB) | total allocations (KB) | private memory size (KB)");
        for (int i = 1; i < Int32.MaxValue; i++)
        {
            // Array.Resize allocates a new i-byte array and copies the old contents into it
            Array.Resize(ref a, i);
            total += i;
            if (i % 10000 == 0)
            {
                Console.WriteLine(i.ToString().PadLeft(9) +
                    (i / 1024).ToString().PadLeft(25) +
                    (total / 1024).ToString().PadLeft(25) +
                    (Process.GetCurrentProcess().PrivateMemorySize64 / 1024).ToString().PadLeft(27));
            }
        }
    }
}
which yields the following result:
Iteration | current array size (KB) | total allocations (KB) | private memory size (KB)
10000 9 48833 10560
20000 19 195322 10924
30000 29 439467 10976
40000 39 781269 11040
50000 48 1220727 11040
60000 58 1757841 11056
70000 68 2392612 11080
80000 78 3125039 11144
90000 87 3955122 14192
...
If all old arrays were kept in memory, we'd need around 4 GB (column total allocations) after 90000 iterations, but memory usage stays at a low 14 MB (column private memory size).
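The same experiment can be made more direct by counting collections. The sketch below (hypothetical class name) compares the bytes requested through Array.Resize with GC.CollectionCount(0) and GC.GetTotalMemory(false); it uses fewer iterations so that every array stays below the 85,000-byte large object heap threshold. Exact numbers depend on the runtime and GC settings.

```csharp
using System;

public static class ResizeChurn
{
    // Grows an array one byte at a time with Array.Resize and reports:
    // total bytes requested, gen-0 collections that ran, current managed heap size.
    public static (long RequestedBytes, int Gen0Collections, long HeapBytes) Run(int iterations)
    {
        byte[] a = Array.Empty<byte>();
        long requested = 0;
        int gen0Before = GC.CollectionCount(0);
        for (int i = 1; i <= iterations; i++)
        {
            Array.Resize(ref a, i);   // allocates a new i-byte array; the previous one becomes garbage
            requested += i;
        }
        return (requested, GC.CollectionCount(0) - gen0Before, GC.GetTotalMemory(false));
    }

    public static void Demo()
    {
        var (requested, gen0, heap) = Run(20_000);
        Console.WriteLine($"Requested: {requested / 1024} KB, gen-0 GCs: {gen0}, managed heap now: {heap / 1024} KB");
    }
}
```

If the requested total is in the hundreds of MB while the managed heap stays small, the old arrays are evidently being reclaimed along the way.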
The old array will be considered unreachable by the GC and will be freed at some unspecified point in time, just as all other objects that become unreachable.
An object is eligible for collection when the GC determines that the object is no longer reachable. Therefore, the original array can be collected if there is no "usable" reference left to reach it.
When the GC decides to collect the object is an altogether different matter, and it is up to the GC to decide; it might very well not collect it at all during the whole lifetime of your app, simply because there is no memory pressure that requires it.
Example:
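private Blah[] Frob(int size)
{
    var someArray = new Blah[] { /* .... */ };
    //somework
    // Array.Resize returns void; it replaces someArray with a reference to a new array
    Array.Resize(ref someArray, size);
    return someArray;
}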
In this case, the object originally referenced by someArray will be eligible for collection once Frob returns (if not earlier), because the array is no longer reachable. It's a locally initialized object that cannot be reached in any other way.
However, in this example:
private Frob[] Foo(int size)
{
    var someArray = GetArrayOfFrobs();
    //somework
    Array.Resize(ref someArray, size);
    return someArray;
}
Whether the object originally referenced by someArray becomes eligible for collection depends on what GetArrayOfFrobs actually does. If GetArrayOfFrobs returns an array that is cached somewhere, or that is part of the state of some other reachable object, then the GC will not mark it as collectible.
In any case, in a managed environment like .NET, it’s not methods that decide whether a managed object is “freed”, as you seem to believe based on your question; it’s the GC, and it does a pretty good job, so don’t fret about it.
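A small sketch of the cached case, with hypothetical names: here GetArrayOfFrobs hands out an array that a static field also references, so resizing the local variable produces a new array while the original stays reachable through the cache and therefore cannot be collected. int[] stands in for Frob[] to keep the example self-contained.

```csharp
using System;

public static class CachedArrayDemo
{
    // Hypothetical cache standing in for "an array that is cached somewhere".
    private static readonly int[] Cache = { 1, 2, 3 };

    public static int[] GetArrayOfFrobs() => Cache;

    public static int[] Foo(int size)
    {
        int[] someArray = GetArrayOfFrobs();
        Array.Resize(ref someArray, size);  // someArray now refers to a new array
        return someArray;                   // Cache still refers to the original one
    }

    public static void Demo()
    {
        int[] resized = Foo(5);
        GC.Collect();
        // The original array is still reachable through Cache, so it survives.
        Console.WriteLine(ReferenceEquals(resized, GetArrayOfFrobs())); // False
        Console.WriteLine(GetArrayOfFrobs().Length);                    // 3
        Console.WriteLine(resized.Length);                              // 5
    }
}
```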
If I create an array without initializing its elements, the program uses only a few MB of memory to store the reference. For example, like this:
int[] arr = new int[1000000000];
Later, if I initialize the elements of the array, the memory usage goes up by the full size of the array.
The memory usage is lower if I initialize only part of the array.
for (int i = 0; i < arr.Length; i++) // Full memory usage, length * sizeof(int)
{
arr[i] = i;
}
----------------------------------------------------------
for (int i = 0; i < arr.Length/2; i++) // Only ~half memory usage, (length / 2) * sizeof(int)
{
arr[i] = i;
}
Now I need to use my fully initialized array to create a data structure and then reduce its size by keeping every Nth element. Because the array is very big, reducing its size can save several GB. The missing elements can be calculated from the remaining ones and the created data structure. It's a trade of calculation time for memory usage.
Now to the question:
Is it possible in C# to release some memory from array after the array is fully initialized?
I have tried to set every Nth element to zero but the memory usage is still the same.
Although Array.Resize() sounds like an option, in reality it wouldn't solve your problem. Your question is: how can I release RAM? Am I correct? If so, using Array.Resize() will only make it worse, because it creates a new array and leaves the source untouched, so it consumes even more RAM.
In the safe space of .NET, I don't think you can release the RAM immediately. I think you will just have to create a final list of numbers, drop all references to the original array, and wait for the garbage collector to do its job, or see whether you can speed up collection by calling GC.Collect().
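A minimal sketch of that suggestion, assuming the element to keep is every Nth one starting at index 0: copy the kept values into a new, smaller array, drop the only reference to the big one, and let GC.GetTotalMemory(true) run a full collection before measuring. Names are hypothetical; GetTotalMemory only reports the managed heap, and the runtime may keep freed memory reserved for the process rather than returning it to the OS right away.

```csharp
using System;

public static class KeepEveryNth
{
    // Copies elements 0, n, 2n, ... of 'source' into a new array.
    // Element k of the result is source[k * n].
    public static int[] Compact(int[] source, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        int count = source.Length / n + (source.Length % n == 0 ? 0 : 1);
        var kept = new int[count];
        for (int k = 0; k < count; k++)
            kept[k] = source[k * n];
        return kept;
    }

    public static void Demo()
    {
        int[] arr = new int[10_000_000];
        for (int i = 0; i < arr.Length; i++) arr[i] = i;

        int[] kept = Compact(arr, 10);
        long before = GC.GetTotalMemory(false);

        arr = null;                              // drop the only reference to the big array
        long after = GC.GetTotalMemory(true);    // true: allow a full collection before measuring

        Console.WriteLine($"Kept {kept.Length} values; managed heap {before >> 20} MB -> {after >> 20} MB");
    }
}
```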
In the unsafe space, however, you can control your RAM usage.
I will stop here for now because I just read an incoming comment from you stating that you need to delete the space but keep the size. You are using an array of a primitive data type. Once initialized, this array cannot be deleted, regardless of whether you are in a safe or unsafe context.
In your case, I would then use a dictionary to store the remaining values and drop the entire original array. Take the example from your comment, where you have 8 elements and keep every 3rd element: your dictionary will have 3 elements, and then you drop the entire original array. The dictionary's key tells you the original index in the array, while its value gives you, well, the value stored at that position.
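The dictionary variant could look roughly like this (hypothetical names; the source array is the 8-element, keep-every-3rd example from the comment). Each Dictionary<int, int> entry costs noticeably more memory than a plain int, so this only pays off when N is reasonably large.

```csharp
using System;
using System.Collections.Generic;

public static class SparseKeep
{
    // Keeps every n-th element, keyed by its index in the original array.
    public static Dictionary<int, int> KeepEveryNth(int[] source, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        var kept = new Dictionary<int, int>(source.Length / n + 1);
        for (int i = 0; i < source.Length; i += n)
            kept[i] = source[i];
        return kept;
    }

    public static void Demo()
    {
        int[] original = { 10, 11, 12, 13, 14, 15, 16, 17 };   // 8 elements
        Dictionary<int, int> kept = KeepEveryNth(original, 3);  // keys 0, 3, 6
        original = null;                                        // the array is now unreachable
        Console.WriteLine(kept[3]); // 13
    }
}
```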
What if you initialize a second array
int[] arrCopy = new int[1000000000];
Copy the first array into the second
Array.Copy(arr, arrCopy, 1000000000);
Re-initialize the first array
arr = new int[1000000000];
Iterate over the second array and store every Nth value into the first
for (int i = 0; i < 1000000000; i += N)
{
    arr[i] = arrCopy[i]; // copy the kept value back from the saved copy
}
Re-initialize the second array
arrCopy = new int[1000000000];
I've created a simple test application that allocates 100 MB using byte arrays.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
namespace VirtualMemoryUsage
{
class Program
{
static void Main(string[] args)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine($"IsServerGC = {System.Runtime.GCSettings.IsServerGC.ToString()}");
const int tenMegabyte = 1024 * 1024 * 10;
long allocatedMemory = 0;
List<byte[]> memory = new List<byte[]>();
for (int i = 0; i < 10; i++)
{
//alloc 10 mb memory
memory.Add(new byte[tenMegabyte]);
allocatedMemory += tenMegabyte;
}
sb.AppendLine($"Allocated memory: {PrettifyByte(allocatedMemory)}");
sb.AppendLine($"VirtualMemorySize64: {PrettifyByte(Process.GetCurrentProcess().VirtualMemorySize64)}");
sb.AppendLine($"PrivateMemorySize64: {PrettifyByte(Process.GetCurrentProcess().PrivateMemorySize64)}");
sb.AppendLine();
Console.WriteLine(sb.ToString());
Console.ReadLine();
}
private static object PrettifyByte(long allocatedMemory)
{
string[] sizes = { "B", "KB", "MB", "GB", "TB" };
int order = 0;
while (allocatedMemory >= 1024 && order < sizes.Length - 1)
{
order++;
allocatedMemory = allocatedMemory / 1024;
}
return $"{allocatedMemory:0.##} {sizes[order]}";
}
}
}
Note: for this test, it is important to set gcServer to true in app.config:
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
This then will show the amount of PrivateMemorySize64 and VirtualMemorySize64 allocated by the process.
While PrivateMemorySize64 remains similar on different computers, VirtualMemorySize64 varies quite a bit.
What is the reason for these differences in VirtualMemorySize64 when the same amount of memory is allocated? Is there any documentation about this?
Wow, you're lucky. On my machine, the last line says 17 GB!
Allocated memory: 100M
VirtualMemorySize64: 17679M
PrivateMemorySize64: 302M
While PrivateMemorySize64 remains similar on different computers [...]
Private bytes are the bytes that belong to your program only. It can hardly be influenced by something else. It contains what is on your heap and inaccessible by someone else.
Why is that 302 MB and not just 100 MB? SysInternals VMMap is a good tool to break down that value:
The colors and sizes of private bytes say:
violet (7.5 MB): image files, i.e. DLLs that are not shareable
orange (11.2 MB): heap (non-.NET)
green (103 MB): managed heap
orange (464 kB): stack
yellow (161 MB): private data, e.g. TEB and PEB
brown (36 MB): page table
As you can see, .NET has just 3 MB overhead in the managed heap. The rest is other stuff that needs to be done for any process.
A debugger or a profiler can help in breaking down the managed heap:
0:013> .loadby sos clr
0:013> !dumpheap -stat
[...]
000007fedac16878 258 11370 System.String
000007fed9bafb38 243 11664 System.Diagnostics.ThreadInfo
000007fedac16ef0 34 38928 System.Object[]
000007fed9bac9c0 510 138720 System.Diagnostics.NtProcessInfoHelper+SystemProcessInformation
000007fedabcfa28 1 191712 System.Int64[]
0000000000a46290 305 736732 Free
000007fedac1bb20 13 104858425 System.Byte[]
Total 1679 objects
So you can see there are some strings and other objects that .NET needs "by default".
What is the reason for this differences in VirtualMemorySize64 when the same amount of memory is allocated?
0:013> !address -summary
[...]
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 58 7fb`adfae000 ( 7.983 TB) 99.79%
MEM_RESERVE 61 4`3d0fc000 ( 16.954 GB) 98.11% 0.21%
MEM_COMMIT 322 0`14f46000 ( 335.273 MB) 1.89% 0.00%
Only 335 MB are committed. That's memory that can actually be used. The 16.954 GB are just reserved. They cannot be used at the moment. They are neither in RAM nor on disk in the page file. Allocating reserved memory is super fast. I've seen that 17 GB value very often, especially in ASP.NET crash dumps.
Looking at details in VMMap again
we can see that the 17 GB are allocated in just one block. A comment on your question said: "When the system runs out of memory, the garbage collector fires and releases the busy one." However, to release a VirtualAlloc()'d block with VirtualFree(), that block must be logically empty, i.e. there must not be a single .NET object inside it, and that's unlikely. So it will stay there forever.
What are possible benefits? It's a single contiguous block of memory. If you need a new byte[4G]() now, it would just work.
Finally, the likely reason is: it's done because it doesn't hurt, neither RAM nor disk. And when needed, it can be committed at a later point in time.
Is there any documentation about this?
That's unlikely. The GC implementation in detail could change with the next version of .NET. I think Microsoft does not document that, otherwise people would complain if the behavior changed.
Some people have written blog posts, like this one, which tell us that some values might depend on, for example, the number of processors.
0:013> !eeheap -gc
Number of GC Heaps: 4
[...]
What we see here is that .NET creates as many heaps as there are processors. That's good for garbage collection, since every processor can collect one heap independently.
The metrics you are using are not allocated memory, but memory used by the process: one is private, the other is shared with other processes on your machine. The real amount of memory used by the process varies depending on both the amount of available memory and the other processes running.
Edit: the answer by Thomas Weller provides much more detail on this subject than my Microsoft links.
These metrics do not necessarily represent the amount of allocations performed by your application. If you want an estimate of the allocated memory (not including .NET Framework libraries, memory paging overhead, etc.), you can use
long memory = GC.GetTotalMemory(true);
where the true parameter tells the GC to perform a garbage collection first (it doesn't have to). Memory that is unused but not yet collected is included in the values you asked about. If the system has enough memory, it might not be collected until it's needed. Here you can find additional information on how the GC works.
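To see the two kinds of numbers side by side, a sketch like the following (hypothetical class name) allocates the same 10 x 10 MB as the question and reads GC.GetTotalMemory(true) next to the Process counters while the buffers are still alive. Only the managed figure should track the 100 MB closely; the process figures will vary with runtime, GC mode and machine.

```csharp
using System;
using System.Diagnostics;

public static class MemoryMetrics
{
    private const int TenMegabytes = 10 * 1024 * 1024;

    // Allocates 10 x 10 MB and measures managed, private and virtual sizes while they are alive.
    public static (long Managed, long Private, long Virtual) AllocateAndMeasure()
    {
        var buffers = new byte[10][];
        for (int i = 0; i < buffers.Length; i++)
            buffers[i] = new byte[TenMegabytes];

        long managed = GC.GetTotalMemory(true);   // may run a full collection first
        long privateBytes, virtualBytes;
        using (Process process = Process.GetCurrentProcess())
        {
            privateBytes = process.PrivateMemorySize64;
            virtualBytes = process.VirtualMemorySize64;
        }
        GC.KeepAlive(buffers);                     // keep the buffers live until after the measurements
        return (managed, privateBytes, virtualBytes);
    }

    public static void Demo()
    {
        var (managed, privateBytes, virtualBytes) = AllocateAndMeasure();
        Console.WriteLine($"GC.GetTotalMemory(true): {managed >> 20} MB");
        Console.WriteLine($"PrivateMemorySize64:     {privateBytes >> 20} MB");
        Console.WriteLine($"VirtualMemorySize64:     {virtualBytes >> 20} MB");
    }
}
```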
I'm using a DataTable to hold a running list of the last 1000 log messages in FIFO order. I add items to the DataTable and remove the first row once the size grows past 1000 items. However, while the DataTable never exceeds 1000 items, the available memory drops over time.
Sample:
DataTable dtLog = new DataTable();
dtLog.Columns.Add("Name", typeof(string));
for (int nLoop = 0; nLoop < 10000; nLoop++)
{
    LogType oLog = new LogType();
    oLog.Name = "Message number " + nLoop;
    dtLog.Rows.Add(oLog.Name);
    if (dtLog.Rows.Count > 1000)
        dtLog.Rows.RemoveAt(0);
}
So the messages are removed from the datatable, but the memory doesn't seem to get released. I would expect the memory to be released...?
Or perhaps there's a better way to do a running log using something other than datatables?
I can't speak to the memory leak part of your question, as memory management and garbage collection in .NET make that a hard thing to investigate.
But, what I can do is suggest that unless you have to, you should never use DataTables in .Net.
Now, "never" is a pretty strong claim! That sort of thing needs backing up with good reasons.
So, what are those reasons? ... Memory usage.
I created this .net fiddle: https://dotnetfiddle.net/wOtjw1
using System;
using System.Collections.Generic;
using System.Xml;
using System.Data;
public class DataObject
{
public string Name { get; set; }
}
public class Program
{
public static void Main()
{
Queue();
}
public static void DataTable()
{
var dataTable = new DataTable();
dataTable.Columns.Add("Name", typeof(string));
for (int nLoop = 0; nLoop < 10000; nLoop++)
{
var dataObject = new DataObject();
dataObject.Name = "Message number " + nLoop;
dataTable.Rows.Add(dataObject.Name); // store the message text in the Name column
if (dataTable.Rows.Count > 1000)
dataTable.Rows.RemoveAt(0);
}
}
public static void Queue()
{
var queue = new Queue<DataObject>();
for (int nLoop = 0; nLoop < 10000; nLoop++)
{
var dataObject = new DataObject();
dataObject.Name = "Message number " + nLoop;
queue.Enqueue(dataObject);
if (queue.Count > 1000)
queue.Dequeue();
}
}
}
Run it twice, once with the DataTable method, once with the Queue method.
Look at the memory usage .net fiddle reports each time:
DataTable Memory: 2.74Mb
Queue Memory: 1.46Mb
It's almost half the memory usage! And all we did was stop using DataTables.
.Net DataTables are notoriously memory hungry. They have fairly good reasons for that, they can store lots of complex schema information and can track changes etc.
That's great, but ... do you need those features?
No? Dump the DT, use something under System.Collections(.Generic).
Whenever you modify/delete a row from DataTable the old/deleted data is still kept by the DataTable until you call DataTable.AcceptChanges
When AcceptChanges is called, any DataRow object still in edit mode successfully ends its edits. The DataRowState also changes: all Added and Modified rows become Unchanged, and Deleted rows are removed.
There is no memory leak because that is as designed.
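The behaviour described above can be observed with DataRow.Delete(): a deleted row stays in the Rows collection, marked as Deleted, until AcceptChanges() is called. The sketch below (hypothetical class name) also calls DataRowCollection.RemoveAt for comparison; the documentation describes DataRowCollection.Remove as equivalent to calling Delete and then AcceptChanges for that row.

```csharp
using System;
using System.Data;

public static class DeleteVersusRemove
{
    private static DataTable CreateTable()
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("first");
        table.Rows.Add("second");
        table.AcceptChanges();           // both rows are now Unchanged
        return table;
    }

    // Returns Rows.Count after Delete(), after AcceptChanges(), and after RemoveAt(0).
    public static (int AfterDelete, int AfterAccept, int AfterRemoveAt) Run()
    {
        var table = CreateTable();
        table.Rows[0].Delete();          // row is only marked as Deleted
        int afterDelete = table.Rows.Count;
        table.AcceptChanges();           // deleted rows are removed now
        int afterAccept = table.Rows.Count;

        var other = CreateTable();
        other.Rows.RemoveAt(0);          // row is removed immediately
        int afterRemoveAt = other.Rows.Count;

        return (afterDelete, afterAccept, afterRemoveAt);
    }

    public static void Demo() => Console.WriteLine(Run()); // (2, 1, 1)
}
```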
As an alternative you can use a circular buffer which would fit better than a queue.
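A minimal fixed-capacity circular buffer might look like this (hypothetical class; not thread-safe). Once the buffer is full, each Add overwrites the oldest slot, so it never holds more than capacity references, and old messages become unreachable as soon as they are overwritten.

```csharp
using System;
using System.Collections.Generic;

// Fixed-capacity ring buffer that keeps only the most recent items.
public sealed class RingBuffer<T>
{
    private readonly T[] _items;
    private int _start;   // index of the oldest item
    private int _count;   // number of items currently stored

    public RingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _items = new T[capacity];
    }

    public int Count => _count;

    // Adds an item, overwriting the oldest one once the buffer is full.
    public void Add(T item)
    {
        int end = (_start + _count) % _items.Length;
        _items[end] = item;
        if (_count == _items.Length)
            _start = (_start + 1) % _items.Length; // full: the oldest slot was just overwritten
        else
            _count++;
    }

    // Enumerates items from oldest to newest.
    public IEnumerable<T> Items()
    {
        for (int i = 0; i < _count; i++)
            yield return _items[(_start + i) % _items.Length];
    }
}
```

For the question's scenario, a RingBuffer<string> with capacity 1000 that receives 10,000 messages ends up holding messages 9000 to 9999, oldest first.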
Your memory is released, but it is not so easy to see. There is a lack of tools (except WinDbg with SOS) that show the currently allocated memory minus dead objects. For this, WinDbg has the !DumpHeap -live option, which displays only live objects.
I have tried the fiddle from AndyJ https://dotnetfiddle.net/wOtjw1
First I needed to create a memory dump with DataTable to have a stable baseline. MemAnalyzer https://github.com/Alois-xx/MemAnalyzer is the right tool for that.
MemAnalyzer.exe -procdump -ma DataTableMemoryLeak.exe DataTable.dmp
This expects procdump from SysInternals in your path.
Now you can run the program with the queue implementation and compare the allocation metrics on the managed heap:
C>MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -dtn 3
Delta(Bytes) Delta(Instances) Instances Instances2 Allocated(Bytes) Allocated2(Bytes) AvgSize(Bytes) AvgSize2(Bytes) Type
-176,624 -10,008 10,014 6 194,232 17,608 19 2934 System.Object[]
-680,000 -10,000 10,000 0 680,000 0 68 System.Data.DataRow
-7,514 -88 20,273 20,185 749,040 741,526 36 36 System.String
-918,294 -20,392 60,734 40,342 1,932,650 1,014,356 Managed Heap(Allocated)!
-917,472 0 0 0 1,954,980 1,037,508 Managed Heap(TotalSize)
This shows that we have 917KB more memory allocated with the DataTable approach and that 10K DataRow instances are still floating around on the managed heap. But are these numbers correct?
No.
Because most objects are already dead, but no full GC happened before we took the memory dump, these objects are still reported as alive. The fix is to tell MemAnalyzer to consider only rooted (live) objects, like WinDbg does with the -live option:
C>MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -dts 5 -live
Delta(Bytes) Delta(Instances) Instances Instances2 Allocated(Bytes) Allocated2(Bytes) AvgSize(Bytes) AvgSize2(Bytes) Type
-68,000 -1,000 1,000 0 68,000 0 68 System.Data.DataRow
-36,960 -8 8 0 36,960 0 4620 System.Data.RBTree+Node<System.Data.DataRow>[]
-16,564 -5 10 5 34,140 17,576 3414 3515 System.Object[]
-4,120 -2 2 0 4,120 0 2060 System.Data.DataRow[]
-4,104 -1 19 18 4,716 612 248 34 System.String[]
-141,056 -1,285 1,576 291 169,898 28,842 Managed Heap(Allocated)!
-917,472 0 0 0 1,954,980 1,037,508 Managed Heap(TotalSize)
The DataTable approach still needs 141,056 bytes more memory because of the extra DataRow, object[] and System.Data.RBTree+Node[] instances. Measuring only the working set is not enough, because the managed heap is lazily deallocated: the GC can keep large amounts of memory if it thinks that the next memory spike is not far away. Measuring committed memory is therefore a nearly meaningless metric, unless your (very low-hanging) goal is only to fix memory leaks that are GBs in size.
The correct way to measure things is to measure the sum of
Unmanaged Heap
Allocated Managed Heap
Memory Mapped Files
Page-file-backed memory-mapped files (shareable memory)
Private Bytes
This is exactly what MemAnalyzer does with the -vmmap switch, which expects vmmap from Sysinternals in its path.
MemAnalyzer -pid ddd -vmmap
This way you can also track unmanaged memory leaks or file mapping leaks as well. The return value of MemAnalyzer is the total allocated memory in KB.
If -vmmap is used it will report the sum of the above points.
If vmmap is not present it will only report the allocated managed heap.
If -live is added then only rooted managed objects are reported.
I wrote the tool because, to my knowledge, there are no tools out there that make it easy to look at memory leaks in a holistic way. I always want to know whether I leak memory, regardless of whether it is managed, unmanaged or something else.
By writing the diff output to a CSV file, you can easily create pivot diff charts like the one above.
MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -live -o ExcelDiff.csv
That should give you some ideas how to track allocation metrics in a much more accurate way.
After doing some profiling, we've discovered that the current way in which our app concatenates strings causes an enormous amount of memory churn and CPU time.
We're building a List<string> of strings to concatenate that is on the order of 500 thousand elements long, referencing several hundred megabytes worth of strings. We're trying to optimize this one small part of our app since it seems to account for a disproportionate amount of CPU and memory usage.
We do a lot of text processing :)
Theoretically, we should be able to perform the concatenation in a single allocation and N copies - we can know how many total characters are available in our string, so it should just be as simple as summing up the lengths of the component strings and allocating enough underlying memory to hold the result.
Assuming we're starting with a pre-filled List<string>, is it possible to concatenate all strings in that list using a single allocation?
Currently, we're using the StringBuilder class, but this stores its own intermediate buffer of all of the characters - so we have an ever growing chunk array, with each chunk storing a copy of the characters we're giving it. Far from ideal. The allocations for the array of chunks aren't horrible, but the worst part is that it allocates intermediate character arrays, which means N allocations and copies.
The best we can do right now is to call List<string>.ToArray() - which performs one copy of a 500k element array - and pass the resulting string[] to string.Concat(params string[]). string.Concat() then performs two allocations: one to copy the input array into an internal array, and one to allocate the destination string's memory.
From referencesource.microsoft.com:
public static String Concat(params String[] values) {
if (values == null)
throw new ArgumentNullException("values");
Contract.Ensures(Contract.Result<String>() != null);
// Spec#: Consider a postcondition saying the length of this string == the sum of each string in array
Contract.EndContractBlock();
int totalLength=0;
// -----------> Allocation #1 <---------
String[] internalValues = new String[values.Length];
for (int i=0; i<values.Length; i++) {
string value = values[i];
internalValues[i] = ((value==null)?(String.Empty):(value));
totalLength += internalValues[i].Length;
// check for overflow
if (totalLength < 0) {
throw new OutOfMemoryException();
}
}
return ConcatArray(internalValues, totalLength);
}
private static String ConcatArray(String[] values, int totalLength) {
// -----------------> Allocation #2 <---------------------
String result = FastAllocateString(totalLength);
int currPos=0;
for (int i=0; i<values.Length; i++) {
Contract.Assert((currPos <= totalLength - values[i].Length),
"[String.ConcatArray](currPos <= totalLength - values[i].Length)");
FillStringChecked(result, currPos, values[i]);
currPos+=values[i].Length;
}
return result;
}
Thus, in the best case, we have three allocations, two for arrays referencing the component strings, and one for the destination concatenated string.
Can we improve on this? Is it possible to concatenate a List<string> using a single allocation and a single loop of character copies?
Edit 1
I'd like to summarize the various approaches discussed so far, and why they are still sub-optimal. I'd also like to make the parameters of the situation a little more concrete, since I've received a lot of questions that try to sidestep the central question.
...
First, the structure of the code that I am working within. There are three layers:
Layer one is a set of methods that produce my content. These methods return small-ish string objects, which I will call my 'component strings'. These string objects will eventually be concatenated into a single string. I do not have the ability to modify these methods; I have to face the reality that they return string objects and move forward.
Layer two is my code that calls these content producers and assembles the output, and is the subject of this question. I must call the content producer methods, collect the strings they return, and eventually concatenate the returned strings into a single string (reality is a little more complex; the returned strings are partitioned depending on how they're routed for output, and so I have several sets of large collections of strings).
Layer three is a set of methods that accept a single large string for further processing. Changing the interface of that code is beyond my control.
Talking about some numbers: a typical batch run will collect ~500000 strings from the content producers, representing about 200-500 MB of memory. I need the most efficient way to concatenate these 500k strings into a single string.
...
Now I'd like to examine the approaches discussed so far. For the sake of numbers, assume we're running 64-bit, assume that we are collecting 500000 string objects, and assume that the aggregate size of the string objects totals 200 megabytes worth of character data. Also, assume that the original string objects' memory is not counted toward any approach's total in the analysis below. I make this assumption because it is necessarily common to any and all approaches, because we cannot change the interface of the content producers - they return 500k relatively small, fully formed string objects that I must then accept and somehow concatenate. As stated above, I cannot change this interface.
Approach #1
Content producers ----> StringBuilder ----> string
Conceptually, this would be invoking the content producers, and directly writing the strings they return to a StringBuilder, and then later calling StringBuilder.ToString() to obtain the concatenated string.
By analyzing StringBuilder's implementation, we can see that the cost of this boils down to 400 MB of allocations and copies:
During the stage where we collect the output from the content producers, we're writing 200 MB of data to the StringBuilder. We would be performing one 200 MB allocation to pre-allocate the StringBuilder, and then 200 MB worth of copies as we copy and discard the strings returned from the content producers.
After we've collected all output from the content producers and have a fully formed StringBuilder, we then need to call StringBuilder.ToString(). This performs exactly one allocation (string.FastAllocateString()), and then copies the string data from its internal buffers to the string object's internal memory.
Total cost: approximately 400 MB of allocations and copies
Approach #2
Content producers ---> pre-allocated char[] ---> string
This strategy is fairly simple. Assuming we know roughly how much character data we're going to be collecting from the producers, we can pre-allocate a char[] that is 200 MB large. Then, as we call the content producers, we copy the strings they return into our char[]. This accounts for 200 MB of allocations and copies. The final step to turn this into a string object is to pass it to the new string(char[]) constructor. However, since strings are immutable and arrays are not, the constructor will make a copy of that entire array, causing it to allocate and copy another 200 MB of character data.
Total cost: approximately 400 MB of allocations and copies
Approach #3:
Content producers ---> List<string> ----> string[] ----> string.Concat(string[])
Pre-allocate a List<string> to be about 500k elements - approximately 4 MB of allocations for List's underlying array (500k * 8 bytes per pointer == 4 MB of memory).
Call all of the content producers to collect their strings. Approximately 4 MB of copies, as we copy the pointer to the returned string into List's underlying array.
Call List<string>.ToArray() to obtain a string[]. Approximately 4 MB of allocations and copies (again, we're really just copying pointers).
Call string.Concat(string[]):
Concat will make a copy of the array provided to it before it does any real work. Approximately 4 MB of allocations and copies, again.
Concat will then allocate a single 'destination' string object using the internal string.FastAllocateString() special method. Approximately 200 MB of allocations.
Concat will then copy strings from its internal copy of the provided array directly into the destination. Approximately 200 MB of copies.
Total cost: approximately 212 MB of allocations and copies
None of these approaches are ideal, however approach #3 is very close. We're assuming that the absolute minimum of memory that needs to be allocated and copied is 200 MB (for the destination string), and here we get pretty close - 212 MB.
If there were a string.Concat overload that 1) Accepted an IList<string> and 2) did not make a copy of that IList before using it, then the problem would be solved. No such method is provided by .Net, hence the subject of this question.
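On .NET Core 2.1 and later (a different API surface from the .NET Framework reference source quoted here), string.Create lets you fill a newly allocated string through a Span<char> without unsafe code. That gives exactly one string allocation plus one copy pass, for any IReadOnlyList<string> (which List<string> implements), without copying the list first. A sketch with a hypothetical class name; nulls are treated as empty, as string.Concat does, and the list must not be modified while the method runs.

```csharp
using System;
using System.Collections.Generic;

public static class ListConcat
{
    // Concatenates all strings in 'parts' with a single string allocation.
    // Requires .NET Core 2.1+ / .NET 5+ (string.Create and Span<char>).
    public static string Concat(IReadOnlyList<string> parts)
    {
        long total = 0;
        for (int i = 0; i < parts.Count; i++)
            total += parts[i]?.Length ?? 0;   // treat null as empty
        if (total > int.MaxValue)
            throw new OutOfMemoryException("Concatenated string would be too long.");
        if (total == 0)
            return string.Empty;

        // string.Create allocates the destination once and lets us write into it.
        return string.Create((int)total, parts, (span, state) =>
        {
            int pos = 0;
            for (int i = 0; i < state.Count; i++)
            {
                string s = state[i];
                if (s is null) continue;
                s.AsSpan().CopyTo(span.Slice(pos));
                pos += s.Length;
            }
        });
    }
}
```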
Edit 2
Progress on a solution.
I've done some testing with some hacked IL, and found that directly invoking string.FastAllocateString(n) (which is not usually invokable...) is about as fast as invoking new string('\0', n), and both seem to allocate exactly as much memory as is expected.
From there, it seems it's possible to acquire a pointer to the freshly allocated string using the unsafe and fixed statements.
And so, a rough solution begins to appear:
private static string Concat( List<string> list )
{
int concatLength = 0;
for( int i = 0; i < list.Count; i++ )
{
concatLength += list[i].Length;
}
string newString = new string( '\0', concatLength );
unsafe
{
fixed( char* ptr = newString )
{
...
}
}
return newString;
}
The next biggest hurdle is implementing or finding an efficient block copy method, like Buffer.BlockCopy, but one that accepts char* types.
If you can determine the length of the concatenation before performing the operation, a char array can beat StringBuilder in some use cases. Manipulating the characters within the array avoids the multiple allocations.
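One way to fill in the elided part of that rough solution, assuming the project is compiled with unsafe code enabled (AllowUnsafeBlocks): Buffer.MemoryCopy (available since .NET Framework 4.6) is the pointer-based counterpart of Buffer.BlockCopy and takes the destination size in bytes as an overrun guard. Writing into a string after creating it relies on the string not yet being visible to any other code; the runtime tolerates this, but it is a hack rather than a supported pattern. Hypothetical class name.

```csharp
using System;
using System.Collections.Generic;

public static class UnsafeListConcat
{
    // Allocates the result once, then block-copies each component into it.
    public static unsafe string Concat(List<string> list)
    {
        int concatLength = 0;
        for (int i = 0; i < list.Count; i++)
            concatLength = checked(concatLength + (list[i]?.Length ?? 0)); // throws OverflowException if too long
        if (concatLength == 0)
            return string.Empty;   // never mutate the shared empty string

        string result = new string('\0', concatLength);
        fixed (char* dest = result)
        {
            char* p = dest;
            long remainingBytes = (long)concatLength * sizeof(char);
            foreach (string s in list)
            {
                if (string.IsNullOrEmpty(s)) continue;
                long bytes = (long)s.Length * sizeof(char);
                fixed (char* src = s)
                {
                    // destinationSizeInBytes guards against writing past the result
                    Buffer.MemoryCopy(src, p, remainingBytes, bytes);
                }
                p += s.Length;
                remainingBytes -= bytes;
            }
        }
        return result;
    }
}
```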
See: http://blogs.msdn.com/b/cisg/archive/2008/09/09/performance-analysis-reveals-char-array-is-better-than-stringbuilder.aspx
UPDATE
Please check out this internal implementation of the String.Join from .NET - it uses unsafe code with pointers to avoid multiple allocations. Unless I'm missing something, it would seem you can re-write this using your List to accomplish what you want:
[System.Security.SecuritySafeCritical] // auto-generated
public unsafe static String Join(String separator, String[] value, int startIndex, int count) {
//Range check the array
if (value == null)
throw new ArgumentNullException("value");
if (startIndex < 0)
throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_StartIndex"));
if (count < 0)
throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_NegativeCount"));
if (startIndex > value.Length - count)
throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_IndexCountBuffer"));
Contract.EndContractBlock();
//Treat null as empty string.
if (separator == null) {
separator = String.Empty;
}
//If count is 0, that skews a whole bunch of the calculations below, so just special case that.
if (count == 0) {
return String.Empty;
}
int jointLength = 0;
//Figure out the total length of the strings in value
int endIndex = startIndex + count - 1;
for (int stringToJoinIndex = startIndex; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
if (value[stringToJoinIndex] != null) {
jointLength += value[stringToJoinIndex].Length;
}
}
//Add enough room for the separator.
jointLength += (count - 1) * separator.Length;
// Note that we may not catch all overflows with this check (since we could have wrapped around the 4gb range any number of times
// and landed back in the positive range.) The input array might be modifed from other threads,
// so we have to do an overflow check before each append below anyway. Those overflows will get caught down there.
if ((jointLength < 0) || ((jointLength + 1) < 0) ) {
throw new OutOfMemoryException();
}
//If this is an empty string, just return.
if (jointLength == 0) {
return String.Empty;
}
string jointString = FastAllocateString( jointLength );
fixed (char * pointerToJointString = &jointString.m_firstChar) {
UnSafeCharBuffer charBuffer = new UnSafeCharBuffer( pointerToJointString, jointLength);
// Append the first string first and then append each following string prefixed by the separator.
charBuffer.AppendString( value[startIndex] );
for (int stringToJoinIndex = startIndex + 1; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
charBuffer.AppendString( separator );
charBuffer.AppendString( value[stringToJoinIndex] );
}
Contract.Assert(*(pointerToJointString + charBuffer.Length) == '\0', "String must be null-terminated!");
}
return jointString;
}
Source: http://www.dotnetframework.org/default.aspx/4#0/4#0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/String#cs/1305376/String#cs
UPDATE 2
Good point on FastAllocateString. According to an old SO post, you can call it through reflection (assuming, of course, that you cache the MethodInfo so that each call is just an Invoke). Perhaps the overhead of the reflective call is a better tradeoff than what you're doing now.
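// Requires: using System; using System.Linq; using System.Reflection;
// FastAllocateString is an internal runtime detail and may change between versions.
// Look it up once and cache the MethodInfo; each later call is then just an Invoke.
var fastAllocate = typeof(string).GetMethods(BindingFlags.NonPublic | BindingFlags.Static)
    .First(x => x.Name == "FastAllocateString");
var newString = (string)fastAllocate.Invoke(null, new object[] { 20 });
Console.WriteLine(newString.Length); // 20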
Perhaps another approach is to use unsafe code to copy your strings into a char* buffer, then pass that to the string constructor. The string constructor that takes a char* is an extern implemented in the underlying C++ runtime. I haven't found a reliable source for that code to confirm, but perhaps this can be faster for you. The non-production-ready code (no checks for potential overflow, no fixed to pin strings against garbage collection, etc.) would start with:
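// Requires: using System.Collections.Generic; using System.Linq;
public unsafe string MyConcat(List<string> values)
{
    int index = 0;
    int totalLength = values.Sum(m => m.Length);
    // Caution: stackalloc memory lives on the thread's stack (typically 1 MB for the
    // main thread), so this only works for small totals; a multi-MB concatenation
    // will overflow the stack.
    char* concat = stackalloc char[totalLength + 1]; // Add additional char for null term
    foreach (var value in values)
    {
        foreach (var c in value)
        {
            concat[index] = c;
            index++;
        }
    }
    concat[index] = '\0';
    return new string(concat);
}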
Now I'm all out of ideas for this :) Perhaps somebody can figure out a method with marshalling that avoids unsafe code. Since unsafe code requires compiling with the /unsafe flag, consider putting this piece in a separate DLL to minimize your app's security risk if you go down that route.
Unless the average length of the strings is very small, the most efficient approach, given a List<String>, will be to use ToArray() to copy it to a new String[], and pass that to a concatenation or joining method. Doing that may cause a wasted allocation for an array of references if the concatenation or joining method wants to make a copy of its array before it starts, but that would only allocate one reference per string, there will only be one allocation to hold character data, and it will be correctly sized to hold the entire string.
If you're building the data structure yourself, you might gain a little bit of efficiency by initializing a String[] to the estimated required size, populating it yourself, and expanding it as needed. That would save one allocation of a String[] worth of data.
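A minimal sketch of that idea, assuming you control the producer. The class name StringArrayCollector and its default capacity are hypothetical. Rather than trimming the oversized array to an exact-size copy, it passes the filled prefix to the String.Join(String, String[], Int32, Int32) overload quoted above, which takes a start index and a count and treats null entries as empty.

```csharp
using System;

// Hypothetical helper: collects strings into a String[] that grows by doubling,
// so the final concatenation does not need an exact-size copy of the array.
public sealed class StringArrayCollector
{
    private string[] items;
    private int count;

    public StringArrayCollector(int estimatedCount = 1024)
    {
        // Start from the caller's estimate so most runs never resize.
        items = new string[Math.Max(1, estimatedCount)];
    }

    public int Count => count;

    public void Add(string value)
    {
        if (count == items.Length)
        {
            // Expand only when the estimate turned out too small.
            Array.Resize(ref items, items.Length * 2);
        }
        items[count++] = value;
    }

    public string Concat()
    {
        // Join only the used prefix; slots past 'count' are ignored.
        return string.Join(string.Empty, items, 0, count);
    }
}
```

As the reference source above shows, this Join overload reads the array in place and allocates once for the character data.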
Another approach would be to allocate a String[8192][] and then allocate a String[8192] for each array of strings as you go along. Once you're all done, you'll know exactly what size String[] you need to pass to the Concat method so you can create an array of that exact size. This approach would require a greater quantity of allocations, but only the final String[] and the String itself would need to go on the Large Object Heap.
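A sketch of the chunked approach, assuming a 64-bit process (8-byte references): each String[8192] chunk, and the outer String[8192][], is about 64 KB, under the 85,000-byte LOH threshold, so only the exact-size array built at the end (and the result string) can land on the LOH. ChunkedStringCollector is a hypothetical name, and the sketch does not check the 8192 × 8192 capacity limit implied by the outer array.

```csharp
using System;

// Hypothetical sketch: strings are stored in fixed-size chunks of 8192 references.
public sealed class ChunkedStringCollector
{
    private const int ChunkSize = 8192;
    private readonly string[][] chunks = new string[ChunkSize][];
    private int count;

    public int Count => count;

    public void Add(string value)
    {
        int chunk = count / ChunkSize;
        if (chunks[chunk] == null)
        {
            // Allocate a new small chunk only when the previous one is full.
            chunks[chunk] = new string[ChunkSize];
        }
        chunks[chunk][count % ChunkSize] = value;
        count++;
    }

    public string Concat()
    {
        // The exact size is known now: copy the references into one array.
        var all = new string[count];
        for (int i = 0; i < count; i += ChunkSize)
        {
            int n = Math.Min(ChunkSize, count - i);
            Array.Copy(chunks[i / ChunkSize], 0, all, i, n);
        }
        return string.Concat(all);
    }
}
```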
It's a shame about the constraints you're putting on yourself. The design is very block-structured, and it's hard to get any flow going. For example, if you didn't expect an IList but only an IEnumerable, you might be able to make it easier for the producer of your content. Not only that, your processing could benefit from consuming the strings only as you need them, and only as they're produced.
This gets you on down the road to some nice asynchrony.
On the other end, they're making you send the whole thing at once. That's tough.
But having said that, and since you're going to run this over and over: I'm wondering whether you could create your string buffer, byte buffer, StringBuilder or whatever, and reuse it between executions. Allocate the max monster one time (or progressively bump-reallocate it as needed), and don't let the GC have it. The string constructor will copy it over and over again, but that's a single allocation per cycle. If you're running this so much that you're making the machine hot, it might be worth the hit. I've made precisely that tradeoff recently (though I didn't have 5 GB to choke on). It felt dirty at first, but ooohh, the throughput spoke loudly!
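A minimal sketch of the reuse idea, assuming the consumer really does need a System.String each cycle. The class name ReusableConcatBuffer is hypothetical. The char[] survives between calls and only grows, so after warm-up each call allocates just the result string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating "allocate once, reuse between executions".
public sealed class ReusableConcatBuffer
{
    // Kept alive between calls so the GC never reclaims it.
    private char[] buffer = Array.Empty<char>();

    public string Concat(IList<string> parts)
    {
        int total = 0;
        for (int i = 0; i < parts.Count; i++)
        {
            total = checked(total + parts[i].Length);
        }
        if (buffer.Length < total)
        {
            // Grow only when a larger input arrives; smaller inputs reuse the buffer.
            buffer = new char[total];
        }
        int pos = 0;
        for (int i = 0; i < parts.Count; i++)
        {
            string s = parts[i];
            s.CopyTo(0, buffer, pos, s.Length);
            pos += s.Length;
        }
        // The string constructor copies the used part of the buffer:
        // this is the one allocation per call.
        return new string(buffer, 0, total);
    }
}
```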
Also, even though your native API expects a string, you may be able to lie to it and let it think you're giving it a string. You can very probably pass the buffer with a null char at the end, or with the length, depending on the API's particulars. I think one or two commenters spoke to this. In that case you will probably need your buffer pinned for the duration of the calls to the native consumer of your big ol' string.
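A hedged sketch of the pinning step: GCHandle.Alloc with GCHandleType.Pinned keeps the char[] from moving while native code reads it. The native function and its signature are unknown here, so the call is left as a commented-out placeholder (NativeConsume is hypothetical); the sketch reads the first character back through the raw pointer with Marshal.ReadInt16 only to show the buffer is addressable in place. The caller must leave room in the buffer for the terminating '\0'.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedBufferSketch
{
    // Pins 'buffer' for the duration of a (hypothetical) native call and returns the
    // first character as read through the raw pointer. 'length' is the number of
    // characters in use; the buffer must have room for one more ('\0').
    public static char CallNativeWithPinnedBuffer(char[] buffer, int length)
    {
        buffer[length] = '\0'; // in case the native side expects a null-terminated string
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr ptr = handle.AddrOfPinnedObject();
            // NativeConsume(ptr, length); // hypothetical P/Invoke into your native API
            return (char)Marshal.ReadInt16(ptr);
        }
        finally
        {
            handle.Free(); // unpin as soon as native code no longer needs the buffer
        }
    }
}
```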
If this is the case, you're down to a one-time allocation of a buffer, repeated copies into it, and that's it. It could go way under your proposed best case.
I have implemented a method to concatenate a List into a single string that performs exactly one allocation.
The following code requires .NET 4.6 or later: Buffer.MemoryCopy wasn't added to .NET until 4.6.
The "unsafe" implementation:
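using System;
using System.Collections.Generic;

public static unsafe class FastConcat
{
    public static string Concat( IList<string> list )
    {
        string destinationString;
        int destLengthChars = 0;
        for( int i = 0; i < list.Count; i++ )
        {
            destLengthChars += list[i].Length;
        }
        // The single allocation: a zero-filled string of the final length.
        destinationString = new string( '\0', destLengthChars );
        unsafe
        {
            // Overwriting the freshly allocated string is only acceptable because no
            // other code has seen it yet; strings must not be mutated once published.
            fixed( char* origDestPtr = destinationString )
            {
                char* destPtr = origDestPtr; // a pointer we can modify.
                string source;
                for( int i = 0; i < list.Count; i++ )
                {
                    source = list[i];
                    fixed( char* sourcePtr = source )
                    {
                        // Pass the bytes remaining in the destination instead of
                        // long.MaxValue, so MemoryCopy throws rather than overruns
                        // if the list changed after the length was computed.
                        long remainingBytes = ( destLengthChars - ( destPtr - origDestPtr ) ) * sizeof( char );
                        Buffer.MemoryCopy(
                            sourcePtr,
                            destPtr,
                            remainingBytes,
                            source.Length * sizeof( char )
                        );
                    }
                    destPtr += source.Length;
                }
            }
        }
        return destinationString;
    }
}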
The competing implementation is the following "safe" implementation:
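// Requires: using System.Collections.Generic; using System.Linq;
// (IList<string> has no ToArray() of its own; this is LINQ's Enumerable.ToArray.)
public static string Concat( IList<string> list )
{
    return string.Concat( list.ToArray() );
}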
Memory consumption
The "unsafe" implementation performs exactly one allocation and zero temporary allocations. The List<string> is directly concatenated into a single, freshly allocated string object.
The "safe" implementation requires two copies of the list - one, when I call ToArray() to pass it to string.Concat, and another when string.Concat performs its own internal copy of the array.
When concatenating a 500k element list, the "safe" string.Concat method allocates exactly 8 MB of extra memory in a 64-bit process, which I've confirmed by running the test driver in a memory monitor. This is what we would expect with the array copies performed by the safe implementation.
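As a sanity check on that figure: a reference is 8 bytes in a 64-bit process, and the safe path makes two extra arrays of 500k references each, so (ignoring the small per-array header):

```latex
2 \times 500\,000 \times 8\ \text{bytes} = 8\,000\,000\ \text{bytes} \approx 8\ \text{MB}
```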
CPU performance
For small worksets, the unsafe implementation seems to win by about 25%.
The test driver was compiled for 64-bit, installed into the native image cache via NGEN, and run outside the debugger on an unloaded workstation.
From my test driver with a small workset (500k strings each 2-10 chars long):
Unsafe Time: 17.266 ms
Unsafe Time: 18.419 ms
Unsafe Time: 16.876 ms
Safe Time: 21.265 ms
Safe Time: 21.890 ms
Safe Time: 24.492 ms
Unsafe average: 17.520 ms. Safe average: 22.549 ms. Safe takes about 29% longer than unsafe (equivalently, unsafe is about 22% faster). This is likely due to the extra work the safe implementation has to do, allocating temporary arrays.
...
From my test driver with a large workset (500k strings, each 500-800 chars long):
Unsafe Time: 498.122 ms
Unsafe Time: 513.725 ms
Unsafe Time: 515.016 ms
Safe Time: 487.456 ms
Safe Time: 499.508 ms
Safe Time: 512.390 ms
As you can see, with large strings the difference is within noise (averages of 508.954 ms unsafe vs. 499.785 ms safe), likely because the time is dominated by the raw copy.
Conclusion
If you don't care about the array copies, the safe implementation is dead simple to implement, and is roughly as fast as the unsafe implementation. If you want to be absolutely perfect with memory usage, use the unsafe implementation.
I've attached the code I used for the test harness:
using System;
using System.Collections.Generic;
using System.Diagnostics;
class PerfTestHarness
{
private List<string> corpus;
public PerfTestHarness( List<string> corpus )
{
this.corpus = corpus;
// Warm up the JIT
// Note that `result` is discarded. We reference it via 'result[0]' as an
// unused parameter to my prints to be absolutely sure it doesn't get
// optimized out. Cheap hack, but it works.
string result;
result = FastConcat.Concat( this.corpus );
Console.WriteLine( "Fast warmup done", result[0] );
result = string.Concat( this.corpus.ToArray() );
Console.WriteLine( "Safe warmup done", result[0] );
GC.Collect();
GC.WaitForPendingFinalizers();
}
public void PerfTestSafe()
{
Stopwatch watch = new Stopwatch();
string result;
GC.Collect();
GC.WaitForPendingFinalizers();
watch.Start();
result = string.Concat( this.corpus.ToArray() );
watch.Stop();
Console.WriteLine( "Safe Time: {0:0.000} ms", watch.Elapsed.TotalMilliseconds, result[0] );
Console.WriteLine( "Memory usage: {0:0.000} MB", Environment.WorkingSet / 1000000.0 );
Console.WriteLine();
}
public void PerfTestUnsafe()
{
Stopwatch watch = new Stopwatch();
string result;
GC.Collect();
GC.WaitForPendingFinalizers();
watch.Start();
result = FastConcat.Concat( this.corpus );
watch.Stop();
Console.WriteLine( "Unsafe Time: {0:0.000} ms", watch.Elapsed.TotalMilliseconds, result[0] );
Console.WriteLine( "Memory usage: {0:0.000} MB", Environment.WorkingSet / 1000000.0 );
Console.WriteLine();
}
}
StringBuilder was designed to concatenate strings efficiently. It has no other purpose. Use the constructor that sets the initial capacity:
int totalLength = CalcTotalLength();
// sufficient capacity
StringBuilder sb = new StringBuilder(totalLength);
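A complete version of the preallocated StringBuilder, as a sketch: the loop that sums lengths stands in for the CalcTotalLength() helper, whose definition isn't shown, and StringBuilderConcat is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StringBuilderConcat
{
    public static string Concat(IList<string> parts)
    {
        // Stand-in for CalcTotalLength(): sum the part lengths up front.
        int totalLength = 0;
        foreach (string s in parts)
        {
            totalLength += s.Length;
        }
        // Sufficient capacity, so Append never has to grow the builder.
        var sb = new StringBuilder(totalLength);
        foreach (string s in parts)
        {
            sb.Append(s);
        }
        // ToString() copies the builder's buffer into a new string,
        // which is the intermediate allocation discussed next.
        return sb.ToString();
    }
}
```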
But then you say that even StringBuilder allocates intermediate memory, and you want to do better...
These are unusual requirements, so you need to write a function which suits your situation (creating a char[] of appropriate size, then filling it in). I'm sure you are more than capable.
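A variation on "allocate the right size, then fill it in", assuming .NET Core 2.1 or later: this swaps in a different technique from the char[] described above. string.Create allocates the result string once and hands you its buffer as a Span<char>, so there is no intermediate char[] and no unsafe code. SpanConcat is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SpanConcat
{
    public static string Concat(IList<string> parts)
    {
        int total = 0;
        for (int i = 0; i < parts.Count; i++)
        {
            total = checked(total + parts[i].Length);
        }
        // string.Create allocates a string of 'total' chars and lets the
        // callback fill it in place; the result is the only allocation.
        return string.Create(total, parts, (span, state) =>
        {
            int pos = 0;
            for (int i = 0; i < state.Count; i++)
            {
                string s = state[i];
                s.AsSpan().CopyTo(span.Slice(pos));
                pos += s.Length;
            }
        });
    }
}
```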
My first two answers have already been incorporated into the question. Here is my third, highly situation-dependent but useful one:
Third Answer
If, in all these MBs of strings, you are getting a lot of identical strings, then a smarter way would be to use two dictionaries: a Dictionary<int, int> that maps each position to the "Id" of the string at that position, and another Dictionary<int, int> that maps each "Id" to the index of the actual string in the original string[].
Conveniently for me, what I am trying to do is already implemented in C#. It goes roughly like this...
If there really are a lot of identical strings, this may be one of the rare cases where string interning is useful. You could save a considerable amount of your 200 MB target if many matching strings are coming from the content producers.
What is String.Intern?
When you use strings in C#, the CLR does something clever called
string interning. It's a way of storing one copy of any string. If you
end up having a hundred—or, worse, a million—strings with the same
value, it's a waste to take up all of that memory storing the same
string over and over again. String interning is a way around that.
The CLR maintains a table called the intern pool that contains a
single, unique reference to every literal string that's either
declared or created programmatically while your program's running. And
the .NET Framework gives you two useful methods for interacting with
the intern pool: String.Intern() and String.IsInterned().
The way String.Intern() works is pretty straightforward. You pass it a
single string as an argument. If that string is already in the intern
pool, it returns a reference to that string. If it's not already in
the intern pool, it adds it and returns the same reference you passed
into it.
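A small sketch of applying String.Intern to a list, assuming the strings repeat heavily (InternDemo is a hypothetical name). Each duplicate is still allocated once as it arrives, but after interning it becomes garbage and all slots share one instance. The String.Intern documentation notes that memory for interned strings is not likely to be released until the CLR terminates, so interning values that rarely repeat only grows the footprint.

```csharp
using System.Collections.Generic;

public static class InternDemo
{
    // Replace each string with its interned instance so duplicates share one object.
    public static void InternAll(IList<string> values)
    {
        for (int i = 0; i < values.Count; i++)
        {
            if (values[i] != null)
            {
                values[i] = string.Intern(values[i]);
            }
        }
    }
}
```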
The way to use String Interning is explained in the link. For the sake of completeness of this answer I can add the code here but only if you feel that these solutions are useful.
I'm investigating a memory leak with PerfMon and WinDbg. I noticed that the 'Large Object Heap size' counter increased from 10 MB to 37 MB. After forcing a GC, it only drops to 28 MB.
(No matter how many times I repeat the operation (create/destroy), after a GC the large object heap stays at 28 MB.)
I would like to know which objects cause the leak, so I ran WinDbg with the '!dumpheap -min 85000' command and captured two snapshots, the first before the memory leak and the second after it:
Before:
MT       Count TotalSize Class Name
6f39fb08     1     89024 System.String
6f3a4aa0     1    107336 System.Byte[]
6f356d84     2    262176 System.Object[]
00360e4c     1    350392 System.Collections.Generic.Dictionary`2+Entry[Int64,Int32][]
6f3a2a94     3    592584 System.Int32[]
00360c24     1    727072 System.Collections.Generic.Dictionary`2+Entry[String,Int64][]
0bc78b34     4   2754488 System.Collections.Generic.Dictionary`2+Entry[Int64, AccountNode][]
00730260    10   5375572 Free
After:
MT       Count TotalSize Class Name
6f39fb08     1     89024 System.String
6f3a4aa0     1    107336 System.Byte[]
6f3a55d8     2    202080 System.Collections.Hashtable+bucket[]
6f356d84     2    262176 System.Object[]
00360e4c     1    350392 System.Collections.Generic.Dictionary`2+Entry[Int64,Int32][]
00360c24     1    727072 System.Collections.Generic.Dictionary`2+Entry[String,Int64][]
6f3a2a94     4    738008 System.Int32[]
6cf02838     1    872488 System.Collections.Generic.Dictionary`2+Entry[[MS.Internal.ComponentModel.PropertyKey, WindowsBase],[MS.Internal.ComponentModel.DependencyPropertyKind, WindowsBase]][]
0bc78b34     4   2754488 System.Collections.Generic.Dictionary`2+Entry[Int64, AccountNode][]
00730260    14  21881328 Free
Total 31 objects
Comparing these two snapshots, the biggest difference is the size of 'Free': it has increased by nearly 16 MB.
Can anyone tell me what 'Free' means? Is it free space? Is the increase caused by fragmentation?
According to this article, the 'Large Object Heap Size' performance counter seems to include free space.
So in my case there isn't much of a memory leak on the large object heap, only about 2 MB (= 28 - 10 - 16), right?
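Summing the TotalSize column of each dump (all object rows, then the Free row separately) gives totals that line up with the counter readings of 10 MB and 28 MB, and lets the estimate be checked directly (decimal MB):

```latex
\begin{aligned}
\text{Before: } & 4\,883\,072\ (\text{objects}) + 5\,375\,572\ (\text{Free}) = 10\,258\,644\ \text{bytes} \approx 10\ \text{MB} \\
\text{After: } & 6\,103\,064\ (\text{objects}) + 21\,881\,328\ (\text{Free}) = 27\,984\,392\ \text{bytes} \approx 28\ \text{MB} \\
\Delta_{\text{objects}} & = 6\,103\,064 - 4\,883\,072 = 1\,219\,992\ \text{bytes} \approx 1.2\ \text{MB} \\
\Delta_{\text{Free}} & = 21\,881\,328 - 5\,375\,572 = 16\,505\,756\ \text{bytes} \approx 16.5\ \text{MB}
\end{aligned}
```

By this arithmetic the live large objects grew by about 1.2 MB (the new Hashtable+bucket[] and PropertyKey dictionary entries, plus a fourth Int32[]), and the rest of the growth is Free space.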
FREE indicates a block of unused memory on the heap. FREE blocks on the LOH are expected, because the LOH is not compacted by default. Instead, a list of FREE space is kept for the LOH. FREE blocks on the normal GC heap, with a few exceptions, indicate fragmentation due to the pinning of objects. When the GC encounters a pinned object, compaction of the segment is halted and the memory consumed by unused objects is marked as FREE. What you're seeing on the LOH is normal. Remember that the LOH is normally not compacted and memory segments allocated for the LOH are never freed, so the LOH never shrinks.
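Later runtimes relax the no-compaction rule: starting with .NET Framework 4.5.1 you can ask the GC to compact the LOH once, which may help tell fragmentation apart from a real leak in a case like this. A minimal sketch (LohCompaction is a hypothetical name); a full blocking collection with LOH compaction is expensive, so this is a diagnostic tool rather than something to call routinely.

```csharp
using System;
using System.Runtime;

public static class LohCompaction
{
    // Requests that the next blocking gen 2 collection also compacts the LOH;
    // the setting reverts to Default after that collection.
    public static void CompactLargeObjectHeapOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```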
The meaning of large object heap is well explained here.
Large objects are objects of 85,000 bytes or more; they are stored in that area and collected only when generation 2 is collected.
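A quick way to observe the threshold, as a sketch assuming the default LOH threshold (newer runtimes let you configure it): GC.GetGeneration reports an LOH object as generation 2 immediately after allocation, while a small array starts in a younger generation. LohThresholdDemo is a hypothetical name.

```csharp
using System;

public static class LohThresholdDemo
{
    // Allocates a byte[] of the given size and reports which generation the GC
    // assigns it; arrays of 85,000 bytes or more go straight to the LOH (gen 2).
    public static int GenerationOf(int byteCount)
    {
        var array = new byte[byteCount];
        return GC.GetGeneration(array);
    }
}
```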