Deleting byte[] from program and memory in C#

I'm writing a program in which one function, in order to correctly build the message to be sent, repeatedly calls a helper function I wrote to append each of the parts to the array. The thing is, in C# you can't do this, because byte arrays (and, if I'm not mistaken, arrays of any kind) have a fixed Length that cannot be changed.
Because of this, I thought of creating two byte array variables. The first one would hold the first two values. The second one would be created once you know how many new bytes you have to add; after that, you would delete the first variable and create it again with the Length of the previous variable plus the Length of the new values, doing the same you did with the second variable. The code I've written is:
byte[] message_mod_0 = adr_and_func;
byte[] byte_memory_adr = AddAndTypes.ToByteArray(memory_adr);
byte[] message_mod_1 = new byte[2 + byte_memory_adr.Length];
message_mod_1 = AddAndTypes.AddByteArrayToByteArray(message_mod_0, byte_memory_adr);
AddAndTypes.AddByteArrayToByteArray(message_mod_0, AddAndTypes.IntToByte(value));
byte[] CRC = Aux.CRC(message_mod_0);
AddAndTypes.AddByteArrayToByteArray(message_mod_0, CRC);
In this code, the two variables I mean are message_mod_0 and message_mod_1. I'm also thinking of deleting and redeclaring the byte_memory_adr variable, which is required in order to know the Length of the byte array you want to add to the output message.
The parameters adr_and_func, memory_adr and value are given as input parameters of the function I'm making.
The question can be summed up as: is there any way to delete variables in the same scope they were created in? And, if it can be done, would there be any problem if I created a new variable with the same name after deleting the first one? I can't think of any reason why that would be a problem, but I'm pretty new to this programming language.
Also, I don't know if there is any less messy way of doing this.

This sounds like you are writing your own custom serializer.
I would recommend just using an existing library, like protobuf-net, to define your messages if at all possible.
If this is not possible, you can use a BinaryWriter to write your values to a Stream. If you want to keep it in memory, use a MemoryStream and call .ToArray() when you're done to get an array of all the bytes.
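A minimal sketch of that approach, reusing the names from the question (adr_and_func, memory_adr, value, and the Aux.CRC helper are assumed to exist as in the original code; the parameter types are guesses):
using System.IO;

static byte[] BuildMessage(byte[] adr_and_func, int memory_adr, int value)
{
    using (var stream = new MemoryStream())
    using (var writer = new BinaryWriter(stream))
    {
        writer.Write(adr_and_func);    // raw bytes are appended as-is
        writer.Write(memory_adr);      // value types are written as their bytes (little-endian)
        writer.Write(value);
        writer.Flush();
        writer.Write(Aux.CRC(stream.ToArray())); // CRC over everything written so far
        return stream.ToArray();       // one array of exactly the right size
    }
}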
As for memory, do not worry about it. Unless you have gigabyte-sized messages, the memory requirements should not be an issue; the garbage collector automatically recycles memory when it is no longer needed, and it can do this after the last usage, regardless of scope. If you have huge memory streams, you might want to look at something like RecyclableMemoryStream, since it can avoid some allocation performance issues and fragmentation issues.
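For the huge-stream case, a sketch assuming the Microsoft.IO.RecyclableMemoryStream NuGet package (the written bytes here are just placeholders):
using System.IO;
using Microsoft.IO;

var manager = new RecyclableMemoryStreamManager(); // usually one shared instance per app
using (MemoryStream stream = manager.GetStream())
using (var writer = new BinaryWriter(stream))
{
    writer.Write(new byte[] { 0x01, 0x02 }); // write the message parts as usual
    byte[] message = stream.ToArray();       // pooled buffers are returned on Dispose
}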

Related

concat two byte[] returns System.OutOfMemoryException

I have a problem concatenating two byte[] arrays. One of them has more than 300,000,000 bytes. The concatenation throws an exception of type System.OutOfMemoryException.
I use this code:
byte[] b3 = by2.Concat(by1).ToArray();
Can anybody help me?
Because of the Concat call, ToArray knows nothing about how big the result array has to be. It can't create a properly sized big array and just fill it with data. So it creates a small one; then, when that's full, it creates a new one with twice the size, and so on, over and over again as long as there is more data to fill. This way you need much more memory than the theoretical (b1.Length + b2.Length) * 2. And things get even trickier, because after a certain point these big arrays are allocated on the LOH and are not collected as easily by the GC as normal objects.
That's why you should not use ToArray() in this case and should do it the old-fashioned way: allocate a new array with a size equal to the combined sizes of the source arrays and copy the data.
Something like:
var b3 = new byte[b1.Length + b2.Length];
Array.Copy(b1, b3, b1.Length);               // copy b1 into the start of b3
Array.Copy(b2, 0, b3, b1.Length, b2.Length); // copy b2 in right after it
It does not guarantee success, but it makes success more likely. And it executes much, much faster than ToArray().
When working with that amount of data, I think you should be working with streams (this of course depends on the application).
Then you can have code that works on the data without requiring it all to be loaded in memory at the same time, and you could create a specialized stream class that acts as a concatenation between two streams.
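For example, a rough sketch of writing both arrays straight to a file, so the concatenated result never has to exist as one contiguous block in memory (the file name is a placeholder):
using System.IO;

using (var output = new FileStream("combined.bin", FileMode.Create))
{
    // Write the two source arrays back to back; no third giant array is ever allocated.
    output.Write(by2, 0, by2.Length);
    output.Write(by1, 0, by1.Length);
}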
Well, the error message speaks for itself: you don't have a free contiguous ~550 MB block of RAM. Maybe the address space is just too fragmented.
Well, you know, requesting a contiguous block of ~600 MB from the system: I'm not surprised. It is quite a large block in itself, and given that you must also have the source arrays in memory, that's over 1 GB of raw data chunks.
You should probably start thinking about other data structures, or try to keep the data in files and map them to memory. Edit: memory-mapping a whole file needs the same contiguous area in address space, so it solves nothing. This answer will be deleted.

How can I reduce the garbage generation in this situation

My game has gotten to the point where it's generating too much garbage, resulting in long GC times. I've been going around reducing a lot of the garbage generated, but there's one spot that's allocating a large amount of memory too frequently, and I'm stuck on how to resolve it.
My game is a Minecraft-type world that generates new regions as you walk. I have a large, variable-size array that is allocated on the creation of a new region and is used to store the vertex data for the terrain. After the array is filled with data, it's passed to a SlimDX DataStream so it can be used for rendering.
The problem is the fact that this is a variable-size array and that it needs to be passed to SlimDX, which calls GCHandle.Alloc on it. Since it's a variable size, it may have to be resized in order to be reused. I also can't just allocate a max-sized array for each region, because that would require impossibly large amounts of memory. I can't use a List because of the GCHandle business with SlimDX.
So far, resizing the array only when it needs to be made bigger seems to be the only plausible option to me, but it may not work out that well and will likely be a pain to implement. I'd need to keep track of the actual size of the array separately and use unsafe code to get a pointer to the array and pass that to slimdx. It may also end up eventually using such a large amount of memory that I have to occasionally go and reduce the size of all the arrays down to the minimum needed.
I'm hesitant to jump at this solution and would like to know if anyone sees any better solutions to this.
I'd suggest a tighter integration with the SlimDX library. It's open source, so you could dig in and find the critical path that you need for the rendering. Then you could integrate more tightly by using a DMA-style memory-sharing approach.
Since SlimDX is open source and it is too slow for you, the time has come to change the source to suit your performance needs. What I see here is that you want to keep a much larger array but hand SlimDX only the actually used region, to prevent additional memory allocations for this potentially huge array.
There is a type in the .NET Framework named ArraySegment<T> which was made exactly for this purpose.
// Taken from MSDN
// Create and initialize a new string array.
String[] myArr = { "The", "quick", "brown", "fox", "jumps", "over", "the",
                   "lazy", "dog" };

// Define an array segment that contains the middle five values of the array.
ArraySegment<String> myArrSegMid = new ArraySegment<String>(myArr, 2, 5);

public static void PrintIndexAndValues(ArraySegment<String> arrSeg)
{
    for (int i = arrSeg.Offset; i < (arrSeg.Offset + arrSeg.Count); i++)
    {
        Console.WriteLine(" [{0}] : {1}", i, arrSeg.Array[i]);
    }
    Console.WriteLine();
}
That said, I have found the usage of ArraySegment somewhat strange, because I always have to use the offset plus the index, which just does not behave like a regular array. Instead, you can distill your own struct which allows zero-based indexing. That is much easier to use, but comes at the cost that every index-based access costs you an add of the base offset. If the usage pattern is mainly foreach loops, though, it does not really matter.
I have had situations where ArraySegment was still too costly, because you do create a struct every time and pass it to all methods by value on the stack. You need to watch closely where its usage is acceptable and whether it is being created at too high a rate.
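For illustration, a sketch of such a distilled struct (the name and layout are made up here; this is not a SlimDX type):

// Zero-based view over a slice of a larger array. Each indexer access pays
// one extra add of the base offset, as described above.
public struct ZeroBasedSegment<T>
{
    private readonly T[] _array;
    private readonly int _offset;
    private readonly int _count;

    public ZeroBasedSegment(T[] array, int offset, int count)
    {
        _array = array;
        _offset = offset;
        _count = count;
    }

    public int Count { get { return _count; } }

    public T this[int index]
    {
        get { return _array[_offset + index]; }
        set { _array[_offset + index] = value; }
    }
}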
I sympathize with your problem with an older library, SlimDX, which may not be fully .NET-compliant. I have dealt with such a situation.
Suggestions:
Use a more performance-efficient growable collection, such as List<T> (or the older ArrayList). It keeps track of the size for you, so you don't have to. Grow the list in chunks, e.g. 100 elements at a time.
Use C++/CLI and take advantage of unsafe arrays, or of .NET classes like ArrayList.
Updated: Use the idea of virtual memory: save some of the data to an XML file or a SQL database, relieving a huge amount of memory pressure.
I realize it's a gamble either way.

Non-contiguous String object in C#/.NET

From what I understand, String and StringBuilder objects both allocate contiguous memory underneath.
My program runs for days, buffering output in a String object. This sometimes causes an OutOfMemoryException, which I think is due to the unavailability of contiguous memory. My string size can go up to 100 MB, and I am concatenating new strings frequently, which causes new string objects to be allocated. I can reduce new string object creation by using StringBuilder, but that would not solve my problem entirely.
Is there an alternative to a contiguous string object?
A rope data structure may be an option but I don't know of any ready-to-use implementations for .NET.
Have you tried using a LinkedList of strings instead? Or perhaps you can modify your architecture to read and write a file on disk instead of keeping everything in memory.
DO NOT USE STRINGS.
Strings will copy and allocate a new string for every operation. That is, if you have a 50 MB string and add one character, then until garbage collection happens you will have two (approximately) 50 MB strings around.
Then you add another char, and you'll have three... and so on.
On the other hand, proper use of StringBuilder, that is, using Append, should not have any problem with 100 MB.
Another optimization is creating the StringBuilder with your estimated size:
StringBuilder sb = new StringBuilder(capacity); // capacity being the suggested starting size
Use the StringBuilder to hold your big string, and then use Append.
HTH
By growing so large, your strings are moved to the Large Object Heap (LOH), and you run a greater risk of fragmentation.
A few options:
Use a StringBuilder. You will be re-allocating less frequently. And try to pre-allocate, like new StringBuilder(100*1000*1000);
Re-design your solution. There must be alternatives to keeping such large strings around. A List<string>, for instance, that is only converted to one single string when (really) necessary, as in the sketch below.
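A sketch of that re-design (ProducePieces is a hypothetical stand-in for wherever the output fragments come from):
using System.Collections.Generic;

var chunks = new List<string>();
foreach (string piece in ProducePieces()) // hypothetical producer of output fragments
{
    chunks.Add(piece); // each fragment stays a separate, small object
}

// Only one large contiguous allocation, and only when it is really needed.
// (string.Concat(IEnumerable<string>) needs .NET 4; on older versions use
// string.Join("", chunks.ToArray()) instead.)
string combined = string.Concat(chunks);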
I don't believe there's any solution for this using either String or StringBuilder. Both will require contiguous memory. Is it possible to change your architecture such that you can save the ongoing data to a List, a file, a database, or some other structure designed for such purposes?
First you should examine why you are doing that and see if there are other things you can do that give you the same value.
Then you have lots of options (depending on what you need) ranging from using logging to writing a simple class that collects strings into a List.
You can try saving the string to a data store such as a text file, SQL Server Express, MySQL, MS Access, etc. This way, if your server gets shut down for any reason (power outage, someone bumped the UPS, thunderstorm, etc.), you would not lose your data. It is a little slower than RAM, but I think the trade-off is worth it.
If this is not an option, most definitely use the StringBuilder for appending strings.

Faster way to get an SQL-compatible string from a Guid

I noticed this particular line of code when I was profiling my application (which creates a boatload of database insertions by processing some raw data):
myStringBuilder.AppendLine(
    string.Join(
        BULK_SEPARATOR, new string[] { myGuid.ToString() ...
Bearing in mind that the resulting string is going to end up in a file loaded via the T-SQL command BULK INSERT, is there a way I can do this step faster? I know that getting a byte array is faster, but I can't just plug that into the file.
The fastest and simplest way would be not to use BULK INSERT with a raw file at all. Instead, use the SqlBulkCopy class. That should speed this up significantly by sending the data directly over the pipe instead of using an intermediate file.
(You'll also be able to use the Guid directly without any string conversions, although I can't be 100% sure what SqlBulkCopy does internally with it.)
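A minimal sketch of the SqlBulkCopy approach (the table name, column, and connectionString are placeholders, not from the question):
using System;
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("Id", typeof(Guid)); // maps to a uniqueidentifier column
table.Rows.Add(Guid.NewGuid());

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.MyTable";
    bulkCopy.WriteToServer(table); // the Guids travel as values, no string conversion needed
}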
You aren't indicating where you are getting the Guid from. Also, I don't believe that getting the bytes is going to be any faster, as you would end up doing what the ToString method on the Guid class already does: iterating through the bytes and converting them to a string value.
Rather, I think there are a few general areas where this code can possibly be improved in terms of performance (assuming you are doing this in a loop):
Are you reusing the myStringBuilder instance upon a new iteration of your loop? You should be setting the Length (not the Capacity) property to 0, and then rebuild your string using that. This will prevent having to warm up a new StringBuilder instance, and the memory allocation for a larger string will already have been made.
Use calls to Append on myStringBuilder instead of calling String.Join. String.Join is going to preallocate a bunch of memory and then return a string instance which you will just allocate again (if on the first iteration) or copy into already-allocated space. There is no reason to do this twice. Instead, iterate through the array you are creating (or expand the loop; it seems you have a fixed-size array) and call Append, passing in the Guid and then the BULK_SEPARATOR. It's easier to remove the single character from the end, by the way: just decrement the Length property of the StringBuilder by one if you actually appended Guids. A sketch combining both points follows.
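A sketch combining both suggestions (guids is a hypothetical collection standing in for wherever the values come from; BULK_SEPARATOR is assumed to be a single character, as above):
myStringBuilder.Length = 0; // reset for the new iteration; the allocated buffer is kept

foreach (Guid g in guids)
{
    myStringBuilder.Append(g.ToString()).Append(BULK_SEPARATOR);
}

if (myStringBuilder.Length > 0)
{
    myStringBuilder.Length--; // drop the trailing separator
}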
If time is critical - could you pre-create a sufficiently long list of GUID's converted to string ahead of time, and then use it in your code? Either in C#, or possibly in SQL Server, depending on your requirements?

String concatenation in C# with interned strings

I know this question has been done but I have a slightly different twist to it. Several have pointed out that this is premature optimization, which is entirely true if I were asking for practicality's sake and practicality's sake only. My problem is rooted in a practical problem but I'm still curious nonetheless.
I'm creating a bunch of SQL statements to create a script (as in it will be saved to disk) to recreate a database schema (easily many many hundreds of tables, views, etc.). This means my string concatenation is append-only. StringBuilder, according to MSDN, works by keeping an internal buffer (surely a char[]) and copying string characters into it and reallocating the array as necessary.
However, my code has a lot of repeat strings ("CREATE TABLE [", "GO\n", etc.) which means I can take advantage of them being interned but not if I use StringBuilder since they would be copied each time. The only variables are essentially table names and such that already exist as strings in other objects that are already in memory.
So, as far as I can tell, after my data is read in and the objects holding the schema information are created, all my string information can be reused via interning, yes?
Assuming that, then wouldn't a List or LinkedList of strings be faster because they retain pointers to interned strings? Then it's only one call to String.Concat() for a single memory allocation of the whole string that is exactly the correct length.
A List would have to reallocate its internal string[] of interned references, and a linked list would have to create nodes and modify pointers, so neither is "free"; but if I'm concatenating many thousands of interned strings, they would seem to be more efficient.
Now I suppose I could come up with some heuristic on character counts for each SQL statement & count each type and get a rough idea and pre-set my StringBuilder capacity to avoid reallocating its char[] but I would have to overshoot by a fair margin to reduce the probability of reallocating.
So for this case, which would be fastest to get a single concatenated string:
StringBuilder
List<string> of interned strings
LinkedList<string> of interned strings
StringBuilder with a capacity heuristic
Something else?
As a separate question (I may not always go to disk) to the above: would a single StreamWriter to an output file be faster still? Alternatively, use a List or LinkedList and then write the strings to a file from the list instead of first concatenating them in memory.
EDIT:
As requested, the reference (.NET 3.5) to MSDN. It says: "New data is appended to the end of the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, then the new data is appended to the new buffer." That to me means a char[] that is reallocated to make it larger (which requires copying the old data to the resized array), then appended to.
For your separate question, Win32 has a WriteFileGather function, which could efficiently write a list of (interned) strings to disk, but it would make a notable difference only when called asynchronously, as the disk write will overshadow all but extremely large concatenations.
For your main question: unless you are reaching megabytes of script, or tens of thousands of scripts, don't worry.
You can expect StringBuilder to double the allocation size on each reallocation. That would mean growing a buffer from 256 bytes to 1 MB takes just 12 reallocations: quite good, given that your initial estimate was 3 orders of magnitude off the target.
Purely as an exercise, some estimates: building a buffer of 1 MB will sweep roughly 3 MB of memory (1 MB source, 1 MB target, 1 MB due to copying during reallocation). A linked list implementation will sweep about 2 MB (and that's ignoring the 8-byte-per-object overhead for each string reference). So you are saving 1 MB of memory reads/writes, compared to a typical memory bandwidth of 10 Gbit/s and 1 MB of L2 cache.
Yes, a list implementation is potentially faster, and the difference would matter if your buffers are an order of magnitude larger.
For the much more common case of small strings, the algorithmic gain is negligible, and easily offset by other factors: the StringBuilder code is likely in the code cache already, and a viable target for microoptimizations. Also, using a string internally means no copy at all if the final string fits the initial buffer.
Using a linked list will also bring the reallocation problem down from O(number of characters) to O(number of segments): your list of string references faces the same problem as a string of characters!
So, IMO the implementation of StringBuilder is the right choice, optimized for the common case, and degrades mostly for unexpectedly large target buffers. I'd expect a list implementation to degrade for very many small segments first, which is actually the extreme kind of scenario StringBuilder is trying to optimize for.
Still, it would be interesting to see a comparison of the two ideas, and when the list starts to be faster.
If I were implementing something like this, I would never build a StringBuilder (or any other in-memory buffer of your script).
I would just stream it out to your file instead, and write all strings inline.
Here's an example sketch:
using (StreamWriter f = new StreamWriter("yourscript.sql"))
{
    foreach (Table t in myTables)
    {
        f.Write("CREATE TABLE [");
        f.Write(t.ToString());
        f.Write("]");
        // ....
    }
}
Then you'll never need an in-memory representation of your script, with all the copying of strings.
Opinions?
In my experience, a properly allocated StringBuilder outperforms almost everything else for large amounts of string data. It's worth wasting some memory, even, by overshooting your estimate by 20% or 30% in order to prevent reallocation. I don't currently have hard numbers to back this up with my own data, but take a look at this page for more.
However, as Jeff is fond of pointing out, don't prematurely optimize!
EDIT: As @Colin Burnett pointed out, the tests that Jeff conducted don't agree with Brian's tests, but the point of linking Jeff's post was about premature optimization in general. Several commenters on Jeff's page noted issues with his tests.
Actually, StringBuilder uses an instance of String internally. String is in fact mutable within the System assembly, which is why StringBuilder can be built on top of it. You can make StringBuilder a wee bit more effective by assigning a reasonable length when you create the instance. That way you will eliminate/reduce the number of resize operations.
String interning works for strings that can be identified at compile time. Thus, if you generate a lot of strings during execution, they will not be interned unless you do so yourself by calling String.Intern.
Interning will only benefit you if your strings are identical. Almost-identical strings don't benefit from interning, so "SOMESTRINGA" and "SOMESTRINGB" will be two different strings even if they are interned.
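A small demonstration of both points (runtime-built strings are not interned automatically, and interning only merges identical contents):
using System;

string a = "CREATE TABLE [";                            // compile-time literal: interned at load time
string b = new string("CREATE TABLE [".ToCharArray()); // built at runtime: not interned

Console.WriteLine(object.ReferenceEquals(a, b));                // False: two distinct objects
Console.WriteLine(object.ReferenceEquals(a, string.Intern(b))); // True: Intern returns the pooled instance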
If all (or most) of the strings being concatenated are interned, then your scheme MIGHT give you a performance boost, as it could potentially use less memory and could save a few large string copies.
However, whether or not it actually improves perf depends on the volume of data you are processing, because the improvement is in constant factors, not in the order of magnitude of the algorithm.
The only way to really tell is to run your app using both ways and measure the results. However, unless you are under significant memory pressure, and need a way to save bytes, I wouldn't bother and would just use string builder.
A StringBuilder doesn't use a char[] to store the data; it uses an internal mutable string. That means there is no extra step to create the final string, as there is when you concatenate a list of strings; the StringBuilder just returns the internal string buffer as a regular string.
The reallocations that the StringBuilder does to increase the capacity mean that the data is on average copied an extra 1.33 times. If you can provide a good estimate of the size when you create the StringBuilder, you can reduce that even further.
However, to get a bit of perspective, you should look at what it is that you are trying to optimise. What will take most of the time in your program is to actually write the data to disk, so even if you can optimise your string handling to be twice as fast as using a StringBuilder (which is very unlikely), the overall difference will still only be a few percent.
Have you considered C++ for this? Is there a library class that already builds T-SQL expressions, preferably written in C++?
The slowest thing about strings is malloc. It takes 4 KB per string on 32-bit platforms. Consider optimizing the number of string objects created.
If you must use C#, I'd recommend something like this:
string varString1 = tableName;
string varString2 = tableName;
StringBuilder sb1 = new StringBuilder("const expression");
sb1.Append(varString1);
StringBuilder sb2 = new StringBuilder("const expression");
sb2.Append(varString2);
string resultingString = sb1.ToString() + sb2.ToString();
I would even go as far as letting the computer evaluate the best path for object instantiation with dependency injection frameworks, if perf is THAT important.
