I have a problem with a simple byte[] copy. In a console application I load a 75 MB DAT file into a byte[]. After that I would like to cut the array down with the function below.
public static byte[] SubArray(this byte[] data, int index, int length = 0)
{
    // A length of 0 means "copy everything from index to the end"
    if (length == 0) length = data.Length - index;
    byte[] result = new byte[length];
    Array.Copy(data, index, result, 0, length);
    return result;
}
If I call Data = Data.SubArray(32) once, memory grows from 100 MB to 180 MB, but if I do a test with three Data = Data.SubArray(32) calls, memory triples to 340 MB. I suppose the old arrays are still in memory. How do I release the old arrays? I don't need them anymore, and with more SubArray calls in the code, memory grows to 2 GB.
You need to let the garbage collector do its thing. To make it easier for the GC, you would normally set the old, unused reference to null or replace it with a new reference value (which your Data = Data.SubArray(32) already does). The GC then needs some time to kick in; the memory is reclaimed on a later collection, not at the moment the reference is dropped.
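To confirm the old arrays really are unreachable, you can force a collection and measure the heap afterwards. A minimal diagnostic sketch (forcing the GC like this is for measurement only, not something to leave in production code):
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
// Report the managed heap size after the forced collection
Console.WriteLine($"Heap after collection: {GC.GetTotalMemory(forceFullCollection: false)} bytes");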
I have a pointer to a byte array, and I need to set the values of a certain region of this array to 0. I'm quite familiar with the methods available through the Marshal/Buffer/Array classes, and this problem is not at all hard.
The problem, however, is that I do not want to create excessive arrays, or write every byte one-by-one. All the methods I'm familiar with require full arrays, though, and they obviously don't work with single values.
I've seen several C methods that would achieve the result I'm looking for, but I don't believe I have access to them without including a whole C library, or without writing platform-specific code.
My current solution is shown below, but I'd like to achieve this without allocating a new byte array.
Marshal.Copy(new byte[length], 0, ptr + offset, length);
So is there a method in C#, or in an unmanaged language/library that I can use to fill an array (via a pointer) at a certain offset and for a certain length, with one single value (0)?
Miraculously, ChatGPT came rather close when I asked what would be a good solution to this problem. It didn't figure it out, but it suggested that I use spans.
As such, this is the solution I've come up with:
Span<byte> span = new Span<byte>((void*)(ptr + offset), length);
span.Fill(0);
For very large arrays, this solution is about 25 times faster than allocating a temporary byte array, as the benchmarks below show.
Example benchmarks:
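// Note: Benchmark(name, action, trials) is assumed to be the author's own helper;
// it runs `action` `trials` times and reports the total elapsed time.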
int size = 100_000;
nint ArrayPointer = Marshal.AllocHGlobal(size);
int trials = 1_000_000;
// Runtime was 1582ms
Benchmark("Fill with span", () =>
{
Span<byte> span = new Span<byte>((void*) ArrayPointer, size);
span.Fill(0);
}, trials);
// Runtime was 40681ms
Benchmark("Fill with allocation", () =>
{
Marshal.Copy(new byte[size], 0, ArrayPointer, size);
}, trials);
// Far too slow to get a result with these settings
Benchmark("Fill individually", () =>
{
for (int i = 0; i < size; i++)
{
Marshal.WriteByte(ArrayPointer + i, 0);
}
}, trials);
// Results with size = 100_000 and trials = 100_000
// Fill with span: 176ms
// Fill with allocation: 4382ms
// Fill individually: 24672ms
You can use Array.Fill for this (note that it is a static method, not an instance method):
Array.Fill(arrayName, 'X', 4, 10); // fill the character array with 'X', starting at index 4, for 10 elements
https://learn.microsoft.com/en-us/dotnet/api/system.array.fill?view=net-7.0
Note: The documentation for C# is quite good. You can go to the website and see all the methods available on Array. If you really care how this is implemented, you could even go to GitHub and read the source code.
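For the managed-array case, a minimal sketch (Array.Fill requires .NET Core 2.0 or later; a pointer-based buffer like the one in the question would still need the Span approach above):
byte[] buffer = new byte[16];
Array.Fill(buffer, (byte)0xFF);     // fill the whole array
Array.Fill(buffer, (byte)0, 4, 10); // zero 10 elements starting at index 4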
I've got a big byte array (around 50 KB) and I need to extract numeric values from it. Every three bytes represent one value.
What I tried is to work with LINQ's Skip and Take, but it's really slow given the large size of the array.
This is my very slow routine:
List<int> ints = new List<int>();
for (int i = 0; i <= fullFile.Count(); i+=3)
{
ints.Add(BitConverter.ToInt16(fullFile.Skip(i).Take(i + 3).ToArray(), 0));
}
I think I've got the wrong approach to this.
Your code
First of all, ToInt16 only uses two bytes. So your third byte will be discarded.
You can't use ToInt32 as it would include one extra byte.
Let's review this:
fullFile.Skip(i).Take(i + 3).ToArray()
..and take a careful look at Take(i + 3). It says that you want to copy a larger and larger buffer. For instance, when i is at index 32000, you copy 32003 bytes into a new buffer.
That's why the code is quite slow.
The code is also slow because every iteration allocates a new byte buffer of ever-growing size, and all of those temporary buffers eventually have to be garbage collected.
You could also have done it like this:
List<int> ints = new List<int>();
var workBuffer = new byte[4];
for (int i = 0; i + 3 <= fullFile.Length; i += 3)
{
    // Copy the three bytes into the beginning of the temp buffer
    Buffer.BlockCopy(fullFile, i, workBuffer, 0, 3);
    // Now we can use ToInt32, as the last byte is always zero
    var value = BitConverter.ToInt32(workBuffer, 0);
    ints.Add(value);
}
Quite easy to understand, but not the fastest code.
A better solution
So the most efficient way is to do the conversion yourself (bit shifting).
Something like:
List<int> ints = new List<int>();
for (int i = 0; i + 3 <= fullFile.Length; i += 3)
{
// This code assumes little-endian byte order
var value = (fullFile[i + 2] << 16)
+ (fullFile[i + 1] << 8)
+ fullFile[i];
ints.Add(value);
}
This code does not allocate anything extra (except the list of ints) and should be quite fast.
You can read more about shift operators on MSDN, and about endianness.
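Wrapped into a reusable method, a minimal sketch of the same idea (the ReadInt24s name is mine, not from the original code):
static List<int> ReadInt24s(byte[] data)
{
    var ints = new List<int>(data.Length / 3);
    for (int i = 0; i + 3 <= data.Length; i += 3)
    {
        // Assemble an unsigned 24-bit little-endian value from three bytes
        ints.Add(data[i] | (data[i + 1] << 8) | (data[i + 2] << 16));
    }
    return ints;
}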
I've got a BinaryReader reading a number of bytes into an array. The underlying stream for the reader is a BufferedStream (whose underlying stream is a network stream). I noticed that sometimes the reader.Read(arr, 0, len) method returns different (wrong) results than reader.ReadBytes(len).
Basically my setup code looks like this:
var httpClient = new HttpClient();
var reader = new BinaryReader(new BufferedStream(await httpClient.GetStreamAsync(url).ConfigureAwait(false)));
Later on down the line, I'm reading a byte array from the reader. I can confirm the sz variable is the same for both scenarios.
int sz = ReadSize(reader); //sz of the array to read
if (bytes == null || bytes.Length <= sz)
{
bytes = new byte[sz];
}
//reader.Read will return different results than reader.ReadBytes sometimes
//everything else is the same up until this point
//var tempBytes = reader.ReadBytes(sz); <- this will return right results
reader.Read(bytes, 0, sz); // <- this will not return the right results sometimes
It seems like the reader.Read method is reading further into the stream than it needs to or something, because the rest of the parsing will break after this happens. Obviously I could stick with reader.ReadBytes, but I want to reuse the byte array to go easy on the GC here.
Would there ever be any reason that this would happen? Is a setting wrong or something?
Make sure you clear out the bytes array before calling Read, because Read(bytes, 0, len) does NOT clear the given byte array, so bytes left over from a previous read may get mixed in with the new ones. I ran into this long ago in one of my parsers. Either set all elements to zero, or make sure you only parse up to the number of bytes Read actually returned: unlike ReadBytes(len), Read can return fewer than len bytes, especially when the underlying stream is a network stream.
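A minimal sketch of reading exactly sz bytes while still reusing the buffer (assuming the reader and bytes variables from the question):
int total = 0;
while (total < sz)
{
    // Read may return anywhere from 1 to (sz - total) bytes
    int read = reader.Read(bytes, total, sz - total);
    if (read == 0)
        throw new EndOfStreamException("Stream ended before sz bytes were read");
    total += read;
}
// Only the first sz bytes of the buffer are valid; ignore anything past that.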
I am trying to read a stream of bytes from a file. However, when I try to read the bytes I get a
The function evaluation was disabled because of an out of memory exception
Quite straightforward. However, what is the best way of getting around this problem? Is it to loop through the length, 1028 bytes at a time? Or is there a better way?
The C# I am using:
BinaryReader br = new BinaryReader(fs);
// The length is around 600000000
long Length = fs.Length;
// Error here
bytes = new byte[Length];
for (int i = 0; i < Length; i++)
{
bytes [i] = br.ReadByte();
}
Thanks
Well, first of all: imagine a file with a size of, e.g., 2 GB. Your code would allocate 2 GB of memory. Just read the part of the file you really need instead of the whole file at once.
Secondly, don't do something like this:
for (int i = 0; i < Length; i++)
{
bytes [i] = br.ReadByte();
}
It is quite inefficient. To read the raw bytes of a stream you should use something like this:
using(var stream = File.OpenRead(filename))
{
int bytesToRead = 1234;
byte[] buffer = new byte[bytesToRead];
int read = stream.Read(buffer, 0, buffer.Length);
//do something with the read data ... e.g.:
for(int i = 0; i < read; i++)
{
//...
}
}
When you try to allocate an array, the CLR lays it out contiguously in virtual memory given to it by the OS. Virtual memory can be fragmented, however, so a contiguous 1 GB block may not be available, hence the OutOfMemoryException. It doesn't matter how much physical RAM your machine has, and this problem is not limited to managed code (try allocating a huge array in native C and you'll find similar results).
Instead of allocating one huge array, I recommend using several smaller arrays, e.g. a List<byte[]> of chunks; that way the framework can allocate the data in pieces that don't require a single contiguous block.
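A minimal sketch of the chunked idea (the 64 KB chunk size is an arbitrary choice for illustration; it also stays under the 85,000-byte Large Object Heap threshold):
var chunks = new List<byte[]>();
using (var fs = File.OpenRead(filename))
{
    const int chunkSize = 64 * 1024;
    while (true)
    {
        byte[] chunk = new byte[chunkSize];
        int read = fs.Read(chunk, 0, chunk.Length);
        if (read == 0) break;              // end of file
        if (read < chunkSize)
            Array.Resize(ref chunk, read); // trim the final, partial chunk
        chunks.Add(chunk);
    }
}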
Hope that helps
I believe that instantiating the stream object already reads the file (into a cache). Your loop then copies the bytes in memory to another array.
So why not use the data through br instead of making a further copy?
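For example, a minimal sketch of consuming the data through the reader in blocks instead of byte-by-byte (the block size is an arbitrary example):
using (var br = new BinaryReader(fs))
{
    byte[] block;
    while ((block = br.ReadBytes(64 * 1024)).Length > 0)
    {
        // process `block` here; the final block may be shorter
    }
}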
This is a continuation of the ongoing struggle to reduce my memory load, mentioned in
How do you refill a byte array using SqlDataReader?
So I have a byte array of a set size; for this example, I'll say new byte[400000]. Inside of this array, I'll be placing PDFs of different sizes (less than 400000 bytes).
Pseudocode would be:
public void Run()
{
byte[] fileRetrievedFromDatabase = new byte[400000];
foreach (var document in documentArray)
{
// Refill the file with data from the database
var currentDocumentSize = PopulateFileWithPDFDataFromDatabase(fileRetrievedFromDatabase);
var reader = new iTextSharp.text.pdf.PdfReader(fileRetrievedFromDatabase.Take((int)currentDocumentSize).ToArray());
pageCount = reader.NumberOfPages;
// DO ADDITIONAL WORK
}
}
private int PopulateFileWithPDFDataFromDatabase(byte[] fileRetrievedFromDatabase)
{
// DataAccessCode Goes here
int documentSize = 0;
int bufferSize = 100; // Size of the BLOB buffer.
byte[] outbyte = new byte[bufferSize]; // The BLOB byte[] buffer to be filled by GetBytes.
long startIndex = 0;
long retval;
var myReader = logoCMD.ExecuteReader(CommandBehavior.SequentialAccess);
Array.Clear(fileRetrievedFromDatabase, 0, fileRetrievedFromDatabase.Length);
if (myReader == null)
{
    return 0;
}
while (myReader.Read())
{
documentSize = (int)myReader.GetBytes(0, 0, null, 0, 0); // a null buffer makes GetBytes return the total BLOB length
// Reset the starting byte for the new BLOB.
startIndex = 0;
// Read the bytes into outbyte[] and retain the number of bytes returned.
retval = myReader.GetBytes(0, startIndex, outbyte, 0, bufferSize);
// Continue reading and writing while there are bytes beyond the size of the buffer.
while (retval == bufferSize)
{
Array.Copy(outbyte, 0, fileRetrievedFromDatabase, startIndex, retval);
// Reposition the start index to the end of the last buffer and fill the buffer.
startIndex += retval;
retval = myReader.GetBytes(0, startIndex, outbyte, 0, bufferSize);
}
}
return documentSize;
}
The problem with the above code is that I keep getting a "Rebuild trailer not found. Original Error: PDF startxref not found" error when I try to use the PdfReader. I believe it's because the byte array is too long and has trailing zeros. But since I'm reusing the byte array so that I'm not continuously building new objects on the LOH, I need to do it this way.
So how do I get just the piece of the array that I need and send it to the PdfReader?
Updated
So I looked at the source and realized some variable names from my actual code were confusing. I'm basically reusing the fileRetrievedFromDatabase array in each iteration of the loop. Since it's passed by reference, it gets cleared (set to all zeros) and then refilled in PopulateFileWithPDFDataFromDatabase. The array is then used to create a new PDF.
If I didn't do it this way, a new large byte array would be created in every iteration, and the Large Object Heap would fill up and eventually throw an OutOfMemoryException.
You have at least two options:
1. Treat your buffer like a circular buffer, with two indexes for the starting and ending positions. You need an index of the last byte written into outbyte, and you have to stop reading when you reach that index.
2. Simply read the same number of bytes as you have in your data array, to avoid reading into the "unknown" parts of the buffer which don't belong to the same file. In other words, instead of passing bufferSize as the last parameter, pass data.Length.
// Read the bytes into outbyte[] and retain the number of bytes returned.
retval = myReader.GetBytes(0, startIndex, outbyte, 0, data.Length);
If the data length is 10 and your outbyte buffer is 15, then you should only read data.Length bytes, not bufferSize.
However, I still don't see how you're reusing the outbyte "buffer", if that's what you're doing... I'm simply not following based on what you've provided in your answer. Maybe you can clarify exactly what is being reused.
Apparently, the way the while loop was structured, it wasn't copying the data on its last iteration. I needed to add this:
if (outbyte != null && outbyte.Length > 0 && retval > 0)
{
Array.Copy(outbyte, 0, currentDocument.Data, startIndex, retval);
}
It's now working, although I will definitely need to refactor.