Retrieving constant amount of bytes from array - c#

I have a huge byte[] data array. I want to take a fixed number of bytes at a time (the block size), perform some operation on each block, and append the result of each block, one after another, to a new array.
This is my code:
int j = 0;
int number_of_blocks = (data.Length) / 16;
byte[] one_block = new byte[16];
byte[] one_block_return = new byte[16];
byte[] all_block_return = new byte[data.Length];
for (int i = 0; i < number_of_blocks; i++)
{
Array.Copy(data, j, one_block, 0, 16);
one_block_return = one_block_operation(one_block);
Array.Copy(one_block_return, 0, all_block_return, j, 16);
Array.Clear(one_block, 0, one_block.Length);
j = j + 16;
}
The only problem with this code is that it's too slow, since my data array is extremely large. So I am looking for a faster replacement for Array.Copy(), or a better way to do this altogether. I would like to know what the options are and to see some different ways of coding it.
-Thanks

What about simple parallelization?
int number_of_blocks = (int)Math.Ceiling((double)data.Length / 16);
byte[] all_block_return = new byte[data.Length];
// The upper bound of Parallel.For is exclusive, so pass number_of_blocks itself.
Parallel.For(0, number_of_blocks, block_no =>
{
var blockStart = block_no * 16; // 16 - block size
var blockLength = Math.Min(16, data.Length - blockStart);
byte[] one_block = new byte[16];
byte[] one_block_return = new byte[16];
Array.Copy(data, blockStart, one_block, 0, blockLength);
one_block_return = one_block_operation(one_block);
Array.Copy(one_block_return, 0, all_block_return, blockStart, blockLength);
});
Is it possible to modify one_block_operation to take data, blockStart and blockLength arguments instead of a buffer (one_block)? You could then avoid one of the Array.Copy calls.
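A minimal sketch of that idea, assuming the operation can work directly on an offset and length. ProcessBlock is a hypothetical stand-in for one_block_operation (here it just XORs each byte with 0xFF), and the class and method names are made up for illustration:

```csharp
using System;
using System.Threading.Tasks;

public static class BlockProcessing
{
    public const int BlockSize = 16;

    // Hypothetical stand-in for one_block_operation: XORs every byte with 0xFF.
    // It reads from source and writes to destination at the same offset,
    // so no temporary 16-byte buffers (and no Array.Copy calls) are needed.
    public static void ProcessBlock(byte[] source, int offset, int length, byte[] destination)
    {
        for (int i = offset; i < offset + length; i++)
        {
            destination[i] = (byte)(source[i] ^ 0xFF);
        }
    }

    public static byte[] ProcessAll(byte[] data)
    {
        // Round up so that a trailing partial block is included.
        int numberOfBlocks = (data.Length + BlockSize - 1) / BlockSize;
        byte[] result = new byte[data.Length];

        // The upper bound of Parallel.For is exclusive.
        Parallel.For(0, numberOfBlocks, blockNo =>
        {
            int blockStart = blockNo * BlockSize;
            int blockLength = Math.Min(BlockSize, data.Length - blockStart);
            ProcessBlock(data, blockStart, blockLength, result);
        });

        return result;
    }
}
```

Each iteration writes only to its own range of result, so the blocks do not interfere with each other. This only works if the real operation produces output of the same length as its input and does not depend on other blocks.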
EDIT:
Here is how it works:
First, we need to calculate the number of blocks. Then Parallel.For is executed with the specified arguments: the start index (inclusive), the end index (exclusive) and a delegate that takes one argument, the index currently being processed. In our case, the index is the block number. The equivalent sequential code is:
for (var block_no = 0; block_no < number_of_blocks; block_no++) {
body(block_no); // body is the delegate passed to Parallel.For
}
The only difference is that Parallel.For runs that loop on multiple threads. The number of threads is not fixed; it depends on the ThreadPool size (according to MSDN it also depends on many other factors).
Because each delegate can be called independently (and we don't know the order in which the delegates are called), we cannot use a variable outside the delegate to store the current block start index (like you stored j outside the for loop). But if we know the current block number and the block size, calculating the block start index is very easy (this is the blockStart line).
And no, you can't skip the blockLength line or replace it with a constant value of 16. Why? Consider the following sequence:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
We can divide this sequence into two blocks of size 16:
1st: [1-16]
2nd: [17]
So, as you can see, the second block doesn't contain 16 elements, but only 1. The blockLength line calculates the actual block length, so Array.Copy never reads past the end of data (which would throw an ArgumentException).

Related

How to take array segments out of a byte array after every X step?

I have a big byte array (around 50 KB) and I need to extract numeric values from it. Every three bytes represent one value.
I tried LINQ's Skip and Take, but it's really slow given the size of the array.
This is my very slow routine:
List<int> ints = new List<int>();
for (int i = 0; i <= fullFile.Count(); i+=3)
{
ints.Add(BitConverter.ToInt16(fullFile.Skip(i).Take(i + 3).ToArray(), 0));
}
I think I have the wrong approach here.
Your code
First of all, ToInt16 only uses two bytes. So your third byte will be discarded.
You can't use ToInt32 as it would include one extra byte.
Let's review this:
fullFile.Skip(i).Take(i + 3).ToArray()
...and take a careful look at Take(i + 3). It says that you want to copy a larger and larger buffer. For instance, when i is 32000 you ask for up to 32003 bytes to be copied into your new buffer.
That's why the code is quite slow.
The code is also slow because you allocate a new byte buffer on every iteration (roughly 17,000 of them for a 50 KB file), and each one has to be garbage collected.
You could also have done like this:
List<int> ints = new List<int>();
var workBuffer = new byte[4];
for (int i = 0; i + 3 <= fullFile.Length; i += 3)
{
// Copy the three bytes into the beginning of the temp buffer
Buffer.BlockCopy(fullFile, i, workBuffer, 0, 3);
// Now we can use ToInt32, as the fourth byte is always zero
var value = BitConverter.ToInt32(workBuffer, 0);
ints.Add(value);
}
Quite easy to understand, but not the fastest code.
A better solution
So the most efficient way is to do the conversion by yourself (bit shifting).
Something like:
List<int> ints = new List<int>();
for (int i = 0; i + 3 <= fullFile.Length; i += 3)
{
// This code assumes little-endian byte order
var value = (fullFile[i + 2] << 16)
+ (fullFile[i + 1] << 8)
+ fullFile[i];
ints.Add(value);
}
This code does not allocate anything extra (except the ints list), and should be quite fast.
You can read more about shift operators on MSDN, and about endianness.
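To make the byte order concrete, here is a small worked sketch of the same bit-shifting approach wrapped in a helper. The class and method names are made up for illustration, and it assumes that trailing bytes which do not form a complete three-byte group should simply be ignored:

```csharp
using System;
using System.Collections.Generic;

public static class ThreeByteReader
{
    // Decodes consecutive 3-byte little-endian groups into ints.
    // Trailing bytes that do not form a full group are ignored (an assumption).
    public static List<int> Decode(byte[] fullFile)
    {
        // Pre-size the list so it does not have to grow while filling.
        var ints = new List<int>(fullFile.Length / 3);
        for (int i = 0; i + 3 <= fullFile.Length; i += 3)
        {
            int value = fullFile[i]
                      | (fullFile[i + 1] << 8)
                      | (fullFile[i + 2] << 16);
            ints.Add(value);
        }
        return ints;
    }
}
```

For example, Decode(new byte[] { 0x01, 0x02, 0x03, 0xFF, 0xFF, 0xFF, 0x07 }) yields 197121 (0x030201) and 16777215 (0xFFFFFF); the final 0x07 is dropped because it is not part of a full group.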

Destination array was not long enough. Check destIndex and length, and the array's lower bounds in mvc

I'm trying to upload an image, and am getting the following exception on the third line:
var file = Request.Files[0];
var imgBytes = new Byte[file.ContentLength - 1];
file.InputStream.Read(imgBytes, 0, file.ContentLength);
var base64String = Convert.ToBase64String(imgBytes,0,imgBytes.Length);
p.Photo = base64String;
Your code says: allocate (n - 1) bytes, read n bytes.
var imgBytes = new Byte[file.ContentLength]; // <- Remove - 1
file.InputStream.Read(imgBytes, 0, file.ContentLength);
Seems you're making a simple mistake when creating your array, and it's probably rooted in the fact that arrays are zero-based (i.e. positions start with 0).
First, to make this extremely clear, consider an array that should contain three elements, {A, B, C}. When you store those in an array, A will have the index 0, B will have 1, and C will be at 2.
In other words, the last item will be at the position length - 1. The length itself though, will still be 3.
Apply that to your situation, and you'll realize the problem lies here:
var imgBytes = new Byte[file.ContentLength - 1];
Remove the -1 and it should work.
If you needed to read directly from the last byte in your array on the other hand, you'd use file.ContentLength - 1 to access it.
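A related point worth checking once the array size is fixed: Stream.Read is allowed to return fewer bytes than requested, so a single call is not guaranteed to fill the buffer. A minimal sketch of a read loop follows; the helper name ReadFully is made up, and whether a short read actually happens for an uploaded file depends on the stream implementation:

```csharp
using System;
using System.IO;

public static class StreamHelper
{
    // Reads exactly `length` bytes from the stream, looping because a single
    // Stream.Read call may return fewer bytes than were asked for.
    public static byte[] ReadFully(Stream input, int length)
    {
        var buffer = new byte[length];
        int total = 0;
        while (total < length)
        {
            int read = input.Read(buffer, total, length - total);
            if (read == 0)
            {
                throw new EndOfStreamException(
                    "Stream ended after " + total + " of " + length + " bytes.");
            }
            total += read;
        }
        return buffer;
    }
}
```

In the question's code this would be used roughly as var imgBytes = StreamHelper.ReadFully(file.InputStream, file.ContentLength);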

find with how many digits array is filled?

Let's say I have array of bytes
byte[] arr = new byte[16];
and I filled only 10 of those 16 bytes
arr[0] = 1;
arr[1] = 1;
arr[2] = 2;
arr[3] = 3;
arr[4] = 4;
arr[5] = 5;
arr[6] = 6;
arr[7] = 7;
arr[8] = 8;
arr[9] = 9;
arr[10] = 1;
The problem is that the user can input up to 10 digits, and the last digit could be 0.
How can I find out how many digits my array holds? arr.Length and arr.Count() both give 16, but I need to know that it's 10.
I think you'd be better off using a generic List. Then you can use the Count property to get the correct number of items.
List<byte> arr = new List<byte>();
arr.Add(1);
arr.Add(2);
arr.Add(3);
arr.Add(4);
arr.Add(5);
int count = arr.Count; // returns 5;
byte[] myArray = arr.ToArray(); // returns byte array
UPDATE
If an array is your only option and you cannot switch to a List<T>, then you are out of luck. The problem is that, by default, the runtime initializes each byte in the array to 0. So as soon as new byte[16] is executed, the entire byte array is filled with 0's. Once this happens, there is no way to know whether a 0 is the default value or a valid value. If there is a number between 0 and 255 that you know for certain will not be used, you could initialize the array with that number and count the elements that are != that number.
Another thing you can do is initialize the array to one byte (e.g. new byte[1]) and, each time you need more elements, resize the array by one more byte. Then you can use the standard Length property to see how many are filled.
Well if "0" means the item isn't filled (which means you can't use 0 as a valid entry) you could use:
int numFilled = arr.Count(b => b != 0);
Otherwise you're going to have to use a "magic" number (e.g. 255) to indicate an "unused" item.
Either way it's not foolproof. If there's not a reason to use a fixed-length array then I'd suggest using a different structure like List<byte> which can be filled dynamically and easily converted to an array.
int count = arr.Where(item => item != 0).Count();
Or
int count = arr.Count(item => item != 0);
[EDIT] As Jon Skeet says above, this assumes that you did not fill any of the entries with 0.
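If a fixed-size array really is required and 0 must remain a valid digit, one more option is to keep the count next to the array instead of inferring it from the contents. A minimal sketch, with a made-up DigitBuffer class:

```csharp
using System;

public sealed class DigitBuffer
{
    private readonly byte[] _digits;

    // Number of entries actually written, independent of their values.
    public int Count { get; private set; }

    public DigitBuffer(int capacity)
    {
        _digits = new byte[capacity];
    }

    public void Add(byte digit)
    {
        if (Count == _digits.Length)
        {
            throw new InvalidOperationException("Buffer is full.");
        }
        _digits[Count] = digit;
        Count++;
    }

    // Returns a copy containing only the filled entries.
    public byte[] ToArray()
    {
        var copy = new byte[Count];
        Array.Copy(_digits, copy, Count);
        return copy;
    }
}
```

With this, adding the digits 1, 0, 0 gives Count == 3 even though two of the stored values are 0.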

Copying a part of a byte[] array into a PDFReader

This is a continuation of the ongoing struggle to reduce my memory load mentioned in
How do you refill a byte array using SqlDataReader?
So I have a byte array of a set size; for this example, I'll say new byte[400000]. Inside this array, I'll be placing PDFs of different sizes (less than 400000).
Pseudocode would be:
public void Run()
{
byte[] fileRetrievedFromDatabase = new byte[400000];
foreach (var document in documentArray)
{
// Refill the file with data from the database
var currentDocumentSize = PopulateFileWithPDFDataFromDatabase(fileRetrievedFromDatabase);
var reader = new iTextSharp.text.pdf.PdfReader(fileRetrievedFromDatabase.Take((int)currentDocumentSize ).ToArray());
pageCount = reader.NumberOfPages;
// DO ADDITIONAL WORK
}
}
private int PopulateFileWithPDFDataFromDatabase(byte[] fileRetrievedFromDatabase)
{
// DataAccessCode Goes here
int documentSize = 0;
int bufferSize = 100; // Size of the BLOB buffer.
byte[] outbyte = new byte[bufferSize]; // The BLOB byte[] buffer to be filled by GetBytes.
myReader = logoCMD.ExecuteReader(CommandBehavior.SequentialAccess);
Array.Clear(fileRetrievedFromDatabase, 0, fileRetrievedFromDatabase.Length);
if (myReader == null)
{
return;
}
while (myReader.Read())
{
documentSize = myReader.GetBytes(0, 0, null, 0, 0);
// Reset the starting byte for the new BLOB.
startIndex = 0;
// Read the bytes into outbyte[] and retain the number of bytes returned.
retval = myReader.GetBytes(0, startIndex, outbyte, 0, bufferSize);
// Continue reading and writing while there are bytes beyond the size of the buffer.
while (retval == bufferSize)
{
Array.Copy(outbyte, 0, fileRetrievedFromDatabase, startIndex, retval);
// Reposition the start index to the end of the last buffer and fill the buffer.
startIndex += retval;
retval = myReader.GetBytes(0, startIndex, outbyte, 0, bufferSize);
}
}
return documentSize;
}
The problem with the above code is that I keep getting a "Rebuild trailer not found. Original Error: PDF startxref not found" error when I try to access the PDF Reader. I believe it's because the byte array is too long and has trailing 0's. But since I'm using the byte array so that I'm not continuously building new objects on the LOH, I need to do this.
So how do I get just the piece of the Array that I need and send it to the PDFReader?
Updated
So I looked at the source and realized I had some variables from my actual code that were confusing. I'm basically reusing the fileRetrievedFromDatabase object in each iteration of the loop. Since it's passed by reference, it gets cleared (set to all zeros) and then filled in PopulateFileWithPDFDataFromDatabase. This object is then used to create a new PDF.
If I didn't do it this way, a new large byte array would be created in every iteration and the Large Object Heap gets full and eventually throws an OutOfMemory exception.
You have at least two options:
1. Treat your buffer like a circular buffer with two indexes for the starting and ending positions. You need an index of the last byte written in outbyte, and you have to stop reading when you reach that index.
2. Simply read the same number of bytes as you have in your data array, to avoid reading into the "unknown" parts of the buffer which don't belong to the same file. In other words, instead of passing bufferSize as the last parameter, pass data.Length.
// Read the bytes into outbyte[] and retain the number of bytes returned.
retval = myReader.GetBytes(0, startIndex, outbyte, 0, data.Length);
If data length is 10 and your outbyte buffer is 15, then you should only read the data.Length not the bufferSize.
However, I still don't see how you're reusing the outbyte "buffer", if that's what you're doing... I'm simply not following based on what you've provided in your answer. Maybe you can clarify exactly what is being reused.
Apparently, the way the while loop is currently structured, it wasn't copying the data on its last iteration. I needed to add this:
if (outbyte != null && outbyte.Length > 0 && retval > 0)
{
Array.Copy(outbyte, 0, currentDocument.Data, startIndex, retval);
}
It's now working, although I will definitely need to refactor.
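For the refactoring, one possible shape is a loop that copies whatever each read returned, so the final short chunk needs no special case. This is only a sketch: it uses a Stream as a stand-in for SqlDataReader.GetBytes, and the ChunkedReader name is made up:

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Fills `target` from `source` in chunks and returns the number of bytes
    // written. Every chunk, including the last short one, is copied because
    // the copy happens inside the loop using the count that Read returned.
    public static int Fill(Stream source, byte[] target, int chunkSize)
    {
        byte[] chunk = new byte[chunkSize];
        int total = 0;
        int read;
        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            if (total + read > target.Length)
            {
                throw new InvalidOperationException("Target buffer is too small.");
            }
            Buffer.BlockCopy(chunk, 0, target, total, read);
            total += read;
        }
        return total;
    }
}
```

The returned count marks the valid part of the reused buffer. If the consumer accepts a Stream, new MemoryStream(target, 0, total) exposes just that part without allocating another large array.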

Get a Array subset without copying like in C with pointers

I have an API call which takes a byte[] as a parameter, and my data is already in a byte[]. The problem is that I want to send this buffer in small chunks.
The slow solution would be to copy the array data to new arrays. But I don't want to do this because the copying is unnecessary. I just want a byte[] pointer which I can move around in my buffer, like in C or C++.
Here a sample in pseudo code:
ArrayOriginal = { 0, 1, 2, 3, 4, ... 100 }
ArrayFirstChunk = { 0, 1, 2 } (pointer to the first element in the Original Array)
ArraySecondChunk = { 3, 4, 5 } (pointer to the fourth element in the Original Array)
...
Is this possible? The data shall be available only one time in the memory.
thx
You don't say whether you can change the API. I assume not, but if you can, there is always IEnumerable<byte> - so you return
myarray.Skip(4).Take(4);
etc
You could try using unsafe to get a pointer to your array. Otherwise Buffer.BlockCopy is an efficient way of copying portions of one array to another. If you are sending small chunks of data, you could just reuse the small array instance and leave it to the garbage collector to release the memory afterwards.
You can create a FakeArray type that contains an array, an offset and a length. That way you could work with a subarray of the array. But it won't be an array.
When you pass an array as a parameter, you are passing just a reference to that array. The array is stored only once in memory.
So, I think you just don't need to divide in little chunks.
If you want to process it in chunks, I would suggest just reading the desired elements, first from 0 to 2, then from 3 to 5, etc...
Hope that helps
The usual way of handling data inside an array is simply to specify a chosen offset and count, for example:
// {0, ..., 100}
byte[] data = Enumerable.Range(0, 101).Select(i => (byte)i).ToArray();
Write(data, 0, 3);
Write(data, 3, 3);
...
static void Write(byte[] buffer, int offset, int count)
{
for(int i = offset ; i < offset + count; i++)
Console.WriteLine(buffer[i]);
Console.WriteLine();
}
You can do something similar to the C approach with unsafe code (via byte* and fixed, inside an unsafe context), although I'm not sure it buys you much here:
fixed(byte* ptr = data)
{
Write(ptr, 3);
Write(ptr + 3, 3);
}
...
static unsafe void Write(byte* ptr, int count)
{
for (int i = 0; i < count; i++)
Console.WriteLine(ptr[i]);
Console.WriteLine();
}
You can encapsulate a buffer, offset and count, but then it won't be an array - probably not very helpful.
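The framework already ships such a wrapper: ArraySegment<T> holds an array reference, an offset and a count without copying any data. A minimal sketch follows (the ChunkDemo class and its methods are made up for illustration). As noted above, a segment is still not a byte[], so it only helps if the receiving API can accept it or an offset/count pair:

```csharp
using System;

public static class ChunkDemo
{
    // Splits data into segments of at most chunkSize bytes. No bytes are
    // copied; every segment refers to the same backing array.
    public static ArraySegment<byte>[] Split(byte[] data, int chunkSize)
    {
        int count = (data.Length + chunkSize - 1) / chunkSize;
        var chunks = new ArraySegment<byte>[count];
        for (int i = 0; i < count; i++)
        {
            int offset = i * chunkSize;
            int length = Math.Min(chunkSize, data.Length - offset);
            chunks[i] = new ArraySegment<byte>(data, offset, length);
        }
        return chunks;
    }

    // Sums the bytes of one segment by indexing into the shared array.
    public static int Sum(ArraySegment<byte> segment)
    {
        int sum = 0;
        for (int i = segment.Offset; i < segment.Offset + segment.Count; i++)
        {
            sum += segment.Array[i];
        }
        return sum;
    }
}
```

For a ten-element array holding 0 through 9 and a chunk size of 3, Split returns four segments, the second of which starts at offset 3 and covers the values 3, 4 and 5.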
