C# byte array - writing in the middle

I have the task of reading data from a source in chunks and storing the entire result in a byte array. Specifically, I need to make repeated calls to Socket.Receive. I would like to allocate the byte array at its final size in advance and, on each call, give the position within the array to copy data into. This avoids an extra copy.
In C++, you simply pass a pointer at an offset into the array. I could not figure out how to point the Receive method at a location in the middle of the byte array.
Can this be done in C#?

There are overloads of Receive that accept an offset and a count. You can use them: https://msdn.microsoft.com/en-us/library/system.net.sockets.socket.receive(v=vs.110).aspx - for a specific example: https://msdn.microsoft.com/en-us/library/w3xtz6a5(v=vs.110).aspx
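A minimal sketch of the receive loop using that overload (the helper name and the assumption that you know the total size up front are mine, not from the question):

```csharp
using System.Net.Sockets;

static class SocketHelper
{
    // Fills 'buffer' with up to 'totalSize' bytes using repeated Receive calls,
    // each one writing into the middle of the array at offset 'received'.
    public static int ReceiveExactly(Socket socket, byte[] buffer, int totalSize)
    {
        int received = 0;
        while (received < totalSize)
        {
            int n = socket.Receive(buffer, received, totalSize - received, SocketFlags.None);
            if (n == 0) break; // remote side closed the connection
            received += n;
        }
        return received;
    }
}
```

Each chunk lands directly in its final position, so there is never a second copy.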

Related

How to optimize sequential reading and backtracking the file position in C#?

I have an arbitrarily large file. I need to find the longest matches between segments of the file and some byte arrays of different lengths.
What I do now is roughly this:
1. Create a FileStream fs.
2. For each byte b in fs:
   - Save currentPosition.
   - For each byte array (these arrays differ depending on b):
     - While the bytes match, read from fs.
     - Print the matched sequence.
     - Seek back to currentPosition.
Now the program is slow. How can I improve my reading from the file?
From what I read, the FileStream has an internal buffer, so when I read a byte, it reads ahead 4 KB by default.
My questions:
Am I correct in assuming that the sequential byte reads inside the while loop are satisfied from that buffer?
If so, what happens when I seek back? Is the buffer discarded, so that I refill it with the same content for every byte array? I need the same buffer; I just want to iterate over it again.
Also, after I have iterated over all the byte arrays and want to move on to the next b, what happens to that buffer? What I really want is the same buffer, just without its first byte.
How does this work? Do I need to write a wrapper around the FileStream that reads a byte array (that buffer) itself and satisfies my reads from it?
Edit: Task Manager shows my program using about 2% of the processor on average, so the slowness must come from the file reads.
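One way to implement the wrapper idea from the question is to read the data into your own byte[] once and do all the matching and "seeking back" against that in-memory buffer instead of the stream. A sketch (the helper name is illustrative):

```csharp
static class Matcher
{
    // Counts how many leading bytes of 'pattern' match 'data' starting at 'pos'.
    // Re-scanning the in-memory array replaces seeking the FileStream back and forth.
    public static int MatchLength(byte[] data, int pos, byte[] pattern)
    {
        int i = 0;
        while (i < pattern.Length && pos + i < data.Length && data[pos + i] == pattern[i])
            i++;
        return i;
    }
}
```

For files too large to read with File.ReadAllBytes, refill the buffer with FileStream.Read in overlapping chunks at least as long as the longest pattern, so no match can straddle a chunk boundary.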

System.Array.CopyTo() Issue

I have researched this issue and cannot seem to find a worthwhile answer. What is the difference between System.Array.CopyTo() and System.Array.Clone()?
System.Array.CopyTo copies into an existing array of sufficient size (if the destination is smaller than the source data, an exception is thrown). System.Array.Clone creates a new array.
From MSDN:
Array.CopyTo : Copies all the elements of the current one-dimensional Array to the specified one-dimensional Array starting at the specified destination Array index. The index is specified as a 32-bit integer.
Array.Clone: Creates a shallow copy of the Array.
The CopyTo method also lets you copy into part of another array. For example, if you have an array of size 100 and another of size 200, you can use CopyTo to copy the 100-element array into the last one hundred slots of the larger array, or into the larger array starting at position 50, and so on.
Clone will simply create an identical shallow copy of your existing array.
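For example (a small sketch using int arrays; byte arrays behave the same way):

```csharp
int[] small = { 1, 2, 3 };
int[] large = new int[6];

// Copy 'small' into 'large' starting at index 3.
small.CopyTo(large, 3);            // large is now { 0, 0, 0, 1, 2, 3 }

// Clone creates a brand-new, independent array.
int[] copy = (int[])small.Clone();
copy[0] = 99;                      // 'small' is unaffected
```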

Filling one byte[] with multiple byte[]s

I created an application that stores byte arrays in my SQLiteDatabase.
This same application also selects the byte arrays from the database every 'x' seconds.
The dataflow of my application is as follows:
Application - > SQLiteDatabase -> Application
My question is:
How do I fill one byte array with all the incoming byte arrays from the SQLiteDatabase?
For example:
Byte[] Data;
Needs to be filled with the following byte array:
Byte[] IncomingData;
IncomingData is constantly being filled by the SQLiteDatabase.
Data needs to be filled with IncomingData constantly.
Can someone help me out?
Just use LINQ's Concat:
Data = Data.Concat(IncomingData).ToArray();
Note that Concat returns a new sequence; it does not modify the array in place. You'll need a using directive for the System.Linq namespace.
There are a few approaches you can take.
Use a List<byte> and List.AddRange
Use LINQ's Enumerable.Concat
Use Array.Copy and do it all manually
Of the three, go with the List if possible, as it will likely reduce the amount of array copying required. This is what lists are made for: they use an array behind the scenes with a certain capacity, which doubles whenever it is reached. The capacity can even be set to some large number up front with the list.Capacity property or the constructor that takes an int, much like with an array. You can always get an array back with List.ToArray.
Enumerable.Concat will likely only create an array of the minimum size, meaning a new array needs to be created every time you get some more bytes.
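A minimal sketch of the List&lt;byte&gt; approach (the chunk arrays here stand in for whatever the database just returned):

```csharp
using System.Collections.Generic;

var data = new List<byte>(4096); // pre-size the capacity if you can estimate the total

// Each time a new chunk arrives from the database:
byte[] incomingData = { 1, 2, 3 };
data.AddRange(incomingData);

byte[] nextChunk = { 4, 5 };
data.AddRange(nextChunk);

// When you finally need a plain array:
byte[] result = data.ToArray(); // { 1, 2, 3, 4, 5 }
```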

How to read/write a specific number of bytes to file

I am looking to create a file by structuring it in size blocks. Essentially I am looking to create a rudimentary file system.
I need to write a header, and then an "infinite" possible number of entries of the same size/structure. The important parts are:
Each block of data needs to be read/writable individually
Header needs to be readable/writable as its own entity
Need a way to store this data and be able to determine its location in the file quickly
I would imagine the file would resemble something like:
[HEADER][DATA1][DATA2][DATA3][...]
What is the proper way to handle something like this? Lets say I want to read DATA3 from the file, how do I know where that data chunk starts?
If I understand you correctly and you need a way to assign a kind of names/IDs to your DATA chunks, you can try to introduce yet another type of chunk.
Let's call it TOC (table of contents).
So, the file structure will look like [HEADER][TOC1][DATA1][DATA2][DATA3][TOC2][...].
TOC chunk will contain names/IDs and references to multiple DATA chunks. Also, it will contain some internal data such as pointer to the next TOC chunk (so, you might consider each TOC chunk as a linked-list node).
At runtime all TOC chunks could be represented as a kind of HashMap, where key is a name/ID of the DATA chunk and value is its location in the file.
We can store the chunk size in the header. If chunk sizes are variable, you can store pointers to the actual chunks. An interesting design for variable sizes is the PostgreSQL heap file page: http://doxygen.postgresql.org/bufpage_8h_source.html
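With fixed-size blocks, the location of any chunk is pure arithmetic. A sketch (headerSize and blockSize are whatever your format defines; the class name is illustrative):

```csharp
static class BlockFile
{
    // Fixed-size layout: [HEADER][DATA0][DATA1][DATA2]...
    // Block 'index' starts right after the header plus the blocks before it.
    public static long OffsetOf(int index, int headerSize, int blockSize)
        => headerSize + (long)index * blockSize;
}

// Reading DATA3 then looks like:
//   fs.Seek(BlockFile.OffsetOf(3, headerSize, blockSize), SeekOrigin.Begin);
//   fs.Read(buffer, 0, blockSize);
```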
I am working in reverse but this may help.
I write decompilers for binary files. Generally there is a fixed header of a known number of bytes. This contains specific file identification so we can recognize the file type we are dealing with.
Following that is a fixed number of bytes containing the number of sections (groups of data). This number tells us how many data pointers there will be. Each data pointer may be four bytes (or whatever you need) giving the start of its data block. From consecutive pointers we can work out the size of each block. The decompiler reads the pointers one at a time to get the size and location of each data block in the file, then extracts that block of bytes and does whatever is needed.
We step through the file one block at a time. The size of the last block is the distance from its start pointer to the end of the file.
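The pointer-table layout above turns block sizes into simple subtraction between consecutive pointers; a sketch with illustrative names:

```csharp
static class PointerTable
{
    // pointers[i] is the start offset of block i in the file.
    // Block i ends where block i+1 starts, or at end of file for the last block.
    public static long BlockSize(long[] pointers, int i, long fileLength)
        => (i + 1 < pointers.Length ? pointers[i + 1] : fileLength) - pointers[i];
}
```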

C# simple way to do file I/O with big multidimensional (non-jagged) array of bytes

I am working with big multidimensional byte arrays (~500 MB per array, e.g. an array with dimensions [8, 8192, 8192]) and I'd like to read and write them to a file for storage.
I tried using BinaryFormatter, but it is very slow (takes minutes).
I tried using BinaryWriter, but it only takes a single-dimensional array. In C, there was no problem passing a multi-dimensional array as single-dimensional. In C#, from what I can see, I have two options:
Allocate another chunk of memory for single-dimensional array, copy data into it with for loops, then write this array into file using BinaryWriter
Using for loops, write each individual byte into file using BinaryWriter
Obviously it would be much faster if I just used byte[] everywhere and replaced myarray[i,j] with myarray[i + j * myarray_width], but that would require rewriting the whole class just to make one set of I/O functions (Save/Load) easier.
There's got to be a better way.
When it comes to fast serialization, unsafe code might come in handy. There are two techniques that can help here:
Do a memcpy from your byte[,] to a fresh byte[] that you can pass to FileStream.Write. This requires, of course, a temporary doubling of storage space and some copying. You could split this work into 64KB chunks, though.
PInvoke the unmanaged WriteFile and pass it the FileStream.SafeFileHandle value. WriteFile takes an arbitrary pointer, so you can directly write out your byte[,] (converted to a void*).
Option 2 is maximally fast ("zero-copy").
Sidenote: Unsafe code comes in handy whenever you need to reinterpret bytes. This capability leads to some nice abstractions in C. Fortunately, C# has that capability, too.
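A variation on option 1 that avoids unsafe code entirely: Buffer.BlockCopy treats any array of primitives as a raw block of bytes, so it can flatten a multidimensional byte array without per-element loops. A sketch (the small dimensions are just for illustration):

```csharp
using System;

byte[,,] cube = new byte[2, 3, 4];
cube[1, 2, 3] = 42; // last element in row-major order

// Flatten the multidimensional array into a single-dimensional one
// in one bulk copy; no unsafe code, no element-by-element loop.
byte[] flat = new byte[cube.Length];
Buffer.BlockCopy(cube, 0, flat, 0, cube.Length);

// 'flat' can now go straight to BinaryWriter or FileStream.Write,
// ideally in chunks rather than one 500 MB write:
// File.WriteAllBytes("cube.bin", flat);
```

This still doubles memory temporarily, like the memcpy approach, but stays in verifiable managed code.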
