Concatenating a C# List of byte[] - c#

I am creating several byte arrays that need to be joined together to create one large byte array - I'd prefer not to use byte[]s at all, but I have no choice here...
I am adding each one to a List as I create them, so I only have to do the concatenation once I have all the byte[]s. My question is: what is the best way of actually doing this, when I have a list with an unknown number of byte[]s and I want to concatenate them all together?
Thanks.

listOfByteArrs.SelectMany(byteArr => byteArr).ToArray()
The above code will concatenate a sequence of sequences of bytes into one sequence - and store the result in an array.
Though readable, this is not maximally efficient: it does not make use of the fact that you already know the length of the resulting byte array, so it falls back on the dynamically growing .ToArray() implementation, which necessarily involves multiple allocations and array copies. Furthermore, SelectMany is implemented in terms of iterators, which means lots of interface calls, and that is quite slow. However, for small-ish data sets this is unlikely to matter.
If you need a faster implementation you can do the following:
var output = new byte[listOfByteArrs.Sum(arr => arr.Length)];
int writeIdx = 0;
foreach (var byteArr in listOfByteArrs) {
    byteArr.CopyTo(output, writeIdx);
    writeIdx += byteArr.Length;
}
or as Martinho suggests:
var output = new byte[listOfByteArrs.Sum(arr => arr.Length)];
using (var stream = new MemoryStream(output))
    foreach (var bytes in listOfByteArrs)
        stream.Write(bytes, 0, bytes.Length);
Some timings:
var listOfByteArrs = Enumerable.Range(1, 1000)
    .Select(i => Enumerable.Range(0, i).Select(x => (byte)x).ToArray()).ToList();
Using the short method to concatenate these 500500 bytes takes 15ms, using the fast method takes 0.5ms on my machine - YMMV, and note that for many applications both are more than fast enough ;-).
Finally, you could replace Array.CopyTo with the static Array.Copy, the low-level Buffer.BlockCopy, or a MemoryStream with a preallocated back buffer - these all perform pretty much identically on my tests (x64 .NET 4.0).
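For illustration, the Buffer.BlockCopy variant of that same loop might look roughly like this (a sketch reusing listOfByteArrs from above):
// Same pre-sized concatenation loop, but copying raw bytes with Buffer.BlockCopy.
var output = new byte[listOfByteArrs.Sum(arr => arr.Length)];
int writeIdx = 0;
foreach (var byteArr in listOfByteArrs) {
    // for byte[] the element count and the byte count are the same
    Buffer.BlockCopy(byteArr, 0, output, writeIdx, byteArr.Length);
    writeIdx += byteArr.Length;
}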

Here's a solution based on Andrew Bezzub's and fejesjoco's answers, pre-allocating all the memory needed up front. This yields Θ(N) memory usage and Θ(N) time (N being the total number of bytes).
byte[] result = new byte[list.Sum(a => a.Length)];
using (var stream = new MemoryStream(result))
{
    foreach (byte[] bytes in list)
    {
        stream.Write(bytes, 0, bytes.Length);
    }
}
return result;

Write them all to a MemoryStream instead of a list, then call MemoryStream.ToArray(). Or, once you have the list, first sum up all the byte array lengths, create a new byte array with the total length, and copy each array into the big array one after another.
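A minimal sketch of the first (MemoryStream) approach, where chunks stands in for however the individual byte[]s are produced:
using (var ms = new MemoryStream())
{
    foreach (byte[] chunk in chunks)       // chunks: however the individual byte[]s are produced
        ms.Write(chunk, 0, chunk.Length);
    byte[] combined = ms.ToArray();        // single final copy of everything written
}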

Use Linq:
List<byte[]> list = new List<byte[]>();
list.Add(new byte[] { 1, 2, 3, 4 });
list.Add(new byte[] { 1, 2, 3, 4 });
list.Add(new byte[] { 1, 2, 3, 4 });
IEnumerable<byte> result = Enumerable.Empty<byte>();
foreach (byte[] bytes in list)
{
result = result.Concat(bytes);
}
byte[] newArray = result.ToArray();
A possibly faster solution (which does not declare the array up front) would be:
IEnumerable<byte> bytesEnumerable = GetBytesFromList(list);
byte[] newArray = bytesEnumerable.ToArray();

private static IEnumerable<T> GetBytesFromList<T>(IEnumerable<IEnumerable<T>> list)
{
    foreach (IEnumerable<T> elements in list)
    {
        foreach (T element in elements)
        {
            yield return element;
        }
    }
}
It seems like the above would iterate each array only once.

Instead of storing each byte array in a List<byte[]>, you could add the bytes directly to a List<byte>, using the AddRange method for each one.
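A short sketch of that idea:
// Accumulate the bytes directly instead of keeping a List<byte[]>.
var allBytes = new List<byte>();
allBytes.AddRange(new byte[] { 1, 2, 3, 4 });   // add each chunk as it is created
allBytes.AddRange(new byte[] { 5, 6, 7, 8 });
byte[] result = allBytes.ToArray();             // one final copy at the end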

Hmm, how about List<byte>.AddRange?

Related

Array of byte[] has no values when converted from int[]

I'm passing an int[] array that holds an image; later I want to convert it to a byte[] and save the image to a local path. However, I notice that bytePic[] has the same length as the int[] arrPic - it's just that the values are missing.
Below is the entire function:
public string ChangeMaterialPicture(int[] arrPic, int materialId, string defaultPath)
{
    var material = _warehouseRepository.GetMaterialById(materialId);
    if (material is not null)
    {
        // Convert the Array to Bytes
        byte[] bytePic = new byte[arrPic.Length];
        for (var i = 0; i < arrPic.Length; i++)
        {
            AddByteToArray(bytePic, Convert.ToByte(arrPic[i]));
        }
        // Convert the Bytes to IMG
        string filename = Guid.NewGuid().ToString() + "_.png";
        System.IO.File.WriteAllBytes($@"{defaultPath}\materials\{material.VendorId}\{filename}", bytePic);
        // Update the Image
        material.Picture = filename;
        _warehouseRepository.UpdateMaterial(material);
        return material.Picture;
    }
    else
    {
        return String.Empty;
    }
}

public byte[] AddByteToArray(byte[] bArray, byte newByte)
{
    byte[] newArray = new byte[bArray.Length + 1];
    bArray.CopyTo(newArray, 1);
    newArray[0] = newByte;
    return newArray;
}
You are creating the new array newArray in AddByteToArray and returning it, but at the call site you never use the returned value, so the bytePic array remains unchanged.
The code in AddByteToArray makes no sense anyway. Why create a new array when the intention was to insert one byte into an existing array? What you actually need to do is cast each int to a byte. Simply write:
byte[] bytePic = new byte[arrPic.Length];
for (int i = 0; i < arrPic.Length; i++)
{
    bytePic[i] = (byte)arrPic[i];
}
And delete the method AddByteToArray.
This assumes that every value in the int array is in the range 0 to 255 and therefore fits into one byte.
There are different ways to do this. With LINQ you could also write:
byte[] bytePic = arrPic.Select(i => (byte)i).ToArray();
I would assume your original array uses an int to represent a full RGBA pixel, since 32-bit-per-pixel mono images are very rare in my experience. And if you do have such an image, you probably want to be more intelligent about how you do this conversion. The only time just casting int to byte would be a good idea is if you are sure only the lower 8 bits are used - but if that is the case, why are you using an int array in the first place?
If you actually have RGBA pixels you do not want to convert individual int values to bytes, but rather convert a single int value to 4 bytes. And this is not that difficult to do; you just need to use the right methods. The old-school option is to use Buffer.BlockCopy.
Example:
byte[] bytePic = new byte[arrPic.Length * 4];
Buffer.BlockCopy(arrPic, 0, bytePic, 0, bytePic.Length);
But if your write-method accepts a span you might want to just convert your array to a span and cast this to the right type, avoiding the copy.
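For example, something along these lines (a sketch assuming System.Runtime.InteropServices.MemoryMarshal is available and the write method accepts a span; writeMethod is a hypothetical name, and the resulting byte order is the machine's endianness):
// Reinterpret the int[] as a span of bytes without copying; every int contributes 4 bytes.
Span<byte> pixelBytes = MemoryMarshal.AsBytes(arrPic.AsSpan());
// writeMethod(pixelBytes);   // hypothetical span-accepting write method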

Converting a memory stream to an array but taking only a specified number of elements

I'm interested in implementing image format validation.
I'm taking an input file (as IFormFile), and I want to encode it to bytes and compare these bytes to the starting bytes of formats such as JPEG and PNG. If the file's first two bytes are equal to those of JPEG, for example, then the file is a JPEG image.
The implementation attached converts the entire file to an array of bytes, which seems to be inefficient:
using var stream = new MemoryStream();
file.CopyTo(stream);
byte[] checkIfImage = stream.ToArray();
Then, we'll compare it to bytes of jpeg or png.
var png = new byte[] { 137, 80 };
var jpeg = new byte[] { 255, 216 };
Rather than creating this big inefficient array containing all of the file's bytes, I want to create an array containing only the first two bytes, so the comparison will be efficient.
However, I cannot simply add .Take(2) or something like that after .ToArray().
What should I do?
Thanks!
Just use the stream.Read method
var firstBytes = new byte[2];
var nrBytesRead = filestream.Read(firstBytes);
if(nrBytesRead == firstBytes.Length){
// do the comparison
}
Classes like BinaryReader might also be useful. Reading the entire stream into memory can be convenient, but should in general be avoided if performance or memory usage is a concern.
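For instance, a BinaryReader version might look roughly like this (a sketch; fileStream is assumed to be the opened read stream from above):
using (var reader = new BinaryReader(fileStream, Encoding.UTF8, leaveOpen: true))
{
    // ReadBytes returns at most the requested number of bytes; the rest of the file is never read.
    byte[] header = reader.ReadBytes(2);
    bool isJpeg = header.Length == 2 && header[0] == 255 && header[1] == 216;
}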
Disregarding any other problems you have, conceptually or otherwise... Since you are allocating for the MemoryStream anyway, you might as well just use GetBuffer to access the underlying array. From there you can use any comparison technique you like.
In this example I'm using Memory<T> and ReadOnlyMemory<T> and using SequenceEqual to check for the pattern (¯\_(ツ)_/¯)
Given
private static readonly (string Name, ReadOnlyMemory<byte> Pattern)[] _patterns =
{
    ("png", new byte[] { 1, 2 }),
    ("jpg", new byte[] { 1, 216 }),
};
Usage
using var file = new MemoryStream(new byte[] {1, 2, 3, 4, 5});
using var stream = new MemoryStream();
file.CopyTo(stream);
var mem = stream.GetBuffer().AsMemory();
foreach (var (name, pattern) in _patterns)
    if (pattern.Span.SequenceEqual(mem.Span.Slice(0, pattern.Span.Length)))
        Console.WriteLine("Found : " + name + ", " + Convert.ToHexString(pattern.Span));
Output
Found : png, 0102
Disclaimer 1: If you are using the old and busted .NET Framework, all bets are off. You will need to do this with arrays and crayons.
Disclaimer 2: This was not meant to be the greatest code in the world; it was just a tribute.

Need a fast method of deserializing 1 million Strings & Guids in c#

I want to deserialize a list of 1 million pairs of (String,Guid) for a performance critical app. The format can be anything I choose, and serialization does not have the same performance requirements.
What sort of approach is best? Text or binary? Write each pair (string,guid) consecutively, or write all strings followed by all guids?
I started playing with LinqPad, (and the simpler example of deserializing strings only) and found that (slightly counter-intuitively), using a TextReader and ReadLine() was a fair bit faster than using a BinaryReader and ReadString(). (Is the filesystem cache playing tricks on me?)
public string[] DeSerializeBinary()
{
    var tmr = System.Diagnostics.Stopwatch.StartNew();
    long ms = 0;
    string[] arr = null;
    using (var rdr = new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read)))
    {
        var num = rdr.ReadInt32();
        arr = new String[num];
        for (int i = 0; i < num; i++)
        {
            arr[i] = rdr.ReadString();
        }
        tmr.Stop();
        ms = tmr.ElapsedMilliseconds;
        Console.WriteLine("DeSerializeBinary took {0}ms", ms);
    }
    return arr;
}

public string[] DeserializeText()
{
    var tmr = System.Diagnostics.Stopwatch.StartNew();
    long ms = 0;
    string[] arr = null;
    using (var rdr = File.OpenText(file))
    {
        var num = Int32.Parse(rdr.ReadLine());
        arr = new String[num];
        for (int i = 0; i < num; i++)
        {
            arr[i] = rdr.ReadLine();
        }
        tmr.Stop();
        ms = tmr.ElapsedMilliseconds;
        Console.WriteLine("DeserializeText took {0}ms", ms);
    }
    return arr;
}
Some Edits:
I used RamMap to clear the file system cache, and it turns out there was very little difference between the text and binary readers for strings only.
I have a fairly simple class that holds the string and guid. It also holds an int index which corresponds to its position in the list. Obviously there's no need to include this in serialization.
In a test deserializing strings and Guids alternately (in binary), I get around 500ms.
Ideal timing is 50ms, or as close as I can get. However, a simple experiment showed it takes at least 120ms to read the (compressed) file into memory from a reasonably fast SSD drive, without any sort of parsing at all. So 50ms seems unlikely.
Our strings have no theoretical length restrictions. However, we can assume that the performance target only applies if they are all 20 characters or less.
Timings include opening the file.
Reading the strings is the clear bottleneck now (hence my experiments with serializing strings only). JIT_NewFast took 30% of the time before I preallocated a 16-byte array for reading the GUIDs.
It's not surprising that reading a bunch of strings is faster with StreamReader than with BinaryReader. StreamReader reads in blocks from the underlying stream, and parses the strings from that buffer. BinaryReader doesn't have a buffer like that. It reads the string length from the underlying stream, and then reads that many characters. So BinaryReader makes more calls to the base stream's Read method.
But there's more to deserializing a (String, Guid) pair than just reading. You also have to parse the Guid. If you write the file in binary then the Guid is written in binary, which makes it much easier and faster to create a Guid structure. If it's a string, then you have to call new Guid(string) to parse the text and create a Guid, after you split the line into its two fields.
Hard to say which of those will be faster.
I can't imagine that we're talking about a whole lot of time here. Certainly reading a file with a million lines will take around a second. Unless the string is really long. A GUID is only 36 characters if you count the separators, right?
With BinaryWriter, you can write the file like this:
writer.Write(count); // integer number of records
foreach (var pair in pairs)
{
    writer.Write(pair.theString);
    writer.Write(pair.theGuid.ToByteArray());
}
And to read it, you have:
count = reader.ReadInt32();
byte[] guidBytes = new byte[16];
for (int i = 0; i < count; ++i)
{
    string s = reader.ReadString();
    reader.Read(guidBytes, 0, guidBytes.Length);
    pairs.Add(new Pair(s, new Guid(guidBytes)));
}
Whether that's faster than splitting a string and calling the Guid constructor that takes a string parameter, I don't know.
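That text-based alternative might look roughly like this (a sketch assuming one "string,guid" line per record, strings that never contain the separator, rdr being a StreamReader over the file, and the same Pair type as above):
int count = Int32.Parse(rdr.ReadLine());
var pairs = new List<Pair>(count);
for (int i = 0; i < count; i++)
{
    string line = rdr.ReadLine();
    int comma = line.LastIndexOf(',');                      // split the line into its two fields
    pairs.Add(new Pair(line.Substring(0, comma),
                       new Guid(line.Substring(comma + 1)))); // parse the GUID from text
}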
I suspect that any difference is going to be pretty slight. I'd probably go with the simplest method: a text file.
If you want to get really crazy, you can write a custom format that you can easily slurp up in just a couple of large reads (a header, an index, and two arrays for strings and GUIDs), and do everything else in memory. That would almost certainly be faster. But faster enough to warrant the extra work? Doubtful.
Update
Or maybe not doubtful. Here's some code that writes and reads a custom binary format. The format is:
count (int32)
guids (count * 16 bytes)
strings (one big concatenated string)
index (index of each string's starting character in the big string)
I assume you're using a Dictionary<string, Guid> to hold these things. But your data structure doesn't really matter. The code would be substantially the same.
Note that I tested this very briefly. I won't say that the code is 100% bug free, but I think you can get the idea of what I'm doing.
private void WriteGuidFile(string filename, Dictionary<string, Guid> guids)
{
    using (var fs = File.Create(filename))
    {
        using (var writer = new BinaryWriter(fs, Encoding.UTF8))
        {
            List<int> stringIndex = new List<int>(guids.Count);
            StringBuilder bigString = new StringBuilder();

            // write count
            writer.Write(guids.Count);

            // Write the GUIDs and build the string index
            foreach (var pair in guids)
            {
                writer.Write(pair.Value.ToByteArray(), 0, 16);
                stringIndex.Add(bigString.Length);
                bigString.Append(pair.Key);
            }

            // Add one more entry to the string index.
            // makes deserializing easier
            stringIndex.Add(bigString.Length);

            // Write the string that contains all of the strings, combined
            writer.Write(bigString.ToString());

            // write the index
            foreach (var ix in stringIndex)
            {
                writer.Write(ix);
            }
        }
    }
}
Reading is just slightly more involved:
private Dictionary<string, Guid> ReadGuidFile(string filename)
{
    using (var fs = File.OpenRead(filename))
    {
        using (var reader = new BinaryReader(fs, Encoding.UTF8))
        {
            // read the count
            int count = reader.ReadInt32();

            // The guids are in a huge byte array sized 16*count
            byte[] guidsBuffer = new byte[16 * count];
            reader.Read(guidsBuffer, 0, guidsBuffer.Length);

            // Strings are all concatenated into one
            var bigString = reader.ReadString();

            // Index is an array of int. We can read it as an array of
            // ((count+1) * 4) bytes.
            byte[] indexBuffer = new byte[4 * (count + 1)];
            reader.Read(indexBuffer, 0, indexBuffer.Length);

            var guids = new Dictionary<string, Guid>(count);
            byte[] guidBytes = new byte[16];
            int startix = 0;
            int endix = 0;
            for (int i = 0; i < count; ++i)
            {
                endix = BitConverter.ToInt32(indexBuffer, 4 * (i + 1));
                string key = bigString.Substring(startix, endix - startix);
                Buffer.BlockCopy(guidsBuffer, (i * 16), guidBytes, 0, 16);
                guids.Add(key, new Guid(guidBytes));
                startix = endix;
            }

            return guids;
        }
    }
}
A couple of notes here. First, I'm using BitConverter to convert the data in the byte arrays to integers. It would be faster to use unsafe code and just index into the arrays using an int32*.
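A rough sketch of that idea (requires an unsafe build; the rest of the loop stays exactly as above):
unsafe
{
    fixed (byte* pIndex = indexBuffer)
    {
        int* index = (int*)pIndex;
        for (int i = 0; i < count; ++i)
        {
            int endix = index[i + 1];   // same value BitConverter.ToInt32 computed above
            // ... build the key and the Guid exactly as in the loop above ...
        }
    }
}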
You might gain some speed by using pointers to index into the guidBuffer and calling Guid Constructor (Int32, Int16, Int16, Byte, Byte, Byte, Byte, Byte, Byte, Byte, Byte) rather than using Buffer.BlockCopy to copy the GUID into the temporary array.
You could make the string index an index of lengths rather than the starting positions. That would eliminate the need for the extra value at the end of the array, but it's unlikely that it'd make any difference in the speed.
There might be other optimization opportunities, but I think you get the general idea here.

Get a Array subset without copying like in C with pointers

I have an API call which needs to take a byte[] as a parameter, and my data is already in a byte[]. The problem is that I want to send this buffer in little chunks.
The slow solution would be to copy the array data into new arrays, but I don't want to do this because the copying is unnecessary. I just want a byte[] pointer which I can move around in my buffer, like in C or C++...
Here a sample in pseudo code:
ArrayOriginal = { 0, 1, 2, 3, 4, ... 100 }
ArrayFirstChunk = { 0, 1, 2 } (pointer to the first element in the Original Array)
ArraySecondChunk = { 3, 4, 5 } (pointer to the fourth element in the Original Array)
...
Is this possible? The data shall be available only one time in the memory.
thx
You don't say whether you can change the API. I assume not, but if you can, there is always IEnumerable<byte> - so you return
myarray.Skip(4).Take(4);
etc
You could try using unsafe code to get a pointer to your array. Otherwise, Buffer.BlockCopy is an efficient way of copying portions of one array into another. If you are sending small chunks of data, you could just reuse one small array instance and leave it to the garbage collector to release the memory.
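A small sketch of that reuse idea (data is the original buffer; chunkSize and ApiCall are placeholders for the real chunk size and API):
int chunkSize = 3;
byte[] chunk = new byte[chunkSize];                          // one reusable buffer
for (int offset = 0; offset < data.Length; offset += chunkSize)
{
    int count = Math.Min(chunkSize, data.Length - offset);
    Buffer.BlockCopy(data, offset, chunk, 0, count);         // copy the next slice into the reused buffer
    // ApiCall(chunk);                                       // hypothetical API taking a byte[]
}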
You can create a FakeArray class that contains an array, an offset and a length. That way you can work with a sub-array of the original array - but it won't actually be an array.
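The framework's ArraySegment<T> is essentially that wrapper, for example:
byte[] data = { 0, 1, 2, 3, 4, 5 /* ... */ };
var firstChunk = new ArraySegment<byte>(data, 0, 3);    // views bytes 0..2, no copy
var secondChunk = new ArraySegment<byte>(data, 3, 3);   // views bytes 3..5, no copy
// firstChunk.Array is still the original array; Offset and Count describe the slice.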
When you pass an array as a parameter, you are passing just a pointer to that array. The array is stored only once in memory.
So I think you just don't need to divide it into little chunks.
If you want to process it in chunks, I would suggest just reading the desired elements, first from 0 to 2, then from 3 to 5, etc...
Hope that helps
The usual way of handling data inside an array is simply to specify a chosen offset and count, for example:
// {0, ..., 100}
byte[] data = Enumerable.Range(0, 101).Select(i => (byte)i).ToArray();
Write(data, 0, 3);
Write(data, 3, 3);
...
static void Write(byte[] buffer, int offset, int count)
{
    for (int i = offset; i < offset + count; i++)
        Console.WriteLine(buffer[i]);
    Console.WriteLine();
}
You can do something similar to the C approach with unsafe code (via byte* and fixed), but I'm not sure it buys you much here:
fixed (byte* ptr = data)
{
    Write(ptr, 3);
    Write(ptr + 3, 3);
}
...
static unsafe void Write(byte* ptr, int count)
{
    for (int i = 0; i < count; i++)
        Console.WriteLine(ptr[i]);
    Console.WriteLine();
}
You can encapsulate a buffer, offset and count, but then it won't be an array - probably not very helpful.

Fastest way to chop array in two pieces

I have an array, say:
var arr1 = new [] { 1, 2, 3, 4, 5, 6 };
Now, when my array-size exceeds 5, I want to resize the current array to 3, and create a new array that contains the upper 3 values, so after this action:
arr1 = new [] { 1, 2, 3 };
newArr = new [] { 4, 5, 6 };
What's the fastest way to do this? I guess I'll have to look into the unmanaged corner, but no clue.
Some more info:
The arrays have to be able to size up without large performance hits
The arrays will only contain Int32's
Purpose of the array is to group the numbers in my source array without having to sort the whole list
In short: I want to split the following input array:
int[] arr = new int[] { 1, 3, 4, 29, 31, 33, 35, 36, 37 };
into
arr1 = 1, 3, 4
arr2 = 29, 31, 33, 35, 36, 37
but because the ideal speed is reached with an array size of 3, arr2 should be split into 2 evenly sized arrays.
Note
I know that an array's implementation in memory is quite naive (well, at least it is in C, where you can manipulate the count of items in the array so the array resizes). Also that there is a memory move function somewhere in the Win32 API. So I guess this would be the fastest:
Change arr1 so it only contains 3 items
Create new array arr2 with size 3
Memmove the bytes that aren't in arr1 anymore into arr2
I'm not sure there's anything better than creating the empty arrays, and then using Array.Copy. I'd at least hope that's optimized internally :)
int[] firstChunk = new int[3];
int[] secondChunk = new int[3];
Array.Copy(arr1, 0, firstChunk, 0, 3);
Array.Copy(arr1, 3, secondChunk, 0, 3);
To be honest, for very small arrays the overhead of the method call may be greater than just explicitly assigning the elements - but I assume that in reality you'll be using slightly bigger ones :)
You might also consider not actually splitting the array, but instead using ArraySegment to have separate "chunks" of the array. Or perhaps use List<T> to start with... it's hard to know without a bit more context.
If speed is really critical, then unmanaged code using pointers may well be the fastest approach - but I would definitely check whether you really need to go there before venturing into unsafe code.
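For instance, a copy-free "split" along the lines of the ArraySegment suggestion might look like this (a sketch; the segments are views over the same backing array, not real arrays, so it only helps if the consuming code can accept them):
var arr1 = new[] { 1, 2, 3, 4, 5, 6 };
var lower = new ArraySegment<int>(arr1, 0, 3);   // 1, 2, 3
var upper = new ArraySegment<int>(arr1, 3, 3);   // 4, 5, 6 - same backing array, nothing copied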
Are you looking for something like this?
static unsafe void DoIt(int* ptr)
{
    Console.WriteLine(ptr[0]);
    Console.WriteLine(ptr[1]);
    Console.WriteLine(ptr[2]);
}

static unsafe void Main()
{
    var bytes = new byte[1024];
    new Random().NextBytes(bytes);
    fixed (byte* p = bytes)
    {
        for (int i = 0; i < bytes.Length; i += sizeof(int))
        {
            DoIt((int*)(p + i));
        }
    }
    Console.ReadKey();
}
This avoids creating new arrays (which cannot be resized, not even with unsafe code!) entirely and just passes a pointer into the array to some method which reads the first three integers.
If your array will always contain 6 items how about:
var newarr1 = new []{oldarr[0], oldarr[1],oldarr[2]};
var newarr2 = new []{oldarr[3], oldarr[4],oldarr[5]};
Reading from memory is fast.
Since arrays are not dynamically resized in C#, this means your first array must have a minimum length of 5 or maximum length of 6, depending on your implementation. Then, you're going to have to dynamically create new statically sized arrays of 3 each time you need to split. Only after each split will your array items be in their natural order unless you make each new array a length of 5 or 6 as well and only add to the most recent. This approach means that each new array will have 2-3 extra pointers as well.
Unless you have a known number of items to go into your array BEFORE compiling the application, you're also going to have to have some form of holder for your dynamically created arrays, meaning you're going to have to have an array of arrays (a jagged array). Since your jagged array is also statically sized, you'll need to be able to dynamically recreate and resize it as each new dynamically created array is instantiated.
I'd say copying the items into the new array is the least of your worries here. You're looking at some pretty big performance hits as well as the array size(s) grow.
UPDATE: So, if this were absolutely required of me...
public class MyArrayClass
{
    private int[][] _master = new int[10][];
    private int[] _current = new int[3];
    private int _currentCount, _masterCount;

    public void Add(int number)
    {
        _current[_currentCount] = number;
        _currentCount += 1;
        if (_currentCount == _current.Length)
        {
            // Allocate the slot in the master array before copying into it.
            _master[_masterCount] = new int[3];
            Array.Copy(_current, 0, _master[_masterCount], 0, 3);
            _currentCount = 0;
            _current = new int[3];
            _masterCount += 1;
            if (_masterCount == _master.Length)
            {
                int[][] newMaster = new int[_master.Length + 10][];
                Array.Copy(_master, 0, newMaster, 0, _master.Length);
                _master = newMaster;
            }
        }
    }

    public int[][] GetMyArray()
    {
        return _master;
    }

    public int[] GetMinorArray(int index)
    {
        return _master[index];
    }

    public int GetItem(int MasterIndex, int MinorIndex)
    {
        return _master[MasterIndex][MinorIndex];
    }
}
Note: This probably isn't perfect code, it's a horrible way to implement things, and I would NEVER do this in production code.
The obligatory LINQ solution:
if (arr1.Length > 5)
{
    var newArr = arr1.Skip(arr1.Length / 2).ToArray();
    arr1 = arr1.Take(arr1.Length / 2).ToArray();
}
LINQ is faster than you might think; this will basically be limited by the Framework's ability to spin through an IEnumerable (which on an array is pretty darn fast). This should execute in roughly linear time, and can accept any initial size of arr1.
