faster code to remove first elements from byte array [duplicate] - c#

This question already has answers here:
How do you remove and add bytes from a byte array in C#
(2 answers)
Closed 7 years ago.
So I have a byte array, and I need to remove the first 5 elements from it. Anyway, I looked online and I couldn't find anything that suited what I was looking for. So I made this, and it is horribly slow, in essence, unusable.
private byte[] fR(byte[] tb)
{
string b = "";
byte[] m = new byte[tb.Length - 5];
for (int a = 5; a < tb.Length; a++)
{
b = b + " " + tb.GetValue(a);
}
b = b.Remove(0, 1);
string[] rd = Regex.Split(b, " ");
for (int c = 0; c < rd.Length; c++)
{
byte curr = Convert.ToByte(rd[c]);
m.SetValue(curr, c);
}
return m;
}
What I am asking is whether there is a way to make this faster, or another method I can use to remove the first 5 elements from a byte array.

Much easier and quicker:
byte[] src = ...;
byte[] dst = new byte[src.Length - 5];
Array.Copy(src, 5, dst, 0, dst.Length);
This is about as fast as you'll be able to get with a managed copy.
If you're using C# 8 or later (on .NET Core 3.0 or newer), you can use ranges to copy a slice of the array very concisely:
byte[] src = ...;
byte[] dst = src[5..];
The LINQ approach used in other answers is a bit easier to understand, and it is what I'd do 90% of the time. But LINQ has its own overhead, especially for simple problems like this, so I wouldn't use it where performance is critical.
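As a minimal sketch of the Array.Copy approach with the edge cases handled, the helper below returns a copy without the first `count` bytes. The names `ByteArrayHelpers`, `RemoveFirst` and `count` are illustrative assumptions and do not come from the answers above.

```csharp
using System;

public static class ByteArrayHelpers
{
    // Returns a new array without the first `count` bytes of `src`.
    // Hypothetical helper: the name and the argument checks are assumptions.
    public static byte[] RemoveFirst(byte[] src, int count)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        if (count < 0 || count > src.Length)
            throw new ArgumentOutOfRangeException(nameof(count));

        byte[] dst = new byte[src.Length - count];
        Array.Copy(src, count, dst, 0, dst.Length);
        return dst;
    }
}
```

Usage would be `byte[] trimmed = ByteArrayHelpers.RemoveFirst(tb, 5);`. The explicit range check is a design choice: without it, an input shorter than five bytes would fail while allocating a negative-length array, with a less helpful error.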

Your code is slow because you're packing the byte array into a string and then unpacking it. Get rid of the string manipulation and it will be fast.
You can use LINQ:
tb.Skip(5).ToArray();

What about
tb.Skip(5).ToArray()
?

Related

Array of bytes[] has no values when is converted from int[]

I'm passing an int[] array that holds an image, and later I want to convert it to byte[] and save the image to a local path. However, I notice that bytePic has the same length as arrPic, but the values are missing.
Below is the entire function:
public string ChangeMaterialPicture(int[] arrPic, int materialId,string defaultPath)
{
var material = _warehouseRepository.GetMaterialById(materialId);
if(material is not null)
{
// Convert the Array to Bytes
byte[] bytePic = new byte[arrPic.Length];
for(var i = 0; i < arrPic.Length; i++)
{
AddByteToArray(bytePic, Convert.ToByte(arrPic[i]));
}
// Convert the Bytes to IMG
string filename = Guid.NewGuid().ToString() + "_.png";
System.IO.File.WriteAllBytes(@$"{defaultPath}\materials\{material.VendorId}\{filename}", bytePic);
// Update the Image
material.Picture = filename;
_warehouseRepository.UpdateMaterial(material);
return material.Picture;
}
else
{
return String.Empty;
}
}
public byte[] AddByteToArray(byte[] bArray, byte newByte)
{
byte[] newArray = new byte[bArray.Length + 1];
bArray.CopyTo(newArray, 1);
newArray[0] = newByte;
return newArray;
}
You create a new array, newArray, in AddByteToArray and return it, but the call site never uses the returned value, so the bytePic array remains unchanged.
The code in AddByteToArray makes no sense here. Why create a new array when the intention is to store one byte in an existing array? All you need to do is cast each int to a byte. Simply write:
byte[] bytePic = new byte[arrPic.Length];
for (int i = 0; i < arrPic.Length; i++)
{
bytePic[i] = (byte)arrPic[i];
}
And delete the method AddByteToArray.
This assumes that every value in the int array is in the range 0 to 255 and therefore fits into one byte.
There are different ways to do this. With LINQ you could also write:
byte[] bytePic = arrPic.Select(i => (byte)i).ToArray();
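By default a plain (byte) cast keeps only the low 8 bits of an out-of-range value without complaining. If you would rather find out when the 0 to 255 assumption is violated, a checked conversion is one option. The sketch below uses hypothetical names (`PixelConversion`, `ToBytesChecked`) chosen for illustration.

```csharp
using System;

public static class PixelConversion
{
    // Converts each int to a byte, throwing OverflowException for any value
    // outside 0..255 instead of silently keeping only the low 8 bits.
    public static byte[] ToBytesChecked(int[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));

        byte[] result = new byte[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            result[i] = checked((byte)values[i]);
        }
        return result;
    }
}
```

This behaves like the Convert.ToByte call in the question (which also throws on overflow), but writes straight into the target array.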
I would assume your original array uses an int to represent a full RGBA pixel, since 32-bit-per-pixel mono images are very rare in my experience. And if you do have such an image, you probably want to be more intelligent about how you do this conversion. Casting each int to a byte is only a good idea if you are sure that only the lower 8 bits are used, but if that is the case, why use an int array in the first place?
If you actually have RGBA pixels, you do not want to convert each int value to a single byte, but rather each int value to 4 bytes. This is not that difficult to do; you just need to use the right methods. The old-school option is to use Buffer.BlockCopy.
Example:
byte[] bytePic = new byte[arrPic.Length * 4];
Buffer.BlockCopy(arrPic, 0, bytePic, 0, bytePic.Length);
But if your write-method accepts a span you might want to just convert your array to a span and cast this to the right type, avoiding the copy.
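The span idea could look roughly like the sketch below, which assumes .NET Core 2.1 or later. `PixelWriter`, `AsByteSpan` and `WriteRaw` are hypothetical names. Note that this writes raw pixel data in the machine's native byte order, not an encoded PNG.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class PixelWriter
{
    // Reinterprets the int[] as bytes without copying. The byte order is the
    // machine's native order (little-endian on most current hardware).
    public static ReadOnlySpan<byte> AsByteSpan(int[] pixels)
    {
        return MemoryMarshal.AsBytes(new ReadOnlySpan<int>(pixels));
    }

    // Writes the raw pixel bytes to a file via Stream.Write(ReadOnlySpan<byte>).
    public static void WriteRaw(string path, int[] pixels)
    {
        using (FileStream stream = File.Create(path))
        {
            stream.Write(AsByteSpan(pixels));
        }
    }
}
```

The span is only a view over the original array, so it stays valid only as long as the array is not replaced, and any change to the array is visible through the span.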

How to take array segments out of a byte array after every X step?

I have a big byte array (around 50 KB) and I need to extract numeric values from it. Every three bytes represent one value.
I tried LINQ's Skip and Take, but that is really slow given the size of the array.
This is my very slow routine:
List<int> ints = new List<int>();
for (int i = 0; i <= fullFile.Count(); i+=3)
{
ints.Add(BitConverter.ToInt16(fullFile.Skip(i).Take(i + 3).ToArray(), 0));
}
I think my approach to this is wrong.
Your code
First of all, ToInt16 only uses two bytes, so your third byte is discarded.
You can't simply switch to ToInt32 either, as it reads four bytes and would include one extra byte.
Let's review this:
fullFile.Skip(i).Take(i + 3).ToArray()
...and take a careful look at Take(i + 3). It asks for a larger and larger buffer on every iteration, when Take(3) is all that is needed. For instance, when i is 32000 you request 32003 bytes for your new buffer (or everything that remains, if that is less).
That's why the code is quite slow.
The code is also slow because it allocates a lot of byte buffers: one for every three input bytes (roughly 17,000 for a 50 KB array), all of which have to be garbage collected.
You could also have done it like this:
List<int> ints = new List<int>();
var workBuffer = new byte[4];
// Stop once fewer than three bytes remain
for (int i = 0; i + 3 <= fullFile.Length; i += 3)
{
    // Copy the three bytes into the beginning of the temp buffer
    Buffer.BlockCopy(fullFile, i, workBuffer, 0, 3);
    // Now we can use ToInt32, as the last byte is always zero
    var value = BitConverter.ToInt32(workBuffer, 0);
    ints.Add(value);
}
Quite easy to understand, but not the fastest code.
A better solution
So the most efficient way is to do the conversion yourself (bit shifting).
Something like:
List<int> ints = new List<int>();
// Stop once fewer than three bytes remain
for (int i = 0; i + 2 < fullFile.Length; i += 3)
{
    // This code assumes little-endian byte order
    var value = (fullFile[i + 2] << 16)
        + (fullFile[i + 1] << 8)
        + fullFile[i];
    ints.Add(value);
}
This code does not allocate anything extra (except the ints), and should be quite fast.
You can read more about the shift operators on MSDN, and about endianness.
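For reference, here is the bit-shifting approach as a self-contained sketch that sizes the result up front. The names `ThreeByteParser` and `Parse` are hypothetical. It assumes the values are unsigned and little-endian, and that trailing bytes which do not form a full group of three can be ignored.

```csharp
using System;

public static class ThreeByteParser
{
    // Decodes consecutive 3-byte little-endian unsigned values.
    // Trailing bytes that do not form a full group are ignored (an assumption).
    public static int[] Parse(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        int[] values = new int[data.Length / 3];
        for (int i = 0; i < values.Length; i++)
        {
            int offset = i * 3;
            values[i] = data[offset]
                      | (data[offset + 1] << 8)
                      | (data[offset + 2] << 16);
        }
        return values;
    }
}
```

Because the number of values is known in advance, a plain array avoids the repeated growth of a List<int>. For big-endian data the shift amounts on the first and third byte would swap.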

Same integer lists, different byte arrays

I have a question about an interesting thing that happened when I tried to convert the elements of a List<short> to byte[] in C#.
First, I had to read a large binary file that contains 262144 signed numbers of type short. I read the file and built a list of numbers with the following code:
byte[] content = null;
content = File.ReadAllBytes(scanName);
List<int> transformed = new List<int>();
for (int n = 0; n < content.Length; n += 2) // 2 bytes
{
short sample = BitConverter.ToInt16(content, n);
transformed.Add(sample);
}
Then I compressed and decompressed the numbers with an algorithm and got the same values back, which seemed right. The problem occurs when I try to convert both lists to byte arrays. This is done by the following method:
private byte[] ToByte(List<short> list){
List<byte> toRet = new List<byte>();
foreach(short s in list)
{
byte[] converted = BitConverter.GetBytes(s);
foreach(byte b in converted)
{
toRet.Add(b);
}
}
return toRet.ToArray();
}
But when I compared both byte arrays with first.SequenceEqual(second), the method returned false. Isn't that strange, given that the values in both lists are the same?
In the end, I solved the issue. The problem wasn't in converting short to byte, but in a part of the code that wasn't published here. Specifically, I made a very beginner mistake: I converted a 2D array into a 1D array in the wrong way. Now everything works perfectly. Thank you for all your responses, and sorry for the inconvenience!
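A quick round trip is one way to convince yourself that the short-to-byte conversion itself is lossless, so that a mismatch must come from elsewhere. The sketch below is an assumption-laden illustration rather than the asker's code. `ShortBytes`, `ToBytes` and `FromBytes` are hypothetical names, and it uses Buffer.BlockCopy instead of per-element BitConverter calls.

```csharp
using System;
using System.Collections.Generic;

public static class ShortBytes
{
    // Packs the shorts into a byte array in the machine's native byte order,
    // which is also the order BitConverter uses.
    public static byte[] ToBytes(List<short> samples)
    {
        short[] source = samples.ToArray();
        byte[] bytes = new byte[source.Length * sizeof(short)];
        Buffer.BlockCopy(source, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Reverses ToBytes; assumes an even number of bytes.
    public static List<short> FromBytes(byte[] bytes)
    {
        short[] result = new short[bytes.Length / sizeof(short)];
        Buffer.BlockCopy(bytes, 0, result, 0, result.Length * sizeof(short));
        return new List<short>(result);
    }
}
```

If `FromBytes(ToBytes(list))` returns the original values for both lists, the two lists can still produce different byte arrays only if their contents or ordering differ, which matches the asker's finding.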

HashSet for byte arrays [duplicate]

This question already has answers here:
How to create a HashSet<List<Int>> with distinct elements?
(5 answers)
Closed 4 years ago.
I need a HashSet for byte arrays in order to check if a given byte array exists in the collection. But it seems like this doesn't work for byte arrays (or perhaps any array).
Here is my test code:
void test()
{
byte[] b1 = new byte[] { 1, 2, 3 };
byte[] b2 = new byte[] { 1, 2, 3 };
HashSet<byte[]> set = new HashSet<byte[]>();
set.Add(b1);
set.Add(b2);
Text = set.Count.ToString();//returns 2 instead of the expected 1.
}
Is there a way to make a HashSet for byte arrays?
Construct the HashSet with an IEqualityComparer<byte[]>. You don't want to compare through the collection interfaces here. While byte[] does in fact implement interfaces such as IEnumerable<byte>, IList<byte>, etc., using them is a bad idea because of the overhead involved. You rarely use the fact that string implements IEnumerable<char>, so don't do it for byte[] either.
public class bytearraycomparer : IEqualityComparer<byte[]> {
public bool Equals(byte[] a, byte[] b)
{
if (a.Length != b.Length) return false;
for (int i = 0; i < a.Length; i++)
if (a[i] != b[i]) return false;
return true;
}
public int GetHashCode(byte[] a)
{
uint b = 0;
for (int i = 0; i < a.Length; i++)
b = ((b << 23) | (b >> 9)) ^ a[i];
return unchecked((int)b);
}
}
void test()
{
byte[] b1 = new byte[] { 1, 2, 3 };
byte[] b2 = new byte[] { 1, 2, 3 };
HashSet<byte[]> set = new HashSet<byte[]>(new bytearraycomparer());
set.Add(b1);
set.Add(b2);
Text = set.Count.ToString();
}
https://msdn.microsoft.com/en-us/library/bb359100(v=vs.110).aspx
If you were to use the answers in the proposed duplicate question, you would end up with one function call and one array bounds check per byte processed. You don't want that. If the loops are expressed in the simplest way, as above, the JIT compiler will inline the fetches, notice that the bounds checks cannot fail (arrays can't be resized), and omit them. Only one function call for the entire array. Yay.
Lists tend to have only a few elements compared to a byte array, so a dirt-simple hash function such as foreach (var item in list) hashcode = hashcode * 5 + item.GetHashCode(); is often good enough for them. If you use that kind of hash function for byte arrays, you will have problems. The multiply-by-a-small-odd-number trick becomes biased too quickly for comfort here. My particular hash function given here is probably not optimal, but we have run tests on this family and it performs quite well with three million entries. The multiply-by-odd approach got into trouble quickly because it produced numerous collisions between inputs that were only two bytes long or differed in only two bytes. If you avoid the degenerate numbers, this family will have no collisions in two bytes, and most of its members have no collisions in three bytes.
Considering actual use cases: by far the two most likely things here are byte strings and actual files being checked for sameness. In either case, taking a hash code of only the first few bytes is most likely a bad idea. String's hash code uses the whole string, so byte strings should do the same, and most files being deduplicated don't have a unique prefix in the first few bytes. For N entries, if you have hash collisions on the order of the square root of N, you might as well have walked the entire array when generating the hash code, even neglecting the fact that compares are slower than hashes.
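On .NET 6 or later, a comparer can also lean on the framework for both parts. The sketch below swaps in span comparison and System.HashCode in place of the hand-written loop and rotate-xor hash above; it is an alternative, not the method this answer benchmarked. The class name `ByteArrayContentComparer` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Sketch for .NET 6 or later; the class name is hypothetical.
public sealed class ByteArrayContentComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] a, byte[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        // Span-based comparison of length and contents.
        return a.AsSpan().SequenceEqual(b);
    }

    public int GetHashCode(byte[] a)
    {
        if (a == null) return 0;
        // HashCode mixes in every byte; results are only stable within one process.
        var hash = new HashCode();
        hash.AddBytes(a);
        return hash.ToHashCode();
    }
}
```

Usage is the same as before: `new HashSet<byte[]>(new ByteArrayContentComparer())`. Like the comparer above, it hashes the whole array rather than a prefix. Because the hash values are randomized per process, they should not be persisted.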

Initialize a byte array to a certain value, other than the default null? [duplicate]

This question already has answers here:
What is the equivalent of memset in C#?
(17 answers)
Closed 8 years ago.
I'm busy rewriting an old project that was done in C++, to C#.
My task is to rewrite the program so that it functions as close to the original as possible.
During a bunch of file-handling the previous developer who wrote this program creates a structure containing a ton of fields that correspond to the set format that a file has to be written in, so all that work is already done for me.
These fields are all byte arrays. What the C++ code then does is use memset to set this entire structure to all spaces characters (0x20). One line of code. Easy.
This is very important as the utility that this file eventually goes to is expecting the file in this format. What I've had to do is change this struct to a class in C#, but I cannot find a way to easily initialize each of these byte arrays to all space characters.
What I've ended up having to do is this in the class constructor:
//Initialize all of the variables to spaces.
int index = 0;
foreach (byte b in UserCode)
{
UserCode[index] = 0x20;
index++;
}
This works fine, but I'm sure there must be a simpler way to do this. When the array is set to UserCode = new byte[6] in the constructor, the byte array is automatically initialized to the default null (zero) values. Is there no way to make it all spaces upon declaration, so that it is initialized straight away when I call my class's constructor? Or is there some memset-like function?
For small arrays use array initialisation syntax:
var sevenItems = new byte[] { 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20 };
For larger arrays use a standard for loop. This is a readable and efficient way to do it:
var sevenThousandItems = new byte[7000];
for (int i = 0; i < sevenThousandItems.Length; i++)
{
sevenThousandItems[i] = 0x20;
}
Of course, if you need to do this a lot then you could create a helper method to help keep your code concise:
byte[] sevenItems = CreateSpecialByteArray(7);
byte[] sevenThousandItems = CreateSpecialByteArray(7000);
// ...
public static byte[] CreateSpecialByteArray(int length)
{
var arr = new byte[length];
for (int i = 0; i < arr.Length; i++)
{
arr[i] = 0x20;
}
return arr;
}
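On newer runtimes (.NET Core 2.0 and later, as far as I know, but not the classic .NET Framework) the framework has built-in fill methods that make such a helper shorter. A minimal sketch follows; the names `SpaceFilledArrays`, `Create` and `FillInPlace` are illustrative assumptions.

```csharp
using System;

public static class SpaceFilledArrays
{
    // Creates a byte array of the given length filled with `value`
    // (0x20, an ASCII space, by default). Array.Fill needs .NET Core 2.0 or later.
    public static byte[] Create(int length, byte value = 0x20)
    {
        byte[] arr = new byte[length];
        Array.Fill(arr, value);
        return arr;
    }

    // Fills an existing array in place, e.g. a field such as UserCode.
    public static void FillInPlace(byte[] target, byte value = 0x20)
    {
        target.AsSpan().Fill(value);
    }
}
```

In a constructor this could read `UserCode = SpaceFilledArrays.Create(6);`, which is about as close to a one-line memset as managed code gets without P/Invoke.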
Use this to create the array in the first place:
byte[] array = Enumerable.Repeat((byte)0x20, <number of elements>).ToArray();
Replace <number of elements> with the desired array size.
You can use Enumerable.Repeat()
Enumerable.Repeat generates a sequence that contains one repeated value.
Array of 100 items initialized to 0x20:
byte[] arr1 = Enumerable.Repeat((byte)0x20,100).ToArray();
var array = Encoding.ASCII.GetBytes(new string(' ', 100));
If you need to initialise a small array you can use:
byte[] smallArray = new byte[] { 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20 };
If you have a larger array, then you could use:
byte[] bitBiggerArray = Enumerable.Repeat((byte)0x20, 7000).ToArray();
Which is simple, and easy for the next guy/girl to read. And will be fast enough 99.9% of the time.
(Normally will be the BestOption™)
However, if you really, really need super speed, calling out to the optimized memset function using P/Invoke is for you (note that msvcrt.dll exists only on Windows):
(Here wrapped up in a nice-to-use class)
using System;
using System.Runtime.InteropServices;

public static class Superfast
{
    // memset from the Microsoft C runtime; its count parameter is a size_t
    [DllImport("msvcrt.dll",
        EntryPoint = "memset",
        CallingConvention = CallingConvention.Cdecl,
        SetLastError = false)]
    private static extern IntPtr MemSet(IntPtr dest, int c, UIntPtr count);

    public static byte[] InitByteArray(byte fillWith, int size)
    {
        byte[] arrayBytes = new byte[size];
        // Pin the array so the GC cannot move it during the native call
        GCHandle gch = GCHandle.Alloc(arrayBytes, GCHandleType.Pinned);
        try
        {
            MemSet(gch.AddrOfPinnedObject(), fillWith, (UIntPtr)(uint)arrayBytes.Length);
        }
        finally
        {
            // Always release the handle, even if the native call fails
            gch.Free();
        }
        return arrayBytes;
    }
}
Usage:
byte[] oneofManyBigArrays = Superfast.InitByteArray(0x20,700000);
Maybe these could be helpful?
What is the equivalent of memset in C#?
http://techmikael.blogspot.com/2009/12/filling-array-with-default-value.html
The people before me gave you your answer. I just want to point out your misuse of the foreach loop. Since you have to maintain an index anyway, a standard for loop is more compact and at least as efficient (for arrays the compiler already turns foreach into an index-based loop, so don't expect a large speed difference):
for (int index = 0; index < UserCode.Length; ++index)
{
UserCode[index] = 0x20;
}
This is a version of the code from the post marked as the answer that was faster in my tests.
In the benchmarks that I have performed, a simple for loop that only contains something like an array fill was typically about twice as fast when decrementing as when incrementing. Results like this depend heavily on the runtime and JIT version, so measure on your own target.
Also, the length is already passed in as the parameter, so it doesn't need to be retrieved from the array's Length property. It can be assigned to a local variable once.
Loop bounds that involve a property accessor may be re-evaluated before each iteration of the loop, although for arrays the JIT usually recognizes the i < array.Length pattern and optimizes it.
public static byte[] CreateSpecialByteArray(int length)
{
byte[] array = new byte[length];
int len = length - 1;
for (int i = len; i >= 0; i--)
{
array[i] = 0x20;
}
return array;
}
Just to expand on my answer a neater way of doing this multiple times would probably be:
PopulateByteArray(UserCode, 0x20);
which calls:
public static void PopulateByteArray(byte[] byteArray, byte value)
{
for (int i = 0; i < byteArray.Length; i++)
{
byteArray[i] = value;
}
}
This has the advantage of a nice efficient for loop (a nod to gwiazdorrr's answer) as well as a nice, neat-looking call if it is used a lot. And it is a lot more readable at a glance than the enumeration one, I personally think. :)
The fastest way to do this is to use the api:
bR = 0xFF;
RtlFillMemory(pBuffer, nFileLen, bR);
This takes a pointer to a buffer, the length to write, and the fill byte. I think the fastest way to do it in managed code (much slower) is to create a small block of initialized bytes, then use Buffer.BlockCopy to write them to the byte array in a loop. I threw this together and haven't tested it, but you get the idea:
long size = GetFileSize(FileName);
const int blocksize = 1024;
// template block, pre-filled with the fill value
byte[] ntemp = new byte[blocksize];
// destination array
byte[] nbyte = new byte[size];
// init template block
for (int i = 0; i < blocksize; i++)
    ntemp[i] = 0xff;
// number of whole blocks and leftover bytes
int blocks = (int)(size / blocksize);
int remainder = (int)(size % blocksize);
int count = 0;
// copy whole blocks; a while loop also copes with size < blocksize
while (count < blocks)
{
    Buffer.BlockCopy(ntemp, 0, nbyte, blocksize * count, blocksize);
    count++;
}
// copy remaining bytes
Buffer.BlockCopy(ntemp, 0, nbyte, blocksize * count, remainder);
This function is way faster than a for loop for filling an array.
The Array.Copy command is a very fast memory copy function. This function takes advantage of that by repeatedly calling the Array.Copy command and doubling the size of what we copy until the array is full.
I discuss this on my blog at https://grax32.com/2013/06/fast-array-fill-function-revisited.html (Link updated 12/16/2019). Also see Nuget package that provides this extension method. http://sites.grax32.com/ArrayExtensions/
Note that this would be easy to make into an extension method by just adding the word "this" to the method declarations i.e. public static void ArrayFill<T>(this T[] arrayToFill ...
public static void ArrayFill<T>(T[] arrayToFill, T fillValue)
{
// if called with a single value, wrap the value in an array and call the main function
ArrayFill(arrayToFill, new T[] { fillValue });
}
public static void ArrayFill<T>(T[] arrayToFill, T[] fillValue)
{
if (fillValue.Length >= arrayToFill.Length)
{
throw new ArgumentException("fillValue array length must be smaller than length of arrayToFill");
}
// set the initial array value
Array.Copy(fillValue, arrayToFill, fillValue.Length);
int arrayToFillHalfLength = arrayToFill.Length / 2;
for (int i = fillValue.Length; i < arrayToFill.Length; i *= 2)
{
int copyLength = i;
if (i > arrayToFillHalfLength)
{
copyLength = arrayToFill.Length - i;
}
Array.Copy(arrayToFill, 0, arrayToFill, i, copyLength);
}
}
You can use an array initializer:
UserCode = new byte[]{0x20,0x20,0x20,0x20,0x20,0x20};
This will work better than Repeat if the values are not identical.
For very large arrays you could try to speed up the initialization by using the Parallel class (.NET 4 and newer), though the per-element delegate call adds overhead, so measure before relying on it:
public static void PopulateByteArray(byte[] byteArray, byte value)
{
Parallel.For(0, byteArray.Length, i => byteArray[i] = value);
}
Of course you can create the array at the same time:
public static byte[] CreateSpecialByteArray(int length, byte value)
{
var byteArray = new byte[length];
Parallel.For(0, length, i => byteArray[i] = value);
return byteArray;
}
