Remove First 16 Bytes? - c#

How would I go about removing a number of bytes from a byte array?

EDIT: As nobugz's comment (and Reed Copsey's answer) mentions, if you don't actually need the result as a byte array, you should look into using ArraySegment<T>:
ArraySegment<byte> segment = new ArraySegment<byte>(full, 16, full.Length - 16);
Otherwise, copying will be necessary - arrays always have a fixed size, so you can't "remove" the first 16 bytes from the existing array. Instead, you'll have to create a new, smaller array and copy the relevant data into it.
Zach's suggestion is along the right lines for the non-LINQ approach, but it can be made simpler (this assumes you already know the original array is at least 16 bytes long):
byte[] newArray = new byte[oldArray.Length - 16];
Buffer.BlockCopy(oldArray, 16, newArray, 0, newArray.Length);
or
byte[] newArray = new byte[oldArray.Length - 16];
Array.Copy(oldArray, 16, newArray, 0, newArray.Length);
I suspect Buffer.BlockCopy will be slightly faster, but I don't know for sure.
Note that both of these could be significantly more efficient than the LINQ approach if the arrays involved are big: the LINQ approach requires each byte to be individually returned from an iterator, and potentially intermediate copies to be made (in the same way as adding items to a List<T> needs to grow the backing array periodically). Obviously don't micro-optimise, but it's worth checking if this bit of code is a performance bottleneck.
EDIT: I ran a very "quick and dirty" benchmark of the three approaches. I don't trust the benchmark to distinguish between Buffer.BlockCopy and Array.Copy - they were pretty close - but the LINQ approach was over 100 times slower.
On my laptop, using byte arrays of 10,000 elements, it took nearly 10 seconds to perform 40,000 copies using LINQ; the copying approaches above took about 80ms to do the same amount of work. I upped the iteration count to 4,000,000 for the copying approaches and they still only took about 7 seconds. Obviously the normal caveats around micro-benchmarks apply, but this is a pretty significant difference.
Definitely use the above approach if this is in a code path which is important to performance :)
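For reference, a quick-and-dirty benchmark along these lines might look something like the sketch below. It is only a sketch: the harness shape, the CopyBenchmark class name, the array size and the iteration count are illustrative, not the exact code behind the numbers quoted above.
using System;
using System.Diagnostics;
using System.Linq;

class CopyBenchmark
{
    static void Main()
    {
        byte[] source = new byte[10000];
        new Random(42).NextBytes(source);
        const int iterations = 40000;

        // Buffer.BlockCopy
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            byte[] dest = new byte[source.Length - 16];
            Buffer.BlockCopy(source, 16, dest, 0, dest.Length);
        }
        Console.WriteLine("Buffer.BlockCopy:  {0}ms", sw.ElapsedMilliseconds);

        // Array.Copy
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            byte[] dest = new byte[source.Length - 16];
            Array.Copy(source, 16, dest, 0, dest.Length);
        }
        Console.WriteLine("Array.Copy:        {0}ms", sw.ElapsedMilliseconds);

        // LINQ Skip/ToArray
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            byte[] dest = source.Skip(16).ToArray();
        }
        Console.WriteLine("LINQ Skip/ToArray: {0}ms", sw.ElapsedMilliseconds);
    }
}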

You could do this:
using System.Linq;
// ...
var newArray = oldArray.Skip(numBytes).ToArray();

I will also mention that, depending on how you plan to use the results, an alternative approach is often to use ArraySegment<T> to just access the remaining portion of the array. This avoids the need to copy the array, which can be more efficient in some usage scenarios:
ArraySegment<byte> segment = new ArraySegment<byte>(originalArray, 16, originalArray.Length-16);
// Use segment how you'd use your array...
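As a rough sketch of consuming such a segment (the Offset/Count bookkeeping is the part that's easy to get wrong; on .NET 4.5 and later ArraySegment<T> also implements IList<T>, so it can be enumerated directly):
ArraySegment<byte> segment = new ArraySegment<byte>(originalArray, 16, originalArray.Length - 16);

// The segment is just a view over the original array - no copy is made.
for (int i = 0; i < segment.Count; i++)
{
    byte b = segment.Array[segment.Offset + i];
    // ... process b ...
}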

If you can't use Linq, you could do it this way:
byte[] myArray = // however you acquire the array
byte[] newArray = new byte[myArray.Length - 16];
for (int i = 0; i < newArray.Length; i++)
{
    newArray[i] = myArray[i + 16];
}
// newArray is now myArray minus the first 16 bytes
You'll also need to handle the case where the array is less than 16 bytes long.

Related

Most efficient way to store and retrieve a 512-bit number?

I have a string of 512 characters that contains only '0' and '1'. I'm trying to represent it in a data structure that saves space. Is BitArray the most efficient way?
I'm also thinking about using 16 Int32s to store the number, which would then be 16 * 4 = 64 bytes.
Most efficient can mean many different things...
1. Most efficient from a memory management perspective?
2. Most efficient from a CPU calculation perspective?
3. Most efficient from a usage perspective? (With respect to writing code that uses the numbers for calculations)
For 1, use byte[64] or long[8] - if you aren't doing calculations, or don't mind writing your own.
For 3, BigInteger is definitely the way to go. Your math functions are already defined and you just need to turn your binary string into the number it represents.
EDIT: It sounds like you don't want BigInteger due to size concerns... however, I think you'll find that you'll have to parse this with an enumerable/yield combination anyway, processing it a piece at a time rather than holding the entire data structure in memory at once.
That being said... I can help you somewhat with parsing your string into an array of UInt64s... Thanks to King King for part of this LINQ statement.
// Convert the string into an array of UInt64s.
// Note that the MSB ends up in result[0].
var result = input.Select((x, i) => i)
                  .Where(i => i % 64 == 0)
                  .Select(i => input.Substring(i, input.Length - i >= 64
                                                    ? 64
                                                    : input.Length - i))
                  .Select(x => Convert.ToUInt64(x, 2))
                  .ToArray();
If you decide you want a different array structure byte[64] or whatever it should be easy to modify.
EDIT 2: OK I got bored so I wrote an EditDifference function for fun... here you go...
static public int GetEditDistance(ulong[] first, ulong[] second)
{
    int editDifference = 0;
    var smallestArraySize = Math.Min(first.Length, second.Length);
    for (var i = 0; i < smallestArraySize; i++)
    {
        long signedDifference;
        var f = first[i];
        var s = second[i];
        var biggest = Math.Max(f, s);
        var smallest = Math.Min(f, s);
        var difference = biggest - smallest;
        if (difference > (ulong)long.MaxValue)
        {
            // The top bit differs: count it, then bring the rest into signed range.
            editDifference += 1;
            signedDifference = Convert.ToInt64(difference - (ulong)long.MaxValue - 1);
        }
        else
        {
            signedDifference = Convert.ToInt64(difference);
        }
        // Count the remaining 1 bits via the base-2 string representation.
        editDifference += Convert.ToString(signedDifference, 2)
                                 .Count(x => x == '1');
    }
    // If the arrays are different sizes, every extra bit is considered to be different.
    var differenceOfArraySize =
        Math.Max(first.Length, second.Length) - smallestArraySize;
    if (differenceOfArraySize > 0)
        editDifference += differenceOfArraySize * 64;
    return editDifference;
}
Use BigInteger from .NET. It can easily support 512-bit numbers as well as operations on those numbers.
BigInteger.Parse("your huge number");
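One wrinkle: BigInteger.Parse expects a decimal string (or hex via NumberStyles.HexNumber), not base 2, so for the 512-character 0/1 string in the question you would build the value yourself. A minimal sketch, assuming the first character is the most significant bit (the FromBinaryString name is just for illustration):
using System.Numerics; // requires a reference to System.Numerics (.NET 4.0+)

static BigInteger FromBinaryString(string bits)
{
    BigInteger value = BigInteger.Zero;
    foreach (char c in bits)
    {
        value <<= 1;        // make room for the next bit
        if (c == '1')
            value += 1;     // note: characters other than '0'/'1' are not validated here
    }
    return value;
}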
BitArray (with 512 bits), byte[64], int[16], long[8] (or List<> variants of those), or BigInteger will all be much more efficient than your String. I'd say that byte[] is the most idiomatic/typical way of representing data such as this, in general. For example, ComputeHash uses byte[] and Streams deal with byte[]s, and if you store this data as a BLOB in a DB, byte[] will be the most natural way to work with that data. For that reason, it'd probably make sense to use this.
On the other hand, if this data represents a number that you might do numeric things to like addition and subtraction, you probably want to use a BigInteger.
These approaches have roughly the same performance as each other, so you should choose between them based primarily on things like what makes sense, and secondarily on performance benchmarked in your usage.
The most efficient would be having eight UInt64/ulong or Int64/long typed variables (or a single array), although this might not be optimal for querying/setting. One way to get around this is, indeed, to use a BitArray (which is basically a wrapper around the former method, including additional overhead [1]). It's a matter of choice either for easy use or efficient storage.
If this isn't sufficient, you can always choose to apply compression, such as RLE-encoding or various other widely available encoding methods (gzip/bzip/etc...). This will require additional processing power though.
It depends on your definition of efficient.
[1] Additional overhead, as in storage overhead. BitArray internally uses an Int32 array to store values. In addition to that, BitArray stores its current mutation version, the number of ints 'allocated', and a sync root. Even though the overhead is negligible for a small number of values, it can become an issue if you keep a lot of these in memory.
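If you do go the BitArray route despite the overhead described in [1], building it from the 0/1 string is straightforward; a small sketch (here bit 0 of the BitArray corresponds to the first character of the string):
using System.Collections;
using System.Linq;

string s = "010110..."; // the 512-character string of '0'/'1' characters
BitArray bits = new BitArray(s.Select(c => c == '1').ToArray());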

Fast byte array masking in C#

I have a struct with some properties (like int A1, int A2, ...). I store a list of these structs as binary data in a file.
Now I'm reading the bytes from the file into Buffer using a BinaryReader, and I want to apply a filter based on the struct's properties (like .A1 = 100 & .A2 = 12).
Performance is very important in my scenario, so I convert the filter criteria to a byte array (Filter) and then I want to mask Buffer with Filter. If the result of the masking is equal to Filter, the Buffer will be converted to the struct.
The question: what is the fastest way to mask and compare two byte arrays?
Update: the Buffer size is more than 256 bytes. I'm wondering if there is a better way than iterating over each byte of Buffer and Filter.
The way I would usually approach this is with unsafe code. You can use the fixed keyword to get a byte[] as a long*, which you can then iterate in 1/8th of the iterations - but using the same bit operations. You will typically have a few bytes left over (from it not being an exact multiple of 8 bytes) - just clean those up manually afterwards.
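A sketch of that unsafe approach, assuming the buffer and the filter are the same length (the MatchesFilter name is made up for illustration; compile with /unsafe):
static unsafe bool MatchesFilter(byte[] buffer, byte[] filter)
{
    int longCount = buffer.Length / 8;
    fixed (byte* pBuf = buffer, pFil = filter)
    {
        long* b = (long*)pBuf;
        long* f = (long*)pFil;
        for (int i = 0; i < longCount; i++)
        {
            // Mask 8 bytes at a time; the masked value must equal the filter.
            if ((b[i] & f[i]) != f[i])
                return false;
        }
    }
    // Clean up the leftover bytes (when the length isn't a multiple of 8).
    for (int i = longCount * 8; i < buffer.Length; i++)
    {
        if ((buffer[i] & filter[i]) != filter[i])
            return false;
    }
    return true;
}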
Try a simple loop with System.BitConverter.ToInt64(). Something like this:
byte[] arr1; // the buffer read from the file
byte[] arr2; // the filter - assumed here to be the same length, a multiple of 8
for (int i = 0; i < arr1.Length; i += 8)
{
    var P1 = System.BitConverter.ToInt64(arr1, i);
    var P2 = System.BitConverter.ToInt64(arr2, i);
    if ((P1 & P2) != P1) // or whatever comparison you need
        break;           // bail out of the loop if you need to
}
My assumption is that comparing/masking two Int64s will be much faster (especially on 64-bit machines) than masking one byte at a time.
Once you've got the two arrays - one from reading the file and one from the filter - all you then need is a fast comparison of the arrays. Check out the following postings, which use unsafe or P/Invoke methods:
What is the fastest way to compare two byte arrays?
Comparing two byte arrays in .NET
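For the equality part, one trick that comes up in answers to questions like those is P/Invoking memcmp from msvcrt.dll. It is Windows-specific and shown here only as a sketch (the ByteArrayCompare/AreEqual names are made up) - benchmark it against a plain loop for your data sizes:
using System;
using System.Runtime.InteropServices;

static class ByteArrayCompare
{
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);

    public static bool AreEqual(byte[] a, byte[] b)
    {
        return a.Length == b.Length
            && memcmp(a, b, new UIntPtr((uint)a.Length)) == 0;
    }
}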

how to drop part of array without allocating new memory?

I have a byte array. I need to drop the first 4 bytes, like this:
byte[] newArray = new byte[byteArray.Length - 4];
Buffer.BlockCopy(byteArray, 4, newArray, 0, byteArray.Length - 4);
But can I just move the pointer, C/C++ style, like this?
byte[] byteMsg = byteArray + 4;
I do not want to allocate extra memory unless absolutely required, because this code is executed pretty often.
upd: I receive the data from a Socket, so I should probably just use another overload of Receive: count = s.Receive(byteArray);
No, you can't do that. A .NET array is always of a fixed size, and you can't do pointer arithmetic on it outside unsafe code.
Try using ArraySegment instead
I wouldn't worry: the GC will take care of cleaning up the memory that you're no longer using, provided it is no longer referenced.
Arrays in C# are fixed in size. You can't resize them, so if you need to drop the first 4 bytes then you're going to have to reallocate. As thecoop suggests, I'd take a look at ArraySegment and use that to pass around to other functions, if those first 4 bytes are not important to you.
It's also worth noting that yes, in C++, we'd use a bit of pointer arithmetic, but definitely keep hold of the original pointer, lest we end up deallocating incorrectly and losing 4 bytes to the demons :)
Just leave the byte array untouched and use a MemoryStream and its offset capability. This won't change your array, and you get the ability to skip the first n bytes.
var memoryStream = new MemoryStream(byteArray);
memoryStream.Position = 4; // skip the first 4 bytes
// do whatever you want with the memory stream

Converting int[] to byte: How to look at int[] as it was byte[]?

To explain: I have an array of ints as input. I need to convert it to an array of bytes, where 1 int = 4 bytes (big endian). In C++, I can easily just cast it and then access it as if it were a byte array, without copying the data - just direct access. Is this possible in C#? And in C# 2.0?
Yes, using unsafe code:
int[] arr = ...;
fixed (int* ptr = arr)
{
    byte* ptr2 = (byte*)ptr;
    // now access ptr2[n]
}
If the compiler complains, add a (void*):
byte* ptr2 = (byte*)(void*)ptr;
You can create a byte[] 4 times the length of your int[].
Then you iterate through your integer array and get the 4 bytes for each element from:
BitConverter.GetBytes(int32);
Next, you copy the 4 bytes returned by this call to the correct offset (i * 4) using Buffer.BlockCopy.
BitConverter
Buffer.BlockCopy
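A sketch of that copy-based approach; the ToBigEndianBytes name is made up, and the Array.Reverse step is only there because the question asks for big-endian output while BitConverter.GetBytes produces native-endian bytes:
using System;

static byte[] ToBigEndianBytes(int[] values)
{
    byte[] result = new byte[values.Length * sizeof(int)];
    for (int i = 0; i < values.Length; i++)
    {
        byte[] chunk = BitConverter.GetBytes(values[i]); // native endianness
        if (BitConverter.IsLittleEndian)
            Array.Reverse(chunk);                        // flip each 4-byte group to big-endian
        Buffer.BlockCopy(chunk, 0, result, i * sizeof(int), sizeof(int));
    }
    return result;
}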
Have a look at the BitConverter class. You could iterate through the array of int, and call BitConverter.GetBytes(Int32) to get a byte[4] for each one.
If you write unsafe code, you can fix the array in memory, get a pointer to its beginning, and cast that pointer.
unsafe
{
    fixed (int* pi = arr)
    {
        byte* pb = (byte*)pi;
        ...
    }
}
An array in .NET is prefixed with the number of elements, so you can't safely convert between an int[] and a byte[] that point to the same data. You can cast between uint[] and int[] (at least as far as .NET is concerned; the support for this feature in C# itself is a bit inconsistent).
There is also a union based trick to reinterpret cast references, but I strongly recommend not using it.
The usual way to get individual integers from a byte array in native-endian order is BitConverter, but it's relatively slow. Manual code is often faster. And of course it doesn't support the reverse conversion.
One way to manually convert assuming little-endian (managed about 400 million reads per second on my 2.6GHz i3):
byte GetByte(int[] arr, int index)
{
    uint elem = (uint)arr[index >> 2];
    return (byte)(elem >> ((index & 3) * 8));
}
I recommend manually writing code that uses bitshifting to access individual bytes if you want to go with managed code, and pointers if you want the last bit of performance.
You also need to be careful about endianness issues. Some of these methods only support native endianness.
The simplest way in type-safe managed code is to use:
byte[] result = new byte[intArray.Length * sizeof(int)];
Buffer.BlockCopy(intArray, 0, result, 0, result.Length);
That doesn't quite do what I think your question asked, since on little endian architectures (like x86 or ARM), the result array will end up being little endian, but I'm pretty sure the same is true for C++ as well.
If you can use unsafe{}, you have other options:
unsafe
{
    fixed (int* intPtr = intArray)
    {
        byte* result = (byte*)intPtr;
        // Do stuff with result.
    }
}

What is a high-performance way of multiplying/adding a big array of numbers by a constant in c#?

I have a structure (class) that keeps a very large quantity of numbers (either float, double, int, or byte) in an array.
Now I want a very high performance approach for applying some primitive operations (add/subtract/divide/multiply by a constant) to this array.
This array is in a contiguous piece of memory, so for example for copying it I am using Buffer.BlockCopy.
But what about adding a constant or multiplying with a constant?
The first choice is to walk through the array using pointers. What other approaches do you suggest for this?
Using pointers (unsafe) is not guaranteed to be more performant.
Why don't you start with a normal for (int index = 0; index < data.Length; index++) loop and see if it meets your requirements?
And of course, the next step would be processing in parallel:
Parallel.For(0, data.Length, i => data[i] *= myFactor);
Pointers won't help you much. Two approaches that could/should be combined:
Process multiple numbers in parallel using some SIMD approach, SSE being one instance of it (see the sketch below)
Process different array chunks in different threads ("multithreading"); this is most worthwhile on machines with more than one CPU core
If you want to reduce ("reduction") the result, e.g. say you want to build the sum of all elements, you could also divide the chunks recursively, and build sums of sums (of sums (of sums (you get the point))).
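For the SIMD suggestion, one managed option (rather than hand-written SSE intrinsics) is System.Numerics.Vector<T>, which the JIT compiles down to SSE/AVX where the hardware and runtime support it. A rough sketch for multiplying a float array by a constant, with a scalar loop for the leftover tail (the MultiplyInPlace name is just for illustration):
using System.Numerics; // Vector<T>: System.Numerics.Vectors package / .NET 4.6+

static void MultiplyInPlace(float[] data, float factor)
{
    int width = Vector<float>.Count;          // number of lanes, e.g. 4 or 8 depending on hardware
    int i = 0;
    for (; i <= data.Length - width; i += width)
    {
        var v = new Vector<float>(data, i);   // load one block of lanes
        (v * factor).CopyTo(data, i);         // multiply and store back in place
    }
    for (; i < data.Length; i++)              // scalar tail for the remaining elements
        data[i] *= factor;
}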
This multiplies every element by 5 and writes the result back into the same array.
var someArray = new int[] { 1, 2, 3, 4, 5, 6, 7 };
int i = 0;
Array.ForEach(someArray, x => { someArray[i++] = x * 5; });
I don't think you will find a general solution faster than just walking through the array. Most processors will prefetch items in your array; thus as long as your manipulations are small, you'll obtain optimum performance by just accessing each item consecutively.
Keep the work per item as straightforward as possible, and minimize assignments and fetches inside the loop. Keep as much work as you can outside the loop.
