What is the fastest (possibly unsafe) way to read a byte[]?


I'm working on a server project in C#, and after a TCP message is received, it is parsed, and stored in a byte[] of exact size. (Not a buffer of fixed length, but a byte[] of an absolute length in which all data is stored.)
Now for reading this byte[] I'll be creating some wrapper functions (also for compatibility), these are the signatures of all functions I need:
public byte ReadByte();
public sbyte ReadSByte();
public short ReadShort();
public ushort ReadUShort();
public int ReadInt();
public uint ReadUInt();
public float ReadFloat();
public double ReadDouble();
public string ReadChars(int length);
public string ReadString();
The string is a \0 terminated string, and is probably encoded in ASCII or UTF-8, but I cannot tell that for sure, since I'm not writing the client.
The data consists of:
byte[] _data;
int _offset;
Now I can write all those functions manually, like this:
public byte ReadByte()
{
return _data[_offset++];
}
public sbyte ReadSByte()
{
byte r = _data[_offset++];
if (r >= 128) return (sbyte)(r - 256);
else return (sbyte)r;
}
public short ReadShort()
{
byte b1 = _data[_offset++];
byte b2 = _data[_offset++];
if (b1 >= 128) return (short)(b1 * 256 + b2 - 65536);
else return (short)(b1 * 256 + b2);
}
public ushort ReadUShort()
{
byte b1 = _data[_offset++];
return (ushort)(b1 * 256 + _data[_offset++]);
}
But I wonder if there's a faster way, not excluding the use of unsafe code, since this seems to cost too much time for simple processing.

Check out the BitConverter class, in the System namespace. It contains methods for turning parts of byte[]s into other primitive types. I've used it in similar situations and have found it suitably quick.
As for decoding strings from byte[]s, use the classes that derive from the Encoding class, in the System.Text namespace, specifically the GetString(byte[]) method (and its overloads).
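A minimal sketch of how the two suggestions above could be combined. The class and member names (PacketReader and so on) are illustrative and not from the question, and UTF-8 is an assumption. Note that BitConverter uses the machine's byte order, while the manual code in the question combines the most significant byte first; for big-endian data on a little-endian machine, System.Buffers.Binary.BinaryPrimitives (for example ReadInt32BigEndian) may be a better fit.

```csharp
using System;
using System.Text;

// Hypothetical wrapper; the names are illustrative only.
public sealed class PacketReader
{
    private readonly byte[] _data;
    private int _offset;

    public PacketReader(byte[] data) { _data = data; }

    public int ReadInt()
    {
        // BitConverter interprets the bytes in the machine's byte order
        // (see BitConverter.IsLittleEndian).
        int value = BitConverter.ToInt32(_data, _offset);
        _offset += sizeof(int);
        return value;
    }

    public double ReadDouble()
    {
        double value = BitConverter.ToDouble(_data, _offset);
        _offset += sizeof(double);
        return value;
    }

    // Reads a \0-terminated string, assuming UTF-8 (ASCII is a subset of it).
    public string ReadString()
    {
        int end = Array.IndexOf(_data, (byte)0, _offset);
        if (end < 0) end = _data.Length;            // no terminator: read to the end
        string s = Encoding.UTF8.GetString(_data, _offset, end - _offset);
        _offset = Math.Min(end + 1, _data.Length);  // skip the terminator
        return s;
    }
}
```

Usage would look like new PacketReader(bytes).ReadInt(), followed by further Read calls in the order the fields appear in the message.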

One way is to map the contents of the array to a struct (providing your structure is indeed static):
http://geekswithblogs.net/taylorrich/archive/2006/08/21/88665.aspx
using System;
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Sequential, Pack=1)]
struct Message
{
public int id;
[MarshalAs (UnmanagedType.ByValTStr, SizeConst=50)]
public string text;
}
void OnPacket(byte[] packet)
{
GCHandle pinnedPacket = GCHandle.Alloc(packet, GCHandleType.Pinned);
try
{
Message msg = (Message)Marshal.PtrToStructure(
pinnedPacket.AddrOfPinnedObject(),
typeof(Message));
}
finally
{
// Always release the handle, even if marshalling throws.
pinnedPacket.Free();
}
}

You could use a BinaryReader.
A BinaryReader is a stream decorator so you would have to wrap the byte[] in a MemoryStream or attach the Reader directly to the network stream.
And then you have
int ReadInt32()
char[] ReadChars(int count)
etc.
Edit: Apparently you want 'faster execution'.
That means you are looking for an optimization in the conversion(s) from byte[], after those bytes have been received over (network) I/O.
In other words, you are trying to optimize the part that only takes up (an estimated) 0.1% of the time. Totally futile.
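For reference, a small sketch of the BinaryReader approach over an in-memory packet. The message layout (an int32 followed by a run of ASCII characters) and the names are assumptions made for illustration. BinaryReader always reads multi-byte values as little-endian.

```csharp
using System;
using System.IO;
using System.Text;

public static class BinaryReaderExample
{
    // Parses a hypothetical layout: int32 id, then 'length' single-byte chars.
    public static (int Id, string Text) Parse(byte[] packet, int length)
    {
        using var stream = new MemoryStream(packet, writable: false);
        using var reader = new BinaryReader(stream, Encoding.ASCII);
        int id = reader.ReadInt32();              // little-endian
        char[] chars = reader.ReadChars(length);  // decoded with the given encoding
        return (id, new string(chars));
    }
}
```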

Related

IndexOf char within an ReadOnlySpan<byte> of UTF8 bytes

I'm looking for an efficient, allocation-free (!) implementation of
public static int IndexOf(this ReadOnlySpan<byte> utf8Bytes, char @char)
{
// Should return the index of the first byte of @char within utf8Bytes
// (not the character index of @char within the string)
}
I've not found a way to iterate through the span char by char yet. Utf8Parser does not have an overload supporting single characters.
And System.Text.Encoding seems to work mostly on the entire span, and does allocate internally while doing so.
Is there any builtin functionality I haven't spotted yet? If not, can anyone think of a reasonable custom implementation?

Rather than trying to iterate through the utf8Bytes character by character, it may be easier to convert the character to a short stackalloc'ed utf8 byte sequence, and search for that:
public static class StringExtensions
{
const int MaxBytes = 4;
public static int IndexOf(this ReadOnlySpan<byte> utf8Bytes, char @char)
{
Rune rune;
try
{
rune = new Rune(@char);
}
catch (ArgumentOutOfRangeException)
{
// Malformed unicode character, return -1 or throw?
return -1;
}
return utf8Bytes.IndexOf(rune);
}
public static int IndexOf(this ReadOnlySpan<byte> utf8Bytes, Rune @char)
{
Span<byte> charBytes = stackalloc byte[MaxBytes];
var n = @char.EncodeToUtf8(charBytes);
charBytes = charBytes.Slice(0, n);
for (int i = 0, thisLength = 1; i <= utf8Bytes.Length - charBytes.Length; i += thisLength)
{
thisLength = Utf8ByteSequenceLength(utf8Bytes[i]);
if (thisLength == charBytes.Length && charBytes.CommonPrefixLength(utf8Bytes.Slice(i)) == charBytes.Length)
return i;
}
return -1;
}
static int Utf8ByteSequenceLength(byte firstByte)
{
//https://en.wikipedia.org/wiki/UTF-8#Encoding
if ( (firstByte & 0b11111000) == 0b11110000) // 11110xxx
return 4;
else if ((firstByte & 0b11110000) == 0b11100000) // 1110xxxx
return 3;
else if ((firstByte & 0b11100000) == 0b11000000) // 110xxxxx
return 2;
return 1; // Either a 1-byte sequence (matching 0xxxxxxx) or an invalid start byte.
}
}
Notes:
Rune is a struct introduced in .NET Core 3.x that represents a Unicode scalar value. If you need to search your utf8Bytes for a Unicode codepoint that is not in the basic multilingual plane, you will need to use Rune.
Rune has the added advantage that its method Rune.TryEncodeToUtf8() is lightweight and allocation-free.
If char @char is an invalid Unicode character, the .NET encoding algorithms will throw an exception if you attempt to construct a Rune from it. The above code catches the exception and returns -1. You may wish to rethrow the exception.
As an alternative, Rune.DecodeFromUtf8(ReadOnlySpan<Byte>, Rune, Int32) can be used to iterate through a utf8 byte span Rune by Rune. You could use that to locate an incoming Rune by index. However, I suspect doing so would be less efficient than the method above.
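The alternative mentioned in the last note, walking the span Rune by Rune with Rune.DecodeFromUtf8, could be sketched roughly as follows. The class and method names are made up for illustration, and this is not claimed to be faster than the approach above.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class Utf8Search
{
    // Returns the byte index of the first occurrence of 'target', or -1.
    public static int IndexOfRune(ReadOnlySpan<byte> utf8Bytes, Rune target)
    {
        int offset = 0;
        while (offset < utf8Bytes.Length)
        {
            // On invalid or truncated data the status is not Done, but
            // 'consumed' still says how many bytes to skip, so the loop advances.
            OperationStatus status = Rune.DecodeFromUtf8(
                utf8Bytes.Slice(offset), out Rune rune, out int consumed);
            if (status == OperationStatus.Done && rune == target)
                return offset;
            offset += consumed;
        }
        return -1;
    }
}
```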

You can avoid heap allocations with stackalloc. A first approximation could look like this:
static (int Found, int Processed) IndexOf(ReadOnlySpan<byte> utf8Bytes, char @char)
{
Span<char> chars = stackalloc char[utf8Bytes.Length]; // "worst" case every byte is a separate char
var proc = Encoding.UTF8.GetChars(utf8Bytes, chars);
var indexOf = chars.Slice(0, proc).IndexOf(@char); // search only the chars that were actually decoded
if (indexOf > 0)
{
Span<byte> bytes = stackalloc byte[indexOf * 4];
var result = Encoding.UTF8.GetBytes(chars.Slice(0, indexOf), bytes);
return (result, proc);
}
return (indexOf, proc);
}
A few notes:
Large incoming spans can overflow the stack, because the stackalloc'ed buffer grows with the input.
Decoding the whole span is not optimal when the character appears early.
The span can contain partial code points at its start and end, so the Processed value should be handled accordingly.
The first two points can be mitigated by processing the incoming span in smaller slices (for example, decoding 4 bytes into a 4-char span at a time).
I believe System.IO.Pipelines handles the same issues (via System.Buffers), though (1) it may not be completely allocation-free, and (2) I have not investigated it enough to provide a completely working example.

From .NET 5 onwards, there's a library method EncodingExtensions.GetChars to help you.
Specifically, you want the overload that gets the byte data from a ReadOnlySpan and writes to an IBufferWriter<char>, which you can then implement to receive your characters one by one and run whatever on them (your matching algorithm, for example). This solution is allocation-free of course, as long as you put your custom buffer writer in a static field and allocate it only once.
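A rough sketch of such a buffer writer, under a few assumptions: the class name and its members are hypothetical, it reports the char index rather than the byte index the question asks for, and it reallocates its scratch buffer when the decoder asks for more room than it has, so it is only allocation-free for inputs that fit the initial buffer.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical writer that inspects decoded chars instead of storing them.
public sealed class CharSearchWriter : IBufferWriter<char>
{
    private char[] _buffer = new char[256];
    private long _charsSeen;

    public char Target { get; set; }
    public long FoundAtChar { get; private set; } = -1; // char index, not byte index

    public void Reset() { _charsSeen = 0; FoundAtChar = -1; }

    public void Advance(int count)
    {
        // The decoder has just written 'count' chars at the start of _buffer.
        if (FoundAtChar < 0)
        {
            int i = Array.IndexOf(_buffer, Target, 0, count);
            if (i >= 0) FoundAtChar = _charsSeen + i;
        }
        _charsSeen += count;
    }

    public Memory<char> GetMemory(int sizeHint = 0) => Ensure(sizeHint);
    public Span<char> GetSpan(int sizeHint = 0) => Ensure(sizeHint);

    private char[] Ensure(int sizeHint)
    {
        // The contract requires at least sizeHint elements; grow if needed.
        if (_buffer.Length < sizeHint) _buffer = new char[sizeHint];
        return _buffer;
    }
}
```

It would be driven with EncodingExtensions.GetChars(Encoding.UTF8, utf8Bytes, writer), after setting Target and calling Reset() between searches.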

Casting a managed array to an array of structs without copying

I have a managed array provided by a third party library. The type of the array does not directly reflect the data stored in the array, but instead it's the data interpreted as integers.
So int[] data = Lib.GetData(); gives me an integer array, that I would like to cast into an array of DataStructure, that could look like this.
struct DataStructure {
public int Id;
public double Value;
}
Currently I use Marshal.Copy (an implementation can be seen here), but it seems a bit excessive to copy the entire thing.
Does something like this exist:
int[] data = Lib.GetData();
DataStructure[] dataStructs = InterpretAs<DataStructure>(data);
where no copying is required, but access to dataStructs elements can be done like dataStructs[1].Id?
EDIT 2:
If I just want a single DataStructure, I can use
public static T ToStruct<T>(byte[] bytes) where T : struct
{
GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
T something = Marshal.PtrToStructure<T>(handle.AddrOfPinnedObject());
handle.Free();
return something;
}
where no copying is required.
EDIT:
The answers to the possible duplicate are currently about 7 years old, and they consist of either copying the data or implementing a hack.

There is actually a cheat, but it is an ugly, totally unsafe cheat:
[StructLayout(LayoutKind.Sequential)]
//[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct DataStructure
{
public int Id;
public double Value;
}
[StructLayout(LayoutKind.Explicit)]
public struct DataStructureConverter
{
[FieldOffset(0)]
public int[] IntArray;
[FieldOffset(0)]
public DataStructure[] DataStructureArray;
}
and then you can convert it without problems:
var myarray = new int[8];
myarray[0] = 1;
myarray[3] = 2;
//myarray[4] = 2;
DataStructure[] ds = new DataStructureConverter { IntArray = myarray }.DataStructureArray;
int i1 = ds[0].Id;
int i2 = ds[1].Id;
Note that whether you need Pack = 4 depends on the size of each DataStructure in the source data: use Pack = 4 if each element occupies 12 bytes, and the default layout if it occupies 16 bytes (see explanation (1) below).
I'll add that this technique is undocumented and totally unsafe. It even has a problem: ds.Length isn't the length of the DataStructure[] but the length of the int[] (so in the example given it is 8, not 2).
The "technique" is the same I described here and originally described here.
explanation (1)
sizeof(double) is 8 bytes, so Value is normally aligned on an 8-byte boundary, which leaves a 4-byte gap between Id (sizeof(int) == 4) and Value. So normally sizeof(DataStructure) == 16. If the data was produced without this gap, use Pack = 4, which forces alignment on a 4-byte boundary.
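On runtimes that have Span<T>, a different and documented technique is MemoryMarshal.Cast, which reinterprets a span of one value type as a span of another without copying. This is a swapped-in alternative, not the union trick described above, and it yields a Span<DataStructure> rather than a DataStructure[]. A sketch, assuming the 12-byte packed layout (the type name PackedData is illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct PackedData   // hypothetical 12-byte layout: int followed by double
{
    public int Id;
    public double Value;
}

public static class Reinterpret
{
    public static Span<PackedData> AsPackedData(int[] data)
    {
        // Reinterprets the same memory; no elements are copied.
        // Trailing ints that do not fill a whole struct are dropped.
        return MemoryMarshal.Cast<int, PackedData>(data.AsSpan());
    }
}
```

Unlike the union trick, the resulting span reports the correct Length, and writes through it are visible in the original int[].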

Rewrite C++ function to C# (pass pointer to next element in array)

I need to rewrite this C++ function to C#
bool DesDecrypt(const BYTE *InBuff, DWORD dwInBuffSize, BYTE *OutBuff, DWORD dwOutBuffSize, const char *TerminalID)
{
...
for(DWORD i = 0 ; i < dwInBuffSize/8 ; i++)
DES.des_ecb_encrypt((des_cblock *)InBuff+i, (des_cblock *)OutBuff+i, sched, DES_DECRYPT) ;
}
The place I am stuck is pointer arithmetic. On C++ side you can see author uses
InBuff+i
So it is advancing pointer and passing it to function.
On C# my function looks like this:
public static bool DesDecrypt(byte[] inBuff, uint inBuffSize, byte[] outBuff, uint outBufSize, string terminalID)
{
.....
}
I am stuck on how to rewrite the above loop (particularly how to pass a pointer to the next element in the byte array) in C#. In C# there is no pointer arithmetic, so if I do something similar, it will just pass the i-th value of the byte array.
So how can I simulate passing a pointer to the next element in an array in C#?
This is my decrypt function in C#
public static byte[] DecryptDES_ECB(byte [] ciphertext, byte [] key)
which I should use instead of C++ version: DES.des_ecb_encrypt
I am looking for such wrapper as a solution on C# side
public static byte[] DecryptDES_ECB(byte[] ciphertext, int cipherOffset, byte[] key)
{
byte [] tmp = new byte [ciphertext.Length - cipherOffset];
for(int i = 0; i<ciphertext.Length - cipherOffset; i++)
{
tmp[i] = ciphertext[cipherOffset + i];
}
return DecryptDES_ECB(tmp, key);
}
Do you think this should work? Now I will call this function on C# side in loop and pass offset as in C++.

If you use the LINQ extension method and write inBuff.Skip(i), you will get an IEnumerable that yields its elements starting with the i-th element of inBuff. Unless you call ToList (or ToArray), the elements are not copied, and you can treat the new IEnumerable as if it were a subarray.

After your update:
Easiest solution would be to get some sort of subarray of your inBuff and outBuff, then perform your DecryptDES_ECB()-function and copy your results into your original arrays afterwards.
public static void DecryptDES_ECB(byte[] ciphertext, byte[] decryptedtext, int cipherOffset, byte[] key)
{
byte [] tmpCipher = new byte [ciphertext.Length - cipherOffset];
Array.Copy(ciphertext, cipherOffset, tmpCipher, 0, tmpCipher.Length);
byte [] tmpDecrypt = DecryptDES_ECB(tmpCipher, key);
Array.Copy(tmpDecrypt, 0, decryptedtext, cipherOffset, tmpDecrypt.Length);
}
This method has not been tested and I don't know the underlying library, so I cannot guarantee correctness. But generally this would be an easy (but rather slow) attempt at solving your general problem.
EDIT:
Just some additional info on Array.Copy: It performs a memmove which internally (usually) performs a call to memcpy, which is pretty damn fast. (Usually) a lot faster than your loop can possibly be.
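If the decryption routine can accept spans, the C++ loop can be mirrored without temporary arrays by slicing both buffers into 8-byte blocks. This is only a sketch: the BlockTransform delegate is a hypothetical stand-in for the real DES call, whose signature is not known here.

```csharp
using System;

public static class BlockProcessing
{
    public const int BlockSize = 8; // DES operates on 8-byte blocks

    // Stand-in for the real DES routine; a hypothetical signature for illustration.
    public delegate void BlockTransform(ReadOnlySpan<byte> input, Span<byte> output);

    public static void ProcessBlocks(byte[] inBuff, byte[] outBuff, BlockTransform transform)
    {
        for (int i = 0; i < inBuff.Length / BlockSize; i++)
        {
            int offset = i * BlockSize;   // plays the role of (des_cblock *)InBuff + i
            transform(inBuff.AsSpan(offset, BlockSize), outBuff.AsSpan(offset, BlockSize));
        }
    }
}
```

Each slice refers to the original array's memory, so the transform writes its result directly into outBuff.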

create fixed size string in a struct in C#?

I'm a newbie in C#. I want to create a struct in C# which consists of string variables of fixed size, for example DistributorId of size [20]. What is the exact way of giving the string a fixed size?
public struct DistributorEmail
{
public String DistributorId;
public String EmailId;
}

If you need fixed, preallocated buffers, String is not the correct datatype.
This type of usage would only make sense in an interop context though, otherwise you should stick to Strings.
You will also need to compile your assembly with allow unsafe code.
unsafe public struct DistributorEmail
{
public fixed char DistributorId[20];
public fixed char EmailID[20];
public DistributorEmail(string dId)
{
fixed (char* distId = DistributorId)
{
char[] chars = dId.ToCharArray();
// Copy at most 20 chars so the fixed buffer is never overrun.
Marshal.Copy(chars, 0, new IntPtr(distId), Math.Min(chars.Length, 20));
}
}
}
If for some reason you are in need of fixed size buffers, but not in an interop context, you can use the same struct but without unsafe and fixed. You will then need to allocate the buffers yourself.
Another important point to keep in mind is that in .NET, sizeof(char) != sizeof(byte). A char is always 2 bytes (a UTF-16 code unit), even if the original text was encoded in ANSI.

If you really need a fixed length, you can always use a char[] instead of a string. It's easy to convert to/from, if you also need string manipulation.
string s = "Hello, world";
char[] ca = s.ToCharArray();
string s1 = new string(ca);
Note that, aside from some special COM interop scenarios, you can always just use strings, and let the framework worry about sizes and storage.

You can create a new fixed length string by specifying the length when you create it.
string(char c, int count)
This code will create a new string of 40 characters in length, filled with the space character.
string newString = new string(' ', 40);

As a string extension that covers source strings both longer and shorter than the fixed length:
public static string ToFixedLength(this string inStr, int length)
{
if (inStr.Length == length)
return inStr;
if(inStr.Length > length)
return inStr.Substring(0, length);
// Shorter than the target: pad on the right with spaces.
return inStr.PadRight(length);
}

How do I convert a struct to a byte array without a copy?

[StructLayout(LayoutKind.Explicit)]
public struct struct1
{
[FieldOffset(0)]
public byte a; // 1 byte
[FieldOffset(1)]
public int b; // 4 bytes
[FieldOffset(5)]
public short c; // 2 bytes
[FieldOffset(7)]
public byte buffer;
[FieldOffset(18)]
public byte[] shaHashResult; // 20 bytes
}
void DoStuff()
{
struct1 myTest = new struct1();
myTest.shaHashResult = sha256.ComputeHash(pkBytes); // 20 bytes
byte[] newParameter = myTest.ToArray(); //<-- How do I convert a struct
// to array without a copy?
}
How do I take the struct myTest and convert it to a byte[]? Since my objects will be large, I don't want to copy the array (memcopy, etc).

Since you have a big array, this is really the only way you will be able to do what you want:
var myBigArray = new Byte[100000];
// ...
var offset = 0;
var hash = sha256.ComputeHash(pkBytes);
Buffer.BlockCopy(hash, 0, myBigArray, offset, hash.Length); // source first, then destination
// if you are in a loop, do something like this
offset += hash.Length;
This code is very efficient, and even in a tight loop, the results of ComputeHash will be collected in a quick Gen0 collection, while myBigArray will be on the Large Object Heap, and not moved or collected. The Buffer.BlockCopy is highly optimized in the runtime, yielding the fastest copy you could probably achieve, even in the face of pinning the target, and using an unrolled pointer copy.
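On .NET 5 or later, the intermediate hash array can be avoided altogether by hashing straight into a slice of the big array. A sketch, with hypothetical names, using the static SHA256.TryHashData method:

```csharp
using System;
using System.Security.Cryptography;

public static class HashInPlace
{
    // Writes SHA-256(source) into 'destination' at 'offset' and returns the new offset.
    public static int AppendHash(byte[] source, byte[] destination, int offset)
    {
        // TryHashData writes directly into the destination slice, so no
        // temporary hash array is allocated.
        if (!SHA256.TryHashData(source, destination.AsSpan(offset), out int written))
            throw new ArgumentException("Destination is too small for the hash.");
        return offset + written;
    }
}
```

In a loop, the returned value can be fed back in as the next offset, which mirrors the offset += hash.Length step above.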
