I've been trying to remove the unmanaged (unsafe) code from my source after a friend explained why I shouldn't be using it. However, I keep running into a few issues. In most cases I used Buffer.BlockCopy(), as it seemed the most suitable method, but there are still a few cases where I'm not sure what to use.
Basically, this code handles packets sent between a server and a client. WriteInt16, for example, is a function that writes ushort values into the byte array. (Example 1 does the same thing at offsets 20 and 22.)
I'll leave a couple of examples below:
1.
fixed (byte* p = Data)
{
*((ushort*)(p + 20)) = X;//x is an ushort
*((ushort*)(p + 22)) = Y;//y is an ushort
}
2.
private byte* Ptr
{
get
{
fixed (byte* p = PData)
return p;
}
}
3.
public unsafe void WriteInt16(ushort val)
{
try
{
*((ushort*)(Ptr + Count)) = val;
Count += 2;
}
catch{}
}
Assuming a little-endian platform and byte arrays (not tested):
*((ushort*)(p + 20)) = X;
is
Data[20] = (byte)X;
Data[21] = (byte)(X >> 8);
and
*((ushort*)(Ptr + Count)) = val;
is
PData[Count] = (byte)val;
PData[Count + 1] = (byte)(val >> 8);
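For a bounds-checked version that does not depend on the platform's byte order, the same writes can be sketched with System.Buffers.Binary.BinaryPrimitives. The PacketWriter class and its member names are hypothetical, invented for illustration, and the sketch assumes the packets are little-endian, as in the answer above. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Buffers.Binary;

// Usage: one sequential write, then two fixed-offset writes like example 1.
var writer = new PacketWriter(24);
writer.WriteUInt16(0x1234);
writer.WriteUInt16At(20, 0x0102);
writer.WriteUInt16At(22, 0xABCD);
Console.WriteLine(BitConverter.ToString(writer.Data, 20, 4)); // 02-01-CD-AB

// Hypothetical helper, not part of the original code.
sealed class PacketWriter
{
    public byte[] Data { get; }
    public int Count { get; private set; }

    public PacketWriter(int capacity) => Data = new byte[capacity];

    // Stands in for *((ushort*)(Ptr + Count)) = val;
    // AsSpan(Count, 2) throws if the two bytes do not fit in the array.
    public void WriteUInt16(ushort val)
    {
        BinaryPrimitives.WriteUInt16LittleEndian(Data.AsSpan(Count, 2), val);
        Count += 2;
    }

    // Stands in for *((ushort*)(p + offset)) = val at a fixed offset.
    public void WriteUInt16At(int offset, ushort val) =>
        BinaryPrimitives.WriteUInt16LittleEndian(Data.AsSpan(offset, 2), val);
}
```

Unlike the empty catch in WriteInt16, an out-of-range write here raises an exception instead of being silently ignored.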
If I run this:
Console.WriteLine("Foo".GetHashCode());
Console.WriteLine("Foo".GetHashCode());
it prints the same number twice, but if I run the program again it prints a different number.
According to Microsoft and other sources on the internet, we cannot rely on GetHashCode to return the same value across runs. But if I only plan to use it on strings, how can I make it always return the same value for the same string? I love how fast it is. It would be great if I could get its source code and use it in my application.
Reason why I need it (you may skip this part)
I have a lot of complex objects that I need to serialize and send between processes (inter-process communication). As you know, BinaryFormatter is now obsolete, so I tried System.Text.Json to serialize my objects. That was very fast, but because I make heavy use of polymorphism, deserialization did not work well. Then I tried Newtonsoft (Json.NET), which worked great with this example: https://stackoverflow.com/a/71398251/637142, but it was very slow. I then decided to use what I considered the best option, Protocol Buffers. I used protobuf-net and it worked great, but some of my objects are very complex and it was a pain to place thousands of attributes. For example, I have a base class that is used by 70 other classes, and I had to place an inheritance attribute for every single one; it was not practical.
So lastly I decided to implement my own algorithm; it was not that complicated. I just traverse the properties of each object, and if a property is not a value type I traverse its properties recursively. For this custom serialization to be fast, I needed to keep all the reflection objects in memory, so I have a dictionary with the types and their PropertyInfos. The first time I serialize it is slow, but after that it is even faster than protobuf!
So yes, this approach is fast, but every process must have the exact same object definitions, otherwise it will not work. Another tradeoff is that its output is larger than protobuf's, because every time I serialize a property I write the full name of that property before it. As a result, I want to hash the full name of the property into an integer (4 bytes), and the GetHashCode() function does exactly that!
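The reflection cache described above (a dictionary from types to their PropertyInfos) could look roughly like the following. This is only a sketch of the idea, not the actual serializer: PropertyCache and For are hypothetical names, and the choice of public instance properties is an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Usage: the second call returns the cached array without reflecting again.
PropertyInfo[] first = PropertyCache.For(typeof(Version));
PropertyInfo[] second = PropertyCache.For(typeof(Version));
Console.WriteLine(object.ReferenceEquals(first, second)); // True

// Hypothetical helper illustrating the caching idea.
static class PropertyCache
{
    private static readonly ConcurrentDictionary<Type, PropertyInfo[]> Cache =
        new ConcurrentDictionary<Type, PropertyInfo[]>();

    // Reflects over a type once and reuses the result on later calls.
    public static PropertyInfo[] For(Type type) =>
        Cache.GetOrAdd(type, t => t.GetProperties(BindingFlags.Public | BindingFlags.Instance));
}
```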
A lot of people may suggest that I should use MD5 or a different alternative but take a look at the performance difference:
// generate 1 million random GUIDS
List<string> randomGuids = new List<string>();
for (int i = 0; i < 1_000_000; i++)
randomGuids.Add(Guid.NewGuid().ToString());
// needed to measure time
var sw = new Stopwatch();
sw.Start();
// using md5 (takes approx. 260 ms)
using (var md5 = MD5.Create())
{
sw.Restart();
foreach (var guid in randomGuids)
{
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(guid);
byte[] hashBytes = md5.ComputeHash(inputBytes);
// make use of hashBytes to make sure code is compiled
if (hashBytes.Length == 44)
throw new Exception();
}
var elapsed = sw.Elapsed.TotalMilliseconds;
Console.WriteLine($"md5: {elapsed}");
}
// using .net framework 4.7 source code (takes approx. 65 ms)
{
[System.Security.SecuritySafeCritical] // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
static int GetHashCodeDotNetFramework4_7(string str)
{
#if FEATURE_RANDOMIZED_STRING_HASHING
if(HashHelpers.s_UseRandomizedStringHashing)
{
return InternalMarvin32HashString(str, str.Length, 0);
}
#endif // FEATURE_RANDOMIZED_STRING_HASHING
unsafe
{
fixed (char* src = str)
{
#if WIN32
int hash1 = (5381<<16) + 5381;
#else
int hash1 = 5381;
#endif
int hash2 = hash1;
#if WIN32
// 32 bit machines.
int* pint = (int *)src;
int len = str.Length; // 'this' is not available in a static method
while (len > 2)
{
hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
pint += 2;
len -= 4;
}
if (len > 0)
{
hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
}
#else
int c;
char* s = src;
while ((c = s[0]) != 0)
{
hash1 = ((hash1 << 5) + hash1) ^ c;
c = s[1];
if (c == 0)
break;
hash2 = ((hash2 << 5) + hash2) ^ c;
s += 2;
}
#endif
#if DEBUG
// We want to ensure we can change our hash function daily.
// This is perfectly fine as long as you don't persist the
// value from GetHashCode to disk or count on String A
// hashing before string B. Those are bugs in your code.
hash1 ^= -484733382;
#endif
return hash1 + (hash2 * 1566083941);
}
}
}
sw.Restart();
foreach (var guid in randomGuids)
if (GetHashCodeDotNetFramework4_7(guid) == 1234567)
throw new Exception("this will probably never happen");
var elapsed = sw.Elapsed.TotalMilliseconds;
Console.WriteLine($".NetFramework4.7SourceCode: {elapsed}");
}
// using .net 6 built-in GetHashCode function (takes approx. 22 ms)
{
sw.Restart();
foreach (var guid in randomGuids)
if (guid.GetHashCode() == 1234567)
throw new Exception("this will probably never happen");
var elapsed = sw.Elapsed.TotalMilliseconds;
Console.WriteLine($".net6: {elapsed}");
}
Running this in release mode, these were my results:
md5: 254.7139
.NetFramework4.7SourceCode: 74.2588
.net6: 23.274
I got the source code from .NET Framework 4.8 from this link: https://referencesource.microsoft.com/#mscorlib/system/string.cs,8281103e6f23cb5c
Anyway, while searching the internet I found this helpful article:
https://andrewlock.net/why-is-string-gethashcode-different-each-time-i-run-my-program-in-net-core/
and I did exactly what it says and added:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<runtime>
<UseRandomizedStringHashAlgorithm enabled="1" />
</runtime>
</configuration>
to my app.config file, and I still get different values for "foo".GetHashCode() every time I run my application.
How can I make the GetHashCode() method always return the same value for the string "foo" in .NET 6?
Edit
I will just use the .NET Framework 4.8 source code solution, which took 73 ms to execute, and move on. I was just curious to understand why the built-in hash code was so much faster.
At least I now understand why the hash is different every time. Looking at the .NET 6 source code, the reason is this:
namespace System
{
internal static partial class Marvin
{
... .net source code
....
public static ulong DefaultSeed { get; } = GenerateSeed();
private static unsafe ulong GenerateSeed()
{
ulong seed;
Interop.GetRandomBytes((byte*)&seed, sizeof(ulong));
return seed;
}
}
}
As a result, I tried this just for fun, and it still did not work:
var ass = typeof(string).Assembly;
var marvin = ass.GetType("System.Marvin");
var defaultSeed = marvin.GetProperty("DefaultSeed");
var value = defaultSeed.GetValue(null); // returns 3644491462759144438
var field = marvin.GetField("<DefaultSeed>k__BackingField", BindingFlags.NonPublic | BindingFlags.Static);
ulong v = 3644491462759144438;
field.SetValue(null, v);
but on the last line I get the exception: System.FieldAccessException: 'Cannot set initonly static field '<DefaultSeed>k__BackingField' after type 'System.Marvin' is initialized.'
But even if this worked, it would be very unsafe. I would rather have something that executes 3 times slower and move on.
Why not use the implementation suggested in the article you shared?
I'm copying it here for reference:
static int GetDeterministicHashCode(this string str)
{
unchecked
{
int hash1 = (5381 << 16) + 5381;
int hash2 = hash1;
for (int i = 0; i < str.Length; i += 2)
{
hash1 = ((hash1 << 5) + hash1) ^ str[i];
if (i == str.Length - 1)
break;
hash2 = ((hash2 << 5) + hash2) ^ str[i + 1];
}
return hash1 + (hash2 * 1566083941);
}
}
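Since the goal in the question is to replace property names with 4-byte hashes, two different names hashing to the same value would be indistinguishable on the wire. A minimal sketch of a one-time collision check follows; the property names and the StableHash class name are hypothetical, and the hash method is the one quoted above, repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical property names standing in for the serializer's real ones.
string[] names = { "Order.Id", "Order.Total", "Customer.Name" };
var nameByHash = new Dictionary<int, string>();
foreach (string name in names)
{
    int hash = name.GetDeterministicHashCode();
    // Fail early if two different names map to the same 4-byte value.
    if (nameByHash.TryGetValue(hash, out var existing) && existing != name)
        throw new InvalidOperationException($"Hash collision: '{existing}' and '{name}'");
    nameByHash[hash] = name;
}
Console.WriteLine($"{nameByHash.Count} names registered");

static class StableHash
{
    // Same algorithm as the method quoted above.
    public static int GetDeterministicHashCode(this string str)
    {
        unchecked
        {
            int hash1 = (5381 << 16) + 5381;
            int hash2 = hash1;
            for (int i = 0; i < str.Length; i += 2)
            {
                hash1 = ((hash1 << 5) + hash1) ^ str[i];
                if (i == str.Length - 1)
                    break;
                hash2 = ((hash2 << 5) + hash2) ^ str[i + 1];
            }
            return hash1 + (hash2 * 1566083941);
        }
    }
}
```

Running the check once at startup, over every name the serializer knows about, keeps the cost out of the hot path.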
Currently I write client-server junk and deal a lot with C++ structs passed over the network.
I know about the approaches described in "Reading a C/C++ data structure in C# from a byte array", but they all involve making a copy.
I want to have something like this:
struct/*or class*/ SomeStruct
{
public uint F1;
public uint F2;
public uint F3;
}
Later in my code I want to have something like that:
byte[] Data; //16 bytes that I got from network
SomeStruct PartOfDataAsSomeStruct { get { return /*make SomeStruct instance based on this.Data starting from index 4, without copying it. So when I do PartOfDataAsSomeStruct.F1 = 132465; it also changes bytes 4, 5, 6 and 7 in this.Data.*/; } }
If this is possible, please tell me how.
Like so?
byte[] data = new byte[16];
// 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
Console.WriteLine(BitConverter.ToString(data));
ref SomeStruct typed = ref Unsafe.As<byte, SomeStruct>(ref data[4]);
typed.F1 = 42;
typed.F2 = 3;
typed.F3 = 9;
// 00-00-00-00-2A-00-00-00-03-00-00-00-09-00-00-00
Console.WriteLine(BitConverter.ToString(data));
This coerces the data from the middle of the byte-array using a ref-local that is an "interior managed pointer" to the data. Zero copies.
If you need multiple items (like how a vector would work), you can do the same thing with spans and MemoryMarshal.Cast
Note that it uses CPU-endian rules for the elements - little endian in my case.
For spans:
byte[] data = new byte[256];
// create a span of some of it
var span = new Span<byte>(data, 4, 128);
// now coerce the span
var typed = MemoryMarshal.Cast<byte, SomeStruct>(span);
Console.WriteLine(typed.Length); // 10 of them fit
typed[3].F1 = 3; // etc
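For the single-struct case, a bounds-checked variant can be sketched with MemoryMarshal.AsRef, which refuses a span that is too short for the struct. This assumes a runtime recent enough to provide that method; SomeStruct is repeated from the question so the sketch is self-contained.

```csharp
using System;
using System.Runtime.InteropServices;

byte[] data = new byte[16];
// AsRef throws if fewer than sizeof(SomeStruct) bytes remain after index 4.
ref SomeStruct typed = ref MemoryMarshal.AsRef<SomeStruct>(data.AsSpan(4));
typed.F1 = 42;
// Reading the same four bytes back through the array shows that no copy was made.
Console.WriteLine(BitConverter.ToUInt32(data, 4)); // 42

[StructLayout(LayoutKind.Sequential)]
struct SomeStruct
{
    public uint F1;
    public uint F2;
    public uint F3;
}
```

As with Unsafe.As, the element layout follows the CPU's byte order.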
Thank you for the correction, Marc Gravell. And thank you for the example.
Here is a way to do the same thing using a class and bitwise operators, without pointers:
class SomeClass
{
public byte[] Data;
public SomeClass()
{
Data = new byte[16];
}
public uint F1
{
get
{
uint ret = (uint)(Data[4] << 24 | Data[5] << 16 | Data[6] << 8 | Data[7]);
return ret;
}
set
{
Data[4] = (byte)(value >> 24);
Data[5] = (byte)(value >> 16);
Data[6] = (byte)(value >> 8);
Data[7] = (byte)value;
}
}
}
Testing:
SomeClass sc = new SomeClass();
sc.F1 = 0b_00000001_00000010_00000011_00000100;
Console.WriteLine(sc.Data[4].ToString() + " " + sc.Data[5].ToString() + " " + sc.Data[6].ToString() + " " + sc.Data[7].ToString());
Console.WriteLine(sc.F1.ToString());
//Output:
//1 2 3 4
//16909060
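The shifts in the property above store F1 most significant byte first (big-endian). The same accessor can be sketched with BinaryPrimitives, which makes the byte order explicit; SomeClassBE is a hypothetical name chosen for this example.

```csharp
using System;
using System.Buffers.Binary;

var sc = new SomeClassBE();
sc.F1 = 0x01020304;
Console.WriteLine($"{sc.Data[4]} {sc.Data[5]} {sc.Data[6]} {sc.Data[7]}"); // 1 2 3 4
Console.WriteLine(sc.F1); // 16909060

// Hypothetical counterpart of SomeClass.
class SomeClassBE
{
    public byte[] Data = new byte[16];

    // F1 occupies bytes 4..7, most significant byte first.
    public uint F1
    {
        get => BinaryPrimitives.ReadUInt32BigEndian(Data.AsSpan(4, 4));
        set => BinaryPrimitives.WriteUInt32BigEndian(Data.AsSpan(4, 4), value);
    }
}
```

If the wire format is little-endian instead, ReadUInt32LittleEndian and WriteUInt32LittleEndian are the matching calls.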
The idea: Being able to take the bytes of any struct, send those bytes across a TcpClient (or through my Client wrapper), then have the receiving client load those bytes and use pointers to "paint" them onto a new struct.
The problem: It reads the bytes into the buffer perfectly; it reads the array of bytes on the other end perfectly. The "paint" operation, however, fails miserably. I write a new Vector3(1F, 2F, 3F); I read a Vector3(0F, 0F, 0F)...Obviously, not ideal.
Unfortunately, I don't see the bug - If it works one way, it should work the reverse - And the values are being filled in.
The write/read functions are as follows:
public static unsafe void Write<T>(Client client, T value) where T : struct
{
int n = System.Runtime.InteropServices.Marshal.SizeOf(value);
byte[] buffer = new byte[n];
{
var handle = System.Runtime.InteropServices.GCHandle.Alloc(value, System.Runtime.InteropServices.GCHandleType.Pinned);
void* ptr = handle.AddrOfPinnedObject().ToPointer();
byte* bptr = (byte*)ptr;
for (int t = 0; t < n; ++t)
{
buffer[t] = *(bptr + t);
}
handle.Free();
}
client.Writer.Write(buffer);
}
public static unsafe T Read<T>(Client client) where T : struct
{
T r = new T();
int n = System.Runtime.InteropServices.Marshal.SizeOf(r);
{
byte[] buffer = client.Reader.ReadBytes(n);
var handle = System.Runtime.InteropServices.GCHandle.Alloc(r, System.Runtime.InteropServices.GCHandleType.Pinned);
void* ptr = handle.AddrOfPinnedObject().ToPointer();
byte* bptr = (byte*)ptr;
for (int t = 0; t < n; ++t)
{
*(bptr + t) = buffer[t];
}
handle.Free();
}
return r;
}
Help, please, thanks.
Edit:
Well, one major problem is that I'm getting a handle to a temporary copy created when I passed in the struct value.
Edit2:
Changing "T r = new T();" to "object r = new T();" and "return r" to "return (T)r" boxes and unboxes the struct and, in the meantime, makes it a reference, so the pointer actually points to it.
However, it is slow. I'm getting 13,500 - 14,500 write/reads per second.
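The boxing and pinning described in these edits can be avoided on runtimes that provide Span<T> and the unmanaged constraint (C# 7.3 or later). A minimal sketch, assuming the struct contains no references; StructBytes and its method names are hypothetical, and the Vector3 below is a stand-in for the one in the question.

```csharp
using System;
using System.Runtime.InteropServices;

var sent = new Vector3 { X = 1f, Y = 2f, Z = 3f };
byte[] bytes = StructBytes.ToBytes(sent);
Vector3 received = StructBytes.FromBytes<Vector3>(bytes);
Console.WriteLine($"{bytes.Length} bytes: {received.X}, {received.Y}, {received.Z}"); // 12 bytes: 1, 2, 3

// Hypothetical helper.
static class StructBytes
{
    // 'value' is the method's own copy; the span views that copy's bytes directly,
    // so nothing is boxed and no GCHandle is needed.
    public static byte[] ToBytes<T>(T value) where T : unmanaged =>
        MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref value, 1)).ToArray();

    // Copies sizeof(T) bytes out of the buffer into a new T; throws if the buffer is too short.
    public static T FromBytes<T>(ReadOnlySpan<byte> bytes) where T : unmanaged =>
        MemoryMarshal.Read<T>(bytes);
}

// Stand-in for the Vector3 in the question.
struct Vector3
{
    public float X, Y, Z;
}
```

The bytes are in the sender's native layout and byte order, so both ends need the same struct definition and endianness.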
Edit3:
OTOH, serializing/deserializing the Vector3 through a BinaryFormatter gets about 750 writes/reads per second. So a Lot faster than what I was using. :)
Edit4:
Sending the floats individually got 8,400 RW/second. Suddenly, I feel much better about this. :)
Edit5:
Tested GCHandle allocation pinning and freeing; 28,000,000 ops per second (Compared to about 1,000,000,000 Int32/int add+assign/second. So compared to integers, it's 35 times slower. However, that's still comparatively fast enough). Note that you don't appear to be able to pin classes, even if GCHandle does auto-boxed structs fine (GCHandle accepts values of type "object").
Now, if the C# guys update constraints to the point where the pointer allocation recognizes that "T" is a struct, I could just assign directly to a pointer, which is...Yep, incredibly fast.
Next up: Probably testing write/read using separate threads. :) See how much the GCHandle actually affects the send/receive delay.
As it turns out:
Edit6:
double start = Timer.Elapsed.TotalSeconds;
for (t = 0; t < count; ++t)
{
Vector3 from = new Vector3(1F, 2F, 3F);
// Vector3* ptr = &test;
// Vector3* ptr2 = &from;
int n = sizeof(Vector3);
if (n / 4 * 4 != n)
{
// This gets 9,000,000 ops/second;
byte* bptr1 = (byte*)&test;
byte* bptr2 = (byte*)&from;
// int n = 12;
for (int t2 = 0; t2 < n; ++t2)
{
*(bptr1 + t2) = *(bptr2 + t2);
}
}
else
{
// This speedup gets 24,000,000 ops/second.
int n2 = n / 4;
int* iptr1 = (int*)&test;
int* iptr2 = (int*)&from;
// int n = 12;
for (int t2 = 0; t2 < n2; ++t2)
{
*(iptr1 + t2) = *(iptr2 + t2);
}
}
}
So, overall, I don't think the GCHandle is really slowing things down. (Those who are thinking this is a slow way of assigning one Vector3 to another, remember that the purpose is to serialize structs into a byte[] buffer to send over a network. And, while that's not what we're doing here, it would be rather easy to do with the first method).
Edit7:
The following got 6,900,000 ops/second:
for (t = 0; t < count; ++t)
{
Vector3 from = new Vector3(1F, 2F, 3F);
int n = sizeof(Vector3);
byte* bptr2 = (byte*)&from;
byte[] buffer = new byte[n];
for (int t2 = 0; t2 < n; ++t2)
{
buffer[t2] = *(bptr2 + t2);
}
}
...Help! I've got IntruigingPuzzlitus! :D
Is there a method (in C#/.NET) that left-shifts (bitwise) each short in a short[] and is faster than doing it in a loop?
I am talking about data coming from a digital camera (16-bit gray); the camera only uses the lower 12 bits. So, to see something when rendering the data, it needs to be shifted left by 4.
This is what I am doing so far:
byte[] RawData; // from camera along with the other info
if (pf == PixelFormats.Gray16)
{
fixed (byte* ptr = RawData)
{
short* wptr = (short*)ptr;
short temp;
for (int line = 0; line < ImageHeight; line++)
{
for (int pix = 0; pix < ImageWidth; pix++)
{
temp = *(wptr + (pix + line * ImageWidth));
*(wptr + (pix + line * ImageWidth)) = (short)(temp << 4);
}
}
}
}
Any ideas?
I don't know of a library method that will do it, but I have some suggestions that might help. This will only work if you know that the upper four bits of the pixel are definitely zero (rather than garbage). (If they are garbage, you'd have to add bitmasks to the below). Basically I would propose:
Using a shift operator on a larger data type (int or long) so that you are shifting more data at once
Getting rid of the multiply operations inside your loop
Doing a little loop unrolling
Here is my code:
using System.Diagnostics;
namespace ConsoleApplication9 {
class Program {
public static void Main() {
Crazy();
}
private static unsafe void Crazy() {
short[] RawData={
0x000, 0x111, 0x222, 0x333, 0x444, 0x555, 0x666, 0x777, 0x888,
0x999, 0xaaa, 0xbbb, 0xccc, 0xddd, 0xeee, 0xfff, 0x123, 0x456,
//extra sentinel value which is just here to demonstrate that the algorithm
//doesn't go too far
0xbad
};
const int ImageHeight=2;
const int ImageWidth=9;
var numShorts=ImageHeight*ImageWidth;
fixed(short* rawDataAsShortPtr=RawData) {
var nextLong=(long*)rawDataAsShortPtr;
//1 chunk of 4 longs
// ==8 ints
// ==16 shorts
while(numShorts>=16) {
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
numShorts-=16;
}
var nextShort=(short*)nextLong;
while(numShorts>0) {
*nextShort=(short)(*nextShort<<4);
nextShort++;
numShorts--;
}
}
foreach(var item in RawData) {
Debug.Print("{0:X4}", item);
}
}
}
}
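A different technique from the long-based shifting above is to let System.Numerics.Vector<short> process several pixels per operation. The sketch below multiplies by 16, which equals a left shift by 4 in wrapping 16-bit arithmetic, and because each element is handled separately it does not rely on the upper four bits being zero. PixelShift and ShiftLeft4 are hypothetical names, and no speedup has been measured here.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// 19 test pixels: more than one vector on some machines, plus a scalar tail.
short[] raw = new short[19];
for (int i = 0; i < raw.Length; i++) raw[i] = (short)(0x111 * (i % 16));
PixelShift.ShiftLeft4(raw);
Console.WriteLine($"{raw[1]:X4} {raw[15]:X4} {raw[18]:X4}"); // 1110 FFF0 2220

// Hypothetical helper.
static class PixelShift
{
    public static void ShiftLeft4(Span<short> pixels)
    {
        // Reinterpret as many whole Vector<short> blocks as fit in the span.
        Span<Vector<short>> blocks = MemoryMarshal.Cast<short, Vector<short>>(pixels);
        for (int i = 0; i < blocks.Length; i++)
            blocks[i] = blocks[i] * (short)16; // x * 16 == x << 4 for 16-bit values

        // Handle the pixels that did not fill a whole vector.
        for (int i = blocks.Length * Vector<short>.Count; i < pixels.Length; i++)
            pixels[i] = (short)(pixels[i] << 4);
    }
}
```

The byte[] from the camera can be passed in as MemoryMarshal.Cast<byte, short>(RawData) without copying.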
Is there a way of mapping data collected on a stream or array to a data structure or vice-versa?
In C++ this would simply be a matter of casting a pointer to the stream as the data type I want to use (or vice versa for the reverse).
E.g., in C++:
Mystruct * pMyStrct = (Mystruct*)&SomeDataStream;
pMyStrct->Item1 = 25;
int iReadData = pMyStrct->Item2;
Obviously, the C++ way is pretty unsafe unless you are sure of the quality of the stream data when reading incoming data, but for outgoing data it is super quick and easy.
Most people use .NET serialization (there is a faster binary formatter and a slower XML formatter; both depend on reflection and are version tolerant to a certain degree).
However, if you want the fastest (unsafe) way, why not:
Writing:
YourStruct o = new YourStruct();
byte[] buffer = new byte[Marshal.SizeOf(typeof(YourStruct))];
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
Marshal.StructureToPtr(o, handle.AddrOfPinnedObject(), false);
handle.Free();
Reading:
handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
o = (YourStruct)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(YourStruct));
handle.Free();
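The two snippets above can be wrapped into a pair of generic helpers, with try/finally so the handle is released even if marshaling throws. This is a sketch only: MarshalHelper and the Header struct are hypothetical, and Pack = 1 is an assumption about the wire layout.

```csharp
using System;
using System.Runtime.InteropServices;

var header = new Header { Id = 7, Length = 1024 };
byte[] bytes = MarshalHelper.ToBytes(header);
Header copy = MarshalHelper.FromBytes<Header>(bytes);
Console.WriteLine($"{bytes.Length} bytes, Id={copy.Id}, Length={copy.Length}"); // 6 bytes, Id=7, Length=1024

// Hypothetical helper wrapping the Writing/Reading snippets above.
static class MarshalHelper
{
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        var buffer = new byte[Marshal.SizeOf<T>()];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try { Marshal.StructureToPtr(value, handle.AddrOfPinnedObject(), false); }
        finally { handle.Free(); }
        return buffer;
    }

    public static T FromBytes<T>(byte[] buffer) where T : struct
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try { return Marshal.PtrToStructure<T>(handle.AddrOfPinnedObject()); }
        finally { handle.Free(); }
    }
}

// Hypothetical packet header; Pack = 1 removes padding between the fields.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Header
{
    public ushort Id;
    public uint Length;
}
```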
In case lubos hasko's answer was not unsafe enough, there is also the really unsafe way, using pointers in C#. Here are some tips and pitfalls I've run into:
using System;
using System.Runtime.InteropServices;
using System.IO;
using System.Diagnostics;
// Use LayoutKind.Sequential to prevent the CLR from reordering your fields.
[StructLayout(LayoutKind.Sequential)]
unsafe struct MeshDesc
{
public byte NameLen;
// Here fixed means store the array by value, like in C,
// though C# exposes access to Name as a char*.
// fixed also requires 'unsafe' on the struct definition.
public fixed char Name[16];
// You can include other structs like in C as well.
public Matrix Transform;
public uint VertexCount;
// But not both, you can't store an array of structs.
//public fixed Vector Vertices[512];
}
[StructLayout(LayoutKind.Sequential)]
unsafe struct Matrix
{
public fixed float M[16];
}
// This is how you do unions
[StructLayout(LayoutKind.Explicit)]
unsafe struct Vector
{
[FieldOffset(0)]
public fixed float Items[16];
[FieldOffset(0)]
public float X;
[FieldOffset(4)]
public float Y;
[FieldOffset(8)]
public float Z;
}
class Program
{
unsafe static void Main(string[] args)
{
var mesh = new MeshDesc();
var buffer = new byte[Marshal.SizeOf(mesh)];
// Set where NameLen will be read from.
buffer[0] = 12;
// Use Buffer.BlockCopy to raw copy data across arrays of primitives.
// Note we copy to offset 2 here: char's have alignment of 2, so there is
// a padding byte after NameLen: just like in C.
Buffer.BlockCopy("Hello!".ToCharArray(), 0, buffer, 2, 12);
// Copy data to struct
Read(buffer, out mesh);
// Print the Name we wrote above:
var name = new char[mesh.NameLen];
// Use Marshal.Copy to copy between arrays and pointers to arrays.
unsafe { Marshal.Copy((IntPtr)mesh.Name, name, 0, mesh.NameLen); }
// Note you can also use the String.String(char*) overloads
Console.WriteLine("Name: " + new string(name));
// If Erik Myers likes it...
mesh.VertexCount = 4711;
// Copy data from struct:
// MeshDesc is a struct, and is on the stack, so its
// memory is effectively pinned by the stack pointer.
// This means '&' is sufficient to get a pointer.
Write(&mesh, buffer);
// Watch for alignment again, and note you have endianness to worry about...
int vc = buffer[100] | (buffer[101] << 8) | (buffer[102] << 16) | (buffer[103] << 24);
Console.WriteLine("VertexCount = " + vc);
}
unsafe static void Write(MeshDesc* pMesh, byte[] buffer)
{
// But byte[] is on the heap, and therefore needs
// to be flagged as pinned so the GC won't try to move it
// from under you - this can be done most efficiently with
// 'fixed', but can also be done with GCHandleType.Pinned.
fixed (byte* pBuffer = buffer)
*(MeshDesc*)pBuffer = *pMesh;
}
unsafe static void Read(byte[] buffer, out MeshDesc mesh)
{
fixed (byte* pBuffer = buffer)
mesh = *(MeshDesc*)pBuffer;
}
}
If it's .NET on both sides:
I think you should use binary serialization and send the byte[] result.
Trusting your struct to be fully blittable can be trouble.
You will pay some overhead (both CPU and network), but you will be safe.
If you need to populate each member variable by hand you can generalize it a bit as far as the primitives are concerned by using FormatterServices to retrieve in order the list of variable types associated with an object. I've had to do this in a project where I had a lot of different message types coming off the stream and I definitely didn't want to write the serializer/deserializer for each message.
Here's the code I used to generalize the deserialization from a byte[].
public virtual bool SetMessageBytes(byte[] message)
{
MemberInfo[] members = FormatterServices.GetSerializableMembers(this.GetType());
object[] values = FormatterServices.GetObjectData(this, members);
int j = 0;
for (int i = 0; i < members.Length; i++)
{
string[] var = members[i].ToString().Split(new char[] { ' ' });
switch (var[0])
{
case "UInt32":
values[i] = (UInt32)((message[j] << 24) + (message[j + 1] << 16) + (message[j + 2] << 8) + message[j + 3]);
j += 4;
break;
case "UInt16":
values[i] = (UInt16)((message[j] << 8) + message[j + 1]);
j += 2;
break;
case "Byte":
values[i] = (byte)message[j++];
break;
case "UInt32[]":
if (values[i] != null)
{
int len = ((UInt32[])values[i]).Length;
byte[] b = new byte[len * 4];
Array.Copy(message, j, b, 0, len * 4);
Array.Copy(Utilities.ByteArrayToUInt32Array(b), (UInt32[])values[i], len);
j += len * 4;
}
break;
case "Byte[]":
if (values[i] != null)
{
int len = ((byte[])values[i]).Length;
Array.Copy(message, j, (byte[])(values[i]), 0, len);
j += len;
}
break;
default:
throw new Exception("ByteExtractable::SetMessageBytes Unsupported Type: " + var[1] + " is of type " + var[0]);
}
}
FormatterServices.PopulateObjectMembers(this, members, values);
return true;
}
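The UInt32, UInt16 and Byte cases above decode big-endian values by hand while advancing the index j. The same reads can be sketched with BinaryPrimitives and a small cursor class; BigEndianReader is a hypothetical name, and this covers only the primitive cases, not the array cases or the FormatterServices plumbing.

```csharp
using System;
using System.Buffers.Binary;

byte[] message = { 0x00, 0x00, 0x01, 0x02, 0x12, 0x34, 0xFF };
var reader = new BigEndianReader(message);
Console.WriteLine(reader.ReadUInt32()); // 258
Console.WriteLine(reader.ReadUInt16()); // 4660
Console.WriteLine(reader.ReadByte());   // 255

// Hypothetical helper.
sealed class BigEndianReader
{
    private readonly byte[] _message;
    private int _position; // plays the role of 'j' in SetMessageBytes

    public BigEndianReader(byte[] message) => _message = message;

    public uint ReadUInt32()
    {
        uint value = BinaryPrimitives.ReadUInt32BigEndian(_message.AsSpan(_position, 4));
        _position += 4;
        return value;
    }

    public ushort ReadUInt16()
    {
        ushort value = BinaryPrimitives.ReadUInt16BigEndian(_message.AsSpan(_position, 2));
        _position += 2;
        return value;
    }

    public byte ReadByte() => _message[_position++];
}
```

A truncated message makes AsSpan throw, rather than reading past the data that was actually received.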