public void Consumer()
{
foreach(int i in Integers())
{
Console.WriteLine(i.ToString());
}
}
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
Is there a way with template trick (or other) to get the same syntax in c++?
Take a look at boost::Coroutine. It does what you want.
http://www.crystalclearsoftware.com/soc/coroutine/index.html#coroutine.intro
Example from tutorial
http://www.crystalclearsoftware.com/soc/coroutine/coroutine/tutorial.html
int range_generator(generator_type::self& self, int min, int max)
{
while(min < max)
self.yield(min++);
self.exit();
}
You can always code this by hand. Truthfully, yield really seems like sugar coating to me (and co-routines too).
What a coroutine is, really ? Some state bundled up together with:
one function to create it (isn't it a constructor ?)
one function to move to the next state (isn't it operator++, traditionally ?)
In C++, it's called an InputIterator, and can be arbitrarily fat.
So, it's true that the syntax won't be as pretty, but this should do, just with the Standard Library:
static std::array<int, 6> const Array = {{1, 2, 4, 8, 16, 16777216}};
class Integers: public std::iterator<std::input_iterator_tag,
int, ptrdiff_t, int const*, int>
{
public:
Integers(): _index(0) {}
operator bool() const { return _index < Array.size(); }
Integers& operator++() { assert(*this); ++_index; return *this; }
Integers operator++(int) { Integers tmp = *this; ++*this; return tmp; }
int operator*() const { assert(*this); return Array[_index]; }
int const* operator->() const { assert(*this); return &Array[_index]; }
private:
size_t _index;
}; // class Integers
And obviously, since you decide exactly what state is stored, you decide if all is pre-computed or if part (or whole of it) is lazily computed, and possibly cached, and possibly multi-threaded, and ... you got the idea :)
In C++14, you can mimic yield this way:
auto&& function = []() {
int i = 0;
return [=]() mutable {
int arr[] = { 1, 2, 4, 8, 16, 16777216};
if (i < 6)
return arr[i++];
return 0;
};
}();
A live example is available at http://ideone.com/SQZ1qZ
Coroutines are in the standard library since C++20 and uses co_yield instead of yield.
See also: What are coroutines in C++20?
There are some example usages in the first link: (the second one is probably what you're looking for)
uses the co_await operator to suspend execution until resumed
task<> tcp_echo_server() {
char data[1024];
while (true) {
size_t n = co_await socket.async_read_some(buffer(data));
co_await async_write(socket, buffer(data, n));
}
}
uses the keyword co_yield to suspend execution returning a value
generator<int> iota(int n = 0) {
while (true)
co_yield n++;
}
uses the keyword co_return to complete execution returning a value
lazy<int> f() {
co_return 7;
}
Here is ASM "roll your own" version : http://www.flipcode.com/archives/Yield_in_C.shtml
#include <stdio.h
#include <conio.h
#include <iostream.h
//
// marks a location in the program for resume
// does not return control, exits function from inside macro
//
// yield( x, ret )
// x : the 'name' of the yield, cannot be ambiguous in the
// function namespace
// ret : the return value for when yield() exits the function;
// must match function return type (leave blank for no return type)
#define yield(x,ret) \
{ \
/* store the resume location */ \
__asm { \
mov _myStaticMkr,offset label_##x \
} \
\
/* return the supplied value */ \
return ret; \
} \
/* our offset in the function */ \
label_##x:
//
// resumes function from the stored offset, or
// continues without notice if there's not one
// stored
//
// resume()
// <void
#define resume() \
/* our stored offset */ \
static _myStaticMkr=0; \
\
/* test for no offset */ \
if( _myStaticMkr ) \
{ \
/* resume from offset */ \
__asm \
{ \
jmp _myStaticMkr \
} \
}
// example demonstrating a function with an int return type
// using the yield() and resume() macros
//
// myFunc()
// <void
int myFunc()
{
resume();
cout << "1\n";
yield(1,1);
cout << "2\n";
yield(2,1);
cout << "3\n";
yield(3,1);
cout << "4\n";
return 0;
}
// main function
//
// main()
// <void
void main( void )
{
cout << "Yield in C++\n";
cout << "Chris Pergrossi\n\n";
myFunc();
do
{
cout << "main()\n";
cout.flush();
} while( myFunc() );
cout.flush();
getch();
}
/*
// example demonstrating a function with no return type
// using the yield() and resume() macros
//
// myFunc()
// <void
void myFunc()
{
resume();
cout << "1\n";
yield(1);
cout << "2\n";
yield(2);
cout << "3\n";
yield(3);
cout << "4\n";
return;
}
// main function
//
// main()
// <void
void main( void )
{
cout << "Yield in C++\n";
cout << "Chris Pergrossi\n\n";
myFunc();
for( int k = 0; k < 4; k ++ )
{
cout << "main()\n";
cout.flush();
myFunc();
}
cout.flush();
getch();
}
*/
If all what you need is just foreach-like stuff, then following syntax is available in C++:
#define GENERATOR(name) \
struct name \
{ \
template<typename F> \
void operator()(F yield) \
/**/
#define _ };
template<typename Gen>
struct Adaptor
{
Gen f;
template<typename C>
void operator*(C cont)
{
f(cont);
}
};
template<typename Gen>
Adaptor<Gen> make_adaptor(Gen gen)
{
return {gen};
}
#define FOREACH(arg, gen) make_adaptor(gen) * [&](arg)
#include <iostream>
using namespace std;
GENERATOR(integers)
{
yield(1);
yield(2);
yield(4);
yield(8);
yield(16777216);
}_
int main()
{
FOREACH(int i, integers())
{
cout << i << endl;
};
}
Live Demo
If you need a little bit of coroutine "power", then you can try stackless coroutines.
Or if you need full power - then go with stackful coroutines. There is Boost.Coroutine library which implements stackful coroutines for different platforms.
An try to implement yield in c++ coroutine
If you write static unsigned int checkpoint = 0;, make all your variables static, switch (checkpoint), set each case: goto to some label, above each return set checkpoint to unique value, and below define label, and at the end of the function set checkpoint to zero, and all static variables to their default value, and at last return the end value of the function. If you do all this then the function becomes enumerable and iterative. The two lines you add above and below each return line, makes the return command to behave like yield return. goto allows you to continue and resume where you left off, and static integer variable, like checkpoint, help you to remember where you stopped, from where to continue/resume and where to go. You test it's values with switch case statements. Making all other variables static, is to save their value to the next call, so in the next call, their value won't be reset!
Here for example:
#define PowerEnd INT_MIN
int Power(int number, int exponent)
{
static unsigned int checkpoint = 0;
static int result = 1, i = 0;
switch (checkpoint)
{
case 1: goto _1;
}
for (i = 0; i < exponent; i++)
{
result *= number;
checkpoint = 1;
return result;
_1:;
}
checkpoint = 0;
result = 1;
i = 0;
return PowerEnd;
}
void main()
{
while (true)
{
int result = Power(2, 8);
if (result == PowerEnd)
break;
cout << result << endl;
}
//to print only the first 4 results (if there are at least 4 results) then
for (int i = 0; i < 4; i++)
{
int result = Power(2, 8);
if (result == PowerEnd)
break;
cout << result << endl;
}
}
The above program produces the following output:
2
4
8
16
32
64
128
256
2
4
8
16
Something similar is proposed for C++17 and there is already an experimental implementation in Visual C++ 2015. Here's a good overview talk from Gor Nishanov, one of the main authors of the proposal.
#include <setjmp.h>
class superclass
{
public:
jmp_buf jbuf;
public:
virtual int enumerate(void) { return -1; }
};
class subclass: public superclass
{
public:
int enumerate()
{
static int i;
static bool b = false;
if(b)
longjmp(jbuf, 1);
for(b = true, i = 0; i < 5; (i)++)
{
printf("\ndoing stuff: i = %d\n", i);
if(setjmp(jbuf) != 1)
return i;
}
return -1;
}
};
To use the code...
int iret;
subclass *sc;
sc = new subclass();
while((iret = sc->enumerate()) != -1)
{
printf("\nsc->enumerate() returned: %d\n", iret);
}
Just got this working; it seems quite simple now, although I had a few false starts with it :)
You can of course always write your own iterators and return from them whatever you desire, but why would you want to? In the given example, why not simply put your values into a container like vector and iterate over that?
Related
What I want to do is change how a C# method executes when it is called, so that I can write something like this:
[Distributed]
public DTask<bool> Solve(int n, DEvent<bool> callback)
{
for (int m = 2; m < n - 1; m += 1)
if (m % n == 0)
return false;
return true;
}
At run-time, I need to be able to analyse methods that have the Distributed attribute (which I already can do) and then insert code before the body of the function executes and after the function returns. More importantly, I need to be able to do it without modifying code where Solve is called or at the start of the function (at compile time; doing so at run-time is the objective).
At the moment I have attempted this bit of code (assume t is the type that Solve is stored in, and m is a MethodInfo of Solve):
private void WrapMethod(Type t, MethodInfo m)
{
// Generate ILasm for delegate.
byte[] il = typeof(Dpm).GetMethod("ReplacedSolve").GetMethodBody().GetILAsByteArray();
// Pin the bytes in the garbage collection.
GCHandle h = GCHandle.Alloc((object)il, GCHandleType.Pinned);
IntPtr addr = h.AddrOfPinnedObject();
int size = il.Length;
// Swap the method.
MethodRental.SwapMethodBody(t, m.MetadataToken, addr, size, MethodRental.JitImmediate);
}
public DTask<bool> ReplacedSolve(int n, DEvent<bool> callback)
{
Console.WriteLine("This was executed instead!");
return true;
}
However, MethodRental.SwapMethodBody only works on dynamic modules; not those that have already been compiled and stored in the assembly.
So I'm looking for a way to effectively do SwapMethodBody on a method that is already stored in a loaded and executing assembly.
Note, it is not an issue if I have to completely copy the method into a dynamic module, but in this case I need to find a way to copy across the IL as well as update all of the calls to Solve() such that they would point to the new copy.
Disclosure: Harmony is a library that was written and is maintained by me, the author of this post.
Harmony 2 is an open source library (MIT license) designed to replace, decorate or modify existing C# methods of any kind during runtime. Its main focus is games and plugins written in Mono or .NET. It takes care of multiple changes to the same method - they accumulate instead of overwrite each other.
It creates dynamic replacement methods for every original method and emits code to them that calls custom methods at the start and end. It also allows you to write filters to process the original IL code and custom exception handlers which allows for more detailed manipulation of the original method.
To complete the process, it writes a simple assembler jump into the trampoline of the original method that points to the assembler generated from compiling the dynamic method. This works for 32/64-bit on Windows, macOS and any Linux that Mono supports.
Documentation can be found here.
Example
(Source)
Original Code
public class SomeGameClass
{
private bool isRunning;
private int counter;
private int DoSomething()
{
if (isRunning)
{
counter++;
return counter * 10;
}
}
}
Patching with Harmony annotations
using SomeGame;
using HarmonyLib;
public class MyPatcher
{
// make sure DoPatching() is called at start either by
// the mod loader or by your injector
public static void DoPatching()
{
var harmony = new Harmony("com.example.patch");
harmony.PatchAll();
}
}
[HarmonyPatch(typeof(SomeGameClass))]
[HarmonyPatch("DoSomething")]
class Patch01
{
static FieldRef<SomeGameClass,bool> isRunningRef =
AccessTools.FieldRefAccess<SomeGameClass, bool>("isRunning");
static bool Prefix(SomeGameClass __instance, ref int ___counter)
{
isRunningRef(__instance) = true;
if (___counter > 100)
return false;
___counter = 0;
return true;
}
static void Postfix(ref int __result)
{
__result *= 2;
}
}
Alternatively, manual patching with reflection
using SomeGame;
using System.Reflection;
using HarmonyLib;
public class MyPatcher
{
// make sure DoPatching() is called at start either by
// the mod loader or by your injector
public static void DoPatching()
{
var harmony = new Harmony("com.example.patch");
var mOriginal = typeof(SomeGameClass).GetMethod("DoSomething", BindingFlags.Instance | BindingFlags.NonPublic);
var mPrefix = typeof(MyPatcher).GetMethod("MyPrefix", BindingFlags.Static | BindingFlags.Public);
var mPostfix = typeof(MyPatcher).GetMethod("MyPostfix", BindingFlags.Static | BindingFlags.Public);
// add null checks here
harmony.Patch(mOriginal, new HarmonyMethod(mPrefix), new HarmonyMethod(mPostfix));
}
public static void MyPrefix()
{
// ...
}
public static void MyPostfix()
{
// ...
}
}
For .NET 4 and above
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
namespace InjectionTest
{
class Program
{
static void Main(string[] args)
{
Target targetInstance = new Target();
targetInstance.test();
Injection.install(1);
Injection.install(2);
Injection.install(3);
Injection.install(4);
targetInstance.test();
Console.Read();
}
}
public class Target
{
public void test()
{
targetMethod1();
Console.WriteLine(targetMethod2());
targetMethod3("Test");
targetMethod4();
}
private void targetMethod1()
{
Console.WriteLine("Target.targetMethod1()");
}
private string targetMethod2()
{
Console.WriteLine("Target.targetMethod2()");
return "Not injected 2";
}
public void targetMethod3(string text)
{
Console.WriteLine("Target.targetMethod3("+text+")");
}
private void targetMethod4()
{
Console.WriteLine("Target.targetMethod4()");
}
}
public class Injection
{
public static void install(int funcNum)
{
MethodInfo methodToReplace = typeof(Target).GetMethod("targetMethod"+ funcNum, BindingFlags.Instance | BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.Public);
MethodInfo methodToInject = typeof(Injection).GetMethod("injectionMethod"+ funcNum, BindingFlags.Instance | BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.Public);
RuntimeHelpers.PrepareMethod(methodToReplace.MethodHandle);
RuntimeHelpers.PrepareMethod(methodToInject.MethodHandle);
unsafe
{
if (IntPtr.Size == 4)
{
int* inj = (int*)methodToInject.MethodHandle.Value.ToPointer() + 2;
int* tar = (int*)methodToReplace.MethodHandle.Value.ToPointer() + 2;
#if DEBUG
Console.WriteLine("\nVersion x86 Debug\n");
byte* injInst = (byte*)*inj;
byte* tarInst = (byte*)*tar;
int* injSrc = (int*)(injInst + 1);
int* tarSrc = (int*)(tarInst + 1);
*tarSrc = (((int)injInst + 5) + *injSrc) - ((int)tarInst + 5);
#else
Console.WriteLine("\nVersion x86 Release\n");
*tar = *inj;
#endif
}
else
{
long* inj = (long*)methodToInject.MethodHandle.Value.ToPointer()+1;
long* tar = (long*)methodToReplace.MethodHandle.Value.ToPointer()+1;
#if DEBUG
Console.WriteLine("\nVersion x64 Debug\n");
byte* injInst = (byte*)*inj;
byte* tarInst = (byte*)*tar;
int* injSrc = (int*)(injInst + 1);
int* tarSrc = (int*)(tarInst + 1);
*tarSrc = (((int)injInst + 5) + *injSrc) - ((int)tarInst + 5);
#else
Console.WriteLine("\nVersion x64 Release\n");
*tar = *inj;
#endif
}
}
}
private void injectionMethod1()
{
Console.WriteLine("Injection.injectionMethod1");
}
private string injectionMethod2()
{
Console.WriteLine("Injection.injectionMethod2");
return "Injected 2";
}
private void injectionMethod3(string text)
{
Console.WriteLine("Injection.injectionMethod3 " + text);
}
private void injectionMethod4()
{
System.Diagnostics.Process.Start("calc");
}
}
}
You CAN modify a method's content at runtime. But you're not supposed to, and it's strongly recommended to keep that for test purposes.
Just have a look at:
http://www.codeproject.com/Articles/463508/NET-CLR-Injection-Modify-IL-Code-during-Run-time
Basically, you can:
Get IL method content via MethodInfo.GetMethodBody().GetILAsByteArray()
Mess with these bytes.
If you just wish to prepend or append some code, then just preprend/append opcodes you want (be careful about leaving stack clean, though)
Here are some tips to "uncompile" existing IL:
Bytes returned are a sequence of IL instructions, followed by their arguments (if they have some - for instance, '.call' has one argument: the called method token, and '.pop' has none)
Correspondence between IL codes and bytes you find in the returned array may be found using OpCodes.YourOpCode.Value (which is the real opcode byte value as saved in your assembly)
Arguments appended after IL codes may have different sizes (from one to several bytes), depending on opcode called
You may find tokens that theses arguments are referring to via appropriate methods. For instance, if your IL contains ".call 354354" (coded as 28 00 05 68 32 in hexa, 28h=40 being '.call' opcode and 56832h=354354), corresponding called method can be found using MethodBase.GetMethodFromHandle(354354)
Once modified, you IL byte array can be reinjected via InjectionHelper.UpdateILCodes(MethodInfo method, byte[] ilCodes) - see link mentioned above
This is the "unsafe" part... It works well, but this consists in hacking internal CLR mechanisms...
you can replace it if the method is non virtual, non generic, not in generic type, not inlined and on x86 plateform:
MethodInfo methodToReplace = ...
RuntimeHelpers.PrepareMetod(methodToReplace.MethodHandle);
var getDynamicHandle = Delegate.CreateDelegate(Metadata<Func<DynamicMethod, RuntimeMethodHandle>>.Type, Metadata<DynamicMethod>.Type.GetMethod("GetMethodDescriptor", BindingFlags.Instance | BindingFlags.NonPublic)) as Func<DynamicMethod, RuntimeMethodHandle>;
var newMethod = new DynamicMethod(...);
var body = newMethod.GetILGenerator();
body.Emit(...) // do what you want.
body.Emit(OpCodes.jmp, methodToReplace);
body.Emit(OpCodes.ret);
var handle = getDynamicHandle(newMethod);
RuntimeHelpers.PrepareMethod(handle);
*((int*)new IntPtr(((int*)methodToReplace.MethodHandle.Value.ToPointer() + 2)).ToPointer()) = handle.GetFunctionPointer().ToInt32();
//all call on methodToReplace redirect to newMethod and methodToReplace is called in newMethod and you can continue to debug it, enjoy.
Based on the answer to this question and another, ive came up with this tidied up version:
// Note: This method replaces methodToReplace with methodToInject
// Note: methodToInject will still remain pointing to the same location
public static unsafe MethodReplacementState Replace(this MethodInfo methodToReplace, MethodInfo methodToInject)
{
//#if DEBUG
RuntimeHelpers.PrepareMethod(methodToReplace.MethodHandle);
RuntimeHelpers.PrepareMethod(methodToInject.MethodHandle);
//#endif
MethodReplacementState state;
IntPtr tar = methodToReplace.MethodHandle.Value;
if (!methodToReplace.IsVirtual)
tar += 8;
else
{
var index = (int)(((*(long*)tar) >> 32) & 0xFF);
var classStart = *(IntPtr*)(methodToReplace.DeclaringType.TypeHandle.Value + (IntPtr.Size == 4 ? 40 : 64));
tar = classStart + IntPtr.Size * index;
}
var inj = methodToInject.MethodHandle.Value + 8;
#if DEBUG
tar = *(IntPtr*)tar + 1;
inj = *(IntPtr*)inj + 1;
state.Location = tar;
state.OriginalValue = new IntPtr(*(int*)tar);
*(int*)tar = *(int*)inj + (int)(long)inj - (int)(long)tar;
return state;
#else
state.Location = tar;
state.OriginalValue = *(IntPtr*)tar;
* (IntPtr*)tar = *(IntPtr*)inj;
return state;
#endif
}
}
public struct MethodReplacementState : IDisposable
{
internal IntPtr Location;
internal IntPtr OriginalValue;
public void Dispose()
{
this.Restore();
}
public unsafe void Restore()
{
#if DEBUG
*(int*)Location = (int)OriginalValue;
#else
*(IntPtr*)Location = OriginalValue;
#endif
}
}
There exists a couple of frameworks that allows you to dynamically change any method at runtime (they use the ICLRProfiling interface mentioned by user152949):
Prig: Free and Open Source!
Microsoft Fakes: Commercial, included in Visual Studio Premium and Ultimate but not Community and Professional
Telerik JustMock: Commercial, a "lite" version is available
Typemock Isolator: Commercial
There are also a few frameworks that mocks around with the internals of .NET, these are likely more fragile, and probably can't change inlined code, but on the other hand they are fully self-contained and does not require you to use a custom launcher.
Harmony: MIT licensed. Seems to actually have been used sucessfully in a few game mods, supports both .NET and Mono.
Deviare In Process Instrumentation Engine: GPLv3 and Commercial. .NET support currently marked as experimental, but on the other hand has the benefit of being commercially backed.
Logman's solution, but with an interface for swapping method bodies. Also, a simpler example.
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
namespace DynamicMojo
{
class Program
{
static void Main(string[] args)
{
Animal kitty = new HouseCat();
Animal lion = new Lion();
var meow = typeof(HouseCat).GetMethod("Meow", BindingFlags.Instance | BindingFlags.NonPublic);
var roar = typeof(Lion).GetMethod("Roar", BindingFlags.Instance | BindingFlags.NonPublic);
Console.WriteLine("<==(Normal Run)==>");
kitty.MakeNoise(); //HouseCat: Meow.
lion.MakeNoise(); //Lion: Roar!
Console.WriteLine("<==(Dynamic Mojo!)==>");
DynamicMojo.SwapMethodBodies(meow, roar);
kitty.MakeNoise(); //HouseCat: Roar!
lion.MakeNoise(); //Lion: Meow.
Console.WriteLine("<==(Normality Restored)==>");
DynamicMojo.SwapMethodBodies(meow, roar);
kitty.MakeNoise(); //HouseCat: Meow.
lion.MakeNoise(); //Lion: Roar!
Console.Read();
}
}
public abstract class Animal
{
public void MakeNoise() => Console.WriteLine($"{this.GetType().Name}: {GetSound()}");
protected abstract string GetSound();
}
public sealed class HouseCat : Animal
{
protected override string GetSound() => Meow();
private string Meow() => "Meow.";
}
public sealed class Lion : Animal
{
protected override string GetSound() => Roar();
private string Roar() => "Roar!";
}
public static class DynamicMojo
{
/// <summary>
/// Swaps the function pointers for a and b, effectively swapping the method bodies.
/// </summary>
/// <exception cref="ArgumentException">
/// a and b must have same signature
/// </exception>
/// <param name="a">Method to swap</param>
/// <param name="b">Method to swap</param>
public static void SwapMethodBodies(MethodInfo a, MethodInfo b)
{
if (!HasSameSignature(a, b))
{
throw new ArgumentException("a and b must have have same signature");
}
RuntimeHelpers.PrepareMethod(a.MethodHandle);
RuntimeHelpers.PrepareMethod(b.MethodHandle);
unsafe
{
if (IntPtr.Size == 4)
{
int* inj = (int*)b.MethodHandle.Value.ToPointer() + 2;
int* tar = (int*)a.MethodHandle.Value.ToPointer() + 2;
byte* injInst = (byte*)*inj;
byte* tarInst = (byte*)*tar;
int* injSrc = (int*)(injInst + 1);
int* tarSrc = (int*)(tarInst + 1);
int tmp = *tarSrc;
*tarSrc = (((int)injInst + 5) + *injSrc) - ((int)tarInst + 5);
*injSrc = (((int)tarInst + 5) + tmp) - ((int)injInst + 5);
}
else
{
throw new NotImplementedException($"{nameof(SwapMethodBodies)} doesn't yet handle IntPtr size of {IntPtr.Size}");
}
}
}
private static bool HasSameSignature(MethodInfo a, MethodInfo b)
{
bool sameParams = !a.GetParameters().Any(x => !b.GetParameters().Any(y => x == y));
bool sameReturnType = a.ReturnType == b.ReturnType;
return sameParams && sameReturnType;
}
}
}
Based on TakeMeAsAGuest's answer, here's a similar extension which does not require to use unsafe blocks.
Here's the Extensions class:
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
namespace MethodRedirect
{
static class Extensions
{
public static void RedirectTo(this MethodInfo origin, MethodInfo target)
{
IntPtr ori = GetMethodAddress(origin);
IntPtr tar = GetMethodAddress(target);
Marshal.Copy(new IntPtr[] { Marshal.ReadIntPtr(tar) }, 0, ori, 1);
}
private static IntPtr GetMethodAddress(MethodInfo mi)
{
const ushort SLOT_NUMBER_MASK = 0xffff; // 2 bytes mask
const int MT_OFFSET_32BIT = 0x28; // 40 bytes offset
const int MT_OFFSET_64BIT = 0x40; // 64 bytes offset
IntPtr address;
// JIT compilation of the method
RuntimeHelpers.PrepareMethod(mi.MethodHandle);
IntPtr md = mi.MethodHandle.Value; // MethodDescriptor address
IntPtr mt = mi.DeclaringType.TypeHandle.Value; // MethodTable address
if (mi.IsVirtual)
{
// The fixed-size portion of the MethodTable structure depends on the process type
int offset = IntPtr.Size == 4 ? MT_OFFSET_32BIT : MT_OFFSET_64BIT;
// First method slot = MethodTable address + fixed-size offset
// This is the address of the first method of any type (i.e. ToString)
IntPtr ms = Marshal.ReadIntPtr(mt + offset);
// Get the slot number of the virtual method entry from the MethodDesc data structure
long shift = Marshal.ReadInt64(md) >> 32;
int slot = (int)(shift & SLOT_NUMBER_MASK);
// Get the virtual method address relative to the first method slot
address = ms + (slot * IntPtr.Size);
}
else
{
// Bypass default MethodDescriptor padding (8 bytes)
// Reach the CodeOrIL field which contains the address of the JIT-compiled code
address = md + 8;
}
return address;
}
}
}
And here's a simple usage example:
using System;
using System.Reflection;
namespace MethodRedirect
{
class Scenario
{
static void Main(string[] args)
{
Assembly assembly = Assembly.GetAssembly(typeof(Scenario));
Type Scenario_Type = assembly.GetType("MethodRedirect.Scenario");
MethodInfo Scenario_InternalInstanceMethod = Scenario_Type.GetMethod("InternalInstanceMethod", BindingFlags.Instance | BindingFlags.NonPublic);
MethodInfo Scenario_PrivateInstanceMethod = Scenario_Type.GetMethod("PrivateInstanceMethod", BindingFlags.Instance | BindingFlags.NonPublic);
Scenario_InternalInstanceMethod.RedirectTo(Scenario_PrivateInstanceMethod);
// Using dynamic type to prevent method string caching
dynamic scenario = (Scenario)Activator.CreateInstance(Scenario_Type);
bool result = scenario.InternalInstanceMethod() == "PrivateInstanceMethod";
Console.WriteLine("\nRedirection {0}", result ? "SUCCESS" : "FAILED");
Console.ReadKey();
}
internal string InternalInstanceMethod()
{
return "InternalInstanceMethod";
}
private string PrivateInstanceMethod()
{
return "PrivateInstanceMethod";
}
}
}
This is distilled from a more detailed project I made available on Github (MethodRedirect).
Remark : The code was implemented using .NET Framework 4 and it has not been tested on newer version of .NET.
You can replace a method at runtime by using the ICLRPRofiling Interface.
Call AttachProfiler to attach to the process.
Call SetILFunctionBody to replace the method code.
See this blog for more details.
I know it is not the exact answer to your question, but the usual way to do it is using factories/proxy approach.
First we declare a base type.
public class SimpleClass
{
public virtual DTask<bool> Solve(int n, DEvent<bool> callback)
{
for (int m = 2; m < n - 1; m += 1)
if (m % n == 0)
return false;
return true;
}
}
Then we can declare a derived type (call it proxy).
public class DistributedClass
{
public override DTask<bool> Solve(int n, DEvent<bool> callback)
{
CodeToExecuteBefore();
return base.Slove(n, callback);
}
}
// At runtime
MyClass myInstance;
if (distributed)
myInstance = new DistributedClass();
else
myInstance = new SimpleClass();
The derived type can be also generated at runtime.
public static class Distributeds
{
private static readonly ConcurrentDictionary<Type, Type> pDistributedTypes = new ConcurrentDictionary<Type, Type>();
public Type MakeDistributedType(Type type)
{
Type result;
if (!pDistributedTypes.TryGetValue(type, out result))
{
if (there is at least one method that have [Distributed] attribute)
{
result = create a new dynamic type that inherits the specified type;
}
else
{
result = type;
}
pDistributedTypes[type] = result;
}
return result;
}
public T MakeDistributedInstance<T>()
where T : class
{
Type type = MakeDistributedType(typeof(T));
if (type != null)
{
// Instead of activator you can also register a constructor delegate generated at runtime if performances are important.
return Activator.CreateInstance(type);
}
return null;
}
}
// In your code...
MyClass myclass = Distributeds.MakeDistributedInstance<MyClass>();
myclass.Solve(...);
The only performance loss is during construction of the derived object, the first time is quite slow because it will use a lot of reflection and reflection emit.
All other times, it is the cost of a concurrent table lookup and a constructor.
As said, you can optimize construction using
ConcurrentDictionary<Type, Func<object>>.
have a look into Mono.Cecil:
using Mono.Cecil;
using Mono.Cecil.Inject;
public class Patcher
{
public void Patch()
{
// Load the assembly that contains the hook method
AssemblyDefinition hookAssembly = AssemblyLoader.LoadAssembly("MyHookAssembly.dll");
// Load the assembly
AssemblyDefinition targetAssembly = AssemblyLoader.LoadAssembly("TargetAssembly.dll");
// Get the method definition for the injection definition
MethodDefinition myHook = hookAssembly.MainModule.GetType("HookNamespace.MyHookClass").GetMethod("MyHook");
// Get the method definition for the injection target.
// Note that in this example class Bar is in the global namespace (no namespace), which is why we don't specify the namespace.
MethodDefinition foo = targetAssembly.MainModule.GetType("Bar").GetMethod("Foo");
// Create the injector
InjectionDefinition injector = new InjectionDefinition(foo, myHook, InjectFlags.PassInvokingInstance | InjectFlags.passParametersVal);
// Perform the injection with default settings (inject into the beginning before the first instruction)
injector.Inject();
// More injections or saving the target assembly...
}
}
I have very simple Bitwise Operator methods that I want to use like this:
myInt.SetBit(int k, bool set)
so that it changes the bit at index 'k' to the value 'set' (0 or 1) I first thought of doing it like this:
public static void SetBit(this int A, int k, bool set) {
if (set) A |= 1 << k;
else A &= ~(1 << k);
}
but of course this only changes the internal value of variable 'A', and not the original variable, since integer isn't a reference type.
I can't use 'ref' along with 'this' so I don't know how to turn this into a reference parameter.
I already have similar methods for Int arrays, but those work fine since an array is a reference type.
I'm looking for a way to keep this handy syntax for single integers, if there is one.
You shouldn't treat it as a reference type.
Make your method return the modified value and assign it back to your variable. That approach will be consistent to immutable types. Consider example of String.Replace, it doesn't modify the string in place, instead it returns a modified copy.
public static int SetBit(this int A, int k, bool set)
{
if (set) A |= 1 << k;
else A &= ~(1 << k);
return A;
}
Some types (like ints) are immutable, meaning that once they are set to a value, they cannot be changed again. You'll notice that any method working on an int will return a new int rather than changing the value of the given value.
Update your code like this:
public static int SetBit(this int A, int k, bool set) {
if (set) A |= 1 << k;
else A &= ~(1 << k);
return A;
}
and use it like:
var x = 5;
x = x.SetBit(1, true);
I suggest just returning the result instead of trying to change immutable int:
// int instead of void
public static int SetBit(this int A, int k, bool set) {
return set ? (A | (1 << k)) : (A & (~(1 << k)));
}
So you can do
int source = 12345;
int result = source.SetBit(3, true);
I had to check on this because it looks a little odd. Sure enough - you can't write an extension method using ref.
See here, someone else has tried this.
You cannot mix this and ref in the extension methods. You have many choices:
Returning the result from the extension method (I prefer this option):
public static int SetBit(this int A, int k, bool set)
{
if (set) A |= 1 << k;
else A &= ~(1 << k);
return A;
}
using:
int i = 3;
i = i.SetBit(1, false);
Using a method with ref:
public static void SetBitRef(ref int A, int k, bool set)
{
if (set) A |= 1 << k;
else A &= ~(1 << k);
}
using:
int i = 3;
IntExtensions.SetBitRef(ref i, 1, false);
Using an IntWrapper class instead of int:
class IntWrapper
{
public IntWrapper(int intValue)
{
Value = intValue;
}
public int Value { get; set; }
}
with a reference type you can create this extension method:
public static void SetBit(this IntWrapper A, int k, bool set)
{
int intValue = A.Value;
if (set) intValue |= 1 << k;
else intValue &= ~(1 << k);
A.Value = intValue;
}
using:
IntWrapper iw = new IntWrapper(3);
iw.SetBit(1, false);
I am trying to process a WM_MOUSEMOVE message in C#.
What is the proper way to get an X and Y coordinate from lParam which is a type of IntPtr?
Try:
(note that this was the initial version, read below for the final version)
IntPtr xy = value;
int x = unchecked((short)xy);
int y = unchecked((short)((uint)xy >> 16));
The unchecked normally isn't necessary (because the "default" c# projects are unchecked)
Consider that these are the definitions of the used macros:
#define LOWORD(l) ((WORD)(((DWORD_PTR)(l)) & 0xffff))
#define HIWORD(l) ((WORD)((((DWORD_PTR)(l)) >> 16) & 0xffff))
#define GET_X_LPARAM(lp) ((int)(short)LOWORD(lp))
#define GET_Y_LPARAM(lp) ((int)(short)HIWORD(lp))
Where WORD == ushort, DWORD == uint. I'm cutting some ushort->short conversions.
Addendum:
one and half year later, and having experienced the "vagaries" of 64 bits .NET, I concur with Celess (but note that 99% of the Windows messages are still 32 bits for reasons of compatibility, so I don't think the problem isn't really big now. It's more for the future and because if you want to do something, you should do it correctly.)
The only thing I would make different is this:
IntPtr xy = value;
int x = unchecked((short)(long)xy);
int y = unchecked((short)((long)xy >> 16));
instead of doing the check "is the IntPtr 4 or 8 bytes long", I take the worst case (8 bytes long) and cast xy to a long. With a little luck the double cast (to long and then to short/to uint) will be optimized by the compiler (in the end, the explicit conversion to int of IntPtr is a red herring... If you use it you are putting yourself at risk in the future. You should always use the long conversion and then use it directly/re-cast it to what you need, showing to the future programmers that you knew what you were doing.
A test example: http://ideone.com/a4oGW2 (sadly only 32 bits, but if you have a 64 bits machine you can test the same code)
Correct for both 32 and 64-bit:
Point GetPoint(IntPtr _xy)
{
uint xy = unchecked(IntPtr.Size == 8 ? (uint)_xy.ToInt64() : (uint)_xy.ToInt32());
int x = unchecked((short)xy);
int y = unchecked((short)(xy >> 16));
return new Point(x, y);
}
- or -
int GetIntUnchecked(IntPtr value)
{
return IntPtr.Size == 8 ? unchecked((int)value.ToInt64()) : value.ToInt32();
}
int Low16(IntPtr value)
{
return unchecked((short)GetIntUnchecked(value));
}
int High16(IntPtr value)
{
return unchecked((short)(((uint)GetIntUnchecked(value)) >> 16));
}
These also work:
int Low16(IntPtr value)
{
return unchecked((short)(uint)value); // classic unchecked cast to uint
}
int High16(IntPtr value)
{
return unchecked((short)((uint)value >> 16));
}
- or -
int Low16(IntPtr value)
{
return unchecked((short)(long)value); // presumption about internals
} // is what framework lib uses
int High16(IntPtr value)
{
return unchecked((short)((long)value >> 16));
}
Going the other way
public static IntPtr GetLParam(Point point)
{
return (IntPtr)((point.Y << 16) | (point.X & 0xffff));
} // mask ~= unchecked((int)(short)x)
- or -
public static IntPtr MakeLParam(int low, int high)
{
return (IntPtr)((high << 16) | (low & 0xffff));
} // (IntPtr)x is same as 'new IntPtr(x)'
The accepted answer is good translation of the C definition. If were dealing with just the raw 'void*' directly, then would be mostly ok. However when using 'IntPtr' in a .Net 64-bit execution environment, 'unchecked' will not stop conversion overflow exceptions from being thrown from inside IntPtr. The unchecked block does not affect conversions that happen inside IntPtr funcitons and operators. Currently the accepted answer states that use of 'unchecked' is not necesary. However the use of 'unchecked' is absolutely necessary, as would always be the case in casting to negative values from a larger type.
On 64-bit, from the accepted answer:
var xy = new IntPtr(0x0FFFFFFFFFFFFFFF);
int x = unchecked((short)xy); // <-- throws
int y = unchecked((short)((uint)xy >> 16)); // gets lucky, 'uint' implicit 'long'
y = unchecked((short)((int)xy >> 16)); // <-- throws
xy = new IntPtr(0x00000000FFFF0000); // 0, -1
x = unchecked((short)xy); // <-- throws
y = unchecked((short)((uint)xy >> 16)); // still lucky
y = (short)((uint)xy >> 16); // <-- throws (short), no longer lucky
On 64-bit, using extrapolated version of DmitryG's:
var ptr = new IntPtr(0x0FFFFFFFFFFFFFFF);
var xy = IntPtr.Size == 8 ? (int)ptr.ToInt64() : ptr.ToInt32(); // <-- throws (int)
int x = unchecked((short)xy); // fine, if gets this far
int y = unchecked((short)((uint)xy >> 16)); // fine, if gets this far
y = unchecked((short)(xy >> 16)); // also fine, if gets this far
ptr = new IntPtr(0x00000000FFFF0000); // 0, -1
xy = IntPtr.Size == 8 ? (int)ptr.ToInt64() : ptr.ToInt32(); // <-- throws (int)
On performance
return IntPtr.Size == 8 ? unchecked((int)value.ToInt64()) : value.ToInt32();
The IntPtr.Size property returns a constant as compile time literal that is capable if being inlined across assemblies. Thus is possible for the JIT to have nearly all of this optimized out. Could also do:
return unchecked((int)value.ToInt64());
- or -
return unchecked((int)(long)value);
- or -
return unchecked((uint)value); // traditional
and all 3 of these will always call the equivalient of IntPtr.ToInt64(). ToInt64(), and 'operator long', are also capable of being inlined, but less likely to be. Is much more code in 32-bit version than the Size constant. I would submit that the solution at the top is maybe more symantically correct. Its also important to be aware of sign-extension artifacts, which would fill all 64-bits reguardless on something like (long)int_val, though i've pretty much glossed over that here, however may additionally affect inlining on 32-bit.
Useage
if (Low16(wParam) == NativeMethods.WM_CREATE)) { }
var x = Low16(lParam);
var point = GetPoint(lParam);
A 'safe' IntPtr mockup shown below for future traverlers.
Run this without setting the WIN32 define on 32-bit to get a solid simulation of the 64-bit IntPtr behavour.
public struct IntPtrMock
{
#if WIN32
int m_value;
#else
long m_value;
#endif
int IntPtr_ToInt32() {
#if WIN32
return (int)m_value;
#else
long l = m_value;
return checked((int)l);
#endif
}
public static explicit operator int(IntPtrMock value) { //(short) resolves here
#if WIN32
return (int)value.m_value;
#else
long l = value.m_value;
return checked((int)l); // throws here if any high 32 bits
#endif // check forces sign stay signed
}
public static explicit operator long(IntPtrMock value) { //(uint) resolves here
#if WIN32
return (long)(int)value.m_value;
#else
return (long)value.m_value;
#endif
}
public int ToInt32() {
#if WIN32
return (int)value.m_value;
#else
long l = m_value;
return checked((int)l); // throws here if any high 32 bits
#endif // check forces sign stay signed
}
public long ToInt64() {
#if WIN32
return (long)(int)m_value;
#else
return (long)m_value;
#endif
}
public IntPtrMock(long value) {
#if WIN32
m_value = checked((int)value);
#else
m_value = value;
#endif
}
}
public static IntPtr MAKELPARAM(int low, int high)
{
return (IntPtr)((high << 16) | (low & 0xffff));
}
public Main()
{
var xy = new IntPtrMock(0x0FFFFFFFFFFFFFFF); // simulate 64-bit, overflow smaller
int x = unchecked((short)xy); // <-- throws
int y = unchecked((short)((uint)xy >> 16)); // got lucky, 'uint' implicit 'long'
y = unchecked((short)((int)xy >> 16)); // <-- throws
int xy2 = IntPtr.Size == 8 ? (int)xy.ToInt64() : xy.ToInt32(); // <-- throws
int xy3 = unchecked(IntPtr.Size == 8 ? (int)xy.ToInt64() : xy.ToInt32()); //ok
// proper 32-bit lParam, overflow signed
var xy4 = new IntPtrMock(0x00000000FFFFFFFF); // x = -1, y = -1
int x2 = unchecked((short)xy4); // <-- throws
int xy5 = IntPtr.Size == 8 ? (int)xy4.ToInt64() : xy4.ToInt32(); // <-- throws
var xy6 = new IntPtrMock(0x00000000FFFF0000); // x = 0, y = -1
int x3 = unchecked((short)xy6); // <-- throws
int xy7 = IntPtr.Size == 8 ? (int)xy6.ToInt64() : xy6.ToInt32(); // <-- throws
var xy8 = MAKELPARAM(-1, -1); // WinForms macro
int x4 = unchecked((short)xy8); // <-- throws
int xy9 = IntPtr.Size == 8 ? (int)xy8.ToInt64() : xy8.ToInt32(); // <-- throws
}
Usualy, for low-level mouse processing I have used the following helper (it also considers that IntPtr size depends on x86/x64):
//...
Point point = WinAPIHelper.GetPoint(msg.LParam);
//...
static class WinAPIHelper {
public static Point GetPoint(IntPtr lParam) {
return new Point(GetInt(lParam));
}
public static MouseButtons GetButtons(IntPtr wParam) {
MouseButtons buttons = MouseButtons.None;
int btns = GetInt(wParam);
if((btns & MK_LBUTTON) != 0) buttons |= MouseButtons.Left;
if((btns & MK_RBUTTON) != 0) buttons |= MouseButtons.Right;
return buttons;
}
static int GetInt(IntPtr ptr) {
return IntPtr.Size == 8 ? unchecked((int)ptr.ToInt64()) : ptr.ToInt32();
}
const int MK_LBUTTON = 1;
const int MK_RBUTTON = 2;
}
Is there a method (in c#/.net) that would left-shift (bitwise) each short in a short[] that would be faster then doing it in a loop?
I am talking about data coming from a digital camera (16bit gray), the camera only uses the lower 12 bits. So to see something when rendering the data it needs to be shifted left by 4.
This is what I am doing so far:
byte[] RawData; // from camera along with the other info
if (pf == PixelFormats.Gray16)
{
fixed (byte* ptr = RawData)
{
short* wptr = (short*)ptr;
short temp;
for (int line = 0; line < ImageHeight; line++)
{
for (int pix = 0; pix < ImageWidth; pix++)
{
temp = *(wptr + (pix + line * ImageWidth));
*(wptr + (pix + line * ImageWidth)) = (short)(temp << 4);
}
}
}
}
Any ideas?
I don't know of a library method that will do it, but I have some suggestions that might help. This will only work if you know that the upper four bits of the pixel are definitely zero (rather than garbage). (If they are garbage, you'd have to add bitmasks to the below). Basically I would propose:
Using a shift operator on a larger data type (int or long) so that you are shifting more data at once
Getting rid of the multiply operations inside your loop
Doing a little loop unrolling
Here is my code:
using System.Diagnostics;
namespace ConsoleApplication9 {
class Program {
public static void Main() {
Crazy();
}
private static unsafe void Crazy() {
short[] RawData={
0x000, 0x111, 0x222, 0x333, 0x444, 0x555, 0x666, 0x777, 0x888,
0x999, 0xaaa, 0xbbb, 0xccc, 0xddd, 0xeee, 0xfff, 0x123, 0x456,
//extra sentinel value which is just here to demonstrate that the algorithm
//doesn't go too far
0xbad
};
const int ImageHeight=2;
const int ImageWidth=9;
var numShorts=ImageHeight*ImageWidth;
fixed(short* rawDataAsShortPtr=RawData) {
var nextLong=(long*)rawDataAsShortPtr;
//1 chunk of 4 longs
// ==8 ints
// ==16 shorts
while(numShorts>=16) {
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
numShorts-=16;
}
var nextShort=(short*)nextLong;
while(numShorts>0) {
*nextShort=(short)(*nextShort<<4);
nextShort++;
numShorts--;
}
}
foreach(var item in RawData) {
Debug.Print("{0:X4}", item);
}
}
}
}
I need to maintain a roster of connected clients that are very shortlived and frequently go up and down. Due to the potential number of clients I need a collection that supports fast insert/delete. Suggestions?
C5 Generic Collection Library
The best implementations I have found in C# and C++ are these -- for C#/CLI:
http://www.itu.dk/research/c5/Release1.1/ITU-TR-2006-76.pdf
http://www.itu.dk/research/c5/
It's well researched, has extensible unit tests, and since February they also have implemented the common interfaces in .Net which makes it a lot easier to work with the collections. They were featured on Channel9 and they've done extensive performance testing on the collections.
If you are using data-structures anyway these researchers have a red-black-tree implementation in their library, similar to what you find if you fire up Lütz reflector and have a look in System.Data's internal structures :p. Insert-complexity: O(log(n)).
Lock-free C++ collections
Then, if you can allow for some C++ interop and you absolutely need the speed and want as little overhead as possible, then these lock-free ADTs from Dmitriy V'jukov are probably the best you can get in this world, outperforming Intel's concurrent library of ADTs.
http://groups.google.com/group/lock-free
I've read some of the code and it's really the makings of someone well versed in how these things are put together. VC++ can do native C++ interop without annoying boundaries. http://www.swig.org/ can otherwise help you wrap C++ interfaces for consumption in .Net, or you can do it yourself through P/Invoke.
Microsoft's Take
They have written tutorials, this one implementing a rather unpolished skip-list in C#, and discussing other types of data-structures. (There's a better SkipList at CodeProject, which is very polished and implement the interfaces in a well-behaved manner.) They also have a few data-structures bundled with .Net, namely the HashTable/Dictionary<,> and HashSet. Of course there's the "ResizeArray"/List type as well together with a stack and queue, but they are all "linear" on search.
Google's perf-tools
If you wish to speed up the time it takes for memory-allocation you can use google's perf-tools. They are available at google code and they contain a very interesting multi-threaded malloc-implementation (TCMalloc) which shows much more consistent timing than the normal malloc does. You could use this together with the lock-free structures above to really go crazy with performance.
Improving response times with memoization
You can also use memoization on functions to improve performance through caching, something interesting if you're using e.g. F#. F# also allows C++ interop, so you're OK there.
O(k)
There's also the possibility of doing something on your own using the research which has been done on bloom-filters, which allow O(k) lookup complexity where k is a constant that depends on the number of hash-functions you have implemented. This is how google's BigTable has been implemented. These filter will get you the element if it's in the set or possibly with a very low likeliness an element which is not the one you're looking for (see the graph at wikipedia -- it's approaching P(wrong_key) -> 0.01 as size is around 10000 elements, but you can go around this by implementing further hash-functions/reducing the set.
I haven't searched for .Net implementations of this, but since the hashing calculations are independent you can use MS's performance team's implementation of Tasks to speed that up.
"My" take -- randomize to reach average O(log n)
As it happens I just did a coursework involving data-structures. In this case we used C++, but it's very easy to translate to C#. We built three different data-structures; a bloom-filter, a skip-list and random binary search tree.
See the code and analysis after the last paragraph.
Hardware-based "collections"
Finally, to make my answer "complete", if you truly need speed you can use something like Routing-tables or Content-addressable memory . This allows you to very quickly O(1) in principle get a "hash"-to-value lookup of your data.
Random Binary Search Tree/Bloom Filter C++ code
I would really appreciate feedback if you find mistakes in the code, or just pointers on how I can do it better (or with better usage of templates). Note that the bloom filter isn't like it would be in real life; normally you don't have to be able to delete from it and then it much much more space efficient than the hack I did to allow the delete to be tested.
DataStructure.h
#ifndef DATASTRUCTURE_H_
#define DATASTRUCTURE_H_
class DataStructure
{
public:
DataStructure() {countAdd=0; countDelete=0;countFind=0;}
virtual ~DataStructure() {}
void resetCountAdd() {countAdd=0;}
void resetCountFind() {countFind=0;}
void resetCountDelete() {countDelete=0;}
unsigned int getCountAdd(){return countAdd;}
unsigned int getCountDelete(){return countDelete;}
unsigned int getCountFind(){return countFind;}
protected:
unsigned int countAdd;
unsigned int countDelete;
unsigned int countFind;
};
#endif /*DATASTRUCTURE_H_*/
Key.h
#ifndef KEY_H_
#define KEY_H_
#include <string>
using namespace std;
const int keyLength = 128;
class Key : public string
{
public:
Key():string(keyLength, ' ') {}
Key(const char in[]): string(in){}
Key(const string& in): string(in){}
bool operator<(const string& other);
bool operator>(const string& other);
bool operator==(const string& other);
virtual ~Key() {}
};
#endif /*KEY_H_*/
Key.cpp
#include "Key.h"
bool Key::operator<(const string& other)
{
return compare(other) < 0;
};
bool Key::operator>(const string& other)
{
return compare(other) > 0;
};
bool Key::operator==(const string& other)
{
return compare(other) == 0;
}
BloomFilter.h
#ifndef BLOOMFILTER_H_
#define BLOOMFILTER_H_
#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include "Key.h"
#include "DataStructure.h"
#define LONG_BIT 32
#define bitmask(val) (unsigned long)(1 << (LONG_BIT - (val % LONG_BIT) - 1))
// TODO: Implement RW-locking on the reads/writes to the bitmap.
class BloomFilter : public DataStructure
{
public:
BloomFilter(){}
BloomFilter(unsigned long length){init(length);}
virtual ~BloomFilter(){}
void init(unsigned long length);
void dump();
void add(const Key& key);
void del(const Key& key);
/**
* Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't.
*/
bool testExist(const Key& key, bool v = false);
private:
unsigned long hash1(const Key& key);
unsigned long hash2(const Key& key);
bool exist(const Key& key);
void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key);
void getCountIndicies(const int i1, const unsigned long h1,
const int i2, const unsigned long h2, int& i1_c, int& i2_c);
vector<unsigned long> m_tickBook;
vector<unsigned int> m_useCounts;
unsigned long m_length; // number of bits in the bloom filter
unsigned long m_pockets; //the number of pockets
static const unsigned long m_pocketSize; //bits in each pocket
};
#endif /*BLOOMFILTER_H_*/
BloomFilter.cpp
#include "BloomFilter.h"
const unsigned long BloomFilter::m_pocketSize = LONG_BIT;
void BloomFilter::init(unsigned long length)
{
//m_length = length;
m_length = (unsigned long)((2.0*length)/log(2))+1;
m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize));
m_tickBook.resize(m_pockets);
// my own (allocate nr bits possible to store in the other vector)
m_useCounts.resize(m_pockets * m_pocketSize);
unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0;
for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own
}
unsigned long BloomFilter::hash1(const Key& key)
{
unsigned long hash = 5381;
unsigned int i=0; for (i=0; i< key.length(); i++){
hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */
}
double d_hash = (double) hash;
d_hash *= (0.5*(sqrt(5)-1));
d_hash -= floor(d_hash);
d_hash *= (double)m_length;
return (unsigned long)floor(d_hash);
}
unsigned long BloomFilter::hash2(const Key& key)
{
unsigned long hash = 0;
unsigned int i=0; for (i=0; i< key.length(); i++){
hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash;
}
double d_hash = (double) hash;
d_hash *= (0.5*(sqrt(5)-1));
d_hash -= floor(d_hash);
d_hash *= (double)m_length;
return (unsigned long)floor(d_hash);
}
bool BloomFilter::testExist(const Key& key, bool v){
if(exist(key)) {
if(v) cout<<"Key "<< key<<" is in the set"<<endl;
return true;
}else {
if(v) cout<<"Key "<< key<<" is not in the set"<<endl;
return false;
}
}
void BloomFilter::dump()
{
cout<<m_pockets<<" Pockets: ";
// I changed u to %p because I wanted it printed in hex.
unsigned long i; for(i=0; i< m_pockets; i++) printf("%p ", (void*)m_tickBook[i]);
cout<<endl;
}
void BloomFilter::add(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
int i1_c, i2_c;
// tested!
getHashAndIndicies(h1, h2, i1, i2, key);
getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
m_tickBook[i1] = m_tickBook[i1] | bitmask(h1);
m_tickBook[i2] = m_tickBook[i2] | bitmask(h2);
m_useCounts[i1_c] = m_useCounts[i1_c] + 1;
m_useCounts[i2_c] = m_useCounts[i2_c] + 1;
countAdd++;
}
void BloomFilter::del(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
int i1_c, i2_c;
if (!exist(key)) throw "You can't delete keys which are not in the bloom filter!";
// First we need the indicies into m_tickBook and the
// hashes.
getHashAndIndicies(h1, h2, i1, i2, key);
// The index of the counter is the index into the bitvector
// times the number of bits per vector item plus the offset into
// that same vector item.
getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
// We need to update the value in the bitvector in order to
// delete the key.
m_useCounts[i1_c] = (m_useCounts[i1_c] == 1 ? 0 : m_useCounts[i1_c] - 1);
m_useCounts[i2_c] = (m_useCounts[i2_c] == 1 ? 0 : m_useCounts[i2_c] - 1);
// Now, if we depleted the count for a specific bit, then set it to
// zero, by anding the complete unsigned long with the notted bitmask
// of the hash value
if (m_useCounts[i1_c] == 0)
m_tickBook[i1] = m_tickBook[i1] & ~(bitmask(h1));
if (m_useCounts[i2_c] == 0)
m_tickBook[i2] = m_tickBook[i2] & ~(bitmask(h2));
countDelete++;
}
bool BloomFilter::exist(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
countFind++;
getHashAndIndicies(h1, h2, i1, i2, key);
return ((m_tickBook[i1] & bitmask(h1)) > 0) &&
((m_tickBook[i2] & bitmask(h2)) > 0);
}
/*
* Gets the values of the indicies for two hashes and places them in
* the passed parameters. The index is into m_tickBook.
*/
void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1,
int& i2, const Key& key)
{
h1 = hash1(key);
h2 = hash2(key);
i1 = (int) h1/m_pocketSize;
i2 = (int) h2/m_pocketSize;
}
/*
* Gets the values of the indicies into the count vector, which keeps
* track of how many times a specific bit-position has been used.
*/
void BloomFilter::getCountIndicies(const int i1, const unsigned long h1,
const int i2, const unsigned long h2, int& i1_c, int& i2_c)
{
i1_c = i1*m_pocketSize + h1%m_pocketSize;
i2_c = i2*m_pocketSize + h2%m_pocketSize;
}
** RBST.h **
#ifndef RBST_H_
#define RBST_H_
#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include "Key.h"
#include "DataStructure.h"
#define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str);
using namespace std;
class RBSTNode;
class RBSTNode: public Key
{
public:
RBSTNode(const Key& key):Key(key)
{
m_left =NULL;
m_right = NULL;
m_size = 1U; // the size of one node is 1.
}
virtual ~RBSTNode(){}
string setKey(const Key& key){return Key(key);}
RBSTNode* left(){return m_left; }
RBSTNode* right(){return m_right;}
RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; }
RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; }
#ifdef DEBUG
ostream& print(ostream& out)
{
out << "Key(" << *this << ", m_size: " << m_size << ")";
return out;
}
#endif
unsigned int size() { return m_size; }
void setSize(unsigned int val)
{
#ifdef DEBUG
this->print(cout);
cout << "::setSize(" << val << ") called." << endl;
#endif
if (val == 0) throw "Cannot set the size below 1, then just delete this node.";
m_size = val;
}
void incSize() {
#ifdef DEBUG
this->print(cout);
cout << "::incSize() called" << endl;
#endif
m_size++;
}
void decrSize()
{
#ifdef DEBUG
this->print(cout);
cout << "::decrSize() called" << endl;
#endif
if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node.";
m_size--;
}
#ifdef DEBUG
unsigned int size(RBSTNode* x);
#endif
private:
RBSTNode(){}
RBSTNode* m_left;
RBSTNode* m_right;
unsigned int m_size;
};
class RBST : public DataStructure
{
public:
RBST() {
m_size = 0;
m_head = NULL;
srand(time(0));
};
virtual ~RBST() {};
/**
* Tries to add key into the tree and will return
* true for a new item added
* false if the key already is in the tree.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool add(const Key& key, bool v=false);
/**
* Same semantics as other add function, but takes a string,
* but diff name, because that'll cause an ambiguity because of inheritance.
*/
bool addString(const string& key);
/**
* Deletes a key from the tree if that key is in the tree.
* Will return
* true for success and
* false for failure.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool del(const Key& key, bool v=false);
/**
* Tries to find the key in the tree and will return
* true if the key is in the tree and
* false if the key is not.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool find(const Key& key, bool v = false);
unsigned int count() { return m_size; }
#ifdef DEBUG
int dump(char sep = ' ');
int dump(RBSTNode* target, char sep);
unsigned int size(RBSTNode* x);
#endif
private:
RBSTNode* randomAdd(RBSTNode* target, const Key& key);
RBSTNode* addRoot(RBSTNode* target, const Key& key);
RBSTNode* rightRotate(RBSTNode* target);
RBSTNode* leftRotate(RBSTNode* target);
RBSTNode* del(RBSTNode* target, const Key& key);
RBSTNode* join(RBSTNode* left, RBSTNode* right);
RBSTNode* find(RBSTNode* target, const Key& key);
RBSTNode* m_head;
unsigned int m_size;
};
#endif /*RBST_H_*/
** RBST.cpp **
#include "RBST.h"
bool RBST::add(const Key& key, bool v){
unsigned int oldSize = m_size;
m_head = randomAdd(m_head, key);
if (m_size > oldSize){
if(v) cout<<"Node "<<key<< " is added into the tree."<<endl;
return true;
}else {
if(v) cout<<"Node "<<key<< " is already in the tree."<<endl;
return false;
}
if(v) cout<<endl;
};
bool RBST::addString(const string& key) {
return add(Key(key), false);
}
bool RBST::del(const Key& key, bool v){
unsigned oldSize= m_size;
m_head = del(m_head, key);
if (m_size < oldSize) {
if(v) cout<<"Node "<<key<< " is deleted from the tree."<<endl;
return true;
}
else {
if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
return false;
}
};
bool RBST::find(const Key& key, bool v){
RBSTNode* ret = find(m_head, key);
if (ret == NULL){
if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
return false;
}else {
if(v) cout<<"Node "<<key<< " is in the tree."<<endl;
return true;
}
};
#ifdef DEBUG
int RBST::dump(char sep){
int ret = dump(m_head, sep);
cout<<"SIZE: " <<ret<<endl;
return ret;
};
int RBST::dump(RBSTNode* target, char sep){
if (target == NULL) return 0;
int ret = dump(target->left(), sep);
cout<< *target<<sep;
ret ++;
ret += dump(target->right(), sep);
return ret;
};
#endif
/**
* Rotates the tree around target, so that target's left
* is the new root of the tree/subtree and updates the subtree sizes.
*
*(target) b (l) a
* / \ right / \
* a ? ----> ? b
* / \ / \
* ? x x ?
*
*/
RBSTNode* RBST::rightRotate(RBSTNode* target) // private
{
if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested.
if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!";
#ifdef DEBUG
cout <<"Right-rotating b-node ";
target->print(cout);
cout << " for a-node ";
target->left()->print(cout);
cout << "." << endl;
#endif
RBSTNode* l = target->left();
int as0 = l->size();
// re-order the sizes
l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null.
target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null.
// swap b's left (for a)
target->setLeft(l->right());
// and a's right (for b's left)
l->setRight(target);
#ifdef DEBUG
cout << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl;
#endif
// return the new root, a.
return l;
};
/**
* Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*)
*/
RBSTNode* RBST::leftRotate(RBSTNode* target)
{
if (target == NULL) throw "Invariant failure, target is null";
if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!";
#ifdef DEBUG
cout <<"Left-rotating a-node ";
target->print(cout);
cout << " for b-node ";
target->right()->print(cout);
cout << "." << endl;
#endif
RBSTNode* r = target->right();
int bs0 = r->size();
// re-roder the sizes
r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1);
target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size()));
// swap a's right (for b's left)
target->setRight(r->left());
// swap b's left (for a)
r->setLeft(target);
#ifdef DEBUG
cout << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl;
#endif
return r;
};
//
/**
* Adds a key to the tree and returns the new root of the tree.
* If the key already exists doesn't add anything.
* Increments m_size if the key didn't already exist and hence was added.
*
* This function is not called from public methods, it's a helper function.
*/
RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key)
{
countAdd++;
if (target == NULL) return new RBSTNode(key);
#ifdef DEBUG
cout << "addRoot(";
cout.flush();
target->print(cout) << "," << key << ") called." << endl;
#endif
if (*target < key)
{
target->setRight( addRoot(target->right(), key) );
target->incSize(); // Should I?
RBSTNode* res = leftRotate(target);
#ifdef DEBUG
if (target->size() != size(target))
BUG("in addRoot 1");
#endif
return res;
}
target->setLeft( addRoot(target->left(), key) );
target->incSize(); // Should I?
RBSTNode* res = rightRotate(target);
#ifdef DEBUG
if (target->size() != size(target))
BUG("in addRoot 2");
#endif
return res;
};
/**
* This function is called from the public add(key) function,
* and returns the new root node.
*/
RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key)
{
countAdd++;
if (target == NULL)
{
m_size++;
return new RBSTNode(key);
}
#ifdef DEBUG
cout << "randomAdd(";
target->print(cout) << ", \"" << key << "\") called." << endl;
#endif
int r = (rand() % target->size()) + 1;
// here is where we add the target as root!
if (r == 1)
{
m_size++; // TODO: Need to lock.
return addRoot(target, key);
}
#ifdef DEBUG
printf("randomAdd recursion part, ");
#endif
// otherwise, continue recursing!
if (*target <= key)
{
#ifdef DEBUG
printf("target <= key\n");
#endif
target->setRight( randomAdd(target->right(), key) );
target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
if (target->right()->size() != size(target->right()))
BUG("in randomAdd 1");
#endif
}
else
{
#ifdef DEBUG
printf("target > key\n");
#endif
target->setLeft( randomAdd(target->left(), key) );
target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
if (target->left()->size() != size(target->left()))
BUG("in randomAdd 2");
#endif
}
#ifdef DEBUG
printf("randomAdd return part\n");
#endif
m_size++; // TODO: Need to lock.
return target;
};
/////////////////////////////////////////////////////////////
///////////////////// DEL FUNCTIONS ////////////////////////
/////////////////////////////////////////////////////////////
/**
* Deletes a node with the passed key.
* Returns the root node.
* Decrements m_size if something was deleted.
*/
RBSTNode* RBST::del(RBSTNode* target, const Key& key)
{
countDelete++;
if (target == NULL) return NULL;
#ifdef DEBUG
cout << "del(";
target->print(cout) << ", \"" << key << "\") called." << endl;
#endif
RBSTNode* ret = NULL;
// found the node to delete
if (*target == key)
{
ret = join(target->left(), target->right());
m_size--;
delete target;
return ret; // return the newly built joined subtree!
}
// store a temporary size before recursive deletion.
unsigned int size = m_size;
if (*target < key) target->setRight( del(target->right(), key) );
else target->setLeft( del(target->left(), key) );
// if the previous recursion changed the size, we need to decrement the size of this target too.
if (m_size < size) target->decrSize();
#ifdef DEBUG
if (RBST::size(target) != target->size())
BUG("in del");
#endif
return target;
};
/**
* Joins the two subtrees represented by left and right
* by randomly choosing which to make the root, weighted on the
* size of the sub-tree.
*/
RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right)
{
if (left == NULL) return right;
if (right == NULL) return left;
#ifdef DEBUG
cout << "join(";
left->print(cout);
cout << ",";
right->print(cout) << ") called." << endl;
#endif
// Find the chance that we use the left tree, based on its size over the total tree size.
// 3 s.d. randomness :-p e.g. 60.3% chance.
bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0));
RBSTNode* subtree = NULL;
if (useLeft)
{
subtree = join(left->right(), right);
left->setRight(subtree)
->setSize((left->left() == NULL ? 0 : left->left()->size())
+ subtree->size() + 1 );
#ifdef DEBUG
if (size(left) != left->size())
BUG("in join 1");
#endif
return left;
}
subtree = join(right->left(), left);
right->setLeft(subtree)
->setSize((right->right() == NULL ? 0 : right->right()->size())
+ subtree->size() + 1);
#ifdef DEBUG
if (size(right) != right->size())
BUG("in join 2");
#endif
return right;
};
/////////////////////////////////////////////////////////////
///////////////////// FIND FUNCTIONS ///////////////////////
/////////////////////////////////////////////////////////////
/**
* Tries to find the key in the tree starting
* search from target.
*
* Returns NULL if it was not found.
*/
RBSTNode* RBST::find(RBSTNode* target, const Key& key)
{
countFind++; // Could use private method only counting the first call.
if (target == NULL) return NULL; // not found.
if (*target == key) return target; // found (does string override ==?)
if (*target < key) return find(target->right(), key); // search for gt to the right.
return find(target->left(), key); // search for lt to the left.
};
#ifdef DEBUG
unsigned int RBST::size(RBSTNode* x)
{
if (x == NULL) return 0;
return 1 + size(x->left()) + size(x->right());
}
#endif
I'll save the SkipList for another time since it's already possible to find good implementations of a SkipList from the links and my version wasn't much different.
The graphs generated from the test-file are as follows:
Graph showing time taken to add new items for BloomFilter, RBST and SkipList.
graph http://haf.se/content/dl/addtimer.png
Graph showing time taken to find items for BloomFilter, RBST and SkipList
graph http://haf.se/content/dl/findtimer.png
Graph showing time taken to delete items for BloomFilter, RBST and SkipList
graph http://haf.se/content/dl/deltimer.png
So as you can see, the random binary search tree was rather a lot better than the SkipList. The bloom filter lives up to its O(k).
Consider the hash-based collections for this, e.g. HashSet, Dictionary, HashTable, which provide constant time performance for adding and removing elements.
More information from the .NET Framework Developer's Guide:
Hashtable and Dictionary Collection Types
HashSet Collection Type
Well, how much do you need to query it? A linked-list has fast insert/delete (at any position), but isn't as quick to search as (for example) a dictionary / sorted-list. Alternatively, a straight list with a bit/value pair in each - i.e. "still has value". Just re-use logically empty cells before appending. Delete just clears the cell.
For reference types, "null" would do here. For value-types, Nullable<T>.
You could use a Hashtable or strongly typed Dictionary<Client>. The client class might override GetHashCode to provide a faster hash code generation, or if using Hashtable you can optionally use an IHashCodeProvider.
How do you need to find the clients? Is a Tuple/Dictionary necessary? You're more than likely to find something that solves your problem in the Jeffrey Richter's Power Collections library which has lists, trees, most data structures you can think of.
I was very impressed by the Channel9 interview with Peter Sestoft:
channel9.msdn.com/shows/Going+Deep/Peter-Sestoft-C5-Generic-Collection-Library-for-C-and-CLI/
He is a professor at the Copenhagen IT University who helped to create the The C5 Generic Collection Library:
www.itu.dk/research/c5/
It might be overkill or it might be just the speedy collection you were looking for ...
hth,
-Mike