Rather than hardcoding the following 1s and 0s, how can I create that binary value from a series of Boolean values in C#?
byte myValue = 0b001_0000;
There are many ways of doing it; for example, to build it from an array:
bool[] values = ...;
byte result = 0;
// assuming you store them "in reverse": values[0] becomes the most significant bit
for (int i = values.Length - 1; i >= 0; --i)
    if (values[i])
        result |= (byte)(1 << (values.Length - 1 - i)); // a bool cannot be shifted in C#, so test it and set the bit
My solution with LINQ:
public static byte CreateByte(bool[] bits)
{
    if (bits.Length > 8)
    {
        throw new ArgumentOutOfRangeException(nameof(bits));
    }
    // requires using System.Linq;
    return (byte)bits.Reverse().Select((val, i) => Convert.ToByte(val) << i).Sum();
}
The call to Reverse() is optional and depends on whether you want index 0 to be the LSB (without Reverse) or the MSB (with Reverse).
var values = new bool[8];
values[7] = true;
byte result = 0;
for (var i = 0; i < 8; i++)
{
    // edited to bit shifting because of community complaints :D
    if (values[i]) result |= (byte)(1 << i);
}
// result => 128
This might be absolute overkill, but I felt like playing around with SIMD. It could probably have been written better, but I don't know SIMD all that well.
If you want the reverse of the bit order this generates, remove the shuffle from the SIMD path and change (7 - i) to i in the fallback loop.
For those not familiar with SIMD: this approach is roughly 3 times faster than a plain for loop (the exact speedup depends on the CPU and runtime).
public static byte ByteFrom8Bools(ReadOnlySpan<bool> bools)
{
if (bools.Length < 8)
Throw();
static void Throw() // Throwing in a separate method helps JIT produce better code, or so I've heard
{
throw new ArgumentException("Not enough booleans provided");
}
// these are JIT compile time constants, only one of the branches will be compiled
// depending on the CPU running this code, eliminating the branch entirely
if(Sse2.IsSupported && Ssse3.IsSupported)
{
// copy out the 64 bits all at once
ref readonly bool b = ref bools[0];
ref bool refBool = ref Unsafe.AsRef(b);
ulong ulongBools = Unsafe.As<bool, ulong>(ref refBool);
// load our 64 bits into a vector register
Vector128<byte> vector = Vector128.CreateScalarUnsafe(ulongBools).AsByte();
// this is just to propagate the 1 set bit in true bools to the most significant bit
Vector128<byte> allTrue = Vector128.Create((byte)1);
Vector128<byte> compared = Sse2.CompareEqual(vector, allTrue);
// reverse the bytes we care about, leave the rest in their place
Vector128<byte> shuffleMask = Vector128.Create((byte)7, 6, 5, 4, 3, 2, 1, 0, 8, 9, 10, 11, 12, 13, 14, 15);
Vector128<byte> shuffled = Ssse3.Shuffle(compared, shuffleMask);
// move the most significant bit of each byte into a bit of int
int mask = Sse2.MoveMask(shuffled);
// returning byte = returning the least significant byte from int
return (byte)mask;
}
else
{
// fall back to a more generic algorithm if there aren't the correct instructions on the CPU
byte bits = 0;
for (int i = 0; i < 8; i++)
{
bool b = bools[i];
bits |= (byte)(Unsafe.As<bool, byte>(ref b) << (7 - i));
}
return bits;
}
}
I'm making a function that will allow the user to pass a double value, and then return a UInt16.
This is my code:
public static UInt16 Value_To_BatteryVoltage(double value)
{
var ret = ((int)value << 8);
var retMod = (value % (int)value) * 10;
return (UInt16)(ret + retMod);
}
For example, the call:
Value_To_BatteryVoltage(25.10)
returns 6401.
I can check the result by doing:
public static double VoltageLevel(UInt16 value)
{
return ((value & 0xFF00) >> 8) + ((value & 0x00FF) / 10.0);
}
This works as expected, but if I do:
Value_To_BatteryVoltage(25.11) //notice the 0.11
I get the wrong result, because:
public static UInt16 Value_To_BatteryVoltage(double value)
{
var ret = ((int)value << 8); // returns 6400 OK
var retMod = (value % (int)value) * 10; //returns 0.11 x 10 = 1.1 WRONG!
return (UInt16)(ret + retMod); //returns 6401, because (UInt16)(6400 + 1.1) = 6401, the same as for 25.10, so I've lost precision
}
So the question is, is there some way to do this kind of conversion without losing precision?
If I understand the question, you want to store the characteristic (integer part) in the high 8 bits of the UInt16 and the mantissa (fractional part) in the low 8 bits.
This is one way to do it: I treat the double as a string and split it at the decimal point. For example:
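public static UInt16 Value_To_BatteryVoltage(double value)
{
    // format with the invariant culture so the decimal separator is always '.'
    string[] number = value.ToString(System.Globalization.CultureInfo.InvariantCulture).Split('.');
    UInt16 c = (UInt16)(UInt16.Parse(number[0]) << 8);
    // a whole number such as 25.0 formats as "25", with no fractional part
    UInt16 m = number.Length > 1 ? UInt16.Parse(number[1]) : (UInt16)0;
    return (UInt16)(c + m);
}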
public void Consumer()
{
foreach(int i in Integers())
{
Console.WriteLine(i.ToString());
}
}
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
Is there a way, with a template trick (or otherwise), to get the same syntax in C++?
Take a look at boost::Coroutine. It does what you want.
http://www.crystalclearsoftware.com/soc/coroutine/index.html#coroutine.intro
Example from the tutorial:
http://www.crystalclearsoftware.com/soc/coroutine/coroutine/tutorial.html
int range_generator(generator_type::self& self, int min, int max)
{
while(min < max)
self.yield(min++);
self.exit();
}
You can always code this by hand. Truthfully, yield really seems like sugar-coating to me (and coroutines too).
What is a coroutine, really? Some state bundled together with:
one function to create it (isn't that a constructor?)
one function to move to the next state (isn't that operator++, traditionally?)
In C++, it's called an InputIterator, and can be arbitrarily fat.
So, the syntax won't be as pretty, but this should do, using just the Standard Library:
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>

static std::array<int, 6> const Array = {{1, 2, 4, 8, 16, 16777216}};

// std::iterator is deprecated since C++17, so the iterator traits are declared directly
class Integers
{
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = int const*;
    using reference = int;

    Integers(): _index(0) {}
    operator bool() const { return _index < Array.size(); }
    Integers& operator++() { assert(*this); ++_index; return *this; }
    Integers operator++(int) { Integers tmp = *this; ++*this; return tmp; }
    int operator*() const { assert(*this); return Array[_index]; }
    int const* operator->() const { assert(*this); return &Array[_index]; }
private:
    std::size_t _index;
}; // class Integers
And obviously, since you decide exactly what state is stored, you decide whether everything is precomputed or whether part (or all) of it is computed lazily, possibly cached, possibly multi-threaded, and so on; you get the idea :)
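The Integers class above is driven by hand, with the iterator itself testing for the end. Below is a minimal C++17 sketch of the same "state plus operator++" idea that also works in a range-based for loop. It uses a sentinel end type, which range-for accepts since C++17 because begin() and end() may return different types. IntegersRange, Cursor, Sentinel and collectIntegers are hypothetical names, not part of the answer above:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical range-for friendly variant of the Integers iterator:
// begin() returns a cursor holding the state, end() returns an empty sentinel.
class IntegersRange
{
    static constexpr std::array<int, 6> values{{1, 2, 4, 8, 16, 16777216}};

public:
    struct Sentinel {};

    class Cursor
    {
    public:
        int operator*() const { assert(index_ < values.size()); return values[index_]; }
        Cursor& operator++() { ++index_; return *this; }
        // The loop continues while the cursor has not run past the last value.
        bool operator!=(Sentinel) const { return index_ < values.size(); }

    private:
        std::size_t index_ = 0;
    };

    Cursor begin() const { return Cursor{}; }
    Sentinel end() const { return Sentinel{}; }
};

// Consumes the range with an ordinary range-based for loop.
inline std::vector<int> collectIntegers()
{
    std::vector<int> out;
    for (int v : IntegersRange{})
        out.push_back(v);
    return out;
}
```

The state still lives in a small object, as in the original answer. The sentinel only lets the loop syntax look like a foreach over a container.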
In C++14, you can mimic yield this way:
auto&& function = []() {
int i = 0;
return [=]() mutable {
int arr[] = { 1, 2, 4, 8, 16, 16777216};
if (i < 6)
return arr[i++];
return 0;
};
}();
A live example is available at http://ideone.com/SQZ1qZ
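Returning 0 when the sequence runs out is ambiguous if 0 could also be a real value. A hedged C++17 variant of the same lambda returns std::optional<int> and signals the end with std::nullopt. make_integers is an illustrative name:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>

// Variant of the C++14 lambda that reports exhaustion with std::nullopt
// instead of the sentinel 0 (std::optional and std::size need C++17).
inline auto make_integers()
{
    // Each call to make_integers() yields an independent generator with its own index.
    return [i = std::size_t{0}]() mutable -> std::optional<int> {
        constexpr int arr[] = {1, 2, 4, 8, 16, 16777216};
        if (i < std::size(arr))
            return arr[i++];
        return std::nullopt;  // sequence exhausted
    };
}
```

Usage: `auto gen = make_integers(); while (auto v = gen()) { /* use *v */ }`.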
Coroutines have been part of the language since C++20, and they use co_yield instead of yield (a ready-made std::generator was added to the standard library in C++23).
See also: What are coroutines in C++20?
There are some example usages at the first link (the second one is probably what you're looking for):
uses the co_await operator to suspend execution until resumed
task<> tcp_echo_server() {
char data[1024];
while (true) {
size_t n = co_await socket.async_read_some(buffer(data));
co_await async_write(socket, buffer(data, n));
}
}
uses the keyword co_yield to suspend execution returning a value
generator<int> iota(int n = 0) {
while (true)
co_yield n++;
}
uses the keyword co_return to complete execution returning a value
lazy<int> f() {
co_return 7;
}
Here is an ASM "roll your own" version: http://www.flipcode.com/archives/Yield_in_C.shtml
#include <stdio.h>
#include <conio.h>
#include <iostream.h>
//
// marks a location in the program for resume
// does not return control, exits function from inside macro
//
// yield( x, ret )
// x : the 'name' of the yield, cannot be ambiguous in the
// function namespace
// ret : the return value for when yield() exits the function;
// must match function return type (leave blank for no return type)
#define yield(x,ret) \
{ \
/* store the resume location */ \
__asm { \
mov _myStaticMkr,offset label_##x \
} \
\
/* return the supplied value */ \
return ret; \
} \
/* our offset in the function */ \
label_##x:
//
// resumes function from the stored offset, or
// continues without notice if there's not one
// stored
//
// resume()
// <void>
#define resume() \
/* our stored offset */ \
static int _myStaticMkr=0; \
\
/* test for no offset */ \
if( _myStaticMkr ) \
{ \
/* resume from offset */ \
__asm \
{ \
jmp _myStaticMkr \
} \
}
// example demonstrating a function with an int return type
// using the yield() and resume() macros
//
// myFunc()
// <void>
int myFunc()
{
resume();
cout << "1\n";
yield(1,1);
cout << "2\n";
yield(2,1);
cout << "3\n";
yield(3,1);
cout << "4\n";
return 0;
}
// main function
//
// main()
// <void>
void main( void )
{
cout << "Yield in C++\n";
cout << "Chris Pergrossi\n\n";
myFunc();
do
{
cout << "main()\n";
cout.flush();
} while( myFunc() );
cout.flush();
getch();
}
/*
// example demonstrating a function with no return type
// using the yield() and resume() macros
//
// myFunc()
// <void>
void myFunc()
{
resume();
cout << "1\n";
yield(1);
cout << "2\n";
yield(2);
cout << "3\n";
yield(3);
cout << "4\n";
return;
}
// main function
//
// main()
// <void>
void main( void )
{
cout << "Yield in C++\n";
cout << "Chris Pergrossi\n\n";
myFunc();
for( int k = 0; k < 4; k ++ )
{
cout << "main()\n";
cout.flush();
myFunc();
}
cout.flush();
getch();
}
*/
If all you need is foreach-like iteration, the following syntax is possible in C++:
#define GENERATOR(name) \
struct name \
{ \
template<typename F> \
void operator()(F yield) \
/**/
#define _ };
template<typename Gen>
struct Adaptor
{
Gen f;
template<typename C>
void operator*(C cont)
{
f(cont);
}
};
template<typename Gen>
Adaptor<Gen> make_adaptor(Gen gen)
{
return {gen};
}
#define FOREACH(arg, gen) make_adaptor(gen) * [&](arg)
#include <iostream>
using namespace std;
GENERATOR(integers)
{
yield(1);
yield(2);
yield(4);
yield(8);
yield(16777216);
}_
int main()
{
FOREACH(int i, integers())
{
cout << i << endl;
};
}
If you need a little bit of coroutine "power", then you can try stackless coroutines.
Or, if you need the full power, go with stackful coroutines. The Boost.Coroutine library implements stackful coroutines for different platforms.
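Without its macros, the GENERATOR/FOREACH answer is internal iteration. The generator body calls a callback once per value, and the loop body becomes a lambda. Here is a minimal sketch of that expansion using the values from the question. integers and collectYielded are illustrative names, not part of the original macros:

```cpp
#include <cassert>
#include <vector>

// Macro-free form of GENERATOR(integers): the generator body calls a
// caller-supplied callback, named yield here, once per value.
template <typename F>
void integers(F&& yield)
{
    yield(1);
    yield(2);
    yield(4);
    yield(8);
    yield(16);
    yield(16777216);
}

// Equivalent of FOREACH(int i, integers()) { ... }: the loop body is a lambda.
inline std::vector<int> collectYielded()
{
    std::vector<int> out;
    integers([&out](int i) { out.push_back(i); });
    return out;
}
```

The trade-off is that the caller cannot pause between values or stop early without extra plumbing. That is what the stackless and stackful coroutines mentioned above add.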
An attempt to implement yield in C++ (coroutine style):
To do this by hand: declare static unsigned int checkpoint = 0; and make all your other variables static; switch (checkpoint) at the top of the function, with each case doing a goto to a label; above each return, set checkpoint to a unique value, and define the matching label just below that return; at the end of the function, reset checkpoint to zero and all static variables to their default values, and finally return an end-of-sequence value. If you do all this, the function becomes enumerable and iterative: the two lines you add above and below each return make it behave like yield return. The goto lets the function resume where it left off, the static checkpoint remembers where it stopped and where to continue, and the switch statement tests its value. The other variables are static so that their values survive to the next call instead of being reset.
For example:
#include <climits>
#include <iostream>
using namespace std;

#define PowerEnd INT_MIN
int Power(int number, int exponent)
{
static unsigned int checkpoint = 0;
static int result = 1, i = 0;
switch (checkpoint)
{
case 1: goto _1;
}
for (i = 0; i < exponent; i++)
{
result *= number;
checkpoint = 1;
return result;
_1:;
}
checkpoint = 0;
result = 1;
i = 0;
return PowerEnd;
}
int main()
{
while (true)
{
int result = Power(2, 8);
if (result == PowerEnd)
break;
cout << result << endl;
}
//to print only the first 4 results (if there are at least 4 results) then
for (int i = 0; i < 4; i++)
{
int result = Power(2, 8);
if (result == PowerEnd)
break;
cout << result << endl;
}
}
The above program produces the following output:
2
4
8
16
32
64
128
256
2
4
8
16
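The static-variable recipe above ties all state to the function. That means only one sequence can be in progress at a time, and it is not thread-safe. The sketch below keeps the same checkpoint-and-resume idea but moves the state into an object. It uses a case label inside the loop instead of a goto, the trick known from Duff's device. PowerGen and drain are hypothetical names:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical class: the checkpoint and loop state that Power() keeps in
// static locals live in members here, so each PowerGen instance is independent.
class PowerGen
{
public:
    PowerGen(int number, int exponent) : number_(number), exponent_(exponent) {}

    // Returns the next power, or std::nullopt once the sequence is exhausted.
    std::optional<int> next()
    {
        switch (checkpoint_)
        {
        case 0:
            for (i_ = 0; i_ < exponent_; ++i_)
            {
                result_ *= number_;
                checkpoint_ = 1;  // remember where to resume
                return result_;
        case 1:;                  // resume point inside the loop body
            }
            checkpoint_ = 2;      // finished: later calls skip straight to the end
        }
        return std::nullopt;
    }

private:
    int number_;
    int exponent_;
    int checkpoint_ = 0;
    int i_ = 0;
    int result_ = 1;
};

// Drains a generator into a vector, like the while loop in main() above.
inline std::vector<int> drain(PowerGen gen)
{
    std::vector<int> out;
    while (auto v = gen.next())
        out.push_back(*v);
    return out;
}
```

Because each PowerGen carries its own checkpoint, two sequences can be interleaved without resetting each other.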
Something similar was proposed for C++17, and Visual C++ 2015 already shipped an experimental implementation; the feature was eventually standardized in C++20 as coroutines (co_await, co_yield, co_return). Here's a good overview talk from Gor Nishanov, one of the main authors of the proposal.
#include <setjmp.h>
#include <stdio.h>
class superclass
{
public:
jmp_buf jbuf;
public:
virtual int enumerate(void) { return -1; }
};
class subclass: public superclass
{
public:
int enumerate()
{
static int i;
static bool b = false;
if(b)
longjmp(jbuf, 1);
for(b = true, i = 0; i < 5; (i)++)
{
printf("\ndoing stuff: i = %d\n", i);
if(setjmp(jbuf) != 1)
return i;
}
return -1;
}
};
To use the code...
int iret;
subclass *sc;
sc = new subclass();
while((iret = sc->enumerate()) != -1)
{
printf("\nsc->enumerate() returned: %d\n", iret);
}
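Just got this working; it seems quite simple now, although I had a few false starts with it :) Note, however, that calling longjmp to jump back into a function that has already returned (as the second and later calls to enumerate() do) is undefined behavior, so this may break with a different compiler or optimization level.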
You can of course always write your own iterators and return from them whatever you desire, but why would you want to? In the given example, why not simply put your values into a container like vector and iterate over that?
I need to maintain a roster of connected clients that are very short-lived and frequently go up and down. Due to the potential number of clients, I need a collection that supports fast insert/delete. Suggestions?
C5 Generic Collection Library
The best implementations I have found in C# and C++ are these -- for C#/CLI:
http://www.itu.dk/research/c5/Release1.1/ITU-TR-2006-76.pdf
http://www.itu.dk/research/c5/
It's well researched, has extensive unit tests, and since February they have also implemented the common .NET interfaces, which makes the collections a lot easier to work with. They were featured on Channel9 and they've done extensive performance testing on the collections.
If you are using data structures anyway, these researchers have a red-black tree implementation in their library, similar to what you find if you fire up Lutz Roeder's Reflector and look at System.Data's internal structures :p. Insert complexity: O(log n).
Lock-free C++ collections
Then, if you can allow for some C++ interop and you absolutely need the speed and want as little overhead as possible, then these lock-free ADTs from Dmitriy V'jukov are probably the best you can get in this world, outperforming Intel's concurrent library of ADTs.
http://groups.google.com/group/lock-free
I've read some of the code and it's really the makings of someone well versed in how these things are put together. VC++ can do native C++ interop without annoying boundaries. http://www.swig.org/ can otherwise help you wrap C++ interfaces for consumption in .Net, or you can do it yourself through P/Invoke.
Microsoft's Take
They have written tutorials, one of which implements a rather unpolished skip list in C# and discusses other types of data structures. (There's a better SkipList at CodeProject, which is very polished and implements the interfaces in a well-behaved manner.) They also have a few data structures bundled with .NET, namely Hashtable/Dictionary<,> and HashSet. Of course there's the "ResizeArray"/List type as well, together with a stack and a queue, but they are all "linear" on search.
Google's perf-tools
If you wish to speed up memory allocation, you can use Google's perf-tools. They are available at Google Code and contain a very interesting multi-threaded malloc implementation (TCMalloc), which shows much more consistent timing than the normal malloc does. You could use this together with the lock-free structures above to really go crazy with performance.
Improving response times with memoization
You can also use memoization on functions to improve performance through caching, something interesting if you're using e.g. F#. F# also allows C++ interop, so you're OK there.
O(k)
There's also the possibility of building something yourself on the research done on Bloom filters, which allow O(k) lookups, where k is the number of hash functions you use. Google's BigTable uses Bloom filters in this way. A Bloom filter always reports an element that is in the set as present, but with a small probability it also reports an element that is not in the set as present (a false positive). See the graph on Wikipedia: the false-positive probability approaches 0.01 at around 10,000 elements, but you can get around this by implementing further hash functions or reducing the set.
I haven't searched for .Net implementations of this, but since the hashing calculations are independent you can use MS's performance team's implementation of Tasks to speed that up.
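For sizing, there is a standard approximation for a Bloom filter with m bits, n inserted keys and k hash functions. It assumes the hashes behave like independent, uniformly distributed functions. It gives the false-positive probability p and the k that minimizes it:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2
\quad\Longleftrightarrow\quad
m = \frac{k\,n}{\ln 2}
```

With k = 2 the optimum is m = 2n/ln 2. That is the bit count BloomFilter::init computes from its length argument.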
"My" take -- randomize to reach average O(log n)
As it happens, I just did a coursework assignment involving data structures. In this case we used C++, but it's very easy to translate to C#. We built three different data structures: a Bloom filter, a skip list and a random binary search tree.
See the code and analysis after the last paragraph.
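For the random binary search tree, the standard randomized-BST insertion rule (Martínez and Roura) is what keeps the expected cost logarithmic. When inserting into a subtree that currently holds n keys, the new key is placed at that subtree's root with probability 1/(n+1). Otherwise the insertion recurses into the appropriate child. The resulting tree has the same distribution as one built from a random insertion order:

```latex
\Pr[\text{new key becomes the root of a subtree with } n \text{ keys}] = \frac{1}{n+1},
\qquad
\mathbb{E}[\text{search cost}] = O(\log n)
```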
Hardware-based "collections"
Finally, to make my answer "complete": if you truly need speed, you can use something like routing tables or content-addressable memory. These give you, in principle, O(1) "hash"-to-value lookups of your data.
Random Binary Search Tree/Bloom Filter C++ code
I would really appreciate feedback if you find mistakes in the code, or just pointers on how I can do it better (or with better use of templates). Note that the Bloom filter isn't what it would be in real life: normally you don't need to delete from it, and then it is much more space-efficient than the hack I did to allow deletion to be tested.
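For comparison, a plain Bloom filter without deletion needs only one bit per slot. That is the space-efficient variant the author refers to. The sketch below is illustrative only. The class name SimpleBloom and the double-hashing scheme (h1 + i*h2, with h2 derived from std::hash) are assumptions, not the coursework code:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Minimal plain (non-counting) Bloom filter sketch: one bit per slot, no delete.
// The k probe positions come from double hashing, h1 + i*h2; deriving h2 from
// h1 as below is an illustrative assumption, not a tuned choice.
class SimpleBloom
{
public:
    SimpleBloom(std::size_t bits, std::size_t hashes) : bits_(bits, false), k_(hashes) {}

    void add(const std::string& key)
    {
        for (std::size_t i = 0; i < k_; ++i)
            bits_[slot(key, i)] = true;
    }

    // false means "definitely absent"; true means "probably present".
    bool mightContain(const std::string& key) const
    {
        for (std::size_t i = 0; i < k_; ++i)
            if (!bits_[slot(key, i)])
                return false;
        return true;
    }

private:
    std::size_t slot(const std::string& key, std::size_t i) const
    {
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = (h1 >> 16) | 1;  // odd step derived from h1
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;  // std::vector<bool> packs one bit per slot
    std::size_t k_;
};
```

Deleting is impossible here because a bit may be shared by several keys. That is why the coursework version keeps a separate use count per bit.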
DataStructure.h
#ifndef DATASTRUCTURE_H_
#define DATASTRUCTURE_H_
class DataStructure
{
public:
DataStructure() {countAdd=0; countDelete=0;countFind=0;}
virtual ~DataStructure() {}
void resetCountAdd() {countAdd=0;}
void resetCountFind() {countFind=0;}
void resetCountDelete() {countDelete=0;}
unsigned int getCountAdd(){return countAdd;}
unsigned int getCountDelete(){return countDelete;}
unsigned int getCountFind(){return countFind;}
protected:
unsigned int countAdd;
unsigned int countDelete;
unsigned int countFind;
};
#endif /*DATASTRUCTURE_H_*/
Key.h
#ifndef KEY_H_
#define KEY_H_
#include <string>
using namespace std;
const int keyLength = 128;
class Key : public string
{
public:
Key():string(keyLength, ' ') {}
Key(const char in[]): string(in){}
Key(const string& in): string(in){}
bool operator<(const string& other);
bool operator>(const string& other);
bool operator==(const string& other);
virtual ~Key() {}
};
#endif /*KEY_H_*/
Key.cpp
#include "Key.h"
bool Key::operator<(const string& other)
{
return compare(other) < 0;
};
bool Key::operator>(const string& other)
{
return compare(other) > 0;
};
bool Key::operator==(const string& other)
{
return compare(other) == 0;
}
BloomFilter.h
#ifndef BLOOMFILTER_H_
#define BLOOMFILTER_H_
#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include "Key.h"
#include "DataStructure.h"
#define LONG_BIT 32
#define bitmask(val) (1UL << (LONG_BIT - ((val) % LONG_BIT) - 1)) // 1UL: shifting a plain int 1 by 31 overflows
// TODO: Implement RW-locking on the reads/writes to the bitmap.
class BloomFilter : public DataStructure
{
public:
BloomFilter(){}
BloomFilter(unsigned long length){init(length);}
virtual ~BloomFilter(){}
void init(unsigned long length);
void dump();
void add(const Key& key);
void del(const Key& key);
/**
* Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't.
*/
bool testExist(const Key& key, bool v = false);
private:
unsigned long hash1(const Key& key);
unsigned long hash2(const Key& key);
bool exist(const Key& key);
void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key);
void getCountIndicies(const int i1, const unsigned long h1,
const int i2, const unsigned long h2, int& i1_c, int& i2_c);
vector<unsigned long> m_tickBook;
vector<unsigned int> m_useCounts;
unsigned long m_length; // number of bits in the bloom filter
unsigned long m_pockets; //the number of pockets
static const unsigned long m_pocketSize; //bits in each pocket
};
#endif /*BLOOMFILTER_H_*/
BloomFilter.cpp
#include "BloomFilter.h"
#include <stdio.h>
const unsigned long BloomFilter::m_pocketSize = LONG_BIT;
void BloomFilter::init(unsigned long length)
{
//m_length = length;
m_length = (unsigned long)((2.0*length)/log(2))+1;
m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize));
m_tickBook.resize(m_pockets);
// my own (allocate nr bits possible to store in the other vector)
m_useCounts.resize(m_pockets * m_pocketSize);
unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0;
for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own
}
unsigned long BloomFilter::hash1(const Key& key)
{
unsigned long hash = 5381;
unsigned int i=0; for (i=0; i< key.length(); i++){
hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */
}
double d_hash = (double) hash;
d_hash *= (0.5*(sqrt(5)-1));
d_hash -= floor(d_hash);
d_hash *= (double)m_length;
return (unsigned long)floor(d_hash);
}
unsigned long BloomFilter::hash2(const Key& key)
{
unsigned long hash = 0;
unsigned int i=0; for (i=0; i< key.length(); i++){
hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash;
}
double d_hash = (double) hash;
d_hash *= (0.5*(sqrt(5)-1));
d_hash -= floor(d_hash);
d_hash *= (double)m_length;
return (unsigned long)floor(d_hash);
}
bool BloomFilter::testExist(const Key& key, bool v){
if(exist(key)) {
if(v) cout<<"Key "<< key<<" is in the set"<<endl;
return true;
}else {
if(v) cout<<"Key "<< key<<" is not in the set"<<endl;
return false;
}
}
void BloomFilter::dump()
{
cout<<m_pockets<<" Pockets: ";
// print each pocket in hex (%lx matches unsigned long)
unsigned long i; for(i=0; i< m_pockets; i++) printf("%lx ", m_tickBook[i]);
cout<<endl;
}
void BloomFilter::add(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
int i1_c, i2_c;
// tested!
getHashAndIndicies(h1, h2, i1, i2, key);
getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
m_tickBook[i1] = m_tickBook[i1] | bitmask(h1);
m_tickBook[i2] = m_tickBook[i2] | bitmask(h2);
m_useCounts[i1_c] = m_useCounts[i1_c] + 1;
m_useCounts[i2_c] = m_useCounts[i2_c] + 1;
countAdd++;
}
void BloomFilter::del(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
int i1_c, i2_c;
if (!exist(key)) throw "You can't delete keys which are not in the bloom filter!";
// First we need the indices into m_tickBook and the
// hashes.
getHashAndIndicies(h1, h2, i1, i2, key);
// The index of the counter is the index into the bitvector
// times the number of bits per vector item plus the offset into
// that same vector item.
getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
// We need to update the value in the bitvector in order to
// delete the key.
m_useCounts[i1_c] = (m_useCounts[i1_c] == 1 ? 0 : m_useCounts[i1_c] - 1);
m_useCounts[i2_c] = (m_useCounts[i2_c] == 1 ? 0 : m_useCounts[i2_c] - 1);
// Now, if we depleted the count for a specific bit, then set it to
// zero, by anding the complete unsigned long with the notted bitmask
// of the hash value
if (m_useCounts[i1_c] == 0)
m_tickBook[i1] = m_tickBook[i1] & ~(bitmask(h1));
if (m_useCounts[i2_c] == 0)
m_tickBook[i2] = m_tickBook[i2] & ~(bitmask(h2));
countDelete++;
}
bool BloomFilter::exist(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
countFind++;
getHashAndIndicies(h1, h2, i1, i2, key);
return ((m_tickBook[i1] & bitmask(h1)) > 0) &&
((m_tickBook[i2] & bitmask(h2)) > 0);
}
/*
* Gets the values of the indices for two hashes and places them in
* the passed parameters. The index is into m_tickBook.
*/
void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1,
int& i2, const Key& key)
{
h1 = hash1(key);
h2 = hash2(key);
i1 = (int) h1/m_pocketSize;
i2 = (int) h2/m_pocketSize;
}
/*
* Gets the values of the indices into the count vector, which keeps
* track of how many times a specific bit-position has been used.
*/
void BloomFilter::getCountIndicies(const int i1, const unsigned long h1,
const int i2, const unsigned long h2, int& i1_c, int& i2_c)
{
i1_c = i1*m_pocketSize + h1%m_pocketSize;
i2_c = i2*m_pocketSize + h2%m_pocketSize;
}
RBST.h
#ifndef RBST_H_
#define RBST_H_
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include "Key.h"
#include "DataStructure.h"
#define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str);
using namespace std;
class RBSTNode;
class RBSTNode: public Key
{
public:
RBSTNode(const Key& key):Key(key)
{
m_left =NULL;
m_right = NULL;
m_size = 1U; // the size of one node is 1.
}
virtual ~RBSTNode(){}
string setKey(const Key& key){return Key(key);}
RBSTNode* left(){return m_left; }
RBSTNode* right(){return m_right;}
RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; }
RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; }
#ifdef DEBUG
ostream& print(ostream& out)
{
out << "Key(" << *this << ", m_size: " << m_size << ")";
return out;
}
#endif
unsigned int size() { return m_size; }
void setSize(unsigned int val)
{
#ifdef DEBUG
this->print(cout);
cout << "::setSize(" << val << ") called." << endl;
#endif
if (val == 0) throw "Cannot set the size below 1, then just delete this node.";
m_size = val;
}
void incSize() {
#ifdef DEBUG
this->print(cout);
cout << "::incSize() called" << endl;
#endif
m_size++;
}
void decrSize()
{
#ifdef DEBUG
this->print(cout);
cout << "::decrSize() called" << endl;
#endif
if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node.";
m_size--;
}
#ifdef DEBUG
unsigned int size(RBSTNode* x);
#endif
private:
RBSTNode(){}
RBSTNode* m_left;
RBSTNode* m_right;
unsigned int m_size;
};
class RBST : public DataStructure
{
public:
RBST() {
m_size = 0;
m_head = NULL;
srand(time(0));
};
virtual ~RBST() {};
/**
* Tries to add key into the tree and will return
* true for a new item added
* false if the key already is in the tree.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool add(const Key& key, bool v=false);
/**
* Same semantics as other add function, but takes a string,
* but diff name, because that'll cause an ambiguity because of inheritance.
*/
bool addString(const string& key);
/**
* Deletes a key from the tree if that key is in the tree.
* Will return
* true for success and
* false for failure.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool del(const Key& key, bool v=false);
/**
* Tries to find the key in the tree and will return
* true if the key is in the tree and
* false if the key is not.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool find(const Key& key, bool v = false);
unsigned int count() { return m_size; }
#ifdef DEBUG
int dump(char sep = ' ');
int dump(RBSTNode* target, char sep);
unsigned int size(RBSTNode* x);
#endif
private:
RBSTNode* randomAdd(RBSTNode* target, const Key& key);
RBSTNode* addRoot(RBSTNode* target, const Key& key);
RBSTNode* rightRotate(RBSTNode* target);
RBSTNode* leftRotate(RBSTNode* target);
RBSTNode* del(RBSTNode* target, const Key& key);
RBSTNode* join(RBSTNode* left, RBSTNode* right);
RBSTNode* find(RBSTNode* target, const Key& key);
RBSTNode* m_head;
unsigned int m_size;
};
#endif /*RBST_H_*/
RBST.cpp
#include "RBST.h"
bool RBST::add(const Key& key, bool v){
unsigned int oldSize = m_size;
m_head = randomAdd(m_head, key);
if (m_size > oldSize){
if(v) cout<<"Node "<<key<< " is added into the tree."<<endl;
return true;
}else {
if(v) cout<<"Node "<<key<< " is already in the tree."<<endl;
return false;
}
};
bool RBST::addString(const string& key) {
return add(Key(key), false);
}
bool RBST::del(const Key& key, bool v){
unsigned oldSize= m_size;
m_head = del(m_head, key);
if (m_size < oldSize) {
if(v) cout<<"Node "<<key<< " is deleted from the tree."<<endl;
return true;
}
else {
if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
return false;
}
};
bool RBST::find(const Key& key, bool v){
RBSTNode* ret = find(m_head, key);
if (ret == NULL){
if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
return false;
}else {
if(v) cout<<"Node "<<key<< " is in the tree."<<endl;
return true;
}
};
#ifdef DEBUG
int RBST::dump(char sep){
int ret = dump(m_head, sep);
cout<<"SIZE: " <<ret<<endl;
return ret;
};
int RBST::dump(RBSTNode* target, char sep){
if (target == NULL) return 0;
int ret = dump(target->left(), sep);
cout<< *target<<sep;
ret ++;
ret += dump(target->right(), sep);
return ret;
};
#endif
/**
* Rotates the tree around target, so that target's left
* is the new root of the tree/subtree and updates the subtree sizes.
*
*  (target) b                (l) a
*          / \     right        / \
*         a   ?    ---->       ?   b
*        / \                      / \
*       ?   x                    x   ?
*
*/
RBSTNode* RBST::rightRotate(RBSTNode* target) // private
{
if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested.
if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!";
#ifdef DEBUG
cout <<"Right-rotating b-node ";
target->print(cout);
cout << " for a-node ";
target->left()->print(cout);
cout << "." << endl;
#endif
RBSTNode* l = target->left();
int as0 = l->size();
// re-order the sizes
l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null.
target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null.
// swap b's left (for a)
target->setLeft(l->right());
// and a's right (for b's left)
l->setRight(target);
#ifdef DEBUG
cout << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl;
#endif
// return the new root, a.
return l;
};
/**
* Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*)
*/
RBSTNode* RBST::leftRotate(RBSTNode* target)
{
if (target == NULL) throw "Invariant failure, target is null";
if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!";
#ifdef DEBUG
cout <<"Left-rotating a-node ";
target->print(cout);
cout << " for b-node ";
target->right()->print(cout);
cout << "." << endl;
#endif
RBSTNode* r = target->right();
int bs0 = r->size();
// re-order the sizes
r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1);
target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size()));
// swap a's right (for b's left)
target->setRight(r->left());
// swap b's left (for a)
r->setLeft(target);
#ifdef DEBUG
cout << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl;
#endif
return r;
};
//
/**
* Adds a key to the tree and returns the new root of the tree.
* If the key already exists doesn't add anything.
* Increments m_size if the key didn't already exist and hence was added.
*
* This function is not called from public methods, it's a helper function.
*/
RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key)
{
countAdd++;
if (target == NULL) return new RBSTNode(key);
#ifdef DEBUG
cout << "addRoot(";
cout.flush();
target->print(cout) << "," << key << ") called." << endl;
#endif
if (*target < key)
{
target->setRight( addRoot(target->right(), key) );
target->incSize(); // yes: this subtree gained the new node, and leftRotate relies on the updated size
RBSTNode* res = leftRotate(target);
#ifdef DEBUG
if (target->size() != size(target))
BUG("in addRoot 1");
#endif
return res;
}
target->setLeft( addRoot(target->left(), key) );
target->incSize(); // yes: this subtree gained the new node, and rightRotate relies on the updated size
RBSTNode* res = rightRotate(target);
#ifdef DEBUG
if (target->size() != size(target))
BUG("in addRoot 2");
#endif
return res;
};
/**
* This function is called from the public add(key) function,
* and returns the new root node.
*/
RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key)
{
countAdd++;
if (target == NULL)
{
m_size++;
return new RBSTNode(key);
}
#ifdef DEBUG
cout << "randomAdd(";
target->print(cout) << ", \"" << key << "\") called." << endl;
#endif
// a new key becomes the root of a subtree of size n with probability 1/(n+1)
int r = rand() % (target->size() + 1);
// here is where we add the target as root!
if (r == 0)
{
m_size++; // TODO: Need to lock.
return addRoot(target, key);
}
#ifdef DEBUG
printf("randomAdd recursion part, ");
#endif
// otherwise, continue recursing!
if (*target <= key)
{
#ifdef DEBUG
printf("target <= key\n");
#endif
target->setRight( randomAdd(target->right(), key) );
target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
if (target->right()->size() != size(target->right()))
BUG("in randomAdd 1");
#endif
}
else
{
#ifdef DEBUG
printf("target > key\n");
#endif
target->setLeft( randomAdd(target->left(), key) );
target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
if (target->left()->size() != size(target->left()))
BUG("in randomAdd 2");
#endif
}
#ifdef DEBUG
printf("randomAdd return part\n");
#endif
return target;
};
/////////////////////////////////////////////////////////////
///////////////////// DEL FUNCTIONS ////////////////////////
/////////////////////////////////////////////////////////////
/**
* Deletes a node with the passed key.
* Returns the root node.
* Decrements m_size if something was deleted.
*/
RBSTNode* RBST::del(RBSTNode* target, const Key& key)
{
countDelete++;
if (target == NULL) return NULL;
#ifdef DEBUG
cout << "del(";
target->print(cout) << ", \"" << key << "\") called." << endl;
#endif
RBSTNode* ret = NULL;
// found the node to delete
if (*target == key)
{
ret = join(target->left(), target->right());
m_size--;
delete target;
return ret; // return the newly built joined subtree!
}
// store a temporary size before recursive deletion.
unsigned int size = m_size;
if (*target < key) target->setRight( del(target->right(), key) );
else target->setLeft( del(target->left(), key) );
// if the previous recursion changed the size, we need to decrement the size of this target too.
if (m_size < size) target->decrSize();
#ifdef DEBUG
if (RBST::size(target) != target->size())
BUG("in del");
#endif
return target;
};
/**
* Joins the two subtrees represented by left and right
* by randomly choosing which to make the root, weighted on the
* size of the sub-tree.
*/
RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right)
{
if (left == NULL) return right;
if (right == NULL) return left;
#ifdef DEBUG
cout << "join(";
left->print(cout);
cout << ",";
right->print(cout) << ") called." << endl;
#endif
// Find the chance that we use the left tree, based on its size over the total tree size.
// Per-mille resolution (three decimal places) :-p e.g. a 60.3% chance.
bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0));
RBSTNode* subtree = NULL;
if (useLeft)
{
subtree = join(left->right(), right);
left->setRight(subtree)
->setSize((left->left() == NULL ? 0 : left->left()->size())
+ subtree->size() + 1 );
#ifdef DEBUG
if (size(left) != left->size())
BUG("in join 1");
#endif
return left;
}
subtree = join(left, right->left()); // every key in left is smaller than every key in right->left()
right->setLeft(subtree)
->setSize((right->right() == NULL ? 0 : right->right()->size())
+ subtree->size() + 1);
#ifdef DEBUG
if (size(right) != right->size())
BUG("in join 2");
#endif
return right;
};
/////////////////////////////////////////////////////////////
///////////////////// FIND FUNCTIONS ///////////////////////
/////////////////////////////////////////////////////////////
/**
* Tries to find the key in the tree starting
* search from target.
*
* Returns NULL if it was not found.
*/
RBSTNode* RBST::find(RBSTNode* target, const Key& key)
{
countFind++; // Could use private method only counting the first call.
if (target == NULL) return NULL; // not found.
if (*target == key) return target; // found (does string override ==?)
if (*target < key) return find(target->right(), key); // search for gt to the right.
return find(target->left(), key); // search for lt to the left.
};
#ifdef DEBUG
unsigned int RBST::size(RBSTNode* x)
{
if (x == NULL) return 0;
return 1 + size(x->left()) + size(x->right());
}
#endif
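For comparison, the two random decisions above can be drawn exactly with integer arithmetic. In the usual formulation of randomized BSTs (Martínez and Roura), a new key becomes the root of a subtree of n nodes with probability 1/(n+1), and join takes the root from the left side with probability nl/(nl+nr). The following is a minimal sketch using <random>; the helper names rbstRng, insertAtRoot and joinUsesLeft are hypothetical and not part of the RBST class above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helpers, not part of the RBST class above.

// Shared generator, seeded once for illustration.
inline std::mt19937& rbstRng()
{
    static std::mt19937 gen(12345u);
    return gen;
}

// Insertion: the new key becomes the root of a subtree holding n nodes
// with probability 1/(n+1).
inline bool insertAtRoot(std::size_t n)
{
    std::uniform_int_distribution<std::size_t> dist(0, n);  // n + 1 equally likely outcomes
    return dist(rbstRng()) == 0;
}

// Join: the left subtree supplies the root with probability nl / (nl + nr).
inline bool joinUsesLeft(std::size_t nl, std::size_t nr)
{
    assert(nl + nr > 0);  // join returns early when either side is empty
    std::uniform_int_distribution<std::size_t> dist(0, nl + nr - 1);
    return dist(rbstRng()) < nl;
}
```

Drawing an integer in [0, nl + nr) avoids the float rounding of the per-mille approach used in join.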
I'll save the SkipList for another time since it's already possible to find good implementations of a SkipList from the links and my version wasn't much different.
The graphs generated from the test file are as follows:
Graph showing time taken to add new items for BloomFilter, RBST and SkipList: http://haf.se/content/dl/addtimer.png
Graph showing time taken to find items for BloomFilter, RBST and SkipList: http://haf.se/content/dl/findtimer.png
Graph showing time taken to delete items for BloomFilter, RBST and SkipList: http://haf.se/content/dl/deltimer.png
As the graphs show, the randomized binary search tree performed considerably better than the SkipList, and the Bloom filter lives up to its O(k) cost (k being the number of hash functions).
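To illustrate where the O(k) comes from, here is a minimal Bloom filter sketch (not the implementation that was benchmarked): add sets k bits and a lookup reads at most k bits, independent of how many keys are stored. The bit-array size M, the probe count K and the double-hashing scheme built on std::hash are assumptions chosen for brevity. A plain Bloom filter like this one cannot delete keys and can return false positives, but never false negatives.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Minimal Bloom filter sketch: M bits, K probes per key (both assumptions).
template <std::size_t M, std::size_t K>
class BloomFilter {
public:
    void add(const std::string& key)
    {
        for (std::size_t k = 0; k < K; ++k)
            bits_.set(probe(key, k));  // exactly K bit writes: O(k)
    }

    // false means "definitely absent"; true means "probably present".
    bool mightContain(const std::string& key) const
    {
        for (std::size_t k = 0; k < K; ++k)
            if (!bits_.test(probe(key, k))) return false;  // at most K bit reads
        return true;
    }

private:
    // k-th probe index via double hashing: (h1 + k * h2) mod M.
    static std::size_t probe(const std::string& key, std::size_t k)
    {
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + '#') | 1;  // second hash, forced odd
        return (h1 + k * h2) % M;
    }

    std::bitset<M> bits_;
};
```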
Consider the hash-based collections for this, e.g. HashSet, Dictionary or Hashtable, which provide average constant-time performance for adding and removing elements.
More information from the .NET Framework Developer's Guide:
Hashtable and Dictionary Collection Types
HashSet Collection Type
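The same advice applies in C++, where std::unordered_map and std::unordered_set offer average constant-time insertion, lookup and removal (degrading to linear time when many keys collide). A minimal sketch, assuming clients are keyed by a hypothetical integer id; ClientTable, addClient and removeClient are illustrative names:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Clients keyed by a hypothetical integer id.
// insert/erase/find are average O(1); worst case O(n) if keys collide badly.
using ClientTable = std::unordered_map<int, std::string>;

inline bool addClient(ClientTable& table, int id, const std::string& name)
{
    return table.emplace(id, name).second;  // false if the id is already present
}

inline bool removeClient(ClientTable& table, int id)
{
    return table.erase(id) == 1;  // erase(key) returns the number of elements removed
}
```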
Well, how much do you need to query it? A linked list has fast insert/delete at any position (once you hold a reference to the node), but isn't as quick to search as, for example, a dictionary or sorted list. Alternatively, use a plain list where each cell holds a flag/value pair, i.e. "still has value": reuse logically empty cells before appending, and let delete just clear the cell.
For reference types, null would do as the empty marker. For value types, use Nullable<T>.
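A minimal C++ sketch of the "reuse empty cells" list, assuming std::optional (C++17) as the empty marker in place of null or Nullable<T>. Instead of scanning for an empty cell before appending, this version keeps a stack of freed indices, which is an added assumption rather than part of the answer; SlotList and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// List whose cells may be empty: delete clears a cell, add reuses a cleared
// cell before appending. std::optional plays the role of null / Nullable<T>.
template <typename T>
class SlotList {
public:
    // Returns the index (slot) the value was stored in.
    std::size_t add(T value)
    {
        if (!free_.empty()) {                 // reuse a logically empty cell
            const std::size_t slot = free_.back();
            free_.pop_back();
            cells_[slot] = std::move(value);
            return slot;
        }
        cells_.push_back(std::move(value));   // otherwise append
        return cells_.size() - 1;
    }

    // Clears the cell; indices of other elements stay valid.
    void remove(std::size_t slot)
    {
        assert(slot < cells_.size() && cells_[slot].has_value());
        cells_[slot].reset();
        free_.push_back(slot);
    }

    bool occupied(std::size_t slot) const
    {
        return slot < cells_.size() && cells_[slot].has_value();
    }

    const T& at(std::size_t slot) const { return cells_.at(slot).value(); }
    std::size_t capacity() const { return cells_.size(); }

private:
    std::vector<std::optional<T>> cells_;
    std::vector<std::size_t> free_;  // indices of cleared cells
};
```

Because removal never shifts elements, a slot index stays valid as a handle for as long as its element lives, which is the main advantage over erasing from the middle of a vector.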
You could use a Hashtable or a strongly typed Dictionary<TKey, Client>. The Client class might override GetHashCode (together with Equals) to provide faster hash code generation, or, if you use Hashtable, you can optionally supply an IHashCodeProvider (obsolete since .NET 2.0 in favor of IEqualityComparer).
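In C++ the counterpart of overriding Equals and GetHashCode is supplying an equality functor and a hash functor to std::unordered_set (or std::unordered_map). A minimal sketch, assuming a hypothetical Client type whose identity is its integer id; ClientEq, ClientHash and ClientSet are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical Client type; only 'id' participates in identity.
struct Client {
    int id;
    std::string name;
};

// Equality and hash must agree: equal clients must produce equal hashes.
struct ClientEq {
    bool operator()(const Client& a, const Client& b) const { return a.id == b.id; }
};

struct ClientHash {
    std::size_t operator()(const Client& c) const { return std::hash<int>{}(c.id); }
};

using ClientSet = std::unordered_set<Client, ClientHash, ClientEq>;
```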
How do you need to find the clients? Is a Tuple/Dictionary necessary? You're more than likely to find something that solves your problem in Jeffrey Richter's Power Collections library, which has lists, trees and most other data structures you can think of.
I was very impressed by the Channel9 interview with Peter Sestoft:
channel9.msdn.com/shows/Going+Deep/Peter-Sestoft-C5-Generic-Collection-Library-for-C-and-CLI/
He is a professor at the IT University of Copenhagen who helped create the C5 Generic Collection Library:
www.itu.dk/research/c5/
It might be overkill or it might be just the speedy collection you were looking for ...
hth,
-Mike
For the life of me, I can't remember how to set, delete, toggle or test a bit in a bitfield. Either I'm unsure or I mix them up because I rarely need these. So a "bit-cheat-sheet" would be nice to have.
For example:
flags = flags | FlagsEnum.Bit4; // Set bit 4.
or
if ((flags & FlagsEnum.Bit4) == FlagsEnum.Bit4) // Is there a less verbose way?
Can you give examples of all the other common operations, preferably in C# syntax using a [Flags] enum?
I did some more work on these extensions; you can find the code here.
I wrote some extension methods that extend System.Enum that I use often... I'm not claiming that they are bulletproof, but they have helped... Comments removed...
using System; // Exception, ArgumentException

namespace Enum.Extensions {
public static class EnumerationExtensions {
public static bool Has<T>(this System.Enum type, T value) {
try {
return (((int)(object)type & (int)(object)value) == (int)(object)value);
}
catch {
return false;
}
}
public static bool Is<T>(this System.Enum type, T value) {
try {
return (int)(object)type == (int)(object)value;
}
catch {
return false;
}
}
public static T Add<T>(this System.Enum type, T value) {
try {
return (T)(object)(((int)(object)type | (int)(object)value));
}
catch(Exception ex) {
throw new ArgumentException(
string.Format(
"Could not append value from enumerated type '{0}'.",
typeof(T).Name
), ex);
}
}
public static T Remove<T>(this System.Enum type, T value) {
try {
return (T)(object)(((int)(object)type & ~(int)(object)value));
}
catch (Exception ex) {
throw new ArgumentException(
string.Format(
"Could not remove value from enumerated type '{0}'.",
typeof(T).Name
), ex);
}
}
}
}
Then they are used like the following
SomeType value = SomeType.Grapes;
bool isGrapes = value.Is(SomeType.Grapes); //true
bool hasGrapes = value.Has(SomeType.Grapes); //true
value = value.Add(SomeType.Oranges);
value = value.Add(SomeType.Apples);
value = value.Remove(SomeType.Grapes);
bool hasOranges = value.Has(SomeType.Oranges); //true
bool isApples = value.Is(SomeType.Apples); //false
hasGrapes = value.Has(SomeType.Grapes); //false
In .NET 4 you can now write:
flags.HasFlag(FlagsEnum.Bit4)
The idiom is to use the bitwise or-equal operator to set bits:
flags |= 0x04;
To clear a bit, the idiom is to use bitwise and with negation:
flags &= ~0x04;
Sometimes you have an offset that identifies your bit, and then the idiom is to use these combined with left-shift:
flags |= 1 << offset;
flags &= ~(1 << offset);
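A sketch of these idioms as small C++ helpers, with offset counted from 0 at the least significant bit. The function names are hypothetical; an unsigned 32-bit type is used so that ~ and the shifts avoid sign-related pitfalls.

```cpp
#include <cassert>
#include <cstdint>

// Bit idioms with 'offset' counted from 0 (the LSB); offset must be < 32.
inline std::uint32_t setBit(std::uint32_t flags, unsigned offset)
{
    assert(offset < 32);
    return flags | (std::uint32_t{1} << offset);   // flags |= 1 << offset
}

inline std::uint32_t clearBit(std::uint32_t flags, unsigned offset)
{
    assert(offset < 32);
    return flags & ~(std::uint32_t{1} << offset);  // flags &= ~(1 << offset)
}

inline std::uint32_t toggleBit(std::uint32_t flags, unsigned offset)
{
    assert(offset < 32);
    return flags ^ (std::uint32_t{1} << offset);   // flags ^= 1 << offset
}

inline bool testBit(std::uint32_t flags, unsigned offset)
{
    assert(offset < 32);
    return (flags & (std::uint32_t{1} << offset)) != 0;
}
```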
@Drew:
Note that, except in the simplest of cases, Enum.HasFlag carries a heavy performance penalty in comparison to writing out the code manually. Consider the following code:
using System;
using System.Diagnostics; // Stopwatch

[Flags]
public enum TestFlags
{
One = 1,
Two = 2,
Three = 4,
Four = 8,
Five = 16,
Six = 32,
Seven = 64,
Eight = 128,
Nine = 256,
Ten = 512
}
class Program
{
static void Main(string[] args)
{
TestFlags f = TestFlags.Five; /* or any other enum */
bool result = false;
Stopwatch s = Stopwatch.StartNew();
for (int i = 0; i < 10000000; i++)
{
result |= f.HasFlag(TestFlags.Three);
}
s.Stop();
Console.WriteLine(s.ElapsedMilliseconds); // *4793 ms*
s.Restart();
for (int i = 0; i < 10000000; i++)
{
result |= (f & TestFlags.Three) != 0;
}
s.Stop();
Console.WriteLine(s.ElapsedMilliseconds); // *27 ms*
Console.ReadLine();
}
}
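Over 10 million iterations, the HasFlag method takes a whopping 4793 ms, compared to 27 ms for the standard bitwise implementation. Note that since .NET Core 2.1 the JIT treats HasFlag as an intrinsic and avoids the boxing, so on newer runtimes the gap is much smaller.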
.NET's built-in flag enum operations are unfortunately quite limited. Most of the time users are left with figuring out the bitwise operation logic.
In .NET 4, the HasFlag method was added to Enum, which helps simplify users' code, but unfortunately there are many problems with it.
HasFlag is not type-safe, as it accepts any type of enum value argument, not just the given enum type.
HasFlag is ambiguous as to whether it checks if the value has all or any of the flags provided by the enum value argument. It checks for all of them, by the way.
HasFlag is rather slow, as it requires boxing, which causes allocations and thus more garbage collections.
Due in part to .NET's limited support for flag enums I wrote the OSS library Enums.NET which addresses each of these issues and makes dealing with flag enums much easier.
Below are some of the operations it provides along with their equivalent implementations using just the .NET framework.
Combine Flags
    .NET:      flags | otherFlags
    Enums.NET: flags.CombineFlags(otherFlags)
Remove Flags
    .NET:      flags & ~otherFlags
    Enums.NET: flags.RemoveFlags(otherFlags)
Common Flags
    .NET:      flags & otherFlags
    Enums.NET: flags.CommonFlags(otherFlags)
Toggle Flags
    .NET:      flags ^ otherFlags
    Enums.NET: flags.ToggleFlags(otherFlags)
Has All Flags
    .NET:      (flags & otherFlags) == otherFlags or flags.HasFlag(otherFlags)
    Enums.NET: flags.HasAllFlags(otherFlags)
Has Any Flags
    .NET:      (flags & otherFlags) != 0
    Enums.NET: flags.HasAnyFlags(otherFlags)
Get Flags
    .NET:
        Enumerable.Range(0, 64)
            .Where(bit => ((flags.GetTypeCode() == TypeCode.UInt64 ? (long)(ulong)flags : Convert.ToInt64(flags)) & (1L << bit)) != 0)
            .Select(bit => Enum.ToObject(flags.GetType(), 1L << bit))
    Enums.NET: flags.GetFlags()
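The operations in the table above map directly onto C++ bitwise operators. As a rough illustration only (these are not Enums.NET's API), here is a sketch with a hypothetical scoped enum Perm and one free function per row:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical flag enum, used only for this sketch.
enum class Perm : std::uint8_t { None = 0, Read = 1, Write = 2, Exec = 4 };

// Conversions between the enum and plain unsigned bits.
constexpr unsigned bits(Perm p) { return static_cast<unsigned>(p); }
constexpr Perm perm(unsigned v) { return static_cast<Perm>(v); }

// One function per row of the table (plain C++, not the Enums.NET API).
constexpr Perm combineFlags(Perm a, Perm b) { return perm(bits(a) | bits(b)); }   // a | b
constexpr Perm removeFlags(Perm a, Perm b)  { return perm(bits(a) & ~bits(b)); }  // a & ~b
constexpr Perm commonFlags(Perm a, Perm b)  { return perm(bits(a) & bits(b)); }   // a & b
constexpr Perm toggleFlags(Perm a, Perm b)  { return perm(bits(a) ^ bits(b)); }   // a ^ b
constexpr bool hasAllFlags(Perm a, Perm b)  { return (bits(a) & bits(b)) == bits(b); }
constexpr bool hasAnyFlags(Perm a, Perm b)  { return (bits(a) & bits(b)) != 0; }

// "Get Flags": split a combined value into its single-bit flags, lowest first.
inline std::vector<Perm> getFlags(Perm a)
{
    std::vector<Perm> out;
    for (unsigned bit = 0; bit < 8 * sizeof(std::underlying_type_t<Perm>); ++bit)
        if (bits(a) & (1u << bit))
            out.push_back(perm(1u << bit));
    return out;
}
```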
I'm trying to get these improvements incorporated into .NET Core and maybe eventually the full .NET Framework. You can check out my proposal here.
C++ syntax, assuming bits are numbered from 1 (bit 1 is the LSB), n is the bit number and flags is an unsigned long:
Check if set:
flags & (1UL << (n - 1))
Check if not set:
invert the test: !(flags & (1UL << (n - 1)))
Set:
flags |= (1UL << (n - 1))
Clear:
flags &= ~(1UL << (n - 1))
Toggle:
flags ^= (1UL << (n - 1))
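If a std::bitset is acceptable instead of a raw unsigned long, the standard library provides the same operations as named members (set, reset, flip, test). Note that bitset positions start at 0, whereas the expressions above subtract 1 from the bit number; the bitPos helper and sampleFlags function below are hypothetical names for this sketch:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Converts a 1-based bit number (as used above) to a 0-based bitset position.
constexpr std::size_t bitPos(std::size_t bitNumber) { return bitNumber - 1; }

// Builds a sample value using the named bitset operations.
inline std::bitset<32> sampleFlags()
{
    std::bitset<32> flags;   // default-constructed: all bits clear
    flags.set(bitPos(4));    // set bit 4: like flags |= (1UL << 3)
    flags.set(bitPos(1));    // set bit 1 ...
    flags.reset(bitPos(1));  // ... and clear it again: like flags &= ~(1UL << 0)
    flags.flip(bitPos(6));   // toggle bit 6: like flags ^= (1UL << 5)
    return flags;            // bits 4 and 6 are now set
}
```

Unlike operator[], bitset::test and bitset::set throw std::out_of_range for a position past the end, which can catch off-by-one mistakes.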
For the best performance and zero garbage, use this:
using System;
using T = MyNamespace.MyFlags;
namespace MyNamespace
{
[Flags]
public enum MyFlags
{
None = 0,
Flag1 = 1,
Flag2 = 2
}
static class MyFlagsEx
{
public static bool Has(this T type, T value)
{
return (type & value) == value;
}
public static bool Is(this T type, T value)
{
return type == value;
}
public static T Add(this T type, T value)
{
return type | value;
}
public static T Remove(this T type, T value)
{
return type & ~value;
}
}
}
Bitwise (Flags) enum guide
Old, but wanted to take a stab at a cheat sheet, even if for my own reference:
Operation        Syntax          Example
On               |=              e |= E.A
Off              &= + ~          e &= ~E.A
Toggle           ^=              e ^= E.A
Test (.NET API)  .HasFlag        e.HasFlag(E.A)
Test (bitwise)   (see example)   (e & E.A) == E.A
Examples
[Flags]
enum E {
A = 0b1,
B = 0b10,
C = 0b100
}
E e = E.A; // Assign (e = A)
e |= E.B | E.C; // Add (e = A, B, C)
e &= ~E.A & ~E.B; // Remove (e = C) -- alt syntax: &= ~(E.A | E.B)
e ^= E.A | E.C; // Toggle (e = A)
e.HasFlag(E.A); // Test (returns true)
// Testing multiple flags using bit operations:
bool hasAandB = ( e & (E.A | E.B) ) == (E.A | E.B);
Bonus: defining a Flags enum
Typically, we use integers like so:
[Flags]
enum E {
A = 1,
B = 2,
C = 4,
// etc.
But as we approach larger numbers, it's not as easy to calculate the next value:
// ...
W = 4194304,
X = 8388608,
// ..
There are a couple of alternatives, however: binary and hexadecimal literals.
For Binary, just append a 0 at the end of the previous value:
[Flags]
enum E {
A = 0b1,
B = 0b10,
C = 0b100,
// ...
W = 0b100_0000_0000_0000_0000_0000,
X = 0b1000_0000_0000_0000_0000_0000,
Hexadecimal also has a handy pattern and might look a bit less ugly: cycle through 1, 2, 4, 8, adding a zero after each complete iteration.
[Flags]
enum E {
A = 0x1,
B = 0x2,
C = 0x4,
D = 0x8,
E = 0x10, // 16
F = 0x20, // 32, etc.
// ...
W = 0x400000,
X = 0x800000,
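The same patterns carry over to C++, where C++14 binary literals use ' as the digit separator instead of C#'s _. A third option, common in C and C++, is writing each value as a shift (1u << n). The sketch below, using a hypothetical unscoped enum in a hypothetical namespace flagdemo, checks at compile time that the decimal, binary, hex and shift spellings of W and X agree:

```cpp
#include <cassert>
#include <cstdint>

namespace flagdemo {

// Shift form: the n-th flag (counting from 0) is 1u << n.
enum Letters : std::uint32_t {
    A = 1u << 0,
    B = 1u << 1,
    C = 1u << 2,
    W = 1u << 22,
    X = 1u << 23,
};

// Compile-time checks that the spellings from the answer agree.
static_assert(W == 4194304u, "decimal form");
static_assert(W == 0b100'0000'0000'0000'0000'0000u, "binary form");
static_assert(W == 0x400000u, "hex form");
static_assert(X == 8388608u && X == 0x800000u, "X is the next power of two");

}  // namespace flagdemo
```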
To test a bit you would do the following:
(assuming flags is a 32-bit number)
Test Bit:
if ((flags & 0x08) == 0x08) (true if bit 4 is set)
Toggle bit 4 (1 to 0 or 0 to 1): flags = flags ^ 0x08;
Reset bit 4 to zero: flags = flags & 0xFFFFFFF7;
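Rather than typing a clearing mask by hand, it can be derived with ~, which lets the compiler do the complement. A minimal C++ sketch, assuming 32-bit unsigned flags; the constant and function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Bit 4 (1-based) is 0x08; its complement in 32 bits is 0xFFFFFFF7.
constexpr std::uint32_t kBit4 = 0x08u;
constexpr std::uint32_t kClearBit4Mask = ~kBit4;
static_assert(kClearBit4Mask == 0xFFFFFFF7u, "complement of 0x08");

constexpr bool isBit4Set(std::uint32_t flags) { return (flags & kBit4) == kBit4; }
constexpr std::uint32_t toggleBit4(std::uint32_t flags) { return flags ^ kBit4; }
constexpr std::uint32_t resetBit4(std::uint32_t flags) { return flags & kClearBit4Mask; }
```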
This was inspired by using Sets as indexers in Delphi, way back when:
using System.Reflection; // BindingFlags

/// Example of using a Boolean indexed property
/// to manipulate a [Flags] enum:
public class BindingFlagsIndexer
{
BindingFlags flags = BindingFlags.Default;
public BindingFlagsIndexer()
{
}
public BindingFlagsIndexer( BindingFlags value )
{
this.flags = value;
}
public bool this[BindingFlags index]
{
get
{
return (this.flags & index) == index;
}
set
{
if( value )
this.flags |= index;
else
this.flags &= ~index;
}
}
public BindingFlags Value
{
get
{
return flags;
}
set
{
this.flags = value;
}
}
public static implicit operator BindingFlags( BindingFlagsIndexer src )
{
return src != null ? src.Value : BindingFlags.Default;
}
public static implicit operator BindingFlagsIndexer( BindingFlags src )
{
return new BindingFlagsIndexer( src );
}
}
public static class Class1
{
public static void Example()
{
BindingFlagsIndexer myFlags = new BindingFlagsIndexer();
// Sets the flag(s) passed as the indexer:
myFlags[BindingFlags.ExactBinding] = true;
// Indexer can specify multiple flags at once:
myFlags[BindingFlags.Instance | BindingFlags.Static] = true;
// Get boolean indicating if specified flag(s) are set:
bool flatten = myFlags[BindingFlags.FlattenHierarchy];
// use | to test if multiple flags are set:
bool isProtected = ! myFlags[BindingFlags.Public | BindingFlags.NonPublic];
}
}
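C++ has no indexed properties, but the same Boolean-indexer idea can be sketched with a proxy object returned from operator[]. This is a rough analogue, not a port of the class above; FlagSet and Ref are hypothetical names, and flags are plain 32-bit masks rather than BindingFlags:

```cpp
#include <cassert>
#include <cstdint>

// flags[mask] reads "all bits of mask are set"; assigning true/false
// sets or clears those bits.
class FlagSet {
public:
    explicit FlagSet(std::uint32_t value = 0) : value_(value) {}

    // Proxy returned by the non-const operator[].
    class Ref {
    public:
        Ref(std::uint32_t& v, std::uint32_t mask) : v_(v), mask_(mask) {}
        operator bool() const { return (v_ & mask_) == mask_; }  // getter
        Ref& operator=(bool on)                                  // setter
        {
            if (on) v_ |= mask_;
            else    v_ &= ~mask_;
            return *this;
        }
    private:
        std::uint32_t& v_;
        std::uint32_t mask_;
    };

    Ref operator[](std::uint32_t mask) { return Ref(value_, mask); }
    bool operator[](std::uint32_t mask) const { return (value_ & mask) == mask; }
    std::uint32_t value() const { return value_; }

private:
    std::uint32_t value_;
};
```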
C++ operations are: & | ^ ~ (for and, or, xor and not bitwise operations). Also of interest are >> and <<, which are bitshift operations.
So, to test for a bit being set in a flag, you would use:
if (flags & 8) // tests whether bit 4 is set
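The >> operator gives a second common way to test a bit: shift the value down and look at the lowest bit. A minimal C++ sketch comparing it with the mask form, with n being the 1-based bit number as in the comment above (valid for 1 <= n <= 32 with a 32-bit unsigned); the function names are hypothetical:

```cpp
#include <cassert>

// Mask form: AND with a shifted 1.
constexpr bool testWithMask(unsigned flags, unsigned n) { return (flags & (1u << (n - 1))) != 0; }

// Shift form: move bit n down to position 0 and look at the lowest bit.
constexpr bool testWithShift(unsigned flags, unsigned n) { return ((flags >> (n - 1)) & 1u) != 0; }

// 8 == 0b1000 has only bit 4 set.
static_assert(testWithMask(8u, 4) && testWithShift(8u, 4), "bit 4 of 8 is set");
static_assert(!testWithMask(8u, 3) && !testWithShift(8u, 3), "bit 3 of 8 is clear");
```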