I need to perform a bitwise equality between two bytes. That means that for instance if I have two bytes: 00011011 and 00011110 the result is 11111010
The only fast way I see is to use the following statement
byte a, b;//set input bytes
byte c = ~(a^b);//output bytes
But I wonder if there is a faster solution for this. After these equality operation I want to mask the bits I need. So I need to use an AND-operation. So the code becomes:
byte a, b;//set input bytes
byte m;//mask, intresting bits are set to 1, others to 0
byte c = (~(a^b))&m;//output bytes
aren't there any faster and more simple methods that don't need to use all those bitwise operations because this part of the code will be called very often.
I doubt it can be done in fewer operations. That looks optimal. Perhaps you could store ~(a^b) in a lookup table (256*256 entries)? I doubt you would get much benefit and possibly even make things worse, but you could try it.
You are looking in the wrong place for this optimization; you won't end up finding any better bitwise operation here. Even if you did, it's hardly going to speed anything up. The real win will come from processing more than just a byte at a time. The processor is already having to do a bunch of bit shifting and masking operations just so that it can pretend for you that you are working with bytes. Process your arrays of bytes 1 word at a time, or use vector instructions if they are available.
These operations seem fast enough to be honest. I think you shouldn't try to optimize them further, but finish your software first, see if you are happy with the overall performance and use a profiler if you are not. I am fairly sure the problem will be elsewhere.
What you want is an XNOR operation. Unfortunately this is not supported by C#/Mono. I think your solution is optimal.
Related
I have a tree structure where each node knows its CRC. What's a reasonable way to compute a CRC for each sub-tree that would give me a good value for the entire sub-tree to that point? In other words, a value to identify if any part of the sub-tree was changed.
My current thought is simply take each child node CRC, convert it to a string/byte[], concatenate all the nodes together, and take the CRC of that byte[]. But I'm not sure if this might lead to easy collisions as I suspect this removes quite a bit of information.
(I looked at crc32_combine but it seems inappropriate because I don't have any lengths. I could use a length of zero, but would that be any better or worse?)
Working in C# but I guess this is really language agnostic.
EDIT: Ended up going with this technique. Will switch to longer hashes if collisions seem to be a problem. While I don't need leaf order to be important, am not using xor just in case it is later on.
Ideally you would combine the CRCs of the nodes to compute the CRC of a sub-tree, using something like crc32_combine(). The result would be the same as computing the CRC over all the nodes in whatever canonical ordering you have defined. This would only check the ordering though, not the structure of the tree. A different structure with the same ordering would give the same CRC. This will be true no matter how you combine the CRCs, unless you include additional information on the tree structure.
The use of crc32_combine() requires the length of the data for each of the CRCs being combined (except the first). This is apparently not saved and not available in this case. You can instead make a stream of bytes of the CRCs in the canonical order and take the CRC of that stream. (You will need to decide if the CRCs are to be stored big or little endian in the byte stream, and then stick to your convention.)
The use of cryptographic signatures such as SHA1 or MD5 is unnecessary, unless you are worried for some reason that a devious human is interfering with your computed check values and trying to trick you into thinking that contents of the tree have not changed when they have. (The devious human can already do this at the level of the nodes anyway, since CRCs are easily spoofed.) Otherwise such signatures are just a waste of CPU time. If you simply want a longer hash, more than 32 bits, to reduce the probability of collisions, then you can use a fast hash function such as one from the CityHash family.
I'd probably use least SHA1 for your checksums since the collisions aren't that infrequent for MD5s and your idea about combining the CRCs seems solid though personally I'd XOR the hashes together to save on RAM and CPU cycles.
You should use something designed for this such as SHA-2. You may be able to get away with CRC32 depending on your particular requirements. There is a similar question posted here with more discussion:
Can CRC32 be used as a hash function?
15 years ago, while programming with Pascal, I understood why to use power of two's for memory allocation. But this still seems to be state-of-the-art.
C# Examples:
new StringBuilder(256);
new byte[1024];
int bufferSize = 1 << 12;
I still see this thousands of times, I use this myself and I'm still questioning:
Do we need this in modern programming languages and modern hardware?
I guess its good practice, but what's the reason?
EDIT
For example a byte[] array, as stated by answers here, a power of 2 will make no sense: the array itself will use 16 bytes (?), so does it make sense to use 240 (=256-16) for the size to fit a total of 256 bytes?
Do we need this in modern programming languages and modern hardware? I guess its good practice, but what's the reason?
It depends. There are two things to consider here:
For sizes less than the memory page size, there's no appreciable difference between a power-of-two and an arbitrary number to allocate space;
You mostly use managed data structures with C#, so you won't even know how many bytes are really allocated underneath.
Assuming you're doing low-level allocation with malloc(), using multiples of the page size would be considered a good idea, i.e. 4096 or 8192; this is because it allows for more efficient memory management.
My advice would be to just allocate what you need and let C# handle the memory management and allocation for you.
Sadly, it's quite stupid if you want to keep a block of memory in a single memory page of 4k... And persons don't even know it :-) (I didn't until 10 minutes ago... I only had an hunch)... An example... It's unsafe code and implementation dependant (using .NET 4.5 at 32/64 bits)
byte[] arr = new byte[4096];
fixed (byte* p = arr)
{
int size = ((int*)p)[IntPtr.Size == 4 ? -1 : -2];
}
So the CLR has allocated at least 4096 + (1 or 2) sizeof(int)... So it has gone over one 4k memory page. This is logical... It has to keep the size of the array somewhere, and keeping it together with the array is the most intelligent thing (for those that know what Pascal Strings and BSTR are, yes, it's the same principle)
I'll add that all the objects in .NET have a syncblck number and a RuntimeType... They are at least int if not IntPtr, so a total of between 8 and 16 bytes/object (This is explained in various places... try looking for .net object header if you are interested)
It still makes sense in certain cases, but I would prefer to analyze case-by-case whether I need that kind of specification or not, rather than blindly use it as good practice.
For example, there might be cases where you want to use exactly 8 bits of information (1 byte) to address a table.
In that case, I would let the table have the size of 2^8.
Object table = new Object[256];
By this, you will be able to address any object of the table using only one byte.
Even if the table is actually smaller and doesn't use all 256 places, you still have the guarantee of bidirectional mapping from table to index and from index to table, which could prevent errors that would appear, for example, if you had:
Object table = new Object[100];
And then someone (probably someone else) accesses it with a byte value out of table's range.
Maybe this kind of bijective behavior could be good, maybe you could have other ways to guarantee your constraints.
Probably, given the increase in smartness of current compilers, it is not the only good practice anymore.
IMHO, anything ending in exact power of two's arithmeric operation is like a fast track. low level arithmeric operation for power of two takes less number of turns and bit manipulations than any other numbers need extra work for cpu.
And found this possible duplicate:Is it better to allocate memory in the power of two?
Yes, it's good practice, and it has at least one reason.
The modern processors have L1 cache-line size 64 bytes, and if you will use buffer size as 2^n (for example 1024, 4096,..), you will take fully cache-line, without wasted space.
In some cases, this will help prevent false sharing problem (http://en.wikipedia.org/wiki/False_sharing).
I'm working on a genetic algorithm project where I encode my chromosome in a binary string where each 32 bits represents a value. The problem is that the functions I'm optimizing has over 3000 parameters which implies that I have over 96000 bits in my bit string and the manipulations i do on this are simply to slow...
I have proceeded as following: I have a binary class where I'm creating a boolean array that represents my big binary string. Then I manipulate this binary string with various shifts and moves and such.
My question is, is there a better way to do this? The speed is just killing. I'm sure it would be fine if i only did this on one bit string but i have to do the manipulations on 25 bit strings for way over 1000 generations.
What I would do is take a step back. Your analysis seems to be wedded to an implementation detail, namely that you have chosen bool[] as how you represent a string of bits.
Clear your mind of bools and arrays and make a complete list of the operations you actually need to perform, how frequently they happen, and how fast they have to be. Ideally consider whether your speed requirement is average speed or worst case speed. (There are many data structures that attain high average speed by having one expensive operation for every thousand cheap operations; if having any expensive operations is unacceptable then you need to know that up front.)
Once you have that list, you can then do research on what data structures work well.
For example, suppose your list of operations is:
construct bit sequences on the order of 32 bits
concatenate on the order of 3000 bit sequences together to form new bit sequences
insert new bit sequences into existing long bit sequences at specific locations, quickly
Given just that list of operations, I'd think that the data structure you want is a catenable deque. Catenable deques support fast insertion on either end, and can be broken up into two deques efficiently. Inserting stuff into the middle of a deque is then easily done: break the deque up, insert the item into the end of one half, and join them back up again.
However, if you then add another operation to the problem, say, "search for a particular bit string anywhere in the 90000-bit sequence, and find the result in sublinear time" then just a catenable deque isn't going to do it. Searching a deque is slow. There are other data structures that support that operation.
If I understood correctly you are encoding the bit array in a bool[]. The first obvious optimisation would be to change this to int[] (or even long[]) and take advantage of bit operations on a whole machine word, where possible.
For example, this would make shifts more efficient by ~ a factor 4.
Is the BitArray class no help?
A BitArray would probably be faster than a boolean array but you would still not get built-in support to shift 96000 bits.
GPUs are extremely good at massive bit operations. Maybe Brahma, CUDA.NET, or Accelerator could be of service?
Let me understand; currently, you're using a sequence of 32-bit values for a "chromosome". Are we talking about DNA chromosomes or neuroevolutionary algorithmic chromosomes?
If it's DNA, you deal with 4 values; A,C,G,T. That can be coded in 2 bits, making a byte able to hold 4 values. Your 3000-element chromosome sequence can be stored in a 750-element byte array; that's nothing, really.
Your two most expensive operations are to and from the compressed bitstream. I would recommend a byte-keyed enum:
public enum DnaMarker : byte { A, C, G, T };
Then, you go from 4 of these to a byte with one operation:
public static byte ToByteCode(this DnaMarker[] markers)
{
byte output = 0;
for(byte i=0;i<4;i++)
output = (output << 2) + (byte)markers[i];
}
... and parse them back out with something like this:
public static DnaMarker[] ToMarkers(this byte input)
{
var result = new byte[4];
for(byte i=0;i<4;i++)
result[i] = (DnaMarker)(input - (input >> (2*(i+1))));
return result;
}
You might see a slight performance increase using four parameters (output if necessary) versus allocating and using an array in the heap. But, you lose the iteration which makes the code more compact.
Now, because you're packing them into four-byte "blocks", if your sequence length isn't always an exact multiple of four you'll end up "padding" the end of your block with zero values (A). Working around this is messy, but if you had a 32-bit integer that told you the exact number of markers, you can simply discard anything more you find in the bytestream.
From here, possibilities are endless; you can convert the enum array to a string by simply calling ToString() on each one, and likewise you can feed in a string and get an enum array by iterating through using Enum.Parse().
And always remember, unless memory is at a premium (usually it isn't), it's almost always faster to deal with the data in an easily-usable format instead of the most compact format. The one big exception is in network transmission; if you had to send 750 bytes vs 12KB over the Internet, there's an obvious advantage in the smaller size.
I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple:
Add a new value
Iterate over all of the values.
There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives.
I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions.
EDIT: To answer your questions:
No, the items don't need to be sorted
By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this.
There could be a decent number of these sets in memory at once (~100).
Use the BitArray. It uses only some 6MB of memory; the only real problem is that iteration is Theta(N), i.e. you have to walk the entire range. Locality of reference is good though and you can allocate the entire structure in one operation.
As for wasting space: you waste 6MB in the worst case.
EDIT: ok, you've lots of sets and you're serializing. For serializing on disk, I suggest 6MB files :)
For sending over the wire, just iterate and consider sending ranges instead of individual elements. That does require a sorting structure.
You need lots of these sets. Consider if you have 600MB to spare. Otherwise, check out:
Bytewise tries: O(1) insert, O(n) iteration, much lower constant factors than bitwise tries
A custom hash table, perhaps Google sparsehash through C++/CLI
BSTs storing ranges/intervals
Supernode BSTs
It would depend on the distribution of the sizes of your sets. Unless you expect most of the sets to be (close to) the minimum you've specified, I'd probably use a bitset. To cover a range up to 50,000,000, a bitset ends up ~6 megabytes.
Compared to storing the numbers directly, this is marginally larger for the minimum size set you've specified (~6 megabytes instead of ~4), but considerably smaller for the maximum size set (1/32nd the size).
The second possibility would be to use a delta encoding. For example, instead of storing each number directly, store the difference between that number and the previous number that was included. Given a maximum magnitude of 50,000,000 and a minimum size of 1,000,000 items, the average difference between one number and the next is ~50. This means you can theoretically store the difference in <6 bits on average. I'd probably use the 7 least significant bits directly, and if you need to encode a larger gap, set the msb and (for example) store the size of the gap in the lower 7 bits plus the next three bytes. That can't happen very often, so in most cases you're using only one byte per number, for about 4:1 compression compared to storing numbers directly. In the best case this would use ~1 megabyte for a set, and in the worst about 50 megabytes -- 4:1 compression compared to storing numbers directly.
If you don't mind a little bit of extra code, you could use an adaptive scheme -- delta encoding for small sets (up to 6,000,000 numbers), and a bitmap for larger sets.
I think the answer depends on what you mean by "passing around" and what you're trying to accomplish. You say you are only adding to the list: how often do you add? How fast will the list grow? What is an acceptable overhead for memory use, versus the time to reallocate memory?
In your worst case, 50,000,000 32-bit numbers = 200 megabytes using the most efficient possible data storage mechanism. Assuming you may end up with this much use in your worst case scenario, is it OK to use this much memory all the time? Is that better than having to reallocate memory frequently? What's the distribution of typical usage patterns? You could always just use an int[] that's pre-allocated to the whole 50 million.
As far as access speed for your operations, nothing is faster than iterating and adding to a pre-allocated chunk of memory.
From OP edit: There could be a decent number of these sets in memory at once (~100).
Hey now. You need to store 100 sets of 1 to 50 million numbers in memory at once? I think the bitset method is the only possible way this could work.
That would be 600 megabytes. Not insignificant, but unless they are (typically) mostly empty, it seems very unlikely that you would find a more efficient storage mechanism.
Now, if you don't use bitsets, but rather use dynamically sized constructs, and they could somehow take up less space to begin with, you're talking about a real ugly memory allocation/deallocation/garbage collection scenario.
Let's assume you really need to do this, though I can only imagine why. So your server's got a ton of memory, just allocate as many of these 6 megabyte bitsets as you need and recycle them. Allocation and garbage collection are no longer a problem. Yeah, you're using a ton of memory, but that seems inevitable.
I have found a few threads in regards to this issue. Most people appear to favor using int in their c# code accross the board even if a byte or smallint would handle the data unless it is a mobile app. I don't understand why. Doesn't it make more sense to define your C# datatype as the same datatype that would be in your data storage solution?
My Premise:
If I am using a typed dataset, Linq2SQL classes, POCO, one way or another I will run into compiler datatype conversion issues if I don't keep my datatypes in sync across my tiers. I don't really like doing System.Convert all the time just because it was easier to use int accross the board in c# code. I have always used whatever the smallest datatype is needed to handle the data in the database as well as in code, to keep my interface to the database clean. So I would bet 75% of my C# code is using byte or short as opposed to int, because that is what is in the database.
Possibilities:
Does this mean that most people who just use int for everything in code also use the int datatype for their sql storage datatypes and could care less about the overall size of their database, or do they do system.convert in code wherever applicable?
Why I care: I have worked on my own forever and I just want to be familiar with best practices and standard coding conventions.
Performance-wise, an int is faster in almost all cases. The CPU is designed to work efficiently with 32-bit values.
Shorter values are complicated to deal with. To read a single byte, say, the CPU has to read the 32-bit block that contains it, and then mask out the upper 24 bits.
To write a byte, it has to read the destination 32-bit block, overwrite the lower 8 bits with the desired byte value, and write the entire 32-bit block back again.
Space-wise, of course, you save a few bytes by using smaller datatypes. So if you're building a table with a few million rows, then shorter datatypes may be worth considering. (And the same might be good reason why you should use smaller datatypes in your database)
And correctness-wise, an int doesn't overflow easily. What if you think your value is going to fit within a byte, and then at some point in the future some harmless-looking change to the code means larger values get stored into it?
Those are some of the reasons why int should be your default datatype for all integral data. Only use byte if you actually want to store machine bytes. Only use shorts if you're dealing with a file format or protocol or similar that actually specifies 16-bit integer values. If you're just dealing with integers in general, make them ints.
I am only 6 years late but maybe I can help someone else.
Here are some guidelines I would use:
If there is a possibility the data will not fit in the future then use the larger int type.
If the variable is used as a struct/class field then by default it will be padded to take up the whole 32-bits anyway so using byte/int16 will not save memory.
If the variable is short lived (like inside a function) then the smaller data types will not help much.
"byte" or "char" can sometimes describe the data better and can do compile time checking to make sure larger values are not assigned to it on accident. e.g. If storing the day of the month(1-31) using a byte and try to assign 1000 to it then it will cause an error.
If the variable is used in an array of roughly 100 or more I would use the smaller data type as long as it makes sense.
byte and int16 arrays are not as thread safe as an int (a primitive).
One topic that no one brought up is the limited CPU cache. Smaller programs execute faster then larger ones because the CPU can fit more of the program in the faster L1/L2/L3 caches.
Using the int type can result in fewer CPU instructions however it will also force a higher percentage of the data memory to not fit in the CPU cache. Instructions are cheap to execute. Modern CPU cores can execute 3-7 instructions per clock cycle however a single cache miss on the other hand can cost 1000-2000 clock cycles because it has to go all the way to RAM.
When memory is conserved it also results in the rest of the application performing better because it is not squeezed out of the cache.
I did a quick sum test with accessing random data in random order using both a byte array and an int array.
const int SIZE = 10000000, LOOPS = 80000;
byte[] array = Enumerable.Repeat(0, SIZE).Select(i => (byte)r.Next(10)).ToArray();
int[] visitOrder = Enumerable.Repeat(0, LOOPS).Select(i => r.Next(SIZE)).ToArray();
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int sum = 0;
foreach (int v in visitOrder)
sum += array[v];
sw.Stop();
Here are the results in time(ticks): (x86, release mode, without debugger, .NET 4.5, I7-3930k) (smaller is better)
________________ Array Size __________________
10 100 1K 10K 100K 1M 10M
byte: 549 559 552 552 568 632 3041
int : 549 566 552 562 590 1803 4206
Accessing 1M items randomly using byte on my CPU had a 285% performance increase!
Anything under 10,000 was hardly noticeable.
int was never faster then byte for this basic sum test.
These values will vary with different CPUs with different cache sizes.
One final note, Sometimes I look at the now open-source .NET framework to see what Microsoft's experts do. The .NET framework uses byte/int16 surprisingly little. I could not find any actually.
You would have to be dealing with a few BILLION rows before this makes any significant difference in terms of storage capacity. Lets say you have three columns, and instead of using a byte-equivalent database type, you use an int-equivalent.
That gives us 3 (columns) x 3 (bytes extra) per row, or 9 bytes per row.
This means, for "a few million rows" (lets say three million), you are consuming a whole extra 27 megabytes of disk space! Fortunately as we're no longer living in the 1970s, you shouldn't have to worry about this :)
As said above, stop micro-optimising - the performance hit in converting to/from different integer-like numeric types is going to hit you much, much harder than the bandwidth/diskspace costs, unless you are dealing with very, very, very large datasets.
For the most part, 'No'.
Unless you know upfront that you are going to be dealing with 100's of millions of rows, it's a micro-optimisation.
Do what fits the Domain model best. Later, if you have performance problems, benchmark and profile to pin-point where they are occuring.
Not that I didn't believe Jon Grant and others, but I had to see for myself with our "million row table". The table has 1,018,000. I converted 11 tinyint columns and 6 smallint columns into int, there were already 5 int & 3 smalldatetimes. 4 different indexes used a combo of the various data types, but obviously the new indexes are now all using int columns.
Making the changes only cost me 40 mb calculating base table disk usage with no indexes. When I added the indexes back in the overall change was only 30 mb difference overall. So I was suprised because I thought the index size would be larger.
So is 30 mb worth the hassle of using all the different data types, No Way! I am off to INT land, thanks everyone for setting this anal retentive programmer back on the straight and happy blissful life of no more integer conversions...yippeee!
The .NET runtime is optimised for Int32. See previous discussion at .NET Integer vs Int16?
If int is used everywhere, no casting or conversions are required. That is a bigger bang for the buck than the memory you will save by using multiple integer sizes.
It just makes life simpler.