I'm trying to optimize an algorithm I've been working on in C#. I have a while loop that continues as long as a couple of variables are not 0. I was hoping to replace it with some binary arithmetic and a lookup table to avoid conditional branching.
while (a != 0 && b != 0) {
// DO WORK
}
I was hoping I might be able to do something like this:
var lookup_table = [next_iteration, done];
next_iteration:
// DO WORK
goto lookup_table[<< a&b? 0 : 1 >>];
done:
The idea would be to replace the << a&b ? 0 : 1 >> placeholder with binary arithmetic: probably folding all the bits of a and of b down to the 1s place, then masking out the other bits with a final AND. Perhaps there's a better technique.
Of course, this level of optimization may just be beyond the capabilities of C#/Managed Languages without dropping into IL (which could be a valid solution). In which case I can drop down to C/C++. I'm curious to see if I can't get it to work in C# though.
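To make the idea concrete, here is a rough sketch of the folding I have in mind (IsNonZero and keepGoing are just names I made up, and I haven't benchmarked any of this). Note that folding a & b directly wouldn't work, since a = 2 and b = 1 are both nonzero yet a & b == 0, so each variable gets folded separately:

static int IsNonZero(int x)
{
    // OR-fold every bit down toward bit 0; bit 0 ends up 1 iff any bit of x was set
    x |= x >> 16;
    x |= x >> 8;
    x |= x >> 4;
    x |= x >> 2;
    x |= x >> 1;
    return x & 1;
}

int keepGoing = IsNonZero(a) & IsNonZero(b); // 1 while both are nonzero, 0 otherwise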
Related
I'm working on the following practice problem from GeeksForGeeks:
Write a function Add() that returns the sum of two integers. The function should not use any of the arithmetic operators (+, ++, --, -, etc.).
The given solution in C# is:
public static int Add(int x, int y)
{
    // Iterate till there is no carry
    while (y != 0)
    {
        // carry now contains common set bits of x and y
        int carry = x & y;

        // Sum of bits of x and y where at least one of the bits is not set
        x = x ^ y;

        // Carry is shifted by one so that adding it to x gives the required sum
        y = carry << 1;
    }
    return x;
}
Looking at this solution, I understand how it is happening; I can follow along with the debugger and anticipate the value changes before they come. But after walking through it several times, I still don't understand WHY it is happening. If this was to come up in an interview, I would have to rely on memory to solve it, not actual understanding of how the algorithm works.
Could someone help explain why we use certain operators at certain points and what those totals are supposed to represent? I know there are already comments in the code, but I'm obviously missing something...
At each iteration, you have these steps:
carry <- x & y // mark every location where the addition has a carry
x <- x ^ y // sum without carries
y <- carry << 1 // shift the carry left one column
On the next iteration, x holds the entire sum except for the carry bits, which are in y. These carries are properly bumped one column to the left, just as if you were doing the addition on paper. Continue doing this until there are no more carry bits to worry about.
Very briefly, this does the addition much as you or I would do it on paper, except that, instead of working right to left, it does all the bits in parallel.
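For example, tracing the steps on 5 + 3 in four-bit binary:

x = 0101, y = 0011
carry = x & y = 0001;  x = x ^ y = 0110;  y = carry << 1 = 0010
carry = x & y = 0010;  x = x ^ y = 0100;  y = carry << 1 = 0100
carry = x & y = 0100;  x = x ^ y = 0000;  y = carry << 1 = 1000
carry = x & y = 0000;  x = x ^ y = 1000;  y = carry << 1 = 0000

y is now 0, so the loop exits and x = 1000, which is 8.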
Decimal arithmetic is more complicated than binary arithmetic, but perhaps it helps to compare them.
The algorithm that is usually taught for addition is to go through the digits one by one, remembering to "carry a one" when necessary. In the above algorithm, that is not exactly what happens - rather, all digits are added and allowed to wrap, and all the carries are collected to be applied all at once in the next step. In decimal that would look like this:
123456
777777
------ +
890123
001111 << 1
011110
------ +
801233
010000 << 1
100000
------ +
901233
000000 done
In binary arithmetic, addition without carry is just XOR.
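For instance, 0110 ^ 0011 = 0101: each column is added modulo 2, while the carries a real addition would generate are exactly the AND, 0110 & 0011 = 0010.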
What you have here is a case of binary math on the representation in memory:
https://www.wikihow.com/Add-Binary-Numbers
Generally when programming in C#, you do not bother with the "how is it represented in memory" level of things. 55% of the time it is not worth the effort, 40% of the time it is worse than just using the built-in functions. And for the remaining 5% you should ask yourself why you are not programming in native C++, assembler, or something with similar low-level capabilities to begin with.
So I have the following RotateLeft algorithm in C#:
public static ushort RotateLeft(ushort value, int count)
{
    // assumes 0 <= count <= 16; C# masks shift counts to the low 5 bits,
    // so counts outside that range will not wrap the way you might expect
    int left = value << count;          // low bits move up (overflow past bit 15 is cut off by the cast)
    int right = value >> (16 - count);  // high bits wrap around to the bottom
    return (ushort)(left | right);
}
Does this algorithm differ if the numbering scheme is different?
By numbering scheme I mean if it's MSB-0 or LSB-0
MSB-0 (bit 0 is the most significant bit):
index: 0 1 2 3 4 5 6 7
bits:  1 0 0 1 0 1 1 0
LSB-0 (bit 0 is the least significant bit):
index: 7 6 5 4 3 2 1 0
bits:  1 0 0 1 0 1 1 0
Say I wanted to shift left by 1, does having a different numbering scheme affect the algorithm?
It looks like that algorithm is agnostic of both the numbering scheme and the underlying system's little- or big-endianness. MSB-0 versus LSB-0 is only a labelling convention used in documentation; C#'s shift operators always act on the numeric value, with << moving bits toward the most significant end. Since the OR operation adds the shifted bits back in on the other side before returning, it will work the same regardless. Assuming you're using it for bit-level operations and flag checks, this might be all you need.
Without knowing how it fits into the rest of your program it's hard to say whether it will work the way you expect across different platforms using opposite bit numbering schemes. For example, if you run this on a device using MSB-0 and write some shifted data to a binary file, then read that binary data back in on a device using LSB-0, it probably won't be what you're expecting.
If your goal is to have your software work the same across differently-endian systems, take a look at the .NET BitConverter Class. If you're using this for efficient mathematical operations, the static BitConverter.IsLittleEndian field will let you check the underlying architecture so that you can shift the other way or reverse binary data accordingly.
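For example, here is a minimal sketch of normalizing byte order before serializing a rotated value (the specific value 0x9600 and count 3 are just placeholders):

ushort rotated = RotateLeft((ushort)0x9600, 3);
byte[] bytes = BitConverter.GetBytes(rotated);
if (!BitConverter.IsLittleEndian)
{
    Array.Reverse(bytes); // normalize to little-endian before writing
}
// ... write bytes to the file/stream ...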
Is there a reason I shouldn't be testing a set of variables for 0 by testing their product?
Often in my coding across different languages I will test a set of variables and do something if any of them is zero.
For example (c#):
if( myString.Length * myInt * (myUrl.LastIndexOf(@"\") + 1) == 0 )
Instead of:
if( myString.Length == 0 || myInt == 0 || myUrl.LastIndexOf(@"\") < 0 )
Is there a reason I shouldn't be testing this way?
Here are a few reasons. All are important, and they're in no particular order.
Short circuiting. In your first example, all three things will be evaluated even if they don't need to be. In some cases, this can be a real problem: short circuiting is nice because you can do things like if (myObj != null && myObj.Enabled) without throwing exceptions (see the sketch after this list).
Correctness. Is myString.Length * myInt * (myUrl.LastIndexOf(@"\") + 1) == 0 actually equivalent in all practical cases to myString.Length == 0 || myInt == 0 || myUrl.LastIndexOf(@"\") < 0? I'm not sure. I doubt it. I'm sure I could figure it out with some effort, but why should I have to in the first place? Which brings me to...
Clarity. Since the conventional way is to use separate statements and &&, anyone reading this code in the future will have a harder time understanding what it's doing. And don't make the excuse that, "I'll be the only one to read it", because in a few months or years, you'll probably have forgotten the thoughts and conventions you had when you wrote it, and be reading it just like anyone else.
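Here's the sketch mentioned above; with &&, the right-hand side is only evaluated when the left is true, so the member access is safe. A multiplication-based test would have to evaluate every factor before multiplying:

if (myObj != null && myObj.Enabled)
{
    // myObj.Enabled is never evaluated when myObj is null,
    // so no NullReferenceException is possible here
}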
You shouldn't do that because it's not obvious what you're doing. Code should be clean, readable, and easily maintainable.
It's clever, but it's going to make the next person who looks at your code have to "decipher" what your intent was by doing it that way.
There are practical cases in which that will not work.
Consider for example a string of length 65536 where the last character is a "\". 65536 * myInt * 65536 is always zero, because the product contains a factor of 2^32 and ints are only 32 bits long. All the bits that would be set fall outside the 32 bits actually kept in the result.
Depending on the value of myInt, that can also happen for shorter strings, but a string of 65k isn't that much of a stretch. (A URL of 65k may be a bit of a stretch, but whatever.)
Note, by the way, that it's not so much the overflow itself that causes the problem; the problem is having several too-high powers of 2 as factors. For example, if all factors are odd (but large), the result will be odd as well (regardless of overflow), and therefore at least one bit will be set.
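A quick sketch of that failure mode (the concrete values are just for illustration):

int length = 65536;   // myString.Length
int index  = 65536;   // myUrl.LastIndexOf(@"\") + 1
int myInt  = 123;
Console.WriteLine(length * myInt * index);       // 0: the 2^32 factor wraps away
Console.WriteLine(length * myInt * index == 0);  // True, even though every factor is nonzero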
Is there a shorter and better-looking alternative to
(b == 0) ? 0 : 1;
in terms of bitwise operations?
Also, to get the correct sign (-1, 0 or 1) of a given integer a I am currently using
(a > 0) ? 1 : (a >> 32);
Are there any shorter (but not slower) ways?
Personally I'd stick with the first option for your "equal to zero or not" option.
For the sign of an integer, I'd use Math.Sign and assume that the JIT compiler is going to inline it - testing that assumption with benchmarks if it turns out to be a potential bottleneck.
Think about the readability above all - your first piece of code is blindingly obvious. Your second isn't. I'm not even convinced your second piece of code works... I thought that right-shifts were effectively masked to the bottom 5 bits, assuming this is an Int32 (int).
EDIT: Just checked, and indeed your current second piece of code is equivalent to:
int y = x > 0 ? 1 : x;
The shift doesn't even end up in the compiled code.
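A quick way to see the masking in action (results noted in the comments):

int a = -5;
Console.WriteLine(a >> 32);       // -5: the shift count 32 is masked to 32 & 31 == 0
Console.WriteLine(a >> 33);       // -3: masked to 1, an actual one-bit arithmetic shift
Console.WriteLine(Math.Sign(a));  // -1: which is what the question is really after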
Take this as an object lesson about caring more about micro-optimization than readability.
It's much easier to make code work if it's easy to understand. If your code gives the wrong result, I don't care how quickly it runs.
Micro-optimizations are the root of all evil. You sacrifice readability for a nanosecond. That's a bad bargain.
"in terms of bitwise operations... Can anyone please point that to me? Also, to get the correct sign (-1, 0 or 1) of a given integer a I am currently using (a > 0) ? 1 : (a >> 32);"
Armen Tsirunyan and Jon Skeet answered your technical question; I am going to attempt to explain some technical misconceptions you appear to have.
The first mistake is trying to shift a 32-bit signed integer by 32 places: there is no 33rd bit to reach, and in C# the shift count is masked to the low five bits anyway, so a >> 32 is really a >> 0 and does nothing at all.
The second mistake is about what the sign bit represents in a 32-bit signed value. If the sign bit is 1 the number is negative; if it is 0, the number is zero or positive. .NET integers use two's complement, which, unlike sign-and-magnitude or one's complement representations, has only a single zero value. That single zero was my major problem with your "check the sign" statement: -1, 0 or 1 is a three-way distinction, and the sign bit alone only gives you a two-way one.
http://en.wikipedia.org/wiki/Signed_magnitude#Sign-and-magnitude
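For what it's worth, here is a quick illustration of the single zero in .NET's two's-complement integers (Convert.ToString with base 2 shows the raw bit pattern):

Console.WriteLine(Convert.ToString(-1, 2)); // 11111111111111111111111111111111
Console.WriteLine(Convert.ToString(0, 2));  // 0 -- there is no separate negative zero
Console.WriteLine(Convert.ToString(1, 2));  // 1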
I read all over the place people talk about compressing objects on a bit by bit scale. Things like "The first three bits represent such and such, then the next two represent this and twelve bits for that"
I understand why it would be desirable to minimize memory usage, but I cannot think of a good way to implement this. I know I would pack it into one or more integers (or longs, whatever), but I cannot envision an easy way to work with it. It would be pretty cool if there were a class where I could get/set arbitrary bits from an arbitrary length binary field, and it would take care of things for me, and I wouldn't have to go mucking about with &'s and |'s and masks and such.
Is there a standard pattern for this kind of thing?
From MSDN:
BitArray Class
Manages a compact array of bit values, which are represented as Booleans, where true indicates that the bit is on (1) and false indicates the bit is off (0).
Example:
BitArray myBitArray = new BitArray(5);
myBitArray[3] = true; // set bit at offset 3 to 1
BitArray allows you to set only individual bits, though. If you want to encode values with more bits, there's probably no way around mucking about with &'s and |'s and masks and stuff :-)
You might want to check out the BitVector32 structure in the .NET Framework. It lets you define "sections" which are ranges of bits within an int, then read and write values to those sections.
The main limitation is that it's limited to a single 32-bit integer; this may or may not be a problem depending on what you're trying to do. As dtb mentioned, BitArray can handle bit fields of any size, but you can only get and set a single bit at a time--there is no support for sections as in BitVector32.
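Here's a minimal sketch of sections in action, using the "first three bits, then the next two" layout from the question (note that CreateSection takes the maximum value the section must hold, not a bit count):

using System.Collections.Specialized;

BitVector32.Section first3 = BitVector32.CreateSection(7);          // bits 0-2, values 0-7
BitVector32.Section next2  = BitVector32.CreateSection(3, first3);  // bits 3-4, values 0-3

BitVector32 packed = new BitVector32(0);
packed[first3] = 5;
packed[next2]  = 2;
Console.WriteLine(packed[first3]); // 5
Console.WriteLine(packed.Data);    // 21, i.e. binary 10101 == (2 << 3) | 5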
What you're looking for are called bitwise operations.
For example, let's say we're going to represent an RGB value in the least significant 24 bits of an integer, with R being bits 23-16, G being bits 15-8, and B being bits 7-0.
You can set R to any value between 0 and 255 without affecting the other bits like this:
void setR(ref int RGBValue, int newR)
{
    int newRValue = newR << 16;       // shift it left 16 bits so that the 8 low bits are now in positions 23-16
    RGBValue = RGBValue & 0x00FFFF;   // AND with 0x00FFFF so that bits 23-16 (the old R) are cleared, keeping G and B
    RGBValue = RGBValue | newRValue;  // now OR in the new R value
}
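For example:

int rgb = 0x112233;
setR(ref rgb, 0xAB);
Console.WriteLine(rgb.ToString("X6")); // AB2233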
By using bitwise ANDs and ORs (and occasionally more exotic operations) you can easily set and clear any individual bit of a larger value.
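In code, those operations are one-liners (value is any integer and n the bit index, with 0 as the least significant bit):

int mask = 1 << n;
value |= mask;                     // set bit n
value &= ~mask;                    // clear bit n
value ^= mask;                     // toggle bit n
bool isSet = (value & mask) != 0;  // test bit n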
Rather than using toolkit- or platform-specific wrapper classes, I think you are better off biting the bullet and learning your &s and |s and 0x04s and how all the bitwise operators work. By and large that's how it's done on most projects, and the operations are extremely fast. The operators are pretty much identical across most languages, so you won't be stuck dependent on some specific toolkit.