Fastest way to operate on individual bytes in an int - c#

I found that my application spends 25% of its time doing this in a loop:
private static int Diff (int c0, int c1)
{
    unsafe {
        byte* pc0 = (byte*) &c0;
        byte* pc1 = (byte*) &c1;
        int d0 = pc0[0] - pc1[0];
        int d1 = pc0[1] - pc1[1];
        int d2 = pc0[2] - pc1[2];
        int d3 = pc0[3] - pc1[3];
        d0 *= d0;
        d1 *= d1;
        d2 *= d2;
        d3 *= d3;
        return d0 + d1 + d2 + d3;
    }
}
How can I improve the performance of this method? My ideas so far:
Most obviously, this would benefit from SIMD, but let us suppose I don't want to go there because it is a bit of a hassle.
Same goes for lower level stuff (calling a C library, executing on GPGPU)
Multithreading - I'll use that.
Edit: For your convenience, some test code which reflects the real environment and use case. (In reality even more data are involved, and data are not compared in single large blocks but in many chunks of several kb each.)
using System;
using System.Diagnostics;
using System.Linq;

public static class ByteCompare
{
    private static void Main ()
    {
        const int n = 1024 * 1024 * 20;
        const int repeat = 20;
        var rnd = new Random (0);
        Console.Write ("Generating test data... ");
        var t0 = Enumerable.Range (1, n)
            .Select (x => rnd.Next (int.MinValue, int.MaxValue))
            .ToArray ();
        var t1 = Enumerable.Range (1, n)
            .Select (x => rnd.Next (int.MinValue, int.MaxValue))
            .ToArray ();
        Console.WriteLine ("complete.");
        GC.Collect (2, GCCollectionMode.Forced);
        Console.WriteLine ("GCs: " + GC.CollectionCount (0));
        // Each measurement lives in its own block so that sw and res can be reused.
        {
            var sw = Stopwatch.StartNew ();
            long res = 0;
            for (int reps = 0; reps < repeat; reps++) {
                for (int i = 0; i < n; i++) {
                    int c0 = t0[i];
                    int c1 = t1[i];
                    res += ByteDiff_REGULAR (c0, c1);
                }
            }
            sw.Stop ();
            Console.WriteLine ("res=" + res + ", t=" + sw.Elapsed.TotalSeconds.ToString ("0.00") + "s - ByteDiff_REGULAR");
        }
        {
            var sw = Stopwatch.StartNew ();
            long res = 0;
            for (int reps = 0; reps < repeat; reps++) {
                for (int i = 0; i < n; i++) {
                    int c0 = t0[i];
                    int c1 = t1[i];
                    res += ByteDiff_UNSAFE (c0, c1);
                }
            }
            sw.Stop ();
            Console.WriteLine ("res=" + res + ", t=" + sw.Elapsed.TotalSeconds.ToString ("0.00") + "s - ByteDiff_UNSAFE");
        }
        Console.WriteLine ("GCs: " + GC.CollectionCount (0));
        Console.WriteLine ("Test complete.");
        Console.ReadKey (true);
    }

    // Extracts each byte with a shift and a cast; needs no unsafe code.
    public static int ByteDiff_REGULAR (int c0, int c1)
    {
        var c00 = (byte) (c0 >> (8 * 0));
        var c01 = (byte) (c0 >> (8 * 1));
        var c02 = (byte) (c0 >> (8 * 2));
        var c03 = (byte) (c0 >> (8 * 3));
        var c10 = (byte) (c1 >> (8 * 0));
        var c11 = (byte) (c1 >> (8 * 1));
        var c12 = (byte) (c1 >> (8 * 2));
        var c13 = (byte) (c1 >> (8 * 3));
        var d0 = (c00 - c10);
        var d1 = (c01 - c11);
        var d2 = (c02 - c12);
        var d3 = (c03 - c13);
        d0 *= d0;
        d1 *= d1;
        d2 *= d2;
        d3 *= d3;
        return d0 + d1 + d2 + d3;
    }

    // Reads the four bytes of each int through a byte pointer.
    private static int ByteDiff_UNSAFE (int c0, int c1)
    {
        unsafe {
            byte* pc0 = (byte*) &c0;
            byte* pc1 = (byte*) &c1;
            int d0 = pc0[0] - pc1[0];
            int d1 = pc0[1] - pc1[1];
            int d2 = pc0[2] - pc1[2];
            int d3 = pc0[3] - pc1[3];
            d0 *= d0;
            d1 *= d1;
            d2 *= d2;
            d3 *= d3;
            return d0 + d1 + d2 + d3;
        }
    }
}
which yields for me (running as x64 Release on an i5):
Generating test data... complete.
GCs: 8
res=18324555528140, t=1.46s - ByteDiff_REGULAR
res=18324555528140, t=1.15s - ByteDiff_UNSAFE
res=18324555528140, t=1.73s - Diff_Alex1
res=18324555528140, t=1.63s - Diff_Alex2
res=18324555528140, t=3.59s - Diff_Alex3
res=18325828513740, t=3.90s - Diff_Alex4
GCs: 8
Test complete.

Most obviously, this would benefit from SIMD, but let us suppose I don't want to go there because it is a bit of a hassle.
Well, avoid it if you want, but it is actually fairly well supported directly from C#. Short of offloading to the GPU, I would expect SIMD to be by far the largest performance win, provided the larger algorithm lends itself to SIMD processing.
Sure, use one thread per CPU core. You can also use constructs like Parallel.For and let .NET sort out how many threads to use. It's pretty good at that, but since you know this is certainly CPU bound you might (or might not) get a more optimal result by managing threads yourself.
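A minimal sketch of the Parallel approach, assuming the two int arrays from the question's test code. The names ParallelDiffSketch, Diff and SumOfDiffs are made up for illustration. It splits the index range into chunks with Partitioner.Create, lets each worker accumulate into its own local sum, and merges the locals with Interlocked.Add, so the hot loop has no shared state:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDiffSketch
{
    // Sum of squared per-byte differences, using shifts and masks (no unsafe code).
    public static int Diff(int c0, int c1)
    {
        int sum = 0;
        for (int shift = 0; shift < 32; shift += 8)
        {
            int d = ((c0 >> shift) & 0xFF) - ((c1 >> shift) & 0xFF);
            sum += d * d;
        }
        return sum;
    }

    // Assumes t0 and t1 have the same length.
    public static long SumOfDiffs(int[] t0, int[] t1)
    {
        if (t0.Length == 0)
            return 0; // Partitioner.Create rejects an empty range

        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, t0.Length),   // chunks of [from, to) index pairs
            () => 0L,                           // per-worker running sum
            (range, state, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    local += Diff(t0[i], t1[i]);
                return local;
            },
            local => Interlocked.Add(ref total, local)); // merge once per worker
        return total;
    }
}
```

Calling ParallelDiffSketch.SumOfDiffs(t0, t1) once per repetition would replace the inner for loop of the test program. Whether it beats hand-managed threads depends on the chunk sizes and the machine, so it is worth measuring both.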
As for speeding up the actual code block, it may be faster to use bit masking and bit shifting to get the individual values to work on, rather than using pointers. That has the additional benefit that you don't need an unsafe code block, e.g.
// Shift first, then mask; the cast is required because the expression is an int.
byte b0_leftmost = (byte) ((c0 >> 24) & 0xff);

Besides the already mentioned SIMD options and running multiple operations in parallel, have you tried benchmarking some implementation variations on the theme, such as the options below?
I almost forgot to mention a very important optimization:
Add a using System.Runtime.CompilerServices;
Add the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute to your method.
Like this:
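// Variant 1: walk both ints byte by byte through pointers.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe int Diff(int c0, int c1)
{
    byte* pc0 = (byte*)&c0;
    byte* pc1 = (byte*)&c1;
    int sum = 0;
    int dif = 0;
    for (var i = 0; i < 4; i++, pc0++, pc1++)
    {
        dif = *pc0 - *pc1;
        sum += (dif * dif);
    }
    return sum;
}

// Variant 2: mask the low byte, then shift both values right.
private static int Diff(int c0, int c1)
{
    int sum = 0;
    int dif = 0;
    for (var i = 0; i < 4; i++)
    {
        dif = (c0 & 0xFF) - (c1 & 0xFF);
        c0 >>= 8;
        c1 >>= 8;
        sum += (dif * dif);
    }
    return sum;
}

// Variant 3: pointers, with the differences kept in a stack-allocated buffer.
private static unsafe int Diff(int c0, int c1)
{
    int* difs = stackalloc int[4];
    byte* pc0 = (byte*)&c0;
    byte* pc1 = (byte*)&c1;
    difs[0] = pc0[0] - pc1[0];
    difs[1] = pc0[1] - pc1[1];
    difs[2] = pc0[2] - pc1[2];
    difs[3] = pc0[3] - pc1[3];
    return difs[0] * difs[0] + difs[1] * difs[1] + difs[2] * difs[2] + difs[3] * difs[3];
}

// Variant 4: shifts and masks, with a stack-allocated buffer.
private static unsafe int Diff(int c0, int c1)
{
    int* difs = stackalloc int[4];
    // The top byte needs the mask too: >> sign-extends a negative int, so
    // (c0 >> 24) alone yields -128..127 instead of 0..255 and changes the result.
    difs[0] = ((c0 >> 24) & 0xFF) - ((c1 >> 24) & 0xFF);
    difs[1] = ((c0 >> 16) & 0xFF) - ((c1 >> 16) & 0xFF);
    difs[2] = ((c0 >> 8) & 0xFF) - ((c1 >> 8) & 0xFF);
    difs[3] = (c0 & 0xFF) - (c1 & 0xFF);
    return difs[0] * difs[0] + difs[1] * difs[1] + difs[2] * difs[2] + difs[3] * difs[3];
}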

I tried to reduce the IL instruction count (that looks like the only option for single-threaded, non-SIMD code). This code runs 35% faster than the version in the question on my machine. You could also try generating the IL yourself via System.Reflection.Emit, which would give you finer control over the emitted instructions.
private static int ByteDiff_UNSAFE_2 (int c0, int c1)
{
    unsafe {
        byte* pc0 = (byte*) &c0;
        byte* pc1 = (byte*) &c1;
        int d0 = pc0[0] - pc1[0];
        d0 *= d0;
        int d1 = pc0[1] - pc1[1];
        d0 += d1 * d1;
        int d2 = pc0[2] - pc1[2];
        d0 += d2 * d2;
        int d3 = pc0[3] - pc1[3];
        return d0 + d3 * d3;
    }
}


C# signed fixed point to floating point conversion

I have a temperature sensor returning 2 bytes.
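The temperature is defined as follows:
Bit 15 is the sign bit and bits 14 to 0 carry the weights 2^6 down to 2^-8 (sign and magnitude, not two's complement).
What is the best way in C# to convert these 2 bytes to a float?
My solution is the following, but I don't like the powers of 2 and the for loop: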
static void Main(string[] args)
{
    byte[] sensorData = new byte[] { 0b11000010, 0b10000001 }; //(-1) * (2^(6) + 2^(1) + 2^(-1) + 2^(-8)) = -66.50390625
    Console.WriteLine(ByteArrayToTemp(sensorData));
}

static double ByteArrayToTemp(byte[] data)
{
    // Convert byte array to short to be able to shift it
    if (BitConverter.IsLittleEndian)
    {
        Array.Reverse(data); // the sensor sends the most significant byte first
    }
    Int16 dataInt16 = BitConverter.ToInt16(data, 0);
    double temp = 0;
    for (int i = 0; i < 15; i++)
    {
        //We take the LSB of the data and multiply it by the corresponding power of two (from -8 to 6)
        //Then we shift the data for the next loop
        temp += (dataInt16 & 0x01) * Math.Pow(2, -8 + i);
        dataInt16 >>= 1;
    }
    if ((dataInt16 & 0x01) == 1) temp *= -1; //Sign bit
    return temp;
}
This might be slightly more efficient, but I can't see it making much difference:
static double ByteArrayToTemp(byte[] data)
{
    if (BitConverter.IsLittleEndian)
    {
        Array.Reverse(data); // the sensor sends the most significant byte first
    }
    ushort bits = BitConverter.ToUInt16(data, 0);
    double scale = 1 << 6;
    double result = 0;
    // Walk from bit 14 (weight 2^6) down to bit 0 (weight 2^-8).
    for (int i = 0, bit = 1 << 14; i < 15; ++i, bit >>= 1, scale /= 2)
    {
        if ((bits & bit) != 0)
            result += scale;
    }
    if ((bits & 0x8000) != 0)
        result = -result;
    return result;
}
Strictly speaking the loop can be avoided as well: the low 15 bits form a plain integer scaled by 2^-8, so the magnitude is (bits & 0x7FFF) / 256.0.
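A loop-free conversion can be sketched as follows. It assumes the layout implied by the question's sample data (most significant byte first, one sign bit, then 15 magnitude bits with weights 2^6 down to 2^-8). The class name TemperatureSketch is made up for illustration. Combining the bytes by hand also sidesteps the BitConverter endianness check and leaves the input array untouched:

```csharp
using System;

public static class TemperatureSketch
{
    // Assumes big-endian input: data[0] holds the sign bit and the upper magnitude bits.
    public static double ByteArrayToTemp(byte[] data)
    {
        int bits = (data[0] << 8) | data[1];

        // The low 15 bits are an unsigned integer in units of 2^-8 = 1/256.
        double magnitude = (bits & 0x7FFF) / 256.0;

        // Bit 15 is a sign flag (sign and magnitude, not two's complement).
        return (bits & 0x8000) != 0 ? -magnitude : magnitude;
    }
}
```

For the sample bytes { 0b11000010, 0b10000001 } the masked value is 0x4281 = 17025, and 17025 / 256 = 66.50390625, so the function returns -66.50390625.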

Meaning of rational transfer function underlying MATLAB filter or Scipy.signal filter

I have some MATLAB code that filters an input signal using filter:
CUTOFF = 0.05;
FS = 5000;
[b, a] = butter(1, CUTOFF / (FS / 2), 'high');
% b = [0.99996859, -0.99996859]
% a = [1.0, -0.99993717]
dataAfter = filter(b, a, dataBefore);
I'm trying to convert this code to C#. I have already got the butter function to work pretty fast, but now I'm stuck converting the filter function.
I have read the MATLAB filter documentation and Python Scipy.signal filter documentation, but there is a term present in the transfer function definition that I don't understand.
Here is the "rational transfer function" definition from the linked documentation:
        b[0] + b[1]z^(-1) + ... + b[M]z^(-M)
Y(z) = _______________________________________ X(z)
        a[0] + a[1]z^(-1) + ... + a[N]z^(-N)
Correct me if I'm wrong, but z is the current element of input data, and Y(z) is the output?
If the above is true, what is X(z) in this equation?
I want to understand this to implement it in C#, if there is an equivalent option then please enlighten me.
In the More About section of the MATLAB docs that you pointed to, they describe:
The input-output description of the filter operation on a vector in the Z-transform domain is a rational transfer function. A rational transfer function is of the form,
        b[0] + b[1]z^(-1) + ... + b[M]z^(-M)
Y(z) = _______________________________________ X(z)
        a[0] + a[1]z^(-1) + ... + a[N]z^(-N)

        Y(z)     b[0] + b[1]z^(-1) + ... + b[M]z^(-M)
H(z) = ______ = _______________________________________
        X(z)     a[0] + a[1]z^(-1) + ... + a[N]z^(-N)
Thus, X(z) is the z-domain transform of the input vector x (see Digital Filter). Note that z is not an element of the input data; it is the complex variable of the Z-transform. The docs also give an alternate representation of the transfer function as a difference equation:
a[0]y[n] = b[0]x[n] + b[1]x[n-1] + ... + b[M]x[n-M] - a[1]y[n-1] - ... - a[N]y[n-N]
which lends itself better to being ported into code. One possible implementation in C# could be (using this answer as reference):
public static double[] Filter(double[] b, double[] a, double[] x)
{
    // normalize if a[0] != 1.0. TODO: check if a[0] == 0
    if (a[0] != 1.0)
    {
        double a0 = a[0]; // capture first: a[0] becomes 1.0 once a is replaced
        a = a.Select(el => el / a0).ToArray();
        b = b.Select(el => el / a0).ToArray();
    }

    double[] result = new double[x.Length];
    result[0] = b[0] * x[0];
    for (int i = 1; i < x.Length; i++)
    {
        result[i] = 0.0;
        int j = 0;
        if ((i < b.Length) && (j < x.Length))
        {
            result[i] += (b[i] * x[j]);
        }
        while (++j <= i)
        {
            int k = i - j;
            if ((k < b.Length) && (j < x.Length))
            {
                result[i] += b[k] * x[j];
            }
            if ((k < x.Length) && (j < a.Length))
            {
                result[i] -= a[j] * result[k];
            }
        }
    }
    return result;
}
static void Main(string[] args)
{
    double[] dataBefore = { 1, 2, 3, 4 };
    double[] b = { 0.99996859, -0.99996859 };
    double[] a = { 1.0, -0.99993717 };
    var dataAfter = Filter(b, a, dataBefore);
}
Matlab dataAfter = [0.99996859 1.999874351973491 2.999717289867956 3.999497407630634]
CSharp dataAfter = [0.99996859 1.9998743519734905 2.9997172898679563 3.999497407630634]
If the coefficient vectors a and b have a fixed length of 2 the filtering function can be simplified to:
public static double[] Filter(double[] b, double[] a, double[] x)
{
    // normalize if a[0] != 1.0. TODO: check if a[0] == 0
    if (a[0] != 1.0)
    {
        double a0 = a[0]; // capture first: a[0] becomes 1.0 once a is replaced
        a = a.Select(el => el / a0).ToArray();
        b = b.Select(el => el / a0).ToArray();
    }

    int length = x.Length;
    double z = 0.0; // filter state carried from one sample to the next
    double[] y = new double[length]; // output filtered signal
    double b0 = b[0];
    double b1 = b[1];
    double a1 = a[1];
    for (int i = 0; i < length; i++)
    {
        y[i] = b0 * x[i] + z;
        z = b1 * x[i] - a1 * y[i];
    }
    return y;
}
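For coefficient vectors of arbitrary length, the difference equation a[0]y[n] = b[0]x[n] + ... + b[M]x[n-M] - a[1]y[n-1] - ... - a[N]y[n-N] can also be evaluated directly, one output sample at a time. The following is only a sketch under the assumption of zero initial conditions; the class name FilterSketch is made up, and this is not claimed to be how MATLAB or SciPy implement filter internally:

```csharp
using System;

public static class FilterSketch
{
    public static double[] Filter(double[] b, double[] a, double[] x)
    {
        if (a.Length == 0 || a[0] == 0.0)
            throw new ArgumentException("a[0] must be non-zero", nameof(a));

        double a0 = a[0];
        var y = new double[x.Length];
        for (int n = 0; n < x.Length; n++)
        {
            double acc = 0.0;

            // Feed-forward part: b[0]x[n] + b[1]x[n-1] + ...
            for (int k = 0; k < b.Length && k <= n; k++)
                acc += b[k] * x[n - k];

            // Feedback part: - a[1]y[n-1] - a[2]y[n-2] - ...
            for (int k = 1; k < a.Length && k <= n; k++)
                acc -= a[k] * y[n - k];

            // Dividing by a[0] here replaces the separate normalization step.
            y[n] = acc / a0;
        }
        return y;
    }
}
```

The cost is proportional to x.Length times the number of coefficients, and neither input array is modified.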

Errors when re-compiling a .dll

I'm getting the following errors when I try to rebuild a .dll
Please advise what can I replace these lines with so that the code will compile.
Background info (likely not relevant):
The .dll is part of an add-in output module for a program that controls Christmas lights. It outputs data from the program to the selected serial port telling the connected relay board which relays are to be on or off.
I intend to modify the output to suit my device, so instead of the output being FF FF FF 00 00 00 00 00 for relays 1 2 3 on and 4 5 6 7 8 off, it will send the appropriate format for the board that I have. (see below)
Error CS0571 'SerialSetupDialog.SelectedPort.get': cannot explicitly call operator or accessor
It references the line in this section:
private void buttonSerialSetup_Click(object sender, EventArgs e)
{
    SerialSetupDialog serialSetupDialog = new SerialSetupDialog(this.m_selectedPort);
    if (((Form) serialSetupDialog).ShowDialog() != DialogResult.OK)
        return;
    this.m_selectedPort = serialSetupDialog.get_SelectedPort();
}
Also there are 3 cases of:
Error CS0221 Constant value '-128' cannot be converted to a 'byte' (use 'unchecked' syntax to override)
The compiler doesn't like this part of the code. "(byte) sbyte.MinValue;"
private void Protocol1Event(byte[] channelValues)
int length1 = channelValues.Length;
int count = 2;
int length2 = 2 + 2 * length1 + (2 + 2 * length1) / 100;
if (this.m_p1Packet.Length < length2)
this.m_p1Packet = new byte[length2];
this.m_p1Packet[0] = (byte) 126;
this.m_p1Packet[1] = (byte) sbyte.MinValue;
this.m_threadPosition = 10;
for (int index = 0; index < length1; ++index)
if ((int) channelValues[index] == 125)
this.m_threadPosition = 11;
this.m_p1Packet[count++] = (byte) 124;
else if ((int) channelValues[index] == 126)
this.m_threadPosition = 12;
this.m_p1Packet[count++] = (byte) 124;
else if ((int) channelValues[index] == (int) sbyte.MaxValue)
this.m_threadPosition = 13;
this.m_p1Packet[count++] = (byte) sbyte.MinValue;
this.m_threadPosition = 14;
this.m_p1Packet[count++] = channelValues[index];
if (count % 100 == 0)
this.m_threadPosition = 15;
this.m_p1Packet[count++] = (byte) 125;
this.m_threadPosition = 16;
this.m_threadPosition = 17;
if (this.m_running)
while (this.m_selectedPort.WriteBufferSize - this.m_selectedPort.BytesToWrite <= count)
this.m_threadPosition = 18;
this.m_selectedPort.Write(this.m_p1Packet, 0, count);
this.m_threadPosition = 19;
this.m_threadPosition = 20;
private void Protocol2Event(byte[] channelValues)
byte num1 = (byte) sbyte.MinValue;
int length = channelValues.Length;
byte[] array = new byte[8];
int num2 = 0;
while (num2 < length)
int num3 = Math.Min(num2 + 7, length - 1);
this.m_p2Packet[1] = num1++;
if (num3 >= length - 1)
this.m_p2Zeroes.CopyTo((Array) this.m_p2Packet, 3);
Array.Clear((Array) array, 0, 8);
for (int index = num2; index <= num3; ++index)
byte num4 = channelValues[index];
byte num5 = num4;
if ((int) num4 >= 1 && (int) num4 <= 8)
array[(int) num4 - 1] = (byte) 1;
else if ((int) num5 >= 1 && (int) num5 <= 8)
array[(int) num5 - 1] = (byte) 1;
byte num6 = (byte) (1 + Array.IndexOf<byte>(array, (byte) 0));
this.m_p2Packet[2] = num6;
int index1 = num2;
int count = 3;
while (index1 <= num3)
this.m_p2Packet[count] = (byte) ((uint) channelValues[index1] - (uint) num6);
if (this.m_running)
this.m_selectedPort.Write(this.m_p2Packet, 0, count);
num2 += 8;
The reason that (byte) sbyte.MinValue throws an error is that sbyte's minimum value is -128, whereas byte's minimum value is 0, so converting the constant overflows. If you really want this behaviour you can use the unchecked keyword as follows:
byte b = unchecked((byte)sbyte.MinValue);
However this will give b the value of 128.
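As a small illustration of what the unchecked cast produces (UncheckedCastSketch and StartMarker are made-up names, not part of the add-in): the bit pattern of -128 is 1000 0000, which reads as 128 when reinterpreted as a byte, so writing the literal 0x80 directly is an equivalent and arguably clearer alternative.

```csharp
using System;

public static class UncheckedCastSketch
{
    // Reinterprets the bit pattern of sbyte.MinValue (1000 0000) as a byte.
    public static byte StartMarker()
    {
        return unchecked((byte)sbyte.MinValue);
    }

    // The same value written as a plain byte literal; no cast or unchecked needed.
    public const byte StartMarkerLiteral = 0x80;
}
```

Both StartMarker() and StartMarkerLiteral evaluate to 128, so either form could stand in for the three (byte) sbyte.MinValue expressions in the decompiled code.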
To answer the other part of your question, I believe that replacing serialSetupDialog.get_SelectedPort() with serialSetupDialog.SelectedPort should fix the issue.

Despite conventional wisdom, using + instead of | to combine bytes into an int always works?

Conventional wisdom has it that when you are ORing bytes together to make an int, you should use the | operator rather than the + operator, otherwise you could have problems with the sign bit.
But this doesn't appear to be the case in C#. It looks like you can happily use the + operator, and it still works even for negative results.
My questions:
Is this really true?
If so, why does it work? (And why do a lot of people think it shouldn't - including me! ;)
Here's a test program which I believe tests every possible combination of four bytes using the + operator and the | operator, and verifies that both approaches yield the same results.
Here's the test code:
using System;
using System.Diagnostics;

namespace Demo
{
    class Program
    {
        int Convert1(byte b1, byte b2, byte b3, byte b4)
        {
            return b1 + (b2 << 8) + (b3 << 16) + (b4 << 24);
        }

        int Convert2(byte b1, byte b2, byte b3, byte b4)
        {
            return b1 | (b2 << 8) | (b3 << 16) | (b4 << 24);
        }

        void Run()
        {
            byte b = 0xff;
            Trace.Assert(Convert1(b, b, b, b) == -1); // Sanity check.
            Trace.Assert(Convert2(b, b, b, b) == -1);
            for (int i = 0; i < 256; ++i)
            {
                byte b1 = (byte) i;
                for (int j = 0; j < 256; ++j)
                {
                    byte b2 = (byte) j;
                    for (int k = 0; k < 256; ++k)
                    {
                        byte b3 = (byte) k;
                        for (int l = 0; l < 256; ++l)
                        {
                            byte b4 = (byte) l;
                            Trace.Assert(Convert1(b1, b2, b3, b4) == Convert2(b1, b2, b3, b4));
                        }
                    }
                }
            }
        }

        static void Main()
        {
            new Program().Run();
        }
    }
}
To see how this works, consider this:
byte b = 0xff;
int i1 = b;
int i2 = (b << 8);
int i3 = (b << 16);
int i4 = (b << 24);
int total = i1 + i2 + i3 + i4;
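The four values are 0x000000FF, 0x0000FF00, 0x00FF0000 and 0xFF000000 (the last one is negative as an int), and total is 0xFFFFFFFF, i.e. -1. Each byte lands in its own 8-bit slot, so no addition ever produces a carry and + yields exactly the same bits as |.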
When bits overlap, | and + will produce different results:
2 | 3 = 3
2 + 3 = 5
When actually using signed bytes, the result will be different:
-2 | -3 = -1
-2 + (-3) = -5
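The signed case can be made concrete with a short sketch (SignedCombineSketch and its method names are invented for illustration). An sbyte is sign-extended to int before the shift, so a negative byte fills all higher bits with ones, and those bits then overlap with the bytes shifted further left:

```csharp
using System;

public static class SignedCombineSketch
{
    public static int CombineWithPlus(sbyte b1, sbyte b2, sbyte b3, sbyte b4)
    {
        return b1 + (b2 << 8) + (b3 << 16) + (b4 << 24);
    }

    public static int CombineWithOr(sbyte b1, sbyte b2, sbyte b3, sbyte b4)
    {
        return b1 | (b2 << 8) | (b3 << 16) | (b4 << 24);
    }
}
```

With the arguments (1, -1, 1, 0), the term (b2 << 8) is 0xFFFFFF00, which overlaps bit 16 set by (b3 << 16). Addition carries through those bits and gives 1 - 256 + 65536 = 65281 (0x0000FF01), while OR keeps them all set and gives 0xFFFFFF01 = -255. With byte arguments this overlap cannot occur, which is why the exhaustive test in the question passes.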

Converting a range into a bit array

I'm writing a time-critical piece of code in C# that requires me to convert two unsigned integers that define an inclusive range into a bit field. Ex:
uint x1 = 3;
uint x2 = 9;
//defines the range [3-9]
//                             98 7654 3
//must be converted to: 0000 0011 1111 1000
It may help to visualize the bits in reverse order
The maximum value for this range is a parameter given at run-time which we'll call max_val. Therefore, the bit field variable ought to be defined as a UInt32 array with size equal to max_val/32:
UInt32 MAX_DIV_32 = max_val / 32;
UInt32[] bitArray = new UInt32[MAX_DIV_32];
Given a range defined by the variables x1 and x2, what is the fastest way to perform this conversion?
Try this. Calculate the range of array items that must be filled with all ones and do this by iterating over this range. Finally set the items at both borders.
// x1 and x2 are uint, so indices and shift counts need explicit casts to int.
Int32 startIndex = (Int32)(x1 >> 5);
Int32 endIndex = (Int32)(x2 >> 5);
bitArray[startIndex] = UInt32.MaxValue << (Int32)(x1 & 31);
for (Int32 i = startIndex + 1; i <= endIndex; i++)
{
    bitArray[i] = UInt32.MaxValue;
}
bitArray[endIndex] &= UInt32.MaxValue >> (31 - (Int32)(x2 & 31));
Maybe the code is not 100% correct, but the idea should work.
Just tested it and found three bugs: the calculation at the start index needed a mod 32, at the end index the 32 had to be 31, and the final step needed a bitwise AND instead of an assignment to handle the case where start and end index are the same. It should be quite fast.
Just benchmarked it with equal distribution of x1 and x2 over the array.
Intel Core 2 Duo E8400 3.0 GHz, MS VirtualPC with Server 2003 R2 on Windows XP host.
Array length [bits]          320          160          64
Performance [executions/s]   33 million   43 million   54 million
One more optimization: x % 32 == x & 31, but I am unable to measure a performance gain. With only 10,000,000 iterations in my test the fluctuations are quite high, and running in VirtualPC makes the situation even more unpredictable.
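The approach from this answer can be wrapped into a self-contained method for testing. This is a sketch; the names RangeBitsSketch and SetRange are made up, and the method assumes x1 <= x2 and that bitArray is long enough to hold bit x2. Like the snippet above, it overwrites the affected words rather than OR-ing into them:

```csharp
using System;

public static class RangeBitsSketch
{
    public static void SetRange(uint[] bitArray, uint x1, uint x2)
    {
        int startIndex = (int)(x1 >> 5);   // word that holds bit x1
        int endIndex = (int)(x2 >> 5);     // word that holds bit x2

        // First word: ones from bit (x1 mod 32) upwards.
        bitArray[startIndex] = uint.MaxValue << (int)(x1 & 31);

        // Every later word up to and including the last one: all ones.
        for (int i = startIndex + 1; i <= endIndex; i++)
            bitArray[i] = uint.MaxValue;

        // Last word: clear everything above bit (x2 mod 32).
        bitArray[endIndex] &= uint.MaxValue >> (31 - (int)(x2 & 31));
    }
}
```

For a one-element array and the range [3-9] this leaves bitArray[0] == 0x3F8, the pattern 0000 0011 1111 1000 from the question. For the range [30-65] in a three-element array the words become 0xC0000000, 0xFFFFFFFF and 0x00000003.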
My solution for setting a whole range of bits in a BitArray to true or false:
public static BitArray SetRange(BitArray bitArray, Int32 offset, Int32 length, Boolean value)
{
    Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
    bitArray.CopyTo(ints, 0);
    var firstInt = offset >> 5;
    var lastInt = (offset + length) >> 5;
    Int32 mask = 0;
    if (value)
    {
        // set first and last int
        mask = (-1 << (offset & 31));
        if (lastInt != firstInt)
            ints[lastInt] |= ~(-1 << ((offset + length) & 31));
        else
            mask &= ~(-1 << ((offset + length) & 31));
        ints[firstInt] |= mask;
        // set all ints in between
        for (Int32 i = firstInt + 1; i < lastInt; i++)
            ints[i] = -1;
    }
    else
    {
        // set first and last int
        mask = ~(-1 << (offset & 31));
        if (lastInt != firstInt)
            ints[lastInt] &= -1 << ((offset + length) & 31);
        else
            mask |= -1 << ((offset + length) & 31);
        ints[firstInt] &= mask;
        // set all ints in between
        for (Int32 i = firstInt + 1; i < lastInt; i++)
            ints[i] = 0;
    }
    return new BitArray(ints) { Length = bitArray.Length };
}
You could try:
UInt32 x1 = 3;
UInt32 x2 = 9;
UInt32 newInteger = (UInt32)(Math.Pow(2, x2 + 1) - 1) &
~(UInt32)(Math.Pow(2, x1)-1);
Is there a reason not to use the System.Collections.BitArray class instead of a UInt32[]? Otherwise, I'd try something like this:
int minIndex = (int)x1/32;
int maxIndex = (int)x2/32;
// first handle the all zero regions and the all one region (if any)
for (int i = 0; i < minIndex; i++) {
    bitArray[i] = 0;
}
for (int i = minIndex + 1; i < maxIndex; i++) {
    bitArray[i] = UInt32.MaxValue; // set to all 1s
}
for (int i = maxIndex + 1; i < MAX_DIV_32; i++) {
    bitArray[i] = 0;
}
// now handle the tricky parts
uint maxBits = (2u << ((int)x2 - 32 * maxIndex)) - 1; // set to 1s up to max
uint minBits = ~((1u << ((int)x1 - 32 * minIndex)) - 1); // set to 1s after min
if (minIndex == maxIndex) {
    bitArray[minIndex] = maxBits & minBits;
}
else {
    bitArray[minIndex] = minBits;
    bitArray[maxIndex] = maxBits;
}
I was bored enough to try doing it with a char array and using Convert.ToUInt32(string, int) to convert to a uint from base 2.
uint Range(int l, int h)
{
    char[] buffer = new char[h];
    for (int i = 0; i < buffer.Length; i++)
        buffer[i] = i < h - l ? '1' : '0';
    return Convert.ToUInt32(new string(buffer), 2);
}
A simple benchmark shows that my method is about 5% faster than Angrey Jim's (even if you replace second Pow with a bit shift.)
It is probably the easiest to convert to producing a uint array if the upper bound is too big to fit into a single int. It's a little cryptic but I believe it works.
uint[] Range(int l, int h)
{
    char[] buffer = new char[h];
    for (int i = 0; i < buffer.Length; i++)
        buffer[i] = i < h - l ? '1' : '0';
    int bitsInUInt = sizeof(uint) * 8;
    int numNeededUInts = (int)Math.Ceiling((decimal)buffer.Length /
                                           (decimal)bitsInUInt);
    uint[] uints = new uint[numNeededUInts];
    // Convert full 32-character groups, starting from the end of the string.
    for (int j = uints.Length - 1, s = buffer.Length - bitsInUInt;
         j >= 0 && s >= 0;
         j--, s -= bitsInUInt)
    {
        uints[j] = Convert.ToUInt32(new string(buffer, s, bitsInUInt), 2);
    }
    // Whatever is left at the front forms the first, partial group.
    int remainder = buffer.Length % bitsInUInt;
    if (remainder > 0)
    {
        uints[0] = Convert.ToUInt32(new string(buffer, 0, remainder), 2);
    }
    return uints;
}
Try this:
uint x1 = 3;
uint x2 = 9;
int cbToShift = (int)(x2 - x1) + 1; // 7 bits, because the range is inclusive
uint nResult = ((1u << cbToShift) - 1) << (int)x1;
(1 << 7) - 1 gives you 127 = 1111111, which you then shift left by 3 bits
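A variation on the same idea, sketched under the assumption that both bounds fit in a single 32-bit word (0 <= x1 <= x2 <= 31). The names SingleWordMaskSketch and RangeMask are invented for illustration. It intersects two one-sided masks, which also covers the full range [0-31], where a width of 32 would not work as a shift count because C# only uses the low five bits of the count:

```csharp
using System;

public static class SingleWordMaskSketch
{
    // Requires 0 <= x1 <= x2 <= 31.
    public static uint RangeMask(int x1, int x2)
    {
        uint upToX2 = uint.MaxValue >> (31 - x2); // bits 0..x2 set
        uint fromX1 = uint.MaxValue << x1;        // bits x1..31 set
        return upToX2 & fromX1;                   // bits x1..x2 set
    }
}
```

RangeMask(3, 9) yields 0x3F8 (binary 0000 0011 1111 1000), and RangeMask(0, 31) yields 0xFFFFFFFF.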

