I'm porting several thousand lines of cryptographic C# functions to a Java project. The C# code extensively uses unsigned values and bitwise operations.
I am aware of the necessary Java work-arounds to support unsigned values. However, it would be much more convenient if there were implementations of unsigned 32-bit and 64-bit integers that I could drop into my code. Please link to such a library.
Quick google queries reveal several that are part of commercial applications:
http://www.teamdev.com/downloads/jniwrapper/javadoc/com/jniwrapper/UInt64.html
http://publib.boulder.ibm.com/infocenter/rfthelp/v7r0m0/index.jsp?topic=/com.rational.test.ft.api.help/ApiReference/com/rational/test/value/UInt64.html
Operations with signed and unsigned integers are mostly identical, when using two's complement notation, which is what Java does. What this means is that if you have two 32-bit words a and b and want to compute their sum a+b, the same internal operation will produce the right answer regardless of whether you consider the words as being signed or unsigned. This will work properly for additions, subtractions, and multiplications.
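If you want to convince yourself of this before porting, a quick throwaway check on the C# side (illustrative values only) shows that the signed and the unsigned additions produce the same 32-bit pattern:

uint ua = 0xFFFFFFF0u, ub = 0x30u;
int sa = unchecked((int)ua), sb = unchecked((int)ub);

// Both lines print 00000020: the low 32 bits of the sum are identical either way.
Console.WriteLine((ua + ub).ToString("X8"));
Console.WriteLine(unchecked((uint)(sa + sb)).ToString("X8"));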
The operations which must be sign-aware include:
Right shifts: a signed right shift duplicates the sign bit, while an unsigned right shift always inserts zeros. Java provides the ">>>" operator for unsigned right-shifting.
Divisions: an unsigned division is distinct from a signed division. When using 32-bit integers, you can convert the values to the 64-bit long type ("x & 0xFFFFFFFFL" does the "unsigned conversion" trick).
Comparisons: if you want to compare a with b as two 32-bit unsigned words, then you have two standard idioms:
if ((a + Integer.MIN_VALUE) < (b + Integer.MIN_VALUE)) { ... }
if ((a & 0xFFFFFFFFL) < (b & 0xFFFFFFFFL)) { ... }
Knowing that, the signed Java types are not a big hassle for cryptographic code. I have implemented many cryptographic primitives in Java, and the signed types are not an issue provided that you understand what you are writing. For instance, have a look at sphlib: this is an opensource library which implements many cryptographic hash functions, both in C and in Java. The Java code uses Java's signed types (int, long...) quite seamlessly, and it simply works.
Java does not have operator overloading, so Java-only "solutions" to get unsigned types will involve custom classes (such as the UInt64 class you link to), which will imply a massive performance penalty. You really do not want to do that.
Theoretically, one could define a Java-like language with unsigned types and implement a compiler which produces bytecode for the JVM (internally using the tricks I detail above for shifts, divisions and comparisons). I am not aware of any available tool which does that; and, as I said above, Java's signed types are just fine for cryptographic code (in other words, if you have trouble with such signed types, then I daresay that you do not know enough to implement cryptographic code securely, and you should refrain from doing so; instead, use existing opensource libraries).
This is a language feature, not a library feature, so there is no way to extend Java to support this functionality unless you change the language itself, in which case you'd need to make your own compiler.
However, for unsigned right shifts, Java does provide the >>> operator, which behaves the way >> would behave on an unsigned type.
You can, however, make your own methods to perform arithmetic with signed types as though they were unsigned; this should work, for example:
static int multiplyUnsigned(int a, int b)
{
    // Treat the top bit separately: strip it, widen to long, then put it back as an unsigned bit.
    final boolean highBitA = a < 0, highBitB = b < 0;
    final long a2 = a & ~(1 << 31), b2 = b & ~(1 << 31);
    final long result = (highBitA ? a2 | (1L << 31) : a2)
                      * (highBitB ? b2 | (1L << 31) : b2);
    return (int)result;
}
Edit:
Thanks to @Ben's comment, we can simplify this:
static int multiplyUnsigned(int a, int b)
{
    final long mask = (1L << 32) - 1;
    return (int)((a & mask) * (b & mask));
}
Neither of these methods works, though, for the long type. You'd have to cast to a double, negate, multiply, and cast it back again in that case, which would likely kill any and all of your optimizations.
When I initialize a ulong with the value 18446744073709551615 and then add 1 to it and display it to the console, it displays 0, which is totally expected.
I know this question sounds stupid, but I have to ask it: if my computer has a 64-bit architecture CPU, how is my calculator able to work with larger numbers than 18446744073709551615?
I suppose floating-point has a lot to do here.
I would like to know exactly how this happens.
Thank you.
working with larger numbers than 18446744073709551615
"if my Computer has a 64-bit architecture CPU" --> The architecture bit size is largely irrelevant.
Consider how you are able to add 2 decimal digits whose sum is more than 9. There is a carry generated and then used when adding the next most significant decimal place.
The CPU can do the same but with base 18446744073709551616 instead of base 10. It uses a carry bit as well as a sign and overflow bit to perform extended math.
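As a rough sketch of that idea (a hypothetical C# helper, not how any particular calculator is implemented), a 128-bit addition can be built from two 64-bit "digits" plus a carry:

static (ulong high, ulong low) Add128(ulong aHigh, ulong aLow, ulong bHigh, ulong bLow)
{
    ulong low = aLow + bLow;                // wraps modulo 2^64, just like your ulong did
    ulong carry = (low < aLow) ? 1UL : 0UL; // a wrap-around means a carry was produced
    ulong high = aHigh + bHigh + carry;     // the carry propagates into the next "digit"
    return (high, low);
}

Add128(0, 18446744073709551615, 0, 1) returns (1, 0), i.e. 1 * 2^64 + 0 = 18446744073709551616.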
I suppose floating-point has a lot to do here.
This is nothing to do with floating point; you say you're using ulong, which means you're using unsigned 64-bit arithmetic. The largest value you can store is therefore "all ones" for 64 bits, aka UInt64.MaxValue, which is what you've discovered: https://learn.microsoft.com/en-us/dotnet/api/system.uint64.maxvalue
If you want to store arbitrarily large numbers: there are APIs for that - for example BigInteger. However, arbitrary size comes at a cost, so it isn't the default, and certainly isn't what you get when you use ulong (or double, or decimal, etc - all the compiler-level numeric types have a fixed size).
So: consider using BigInteger
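A quick illustration of the difference (assuming a reference to System.Numerics):

using System.Numerics;

ulong u = ulong.MaxValue;      // 18446744073709551615
Console.WriteLine(u + 1);      // 0 - wraps around in the default unchecked context, as you observed

BigInteger b = ulong.MaxValue; // implicit conversion from ulong
Console.WriteLine(b + 1);      // 18446744073709551616 - no wrap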
Either way, you have a 64-bit processor that is limited to doing 64-bit math; your question is hard to answer without taking an explicit example of how this is solved, for instance by BigInteger in the System.Numerics namespace (available in .NET Framework 4.8, for example). The basic idea is to 'decompose' the number into an array representation.
The mathematical term 'decompose' here means:
"express (a number or function) as a combination of simpler components."
Internally, BigInteger uses an array (actually multiple internal constructs) and a helper class called BigIntegerBuilder. It can implicitly convert a UInt64 without problem, and for even bigger numbers you can use the + operator, for example:
BigInteger bignum = new BigInteger(18446744073709551615);
bignum += 1;
You can read about the implicit operator here:
https://referencesource.microsoft.com/#System.Numerics/System/Numerics/BigInteger.cs
public static BigInteger operator +(BigInteger left, BigInteger right)
{
    left.AssertValid();
    right.AssertValid();

    if (right.IsZero) return left;
    if (left.IsZero) return right;

    int sign1 = +1;
    int sign2 = +1;

    BigIntegerBuilder reg1 = new BigIntegerBuilder(left, ref sign1);
    BigIntegerBuilder reg2 = new BigIntegerBuilder(right, ref sign2);

    if (sign1 == sign2)
        reg1.Add(ref reg2);
    else
        reg1.Sub(ref sign1, ref reg2);

    return reg1.GetInteger(sign1);
}
In the code above from Reference Source you can see that a BigIntegerBuilder is used to add the left and right operands, which are themselves BigInteger values.
Interestingly, it keeps its internal state in a private array called "_bits", and that is the answer to your question: BigInteger keeps track of an array of 32-bit integers and is therefore able to handle big integers, even beyond 64 bits.
You can drop this code into a console application or LINQPad (which has the .Dump() method I use here) and inspect it:
BigInteger bignum = new BigInteger(18446744073709551615);
bignum.GetType().GetField("_bits",
BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bignum).Dump();
A detail about BigInteger is revealed in a comment in its source code on Reference Source (quoted below): for values that fit in an Int32, BigInteger stores the value in the _sign field; for all other values the _bits field is used.
Obviously, the internal array needs to be convertible to a decimal (base-10) representation so humans can read it; the ToString() method converts the BigInteger to such a string representation.
For a more in-depth understanding, consider using .NET source stepping to step into the code and see how the mathematics is carried out. But for a basic understanding: BigInteger uses an internal representation composed of a 32-bit integer array, which can be transformed into a readable format and which allows numbers bigger than even Int64.
// For values int.MinValue < n <= int.MaxValue, the value is stored in sign
// and _bits is null. For all other values, sign is +1 or -1 and the bits are in _bits
What's so special about adding/subtracting 1 to/from a floating point value that it deserves a dedicated operator?
double a = -0.001234129;
a++; // ?
I've never felt the need to use such a construction; it looks really weird to me. But if I ever had to, I'd feel much more comfortable with just:
a += 1;
Maybe it's because of my strong C++ background, but to me it makes a variable look like an array indexer.
Is there any reason for this?
The ++ and -- operators operate on all other number types, why make an exception for floating point numbers? To me, that would be the more surprising choice.
Note that C++ also implements these for floating point:
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    double a = 0.5;
    cout << a << '\n';
    ++a;
    cout << a << '\n';
    return 0;
}
Output:
0.5
1.5
My guess is that the reason is consistency with C/C++.
I agree with you that it's kind of weird - the '++' operator has some special meaning for integer values:
It translates to the INC assembly instruction,
It represents changing the value by a special amount (i.e. by the smallest possible amount), and because of this it's used in iterations.
For floating-point numbers, however, the value 1.0 is not special in any way (from the machine's point of view). You also shouldn't use it for iterations (in other words, if you're incrementing by 1 you should usually consider using an int), and it doesn't have a designated INC assembly instruction.
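To make that last point concrete: once the spacing between adjacent double values grows beyond 1, a++ stops doing anything at all (illustrative value below).

double a = 1e17;              // adjacent doubles are 16 apart at this magnitude
a++;                          // adding 1 rounds straight back to the same value
Console.WriteLine(a == 1e17); // True - the "increment" had no effect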
I am trying to understand this code and I am not sure what language it is. It seems to be Java, but I am not sure. I apologize if I am posting this incorrectly. I am volunteering, helping with a calendar, and trying to find a random generator to work with BASIC. Right now I am just trying to understand what this code is doing.
private static uint GetUint()
{
    m_z = 36969 * (m_z & 65535) + (m_z >> 16);
    m_w = 18000 * (m_w & 65535) + (m_w >> 16);
    return (m_z << 16) + m_w;
}

public static double GetUniform()
{
    // 0 <= u < 2^32
    uint u = GetUint();
    // The magic number below is 1/(2^32 + 2).
    // The result is strictly between 0 and 1.
    return (u + 1.0) * 2.328306435454494e-10;
}
It is C#, and the code is from here http://www.codeproject.com/KB/recipes/SimpleRNG.aspx?display=Print
It is used to generate random numbers. There's quite a bit more info on it at that link above. To find it I Googled the 2.328... number, because it looked familiar.
It should be C#.
C++'s public and private must be followed by a :.
Java doesn't have uint.
The naming convention (CamelCase) looks like a .NET language, and the syntax is C-like.
This seems to be a double LCG implemented in C# (I say C# instead of Java because IIRC Java doesn't have uint). You can find more about LCGs on Wikipedia.
Still, most dialects of BASIC have some random number generator built in, typically using the instructions RANDOMIZE for initializing it and RAND or RANDOM to get a random number.
Because of the naming conventions (methods starting in uppercase), the data types (uint, double), the keywords (private, public, static), the programming conventions (braces on a separate line) and the operators (>>, +, *, &), I'm pretty sure the programming language used in the above snippet is C#.
I just found an interesting problem when translating some data:
VB.NET: CByte(4) << 8 Returns 4
But C#: (byte)4 << 8 Returns 1024
Namely, why does VB.NET: (CByte(4) << 8).GetType() return type {Name = "Byte" FullName = "System.Byte"}
Yet C#: ((byte)4 << 8).GetType() returns type {Name = "Int32" FullName = "System.Int32"}
Is there a reason why these two don't treat the binary shift the same way? Following from that, is there any way to make the C# bit shift behave like VB.NET's (to make VB.NET behave like C#, you just do CInt(_____) << 8)?
According to http://msdn.microsoft.com/en-us/library/a1sway8w.aspx, byte does not have << defined on it in C# (only int, uint, long and ulong do). This means that it will use an implicit conversion to a type that it can use, so it converts the byte to int before doing the bit shift.
http://msdn.microsoft.com/en-us/library/7haw1dex.aspx says that VB defines the operation on Bytes. To prevent overflow it applies a mask to your shift to bring it within an appropriate range so it is actually in this case shifting by nothing at all.
As to why C# doesn't define shifting on bytes I can't tell you.
To actually make it behave the same for other data types, you just need to mask your shift count with 7 for bytes or 15 for shorts (see the second link for details).
To apply the same in C#, you would use
static byte LeftShiftVBStyle(byte value, int count)
{
    return (byte)(value << (count & 7));
}
as for why VB took that approach... just different language, different rules (it is a natural extension of the way C# handles shifting of int (& 31) and long (& 63), to be fair).
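The Short counterpart (mask the count with 15, as noted above) would presumably look like this - an untested sketch along the same lines:

static short LeftShiftVBStyleShort(short value, int count)
{
    return (short)(value << (count & 15));
}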
Chris already nailed it: VB.NET has defined shift operators for the Byte and Short types, C# does not. The C# spec is very similar to C and is also a good match for the MSIL definitions of OpCodes.Shl, Shr and Shr_Un, which only accept int32, int64 and intptr operands. Accordingly, any byte- or short-sized operands are first converted to int32 with their implicit conversion.
That's a limitation the VB.NET compiler has to work with; it needs to generate extra code to make the byte- and short-specific versions of the operators work. The byte operator is implemented like this:
Dim result As Byte = CByte(leftOperand << (rightOperand And 7))
and the short operator:
Dim result As Short = CShort(leftOperand << (rightOperand And 15))
The corresponding C# operation, expressed in the same VB terms, is:
Dim result As Integer = CInt(leftOperand) << CInt(rightOperand)
Or CLng() if required. Implicit in C# code is that the programmer always has to cast the result back to the desired result type. There are a lot of SO questions about that from programmers who don't find that very intuitive. VB.NET has another feature that makes automatic casting more survivable: it has overflow checking enabled by default, although that's not applicable to shifts.
I came from a mostly C/C++ background before I began using C#. One of the things I did with my first project in C# was make a class like this
class Element {
    public uint Size;
    public ulong BigThing;
}
I was then mortified by the fact that this requires casting:
int x=MyElement.Size;
as does
int x=5;
uint total=MyElement.Size+x;
Why did the language designers decide to make the signed and unsigned integer types not implicitly castable? And why are the unsigned types not used more throughout the .NET library? For instance, String.Length can never be negative, yet it is a signed integer.
Why did the language designers decide to make the signed and unsigned integer types not implicitly castable?
Because that could lose data or throw an exception, neither of which is generally a good thing to allow implicitly. (The implicit conversion from long to double can lose data too, admittedly, but in a different way.)
And why are the unsigned types not used more throughout the .Net library
Unsigned types aren't CLS-compliant - not all .NET languages have always supported them. For example, Visual Basic didn't have "native" support for unsigned data types in .NET 1.0 and 1.1; it was added to the language for 2.0. (You could still use them, but they weren't part of the language itself - you couldn't use the normal arithmetic operators, for example.)
Along with Jon's answer: just because an unsigned number can't be negative doesn't mean its values all fit in a signed one. uint ranges from 0 to 4,294,967,295, but int ranges from -2,147,483,648 to 2,147,483,647. There is plenty of room above int's max for data loss.
Because implicitly converting an unsigned integer holding 3 billion into a signed integer is going to blow up.
Unsigned types have twice the maximum value of signed ones. It's the same reason you can't implicitly convert a long to an int.
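For a concrete picture of that blow-up (illustrative values):

uint big = 3000000000;             // fits comfortably in uint, far above int.MaxValue

int wrapped = unchecked((int)big); // -1294967296: the bit pattern reinterpreted as signed
int boom = checked((int)big);      // throws OverflowException at runtime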
I was then mortified by the fact that this requires casting:
int x=MyElement.Size;
But you are contradicting yourself here. If you really (really) need Size to be unsigned, then assigning it to (signed) x is an error, a deep flaw in your code.
For instance String.Length can never be negative, yet it is a signed integer
But String.IndexOf can return a negative number, and it would be awkward if String.Length and index values were of different types.
And while in theory there would be merit in an unsigned String.Length (4 GB cap), in practice even the current 2GB is large enough (because strings of that length are rare and unworkable anyway).
So the real answer is: Why use unsigned in the first place?
On the second count: because they wanted the CLR to be compatible with languages that don't have unsigned datatypes (read: VB.NET).