_BitScanForward in C#?

I am translating a program written in C++ to C#, and I have come across an intrinsic function that I cannot work around. In C++ this is known as:
unsigned char _BitScanForward(unsigned long * Index, unsigned long Mask);
If I only knew what DLL, if any, the intrinsic functions were in, I could use P/Invoke. Since I do not know, I looked for alternatives in the .NET framework, but I have come up empty handed.
Does anyone know how to use P/Invoke on _BitScanForward, or a .NET method that does the same thing?
Any help is appreciated, thank you.

Intrinsic functions aren't in any library; they're implemented inside the CPU. The compiler emits the machine code that the CPU recognizes as invoking this particular behavior.
They're a way of getting access to instructions that don't have a simple C equivalent.
Until the .NET optimizer becomes smart enough to recognize them (for example, the Mono JIT recognizes some SIMD instructions, encoded in MSIL as calls to functions of a particular class; similarly, the .NET JIT replaces calls to System.Math methods with floating-point operations), your C# code is doomed to run an order of magnitude slower than the original C++.

The _BitScanForward C++ function is a compiler intrinsic. It searches a value for the first set bit, from the lowest-order bit to the highest, and reports the zero-based index of that bit through its Index parameter (the return value tells you whether any bit was set at all). You could probably implement something similar using bit-manipulation tactics in C# (though it'll never come close to the same performance). If you're comfortable with bit manipulation in C++ then it's basically the same in C#.
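For illustration, a minimal (and deliberately unoptimized) sketch of such a bit-manipulation replacement; the method name is made up:
// scans a 32-bit mask from the least significant bit upward;
// returns the index of the first set bit, or -1 if the mask is zero
static int BitScanForwardNaive(uint mask)
{
    for (int index = 0; index < 32; index++)
        if ((mask & (1u << index)) != 0)
            return index;
    return -1;
}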

_BitScanForward searches for the first set bit in an integer, starting from the least significant bit searching towards the most significant bit. It compiles to the bsf instruction on the x86 platform.
The bit twiddling hacks page includes a handful of potential replacement algorithms that excel in different situations: an O(N) loop (which, for uniformly distributed inputs, returns after a single iteration half the time), some sub-linear options, and some that make use of multiplication steps. Picking one might not be trivial, but any of them should work.

Wow, it looks like there is a C# question whose answers don't yet cover the recent improvements.
Other commenters have properly noted that the intrinsics like _BitScanForward are not functions per se, those are rather markers for the compiler to inject a specific platform instruction into the object code. It is impossible to emulate an intrinsic in a high-level language (unless you're willing to pay an abstraction penalty).
However, the good news is that starting with .Net Core 3.0, the JIT does support intrinsics for a number of hardware platforms.
For the _BitScanForward you might use System.Runtime.Intrinsics.X86.Bmi1.TrailingZeroCount.
Caveat: don't forget to check Bmi1.IsSupported before using it, otherwise the code will fail at runtime.
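For example, a guarded wrapper might look like this sketch (the method name is made up; the fallback branch is a deliberately simple software path that any of the bit-twiddling alternatives could replace):
static int BitScanForwardSafe(uint mask)
{
    if (System.Runtime.Intrinsics.X86.Bmi1.IsSupported)
        return (int)System.Runtime.Intrinsics.X86.Bmi1.TrailingZeroCount(mask);

    // software fallback
    for (int i = 0; i < 32; i++)
        if ((mask & (1u << i)) != 0)
            return i;
    return 32;   // mirror TZCNT's behaviour for an all-zero input
}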
You could also get decent execution speed on ARM (.Net 5.0+) by emulating ffs with the leading-zero-count intrinsic:
public int ArmBitScanForward(int x)
    => 32 - System.Runtime.Intrinsics.Arm.ArmBase.LeadingZeroCount(x & -x);

public int ArmBitScanForward(long x)
    => 64 - System.Runtime.Intrinsics.Arm.ArmBase.Arm64.LeadingZeroCount(x & -x);
If neither platform is present, you would have to resort to bit-twiddling hacks like de Bruijn sequences:
for i from 0 to 31: table[ ( 0x077CB531 * ( 1 << i ) ) >> 27 ] ← i // table [0..31] initialized
function ctz5 (x)
return table[((x & -x) * 0x077CB531) >> 27]
(taken from https://en.wikipedia.org/wiki/Find_first_set)
Depending on the task restrictions, I would choose among different strategies for selecting the algorithm at run time. Branching on each call is likely to kill all the efficiency; the most efficient way is to branch one level higher - i.e. have three versions of your code and choose between them at run time.
An easy way to automate the codegen is to have your code in a generic form parameterized with a bit-handling type:
public interface IBitScanner
{
    int BitScanForward(int x);
}

public int MyFunction<T>(int[] data)
    where T : struct, IBitScanner
{
    var s = 0;
    var scanner = new T();
    foreach (var i in data)
        s += scanner.BitScanForward(i);
    return s;
}
Then we define a couple of structs implementing our scanner:
public struct BitScannerX86 : IBitScanner
{
    public int BitScanForward(int x)
        => unchecked((int)System.Runtime.Intrinsics.X86.Bmi1.TrailingZeroCount((uint)x));
}

public struct BitScannerArm : IBitScanner
{
    public int BitScanForward(int x)
        // 31 - clz(x & -x) matches TrailingZeroCount for non-zero x
        // (32 - clz would give the 1-based ffs position instead)
        => 31 - System.Runtime.Intrinsics.Arm.ArmBase.LeadingZeroCount(x & -x);
}
public struct BitScanner : IBitScanner
{
    private static readonly int[] _table = InitTable();

    private static int[] InitTable()
    {
        var table = new int[32];
        for (var i = 0; i < table.Length; i++)
            table[(0x077CB531u * (1u << i)) >> 27] = i;   // the de Bruijn hash is the index, i is the value
        return table;
    }

    public int BitScanForward(int x)
        => _table[((uint)(x & -x) * 0x077CB531u) >> 27];
}
Now whenever we need a platform-specific version of MyFunction, we instantiate it as, e.g., MyFunction<BitScannerArm>. Because the type parameter is a struct, the JIT generates code specialized for it instead of a shared generic version going through a virtual call.
Then, as T is known at JIT time, the call to BitScanForward gets inlined and the appropriate intrinsic ends up injected into the loop.
Depending on the MyFunction task size, this version of MyFunction might be saved to a delegate, be part of an interface, or be part of a struct that implements an interface to repeat the trick one level higher.
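For example, a hedged sketch of doing that one-level-higher branch once at start-up (assumes MyFunction is reachable as a static method group; the delegate name is made up):
Func<int[], int> sumOfBitIndices;
if (System.Runtime.Intrinsics.X86.Bmi1.IsSupported)
    sumOfBitIndices = MyFunction<BitScannerX86>;
else if (System.Runtime.Intrinsics.Arm.ArmBase.IsSupported)
    sumOfBitIndices = MyFunction<BitScannerArm>;
else
    sumOfBitIndices = MyFunction<BitScanner>;      // de Bruijn fallback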
Note that the original question didn't bother with cross-platform compatibility, as _BitScanForward is an x86-only intrinsic.
That was probably OK in the C++ world of compiling an executable against a specific OS and hardware combination; contemporary managed code like Java/.Net has a chance of being executed anywhere.

It is not possible to P/Invoke _BitScanForward because it is a compiler intrinsic, not an actual library function (it gets translated by the Visual C++ compiler to a BSF x86 machine instruction). As far as I'm aware, there is no MSIL instruction for this "find first set" operation. The simplest thing to do is write your own C++ native DLL that exports a function that invokes _BitScanForward(), and then P/Invoke that.
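For illustration, the C# side of that approach could look like the sketch below; BitOps.dll and BitScanForwardWrapper are hypothetical names for the native DLL and the exported wrapper you would write yourself:
using System.Runtime.InteropServices;

static class NativeBitOps
{
    // the native wrapper is assumed to forward its arguments straight to _BitScanForward
    [DllImport("BitOps.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern byte BitScanForwardWrapper(out uint index, uint mask);
}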
You can also write it directly in C# using bit manipulation (see Algorithms for find first set in Wikipedia). I'm not sure if this would be faster or slower than P/Invoke. Measure and find out.

Related

Implementing ByteCode interpreter in c#

My question: is there a memory-efficient way to mimic the C++ union concept while allowing for the string datatype, or some other efficient way to include data types and values in bytecode with minimal pointer chasing so as to take advantage of instruction caching?
I'm trying to write a VM bytecode interpreter in C#. I'd like to keep it in C# for simplicity, security, and familiarity reasons, mostly because I want to interact with a library of C# code I've already written.
There's information about how to do so online readily enough, except that it uses 'union' in C++, for which I can't seem to find an equivalent. Specifically, any kind of value (that is, anything that isn't an instruction) is stored as a tagged union.
I've searched and found questions like: Discriminated union in C#, but their answers don't make for efficient code - using inheritance still involves pointer chasing.
C++ union in C# proposes using StructLayout. It works until you need string values, and then throws:
[StructLayout(LayoutKind.Explicit)]
public struct SampleUnion
{
    [FieldOffset(0)] public byte typeTag;
    [FieldOffset(1)] public int num;
    [FieldOffset(1)] public bool flag;
    [FieldOffset(1)] public string c;
}
Could not load type ... because it contains an object field at offset 1 that is incorrectly aligned or overlapped by a non-object field.
I also tried messing around with just passing around arrays of bytes, but then I get burned by perf costs when I have to use a value, because I have to convert it.
I've considered using dynamic. Maybe that will work, but it's at best a waste of memory for some types, and at worst I'm uncertain what shenanigans it might try to pull behind the scenes.
I mean, worst case scenario I suppose I could write the byte code interpreter in c++ and call it within the c# code, but I'd rather avoid that if I can, especially because I don't love the idea of messing around with the unsafe keyword, and it introduces a lot of complexity into my project.
As described in this article, the pseudocode of a bytecode interpreter is:
load the bytecode into memory
initialize interpreter state
repeat {
   fetch the next instruction, advance the instruction pointer
   decode the instruction
   execute the instruction
}
Depending on the bytecode format or structure, the instruction can have either fixed or dynamic length. Data like arrays or strings are typically referenced as (fixed length) memory offsets. The data is embedded in the bytecode separate from the instructions. The data address/offset is an index within the bytecode, as data is stored as sequence of bytes. An instruction to load a string would contain the string offset but not the string data itself.
To fetch and decode the next instruction, it is common to analyze the first one or two bytes, which is/are usually the opcode. From this opcode, the length of the instruction is derived. The bytes belonging to the instruction can then be copied into a struct(ure) to dissect it further and extract the instruction operand(s).
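As an illustration, a minimal sketch of that loop in C# (the opcode layout is invented for the example and not taken from the article):
using System;
using System.Collections.Generic;

static class MiniVm
{
    // invented layout: 0x01 = PUSH imm32 (4-byte operand follows), 0x02 = ADD, 0xFF = HALT
    public static int Run(byte[] bytecode)
    {
        var stack = new Stack<int>();
        int ip = 0;                                   // instruction pointer
        while (true)
        {
            byte opcode = bytecode[ip++];             // fetch + advance
            switch (opcode)                           // decode + execute
            {
                case 0x01:                            // PUSH: operand embedded in the bytecode
                    stack.Push(BitConverter.ToInt32(bytecode, ip));
                    ip += 4;
                    break;
                case 0x02:                            // ADD
                    stack.Push(stack.Pop() + stack.Pop());
                    break;
                case 0xFF:                            // HALT: result is on top of the stack
                    return stack.Pop();
                default:
                    throw new InvalidOperationException($"Unknown opcode 0x{opcode:X2} at {ip - 1}");
            }
        }
    }
}
Running MiniVm.Run(new byte[] { 0x01, 2, 0, 0, 0, 0x01, 3, 0, 0, 0, 0x02, 0xFF }) pushes 2 and 3, adds them, and returns 5.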
I can't see where a union would help in this process.
A simple C++ bytecode interpreter is described in the XIDEK (Extensible Interpreter Development Kit).

Why can I not encode const structs?

I wish to encode hard coded value of a const Point struct.
Why does the compiler not allow either internal or arbitrary structs to be replaced during compilation? Since the internal bitwise representation can be established at compile time (in both cases), there is no apparent reason for the restriction.
My question is: is there a way to hard-code a predefined set of bytes in C# that can be interpreted at compile time as the appropriate type, since all structs have a predetermined memory layout?
EDIT:
To clarify: Compile time means C# -> IL byte-code as stored in the output assembly.
The use case example:
public void Draw(Bitmap bmp, Point Location = new Point(0,0)) // invalid
This is an error because new Point(0,0) cannot be evaluated at compile time. I can pass in int X = 0, int Y = 0 or a nullable Point? Location = null and generate the struct inside the method, or overload the method without the optional parameters and call the main method passing in the default values, but that technique incurs a performance penalty in terms of the extra method calls required.
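For reference, a minimal sketch of the nullable-parameter workaround mentioned above (using the System.Drawing types from the example):
public void Draw(Bitmap bmp, Point? location = null)
{
    // null is a legal compile-time default; the fallback Point is built at run time
    var loc = location ?? new Point(0, 0);
    // ... draw bmp at loc ...
}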
This may not be appropriate for all structs, since the constructor could rely on, or change, external state or randomness.
FINAL EDIT:
This is now possible. Making the question moot. Yay.
The issue was the incorrect belief that the new keyword always implied heap allocation or dynamic stack allocation; with constant arguments, neither is the case.
Why does the compiler not allow either internal or arbitrary structs to be replaced during compilation? Since the internal bitwise representation can be established at compile time (in both cases), there is no apparent reason for the restriction.
All not-implemented features are not implemented for the same reason. To be implemented, a feature must be thought of, judged to be appropriate, designed, specified, implemented, tested, documented and shipped. All those things must happen. For your proposed feature, none of them happened. Therefore, no feature.
Programming language designers are not required to provide a justification for why a feature was not implemented. Rather, the people who want the feature are required to provide a reason why programming language designers should spend their valuable time implementing a feature that you want.
The C# design process is open, and the compiler source code is available. Why have you not designed and implemented the feature? If it is fair for you to ask the designers that question, it's fair for them to ask it of you! You're a computer programmer; get busy programming computers and build the feature if you think it is worthwhile, and then convince the language team to accept your pull request. If you don't think it is worth your time to do that, well, probably the language designers feel the same way.
My question is: is there a way to hard-code a predefined set of bytes in C# that can be interpreted at compile time as the appropriate type, since all structs have a predetermined memory layout?
I'm not sure what you mean by "at compile time"; can you clarify?
There are ways to store byte arrays in an assembly, sure. Make a C# program with a byte array initialized to all constant values and ildasm the assembly; you'll see the code that the C# compiler generates to get the byte array image out of the metadata and into the array.
You could implement similar shenanigans to get a byte array, fix the array in place, and then use unsafe pointer magic to reinterpret the array bytes as struct bytes. That sounds extraordinarily dangerous, and might mess up the performance of the garbage collector. I would not wish to do so myself, but you seem pretty keen on this feature, so go for it and report back what you find out!
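For what it's worth, here is a minimal sketch of that byte-reinterpretation idea using MemoryMarshal rather than pinning and pointer casts (assumes a blittable struct and that the byte image matches this platform's layout and endianness; Point32 is made up for the example):
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Point32 { public int X; public int Y; }

class ReinterpretDemo
{
    static void Main()
    {
        // eight hard-coded bytes reinterpreted as a Point32 (X = 10, Y = 20 on little-endian)
        ReadOnlySpan<byte> raw = new byte[] { 0x0A, 0, 0, 0, 0x14, 0, 0, 0 };
        var p = MemoryMarshal.Read<Point32>(raw);
        Console.WriteLine($"{p.X}, {p.Y}");
    }
}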
Alternatively, C++/CLI probably implements the feature you want; I've never used it but that seems like the sort of thing it would do. You could write a little program in C++/CLI that does what you want, and then either (1) use that program's assembly as a dependency of your assembly, (2) compile it as a netmodule and link it into your assembly via the usual netmodule linking gear (yuck), or (3) deduce how they implemented the feature and then do the same.
You can convert a struct into byte array and encode a byte array. It may work like this: (please compile and fix any errors (typing through mobile))
Suppose your struct is:
public struct TestStruct
{
    public int x;
    public string y;
}

// requires: using System; using System.Linq; using System.Text;
public byte[] GetTestByte(TestStruct c)
{
    var intGuy = BitConverter.GetBytes(c.x);
    var stringGuy = Encoding.UTF8.GetBytes(c.y);
    var both = stringGuy.Concat(intGuy).Concat(new byte[1]).ToArray();
    return both;
}
Now you can encode a byte array like:
Convert.ToBase64String(byteArray);
There is no direct way to encode a struct. It's a value type, and structs are probably there as a legacy of C and C++.

Higher order functions in AleaGPU C#

I am trying to code C# versions (in C# style) of the F# reduce functions found here:
https://github.com/quantalea/AleaGPUTutorial/tree/master/src/fsharp/examples/generic_reduce
More specific to my question, take this function for example:
let multiReduce (opExpr:Expr<'T -> 'T -> 'T>) numWarps =
    let warpStride = WARP_SIZE + WARP_SIZE / 2 + 1
    let sharedSize = numWarps * warpStride
    <@ fun tid (x:'T) ->
        // stuff
    @>
I'm primarily an F# guy, and I'm not quite sure how I should go about coding functions like these in C#. For the C# version, the multiReduce function will be a class member. So if I wanted to do a more direct translation of the F# code, I would return a Func from my MultiReduce member.
The other option would be to "flatten" the multiReduce function, so that my C# member version would have two extra parameters. So...
public T MultiReduce(Func<T,T,T> op, int numWarps, int tid, T x)
{
// stuff
}
But I don't think this would work for AleaGPU coding in all cases because the quoted expression in the F# version is a device function. You need the nested function structure to be able to separate the assignment of certain variables from the actual invocation of the function.
Another way I see to do it would be to make a MultiReduce class and have the opExpr and numWarps as fields, then make the function in the quotation a class member.
So how are higher order functions like these generally implemented in AleaGPU-C#? I don't think it's good to return Func<..> everywhere since I don't see this done much in C# coding. Is AleaGPU a special case where this would be ok?
A basic AleaGPU C# implementation looks like this:
internal class TransformModule<T> : ILGPUModule
{
    private readonly Func<T, T> op;

    public TransformModule(GPUModuleTarget target, Func<T, T> opFunc)
        : base(target)
    {
        op = opFunc;
    }

    [Kernel]
    public void Kernel(int n, deviceptr<T> x, deviceptr<T> y)
    {
        var start = blockIdx.x * blockDim.x + threadIdx.x;
        var stride = gridDim.x * blockDim.x;
        for (var i = start; i < n; i += stride)
            y[i] = op(x[i]);
    }

    public void Apply(int n, deviceptr<T> x, deviceptr<T> y)
    {
        const int blockSize = 256;
        var numSm = this.GPUWorker.Device.Attributes.MULTIPROCESSOR_COUNT;
        var gridSize = Math.Min(16 * numSm, Common.divup(n, blockSize));
        var lp = new LaunchParam(gridSize, blockSize);
        GPULaunch(Kernel, lp, n, x, y);
    }

    public T[] Apply(T[] x)
    {
        using (var dx = GPUWorker.Malloc(x))
        using (var dy = GPUWorker.Malloc<T>(x.Length))
        {
            Apply(x.Length, dx.Ptr, dy.Ptr);
            return dy.Gather();
        }
    }
}
Higher-order functions are not nearly as ubiquitous in C# as they are in F#. While there are plenty of examples of accepting functions as arguments, C# code rarely returns functions as results. I guess this is partly because the code comes out very ugly (Func<T,U> everywhere) and partly because C# programmers are not generally used to functional style and gravitate more toward OO ways.
In particular, there is no automatic currying/partial application in C#. You can think of it as if all your F# functions always had tupled parameters. In fact, that's how a multi-parameter C# method would look if you called it from F#.
I must also note that the function in your code is not, in fact, "higher-order". It neither accepts nor returns any functions. Instead, it accepts and returns quotations, which is not at all the same thing. A function is, roughly speaking, a reference to a piece of code, but a quotation is a data structure. They look similar, but they're completely different animals.
C# does, too, have its own quotations, represented by the type System.Linq.Expressions.Expression<T> (where T must be a delegate type). However, they are not the same thing as F# quotations. From F# side, you can (sorta) use C# quotation, but not the other way around.
Both F# and C# quotations have their strengths and weaknesses. In particular, C# supports compilation, F# doesn't. F# supports splicing, C# doesn't.
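For example, a small sketch of a C# quotation being built and then compiled back into a delegate (plain Expression<T> usage, nothing AleaGPU-specific):
using System;
using System.Linq.Expressions;

class QuotationDemo
{
    static void Main()
    {
        // a C# "quotation": a data structure describing the code, not the code itself
        Expression<Func<int, int, int>> opExpr = (a, b) => a + b;
        // C# supports compiling it into a callable delegate
        Func<int, int, int> op = opExpr.Compile();
        Console.WriteLine(op(2, 3));   // prints 5
    }
}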
Which brings me to the next point: you probably need splicing. Because you are using opExpr in the body of the returned quotation, aren't you?
And C# doesn't have an out-of-the-box support for it. Yes, it is theoretically possible to implement splicing as a library function, but for some reason there is no de-facto standard, regularly maintained implementation. We, for one, had to roll our own. It's open source, too, and pretty straightforward, so feel free to use it.
Now, having said all the above, I want to express a doubt that you would be able to use C# for this at all. I don't really know how AleaGPU works, but it looks like it expects you to return an F# quotation, which it then, presumably, compiles into GPU code. If that's the case, because C# and F# quotations are two different things, you probably won't be able to return a C# quotation to AleaGPU in lieu of the F# one. Unless it has separate support for it, of course.

The Benefits of Using Function Pointers

I have been programming for a few years now and have used function pointers in certain cases. What I would like to know is when is it appropriate or not to use them for performance reasons and I mean in the context of games, not business software.
Function pointers are fast, John Carmack used them to the extent of abuse in the Quake and Doom source code and because he is a genius :)
I would like to use function pointers more but I want to use them where they are most appropriate.
These days what are the best and most practical uses of function pointers in modern c-style languages such as C, C++, C# and Java, etc?
There is nothing especially "fast" about function pointers. They allow you to call a function which is specified at runtime. But you have exactly the same overhead as you'd get from any other function call (plus the additional pointer indirection). Further, since the function to call is determined at runtime, the compiler typically cannot inline the call as it could anywhere else. As such, function pointers may in some cases end up significantly slower than a regular function call.
Function pointers have nothing to do with performance, and should never be used to gain performance.
Instead, they are a very slight nod to the functional programming paradigm, in that they allow you to pass a function around as parameter or return value in another function.
A simple example is a generic sorting function. It has to have some way to compare two elements in order to determine how they should be sorted. This could be a function pointer passed to the sort function, and in fact c++'s std::sort() can be used exactly like that. If you ask it to sort sequences of a type that does not define the less than operator, you have to pass in a function pointer it can call to perform the comparison.
And this leads us nicely to a superior alternative. In C++, you're not limited to function pointers. You often use functors instead - that is, classes that overload the operator (), so that they can be "called" as if they were functions. Functors have a couple of big advantages over function pointers:
They offer more flexibility: they're full-fledged classes, with constructor, destructor and member variables. They can maintain state, and they may expose other member functions that the surrounding code can call.
They are faster: unlike function pointers, whose type only encodes the signature of the function (a variable of type void (*)(int) may be any function which takes an int and returns void; we can't know which one), a functor's type encodes the precise function that should be called (since a functor is a class, call it C, we know that the function to call is, and will always be, C::operator()). And this means the compiler can inline the function call. That's the magic that makes the generic std::sort just as fast as your hand-coded sorting function designed specifically for your datatype. The compiler can eliminate all the overhead of calling a user-defined function.
They are safer: There's very little type safety in a function pointer. You have no guarantee that it points to a valid function. It could be NULL. And most of the problems with pointers apply to function pointers as well. They're dangerous and error-prone.
Function pointers (in C) or functors (in C++) or delegates (in C#) all solve the same problem, with different levels of elegance and flexibility: They allow you to treat functions as first-class values, passing them around as you would any other variable. You can pass a function to another function, and it will call your function at specified times (when a timer expires, when the window needs redrawing, or when it needs to compare two elements in your array)
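For instance, a minimal C# sketch of the delegate flavour of this idea, a comparison callback handed to a generic sort (names and data are made up for the example):
using System;

class CallbackDemo
{
    static void Main()
    {
        int[] data = { 5, 1, 4, 2 };
        // Array.Sort calls the supplied delegate back every time it needs to compare two elements
        Array.Sort(data, (a, b) => b.CompareTo(a));   // sort descending
        Console.WriteLine(string.Join(", ", data));   // 5, 4, 2, 1
    }
}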
As far as I know (and I could be wrong, because I haven't worked with Java for ages), Java doesn't have a direct equivalent. Instead, you have to create a class, which implements an interface, and defines a function (call it Execute(), for example). And then instead of calling the user-supplied function (in the shape of a function pointer, functor or delegate), you call foo.Execute(). Similar to the C++ implementation in principle, but without the generality of C++ templates, and without the function syntax that allows you to treat function pointers and functors the same way.
So that is where you use function pointers: When more sophisticated alternatives are not available (i.e. you are stuck in C), and you need to pass one function to another. The most common scenario is a callback. You define a function F that you want the system to call when X happens. So you create a function pointer pointing to F, and pass that to the system in question.
So really, forget about John Carmack and don't assume that anything you see in his code will magically make your code better if you copy it. He used function pointers because the games you mention were written in C, where superior alternatives are not available, and not because they are some magical ingredient whose mere existence makes code run faster.
They can be useful if you do not know the functionality supported by your target platform until run-time (e.g. CPU functionality, available memory). The obvious solution is to write functions like this:
int MyFunc()
{
    if (SomeFunctionalityCheck())
    {
        ...
    }
    else
    {
        ...
    }
}
If this function is called deep inside important loops then it's probably better to use a function pointer for MyFunc:
int MyFunc_SomeFunctionality()
{
    // the if(SomeFunctionalityCheck()) branch
    ...
}

int MyFunc_Default()
{
    // the else branch
    ...
}

// the function pointer, initialized to the default implementation
int (*MyFunc)() = MyFunc_Default;

void MyFuncInit()
{
    if (SomeFunctionalityCheck()) MyFunc = MyFunc_SomeFunctionality;
}
There are other uses of course, like callback functions, executing byte code from memory or for creating an interpreted language.
To execute Intel-compatible byte code on Windows (which might be useful for an interpreter), you call into a buffer of machine code through a function pointer. For example, here is a stdcall function returning 42 (0x2A), stored in an array, that can be executed:
code = static_cast<unsigned char*>(VirtualAlloc(0, 6, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE));

// mov eax, 42  (B8 2A 00 00 00)
code[0] = 0xb8;
code[1] = 0x2a;
code[2] = 0x00;
code[3] = 0x00;
code[4] = 0x00;
// ret
code[5] = 0xc3;

// this line executes the code in the byte array
reinterpret_cast<unsigned int (_stdcall *)()>(code)();

...

VirtualFree(code, 0, MEM_RELEASE);
In my personal experience, they can help you save a significant number of lines of code.
Consider the condition:
switch (sample_var)
{
    case 0:
        func1(<parameters>);
        break;
    case 1:
        func2(<parameters>);
        break;
    ...
    case n:
        funcn(<parameters>);
        break;
}
where func1() ... funcn() are functions with same prototype.
What we could do is:
Declare an array of function pointers arrFuncPoint containing the addresses of functions func1() to funcn(). Then the whole switch case would be replaced by:
(*arrFuncPoint[sample_var])(<parameters>);
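For comparison, the same dispatch-table idea in C# uses an array of delegates instead of raw function pointers; a minimal self-contained sketch (the handler names are placeholders, not from the answer above):
using System;

class DispatchTableDemo
{
    static void Func1(int x) => Console.WriteLine($"Func1({x})");
    static void Func2(int x) => Console.WriteLine($"Func2({x})");

    static void Main()
    {
        // every handler shares the same prototype, so they fit in one array
        Action<int>[] arrFuncPoint = { Func1, Func2 };
        int sample_var = 1;
        arrFuncPoint[sample_var](42);   // replaces the switch: calls Func2(42)
    }
}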
Any time you use a event handler or delegate in C#, you are effectively using a function pointer.
And no, they are not about speed. Function pointers are about convenience.
Jonathan
Function pointers are used as callbacks in many cases. One use is as a comparison function in sorting algorithms. So if you are trying to compare customized objects, you can provide a function pointer to the comparison function that knows how to handle that data.
That said, I'll provide a quote I got from a former professor of mine:
Treat a new C++ feature like you would treat a loaded automatic weapon in a crowded room: never use it just because it looks nifty. Wait until you understand the consequences, don't get cute, write what you know, and know what you write.
These days what are the best and most practical uses of integers in modern c-style languages?
In the dim, dark ages before C++, there was a common pattern I used in my code which was to define a struct with a set of function pointers that (typically) operated on that struct in some way and provided particular behaviors for it. In C++ terms, I was just building a vtable. The difference was that I could side-effect the struct at runtime to change behaviors of individual objects on the fly as needed. This offers a much richer model of inheritance at the cost of stability and ease of debugging. The greatest cost, however, was that there was exactly one person who could write this code effectively: me.
I used this heavily in a UI framework that let me change the way objects got painted, who was the target of commands, and so on, on the fly - something that very few UIs offered.
Having this process formalized in OO languages is better in every meaningful way.
Just speaking of C#: function pointers are used all over C#. Delegates and events (and lambdas, etc.) are all function pointers under the hood, so nearly any C# project is going to be riddled with function pointers. Basically every event handler and nearly every LINQ query will be using function pointers.
There are occasions when using function pointers can speed up processing. Simple dispatch tables can be used instead of long switch statements or if-then-else sequences.
Function pointers are a poor man's attempt to be functional. You could even make an argument that having function pointers makes a language functional, since you can write higher order functions with them.
Without closures and easy syntax, they're sorta gross. So you tend to use them far less than desirable. Mainly for "callback" functions.
Sometimes, OO design works around using functions by instead creating a whole interface type to pass in the function needed.
C# has closures, so function pointers (which actually store an object so it's not just a raw function, but typed state too) are vastly more usable there.
Edit
One of the comments said there should be a demonstration of higher order functions with function pointers. Any function taking a callback function is a higher order function. Like, say, EnumWindows:
BOOL EnumWindows(
WNDENUMPROC lpEnumFunc,
LPARAM lParam
);
First parameter is the function to pass in, easy enough. But since there are no closures in C, we get this lovely second parameter: "Specifies an application-defined value to be passed to the callback function." That app-defined value allows you to manually pass around untyped state to compensate for lack of closures.
The .NET framework is also filled with similar designs. For instance, IAsyncResult.AsyncState: "Gets a user-defined object that qualifies or contains information about an asynchronous operation." Since the IAR is all you get on your callback, without closures, you need a way to shove some data into the async op so you can cast it out later.
Function pointers are fast
In what context? Compared to what?
It sounds like you just want to use function pointers for the sake of using them. That would be bad.
A pointer to a function is normally used as a callback or event handler.

Should I use int or Int32

In C#, int and Int32 are the same thing, but I've read a number of times that int is preferred over Int32 with no reason given. Is there a reason, and should I care?
The two are indeed synonymous; int will be a little more familiar looking, Int32 makes the 32-bitness more explicit to those reading your code. I would be inclined to use int where I just need 'an integer', Int32 where the size is important (cryptographic code, structures) so future maintainers will know it's safe to enlarge an int if appropriate, but should take care changing Int32s in the same way.
The resulting code will be identical: the difference is purely one of readability or code appearance.
ECMA-334:2006 C# Language Specification (p18):
Each of the predefined types is shorthand for a system-provided type. For example, the keyword int refers to the struct System.Int32. As a matter of style, use of the keyword is favoured over use of the complete system type name.
They both declare 32 bit integers, and as other posters stated, which one you use is mostly a matter of syntactic style. However they don't always behave the same way. For instance, the C# compiler won't allow this:
public enum MyEnum : Int32
{
member1 = 0
}
but it will allow this:
public enum MyEnum : int
{
member1 = 0
}
Go figure.
I always use the system types - e.g., Int32 instead of int. I adopted this practice after reading Applied .NET Framework Programming - author Jeffrey Richter makes a good case for using the full type names. Here are the two points that stuck with me:
Type names can vary between .NET languages. For example, in C#, long maps to System.Int64 while in C++ with managed extensions, long maps to Int32. Since languages can be mixed-and-matched while using .NET, you can be sure that using the explicit class name will always be clearer, no matter the reader's preferred language.
Many framework methods have type names as part of their method names:
BinaryReader br = new BinaryReader( /* ... */ );
float val = br.ReadSingle(); // OK, but it looks a little odd...
Single val = br.ReadSingle(); // OK, and is easier to read
int is a C# keyword and is unambiguous.
Most of the time it doesn't matter but two things that go against Int32:
You need to have a "using System;" statement. using "int" requires no using statement.
It is possible to define your own class called Int32 (which would be silly and confusing). int always means int.
As already stated, int = Int32. To be safe, be sure to always use int.MinValue/int.MaxValue when implementing anything that cares about the data type boundaries. Suppose .NET decided that int would now be Int64, your code would be less dependent on the bounds.
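A trivial sketch of that advice (the helper name is made up); the check stays correct even if the alias were ever re-mapped:
// range check written against the alias's own bounds, not hard-coded literals
static bool FitsInInt(long candidate)
    => candidate >= int.MinValue && candidate <= int.MaxValue;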
Byte size for types is not too interesting when you only have to deal with a single language (and for code which you don't have to remind yourself about math overflows). The part that becomes interesting is when you bridge between one language and another, C# to COM object, etc., or you're doing some bit-shifting or masking and you need to remind yourself (and your code-review co-workers) of the size of the data.
In practice, I usually use Int32 just to remind myself what size they are because I do write managed C++ (to bridge to C# for example) as well as unmanaged/native C++.
long, as you probably know, is 64 bits in C#, but in native C++ it ends up as 32 bits, and char is Unicode/16 bits in C# while in C++ it is 8 bits. But how do we know this? The answer is: because we've looked it up in the manual and it said so.
With time and experience, you will start to be more type-conscious when you write code that bridges between C# and other languages (some readers here are thinking "why would you?"), but IMHO it is a better practice because I cannot remember what I coded last week (and I don't have to specify in my API document that "this parameter is a 32-bit integer").
In F# (although I've never used it), they define int, int32, and nativeint. The same question should arise: "which one do I use?". As others have mentioned, in most cases it should not matter (it should be transparent). But I for one would choose int32 and uint32 just to remove the ambiguities.
I guess it would just depend on what applications you are coding, who's using it, what coding practices you and your team follows, etc. to justify when to use Int32.
Addendum:
Incidentally, since I answered this question a few years ago, I've started using both F# and Rust. F# is all about type inference, and when bridging/interop'ing between C# and F#, the native types match, so there is no concern; I've rarely had to explicitly define types in F# (it's almost a sin if you don't use type inference). In Rust, they have completely removed such ambiguities and you have to use i32 vs u32; all in all, reducing ambiguities helps reduce bugs.
There is no difference between int and Int32, but as int is a language keyword many people prefer it stylistically (just as with string vs String).
In my experience it's been a convention thing. I'm not aware of any technical reason to use int over Int32, but it's:
Quicker to type.
More familiar to the typical C# developer.
A different color in the default visual studio syntax highlighting.
I'm especially fond of that last one. :)
I always use the aliased types (int, string, etc.) when defining a variable and use the real name when accessing a static method:
int x, y;
...
String.Format ("{0}x{1}", x, y);
It just seems ugly to see something like int.TryParse(). There's no other reason I do this other than style.
Though they are (mostly) identical (see below for the one [bug] difference), you definitely should care and you should use Int32.
The name for a 16-bit integer is Int16. For a 64 bit integer it's Int64, and for a 32-bit integer the intuitive choice is: int or Int32?
The question of the size of a variable of type Int16, Int32, or Int64 answers itself, but the question of the size of a variable of type int is a perfectly valid question, and questions, no matter how trivial, are distracting, lead to confusion, waste time, hinder discussion, etc. (the fact this question exists proves the point).
Using Int32 promotes the developer being conscious of their choice of type. How big is an int again? Oh yeah, 32. The likelihood that the size of the type will actually be considered is greater when the size is included in the name. Using Int32 also promotes knowledge of the other choices. When people aren't forced to at least recognize there are alternatives, it becomes far too easy for int to become "THE integer type".
The class within the framework intended to interact with 32-bit integers is named Int32. Once again, which is: more intuitive, less confusing, lacks an (unnecessary) translation (not a translation in the system, but in the mind of the developer), etc. int lMax = Int32.MaxValue or Int32 lMax = Int32.MaxValue?
int isn't a keyword in all .NET languages.
Although there are arguments why it's not likely to ever change, int may not always be an Int32.
The drawbacks are two extra characters to type and [bug].
This won't compile
public enum MyEnum : Int32
{
AEnum = 0
}
But this will:
public enum MyEnum : int
{
AEnum = 0
}
I know that the best practice is to use int, and all MSDN code uses int. However, there's not a reason beyond standardisation and consistency as far as I know.
You shouldn't care. You should use int most of the time. It will help the porting of your program to a wider architecture in the future (currently int is an alias to System.Int32 but that could change). Only when the bit width of the variable matters (for instance, to control the layout in memory of a struct) should you use Int32 and the other system type names (with the associated "using System;").
int is the C# language's shortcut for System.Int32
Whilst this does mean that Microsoft could change this mapping, a post on FogCreek's discussions stated [source]
"On the 64 bit issue -- Microsoft is indeed working on a 64-bit version of the .NET Framework but I'm pretty sure int will NOT map to 64 bit on that system.
Reasons:
1. The C# ECMA standard specifically says that int is 32 bit and long is 64 bit.
2. Microsoft introduced additional properties & methods in Framework version 1.1 that return long values instead of int values, such as Array.GetLongLength in addition to Array.GetLength.
So I think it's safe to say that all built-in C# types will keep their current mapping."
int is the same as System.Int32 and when compiled it will turn into the same thing in CIL.
We use int by convention in C# since C# wants to look like C and C++ (and Java) and that is what we use there...
BTW, I do end up using System.Int32 when declaring imports of various Windows API functions. I am not sure if this is a defined convention or not, but it reminds me that I am going to an external DLL...
Once upon a time, the int datatype was pegged to the register size of the machine targeted by the compiler. So, for example, a compiler for a 16-bit system would use a 16-bit integer.
However, we thankfully don't see much 16-bit any more, and when 64-bit started to get popular people were more concerned with making it compatible with older software and 32-bit had been around so long that for most compilers an int is just assumed to be 32 bits.
I'd recommend using Microsoft's StyleCop.
It is like FxCop, but for style-related issues. The default configuration matches Microsoft's internal style guides, but it can be customised for your project.
It can take a bit to get used to, but it definitely makes your code nicer.
You can include it in your build process to automatically check for violations.
It makes no difference in practice and in time you will adopt your own convention. I tend to use the keyword when assigning a type, and the class version when using static methods and such:
int total = Int32.Parse("1009");
int and Int32 is the same. int is an alias for Int32.
You should not care. If size is a concern I would use byte, short, int, then long. The only reason you would use an integer larger than Int32 is if you need a number higher than 2147483647 or lower than -2147483648.
Other than that I wouldn't care, there are plenty of other items to be concerned with.
int is an alias for System.Int32, as defined in this table:
Built-In Types Table (C# Reference)
I use int in the event that Microsoft changes the default implementation for an integer to some new fangled version (let's call it Int32b).
Microsoft can then change the int alias to Int32b, and I don't have to change any of my code to take advantage of their new (and hopefully improved) integer implementation.
The same goes for any of the type keywords.
You should not care in most programming languages, unless you need to write very specific mathematical functions, or code optimized for one specific architecture... Just make sure the size of the type is enough for you (use something bigger than an Int if you know you'll need more than 32-bits for example)
It doesn't matter. int is the language keyword and Int32 its actual system type.
See also my answer here to a related question.
Use of int or Int32 is the same; int is just sugar to simplify the code for the reader.
Use the nullable variant int? or Int32? when you work with databases on fields containing null. That will save you from a lot of runtime issues.
Some compilers have different sizes for int on different platforms (not C# specific)
Some coding standards (MISRA C) require that all types used are size-specified (i.e. Int32 and not int).
It is also good to specify prefixes for different type variables (e.g. b for 8 bit byte, w for 16 bit word, and l for 32 bit long word => Int32 lMyVariable)
You should care because it makes your code more portable and more maintainable.
Portable may not be applicable to C# if you are always going to use C# and the C# specification will never change in this regard.
Maintainable, imho, will always be applicable, because the person maintaining your code may not be aware of this particular C# specification and may miss a bug where the int occasionally becomes more than 2147483647.
In a simple for-loop that counts, for example, the months of the year, you won't care, but when you use the variable in a context where it could possibly overflow, you should care.
You should also care if you are going to do bit-wise operations on it.
Using the Int32 type requires a namespace reference to System, or fully qualifying (System.Int32). I tend toward int, because it doesn't require a namespace import, therefore reducing the chance of namespace collision in some cases. When compiled to IL, there is no difference between the two.
According to the Immediate Window in Visual Studio 2012 Int32 is int, Int64 is long. Here is the output:
sizeof(int)
4
sizeof(Int32)
4
sizeof(Int64)
8
Int32
int
base {System.ValueType}: System.ValueType
MaxValue: 2147483647
MinValue: -2147483648
Int64
long
base {System.ValueType}: System.ValueType
MaxValue: 9223372036854775807
MinValue: -9223372036854775808
int
int
base {System.ValueType}: System.ValueType
MaxValue: 2147483647
MinValue: -2147483648
Also consider Int16. If you need to store an integer in memory in your application and you are concerned about the amount of memory used, then you could go with Int16 since it uses less memory and has a smaller min/max range than Int32 (which is what int is).
It's 2021 and I've read all the answers. Most say it's basically the same (it's an alias), or it depends on "what you like", or "by convention use int...". No answer gives you a clear when, where and why to use Int32 over int. That's why I'm here.
98% of the time, you can get away with int, and that's perfectly fine. What about the other 2%?
IO with records (structs, native types, organization and compression). Someone said a useless application is one that can read and manipulate data but is not actually capable of writing new data to a defined storage. To avoid reinventing the wheel, at some point those dealing with old data have to retrieve the documentation on how to read it. And chances are it was compiled in an era where a long was always a 32-bit integer.
It happened before, where some had trouble remembering that a db is a byte, a dw is a word, a dd is a double word, but how many bits was that again? And it will likely happen again with C# 43.0 on a 256-bit platform... where the (future) developers have never heard of "by convention, use int instead of Int32". That's the 2% where Int32 matters over int. MSDN saying today that it's recommended to use int is irrelevant; it usually works with the current C# version, but that may get dropped in future MSDN pages, in 2028, or 2034? Fewer and fewer people encounter WORD and DWORD today, yet two decades ago they were common. The same thing will happen to int, in the very case of dealing with precise, fixed-length data.
In memory, a ushort (UInt16) can live in a Decimal as long as its fractional part is zero, it is not negative, and it does not exceed 65535. But inside a file, it must be a short, 16 bits long. And when you read documentation about a file structure from another era (inside the source code), you realize there are 3545 record definitions, some nested inside others, each record having between a couple and hundreds of fields of varying types.
Somewhere in 2028 a developer thinks he can just get away with Ctrl-H-ing int to Int32, whole word only and match case... ~67000 changes in the whole solution. Hit Run and still get CTDs. Clap clap clap. Go figure which ints you should have changed to Int32 and which ones you should have changed to var. It's also worth pointing out that pointers are useful when you deal with terabytes of data (have a virtual representation of an entire planet on some cloud, download on demand, and render to the user's screen). Pointers are really fast in the ~1% of cases where there is so much data to compute in real time that you must trade for unsafe code. Again, it's about coming up with an actually useful application instead of being fancy and wasting time porting to managed code. So, be careful: is IntPtr 32 bits or 64 bits already? Could you get away with your code without caring how many bytes you read/skip? Or just go (Int32*) int32Ptr = (Int32*) int64Ptr;...
An even more concrete example is a file containing data and its respective processing commands (methods in the source code), like internal branching (a conditional continue, or a jump if the test fails):
An IfTest record in the file says: if value equals someConstant, jump to address, where address is a 16-bit integer representing a relative pointer inside the file (you can go back towards the start of the file up to 32768 bytes, or up to 32767 bytes further down). But 10 years later, platforms can handle larger files and larger data, and now you have a 32-bit relative address. Your method in the source code was named IfTestMethod(...); how would you name the new one? IfTestMethodInt() or IfTestMethod32()? Would you also rename the old method IfTestMethodShort() or IfTestMethod16()? Then a decade later, you get a new command with a long (Int64) relative address... What about a 128-bit command some 10 years later? Be consistent! Principles are great, but sometimes logic is better.
The problem is not me or you writing code today that appears okay to us. It is being in the place of the one person trying to understand what we wrote 10 or 20 years later, and how much it costs in time (= money) to come up with working, updated code. Being explicit or writing redundant comments will actually save time. Which one do you prefer? Int32 val; or var val; // 32-bits.
Also, working with foreign data from other platforms or compiler directives is a concept (today it involves Interop, COM, P/Invoke...). And that's a concept we cannot get rid of, whatever the era, because it takes time to update (reformat) data (via serialization, for example). Upgrading DLLs to managed code also takes time. We took time to leave assembler behind and go full C. We are taking time to move from 32-bit data to 64-bit, yet we still need to care about 8 and 16 bits. What next in the future? Move from 128 bits to 256, or directly to 1024? Do not assume a keyword that is explicit to you will remain explicit for the people reading your documentation 20 years later (and documentation usually contains errors, mainly because of copy/paste).
So here it is: where to use Int32 over int today?
It's when you are producing code that is data-size sensitive (IO, network, cross-platform data...), and at some point in the future - could be decades later - someone will have to understand and port your code. The key reason is era-based. With 1000 lines of code it's okay to use int; with 100000 lines it's not anymore. That's a rare duty only a few will have to do, and hell yeah, they struggle, if only because some were not a little more explicit and instead relied on "by convention", or "it looks pretty in the IDE, Int32 is so ugly", or "they are the same, don't bother, it's a waste of time to write those two extra numbers while holding the shift key", or "int is unambiguous", or "those who don't like int are just VB fanboys - go learn C# you noob" (yeah, that's the underlying meaning of a few comments right here).
Do not take what I wrote as a generalized perception, nor as an attempt to promote Int32 in all cases. I clearly stated the specific case (as it seems to me this was not clear from other answers), to advocate for the few who get blamed by their supervisors for being fancy and writing Int32, while at the same time the very same supervisor does not understand what takes so long to rewrite that C DLL in C#. It's an edge case, but at least for those reading, Int32 has at least one purpose in its life.
The point can be further discussed by turning the question the other way around: why not just get rid of Int32, Int64 and all the other variants in future C# specifications? What would that imply?
