I am writing some tools to help validate IL that is emitted at runtime. A part of this validation involves maintaining a Stack<Type> as OpCodes are emitted so that future OpCodes that utilize these stack elements can be validated as using the proper types. I am confused as to how to handle the ldind.i opcode, however.
The Microsoft documentation states:
The ldind.i instruction indirectly loads a native int value from the
specified address (of type native int, &, or *) onto the stack as a
native int.
In C#, native int is not defined, and I am confused as to what type most accurately represents this data. How can I determine what its size is, and which C# type should be used to represent it? I am concerned it will vary by system hardware.
To my mind, you'd be better off looking at how the VES is defined and using a dedicated enum to model the types on the stack rather than C#-visible types. Otherwise you're in for a rude surprise when you get to the floating-point type.
From MS Partition I.pdf¹, Section 12.1:
The CLI model uses an evaluation stack [...] However, the CLI supports only a subset of these types in its operations upon values stored on its evaluation stack—int32, int64, and native int. In addition, the CLI supports an internal data type to represent floating-point values on the internal evaluation stack. The size of the internal data type is implementation-dependent.
So those, as well as things like references are the things you should track, and I'd recommend you do that with an explicit model of the VES Stack using its terms.
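For instance, something along these lines - a minimal sketch where the enum and method names are my own invention rather than anything from the spec, and which assumes the usual System and System.Collections.Generic usings. (On the C# surface, the closest match for native int is System.IntPtr, whose size in bytes you can check with IntPtr.Size, but for validation purposes the VES category is what you want to track.)

// Evaluation-stack types from Partition I, Section 12.1, plus the reference categories.
enum StackEntryKind
{
    Int32,
    Int64,
    NativeInt,        // "native int" - pointer-sized; maps to IntPtr/nint in C#
    NativeFloat,      // "F" - the implementation-dependent internal floating-point type
    ObjectReference,  // "O"
    ManagedPointer,   // "&"
    ValueType         // a user-defined value type occupying the slot
}

// ldind.i pops an address (native int, & or *) and pushes a native int.
// Unmanaged pointers (*) sit on the evaluation stack as native int, so the NativeInt check covers them.
static void TrackLdindI(Stack<StackEntryKind> stack)
{
    StackEntryKind address = stack.Pop();
    if (address != StackEntryKind.NativeInt && address != StackEntryKind.ManagedPointer)
        throw new InvalidOperationException("ldind.i expects an address on the stack, found " + address);
    stack.Push(StackEntryKind.NativeInt);
}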
¹ ECMA C# and Common Language Infrastructure Standards
Related
How do System.float, System.int and other primitive types work? I never understood how it was possible to make the primitives structs, and I wonder if I could make my own numeric type.
Assuming we're talking about a C# compiler which targets the Common Language Infrastructure (CLI), as almost everything does, it basically uses the primitive types exposed by the CLI.
There are effectively three levels of support to consider:
Genuine primitives, with their own representation and instructions in IL
Numeric types that the C# compiler has special knowledge of, but which aren't part of the CLI - basically, System.Decimal. This is also part of the Common Type System (CTS), which means that if you create a const decimal in C#, you can still consume it in VB, for example. But there's still no direct IL support.
Other numeric types, such as BigInteger - you can write your own ones of these.
The middle ground of the second bullet allows C# to have decimal literals and decimal constants, neither of which are possible for the third bullet. For example, BigInteger doesn't have language support, so you can't write:
// This isn't valid
BigInteger bigInteger = 123456789012345678901234567890;
You'd have to parse a string representation instead. Likewise you can't have a const BigInteger.
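To make that concrete, here's a rough sketch of the difference (it assumes a reference to System.Numerics; the member names are just for illustration):

// decimal gets literal and const support because the C# compiler knows about it:
const decimal UnitPrice = 123.45m;

// BigInteger gets neither, so you parse a string and settle for static readonly:
static readonly BigInteger Huge = BigInteger.Parse("123456789012345678901234567890");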
(In theory it would be possible to have a type with support in C# but not in the CTS. I don't know of any such types.)
Primitive types like int and float are supported directly by the processor, and the .NET platform and languages have similar built-in support for them. So there is no way you could create your own primitives at the language level.
I have repeatedly heard that generics in C# are less powerful than templates in C++, but I have not heard any arguments for (or against) this. Is it really so, and if so, how does the difference show itself?
I recently came across the surprising behavior that if SomeClassChild is a descendant of SomeClass, then List<SomeClassChild> cannot be converted to List<SomeClass>, whereas SomeClassChild[] can be converted to SomeClass[].
The following code will also result in an error:
List<SomeClass> lst = new List<SomeClassChild>();
lst.Add(new SomeClassChild());
The book "C# in depth", it has a topic on comparing generics between languages. As C++/C#, I just copy some content form the book:
The C++ compiler is smart enough to compile the code only once for any given set of template arguments, but it isn't able to share code in the way that the CLR does with reference types. That lack of sharing does have its benefits, though—it allows type-specific optimizations, such as inlining method calls for some type parameters but not others, from the same template. It also means that overload resolution can be performed separately for each set of type parameters, rather than just once based solely on the limited knowledge the C# compiler has due to any constraints present.
One significant feature that C++ templates have over C# generics is that the template arguments don't have to be type names. Variable names, function names, and constant expressions can be used as well. A common example of this is a buffer type that has the size of the buffer as one of the template arguments—a buffer<int, 20> will always be a buffer of 20 integers, and a buffer<double, 35> will always be a buffer of 35 doubles. This ability is crucial to template metaprogramming (see the Wikipedia article, http://en.wikipedia.org/wiki/Template_metaprogramming), which is an advanced C++ technique, the very idea of which scares me but that can be powerful in the hands of experts.
C++ templates are more flexible in other ways too. They don't suffer from the lack of operator constraints, and there are a few other restrictions that don't exist in C++: you can derive a class from one of its type parameters, and you can specialize a template for a particular set of type arguments. The latter ability allows the template author to write general code to be used when there's no more knowledge available, and specific (often highly optimized) code for particular types.
If you want to find out more, check the book.
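Coming back to the array/list asymmetry in the question, here is a short sketch of what does and does not convert (SomeClass and SomeClassChild as in the question; the IEnumerable line requires C# 4 or later):

SomeClass[] arr = new SomeClassChild[3];                 // compiles: arrays are covariant (with a run-time check on writes)
IEnumerable<SomeClass> seq = new List<SomeClassChild>(); // compiles: IEnumerable<out T> is covariant since C# 4
// List<SomeClass> lst = new List<SomeClassChild>();     // does not compile: List<T> itself is invariant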
I was reading about marshaling, and I'm confused: what do these mean in unmanaged code?
HRESULT, DWORD, and HANDLE.
The original text is:
You already know that there is no such compatibility between managed and unmanaged environments. In other words, .NET does not contain the types HRESULT, DWORD, and HANDLE that exist in the realm of unmanaged code. Therefore, you need to find a .NET substitute or create your own if needed. That is what is called marshaling.
short answer:
It is just telling you that you must "map" a data type used in one programming language onto a compatible data type used in a different programming language, and the two data types must match.
quick answer:
For this one, the details may not be correct, but the concept is.
These are a few of the data types defined in the Windows header files for C/C++. They are typedefs which "abstract" the primitive data types of C/C++ into more meaningful data types used in Windows programming. For instance, DWORD is really a 32-bit unsigned integer in C/C++, and it stays 32 bits wide even on 64-bit Windows (the pointer-sized variants such as DWORD_PTR are the ones that grow). The idea is to provide an abstraction layer between the data types needed by the API and the data types used by the language.
During marshalling, this "dword" will be converted to the CLR data type you specify in the DllImport declaration. This is an important point.
Let's say you want to call a Windows API method that takes a DWORD parameter. When declaring this call in C# using DllImport, you must specify the parameter data type as System.UInt32. If you don't, "bad things will happen".
For example, suppose you mistakenly specify the parameter data type as System.UInt64. When the actual call is made, the stack can become corrupt because more bytes are placed on the stack than the API call expects, which can lead to completely unexpected behavior, such as crashing the application, crashing Windows, invalid return values, or whatever.
That is why it is important to specify the correct data type.
data types in question:
DWORD is defined as a 32-bit unsigned integer, i.e. the CLR type System.UInt32.
HANDLE maps to the CLR types System.IntPtr, System.UIntPtr, or HandleRef.
HRESULT maps to System.Int32 or System.UInt32.
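As a concrete sketch of the mapping, consider kernel32's WaitForSingleObject, whose native signature takes a HANDLE and a DWORD and returns a DWORD. The declaration below assumes using System and System.Runtime.InteropServices, and hProcess is assumed to be a valid handle obtained elsewhere:

[DllImport("kernel32.dll", SetLastError = true)]
static extern uint WaitForSingleObject(IntPtr hHandle, uint dwMilliseconds); // HANDLE -> IntPtr, DWORD -> uint

// Wait up to five seconds on an already-obtained process handle.
uint result = WaitForSingleObject(hProcess, 5000);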
References:
Using P/Invoke to Call Unmanaged APIs from Your Managed Classes at http://msdn.microsoft.com/en-us/library/aa719104(v=vs.71).aspx has a table listing the Windows data types with their corresponding CLR data types, which specifically answers your question.
Windows Data Types (Windows) at http://msdn.microsoft.com/en-us/library/aa383751(v=VS.85).aspx
.NET Column: Calling Win32 DLLs in C# with P/Invoke at http://msdn.microsoft.com/en-us/magazine/cc164123.aspx
HRESULT: http://en.wikipedia.org/wiki/HRESULT
In the field of computer programming, the HRESULT is a data type used
in Windows operating systems, and the earlier IBM/Microsoft OS/2
Operating system, used to represent error conditions, and warning
conditions. The original purpose of HRESULTs was to formally lay out
ranges of error codes for both public and Microsoft internal use in
order to prevent collisions between error codes in different
subsystems of the OS/2 Operating System. HRESULTs are numerical error
codes. Various bits within an HRESULT encode information about the
nature of the error code, and where it came from. HRESULT error codes
are most commonly encountered in COM programming, where they form the
basis for a standardized COM error handling convention.
DWORD: http://en.wikipedia.org/wiki/DWORD#Size_families
HANDLE: http://en.wikipedia.org/wiki/Handle_(computing)
In computer programming, a handle is an abstract reference to a
resource. Handles are used when application software references blocks
of memory or objects managed by another system, such as a database or
an operating system. While a pointer literally contains the address of
the item to which it refers, a handle is an abstraction of a reference
which is managed externally; its opacity allows the referent to be
relocated in memory by the system without invalidating the handle,
which is impossible with pointers. The extra layer of indirection also
increases the control the managing system has over operations
performed on the referent. Typically the handle is an index or a
pointer into a global array of tombstones.
HRESULT, DWORD, and HANDLE are typedefs (i.e., they represent plain data types) defined by Microsoft for use by programmers compiling unmanaged code in a Windows environment. They are defined in C (or C++) header files provided by Microsoft that are, typically, automatically included in unmanaged Windows projects created within Microsoft Visual Studio.
It is a well-known fact that C++ templates are Turing-complete, CSS is Turing-complete (!), and that C# overload resolution is NP-hard (even without generics).
But is C# 4.0 (with co-/contravariance, generics, etc.) compile-time Turing-complete?
Unlike templates in C++, generics in C# (and other .NET languages) are instantiated at runtime. The compiler does do some checking to verify the type arguments used, but the actual substitution happens at runtime. The same goes for co- and contravariance, if I'm not mistaken. Lots of CLR magic.
(At the implementation level, the primary difference is that C#
generic type substitutions are performed at runtime and generic type
information is thereby preserved for instantiated objects)
See MSDN
http://msdn.microsoft.com/en-us/library/c6cyy67b(v=vs.110).aspx
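A tiny sketch of the quoted point - the generic type information really is still there at runtime:

var list = new List<int>();
Console.WriteLine(list.GetType());                          // System.Collections.Generic.List`1[System.Int32]
Console.WriteLine(list.GetType().GetGenericArguments()[0]); // System.Int32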
Update:
The CLR does perform type checking via information stored in the metadata associated with the compiled assemblies (vis-à-vis JIT compilation). It does this as one of its many services (ShuggyCoUk's answer on this question explains it in detail); others include memory management and exception handling. From that I would infer that the compiler has an understanding both of state as a progression and of the machine's internal state. (Turing completeness, in part, means being able to read and write symbols, conditionally, with reference to previous symbols, and to evaluate them. I hesitate to state the exact definition of Turing completeness, as I'm not sure I have fully grasped it myself, so feel free to fill in the blanks and correct me where applicable.) So with that I would say, with a bit of trepidation: yes, yes it can be.
C# and VB.NET come with built-in types that map to CLR types. For example, int (C#) and Integer (VB) map to System.Int32, and long (C#) and Long (VB) map to System.Int64. What are the best practices for deciding when to use the built-in types and when to use the System.* structs/classes instead?
I nearly always use the built-in aliases, such as int/short/long. They are easier to read, and do not require you to import System or to type System.Int32 everywhere, etc.
The language clearly defines them, and gives them a specific meaning, so I do not see any harm. However, this is 100% a personal choice.
That being said - the one place where I do explicitly use Int32, Int16, etc., is if I'm dealing with binary storage or transfer, especially to or from a custom binary format. In this case, having the explicit bitsize of each member going into and out of the file makes the code more readable and understandable, IMO.
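For instance, something like this (just a sketch - "data.bin" and the field layout are made up, and it assumes using System and System.IO):

using (var reader = new BinaryReader(File.OpenRead("data.bin")))
{
    Int16 version     = reader.ReadInt16(); // 2-byte field - the width is explicit in the type name
    Int32 recordCount = reader.ReadInt32(); // 4-byte field
    Console.WriteLine($"v{version}: {recordCount} records");
}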
The language types (e.g. string, int, char) are simply aliases for the CLR types (System.String, System.Int32, System.Char).
They are interchangeable, there is no need to prefer one over the other.
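A quick check makes the point (assumes using System):

Console.WriteLine(typeof(int) == typeof(System.Int32)); // True - "int" is just an alias
System.Int32 a = 42;
int b = a;                                              // no conversion involved: they are the same type
Console.WriteLine(b);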
EDIT
The poster asked for some help in choosing between the two, very well.
Personally I tend to choose the C# language types (int, string, char etc), because they involve less typing - I suppose I'm just lazy :)
The only time I would ever explicitly use "System.XYZ" in preference to a built-in type keyword is when I need an integer type of very specific size, and I want that to be clear to anyone reading my code (e.g. I might use Int32 instead of int if the integer in question is actually 4 8-bit fields packed together.)
I always use the System.* types because they look more consistent with other classes - uppercase first letter and the same syntax highlighting. But that's just a personal preference and purely an aesthetic issue.
Using "int" and "Int32" (and the others) are exactly same. Typicaly are used the keywords (int, Integer (vb.net), bool, etc...), because it is shorter and is highlited in IDE.
Rather than when to use or not use the language types versus explicit BCL class names, it is more important to know whether or not the type you intend to use is CLS Compliant.
Specifically, the unsigned integer types are not CLS compliant because there is no requirement that a language support unsigned integer math.
Other than this wrinkle... I would recommend whichever idiom is more in keeping with your organization's code practices. If you fully namespace your type references, then I would continue that pattern with the System.* namespace... (I would also recommend against that practice, though, as it adds reader load without an attendant gain in clarity.)
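To illustrate the CLS point, a small sketch (the type and member names are made up): with CLSCompliant(true) applied to the assembly, publicly exposing an unsigned type draws a compiler warning.

[assembly: System.CLSCompliant(true)]

public class Api
{
    public uint UnsignedCount; // warning CS3003: not CLS-compliant - unsigned types have no guaranteed cross-language support
    public int SignedCount;    // fine
}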