I am a tinkerer—no doubt about that. For this reason (and very little beyond that), I recently did a little experiment to confirm my suspicion that writing to a struct is not an atomic operation, which means that a so-called "immutable" value type which attempts to enforce certain constraints could hypothetically fail at its goal.
I wrote a blog post about this using the following type as an illustration:
struct SolidStruct
{
    public SolidStruct(int value)
    {
        X = Y = Z = value;
    }

    public readonly int X;
    public readonly int Y;
    public readonly int Z;
}
While the above looks like a type for which it could never be true that X != Y or Y != Z, in fact this can happen if a value is "mid-assignment" at the same time it is copied to another location by a separate thread.
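For concreteness, here is a minimal sketch of the kind of harness that can exhibit this (the names and structure are illustrative, not my exact test code):

using System;
using System.Threading;

// One thread keeps overwriting a shared SolidStruct while another copies it
// and checks the X == Y == Z invariant. Note: an aggressive JIT could in
// principle hoist the read out of the loop, so treat this as an illustration
// rather than a rigorous test.
class TornWriteDemo
{
    static SolidStruct _shared = new SolidStruct(0);

    static void Main()
    {
        var writer = new Thread(() =>
        {
            for (int i = 1; ; i++)
                _shared = new SolidStruct(i);   // 12-byte struct write: may be torn
        });
        writer.IsBackground = true;
        writer.Start();

        while (true)
        {
            SolidStruct copy = _shared;          // this copy can interleave with a write
            if (copy.X != copy.Y || copy.Y != copy.Z)
            {
                Console.WriteLine("Torn copy: X=" + copy.X + " Y=" + copy.Y + " Z=" + copy.Z);
                break;
            }
        }
    }
}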
OK, big deal. A curiosity and little more. But then I had this hunch: my 64-bit CPU should actually be able to copy 64 bits atomically, right? So what if I got rid of Z and just stuck with X and Y? That's only 64 bits; it should be possible to overwrite those in one step.
Sure enough, it worked. (I realize some of you are probably furrowing your brows right now, thinking, Yeah, duh. How is this even interesting? Humor me.) Granted, I have no idea whether this is guaranteed or not given my system. I know next to nothing about registers, cache misses, etc. (I am literally just regurgitating terms I've heard without understanding their meaning); so this is all a black box to me at the moment.
The next thing I tried—again, just on a hunch—was a struct consisting of 32 bits using 2 short fields. This seemed to exhibit "atomic assignability" as well. But then I tried a 24-bit struct, using 3 byte fields: no go.
Suddenly the struct appeared to be susceptible to "mid-assignment" copies once again.
Down to 16 bits with 2 byte fields: atomic again!
Could someone explain to me why this is? I've heard of "bit packing", "cache line straddling", "alignment", etc.—but again, I don't really know what all that means, nor whether it's even relevant here. But I feel like I see a pattern, without being able to say exactly what it is; clarity would be greatly appreciated.
The pattern you're looking for is the native word size of the CPU.
Historically, the x86 family worked natively with 16-bit values (and before that, 8-bit values). For that reason, your CPU can handle these atomically: it's a single instruction to set these values.
As time progressed, the native element size increased to 32 bits, and later to 64 bits. In every case, an instruction was added to handle this specific amount of bits. However, for backwards compatibility, the old instructions were still kept around, so your 64-bit processor can work with all of the previous native sizes.
Since your struct elements are stored in contiguous memory (without padding, i.e. empty space), the runtime can exploit this knowledge to only execute that single instruction for elements of these sizes. Put simply, that creates the effect you're seeing, because the CPU can only execute one instruction at a time (although I'm not sure if true atomicity can be guaranteed on multi-core systems).
However, the native element size was never 24 bits. Consequently, there is no single instruction to write 24 bits, so multiple instructions are required for that, and you lose the atomicity.
The C# standard (ISO 23270:2006, ECMA-334) has this to say regarding atomicity:
12.5 Atomicity of variable references

Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list shall also be atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, need not be atomic. Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.

(emphasis mine)

Your example X = Y = Z = value is shorthand for three separate assignment operations, each of which is defined to be atomic by 12.5. The sequence of three operations (assign value to Z, then to Y, then to X) is not guaranteed to be atomic.
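To make that concrete, the chained assignment inside the constructor is roughly equivalent to three separate stores (an illustrative expansion, not the exact IL):

// Inside the SolidStruct constructor, X = Y = Z = value behaves like three
// separate stores. Each int store is atomic on its own, but nothing makes
// the group of three atomic.
Z = value;
Y = value;
X = value;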
Since the language specification doesn't mandate atomicity, X = Y = Z = value; might happen to execute atomically, but whether it does depends on a whole bunch of factors:
the whims of the compiler writers
what code generation optimizations options, if any, were selected at build time
the details of the JIT compiler responsible for turning the assembly's IL into machine language. Identical IL run under Mono, say, might exhibit different behaviour than when run under .Net 4.0 (and that might even differ from earlier versions of .Net).
the particular CPU on which the assembly is running.
One might also note that even a single machine instruction is not necessarily guaranteed to be an atomic operation; many are interruptible.
Further, visiting the CLI standard (ISO 23271:2006), we find section 12.6.6:
12.6.6 Atomic reads and writes

A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic (see §12.6.2) when all the write accesses to a location are the same size. Atomic writes shall alter no bits other than those written. Unless explicit layout control (see Partition II (Controlling Instance Layout)) is used to alter the default behavior, data elements no larger than the natural word size (the size of a native int) shall be properly aligned. Object references shall be treated as though they are stored in the native word size.

[Note: There is no guarantee about atomic update (read-modify-write) of memory, except for methods provided for that purpose as part of the class library (see Partition IV). An atomic write of a “small data item” (an item no larger than the native word size) is required to do an atomic read/modify/write on hardware that does not support direct writes to small data items. end note]

[Note: There is no guaranteed atomic access to 8-byte data when the size of a native int is 32 bits even though some implementations might perform atomic operations when the data is aligned on an 8-byte boundary. end note]

(emphasis mine)
x86 CPU operations take place in 8, 16, 32, or 64 bits; manipulating other sizes requires multiple operations.
The compiler and x86 CPU are going to be careful to move only exactly as many bytes as the structure defines. There are no x86 instructions that can move 24 bits in one operation, but there are single instruction moves for 8, 16, 32, and 64 bit data.
If you add another byte field to your 24-bit struct (making it a 32-bit struct), you should see your atomicity return.
Some compilers allow you to define padding on structs to make them behave like native register-sized data. If you pad your 24-bit struct, the compiler will add another byte to "round up" the size to 32 bits so that the whole structure can be moved in one atomic instruction. The downside is that your structure will always occupy about 33% more space in memory.
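In C# you could approximate that padding either by adding a dummy byte field or by forcing the size with StructLayout; whether the runtime then copies the struct with a single 32-bit move is an implementation detail, not something this sketch can guarantee:

using System.Runtime.InteropServices;

// Sketch: pad a 3-byte struct out to 4 bytes.
[StructLayout(LayoutKind.Sequential, Size = 4)]
struct PaddedBytes
{
    public readonly byte X;
    public readonly byte Y;
    public readonly byte Z;
    // the fourth byte is padding supplied by Size = 4

    public PaddedBytes(byte value)
    {
        X = Y = Z = value;
    }
}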
Note that alignment of the structure in memory is also critical to atomicity. If a multibyte structure does not begin at an aligned address, it may span multiple cache lines in the CPU cache. Reading or writing this data will require multiple clock cycles and multiple read/writes even though the opcode is a single move instruction. So, even single instruction moves may not be atomic if the data is misaligned. x86 does guarantee atomicity for native sized read/writes on aligned boundaries, even in multicore systems.
It is possible to achieve memory atomicity with multi-step moves using the x86 LOCK prefix. However, this should be avoided as it can be very expensive in multicore systems. (LOCK not only blocks other cores from accessing memory, it also locks the system bus for the duration of the operation, which can impact disk I/O and video operations. LOCK may also force the other cores to purge their local caches.)
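From C# you normally don't emit LOCK yourself; the System.Threading.Interlocked class is the usual way to get those guaranteed atomic operations. A minimal sketch (the class and member names are illustrative):

using System.Threading;

class Counter
{
    private long _total;   // 64-bit value shared between threads

    public void Add(long amount)
    {
        // Atomic read-modify-write; on x86 the JIT typically emits a
        // LOCK-prefixed instruction for this.
        Interlocked.Add(ref _total, amount);
    }

    public long Read()
    {
        // Atomic 64-bit read even in a 32-bit process, where a plain read
        // of a long could otherwise be torn.
        return Interlocked.Read(ref _total);
    }
}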
Related
I almost went crazy when trying to debug a random 40x performance drop when running in x86 on an algorithm which makes heavy use of Interlocked.CompareExchange with an Int64.
I finally isolated the issue, and it occurred only when the said Int64 was not 8-byte aligned.
No matter how I explicitly position the field in a StructLayout, it's dependent on the base address of the outer object on the heap. On x86 the base address will be either 4-byte aligned or 8-byte aligned.
I thought of defining a 12-byte struct and setting the Int64 at offset 0 or offset 4 depending on the alignment, but that's kinda hacky.
Is there a good practice in C# for performing Interlocked operations on an Int64 in x86 that guarantees the proper alignment?
EDIT
The code can be found here:
https://github.com/akkadotnet/akka.net/pull/1569#discussion-diff-47997213R520
It's a thread-pool implementation based on the CLR ThreadPool. The issue is about storing the state of a custom semaphore in an 8-byte struct and modifying it with InterlockedCompareExchange64.
I have this code in C:
long data1 = 1091230456;
*(double*)&((data1)) = 219999.02343875566
When I use the same code in C#, the result is:
*(double*)&((data1)) = 5.39139480005278E-315
But if I define another variable in C#:
unsafe
{long *data2 = &(data1);}
now:
*(double*)&((data2)) = 219999.02343875566
Why the difference?
Casting pointers is always tricky, especially when you don't have guarantees about the layout and size of the underlying types.
In C#, long is always a 64-bit integer and double is always a 64-bit floating-point number.
In C, long can easily end up being smaller than the 64 bits needed. If you're using a compiler that treats long as a 32-bit number, the rest of the value will be junk read from the next piece of memory - basically a buffer over-read.
On Windows, you usually want to use long long for 64-bit integers. Or better, use something like int64_t, where you're guaranteed to have exactly 64-bits of data. Or the best, don't cast pointers.
C integer types can be confusing if you have a Java / C# background. They give you guarantees about the minimal range they must allow, but that's it. For example, int must be able to hold values in the [−32767, +32767] range (note that it's not −32768 - C had to support one's complement machines, which had two zeroes), close to C#'s short. long must be able to hold values in the [−2147483647, +2147483647] range, close to C#'s int. Finally, long long is close to C#'s long, having at least the [−(2^63−1), +(2^63−1)] range. float and double are specified even more loosely.
Whenever you cast pointers, you throw away even the tiny bits of abstraction C provides you with - you work with the underlying hardware layouts, whatever those are. This is one road to hell and something to avoid.
Sure, these days you probably will not find one's complement numbers, or other floating points than IEEE 754, but it's still inherently unsafe and unpredictable.
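As an aside, if the goal is just to reinterpret the bits of a 64-bit integer as a double in C#, BitConverter does it without any pointer casts (a small sketch; the printed value is the same subnormal double the question's C# pointer code produced):

using System;

class BitsDemo
{
    static void Main()
    {
        long data1 = 1091230456;

        // Reinterpret the 64-bit pattern as a double without unsafe code.
        double reinterpreted = BitConverter.Int64BitsToDouble(data1);

        Console.WriteLine(reinterpreted);   // 5.39139480005278E-315
    }
}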
EDIT:
Okay, reproducing your example fully in a way that actually compiles:
unsafe
{
    long data1 = 1091230456;
    long* data2 = &data1;
    var result = *(double*)&((data2));
}
result ends up being 219999.002675845 for me, close enough to make it obvious. Let's see what you're actually doing here, in more detail:
Store 1091230456 in a local data1
Take the address of data1, and store it in data2
Take the address of data2, cast it to a double pointer
Take the double value of the resulting pointer
It should be obvious that whatever value ends up in result has little relation to the value you stored in data1 in the first place!
Printing out the various parts of what you're doing will make this clearer:
unsafe
{
    long data1 = 1091230456;
    long* pData1 = &data1;
    var pData2 = &pData1;
    var pData2Double = (double*)pData2;
    var result = *pData2Double;

    new
    {
        data1 = data1,
        pData1 = (long)pData1,
        pData2 = (long)pData2,
        pData2Double = (long)pData2Double,
        result = result
    }.Dump();
}
This prints out:
data1: 1091230456
pData1: 91941328
pData2: 91941324
pData2Double: 91941324
result: 219999.002675845
This will vary according to many environmental settings, but the critical part is that pData2 is pointing to memory four bytes in front of the actual data! This is because of the way the locals are allocated on stack - pData2 is pointing to pData1, not to data1. Since we're using 32-bit pointers here, we're reading the last four bytes of the original long, combined with the stack pointer to data1. You're reading at the wrong address, skipping over one indirection. To get back to the correct result, you can do something like this:
var pData2Double = (double**)pData2;
var result = *(*pData2Double);
This results in 5.39139480005278E-315 - the original value produced by your C# code. This is the more "correct" value, as far as there can even be a correct value.
The obvious answer here is that your C code is wrong as well - either due to different operand semantics, or due to some bug in the code you're not showing (or again, using a 32-bit integer instead of 64-bit), you end up with a pointer to a pointer to the value you want, and you mistakenly build the resulting double on a scrambled value that includes part of the original long, as well as the stack pointer - in other words, exactly one of the reasons you should be extra cautious whenever using unsafe code. Interestingly, this also implies that when compiled as a 64-bit executable, the result will be entirely decoupled from the value of data1 - you'll have a double built on the stack pointer exclusively.
Don't mess with pointers until you understand indirection very, very well. They have a tendency to "mostly work" when used entirely wrong. Then you change a tiny part of the code (for example, in this code you could add a third local, which could change where pData1 is allocated) or move to a different architecture (32-bit vs. 64-bit is quite enough in this example), or a different compiler, or a different OS... and it breaks completely. You don't guess around your way with pointers. Either you know exactly what every single expression in the code means, or you shouldn't deal with pointers at all.
After further investigation, it all boils down to this:
(decimal)((object)my_4_decimal_place_double_value_20.9032)
after casting twice, it becomes 20.903199999999998
I have a double value which is rounded to just 4 decimal places via Math.Round(...); the value is 20.9032.
In my dev environment, it is displayed as is.
But in the release environment, it is displayed as 20.903199999999998.
There were no operations after Math.Round(...), but the value has been copied around and assigned.
How can this happen?
Updates:
Data is not loaded from a DB.
The returned value from Math.Round() is assigned to the original double variable.
Release and dev are the same architecture, if this information helps.
According to the CLR ECMA specification:
Storage locations for floating-point numbers (statics, array elements,
and fields of classes) are of fixed size. The supported storage sizes
are float32 and float64. Everywhere else (on the evaluation stack, as
arguments, as return types, and as local variables) floating-point
numbers are represented using an internal floating-point type. In each
such instance, the nominal type of the variable or expression is
either R4 or R8, but its value can be represented internally with
additional range and/or precision. The size of the internal
floating-point representation is implementation-dependent, can vary,
and shall have precision at least as great as that of the variable or
expression being represented. An implicit widening conversion to the
internal representation from float32 or float64 is performed when
those types are loaded from storage. The internal representation is
typically the native size for the hardware, or as required for
efficient implementation of an operation.
To translate: the IL generated will be the same (except that debug mode inserts nops in places to ensure a breakpoint is possible, and it may deliberately keep a temporary variable that release mode deems unnecessary), but the JITter is less aggressive when dealing with an assembly marked as debug. Release builds tend to keep more floating-point values in 80-bit registers; debug builds tend to read directly from 64-bit memory storage.
If you want a "precise" float number printing, use string.Substring(...) instead of Math.Round
An IEEE 754 double-precision floating-point number cannot represent 20.9032 exactly.
The most accurate representation is 2.09031999999999982264853315428E1, and that is what you see in your output.
Do not format numbers by rounding them; instead, use a format string with the double.ToString(string formatString) method.
See the MSDN documentation for Double.ToString(String).
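A small sketch of the difference between rounding the stored value and formatting at output time (the exact digits shown for the round-trip case depend on the runtime's default double formatting):

using System;

class RoundVsFormat
{
    static void Main()
    {
        double value = 20.903199999999998;

        // Math.Round changes the stored value, but 20.9032 has no exact
        // binary representation, so the rounded double is still only "close".
        double rounded = Math.Round(value, 4);

        Console.WriteLine(rounded.ToString("R"));    // round-trip format: shows the stored value in full
        Console.WriteLine(rounded.ToString("F4"));   // "20.9032" - formatting applied at output time
    }
}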
The difference between the Release and Debug builds may be some optimization that gets done for the release build, but that is getting too deep into the details, in my opinion.
In my opinion, the core issue is that you are trying to format text output with a mathematical operation. I'm sorry, but I don't know in detail what creates the different behavior.
I have found a few threads in regards to this issue. Most people appear to favor using int in their C# code across the board even if a byte or smallint would handle the data, unless it is a mobile app. I don't understand why. Doesn't it make more sense to define your C# datatype as the same datatype that would be in your data storage solution?
My Premise:
If I am using a typed dataset, Linq2SQL classes, POCO, one way or another I will run into compiler datatype conversion issues if I don't keep my datatypes in sync across my tiers. I don't really like doing System.Convert all the time just because it was easier to use int across the board in C# code. I have always used the smallest datatype needed to handle the data, in the database as well as in code, to keep my interface to the database clean. So I would bet 75% of my C# code is using byte or short as opposed to int, because that is what is in the database.
Possibilities:
Does this mean that most people who just use int for everything in code also use the int datatype for their SQL storage datatypes and couldn't care less about the overall size of their database, or do they do System.Convert in code wherever applicable?
Why I care: I have worked on my own forever and I just want to be familiar with best practices and standard coding conventions.
Performance-wise, an int is faster in almost all cases. The CPU is designed to work efficiently with 32-bit values.
Shorter values are complicated to deal with. To read a single byte, say, the CPU has to read the 32-bit block that contains it, and then mask out the upper 24 bits.
To write a byte, it has to read the destination 32-bit block, overwrite the lower 8 bits with the desired byte value, and write the entire 32-bit block back again.
Space-wise, of course, you save a few bytes by using smaller datatypes. So if you're building a table with a few million rows, then shorter datatypes may be worth considering. (And the same might be a good reason to use smaller datatypes in your database.)
And correctness-wise, an int doesn't overflow easily. What if you think your value is going to fit within a byte, and then at some point in the future some harmless-looking change to the code means larger values get stored into it?
Those are some of the reasons why int should be your default datatype for all integral data. Only use byte if you actually want to store machine bytes. Only use shorts if you're dealing with a file format or protocol or similar that actually specifies 16-bit integer values. If you're just dealing with integers in general, make them ints.
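For example, a hypothetical file format whose header stores a 16-bit count is the kind of place where short earns its keep (a sketch; the format and method name are made up):

using System.IO;
using System.Text;

static class FormatReader
{
    // The (made-up) format specifies a 16-bit little-endian record count,
    // so short is the natural type here; for ordinary in-memory counters,
    // int would still be the default choice.
    public static short ReadRecordCount(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            return reader.ReadInt16();   // reads exactly 2 bytes
        }
    }
}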
I am only 6 years late but maybe I can help someone else.
Here are some guidelines I would use:
If there is a possibility the data will not fit in the future then use the larger int type.
If the variable is used as a struct/class field, then by default it will usually be padded out to a full 32 bits anyway, so using byte/Int16 will not save memory.
If the variable is short-lived (like inside a function) then the smaller data types will not help much.
"byte" or "char" can sometimes describe the data better and can provide compile-time checking to make sure larger values are not assigned to it by accident. E.g., if you store the day of the month (1-31) in a byte and try to assign 1000 to it, it will cause an error (see the snippet after this list).
If the variable is used in an array of roughly 100 or more I would use the smaller data type as long as it makes sense.
byte and int16 arrays are not as thread safe as an int (a primitive).
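Here is the compile-time check mentioned in the list above (the out-of-range byte assignment), as a quick sketch:

byte dayOfMonth = 28;        // fine: 28 fits in byte's 0..255 range
// byte bad = 1000;          // compile-time error CS0031:
//                           // constant value '1000' cannot be converted to a 'byte'

int fromElsewhere = 1000;
// byte alsoBad = fromElsewhere;                   // compile-time error: no implicit int -> byte
byte converted = checked((byte)(dayOfMonth + 1));  // explicit cast; checked() throws
                                                   // OverflowException for out-of-range values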
One topic that no one brought up is the limited CPU cache. Smaller programs execute faster than larger ones because the CPU can fit more of the program in the faster L1/L2/L3 caches.
Using the int type can result in fewer CPU instructions however it will also force a higher percentage of the data memory to not fit in the CPU cache. Instructions are cheap to execute. Modern CPU cores can execute 3-7 instructions per clock cycle however a single cache miss on the other hand can cost 1000-2000 clock cycles because it has to go all the way to RAM.
When memory is conserved it also results in the rest of the application performing better because it is not squeezed out of the cache.
I did a quick sum test with accessing random data in random order using both a byte array and an int array.
// Assumes: using System; using System.Linq;
var r = new Random();

const int SIZE = 10000000, LOOPS = 80000;

// SIZE random byte values, and a random order in which to visit them
byte[] array = Enumerable.Repeat(0, SIZE).Select(i => (byte)r.Next(10)).ToArray();
int[] visitOrder = Enumerable.Repeat(0, LOOPS).Select(i => r.Next(SIZE)).ToArray();

System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int sum = 0;
foreach (int v in visitOrder)
    sum += array[v];
sw.Stop();

// Print the sum so the loop isn't optimized away, plus the elapsed ticks
Console.WriteLine("sum=" + sum + ", ticks=" + sw.ElapsedTicks);
Here are the results in time (ticks): (x86, release mode, without debugger, .NET 4.5, i7-3930K; smaller is better)

               --------------------- Array Size ---------------------
          10     100      1K     10K    100K      1M     10M
byte:    549     559     552     552     568     632    3041
int :    549     566     552     562     590    1803    4206
Accessing 1M items randomly using byte was roughly 2.85x as fast as int on my CPU (632 vs. 1803 ticks)!
Anything under 10,000 was hardly noticeable.
int was never faster than byte for this basic sum test.
These values will vary with different CPUs with different cache sizes.
One final note: sometimes I look at the now open-source .NET Framework to see what Microsoft's experts do. The .NET Framework uses byte/Int16 surprisingly little; in fact, I could not find any uses.
You would have to be dealing with a few BILLION rows before this makes any significant difference in terms of storage capacity. Let's say you have three columns, and instead of using a byte-equivalent database type, you use an int-equivalent.
That gives us 3 (columns) x 3 (bytes extra) per row, or 9 bytes per row.
This means, for "a few million rows" (let's say three million), you are consuming a whole extra 27 megabytes of disk space! Fortunately, as we're no longer living in the 1970s, you shouldn't have to worry about this :)
As said above, stop micro-optimising - the performance hit in converting to/from different integer-like numeric types is going to hit you much, much harder than the bandwidth/diskspace costs, unless you are dealing with very, very, very large datasets.
For the most part, 'No'.
Unless you know upfront that you are going to be dealing with hundreds of millions of rows, it's a micro-optimisation.
Do what fits the domain model best. Later, if you have performance problems, benchmark and profile to pinpoint where they are occurring.
Not that I didn't believe Jon Grant and others, but I had to see for myself with our "million row table". The table has 1,018,000 rows. I converted 11 tinyint columns and 6 smallint columns into int; there were already 5 int and 3 smalldatetime columns. 4 different indexes used a combo of the various data types, but obviously the new indexes are now all using int columns.
Making the changes only cost me 40 MB, calculating base table disk usage with no indexes. When I added the indexes back in, the overall change was only a 30 MB difference. So I was surprised, because I thought the index size would be larger.
So is 30 MB worth the hassle of using all the different data types? No way! I am off to INT land; thanks everyone for setting this anal-retentive programmer back on the straight and happy blissful life of no more integer conversions... yippee!
The .NET runtime is optimised for Int32. See previous discussion at .NET Integer vs Int16?
If int is used everywhere, no casting or conversions are required. That is a bigger bang for the buck than the memory you will save by using multiple integer sizes.
It just makes life simpler.
I often have to take a retrieved value (usually a string) and then convert it to an int. But in C# (.NET) you have to choose either Int16, Int32 or Int64 - how do you know which one to choose when you don't know how big your retrieved number will be?
Everyone here who has mentioned that declaring an Int16 saves ram should get a downvote.
The answer to your question is to use the keyword "int" (or if you feel like it, use "Int32").
That gives you a range of roughly -2.1 billion to +2.1 billion... Also, 32-bit processors will handle those ints better... also (and THE MOST IMPORTANT REASON) is that if you plan on using that int for almost any reason... it will likely need to be an "int" (Int32).
In the .Net framework, 99.999% of numeric fields (that are whole numbers) are "ints" (Int32).
Example: Array.Length, Process.ID, Windows.Width, Button.Height, etc, etc, etc 1 million times.
EDIT: I realize that my grumpiness is going to get me down-voted... but this is the right answer.
Just wanted to add that... I remembered that in the days of .NET 1.1 the compiler was optimized so that 'int' operations are actually faster than byte or short operations.
I believe it still holds today, but I'm running some tests now.
EDIT: I have got a surprise discovery: the add, subtract and multiply operations for short(s) actually return int!
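That is easy to see from the compiler errors (a quick sketch):

short a = 1, b = 2;
// short sum = a + b;        // compile-time error CS0266: 'a + b' is an int,
//                           // and int does not implicitly convert back to short
short sum = (short)(a + b);  // the arithmetic is done as int, then cast back down
int sum2 = a + b;            // no cast needed when the target is int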
Repeatedly trying TryParse() doesn't make sense; you have a field already declared. You can't change your mind unless you make that field of type Object. Not a good idea.
Whatever data the field represents has a physical meaning. It's an age, a size, a count, etc. Physical quantities have realistic constraints on their range. Pick the int type that can store that range. Don't try to fix an overflow; it would be a bug.
Contrary to the current most popular answer, shorter integers (like Int16 and SByte) often do take up less space in memory than larger integers (like Int32 and Int64). You can easily verify this by instantiating large arrays of sbyte/short/int/long and using perfmon to measure managed heap sizes. It is true that many CLR flavors will widen these integers for CPU-specific optimizations when doing arithmetic on them and such, but when stored as part of an object, they take up only as much memory as is necessary.
So, you definitely should take size into consideration, especially if you'll be working with large lists of integers (or with large lists of objects containing integer fields). You should also consider things like CLS-compliance (which disallows any unsigned integers in public members).
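A rough way to see the array sizes for yourself (the exact number includes a small array header and will vary slightly):

using System;

class ArraySizeDemo
{
    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);
        short[] shorts = new short[1000000];
        long after = GC.GetTotalMemory(forceFullCollection: true);

        // Roughly 2,000,000 bytes plus a small header - about half of what
        // an int[1000000] would occupy.
        Console.WriteLine(after - before);

        GC.KeepAlive(shorts);   // keep the array alive past the second measurement
    }
}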
For simple cases like converting a string to an integer, I agree an Int32 (C# int) usually makes the most sense and is likely what other programmers will expect.
If we're just talking about a couple of numbers, choosing the largest won't make a noticeable difference in your overall RAM usage and will just work. If you are talking about lots of numbers, you'll need to use TryParse() on them and figure out the smallest int type, to save RAM.
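If you really do want to probe for the smallest type that fits, the cascade could look something like this (a sketch only; the method name is made up, and as other answers note, parsing straight to int or long is usually the better choice):

// Try the smaller types first and fall back to larger ones. The boxed
// object return is exactly the kind of "change your mind later" design
// another answer here warns about, so treat this as illustration only.
static object ParseSmallest(string text)
{
    if (byte.TryParse(text, out byte b)) return b;
    if (short.TryParse(text, out short s)) return s;
    if (int.TryParse(text, out int i)) return i;
    if (long.TryParse(text, out long l)) return l;
    throw new System.FormatException("Value is not an integer or does not fit in any built-in integer type.");
}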
All computers are finite. You need to define an upper limit based on what you think your users requirements will be.
If you really have no upper limit and want to allow 'unlimited' values, try adding the .Net Java runtime libraries to your project, which will allow you to use the java.math.BigInteger class - which does math on nearly-unlimited size integer.
Note: The .Net Java libraries come with full DevStudio, but I don't think they come with Express.