if (myValue > ConstantValue + 1)
{
// do some stuff
}
Is ConstantValue + 1 determined at compile time?
Yes, it will be replaced during compilation:
C# Code:
if (value <= ConstValue)
Console.WriteLine("Test1");
if (value <= ConstValue + 1)
Console.WriteLine("Test2");
IL:
IL_000c: ldloc.0
IL_000d: ldc.i4.s 10
IL_000f: cgt
IL_0011: stloc.1
IL_0012: ldloc.1
IL_0013: brtrue.s IL_0020
IL_0015: ldstr "Test1"
IL_001a: call void [mscorlib]System.Console::WriteLine(string)
IL_001f: nop
IL_0020: ldloc.0
IL_0021: ldc.i4.s 11
IL_0023: cgt
IL_0025: stloc.1
IL_0026: ldloc.1
IL_0027: brtrue.s IL_0034
IL_0029: ldstr "Test2"
IL_002e: call void [mscorlib]System.Console::WriteLine(string)
IL_0033: nop
ConstValue is declared as follows:
public const int ConstValue = 10;
Yes, ConstantValue + 1 is determined at compile time.
Example:
static void Main(string[] args)
{
const int count = 1;
int myValue = 3;
if (myValue > count + 1)
{
Console.WriteLine(count);
}
}
We can see this with Reflector:
private static void Main(string[] args)
{
int myValue = 3;
if (myValue > 2)
{
Console.WriteLine(1);
}
}
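One way to confirm that the compiler treats ConstantValue + 1 as a constant expression, without reading IL, is to use it somewhere the language only accepts compile-time constants, such as another const declaration or a case label. A minimal sketch, assuming a hypothetical ConstFoldDemo class that is not part of the original code:

```csharp
using System;

public static class ConstFoldDemo
{
    public const int ConstantValue = 10;

    // This declaration only compiles because ConstantValue + 1
    // is a constant expression evaluated by the compiler.
    public const int Limit = ConstantValue + 1;

    public static string Classify(int value)
    {
        switch (value)
        {
            // case labels must be compile-time constants as well.
            case ConstantValue + 1:
                return "limit";
            default:
                return value > ConstantValue + 1 ? "above" : "below";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Limit);          // prints 11
        Console.WriteLine(Classify(11));   // prints limit
        Console.WriteLine(Classify(12));   // prints above
    }
}
```

If ConstantValue were changed to a static readonly field, both the const declaration and the case label would stop compiling, which is a quick way to tell the two apart.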
I'm writing a high-performance component with a lot of int-double-int conversions, so I need to know the execution time between them.
static double ToDouble(int val) => (double)val;
static int ToInt(double val) => (int)val;
static void Main(string[] args) {
const int TIMES = 1000_0000;
Console.ReadLine();
var t_0 = Stopwatch.GetTimestamp();
for (int i = 0; i < TIMES; i++) {
var val = ToInt(ToDouble(i));
}
var t_1 = Stopwatch.GetTimestamp();
Console.WriteLine((t_1 - t_0) * 100_0000L / Stopwatch.Frequency); // 4002 Microseconds
var t_2 = Stopwatch.GetTimestamp();
for (int i = 0; i < TIMES; i++) {
}
var t_3 = Stopwatch.GetTimestamp();
Console.WriteLine((t_3 - t_2) * 100_0000L / Stopwatch.Frequency); // 3997 Microseconds
Console.ReadLine();
}
I found that the int-double-int conversion is so fast that its execution time is comparable to that of an empty loop.
I suspect the code in the first loop is not executed at all and that the compiler has optimized it into an empty loop. Is that true?
You should use Stopwatch.Start and Stop, then read Elapsed!
const int TIMES = 100_000_000;
var chrono = new Stopwatch();
int val = 0;
chrono.Start();
for ( int i = 1; i <= TIMES; i++ )
val = ToInt(ToDouble(i));
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());
chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
{
var v1 = (double)i;
val = (int)v1;
}
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());
chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
val = i;
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());
chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
;
chrono.Stop();
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());
Output in Debug mode (milliseconds):
729
270
194
218
Output with an optimized Release build (milliseconds):
84
61
57
31
The first loop in Debug IL:
// value = ToInt(ToDouble(i));
IL_0015: ldloc.2
IL_0016: call float64 ConsoleApp.Program::ToDouble(int32)
IL_001b: call int32 ConsoleApp.Program::ToInt(float64)
IL_0020: stloc.1
The first loop in Release IL:
// value = ToInt(ToDouble(i));
IL_0012: ldloc.2
IL_0013: call float64 ConsoleApp.Program::ToDouble(int32)
IL_0018: call int32 ConsoleApp.Program::ToInt(float64)
IL_001d: stloc.1
The second loop Debug:
// double num = j;
IL_0065: ldloc.s 5
IL_0067: conv.r8
IL_0068: stloc.s 6
// value = (int)num;
IL_006a: ldloc.s 6
IL_006c: conv.i4
IL_006d: stloc.1
The second loop Release:
// value = (int)(double)j;
IL_0054: ldloc.s 4
IL_0056: conv.r8
IL_0057: conv.i4
IL_0058: stloc.1
Method calls eat a lot of CPU ticks, so along with loops and calculations they are the first thing to look at when optimizing.
The compiler optimization mainly affects the loop itself:
Debug:
// for (int i = 1; i <= 100000000; i++)
IL_0010: ldc.i4.1
IL_0011: stloc.2
// (no C# code)
IL_0012: br.s IL_0026
// loop start (head: IL_0026)
//...
// for (int i = 1; i <= 100000000; i++)
IL_0022: ldloc.2
IL_0023: ldc.i4.1
IL_0024: add
IL_0025: stloc.2
// for (int i = 1; i <= 100000000; i++)
IL_0026: ldloc.2
IL_0027: ldc.i4 100000000
IL_002c: cgt
// (no C# code)
IL_002e: ldc.i4.0
IL_002f: ceq
IL_0031: stloc.3
IL_0032: ldloc.3
IL_0033: brtrue.s IL_0014
// end loop
Release:
// for (int i = 1; i <= 100000000; i++)
IL_000c: ldc.i4.1
IL_000d: stloc.1
// (no C# code)
IL_000e: br.s IL_0020
// loop start (head: IL_0020)
// ...
// for (int i = 1; i <= 100000000; i++)
IL_001c: ldloc.1
IL_001d: ldc.i4.1
IL_001e: add
IL_001f: stloc.1
// for (int i = 1; i <= 100000000; i++)
IL_0020: ldloc.1
IL_0021: ldc.i4 100000000
IL_0026: ble.s IL_0010
// end loop
The loops in Debug without the Console.WriteLine(val):
// value = ToInt(ToDouble(i));
IL_0015: ldloc.2
IL_0016: call float64 ConsoleApp.Program::ToDouble(int32)
IL_001b: call int32 ConsoleApp.Program::ToInt(float64)
IL_0020: stloc.1
// double num = j;
IL_0065: ldloc.s 5
IL_0067: conv.r8
IL_0068: stloc.s 6
// value = (int)num;
IL_006a: ldloc.s 6
IL_006c: conv.i4
IL_006d: stloc.1
// value = k;
IL_00b7: ldloc.s 8
IL_00b9: stloc.1
// nothing
The loops in Release without the Console.WriteLine(val):
// ToInt(ToDouble(i));
IL_0010: ldloc.1
IL_0011: call float64 ConsoleApp.Program::ToDouble(int32)
IL_0016: call int32 ConsoleApp.Program::ToInt(float64)
IL_001b: pop
// _ = (double)j;
IL_004b: ldloc.3
IL_004c: conv.r8
IL_004d: pop
// nothing
// nothing
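One common way to stop the measured work from being discarded is to make every result feed a value that is used afterwards. A minimal sketch, assuming a hypothetical RoundTripSum helper (not part of the original code); the JIT is still free to optimise the loop in other ways, so the timings are only indicative:

```csharp
using System;
using System.Diagnostics;

public static class ConversionBenchmark
{
    static double ToDouble(int val) => (double)val;
    static int ToInt(double val) => (int)val;

    // Adds every converted value to a checksum so the result stays observable.
    public static long RoundTripSum(int times)
    {
        long checksum = 0;
        for (int i = 0; i < times; i++)
        {
            checksum += ToInt(ToDouble(i));
        }
        return checksum;
    }

    public static void Main()
    {
        const int Times = 10_000_000;
        RoundTripSum(1000);   // warm-up call so first-call JIT compilation is not timed

        var chrono = Stopwatch.StartNew();
        long checksum = RoundTripSum(Times);
        chrono.Stop();

        // Printing the checksum keeps the loop's result in use.
        Console.WriteLine(checksum);
        Console.WriteLine(chrono.ElapsedMilliseconds);
    }
}
```

Comparing this against the same loop with checksum += i gives a rough estimate of what the two conversions cost per iteration.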
Background
I wanted to make a few integer-sized structs (i.e. 32 and 64 bits) that are easily convertible to/from primitive unmanaged types of the same size (i.e. Int32 and UInt32 for 32-bit-sized struct in particular).
The structs would then expose additional functionality for bit manipulation / indexing that is not available on integer types directly. Basically, as a sort of syntactic sugar, improving readability and ease of use.
The important part, however, was performance, in that there should essentially be 0 cost for this extra abstraction (at the end of the day the CPU should "see" the same bits as if it was dealing with primitive ints).
Sample Struct
Below is just the very basic struct I came up with. It does not have all the functionality, but enough to illustrate my questions:
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32 {
[FieldOffset(3)]
public byte Byte1;
[FieldOffset(2)]
public ushort UShort1;
[FieldOffset(2)]
public byte Byte2;
[FieldOffset(1)]
public byte Byte3;
[FieldOffset(0)]
public ushort UShort2;
[FieldOffset(0)]
public byte Byte4;
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe implicit operator Mask32(int i) => *(Mask32*)&i;
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe implicit operator Mask32(uint i) => *(Mask32*)&i;
}
The Test
I wanted to test the performance of this struct. In particular, I wanted to see if it could let me get the individual bytes just as quickly as if I were to use regular bitwise arithmetic: (i >> 8) & 0xFF (to get the 3rd byte, for example).
Below you will see a benchmark I came up with:
public unsafe class MyBenchmark {
const int count = 50000;
[Benchmark(Baseline = true)]
public static void Direct() {
var j = 0;
for (int i = 0; i < count; i++) {
//var b1 = i.Byte1();
//var b2 = i.Byte2();
var b3 = i.Byte3();
//var b4 = i.Byte4();
j += b3;
}
}
[Benchmark]
public static void ViaStructPointer() {
var j = 0;
int i = 0;
var s = (Mask32*)&i;
for (; i < count; i++) {
//var b1 = s->Byte1;
//var b2 = s->Byte2;
var b3 = s->Byte3;
//var b4 = s->Byte4;
j += b3;
}
}
[Benchmark]
public static void ViaStructPointer2() {
var j = 0;
int i = 0;
for (; i < count; i++) {
var s = *(Mask32*)&i;
//var b1 = s.Byte1;
//var b2 = s.Byte2;
var b3 = s.Byte3;
//var b4 = s.Byte4;
j += b3;
}
}
[Benchmark]
public static void ViaStructCast() {
var j = 0;
for (int i = 0; i < count; i++) {
Mask32 m = i;
//var b1 = m.Byte1;
//var b2 = m.Byte2;
var b3 = m.Byte3;
//var b4 = m.Byte4;
j += b3;
}
}
[Benchmark]
public static void ViaUnsafeAs() {
var j = 0;
for (int i = 0; i < count; i++) {
var m = Unsafe.As<int, Mask32>(ref i);
//var b1 = m.Byte1;
//var b2 = m.Byte2;
var b3 = m.Byte3;
//var b4 = m.Byte4;
j += b3;
}
}
}
The Byte1(), Byte2(), Byte3(), and Byte4() are just the extension methods that do get inlined and simply get the n-th byte by doing bitwise operations and casting:
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte1(this int it) => (byte)(it >> 24);
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte2(this int it) => (byte)((it >> 16) & 0xFF);
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte3(this int it) => (byte)((it >> 8) & 0xFF);
[DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Byte4(this int it) => (byte)it;
EDIT: Fixed the code to make sure variables are actually used. Also commented out 3 of 4 variables to really test struct casting / member access rather than actually using the variables.
The Results
I ran these in the Release build with optimizations on x64.
Intel Core i7-3770K CPU 3.50GHz (Ivy Bridge), 1 CPU, 8 logical cores and 4 physical cores
Frequency=3410223 Hz, Resolution=293.2360 ns, Timer=TSC
[Host] : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0
DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0
Method | Mean | Error | StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
Direct | 14.47 us | 0.3314 us | 0.2938 us | 1.00 | 0.00 |
ViaStructPointer | 111.32 us | 0.6481 us | 0.6062 us | 7.70 | 0.15 |
ViaStructPointer2 | 102.31 us | 0.7632 us | 0.7139 us | 7.07 | 0.14 |
ViaStructCast | 29.00 us | 0.3159 us | 0.2800 us | 2.01 | 0.04 |
ViaUnsafeAs | 14.32 us | 0.0955 us | 0.0894 us | 0.99 | 0.02 |
EDIT: New results after fixing the code:
Method | Mean | Error | StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
Direct | 57.51 us | 1.1070 us | 1.0355 us | 1.00 | 0.00 |
ViaStructPointer | 203.20 us | 3.9830 us | 3.5308 us | 3.53 | 0.08 |
ViaStructPointer2 | 198.08 us | 1.8411 us | 1.6321 us | 3.45 | 0.06 |
ViaStructCast | 79.68 us | 1.5478 us | 1.7824 us | 1.39 | 0.04 |
ViaUnsafeAs | 57.01 us | 0.8266 us | 0.6902 us | 0.99 | 0.02 |
Questions
The benchmark results were surprising for me, and that's why I have a few questions:
EDIT: Fewer questions remain after altering the code so that the variables actually get used.
Why is the pointer stuff so slow?
Why is the cast taking twice as long as the baseline case? Aren't implicit/explicit operators inlined?
How come the new System.Runtime.CompilerServices.Unsafe package (v. 4.5.0) is so fast? I thought it would at least involve a method call...
More generally, how can I make essentially a zero-cost struct that would simply act as a "window" onto some memory or a biggish primitive type like UInt64 so that I can more effectively manipulate / read that memory? What's the best practice here?
The answer to this appears to be that the JIT compiler can make certain optimisations better when you are using Unsafe.As().
Unsafe.As() is implemented very simply; in C# terms it amounts to this:
public static ref TTo As<TFrom, TTo>(ref TFrom source)
{
return ref source;
}
That's it!
Here's a test program I wrote to compare that with casting:
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
namespace Demo
{
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32
{
[FieldOffset(3)]
public byte Byte1;
[FieldOffset(2)]
public ushort UShort1;
[FieldOffset(2)]
public byte Byte2;
[FieldOffset(1)]
public byte Byte3;
[FieldOffset(0)]
public ushort UShort2;
[FieldOffset(0)]
public byte Byte4;
}
public static unsafe class Program
{
static int count = 50000000;
public static int ViaStructPointer()
{
int total = 0;
for (int i = 0; i < count; i++)
{
var s = (Mask32*)&i;
total += s->Byte1;
}
return total;
}
public static int ViaUnsafeAs()
{
int total = 0;
for (int i = 0; i < count; i++)
{
var m = Unsafe.As<int, Mask32>(ref i);
total += m.Byte1;
}
return total;
}
public static void Main(string[] args)
{
var sw = new Stopwatch();
sw.Restart();
ViaStructPointer();
Console.WriteLine("ViaStructPointer took " + sw.Elapsed);
sw.Restart();
ViaUnsafeAs();
Console.WriteLine("ViaUnsafeAs took " + sw.Elapsed);
}
}
}
The results I get on my PC (x64 release build) are as follows:
ViaStructPointer took 00:00:00.1314279
ViaUnsafeAs took 00:00:00.0249446
As you can see, ViaUnsafeAs is indeed much quicker.
So let's look at what the compiler has generated:
public static unsafe int ViaStructPointer()
{
int total = 0;
for (int i = 0; i < Program.count; i++)
{
total += (*(Mask32*)(&i)).Byte1;
}
return total;
}
public static int ViaUnsafeAs()
{
int total = 0;
for (int i = 0; i < Program.count; i++)
{
total += (Unsafe.As<int, Mask32>(ref i)).Byte1;
}
return total;
}
OK, there's nothing obvious there. But what about the IL?
.method public hidebysig static int32 ViaStructPointer () cil managed
{
.locals init (
[0] int32 total,
[1] int32 i,
[2] valuetype Demo.Mask32* s
)
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.0
IL_0003: stloc.1
IL_0004: br.s IL_0017
.loop
{
IL_0006: ldloca.s i
IL_0008: conv.u
IL_0009: stloc.2
IL_000a: ldloc.0
IL_000b: ldloc.2
IL_000c: ldfld uint8 Demo.Mask32::Byte1
IL_0011: add
IL_0012: stloc.0
IL_0013: ldloc.1
IL_0014: ldc.i4.1
IL_0015: add
IL_0016: stloc.1
IL_0017: ldloc.1
IL_0018: ldsfld int32 Demo.Program::count
IL_001d: blt.s IL_0006
}
IL_001f: ldloc.0
IL_0020: ret
}
.method public hidebysig static int32 ViaUnsafeAs () cil managed
{
.locals init (
[0] int32 total,
[1] int32 i,
[2] valuetype Demo.Mask32 m
)
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.0
IL_0003: stloc.1
IL_0004: br.s IL_0020
.loop
{
IL_0006: ldloca.s i
IL_0008: call valuetype Demo.Mask32& [System.Runtime.CompilerServices.Unsafe]System.Runtime.CompilerServices.Unsafe::As<int32, valuetype Demo.Mask32>(!!0&)
IL_000d: ldobj Demo.Mask32
IL_0012: stloc.2
IL_0013: ldloc.0
IL_0014: ldloc.2
IL_0015: ldfld uint8 Demo.Mask32::Byte1
IL_001a: add
IL_001b: stloc.0
IL_001c: ldloc.1
IL_001d: ldc.i4.1
IL_001e: add
IL_001f: stloc.1
IL_0020: ldloc.1
IL_0021: ldsfld int32 Demo.Program::count
IL_0026: blt.s IL_0006
}
IL_0028: ldloc.0
IL_0029: ret
}
Aha! The only difference here is this:
ViaStructPointer: conv.u
ViaUnsafeAs: call valuetype Demo.Mask32& [System.Runtime.CompilerServices.Unsafe]System.Runtime.CompilerServices.Unsafe::As<int32, valuetype Demo.Mask32>(!!0&)
ldobj Demo.Mask32
On the face of it, you would expect conv.u to be faster than the two instructions used for Unsafe.As. However, it seems that the JIT compiler is able to optimise those two instructions much better than the single conv.u.
It's reasonable to ask why that is - unfortunately I don't have an answer to that yet! I'm almost certain that the call to Unsafe::As<>() is being inlined by the JITTER, and it is being further optimised by the JIT.
There is some information about the Unsafe class' optimisations here.
Note that the IL generated for Unsafe.As<> is simply this:
.method public hidebysig static !!TTo& As<TFrom, TTo> (
!!TFrom& source
) cil managed aggressiveinlining
{
.custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = (
01 00 00 00
)
IL_0000: ldarg.0
IL_0001: ret
}
Now I think it becomes clearer as to why that can be optimised so well by the JITTER.
When you take the address of a local, the JIT generally has to keep that local on the stack. That's the case here. In the ViaStructPointer version, i is kept on the stack. In the ViaUnsafeAs version, i is copied to a temp and the temp is kept on the stack. The former is slower because i is also used to control the iteration of the loop.
You can get pretty close to the ViaUnsafeAs performance with the following code, where you explicitly make a copy:
public static int ViaStructPointer2()
{
int total = 0;
for (int i = 0; i < count; i++)
{
int j = i;
var s = (Mask32*)&j;
total += s->Byte1;
}
return total;
}
ViaStructPointer took 00:00:00.1147793
ViaUnsafeAs took 00:00:00.0282828
ViaStructPointer2 took 00:00:00.0257589
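Since the Direct (shift-based) variant and Unsafe.As performed about the same in the benchmark above, one option for a "window" type that avoids pointers altogether is a readonly struct that wraps the integer and exposes the shifts as properties. This is only a sketch under that assumption: ShiftMask32 is a hypothetical name, it is not the original Mask32, and it has not been benchmarked here.

```csharp
using System;

public readonly struct ShiftMask32
{
    private readonly uint _value;

    public ShiftMask32(uint value) => _value = value;

    // Same numbering as the Byte1()..Byte4() extension methods:
    // Byte1 is the most significant byte, Byte4 the least significant.
    public byte Byte1 => (byte)(_value >> 24);
    public byte Byte2 => (byte)((_value >> 16) & 0xFF);
    public byte Byte3 => (byte)((_value >> 8) & 0xFF);
    public byte Byte4 => (byte)(_value & 0xFF);

    public static implicit operator ShiftMask32(uint value) => new ShiftMask32(value);
    public static implicit operator ShiftMask32(int value) => new ShiftMask32(unchecked((uint)value));
    public static implicit operator uint(ShiftMask32 mask) => mask._value;
}

public static class ShiftMask32Demo
{
    public static void Main()
    {
        ShiftMask32 m = 0x11223344u;
        Console.WriteLine($"{m.Byte1:X2} {m.Byte2:X2} {m.Byte3:X2} {m.Byte4:X2}");   // prints 11 22 33 44
    }
}
```

Because no address of a local is taken, nothing in this version forces the loop counter onto the stack. Whether the JIT inlines the properties as well as it does the extension methods would need to be measured.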
I found this code to swap two numbers without using a third variable, using the XOR ^ operator.
Code:
int i = 25;
int j = 36;
j ^= i;
i ^= j;
j ^= i;
Console.WriteLine("i:" + i + " j:" + j);
//numbers Swapped correctly
//Output: i:36 j:25
Now I changed the above code to this equivalent code.
My Code:
int i = 25;
int j = 36;
j ^= i ^= j ^= i; // I have changed to this equivalent (???).
Console.WriteLine("i:" + i + " j:" + j);
//Not Swapped correctly
//Output: i:36 j:0
Now I want to know: why does my code give incorrect output?
EDIT: Okay, got it.
The first point to make is that obviously you shouldn't use this code anyway. However, when you expand it, it becomes equivalent to:
j = j ^ (i = i ^ (j = j ^ i));
(If we were using a more complicated expression such as foo.bar++ ^= i, it would be important that the ++ was only evaluated once, but here I believe it's simpler.)
Now, the order of evaluation of the operands is always left to right, so to start with we get:
j = 36 ^ (i = i ^ (j = j ^ i));
This (above) is the most important step. We've ended up with 36 as the LHS for the XOR operation which is executed last. The LHS is not "the value of j after the RHS has been evaluated".
The evaluation of the RHS of the ^ involves the "one level nested" expression, so it becomes:
j = 36 ^ (i = 25 ^ (j = j ^ i));
Then looking at the deepest level of nesting, we can substitute both i and j:
j = 36 ^ (i = 25 ^ (j = 25 ^ 36));
... which becomes
j = 36 ^ (i = 25 ^ (j = 61));
The assignment to j in the RHS occurs first, but the result is then overwritten at the end anyway, so we can ignore that - there are no further evaluations of j before the final assignment:
j = 36 ^ (i = 25 ^ 61);
This is now equivalent to:
i = 25 ^ 61;
j = 36 ^ (i = 25 ^ 61);
Or:
i = 36;
j = 36 ^ 36;
Which becomes:
i = 36;
j = 0;
I think that's all correct, and it gets to the right answer... apologies to Eric Lippert if some of the details about evaluation order are slightly off :(
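The evaluation order walked through above can be spelled out with explicit temporaries. A minimal sketch, assuming a hypothetical ChainedXor helper whose temporaries (outerLeft, middleLeft) are named purely for illustration:

```csharp
using System;

public static class XorOrderDemo
{
    // Mirrors the left-to-right operand evaluation of: j ^= i ^= j ^= i;
    public static void ChainedXor(ref int i, ref int j)
    {
        int outerLeft = j;     // left operand of the outermost ^= is read first
        int middleLeft = i;    // then the left operand of the middle ^=
        j = j ^ i;             // innermost assignment
        i = middleLeft ^ j;    // middle assignment
        j = outerLeft ^ i;     // outermost assignment uses the saved, stale j
    }

    public static void Main()
    {
        int i = 25, j = 36;
        j ^= i ^= j ^= i;
        Console.WriteLine("i:" + i + " j:" + j);   // prints i:36 j:0

        int a = 25, b = 36;
        ChainedXor(ref a, ref b);
        Console.WriteLine("i:" + a + " j:" + b);   // prints i:36 j:0
    }
}
```

Both versions print the same result, which shows that the stale left operand, not the XOR trick itself, is what breaks the swap.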
I checked the generated IL, and the two versions compile to different code.
The correct swap generates straightforward IL:
IL_0001: ldc.i4.s 25
IL_0003: stloc.0 //create an integer variable 25 at position 0
IL_0004: ldc.i4.s 36
IL_0006: stloc.1 //create an integer variable 36 at position 1
IL_0007: ldloc.1 //push variable at position 1 [36]
IL_0008: ldloc.0 //push variable at position 0 [25]
IL_0009: xor
IL_000a: stloc.1 //store result in location 1 [61]
IL_000b: ldloc.0 //push 25
IL_000c: ldloc.1 //push 61
IL_000d: xor
IL_000e: stloc.0 //store result in location 0 [36]
IL_000f: ldloc.1 //push 61
IL_0010: ldloc.0 //push 36
IL_0011: xor
IL_0012: stloc.1 //store result in location 1 [25]
The incorrect swap generates this code:
IL_0001: ldc.i4.s 25
IL_0003: stloc.0 //create an integer variable 25 at position 0
IL_0004: ldc.i4.s 36
IL_0006: stloc.1 //create an integer variable 36 at position 1
IL_0007: ldloc.1 //push 36 on stack (stack is 36)
IL_0008: ldloc.0 //push 25 on stack (stack is 36-25)
IL_0009: ldloc.1 //push 36 on stack (stack is 36-25-36)
IL_000a: ldloc.0 //push 25 on stack (stack is 36-25-36-25)
IL_000b: xor //stack is 36-25-61
IL_000c: dup //stack is 36-25-61-61
IL_000d: stloc.1 //store 61 into position 1, stack is 36-25-61
IL_000e: xor //stack is 36-36
IL_000f: dup //stack is 36-36-36
IL_0010: stloc.0 //store 36 into position 0, stack is 36-36
IL_0011: xor //stack is 0, as the original 36 (instead of the new 61) is XOR-ed
IL_0012: stloc.1 //store 0 into position 1
It's evident that the code generated for the second version does not perform the swap, as the old value of j is used in a calculation where the new value is required.
C# loads j, i, j, i on the stack, and stores each XOR result without updating the stack, so the leftmost XOR uses the initial value for j.
Rewriting:
j ^= i;
i ^= j;
j ^= i;
Expanding ^=:
j = j ^ i;
i = j ^ i;
j = j ^ i;
Substitute:
j = j ^ i;
j = j ^ (i = j ^ i);
Substitute again; this only works because the left-hand side of the ^ operator is evaluated first:
j = (j = j ^ i) ^ (i = i ^ j);
Collapse ^:
j = (j ^= i) ^ (i ^= j);
Symmetrically:
i = (i ^= j) ^ (j ^= i);
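The collapsed form can be checked directly. A minimal sketch, assuming a hypothetical SwapInOneExpression helper:

```csharp
using System;

public static class XorSwapDemo
{
    public static void SwapInOneExpression(ref int i, ref int j)
    {
        // The left operand (j ^= i) runs first, so the right operand
        // (i ^= j) already sees the updated j.
        j = (j ^= i) ^ (i ^= j);
    }

    public static void Main()
    {
        int i = 25, j = 36;
        SwapInOneExpression(ref i, ref j);
        Console.WriteLine("i:" + i + " j:" + j);   // prints i:36 j:25
    }
}
```

As with any XOR swap, passing the same variable for both arguments would zero it rather than leave it unchanged, so a temporary variable remains the safer choice in real code.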
I have a question regarding assigning value to a variable inside a for loop. I understand that the compiler gives this error message when there is any possibility that a variable may be read when it is not yet assigned, as stated by Microsoft.
Note that this error is generated when the compiler encounters a construct that might result in the use of an unassigned variable, even if your particular code does not.
My code looks like this:
static void Main(string[] args)
{
int i;
for (int j = 0; j <= 5; j++)
{
i = j;
}
Console.WriteLine(i.ToString());
Console.ReadLine();
}
I presume that even though in this particular scenario i will get assigned, the compiler doesn't check the actual condition inside the for statement, meaning that it would treat
for (int j = 0; j <= -1; j++)
just the same?
Since you assign a value to the variable i only inside your for loop, the compiler cannot prove that i will have been assigned a value by the time it is read, hence the error.
A simple solution would be to assign some default value to your variable.
int i = 0;
Although the loop bounds make it obvious to a reader that control will enter the for loop, the compiler doesn't determine that.
The reason for the error lies in the language specification:
5.3 Definite assignment
At a given location in the executable code of a function member, a
variable is said to be definitely assigned if the compiler can prove,
by static flow analysis, that the variable has been automatically initialized or has been the target of at least one assignment.
Evaluating a condition that depends on variables is not part of static flow analysis; its value is only determined at run time. Because of that, the compiler can't determine whether i will be assigned a value.
Your current case seems trivial, but what if your loop looks like this:
int i;
for (int j = 0; j <= GetUpperLimit(); j++)
{
i = j;
}
GetUpperLimit could return 5, but it could just as well return -3, in which case i would never be assigned. It could depend entirely on the state of your application at run time, and the compiler can't know that beforehand. A simple int i = 0 will solve the issue.
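The suggested fix can be sketched as follows. This is a minimal example, assuming a hypothetical LastIndex method whose upperLimit parameter stands in for GetUpperLimit(); it initializes i to -1 rather than 0 so that "the loop never ran" is distinguishable from "the loop ended at 0", which is a choice made for this illustration only:

```csharp
using System;

public static class DefiniteAssignmentDemo
{
    public static int LastIndex(int upperLimit)
    {
        int i = -1;   // sentinel: stays -1 when the loop body never runs
        for (int j = 0; j <= upperLimit; j++)
        {
            i = j;
        }
        return i;     // compiles: i is definitely assigned by its initializer
    }

    public static void Main()
    {
        Console.WriteLine(LastIndex(5));    // prints 5
        Console.WriteLine(LastIndex(-3));   // prints -1
    }
}
```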
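If the condition is a constant false expression, such as a false const or the literal false, the body of your for loop will be optimized away. A comparison such as j <= -1 is not a constant expression, because it involves the variable j, so the loop is kept. Demo using LINQPad: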
int i;
for (int j = 0; j <= -1; j++)
{
i = j;
}
Resulting IL:
IL_0001: ldc.i4.0
IL_0002: stloc.1 // j
IL_0003: br.s IL_000D
IL_0005: nop
IL_0006: ldloc.1 // j
IL_0007: stloc.0 // i
IL_0008: nop
IL_0009: ldloc.1 // j
IL_000A: ldc.i4.1
IL_000B: add
IL_000C: stloc.1 // j
IL_000D: ldloc.1 // j
IL_000E: ldc.i4.m1
IL_000F: cgt
IL_0011: ldc.i4.0
IL_0012: ceq
IL_0014: stloc.2 // CS$4$0000
IL_0015: ldloc.2 // CS$4$0000
IL_0016: brtrue.s IL_0005
Now with a false literal instead of an expression:
int i;
for (int j = 0; false; j++)
{
i = j;
}
Resulting IL:
IL_0001: ldc.i4.0
IL_0002: stloc.1 // j
IL_0003: br.s IL_0005
IL_0005: ldc.i4.0
IL_0006: stloc.2 // CS$4$0000
Similarly, if(false) { ... } is optimized, but not bool b = false; if(b) { ... }. This is all a result of static analysis.
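The same distinction between constant expressions and ordinary variables shows up in definite assignment. A minimal sketch, assuming a hypothetical ConstantConditionDemo class; the rejected variant is left as a comment because it does not compile:

```csharp
using System;

public static class ConstantConditionDemo
{
    public static int AssignedUnderLiteral()
    {
        int i;
        if (true)   // literal: flow analysis knows this branch always runs
        {
            i = 1;
        }
        return i;   // accepted: i is definitely assigned
    }

    public static int AssignedUnderConst()
    {
        const bool Always = true;
        int i;
        if (Always)   // a const is still a constant expression
        {
            i = 2;
        }
        return i;
    }

    // With "bool b = true; int i; if (b) { i = 3; } return i;" the compiler
    // reports CS0165 (use of unassigned local variable), because b is an
    // ordinary variable rather than a constant expression.

    public static void Main()
    {
        Console.WriteLine(AssignedUnderLiteral());   // prints 1
        Console.WriteLine(AssignedUnderConst());     // prints 2
    }
}
```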
I have an array and I want to use one single value multiple times in the same method.
int[] array = new int[] {1, 2, 3};
Now I'm asking whether it's a big performance issue if I use the array and an index to get the value...
int x = array[1] * 2;
int y = array[1] * 3;
int z = array[1] * 4;
or is it better to create a local variable?
int value = array[1];
int x = value * 2;
int y = value * 3;
int z = value * 4;
I know it's easier to read with a local variable, but I was just interested in whether it makes any performance difference. ;-)
Although I agree that such micro-optimizations are evil and unnecessary, and readability might be more important, it was quite fun to make a dummy benchmark for two methods:
private static int TestWithIndex(int[] array)
{
int x = array[1] * 2;
int y = array[1] * 3;
int z = array[1] * 4;
return x + y + z;
}
private static int TestWithTemp(int[] array)
{
int value = array[1];
int x = value * 2;
int y = value * 3;
int z = value * 4;
return x + y + z;
}
Calling them int.MaxValue times in Release mode produces:
12032 ms - for TestWithIndex
10525 ms - for TestWithTemp
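The timing loop itself is not shown above. A harness along the following lines could produce comparable measurements; it is a sketch under assumptions, not the code that generated the numbers above. The Time helper, the checksum and the smaller iteration count are all hypothetical, and calling through a delegate adds overhead and prevents inlining, so absolute timings will differ:

```csharp
using System;
using System.Diagnostics;

public static class ArrayAccessBenchmark
{
    public static int TestWithIndex(int[] array)
    {
        int x = array[1] * 2;
        int y = array[1] * 3;
        int z = array[1] * 4;
        return x + y + z;
    }

    public static int TestWithTemp(int[] array)
    {
        int value = array[1];
        int x = value * 2;
        int y = value * 3;
        int z = value * 4;
        return x + y + z;
    }

    // Calls 'method' the given number of times and returns elapsed milliseconds.
    // The checksum keeps every result in use so the calls cannot be dropped.
    public static long Time(Func<int[], int> method, int[] array, int iterations, out long checksum)
    {
        checksum = 0;
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            checksum += method(array);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        int[] array = { 1, 2, 3 };
        const int Iterations = 100_000_000;   // smaller than int.MaxValue to keep the run short

        long ms = Time(TestWithIndex, array, Iterations, out long sumIndex);
        Console.WriteLine(ms + " ms - for TestWithIndex (checksum " + sumIndex + ")");

        ms = Time(TestWithTemp, array, Iterations, out long sumTemp);
        Console.WriteLine(ms + " ms - for TestWithTemp (checksum " + sumTemp + ")");
    }
}
```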
And then let's look at the IL generated (Release mode, Optimizations enabled):
TestWithIndex
.method private hidebysig static
int32 TestWithIndex (
int32[] 'array'
) cil managed
{
// Method begins at RVA 0x2564
// Code size 29 (0x1d)
.maxstack 2
.locals init (
[0] int32 x,
[1] int32 y,
[2] int32 z,
[3] int32 CS$1$0000
)
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldc.i4.1
IL_0003: ldelem.i4
IL_0004: ldc.i4.2
IL_0005: mul
IL_0006: stloc.0
IL_0007: ldarg.0
IL_0008: ldc.i4.1
IL_0009: ldelem.i4
IL_000a: ldc.i4.3
IL_000b: mul
IL_000c: stloc.1
IL_000d: ldarg.0
IL_000e: ldc.i4.1
IL_000f: ldelem.i4
IL_0010: ldc.i4.4
IL_0011: mul
IL_0012: stloc.2
IL_0013: ldloc.0
IL_0014: ldloc.1
IL_0015: add
IL_0016: ldloc.2
IL_0017: add
IL_0018: stloc.3
IL_0019: br.s IL_001b
IL_001b: ldloc.3
IL_001c: ret
} // end of method Program::TestWithIndex
Here we see three ldelem.i4.
TestWithTemp
.method private hidebysig static
int32 TestWithTemp (
int32[] 'array'
) cil managed
{
// Method begins at RVA 0x2590
// Code size 29 (0x1d)
.maxstack 2
.locals init (
[0] int32 'value',
[1] int32 x,
[2] int32 y,
[3] int32 z,
[4] int32 CS$1$0000
)
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldc.i4.1
IL_0003: ldelem.i4
IL_0004: stloc.0
IL_0005: ldloc.0
IL_0006: ldc.i4.2
IL_0007: mul
IL_0008: stloc.1
IL_0009: ldloc.0
IL_000a: ldc.i4.3
IL_000b: mul
IL_000c: stloc.2
IL_000d: ldloc.0
IL_000e: ldc.i4.4
IL_000f: mul
IL_0010: stloc.3
IL_0011: ldloc.1
IL_0012: ldloc.2
IL_0013: add
IL_0014: ldloc.3
IL_0015: add
IL_0016: stloc.s CS$1$0000
IL_0018: br.s IL_001a
IL_001a: ldloc.s CS$1$0000
IL_001c: ret
} // end of method Program::TestWithTemp
Here only one ldelem.i4 of course.
No, there would be no performance difference. For this to work:
int x = array[1] * 2;
the value at array[1] is going to have to be moved into a memory location anyway when the IL is generated. The remaining operations will then be optimized away by the compiler (i.e. it's not going to retrieve the value more than once).
Alright, to settle the argument I decided to dump each, here is the first:
.method private hidebysig static void Main(string[] args) cil managed
{
IL_0000: nop
IL_0001: ldc.i4.3
IL_0002: newarr [mscorlib]System.Int32
IL_0007: dup
IL_0008: ldtoken field valuetype '<PrivateImplementationDetails>{79A4FD92-FA37-4EB9-8056-B52A57262FBB}'/'__StaticArrayInitTypeSize=12' '<PrivateImplementationDetails>{79A4FD92-FA37-4EB9-8056-B52A57262FBB}'::'$$method0x6000001-1'
IL_000d: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array,
valuetype [mscorlib]System.RuntimeFieldHandle)
IL_0012: stloc.0
IL_0013: ldloc.0
IL_0014: ldc.i4.1
IL_0015: ldelem.i4
IL_0016: stloc.1
IL_0017: ldloc.1
IL_0018: ldc.i4.2
IL_0019: mul
IL_001a: stloc.2
IL_001b: ldloc.1
IL_001c: ldc.i4.3
IL_001d: mul
IL_001e: stloc.3
IL_001f: ldloc.1
IL_0020: ldc.i4.4
IL_0021: mul
IL_0022: stloc.s z
IL_0024: ret
} // end of method Program::Main
and here is the second:
.method private hidebysig static void Main(string[] args) cil managed
{
IL_0000: nop
IL_0001: ldc.i4.3
IL_0002: newarr [mscorlib]System.Int32
IL_0007: dup
IL_0008: ldtoken field valuetype '<PrivateImplementationDetails>{79A4FD92-FA37-4EB9-8056-B52A57262FBB}'/'__StaticArrayInitTypeSize=12' '<PrivateImplementationDetails>{79A4FD92-FA37-4EB9-8056-B52A57262FBB}'::'$$method0x6000001-1'
IL_000d: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array,
valuetype [mscorlib]System.RuntimeFieldHandle)
IL_0012: stloc.0
IL_0013: ldloc.0
IL_0014: ldc.i4.1
IL_0015: ldelem.i4
IL_0016: stloc.1
IL_0017: ldloc.1
IL_0018: ldc.i4.2
IL_0019: mul
IL_001a: stloc.2
IL_001b: ldloc.1
IL_001c: ldc.i4.3
IL_001d: mul
IL_001e: stloc.3
IL_001f: ldloc.1
IL_0020: ldc.i4.4
IL_0021: mul
IL_0022: stloc.s z
IL_0024: ret
} // end of method Program::Main
They are exactly the same--as I stated.
If you do not have performance requirements or performance problems, then your main goal is to write readable code which will be easy to maintain. It's easy to see duplication in your first example:
int x = array[1] * 2;
int y = array[1] * 3;
int z = array[1] * 4;
It has several issues. First, the more duplicated code you have, the more code you have to support, and the higher the chance that at some point you will forget to modify one of the copies. Second, duplication always means you have hidden knowledge in your code. If some code is repeated, then it has a specific meaning which you haven't made obvious. E.g. the second item of the array holds a speed value. Make this knowledge explicit:
int speed = array[1];
int x = speed * 2; // of course, magic numbers also should be replaced
int y = speed * 3;
int z = speed * 4;
And remember: premature optimization is evil. Usually 20% of the code takes 80% of the execution time, so there is a high probability that your optimizations will not affect application performance. You should find that 20% first, and only then optimize (if it is really needed).