How would I print a martini glass pattern in C# in an optimal way - c#

I am trying to print a martini glass pattern using C#.
The pattern is like the following:
For input = 4:
0000000
 00000
  000
   0
   |
   |
   |
   |
=======
For input = 5:
000000000
 0000000
  00000
   000
    0
    |
    |
    |
    |
    |
=========
I am able to get the triangle of 0's; however, I am failing to get the neck (|) and the bottom (=).
My code looks as follows:
const int height = 4;
for (int row = 0; row < height; row++)
{
    // left padding
    for (int col = 0; col < row; col++)
    {
        Console.Write(' ');
    }
    for (int col = 0; col < (height - row) * 2 - 1; col++)
    {
        Console.Write('0');
    }
    // right padding
    for (int col = 0; col < row; col++)
    {
        Console.Write(' ');
    }
    Console.WriteLine();
}
for (int i = 1; i < height; i++)
{
    Console.Write('|');
}
Console.ReadKey();
And it prints like this:
0000000
 00000
  000
   0
|||
Can somebody help me finish the neck and the bottom?
Also, is my code optimal? Feel free to edit the complete code for optimization.
Thanks in advance.
Edit:
Code added for the neck and bottom:
for (int i = 1; i <= height; i++)
{
    // Left padding
    for (int j = 1; j < height; j++)
    {
        Console.Write(' ');
    }
    Console.WriteLine('|');
}
for (int row = 0; row < height; row++)
{
    for (int col = 0; col < row; col++)
    {
        Console.Write('=');
    }
}
Console.ReadKey();

The string constructor is helpful to avoid writing excessive loops:
int count = 5;
for (int i = count - 1; i >= 0; i--)
{
    Console.WriteLine(new string('0', 2 * i + 1).PadLeft(i + count));
}
Console.Write(new string('|', count).Replace("|", "|\n".PadLeft(count + 1)));
Console.WriteLine(new string('=', count * 2 - 1));
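The Replace trick works because "|\n".PadLeft(count + 1) turns each bar into count - 1 spaces followed by the bar and a newline, so the whole left-padded neck comes out of a single expression.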

Use the string class constructor to repeat a character instead of looping over it.
using System;

class HelloWorld {
    static void Main() {
        const int height = 1;
        for (int row = 0; row < height; row++)
        {
            var spaces = new String(' ', row);
            var zeroes = new String('0', ((height - row) * 2) - 1);
            Console.WriteLine(spaces + zeroes);
        }
        for (int i = 1; i <= height; i++)
        {
            var spaces = new String(' ', height - 1);
            Console.WriteLine(spaces + '|');
        }
        Console.WriteLine(new String('=', (height * 2) - 1));
        Console.ReadKey();
    }
}
Edit
By "optimal" I'm assuming you mean faster execution time. For relatively small values I do not see how the two could make a significant difference, but I still ran them through BenchmarkDotNet.
First refers to my code and Second is kazem's.
I am not sure what to make of this output, but I assume you can read more about it in their documentation.
// * Detailed results *
MartiniBenchMark.First: DefaultJob
Runtime = .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0; GC = Concurrent Workstation
Mean = 1.7365 ms, StdErr = 0.0081 ms (0.47%); N = 15, StdDev = 0.0315 ms
Min = 1.6916 ms, Q1 = 1.7099 ms, Median = 1.7309 ms, Q3 = 1.7626 ms, Max = 1.8087 ms
IQR = 0.0527 ms, LowerFence = 1.6309 ms, UpperFence = 1.8417 ms
ConfidenceInterval = [1.7028 ms; 1.7702 ms] (CI 99.9%), Margin = 0.0337 ms (1.94% of Mean)
Skewness = 0.45, Kurtosis = 2.58
MartiniBenchMark.Second: DefaultJob
Runtime = .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0; GC = Concurrent Workstation
Mean = 1.8580 ms, StdErr = 0.0147 ms (0.79%); N = 96, StdDev = 0.1440 ms
Min = 1.6291 ms, Q1 = 1.7440 ms, Median = 1.8311 ms, Q3 = 1.9782 ms, Max = 2.2573 ms
IQR = 0.2342 ms, LowerFence = 1.3927 ms, UpperFence = 2.3295 ms
ConfidenceInterval = [1.8081 ms; 1.9079 ms] (CI 99.9%), Margin = 0.0499 ms (2.69% of Mean)
Skewness = 0.42, Kurtosis = 2.22
Total time: 00:12:04 (724.8 sec)
// * Summary *
BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 1 (10.0.14393)
Processor=Intel Core i3-3110M CPU 2.40GHz (Ivy Bridge), ProcessorCount=4
Frequency=2338445 Hz, Resolution=427.6346 ns, Timer=TSC
[Host] : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0
DefaultJob : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0
Method | Mean | Error | StdDev |
------- |---------:|----------:|----------:|
First | 1.737 ms | 0.0337 ms | 0.0315 ms |
Second | 1.858 ms | 0.0499 ms | 0.1440 ms |
// * Hints *
Outliers
MartiniBenchMark.First: Default -> 3 outliers were removed
MartiniBenchMark.Second: Default -> 4 outliers were removed
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
1 ms : 1 Millisecond (0.001 sec)

You need to print the spaces on the left first, just as you did for the '0's in the left-padding part.
for (int i = 1; i <= height; i++)
{
    // Left padding
    for (int j = 1; j < height; j++)
    {
        Console.Write(' ');
    }
    Console.WriteLine('|');
}
And your neck loop should run up to i <= height.
Now I think you can complete the bottom part (it will be the same as the first row of '0's, without any padding). Please let me know if you face any difficulty.
Also, I don't think you need the right-padding part.
Hope it helps.
EDIT:
Bottom Part:
for (int i = 1; i <= height * 2 - 1; i++)
{
    Console.Write("=");
}
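Putting the pieces together, a minimal complete sketch (my own assembly of the fragments above, with the input hard-coded to 4 as in the question):

using System;

class MartiniGlass
{
    static void Main()
    {
        const int height = 4;
        // Bowl: each row is row spaces followed by (height - row) * 2 - 1 zeroes.
        for (int row = 0; row < height; row++)
        {
            Console.Write(new string(' ', row));
            Console.WriteLine(new string('0', (height - row) * 2 - 1));
        }
        // Neck: height bars, each padded by height - 1 spaces.
        for (int i = 0; i < height; i++)
        {
            Console.WriteLine(new string(' ', height - 1) + '|');
        }
        // Base: as wide as the first bowl row, no padding.
        Console.WriteLine(new string('=', height * 2 - 1));
        Console.ReadKey();
    }
}

For height = 4 this produces the centered bowl, four padded bars, and the seven-character base shown at the top of the question.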

Related

C#.Net AsSpan().Fill() byte[] vs Int32[] vs float[] speed difference depending on array length?

I've run into something strange: when using AsSpan().Fill(), it's twice as fast on a byte[] array as opposed to an int[] or float[] array, even though they are all the same size in bytes. But it depends on the size of the arrays: on small arrays the speed is the same, while on larger ones the difference shows.
Here is a sample console application to illustrate:
using System;
using System.Diagnostics;

internal unsafe class Program {
    static byte[]? ByteFrame;
    static Int32[]? Int32Frame;
    static float[]? FloatFrame;
    static int[]? ResetCacheArray;

    static void Main(string[] args) {
        // size vars
        int Width = 1500;
        int Height = 1500;
        // Init frames
        ByteFrame = new byte[Width * Height * 4];
        ByteFrame.AsSpan().Fill(0);
        Int32Frame = new Int32[Width * Height];
        Int32Frame.AsSpan().Fill(0);
        FloatFrame = new float[Width * Height];
        FloatFrame.AsSpan().Fill(1);
        ResetCacheArray = new int[10000 * 10000];
        ResetCacheArray.AsSpan().Fill(1);
        // warm up the JIT
        for (int i = 0; i < 200; i++) {
            ClearByteFrameAsSpanFill(0);
            ClearInt32FrameAsSpanFill(0);
            ClearFloatFrameAsSpanFill(0f);
            ClearCache();
        }
        Console.WriteLine(Environment.Is64BitProcess);
        int TestIterations;
        double nanoseconds;
        double MsDuration;
        double MB = 0;
        double MBSec;
        double GBSec;

        TestIterations = 1;
        nanoseconds = 1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency;
        for (int i = 0; i < TestIterations; i++) {
            MB = ClearByteFrameAsSpanFill(0);
        }
        MsDuration = (((1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency) - nanoseconds) / TestIterations) / 1000000;
        MBSec = (MB / MsDuration) * 1000;
        GBSec = MBSec / 1000;
        Console.WriteLine("ClearByteFrameAsSpanFill: MS:" + MsDuration + " GB/s:" + (int)GBSec + " MB/s:" + (int)MBSec);
        ClearCache();

        TestIterations = 1;
        nanoseconds = 1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency;
        for (int i = 0; i < TestIterations; i++) {
            MB = ClearInt32FrameAsSpanFill(1);
        }
        MsDuration = (((1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency) - nanoseconds) / TestIterations) / 1000000;
        MBSec = (MB / MsDuration) * 1000;
        GBSec = MBSec / 1000;
        Console.WriteLine("ClearInt32FrameAsSpanFill: MS:" + MsDuration + " GB/s:" + (int)GBSec + " MB/s:" + (int)MBSec);
        ClearCache();

        TestIterations = 1;
        nanoseconds = 1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency;
        for (int i = 0; i < TestIterations; i++) {
            MB = ClearFloatFrameAsSpanFill(1f);
        }
        MsDuration = (((1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency) - nanoseconds) / TestIterations) / 1000000;
        MBSec = (MB / MsDuration) * 1000;
        GBSec = MBSec / 1000;
        Console.WriteLine("ClearFloatFrameAsSpanFill: MS:" + MsDuration + " GB/s:" + (int)GBSec + " MB/s:" + (int)MBSec);
        ClearCache();

        Console.ReadLine();
    }

    static double ClearByteFrameAsSpanFill(byte clearValue) {
        ByteFrame.AsSpan().Fill(clearValue);
        return ByteFrame.Length / 1000000;
    }

    static double ClearInt32FrameAsSpanFill(Int32 clearValue) {
        Int32Frame.AsSpan().Fill(clearValue);
        return (Int32Frame.Length * 4) / 1000000;
    }

    static double ClearFloatFrameAsSpanFill(float clearValue) {
        FloatFrame.AsSpan().Fill(clearValue);
        return (FloatFrame.Length * 4) / 1000000;
    }

    static void ClearCache() {
        int sum = 0;
        for (int i = 0; i < ResetCacheArray.Length; i++) {
            sum += ResetCacheArray[i];
        }
    }
}
On my machine it outputs the following:
ClearByteFrameAsSpanFill: MS:0,4913 GB/s:18 MB/s:18318
ClearInt32FrameAsSpanFill: MS:0,4851 GB/s:18 MB/s:18552
ClearFloatFrameAsSpanFill: MS:0,458 GB/s:19 MB/s:19650
It varies a little from run to run, ± a few GB/s, but roughly each operation takes the same amount of time.
Now when I change the size variables to Width = 4500 and Height = 4500, it outputs the following:
ClearByteFrameAsSpanFill: MS:3,4015 GB/s:23 MB/s:23813
ClearInt32FrameAsSpanFill: MS:7,635 GB/s:10 MB/s:10609
ClearFloatFrameAsSpanFill: MS:7,4429 GB/s:10 MB/s:10882
This will obviously change depending on RAM speed from machine to machine, but on mine at least, on "small" arrays the speed is the same, while on large arrays filling a byte array is twice as fast as filling an int or float array of the same byte length.
Does anyone have an explanation of this?
You are testing filling the byte array with 0 and filling the int array with 1:
ClearByteFrameAsSpanFill(0);
ClearInt32FrameAsSpanFill(1);
These cases have different optimisations.
If you fill an array of bytes with any value it will be around the same speed, because there's a processor instruction to fill a block of bytes with a specific byte value.
Although there may be processor instructions to fill an array of int or float values with non-zero values, they are likely to be slower than filling the block of memory with zero values.
I tried this out with the following code using BenchmarkDotNet:
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

[SimpleJob(RuntimeMoniker.Net60)]
public class UnderTest
{
    [Benchmark]
    public void FillBytesWithZero()
    {
        _bytes.AsSpan().Fill(0);
    }

    [Benchmark]
    public void FillBytesWithOne()
    {
        _bytes.AsSpan().Fill(1);
    }

    [Benchmark]
    public void FillIntsWithZero()
    {
        _ints.AsSpan().Fill(0);
    }

    [Benchmark]
    public void FillIntsWithOne()
    {
        _ints.AsSpan().Fill(1);
    }

    const int COUNT = 1500 * 1500;
    static readonly byte[] _bytes = new byte[COUNT * sizeof(int)];
    static readonly int[] _ints = new int[COUNT];
}
With the following results:
For COUNT = 1500 * 1500:
| Method | Mean | Error | StdDev | Median |
|------------------ |---------:|---------:|---------:|---------:|
| FillBytesWithZero | 299.7 us | 7.82 us | 22.95 us | 299.3 us |
| FillBytesWithOne | 305.6 us | 11.46 us | 33.80 us | 293.3 us |
| FillIntsWithZero | 322.4 us | 2.37 us | 2.10 us | 321.6 us |
| FillIntsWithOne | 502.9 us | 27.68 us | 81.60 us | 534.4 us |
For COUNT = 4500 * 4500:
| Method | Mean | Error | StdDev |
|------------------ |---------:|----------:|----------:|
| FillBytesWithZero | 2.554 ms | 0.0307 ms | 0.0240 ms |
| FillBytesWithOne | 2.632 ms | 0.0522 ms | 0.1101 ms |
| FillIntsWithZero | 4.169 ms | 0.0258 ms | 0.0229 ms |
| FillIntsWithOne | 4.979 ms | 0.0488 ms | 0.0433 ms |
Note how filling a byte array with 0 or 1 is significantly faster.
If you inspect the source code for Span<T>.Fill() you'll see this:
public void Fill(T value)
{
    if (Unsafe.SizeOf<T>() == 1)
    {
        // Special-case single-byte types like byte / sbyte / bool.
        // The runtime eventually calls memset, which can efficiently support large buffers.
        // We don't need to check IsReferenceOrContainsReferences because no references
        // can ever be stored in types this small.
        Unsafe.InitBlockUnaligned(ref Unsafe.As<T, byte>(ref _reference), Unsafe.As<T, byte>(ref value), (uint)_length);
    }
    else
    {
        // Call our optimized workhorse method for all other types.
        SpanHelpers.Fill(ref _reference, (uint)_length, value);
    }
}
This explains why filling a byte array is faster than filling an int array: it uses Unsafe.InitBlockUnaligned() for a byte array and SpanHelpers.Fill() for a non-byte array.
Unsafe.InitBlockUnaligned() happens to be more performant; it's implemented as an intrinsic which performs the following:
ldarg.0
ldarg.1
ldarg.2
unaligned. 0x1
initblk
ret
Whereas SpanHelpers.Fill() is much less optimised.
It tries its best, using vectorised instructions to fill the memory if possible, but it can't compete with initblk. (It's too long to post here, but you can follow that link to look at it.)
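For illustration, here is a minimal sketch (my addition, not part of the original answer, assuming the System.Runtime.CompilerServices.Unsafe API is available) that drives the same intrinsic directly on a byte array:

using System;
using System.Runtime.CompilerServices;

class InitBlockDemo
{
    static void Main()
    {
        var buffer = new byte[2_250_000];
        // The same intrinsic Span<byte>.Fill() dispatches to for
        // single-byte element types: one initblk over the whole block.
        Unsafe.InitBlockUnaligned(ref buffer[0], 1, (uint)buffer.Length);
        Console.WriteLine(buffer[buffer.Length - 1]); // prints 1
    }
}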
One thing this doesn't explain is why filling an int array with zeroes is slightly faster than filling it with ones. To explain this you'd have to look at the actual processor instructions that the JIT produces, but it's definitely faster to fill a block of bytes with all 0's than it is to fill a block of bytes with 1,0,0,0 (which it would have to do for an int value of 1).
It's probably down to the comparative speeds of instructions like rep stosb (for bytes) and rep stosw (for words).
The outlier in these results is that the unaligned.1 initblk opcode sequence is about 50% faster for the smaller block size. The other times all scale up by approximately the increase in size of the memory block, i.e. around 9 times slower for the blocks that are 9 times bigger.
So the remaining question is: Why is initblk 50% faster per-byte for smaller buffer sizes (2_250_000 versus 20_250_000 bytes)?

Cache-friendly and faster way - `InvokeMe()`

I am trying to figure out if this really is the fastest approach. I want this to be as fast as possible, cache-friendly, and have good time complexity.
DEMO: https://dotnetfiddle.net/BUGz8s
private static void InvokeMe()
{
    int hz = horizontal.GetLength(0) * horizontal.GetLength(1);
    int vr = vertical.GetLength(0) * vertical.GetLength(1);
    int hzcol = horizontal.GetLength(1);
    int vrcol = vertical.GetLength(1);
    // Determine true position from horizontal information:
    for (int i = 0; i < hz; i++)
    {
        if (horizontal[i / hzcol, i % hzcol] == true)
            System.Console.WriteLine("True, on position: {0},{1}", i / hzcol, i % hzcol);
    }
    // Determine true position from vertical information:
    for (int i = 0; i < vr; i++)
    {
        if (vertical[i / vrcol, i % vrcol] == true)
            System.Console.WriteLine("True, on position: {0},{1}", i / vrcol, i % vrcol);
    }
}
Pages I read:
Is there a "faster" way to iterate through a two-dimensional array than using nested for loops?
Fastest way to loop through a 2d array?
Time Complexity of a nested for loop that parses a matrix
Determining the big-O runtimes of these different loops?
EDIT: The code example is now closer to what I am dealing with. It's about determining a true point (x, y) in an N*N grid. The information available at my disposal is the horizontal and vertical 2D arrays.
To avoid confusion: imagine that, over time, some positions in vertical or horizontal get set to true. This currently works perfectly well. All I am asking about is the current approach of using one for-loop per 2D array, as above, instead of two nested loops per 2D array.
The time complexity of the single-loop and nested-loop approaches is the same - O(row * col) (which is O(n^2) for row == col, as in your example) - so any difference in execution time comes from the constant cost of the operations (since the traversal order is the same). You can use BenchmarkDotNet to measure those. The following benchmark:
using BenchmarkDotNet.Attributes;

[SimpleJob]
public class Loops
{
    int[,] matrix = new int[10, 10];

    [Benchmark]
    public void NestedLoops()
    {
        int row = matrix.GetLength(0);
        int col = matrix.GetLength(1);
        for (int i = 0; i < row; i++)
            for (int j = 0; j < col; j++)
            {
                matrix[i, j] = i * row + j + 1;
            }
    }

    [Benchmark]
    public void SingleLoop()
    {
        int row = matrix.GetLength(0);
        int col = matrix.GetLength(1);
        var l = row * col;
        for (int i = 0; i < l; i++)
        {
            matrix[i / col, i % col] = i + 1;
        }
    }
}
Gives on my machine:
| Method      |     Mean |    Error |   StdDev |   Median |
|------------ |---------:|---------:|---------:|---------:|
| NestedLoops | 144.5 ns |  2.94 ns |  4.58 ns | 144.7 ns |
| SingleLoop  | 578.2 ns | 11.37 ns | 25.42 ns | 568.6 ns |
Making the single loop actually slower.
If you change the loop body to some "dummy" operation - for example, incrementing an outer variable or updating a fixed (say, the first) element of the matrix - you will see that the performance of both loops is roughly the same.
Did you consider
for (int i = 0; i < row; i++)
{
    for (int j = 0; j < col; j++)
    {
        Console.Write(string.Format("{0:00} ", matrix[i, j]));
    }
    Console.Write(Environment.NewLine + Environment.NewLine);
}
It is basically the same loop as yours, but without the / and % that the compiler may or may not optimize.
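If you want to keep a single loop but drop the / and % entirely, one hypothetical variant (my sketch, not from the answer, reusing the matrix from the benchmark above) tracks the two indices manually:

int row = matrix.GetLength(0);
int col = matrix.GetLength(1);
int r = 0, c = 0;
// One loop over all elements; the 2-D indices are maintained
// incrementally, so there is no division or modulus per element.
for (int k = 0; k < row * col; k++)
{
    matrix[r, c] = k + 1;
    if (++c == col) { c = 0; r++; }
}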

Why is C# Array.BinarySearch so fast?

I have implemented a very simple binary search in C# for finding integers in an integer array:
Binary Search
static int binarySearch(int[] arr, int i)
{
    int low = 0, high = arr.Length - 1, mid;
    while (low <= high)
    {
        mid = (low + high) / 2;
        if (i < arr[mid])
            high = mid - 1;
        else if (i > arr[mid])
            low = mid + 1;
        else
            return mid;
    }
    return -1;
}
When comparing it to C#'s native Array.BinarySearch() I can see that Array.BinarySearch() is more than twice as fast as my function, every single time.
MSDN on Array.BinarySearch:
Searches an entire one-dimensional sorted array for a specific element, using the IComparable generic interface implemented by each element of the Array and by the specified object.
What makes this approach so fast?
Test code
using System;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        Random rnd = new Random();
        Stopwatch sw = new Stopwatch();
        const int ELEMENTS = 10000000;
        int temp;
        int[] arr = new int[ELEMENTS];
        for (int i = 0; i < ELEMENTS; i++)
            arr[i] = rnd.Next(int.MinValue, int.MaxValue);
        Array.Sort(arr);
        // Custom binarySearch
        sw.Restart();
        for (int i = 0; i < ELEMENTS; i++)
            temp = binarySearch(arr, i);
        sw.Stop();
        Console.WriteLine($"Elapsed time for custom binarySearch: {sw.ElapsedMilliseconds}ms");
        // C# Array.BinarySearch
        sw.Restart();
        for (int i = 0; i < ELEMENTS; i++)
            temp = Array.BinarySearch(arr, i);
        sw.Stop();
        Console.WriteLine($"Elapsed time for C# BinarySearch: {sw.ElapsedMilliseconds}ms");
    }

    static int binarySearch(int[] arr, int i)
    {
        int low = 0, high = arr.Length - 1, mid;
        while (low <= high)
        {
            mid = (low + high) / 2;
            if (i < arr[mid])
                high = mid - 1;
            else if (i > arr[mid])
                low = mid + 1;
            else
                return mid;
        }
        return -1;
    }
}
Test results
+------------+--------------+--------------------+
| Attempt No | binarySearch | Array.BinarySearch |
+------------+--------------+--------------------+
| 1 | 2700ms | 1099ms |
| 2 | 2696ms | 1083ms |
| 3 | 2675ms | 1077ms |
| 4 | 2690ms | 1093ms |
| 5 | 2700ms | 1086ms |
+------------+--------------+--------------------+
Your code is faster when run outside Visual Studio:
Yours vs Array's:
From VS - Debug mode: 3248 vs 1113
From VS - Release mode: 2932 vs 1100
Running exe - Debug mode: 3152 vs 1104
Running exe - Release mode: 559 vs 1104
Array's code might already be optimized in the framework, but it also does a lot more checking than your version (for instance, your version may overflow if arr.Length is greater than int.MaxValue / 2) and, as already said, is designed for a wide range of types, not just int[].
So, basically, yours is slower only when you are debugging your code, because Array's code always runs as optimized release code with less overhead behind the scenes.
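As an aside, the overflow noted above is easy to avoid with the standard midpoint rewrite (my addition, not part of the original answer):

// low + high can exceed int.MaxValue for very large arrays;
// high - low cannot, so this midpoint is overflow-safe.
mid = low + (high - low) / 2;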

Result not matching. Floating point error?

I am trying to rewrite the R function acf, which computes autocorrelation, in C#:
using System;
using System.Collections.Generic;
using System.Linq;

class AC
{
    static void Main(string[] args)
    {
        double[] y = new double[] { 772.9, 909.4, 1080.3, 1276.2, 1380.6, 1354.8, 1096.9, 1066.7, 1108.7, 1109, 1203.7, 1328.2, 1380, 1435.3, 1416.2, 1494.9, 1525.6, 1551.1, 1539.2, 1629.1, 1665.3, 1708.7, 1799.4, 1873.3, 1973.3, 2087.6, 2208.3, 2271.4, 2365.6, 2423.3, 2416.2, 2484.8, 2608.5, 2744.1, 2729.3, 2695, 2826.7, 2958.6, 3115.2, 3192.4, 3187.1, 3248.8, 3166, 3279.1, 3489.9, 3585.2, 3676.5 };
        Console.WriteLine(String.Join("\n", acf(y, 17)));
        Console.Read();
    }

    public static double[] acf(double[] series, int maxlag)
    {
        List<double> acf_values = new List<double>();
        float flen = (float)series.Length;
        float xbar = ((float)series.Sum()) / flen;
        int N = series.Length;
        double variance = 0.0;
        for (int j = 0; j < N; j++)
        {
            variance += (series[j] - xbar) * (series[j] - xbar);
        }
        variance = variance / N;
        for (int lag = 0; lag < maxlag + 1; lag++)
        {
            if (lag == 0)
            {
                acf_values.Add(1.0);
                continue;
            }
            double autocv = 0.0;
            for (int k = 0; k < N - lag; k++)
            {
                autocv += (series[k] - xbar) * (series[lag + k] - xbar);
            }
            autocv = autocv / (N - lag);
            acf_values.Add(autocv / variance);
        }
        return acf_values.ToArray();
    }
}
I have two problems with this code:
1. For large arrays (length = 25000), this code takes about 1-2 seconds, whereas R's acf function returns in less than 200 ms.
2. The output does not match R's output exactly.
Any suggestions on where I messed up, or any optimizations to the code?
        C#            R
 1   1             1
 2   0.945805846   0.925682317
 3   0.89060465    0.85270658
 4   0.840762283   0.787096604
 5   0.806487301   0.737850083
 6   0.780259665   0.697253317
 7   0.7433111     0.648420319
 8   0.690344341   0.587527097
 9   0.625632533   0.519141887
10   0.556860982   0.450228026
11   0.488922355   0.38489632
12   0.425406196   0.325843042
13   0.367735169   0.273845337
14   0.299647764   0.216766466
15   0.22344712    0.156888402
16   0.14575994    0.099240809
17   0.072389526   0.047746281
18  -0.003238526  -0.002067146
You might try changing this line:
autocv = autocv / (N - lag);
to this:
autocv = autocv / N;
Either of these is an acceptable divisor for the expected value, and R is clearly using the second one.
To see this without having access to a C# compiler, we can read in the table that you have, and adjust the values by dividing each value in the C# column by N/(N - lag), and see that they agree with the values from R.
N is 47 here, and lag ranges from 0 to 17, so N - lag is 47:30.
After copying the table above into my local clipboard:
cr <- read.table(file='clipboard', comment='', check.names=FALSE)
cr$adj <- cr[[1]]/47*(47:30)
max(abs(cr$R - cr$adj))
## [1] 2.2766e-09
A much closer approximation.
You might do better if you define flen and xbar as type double, since floats do not have 9 decimal digits of precision.
The reason that R is so much faster is that acf is implemented as native and non-managed code (either C or FORTRAN).
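Combining both fixes - dividing by N instead of N - lag, and computing the mean in double precision - a drop-in replacement for the acf method above might look like this (my sketch, using the same usings as the class above; not verified against R beyond the reasoning here):

public static double[] acf(double[] series, int maxlag)
{
    int N = series.Length;
    double xbar = series.Sum() / N;              // mean in double precision
    double variance = 0.0;
    for (int j = 0; j < N; j++)
        variance += (series[j] - xbar) * (series[j] - xbar);
    variance /= N;

    var acf_values = new List<double> { 1.0 };   // lag 0 is always 1
    for (int lag = 1; lag <= maxlag; lag++)
    {
        double autocv = 0.0;
        for (int k = 0; k < N - lag; k++)
            autocv += (series[k] - xbar) * (series[lag + k] - xbar);
        autocv /= N;                             // divide by N, as R's acf does
        acf_values.Add(autocv / variance);
    }
    return acf_values.ToArray();
}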

Fill a multidimensional array with same values C#

Is there a faster way of doing this using C#?
double[,] myArray = new double[length1, length2];
for (int i = 0; i < length1; i++)
    for (int j = 0; j < length2; j++)
        myArray[i, j] = double.PositiveInfinity;
I remember that in C++ there was something called memset() for doing this kind of thing...
A multi-dimensional array is just a large block of memory, so we can treat it like one, similar to how memset() works. This requires unsafe code. I wouldn't say it's worth doing unless it's really performance critical. This is a fun exercise, though, so here are some benchmarks using BenchmarkDotNet:
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using BenchmarkDotNet.Attributes;

public class ArrayFillBenchmark
{
    const int length1 = 1000;
    const int length2 = 1000;
    readonly double[,] _myArray = new double[length1, length2];

    [Benchmark]
    public void MultidimensionalArrayLoop()
    {
        for (int i = 0; i < length1; i++)
            for (int j = 0; j < length2; j++)
                _myArray[i, j] = double.PositiveInfinity;
    }

    [Benchmark]
    public unsafe void MultidimensionalArrayNaiveUnsafeLoop()
    {
        fixed (double* a = &_myArray[0, 0])
        {
            double* b = a;
            for (int i = 0; i < length1; i++)
                for (int j = 0; j < length2; j++)
                    *b++ = double.PositiveInfinity;
        }
    }

    [Benchmark]
    public unsafe void MultidimensionalSpanFill()
    {
        fixed (double* a = &_myArray[0, 0])
        {
            double* b = a;
            var span = new Span<double>(b, length1 * length2);
            span.Fill(double.PositiveInfinity);
        }
    }

    [Benchmark]
    public unsafe void MultidimensionalSseFill()
    {
        var vectorPositiveInfinity = Vector128.Create(double.PositiveInfinity);
        fixed (double* a = &_myArray[0, 0])
        {
            double* b = a;
            ulong i = 0;
            int size = Vector128<double>.Count;
            ulong length = length1 * length2;
            for (; i < (length & ~(ulong)15); i += 16)
            {
                Sse2.Store(b + size * 0, vectorPositiveInfinity);
                Sse2.Store(b + size * 1, vectorPositiveInfinity);
                Sse2.Store(b + size * 2, vectorPositiveInfinity);
                Sse2.Store(b + size * 3, vectorPositiveInfinity);
                Sse2.Store(b + size * 4, vectorPositiveInfinity);
                Sse2.Store(b + size * 5, vectorPositiveInfinity);
                Sse2.Store(b + size * 6, vectorPositiveInfinity);
                Sse2.Store(b + size * 7, vectorPositiveInfinity);
                b += size * 8;
            }
            for (; i < (length & ~(ulong)7); i += 8)
            {
                Sse2.Store(b + size * 0, vectorPositiveInfinity);
                Sse2.Store(b + size * 1, vectorPositiveInfinity);
                Sse2.Store(b + size * 2, vectorPositiveInfinity);
                Sse2.Store(b + size * 3, vectorPositiveInfinity);
                b += size * 4;
            }
            for (; i < (length & ~(ulong)3); i += 4)
            {
                Sse2.Store(b + size * 0, vectorPositiveInfinity);
                Sse2.Store(b + size * 1, vectorPositiveInfinity);
                b += size * 2;
            }
            for (; i < length; i++)
            {
                *b++ = double.PositiveInfinity;
            }
        }
    }
}
Results:
| Method | Mean | Error | StdDev | Ratio |
|------------------------------------- |-----------:|----------:|----------:|------:|
| MultidimensionalArrayLoop | 1,083.1 us | 11.797 us | 11.035 us | 1.00 |
| MultidimensionalArrayNaiveUnsafeLoop | 436.2 us | 8.567 us | 8.414 us | 0.40 |
| MultidimensionalSpanFill | 321.2 us | 6.404 us | 10.875 us | 0.30 |
| MultidimensionalSseFill | 231.9 us | 4.616 us | 11.323 us | 0.22 |
MultidimensionalArrayLoop is slow because of bounds checking. The JIT emits code on each iteration that makes sure [i, j] is inside the bounds of the array. The JIT can sometimes elide bounds checking; I know it does so for single-dimensional arrays, but I'm not sure it does for multi-dimensional ones.
MultidimensionalArrayNaiveUnsafeLoop is essentially the same code as MultidimensionalArrayLoop but without bounds checking. It's considerably faster, taking 40% of the time. It's considered 'naive', though, because it could still be improved by unrolling the loop.
MultidimensionalSpanFill also has no bounds check and is more-or-less the same as MultidimensionalArrayNaiveUnsafeLoop; however, Span.Fill internally does loop unrolling, which is why it's a bit faster than our naive unsafe loop. It takes only 30% of the time of our original.
MultidimensionalSseFill improves on our first unsafe loop by doing two things: loop unrolling and vectorizing. This requires a CPU with Sse2 support, but it allows us to write 128-bits (16 bytes) in a single instruction. This gives us an additional speed boost, taking it down to 22% of the original. Interestingly, this same loop with Avx (256-bits) was consistently slower than the Sse2 version, so that benchmark is not included here.
But these numbers only apply to an array that is 1000x1000. As you change the size of the array, the results differ. For example, when we change the array size to 10000x10000, the results for all of the unsafe benchmarks are very close. Probably because there are more memory fetches for the larger array that it tends to equalize the smaller iterative improvements seen in the last three benchmarks.
There's a lesson in there somewhere, but I mostly just wanted to share these results, since it was a pretty fun experiment to do.
I wrote a method that is not faster, but it works with arrays of any rank, not only 2D.
public static class ArrayExtensions
{
    public static void Fill(this Array array, object value)
    {
        var indices = new int[array.Rank];
        Fill(array, 0, indices, value);
    }

    public static void Fill(Array array, int dimension, int[] indices, object value)
    {
        if (dimension < array.Rank)
        {
            for (int i = array.GetLowerBound(dimension); i <= array.GetUpperBound(dimension); i++)
            {
                indices[dimension] = i;
                Fill(array, dimension + 1, indices, value);
            }
        }
        else
            array.SetValue(value, indices);
    }
}
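Usage (my illustration) looks like this; every element goes through Array.SetValue with a boxed value, which is why it is not faster:

// Works for arrays of any rank, e.g. a 3-D array:
double[,,] grid = new double[10, 10, 10];
grid.Fill(double.PositiveInfinity);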
double[,] myArray = new double[x, y];
if (parallel == true)
{
    stopWatch.Start();
    System.Threading.Tasks.Parallel.For(0, x, i =>
    {
        for (int j = 0; j < y; ++j)
            myArray[i, j] = double.PositiveInfinity;
    });
    stopWatch.Stop();
    Print("Elapsed milliseconds: {0}", stopWatch.ElapsedMilliseconds);
}
else
{
    stopWatch.Start();
    for (int i = 0; i < x; ++i)
        for (int j = 0; j < y; ++j)
            myArray[i, j] = double.PositiveInfinity;
    stopWatch.Stop();
    Print("Elapsed milliseconds: {0}", stopWatch.ElapsedMilliseconds);
}
When setting x and y to 10000 I get 553 milliseconds for the single-threaded approach and 170 for the multi-threaded one.
There is also a way to quickly fill a multidimensional array that does not use the unsafe keyword (see the answers to this question); one such approach is sketched below.
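For example, a minimal sketch (my assumption of what such an answer looks like, using MemoryMarshal.CreateSpan, available since .NET Core 2.1):

using System;
using System.Runtime.InteropServices;

class SafeFillDemo
{
    static void Main()
    {
        double[,] myArray = new double[1000, 1000];
        // Wrap the array's contiguous storage in a Span<double>
        // without any unsafe code, then fill it in one call.
        MemoryMarshal.CreateSpan(ref myArray[0, 0], myArray.Length)
                     .Fill(double.PositiveInfinity);
        Console.WriteLine(myArray[999, 999]); // Infinity
    }
}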
