System.Numerics.Vector brings SIMD support to .NET Core and to .NET Framework 4.6+.
// Baseline
public void SimpleSumArray()
{
for (int i = 0; i < left.Length; i++)
results[i] = left[i] + right[i];
}
// Using Vector<T> for SIMD support
public void SimpleSumVectors()
{
int ceiling = left.Length / floatSlots * floatSlots;
for (int i = 0; i < ceiling; i += floatSlots)
{
Vector<float> v1 = new Vector<float>(left, i);
Vector<float> v2 = new Vector<float>(right, i);
(v1 + v2).CopyTo(results, i);
}
for (int i = ceiling; i < left.Length; i++)
{
results[i] = left[i] + right[i];
}
}
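The snippets above leave `left`, `right`, `results` and `floatSlots` undefined. A minimal self-contained sketch of the same technique, assuming `floatSlots` is meant to be `Vector<float>.Count` (the number of float lanes in one hardware vector); the `VectorSum.Add` helper is a hypothetical name used only for illustration:

```csharp
using System;
using System.Numerics;

public static class VectorSum
{
    // Hypothetical helper: element-wise sum of two equally sized arrays.
    public static float[] Add(float[] left, float[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var results = new float[left.Length];

        // Assumption: floatSlots in the original code equals Vector<float>.Count.
        int floatSlots = Vector<float>.Count;
        int ceiling = left.Length / floatSlots * floatSlots;

        // Vectorized body: floatSlots elements per iteration.
        for (int i = 0; i < ceiling; i += floatSlots)
        {
            var v1 = new Vector<float>(left, i);
            var v2 = new Vector<float>(right, i);
            (v1 + v2).CopyTo(results, i);
        }

        // Scalar tail: elements that do not fill a whole vector.
        for (int i = ceiling; i < left.Length; i++)
            results[i] = left[i] + right[i];

        return results;
    }
}
```

`Vector.IsHardwareAccelerated` can be checked at run time to see whether the JIT actually maps these operations to SIMD instructions.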
Unfortunately, initializing the Vector can be the limiting step. To work around this, several sources recommend using MemoryMarshal.Cast to reinterpret the source array as a span of Vectors, which avoids the copy [1][2]. For example:
// Improving Vector<T> Initialization Performance
public void SimpleSumVectorsNoCopy()
{
int numVectors = left.Length / floatSlots;
int ceiling = numVectors * floatSlots;
// leftMemory is simply a ReadOnlyMemory<float> referring to the "left" array
ReadOnlySpan<Vector<float>> leftVecArray = MemoryMarshal.Cast<float, Vector<float>>(leftMemory.Span);
ReadOnlySpan<Vector<float>> rightVecArray = MemoryMarshal.Cast<float, Vector<float>>(rightMemory.Span);
Span<Vector<float>> resultsVecArray = MemoryMarshal.Cast<float, Vector<float>>(resultsMemory.Span);
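for (int i = 0; i < numVectors; i++)
resultsVecArray[i] = leftVecArray[i] + rightVecArray[i];
// Handle the elements that do not fill a whole vector
for (int i = ceiling; i < left.Length; i++)
results[i] = left[i] + right[i];
}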
This brings a dramatic improvement in performance when running on .NET Core:
| Method | Mean | Error | StdDev |
|----------------------- |----------:|----------:|----------:|
| SimpleSumArray | 165.90 us | 0.1393 us | 0.1303 us |
| SimpleSumVectors | 53.69 us | 0.0473 us | 0.0443 us |
| SimpleSumVectorsNoCopy | 31.65 us | 0.1242 us | 0.1162 us |
Unfortunately, on .NET Framework, this way of initializing the vector has the opposite effect. It actually leads to worse performance:
| Method | Mean | Error | StdDev |
|----------------------- |----------:|---------:|---------:|
| SimpleSumArray | 152.92 us | 0.128 us | 0.114 us |
| SimpleSumVectors | 52.35 us | 0.041 us | 0.038 us |
| SimpleSumVectorsNoCopy | 77.50 us | 0.089 us | 0.084 us |
Is there a way to optimize the initialization of Vector on .NET Framework and get similar performance to .NET Core? Measurements have been performed using this sample application [1].
[1] https://github.com/CBGonzalez/SIMDPerformance
[2] https://stackoverflow.com/a/62702334/430935
As far as I know, the only efficient way to load a vector in .NET Framework 4.6 or 4.7 (presumably this will all change in 5.0) is with unsafe code, for example using Unsafe.Read<Vector<float>> (or its unaligned variant, Unsafe.ReadUnaligned, if applicable):
public unsafe void SimpleSumVectors()
{
int ceiling = left.Length / floatSlots * floatSlots;
fixed (float* leftp = left, rightp = right, resultsp = results)
{
for (int i = 0; i < ceiling; i += floatSlots)
{
Unsafe.Write(resultsp + i,
Unsafe.Read<Vector<float>>(leftp + i) + Unsafe.Read<Vector<float>>(rightp + i));
}
}
for (int i = ceiling; i < left.Length; i++)
{
results[i] = left[i] + right[i];
}
}
This uses the System.Runtime.CompilerServices.Unsafe package which you can get via NuGet, but it could be done without that too.
I've run into something strange: AsSpan().Fill is twice as fast on a byte[] array as on an int[] or float[] array, even though all of them have the same size in bytes. It depends on the array size, though: on small arrays the speed is the same, and the difference only shows on larger ones.
Here is a sample console application to illustrate:
internal unsafe class Program {
static byte[]? ByteFrame;
static Int32[]? Int32Frame;
static float[]? FloatFrame;
static int[]? ResetCacheArray;
static void Main(string[] args) {
// size vars
int Width = 1500;
int Height = 1500;
// Init frames
ByteFrame = new byte[Width * Height * 4];
ByteFrame.AsSpan().Fill(0);
Int32Frame = new Int32[Width * Height];
Int32Frame.AsSpan().Fill(0);
FloatFrame = new float[Width * Height];
FloatFrame.AsSpan().Fill(1);
ResetCacheArray = new int[10000 * 10000];
ResetCacheArray.AsSpan().Fill(1);
// warmup jitter
for(int i = 0; i < 200; i++) {
ClearByteFrameAsSpanFill(0);
ClearInt32FrameAsSpanFill(0);
ClearFloatFrameAsSpanFill(0f);
ClearCache();
}
Console.WriteLine(Environment.Is64BitProcess);
int TestIterations;
double nanoseconds;
double MsDuration;
double MB = 0;
double MBSec;
double GBSec;
TestIterations = 1;
nanoseconds = 1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency;
for (int i = 0; i < TestIterations; i++) {
MB = ClearByteFrameAsSpanFill(0);
}
MsDuration = (((1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency) - nanoseconds) / TestIterations) / 1000000;
MBSec = (MB / MsDuration) * 1000;
GBSec = MBSec / 1000;
Console.WriteLine("ClearByteFrameAsSpanFill: MS:" + MsDuration + " GB/s:" + (int)GBSec + " MB/s:" + (int)MBSec);
ClearCache();
TestIterations = 1;
nanoseconds = 1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency;
for (int i = 0; i < TestIterations; i++) {
MB = ClearInt32FrameAsSpanFill(1);
}
MsDuration = (((1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency) - nanoseconds) / TestIterations) / 1000000;
MBSec = (MB / MsDuration) * 1000;
GBSec = MBSec / 1000;
Console.WriteLine("ClearInt32FrameAsSpanFill: MS:" + MsDuration + " GB/s:" + (int)GBSec + " MB/s:" + (int)MBSec);
ClearCache();
TestIterations = 1;
nanoseconds = 1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency;
for (int i = 0; i < TestIterations; i++) {
MB = ClearFloatFrameAsSpanFill(1f);
}
MsDuration = (((1_000_000_000.0 * Stopwatch.GetTimestamp() / Stopwatch.Frequency) - nanoseconds) / TestIterations) / 1000000;
MBSec = (MB / MsDuration) * 1000;
GBSec = MBSec / 1000;
Console.WriteLine("ClearFloatFrameAsSpanFill: MS:" + MsDuration + " GB/s:" + (int)GBSec + " MB/s:" + (int)MBSec);
ClearCache();
Console.ReadLine();
}
static double ClearByteFrameAsSpanFill(byte clearValue) {
ByteFrame.AsSpan().Fill(clearValue);
// Megabytes written; floating-point division avoids truncation
return ByteFrame.Length / 1_000_000.0;
}
static double ClearInt32FrameAsSpanFill(Int32 clearValue) {
Int32Frame.AsSpan().Fill(clearValue);
return (Int32Frame.Length * 4) / 1_000_000.0;
}
static double ClearFloatFrameAsSpanFill(float clearValue) {
FloatFrame.AsSpan().Fill(clearValue);
return (FloatFrame.Length * 4) / 1_000_000.0;
}
static void ClearCache() {
int sum = 0;
for (int i = 0; i < ResetCacheArray.Length; i++) {
sum += ResetCacheArray[i];
}
}
}
On my machine it outputs the following:
ClearByteFrameAsSpanFill: MS:0,4913 GB/s:18 MB/s:18318
ClearInt32FrameAsSpanFill: MS:0,4851 GB/s:18 MB/s:18552
ClearFloatFrameAsSpanFill: MS:0,458 GB/s:19 MB/s:19650
It varies a little from run to run, plus or minus a few GB/s, but each operation takes roughly the same amount of time.
Now when I change the size variables to Width = 4500, Height = 4500, it outputs the following:
ClearByteFrameAsSpanFill: MS:3,4015 GB/s:23 MB/s:23813
ClearInt32FrameAsSpanFill: MS:7,635 GB/s:10 MB/s:10609
ClearFloatFrameAsSpanFill: MS:7,4429 GB/s:10 MB/s:10882
This will obviously vary with RAM speed from machine to machine, but on mine at least the pattern holds: on "small" arrays the speed is the same, but on large arrays filling a byte array is twice as fast as filling an int or float array of the same byte length.
Does anyone have an explanation of this?
You are testing filling the byte array with 0 and filling the int array with 1:
ClearByteFrameAsSpanFill(0);
ClearInt32FrameAsSpanFill(1);
These cases have different optimisations.
If you fill an array of bytes with any value it will be around the same speed, because there's a processor instruction to fill a block of bytes with a specific byte value.
Although there may be processor instructions to fill an array of int or float values with non-zero values, they are likely to be slower than filling the block of memory with zero values.
I tried this out with the following code using BenchmarkDotNet:
[SimpleJob(RuntimeMoniker.Net60)]
public class UnderTest
{
[Benchmark]
public void FillBytesWithZero()
{
_bytes.AsSpan().Fill(0);
}
[Benchmark]
public void FillBytesWithOne()
{
_bytes.AsSpan().Fill(1);
}
[Benchmark]
public void FillIntsWithZero()
{
_ints.AsSpan().Fill(0);
}
[Benchmark]
public void FillIntsWithOne()
{
_ints.AsSpan().Fill(1);
}
const int COUNT = 1500 * 1500;
static readonly byte[] _bytes = new byte[COUNT * sizeof(int)];
static readonly int[] _ints = new int[COUNT];
}
With the following results:
For COUNT = 1500 * 1500:
| Method | Mean | Error | StdDev | Median |
|------------------ |---------:|---------:|---------:|---------:|
| FillBytesWithZero | 299.7 us | 7.82 us | 22.95 us | 299.3 us |
| FillBytesWithOne | 305.6 us | 11.46 us | 33.80 us | 293.3 us |
| FillIntsWithZero | 322.4 us | 2.37 us | 2.10 us | 321.6 us |
| FillIntsWithOne | 502.9 us | 27.68 us | 81.60 us | 534.4 us |
For COUNT = 4500 * 4500:
| Method | Mean | Error | StdDev |
|------------------ |---------:|----------:|----------:|
| FillBytesWithZero | 2.554 ms | 0.0307 ms | 0.0240 ms |
| FillBytesWithOne | 2.632 ms | 0.0522 ms | 0.1101 ms |
| FillIntsWithZero | 4.169 ms | 0.0258 ms | 0.0229 ms |
| FillIntsWithOne | 4.979 ms | 0.0488 ms | 0.0433 ms |
Note how filling a byte array with 0 or 1 is significantly faster.
If you inspect the source code for Span<T>.Fill() you'll see this:
public void Fill(T value)
{
if (Unsafe.SizeOf<T>() == 1)
{
// Special-case single-byte types like byte / sbyte / bool.
// The runtime eventually calls memset, which can efficiently support large buffers.
// We don't need to check IsReferenceOrContainsReferences because no references
// can ever be stored in types this small.
Unsafe.InitBlockUnaligned(ref Unsafe.As<T, byte>(ref _reference), Unsafe.As<T, byte>(ref value), (uint)_length);
}
else
{
// Call our optimized workhorse method for all other types.
SpanHelpers.Fill(ref _reference, (uint)_length, value);
}
}
This explains why filling a byte array is faster than filling an int array: It uses Unsafe.InitBlockUnaligned() for a byte array and SpanHelpers.Fill(ref _reference, (uint)_length, value); for a non-byte array.
Unsafe.InitBlockUnaligned() happens to be more performant; it's implemented as an intrinsic which performs the following:
ldarg.0
ldarg.1
ldarg.2
unaligned. 0x1
initblk
ret
Whereas SpanHelpers.Fill() is much less optimised.
It tries its best, using vectorised instructions to fill the memory if possible, but it can't compete with initblk. (It's too long to post here, but you can follow that link to look at it.)
One thing this doesn't explain is why filling an int array with zeroes is slightly faster than filling it with ones. To explain this you'd have to look at the actual processor instructions that the JIT produces, but it's definitely faster to fill a block of bytes with all 0's than it is to fill a block of bytes with 1,0,0,0 (which it would have to do for an int value of 1).
It's probably down to the comparative speeds of instructions like rep stosb (for bytes) and rep stosd (for 32-bit values).
The outlier in these results is that the unaligned.1 initblk opcode sequence is about 50% faster for the smaller block size. The other times all scale up by approximately the increase in size of the memory block, i.e. around 9 times slower for the blocks that are 9 times bigger.
So the remaining question is: Why is initblk 50% faster per byte for smaller buffer sizes (9_000_000 versus 81_000_000 bytes, i.e. COUNT values of 2_250_000 versus 20_250_000)?
I'm struggling to understand why my use of the intrinsics API is slower than a plain sum with a foreach loop.
public class ArraySum
{
private double[] data;
public ArraySum()
{
if (!Avx.IsSupported)
{
throw new Exception("Avx is not supported");
}
var rnd = new Random();
var list = new List<double>();
for (int i = 0; i < 100_000; i++)
{
list.Add(rnd.Next(500));
}
data = list.ToArray();
}
[Benchmark]
public void Native()
{
int result = 0;
foreach (int i in data)
{
result += i;
}
Console.WriteLine($"Native: {result}");
}
[Benchmark]
public unsafe void Intrinsics()
{
int vectorSize = 256 / 8 / 4;
var accVector = Vector256<double>.Zero;
int i;
var array = data;
fixed (double* ptr = array)
{
for (i = 0; i <= array.Length - vectorSize; i += vectorSize)
{
var v = Avx.LoadVector256(ptr + i);
accVector = Avx.Add(accVector, v);
}
}
double result = 0;
var temp = stackalloc double[vectorSize];
Avx.Store(temp, accVector);
for (int j = 0; j < vectorSize; j++)
{
result += temp[j];
}
for (; i < array.Length; i++)
{
result += array[i];
}
Console.WriteLine($"Intrinsics: {result}");
}
Result:
.NET SDK=6.0.100-rc.2.21505.57
| Method | Mean | Error | StdDev | Median |
|----------- |---------:|---------:|---------:|---------:|
| Native | 387.6 us | 12.15 us | 35.83 us | 405.8 us |
| Intrinsics | 393.2 us | 9.01 us | 25.70 us | 385.0 us |
What may be causing this?
It's running on Windows with an Intel Core i5-3340M CPU 2.70GHz (Ivy Bridge), if that matters.
BenchmarkDotNet warns that ArraySum.Native: Default -> It seems that the distribution is bimodal (mValue = 3.92)
I just realized that the native method should sum doubles, not ints:
[Benchmark]
public void Native()
{
double result = 0;
foreach (double i in data)
{
result += i;
}
Console.WriteLine($"Native: {result}");
}
| Method | Mean | Error | StdDev | Median |
|----------- |---------:|---------:|---------:|---------:|
| Native | 415.1 us | 25.35 us | 73.95 us | 385.9 us |
| Intrinsics | 388.7 us | 7.58 us | 21.74 us | 384.7 us |
There is another problem, though: Console.WriteLine probably adds overhead that is much higher than the time spent computing the sum, which skews the results. With the methods returning the result instead, the difference is far more significant:
[Benchmark]
public double Native()
{
double result = 0;
foreach (double i in data)
{
result += i;
}
return result;
}
[Benchmark]
public unsafe double Intrinsics()
{
int vectorSize = 256 / 8 / 4;
var accVector = Vector256<double>.Zero;
int i;
var array = data;
fixed (double* ptr = array)
{
for (i = 0; i <= array.Length - vectorSize; i += vectorSize)
{
var v = Avx.LoadVector256(ptr + i);
accVector = Avx.Add(accVector, v);
}
}
double result = 0;
var temp = stackalloc double[vectorSize];
Avx.Store(temp, accVector);
for (int j = 0; j < vectorSize; j++)
{
result += temp[j];
}
for (; i < array.Length; i++)
{
result += array[i];
}
return result;
}
| Method | Mean | Error | StdDev |
|----------- |---------:|---------:|---------:|
| Native | 92.92 us | 1.547 us | 1.447 us |
| Intrinsics | 25.06 us | 0.459 us | 1.090 us |
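One detail worth checking in the code above: `256 / 8 / 4` evaluates to 8, whereas a `Vector256<double>` holds 4 doubles (256 bits at 64 bits each). A loop that advances by 8 while loading 4 elements would skip half of the array, so part of the measured speed-up may come from doing less work. The following is a hedged sketch that takes the lane count from `Vector256<double>.Count` instead. The `SumSketch` class and the scalar fallback are illustrative additions, not part of the original benchmark, and the code assumes .NET Core 3.0 or later with unsafe blocks enabled:

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class SumSketch
{
    public static unsafe double Sum(double[] array)
    {
        double result = 0;
        int i = 0;

        if (Avx.IsSupported)
        {
            int vectorSize = Vector256<double>.Count; // 4 lanes of double
            var accVector = Vector256<double>.Zero;

            fixed (double* ptr = array)
            {
                // Each iteration loads and accumulates vectorSize elements.
                for (; i <= array.Length - vectorSize; i += vectorSize)
                    accVector = Avx.Add(accVector, Avx.LoadVector256(ptr + i));
            }

            // Horizontal sum of the accumulator lanes.
            double* temp = stackalloc double[vectorSize];
            Avx.Store(temp, accVector);
            for (int j = 0; j < vectorSize; j++)
                result += temp[j];
        }

        // Scalar tail (or the whole array when AVX is unavailable).
        for (; i < array.Length; i++)
            result += array[i];

        return result;
    }
}
```

Comparing the returned value against the plain loop's result is a cheap way to confirm that both benchmark methods do the same work.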
I have a scenario where the user inputs a number that acts as the multiplier for an image (the image isn't from a file; it's just text on the console built from special characters). It could be thought of as ASCII art, but I didn't go the route of using ASCII values here. Instead I am using multiple strings, and I want to multiply only specific characters within each string, so that the result keeps the proportions of the original 'image' but is bigger, depending on what the user picks.
I've tried going down the ASCII route, but my code was pretty messy. I tried to make each line of the image a separate method and call each of them through a parameter, but I never got to the parameter part because I got confused about how to change the specific characters mentioned above.
static void Main (string[] args)
{
string edgeBorder = "#================#";
int multiplier;
multiplier = Int32.Parse(Console.ReadLine());
Console.WriteLine(multiplier);
Console.WriteLine("Sure! Coming right up...");
//Top layer of quilt
for (int i = 0; i < multiplier; i++)
{
Console.Write(edgeBorder + " ");
}
Console.Write("\n");
//top half of quilt
for (int line = 1; line <= 4; line++)
{
for (int s = 0, s < (8 - 2 * line) * multiplier; s++)
{ //s for space
Console.Write(" ");
}
Console.Write("|" + " ");
if (line == 1)
{
for (int d = 0; d < 2 * multiplier; d++)
{ //d for diamond
Console.Write("<>");
}
}
else
{
Console.Write("<>");
for (int p = 0; p < 4 * multiplier * (line - 1); p++)
{ //p for period
Console.Write(".");
}
Console.Write("<>");
}
Console.Write(" " + "|");
for (int s = 0, s < (8 - 2 * line) * multiplier; s++)
{ //s for space
Console.Write(" ");
}
Console.Write("\n");
}
//bottom half of quilt
for (int line = 4; line >= 1; line--)
{
for (int s = 0, s < (8 - 2 * line) * multiplier; s++)
{ //s for space
Console.Write(" ");
}
Console.Write("|" + " ");
if (line == 1)
{
for (int d = 0; d < 2 * multiplier; d++)
{ //d for diamond
Console.Write("<>");
}
}
else
{
Console.Write("<>");
for (int p = 0; p < 4 * multiplier * (line - 1); p++)
{ //p for period
Console.Write(".");
}
Console.Write("<>");
}
Console.Write(" " + "|");
for (int s = 0, s < (8 - 2 * line) * multiplier; s++)
{ //s for space
Console.Write(" ");
}
Console.Write("\n");
}
//bottom layer of quilt
for (int i = 0; i < multiplier; i++)
{
Console.Write(edgeBorder + " ");
}
}
//I went a different route, and decided not to use an array, only issue is it
//keeps telling me 's' is already defined in the scope, but when I change it to
//something different it says the same exact thing
/*$================$
| <><> |
| <>....<> |
| <>........<> |
|<>............<>|
|<>............<>|
| <>........<> |
| <>....<> |
| <><> |
$================$
and be this if multiplied by 2
$================$ $================$
| <><><><> |
| <>........<> |
| <>................<> |
|<>........................<>|
|<>........................<>|
| <>................<> |
| <>........<> |
| <><><><> |
$================$ #================$
I tried putting this into the "what's expected but it wouldn't format properly*/
I'm expecting an 'image' to be shown in the console, that's at whatever scale the user wants it to be. I'm lost on how to implement my idea to the code.
I put the expected part in the code section, it wouldn't format properly.
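The "'s' is already defined in the scope" error most likely comes from the loop headers written as `for (int s = 0, s < ...; s++)`. With a comma after `0`, the compiler reads `, s` as a second declaration of `s` in the same statement, which is why renaming the variable produces the same message. The initializer and the condition need to be separated by a semicolon. A minimal sketch of the corrected padding loop, using a hypothetical `Padding` helper that returns the spaces instead of writing them to the console:

```csharp
using System;
using System.Text;

public static class QuiltSketch
{
    // Hypothetical helper: the leading spaces for a given quilt line.
    public static string Padding(int line, int multiplier)
    {
        var sb = new StringBuilder();
        // Note the semicolon (not a comma) after "int s = 0".
        for (int s = 0; s < (8 - 2 * line) * multiplier; s++)
        {
            sb.Append(' ');
        }
        return sb.ToString();
    }
}
```

For example, `Console.Write(QuiltSketch.Padding(line, multiplier));` would replace each of the space loops in the original code.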
A. Inflate
Algorithm
image_width = firstLine.Length;
for each line:
if( first or last)
repeat with a space in between
else
pad right until width == image_width
repeat each air molecule (the dot) scale times
( pad left, pad right ) until width == image_width
Code
private static IEnumerable<string> Inflate(string[] lines, int scale, string air)
{
// image_width = firstLine.Length;
// for each line:
// if( first or last)
// repeat with a space in between
// else
// pad right until width == image_width
// repeat each air molecule (the dot) scale times
// ( pad left, pad right ) until width == image_width
var imageWidth = lines[0].TrimEnd().Length;
return lines.Select((line, i) =>{
if (i == 0 || i == lines.Length - 1)
return string.Join("", Enumerable.Repeat(line.TrimEnd(), scale));
line = line.PadRight(imageWidth, ' ');
line = line.Replace(air, string.Join("", Enumerable.Repeat(air, scale)));
while (line.Length < imageWidth * scale) line = " " + line + " ";
return line;
});
}
private static string Inflate(string input, int scale, string air)
=> string.Join(Environment.NewLine, Inflate(
input.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries), scale, air));
Test code
// I manually removed '/*' from the input
// Please note that the first line ends with a space,
// but this space is trimmed and is not considered part of the image
string input = @"$================$
| <><> |
| <>....<> |
| <>........<> |
|<>............<>|
|<>............<>|
| <>........<> |
| <>....<> |
| <><> |
$================$";
Console.WriteLine(input);
Console.WriteLine(Inflate(input, air: ".", scale: 2));
Console.WriteLine(Inflate(input, air: ".", scale: 3));
Console.WriteLine(Inflate(input, air: ".", scale: 4));
Output
Notes:
The images don't contain the spaces between the repeated header and footer. Your example output has spaces, but these spaces make the images asymmetrical, so I didn't include them.
Your test image changes | <><> | to | <><><><> |, and this algorithm doesn't. It could be modified to do that. It could also be extended to not have gaps so that air (.) doesn't escape, but that's for another day.
$================$
| <><> |
| <>....<> |
| <>........<> |
|<>............<>|
|<>............<>|
| <>........<> |
| <>....<> |
| <><> |
$================$
$================$$================$
| <><> |
| <>........<> |
| <>................<> |
|<>........................<>|
|<>........................<>|
| <>................<> |
| <>........<> |
| <><> |
$================$$================$
$================$$================$$================$
| <><> |
| <>............<> |
| <>........................<> |
|<>....................................<>|
|<>....................................<>|
| <>........................<> |
| <>............<> |
| <><> |
$================$$================$$================$
$================$$================$$================$$================$
| <><> |
| <>................<> |
| <>................................<> |
|<>................................................<>|
|<>................................................<>|
| <>................................<> |
| <>................<> |
| <><> |
$================$$================$$================$$================$
B. Multiply the atoms
This is not exactly what you want, but it may be good enough.
var output = string.Join(Environment.NewLine,
lines.Select( l => new string(l.SelectMany(ch => Enumerable.Repeat(ch, scale)).ToArray())));
Test
Code
string input = @"/*$================$
| <><> |
| <>....<> |
| <>........<> |
|<>............<>|
|<>............<>|
| <>........<> |
| <>....<> |
| <><> |
$================$";
Console.WriteLine(input);
var lines = input.Split(new [] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
int scale = 2;
Console.WriteLine("");
Console.WriteLine($"Scale : {scale}");
Console.WriteLine("");
var newLines =
lines.Select( l => new string(l.SelectMany(ch => Enumerable.Repeat(ch, scale)).ToArray()));
var output = string.Join(Environment.NewLine,newLines);
Console.WriteLine(output);
Output
/*$================$
| <><> |
| <>....<> |
| <>........<> |
|<>............<>|
|<>............<>|
| <>........<> |
| <>....<> |
| <><> |
$================$
Scale : 2
//**$$================================$$
|| <<>><<>> ||
|| <<>>........<<>> ||
|| <<>>................<<>> ||
||<<>>........................<<>>||
||<<>>........................<<>>||
|| <<>>................<<>> ||
|| <<>>........<<>> ||
|| <<>><<>> ||
$$================================$$
I am looking for a more efficient algorithm for printing numbers that are palindromic (for example 1001) and whose squares (1001 * 1001 = 1002001) are palindromic too. I think my algorithm makes unnecessary checks to determine whether a number is palindromic. How can I improve it?
In the range [1000, 9999] I found three such numbers: 1001, 1111 and 2002.
This is my algorithm:
for (int i = n; i <= m; i++)
{
if (checkIfPalindromic(i.ToString()))
{
if (checkIfPalindromic((i * i).ToString()))
Console.WriteLine(i);
}
}
this is my method to determine if number is palindromic:
static bool checkIfPalindromic(string A)
{
int n = A.Length - 1;
int i = 0;
bool IsPalindromic = true;
while (i < (n - i))
{
if (A[i] != A[n - i])
{
IsPalindromic = false;
break;
}
i++;
}
return IsPalindromic;
}
Instead of checking every number for "palindromeness", it may be better to iterate through palindromes only. To do that, iterate over the first halves of the numbers and compose a palindrome from each half.
for(int half=10;half<=99;++half)
{
int candidate=half*100+Reverse(half);//may need modification for odd number of digits
if(IsPalindrome(candidate*candidate))
Output(candidate);
}
This will make your program O(sqrt(m)) instead of O(m), which will probably beat all improvements of constant factors.
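The idea above can be sketched for the four-digit case as follows. `Reverse`, `IsPalindrome` and `FourDigit` are illustrative names (the original snippet leaves its helpers undefined), and `long` is used so that the square cannot overflow:

```csharp
using System;
using System.Collections.Generic;

public static class PalindromeSketch
{
    // Reverses the decimal digits of a non-negative number, e.g. 120 -> 21.
    static long Reverse(long n)
    {
        long rev = 0;
        for (; n > 0; n /= 10)
            rev = rev * 10 + n % 10;
        return rev;
    }

    static bool IsPalindrome(long n) => n == Reverse(n);

    // Four-digit palindromes whose squares are palindromes too.
    public static List<long> FourDigit()
    {
        var found = new List<long>();
        for (long half = 10; half <= 99; half++)
        {
            // Compose the palindrome from its first half: 10 -> 1001, 12 -> 1221.
            long candidate = half * 100 + Reverse(half);
            if (IsPalindrome(candidate * candidate))
                found.Add(candidate);
        }
        return found;
    }
}
```

Only 90 candidates are generated here instead of scanning all 9,000 four-digit numbers. Other digit counts would need a different multiplier, and odd lengths need the middle digit handled separately, as the comment in the snippet above notes.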
What you have already seems fairly efficient.
The scale is the number of integers checked (up to 1,000,000).
Note: I use longs.
Disclaimer: I must admit these results are a little sketchy; I've added more scales so you can see.
Results
Mode : Release
Test Framework : .Net 4.7.1
Benchmarks runs : 10 times (averaged)
Scale : 1,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
-----------------------------------------------------------------
Mine2 | 0.107 ms | 0.102 ms | 0.01 | 358,770 | Yes | 5.83 %
Original | 0.114 ms | 0.098 ms | 0.05 | 361,810 | Base | 0.00 %
Mine | 0.120 ms | 0.100 ms | 0.03 | 399,935 | Yes | -5.36 %
Scale : 10,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
-------------------------------------------------------------------
Mine2 | 1.042 ms | 0.944 ms | 0.17 | 3,526,050 | Yes | 11.69 %
Mine | 1.073 ms | 0.936 ms | 0.19 | 3,633,369 | Yes | 9.06 %
Original | 1.180 ms | 0.920 ms | 0.29 | 3,964,418 | Base | 0.00 %
Scale : 100,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
--------------------------------------------------------------------
Mine2 | 10.406 ms | 9.502 ms | 0.91 | 35,341,208 | Yes | 6.59 %
Mine | 10.479 ms | 9.332 ms | 1.09 | 35,592,718 | Yes | 5.93 %
Original | 11.140 ms | 9.272 ms | 1.72 | 37,624,494 | Base | 0.00 %
Scale : 1,000,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
-------------------------------------------------------------------------
Original | 106.271 ms | 101.662 ms | 3.61 | 360,996,200 | Base | 0.00 %
Mine | 107.559 ms | 102.695 ms | 5.35 | 365,525,239 | Yes | -1.21 %
Mine2 | 108.757 ms | 104.530 ms | 4.81 | 368,939,992 | Yes | -2.34 %
Mode : Release
Test Framework : .Net Core 2.0
Benchmarks runs : 10 times (averaged)
Scale : 1,000,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
-------------------------------------------------------------------------
Mine2 | 95.054 ms | 87.144 ms | 8.45 | 322,650,489 | Yes | 10.54 %
Mine | 95.849 ms | 89.971 ms | 5.38 | 325,315,589 | Yes | 9.79 %
Original | 106.251 ms | 84.833 ms | 17.97 | 350,106,144 | Base | 0.00 %
Given
protected override List<int> InternalRun()
{
var results = new List<int>();
for (var i = 0; i <= Input; i++)
if (checkIfPalindromic(i) && checkIfPalindromic(i * (long)i))
results.Add(i);
return results;
}
Mine1
private static unsafe bool checkIfPalindromic(long value)
{
var str = value.ToString();
fixed (char* pStr = str)
{
for (char* p = pStr, p2 = pStr + str.Length - 1; p < p2;)
if (*p++ != *p2--)
return false;
}
return true;
}
Mine2
private static bool checkIfPalindromic(long value)
{
var str = value.ToString();
var n = str.Length - 1;
for (var i = 0; i < n - i; i++)
if (str[i] != str[n - i])
return false;
return true;
}
A more efficient way is to work on the int directly instead of converting it to a string. This algorithm is about two times faster:
static int[] pow10 = { 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000 };
static bool checkIfPalindromic(int A)
{
int n = 1;
int i = A;
if (i >= 100000000) { n += 8; i /= 100000000; }
if (i >= 10000) { n += 4; i /= 10000; }
if (i >= 100) { n += 2; i /= 100; }
if (i >= 10) { n++; }
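int half = n / 2; // number of digits in the lower half
int num = A / pow10[(n + 1) / 2]; // upper half; the middle digit of an odd-length number is skipped
int input = A % pow10[half]; // lower half
// Reversing exactly 'half' digits keeps inner zeros significant (1010 is rejected)
// and also terminates for single-digit input.
int reversedNum = 0;
for (int k = 0; k < half; k++, input /= 10)
reversedNum = reversedNum * 10 + input % 10;
return num == reversedNum;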
}
Usage:
for (int i = n; i <= m; i++)
if (checkIfPalindromic(i) && checkIfPalindromic(i * i))
Console.WriteLine(i);
Benchmark:
Benchmark over the range [1000, 99999999] on a Core2Duo CPU:
This algorithm: 12261ms
Your algorithm: 24181ms
Palindromic Numbers:
1001
1111
2002
10001
10101
10201
11011
11111
11211
20002
20102
You can use LINQ to simplify your code.
Sample:
static void Main(string[] args)
{
int n = 1000, m = 9999;
for (int i = n; i <= m; i++)
{
if (CheckIfNoAndPowerPalindromic(i))
{
Console.WriteLine(i);
}
}
}
private static bool CheckIfNoAndPowerPalindromic(int number)
{
string numberString = number.ToString();
string numberSquareString = (number * number).ToString();
return (Enumerable.SequenceEqual(numberString.ToCharArray(), numberString.ToCharArray().Reverse()) &&
Enumerable.SequenceEqual(numberSquareString.ToCharArray(), numberSquareString.ToCharArray().Reverse()));
}
Output:
1001
1111
2002.
Loop up to len/2 as follows:
static bool checkIfPalindromic(string A)
{
for (int i = 0; i < A.Length / 2; i++)
if (A[i] != A[A.Length - i - 1])
return false;
return true;
}
We can get an interesting optimisation by changing the palindromic checking method and using a direct integer reversing method instead of converting first to a string then looping in the string.
I used the method in the accepted answer from this question:
static int reverse(int n)
{
int left = n;
int rev = 0;
int r = 0;
while (left > 0)
{
r = left % 10;
rev = rev * 10 + r;
left = left / 10;
}
return rev;
}
I also used the Stopwatch class from System.Diagnostics to measure the elapsed time.
My function to check if a number is a palindromic number is:
static bool IsPalindromicNumber(int number)
{
return reverse(number) == number;
}
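One caveat for larger ranges: `i * i` is computed as an `int`, which overflows once `i` exceeds 46,340 (46,341 squared is already larger than `int.MaxValue`, 2,147,483,647). In the default unchecked context the value silently wraps around, so for `m` up to 99,999,999 the squares need 64-bit arithmetic. A hedged variant of the same reversing approach using `long`; the `LongPalindrome` class and `IsPalindromeWithPalindromicSquare` are hypothetical names:

```csharp
using System;

public static class LongPalindrome
{
    // Same digit-reversing loop as above, widened to 64 bits.
    public static long Reverse(long n)
    {
        long rev = 0;
        while (n > 0)
        {
            rev = rev * 10 + n % 10;
            n /= 10;
        }
        return rev;
    }

    public static bool IsPalindromicNumber(long number) => Reverse(number) == number;

    // The cast makes the multiplication happen in 64 bits.
    public static bool IsPalindromeWithPalindromicSquare(int i)
        => IsPalindromicNumber(i) && IsPalindromicNumber((long)i * i);
}
```

For instance, 100001 squared is 10000200001, which is palindromic but does not fit in an `int`.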
For n value of 1000 and for different values of m I get the following results for the elapsed time in milliseconds:
---------------------------------------------------------
| m | original | mine | optimisation|
---------------------------------------------------------
|9999 |6.3855 |4.2171 | -33.95% |
---------------------------------------------------------
|99999 |71.3961 |42.3399 | -40.69% |
---------------------------------------------------------
|999999 |524.4921 |342.8899 | -34.62% |
---------------------------------------------------------
|9999999 |7016.4050 |4565.4563 | -34.93% |
---------------------------------------------------------
|99999999 |71319.658 |49837.5632 | -30.12% |
---------------------------------------------------------
The measured values are indicative rather than absolute, because they differ from one run of the program to another, but the pattern stays the same and the second approach is always faster.
To measure using the Stopwatch:
With your method:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
for (int i = n; i <= m; i++)
{
if (checkIfPalindromic(i.ToString()))
{
if (checkIfPalindromic((i * i).ToString()))
Console.WriteLine(i);
}
}
stopWatch.Stop();
Console.WriteLine("First approach: Elapsed time..." + stopWatch.Elapsed + " which is " + stopWatch.Elapsed.TotalMilliseconds + " milliseconds");
I of course used exactly the same approach with my changes:
With my method:
Stopwatch stopWatch2 = new Stopwatch();
stopWatch2.Start();
for (int i = n; i <= m; i++)
{
if (IsPalindromicNumber(i) && IsPalindromicNumber(i*i))
{
Console.WriteLine(i);
}
}
stopWatch2.Stop();
Console.WriteLine("Second approach: Elapsed time..." + stopWatch2.Elapsed + " which is " + stopWatch2.Elapsed.TotalMilliseconds + " milliseconds");
Hey guys, I'm developing a Connect 4 game as a Windows Forms application in C#. Everything works perfectly; I'm just stuck on the diagonal part. This is what I've developed for the left-down diagonal check, but I'm not sure if it works perfectly. It works from the first tile, which is (1,1), but I can't tell about the other tiles. I also wanted to know how to create another method for the diagonal match in the other direction. (gameButtons is my 2D array and my form has 6 rows and 7 columns.) This is my method:
private void checkForDiaMatch()
{
int countBlue = 0;
int countRed = 0;
for (int i = 0; i < 6; i++)
{
if (gameButtons[i, i].BackColor == Color.Blue)
{
countBlue++;
}
else
{
countBlue = 0;
}
if (gameButtons[i, i].BackColor == Color.Red)
{
countRed++;
}
else
{
countRed = 0;
}
if (countBlue >= 4)
{
MessageBox.Show("There is a blue diagonal match");
MessageBox.Show("Blue wins!");
}
else if (countRed >= 4)
{
MessageBox.Show("There is a red diagonal match");
MessageBox.Show("Red wins!");
}
}
}
Edit: With some help from the comment section I created this, but it's still not working. I tried going for the diagonal right match, but no luck yet.
private void checkForDiaMatch(int col,int targetRow)
{
int countBlue = 0;
int countRed = 0;
int xLocation = gameButtons[col, targetRow].Location.X/50;
int yLocation = gameButtons[col, targetRow].Location.Y/50;
//string epop = Convert.ToString(xLocation);
//MessageBox.Show(epop);
if (7 > xLocation + 1 && 6 > yLocation + 1)
{
if (gameButtons[xLocation + 1, yLocation + 1].BackColor == Color.Blue)
{
countBlue++;
string kappaBlue = Convert.ToString(countBlue);
MessageBox.Show(kappaBlue);
}
else
{
countBlue = 0;
}
if (gameButtons[xLocation + 1, yLocation + 1].BackColor == Color.Red)
{
countRed++;
string kappaRed = Convert.ToString(countRed);
MessageBox.Show(kappaRed);
}
else
{
countRed = 0;
}
Probably the most helpful thing would be to draw your board and indexes on paper, and write the indexes of any random diagonal connect 4:
------------------------------- -------------------------------
5 | | | | | | |5,6| 5 | | | | | | | |
--+---+---+---+---+---+---+---+ --+---+---+---+---+---+---+---+
4 | | | | | |4,5| | 4 | |4,1| | | | | |
--+---+---+---+---+---+---+---+ --+---+---+---+---+---+---+---+
3 | | | | |3,4| | | 3 | | |3,2| | | | |
--+---+---+---+---+---+---+---+ --+---+---+---+---+---+---+---+
2 | | | |2,3| | | | 2 | | | |2,3| | | |
--+---+---+---+---+---+---+---+ --+---+---+---+---+---+---+---+
1 | | | | | | | | 1 | | | | |1,4| | |
--+---+---+---+---+---+---+---+ --+---+---+---+---+---+---+---+
0 | | | | | | | | 0 | | | | | | | |
--+---+---+---+---+---+---+---+ --+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
We can see that from any point we need to check in two directions: up/right + down/left (like a /), and up/left + down/right (like a \). In both of these directions we want to count the number of pieces with the same color as the one that we started from. As soon as the total number is 4, we can return true. If neither direction has a total of 4 matching pieces, we return false.
I played with this a little, and came up with the following method that works for me. Your classes are probably slightly different (I just did a console app), but hopefully the logic will help:
private bool CompletesDiagonal(int pieceRow, int pieceCol)
{
var colorToMatch = Board[pieceRow, pieceCol]; // Board is a ConsoleColor[7,6] array
var matchingPieces = 1; // We will count the original piece as a match
// Check forward slash direction '/'
// First check down/left (decrement both row and column up to 3 times)
for (int counter = 1; counter < 4; counter++)
{
var row = pieceRow - counter;
var col = pieceCol - counter;
// Make sure we stay within our board
if (row < Board.GetLowerBound(0) || col < Board.GetLowerBound(1)) { break; }
if (Board[row, col] == colorToMatch)
{
matchingPieces++;
if (matchingPieces == 4) return true;
}
else { break; }
}
// Next check up/right (increment both row and column up to 3 times)
for (int counter = 1; counter < 4; counter++)
{
var row = pieceRow + counter;
var col = pieceCol + counter;
// Make sure we stay within our board
if (row > Board.GetUpperBound(0) || col > Board.GetUpperBound(1)) { break; }
// Check for a match
if (Board[row, col] == colorToMatch)
{
matchingPieces++;
if (matchingPieces == 4) return true;
}
else { break; }
}
// If we got this far, no match was found in forward slash direction,
// so reset our counter and check the back slash direction '\'
matchingPieces = 1;
// First check down/right (decrement row and increment column)
for (int counter = 1; counter < 4; counter++)
{
var row = pieceRow - counter;
var col = pieceCol + counter;
// Make sure we stay within our board
if (row < Board.GetLowerBound(0) || col > Board.GetUpperBound(1)) { break; }
// Check for a match
if (Board[row, col] == colorToMatch)
{
matchingPieces++;
if (matchingPieces == 4) return true;
}
else { break; }
}
// Next check up/left (increment row and decrement column)
for (int counter = 1; counter < 4; counter++)
{
var row = pieceRow + counter;
var col = pieceCol - counter;
// Make sure we stay within our board
if (row > Board.GetUpperBound(0) || col < Board.GetLowerBound(1)) { break; }
// Check for a match
if (Board[row, col] == colorToMatch)
{
matchingPieces++;
if (matchingPieces == 4) return true;
}
else { break; }
}
// If we've gotten this far, then we haven't found a match
return false;
}
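The four loops above differ only in the direction they step in, so the same logic can be written once and called with different row/column offsets. This is a compact sketch under stated assumptions, not the method above: the board is modelled as an `int[,]` where 0 means empty and any other value is a player's color, and the cell at (row, col) is assumed to hold the piece that was just placed. `DiagonalSketch` and `CountInDirection` are hypothetical names.

```csharp
using System;

public static class DiagonalSketch
{
    // Counts consecutive cells matching board[row, col], starting next to it
    // and stepping by (dRow, dCol) until the edge of the board or a mismatch.
    static int CountInDirection(int[,] board, int row, int col, int dRow, int dCol)
    {
        int color = board[row, col];
        int count = 0;
        int r = row + dRow, c = col + dCol;
        while (r >= 0 && r < board.GetLength(0) &&
               c >= 0 && c < board.GetLength(1) &&
               board[r, c] == color)
        {
            count++;
            r += dRow;
            c += dCol;
        }
        return count;
    }

    public static bool CompletesDiagonal(int[,] board, int row, int col)
    {
        // '/' direction: up/right plus down/left, counting the piece itself once.
        int forward = 1 + CountInDirection(board, row, col, 1, 1)
                        + CountInDirection(board, row, col, -1, -1);
        // '\' direction: up/left plus down/right.
        int backward = 1 + CountInDirection(board, row, col, 1, -1)
                         + CountInDirection(board, row, col, -1, 1);
        return forward >= 4 || backward >= 4;
    }
}
```

Passing the offsets (0, 1) and (1, 0) to the same helper would cover horizontal and vertical matches as well.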
Here's the result of a diagonal win: