System.Numerics.Vector.GreaterThan and bool results - c#

I am trying to convert some existing mask-generation code so that it can be optimized with SIMD instructions, and I am testing how much performance I can get out of SIMD after converting it. Below is an oversimplified chunk I am using to profile it:
Random r = new Random();
var random1 = new double[65536000 * 4];
var random2 = new double[random1.Length];
var result = new bool[random1.Length];
for (int i = 0; i < random1.Length; i++)
{
    random1[i] = r.Next();
    random2[i] = r.Next();
}
var longRes = new long[random1.Length];
// random1.Length is a multiple of Vector<double>.Count here, so no tail loop is needed
for (int i = 0; i < result.Length; i += Vector<double>.Count)
{
    Vector<double> v1 = new Vector<double>(random1, i);
    Vector<double> v2 = new Vector<double>(random2, i);
    // each lane of res is -1 (all bits set) where v1 > v2, otherwise 0
    Vector<long> res = System.Numerics.Vector.GreaterThan(v1, v2);
    res.CopyTo(longRes, i);
}
Is there a technique I could use to efficiently put the result res into the result array?
Originally I thought I could live with Vector<long> and keep the masks in a long[], but I realized that this may not be feasible.

As noted in the comments on the question, I came to realize that System.Numerics.Vector.GreaterThan and similar methods such as LessThan are designed to be used with ConditionalSelect().
In my case, I was trying to generate a bool array representing an image mask that is used later throughout the API, and converting the long values to bool wouldn't be feasible.
In other words, these comparison methods are not meant for general-purpose use.
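If a bool[] is still needed, one option is to compare a vector at a time and read each lane of the Vector<long> mask back as a bool. Below is a minimal sketch of that, plus the ConditionalSelect pairing described above (an element-wise maximum). The class name SimdMask is hypothetical, and the per-lane loop is scalar, so it may give back much of the SIMD gain; it is worth measuring before relying on it.

```csharp
using System.Numerics;

static class SimdMask
{
    // Writes result[i] = a[i] > b[i], comparing Vector<double>.Count elements at a time.
    public static void GreaterThanMask(double[] a, double[] b, bool[] result)
    {
        int width = Vector<double>.Count;
        int i = 0;
        for (; i <= a.Length - width; i += width)
        {
            Vector<long> mask = Vector.GreaterThan(new Vector<double>(a, i), new Vector<double>(b, i));
            // Each lane is all ones (-1) where the comparison holds and 0 where it does not.
            for (int j = 0; j < width; j++)
                result[i + j] = mask[j] != 0;
        }
        // Scalar loop for the tail that does not fill a whole vector.
        for (; i < a.Length; i++)
            result[i] = a[i] > b[i];
    }

    // The intended pairing: the mask selects lanes from a or b (here, an element-wise maximum).
    public static Vector<double> Max(Vector<double> a, Vector<double> b) =>
        Vector.ConditionalSelect(Vector.GreaterThan(a, b), a, b);
}
```

Copying the mask lanes one by one keeps the code portable across vector widths; if the mask only feeds later arithmetic, keeping it as Vector<long> and using ConditionalSelect avoids the conversion entirely.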

Related

What is the fastest way to do Array Table Lookup with an Integer Index?

I have a video processing application that moves a lot of data.
To speed things up, I have made a lookup table, since many calculations only need to be done once and can then be reused.
However, I'm at the point where the lookups now take 30% of the processing time. I'm wondering if slow RAM might be the cause. Still, I would like to try to optimize it some more.
Currently I have the following:
public readonly int[] largeArray = new int[3000*2000];
public readonly int[] lookUp = new int[width*height];
I then perform a lookup with an index p (which is equivalent to width * y + x) to fetch the result:
int[] newResults = new int[width*height];
int p = 0;
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++, p++) {
        newResults[p] = largeArray[lookUp[p]];
    }
}
Note that I cannot do an entire array copy to optimize. Also, the application is heavily multithreaded.
Some progress came from shortening the call stack: no getters, just a direct read from a readonly array.
I've also tried converting to ushort, but it seemed to be slower (as I understand it, due to word size).
Would an IntPtr be faster? How would I go about that?
Attached below is a screenshot of the time distribution: [screenshot not available]
It looks like what you're doing here is effectively a "gather". Modern CPUs have dedicated instructions for this, in particular the VPGATHER* family (for example VPGATHERDD). These are exposed in .NET Core 3.0 through System.Runtime.Intrinsics.X86.Avx2, and should work something like the code below, which covers the single-loop scenario (you can probably work from here to get the double-loop version).
results first:
AVX enabled: False; slow loop from 0
e7ad04457529f201558c8a53f639fed30d3a880f75e613afe203e80a7317d0cb
for 524288 loops: 1524ms
AVX enabled: True; slow loop from 1024
e7ad04457529f201558c8a53f639fed30d3a880f75e613afe203e80a7317d0cb
for 524288 loops: 667ms
code:
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class P
{
    static int Gather(int[] source, int[] index, int[] results, bool avx)
    {   // normally you wouldn't have avx as a parameter; that is just so
        // I can turn it off and on for the test; likewise the "int" return
        // here is so I can monitor (in the test) how much we did in the "old"
        // loop, vs AVX2; in real code this would be void return
        int y = 0;
        if (Avx2.IsSupported && avx)
        {
            var iv = MemoryMarshal.Cast<int, Vector256<int>>(index);
            var rv = MemoryMarshal.Cast<int, Vector256<int>>(results);
            unsafe
            {
                fixed (int* sPtr = source)
                {
                    // note: here I'm assuming we are trying to fill "results" in
                    // a single outer loop; for a double-loop, you'll probably need
                    // to slice the spans
                    for (int i = 0; i < rv.Length; i++)
                    {
                        rv[i] = Avx2.GatherVector256(sPtr, iv[i], 4);
                    }
                }
            }
            // move past everything we've processed via SIMD
            y += rv.Length * Vector256<int>.Count;
        }
        // now do anything left, which includes anything not aligned to 256 bits,
        // plus the "no AVX2" scenario
        int result = y;
        int end = results.Length; // hoist, since this is not the JIT recognized pattern
        for (; y < end; y++)
        {
            results[y] = source[index[y]];
        }
        return result;
    }

    static void Main()
    {
        // invent some random data
        var rand = new Random(12345);
        int size = 1024 * 512;
        int[] data = new int[size];
        for (int i = 0; i < data.Length; i++)
            data[i] = rand.Next(255);
        // build a fake index
        int[] index = new int[1024];
        for (int i = 0; i < index.Length; i++)
            index[i] = rand.Next(size);
        int[] results = new int[1024];
        void GatherLocal(bool avx)
        {
            // prove that we're getting the same data
            Array.Clear(results, 0, results.Length);
            int from = Gather(data, index, results, avx);
            Console.WriteLine($"AVX enabled: {avx}; slow loop from {from}");
            for (int i = 0; i < 32; i++)
            {
                Console.Write(results[i].ToString("x2"));
            }
            Console.WriteLine();
            const int TimeLoop = 1024 * 512;
            var watch = Stopwatch.StartNew();
            for (int i = 0; i < TimeLoop; i++)
                Gather(data, index, results, avx);
            watch.Stop();
            Console.WriteLine($"for {TimeLoop} loops: {watch.ElapsedMilliseconds}ms");
            Console.WriteLine();
        }
        GatherLocal(false);
        if (Avx2.IsSupported) GatherLocal(true);
    }
}
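Because p equals width * y + x and is incremented once per inner iteration, the question's double loop visits p = 0 .. width*height - 1 in order, so it can be written as one flat loop, which is the single-loop form the Gather method above works on. The sketch below (hypothetical class LookupLoops) puts the nested, flat, and per-row span forms side by side; the per-row variant assumes rows might be handed to different threads, which the question mentions but does not show.

```csharp
using System;

static class LookupLoops
{
    // The question's nested loop: p == width * y + x at every step.
    public static void Nested(int[] largeArray, int[] lookUp, int[] newResults, int width, int height)
    {
        int p = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++, p++)
                newResults[p] = largeArray[lookUp[p]];
    }

    // Same work as one flat loop over p = 0 .. width*height - 1
    // (assumes newResults.Length == width * height).
    public static void Flat(int[] largeArray, int[] lookUp, int[] newResults)
    {
        for (int p = 0; p < newResults.Length; p++)
            newResults[p] = largeArray[lookUp[p]];
    }

    // One row at a time through spans, e.g. if rows are split across threads.
    public static void Row(int[] largeArray, int[] lookUp, int[] newResults, int width, int y)
    {
        ReadOnlySpan<int> idx = lookUp.AsSpan(y * width, width);
        Span<int> dst = newResults.AsSpan(y * width, width);
        for (int x = 0; x < dst.Length; x++)
            dst[x] = largeArray[idx[x]];
    }
}
```

All three produce the same output; slicing the index and result arrays per row is one way to apply the Gather approach to the double-loop case.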
RAM is already one of the fastest things available; the only faster memory is the CPU cache. So the code will be memory-bound, but that is still plenty fast.
Of course, at the given sizes this array has 6 million entries (about 24 MB as int), which will likely not fit in any cache and will take a long time to iterate over. No matter how fast the memory is, this is simply too much data.
As a general rule, video processing is done on the GPU nowadays. GPUs are literally designed to operate on giant arrays, because that is what the image you are seeing right now is: a giant array.
If you have to keep it on the CPU side, maybe caching or lazy initialisation would help? Chances are that you do not truly need every value; you only need the common values. Take an example from dice rolling: if you roll two 6-sided dice, every result from 2 to 12 is possible, but a 7 comes up in 6 out of 36 cases, while 2 and 12 each come up in only 1 out of 36 cases. So having the 7 stored is a lot more beneficial than the 2 and the 12.
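The lazy-initialisation idea can be sketched as a table that computes each entry on first use and then reuses it, so values that are never requested are never computed. LazyLookupTable and its compute delegate are hypothetical; the sketch assumes the per-entry calculation is pure (same key, same result), which makes it safe for two threads to occasionally compute the same entry.

```csharp
using System;
using System.Threading;

// Hypothetical helper: each entry is computed on first use and then reused.
sealed class LazyLookupTable
{
    private readonly Func<int, int> _compute; // the expensive per-entry calculation (assumed pure)
    private readonly int[] _values;
    private readonly bool[] _ready;

    public LazyLookupTable(int size, Func<int, int> compute)
    {
        _compute = compute;
        _values = new int[size];
        _ready = new bool[size];
    }

    public int this[int key]
    {
        get
        {
            // The volatile read pairs with the volatile write below, so a thread that
            // sees _ready[key] == true also sees the value stored before it.
            if (!Volatile.Read(ref _ready[key]))
            {
                // Two threads may both compute the same entry; harmless when _compute is pure.
                _values[key] = _compute(key);
                Volatile.Write(ref _ready[key], true);
            }
            return _values[key];
        }
    }
}
```

This trades a flag check per lookup for skipping the up-front cost of filling the whole table; whether that pays off depends on how many entries are actually used.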

Loading multi-dimensional array dynamically

I have the following code. It's roughly analogous in concept to the Python (NumPy) reshape function. It successfully loads 1-dimensional data into a multi-dimensional array whose dimensions are not known until runtime, for example {209,64,64,3}. I have to iterate over the 1-dimensional data and compute the correct index for each dimension of the array.
private void InitializeData()
{
    var imageData = ImageData.Load(txtFileName.Text); // one dimensional array
    var dimensions = txtDimensions.Text.Split(',').Select(d => int.Parse(d)).ToArray(); // e.g., {-1,64,64,3}
    int elements = 1;
    foreach (var dim in dimensions.Skip(1))
    {
        elements *= dim;
    }
    dimensions[0] = imageData.Length / elements; // {209,64,64,3}
    // create multipliers
    var multipliers = new int[dimensions.Length - 1];
    for (var dimension = 1; dimension < dimensions.Length; dimension++)
    {
        var multiplier = 1;
        for (var followingdimension = dimension; followingdimension < dimensions.Length; followingdimension++)
        {
            multiplier *= dimensions[followingdimension];
        }
        multipliers[dimension - 1] = multiplier;
    }
    // load data
    var dataArray = Array.CreateInstance(typeof(int), dimensions);
    var indexes = new int[dimensions.Length];
    for (var imageDataIndex = 0; imageDataIndex < imageData.Length; imageDataIndex++)
    {
        indexes[0] = imageDataIndex / multipliers[0];
        indexes[dimensions.Length - 1] = imageDataIndex % multipliers[multipliers.Length - 1];
        for (var multiplier = 1; multiplier < dimensions.Length - 1; multiplier++)
            indexes[multiplier] = (imageDataIndex / multipliers[multiplier]) % dimensions[multiplier];
        dataArray.SetValue(imageData[imageDataIndex], indexes);
    }
}
Is there a faster or more elegant way of doing this? I do realize those are two different things. I'll benchmark the elegant suggestions, but I'd still like to see them, because this is just too ugly to look at, and was too painful to write, to be the best way.
Note (Please)
The data may not always be image data, so I am not looking for bitmap operations. That just happens here but it's not necessarily a typical case. And, my goal is not to get a bitmap, but an array.
I have a partial answer thanks to the question "How to reshape an Array in c#".
The code can be replaced with just this:
var imageData = ImageData.Load(txtFileName.Text); // one dimensional array
// e.g., {-1,64,64,3}
var dimensions = txtDimensions.Text.Split(',').Select(d => int.Parse(d)).ToArray();
int elements = 1;
foreach (var dim in dimensions.Skip(1))
{
    elements *= dim;
}
dimensions[0] = imageData.Length / elements; // e.g., {209,64,64,3}
// load data: a multi-dimensional array is stored contiguously, so one byte copy fills it
var dataArray = Array.CreateInstance(typeof(int), dimensions);
Buffer.BlockCopy(imageData, 0, dataArray, 0, imageData.Length * sizeof(int));
I would be surprised if there's a faster or simpler way to do the actual load than Buffer.BlockCopy. It turns out that whatever dimensional form your original data is in, BlockCopy handles it, as long as you can specify your target dimensions as part of a target array.
I'll keep looking for ways to further refine the rest of the original code.
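Folding the -1 inference and the copy into one reusable method might look like the sketch below. The name Reshaper.Reshape and its NumPy-style handling of a single -1 dimension are assumptions for illustration, not an existing .NET API. It relies on .NET multi-dimensional arrays being stored in row-major order, which is why the single Buffer.BlockCopy call lays the data out correctly.

```csharp
using System;
using System.Linq;

static class Reshaper
{
    // Reshapes a flat int[] into a multi-dimensional int array.
    // A single -1 in 'shape' is inferred from the data length (NumPy-style; an assumption here).
    public static Array Reshape(int[] data, int[] shape)
    {
        int[] dims = (int[])shape.Clone();
        int unknown = Array.IndexOf(dims, -1);
        int known = 1;
        for (int i = 0; i < dims.Length; i++)
            if (i != unknown) known *= dims[i];
        if (unknown >= 0)
            dims[unknown] = data.Length / known;
        if (dims.Aggregate(1, (acc, d) => acc * d) != data.Length)
            throw new ArgumentException("Shape does not match data length.");
        Array result = Array.CreateInstance(typeof(int), dims);
        // Row-major layout means a raw byte copy puts every element at the right index.
        Buffer.BlockCopy(data, 0, result, 0, data.Length * sizeof(int));
        return result;
    }
}
```

For example, reshaping 24 values with the shape {-1,3,4} gives an int[2,3,4] whose element [1,2,3] is the last value, 23.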

Random class generating same sequence

I have a method that I use to generate random strings by creating random integers and casting them to char:
public static string GenerateRandomString(int minLength, int maxLength)
{
    var length = GenerateRandomNumber(minLength, maxLength);
    var builder = new StringBuilder(length);
    var random = new Random((int)DateTime.Now.Ticks);
    for (var i = 0; i < length; i++)
    {
        builder.Append((char) random.Next(255));
    }
    return builder.ToString();
}
The problem is that when I call this method frequently, it creates the same sequence of values, as the docs already say:
The random number generation starts from a seed value. If the same seed is used repeatedly, the same series of numbers is generated. One way to produce different sequences is to make the seed value time-dependent, thereby producing a different series with each new instance of Random.
As you can see, I am making the seed time-dependent and also creating a new instance of Random on each call to the method. Even so, my test is still failing.
[TestMethod]
public void GenerateRandomStringTest()
{
    for (var i = 0; i < 100; i++)
    {
        var string1 = Utilitaries.GenerateRandomString(10, 100);
        var string2 = Utilitaries.GenerateRandomString(10, 20);
        if (string1.Contains(string2))
            throw new InternalTestFailureException("");
    }
}
How could I make sure that independently of the frequency on which I call the method, the sequence will "always" be different?
Your test is failing because the GenerateRandomString function completes too quickly for DateTime.Now.Ticks to change. On most systems it is quantized at either 10 or 15 ms, which is more than enough time for a modern CPU to generate a sequence of 100 random characters.
Inserting a small delay in your test should fix the problem:
var string1 = Utilitaries.GenerateRandomString(10, 100);
Thread.Sleep(30);
var string2 = Utilitaries.GenerateRandomString(10, 20);
You're effectively doing the same as Random's default constructor, which (in .NET Framework) seeds from Environment.TickCount. Take a look at the example in the MSDN documentation for the Random constructor: it shows that inserting a Thread.Sleep between the initialization of the different Random instances yields different results.
If you really want to get different values, I suggest you change to a seed value that's not time-dependent.
dasblinkenlight has explained why this is happening.
To overcome the problem, create one Random instance and pass it in:
public static string GenerateRandomString(Random random, int minLength, int maxLength)
{
    // draw the length from the shared instance too (upper bound assumed inclusive)
    var length = random.Next(minLength, maxLength + 1);
    var builder = new StringBuilder(length);
    for (var i = 0; i < length; i++)
        builder.Append((char) random.Next(255));
    return builder.ToString();
}

[TestMethod]
public void GenerateRandomStringTest()
{
    // one instance for the whole test, so each call continues the same sequence
    Random rnd = new Random();
    for (var i = 0; i < 100; i++)
    {
        var string1 = Utilitaries.GenerateRandomString(rnd, 10, 100);
        var string2 = Utilitaries.GenerateRandomString(rnd, 10, 20);
        if (string1.Contains(string2))
            throw new InternalTestFailureException("");
    }
}
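An alternative to passing the instance around is a single static Random shared by every call, so it is seeded exactly once. The class RandomStrings below is a hypothetical sketch; it takes a lock because System.Random is not thread-safe, and it assumes maxLength is meant to be inclusive. On .NET 6 and later, Random.Shared provides a thread-safe shared instance instead of the lock.

```csharp
using System;
using System.Text;

static class RandomStrings
{
    // One generator for the whole process, created (and seeded) once.
    private static readonly Random Rng = new Random();
    private static readonly object Gate = new object(); // System.Random is not thread-safe

    public static string Generate(int minLength, int maxLength)
    {
        lock (Gate)
        {
            int length = Rng.Next(minLength, maxLength + 1); // inclusive upper bound (assumed)
            var builder = new StringBuilder(length);
            for (int i = 0; i < length; i++)
                builder.Append((char)Rng.Next(255));
            return builder.ToString();
        }
    }
}
```

Back-to-back calls now continue one sequence rather than restarting from a possibly identical time-based seed.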

What's the most efficient way to access the value of a match?

There are many ways to access a Match's value in C#:
Match mtch = //whatever
//you could do
mtch.Value
//or
mtch.ToString()
//or
mtch.Groups[0].Value
//or
mtch.Groups[0].ToString()
My question is: what is the best way to access it?
(I know this is micro-optimization, I'm just wondering)
I wrote a quick test and ended up with the following result...
[TestMethod]
public void GenericTest()
{
    Regex r = new Regex(".def.");
    Match mtch = r.Match("abcdefghijklmnopqrstuvwxyz", 0);
    for (int i = 0; i < 1000000; i++)
    {
        string a = mtch.Value;                // 15.4%
        string b = mtch.ToString();           // 19.2%
        string c = mtch.Groups[0].Value;      // 23.1%
        string d = mtch.Groups[0].ToString(); // 38.5%
    }
}
If you are talking about efficiency based on the samples you provided, I would guess that the first one is the most efficient, since calling ToString() adds an extra method call on top of reading the value (Match.ToString() simply returns Value), which takes extra time.
Write a test.
Read the result.
Think about the result.
If you don't want to write tests, look at the Microsoft Intermediate Language (MSIL) and think about what will take more time.
I also tested it, with the following result:
// VS 2012 Ultimate
//
Regex r = new Regex(".def.");
Match mtch = r.Match("abcdefghijklmnopqrstuvwxyz", 0);
string a, b, c, d;
for (int i = 0; i < int.MaxValue; i++)
{
    a = mtch.Value;                // 1.4%
    b = mtch.ToString();           // 33.2%
    c = mtch.Groups[0].Value;      // 15.3%
    d = mtch.Groups[0].ToString(); // 44.1%
}
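Following the "write a test" advice, a minimal Stopwatch harness might look like the sketch below. The class and method names are hypothetical; the delegate call adds the same overhead to every variant, so only the relative differences mean anything, and a dedicated tool such as BenchmarkDotNet will give more reliable numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class MatchValueTiming
{
    // Times one way of reading the matched text; returns elapsed milliseconds.
    public static long Time(Func<Match, string> read, Match match, int iterations)
    {
        string last = null;
        read(match); // warm-up call so JIT compilation of the lambda is not timed
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            last = read(match);
        watch.Stop();
        GC.KeepAlive(last); // reference the result after the loop
        return watch.ElapsedMilliseconds;
    }

    public static void Run(int iterations)
    {
        Match match = new Regex(".def.").Match("abcdefghijklmnopqrstuvwxyz", 0);
        var ways = new (string Name, Func<Match, string> Read)[]
        {
            ("Value", m => m.Value),
            ("ToString()", m => m.ToString()),
            ("Groups[0].Value", m => m.Groups[0].Value),
            ("Groups[0].ToString()", m => m.Groups[0].ToString()),
        };
        foreach (var (name, read) in ways)
            Console.WriteLine($"{name}: {Time(read, match, iterations)} ms");
    }
}
```

All four expressions return the same string ("cdefg" for this input); the harness only measures how much each route costs to get there.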

C#: random number issue

See the following:
for (int i=0; i<2; i++) {
    // do some stuff
    r = new Random((int)DateTime.Now.Ticks);
    iRandom = r.Next(30000);
    // do some other stuff
}
Don't ask me how, but iRandom is sometimes the same for both iterations of the loop. I need iRandom to be different for each iteration. How do I do this?
Change your loop to this:
r = new Random((int)DateTime.Now.Ticks);
for (int i=0; i<2; i++) {
    // do some stuff
    iRandom = r.Next(30000);
    // do some other stuff
}
In other words, put the creation of the Random object outside the loop.
For some surprises with System.Random doubles versus RNGCryptoServiceProvider, try plotting the results of the following (say, using a spreadsheet). This code will run in LinqPad (www.LinqPad.net). It's worth a look :)
void Main()
{
    {
        // NextDouble() from 30 Random instances seeded with consecutive integers
        var ds = Enumerable.Range(1, 30).Select(i => new Random(i).NextDouble());
        ds.Dump();
    }
    {
        var csp = new System.Security.Cryptography.RNGCryptoServiceProvider();
        var bs = new byte[8 * 30];
        var ds = new double[30];
        csp.GetBytes(bs);
        for (var i = 0; i < 30; i++)
        {
            var d = BitConverter.ToDouble(bs, i * 8);
            // reject zero and NaN bit patterns by drawing fresh bytes
            while (d == 0D || Double.IsNaN(d))
            {
                var bytes = new byte[8];
                csp.GetBytes(bytes);
                d = BitConverter.ToDouble(bytes, 0);
            }
            ds[i] = Math.Log10(Math.Abs(d));
        }
        ds.Dump();
    }
}
Try creating your Random with a seed that is not time-based; otherwise the seed may be the same (and so may the random number):
for (int i = 0; i < 2; i++)
{
    // do some stuff
    r = new Random(Guid.NewGuid().GetHashCode());
    iRandom = r.Next(30000);
    // do some other stuff
}
A REALLY nice explanation: http://msdn.microsoft.com/en-us/library/system.random.aspx
Try seeding with a new Guid or, better yet, use the crypto provider. Here's a Stack Overflow question with code:
Best way to seed Random() in singleton
DateTime.Now.Ticks is of type Int64 (long). If you cast it to an int, the cast can produce the same seed value, and therefore the same random number.
No need to reseed or recreate the Random object once it is created. Just reuse the object and obtain the next random number.
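The effect these answers describe can be checked directly: two Random instances built from the same seed return the same sequence, which is what happens when both loop iterations read the same DateTime.Now.Ticks value. The SeedDemo class below is a hypothetical sketch; Draw shows the reuse-one-instance fix.

```csharp
using System;

static class SeedDemo
{
    // Two generators built from the same seed produce identical sequences,
    // just like two iterations that both read the same DateTime.Now.Ticks.
    public static bool SameSeedSameSequence(int seed, int count)
    {
        var a = new Random(seed);
        var b = new Random(seed);
        for (int i = 0; i < count; i++)
            if (a.Next(30000) != b.Next(30000))
                return false;
        return true;
    }

    // The fix: create one instance and keep drawing from it.
    public static int[] Draw(Random r, int count)
    {
        var values = new int[count];
        for (int i = 0; i < count; i++)
            values[i] = r.Next(30000);
        return values;
    }
}
```

With a single instance, each call to Next advances one sequence, so consecutive iterations no longer depend on the clock having moved.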
