Matrix Multiplication with specific number of threads [duplicate]

This question already has answers here:
Thread parameters being changed
(2 answers)
Closed 6 years ago.
I am completely new to multithreading. At the moment I am trying to implement a command-line program that multiplies two matrices of equal size. The main goal is that the user can pass a specific number of threads as a command-line argument and that the multiplication is then carried out using exactly that number of threads.
My approach is based on the following Java implementation, which tries to solve a similar task: Java Implementation Matrix Multiplication Multi-Threading
My current state is the following:
using System;
using System.Threading;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
class Program
{
static int rows = 16;
static int columns = 16;
static int[] temp = new int[rows*columns];
static int[,] matrixA = new int[rows, columns];
static int[,] matrixB = new int[rows, columns];
static int[,] result = new int[rows, columns];
static Thread[] threadPool;
static void runMultiplication(int index){
for(int i = 0; i < rows; i++){
for(int j = 0; j < columns; j++){
Console.WriteLine();
result[index, i] += matrixA[index, j] * matrixB[j, i];
}
}
}
static void fillMatrix(){
for (int i = 0; i < matrixA.GetLength(0); i++) {
for (int j = 0; j < matrixA.GetLength(1); j++) {
matrixA[i, j] = 1;
matrixB[i, j] = 2;
}
}
}
static void multiplyMatrices(){
threadPool = new Thread[rows];
for(int i = 0; i < rows; i++){
threadPool[i] = new Thread(() => runMultiplication(i));
threadPool[i].Start();
}
for(int i = 0; i < rows; i++){
try{
threadPool[i].Join();
}catch (Exception e){
Console.WriteLine(e.Message);
}
}
}
static void printMatrix(int[,] matrix) {
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < columns; j++)
{
Console.Write(string.Format("{0} ", matrix[i, j]));
}
Console.Write(Environment.NewLine + Environment.NewLine);
}
}
static void Main(String[] args){
fillMatrix();
multiplyMatrices();
printMatrix(result);
}
}
At the moment I have two problems:
My result matrix contains values that are far from the correct result.
I do not know whether I am on the right track toward my goal of letting the user specify how many threads should be used.
I would be very grateful if anyone could guide me to a solution.
PS: I know there are existing posts which are similar to mine, but the main challenge in my post is to allow the user to set the number of threads which will then solve the matrix multiplication.

Linear algebra is a difficult place to begin with threads if you're new. I'd recommend learning about map/reduce first and implementing that in C#.
Imagine you have just one core and want to perform a long calculation. Multiple threads are scheduled by the operating system so that one does some work, then the next is given a turn, and so on. It's easy to do the thought experiment and see that context switching will make the problem run slower than the single-threaded version. There's no true parallelization there.
The problem is that most linear algebra operations are not easily parallelizable. They aren't independent of each other. More threads than cores will not improve the situation and may make performance worse.
The best you can do is one thread per core and partitioning the matrix like this.
Here's a thought: Before you worry about multithreading, take your Matrix class and make sure that every single operation works properly with a single thread. There's no sense in worrying about multithreading if your code doesn't produce the right answers for a single thread. Get that working, then figure out how to partition the problem among multiple threads.
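To make the partitioning idea concrete, here is a minimal sketch, not the asker's final program. It assumes each thread handles a contiguous block of result rows, and the names ThreadedMatrixMultiply, Multiply and threadCount are made up for illustration. It also copies the per-thread bounds into locals before creating the thread, because a lambda that captures the for-loop variable itself sees whatever value the variable has when the thread actually runs, which is one way to get the kind of garbage results described in the question.

```csharp
using System;
using System.Threading;

static class ThreadedMatrixMultiply
{
    // Multiplies a (n x m) by b (m x p) using exactly threadCount threads.
    // Each thread computes a contiguous block of rows of the result.
    public static int[,] Multiply(int[,] a, int[,] b, int threadCount)
    {
        int n = a.GetLength(0), m = a.GetLength(1), p = b.GetLength(1);
        if (b.GetLength(0) != m)
            throw new ArgumentException("Inner dimensions must match.");
        if (threadCount < 1)
            throw new ArgumentOutOfRangeException(nameof(threadCount));

        var result = new int[n, p];
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            // Locals declared inside the loop body are fresh per iteration,
            // so the lambda below does not capture the shared variable t.
            int firstRow = (int)((long)t * n / threadCount);
            int lastRow = (int)((long)(t + 1) * n / threadCount);

            threads[t] = new Thread(() =>
            {
                for (int i = firstRow; i < lastRow; i++)
                {
                    for (int j = 0; j < p; j++)
                    {
                        int sum = 0;
                        for (int k = 0; k < m; k++)
                            sum += a[i, k] * b[k, j];
                        result[i, j] = sum; // each cell is written by one thread only
                    }
                }
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads)
            thread.Join();

        return result;
    }
}
```

The thread count could come from something like int.Parse(args[0]). If it is larger than the number of rows, some threads simply get an empty row range. No locking is needed here because the threads only read the inputs and write to disjoint rows of the result.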

Related

Printing square with non repetitive character

I want to print a rectangle like this :
&#*#
#*#&
*#&#
#&#*
But the problem is that I can't find the algorithm to print this.
I only know how to print a simple rectangle/square:
public static void Main(string[] args)
{
Console.Out.Write("Enter the size: ");
int taille = int.Parse(Console.In.ReadLine());
int i;
int j;
for(i = 0; i < taille; i++){
for(j = 0; j < taille; j++){
Console.Write("*");
}
Console.WriteLine("");
}
}
Thank you !

First things first: unless you need your iterators outside of the loop, just declare them in the for statement.
public static void Main(string[] args)
{
Console.Out.Write("Enter the size: ");
int taille = int.Parse(Console.In.ReadLine());
for(int i = 0; i < taille; i++){
for(int j = 0; j < taille; j++){
Console.Write("*");
}
Console.WriteLine("");
}
}
Second, you'll need a list of the characters you want to use. Given your example:
char[] chars = { '&', '#', '*', '#' };
We also need a way to know which character to use at any given time, say an index we can call characterIndex for simplicity. We will increment it on each iteration. If incrementing it puts it out of the range of our character array (that is, if characterIndex == chars.Length, which is 4 here), we set it back to zero.
int characterIndex;
To get the scrolling effect you have, before each line we must select a characterIndex that is offset by the row
characterIndex = i % chars.Length;
Tying it all together
public static void Main(string[] args)
{
char[] chars = { '&', '#', '*', '#' };
int characterIndex;
Console.Out.Write("Enter the size: ");
int taille = int.Parse(Console.In.ReadLine());
for(int i = 0; i < taille; i++){
characterIndex = i % chars.Length;
for(int j = 0; j < taille; j++){
Console.Write(chars[characterIndex]);
characterIndex++;
if(characterIndex == chars.Length)
characterIndex = 0;
}
Console.WriteLine("");
}
}
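The same offset idea can be written without a running counter, since the character at row i, column j is just chars[(i + j) % chars.Length]. The sketch below is only an illustration of that shortcut. PatternPrinter and BuildPattern are hypothetical names, and it returns the pattern as a string instead of printing it directly so that it is easy to check.

```csharp
using System;
using System.Text;

static class PatternPrinter
{
    // Builds a size x size square in which every row is the previous row
    // shifted one position to the left.
    public static string BuildPattern(int size)
    {
        char[] chars = { '&', '#', '*', '#' };
        var sb = new StringBuilder();
        for (int i = 0; i < size; i++)
        {
            for (int j = 0; j < size; j++)
            {
                // Row index plus column index selects the character, wrapping around.
                sb.Append(chars[(i + j) % chars.Length]);
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

Console.Write(PatternPrinter.BuildPattern(4)) reproduces the four rows from the question.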

Getting the permutations by nesting for loops only works if you know exactly how many elements there will be, because you need to write one for loop for every element after the first.
The proper way to deal with this is recursion. There are cases where recursion and nested for loops are interchangeable, and in those cases loops have a potential speed advantage. Normally the usual warning against premature optimization applies to such differences, but with the sheer amount of data that both recursion and loops may have to deal with, it often matters, so it is best to prefer loops where possible.
As far as I know, permutations are not a case where loops and recursion are interchangeable; recursion seems to be mandatory. Some problems are simply inherently recursive. As the recursive version is fairly well known, I will not post any example code.
You should definitely use recursion. From your example code I assume that:
You are in a learning environment.
You have just learned recursion.
An input size that recursion can solve effortlessly (such as 6 or 20 elements) is the next assignment.
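For readers who have not seen the recursive version, here is one common shape of it, offered as a sketch and not as the code the answer had in mind. It fixes one element at a time by swapping it into position and recursing on the rest. Permutations, All and Permute are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

static class Permutations
{
    // Returns every ordering of the given items (n! arrays for n distinct items).
    public static List<T[]> All<T>(T[] items)
    {
        var results = new List<T[]>();
        Permute((T[])items.Clone(), 0, results);
        return results;
    }

    private static void Permute<T>(T[] items, int start, List<T[]> results)
    {
        // Base case: at most one element is left to place, so the order is fixed.
        if (start >= items.Length - 1)
        {
            results.Add((T[])items.Clone());
            return;
        }

        for (int i = start; i < items.Length; i++)
        {
            Swap(items, start, i);              // choose items[i] for position start
            Permute(items, start + 1, results); // permute the remaining positions
            Swap(items, start, i);              // undo the choice before trying the next
        }
    }

    private static void Swap<T>(T[] items, int x, int y)
    {
        T tmp = items[x];
        items[x] = items[y];
        items[y] = tmp;
    }
}
```

The recursion depth equals the number of elements, which nested for loops cannot express unless that number is fixed when the code is written. Keep in mind that the number of results grows factorially, so storing them all in a list is only practical for small inputs.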

Parallelize transitive reduction

I have a Dictionary<int, List<int>>, where the Key represents an element of a set (or a vertex in an oriented graph) and the List is a set of other elements which are in relation with the Key (so there are oriented edges from Key to Values). The dictionary is optimized for creating a Hasse diagram, so the Values are always smaller than the Key.
I have also a simple sequential algorithm, that removes all transitive edges (e.g. I have relations 1->2, 2->3 and 1->3. I can remove the edge 1->3, because I have a path between 1 and 3 via 2).
for(int i = 1; i < dictionary.Count; i++)
{
for(int j = 0; j < i; j++)
{
if(dictionary[i].Contains(j))
dictionary[i].RemoveAll(r => dictionary[j].Contains(r));
}
}
Would it be possible to parallelize the algorithm? I could do Parallel.For for the inner loop. However, this is not recommended (https://msdn.microsoft.com/en-us/library/dd997392(v=vs.110).aspx#Anchor_2) and the resulting speed would not increase significantly (+ there might be problems with locking). Could I parallelize the outer loop?

There is a simple way to solve the parallelization problem: separate the data. Read from the original data structure and write to a new one. That way you can run it in parallel without even needing a lock.
But parallelization is probably not even necessary, because the data structures are not efficient. You use a dictionary where an array would be sufficient (as I understand the code, you have vertices 0..result.Count-1), and a List<int> for lookups. List.Contains is very inefficient; HashSet would be better, or, for denser graphs, BitArray. So instead of Dictionary<int, List<int>> you can use BitArray[].
I rewrote the algorithm and made some optimizations. It does not make a plain copy of the graph and then delete edges; it just constructs the new graph from only the right edges. It uses BitArray[] for the input graph and List<int>[] for the final graph, as the latter is far more sparse.
int sizeOfGraph = 1000;
//create vertices of a graph
BitArray[] inputGraph = new BitArray[sizeOfGraph];
for (int i = 0; i < inputGraph.Length; ++i)
{
inputGraph[i] = new BitArray(i);
}
//fill random edges
Random rand = new Random(10);
for (int i = 1; i < inputGraph.Length; ++i)
{
BitArray vertex_i = inputGraph[i];
for(int j = 0; j < vertex_i.Count; ++j)
{
if(rand.Next(0, 100) < 50) //50% fill ratio
{
vertex_i[j] = true;
}
}
}
//create transitive closure
for (int i = 0; i < sizeOfGraph; ++i)
{
BitArray vertex_i = inputGraph[i];
for (int j = 0; j < i; ++j)
{
if (vertex_i[j]) { continue; }
for (int r = j + 1; r < i; ++r)
{
if (vertex_i[r] && inputGraph[r][j])
{
vertex_i[j] = true;
break;
}
}
}
}
//create transitive reduction
List<int>[] reducedGraph = new List<int>[sizeOfGraph];
Parallel.ForEach(inputGraph, (vertex_i, state, ii) =>
{
{
int i = (int)ii;
List<int> reducedVertex = reducedGraph[i] = new List<int>();
for (int j = i - 1; j >= 0; --j)
{
if (vertex_i[j])
{
bool ok = true;
for (int x = 0; x < reducedVertex.Count; ++x)
{
if (inputGraph[reducedVertex[x]][j])
{
ok = false;
break;
}
}
if (ok)
{
reducedVertex.Add(j);
}
}
}
}
});
MessageBox.Show("Finished, reduced graph has "
+ reducedGraph.Sum(s => s.Count()) + " edges.");
EDIT
I previously wrote:
The code has some problems. With the direction i goes now, you can delete edges you would need and the result would be incorrect.
This turned out to be a mistake. I was thinking along these lines. Take the graph
1->0
2->1, 2->0
3->2, 3->1, 3->0
Vertex 2 gets reduced by vertex 1, so we have
1->0
2->1
3->2, 3->1, 3->0
Now vertex 3 gets reduced by vertex 2
1->0
2->1
3->2, 3->0
And we have a problem, because we cannot reduce 3->0, which stayed because 2->0 had already been removed. But that was my mistake; this can never happen. The inner loop goes strictly from lower to higher, so instead:
Vertex 3 gets reduced by vertex 1
1->0
2->1
3->2, 3->1
and now by vertex 2
1->0
2->1
3->2
And the result is correct. I apologize for the error.

Weird behavior of multithread random numbers generator

Please check the code below, which tries to compute the probability of a birthday collision. To my surprise, if I execute the code sequentially, the result is the expected value of around 0.44, but if I run it in parallel with Parallel.For, the result is 0.99.
Can anyone explain the result?
public static void BirthdayConflict(int num = 5, int people = 300) {
int N = 100000;
int act = 0;
Random r = new Random();
Action<int> action = (a) => {
List<int> p = new List<int>();
for (int i = 0; i < people; i++)
{
p.Add(r.Next(364) + 1);
}
p.Sort();
bool b = false;
for (int i = 0; i < 300; i++)
{
if (i + num -1 >= people) break;
if (p[i] == p[i + num -1])
b = true;
}
if (b)
Interlocked.Increment(ref act);
// act++;
};
// Result is around 0.99 - which is not OK
// Parallel.For( 0, N, action);
//Result is around 0.44 - which is OK
for (int i = 0; i < N; i++)
{
action(0);
}
Console.WriteLine(act / 100000.0);
Console.ReadLine();
}

You're using a System.Random instance that is shared between threads. It's not thread-safe, so you're getting wrong results (actually, it just stops working and returns 0). From MSDN:
If your app calls Random methods from multiple threads, you must use a synchronization object to ensure that only one thread can access the random number generator at a time. If you don't ensure that the Random object is accessed in a thread-safe way, calls to methods that return random numbers return 0.
A simple (but not so efficient for parallel execution) solution is to use a lock:
lock (r)
{
for (int i = 0; i < people; i++)
{
p.Add(r.Next(364) + 1);
}
}
To improve performance (but you should measure), you may use multiple instances of System.Random; be careful to initialize each one with a different seed.
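One way to get "multiple instances with different seeds" is a ThreadLocal<Random> whose factory hands out a new seed per thread. The sketch below is an assumption about how that could look and is not taken from the question. ParallelBirthdays, Estimate and seedCounter are invented names, and the seeding scheme (a counter starting at Environment.TickCount) is just one simple choice. The r.Next(364) + 1 call is copied unchanged from the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelBirthdays
{
    // Shared counter so every thread's Random gets a distinct seed.
    private static int seedCounter = Environment.TickCount;

    // One Random per thread; no instance is ever touched by two threads.
    private static readonly ThreadLocal<Random> Rng =
        new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref seedCounter)));

    // Estimates the probability that at least `num` of `people` share a birthday.
    public static double Estimate(int trials, int people, int num)
    {
        int hits = 0;
        Parallel.For(0, trials, trial =>
        {
            Random r = Rng.Value;
            var days = new int[people];
            for (int i = 0; i < people; i++)
                days[i] = r.Next(364) + 1; // same range as in the question

            Array.Sort(days);

            // After sorting, `num` equal birthdays sit next to each other.
            for (int i = 0; i + num - 1 < people; i++)
            {
                if (days[i] == days[i + num - 1])
                {
                    Interlocked.Increment(ref hits);
                    break;
                }
            }
        });
        return (double)hits / trials;
    }
}
```

With this structure, ParallelBirthdays.Estimate(100000, 300, 5) should land close to the sequential figure reported in the question, but as noted above, measure before assuming it is faster than the lock-based version.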

I found a useful explanation of why Random does not work correctly across multiple threads. Although it was originally written for Java, it can still be helpful.

C# Random number code check

The code below is copied from a paper, an undergraduate work that was linked to from a Wikipedia page. I believe I've spotted some flaws in the paper and some in the code, but as I have no C# experience I just want to double-check.
My understanding is that this code was meant to create a large pseudo-random number, but I believe it instead creates a large number that is a smaller random number repeated over and over,
i.e. 123123123 instead of 123784675. Can someone please confirm what the code does?
What I read here http://csharpindepth.com/Articles/Chapter12/Random.aspx and in various posts on Stack Overflow makes me believe that it's using the same seed and hence getting the same number on each iteration, and just appending that same number over and over.
Random randomNumber = new Random();
counter = 0;
for (int j = 0; j < 1; j++)
{
StringBuilder largeRandomNumber = new StringBuilder();
for (int i = 0; i < 40000; i++)
{
int value = randomNumber.Next(11111, 99999);
largeRandomNumber.Append(value);
}
}

It creates an instance of Random and loops, generating the next random number with random.Next(int min, int max) and appending it to the end of a string. Essentially, it just builds one huge number. The outer loop is not needed at all. Random doesn't need to be seeded again after creation; it keeps the same seed and progresses correctly through the Next method. Everything about this code "works", but it seems pointless in any application besides learning about the Random class.

The code is fine.
Try it here: http://www.compileonline.com/compile_csharp_online.php
using System.IO;
using System;
using System.Text;
class Program
{
static void Main()
{
Random randomNumber = new Random();
for (int j = 0; j < 1; j++)
{
StringBuilder largeRandomNumber = new StringBuilder();
for (int i = 0; i < 40; i++)
{
int value = randomNumber.Next(11111, 99999);
Console.WriteLine(value);
}
}
}
}

It is correct. Yes, the seed is the same, but this line makes sure that you get a different number each time it runs:
int value = randomNumber.Next(11111, 99999);
and since you are appending this number to a string to create large random number, this does what it was supposed to do.

randomNumber is seeded on construction. randomNumber.Next is returning the next random integer between the two given integers based on the initial seed, and is not reseeded, thus giving 40000 new random numbers and appending them.
Not sure what the outer loop is for, it only runs once anyway

Yes, it is quite likely to generate the same number if called in quick succession, since Random is seeded with the current time (this is how the parameterless constructor behaves on the .NET Framework). However, unless the randomNumber instance is created inside the loop, the code works fine for the purposes of an example.
For example, if the code is plugged into a function like this
public string GetLargeRandomNumber()
{
Random randomNumber = new Random();
StringBuilder largeRandomNumber = new StringBuilder();
for (int j = 0; j < 1; j++)
{
for (int i = 0; i < 40000; i++)
{
int value = randomNumber.Next(11111, 99999);
largeRandomNumber.Append(value);
}
}
return largeRandomNumber.ToString();
}
If it is called from a main function in quick succession, it will return the same random number.
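The effect is easy to reproduce deterministically by passing the same explicit seed to two generators, and the usual remedy is to create one Random and pass it in. The sketch below illustrates both points under that assumption. SeedDemo, SameSeedSameSequence and BuildLargeNumber are hypothetical names and not part of the paper's code.

```csharp
using System;
using System.Text;

static class SeedDemo
{
    // Two generators built from the same seed produce the same sequence.
    public static bool SameSeedSameSequence(int seed, int count)
    {
        var first = new Random(seed);
        var second = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            if (first.Next(11111, 99999) != second.Next(11111, 99999))
                return false;
        }
        return true;
    }

    // Reuses a caller-supplied generator, so repeated calls keep advancing
    // one sequence instead of restarting from a fresh seed each time.
    public static string BuildLargeNumber(Random rng, int chunks)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < chunks; i++)
            sb.Append(rng.Next(11111, 99999)); // upper bound is exclusive: 5 digits each
        return sb.ToString();
    }
}
```

Within a single instance, each call to Next moves on to the next value in the sequence, which is why the loop in the question does not repeat the same five digits.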

c# fixed arrays - which structure is fastest to read from?

I have some large arrays of 2D data elements. The A and B dimensions aren't equally sized.
A) is between 5 and 20
B) is between 1000 and 100000
Initialization time is no problem, as these are only going to be lookup tables for a real-time application, so the performance of indexing an element from known values of A and B is crucial. The data stored is currently a single byte value.
I was thinking around these solutions:
byte[A][B] datalist1a;
or
byte[B][A] datalist2a;
or
byte[A,B] datalist1b;
or
byte[B,A] datalist2b;
or perhaps losing the multidimensionality, as I know the fixed size, and just multiplying the two values before looking it up.
byte[A*Bmax + B] datalist3;
or
byte[B*Amax + A] datalist4;
What I need to know is which data type/array structure to use for the most efficient lookup in C# given this setup.
Edit 1
the first two solutions were supposed to be multidimensional, not multi arrays.
Edit 2
All data in the smallest dimension is read at each lookup, but the large one is only used for indexing, one value at a time.
So it's something like: grab all A's from sample B.

I'd bet on the jagged arrays, unless Amax or Bmax is a power of 2.
I say so because a jagged array needs two indexed accesses, which is very fast. The other forms imply a multiplication, either implicit or explicit. Unless that multiplication is a simple shift, I think it could be a bit heavier than a couple of indexed accesses.
EDIT: Here is the small program used for the test:
using System;
using System.Diagnostics;
class Program
{
private static int A = 10;
private static int B = 100;
private static byte[] _linear;
private static byte[,] _square;
private static byte[][] _jagged;
unsafe static void Main(string[] args)
{
//init arrays
_linear = new byte[A * B];
_square = new byte[A, B];
_jagged = new byte[A][];
for (int i = 0; i < A; i++)
_jagged[i] = new byte[B];
//set-up the params
var sw = new Stopwatch();
byte b;
const int N = 100000;
//one-dim array (buffer)
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
for (int c = 0; c < B; c++)
{
b = _linear[r * B + c];
}
}
}
sw.Stop();
Console.WriteLine("linear={0}", sw.ElapsedMilliseconds);
//two-dim array
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
for (int c = 0; c < B; c++)
{
b = _square[r, c];
}
}
}
sw.Stop();
Console.WriteLine("square={0}", sw.ElapsedMilliseconds);
//jagged array
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
for (int c = 0; c < B; c++)
{
b = _jagged[r][c];
}
}
}
sw.Stop();
Console.WriteLine("jagged={0}", sw.ElapsedMilliseconds);
//one-dim array within unsafe access (and context)
sw.Restart();
for (int i = 0; i < N; i++)
{
for (int r = 0; r < A; r++)
{
fixed (byte* offset = &_linear[r * B])
{
for (int c = 0; c < B; c++)
{
b = *(byte*)(offset + c);
}
}
}
}
sw.Stop();
Console.WriteLine("unsafe={0}", sw.ElapsedMilliseconds);
Console.Write("Press any key...");
Console.ReadKey();
Console.WriteLine();
}
}

Multidimensional ([,]) arrays are nearly always the slowest, except under heavy random-access scenarios. In theory they shouldn't be, but it's one of the CLR oddities.
Jagged arrays ([][]) are nearly always faster than multidimensional arrays, even under random-access scenarios. These have a memory overhead.
Single-dimensional ([]) and algebraic arrays ([y * stride + x]) are the fastest for random access in safe code.
Unsafe code is normally fastest in all cases (provided you don't pin it repeatedly).

The only useful answer to "which X is faster" (for all X) is: you have to do performance tests that reflect your requirements.
And remember to consider, in general*:
Maintenance of the program. If this is not a quick one off, a slightly slower but maintainable program is a better option in most cases.
Micro benchmarks can be deceptive. For instance a tight loop just reading from a collection might be optimised away in ways not possible when real work is being done.
Additionally, consider that you need to look at the complete program to decide where to optimise. Speeding up a loop by 1% might be useful for that loop, but if the loop is only 1% of the complete runtime then it does not make much difference.
* But all rules have exceptions.

On most modern computers, arithmetic operations are far, far faster than memory lookups.
If you fetch a memory address that isn't in a cache, or where out-of-order execution pulls from the wrong place, you are looking at 10-100 clock cycles, whereas a pipelined multiply is 1 clock cycle.
The other issue is cache locality.
byte[B*Amax + A] datalist4; seems like the best bet if you are accessing with A's varying sequentially.
When datalist4[b*Amax + a] is accessed, the computer will usually start pulling in datalist4[b*Amax + a + 64/sizeof(dataListType)], ... +128 ... etc., or, if it detects a reverse iteration, datalist4[b*Amax + a - 64/sizeof(dataListType)].
Hope that helps!
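As a rough sketch of that layout (all names here are made up for illustration), a flat byte[] indexed as b * aMax + a keeps every A value of one sample B next to each other in memory, so "grab all A's from sample B" touches one contiguous run of bytes. LookupTable and Row are hypothetical, and ArraySegment<byte> is used only as a simple way to hand out that contiguous run without copying.

```csharp
using System;

sealed class LookupTable
{
    private readonly byte[] data;
    private readonly int aMax;

    public LookupTable(int aMax, int bMax)
    {
        this.aMax = aMax;
        data = new byte[aMax * bMax];
    }

    // B is the slow (outer) index and A the fast (inner) one.
    public byte this[int b, int a]
    {
        get { return data[b * aMax + a]; }
        set { data[b * aMax + a] = value; }
    }

    // All A values for one sample b: aMax adjacent bytes starting at b * aMax.
    public ArraySegment<byte> Row(int b)
    {
        return new ArraySegment<byte>(data, b * aMax, aMax);
    }
}
```

With A between 5 and 20 and single-byte elements, one such row is at most 20 bytes, so it will often fit within a single 64-byte cache line. Whether this beats a jagged byte[B][A] in practice is something to measure on the real workload.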

Maybe the best option for you would be to use a hash map, i.e. a Dictionary?
