Multiply large numbers in parallel - C#

I wrote a large-number multiply function and I want to use it to calculate powers of big numbers, e.g. (2321313200000888)^25, so I do it like this:
public string power(string num1, int n)
{
    Stopwatch timer = new Stopwatch();
    timer.Start();
    string answer = num1;
    for (int i = 1; i < n; i++)
        answer = multiply(num1, answer);
    if (this.InvokeRequired)
    {
        this.richTextBox1.BeginInvoke((MethodInvoker)delegate() { richTextBox1.Text = answer; });
        this.label6.BeginInvoke((MethodInvoker)delegate() { label6.Text = answer.Length.ToString(); });
        this.label2.BeginInvoke((MethodInvoker)delegate() { label2.Text = timer.Elapsed.ToString(); });
    }
    return answer;
}
I want to do this in parallel to reduce the time. How can I do that? I tried creating tasks, but it is the same as sequential.

I don't know an easy way to parallelize this, but you can definitely reduce the number of multiplications. The easiest way is to consider the binary representation of the exponent (my apologies for the lack of MathJax): 25 in binary is 11001, i.e. 25 = 16 + 8 + 1.
So first compute, by successive multiplication with itself: x^2, x^4, x^8, x^16.
Then multiply the three that are required: x^25 = x^16 * x^8 * x. This reduces from 24 multiplications to 7 multiplications. The savings grow quickly for very large powers, though the time varies quite a bit depending on how many bits are in the power.
You'll find that you can optimize the process further if you're able to find a partition of the exponent such that the bigger exponents are used only once.
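A minimal sketch of the square-and-multiply idea in code, using BigInteger for brevity (your string-based multiply could be substituted for the BigInteger multiplications):

using System.Numerics;

// Square-and-multiply: scan the bits of the exponent from least to most
// significant, squaring the base at each step and multiplying it into the
// result only when the current bit is set. Needs O(log n) multiplications.
static BigInteger Power(BigInteger value, int exponent)
{
    BigInteger result = BigInteger.One;
    BigInteger current = value;
    int e = exponent;
    while (e > 0)
    {
        if ((e & 1) == 1)
            result *= current;   // this bit of the exponent is set
        current *= current;      // square for the next bit position
        e >>= 1;
    }
    return result;
}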

Related

Parallel for loop with BigIntegers

I am working on a program to factor a given number: you type a number on the command line and it gives you all the factors. Doing this sequentially is VERY slow, because it uses only one thread, if I'm not mistaken. I thought about doing it with Parallel.For and it worked, but only with integers, so I wanted to try it with BigIntegers.
Here's the normal code:
public static void Factor(BigInteger f)
{
for(BigInteger i = 1; i <= f; i++)
{
if (f % i == 0)
{
Console.WriteLine("{0} is a factor of {1}", i, f);
}
}
}
The above code should be pretty easy to understand, but as I said, it is VERY slow for big numbers (above one billion it starts to get impractical). Here's my parallel code:
public static void ParallelFacotr(BigInteger d)
{
Parallel.For<BigInteger>(0, d, new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount }, i =>
{
try
{
if (d % i == 0)
{
Console.WriteLine("{0} is a factor of {1}", i, d);
}
}
catch (DivideByZeroException)
{
Console.WriteLine("Done!");
Console.ReadKey();
Launcher.Main();
}
});
}
The above (parallel) code works just fine with integers (int) and it's VERY fast: it factored 1,000,000,000 in just 2 seconds. So I thought, why not try it with bigger integers? I also thought that putting <BigInteger> after Parallel.For would do it, but it doesn't. So how do you work with BigIntegers in a parallel for loop? I already tried a regular parallel loop with a BigInteger as the argument, but then it gives an error saying that it cannot convert from BigInteger to int.
Improve your algorithm efficiency first.
While it is possible to use BigInteger, the CPU's ALU has no hardware arithmetic for arbitrarily big numbers, so it will be noticeably slower. Unless you need numbers bigger than 9 quintillion (exactly 9,223,372,036,854,775,807), you can use the type long.
A second thing to note is that you do not need to loop over all candidates: factors come in pairs around the square root, so you can reduce
for(BigInteger i = 1; i <= f; i++)
to
for(long i = 1; i <= Math.Sqrt(f); i++)
That would mean that instead of needing to iterate over 1,000,000,000 items you iterate over only 31,623.
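A short sketch of that square-root bound in plain long arithmetic (the method name is illustrative); each divisor i found below sqrt(f) yields a matching co-factor f / i:

static void FactorFast(long f)
{
    long limit = (long)Math.Sqrt(f);
    for (long i = 1; i <= limit; i++)
    {
        if (f % i == 0)
        {
            Console.WriteLine("{0} is a factor of {1}", i, f);
            if (i != f / i)   // avoid printing the square root twice
                Console.WriteLine("{0} is a factor of {1}", f / i, f);
        }
    }
}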
Additionally, if you still plan on using BigInteger, then check the parameters.
It should be something along the lines of
Parallel.For(
    0,
    (int)d,
    () => BigInteger.Zero,
    (x, state, subTotal) => subTotal + BigInteger.One,
    subTotal => { /* localFinally: consume the per-thread value here */ });
Just for trivia: some programming languages are more efficient at solving certain problems than others, and in this case there is the Wolfram Language (previously Mathematica), in which solving such problems is simpler, granted that you know what you are doing.
They also have a Google-like alternative, Wolfram|Alpha, that answers you directly; it has a decent AI that processes your natural language to give you as exact an answer as it can.
So finding factors of numbers is as easy as:
Factor[9223372036854775809]
or use the web API https://www.wolframalpha.com/input/?i=factor+9223372036854775809
You can also call Wolfram kernel from C#, but terms and conditions apply.

Calculating the approximate run time of a for loop

I have a piece of code in my C# Windows Forms application that looks like this:
List<string> RESULT_LIST = new List<string>();
int[] arr = My_LIST.ToArray();
string s = "";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < arr.Length; i++)
{
int counter = i;
for (int j = 1; j <= arr.Length; j++)
{
counter++;
if (counter == arr.Length)
{
counter = 0;
}
s += arr[counter].ToString();
RESULT_LIST.Add(s);
}
s = "";
}
sw.Stop();
TimeSpan ts = sw.Elapsed;
string elapsedTime = String.Format("{0:00}", ts.TotalMilliseconds * 1000);
MessageBox.Show(elapsedTime);
I use this code to get every combination of the numbers of my list. I have treated My_LIST like a circular one. The image below demonstrates my purpose very clearly:
All I need to do is:
Making a formula to calculate the approximate run time of these two nested for loops, to guess the run time for any length and help the user know the approximate time that he/she must wait.
I have used a C# Stopwatch like this: Stopwatch sw = new Stopwatch(); to show the run time, and below are the results. (Note that in order to reduce the chance of error I've repeated the calculation three times for each length; the numbers show the time in microseconds for the first, second and third attempt respectively.)
arr.Length = 400; 127838 - 107251 - 100898
arr.Length = 800; 751282 - 750574 - 739869
arr.Length = 1200; 2320517 - 2136107 - 2146099
arr.Length = 2000; 8502631 - 7554743 - 7635173
Note that there are only one-digit numbers in My_LIST, to make the time of adding numbers to the list approximately equal.
How can I find out the relation between arr.Length and run time?
First, let's suppose you have examined the algorithm and noticed that it appears to be quadratic in the array length. This suggests to us that the time taken to run should be a function of the form
t = A + B·n + C·n²
You've gathered some observations by running the code multiple times with different values for n and measuring t. That's a good approach.
The question now is: what are the best values for A, B and C such that they match your observations closely?
This problem can be solved in a variety of ways; I would suggest to you that the least-squares method of regression would be the place to start, and see if you get good results. There's a page on it here:
www.efunda.com/math/leastsquares/lstsqr2dcurve.cfm
UPDATE: I just looked at your algorithm again and realized it is cubic because you have a quadratic string concat in the inner loop. So this technique might not work so well. I suggest you use StringBuilder to make your algorithm quadratic.
Now, suppose you did not know ahead of time that the problem was quadratic. How would you determine the formula then? A good start would be to graph your points on log scale paper; if they roughly form a straight line then the slope of the line gives you a clue as to the power of the polynomial. If they don't form a straight line -- well, cross that bridge when you come to it.
Well, you're going to have to do some math here.
The total number of inner iterations is exactly n^2 — not just O(n^2), but exactly n^2 times.
So you can keep a counter of the number of items processed so far and use a little math to estimate the remaining time:
int numItemProcessed;    // incremented as items are processed
int timeElapsed;         // read from the stopwatch
int totalItems = n * n;
float remainingEstimate = ((float)(totalItems - numItemProcessed) / numItemProcessed) * timeElapsed;
Don't assume the algorithm is necessarily N^2 in time complexity.
Take the averages of your numbers, and plot the best fit on a log-log plot, then measure the gradient. This will give you an idea as to the largest term in the polynomial. (see wikipedia log-log plot)
Once you have that, you can do a least-squares regression to work out the coefficients of the polynomial of the correct order. This will allow an estimate from the data, of the time taken for an unseen problem.
Note: As Eric Lippert said, it depends on what you want to measure - averaging may not be appropriate depending on your use case - the first run time might be more correct.
This method will work for any polynomial algorithm. It will also tell you if the algorithm is polynomial (non-polynomial running times will not give straight lines on the log-log plot).
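A minimal sketch of that idea, under the assumption that a simple least-squares line through (log n, log t) is good enough (the method name is illustrative); the slope approximates the exponent of the dominant term:

static double EstimateOrder(double[] n, double[] t)
{
    // Fit y = a + b*x by least squares, where x = log(n) and y = log(t);
    // the slope b estimates the polynomial order of the running time.
    int m = n.Length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < m; i++)
    {
        double x = Math.Log(n[i]);
        double y = Math.Log(t[i]);
        sumX += x; sumY += y;
        sumXY += x * y; sumXX += x * x;
    }
    return (m * sumXY - sumX * sumY) / (m * sumXX - sumX * sumX);
}

Whether the slope comes out closer to 2 or 3 tells you whether the quadratic or the cubic term dominates at your input sizes, and an ordinary least-squares fit of the corresponding polynomial then gives the coefficients for the estimate.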

Why is it faster to calculate the product of a consecutive array of integers by performing the calculation in pairs?

I was trying to create my own factorial function when I found that the calculation is twice as fast if it is done in pairs, like this:
Groups of 1: 2*3*4 ... 50000*50001 = 4.1 seconds
Groups of 2: (2*3)*(4*5)*(6*7) ... (50000*50001) = 2.0 seconds
Groups of 3: (2*3*4)*(5*6*7) ... (49999*50000*50001) = 4.8 seconds
Here is the c# I used to test this.
Stopwatch timer = new Stopwatch();
timer.Start();
// Separate the calculation into groups of this size.
int k = 2;
BigInteger total = 1;
// Iterates from 2 up to 50001, but instead of incrementing 'i' by one it steps by 'k';
// the inner loop calculates the product of 'i' to 'i+k-1' and multiplies 'total' by that result.
for (var i = 2; i < 50000 + 2; i += k)
{
BigInteger partialTotal = 1;
for (var j = 0; j < k; j++)
{
// Stops if it exceeds 50000.
if (i + j >= 50000) break;
partialTotal *= i + j;
}
total *= partialTotal;
}
Console.WriteLine(timer.ElapsedMilliseconds / 1000.0 + "s");
I tested this at different levels and put the average times over a few tests in a bar graph. I expected it to become more efficient as I increased the group size, but 3 was the least efficient and 4 had no improvement over groups of 1.
Link to First Data
Link to Second Data
What causes this difference, and is there an optimal way to calculate this?
BigInteger has a fast case for numbers of 31 bits or less. When you do a pairwise multiplication, this means a specific fast-path is taken, that multiplies the values into a single ulong and sets the value more explicitly:
public void Mul(ref BigIntegerBuilder reg1, ref BigIntegerBuilder reg2) {
...
if (reg1._iuLast == 0) {
if (reg2._iuLast == 0)
Set((ulong)reg1._uSmall * reg2._uSmall);
else {
...
}
}
else if (reg2._iuLast == 0) {
...
}
else {
...
}
}
public void Set(ulong uu) {
uint uHi = NumericsHelpers.GetHi(uu);
if (uHi == 0) {
_uSmall = NumericsHelpers.GetLo(uu);
_iuLast = 0;
}
else {
SetSizeLazy(2);
_rgu[0] = (uint)uu;
_rgu[1] = uHi;
}
AssertValid(true);
}
A 100% predictable branch like this is perfect for a JIT, and this fast-path should get optimized extremely well. It's possible that _rgu[0] and _rgu[1] are even inlined. This is extremely cheap, so effectively cuts down the number of real operations by a factor of two.
So why is a group of three so much slower? It's obvious that it should be slower than for k = 2; you have far fewer optimized multiplications. More interesting is why it's slower than k = 1. This is easily explained by the fact that the outer multiplication of total now hits the slow path. For k = 2 this impact is mitigated by halving the number of multiplies and the potential inlining of the array.
However, these factors do not help k = 3, and in fact the slow case hurts k = 3 a lot more. The second multiplication in the k = 3 case hits this case
if (reg1._iuLast == 0) {
...
}
else if (reg2._iuLast == 0) {
Load(ref reg1, 1);
Mul(reg2._uSmall);
}
else {
...
}
which allocates
EnsureWritable(1);
uint uCarry = 0;
for (int iu = 0; iu <= _iuLast; iu++)
uCarry = MulCarry(ref _rgu[iu], u, uCarry);
if (uCarry != 0) {
SetSizeKeep(_iuLast + 2, 0);
_rgu[_iuLast] = uCarry;
}
Why does this matter? Well, EnsureWritable(1) causes
uint[] rgu = new uint[_iuLast + 1 + cuExtra];
so rgu becomes length 3. The number of passes in total's code is decided in
public void Mul(ref BigIntegerBuilder reg1, ref BigIntegerBuilder reg2)
as
for (int iu1 = 0; iu1 < cu1; iu1++) {
...
for (int iu2 = 0; iu2 < cu2; iu2++, iuRes++)
uCarry = AddMulCarry(ref _rgu[iuRes], uCur, rgu2[iu2], uCarry);
...
}
which means that we have a total of len(total._rgu) * 3 operations. This hasn't saved us anything! There are only len(total._rgu) * 1 passes for k = 1 - we just do it three times!
There is actually an optimization on the outer loop that reduces this back down to len(total._rgu) * 2:
uint uCur = rgu1[iu1];
if (uCur == 0)
continue;
However, they "optimize" this optimization in a way that hurts far more than before:
if (reg1.CuNonZero <= reg2.CuNonZero) {
rgu1 = reg1._rgu; cu1 = reg1._iuLast + 1;
rgu2 = reg2._rgu; cu2 = reg2._iuLast + 1;
}
else {
rgu1 = reg2._rgu; cu1 = reg2._iuLast + 1;
rgu2 = reg1._rgu; cu2 = reg1._iuLast + 1;
}
For k = 2, that causes the outer loop to be over total, since reg2 contains no zero values with high probability. This is great because total is way longer than partialTotal, so the fewer passes the better. For k = 3, the EnsureWritable(1) will always cause a spare space because the multiplication of three numbers no more than 15 bits long can never exceed 64 bits. This means that, although we still only do one pass over total for k = 2, we do two for k = 3!
This starts to explain why the speed increases again beyond k = 3: the number of passes per addition increases slower than the number of additions decreases, as you're only adding ~15 bits to the inner value each time. The inner multiplications are fast relative to the massive total multiplications, so the more time spent consolidating values, the more time saved in passes over total. Further, the optimization is less frequently a pessimism.
It also explains why odd values take longer: they add an extra 32-bit integer to the _rgu array. This wouldn't happen so cleanly if the ~15 bits weren't so close to half of 32.
It's worth noting that there are a lot of ways to improve this code; the comments here are about why, not how to fix it. The easiest improvement would be to chuck the values in a heap and multiply only the two smallest values at a time.
The time required to do a BigInteger multiplication depends on the size of the product.
Both methods take the same number of multiplications, but if you multiply the factors in pairs, then the average size of the product is much smaller than it is if you multiply each factor with the product of all the smaller ones.
You can do even better if you always multiply the two smallest factors (original factors or intermediate products) that have yet to be multiplied together, until you get to the complete product.
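A rough sketch of that smallest-two-first strategy (assuming .NET 6+ for PriorityQueue and .NET 5+ for BigInteger.GetBitLength; the method name is illustrative):

using System.Collections.Generic;
using System.Numerics;

static BigInteger ProductSmallestFirst(IEnumerable<BigInteger> factors)
{
    // Keep all pending values in a min-heap keyed by their size in bits,
    // and always multiply the two smallest values still remaining.
    var heap = new PriorityQueue<BigInteger, long>();
    foreach (var f in factors)
        heap.Enqueue(f, f.GetBitLength());

    if (heap.Count == 0)
        return BigInteger.One;

    while (heap.Count > 1)
    {
        BigInteger a = heap.Dequeue();
        BigInteger b = heap.Dequeue();
        BigInteger product = a * b;
        heap.Enqueue(product, product.GetBitLength());
    }
    return heap.Dequeue();
}

This keeps the two operands of every multiplication roughly the same (small) size, which is exactly the condition under which BigInteger multiplication is cheapest.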
I think you have a bug ('+' instead of '*').
partialTotal *= i + j;
Good to check that you are getting the right answer, not just interesting performance metrics.
But I'm curious what motivated you to try this. If you do find a difference, I would expect it would have to do with optimality of register and/or memory allocation. And I would expect it would be 0-30% or something like that, not 50%.

Optimization of an algorithm that converts an 8-bit number to char

I need to convert an 8-bit number such as 00001110 to a char. The problem is easy, so I wrote the code and everything works fine, but now I need to optimize for speed as much as possible.
In the test class:
class Program
{
static void Main(string[] args)
{
Random r = new Random();
int[] testTab = new int[8];
Normal n = new Normal();
long time;
Stopwatch watch = new Stopwatch();
watch.Start();
for (int i = 0; i < 9000; i++)
{
for (int j = 0; j < 8; j++)
{
testTab[j] = r.Next(2);
}
n.SetTable(testTab);
n.Decode();
}
watch.Stop();
time = watch.ElapsedTicks;
Console.WriteLine(time);
time = watch.ElapsedMilliseconds;
Console.WriteLine(time);
Console.ReadKey();
}
}
and the class with the algorithm:
class Normal
{
private int[] _tab = new int[8];
public void SetTable(int[] tab)
{
_tab = tab;
}
public void Decode()
{
char a = ((char)(_tab[0]*1 + _tab[1]*2 + _tab[2]*4 + _tab[3]*8 + _tab[4]*16 + _tab[5]*32 +
                 _tab[6]*64 + _tab[7]*128));
}
}
In the output, for 9000 iterations I get about 2 ms, which is not a long time, but I have a fast processor in my PC.
The final code will be running on a smartphone, so there is no powerful CPU. In my algorithm I use random data; in the final version I will load the data from the camera (so it will take longer) and try to repeat this operation 10 times per second, which is why I need the best possible time even for the smallest operations.
Is there a faster way to convert byte to char than this?
char a = ((char)( _tab[0]*1 + _tab[1]*2 + _tab[2]*4 + _tab[3]*8 + _tab[4]*16 + _tab[5]*32 + _tab[6]*64 + _tab[7]*128));
tl;dr Your conversion code is already efficient, and is not your bottleneck.
Your benchmarking is flawed. You are not just timing the conversion of binary stored in int[] to integer value. You are also timing the generation of your random data. I expect that the majority of the time is spent generating the random data.
Re-write your benchmarking program to operate on data prepared before you start timing. Make sure that the duration of the test is at least 5 or 10 seconds so that you can generate meaningful answers. If you only run for two milliseconds then the granularity of your timer affects the quality of your results.
Bear in mind that in your real application you will be taking a picture on a camera of a QR code and decoding that. The cost of that is many orders of magnitude greater than the cost of converting the 8 bit int arrays.
Your code to do that conversion is already efficient. Do not seek to optimize it further. Not only is there no need to optimize it, there is little hope for significant gains. For the sake of clarity and conciseness you may well opt to use one of the .net library methods that perform such a conversion, but performance of this part of your program is not an issue.
As an aside, it looks like you need to be converting the 8 bit value to byte, adding these values to a byte array, and then feeding to Encoding.GetString to obtain your text. A cast to UTF-16 char as per your code is not correct.
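A minimal sketch of that byte-array + Encoding approach (assuming the decoded bytes represent ASCII text):

// Each decoded 8-bit group becomes one byte; the whole buffer is then
// interpreted as text in one go instead of casting individual bytes to chars.
byte[] decoded = { 0x48, 0x69 };                              // e.g. two decoded values
string text = System.Text.Encoding.ASCII.GetString(decoded);  // "Hi"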
It's worth trying this:
var yourString = "00100000";
char yourChar = (char) Convert.ToByte(yourString, 2); // you got ' ' (space)
It may or may not be faster, but it is definitely simpler, more stable and more maintainable.
I ran some tests with different implementations.
The first was @Melnikovl's answer.
The second was mine, where I replaced + with | and * with the << operator.
The third was the author's original solution.
I tested with modified code and measured only the conversion code.
The first and second solutions showed a little better performance, but BitConverter was better a little more often, so I think you should choose it (also because of the simplicity of the code):
byte[] bytes = { 1, 1, 1, 1 };
int i = BitConverter.ToInt32(bytes, 0);
char a = (char)i;
Don't forget to check whether the byte array is little- or big-endian.
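For reference, a sketch of the shift/OR variant mentioned above (assuming, as in the question's code, that _tab[0] holds the least significant bit):

// Same weights as the original code, but built with shifts and bitwise OR
// instead of multiplications and additions.
char a = (char)(_tab[0]      | _tab[1] << 1 | _tab[2] << 2 | _tab[3] << 3 |
                _tab[4] << 4 | _tab[5] << 5 | _tab[6] << 6 | _tab[7] << 7);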

Simple program using Tasks

I got homework to create a program using Tasks.
The program should generate 100,000 numbers in the range from 100,000 to 100,000,000
Then it should check whether the number is a prime number and whether that number is a Fibonacci number.
There are strict requirements for this program:
1) The first task should generate the number.
2) Second task should check if the number is prime.
3) The third task should check if the number is fibonacci.
!! Only after the third task should the program generate a new number. !!
I have written a program, but it takes a very long time before all the numbers are generated, about 25 minutes. What am I doing wrong? Is there a faster way to do it? (I'm using MS Visual Studio 2010, .NET Framework 4.)
Sorry for my English..
Code:
for (int i = 0; i < 100000; i++)
{
Task<int> t1 = Task<int>.Factory.StartNew(() => RandomGen1.Next()); // Generate Number
Task<string> t2 = Task<string>.Factory.StartNew(() => is_prime(t1.Result)); // Return Yes or No
Task<string> t3 = Task<string>.Factory.StartNew(() => IsFibonacci(t1.Result)); // Return Yes or No
Task.WaitAll(new Task[] {t1,t2,t3});
this.dataGridView1.Rows.Add(t1.Result, t2.Result, t3.Result);
}
Update:
Here's my is_prime and isFibonacci functions:
public string is_prime(int number)
{
for (int i = 2; i < number; ++i)
{
if (number % i == 0)
{
return "";
}
}
return "Yes";
}
static string IsFibonacci(int number)
{
double fi = (1 + Math.Sqrt(5)) / 2.0; //Golden ratio
int n = (int)Math.Floor(Math.Log(number * Math.Sqrt(5) + 0.5, fi)); //Find's the index (n) of the given number in the fibonacci sequence
int actualFibonacciNumber = (int)Math.Floor(Math.Pow(fi, n) / Math.Sqrt(5) + 0.5); //Finds the actual number corresponding to given index (n)
if (actualFibonacciNumber == number)
{
return "Yes";
}
return "";
}
Whilst I don't usually do homework for people, I found this a nice problem to solve, and I had a some code laying around from solving Project Euler questions that I could put to use.
I agree with one of the comments that this is a strange requirement to use Tasks for; it seems to be using Tasks just for the sake of it.
The console app I put together runs in about 15 seconds. No, I'm not going to post the code, because you won't learn anything.
Without seeing the code behind your is_prime and IsFibonacci methods, it's hard to say what could be wrong. I hope you aren't doing something silly like calculating a list of all the Fibonacci numbers up to the random number on each iteration through the loop, as that is going to be dreadfully slow.
Remember: a number n is a Fibonacci number if and only if one or both of 5n^2 + 4 and 5n^2 − 4 is a perfect square.
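Not a full solution, but a sketch of the two checks that usually dominate the runtime, assuming trial division only up to the square root and the perfect-square test from the hint above:

static bool IsPrime(int number)
{
    if (number < 2) return false;
    if (number % 2 == 0) return number == 2;
    // Trial division only up to sqrt(number) instead of up to number itself.
    for (int i = 3; (long)i * i <= number; i += 2)
        if (number % i == 0) return false;
    return true;
}

static bool IsPerfectSquare(long n)
{
    long root = (long)Math.Sqrt(n);
    // Guard against floating-point rounding at the boundary.
    return root * root == n || (root + 1) * (root + 1) == n;
}

static bool IsFibonacci(long n)
{
    // n is a Fibonacci number iff 5n^2 + 4 or 5n^2 - 4 is a perfect square.
    // long is used because 5 * n^2 overflows int for n near 100,000,000.
    return IsPerfectSquare(5 * n * n + 4) || IsPerfectSquare(5 * n * n - 4);
}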
