Plinq gives different results from Linq - what am I doing wrong?

Can anyone tell me what the correct Plinq code is for this? I'm adding up the square root of the absolute value of the sine of each element of a double array, but the Plinq version is giving me the wrong result.
Output from this program is:
Linq aggregate = 75.8310477905274 (correct)
Plinq aggregate = 38.0263653589291 (about half what it should be)
I must be doing something wrong, but I can't work out what...
(I'm running this with Visual Studio 2008 on a Core 2 Duo Windows 7 x64 PC.)
Here's the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
namespace ConsoleApplication1
{
class Program
{
static void Main()
{
double[] array = new double[100];
for (int i = 0; i < array.Length; ++i)
{
array[i] = i;
}
double sum1 = array.Aggregate((total, current) => total + Math.Sqrt(Math.Abs(Math.Sin(current))));
Console.WriteLine("Linq aggregate = " + sum1);
IParallelEnumerable<double> parray = array.AsParallel<double>();
double sum2 = parray.Aggregate((total, current) => total + Math.Sqrt(Math.Abs(Math.Sin(current))));
Console.WriteLine("Plinq aggregate = " + sum2);
}
}
}

Aggregate works slightly differently in PLINQ.
From MSDN Blogs:
Rather than expecting a value to initialize the accumulator to, the user gives us a factory function that generates the value:
public static double Average(this IEnumerable<int> source)
{
return source.AsParallel().Aggregate(
() => new double[2],
(acc, elem) => { acc[0] += elem; acc[1]++; return acc; },
(acc1, acc2) => { acc1[0] += acc2[0]; acc1[1] += acc2[1]; return acc1; },
acc => acc[0] / acc[1]);
}
Now, PLINQ can initialize an independent accumulator for each thread. Now that each thread gets its own accumulator, both the folding function and the accumulator combining function are free to mutate the accumulators. PLINQ guarantees that accumulators will not be accessed concurrently from multiple threads.
So, in your case, you also need to pass a combining function that sums the partial results of the parallel aggregates. Without it, the per-thread partial totals are not combined correctly, which is why you're seeing a result that is roughly half of what it should be.

Thank you MSDN Blogs. It now seems to be working correctly. I changed my code as follows:
using System;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main()
{
Test();
}
static void Test()
{
double[] array = new double[100];
for (int i = 0; i < array.Length; ++i)
{
array[i] = i;
}
double sum1 = array.Aggregate((total, current) => total + Math.Sqrt(Math.Abs(Math.Sin(current))));
Console.WriteLine("Linq aggregate = " + sum1);
IParallelEnumerable<double> parray = array.AsParallel();
double sum2 = parray.Aggregate
(
0.0,
(total1, current1) => total1 + Math.Sqrt(Math.Abs(Math.Sin(current1))),
(total2, current2) => total2 + current2,
acc => acc
);
Console.WriteLine("Plinq aggregate = " + sum2);
}
}
}
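For anyone reading this on .NET 4 or later, where AsParallel() returns ParallelQuery<double> rather than the IParallelEnumerable<double> of the early preview, here is a minimal sketch of the same fix. The class and method names (SineSums, Term, Sequential, InParallel) are made up for illustration. Note that the seed is handed to every partition, so it has to be the identity of the combining function (0.0 for addition).

```csharp
using System;
using System.Linq;

public static class SineSums
{
    // Term summed for each element: sqrt(|sin(x)|).
    private static double Term(double x)
    {
        return Math.Sqrt(Math.Abs(Math.Sin(x)));
    }

    // Plain LINQ: a single accumulator, folded left to right.
    public static double Sequential(double[] values)
    {
        return values.Aggregate(0.0, (total, current) => total + Term(current));
    }

    // PLINQ: one accumulator per partition, then a separate combining step.
    public static double InParallel(double[] values)
    {
        return values.AsParallel().Aggregate(
            0.0,                                              // seed used by each partition
            (subtotal, current) => subtotal + Term(current),  // fold one element into a partition subtotal
            (left, right) => left + right,                    // combine two partition subtotals
            total => total);                                  // final projection
    }
}
```

The two results can differ in the last few digits, because floating-point addition is not associative and the partitions are summed in a different order.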

Related

Async Await Loop/Math problems

I'm making a little program to practice with WPF and Async/Await for multithreading, and what the program does is:
Find all the prime numbers between two numbers "a" and "b", and spit them out to a textbox called "Prime1".
Simultaneously in a different task, find all the prime numbers between "c" and "d", and spit them out to a textbox called "Prime2".
A button in the window will allow the user to click it, and it will keep track of how many times it has been clicked, whilst the other two tasks find prime numbers, to demonstrate asynchronous operations.
The code is as follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;
namespace WPF_Asynch_Project
{
public partial class MainWindow : Window
{
public int ClickAmount = 0;
public MainWindow()
{
InitializeComponent();
DelegationIsAwesome();
}
private void Test_Click(object sender, RoutedEventArgs e)
{
ClickAmount++;
MessageBox.Show("You clicked me " + ClickAmount.ToString() + " times!");
}
private void TextBox_TextChanged(object sender, TextChangedEventArgs e)
{
}
private async void DelegationIsAwesome()
{
Task enumtask = new Task(() => FindPrimes(100000, 100000000));
Task[] enumall = new Task[2];
enumall[0] = enumtask;
enumall[1] = new Task(() => FindPrimes2(1000, 10000));
enumall.ToList().ForEach(t => t.Start());
await Task.WhenAll(enumall).ConfigureAwait(false);
}
private void FindPrimes(long lower, long upper)
{
for (long i = lower; i < upper; i++)
{
long primeornot = 1;
for (long q = 2; q < i; q++)
{
if (i % q == 0)
{
primeornot = 0;
}
}
if (primeornot == 1)
{
System.Threading.Thread.Sleep(6);
Prime1.Dispatcher.BeginInvoke(
(Action)(()=>{ Prime1.Text += i.ToString() + ", "; }));
}
}
}
private void FindPrimes2(int lower, long upper)
{
for (int i = lower; i < upper; i++)
{
int primeornot = 1;
for (int q = 2; q < i; q++)
{
if (i % q == 0)
{
primeornot = 0;
}
}
if (primeornot == 1)
{
System.Threading.Thread.Sleep(5);
Prime2.Dispatcher.BeginInvoke(
(Action)(() => { Prime2.Text += i.ToString() + ", "; }));
}
}
}
}
}
However I get odd results. The following is a picture from the program:
Obviously the output from the prime-finding methods is incorrect. But why does it keep repeating those same numbers? It also sometimes spits out a number equal to UpperBound even though "i" should never equal or be greater than UpperBound.
What is happening to my output, and how do I fix it?
This has nothing to do with async/await, really.
You're calling BeginInvoke here:
Prime1.Dispatcher.BeginInvoke(
(Action)(()=>{ Prime1.Text += i.ToString() + ", "; }));
... and your lambda expression uses i, which means it will append the current value of i when the delegate executes. That's not necessarily the value of i when you call BeginInvoke.
If you want to capture the value (rather than the variable) you basically need to instantiate a new variable each time. You might as well do the conversion to a string:
string textToAppend = i + ", ";
// No need for braces here...
Prime1.Dispatcher.BeginInvoke((Action)(() => Prime1.Text += textToAppend));
Because you've declared the variable textToAppend inside the loop, each iteration will create a delegate capturing a separate variable.
You need to do this in both of your methods.
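The same effect can be reproduced without WPF or threads. This is only a sketch (the CaptureDemo class and its method names are invented for illustration): it stores delegates in a list and runs them after the loop has finished, which is roughly what happens when the dispatcher gets around to running your queued delegates late.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Every delegate captures the single loop variable i, so each one
    // reads whatever value i holds when the delegate finally runs.
    public static List<int> CaptureVariable(int count)
    {
        var pending = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            pending.Add(() => i);
        }
        // The loop has finished by now, so i == count for every delegate.
        return pending.ConvertAll(f => f());
    }

    // Each iteration declares a fresh variable, so each delegate
    // captures its own copy of the value.
    public static List<int> CaptureCopy(int count)
    {
        var pending = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int copy = i;
            pending.Add(() => copy);
        }
        return pending.ConvertAll(f => f());
    }
}
```

CaptureVariable(3) yields 3, 3, 3 while CaptureCopy(3) yields 0, 1, 2. That also explains the output equal to the upper bound: a delegate that runs after the loop has ended sees the final value of i.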

Right way to do a Parallel.For to compute data from Array

I want to compute the sum of x and the sum of x*x, where x = line[i].
Because more than one thread wants to read and write "sumAll" and "sumAllQ", I need to lock access to them.
The problem is that the lock more or less serializes everything here. I would need to split this operation into Environment.ProcessorCount for loops, each one summing one part of the array, and finally sum their results. But how can I do that programmatically?
Sample code:
//line is a float[]
Parallel.For(0, line.Length,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
i =>
{
double x = (double)line[i]; // declared inside the lambda so iterations do not share x
lock (sumLocker)
{
sumAll += x;
sumAllQ += x * x;
}
});
EDIT 1:
Benchmark results for Matthew Watson's answer:
At home. CPU Core 2 Quad Q9550 @ 2.83 GHz:
Result via Linq: SumAll=49999950000, SumAllQ=3,33332833333439E+15
Result via loop: SumAll=49999950000, SumAllQ=3,33332833333439E+15
Result via partition: SumAll=49999950000, SumAllQ=3,333328333335E+15
Via Linq took: 00:00:02.6983044
Via Loop took: 00:00:00.4811901
Via Partition took: 00:00:00.1595113
At work. CPU i7 930 2.8 GHz:
Result via Linq: SumAll=49999950000, SumAllQ=3,33332833333439E+15
Result via loop: SumAll=49999950000, SumAllQ=3,33332833333439E+15
Result via partition: SumAll=49999950000, SumAllQ=3,333328333335E+15
Via Linq took: 00:00:01.5728736
Via Loop took: 00:00:00.3436929
Via Partition took: 00:00:00.0934209
vcjones wondered about whether you would really see any speedup. Well the answer is: it probably depends how many cores you have. The PLinq is slower than a plain loop on my home PC (which is quad core).
I've come up with an alternative approach which uses a Partitioner to chop the list of numbers up into several sections so you can add up each one separately. There's also some more information about using a Partitioner here.
Using the Partitioner approach seems a bit faster, at least on my home PC.
Here's my test program. Note that you must run a release build of this outside any debugger to get the right timings.
The important method in this code is ViaPartition():
Result ViaPartition(double[] numbers)
{
var result = new Result();
var rangePartitioner = Partitioner.Create(0, numbers.Length);
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
var subtotal = new Result();
for (int i = range.Item1; i < range.Item2; i++)
{
double n = numbers[i];
subtotal.SumAll += n;
subtotal.SumAllQ += n*n;
}
lock (result)
{
result.SumAll += subtotal.SumAll;
result.SumAllQ += subtotal.SumAllQ;
}
});
return result;
}
My results when I run the full test program (shown below these results) are:
Result via Linq: SumAll=49999950000, SumAllQ=3.33332833333439E+15
Result via loop: SumAll=49999950000, SumAllQ=3.33332833333439E+15
Result via partition: SumAll=49999950000, SumAllQ=3.333328333335E+15
Via Linq took: 00:00:01.1994524
Via Loop took: 00:00:00.2357107
Via Partition took: 00:00:00.0756707
(Note the slight differences due to rounding errors.)
It'd be interesting to see the results from other systems.
Here's the full test program:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
namespace Demo
{
public class Result
{
public double SumAll;
public double SumAllQ;
public override string ToString()
{
return string.Format("SumAll={0}, SumAllQ={1}", SumAll, SumAllQ);
}
}
class Program
{
void run()
{
var numbers = Enumerable.Range(0, 1000000).Select(n => n/10.0).ToArray();
// Prove that the calculation is correct.
Console.WriteLine("Result via Linq: " + ViaLinq(numbers));
Console.WriteLine("Result via loop: " + ViaLoop(numbers));
Console.WriteLine("Result via partition: " + ViaPartition(numbers));
int count = 100;
TimeViaLinq(numbers, count);
TimeViaLoop(numbers, count);
TimeViaPartition(numbers, count);
}
void TimeViaLinq(double[] numbers, int count)
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < count; ++i)
ViaLinq(numbers);
Console.WriteLine("Via Linq took: " + sw.Elapsed);
}
void TimeViaLoop(double[] numbers, int count)
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < count; ++i)
ViaLoop(numbers);
Console.WriteLine("Via Loop took: " + sw.Elapsed);
}
void TimeViaPartition(double[] numbers, int count)
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < count; ++i)
ViaPartition(numbers);
Console.WriteLine("Via Partition took: " + sw.Elapsed);
}
Result ViaLinq(double[] numbers)
{
return numbers.AsParallel().Aggregate(new Result(), (input, value) => new Result
{
SumAll = input.SumAll+value,
SumAllQ = input.SumAllQ+value*value
});
}
Result ViaLoop(double[] numbers)
{
var result = new Result();
for (int i = 0; i < numbers.Length; ++i)
{
double n = numbers[i];
result.SumAll += n;
result.SumAllQ += n*n;
}
return result;
}
Result ViaPartition(double[] numbers)
{
var result = new Result();
var rangePartitioner = Partitioner.Create(0, numbers.Length);
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
var subtotal = new Result();
for (int i = range.Item1; i < range.Item2; i++)
{
double n = numbers[i];
subtotal.SumAll += n;
subtotal.SumAllQ += n*n;
}
lock (result)
{
result.SumAll += subtotal.SumAll;
result.SumAllQ += subtotal.SumAllQ;
}
});
return result;
}
static void Main()
{
new Program().run();
}
}
}
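The question also asked how to do this with Parallel.For itself. Here is a minimal sketch, assuming the Parallel.For overload that carries thread-local state (localInit, body, localFinally). This is a different technique from the Partitioner used above, and the LocalSums class and its names are made up for illustration. Each worker keeps a private subtotal and takes the lock only once, when it finishes.

```csharp
using System;
using System.Threading.Tasks;

public static class LocalSums
{
    public static (double SumAll, double SumAllQ) Compute(float[] line)
    {
        double sumAll = 0.0, sumAllQ = 0.0;
        object sumLocker = new object();

        Parallel.For<(double Sum, double SumQ)>(
            0, line.Length,
            // localInit: every worker starts with its own empty subtotal.
            () => (0.0, 0.0),
            // body: no lock needed, the subtotal is private to this worker.
            (i, loopState, local) =>
            {
                double x = line[i];
                return (local.Sum + x, local.SumQ + x * x);
            },
            // localFinally: merge each worker's subtotal under the lock.
            local =>
            {
                lock (sumLocker)
                {
                    sumAll += local.Sum;
                    sumAllQ += local.SumQ;
                }
            });

        return (sumAll, sumAllQ);
    }
}
```

The lock is still there, but it is taken once per worker rather than once per element, so it no longer serializes the loop.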
As suggested in the comments, you can use Aggregate to accomplish this with AsParallel in LINQ. For example:
using System.Linq;
//A class to hold the results.
//This can be improved by making it immutable and using a constructor.
public class Result
{
public double SumAll { get; set; }
public double SumAllQ { get; set; }
}
And you can use LINQ like so:
var result = line.AsParallel().Aggregate(new Result(), (input, value) => new Result {SumAll = input.SumAll+value, SumAllQ = input.SumAllQ+value*value});
Or even better:
var pline = line.AsParallel().WithDegreeOfParallelism(Environment.ProcessorCount);
var result = new Result { SumAll = pline.Sum(), SumAllQ = pline.Sum(x => x * x) };
AsParallel doesn't give you the ability to directly specify options, but you can use .WithDegreeOfParallelism(), .WithExecutionMode(), or .WithMergeOptions() to give you more control. You may have to use WithDegreeOfParallelism to even get it to run with multiple threads.
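Note that the Aggregate call above supplies only a seed and a fold function, with no function for merging partial results. Here is a sketch of the PLINQ overload that takes a seed factory and a combining function, so each partition accumulates independently and the subtotals are merged at the end. The PlinqSums class name and the use of value tuples in place of the Result class are my own choices for illustration.

```csharp
using System;
using System.Linq;

public static class PlinqSums
{
    public static (double SumAll, double SumAllQ) Compute(double[] numbers)
    {
        return numbers.AsParallel().Aggregate(
            // seedFactory: a fresh accumulator for each partition.
            () => (SumAll: 0.0, SumAllQ: 0.0),
            // updateAccumulatorFunc: fold one element into a partition's accumulator.
            (acc, value) => (acc.SumAll + value, acc.SumAllQ + value * value),
            // combineAccumulatorsFunc: merge the accumulators of two partitions.
            (a, b) => (a.SumAll + b.SumAll, a.SumAllQ + b.SumAllQ),
            // resultSelector: return the merged accumulator unchanged.
            acc => acc);
    }
}
```

As with any parallel floating-point sum, the low-order digits can differ slightly from a sequential loop because the additions happen in a different order.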

For vs. Linq - Performance vs. Future

Very brief question. I have a randomly sorted large string array (100K+ entries) in which I want to find the first occurrence of a desired string. I have two solutions.
From what I have read, my guess is that the 'for loop' currently gives slightly better performance (though this margin could always change), but I also find the LINQ version much more readable. On balance, which method is generally considered current best coding practice, and why?
string matchString = "dsf897sdf78";
int matchIndex = -1;
for(int i=0; i<array.Length; i++)
{
if(array[i]==matchString)
{
matchIndex = i;
break;
}
}
or
int matchIndex = array.Select((r, i) => new { value = r, index = i })
.Where(t => t.value == matchString)
.Select(s => s.index).First();
The best practice depends on what you need:
Development speed and maintainability: LINQ
Performance (according to profiling tools): manual code
LINQ really does slow things down with all the indirection. Don't worry about it as 99% of your code does not impact end user performance.
I started with C++ and really learnt how to optimize a piece of code. LINQ is not suited to get the most out of your CPU. So if you measure a LINQ query to be a problem just ditch it. But only then.
For your code sample I'd estimate a 3x slowdown. The allocations (and subsequent GC!) and indirections through the lambdas really hurt.
Slightly better performance? A loop will give SIGNIFICANTLY better performance!
Consider the code below. On my system for a RELEASE (not debug) build, it gives:
Found via loop at index 999999 in 00:00:00.2782047
Found via linq at index 999999 in 00:00:02.5864703
Loop was 9.29700432810805 times faster than linq.
The code is deliberately set up so that the item to be found is right at the end. If it was right at the start, things would be quite different.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
public static class Program
{
private static void Main(string[] args)
{
string[] a = new string[1000000];
for (int i = 0; i < a.Length; ++i)
{
a[i] = "Won't be found";
}
string matchString = "Will be found";
a[a.Length - 1] = "Will be found";
const int COUNT = 100;
var sw = Stopwatch.StartNew();
int matchIndex = -1;
for (int outer = 0; outer < COUNT; ++outer)
{
for (int i = 0; i < a.Length; i++)
{
if (a[i] == matchString)
{
matchIndex = i;
break;
}
}
}
sw.Stop();
Console.WriteLine("Found via loop at index " + matchIndex + " in " + sw.Elapsed);
double loopTime = sw.Elapsed.TotalSeconds;
sw.Restart();
for (int outer = 0; outer < COUNT; ++outer)
{
matchIndex = a.Select((r, i) => new { value = r, index = i })
.Where(t => t.value == matchString)
.Select(s => s.index).First();
}
sw.Stop();
Console.WriteLine("Found via linq at index " + matchIndex + " in " + sw.Elapsed);
double linqTime = sw.Elapsed.TotalSeconds;
Console.WriteLine("Loop was {0} times faster than linq.", linqTime/loopTime);
}
}
}
LINQ, following the declarative paradigm, expresses the logic of a computation without describing its control flow. The query is goal-oriented and self-describing, and thus easy to analyse and understand. It is also concise. Moreover, using LINQ, one depends heavily on an abstraction of the data structure, which brings a high degree of maintainability and reusability.
The iterative approach follows the imperative paradigm. It gives fine-grained control, which makes it easier to obtain higher performance. The code is also simpler to debug. Sometimes a well-constructed iteration is more readable than a query.
There is always a dilemma between performance and maintainability. Usually (if there are no specific performance requirements) maintainability should win. Only if you have performance problems should you profile the application, find the source of the problem, and improve its performance (reducing maintainability at the same time; yes, that's the world we live in).
About your sample: LINQ is not a very good solution here, because it does not add much maintainability to your code. Actually, to me, projecting, filtering, and projecting again looks even worse than a simple loop. What you need here is a simple Array.IndexOf, which is more maintainable than a loop and has almost the same performance:
Array.IndexOf(array, matchString)
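One difference is worth spelling out. Array.IndexOf returns -1 when nothing matches, like the loop in the question, whereas the LINQ version in the question ends in First(), which throws InvalidOperationException on an empty result. Here is a small sketch of both; the FirstMatch class and its method names are invented for illustration, and the DefaultIfEmpty(-1) call is my addition to make the LINQ variant behave like the loop.

```csharp
using System;
using System.Linq;

public static class FirstMatch
{
    // Returns the index of the first match, or -1 if there is none.
    public static int ViaIndexOf(string[] array, string matchString)
    {
        return Array.IndexOf(array, matchString);
    }

    // The LINQ query from the question, with a -1 fallback so that
    // First() does not throw when no element matches.
    public static int ViaLinq(string[] array, string matchString)
    {
        return array.Select((r, i) => new { value = r, index = i })
                    .Where(t => t.value == matchString)
                    .Select(s => s.index)
                    .DefaultIfEmpty(-1)
                    .First();
    }
}
```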
Well, you gave the answer to your question yourself.
Go with a For loop if you want the best performance, or go with Linq if you want readability.
Also keep in mind the possibility of using Parallel.ForEach(), which benefits from in-line lambda expressions (so it is closer to LINQ) and is much more readable than doing the parallelization "manually".
I don't think either is considered best practice; some people prefer looking at LINQ and some don't.
If performance is an issue, then I would profile both bits of code for your scenario, and if the difference is negligible, go with the one you feel more comfortable with; after all, it will most likely be you who maintains the code.
Also have you thought about using PLINQ or making the loop run in parallel?
Just an interesting observation. Calling FirstOrDefault with a lambda predicate (the "LINQ Lambda" test below) adds a measurable penalty over a LINQ where query followed by FirstOrDefault (the "LINQ Where" test) or a for loop. The following code fills a list with a little over a million multi-parameter objects and then searches for a specific item, which in this test is always the last one, using a LINQ lambda, a LINQ where query, and a for loop. Each test iterates 100 times and then averages the times to get the results.
LINQ Lambda Query Average Time: 0.3382 seconds
LINQ Where Query Average Time: 0.238 seconds
For Loop Average Time: 0.2266 seconds
I've run this test over and over, and even increased the iterations, and the spread is pretty much identical, statistically speaking. Sure, we are talking about roughly a tenth of a second for what is essentially a million-item search, so in the real world, unless something is that intensive, I'm not sure you would even notice. But if you do, the LINQ lambda and the LINQ where query do differ in performance, and the LINQ where query is nearly the same as the for loop.
private void RunTest()
{
try
{
List<TestObject> mylist = new List<TestObject>();
for (int i = 0; i <= 1000000; i++)
{
TestObject testO = new TestObject(string.Format("Item{0}", i), "1", Guid.NewGuid().ToString());
mylist.Add(testO);
}
mylist.Add(new TestObject("test", "29863", Guid.NewGuid().ToString()));
string searchtext = "test";
int iterations = 100;
// Linq Lambda Test
List<int> list1 = new List<int>();
for (int i = 1; i <= iterations; i++)
{
DateTime starttime = DateTime.Now;
TestObject t = mylist.FirstOrDefault(q => q.Name == searchtext);
int diff = (DateTime.Now - starttime).Milliseconds;
list1.Add(diff);
}
// Linq Where Test
List<int> list2 = new List<int>();
for (int i = 1; i <= iterations; i++)
{
DateTime starttime = DateTime.Now;
TestObject t = (from testO in mylist
where testO.Name == searchtext
select testO).FirstOrDefault();
int diff = (DateTime.Now - starttime).Milliseconds;
list2.Add(diff);
}
// For Loop Test
List<int> list3 = new List<int>();
for (int i = 1; i <= iterations; i++)
{
DateTime starttime = DateTime.Now;
foreach (TestObject testO in mylist)
{
if (testO.Name == searchtext)
{
TestObject t = testO;
break;
}
}
int diff = (DateTime.Now - starttime).Milliseconds;
list3.Add(diff);
}
double diff1 = list1.Average(); // Average() over ints returns a double
Debug.WriteLine(string.Format("LINQ Lambda Query Average Time: {0} seconds", diff1 / (double)100));
double diff2 = list2.Average();
Debug.WriteLine(string.Format("LINQ Where Query Average Time: {0} seconds", diff2 / (double)100));
double diff3 = list3.Average();
Debug.WriteLine(string.Format("For Loop Average Time: {0} seconds", diff3 / (double)100));
}
catch (Exception ex)
{
Debug.WriteLine(ex.ToString());
}
}
private class TestObject
{
public TestObject(string _name, string _value, string _guid)
{
Name = _name;
Value = _value;
GUID = _guid;
}
public string Name;
public string Value;
public string GUID;
}
The best option is to use the IndexOf method of the Array class. Since it is specialized for arrays, it will be significantly faster than both LINQ and a for loop.
Improving on Matt Watson's answer:
using System;
using System.Diagnostics;
using System.Linq;
namespace PerformanceConsoleApp
{
public class LinqVsFor
{
private static void Main(string[] args)
{
string[] a = new string[1000000];
for (int i = 0; i < a.Length; ++i)
{
a[i] = "Won't be found";
}
string matchString = "Will be found";
a[a.Length - 1] = "Will be found";
const int COUNT = 100;
var sw = Stopwatch.StartNew();
Loop(a, matchString, COUNT, sw);
First(a, matchString, COUNT, sw);
Where(a, matchString, COUNT, sw);
IndexOf(a, sw, matchString, COUNT);
Console.ReadLine();
}
private static void Loop(string[] a, string matchString, int COUNT, Stopwatch sw)
{
int matchIndex = -1;
for (int outer = 0; outer < COUNT; ++outer)
{
for (int i = 0; i < a.Length; i++)
{
if (a[i] == matchString)
{
matchIndex = i;
break;
}
}
}
sw.Stop();
Console.WriteLine("Found via loop at index " + matchIndex + " in " + sw.Elapsed);
}
private static void IndexOf(string[] a, Stopwatch sw, string matchString, int COUNT)
{
int matchIndex = -1;
sw.Restart();
for (int outer = 0; outer < COUNT; ++outer)
{
matchIndex = Array.IndexOf(a, matchString);
}
sw.Stop();
Console.WriteLine("Found via IndexOf at index " + matchIndex + " in " + sw.Elapsed);
}
private static void First(string[] a, string matchString, int COUNT, Stopwatch sw)
{
sw.Restart();
string str = "";
for (int outer = 0; outer < COUNT; ++outer)
{
str = a.First(t => t == matchString);
}
sw.Stop();
Console.WriteLine("Found via linq First at index " + Array.IndexOf(a, str) + " in " + sw.Elapsed);
}
private static void Where(string[] a, string matchString, int COUNT, Stopwatch sw)
{
sw.Restart();
string str = "";
for (int outer = 0; outer < COUNT; ++outer)
{
str = a.Where(t => t == matchString).First();
}
sw.Stop();
Console.WriteLine("Found via linq Where at index " + Array.IndexOf(a, str) + " in " + sw.Elapsed);
}
}
}
Output:
Found via loop at index 999999 in 00:00:01.1528531
Found via linq First at index 999999 in 00:00:02.0876573
Found via linq Where at index 999999 in 00:00:01.3313111
Found via IndexOf at index 999999 in 00:00:00.7244812
A bit of a non-answer, and really just an extension to https://stackoverflow.com/a/14894589, but I have, on and off, been working on an API-compatible replacement for Linq-to-Objects for a while now. It still doesn't provide the performance of a hand-coded loop, but it is faster for many (most?) linq scenarios. It does create more garbage, and has some slightly heavier up front costs.
The code is available at https://github.com/manofstick/Cistern.Linq
A NuGet package is available at https://www.nuget.org/packages/Cistern.Linq/ (I can't claim this to be battle-hardened; use at your own risk)
Taking the code from Matthew Watson's answer (https://stackoverflow.com/a/14894589) with two slight tweaks, we get the time down to "only" ~3.5 times worse than the hand-coded loop. On my machine it takes about 1/3 of the time of the original System.Linq version.
The two changes are to replace:
using System.Linq;
...
matchIndex = a.Select((r, i) => new { value = r, index = i })
.Where(t => t.value == matchString)
.Select(s => s.index).First();
With the following:
// a complete replacement for System.Linq
using Cistern.Linq;
...
// use a value tuple rather than anonymous type
matchIndex = a.Select((r, i) => (value: r, index: i))
.Where(t => t.value == matchString)
.Select(s => s.index).First();
So the library itself is a work in progress. It fails a couple of edge cases from the corefx System.Linq test suite. It also still needs a few functions to be converted over (they currently have the corefx System.Linq implementation, which is compatible from an API perspective, if not a performance perspective). But anyone who wants to help, comment, etc. would be appreciated.

How can multiple IndexOf be faster than raw iteration?

string s = "abcabcabcabcabc";
var foundIndexes = new List<int>();
The question came from the discussion here. I was simply wondering
How can this:
for (int i = s.IndexOf('a'); i > -1; i = s.IndexOf('a', i + 1))
foundIndexes.Add(i);
Be better than this :
for (int i = 0; i < s.Length; i++)
if (s[i] == 'a') foundIndexes.Add(i);
EDIT: Where does all the performance gain come from?
I did not observe that using IndexOf was any faster than direct looping. Honestly, I don't see how it could be because each character needs to be checked in both cases. My initial results were this:
Found by loop, 974 ms
Found by IndexOf 1144 ms
Edit: After running several more times I've noticed that you must run release (ie with optimizations) to see my result above. Without optimizations, the for loop is indeed slower.
The benchmark code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Text;
using System.IO;
using System.Diagnostics;
namespace Test
{
public class Program
{
public static void Main(string[] args)
{
const string target = "abbbdbsdbsbbdbsabdbsabaababababafhdfhffadfd";
// Jit methods
TimeMethod(FoundIndexesLoop, target, 1);
TimeMethod(FoundIndexesIndexOf, target, 1);
Console.WriteLine("Found by loop, {0} ms", TimeMethod(FoundIndexesLoop, target, 2000000));
Console.WriteLine("Found by IndexOf {0} ms", TimeMethod(FoundIndexesIndexOf, target, 2000000));
}
private static long TimeMethod(Func<string, List<int>> method, string input, int reps)
{
var stopwatch = Stopwatch.StartNew();
List<int> result = null;
for(int i = 0; i < reps; i++)
{
result = method(input);
}
stopwatch.Stop();
TextWriter.Null.Write(result);
return stopwatch.ElapsedMilliseconds;
}
private static List<int> FoundIndexesIndexOf(string s)
{
List<int> indexes = new List<int>();
for (int i = s.IndexOf('a'); i > -1; i = s.IndexOf('a', i + 1))
{
// for loop end when i=-1 ('a' not found)
indexes.Add(i);
}
return indexes;
}
private static List<int> FoundIndexesLoop(string s)
{
var indexes = new List<int>();
for (int i = 0; i < s.Length; i++)
{
if (s[i] == 'a')
indexes.Add(i);
}
return indexes;
}
}
}
IndexOf(char value, int startIndex) is marked with the following attribute: [TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")].
Also, the implementation of this method is most likely optimized in many other ways, probably using unsafe code, or using more "native" techniques, say, using the native FindNLSString Win32 function.

Arbitrary Precision for decimals in C# help?

Here is my current code for computing Pi using the Chudnovsky method in C#:
using System;
using System.Diagnostics;
using System.IO;
using java.math;
namespace pi.chudnovsky
{
public class Program
{
static Double Factorial(Double fact)
{
//begin factorial function
if (fact <= 1)
return 1;
else
return fact * Factorial(fact - 1); //loops multiplication until the factorial is reached
}
static Double doSummation(Double maxPower)
{
//begin chudnovsky summation function
Double sum = 0;
for (int i = 0; i <= maxPower; i++) //starts at i=0
{
sum += ((Math.Pow(-1, i)) * Factorial(6 * i) * (13591409 + 545140134.0 * i)) / (Factorial(3 * i) * Factorial(i) * Factorial(i) * Factorial(i) * Math.Pow(640320, (3 * i + 1.5))); //Chudnovsky series term (the linear coefficient is 545140134)
}
return sum;
}
static void Main(string[] args)
{
int num;
Console.WriteLine("Enter how many terms to compute Chudnovsky summation: ");
//parse user input
num = Convert.ToInt32(Console.ReadLine());
//begin stopwatch (started after the input is read, so typing time is not measured)
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
//perform calculation
Double inv = 1 / (12 * doSummation(num));
//stop stopwatch
stopwatch.Stop();
//display info
Console.WriteLine(inv);
Console.WriteLine("3.14159265358979323846264338327950288419716939937510");
Console.WriteLine("Time elapsed: {0}", stopwatch.Elapsed.TotalMilliseconds);
//write to pi.txt
TextWriter pi = new StreamWriter("pi.txt");
pi.WriteLine(inv);
pi.Close();
//write to stats.txt
TextWriter stats = new StreamWriter("stats.txt");
stats.WriteLine(stopwatch.Elapsed.TotalMilliseconds);
stats.Close();
}
}
}
So, I've included the J# library, and included java.math. Now when I replace all the "double"s with "BigDecimal"s, I get these compile errors:
http://f.cl.ly/items/1r2X26470d0d0n260p0p/Image%202011-11-14%20at%206.16.19%20PM.png
I know that this isn't the problem with me using Int for the loops, as it worked perfectly with Doubles. My question is, how do you resolve these errors relating to int and BigDecimal, or can you recommend another arbitrary precision library?
I've tried using XMPIR, have gotten everything to compile, but I get:
http://f.cl.ly/items/1l3C371j2u3z3n2g3a0j/Image%202011-11-14%20at%206.20.24%20PM.png
So can I use P/Invoke to include XMPIR so I can use whatever the BigDecimal class is?
Thank you for your time!
Can you not convert your ints to BigDecimals before comparing them?
I assume you understand the problem here: the BigDecimal class has no operator overloads for the greater-than, less-than, etc. operators that accept an int.
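As for another arbitrary precision option: if you can target .NET 4 or later, System.Numerics.BigInteger is built in and does define arithmetic and comparison operators that accept an int. It only covers integers, so this is just a sketch of the factorial part (the BigFactorial class name is made up). The square root and the final division would still need a rational or scaled fixed-point representation on top of it.

```csharp
using System;
using System.Numerics;

public static class BigFactorial
{
    // Exact factorial; a double loses precision for larger arguments,
    // such as the (6 * i)! terms in the series.
    public static BigInteger Factorial(int n)
    {
        BigInteger result = BigInteger.One;
        for (int k = 2; k <= n; k++)
        {
            result *= k; // int converts implicitly to BigInteger
        }
        return result;
    }

    // Comparing a BigInteger with an int compiles as written.
    public static bool ExceedsLimit(int n, int limit)
    {
        return Factorial(n) > limit;
    }
}
```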
