Why does post-increment fail in .Aggregate(...) but pre-increment succeeds? - c#

I was fiddling with one of my project euler answers to try and make it a bit shorter/cleaner/succinct.
I came up with this:
Sequences.FibonacciBig() // infinite fib sequence of type BigInteger
.TakeWhile(f => f.ToString().Length < 1000)
.Aggregate(1, (i, _) => i++);
My test failed as the actual was 1, which seemed odd. I first thought that the lazy enumerable wasn't being evaluated or something like that. I replaced with i += 1 and it worked, test passed. Then I replaced with ++i and it still worked.
I'm confused as to why the statement seems not to be evaluated at all when using the post-increment operator. At worst, I expected some kind of off-by-one error, not for the aggregate function to effectively do nothing.
Can someone explain?

Look at following code:
private int value = 0;
public int GetValue()
{
return value++;
}
Would you expect it to return 1 when called for the first time? It doesn't. It returns the current value of value and then increments it. The same happens in your lambda expression:
.Aggregate(1, (i, _) => i++);
It returns the current value of i and then increments it (which is pointless at that point, as you're not holding a reference to it anywhere else).
Pre-increment and += work because they increment the value before returning it.

i++ increments i as a side effect, but the value of the i++ expression will be the value before i was incremented, unlike with ++i where the value will be the value of i after the increment.
In other words:
var i = 3;
var a = i++;
Console.WriteLine("a = {0}, i = {1}", a, i); // a = 3, i = 4
Compare this to:
var i = 3;
var a = ++i;
Console.WriteLine("a = {0}, i = {1}", a, i); // a = 4, i = 4
But anyway this doesn't really matter here since you shouldn't be incrementing i anyway in your code. You could just write:
.Aggregate(1, (i, _) => i + 1)
because i is a parameter, so it's just a local variable that you don't reuse later.
But, actually, why don't you just write .Count() + 1 instead? Because that's exactly what your Aggregate call does...
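To make the difference concrete, here is a minimal sketch that runs the same Aggregate call with each accumulator. It uses Enumerable.Range(0, 5) as a stand-in for the Fibonacci sequence from the question; the AggregateDemo class and Run helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

static class AggregateDemo
{
    // Folds a five-element sequence with the given step function, starting from seed 1.
    public static int Run(Func<int, int, int> step) =>
        Enumerable.Range(0, 5).Aggregate(1, step);

    static void Main()
    {
        Console.WriteLine(Run((i, _) => i++));   // 1: the lambda returns the old value every time
        Console.WriteLine(Run((i, _) => ++i));   // 6: the lambda returns the incremented value
        Console.WriteLine(Run((i, _) => i + 1)); // 6: same result, no pointless mutation of the parameter
        Console.WriteLine(Enumerable.Range(0, 5).Count() + 1); // 6: what the Aggregate call computes
    }
}
```

With i++ the accumulator handed to the next step is always the seed, which is why the result stays at 1 rather than being merely off by one.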

Related

How to use C# Parallel.For with thread local storage reference type

I am looking for an example on how to use Parallel.For in C# with a reference type. I have been through the MSDN documentation, and all that I can find are examples that use a value type for thread local storage. The code that I'm trying is as follows:
public string[] BuildStrings(IEnumerable<string> str1, IEnumerable<string> str2, IEnumerable<string> str3)
{
// This method aggregates the strings in each of the collections and returns the combined set of strings. For example:
// str1 = "A1", "B1", "C1"
// str2 = "A2", "B2", "C2"
// str3 = "A3", "B3", "C3"
//
// Should return:
// "A1 A2 A3"
// "B1 B2 B3"
// "C1 C2 C3"
//
// The idea behind this code is to use a Parallel.For along with a thread local storage StringBuilder object per thread.
// Don't need any final method to execute after each partition has completed.
// No example on how to do this that I can find.
int StrCount = str1.Count(); // str1, str2, and str3 guaranteed to be equal in size and > 0.
var RetStr = new string[StrCount];
Parallel.For<StringBuilder>(0, StrCount, () => new StringBuilder(200), (i, j, sb1) =>
{
sb1.Clear();
sb1.Append(str1.ElementAt(i)).Append(' ').Append(str2.ElementAt(i)).Append(' ').Append(str3.ElementAt(i));
RetStr[i] = sb1.ToString();
}, (x) => 0);
return RetStr;
}
This code will not compile on Visual Studio 2013 Express edition. The error is on the Parallel.For line, right after the "(200),":
"Not all code paths return a value in lambda expression of type 'System.Func<int,System.Threading.Tasks.ParallelLoopState,System.Text.StringBuilder,System.Text.StringBuilder>'"
The test code looks like this:
static void Main(string[] args)
{
int Loop;
const int ArrSize = 50000;
// Declare the lists to hold the first, middle, and last names of the clients.
List<string> List1 = new List<string>(ArrSize);
List<string> List2 = new List<string>(ArrSize);
List<string> List3 = new List<string>(ArrSize);
// Init the data.
for (Loop = 0; Loop < ArrSize; Loop++)
{
List1.Add((Loop + 10000000).ToString());
List2.Add((Loop + 10100000).ToString());
List3.Add((Loop + 1100000).ToString());
}
IEnumerable<string> FN = List1;
IEnumerable<string> MN = List2;
IEnumerable<string> LN = List3;
//
// Time running the Parallel.For version.
//
Stopwatch SW = new Stopwatch();
SW.Start();
string[] RetStrings;
RetStrings = BuildMatchArrayOld(FN, MN, LN);
// Get the elapsed time as a TimeSpan value.
SW.Stop();
TimeSpan TS = SW.Elapsed;
// Format and display the TimeSpan value.
string ElapsedTime = TS.TotalSeconds.ToString();
Console.WriteLine("Old RunTime = " + ElapsedTime);
}
I found another somewhat similar question here that also does not compile. But, the accepted answer of using a simpler form of the function does not help me here. I could do that for this particular case, but would really like to know how to use thread local storage with a reference type in the future. Is this a MS bug, or am I missing the proper syntax?
EDIT
I did try this code from this link:
static void Main()
{
int[] nums = Enumerable.Range(0, 1000000).ToArray();
long total = 0;
// Use type parameter to make subtotal a long, not an int
Parallel.For<long>(0, nums.Length, () => 0, (j, loop, subtotal) =>
{
subtotal += nums[j];
return subtotal;
},
(x) => Interlocked.Add(ref total, x)
);
Console.WriteLine("The total is {0:N0}", total);
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
It seems to work fine.
The problem is that when I try to use Parallel.For in my code and specify a return value, it gives other errors:
sb1.Append(str1.ElementAt(i)).Append(' ').Append(str2.ElementAt(i)).Append(' ').Append(str3.ElementAt(i));
This line now generates errors:
Error: 'System.Collections.Generic.IEnumerable<string>' does not contain a definition for 'ElementAt' and the best extension method overload 'System.Linq.Enumerable.ElementAt<TSource>(System.Collections.Generic.IEnumerable<TSource>, int)' has some invalid arguments
So, I have no clue what the problem is.
It turns out that the problem with getting the code to compile correctly is a syntax problem. It really would have helped if there had been an example published by Microsoft for this case. The following code will build and run correctly:
public string[] BuildStrings(IEnumerable<string> str1, IEnumerable<string> str2, IEnumerable<string> str3)
{
// This method aggregates the strings in each of the collections and returns the combined set of strings. For example:
// str1 = "A1", "B1", "C1"
// str2 = "A2", "B2", "C2"
// str3 = "A3", "B3", "C3"
//
// Should return:
// "A1 A2 A3"
// "B1 B2 B3"
// "C1 C2 C3"
//
// The idea behind this code is to use a Parallel.For along with a thread local storage StringBuilder object per thread.
// Don't need any final method to execute after each partition has completed.
// No example on how to do this that I can find.
int StrCount = str1.Count(); // str1, str2, and str3 guaranteed to be equal in size and > 0.
var RetStr = new string[StrCount];
Parallel.For<StringBuilder>(0, StrCount, () => new StringBuilder(200), (i, j, sb1) =>
{
sb1.Clear();
sb1.Append(str1.ElementAt(i)).Append(' ').Append(str2.ElementAt(i)).Append(' ').Append(str3.ElementAt(i));
RetStr[i] = sb1.ToString();
return sb1; // Problem #1 solved. Signature of function requires return value.
}, (x) => x = null); // Problem #2 solved. Replaces (x) => 0 above.
return RetStr;
}
So, the first problem, as was pointed out in the comments by Jon Skeet, was that my lambda method failed to return a value. Since I'm not using a return value, I did not put one in - at least initially. When I put in the return statement, then the compiler showed another error with the "ElementAt" static method - as shown above under EDIT.
It turns out that the "ElementAt" error the compiler flagged as being the problem had nothing at all to do with the issue. This tends to remind me of my C++ days when the compiler was not nearly as helpful as the C# compiler. Identifying the wrong line as an error is quite rare in C# - but as can be seen from this example, it does happen.
The second problem was the line (x) => 0. This is the 5th parameter of the function, and it is called by each thread after all its work has been completed. I initially tried changing this to (x) => x.Clear. This ended up generating the error message:
Only assignment, call, increment, decrement, await, and new object
expressions can be used as a statement
The "ElementAt" errors were still present as well. So, from this clue I decided that the (x) => 0 might be causing the real issue, without producing an error message of its own. Since the work is complete at this point, I changed it to set the StringBuilder object to null, since it would not be needed again. Magically, all of the "ElementAt" errors vanished. It built and ran correctly after that.
Parallel.For provides some nice functionality, but I think Microsoft would be well advised to revisit some of the functionality. Any time a line causes a problem, it should be flagged as such. That at least needs to be addressed.
It would also be nice if Microsoft could provide some additional overloads of Parallel.For that would allow void to be returned and accept a null value for the 5th parameter. I actually tried passing null for that, and it built, but a run-time exception occurred. A better idea would be to provide a 4-parameter overload for when no "thread completion" method needs to be called.
Here's what your own For overloads would look like:
public static ParallelLoopResult For<TLocal>(int fromInclusive, int toExclusive, Func<TLocal> localInit, Func<int, ParallelLoopState, TLocal, TLocal> body)
{
return Parallel.For(fromInclusive, toExclusive, localInit, body, localFinally: _ => { });
}
static void StringBuilderFor(int count, Action<int, ParallelLoopState, StringBuilder> body)
{
Func<int, ParallelLoopState, StringBuilder, StringBuilder> b = (i, j, sb1) => { body(i, j, sb1); return sb1; };
For(0, count, () => new StringBuilder(200), b);
}
You can also avoid the whole problem by using LINQ and AsParallel() instead of doing explicit parallelism.
int StrCount = str1.Count(); // str1, str2, and str3 guaranteed to be equal in size and > 0.
var RetStr = from i in Enumerable.Range(0, StrCount).AsParallel().AsOrdered() // parallelize the query itself; AsOrdered keeps results in index order
let sb1 = new StringBuilder(200)
select sb1.Append(str1.ElementAt(i)).Append(' ').Append(str2.ElementAt(i)).Append(' ').Append(str3.ElementAt(i)).ToString();
return RetStr.ToArray();
This may not be quite as fast, but it's probably a lot simpler.

Not able to change the value of a Dictionary's key

I wanted to count the number of repeated characters in a text file.
I wrote this code:
foreach(char c in File.ReadAllText(path))
{
if(dic.ContainsKey(c))
{
int i=dic[c];
dic[c]=i++;
}
else
{
dic.Add(c,1);
}
}
It's adding all the unique characters, but it's showing the value for all keys as 1 even if there are repeated characters!
I think you want:
dic[c] = i + 1;
Or possibly, although IMHO this just adds complexity since you don't use i after:
dic[c] = ++i;
Explanation:
i++ is a post-increment operation. This means it assigns the current value of i to dic[c] and then increments i. So in summary, you're always reading in i=1, putting the i=1 back into the dictionary, then incrementing i to 2 before sending it to the void.
Addendum:
You don't really need to go through a temporary variable at all. You can simply read and assign the value back in one operation with dic[c] += 1; or even increment it with dic[c]++;.
i++ will add one to the value of i but return the value of i before the increment. You don't want to do that. You just want to return the value of i incremented by one. To do this, just write:
dic[c] = i+1;
On a side note, you could do the whole thing using LINQ instead:
var dic = File.ReadAllText(path).GroupBy(c => c)
.ToDictionary(group => group.Key, group => group.Count());
You want dic[c] = i + 1; or dic[c] += 1 or dic[c]++. In your code the post increment operator is incrementing i after assignment takes place so it has no effect on the value of dic[c].
dic[c]=i++; translates to
dic[c] = i;
i = i + 1;
i isn't a reference value and thus the value of i placed inside the dictionary will not change after you increment it outside the dictionary.
Use dic[c]++; instead.
This is because i gets incremented after being assigned to dic[c]. Try this instead:
if(dic.ContainsKey(c))
{
dic[c] += 1;
}
Dictionary<char, int> LetterCount(string textPath)
{
var dic = new Dictionary<char, int>();
foreach (char c in System.IO.File.ReadAllText(textPath))
{
if (dic.ContainsKey(c))
dic[c]++;
else
dic.Add(c, 1);
}
return dic;
}
Then use like this:
var letters = LetterCount(@"C:\Text.txt");
// letters will contain the result
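As a further variation on the answers above, the ContainsKey check and the indexer read can be folded into a single lookup with TryGetValue. This is only a sketch: the CharCounter class and Count method are hypothetical names, and it takes a string directly rather than reading a file so that it stays self-contained.

```csharp
using System;
using System.Collections.Generic;

static class CharCounter
{
    // Returns how many times each character occurs in the given text.
    public static Dictionary<char, int> Count(string text)
    {
        var counts = new Dictionary<char, int>();
        foreach (char c in text)
        {
            // TryGetValue leaves current at 0 when c has not been seen yet.
            counts.TryGetValue(c, out int current);
            counts[c] = current + 1; // a plain expression, so there is no post-increment pitfall
        }
        return counts;
    }

    static void Main()
    {
        foreach (var pair in Count("hello"))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

To count the characters of a file, the text could be supplied as CharCounter.Count(File.ReadAllText(path)), as in the question.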

C# Loop Through An Array

I am completely new to C#. I am trying to loop through a short array, where the string elements in the array are placed at the end of a website search. The code:
int n = 1;
string[] s = {"firstitem","seconditem","thirditem"}
int x = s.Max(); // note, from my research this should return the maximum value in the array, but this is the first error
x = x + 1
while (n < x)
{
System.Diagnostics.Process.Start("www.website.com/" + b[0]);
b[]++; // this also generates an error "identifier expected"
}
My coding, logic or both are wrong. Based on what I've read, I should be able to get the maximum value in an array (as an int), then add to the arrays value while a WHILE loop adds each value in the array at the end of the website (and then stops). Note, that on the first error, I tried coding it differently, like the below:
int x = Convert.ToInt32(s.Max);
However, it generates an overload error. If I'm reading things correctly, MAX should find the maximum value in a sequence.
foreach(var str in s)
{
System.Diagnostics.Process.Start("www.website.com/" + str);
}
You have a collection of strings. The largest string is still a string, not an int. Since s.Max() is a string, and you're assigning it to a variable of type int (int x = s.Max();), the compiler (correctly) informs you that the types do not match. You would need to convert that string to an int. Since, looking at your data, the strings aren't integers, and I see no sensible way of converting them into integers, I see no reasonable solution. What integer should "firstitem" be?
If you just want to execute some code for each item in the array then use one of these patterns:
foreach(string item in s)
{
System.Diagnostics.Process.Start("www.website.com/" + item);
}
or
for(int i = 0; i < s.Length; i++)
{
System.Diagnostics.Process.Start("www.website.com/" + s[i]);
}
You're missing a couple of semi-colons
x should presumably be the Length of the array, not the largest value in it
You need to increment x inside of your loop - at the end of it, not outside of it
You should actually be incrementing n, not x
n should be starting at 0, not at 1
Inside the loop you're using b[0] where you probably want to use b[n]
I'm no C++ guru, but I have no idea what b[]++ might mean
As other answers have mentioned, you may want to use a for or foreach instead of a while.
Make an effort to go through some introductory tutorials. Trial and error can be a useful tool, but there's no need to fall back on that when learning the very basics
The following image points out the errors in your code: [image]
After the correction, it would be:
int n=0; // start at 0 so the first element is not skipped
string[] s= { "firstitem", "seconditem", "thirditem" };
int x=s.Length;
while(n<x) {
System.Diagnostics.Process.Start("www.website.com/"+s[n]);
n++; // or ++n
}
And we can make it more semantic:
var items=new[] { "firstitem", "seconditem", "thirditem" };
for(int index=0, count=items.Length; index<count; ++index)
Process.Start("www.website.com/"+items[index]);
If the starting order doesn't matter, we can use foreach instead, and we can use LINQ to make the code even simpler:
var list=(new[] { "firstitem", "seconditem", "thirditem" }).ToList();
list.ForEach(item => Process.Start("www.website.com/"+item));
Quite often we might write it in another form:
foreach(var item in new[] { "firstitem", "seconditem", "thirditem" })
Process.Start("www.website.com/"+item);
from the sample
var processList = (new string[]{"firstitem","seconditem","thirditem"})
.Select(s => Process.Start("www.website.com/" + s))
.ToList();
And here is a test version that outputs to the console:
(new string[] { "firstitem", "seconditem", "thirditem" })
.Select(s => { Console.WriteLine(@"www.website.com/" + s); return s; })
.ToList();
Note: the lambda passed to Select must return a value, and the .ToList() call forces evaluation of the query.
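The point about .ToList() forcing evaluation can be checked with a small sketch that counts how often the Select lambda actually runs. The DeferredSelectDemo class and CountCalls method are hypothetical names, and the lambda only builds the URL string instead of starting a process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredSelectDemo
{
    // Returns how many times the Select lambda ran, with or without forcing the query.
    public static int CountCalls(bool force)
    {
        int calls = 0;
        var items = new[] { "firstitem", "seconditem", "thirditem" };

        // Building the query does not run the lambda yet.
        IEnumerable<string> query = items.Select(s => { calls++; return "www.website.com/" + s; });

        if (force)
            query.ToList(); // enumerating the query runs the lambda once per item

        return calls;
    }

    static void Main()
    {
        Console.WriteLine(CountCalls(force: false)); // 0
        Console.WriteLine(CountCalls(force: true));  // 3
    }
}
```

This is why a Select used purely for its side effects does nothing until something enumerates it; a plain foreach avoids the issue altogether.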

Can't figure out where a value is being set

I'm working on some Project Euler questions and need some help understanding a solution I found.
My question is: Where the heck is X being set in the SkipWhile method call?? When I break the code during runtime and step through to that point I never see a value being set for it. Yet the code will work all the way through. I checked the definition for SkipWhile and maybe I just don't understand how the arguments being passed in the call satisfy the 3 parameter method definition. Same thing for Math.Pow - Where is that X getting set!?
public long FindGreatestPrimeFactor(long factorGreaterThan, long number)
{
long upperBound = (long)Math.Ceiling(Math.Sqrt(number));
// find next factor of number
long nextFactor = Range(factorGreaterThan + 1, upperBound)
.SkipWhile(x => number % x > 0).FirstOrDefault();
// if no other factor was found, then the number must be prime
if (nextFactor == 0)
{
return number;
}
else
{
// find the multiplicity of the factor
long multiplicity = Enumerable.Range(1, Int32.MaxValue)
.TakeWhile(x => number % (long)Math.Pow(nextFactor, x) == 0)
.Last();
long quotient = number / (long)Math.Pow(nextFactor, multiplicity);
if (quotient == 1)
{
return nextFactor;
}
else
{
return FindGreatestPrimeFactor(nextFactor, quotient);
}
}
}
private IEnumerable<long> Range(long first, long last)
{
for (long i = first; i <= last; i++)
{
yield return i;
}
}
I believe you are talking about the lambda expression:
x => number % x > 0
All lambda expressions use the lambda operator =>, which is read as "goes to". The left side of the lambda operator specifies the input parameters (if any) and the right side holds the expression or statement block.
In a LINQ expression, each item, when iterated over, is supplied to the lambda. In the body of the lambda, if you wish to refer to the item, you need to give it a name. In this case the parameter ends up named x.
The expressions that look like this:
x => number % x > 0
are called lambda expressions. They actually are functions, and x is a parameter. SkipWhile takes a function, and then executes it with different values for its parameters.
Here is how the lambda expression would be written as a function:
bool Foobar(long x)
{
return number % x > 0;
}
In SkipWhile, I believe the function is called with x being the first item in the list. If it is true, the function is called again with the second item in the list, and so on down until the function returns false.
In this case, SkipWhile is asking for a function that will convert a value of the type in the list to a bool. Lambda expressions are a concise way to express this.
SkipWhile retrieves its input values (the x) from the Range method, which in turn returns numbers from factorGreaterThan + 1 up to upperBound. The author presumably wrote a custom method for this because the built-in Enumerable.Range only produces int values (and takes a start and a count rather than a first and last value), whereas this code works with long.
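To see where x gets its value, it can help to look at a simplified stand-in for SkipWhile. The sketch below is not the real Enumerable.SkipWhile implementation; MySkipWhile and the SkipWhileDemo class are hypothetical names, and the candidate list is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SkipWhileDemo
{
    // Simplified stand-in for Enumerable.SkipWhile, written as an ordinary static method.
    public static IEnumerable<T> MySkipWhile<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        bool skipping = true;
        foreach (T item in source)
        {
            // This call is where the lambda's parameter (x) receives its value.
            if (skipping && predicate(item))
                continue;

            skipping = false;
            yield return item;
        }
    }

    static void Main()
    {
        long number = 35;
        var candidates = new long[] { 2, 3, 4, 5, 6 };

        // The lambda is passed as the predicate argument; it runs once per candidate until it returns false.
        long factor = MySkipWhile(candidates, x => number % x > 0).FirstOrDefault();
        Console.WriteLine(factor); // 5
    }
}
```

The real SkipWhile is an extension method, so its first parameter (the source sequence) is the object it is called on, which is why the call in the question appears to pass only the lambda. Because the method is an iterator, the lambda only runs while the result is being enumerated, so a breakpoint inside the lambda body is the easiest way to watch x change.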

Linq late binding confusion

Can someone please explain to me what I am missing here? Based on my basic understanding, a LINQ result is calculated when the result is used, and I can see that in the following code.
static void Main(string[] args)
{
Action<IEnumerable<int>> print = (x) =>
{
foreach (int i in x)
{
Console.WriteLine(i);
}
};
int[] arr = { 1, 2, 3, 4, 5 };
int cutoff = 1;
IEnumerable<int> result = arr.Where(x => x < cutoff);
Console.WriteLine("First Print");
cutoff = 3;
print(result);
Console.WriteLine("Second Print");
cutoff = 4;
print(result);
Console.Read();
}
Output:
First Print
1
2
Second Print
1
2
3
Now I changed the
arr.Where(x => x < cutoff);
to
IEnumerable<int> result = arr.Take(cutoff);
and the output is as follow.
First Print
1
Second Print
1
Why with Take, it does not use the current value of the variable?
The behavior you're seeing comes from the different ways in which the arguments to the LINQ methods are evaluated. The Where method receives a lambda which captures the variable cutoff itself (effectively by reference) rather than its value. The lambda is evaluated on demand and hence sees the value of cutoff at that time.
The Take method (and similar methods like Skip) take an int parameter and hence cutoff is passed by value. The value used is the value of cutoff at the moment the Take method is called, not when the query is evaluated.
Note: The term late binding here is a bit incorrect. Late binding generally refers to the process where the members an expression binds to are determined at runtime vs. compile time. In C# you'd accomplish this with dynamic or reflection. The behavior of LINQ evaluating its parts on demand is known as delayed (or deferred) execution.
There are a few different things getting confused here.
Late-binding: This is where the meaning of code is determined after it was compiled. For example, x.DoStuff() is early-bound if the compiler checks that objects of x's type have a DoStuff() method (considering extension methods and default arguments too) and then produces the call to it in the code it outputs, or fails with a compiler error otherwise. It is late-bound if the search for the DoStuff() method is done at run-time and throws a run-time exception if there was no DoStuff() method. There are pros and cons to each, and C# is normally early-bound but has support for late-binding (most simply through dynamic but the more convoluted approaches involving reflection also count).
Delayed execution: Strictly speaking, all Linq methods immediately produce a result. However, that result is an object which stores a reference to an enumerable object (often the result of the previous Linq method) which it will process in an appropriate manner when it is itself enumerated. For example, we can write our own Take method as:
private static IEnumerable<T> TakeHelper<T>(IEnumerable<T> source, int number)
{
foreach(T item in source)
{
yield return item;
if(--number == 0)
yield break;
}
}
public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int number)
{
if(source == null)
throw new ArgumentNullException();
if(number < 0)
throw new ArgumentOutOfRangeException();
if(number == 0)
return Enumerable.Empty<T>();
return TakeHelper(source, number);
}
Now, when we use it:
var taken4 = someEnumerable.Take(4);//taken4 has a value, so we've already done
//something. If it was going to throw
//an argument exception it would have done so
//by now.
var firstTaken = taken4.First();//only now does the object in taken4
//do the further processing that iterates
//through someEnumerable.
Captured variables: Normally when we make use of a variable, we make use of its current state:
int i = 2;
string s = "abc";
Console.WriteLine(i);
Console.WriteLine(s);
i = 3;
s = "xyz";
It's pretty intuitive that this prints 2 and abc and not 3 and xyz. In anonymous functions and lambda expressions though, when we make use of a variable we are "capturing" it as a variable, and so we will end up using the value it has when the delegate is invoked:
int i = 2;
string s = "abc";
Action λ = () =>
{
Console.WriteLine(i);
Console.WriteLine(s);
};
i = 3;
s = "xyz";
λ();
Creating the λ doesn't use the values of i and s, but creates a set of instructions as to what to do with i and s when λ is invoked. Only when that happens are the values of i and s used.
Putting it all together: In none of your cases do you have any late-binding. That is irrelevant to your question.
In both you have delayed execution. Both the call to Take and the call to Where return enumerable objects which will act upon arr when they are enumerated.
In only one do you have a captured variable. The call to Take passes an integer directly to Take and Take makes use of that value. The call to Where passes a Func<int, bool> created from a lambda expression, and that lambda expression captures an int variable. Where knows nothing of this capture, but the Func does.
That's the reason the two behave so differently in how they treat cutoff.
Take doesn't take a lambda, but an integer, as such it can't change when you change the original variable.
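If you do want Take to see the current value of cutoff, one option is to wrap the call itself in a lambda, so that cutoff becomes a captured variable just as it is with Where. The sketch below assumes this is acceptable for your use case; the CaptureDemo class, the Run method, and the takeLater delegate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CaptureDemo
{
    public static (int[] byValue, int[] viaLambda) Run()
    {
        int[] arr = { 1, 2, 3, 4, 5 };
        int cutoff = 1;

        // cutoff is passed by value here, so Take remembers 1.
        IEnumerable<int> taken = arr.Take(cutoff);

        // The lambda captures the variable cutoff, so Take is called with whatever it holds later.
        Func<IEnumerable<int>> takeLater = () => arr.Take(cutoff);

        cutoff = 3;
        return (taken.ToArray(), takeLater().ToArray());
    }

    static void Main()
    {
        var (byValue, viaLambda) = Run();
        Console.WriteLine(string.Join(",", byValue));   // 1
        Console.WriteLine(string.Join(",", viaLambda)); // 1,2,3
    }
}
```

Both queries are still executed lazily; the only difference is whether cutoff was copied when Take was called or read when the delegate was invoked.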
