Loop - Calculated last element different - c#

Hi everyone (sorry for the bad title),
I have a loop in which a rounding difference can occur on every pass. I would like to accumulate these differences and add them to the last record of my result.
var cumulatedRoundDifference = 0m;
var resultSet = Enumerable.Range(0, periods)
    .Select(currentPeriod => {
        var value = this.CalculateValue(currentPeriod);
        var valueRounded = this.CommercialRound(value);
        // Bad part :(
        cumulatedRoundDifference += value - valueRounded;
        if (currentPeriod == periods - 1)
            valueRounded = this.CommercialRound(value + valueRounded);
        return valueRounded;
    });
At the moment, in my opinion, the code is not very nice.
Is there a pattern / algorithm for such a thing, or can it be done cleverly with LINQ, without a variable outside the loop?
Many greetings

It seems like you are doing two things - rounding everything, and calculating the total rounding error.
You could remove the variable outside the lambda, but then you would need 2 queries.
var baseQuery = Enumerable.Range(0, periods)
    .Select(x => new { Value = CalculateValue(x), ValueRounded = CommercialRound(CalculateValue(x)) });

var cumulateRoundDifference = baseQuery.Select(x => x.Value - x.ValueRounded).Sum();

// LINQ isn't really good at doing something different to the last element
var resultSet = baseQuery.Select(x => x.ValueRounded)
    .Take(periods - 1)
    .Concat(new[] { CommercialRound(CalculateValue(periods - 1) + cumulateRoundDifference) });

Is there a pattern / algorithm for such a thing, or can it be done cleverly with LINQ, without a variable outside the loop?
I don't quite agree with how you're trying to accomplish this. You're trying to accomplish two very different tasks, so why merge them into the same iteration block? The latter (handling the last item) isn't even supposed to be an iteration.
For readability's sake, I suggest splitting the two. It makes more sense and doesn't require you to check whether you're on the last pass of the iteration (which saves you some code and nesting).
While I don't quite understand the calculation in and of itself, I can give you the algorithm you're directly asking for (though I'm not sure it is the best way to do it, which I'll address later in the answer).
var allItemsExceptTheLastOne = allItems.Take(allItems.Count() - 1);
foreach(var item in allItemsExceptTheLastOne)
{
// Your logic for all items except the last one
}
var theLastItem = allItems.Last();
// Your logic for the last item
This is, in my opinion, a cleaner and more readable approach. I'm not a fan of using lambdas as mini-methods with less-than-trivial bodies. This may be subjective and a matter of personal style.
On rereading, I think I understand the calculation better, so I've added an attempt at implementing it, while still maximizing readability as best I can:
// First we make a list of the values (without the sum)
var myValues = Enumerable
    .Range(0, periods)
    .Select(period => this.CalculateValue(period))
    .Select(value => value - this.CommercialRound(value))
    .ToList();
// myValues = [ 0.1, 0.2, 0.3 ]
myValues.Add(myValues.Sum());
// myValues = [ 0.1, 0.2, 0.3, 0.6 ]
This follows the same approach as the algorithm I first suggested: iterate over the items, and then separately handle the last value of your intended result list.
Note that I separated the logic into two subsequent Select statements, as I consider it the most readable (no excessive lambda bodies) and efficient (no duplicate CalculateValue calls) way of doing this. If, however, you are more concerned about performance, e.g. when you are expecting to process massive lists, you may want to merge these again, as sketched below.
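If that ever becomes necessary, a minimal sketch of the merged variant could look like this (same assumed CalculateValue and CommercialRound methods as above); the lambda gains a statement body, which is exactly the readability trade-off being discussed:
var myValues = Enumerable
    .Range(0, periods)
    .Select(period =>
    {
        // One pass: compute the value once and return its rounding difference.
        var value = this.CalculateValue(period);
        return value - this.CommercialRound(value);
    })
    .ToList();

myValues.Add(myValues.Sum());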
I suggest that you always try to default to writing code that favors readability over (excessive) optimization; and only deviate from that path when there is a clear need for additional optimization (which I cannot decide based on your question).
On a second reread, I'm not sure you've explained the actual calculation well enough, as cumulatedRoundDifference is not actually used in your calculations, but the code seems to suggest that its value should be important to the end result.

Related

How do the LINQ functions OrderByDescending and OrderBy work internally when ordering by string length? Are they faster than doing it with a loop?

My question is based on this question, on which I had posted an answer here.
This is the code.
var lines = System.IO.File.ReadLines(@"C:\test.txt");
var Minimum = lines.First(); // default length set
var Maximum = "";
foreach (string line in lines)
{
    if (Maximum.Length < line.Length)
    {
        Maximum = line;
    }
    if (Minimum.Length > line.Length)
    {
        Minimum = line;
    }
}
and an alternative for this code using LINQ (my approach):
var lines = System.IO.File.ReadLines(@"C:\test.txt");
var Maximum = lines.OrderByDescending(a => a.Length).First();
var Minimum = lines.OrderBy(a => a.Length).First();
LINQ is easy to read and implement.
I want to know which one is better for performance.
And how does LINQ work internally for OrderByDescending and OrderBy when ordering by length?
You can read the source code for OrderBy.
Stop micro-optimizing or prematurely optimizing your code. Try to write code that performs correctly; then, if you face a performance problem later, profile your application and see where the problem is. If you have a piece of code which has a performance problem due to finding the shortest and longest string, then start optimizing that part.
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. - Donald Knuth
File.ReadLines returns an IEnumerable<string>, which means that if you do a foreach over it, it returns the data to you one line at a time. I think the best performance improvement you can make here is to improve the reading of the file from disk. If it is small enough to load the whole file into memory, use File.ReadAllLines; if it is not, try reading the file in big chunks that fit in memory. Reading the file line by line causes performance degradation due to the I/O operations against the disk. So the problem here is not how LINQ or the loop performs; the problem is the number of disk reads.
With the second method, you are not only sorting the lines twice, you are reading the file twice. This is because File.ReadLines returns an IEnumerable<string>. It clearly shows why you shouldn't ever enumerate an IEnumerable<> twice unless you know how it was built. If you really want to, add a .ToList() or a .ToArray() that will materialize the IEnumerable<> into a collection. And while the first method has a memory footprint of a single line of text (because it reads the file one line at a time), the second method loads the whole file into memory to sort it, so it has a much bigger memory footprint; if the file is some hundred MB, the difference is big. (Note that technically you could have a file with a single line of text 1 GB long, so this rule isn't absolute; it holds for reasonable files whose lines are up to some hundred characters long :-) )
Now... Someone will tell you that premature optimization is evil, but I'll tell you that ignorance is twice evil.
If you know the difference between the two blocks of code, then you can make an informed choice between them... Otherwise you are simply randomly throwing rocks until it seems to work. "Seems to work" is the key phrase here.
In my opinion, you need to understand a few points before deciding what the best way is.
First, let's say we want to solve the problem with LINQ. Then, to write the most optimized code, you must understand deferred execution. Most LINQ methods, such as Select, Where, OrderBy, Skip, Take and some others, use deferred execution. So, what is deferred execution? It means that these methods are not executed until the results are actually needed; they just create an iterator, and this iterator is ready to be executed when we need it. So, how can the user make them execute? With the help of foreach, which calls GetEnumerator, or of LINQ methods such as ToList(), First(), FirstOrDefault(), Max() and some others.
Understanding this can help us gain some performance.
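A tiny sketch of that behaviour (made-up numbers rather than the file example, with the usual using directives in scope):
var numbers = new List<int> { 3, 1, 2 };

// Nothing is sorted yet; this line only builds the iterator.
var ordered = numbers.OrderBy(n => n);

numbers.Add(0);

// The sort actually runs here, when ToList() enumerates the iterator,
// so the element added after the query was defined is included: [0, 1, 2, 3]
var result = ordered.ToList();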
Now, let's come back to your problem. File.ReadLines returns an IEnumerable<string>, which means that it will not read the lines until we need them. In your example, you call a sorting method on this object twice, which means the collection is sorted twice. Instead, you can sort the collection once, then call ToList(), which executes the OrderedEnumerable iterator, and then take the first and last elements of the list that is now physically in our hands.
var orderedList = lines
    .OrderBy(a => a.Length) // This method uses deferred execution, so it is not executed yet
    .ToList();              // But ToList() makes it execute.

var Maximum = orderedList.Last();
var Minimum = orderedList.First();
BTW, you can find the OrderBy source code here.
It returns an OrderedEnumerable instance, and the sorting algorithm is here:
public IEnumerator<TElement> GetEnumerator()
{
    Buffer<TElement> buffer = new Buffer<TElement>(source);
    if (buffer.count > 0)
    {
        EnumerableSorter<TElement> sorter = GetEnumerableSorter(null);
        int[] map = sorter.Sort(buffer.items, buffer.count);
        sorter = null;
        for (int i = 0; i < buffer.count; i++) yield return buffer.items[map[i]];
    }
}
And now, let's come back to another aspect which affects performance. As you can see, LINQ uses another buffer to store the sorted collection. Of course, this takes some memory, which tells us it is not the most efficient way.
I just tried to explain to you how LINQ works. But I very much agree with @Dotctor's overall answer. Just don't forget that you can use File.ReadAllLines, which returns not an IEnumerable<string>, but a string[].
What does that mean? As I tried to explain in the beginning, the difference is that with an IEnumerable, .NET reads the lines one by one as the enumerator enumerates over the iterator; with a string[], all lines are already in our application's memory.
The most efficient approach is to avoid LINQ here; the approach using foreach needs only one enumeration.
If you want to put the whole file into a collection anyway you could use this:
List<string> orderedLines = System.IO.File.ReadLines(@"C:\test.txt")
    .OrderBy(l => l.Length)
    .ToList();

string shortest = orderedLines.First();
string longest = orderedLines.Last();
Apart from that you should read about LINQ's deferred execution.
Also note that your LINQ approach not only orders all lines twice to get the longest and the shortest, it also needs to read the whole file twice, since File.ReadLines uses a StreamReader (as opposed to ReadAllLines, which reads all lines into an array first).
MSDN:
When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array.
In general that can help to make your LINQ queries more efficient, e.g. if you filter out lines with Where, but in this case it's making things worse.
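For example (a hypothetical filter, purely to illustrate the point): here the predicate is applied while the file is being streamed, so lines that don't match are never kept in memory, whereas ReadAllLines would have loaded the whole file first.
var longLines = System.IO.File.ReadLines(@"C:\test.txt")
    .Where(line => line.Length > 80)
    .ToList();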
As Jeppe Stig Nielsen has mentioned in a comment, since OrderBy needs to create another buffer collection internally (and with ToList a second one), there is another approach that might be more efficient:
string[] allLines = System.IO.File.ReadAllLines(#"C:\test.txt");
Array.Sort(allLines, (x, y) => x.Length.CompareTo(y.Length));
string shortest = allLines.First();
string longest = allLines.Last();
The only drawback of Array.Sort is that it performs an unstable sort as opposed to OrderBy. So if two lines have the same length the order might not be maintained.
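If the relative order of equal-length lines matters, one possible workaround (a sketch, not something the question requires) is to break ties explicitly in the comparison instead of relying on sort stability:
// Compare by length first, then ordinally, so equal-length lines end up
// in a deterministic order even though Array.Sort is unstable.
Array.Sort(allLines, (x, y) =>
{
    int byLength = x.Length.CompareTo(y.Length);
    return byLength != 0 ? byLength : string.CompareOrdinal(x, y);
});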

Is there a difference between a conjuncted condition and multiple Where method calls?

I was sitting this cloudy Saturday morning thinking to myself:
IEnumerable<SomeType>
    someThings = ...,
    conjunctedThings = someThings.Where(thing => thing.Big && thing.Tall),
    multiWhereThings = someThings
        .Where(thing => thing.Big)
        .Where(thing => thing.Tall);
Intuitively, I'd say that conjunctedThings will be computed no slower than multiWhereThings but is there really a difference in a general case?
I can imagine that depending on the share of big things and tall things the computations might take different amounts of time, but I'd like to disregard that aspect.
Are there any other properties I need to take into consideration? E.g. the type of the enumerable or anything else?
In general the multi-Where version will be slower. It needs to process more items and call more lambdas.
If someThings contains n items, m of which are Big, then the lambda for conjunctedThings is called n times, while the lambdas for multiWhereThings are called n + m times. Of course, this holds if the user of the two sequences iterates over all of their contents; since the Where method uses yield return internally, the number of iterations might be lower depending on how the collections are consumed. In other words, the numbers above are a worst-case estimate.
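A quick way to see that n + m behaviour (a sketch with anonymous objects standing in for SomeType and the usual using directives; the counters exist only for the demonstration):
var someThings = new[]
{
    new { Big = true,  Tall = true  },
    new { Big = true,  Tall = false },
    new { Big = false, Tall = true  },
    new { Big = false, Tall = false },
};

int conjunctedCalls = 0, multiCalls = 0;

var conjunctedThings = someThings
    .Where(t => { conjunctedCalls++; return t.Big && t.Tall; })
    .ToList();

var multiWhereThings = someThings
    .Where(t => { multiCalls++; return t.Big; })
    .Where(t => { multiCalls++; return t.Tall; })
    .ToList();

// conjunctedCalls == 4 (n), multiCalls == 6 (n + m, since m == 2 items are Big)
Console.WriteLine(conjunctedCalls + " vs " + multiCalls);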

Remove set of elements from list A and add into list B using Linq

I am attempting to learn LINQ by replacing existing code in a project with LINQ calls. In this method I check for a condition in my list of lines and, if the condition is true, move that element from lines to processedLines.
The data structures are just lists:
List<LineSegment2> lines;
List<LineSegment2> processedLines;
The original code was:
for (int i = lines.Count - 1; i >= 0; i--)
{
if (lines[i].P2.x < sweepPosition)
{
processedLines.Add(lines[i]);
lines.RemoveAt(i);
}
}
and my linq code is:
var toMove = lines.FindAll(x => x.P2.x < sweepPosition);
toMove.ForEach(x =>
{
processedLines.Add(x);
lines.Remove(x);
});
My question is: Is this LINQ code less efficient because it uses more memory creating the temporary list toMove? Is there a way to write the LINQ query without requiring the temporary list, or is the original code always more efficient?
A more LINQy solution would be to add all the processed lines at once, then get the remaining lines:
processedLines.AddRange(lines.Where(x => x.P2.x < sweepPosition));
lines = lines.Where(x => x.P2.x >= sweepPosition).ToList();
As for efficiency, it won't be quite as fast as your original code. That's not why you use LINQ.
There is one potential advantage, though. It will make a new list of lines, so if you move a lot of lines to the processed list it will get rid of the unused items in the list.
The "linq" code is less efficient and (more importantly) not necessarily much easier to maintain. Stick with your original code if you must choose between these two alternatives. I'd just recommend you run the for loop forward -- no reason you should run it backwards like you're doing.
As a side note, I wonder if it would be appropriate for your use case to just maintain a single list and add an IsProcessed property to the LineSegment2 class. You might consider that.
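A minimal sketch of that single-list idea (IsProcessed is a hypothetical bool property you would have to add to LineSegment2):
// Mark instead of move; both "views" are then just filters over one list.
foreach (var line in lines)
{
    if (line.P2.x < sweepPosition)
        line.IsProcessed = true;
}

var processed = lines.Where(l => l.IsProcessed);
var remaining = lines.Where(l => !l.IsProcessed);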
I'm not really sure about the efficiency, but in LINQ I'd do it like this:
processedLines = processedLines.Concat(lines.Where(x => x.P2.x < sweepPosition)).ToList();
lines.RemoveAll(x => x.P2.x < sweepPosition);

Linq keyword extraction - limit extraction scope

With regards to this solution.
Is there a way to limit the number of keywords to be taken into consideration? For example, I'd like only the first 1000 words of the text to be processed. There's a Take method in LINQ, but it serves a different purpose - all words would be processed, and N records returned. What's the right alternative to do this correctly?
Simply apply Take earlier - straight after the call to Split:
var results = src.Split()
.Take(1000)
.GroupBy(...) // etc
Well, strictly speaking LINQ is not necessarily going to read everything; Take will stop as soon as it can. The problem is that in the related question you look at Count, and it is hard to get a Count without consuming all the data. Likewise, string.Split will look at everything.
But if you wrote a lazy non-buffering Split function (using yield return) and you wanted the first 1000 unique words, then
var words = LazySplit(text).Distinct().Take(1000);
would work
Enumerable.Take does in fact stream results out; it doesn't buffer up its source entirely and then return only the first N. Looking at your original solution though, the problem is that the input to where you would want to do a Take is String.Split. Unfortunately, this method doesn't use any sort of deferred execution; it eagerly creates an array of all the 'splits' and then returns it.
Consequently, the technique to get a streaming sequence of words from some text would be something like:
var words = src.StreamingSplit() // you'll have to implement that
.Take(1000);
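One possible sketch of such a StreamingSplit, written as an extension method with yield return (this version splits on whitespace only; the separator logic is an assumption and should mirror whatever String.Split call you are replacing):
using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    // Lazily yields whitespace-separated words one at a time instead of
    // building the whole array up front the way String.Split does.
    public static IEnumerable<string> StreamingSplit(this string text)
    {
        var word = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                if (word.Length > 0)
                {
                    yield return word.ToString();
                    word.Clear();
                }
            }
            else
            {
                word.Append(c);
            }
        }
        if (word.Length > 0)
            yield return word.ToString();
    }
}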
However, I do note that the rest of your query is:
...
.GroupBy(str => str) // group words by the value
.Select(g => new
{
str = g.Key, // the value
count = g.Count() // the count of that value
});
Do note that GroupBy is a buffering operation - you can expect that all of the 1,000 words from its source will end up getting stored somewhere in the process of the groups being piped out.
As I see it, the options are:
If you don't mind going through all of the text for splitting purposes, then src.Split().Take(1000) is fine. The downside is wasted time (the splitting continues after it is no longer necessary) and wasted space (all of the words are stored in an array even though only the first 1,000 will be needed). However, the rest of the query will not operate on any more words than necessary.
If you can't afford to do (1) because of time / memory constraints, go with src.StreamingSplit().Take(1000) or equivalent. In this case, none of the original text will be processed after 1,000 words have been found.
Do note that those 1,000 words themselves will end up getting buffered by the GroupBy clause in both cases.

LINQ queries on possibly infinite lists

I am currently doing some Project Euler problems and the earlier ones often involve things like Fibonacci numbers or primes. Iterating over them seems to be a natural fit for LINQ, at least in readability and perceived "elegance" of the code (I'm trying to use language-specific features where possible and applicable to get a feel for the languages).
My problem is now: if I only need a set of numbers up to a certain limit, how should I best express this? Currently I have hard-coded the respective limit into the iterator, but I'd really like the enumerator to keep returning numbers until something outside decides not to query it anymore because the values are over a certain limit. So basically I want a potentially infinite iterator from which I only take a finite set of numbers. I know such things are trivial in functional languages, but I wonder whether C# allows for that too. The only other idea I had would be an iterator Primes(long) that returns primes up to a certain limit, and likewise for other sequences.
Any ideas?
Most of the LINQ methods (Enumerable class) are lazy. So for instance, there's nothing wrong with:
var squares = Enumerable.Range(0, Int32.MaxValue).Select(x=>x*x);
You can use the Take method to limit the results:
var tenSquares = squares.Take(10);
var smallSquares = squares.TakeWhile(x => x < 10000);
Edit: The things you need to avoid are functions that return "lazily" but have to consume the entire enumerable to produce a result. For example, grouping or sorting:
var oddsAndEvens = Enumerable.Range(0, Int32.MaxValue)
    .GroupBy(x => x % 2 == 0);

foreach (var item in oddsAndEvens) {
    Console.WriteLine(item.Key);
}
(That'll probably give you an OutOfMemoryException on 32-bit.)
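To tie this back to the Project Euler use case, here is a minimal sketch (assuming a modern C# top-level program with the usual using directives) of a potentially infinite sequence written as an iterator; the caller, not the iterator, decides where it stops:
using System;
using System.Collections.Generic;
using System.Linq;

// The caller limits the sequence; the iterator itself never terminates.
var belowFourMillion = Fibonacci().TakeWhile(f => f < 4000000).ToList();
var firstTwenty = Fibonacci().Take(20).ToList();
Console.WriteLine(belowFourMillion.Count + " " + firstTwenty.Count);

// Yields Fibonacci numbers forever; consumers stop it with Take/TakeWhile.
IEnumerable<long> Fibonacci()
{
    long a = 0, b = 1;
    while (true)
    {
        yield return a;
        long next = a + b;
        a = b;
        b = next;
    }
}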
