Use multiple dictionaries or a single huge one - C#

I have a general question about dictionaries in C#.
Say I read in a text file, split it up into keys and values and store them in a dictionary.
Would it be more useful to put them all into a single dictionary or split it up into smaller ones?
It probably wouldn't make a huge difference with small text files, but some of them have more than 100,000 lines.
What would you recommend?
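For reference, the loading looks roughly like this (simplified; the file name, delimiter and duplicate-key handling here are just placeholders, not my real code):
using System.Collections.Generic;
using System.IO;

var lookup = new Dictionary<string, string>();
foreach (var line in File.ReadLines("data.txt"))       // example file name
{
    var parts = line.Split('\t', 2);                    // example delimiter; Split(char, int) needs .NET Core 2.0+
    if (parts.Length == 2)
        lookup[parts[0]] = parts[1];                    // last value wins on duplicate keys
}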

The first rule is always to benchmark before optimizing. That being said, some people might have done the benchmarking for you. Check those results here.
From the article (just in case it disappears from the net):
The smaller Dictionary (with half the number of keys) was much faster.
In this case, the behavior of both Dictionaries on the input was
identical. This means that having unneeded keys in the Dictionary
makes it slower.
My perspective is that you should use separate Dictionaries for
separate purposes. If you have two sets of keys, do not store them in
the same Dictionary. If you can divide them up, you can enhance lookup
performance.
Credit: dotnetperls.com
Also from the article:
Full Dictionary: 791 ms
Half-size Dictionary: 591 ms [faster]
Maybe you can live with much less code and 200 ms more; it really depends on your application.

I believe the original article is either inaccurate or outdated. In any case, the statements regarding "dictionary size" have since been removed. Now, to answer the question:
Targeting .NET 6 x64 gives BETTER performance for a SINGLE dictionary. In fact, performance gets worse the more dictionaries you use:
| Method | Mean | Error | StdDev | Median |
|-------------- |----------:|---------:|----------:|----------:|
| Dictionary_1 | 91.54 us | 1.815 us | 3.318 us | 89.88 us |
| Dictionary_2 | 122.55 us | 1.067 us | 0.998 us | 122.19 us |
| Dictionary_10 | 390.77 us | 7.757 us | 18.882 us | 382.55 us |
The results should come as no surprise. With N dictionaries you may calculate the hash code up to N times for every item you look up, instead of doing it just once. You also have to loop through the list of dictionaries, which introduces a minuscule performance hit. All in all, it just makes sense.
Now, under some bizarre conditions it might be possible to gain some speed with N dictionaries, e.g. a tiny CPU cache, thrashing, hash code collisions, etc. I have yet to encounter such a scenario, though...
Benchmark code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace MyBenchmarks;

public class DictionaryBenchmark
{
    private const int N = 1000000;
    private readonly string[] data;
    private readonly Dictionary<string, string> dictionary;
    private readonly List<Dictionary<string, string>> dictionaries2;
    private readonly List<Dictionary<string, string>> dictionaries10;

    public DictionaryBenchmark()
    {
        data = Enumerable.Range(0, N).Select(n => Guid.NewGuid().ToString()).ToArray();
        dictionary = data.ToDictionary(x => x);
        dictionaries2 = CreateDictionaries(2);
        dictionaries10 = CreateDictionaries(10);
    }

    private List<Dictionary<string, string>> CreateDictionaries(int count)
    {
        int chunkSize = N / count;
        return data.Select((item, index) => (Item: item, Index: index))
                   .GroupBy(x => x.Index / chunkSize)
                   .Select(g => g.Select(x => x.Item).ToDictionary(x => x))
                   .ToList();
    }

    [Benchmark]
    public void Dictionary_1()
    {
        for (int i = 0; i < N; i += 1000)
        {
            dictionary.ContainsKey(data[i]);
        }
    }

    [Benchmark]
    public void Dictionary_2()
    {
        for (int i = 0; i < N; i += 1000)
        {
            foreach (var d in dictionaries2)
            {
                if (d.ContainsKey(data[i]))
                {
                    break;
                }
            }
        }
    }

    [Benchmark]
    public void Dictionary_10()
    {
        for (int i = 0; i < N; i += 1000)
        {
            foreach (var d in dictionaries10)
            {
                if (d.ContainsKey(data[i]))
                {
                    break;
                }
            }
        }
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<DictionaryBenchmark>();
}

Related

Looking for the best practice for doing a bunch of replaces on a (possibly) large string in C#

We have a piece of code that basically reads a template from a file and then replaces a bunch of placeholders with real data. The template is something outside of the developer's control, and we have noticed that sometimes (for large template files) it can get quite CPU-intensive to perform the replaces.
At least, we believe that it is the string replaces that are intensive. Debugging locally shows that it only takes milliseconds to perform each replace, but every template is different and some templates can contain hundreds of these tags that need replacing.
I'll show a little bit of code first, before I continue.
This is a huge simplification of the real code. There could be hundreds of replacements happening on our real code.
string template = File.ReadAllText(@"path\to\file");
if (!string.IsNullOrEmpty(template))
{
    if (template.Contains("[NUMBER]"))
        template = template.Replace("[NUMBER]", myObject.Number);
    if (template.Contains("[VAT_TABLE]"))
        template = template.Replace("[VAT_TABLE]", ConstructVatTable(myObject));
    // etc ...
}

private string ConstructVatTable(Invoice myObject)
{
    string vatTemplate = "this is a template with tags of its own";
    StringBuilder builder = new StringBuilder();
    foreach (var item in myObject.VatItems)
    {
        builder.Append(vatTemplate.Replace("[TAG1]", item.Tag1).Replace("[TAG2]", item.Tag2));
    }
    return builder.ToString();
}
Is this the most optimal way of replacing parts of a large string, or are there better ways? Are there ways that we could profile what we are doing in more detail to show us where the CPU intensive parts may lie? Any help or advice would be greatly appreciated.
You perhaps need to come up with some alternative strategies for your replacements and race your horses.
I did 4 here and benched them:
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net50)]
public class Streplace
{
    private string _doc;
    private Dictionary<string, string> _tags;

    [GlobalSetup]
    public void GenerateDocAndTags()
    {
        var sb = new StringBuilder(50000);
        _tags = new Dictionary<string, string>();
        for (int i = 0; i < 20; i++)
        {
            _tags["TAG" + i] = new string((char)('a' + i), 1000);
        }
        for (int i = 0; i < 1000; i++)
        {
            sb.Append(Guid.NewGuid().ToString().ToUpper());
            if (i % 50 == 0)
            {
                sb.Append("[TAG" + i / 50 + "]");
            }
        }
        _doc = sb.ToString();
    }

    [Benchmark]
    public void NaiveString()
    {
        var str = _doc;
        foreach (var tag in _tags)
        {
            str = str.Replace("[" + tag.Key + "]", tag.Value);
        }
    }

    [Benchmark]
    public void NaiveStringBuilder()
    {
        var strB = new StringBuilder(_doc, _doc.Length * 2);
        foreach (var tag in _tags)
        {
            strB.Replace("[" + tag.Key + "]", tag.Value);
        }
        var s = strB.ToString();
    }

    [Benchmark]
    public void StringSplit()
    {
        var strs = _doc.Split('[', ']');
        for (int i = 1; i < strs.Length; i += 2)
        {
            strs[i] = _tags[strs[i]];
        }
        var s = string.Concat(strs);
    }

    [Benchmark]
    public void StringCrawl()
    {
        var strB = new StringBuilder(_doc.Length * 2);
        var str = _doc;
        var lastI = 0;
        for (int i = str.IndexOf('['); i > -1; i = str.IndexOf('[', i))
        {
            strB.Append(str, lastI, i - lastI); // up to the [
            i++;
            var j = str.IndexOf(']', i);
            var tag = str[i..j];
            strB.Append(_tags[tag]);
            lastI = j + 1;
        }
        strB.Append(str, lastI, str.Length - lastI);
        var s = strB.ToString();
    }
}
NaiveString - your replace replace replace approach
NaiveStringBuilder - Rafal's approach - I was quite surprised how badly this performed, but I haven't looked into why. If anyone notices a glaring error in my code, let me know
StringSplit - the approach I commented - split the string and then the odd indexes are what needs swapping out, then join the string again
StringCrawl - travel the string looking for [ and ] putting either the doc content, or a tag content into the StringBuilder, depending on whether we're inside or outside the brackets
The test document was generated: thousands of GUIDs with [TAGx] (x from 0 to 19) inserted at regular intervals. Each [TAGx] was replaced with a considerably longer string made up of repeated chars. The resulting document was 56,000 chars.
The results were:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1526 (21H2)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
[Host] : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT [AttachedDebugger]
.NET 5.0 : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT
Job=.NET 5.0 Runtime=.NET 5.0
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------- |-----------:|---------:|---------:|---------:|---------:|---------:|----------:|
| NaiveString | 1,266.9 us | 24.73 us | 39.93 us | 433.5938 | 433.5938 | 433.5938 | 1,820 KB |
| NaiveStringBuilder | 3,908.6 us | 73.79 us | 93.33 us | 78.1250 | 78.1250 | 78.1250 | 293 KB |
| StringSplit | 110.8 us | 2.15 us | 2.01 us | 34.4238 | 34.4238 | 34.4238 | 181 KB |
| StringCrawl | 101.5 us | 1.96 us | 2.40 us | 79.9561 | 79.9561 | 79.9561 | 251 KB |
NaiveString's memory usage is huge; NaiveStringBuilder's memory is better, but it is 3x slower. Crawl and Split are pretty good - about 12x faster than NaiveString and 40x faster than NaiveStringBuilder (and a bit less memory too, bonus).
I thought the Crawl would be better than the Split, but I'm not sure the 10% speedup is worth the extra memory/collections - that would be your call. Take the code, maybe add some more approaches, and race them. NuGet-install BenchmarkDotNet and add a Main like this:
public static async Task Main()
{
#if DEBUG
    var sr = new Streplace();
    sr.GenerateDocAndTags();
    sr.NaiveString();
    sr.NaiveStringBuilder();
    sr.StringSplit();
    sr.StringCrawl();
#else
    var summary = BenchmarkRunner.Run<Streplace>();
#endif
}

C# foreach loop comically slower than for loop on a RaspberryPi

I was testing a .NET application on a Raspberry Pi, and whereas each iteration of that program took 500 milliseconds on a Windows laptop, the same took 5 seconds on the Raspberry Pi. After some debugging, I found that the majority of that time was being spent on a foreach loop concatenating strings.
Edit 1: To clarify, the 500 ms and 5 s times I mentioned are for the entire loop. I placed a timer before the loop and stopped it after the loop had finished. And the number of iterations is the same in both: 1000.
Edit 2: To time the loop, I used the answer mentioned here.
private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    foreach (list_of_bytes register in registers)
    {
        ret += Convert.ToString(register.RegisterValue) + ",";
    }
    return ret;
}
Out of the blue I replaced the foreach with a for loop, and suddenly it started taking almost the same time as it did on the laptop: 500 to 600 milliseconds.
private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    for (UInt16 i = 0; i < 1000; i++)
    {
        ret += Convert.ToString(registers[i].RegisterValue) + ",";
    }
    return ret;
}
Should I always use for loops instead of foreach? Or was this just a scenario in which a for loop is way faster than a foreach loop?
The actual problem is concatenating strings, not any difference between for and foreach. The reported timings are excruciatingly slow even on a Raspberry Pi. 1000 items is so little data that it fits in either machine's CPU cache. An RPi has a 1+ GHz CPU, so 5 seconds for 1000 concatenations means each one is taking millions of cycles.
The problem is the concatenation. Strings are immutable. Modifying or concatenating strings creates a new string. Your loops created 2000 temporary objects that need to be garbage collected. That process is expensive. Use a StringBuilder instead, preferably with a capacity roughly equal to the size of the expected string.
[Benchmark]
public string StringBuilder()
{
    var sb = new StringBuilder(registers.Count * 3);
    foreach (list_of_bytes register in registers)
    {
        sb.AppendFormat("{0}", register.RegisterValue);
    }
    return sb.ToString();
}
Simply measuring a single execution, or even averaging 10 executions, won't produce valid numbers. It's quite possible the GC ran to collect those 2000 objects during one of the tests. It's also quite possible that one of the tests was delayed by JIT compilation or any number of other reasons. A test should run long enough to produce stable numbers.
The de facto standard for .NET benchmarking is BenchmarkDotNet. That library will run each benchmark long enough to eliminate startup and cooldown effects and account for memory allocations and GC collections. You'll see not only how long each test takes but also how much RAM it uses and how many GCs it causes.
To actually measure your code, try this benchmark using BenchmarkDotNet:
[MemoryDiagnoser]
[MarkdownExporterAttribute.StackOverflow]
public class ConcatTest
{
    private readonly List<list_of_bytes> registers;

    public ConcatTest()
    {
        registers = Enumerable.Range(0, 1000).Select(i => new list_of_bytes(i)).ToList();
    }

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new StringBuilder(registers.Count * 3);
        foreach (var register in registers)
        {
            sb.AppendFormat("{0}", register.RegisterValue);
        }
        return sb.ToString();
    }

    [Benchmark]
    public string ForEach()
    {
        string ret = string.Empty;
        foreach (list_of_bytes register in registers)
        {
            ret += Convert.ToString(register.RegisterValue) + ",";
        }
        return ret;
    }

    [Benchmark]
    public string For()
    {
        string ret = string.Empty;
        for (UInt16 i = 0; i < registers.Count; i++)
        {
            ret += Convert.ToString(registers[i].RegisterValue) + ",";
        }
        return ret;
    }
}
The tests are run by calling BenchmarkRunner.Run<ConcatTest>():
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Linq;

public class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<ConcatTest>();
        Console.WriteLine(summary);
    }
}
Results
Running this on a MacBook produced the following results. Note that BenchmarkDotNet produces results ready to paste into Stack Overflow, and the runtime information is included in them:
BenchmarkDotNet=v0.13.1, OS=macOS Big Sur 11.5.2 (20G95) [Darwin 20.6.0]
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100
[Host] : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
-------------- |----------:|---------:|---------:|---------:|--------:|----------:|
StringBuilder | 34.56 μs | 0.682 μs | 0.729 μs | 7.5684 | 0.3052 | 35 KB |
ForEach | 278.36 μs | 5.509 μs | 5.894 μs | 818.8477 | 24.4141 | 3,763 KB |
For | 268.72 μs | 3.611 μs | 3.015 μs | 818.8477 | 24.4141 | 3,763 KB |
Both For and ForEach took almost 10 times longer than StringBuilder and used 100 times as much RAM.
If a string changes repeatedly, as in your example, then using a StringBuilder is a better option and could help with the issue you're dealing with.
Modifying any string object results in the creation of a new string object, which makes heavy use of string costly. So when you need repetitive operations on a string, StringBuilder comes into play. It provides an optimized way to deal with repetitive and multiple string manipulation operations, and it represents a mutable string of characters. String objects are immutable, but StringBuilder is a mutable string type: it will not create a new modified instance of the current string object but performs the modifications on the existing object.
So instead of creating many temporary objects that need to be garbage collected and in the meantime take up a lot of memory, just use a StringBuilder.
More about StringBuilder: https://learn.microsoft.com/en-us/dotnet/api/system.text.stringbuilder?view=net-6.0
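For instance, a minimal sketch of the OP's ComposeRegs rewritten with a StringBuilder (the initial capacity is a rough guess, and list_of_bytes / RegisterValue are the OP's own types):
private static string ComposeRegs(List<list_of_bytes> registers)
{
    // Rough initial capacity to reduce internal resizing; tune it to your data.
    var sb = new StringBuilder(registers.Count * 4);
    foreach (list_of_bytes register in registers)
    {
        // Append has overloads for the primitive types and a fallback for object
        sb.Append(register.RegisterValue).Append(',');
    }
    return sb.ToString();
}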

What algorithm should I use to maximize the number of tasks (with deadlines) that I can do?

I have many tasks. Each task is defined by the first day I can start working on it and the last day it is still valid to do; each task takes exactly one day, no more, and I can do one task per day.
The tasks and their deadlines are described in the table below.
| task | valid from | valid until |
|------|------------|-------------|
| t01 | 1 | 3 |
| t02 | 2 | 2 |
| t03 | 1 | 1 |
| t04 | 2 | 3 |
| t05 | 2 | 3 |
The number of tasks may be huge.
I want to know which algorithm I can use to solve this problem to maximize the number of tasks that I can do.
Update
Based on the comments, I wrote this code. It works, but it still doesn't perform well with a huge number of tasks.
public static int countTodoTasks(int[] validFrom, int[] validUnitil)
{
    var tasks = new List<TaskTodo>();
    for (int i = 0; i < validFrom.Length; i++)
    {
        tasks.Add(new TaskTodo { ValidFrom = validFrom[i], ValidUntil = validUnitil[i] });
    }

    tasks = tasks.OrderBy(x => x.ValidUntil).ToList();
    var lDay = 0;
    var schedule = new Dictionary<int, TaskTodo>();
    while (tasks.Count > 0)
    {
        lDay = findBigestMinimumOf(lDay, tasks[0].ValidFrom, tasks[0].ValidUntil);
        if (lDay != -1)
        {
            schedule[lDay] = tasks[0];
        }
        tasks.RemoveAt(0);
        tasks.RemoveAll(x => lDay >= x.ValidUntil);
    }
    return schedule.Count;
}

static int findBigestMinimumOf(int x, int start, int end)
{
    if (start > x)
    {
        return start;
    }
    if ((x == start && start == end) || x == end || x > end)
    {
        return -1;
    }
    return x + 1;
}
If the tasks all have the same duration, then use a greedy algorithm, as described in the other answer.
If that's too slow and you need to scale, use indexes (= hashing) and incremental calculation to speed it up.
Indexing: during setup, iterate through all tasks once to create a map (= dictionary) from each due date to the list of tasks with that due date. Better yet, use a NavigableMap (TreeMap), so you can ask for a tail iterator (all tasks starting from a specific due date, in order). The greedy algorithm can then use that to scale better (think a better big-O).
Incremental calculation: only calculate the deltas for each task you're considering.
If the tasks have different durations, a greedy algorithm (aka construction heuristic) won't give you the optimal solution; then the problem is NP-hard. After the construction heuristic (= greedy algorithm), run a Local Search (such as Tabu Search). Libraries such as OptaPlanner (Java, not C# unfortunately - look for alternatives there) can do both for you.
Also note there are multiple greedy algorithms (First Fit, First Fit Decreasing, ...).
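If it helps, here is a minimal C# sketch of the unit-duration greedy (just the construction heuristic, not OptaPlanner): process the tasks by earliest deadline and give each one the earliest still-free day in its window, keeping the free days in a SortedSet so the lookup mirrors the "tail iterator" idea. The type and names are my own, and it assumes the overall day range is small enough to enumerate.
using System;
using System.Collections.Generic;
using System.Linq;

public record TaskTodo(int ValidFrom, int ValidUntil);

public static class GreedyScheduler
{
    public static int CountTodoTasks(IReadOnlyList<TaskTodo> tasks)
    {
        if (tasks.Count == 0) return 0;

        // All days any task could possibly run on (assumes a modest day range).
        int firstDay = tasks.Min(t => t.ValidFrom);
        int lastDay = tasks.Max(t => t.ValidUntil);
        var freeDays = new SortedSet<int>(Enumerable.Range(firstDay, lastDay - firstDay + 1));

        int done = 0;
        // Earliest deadline first: the task that expires soonest gets a day first.
        foreach (var task in tasks.OrderBy(t => t.ValidUntil))
        {
            var window = freeDays.GetViewBetween(task.ValidFrom, task.ValidUntil);
            if (window.Count > 0)
            {
                freeDays.Remove(window.Min); // take the earliest free day in the window
                done++;
            }
        }
        return done;
    }
}
On the table above this schedules t03 on day 1, t02 on day 2 and one of t01/t04/t05 on day 3, so it returns 3.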
I suppose you can apply a greedy algorithm for your purpose in this way:

1. Select the minimal "valid from", minday.
2. Add to Xcandidates all candidates with "valid from" = minday.
3. If there are no Xcandidates, go to 1.
4. Select the interval x from Xcandidates with the earliest "valid until".
5. Remove x, inserting it into your schedule.
6. Remove all Xcandidates with "valid until" = minday.
7. Increment minday and go to 2.

C# foreach performance vs memory fragmentation

Tracking down a performance problem (micro, I know), I ended up with this test program. Compiled against Framework 4.5 in Release mode, it takes around 10 ms on my machine.
What bothers me is that if I remove this line
public int[] value1 = new int[80];
the time gets closer to 2 ms. It seems that there is some memory fragmentation problem, but I have failed to explain why. I have tested the program with .NET Core 2.0 with the same results. Can anyone explain this behaviour?
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ConsoleApp4
{
    public class MyObject
    {
        public int value = 1;
        public int[] value1 = new int[80];
    }

    class Program
    {
        static void Main(string[] args)
        {
            var list = new List<MyObject>();
            for (int i = 0; i < 500000; i++)
            {
                list.Add(new MyObject());
            }

            long total = 0;
            for (int i = 0; i < 200; i++)
            {
                int counter = 0;
                Stopwatch timer = Stopwatch.StartNew();
                foreach (var obj in list)
                {
                    if (obj.value == 1)
                        counter++;
                }
                timer.Stop();
                total += timer.ElapsedMilliseconds;
            }

            Console.WriteLine(total / 200);
            Console.ReadKey();
        }
    }
}
UPDATE:
After some research I came to the conclusion that it's just processor cache access time. Using the VS profiler, the cache misses are a lot higher with the array field than without it.
There are several implications involved.
When you have the line public int[] value1 = new int[80];, you have one extra allocation of memory: a new array is created on the heap to accommodate 80 integers (320 bytes), plus the object overhead. You do 500,000 of these allocations.
These allocations add up to more than 160 MB of RAM, which may cause the GC to kick in and see if there is memory to be released.
Further, when you allocate so much memory, it is likely that some of the objects from the list are not retained in the CPU cache. When you later enumerate your collection, the CPU may need to read the data from RAM, not from cache, which will induce a serious performance penalty.
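If you want to see the allocation difference directly (without the profiler), here is a rough sketch using GC.GetTotalMemory; the exact numbers depend on runtime and bitness:
using System;
using System.Collections.Generic;

class AllocProbe
{
    public class MyObject { public int value = 1; public int[] value1 = new int[80]; }

    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        var list = new List<MyObject>();
        for (int i = 0; i < 500000; i++)
            list.Add(new MyObject());

        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(list);

        // With the int[80] field this is on the order of 190-200 MB on x64;
        // dropping the field brings it down to roughly 15-20 MB (rough figures).
        Console.WriteLine($"~{(after - before) / (1024 * 1024)} MB allocated");
    }
}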
I'm not able to reproduce a big difference between the two and I wouldn't expect it either. Below are the results I get on .NET Core 2.2.
Instances of MyObject will be allocated on the heap. In one case, you have an int and a reference to the int array. In the other you have just the int. In both cases, you need to do the additional work of following the reference from the list. That is the same in both cases and the compiled code shows this.
Branch prediction will affect how fast this runs, but since you're branching on the same condition every time I wouldn't expect this to change from run to run (unless you change the data).
BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17134.376 (1803/April2018Update/Redstone4)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.200-preview-009648
[Host] : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
DefaultJob : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
Method | size | Mean | Error | StdDev | Ratio |
------------- |------- |---------:|----------:|----------:|------:|
WithArray | 500000 | 8.167 ms | 0.0495 ms | 0.0463 ms | 1.00 |
WithoutArray | 500000 | 8.167 ms | 0.0454 ms | 0.0424 ms | 1.00 |
For reference:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Generic;

namespace CoreSandbox
{
    [DisassemblyDiagnoser(printAsm: true, printSource: false, printPrologAndEpilog: true, printIL: false, recursiveDepth: 1)]
    //[MemoryDiagnoser]
    public class Test
    {
        private List<MyObject> dataWithArray;
        private List<MyObjectLight> dataWithoutArray;

        [Params(500_000)]
        public int size;

        public class MyObject
        {
            public int value = 1;
            public int[] value1 = new int[80];
        }

        public class MyObjectLight
        {
            public int value = 1;
        }

        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<Test>();
        }

        [GlobalSetup]
        public void Setup()
        {
            dataWithArray = new List<MyObject>(size);
            dataWithoutArray = new List<MyObjectLight>(size);
            for (var i = 0; i < size; i++)
            {
                dataWithArray.Add(new MyObject());
                dataWithoutArray.Add(new MyObjectLight());
            }
        }

        [Benchmark(Baseline = true)]
        public int WithArray()
        {
            var counter = 0;
            foreach (var obj in dataWithArray)
            {
                if (obj.value == 1)
                    counter++;
            }
            return counter;
        }

        [Benchmark]
        public int WithoutArray()
        {
            var counter = 0;
            foreach (var obj in dataWithoutArray)
            {
                if (obj.value == 1)
                    counter++;
            }
            return counter;
        }
    }
}

Sum of one after another sequence

I have a scenario where I have to add up the numbers inside a collection. An example will explain it best.
I have these values in my database:
| Foo |
|-----|
| 1   |
| 5   |
| 8   |
| 4   |
Result:
| Foo | Result |
|-----|--------|
| 1   | 1      |
| 5   | 6      |
| 8   | 14     |
| 4   | 18     |
As you can see, it has a somewhat Fibonacci-like effect, but the twist here is that the numbers are given.
I can achieve this result with the help of a for loop, but is it possible in LINQ, e.g. querying the database and then getting a result like the above?
Any help would be much appreciated. Thanks!
I'm not sure how exactly you're touching the database, but here's a solution that can probably be improved upon:
var numbers = new List<int> { 1, 5, 8, 4 };
var result = numbers.Select((n, i) => numbers.Where((nn, ii) => ii <= i).Sum());
These overloads of Select and Where take the object (each number) and the index of that object. For each index, I used numbers.Where to Sum all the items with a lower or equal index.
For example, when the Select gets to the number 8 (index 2), numbers.Where grabs items with index 0-2 and sums them.
MoreLINQ has a Scan method that allows you to aggregate the values in a sequence while yielding each intermediate value, rather than just the final value, which is exactly what you're trying to do.
With that you can write:
var query = data.Scan((sum, next) => sum + next);
The one overload that you need here will be copied below. See the link above for details and additional overloads:
public static IEnumerable<TSource> Scan<TSource>(this IEnumerable<TSource> source,
    Func<TSource, TSource, TSource> transformation)
{
    if (source == null) throw new ArgumentNullException("source");
    if (transformation == null) throw new ArgumentNullException("transformation");
    return ScanImpl(source, transformation);
}

private static IEnumerable<T> ScanImpl<T>(IEnumerable<T> source, Func<T, T, T> f)
{
    using (var i = source.GetEnumerator())
    {
        if (!i.MoveNext())
            throw new InvalidOperationException("Sequence contains no elements.");

        var aggregator = i.Current;
        while (i.MoveNext())
        {
            yield return aggregator;
            aggregator = f(aggregator, i.Current);
        }
        yield return aggregator;
    }
}
I think you can achieve this with:
var acc = 0;
var result = numbers.Select(i =>
{
    acc += i;
    return acc;
}).ToList();
You need the ToList to be sure it is run only once (otherwise acc will keep growing on each enumeration).
Also, I'm not sure it can be converted to a query (and performed server-side).
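To illustrate the ToList caveat, here is a small sketch (using the numbers from the question) of what happens if you drop the ToList and enumerate twice:
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 5, 8, 4 };
var acc = 0;
var result = numbers.Select(i => { acc += i; return acc; }); // no ToList: deferred execution

Console.WriteLine(string.Join(",", result)); // 1,6,14,18
Console.WriteLine(string.Join(",", result)); // 19,24,32,36 - acc kept growing on the second pass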
Thomas Levesque posted a response to a similar question where he provides a SelectAggregate method that yields the intermediate values of an aggregate computation.
It looks like this feature is not present in LINQ by default, so you probably will not be able to perform the computation server-side using LINQ.
You could do the following
int runningTotal = 0;
var runningTotals = numbers.Select(n => new
{
    Number = n,
    RunningTotal = (runningTotal += n)
});
This will give you the number and the running total.
It's just an adaptation of Jonesy's answer:
int[] ints = new[] {1, 5, 8, 4};
var result = ints.Select((x, y) => x + ints.Take(y).Sum());
