Optimizing linq efficiency

I have this LINQ query:
return ngrms.GroupBy(x => x)
.Select(s => new { Text = s.Key, Count = s.Count() })
.Where(x => x.Count > minCount)
.OrderByDescending(x => x.Count)
.ToDictionary(g => g.Text, g => g.Count);
ngrms is an IEnumerable<String>.
Is there a way to optimize this code?
I don't mind rewriting all of it, and I am open to any low-level tweaks.

If you implement a Dictionary whose values can be incremented in place (emulating a multiset or bag), you can run about 3x faster than LINQ, but the difference is small unless you have a lot of ngrms. On a list of 10 million strings with about 100 unique values, the LINQ code still takes less than a second on my PC. If your LINQ code takes time 1, a foreach with a Dictionary<string,int> takes 0.85 and this code takes 0.32.
Here is the class for creating an updateable value in the Dictionary:
public class Ref<T> {
public T val { get; set; }
public Ref(T firstVal) => val = firstVal;
public static implicit operator T(Ref<T> rt) => rt.val;
}
(If C# allowed operator ref T, you could return a ref to the val property and almost treat a Ref<T> as if it were an lvalue of type T.)
Now you can count the occurrences of the strings in a Dictionary<string,Ref<int>> with only one lookup per string:
var dictCounts = new Dictionary<string, Ref<int>>();
foreach (var s in ngrms) {
if (dictCounts.TryGetValue(s, out var refn))
++refn.val;
else
dictCounts.Add(s, new Ref<int>(1));
}
Finally you can compute the answer by filtering the counts to the ones you want to keep:
var ans = dictCounts.Where(kvp => kvp.Value > minCount).ToDictionary(kvp => kvp.Key, kvp => kvp.Value.val);
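On .NET 6 or later (an assumption; the answer above does not say which runtime it targets), CollectionsMarshal.GetValueRefOrAddDefault is another way to get one lookup per string, without a wrapper class. A minimal sketch, with made-up sample data standing in for ngrms:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical sample input standing in for ngrms.
IEnumerable<string> ngrms = new[] { "ab", "bc", "ab", "cd", "ab", "bc" };
int minCount = 1;

var counts = new Dictionary<string, int>();
foreach (var s in ngrms)
{
    // Returns a ref to the existing slot, or to a new slot initialized to 0.
    ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, s, out _);
    slot++;
}

// Same filter as the original query: keep counts strictly above minCount.
var ans = counts.Where(kvp => kvp.Value > minCount)
                .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);

foreach (var kvp in ans)
    Console.WriteLine($"{kvp.Key}: {kvp.Value}");
```

The ref must not be held across any operation that adds or removes dictionary entries, so it is used immediately inside the loop body.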

Going by your LINQ query, you may consider rewriting the code with a simple foreach loop for better performance, as below. It runs in O(n) time:
Dictionary<string, int> dict = new Dictionary<string, int>();
foreach(var s in ngrms)
{
if (dict.ContainsKey(s))
dict[s]++;
else
dict.Add(s, 1);
}
return dict.Where(a => a.Value > minCount).ToDictionary(a => a.Key, a => a.Value);

Related

Populate new dictionary from old dictionary

I have a Dictionary<string, int> where the string is a randomized collection of characters and the int is the ASCII sum of that string.
e.g.
["aaaaaaaaab", 971],
["aaaaaaaaba", 971],
["aaaaaaabaa", 971],
["aaaaaabaaa", 971]
I would like to make a new dictionary from the original where the new key is the value from the original, and the new value is the List<string> which would contain all the strings with the key as the ASCII sum.
e.g.
[971, List<string>{ "aaaaaaaaab", "aaaaaaaaba", "aaaaaaabaa", "aaaaaabaaa"}]
How can I achieve this? I cannot wrap my head around the required steps.

You could use GroupBy and ToDictionary.
The premise is:
group by the old Value
project to a new dictionary given the values of the GroupBy,
which will be the grouped list of KeyValuePair from the original dictionary, which in turn has the key selected out of it (.Select(y => y.Key))
Example
var newDict = old.GroupBy(x => x.Value)
.ToDictionary(x => x.Key, x => x.Select(y => y.Key)
.ToList());
Additional Resources
Enumerable.GroupBy Method
Groups the elements of a sequence.
Enumerable.ToDictionary Method
Creates a Dictionary<TKey,TValue> from an IEnumerable<T>.

Since values are not unique, you need to group by Value before converting to dictionary:
var inverse = original
.GroupBy(p => p.Value)
.ToDictionary(g => g.Key, g => g.Select(p => p.Key).ToList());
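As a runnable illustration of the grouping above, here is a small sketch using the sample data from the question. It also shows ToLookup, which may be enough if the result only needs to be read and never modified:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data taken from the question.
var original = new Dictionary<string, int>
{
    ["aaaaaaaaab"] = 971,
    ["aaaaaaaaba"] = 971,
    ["aaaaaaabaa"] = 971,
    ["aaaaaabaaa"] = 971,
};

// Mutable result: one List<string> per distinct value.
Dictionary<int, List<string>> inverse = original
    .GroupBy(p => p.Value)
    .ToDictionary(g => g.Key, g => g.Select(p => p.Key).ToList());

// Read-only alternative: an ILookup yields an empty sequence for a missing key.
ILookup<int, string> lookup = original.ToLookup(p => p.Value, p => p.Key);

Console.WriteLine(inverse[971].Count);   // 4
Console.WriteLine(lookup[971].Count());  // 4
Console.WriteLine(lookup[123].Count());  // 0
```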

If you wanted to do this without Linq, you could do the following:
foreach(KeyValuePair<string, int> entry in dict) {
if(!dict2.ContainsKey(entry.Value)) {
dict2[entry.Value] = new List<string>();
}
dict2[entry.Value].Add(entry.Key);
}
Assuming you have dict defined as Dictionary<string, int> dict and dict2 defined as Dictionary<int, List<string>> dict2
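The loop above looks up dict2 two or three times per entry (ContainsKey, the indexer set for a new key, then the indexer get). A variant with TryGetValue, sketched here with made-up sample data, needs one lookup for an existing key and two for a new one:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data.
var dict = new Dictionary<string, int> { ["tttt"] = 1, ["fttt"] = 1, ["fftt"] = 2 };
var dict2 = new Dictionary<int, List<string>>();

foreach (KeyValuePair<string, int> entry in dict)
{
    // Fetch the bucket for this value, creating it on first sight.
    if (!dict2.TryGetValue(entry.Value, out var bucket))
    {
        bucket = new List<string>();
        dict2.Add(entry.Value, bucket);
    }
    bucket.Add(entry.Key);
}

foreach (var kvp in dict2)
    Console.WriteLine($"{kvp.Key}: {string.Join(", ", kvp.Value)}");
```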

Here is a complete example for anyone that wants to "wrap their head around" how to do this, without LINQ.
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
Dictionary<string,int> origDict = new Dictionary<string,int>{{"tttt",1},{"fttt",1},{"fftt",2}};
var vals = new int[origDict.Count];
origDict.Values.CopyTo(vals,0);
var keys = new string[origDict.Count];
origDict.Keys.CopyTo(keys,0);
Dictionary<int,List<string>> newDict = new Dictionary<int,List<string>>();
for(int i = 0; i < vals.Length; i++){
int val = vals[i];
if(newDict.ContainsKey(val)){
newDict[val].Add(keys[i]);
}else{
newDict[val] = new List<string>();
newDict[val].Add(keys[i]);
}
}
foreach(var key in newDict.Keys){
Console.WriteLine(key);
foreach(var val in newDict[key]){
Console.WriteLine(val);
}
}
}
}
Output:
1
tttt
fttt
2
fftt

Use linq expression to filter a dictionary with a list of keys

I have a dictionary containing all users with their corresponding age.
Dictionary<string,int> AllUsers;
I have a list of specific users.
List<String> Users;
I would like to filter the first dictionary, AllUsers, down to only the users whose names are in the Users list.
I have done this manually with loops, but I would like to use a LINQ expression and I am not very familiar with them.
Thanks in advance for your help.

You could filter Users:
Users.Where(i => AllUsers.ContainsKey(i)).Select(i => new { User = i, Age = AllUsers[i] });
The major benefit of this is that you're using the indexed AllUsers to do the filtering, so your total computational complexity only depends on the number of users in Users (Dictionary.ContainsKey is O(1)) - the naïve approaches tend to be Users * AllUsers.
If you want a dictionary on output, it's as simple as replacing the .Select(...) above with
.ToDictionary(i => i, i => AllUsers[i])
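Put together, the dictionary-producing version might look like the sketch below. The sample data is made up, and the Distinct() call is an addition not in the answer: ToDictionary throws an ArgumentException on a duplicate key, so it guards against a name appearing twice in Users.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; "Bob" is listed twice on purpose.
var AllUsers = new Dictionary<string, int> { ["Bob"] = 10, ["Tom"] = 20, ["Ann"] = 30 };
var Users = new List<string> { "Bob", "Tom", "Jack", "Bob" };

var filtered = Users
    .Distinct()                                  // avoid duplicate keys in ToDictionary
    .Where(name => AllUsers.ContainsKey(name))   // O(1) check per name
    .ToDictionary(name => name, name => AllUsers[name]);

foreach (var kvp in filtered)
    Console.WriteLine($"{kvp.Key} {kvp.Value}");
```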

This might work:
var newdict = AllUsers.Where(x => Users.Contains(x.Key))
.ToDictionary(val => val.Key, val => val.Value);
It will create a new dictionary (because LINQ is for querying, not updating) with all the users from the dictionary that are on the Users list. You need ToDictionary to actually make it a dictionary.
EDIT:
As @Rawling said, it would be more performant to filter on the Dictionary rather than on the list. A solution that does so is in @Luaan's answer (I won't copy it as some do).

You can use the Join() method to join the two collections. It gets you what you need with a single line of LINQ.
var allUsers = new Dictionary<string, int>();
allUsers.Add("Bob", 10);
allUsers.Add("Tom", 20);
allUsers.Add("Ann", 30);
var users = new List<string>();
users.Add("Bob");
users.Add("Tom");
users.Add("Jack");
var result = allUsers.Join(users, o => o.Key, i => i, (o, i) => o);
foreach(var r in result)
{
Console.WriteLine(r.Key + " " + r.Value);
}
It will output the following in the console:
Bob 10
Tom 20
Only the names that appear in both collections will be in the result collection.

There are multiple ways to do this.
You can do it with Where:
var result= yourDictionary.Where(p=> yourList.Contains(p.Key))
.ToDictionary(p=> p.Key, p=> p.Value);
But if you have a lot of entries, it's better to use a HashSet:
var strings = new HashSet<string>(yourList);
var result= yourDictionary.Where(p=> strings.Contains(p.Key))
.ToDictionary(p=> p.Key, p=> p.Value);
Or using join:
var query =
from kvp in yourDictionary
join s in yourList on kvp.Key equals s
select new { kvp.Key, kvp.Value };

With the help of the following useful function
public static class Extensions
{
public static KeyValuePair<TKey,TValue>? Find<TKey, TValue>(this IDictionary<TKey, TValue> source, TKey key)
{
TValue value;
return source.TryGetValue(key, out value) ? new KeyValuePair<TKey, TValue>(key, value) : (KeyValuePair<TKey, TValue>?)null;
}
}
here is IMO the optimal solution (uses single lookup per key and does not introduce closure):
var filteredUsers = Users.Select(AllUsers.Find)
.Where(item => item.HasValue)
.ToDictionary(item => item.Value.Key, item => item.Value.Value);

Clever aggregating of tuples in C#

I am processing a complex query in parallel. From the called methods I get a lot of Tuple<IEnumerable<Object>, int> objects. I would like to aggregate them quickly, but probably .Aggregate (code below) is not the best option. What is the right way to do it?
public static Tuple<IEnumerable<Object>, int> Parse(Object obj)
{
var ieo = new List<Object>();
var x = 5;
return new Tuple<IEnumerable<Object>, int>(ieo, x);
}
public static void Query(List<Object> obj)
{
var result = obj
.AsParallel()
.Select(o => Parse(o))
. // do something to aggregate this quickly and get a tuple of:
// - flattened IEnumerable<Object>
// - summed up all second items
}
And here is my Aggregate attempt, which is probably very slow and looks terrible, but it works:
.Aggregate((t1, t2) => new Tuple<IEnumerable<Object>, int>(t1.Item1.Concat(t2.Item1), t1.Item2 + t2.Item2));

You can write a custom flattener.
public static Tuple<IEnumerable<T>, int> MagicFlatten<T>(
this IEnumerable<Tuple<IEnumerable<T>, int>> tupleCrap)
{
var item1 = tupleCrap.SelectMany(x => x.Item1);
var item2 = tupleCrap.Sum(x => x.Item2);
return Tuple.Create(item1, item2);
}
and later you can use it:
.AsParallel()
.Select(o => Parse(o))
.MagicFlatten();
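One thing to watch with this flattener: it enumerates its argument twice (Sum runs immediately, and the lazy SelectMany runs again whenever Item1 is consumed), so a parallel query passed to it is executed more than once. A hedged sketch of one way around that is to materialize the parsed tuples first. The Parse stand-in and the sample input below are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Parse method.
static Tuple<IEnumerable<object>, int> Parse(object obj) =>
    new Tuple<IEnumerable<object>, int>(new List<object> { obj }, 5);

var input = new List<object> { "a", "b", "c" };

// Run the parallel query exactly once and keep the results.
List<Tuple<IEnumerable<object>, int>> parsed = input
    .AsParallel()
    .Select(o => Parse(o))
    .ToList();

// Both passes below read the in-memory list, not the parallel query.
List<object> items = parsed.SelectMany(t => t.Item1).ToList();
int sum = parsed.Sum(t => t.Item2);

Console.WriteLine($"{items.Count} items, sum = {sum}");  // 3 items, sum = 15
```

AsParallel does not preserve the input order; adding AsOrdered() after it would, at some cost, if the order of the flattened items matters.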

Alternatives to LINQ.SelectMany with constant number of inner elements

I am trying to determine if there is a better way to execute the following query:
I have a List of Pair objects.
A Pair is defined as
public class Pair
{
public int IDA;
public int IDB;
public double Stability;
}
I would like to extract a list of all distinct IDs (ints) contained in the List<Pair>.
I am currently using
var pIndices = pairs.SelectMany(p => new List<int>() { p.IDA, p.IDB }).Distinct().ToList();
Which works, but it seems unintuitive to me to create a new List<int> only to have it flattened out by SelectMany.
This is another option, which I find inelegant to say the least:
var pIndices = pairs.Select(p => p.IDA).ToList();
pIndices.AddRange(pairs.Select(p => p.IDB).ToList());
pIndices = pIndices.Distinct().ToList();
Is there a better way? And if not, which would you prefer?

You could use Union() to get both the A's and B's after selecting them individually.
var pIndices = pairs.Select(p => p.IDA).Union(pairs.Select(p => p.IDB));

You could possibly shorten the inner expression to p => new [] { p.IDA, p.IDB }.

If you don't want to create a 2-element array/list for each Pair, and don't want to iterate your pairs list twice, you could just do it by hand:
HashSet<int> distinctIDs = new HashSet<int>();
foreach (var pair in pairs)
{
distinctIDs.Add(pair.IDA);
distinctIDs.Add(pair.IDB);
}
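If you would rather keep a LINQ pipeline while still iterating pairs once and allocating nothing per pair, an iterator method is one option. In this sketch the AllIds helper is hypothetical, and value tuples stand in for the question's Pair class to keep it short:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the question's Pair class (an assumption for brevity).
var pairs = new List<(int IDA, int IDB, double Stability)>
{
    (1, 2, 0.5),
    (2, 3, 0.7),
    (3, 1, 0.9),
};

// Hypothetical helper: yields both IDs of every pair in a single pass.
static IEnumerable<int> AllIds(IEnumerable<(int IDA, int IDB, double Stability)> source)
{
    foreach (var p in source)
    {
        yield return p.IDA;
        yield return p.IDB;
    }
}

List<int> pIndices = AllIds(pairs).Distinct().ToList();
Console.WriteLine(string.Join(", ", pIndices));
```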

This is one without a new collection:
var pIndices = pairs.Select(p => p.IDA)
.Concat(pairs.Select(p => p.IDB))
.Distinct();

Shorten it like this:
var pIndices = pairs.SelectMany(p => new[] { p.IDA, p.IDB }).Distinct().ToList();

Using Enumerable.Repeat is a little unorthodox, but here it is anyway:
var pIndices = pairs
.SelectMany(
p => Enumerable.Repeat(p.IDA, 1).Concat(Enumerable.Repeat(p.IDB, 1))
).Distinct()
.ToList();
Finally, if you do not mind a little helper class, you can do this:
public static class EnumerableHelper {
// usage: EnumerableHelper.AsEnumerable(obj1, obj2);
public static IEnumerable<T> AsEnumerable<T>(params T[] items) {
return items;
}
}
Now you can do this:
var pIndices = pairs
.SelectMany(p => EnumerableHelper.AsEnumerable(p.IDA, p.IDB))
.Distinct()
.ToList();

How to create a List<T> from a comma separated string?

Given the variable
string ids = Request.QueryString["ids"]; // "1,2,3,4,5";
Is there any way to convert it into a List<int> without doing something like
List<int> myList = new List<int>();
foreach (string id in ids.Split(','))
{
int value;
if (int.TryParse(id, out value))
{
myList.Add(value);
}
}

To create the list from scratch, use LINQ:
ids.Split(',').Select(i => int.Parse(i)).ToList();
If you already have the list object, omit the ToList() call and use AddRange:
myList.AddRange(ids.Split(',').Select(i => int.Parse(i)));
If some entries in the string may not be integers, you can use TryParse:
int temp;
var myList = ids.Split(',')
.Select(s => new { P = int.TryParse(s, out temp), I = temp })
.Where(x => x.P)
.Select(x => x.I)
.ToList();
One final (slower) method that avoids temps/TryParse but skips invalid entries is to use Regex:
var myList = Regex.Matches(ids, "[0-9]+").Cast<Match>().SelectMany(m => m.Groups.Cast<Group>()).Select(g => int.Parse(g.Value));
However, this can throw if one of your entries overflows int (999999999999).

This should do the trick:
ids.Split(',').Select(s => Convert.ToInt32(s)).ToList();
If the list may contain other data besides integers, a TryParse call should be included. See the accepted answer.

Using Linq:
myList.AddRange(ids.Split(',').Select(s => int.Parse(s)));
Or directly:
var myList = ids.Split(',').Select(s => int.Parse(s));
Also, to prevent the compiler from explicitly generating the (largely redundant) lambda, consider:
var myList = ids.Split(',').Select((Func<string, int>)int.Parse);
(Hint: micro-optimization.)
There's also TryParse which should be used instead of Parse (only) if invalid input is possible and should be handled silently. However, others have posted solutions using TryParse so I certainly won't. Just bear in mind that you shouldn't duplicate the calculation.

Or including TryParse like in your example:
var res = ids.Split(',').Where(x => { int tmp; return int.TryParse(x, out tmp); }).Select(x => int.Parse(x)).ToList();
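The one-liner above parses every valid entry twice, once in TryParse and once in Parse. A minimal sketch that parses each entry only once, using a hypothetical ParseOrNull helper and made-up input:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string ids = "1,2,x,4,,5";  // hypothetical input with two invalid entries

// Hypothetical helper: null for anything int.TryParse rejects.
static int? ParseOrNull(string s) => int.TryParse(s, out int n) ? n : (int?)null;

List<int> res = ids.Split(',')
    .Select(s => ParseOrNull(s))   // one parse per entry
    .Where(n => n.HasValue)        // drop the invalid ones
    .Select(n => n.Value)
    .ToList();

Console.WriteLine(string.Join(",", res));  // 1,2,4,5
```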

To match the request in terms of performance characteristics and behaviour, it should do the same thing, without resorting to regexes or skipping the TryParse:
ids.Split(',')
.Select( i => {
int value;
bool valid = int.TryParse(i, out value);
return new {valid, value};
})
.Where(r=>r.valid)
.Select(r=>r.value)
.ToList();
But while correct, that's quite ugly :D
Borrowing from a hint in Jason's comment:-
ids.Split(',')
.Select( i => {
int value;
bool valid = int.TryParse(i, out value);
return valid ? new int?( value) : null;
})
.Where(r=>r != null)
.Select(r=>r.Value)
.ToList();
Or
static class Convert
{
public static Int32? ConvertNullable(this string s)
{
int value;
bool valid = int.TryParse(s, out value);
return valid ? new int?( value) : null;
}
}
ids.Split(',')
.Select( s => Convert.ConvertNullable(s))
.Where(r=>r != null)
.Select(r=>r.Value)
.ToList();

One issue here is how to deal with values that are not integers (let's assume some entries will not be). One idea might be to simply use a regex:
^-?[0-9]+$
Now we could combine all this (as shown in Konrad's example):
var myList = ids.Split(',').Where(s => Regex.IsMatch(s, "^-?[0-9]+$")).Select(s => Convert.ToInt32(s)).ToList();
That should do the job.
