I am trying to determine if there is a better way to execute the following query:
I have a List of Pair objects.
A Pair is defined as
public class Pair
{
public int IDA;
public int IDB;
public double Stability;
}
I would like to extract a list of all distinct IDs (ints) contained in the List<Pair>.
I am currently using
var pIndices = pairs.SelectMany(p => new List<int>() { p.IDA, p.IDB }).Distinct().ToList();
Which works, but it seems unintuitive to me to create a new List<int> only to have it flattened out by SelectMany.
This is another option, which I find inelegant to say the least:
var pIndices = pairs.Select(p => p.IDA).ToList();
pIndices.AddRange(pairs.Select(p => p.IDB).ToList());
pIndices = pIndices.Distinct().ToList();
Is there a better way? And if not, which would you prefer?
You could use Union() to get both the A's and B's after selecting them individually.
var pIndices = pairs.Select(p => p.IDA).Union(pairs.Select(p => p.IDB));
You could possibly shorten the inner expression to p => new [] { p.IDA, p.IDB }.
If you don't want to create a 2-element array/list for each Pair, and don't want to iterate your pairs list twice, you could just do it by hand:
HashSet<int> distinctIDs = new HashSet<int>();
foreach (var pair in pairs)
{
distinctIDs.Add(pair.IDA);
distinctIDs.Add(pair.IDB);
}
This is one without a new collection:
var pIndices = pairs.Select(p => p.IDA)
.Concat(pairs.Select(p => p.IDB))
.Distinct();
Shorten it like this:
var pIndices = pairs.SelectMany(p => new[] { p.IDA, p.IDB }).Distinct().ToList();
Using Enumerable.Repeat is a little unorthodox, but here it is anyway:
var pIndices = pairs
.SelectMany(
p => Enumerable.Repeat(p.IDA, 1).Concat(Enumerable.Repeat(p.IDB, 1))
).Distinct()
.ToList();
Finally, if you do not mind a little helper class, you can do this:
public static class EnumerableHelper {
// usage: EnumerableHelper.AsEnumerable(obj1, obj2);
public static IEnumerable<T> AsEnumerable<T>(params T[] items) {
return items;
}
}
Now you can do this:
var pIndices = pairs
.SelectMany(p => EnumerableHelper.AsEnumerable(p.IDA, p.IDB))
.Distinct()
.ToList();
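To compare the suggestions above side by side, here is a minimal sketch that runs the SelectMany and Union variants on the same input. The sample pairs and the DistinctIdsDemo class name are made up for illustration; only the Pair class comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Pair
{
    public int IDA;
    public int IDB;
    public double Stability;
}

public static class DistinctIdsDemo
{
    // Flatten each pair into a two-element array, then drop duplicates.
    public static List<int> ViaSelectMany(IEnumerable<Pair> pairs) =>
        pairs.SelectMany(p => new[] { p.IDA, p.IDB }).Distinct().ToList();

    // Union already removes duplicates, so no Distinct() call is needed.
    // Note that this enumerates 'pairs' twice.
    public static List<int> ViaUnion(IEnumerable<Pair> pairs) =>
        pairs.Select(p => p.IDA).Union(pairs.Select(p => p.IDB)).ToList();

    public static void Main()
    {
        // Hypothetical sample data.
        var pairs = new List<Pair>
        {
            new Pair { IDA = 1, IDB = 2, Stability = 0.5 },
            new Pair { IDA = 2, IDB = 3, Stability = 0.7 },
            new Pair { IDA = 3, IDB = 1, Stability = 0.9 },
        };

        Console.WriteLine(string.Join(", ", ViaSelectMany(pairs)));
        Console.WriteLine(string.Join(", ", ViaUnion(pairs)));
    }
}
```

Both methods return the same set of IDs. The order of the elements can differ between the two, so sort the result if the order matters.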
How to use OrderBy for shaping output in the same order as per the requested distinct list
public DataCollectionList GetLatestDataCollection(List<string> requestedDataPointList)
{
var dataPoints = _context.DataPoints.Where(c => requestedDataPointList.Contains(c.dataPointName))
.OrderBy(----------) //TODO: RE-ORDER IN THE SAME ORDER AS REQUESTED requestedDataPointList
.ToList();
dataPoints.ForEach(dp =>
{
....
});
}
Do the sorting on the client side:
public DataCollectionList GetLatestDataCollection(List<string> requestedDataPointList)
{
    var dataPoints = _context.DataPoints.Where(c => requestedDataPointList.Contains(c.dataPointName))
        .AsEnumerable() // everything after this runs on the client, not in the database
        .OrderBy(c => requestedDataPointList.IndexOf(c.dataPointName));
    foreach (var dp in dataPoints)
    {
        ....
    }
}
NOTE: Also, I don't think ToList().ForEach() is ever better than foreach ().
I think the fastest method is to join the result back with the request list. This makes use of the fact that LINQ's join preserves the order of the first (outer) list:
var dataPoints = _context.DataPoints
.Where(c => requestedDataPointList.Contains(c.dataPointName))
.ToList();
var ordered = from n in requestedDataPointList
join dp in dataPoints on n equals dp.dataPointName
select dp;
foreach (var dataPoint in ordered)
{
...
}
This doesn't involve any sorting; the join does all the work, and it will be close to O(n).
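As a rough, in-memory illustration of the join approach, the sketch below uses a hypothetical DataPoint class and made-up names in place of the EF context from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity in the question.
public class DataPoint
{
    public string dataPointName;
    public double value;
}

public static class JoinOrderDemo
{
    // Returns the data points in the order their names appear in 'requested'.
    // Join keeps the order of the outer sequence, which is 'requested' here.
    public static List<DataPoint> OrderLikeRequest(
        IEnumerable<string> requested, IEnumerable<DataPoint> dataPoints)
    {
        return (from n in requested
                join dp in dataPoints on n equals dp.dataPointName
                select dp).ToList();
    }

    public static void Main()
    {
        var requested = new List<string> { "pressure", "temperature", "flow" };

        // Pretend this came back from the database in arbitrary order.
        var fetched = new List<DataPoint>
        {
            new DataPoint { dataPointName = "flow", value = 3.0 },
            new DataPoint { dataPointName = "temperature", value = 21.5 },
            new DataPoint { dataPointName = "pressure", value = 1.2 },
        };

        foreach (var dp in OrderLikeRequest(requested, fetched))
            Console.WriteLine(dp.dataPointName);
    }
}
```

Requested names with no matching data point simply drop out of the result, because this is an inner join.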
Another fast method consists of creating a dictionary of sequence numbers:
var indexes = requestedDataPointList
.Select((n, i) => new { n, i }).ToDictionary(x => x.n, x => x.i);
var ordered = dataPoints.OrderBy(dp => indexes[dp.dataPointName]);
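The dictionary variant can be tried out the same way. This is only a sketch: OrderLikeRequest and its nameOf parameter are hypothetical, and the Distinct() call is an addition that guards against ToDictionary throwing when a name is requested twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexOrderDemo
{
    // Hypothetical helper: orders 'items' by the position of their name in 'requested'.
    public static List<T> OrderLikeRequest<T>(
        IEnumerable<string> requested, IEnumerable<T> items, Func<T, string> nameOf)
    {
        // Map each requested name to its position in the request.
        // Distinct() is an added safeguard: ToDictionary throws on duplicate keys.
        var indexes = requested
            .Distinct()
            .Select((n, i) => new { n, i })
            .ToDictionary(x => x.n, x => x.i);

        // The indexer throws KeyNotFoundException for a name that was never requested.
        return items.OrderBy(item => indexes[nameOf(item)]).ToList();
    }

    public static void Main()
    {
        var requested = new List<string> { "pressure", "temperature", "flow" };
        var fetched = new List<KeyValuePair<string, double>>
        {
            new KeyValuePair<string, double>("flow", 3.0),
            new KeyValuePair<string, double>("temperature", 21.5),
            new KeyValuePair<string, double>("pressure", 1.2),
        };

        // Prints pressure, temperature, flow (one per line).
        foreach (var dp in OrderLikeRequest(requested, fetched, kv => kv.Key))
            Console.WriteLine(dp.Key);
    }
}
```

Compared with calling IndexOf inside OrderBy, each lookup here is a hash lookup rather than a scan of the request list.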
I have a Dictionary<string, int> where the string is a randomized collection of characters and the int is the ASCII sum of that string.
e.g.
["aaaaaaaaab", 971],
["aaaaaaaaba", 971],
["aaaaaaabaa", 971],
["aaaaaabaaa", 971]
I would like to make a new dictionary from the original where the new key is the value from the original, and the new value is the List<string> which would contain all the strings with the key as the ASCII sum.
e.g.
[971, List<string>{ "aaaaaaaaab", "aaaaaaaaba", "aaaaaaabaa", "aaaaaabaaa"}]
How can I achieve this? I cannot wrap my head around the required steps.
You could use GroupBy and ToDictionary.
The premise is:
group by the old Value;
project to a new dictionary from the groups that GroupBy produces;
each group is the list of KeyValuePair entries from the original dictionary that share that value, from which the original key is selected (.Select(y => y.Key)).
Example
var newDict = old.GroupBy(x => x.Value)
.ToDictionary(x => x.Key, x => x.Select(y => y.Key)
.ToList());
Additional Resources
Enumerable.GroupBy Method
Groups the elements of a sequence.
Enumerable.ToDictionary Method
Creates a Dictionary<TKey,TValue> from an IEnumerable<T>.
Since values are not unique, you need to group by Value before converting to dictionary:
var inverse = original
.GroupBy(p => p.Value)
.ToDictionary(g => g.Key, g => g.Select(p => p.Key).ToList());
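For a version that can be run as-is, the sketch below builds the original dictionary from the strings in the question (plus one extra made-up string) and inverts it. InvertDemo and Invert are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InvertDemo
{
    // Groups the entries by their value and collects the old keys of each group.
    public static Dictionary<int, List<string>> Invert(Dictionary<string, int> original) =>
        original.GroupBy(p => p.Value)
                .ToDictionary(g => g.Key, g => g.Select(p => p.Key).ToList());

    public static void Main()
    {
        // The last string is an extra sample so that there are two groups.
        var words = new[] { "aaaaaaaaab", "aaaaaaaaba", "aaaaaaabaa", "aaaaaabaaa", "aaaaaaaaaa" };

        // Value = sum of the character codes of the string.
        var original = words.ToDictionary(w => w, w => w.Sum(c => (int)c));

        foreach (var entry in Invert(original))
            Console.WriteLine(entry.Key + ": " + string.Join(", ", entry.Value));
    }
}
```

The four strings from the question land in one list under the key 971, and the extra string gets its own entry under 970.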
If you wanted to do this without Linq, you could do the following:
foreach(KeyValuePair<string, int> entry in dict) {
if(!dict2.ContainsKey(entry.Value)) {
dict2[entry.Value] = new List<string>();
}
dict2[entry.Value].Add(entry.Key);
}
This assumes that dict is defined as Dictionary<string, int> and dict2 as Dictionary<int, List<string>>.
Here is a complete example for anyone that wants to "wrap their head around" how to do this, without LINQ.
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
Dictionary<string,int> origDict = new Dictionary<string,int>{{"tttt",1},{"fttt",1},{"fftt",2}};
var vals = new int[origDict.Count];
origDict.Values.CopyTo(vals,0);
var keys = new string[origDict.Count];
origDict.Keys.CopyTo(keys,0);
Dictionary<int,List<string>> newDict = new Dictionary<int,List<string>>();
for(int i = 0; i < vals.Length; i++){
int val = vals[i];
if(newDict.ContainsKey(val)){
newDict[val].Add(keys[i]);
}else{
newDict[val] = new List<string>();
newDict[val].Add(keys[i]);
}
}
foreach(var key in newDict.Keys){
Console.WriteLine(key);
foreach(var val in newDict[key]){
Console.WriteLine(val);
}
}
}
}
Output:
1
tttt
fttt
2
fftt
I have a dictionary containing all users with their corresponding age.
Dictionary<string,int> AllUsers;
I have a list of specific users.
List<String> Users;
I would like to filter the first dictionary, AllUsers, so that it contains only the users whose names are in the Users list.
I have done this manually with loops, but I would like to use a LINQ expression instead, and I am not very familiar with LINQ.
Thanks in advance for your help
You could filter Users:
Users.Where(i => AllUsers.ContainsKey(i)).Select(i => new { User = i, Age = AllUsers[i] });
The major benefit of this is that you're using the indexed AllUsers to do the filtering, so the total computational complexity depends only on the number of users in Users (Dictionary.ContainsKey is O(1)). The naïve approaches tend to be O(Users × AllUsers).
If you want a dictionary on output, it's as simple as replacing the .Select(...) above with
.ToDictionary(i => i, i => AllUsers[i])
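Put together, the dictionary-producing version might look like the sketch below. The sample data and the FilterUsersDemo name are made up, and the Distinct() call is an addition: without it, a name that appears twice in Users would make ToDictionary throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterUsersDemo
{
    // Keeps only the requested users that exist in allUsers.
    public static Dictionary<string, int> Filter(
        Dictionary<string, int> allUsers, IEnumerable<string> users)
    {
        return users
            .Distinct()                               // added safeguard against repeated names
            .Where(name => allUsers.ContainsKey(name)) // O(1) lookup per name
            .ToDictionary(name => name, name => allUsers[name]);
    }

    public static void Main()
    {
        var allUsers = new Dictionary<string, int>
        {
            { "Bob", 10 }, { "Tom", 20 }, { "Ann", 30 },
        };
        var users = new List<string> { "Bob", "Tom", "Jack", "Bob" };

        var filtered = Filter(allUsers, users);

        // Prints "Bob 10" and "Tom 20"; Jack is unknown and Ann was not requested.
        foreach (var name in filtered.Keys.OrderBy(k => k))
            Console.WriteLine(name + " " + filtered[name]);
    }
}
```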
It might work
var newdict = AllUsers.Where(x => Users.Contains(x.Key))
.ToDictionary(val => val.Key, val => val.Value);
It will create a new dictionary (because LINQ is for querying, not updating) with all the users from the dictionary that are on the Users list. You need to use ToDictionary to actually turn the result into a dictionary.
EDIT:
As @Rawling said, it would be more performant to filter on the dictionary rather than on the list. A solution that does this is in @Luaan's answer (I won't copy it, as some do).
You can use the Join() method to join the two collections. It lets you get what you need with a single line of LINQ.
var allUsers = new Dictionary<string, int>();
allUsers.Add("Bob", 10);
allUsers.Add("Tom", 20);
allUsers.Add("Ann", 30);
var users = new List<string>();
users.Add("Bob");
users.Add("Tom");
users.Add("Jack");
var result = allUsers.Join(users, o => o.Key, i => i, (o, i) => o);
foreach(var r in result)
{
Console.WriteLine(r.Key + " " + r.Value);
}
It will output the following in the console:
Bob 10
Tom 20
Only the names that appear in both collections will be in the result collection.
There are multiple ways to do this.
You can do this using Where:
var result= yourDictionary.Where(p=> yourList.Contains(p.Key))
.ToDictionary(p=> p.Key, p=> p.Value);
But if you have a lot of entries, it's better to use a HashSet, because its Contains is O(1) while List.Contains is O(n):
var strings = new HashSet<string>(yourList);
var result= yourDictionary.Where(p=> strings.Contains(p.Key))
.ToDictionary(p=> p.Key, p=> p.Value);
Using join:
var query =
from kvp in yourDictionary
join s in yourList on kvp.Key equals s
select new { kvp.Key, kvp.Value };
With the help of the following useful function
public static class Extensions
{
public static KeyValuePair<TKey,TValue>? Find<TKey, TValue>(this IDictionary<TKey, TValue> source, TKey key)
{
TValue value;
return source.TryGetValue(key, out value) ? new KeyValuePair<TKey, TValue>(key, value) : (KeyValuePair<TKey, TValue>?)null;
}
}
here is IMO the optimal solution (uses single lookup per key and does not introduce closure):
var filteredUsers = Users.Select(AllUsers.Find)
.Where(item => item.HasValue)
.ToDictionary(item => item.Value.Key, item => item.Value.Value);
I need to remove duplicates, but also log which ones I am removing. I have two solutions right now: one that goes through each duplicate and one that removes duplicates. I know that removing in place inside a foreach is dangerous, so I am a bit stuck on how to do this as efficiently as possible.
What I have right now:
var duplicates = ListOfThings
.GroupBy(x => x.ID)
.Where(g => g.Skip(1).Any())
.SelectMany(g => g);
foreach (var duplicate in duplicates)
{
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", duplicate.ID);
}
ListOfThings = ListOfThings.GroupBy(x => x.ID).Select(y => y.First()).ToList();
Well, ToList() will materialize the query, so if you allow side effects (i.e. writing to the log) inside it, you could do it like this:
var cleared = ListOfThings
.GroupBy(x => x.ID)
.Select(chunk => {
// Side effect: writing to log while selecting
if (chunk.Skip(1).Any())
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", chunk.Key);
// if there're duplicates by Id take the 1st one
return chunk.First();
})
.ToList();
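If side effects inside Select feel uncomfortable, the same single pass over the groups can be written as a plain loop. The sketch below is a variation on the answer above, not the same code: the Thing class and the logDuplicate callback are hypothetical stand-ins for the question's item type and its Log.Append call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; only the ID matters here.
public class Thing
{
    public int ID;
    public string Name;
}

public static class DedupeDemo
{
    // Keeps the first item per ID and reports every ID that occurred more than once.
    public static List<Thing> RemoveDuplicates(
        IEnumerable<Thing> things, Action<int> logDuplicate)
    {
        var result = new List<Thing>();
        foreach (var group in things.GroupBy(t => t.ID))
        {
            // Skip(1).Any() is true as soon as a second element exists.
            if (group.Skip(1).Any())
                logDuplicate(group.Key);
            result.Add(group.First());
        }
        return result;
    }

    public static void Main()
    {
        var things = new List<Thing>
        {
            new Thing { ID = 5, Name = "first five" },
            new Thing { ID = 3, Name = "three" },
            new Thing { ID = 5, Name = "second five" },
        };

        var cleared = RemoveDuplicates(
            things, id => Console.WriteLine("Conflicts with another: " + id));

        foreach (var t in cleared)
            Console.WriteLine(t.ID + " " + t.Name);
    }
}
```

Passing the logger in as a delegate keeps the grouping logic independent of any particular logging API.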
Why group when one can use the Aggregate function to determine the duplicates for the report and the result?
Example
var items = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Alpha"};
var duplicatesDictionary =
items.Aggregate (new Dictionary<string, int>(),
(results, itm) =>
{
if (results.ContainsKey(itm))
results[itm]++;
else
results.Add(itm, 1);
return results;
});
The result of the above is a dictionary in which every occurrence has been counted: Alpha = 3, Beta = 1, Gamma = 1.
Now extract the duplicates report for any count above 1.
duplicatesDictionary.Where (kvp => kvp.Value > 1)
.Select (kvp => string.Format("{0} had {1} duplicates", kvp.Key, kvp.Value))
Now the final result is to just extract all the keys.
duplicatesDictionary.Select (kvp => kvp.Key);
You can use a hash set and union it with a list to get unique items; just override the reference comparison. Implementing IEqualityComparer<T> is flexible; if it's just ID that makes two objects unique then ok; but if it's more you can extend it, too.
You can get duplicates with LINQ.
void Main()
{
//your original class:
List<Things> originalList = new List<Things> { new Things(5), new Things(3), new Things(5) };
//i'm doing this in LINQPad; if you're using VS you may need to foreach the object
Console.WriteLine(originalList);
//put your duplicates back in a list and log them as you did.
var duplicateItems = originalList.GroupBy(x => x.ID).Where(x => x.Count() > 1).ToList();//.Select(x => x.GetHashCode());
Console.WriteLine(duplicateItems);
//create a custom comparer to compare your list; if you care about more than ID then you can extend this
var tec = new ThingsEqualityComparer();
var listThings = new HashSet<Things>(tec);
listThings.UnionWith(originalList);
Console.WriteLine(listThings);
}
// Define other methods and classes here
public class Things
{
public int ID {get;set;}
public Things(int id)
{
ID = id;
}
}
public class ThingsEqualityComparer : IEqualityComparer<Things>
{
public bool Equals(Things thing1, Things thing2)
{
return thing1.ID == thing2.ID;
}
public int GetHashCode(Things thing)
{
int hCode = thing.ID;
return hCode.GetHashCode();
}
}
I am working with an API that is returning duplicate Ids. I need to insert these values into my database using the EF. Before trying to add the objects I want to trim away any duplicates.
I have a small example of the code I am trying to write.
var itemsToImport = new List<Item>(){};
itemsToImport.Add(new Item() { Description = "D-0", Id = 0 });
for (int i = 0; i < 5; i++)
{
itemsToImport.Add(new Item(){Id = i,Description = "D-"+i.ToString()});
}
var currentItems = new List<Item>
{
new Item() {Id = 1,Description = "D-1"},
new Item(){Id = 3,Description = "D-3"}
};
//returns the correct missing Ids
var missing = itemsToImport.Select(s => s.Id).Except(currentItems.Select(s => s.Id));
//toAdd contains the duplicate record.
var toAdd = itemsToImport.Where(x => missing.Contains(x.Id));
foreach (var item in toAdd)
{
Console.WriteLine(item.Description);
}
What do I need to change to fix my variable "toAdd" to only return a single record even if there is a repeat?
You can do this by grouping by the Id and then selecting the first item in each group.
var toAdd = itemsToImport
.Where(x => missing.Contains(x.Id));
becomes
var toAdd = itemsToImport
.Where(x => missing.Contains(x.Id))
.GroupBy(item => item.Id)
.Select(grp => grp.First());
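Applied to the data from the question, the grouped query might look like the following sketch. The Item class is reconstructed from how the question uses it, ImportDemo is an illustrative name, and materializing the missing IDs into a HashSet is an addition so that the Except query is not re-evaluated for every item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reconstructed from the question's usage; the real class may have more members.
public class Item
{
    public int Id { get; set; }
    public string Description { get; set; }
}

public static class ImportDemo
{
    // Returns one item per Id that is in itemsToImport but not in currentItems.
    public static List<Item> ItemsToAdd(List<Item> itemsToImport, List<Item> currentItems)
    {
        // Added: materialize the missing Ids once instead of re-running Except per item.
        var missing = new HashSet<int>(
            itemsToImport.Select(s => s.Id).Except(currentItems.Select(s => s.Id)));

        return itemsToImport
            .Where(x => missing.Contains(x.Id))
            .GroupBy(item => item.Id)
            .Select(grp => grp.First())
            .ToList();
    }

    public static void Main()
    {
        var itemsToImport = new List<Item> { new Item { Description = "D-0", Id = 0 } };
        for (int i = 0; i < 5; i++)
            itemsToImport.Add(new Item { Id = i, Description = "D-" + i });

        var currentItems = new List<Item>
        {
            new Item { Id = 1, Description = "D-1" },
            new Item { Id = 3, Description = "D-3" },
        };

        // Prints D-0, D-2, D-4 (one per line); the repeated Id 0 appears only once.
        foreach (var item in ItemsToAdd(itemsToImport, currentItems))
            Console.WriteLine(item.Description);
    }
}
```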
Use DistinctBy from MoreLINQ, as recommended by Jon Skeet in https://stackoverflow.com/a/2298230/385844
The call would look something like this:
var toAdd = itemsToImport.Where(x => missing.Contains(x.Id)).DistinctBy(x => x.Id);
If you'd rather not (or can't) use MoreLINQ for some reason, DistinctBy is fairly easy to implement yourself:
static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> sequence, Func<T, TKey> projection)
{
var set = new HashSet<TKey>();
foreach (var item in sequence)
if (set.Add(projection(item)))
yield return item;
}
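You can use the Distinct function, but you'll need to override Equals and GetHashCode in Item so that two items containing the same data compare as equal.
Or, if you only need one item, use FirstOrDefault to get the first Item with a matching Id back. Note that this returns a single Item rather than a sequence.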
itemsToImport.Where(x => missing.Contains(x.Id)).FirstOrDefault()