Excuse my pseudo code below. I'm pretty sure there is a magical way to write this in a single LINQ statement that will also dramatically improve the performance. Here I have a list of millions of records in AList. The id may not be unique. What I'm after is the original list with all duplicates removed (based on the id), always keeping the record with the earliest date. mystring is almost always a different value when there is a duplicate id.
public class A
{
public string id { get; set; }
public string mystring { get; set; }
public DateTime mydate { get; set; }
}
List<A> aListNew = new List<A>();
foreach (var v in AList)
{
var first = AList.Where(d => d.id == v.id).OrderBy(d => d.mydate).First();
// If not already added, then we add
if (!aListNew.Where(t => t.id == first.id).Any())
aListNew.Add(first);
}
You could use grouping directly to accomplish this in one LINQ statement:
List<A> aListNew = AList
.GroupBy(d => d.id)
.Select(g => g.OrderBy(i => i.mydate).First())
.ToList();
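A variation on the grouping approach, as a minimal sketch: sorting each group just to take its first element does more work than needed, so this version scans each group once with Aggregate. The record Rec and the names Dedup and EarliestPerId are made up for the illustration; Rec only mirrors the question's class A.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's class A.
public record Rec(string id, string mystring, DateTime mydate);

public static class Dedup
{
    // Keeps one record per id: the one with the earliest mydate.
    // Aggregate walks each group once instead of sorting it.
    // On equal dates the record seen first is kept.
    public static List<Rec> EarliestPerId(IEnumerable<Rec> source) =>
        source
            .GroupBy(r => r.id)
            .Select(g => g.Aggregate((best, next) => next.mydate < best.mydate ? next : best))
            .ToList();
}
```

Called as Dedup.EarliestPerId(AList), this should give the same records as the OrderBy(...).First() form whenever the dates within a group are distinct.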
The fastest is probably going to be a straight foreach loop with a dictionary:
Dictionary<string, A> lookup = new Dictionary<string, A>();
foreach (var v in AList)
{
    if (!lookup.ContainsKey(v.id))
        // add it
        lookup[v.id] = v;
    else if (lookup[v.id].mydate > v.mydate)
        // replace it
        lookup[v.id] = v;
}
// convert to list
List<A> aListNew = lookup.Values.ToList();
A LINQ GroupBy / First() query might be comparable if there are few collisions, but either one is going to be at least O(N) since it has to traverse the whole list.
This should be easiest. No nested LINQ query involved anyway.
var lookup = new Dictionary<string, A>();
// Latest dates are written first, so the earliest date per id is written last and wins.
foreach (var a in AList.OrderByDescending(d => d.mydate)) {
    lookup[a.id] = a;
}
var result = lookup.Values.ToList();
Note that sub-LINQ will hurt performance, and that's why I choose not to use it. Remember that LINQ is there to make your task easier, not to make the execution faster.
Example is here, should work in online compilers:
internal class Program
{
static void Main(string[] args)
{
var i1 = new Item();
i1.Val1 = 1;
i1.Val2 = 2.1;
var i2 = new Item();
i2.Val1 = 1;
i2.Val2 = 1.5;
var i3 = new Item();
i3.Val1 = 3;
i3.Val2 = 0.3;
var list = new List<Item>
{
i1,
i2,
i3
};
var grouped = list.GroupBy(x => x.Val1);
Program p = new Program();
foreach(var group in grouped)
p.Func(group);
}
public void Func(IGrouping<int, Item> list)
{
list.OrderBy(x => x.Val2); //list will be ordered, but not saved
list = (IGrouping<int, Item>)list.OrderBy(x => x.Val2); //exception
}
}
public class Item
{
public int Val1 { get; set; }
public double Val2 { get; set; }
}
It's simplified code of what I'm trying to do - I need to order list inside Func, but I have no idea how. First line works in theory, but since it's not a void it's not working in practice - list is not actually ordered.
Second line should work, actually Visual Studio suggested that, but it throws runtime exception - Unable to cast object of type System.Linq.OrderedEnumerable to System.Linq.IGrouping.
I'm out of ideas for the time being, but there is no way of bypassing it - I absolutely need to order it there.
Edit
My current solution is to use Select(x => x) to flatten the IGrouping to normal List, this way I can easily order it and edit values without losing reference to grouped. If you really want to keep IGrouping then you are out of luck, does not seem to be possible.
Try this.
var grouped = list.GroupBy(x => x.Val1).Select(g => g.OrderBy(x => x.Val2).ToList());
OrderBy returns an IOrderedEnumerable; you can't cast that to IGrouping.
Use the First method at the end to get an IGrouping of the ordered items.
public void Func(IGrouping<int, Item> list)
{
list = list.OrderBy(x => x.Val2).GroupBy(x => x.Val1).First();
}
Your example code doesn't show what you are trying to arrive at.
list.OrderBy(x => x.Val2); //list will be ordered, but not saved
OrderBy doesn't order the existing collection in-place. It effectively returns a new collection.
list = (IGrouping<int, Item>)list.OrderBy(x => x.Val2); //exception
OrderBy returns an IOrderedEnumerable<TElement>. Both IOrderedEnumerable<TElement> and IGrouping<TKey,TElement> derive from IEnumerable<TElement> but you can't cast an IOrderedEnumerable to an IGrouping.
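Both points can be checked with a small sketch. The class and method names below are invented for the demonstration and plain numbers stand in for Item:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByDemo
{
    // Calls OrderBy once without using the result and once keeping it,
    // then returns both lists so the caller can compare them.
    public static (List<double> Source, List<double> Sorted) Run()
    {
        var source = new List<double> { 2.1, 1.5, 0.3 };

        source.OrderBy(x => x);                        // result discarded: source is untouched
        var sorted = source.OrderBy(x => x).ToList();  // keep the new, ordered sequence

        return (source, sorted);
    }

    // Reports whether the object OrderBy returns for a group is itself an IGrouping.
    public static bool OrderedGroupIsGrouping()
    {
        var group = new[] { 3, 1, 2 }.GroupBy(n => n % 1).First(); // single group, key 0
        object ordered = group.OrderBy(n => n);
        return ordered is IGrouping<int, int>;
    }
}
```

Run() leaves Source in its original order, and OrderedGroupIsGrouping() returns false, which is why the cast in the question fails at runtime.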
If all you want is to write out the values, then Func could be:
public IEnumerable<Item> Func(IGrouping<int, Item> list)
{
return list.OrderBy(x => x.Val2);
}
and the foreach loop could be:
foreach(var group in grouped)
{
var orderedList = p.Func(group);
Console.WriteLine($"group: {group.Key}");
foreach (var value in orderedList)
{
Console.WriteLine($" {value.Val2}");
}
}
Hopefully this helps.
I have a List<Map> and I want to update the Map.Target property based on a matching value from another List<Map>.
Basically, the logic is:
If mapsList1.Name is equal to mapsList2.Name
Then mapsList1.Target = mapsList2.Name
The structure of the Map class looks like this:
public class Map {
public Guid Id { get; set; }
public string Name { get; set; }
public string Target { get; set; }
}
I tried the following but obviously it's not working:
List<Map> mapsList1 = new List<Map>();
List<Map> mapsList2 = new List<Map>();
// populate the 2 lists here
mapsList1.Where(m1 => mapsList2.Where(m2 => m1.Name == m2.Name) ) // don't know what to do next
The count of items in list 1 will be always greater than or equal to the count of items in list 2. No duplicates in both lists.
Assuming there are a small number of items in the lists and only one item in list 1 that matches:
list2.ForEach(l2m => list1.First(l1m => l1m.Name == l2m.Name).Target = l2m.Target);
If there is more than one item in list1 that must be updated, enumerate the entire list1, doing a FirstOrDefault on list2 for each item.
list1.ForEach(l1m => l1m.Target = list2.FirstOrDefault(l2m => l1m.Name == l2m.Name)?.Target ?? l1m.Target);
If there are a large number of items in list2, turn it into a dictionary
var d = list2.ToDictionary(m => m.Name);
list1.ForEach(m => m.Target = d.ContainsKey(m.Name) ? d[m.Name].Target : m.Target);
(Presumably list2 doesn't contain any repeated names)
If list1's names are unique and everything in list2 is in list1, you could even turn list1 into a dictionary and enumerate list2:
var d=list1.ToDictionary(m => m.Name);
list2.ForEach(m => d[m.Name].Target = m.Target);
If list2 has entries that are not in list1, or list1 has duplicate names, you could use a Lookup instead. You'd just have to avoid the "Collection was modified; enumeration operation may not execute" exception you'd get if you tried to modify the collection it returns for a name while enumerating it.
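A minimal sketch of the Lookup idea, assuming the Map class from the question (the property initializers, MapUpdater and CopyTargets are additions for the example). It only changes properties of the matched items and never adds to or removes from the sequence the lookup returns, so the enumeration stays valid:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Map
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public string Target { get; set; } = "";
}

public static class MapUpdater
{
    // Copies Target from list2 onto every list1 item with the same Name.
    // list2 names with no match in list1 are skipped;
    // duplicate names in list1 are all updated.
    public static void CopyTargets(List<Map> list1, List<Map> list2)
    {
        // ToLookup allows repeated keys and yields an empty sequence for a missing key.
        var byName = list1.ToLookup(m => m.Name);
        foreach (var source in list2)
        {
            foreach (var match in byName[source.Name])
                match.Target = source.Target;
        }
    }
}
```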
mapsList1.Where(m1 => mapsList2.Where(m2 => m1.Name == m2.Name) ) // don't know what to do next
LINQ Where doesn't really work like that / that's not a statement in itself. The m1 is the entry from list1, and the inner Where would produce an enumerable of list 2 items, but it doesn't result in the Boolean the outer Where is expecting, nor can you do anything to either of the sequences because LINQ operations are not supposed to have side effects. The only thing you can do with a Where is capture or use the sequence it returns in some other operation (like enumerating it), so Where isn't really something you'd use for this operation unless you use it to find all the objects you need to alter. It's probably worth pointing out that ForEach is a list thing, not a LINQ thing, and is basically just another way of writing foreach(var item in someList)
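One way to keep the query free of side effects, sketched under the assumption that names are unique in both lists as the question states: let LINQ find the matching pairs, then do the assignment in an ordinary foreach. MapJoinUpdater and ApplyTargets are illustrative names, and Map repeats the question's class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Map
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public string Target { get; set; } = "";
}

public static class MapJoinUpdater
{
    // Returns how many items in mapsList1 received a new Target.
    public static int ApplyTargets(List<Map> mapsList1, List<Map> mapsList2)
    {
        // The query only pairs items up; it changes nothing by itself.
        var pairs = mapsList1.Join(
            mapsList2,
            m1 => m1.Name,
            m2 => m2.Name,
            (m1, m2) => (Item: m1, NewTarget: m2.Target));

        int updated = 0;
        foreach (var (item, newTarget) in pairs)
        {
            item.Target = newTarget;   // the side effect lives in the loop
            updated++;
        }
        return updated;
    }
}
```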
If the collections are big enough, a better approach would be to create a dictionary to look up the targets:
List<Map> mapsList1 = new List<Map>();
List<Map> mapsList2 = new List<Map>();
var dict = mapsList2
.GroupBy(map => map.Name)
.ToDictionary(maps => maps.Key, maps => maps.First().Target);
foreach (var map in mapsList1)
{
if (dict.TryGetValue(map.Name, out var target))
{
map.Target = target;
}
}
Note that this keeps only the first Target for any name that is duplicated in mapsList2.
If possible please help me convert these nested loops into a LINQ statement
Thank you very much for your help!
public static List<Term> GetTermsByName(this IEnumerable<Term> terms, Dictionary<string, string> termInfo)
{
List<Term> termList = new List<Term>();
foreach (Term term in terms)
{
foreach (var value in termInfo.Values)
{
if (term.Name == value)
{
termList.Add(term);
}
}
}
return termList;
}
Maybe Contains method is what you are after
Determines whether a sequence contains a specified element.
The following can be read as, Filter all Terms where the Term.Name exists in the dictionary Values
var values = termInfo.Values;
var result = terms.Where(term => values.Contains(term.Name))
    .ToList();
// or
var result = terms.Where(term => termInfo.Values.Contains(term.Name))
    .ToList();
You're losing the plot of the dictionary a bit here, don't you think? The speediness is in using the keys. However, you can still do better than a nested foreach or an inline linq equivalent with where and contains. Use a join to at least improve your efficiency.
var termList = (from term in terms
join value in termInfo.Values
on term.Name equals value
select term)
.Distinct() // if there are duplicates in either set
.ToList();
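Another option, sketched here with a minimal hypothetical Term class since the real one isn't shown: copy the dictionary's values into a HashSet once, so each membership test is a hash lookup instead of a scan over Values.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Term; the original class is not shown in the question.
public class Term
{
    public string Name { get; set; } = "";
}

public static class TermExtensions
{
    public static List<Term> GetTermsByName(this IEnumerable<Term> terms, Dictionary<string, string> termInfo)
    {
        // Build the set once; repeated values in the dictionary collapse into one entry.
        var names = new HashSet<string>(termInfo.Values);
        return terms.Where(term => names.Contains(term.Name)).ToList();
    }
}
```

Unlike the nested loops, this returns each matching term once even when the same value appears several times in termInfo.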
Is it possible to sort an in-memory list by another list (the second list would be a reference data source or something like that)?
public class DataItem
{
public string Name { get; set; }
public string Path { get; set; }
}
// a list of Data Items, randomly sorted
List<DataItem> dataItems = GetDataItems();
// the sort order data source with the paths in the correct order
IEnumerable<string> sortOrder = new List<string> {
"A",
"A.A1",
"A.A2",
"A.B1"
};
// is there a way to tell linq to sort the in-memory list of objects
// by the sortOrder "data source"
dataItems = dataItems.OrderBy(p => p.Path == sortOrder).ToList();
First, let's assign an index to each item in sortOrder:
var sortOrderWithIndices = sortOrder.Select((x, i) => new { path = x, index = i });
Next, we join the two lists and sort:
var dataItemsOrdered =
from d in dataItems
join x in sortOrderWithIndices on d.Path equals x.path //pull index by path
orderby x.index //order by index
select d;
This is how you'd do it in SQL as well.
Here is an alternative (and I argue more efficient) approach to the one accepted as answer.
List<DataItem> dataItems = GetDataItems();
// Smaller value = earlier position, so "A" sorts first.
IDictionary<string, int> sortOrder = new Dictionary<string, int>()
{
    {"A", 0},
    {"A.A1", 1},
    {"A.A2", 2},
    {"A.B1", 3},
};
dataItems.Sort((di1, di2) => sortOrder[di1.Path].CompareTo(sortOrder[di2.Path]));
Let's say Sort() and OrderBy() both take O(n*logn), where n is number of items in dataItems. The solution given here takes O(n*logn) to perform the sort. We assume the step required to create the dictionary sortOrder has a cost not significantly different from creating the IEnumerable in the original post.
Doing a join and then sorting the collection adds the cost of the join on top. LINQ to Objects implements Join with a hash lookup built over the inner sequence, so that step is roughly O(n + m), where m is the number of elements in sortOrder, and the total comes to O(n + m + n*logn).
In theory that is still O(n*logn) whenever m is no larger than n. But in practice, the join costs extra cycles. This is in addition to the extra space used by the auxiliary collections the LINQ query creates while it runs.
If your list of paths is large, you would be better off performing your lookups against a dictionary:
var sortValues = sortOrder.Select((p, i) => new { Path = p, Value = i })
.ToDictionary(x => x.Path, x => x.Value);
dataItems = dataItems.OrderBy(di => sortValues[di.Path]).ToList();
Custom ordering is done by using a custom comparer (an implementation of the IComparer interface) that is passed as the second argument to the OrderBy method.
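As a rough sketch of what such a comparer could look like for the question's data: PathOrderComparer is a made-up name, DataItem repeats the question's class, and sorting unknown paths last is an assumption of this example rather than anything the question asks for.

```csharp
using System.Collections.Generic;
using System.Linq;

public class DataItem
{
    public string Name { get; set; } = "";
    public string Path { get; set; } = "";
}

// Hypothetical comparer: ranks paths by their position in a reference list.
// Assumes the reference list contains no repeated paths (ToDictionary would throw).
public class PathOrderComparer : IComparer<string>
{
    private readonly Dictionary<string, int> _rank;

    public PathOrderComparer(IEnumerable<string> sortOrder)
    {
        _rank = sortOrder.Select((path, index) => (path, index))
                         .ToDictionary(x => x.path, x => x.index);
    }

    // Assumption: paths missing from the reference list sort last.
    private int RankOf(string path) =>
        path != null && _rank.TryGetValue(path, out var rank) ? rank : int.MaxValue;

    public int Compare(string x, string y) => RankOf(x).CompareTo(RankOf(y));
}
```

It would be used as dataItems.OrderBy(d => d.Path, new PathOrderComparer(sortOrder)).ToList().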
I have a List of this class:
class Stop
{
public int ID { get; set; }
public string Name { get; set; }
}
and I want to search through all the stop names in the List matching all the keywords of search list and returning the matched subset.
List<string> searchWords = new List<string> { "words1", "word2", "words3", ... };
Here is my try but I am not really sure I am on the right track
var l = Stops.Select((stop, index) => new { stop, index })
.Where(x => SearchWords.All(sw => x.stop.Name.Contains(sw)));
Here is an example that might make it clearer, Say I have stop with a name "Dundas at Richmond NB" and the user types in "dun", "rich" this should match and return the correct stop.
var l = Stops.Where(s => searchWords.Contains(s.Name)).ToList();
It will return a List<Stop> containing only those stops whose Name has a corresponding string in the searchWords collection.
To make it perform better you should consider changing searchWords to HashSet<string> first. Contains method is O(1) on HashSet<T> and O(n) on List<T>.
var searchWordsSet = new HashSet<string>(searchWords);
var l = Stops.Where(s => searchWordsSet.Contains(s.Name)).ToList();
UPDATE
Because of the OP's update, here is a version that requires every item in searchWords to occur in Stop.Name before that particular Stop instance is returned:
var l = Stops.Where(s => searchWords.All(w => s.Name.Contains(w))).ToList();
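Note that Contains is case-sensitive, so "dun" and "rich" would not match "Dundas at Richmond NB" from the question's example as written. A minimal sketch of a case-insensitive variant follows; StopSearch and MatchAll are illustrative names, and the choice of OrdinalIgnoreCase is an assumption about the comparison wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Stop
{
    public int ID { get; set; }
    public string Name { get; set; } = "";
}

public static class StopSearch
{
    // Returns the stops whose Name contains every search word, ignoring case.
    // With an empty word list every stop matches, because All is true for an empty sequence.
    public static List<Stop> MatchAll(IEnumerable<Stop> stops, IEnumerable<string> searchWords)
    {
        var words = searchWords.ToList();
        return stops
            .Where(s => words.All(w => s.Name.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```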