Speed of LINQ query grouping and Intersect in particular

Say three lists exist, each with over 500,000 records, and we need to perform a set of operations (a subset is shown below):
1) Check for repeating IDs in lists one and two, and retrieve the distinct IDs while summing up "ValueA" for duplicate IDs; put the results in a list. Let's call this list list12.
2) Compare all the values with matching IDs between list3 and list12, and print the results, say, to the console.
3) Ensure optimal performance.
This is what I have so far:
var list1 = new List<abc>()
{
    new abc() { Id = 0, ValueA = 50 },
    new abc() { Id = 1, ValueA = 40 },
    new abc() { Id = 1, ValueA = 70 }
};
var list2 = new List<abc>()
{
    new abc() { Id = 0, ValueA = 40 },
    new abc() { Id = 1, ValueA = 60 },
    new abc() { Id = 3, ValueA = 20 },
};
var list3 = new List<abc>()
{
    new abc() { Id = 0, ValueA = 50 },
    new abc() { Id = 1, ValueA = 40 },
    new abc() { Id = 4, ValueA = 70 },
};
1) With the help of the solution from here [link][1], I was able to resolve part 1.
// combine lists one and two, then sum ValueA per distinct Id
var list12 = list1.Concat(list2)
    .GroupBy(i => i.Id)
    .Select(g => new
    {
        Id = g.Key,
        NewValueA = g.Sum(j => j.ValueA),
    });
2) I can't seem to get the complete result set for this part. I can get the matching account numbers (maybe someone knows a faster way than HashSets), but I also need the ValueA from each list along with the matching account numbers.
foreach (var values in list3.ToHashSet().Select(i => i.Id).Intersect(list12.ToHashSet().Select(j => j.Id)))
{
    Console.WriteLine(values); // prints matching account number
    // ?? how do I get ValueA from both lists here in the quickest way possible
}
3) My only attempt at improving performance, based on what I read online, is to use HashSets as in the attempt above, but I may be doing this incorrectly and someone may have a better solution.

I don't think that any conversion to HashSet, however efficient, will increase performance. The reason is that the lists must be enumerated to create the HashSets, and then the HashSets must be enumerated to get to the results.
If you put everything in one LINQ statement, the number of enumerations will be minimized. And by calculating the sums at the end, the number of calculations is reduced to the absolute minimum:
var result = list1.Concat(list2)
    // keep list1/list2 items whose Id occurs in list3 (once per matching list3 item)
    .Join(list3, x => x.Id, l3 => l3.Id, (l12, l3) => l12)
    .GroupBy(x => x.Id)
    .Select(g => new
    {
        Id = g.Key,
        NewValueA = g.Sum(j => j.ValueA),
    })
    .ToList();
With your data this shows:
Id  NewValueA
0   90
1   170
I don't know if I understood all requirements well, but this should give you the general idea.

If you want to get access to both elements you probably want a join. A join is a very general construct that can be used to construct all other set operations.
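A minimal sketch of that join for part 2, under stated assumptions: the abc class isn't shown, so here it is modelled as an (Id, ValueA) value tuple. list12 is built from both list1 and list2 as in part 1, and Join then pairs each summed Id with the list3 entry that has the same Id, so both ValueA figures are available in a single result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: abc is modelled as a value tuple (Id, ValueA) for this sketch.
var list1 = new List<(int Id, int ValueA)> { (0, 50), (1, 40), (1, 70) };
var list2 = new List<(int Id, int ValueA)> { (0, 40), (1, 60), (3, 20) };
var list3 = new List<(int Id, int ValueA)> { (0, 50), (1, 40), (4, 70) };

// Part 1: combine lists one and two, then sum ValueA per distinct Id.
var list12 = list1.Concat(list2)
    .GroupBy(x => x.Id)
    .Select(g => (Id: g.Key, NewValueA: g.Sum(x => x.ValueA)))
    .ToList();

// Part 2: join on Id; the result selector sees both matched elements,
// so the summed value and the list3 value end up in one result.
var matches = list12
    .Join(list3,
          l12 => l12.Id,
          l3 => l3.Id,
          (l12, l3) => (l12.Id, SumA: l12.NewValueA, List3A: l3.ValueA))
    .ToList();

foreach (var m in matches)
{
    Console.WriteLine($"Id {m.Id}: list12 = {m.SumA}, list3 = {m.List3A}");
}
```

Because Join builds a hash lookup over the inner sequence (list3) once, this stays roughly linear in the combined list sizes rather than comparing every pair.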

Related

Understanding the GroupJoin and Join in Linq chaining syntax (Homework)

I need help understanding the fourth argument of GroupJoin. From what I understand so far, GroupJoin takes four arguments: (1, 2) the first is the secondary list, and the second is a Func that returns the key from the first object type, in other words from the first list, in this case (people). (3, 4) A Func that returns the key from the second object type, from the second list, in this case (items), and one that stores the grouped object with the group itself (I can't understand the code for this part). Considering this, and given the code below:
var products = new Product[]
{
    new Product { Id = 1, Type = "Phone", Model = "OnePlus", Price = 1000 },
    new Product { Id = 2, Type = "Phone", Model = "Apple", Price = 2000 },
    new Product { Id = 3, Type = "Phone", Model = "Samsung", Price = 1500 },
    new Product { Id = 4, Type = "TV", Model = "Samsung 32", Price = 200 },
};
var people = new Person[]
{
    new Person { Id = 1, Name = "Ivan Ivanov", Money = 150000 },
    new Person { Id = 2, Name = "Dragan Draganov", Money = 250000 },
    new Person { Id = 3, Name = "Ivelin Ivelinov", Money = 350000 }
};
var items = new Item[]
{
    new Item { PersonId = 1, ProductId = 1, Amount = 1 },
    new Item { PersonId = 1, ProductId = 4, Amount = 1 },
    new Item { PersonId = 1, ProductId = 5, Amount = 1 },
    new Item { PersonId = 1, ProductId = 7, Amount = 1 },
    new Item { PersonId = 2, ProductId = 2, Amount = 1 },
};
Query:
var productOwnerList = people
    .GroupJoin(
        items,
        o => o.Id,
        i => i.PersonId,
        (o, i) => new // (**)
        {
            Person = o,
            Products = i
                .Join(products,
                    o1 => o1.ProductId,
                    i2 => i2.Id,
                    (o1, i2) => i2) // (*)
                .ToArray()
        })
    .ToArray();
Just to mention, I've posted only a few lines of the data. I need help understanding what the 4th argument of the Join method is doing here -> (*) (stores the grouped object with the group itself)? When I look at the result, I see that it associates every Person Id with the product keys and joins the two lists based on the Items list (one to many). But I cannot see what exactly this line means: (o1, i2) => i2). It's obvious what it is doing (it puts all the items associated with the person Id into an array (items[]) for every person), but what is happening "under the hood" here? Also, one question about the (**) line: it's creating a new object; is this an anonymous class, and if not, what is it?

The fourth argument, which maps to the fifth parameter in the documentation (because the first parameter is the target of the extension method call), is just the result selector. It's a function accepting two parameters: the first is an element of the "outer" sequence (the people array in your case), and the second is a sequence of elements from the "inner" sequence (the items array in your case) that have the same key as the outer element. The function should return a "result" element, and the overall result of the method call is a sequence of those results.
The function is called once for each of the "outer" elements, so you'd have:
First call: person ID 1, and the items with product IDs 1, 4, 5, 7
Second call: person ID 2, and the item with product ID 2
Third call: person ID 3, and an empty sequence of items
Your query is complex because you're using an anonymous type for your result, and constructing an instance of the anonymous type using another query. Here's a simpler query that might help to clarify:
var productOwnerList = people
    .GroupJoin(
        items,
        o => o.Id,
        i => i.PersonId,
        (person, personItems) => $"{person.Id}: {string.Join(",", personItems.Select(item => item.ProductId))}")
    .ToArray();
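To make the "under the hood" part more concrete, here is a simplified model of GroupJoin, not the actual .NET source: SimpleGroupJoin is a hypothetical helper that indexes the inner sequence with ToLookup and then calls the result selector once per outer element. Person and Item are reduced to value tuples (an assumption), and the model's output is compared with the real GroupJoin using the same result selector as the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: Person and Item are reduced to value tuples for this sketch.
var people = new[] { (Id: 1, Name: "Ivan Ivanov"), (Id: 2, Name: "Dragan Draganov"), (Id: 3, Name: "Ivelin Ivelinov") };
var items = new[]
{
    (PersonId: 1, ProductId: 1), (PersonId: 1, ProductId: 4), (PersonId: 1, ProductId: 5),
    (PersonId: 1, ProductId: 7), (PersonId: 2, ProductId: 2)
};

// Hypothetical helper: a simplified model of GroupJoin, not the BCL implementation.
static IEnumerable<TResult> SimpleGroupJoin<TOuter, TInner, TKey, TResult>(
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)
{
    // Index the inner sequence by key once (a hash-based lookup).
    var lookup = inner.ToLookup(innerKeySelector);
    foreach (var o in outer)
    {
        // The result selector runs once per outer element; a key with no
        // inner matches yields an empty sequence instead of skipping the element.
        yield return resultSelector(o, lookup[outerKeySelector(o)]);
    }
}

var viaModel = SimpleGroupJoin(people, items, p => p.Id, i => i.PersonId,
    (person, personItems) => $"{person.Id}: {string.Join(",", personItems.Select(i => i.ProductId))}").ToArray();
var viaLinq = people.GroupJoin(items, p => p.Id, i => i.PersonId,
    (person, personItems) => $"{person.Id}: {string.Join(",", personItems.Select(i => i.ProductId))}").ToArray();

foreach (var line in viaModel)
    Console.WriteLine(line);
```

Person 3 still produces a result ("3: ") because the lookup returns an empty group for a missing key; that is the outer-join-like behavior that distinguishes GroupJoin from Join.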

Group and separate list

I have the entity structure below:
public class Item
{
    public EnumType Type { get; set; }
    public int Price { get; set; }
}
public enum EnumType
{
    A = 1,
    B = 2,
    C = 3
}
I have a list of items as follows:
var items = new List<Item>
{
    new Item { Price = 5, Type = EnumType.B },
    new Item { Price = 5, Type = EnumType.B },
    new Item { Price = 5, Type = EnumType.B },
    new Item { Price = 10, Type = EnumType.B },
    new Item { Price = 10, Type = EnumType.B },
    new Item { Price = 10, Type = EnumType.B },
    new Item { Price = 15, Type = EnumType.C },
    new Item { Price = 15, Type = EnumType.C },
    new Item { Price = 15, Type = EnumType.C },
    new Item { Price = 15, Type = EnumType.C },
    new Item { Price = 15, Type = EnumType.C }
};
If the price and type are the same, then, based on the type, I need to exclude every nth item from the list and then calculate the sum,
i.e., n = 3 for type B and n = 4 for type C.
This means that in the sample data above, since there are 3 items in each type B group once grouped by price and type, every 3rd item needs to be excluded when calculating the sum.
So the sum for type B will be 5+5+10+10, and the sum for type C will be 15+15+15+15.
I tried using the modulo operator, but it seems that's not the right direction.
This is what I have tried so far:
static int GetFactorByType(EnumType t)
{
    switch (t)
    {
        case EnumType.A:
            return 2;
        case EnumType.B:
            return 3;
        case EnumType.C:
            return 4;
        default:
            return 2;
    }
}
var grp = items.GroupBy(g => new { g.Type, g.Price }).Select(s => new
{
    type = s.Key.Type,
    price = s.Key.Price,
    count = s.Count()
}).Where(d => d.count % GetFactorByType(d.type) == 0).ToList();

Here's one solution:
// track the type: nth element to discard
var dict = new Dictionary<EnumType, int?>();
dict[EnumType.B] = 3;
dict[EnumType.C] = 4;
// GroupBy turns our list of items into two collections, depending on whether their type is B or C
var x = items.GroupBy(g => new { g.Type })
    .Select(g => new // now project a new collection
    {
        g.Key.Type, // that has the type
        SumPriceWithoutNthElement = // and a sum
            // the sum is calculated by reducing the list based on index position: in Where((v, i) => ...), i is the index of the item.
            // We drop every Nth one, N being determined by a dictionary lookup, or 2 if the type has no entry
            // (GetValueOrDefault, .NET Core 2.0+, returns null for a missing key instead of throwing)
            // we only want list items where (index % N != N - 1) is true
            g.Where((v, i) => (i % (dict.GetValueOrDefault(g.Key.Type) ?? 2)) != ((dict.GetValueOrDefault(g.Key.Type) ?? 2) - 1))
                .Sum(r => r.Price) // sum the price for the remaining
    })
    .ToList(); // ToList may not be necessary; I just wanted to look at it
It seemed to me that your question's wording and your example are not aligned. You said (and did in code):
If the price and type are same, based on type it need to exclude every nth item from the list and then calculate the sum. i.e type B = 3, Type C = 4
Which to me means you should group by Type and Price, so B/5 is one list, and B/10 is another list. But you then said:
Which means in above sample data, since there are 3 items each in type B once it group by price and type it need to exclude every 3rd item when calculate sum. So sum for type B will be 5+5+10+10
I couldn't quite understand this. To me there are 3 items in B/5, so B/5 should sum to 10 (B/5 + B/5 + excluded). There are 3 items in B/10, so again it should be (B/10 + B/10 + excluded), for a total of 20.
The code above does not group by price. It outputs a collection of 2 items, Type=B,SumWithout=30 and Type=C,SumWithout=60. This one groups by price too; it outputs a 3-item collection:
var x = items.GroupBy(g => new { g.Type, g.Price })
    .Select(g => new
    {
        g.Key.Type,
        g.Key.Price,
        SumPriceWithoutNthElement =
            g.Where((v, i) => (i % (dict.GetValueOrDefault(g.Key.Type) ?? 2)) != ((dict.GetValueOrDefault(g.Key.Type) ?? 2) - 1))
                .Sum(r => r.Price)
    })
    .ToList();
The items are Type=B,Price=5,SumWithout=10 and Type=B,Price=10,SumWithout=20 and Type=C,Price=15,SumWithout=60
Maybe you mean: group by type and price, remove every 3rd item (every 4th from C, etc.), then group again by type only and sum.
This means that if your type B prices were
1,1,1,1,2,2,2,2,2
    ^       ^
we would remove one 1 and one 2 (the ones with carets under them: the third 1 and the third 2), then sum for a total of 11. This is different from removing every 3rd item across all of type B, which removes one 1 and two 2s for a total of 9:
1,1,1,1,2,2,2,2,2
    ^     ^     ^
If the first is what you mean, you could group the output of my second example by Type again and sum its SumPriceWithoutNthElement values.
I did consider that there might be more efficient ways to do this without LINQ, and the code would almost certainly be easier to understand if it were non-LINQ. LINQ isn't a magic bullet that can kill all problems, and even though it might look like a hammer with which every problem can be beaten, sometimes it's good to avoid it.
Depending on how you intended the problem to be solved (is price part of the group key or not), building a dictionary and accumulating 0 instead of the price for every Nth element might be one way. The other way, if price is part of the key, could be to sum all the prices and then subtract (count/N)*price from the total, where count/N uses integer division.
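A minimal non-LINQ sketch of the second idea, assuming price is part of the key: one pass accumulates the count and total per (type, price) in a Dictionary, and each group's sum is then total - (count / N) * price with integer division. Items are modelled as (Type, Price) tuples with strings standing in for EnumType, and Factor mirrors GetFactorByType from the question; these are assumptions of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Assumption: items are (Type, Price) tuples; Type is a string standing in for EnumType.
var items = new List<(string Type, int Price)>
{
    ("B", 5), ("B", 5), ("B", 5),
    ("B", 10), ("B", 10), ("B", 10),
    ("C", 15), ("C", 15), ("C", 15), ("C", 15), ("C", 15)
};

// Mirrors GetFactorByType from the question: B = 3, C = 4, anything else (including A) = 2.
static int Factor(string type) => type switch { "B" => 3, "C" => 4, _ => 2 };

// One pass: accumulate count and total per (Type, Price) key.
var totals = new Dictionary<(string Type, int Price), (int Count, int Sum)>();
foreach (var item in items)
{
    totals.TryGetValue(item, out var acc); // acc is (0, 0) for a new key
    totals[item] = (acc.Count + 1, acc.Sum + item.Price);
}

// Every Nth item is dropped, so count / N items (integer division) are removed per group.
var result = new Dictionary<(string Type, int Price), int>();
foreach (var (key, acc) in totals)
{
    result[key] = acc.Sum - (acc.Count / Factor(key.Type)) * key.Price;
}

foreach (var (key, sum) in result)
    Console.WriteLine($"{key.Type}/{key.Price}: {sum}");
```

For the sample data this yields 10 for B/5, 20 for B/10 and 60 for C/15, matching the grouped LINQ output described above, without enumerating each group by index.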

Grouping by a new object, which is always unique, guarantees you that you'll have as many groups as you have items. Try something like this:
var grp = items.GroupBy(g => $"{g.Type}/{g.Price}").Select(s => new
{
    type = s.First().Type,
    price = s.First().Price,
    count = s.Count()
}).Where(d => d.count % GetFactorByType(d.type) == 0).ToList();
This way, you group by a string composed from the type/price combination, so if the items are equivalent, the strings will be equal.
The $"{g.Type}/{g.Price}" string evaluates to "B/5" for your first three items, so it's quite readable as well.
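A small sketch to check the grouping against the question's sample data, with items modelled as (Type, Price) tuples and strings standing in for EnumType (assumptions). It groups once by the string key and, for comparison, once by the anonymous-type key from the question. C# anonymous types get compiler-generated Equals and GetHashCode that compare all property values, so both keys produce the same three groups here.

```csharp
using System;
using System.Linq;

// Assumption: (Type, Price) tuples with string types stand in for Item/EnumType.
var items = new[]
{
    (Type: "B", Price: 5), (Type: "B", Price: 5), (Type: "B", Price: 5),
    (Type: "B", Price: 10), (Type: "B", Price: 10), (Type: "B", Price: 10),
    (Type: "C", Price: 15), (Type: "C", Price: 15), (Type: "C", Price: 15),
    (Type: "C", Price: 15), (Type: "C", Price: 15)
};

// String key, as in the answer.
var byString = items.GroupBy(g => $"{g.Type}/{g.Price}")
    .Select(s => new { type = s.First().Type, price = s.First().Price, count = s.Count() })
    .ToList();

// Anonymous-type key, as in the question, for comparison.
var byAnonymous = items.GroupBy(g => new { g.Type, g.Price }).ToList();

Console.WriteLine($"{byString.Count} string-key groups, {byAnonymous.Count} anonymous-key groups");
```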

Intersect two object lists on a common property and then compare a different property

I have two lists:
List<objA> List1
List<objA> List2
I want to compare these two lists on the ID field; once a match is found, I want to compare another field, Distance, between the two lists and grab the object with the lower value.
Using LINQ isn't giving the result I want, at least for the first part of the problem.
var test = List1.Select(x => x.ID)
.Intersect(List2.Select(y => y.ID));

Here's one way you could do this with LINQ. First, combine the two lists with Union. Then group them by the Id field. Lastly, order each group by Distance and take the first element of each, to get one object per Id with the minimum available distance.
var aList = new[]
{
    new SomeObject() { Id = 1, Distance = 3 },
    new SomeObject() { Id = 2, Distance = 5 }
};
var bList = new[]
{
    new SomeObject() { Id = 1, Distance = 2 },
    new SomeObject() { Id = 2, Distance = 6 }
};
var results = aList
    .Union(bList)
    .GroupBy(a => a.Id, a => a)
    .Select(a => a.OrderBy(b => b.Distance).First());
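If only IDs that appear in both lists should be considered, as the Intersect in the question suggests, a Join can pair the matching objects and keep the one with the smaller Distance. A minimal sketch, with SomeObject modelled as an (Id, Distance) tuple (an assumption) and an extra Id 3 added to aList purely to show that unmatched IDs are dropped:

```csharp
using System;
using System.Linq;

// Assumption: SomeObject is modelled as an (Id, Distance) tuple.
var aList = new[] { (Id: 1, Distance: 3), (Id: 2, Distance: 5), (Id: 3, Distance: 1) };
var bList = new[] { (Id: 1, Distance: 2), (Id: 2, Distance: 6) };

// Join pairs objects with equal Ids (Id 3 has no partner and is dropped),
// then the result selector keeps whichever object has the smaller Distance.
var closer = aList
    .Join(bList, a => a.Id, b => b.Id, (a, b) => a.Distance <= b.Distance ? a : b)
    .ToList();

foreach (var o in closer)
    Console.WriteLine($"Id {o.Id}: Distance {o.Distance}");
```

Ties go to the object from aList because of the <= comparison; that is an arbitrary choice of this sketch.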

LINQ look up list of objs in datable of different objs that share a common property

I have list A, which contains objects of type A, and list B, which contains objects of type B. Both lists share one property, and I want to find every object in list A that has a match in list B and pull it out.
So, for example:
List A is a bunch of people.
List B is a bunch of names.
Both lists have a personId.
Now I want to get all the people whose names are in list B. I was thinking something like this:
class names
{
    public int id { get; set; }
    public string name { get; set; }
}
class people
{
    public int id { get; set; }
    public string name { get; set; }
}
var newList = new List<people>();
foreach (var n in names)
{
    var person = people.FirstOrDefault(p => p.name == n.name);
    if (person != null)
    {
        newList.Add(person);
    }
}
I was wondering whether there is a more efficient way to do this with LINQ, because it won't always be a list; it might be the database I'm calling it from, and I don't want to call the database thousands of times for no reason.
This is probably a bad example, now that I think about it.

This code:
var newList = new List<people>();
foreach (var n in names)
{
    var person = people.FirstOrDefault(p => p.name == n);
    if (person != null)
    {
        newList.Add(person);
    }
}
will produce the same people (assuming no duplicate names on either side; the order may differ) as:
var newList = people.Where(p => names.Contains(p.name)).ToList();
In response to your update: if names is a list of names objects instead of strings, you can do the following:
newList = people.Where(p => names.Select(o => o.name).Contains(p.name)).ToList();

With LINQ, you can do this:
var intersection = ListA.Intersect(ListB);
However, this is a set intersection: if ListA or ListB contains duplicate values, the result won't contain any copies. In other words, if you have the following:
var ListA = new [] { 0, 0, 1, 2, 3 };
var ListB = new [] { 0, 0, 0, 2 };
Then ListA.Intersect(ListB) produces:
{ 0, 2 }
If you're expecting:
{ 0, 0, 2 }
Then you're going to have to maintain a count of the items yourself and yield/decrement as you scan the two lists.
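A sketch of that counting approach: IntersectWithDuplicates is a hypothetical helper (not part of LINQ) that counts the occurrences in the second sequence with a Dictionary and yields each element of the first sequence while its count lasts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: intersection that keeps duplicates (multiset semantics).
static IEnumerable<T> IntersectWithDuplicates<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    // Count how many times each value occurs in the second sequence.
    var counts = new Dictionary<T, int>();
    foreach (var item in second)
        counts[item] = counts.TryGetValue(item, out var n) ? n + 1 : 1;

    // Scan the first sequence, yielding a value only while its count lasts.
    foreach (var item in first)
    {
        if (counts.TryGetValue(item, out var remaining) && remaining > 0)
        {
            counts[item] = remaining - 1;
            yield return item;
        }
    }
}

var ListA = new[] { 0, 0, 1, 2, 3 };
var ListB = new[] { 0, 0, 0, 2 };
var result = IntersectWithDuplicates(ListA, ListB).ToArray();
Console.WriteLine(string.Join(", ", result)); // 0, 0, 2
```

Each value appears min(count in ListA, count in ListB) times, in the order it occurs in ListA.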

Since you're dealing with two different classes, what you're really looking for is a join.
List<Person> people = new List<Person>
{
    new Person { Name = "Mark" },
    new Person { Name = "Alice" },
    new Person { Name = "Jane" }
};
List<string> names = new List<string> { "Mark" };
var query = from p in people
            join n in names on p.Name equals n
            select p; // will output Person Mark
Note: this has a time complexity of O(p + n) (where p = number of people and n = number of names), because Join is implemented as a hash join. Your nested loop above, or a Where/Contains LINQ query over a list, has time complexity O(p * n), since it iterates over the names for every person. This may or may not be an issue depending on the sizes of your collections.
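If you'd rather keep the Where form, putting the names in a HashSet<string> first gives the same O(p + n) behavior, since HashSet.Contains is an average O(1) lookup. A minimal sketch, with Person replaced by an anonymous type that has a Name property (an assumption):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: Person is replaced by an anonymous type with a Name property.
var people = new[] { new { Name = "Mark" }, new { Name = "Alice" }, new { Name = "Jane" } };
var names = new List<string> { "Mark" };

// Build the set once: O(n). Each Contains is then an average O(1) hash lookup,
// so the filter is O(p + n) overall, like the hash join.
var nameSet = new HashSet<string>(names);
var matches = people.Where(p => nameSet.Contains(p.Name)).ToList();

foreach (var p in matches)
    Console.WriteLine(p.Name);
```

One difference from the join: this returns each person at most once, even if names contains duplicates.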

Need help with merging two data collections

I need to retrieve all items from two lists that contain a given value.
Example:
var list1 = new[]
{
    new Dummy() { Name = "Dummy1", Number = 1 },
    new Dummy() { Name = "Dummy2", Number = 2 },
    new Dummy() { Name = "Dummy3", Number = 3 }
};
var list2 = new[]
{
    new Dummy() { Name = "Dummy4", Number = 4 },
    new Dummy() { Name = "Dummy5", Number = 2 },
    new Dummy() { Name = "Dummy6", Number = 6 }
};
var list3 = GetAllDummiesWithNumbersContainedInBothLists();
I want list3 to contain Dummy2 and Dummy5, since both have the same number.
How do I do this? It should be simple, but I can't figure it out...

See if this works for you:
(from dummy1 in list1
join dummy2 in list2 on dummy1.Number equals dummy2.Number
from dummy in new[] { dummy1, dummy2 }
select dummy)
.Distinct()
This pairs matching dummies into the same scope, then flattens out the set so you get all of the matches in one sequence. The Distinct at the end ensures that each dummy appears exactly once even if either list contains repeated numbers.

I'm not entirely sure what your requirements are, but something like this perhaps?
var commonIds = list1.Select(d => d.Number)
.Intersect(list2.Select(d => d.Number));
var commonIdsSet = new HashSet<int>(commonIds);
var list3 = list1.Concat(list2)
.Where(d => commonIdsSet.Contains(d.Number))
.ToList();
If you can clarify the exact requirements (do the results need to be grouped by the Number, is Number unique for an item within a list, etc.), we can provide better solutions.

var list3 = list1.Where(d => list2.Select(d2 => d2.Number).Contains(d.Number))
.Union(list2.Where(d2 => list1.Select(d => d.Number).Contains(d2.Number)));

Here's one more!
var list3 = list1
.SelectMany(x => list2
.SelectMany(y =>
(y.Number == x.Number) ? new [] { x, y } : new Dummy[]{}
)
);
