Use LINQ to GroupBy two columns and build a dictionary with them? - c#

I am trying to use LINQ to build a record that is grouped by its StorePreference and has a list of applications associated with each CompletionStatusFlag. My first thought was to have the record hold a StorePreference and a Dictionary that separates the applications by CompletionStatus.
Employee Application
public class EmployeeApplication
{
public int Id { get; set; }
public Store StorePreference { get; set; }
public CompletionStatusFlag CompletionStatus { get; set; }
}
ApplicationCountRecord
public class ApplicationCountRecord
{
public Store StorePreference { get; set; }
public Dictionary<CompletionStatusFlag, List<EmployeeApplication>> StatusApplicationPairs { get; set; }
= new Dictionary<CompletionStatusFlag, List<EmployeeApplication>>();
public int TotalCount => StatusApplicationPairs.Count();
}
The problem arises when trying to create the dictionary. I have the completion status saved, but how do I get the current applications that match the completion status in order to pass them into the dictionary?
public void ConvertAppsToResults()
{
var applications = _appRepo.Query()
.GroupBy(x => new { x.StorePreference, x.CompletionStatus })
.Select(y => new ApplicationCountRecord()
{
StorePreference = y.Key.StorePreference,
//Somehow create dictionary here.
});
}
Here is another incorrect attempt that won't even compile, but it might help you see my thought process.
public void ConvertAppsToResults()
{
//get apps in DateFilter range, and create records separated by StorePreference and CompletionStatus
var applications = _applicationRepo.Query()
.GroupBy(x => new { x.StorePreference1, x.CompletionStatus })
.Select(y => new ApplicationCountRecord()
{
StorePreference = y.Key.StorePreference,
StatusApplicationPairs = new Dictionary<CompletionStatusFlag, List<EmployeeApplication>>()
.Add(y.Key.CompletionStatus, y.Where(x => x.CompletionStatus == y.Key.CompletionStatus).ToList());
});
}

There's a double grouping going on here, which is possibly the source of your confusion:
employeeApplications
.GroupBy(ea => ea.StorePreference)
.Select(g => new ApplicationCountRecord()
{
StorePreference = g.Key,
StatusApplicationPairs = g.GroupBy(ea => ea.CompletionStatus).ToDictionary(g2 => g2.Key, g2 => g2.ToList())
}
);
Suppose you have 100 EmployeeApplications across 10 stores, with 5 statuses and 2 applications in each status per store: 2 apps * 5 statuses * 10 stores = 100 applications.
GroupBy(StorePreference) takes your list of 100 EmployeeApplications (10 per store) and groups it into 10 IGroupings, each of which is conceptually a list of 10 EmployeeApplications: 10 EAs for Store 1, 10 EAs for Store 2, etc.
Select runs over the 10 groupings, and on each one, g (which, remember, behaves like a list of EmployeeApplications that all have the same store, given in g.Key), it groups again by calling GroupBy(CompletionStatus) to further subdivide the 10 EAs in g by CompletionStatus. The 10 EAs for e.g. "Store 1" are divided into 5 IGroupings (5 statuses) that have 2 EAs in each inner grouping, then the same process is done for Store 2, 3, etc. There are thus 5 g2.Keys, one for each of the 5 statuses, and there are 2 EAs in each g2.
ToDictionary is called on the result of the inner grouping, so you get a dictionary with 5 keys where each key maps to a list of 2 applications. ToDictionary takes two arguments: what to use for the key (g2.Key is a status) and what to use for the value (g2.ToList() realizes a List<EmployeeApplication>).
The initializer = new Dictionary<CompletionStatusFlag, List<EmployeeApplication>>(); is unnecessary in ApplicationCountRecord, as it will be replaced anyway.
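The nested grouping described above can be tried out in memory. The following is only a sketch: the enum values, the use of a string in place of the Store type, and the GroupingSketch.BuildRecords helper are assumptions made for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: the real enum values are not shown in the question.
public enum CompletionStatusFlag { NotStarted, InProgress, Complete }

public class EmployeeApplication
{
    public int Id { get; set; }
    // Assumption: a string stands in for the Store type.
    public string StorePreference { get; set; }
    public CompletionStatusFlag CompletionStatus { get; set; }
}

public class ApplicationCountRecord
{
    public string StorePreference { get; set; }
    public Dictionary<CompletionStatusFlag, List<EmployeeApplication>> StatusApplicationPairs { get; set; }
}

public static class GroupingSketch
{
    // Hypothetical helper: one record per store, one dictionary entry per status.
    public static List<ApplicationCountRecord> BuildRecords(IEnumerable<EmployeeApplication> apps)
    {
        return apps
            .GroupBy(ea => ea.StorePreference)               // outer grouping: by store
            .Select(g => new ApplicationCountRecord
            {
                StorePreference = g.Key,
                StatusApplicationPairs = g
                    .GroupBy(ea => ea.CompletionStatus)      // inner grouping: by status
                    .ToDictionary(g2 => g2.Key, g2 => g2.ToList())
            })
            .ToList();
    }
}
```

Calling GroupingSketch.BuildRecords on a list that contains applications for two stores yields two records, and each record's dictionary only has keys for the statuses that actually occur in that store.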

Related

LINQ join failing between table and list of objects

I need to perform an update on a table with values from a List of objects in C# .NET Core 3.0. I tried to use the Join method, but receive this error:
Processing of the LINQ expression
DbSet<Room>
.Join(
outer: __p_0,
inner: p => p.RoomId,
outerKeySelector: s => s.ruId,
innerKeySelector: (s, p) => new {
kuku = s,
riku = p
})
by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core. See link for more detailed information.
public class Room
{
[DatabaseGenerated(DatabaseGeneratedOption.None)]
[Key]
public int RoomId { get; set; }
[StringLength(50, MinimumLength = 3)]
public string RoomAddress { get; set; }
}
public class roomsForUpdate
{
public int ruId { get; set; }
public string ruName { get; set; }
}
var roomList = new List<roomsForUpdate>() { new roomsForUpdate { ruId = 1, ruName = "aa" }, new roomsForUpdate { ruId = 2, ruName = "bb" } };
var result = _context.Room.Join(roomList, p => p.RoomId, s => s.ruId, (s, p) => new { kuku = s, riku = p }).ToList();
You cannot join an EF Core LINQ query with a local list, because the join can't be translated into SQL. It is better to fetch the database data first and then join in memory.
LINQ is not meant to change the sources, it can only extract data from the sources. If you need to update data, you first fetch the items that must be updated, then you update them. Alternatively you can use plain old SQL to update the data without fetching it first.
In local memory, you have a sequence of roomsForUpdate. Every roomsForUpdate item has an Id (ruId) and a name (ruName).
In your database you have a table with Rooms. Every Room in this table has an Id in RoomId and a RoomAddress.
It seems to me that you want to update all Rooms that have a RoomId that is one of the ruIds in your sequence of roomsForUpdate. In other words: fetch (some properties of) all Rooms that have a value for RoomId that is a ruId in your sequence of roomsForUpdate:
var roomsToUpdate = new List<roomsForUpdate>()
{
new roomsForUpdate { ruId = 1, ruName = "aa" },
new roomsForUpdate { ruId = 2, ruName = "bb" }
};
// Extract the Ids of the rooms that must be fetched
var roomToUpdateIds = roomsToUpdate.Select(room => room.ruId).ToList();
// Fetch all rooms from the database that have a RoomId that is in this sequence
var fetchedRooms = dbContext.Rooms
.Where(room => roomToUpdateIds.Contains(room.RoomId))
.ToList();
Of course you can put everything into one big LINQ statement. This will not improve efficiency, however it will deteriorate readability of your code.
Now to update the Rooms, you'll have to enumerate them one by one, and give the fetched rooms new values. You didn't say which new value you want. I have an inkling that you want to assign RuName to RoomAddress. This means that you have to combine the Room with the new value for the RoomAddress.
This can be done by LINQ:
var roomsWithExpectedNewValues = fetchedRooms.Join(roomsToUpdate,
fetchedRoom => fetchedRoom.RoomId, // from every fetched room take the Id
roomToUpdate => roomToUpdate.ruId, // from every room to update take the ruId
// for every fetchedRoom with its matching room to update, make one new:
(fetchedRoom, roomToUpdate) => new
{
Room = fetchedRoom,
NewValue = roomToUpdate.ruName,
})
.ToList();
To actually perform the update, you'll have to enumerate this sequence:
foreach (var itemToUpdate in roomsWithExpectedNewValues)
{
// assign ruName to RoomAddress
itemToUpdate.Room.RoomAddress = itemToUpdate.NewValue;
}
dbContext.SaveChanges();
A little less LINQ
Although this works, there seems to be a lot of magic going on. The join internally builds a lookup table for fast matching and then throws it away. I think a little less LINQ makes it much easier to understand what's going on.
// your original roomsToUpdate
var roomsToUpdate = new List<roomsForUpdate>()
{
new roomsForUpdate { ruId = 1, ruName = "aa" },
new roomsForUpdate { ruId = 2, ruName = "bb" }
};
var updateDictionary = roomsToUpdate.ToDictionary(
room => room.ruId, // key
room => room.ruName); // value
The Keys of the dictionary are the IDs of the rooms that you want to fetch:
// fetch the rooms that must be updated:
var roomIds = updateDictionary.Keys.ToList();
var fetchedRooms = dbContext.Rooms
.Where(room => roomIds.Contains(room.RoomId))
.ToList();
// Update:
foreach (var fetchedRoom in fetchedRooms)
{
// from the dictionary fetch the ruName:
var ruName = updateDictionary[fetchedRoom.RoomId];
// assign the ruName to RoomAddress
fetchedRoom.RoomAddress = ruName;
// or if you want, do this in one statement:
fetchedRoom.RoomAddress = updateDictionary[fetchedRoom.RoomId];
}
dbContext.SaveChanges();
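The dictionary-based update can also be exercised without a database. This is a minimal sketch under stated assumptions: the RoomUpdate class is a hypothetical stand-in for roomsForUpdate, RoomUpdateSketch.Apply is an invented helper name, and a plain List<Room> plays the role of the rows already fetched from the context. It uses TryGetValue so that a fetched room without a matching entry is simply left unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Room
{
    public int RoomId { get; set; }
    public string RoomAddress { get; set; }
}

// Hypothetical stand-in for the roomsForUpdate class from the question.
public class RoomUpdate
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class RoomUpdateSketch
{
    // Applies the new values to rooms that were already fetched into memory.
    public static void Apply(IEnumerable<Room> fetchedRooms, IEnumerable<RoomUpdate> updates)
    {
        // Key: room id, value: the new address. Assumes ids in updates are unique.
        var newAddressById = updates.ToDictionary(u => u.Id, u => u.Name);

        foreach (var room in fetchedRooms)
        {
            // Rooms without a matching update keep their current address.
            if (newAddressById.TryGetValue(room.RoomId, out var address))
            {
                room.RoomAddress = address;
            }
        }
    }
}
```

With EF, the same loop would run over the tracked entities returned by the Where(...).ToList() query, followed by a single SaveChanges call.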

How to get records in EF that match a list of combinations (key/values)?

I have a database table with records for each user/year combination.
How can I get data from the database using EF and a list of userId/year combinations?
Sample combinations:
UserId Year
1 2015
1 2016
1 2018
12 2016
12 2019
3 2015
91 1999
I only need the records defined in above combinations. Can't wrap my head around how to write this using EF/Linq?
List<UserYearCombination> userYears = GetApprovedYears();
var records = dbcontext.YearResults.Where(?????);
Classes
public class YearResult
{
public int UserId;
public int Year;
public DateTime CreatedOn;
public int StatusId;
public double Production;
public double Area;
public double Fte;
public double Revenue;
public double Diesel;
public double EmissionsCo2;
public double EmissionInTonsN;
public double EmissionInTonsP;
public double EmissionInTonsA;
....
}
public class UserYearCombination
{
public int UserId;
public int Year;
}
This is a notorious problem that I discussed before here. Krishna Muppalla's solution is among the solutions I came up with there. Its disadvantage is that it's not sargable, i.e. it can't benefit from any indexes on the involved database fields.
In the meantime I coined another solution that may be helpful in some circumstances. Basically it groups the input data by one of the fields and then finds and unions database data by grouping key and a Contains query of group elements:
IQueryable<YearResult> items = null;
foreach (var yearUserIds in userYears.GroupBy(t => t.Year, t => t.UserId))
{
var userIds = yearUserIds.ToList();
var grp = dbcontext.YearResults
.Where(x => x.Year == yearUserIds.Key
&& userIds.Contains(x.UserId));
items = items == null ? grp : items.Concat(grp);
}
I use Concat here because Union would waste time making the results distinct. Also, in EF6 Concat generates SQL with chained UNION statements, while Union generates nested UNION statements, and the maximum nesting level may be hit.
This query may perform well enough when indexes are in place. In theory, the maximum number of UNIONs in a SQL statement is unlimited, but the number of items in an IN clause (which Contains translates to) should not exceed a couple of thousand. That means that the content of your data will determine which grouping field performs better, Year or UserId. The challenge is to minimize the number of UNIONs while keeping the number of items in all IN clauses below approx. 5000.
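To see the group-then-Concat pattern work end to end, it can be run against an in-memory IQueryable. The sketch below makes several assumptions: CombinationFilterSketch.Filter is a hypothetical helper, YearResult is cut down to three fields, and the source is a list wrapped with AsQueryable() rather than a real DbSet, so it says nothing about the SQL that EF would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class YearResult
{
    public int UserId;
    public int Year;
    public double Production;
}

public class UserYearCombination
{
    public int UserId;
    public int Year;
}

public static class CombinationFilterSketch
{
    // Hypothetical helper: one Where per year, concatenated into a single query.
    public static List<YearResult> Filter(
        IQueryable<YearResult> source,
        IEnumerable<UserYearCombination> combinations)
    {
        IQueryable<YearResult> items = null;

        foreach (var yearUserIds in combinations.GroupBy(c => c.Year, c => c.UserId))
        {
            // Locals declared inside the loop, so each sub-query captures its own values.
            var year = yearUserIds.Key;
            var userIds = yearUserIds.Distinct().ToList();

            var part = source.Where(x => x.Year == year && userIds.Contains(x.UserId));
            items = items == null ? part : items.Concat(part);
        }

        // No combinations means no results.
        return items == null ? new List<YearResult>() : items.ToList();
    }
}
```

Because every sub-query filters on a different year, the concatenated parts cannot overlap, which is why Concat is enough and no distinct step is needed.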
You can try this:
//add the possible filters to LIST
var searchIds = new List<string> { "1-2015", "1-2016", "2-2018" };
//use the list to check in Where clause
var result = (from x in YearResults
where searchIds.Contains(x.UserId.ToString()+'-'+x.Year.ToString())
select new UserYearCombination
{
UserId = x.UserId,
Year = x.Year
}).ToList();
Method 2
var d = YearResults
.Where(x=>searchIds.Contains(x.UserId.ToString() + '-' + x.Year.ToString()))
.Select(x => new UserYearCombination
{
UserId = x.UserId,
Year = x.Year
}).ToList();

How to use linq to split a list of objects by half and then group them [duplicate]

This question already has answers here:
Split List into Sublists with LINQ
(34 answers)
Closed 3 years ago.
I have a list of n objects (17 for now) and I wanted to know if it is possible to take said list and split it into (potentially) 2 groups. That way the end result would be
NewList
-"GroupA"
-List1 = {"john", "mary", "sam"}
-"GroupB"
-List2 = {"tony", "aaron"}
The desired result would help me output the first half of the list of students in page 1 and then using paging the user can then view the remaining list on the next page.
Right now I am trying to do something like this:
var groupList = Classroom.GroupBy(o => o).Select(grp=>grp.Take((Classroom.Count + 1) / 2)).ToList();
But when I debug it I'm still getting the full list. Can it be done via linq?
You can group by some computed property. For example, if we have 50 students, we can compute a GroupId property and group them by it:
var students = new List<Student>();
for (int i = 0; i < 50; i++)
{
students.Add(new Student { Id = i, Name = $"Student { i }" });
}
var sectionedStudents = students.Select(s => new
{
GroupId = s.Id / 10,
s.Id,
s.Name
});
var groupedStudents = sectionedStudents.GroupBy(s => s.GroupId);
and the Student class:
class Student
{
public int Id { get; set; }
public string Name { get; set; }
}
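The same idea can be applied to the "split in half" case from the question by grouping on the element's position instead of an Id. This is a sketch only; HalfSplitSketch.SplitInHalf is a hypothetical helper name, and giving the first half the extra item when the count is odd is an assumption based on the john/mary/sam example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HalfSplitSketch
{
    // Splits a list into at most two sublists; the first gets the extra item on odd counts.
    public static List<List<T>> SplitInHalf<T>(IReadOnlyList<T> items)
    {
        int firstHalfSize = (items.Count + 1) / 2;

        return items
            .Select((item, index) => new { item, index })      // pair each item with its position
            .GroupBy(x => x.index < firstHalfSize ? 0 : 1)      // 0 = first half, 1 = second half
            .Select(g => g.Select(x => x.item).ToList())
            .ToList();
    }
}
```

For paging, Skip and Take on the original list would work just as well; the grouping form is shown here because the question asked for grouped output.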

Random with condition

I have the following code to extract records from a dbcontext randomly using Guid class:
var CategoryList = {1,5};
var generatedQues = new List<Question>();
//Algorithm 1 :)
if (ColNum > 0)
{
generatedQues = db.Questions
.Where(q => CategoryList.Contains(q.CategoryId))
.OrderBy(q => Guid.NewGuid()).Take(ColNum).ToList();
}
First, I have a list of CategoryId stored in CategoryList as a condition to be fulfilled when getting records from the db. However, I would like to achieve an even distribution among the questions based on the CategoryId.
For example:
If ColNum is 10 and the CategoryIds obtained are {1,5}, I would like to get 5 records with CategoryId = 1 and another 5 records with CategoryId = 5. If ColNum is an odd number like 11, I would also like the distribution to be as even as possible, for example 5 records from CategoryId 1 and 6 records from CategoryId 5.
How do I do this?
This is a two-step process:
1. Determine how many you want for each category.
2. Select that many items from each category in a random order.
For the first part, define a class to represent the category and how many items are required:
public class CategoryLookup
{
public CategoryLookup(int catId)
{
this.CategoryId = catId;
}
public int CategoryId
{
get; private set;
}
public int RequiredAmount
{
get; private set;
}
public void Increment()
{
this.RequiredAmount++;
}
}
And then, given your inputs of the required categories and the total number of items required, work out how many are required for each category
var categoryList = new []{1,5};
var colNum = 7;
var categoryLookup = categoryList.Select(x => new CategoryLookup(x)).ToArray();
for(var i = 0;i<colNum;i++){
categoryLookup[i%categoryList.Length].Increment();
}
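The round-robin loop above gives each category either the integer quotient or one more than that. As a rough illustration, the same counts can be computed directly with division and remainder. DistributionSketch.Distribute is a hypothetical helper, and it assumes the category list is non-empty and contains no duplicate ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistributionSketch
{
    // Returns category id -> number of items to take from that category.
    // Assumes categoryIds is non-empty and has no duplicates.
    public static Dictionary<int, int> Distribute(IReadOnlyList<int> categoryIds, int total)
    {
        int baseAmount = total / categoryIds.Count;   // every category gets at least this many
        int remainder = total % categoryIds.Count;    // the first 'remainder' categories get one extra

        return categoryIds
            .Select((id, index) => new { id, amount = baseAmount + (index < remainder ? 1 : 0) })
            .ToDictionary(x => x.id, x => x.amount);
    }
}
```

For categories {1, 5} and a total of 7 this gives 4 for category 1 and 3 for category 5, the same split the Increment loop produces.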
The second part is really easy: just use SelectMany to get the list of questions. (I've used plain LINQ to Objects to test, but it should work fine for a database query. questions in my code would just be db.Questions in yours.)
var result = categoryLookup.SelectMany(
c => questions.Where(q => q.CategoryId == c.CategoryId)
.OrderBy(x => Guid.NewGuid())
.Take(c.RequiredAmount)
);
Live example: http://rextester.com/RHF33878
You could try something like this:
var CategoryList = new List<int> { 1, 5 };
var generatedQues = new List<Question>();
//Algorithm 1 :)
if (ColNum > 0 && CategoryList.Count > 0)
{
// Calculate how many of each (a simple even split here; distribute any remainder as needed)
var take = ColNum / CategoryList.Count;
// First category
var firstCategoryId = CategoryList[0];
var query = db.Questions
.Where(q => q.CategoryId == firstCategoryId)
.OrderBy(q => Guid.NewGuid()).Take(take);
// For all remaining categories
for (int i = 1; i < CategoryList.Count; i++)
{
// Copy to a local so the deferred query does not capture the loop variable
var categoryId = CategoryList[i];
// Union the questions for that category to query
query = query.Union(
db.Questions
.Where(q => q.CategoryId == categoryId)
.OrderBy(q => Guid.NewGuid()).Take(take));
}
// Randomize again and execute query
generatedQues = query.OrderBy(q => Guid.NewGuid()).ToList();
}
The idea is to just get a random list for each category and add them all together. Then you randomize that again and create your list. I do not know if it will do all this on the database or in memory, but it should be database I think. The resulting SQL will look horrible though.

Faster ways to access subsets through LINQ

This is a question about SPEED - there are a LOT of records to be accessed.
Basic Information About The Problem
As an example, we will have three tables in a Database.
Relations:
Order-ProductInOrder is One-To-Many (an order can have many products in the order)
ProductInOrder-Product is One-To-One (a product in the order is represented by one product)
public class Order {
public bool Processed { get; set; }
// this determines whether the order has been processed
// - orders that have been processed do not go through this again
public int OrderID { get; set; } //PK
public decimal TotalCost{ get; set; }
public List<ProductInOrder> ProductsInOrder;
// from one-to-many relationship with ProductInOrder
// the rest is irrelevant and will not be included here
}
//represents a product in an order - an order can have many products
public class ProductInOrder {
public int PIOD { get; set; } //PK
public int Quantity{ get; set; }
public int OrderID { get; set; }//FK
public Order TheOrder { get; set; }
// from one-to-many relationship with Order
public int ProductID { get; set; } //FK
public Product TheProduct{ get; set; }
//from one-to-one relationship with Product
}
//information about a product goes here
public class Product {
public int ProductID { get; set; } //PK
public decimal UnitPrice { get; set; } //the cost per item
// the rest is irrelevant to this question
}
Suppose we receive a batch of orders to which we need to apply discounts and for which we need to find the total price. This could apply to anywhere from 10,000 to over 100,000 orders. The rule is that if an order contains 5 or more of a product that costs $100 or more each, we give a 10% discount on the total price.
What I Have Tried
I have tried the following:
//this part gets the products in order with 5 or more items
List<Order> discountedOrders = orderRepo
.Where(p => p.Processed == false)
.ToList();
List<ProductInOrder> discountedProducts = discountedOrders
.SelectMany(p => p.ProductsInOrder)
.Where(q => q.Quantity >= 5)
.ToList();
discountedProducts = discountedProducts
.Where(p => p.TheProduct.UnitPrice >= 100.00m)
.ToList();
discountedOrders = discountedOrders
.Where(p => discountedProducts.Any(q => q.OrderID == p.OrderID))
.ToList();
This is very slow and takes forever to run, and when I run integration tests on it, the test seems to time out. I was wondering if there is a faster way to do this.
Try to not call ToList after every query.
When you call ToList on a query, it is executed and the objects are loaded from the database into memory. Any subsequent query based on the results of the first query is performed in memory on the list instead of directly in the database. What you want to do here is execute the whole query on the database and return only those results that satisfy all your conditions.
var discountedOrders = orderRepo
.Where(p=>p.Processed == false);
var discountedProducts = discountedOrders
.SelectMany(p=>p.ProductsInOrder)
.Where(q=>q.Quantity >=5);
discountedProducts = discountedProducts
.Where(p=>p.TheProduct.UnitPrice >= 100.00m);
discountedOrders = discountedOrders
.Where(p=>discountedProducts.Any(q=>q.OrderID == p.OrderID));
Well, for one thing, combining those calls will speed it up some. Try this:
discountOrders = orderRepo.Where(p => p.Processed == false && p.ProductsInOrder.Where(r => r.Quantity >= 5 && r.TheProduct.UnitPrice >= 100.00m).Count() > 0).ToList();
Note that this isn't tested. I hope I got the logic right-- I think I did, but let me know if I didn't.
Similar to @PhillipSchmidt's answer, you could rationalize your LINQ:
var discountEligibleOrders =
allOrders
.Where(order => !order.Processed
&& order
.ProductsInOrder
.Any(pio => pio.TheProduct.UnitPrice >= 100M
&& pio.Quantity >= 5));
Removing all those nasty ToList statements is a great start because you're pulling potentially significantly larger sets from the db to your app than you need to. Let the database do the work.
To get each order and its price (assuming a discounted price of 0.9*listed price):
var ordersAndPrices =
allOrders
.Where(order => !order.Processed)
.Select(order => new {
order,
isDiscounted = order
.ProductsInOrder
.Any(pio => pio.TheProduct.UnitPrice >= 100M
&& pio.Quantity >= 5)
})
.Select(x => new {
order = x.order,
price = x.order
.ProductsInOrder
.Sum(p=> p.Quantity
* p.TheProduct.UnitPrice
* (x.isDiscounted ? 0.9M : 1M))});
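The pricing rule in that projection can be checked in isolation with plain objects. The sketch below is not the database query itself: the classes are trimmed to the members the rule needs, DiscountSketch.PriceOf is a hypothetical helper, and the 0.9 factor carries over the assumption stated above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int ProductID { get; set; }
    public decimal UnitPrice { get; set; }
}

public class ProductInOrder
{
    public int Quantity { get; set; }
    public Product TheProduct { get; set; }
}

public class Order
{
    public int OrderID { get; set; }
    public bool Processed { get; set; }
    public List<ProductInOrder> ProductsInOrder { get; set; } = new List<ProductInOrder>();
}

public static class DiscountSketch
{
    // Total price of an order, with 10% off when any line has quantity >= 5 and unit price >= 100.
    public static decimal PriceOf(Order order)
    {
        bool isDiscounted = order.ProductsInOrder
            .Any(pio => pio.TheProduct.UnitPrice >= 100m && pio.Quantity >= 5);

        decimal listedPrice = order.ProductsInOrder
            .Sum(pio => pio.Quantity * pio.TheProduct.UnitPrice);

        return isDiscounted ? listedPrice * 0.9m : listedPrice;
    }
}
```

An order with 5 units at 100 and 1 unit at 20 lists at 520 and qualifies for the discount, so PriceOf returns 468.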
I know you have an accepted answer, but please try this for added speed: PLINQ (Parallel LINQ). Given a list of 4000 items and 4 cores, it will filter roughly 1000 on each core and then collate the results.
List<Order> orders = new List<Order>();
var parallelQuery = (from o in orders.AsParallel()
where !o.Processed
select o.ProductsInOrder.Where(x => x.Quantity >= 5 &&
x.TheProduct.UnitPrice >= 100.00m &&
orders.Any(y => y.OrderID == x.OrderID)));
Please see here:
In many scenarios, PLINQ can significantly increase the speed of LINQ to Objects queries by using all available cores on the host computer more efficiently. This increased performance brings high performance computing power onto the desktop
http://msdn.microsoft.com/en-us/library/dd460688.aspx
Move that into one query, but actually you should move this into an SSIS package or a SQL job. You could easily make this a stored proc that runs in less than a second.
