Faster ways to access subsets through LINQ - C#

This is a question about SPEED - there are a LOT of records to be accessed.
Basic Information About The Problem
As an example, we will have three tables in a Database.
Relations:
Order-ProductInOrder is One-To-Many (an order can have many products in the order)
ProductInOrder-Product is One-To-One (a product in the order is represented by one product)
public class Order {
public bool Processed { get; set; }
// this determines whether the order has been processed
// - orders that have do not go through this again
public int OrderID { get; set; } //PK
public decimal TotalCost{ get; set; }
public List<ProductInOrder> ProductsInOrder;
// from one-to-many relationship with ProductInOrder
// the rest is irrelevant and will not be included here
}
//represents a product in an order - an order can have many products
public class ProductInOrder {
public int PIOD { get; set; } //PK
public int Quantity{ get; set; }
public int OrderID { get; set; }//FK
public Order TheOrder { get; set; }
// from one-to-many relationship with Order
public int ProductID { get; set; } //FK
public Product TheProduct{ get; set; }
//from one-to-one relationship with Product
}
//information about a product goes here
public class Product {
public int ProductID { get; set; } //PK
public decimal UnitPrice { get; set; } //the cost per item
// the rest is irrelevant to this question
}
Suppose we receive a batch of orders to which we need to apply discounts and for which we need to find the total price. This could be anywhere from 10,000 to over 100,000 orders. The rule works like this: if an order contains 5 or more units of a product whose unit price is $100 or more, we give a 10% discount on the order's total price.
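For example, under that rule (as the code below interprets it), an order with 6 units of a $120 product and 2 units of a $30 product qualifies: the undiscounted total is 6 × 120 + 2 × 30 = $780, so the discounted total is $702.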
What I Have Tried
I have tried the following:
//this part gets the product in order with over 5 items
List<Order> discountedOrders = orderRepo
.Where(p => p.Processed == false)
.ToList();
List<ProductInOrder> discountedProducts = discountedOrders
.SelectMany(p => p.ProductsInOrder)
.Where(q => q.Quantity >=5 )
.ToList();
discountedProducts = discountedProducts
.Where(p => p.TheProduct.UnitPrice >= 100.00m)
.ToList();
discountedOrders = discountedOrders
.Where(p => discountedProducts.Any(q => q.OrderID == p.OrderID))
.ToList();
This is very slow and takes forever to run, and when I run integration tests on it, the test seems to time out. I was wondering if there is a faster way to do this.

Try not to call ToList after every query.
When you call ToList on a query, it is executed and the objects are loaded from the database into memory. Any subsequent query based on the results of the first query is performed in memory on the list instead of directly in the database. What you want to do here is execute the whole query on the database and return only those results which satisfy all your conditions.
var discountedOrders = orderRepo
.Where(p=>p.Processed == false);
var discountedProducts = discountedOrders
.SelectMany(p=>p.ProductsInOrder)
.Where(q=>q.Quantity >=5);
discountedProducts = discountedProducts
.Where(p=>p.TheProduct.UnitPrice >= 100.00m);
discountedOrders = discountedOrders
.Where(p=>discountedProducts.Any(q=>q.OrderID == p.OrderID));
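None of these statements hits the database on its own; the SQL is generated and executed only when the composed query is finally enumerated. A minimal usage sketch, assuming orderRepo exposes an EF IQueryable:
// One round-trip: the whole filter runs in the database and only matching orders come back.
List<Order> result = discountedOrders.ToList();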

Well, for one thing, combining those calls will speed it up some. Try this:
discountedOrders = orderRepo.Where(p => p.Processed == false && p.ProductsInOrder.Where(r => r.Quantity >= 5 && r.TheProduct.UnitPrice >= 100.00m).Count() > 0).ToList();
Note that this isn't tested. I hope I got the logic right-- I think I did, but let me know if I didn't.

Similar to @PhillipSchmidt, you could rationalize your LINQ:
var discountEligibleOrders =
allOrders
.Where(order => !order.Processed
&& order
.ProductsInOrder
.Any(pio => pio.TheProduct.UnitPrice >= 100M
&& pio.Quantity >= 5));
Removing all those nasty ToList statements is a great start because you're pulling potentially significantly larger sets from the db to your app than you need to. Let the database do the work.
To get each order and its price (assuming a discounted price of 0.9*listed price):
var ordersAndPrices =
allOrders
.Where(order => !order.Processed)
.Select(order => new {
order,
isDiscounted = order
.ProductsInOrder
.Any(pio => pio.TheProduct.UnitPrice >= 100M
&& pio.Quantity >= 5)
})
.Select(x => new {
order = x.order,
price = x.order
.ProductsInOrder
.Sum(p=> p.Quantity
* p.TheProduct.UnitPrice
* (x.isDiscounted ? 0.9M : 1M))});
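At this point the query is still only a definition. A minimal usage sketch, assuming allOrders is an EF IQueryable so everything above translates to a single SQL statement:
// Enumerating executes the query once; the discount math happens in the projection.
foreach (var item in ordersAndPrices)
{
    Console.WriteLine("Order {0}: {1:C}", item.order.OrderID, item.price);
}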

I know you have an accepted answer, but please try this for added speed: PLINQ (Parallel LINQ). Given a list of 4,000 orders and 4 cores, it will filter roughly 1,000 on each core and then collate the results.
List<Order> orders = new List<Order>();
var parallelQuery = from o in orders.AsParallel()
where !o.Processed &&
o.ProductsInOrder.Any(x => x.Quantity >= 5 &&
x.TheProduct.UnitPrice >= 100.00m)
select o;
Please see here:
In many scenarios, PLINQ can significantly increase the speed of LINQ to Objects queries by using all available cores on the host computer more efficiently. This increased performance brings high performance computing power onto the desktop
http://msdn.microsoft.com/en-us/library/dd460688.aspx

Move that into one query, but really you should move this into an SSIS package or a SQL job. You could easily make it a stored procedure that runs in less than a second.
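For illustration only, a hedged sketch of that idea driven from C# via EF6's ExecuteSqlCommand; the table names (Orders, ProductInOrders, Products) and the SQL itself are assumptions based on the classes above, not the poster's actual job or procedure:
// Set-based version of the discount rule, executed entirely on the server.
// Assumes `context` is the DbContext; marking orders as processed here is also an assumption.
var sql = @"
UPDATE o
SET o.TotalCost = t.Total * CASE WHEN t.Discounted = 1 THEN 0.9 ELSE 1.0 END,
    o.Processed = 1
FROM Orders o
CROSS APPLY (
    SELECT SUM(pio.Quantity * p.UnitPrice) AS Total,
           MAX(CASE WHEN pio.Quantity >= 5 AND p.UnitPrice >= 100 THEN 1 ELSE 0 END) AS Discounted
    FROM ProductInOrders pio
    JOIN Products p ON p.ProductID = pio.ProductID
    WHERE pio.OrderID = o.OrderID
) t
WHERE o.Processed = 0;";

int rowsAffected = context.Database.ExecuteSqlCommand(sql);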

Related

Use LINQ to GroupBy two columns and build a dictionary with them?

I am trying to use LINQ to build a record that is sorted by its Store Preference and has a List of applications associated with each CompletionStatusFlag. To do this, my first thought was to have the record hold a StorePreference and then a Dictionary that separates the applications based on the CompletionStatus.
Employee Application
public class EmployeeApplication
{
public int Id { get; set; }
public Store StorePreference { get; set; }
public CompletionStatusFlag CompletionStatus { get; set; }
}
ApplicationCountRecord
public class ApplicationCountRecord
{
public Store StorePreference { get; set; }
public Dictionary<CompletionStatusFlag, List<EmployeeApplication>> StatusApplicationPairs { get; set; }
= new Dictionary<CompletionStatusFlag, List<EmployeeApplication>>();
public int TotalCount => StatusApplicationPairs.Count();
}
The problem arises when trying to create the dictionary. I have the completion status saved, but how do I get the current applications that match the completion status in order to pass them into the dictionary?
public void ConvertAppsToResults()
{
var applications = _appRepo.Query()
.GroupBy(x => new { x.StorePreference, x.CompletionStatus })
.Select(y => new ApplicationCountRecord()
{
StorePreference = y.Key.StorePreference,
//Somehow create dictionary here.
});
}
Here's another incorrect attempt that won't even compile, but it might help you see my thought process.
public void ConvertAppsToResults()
{
//get apps in DateFilter range and create records separated by StorePreference and CompletionStatus
var applications = _applicationRepo.Query()
.GroupBy(x => new { x.StorePreference1, x.CompletionStatus })
.Select(y => new ApplicationCountRecord()
{
StorePreference = y.Key.StorePreference,
StatusApplicationPairs = new Dictionary<CompletionStatusFlag, List<EmployeeApplication>>()
.Add(y.Key.CompletionStatus, y.Where(x => x.CompletionStatus == y.Key.CompletionStatus).ToList());
});
}
There's a double grouping going on here, which is possibly the source of your confusion
employeeApplications
.GroupBy(ea => ea.StorePreference)
.Select(g => new ApplicationCountRecord()
{
StorePreference = g.Key,
StatusApplicationPairs = g.GroupBy(ea => ea.CompletionStatus).ToDictionary(g2 => g2.Key, g2 => g2.ToList())
}
)
Suppose you have 100 EmployeeApplications, across 10 Stores, and there are 5 statuses and 2 applications in each status. 2 apps * 5 statuses * 10 stores = 100 applications
GroupBy(Store) takes your list of e.g. 100 EmployeeApplications (10 per store) and groups it into 10 IGroupings, each of which is conceptually a list of 10 EmployeeApplications: one IGrouping for Store1, one for Store2, etc.
Select runs over the 10 groupings, and on each one g (which, remember, behaves like a list of EmployeeApplications that all have the same Store, given in g.Key) it groups again by calling GroupBy(CompletionStatus) to further subdivide the 10 EAs in g on CompletionStatus. The 10 EAs for e.g. "Store 1" are divvied up into 5 IGroupings (5 statuses) that have 2 EAs in each inner grouping, then the same process is done for Store 2, 3 etc. There are thus 5 g2.Keys, one for each of the 5 statuses and there are 2 EAs in each g2
ToDictionary is called after it's grouped, so you get a dictionary with 5 keys, and each key relates to a list of 2 applications. ToDictionary takes two arguments: what to use for the key (g2.Key is a Status) and what to use for the value (g2.ToList() realizes a List<EmployeeApplication>).
The = new Dictionary<CompletionStatusFlag, List<EmployeeApplication>>(); initializer is unnecessary in ApplicationCountRecord, as it will be replaced anyway.
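Putting that together, a runnable sketch of the corrected query, assuming StorePreference is of type Store (as in ApplicationCountRecord) and that _appRepo.Query() returns the EmployeeApplication rows:
// Materialize first so the grouping and ToDictionary run in memory (ToDictionary
// cannot be translated to SQL), then group per store and per status within the store.
List<ApplicationCountRecord> records = _appRepo.Query()
    .ToList()
    .GroupBy(ea => ea.StorePreference)
    .Select(g => new ApplicationCountRecord
    {
        StorePreference = g.Key,
        StatusApplicationPairs = g
            .GroupBy(ea => ea.CompletionStatus)
            .ToDictionary(g2 => g2.Key, g2 => g2.ToList())
    })
    .ToList();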

How to get records in EF that match a list of combinations (key/values)?

I have a database table with records for each user/year combination.
How can I get data from the database using EF and a list of userId/year combinations?
Sample combinations:
UserId Year
1 2015
1 2016
1 2018
12 2016
12 2019
3 2015
91 1999
I only need the records defined in the above combinations. I can't wrap my head around how to write this using EF/LINQ.
List<UserYearCombination> userYears = GetApprovedYears();
var records = dbcontext.YearResults.Where(?????);
Classes
public class YearResult
{
public int UserId;
public int Year;
public DateTime CreatedOn;
public int StatusId;
public double Production;
public double Area;
public double Fte;
public double Revenue;
public double Diesel;
public double EmissionsCo2;
public double EmissionInTonsN;
public double EmissionInTonsP;
public double EmissionInTonsA;
....
}
public class UserYearCombination
{
public int UserId;
public int Year;
}
This is a notorious problem that I discussed before here. Krishna Muppalla's solution is among the solutions I came up with there. Its disadvantage is that it's not sargable, i.e. it can't benefit from any indexes on the involved database fields.
In the meantime I coined another solution that may be helpful in some circumstances. Basically it groups the input data by one of the fields and then finds and unions database data per grouping key, using a Contains query over that group's elements:
IQueryable<YearResult> items = null;
foreach (var yearUserIds in userYears.GroupBy(t => t.Year, t => t.UserId))
{
var userIds = yearUserIds.ToList();
var grp = dbcontext.YearResults
.Where(x => x.Year == yearUserIds.Key
&& userIds.Contains(x.UserId));
items = items == null ? grp : items.Concat(grp);
}
I use Concat here because Union will waste time making results distinct and in EF6 Concat will generate SQL with chained UNION statements while Union generates nested UNION statements and the maximum nesting level may be hit.
This query may perform well enough when indexes are in place. In theory, the maximum number of UNIONs in a SQL statement is unlimited, but the number of items in an IN clause (which Contains translates to) should not exceed a couple of thousand. That means the content of your data will determine which grouping field performs better, Year or UserId. The challenge is to minimize the number of UNIONs while keeping the number of items in all IN clauses below approx. 5000.
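A hedged sketch of that trade-off, just to illustrate how you might pick the grouping field before building the query (the 5000 figure is the rule of thumb from above; assumes userYears is non-empty):
// Fewer distinct values of the grouping key means fewer UNIONs; each group's size
// becomes the number of items in that group's IN clause.
int yearGroups = userYears.Select(t => t.Year).Distinct().Count();
int userGroups = userYears.Select(t => t.UserId).Distinct().Count();
int maxInPerYear = userYears.GroupBy(t => t.Year).Max(g => g.Count());
int maxInPerUser = userYears.GroupBy(t => t.UserId).Max(g => g.Count());

// Prefer the key with fewer groups, as long as its largest IN list stays under ~5000.
bool groupByYear = (yearGroups <= userGroups && maxInPerYear <= 5000) || maxInPerUser > 5000;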
You can try this:
//add the possible filters to LIST
var searchIds = new List<string> { "1-2015", "1-2016", "2-2018" };
//use the list to check in Where clause
var result = (from x in YearResults
where searchIds.Contains(x.UserId.ToString()+'-'+x.Year.ToString())
select new UserYearCombination
{
UserId = x.UserId,
Year = x.Year
}).ToList();
Method 2
var d = YearResults
.Where(x=>searchIds.Contains(x.UserId.ToString() + '-' + x.Year.ToString()))
.Select(x => new UserYearCombination
{
UserId = x.UserId,
Year = x.Year
}).ToList();

How to count the number of items that exist in a table efficiently with EntityFramework?

I have a C# project in which several items are stored in different tables. For example, to count how many elements a table contains I do something similar to the following:
public int getLengthListProducts(int idCompany)
{
try
{
using (var context = new ccoFinalEntities())
{
return context.products.Where(p => true == p.status && idCompany == p.idCompany).ToList().Count;
}
}
catch
{
return -1;
}
}
So far it has worked very well, but once there are around 1,000 items, on some PCs it starts to take a while to get this number.
I suspect that context.products loads all items into RAM and then counts those that meet the conditions, which is why the application freezes until the count is finished.
My question is: Is there a way to do it better?
For example, I thought I should resort directly to SQL statements instead of using Entity Framework to get that number, but I don't know if that would be a good idea, or if there is a more efficient way with Entity Framework.
Any comments or suggestions are welcome.
context.products.Where(p => true == p.status && idCompany == p.idCompany).ToList().Count;
In this LINQ query, ToList() will generate the SQL query:
SELECT ...
FROM Products
WHERE status = 1 AND idCompany = @idCompany
This query is executed on your database and can return a lot of rows.
All elements are loaded into the client's memory in a .NET collection, and Count returns the final result.
With Entity Framework, you can use aggregate LINQ queries (Count, Sum, Avg, ...):
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/linq/aggregate-queries
Example :
context.products.Where(p => true == p.status && idCompany == p.idCompany).Count();
Count() will generate the SQL query:
SELECT COUNT(*)
FROM Products
WHERE status = 1 AND idCompany = @idCompany
The query is executed on your database and returns a scalar result.
context.products.Count(p => p.status == true && idCompany == p.idCompany);
or
context.products.Where(p => p.idCompany == idCompany)
.Count(p => p.status == true);
(for readability)
will be enough.
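Folded back into the original method, a sketch of the change could look like this:
public int getLengthListProducts(int idCompany)
{
    try
    {
        using (var context = new ccoFinalEntities())
        {
            // COUNT(*) is computed by the database; only the scalar travels back to the client.
            return context.products.Count(p => p.status == true && p.idCompany == idCompany);
        }
    }
    catch
    {
        return -1;
    }
}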
For SQL Server you can do this:
string cmd = #"SELECT
t.NAME AS TableName,
s.Name AS SchemaName,
p.rows AS RowCounts,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
sys.schemas s ON t.schema_id = s.schema_id
WHERE
t.NAME NOT LIKE 'dt%'
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
GROUP BY
t.Name, s.Name, p.Rows
"
db.Database.SqlQuery(cmd).ToList<Statistic>();
with
public class Statistic
{
public string TableName { get; set; }
public string SchemaName { get; set; }
public long RowCounts { get; set; }
public long TotalSpaceKB { get; set; }
public long UsedSpaceKB { get; set; }
public long UnusedSpaceKB { get; set; }
}
Then you get a pretty fast overview of all your tables, here including the space they use in KB.
I know hard-coded SQL is not preferred, but since it relates to system tables only, which you can assume never change, it's acceptable.
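As a usage note, once the statistics are loaded you can read a single table's row count from the list; the table name here is just an assumed example:
var stats = db.Database.SqlQuery<Statistic>(cmd).ToList();

// "products" is a hypothetical table name; adjust it to the table you care about.
Statistic productStats = stats.FirstOrDefault(s => s.TableName == "products");
long rowCount = productStats != null ? productStats.RowCounts : 0;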

How to use LINQ TakeWhile and Join to a list

I want to map the oldest possible credit invoice date onto a list built from the sales table. I have a list in the following form.
var tb = // Some other method to get balances.
cust balance
1 1000
2 2000
3 3000
... so on ...
These balances are an accumulation of a few invoices, or sometimes just one invoice.
public class Sales
{
public int id { get; set; }
public DateTime saleDate { get; set; }
public int? cust { get; set; }
public decimal invoiceValue { get; set; }
// Other properties...
}
Sample data for cust 1
saleInvNo saleDate cust invoiceValue
1 2018/12/01 1 500
12 2018/12/20 1 750
If I fetch now, the report should be as follows.
cust balance balanceFromDate
1 1000 2018/12/01 // balance from this onwards.
2 2000 ???
3 3000 ???
Is there an easy way to achieve this with LINQ?
I tried TakeWhile but my attempt was not successful.
foreach (var t in tb)
{
var sum = 0m;
var q = _context.SaleHeaders.Where(w=>w.cust == t.cust)
.OrderByDescending(o=>o.saleDate)
.Select(s => new { s.id, s.saleDate, s.invoiceValue }).AsEnumerable()
.TakeWhile(x => { var temp = sum; sum += x.invoiceValue; return temp > t.balance; });
}
Note: the Sales table has ~75K records.
More Info...
Customers may pay a partial amount of a sale invoice. All payments and invoices are posted to another table; that's why the balances come from a complex query on that table.
The Sales table has purely raw sales data.
Now, if cust 1's balance is 1000, that value may come from the last sale invoice in full, or even partly from the invoice before it. I need to map the date the balance runs from onwards.
So you have two sequences: a sequence of Balances and a sequence of Sales.
Every Balance has at least an int property CustomerId, and a BalanceValue
Every Sale has at least a nullable int property CustomerId and a DateTime property SaleDate
class Balance
{
public int CustomerId {get; set;}
public decimal BalanceValue {get; set;}
}
class Sale
{
public int? CustomerId {get; set;}
public DateTime SaleDate {get; set;}
}
Apparently it is possible to have a Sale without a Customer.
Now given a sequence of Balances and Sales, you want for every Balance one object, containing the CustomerId, the BalanceValue and the SaleDate that this Customer has in the sequence of Sales.
Your requirement doesn't specify what you want with Balances in your sequence of Balances that have a Customer without a Sale in the sequence of Sales. Let's assume you want those items in your result with a null LastSaleDate.
IEnumerable<Balance> balances = ...
IEnumerable<Sales> sales = ...
We are not interested in sales without a customer. So let's filter them out first:
var salesWithCustomers = sales.Where(sale => sale.CustomerId != null);
Now make groups of Balances with their sales, by group-joining on CustomerId:
var balancesWithSales = balances.GroupJoin(salesWithCustomers, // GroupJoin balances and sales
balance => balance.CustomerId, // from every Balance take the CustomerId
sale => sale.CustomerId, // from every Sale take the CustomerId. I know it is not null,
(balance, salesOfBalance) => new // from every balance, with all its sales
{ // make one new object
CustomerId = balance.CustomerId,
Balance = balance.BalanceValue,
LastSaleDate = salesOfBalance // take all salesOfBalance
.Select(sale => sale.SaleDate) // select the saleDate
.OrderByDescending(saleDate => saleDate) // order: newest dates first
.FirstOrDefault(), // keep only the first element
});
The calculation of the LastSaleDate orders all elements, and keeps only the first.
Although this solution works, it is not efficient to order all the elements if you only need the largest one. If you are working with IEnumerables instead of IQueryables (as you would with a database), you can optimize this by creating a function.
I implement this as an extension method. See extension methods demystified.
static DateTime? NewestDateOrDefault(this IEnumerable<DateTime> dates)
{
IEnumerator<DateTime> enumerator = dates.GetEnumerator();
if (!enumerator.MoveNext())
{
// sequence of dates is empty; there is no newest date
return null;
}
else
{ // sequence contains elements
DateTime newestDate = enumerator.Current;
while (enumerator.MoveNext())
{ // there are more elements
if (enumerator.Current > newestDate)
newestDate = enumerator.Current;
}
return newestDate;
}
}
Usage:
LastSaleDate = salesOfBalance
.Select(sale => sale.SaleDate)
.NewestDateOrDefault();
Now you know that instead of sorting the complete sequence you'll enumerate the sequence only once.
Note: you can't use Enumerable.Aggregate for this, because it doesn't work with empty sequences.
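As an aside (not part of the answer above): with LINQ to Objects you can also get the null-for-empty behaviour without the extension method, because Max over a projection to a nullable DateTime returns null for an empty sequence:
// Returns null when salesOfBalance is empty, otherwise the newest SaleDate.
LastSaleDate = salesOfBalance
.Max(sale => (DateTime?)sale.SaleDate);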

Random with condition

I have the following code to extract records from a dbcontext randomly using the Guid class:
var CategoryList = new List<int> { 1, 5 };
var generatedQues = new List<Question>();
//Algorithm 1 :)
if (ColNum > 0)
{
generatedQues = db.Questions
.Where(q => CategoryList.Contains(q.CategoryId))
.OrderBy(q => Guid.NewGuid()).Take(ColNum).ToList();
}
First, I have a list of CategoryId stored in CategoryList as a condition to be fulfilled when getting records from the db. However, I would like to achieve an even distribution among the questions based on the CategoryId.
For example:
If ColNum is 10 and the CategoryIds obtained are {1,5}, I would like to get 5 records from CategoryId = 1 and another 5 records from CategoryId = 5. If ColNum is an odd number like 11, I would also like the distribution to be as even as possible, for example 5 records from CategoryId 1 and 6 records from CategoryId 5.
How do I do this?
This is a two-step process:
Determine how many you want for each category
Select that many items from each category in a random order
For the first part, define a class to represent the category and how many items are required
public class CategoryLookup
{
public CategoryLookup(int catId)
{
this.CategoryId = catId;
}
public int CategoryId
{
get; private set;
}
public int RequiredAmount
{
get; private set;
}
public void Increment()
{
this.RequiredAmount++;
}
}
And then, given your inputs of the required categories and the total number of items required, work out how many are required for each category
var categoryList = new []{1,5};
var colNum = 7;
var categoryLookup = categoryList.Select(x => new CategoryLookup(x)).ToArray();
for(var i = 0;i<colNum;i++){
categoryLookup[i%categoryList.Length].Increment();
}
The second part is really easy: just use a SelectMany to get the list of questions (I've used straight LINQ to Objects to test; it should work fine for a database query. questions in my code would just be db.Questions in yours).
var result = categoryLookup.SelectMany(
c => questions.Where(q => q.CategoryId == c.CategoryId)
.OrderBy(x => Guid.NewGuid())
.Take(c.RequiredAmount)
);
Live example: http://rextester.com/RHF33878
You could try something like this:
var CategoryList = new List<int> { 1, 5 };
var generatedQues = new List<Question>();
//Algorithm 1 :)
if (ColNum > 0 && CategoryList.Count > 0)
{
var take = // Calculate how many of each
// First category
var query = db.Questions
.Where(q => q.CategoryId == CategoryList[0])
.OrderBy(q => Guid.NewGuid()).Take(take);
// For all remaining categories
for(int i = 1; i < CategoryList.Count; i++)
{
// Calculate how many you want
take = // Calculate how many of each
// Union the questions for that category to query
query = query.Union(
db.Questions
.Where(q => q.CategoryId == CategoryList[i])
.OrderBy(q => Guid.NewGuid()).Take(take));
}
// Randomize again and execute query
generatedQues = query.OrderBy(q => Guid.NewGuid()).ToList();
}
The idea is to just get a random list for each category and add them all together. Then you randomize that again and create your list. I do not know if it will do all this in the database or in memory, but I think it should be the database. The resulting SQL will look horrible, though.
