how to use LINQ TakeWhile and Join to a list - c#

I want to map the date of the oldest possible credit invoice onto a list from the sale table. I have a list in the following form.
var tb = // Some other method to get balances.
cust balance
1 1000
2 2000
3 3000
... so on ...
Each balance is the accumulation of a few invoices, or sometimes just one invoice.
public class Sales
{
public int id { get; set; }
public DateTime saleDate { get; set; }
public int? cust { get; set; }
public decimal invoiceValue { get; set; }
// Other properties...
}
Sample data for cust 1
saleInvNo saleDate cust invoiceValue
1 2018/12/01 1 500
12 2018/12/20 1 750
If I fetch now, report should be as follows.
cust balance balanceFromDate
1 1000 2018/12/01 // balance from this onwards.
2 2000 ???
3 3000 ???
Is there an easy way to achieve this through LINQ?
I tried with TakeWhile but attempt was not successful.
foreach (var t in tb)
{
var sum = 0m;
var q = _context.SaleHeaders.Where(w=>w.cust == t.cust)
.OrderByDescending(o=>o.saleDate)
.Select(s => new { s.id, s.saleDate, s.invoiceValue }).AsEnumerable()
.TakeWhile(x => { var temp = sum; sum += x.invoiceValue; return temp > t.balance; });
}
Note: the Sales table has ~75K records.
More Info...
Customers may pay only part of a sale invoice. All payments and invoices are posted to another table; that's why the balances come from a complex query on another table.
The Sales table has purely raw sales data.
Now, if cust 1's balance is 1000, that value comes from the last sale invoice, or even from the invoice before it, in full or in part. I need to map the date the balance runs from ("balance from onwards").

So you have two sequences: a sequence of Balances and a sequence of Sales.
Every Balance has at least an int property CustomerId, and a BalanceValue
Every Sale has at least a nullable int property CustomerId and a DateTime property SaleDate
class Balance
{
public int CustomerId {get; set;}
public decimal BalanceValue {get; set;}
}
class Sale
{
public int? CustomerId {get; set;}
public DateTime SaleDate {get; set;}
}
Apparently it is possible to have a Sale without a Customer.
Now given a sequence of Balances and Sales, you want for every Balance one object, containing the CustomerId, the BalanceValue and the newest SaleDate that this Customer has in the sequence of Sales.
Your requirement doesn't specify what you want with Balances in your sequence of Balances that have a Customer without a Sale in the sequence of Sales. Let's assume you want those items in your result with a null LastSaleDate.
IEnumerable<Balance> balances = ...
IEnumerable<Sale> sales = ...
We are not interested in sales without a customer. So let's filter them out first:
var salesWithCustomers = sales.Where(sale => sale.CustomerId != null);
Now make groups of Balances with their sales, by group-joining on CustomerId:
var balancesWithSales = balances.GroupJoin(salesWithCustomers, // GroupJoin balances and sales
    balance => balance.CustomerId, // from every Balance take the CustomerId
    sale => sale.CustomerId.Value, // from every Sale take the CustomerId; the filter above removed the nulls
    (balance, salesOfBalance) => new // from every balance, with all its sales
    { // make one new object
        CustomerId = balance.CustomerId,
        Balance = balance.BalanceValue,
        LastSaleDate = salesOfBalance // take all salesOfBalance
            .Select(sale => (DateTime?)sale.SaleDate) // select the saleDate, nullable so "no sale" becomes null
            .OrderByDescending(saleDate => saleDate) // order: newest dates first
            .FirstOrDefault(), // keep only the first element, or null if there is none
    });
The calculation of the LastSaleDate orders all elements, and keeps only the first.
Although this solution works, it is not efficient to order all the other elements if you only need the largest one. If you are working with IEnumerables instead of IQueryables (as you would be with a database), you can optimize this by creating a function.
I implement this as an extension function. See extension methods demystified
static class DateTimeSequenceExtensions
{
    public static DateTime? NewestDateOrDefault(this IEnumerable<DateTime> dates)
    {
        using (IEnumerator<DateTime> enumerator = dates.GetEnumerator())
        {
            if (!enumerator.MoveNext())
            {
                // sequence of dates is empty; there is no newest date
                return null;
            }
            // sequence contains elements
            DateTime newestDate = enumerator.Current;
            while (enumerator.MoveNext())
            { // there are more elements
                if (enumerator.Current > newestDate)
                    newestDate = enumerator.Current;
            }
            return newestDate;
        }
    }
}
Usage:
LastSaleDate = salesOfBalance
.Select(sale => sale.SaleDate)
.NewestDateOrDefault();
Now you know that instead of sorting the complete sequence you'll enumerate the sequence only once.
Note: the Enumerable.Aggregate overload without a seed can't be used for this, because it throws an InvalidOperationException on an empty sequence.
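For what it's worth, the overload of Enumerable.Aggregate that takes a seed does accept an empty sequence, so a single-pass version can also be sketched as below. The class and method names (DateAggregateExtensions, NewestDateOrNull) are made up for this illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DateAggregateExtensions
{
    // Hypothetical helper: returns the newest date, or null for an empty sequence.
    // The seed (null) is what Aggregate returns when the sequence has no elements.
    public static DateTime? NewestDateOrNull(this IEnumerable<DateTime> dates)
    {
        return dates.Aggregate(
            (DateTime?)null,
            (newest, date) => (newest == null || date > newest.Value) ? date : newest);
    }
}
```

It would be used the same way as the hand-written enumerator version: salesOfBalance.Select(sale => sale.SaleDate).NewestDateOrNull().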

Related

Filter (reservation) items on start and end DateTime

I'm new to .NET-Core and building API's and I got stuck on a problem.
I'm making an (Android) app where people can plan an event/birthday party or something.
The user can retrieve a list of available (pre-added) Venues/locations after they filled in the start and end date/time. This is the part where I get stuck.
I want a list of available venues after they filled in the start and end date, showing only available venues for their selected time slots.
I have 2 models in my API, a 'Venue' model and a 'VenueReservation' model.
The Venue model has an Id, location and name.
The VenueReservationModel has an Id, venueId, startDateTime, endDateTime.
What can I use or do to retrieve the available venues based of the DateTime input of the user (start and end DateTime)?
I played around with LINQ using the Enumerable.Where method, but I can't seem to find the answer.
Thanks in advance!
So you have classes like:
class Venue
{
public int Id {get; set;}
public string Location {get; set;}
public string Name {get; set;}
}
class VenueReservation
{
public int Id {get; set;}
public int VenueId {get; set;} // foreign key to Venue
public DateTime StartDateTime {get; set;}
public DateTime EndDateTime {get; set;}
}
And you have:
IQueryable<Venue> Venues => ... // Probably DbSet<Venue>
IQueryable<VenueReservation> VenueReservations => ...
I want a list of available venues after they filled in the start and end date, showing only available venues for their selected time slots.
So you need to fetch "Venues with their zero or more VenueReservations". Once you've got them, you could keep only those Venues that have no Reservation at all during the selected TimeSlot.
Get VenuesWithTheirReservations
Normally to get "Venues with their zero or more VenueReservations", I'd use Queryable.GroupJoin. Some people mentioned that .net core doesn't support GroupJoin. In that case, use a left outer join followed by a GroupBy:
var venuesWithTheirReservations = venues.GroupJoin(venueReservations,
venue => venue.Id, // from every Venue take the Id
venueReservation => venueReservation.VenueId, // from every Reservation take the foreign key
(venue, reservationsForThisVenue) => new
{
Venue = venue,
Reservations = reservationsForThisVenue,
});
Or LINQ left outer join followed by GroupBy:
var venuesWithTheirReservations = (from venue in venues
    join reservation in venueReservations
    on venue.Id equals reservation.VenueId into g
    from reservationForThisVenue in g.DefaultIfEmpty()
    select new
    {
        Venue = venue,
        Reservation = reservationForThisVenue,
    })
    .GroupBy(joinResult => joinResult.Venue,
    // parameter elementSelector: from every join result keep only the reservation
    joinResult => joinResult.Reservation,
    // parameter resultSelector: for every venue, and all reservations for this venue
    // make one new
    (venue, reservationsForThisVenue) => new
    {
        Venue = venue,
        // a venue without reservations yields a single null from DefaultIfEmpty
        Reservations = reservationsForThisVenue.Where(reservation => reservation != null),
    });
Keep only the available Venues
So, now you've got the venues, each with their reservations, you can use Queryable.Where to keep only those Venues that have no reservation at all during the time slot.
In other words: none of the Reservations should overlap with the time slot.
all Reservations should either end before the time slot starts
(= the complete Reservation is before the time slot)
OR start after the time slot ends
(= the complete Reservation is after the time slot).
We don't want Reservations that are partly during the time slot
So continuing the LINQ:
DateTime timeSlotStart = ...
DateTime timeSlotEnd = ...
var availableVenues = venuesWithTheirReservations.Where(venue =>
// every Reservation should end before timeSlotStart
// OR start after the timeSlotEnd:
venue.Reservations.All(reservation =>
// end before timeSlotStart OR
reservation.EndDateTime <= timeSlotStart ||
// start after the timeSlotEnd:
reservation.StartDateTime >= timeSlotEnd));
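The overlap rule can also be tried out in memory, without a database. This is a minimal sketch, assuming that a reservation ending exactly when the time slot starts does not conflict (the same <= / >= boundaries as above); the TimeSlots class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TimeSlots
{
    // Two intervals overlap unless one ends before (or exactly when) the other starts.
    public static bool Overlaps(DateTime startA, DateTime endA, DateTime startB, DateTime endB)
    {
        return startA < endB && startB < endA;
    }

    // A venue is available when none of its reservations overlaps the requested slot.
    // An empty reservation list is always available, because All returns true for it.
    public static bool IsAvailable(
        IEnumerable<(DateTime Start, DateTime End)> reservations,
        DateTime slotStart, DateTime slotEnd)
    {
        return reservations.All(r => !Overlaps(r.Start, r.End, slotStart, slotEnd));
    }
}
```

Negating Overlaps gives exactly the condition in the query: the reservation ends at or before the slot start, or starts at or after the slot end.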
You could do it this way.
Get reservations with input date between Start and End
IEnumerable<VenueReservation> reservationsBetweenStartAndEnd = venueReservations
.Where(x => inputDateTime > x.StartDateTime && inputDateTime < x.EndDateTime);
Get reservations with input date outside of Start and End
IEnumerable<VenueReservation> reservationsOutsideStartAndEnd = venueReservations
.Where(x => inputDateTime < x.StartDateTime || inputDateTime > x.EndDateTime);
If you want to compare strictly by date and cut off the hours and minutes, you can simply use .Date on the DateTime.
So as an example you would do this
IEnumerable<VenueReservation> reservationsBetweenStartAndEnd = venueReservations
.Where(x => inputDateTime.Date > x.StartDateTime.Date && inputDateTime.Date < x.EndDateTime.Date);
also
List<VenueReservation> reservationsBetweenStartAndEnd = venueReservations
.Where(x => inputDateTime.Date > x.StartDateTime.Date && inputDateTime.Date < x.EndDateTime.Date).ToList();

How to get records in EF that match a list of combinations (key/values)?

I have a database table with records for each user/year combination.
How can I get data from the database using EF and a list of userId/year combinations?
Sample combinations:
UserId Year
1 2015
1 2016
1 2018
12 2016
12 2019
3 2015
91 1999
I only need the records defined in above combinations. Can't wrap my head around how to write this using EF/Linq?
List<UserYearCombination> userYears = GetApprovedYears();
var records = dbcontext.YearResults.Where(?????);
Classes
public class YearResult
{
public int UserId;
public int Year;
public DateTime CreatedOn;
public int StatusId;
public double Production;
public double Area;
public double Fte;
public double Revenue;
public double Diesel;
public double EmissionsCo2;
public double EmissionInTonsN;
public double EmissionInTonsP;
public double EmissionInTonsA;
....
}
public class UserYearCombination
{
public int UserId;
public int Year;
}
This is a notorious problem that I discussed before here. Krishna Muppalla's solution is among the solutions I came up with there. Its disadvantage is that it's not sargable, i.e. it can't benefit from any indexes on the involved database fields.
In the meantime I coined another solution that may be helpful in some circumstances. Basically it groups the input data by one of the fields and then finds and unions database data by grouping key and a Contains query of group elements:
IQueryable<YearResult> items = null;
foreach (var yearUserIds in userYears.GroupBy(t => t.Year, t => t.UserId))
{
var userIds = yearUserIds.ToList();
var grp = dbcontext.YearResults
.Where(x => x.Year == yearUserIds.Key
&& userIds.Contains(x.UserId));
items = items == null ? grp : items.Concat(grp);
}
I use Concat here because Union will waste time making results distinct and in EF6 Concat will generate SQL with chained UNION statements while Union generates nested UNION statements and the maximum nesting level may be hit.
This query may perform well enough when indexes are in place. In theory, the maximum number of UNIONs in a SQL statement is unlimited, but the number of items in an IN clause (that Contains translates to) should not exceed a couple of thousand. That means that the content of your data will determine which grouping field performs better, Year or UserId. The challenge is to minimize the number of UNIONs while keeping the number of items in all IN clauses below approx. 5000.
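One way to pick the grouping field is to count the distinct values on each side before building the query. The sketch below is only an illustration of that idea, assuming the UserYearCombination class from the question; GroupingPlanner and Plan are hypothetical names, and nothing here talks to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class UserYearCombination
{
    public int UserId;
    public int Year;
}

public static class GroupingPlanner
{
    // Hypothetical helper: groups by the field with fewer distinct values
    // (fewer UNIONs) and returns, per group key, the values for the IN clause.
    public static (string GroupedBy, Dictionary<int, List<int>> Groups) Plan(
        IReadOnlyCollection<UserYearCombination> combinations)
    {
        int yearCount = combinations.Select(c => c.Year).Distinct().Count();
        int userCount = combinations.Select(c => c.UserId).Distinct().Count();

        if (yearCount <= userCount)
        {
            return ("Year", combinations
                .GroupBy(c => c.Year, c => c.UserId)
                .ToDictionary(g => g.Key, g => g.Distinct().ToList()));
        }
        return ("UserId", combinations
            .GroupBy(c => c.UserId, c => c.Year)
            .ToDictionary(g => g.Key, g => g.Distinct().ToList()));
    }
}
```

The caller would still have to check the largest list in Groups against the IN clause limit and fall back to the other field, or split the list, if it is too long.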
you can try this
//add the possible filters to LIST
var searchIds = new List<string> { "1-2015", "1-2016", "2-2018" };
//use the list to check in Where clause
var result = (from x in YearResults
where searchIds.Contains(x.UserId.ToString()+'-'+x.Year.ToString())
select new UserYearCombination
{
UserId = x.UserId,
Year = x.Year
}).ToList();
Method 2
var d = YearResults
.Where(x=>searchIds.Contains(x.UserId.ToString() + '-' + x.Year.ToString()))
.Select(x => new UserYearCombination
{
UserId = x.UserId,
Year = x.Year
}).ToList();

Save only unique (by property value) records in database table

My EF entity class looks like:
public class DataPoint
{
public int DataPointId {get; set;}
public DateTime DateTime {get; set;}
public double Value {get; set;}
}
where DataPointId is PK, no other columns are indexed at the time.
Let's say that we have a collection of DataPoints which must be added into context.DataPoints database table:
var dpToAdd = new List<DataPoint>{ /* 10000 different dp's */ };
But, I want to save in db only those DataPoint's which are unique when it comes to its DateTime column.
For example: one of DataPoint in dpToAdd has DateTime = 01/01/2016 00:00:00 - if context.DataPoints already contains DataPoint with the same DateTime value, this point should be ignored.
context.DataPoints table can have like 1 million of records, and at a single request there can be collection of 10-50k of records that needs to be verified before saving in db.
How to handle such process so that the performance is the best possible?
My first thought about this is to create an index on the DateTime column and then, for every DataPoint that is about to be added, check something like:
1: simply loop thru all points collection and check which needs to be added
foreach (var dp in dpToAdd)
{
// with Any()
if (!context.DataPoints.Any(p => p.DateTime == dp.DateTime))
{
context.DataPoints.Add(dp);
}
// or with Contains()
if (!context.DataPoints.Select(p => p.DateTime).Contains(dp.DateTime))
{
context.DataPoints.Add(dp);
}
}
2: get the DateTime values which already occurs in database, and exclude them from db addition
var common = context.DataPoints.Select(p => p.DateTime).Intersect(dpToAdd.Select(d => d.DateTime));
var reallyToAdd = dpToAdd.Where(p => !common.Contains(p.DateTime));
context.DataPoints.AddRange(reallyToAdd);
Do you have any other suggestions if this task can be developed in any other, better way?
If the table really is that big and you have that much data to insert, then for performance reasons you should create an intermediate helper table, DataPointHelper, with the same columns plus a Guid. First insert the data into this table, then fetch the rows from it that do not exist in the DataPoints table, and finally insert those into DataPoints. The Guid property is needed for transactional correctness; it identifies the rows that belong to this batch:
var guid = Guid.NewGuid();
var data = dpToAdd.Select(x => new DataPointHelper
{
DateTime = x.DateTime,
Value = x.Value,
Guid = guid
}).ToList();
context.DataPointHelpers.AddRange(data);//it is better to use BulkInsert
context.SaveChanges();
var toInsert = (from x in context.DataPointHelpers
where x.Guid == guid
join y in context.DataPoints on x.DateTime equals y.DateTime into subs
from sub in subs.DefaultIfEmpty()
where sub == null
select x.DateTime).ToList();
if(toInsert.Count > 0)
{
var toInsertData = dpToAdd.Where(x => toInsert.Contains(x.DateTime)).ToList();
context.DataPoints.AddRange(toInsertData);
context.SaveChanges();
}
Of course, you should create unique indexes on the DateTime property in both tables.
This way you need only three round trips to the database instead of thousands, and you avoid executing a query like: where DateTime in (thousands of values); moreover, an IN statement has a limit on the number of values.
I think you can use this:
var dt = DateTime.Now;
var ls = new List<DataPoint>();
ls.Add(new DataPoint { DataPointId =1, Value = 1, DateTime = dt});
ls.Add(new DataPoint { DataPointId =2, Value = 2, DateTime = dt});
ls.Add(new DataPoint { DataPointId =1, Value = 1, DateTime = DateTime.Now.AddDays(1)});
var distincData = ls.GroupBy(l => l.DateTime).Select(g => g.First());
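Note that the GroupBy above only removes duplicates inside the batch itself. A possible in-memory extension is sketched below, assuming the DateTime values that already exist in the table have been loaded separately (existingDates is a hypothetical input); DataPointFilter and SelectNew are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DataPoint
{
    public int DataPointId { get; set; }
    public DateTime DateTime { get; set; }
    public double Value { get; set; }
}

public static class DataPointFilter
{
    // Keeps the first point per DateTime and skips any DateTime already in existingDates.
    public static List<DataPoint> SelectNew(
        IEnumerable<DataPoint> candidates, IEnumerable<DateTime> existingDates)
    {
        var seen = new HashSet<DateTime>(existingDates);
        // HashSet.Add returns false when the value is already present,
        // so each DateTime passes the filter at most once.
        return candidates.Where(p => seen.Add(p.DateTime)).ToList();
    }
}
```

Whether loading all existing DateTime values into memory is acceptable depends on the table size; for a table of about a million rows it trades memory for fewer round trips.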

Faster ways to access subsets through LINQ

This is a question about SPEED - there are a LOT of records to be accessed.
Basic Information About The Problem
As an example, we will have three tables in a Database.
Relations:
Order-ProductInOrder is One-To-Many (an order can have many products in the order)
ProductInOrder- Product is One-To-One (a product in the order is represented by one product)
public class Order {
public bool Processed { get; set; }
// this determines whether the order has been processed
// - orders that have been processed do not go through this again
public int OrderID { get; set; } //PK
public decimal TotalCost{ get; set; }
public List<ProductInOrder> ProductsInOrder;
// from one-to-many relationship with ProductInOrder
// the rest is irrelevant and will not be included here
}
//represents a product in an order - an order can have many products
public class ProductInOrder {
public int PIOD { get; set; } //PK
public int Quantity{ get; set; }
public int OrderID { get; set; }//FK
public Order TheOrder { get; set; }
// from one-to-many relationship with Order
public int ProductID { get; set; } //FK
public Product TheProduct{ get; set; }
//from one-to-one relationship with Product
}
//information about a product goes here
public class Product {
public int ProductID { get; set; } //PK
public decimal UnitPrice { get; set; } //the cost per item
// the rest is irrelevant to this question
}
Suppose we receive a batch of orders for which we need to apply discounts and find the total price of each order. This could apply to anywhere from 10,000 to over 100,000 orders. The rule is: if an order contains a product with a quantity of 5 or more and a unit price of at least $100, we give a 10% discount on the total price.
What I Have Tried
I have tried the following:
//this part gets the product in order with over 5 items
List<Order> discountedOrders = orderRepo
.Where(p => p.Processed == false)
.ToList();
List<ProductInOrder> discountedProducts = discountedOrders
.SelectMany(p => p.ProductsInOrder)
.Where(q => q.Quantity >=5 )
.ToList();
discountedProducts = discountedProducts
.Where(p => p.TheProduct.UnitPrice >= 100.00m)
.ToList();
discountedOrders = discountedOrders
.Where(p => discountedProducts.Any(q => q.OrderID == p.OrderID))
.ToList();
This is very slow and takes forever to run, and when I run integration tests on it, the test seems to time out. I was wondering if there is a faster way to do this.
Try to not call ToList after every query.
When you call ToList on a query it is executed and the objects are loaded from the database in memory. Any subsequent query based on the results from the first query is performed in memory on the list instead of performing it directly in the database. What you want to do here is to execute the whole query on the database and return only those results which verify all your conditions.
var discountedOrders = orderRepo
.Where(p=>p.Processed == false);
var discountedProducts = discountedOrders
.SelectMany(p=>p.ProductsInOrder)
.Where(q=>q.Quantity >=5);
discountedProducts = discountedProducts
.Where(p=>p.TheProduct.UnitPrice >= 100.00m);
discountedOrders = discountedOrders
.Where(p=>discountedProducts.Any(q=>q.OrderID == p.OrderID));
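The deferral itself is easy to observe with LINQ to Objects. The sketch below only illustrates that building a query runs nothing until it is enumerated; it does not show SQL translation, which is what an IQueryable provider adds on top. DeferredExecutionDemo is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how often the first predicate had run before and after ToList,
    // plus the number of matching elements.
    public static (int Before, int After, int Matches) Run()
    {
        int evaluations = 0;
        var numbers = Enumerable.Range(1, 10);

        // Building the query evaluates nothing yet.
        var query = numbers
            .Where(n => { evaluations++; return n % 2 == 0; })
            .Where(n => n > 4);

        int before = evaluations;          // nothing has been evaluated so far
        List<int> result = query.ToList(); // the whole chain runs once, here
        return (before, evaluations, result.Count);
    }
}
```

Every ToList in the middle of a chain forces such a run, and with a database behind it, each run pulls the intermediate rows into memory.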
Well, for one thing, combining those calls will speed it up some. Try this:
var discountOrders = orderRepo.Where(p => p.Processed == false && p.ProductsInOrder.Where(r => r.Quantity >= 5 && r.TheProduct.UnitPrice >= 100.00m).Count() > 0).ToList();
Note that this isn't tested. I hope I got the logic right-- I think I did, but let me know if I didn't.
Similar to @PhillipSchmidt, you could rationalize your LINQ
var discountEligibleOrders =
allOrders
.Where(order => !order.Processed
&& order
.ProductsInOrder
.Any(pio => pio.TheProduct.UnitPrice >= 100M
&& pio.Quantity >= 5));
Removing all those nasty ToList statements is a great start because you're pulling potentially significantly larger sets from the db to your app than you need to. Let the database do the work.
To get each order and its price (assuming a discounted price of 0.9*listed price):
var ordersAndPrices =
allOrders
.Where(order => !order.Processed)
.Select(order => new {
order,
isDiscounted = order
.ProductsInOrder
.Any(pio => pio.TheProduct.UnitPrice >= 100M
&& pio.Quantity >= 5)
})
.Select(x => new {
order = x.order,
price = x.order
.ProductsInOrder
.Sum(p=> p.Quantity
* p.TheProduct.UnitPrice
* (x.isDiscounted ? 0.9M : 1M))});
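To check the arithmetic without a database, the same rule can be written against plain in-memory data. This is a sketch under the assumption taken from the queries above (any line with quantity >= 5 and unit price >= 100 discounts the whole order by 10%); DiscountCalculator and OrderTotal are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiscountCalculator
{
    // Each line is (quantity, unit price). Rule assumed from the question:
    // any line with quantity >= 5 and unit price >= 100 earns 10% off the order total.
    public static decimal OrderTotal(IReadOnlyCollection<(int Quantity, decimal UnitPrice)> lines)
    {
        bool isDiscounted = lines.Any(l => l.Quantity >= 5 && l.UnitPrice >= 100m);
        decimal total = lines.Sum(l => l.Quantity * l.UnitPrice);
        return isDiscounted ? total * 0.9m : total;
    }
}
```

For example, an order with 5 items at 100 and 1 item at 50 totals 550 before and 495 after the discount, while 4 items at 100 plus 1 at 50 stays at 450.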
I know you have an accepted answer, but for added speed please try PLINQ (Parallel LINQ). Given a list of 4000 orders and 4 cores, it will filter roughly 1000 on each core and then collate the results.
List<Order> orders = new List<Order>();
var parallelQuery = (from o in orders.AsParallel()
    where !o.Processed
    select o.ProductsInOrder.Where(x => x.Quantity >= 5 &&
        x.TheProduct.UnitPrice >= 100.00m &&
        orders.Any(y => y.OrderID == x.OrderID)));
Please see here:
In many scenarios, PLINQ can significantly increase the speed of LINQ to Objects queries by using all available cores on the host computer more efficiently. This increased performance brings high performance computing power onto the desktop
http://msdn.microsoft.com/en-us/library/dd460688.aspx
Move that into one query, but really you should move this into an SSIS package or a SQL job. You could easily make this a stored proc that runs in less than a second.

Generating Columns in GridView ( C#, LINQ )

The situation is that I need to create a table in a grid view that looks like this:
----------| ID---|---Name--|--1/2002--|--2/2002--|--1/2003--|........| 2/2009 |
Cust1--|
Cust2--|
:
:
I have two tables in the db, Customers and Orders, accessed through a LINQ to SQL DataContext.
I'm getting the ID and Name of the customers from a simple query
var custInfo = from cust in db.Customers
select new { ID = cust.Id,
FullName = cust.FirstName + " " + cust.LastName };
dataGridOrdersPreview.DataSource = custInfo;
And I need some clue how to generate the columns in the format t/year, where t indicates the first or second half of the year, and how to assign to each generated column every customer's orders in that half of the year (displaying only costs).
[edit]
So far, I'm attempting something like this:
var orders = from ord in db.Orders
    group ord by ord.Id_cust into grouped
    let costs = grouped
        .Where( s => s.YearSession == session && s.Year == year)
        .Select( a => new { Costs = a.Cost } )
    select new { ID = grouped.Key,
        Name = custInfo
            .Where( a => a.ID == grouped.Key)
            .Select( j => j.FullName).Single(),
        Cost = ExtensionLibrary.Sum(costs, "\n")
    };
(Cost gets only the summed costs in that year session for each customer)
and then I'm thinking about iterating through the years and sessions and somehow getting
the query results into the corresponding columns
while (year <= DateTime.Today.Year)
{
year++;
while (session < 2)
{
session++;
dataGridOrdersPreview.Columns.Add(session +"/"+ year);
col.Add((session +"/"+ year),
orders.Select( a => a.Cost ).ToList() );
/* col is Dictionary<string, List<string> > */
}
session = 0;
}
Here I have generated the columns that I want, and I have the orders in a Dictionary where the Key is the column name and the Value holds the orders in that column, but I need some help binding it to those columns.
The way I've seen it done is to create a class that has the properties that you want, for example,
class CustOrders
{
public string CustName {get; set;}
public int Orders2002_1 {get; set;}
public int Orders2002_2 {get; set;}
...
public int Orders2009_1 {get; set;}
}
Then use the System.Windows.Forms.BindingSource, call it say CustOrdsBindingSource and set its DataSource to a list of your new class.
List<CustOrders> myListOfCustOrders = new List<CustOrders>();
/* Code to populate myListOfCustOrders */
CustOrdsBindingSource.DataSource = myListOfCustOrders;
In this case you will have to write the code to convert each result of your query results to an instance of CustOrders and store it in myListOfCustOrders.
Finally, the grid view's data source will also have to be set:
gridView1.DataSource = CustOrdsBindingSource;
The big problem I see with this approach is that you will have to change the CustOrders class every year, unless there is some voodoo someone can suggest to insert properties into the class at run time.
Either way, I hope this gives you a start.
As long as there will be no updating/adding/deleting of rows, I think I would just generate that grid manually.
Fetch the list of customers and the count of how many sales in what year/session. And then in the form take that list and create the needed columns.
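One way to create the needed columns at run time, without a class that changes every year, is a System.Data.DataTable. The sketch below assumes each order can be reduced to (customer name, year, session, cost); that shape, the OrderPivot class and the Build method are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class OrderPivot
{
    // Builds one row per customer and one column per "session/year" period.
    // Each order is (customer name, year, session 1 or 2, cost); this shape is assumed.
    public static DataTable Build(
        IEnumerable<(string Customer, int Year, int Session, decimal Cost)> orders,
        int firstYear, int lastYear)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        for (int year = firstYear; year <= lastYear; year++)
            for (int session = 1; session <= 2; session++)
                table.Columns.Add(session + "/" + year, typeof(decimal));

        foreach (var customerOrders in orders.GroupBy(o => o.Customer))
        {
            DataRow row = table.NewRow();
            row["Name"] = customerOrders.Key;
            foreach (var period in customerOrders.GroupBy(o => (Year: o.Year, Session: o.Session)))
            {
                string column = period.Key.Session + "/" + period.Key.Year;
                // Periods outside the requested range have no column and are skipped.
                if (table.Columns.Contains(column))
                    row[column] = period.Sum(o => o.Cost);
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

The resulting table could then be assigned to the grid's DataSource; cells for periods without orders stay empty (DBNull).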
