I have an IEnumerable of invoices, and each invoice has line items. Each line item has a priority. I'm programming a variety of strategies to automatically apply cash against these line items, and one is giving me some trouble. My pattern has been to prepare a LINQ statement that orders the line items of the invoices, then iterate over the query applying cash in order until I run out.
An example of this LINQ statement for the simplest strategy, paying each line item by priority and due date, is shown below:
from lineItem in invoices.SelectMany(invoice => invoice.LineItems)
orderby lineItem.Priority, lineItem.DueDate
select lineItem;
One of the strategies is to apply cash to the oldest remaining item with a given priority, in priority order, then move to the next oldest of each priority.
EDIT: Example of how one might start the query I'm asking for -
from lineItem in invoices.SelectMany(invoice => invoice.LineItems)
group lineItem by lineItem.Priority into priorities
orderby priorities.Key
select priorities.OrderBy(item => item.DueDate);
We now have "buckets" of line items with the same priority, ordered by due date within each bucket. I need to extract the first line item from each bucket, followed by the second, and so on, until I have ordered all of the items. I would like to perform this ordering purely in LINQ.
Can anyone think of a way to express this entirely in LINQ?
I don't see how you'll get this down to a better query than what you have; perhaps nest the from clauses so that the SelectMany happens automatically.
var proposedPayments = new List<LineItem>();
decimal cashOnHand = ...;
var query = invoices.SelectMany(iv => iv.LineItems)
.GroupBy(li => li.Priority)
.SelectMany(gg =>
gg.OrderBy(li => li.DueDate)
.Select((li,idx) => Tuple.Create(idx, gg.Key, li)))
.OrderBy(tt => tt.Item1)
.ThenBy(tt => tt.Item2)
.Select(tt => tt.Item3);
foreach (var item in query)
{
if (cashOnHand >= item.Cost)
{
proposedPayments.Add(item);
cashOnHand -= item.Cost;
}
if (cashOnHand == 0m) break;
}
Edit: updated to match the ordering the author described: the first item of each priority is selected, then the second of each, and so on.
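To make the resulting order concrete, here is a minimal, self-contained sketch of the same ranking idea run against sample data. The LineItem and Invoice records, the Name property and the sample dates are hypothetical stand-ins, since the real classes are not shown in the question; value tuples are used in place of Tuple.Create purely for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal types; the question does not show the real classes.
public record LineItem(string Name, int Priority, DateTime DueDate);
public record Invoice(List<LineItem> LineItems);

public static class RoundRobinDemo
{
    // Oldest item of each priority first (in priority order),
    // then the second oldest of each priority, and so on.
    public static IEnumerable<LineItem> OrderRoundRobin(IEnumerable<Invoice> invoices) =>
        invoices.SelectMany(iv => iv.LineItems)
                .GroupBy(li => li.Priority)
                // rank = zero-based position of the item within its priority bucket
                .SelectMany(g => g.OrderBy(li => li.DueDate)
                                  .Select((li, rank) => (rank, li)))
                .OrderBy(t => t.rank)
                .ThenBy(t => t.li.Priority)
                .Select(t => t.li);

    public static void Main()
    {
        var invoices = new List<Invoice>
        {
            new(new List<LineItem>
            {
                new("A", 1, new DateTime(2020, 1, 10)),
                new("B", 2, new DateTime(2020, 1, 5)),
            }),
            new(new List<LineItem>
            {
                new("C", 1, new DateTime(2020, 1, 1)),
                new("D", 2, new DateTime(2020, 2, 1)),
                new("E", 1, new DateTime(2020, 3, 1)),
            }),
        };

        // Prints: C, B, A, D, E
        Console.WriteLine(string.Join(", ", OrderRoundRobin(invoices).Select(li => li.Name)));
    }
}
```

Priority 1 sorts to C, A, E and priority 2 to B, D, so taking one item per bucket per round gives C, B, A, D, E.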
LINQ = Language Integrated QUERY not Language Integrated PROCEDURAL CODE.
If you want a query that returns the line items you need to apply the payment to, then that's do-able (see .Aggregate), but if you want to actually apply the money to the line items as you go, then a foreach loop is a fine construct to use.
See http://blogs.msdn.com/b/ericlippert/archive/2009/05/18/foreach-vs-foreach.aspx
The challenge is to convert a piece of code full of GroupBy calls from method-chain syntax to LINQ query syntax.
The context
To fully understand the topic here you can read the original question (with class definitions, sample data and so on): Linq: rebuild hierarchical data from the flattened list
Thanks to @Akash Kava, I've found the solution to my problem.
Chain method formulation
var macroTabs = flattenedList
.GroupBy(x => x.IDMacroTab)
.Select((x) => new MacroTab
{
IDMacroTab = x.Key,
Tabs = x.GroupBy(t => t.IDTab)
.Select(tx => new Tab {
IDTab = tx.Key,
Slots = tx.Select(s => new Slot {
IDSlot = s.IDSlot
}).ToList()
}).ToList()
}).ToList();
For the sake of learning, though, I tried to convert the method chain to LINQ query syntax, and something is wrong.
What happens is similar to this..
My attempt to convert it to Linq standard syntax
var antiflatten = flattenedList
.GroupBy(x => x.IDMacroTab)
.Select(grouping => new MacroTab
{
IDMacroTab = grouping.Key,
Tabs = (from t in grouping
group grouping by t.IDTab
into group_tx
select new Tab
{
IDTab = group_tx.Key,
Slots = (from s in group_tx
from s1 in s
select new Slot
{
IDSlot = s1.IDSlot
}).ToList()
}).ToList()
});
The result in LinqPad
The classes and the sample data on NetFiddle:
https://dotnetfiddle.net/8mF1qI
This challenge helped me understand what exactly a LINQ group by returns (and how verbose the query syntax becomes with group by).
As LinqPad clearly shows, a group by returns a sequence of groups. A group is a very simple type that exposes just one property: Key.
As this answer states, from the definition of IGrouping (IGrouping<out TKey, out TElement> : IEnumerable<TElement>, IEnumerable), the only way to access the contents of a group is to iterate over its elements (a foreach, another group by, a select, etc.).
Here is shown the Linq syntax formulation of the method chain.
And here is the source code on Fiddle
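As a rough sketch of what a query-syntax version of the method chain can look like: the class shapes below are assumptions inferred from the property names used in the question (the real definitions are in the linked question), and FlatRow is a hypothetical name for the flattened element type. The two changes relative to the failed attempt are that the inner group clause groups the range variable t rather than the outer grouping, and that the innermost query iterates the group directly with no second from clause.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes inferred from the property names in the question.
public class FlatRow { public int IDMacroTab; public int IDTab; public int IDSlot; }
public class Slot { public int IDSlot; }
public class Tab { public int IDTab; public List<Slot> Slots = new(); }
public class MacroTab { public int IDMacroTab; public List<Tab> Tabs = new(); }

public static class AntiFlatten
{
    public static List<MacroTab> Rebuild(IEnumerable<FlatRow> flattenedList) =>
        (from row in flattenedList
         group row by row.IDMacroTab into macroGroup
         select new MacroTab
         {
             IDMacroTab = macroGroup.Key,
             Tabs = (from t in macroGroup
                     group t by t.IDTab into tabGroup   // group the element t, not the outer grouping
                     select new Tab
                     {
                         IDTab = tabGroup.Key,
                         // tabGroup already enumerates rows, so one from clause is enough
                         Slots = (from s in tabGroup
                                  select new Slot { IDSlot = s.IDSlot }).ToList()
                     }).ToList()
         }).ToList();

    public static void Main()
    {
        var rows = new List<FlatRow>
        {
            new() { IDMacroTab = 1, IDTab = 1, IDSlot = 1 },
            new() { IDMacroTab = 1, IDTab = 1, IDSlot = 2 },
            new() { IDMacroTab = 1, IDTab = 2, IDSlot = 3 },
            new() { IDMacroTab = 2, IDTab = 3, IDSlot = 4 },
        };

        foreach (var macroTab in Rebuild(rows))
            foreach (var tab in macroTab.Tabs)
                Console.WriteLine($"MacroTab {macroTab.IDMacroTab} / Tab {tab.IDTab}: {tab.Slots.Count} slot(s)");
    }
}
```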
But let's go on trying to see another solution:
What we usually do in SQL with a GROUP BY is list all the columns except the ones that have been grouped. LINQ is different: it still returns ALL the columns.
In this example we started with a dataset with 3 'columns' {IDMacroTab, IDTab, IDSlot}. We grouped by the first column, but LINQ returns the whole dataset unless we explicitly tell it otherwise.
I have a list of, say, 50 sorted items, a few of which are priority items (assume they have a flag set to 1).
By default, I have to show the latest items (based on date) first, but the priority items should appear after some 'x' number of records, as shown below:
index 0: Item
index 1: Item
index 2: Priority Item (insert priority items from this position)
index 3: Priority Item
index 4: Priority Item
index 5: Item
index 6: Item
The index 'x' at which priority items should be inserted is pre-defined.
To achieve this, I am using the following code.
These are my 50 sorted items:
var list = getMyTop50SortedItems();
Fetch all priority items and store them in another list:
var priorityItems = list.Where(x => x.flag == 1).ToList();
Filter the priority items out of the main list:
list.RemoveAll(x => x.flag == 1);
Insert the priority items into the main list at the given position:
list.InsertRange(1, priorityItems);
This process does the job and gives me the expected result, but I am not sure whether it is the correct way to do it, or whether there is a better way (considering performance).
Please provide your suggestions.
Also, how is performance affected by doing this many operations (filter, remove, insert) as the number of records grows from 50 to 100,000 (or any number)?
Update: How can I use IQueryable to decrease the number of operations on the list?
As per documentation on InsertRange:
This method is an O(n * m) operation, where n is the number of
elements to be added and m is Count.
n*m isn't very good, so I would use LINQ's Concat method to build a whole new sequence from three smaller pieces instead of modifying the existing list.
var allItems = getMyTop50();
var topPriorityItems = allItems.Where(x => x.flag == 1).ToList();
var topNonPriorityItems = allItems.Where(x => x.flag != 1).ToList();
var result = topNonPriorityItems
.Take(constant)
.Concat(topPriorityItems)
.Concat(topNonPriorityItems.Skip(constant));
I am not sure how fast the Concat, Skip and Take methods are for List<T>, but I'd bet they are no slower than O(n).
It seems like the problem you're actually trying to solve is just sorting the list of items. If this is the case, you don't need to be concerned with removing the priority items and reinserting them at the correct index, you just need to figure out your sort ordering function. Something like this ought to work:
// Set "x" to be whatever you want based on your requirements --
// this is the number of items that will precede the "priority" items in the
// sorted list
var x = 3;
var sortedList = list
.Select((item, index) => Tuple.Create(item, index))
.OrderBy(item => {
// If the original position of the item is below whatever you've
// defined "x" to be, then keep the original position
if (item.Item2 < x) {
return item.Item2;
}
// Otherwise, ensure that "priority" items appear first
return item.Item1.flag == 1 ? x + item.Item2 : list.Count + x + item.Item2;
}).Select(item => item.Item1);
You may need to tweak this slightly based on what you're trying to do, but it seems much simpler than removing/inserting from multiple lists.
Edit: Forgot that .OrderBy doesn't provide an overload that provides the original index of the item; updated answer to wrap the items in a Tuple that contains the original index. Not as clean as the original answer, but it should still work.
This can be done using a single enumeration of the original collection using linq-to-objects. IMO this also reads pretty clearly based on the original requirements you defined.
First, define the "buckets" that we'll be sorting into: I like using an enum here for clarity, but you could also just use an int.
enum SortBucket
{
RecentItems = 0,
PriorityItems = 1,
Rest = 2,
}
Then we'll define the logic for which "bucket" a particular item will be sorted into:
private static SortBucket GetBucket(Item item, int position, int recentItemCount)
{
if (position < recentItemCount) // position is zero-based, so this keeps exactly recentItemCount items
{
return SortBucket.RecentItems;
}
return item.IsPriority ? SortBucket.PriorityItems : SortBucket.Rest;
}
And then a fairly straightforward linq-to-objects statement to sort first into the buckets we defined, and then by the original position. Written as an extension method:
static IEnumerable<Item> PrioritySort(this IEnumerable<Item> items, int recentItemCount)
{
return items
.Select((item, originalPosition) => new { item, originalPosition })
.OrderBy(o => GetBucket(o.item, o.originalPosition, recentItemCount))
.ThenBy(o => o.originalPosition)
.Select(o => o.item);
}
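A small usage sketch of the bucket approach, under a few assumptions: the Item record and its Name property are hypothetical (only IsPriority comes from the answer), and position is treated as the zero-based index that Select supplies, so the recent bucket holds the items with position < recentItemCount.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; only IsPriority is taken from the answer above.
public record Item(string Name, bool IsPriority);

public enum SortBucket { RecentItems = 0, PriorityItems = 1, Rest = 2 }

public static class PrioritySortDemo
{
    // position is the zero-based index supplied by Select, so the first
    // recentItemCount items are those with position < recentItemCount.
    private static SortBucket GetBucket(Item item, int position, int recentItemCount) =>
        position < recentItemCount ? SortBucket.RecentItems
        : item.IsPriority ? SortBucket.PriorityItems
        : SortBucket.Rest;

    public static IEnumerable<Item> PrioritySort(this IEnumerable<Item> items, int recentItemCount) =>
        items.Select((item, originalPosition) => new { item, originalPosition })
             .OrderBy(o => GetBucket(o.item, o.originalPosition, recentItemCount))
             .ThenBy(o => o.originalPosition)
             .Select(o => o.item);

    public static void Main()
    {
        var items = new List<Item>
        {
            new("A", false), new("B", false), new("C", false),
            new("P1", true), new("D", false), new("P2", true),
        };

        // Prints: A, B, P1, P2, C, D
        Console.WriteLine(string.Join(", ", items.PrioritySort(2).Select(i => i.Name)));
    }
}
```

Note that a priority item that already sits within the first recentItemCount positions stays where it is, because the position check comes first.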
Background: I have two Collections of different types of objects with different name properties (both strings). Objects in Collection1 have a field called Name, objects in Collection2 have a field called Field.
I needed to compare these 2 properties, and get items from Collection1 where there is not a match in Collection2 based on that string property (Collection1 will always have a greater or equal number of items. All items should have a matching item by Name/Field in Collection2 when finished).
The question: I've found answers using Lists and they have helped me a little (for what it's worth, I'm using Collections). I did find this answer, which appears to be working for me; however, I would like to convert what I've done from query syntax (if that's what it's called?) to LINQ method syntax. See below:
//Query for results. This code is what I'm specifically trying to convert.
var result = (from item in Collection1
where !Collection2.Any(x => x.ColumnName == item.FieldName)
select item).ToList();
//** Remove items in result from Collection1**
//...
I'm really not at all familiar with either syntax (working on it), but I think I generally understand what this is doing. I'm struggling to convert it to method syntax, though, and I'd like to learn both of these options rather than resort to some sort of nested loop.
End goal after I remove the query results from Collection1: Collection1.Count == Collection2.Count, and the following is true for each item in the collection: ItemFromCollection1.Name == SomeItemFromCollection2.Field (if that makes sense...)
You can convert this to LINQ methods like this:
var result = Collection1.Where(item => !Collection2.Any(x => x.ColumnName == item.FieldName))
.ToList();
Your first query is the opposite of what you asked for. It's finding records that don't have an equivalent. The following will return all records in Collection1 where there is an equivalent:
var results = Collection1.Where(c1 => Collection2.Any(c2 => c2.Field == c1.Name));
Please note that this isn't the fastest approach, especially if there is a large number of records in collection2. You can find ways of speeding it up through HashSets or Lookups.
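One way the HashSet idea might look, as a sketch only: Source and Target are hypothetical element types standing in for the items of Collection1 and Collection2, with the Name and Field properties described in the question. Building the set is O(m), and each Contains check is then O(1) on average instead of a scan of Collection2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element types for Collection1 and Collection2.
public record Source(string Name);
public record Target(string Field);

public static class Unmatched
{
    // Returns the items of collection1 whose Name has no matching Field in collection2.
    public static List<Source> Find(IEnumerable<Source> collection1, IEnumerable<Target> collection2)
    {
        // Collect the Field values once so that each lookup is a hash probe.
        var knownFields = new HashSet<string>(collection2.Select(c2 => c2.Field));
        return collection1.Where(c1 => !knownFields.Contains(c1.Name)).ToList();
    }

    public static void Main()
    {
        var collection1 = new List<Source> { new("Id"), new("Title"), new("Legacy") };
        var collection2 = new List<Target> { new("Id"), new("Title") };

        // Prints: Legacy
        Console.WriteLine(string.Join(", ", Find(collection1, collection2).Select(s => s.Name)));
    }
}
```

Dropping the negation in the Where clause returns the items that do have a match instead.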
If you want to keep only the values that are not duplicated, do the following:
List<string> listNonDup = new List<String>{"6","1","2","4","6","5","1"};
var singles = listNonDup.GroupBy(n => n)
.Where(g => g.Count() == 1)
.Select(g => g.Key).ToList();
Yields: 2, 4, 5
If you want a list of all the duplicate values, you can do the opposite:
var duplicatesxx = listNonDup.GroupBy(s => s)
.SelectMany(g => g.Skip(1)).ToList();
I have a controller as below, and it takes too long to load the data. I am using the Contains and ToList() methods, and I have heard that ToList() performs poorly.
How can I change this approach to get better performance?
public List<decimal> GetOrgSolution()
{
//Need to use USER id. but we have EMPNO in session.
var Users = db.CRM_USERS.Where(c => c.ID == SessionCurrentUser.ID || RelOrgPerson.Contains(c.EMPNO.Value)).Select(c => c.ID);
//Get the organization list regarding to HR organization
var OrgList = db.CRM_SOLUTION_ORG.Where(c => c.USER_ID == SessionCurrentUser.ID || Users.Contains(c.USER_ID.Value)).Select(c => c.ID).ToList();
//Get related solutions ID with the OrgList
List<decimal> SolutionList = db.CRM_SOLUTION_OWNER.Where(p => OrgList.Contains(p.SOLUTION_ORG_ID.Value)).Select(c => (decimal)c.SOLUTION_ID).Distinct().ToList();
return SolutionList;
}
You might be able to speed this up by dropping the ToList() from the orglist query. This uses deferred execution, rather than pulling all the records for the org list. However, if there is no match on the query that calls Contains(), it will still have to load everything.
public List<decimal> GetOrgSolution()
{
//Need to use USER id. but we have EMPNO in session.
var Users = db.CRM_USERS.Where(c => c.ID == SessionCurrentUser.ID || RelOrgPerson.Contains(c.EMPNO.Value)).Select(c => c.ID);
//Get the organization list regarding to HR organization
var OrgList = db.CRM_SOLUTION_ORG.Where(c => c.USER_ID == SessionCurrentUser.ID || Users.Contains(c.USER_ID.Value)).Select(c => c.ID);
//Get related solutions ID with the OrgList
List<decimal> SolutionList = db.CRM_SOLUTION_OWNER.Where(p => OrgList.Contains(p.SOLUTION_ORG_ID.Value)).Select(c => (decimal)c.SOLUTION_ID).Distinct().ToList();
return SolutionList;
}
Unless the lists you're working with are really huge, it's highly unlikely that calling ToList is the major bottleneck in your code. I'd be much more inclined to suspect the database (assuming you're doing LINQ-to-SQL). Or, your embedded Contains calls. You have, for example:
db.CRM_SOLUTION_ORG.Where(
c => c.USER_ID == SessionCurrentUser.ID || Users.Contains(c.USER_ID.Value))
So for every item in db.CRM_SOLUTION_ORG that fails the test against SessionCurrentUser, you're going to do a sequential search of the Users list.
Come to think of it, because Users is lazily evaluated, you're going to execute that Users query every time you call Users.Contains. It looks like your code would be much more efficient in this case if you called ToList() on the Users. That way the query is only executed once.
And you probably should keep the ToList() on the OrgList query. Otherwise you'll be re-executing that query every time you call OrgList.Contains.
That said, if Users or OrgList could have a lot of items, then you'd be better off turning them into HashSets so that you get O(1) lookup rather than O(n) lookup.
But looking at your code, it seems like you should be able to do all of this with a single query using joins, and let the database server take care of it. I don't know enough about Linq to SQL or your data model to say for sure, but from where I'm standing it sure looks like a simple joining of three tables.
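As a rough illustration of the join idea, here is a sketch that uses in-memory lists as stand-ins for the three tables. The record types are hypothetical and carry only the columns that appear in the question, and the sketch assumes the current user has a row in CRM_USERS, so the separate USER_ID == SessionCurrentUser.ID test is covered by the join. Against a real LINQ-to-SQL or Entity Framework context the same query shape would be written over the db sets, and how it translates to SQL would need to be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the three tables, with only the columns used in the question.
public record User(decimal ID, decimal? EMPNO);
public record SolutionOrg(decimal ID, decimal? USER_ID);
public record SolutionOwner(decimal? SOLUTION_ORG_ID, decimal SOLUTION_ID);

public static class JoinSketch
{
    public static List<decimal> GetOrgSolution(
        IEnumerable<User> users,
        IEnumerable<SolutionOrg> orgs,
        IEnumerable<SolutionOwner> owners,
        decimal currentUserId,
        ICollection<decimal> relOrgPerson) =>
        (from u in users
         // the current user plus everyone related through EMPNO
         where u.ID == currentUserId
               || (u.EMPNO.HasValue && relOrgPerson.Contains(u.EMPNO.Value))
         // organizations owned by any of those users
         join o in orgs on (decimal?)u.ID equals o.USER_ID
         // solutions attached to those organizations
         join s in owners on (decimal?)o.ID equals s.SOLUTION_ORG_ID
         select s.SOLUTION_ID).Distinct().ToList();

    public static void Main()
    {
        var users = new List<User> { new(1m, 100m), new(2m, 200m), new(3m, 300m) };
        var orgs = new List<SolutionOrg> { new(10m, 1m), new(20m, 2m), new(30m, 3m) };
        var owners = new List<SolutionOwner>
        {
            new(10m, 1000m), new(20m, 2000m), new(20m, 2000m), new(30m, 3000m),
        };

        var result = GetOrgSolution(users, orgs, owners, 1m, new HashSet<decimal> { 200m });

        // Prints: 1000, 2000
        Console.WriteLine(string.Join(", ", result));
    }
}
```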
Consider following code snippet
List<Order> orderList; // This list is pre-populated
foreach (System.Web.UI.WebControls.ListItem item in OrdersChoiceList.Items) // OrdersChoiceList is of type System.Web.UI.WebControls.CheckBoxList
{
foreach (Order o in orderList)
{
if (item.id == o.id)
{
item.Selected = scopeComputer.SelectedBox;
break;
}
}
}
There are thousands of items in the list, so these loops are time-consuming. How can we optimize this?
Also, how can we do the same thing with LINQ? I tried using a join, but I was not able to set the value of "Selected" based on "SelectedBox". For now I have hardcoded the value in the select clause to "true"; how can we pass and use the SelectedBox value in the select clause?
var v = (from c in ComputersChoiceList.Items.Cast<ListItem>()
join s in scopeComputers on c.Text equals s.CName
select c).Select(x=>x.Selected = true);
I think you need to eliminate the nested iteration. As you state, both lists have a large set of items. If they both have 5,000 items, then you're looking at 25,000,000 iterations in the worst case.
There's no need to continually re-iterate orderList for every single ListItem. Instead create an ID lookup so you have fast O(1) lookups for each ID. Not sure what work is involved hitting scopeComputer.SelectedBox, but that may as well be resolved once outside the loop as well.
bool selectedState = scopeComputer.SelectedBox;
HashSet<int> orderIDs = new HashSet<int>(orderList.Select(o => o.id));
foreach (System.Web.UI.WebControls.ListItem item in OrdersChoiceList.Items)
{
if (orderIDs.Contains(item.id))
item.Selected = selectedState;
}
Using a HashSet lookup, you're now really only iterating 5,000 times plus a super-fast lookup.
EDIT: From what I can tell, there's no id property on ListItem, but I'm assuming that the code you've posted is condensed for brevity, but largely representative of your overall process. I'll keep my code API/usage to match what you have there; I'm assuming it's translatable back to your specific implementation.
EDIT: Based on your edited question, I think you're doing yet another lookup/iteration on retrieving the scopeComputer reference. Similarly, you can make another lookup for this:
HashSet<int> orderIDs = new HashSet<int>(orderList.Select(o => o.id));
Dictionary<string, bool> scopeComputersSelectedState =
scopeComputers.ToDictionary(s => s.CName, s => s.Selected);
foreach (System.Web.UI.WebControls.ListItem item in OrdersChoiceList.Items)
{
if (orderIDs.Contains(item.id))
item.Selected = scopeComputersSelectedState[item.Text];
}
Again, I'm not sure of the exact types/usage you have. You could also condense this down to a single LINQ query, but I don't think you will see much of an improvement in performance. I'm also assuming that there is a matching ScopeComputer for every ListItem.Text entry; otherwise you'll get an exception when accessing scopeComputersSelectedState[item.Text]. If not, it should be a trivial exercise to change it to perform a TryGetValue lookup instead.
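For completeness, a minimal sketch of the TryGetValue variant. ChoiceItem is a hypothetical stand-in for System.Web.UI.WebControls.ListItem so that the example runs without the WebForms assemblies, and ScopeComputer is assumed to expose only CName and Selected, as used in the code above. Items with no matching entry are simply left untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for System.Web.UI.WebControls.ListItem.
public class ChoiceItem
{
    public string Text = "";
    public bool Selected;
}

// Assumed shape of the scope computer entries (CName and Selected only).
public record ScopeComputer(string CName, bool Selected);

public static class SelectionSync
{
    public static void Apply(IEnumerable<ChoiceItem> items, IEnumerable<ScopeComputer> scopeComputers)
    {
        // Build the lookup once; ToDictionary throws if two entries share a CName.
        var selectedByName = scopeComputers.ToDictionary(s => s.CName, s => s.Selected);

        foreach (var item in items)
        {
            // TryGetValue avoids the KeyNotFoundException the indexer would throw
            // for an item that has no matching scope computer.
            if (selectedByName.TryGetValue(item.Text, out bool selected))
                item.Selected = selected;
        }
    }

    public static void Main()
    {
        var items = new List<ChoiceItem>
        {
            new() { Text = "pc-01" }, new() { Text = "pc-02" }, new() { Text = "unknown" },
        };
        var scopeComputers = new List<ScopeComputer> { new("pc-01", true), new("pc-02", false) };

        Apply(items, scopeComputers);

        foreach (var item in items)
            Console.WriteLine($"{item.Text}: {item.Selected}");
    }
}
```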