I am wondering what is recommended in the following scenario:
I have a large loop that I traverse to get an ID which I then store in a database like so:
foreach (var rate in rates)
{
    // get the ID that matches the rate name
    Guid id = dbContext.DifferentEntity
        .Where(x => x.Name == rate.Name)
        .Select(x => x.Id)
        .FirstOrDefault();
    // create a new object with the newly discovered
    // ID to insert into the database
    dbContext.YetAnotherEntity.Add(new YetAnotherEntity
    {
        Id = Guid.NewGuid(),
        DiffId = id,
    });
}
Would it be better/faster to do this instead (first load all of the DifferentEntity rows, rather than querying for them separately)?
List<DifferentEntity> differentEntities = dbContext.DifferentEntity.ToList();
foreach (var rate in rates)
{
    // get the ID that matches the rate name
    Guid id = differentEntities
        .Where(x => x.Name == rate.Name)
        .Select(x => x.Id)
        .FirstOrDefault();
    // create a new object with the newly discovered
    // ID to insert into the database
    dbContext.YetAnotherEntity.Add(new YetAnotherEntity
    {
        Id = Guid.NewGuid(),
        DiffId = id,
    });
}
Is the difference negligible or is this something I should consider? Thanks for your advice.
Store your rate names in a sorted string array (string[]) instead of a List or Collection, then use Array.BinarySearch() to make the search much faster. The rest of what I was going to write has already been written in Felipe's answer.
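A minimal sketch of that idea, applied to the lookup from the question (entity and property names reused from the question's code; the sorted array holds the DifferentEntity names):
// pull the Name/Id pairs once, then binary search per rate
var pairs = dbContext.DifferentEntity
    .Select(x => new { x.Name, x.Id })
    .ToList();
string[] names = pairs.Select(p => p.Name).ToArray();
Guid[] ids = pairs.Select(p => p.Id).ToArray();
Array.Sort(names, ids); // sort by name, keeping the id array aligned
foreach (var rate in rates)
{
    int index = Array.BinarySearch(names, rate.Name);
    if (index >= 0) // a negative index means "not found"
    {
        dbContext.YetAnotherEntity.Add(new YetAnotherEntity
        {
            Id = Guid.NewGuid(),
            DiffId = ids[index],
        });
    }
}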
Run them horses! There is really a lot we do not know. Is it possible to keep all the entities in memory? How many of them are duplicates with respect to Name?
A simplistic solution with one fetch from the database and usage of parallelism:
// Fetch entities (EqualityComparerForNameProperty stands in for an
// IEqualityComparer<DifferentEntity> that compares the Name property)
var entitiesDict = dbContext.DifferentEntity
    .Distinct(EqualityComparerForNameProperty).ToDictionary(e => e.Name);
// Create the new ones real quick and divide into groups of 500
// (cause that horse wins in my environment with complex entities,
// maybe 5 000 or 50 000 fits your scenario better since they are not that complex?)
var groupedEnts = rates.AsParallel().AsOrdered() // AsOrdered keeps the indexes adjacent for GroupAdjacent below
    .Select((rate, index) => new
    {
        Value = new YetAnotherEntity
        { Id = Guid.NewGuid(), DiffId = entitiesDict[rate.Name].Id, },
        Index = index
    })
    .GroupAdjacent(anon => anon.Index / 500) // integer division, and note GroupAdjacent! (from MoreLinq, not GroupBy)
    .Select(group => group.Select(anon => anon.Value)); // do the select so we get the ienumerables
// Now we have to add them to the database
Parallel.ForEach(groupedEnts, ents => {
using (var db = new DBCONTEXT()) // your dbcontext
{
foreach(var ent in ents)
db.YetAnotherEntity.Add(ent);
db.SaveChanges();
}
});
In general, in database scenarios the expensive operations are the fetches and the commits, so try to keep them to a minimum.
You can decrease the number of queries you run against the database. For example, take all the names and run a single query that finds the Ids where the name is contained in that list.
Try something like this.
// get all the names you have in the rates list...
var rateNames = rates.Select(x => x.Name).ToList();
// query all the Ids you need with a Contains on the names list... 1 query, 1 column (Id, I imagine)
var ids = dbContext.DifferentEntity
    .Where(x => rateNames.Contains(x.Name))
    .Select(x => x.Id)
    .ToList();
// loop over the Ids in the result, and add one by one
foreach (var id in ids)
    dbContext.YetAnotherEntity.Add(new YetAnotherEntity
    {
        Id = Guid.NewGuid(),
        DiffId = id,
    });
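If several rates can share the same name, keeping a name-to-Id dictionary preserves the per-rate mapping; a sketch reusing the same names from the snippets above:
// one query projecting only the columns needed, then O(1) lookups per rate
var idsByName = dbContext.DifferentEntity
    .Where(x => rateNames.Contains(x.Name))
    .ToDictionary(x => x.Name, x => x.Id);
foreach (var rate in rates)
{
    Guid id;
    if (idsByName.TryGetValue(rate.Name, out id))
    {
        dbContext.YetAnotherEntity.Add(new YetAnotherEntity
        {
            Id = Guid.NewGuid(),
            DiffId = id,
        });
    }
}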
Related
I'm running into a problem when updating some data via EF.
Let's say I have a table in my database:
Table T (ID int, Rank int, Name varchar)
I have a unique key constraint on Rank.
For example, I have this data in the table:
ID  Rank  Name
1   1     Joe
2   2     Ann
3   5     Mark
4   7     Sam
My C# object is something like this: Person (name, rank), so on the front end, a user wants to switch the rank of Joe and Mark.
When I make the update via EF, I get an error because of the unique key.
I suspect it is because dbContext.SaveChanges issues the updates in this style:
UPDATE T SET Rank = 5 WHERE Name = 'Joe'
UPDATE T SET Rank = 1 WHERE Name = 'Mark'
With a SQL query I can perform this update by doing this:
Pass in User Defined table (rank, name) from C# side into query and then:
update T
set T.Rank = Updated.Rank
from T
inner join #UserDefinedTable Updated on T.Name = Updated.Name
and this does not trigger the unique key constraint
However I want to use EF for this operation, what do I do?
I've thought of these other solutions so far:
Delete old records, add "new" records from updated objects via EF
Dropping the unique constraint on database and writing a C# function to do the job of the unique constraint
Just use a SQL query like the example above instead of EF
Note: the table structure and data I used above is just an example
Any ideas?
Idea - you could make it a two-step operation (wrapped in a single transaction):
1) set the values for all entities that have to be updated to negative values (Joe, -1; Mark, -5)
2) set the correct values (Joe, 5; Mark, 1)
SQL Server's equivalent:
SELECT 1 AS ID, 1 AS [rank], 'Joe' AS name INTO t
UNION SELECT 2,2,'Ann'
UNION SELECT 3,5,'Mark'
UNION SELECT 4,7,'Sam';
CREATE UNIQUE INDEX uq ON t([rank]);
SELECT * FROM t;
/* Approach 1
UPDATE t SET [rank] = 5 where Name = 'Joe';
UPDATE t SET [rank] = 1 where Name = 'Mark';
Cannot insert duplicate key row in object 'dbo.t' with unique index 'uq'.
The duplicate key value is (5). Msg 2601 Level 14 State 1 Line 2
Cannot insert duplicate key row in object 'dbo.t' with unique index 'uq'.
The duplicate key value is (1).
*/
BEGIN TRAN
-- step 1
UPDATE t SET [rank] = -[rank] where Name = 'Joe';
UPDATE t SET [rank] = -[rank] where Name = 'Mark';
-- step 2
UPDATE t SET [rank] = 5 where Name = 'Joe';
UPDATE t SET [rank] = 1 where Name = 'Mark';
COMMIT;
db<>fiddle demo
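The same two-step trick can be written with EF itself; a rough sketch, assuming an EF6-style context (where Database.BeginTransaction() is available) and a People set with the Name/Rank properties from the example:
using (var tx = dbContext.Database.BeginTransaction())
{
    var joe = dbContext.People.Single(p => p.Name == "Joe");
    var mark = dbContext.People.Single(p => p.Name == "Mark");
    // step 1: park both rows on negative values that cannot collide
    joe.Rank = -joe.Rank;
    mark.Rank = -mark.Rank;
    dbContext.SaveChanges();
    // step 2: write the final values
    joe.Rank = 5;
    mark.Rank = 1;
    dbContext.SaveChanges();
    tx.Commit();
}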
You have focused a lot on the SQL side of this, but you can do the same thing in pure EF.
It will help next time to include your EF code so we can give you a more specific answer.
NOTE: do not use this logic in EF in scenarios where large sets of data will exist, as the ReOrder process loads all records into memory. It is, however, useful for managing ordinality in child or sub-lists that are scoped by an additional filter clause (so not for a whole table!)
The isolated ReOrder process is a good candidate on its own to move to the DB as a stored procedure if you need to do unique ranking logic across the entire table.
There are two main variations here (for Unique Values):
Rank must always be sequential/contiguous
This simplifies insert and replace logic, but you likely have to manage add, insert, swap and delete scenarios in the code.
Code to move items up and down in rank is very easy to implement
MUST manage deletes to re-compute the rank for all items
Rank can have gaps (not all values are contiguous)
This sounds like it should be easier, but evaluating moves up and down the list means you have to take the gaps into account.
I won't post the code for this variation, but be aware it is usually more complicated to maintain.
On the flip side you don't need to worry about actively managing deletes.
I use the following routine when the ordinal needs to be managed.
NOTE: this routine does not save the changes; it simply loads all the records that might be affected into memory so that we can correctly process the new ranking.
public static void ReOrderTableRecords(Context db)
{
    // By convention do not allow the DB to do the ordering. A query such as
    //     db.Table.OrderBy(x => x.Rank).ToList()
    // will load missing DB values into the current dbContext, but will not
    // replace the objects that are already loaded, so it would be ordered by
    // the original DB values. Instead we want to order by the current modified
    // values in the db context. This is a very important distinction, which is
    // why I have left this comment in place.
    // So, load from the DB into memory and then order:
    //     db.Table[.Where(...optional filter by parentId...)].ToList().OrderBy(x => x.Rank)
    // NOTE: in this implementation we must also ensure that we don't include
    // items that have been flagged for deletion, or items parked at a negative
    // Rank (see UpdateRank below).
    var currentValues = db.Table.ToList()
        .Where(x => db.Entry(x).State != EntityState.Deleted && x.Rank >= 0)
        .OrderBy(x => x.Rank);
    int rank = 1;
    foreach (var item in currentValues)
        item.Rank = rank++;
}
Let's say you can reduce your code to a function that inserts a new item with a specific Rank into the list, or that swaps the rank of two items in the list:
public static Table InsertItem(Context db, Table item, int? Rank = 1)
{
    // Rank is optional; when supplied it overrides item.Rank
    if (Rank.HasValue)
        item.Rank = Rank.Value;
    // Default to first item in the list as 1
    if (item.Rank <= 0)
        item.Rank = 1;
    // re-order first; this will ensure no gaps.
    // NOTE: the new item has not been added to the collection yet
ReOrderTableRecords(db);
var items = db.Table.ToList()
.Where(x => db.Entry(x).State != EntityState.Deleted)
.Where(x => x.Rank >= item.Rank);
if (items.Any())
{
foreach (var i in items)
i.Rank = i.Rank + 1;
}
else if (item.Rank > 1)
{
// special case
// either ReOrderTableRecords(db) again... after adding the item to the table
item.Rank = db.Table.ToList()
.Where(x => db.Entry(x).State != EntityState.Deleted)
.Max(x => x.Rank) + 1;
}
db.Table.Add(item);
db.SaveChanges();
return item;
}
/// <summary> call this when the Rank value has been changed on a single row </summary>
public static void UpdateRank(Context db, Table item)
{
    var rank = item.Rank;
    item.Rank = -1; // park this item outside the list (ReOrder ignores negative Ranks) so it doesn't affect the ranking
    ReOrderTableRecords(db); // ensure no gaps
    // use the insert logic: shift everything at or below the target rank down by one
    var items = db.Table.ToList()
        .Where(x => db.Entry(x).State != EntityState.Deleted)
        .Where(x => x.Rank >= rank);
    foreach (var i in items)
        i.Rank = i.Rank + 1;
    item.Rank = rank;
    db.SaveChanges();
}
public static void SwapItemsByIds(Context db, int item1Id, int item2Id)
{
var item1 = db.Table.Single(x => x.Id == item1Id);
var item2 = db.Table.Single(x => x.Id == item2Id);
var rank = item1.Rank;
item1.Rank = item2.Rank;
item2.Rank = rank;
db.SaveChanges();
}
public static void MoveUpById(Context db, int item1Id)
{
var item1 = db.Table.Single(x => x.Id == item1Id);
var rank = item1.Rank - 1;
if (rank > 0) // Rank 1 is the highest
{
var item2 = db.Table.Single(x => x.Rank == rank);
item2.Rank = item1.Rank;
item1.Rank = rank;
db.SaveChanges();
}
}
public static void MoveDownById(Context db, int item1Id)
{
var item1 = db.Table.Single(x => x.Id == item1Id);
var rank = item1.Rank + 1;
var item2 = db.Table.SingleOrDefault(x => x.Rank == rank);
if (item2 != null) // if item2 is null, item1 is already the lowest rank
{
item2.Rank = item1.Rank;
item1.Rank = rank;
db.SaveChanges();
}
}
To ensure that Gaps are not introduced, you should call ReOrder after removing items from the table, but before calling SaveChanges()
Alternatively call ReOrder before each of Swap/MoveUp/MoveDown similar to insert.
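For deletes, a minimal sketch following the same pattern (a hypothetical helper, not part of the routines above; it relies on ReOrder skipping entities flagged as Deleted):
public static void DeleteItemById(Context db, int itemId)
{
    var item = db.Table.Single(x => x.Id == itemId);
    db.Table.Remove(item); // flags the entity as Deleted in the change tracker
    // ReOrderTableRecords skips Deleted entities, so the remaining rows
    // are re-packed into a contiguous 1..n sequence before saving
    ReOrderTableRecords(db);
    db.SaveChanges();
}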
Keep in mind that it is far simpler to allow duplicate Rank values, especially for large lists of data, but your business requirements will determine if this is a viable solution.
I have a table, containing weekly sales data from multiple years for a few hundred products.
Simplified, I have 3 columns: ProductID, Quantity, [and Date (week/year), not relevant for the question]
In order to process the data, I want to fetch everything using LINQ. In the next step I would like to create a List of objects for the sales data, where an object consists of the ProductId and an array of the corresponding sales data.
EDIT: directly afterwards, I will process all the retrieved data product-by-product in my program by passing the sales as an array to a statistics package (R with R.NET) in order to get predictions.
Is there a simple (built in) way to accomplish this?
If not, in order to process the sales product by product,
should I just create the mentioned List using a loop?
Or should I, in terms of performance, avoid that altogether and:
Fetch the sales data product-by-product from the database as I need it?
Or should I make one big List (with query.toList()) from the resultset and get my sales data product-by-product from there?
erm, something like
var groupedByProductId = query.GroupBy(p => p.ProductId).Select(g => new
{
ProductId = g.Key,
Quantity = g.Sum(p => p.Quantity)
});
or perhaps, if you don't want to sum and instead need the quantities as an array of int ordered by Date:
var groupedByProductId = query.GroupBy(p => p.ProductId).Select(g => new
{
ProductId = g.Key,
Quantities = g.OrderBy(p => p.Date).Select(p => p.Quantity).ToArray()
});
or maybe you need to pass the data around and an anonymous type is inappropriate; you could make an IDictionary<int, int[]>:
var salesData = query.GroupBy(p => p.ProductId).ToDictionary(
g => g.Key,
g => g.OrderBy(p => p.Date).Select(p => p.Quantity).ToArray());
so later,
int productId = ...
int[] orderedQuantities = salesData[productId];
would be valid code (less the ellipsis.)
You may create a Product class with an id and a list of int data. Something like below:
public class Product
{
    public List<int> list = new List<int>();
    public int Id;

    public Product(int id, params int[] data)
    {
        Id = id;
        for (int i = 0; i < data.Length; i++)
        {
            list.Add(data[i]);
        }
    }
}
Then use (note Select, not Where, and the switch to LINQ to Objects via AsEnumerable, since EF cannot call a parameterized constructor inside a query):
query.AsEnumerable().Select(x => new Product(x.ProductId, x.datum1, x.datum2, x.datum3));
I have a list of Orders. This list contains multiple orders for the same item, see the table below.
I then want to assign each order with the same Order ID the same block ID. So each ABC would get a block ID of 1, each GHJ a block ID of 2, and so on. What is the best way of doing this?
Currently I order the list by Order ID, then loop through it and check whether the current Order ID equals the next Order ID; if so, I assign the two the same block ID. Is there a better way of doing this using LINQ or some other approach?
Order ID Block ID
ABC
ABC
ABC
GHJ
GHJ
GHJ
MNO
MNO
You can do it this way; it will assign the same BlockId to orders with the same OrderId:
var grouped = listOrder.GroupBy(x => x.OrderId).ToList();
for (int i = 0; i < grouped.Count; i++)
{
    foreach (var order in grouped[i])
        order.BlockId = i + 1;
}
It will group the orders by OrderId and then assign each group the next BlockId. Note that it won't be done fully in LINQ, because LINQ is for querying, not changing, data.
It always depends on what "better" means for you in this context.
There are a bunch of possible solutions to this trivial problem.
Off the top of my head, I can think of:
var blockId = 1;
foreach(var grp in yourOrders.GroupBy(o => o.OrderId))
{
foreach(var order in grp)
{
order.BlockId = blockId;
}
blockId++;
}
or (be more "linqy"):
foreach(var t in yourOrders.GroupBy(o => o.OrderId).Zip(Enumerable.Range(1, Int32.MaxValue), (grp, bid) => new {grp, bid}))
{
foreach(var order in t.grp)
{
order.BlockId = t.bid;
}
}
or (can you still follow the code?):
var orders = yourOrders.GroupBy(o => o.OrderId)
.Zip(Enumerable.Range(1, Int32.MaxValue), (grp, id) => new {orders = grp, id})
.SelectMany(grp => grp.orders, (grp, order) => new {order, grp.id});
foreach(var item in orders)
{
item.order.BlockId = item.id;
}
or (probably the closest to a simple for loop):
Order prev = null;
var blockId = 1;
foreach (var order in yourOrders.OrderBy(o => o.OrderId))
{
order.BlockId = (prev == null || prev.OrderId == order.OrderId) ?
blockId :
++blockId;
prev = order;
}
Linq? Yes.
Better than a simple loop? Uhmmmm....
Using Linq will not magically make your code better. Surely, it can often make it more declarative/readable/faster (in terms of lazy evaluation), but you can just as surely make otherwise fine imperative loops unreadable if you force the use of Linq just because it's Linq.
As a side note:
if you want to have feedback on working code, you can ask at codereview.stackexchange.com
What I want is better explained with code. I have this query:
var items = context.Items.GroupBy(i => new { i.Name, i.Model })
    .Where(/*...*/)
    .Select(g => new ItemModel
    {
        Name = g.Key.Name,
        SerialNumber = g.FirstOrDefault().SerialNumber //<-- here
    });
Is there a better way to get the serial number or some other property that is not used in the key? The only way I could think of is to use FirstOrDefault.
Why not just include the serial number as part of the key via the anonymous type you're declaring:
var items = context.Items.GroupBy(i => new { i.Name, i.Model, i.SerialNumber })
    .Where(/*...*/)
    .Select(g => new ItemModel
    {
        Name = g.Key.Name,
        SerialNumber = g.Key.SerialNumber //<-- now part of the key
    });
Or, alternatively, make your object the key:
var items = context.Items.Where(...).GroupBy(g => g)
.Select(i => new ItemModel {...});
Sometimes it can be easier to comprehend the query syntax (here, I've projected the Item object as part of the key):
var items = from i in context.Items
            group i by new { Serial = i.SerialNumber, Item = i } into gi
            // where ... (filter on gi as needed)
            select new ItemModel
            {
                Name = gi.Key.Item.Name,
                SerialNumber = gi.Key.Serial
                /*...*/
            };
EDIT: you could try grouping after projection like so:
var items = context.Items.Where(/*...*/).Select(i => new ItemModel { /*...*/})
.GroupBy(g => new { g.Name, g.Model });
you get an IEnumerable<IGrouping<TAnonymousKey, ItemModel>> from this, with your arbitrary group-by as the key and your ItemModels as the grouped collection.
I would strongly advise against what you're doing. The serial number is being chosen arbitrarily, since you do no ordering in your queries. It would be better to specify exactly which serial number to choose, so there are no surprises if the queries return items in a different order than "last time".
With that said, I think it would be cleaner to project the grouping, select the fields you need, and take the first result. They will all have the same key values, so those stay the same; then you can add on any other fields you want.
var items = context.Items.GroupBy(i => new { i.Name, i.Model })
.Where(/*...*/)
.Select(g =>
g.OrderBy(i => i.SerialNumber).Select(i => new ItemModel // order explicitly so "first" is well-defined
{
Name = i.Name,
SerialNumber = i.SerialNumber,
}).FirstOrDefault()
);
Since you need all the data, you need to store all the group data into your value (in the KeyValuePair).
I don't have the exact syntax in front of me, but it would look like:
/* ... */
.Select(g => new {
Key = g.Key,
Values = g
});
After that, you can loop through the results to get each Key (your Name group). Inside that loop, include a loop through the Values to get your ItemModel (I guess that's the object containing one element).
It would look like:
foreach (var g in items)
{
Console.WriteLine("List of SerialNumber in {0} group", g.Key);
foreach (var i in g.Values)
{
Console.WriteLine(i.SerialNumber);
}
}
Hope this helps!
You might want to look at Linq 101 samples for some help on different queries.
If the serial number is unique to the name and model, you should include it in your group-by object.
If it is not, then you have a list of serials per name and model, and selecting FirstOrDefault is probably plain wrong; that is, I can think of no scenario where you would want this.
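If you do need every serial per name and model, a short sketch of projecting the groups to lists (same entity names as the question):
var serialsByGroup = context.Items
    .GroupBy(i => new { i.Name, i.Model })
    .Select(g => new
    {
        g.Key.Name,
        g.Key.Model,
        SerialNumbers = g.Select(i => i.SerialNumber).ToList()
    });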
Here's my situation - I've got a DB which has some tables named recipes, ingredients and recipes_ingredients.
Recipes are composed of 1+ ingredients.
The recipes_ingredients has FKs between the recipes and ingredients table.
The classes that get generated are recipe and ingredient and recipe has a navigation property that looks like so:
public virtual ICollection<ingredients> ingredients { get; set; }
Great - I understand that I get a generated recipe class and a generated ingredient class, and that the recipes_ingredients table doesn't get a class generated, since EF treats it simply as a navigation property.
Now, I've got a function called SetIngredientsForRecipe that looks like so (minus the try-catch code for brevity's sake):
public void SetIngredientsForRecipe(long recipeId, List<string> ingredients)
{
using (var db = new FoodEntities(ConnectionString, null, null))
{
var existing = GetCurrentIngredients(recipeId);
var toRemove = existing.Except(ingredients);
var toAdd = ingredients.Except(existing);
var recipe = db.recipes.Where(r => r.Id == recipeId).FirstOrDefault();
foreach (var name in toRemove)
{
var entry = recipe.ingredients.Where(i => i.Name == name).FirstOrDefault();
recipe.ingredients.Remove(entry);
}
foreach (var name in toAdd)
{
var entry = db.ingredients.Where(i => i.Name == name).FirstOrDefault();
recipe.ingredients.Add(entry);
}
db.SaveChanges();
}
}
The intent, as the name suggests, is to update the ingredient list for the given recipe to only whatever is in the list. I'm still getting comfortable with EF and wondering if there's a better (more efficient?) way to accomplish what I'm trying to do.
Follow-up:
Following the suggestions by ntziolis below, I opted to use
recipe.ingredients.Clear() to clear out whatever was in the recipe/ingredient mapping and then use the mocking that was mentioned to quickly add the new ones. Something like this:
foreach (var name in ingredients)
{
// Mock an ingredient since we just need the FK that is referenced
// by the mapping table - the other properties don't matter since we're
// just doing the mapping not inserting anything
recipe.ingredients.Add(new Ingredient()
{
Name = name
});
}
and this works very nicely.
General performance guidelines are:
try to deal with id's only
mock entities whenever possible, rather than retrieving them from db
use the new features of EF4 like Contains in order to simplify and speed up your code
Based on these principles, here is an optimized (though not simpler) solution to your problem:
public void SetIngredientsForRecipe(long recipeId, List<string> ingredients)
{
using (var db = new FoodEntities(ConnectionString, null, null))
{
var recipe = db.recipes.Single(r => r.Id == recipeId);
// make an array since EF4 supports the contains keyword for arrays
var ingrArr = ingredients.ToArray();
// get the ids (and only the ids) of the new ingredients
var ingrNew = new HashSet<int>(db.ingredients
    .Where(i => ingrArr.Contains(i.Name))
    .Select(i => i.Id));
// get the ids (again only the ids) of the current recipe
var curIngr = new HashSet<int>(db.recipes
    .Where(r => r.Id == recipeId)
    .SelectMany(r => r.ingredients)
    .Select(i => i.Id));
// use the built-in hash set functions to get the ingredients to add / remove
// (ExceptWith mutates the set in place, so work on copies)
var toAdd = new HashSet<int>(ingrNew);
toAdd.ExceptWith(curIngr);
var toRemove = new HashSet<int>(curIngr);
toRemove.ExceptWith(ingrNew);
foreach (var id in toAdd)
{
    // mock the ingredient rather than fetching it; for relations only the id
    // needs to be there, but attach the stub so EF tracks it as an existing
    // entity instead of inserting a new one
    var stub = new Ingredient { Id = id };
    db.ingredients.Attach(stub);
    recipe.ingredients.Add(stub);
}
foreach (var id in toRemove)
{
    // again mock only; the stub must be attached before the relationship
    // can be removed
    var stub = new Ingredient { Id = id };
    db.ingredients.Attach(stub);
    recipe.ingredients.Remove(stub);
}
db.SaveChanges();
}
}
If you want it simpler, you could just clear all the ingredients and re-add them as necessary. EF might even be clever enough to figure out that the relations haven't changed - I'm not sure about that, though:
public void SetIngredientsForRecipe(long recipeId, List<string> ingredients)
{
using (var db = new FoodEntities(ConnectionString, null, null))
{
var recipe = db.recipes.Single(r => r.Id == recipeId);
// clear all ingredients first
recipe.ingredients.Clear();
var ingrArr = ingredients.ToArray();
var ingrIds = new HashSet<int>(db.ingredients
    .Where(i => ingrArr.Contains(i.Name))
    .Select(i => i.Id));
foreach (var id in ingrIds)
{
    // mock the ingredients rather than fetching them; attach the stub so
    // only the relation is written, not a new entity
    var stub = new Ingredient { Id = id };
    db.ingredients.Attach(stub);
    recipe.ingredients.Add(stub);
}
db.SaveChanges();
}
}
UPDATE
Some coding errors have been corrected.
You can condense your Where clauses with the FirstOrDefault calls:
recipe.ingredients.FirstOrDefault(i => i.Name == name);
Though I personally prefer to use SingleOrDefault; the difference is that it throws an exception if more than one element matches:
recipe.ingredients.SingleOrDefault(i => i.Name == name);
Also, since the ingredient list that is passed in is a List<string> (as opposed to a list of ingredient IDs), it sort of implies that new ingredients may also be created as part of this process, which isn't handled (though that may have been left out for brevity).
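If creating the missing ingredients is in fact intended, a sketch of how that could be handled first (names taken from the question's code; treat the details as an assumption):
// find the names that don't exist yet and insert them before mapping
var known = db.ingredients
    .Where(i => ingredients.Contains(i.Name))
    .Select(i => i.Name)
    .ToList();
foreach (var name in ingredients.Except(known))
{
    db.ingredients.Add(new Ingredient { Name = name });
}
db.SaveChanges(); // persist the new ingredients so they get valid keys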