Newbie performance issue with foreach ...need advice - c#

This section simply reads from an Excel spreadsheet. This part works fine, with no performance issues.
IEnumerable<ImportViewModel> so = data.Select(row => new ImportViewModel
{
    PersonId = row.Field<string>("person_id"),
    ValidationResult = ""
}).ToList();
Before I pass the model to a View I want to set ValidationResult, so I have this piece of code. If I comment it out, the model is passed to the view quickly. With the foreach it takes over a minute. If I hardcode a value for item.PersonId, it runs quickly. I know I'm doing something wrong; I'm just not sure where to start or what best practice I should be following.
foreach (var item in so)
{
    if (db.Entity.Any(w => w.ID == item.PersonId))
    {
        item.ValidationResult = "Successful";
    }
    else
    {
        item.ValidationResult = "Error: ";
    }
}
return View(so.ToList());

You are now performing a database call per item in your list. This is really hard on your database, and thus on your performance. Instead, iterate through your Excel result, gather all the ids, and select them in one query. Make a list from that query result (otherwise the query is executed every time you access it), then match the result list against your Excel rows.

You need to do something like this:
var ids = so.Select(i => i.PersonId).Distinct().ToList();
// Hitting the database just this once to get all matching user ids
var usersIds = db.Entity.Where(u => ids.Contains(u.ID)).Select(u => u.ID).ToList();
foreach (var item in so)
{
    if (usersIds.Contains(item.PersonId))
    {
        item.ValidationResult = "Successful";
    }
    else
    {
        item.ValidationResult = "Error: ";
    }
}
return View(so.ToList());
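One further note: usersIds is a List<string>, so each Contains call inside the loop is a linear scan. A minimal variation (assuming the ids really are strings, as the question's row.Field<string> suggests) swaps in a HashSet for constant-time lookups:

// Same one-time query, materialized into a HashSet for O(1) lookups
var userIdSet = new HashSet<string>(
    db.Entity.Where(u => ids.Contains(u.ID)).Select(u => u.ID));

foreach (var item in so)
{
    item.ValidationResult = userIdSet.Contains(item.PersonId)
        ? "Successful"
        : "Error: ";
}
return View(so.ToList());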

Related

Issues with ListView here in C#

So I'm working on learning LINQ and was assigned to work with two .txt files and to join them.
So far I'm doing well, but I've reached a bit of an impasse with the display. I'm supposed to display each technician's name once, with the closed cases that follow showing only the case information.
The issue I'm having is that the name keeps repeating after the dataset is listed in the ListView. I think there is something wrong with the LINQ statement or the way I'm going through the foreach loop. Here is the code for the main form below:
//Fills the lists by calling the methods from the DB classes
techs = TechnicianDB.GetTechnicians();
incidents = IncidentDB.GetIncidents();
//Creates a variable to use in the LINQ statements
var ClosedCases = from Incident in incidents
                  join Technician in techs
                  on Incident.TechID equals Technician.TechID
                  where Incident.DateClosed != null
                  orderby Technician.Name, Incident.DateOpened descending
                  select new { Technician.Name, Incident.ProductCode, Incident.DateOpened, Incident.DateClosed, Incident.Title };
//variables to hold the technician name, and the integer to increment the listview
string techName = "";
int i = 0;
//foreach loop to pull the fields out of the lists and to display them in the required areas in the listview box
foreach (var Incident in ClosedCases)
{
    foreach (var Technician in ClosedCases)
    {
        if (Technician.Name != techName)
        {
            lvClosedCases.Items.Add(Technician.Name);
            techName = Technician.Name;
        }
        else
        {
            lvClosedCases.Items.Add("");
        }
    }
    lvClosedCases.Items[i].SubItems.Add(Incident.ProductCode);
    lvClosedCases.Items[i].SubItems.Add(Incident.DateOpened.ToString());
    lvClosedCases.Items[i].SubItems.Add(Incident.DateClosed.ToString());
    lvClosedCases.Items[i].SubItems.Add(Incident.Title);
    i++;
}
And here is the result I get: [screenshot of the ListView output]
As can be seen by the bar on the right hand side, the list continues on for several more columns.
What am I missing here?
Thank you.
EDIT: Per request, here is what the results are supposed to look like: [example screenshot]
Why are you iterating over ClosedCases twice?
foreach (var Incident in ClosedCases)
{
    foreach (var Technician in ClosedCases)
    {
    }
}
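The nested loop adds a row (or a blank) for every Incident paired with every Technician, which is why the list keeps running on for pages. A sketch of the single-loop version, untested but based only on the fields in your query:

string techName = "";
foreach (var c in ClosedCases)
{
    // Show the name only when it changes; otherwise leave the cell blank
    var row = lvClosedCases.Items.Add(c.Name != techName ? c.Name : "");
    techName = c.Name;
    row.SubItems.Add(c.ProductCode);
    row.SubItems.Add(c.DateOpened.ToString());
    row.SubItems.Add(c.DateClosed.ToString());
    row.SubItems.Add(c.Title);
}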

Redis Optimization with .NET, and a concrete example of How to Store and get an element from Hash

I have more than 15,000 POCO elements stored in a Redis List, and I'm using ServiceStack to save and get them. However, I'm not pleased with the response times I get when loading them into a grid. From what I've read, it would be better to store these objects in a hash, but unfortunately I could not find a good example for my case :(
This is the method I use to get them into my grid:
public IEnumerable<BookingRequestGridViewModel> GetAll()
{
    try
    {
        var redisManager = new RedisManagerPool(Global.RedisConnector);
        using (var redis = redisManager.GetClient())
        {
            var redisEntities = redis.As<BookingRequestModel>();
            var result = redisEntities.Lists["BookingRequests"].GetAll().Select(z => new BookingRequestGridViewModel
            {
                CreatedDate = z.CreatedDate,
                DropOffBranchName = z.DropOffBranch != null ? z.DropOffBranch.Name : string.Empty,
                DropOffDate = z.DropOffDate,
                DropOffLocationName = z.DropOffLocation != null ? z.DropOffLocation.Name : string.Empty,
                Id = z.Id.Value,
                Number = z.Number,
                PickupBranchName = z.PickUpBranch != null ? z.PickUpBranch.Name : string.Empty,
                PickUpDate = z.PickUpDate,
                PickupLocationName = z.PickUpLocation != null ? z.PickUpLocation.Name : string.Empty
            }).OrderBy(z => z.Id);
            return result;
        }
    }
    catch (Exception ex)
    {
        return null;
    }
}
Note that I use redisEntities.Lists["BookingRequests"].GetAll(), which is causing the performance issues (I would like to use just redisEntities.Lists["BookingRequests"], but then I lose the latest updates in the grid after editing).
I would like to know if saving them in a list is a good approach, as it's very important for me to have a fast grid (paging currently takes about 1 second, which is huge).
Please advise!
Firstly, you should not create a new Redis Client Manager like RedisManagerPool each time; there should only be a singleton RedisManagerPool instance in your app from which all clients are resolved.
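For example, a minimal sketch (class and registration details assumed, not from the original post) that keeps a single pool for the whole app:

// Hypothetical app-wide singleton; create once at startup
public static class RedisConfig
{
    public static readonly IRedisClientsManager Manager =
        new RedisManagerPool(Global.RedisConnector);
}

// Callers resolve short-lived clients from the shared pool:
using (var redis = RedisConfig.Manager.GetClient())
{
    // ... use redis ...
}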
But otherwise I would rethink your data access strategy; downloading 15K items in a batch is not ideal. You can create indexes by storing ids in Sets, or you could store items in a sorted set with a value you can page against, like an incrementing id, e.g.:
var redisEntities = redis.As<BookingRequestModel>();
var bookings = redisEntities.SortedSets["bookings"];
foreach (var item in new BookingRequestModel[0]) // i.e. your items
{
    redisEntities.AddItemToSortedSet(bookings, item, item.Id);
}
That way you will be able to fetch them in batches, e.g:
var batch = bookings.GetRangeByLowestScore(fromId, toId, skip, take);
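As a usage sketch (page index and size are assumed values), a grid page could then be fetched with the same call:

// Hypothetical paging: 100 rows per page, with the incrementing id as the score
int pageIndex = 2, pageSize = 100;
var page = bookings.GetRangeByLowestScore(0, double.MaxValue, pageIndex * pageSize, pageSize);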

How to Optimize Code Performance in .NET [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have an export job migrating data from an old database into a new database. The problem I'm having is that the user population is around 3 million and the job takes a very long time to complete (15+ hours). The machine I am using only has 1 processor so I'm not sure if threading is what I should be doing. Can someone help me optimize this code?
static void ExportFromLegacy()
{
    var exportStart = DateTime.Now; // start timestamp (implied by the final log line)
    var usersQuery = _oldDb.users.Where(x => x.status == "active");
    int BatchSize = 1000;
    var errorCount = 0;
    var successCount = 0;
    var batchCount = 0;
    // Using MoreLinq's Batch for sequences
    // https://www.nuget.org/packages/MoreLinq.Source.MoreEnumerable.Batch
    foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
    {
        Console.WriteLine(String.Format("Batch count at {0}", batchCount));
        batchCount++;
        foreach (var user in batch)
        {
            try
            {
                var userData = _oldDb.userData.Where(x => x.user_id == user.user_id).ToList();
                if (userData.Count > 0)
                {
                    // Insert into table
                    var newData = new newData()
                    {
                        UserId = user.user_id // shortened code for brevity.
                    };
                    _db.newUserData.Add(newData);
                    _db.SaveChanges();
                    // Insert item(s) into table
                    foreach (var item in userData.items)
                    {
                        if (!_db.userDataItems.Any(x => x.id == item.id))
                        {
                            var newItem = new Item()
                            {
                                UserId = user.user_id, // shortened code for brevity.
                                DataId = newData.id // id from object created above
                            };
                            _db.userDataItems.Add(newItem);
                        }
                        _db.SaveChanges();
                        successCount++;
                    }
                }
            }
            catch (Exception ex)
            {
                errorCount++;
                Console.WriteLine(String.Format("Error saving changes for user_id: {0} at {1}.", user.user_id.ToString(), DateTime.Now));
                Console.WriteLine("Message: " + ex.Message);
                Console.WriteLine("InnerException: " + ex.InnerException);
            }
        }
    }
    Console.WriteLine(String.Format("End at {0}...", DateTime.Now));
    Console.WriteLine(String.Format("Successful imports: {0} | Errors: {1}", successCount, errorCount));
    Console.WriteLine(String.Format("Total running time: {0}", (DateTime.Now - exportStart).ToString(@"hh\:mm\:ss")));
}
Unfortunately, the major issue is the number of database round-trips.
You make a round-trip:
For every user, to retrieve the user data by user id from the old database
For every user, to save the user data in the new database
For every user, to save each user data item in the new database
So if you have 3 million users, and every user has an average of 5 user data items, that means at least 3M + 3M + 15M = 21 million database round-trips, which is insane.
The only way to dramatically improve the performance is to reduce the number of database round-trips.
Batch - Retrieve user by id
You can quickly reduce the number of database round-trips by retrieving all the user data at once, and since you don't have to track the entities, use AsNoTracking() for even more performance gains.
var list = batch.Select(x => x.user_id).ToList();
var userDatas = _oldDb.userData
    .AsNoTracking()
    .Where(x => list.Contains(x.user_id))
    .ToList();
foreach (var userData in userDatas)
{
    ....
}
This change alone should already save you a few hours.
Batch - Save Changes
Every time you save a user data or item, you perform a database round-trip.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows you to perform:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
You can either call BulkSaveChanges at the end of the batch, or create a list to insert and use BulkInsert directly for even more performance.
You will, however, have to use a relation to the newData instance instead of using the ID directly.
foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
{
    // Retrieve all users for the batch at once.
    var list = batch.Select(x => x.user_id).ToList();
    var userDatas = _oldDb.userData
        .AsNoTracking()
        .Where(x => list.Contains(x.user_id))
        .ToList();
    // Create lists used for BulkInsert
    var newDatas = new List<newData>();
    var newDataItems = new List<Item>();
    foreach (var userData in userDatas)
    {
        // newDatas.Add(newData);
        // newDataItem.OwnerData = newData;
        // newDataItems.Add(newDataItem);
    }
    _db.BulkInsert(newDatas);
    _db.BulkInsert(newDataItems);
}
EDIT: Answer to subquestion
One of the properties of a newDataItem is the id of newData (ex. newDataItem.newDataId), so newData would have to be saved first in order to generate its id. How would I BulkInsert if there is a dependency on another object?
You must instead use navigation properties. With a navigation property, you never have to specify the parent id; you set the parent object instance instead.
public class UserData
{
    public int UserDataID { get; set; }
    // ... properties ...
    public List<UserDataItem> Items { get; set; }
}

public class UserDataItem
{
    public int UserDataItemID { get; set; }
    // ... properties ...
    public UserData OwnerData { get; set; }
}

var userData = new UserData();
var userDataItem = new UserDataItem();
// Use navigation property to set the parent.
userDataItem.OwnerData = userData;
Tutorial: Configure One-to-Many Relationship
Also, I don't see a BulkSaveChanges in your example code. Would that have to be called after all the BulkInserts?
BulkInsert inserts directly into the database. You don't have to call SaveChanges or BulkSaveChanges; once you invoke the method, it's done ;)
Here is an example using BulkSaveChanges:
foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
{
    // Retrieve all users for the batch at once.
    var list = batch.Select(x => x.user_id).ToList();
    var userDatas = _oldDb.userData
        .AsNoTracking()
        .Where(x => list.Contains(x.user_id))
        .ToList();
    // Create lists used for BulkInsert
    var newDatas = new List<newData>();
    var newDataItems = new List<Item>();
    foreach (var userData in userDatas)
    {
        // newDatas.Add(newData);
        // newDataItem.OwnerData = newData;
        // newDataItems.Add(newDataItem);
    }
    var context = new UserContext();
    context.userDatas.AddRange(newDatas);
    context.userDataItems.AddRange(newDataItems);
    context.BulkSaveChanges();
}
BulkSaveChanges is slower than BulkInsert because it has to use some internal Entity Framework methods, but it's still way faster than SaveChanges.
In the example, I create a new context for every batch to avoid memory issues and gain some performance. If you re-use the same context for all batches, you will have millions of tracked entities in the ChangeTracker, which is never a good idea.
Entity Framework is a very bad choice for importing large amounts of data. I know this from personal experience.
That being said, I found a few ways to optimize things when I tried to use it in the same way you are.
The Context will cache objects as you add them, and the more inserts you do, the slower future inserts will get. My solution was to limit each context to about 500 inserts before I disposed of that instance and created a new one. This boosted performance significantly.
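A rough sketch of that recycling pattern (context, collection, and mapping names are placeholders, not from the original post):

// Dispose and recreate the context every N inserts to keep its cache small
const int InsertsPerContext = 500;
var db = new MyDbContext();
int pending = 0;
foreach (var row in rowsToImport) // rowsToImport: your source data
{
    db.newUserData.Add(MapRow(row)); // MapRow: your old-to-new mapping
    if (++pending % InsertsPerContext == 0)
    {
        db.SaveChanges();
        db.Dispose();
        db = new MyDbContext(); // fresh context, empty cache
    }
}
db.SaveChanges();
db.Dispose();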
I was able to make use of multiple threads to increase performance, but you will have to be very careful about resource contention. Each thread will definitely need its own Context, don't even think about trying to share it between threads. My machine had 8 cores, so threading will probably not help you as much; with a single core I doubt it will help you at all.
Turn off change detection with AutoDetectChangesEnabled = false; automatic change detection is incredibly slow. Unfortunately this means you have to modify your code to make all changes directly through the context. No more Entity.Property = "Some Value"; it becomes Context.Entity(e => e.Property).SetValue("Some Value") (or something like that, I don't remember the exact syntax), which makes the code ugly.
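For what it's worth, in EF6 the calls are Configuration.AutoDetectChangesEnabled and Entry(entity).Property(...).CurrentValue; a minimal sketch, with the context and entity names assumed:

using (var db = new MyDbContext()) // hypothetical context
{
    db.Configuration.AutoDetectChangesEnabled = false; // skip the slow change scans

    var user = db.Users.First(); // hypothetical entity set
    // With auto-detection off, modify through the entry so EF still records the change:
    db.Entry(user).Property(u => u.Name).CurrentValue = "Some Value";

    db.SaveChanges();
}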
Any queries you do should definitely use AsNoTracking.
With all that, I was able to cut a ~20 hour process down to about 6 hours, but I still don't recommend using EF for this. It was an extremely painful project due almost entirely to my poor choice of EF to add data. Please use something else... anything else...
I don't want to give the impression that EF is a bad data access library; it is great at what it was designed to do. Unfortunately, this is not what it was designed for.
I can think of a few options.
1) A little speed increase could be gained by moving your _db.SaveChanges() below your foreach() closing bracket:
foreach (...){
}
successCount += _db.SaveChanges();
2) Add items to a list, and then to the context:
List<ObjClass> list = new List<ObjClass>();
foreach (...)
{
    list.Add(new ObjClass() { ... });
}
_db.newUserData.AddRange(list);
successCount += _db.SaveChanges();
3) If it's a big amount of data, save in batches:
List<ObjClass> list = new List<ObjClass>();
int cnt = 0;
foreach (...)
{
    list.Add(new ObjClass() { ... });
    if (++cnt % 100 == 0) // batches of 100
    {
        _db.newUserData.AddRange(list);
        successCount += _db.SaveChanges();
        list.Clear();
        // Optional if a HUGE amount of data
        if (cnt % 1000 == 0)
        {
            _db = new MyDbContext();
        }
    }
}
// Don't forget that!
_db.newUserData.AddRange(list);
successCount += _db.SaveChanges();
list.Clear();
4) If it's TOO big, consider using bulk inserts. There are a few examples on the internet and a few free libraries around.
Ref: https://blogs.msdn.microsoft.com/nikhilsi/2008/06/11/bulk-insert-into-sql-from-c-app/
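For reference, a minimal SqlBulkCopy sketch along the lines of that article (table and column names are placeholders):

// using System.Data; using System.Data.SqlClient;
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.NewUserData"; // placeholder table
    var table = new DataTable();
    table.Columns.Add("UserId", typeof(int));
    foreach (var user in usersToImport) // placeholder source collection
        table.Rows.Add(user.user_id);
    bulk.WriteToServer(table); // one round-trip for the whole table
}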
With most of these options you lose some control over error handling, as it is difficult to know which record failed.
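One way to win some of that control back, sketched here with the same placeholder names as option 3: try the batch first, and only on failure fall back to saving row by row in a fresh context so the offending record can be logged.

try
{
    using (var db = new MyDbContext())
    {
        db.newUserData.AddRange(list);
        successCount += db.SaveChanges();
    }
}
catch
{
    // Fall back to one row at a time, each in a fresh context,
    // so the failing record can be identified and logged
    foreach (var obj in list)
    {
        try
        {
            using (var db = new MyDbContext())
            {
                db.newUserData.Add(obj);
                db.SaveChanges();
                successCount++;
            }
        }
        catch (Exception ex)
        {
            errorCount++;
            Console.WriteLine("Failed record: " + ex.Message);
        }
    }
}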

How to retrieve keywords within thousands of records from SQL Server 2008 fast?

I'm using the query function of an entity collection in C#, and it takes a long time to load the related records back from SQL Server 2008. Is there any fast way to do this? This is the query function I use:
public void SearchProducts()
{
    //Filter by search string array (searchArray)
    List<string> prodId = new List<string>();
    foreach (string src in searchArray)
    {
        StoreProductCollection prod = new StoreProductCollection();
        prod.Query.Where(prod.Query.StptName.ToLower() == src.ToLower() && prod.Query.StptDeleted.IsNull());
        prod.Query.Select(prod.Query.StptName, prod.Query.StptPrice, prod.Query.StptImage, prod.Query.StptStoreProductID);
        // prod.Query.es.Top = 4;
        prod.Query.Load();
        if (prod.Count > 0)
        {
            foreach (StoreProduct stpt in prod)
            {
                if (!prodId.Contains(stpt.StptStoreProductID.ToString().Trim()))
                {
                    prodId.Add(stpt.StptStoreProductID.ToString().Trim());
                    productObjectsList.Add(stpt);
                }
            }
        }
    }
}
You're hitting the database once per searchArray item; this is very wrong.
You might get better performance like this (I have no way of testing it, so give it a shot):
public void SearchProducts()
{
    //Filter by search string array (searchArray)
    List<string> prodId = new List<string>();
    StoreProductCollection prod = new StoreProductCollection();
    // Notice that your foreach() is gone
    // replace this
    // prod.Query.Where(prod.Query.StptName.ToLower() == src.ToLower() && prod.Query.StptDeleted.IsNull());
    // with this (or something similar: the point is, you should call .Load() exactly once)
    prod.Query.Where(prod.Query.StptDeleted.IsNull() && searchArray.Any(srcArrayString => prod.Query.StptName.ToLower() == srcArrayString.ToLower()));
    prod.Query.Select(prod.Query.StptName, prod.Query.StptPrice, prod.Query.StptImage, prod.Query.StptStoreProductID);
    // prod.Query.es.Top = 4;
    prod.Query.Load();
    // ... rest of your code follows.
}
Given a List<string> searchArray containing lowercased words:
public void SearchProducts()
{
    //Filter by search string array (searchArray)
    List<string> prodId = new List<string>();
    StoreProductCollection prod = new StoreProductCollection();
    prod.Query.Where(searchArray.Contains(prod.Query.StptName.ToLower()) && prod.Query.StptDeleted.IsNull());
    prod.Query.Select(prod.Query.StptName, prod.Query.StptPrice, prod.Query.StptImage, prod.Query.StptStoreProductID);
    // prod.Query.es.Top = 4;
    prod.Query.Load();
    if (prod.Count > 0)
    {
        foreach (StoreProduct stpt in prod)
        {
            if (!prodId.Contains(stpt.StptStoreProductID.ToString().Trim()))
            {
                prodId.Add(stpt.StptStoreProductID.ToString().Trim());
                productObjectsList.Add(stpt);
            }
        }
    }
}
This way you have only one query for all words.
First of all, put an index on the StptName column.
Second, if you need even better performance, write a stored procedure in SQL to do your querying, and map it with Entity Framework.
Let me know if you need an explanation of how to do any of the above.
A couple more micro-optimizations you can do if you don't want to write a Stored Procedure:
Store src.ToLower() in a temporary variable, and then compare prod.Query.StptName.ToLower() to it.
By default, SQL Server queries are case-insensitive, so check whether that's the case, and if so you can get rid of the ToLower calls altogether. You can change case sensitivity through collation.
EDIT:
To create an Index:
Open the table designer in SQL Server Managment Studio.
Right click anywhere and select Indexes/Keys.
Click Add.
Under Columns add StptName.
Under Is Unique specify whether StptName is unique or not.
Under type select "index".
That's all!
As for mapping stored procedures - here's a nice tutorial:
http://www.robbagby.com/entity-framework/entity-framework-modeling-select-stored-procedures/
(You can jump straight to the "Map in the Select Stored Procedure" Section).

How to update records from an IList in a Foreach loop?

My controller is passing in a list which I then need to loop through, updating every record in the list in my database. I'm using ASP.NET MVC with a repository pattern using LINQ to SQL. The code below is my save method, which needs to add a record to an invoice table and then update the applicable jobs in the job table.
public void SaveInvoice(Invoice invoice, IList<InvoiceJob> invoiceJobs)
{
    invoiceTable.InsertOnSubmit(invoice);
    invoiceTable.Context.SubmitChanges();
    foreach (InvoiceJob j in invoiceJobs)
    {
        var jobUpdate = invoiceJobTable.Where(x => x.JobID == j.JobID).Single();
        jobUpdate.InvoiceRef = invoice.InvoiceID.ToString();
        invoiceJobTable.GetOriginalEntityState(jobUpdate);
        invoiceJobTable.Context.Refresh(RefreshMode.KeepCurrentValues, jobUpdate);
        invoiceJobTable.Context.SubmitChanges();
    }
}
**I've stripped the code down to just the problem area.
This code doesn't work: no job records are updated, though the invoice table is updated fine. No errors are thrown, and the invoiceJobs IList is definitely not null. If I remove the foreach loop and manually specify which JobID to update, it works fine. The below works:
public void SaveInvoice(Invoice invoice, IList<InvoiceJob> invoiceJobs)
{
    invoiceTable.InsertOnSubmit(invoice);
    invoiceTable.Context.SubmitChanges();
    var jobUpdate = invoiceJobTable.Where(x => x.JobID == 10000).Single();
    jobUpdate.InvoiceRef = invoice.InvoiceID.ToString();
    invoiceJobTable.GetOriginalEntityState(jobUpdate);
    invoiceJobTable.Context.Refresh(RefreshMode.KeepCurrentValues, jobUpdate);
    invoiceJobTable.Context.SubmitChanges();
}
I just can't get the foreach loop to work at all. Does anyone have any idea what I'm doing wrong here?
It seems like the most likely cause of this problem is that the invoiceJobs collection is an empty collection; that is, it has no elements, so the foreach loop effectively does nothing.
You can verify this by adding the following to the top of the method (just for debugging purposes)
if (invoiceJobs.Count == 0) {
    throw new ArgumentException("It's an empty list");
}
Change this:
var jobUpdate = invoiceJobTable.Where(x => x.JobID == 10000).Single();
jobUpdate.InvoiceRef = invoice.InvoiceID.ToString();
invoiceJobTable.GetOriginalEntityState(jobUpdate);
invoiceJobTable.Context.Refresh(RefreshMode.KeepCurrentValues, jobUpdate);
invoiceJobTable.Context.SubmitChanges();
to:
var jobUpdate = invoiceJobTable.Where(x => x.JobID == 10000).Single();
jobUpdate.InvoiceRef = invoice.InvoiceID.ToString();
invoiceJobTable.Context.SubmitChanges();
It looks like your GetOriginalEntityState call doesn't actually do anything, because you don't use the returned value. And I can't see any reason for the DataContext.Refresh() call: all it does is erase the changes you just made, which is what makes your foreach loop "not work".
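Putting that together, a sketch of the method with those two calls removed (untested, but it keeps the structure of the question's repository):

public void SaveInvoice(Invoice invoice, IList<InvoiceJob> invoiceJobs)
{
    invoiceTable.InsertOnSubmit(invoice);
    invoiceTable.Context.SubmitChanges();
    foreach (InvoiceJob j in invoiceJobs)
    {
        var jobUpdate = invoiceJobTable.Where(x => x.JobID == j.JobID).Single();
        jobUpdate.InvoiceRef = invoice.InvoiceID.ToString();
    }
    // One submit persists all the pending job updates
    invoiceJobTable.Context.SubmitChanges();
}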
