I am looking for the best way to cache our DB lookup tables, which consist of about 75 tables.
I want to cache the data from these tables for use in my application so that I don't open a connection to the DB each time I need them.
Here is what I am doing:
I have created a static class called MyApplicationCache which contains a static property for each lookup table.
In each property getter I fill it with the intended data from the DB.
I'm caching the data using HttpRuntime.Cache["PropertyName"].
Each time I get the lookup table data I check whether HttpRuntime.Cache["PropertyName"] != null.
If it is, I get it from the cache; otherwise I get it from the DB.
Finally, I invoke all the properties in the Application_Start event in global.asax.
Until now everything was good, but recently I've faced a performance issue that I can't solve. If I want a cached object (Payer) to be refreshed from the DB, I do this:
MyApplicationCache.Payer = null;
This removes HttpRuntime.Cache["Payer"], so the next time I request it, it is reloaded from the DB:
List<Payer> payerList = MyApplicationCache.Payer;
Now the performance problem arises:
There are about 1,700 Payer records in the DB.
Each Payer object has a List property called PayerBranches, which requires looping over the whole payer list and fetching the PayerBranches for each item.
// MyApplicationCache Payer property:
public static List<LDM.DataEntityTier.Payer> Payer {
    get {
        if (HttpRuntime.Cache["Payer"] != null)
            return (List<LDM.DataEntityTier.Payer>)HttpRuntime.Cache["Payer"];

        // request item from its original source
        using (LDM.DataAccess.OracleManager OracleManager = new LDM.DataAccess.OracleManager()) {
            OracleManager.OpenConnection();

            List<LDM.DataEntityTier.Payer> result =
                new LDM.DataService.PayerService().GetPayersListWithFullName(3, OracleManager, "UTC");
            //List<LDM.DataEntityTier.Payer> result =
            //    new LDM.DataService.PayerService().GetListOfPayer("Order by Name asc", OracleManager, "UTC");

            List<PayerBranches> payerBranchesList =
                new LDM.DataService.PayerBranchesService().GetListOfObject(OracleManager, "UTC");

            OracleManager.CloseConnection();

            foreach (Payer payerItem in result) {
                payerItem.PayerBranches = new List<PayerBranches>();
                foreach (PayerBranches item in payerBranchesList.FindAll(x => x.PayerID == payerItem.Id)) {
                    payerItem.PayerBranches.Add(item);
                }
            }

            // add item to cache
            HttpRuntime.Cache["Payer"] = result;
            return result;
        }
    }
    set {
        if (value == null) {
            HttpRuntime.Cache.Remove("Payer");
        }
    }
}
This problem occurs with every property that contains a nested list.
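For reference, the inner FindAll above rescans the whole payerBranchesList for every payer, which is roughly O(payers × branches). A minimal sketch of the same step using a grouped lookup instead (same types as above, plus a using System.Linq directive):

// Group the branches by PayerID once, then each payer does a cheap keyed lookup
// instead of scanning the full branch list.
var branchesByPayer = payerBranchesList.ToLookup(x => x.PayerID);

foreach (Payer payerItem in result)
{
    // ILookup returns an empty sequence for missing keys, so no null check is needed.
    payerItem.PayerBranches = branchesByPayer[payerItem.Id].ToList();
}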
I don't know if there is a better way to cache data or if there is a problem in my code.
Is there a better way to do caching?
I have more than 15,000 POCO elements stored in a Redis list. I'm using ServiceStack to save and get them. However, I'm not pleased with the response times I get when loading them into a grid. From what I've read, it would be better to store these objects in a hash, but unfortunately I could not find a good example for my case :(
This is the method I use to get them into my grid:
public IEnumerable<BookingRequestGridViewModel> GetAll()
{
    try
    {
        var redisManager = new RedisManagerPool(Global.RedisConnector);
        using (var redis = redisManager.GetClient())
        {
            var redisEntities = redis.As<BookingRequestModel>();
            var result = redisEntities.Lists["BookingRequests"].GetAll()
                .Select(z => new BookingRequestGridViewModel
                {
                    CreatedDate = z.CreatedDate,
                    DropOffBranchName = z.DropOffBranch != null ? z.DropOffBranch.Name : string.Empty,
                    DropOffDate = z.DropOffDate,
                    DropOffLocationName = z.DropOffLocation != null ? z.DropOffLocation.Name : string.Empty,
                    Id = z.Id.Value,
                    Number = z.Number,
                    PickupBranchName = z.PickUpBranch != null ? z.PickUpBranch.Name : string.Empty,
                    PickUpDate = z.PickUpDate,
                    PickupLocationName = z.PickUpLocation != null ? z.PickUpLocation.Name : string.Empty
                })
                .OrderBy(z => z.Id);
            return result;
        }
    }
    catch (Exception ex)
    {
        return null;
    }
}
Note that I use redisEntities.Lists["BookingRequests"].GetAll(), which is what is causing the performance issue (I would like to use just redisEntities.Lists["BookingRequests"], but then I lose the latest updates in the grid after editing).
I would like to know if saving them in a list is a good approach, because a fast grid is very important to me (paging currently takes about 1 second, which is huge).
Please advise!
Firstly, you should not create a new Redis client manager like RedisManagerPool each time; there should only be a single RedisManagerPool instance in your app, from which all clients are resolved.
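A minimal sketch of that, using ServiceStack's IRedisClientsManager (the RedisConfig class name is just for illustration; Global.RedisConnector is the connection string from the question):

// Created once for the whole application (e.g. at startup), not per request.
public static class RedisConfig
{
    public static readonly IRedisClientsManager Manager =
        new RedisManagerPool(Global.RedisConnector);
}

// Then inside GetAll():
using (var redis = RedisConfig.Manager.GetClient())
{
    var redisEntities = redis.As<BookingRequestModel>();
    // ... query as before ...
}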
Otherwise I would rethink your data access strategy: downloading 15K items in a batch is not ideal. You can create indexes by storing ids in Sets, or you can store the items in a sorted set with a value you can page against, such as an incrementing id, e.g.:
var redisEntities = redis.As<BookingRequestModel>();
var bookings = redisEntities.SortedSets["bookings"];

foreach (var item in new BookingRequestModel[0])
{
    redisEntities.AddItemToSortedSet(bookings, item, item.Id);
}
That way you will be able to fetch them in batches, e.g.:
var batch = bookings.GetRangeByLowestScore(fromId, toId, skip, take);
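A sketch of how a paged fetch could then look in the service method (the score range and page parameters are illustrative, and RedisConfig.Manager refers to the singleton sketched above):

public IEnumerable<BookingRequestGridViewModel> GetPage(long fromId, long toId, int skip, int take)
{
    using (var redis = RedisConfig.Manager.GetClient())
    {
        var redisEntities = redis.As<BookingRequestModel>();
        var bookings = redisEntities.SortedSets["bookings"];

        // Only the requested slice is pulled from Redis instead of all 15,000 items.
        return bookings.GetRangeByLowestScore(fromId, toId, skip, take)
            .Select(z => new BookingRequestGridViewModel
            {
                Id = z.Id.Value,
                Number = z.Number
                // ... map the remaining grid columns as in GetAll() ...
            })
            .ToList();
    }
}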
I have an export job migrating data from an old database into a new database. The problem I'm having is that the user population is around 3 million and the job takes a very long time to complete (15+ hours). The machine I am using only has 1 processor, so I'm not sure whether threading is something I should be doing. Can someone help me optimize this code?
static void ExportFromLegacy()
{
    var exportStart = DateTime.Now;

    var usersQuery = _oldDb.users.Where(x => x.status == "active");

    int BatchSize = 1000;
    var errorCount = 0;
    var successCount = 0;
    var batchCount = 0;

    // Using MoreLinq's Batch for sequences
    // https://www.nuget.org/packages/MoreLinq.Source.MoreEnumerable.Batch
    foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
    {
        Console.WriteLine(String.Format("Batch count at {0}", batchCount));
        batchCount++;

        foreach (var user in batch)
        {
            try
            {
                var userData = _oldDb.userData.Where(x => x.user_id == user.user_id).ToList();

                if (userData.Count > 0)
                {
                    // Insert into table
                    var newData = new newData()
                    {
                        UserId = user.user_id // shortened code for brevity.
                    };
                    _db.newUserData.Add(newData);
                    _db.SaveChanges();

                    // Insert item(s) into table
                    foreach (var item in userData)
                    {
                        if (!_db.userDataItems.Any(x => x.id == item.id))
                        {
                            var newItem = new Item()
                            {
                                UserId = user.user_id, // shortened code for brevity.
                                DataId = newData.id    // id from the object created above
                            };
                            _db.userDataItems.Add(newItem);
                        }
                        _db.SaveChanges();
                        successCount++;
                    }
                }
            }
            catch (Exception ex)
            {
                errorCount++;
                Console.WriteLine(String.Format("Error saving changes for user_id: {0} at {1}.", user.user_id.ToString(), DateTime.Now));
                Console.WriteLine("Message: " + ex.Message);
                Console.WriteLine("InnerException: " + ex.InnerException);
            }
        }
    }

    Console.WriteLine(String.Format("End at {0}...", DateTime.Now));
    Console.WriteLine(String.Format("Successful imports: {0} | Errors: {1}", successCount, errorCount));
    Console.WriteLine(String.Format("Total running time: {0}", (DateTime.Now - exportStart).ToString(@"hh\:mm\:ss")));
}
Unfortunately, the major issue is the number of database round-trips.
You make a round-trip:
- for every user, to retrieve the user data by user id from the old database;
- for every user, to save the user data in the new database;
- for every user, to save each user data item in the new database.
So if you have 3 million users, and every user has an average of 5 user data items, that means at least 3M + 3M + 15M = 21 million database round-trips, which is insane.
The only way to dramatically improve performance is by reducing the number of database round-trips.
Batch - Retrieve user by id
You can quickly reduce the number of database round-trips by retrieving all the user data at once, and since you don't have to track it, use AsNoTracking() for even more performance gains.
var list = batch.Select(x => x.user_id).ToList();
var userDatas = _oldDb.userData
    .AsNoTracking()
    .Where(x => list.Contains(x.user_id))
    .ToList();

foreach (var userData in userDatas)
{
    // ...
}
This change alone should already save you a few hours.
Batch - Save Changes
Every time you save a userData record or an item, you perform a database round-trip.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows you to perform:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
You can either call BulkSaveChanges at the end of the batch, or create a list to insert and use BulkInsert directly for even more performance.
You will, however, have to use a relation to the newData instance instead of using the ID directly.
foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
{
    // Retrieve all user data for the batch at once.
    var list = batch.Select(x => x.user_id).ToList();
    var userDatas = _oldDb.userData
        .AsNoTracking()
        .Where(x => list.Contains(x.user_id))
        .ToList();

    // Create the lists used for BulkInsert.
    var newDatas = new List<newData>();
    var newDataItems = new List<Item>();

    foreach (var userData in userDatas)
    {
        // newDatas.Add(newData);
        // newDataItem.OwnerData = newData;
        // newDataItems.Add(newDataItem);
    }

    _db.BulkInsert(newDatas);
    _db.BulkInsert(newDataItems);
}
EDIT: Answer to the subquestion:
"One of the properties of a newDataItem is the id of newData (e.g. newDataItem.newDataId). So newData would have to be saved first in order to generate its id. How would I BulkInsert if there is a dependency on another object?"
You should use navigation properties instead. With a navigation property you never have to specify the parent id; you set the parent object instance instead.
public class UserData
{
    public int UserDataID { get; set; }
    // ... properties ...
    public List<UserDataItem> Items { get; set; }
}

public class UserDataItem
{
    public int UserDataItemID { get; set; }
    // ... properties ...
    public UserData OwnerData { get; set; }
}

var userData = new UserData();
var userDataItem = new UserDataItem();

// Use the navigation property to set the parent.
userDataItem.OwnerData = userData;
Tutorial: Configure One-to-Many Relationship
"Also, I don't see a BulkSaveChanges in your example code. Would that have to be called after all the BulkInserts?"
BulkInsert inserts directly into the database. You don't have to call SaveChanges or BulkSaveChanges; once you invoke the method, it's done ;)
Here is an example using BulkSaveChanges:
foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
{
    // Retrieve all user data for the batch at once.
    var list = batch.Select(x => x.user_id).ToList();
    var userDatas = _oldDb.userData
        .AsNoTracking()
        .Where(x => list.Contains(x.user_id))
        .ToList();

    // Create the lists used for BulkSaveChanges.
    var newDatas = new List<newData>();
    var newDataItems = new List<Item>();

    foreach (var userData in userDatas)
    {
        // newDatas.Add(newData);
        // newDataItem.OwnerData = newData;
        // newDataItems.Add(newDataItem);
    }

    var context = new UserContext();
    context.userDatas.AddRange(newDatas);
    context.userDataItems.AddRange(newDataItems);
    context.BulkSaveChanges();
}
BulkSaveChanges is slower than BulkInsert because it has to use some internal Entity Framework methods, but it is still way faster than SaveChanges.
In the example, I create a new context for every batch to avoid memory issues and gain some performance. If you reuse the same context for all batches, you will end up with millions of tracked entities in the ChangeTracker, which is never a good idea.
Entity Framework is a very bad choice for importing large amounts of data. I know this from personal experience.
That being said, I found a few ways to optimize things when I tried to use it in the same way you are.
The Context will cache objects as you add them, and the more inserts you do, the slower future inserts get. My solution was to limit each context to about 500 inserts before disposing of that instance and creating a new one. This boosted performance significantly.
I was able to use multiple threads to increase performance, but you have to be very careful about resource contention. Each thread definitely needs its own Context; don't even think about trying to share one between threads. My machine had 8 cores, so threading will probably not help you as much; with a single core I doubt it will help you at all.
Turn off change detection with AutoDetectChangesEnabled = false; change tracking is incredibly slow. Unfortunately, this means you have to modify your code to make all changes directly through the context. No more Entity.Property = "Some Value"; it becomes something like Context.Entity(e => e.Property).SetValue("Some Value") (or something similar, I don't remember the exact syntax), which makes the code ugly.
Any queries you do should definitely use AsNoTracking.
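A rough sketch of those points combined (the NewDbContext type and the 500-row threshold are placeholders; the entity names follow the question, and this is illustrative rather than tested code):

// Recreate the context periodically and turn off automatic change detection
// so the ChangeTracker never accumulates millions of entities.
var context = new NewDbContext();
context.Configuration.AutoDetectChangesEnabled = false;
int pending = 0;

foreach (var user in _oldDb.users.AsNoTracking().Where(x => x.status == "active"))
{
    context.newUserData.Add(new newData { UserId = user.user_id });

    if (++pending % 500 == 0) // roughly 500 inserts per context, as described above
    {
        context.SaveChanges();
        context.Dispose();
        context = new NewDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;
    }
}

context.SaveChanges(); // flush the remaining entities
context.Dispose();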
With all that, I was able to cut a ~20-hour process down to about 6 hours, but I still don't recommend using EF for this. It was an extremely painful project, due almost entirely to my poor choice of EF for adding data. Please use something else... anything else...
I don't want to give the impression that EF is a bad data access library; it is great at what it was designed to do. Unfortunately, this is not what it was designed for.
I can think of a few options.
1) A small speed increase can be gained by moving your _db.SaveChanges() below the closing bracket of your foreach():
foreach (...){
}
successCount += _db.SaveChanges();
2) Add items to a list, and then add the list to the context:
List<ObjClass> list = new List<ObjClass>();
foreach (...)
{
    list.Add(new ObjClass() { ... });
}
_db.newUserData.AddRange(list);
successCount += _db.SaveChanges();
3) If it's a big amount of data, save in batches:
List<ObjClass> list = new List<ObjClass>();
int cnt = 0;
foreach (...)
{
    list.Add(new ObjClass() { ... });
    if (++cnt % 100 == 0) // batches of 100
    {
        _db.newUserData.AddRange(list);
        successCount += _db.SaveChanges();
        list.Clear();
        // Optional, if a HUGE amount of data:
        if (cnt % 1000 == 0)
        {
            _db = new MyDbContext();
        }
    }
}
// Don't forget the last partial batch!
_db.newUserData.AddRange(list);
successCount += _db.SaveChanges();
list.Clear();
4) If it's really big, consider using bulk inserts. There are a few examples on the internet and a few free libraries around.
Ref: https://blogs.msdn.microsoft.com/nikhilsi/2008/06/11/bulk-insert-into-sql-from-c-app/
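For that route, a minimal sketch using the framework's own SqlBulkCopy from System.Data.SqlClient (the connection string, table, and column names here are placeholders, not from the question):

// Build an in-memory table whose columns match the destination table.
var table = new DataTable();
table.Columns.Add("UserId", typeof(int));
table.Columns.Add("DataId", typeof(int));
// ... fill table.Rows from the export loop ...

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.UserDataItems";
        bulkCopy.BatchSize = 1000;      // rows sent per round-trip
        bulkCopy.WriteToServer(table);  // one streamed insert instead of row-by-row SaveChanges
    }
}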
With most of these options you lose some control over error handling, as it becomes difficult to know which row failed.
I am currently trying to create a new order (shown below) in a web service, and then send that data to insert a new row into the database. For some reason my DBML / data context does not allow me to use InsertOnSubmit.
Any ideas? I haven't used LINQ to SQL in about 7 months.
Thanks in advance.
[WebMethod]
public string InsertOrderToDatabases()
{
    // Start data contexts ------
    DataContext db = new DataContext(System.Configuration.ConfigurationManager.AppSettings["RainbowCMSConnectionString"]);
    DataContext dcSqlOES = new DataContext(System.Configuration.ConfigurationManager.AppSettings["OESConnectionString"]);

    // Get table from local database
    Table<Schedule> Schedule = db.GetTable<Schedule>();

    // Find last order number in databases
    var lastOrderNumber = from lOrder in Schedule
                          orderby lOrder.templ_idn descending
                          select lOrder.templ_idn;

    int firstOrderID;
    var firstOrder = lastOrderNumber.FirstOrDefault();
    firstOrderID = firstOrder.Value + 1;

    qrOrder qrOrd = new qrOrder
    {
        // ... data in here creating a new order
    };

    // TODO: fix below with an insert on submit
    if (qrOrd != null)
    {
        // **Schedule.InsertOnSubmit(qrOrd);**
    }
    //db.GetTable<Schedule>().InsertOnSubmit(qrOrd);

    try
    {
        // Submit the changes to the database
        db.SubmitChanges();
        return "Orders were sent to the databases.";
    }
    catch (Exception)
    {
        return "An error occurred while sending the orders to the databases.";
    }
}
Based on your response, it appears that you are using the wrong table, or perhaps the wrong data type. I also noticed that when you declare your Schedule table variable, you declare it as type Table<Schedule>, which means it should contain Schedule entities, not qrOrder entities.
Table<TEntity>.InsertOnSubmit expects a specific strongly typed entity to be passed in. In your case, it is expecting Web_Service.Schedule, but you are trying to pass in a qrOrder.
Schedule.InsertOnSubmit(qrOrd);
That line will not submit changes for the connected entity. Try this:
db.Schedule.InsertOnSubmit(qrOrd);
db.SubmitChanges();
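Alternatively, given the type mismatch described above, a small sketch of inserting through the table whose entity type actually matches the object (assuming qrOrder is mapped in the DBML):

// Insert through the qrOrder table rather than the Schedule table.
Table<qrOrder> orders = db.GetTable<qrOrder>();
orders.InsertOnSubmit(qrOrd);
db.SubmitChanges();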
You can also try:
db.GetTable(typeof(Schedule)).InsertOnSubmit(qrOrd);
Or
db.GetTable(qrOrd.GetType()).InsertOnSubmit(qrOrd);
I am getting an error when calling entities.SaveChanges() in EF 4.3.1. My database is a SQL CE 4 store and I am coding in the MVVM pattern. I have a local version of my context that I bind to an observable collection, modify, and so on. This works fine, and when I call SaveChanges() while no rows exist in the database, the objects persist fine. When I reload the application, the objects are populated in my listbox as they should be; however, if I add another object and call SaveChanges(), I get an error saying that a duplicate value cannot be inserted into a unique index.
From my understanding this means that the context is trying to save my entities to the data store, but it seems to be adding my untouched original objects as well as the new one. I thought it would leave them alone, since their state is unchanged.
private void Load()
{
    entities.Properties.Include("Images").Load();
    PropertyList = new ObservableCollection<Property>();
    PropertyList = entities.Properties.Local;

    // Sort the list (based on the previous session stored in the database)
    var sortList = PropertyList.OrderBy(x => x.Sort).ToList();
    PropertyList.Clear();
    sortList.ForEach(PropertyList.Add);

    propertyView = CollectionViewSource.GetDefaultView(PropertyList);
    if (propertyView != null) propertyView.CurrentChanged += new System.EventHandler(propertyView_CurrentChanged);
}

private void NewProperty()
{
    try
    {
        if (PropertyList != null)
        {
            Property p = new Property()
            {
                ID = Guid.NewGuid(),
                AgentName = "Firstname Lastname",
                Address = "00 Blank Street",
                AuctioneerName = "Firstname Lastname",
                SaleTitle = "Insert a sales title",
                Price = 0,
                NextBid = 0,
                CurrentImage = null,
                Status = "Auction Pending",
                QuadVis = false,
                StatVis = false, // Pause button visibility
                Sort = PropertyList.Count + 1,
            };
            PropertyList.Add(p);
            SaveProperties();
        }
    }
    catch (Exception ex)
    {
        System.Windows.MessageBox.Show(ex.Message);
    }
}

private void SaveProperties()
{
    try
    {
        foreach (var image in entities.Images.Local.ToList())
        {
            if (image.Property == null)
                entities.Images.Remove(image);
        }
    }
    catch (Exception ex)
    {
        System.Windows.MessageBox.Show(ex.Message);
    }
    entities.SaveChanges();
}
Without commenting on all the code here, this is the bit that's causing the specific problem you bring up:
// Sort the list (based on the previous session stored in the database)
var sortList = PropertyList.OrderBy(x => x.Sort).ToList();
PropertyList.Clear();
sortList.ForEach(PropertyList.Add);
This code:
1. Starts with entities that have been queried and are being tracked by the context as Unchanged entities, that is, entities known to already exist in the database.
2. Creates a new sorted list of these entities.
3. Calls Clear on the local collection, causing each tracked entity to be marked as Deleted and removed from the collection.
4. Adds each entity back to the context, putting it in the Added state, meaning it is new and will be saved to the database when SaveChanges is called.
So effectively you have told EF that all the entities that exist in the database actually don't exist and need to be saved. It tries to do exactly that, which results in the exception you see.
To fix this, don't clear the DbContext Local collection and add the entities back. Instead, sort in the view, using the Local collection to back the view.
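A minimal sketch of sorting in the view instead (SortDescription and ListSortDirection come from System.ComponentModel, CollectionViewSource from System.Windows.Data; the field names follow the question):

// Keep entities.Properties.Local as the backing collection and let the view sort it,
// so nothing is removed from or re-added to the context.
PropertyList = entities.Properties.Local;
propertyView = CollectionViewSource.GetDefaultView(PropertyList);
propertyView.SortDescriptions.Add(new SortDescription("Sort", ListSortDirection.Ascending));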
It sounds like you're adding the existing entities to the context (which marks them for insertion) instead of attaching them (which marks them as existing and unmodified).
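For illustration, a short sketch of the difference (newProperty and existingProperty are hypothetical variables; the DbSet<Property> named Properties follows the question):

// Add marks the entity as Added, so SaveChanges will INSERT it.
entities.Properties.Add(newProperty);

// Attach marks the entity as Unchanged, so SaveChanges leaves it alone
// unless it is modified afterwards.
entities.Properties.Attach(existingProperty);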
I'm also not sure that new Guid() isn't returning the same GUID each time... I always use Guid.NewGuid(): http://msdn.microsoft.com/en-us/library/system.guid.newguid.aspx
This section simply reads from an Excel spreadsheet. This part works fine, with no performance issues.
IEnumerable<ImportViewModel> so = data.Select(row => new ImportViewModel
{
    PersonId = row.Field<string>("person_id"),
    ValidationResult = ""
}).ToList();
Before I pass the model to a view I want to set ValidationResult, so I have this piece of code. If I comment it out, the model is passed to the view quickly. When I use the foreach it takes over a minute. If I hardcode a value for item.PersonId it runs quickly. I know I'm doing something wrong; I'm just not sure where to start or what best practice I should be following.
foreach (var item in so)
{
    if (db.Entity.Any(w => w.ID == item.PersonId))
    {
        item.ValidationResult = "Successful";
    }
    else
    {
        item.ValidationResult = "Error: ";
    }
}
return View(so.ToList());
You are performing a database call for every item in your list, which is hard on your database and therefore on performance. Instead, iterate through your Excel result, gather all the ids, and select them in one query. Make a list from that query result (otherwise the query is executed every time you access it), then match the result list against your Excel data.
You need to do something like this :
var ids = so.Select(i => i.PersonId).Distinct().ToList();

// Hit the database just once to get all matching user ids.
var usersIds = db.Entity.Where(u => ids.Contains(u.ID)).Select(u => u.ID).ToList();

foreach (var item in so)
{
    if (usersIds.Contains(item.PersonId))
    {
        item.ValidationResult = "Successful";
    }
    else
    {
        item.ValidationResult = "Error: ";
    }
}

return View(so.ToList());