The following code results in deletions instead of updates.
My question is: is this a bug in the way I'm coding against Entity Framework or should I suspect something else?
Update: I got this working, but I'm leaving the question now with both the original and the working versions in hopes that I can learn something I didn't understand about EF.
In the original, non-working code below, when the database is fresh, every SearchDailySummary addition succeeds. On the second run, when the code should perform updates, the net result is an empty table again; in other words, this logic ends up being equivalent to removing each entity.
//Logger.Info("Upserting SearchDailySummaries..");
using (var db = new ClientPortalContext())
{
foreach (var item in items)
{
var campaignName = item["campaign"];
var pk1 = db.SearchCampaigns.Single(c => c.SearchCampaignName == campaignName).SearchCampaignId;
var pk2 = DateTime.Parse(item["day"].Replace('-', '/'));
var source = new SearchDailySummary
{
SearchCampaignId = pk1,
Date = pk2,
Revenue = decimal.Parse(item["totalConvValue"]),
Cost = decimal.Parse(item["cost"]),
Orders = int.Parse(item["conv1PerClick"]),
Clicks = int.Parse(item["clicks"]),
Impressions = int.Parse(item["impressions"]),
CurrencyId = item["currency"] == "USD" ? 1 : -1 // NOTE: non USD (if exists) -1 for now
};
var target = db.Set<SearchDailySummary>().Find(pk1, pk2) ?? new SearchDailySummary();
if (db.Entry(target).State == EntityState.Detached)
{
db.SearchDailySummaries.Add(target);
addedCount++;
}
else
{
// TODO?: compare source and target and change the entity state to unchanged if no diff
updatedCount++;
}
AutoMapper.Mapper.Map(source, target);
itemCount++;
}
Logger.Info("Saving {0} SearchDailySummaries ({1} updates, {2} additions)", itemCount, updatedCount, addedCount);
db.SaveChanges();
}
Here is the working version. I'm not 100% sure it's optimized, but it works reliably and performs fine as long as I batch it in groups of 500 items or fewer. Beyond that it slows down sharply, but I think that may be a different question.
//Logger.Info("Upserting SearchDailySummaries..");
using (var db = new ClientPortalContext())
{
foreach (var item in items)
{
var campaignName = item["campaign"];
var pk1 = db.SearchCampaigns.Single(c => c.SearchCampaignName == campaignName).SearchCampaignId;
var pk2 = DateTime.Parse(item["day"].Replace('-', '/'));
var source = new SearchDailySummary
{
SearchCampaignId = pk1,
Date = pk2,
Revenue = decimal.Parse(item["totalConvValue"]),
Cost = decimal.Parse(item["cost"]),
Orders = int.Parse(item["conv1PerClick"]),
Clicks = int.Parse(item["clicks"]),
Impressions = int.Parse(item["impressions"]),
CurrencyId = item["currency"] == "USD" ? 1 : -1 // NOTE: non USD (if exists) -1 for now
};
var target = db.Set<SearchDailySummary>().Find(pk1, pk2);
if (target == null)
{
db.SearchDailySummaries.Add(source);
addedCount++;
}
else
{
AutoMapper.Mapper.Map(source, target);
db.Entry(target).State = EntityState.Modified;
updatedCount++;
}
itemCount++;
}
Logger.Info("Saving {0} SearchDailySummaries ({1} updates, {2} additions)", itemCount, updatedCount, addedCount);
db.SaveChanges();
}
The thing that keeps popping up in my mind is that maybe the Entry(entity) or Find(pk) method has some side effects? I should probably be consulting the documentation, but any advice is appreciated.
It's a slight assumption on my part (without looking into your models/entities), but have a look at what's going on within this block (see if the objects being attached here are related to the deletions):
if (db.Entry(target).State == EntityState.Detached)
{
db.SearchDailySummaries.Add(target);
addedCount++;
}
Your detached object won't be able to use its navigation properties to locate its related objects; you'll be re-attaching an object in a potentially conflicting state (without the correct relationships).
You haven't mentioned what is being deleted above, so I may be way off. Just off out, so this is a little rushed, hope there's something useful in there.
I am trying to query objects from a database, loop through them and check if a column has a value and, if it does not, create a value and assign it to that column and save it to the database. The problem I'm having is that the entity is detaching after the query so I cannot save the changes. Below is the code I am using to query and update the entity.
DateTime runTime = passedDateTime ?? DateTime.Now;
await using DiscordDatabaseContext database = new();
DateTime startOfWeek = exactlyOneWeek ? runTime.OneWeekAgo() : runTime.StartOfWeek(StartOfWeek);
//Add if not in a Weekly Playlist already and if the video was submitted after the start of the week
List<PlaylistData> pld = await database.PlaylistsAdded.Select(playlist => new PlaylistData
{
PlaylistId = playlist.PlaylistId,
WeeklyPlaylistID = playlist.WeeklyPlaylistID,
Videos = playlist.Videos.Where(
video => (video.WeeklyPlaylistItemId == null ||
video.WeeklyPlaylistItemId.Length == 0) &&
startOfWeek <= video.TimeSubmitted)
.Select(video => new VideoData
{
WeeklyPlaylistItemId = video.WeeklyPlaylistItemId,
VideoId = video.VideoId
}).ToList()
}).ToListAsync().ConfigureAwait(false);
int count = 0;
int nRows = 0;
foreach (PlaylistData playlistData in pld)
{
if (string.IsNullOrEmpty(playlistData.WeeklyPlaylistID))
{
playlistData.WeeklyPlaylistID = await YoutubeAPIs.Instance.MakeWeeklyPlaylist().ConfigureAwait(false);
}
foreach (VideoData videoData in playlistData.Videos)
{
PlaylistItem playlistItem = await YoutubeAPIs.Instance.AddToPlaylist(videoData.VideoId, playlistId: playlistData.WeeklyPlaylistID, makeNewPlaylistOnError: false).ConfigureAwait(false);
videoData.WeeklyPlaylistItemId = playlistItem.Id;
++count;
}
}
nRows += await database.SaveChangesAsync().ConfigureAwait(false);
The query works correctly, I get all relevant Playlist and Video Rows to work with, they have the right data in only the specified columns, and the query that is logged looks good, but saves do not work and calling database.Entry() on any of the Playlists or Video objects show that they are all detached. What am I doing wrong? Are collections saved a different way? Should my query be changed? Is there a setting on initialization that should be changed? (The only setting I have set on init that I feel like may affect this is .UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery) but the query logged isn't even split as far as I can see)
You work with projected objects
PlaylistData
VideoData
Projected objects are not tracked by EF Core, as far as I know. So the solution is either to select the DbSet's entity objects (meaning the types exposed by the database.PlaylistsAdded and playlist.Videos properties), or to load those entities just before the update and then update them.
UPDATE:
Example code for second option:
foreach (PlaylistData playlistData in pld)
{
var playlist = database.PlaylistsAdded
.Include(x=> x.Videos)
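.First(x => x.PlaylistId == playlistData.PlaylistId);
if (string.IsNullOrEmpty(playlistData.WeeklyPlaylistID))
{
playlist.WeeklyPlaylistID = await YoutubeAPIs.Instance.MakeWeeklyPlaylist().ConfigureAwait(false);
// Keep the projection in sync, since it is passed to AddToPlaylist below.
playlistData.WeeklyPlaylistID = playlist.WeeklyPlaylistID;
}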
foreach (VideoData videoData in playlistData.Videos)
{
var video = playlist.Videos.First(x=> x.VideoId == videoData.VideoId);
PlaylistItem playlistItem = await YoutubeAPIs.Instance.AddToPlaylist(videoData.VideoId, playlistId: playlistData.WeeklyPlaylistID, makeNewPlaylistOnError: false).ConfigureAwait(false);
video.WeeklyPlaylistItemId = playlistItem.Id;
++count;
}
}
NOTICE: this produces a second select for every playlist, so the first option is preferable.
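The point about projections can be illustrated without a database. The sketch below uses plain LINQ to Objects and hypothetical Video / VideoData classes (simplified stand-ins, not the real model): a Select(... new VideoData ...) builds new objects, so writing to them never touches the originals, whereas selecting the original instances does. EF Core's change tracker only knows about the entity instances, which is roughly why the projected objects show up as detached.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entity and the projection type.
public class Video
{
    public string VideoId { get; set; }
    public string WeeklyPlaylistItemId { get; set; }
}

public class VideoData
{
    public string VideoId { get; set; }
    public string WeeklyPlaylistItemId { get; set; }
}

public static class ProjectionDemo
{
    // Returns true if writing to a projected copy also changed the source object.
    public static bool ProjectionWriteReachesSource()
    {
        var store = new List<Video> { new Video { VideoId = "v1" } };

        // The projection creates brand-new VideoData objects.
        List<VideoData> projected = store
            .Select(v => new VideoData
            {
                VideoId = v.VideoId,
                WeeklyPlaylistItemId = v.WeeklyPlaylistItemId
            })
            .ToList();
        projected[0].WeeklyPlaylistItemId = "item-1";

        return store[0].WeeklyPlaylistItemId == "item-1";
    }

    // Returns true if writing through the original instance changed the source.
    public static bool EntityWriteReachesSource()
    {
        var store = new List<Video> { new Video { VideoId = "v1" } };

        // No projection: the result list holds the same instances as the store.
        List<Video> entities = store
            .Where(v => v.WeeklyPlaylistItemId == null)
            .ToList();
        entities[0].WeeklyPlaylistItemId = "item-1";

        return store[0].WeeklyPlaylistItemId == "item-1";
    }
}
```

Calling ProjectionDemo.ProjectionWriteReachesSource() reports that the copy did not reach the source, while EntityWriteReachesSource() reports that it did.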
I have a method that builds out an array of objects (list) and returns it to the parent. I was having a terrible time with the performance in my MVC app, so I decided to add a stopwatch to time the various areas of the code being called. I've now isolated it down to this area:
var items = new List<object>();
_stopwatch.Start();
var query = (from img in db.links
join link in db.lScc on img.pkID equals link.nLinkConfig
select new
{
ImageBase64 = img.bzImage,
ImageType = img.szImageType,
Description = img.szDescription,
URL = img.szURI,
HrefTarget = img.nWindowBehavior,
GroupName = link.szGroupName,
LinkConfig = link.nLinkConfig
}).DistinctBy(x => x.LinkConfig);
_stopwatch.Stop();
_stopwatch.Start();
foreach (var item in query)
{
items.Add(new
{
ImageBase64 = item.ImageBase64 != null && item.ImageBase64.Length > 0 ? Convert.ToBase64String(item.ImageBase64) : "",
ImageType = string.IsNullOrEmpty(item.ImageType) ? "" : item.ImageType,
Description = string.IsNullOrEmpty(item.Description) ? "" : item.Description,
URL = string.IsNullOrEmpty(item.URL) ? "" : item.URL,
HrefTarget = item.HrefTarget,
GroupName = item.GroupName
});
}
_stopwatch.Stop(); // takes around 11 seconds for this to complete about 20 iterations
I first thought it may be the ...Convert.ToBase64String(item.ImageBase64)... but I commented that out and it had basically no effect.
Anyone have any ideas what could be causing the slowness? This should only take a fraction of a second to complete. It deals with UI so this needs to be a lot more responsive.
The issue looks to be that the results are loaded lazily rather than up front: the query does not hit the db until the loop enumerates it, so the round trip is counted in the loop's time. The round trip to the db can be very slow.
var query = (from img in db.links
join link in db.lScc on img.pkID equals link.nLinkConfig
select new
{
ImageBase64 = img.bzImage,
ImageType = img.szImageType,
Description = img.szDescription,
URL = img.szURI,
HrefTarget = img.nWindowBehavior,
GroupName = link.szGroupName,
LinkConfig = link.nLinkConfig
}).DistinctBy(x => x.LinkConfig);
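query.Count();
Calling query.Count() forces the query to execute immediately instead of waiting for the loop to enumerate it, so you can time the database round trip separately. Note that Count() does not keep the rows; to load all the items into memory once, materialize the query with ToList().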
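To make the deferred-execution behaviour concrete, here is a minimal sketch using LINQ to Objects. The Source() iterator is a hypothetical stand-in for a database table, and each fresh enumeration of it is counted as one "query". It shows that Count() followed by a foreach runs the source twice, while ToList() runs it once and keeps the rows. A real EF query behaves analogously, but this sketch does not touch EF itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the simulated source is enumerated from the start.
    public static int SourceReads;

    // Hypothetical stand-in for a table: each enumeration is one "query".
    private static IEnumerable<int> Source()
    {
        SourceReads++;
        for (int i = 1; i <= 5; i++)
        {
            yield return i;
        }
    }

    public static int ReadsWithCountThenLoop()
    {
        SourceReads = 0;
        IEnumerable<int> query = Source().Where(x => x % 2 == 1); // nothing runs yet
        _ = query.Count();              // first execution
        foreach (int x in query) { }    // second execution
        return SourceReads;
    }

    public static int ReadsWithToList()
    {
        SourceReads = 0;
        List<int> rows = Source().Where(x => x % 2 == 1).ToList(); // single execution
        _ = rows.Count;                 // in-memory, no re-execution
        foreach (int x in rows) { }     // in-memory, no re-execution
        return SourceReads;
    }
}
```

For timing purposes, putting the stopwatch around the ToList() call separates the cost of fetching the rows from the cost of the loop body.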
It turns out there was something wrong with my datamodel so the this.Configuration.LazyLoadingEnabled = false set in the model.edmx wasn't disabling the lazy loading. I updated the model from the database and fixed the issue (there was a deleted table that kept coming back) and re-tried. It's now 1/4 the time previously. Thanks for the tips on this!
I have an export job migrating data from an old database into a new database. The problem I'm having is that the user population is around 3 million and the job takes a very long time to complete (15+ hours). The machine I am using only has 1 processor so I'm not sure if threading is what I should be doing. Can someone help me optimize this code?
static void ExportFromLegacy()
{
var usersQuery = _oldDb.users.Where(x =>
x.status == "active");
int BatchSize = 1000;
var errorCount = 0;
var successCount = 0;
var batchCount = 0;
// Using MoreLinq's Batch for sequences
// https://www.nuget.org/packages/MoreLinq.Source.MoreEnumerable.Batch
foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
{
Console.WriteLine(String.Format("Batch count at {0}", batchCount));
batchCount++;
foreach(var user in batch)
{
try
{
var userData = _oldDb.userData.Where(x =>
x.user_id == user.user_id).ToList();
if (userData.Count > 0)
{
// Insert into table
var newData = new newData()
{
UserId = user.user_id, // shortened code for brevity.
};
_db.newUserData.Add(newData);
_db.SaveChanges();
// Insert item(s) into table
foreach (var item in userData.items)
{
if (!_db.userDataItems.Any(x => x.id == item.id))
{
// Named newItem so it does not shadow the loop variable.
var newItem = new Item()
{
UserId = user.user_id, // shortened code for brevity.
DataId = newData.id // id from object created above
};
_db.userDataItems.Add(newItem);
}
_db.SaveChanges();
successCount++;
}
}
}
catch(Exception ex)
{
errorCount++;
Console.WriteLine(String.Format("Error saving changes for user_id: {0} at {1}.", user.user_id.ToString(), DateTime.Now));
Console.WriteLine("Message: " + ex.Message);
Console.WriteLine("InnerException: " + ex.InnerException);
}
}
}
Console.WriteLine(String.Format("End at {0}...", DateTime.Now));
Console.WriteLine(String.Format("Successful imports: {0} | Errors: {1}", successCount, errorCount));
Console.WriteLine(String.Format("Total running time: {0}", (DateTime.Now - exportStart).ToString(@"hh\:mm\:ss")));
}
Unfortunately, the major issue is the number of database round-trips.
You make a round-trip:
For every user, you retrieve user data by user id in the old database
For every user, you save user data in the new database
For every user, you save user data item in the new database
So if you have 3 million users, and every user has an average of 5 user data items, you make at least 3M + 3M + 15M = 21 million database round-trips, which is insane.
The only way to dramatically improve performance is to reduce the number of database round-trips.
Batch - Retrieve user by id
You can quickly reduce the number of database round-trip by retrieving all user data at once and since you don't have to track them, use "AsNoTracking()" for even more performance gains.
var list = batch.Select(x => x.user_id).ToList();
var userDatas = _oldDb.userData
.AsNoTracking()
.Where(x => list.Contains(x.user_id))
.ToList();
foreach(var userData in userDatas)
{
....
}
This change alone should already save you a few hours.
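As a rough illustration of why batching the lookups helps, the sketch below counts simulated round-trips. FakeTable and Row are hypothetical in-memory stand-ins (not EF types); every call to Query counts as one round-trip. It compares one query per user against one Contains-style query per batch of ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type for the simulated table.
public sealed class Row
{
    public int UserId { get; set; }
    public string Payload { get; set; }
}

// Hypothetical in-memory "table": each Query call counts as one round-trip.
public sealed class FakeTable
{
    private readonly List<Row> _rows;
    public int RoundTrips { get; private set; }

    public FakeTable(IEnumerable<Row> rows) { _rows = rows.ToList(); }

    public List<Row> Query(Func<Row, bool> predicate)
    {
        RoundTrips++;
        return _rows.Where(predicate).ToList();
    }
}

public static class RoundTripDemo
{
    public static FakeTable BuildTable(int userCount)
    {
        return new FakeTable(Enumerable.Range(1, userCount)
            .Select(id => new Row { UserId = id, Payload = "data-" + id }));
    }

    // One query per user id.
    public static int PerUser(FakeTable table, IReadOnlyList<int> userIds)
    {
        foreach (int id in userIds)
        {
            table.Query(r => r.UserId == id);
        }
        return table.RoundTrips;
    }

    // One query per batch of ids, mirroring list.Contains(x.user_id).
    public static int PerBatch(FakeTable table, IReadOnlyList<int> userIds, int batchSize)
    {
        for (int start = 0; start < userIds.Count; start += batchSize)
        {
            var ids = new HashSet<int>(userIds.Skip(start).Take(batchSize));
            table.Query(r => ids.Contains(r.UserId));
        }
        return table.RoundTrips;
    }
}
```

With 2,500 user ids and a batch size of 1,000, PerUser makes 2,500 simulated round-trips and PerBatch makes 3. The same ratio applied to 3 million users is where the savings come from.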
Batch - Save Changes
Every time you save a user data or item, you perform a database round-trip.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows you to perform:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
You can either call BulkSaveChanges at the end of the batch, or build a list to insert and call BulkInsert directly for even better performance.
You will, however, have to use a relation to the newData instance instead of using the ID directly.
foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
{
// Retrieve all users for the batch at once.
var list = batch.Select(x => x.user_id).ToList();
var userDatas = _oldDb.userData
.AsNoTracking()
.Where(x => list.Contains(x.user_id))
.ToList();
// Create list used for BulkInsert
var newDatas = new List<newData>();
var newDataItems = new List<Item>();
foreach(var userData in userDatas)
{
// newDatas.Add(newData);
// newDataItem.OwnerData = newData;
// newDataItems.Add(newDataItem);
}
_db.BulkInsert(newDatas);
_db.BulkInsert(newDataItems);
}
EDIT: Answer subquestion
One of the properties of a newDataItem, is the id of newData. (ex.
newDataItem.newDataId.) So newData would have to be saved first in
order to generate its id. How would I BulkInsert if there is a
dependency of an another object?
You must use navigation properties instead. With a navigation property, you never have to specify the parent id; you set the parent object instance instead.
public class UserData
{
public int UserDataID { get; set; }
// ... properties ...
public List<UserDataItem> Items { get; set; }
}
public class UserDataItem
{
public int UserDataItemID { get; set; }
// ... properties ...
public UserData OwnerData { get; set; }
}
var userData = new UserData();
var userDataItem = new UserDataItem();
// Use navigation property to set the parent.
userDataItem.OwnerData = userData;
Tutorial: Configure One-to-Many Relationship
Also, I don't see a BulkSaveChanges in your example code. Would that
have to be called after all the BulkInserts?
BulkInsert inserts directly into the database. You don't have to call "SaveChanges" or "BulkSaveChanges"; once you invoke the method, it's done ;)
Here is an example using BulkSaveChanges:
foreach (IEnumerable<users> batch in usersQuery.Batch(BatchSize))
{
// Retrieve all users for the batch at once.
var list = batch.Select(x => x.user_id).ToList();
var userDatas = _oldDb.userData
.AsNoTracking()
.Where(x => list.Contains(x.user_id))
.ToList();
// Create list used for BulkInsert
var newDatas = new List<newData>();
var newDataItems = new List<Item>();
foreach(var userData in userDatas)
{
// newDatas.Add(newData);
// newDataItem.OwnerData = newData;
// newDataItems.Add(newDataItem);
}
var context = new UserContext();
context.userDatas.AddRange(newDatas);
context.userDataItems.AddRange(newDataItems);
context.BulkSaveChanges();
}
BulkSaveChanges is slower than BulkInsert because it has to use some internal methods from Entity Framework, but it is still much faster than SaveChanges.
In the example, I create a new context for every batch to avoid memory issues and gain some performance. If you re-use the same context for all batches, you will end up with millions of tracked entities in the ChangeTracker, which is never a good idea.
Entity Framework is a very bad choice for importing large amounts of data. I know this from personal experience.
That being said, I found a few ways to optimize things when I tried to use it in the same way you are.
The Context will cache objects as you add them, and the more inserts you do, the slower future inserts will get. My solution was to limit each context to about 500 inserts before I disposed of that instance and created a new one. This boosted performance significantly.
I was able to make use of multiple threads to increase performance, but you will have to be very careful about resource contention. Each thread will definitely need its own Context, don't even think about trying to share it between threads. My machine had 8 cores, so threading will probably not help you as much; with a single core I doubt it will help you at all.
Turn off change tracking with AutoDetectChangesEnabled = false; change tracking is incredibly slow. Unfortunately this means you have to modify your code to make all changes directly through the context. No more Entity.Property = "Some Value"; it becomes Context.Entity(e=> e.Property).SetValue("Some Value"); (or something like that, I don't remember the exact syntax), which makes the code ugly.
Any queries you do should definitely use AsNoTracking.
With all that, I was able to cut a ~20 hour process down to about 6 hours, but I still don't recommend using EF for this. It was an extremely painful project due almost entirely to my poor choice of EF to add data. Please use something else... anything else...
I don't want to give the impression that EF is a bad data access library, it is great at what it was designed to do, unfortunately this is not what it was designed for.
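The "new context every N inserts" idea can be sketched as follows. FakeContext is a hypothetical stand-in that only counts tracked objects; with a real DbContext you would call the DbSet's Add and the context's SaveChanges at the same points. The limit of 500 is simply the figure mentioned above, not a recommendation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DbContext: it only counts what it is asked to track.
public sealed class FakeContext : IDisposable
{
    public static int Created;     // how many contexts were constructed
    public static int MaxTracked;  // largest tracker size seen in any context

    public int Tracked { get; private set; }

    public FakeContext() { Created++; }

    public void Add(object entity)
    {
        Tracked++;
        if (Tracked > MaxTracked) MaxTracked = Tracked;
    }

    // Pretends to write every tracked entity and reports how many were written.
    public int SaveChanges() { return Tracked; }

    public void Dispose() { }
}

public static class BatchedInsertDemo
{
    // Inserts all items, replacing the context after every batchSize items.
    public static int InsertInBatches<T>(IEnumerable<T> items, int batchSize)
    {
        int saved = 0;
        FakeContext ctx = new FakeContext();
        try
        {
            foreach (T item in items)
            {
                ctx.Add(item);
                if (ctx.Tracked == batchSize)
                {
                    saved += ctx.SaveChanges();
                    ctx.Dispose();
                    ctx = new FakeContext(); // fresh, empty change tracker
                }
            }
            saved += ctx.SaveChanges(); // flush the final partial batch
        }
        finally
        {
            ctx.Dispose();
        }
        return saved;
    }
}
```

Inserting 1,200 items with a batch size of 500 saves all 1,200 while no single context ever tracks more than 500 of them, which is the property that keeps later inserts from slowing down.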
I can think of a few options.
1) A small speed increase could be gained by moving your _db.SaveChanges() call below the closing bracket of your foreach()
foreach (...){
}
successCount += _db.SaveChanges();
2) Add items to a list, and then to context
List<ObjClass> list = new List<ObjClass>();
foreach (...)
{
list.Add(new ObjClass() { ... });
}
_db.newUserData.AddRange(list);
successCount += _db.SaveChanges();
3) If it's a large amount of data, save in batches
List<ObjClass> list = new List<ObjClass>();
int cnt=0;
foreach (...)
{
list.Add(new ObjClass() { ... });
if (++cnt % 100 == 0) // batches of 100
{
_db.newUserData.AddRange(list);
successCount += _db.SaveChanges();
list.Clear();
// Optional if a HUGE amount of data
if (cnt % 1000 == 0)
{
_db = new MyDbContext();
}
}
}
// Don't forget that!
_db.newUserData.AddRange(list);
successCount += _db.SaveChanges();
list.Clear();
4) If it's too big even for that, consider using bulk inserts. There are a few examples on the internet and a few free libraries around.
Ref: https://blogs.msdn.microsoft.com/nikhilsi/2008/06/11/bulk-insert-into-sql-from-c-app/
With most of these options you lose some control over error handling, as it is difficult to know which item failed.
I need to match email sends with email bounces so I can find if they were delivered or not. The catch is, I have to limit the bounce to within 4 days of the send to eliminate matching the wrong send to the bounce. Send records are spread over a period of 30 days.
LinkedList<event_data> sent = GetMyHugeListOfSends(); //for example 1M+ records
List<event_data> bounced = GetMyListOfBounces(); //for example 150k records
bounced = bounced.OrderBy(o => o.event_date).ToList(); //this ensures the most accurate match of bounce to send (since we find the first match)
List<event_data> delivered = new List<event_data>();
event_data deliveredEmail = new event_data();
foreach (event_data sentEmail in sent)
{
event_data bounce = bounced.Find(item => item.email.ToLower() == sentEmail.email.ToLower() && (item.event_date > sentEmail.event_date && item.event_date < sentEmail.event_date.AddDays(deliveredCalcDelayDays)));
//create delivered records
if (bounce != null)
{
//there was a bounce! don't add a delivered record!
//remove bounce, it only applies to one send!
bounced.Remove(bounce);
}
else
{
//if sent is not bounced, it's delivered
deliveredEmail.sid = siteid;
deliveredEmail.mlid = mlid;
deliveredEmail.mid = mid;
deliveredEmail.email = sentEmail.email;
deliveredEmail.event_date = sentEmail.event_date;
deliveredEmail.event_status = "Delivered";
deliveredEmail.event_type = "Delivered";
deliveredEmail.id = sentEmail.id;
deliveredEmail.number = sentEmail.number;
deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
delivered.Add(deliveredEmail); //add the new delivered
deliveredEmail = new event_data();
}
if (bounced.Count() == 0)
{
break; //no more bounces to match!
}
}
So I did some testing and it's processing about 12 sent records per second. At 1M+ records, it will take 25+ hours to process!
Two questions:
How can I find the exact line that is taking the most time?
I am assuming it's the lambda expression finding the bounce that is taking the longest since this was much faster before I put that in there. How can I speed this up?
Thanks!
Edit
---Ideas---
One idea I just had is to sort the sends by date like I did the bounces so that the search through the bounces will be more efficient, since an early send would be likely to hit an early bounce as well.
Another idea I just had is to run a couple of these processes in parallel, although I would hate to multi-thread this simple application.
I would be reasonably confident in saying that yes it is your find that is taking the time.
It looks like you are certain that the find will return only 0 or 1 records (not a list). In that case, the way to speed this up is to create a lookup: instead of a List<event_data> for your bounced variable, create a Dictionary<key, event_data>. Then you can look up the value by key instead of doing a find.
The trick is in creating your key (I don't know enough about your app to help with that) but essentially the same criteria that is in your find.
EDIT. (adding some pseudo code)
void Main()
{
var hugeListOfEmails = GetHugeListOfEmails();
var allBouncedEmails = GetAllBouncedEmails();
// The local is named bouncedLookup so it does not clash with the method name.
IDictionary<string, EmailInfo> bouncedLookup = CreateLookupOfBouncedEmails(allBouncedEmails);
foreach(var info in hugeListOfEmails)
{
if(bouncedLookup.ContainsKey(info.emailAddress))
{
// Email is bounced;
}
else
{
// Email is not bounced
}
}
}
public IEnumerable<EmailInfo> GetHugeListOfEmails()
{
yield break;
}
public IEnumerable<EmailInfo> GetAllBouncedEmails()
{
yield break;
}
public IDictionary<string, EmailInfo> CreateLookupOfBouncedEmails(IEnumerable<EmailInfo> emailList)
{
var result = new Dictionary<string, EmailInfo>();
foreach(var e in emailList)
{
if(!result.ContainsKey(e.emailAddress))
{
if(true) // placeholder: replace with the check that e satisfies the date conditions
{
result.Add(e.emailAddress, e);
}
}
}
return result;
}
public class EmailInfo
{
public string emailAddress { get; set; }
public DateTime DateSent { get; set; }
}
You can improve this by using the ToLookup method to create a lookup table keyed by email address
var bouncedLookup = bounced.ToLookup(k => k.email.ToLower());
and using it in the loop to look up by email first
var filteredBounced = bouncedLookup[sentEmail.email.ToLower()];
// mini optimisation here
var endDate = sentEmail.event_date.AddDays(deliveredCalcDelayDays);
event_data bounce = filteredBounced.FirstOrDefault(item => item.event_date > sentEmail.event_date && item.event_date < endDate);
I could not compile it, but I think that should do. Please try it.
You are finding items in a list. That means it has to traverse the whole list, so each lookup is an O(n) operation. Could you not store the sent emails in a Dictionary keyed by the email address you are searching on? Then go through the bounces, linking back to the emails in the dictionary. Each lookup will be constant time and you will go through the bounces once, so it will be O(n) overall. Your current method is O(n squared).
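A minimal sketch of that idea, under some assumptions: MailEvent is a hypothetical, simplified version of event_data, emails are compared case-insensitively, and each bounce is paired with the earliest unmatched send that precedes it by less than windowDays. That pairing order is an assumption of this sketch and may differ from the question's loop in edge cases. Each dictionary lookup is constant time, and the scan inside a bucket only covers sends to that one address.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified event record (the real event_data has more fields).
public sealed class MailEvent
{
    public string Email { get; set; }
    public DateTime Date { get; set; }
}

public static class BounceMatcher
{
    // Returns the sends that have no bounce within windowDays after them.
    public static List<MailEvent> Delivered(
        IEnumerable<MailEvent> sent, IEnumerable<MailEvent> bounced, int windowDays)
    {
        List<MailEvent> sends = sent.ToList();

        // Index sends by email, each bucket in chronological order.
        Dictionary<string, List<MailEvent>> sendsByEmail = sends
            .GroupBy(s => s.Email, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(s => s.Date).ToList(),
                StringComparer.OrdinalIgnoreCase);

        // Sends that have been paired with a bounce (compared by reference).
        var bouncedSends = new HashSet<MailEvent>();

        foreach (MailEvent bounce in bounced.OrderBy(b => b.Date))
        {
            List<MailEvent> candidates;
            if (!sendsByEmail.TryGetValue(bounce.Email, out candidates))
            {
                continue; // bounce for an address we never sent to
            }

            // Earliest unmatched send inside the window before this bounce.
            MailEvent match = candidates.FirstOrDefault(s =>
                !bouncedSends.Contains(s) &&
                s.Date < bounce.Date &&
                bounce.Date < s.Date.AddDays(windowDays));

            if (match != null)
            {
                bouncedSends.Add(match); // a bounce applies to only one send
            }
        }

        return sends.Where(s => !bouncedSends.Contains(s)).ToList();
    }
}
```

Everything left unpaired is treated as delivered, which matches the rule in the question that a send without a bounce in the window counts as delivered.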
Converting bounced to a SortedList might be a good solution (note that ToDictionary throws if two bounces share the same email):
SortedList<string, event_data> sl = new SortedList<string, event_data>(bounced.ToDictionary(s => s.email, s => s));
and to find a bounce use
sl.Where(c => c.Key.Equals(sentEmail.email, StringComparison.OrdinalIgnoreCase) && ...).Select(c => c.Value).FirstOrDefault();
There's another concern about your code that I want to point out: memory consumption. I don't know your machine configuration, but here are some thoughts about the code:
Initially you are allocating space for 1.2M+ objects of type event_data. I can't see the full definition of event_data, but assuming that the emails are all unique and seeing that the type has quite a few properties, I assume that such a collection is rather heavy (possibly hundreds of megabytes).
Next you are allocating another batch of event_data objects (almost 1M if I've counted right), which makes memory consumption even heavier.
I don't know about the other objects present in your application's data model, but considering everything I've mentioned, you can easily get close to the memory limit for a 32-bit process and thus force the GC to run very often. In fact, you could easily have a GC collection after each call to bounced.Remove(bounce), and that would significantly slow down your app.
So, even if you have plenty of memory left and/or your app is 64-bit, I would try to minimize memory consumption. I'm fairly sure it would make your code run faster. For example, you could fully process each deliveredEmail without storing it, or load your initial event_data in chunks.
On consideration, the number of bounces is relatively small, so why not pre-optimise the bounce lookup as much as possible? This code makes a delegate for each possible bounce and groups them into a dictionary for access by the e-mail key.
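private static bool DateInRange(
DateTime sentDate,
DateTime bouncedDate,
int deliveredCalcDelayDays)
{
// A bounce only counts if it happened after the send...
if (bouncedDate <= sentDate)
{
return false;
}
// ...and within the allowed number of days.
return bouncedDate < sentDate.AddDays(deliveredCalcDelayDays);
}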
static IEnumerable<event_data> GetDeliveredMails(
IEnumerable<event_data> sent,
IEnumerable<event_data> bounced,
int siteId,
int mlId,
int mId,
int deliveredCalcDelayDays)
{
var grouped = bounced.GroupBy(
b => b.email.ToLowerInvariant());
var lookup = grouped.ToDictionary(
g => g.Key,
g => g.OrderBy(e => e.event_date).Select(
e => new Func<DateTime, bool>(
s => DateInRange(s, e.event_date, deliveredCalcDelayDays))).ToList());
foreach (var s in sent)
{
var key = s.email.ToLowerInvariant();
List<Func<DateTime, bool>> checks;
if (lookup.TryGetValue(key, out checks))
{
var match = checks.FirstOrDefault(c => c(s.event_date));
if (match != null)
{
checks.Remove(match);
continue;
}
}
yield return new event_data
{
sid = siteId,
mlid = mlId,
mid = mId,
email = s.email,
event_date = s.event_date,
event_status = "Delivered",
event_type = "Delivered",
id = s.id,
number = s.number,
laststoretransaction = s.laststoretransaction
};
}
}
You could try pre-compiling the delegates in lookup if this is not fast enough.
Ok the final solution I found was a Dictionary for the bounces.
The sent LinkedList was sorted by sent_date so it would loop through in chronological order. That's important because I have to match the right send to the right bounce.
I made a Dictionary<string, List<event_data>>, so the key was the email, and the value was a List of all event_data bounces for that email address. The List was sorted by event_date since I wanted to make sure the first bounce was matched to the send.
Final result...it went from processing 700 records/minute to 500k+ records/second.
Here is the final code:
LinkedList<event_data> sent = GetMyHugeListOfSends();
IEnumerable<event_data> sentOrdered = sent.OrderBy(send => send.event_date);
Dictionary<string, List<event_data>> bounced = GetMyListOfBouncesAsDictionary();
List<event_data> delivered = new List<event_data>();
event_data deliveredEmail = new event_data();
List<event_data> bounces = null;
bool matchedBounce = false;
foreach (event_data sentEmail in sentOrdered)
{
matchedBounce = false;
//create delivered records
if (bounced.TryGetValue(sentEmail.email, out bounces))
{
//there was a bounce! find out if it was within 4 days after the send!
foreach (event_data bounce in bounces)
{
if (bounce.event_date > sentEmail.event_date &&
bounce.event_date <= sentEmail.event_date.AddDays(4))
{
matchedBounce = true;
//remove the record because a bounce can only match once back to a send
bounces.Remove(bounce);
if(bounces.Count == 0) //no more bounces for this email
{
bounced.Remove(sentEmail.email);
}
break;
}
}
if (matchedBounce == false) //no matching bounces in the list!
{
//if sent is not bounced, it's delivered
deliveredEmail.sid = siteid;
deliveredEmail.mlid = mlid;
deliveredEmail.mid = mid;
deliveredEmail.email = sentEmail.email;
deliveredEmail.event_date = sentEmail.event_date;
deliveredEmail.event_status = "Delivered";
deliveredEmail.event_type = "Delivered";
deliveredEmail.id = sentEmail.id;
deliveredEmail.number = sentEmail.number;
deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
delivered.Add(deliveredEmail); //add the new delivered
deliveredEmail = new event_data();
}
}
else
{
//if sent is not bounced, it's delivered
deliveredEmail.sid = siteid;
deliveredEmail.mlid = mlid;
deliveredEmail.mid = mid;
deliveredEmail.email = sentEmail.email;
deliveredEmail.event_date = sentEmail.event_date;
deliveredEmail.event_status = "Delivered";
deliveredEmail.event_type = "Delivered";
deliveredEmail.id = sentEmail.id;
deliveredEmail.number = sentEmail.number;
deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
delivered.Add(deliveredEmail); //add the new delivered
deliveredEmail = new event_data();
}
if (bounced.Count() == 0)
{
break; //no more bounces to match!
}
}
I am currently running an app with a very LONG Linq transaction. Everything has to happen in this one transaction, and I'm not sure if or where objects are interfering with each other.
When I try to save the changes I see this warning come up:
System.InvalidOperationException: The changes to the database were
committed successfully, but an error occurred while updating the
object context. The ObjectContext might be in an inconsistent state.
Inner exception message: AcceptChanges cannot continue because the
object's key values conflict with another object in the
ObjectStateManager. Make sure that the key values are unique before
calling AcceptChanges.
From what I've read googling around, people have found a lot of one-off solutions (rarely having anything to do with conflicting keys), and they don't usually post what the tip-offs are (better than nothing, of course).
What I'm not clear on is how do I hunt down the cause of this problem?
I update a lot of records in different places and let them go out of scope. I'm guessing the .NET compiler knows how to keep track of these objects without letting them go through the GC so it can commit everything at the end. And all the changes seem to end up in the database afterward.
Example:
// create new A, SA for new incoming tasks
SF_SUB_AREA sa = null;
SF_AREA a = null;
if (isNewSA) // new SA
{
areaID = MakeSalesForceGUID();
a = new SF_AREA
{
ID = areaID,
DESCRIPTION = t.DESCRIPTION,
CU_NUMBER = Convert.ToString(t.CU_NUMBER),
FC = t.FC,
PROJECT = cp.ID,
DELETE_FLAG = "I"
};
ctx.SF_AREA.AddObject(a);
SAID = MakeSalesForceGUID();
sa= new SF_SUB_AREA
{
ID = SAID,
PROJECT_REGION = t.CR,
AREA = areaID,
DELETE_FLAG = "I"
};
ctx.SF_SUB_AREA.AddObject(sa);
}
else // old SA
{
List<SF_AREA> lia = (from a2 in ctx.SF_AREA
join sa2 in ctx.SF_SUB_AREA on a2.ID equals sa2.AREA
where sa2.ID == t.SUB_AREA
select a2).ToList();
if ((lia != null) && (lia.Count > 0))
{
a = lia[0];
a.DELETE_FLAG = "U";
a.CLIENT_UNIT_NUMBER = Convert.ToString(t.CU_NUMBER);
a.DESCRIPTION = t.DESCRIPTION;
a.FC = t.FC;
a.PROJECT = cp.ID;
} // TODO: throw an error here for else block
List<SF_SUB_AREA> lisa = (from sa2 in ctx.SF_SUB_AREA
where sa2.ID == t.SUB_AREA
select sa2).ToList();
if ((lisa != null) && (lisa.Count > 0))
{
sa = lisa[0];
sa.PROJECT_REGION = t.AREA;
sa.AREA = lisa[0].AREA;
sa.DELETE_FLAG = "U";
}
}
...
ctx.SaveChanges(); // left out the try/catch
Currently I'm just creating new context every time I commit something, but I don't know if this is advisable.
foreach (SF_MOVE_ORDER mo in liMO ) {
using (SFEntitiesRM ctx2 = new SFEntitiesRM()) // new context for every MO since it goes into an unknown state after every commit
{
List<SF_CLIENT_PROJECT> liCP = (from cp in ctx2.SF_CLIENT_PROJECT
where cp.ID == mo.CLIENT_PROJECT
select cp).ToList();
if ((liCP != null) && (liCP.Count > 0))
{
PerformMoveOrder(mo, liCP[0], ctx2);
}
}
}
Usually with errors like this, it is best to start out by saving one object and building in the complexities one at a time. That way, you can start to figure out where the problem lies. It can end up being somewhere down the object graph that you weren't even expecting. But I would not just continue to throw the whole LINQ update at the context. Break it up into smaller saves, rebuild it up to the largest graph, and you'll find your error.