Datacontext.SaveChanges() adds unwanted records to database

Datacontext.SaveChanges() adds unwanted records to database - c#

I apologize in advance if I'm not clear about my issue but I'm working on a project I don't really understand yet, so feel free to ask to elaborate on a certain part.
Right now I'm working on a test that was already present in the project but didn't work correctly i.e. the data added to the database didn't get deleted and it still used code from before a rework when I wasn't assigned yet. At this moment the test works properly but after it has run there are records left in the database that shouldn't even have been added.
In the code, which you can find here http://pastebin.com/YzRqi7dt , you can see the test method. The records that are added in the TestInitialize are deleted properly but after finishing the test there are still 4 records too much in the database. 2 of them are copies of the ConexioContacts in the testmethod. I know it are copies because I noticed at a certain point during debugging that there were 8 contacts added instead of 4. The other 2 contacts that remain in the database look like they come from my Seed() method, but the Seed() method doesn't get called or anything.
The adding of these 4 records happens during the
List<Synchronizer<BaseContact, ExternalContact, ConexioContact>.ConexioEntitySyncContext> matches =
Synchronizer<BaseContact, ExternalContact, ConexioContact>
.FindMatches(conexioEntities, externalEntities, _unitOfWork);
when it enters the DataContext.SaveChanges() in the background (discovered while debugging). The method above can be found here: http://pastebin.com/t9iMGB31 . The DataContext.SaveChanges() gets called in the UnitOfWork.SaveChanges() at the end of the method.
Hope this was clear, if not, please ask.
With kind regards,
Gravinco
EDIT:
The transaction suggested by Wyatt Earp resolves the problem if I run my test but I still don't know how or why the error occurred, if someone could elaborate on this I would be very grateful.

One thing you could do is to wrap all of the work inside of a transaction and roll it back when you're done. This should keep any changes from hitting the database. Also, it might help locate the issue, if it is possibly coming from a different test or something...
Add a TransactionScope transaction; object to your test class and at the beginning of your TestInit() method, just do transaction = new TransactionScope(). Then, you can replace everything in your CleanTest() method with transaction.Dispose();

Related

Concurrency of modifying a Roslyn workspace? How does Visual studio do it?

probably a stupid question, but: Is there any way to reliable apply changes to a Roslyn workspace concurrently? And if not, what is the best practice to ensure it's done in correct order?
Example: Say you have some solution loaded into a Workspace, and have a basic function that will add a new project to the solution:
private Workspace _workspace;
// This is not thread-safe, right?
void AddProject()
{
var project = _workspace.CurrentSolution.AddProject(/* ... */);
_workspace.TryApplyChanges(project.Solution);
}
First question: Correct me if wrong, but I think AddProject would not be thread-safe, is that correct?
For example, lets say you want to add to new projects concurrently. So you call AddProject() twice concurrently.
My understanding is there is a race condition, and you might end up with both projects added (if one of the calls completes TryApplyChanges before the other call reaches _workspace.CurrentSolution), or only one of the projects added (if both calls have reached _worksapce.CurrentSolution before either has executed TryApplyChanges).
Second question: If my understanding is correct, then is there any best way to avoid these concurrency issues? I suppose the only real way would be to schedule/execute each modification sequentually, right?
How does Visual Studio, for example, do this.. Are modifications to the Workspace e.g. only done on the Dispatcher?
Thanks

The underlying code is thread-safe, but your usage is not.
TryApplyChanges() will return false if the solution changed in the meantime. If that happens, you need to try the changes again (starting from the new CurrentSolution).
Basically, you need to write
Solution changed;
do {
changed = _workspace.CurrentSolution....();
} while (!_workspace.TryApplyChanges(changed);
This is called a compare-and-swap loop; see my blog for more details.

What Happens if a Web Api Controller is hit in rapid succession?

I have a Web Api 2 (using C#) controller method running an asynchronous save (RESTful POST method). Our QA tester mashed the save button (client-side issue that was fixed, irrelevant to Q). Without a duplicate check, naturally this was saving duplicate entries to the db.
I implemented a duplicate check (duplicateCount just selects the number of items that are exactly like the item passed to post, should be 0, and it works, details irrelevant):
var duplicateCount = (fooCollection.CountAsync(aggregateFilter)).Result;
if (duplicateCount > 0) { return BadRequest(); }
This check works...except on the first button mash - two duplicate entries get saved, each from an individual controller hit.
So, it seems to me that the second controller hit happens before the first controller hit manages to save the item to the db, so the duplicate check passes. Is this possible?
I am more interested in the theory than the particular answer. Also, I know I can check for duplicates in the db as well, this is more of a conceptual question. The MongoDb part is really there just for completeness, I imagine it would be similar if I was doing an async save to SQL.
edit: Someone asked how I am making the call in comments. It's through RestAngular, but imo it's irrelevant because I know the controller is getting hit as many times as I hit the button. I also know for a fact that it does not create duplicates on a single hit.

Quick answer is "yes" - with the ability of a controller to be instantiated on any number of threads simultaneously it acts as though its multi-threaded. Your code is not "thread safe" in that its business operation requires an exclusive lock to be placed on some element of shared information (in this case the state of the database).
You could (I would not recommend) open a Mutex or a database transaction to force single thread behaviour, but your throughput would tank.
I personally don't face this very often because of my (possibly bad) insistence on all my entities having Guid primary keys and use of the SQL Merge command to either insert or update. This may be a helpful pattern (it doesn't matter how many times you send the same "message" to the controller - it will never save a duplicate).

Prevent the DbContext from repeatedly trying to save bad data

I have a process that is importing an Excel Spreadhseet, and parsing the data into my data objects. The source of this data is very questionable, as we're moving our customer from a spreadsheet-based data management into a managed database system with checks for valid data.
During my import process, I do some basic sanity checks of the data to accommodate just how bad the data could be that we're importing, but I have my overall validation being done in the DbContext.
Part of what I'm trying to do is that I want to provide the Row # in the spreadsheet that the data is bad so they can easily determine what they need to fix to get the file to import.
Once I have the data from the spreadsheet (model), and the Opportunity they're working with from the database (opp), here's the pseudocode of my process:
foreach (var model in Spreadsheet.Rows) { // Again, pseudocode
if(opp != null && ValidateModel(model, opp, row)) {
// Copy properties to the database object
// This is in a Repository-layer method, not directly in my import process.
// Just written here for clarity instead of several nested method calls.
context.SaveChanges();
}
}
I can provide more of the code here if needed, but the problem comes in my DbContext's ValidateEntity() method (override of DbContext).
Again, there's nothing wrong with the code that I've written, as far as I'm aware, but if an Opportunity that failed this level of validation, then it stays as part of the unsaved objects in the context, meaning that it repeatedly tries to get validated every time the ValidateEntity() is called. This leads to a repeat of the same Validation Error message for every row after the initial problem occurs.
Is there a way to [edit]get the Context to stop trying to validate the object after it fails validation once[edit]? I know I could wait until the end and call context.SaveChanges() once at the end to get around this, but I would like to be able to match this with what row it is in the Database.
For reference, I am using Entity Framework 6.1 with a Code First approach.
EDIT Attempting to clarify further for Marc L. (including an update to the code block above)
Right now, my process will iterate through as many rows as there are in the Spreadsheet. The reason why I'm calling my Repository layer with each object to save, instead of working with an approach that only calls context.SaveChanges() once is to allow myself the ability to determine which row is the one that is causing a validation error.
I'm glad that my DbContext's custom ValidateEntity() methods are catching the validation errors, but the problem resides in the fact that it is not throwing the DbEntityValidationException for the same entity multiple times.
I would like it so that if the object fails validation once, the context no longer tries to save the object, regardless of how many times context.SaveChanges() is called.

Your question is not a dupe (this is about saving, not loaded entities) but you could follow Jimmy's advice above. That is, once an entity is added to the context it is tracked in the "added" state and the only way to stop it from re-validating is by detaching it. It's an SO-internal link, but I'll reproduce the code snippet:
dbContext.Entry(entity).State = EntityState.Detached;
However, I don't think that's the way you want to go, because you're using exceptions to manage state unnecessarily (exceptions are notoriously expensive).
Working from the information given, I'd use a more set-based solution:
modify your model class so that it contains a RowID that records the original spreadsheet row (there's probably other good reasons to have this, too)
turn off entity-tracking for the context (turns of change detection allowing each Add() to be O(1))
add all the entities
call context.GetValidationErrors() and get all your errors at once, using the aforementioned RowID to identify the invalid rows.
You didn't indicate whether your process should save the good rows or reject the file as a whole, but this will accommodate either--that is, if you need to save the good rows, detach all the invalid rows using the code above and then SaveChanges().
Finally, if you do want to save the good rows and you're uncomfortable with the set-based method, it would be better to use a new DbContext for every single row, or at least create a new DbContext after each error. The ADO.NET team insists that context-creation is "relatively cheap" (sorry I don't have a cite or stats at hand for this) so this shouldn't damage your throughput too much. Even so, it will at least remain O(n). I wouldn't blame you, managing a large context can open you up to other issues as well.

Duplicate returned by Guid.NewGuid()?

We have an application that generates simulated data for one of our services for testing purposes. Each data item has a unique Guid. However, when we ran a test after some minor code changes to the simulator all of the objects generated by it had the same Guid.
There was a single data object created, then a for loop where the properties of the object were modified, including a new unique Guid, and it was sent to the service via remoting (serializable, not marshal-by-ref, if that's what you're thinking), loop and do it again, etc.
If we put a small Thread.Sleep( ...) inside of the loop, it generated unique id's. I think that is a red-herring though. I created a test app that just created one guid after another and didn't get a single duplicate.
My theory is that the IL was optimized in a way that caused this behavior. But enough about my theories. What do YOU think? I'm open to suggestions and ways to test it.
UPDATE: There seems to be a lot of confusion about my question, so let me clarify. I DON'T think that NewGuid() is broken. Clearly it works. Its FINE! There is a bug somewhere though, that causes NewGuid() to either:
1) be called only once in my loop
2) be called everytime in my loop but assigned only once
3) something else I haven't thought of
This bug can be in my code (MOST likely) or in optimization somewhere.
So to reiterate my question, how should I debug this scenario?
(and thank you for the great discussion, this is really helping me clarify the problem in my mind)
UPDATE # 2: I'd love to post an example that shows the problem, but that's part of my problem. I can't duplicate it outside of the whole suite of applications (client and servers).
Here's a relevant snippet though:
OrderTicket ticket = new OrderTicket(... );
for( int i = 0; i < _numOrders; i++ )
{
ticket.CacheId = Guid.NewGuid();
Submit( ticket ); // note that this simply makes a remoting call
}

Does Submit do an async call, or does the ticket object go into another thread at any stage.
In the code example you are reusing the same object. What if Submit sends the ticket in a background thread after a short delay (and does not take a copy). When you change the CacheId you are actually updating all the pending submits. This also explains why a Thread.Sleep fixes the problem. Try this:
for( int i = 0; i < _numOrders; i++ )
{
OrderTicket ticket = new OrderTicket(... );
ticket.CacheId = Guid.NewGuid();
Submit( ticket ); // note that this simply makes a remoting call
}
If for some reason this is not possible, try this and see if they are still the same:
ticket.CacheId = new Guid("00000000-0000-0000-0000-" +
string.Format("{0:000000000000}", i));

Thousands of developers use Guids in .NET. If Guid.NewGuid() had any tendency at all to get "stuck" on one value, the problem would have been encountered long ago.
The minor code changes are the sure culprit here. The fact that Thread.Sleep (which is less a red herring than a fish rotting in the sun) "fixes" your problem suggests that your properties are being set in some weird way that can't take effect until the loop stops blocking (either by ending or by Thread.Sleep). I'd even be willing to bet that the "minor change" was to reset all the properties from a separate thread.
If you posted some sample code, that would help.

It's a bug in your code. If you've managed to generate multiple guid's it is the most likely explanation. The clue is here in your question: "when we ran a test after some minor code changes to the simulator all of the objects generated by it had the same Guid"

See this article about how a Guid is created.
This artcile came from This answer.
Bottom line if you are creating the GUIDs too quickly and the clock hasn't moved forward that is why you are getting some as the same. However when you put a sleep in it works because the clock has moved.

The code in Submit and OrderTicket would be helpful as well...
You're reusing OrderTicket. I'd suspect that either you (or remoting itself) is batching calls out - probably in respect to # of connections/host limits - and picking up the last value of CacheId when it finally sends them along.
If you debug or Thread.Sleep the app, you're changing the timing so that the remoting call finishes before you assign a new CacheId.
Are you asyncing the remoting call? I'd think a sync call would block - but I'd check with a packet sniffer like Wireshark to be sure. Regardless, just changing to creating a new OrderTicket in each iteration would probably do the trick.
Edit: The question is not about NewGuid being broken...so my previous answer has been removed.

I dont know the details of how GUIDs are generated.. yet. However currently my org. is breeding GUIDs at a rate that would put rabbits to shame. So I can vouch for the fact that GUIDs aren't broken.. yet.
Post the source code if possible.. or a clone repro app. Many times I find the act of creating that clone app to repro the problem shows me the issue.
The other approach would be to comment out "those minor changes". If that fixes the problem, you can then triangularize to find the offending line of code. Eye-ball the minor changes hard... I mean real Hard.
Do let us know how it goes... this sounds interesting.

My gut is telling me something along these lines is going on...
class OrderTicket
{
Guid CacheId {set {_guid = new Guid("00000000-0000-0000-0000-");}
}
Log the value of CacheId into a log file every time its called with a stack trace ... Maybe someone else is setting it.

Lockless list help!

Hi im trying to write a lockless list i got the adding part working it think but the code that extracts objects from the list does not work to good :(
Well the list is not a normal list.. i have the Interface IWorkItem
interface IWorkItem
{
DateTime ExecuteTime { get; }
bool Cancelled { get; }
void Execute(DateTime now);
}
and well i have a list where i can add this :P and the idear is when i run Get(); on the list it should loop it until it finds a IWorkItem that
If (item.ExecuteTime < DateTime.Now)
and remove it from the list and return it..
i have ran tests with many threads on my dual core cpu and it seems that Add works never failed so far but the Get function looses some workitems some where i have no idear whats wrong.....
ps if i get this working any one is free to use the code :) well you are any way but i dont se the point when its bugged :P
The code is here http://www.easy-share.com/1903474734/LinkedList.zip and if you try to run it you will see that it will some times not be able to get as many workitems as it did put in the list...
Edit: I have got a lockless list working it was faster than using the lock(obj) statment but i have a lock object that uses Interlocked that was still outpreforming the lockless list, im going to try to make a lockless arraylist and se if i get the same results there when im done ill upload the result here..

The problem is your algorithm: Consider this sequence of events:
Thread 1 calls list.Add(workItem1), which completes fully.
Status is:
first=workItem1, workItem1.next = null
Then thread 1 calls list.Add(workItem2) and reaches the spot right before the second Replace (where you have the comment "//lets try").
Status is:
first=workItem1, workItem1.next = null, nextItem=workItem1
At this point thread 2 takes over and calls list.Get(). Assume workItem1's executionTime is now, so the call succeeds and returns workItem1.
After this status is:
first = null, workItem1.next = null
(and in the other thread, nextItem is still workItem1).
Now we get back to the first thread, and it completes the Add() by setting workItem1.next:=workItem2.
If we call list.Get() now, we will get null, even though the Add() completed successfully.
You should probably look up a real peer-reviewed lock-free linked list algorithm. I think the standard one is this by John Valois. There is a C++ implementation here. This article on lock-free priority queues might also be of use.

You can use a timestamping protocol for datastructures just fine, mirroring this example from the database world:
Concurrency
But be clear that each item needs both a read and write timestamp, and be sure to follow the rules of the algorithm clearly.
There are some additional difficulties of implementing this on a linked list though, I think. The database example would be fine for a vector where you know the array index of what you want. However, in a linked list, you may need to walk down the pointers -- and the structure of the list could change while you are searching! I guess you could solve that by some sort of nuance (or if you just want to traverse the "new" list as it is, do nothing), but it poses a problem. Try to solve it without introducing some rollback condition that makes it worse than locking the list!

So are you sure that it needs to be lockless? Depending on your work load the non-blocking solution can sometimes be slower. Check out this MSDN article for a little more. Also proving that a lockless data structure is correct can be very difficult.

I am in no way an expert on the subject, but as far as I can see, you need to either make the ExecutionTime-field in the implementation of IWorkItem volatile (of course it might already be that) or insert a memorybarrier either after you set the ExecutionTime or before you read it.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.