What happens if a Web API controller is hit in rapid succession? - c#

I have a Web API 2 (using C#) controller method running an asynchronous save (a RESTful POST method). Our QA tester mashed the save button (a client-side issue that has since been fixed, irrelevant to the question). Without a duplicate check, this naturally saved duplicate entries to the db.
I implemented a duplicate check (duplicateCount just selects the number of items that are exactly like the item passed to post, should be 0, and it works, details irrelevant):
var duplicateCount = (fooCollection.CountAsync(aggregateFilter)).Result;
if (duplicateCount > 0) { return BadRequest(); }
This check works...except on the first button mash - two duplicate entries get saved, each from an individual controller hit.
So, it seems to me that the second controller hit happens before the first controller hit manages to save the item to the db, so the duplicate check passes. Is this possible?
I am more interested in the theory than the particular answer. Also, I know I can check for duplicates in the db as well; this is more of a conceptual question. The MongoDB part is really there just for completeness, and I imagine it would be similar if I were doing an async save to SQL.
edit: Someone asked in the comments how I am making the call. It's through Restangular, but IMO that's irrelevant because I know the controller is getting hit as many times as I hit the button. I also know for a fact that it does not create duplicates on a single hit.

Quick answer is "yes" - with the ability of a controller to be instantiated on any number of threads simultaneously it acts as though its multi-threaded. Your code is not "thread safe" in that its business operation requires an exclusive lock to be placed on some element of shared information (in this case the state of the database).
You could (I would not recommend) open a Mutex or a database transaction to force single thread behaviour, but your throughput would tank.
I personally don't face this very often because of my (possibly bad) insistence on all my entities having Guid primary keys and use of the SQL Merge command to either insert or update. This may be a helpful pattern (it doesn't matter how many times you send the same "message" to the controller - it will never save a duplicate).
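As a rough illustration of that pattern (this is not the poster's actual schema; the table, columns and item variable are made up, and it assumes System.Data.SqlClient), an idempotent save keyed on a client-supplied Guid might look like this:

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(@"
    MERGE dbo.Foo AS target
    USING (SELECT @Id AS Id, @Name AS Name) AS source
        ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET Name = source.Name
    WHEN NOT MATCHED THEN
        INSERT (Id, Name) VALUES (source.Id, source.Name);", conn))
{
    // the client supplies the same Guid on every retry or button mash, so re-sending
    // the same "message" updates the existing row instead of inserting a duplicate
    cmd.Parameters.AddWithValue("@Id", item.Id);     // Guid primary key
    cmd.Parameters.AddWithValue("@Name", item.Name);
    conn.Open();
    cmd.ExecuteNonQuery();
}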

Related

Make sure that the API call succeeds and my own save succeeds

Hello, I found a problem when I use ASP.NET MVC with EF and call a Web API from another website (which also uses Entity Framework).
The problem is this:
I want to make sure that the MVC SaveChanges() and the Web API SaveChanges() either both succeed or both fail.
Here's my dream pseudocode:
public ActionResult Operation()
{
    // code: insert, update, delete...
    bool testMvcSaveSuccess = db.TempSaveChanges(); // it does not have this command
    if (testMvcSaveSuccess == true)
    {
        bool isApiSuccess = CallApi(); // insert data into the other web app
        if (isApiSuccess == true)
        {
            db.SaveChanges(); // real save
        }
    }
}
In the code above, without something like db.TempSaveChanges() the Web API call might succeed while the MVC SaveChanges() fails.
There is no such thing as TempSaveChanges(), because there is something even better: transactions.
A transaction is an IDisposable (so it can be used in a using block) and has methods like Commit and Rollback.
Small example:
private void TestTransaction()
{
    using (var context = new MyContext(connectionString))
    using (var transaction = context.Database.BeginTransaction())
    {
        // do CRUD stuff here

        // here is your 'TempSaveChanges' execution
        int changesCount = context.SaveChanges();

        if (changesCount > 0) // changes were made
        {
            // this will do the real db changes
            transaction.Commit();
        }
        else
        {
            // no changes detected -> do nothing;
            // 'transaction.Rollback();' is not necessary since there are no changes,
            // and the using block will dispose the transaction (and with it all changes)
        }
    }
}
I have extracted this example from my GitHub Exercise.EntityFramework repository. Feel free to Star/Clone/Fork...
Yes, you can.
You need to override SaveChanges in the context class so that your check runs first and the regular save is only called afterwards.
Or create your own TempSaveChanges() in the context class, call it, and if it succeeds call SaveChanges from it.
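A minimal sketch of that idea, assuming EF 6 (the context name is made up, and note that EF cannot truly "save temporarily", so TempSaveChanges() here only validates the pending changes, which is why the transaction-based answer above is usually the better option):

using System.Data.Entity;
using System.Linq;

public class MyContext : DbContext
{
    // returns true when every tracked change passes EF validation
    public bool TempSaveChanges()
    {
        return !this.GetValidationErrors().Any();
    }

    public override int SaveChanges()
    {
        // place any extra pre-save checks here before the regular save
        return base.SaveChanges();
    }
}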
What you are referring to is known as atomicity: you want several operations to either all succeed, or none of them. In the context of a database you obtain this via transactions (if the database supports it). In your case however, you need a transaction which spans across two disjoint systems. A general-purpose (some special cases have simpler solutions) robust implementation of such a transaction would have certain requirements on the two systems, and also require additional persistence.
Basically, you need to be able to gracefully recover from a sudden stop at any point during the sequence. Each of the databases you are using is most likely ACID-compliant, so you can count on each DB transaction to fulfill the atomicity requirement (it either succeeds or fails). Therefore, all you need to worry about is the sequence of the two DB transactions. Your requirement on the two systems is a way to determine, a posteriori, whether or not some operation was performed.
Example process flow:
Operation begins
Generate unique transaction ID and persist (with request data)
Make changes to local DB and commit
Call external Web API
Flag transaction as completed (or delete it)
Operation ends
Recovery:
Get all pending (not completed) transactions from store
Check if expected change to local DB was made
Ask Web API if expected change was made
If none of the changes were made or both of the changes were made then the transaction is done: delete/flag it.
If one of the changes was made but not the other, then either revert the change that was made (revert transaction), or perform the change that was not (resume transaction) => then delete/flag it.
Now, as you can see, it quickly gets complicated, especially if "determining if changes were made" is a non-trivial operation. A common solution is to use that unique transaction ID as a means of determining which data needs attention. But at this point it gets very application-specific and depends entirely on what the specific operations are. For certain applications, you can just re-run the entire operation in the recovery step (since you have the entire request data stored in the transaction). Some special cases do not need to persist the transaction, since there are other ways of achieving the same thing, etc.
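A condensed sketch of the happy-path flow above (the _transactionStore, _localDb and _externalApi members are placeholders invented for illustration, not a known library):

public async Task RunOperationAsync(RequestData request)
{
    // 1. generate a unique transaction ID and persist it together with the request data
    Guid txId = Guid.NewGuid();
    await _transactionStore.SavePendingAsync(txId, request);

    // 2. make the changes to the local DB and commit
    await _localDb.ApplyChangesAsync(txId, request);

    // 3. call the external Web API
    await _externalApi.ApplyChangesAsync(txId, request);

    // 4. flag the transaction as completed (or delete it); anything that died before
    //    this point is picked up later by the recovery pass, which uses txId to decide
    //    whether to revert or resume
    await _transactionStore.MarkCompletedAsync(txId);
}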
OK, so let's clarify things a bit.
You have an MVC app, A1, with its own database, D1.
You then have an API, let's call it A2, with its own database, D2.
You want some code in A1 which does a temp save in D1, then fires a call to A2, and if the response is successful saves the temp data from D1 in the right place this time.
Based on your pseudocode, I would suggest you create a second table in D1 where you save your "temporary" data. So your database has an extra table and the flow is like this:
First you save your A1 data in that table, then you call A2, the data gets saved in D2, A1 receives the confirmation and calls a method which moves the data from the second table to where it should be.
Scenarios to consider:
Saving the temp data in D1 works, but the call to A2 fails. You now clear the orphan data with a batch job, or simply call something that deletes it when the call to A2 fails.
The call to A2 succeeds but the follow-up call to D1 fails, so now you have temp data in D1 which has failed to move to the right table. You could add a flag to the second table, against each row, which indicates that the call to A2 succeeded, so this data needs to be moved to the right place when possible. You can have a service which runs periodically and, if it finds any data with the flag set to true, moves that data to the right place.
There are other ways to deal with scenarios like this. You could use a queue system to manage it. Each row of data becomes a message; you assign it a unique ID, a GUID, which is basically a correlation ID, and it's the same in both systems. Even if one system goes down, when it comes back up the data will be saved and all is good in the world, and because of the common ID you can always link it up properly.
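A minimal sketch of that staging-table flow (the repository and API client names are hypothetical, and error handling is omitted for brevity):

public async Task SaveWithStagingAsync(MyData data)
{
    // 1. save into the staging ("temporary") table in D1
    int stagingId = await stagingRepository.InsertAsync(data);

    // 2. call A2; if it fails, remove the orphan row (or leave it for a cleanup batch job)
    bool apiSuccess = await a2Client.SaveAsync(data);
    if (!apiSuccess)
    {
        await stagingRepository.DeleteAsync(stagingId);
        return;
    }

    // 3. A2 succeeded: flag the row so a periodic job can finish the move
    //    if the next step fails
    await stagingRepository.MarkApiSucceededAsync(stagingId);

    // 4. move the data from the staging table to its final table
    await stagingRepository.MoveToFinalTableAsync(stagingId);
}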

ASP.NET shopping cart inventory management in a multi-user environment

Is there any need to handle locks, in terms of threading, in an inventory application?
As far as I know, ASP.NET is not thread-safe.
Let's say there is a product available and its available quantity is 1, while the number of users simultaneously trying to book that particular product is 40. Which one is going to get the product, and what happens?
I'm not even sure whether the question is well-posed.
http://blogs.msdn.com/b/benchr/archive/2008/09/03/does-asp-net-magically-handle-thread-safety-for-you.aspx
I am not sure about this; please help.
Well, technically, you're not even talking about ASP.NET here, but rather Entity Framework or whatever else you're using to communicate with SQL Server or whatever else persistent data store you're using. Relational databases will typically row-lock, so that as one client is updating the row, the row cannot be read by another client, but you can still run into concurrency issues.
You can handle this situation one of two ways: pessimistic concurrency or optimistic concurrency. With pessimistic concurrency you create locks and any other thread trying to read/write the same data is simply turned away in the mean time. In a multi-threaded environment, it's far more common to use optimistic concurrency, since this allows a bit of play room for failover.
With optimistic concurrency, you version the data. As a simplistic example, let's say that I'm looking for the current stock of widgets in my dbo.Widgets table. I'd have a column like Version which might initially be set to "1" and 100 widgets in my Stock column. Client one wants to buy a widget, so I read the row and note the version, 1. Now, I want to update the column so I do an update to set Stock to 99 and Version to 2, but I include in my where clause Version = 1. But, between the time the row was initially read and the update was sent, another client bought a widget and updated the version of the row to 2. The first client's update fails, because Version is no longer 1. So the application then reads the row fresh and tries to update it again, subtracting 1 from Stock and incrementing Version by 1. Rinse and repeat. Generally, you'll want to have some upward limit of attempts before you'll just give up and return an error to the user, but in most scenarios, you might have one collision and then the next one goes through fine. Your server would have to be getting slammed with people eagerly trying to buy widgets before it would be a real problem.
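A rough ADO.NET sketch of that read-check-update loop, assuming a hypothetical dbo.Widgets table with Id, Stock and Version columns (requires System.Data.SqlClient; this is an illustration, not production code):

public bool TryBuyWidget(int widgetId)
{
    const int maxAttempts = 5;
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // read the current stock and version
            int stock, version;
            using (var read = new SqlCommand(
                "SELECT Stock, Version FROM dbo.Widgets WHERE Id = @id", conn))
            {
                read.Parameters.AddWithValue("@id", widgetId);
                using (var reader = read.ExecuteReader())
                {
                    if (!reader.Read()) return false; // no such widget
                    stock = reader.GetInt32(0);
                    version = reader.GetInt32(1);
                }
            }
            if (stock <= 0) return false; // sold out

            // conditional update: it only succeeds if nobody changed the row in the meantime
            using (var update = new SqlCommand(
                "UPDATE dbo.Widgets SET Stock = Stock - 1, Version = Version + 1 " +
                "WHERE Id = @id AND Version = @version", conn))
            {
                update.Parameters.AddWithValue("@id", widgetId);
                update.Parameters.AddWithValue("@version", version);
                if (update.ExecuteNonQuery() == 1) return true; // our update won
            }
            // collision: another client updated the row first, so loop and retry
        }
    }
    return false; // give up after too many collisions
}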
Now of course, this is a highly simplistic approach, and honestly, not something you really have to manage yourself. Entity Framework, for example, will handle concurrency for you automatically as long as you have a rowversion column:
[Timestamp]
public byte[] RowVersion { get; set; }
See http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application for the full guide to setting it up.
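For completeness, a minimal sketch of what handling the resulting conflict looks like in EF 6 once the rowversion column is mapped (the widget entity and context variable are invented for the example; Single() needs System.Linq):

try
{
    widget.Stock -= 1;
    context.SaveChanges();
}
catch (DbUpdateConcurrencyException ex)
{
    // another request updated the row first; reload the database values and decide
    // whether to retry the decrement or report the conflict to the user
    ex.Entries.Single().Reload();
}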
ASP.NET certainly is not thread-safe. The article you link to is fine as a start, but doesn't tell the whole story by a long way. In your case, you likely load the product list into memory at the first request for it, at application startup, or on some other trigger.
When a request wants to work with a product, you grab the appropriate member of this preloaded list. (Believe me, this is better than having every request load the product or product list from the database.) However, if you now have 40 simultaneous requests for the same product they will all be accessing the same object, and nasty things can happen, like ending up with -39 stock.
You can address this in many ways, but they boil down to two:
Protect the data somehow
Do what Amazon does
Protect the data
There are numerous ways of doing this. One would be to use a critical section via the lock keyword in C#. For example, something like this in the Product class:
private object lockableThing; // created in the ctor

public bool ReduceStockLevelForSale(int qtySold)
{
    bool success = false;
    if (this.quantityOnHand >= qtySold)
    {
        lock (lockableThing)
        {
            // re-check inside the lock: another thread may have reduced the stock
            // between the first check and acquiring the lock
            if (this.quantityOnHand >= qtySold)
            {
                this.quantityOnHand -= qtySold;
                success = true;
            }
        }
    }
    return success;
}
The double check on the quantity on hand is deliberate and required. There are any number of ways of doing the equivalent. Books have been written about this sort of thing.
Do what Amazon does
As long as at some point in the order-taking sequence Amazon thinks it has enough on hand (or maybe even any), it will let you place the order. It doesn't reduce the stock level while the order is being confirmed. Once the order has been confirmed, it has a back-end process (i.e. NOT run by the web site) which checks, order by order, that the order can be fulfilled, and only reduces the on-hand level if it can. If it can't be fulfilled, they put the order on hold and send you an email saying 'Sorry! We don't have enough of Product X!' and giving you some options.
Discussion
Amazon's is the best way, because if you decrement the stock from the web site, at what point do you do it? Probably not until the order is confirmed. If the stock has gone by then, what do you do? Also, you are going to need some functionality to send the 'Sorry!' email anyway: what happens when the last one (or two or three) items of that product can't be found, don't physically exist or are broken? You send a 'Sorry!' email.
However, this does assume that you are in control of the full order-to-dispatch cycle, which is not always the case. If you aren't in control of the full cycle, you need to adjust to what you are in control of, and then pick a method.

Prevent the DbContext from repeatedly trying to save bad data

I have a process that imports an Excel spreadsheet and parses the data into my data objects. The source of this data is very questionable, as we're moving our customer from spreadsheet-based data management to a managed database system with checks for valid data.
During my import process, I do some basic sanity checks of the data to accommodate just how bad the data could be that we're importing, but I have my overall validation being done in the DbContext.
Part of what I'm trying to do is provide the row # in the spreadsheet where the data is bad, so they can easily determine what they need to fix to get the file to import.
Once I have the data from the spreadsheet (model), and the Opportunity they're working with from the database (opp), here's the pseudocode of my process:
foreach (var model in Spreadsheet.Rows) // again, pseudocode
{
    if (opp != null && ValidateModel(model, opp, row))
    {
        // Copy properties to the database object.
        // This is in a repository-layer method, not directly in my import process;
        // it's just written here for clarity instead of several nested method calls.
        context.SaveChanges();
    }
}
I can provide more of the code here if needed, but the problem comes in my DbContext's ValidateEntity() method (an override on the DbContext).
Again, there's nothing wrong with the code that I've written, as far as I'm aware, but if an Opportunity fails this level of validation, it stays as part of the unsaved objects in the context, meaning that it gets re-validated every time ValidateEntity() is called. This leads to a repeat of the same validation error message for every row after the initial problem occurs.
Is there a way to get the context to stop trying to validate the object after it fails validation once? I know I could wait until the end and call context.SaveChanges() once to get around this, but I would like to be able to match each error with the row it came from.
For reference, I am using Entity Framework 6.1 with a Code First approach.
EDIT Attempting to clarify further for Marc L. (including an update to the code block above)
Right now, my process will iterate through as many rows as there are in the Spreadsheet. The reason why I'm calling my Repository layer with each object to save, instead of working with an approach that only calls context.SaveChanges() once is to allow myself the ability to determine which row is the one that is causing a validation error.
I'm glad that my DbContext's custom ValidateEntity() methods are catching the validation errors, but the problem resides in the fact that it is now throwing the DbEntityValidationException for the same entity multiple times.
I would like it so that if the object fails validation once, the context no longer tries to save the object, regardless of how many times context.SaveChanges() is called.
Your question is not a dupe (this is about saving, not loaded entities) but you could follow Jimmy's advice above. That is, once an entity is added to the context it is tracked in the "added" state and the only way to stop it from re-validating is by detaching it. It's an SO-internal link, but I'll reproduce the code snippet:
dbContext.Entry(entity).State = EntityState.Detached;
However, I don't think that's the way you want to go, because you're using exceptions to manage state unnecessarily (exceptions are notoriously expensive).
Working from the information given, I'd use a more set-based solution:
modify your model class so that it contains a RowID that records the original spreadsheet row (there's probably other good reasons to have this, too)
turn off entity tracking for the context (this turns off change detection, allowing each Add() to be O(1))
add all the entities
call context.GetValidationErrors() and get all your errors at once, using the aforementioned RowID to identify the invalid rows.
You didn't indicate whether your process should save the good rows or reject the file as a whole, but this will accommodate either--that is, if you need to save the good rows, detach all the invalid rows using the code above and then SaveChanges().
Finally, if you do want to save the good rows and you're uncomfortable with the set-based method, it would be better to use a new DbContext for every single row, or at least create a new DbContext after each error. The ADO.NET team insists that context-creation is "relatively cheap" (sorry I don't have a cite or stats at hand for this) so this shouldn't damage your throughput too much. Even so, it will at least remain O(n). I wouldn't blame you, managing a large context can open you up to other issues as well.
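A minimal sketch of the set-based approach above (EF 6; the entity and context names follow the question, while the RowID property and MapToEntity helper are the suggested additions and are hypothetical):

context.Configuration.AutoDetectChangesEnabled = false;

foreach (var model in Spreadsheet.Rows)
{
    var entity = MapToEntity(model);   // hypothetical mapping helper
    entity.RowID = model.RowNumber;    // remember the originating spreadsheet row
    context.Opportunities.Add(entity);
}

// validate everything in one pass and report each bad row by its RowID
foreach (var error in context.GetValidationErrors())
{
    var bad = (Opportunity)error.Entry.Entity;
    Console.WriteLine("Row {0} failed validation: {1}",
        bad.RowID,
        string.Join("; ", error.ValidationErrors.Select(v => v.ErrorMessage)));

    // detach invalid entities so a later SaveChanges() only persists the good rows
    context.Entry(bad).State = EntityState.Detached;
}

context.SaveChanges();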

Querying the write model for duplicated aggregate root property

I'm implementing the CQRS pattern with event sourcing. I'm using NServiceBus, NEventStore and NES (which connects NSB and NEventStore).
My application checks a web service regularly for any file to be downloaded and processed. When a file is found, a command (DownloadFile) is sent to the bus and received by FileCommandHandler, which creates a new aggregate root (File) and handles the message.
Now inside the File aggregate root I have to check that the content of the file doesn't match any other file's content (since the web service only guarantees that the file name is unique; the same content may arrive under a different name), by hashing it and comparing it against the list of hashed contents.
The question is: where should I save the list of hash codes? Is it allowed to query the read model?
public class File : AggregateBase
{
    public File(DownloadFile cmd, IFileService fileDownloadService, IClaimSerializerService serializerService, IBus bus)
        : this()
    {
        // code to download the file content, deserialize it, and publish an event
    }
}

public class FileCommandHandler : IHandleMessages<DownloadFile>, IHandleMessages<ExtractFile>
{
    public void Handle(DownloadFile command)
    {
        // For example, is it possible to do this? (Honestly, I feel it is not,
        // since the read model should always be considered stale!)
        var file = readModelContext.GetFileByHashCode(Hash(command.FileContent));
        if (file != null)
            throw new Exception("File content matched another already-downloaded file");

        // There is no way to query the event source for file content like:
        // eventSourceRepository.Find<File>(c => c.HashCode == Hash(command.FileContent));
    }
}
Seems like you're looking for deduplication.
Your command side is where you want things to be consistent. Queries will always leave you open to race conditions. So, instead of running a query, I'd reverse the logic and actually write the hash into a database table (any db with ACID guarantees). If this write is successful, process the file. If the write of the hash fails, skip processing.
There's no point putting this logic into a handler, because retrying the message in case of failure (i.e. storing the hash multiple times) will not make it succeed. You'd also end up with messages for duplicate files in the error queue.
A good place for the deduplication logic is likely inside your web service client. Some pseudo logic
Get file
Open transaction
Insert hash into database & catch failure (not any failure, only failure to insert)
Bus.Send message to process file if # of records inserted in step 3 is not zero
commit transaction
Some example deduplication code in NServiceBus gateway here
Edit:
Looking at their code, I actually think the session.Get<DeduplicationMessage> is unnecessary. session.Save(gatewayMessage); should be enough and is the consistency boundary.
Doing a query would make sense only if the rate of failure is high, meaning you have a lot of duplicate content files. If 99%+ of inserts succeed, the duplicates can indeed be treated as exceptions.
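A rough sketch of that insert-the-hash-first pattern (the FileHashes table, connection handling and exact bus call are simplified and partly invented; the key point is the unique index on the Hash column, which makes the insert the consistency boundary):

public void ProcessDownloadedFile(string fileName, byte[] content)
{
    string hash = Convert.ToBase64String(SHA256.Create().ComputeHash(content));

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            try
            {
                // FileHashes.Hash has a unique index; a duplicate file fails right here
                using (var cmd = new SqlCommand(
                    "INSERT INTO FileHashes (Hash, FileName) VALUES (@hash, @name)", conn, tx))
                {
                    cmd.Parameters.AddWithValue("@hash", hash);
                    cmd.Parameters.AddWithValue("@name", fileName);
                    cmd.ExecuteNonQuery();
                }
            }
            catch (SqlException ex) when (ex.Number == 2601 || ex.Number == 2627)
            {
                // unique-constraint violation: duplicate content, so skip processing
                return;
            }

            bus.Send(new DownloadFile { FileName = fileName }); // only sent for new content
            tx.Commit();
        }
    }
}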
This depends on a lot of things... throughput being one of them. But since you're approaching this problem in a "pull based" fashion anyway (you're querying a web service to poll for work, i.e. downloading and analysing a file), you could make this whole process serial without having to worry about collisions. Now that might not give the desired rate at which you want to be handling "the work", but more importantly... have you measured? Let's sidestep that for a minute and assume that serial isn't going to work. How many files are we talking about? A few hundred, thousands, millions? Depending on that, the hashes might fit into memory and could be rebuilt if/when the process goes down. There might also be an opportunity to partition your problem along the axis of time or context. Every file since the beginning of dawn, or just today's, or maybe this month's worth of files? Really, I think you should dig deeper into your problem space. Apart from that, this feels like an awkward problem to solve using event sourcing, but YMMV.
When you have a true uniqueness-constraint in your domain, you can make the uniqueness-tester a domain service, whose implementation is part of the infrastructure -- similar to a repository, whose interface is part of the domain and whose implementation is part of the infrastructure. For the implementation, you can then use an in-memory hash or a database that is updated/queried as needed.
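A minimal sketch of that domain-service idea (the interface would live in the domain, the implementation in the infrastructure; all names here are illustrative, not from the question):

public interface IFileContentUniquenessChecker // domain layer
{
    bool IsUnique(byte[] fileContent);
}

public class InMemoryUniquenessChecker : IFileContentUniquenessChecker // infrastructure layer
{
    private readonly HashSet<string> knownHashes = new HashSet<string>();

    public bool IsUnique(byte[] fileContent)
    {
        string hash = Convert.ToBase64String(SHA256.Create().ComputeHash(fileContent));
        // HashSet.Add returns false when the hash is already present;
        // a real implementation would back this with a database table that is
        // updated/queried as needed rather than an in-memory set
        return knownHashes.Add(hash);
    }
}

The command handler (or the File factory) then takes an IFileContentUniquenessChecker dependency and rejects the DownloadFile command when IsUnique returns false.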

Duplicate returned by Guid.NewGuid()?

We have an application that generates simulated data for one of our services for testing purposes. Each data item has a unique Guid. However, when we ran a test after some minor code changes to the simulator all of the objects generated by it had the same Guid.
There was a single data object created, then a for loop where the properties of the object were modified, including a new unique Guid, and it was sent to the service via remoting (serializable, not marshal-by-ref, if that's what you're thinking), loop and do it again, etc.
If we put a small Thread.Sleep(...) inside the loop, it generated unique IDs. I think that is a red herring though. I created a test app that just created one GUID after another and didn't get a single duplicate.
My theory is that the IL was optimized in a way that caused this behavior. But enough about my theories. What do YOU think? I'm open to suggestions and ways to test it.
UPDATE: There seems to be a lot of confusion about my question, so let me clarify. I DON'T think that NewGuid() is broken. Clearly it works. Its FINE! There is a bug somewhere though, that causes NewGuid() to either:
1) be called only once in my loop
2) be called every time in my loop but assigned only once
3) something else I haven't thought of
This bug can be in my code (MOST likely) or in optimization somewhere.
So to reiterate my question, how should I debug this scenario?
(and thank you for the great discussion, this is really helping me clarify the problem in my mind)
UPDATE # 2: I'd love to post an example that shows the problem, but that's part of my problem. I can't duplicate it outside of the whole suite of applications (client and servers).
Here's a relevant snippet though:
OrderTicket ticket = new OrderTicket( ... );
for (int i = 0; i < _numOrders; i++)
{
    ticket.CacheId = Guid.NewGuid();
    Submit(ticket); // note that this simply makes a remoting call
}
Does Submit do an async call, or does the ticket object go onto another thread at any stage?
In the code example you are reusing the same object. What if Submit sends the ticket on a background thread after a short delay (and does not take a copy)? When you change the CacheId you are actually updating all the pending submits. This also explains why a Thread.Sleep fixes the problem. Try this:
for (int i = 0; i < _numOrders; i++)
{
    OrderTicket ticket = new OrderTicket( ... );
    ticket.CacheId = Guid.NewGuid();
    Submit(ticket); // note that this simply makes a remoting call
}
If for some reason this is not possible, try this and see if they are still the same:
ticket.CacheId = new Guid("00000000-0000-0000-0000-" +
string.Format("{0:000000000000}", i));
Thousands of developers use Guids in .NET. If Guid.NewGuid() had any tendency at all to get "stuck" on one value, the problem would have been encountered long ago.
The minor code changes are the sure culprit here. The fact that Thread.Sleep (which is less a red herring than a fish rotting in the sun) "fixes" your problem suggests that your properties are being set in some weird way that can't take effect until the loop stops blocking (either by ending or by Thread.Sleep). I'd even be willing to bet that the "minor change" was to reset all the properties from a separate thread.
If you posted some sample code, that would help.
It's a bug in your code. Given that you've managed to generate duplicate GUIDs, that is the most likely explanation. The clue is here in your question: "when we ran a test after some minor code changes to the simulator all of the objects generated by it had the same Guid"
See this article about how a Guid is created.
This article came from this answer.
Bottom line: if you are creating the GUIDs too quickly and the clock hasn't moved forward, that is why you are getting some that are the same. However, when you put a sleep in, it works because the clock has moved.
The code in Submit and OrderTicket would be helpful as well...
You're reusing OrderTicket. I'd suspect that either you (or remoting itself) is batching the calls out, probably with respect to the number of connections/host limits, and picking up the last value of CacheId when it finally sends them along.
If you debug or Thread.Sleep the app, you're changing the timing so that the remoting call finishes before you assign a new CacheId.
Are you asyncing the remoting call? I'd think a sync call would block - but I'd check with a packet sniffer like Wireshark to be sure. Regardless, just changing to creating a new OrderTicket in each iteration would probably do the trick.
Edit: The question is not about NewGuid being broken...so my previous answer has been removed.
I don't know the details of how GUIDs are generated... yet. However, currently my org. is breeding GUIDs at a rate that would put rabbits to shame. So I can vouch for the fact that GUIDs aren't broken... yet.
Post the source code if possible, or a clone repro app. Many times I find that the act of creating that clone app to repro the problem shows me the issue.
The other approach would be to comment out "those minor changes". If that fixes the problem, you can then triangulate to find the offending line of code. Eyeball the minor changes hard... I mean really hard.
Do let us know how it goes... this sounds interesting.
My gut is telling me something along these lines is going on...
class OrderTicket
{
    private Guid _guid;

    // a setter that ignores 'value' like this would produce the same Guid every time
    Guid CacheId { set { _guid = new Guid("00000000-0000-0000-0000-000000000000"); } }
}
Log the value of CacheId into a log file, along with a stack trace, every time it's set... Maybe someone else is setting it.
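A quick-and-dirty sketch of that logging idea (the property and backing field mirror the snippet above; the log path is just an example):

private Guid _cacheId;

public Guid CacheId
{
    get { return _cacheId; }
    set
    {
        // record every assignment together with who called it
        System.IO.File.AppendAllText(@"C:\temp\cacheid.log",
            string.Format("{0:o} CacheId set to {1}{2}{3}{2}",
                DateTime.UtcNow, value, Environment.NewLine, Environment.StackTrace));
        _cacheId = value;
    }
}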
