I'm very familiar with using a transactional RDBMS, but how would I make sure that changes made to my in-memory data are rolled back if the transaction fails? What if I'm not even using a database?
Here's a contrived example:
public void TransactionalMethod()
{
    var items = GetListOfItems();
    foreach (var item in items)
    {
        MethodThatMayThrowException(item);
        item.Processed = true;
    }
}
In my example, I might want the changes made to the items in the list to somehow be rolled back, but how can I accomplish this?
I am aware of "software transactional memory" but don't know much about it and it seems fairly experimental. I'm aware of the concept of "compensatable transactions", too, but that incurs the overhead of writing do/undo code.
Subversion seems to deal with errors updating a working copy by making you run the "cleanup" command.
Any ideas?
UPDATE:
Reed Copsey offers an excellent answer, including:
Work on a copy of data, update original on commit.
This takes my question one level further - what if an error occurs during the commit? We so often think of the commit as an immediate operation, but in reality it may be making many changes to a lot of data. What happens if there are unavoidable things like OutOfMemoryExceptions while the commit is being applied?
On the flip side, if one goes for a rollback option, what happens if there's an exception during the rollback? I understand that things like Oracle RDBMS have the concept of rollback segments and UNDO logs, but assuming there's no serialisation to disk (where if it isn't serialised to disk it didn't happen, and a crash means you can investigate those logs and recover from it), is this really possible?
UPDATE 2:
An answer from Alex made a good suggestion: namely that one updates a different object, and the commit phase is then simply a matter of switching the reference from the current object to the new object. He went further to suggest that the object you change is effectively a list of the modified objects.
I understand what he's saying (I think), and I want to make the question more complex as a result:
How, given this scenario, do you deal with locking? Imagine you have a list of customers:
var customers = new Dictionary<CustomerKey, Customer>();
Now, you want to make a change to some of those customers, how do you apply those changes without locking and replacing the entire list? For example:
var customerTx = new Dictionary<CustomerKey, Customer>();
foreach (var customer in customers.Values)
{
    var updatedCust = customer.Clone();
    customerTx.Add(GetKey(updatedCust), updatedCust);
    if (CalculateRevenueMightThrowException(customer) >= 10000)
    {
        updatedCust.Preferred = true;
    }
}
How do I commit? This (Alex's suggestion) will mean locking all customers while replacing the list reference:
lock (customers)
{
    customers = customerTx;
}
Whereas if I loop through, modifying the references in the original list, it's not atomic, and falls foul of the "what if it crashes partway through" problem:
foreach (var kvp in customerTx)
{
    customers[kvp.Key] = kvp.Value;
}
Pretty much every option for doing this requires one of three basic methods:
Make a copy of your data before modifications, to revert to a rollback state if aborted.
Work on a copy of data, update original on commit.
Keep a log of changes to your data, to undo them in the case of an abort.
For example, Software Transactional Memory, which you mentioned, follows the third approach. The nice thing about that is that it can work on the data optimistically, and just throw away the log on a successful commit.
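For illustration, here is a minimal sketch of the third approach (a change log) applied to the question's example; the names GetListOfItems, MethodThatMayThrowException and Processed come from the question, and this shows only the idea in plain C#, not how an STM library implements it internally:

public void TransactionalMethod()
{
    var items = GetListOfItems();
    var undoLog = new Stack<Action>(); // System, System.Collections.Generic

    try
    {
        foreach (var item in items)
        {
            MethodThatMayThrowException(item);

            bool previous = item.Processed;               // remember the old value
            undoLog.Push(() => item.Processed = previous); // record a compensating action
            item.Processed = true;
        }

        undoLog.Clear(); // commit: the log is simply thrown away
    }
    catch
    {
        while (undoLog.Count > 0)
            undoLog.Pop()(); // abort: replay the log in reverse order
        throw;
    }
}

Note that only the Processed flag is compensated here; any side effects inside MethodThatMayThrowException would need their own undo actions.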
Take a look at the Microsoft Research project, SXM.
From Maurice Herlihy's page, you can download documentation as well as code samples.
You asked: "What if an error occurs during the commit?"
It doesn't matter. You can commit to somewhere/something in memory and check in the meantime whether the operation succeeds. If it did, you change the reference of the intended object (object A) to where you committed (object B). Then you have failsafe commits - the reference is only updated on a successful commit, and a reference change is atomic.
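As a rough sketch of that idea, reusing the types and helpers from the question (Customer, CustomerKey, Clone, GetKey and CalculateRevenueMightThrowException are all assumed from the question's code; the method name here is made up):

private Dictionary<CustomerKey, Customer> customers = new Dictionary<CustomerKey, Customer>();

public void ApplyPreferredStatus()
{
    // 1. Build the whole change in a private copy; readers still see the old snapshot.
    var customerTx = new Dictionary<CustomerKey, Customer>();
    foreach (var customer in customers.Values)
    {
        var updatedCust = customer.Clone();
        if (CalculateRevenueMightThrowException(customer) >= 10000)
        {
            updatedCust.Preferred = true;
        }
        customerTx.Add(GetKey(updatedCust), updatedCust);
    }

    // 2. Commit: swapping the reference is a single atomic operation, so an
    //    exception thrown before this line leaves the original dictionary untouched.
    Interlocked.Exchange(ref customers, customerTx); // System.Threading
}

Readers should copy the customers field into a local variable before iterating, so they always work against one consistent snapshot.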
public void TransactionalMethod()
{
    var items = GetListOfItems();
    try
    {
        foreach (var item in items)
        {
            MethodThatMayThrowException(item);
            item.Processed = true;
        }
    }
    catch (Exception)
    {
        // Roll back by compensating: undo only the items that were processed.
        foreach (var item in items)
        {
            if (item.Processed)
            {
                UndoProcessingForThisItem(item);
            }
        }
    }
}
Obviously, the implementation of the "Undo..." is left as an exercise for the reader.
Related
I am relatively new to working with MongoDB. Currently I am getting a little more familiar with the API, and especially with the C# driver. I have a few questions about bulk updates. The C# driver offers a BulkWriteAsync method, and I could read a lot about it in the Mongo documentation. As I understand it, it is possible to configure the bulk write not to stop at the first error by using the unordered setting. What I did not find is what happens to the data. Does the database do a rollback in case of an error, or do I have to handle that myself? In case of an error, can I get details of which step was not successful? Think of a bulk operation with updates on 100 documents: can I find out which updates were not successful? As the BulkWriteResult offers very little information, I am not sure whether this operation is really a good fit for me.
Thanks in advance.
You're right in that BulkWriteResult doesn't provide the full set of information to make a call on what to do.
In the case of a MongoBulkWriteException<T>, however, you can access the WriteErrors property to get the indexes of models that errored. Here's a pared down example of how to use the property.
var models = sourceOfModels.ToArray();
for (var i = 0; i < MaxTries; i++)
{
    try
    {
        return await someCollection.BulkWriteAsync(models, new BulkWriteOptions { IsOrdered = false });
    }
    catch (MongoBulkWriteException e)
    {
        // reconstitute the list of models to try from the set of failed models
        models = e.WriteErrors.Select(x => models[x.Index]).ToArray();
    }
}
Note: The above is very naive code. My actual code is more sophisticated. What the above does is try over and over to do the write, in each case, with only the outstanding writes. Say you started with 1000 ReplaceOne<T> models to write, and 900 went through; the second try will try against the remaining 100, and so on until retries are exhausted, or there are no errors.
If the code is not within a transaction, and an error occurs, of course nothing is rolled back; you have some writes that succeed and some that do not. In the case of a transaction, the exception is still raised (MongoDB 4.2+). Prior to that, you would not get an exception.
Finally, while the default is ordered writes, unordered writes can be very useful when the writes are unrelated to one another (e.g. documents representing DDD aggregates where there are no dependencies). It's this same "unrelatedness" that also obviates the need for a transaction.
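If you do need all-or-nothing behaviour, a hedged sketch of wrapping an unordered bulk write in a transaction looks roughly like this; client, collection and models are placeholder names, and it requires a deployment that supports transactions (MongoDB 4.0+ on a replica set):

// Sketch only; needs MongoDB.Driver and MongoDB.Bson.
async Task BulkWriteInTransactionAsync(
    IMongoClient client,
    IMongoCollection<BsonDocument> collection,
    IEnumerable<WriteModel<BsonDocument>> models)
{
    using var session = await client.StartSessionAsync();
    session.StartTransaction();
    try
    {
        await collection.BulkWriteAsync(session, models,
            new BulkWriteOptions { IsOrdered = false });
        await session.CommitTransactionAsync(); // all writes become visible together
    }
    catch
    {
        await session.AbortTransactionAsync();  // none of the writes persist
        throw;
    }
}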
First of all, sorry if my question confuses you. I'm still a bit of a rookie at programming in C#.
I am using the code below:
foreach (var schedule in schedules)
{
    if (schedule.SupplierId != Guid.Empty)
    {
        var supplier = db.Suppliers.Find(schedule.SupplierId);
        schedule.CompanyName = supplier.CompanyName;
    }
    if (schedule.CustomerId != Guid.Empty)
    {
        var customer = db.Customers.Find(schedule.CustomerId);
        schedule.CompanyName = customer.CompanyName;
    }
}
It works really well, but what if I end up with about a thousand companies? This looping will slow my program down. How can I change this code into a LINQ expression?
Thank you in advance for your reply.
There isn't a good way to do this on the client. There are some tools out there to do a mass update in EF, but I would suggest running just a query to do this, if you need to do it at all. It seems you are updating a field which is merely related but in fact belongs to another entity. You shouldn't do that, since it means updating the one will leave the other invalid.
What Patrick mentioned is absolutely right from an architectural point of view. In my understanding, CompanyName belongs somewhere else, unless this entity is a "read-only view"... which it obviously is not. Now, if you can't afford to make a major change, I would suggest you move this heavy processing off the main thread to a separate thread... if you can.
You can also load all suppliers and customers into memory rather than opening a database connection 1000 times to issue a lookup query. But, again, strongly consider moving this to a separate thread.
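A hedged sketch of that in-memory lookup, assuming the Supplier and Customer entities expose an Id property that matches schedule.SupplierId / schedule.CustomerId (the property names are an assumption):

// Load each table once, then resolve names from dictionaries instead of
// issuing one Find() per schedule.
var suppliersById = db.Suppliers.ToDictionary(s => s.Id, s => s.CompanyName);
var customersById = db.Customers.ToDictionary(c => c.Id, c => c.CompanyName);

foreach (var schedule in schedules)
{
    if (schedule.SupplierId != Guid.Empty &&
        suppliersById.TryGetValue(schedule.SupplierId, out var supplierName))
    {
        schedule.CompanyName = supplierName;
    }

    if (schedule.CustomerId != Guid.Empty &&
        customersById.TryGetValue(schedule.CustomerId, out var customerName))
    {
        schedule.CompanyName = customerName;
    }
}

If the tables are large, collect the distinct ids from the schedules first and query only those rows.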
I've written some really nice, funky libraries for use in LinqToSql. (Some day when I have time to think about it I might make it open source... :) )
Anyway, I'm not sure if this is related to my libraries or not, but I've discovered that when I have a large number of changed objects in one transaction, and then call DataContext.GetChangeSet(), things start getting reaalllly slooowwwww. When I break into the code, I find that my program is spinning its wheels doing an awful lot of Equals() comparisons between the objects in the change set. I can't guarantee this is true, but I suspect that if there are n objects in the change set, then the call to GetChangeSet() is causing every object to be compared to every other object for equivalence, i.e. at best (n^2-n)/2 calls to Equals()...
Yes, of course I could commit each object separately, but that kinda defeats the purpose of transactions. And in the program I'm writing, I could have a batch job containing 100,000 separate items, that all need to be committed together. Around 5 billion comparisons there.
So the question is: (1) is my assessment of the situation correct? Do you get this behavior in pure, textbook LinqToSql, or is this something my libraries are doing? And (2) is there a standard/reasonable workaround so that I can create my batch without making the program geometrically slower with every extra object in the change set?
In the end I decided to rewrite the batches so that each individual item is saved independently, all within one big transaction. In other words, instead of:
var b = new Batch { ... };
while (addNewItems)
{
    ...
    var i = new BatchItem { ... };
    b.BatchItems.Add(i);
}
b.Insert(); // that's a function in my library that calls SubmitChanges()
... you have to do something like this:
context.BeginTransaction(); // another one of my library functions
try
{
    var b = new Batch { ... };
    b.Insert(); // save the batch record immediately
    while (addNewItems)
    {
        ...
        var i = new BatchItem { ... };
        b.BatchItems.Add(i);
        i.Insert(); // send the SQL on each iteration
    }
    context.CommitTransaction(); // and only commit the transaction when everything is done
}
catch
{
    context.RollbackTransaction();
    throw;
}
You can see why the first code block is just cleaner and more natural to use, and it's a pity I got forced into using the second structure...
Here's a little experiment I did:
MyClass obj = dataContext.GetTable<MyClass>().Where(x => x.ID == 1).Single();
Console.WriteLine(obj.MyProperty); // output = "initial"
Console.WriteLine("Waiting..."); // put a breakpoint after this line
obj = null;
obj = dataContext.GetTable<MyClass>().Where(x => x.ID == 1).Single(); // same as before, but reloaded
Console.WriteLine(obj.MyProperty); // output still = "initial"
obj.MyOtherProperty = "foo";
dataContext.SubmitChanges(); // throws concurrency exception
When I hit the breakpoint after line 3, I go to a SQL query window and manually change the value to "updated". Then I carry on running. Linq does not reload my object, but re-uses the one it previously had in memory! This is a huge problem for data concurrency!
How do you disable this hidden cache of objects that Linq obviously is keeping in memory?
EDIT - On reflection, it is simply unthinkable that Microsoft could have left such a gaping chasm in the Linq framework. The code above is a dumbed-down version of what I'm actually doing, and there may be little subtleties that I've missed. In short, I'd appreciate if you'd do your own experimentation to verify that my findings above are correct. Alternatively, there must be some kind of "secret switch" that makes Linq robust against concurrent data updates. But what?
This isn't an issue I've come across before (since I don't tend to keep DataContexts open for long periods of time), but it looks like someone else has:
http://www.rocksthoughts.com/blog/archive/2008/01/14/linq-to-sql-caching-gotcha.aspx
LinqToSql has a wide variety of tools to deal with concurrency problems.
The first step, however, is to admit there is a concurrency problem to be solved!
First, DataContext's intended object lifecycle is supposed to match a UnitOfWork. If you're holding on to one for extended periods, you're going to have to work that much harder because the class isn't designed to be used that way.
Second, DataContext tracks two copies of each object. One is the original state and one is the changed/changeable state. If you ask for the MyClass with Id = 1, it will give you back the same instance it gave you last time, which is the changed/changeable version... not the original. It must do this to prevent concurrency problems with in-memory instances... LinqToSql does not allow one DataContext to be aware of two changeable versions of MyClass(Id = 1).
Third, DataContext has no idea whether your in-memory change comes before or after the database change, and so cannot referee the concurrency conflict without some guidance. All it sees is:
I read MyClass(Id = 1) from the database.
Programmer modified MyClass(Id = 1).
I sent MyClass(Id = 1) back to the database (the generated SQL includes the original values in the WHERE clause, which is the optimistic concurrency check).
The update will succeed if the database's version matches the original (optimistic concurrency).
The update will fail with concurrency exception if the database's version does not match the original.
Ok, now that the problem is stated, here are a couple of ways to deal with it.
You can throw away the DataContext and start over. This is a little heavy handed for some, but at least it's easy to implement.
You can ask for the original instance or the changed/changeable instance to be refreshed with the database value by calling DataContext.Refresh(RefreshMode, target) (see the reference docs, which have many good concurrency links in the "Remarks" section). This will bring the changes client-side and allow your code to work out what the final result should be; there is a small sketch after this list.
You can turn off concurrency checking in the dbml (ColumnAttribute.UpdateCheck). This disables optimistic concurrency and your code will stomp over anyone else's changes. Also heavy-handed, also easy to implement.
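A minimal sketch of option 2 against the question's example; it assumes you want the database values to win (KeepChanges and KeepCurrentValues merge differently, and ChangeConflictException lives in System.Data.Linq):

try
{
    dataContext.SubmitChanges();
}
catch (ChangeConflictException)
{
    // Pull the current database row into the tracked instance, overwriting
    // the local edits; then decide whether to re-apply them and resubmit.
    dataContext.Refresh(RefreshMode.OverwriteCurrentValues, obj);
}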
Set the ObjectTrackingEnabled property of the DataContext to false.
When ObjectTrackingEnabled is set to true the DataContext is behaving like a Unit of Work. It's going to keep any object loaded in memory so that it can track changes to it. The DataContext has to remember the object as you originally loaded it to know if any changes have been made.
If you are working in a read only scenario you should turn off object tracking. It can be a decent performance improvement.
If you aren't working in a read only scenario then I'm not sure why you want it to work this way. If you have made edits then why would you want it to pull in modified state from the database?
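For the read-only case, a short sketch; it assumes the flag is set before the context runs its first query, and reuses MyClass from the question (connectionString is a placeholder):

// Read-only usage: no identity map, no change tracking.
using (var dataContext = new DataContext(connectionString))
{
    dataContext.ObjectTrackingEnabled = false; // set before any query executes

    var items = dataContext.GetTable<MyClass>()
                           .Where(x => x.MyProperty == "initial")
                           .ToList();

    // Edits to these entities are not tracked; this context is for reading only.
}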
LINQ to SQL uses the identity map design pattern, which means that it will always return the same instance of an object for its given primary key (unless you turn off object tracking).
The solution is simple: either use a second data context if you don't want it to interfere with the first instance, or refresh the first instance if you do.
I have some C# code that needs to work like this
D d = new D();
foreach (C c in list)
{
    c.value++;
    d.Add(c); // c.value must be incremented at this point
}
if (!d.Process())
    foreach (C c in list)
        c.value--;
It increments a value on each item in a list and then tries to do some processing on them. If the processing fails, it needs to back out the mutation of the items.
The question is: is there a better way to do this? The thing I don't like about this is that there are too many ways I can see for it to go wrong. For instance, if just about anything throws, it gets out of sync.
One idea (that almost seems worse than the problem) is:
D d = new D();
var done = new List<C>();
try
{
    foreach (C c in list)
    {
        c.value++;
        done.Add(c);
        d.Add(c);
    }
    if (d.Process()) done.Clear();
}
finally
{
    foreach (C c in done)
        c.value--;
}
Without knowing the full details of your application, I don't think it's possible to give you a concrete solution to the problem. Basically, there is no single method that will work in 100% of the situations where you need something like this.
But if you're in control of the class D and are aware of the circumstances in which D.Process() will fail, you might be able to do one of two things that will make this conceptually easier to manage.
One way is to test D's state before calling the Process method, by implementing something like a CanProcess function that returns true if and only if Process would succeed, but without actually processing anything.
Another way is to create a temporary D object that doesn't actually commit, but runs the Process method on its contents. Then, if Process succeeded, you call some kind of Promote or Commit method on it that finalizes the changes. It's similar to working with a DataSet: you can create a copy of it, do your transactional work in the copy, then merge it back into the main DataSet if and only if the processing succeeded.
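A hedged sketch of that copy-then-merge idea using ADO.NET's DataSet as the analogy (mainDataSet, the "Items" table and ProcessRows are made up for illustration):

// Work on a copy; only merge it back if processing succeeds. Needs System.Data.
DataSet working = mainDataSet.Copy();

try
{
    ProcessRows(working.Tables["Items"]); // hypothetical processing step that may throw

    // Commit: fold the processed copy back into the original.
    mainDataSet.Merge(working);
    mainDataSet.AcceptChanges();
}
catch (Exception)
{
    // Abort: the copy is simply discarded and mainDataSet stays untouched.
}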
It's memory intensive, but you can make a copy of the list before incrementing. If the process succeeds, you can return the new list. If it fails, return the original list.
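A sketch of that approach; it assumes C has a Clone() method (not shown in the question) and uses System.Linq and System.Collections.Generic:

List<C> TryProcess(List<C> list, D d)
{
    // Deep-copy the items, mutate only the copies, and hand back whichever
    // list reflects the outcome.
    var copy = list.Select(c => c.Clone()).ToList();

    foreach (C c in copy)
    {
        c.value++;
        d.Add(c);
    }

    return d.Process() ? copy : list; // success: the new list; failure: the untouched original
}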
Why don't you increment only when the process succeeds?
I think a better way is to have the Process method in D increment c.value internally.
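A rough illustration of that suggestion; D's real members aren't shown in the question, so everything below is assumed:

// Hypothetical reworking of D: Add() only collects items, and Process()
// increments each item itself, rolling back its own increments on failure.
class D
{
    private readonly List<C> pending = new List<C>();

    public void Add(C c) => pending.Add(c);

    public bool Process()
    {
        var incremented = new List<C>();
        bool ok = false;
        try
        {
            foreach (C c in pending)
            {
                c.value++;              // increment happens inside Process()
                incremented.Add(c);
            }
            ok = DoProcessing(pending); // assumed internal processing step
            return ok;
        }
        finally
        {
            if (!ok)
            {
                // Roll back only the increments that were actually applied.
                foreach (C c in incremented)
                    c.value--;
            }
        }
    }

    private bool DoProcessing(List<C> items)
    {
        // ... the actual work would go here ...
        return true;
    }
}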