How to avoid geometric slowdown with large Linq transactions?


I've written some really nice, funky libraries for use in LinqToSql. (Some day when I have time to think about it I might make it open source... :) )
Anyway, I'm not sure if this is related to my libraries or not, but I've discovered that when I have a large number of changed objects in one transaction, and then call DataContext.GetChangeSet(), things start getting reaalllly slooowwwww. When I break into the code, I find that my program is spinning its wheels doing an awful lot of Equals() comparisons between the objects in the change set. I can't guarantee this is true, but I suspect that if there are n objects in the change set, then the call to GetChangeSet() is causing every object to be compared to every other object for equivalence, i.e. at best (n^2-n)/2 calls to Equals()...
Yes, of course I could commit each object separately, but that kinda defeats the purpose of transactions. And in the program I'm writing, I could have a batch job containing 100,000 separate items, that all need to be committed together. Around 5 billion comparisons there.
So the question is: (1) is my assessment of the situation correct? Do you get this behavior in pure, textbook LinqToSql, or is this something my libraries are doing? And (2) is there a standard/reasonable workaround so that I can create my batch without making the program geometrically slower with every extra object in the change set?
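On question (1), one cause that produces exactly this symptom is worth ruling out in the custom libraries before blaming LINQ to SQL itself. The question does not confirm this cause; it is an assumption about the libraries. The cause is an Equals override paired with a GetHashCode that returns the same value for many objects. Hash-based collections (Dictionary, HashSet) compare hash codes first and call Equals only when the hash codes match. A constant hash code therefore turns every lookup into a linear scan, and n insertions into n(n-1)/2 Equals calls. The hypothetical Item type below counts the calls in both cases:

```csharp
using System.Collections.Generic;

// Hypothetical entity with an overridden Equals; EqualsCalls tallies every call.
public sealed class Item
{
    public static long EqualsCalls;
    public int Id { get; }
    private readonly bool _goodHash;

    public Item(int id, bool goodHash) { Id = id; _goodHash = goodHash; }

    public override bool Equals(object? obj)
    {
        EqualsCalls++;
        return obj is Item other && other.Id == Id;
    }

    // A constant hash code puts every item in the same bucket, so each lookup
    // degrades into calling Equals against every item already stored.
    public override int GetHashCode() => _goodHash ? Id.GetHashCode() : 0;
}

public static class HashDemo
{
    // Adds n distinct items to a HashSet and returns how many Equals calls that took.
    public static long CountEqualsCalls(int n, bool goodHash)
    {
        Item.EqualsCalls = 0;
        var set = new HashSet<Item>();
        for (var i = 0; i < n; i++) set.Add(new Item(i, goodHash));
        return Item.EqualsCalls;
    }
}
```

With goodHash set to false, adding 1,000 items costs about half a million Equals calls. With distinct hash codes it costs essentially none. If the libraries override Equals, check first that GetHashCode is overridden consistently: equal objects must return equal hash codes, and other objects should return well-spread ones.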

In the end I decided to rewrite the batches so that each individual item is saved independently, all within one big transaction. In other words, instead of:
var b = new Batch { ... };
while (addNewItems) {
    ...
    var i = new BatchItem { ... };
    b.BatchItems.Add(i);
}
b.Insert(); // that's a function in my library that calls SubmitChanges()
...you have to do something like this:
context.BeginTransaction(); // another one of my library functions
try {
    var b = new Batch { ... };
    b.Insert(); // save the batch record immediately
    while (addNewItems) {
        ...
        var i = new BatchItem { ... };
        b.BatchItems.Add(i);
        i.Insert(); // send the SQL on each iteration
    }
    context.CommitTransaction(); // and only commit the transaction when everything is done.
} catch {
    context.RollbackTransaction();
    throw;
}
You can see why the first code block is just cleaner and more natural to use, and it's a pity I got forced into using the second structure...

Related

C# BulkWriteAsync, Transactions and Results

I am relatively new to working with MongoDB. I am still getting familiar with the API, especially the C# driver, and I have a few questions about bulk updates. The C# driver offers a BulkWriteAsync method, and I have read a lot about it in the MongoDB documentation. As I understand it, the unordered setting configures a bulk write not to stop when a step fails. What I could not find is what happens to the data. Does the database roll back in case of an error, or do I have to wrap the operation in a transaction myself? And in case of an error, can I get details of which step was not successful? Think of a bulk write with updates to 100 documents: can I find out which updates failed? BulkWriteResult offers very little information, so I am not sure this operation is really a good fit for me.
Thanks in advance.

You're right in that BulkWriteResult doesn't provide the full set of information to make a call on what to do.
In the case of a MongoBulkWriteException<T>, however, you can access the WriteErrors property to get the indexes of models that errored. Here's a pared-down example of how to use the property.
var models = sourceOfModels.ToArray();
for (var i = 0; i < MaxTries; i++)
{
    try
    {
        return await someCollection.BulkWriteAsync(models, new BulkWriteOptions { IsOrdered = false });
    }
    catch (MongoBulkWriteException e)
    {
        // reconstitute the list of models to try from the set of failed models;
        // each Index is a position in the array passed to this attempt
        models = e.WriteErrors.Select(x => models[x.Index]).ToArray();
    }
}
throw new InvalidOperationException("Bulk write retries exhausted"); // the method must not fall through
Note: The above is very naive code. My actual code is more sophisticated. What the above does is try over and over to do the write, in each case, with only the outstanding writes. Say you started with 1000 ReplaceOne<T> models to write, and 900 went through; the second try will try against the remaining 100, and so on until retries are exhausted, or there are no errors.
If the code is not within a transaction, and an error occurs, of course nothing is rolled back; you have some writes that succeed and some that do not. In the case of a transaction, the exception is still raised (MongoDB 4.2+). Prior to that, you would not get an exception.
Finally, while the default is ordered writes, unordered writes can be very useful when the writes are unrelated to one another (e.g. documents representing DDD aggregates where there are no dependencies). It's this same "unrelatedness" that also obviates the need for a transaction.
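The retry logic above can be exercised without a MongoDB server. In this minimal sketch, a hypothetical writeBatch delegate stands in for the driver call. It attempts every model and returns the failed indexes, the way WriteErrors[n].Index reports them. The detail worth getting right is that each index refers to the array passed to that particular attempt. The index therefore has to be remapped against the current array, not the original one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BulkRetry
{
    // writeBatch is a hypothetical stand-in for BulkWriteAsync: it tries every
    // model and returns the indexes (into the array it received) that failed.
    public static T[] WriteWithRetries<T>(T[] models, Func<T[], IReadOnlyList<int>> writeBatch, int maxTries)
    {
        for (var attempt = 0; attempt < maxTries && models.Length > 0; attempt++)
        {
            var failed = writeBatch(models);
            // Remap against the array used for THIS attempt, then retry only those.
            models = failed.Select(i => models[i]).ToArray();
        }
        return models; // models still outstanding after the last attempt
    }
}
```

Returning the outstanding models, rather than throwing, leaves the caller to decide whether a partial write is acceptable. Without a transaction, the successful writes stay applied either way.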

StackExchange.Redis Transaction chaining parameters

I'm trying to execute a basic transactional operation that contains two operations:
Get the length of a set: SCARD MySet
Pop the entire set using that length: SPOP MySet len
I know it is possible to use smembers and del consecutively. But what I want to achieve is to get the output of the first operation and use it in the second operation and do it in a transaction. Here is what I tried so far:
var transaction = this.cacheClient.Db1.Database.CreateTransaction();
var itemLength = transaction.SetLengthAsync(key).ContinueWith(async lengthTask =>
{
    var length = await lengthTask;
    try
    {
        // here I want to pass the length argument
        return await transaction.SetPopAsync(key, length); // probably here transaction is already committed
        // so it never passes this line and no exceptions thrown.
    }
    catch (Exception ex)
    {
        throw;
    }
});
await transaction.ExecuteAsync();
I also tried the same thing with CreateBatch and got the same result. I'm currently using the workaround I mentioned above. I know it is also possible to evaluate a Lua script, but I want to know whether this is possible with transactions, or whether I am doing something terribly wrong.

The nature of redis is that you cannot read data during multi/exec - you only get results when the exec runs, which means it isn't possible to use those results inside the multi. What you are attempting is kinda doomed. There are two ways of doing what you want here:
Speculatively read what you need, then perform a multi/exec (transaction) block using that knowledge as a constraint, which SE.Redis will enforce inside a WATCH block; this is really complex and hard to get right, quite honestly
Use Lua, meaning: ScriptEvaluate[Async], where you can do everything you want in a series of operations that execute contiguously on the server without competing with other connections
Option 2 is almost always the right way to do this, ever since it became possible.
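Option 2 might look like the sketch below. It assumes the StackExchange.Redis package and a server running Redis 3.2 or later, because SPOP with a count argument was added in 3.2. The script, the class name and the method name are illustrative. The script runs SCARD and SPOP back to back on the server, so no other connection can change the set between the two calls.

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

public static class SetPopAll
{
    // Executed atomically by the server: read the cardinality, then pop that many members.
    public const string Script = @"
local n = redis.call('SCARD', KEYS[1])
if n == 0 then return {} end
return redis.call('SPOP', KEYS[1], n)";

    // Returns every member that was in the set; the set no longer exists afterwards.
    public static async Task<RedisValue[]> PopAllAsync(IDatabase db, RedisKey key)
    {
        RedisResult result = await db.ScriptEvaluateAsync(Script, new RedisKey[] { key });
        return (RedisValue[])result;
    }
}
```

The key goes through KEYS rather than being built into the script text. That way the script body stays constant and can be cached by the server, and the key is visible to cluster routing.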

code performance question

Let's say I have a relatively large list of MyObjectModel objects called MyBigList. One of the properties of MyObjectModel is an int called ObjectID. In theory, I think MyBigList could reach 15-20MB in size. I also have a table in my database that stores some scalars about this list so that it can be recomposed later.
What is going to be more efficient?
Option A:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
int RowID = PutScalarsInDB(MyBigList);
Option B:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
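int TheCount = MyBigList.Count;
StringBuilder ListOfObjectID = new StringBuilder(); // initialising to null would throw on the first Append
foreach (MyObjectModel ThisObject in MyBigList)
{
    if (ListOfObjectID.Length > 0) ListOfObjectID.Append(','); // separate the IDs so they can be split later
    ListOfObjectID.Append(ThisObject.ObjectID);
}
int RowID = PutScalarsInDB(TheCount, ListOfObjectID);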
In option A I pass MyBigList to a function that extracts the scalars from the list, stores these in the DB and returns the row where these entries were made. In option B, I keep MyBigList in the page method where I do the extraction of the scalars and I just pass these to the PutScalarsInDB function.
Which is the better option, or could yet another approach be better? I'm concerned about memory usage when passing around objects of this size.

I don't think you'll see a material difference between these two approaches. From your description, it sounds like you'll be burning the same CPU cycles either way. The things that matter are:
Get the list
Iterate through the list to get the IDs
Iterate through the list to update the database
The order in which these three activities occur, and whether they occur within a single method or a subroutine, doesn't matter. All other activities (declaring variables, assigning results, etc.) have zero to negligible performance impact.
Other things being equal, your first option may be slightly more performant because you'll only be iterating once, I assume, both extracting IDs and updating the database in a single pass. But the cost of iteration will likely be very small compared with the cost of updating the database, so it's not a performance difference you're likely to notice.
Having said all that, many other factors may impact performance and could dwarf these considerations, such as the type of list you're iterating through and the speed of your connection to the database. It doesn't look like too much code either way. I'd strongly suggest building both and testing them.
Then let us know your results!
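On the memory concern specifically: List<T> is a reference type. Passing MyBigList to a method like PutScalarsInDB copies only the reference (8 bytes in a 64-bit process), not the 15-20MB of elements. Neither option duplicates the list. The hypothetical Receive method below stands in for such a call and shows that the callee works on the caller's instance:

```csharp
using System.Collections.Generic;

public static class ListPassing
{
    // The parameter holds a copy of the reference, so this is the caller's list,
    // not a duplicate of its contents.
    public static List<int> Receive(List<int> list)
    {
        list.Add(-1); // visible to the caller, because it is the same object
        return list;
    }
}
```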

If you want to know which method performs better, you can use the Stopwatch class to measure the time each method takes. See here for Stopwatch usage: http://www.dotnetperls.com/stopwatch
I think there are other issues you need to check for an ASP.NET application:
Where do you read your list from? If you read it from the database, would it be more efficient to do the work in the database, within a stored procedure?
Where is it stored? Is it only read and discarded, or is it stored in session or application state?
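A minimal Stopwatch harness for such a comparison is sketched below. The Measure helper is illustrative; in practice you would pass lambdas that call your own Option A and Option B code. The first call is a warm-up, so that JIT compilation is not included in the measured time.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once untimed (warm-up), then times `iterations` further runs.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up: keep JIT compilation out of the measurement
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Usage would be along the lines of comparing Timing.Measure(() => OptionA(), 100) with Timing.Measure(() => OptionB(), 100). Because the database round-trips probably dominate, run against a realistic database rather than a local stub.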

How do I make an in-memory process transactional?

I'm very familiar with using a transaction RDBMS, but how would I make sure that changes made to my in-memory data are rolled back if the transaction fails? What if I'm not even using a database?
Here's a contrived example:
public void TransactionalMethod()
{
    var items = GetListOfItems();
    foreach (var item in items)
    {
        MethodThatMayThrowException(item);
        item.Processed = true;
    }
}
In my example, I might want the changes made to the items in the list to somehow be rolled back, but how can I accomplish this?
I am aware of "software transactional memory" but don't know much about it and it seems fairly experimental. I'm aware of the concept of "compensatable transactions", too, but that incurs the overhead of writing do/undo code.
Subversion seems to deal with errors updating a working copy by making you run the "cleanup" command.
Any ideas?
UPDATE:
Reed Copsey offers an excellent answer, including:
Work on a copy of data, update original on commit.
This takes my question one level further - what if an error occurs during the commit? We so often think of the commit as an immediate operation, but in reality it may be making many changes to a lot of data. What happens if there are unavoidable things like OutOfMemoryExceptions while the commit is being applied?
On the flipside, if one goes for a rollback option, what happens if there's an exception during the rollback? I understand that things like Oracle RDBMS have the concept of rollback segments and UNDO logs. But assuming there's no serialisation to disk (where if it isn't serialised to disk it didn't happen, and a crash means you can investigate those logs and recover from it), is this really possible?
UPDATE 2:
An answer from Alex made a good suggestion: namely that one updates a different object, then, the commit phase is simply changing the reference to the current object over to the new object. He went further to suggest that the object you change is effectively a list of the modified objects.
I understand what he's saying (I think), and I want to make the question more complex as a result:
How, given this scenario, do you deal with locking? Imagine you have a list of customers:
var customers = new Dictionary<CustomerKey, Customer>();
Now, you want to make a change to some of those customers, how do you apply those changes without locking and replacing the entire list? For example:
var customerTx = new Dictionary<CustomerKey, Customer>();
foreach (var customer in customers.Values)
{
    var updatedCust = customer.Clone();
    customerTx.Add(GetKey(updatedCust), updatedCust);
    if (CalculateRevenueMightThrowException(customer) >= 10000)
    {
        updatedCust.Preferred = true;
    }
}
How do I commit? This (Alex's suggestion) will mean locking all customers while replacing the list reference:
lock (customers)
{
    customers = customerTx;
}
Whereas if I loop through, modifying the reference in the original list, it's not atomic, and falls foul of the "what if it crashes partway through" problem:
foreach (var kvp in customerTx)
{
    customers[kvp.Key] = kvp.Value;
}

Pretty much every option for doing this requires one of three basic methods:
Make a copy of your data before modifications, to revert to a rollback state if aborted.
Work on a copy of data, update original on commit.
Keep a log of changes to your data, to undo them in the case of an abort.
For example, Software Transactional Memory, which you mentioned, follows the third approach. The nice thing about that is that it can work on the data optimistically, and just throw away the log on a successful commit.
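The third approach can be shown in miniature with an undo log. This is a minimal sketch of the idea, not an STM implementation. Each change is applied together with a closure that reverses it. Commit discards the log, and rollback replays the closures in reverse order. UndoLog is an illustrative name, and the sketch assumes each individual apply step either completes or throws without side effects.

```csharp
using System;
using System.Collections.Generic;

public sealed class UndoLog
{
    private readonly Stack<Action> _undo = new Stack<Action>();

    // Applies a change, then records how to reverse it. If apply throws,
    // nothing is recorded (assumes apply has no partial effects).
    public void Do(Action apply, Action undo)
    {
        apply();
        _undo.Push(undo);
    }

    // Success: the log is simply thrown away.
    public void Commit() => _undo.Clear();

    // Failure: undo in reverse order so the most recent change is reverted first.
    public void Rollback()
    {
        while (_undo.Count > 0) _undo.Pop()();
    }
}
```

This keeps the optimistic property mentioned above: the log costs a little memory while the work is in progress, and a successful commit simply drops it.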

Take a look at the Microsoft Research project, SXM.
From Maurice Herlihy's page, you can download documentation as well as code samples.

You asked: "What if an error occurs during the commit?"
It doesn't matter. You can commit to somewhere/something in memory and check meanwhile if the operation succeeds. If it did, you change the reference of the intended object (object A) to where you committed (object B). Then you have failsafe commits - the reference is only updated on successful commit. Reference change is atomic.
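The reference-swap commit can be sketched as a copy-on-write store. Each writer copies the current dictionary, modifies the private copy, and publishes it with Interlocked.CompareExchange. The technique is optimistic concurrency: readers never take a lock, and a writer that loses a race redoes its work on the newer snapshot. This also bears on the locking question in UPDATE 2. The trade-off is that every commit copies the whole dictionary. CowStore is an illustrative name, not a library type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class CowStore<TKey, TValue> where TKey : notnull
{
    private Dictionary<TKey, TValue> _current = new Dictionary<TKey, TValue>();

    // Readers get the latest published snapshot; treat it as read-only.
    public IReadOnlyDictionary<TKey, TValue> Snapshot => Volatile.Read(ref _current);

    // Applies `change` to a private copy. If `change` throws, nothing is published.
    // `change` may run more than once under contention, so it should only touch the copy.
    public void Update(Action<Dictionary<TKey, TValue>> change)
    {
        while (true)
        {
            var original = Volatile.Read(ref _current);
            var copy = new Dictionary<TKey, TValue>(original);
            change(copy);
            // Commit: a single atomic reference write, and only if no other writer got there first.
            if (ReferenceEquals(Interlocked.CompareExchange(ref _current, copy, original), original))
                return;
        }
    }
}
```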

public void TransactionalMethod()
{
    var items = GetListOfItems();
    try
    {
        foreach (var item in items)
        {
            MethodThatMayThrowException(item);
            item.Processed = true;
        }
    }
    catch
    {
        // undo only the items that completed before the failure
        foreach (var item in items)
        {
            if (item.Processed)
            {
                UndoProcessingForThisItem(item);
            }
        }
        throw; // rethrow so the caller knows the method failed
    }
}
Obviously, the implementation of the "Undo..." is left as an exercise for the reader.

software transactions in c#

I have some C# code that needs to work like this
D d = new D();
foreach (C c in list)
{
    c.value++;
    d.Add(c); // c.value must be incremented at this point
}
if (!d.Process())
    foreach (C c in list)
        c.value--;
It increments a value on each item in a list and then tries to do some processing on them. If the processing fails, it needs to back out the mutation of the items.
The question is: is there a better way to do this? What I don't like about this is that I can see too many ways for it to go wrong. For instance, if just about anything throws, it gets out of sync.
One idea (that almost seems worse than the problem) is:
D d = new D();
var done = new List<C>();
try
{
    foreach (C c in list)
    {
        c.value++;
        done.Add(c);
        d.Add(c);
    }
    if (d.Process()) done.Clear();
}
finally
{
    foreach (C c in done)
        c.value--;
}

Without knowing the full details of your application, I don't think it's possible to give you a concrete solution to the problem. Basically, there is no method that will always work in 100% of the situations you need something like this.
But if you're in control of the class D and are aware of the circumstance in which D.Process() will fail, you might be able to do one of two things that will make this conceptually easier to manage.
One way is to test D's state before calling the Process method by implementing something like a CanProcess function that returns true if and only if Process would, but without actually processing anything.
Another way is to create a temporary D object that doesn't actually commit, but runs the Process method on its contents. Then, if Process succeeded, you call some kind of Promote or Commit method on it that finalizes the changes. Similarly, if you had a DataSet object, you could create a clone of it, do your transactional work in the clone, then merge it back into the main dataset if and only if the processing succeeded.
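A minimal staging sketch follows. It assumes D can be changed to read staged values instead of reading c.value directly, and the sketch stands D in with a hypothetical process delegate. C is a simplified stand-in with a Value property. The incremented values are computed into a separate array and handed to process. They are written back to the items only if processing reports success. If process returns false or throws, the items are never touched, so there is nothing to undo.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the question's C type.
public sealed class C
{
    public int Value { get; set; }
}

public static class StagedIncrement
{
    // Stage, process, then promote. The only mutation happens after success.
    public static bool IncrementAndProcess(IReadOnlyList<C> items, Func<IReadOnlyList<int>, bool> process)
    {
        var staged = new int[items.Count];
        for (var i = 0; i < items.Count; i++) staged[i] = items[i].Value + 1;

        if (!process(staged)) return false; // items unchanged

        for (var i = 0; i < items.Count; i++) items[i].Value = staged[i];
        return true;
    }
}
```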

It's memory intensive, but you can make a copy of the list before incrementing. If the process succeeds, you can return the new list. If it fails, return the original list.

Why don't you increment only when the process succeeds?
I think a better way is to make the Process method in D increment c.value itself.
