Improving DAL performance - C#

The way I currently populate business objects is with something similar to the snippet below.
using (SqlConnection conn = new SqlConnection(Properties.Settings.Default.CDRDatabase))
{
    using (SqlCommand comm = new SqlCommand(SELECT, conn))
    {
        conn.Open();
        using (SqlDataReader r = comm.ExecuteReader(CommandBehavior.CloseConnection))
        {
            while (r.Read())
            {
                Ailias ailias = PopulateFromReader(r);
                tmpList.Add(ailias);
            }
        }
    }
}
private static Ailias PopulateFromReader(IDataReader reader)
{
    Ailias ailias = new Ailias();
    if (!reader.IsDBNull(reader.GetOrdinal("AiliasId")))
    {
        ailias.AiliasId = reader.GetInt32(reader.GetOrdinal("AiliasId"));
    }
    if (!reader.IsDBNull(reader.GetOrdinal("TenantId")))
    {
        ailias.TenantId = reader.GetInt32(reader.GetOrdinal("TenantId"));
    }
    if (!reader.IsDBNull(reader.GetOrdinal("Name")))
    {
        ailias.Name = reader.GetString(reader.GetOrdinal("Name"));
    }
    if (!reader.IsDBNull(reader.GetOrdinal("Extention")))
    {
        ailias.Extention = reader.GetString(reader.GetOrdinal("Extention"));
    }
    return ailias;
}
Does anyone have any suggestions on how to improve the performance of something like this? Bear in mind that PopulateFromReader, for some types, performs further database look-ups in order to populate the object fully.

One obvious change would be to replace this kind of statement:
ailias.AiliasId = reader.GetInt32(reader.GetOrdinal("AiliasId"));
with
ailias.AiliasId = reader.GetInt32(constAiliasId);
where constAiliasId is a constant holding the ordinal of the field AiliasId.
This avoids the ordinal lookups in each iteration of the loop.
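If the ordinals aren't known at compile time, resolving them once before the loop gets the same win. A minimal sketch against the original loop (nothing new here beyond hoisting the GetOrdinal calls out of the per-row work):
// Resolve each ordinal once, outside the loop, instead of once per field per row.
int ordAiliasId  = r.GetOrdinal("AiliasId");
int ordTenantId  = r.GetOrdinal("TenantId");
int ordName      = r.GetOrdinal("Name");
int ordExtention = r.GetOrdinal("Extention");

while (r.Read())
{
    Ailias ailias = new Ailias();
    if (!r.IsDBNull(ordAiliasId))  ailias.AiliasId  = r.GetInt32(ordAiliasId);
    if (!r.IsDBNull(ordTenantId))  ailias.TenantId  = r.GetInt32(ordTenantId);
    if (!r.IsDBNull(ordName))      ailias.Name      = r.GetString(ordName);
    if (!r.IsDBNull(ordExtention)) ailias.Extention = r.GetString(ordExtention);
    tmpList.Add(ailias);
}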

If the data volume is high, the overhead of building a huge list can itself become the bottleneck; in that case, it can be more efficient to use a streaming object model, i.e.
public IEnumerable<YourType> SomeMethod(...args...) {
    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand comm = new SqlCommand(SELECT, conn)) {
        conn.Open();
        using (SqlDataReader reader = comm.ExecuteReader()) {
            while (reader.Read()) {
                YourType item = BuildObj(reader);
                yield return item;
            }
        }
    }
}
The consuming code (via foreach, etc.) then only has to deal with a single object at a time. If callers want a list, they can build one (with new List<YourType>(sequence), or in .NET 3.5: sequence.ToList()).
This involves a few more method calls (an additional MoveNext()/Current per sequence item, hidden behind the foreach), but you will never notice this when you have out-of-process data such as from a database.

Your code looks almost identical to a lot of our business-object loading functions. When we suspect DAL performance issues, we take a look at a few things.
How many times are we hopping out to the DB? Is there any way we can connect less often and bring back larger chunks of data through multiple result sets (we use stored procedures)? Then, instead of each child object loading its own data, the parent fetches all the data for itself and its children. You can run into fragile SQL (sort orders that need to match, etc.) and tricky loops to walk over the DataReaders, but we have found it to be more optimal than multiple DB trips (see the sketch after this list).
Fire up a packet sniffer/network monitor to see exactly how much data is being transmitted across the wire. You may be surprised to see how massive some of the result sets are. If they are, then you might think about alternate ways of approaching the issue. Like lazy/defer loading some child data.
Make sure that you are using all of the results you are asking for. For example, going from SELECT * FROM (with 30 fields being returned) to simply SELECT Id, Name FROM (if that is all you needed) could make a large difference.
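A rough sketch of the multiple-result-set pattern from the first point (conn is an open SqlConnection; the procedure name and the shape of the result sets are hypothetical):
using (SqlCommand comm = new SqlCommand("dbo.GetParentWithChildren", conn)) // hypothetical proc
{
    comm.CommandType = CommandType.StoredProcedure;
    using (SqlDataReader r = comm.ExecuteReader())
    {
        while (r.Read())
        {
            // First result set: the parent rows.
        }
        if (r.NextResult()) // advance to the next result set on the same connection
        {
            while (r.Read())
            {
                // Second result set: the child rows, matched back to parents in memory.
            }
        }
    }
}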

AFAIK, that is as fast as it gets. Perhaps the slowness is in the SQL query/server. Or somewhere else.

It's likely the real problem is the multiple, per-object lookups that you mention. Have you looked closely to see if they can all be put into a single stored procedure?

Related

C# LazyList for lots of entries

I recently became aware of the notion of a LazyList, and I would like to implement it in my work.
I have several methods which may retrieve hundreds of thousands of entries from the database, and I want to return a LazyList<T> rather than a typical List<T>.
I could only find Lazy<List<T>>, which, as I understand it, is not the same thing. Lazy<List<T>> makes the initialization of the list lazy; that's not what I need.
I want to give an example from the Scheme language, if anyone has ever used it.
Basically, it is implemented with linked nodes, where the value of a given node needs to be calculated, and node.next is actually a function that must be evaluated to retrieve the next value.
I wonder how to actually handle lists of around 400k entries. It sounds very expensive to hold a List of a couple of MB which, depending on the DB operations the program needs to do, could possibly grow to GBs.
I'm currently using .NET 4.5; the C# version is 4.
Instead of returning a List<T> or a LazyList, why not yield return the results? This is much better than retrieving all rows at once: it streams them row by row, which is better for memory management.
For example: (PSEUDO)
private IEnumerable<Row> GetRows(SqlConnection connection)
{
    var resultSet = connection.ExecuteQuery(.....);
    resultSet.Open();
    try
    {
        while (resultSet.FetchNext())
        {
            // read one row..
            yield return row;
        }
    }
    finally
    {
        resultSet.Close();
    }
}
foreach (var row in GetRows(connection))
{
    // handle the row.
}
This way the result set is handled one row at a time.
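A concrete ADO.NET version of that sketch might look like the following (Row and MapRow are placeholders for your own type and mapping code; assumes the connection is already open):
// Requires System.Data.SqlClient and System.Collections.Generic.
private static IEnumerable<Row> GetRows(SqlConnection connection, string sql)
{
    using (var command = new SqlCommand(sql, connection))
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            yield return MapRow(reader); // only one row is materialized at a time
        }
    }
}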

code performance question

Let's say I have a relatively large list of an object MyObjectModel called MyBigList. One of the properties of MyObjectModel is an int called ObjectID. In theory, I think MyBigList could reach 15-20MB in size. I also have a table in my database that stores some scalars about this list so that it can be recomposed later.
What is going to be more efficient?
Option A:
List<MyObjectModel> MyBigList = GetBigList(some parameters);
int RowID = PutScalarsInDB(MyBigList);
Option B:
List<MyObjectModel> MyBigList = GetBigList(some parameters);
int TheCount = MyBigList.Count;
StringBuilder ListOfObjectID = new StringBuilder();
foreach (MyObjectModel ThisObject in MyBigList)
{
    ListOfObjectID.Append(ThisObject.ObjectID.ToString());
}
int RowID = PutScalarsInDB(TheCount, ListOfObjectID);
In option A I pass MyBigList to a function that extracts the scalars from the list, stores these in the DB and returns the row where these entries were made. In option B, I keep MyBigList in the page method where I do the extraction of the scalars and I just pass these to the PutScalarsInDB function.
Which is the better option, or is there yet another that's better still? I'm concerned about passing objects of this size around, and about memory usage.
I don't think you'll see a material difference between these two approaches. From your description, it sounds like you'll be burning the same CPU cycles either way. The things that matter are:
Get the list
Iterate through the list to get the IDs
Iterate through the list to update the database
The order in which these three activities occur, and whether they occur within a single method or in a subroutine, doesn't matter. All other activities (declaring variables, assigning results, etc.) have zero to negligible performance impact.
Other things being equal, your first option may be slightly more performant because you'll only be iterating once, I assume, both extracting IDs and updating the database in a single pass. But the cost of iteration will likely be very small compared with the cost of updating the database, so it's not a performance difference you're likely to notice.
Having said all that, there are many, many more factors that may impact performance, such as the type of list you're iterating through, the speed of your connection to the database, etc., that could dwarf these other considerations. It doesn't look like too much code either way. I'd strongly suggest building both and testing them.
Then let us know your results!
If you want to know which method performs better, you can use the Stopwatch class to check the time needed for each one. See here for Stopwatch usage: http://www.dotnetperls.com/stopwatch
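For example (a hypothetical harness; GetBigList and PutScalarsInDB are the methods from the question):
// using System.Diagnostics;
Stopwatch sw = Stopwatch.StartNew();
List<MyObjectModel> myBigList = GetBigList(someParameters);
int rowId = PutScalarsInDB(myBigList); // Option A
sw.Stop();
Console.WriteLine("Option A took {0} ms", sw.ElapsedMilliseconds);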
I think there are other issues you need to verify for an ASP.NET application:
Where do you read your list from? If you read it from the database, would it be more efficient to do the work inside the database, in a stored procedure?
Where is it stored? Is it only read and then destroyed, or is it kept in session or application state?

How to avoid geometric slowdown with large Linq transactions?

I've written some really nice, funky libraries for use in LinqToSql. (Some day when I have time to think about it I might make it open source... :) )
Anyway, I'm not sure if this is related to my libraries or not, but I've discovered that when I have a large number of changed objects in one transaction, and then call DataContext.GetChangeSet(), things start getting reaalllly slooowwwww. When I break into the code, I find that my program is spinning its wheels doing an awful lot of Equals() comparisons between the objects in the change set. I can't guarantee this is true, but I suspect that if there are n objects in the change set, then the call to GetChangeSet() is causing every object to be compared to every other object for equivalence, i.e. at best (n^2-n)/2 calls to Equals()...
Yes, of course I could commit each object separately, but that kinda defeats the purpose of transactions. And in the program I'm writing, I could have a batch job containing 100,000 separate items that all need to be committed together. Around 5 billion comparisons there.
So the question is: (1) is my assessment of the situation correct? Do you get this behavior in pure, textbook LinqToSql, or is this something my libraries are doing? And (2) is there a standard/reasonable workaround so that I can create my batch without making the program geometrically slower with every extra object in the change set?
In the end I decided to rewrite the batches so that each individual item is saved independently, all within one big transaction. In other words, instead of:
var b = new Batch { ... };
while (addNewItems) {
    ...
    var i = new BatchItem { ... };
    b.BatchItems.Add(i);
}
b.Insert(); // that's a function in my library that calls SubmitChanges()
... you have to do something like this:
context.BeginTransaction(); // another one of my library functions
try {
    var b = new Batch { ... };
    b.Insert(); // save the batch record immediately
    while (addNewItems) {
        ...
        var i = new BatchItem { ... };
        b.BatchItems.Add(i);
        i.Insert(); // send the SQL on each iteration
    }
    context.CommitTransaction(); // and only commit the transaction when everything is done
} catch {
    context.RollbackTransaction();
    throw;
}
You can see why the first code block is just cleaner and more natural to use, and it's a pity I got forced into using the second structure...

How do I make an in-memory process transactional?

I'm very familiar with using transactions in an RDBMS, but how would I make sure that changes made to my in-memory data are rolled back if the transaction fails? What if I'm not even using a database?
Here's a contrived example:
public void TransactionalMethod()
{
    var items = GetListOfItems();
    foreach (var item in items)
    {
        MethodThatMayThrowException(item);
        item.Processed = true;
    }
}
In my example, I might want the changes made to the items in the list to somehow be rolled back, but how can I accomplish this?
I am aware of "software transactional memory" but don't know much about it and it seems fairly experimental. I'm aware of the concept of "compensatable transactions", too, but that incurs the overhead of writing do/undo code.
Subversion seems to deal with errors updating a working copy by making you run the "cleanup" command.
Any ideas?
UPDATE:
Reed Copsey offers an excellent answer, including:
Work on a copy of data, update original on commit.
This takes my question one level further - what if an error occurs during the commit? We so often think of the commit as an immediate operation, but in reality it may be making many changes to a lot of data. What happens if there are unavoidable things like OutOfMemoryExceptions while the commit is being applied?
On the flip side, if one goes for a rollback option, what happens if there's an exception during the rollback? I understand that systems like Oracle RDBMS have the concept of rollback segments and UNDO logs and so on, but assuming there's no serialisation to disk (where if it isn't serialised to disk it didn't happen, and a crash means you can investigate those logs and recover from it), is this really possible?
UPDATE 2:
An answer from Alex made a good suggestion: namely, that one updates a different object, and the commit phase then simply switches the reference from the current object over to the new one. He went further and suggested that the object you change is effectively a list of the modified objects.
I understand what he's saying (I think), and I want to make the question more complex as a result:
How, given this scenario, do you deal with locking? Imagine you have a list of customers:
var customers = new Dictionary<CustomerKey, Customer>();
Now you want to make a change to some of those customers; how do you apply those changes without locking and replacing the entire list? For example:
var customerTx = new Dictionary<CustomerKey, Customer>();
foreach (var customer in customers.Values)
{
    var updatedCust = customer.Clone();
    customerTx.Add(GetKey(updatedCust), updatedCust);
    if (CalculateRevenueMightThrowException(customer) >= 10000)
    {
        updatedCust.Preferred = true;
    }
}
How do I commit? This (Alex's suggestion) will mean locking all customers while replacing the list reference:
lock (customers)
{
    customers = customerTx;
}
Whereas if I loop through, modifying the reference in the original list, it's not atomic, and falls foul of the "what if it crashes partway through" problem:
foreach (var kvp in customerTx)
{
    customers[kvp.Key] = kvp.Value;
}
Pretty much every option for doing this requires one of three basic methods:
1. Make a copy of your data before modifications, to revert to a rollback state if aborted.
2. Work on a copy of data, update original on commit.
3. Keep a log of changes to your data, to undo them in the case of an abort.
For example, Software Transactional Memory, which you mentioned, follows the third approach. The nice thing about that is that it can work on the data optimistically, and just throw away the log on a successful commit.
Take a look at the Microsoft Research project, SXM.
From Maurice Herlihy's page, you can download documentation as well as code samples.
You asked: "What if an error occurs during the commit?"
It doesn't matter. You can commit to somewhere/something in memory and check meanwhile whether the operation succeeded. If it did, you switch the reference for the intended object (object A) over to where you committed (object B). Then you have failsafe commits - the reference is only updated on a successful commit, and the reference change itself is atomic.
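A minimal sketch of that reference swap (the names are illustrative; a plain reference assignment is already atomic in .NET, and Interlocked.Exchange adds a memory barrier so other threads see the new snapshot promptly):
// using System.Threading;
class CustomerStore
{
    // All readers go through this single reference.
    private Dictionary<CustomerKey, Customer> customers =
        new Dictionary<CustomerKey, Customer>();

    public void Commit(Dictionary<CustomerKey, Customer> customerTx)
    {
        // Publish the fully built snapshot (object B) in one atomic step.
        Interlocked.Exchange(ref customers, customerTx);
    }
}
The compensating (undo) approach, by comparison, looks like this: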
public void TransactionalMethod()
{
    var items = GetListOfItems();
    try
    {
        foreach (var item in items)
        {
            MethodThatMayThrowException(item);
            item.Processed = true;
        }
    }
    catch (Exception)
    {
        foreach (var item in items)
        {
            if (item.Processed)
            {
                UndoProcessingForThisItem(item);
            }
        }
    }
}
Obviously, the implementation of the "Undo..." is left as an exercise for the reader.

IDataReader and "HasColumn", Best approach?

I've seen two common approaches for checking if a column exists in an IDataReader:
public bool HasColumn(IDataReader reader, string columnName)
{
    try
    {
        reader.GetOrdinal(columnName);
        return true;
    }
    catch
    {
        return false;
    }
}
Or:
public bool HasColumn(IDataReader reader, string columnName)
{
    reader.GetSchemaTable()
          .DefaultView.RowFilter = "ColumnName='" + columnName + "'";
    return (reader.GetSchemaTable().DefaultView.Count > 0);
}
Personally, I've used the second one, as I hate using exceptions for this purpose.
However, on a large dataset, I believe RowFilter might have to do a table scan per column, and this may be incredibly slow.
Thoughts?
I think I have a reasonable answer for this old gem.
I would go with the first approach because it's much simpler. If you want to avoid the exception, you can cache the field names and do a TryGet on the cache:
public Dictionary<string, int> CacheFields(IDataReader reader)
{
    var cache = new Dictionary<string, int>();
    for (int i = 0; i < reader.FieldCount; i++)
    {
        cache[reader.GetName(i)] = i;
    }
    return cache;
}
The upside of this approach is that it is simpler and gives you better control. Also note that you may want case-insensitive or kana-insensitive comparisons, which would make things a little trickier.
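Usage might then look something like this (the column name is illustrative; pass a comparer such as StringComparer.OrdinalIgnoreCase to the dictionary if you need case-insensitive lookups):
Dictionary<string, int> fields = CacheFields(reader);

int ordinal;
if (fields.TryGetValue("Name", out ordinal) && !reader.IsDBNull(ordinal))
{
    string name = reader.GetString(ordinal);
}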
A lot depends on how you're using HasColumn. Are you calling it just once or twice, or repeatedly in a loop? Is the column likely to be there or is that completely unknown in advance?
Setting a row filter probably would do a table scan each time. (Also, in theory, GetSchemaTable() could generate an entirely new table with every call, which would be even more expensive -- I don't believe SqlDataReader does this, but at the IDataReader level, who knows?) But if you only call it once or twice I can't imagine this being that much of an issue (unless you have thousands of columns or something).
(I would, however, at least store the result of GetSchemaTable() in a local var within the method to avoid calling it twice in quick succession, if not cache it somewhere on the off chance that your particular IDataReader DOES regenerate it.)
If you know in advance that under normal circumstances the column you ask for will be present, the exception method is a bit more palatable (because the column not being there is, in fact, an exceptional case). Even if not, it might perform slightly better, but again unless you're calling it repeatedly you should ask yourself if performance is really that much of a concern.
And if you ARE calling it repeatedly, you probably should consider a different approach anyway, such as: call GetSchemaTable() once up front, loop through the table, and load the field names into a Dictionary or some other structure that is designed for fast lookups.
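A sketch of that up-front approach (assumes an open reader; the comparer handles provider-specific casing of column names):
// using System.Data; using System.Collections.Generic;
var columns = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
using (DataTable schema = reader.GetSchemaTable())
{
    foreach (DataRow row in schema.Rows)
    {
        columns.Add((string)row["ColumnName"]);
    }
}

bool hasName = columns.Contains("Name"); // O(1) per lookup after the one-time scan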
I wouldn't worry about the performance impact. Even if you had a table with 1000 columns (which would be an enormous table), you are still only doing a "table scan" of 1000 rows. That is likely to be trivial.
Premature optimization will just lead you toward an unnecessarily complex implementation. Implement the version that seems best to you, and then measure the performance impact. If it is unacceptable compared to your performance requirements, then consider alternatives.
