Transactional operations simultaneously mixed with non-transactional ones - C#

I need to import data from an external source into my database. Because there is a lot of data to download, the import runs for a long time, and I need to persist periodic updates about the current import state to the database (so the user can follow progress).
Suppose I have 2 tables: Import (storage for imported data) and Status (importing state monitoring table).
The code for data import:
public class Importer
{
public delegate void ImportHandler(string item);
public event ImportHandler ImportStarted;
public void OnStart(string item)
{
ImportStarted?.Invoke(item); // avoid a NullReferenceException if no handler is attached
}
public void Execute(string[] items)
{
foreach (var item in items)
{
OnStart(item);
PersistImportedData(Download(item));
}
}
private void PersistImportedData(object data)
{
using (var connection = new SqlConnection()){ /*saving imported data*/ }
}
}
The starter code, which invokes the import task and updates its status:
public class Starter
{
public void Process(string[] items)
{
var importer = new Importer();
importer.ImportStarted += UpdateImportState;
importer.Execute(items);
}
private void UpdateImportState(string item)
{
using (var connection = new SqlConnection()){ /*status updates*/ }
}
}
So far everything works fine: the import executes and the user gets status updates (from the Status table) as the import progresses.
The problem is that this logic is not safe. I have to be sure the import is an atomic operation; I don't want partially downloaded and saved data. I used a transaction as the solution and wrapped importer.Execute with a TransactionScope:
importer.ImportStarted += UpdateImportState;
using (var scope = new TransactionScope())
{
importer.Execute(items);
scope.Complete();
}
Now I have safety: a rollback occurs, e.g. if the process is aborted.
But now I face a different problem, the one I want to resolve: I need status information to show the user, but the Status table is not updated while the transaction has not yet completed. Even if I use the RequiresNew option to create a separate (non-ambient) transaction, nothing changes. The Execute function creates its own connection to the database and UpdateImportState does the same; the connection is not shared. I don't understand why the Status table is not affected, even though the TransactionScope only covers the logic connected with the Import table.
How can I keep the import consistent while still allowing periodic status updates?

Use TransactionScopeOption.Suppress in UpdateImportState instead of TransactionScopeOption.RequiresNew. A Suppress scope runs the status update outside of any ambient transaction, so it commits immediately and is not rolled back together with the import.
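In code, that could look roughly like this (a sketch based on UpdateImportState above; the connection string, table and column names are placeholders, not part of the original code):
private void UpdateImportState(string item)
{
    // Suppress the ambient transaction so the status write commits immediately,
    // independently of the import transaction that wraps Execute.
    using (var scope = new TransactionScope(TransactionScopeOption.Suppress))
    using (var connection = new SqlConnection(connectionString)) // placeholder connection string
    {
        connection.Open();
        using (var command = new SqlCommand(
            "UPDATE Status SET CurrentItem = @item WHERE ImportId = @importId", connection)) // hypothetical schema
        {
            command.Parameters.AddWithValue("@item", item);
            command.Parameters.AddWithValue("@importId", importId); // hypothetical identifier of the running import
            command.ExecuteNonQuery();
        }
        scope.Complete();
    }
}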

Related

Batching stored procedure calls from C# class

We had performance issues in our app due to one-by-one updates of entities in a DB table where the number of rows was high (more than a million). We kept getting deadlock victim errors, so it was obvious that the table was locking rows longer than it should have.
Currently, we have implemented manual batching of stored procedure calls, with a configurable size limit and time threshold, to update entities in the DB.
The simplified class would look something like this:
public class EntityBatchUpdater
{
private readonly IRepository _repository;
private readonly ILogger _logger; // assumed to be injected; it is used in the catch blocks below
private List<Entity> _batch = new List<Entity>();
private readonly Timer _batchPostingTimer;
private readonly int _batchSize;
private static readonly object _batchPostingLock = new object();
public EntityBatchUpdater(IRepository repository, ILogger logger)
{
_repository = repository;
_logger = logger;
_batchSize = 1000; // configurable
string batchPostingPeriod = "00:01:00.00"; // configurable
_batchPostingTimer = new Timer
{
Interval = TimeSpan.Parse(batchPostingPeriod).TotalMilliseconds,
Enabled = true,
AutoReset = true,
};
_batchPostingTimer.Elapsed += OnTimedEvent;
}
public void Update(Entity entity)
{
try
{
lock (_batchPostingLock)
{
_batch.Add(entity);
if (_batch.Count == _batchSize)
{
EntityBatchUpdate();
}
}
}
catch (Exception ex)
{
_logger.LogError(ex, $"Failed to insert batch {JsonConvert.SerializeObject(batch)}");
}
}
private void EntityBatchUpdate()
{
if (_batch.Count == 0)
{
return;
}
try
{
var entityBatchXML = SerializePayload();
_repository.BatchUpdate(entityBatchXML);
_batch.Clear();
}
catch (Exception ex)
{
_logger.LogError(ex, $"Failed to insert batch; batchSize:{_batch.Count}");
}
}
private string SerializePayload()
{
using (var sw = new System.IO.StringWriter())
{
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
var serializer = new XmlSerializer(typeof(List<Entity>),
new XmlRootAttribute("ENTITY_BATCH"));
serializer.Serialize(sw, _batch, ns);
return sw.ToString();
}
}
private void OnTimedEvent(object source, ElapsedEventArgs e)
{
// take the same lock as Update() so the timer cannot flush the batch mid-add
lock (_batchPostingLock)
{
EntityBatchUpdate();
}
}
}
Currently, we take advantage of SQL Server's fast XML processing and serialize the payload into XML when calling the actual procedure, to avoid hitting the DB with a lot of calls.
I also thought about creating a table-valued parameter to pass the data we need to send to the proc, but I don't think that would drastically improve performance.
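For completeness, a table-valued parameter version would look roughly like this (a sketch only; the dbo.EntityType type, its columns, and the dbo.Entity_BatchUpdate procedure are assumptions, not part of our code):
// Assumes something like:
//   CREATE TYPE dbo.EntityType AS TABLE (Id INT, Value NVARCHAR(100));
//   CREATE PROCEDURE dbo.Entity_BatchUpdate @Entities dbo.EntityType READONLY AS ...
private void BatchUpdateWithTvp(SqlConnection connection, List<Entity> batch)
{
    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    table.Columns.Add("Value", typeof(string));
    foreach (var entity in batch)
        table.Rows.Add(entity.Id, entity.Value); // hypothetical Entity properties

    using (var command = new SqlCommand("dbo.Entity_BatchUpdate", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        var parameter = command.Parameters.AddWithValue("@Entities", table);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.EntityType";
        command.ExecuteNonQuery();
    }
}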
My question is: how did you handle a heavy load like this on your DB?
1.) Did you use a NuGet package or some other tool to handle the batching for you?
2.) Did you solve this using some other practice?
Edit:
To give a bit more insight: we are currently processing a queue manually in our app (hence the huge number of updates), and we want to do it as fast and as reliably as possible.
We will move to a better queueing mechanism in the future (RabbitMQ or Kafka), but in the meantime we want a standard approach to consuming and processing queues from a DB table.
The most confusing part is that you are doing this work from C# code in the first place. You are working on more than 1,000,000 records. That is not work you should ever be doing in application code. I do not really see what you are doing with those records, but as a general rule, keep work of that scale in the database. Always try to do as much filtering, inserting, bulk inserting, bulk updating and the like on the DB side.
Never move what could be SQL into a client program. At best you add the network cost of having to move the data over the network once or even twice (for updates). At worst you add a huge risk of race conditions. And at no point will you have a chance to beat the time the DB server needs for the same operation.
This seems to be just a bulk insert. SQL has a separate command for it, and any DBMS worth its disk space can import data from a variety of file formats, including .csv.
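If the data does have to start out in C#, SqlBulkCopy keeps the per-row round trips off the wire; a minimal sketch (the staging table name and the dataTable built from the in-memory batch are assumptions):
// Stream the batch into a staging table in one shot,
// then let a single set-based statement in the database do the real update.
using (var connection = new SqlConnection(connectionString)) // placeholder connection string
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.EntityStaging"; // hypothetical staging table
        bulkCopy.BatchSize = 10000;
        bulkCopy.WriteToServer(dataTable); // a DataTable built from the batch
    }
}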

Azure Mobile Service Offline Sync in Xamarin.Forms

I've gone through the instructions in this documentation to implement offline sync in my Xamarin.Forms client. But when I pull data using the sync table, I don't get the data currently in the cloud; when I read data using the normal table instead, I receive the data normally. I don't understand why. Here is my code to get data using the sync table:
/// <summary>
/// Initialize offline sync
/// </summary>
/// <returns></returns>
public async Task InitializeAsync()
{
if(!_client.SyncContext.IsInitialized)
{
_store.DefineTable<T>();
await _client.SyncContext.InitializeAsync(_store, new MobileServiceSyncHandler());
await SyncOfflineCacheAsync();
}
}
public async Task SyncOfflineCacheAsync()
{
try
{
Debug.WriteLine("SyncOfflineCacheAsync: Initializing...");
await InitializeAsync();
// Push the Operations Queue to the mobile backend
Debug.WriteLine("SyncOfflineCacheAsync: Pushing Changes");
await _client.SyncContext.PushAsync();
// Pull each sync table
Debug.WriteLine("SyncOfflineCacheAsync: Pulling tags table");
_table = _client.GetSyncTable<T>();
string queryName = $"incsync_{typeof(T).Name}";
await _table.PullAsync(queryName, _table.CreateQuery());
}
catch (MobileServicePushFailedException e )
{
if (e.PushResult != null)
{
foreach (var error in e.PushResult.Errors)
{
await ResolveConflictAsync(error);
}
}
}
catch (Exception)
{
throw; // rethrow without resetting the stack trace
}
}
I get no data that was previously added online.
But when I read the data without offline sync, it works fine:
var data = await baseAzureMobileService.NormalTable.ReadAsync();
Try calling PullAsync with null in place of queryName; that will force it to fetch all the records instead of trying to do an incremental sync.
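In terms of the code above, that change would be (a sketch):
// Passing null as the query name disables incremental sync,
// so every matching record is pulled down again.
_table = _client.GetSyncTable<T>();
await _table.PullAsync(null, _table.CreateQuery());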
AFAIK, the Incremental Sync request would look like this:
GET https://{your-app-name}.azurewebsites.net/tables/TodoItem?$filter=(updatedAt%20ge%20datetimeoffset'2017-11-03T06%3A56%3A44.4590000%2B00%3A00')&$orderby=updatedAt&$skip=0&$top=50&__includeDeleted=true
For Incremental Sync, the updatedAt timestamp of the results returned from your latest pull operation is stored in the __config table of your local SQLite db.
Note: The format for the value under the id column equals deltaToken|{table-name}|{query-name}.
I would recommend you capture the network traces and check the synced records in your local table to narrow down this issue. Since incremental sync optimizes the requests instead of retrieving all records each time, I would recommend you leverage this feature. If your data set is small or you do not care about bandwidth, you can simply opt out of incremental sync.

Using SQL Server application locks to solve locking requirements

I have a large application based on Dynamics CRM 2011 that in various places has code that must query for a record based upon some criteria and create it if it doesn't exist, or update it if it does.
An example of the kind of thing I am talking about would be similar to this:
stk_balance record = context.stk_balanceSet.FirstOrDefault(x => x.stk_key == id);
if(record == null)
{
record = new stk_balance();
record.Id = Guid.NewGuid();
record.stk_value = 100;
context.AddObject(record);
}
else
{
record.stk_value += 100;
context.UpdateObject(record);
}
context.SaveChanges();
In terms of CRM 2011 implementation (although not strictly relevant to this question), the code could be triggered from synchronous or asynchronous plugins. The issue is that the code is not thread safe: between checking whether the record exists and creating it if it doesn't, another thread could come in and do the same thing first, resulting in duplicate records.
Normal locking methods are not reliable due to the architecture of the system, various services using multiple threads could all be using the same code, and these multiple services are also load balanced across multiple machines.
In trying to find a solution to this problem that doesn't add massive amounts of extra complexity and doesn't compromise the idea of not having a single point of failure or a single point where a bottleneck could occur, I came across the idea of using SQL Server application locks.
I came up with the following class:
public class SQLLock : IDisposable
{
//Lock constants
private const string _lockMode = "Exclusive";
private const string _lockOwner = "Transaction";
private const string _lockDbPrincipal = "public";
//Variable for storing the connection passed to the constructor
private SqlConnection _connection;
//Variable for storing the name of the Application Lock created in SQL
private string _lockName;
//Variable for storing the timeout value of the lock
private int _lockTimeout;
//Variable for storing the SQL Transaction containing the lock
private SqlTransaction _transaction;
//Variable for storing if the lock was created ok
private bool _lockCreated = false;
public SQLLock (string lockName, int lockTimeout = 180000)
{
_connection = Connection.GetMasterDbConnection();
_lockName = lockName;
_lockTimeout = lockTimeout;
//Create the Application Lock
CreateLock();
}
public void Dispose()
{
//Release the Application Lock if it was created
if (_lockCreated)
{
ReleaseLock();
}
_connection.Close();
_connection.Dispose();
}
private void CreateLock()
{
_transaction = _connection.BeginTransaction();
using (SqlCommand createCmd = _connection.CreateCommand())
{
createCmd.Transaction = _transaction;
createCmd.CommandType = System.Data.CommandType.Text;
StringBuilder sbCreateCommand = new StringBuilder();
sbCreateCommand.AppendLine("DECLARE #res INT");
sbCreateCommand.AppendLine("EXEC #res = sp_getapplock");
sbCreateCommand.Append("#Resource = '").Append(_lockName).AppendLine("',");
sbCreateCommand.Append("#LockMode = '").Append(_lockMode).AppendLine("',");
sbCreateCommand.Append("#LockOwner = '").Append(_lockOwner).AppendLine("',");
sbCreateCommand.Append("#LockTimeout = ").Append(_lockTimeout).AppendLine(",");
sbCreateCommand.Append("#DbPrincipal = '").Append(_lockDbPrincipal).AppendLine("'");
sbCreateCommand.AppendLine("IF #res NOT IN (0, 1)");
sbCreateCommand.AppendLine("BEGIN");
sbCreateCommand.AppendLine("RAISERROR ( 'Unable to acquire Lock', 16, 1 )");
sbCreateCommand.AppendLine("END");
createCmd.CommandText = sbCreateCommand.ToString();
try
{
createCmd.ExecuteNonQuery();
_lockCreated = true;
}
catch (Exception ex)
{
_transaction.Rollback();
throw new Exception(string.Format("Unable to get SQL Application Lock on '{0}'", _lockName), ex);
}
}
}
private void ReleaseLock()
{
using (SqlCommand releaseCmd = _connection.CreateCommand())
{
releaseCmd.Transaction = _transaction;
releaseCmd.CommandType = System.Data.CommandType.StoredProcedure;
releaseCmd.CommandText = "sp_releaseapplock";
releaseCmd.Parameters.AddWithValue("#Resource", _lockName);
releaseCmd.Parameters.AddWithValue("#LockOwner", _lockOwner);
releaseCmd.Parameters.AddWithValue("#DbPrincipal", _lockDbPrincipal);
try
{
releaseCmd.ExecuteNonQuery();
}
catch {}
}
_transaction.Commit();
}
}
I would use this in my code to create a SQL Server application lock, using the unique key I am querying for as the lock name, like this:
using (var sqlLock = new SQLLock(id))
{
//Code to check for and create or update record here
}
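For reference, the same sp_getapplock call can also be made with parameters rather than by concatenating the lock name into the command text; a minimal, untested sketch of that variant (the connection, transaction, lockName and lockTimeout variables are assumed to come from the surrounding class):
//Parameterized variant of the sp_getapplock call used in CreateLock above
using (var command = connection.CreateCommand())
{
    command.Transaction = transaction;
    command.CommandType = System.Data.CommandType.StoredProcedure;
    command.CommandText = "sp_getapplock";
    command.Parameters.AddWithValue("@Resource", lockName);
    command.Parameters.AddWithValue("@LockMode", "Exclusive");
    command.Parameters.AddWithValue("@LockOwner", "Transaction");
    command.Parameters.AddWithValue("@LockTimeout", lockTimeout);
    command.Parameters.AddWithValue("@DbPrincipal", "public");
    var result = command.Parameters.Add("@Result", System.Data.SqlDbType.Int);
    result.Direction = System.Data.ParameterDirection.ReturnValue;
    command.ExecuteNonQuery();
    //sp_getapplock returns 0 or 1 on success and a negative value on failure
    if ((int)result.Value < 0)
        throw new Exception(string.Format("Unable to get SQL Application Lock on '{0}'", lockName));
}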
Now this approach seems to work; however, I am by no means a SQL Server expert and am wary about putting this anywhere near production code.
My question really has 3 parts
1. Is this a really bad idea because of something I haven't considered?
Are SQL Server application locks completely unsuitable for this purpose?
Is there a maximum number of application locks (with different names) you can have at a time?
Are there performance considerations if a potentially large number of locks are created?
What else could be an issue with the general approach?
2. Is the solution actually implemented above any good?
If SQL Server application locks are usable like this, have I actually used them properly?
Is there a better way of using SQL Server to achieve the same result?
In the code above I am getting a connection to the Master database and creating the locks in there. Does that potentially cause other issues? Should I create the locks in a different database?
3. Is there a completely alternative approach that could be used that doesn't use SQL Server application locks?
I can't use stored procedures to create and update the record (unsupported in CRM 2011).
I don't want to add a single point of failure.
You can do this much more easily.
//make sure your plugin runs within a transaction; this is the case for stage 20 and 40
//you can check this with IExecutionContext.IsInTransaction
//does not work with offline plugins, but works within CRM Online (cloud) and is fully supported
//also works on transaction rollback
var lockUpdateEntity = new dummy_lock_entity(); //simple technical entity with as many rows as different lock barriers you need
lockUpdateEntity.Id = Guid.Parse("well known guid"); //well-known guid for this barrier
lockUpdateEntity.dummy_field = Guid.NewGuid(); //just update/change a field to create a lock, regardless of its content
//--------------- this is untested by me, I use the next one
context.UpdateObject(lockUpdateEntity);
context.SaveChanges();
//---------------
//OR
//--------------- I use this one, but you need a reference to your OrganizationService
OrganizationService.Update(lockUpdateEntity);
//---------------
//threads wait here if they have no lock for dummy_lock_entity with "well known guid"
stk_balance record = context.stk_balanceSet.FirstOrDefault(x => x.stk_key == id);
if(record == null)
{
record = new stk_balance();
//record.Id = Guid.NewGuid(); //not needed
record.stk_value = 100;
context.AddObject(record);
}
else
{
record.stk_value += 100;
context.UpdateObject(record);
}
context.SaveChanges();
//let the pipeline flow and the transaction complete ...
For more background info refer to http://www.crmsoftwareblog.com/2012/01/implementing-robust-microsoft-dynamics-crm-2011-auto-numbering-using-transactions/

EF - Class that called SaveChanges()

Is it possible to get the class that called the SaveChanges() method in the EventHandler?
That's because I have an entity called Activity whose status can be changed by several parts of the system, and I need to log that change and save it in the database. In the log table I need to store the IDs of the entity that was updated or created and thus caused the activity status to change.
I think I can either do it or try the unmaintainable solution.
The unmaintainable solution would be to add some code to every part of the system that changes the activity status.
PS: I can't use database triggers.
I don't think trying to update another table as part of SaveChanges is the correct approach here; you would be coupling your logging mechanism to that particular context. What if you wanted to disable logging or switch it out to use a different type of logging, e.g. a local file?
I would update the log table along with the entity itself if the update was successful, i.e.:
var entity = ...
// update entity
if (context.SaveChanges() != 0)
{
// update log table
}
It's possible (but I would recommend against it) using a StackTrace, e.g.:
public class Test
{
public event EventHandler AnEvent;
public Test()
{
AnEvent += WhoDoneIt;
}
public void Trigger()
{
if (AnEvent != null)
AnEvent(this, EventArgs.Empty);
}
public void WhoDoneIt(object sender, EventArgs eventArgs)
{
var stack = new StackTrace();
for (var i = 0; i < stack.FrameCount; i++)
{
var frame = stack.GetFrame(i);
var method = frame.GetMethod();
Console.WriteLine("{0}:{1}.{2}", i, method.DeclaringType.FullName, method.Name);
}
}
}
public class Program
{
static void Main(string[] args)
{
var test = new Test();
test.Trigger();
Console.ReadLine();
}
}
If you look at the output of the program you can figure out which stack frame you want to look at and analyze the caller based on the Method of that frame.
HOWEVER, this can have serious performance implications: the stack trace is quite an expensive object to create, so I would really recommend keeping track of the caller in a different way. One idea could be to store the caller in a thread-static variable before calling SaveChanges and then clear it out afterwards.
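A rough sketch of that thread-static idea (the CallerContext class and the caller name are made up for illustration):
//Hypothetical helper that records who initiated the save on the current thread.
public static class CallerContext
{
    [ThreadStatic]
    public static string Caller;
}

//At the call site:
CallerContext.Caller = "ActivityStatusUpdater"; //hypothetical name of the calling class
try
{
    context.SaveChanges();
}
finally
{
    CallerContext.Caller = null; //clear it so the value cannot leak into the next operation on this thread
}

//Inside the SaveChanges override (or event handler), read CallerContext.Caller for logging.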
From your post it sounds like you're more interested in which entities are being updated than in which method called SaveChanges.
If that's the case, you can examine the pending changes and see which entities are either added or modified (or deleted if you care) and do your logging based on that information.
You would do that like this:
public override int SaveChanges()
{
// examine the entries the context is tracking before they are saved
foreach (var dbEntityEntry in ChangeTracker.Entries())
{
switch (dbEntityEntry.State)
{
case EntityState.Added:
// log your data
break;
case EntityState.Modified:
// log your data
break;
}
}
return base.SaveChanges();
}

SQL CLR make sure finally block is executed

I have a SQL Server CLR stored proc that is used to retrieve a large set of rows, then do some processing and update a count in another table.
Here's the flow:
select -> process -> update count -> mark the selected rows as processed
The nature of the process is that it should not count the same set of data twice, and the SP is called with a GUID as an argument.
So I'm keeping a list of GUIDs (in a static list in the SP) that are currently in process, and I halt execution for subsequent calls to the SP with the same argument until the one currently in process finishes.
I have code in a finally block to remove the GUID when a process finishes, but it is not working every time. There are instances (like when the user cancels the execution of the SP) where the SP exits without calling the finally block and without removing the GUID from the list, so subsequent calls keep waiting indefinitely.
Can you give me a solution to make sure my finally block is called no matter what, or any other way to make sure only one ID is in process at any given time?
Here's a sample of the code with the processing bits removed
[Microsoft.SqlServer.Server.SqlProcedure]
public static void TransformSurvey(Guid PublicationId)
{
AutoResetEvent autoEvent = null;
bool existing = false;
//check if the process is already running for the given Id
//concurrency handler holds a dictionary of publicationIds and AutoresetEvents
lock (ConcurrencyHandler.PublicationIds)
{
existing = ConcurrencyHandler.PublicationIds.TryGetValue(PublicationId, out autoEvent);
if (!existing)
{
//there's no process in progress. so OK to start
autoEvent = new AutoResetEvent(false);
ConcurrencyHandler.PublicationIds.Add(PublicationId, autoEvent);
}
}
if (existing)
{
//wait on the shared object
autoEvent.WaitOne();
lock (ConcurrencyHandler.PublicationIds)
{
ConcurrencyHandler.PublicationIds.Add(PublicationId, autoEvent); //add this again as the exiting thread has removed this from the list
}
}
try
{
// ... do the processing here..........
}
catch (Exception ex)
{
//exception handling
}
finally
{
//remove the pubid
lock (ConcurrencyHandler.PublicationIds)
{
ConcurrencyHandler.PublicationIds.Remove(PublicationId);
autoEvent.Set();
}
}
}
Wrapping the code at a higher level is a good solution; another option could be a using statement with IDisposable.
public class SQLCLRProcedure : IDisposable
{
public bool Execute(Guid guid)
{
// Do work
return true; // placeholder result so the sample compiles
}
public void Dispose()
{
// Remove GUID
// Close Connection
}
}
using (SQLCLRProcedure procedure = new SQLCLRProcedure())
{
procedure.Execute(guid);
}
This isn't verified in a compiler but it's commonly referred to as the IDisposable Pattern.
http://msdn.microsoft.com/en-us/library/system.idisposable.aspx
