We had performance issues in our app due to one-by-one updates of entities in a DB table with a high row count (more than a million). We kept getting deadlock victim errors, so it was obvious that the table was holding row locks longer than it should have.
We have since implemented manual batching of the stored procedure calls that update entities in the DB, with a configurable size limit and time threshold.
The simplified class would look something like this:
public class EntityBatchUpdater
{
private readonly IRepository _repository;
private readonly ILogger _logger; // assumed to be injected/configured elsewhere; omitted from the simplified constructor
private readonly List<Entity> _batch = new List<Entity>();
private readonly Timer _batchPostingTimer;
private readonly int _batchSize;
private static readonly object _batchPostingLock = new object();
public EntityBatchUpdater(IRepository repository)
{
_repository = repository;
_batchSize = 1000; // configurable
string batchPostingPeriod = "00:01:00.00"; // configurable
_batchPostingTimer = new Timer
{
Interval = TimeSpan.Parse(batchPostingPeriod).TotalMilliseconds,
Enabled = true,
AutoReset = true,
};
_batchPostingTimer.Elapsed += OnTimedEvent;
}
public void Update(Entity entity)
{
try
{
lock (_batchPostingLock)
{
_batch.Add(entity);
if (_batch.Count == _batchSize)
{
EntityBatchUpdate();
}
}
}
catch (Exception ex)
{
_logger.LogError(ex, $"Failed to insert batch {JsonConvert.SerializeObject(_batch)}");
}
}
private void EntityBatchUpdate()
{
if (_batch.Count == 0)
{
return;
}
try
{
var entityBatchXML = SerializePayload();
_repository.BatchUpdate(entityBatchXML);
_batch.Clear();
}
catch (Exception ex)
{
_logger.LogError(ex, $"Failed to insert batch; batchSize:{_batch.Count}");
}
}
private string SerializePayload()
{
using (var sw = new System.IO.StringWriter())
{
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
var serializer = new XmlSerializer(typeof(List<Entity>),
new XmlRootAttribute("ENTITY_BATCH"));
serializer.Serialize(sw, _batch, ns);
return sw.ToString();
}
}
private void OnTimedEvent(object source, ElapsedEventArgs e)
{
// Flush under the same lock used in Update() so a timer-triggered flush cannot race with an in-progress add.
lock (_batchPostingLock)
{
EntityBatchUpdate();
}
}
}
We currently take advantage of SQL Server's fast XML processing and serialize the payload into XML when calling the actual procedure, to avoid hitting the DB with a large number of calls.
I also thought about using a table-valued parameter to pass the data we need to send to the proc, but I don't think that would drastically improve the performance.
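For reference, a table-valued parameter version of the call would look roughly like the sketch below (the table type dbo.EntityTableType, the procedure dbo.Entity_BatchUpdate and the column layout are placeholder names, not our real schema):
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;
public static class TvpBatchUpdater
{
    // Passes the whole batch to a stored procedure as a single table-valued parameter.
    public static void Send(string connectionString, IEnumerable<(int Id, decimal Value)> batch)
    {
        // Shape the in-memory batch as a DataTable matching the table type's columns.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Value", typeof(decimal));
        foreach (var (id, value) in batch)
            table.Rows.Add(id, value);
        using var connection = new SqlConnection(connectionString);
        using var command = new SqlCommand("dbo.Entity_BatchUpdate", connection)
        {
            CommandType = CommandType.StoredProcedure
        };
        var parameter = command.Parameters.AddWithValue("@Entities", table);
        parameter.SqlDbType = SqlDbType.Structured; // marks the parameter as a TVP
        parameter.TypeName = "dbo.EntityTableType"; // must match the user-defined table type
        connection.Open();
        command.ExecuteNonQuery();
    }
}
Whether that would actually beat the XML payload is something we would have to measure; the main difference is that the proc can treat the parameter as a normal rowset without any XML shredding.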
My question is: how have you handled heavy load like this on your DB?
1.) Did you use a NuGet package or some other tool to handle the batching for you?
2.) Did you solve this using some other practice?
Edit:
To give a bit more insight: we are currently processing a queue manually in our app (hence the huge number of updates), and we want to do it as quickly and as reliably as possible.
We will move to a better queueing mechanism in the future (RabbitMQ or Kafka), but in the meantime we want to have a standard approach to consuming and processing queues from a DB table.
The most confusing part is that you are doing this work from C# code in the first place. You are working on more than 1,000,000 records; that is not work you should be doing in application code, ever. I do not really see what you are doing with those records, but as a general rule, keep work of that scale in the database. Always try to do as much filtering, inserting, bulk inserting, bulk updating and the like on the DB side.
Never move what could be SQL into a client program. At best you add the network load of having to move the data over the network once or even twice (for updates). At worst you add a huge risk of race conditions. And at no point will you have a chance to beat the time the DB server would need for the same operation.
This seems to be just a bulk insert. SQL has a separate command for it, and any DBMS worth its disk space has the option to import data from a variety of file formats, including .csv.
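If the data does originate in the application, one way to keep the heavy lifting on the DB side is to bulk-load it into a staging table and let a single set-based statement finish the job. A rough sketch with SqlBulkCopy (the staging table dbo.Entity_Staging and its columns are made-up names for illustration):
using System.Data;
using Microsoft.Data.SqlClient;
public static class StagingLoader
{
    // Bulk-loads rows into a staging table; a stored procedure or MERGE statement
    // can then apply them to the real table in one set-based operation.
    public static void BulkLoad(string connectionString, DataTable rows)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();
        using var bulkCopy = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.Entity_Staging",
            BatchSize = 10000
        };
        // Map source columns to destination columns explicitly.
        bulkCopy.ColumnMappings.Add("Id", "Id");
        bulkCopy.ColumnMappings.Add("Value", "Value");
        bulkCopy.WriteToServer(rows);
    }
}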
So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save and clear the collection so as not to run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently I'm saying, if the record count is above a certain threshold, save the data in the current collection, within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();
Parallel.ForEach(inputrecords, input =>
{
lock(StackLock)
{
if (OutRecs.Count >= 50000)
{
Save(OutRecs);
OutRecs.Clear();
}
}
OutRecs.Push(CreateOutputRecord(input));
});
if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to output collection? If not is there a better way to do this?
Your lock will work correctly, but it will not be very efficient, because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so taking a lock in every iteration of every thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
// collection of objects to iterate over
inputrecords,
// delegate to initialize thread-local data
() => new List<OutRecord>(),
// body of loop
(inputrecord, loopstate, localstorage) =>
{
localstorage.Add(CreateOutputRecord(inputrecord));
if (localstorage.Count > 1000)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
return localstorage;
},
// finally block gets executed after each thread exits
localstorage =>
{
if (localstorage.Count > 0)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
});
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name.
{
void WriteRecord(T record);
void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there are too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
private readonly int _maxCapacity;
private bool _flushing;
public BufferedRecordWriter(int maxCapacity = 100)
{
_maxCapacity = maxCapacity;
}
public void WriteRecord(T record)
{
_buffer.Enqueue(record);
if (_buffer.Count >= _maxCapacity && !_flushing)
Flush();
}
public void Flush()
{
_flushing = true;
try
{
var recordsToWrite = new List<T>();
while (_buffer.TryDequeue(out T dequeued))
{
recordsToWrite.Add(dequeued);
}
if(recordsToWrite.Any())
WriteRecords(recordsToWrite);
}
finally
{
_flushing = false;
}
}
protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches its maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue, other threads can keep adding records even while Flush is draining it.
That WriteRecords implementation could be anything specific to how you write your records. Instead of this being an abstract class, the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind, because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface, which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter: the _flushing flag avoids most concurrent executions, and it's okay if two concurrent executions both read from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.
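To make that concrete, a minimal (hypothetical) subclass of the BufferedRecordWriter<T> above might just dump records to the console; a database- or file-backed writer would differ only in WriteRecords:
using System;
using System.Collections.Generic;
public sealed class ConsoleRecordWriter<T> : BufferedRecordWriter<T>
{
    public ConsoleRecordWriter(int maxCapacity = 100) : base(maxCapacity) { }
    protected override void WriteRecords(IEnumerable<T> records)
    {
        // Stand-in for the real output (database insert, file append, ...).
        foreach (var record in records)
            Console.WriteLine(record);
    }
}
The producing code only ever sees IRecordWriter<T>: it calls WriteRecord for each record and a final Flush at the end, and never knows where the records end up.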
I have a fairly simple WPF application that uses Entity Framework. The main page of the application has a list of records that I am getting from a database on startup.
Each record has a picture, so the operation can be a little slow when the wireless signal is poor. I'd like this (and many of my SQL operations) to be performed in the background if possible. I have async/await set up, and at first it seemed to be working exactly as I wanted, but now I'm seeing that my application becomes unresponsive when accessing the DB.
Eventually I'm thinking I'm going to load up the text in one query and the images in another background operation and load them as they come in. This way I get the important stuff right away and the pictures can come in in the background, but the way things are going it's still looking like it will lock up if I do this.
On top of that, I'm trying to implement something to handle connectivity issues (in case the wifi cuts out momentarily) so that the application notifies the user of a connection issue, automatically retries a few times, etc. I put a try catch for SQL exception which seems to be working for me, but the whole application locks up for about a minute while it is trying to connect to the DB.
I tried testing my async/await using await Task.Delay() and everything is very responsive as expected while awaiting the delay, but everything locks up when awaiting the .ToListAsync(). Is this normal and expected? My understanding of async/await is pretty limited.
My code is kind of messy (I'm new), but it does what I need it to do for the most part. I understand there are probably plenty of improvements I can make and better ways to do things, but one step at a time here. My main goal right now is to keep the application from crashing during database access exceptions and to keep the user informed of what the application is doing (searching, trying to access the DB, unable to reach the DB and retrying, etc.) as opposed to being frozen, which is what they're going to think when they see it being unresponsive for over a minute.
Some of my code:
In my main view model
DataHelper data = new DataHelper();
private async void GetQualityRegisterQueueAsync()
{
try
{
var task = data.GetQualityRegisterAsync();
IsSearching = true;
await task;
IsSearching = false;
QualityRegisterItems = new ObservableCollection<QualityRegisterQueue>(task.Result);
OrderQualityRegisterItems();
}
catch (M1Exception ex)
{
Debug.WriteLine(ex.Message);
Debug.WriteLine("QualityRegisterLogViewModel.GetQualityRegisterQueue() Operation Failed");
}
}
My Data Helper Class
public class DataHelper
{
private bool debugging = false;
private const int MAX_RETRY = 2;
private const double LONG_WAIT_SECONDS = 5;
private const double SHORT_WAIT_SECONDS = 0.5;
private static readonly TimeSpan longWait = TimeSpan.FromSeconds(LONG_WAIT_SECONDS);
private static readonly TimeSpan shortWait = TimeSpan.FromSeconds(SHORT_WAIT_SECONDS);
private enum RetryableSqlErrors
{
ServerNotFound = -1,
Timeout = -2,
NoLock = 1204,
Deadlock = 1205,
}
public async Task<List<QualityRegisterQueue>> GetQualityRegisterAsync()
{
if(debugging) await Task.Delay(5000);
var retryCount = 0;
using (M1Context m1 = new M1Context())
{
for (; ; )
{
try
{
return await (from a in m1.QualityRegisters
where (a.qanClosed == 0)
//orderby a.qanAssignedDate descending, a.qanOpenedDate
orderby a.qanAssignedDate.HasValue descending, a.qanAssignedDate, a.qanOpenedDate
select new QualityRegisterQueue
{
QualityRegisterID = a.qanQualityRegisterID,
JobID = a.qanJobID.Trim(),
JobAssemblyID = a.qanJobAssemblyID,
JobOperationID = a.qanJobOperationID,
PartID = a.qanPartID.Trim(),
PartRevisionID = a.qanPartRevisionID.Trim(),
PartShortDescription = a.qanPartShortDescription.Trim(),
OpenedByEmployeeID = a.qanOpenedByEmployeeID.Trim(),
OpenedByEmployeeName = a.OpenedEmployee.lmeEmployeeName.Trim(),
OpenedDate = a.qanOpenedDate,
PartImage = a.JobAssembly.ujmaPartImage,
AssignedDate = a.qanAssignedDate,
AssignedToEmployeeID = a.qanAssignedToEmployeeID.Trim(),
AssignedToEmployeeName = a.AssignedEmployee.lmeEmployeeName.Trim()
}).ToListAsync();
}
catch (SqlException ex)
{
Debug.WriteLine("SQL Exception number = " + ex.Number);
if (!Enum.IsDefined(typeof(RetryableSqlErrors), ex.Number))
throw new M1Exception(ex.Message, ex);
retryCount++;
if (retryCount > MAX_RETRY) throw new M1Exception(ex.Message, ex);
Debug.WriteLine("Retrying. Count = " + retryCount);
Thread.Sleep(ex.Number == (int)RetryableSqlErrors.Timeout ?
longWait : shortWait);
}
}
}
}
}
Edit: Mostly looking for general guidance here, though a specific example of what to do would be great. For these types of operations where I am downloading data, is it just a given that if I need the application to be responsive I need to be making multiple threads? Is that a common solution to this type of problem? Is this not something I should be expecting async/await to solve?
If you call this method from your UI thread, each awaited continuation captures the UI synchronization context and has to be marshalled back onto it. Your service will also not necessarily be performant, because it must wait until the UI thread is free before it can continue.
The solution is simple: add ConfigureAwait(false) to the awaited call.
.ToListAsync().ConfigureAwait(false);
I hope it helps
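Applied to the GetQualityRegisterAsync method above, that simply means appending ConfigureAwait(false) to the awaited query; sketched here with the projection abbreviated:
// Tail of the existing LINQ query in DataHelper.GetQualityRegisterAsync.
// ConfigureAwait(false) lets the continuation resume on a thread-pool thread
// instead of waiting for the captured UI context.
return await (from a in m1.QualityRegisters
              where a.qanClosed == 0
              orderby a.qanAssignedDate.HasValue descending, a.qanAssignedDate, a.qanOpenedDate
              select new QualityRegisterQueue
              {
                  QualityRegisterID = a.qanQualityRegisterID,
                  // ... remaining projections exactly as in the original query ...
              })
              .ToListAsync()
              .ConfigureAwait(false);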
I have a large application based on Dynamics CRM 2011 that in various places has code that must query for a record based upon some criteria and create it if it doesn't exist else update it.
An example of the kind of thing I am talking about would be similar to this:
stk_balance record = context.stk_balanceSet.FirstOrDefault(x => x.stk_key == id);
if(record == null)
{
record = new stk_balance();
record.Id = Guid.NewGuid();
record.stk_value = 100;
context.AddObject(record);
}
else
{
record.stk_value += 100;
context.UpdateObject(record);
}
context.SaveChanges();
In terms of CRM 2011 implementation (although not strictly relevant to this question) the code could be triggered from synchronous or asynchronous plugins. The issue is that the code is not thread safe, between checking if the record exists and creating it if it doesn't, another thread could come in and do the same thing first resulting in duplicate records.
Normal locking methods are not reliable due to the architecture of the system, various services using multiple threads could all be using the same code, and these multiple services are also load balanced across multiple machines.
In trying to find a solution to this problem that doesn't add massive amounts of extra complexity and doesn't compromise the idea of not having a single point of failure or a single point where a bottleneck could occur I came across the idea of using SQL Server application locks.
I came up with the following class:
public class SQLLock : IDisposable
{
//Lock constants
private const string _lockMode = "Exclusive";
private const string _lockOwner = "Transaction";
private const string _lockDbPrincipal = "public";
//Variable for storing the connection passed to the constructor
private SqlConnection _connection;
//Variable for storing the name of the Application Lock created in SQL
private string _lockName;
//Variable for storing the timeout value of the lock
private int _lockTimeout;
//Variable for storing the SQL Transaction containing the lock
private SqlTransaction _transaction;
//Variable for storing if the lock was created ok
private bool _lockCreated = false;
public SQLLock (string lockName, int lockTimeout = 180000)
{
_connection = Connection.GetMasterDbConnection();
_lockName = lockName;
_lockTimeout = lockTimeout;
//Create the Application Lock
CreateLock();
}
public void Dispose()
{
//Release the Application Lock if it was created
if (_lockCreated)
{
ReleaseLock();
}
_connection.Close();
_connection.Dispose();
}
private void CreateLock()
{
_transaction = _connection.BeginTransaction();
using (SqlCommand createCmd = _connection.CreateCommand())
{
createCmd.Transaction = _transaction;
createCmd.CommandType = System.Data.CommandType.Text;
StringBuilder sbCreateCommand = new StringBuilder();
sbCreateCommand.AppendLine("DECLARE @res INT");
sbCreateCommand.AppendLine("EXEC @res = sp_getapplock");
sbCreateCommand.Append("@Resource = '").Append(_lockName).AppendLine("',");
sbCreateCommand.Append("@LockMode = '").Append(_lockMode).AppendLine("',");
sbCreateCommand.Append("@LockOwner = '").Append(_lockOwner).AppendLine("',");
sbCreateCommand.Append("@LockTimeout = ").Append(_lockTimeout).AppendLine(",");
sbCreateCommand.Append("@DbPrincipal = '").Append(_lockDbPrincipal).AppendLine("'");
sbCreateCommand.AppendLine("IF @res NOT IN (0, 1)");
sbCreateCommand.AppendLine("BEGIN");
sbCreateCommand.AppendLine("RAISERROR ( 'Unable to acquire Lock', 16, 1 )");
sbCreateCommand.AppendLine("END");
createCmd.CommandText = sbCreateCommand.ToString();
try
{
createCmd.ExecuteNonQuery();
_lockCreated = true;
}
catch (Exception ex)
{
_transaction.Rollback();
throw new Exception(string.Format("Unable to get SQL Application Lock on '{0}'", _lockName), ex);
}
}
}
private void ReleaseLock()
{
using (SqlCommand releaseCmd = _connection.CreateCommand())
{
releaseCmd.Transaction = _transaction;
releaseCmd.CommandType = System.Data.CommandType.StoredProcedure;
releaseCmd.CommandText = "sp_releaseapplock";
releaseCmd.Parameters.AddWithValue("@Resource", _lockName);
releaseCmd.Parameters.AddWithValue("@LockOwner", _lockOwner);
releaseCmd.Parameters.AddWithValue("@DbPrincipal", _lockDbPrincipal);
try
{
releaseCmd.ExecuteNonQuery();
}
catch {}
}
_transaction.Commit();
}
}
I would use this in my code to create a SQL Server application lock using the unique key I am querying for as the lock name like this
using (var sqlLock = new SQLLock(id))
{
//Code to check for and create or update record here
}
Now this approach seems to work, however I am by no means any kind of SQL Server expert and am wary about putting this anywhere near production code.
My question really has 3 parts
1. Is this a really bad idea because of something I haven't considered?
Are SQL Server application locks completely unsuitable for this purpose?
Is there a maximum number of application locks (with different names) you can have at a time?
Are there performance considerations if a potentially large number of locks are created?
What else could be an issue with the general approach?
2. Is the solution actually implemented above any good?
If SQL Server application locks are usable like this, have I actually used them properly?
Is there a better way of using SQL Server to achieve the same result?
In the code above I am getting a connection to the Master database and creating the locks in there. Does that potentially cause other issues? Should I create the locks in a different database?
3. Is there a completely alternative approach that could be used that doesn't use SQL Server application locks?
I can't use stored procedures to create and update the record (unsupported in CRM 2011).
I don't want to add a single point of failure.
You can do this much easier.
//make sure your plugin runs within a transaction; this is the case for stage 20 and 40
//you can check this with IExecutionContext.IsInTransaction
//does not work with offline plugins, but works within CRM Online (Cloud) and is fully supported
//also works on transaction rollback
var lockUpdateEntity = new dummy_lock_entity(); //simple technical entity with as many rows as different lock barriers you need
lockUpdateEntity.Id = Guid.Parse("well known guid"); //well known guid for this barrier
lockUpdateEntity.dummy_field=Guid.NewGuid(); //just update/change a field to create a lock, no matter of its content
//--------------- this is untested by me, i use the next one
context.UpdateObject(lockUpdateEntity);
context.SaveChanges();
//---------------
//OR
//--------------- i use this one, but you need a reference to your OrganizationService
OrganizationService.Update(lockUpdateEntity);
//---------------
//threads wait here if they have no lock for dummy_lock_entity with "well known guid"
stk_balance record = context.stk_balanceSet.FirstOrDefault(x => x.stk_key == id);
if(record == null)
{
record = new stk_balance();
//record.Id = Guid.NewGuid(); //not needed
record.stk_value = 100;
context.AddObject(record);
}
else
{
record.stk_value += 100;
context.UpdateObject(record);
}
context.SaveChanges();
//let the pipeline flow and the transaction complete ...
For more background info refer to http://www.crmsoftwareblog.com/2012/01/implementing-robust-microsoft-dynamics-crm-2011-auto-numbering-using-transactions/
Will parallelism help with performance for a locked object, should it be run single threaded, or is there another technique?
I noticed that when accessing a DataSet and adding rows from multiple threads, exceptions were thrown. Therefore I created a "thread-safe" version that adds rows by locking the table prior to updating the row. This implementation works but appears slow with many transactions.
public partial class HaMmeRffl
{
public partial class PlayerStatsDataTable
{
public void AddPlayerStatsRow(int PlayerID, int Year, int StatEnum, int Value, DateTime Timestamp)
{
lock (TeamMemberData.Dataset.PlayerStats)
{
HaMmeRffl.PlayerStatsRow testrow = TeamMemberData.Dataset.PlayerStats.FindByPlayerIDYearStatEnum(PlayerID, Year, StatEnum);
if (testrow == null)
{
HaMmeRffl.PlayerStatsRow newRow = TeamMemberData.Dataset.PlayerStats.NewPlayerStatsRow();
newRow.PlayerID = PlayerID;
newRow.Year = Year;
newRow.StatEnum = StatEnum;
newRow.Value = Value;
newRow.Timestamp = Timestamp;
TeamMemberData.Dataset.PlayerStats.AddPlayerStatsRow(newRow);
}
else
{
testrow.Value = Value;
testrow.Timestamp = Timestamp;
}
}
}
}
}
Now I can call this safely from multiple threads, but does it actually buy me anything? Can I do this differently for better performance? For instance, is there any way to use the System.Collections.Concurrent namespace to optimize performance, or any other methods?
In addition, I update the underlying database after the entire dataset is updated, and that takes a very long time. Would that be considered an I/O operation, and would it be worth using parallel processing by updating the database after each row (or some number of rows) is updated in the dataset?
UPDATE
I wrote some code to test concurrent vs. sequential processing, and it shows that the concurrent version takes about 30% longer, so I should use sequential processing here. I assume this is because the lock on the dataset causes the overhead of the ConcurrentQueue to be more costly than the gains from parallel processing. Is this conclusion correct, and is there anything I can do to speed up the processing, or am I stuck because, for a DataTable, "you must synchronize any write operations"?
Here is my test code, which might not be scientifically rigorous. Here are the timers and the calls between them.
dbTimer.Restart();
Queue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRow = InsertToPlayerQ(addUpdatePlayers);
Queue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRow = InsertToPlayerStatQ(addUpdatePlayers);
UpdatePlayerStatsInDB(addPlayerRow, addPlayerStatRow);
dbTimer.Stop();
System.Diagnostics.Debug.Print("Writing to the dataset took {0} seconds single threaded", dbTimer.Elapsed.TotalSeconds);
dbTimer.Restart();
ConcurrentQueue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows = InsertToPlayerQueue(addUpdatePlayers);
ConcurrentQueue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows = InsertToPlayerStatQueue(addUpdatePlayers);
UpdatePlayerStatsInDB(addPlayerRows, addPlayerStatRows);
dbTimer.Stop();
System.Diagnostics.Debug.Print("Writing to the dataset took {0} seconds concurrently", dbTimer.Elapsed.TotalSeconds);
In both examples I add to the Queue and the ConcurrentQueue in an identical, single-threaded manner. The only difference is the insertion into the DataTable. The single-threaded approach inserts as follows:
private static void UpdatePlayerStatsInDB(Queue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows, Queue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows)
{
try
{
HaMmeRffl.PlayersRow.PlayerValue row;
while (addPlayerRows.Count > 0)
{
row = addPlayerRows.Dequeue();
TeamMemberData.Dataset.Players.AddPlayersRow(
row.PlayerID, row.Name, row.PosEnum, row.DepthEnum,
row.TeamID, row.RosterTimestamp, row.DepthTimestamp,
row.Active, row.NewsUpdate);
}
}
catch (Exception)
{
TeamMemberData.Dataset.Players.RejectChanges();
}
try
{
HaMmeRffl.PlayerStatsRow.PlayerStatValue row;
while (addPlayerStatRows.Count > 0)
{
row = addPlayerStatRows.Dequeue();
TeamMemberData.Dataset.PlayerStats.AddUpdatePlayerStatsRow(
row.PlayerID, row.Year, row.StatEnum, row.Value, row.Timestamp);
}
}
catch (Exception)
{
TeamMemberData.Dataset.PlayerStats.RejectChanges();
}
TeamMemberData.Dataset.Players.AcceptChanges();
TeamMemberData.Dataset.PlayerStats.AcceptChanges();
}
The concurrent version adds as follows:
private static void UpdatePlayerStatsInDB(ConcurrentQueue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows, ConcurrentQueue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows)
{
Action actionPlayer = () =>
{
HaMmeRffl.PlayersRow.PlayerValue row;
while (addPlayerRows.TryDequeue(out row))
{
TeamMemberData.Dataset.Players.AddPlayersRow(
row.PlayerID, row.Name, row.PosEnum, row.DepthEnum,
row.TeamID, row.RosterTimestamp, row.DepthTimestamp,
row.Active, row.NewsUpdate);
}
};
Action actionPlayerStat = () =>
{
HaMmeRffl.PlayerStatsRow.PlayerStatValue row;
while (addPlayerStatRows.TryDequeue(out row))
{
TeamMemberData.Dataset.PlayerStats.AddUpdatePlayerStatsRow(
row.PlayerID, row.Year, row.StatEnum, row.Value, row.Timestamp);
}
};
Action[] actions = new Action[Environment.ProcessorCount * 2];
for (int i = 0; i < Environment.ProcessorCount; i++)
{
actions[i * 2] = actionPlayer;
actions[i * 2 + 1] = actionPlayerStat;
}
try
{
// Start ProcessorCount concurrent consuming actions.
Parallel.Invoke(actions);
}
catch (Exception)
{
TeamMemberData.Dataset.Players.RejectChanges();
TeamMemberData.Dataset.PlayerStats.RejectChanges();
}
TeamMemberData.Dataset.Players.AcceptChanges();
TeamMemberData.Dataset.PlayerStats.AcceptChanges();
}
The difference in time is 4.6 seconds for the single-threaded version and 6.1 seconds for the Parallel.Invoke version.
Locks and transactions are not good for parallelism and performance.
1) Try to avoid locks: will different threads need to update the same row in the dataset?
2) Minimize the time spent holding the lock.
For the DB operation you may want to try the batch update feature of ADO.NET: http://msdn.microsoft.com/en-us/library/ms810297.aspx
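As a rough sketch of what that looks like with SqlDataAdapter.UpdateBatchSize (the connection string and the PlayerStats table/column names mirror the DataSet above, but the actual schema is an assumption):
using System.Data;
using Microsoft.Data.SqlClient;
public static class PlayerStatsBatchUpdater
{
    // Pushes the modified rows of the DataTable to the database in batches
    // instead of one round trip per row.
    public static void SaveChanges(string connectionString, DataTable playerStats)
    {
        using var connection = new SqlConnection(connectionString);
        using var adapter = new SqlDataAdapter();
        // Explicit update command; parameters are bound to DataTable columns.
        var update = new SqlCommand(
            "UPDATE PlayerStats SET Value = @Value, Timestamp = @Timestamp " +
            "WHERE PlayerID = @PlayerID AND Year = @Year AND StatEnum = @StatEnum", connection);
        update.Parameters.Add("@Value", SqlDbType.Int).SourceColumn = "Value";
        update.Parameters.Add("@Timestamp", SqlDbType.DateTime).SourceColumn = "Timestamp";
        update.Parameters.Add("@PlayerID", SqlDbType.Int).SourceColumn = "PlayerID";
        update.Parameters.Add("@Year", SqlDbType.Int).SourceColumn = "Year";
        update.Parameters.Add("@StatEnum", SqlDbType.Int).SourceColumn = "StatEnum";
        // Batched execution requires UpdatedRowSource = None on the command.
        update.UpdatedRowSource = UpdateRowSource.None;
        adapter.UpdateCommand = update;
        adapter.UpdateBatchSize = 500; // 0 = as large as the server allows, 1 = no batching
        adapter.Update(playerStats); // sends modified rows in batches (added/deleted rows would also need Insert/DeleteCommand)
    }
}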
Multithreading can help up to an extent, because once the data crosses your app boundary you end up waiting on I/O; there you can do asynchronous processing, because your app does not have control over various parameters (resource access, network speed, etc.), and this will give a better user experience (if it is a UI app).
Now for your scenario, you may want to use some sort of producer/consumer queue: as soon as a row is available in the queue, a different thread starts processing it. But again, this will only help up to a point.
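A minimal producer/consumer sketch with BlockingCollection<T> (the names and the console output are just stand-ins for the real row-processing or DB work):
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
public static class ProducerConsumerDemo
{
    public static void Run()
    {
        // Bounded queue: Add blocks when it is full, which throttles producers to the consumer's pace.
        using var queue = new BlockingCollection<int>(boundedCapacity: 10000);
        // Single consumer drains the queue and does the slow work off the producers' path.
        var consumer = Task.Run(() =>
        {
            foreach (var item in queue.GetConsumingEnumerable())
            {
                Console.WriteLine($"processed {item}"); // stand-in for the real dataset/database write
            }
        });
        // Producers can run in parallel and only pay the cost of enqueueing.
        Parallel.For(0, 100000, i => queue.Add(i));
        queue.CompleteAdding(); // signal that no more items are coming
        consumer.Wait(); // let the consumer drain the rest
    }
}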
There's already a question about this issue, but it's not telling me what I need to know:
Let's assume I have a web application, and there's a lot of logging on every round trip. I don't want to open a debate about why there's so much logging, or how I can do fewer logging operations. I want to know what options I have to make this logging performant and clean.
So far, I've implemented declarative (attribute-based) and imperative logging, which seems to be a cool and clean way of doing it. Now, what can I do about performance, assuming those logs can take longer than expected? Is it OK to open a thread and hand that work off to it?
Things I would consider:
Use an efficient file format to minimise the quantity of data to be written (e.g. XML and text formats are easy to read but usually terribly inefficient - the same information can be stored in a binary format in a much smaller space). But don't spend lots of CPU time trying to pack data "optimally". Just go for a simple format that is compact but fast to write.
Test use of compression on the logs. This may not pay off with a fast SSD, but in most I/O situations the overhead of compressing data is less than the I/O overhead, so compression gives a net gain (although it is a compromise - raising CPU usage to lower I/O usage).
Only log useful information. No matter how important you think everything is, it's likely you can find something to cut out.
Eliminate repeated data. e.g. Are you logging a client's IP address or domain name repeatedly? Can these be reported once for a session and then not repeated? Or can you store them in a map file and use a compact index value whenever you need to reference them? etc
Test whether buffering the logged data in RAM helps improve performance (e.g. writing a thousand 20-byte log records will mean 1,000 function calls and could cause a lot of disk seeking and other write overheads, while writing a single 20,000-byte block in one burst means only one function call and could give a significant performance increase and maximise the burst rate you get out to disk). Often writing blocks in sizes like 4k, 16k, 32k, 64k works well, as it tends to fit the disk and I/O architecture (but check your specific architecture for clues about what sizes might improve efficiency). The downside of a RAM buffer is that if there is a power outage you will lose more data, so you may have to balance performance against robustness.
(Especially if you are buffering...) Dump the information to an in-memory data structure and pass it to another thread to stream it out to disk. This will help stop your primary thread being held up by log I/O. Take care with threads though - for example, you may have to consider how you will deal with times when you are creating data faster than it can be logged for short bursts - do you need to implement a queue, etc?
Are you logging multiple streams? Can these be multiplexed into a single log to possibly reduce disk seeking and the number of open files?
Is there a hardware solution that will give a large bang for your buck? e.g. Have you used SSD or RAID disks? Will dumping the data to a different server help or hinder? It may not always make much sense to spend $10,000 of developer time making something perform better if you can spend $500 to simply upgrade the disk.
I use the code below to log. It is a singleton that accepts log messages and puts each one into a ConcurrentQueue. Every two seconds it writes everything that has come in out to disk. Your app is now only delayed by the time it takes to enqueue each message. It's my own code; feel free to use it.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Windows.Forms;
namespace FastLibrary
{
public enum Severity : byte
{
Info = 0,
Error = 1,
Debug = 2
}
public class Log
{
private struct LogMsg
{
public DateTime ReportedOn;
public string Message;
public Severity Seriousness;
}
// Nice and Threadsafe Singleton Instance
private static Log _instance;
public static Log File
{
get { return _instance; }
}
static Log()
{
_instance = new Log();
_instance.Message("Started");
_instance.Start("");
}
~Log()
{
Exit();
}
public static void Exit()
{
if (_instance != null)
{
_instance.Message("Stopped");
_instance.Stop();
_instance = null;
}
}
private ConcurrentQueue<LogMsg> _queue = new ConcurrentQueue<LogMsg>();
private Thread _thread;
private string _logFileName;
private volatile bool _isRunning;
public void Message(string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = DateTime.Now, Message = msg, Seriousness = Severity.Info });
}
public void Message(DateTime time, string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = time, Message = msg, Seriousness = Severity.Info });
}
public void Message(Severity seriousness, string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = DateTime.Now, Message = msg, Seriousness = seriousness });
}
public void Message(DateTime time, Severity seriousness, string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = time, Message = msg, Seriousness = seriousness });
}
private void Start(string fileName = "", bool oneLogPerProcess = false)
{
_isRunning = true;
// Unique FileName with date in it. And ProcessId so the same process running twice will log to different files
string lp = oneLogPerProcess ? "_" + System.Diagnostics.Process.GetCurrentProcess().Id : "";
_logFileName = fileName == ""
? DateTime.Now.Year.ToString("0000") + DateTime.Now.Month.ToString("00") +
DateTime.Now.Day.ToString("00") + lp + "_" +
System.IO.Path.GetFileNameWithoutExtension(Application.ExecutablePath) + ".log"
: fileName;
_thread = new Thread(LogProcessor);
_thread.IsBackground = true;
_thread.Start();
}
public void Flush()
{
EmptyQueue();
}
private void EmptyQueue()
{
while (_queue.Any())
{
var strList = new List<string>();
//
try
{
// Block concurrent writing to file due to flush commands from other context
lock (_queue)
{
LogMsg l;
while (_queue.TryDequeue(out l)) strList.Add(l.ReportedOn.ToLongTimeString() + "|" + l.Seriousness + "|" + l.Message);
if (strList.Count > 0)
{
System.IO.File.AppendAllLines(_logFileName, strList);
strList.Clear();
}
}
}
catch
{
//ignore errors on errorlogging ;-)
}
}
}
public void LogProcessor()
{
while (_isRunning)
{
EmptyQueue();
// Sleep while running so we write in efficient blocks
if (_isRunning) Thread.Sleep(2000);
else break;
}
}
private void Stop()
{
// This is never called in the singleton.
// But we made it a background thread so all will be killed anyway
_isRunning = false;
if (_thread != null)
{
_thread.Join(5000);
_thread.Abort();
_thread = null;
}
}
}
}
Check whether the logger is debug-enabled before calling logger.Debug; this means your code does not have to evaluate the message string when debug is turned off.
if (_logger.IsDebugEnabled) _logger.Debug($"slow old string {this.foo} {this.bar}");