Say I have several List properties. Something like this:
List<CustomerTypes> CustomerTypes {get; set;}
List<FormatTypes> FormatTypes {get; set;}
List<WidgetTypes> WidgetTypes {get; set;}
List<PriceList> PriceList {get; set;}
Because these values update very rarely, I am caching them in my WCF Service at startup. I then have a service operation that can be called to refresh them.
The service operation will query them all from the database something like this:
// Get the data from the database.
var customerTypes = dbContext.GetCustomerTypes();
var formatTypes = dbContext.GetFormatTypes();
var widgetTypes = dbContext.GetWidgetTypes();
var priceList = dbContext.GetPriceList();
// Update the references
CustomerTypes = customerTypes;
FormatTypes = formatTypes;
WidgetTypes = widgetTypes;
PriceList = priceList;
This results in very little time during which these are not all in sync. However, they are not fully thread safe. (A call could access a new CustomerType and an old PriceList.)
How can I make it so that while I am updating the references, any use of these lists has to wait until all references have been updated?
First, put all of those lists into a single container class.
class TypeLists
{
public List<CustomerTypes> CustomerTypes { get; set; }
public List<FormatTypes> FormatTypes { get; set; }
public List<WidgetTypes> WidgetTypes { get; set; }
public List<PriceList> PriceList { get; set; }
}
Then replace the old property accesses with a function call.
private readonly object _typeListsLookupLock = new object();
private volatile TypeLists _typeLists;
private DateTime _typeListAge; // volatile is not valid on DateTime; a slightly stale read here only risks one extra refresh
private static readonly TimeSpan MaxCacheAge = TimeSpan.FromMinutes(30); // example expiry - pick whatever suits your data
public TypeLists GetTypeList()
{
if(_typeLists == null || DateTime.UtcNow - _typeListAge > MaxCacheAge)
{
//The assignment of _typeLists is thread safe, this lock is only to
//prevent multiple concurrent database lookups. If you don't care that
//two threads could call GetNewTypeList() at the same time you can remove
//the lock and inner if check.
lock(_typeListsLookupLock)
{
//Check to see if, while we were waiting to enter the lock, someone else
//updated the lists, making the call to the database unnecessary.
if(_typeLists == null || DateTime.UtcNow - _typeListAge > MaxCacheAge)
{
_typeLists = GetNewTypeList();
_typeListAge = DateTime.UtcNow;
}
}
}
return _typeLists;
}
private TypeLists GetNewTypeList()
{
var container = new TypeLists();
using(var dbContext = GetContext())
{
container.CustomerTypes = dbContext.GetCustomerTypes();
container.FormatTypes = dbContext.GetFormatTypes();
container.WidgetTypes = dbContext.GetWidgetTypes();
container.PriceList = dbContext.GetPriceList();
}
return container;
}
The reason we change from a property to a function is this: if you did
SomeFunction(myClass.TypeLists.PriceList, myClass.TypeLists.FormatTypes);
you could have TypeLists swapped out from under you in a multi-threaded environment. However, if you do
var typeLists = myClass.GetTypeList();
SomeFunction(typeLists.PriceList, typeLists.FormatTypes);
that typeLists object is not mutated between threads, so you do not need to worry about its value changing out from under you. You could keep the property and do var typeLists = myClass.TypeLists, but making it a function makes it clearer that you could get different results between calls.
If you want to be fancy, you can change GetTypeList() so it uses a MemoryCache to detect when it should expire the object and get a new one, along the lines of the sketch below.
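A minimal sketch of that idea, assuming System.Runtime.Caching is available and reusing the GetNewTypeList(), MaxCacheAge and _typeListsLookupLock members from above; the cache key name is arbitrary:
public TypeLists GetTypeList()
{
// Let MemoryCache track expiration instead of keeping our own timestamp.
var cached = (TypeLists)MemoryCache.Default.Get("TypeLists");
if (cached != null)
{
return cached;
}
lock (_typeListsLookupLock)
{
// Re-check inside the lock so only one thread hits the database.
cached = (TypeLists)MemoryCache.Default.Get("TypeLists");
if (cached == null)
{
cached = GetNewTypeList();
MemoryCache.Default.Set("TypeLists", cached,
new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.UtcNow + MaxCacheAge });
}
}
return cached;
}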
I thought it would be fun to put something together as an example. This answer is based on guidance from Marc Gravell's answer here.
The following class accepts a milliseconds value and provides an
event to notify the caller that the refresh interval has been hit.
It uses Environment.TickCount which is orders of magnitude faster
than using DateTime objects.
The double-checked lock prevents multiple threads from refreshing
concurrently and benefits from the reduced overhead of avoiding the
lock on every call.
Refreshing the data on the ThreadPool using Task.Run() allows the
caller to continue uninterrupted with the existing cached data.
using System;
using System.Threading.Tasks;
namespace RefreshTest {
public delegate void RefreshCallback();
public class RefreshInterval {
private readonly object _syncRoot = new Object();
private readonly long _interval;
private long _lastRefresh;
private bool _updating;
public event RefreshCallback RefreshData = () => { };
public RefreshInterval(long interval) {
_interval = interval;
}
public void Refresh() {
if (Environment.TickCount - _lastRefresh < _interval || _updating) {
return;
}
lock (_syncRoot) {
if (Environment.TickCount - _lastRefresh < _interval || _updating) {
return;
}
_updating = true;
Task.Run(() => LoadData());
}
}
private void LoadData() {
try {
RefreshData();
_lastRefresh = Environment.TickCount;
}
catch (Exception e) {
//handle appropriately
}
finally {
_updating = false;
}
}
}
}
Interlocked provides a fast, atomic replacement of the cached data.
using System.Collections.Generic;
using System.Threading;
namespace RefreshTest {
internal static class ContextCache {
private static readonly RefreshInterval _refresher = new RefreshInterval(60000);
private static List<int> _customerTypes = new List<int>();
static ContextCache() {
_refresher.RefreshData += RefreshData;
}
internal static List<int> CustomerTypes {
get {
_refresher.Refresh();
return _customerTypes;
}
}
private static void RefreshData() {
List<int> customerTypes = new List<int>(); //dbContext.GetCustomerTypes();
Interlocked.Exchange(ref _customerTypes, customerTypes);
}
}
}
Several million concurrent calls run in ~100 ms (run your own tests though!):
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
namespace RefreshTest {
internal class Program {
private static void Main(string[] args) {
Stopwatch watch = new Stopwatch();
watch.Start();
List<Task> tasks = new List<Task>();
for (int i = 0; i < Environment.ProcessorCount; i++) {
Task task = Task.Run(() => Test());
tasks.Add(task);
}
tasks.ForEach(x => x.Wait());
Console.WriteLine("Elapsed Milliseconds: {0}", watch.ElapsedMilliseconds);
Console.ReadKey();
}
private static void Test() {
for (int i = 0; i < 1000000; i++) {
var a = ContextCache.CustomerTypes;
}
}
}
}
Hope that helps.
If you have a simple scenario, maybe you can use a HACK.
Programmatically edit your web.config (it doesn't matter what you edit; you can invent a counter, or flip some invented appSetting from 0 to 1 and back).
Look here for an example.
This will allow all existing requests to finish, and then it will restart your app domain inside IIS.
At the start of the new app domain, the data from the db will be reloaded into your lists.
Be WARNED that you'll also experience a delay when the new app domain starts (on the 1st request, while the IL is jitted again) and you will also lose your data in Session, Application, etc.
The advantage is that while running you don't have any performance hit from locking.
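Purely as an illustration of that hack (the appSettings key name "RecycleCounter" is invented here), bumping a counter with the System.Web.Configuration API could look like this:
using System.Web.Configuration;
// Sketch only: increment an invented appSetting so IIS recycles the app domain.
var config = WebConfigurationManager.OpenWebConfiguration("~");
var setting = config.AppSettings.Settings["RecycleCounter"];
setting.Value = (int.Parse(setting.Value) + 1).ToString();
config.Save(); // saving web.config is what triggers the app domain restart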
Related
I have the below requirement in my C# Windows Service.
At the start of the service, it fetches a collection of data from the db
and keeps it in memory.
I have business logic to be executed periodically from 3 different threads.
Each thread will execute the same business logic with a different subset of data from the collection mentioned in step 1, and each thread will produce a different result set.
All 3 threads will run periodically if any change happens to the data collection.
When any client makes call to the service, service should be able to return the status of the thread execution.
I know C# has different mechanisms to implement periodic thread execution:
Timers, Threads with Sleep, EventWaitHandle, etc.
I am trying to understand which threading mechanism or design pattern will best fit this requirement.
A more modern approach would be to use tasks, but have a look at the principles:
namespace Test {
public class Program {
public static void Main() {
System.Threading.Thread main = new System.Threading.Thread(() => new Processor().Startup());
main.IsBackground = false;
main.Start();
System.Console.ReadKey();
}
}
public class ProcessResult { /* add your result state */ }
public class ProcessState {
public ProcessResult ProcessResult1 { get; set; }
public ProcessResult ProcessResult2 { get; set; }
public ProcessResult ProcessResult3 { get; set; }
public string State { get; set; }
}
public class Processor {
private readonly object _Lock = new object();
private readonly DataFetcher _DataFetcher;
private ProcessState _ProcessState;
public Processor() {
_DataFetcher = new DataFetcher();
_ProcessState = null;
}
public void Startup() {
_DataFetcher.DataChanged += DataFetcher_DataChanged;
}
private void DataFetcher_DataChanged(object sender, DataEventArgs args) => StartProcessingThreads(args.Data);
private void StartProcessingThreads(string data) {
lock (_Lock) {
_ProcessState = new ProcessState() { State = "Starting", ProcessResult1 = null, ProcessResult2 = null, ProcessResult3 = null };
System.Threading.Thread one = new System.Threading.Thread(() => DoProcess1(data)); // manipulate the data to a subset
one.IsBackground = true;
one.Start();
System.Threading.Thread two = new System.Threading.Thread(() => DoProcess2(data)); // manipulate the data to a subset
two.IsBackground = true;
two.Start();
System.Threading.Thread three = new System.Threading.Thread(() => DoProcess3(data)); // manipulate the data to a subset
three.IsBackground = true;
three.Start();
}
}
public ProcessState GetState() => _ProcessState;
private void DoProcess1(string dataSubset) {
// do work
ProcessResult result = new ProcessResult(); // this object contains the result
// on completion
lock (_Lock) {
_ProcessState = new ProcessState() { State = (_ProcessState.State ?? string.Empty) + ", 1 done", ProcessResult1 = result, ProcessResult2 = _ProcessState?.ProcessResult2, ProcessResult3 = _ProcessState?.ProcessResult3 };
}
}
private void DoProcess2(string dataSubset) {
// do work
ProcessResult result = new ProcessResult(); // this object contains the result
// on completion
lock (_Lock) {
_ProcessState = new ProcessState() { State = (_ProcessState.State ?? string.Empty) + ", 2 done", ProcessResult1 = _ProcessState?.ProcessResult1 , ProcessResult2 = result, ProcessResult3 = _ProcessState?.ProcessResult3 };
}
}
private void DoProcess3(string dataSubset) {
// do work
ProcessResult result = new ProcessResult(); // this object contains the result
// on completion
lock (_Lock) {
_ProcessState = new ProcessState() { State = (_ProcessState.State ?? string.Empty) + ", 3 done", ProcessResult1 = _ProcessState?.ProcessResult1, ProcessResult2 = _ProcessState?.ProcessResult2, ProcessResult3 = result };
}
}
}
public class DataEventArgs : System.EventArgs {
// data here is string, but could be anything -- just think of thread safety when accessing from the 3 processors
private readonly string _Data;
public DataEventArgs(string data) {
_Data = data;
}
public string Data => _Data;
}
public class DataFetcher {
// watch for data changes and fire when data has changed
public event System.EventHandler<DataEventArgs> DataChanged;
}
}
The simplest solution would be to define the scheduled logic in Task Method() style and execute the methods using Task.Run(), while in the main thread you just wait for execution to finish using Task.WaitAny(). When a task has finished, you call Task.WaitAny again, but in place of the finished task you pass Task.Delay(timeUntilNextSchedule).
This way the tasks are not blocking the main thread, and you avoid spinning the CPU just to wait. In general, you can avoid managing threads directly in modern .NET. A rough sketch of that loop follows.
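This sketch only illustrates the idea; the job delegates, the interval, and the RunScheduler name are placeholders, not part of the original answer:
// Sketch: run each job once, then whenever one finishes, schedule it again after a delay.
static void RunScheduler(Func<Task>[] jobs, TimeSpan interval)
{
var running = new Task[jobs.Length];
for (int i = 0; i < jobs.Length; i++)
{
var job = jobs[i];
running[i] = Task.Run(job); // first run starts immediately
}
while (true)
{
// block the main thread until whichever job finishes first
int finished = Task.WaitAny(running);
var next = jobs[finished];
// swap the finished task for "delay, then run the job again"
running[finished] = Task.Delay(interval).ContinueWith(_ => next()).Unwrap();
}
}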
Depending on other requirements, like standardized error handling, monitoring capability, or management of these scheduled tasks, you could also rely on a more robust solution, like HangFire.
As a caveat, I'm a novice with Rx (2 weeks) and have been experimenting with Rx, RxUI and Roland Pheasant's DynamicData.
I have a service that initially loads data from local persistence and then, upon some user (or system) instruction, will contact the server (TriggerServer in the example) to get additional or replacement data. The solution I've come up with uses a Subject, and I've come across many a site discussing the pros/cons of using them. Although I understand the basics of hot/cold, it's all based on reading rather than real-world use.
So, using the below as a simplified version, is this the 'right' way of going about this problem, or is there something I haven't properly understood somewhere?
NB: I'm not sure how important it is, but the actual code is taken from a Xamarin.Forms app, that uses RxUI, the user input being a ReactiveCommand.
Example:
using DynamicData;
using System;
using System.Linq;
using System.Reactive;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;
public class MyService : IDisposable
{
private CompositeDisposable _cleanup;
private Subject<Unit> _serverSubject = new Subject<Unit>();
public MyService()
{
var data = Initialise().Publish();
AllData = data.AsObservableCache();
_cleanup = new CompositeDisposable(AllData, data.Connect());
}
public IObservableCache<MyData, Guid> AllData { get; }
public void TriggerServer()
{
// This is what I'm not sure about...
_serverSubject.OnNext(Unit.Default);
}
private IObservable<IChangeSet<MyData, Guid>> Initialise()
{
return ObservableChangeSet.Create<MyData, Guid>(async cache =>
{
// initial load - is this okay?
cache.AddOrUpdate(await LoadLocalData());
// is this a valid way of doing this?
var sync = _serverSubject.Select(_ => GetDataFromServer())
.Subscribe(async task =>
{
var data = await task.ConfigureAwait(false);
cache.AddOrUpdate(data);
});
return new CompositeDisposable(sync);
}, d=> d.Id);
}
private IObservable<MyData> LoadLocalData()
{
return Observable.Timer(TimeSpan.FromSeconds(3)).Select(_ => new MyData("localdata"));
}
private async Task<MyData> GetDataFromServer()
{
await Task.Delay(2000).ConfigureAwait(true);
return new MyData("serverdata");
}
public void Dispose()
{
_cleanup?.Dispose();
}
}
public class MyData
{
public MyData(string value)
{
Value = value;
}
public Guid Id { get; } = Guid.NewGuid();
public string Value { get; set; }
}
And a simple Console app to run:
public static class TestProgram
{
public static void Main()
{
var service = new MyService();
service.AllData.Connect()
.Bind(out var myData)
.Subscribe(_=> Console.WriteLine("data in"), ()=> Console.WriteLine("COMPLETE"));
while (Continue())
{
Console.WriteLine("");
Console.WriteLine("");
Console.WriteLine($"Triggering Server Call, current data is: {string.Join(", ", myData.Select(x=> x.Value))}");
service.TriggerServer();
}
}
private static bool Continue()
{
Console.WriteLine("Press any key to call server, x to exit");
var key = Console.ReadKey();
return key.Key != ConsoleKey.X;
}
}
Looks very good for a first try with Rx.
I would suggest a few changes:
1) Remove the Initialise() call from the constructor and make it a public method - it helps a lot with unit tests, and now you can await it if you need to:
public static void Main()
{
var service = new MyService();
service.Initialise();
2) Add Throttle to your trigger - this fixes parallel calls to the server returning the same results.
3) Don't do anything that can throw in Subscribe, use Do instead:
var sync = _serverSubject
.Throttle(TimeSpan.FromSeconds(0.5), RxApp.TaskPoolScheduler) // you can pass a scheduler via arguments, or use TestScheduler in unit tests to make time pass faster
.Do(async _ =>
{
var data = await GetDataFromServer().ConfigureAwait(false); // I just think this is more readable, your way was also correct
cache.AddOrUpdate(data);
})
// .Retry(); // or anything else to handle failures
.Subscribe();
I'm posting what I've come to as my solution, just in case there are others who find this while they're wandering the internets.
I ended up removing the Subjects altogether and chaining together several SourceCaches, so when one changed it pushed into the next, and so on. I've removed some code for brevity:
public class MyService : IDisposable
{
private SourceCache<MyData, Guid> _localCache = new SourceCache<MyData, Guid>(x=> x.Id);
private SourceCache<MyData, Guid> _serverCache = new SourceCache<MyData, Guid>(x=> x.Id);
public MyService()
{
var localdata = _localCache.Connect();
var serverdata = _serverCache.Connect();
var alldata = localdata.Merge(serverdata);
AllData = alldata.AsObservableCache();
}
public IObservableCache<MyData, Guid> AllData { get; }
public IObservable<Unit> TriggerLocal()
{
return LoadLocalAsync().ToObservable();
}
public IObservable<Unit> TriggerServer()
{
return LoadServerAsync().ToObservable();
}
}
EDIT: I've changed this again to remove any chaining of caches - I just manage the one cache internally (roughly the shape sketched below). The lesson is not to post too early.
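Purely as an illustrative sketch of that last edit (not the actual code), managing a single cache could look something like this, reusing the DynamicData types and the LoadLocalAsync/LoadServerAsync helpers referenced above:
public class MyService : IDisposable
{
// One cache holds everything; both the local load and the server load write into it.
private readonly SourceCache<MyData, Guid> _cache = new SourceCache<MyData, Guid>(x => x.Id);
public MyService()
{
AllData = _cache.Connect().AsObservableCache();
}
public IObservableCache<MyData, Guid> AllData { get; }
public IObservable<Unit> TriggerLocal()
{
// LoadLocalAsync is assumed to call _cache.AddOrUpdate(...) when its data arrives
return LoadLocalAsync().ToObservable();
}
public IObservable<Unit> TriggerServer()
{
return LoadServerAsync().ToObservable();
}
public void Dispose() => _cache.Dispose();
}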
I've been attempting to see how long functions take to execute in my code, as practice to see where I can optimize. Right now I use a helper class that is essentially a stopwatch with a message to check these timings. The goal is that I should be able to wrap whatever method call I want in the helper and get its duration.
public class StopwatcherData
{
public long Time { get; set; }
public string Message { get; set; }
public StopwatcherData(long time, string message)
{
Time = time;
Message = message;
}
}
public class Stopwatcher
{
public delegate void CompletedCallBack(string result);
public static List<StopwatcherData> Data { get; set; }
private static Stopwatch stopwatch { get; set;}
public Stopwatcher()
{
Data = new List<StopwatcherData>();
stopwatch = new Stopwatch();
stopwatch.Start();
}
public static void Click(string message)
{
Data.Add(new StopwatcherData(stopwatch.ElapsedMilliseconds, message));
}
public static void Reset()
{
stopwatch.Reset();
stopwatch.Start();
}
}
Right now, to use this, I have to call Reset before the function I want to time so that the timer is restarted, and then call Click after it.
Stopwatcher.Reset();
MyFunction();
Stopwatcher.Click("MyFunction");
I've read a bit about delegates and actions, but I'm unsure of how to apply them to this situation. Ideally, I would pass the function as part of the Stopwatcher call.
//End Goal:
Stopwatcher.Track(MyFunction(), "MyFunction Time");
Any help is welcome.
It's not really a good idea to profile your application like that, but if you insist, you can at least make some improvements.
First, don't reuse the Stopwatch; just create a new one every time you need it.
Second, you need to handle two cases - one where the delegate you pass returns a value and one where it does not.
Since your Track method is static, it's common practice to make it thread safe. Non-thread-safe static methods are quite a bad idea. For that you can store your messages in a thread-safe collection like ConcurrentBag, or just use a lock every time you add an item to your list.
In the end you can have something like this:
public class Stopwatcher {
private static readonly ConcurrentBag<StopwatcherData> _data = new ConcurrentBag<StopwatcherData>();
public static void Track(Action action, string message) {
var w = Stopwatch.StartNew();
try {
action();
}
finally {
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}
public static T Track<T>(Func<T> func, string message) {
var w = Stopwatch.StartNew();
try {
return func();
}
finally {
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}
}
And use it like this:
Stopwatcher.Track(() => SomeAction(param1), "test");
bool result = Stopwatcher.Track(() => SomeFunc(param2), "test");
If you are going to use that with async delegates (which return Task or Task<T>), you need to add two more overloads for that case, along the lines sketched below.
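A sketch of what those two async overloads might look like, following the same pattern and the same _data field as above:
public static async Task Track(Func<Task> func, string message) {
var w = Stopwatch.StartNew();
try {
await func();
}
finally {
// the stopwatch stops only after the awaited task has completed
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}
public static async Task<T> Track<T>(Func<Task<T>> func, string message) {
var w = Stopwatch.StartNew();
try {
return await func();
}
finally {
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}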
Yes, you can create a timer function that accepts any action as a delegate. Try this block:
public static long TimeAction(Action action)
{
var timer = new Stopwatch();
timer.Start();
action();
timer.Stop();
return timer.ElapsedMilliseconds;
}
This can be used like this:
var elapsedMilliseconds = TimeAction(() => MyFunc(param1, param2));
This is a bit more awkward if your wrapped function returns a value, but you can deal with this by assigning a variable from within the closure, like this:
bool isSuccess = false;
var elapsedMilliseconds = TimeAction(() => {
isSuccess = MyFunc(param1, param2);
});
I had this problem a while ago as well, and was always afraid that I'd leave errors behind when changing Stopwatcher.Track(() => SomeFunc(), "test") (see Evk's answer) back to SomeFunc(). So I thought about something that wraps the code without changing it!
I came up with a using block, which is for sure not its intended purpose.
public class OneTimeStopwatch : IDisposable
{
private string _logPath = "C:\\Temp\\OneTimeStopwatch.log";
private readonly string _itemname;
private System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
public OneTimeStopwatch(string itemname)
{
_itemname = itemname;
sw.Start();
}
public void Dispose()
{
sw.Stop();
System.IO.File.AppendAllText(_logPath, $"{_itemname}: {sw.ElapsedMilliseconds}ms{Environment.NewLine}");
}
}
This can be used in an easy way:
using (new OneTimeStopwatch("test"))
{
//some sensible code not to touch
System.Threading.Thread.Sleep(1000);
}
//logfile with line "test: 1000ms"
I only need to remove 2 lines (and auto format) to make it normal again.
Plus I can easily wrap multiple lines here, which isn't possible without defining new functions in the other approach.
Again, this is not recommended for measuring just a few milliseconds.
Here's what I'm trying to do:
Keep a queue in memory of items that need to be processed (i.e. IsProcessed = 0)
Every 5 seconds, get unprocessed items from the db, and if they're not already in the queue, add them
Continuously pull items from the queue, process them, and each time an item is processed, update it in the db (IsProcessed = 1)
Do this all "as parallel as possible"
I have a constructor for my service like
public MyService()
{
Ticker.Elapsed += FillQueue;
}
and I start that timer when the service starts like
protected override void OnStart(string[] args)
{
Ticker.Enabled = true;
Task.Run(() => { ConsumeWork(); });
}
and my FillQueue is like
private static async void FillQueue(object source, ElapsedEventArgs e)
{
var items = GetUnprocessedItemsFromDb();
foreach(var item in items)
{
if(!Work.Contains(item))
{
Work.Enqueue(item);
}
}
}
and my ConsumeWork is like
private static void ConsumeWork()
{
while(true)
{
if(Work.Count > 0)
{
var item = Work.Peek();
Process(item);
Work.Dequeue();
}
else
{
Thread.Sleep(500);
}
}
}
However this is probably a naive implementation and I'm wondering whether .NET has any type of class that is exactly what I need for this type of situation.
Though #JSteward's answer is a good start, you can improve it by mixing TPL Dataflow and Rx.NET, as a dataflow block can easily become an observer for your data, and with the Rx Timer it will be much less effort for you (Rx.Timer explanation).
We can adjust the MSDN article for your needs, like this:
private const int EventIntervalInSeconds = 5;
private const int DueIntervalInSeconds = 60;
var source =
// sequence of Int64 numbers, starting from 0
// https://msdn.microsoft.com/en-us/library/hh229435.aspx
Observable.Timer(
// fire first event after 1 minute waiting
TimeSpan.FromSeconds(DueIntervalInSeconds),
// fire all next events each 5 seconds
TimeSpan.FromSeconds(EventIntervalInSeconds))
// each number will have a timestamp
.Timestamp()
// each time we select some items to process
.SelectMany(GetItemsFromDB)
// filter already added
.Where(i => !_processedItemIds.ContainsKey(i.Id));
var action = new ActionBlock<Item>(ProcessItem, new ExecutionDataflowBlockOptions
{
// we can start as many item processing as processor count
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
IDisposable subscription = source.Subscribe(action.AsObserver());
Also, your check for an item already being processed isn't quite accurate, as there is a possibility that an item gets selected as unprocessed from the db right as you finish processing it but before you've updated it in the database. In that case the item would be removed from the Queue<T> and then added again by the producer, which is why I've added a ConcurrentDictionary<Guid, bool> used as a thread-safe set to this solution (HashSet<T> isn't thread-safe, and ConcurrentBag<T> can't remove a specific item):
private static async Task ProcessItem(Item item)
{
if (_processedItemIds.ContainsKey(item.Id))
{
return;
}
_processedItemIds.TryAdd(item.Id, true);
// actual work here
// save item as processed in database
// we need to wait to ensure item not to appear in queue again
await Task.Delay(TimeSpan.FromSeconds(EventIntervalInSeconds * 2));
// clear the processed cache to reduce memory usage
_processedItemIds.TryRemove(item.Id, out _);
}
public class Item
{
public Guid Id { get; set; }
}
// temporary cache for items in process
private static readonly ConcurrentDictionary<Guid, bool> _processedItemIds = new ConcurrentDictionary<Guid, bool>();
private static IEnumerable<Item> GetItemsFromDB(Timestamped<long> time)
{
// log event timing
Console.WriteLine($"Event # {time.Value} at {time.Timestamp}");
// return items from DB
return new[] { new Item { Id = Guid.NewGuid() } };
}
You can implement the cache cleanup another way, for example by starting a "GC" timer that removes processed items from the cache on a regular basis, as sketched below.
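For example, the cache could store the time each id was added, with a System.Threading.Timer that prunes old entries (the names and intervals here are invented for the sketch; it needs System.Linq and System.Collections.Concurrent):
// Sketch only: an alternative cache that records when each id entered,
// plus a timer that prunes entries older than twice the polling interval.
private static readonly ConcurrentDictionary<Guid, DateTime> _processedAt =
new ConcurrentDictionary<Guid, DateTime>();
private static readonly Timer _cacheCleaner = new Timer(_ =>
{
var cutoff = DateTime.UtcNow - TimeSpan.FromSeconds(EventIntervalInSeconds * 2);
foreach (var stale in _processedAt.Where(p => p.Value < cutoff).ToList())
{
_processedAt.TryRemove(stale.Key, out _);
}
}, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));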
To stop events and processing items you should Dispose the subscription and, maybe, Complete the ActionBlock:
subscription.Dispose();
action.Complete();
You can find more information about Rx.Net in their guidelines on github.
You could use an ActionBlock to do your processing; it has a built-in queue that you can post work to. You can read up on TPL Dataflow here: Intro to TPL-Dataflow and Introduction to Dataflow, Part 1. Finally, here is a quick sample to get you going. I've left out a lot, but it should at least get you started.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace MyWorkProcessor {
public class WorkProcessor {
public WorkProcessor() {
Processor = CreatePipeline();
}
public async Task StartProcessing() {
try {
await Task.Run(() => GetWorkFromDatabase());
} catch (OperationCanceledException) {
//handle cancel
}
}
private CancellationTokenSource cts {
get;
set;
} = new CancellationTokenSource();
private ITargetBlock<WorkItem> Processor {
get;
}
private TimeSpan DatabasePollingFrequency {
get;
} = TimeSpan.FromSeconds(5);
private ITargetBlock<WorkItem> CreatePipeline() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = 100,
CancellationToken = cts.Token
};
return new ActionBlock<WorkItem>(item => ProcessWork(item), options);
}
private async Task GetWorkFromDatabase() {
while (!cts.IsCancellationRequested) {
var work = await GetWork();
await Processor.SendAsync(work);
await Task.Delay(DatabasePollingFrequency);
}
}
private async Task<WorkItem> GetWork() {
return await Context.GetWork();
}
private void ProcessWork(WorkItem item) {
//do processing
}
}
}
I have implemented a basic binary heap. I wanted to see just how well it performed so I wrote a quick manual 'profiler':
public class MProfile : IDisposable
{
private static Dictionary<string, ProfilerEntry> _funcs = new Dictionary<string, ProfilerEntry>();
private Stopwatch _stopwatch;
private ProfilerEntry _entry;
public MProfile(string funcName)
{
if (!_funcs.ContainsKey(funcName))
{
_funcs.Add(funcName, new ProfilerEntry(funcName));
}
_entry = _funcs[funcName];
_stopwatch = Stopwatch.StartNew();
}
public void Dispose()
{
_stopwatch.Stop();
_entry.Record(_stopwatch.Elapsed.TotalMilliseconds);
}
}
The idea was to wrap it around function calls with using so it would record the time taken. The ProfilerEntry class just holds a number of calls and the total time taken. By storing them against the name, I can add them up.
Now, if I wrap it around my entire test:
Random random = new Random();
int count = 20000;
using (MProfile profile = new MProfile("HeapTest"))
{
PriorityHeap<PretendPathNode> heap = new PriorityHeap<PretendPathNode>(count);
for (int i = 0; i < count; i++)
{
int next = random.Next(-1000, 1000);
heap.Insert(new PretendPathNode(next));
}
while (heap.Count() > 0)
{
heap.Pop();
}
}
It will tell me that this took: 40.6682ms
However, if I add more profiling around the Insert and Pop calls, i.e.:
using (MProfile profile2 = new MProfile("Insert"))
{
heap.Insert(new PretendPathNode(next));
}
using (MProfile profile3 = new MProfile("Pop"))
{
heap.Pop();
}
The total time taken is now 452ms, with 107ms being from Insert and 131ms being from Pop (note: I've run these tests in huge loops and taken an average). I gather that my extra 'profiling' code will obviously have an impact, but how is it bloating the Insert and Pop times to above the original execution time? I thought the way I'd done the disposable meant that -only- the inner execution time would get recorded, which should still be exactly the same. The extra work of creating the disposable, looking up the func in the dictionary, and disposing happens -outside- the Insert/Pop calls.
Is it to do with things like JIT and compile/run time optimizations? Has throwing in that disposable effectively ruined it? I thought maybe it was GC related but I tried a different profiler (static manual calls to start/stop) that had 0 garbage and it's the same...
Edit: I get the same times using this slightly more confusing code, which caches the MProfile objects and Stopwatch objects, so there is less creation/GC.
public class MProfile : IDisposable
{
private static Dictionary<string, ProfilerEntry> _funcs = new Dictionary<string, ProfilerEntry>();
private static Dictionary<string, Stopwatch> _stopwatches = new Dictionary<string, Stopwatch>();
private static Dictionary<string, MProfile> _profiles = new Dictionary<string, MProfile>();
private ProfilerEntry _entry;
private string _name;
public MProfile(string funcName)
{
_name = funcName;
_entry = new ProfilerEntry(funcName);
_funcs.Add(funcName, _entry);
}
public static MProfile GetProfiler(string funcName)
{
if (!_profiles.ContainsKey(funcName))
{
_profiles.Add(funcName, new MProfile(funcName));
_stopwatches.Add(funcName, new Stopwatch());
}
_stopwatches[funcName].Restart();
return _profiles[funcName];
}
public void Dispose()
{
_stopwatches[_name].Stop();
_entry.Record(_stopwatches[_name].Elapsed.TotalMilliseconds);
}
}
And calling it via:
using (profile2 = MProfile.GetProfiler("Insert"))