TPL DataFlow Queue with Postponement - c#

I am processing a queue concurrently using an ActionBlock.
The one catch here is that when processing an item in the queue, I may want to wait until a dependency is satisfied by the processing of another item in the queue.
I think I should be able to do this with the TPL DataFlow library with linking, postponement and release of postponement but I'm not sure what constructs to use.
In pseudocode:
public class Item
{
public string Name { get; set; }
public List<string> DependsOn = new List<string>();
}
ActionBlock<Item> block = null;
block = new ActionBlock<Item>(o => {
if (!HasActionBlockProcessedAllDependencies(o.DependsOn))
{
// enqueue a callback when ALL dependencies have been completed
}
else
{
DoWork(o);
}
},
new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = resourceProcessorOptions.MaximumProviderConcurrency
});
var items = new[]
{
new Item { Name = "Apple", DependsOn = { "Pear" } },
new Item { Name = "Pear" }
};

I am not sure if this will be helpful to you, but here is a custom DependencyTransformBlock class that knows about the dependencies between the items it receives, and processes each one only after its dependencies have been successfully processed. This custom block supports all the built-in functionality of a normal TransformBlock, except for the EnsureOrdered option.
The constructors of this class accept a Func<TInput, TKey> lambda for retrieving the key of each item, and a Func<TInput, IReadOnlyCollection<TKey>> lambda for retrieving its dependencies. The keys are expected to be unique; if a duplicate key is found, the block completes with failure.
In case of circular dependencies between items, the affected items remain unprocessed. The property TInput[] Unprocessed allows retrieving the unprocessed items after the completion of the block. An item can also remain unprocessed if any of its dependencies is never supplied.
public class DependencyTransformBlock<TInput, TKey, TOutput> :
ITargetBlock<TInput>, ISourceBlock<TOutput>
{
private readonly ITargetBlock<TInput> _inputBlock;
private readonly IPropagatorBlock<Item, TOutput> _transformBlock;
private readonly object _locker = new object();
private readonly Dictionary<TKey, Item> _items;
private int _pendingCount = 1;
// The initial 1 represents the completion of the _inputBlock
private class Item
{
public TKey Key;
public TInput Input;
public bool HasInput;
public bool IsCompleted;
public HashSet<Item> Dependencies;
public HashSet<Item> Dependents;
public Item(TKey key) => Key = key;
}
public DependencyTransformBlock(
Func<TInput, Task<TOutput>> transform,
Func<TInput, TKey> keySelector,
Func<TInput, IReadOnlyCollection<TKey>> dependenciesSelector,
ExecutionDataflowBlockOptions dataflowBlockOptions = null,
IEqualityComparer<TKey> keyComparer = null)
{
if (transform == null)
throw new ArgumentNullException(nameof(transform));
if (keySelector == null)
throw new ArgumentNullException(nameof(keySelector));
if (dependenciesSelector == null)
throw new ArgumentNullException(nameof(dependenciesSelector));
dataflowBlockOptions =
dataflowBlockOptions ?? new ExecutionDataflowBlockOptions();
keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
_items = new Dictionary<TKey, Item>(keyComparer);
_inputBlock = new ActionBlock<TInput>(async input =>
{
var key = keySelector(input);
var dependencyKeys = dependenciesSelector(input);
bool isReadyForProcessing = true;
Item item;
lock (_locker)
{
if (!_items.TryGetValue(key, out item))
{
item = new Item(key);
_items.Add(key, item);
}
if (item.HasInput)
throw new InvalidOperationException($"Duplicate key ({key}).");
item.Input = input;
item.HasInput = true;
if (dependencyKeys != null && dependencyKeys.Count > 0)
{
item.Dependencies = new HashSet<Item>();
foreach (var dependencyKey in dependencyKeys)
{
if (!_items.TryGetValue(dependencyKey, out var dependency))
{
dependency = new Item(dependencyKey);
_items.Add(dependencyKey, dependency);
}
if (!dependency.IsCompleted)
{
item.Dependencies.Add(dependency);
if (dependency.Dependents == null)
dependency.Dependents = new HashSet<Item>();
dependency.Dependents.Add(item);
}
}
isReadyForProcessing = item.Dependencies.Count == 0;
}
if (isReadyForProcessing) _pendingCount++;
}
if (isReadyForProcessing)
{
await _transformBlock.SendAsync(item);
}
}, new ExecutionDataflowBlockOptions()
{
CancellationToken = dataflowBlockOptions.CancellationToken,
BoundedCapacity = 1
});
var middleBuffer = new BufferBlock<Item>(new DataflowBlockOptions()
{
CancellationToken = dataflowBlockOptions.CancellationToken,
BoundedCapacity = DataflowBlockOptions.Unbounded
});
_transformBlock = new TransformBlock<Item, TOutput>(async item =>
{
try
{
TInput input;
lock (_locker)
{
Debug.Assert(item.HasInput && !item.IsCompleted);
input = item.Input;
}
var result = await transform(input).ConfigureAwait(false);
lock (_locker)
{
item.IsCompleted = true;
if (item.Dependents != null)
{
foreach (var dependent in item.Dependents)
{
Debug.Assert(dependent.Dependencies != null);
var removed = dependent.Dependencies.Remove(item);
Debug.Assert(removed);
if (dependent.HasInput
&& dependent.Dependencies.Count == 0)
{
middleBuffer.Post(dependent);
_pendingCount++;
}
}
}
item.Input = default; // Cleanup
item.Dependencies = null;
item.Dependents = null;
}
return result;
}
finally
{
lock (_locker)
{
_pendingCount--;
if (_pendingCount == 0) middleBuffer.Complete();
}
}
}, dataflowBlockOptions);
middleBuffer.LinkTo(_transformBlock);
PropagateCompletion(_inputBlock, middleBuffer,
condition: () => { lock (_locker) return --_pendingCount == 0; });
PropagateCompletion(middleBuffer, _transformBlock);
PropagateFailure(_transformBlock, middleBuffer);
PropagateFailure(_transformBlock, _inputBlock);
}
// Constructor with synchronous lambda
public DependencyTransformBlock(
Func<TInput, TOutput> transform,
Func<TInput, TKey> keySelector,
Func<TInput, IReadOnlyCollection<TKey>> dependenciesSelector,
ExecutionDataflowBlockOptions dataflowBlockOptions = null,
IEqualityComparer<TKey> keyComparer = null) : this(
input => Task.FromResult(transform(input)),
keySelector, dependenciesSelector, dataflowBlockOptions, keyComparer)
{
if (transform == null) throw new ArgumentNullException(nameof(transform));
}
public TInput[] Unprocessed
{
get
{
lock (_locker) return _items.Values
.Where(item => item.HasInput && !item.IsCompleted)
.Select(item => item.Input)
.ToArray();
}
}
public Task Completion => _transformBlock.Completion;
public void Complete() => _inputBlock.Complete();
void IDataflowBlock.Fault(Exception ex) => _inputBlock.Fault(ex);
DataflowMessageStatus ITargetBlock<TInput>.OfferMessage(
DataflowMessageHeader header, TInput value, ISourceBlock<TInput> source,
bool consumeToAccept)
{
return _inputBlock.OfferMessage(header, value, source, consumeToAccept);
}
TOutput ISourceBlock<TOutput>.ConsumeMessage(DataflowMessageHeader header,
ITargetBlock<TOutput> target, out bool messageConsumed)
{
return _transformBlock.ConsumeMessage(header, target, out messageConsumed);
}
bool ISourceBlock<TOutput>.ReserveMessage(DataflowMessageHeader header,
ITargetBlock<TOutput> target)
{
return _transformBlock.ReserveMessage(header, target);
}
void ISourceBlock<TOutput>.ReleaseReservation(DataflowMessageHeader header,
ITargetBlock<TOutput> target)
{
_transformBlock.ReleaseReservation(header, target);
}
public IDisposable LinkTo(ITargetBlock<TOutput> target,
DataflowLinkOptions linkOptions)
{
return _transformBlock.LinkTo(target, linkOptions);
}
private async void PropagateCompletion(IDataflowBlock source,
IDataflowBlock target, Func<bool> condition = null)
{
try { await source.Completion.ConfigureAwait(false); } catch { }
if (source.Completion.IsFaulted)
target.Fault(source.Completion.Exception.InnerException);
else
if (condition == null || condition()) target.Complete();
}
private async void PropagateFailure(IDataflowBlock source,
IDataflowBlock target)
{
try { await source.Completion.ConfigureAwait(false); } catch { }
if (source.Completion.IsFaulted)
target.Fault(source.Completion.Exception.InnerException);
}
}
Usage example:
var block = new DependencyTransformBlock<Item, string, Item>(item =>
{
DoWork(item);
return item;
},
keySelector: item => item.Name,
dependenciesSelector: item => item.DependsOn,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
},
keyComparer: StringComparer.OrdinalIgnoreCase);
//...
block.LinkTo(DataflowBlock.NullTarget<Item>());
In this example the block is linked to a NullTarget in order to discard its output, so that it becomes essentially an ActionBlock equivalent.
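A possible continuation of the example above (an untested sketch): feed the block with the items array from the question, complete it, await its completion, and then inspect the Unprocessed property for items that were skipped because of missing or circular dependencies.
foreach (var item in items) await block.SendAsync(item);
block.Complete();
await block.Completion;
// Items with missing or circular dependencies end up here.
foreach (var item in block.Unprocessed)
    Console.WriteLine($"Unprocessed: {item.Name}");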

Related

UWP DataGrid - Duplicates with items added asynchronously

I have a custom collection (implementing INotifyCollectionChanged) bound to a DataGrid via the ItemsSource property. This collection is created synchronously but filled asynchronously (I want the UI to be available with the first items even if the filling job is not yet completed). Curiously, from time to time, I get some duplicates in the DataGrid.
public class BaseItemCollection
{
public delegate void CollectionHander(BaseItemCollection collection);
public static BaseItemCollection<T> GetCollection<T>(params KeyValuePair<string, object>[] pairs) where T : BaseItem, new() => GetCollection<T>(null, pairs);
public static BaseItemCollection<T> GetCollection<T>(string procedure = null, params KeyValuePair<string, object>[] pairs) where T : BaseItem, new()
{
var collection = new BaseItemCollection<T>(pairs);
collection.Fill(procedure, pairs);
return collection;
}
public static bool IsCompatible(object parameter) => parameter != null && parameter is IBaseItemCollection;
}
public class BaseItemCollection<T> : BaseItemCollection, IBaseItemCollection, ICollection<T>, IList<T>, IDisposable,
INotifyCompletion, INotifyPropertyChanged, INotifyCollectionChanged
where T : BaseItem, new()
{
private List<T> innerCollection = new List<T>();
#region Properties
public object SyncRoot { get; } = new object(); // Use for thread-safe sync
public KeyValuePair<string, object>[] DefaultParameters { get; private set; }
public Dictionary<string, object> Parameters { get; } = new Dictionary<string, object>();
public Task<BaseItemCollection<T>> TaskFill { get; private set; } = Task.FromResult<BaseItemCollection<T>>(null);
public CancellationTokenSource TaskCancellationTokenSource { get; private set; }
public bool IsReadOnly => false;
#endregion
#region Constructors
public BaseItemCollection()
{
BaseItem.OnItemSaved += Update;
}
public BaseItemCollection(params KeyValuePair<string, object>[] parameters)
{
DefaultParameters = parameters;
BaseItem.OnItemSaved += Update;
}
#endregion
#region Methods
public void AddFromReader(DbDataReader reader) => Add(BaseItem.FromReader<T>(reader));
public void Fill(params KeyValuePair<string, object>[] pairs) => Fill(null, pairs);
public void Fill(Dictionary<string, object> pairs) => Fill(null, pairs.ToArray());
public void Fill(string procedure = null, params KeyValuePair<string, object>[] pairs)
{
// Get the stored procedure name if it has not been specified as parameter
if (string.IsNullOrEmpty(procedure)) procedure = StoredProcedures.Get(typeof(T), DataAction.Read);
// Cancel the running task and wait for it
if (TaskFill != null)
{
TaskCancellationTokenSource?.Cancel();
TaskFill.Wait();
}
//Empty the current collection
lock (SyncRoot)
{
innerCollection.Clear();
innerCollection = new List<T>();
CollectionChanged?.Invoke(this, new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
}
// Start filling the current collection
TaskFill = Task.Factory.StartNew(() =>
{
var command = BaseItem.GetCommand(procedure, pairs);
var reader = command.ExecuteReader();
Parameters.Clear();
foreach (var pair in pairs) Parameters.Add(pair.Key, pair.Value);
while (!TaskCancellationTokenSource.IsCancellationRequested && !reader.IsClosed && reader.HasRows && reader.Read())
lock (SyncRoot)
{
AddFromReader(reader);
}
command.Connection?.Close();
FireCollectionFilled(this);
return this;
}, (TaskCancellationTokenSource = new CancellationTokenSource()).Token);
}
public T Find(T item) => this.SingleOrDefault(i => item != null && i.Equals(item));
public void FromUriQuery(string query)
{
var parameters = HttpUtility.ParseQueryString(query);
if (parameters == null || parameters.Count == 0) return;
Parameters.Clear();
foreach (KeyValuePair<string, string> p in parameters) Parameters[p.Key] = HttpUtility.UrlDecode(p.Value);
}
public Task RefreshAsync()
{
Fill(Parameters.ToArray());
FirePropertyChanged("Item[]");
return TaskFill;
}
public Task ResetAsync()
{
Parameters.Clear();
DefaultParameters.ForEach(p => Parameters.Add(p.Key, p.Value));
return RefreshAsync();
}
public bool ShouldContains(T item)
{
Action<DbCommand> action = (DbCommand command) =>
{
Parameters.ForEach(p => command.Parameters.Add(BaseItem.GetParameter(p.Key, p.Value)));
item.SetParameters(command, true);
};
var cmd = BaseItem.GetCommand(StoredProcedures.Get(item.GetType(), DataAction.Read), action);
TaskFill?.Wait();
return cmd.ExecuteScalar() != null;
}
public string ToUriQuery()
{
var qp = Parameters.Where(p => p.Value != null).Select(p => string.Format("{0}={1}", p.Key, p.Value.ToString()));
return string.Join("&", qp);
}
public void Update(BaseItem baseItem, DbDataReader reader)
{
Console.WriteLine("Fired WeakItemUpdate for {0}", baseItem);
TaskFill?.Wait();
if (baseItem.GetType() != typeof(T)) return;
var item = baseItem as T;
var should = ShouldContains(item); // Should the item be in the list? (using parameters)
if (should)
{
item = this.FirstOrDefault(i => i.Equals(item)) ?? item;
if (reader != null) item.ReadFromDb(reader, 0); // Refresh the item from the database
}
if (this[item] != null && !should) Remove(item); // If item is in the list while it should not => Remove
if (this[item] == null && should) Add(item); // If item is not in the list while it should => Add
}
#endregion
#region Indexes
public T this[int index]
{
get => innerCollection[index];
set { innerCollection[index] = value; }
}
public bool this[T item]
{
get => this.Any(i => i.Equals(item));
set
{
if (value && !this.AnyEquals(item)) Add(item);
if (!value && this.AnyEquals(item)) Remove(item);
}
}
public object this[string pname]
{
get => Parameters.ContainsKey(pname) ? Parameters[pname] : null;
set => this[pname, true] = value;
}
public object this[string pname, bool refresh]
{
get => Parameters.ContainsKey(pname) ? Parameters[pname] : null;
set
{
Parameters[pname] = value;
if (refresh)
{
FirePropertyChanged("Item[]");
Fill(Parameters);
}
}
}
#endregion
#region ICollection & IList
public int Count
{
get { lock (SyncRoot) return innerCollection.Count; }
}
public void Add(T item)
{
lock (SyncRoot) innerCollection.Add(item);
var action = new Action(() =>
{
FireCollectionChanged(NotifyCollectionChangedAction.Add, item);
FirePropertyChanged("Count");
});
if (BaseItem.UserInterfaceAction != null) BaseItem.UserInterfaceAction.Invoke(action);
else action.Invoke();
}
public void Clear()
{
lock (SyncRoot)
{
innerCollection.Clear();
innerCollection = new List<T>();
var action = new Action(() =>
{
CollectionChanged?.Invoke(this, new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
FirePropertyChanged("Count");
});
if (BaseItem.UserInterfaceAction != null) BaseItem.UserInterfaceAction.Invoke(action);
else action.Invoke();
}
}
public bool Contains(T item)
{
lock (SyncRoot) return innerCollection.Contains(item);
}
public void CopyTo(T[] items, int arrayIndex)
{
lock (SyncRoot) innerCollection.CopyTo(items, arrayIndex);
}
public int IndexOf(T item)
{
lock (SyncRoot) return innerCollection.IndexOf(item);
}
public void Insert(int index, T item)
{
lock (SyncRoot) innerCollection.Insert(index, item);
}
public void RemoveAt(int index)
{
//lock (SyncRoot) innerCollection.RemoveAt(index);
lock (SyncRoot)
{
var item = innerCollection[index];
if (item == null) return;
var res = innerCollection.Remove(item);
if (!res) return;
var action = new Action(() => FireCollectionChanged(NotifyCollectionChangedAction.Remove, item));
if (BaseItem.UserInterfaceAction != null) BaseItem.UserInterfaceAction.Invoke(action);
else action.Invoke();
}
}
public bool Remove(T item)
{
lock (SyncRoot)
if (innerCollection.Remove(item))
{
var action = new Action(() =>
{
FireCollectionChanged(NotifyCollectionChangedAction.Remove, item);
FirePropertyChanged("Count");
});
if (BaseItem.UserInterfaceAction != null) BaseItem.UserInterfaceAction.Invoke(action);
else action.Invoke();
return true;
}
else return false;
}
#endregion
#region IDisposable
public void Dispose()
{
BaseItem.OnItemSaved -= Update;
TaskFill?.Wait();
}
#endregion;
#region IEnumerator
public IEnumerator<T> GetEnumerator() => new BaseItemEnumerator<T>(innerCollection, SyncRoot);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
#endregion
#region Events (INotifyPropertyChange & INotifyCollectionChanged)
public event CollectionHander CollectionFilled;
protected void FireCollectionFilled(BaseItemCollection collection)
=> CollectionFilled?.Invoke(this);
public event PropertyChangedEventHandler PropertyChanged;
protected void FirePropertyChanged(params string[] names)
=> names.ForEach(name => PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(name)));
public event NotifyCollectionChangedEventHandler CollectionChanged;
protected void FireCollectionChanged(NotifyCollectionChangedAction changedAction, params T[] items)
=> CollectionChanged?.Invoke(this, new NotifyCollectionChangedEventArgs(changedAction, items));
#endregion
#region Async
public TaskAwaiter<BaseItemCollection<T>> GetAwaiter() => TaskFill.GetAwaiter();
public BaseItemCollection<T> GetResult() => TaskFill?.Result ?? this;
public bool IsCompleted
{
get => TaskFill?.IsCompleted ?? true;
}
public void OnCompleted(Action continuation) { }
#endregion
#region Thread-safe
private void RunSafe(Action action)
{
lock (SyncRoot) action.Invoke();
}
private T0 RunSafe<T0>(Func<T0> f)
{
lock (SyncRoot) return f.Invoke();
}
#endregion
}
And the UI DataGrid (List is BaseItemCollection):
<uwpc:DataGrid x:Name="MyDataGrid" AlternatingRowBackground="WhiteSmoke" AutoGenerateColumns="False" ColumnWidth="*" IsReadOnly="True"
ItemsSource="{x:Bind List, Mode=OneWay}" SelectedItem="{x:Bind SelectedItem, Mode=TwoWay}" SelectionMode="Single" >
<uwpc:DataGrid.Columns>
<uwpc:DataGridTextColumn Binding="{Binding Path=D_From}" Header="From" Width="auto"/>
<uwpc:DataGridTextColumn Binding="{Binding Path=Name}" Header="Name"/>
<uwpc:DataGridTextColumn Binding="{Binding Path=Title}" Header="Position"/>
<uwpc:DataGridTextColumn Binding="{Binding Path=Site}" Header="Site"/>
<uwpc:DataGridTextColumn Binding="{Binding Path=Manager}" Header="Manager"/>
<uwpc:DataGridTextColumn Binding="{Binding Path=D_To}" Header="To" Width="auto"/>
</uwpc:DataGrid.Columns>
</uwpc:DataGrid>
Considering the following items in my collection:
Item1
Item2
Item3
Item4
Item5
I can see in my DataGrid:
Item1
Item2
Item1
Item2
Item3
Item4
Item5
I checked programmatically how many items I have in the collection. It returned 5 items, and if I select the second occurrence of Item1, the first one appears highlighted.
Do you know how to fix this? I have some thoughts about the Enumerator but would like to have your opinion...
I regularly get curious behaviors with collections filled asynchronously (see this post: Curious behavior with ComboBox in UWP - SelectedItem & ItemsSource)

How to emit a cartesian product in TPL/Dataflow?

I am trying to implement the following behaviour:
[TestMethod]
public async Task ProducesCartesianProductOfInputs()
{
var block = new CartesianProductBlock<int, string>();
var target = new BufferBlock<Tuple<int, string>>();
var left = block.Left;
var right = block.Right;
block.LinkTo(target);
var actual = new List<Tuple<int, string>>();
Assert.IsTrue(left.Post(1));
Assert.IsTrue(right.Post("a"));
Assert.IsTrue(left.Post(2));
Assert.IsTrue(right.Post("b"));
// PROBLEM?: These can run before messages have been processed and appear to abort further processing.
left.Complete();
right.Complete();
while (await target.OutputAvailableAsync())
{
actual.Add(target.Receive());
}
var expected = new List<Tuple<int, string>>()
{
Tuple.Create(1, "a"),
Tuple.Create(2, "a"),
Tuple.Create(1, "b"),
Tuple.Create(2, "b"),
};
CollectionAssert.AreEquivalent(expected, actual.ToList());
}
My current (partial) implementation does not work and I can't figure out why:
// A block that remembers every message it receives on two channels, and pairs every message on a channel to every message on the other channel
public class CartesianProductBlock<T1, T2> : ISourceBlock<Tuple<T1, T2>>
{
private TransformManyBlock<T1, Tuple<T1, T2>> left;
private TransformManyBlock<T2, Tuple<T1, T2>> right;
private List<T1> leftReceived = new List<T1>();
private List<T2> rightReceived = new List<T2>();
private List<ITargetBlock<Tuple<T1, T2>>> targets = new List<ITargetBlock<Tuple<T1, T2>>>();
private object lockObject = new object();
public ITargetBlock<T1> Left { get { return left; } }
public ITargetBlock<T2> Right { get { return right; } }
public CartesianProductBlock()
{
left = new TransformManyBlock<T1, Tuple<T1, T2>>(l =>
{
lock (lockObject)
{
leftReceived.Add(l);
// Pair this input up with all received alternatives
return rightReceived.Select(r => Tuple.Create(l, r));
}
});
right = new TransformManyBlock<T2, Tuple<T1, T2>>(r =>
{
lock(lockObject)
{
rightReceived.Add(r);
// Pair this input up with all received alternatives
return leftReceived.Select(l => Tuple.Create(l, r));
}
});
Task.WhenAll(Left.Completion, Right.Completion).ContinueWith(_ => {
// TODO: Respect the PropagateCompletion link option. Defaulting to propagation for now.
foreach (var target in targets)
{
target.Complete();
}
});
}
private TaskCompletionSource<int> completion = new TaskCompletionSource<int>();
public Task Completion => completion.Task;
public void Complete() { throw new NotImplementedException(); }
public void Fault(Exception exception) { throw new NotImplementedException(); }
public IDisposable LinkTo(ITargetBlock<Tuple<T1, T2>> target, DataflowLinkOptions linkOptions)
{
left.LinkTo(target);
right.LinkTo(target);
targets.Add(target);
return null; // TODO: Return something proper to allow unlinking
}
public void ReleaseReservation(DataflowMessageHeader messageHeader, ITargetBlock<Tuple<T1, T2>> target)
{
throw new NotImplementedException();
}
public bool ReserveMessage(DataflowMessageHeader messageHeader, ITargetBlock<Tuple<T1, T2>> target)
{
throw new NotImplementedException();
}
public Tuple<T1, T2> ConsumeMessage(DataflowMessageHeader messageHeader, ITargetBlock<Tuple<T1, T2>> target, out bool messageConsumed)
{
throw new NotImplementedException();
}
}
I'm experiencing the following (probably related) issues:
It is non-deterministic. The test fails in different ways.
It appears (from adding in logging, and also since I get anywhere from 3 to 6 output messages) that the Complete call to the two inputs is causing messages to not be processed, though my understanding is that it should allow all queues to drain first. (And if this is not the case, then I don't know how to write the test correctly.)
It's quite possible my locking scheme is wrong/suboptimal, though my goal was to have something big and coarse that worked before trying to fix.
My experiments with individual TransformManyBlocks have failed to turn up anything surprising, and I can't figure out what's different in this case.
As suspected, this was related to completion propagation. Here is a working version, including a proper link IDisposable and respect for the link options:
// A block that remembers every message it receives on two channels, and pairs every message on a channel to every message on the other channel
public class CartesianProductBlock<T1, T2> : ISourceBlock<Tuple<T1, T2>>
{
private TransformManyBlock<T1, Tuple<T1, T2>> left;
private TransformManyBlock<T2, Tuple<T1, T2>> right;
private List<T1> leftReceived = new List<T1>();
private List<T2> rightReceived = new List<T2>();
private List<ITargetBlock<Tuple<T1, T2>>> targets = new List<ITargetBlock<Tuple<T1, T2>>>();
private object lockObject = new object();
public ITargetBlock<T1> Left { get { return left; } }
public ITargetBlock<T2> Right { get { return right; } }
public CartesianProductBlock()
{
left = new TransformManyBlock<T1, Tuple<T1, T2>>(l =>
{
lock (lockObject)
{
leftReceived.Add(l);
return rightReceived.Select(r => Tuple.Create(l, r)).ToList();
}
});
right = new TransformManyBlock<T2, Tuple<T1, T2>>(r =>
{
lock(lockObject)
{
rightReceived.Add(r);
return leftReceived.Select(l => Tuple.Create(l, r)).ToList();
}
});
Task.WhenAll(Left.Completion, Right.Completion).ContinueWith(_ => {
completion.SetResult(VoidResult.Instance);
});
}
private TaskCompletionSource<VoidResult> completion = new TaskCompletionSource<VoidResult>();
public Task Completion => completion.Task;
public void Complete() {
Left.Complete();
Right.Complete();
}
public void Fault(Exception exception) { throw new NotImplementedException(); }
public IDisposable LinkTo(ITargetBlock<Tuple<T1, T2>> target, DataflowLinkOptions linkOptions)
{
var leftLink = left.LinkTo(target);
var rightLink = right.LinkTo(target);
var link = new Link(leftLink, rightLink);
Task task = Task.FromResult(0);
if (linkOptions.PropagateCompletion)
{
task = Task.WhenAny(Task.WhenAll(Left.Completion, Right.Completion), link.Completion).ContinueWith(_ =>
{
// If the link has been disposed of, we should no longer propagate completion.
if (!link.Completion.IsCompleted)
{
target.Complete();
}
});
}
return link;
}
public void ReleaseReservation(DataflowMessageHeader messageHeader, ITargetBlock<Tuple<T1, T2>> target)
{
throw new NotImplementedException();
}
public bool ReserveMessage(DataflowMessageHeader messageHeader, ITargetBlock<Tuple<T1, T2>> target)
{
throw new NotImplementedException();
}
public Tuple<T1, T2> ConsumeMessage(DataflowMessageHeader messageHeader, ITargetBlock<Tuple<T1, T2>> target, out bool messageConsumed)
{
throw new NotImplementedException();
}
private class Link : IDisposable
{
private IDisposable leftLink;
private IDisposable rightLink;
public Link(IDisposable leftLink, IDisposable rightLink)
{
this.leftLink = leftLink;
this.rightLink = rightLink;
}
private TaskCompletionSource<VoidResult> completionSource = new TaskCompletionSource<VoidResult>();
public Task Completion { get { return completionSource.Task; } }
public void Dispose()
{
leftLink.Dispose();
rightLink.Dispose();
completionSource.SetResult(VoidResult.Instance);
}
}
private class VoidResult
{
public static VoidResult instance = new VoidResult();
public static VoidResult Instance { get { return instance; } }
protected VoidResult() { }
}
}
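Besides the completion propagation, note the .ToList() calls that were added inside the locks. Returning the LINQ query directly defers its execution until the TransformManyBlock enumerates the result outside the lock, so the pairing could race with concurrent mutations of leftReceived/rightReceived; materializing the pairs while the lock is still held avoids that:
// Broken: the query runs later, outside the lock, over a list that may be mutated.
return rightReceived.Select(r => Tuple.Create(l, r));
// Fixed: snapshot the pairs while the lock is still held.
return rightReceived.Select(r => Tuple.Create(l, r)).ToList();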

BroadcastBlock with guaranteed delivery in TPL Dataflow

I have a stream of data that I process in several different ways... so I would like to send a copy of each message I get to multiple targets so that these targets may execute in parallel... however, I need to set BoundedCapacity on my blocks because the data is streamed in way faster than my targets can handle them and there is a ton of data. Without BoundedCapacity I would quickly run out of memory.
However the problem is BroadcastBlock will drop messages if a target cannot handle it (due to the BoundedCapacity).
What I need is a BroadcastBlock that will not drop messages, but will essentially refuse additional input until it can deliver messages to each target and then is ready for more.
Is there something like this, or has anybody written a custom block that behaves in this manner?
It is fairly simple to build what you're asking using ActionBlock and SendAsync(), something like:
public static ITargetBlock<T> CreateGuaranteedBroadcastBlock<T>(
IEnumerable<ITargetBlock<T>> targets)
{
var targetsList = targets.ToList();
return new ActionBlock<T>(
async item =>
{
foreach (var target in targetsList)
{
await target.SendAsync(item);
}
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
}
This is the most basic version, but extending it to support a mutable list of targets, completion propagation or a cloning function should be easy.
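As a rough, untested sketch of those extensions, completion propagation and a cloning function (mirroring the one accepted by the built-in BroadcastBlock<T>) could be added like this:
public static ITargetBlock<T> CreateGuaranteedBroadcastBlock<T>(
    IEnumerable<ITargetBlock<T>> targets, Func<T, T> cloningFunction = null)
{
    var targetsList = targets.ToList();
    var block = new ActionBlock<T>(
        async item =>
        {
            foreach (var target in targetsList)
            {
                // Wait until each target accepts its copy, so nothing is dropped.
                await target.SendAsync(
                    cloningFunction != null ? cloningFunction(item) : item);
            }
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
    // Forward completion (or failure) of the broadcaster to all targets.
    block.Completion.ContinueWith(t =>
    {
        foreach (var target in targetsList)
        {
            if (t.IsFaulted) target.Fault(t.Exception);
            else target.Complete();
        }
    }, TaskScheduler.Default);
    return block;
}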
Here is a polished version of svick's idea. The GuaranteedDeliveryBroadcastBlock class below is an (almost) complete substitute for the built-in BroadcastBlock. Linking and unlinking targets at any moment is supported.
public class GuaranteedDeliveryBroadcastBlock<T> :
ITargetBlock<T>, ISourceBlock<T>, IPropagatorBlock<T, T>
{
private class Subscription
{
public readonly ITargetBlock<T> Target;
public readonly bool PropagateCompletion;
public readonly CancellationTokenSource CancellationSource;
public Subscription(ITargetBlock<T> target,
bool propagateCompletion,
CancellationTokenSource cancellationSource)
{
Target = target;
PropagateCompletion = propagateCompletion;
CancellationSource = cancellationSource;
}
}
private readonly object _locker = new object();
private readonly Func<T, T> _cloningFunction;
private readonly CancellationToken _cancellationToken;
private readonly ITargetBlock<T> _actionBlock;
private readonly List<Subscription> _subscriptions = new List<Subscription>();
private readonly Task _completion;
private CancellationTokenSource _faultCTS
= new CancellationTokenSource(); // Is nullified on completion
public GuaranteedDeliveryBroadcastBlock(Func<T, T> cloningFunction,
DataflowBlockOptions dataflowBlockOptions = null)
{
_cloningFunction = cloningFunction
?? throw new ArgumentNullException(nameof(cloningFunction));
dataflowBlockOptions ??= new DataflowBlockOptions();
_cancellationToken = dataflowBlockOptions.CancellationToken;
_actionBlock = new ActionBlock<T>(async item =>
{
Task sendAsyncToAll;
lock (_locker)
{
var allSendAsyncTasks = _subscriptions
.Select(sub => sub.Target.SendAsync(
_cloningFunction(item), sub.CancellationSource.Token));
sendAsyncToAll = Task.WhenAll(allSendAsyncTasks);
}
await sendAsyncToAll;
}, new ExecutionDataflowBlockOptions()
{
CancellationToken = dataflowBlockOptions.CancellationToken,
BoundedCapacity = dataflowBlockOptions.BoundedCapacity,
MaxMessagesPerTask = dataflowBlockOptions.MaxMessagesPerTask,
TaskScheduler = dataflowBlockOptions.TaskScheduler,
});
var afterCompletion = _actionBlock.Completion.ContinueWith(t =>
{
lock (_locker)
{
// PropagateCompletion
foreach (var subscription in _subscriptions)
{
if (subscription.PropagateCompletion)
{
if (t.IsFaulted)
subscription.Target.Fault(t.Exception);
else
subscription.Target.Complete();
}
}
// Cleanup
foreach (var subscription in _subscriptions)
{
subscription.CancellationSource.Dispose();
}
_subscriptions.Clear();
_faultCTS.Dispose();
_faultCTS = null; // Prevent future subscriptions to occur
}
}, TaskScheduler.Default);
// Ensure that any exception in the continuation will be surfaced
_completion = Task.WhenAll(_actionBlock.Completion, afterCompletion);
}
public Task Completion => _completion;
public void Complete() => _actionBlock.Complete();
void IDataflowBlock.Fault(Exception ex)
{
_actionBlock.Fault(ex);
lock (_locker) _faultCTS?.Cancel();
}
public IDisposable LinkTo(ITargetBlock<T> target,
DataflowLinkOptions linkOptions)
{
if (linkOptions.MaxMessages != DataflowBlockOptions.Unbounded)
throw new NotSupportedException();
Subscription subscription;
lock (_locker)
{
if (_faultCTS == null) return new Unlinker(null); // Has completed
var cancellationSource = CancellationTokenSource
.CreateLinkedTokenSource(_cancellationToken, _faultCTS.Token);
subscription = new Subscription(target,
linkOptions.PropagateCompletion, cancellationSource);
_subscriptions.Add(subscription);
}
return new Unlinker(() =>
{
lock (_locker)
{
// The subscription may have already been removed
if (_subscriptions.Remove(subscription))
{
subscription.CancellationSource.Cancel();
subscription.CancellationSource.Dispose();
}
}
});
}
private class Unlinker : IDisposable
{
private readonly Action _action;
public Unlinker(Action disposeAction) => _action = disposeAction;
void IDisposable.Dispose() => _action?.Invoke();
}
DataflowMessageStatus ITargetBlock<T>.OfferMessage(
DataflowMessageHeader messageHeader, T messageValue,
ISourceBlock<T> source, bool consumeToAccept)
{
return _actionBlock.OfferMessage(messageHeader, messageValue, source,
consumeToAccept);
}
T ISourceBlock<T>.ConsumeMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T> target, out bool messageConsumed)
=> throw new NotSupportedException();
bool ISourceBlock<T>.ReserveMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T> target)
=> throw new NotSupportedException();
void ISourceBlock<T>.ReleaseReservation(DataflowMessageHeader messageHeader,
ITargetBlock<T> target)
=> throw new NotSupportedException();
}
Missing features: the IReceivableSourceBlock<T> interface is not implemented, and linking with the MaxMessages option is not supported.
This class is thread-safe.
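A possible usage example (the consumer blocks and the capacities below are arbitrary choices):
var broadcaster = new GuaranteedDeliveryBroadcastBlock<string>(
    msg => msg, // strings are immutable, so "cloning" just passes the value through
    new DataflowBlockOptions { BoundedCapacity = 10 });
var slowConsumer = new ActionBlock<string>(async msg =>
{
    await Task.Delay(100); // simulate a slow consumer
    Console.WriteLine($"slow: {msg}");
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
var fastConsumer = new ActionBlock<string>(
    msg => Console.WriteLine($"fast: {msg}"),
    new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
broadcaster.LinkTo(slowConsumer, new DataflowLinkOptions { PropagateCompletion = true });
broadcaster.LinkTo(fastConsumer, new DataflowLinkOptions { PropagateCompletion = true });
// Producers should await broadcaster.SendAsync(...); when a consumer is saturated the
// broadcaster applies backpressure instead of dropping messages like BroadcastBlock does.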

TPL DataFlow, link blocks with priority?

Using TPL Dataflow blocks, is it possible to link two or more sources to a single ITargetBlock (e.g. an ActionBlock) and prioritize the sources?
e.g.
BufferBlock<string> b1 = new ...
BufferBlock<string> b2 = new ...
ActionBlock<string> a = new ...
//somehow force messages in b1 to be processed before any message of b2, always
b1.LinkTo (a);
b2.LinkTo (a);
As long as there are messages in b1, I want those to be fed to "a", and only once b1 is empty should b2 messages be pushed into "a".
Ideas?
There is nothing like that in TPL Dataflow itself.
The simplest way I can imagine doing this yourself would be to create a structure that encapsulates three blocks: a high-priority input, a low-priority input and an output. Those blocks would be simple BufferBlocks, along with a method that forwards messages from the two inputs to the output based on priority, running in the background.
The code could look like this:
public class PriorityBlock<T>
{
private readonly BufferBlock<T> highPriorityTarget;
public ITargetBlock<T> HighPriorityTarget
{
get { return highPriorityTarget; }
}
private readonly BufferBlock<T> lowPriorityTarget;
public ITargetBlock<T> LowPriorityTarget
{
get { return lowPriorityTarget; }
}
private readonly BufferBlock<T> source;
public ISourceBlock<T> Source
{
get { return source; }
}
public PriorityBlock()
{
var options = new DataflowBlockOptions { BoundedCapacity = 1 };
highPriorityTarget = new BufferBlock<T>(options);
lowPriorityTarget = new BufferBlock<T>(options);
source = new BufferBlock<T>(options);
Task.Run(() => ForwardMessages());
}
private async Task ForwardMessages()
{
while (true)
{
await Task.WhenAny(
highPriorityTarget.OutputAvailableAsync(),
lowPriorityTarget.OutputAvailableAsync());
T item;
if (highPriorityTarget.TryReceive(out item))
{
await source.SendAsync(item);
}
else if (lowPriorityTarget.TryReceive(out item))
{
await source.SendAsync(item);
}
else
{
// both input blocks must be completed
source.Complete();
return;
}
}
}
}
Usage would look like this:
b1.LinkTo(priorityBlock.HighPriorityTarget);
b2.LinkTo(priorityBlock.LowPriorityTarget);
priorityBlock.Source.LinkTo(a);
For this to work, a also has to have BoundedCapacity set to one (or at least a very low number).
The caveat with this code is that it can introduce latency of two messages (one waiting in the output block, one waiting in SendAsync()). So, if you have a long list of low priority messages and suddenly a high priority message comes in, it will be processed only after those two low-priority messages that are already waiting.
If this is a problem for you, it can be solved. But I believe it would require more complicated code, that deals with the less public parts of TPL Dataflow, like OfferMessage().
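One more wiring detail for the PriorityBlock above: for the forwarding loop to ever observe completion (and thus complete the output), the two input buffers must eventually be completed. A sketch, using the blocks from the question, is to propagate completion through the links:
var priorityBlock = new PriorityBlock<string>();
b1.LinkTo(priorityBlock.HighPriorityTarget,
    new DataflowLinkOptions { PropagateCompletion = true });
b2.LinkTo(priorityBlock.LowPriorityTarget,
    new DataflowLinkOptions { PropagateCompletion = true });
priorityBlock.Source.LinkTo(a,
    new DataflowLinkOptions { PropagateCompletion = true });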
Here is an implementation of a PriorityBufferBlock<T> class, that propagates high priority items more frequently than low priority items. The constructor of this class has a priorityPrecedence parameter, that defines how many high priority items will be propagated for each low priority item. If this parameter has the value 1.0 (the smallest valid value), there is no real priority to speak of. If this parameter has the value Double.PositiveInfinity, no low priority item will ever be propagated as long as there are high priority items in the queue. If this parameter has a more normal value, like 5.0 for example, one low priority item will be propagated for every 5 high priority items.
This class maintains internally two queues, one for high and one for low priority items. The number of items stored in each queue is not taken into account, unless one of the two queues is empty, in which case all items of the other queue are freely propagated on demand. The priorityPrecedence parameter influences the behavior of the class only when both internal queues are non-empty. Otherwise, if only one queue has items, the PriorityBufferBlock<T> behaves like a normal BufferBlock<T>.
public class PriorityBufferBlock<T> : IPropagatorBlock<T, T>,
IReceivableSourceBlock<T>
{
private readonly IPropagatorBlock<T, int> _block;
private readonly Queue<T> _highQueue = new();
private readonly Queue<T> _lowQueue = new();
private readonly Predicate<T> _hasPriorityPredicate;
private readonly double _priorityPrecedence;
private double _priorityCounter = 0;
private object Locker => _highQueue;
public PriorityBufferBlock(Predicate<T> hasPriorityPredicate,
double priorityPrecedence,
DataflowBlockOptions dataflowBlockOptions = null)
{
ArgumentNullException.ThrowIfNull(hasPriorityPredicate);
if (priorityPrecedence < 1.0)
throw new ArgumentOutOfRangeException(nameof(priorityPrecedence));
_hasPriorityPredicate = hasPriorityPredicate;
_priorityPrecedence = priorityPrecedence;
dataflowBlockOptions ??= new();
_block = new TransformBlock<T, int>(item =>
{
bool hasPriority = _hasPriorityPredicate(item);
Queue<T> selectedQueue = hasPriority ? _highQueue : _lowQueue;
lock (Locker) selectedQueue.Enqueue(item);
return 0;
}, new()
{
BoundedCapacity = dataflowBlockOptions.BoundedCapacity,
CancellationToken = dataflowBlockOptions.CancellationToken,
MaxMessagesPerTask = dataflowBlockOptions.MaxMessagesPerTask
});
this.Completion = _block.Completion.ContinueWith(completion =>
{
Debug.Assert(this.Count == 0 || !completion.IsCompletedSuccessfully);
lock (Locker) { _highQueue.Clear(); _lowQueue.Clear(); }
return completion;
}, default, TaskContinuationOptions.ExecuteSynchronously |
TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default).Unwrap();
}
public Task Completion { get; private init; }
public void Complete() => _block.Complete();
void IDataflowBlock.Fault(Exception exception) => _block.Fault(exception);
public int Count
{
get { lock (Locker) return _highQueue.Count + _lowQueue.Count; }
}
private Queue<T> GetSelectedQueue(bool forDequeue)
{
Debug.Assert(Monitor.IsEntered(Locker));
Queue<T> selectedQueue;
if (_highQueue.Count == 0)
selectedQueue = _lowQueue;
else if (_lowQueue.Count == 0)
selectedQueue = _highQueue;
else if (_priorityCounter + 1 > _priorityPrecedence)
selectedQueue = _lowQueue;
else
selectedQueue = _highQueue;
if (forDequeue)
{
if (_highQueue.Count == 0 || _lowQueue.Count == 0)
_priorityCounter = 0;
else if (++_priorityCounter > _priorityPrecedence)
_priorityCounter -= _priorityPrecedence + 1;
}
return selectedQueue;
}
private T Peek()
{
Debug.Assert(Monitor.IsEntered(Locker));
Debug.Assert(_highQueue.Count > 0 || _lowQueue.Count > 0);
return GetSelectedQueue(false).Peek();
}
private T Dequeue()
{
Debug.Assert(Monitor.IsEntered(Locker));
Debug.Assert(_highQueue.Count > 0 || _lowQueue.Count > 0);
return GetSelectedQueue(true).Dequeue();
}
private class TargetProxy : ITargetBlock<int>
{
private readonly PriorityBufferBlock<T> _parent;
private readonly ITargetBlock<T> _realTarget;
public TargetProxy(PriorityBufferBlock<T> parent, ITargetBlock<T> target)
{
Debug.Assert(parent is not null);
_parent = parent;
_realTarget = target ?? throw new ArgumentNullException(nameof(target));
}
public Task Completion => throw new NotSupportedException();
public void Complete() => _realTarget.Complete();
void IDataflowBlock.Fault(Exception error) => _realTarget.Fault(error);
DataflowMessageStatus ITargetBlock<int>.OfferMessage(
DataflowMessageHeader messageHeader, int messageValue,
ISourceBlock<int> source, bool consumeToAccept)
{
Debug.Assert(messageValue == 0);
if (consumeToAccept) throw new NotSupportedException();
lock (_parent.Locker)
{
T realValue = _parent.Peek();
DataflowMessageStatus response = _realTarget.OfferMessage(
messageHeader, realValue, _parent, consumeToAccept);
if (response == DataflowMessageStatus.Accepted) _parent.Dequeue();
return response;
}
}
}
public IDisposable LinkTo(ITargetBlock<T> target,
DataflowLinkOptions linkOptions)
=> _block.LinkTo(new TargetProxy(this, target), linkOptions);
DataflowMessageStatus ITargetBlock<T>.OfferMessage(
DataflowMessageHeader messageHeader, T messageValue,
ISourceBlock<T> source, bool consumeToAccept)
=> _block.OfferMessage(messageHeader,
messageValue, source, consumeToAccept);
T ISourceBlock<T>.ConsumeMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T> target, out bool messageConsumed)
{
_ = _block.ConsumeMessage(messageHeader, new TargetProxy(this, target),
out messageConsumed);
if (messageConsumed) lock (Locker) return Dequeue();
return default;
}
bool ISourceBlock<T>.ReserveMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T> target)
=> _block.ReserveMessage(messageHeader, new TargetProxy(this, target));
void ISourceBlock<T>.ReleaseReservation(DataflowMessageHeader messageHeader,
ITargetBlock<T> target)
=> _block.ReleaseReservation(messageHeader, new TargetProxy(this, target));
public bool TryReceive(Predicate<T> filter, out T item)
{
if (filter is not null) throw new NotSupportedException();
if (((IReceivableSourceBlock<int>)_block).TryReceive(null, out _))
{
lock (Locker) item = Dequeue(); return true;
}
item = default; return false;
}
public bool TryReceiveAll(out IList<T> items)
{
if (((IReceivableSourceBlock<int>)_block).TryReceiveAll(out IList<int> items2))
{
T[] array = new T[items2.Count];
lock (Locker)
for (int i = 0; i < array.Length; i++)
array[i] = Dequeue();
items = array; return true;
}
items = default; return false;
}
}
Usage example:
var bufferBlock = new PriorityBufferBlock<SaleOrder>(x => x.HasPriority, 2.5);
The above implementation supports all the features of the built-in BufferBlock<T>, except for TryReceive with a non-null filter. The core functionality of the block is delegated to an internal TransformBlock<T, int>, which stores a dummy zero value for every item kept in one of the queues.
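Continuing the example, the bufferBlock could then be linked to a worker block roughly like this (ProcessOrder is a hypothetical handler). Keeping the worker's own BoundedCapacity small matters, because prioritization only applies to items still waiting inside the PriorityBufferBlock, not to items already queued inside the worker:
var worker = new ActionBlock<SaleOrder>(order => ProcessOrder(order),
    new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
bufferBlock.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });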

Why is ConcurrentDictionary.AddOrUpdate method slow?

I am working on a thread-safe multi-valued dictionary. Internally this dictionary uses a ConcurrentDictionary (.NET 4.0) with a custom linked list as the value. Items with the same key are added to the linked list. The problem is that when I use the concurrent dictionary's AddOrUpdate method (approach 1) to insert an item, the code runs a bit slower than when I use the TryGetValue method to check whether the key is present and then add or update the value manually inside a lock (approach 2). It takes around 20 seconds to insert 3 million records using the first approach, whereas the second approach takes around 9.5 seconds on the same machine (Intel i3 2nd generation, 2.2 GHz & 4 GB RAM). There must be something I am missing which I am not able to figure out yet.
I have also checked the code for ConcurrentDictionary, but it seems to do the same thing as I am doing inside a lock:
public TValue AddOrUpdate(TKey key, Func<TKey, TValue> addValueFactory, Func<TKey, TValue, TValue> updateValueFactory)
{
if (key == null) throw new ArgumentNullException("key");
if (addValueFactory == null) throw new ArgumentNullException("addValueFactory");
if (updateValueFactory == null) throw new ArgumentNullException("updateValueFactory");
TValue newValue, resultingValue;
while (true)
{
TValue oldValue;
if (TryGetValue(key, out oldValue))
//key exists, try to update
{
newValue = updateValueFactory(key, oldValue);
if (TryUpdate(key, newValue, oldValue))
{
return newValue;
}
}
else //try add
{
newValue = addValueFactory(key);
if (TryAddInternal(key, newValue, false, true, out resultingValue))
{
return resultingValue;
}
}
}
}
Here is the code for the thread-safe multi-valued dictionary (approach 2 is commented out; uncomment it to check the difference).
Update: There are also Remove, Add and other methods which I have not pasted below.
class ValueWrapper<U, V>
{
private U _key;
private V _value;
public ValueWrapper(U key, V value)
{
this._key = key;
this._value = value;
}
public U Key
{
get { return _key; }
}
public V Value
{
get { return _value; }
set { _value = value; }
}
}
class LinkNode<Type>
{
public LinkNode(Type data)
{
Data = data;
}
public LinkNode<Type> Next;
public Type Data;
}
public class SimpleLinkedList<T>
{
#region Instance Member Variables
private LinkNode<T> _startNode = null;
private LinkNode<T> _endNode = null;
private int _count = 0;
#endregion
public void AddAtLast(T item)
{
if (_endNode == null)
_endNode = _startNode = new LinkNode<T>(item);
else
{
LinkNode<T> node = new LinkNode<T>(item);
_endNode.Next = node;
_endNode = node;
}
_count++;
}
public T First
{
get { return _startNode == null ? default(T) : _startNode.Data; }
}
public int Count
{
get { return _count; }
}
}
class MultiValThreadSafeDictionary<U, T>
{
private ConcurrentDictionary<U, SimpleLinkedList<ValueWrapper<U, T>>> _internalDictionary;
private ReaderWriterLockSlim _slimLock = new ReaderWriterLockSlim();
public MultiValThreadSafeDictionary()
{
_internalDictionary = new ConcurrentDictionary<U, SimpleLinkedList<ValueWrapper<U, T>>>(2, 100);
}
public T this[U key]
{
get
{
throw new NotImplementedException();
}
set
{
/* ****Approach 1 using AddOrUpdate**** */
_internalDictionary.AddOrUpdate(key, (x) =>
{
SimpleLinkedList<ValueWrapper<U, T>> list = new SimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
//_internalDictionary[key] = list;
return list;
},
(k, existingList) =>
{
try
{
_slimLock.EnterWriteLock();
if (existingList.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
existingList.AddAtLast(vw);
}
else
existingList.First.Value = value;
return existingList;
}
finally
{
_slimLock.ExitWriteLock();
}
});
/* ****Approach 2 not using AddOrUpdate**** */
/*
try
{
_slimLock.EnterWriteLock();
SimpleLinkedList<ValueWrapper<U, T>> list;
if (!_internalDictionary.TryGetValue(key, out list))
{
list = new SimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
_internalDictionary[key] = list;
//_iterator.AddAtLast(vw);
return;
}
if (list.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
//_iterator.AddAtLast(vw);
}
else
list.First.Value = value;
}
finally
{
_slimLock.ExitWriteLock();
}
*/
}
}
}
The test code only inserts items, all with unique keys. It is as follows:
MultiValThreadSafeDictionary<string, int> testData = new MultiValThreadSafeDictionary<string, int>();
Task t1 = new Task(() =>
{
for (int i = 0; i < 1000000; i++)
{
testData[i.ToString()] = i;
}
}
);
Task t2 = new Task(() =>
{
for (int i = 1000000; i < 2000000; i++)
{
testData[i.ToString()] = i;
}
}
);
Task t3 = new Task(() =>
{
for (int i = 2000000; i < 3000000; i++)
{
testData[i.ToString()] = i;
}
}
);
Stopwatch watch = new Stopwatch();
watch.Start();
t1.Start();
t2.Start();
t3.Start();
t1.Wait();
t2.Wait();
t3.Wait();
watch.Stop();
Console.WriteLine("time taken:" + watch.ElapsedMilliseconds);
Update 1:
Based on the answer from '280Z28', I am rephrasing the question. Why do GetOrAdd and 'my' method take almost the same time, even though in my method I am taking an extra lock and also calling TryGetValue? And why does AddOrUpdate take double the time compared to GetOrAdd? Code for all of the approaches is as follows.
The GetOrAdd and AddOrUpdate methods in ConcurrentDictionary (.NET 4) have the following code:
public TValue GetOrAdd(TKey key, TValue value)
{
if (key == null) throw new ArgumentNullException("key");
TValue resultingValue;
TryAddInternal(key, value, false, true, out resultingValue);
return resultingValue;
}
public TValue AddOrUpdate(TKey key, Func<TKey, TValue> addValueFactory, Func<TKey, TValue, TValue> updateValueFactory)
{
if (key == null) throw new ArgumentNullException("key");
if (addValueFactory == null) throw new ArgumentNullException("addValueFactory");
if (updateValueFactory == null) throw new ArgumentNullException("updateValueFactory");
TValue newValue, resultingValue;
while (true)
{
TValue oldValue;
if (TryGetValue(key, out oldValue))
//key exists, try to update
{
newValue = updateValueFactory(key, oldValue);
if (TryUpdate(key, newValue, oldValue))
{
return newValue;
}
}
else //try add
{
newValue = addValueFactory(key);
if (TryAddInternal(key, newValue, false, true, out resultingValue))
{
return resultingValue;
}
}
}
}
GetOrAdd in my code is used as follows (taking 9 seconds):
SimpleLinkedList<ValueWrapper<U, T>> existingList = new SimpleLinkedList<ValueWrapper<U, T>>();
existingList = _internalDictionary.GetOrAdd(key, existingList);
try
{
_slimLock.EnterWriteLock();
if (existingList.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
existingList.AddAtLast(vw);
}
else
existingList.First.Value = value;
}
finally
{
_slimLock.ExitWriteLock();
}
AddOrUpdate is used as follows (taking 20 seconds; all adds, no updates). As described in one of the answers, this approach is not suitable for updates.
_internalDictionary.AddOrUpdate(key, (x) =>
{
SimpleLinkedList<ValueWrapper<U, T>> list = new SimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
return list;
},
(k, existingList ) =>
{
try
{
_slimLock.EnterWriteLock();
if (existingList.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
existingList.AddAtLast(vw);
}
else
existingList.First.Value = value;
return existingList;
}
finally
{
_slimLock.ExitWriteLock();
}
});
Code without GetOrAdd and AddOrUpdate is as follows (taking 9.5 seconds):
try
{
_slimLock.EnterWriteLock();
SimpleLinkedList<ValueWrapper<U, T>> list;
if (!_internalDictionary.TryGetValue(key, out list))
{
list = new SimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
_internalDictionary[key] = list;
return;
}
if (list.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
}
else
list.First.Value = value;
}
finally
{
_slimLock.ExitWriteLock();
}
You should not be using AddOrUpdate for this code. This is extremely clear because your update method never actually updates the value stored in ConcurrentDictionary - it always returns the existingList argument unchanged. Instead, you should be doing something like the following.
SimpleLinkedList<ValueWrapper<U, T>> list = _internalDictionary.GetOrAdd(key, CreateEmptyList);
// operate on list here
...
// The key parameter is ignored; it is needed so that the method group matches
// the Func<TKey, TValue> factory overload of GetOrAdd.
private static SimpleLinkedList<ValueWrapper<U, T>> CreateEmptyList(U key)
{
return new SimpleLinkedList<ValueWrapper<U, T>>();
}
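Putting the suggestion together, the indexer setter could look roughly like this (a sketch only; it reuses the types from the question and synchronizes on the list instance instead of the shared ReaderWriterLockSlim, which is an assumption):
set
{
    // For an existing key this is essentially a lock-free read; the factory only
    // runs (and the dictionary's internal lock is only taken) when the key is new.
    SimpleLinkedList<ValueWrapper<U, T>> list =
        _internalDictionary.GetOrAdd(key, CreateEmptyList);
    // The list itself is not thread-safe, so serialize mutations on it.
    lock (list)
    {
        if (list.Count == 0)
            list.AddAtLast(new ValueWrapper<U, T>(key, value));
        else
            list.First.Value = value;
    }
}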
Read operations on the dictionary are performed in a lock-free manner, as mentioned in http://msdn.microsoft.com/en-us/library/dd287191.aspx.
The implementation of AddOrUpdate uses fine-grained locking to check whether the item already exists, but when you first do the read yourself, that read is lock-free and therefore faster, and by doing so you reduce the locking required for keys that already exist.
