Here is a demo app I prepared:
using System;
using System.Collections.Concurrent;
using System.Reactive.Linq;
using System.Threading.Tasks;
class Program
{
    static void Main(string[] args)
    {
        var stored = new ConcurrentQueue<long>();
        Observable.Interval(TimeSpan.FromMilliseconds(20))
            .Subscribe(it => stored.Enqueue(it));
        var random = new Random();
        Task.Run(async () =>
        {
            while (true)
            {
                await Task.Delay((int)(random.NextDouble() * 1000));
                var currBatch = stored.ToArray();
                for (int i = 0; i < currBatch.Length; i++)
                {
                    long res;
                    stored.TryDequeue(out res);
                }
                Console.WriteLine("[" + string.Join(",", currBatch) + "]");
            }
        });
        Console.ReadLine();
    }
}
It simulates an independent consumer that fires at random intervals. In the real app, the events would come from the file system, and they might arrive in bursts.
It stores an unbounded number of events in a concurrent queue until the consumer decides to consume the gathered events.
I have a strong feeling that this code is unsafe. Is it possible to reproduce this behaviour in a purely Rx manner?
If not, can you suggest a better/safer approach?
Here you go:
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;
var producer = Observable.Interval(TimeSpan.FromMilliseconds(20));
var random = new Random();
Task.Run(async () =>
{
    // Each OnNext on notify closes the current window and opens a new one.
    var notify = new Subject<int>();
    producer.Window(() => notify)
        .SelectMany(ev => ev.ToList())
        .Subscribe(currBatch => Console.WriteLine("[" + string.Join(",", currBatch) + "]"));
    while (true)
    {
        await Task.Delay((int)(random.NextDouble() * 1000));
        notify.OnNext(1);
    }
});
Console.ReadLine();
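If you would rather keep the ConcurrentQueue than move to Rx, a simpler consumer than ToArray followed by a counted TryDequeue loop is to dequeue until TryDequeue returns false. This is a sketch for illustration, not part of the answer above; QueueDrain and DrainAll are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class QueueDrain
{
    // Removes items one at a time until the queue is empty at the moment of
    // the check. Each item TryDequeue returns is handed to exactly one caller,
    // so nothing is counted twice or skipped, even while producers keep
    // calling Enqueue concurrently. Later items simply wait for the next drain.
    public static List<T> DrainAll<T>(ConcurrentQueue<T> queue)
    {
        var batch = new List<T>();
        while (queue.TryDequeue(out var item))
        {
            batch.Add(item);
        }
        return batch;
    }
}
```

In the question's consumer loop, `var currBatch = QueueDrain.DrainAll(stored);` would replace both the ToArray call and the TryDequeue loop.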
Related
I have a code base in which multiple threads write to a ConcurrentDictionary, and every 60 seconds another thread runs, clones the main CD, clears it, and continues its work on the cloned CD. Will I miss some data if I don't use a lock while cloning and clearing the main CD? The code demonstrating the problem looks like this:
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;
using static System.Console;
// ThreadSafeLong is the asker's own type and is not shown here.
class Program
{
    static object lock_obj = new object();
    static async Task Main(string[] args)
    {
        ConcurrentDictionary<string, ThreadSafeLong> cd = new ConcurrentDictionary<string, ThreadSafeLong>();
        Func<Task> addData = () =>
        {
            return Task.Run(async () =>
            {
                var counter = 1;
                while (true)
                {
                    lock (lock_obj)
                    {
                        for (int i = 0; i < 100_000; i++)
                        {
                            cd.TryAdd($"{counter}:{i}", new ThreadSafeLong(i));
                            //WriteLine(i);
                        }
                        WriteLine($"Round {counter}");
                    }
                    counter++;
                    await Task.Delay(1_000);
                }
            });
        };
        Func<Task> writeData = () =>
        {
            return Task.Run(async () =>
            {
                while (true)
                {
                    var sw = Stopwatch.StartNew();
                    lock (lock_obj) // clone the data and prevent any other data from being added while cloning
                    {
                        var cloned = new ConcurrentDictionary<string, ThreadSafeLong>(cd);
                        cd.Clear();
                        WriteLine($"Cloned Count: {cloned.Count}");
                    }
                    sw.Stop();
                    WriteLine($"Elapsed Time: {sw.ElapsedMilliseconds}");
                    await Task.Delay(6_000);
                }
            });
        };
        await Task.WhenAll(addData(), writeData());
    }
}
PS: This might be related to the question here.
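In these cases I would replace the dictionary with a new one instead of calling Clear():
ConcurrentDictionary<string, ThreadSafeLong> cloned;
lock (lock_obj)
{
    // Swap in a fresh dictionary; writers (which also lock lock_obj) use the new one from now on.
    cloned = cd;
    cd = new ConcurrentDictionary<string, ThreadSafeLong>();
}
// Work on cloned here, outside the lock.
Because the writers take the same lock_obj, each of them either finishes writing into the old dictionary before the swap or writes into the new one after it, so nothing is lost. The lock is also held only for a reference swap rather than for a full copy.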
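The swap can be checked end to end with a small self-contained sketch. SwapDemo, TakeSnapshot and Run are hypothetical names, and long values stand in for the asker's ThreadSafeLong type. Every write and every swap take the same lock, so each key should end up in exactly one snapshot.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical demo of the swap approach: writers add under a shared lock,
// and the consumer swaps in a new dictionary instead of cloning and clearing.
public sealed class SwapDemo
{
    private readonly object _lockObj = new object();
    private ConcurrentDictionary<string, long> _cd = new ConcurrentDictionary<string, long>();

    // Replaces the live dictionary and returns the old one to the caller.
    public ConcurrentDictionary<string, long> TakeSnapshot()
    {
        lock (_lockObj)
        {
            var old = _cd;
            _cd = new ConcurrentDictionary<string, long>();
            return old;
        }
    }

    // Starts `writers` tasks that each add `perWriter` unique keys, takes
    // snapshots until they finish, and returns how many keys were collected.
    public int Run(int writers, int perWriter)
    {
        var writerTasks = Enumerable.Range(0, writers).Select(w => Task.Run(() =>
        {
            for (int i = 0; i < perWriter; i++)
            {
                lock (_lockObj)
                {
                    _cd.TryAdd($"{w}:{i}", i);
                }
            }
        })).ToArray();

        var allWritten = Task.WhenAll(writerTasks);
        int collected = 0;
        while (!allWritten.IsCompleted)
        {
            collected += TakeSnapshot().Count;
        }
        collected += TakeSnapshot().Count; // keys added after the last swap in the loop
        return collected;
    }
}
```

With `new SwapDemo().Run(4, 10_000)` every one of the 40,000 keys should be collected; none is lost between a snapshot and the next swap.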
I am trying to produce 100,000 lines of output using multiple threads, but when I check the final result string, it only has 10,000 lines.
Here is the code:
string result = "";
private void Testing()
{
    var threadA = new Thread(() => { result += A() + Environment.NewLine; });
    var threadB = new Thread(() => { result += A() + Environment.NewLine; });
    var threadC = new Thread(() => { result += A() + Environment.NewLine; });
    var threadD = new Thread(() => { result += A() + Environment.NewLine; });
    var threadE = new Thread(() => { result += A() + Environment.NewLine; });
    var threadF = new Thread(() => { result += A() + Environment.NewLine; });
    var threadG = new Thread(() => { result += A() + Environment.NewLine; });
    var threadH = new Thread(() => { result += A() + Environment.NewLine; });
    var threadI = new Thread(() => { result += A() + Environment.NewLine; });
    var threadJ = new Thread(() => { result += A() + Environment.NewLine; });
    threadA.Start();
    threadB.Start();
    threadC.Start();
    threadD.Start();
    threadE.Start();
    threadF.Start();
    threadG.Start();
    threadH.Start();
    threadI.Start();
    threadJ.Start();
    threadA.Join();
    threadB.Join();
    threadC.Join();
    threadD.Join();
    threadE.Join();
    threadF.Join();
    threadG.Join();
    threadH.Join();
    threadI.Join();
    threadJ.Join();
}
private string A()
{
    for (int i = 0; i <= 10000; i++)
    {
        result += "select * from testing" + Environment.NewLine;
    }
    return result;
}
But I don't get 100,000 lines; I only get 10,000. Can you tell me why?
Another way to do this is to forget about creating Thread objects: each one has a lot of overhead, and there are better options. Why not just use Parallel.For? It uses the thread pool, and you can set how much parallelism you want.
Also, if you are dealing with threads, you need to know how to write thread-safe code. There are many sorts of locking mechanisms, and there are structures built with thread safety in mind (Thread-Safe Collections). If ordering doesn't matter, you could easily use ConcurrentBag<T>.
This can shorten your code to:
static ConcurrentBag<string> results = new ConcurrentBag<string>();
...
private static void myTest(int count)
{
    // 10 Parallel.For iterations x 10,000 lines each = 100,000 lines in total
    for (var i = 0; i < 10_000; i++)
    {
        results.Add("select * from testing " + i * count);
    }
}
Usage
Parallel.For(0, 10, myTest);
var result = string.Join(Environment.NewLine, results);
Anyway, this wasn't intended to be a panacea for your problems or the world's best line-writing threaded masterpiece; it's just to show you that there are lots of resources for threading and many ways to do what you want.
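For example, the degree of parallelism mentioned earlier is set through ParallelOptions.MaxDegreeOfParallelism. This sketch (ParallelLines and Build are hypothetical names) produces the asker's 100,000 lines with at most four concurrent workers; the line order is not preserved because ConcurrentBag<T> is unordered.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ParallelLines
{
    // Produces iterations * linesPerIteration lines on the thread pool,
    // running at most maxDegree iterations at the same time.
    public static string Build(int iterations, int linesPerIteration, int maxDegree)
    {
        var results = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        Parallel.For(0, iterations, options, iteration =>
        {
            for (var i = 0; i < linesPerIteration; i++)
            {
                results.Add("select * from testing");
            }
        });
        return string.Join(Environment.NewLine, results);
    }
}
```

`ParallelLines.Build(10, 10_000, 4)` returns a single string with 100,000 lines.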
As I've explained in the comments, A() is not thread-safe.
If you visualise result += value; as result = result + value;, you can see that between one thread reading result and writing it back, another thread can read the (now stale) value. Whichever thread writes last overwrites the other's update, so lines are lost.
You should build each thread's contribution in a local variable (I've changed this to a StringBuilder, since it's more efficient than string concatenation), then take a lock and update the shared result field:
private readonly object _resultLock = new object();
private void A()
{
    var lines = new StringBuilder();
    for (int i = 0; i <= 10000; i++)
    {
        lines.AppendLine("select * from testing");
    }
    lock (_resultLock)
    {
        result += lines.ToString();
    }
}
Since you already have a field called "result" in the class scope, I've changed A() to return void; start each thread with new Thread(A) instead of appending A()'s return value.
It's best to lock as little as possible, since other threads have to wait to acquire the lock. We use a dedicated _resultLock object so that it's clear what the lock is for. You can read more about lock in the docs and on this question.
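Putting the pieces of this answer together, here is a self-contained sketch. LineCollector and the threadCount parameter are illustrative additions. With 10 threads and the question's i <= 10000 loop, each thread contributes 10,001 lines, so the result should contain 100,010 lines with none lost.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Threading;

public class LineCollector
{
    private readonly object _resultLock = new object();
    private string result = "";

    // Same idea as the answer's A(): build this thread's lines locally,
    // then append them to the shared field while holding the lock.
    private void A()
    {
        var lines = new StringBuilder();
        for (int i = 0; i <= 10000; i++) // 10,001 lines per thread, as in the question
        {
            lines.AppendLine("select * from testing");
        }
        lock (_resultLock)
        {
            result += lines.ToString();
        }
    }

    // Starts threadCount threads running A() and waits for all of them.
    public string Testing(int threadCount)
    {
        var threads = Enumerable.Range(0, threadCount)
            .Select(_ => new Thread(A))
            .ToArray();
        foreach (var t in threads) t.Start();
        foreach (var t in threads) t.Join();
        return result;
    }
}
```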
You might also want to look into tasks: docs, Task vs Thread question.
I want to limit the number of items posted in a Dataflow pipeline. The number of items depends on the production environment.
These objects consume a large amount of memory (images), so I would like to post a new one only when the last block of the pipeline has finished its job.
I tried to use a SemaphoreSlim to throttle the producer and release it in the last block of the pipeline. It works, but if an exception is thrown during processing, the program waits forever and the exception is never caught.
Here is a sample that resembles our code.
How can I do this?
static void Main(string[] args)
{
    SemaphoreSlim semaphore = new SemaphoreSlim(1, 2);
    var downloadString = new TransformBlock<string, string>(uri =>
    {
        Console.WriteLine("Downloading '{0}'...", uri);
        return new WebClient().DownloadString(uri);
    });
    var createWordList = new TransformBlock<string, string[]>(text =>
    {
        Console.WriteLine("Creating word list...");
        char[] tokens = text.ToArray();
        for (int i = 0; i < tokens.Length; i++)
        {
            if (!char.IsLetter(tokens[i]))
                tokens[i] = ' ';
        }
        text = new string(tokens);
        return text.Split(new char[] { ' ' },
            StringSplitOptions.RemoveEmptyEntries);
    });
    var filterWordList = new TransformBlock<string[], string[]>(words =>
    {
        Console.WriteLine("Filtering word list...");
        throw new InvalidOperationException("ouch !"); // explicit for test
        return words.Where(word => word.Length > 3).OrderBy(word => word)
            .Distinct().ToArray();
    });
    var findPalindromes = new TransformBlock<string[], string[]>(words =>
    {
        Console.WriteLine("Finding palindromes...");
        var palindromes = new ConcurrentQueue<string>();
        Parallel.ForEach(words, word =>
        {
            string reverse = new string(word.Reverse().ToArray());
            if (Array.BinarySearch<string>(words, reverse) >= 0 &&
                word != reverse)
            {
                palindromes.Enqueue(word);
            }
        });
        return palindromes.ToArray();
    });
    var printPalindrome = new ActionBlock<string[]>(palindromes =>
    {
        try
        {
            foreach (string palindrome in palindromes)
            {
                Console.WriteLine("Found palindrome {0}/{1}",
                    palindrome, new string(palindrome.Reverse().ToArray()));
            }
        }
        finally
        {
            semaphore.Release();
        }
    });
    downloadString.LinkTo(createWordList);
    createWordList.LinkTo(filterWordList);
    filterWordList.LinkTo(findPalindromes);
    findPalindromes.LinkTo(printPalindrome);
    downloadString.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted)
            ((IDataflowBlock)createWordList).Fault(t.Exception);
        else createWordList.Complete();
    });
    createWordList.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted)
            ((IDataflowBlock)filterWordList).Fault(t.Exception);
        else filterWordList.Complete();
    });
    filterWordList.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted)
            ((IDataflowBlock)findPalindromes).Fault(t.Exception);
        // we get here when the exception is thrown
        else findPalindromes.Complete();
    });
    findPalindromes.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted)
            ((IDataflowBlock)printPalindrome).Fault(t.Exception);
        // the fault is propagated here but not caught
        else printPalindrome.Complete();
    });
    try
    {
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine(i);
            downloadString.Post("http://www.google.com");
            semaphore.Wait(); // blocks forever here once the exception is thrown
        }
        downloadString.Complete();
        printPalindrome.Completion.Wait();
    }
    catch (AggregateException agg)
    {
        Console.WriteLine("An error has occurred: " + agg);
    }
    Console.WriteLine("Done");
    Console.ReadKey();
}
You should simply wait on both the semaphore and the completion task together. That way, if the block ends prematurely (by an exception or cancellation), you stop waiting, and the exception is rethrown when you later wait on printPalindrome.Completion; if not, you wait on the semaphore until there's room to post more.
You can do that with Task.WhenAny and SemaphoreSlim.WaitAsync:
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(i);
    downloadString.Post("http://www.google.com");
    if (printPalindrome.Completion.IsCompleted)
    {
        break;
    }
    Task.WhenAny(semaphore.WaitAsync(), printPalindrome.Completion).Wait();
}
Note: using Task.Wait is only appropriate here because this is Main. Usually this should be an async method, and you should await the task returned from Task.WhenAny (since C# 7.1, Main itself can be async).
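One refinement worth considering: when the completion task wins the race, the pending semaphore.WaitAsync() stays queued and will silently consume a later Release(). Here is a sketch of a helper that cancels that wait, using only SemaphoreSlim and Task; Throttle and WaitForSlotAsync are hypothetical names, not part of any library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Waits until either a semaphore slot is free or `completion` finishes,
    // whichever happens first. Returns true if a slot was acquired, false if
    // the pipeline completed (successfully, faulted, or cancelled) first.
    public static async Task<bool> WaitForSlotAsync(SemaphoreSlim semaphore, Task completion)
    {
        using var cts = new CancellationTokenSource();
        Task slot = semaphore.WaitAsync(cts.Token);
        Task first = await Task.WhenAny(slot, completion).ConfigureAwait(false);
        if (first == slot)
        {
            return true;
        }

        // The pipeline finished first: cancel the pending wait so it cannot
        // silently consume a later Release().
        cts.Cancel();
        try
        {
            await slot.ConfigureAwait(false);
            // The wait succeeded just before cancellation; hand the slot back.
            semaphore.Release();
        }
        catch (OperationCanceledException)
        {
            // Expected: the wait was cancelled before it acquired a slot.
        }
        return false;
    }
}
```

In the question's loop you would call `await Throttle.WaitForSlotAsync(semaphore, printPalindrome.Completion)` before each Post, break when it returns false, and then `await printPalindrome.Completion` to observe the pipeline's exception.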
This is how I handled throttling, allowing only 10 items in the source block at any one time. You could change this to 1. Make sure you also throttle the other blocks in the pipeline; otherwise the source block could be limited to 1 item while the next block holds many more.
var sourceBlock = new BufferBlock<string>(
    new DataflowBlockOptions
    {
        // At most 10 items can sit in the block; SendAsync waits while it is full.
        // (SingleProducerConstrained exists only on ExecutionDataflowBlockOptions
        // and has no effect on a BufferBlock, so it is omitted here.)
        BoundedCapacity = 10
    });
Then the producer does this:
sourceBlock.SendAsync("value", shutdownToken).Wait(shutdownToken);
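If you're using async/await, just await the SendAsync call. Either way, check its bool result: false means the block declined the item, for example because it has completed or faulted.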
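Combining BoundedCapacity with the result of SendAsync also addresses the original problem of the producer waiting forever. A minimal sketch, assuming the System.Threading.Tasks.Dataflow library is referenced; BoundedPipeline, RunAsync and the failAt parameter are hypothetical, and failAt only simulates the question's "ouch !" exception.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedPipeline
{
    // Sends `items` values into a bounded ActionBlock. SendAsync waits while the
    // block already holds `capacity` items, and completes with false once the
    // block has faulted, so the producer stops instead of waiting forever.
    public static async Task<int> RunAsync(int items, int capacity, int failAt)
    {
        int processed = 0;
        var worker = new ActionBlock<int>(item =>
        {
            if (item == failAt) throw new InvalidOperationException("ouch !");
            processed++; // safe: the block runs one item at a time by default
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

        for (int i = 0; i < items; i++)
        {
            if (!await worker.SendAsync(i))
            {
                break; // declined: the block has faulted (or completed)
            }
        }

        worker.Complete();
        await worker.Completion; // rethrows the exception thrown inside the block
        return processed;
    }
}
```

Because the bounded block itself is the throttle, no SemaphoreSlim is needed, and awaiting Completion is what surfaces the exception to the producer.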
I've been having some problems with ServiceStack recently. They seem to be caused by multiple threads each connecting to Redis to perform operations. If only one thread runs at a time it works fine, but with more I get several different errors. I've read elsewhere that it's best to use PooledRedisClientManager and call GetClient on it, but it's still giving me trouble. I'd like to know whether Redis is thread-safe and what steps I can take to make sure it won't break with concurrent threads.
I've created a program specifically for testing this, which is below.
class Program
{
    static IRedisClient redis = new PooledRedisClientManager(ConfigurationManager.AppSettings["RedisServer"]).GetClient();
    static void Main(string[] args)
    {
        LimitedConcurrencyLevelTaskScheduler scheduler = new LimitedConcurrencyLevelTaskScheduler(10);
        List<Task> tasks = new List<Task>();
        // Create a TaskFactory and pass it our custom scheduler.
        TaskFactory factory = new TaskFactory(scheduler);
        for (int i = 0; i < 100; i++)
        {
            int id = i; // copy the loop variable so each task sees its own value
            Task task = factory.StartNew(() => AsyncMethod1(id));
            tasks.Add(task);
        }
        Task.WaitAll(tasks.ToArray());
        for (int i = 0; i < 100; i++)
        {
            int id = i;
            Task task2 = factory.StartNew(() => AsyncMethod2(id));
            tasks.Add(task2);
        }
        Task.WaitAll(tasks.ToArray());
        Console.ReadKey();
    }
    public static void AsyncMethod1(int i)
    {
        redis.SetEntry("RedisTest" + i, "TestValue" + i);
    }
    public static void AsyncMethod2(int i)
    {
        List<string> result = redis.ScanAllKeys("RedisTest" + i).ToList();
        if (result[0] == "RedisTest" + i) Console.Out.Write("Success! " + result[0] + "\n");
        else Console.Out.Write("Failure! " + result[0] + " :(\n");
    }
}
You should not share RedisClient instances across multiple threads, as they're not thread-safe. Instead, resolve and release them from the thread-safe Redis client managers, as also mentioned in the docs.
Question: Why does using a WriteOnceBlock (or BufferBlock) to get the answer back (as a sort of callback) from another BufferBlock<Action> (the answer is sent back inside that posted Action) cause a deadlock in this code?
I thought that methods in a class can be considered messages that we send to the object (as in the original view of OOP proposed, I think, by Alan Kay). So I wrote this generic Actor class that helps convert an ordinary object into an Actor (of course there are lots of hidden loopholes here because of mutability and so on, but that's not the main concern here).
So we have these definitions:
public class Actor<T>
{
    private readonly T _processor;
    private readonly BufferBlock<Action<T>> _messageBox = new BufferBlock<Action<T>>();
    public Actor(T processor)
    {
        _processor = processor;
        Run();
    }
    public event Action<T> Send
    {
        add { _messageBox.Post(value); }
        remove { }
    }
    private async void Run()
    {
        while (true)
        {
            var action = await _messageBox.ReceiveAsync();
            action(_processor);
        }
    }
}
public interface IIdGenerator
{
    long Next();
}
Now, why does this code work:
static void Main(string[] args)
{
    var idGenerator1 = new IdInt64();
    var idServer1 = new Actor<IIdGenerator>(idGenerator1);
    const int n = 1000;
    for (var i = 0; i < n; i++)
    {
        var t = new Task(() =>
        {
            var answer = new WriteOnceBlock<long>(null);
            Action<IIdGenerator> action = x =>
            {
                var buffer = x.Next();
                answer.Post(buffer);
            };
            idServer1.Send += action;
            Trace.WriteLine(answer.Receive());
        }, TaskCreationOptions.LongRunning); // Runs on a separate new thread
        t.Start();
    }
    Console.WriteLine("press any key you like! :)");
    Console.ReadKey();
    Trace.Flush();
}
And this code does not work:
static void Main(string[] args)
{
    var idGenerator1 = new IdInt64();
    var idServer1 = new Actor<IIdGenerator>(idGenerator1);
    const int n = 1000;
    for (var i = 0; i < n; i++)
    {
        var t = new Task(() =>
        {
            var answer = new WriteOnceBlock<long>(null);
            Action<IIdGenerator> action = x =>
            {
                var buffer = x.Next();
                answer.Post(buffer);
            };
            idServer1.Send += action;
            Trace.WriteLine(answer.Receive());
        }, TaskCreationOptions.PreferFairness); // Runs and is managed by Task Scheduler
        t.Start();
    }
    Console.WriteLine("press any key you like! :)");
    Console.ReadKey();
    Trace.Flush();
}
I used different TaskCreationOptions to create the Tasks here. Maybe I have misunderstood TPL Dataflow concepts; I just started using it. (Is there a [ThreadStatic] hidden somewhere?)
The problematic issue with your code is this part: answer.Receive().
When you move it inside the action, the deadlock doesn't happen:
var t = new Task(() =>
{
    var answer = new WriteOnceBlock<long>(null);
    Action<IIdGenerator> action = x =>
    {
        var buffer = x.Next();
        answer.Post(buffer);
        Trace.WriteLine(answer.Receive());
    };
    idServer1.Send += action;
});
t.Start();
So why is that? answer.Receive(), unlike await answer.ReceiveAsync(), blocks the thread until an answer is returned. When you use TaskCreationOptions.LongRunning, each task gets its own thread, so there's no problem. Without it (TaskCreationOptions.PreferFairness is irrelevant here), the tasks run on thread pool threads, which all end up blocked waiting. The actor's Run loop also needs a pool thread to resume after each await, and the pool only adds threads gradually, so everything becomes much slower. It doesn't actually deadlock, as you can see when you use 15 instead of 1000.
There are other solutions that help illustrate the problem:
Increasing the thread pool's minimum thread count with ThreadPool.SetMinThreads(1000, 0); before the original code.
Using ReceiveAsync:
Task.Run(async () =>
{
    var answer = new WriteOnceBlock<long>(null);
    Action<IIdGenerator> action = x =>
    {
        var buffer = x.Next();
        answer.Post(buffer);
    };
    idServer1.Send += action;
    Trace.WriteLine(await answer.ReceiveAsync());
});
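If you want the request/reply style without blocking any thread, one alternative (a swapped-in technique, not what this answer uses) is to give each request its own TaskCompletionSource instead of a WriteOnceBlock, and to use an ActionBlock as the mailbox instead of an async void loop. AsyncActor, Ask and Counter are hypothetical names (Counter stands in for the asker's IdInt64), and the sketch assumes System.Threading.Tasks.Dataflow is referenced.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical alternative to the question's Actor<T>: an ActionBlock is the
// mailbox and each request carries its own TaskCompletionSource for the reply.
public class AsyncActor<T>
{
    private readonly ActionBlock<Action<T>> _mailbox;

    public AsyncActor(T processor)
    {
        // An ActionBlock processes one message at a time by default, so the
        // processor is never used by two messages concurrently.
        _mailbox = new ActionBlock<Action<T>>(message => message(processor));
    }

    // Queues a request and returns a task that completes with its reply.
    // The caller can await it instead of blocking a thread on Receive().
    public Task<TResult> Ask<TResult>(Func<T, TResult> request)
    {
        var reply = new TaskCompletionSource<TResult>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        Action<T> message = p =>
        {
            try { reply.SetResult(request(p)); }
            catch (Exception ex) { reply.SetException(ex); }
        };
        _mailbox.Post(message);
        return reply.Task;
    }
}

// Hypothetical stand-in for the asker's IdInt64. It is not thread-safe on
// its own; the actor serializes the calls.
public class Counter
{
    private long _next;
    public long Next() => ++_next;
}
```

Callers then write `long id = await actor.Ask(c => c.Next());`, so 1,000 concurrent requests do not tie up 1,000 blocked threads.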