I have a regular Queue object in C# (4.0) and I'm using BackgroundWorkers that access this Queue.
The code I was using is as follows:
do
{
while (dataQueue.Count == 0 // nothing waiting yet (Queue<T>.Peek() throws on an empty queue, it never returns null)
&& isBeingLoaded == true // and worker 1 still actively adding stuff
)
System.Threading.Thread.Sleep(100);
// otherwise ready to do something:
if (dataQueue.Count > 0) // because maybe the queue is complete and also empty
{
string companyId = dataQueue.Dequeue();
processLists(companyId);
// use up the stuff here //
} // otherwise nothing was there yet, it will resolve on the next loop.
} while (isBeingLoaded == true // still have stuff coming at us
|| dataQueue.Count > 0); // still have stuff we haven’t done
However, I guess that when dealing with threads I should be using a ConcurrentQueue.
I was wondering if there are examples of how to use a ConcurrentQueue in a do-while loop like the one above.
Everything I tried with TryPeek wasn't working.
Any ideas?
You can use a BlockingCollection<T> as a producer-consumer queue.
My answer makes some assumptions about your architecture, but you can probably mold it as you see fit:
public void Producer(BlockingCollection<string> ids)
{
// assuming this.CompanyRepository exists
foreach (var id in this.CompanyRepository.GetIds())
{
ids.Add(id);
}
ids.CompleteAdding(); // nothing left for our workers
}
public void Consumer(BlockingCollection<string> ids)
{
while (true)
{
string id = null;
try
{
id = ids.Take();
} catch (InvalidOperationException) {
}
if (id == null) break;
processLists(id);
}
}
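A variant of the consumer above, sketched under the assumption that you can use GetConsumingEnumerable(): the loop ends by itself once CompleteAdding() has been called and the collection is drained, so the try/catch around Take() is no longer needed. The class ConsumerDemo, its Run helper, and the process callback (standing in for processLists) are illustrative names, not part of the original answer. Here the producer fills the collection while the two consumers are already running.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConsumerDemo
{
    // Drains ids until the producer calls CompleteAdding() and the collection is empty.
    public static void Consumer(BlockingCollection<string> ids, Action<string> process)
    {
        foreach (string id in ids.GetConsumingEnumerable())
        {
            process(id); // stand-in for processLists(id)
        }
    }

    // Starts two consumers, then produces; returns every id that was processed.
    public static List<string> Run(IEnumerable<string> source)
    {
        var ids = new BlockingCollection<string>();
        var processed = new ConcurrentBag<string>();

        Task[] consumers =
        {
            Task.Factory.StartNew(() => Consumer(ids, processed.Add), TaskCreationOptions.LongRunning),
            Task.Factory.StartNew(() => Consumer(ids, processed.Add), TaskCreationOptions.LongRunning)
        };

        foreach (string id in source) ids.Add(id);
        ids.CompleteAdding(); // lets GetConsumingEnumerable() finish once drained

        Task.WaitAll(consumers);
        return new List<string>(processed);
    }
}
```

Each id is handed to exactly one consumer, so the result contains every input once, in no guaranteed order.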
You could spin up as many consumers as you need:
var companyIds = new BlockingCollection<string>();
Producer(companyIds);
Action process = () => Consumer(companyIds);
// 2 workers
Parallel.Invoke(process, process);
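If you would rather keep a plain ConcurrentQueue<T> and the do-while shape from the question, TryDequeue replaces the Peek/Dequeue pair: it checks for an item and removes it in one atomic call, so two workers can never dequeue the same item. A minimal sketch, assuming the producer clears a volatile isBeingLoaded flag after its last Enqueue; QueueWorker, Produce and Consume are hypothetical names, and the counter stands in for processLists.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public class QueueWorker
{
    private readonly ConcurrentQueue<string> dataQueue = new ConcurrentQueue<string>();
    private volatile bool isBeingLoaded = true; // cleared by the producer after its last Enqueue

    public void Produce(int count)
    {
        for (int i = 0; i < count; i++) dataQueue.Enqueue("company" + i);
        isBeingLoaded = false;
    }

    // Returns how many items this worker handled.
    public int Consume()
    {
        int handled = 0;
        string companyId;
        do
        {
            // TryDequeue tests and removes in one atomic call, so no Peek is needed.
            if (dataQueue.TryDequeue(out companyId))
            {
                handled++; // stand-in for processLists(companyId)
            }
            else
            {
                Thread.Sleep(100); // nothing waiting yet; back off before retrying
            }
        } while (isBeingLoaded || !dataQueue.IsEmpty);
        return handled;
    }
}
```

Because the flag is only cleared after the last Enqueue, a worker that sees isBeingLoaded == false and an empty queue can safely stop. BlockingCollection<T> remains the simpler option, since it replaces the polling Sleep with a blocking wait.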
Related
I have one thread that receives messages every 10 seconds and another one that writes these messages to the database every minute.
Each message comes from one of several senders, identified by serialNumber in my case.
Therefore, I created a ConcurrentDictionary like below.
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
The key of the dictionary is serialNumber and the value is the collection of one minute's worth of messages. The reason I collect a minute of data is that, instead of going to the database every 10 seconds, I go once every minute, which cuts the number of database round trips by a factor of six.
public class ShotManager
{
private const int SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER = 25000;
private bool ACTIVE_FILE_DB_SHOOT_THREAD = false;
private List<Devices> _devices = new List<Devices>();
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
public ShotManager()
{
ACTIVE_FILE_DB_SHOOT_THREAD = Utility.GetAppSettings("AppConfig", "0", "ACTIVE_LIST_DB_SHOOT") == "1";
init();
}
private void init()
{
using (iotemplaridbContext dbContext = new iotemplaridbContext())
_devices = (from d in dbContext.Devices select d).ToList();
if (_dicAllPackets is null)
_dicAllPackets = new ConcurrentDictionary<string, ConcurrentQueue<PacketModel>>();
foreach (var device in _devices)
{
if(!_dicAllPackets.ContainsKey(device.SerialNumber))
_dicAllPackets.TryAdd(device.SerialNumber, new ConcurrentQueue<PacketModel> { });
}
}
public void Spinner()
{
while (ACTIVE_FILE_DB_SHOOT_THREAD)
{
try
{
Parallel.ForEach(_dicAllPackets, devicePacket =>
{
Thread.Sleep(100);
readAndShot(devicePacket);
});
Thread.Sleep(SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER);
//init();
}
catch (Exception ex)
{
//init();
tLogger.EXC("Spinner exception for write...", ex);
}
}
}
public void EnqueueObjectToQueue(string serialNumber, PacketModel model)
{
if (_dicAllPackets != null)
{
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
}
}
private void readAndShot(KeyValuePair<string, ConcurrentQueue<PacketModel>> keyValuePair)
{
StringBuilder sb = new StringBuilder();
if (keyValuePair.Value.Count() <= 0)
{
return;
}
sb.AppendLine($"INSERT INTO ......) VALUES(");
//the reason I don't use while (TryDequeue(out ...)) {...} is that items are constantly being enqueued, so the thread would be occupied with a single device for too long
for (int i = 0; i < 10; i++)
{
keyValuePair.Value.TryDequeue(out PacketModel packet);
if (packet != null)
{
/*
*** do something and fill the sb...
*/
}
else
{
Console.WriteLine("No packet found! For Device: " + keyValuePair.Key);
break;
}
}
insertIntoDB(sb.ToString()[..(sb.Length - 5)] + ";");
}
}
EnqueueObjectToQueue is called from a different class, like below.
private void packetToDictionary(string serialNumber, string jsonPacket, string messageTimeStamp)
{
PacketModel model = new PacketModel {
MachineData = jsonPacket,
DataInsertedAt = messageTimeStamp
};
_shotManager.EnqueueObjectToQueue(serialNumber, model);
}
I call the above function from the message handler itself:
private void messageReceiveHandler(object sender, MessageReceviedEventArgs e){
//do something...parse from e and call the func
string jsonPacket = ""; //something parsed from e
string serialNumber = ""; //something parsed from e
string message_timestamp = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"); // DateTime.Now is a property, not a method
ThreadPool.QueueUserWorkItem(state => packetToDictionary(serialNumber, jsonPacket, message_timestamp));
}
The problem is that sometimes packets are enqueued under the wrong serialNumber, or are repeated (duplicate entries).
Is it a good idea to use a ConcurrentQueue inside a ConcurrentDictionary like this?
No, it's not a good idea to use a ConcurrentDictionary with nested ConcurrentQueues as values. It's impossible to update this structure atomically. Take this for example:
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
This little piece of code is riddled with race conditions. A thread that is running this code can be intercepted by another thread at any point between the ContainsKey, TryAdd, the [] indexer and the Enqueue invocations, altering the state of the structure, and invalidating the conditions on which the correctness of the current thread's work is based.
A ConcurrentDictionary is a good idea when you have a simple Dictionary that contains immutable values, you want to use it concurrently, and using a lock around each access could potentially create significant contention. You can read more about this here: When should I use ConcurrentDictionary and Dictionary?
My suggestion is to switch to a simple Dictionary<string, Queue<PacketModel>> and synchronize it with a lock. If you are careful to avoid doing anything irrelevant while holding the lock, the lock will be released so quickly that other threads will rarely be blocked by it. Use the lock just to protect the reading and updating of a specific entry of the structure, and nothing else.
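A minimal sketch of that suggestion, assuming the database writer takes packets in per-device batches as readAndShot does. PacketStore, DequeueBatch and the trimmed-down PacketModel are hypothetical names. Looking up (or creating) the queue and enqueuing happen under the same lock, and the slow INSERT would run after DequeueBatch has released it.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the question's PacketModel.
public class PacketModel
{
    public string MachineData { get; set; }
    public string DataInsertedAt { get; set; }
}

public class PacketStore
{
    private readonly Dictionary<string, Queue<PacketModel>> _packets =
        new Dictionary<string, Queue<PacketModel>>();
    private readonly object _sync = new object();

    // Look-up-or-create and enqueue happen under one lock, so together they are atomic.
    public void Enqueue(string serialNumber, PacketModel model)
    {
        lock (_sync)
        {
            Queue<PacketModel> queue;
            if (!_packets.TryGetValue(serialNumber, out queue))
            {
                queue = new Queue<PacketModel>();
                _packets.Add(serialNumber, queue);
            }
            queue.Enqueue(model);
        }
    }

    // Removes up to maxCount packets for one device; write them to the database after this returns.
    public List<PacketModel> DequeueBatch(string serialNumber, int maxCount)
    {
        var batch = new List<PacketModel>();
        lock (_sync)
        {
            Queue<PacketModel> queue;
            if (_packets.TryGetValue(serialNumber, out queue))
            {
                while (batch.Count < maxCount && queue.Count > 0)
                    batch.Add(queue.Dequeue());
            }
        }
        return batch;
    }
}
```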
Alternative designs
A ConcurrentDictionary<string, Queue<PacketModel>> structure might be a good option, on condition that you never remove queues from the dictionary. Otherwise there is still room for race conditions to occur. You should use only the GetOrAdd method to atomically get or add a queue in the dictionary, and always use the queue itself as a locker before doing anything with it (either reading or writing):
Queue<PacketModel> queue = _dicAllPackets
.GetOrAdd(serialNumber, _ => new Queue<PacketModel>());
lock (queue)
{
queue.Enqueue(model);
}
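The reading side has to take the same lock. A sketch of both halves under this design, assuming (as stated above) that queues are never removed from the dictionary; DeviceQueues and TakeBatch are hypothetical names and PacketModel is reduced to one property. GetOrAdd may run its factory more than once under contention, but it always returns the single queue that ended up stored, so every thread locks the same object.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public class PacketModel { public string MachineData { get; set; } }

public class DeviceQueues
{
    // Queues are added but never removed, which is what keeps this design safe.
    private readonly ConcurrentDictionary<string, Queue<PacketModel>> _dicAllPackets =
        new ConcurrentDictionary<string, Queue<PacketModel>>();

    public void Enqueue(string serialNumber, PacketModel model)
    {
        Queue<PacketModel> queue = _dicAllPackets.GetOrAdd(serialNumber, _ => new Queue<PacketModel>());
        lock (queue) // the queue itself guards its own contents
        {
            queue.Enqueue(model);
        }
    }

    public List<PacketModel> TakeBatch(string serialNumber, int maxCount)
    {
        var batch = new List<PacketModel>();
        Queue<PacketModel> queue;
        if (!_dicAllPackets.TryGetValue(serialNumber, out queue)) return batch;
        lock (queue) // same lock object as Enqueue uses
        {
            while (batch.Count < maxCount && queue.Count > 0)
                batch.Add(queue.Dequeue());
        }
        return batch;
    }
}
```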
Using a ConcurrentDictionary<string, ImmutableQueue<PacketModel>> is also possible, because in this case the values of the ConcurrentDictionary are immutable and you won't need to lock anything. You must always use the AddOrUpdate method, so that each update of the dictionary is a single call, performed as an atomic operation.
_dicAllPackets.AddOrUpdate
(
serialNumber,
key => ImmutableQueue.Create<PacketModel>(model),
(key, queue) => queue.Enqueue(model)
);
The queue.Enqueue(model) call inside the updateValueFactory delegate does not mutate the queue. Instead it creates a new ImmutableQueue<PacketModel> and discards the previous one. The immutable collections are not very efficient in general. But if your goal is to minimize the contention between threads, at the cost of increasing the work that each thread has to do, then you might find them useful.
I have this piece of code in C#:
Thread.BeginCriticalRegion();
if(visitedUrls.Contains(url) || visitedUrls.Where( x => x.Contains(root) ).Count() > 150) {
return;
}
else{
visitedUrls.Add(url);
}
Thread.EndCriticalRegion();
This code is inside a function that gets called by several different threads, and that's why I (tried to) make it thread-safe.
The exception "Collection was modified; enumeration operation may not execute." is raised on the if line, but if I leave it as
if (visitedUrls.Contains(url))
it works fine. Why?
EDIT
This is the actual code:
public void scrapAzienda(String url, String root_url, int depth)
{
if (depth <= 0) return;
var web = new HtmlWeb();
HtmlNode[] nodes = null;
HtmlDocument doc = null;
HtmlNode bodyNode = null;
Thread.BeginCriticalRegion();
if (urlVisitati.Contains(url) || urlVisitati.Where(x => x.Contains(root_url)).Count() > 150)
return;
else
urlVisitati.Add(url);
Thread.EndCriticalRegion();
try
{
doc = web.Load(url, Proxy.getUrl(), Proxy.getPort(), Proxy.getUsername(), Proxy.getPassword());
nodes = doc.DocumentNode.SelectNodes("//a[@href]")?.ToArray() ?? new HtmlNode[0]; // SelectNodes returns null when nothing matches
foreach (HtmlNode item in nodes)
{
Task.Factory.StartNew(() => scrapAzienda(item.Attributes["href"].Value, root_url, depth - 1), TaskCreationOptions.AttachedToParent);
}
GC.Collect();
if (doc != null)
{
bodyNode = doc.DocumentNode.SelectSingleNode("//body");
cercaNumeri(bodyNode.InnerText, url);
cercaEmail(bodyNode.InnerText, url);
}
}
catch (Exception) { }
}
Basically it's just a web scraper.
I think threading is the entirety of your issue. Thread.BeginCriticalRegion() doesn't do what you think it does. From the docs:
Notifies a host that execution is about to enter a region of code in which the effects of a thread abort or unhandled exception might jeopardize other tasks in the application domain.
In other words, it doesn't enforce thread-safety, it just says "if this breaks, it's gonna take everything down with it!"
What you need instead is a basic lock:
lock(someObj)
{
if (urlVisitati.Contains(url) || urlVisitati.Where(x => x.Contains(root_url)).Count() > 150)
{
return;
}
else
{
urlVisitati.Add(url);
}
}
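someObj must be the same object for every thread that touches urlVisitati: if the list is a static field, make the lock object static too; if it is an instance field, an instance-level lock object is enough, as long as all threads go through the same instance. I usually create a basic object for this purpose, at the class level: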
private static readonly object SyncLock = new object();
You then use lock with that object: lock(SyncLock). You can also lock on the list itself; either way, only one thread can hold the lock on a given object at a time. To help prevent deadlocks, your code should ideally be the only source of locks on whatever object you're syncing on. Can you guarantee that nothing inside the list class itself will take a lock on the list? Don't worry about it; make your own sync object. No big deal.
This is a crash-course into threading with lock. There are other ways of doing this. This one should work for you.
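Put together, the lock-based version of the check from the question might look like the sketch below. VisitTracker, TryMarkVisited and maxPerRoot are hypothetical names. The lock object is declared next to the list it guards, so every thread using the same tracker shares it, and the Contains/Count check plus the Add happen as one atomic step. Count(predicate) still enumerates the list, but that is now safe because no Add can run at the same time.

```csharp
using System.Collections.Generic;
using System.Linq;

public class VisitTracker
{
    private readonly object _sync = new object(); // guards urlVisitati and nothing else
    private readonly List<string> urlVisitati = new List<string>();

    // Returns true if the caller should scrape url; check and Add happen under one lock.
    public bool TryMarkVisited(string url, string rootUrl, int maxPerRoot)
    {
        lock (_sync)
        {
            if (urlVisitati.Contains(url) || urlVisitati.Count(x => x.Contains(rootUrl)) > maxPerRoot)
                return false;
            urlVisitati.Add(url);
            return true;
        }
    }
}
```

In scrapAzienda the guarded block would become `if (!tracker.TryMarkVisited(url, root_url, 150)) return;`, and the Thread.BeginCriticalRegion()/EndCriticalRegion() calls can go.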
I am building a web-scraping project.
I have two queues:
private ConcurrentQueue<string> links = new ConcurrentQueue<string>();
private ConcurrentQueue<string> Visitedlinks = new ConcurrentQueue<string>();
One holds all the links that I find on a page, and the other holds all the links I have already scraped.
The method that handles the work:
public async Task GetUrlContent(string url)
{
var page = string.Empty;
try
{
page = await service.Get(url);
if (page != string.Empty)
{
Regex regex = new Regex(@"<a[^>]*?href\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>",
RegexOptions.Singleline | RegexOptions.CultureInvariant);
if (regex.IsMatch(page))
{
Console.WriteLine("Downloading url: " + url);
for (int i = 0; i < regex.Matches(page).Count; i++)
{
if (regex.Matches(page)[i].Groups[1].Value.StartsWith("/"))
{
if (!links.Contains(BaseUrl + regex.Matches(page)[i].Groups[1].Value.ToLower().Replace(".html", "")) &&
!Visitedlinks.Contains(BaseUrl + regex.Matches(page)[i].Groups[1].Value.ToLower()))
{
Uri ValidUri = GetUrl(regex.Matches(page)[i].Groups[1].Value);
if (ValidUri != null && HostUrls.Contains(ValidUri.Host))
links.Enqueue(regex.Matches(page)[i].Groups[1].Value.ToLower().Replace(".html", ""));
else
links.Enqueue(BaseUrl + regex.Matches(page)[i].Groups[1].Value.ToLower().Replace(".html", ""));
}
}
}
}
var results = links.Where(m => !Visitedlinks.Contains(m)); // problem here: returns the same links multiple times
if (!results.Any())
{
// do nothing
}
else
{
Parallel.ForEach(results, new ParallelOptions { MaxDegreeOfParallelism = 4 },
webpage =>
{
if (ValidUrl(webpage))
{
if (!Visitedlinks.Contains(webpage))
{
Visitedlinks.Enqueue(webpage);
GetUrlContent(webpage).Wait();
}
}
});
}
}
}
catch (Exception e)
{
throw;
}
}
The problem is here:
var results = links.Where(m => !Visitedlinks.Contains(m));
In the first iteration I might get:
Link1, Link2, Link3, Link4
Second iteration:
Link2, Link3, Link4, Link5, Link6, Link7
Third:
Link3, Link4, Link5, Link6, etc.
This means that I get the same links multiple times, since Parallel.ForEach runs several operations at once. I can't figure out how to make sure that I don't get duplicate values.
Anyone that can lend a helping hand?
If I understand correctly, the first queue contains the links you want to scrape, and the second queue contains the ones you have scraped.
The problem is that you're trying to iterate over the contents of your ConcurrentQueue:
var results = links.Where(m => !Visitedlinks.Contains(m));
This won't work predictably if you're accessing these queues from multiple threads.
What you should do is take items out of the queue and process them. What stands out is that TryDequeue doesn't appear anywhere in your code. Items are going into the queue but never coming out. The whole purpose of a queue is that we put things in and take them out. ConcurrentQueue makes it safe for multiple threads to put items in and take them out without stepping all over each other.
If you dequeue a link that you want to process:
string linkToProcess = null;
if(links.TryDequeue(out linkToProcess)) // if this returns false, the queue was empty
{
// process it
}
Then as soon as you've taken an item out of the queue to process it, it won't be in the queue anymore. Other threads don't have to check to see if an item has been processed. They just take the next item out of the queue, if there is one. Two threads won't ever take the same item out of the queue. Only one thread can take a given item out of the queue, because as soon as it does, the item isn't in the queue anymore.
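Combined with a record of the links already claimed, this gives each link to exactly one worker. A sketch, assuming a ConcurrentDictionary<string, byte> is used as a thread-safe set (TryAdd succeeds for only one caller per key) and a hypothetical getLinks delegate stands in for downloading a page and extracting its links. Crawler and Crawl are illustrative names; the pending counter lets the workers stop once every queued link has been processed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Crawler
{
    // getLinks is a hypothetical stand-in for "download the page and extract its links".
    public static ICollection<string> Crawl(string start, Func<string, IEnumerable<string>> getLinks, int workers)
    {
        var links = new ConcurrentQueue<string>();
        var visited = new ConcurrentDictionary<string, byte>(); // used as a thread-safe set
        int pending = 1; // links enqueued but not yet fully processed
        visited.TryAdd(start, 0);
        links.Enqueue(start);

        Action worker = () =>
        {
            while (Volatile.Read(ref pending) > 0)
            {
                string link;
                if (!links.TryDequeue(out link)) { Thread.Sleep(10); continue; }
                foreach (string next in getLinks(link))
                {
                    // TryAdd succeeds for exactly one caller, so each link is queued only once.
                    if (visited.TryAdd(next, 0))
                    {
                        Interlocked.Increment(ref pending);
                        links.Enqueue(next);
                    }
                }
                Interlocked.Decrement(ref pending); // this link is finished
            }
        };

        var tasks = new Task[workers];
        for (int i = 0; i < workers; i++)
            tasks[i] = Task.Factory.StartNew(worker, TaskCreationOptions.LongRunning);
        Task.WaitAll(tasks);
        return visited.Keys;
    }
}
```

Children are counted before their parent is marked finished, so pending can only reach zero when nothing is left in the queue or in progress.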
Thanks to @Scott Hannen
The final solution is as follows:
Parallel.ForEach(links, new ParallelOptions { MaxDegreeOfParallelism = 25 },
webpage =>
{
try
{
if (WebPageValidator.ValidUrl(webpage))
{
string linkToProcess = webpage;
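// TryDequeue overwrites linkToProcess with whichever item is at the head of the queue.
if (links.TryDequeue(out linkToProcess) && !Visitedlinks.Contains(linkToProcess))
{
Task obj = Scrape(linkToProcess); // not awaited: the scrape continues in the background
Visitedlinks.Enqueue(linkToProcess);
}
}
}
catch (Exception e)
{
log.Error("Error occurred: " + e.Message);
Console.WriteLine("Error occurred, check log for further details.");
}
});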
I came across a back-pressure issue with Rx.NET that I can't find a solution for. I have an observable real-time stream of log messages.
var logObservable = /* Observable stream of log messages */
I want to expose this stream via a TCP interface, which serializes the real-time log messages from logObservable before they are sent over the wire. So I do the following:
foreach (var message in logObservable.ToEnumerable())
{
// 1. Serialize message
// 2. Send it over the wire.
}
The problem arises with .ToEnumerable() when a back-pressure scenario happens, e.g. if the client on the other end pauses the stream: .ToEnumerable() caches the items, which results in a lot of memory usage. I'm looking for a mechanism like a DropQueue that only buffers, let's say, the last 10 messages, e.g.
var observableStream = logObservable.DropQueue(10).ToEnumerable();
Is this the right way to solve this issue? And do you know how to implement such a mechanism to avoid possible back-pressure issues?
My DropQueue implementation:
public static IEnumerable<TSource> ToDropQueue<TSource>(
this IObservable<TSource> source,
int queueSize,
Action backPressureNotification = null,
CancellationToken token = default(CancellationToken))
{
var queue = new BlockingCollection<TSource>(new ConcurrentQueue<TSource>(), queueSize);
var isBackPressureNotified = false;
var subscription = source.Subscribe(
item =>
{
var isBackPressure = queue.Count == queue.BoundedCapacity;
if (isBackPressure)
{
TSource dropped;
queue.TryTake(out dropped); // drop the oldest item; unlike Take(), TryTake() cannot block if the consumer emptied the queue in the meantime
// Fire back-pressure notification if defined
if (!isBackPressureNotified && backPressureNotification != null)
{
backPressureNotification();
isBackPressureNotified = true;
}
}
else
{
isBackPressureNotified = false;
}
queue.Add(item);
},
exception => queue.CompleteAdding(),
() => queue.CompleteAdding());
token.Register(() => { subscription.Dispose(); });
using (new CompositeDisposable(subscription, queue))
{
foreach (var item in queue.GetConsumingEnumerable())
{
yield return item;
}
}
}
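The core drop-oldest behaviour can also be expressed without Rx. A sketch, assuming a single lock around a Queue<T> is acceptable for your message rate; DropOldestBuffer<T> and its Dropped counter are hypothetical names. Because the capacity check, the discard and the enqueue happen under one lock, there is no gap between checking Count and removing an item. The blocking consumer side (what GetConsumingEnumerable() provides in the implementation above) is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;

// A bounded FIFO that discards the oldest item when full instead of blocking the producer.
public class DropOldestBuffer<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _sync = new object();
    private readonly int _capacity;

    public DropOldestBuffer(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
    }

    public int Dropped { get; private set; } // number of items discarded so far

    public void Add(T item)
    {
        lock (_sync)
        {
            if (_items.Count == _capacity)
            {
                _items.Dequeue(); // make room by throwing away the oldest message
                Dropped++;
            }
            _items.Enqueue(item);
        }
    }

    public bool TryTake(out T item)
    {
        lock (_sync)
        {
            if (_items.Count > 0) { item = _items.Dequeue(); return true; }
            item = default(T);
            return false;
        }
    }
}
```

With a capacity of 10, adding the items 0 to 14 leaves 5 to 14 in the buffer and reports 5 dropped items.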
I'm building a multithreaded app in .NET.
I have a thread that listens to a connection (abstract, serial, TCP...).
When it receives a new message, it adds it via AddMessage, which then calls startSpool. startSpool checks whether the spool is already running; if it is, it returns, otherwise it starts the spool in a new thread. The reason for this is that the messages HAVE to be processed serially, FIFO.
So, my questions are:
Am I going about this the right way?
Are there better, faster, cheaper patterns out there?
My apologies if there is a typo in my code; I was having problems copying and pasting.
ConcurrentQueue<IMyMessage> messages = new ConcurrentQueue<IMyMessage>();
const int maxSpoolInstances = 1;
object lcurrentSpoolInstances = new object(); // must be initialized: lock(null) throws ArgumentNullException
int currentSpoolInstances = 0;
Thread spoolThread;
public void AddMessage(IMyMessage message)
{
this.messages.Enqueue(message); // ConcurrentQueue<T> has Enqueue, not Add
this.startSpool();
}
private void startSpool()
{
bool run = false;
lock (lcurrentSpoolInstances)
{
if (currentSpoolInstances <= maxSpoolInstances)
{
this.currentSpoolInstances++;
run = true;
}
else
{
return;
}
}
if (run)
{
this.spoolThread = new Thread(new ThreadStart(spool));
this.spoolThread.Start();
}
}
private void spool()
{
IMyMessage message;
while (this.messages.Count > 0)
{
// TODO: Is this below line necessary or does the TryDequeue cover this?
message = null;
this.messages.TryDequeue(out message);
if (message != null)
{
// My long running thing that does something with this message.
}
}
lock (lcurrentSpoolInstances)
{
this.currentSpoolInstances--;
}
}
This would be easier using BlockingCollection<T> instead of ConcurrentQueue<T>.
Something like this should work:
class MessageProcessor : IDisposable
{
BlockingCollection<IMyMessage> messages = new BlockingCollection<IMyMessage>();
public MessageProcessor()
{
// Started from the constructor, which prevents the race condition in the existing code (it could start multiple threads).
Task.Factory.StartNew(this.Spool, TaskCreationOptions.LongRunning);
}
public void AddMessage(IMyMessage message)
{
this.messages.Add(message);
}
private void Spool()
{
foreach(IMyMessage message in this.messages.GetConsumingEnumerable())
{
// long running thing that does something with this message.
}
}
public void FinishProcessing()
{
// This will tell the spooling you're done adding, so it shuts down
this.messages.CompleteAdding();
}
void IDisposable.Dispose()
{
this.FinishProcessing();
}
}
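One gap in the class above is that FinishProcessing() only stops new messages from being added; nothing waits for the spool to empty. A variant sketch that keeps the consumer Task and waits for it in Dispose; FifoProcessor<T> and the handle callback are hypothetical names, not Reed's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class FifoProcessor<T> : IDisposable
{
    private readonly BlockingCollection<T> _messages = new BlockingCollection<T>();
    private readonly Action<T> _handle;
    private readonly Task _consumer;

    public FifoProcessor(Action<T> handle)
    {
        _handle = handle;
        // A single consumer keeps processing strictly FIFO.
        _consumer = Task.Factory.StartNew(Spool, TaskCreationOptions.LongRunning);
    }

    public void AddMessage(T message) { _messages.Add(message); }

    private void Spool()
    {
        foreach (T message in _messages.GetConsumingEnumerable())
            _handle(message);
    }

    // Stops accepting messages, then waits until everything already queued has been handled.
    public void Dispose()
    {
        _messages.CompleteAdding();
        _consumer.Wait();
        _messages.Dispose();
    }
}
```

Wrapping the processor in a using block therefore guarantees that every message added inside the block has been handled, in order, by the time the block exits.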
Edit: If you wanted to support multiple consumers, you could handle that via a separate constructor. I'd refactor this to:
public MessageProcessor(int numberOfConsumers = 1)
{
for (int i=0;i<numberOfConsumers;++i)
StartConsumer();
}
private void StartConsumer()
{
// Called from the constructor, which prevents the race condition in the existing code (it could start multiple threads).
Task.Factory.StartNew(this.Spool, TaskCreationOptions.LongRunning);
}
This would allow you to start any number of consumers. Note that this breaks the rule of strict FIFO processing: up to numberOfConsumers messages may be processed at the same time, so they can complete out of order.
Multiple producers are already supported. The above is thread-safe, so any number of threads can call AddMessage(message) in parallel, with no changes.
I think that Reed's answer is the best way to go, but for the sake of academic interest, here is an example using the concurrent queue. You had some races in the code that you posted (depending upon how you handle incrementing currentSpoolInstances).
The changes I made (below) were:
Switched to a Task instead of a Thread (note that with TaskCreationOptions.LongRunning, as used below, the default scheduler still creates a dedicated thread; drop that option if you want the task to run on the thread pool)
added the code to increment/decrement your spool instance count
changed "if currentSpoolInstances <= max ..." to just < to avoid having one too many workers (probably just a typo)
changed the way that empty queues are handled to avoid a race: I think you had a race where your while loop could test false (your thread begins to exit), but at that moment a new item is added (so your spool thread is exiting, but your spool count is > 0, so your queue stalls).
private ConcurrentQueue<IMyMessage> messages = new ConcurrentQueue<IMyMessage>();
const int maxSpoolInstances = 1;
object lcurrentSpoolInstances = new object();
int currentSpoolInstances = 0;
public void AddMessage(IMyMessage message)
{
this.messages.Enqueue(message);
this.startSpool();
}
private void startSpool()
{
lock (lcurrentSpoolInstances)
{
if (currentSpoolInstances < maxSpoolInstances)
{
this.currentSpoolInstances++;
Task.Factory.StartNew(spool, TaskCreationOptions.LongRunning);
}
}
}
private void spool()
{
IMyMessage message;
while (true)
{
// you do not need to null message because it is an "out" parameter, had it been a "ref" parameter, you would want to null it.
if(this.messages.TryDequeue(out message))
{
// My long running thing that does something with this message.
}
else
{
lock (lcurrentSpoolInstances)
{
if (this.messages.IsEmpty)
{
this.currentSpoolInstances--;
return;
}
}
}
}
}
Check 'Pipelines pattern': http://msdn.microsoft.com/en-us/library/ff963548.aspx
Use BlockingCollection for the 'buffers'.
Each Processor (e.g. ReadStrings, CorrectCase, ..), should run in a Task.
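A minimal two-stage sketch of that idea, assuming in-memory strings as input. The stage names ReadStrings and CorrectCase follow the MSDN example, but their bodies here are illustrative: each stage runs in its own Task, reads from one BlockingCollection and writes to the next, and calls CompleteAdding() when it finishes so the next stage's GetConsumingEnumerable() loop can end. The bounded capacity of 8 is an arbitrary choice.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Pipeline
{
    // Stage 1 (illustrative ReadStrings): pushes the raw input into the first buffer.
    static void ReadStrings(IEnumerable<string> input, BlockingCollection<string> output)
    {
        try { foreach (string s in input) output.Add(s); }
        finally { output.CompleteAdding(); } // tells the next stage no more items are coming
    }

    // Stage 2 (illustrative CorrectCase): upper-cases the first letter of each string.
    static void CorrectCase(BlockingCollection<string> input, BlockingCollection<string> output)
    {
        try
        {
            foreach (string s in input.GetConsumingEnumerable())
                output.Add(s.Length == 0 ? s : char.ToUpperInvariant(s[0]) + s.Substring(1));
        }
        finally { output.CompleteAdding(); }
    }

    public static List<string> Run(IEnumerable<string> input)
    {
        // Bounded buffers apply back pressure: a fast stage blocks when the next one falls behind.
        var buffer1 = new BlockingCollection<string>(boundedCapacity: 8);
        var buffer2 = new BlockingCollection<string>(boundedCapacity: 8);
        var results = new List<string>();

        Task stage1 = Task.Factory.StartNew(() => ReadStrings(input, buffer1), TaskCreationOptions.LongRunning);
        Task stage2 = Task.Factory.StartNew(() => CorrectCase(buffer1, buffer2), TaskCreationOptions.LongRunning);

        // The final stage runs on the calling thread and collects the output in order.
        foreach (string s in buffer2.GetConsumingEnumerable()) results.Add(s);
        Task.WaitAll(stage1, stage2);
        return results;
    }
}
```

With one task per stage, order is preserved end to end; adding more tasks to a stage would raise throughput at the cost of ordering.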
HTH.