I have this piece of code in C#:
Thread.BeginCriticalRegion();
if(visitedUrls.Contains(url) || visitedUrls.Where( x => x.Contains(root) ).Count() > 150) {
return;
}
else{
visitedUrls.Add(url);
}
Thread.EndCriticalRegion();
It is inside a function that gets called by several different threads, which is why I tried to make it thread-safe.
The exception "Collection was modified; enumeration operation may not execute" is thrown on the if line, but if I leave it as
if(visitedUrls.Contains(url))
it works fine. Why?
EDIT
This is the actual code:
public void scrapAzienda(String url, String root_url, int depth)
{
if (depth <= 0) return;
var web = new HtmlWeb();
HtmlNode[] nodes = null;
HtmlDocument doc = null;
HtmlNode bodyNode = null;
Thread.BeginCriticalRegion();
if (urlVisitati.Contains(url) || urlVisitati.Where(x => x.Contains(root_url)).Count() > 150)
return;
else
urlVisitati.Add(url);
Thread.EndCriticalRegion();
try
{
doc = web.Load(url, Proxy.getUrl(), Proxy.getPort(), Proxy.getUsername(), Proxy.getPassword());
nodes = doc.DocumentNode.SelectNodes("//a[@href]").ToArray() ?? null;
foreach (HtmlNode item in nodes)
{
Task.Factory.StartNew(() => scrapAzienda(item.Attributes["href"].Value, root_url, depth - 1), TaskCreationOptions.AttachedToParent);
}
GC.Collect();
if (doc != null)
{
bodyNode = doc.DocumentNode.SelectSingleNode("//body");
cercaNumeri(bodyNode.InnerText, url);
cercaEmail(bodyNode.InnerText, url);
}
}
catch (Exception) { }
}
Basically it's just a webscraper.
I think threading is the entirety of your issue. Thread.BeginCriticalRegion() doesn't do what you think it does. From the docs:
Notifies a host that execution is about to enter a region of code in which the effects of a thread abort or unhandled exception might jeopardize other tasks in the application domain.
In other words, it doesn't enforce thread-safety, it just says "if this breaks, it's gonna take everything down with it!"
What you need instead is a basic lock:
lock(someObj)
{
if (urlVisitati.Contains(url) || urlVisitati.Where(x => x.Contains(root_url)).Count() > 150)
{
return;
}
else
{
urlVisitati.Add(url);
}
}
someObj needs to be a static object. Every thread needs to refer to the same object. I usually create a basic object for this purpose, at the class level:
private static readonly object SyncLock = new object();
You then use lock with that object: lock(SyncLock). You could also lock on the list itself, but I would avoid it. Only one thread can hold the lock on a given object at a time, and to help prevent deadlocks your code should ideally be the only thing that ever locks on that object. Can you guarantee that nothing inside the list class itself takes a lock on itself? Rather than worry about it, make your own sync object. No big deal.
This is a crash course in threading with lock. There are other ways of doing this. This one should work for you.
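As a minimal sketch of the lock approach described above: the check and the add sit inside one lock, so no other thread can modify the list while Contains or Count is enumerating it. The names VisitedUrls, TryVisit and MaxPerRoot are made up for illustration and are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VisitedUrls
{
    // Hypothetical names; only the SyncLock pattern comes from the answer.
    private static readonly object SyncLock = new object();
    private static readonly List<string> Visited = new List<string>();
    private const int MaxPerRoot = 150;

    // Returns true if the caller should go on and scrape the URL.
    public static bool TryVisit(string url, string rootUrl)
    {
        lock (SyncLock)
        {
            // Check-then-add happens as one step under the lock.
            if (Visited.Contains(url) || Visited.Count(x => x.Contains(rootUrl)) > MaxPerRoot)
                return false;

            Visited.Add(url);
            return true;
        }
    }

    public static int Count
    {
        get { lock (SyncLock) { return Visited.Count; } }
    }
}
```

In the scraper, the method would start with something like `if (!VisitedUrls.TryVisit(url, root_url)) return;`. Returning from inside the lock block is fine, because the lock is released on the way out.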
I have one thread that receives messages every 10 seconds and another that writes these messages to the database every minute.
Each message has a sender, identified by serialNumber in my case.
Therefore, I created a ConcurrentDictionary like below.
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
The key of the dictionary is serialNumber and the value is the collection of messages received over one minute. I collect a minute of data so that I go to the database once per minute instead of every 10 seconds, which cuts the number of database round trips to one sixth.
public class ShotManager
{
private const int SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER = 25000;
private bool ACTIVE_FILE_DB_SHOOT_THREAD = false;
private List<Devices> _devices = new List<Devices>();
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
public ShotManager()
{
ACTIVE_FILE_DB_SHOOT_THREAD = Utility.GetAppSettings("AppConfig", "0", "ACTIVE_LIST_DB_SHOOT") == "1";
init();
}
private void init()
{
using (iotemplaridbContext dbContext = new iotemplaridbContext())
_devices = (from d in dbContext.Devices select d).ToList();
if (_dicAllPackets is null)
_dicAllPackets = new ConcurrentDictionary<string, ConcurrentQueue<PacketModel>>();
foreach (var device in _devices)
{
if(!_dicAllPackets.ContainsKey(device.SerialNumber))
_dicAllPackets.TryAdd(device.SerialNumber, new ConcurrentQueue<PacketModel> { });
}
}
public void Spinner()
{
while (ACTIVE_FILE_DB_SHOOT_THREAD)
{
try
{
Parallel.ForEach(_dicAllPackets, devicePacket =>
{
Thread.Sleep(100);
readAndShot(devicePacket);
});
Thread.Sleep(SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER);
//init();
}
catch (Exception ex)
{
//init();
tLogger.EXC("Spinner exception for write...", ex);
}
}
}
public void EnqueueObjectToQueue(string serialNumber, PacketModel model)
{
if (_dicAllPackets != null)
{
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
}
}
private void readAndShot(KeyValuePair<string, ConcurrentQueue<PacketModel>> keyValuePair)
{
StringBuilder sb = new StringBuilder();
if (keyValuePair.Value.Count() <= 0)
{
return;
}
sb.AppendLine($"INSERT INTO ......) VALUES(");
//the reason why I don't use while(TryDequeue(out ..)){..} is there's constantly enqueue to this dictionary, so the thread will be occupied with a single device for so long
for (int i = 0; i < 10; i++)
{
keyValuePair.Value.TryDequeue(out PacketModel packet);
if (packet != null)
{
/*
*** do something and fill the sb...
*/
}
else
{
Console.WriteLine("No packet found! For Device: " + keyValuePair.Key);
break;
}
}
insertIntoDB(sb.ToString()[..(sb.Length - 5)] + ";");
}
}
EnqueueObjectToQueue is called from a different class, like below.
private void packetToDictionary(string serialNumber, string jsonPacket, string messageTimeStamp)
{
PacketModel model = new PacketModel {
MachineData = jsonPacket,
DataInsertedAt = messageTimeStamp
};
_shotManager.EnqueueObjectToQueue(serialNumber, model);
}
The function above is called from the handler function itself.
private void messageReceiveHandler(object sender, MessageReceviedEventArgs e){
//do something...parse from e and call the func
string jsonPacket = ""; //something parsed from e
string serialNumber = ""; //something parsed from e
string message_timestamp = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"); // DateTime.Now is a property, not a method
ThreadPool.QueueUserWorkItem(state => packetToDictionary(serialNumber, jsonPacket, message_timestamp));
}
The problem is that sometimes packets are enqueued under the wrong serialNumber or are repeated (duplicate entries).
Is it clever to use ConcurrentQueue in a ConcurrentDictionary like this?
No, it's not a good idea to use a ConcurrentDictionary with nested ConcurrentQueues as values. It's impossible to update this structure atomically. Take this for example:
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
This little piece of code is riddled with race conditions. A thread that is running this code can be intercepted by another thread at any point between the ContainsKey, TryAdd, the [] indexer and the Enqueue invocations, altering the state of the structure, and invalidating the conditions on which the correctness of the current thread's work is based.
A ConcurrentDictionary is a good idea when you have a simple Dictionary that contains immutable values, you want to use it concurrently, and using a lock around each access could potentially create significant contention. You can read more about this here: When should I use ConcurrentDictionary and Dictionary?
My suggestion is to switch to a simple Dictionary<string, Queue<PacketModel>>, and synchronize it with a lock. If you are careful and you avoid doing anything irrelevant while holding the lock, the lock will be released so quickly that rarely other threads will be blocked by it. Use the lock just to protect the reading and updating of a specific entry of the structure, and nothing else.
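A minimal sketch of that suggestion, assuming a wrapper class (PacketStore and DequeueBatch are hypothetical names, and PacketModel is a stand-in for the question's class). The lock covers only the lookup and the queue operation; the database work is meant to happen on the returned batch, outside the lock.

```csharp
using System.Collections.Generic;

// Stand-in for the question's PacketModel.
public class PacketModel
{
    public string MachineData { get; set; }
    public string DataInsertedAt { get; set; }
}

public class PacketStore
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, Queue<PacketModel>> _packets =
        new Dictionary<string, Queue<PacketModel>>();

    public void Enqueue(string serialNumber, PacketModel model)
    {
        lock (_sync)
        {
            // Lookup, creation and enqueue are one atomic step, so the
            // first packet of a new device is enqueued as well.
            if (!_packets.TryGetValue(serialNumber, out Queue<PacketModel> queue))
            {
                queue = new Queue<PacketModel>();
                _packets[serialNumber] = queue;
            }
            queue.Enqueue(model);
        }
    }

    // Removes up to maxCount packets for one device and returns them.
    public List<PacketModel> DequeueBatch(string serialNumber, int maxCount)
    {
        var batch = new List<PacketModel>();
        lock (_sync)
        {
            if (_packets.TryGetValue(serialNumber, out Queue<PacketModel> queue))
            {
                while (batch.Count < maxCount && queue.Count > 0)
                    batch.Add(queue.Dequeue());
            }
        }
        return batch; // build the INSERT from this list outside the lock
    }
}
```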
Alternative designs
A ConcurrentDictionary<string, Queue<PacketModel>> structure might be a good option, on the condition that you never remove queues from the dictionary. Otherwise there is still room for race conditions to occur. You should use the GetOrAdd method exclusively to get or add a queue in the dictionary atomically, and always use the queue itself as a locker before doing anything with it (either reading or writing):
Queue<PacketModel> queue = _dicAllPackets
.GetOrAdd(serialNumber, _ => new Queue<PacketModel>());
lock (queue)
{
queue.Enqueue(model);
}
Using a ConcurrentDictionary<string, ImmutableQueue<PacketModel>> is also possible, because in this case the value of the ConcurrentDictionary is immutable and you won't need to lock anything. You'll always need to use the AddOrUpdate method, in order to update the dictionary with a single call, as an atomic operation.
_dicAllPackets.AddOrUpdate
(
serialNumber,
key => ImmutableQueue.Create<PacketModel>(model),
(key, queue) => queue.Enqueue(model)
);
The queue.Enqueue(model) call inside the updateValueFactory delegate does not mutate the queue. Instead it creates a new ImmutableQueue<PacketModel> and discards the previous one. The immutable collections are not very efficient in general. But if your goal is to minimize the contention between threads, at the cost of increasing the work that each thread has to do, then you might find them useful.
I have a simple logging mechanism that should be thread safe. It works most of the time, but every now and then I get an exception on this line, "_logQ.Enqueue(s);" that the queue is not long enough. Looking in the debugger there are sometimes just hundreds of items, so I can't see it being resources. The queue is supposed to expand as needed. If I catch the exception as opposed to letting the debugger pause at the exception I see the same error. Is there something not thread safe here? I don't even know how to start debugging this.
static void ProcessLogQ(object state)
{
try
{
while (_logQ.Count > 0)
{
var s = _logQ.Dequeue();
string dir="";
Type t=Type.GetType("Mono.Runtime");
if (t!=null)
{
dir ="/var/log";
}else
{
dir = @"c:\log";
if (!Directory.Exists(dir))
Directory.CreateDirectory(dir);
}
if (Directory.Exists(dir))
{
File.AppendAllText(Path.Combine(dir, "admin.log"), DateTime.Now.ToString("hh:mm:ss ") + s + Environment.NewLine);
}
}
}
catch (Exception)
{
}
finally
{
_isProcessingLogQ = false;
}
}
public static void Log(string s) {
if (_logQ == null)
_logQ = new Queue<string> { };
lock (_logQ)
_logQ.Enqueue(s);
if (!_isProcessingLogQ) {
_isProcessingLogQ = true;
ThreadPool.QueueUserWorkItem(ProcessLogQ);
}
}
Note that the threads all call Log(string s). ProcessLogQ is private to the logger class.
Edit
I made a mistake in not mentioning that this is in a .NET 3.5 environment, therefore I can't use Task or ConcurrentQueue. I am working on fixes for the current example within .NET 3.5 constraints.
Edit
I believe I have a thread-safe version for .NET 3.5 listed below. I start the logger thread once from a single thread at program start, so there is only one thread running to log to the file (t is a static Thread):
static void ProcessLogQ()
{
while (true) {
try {
lock (_logQ) // no trailing semicolon, so the lock covers the while loop below
while (_logQ.Count > 0) {
var s = _logQ.Dequeue ();
string dir = "../../log";
if (!Directory.Exists (dir))
Directory.CreateDirectory (dir);
if (Directory.Exists (dir)) {
File.AppendAllText (Path.Combine (dir, "s3ol.log"), DateTime.Now.ToString ("hh:mm:ss ") + s + Environment.NewLine);
}
}
} catch (Exception ex) {
Console.WriteLine (ex.Message);
} finally {
}
Thread.Sleep (1000);
}
}
public static void startLogger(){
lock (t) {
if (t.ThreadState != ThreadState.Running)
t.Start ();
}
}
private static void multiThreadLog(string msg){
lock (_logQ)
_logQ.Enqueue(msg);
}
Look at the Task Parallel Library. All the hard work is already done for you. If you're doing this to learn about multithreading, read up on locking techniques and the pros and cons of each.
Further, you're checking whether _logQ is null outside your lock statement. From what I can deduce, it's a static field that you're not initializing inside a static constructor. That null check is critical code and would have to be inside a lock, but you can avoid it altogether, and ensure thread safety, by making the field static readonly and initializing it inside the static constructor.
Further, you're not properly handling queue state. Since there's no lock while the queue count is checked, it could change on every iteration. You're also missing a lock as you're dequeuing items.
Excellent resource:
http://www.yoda.arachsys.com/csharp/threads/
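To make the points above concrete, here is a rough sketch that stays within what .NET 3.5 offers: a static readonly queue, with every enqueue and dequeue under the same lock. QueueLogger and TakeAll are hypothetical names, and using Monitor.Wait/Pulse instead of a sleep loop is my own choice, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class QueueLogger
{
    // Initialized once by the runtime, so no null check is needed.
    private static readonly Queue<string> LogQ = new Queue<string>();

    public static void Log(string s)
    {
        lock (LogQ)
        {
            LogQ.Enqueue(s);
            Monitor.Pulse(LogQ); // wake the writer thread if it is waiting
        }
    }

    // Blocks until at least one message is queued, then returns all of them.
    public static List<string> TakeAll()
    {
        lock (LogQ)
        {
            while (LogQ.Count == 0)
                Monitor.Wait(LogQ); // releases the lock while waiting

            var batch = new List<string>(LogQ);
            LogQ.Clear();
            return batch;
        }
    }
}
```

A single writer thread would then loop over `QueueLogger.TakeAll()` and call File.AppendAllText for each message, so the file I/O happens outside the lock and callers of Log are never blocked by the disk.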
For a thread-safe queue, you should use the ConcurrentQueue instead:
https://msdn.microsoft.com/en-us/library/dd267265(v=vs.110).aspx
I am trying to write a custom mechanism for compressing and caching web scripts. I am using a Mutex to provide managed access for the cache creation methods.
public class HttpApplicationCacheManager
{
public object Get(
Cache cache, // Reference to the HttpContext.Cache
string key, // Id of the cached object
int retrievalWaitTime,
Func<object> getData, // Method that builds the string to be cached
Func<CacheDependency> getDependency) // CacheDependency object for the
// string[] of file paths to be cached
{
Mutex mutex = null;
bool iOwnMutex = false;
object data = cache[key];
// Start check to see if available on cache
if (data == null)
{
try
{
// Lock base on resource key
// (note that not all chars are valid for name)
mutex = new Mutex(false, key);
// Wait until it is safe to enter (someone else might already be
// doing this), but also add 30 seconds max.
iOwnMutex = mutex.WaitOne(retrievalWaitTime * 1000);
// Now let's see if some one else has added it...
data = cache[key];
// They did, so send it...
if (data != null)
{
return data;
}
// Still not there, so now is the time to look for it!
data = getData();
var dependency = getDependency();
cache.Insert(key, data, dependency);
}
catch
{
throw;
}
finally
{
// Release the Mutex.
if ((mutex != null) && (iOwnMutex))
{
mutex.ReleaseMutex();
}
}
}
return data;
}
}
Whilst this works, I occasionally see the following error:
System.UnauthorizedAccessException
Access to the path 'SquashCss-theme.midnight.dialog' is denied.
I have found some posts suggesting that this might be due to a race condition. Unfortunately, my Mutex knowledge is very limited and I am struggling to see where the problem might be.
Any help would be much appreciated.
Why not just use one of the built-in .NET caches? I don't see anything in your code that could not be handled by the .NET cache implementations. Another option may be the ReaderWriterLockSlim class, since you really only need to lock on writes.
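A minimal sketch of the ReaderWriterLockSlim idea, assuming a plain in-memory dictionary rather than HttpContext.Cache (ReadMostlyCache and GetOrAdd are hypothetical names). Lookups take the read lock, and only a miss takes the write lock, with a second check after acquiring it. Everything here is in-process, so no named operating-system object is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class ReadMostlyCache
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, object> _items = new Dictionary<string, object>();

    public object GetOrAdd(string key, Func<object> getData)
    {
        // Fast path: many readers can hold the read lock at once.
        _lock.EnterReadLock();
        try
        {
            if (_items.TryGetValue(key, out object cached))
                return cached;
        }
        finally { _lock.ExitReadLock(); }

        // Slow path: one writer at a time.
        _lock.EnterWriteLock();
        try
        {
            // Re-check: another thread may have added the entry meanwhile.
            if (!_items.TryGetValue(key, out object data))
            {
                data = getData();
                _items[key] = data;
            }
            return data;
        }
        finally { _lock.ExitWriteLock(); }
    }
}
```

Note that getData runs while the write lock is held, which blocks readers of every key for that long. That is acceptable for a sketch, but a per-key lock may be a better fit if building an entry is slow.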
While keeping in mind that:
I am using a blocking queue that waits forever until something is added to it
I might get a FileSystemWatcher event twice
The updated code:
{
FileProcessingManager processingManager = new FileProcessingManager();
processingManager.RegisterProcessor(new ExcelFileProcessor());
processingManager.RegisterProcessor(new PdfFileProcessor());
processingManager.Completed += new ProcessingCompletedHandler(ProcessingCompletedHandler);
processingManager.Completed += new ProcessingCompletedHandler(LogFileStatus);
while (true)
{
try
{
var jobData = (JobData)fileMonitor.FileQueue.Dequeue();
if (jobData == null)
break;
_pool.WaitOne();
Application.Log(String.Format("{0}:{1}", DateTime.Now.ToString(CultureInfo.InvariantCulture), "Thread launched"));
Task.Factory.StartNew(() => processingManager.Process(jobData));
}
catch (Exception e)
{
Application.Log(String.Format("{0}:{1}", DateTime.Now.ToString(CultureInfo.InvariantCulture), e.Message));
}
}
}
What are your suggestions for making the code multi-threaded, taking into account that two identical path strings may be added to the blocking queue? I have allowed for that possibility, in which case the file would be processed twice. The odd part is that sometimes I get the event twice and sometimes not. If you have suggestions on this, please share them.
The null check is for exiting the loop: I intentionally add a null from outside the threaded loop to make it stop.
For multi-threading this... I would probably add a "Completed" event to your FileProcessingManager and register for it. One argument of that event will be the "bool" return value you currently have. Then in that event handler, I would do the checking of the bool and re-queueing of the file. Note that you will have to keep a reference to the FileMonitorManager. So, I would have this ThreadProc method be in a class where you keep the FileMonitorManager and FileProcessingManager instances in a property.
To deduplicate, in ThreadProc, I would create a List outside of the while loop. Then, inside the while loop and before you process a file, lock that list and check whether the string is already in there. If it is not, add the string to the list and process the file; if it is, skip processing.
Obviously, this is based on little information surrounding your method but my 2 cents anyway.
Rough code, from Notepad:
private static FileMonitorManager fileMon = null;
private static FileProcessingManager processingManager = new FileProcessingManager();
private static void ThreadProc(object param)
{
processingManager.RegisterProcessor(new ExcelFileProcessor());
processingManager.RegisterProcessor(new PdfFileProcessor());
processingManager.Completed += ProcessingCompletedHandler;
var procList = new List<string>();
while (true)
{
try
{
var path = (string)fileMon.FileQueue.Dequeue();
if (path == null)
break;
bool processThis = false;
lock(procList)
{
if(!procList.Contains(path))
{
processThis = true;
procList.Add(path);
}
}
if(processThis)
{
Thread t = new Thread (new ParameterizedThreadStart(processingManager.Process));
t.Start (path);
}
}
catch (System.Exception e)
{
Console.WriteLine(e.Message);
}
}
}
private static void ProcessingCompletedHandler(bool status, string path)
{
if (!status)
{
fileMon.FileQueue.Enqueue(path);
Console.WriteLine("\n\nError on file: " + path);
}
else
Console.WriteLine("\n\nSuccess on file: " + path);
}
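One thing to watch in the rough code above: a path stays in procList forever, so a failed file that is re-queued would be skipped as a duplicate. A possible variation, sketched here with hypothetical names (InFlightSet, TryBegin, End), tracks only the paths currently being processed and removes a path when processing completes.

```csharp
using System.Collections.Generic;

public class InFlightSet
{
    private readonly object _sync = new object();
    private readonly HashSet<string> _paths = new HashSet<string>();

    // Returns true if the path was not already being processed;
    // the caller should then start processing it.
    public bool TryBegin(string path)
    {
        lock (_sync) { return _paths.Add(path); }
    }

    // Call this from the Completed handler before re-queueing,
    // so that a failed file can be picked up again.
    public void End(string path)
    {
        lock (_sync) { _paths.Remove(path); }
    }
}
```

In ThreadProc this would replace the lock/Contains/Add block with `if (inFlight.TryBegin(path)) { ... }`, and ProcessingCompletedHandler would call `inFlight.End(path)` first.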
I have a regular Queue object in C# (4.0) and I'm using BackgroundWorkers that access this Queue.
The code I was using is as follows:
do
{
while (dataQueue.Peek() == null // nothing waiting yet
&& isBeingLoaded == true // and worker 1 still actively adding stuff
)
System.Threading.Thread.Sleep(100);
// otherwise ready to do something:
if (dataQueue.Peek() != null) // because maybe the queue is complete and also empty
{
string companyId = dataQueue.Dequeue();
processLists(companyId);
// use up the stuff here //
} // otherwise nothing was there yet, it will resolve on the next loop.
} while (isBeingLoaded == true // still have stuff coming at us
|| dataQueue.Peek() != null); // still have stuff we haven’t done
However, I guess when dealing with threads I should be using a ConcurrentQueue.
I was wondering if there were examples of how to use a ConcurrentQueue in a Do While Loop like above?
Everything I tried with the TryPeek wasn't working..
Any ideas?
You can use a BlockingCollection<T> as a producer-consumer queue.
My answer makes some assumptions about your architecture, but you can probably mold it as you see fit:
public void Producer(BlockingCollection<string> ids)
{
// assuming this.CompanyRepository exists
foreach (var id in this.CompanyRepository.GetIds())
{
ids.Add(id);
}
ids.CompleteAdding(); // nothing left for our workers
}
public void Consumer(BlockingCollection<string> ids)
{
while (true)
{
string id = null;
try
{
id = ids.Take();
} catch (InvalidOperationException) {
}
if (id == null) break;
processLists(id);
}
}
You could spin up as many consumers as you need:
var companyIds = new BlockingCollection<string>();
Producer(companyIds);
Action process = () => Consumer(companyIds);
// 2 workers
Parallel.Invoke(process, process);
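As a variation on the consumer above, BlockingCollection<T>.GetConsumingEnumerable() blocks while the collection is empty and ends the loop once CompleteAdding has been called and the collection is drained, so there is no exception to catch. This sketch also starts the workers before producing, so items are processed as they arrive. CompanyPipeline and Run are hypothetical names, and processLists is passed in as a delegate purely to keep the example self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class CompanyPipeline
{
    public static void Run(string[] companyIds, int workerCount, Action<string> processLists)
    {
        using (var ids = new BlockingCollection<string>())
        {
            // Start the consumers first so they can work while items are added.
            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Factory.StartNew(() =>
                {
                    // Ends when CompleteAdding was called and the queue is empty.
                    foreach (string id in ids.GetConsumingEnumerable())
                        processLists(id);
                }, TaskCreationOptions.LongRunning);
            }

            foreach (string id in companyIds)
                ids.Add(id);
            ids.CompleteAdding(); // nothing left for the workers

            Task.WaitAll(workers);
        }
    }
}
```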