Is there a way to create a Random Array/List/HashSet WITHOUT using lock() or the Concurrent* set of methods?
My goal:
Be able to Add and Remove strings as I please (or be able to Add and "disable" strings)
Be able to Clear the list or swap out the entire sets of strings
Be able to get an item from it (kind of like TryPeek) to choose ANY item RANDOMLY, without resorting to a mixture of .ElementAt with a Random over .Count/.Count()
I'm attempting to create a system to choose a Proxy at complete random from a collection that can be modified by another thread either removing or adding new proxies at any time.
Here are some "randomizing" solutions that get thrown around a lot, and why they are bad and shouldn't be used:
.ElementAt(Random.Next(List.Count)) is NOT a good way to randomize a list
This is extremely unsuitable for a multi-threaded scenario for multiple reasons, even with a lock(){} wrapped around it and around all other code that touches the collection.
What I mean is: the list can change its count (perhaps shrink) between the Random call and the .ElementAt call. Say Random chose the final index, but that final element was removed from the list just before .ElementAt reached it, causing an exception.
.OrderBy(Random)
Is also a bad way, as it will randomize the entire list before choosing an element, and is susceptible to the collection being modified while OrderBy executes, causing an exception.
Both of these bad ways to randomly choose one item from a list can be "solved" by simply calling .ToArray() before the .OrderBy(Random) or before the .ElementAt, but then you must also use the ToArray()'s count for the .ElementAt's Random.
The issue here is that this is bad for memory: depending on what you're doing, you're essentially doubling the memory usage of the list.
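For reference, the ToArray() snapshot approach described above looks something like this (a sketch; the variable names are illustrative, not from the actual code):

```csharp
using System;
using System.Collections.Generic;

var proxyList = new List<string> { "p1", "p2", "p3" };
var rand = new Random();

// Snapshot first, then index using the snapshot's own length, so a
// concurrent resize of proxyList cannot invalidate the pick.
string[] snapshot = proxyList.ToArray();
string proxy = snapshot[rand.Next(snapshot.Length)];
```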
This is why I'm asking if there's any way to randomize efficiently without the possibility of multi-threading conflicting with modifications to the collection.
Current Code
object _Lock = new object();
...
lock (_Lock) {
proxy = proxyArray.ElementAt(proxyArray.Count == 1 ? 0 : rand.Next(0, proxyArray.Count)); // Random.Next's upper bound is exclusive, so pass Count, not Count - 1
}
I'm doing the Count == 1 ? 0 check so that it doesn't even bother wasting CPU randomizing an obvious answer.
My attempt with ConcurrentBag
object proxiesRefill = new object();//for lock
...
while (!concurrentProxiesBag.TryTake(out proxy)) { // keep attempting until it takes a proxy
    lock (proxiesRefill) { // it failed, possibly because it's empty; lock and check on one thread
        if (concurrentProxiesBag.IsEmpty) { // if it's empty, refill
            concurrentProxiesBag = new ConcurrentBag<string>(hashsetDisabledProxies); // refill by creating a new ConcurrentBag holding all disabled proxies [properly thread-safe? not sure]
        }
    }
    Thread.Sleep(100); // sleep for 100 ms just to cool down a bit
}
// Got proxy
hashsetDisabledProxies.Add(proxy); // disable the proxy, as it's been taken out of the active proxy bag by TryTake
// It's a HashSet so no duplicates can get added.
I made some changes to your code:
while (!concurrentProxiesBag.TryTake(out proxy)) { // keep attempting until it takes a proxy
    if (concurrentProxiesBag.IsEmpty) { // if it's empty, refill
        concurrentProxiesBag = new ConcurrentBag<string>(hashsetDisabledProxies); // refill by creating a new ConcurrentBag holding all disabled proxies
    }
    Thread.Sleep(100); // sleep for 100 ms just to cool down a bit
}
Note that on ConcurrentBag, IsEmpty, Count, ToArray() and GetEnumerator() lock the entire structure.
Related
I'm building a multithreading program that handles big data and I wonder what I can do to tweak it.
Right now I have 50 million entries in a normal List, and as I use multithreading I use a lock statement.
public string getUsername()
{
string user = null;
lock (UsersToCheckExistList)
{
user = UsersToCheckExistList.First();
UsersToCheckExistList.Remove(user);
}
return user;
}
When I run smaller lists (500k lines) it works much faster. But when I load a bigger list (5-50 million) it starts to slow down. One way to solve this issue is dynamically creating many small lists and storing them in a Dictionary, and this is the way I think I will go. But as I want to learn more about optimizing, I wonder if there is a better solution for this task?
All I want is to get a value from the collection and remove it from the collection at the same time.
You're using the wrong tools for the job - explicit locking is quite expensive, not to mention that the cost of removing the head of a List is O(Count). If you want a collection that is accessed concurrently it's best to use types in System.Collections.Concurrent, as they are heavily optimised for concurrent accesses. From your use case it seems you want a queue of users, so using a ConcurrentQueue:
ConcurrentQueue<string> UsersQueue = new ConcurrentQueue<string>();
public string getUsername()
{
string user = null;
UsersQueue.TryDequeue(out user);
return user;
}
The problem is that removing the first item from a list is O(n), so as your list grows, it takes longer to remove the first item. You would probably be better off using a Queue instead. Since you need thread safety, you can use ConcurrentQueue, which handles efficient locking for you.
You can put them all in a ConcurrentBag (https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentbag-1?view=netframework-4.8); each thread can then use the TryTake method to grab one entry and remove it at the same time, so you don't need to worry about doing your own locking.
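As a sketch of that idea (the bag contents and worker count here are made up for illustration):

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var bag = new ConcurrentBag<int>(Enumerable.Range(0, 1000));
int taken = 0;

// Four workers drain the bag; TryTake grabs and removes an entry
// atomically, so no entry is ever handed to two threads.
Parallel.For(0, 4, _ =>
{
    int item;
    while (bag.TryTake(out item))
        Interlocked.Increment(ref taken);
});
```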
If you have enough RAM for your data, you should definitely use ConcurrentQueue for FIFO access to your data.
But if you don't have enough RAM, you can try to use a database. Modern databases can cache data very effectively; you will have almost instant access to your data and save the OS memory from swapping.
I have a collection of items (SortedPoints) that I iterate over using Parallel.ForEach. Each item will become a key in a Dictionary named Stripes. Computing the value for each item is expensive and is done in the method BuildStripes.
Parallel.ForEach(SortedPoints, point =>
Stripes[point] = BuildStripes(point, pointToPosition)
);
I can make Stripes a ConcurrentDictionary, but I was wondering if this would work:
1) Make Stripes a regular Dictionary.
2) Iterate over all points serially and fill Stripes with mappings to an empty object.
3) Iterate over all points in parallel and replace the mapping in Stripes with the actual value returned by BuildStripes.
foreach(var point in SortedPoints)
Stripes[point] = emptyStripe;
Parallel.ForEach(SortedPoints, point =>
Stripes[point] = BuildStripes(point, pointToPosition)
);
Is setting the value for a key thread-safe if each thread works on a separate set of keys and each key is pre-loaded into the Dictionary serially as I outlined? I looked at the source code for Dictionary and it looks safe, but these collections are subtle beasts and parallel bugs are hard to spot.
Once the dictionary is created, I never modify it again and all accesses are reads.
Let's look at the facts. A thread error can happen if you:
Add a new item? Nope, you never add during the parallel phase, only set existing keys.
Resize the dictionary while adding an item? This is not a problem; your dictionary has a fixed size.
Two threads try to set the value of the same key? Won't happen, because your SortedPoints collection has distinct items (does it?)
Is there any other option? I don't see one. I think you are safe to go with this method.
But just use a regular ConcurrentDictionary for readability, of course! Perhaps you can gain some performance, but unless you benchmark it, there is no reason not to use a ConcurrentDictionary.
Some docs about what ConcurrentDictionary deals with.
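A minimal sketch of the ConcurrentDictionary version; BuildStripes here is just a placeholder for the expensive computation in the question:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

var sortedPoints = new[] { 1, 2, 3, 4, 5 };
var stripes = new ConcurrentDictionary<int, int>();

int BuildStripes(int point) => point * point; // stand-in for the real work

// The indexer set on ConcurrentDictionary is thread-safe, so no
// serial pre-filling pass is needed.
Parallel.ForEach(sortedPoints, point =>
    stripes[point] = BuildStripes(point));
```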
I had some problems with a WCF web service (some dumps, memory leaks, etc.) and I ran a profiling tool (ANTS Memory Profiler).
Just to find out that even with the processing over (I ran a specific test and then stopped), Generation 2 is 25% of the memory for the web service. I tracked down this memory and found that I had a dictionary object full of (null, null) items, with -1 hash codes.
The workflow of the web service implies that during specific processing items are added and then removed from the dictionary (just simple Add and Remove). Not a big deal. But it seems that after all items are removed, the dictionary is full of (null, null) KeyValuePairs. Thousands of them in fact, such that they occupy a big part of memory and eventually an overflow occurs, with the corresponding forced application pool recycle and DW20.exe getting all the CPU cycles it can get.
The dictionary is in fact Dictionary<SomeKeyType, IEnumerable<KeyValuePair<SomeOtherKeyType, SomeCustomType>>> (System.OutOfMemoryException because of Large Dictionary) so I already checked if there is some kind of reference holding things.
The dictionary is contained in a static object (to make it accesible to different processing threads through processing) so from this question and many more (Do static members ever get garbage collected?) I understand why that dictionary is in Generation 2. But this is also the cause of those (null, null)? Even if I remove items from dictionary something will be always occupied in the memory?
It's not a speed issue like in this question Deallocate memory from large data structures in C# . It seems that memory is never reclaimed.
Is there something I can do to actually remove items from dictionary, not just keep filling it with (null, null) pairs?
Is there anything else I need to check out?
Dictionaries store items in a hash table; an array is used internally for this. Because of the way hash tables work, this array must always be larger than the actual number of items stored (at least about 30% larger). Microsoft uses a load factor of 72%, i.e. at least 28% of the array will be empty (see An Extensive Examination of Data Structures Using C# 2.0, especially The System.Collections.Hashtable Class and The System.Collections.Generic.Dictionary Class). Therefore the null/null entries could just represent this free space.
If the array is too small, it will grow automatically; however, when items are removed, the array does not shrink, but the space that will be freed up should be reused when new items are inserted.
If you are in control of this dictionary, you could try to re-create it in order to shrink it:
theDict = new Dictionary<TKey, IEnumerable<KeyValuePair<TKey2, TVal>>>(theDict);
But the problem might arise from the actual (non empty) entries. Your dictionary is static and will therefore never be reclaimed automatically by the garbage collector, unless you assign it another dictionary or null (theDict = new ... or theDict = null). This is only true for the dictionary itself which is static, not for its entries. As long as references to removed entries exist somewhere else, they will persist. The GC will reclaim any object (earlier or later) which cannot be accessed any more through some reference. It makes no difference, whether this object was declared static or not. The objects themselves are not static, only their references.
As @RobertTausig kindly pointed out, since .NET Core 2.1 there is the new Dictionary.TrimExcess(), which is what you actually wanted, but it didn't exist back then.
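On .NET Core 2.1 or later, that looks like this (a sketch with made-up contents):

```csharp
using System.Collections.Generic;

var dict = new Dictionary<int, string>();
for (int i = 0; i < 100_000; i++)
    dict[i] = "x";

dict.Clear();      // removes the entries but keeps the large backing array
dict.TrimExcess(); // .NET Core 2.1+: shrinks the backing array to fit
```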
Looks like you need to recycle space in that dict periodically. You can do that by creating a new one: new Dictionary<a,b>(oldDict). Be sure to do this in a thread-safe manner.
When to do this? Either on the tick of a timer (60sec?) or when a specific number of writes has occurred (100k?) (you'd need to keep a modification counter).
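A sketch of the write-counter variant (the names and the 100k threshold are illustrative, not from the original post):

```csharp
using System.Collections.Generic;

object dictLock = new object();
var dict = new Dictionary<string, string>();
int writes = 0;

void AddEntry(string key, string value)
{
    lock (dictLock) // all readers/writers must take the same lock
    {
        dict[key] = value;
        if (++writes >= 100_000)
        {
            // Copy-constructing right-sizes the backing array to the count.
            dict = new Dictionary<string, string>(dict);
            writes = 0;
        }
    }
}

AddEntry("a", "1");
```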
A solution could be to call the Clear() method on the static dictionary.
That way, the reference to the dictionary remains available, but the objects it contained will be released.
I am building a class that inherits from List. Items are going to be added to this collection at runtime, and what I want is for this class to automatically do something with each block of n items after they have been added.
So here is the scenario.
1] Create new class that inherits from List - CollectionX
2] At runtime we will be calling ColX.Add(T) many times
3] When ColX has 500 or more items, it is to move them into a temporary area, do work on them, then delete them, keeping in mind that all the while items will still be being added to ColX.
So I guess my question is: how do I implement this nicely while ensuring it is thread safe?
The work is to be performed in blocks, so I don't think a queue will work, as you can only dequeue one item at a time.
I think im looking for more of a pattern than actual types or libraries.
Can anyone help?
Don't let CollectionX inherit from List.
Instead, use 2 Lists internally, Add() to 1 and process the other.
This way you only have to lock the swapping of the Lists. If there are timing problems you could even use a third List to prevent blocking.
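A sketch of that two-list idea (the class shape and the 500 threshold come from the question; the member names and the ProcessedBlocks counter are made up for illustration):

```csharp
using System.Collections.Generic;

class CollectionX<T>
{
    private readonly object _swapLock = new object();
    private List<T> _active = new List<T>();
    public int ProcessedBlocks; // for illustration only

    public void Add(T item)
    {
        List<T> fullBlock = null;
        lock (_swapLock) // only the swap is locked, not the processing
        {
            _active.Add(item);
            if (_active.Count >= 500)
            {
                fullBlock = _active;     // hand off the full list...
                _active = new List<T>(); // ...and swap in a fresh one
            }
        }
        if (fullBlock != null)
            Process(fullBlock); // expensive work happens outside the lock
    }

    private void Process(List<T> block)
    {
        ProcessedBlocks++;
        // do work on the block, then let it be collected
    }
}
```

Adds keep landing in the fresh list while the full block is being processed, which is the point of the double-buffer approach.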
I need to enumerate though generic IList<> of objects. The contents of the list may change, as in being added or removed by other threads, and this will kill my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing a threadsafe foreach on an IList<>? Preferably without cloning the entire list; it is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
There is no such operation. The best you can do is
lock(collection){
foreach (object o in collection){
...
}
}
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
ICollection MyCollection;
// Instantiate and populate the collection
lock(MyCollection.SyncRoot) {
// Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The classic approach relies on a reader-writer lock, which used to have big performance issues due to the so-called convoy problem.
The best article I've found on the subject is this one by Jeffrey Richter, which presents his own method for a high-performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaneously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
Foreach depends on the collection not changing. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
Default behavior for a simple indexed data structure like a linked list, b-tree, or hash table is to enumerate in order from the first to the last. It would not cause a problem to insert an element in the data structure after the iterator had already passed that point, or to insert one that the iterator would enumerate once it arrived, and such an event could be detected by the application and handled if the application required it. To detect a change in the collection and throw an error during enumeration was, I can only imagine, someone's (bad) idea of doing what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly. They have called their shiny new unbroken collections ConcurrentCollections (System.Collections.Concurrent) in .NET 4.0.
I recently spent some time multi-threading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
In many cases you can use the good old for-loop and immediately assign the object to a copy to use inside the loop. Just keep in mind that all threads writing to the objects of your list should write to different data of the objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach(var p in Points)
{
// work with p...
}
Can be replaced by:
for(int i = 0; i < Points.Count; i ++)
{
Point p = Points[i];
// work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock, that allows multiple concurrent readers but also a single writer (when there are no readers).
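One way to sketch that multiple-readers/single-writer wrapper, using ReaderWriterLockSlim (the type and member names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class LockedList<T>
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        _lock.EnterWriteLock(); // writers are exclusive
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    public void ForEach(Action<T> action)
    {
        _lock.EnterReadLock(); // any number of readers may iterate at once
        try
        {
            foreach (var item in _items)
                action(item);
        }
        finally { _lock.ExitReadLock(); }
    }
}
```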
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list at a point in time (given the number of elements currently in it) AND other threads can only ADD to the end of the list, then maybe you can just switch to a FOR loop with a counter. At the point you grab the counter, you only see X number of elements in the list, and you can walk through them (while others add to the end) without causing a problem.
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor to a dictionary).