Is there any way to observe each collection (or even just one) in MongoDB? Right now I'm thinking about a timer that checks the document count or the last ID, but maybe there is some way to implement a mechanism like a newDocumentAddedEvent?
There are no triggers in MongoDB (yet?), but if you're running a replica set (as you should), your app can pretend to be a catching-up secondary: tail the oplog collection and get notified about new inserts and updates.
This is a very efficient approach; MongoDB itself uses it for replication.
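As a rough sketch of what tailing the oplog can look like with the official C# driver (MongoDB.Driver): the connection string and the "mydb.mycollection" namespace are placeholders, and this assumes it runs inside an async method.

using System;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
// The oplog lives in the "local" database on replica set members
var oplog = client.GetDatabase("local").GetCollection<BsonDocument>("oplog.rs");

var options = new FindOptions<BsonDocument>
{
    CursorType = CursorType.TailableAwait, // block and wait for new entries
    NoCursorTimeout = true
};

// "op" is the operation type ("i" = insert), "ns" the namespace written to
var filter = Builders<BsonDocument>.Filter.Eq("op", "i")
           & Builders<BsonDocument>.Filter.Eq("ns", "mydb.mycollection");

using (var cursor = await oplog.FindAsync(filter, options))
{
    // Blocks waiting for new matching oplog entries; "o" holds the inserted document
    await cursor.ForEachAsync(doc => Console.WriteLine("New document: " + doc["o"]));
}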
I'm building a multithreaded program that handles big data, and I wonder what I can do to tweak it.
Right now I have up to 50 million entries in a normal List, and since I use multithreading, I use a lock statement:
public string getUsername()
{
    string user = null;
    lock (UsersToCheckExistList)
    {
        user = UsersToCheckExistList.First();
        UsersToCheckExistList.Remove(user);
    }
    return user;
}
When I run smaller lists (around 500k lines) it works much faster, but when I load a bigger list (5-50 million) it starts to slow down. One way to solve this is to dynamically create many small lists and store them in a Dictionary, and that is the way I think I will go. But as I want to learn more about optimizing, I wonder if there is a better solution for this task?
All I want is to get a value from the collection and remove it from the collection at the same time.
You're using the wrong tool for the job: explicit locking is quite expensive, not to mention that removing the head of a List is O(Count), since every remaining element has to be shifted down. If you want a collection that is accessed concurrently, it's best to use the types in System.Collections.Concurrent, as they are heavily optimised for concurrent access. From your use case it seems you want a queue of users, so use a ConcurrentQueue:
ConcurrentQueue<string> UsersQueue;

public string getUsername()
{
    // TryDequeue atomically removes and returns the head of the queue;
    // if the queue is empty, it returns false and user stays null
    UsersQueue.TryDequeue(out string user);
    return user;
}
The problem is that removing the first item from a list is O(n), so as your list grows it takes longer and longer to remove the first item. You would probably be better off using a Queue instead. Since you need thread safety, you can use ConcurrentQueue, which handles efficient locking for you.
You can put them all in a ConcurrentBag (https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentbag-1?view=netframework-4.8); then each thread can just use the TryTake method to grab one entry and remove it at the same time, and you don't need to worry about doing your own locking.
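A minimal sketch of that approach, assuming the bag is seeded from your existing list (the UserSource and GetUsername names are just illustrative):

using System.Collections.Concurrent;
using System.Collections.Generic;

class UserSource
{
    private readonly ConcurrentBag<string> users;

    public UserSource(IEnumerable<string> initialUsers)
    {
        users = new ConcurrentBag<string>(initialUsers);
    }

    // TryTake atomically removes and returns one entry; it returns
    // false (and we return null) when the bag is empty.
    public string GetUsername()
    {
        return users.TryTake(out string user) ? user : null;
    }
}

Note that a ConcurrentBag is unordered; if you need FIFO behaviour, use ConcurrentQueue instead.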
If you have enough RAM for your data, you should definitely use a ConcurrentQueue for FIFO access to your data.
But if you don't have enough RAM, you can try using a database. Modern databases cache data very effectively: you will have almost instant access to your data, and you spare the OS from swapping.
I'm using the StackExchange.Redis SDK in C#, and I wish to scan my hash set.
I expected the SDK to behave like the redis client (when I execute "hscan myKey 0", it returns several key-value pairs and a cursor which I'll use for the next scan). But when I use the StackExchange.Redis SDK to call the hash scan method as follows:
redisCache.HashScan(myKey, pageSize: 10, cursor: 0)
it returns all the fields in "myKey", and there are 2,000 key-value pairs in it.
How can I get it to return just a few results at a time?
In the future there will be millions of fields in "myKey"; if they are all returned at once, it will cost lots of memory. And will it block the online service? After all, redis is a single-threaded application.
Thanks!
It isn't doing quite what you think it is. The HashScan method here returns a custom iterator which holds at most two pages of data at a time; when you get near the end of one page, it fetches the next page automatically. Essentially, then, if you only want to read 20 items, just read 20 items: LINQ's .Take(20), for example, works fine. If you call .ToList() on the iterator, then yes, it will walk from one end to the other, fetching data as it goes. So: don't do that :)
Things it does not do:
fetch all the data in a single huge call to redis
perform lots of small calls to redis before returning from the HashScan method
As a side note: the custom iterator implements a custom interface that allows you to pick up and resume cursors, if you need that.
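For example, here is a minimal sketch of reading just the first 20 entries (assuming redisCache is an IDatabase and myKey is the hash key from the question):

using System;
using System.Linq;
using StackExchange.Redis;

// HashScan returns a lazy IEnumerable<HashEntry>; Take(20) only pulls
// as many pages from redis as are needed to yield 20 items.
foreach (HashEntry entry in redisCache.HashScan(myKey, pageSize: 10).Take(20))
{
    Console.WriteLine(entry.Name + " = " + entry.Value);
}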
Is there any need to handle locks, in terms of threading, in an inventory application?
As I understand it, ASP.NET is not thread safe.
Let's say there is a product available with a quantity of 1, and 40 users are simultaneously trying to book that particular product. Which one is going to get it, and what happens to the rest?
I'm not even sure whether the question makes sense.
http://blogs.msdn.com/b/benchr/archive/2008/09/03/does-asp-net-magically-handle-thread-safety-for-you.aspx
I'm not sure about this; please help.
Well, technically, you're not even talking about ASP.NET here, but rather Entity Framework or whatever else you're using to communicate with SQL Server or whatever other persistent data store you use. Relational databases will typically row-lock, so that while one client is updating a row, the row cannot be read by another client, but you can still run into concurrency issues.
You can handle this situation in one of two ways: pessimistic concurrency or optimistic concurrency. With pessimistic concurrency you take locks, and any other thread trying to read or write the same data is simply turned away in the meantime. In a multi-threaded environment it's far more common to use optimistic concurrency, since that allows a bit of room to recover.
With optimistic concurrency, you version the data. As a simplistic example, let's say I'm tracking the current stock of widgets in my dbo.Widgets table. I'd have a column like Version, which might initially be set to 1, and 100 widgets in my Stock column. Client one wants to buy a widget, so I read the row and note the version, 1. Now I want to update the row, so I do an update setting Stock to 99 and Version to 2, but I include Version = 1 in my WHERE clause. But between the time the row was read and the update was sent, another client bought a widget and bumped the row's version to 2. The first client's update fails, because Version is no longer 1. So the application reads the row fresh and tries again, subtracting 1 from Stock and incrementing Version by 1. Rinse and repeat. Generally you'll want some upper limit on attempts before you give up and return an error to the user, but in most scenarios you might have one collision and then the next attempt goes through fine. Your server would have to be getting slammed with people eagerly buying widgets before it became a real problem.
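In raw ADO.NET, the versioned update and retry test look roughly like this (a sketch only: connection, widgetId, and observedVersion are assumed to already exist, and the table and column names follow the example above):

using System.Data.SqlClient;

int rowsAffected;
using (var cmd = new SqlCommand(
    "UPDATE dbo.Widgets SET Stock = Stock - 1, Version = Version + 1 " +
    "WHERE Id = @id AND Version = @version", connection))
{
    cmd.Parameters.AddWithValue("@id", widgetId);
    cmd.Parameters.AddWithValue("@version", observedVersion);
    rowsAffected = cmd.ExecuteNonQuery();
}

if (rowsAffected == 0)
{
    // Another client updated the row first: re-read it to get the new
    // Version and Stock, then retry (up to some maximum attempt count).
}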
Now, of course, this is a highly simplistic approach, and honestly not something you really have to manage yourself. Entity Framework, for example, will handle concurrency for you automatically, as long as you have a rowversion column:
[Timestamp]
public byte[] RowVersion { get; set; }
See http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application for the full guide to setting it up.
ASP.NET certainly is not thread safe. The article you link to is fine as a start, but doesn't tell the whole story by a long way. In your case, you likely load the product list into memory at the first request for it, at application startup, or on some other trigger.
When a request wants to work with a product, you grab the appropriate member of this preloaded list. (Believe me, this is better than having every request load the product or product list from the database.) However, if you now have 40 simultaneous requests for the same product, they will all be accessing the same object, and nasty things can happen, like ending up with -39 stock.
You can address this in many ways, but they boil down to two:
Protect the data somehow
Do what Amazon does
Protect the data
There are numerous ways of doing this. One is to use a critical section via the lock keyword in C#. For example, something like this in the Product class:
private readonly object lockableThing = new object(); // dedicated lock object

public bool ReduceStockLevelForSale(int qtySold)
{
    bool success = false;
    if (this.quantityOnHand >= qtySold) // cheap pre-check outside the lock
    {
        lock (lockableThing)
        {
            // re-check inside the lock: another thread may have sold
            // stock between the first check and acquiring the lock
            if (this.quantityOnHand >= qtySold)
            {
                this.quantityOnHand -= qtySold;
                success = true;
            }
        }
    }
    return success;
}
The double check on the quantity on hand is deliberate and required. There are any number of ways of doing the equivalent. Books have been written about this sort of thing.
Do what Amazon does
As long as at some point in the order-taking sequence Amazon thinks it has enough on hand (or maybe even any), it will let you place the order. It doesn't reduce the stock level while the order is being confirmed. Once the order has been confirmed, a back-end process (i.e. NOT run by the web site) checks, order by order, that the order can be fulfilled, and only reduces the on-hand level if it can. If it can't be, they put the order on hold and send you an email saying 'Sorry! We don't have enough of Product X!' and giving you some options.
Discussion
Amazon's is the best way, because if you decrement the stock from the web site, at what point do you do it? Probably not until the order is confirmed; and if the stock has gone by then, what do you do? You are going to need the 'Sorry!' email functionality anyway: what happens when the last one (or two, or three) items of that product can't be found, don't physically exist, or are broken? You send a 'Sorry!' email.
However, this does assume that you are in control of the full order-to-dispatch cycle, which is not always the case. If you aren't in control of the full cycle, you need to work out what you are in control of, and then pick a method.
Hi, I'm trying to write a lock-free list. I think I've got the adding part working, but the code that extracts objects from the list doesn't work too well. :(
Well, the list is not a normal list. I have the interface IWorkItem:
interface IWorkItem
{
    DateTime ExecuteTime { get; }
    bool Cancelled { get; }
    void Execute(DateTime now);
}
I have a list where I can add these, and the idea is that when I call Get() on the list, it should loop until it finds an IWorkItem where
item.ExecuteTime < DateTime.Now
and then remove it from the list and return it.
I have run tests with many threads on my dual-core CPU, and it seems that Add never fails so far, but the Get function loses some work items somewhere, and I have no idea what's wrong.
PS: if I get this working, anyone is free to use the code :) well, you are anyway, but I don't see the point while it's bugged :P
The code is here: http://www.easy-share.com/1903474734/LinkedList.zip and if you run it you will see that it sometimes cannot get as many work items out as were put into the list.
Edit: I have got a lock-free list working. It was faster than using the lock(obj) statement, but I have a lock object that uses Interlocked that was still outperforming the lock-free list. I'm going to try to make a lock-free array list and see if I get the same results there; when I'm done I'll upload the results here.
The problem is your algorithm. Consider this sequence of events:
Thread 1 calls list.Add(workItem1), which completes fully.
Status is:
first = workItem1, workItem1.next = null
Then thread 1 calls list.Add(workItem2) and reaches the spot right before the second Replace (where you have the comment "//lets try").
Status is:
first = workItem1, workItem1.next = null, nextItem = workItem1
At this point thread 2 takes over and calls list.Get(). Assume workItem1's ExecuteTime has passed, so the call succeeds and returns workItem1.
After this, the status is:
first = null, workItem1.next = null
(and in the other thread, nextItem is still workItem1).
Now we get back to the first thread, and it completes the Add() by setting workItem1.next = workItem2. But first is null, so workItem2 is now unreachable: if we call list.Get(), we will get null, even though the Add() completed successfully.
You should probably look up a real, peer-reviewed lock-free linked-list algorithm. I think the standard one is this one by John Valois. There is a C++ implementation here. This article on lock-free priority queues might also be of use.
You can use a timestamping protocol for data structures just fine, mirroring this example from the database world: Concurrency. But be clear that each item needs both a read and a write timestamp, and be sure to follow the rules of the algorithm carefully.
There are some additional difficulties in implementing this on a linked list, though. The database example works fine for a vector, where you know the array index of what you want. In a linked list, however, you may need to walk down the pointers, and the structure of the list could change while you are searching! You could perhaps solve that with some care (or, if you just want to traverse the "new" list as it now is, do nothing), but it poses a problem. Try to solve it without introducing some rollback condition that makes it worse than locking the list!
So are you sure it needs to be lock-free? Depending on your workload, the non-blocking solution can sometimes be slower. Check out this MSDN article for a little more. Also, proving that a lock-free data structure is correct can be very difficult.
I am in no way an expert on the subject, but as far as I can see, you need to either make the ExecuteTime field in the implementation of IWorkItem volatile (of course, it might already be) or insert a memory barrier either after you set ExecuteTime or before you read it.
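Since DateTime is a struct and can't be marked volatile directly, one option is to store the ticks in a long and go through Volatile.Read/Write. A sketch (everything in this WorkItem class beyond the interface members is made up for illustration):

using System;
using System.Threading;

class WorkItem : IWorkItem
{
    private long executeTicks; // the DateTime stored as ticks, so it fits in a long

    public DateTime ExecuteTime
    {
        // Volatile.Read/Write prevent the field from being cached or
        // having its reads/writes reordered across threads
        get { return new DateTime(Volatile.Read(ref executeTicks)); }
        set { Volatile.Write(ref executeTicks, value.Ticks); }
    }

    public bool Cancelled { get; set; }

    public void Execute(DateTime now)
    {
        // the actual work goes here
    }
}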
I need to enumerate through a generic IList<> of objects. The contents of the list may change, as in items being added or removed by other threads, and this kills my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing a thread-safe foreach on an IList<>? Preferably without cloning the entire list. It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures the list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading or writing to it.
There is no such operation. The best you can do is:
lock (collection)
{
    foreach (object o in collection)
    {
        // ...
    }
}
Your problem is that an enumeration does not allow the IList to change, so you have to prevent modification while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on (see the sketch below).
Serialize access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
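A minimal sketch of the cloning option; note that copying the list copies only the references, not the objects themselves, so it sidesteps the "can't clone the actual objects" constraint (sharedList and syncRoot are illustrative names):

using System.Collections.Generic;

List<object> snapshot;
lock (syncRoot)
{
    // copy the references while no other thread can mutate the list
    snapshot = new List<object>(sharedList);
}

// the snapshot is private to this thread, so enumerating it cannot be
// invalidated by concurrent adds/removes on sharedList
foreach (object o in snapshot)
{
    // work with o...
}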
ICollection MyCollection;
// Instantiate and populate the collection, then:
lock (MyCollection.SyncRoot)
{
    // Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that this is a very interesting topic.
The best approach relies on a reader-writer lock; the classic ReaderWriterLock used to have big performance issues due to the so-called convoy problem.
The best article I've found on the subject is this one by Jeffrey Richter, which presents his own method for a high-performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy, while simultaneously adding and removing elements.
Could you clarify a few things? Do insertions and deletions happen only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near, or at, the enumeration's current position?
This is certainly doable by creating a custom IEnumerable object with, perhaps, an integer index, but only if you can control all access to your IList<> object (for locking and for maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
foreach depends on the collection not changing. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
The default behavior for a simple indexed data structure like a linked list, B-tree, or hash table is to enumerate in order from first to last. It would not cause a problem to insert an element after the iterator has already passed that point, or to insert one that the iterator will enumerate once it gets there, and such an event can be detected by the application and handled if required. Detecting a change in the collection and throwing an error during enumeration was, I can only imagine, someone's (bad) idea of doing what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly: they have called their shiny new unbroken collections the concurrent collections (System.Collections.Concurrent) in .NET 4.0.
I recently spent some time multithreading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
In many cases you can use the good old for loop and immediately assign the object to a local copy to use inside the loop. Just keep in mind that all threads writing to the objects in your list should write to different data within those objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach (var p in Points)
{
    // work with p...
}
Can be replaced by:
for (int i = 0; i < Points.Count; i++)
{
    Point p = Points[i];
    // work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock: one that allows multiple concurrent readers but only a single writer (and only when there are no readers).
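ReaderWriterLockSlim is exactly that kind of lock. A minimal sketch of wrapping a list with it (the class and member names are illustrative):

using System;
using System.Collections.Generic;
using System.Threading;

class LockedList<T>
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly List<T> items = new List<T>();

    public void Add(T item)
    {
        rwLock.EnterWriteLock(); // exclusive: waits until all readers have left
        try { items.Add(item); }
        finally { rwLock.ExitWriteLock(); }
    }

    public void ForEach(Action<T> action)
    {
        rwLock.EnterReadLock(); // any number of threads may read concurrently
        try
        {
            foreach (T item in items)
                action(item);
        }
        finally { rwLock.ExitReadLock(); }
    }
}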
This is something I've recently had to deal with, and to me it really depends on what you're doing with the list.
If you need to use the list as it is at a point in time (given the number of elements currently in it), AND other threads can only ADD to the end of the list, then maybe you can just switch to a for loop with a counter. At the point you grab the counter, you only see that many elements in the list, and you can walk through them while others add to the end; it should not cause a problem.
Now, if the list needs to have items taken OUT of it by other threads, or be CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor to a dictionary).