Partition data into chunks and then ensure thread safety - C#

I'm writing some code that receives data from a socket, performs deserialization and then passes the result to my application. The deserialized objects can be grouped by their IDs (the ID is generated during the deserialization process).
To increase the performance of my application I wanted to use the new parallelism capabilities that came with C# 4.0. The only constraint I have is that two threads cannot access objects with the same ID. Now, I know I could just lock() on a sync object placed inside each object, but I want to avoid these locks (performance is an issue here).
The design I've thought about:
Create some kind of partitioner that will split the data by ID (this makes sure that every buffer I get will always have objects with the same ID grouped together).
Assign threads by using TPL or PLINQ.
Can someone suggest some sources or examples that do this?

I would suggest PLINQ when developing for multiple processors or cores.
PLINQ is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available. The change in programming model is tiny, meaning you don't need to be a concurrency guru to use it. In fact, threads and locks won't even come up unless you really want to dive under the hood to understand how it all works. PLINQ is a key component of Parallel FX, the next generation of concurrency support in the Microsoft® .NET Framework.
This covers:
From LINQ to PLINQ
PLINQ Programming Model
Processing Query Output
Concurrent Exceptions
Ordering in the Output Results
Side Effects
Putting PLINQ to Work
Parallel LINQ (PLINQ)
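As a minimal sketch of the partition-by-ID idea under your constraint (Message, its Id property, and the process handler are assumptions standing in for your deserialized objects and existing code): grouping by ID before parallelising guarantees that two threads never touch objects sharing an ID, so no per-object lock is needed.

using System;
using System.Collections.Generic;
using System.Linq;

static void ProcessByIdInParallel(IEnumerable<Message> incomingObjects, Action<Message> process)
{
    incomingObjects
        .GroupBy(o => o.Id)                                  // partition by ID first
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount) // spread the groups across cores
        .ForAll(group =>
        {
            // Everything sharing an ID stays on one thread and runs sequentially,
            // so no lock on the individual objects is required.
            foreach (var item in group)
                process(item);
        });
}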

Related

Best way of dealing with shared state in a real time system in dotnet core background service

I have a background service IHostedService in dotnet core 3.1 that takes requests from hundreds of clients (machines in a factory) using sockets (home rolled). My issue is that multiple calls can come in on different threads to the same method on a class which has access to an object (shared state). This is common in the codebase. The requests also have to be processed in the correct order.
The reason that this is not in a database is due to performance reasons (real time system). I know I can use a lock, but I don't want to have locks all over the code base.
What is a standard way to handle this situation? Do you use an in-memory database? An in-memory cache? Or do I just have to add locks everywhere?
public enum MachineState { Running, Stopped }

public class Machine
{
    public MachineState State { get; set; }

    // Gets called by multiple threads from multiple clients
    public bool CheckMachineStatus()
    {
        return State == MachineState.Running;
    }

    // Gets called by multiple threads from multiple clients
    public void SetMachineStatus()
    {
        State = MachineState.Stopped;
    }
}
Update
Here's an example. I have a console app that talks to a machine via sockets, for weighing products. When the console app initializes, it loads data into memory (information about the products being weighed). All of this is done on the main thread, to keep data integrity.
When a call comes in from the weigher on thread 1, it gets switched to the main thread to access the product information, and to finish any other work like raising events for other parts of the system.
Currently this switching from threads 1, 2, ... N to the main thread is done by a home-rolled solution, written to avoid having locking code all over the code base. It was written in .NET 1.1, and since moving to dotnet core 3.1 I thought there might be a framework, library, tool, technique etc. that could handle this for us, or just a better way.
This is an existing system that I'm still learning. Hope this makes sense.
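For reference, the single-consumer marshalling the question describes is commonly sketched roughly like this (a hypothetical illustration, not the poster's actual code): socket threads post work items onto a queue that one dedicated thread drains, so the shared product data is only ever touched from that thread and needs no locks.

using System;
using System.Collections.Concurrent;
using System.Threading;

public class MainThreadDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public MainThreadDispatcher()
    {
        _thread = new Thread(() =>
        {
            // Requests run one at a time, in arrival order, on this single thread.
            foreach (var action in _work.GetConsumingEnumerable())
                action();
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Called from the socket threads (thread 1..N).
    public void Post(Action action) => _work.Add(action);

    public void Dispose() => _work.CompleteAdding();
}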
Using an in-memory database is an option, as long as you are willing to delegate all concurrency-inducing situations to the database, and do nothing using code. For example if you must update a value in the database depending on some condition, then the condition should be checked by the database, not by your own code.
Adding locks everywhere is also an option, one that will almost certainly lead to unmaintainable code quite quickly. The code will probably be riddled with hidden bugs from the get-go, bugs that you will discover one by one over time, usually under the most unfortunate of circumstances.
You must realize that you are dealing with a difficult problem, with no magic solutions available. Managing shared state in a multithreaded application has always been a source of pain.
My suggestion is to encapsulate all this complexity inside thread-safe classes, that the rest of your application can safely invoke. How you make these classes thread-safe depends on the situation.
Using locks is the most flexible option, but not always the most efficient because it has the potential of creating contention.
Using thread-safe collections, like the ConcurrentDictionary for example, is less flexible because the thread-safety guarantees they offer are limited to the integrity of their internal state. If, for example, you must update one collection based on a condition obtained from another collection, then the whole operation cannot be made atomic just by using thread-safe collections. On the other hand these collections offer better performance than simple locks.
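A small sketch of that limitation, reusing the MachineState enum from the question and a hypothetical per-machine dictionary: each dictionary call is thread-safe on its own, but the check-then-update pair as a whole is not.

using System.Collections.Concurrent;

public class MachineRegistry
{
    private readonly ConcurrentDictionary<int, MachineState> _states =
        new ConcurrentDictionary<int, MachineState>();

    public void StopIfRunning(int machineId)
    {
        // The read below is thread-safe on its own...
        if (_states.TryGetValue(machineId, out var current) && current == MachineState.Running)
        {
            // ...but another thread can change the entry between the read and this
            // write, so the check-then-act pair is not atomic as a whole. A truly
            // atomic transition needs TryUpdate in a loop, or a lock.
            _states[machineId] = MachineState.Stopped;
        }
    }
}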
Using immutable collections, like the ImmutableQueue for example, is another interesting option. They are less efficient both memory- and CPU-wise than the concurrent collections (adding/removing is in many cases O(log n) instead of O(1)), and no more flexible than them, but they are very efficient specifically at providing snapshots of actively processed data. For atomically updating an immutable collection, there is the handy ImmutableInterlocked.Update method. It updates a reference to an immutable collection with an updated version of the same collection, without using locks. In case of contention with other threads it may invoke the supplied transformation multiple times, until it wins the race.
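As a rough illustration of that last point (assuming a hypothetical buffer of pending requests): ImmutableInterlocked.Update swaps the reference atomically and simply re-runs the lambda if another thread got there first.

using System.Collections.Immutable;

public class RequestBuffer
{
    // Shared state held as an immutable queue; readers always see a consistent snapshot.
    private ImmutableQueue<string> _pending = ImmutableQueue<string>.Empty;

    public void Enqueue(string request)
    {
        // No lock: the transformation is retried until the compare-and-swap wins.
        ImmutableInterlocked.Update(ref _pending, q => q.Enqueue(request));
    }

    // Cheap snapshot of everything queued so far.
    public ImmutableQueue<string> Snapshot() => _pending;
}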

Will DeleteManyAsync lock MongoDB collection while deleting documents?

I want to use the DeleteManyAsync method to delete multiple documents. Some of the collections involved will be big. In the meantime I would like my new documents to be inserted. I would like to know whether my database collection will be locked while DeleteManyAsync runs.
This is the code I want to use :
List<MyDocument> list = new List<MyDocument>();
var filter = Builders<MyDocument>.Filter.In("_id", vl.Select(i => i.InternalId));
await _context?.MyDocuments?.DeleteManyAsync(filter);
MongoDB locks are a low-level concern and are handled at the database server level. You, as a programmer writing a client application using the driver, do not need to concern yourself with the database locks too much.
What I'm trying to say is that when using the C# driver you won't notice any kind of issue related to concurrent write operations executed on the same collection. Locks are handled by the storage engine, not by the driver used at the client application level.
If you check this documentation you can read that, in case of conflicting write operations on the same collection, the storage engine will retry the operation at the server level:
When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation
So, again, the concurrency issues are handled at the server level.
Consider that if you need your application to be highly scalable, you should design your system to avoid concurrent write operations on the same collection as much as possible. As I said above, locks are handled by the storage engine in order to preserve the correctness of your data, but locks can reduce the overall scalability of your system. So, if scalability is critical in your scenario, you should carefully design your system and avoid contention of resources at the database level as much as possible.
At the client application level you just need to decide whether or not to retry a failed write operation.
Sometimes you can safely retry a failed operation, other times you can't (e.g. in some cases you will end up having duplicate data at the database level; a good guard against this is using unique indexes).
As a rule of thumb, idempotent write operations can safely be retried in case of a failure (because applying them multiple times does not have any side effect). Put another way, strive to have idempotent write operations as much as possible: this way you are always safe retrying a failed write operation.
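A hedged sketch of that rule applied to the delete from the question (the collection and ids parameters, a string InternalId field, and catching MongoException are assumptions, not prescriptions): deleting the same documents twice has no extra effect, so a single retry is safe.

using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Driver;

static async Task DeleteWithRetryAsync(IMongoCollection<MyDocument> collection, IEnumerable<string> ids)
{
    var filter = Builders<MyDocument>.Filter.In(d => d.InternalId, ids);
    try
    {
        await collection.DeleteManyAsync(filter);
    }
    catch (MongoException)
    {
        // Documents removed by the first attempt are simply not matched again,
        // so retrying the same idempotent delete once is harmless.
        await collection.DeleteManyAsync(filter);
    }
}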
If you need some guidance about MongoDB C# driver error handling, you can take a look at this documentation.
Update 25th July 2020
Based on the author's comment, it seems that the main concern is not the actual database locking strategy, but the delete performance instead.
In that case I would proceed in the following manner:
always prefer a command performing a single database roundtrip (such as deleteMany) over issuing multiple single commands (such as deleteOne). By doing a single roundtrip you will minimize the latency cost and you will perform a single database command. It's simply more efficient.
when you use a deleteMany command, be sure to always filter documents by using a proper index, so that a collection scan is avoided when finding the documents to be deleted
if you measure and you are sure that your bottleneck is the deleteMany speed, consider comparing the performance of the deleteMany command with that of an equivalent bulk write operation. I never tried that, so I have no idea about the actual speed comparison. My feeling is that probably there is no difference at all, because I suspect that under the hood deleteMany performs a bulk write. I have no clue on that, this is just a feeling.
consider changing your design in order to exploit the TTL index feature for automatic deletion of documents when some sort of expiration criterion is satisfied (see the sketch after this list). This is not always possible, but it can be handy when applicable.
if you perform the delete operation as part of some sort of cleanup task on the data, consider scheduling a job performing the data cleanup operation on a regular basis, but outside of your users' business hours.
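A sketch only of the TTL idea, assuming MyDocument carries a CreatedAt DateTime field: MongoDB will then remove each document on its own, roughly seven days after that timestamp.

using System;
using System.Threading.Tasks;
using MongoDB.Driver;

static async Task CreateTtlIndexAsync(IMongoCollection<MyDocument> collection)
{
    // Ascending index on the timestamp field, with an expiration interval attached.
    var keys = Builders<MyDocument>.IndexKeys.Ascending(d => d.CreatedAt);
    var options = new CreateIndexOptions { ExpireAfter = TimeSpan.FromDays(7) };
    await collection.Indexes.CreateOneAsync(new CreateIndexModel<MyDocument>(keys, options));
}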

How to Achieve Parallel Fan-out processing in Reactive Extensions?

We already have parallel fan-out working in our code (using ParallelEnumerable) which is currently running on a 12-core, 64G RAM server. But we would like to convert the code to use Rx so that we can have better flexibility over our downstream pipeline.
Current Workflow:
We read millions of records from a database (in a streaming fashion).
On the client side, we then use a custom OrderablePartitioner<T> class to partition the database records into groups. Let’s call an instance of this class: partioner.
We then use partioner.AsParallel().WithDegreeOfParallelism(5).ForAll(group => ProcessGroupOfRecordsAsync(group)); Note: this could be read as “Process all the groups, 5 at a time in parallel.” (I.e. parallel fan-out).
ProcessGroupOfRecordsAsync() – loops through all the records in the group and turns them into hundreds or even thousands of POCO objects for further processing (i.e. serial fan-out or better yet, expand).
Depending on the client’s needs:
This new serial stream of POCO objects is evaluated, sorted, ranked, transformed, filtered, filtered by a manual process, and possibly fanned out further in parallel and/or serially throughout the rest of the pipeline.
The end of the pipeline may end up storing new records into the database, displaying the POCO objects in a form, or displaying them in various graphs.
The process currently works just fine, except that points #5 and #6 aren't as flexible as we would like. We need the ability to swap in and out various downstream workflows. So, our first attempt was to use a Func<Tin, Tout> like so:
partioner.AsParallel()
    .WithDegreeOfParallelism(5)
    .ForAll(group => ProcessGroupOfRecordsAsync(group,
        singleRecord => NextTaskInWorkFlow(singleRecord)));
And that works okay, but the more we fleshed out our needs the more we realized we were just re-implementing Rx.
Therefore, we would like to do something like the following in Rx:
IObservable<RecordGroup> rg = dbContext.QueryRecords(inputArgs)
    .AsParallel().WithDegreeOfParallelism(5)
    .ProcessGroupOfRecordsInParallel();

if (client1)
    rg.AnalyzeRecordsForClient1().ShowResults();

if (client2)
    rg.AnalyzeRecordsForClient2()
        .AsParallel()
        .WithDegreeOfParallelism(3)
        .MoreProcessingInParallel()
        .DisplayGraph()
        .GetUserFeedBack()
        .Where(data => data.SaveToDatabase)
        .Select(data => data.NewRecords)
        .SaveToDatabase(Table2);
...

using (rg.Subscribe(groupId => LogToScreen("Group {0} finished.", groupId)))
{
    // ...
}
It sounds like you might want to investigate Dataflow in the Task Parallel Library - this might be a better fit than Rx for dealing with part 5, and could be extended to handle the whole problem.
In general, I don't like the idea of trying to use Rx for parallelization of CPU-bound tasks; it's usually not a good fit. If you are not careful, you can introduce inefficiencies inadvertently. Dataflow can give you a nice way to parallelize only where it makes most sense.
From MSDN:
The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library. This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available. For example, consider an application that processes image data from a web camera. By using the dataflow model, the application can process image frames as they become available. If the application enhances image frames, for example, by performing light correction or red-eye reduction, you can create a pipeline of dataflow components. Each stage of the pipeline might use more coarse-grained parallelism functionality, such as the functionality that is provided by the TPL, to transform the image.
Kaboo!
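To make the Dataflow suggestion a little more concrete, here is a minimal sketch of the fan-out from step 3 of the question; RecordGroup, Poco, ProcessGroup (assumed to return the expanded POCOs) and HandlePoco are hypothetical names standing in for the question's own types and methods.

using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static async Task RunPipelineAsync(IEnumerable<RecordGroup> recordGroups)
{
    // Expand each group into POCOs, with at most 5 groups in flight at once.
    var expand = new TransformManyBlock<RecordGroup, Poco>(
        group => ProcessGroup(group),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });

    // Downstream stage; swap this block out per client workflow.
    var downstream = new ActionBlock<Poco>(poco => HandlePoco(poco));

    expand.LinkTo(downstream, new DataflowLinkOptions { PropagateCompletion = true });

    foreach (var group in recordGroups)
        expand.Post(group);

    expand.Complete();
    await downstream.Completion;
}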
As no one has provided anything definite, I'll point out that the source code can be browsed on GitHub at Rx. Taking a quick tour around, it looks like at least some of the processing (all of it?) is done on the thread pool already. So maybe it isn't possible to explicitly control the degree of parallelization besides implementing your own scheduler (e.g. Rx TestScheduler), but it happens nevertheless. See also the links below; judging from the answers (especially the one provided by James in the first link), the observable tasks are queued and processed serially by design -- but one can provide multiple streams for Rx to process.
See also the other questions that are related and visible on the left side (by default). In particular it looks like this one, Reactive Extensions: Concurrency within the subscriber, could provide some answers to your question. Or maybe Run methods in Parallel using Reactive.
Edit: Just a note that if storing objects to the database becomes a problem, the Rx stream could push the save operations to, say, a ConcurrentQueue, which would then be processed separately. Another option would be to let Rx queue items with a proper combination of time and number of items and push them to the database by bulk insert.
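A small sketch of that batching idea, assuming a hypothetical pocoStream observable and an existing SaveBatch bulk-insert helper:

using System;
using System.Reactive.Linq;

// Flush whichever comes first: one second elapsed or 500 items accumulated.
IDisposable subscription = pocoStream
    .Buffer(TimeSpan.FromSeconds(1), 500)
    .Where(batch => batch.Count > 0)
    .Subscribe(batch => SaveBatch(batch));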

Why do c# iterators track creating thread over using an interlocked operation?

This is just something that's been puzzling me ever since I read about iterators on Jon Skeet's site.
There's a simple performance optimisation that Microsoft has implemented with their automatic iterators - the returned IEnumerable can be reused as an IEnumerator, saving an object creation. Now because an IEnumerator necessarily needs to track state, this is only valid the first time it's iterated.
What I cannot understand is why the design team took the approach they did to ensure thread safety.
Normally when I'm in a similar position I'd use what I consider to be a simple Interlocked.CompareExchange - to ensure that only one thread manages to change the state from "available" to "in process".
Conceptually it's very simple, a single atomic operation, no extra fields are required etc.
But the design team's approach? Every IEnumerable keeps a field with the managed thread ID of the creating thread, and on calling GetEnumerator the current thread ID is checked against this field; only if it's the same thread, and it's the first time it's called, can the IEnumerable return itself as the IEnumerator. It seems harder to reason about, imo.
I'm just wondering why this approach was taken. Are Interlocked operations far slower than two calls to System.Threading.Thread.CurrentThread.ManagedThreadId, so much so that it justifies the extra field?
Or is there some other reason behind this, perhaps involving memory models or ARM devices or something I'm not seeing? Maybe the spec imparts specific requirements on the implementation of IEnumerable? Just genuinely puzzled.
I can't answer definitively, but as to your question:
Are Interlocked operations far slower than two calls to
System.Threading.Thread.CurrentThread.ManagedThreadId, so much so that
it justifies the extra field?
Yes, interlocked operations are much slower than two calls to get the ManagedThreadId - interlocked operations aren't cheap because they require multi-CPU systems to synchronize their caches.
From Understanding the Impact of Low-Lock Techniques in Multithreaded Apps:
Interlocked instructions need to ensure that caches are synchronized
so that reads and writes don't seem to move past the instruction.
Depending on the details of the memory system and how much memory was
recently modified on various processors, this can be pretty expensive
(hundreds of instruction cycles).
In Threading in C#, it lists the overhead as 10 ns. Whereas getting the ManagedThreadId should be a normal, non-locked read of static data.
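To make the comparison concrete, here is a hedged sketch (not the actual compiler-generated state machine) of the two guard strategies side by side:

using System.Threading;

class EnumeratorClaim
{
    private readonly int _creatorThreadId = Thread.CurrentThread.ManagedThreadId;
    private int _state; // 0 = fresh, 1 = already handed out

    // Roughly what the compiler-generated iterator does: two plain, unfenced reads.
    public bool TryClaimWithThreadId()
    {
        if (_state == 0 && _creatorThreadId == Thread.CurrentThread.ManagedThreadId)
        {
            _state = 1;
            return true;
        }
        return false;
    }

    // What the question proposes: one atomic compare-and-swap, which issues
    // a full memory barrier (and cache synchronization) on every call.
    public bool TryClaimWithInterlocked()
    {
        return Interlocked.CompareExchange(ref _state, 1, 0) == 0;
    }
}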
Now this is just my speculation, but if you think about the normal use case, it would be to call the function to retrieve the IEnumerable and immediately iterate over it once. So in the standard use case the object is:
Used once
Used on the same thread it was created on
Short lived
So this design brings in no synchronization overhead and sacrifices 4 bytes, which will probably only be in use for a very short period of time.
Of course to prove this you would have to do performance analysis to determine the relative costs and code analysis to prove what the common case was.

.NET DB Query Without Allocations?

I have been given the task of re-writing some libraries written in C# so that there are no allocations once startup is completed.
I just got to one project that does some DB queries over an OdbcConnection every 30 seconds. I've always just used .ExecuteReader() which creates an OdbcDataReader. Is there any pattern (like the SocketAsyncEventArgs socket pattern) that lets you re-use your own OdbcDataReader? Or some other clever way to avoid allocations?
I haven't bothered to learn LINQ since all the DBs at work are Oracle-based and, the last time I checked, there was no official LINQ to Oracle provider. But if there's a way to do this in LINQ, I could use one of the third-party ones.
Update:
I don't think I clearly specified the reasons for the no-alloc requirement. We have one critical thread running and it is very important that it not freeze. This is for a near real-time trading application, and we do see freezes of up to 100 ms for some Gen 2 collections. (I've also heard of games being written the same way in C#.) There is one background thread that does some compliance checking and runs every 30 seconds. It does a DB query right now. The query is quite slow (approx. 500 ms to return with all the data), but that is okay because it doesn't interfere with the critical thread. Except that if the worker thread is allocating memory, it will cause GCs, which freeze all threads.
I've been told that all the libraries (including this one) cannot allocate memory after startup. Whether I agree with that or not, that's the requirement from the people who sign the checks :).
Now, clearly there are ways that I could get the data into this process without allocations. I could set up another process and connect it to this one using a socket. The new .NET 3.5 sockets were specifically optimized not to allocate at all, using the new SocketAsyncEventArgs pattern. (In fact, we are using them to connect to several systems and never see any GCs from them.) Then have a pre-allocated byte array that reads from the socket and go through the data, allocating no strings along the way. (I'm not familiar with other forms of IPC in .NET, so I'm not sure whether memory-mapped files and named pipes allocate or not.)
But if there's a faster way to get this no-alloc query done without going through all that hassle, I'd prefer it.
You cannot reuse IDataReader (or OdbcDataReader or SqlDataReader or any equivalent class). They are designed to be used with a single query only. These objects encapsulate a single record set, so once you've obtained and iterated it, it has no meaning anymore.
Creating a data reader is an incredibly cheap operation anyway, vanishingly small in contrast to the cost of actually executing the query. I cannot see a logical reason for this "no allocations" requirement.
I'd go so far as to say that it's very nearly impossible to rewrite a library so as to allocate no memory. Even something as simple as boxing an integer or using a string variable is going to allocate some memory. Even if it were somehow possible to reuse the reader (which it isn't, as I explained), it would still have to issue the query to the database again, which would require memory allocations in the form of preparing the query, sending it over the network, retrieving the results again, etc.
Avoiding memory allocations is simply not a practical goal. Better to perhaps avoid specific types of memory allocations if and when you determine that some specific operation is using up too much memory.
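For reference, a hedged sketch of the conventional per-poll pattern the answer has in mind (the SQL and column layout are assumptions): the command and reader objects allocated here are tiny next to the roughly 500 ms the query itself takes.

using System.Data.Odbc;

static void PollOnce(OdbcConnection connection, string sql)
{
    using (var command = new OdbcCommand(sql, connection))
    using (OdbcDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Reading typed values by ordinal (GetInt32/GetDouble) avoids the
            // per-row string allocations that GetString would add.
            int id = reader.GetInt32(0);
            double quantity = reader.GetDouble(1);
        }
    }
}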
For such a requirement, are you sure that a high-level language like C# is your choice?
You cannot say whether the .NET library functions you are using are internally allocating memory or not. The standard doesn't guarantee that, so if they are not using allocations in the current version of .NET framework, they may start doing so later.
I suggest you profile the application to determine where the time and/or memory are being spent. Don't guess - you will only guess wrong.
