Sorry if this question seems like a duplicate, but I was not able to figure out a solution from the other answers.
I have a PostgreSQL database that I am accessing using Npgsql.
I have multiple clients reading from the DB simultaneously.
I am getting the exception "Operation is in progress."
I know the reason behind it.
I am using
public bool ReadRecord(arguments)
{
    ....
    NpgsqlDataReader reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        ...
    }
    ...
}
Calling routine:

private void GetBundleIdAndIsWat(arguments)
{
    ...
    TaskQueue.GlobalInstance.AddTaskQueue(new Task<bool>(ReadRecord, InputData), tokenSource);
    ...
}
and while the reader is not disposed of by one thread, the other threads are not able to execute a command.
How can multiple threads read from the DB simultaneously?
Does ExecuteReaderAsync allow only one thread to execute the command at a time?
In this case, I won't be able to read at the same time, right?
I read about Connection Pool but don't really know how to implement it.
How can multiple threads read from the DB simultaneously?
It looks like you're sharing a single database connection between parallel operations. Just use a separate connection per operation.
Note that:

- The number of parallel connections is limited both by the connection pool size and by PostgreSQL itself. The pool size defaults to 100 connections; you can change it by setting the MaxPoolSize parameter in the connection string. PostgreSQL limits connections with the max_connections parameter in postgresql.conf, which also defaults to 100. Note that a hundred parallel connections is rather a lot; don't increase it until you really need to.
- You mentioned threads, but this is a good candidate for parallel tasks and asynchronous code instead of threads. That will also utilize connections more efficiently.
I read about Connection Pool but don't really know how to implement it
You don't need to implement a connection pool yourself. Any well-made ADO.NET provider already implements one. Just use connections, commands, and readers as usual: create, use, dispose.
Related
I have just learned that you can use a thread pool for multi-client TCP connections, and I have a C# application today that I would like to apply this to. I have read some material, for example the first answer to this question ( Best way to accept multiple tcp clients? ), but I don't really get how to make the last adjustments to work with my needs. I have a message-handling function for each connection. Each connection uses two threads: one for receiving/sending messages (the connection stays open for a long duration most of the time) and one for doing tasks depending on the messages (it also creates the answers to send back). I would now like to use the receiving method in the link below, but how can I do this with a thread pool in my example?
If anything is unclear, just ask questions!
/Nick
Just avoid having one thread per connection. It generates a lot of overhead on the OS and doesn't scale well.
Today we use NIO: non-blocking I/O. One thread can handle 10k+ connections. There are very easy ways to use it, like NodeJs for example. NIO libraries are available for most platforms/languages (Netty for Java, NodeJs with JavaScript, ...).
You should specify which language/environment you are using.
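Since the application in question is C#, the same non-blocking style can be sketched with TcpListener and async/await: no thread is dedicated to a connection, so a small thread pool serves many clients. EchoServer below is a toy illustration under those assumptions, not the asker's actual protocol.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Toy echo server: one async accept loop, no dedicated thread per client.
public static class EchoServer
{
    public static TcpListener Start(out int port)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0 = pick a free port
        listener.Start();
        port = ((IPEndPoint)listener.LocalEndpoint).Port;
        AcceptLoop(listener); // fire-and-forget accept loop
        return listener;
    }

    private static async void AcceptLoop(TcpListener listener)
    {
        while (true)
        {
            TcpClient client;
            try { client = await listener.AcceptTcpClientAsync(); }
            catch (Exception) { return; } // listener was stopped
            HandleClient(client); // each client becomes an async task, not a thread
        }
    }

    private static async void HandleClient(TcpClient client)
    {
        try
        {
            using (client)
            {
                var stream = client.GetStream();
                var buffer = new byte[1024];
                int read;
                // Echo until the client closes; the thread is free during awaits.
                while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                    await stream.WriteAsync(buffer, 0, read);
            }
        }
        catch (Exception) { } // connection dropped; just end this client
    }
}
```

The same two-thread-per-connection design from the question collapses into one handler method per client, with awaits wherever the old design blocked.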
I am using SQLite from system.data.sqlite.org
We need to access the database from many threads (for various reasons). I've read a lot about sqlite thread safe capabilities (the default synchronized access mode is fine for me).
I wonder if it is possible to simply open a connection per thread. Is something like this possible? I really don't care about race conditions (request something that hasn't been inserted yet). I am only interested in the fact that it is possible to access the data using one SQLiteConnection object per thread.
Yes. In fact, it's the proper way, as SQLite is not thread safe by default (you can make it thread safe by compiling with certain options). And just to ensure it works: SQLite is being used in some small websites, so multithreading is there :)
Here more information: http://www.sqlite.org/faq.html#q6
Given you use a separate connection per thread, you should be fine.
From the docs:

Note that a SQLiteConnection instance is not guaranteed to be thread safe. You should avoid using the same SQLiteConnection in several threads at the same time. It is recommended to open a new connection per thread and to close it when the work is done.
I have a .NET 4 C# service that is using the TPL libraries for threading. We recently switched it to also use connection pooling, since one connection was becoming a bottleneck for processing.
Previously, we were using a lock clause to control thread safety on the connection object. As work backed up, it would queue as tasks, and many threads (tasks) would wait on the lock. Now, in most scenarios, threads wait on database IO and work processes much faster.
However, now that I'm using connection pooling, we have a new issue. Once the max number of connections is reached (100 default), if further connections are requested, there is a timeout (see Pooling info). When this happens, an exception is thrown saying "Connection request timed out".
All of my IDisposables are within using statements, and I am properly managing my connections. This scenario happens due to more work being requested than the pool can process (which is expected). I understand why this exception is thrown, and am aware of ways of handling it. A simple retry feels like a hack. I also realize that I can increase the timeout period via the connection string, however that doesn't feel like a solid solution. In the previous design (without pooling), work items would process because of the lock within the application.
What is a good way of handling this scenario to ensure that all work gets processed?
Another approach is to use a semaphore around the code that retrieves connections from the pool (and, hopefully, returns them). A semaphore is like a lock statement, except that it allows a configurable number of requestors at a time, not just one.
Something like this should do:
// Assuming mySemaphore is a semaphore instance, e.g.
// public static Semaphore mySemaphore = new Semaphore(100, 100);
mySemaphore.WaitOne(); // Blocks until a slot is available; acquire before the
                       // try so a failed wait never triggers a spurious Release.
try
{
    DoSomeDatabaseLogic();
}
finally
{
    mySemaphore.Release();
}
You could look to control the degree of parallelism by using the Parallel.ForEach() method as follows:
var items = GetWorkItems(); // your collection of work items
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 100 };
Parallel.ForEach(items, parallelOptions, ProcessItem);
In this case I chose to set the degree to 100, but you can choose a value that makes sense for your current connection pool implementation.
This solution of course assumes that you have a collection of work items up front. If, however, you're creating new Tasks through some external mechanism such as incoming web requests, the exception is actually a good thing. At that point I would suggest that you make use of a concurrent queue data structure where you can place the work items and pop them off as worker threads become available.
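A minimal sketch of that queue uses BlockingCollection<T> with a fixed number of workers sized to the connection pool. The names here (WorkQueue, ProcessAll, handler) are illustrative; the handler stands in for the per-item database call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WorkQueue
{
    // Fixed worker pool draining a BlockingCollection: producers keep adding
    // items while the workers process them at the pool's pace.
    public static int[] ProcessAll(int[] items, int workerCount, Func<int, int> handler)
    {
        var queue = new BlockingCollection<int>();
        var results = new ConcurrentBag<int>();
        var workers = new Task[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = Task.Factory.StartNew(() =>
            {
                // GetConsumingEnumerable blocks until an item arrives and
                // ends once CompleteAdding has been called and the queue drains.
                foreach (var item in queue.GetConsumingEnumerable())
                    results.Add(handler(item));
            });
        }
        foreach (var item in items) queue.Add(item);
        queue.CompleteAdding();
        Task.WaitAll(workers);
        var output = results.ToArray();
        Array.Sort(output);
        return output;
    }
}
```

With incoming web requests, the producer side would be the request handler calling queue.Add, and the worker count caps how many connections are ever requested from the pool at once.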
The simplest solution is to increase the connection timeout to the length of time you are willing to block a request before returning failure. There must be some length of time that is "too long".
This effectively uses the connection pool as a work queue with a timeout. It's a lot easier than trying to implement one yourself, though you would have to check that the connection pool is fair (FIFO).
My problem is that I'm apparently using too many tasks (threads?) that call a method that queries a SQL Server 2008 database. Here is the code:
for (int i = 0; i < 100000; i++)
{
    Task.Factory.StartNew(() => MethodThatQueriesDataBase()).ContinueWith(t => OtherMethod(t));
}
After a while I get a SQL timeout exception. I want to keep the actual number of threads lower than 100000, with a buffer of, say, "no more than 10 at a time". I know I can manage my own threads using the ThreadPool, but I want to be able to use the beauty of TPL with ContinueWith.
I looked at the Task.Factory.Scheduler.MaximumConcurrencyLevel but it has no setter.
How do I do that?
Thanks in advance!
UPDATE 1
I just tested the LimitedConcurrencyLevelTaskScheduler class (pointed out by Jon Skeet) and it still does the same thing (SQL timeout).
BTW, this database receives more than 800000 events per day and has never had crashes or timeouts from those. It seems weird that this would cause them.
You could create a TaskScheduler with a limited degree of concurrency, as explained here, then create a TaskFactory from that, and use that factory to start the tasks instead of Task.Factory.
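The linked scheduler can be condensed to a sketch like the following. This is an illustration of the idea in the spirit of the MSDN LimitedConcurrencyLevelTaskScheduler sample, not the sample itself; the class and member names are this sketch's own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// TaskScheduler that runs at most maxDegree tasks concurrently.
public sealed class LimitedScheduler : TaskScheduler
{
    private readonly LinkedList<Task> _tasks = new LinkedList<Task>(); // pending tasks
    private readonly int _maxDegree;
    private int _dispatchers; // pool work items currently draining the queue

    public LimitedScheduler(int maxDegree)
    {
        if (maxDegree < 1) throw new ArgumentOutOfRangeException("maxDegree");
        _maxDegree = maxDegree;
    }

    public override int MaximumConcurrencyLevel
    {
        get { return _maxDegree; }
    }

    protected override void QueueTask(Task task)
    {
        lock (_tasks)
        {
            _tasks.AddLast(task);
            if (_dispatchers < _maxDegree)
            {
                _dispatchers++;
                ThreadPool.QueueUserWorkItem(_ => DrainQueue());
            }
        }
    }

    private void DrainQueue()
    {
        while (true)
        {
            Task next;
            lock (_tasks)
            {
                if (_tasks.Count == 0) { _dispatchers--; return; }
                next = _tasks.First.Value;
                _tasks.RemoveFirst();
            }
            TryExecuteTask(next); // runs the task on this pool thread
        }
    }

    // Refusing inlining keeps the concurrency accounting simple.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        lock (_tasks)
        {
            var snapshot = new Task[_tasks.Count];
            _tasks.CopyTo(snapshot, 0);
            return snapshot;
        }
    }
}
```

You would then keep the ContinueWith style from the question, just starting work via a factory bound to the scheduler: `var factory = new TaskFactory(new LimitedScheduler(10)); factory.StartNew(() => MethodThatQueriesDataBase()).ContinueWith(t => OtherMethod(t));`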
Tasks are not 1:1 with threads - tasks are assigned threads for execution out of a pool of threads, and the pool of threads is normally kept fairly small (number of threads == number of CPU cores) unless a task/thread is blocked waiting for a long-running synchronous result - such as perhaps a synchronous network call or file I/O.
So spinning up 10,000 tasks should not result in the production of 10,000 actual threads. However, if every one of those tasks immediately dives into a blocking call, then you may wind up with more threads, but it still shouldn't be 10,000.
What may be happening here is you are overwhelming the SQL db with too many requests all at once. Even if the system only sets up a handful of threads for your thousands of tasks, a handful of threads can still cause a pileup if the destination of the call is single-threaded. If every task makes a call into the SQL db, and the SQL db interface or the db itself coordinates multithreaded requests through a single thread lock, then all the concurrent calls will pile up waiting for the thread lock to get into the SQL db for execution. There is no guarantee of which threads will be released to call into the SQL db next, so you could easily end up with one "unlucky" thread that starts waiting for access to the SQL db early but doesn't get into the SQL db call before the blocking wait times out.
It's also possible that the SQL back-end is multithreaded, but limits the number of concurrent operations due to licensing level. That is, a SQL demo engine only allows 2 concurrent requests but the fully licensed engine supports dozens of concurrent requests.
Either way, you need to do something to reduce your concurrency to more reasonable levels. Jon Skeet's suggestion of using a TaskScheduler to limit the concurrency sounds like a good place to start.
I suspect there is something wrong with the way you're handling DB connections. Web servers can have thousands of concurrent page requests running, all in various stages of SQL activity. I'm betting that attempting to reduce the concurrent task count is really masking a different problem.
Can you profile the SQL connections? Check out perfmon to see how many active connections there are. See if you can grab-use-release connections as quickly as possible.
The Goal
Use an ADO.NET IDbConnection and IDbCommand to execute multiple commands at the same time, against the same database, given that the ADO.NET implementation is specified at runtime.
Investigation
The MSDN Documentation for IDbConnection does not specify any threading limitations. The SqlConnection page has the standard disclaimer saying "Any instance members are not guaranteed to be thread safe." The IDbCommand and SqlCommand documentation is equally un-informative.
Assuming that no individual instance member is thread-safe, I can still create multiple commands from a connection (on the same thread) and then execute them concurrently on different threads.
Presumably this would still not achieve the desired effect, because (I assume) only one command can execute at a time on the single underlying connection to the database. So the concurrent IDbCommand executions would get serialized at the connection.
Conclusion
So this means we have to create a separate IDbConnection, which is ok if you know you're using SqlConnection because that supports pooling. If your ADO.NET implementation is determined at runtime, these assumptions cannot be made.
Does this mean I need to implement my own connection pooling in order to support performant multi-threaded access to the database?
You will need to manage thread access to your instance members, but most ADO implementations manage their own connection pool. They generally expect that multiple queries will be run simultaneously.
I would feel free to open and close as many connections as is necessary, and handle any exceptions that could be thrown if pooling were not available.
Here's an article on ADO connection pooling
If you create a connection on one thread, you shouldn't use it on a different thread. The same goes for commands.
However, you can create a connection on each of your threads and use those objects safely on their own thread.
Pooling is for when you create lots of short-lived connection objects. It means the underlying (expensive) database connections are re-used.
Nick