I am currently writing a web crawler that runs on 8 threads. Each thread fetches a page, scrapes it for links, and then checks whether those links have already been captured. New links are stored.
This all works, but I’ve since run into memory problems, so I started migrating the crawler to store its data in a MySQL database.
The problem I’m having is how to get each thread to interact with the database independently, checking for data and inserting data when required.
It currently works with one thread, but as soon as I scale up the thread pool, I get "connection is already open" errors.
Each thread has its own connection object, created on the thread for connecting to the database. Am I ignorantly concluding that these connections can be separate?
Apologies, it turns out that there was actually an error in my code: I was opening the connection twice in the same thread.
For reference, in case other people have similar problems: it is possible to connect multiple times to the same database across multiple threads from C#, as long as each thread uses its own connection object, independent of the others.
I don't know if it's safe or even possible to share one connection across multiple threads, but as I'm up and running, it's not something I need to test.
Thanks
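For anyone landing here later, the pattern that ended up working (one connection per thread, opened once, with a check-and-insert step) can be sketched roughly as follows. This is only an illustration under assumptions: it is written in Python with the standard-library sqlite3 module standing in for the C#/MySQL setup, and the file name, the `links` table, and the function names are all hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical database file and table, used only for this illustration.
DB_PATH = os.path.join(tempfile.mkdtemp(), "crawler.db")


def init_db():
    conn = sqlite3.connect(DB_PATH)
    with conn:
        # The PRIMARY KEY lets the database itself reject duplicate links.
        conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY)")
    conn.close()


def store_links(urls):
    # Each thread opens its own connection exactly once and closes it when done.
    conn = sqlite3.connect(DB_PATH, timeout=30)
    try:
        for url in urls:
            with conn:  # commits on success, rolls back on error
                # INSERT OR IGNORE does the "already captured?" check and the
                # insert in a single statement.
                conn.execute("INSERT OR IGNORE INTO links (url) VALUES (?)", (url,))
    finally:
        conn.close()


def crawl_demo(thread_count=8):
    init_db()
    # Every thread reports the same three links, so duplicates are guaranteed.
    found = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
    threads = [
        threading.Thread(target=store_links, args=(found,))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    conn = sqlite3.connect(DB_PATH)
    count = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
    conn.close()
    return count
```

Calling `crawl_demo()` should leave three rows in the table no matter how many threads report the same links. In MySQL the comparable statement is `INSERT IGNORE` against a column with a unique index.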
Related
Imagine that you have an application that has access to SQL Server 2012. It reads data from one table, processes it, and writes the result to another table.
If you launch two such applications simultaneously on different computers, the resulting data will be duplicated.
The question is:
How to prevent this situation?
Please provide your examples in Transact-SQL and C#.
You set some state in the DB that tells applications a processing task is in progress. (I assume it's OK for the two applications to run one after the other with no side effects, or for the same app to run twice.)
The application then checks this state and refuses to run if it's set.
Alternatively, you can lock an entire table so the 2nd instance cannot read (or write) data using the isolation level.
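The state-flag idea above only works if checking the flag and setting it happen as one atomic step, otherwise both applications can see "not running" at the same moment. A minimal sketch of that step follows. The question asked for Transact-SQL and C#, so treat this purely as an illustration: it uses Python with the standard-library sqlite3 module as a stand-in, and the `job_state` table and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file, used only for this illustration.
DB_PATH = os.path.join(tempfile.mkdtemp(), "jobs.db")


def init_db():
    conn = sqlite3.connect(DB_PATH)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS job_state "
            "(name TEXT PRIMARY KEY, running INTEGER NOT NULL)"
        )
        conn.execute(
            "INSERT OR IGNORE INTO job_state (name, running) VALUES ('process', 0)"
        )
    conn.close()


def try_claim(conn):
    # One conditional UPDATE checks and sets the flag atomically.
    # Exactly one caller sees an affected-row count of 1.
    with conn:
        cur = conn.execute(
            "UPDATE job_state SET running = 1 "
            "WHERE name = 'process' AND running = 0"
        )
    return cur.rowcount == 1


def release(conn):
    with conn:
        conn.execute("UPDATE job_state SET running = 0 WHERE name = 'process'")


def demo():
    init_db()
    app_a = sqlite3.connect(DB_PATH)  # stands in for the first application
    app_b = sqlite3.connect(DB_PATH)  # stands in for the second application
    results = [try_claim(app_a), try_claim(app_b)]
    release(app_a)
    results.append(try_claim(app_b))
    app_a.close()
    app_b.close()
    return results
```

Here `demo()` returns `[True, False, True]`: the second claim is refused until the first application releases the flag. One caveat with this design is that an application that crashes while holding the flag leaves it set, so a real version would probably also need a timestamp or some other way to detect a stale claim.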
What you want is to lock the corresponding tables while one application is doing its job.
More info here: http://www.sqlteam.com/article/introduction-to-locking-in-sql-server
I have a C# Winforms application. Part of the application pulls the contents of a SQLite table and displays it to the screen on a datagridview. I seem to have a problem where multiple users/computers are using the application.
When the program loads, it opens a single connection to the SQLite DB engine, which remains open until the user exits the program. On load, it refreshes the table in question and continues to do so at regular intervals. The table correctly updates when one user is using it or if that one user has more than one instance of the program open. If, however, more than one person uses it, the table doesn't seem to reflect changes made by other users until the program is closed and reopened.
An example: the first user (user A) logs in. The table has 5 entries. They add one to it, so there are now 6 entries. User B now logs in and sees 6 entries. User A enters another record, for a total of 7. User B still sees 6 even after the automatic refresh, and won't see 7 until they close and reopen the program. User A sees 7 without any issue.
Any idea what could be causing this problem? It has to be something related to the DB engine for SQLite as I'm 100% sure my auto refresh is working properly. I suspect it has something to do with the write-ahead-logging feature or the connection pooling (which I have enabled). I disabled both to see what would happen, and the same issue occurs.
It could be a file lock issue: the first application may be taking an exclusive write lock, blocking out the other application instances. If this is true, then SQLite may simply be waiting until the lock is released before updating the data file, which is not ideal behaviour, but then again using SQLite for multi-user applications is not ideal either.
I have found hints that SHARED locks can (or should in the most recent version) be used. This may be a simple solution, but documentation of this is not easy to find (probably because this bends the specification of SQLite too far?)
Even so, it may be better to serialize file access yourself. How best to approach that depends on your precise system architecture.
Your system architecture is not clear from your description. You speak of "multiple users/computers" accessing the SQLite file.
If the multiple-computers requirement is implemented using a network share of the SQLite file, then this is indeed going to be a growing problem. A better architecture or another RDBMS would be advisable.
If multiple computers are accessing the data through a server process (or multiple server processes on the same machine?), then a simple Monitor lock (lock keyword), or ReaderWriterLock will help (in the case of multiple server processes an OS mutex would be required).
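For the single-server-process case, serializing access with a plain lock could look roughly like the sketch below. It is written in Python, where `threading.Lock` plays the role of the C# `lock` keyword. The standard library has no direct counterpart to ReaderWriterLock, so readers are serialized here as well. The file name, table, and function names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical database file, used only for this illustration.
DB_PATH = os.path.join(tempfile.mkdtemp(), "shared.db")

# One process-wide lock guards every access to the SQLite file.
db_lock = threading.Lock()


def init_db():
    with db_lock:
        conn = sqlite3.connect(DB_PATH)
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS entries "
                "(id INTEGER PRIMARY KEY, text TEXT)"
            )
        conn.close()


def add_entry(text):
    # Only one thread at a time gets past this point.
    with db_lock:
        conn = sqlite3.connect(DB_PATH)
        try:
            with conn:
                conn.execute("INSERT INTO entries (text) VALUES (?)", (text,))
        finally:
            conn.close()


def count_entries():
    with db_lock:
        conn = sqlite3.connect(DB_PATH)
        try:
            return conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
        finally:
            conn.close()


def demo(writers=5):
    init_db()
    threads = [
        threading.Thread(target=add_entry, args=(f"entry {i}",))
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count_entries()
```

A lock like this only protects threads inside one process. As noted above, several server processes would need an OS-level mutex instead.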
Update
Given your original question, the above still holds true. However, looking at the root problem in your situation (no access to your business's RDBMS), I have some other suggestions:
MariaDB / MySQL / PostgreSQL on your PC. Of course, this would require your PC to be kept on.
Some sort of database and/or service layer hosted in a datacentre. There are many options here, such as a VPS, Azure DB, or a shared hosting DB, all of which incur a cost (though there may be some small free ones out there).
SQLite across network file systems is not a great solution. You'll find that the FAQ and Appropriate Uses pages gently steer you away from using SQLite as a concurrently accessed database across an NFS.
While in theory it could work, the implementation and latency of network file systems dramatically increase the chance of locking conflicts during write actions.
I should point out that reading the database creates a read-only lock, which is fine for concurrent access.
Need some help from Oracle app developers out there:
I have a C# .NET 4.0 application which updates and inserts into a table using the DDTek.Oracle library. My app runs every day for about 12 hours, and this exception has occurred exactly twice, 15 days apart, and never before that. On those days, it had been running fine for hours (doing both inserts and updates during this period) before the exception occurred. I have read that this exception could come from a bad connection string, but as I said, the app had been running fine for a while. Could this be a DB or network issue, or could it be something else?
System.InvalidOperationException: Connection must be valid and open
at DDTek.Oracle.OracleConnection.get_Session()
at DDTek.Oracle.OracleConnection.BeginDbTransaction(IsolationLevel isolationLevel)
at DDTek.Oracle.OracleConnection.BeginTransaction()
FYI (in case this could be the cause), I have two connections on two threads. Each thread updates a different table.
PS: If anyone knows of good documentation for DDTek, please reply with a link.
From what you describe I can only speculate - there are several possibilities:
most providers offer built-in pooling, sometimes a connection in the pool becomes invalid and you get some strange behaviour
sometimes the network settings/firewalls/IDS... limit how long a TCP connection can stay open
sometimes a subtle bug (accessing same connection from 2 different threads) leads to strange problems
sometimes the DB server (or a DB firewall) limits how long a session can stay connected
sometimes memory problems cause such behaviour, almost every Oracle provider uses OCI under the hood which requires using unmanaged heap etc.
I had one provider leaking unmanaged memory (diagnosed via a memory profiler and fixed by the vendor rather fast)
sometimes when connected to a RAC one listener/node goes down and/or some failover happens leaving current connections invalid
As for a link to comprehensive documentation of DDTek.Oracle see here and here.
I have a GUI where different parts of the information shown are extracted from a database. To keep the GUI from freezing, I've tried putting the database queries in BackgroundWorkers. Because these access the database asynchronously, I get an exception telling me the database connection is already open and in use by another.
Is it possible to create a queue for database access?
I've looked into Task and ContinueWith, but since I code against .NET Framework 3.5, this is not an option.
What is the DB engine you're using? Most modern databases are optimized for concurrent operations, so there's no need to queue anything.
The thing you're apparently doing wrong is reusing the same IDbConnection instance across different threads. That's a no-no: each thread has to have its own instance.
I think your problem is in the way you get a connection to the database. If you want to fire separate queries you could use separate connections for separate requests. If you enable connection pooling this does not add a lot of overhead.
Try to use pooled connection objects. Also, from your description, you're trying to open a connection on a connection object that hasn't been closed.
I'm running a number of threads which each attempt to perform INSERTs into one SQLite database. Each thread creates its own connection to the DB. Each one creates a command, opens a transaction, performs some INSERTs, and then closes the transaction. It seems that the second thread to attempt anything gets the following SQLiteException: The database file is locked. I have tried unwrapping the INSERTs from the transaction as well as narrowing the scope of the INSERTs contained within each commit, with no real effect; subsequent access to the DB file raises the same exception.
Any thoughts? I'm stumped and I'm not sure where to look next...
Update your insertion code so that if it encounters an exception indicating database lock, it waits a bit and tries again. Increase the wait time by random increments each time (the "random backoff" algorithm). This should allow the threads to each grab the global write lock. Performance will be poor, but the code should work without significant modification.
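A minimal sketch of that retry loop, assuming Python's standard-library sqlite3 module (where a busy database surfaces as `sqlite3.OperationalError` with "database is locked" in the message). The helper name `run_with_backoff`, its parameters, and the simulated `flaky_insert` operation are hypothetical, chosen only for illustration.

```python
import random
import sqlite3
import time


def run_with_backoff(operation, max_attempts=8, base_delay=0.05):
    """Call operation(); on a lock error, wait a bit and try again."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Retry only lock errors; re-raise anything else, and give up
            # after the final attempt.
            if "locked" not in str(exc) or attempt == max_attempts:
                raise
            time.sleep(delay)
            # Grow the wait by a random increment so competing threads
            # drift apart instead of retrying in lockstep.
            delay += random.uniform(0, base_delay)


def demo():
    calls = {"n": 0}

    def flaky_insert():
        # Simulates an insert that hits a locked database twice.
        calls["n"] += 1
        if calls["n"] < 3:
            raise sqlite3.OperationalError("database is locked")
        return "inserted"

    return run_with_backoff(flaky_insert, base_delay=0.001), calls["n"]
```

In real use, `operation` would be a function that opens the transaction, performs the INSERTs, and commits. The whole transaction is retried, not just the statement that failed.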
However, SQLite is not appropriate for highly-concurrent modification. You have two permanent solutions:
Move to a "real" database, such as PostgreSQL or MySQL
Serialize all your database modifications through one thread, to avoid contention for SQLite's write lock.
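The second option, funnelling every modification through one thread, can be sketched with a queue that worker threads feed and a single writer drains. This is an illustration under assumptions, written in Python with the standard-library sqlite3 and queue modules. The `items` table and all function names are hypothetical.

```python
import os
import queue
import sqlite3
import tempfile
import threading

# Hypothetical database file, used only for this illustration.
DB_PATH = os.path.join(tempfile.mkdtemp(), "writer.db")

# Sentinel object that tells the writer thread to shut down.
_STOP = object()


def writer_loop(jobs):
    # The only thread that ever touches the database for writing.
    conn = sqlite3.connect(DB_PATH)
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS items (value TEXT)")
    while True:
        job = jobs.get()
        if job is _STOP:
            break
        with conn:
            conn.execute("INSERT INTO items (value) VALUES (?)", (job,))
    conn.close()


def producer(jobs, name, n):
    # Worker threads never open the database; they only enqueue values.
    for i in range(n):
        jobs.put(f"{name}-{i}")


def demo(producers=4, per_producer=25):
    jobs = queue.Queue()
    writer = threading.Thread(target=writer_loop, args=(jobs,))
    writer.start()
    workers = [
        threading.Thread(target=producer, args=(jobs, f"t{i}", per_producer))
        for i in range(producers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    jobs.put(_STOP)  # queued after all real jobs, so nothing is dropped
    writer.join()
    conn = sqlite3.connect(DB_PATH)
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return count
```

With the defaults, `demo()` ends with 100 rows and no lock errors, because only one connection ever writes. The trade-off is that producers get no direct confirmation that their row was stored unless you add some way of reporting results back.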
Two things to check:
1) Confirm that your version of SQLite was compiled with thread support
2) Confirm that you are not opening the database EXCLUSIVE
I was doing this on Android rather than in C#, but I got around this "database is locked" error by keeping the SQLite database open within the wrapper class that owns it, for the entire lifetime of that class. Each insert done within this class can then run in its own thread (because, depending on your storage situation, such as SD card versus device memory, DB writes can take a long time). I even tried stressing it by starting about a dozen insert threads at once, and each one was handled very well because the insert method didn't have to worry about opening or closing the DB.
I'm not sure whether persistent DB life-cycles are considered good style (they may be considered bad in most cases), but for now it's working pretty well.