Is it a good idea to use MongoDB in a .NET desktop application?
Mongo is meant to be run on a server with replication. It isn't really intended as a database for desktop applications (unless they're connecting to a database on a central server). There's a blog post on durability on the MongoDB blog, since it's a common question.
When a write occurs and the write command returns, we can not be 100% sure that from that moment in time on, all other processes will see the updated data only.
In every driver, there should be an option to do a "safe" insert or update, which waits for a database response. I don't know which driver you're planning on using (there are a few for .NET, http://github.com/samus/mongodb-csharp is the most officially supported), but if the driver doesn't offer a safe option, you can run the getLastError command to synchronize things manually.
MongoDB won’t make sure your data is on the hard drive immediately. As a result, you can lose data that you thought was already written if your server goes down in the period between writing and actual storing to the hard drive.
There is an fsync command, which you can run after every operation if you really want. Again, Mongo goes with the "safety in numbers" philosophy and encourages anyone running in production to have at least one slave for backup.
It depends on what you want to store in a database.
According to Wikipedia:
MongoDB is designed for problems without heavy transactional requirements that aren't easily solved by traditional RDBMSs, including problems which require the database to span many servers.
There is a .NET driver available. And here is some information to help you get started.
But you should first ask yourself: what do you want to store, and what are the further requirements (support for stored procedures, triggers, expected size, etc.)?
Related
http://rockingtechnology.blogspot.co.uk/2011/06/oracle-backup-and-restore-code-in-cnet.html
As per the proposed code in the above article, more specifically:
ProcessStartInfo psi = new ProcessStartInfo();
psi.FileName = "C:/oracle/product/10.2.0/db_1/BIN/exp.exe";
Process process = Process.Start(psi);
process.WaitForExit();
process.Close();
How can I expect the database to be affected with regards to interruption of CRUD operations from elsewhere once calling Process.Start(psi) and, hence, executing exp.exe?
Using Oracle's exp.exe process - will the sessions of all users currently writing to the db in question be killed, for example? I'd imagine (or at least hope) not, but I haven't been able to find documentation to confirm this.
EXP and IMP are not proper backup and recover tools. They are intended for exchanging data and data structures between Oracle databases. This is also true for their replacement, Data Pump (EXPDP and IMPDP).
Export unloads to a file, so it won't affect any users on the system. However, if you want a consistent set of data and there are other users connected to the system, you need to use the CONSISTENT=Y parameter.
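As a rough sketch of how the CONSISTENT=Y parameter could be passed from the C# code in the question: the helper names, the connect string, and the dump file name below are placeholders, not anything from the original article. USERID, FILE, and CONSISTENT are parameters of the classic exp utility; a path containing spaces would need extra quoting, and passing a password on the command line is something to think twice about.

```csharp
using System;
using System.Diagnostics;

public static class OracleExport
{
    // Builds the start info for a read-consistent export.
    // expPath, userId and dumpFile are hypothetical values supplied by the caller.
    public static ProcessStartInfo BuildConsistentExport(string expPath, string userId, string dumpFile)
    {
        return new ProcessStartInfo
        {
            FileName = expPath,
            // CONSISTENT=Y asks exp for one read-consistent view across all exported tables.
            Arguments = string.Format("USERID={0} FILE={1} CONSISTENT=Y", userId, dumpFile),
            UseShellExecute = false
        };
    }

    // Starts exp, waits for it to finish and returns its exit code.
    public static int Run(ProcessStartInfo psi)
    {
        using (Process process = Process.Start(psi))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Something like OracleExport.Run(OracleExport.BuildConsistentExport(@"C:\oracle\product\10.2.0\db_1\BIN\exp.exe", "user/password@db", "backup.dmp")) would then replace the bare Process.Start call.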
Interestingly Data Pump does not have a CONSISTENT parameter. It unloads tables (or table partitions) as single transactions but the only way to guarantee consistency across all database objects is to use the FLASHBACK_SCN parameter (or kick all your users off the system).
"It is all in aid of DR."
As a DR solution this will work, with the following provisos.
The users will lose all data since the last export (obvious)
You will need to ensure the export is consistent across all objects
Imports take time. A lot of time if you have many tables or a lot of data. Plus indexes, etc
Also remember to export the statistics as well as the data.
You're really asking what effects the (old) Oracle export tool (exp) has on the database. It's a logical backup so you can think of the effects generally the same way you would think of running multiple SELECT queries against your database. That is, other sessions don't get killed but normal locking mechanisms may prevent them from accessing data until exp is done with it and this could, potentially, lead to timeouts.
EXP is the original export utility. It is discontinued and not supported in the most recent version (11g).
You can use EXPDP instead, although the export files are written on the server instead of the client machine.
Both utilities issue standard SELECT commands to the database, and since readers don't interfere with concurrency in Oracle (writers don't block readers, readers don't block writers), this will not block your other DB operations.
Since it issues statements however, it may increase the resource usage, especially IO, which could impact performance for concurrent activity.
Whatever tool you use, you should spend some time learning about the options (also since you may want to use it as a logical copy, make sure you test the respective import tools IMP and IMPDP). Also a word of warning: these tools are not backup tools. You should not rely on them for backup.
Question:
Is there a way to force the Task Parallel Library to run multiple tasks simultaneously? Even if it means making the whole process run slower with all the added context switching on each core?
Background:
I'm fairly new to multithreading, so I could use some assistance. My initial research hasn't turned up much, but I also doubt I know what exactly to search for. Perhaps someone more experienced with multithreading can help me better understand TPL and/or find a better solution.
Our company is planning on deploying a piece of software to all users' machines that will connect to a central server a few times a day, and synchronize some files and MS Access data back to the user's machine. We would like to load-test this concept first and see how the Access DB holds up to lots of simultaneous connections.
I've been tasked with writing a .NET application that behaves like the client app (connecting & syncing with a network location), but does this on multiple threads simultaneously.
I've been getting familiar with the Task Parallel Library (TPL), as this seems like the best (newest) way to handle multithreading, and get return values back from each thread easily. However, as I understand it, TPL decides how to run each "task" for the fastest execution possible, splitting the work among the available cores. So let's say I want to run 30 sync jobs on a 2-core machine... the TPL would run 15 on each core, sequentially. This would mean my load test would only be hitting the Access DB with at most 2 connections at the same time. I want to hit the database with lots of simultaneous connections.
You can force the TPL to do this by specifying TaskCreationOptions.LongRunning. According to Reflector (not according to the docs, though) this always creates a new thread. I consider relying on this safe for production use.
Normal tasks will not do, because they don't guarantee that execution starts immediately. Setting MinThreads is a horrible solution (for production) because you are changing a process-global setting to solve a local problem. And still, you are not guaranteed success.
Of course, you can also start threads. Tasks are more convenient though because of error handling. Nothing wrong with using threads for this use case.
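A minimal sketch of the LongRunning approach, assuming (as described above, and not promised by the docs) that each such task gets its own thread. The LoadTest class and RunSimultaneously name are made up for illustration. A Barrier holds every job at a start line until all of them are running, so the jobs genuinely overlap; if the tasks did not each get a thread, the barrier would never release.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LoadTest
{
    // Runs jobCount copies of job at the same time and returns their results
    // in job-index order. The job receives its index (0 .. jobCount - 1).
    public static TResult[] RunSimultaneously<TResult>(int jobCount, Func<int, TResult> job)
    {
        // The barrier releases the jobs only once all of them have started.
        using (var startLine = new Barrier(jobCount))
        {
            Task<TResult>[] tasks = Enumerable.Range(0, jobCount)
                .Select(i => Task.Factory.StartNew(() =>
                {
                    startLine.SignalAndWait();
                    return job(i);
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            // Blocks until every job has finished; failures surface as AggregateException.
            Task.WaitAll(tasks);
            return tasks.Select(t => t.Result).ToArray();
        }
    }
}
```

For the load test, job would open a connection and perform one sync, e.g. LoadTest.RunSimultaneously(30, i => SyncOnce(i)), where SyncOnce is whatever the client app does.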
Based on your comment, I think you should reconsider using Access in the first place. It doesn't scale well and has problems once the database grows to a certain size. Especially if this is simply served off some file share on your network.
You can try and simulate load from your single machine but I don't think that would be very representative of what you are trying to accomplish.
Have you considered using SQL Server Express? It's basically a de-tuned version of the full-blown SQL Server which might suit your needs better.
I've got a SaaS application which is being deployed to clients as a ClickOnce app, where the client connects to my cloud server via a WCF NetTcp connection. The data is all stored on the server, but the client needs to be able to see his data.
It's all working, but I'm having a bit of trouble when there's a lot of data to transfer, e.g. in one table a client might have about 3,000 records, and that takes an awful long time to come through the WCF connection. So that's problem 1: how to pull the necessary data from the server. Right now, trying to do it synchronously, it's simply timing out. I could up the timeout limits, but that feels a bit too much like brute force. Would you recommend some kind of asynchronous solution - and if so, how would you do that through WCF?
Problem 2: having got all this data down once, it would make sense to cache it and do some kind of background synchronization to make sure it stays fresh. But how to cache it? Should I ship a SQL Express DB with the ClickOnce app? Or is there a simpler way? And where can you save the cache data, bearing in mind the sandbox restrictions of a ClickOnce app?
For problem 1, you can invoke the operation that takes a long time asynchronously. Also consider not retrieving all 3000 records at once. If your use case allows, you could page the data.
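A small sketch of the paging idea, with hypothetical names (Paging, GetPage, FetchAll). GetPage stands in for what the service operation would do on the server, and FetchAll for the client loop; the fetchPage delegate is where the actual WCF call would go. Against a real database, the query needs a stable ordering for Skip/Take to return consistent pages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Server side: returns one page of source; pageIndex is zero-based.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }

    // Client side: keeps requesting pages until a short (or empty) page signals the end.
    public static List<T> FetchAll<T>(Func<int, int, List<T>> fetchPage, int pageSize)
    {
        var all = new List<T>();
        for (int pageIndex = 0; ; pageIndex++)
        {
            List<T> page = fetchPage(pageIndex, pageSize);
            all.AddRange(page);
            if (page.Count < pageSize) break;
        }
        return all;
    }
}
```

Each call then moves only pageSize records over the wire, which keeps individual requests well under the timeout.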
For problem 2, yes you could cache it. You could do something as simple as an in-memory thread-safe dictionary, or a thread-safe singleton instance of the data if that makes sense. If you need to persist to disk, I would opt for a file-based database like SQL CE or SQLite so that there is no client installation required.
Depending on the data and how much of it is allowed to be cached or synchronised you could look at the replication features in SQL Compact 3.5 (not 4.0) as this supports partial synchronisation with SQL Server.
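One possible shape for the in-memory thread-safe dictionary, as a sketch only; RecordCache and its members are invented names. ConcurrentDictionary handles concurrent access, and wrapping values in Lazy means the (expensive) load delegate runs once per key even if several threads ask at the same time.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class RecordCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _entries =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    // Returns the cached value, calling load (e.g. the WCF request) only on a miss.
    public TValue GetOrLoad(TKey key, Func<TKey, TValue> load)
    {
        return _entries.GetOrAdd(key, k => new Lazy<TValue>(() => load(k))).Value;
    }

    // Drops an entry so the next GetOrLoad fetches fresh data.
    public void Invalidate(TKey key)
    {
        Lazy<TValue> ignored;
        _entries.TryRemove(key, out ignored);
    }
}
```

A background synchronization job could call Invalidate for keys it knows have changed on the server.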
Failing that you could store the data in your own database locally (for this I'd recommend SQL Compact 4.0 rather than SQL Express) and then you'd have to have your own updating logic.
Alternatively if you don't need to cache the data for long you could use the Enterprise Library caching blocks and store the data in Application Storage. This method would be best suited if you wanted to cache queries and results rather than individual records.
I'm programming a simple customer-information management software now with SQLite.
One exe file, one db file, some dll files. - That's it :)
2~4 people may run this exe file simultaneously and access the database.
Not only reading, but frequent editing will be done by them too.
Yeahhh, now here comes one of the most famous problems... "Synchronization"
I was trying to create / remove a temporary empty file whenever someone is trying to edit it. (This is a 'key' to access the db.)
But there must be a better way for it : (
What would be the best way of preventing this problem?
Well, SQLite already locks the database file for each use, the idea being that multiple applications can share the same database.
However, the documentation for SQLite explicitly warns about using this over the network:
SQLite will work over a network filesystem, but because of the latency associated with most network filesystems, performance will not be great. Also, the file locking logic of many network filesystems implementation contains bugs (on both Unix and Windows). If file locking does not work like it should, it might be possible for two or more client programs to modify the same part of the same database at the same time, resulting in database corruption. Because this problem results from bugs in the underlying filesystem implementation, there is nothing SQLite can do to prevent it.
A good rule of thumb is that you should avoid using SQLite in situations where the same database will be accessed simultaneously from many computers over a network filesystem.
So assuming your "2-4 people" are on different computers, using a network file share, I'd recommend that you don't use SQLite. Use a traditional client/server RDBMS instead, which is designed for multiple concurrent connections from multiple hosts.
Your app will still need to consider concurrency issues (unless it speculatively acquires locks on whatever the user is currently looking at, which is generally a nasty idea) but at least you won't have to deal with network file system locking issues as well.
You are looking at a classic problem in dealing with multiple users accessing a database: the Lost Update.
See this tutorial on concurrency:
http://www.brainbell.com/tutors/php/php_mysql/Transactions_and_Concurrency.html
At least you won't have to worry about the db file itself getting corrupted by this, because SQLite locks the whole file when it's being written. That being said, the SQLite documentation recommends against using it if you expect your database to be accessed simultaneously by multiple clients.
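One common guard against the Lost Update is optimistic concurrency: every row carries a version number, and an update only succeeds if the version is still the one the editor originally read. The sketch below is an assumption-laden illustration, not SQLite code: it uses an in-memory dictionary instead of a real database so that it stays self-contained, and Customer and CustomerStore are invented names. The SQL it imitates is shown in the comment.

```csharp
using System;
using System.Collections.Generic;

public sealed class Customer
{
    public int Id;
    public string Name;
    public int Version;
}

public sealed class CustomerStore
{
    private readonly object _gate = new object();
    private readonly Dictionary<int, Customer> _rows = new Dictionary<int, Customer>();

    public void Insert(int id, string name)
    {
        lock (_gate) { _rows[id] = new Customer { Id = id, Name = name, Version = 1 }; }
    }

    // Returns a copy, the way each client holds its own snapshot of the row.
    public Customer Read(int id)
    {
        lock (_gate)
        {
            Customer row = _rows[id];
            return new Customer { Id = row.Id, Name = row.Name, Version = row.Version };
        }
    }

    // Similar in spirit to:
    //   UPDATE customer SET name = @name, version = version + 1
    //   WHERE id = @id AND version = @expectedVersion
    // Returns false when someone else changed the row since it was read.
    public bool TryUpdate(Customer edited)
    {
        lock (_gate)
        {
            Customer current = _rows[edited.Id];
            if (current.Version != edited.Version) return false;
            current.Name = edited.Name;
            current.Version++;
            return true;
        }
    }
}
```

When TryUpdate returns false, the application re-reads the row and lets the user decide how to merge, instead of silently overwriting the other person's edit.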
Following on from this question...
What to do when you’ve really screwed up the design of a distributed system?
... the client has reluctantly asked me to quote for option 3 (the expensive one), so they can compare prices to a company in India.
So, they want me to quote (hmm). In order for me to get this as accurate as possible, I will need to decide how I'm actually going to do it. Here's 3 scenarios...
Scenarios
Split the database
My original idea (perhaps the most tricky) will yield the best speed on both the website and the desktop application. However, it may require some synchronising between the two databases as the two "systems" are so heavily connected. If not done properly and not tested thoroughly, I've learnt that synchronisation can be hell on earth.
Implement caching on the smallest system
To side-step the sync option (which I'm not fond of), I figured it may be more productive (and cheaper) to move the entire central database and web service to their office (i.e. in-house), and have the website (still on the hosted server) download data from the central office and store it in a small database (acting as a cache)...
Set up a new server in the customer's office (in-house).
Move the central database and web service to the new in-house server.
Keep the web site on the hosted server, but alter the web service URL so that it points to the office server.
Implement a simple cache system for images and most frequently accessed data (such as product information).
... the down-side is that when the end-user in the office updates something, their customers will effectively be downloading the data from a 60KB/s upload connection (albeit once, as it will be cached).
Also, not all data can be cached, for example when a customer updates their order. Also, connection redundancy becomes a huge factor here; what if the office connection is offline? Nothing to do but show an error message to the customers, which is nasty, but a necessary evil.
Mystery option number 3
Suggestions welcome!
SQL replication
I had considered MSSQL replication. But I have no experience with it, so I'm worried about how conflicts are handled, etc. Is this an option? Considering there are physical files involved, and so on. Also, I believe we'd need to upgrade from SQL express to SQL non-free, and buy two licenses.
Technical
Components
ASP.Net website
ASP.net web service
.Net desktop application
MSSQL 2008 express database
Connections
Office connection: 8 mbit down and 1 mbit up contended line (50:1)
Hosted virtual server: Windows 2008 with 10 megabit line
Having just read for the first time your original question related to this I'd say that you may have laid the foundation for resolving the problem simply because you are communicating with the database by a web service.
This web service may well be the saving grace as it allows you to split the communications without affecting the client.
A good while back I was involved in designing just such a system.
The first thing that we identified was the data which rarely changes - and we immediately locked all of this out of consideration for distribution. A manual process for administering using the web server was the only way to change this data.
The second thing we identified was the data that should be owned locally. By this I mean data that only one person or location at a time would need to update, but that may need to be viewed at other locations. We fixed all of the keys on the related tables to ensure that duplication could never occur and that no auto-incrementing fields were used.
The third item was the tables that were truly shared - and although we worried a lot about these during stages 1 & 2 - in our case this part was straightforward.
When I'm talking about a server here I mean a DB Server with a set of web services that communicate between themselves.
As designed, our architecture had 1 designated 'master' server. This was the definitive source for resolving conflicts.
The rest of the servers were in the first instance a large cache of anything covered by item 1. In fact it wasn't a large cache but a database duplication, but you get the idea.
The second function of each non-master server was to coordinate changes with the master. This involved a very simplistic process of actually passing through most of the work transparently to the master server.
We spent a lot of time designing and optimising all of the above - to finally discover that the single best performance improvement came from simply compressing the web service requests to reduce bandwidth (but it was over a single channel ISDN, which probably made the most difference).
The fact is that if you do have a web service then this will give you greater flexibility about how you implement this.
I'd probably start by investigating the feasibility of implementing one of the SQL Server replication methods.
Usual disclaimers apply:
Splitting the database will not help a lot, but it will add a lot of nightmares. IMO, you should first try to optimize the database: update some indexes or maybe add several more, optimize some queries and so on. For database performance tuning I recommend reading some articles from simple-talk.com.
Also in order to save bandwidth you can add bulk processing to your windows client and also add zipping (archiving) to your web service.
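To give an idea of what the zipping could look like on the .NET side, here is a minimal sketch using GZipStream from the base class library. PayloadCompression is a made-up helper name, and how the compressed bytes are wired into the web service (message encoder, HTTP compression, or a byte[] parameter) is left open.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PayloadCompression
{
    // Compresses a text payload (e.g. serialized XML) into gzip bytes.
    public static byte[] Compress(string payload)
    {
        byte[] raw = Encoding.UTF8.GetBytes(payload);
        using (var output = new MemoryStream())
        {
            // The GZipStream must be closed before the buffer is read,
            // otherwise the final compressed block is missing.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress and returns the original text.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Repetitive payloads such as XML or JSON usually shrink considerably, which matters most on a slow upload link; very small or already compressed data (images) gains little.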
And you should probably upgrade to MS SQL 2008 Express; it's also free.
It's hard to recommend a good solution for your problem using the information I have. It's not clear where the bottleneck is. I strongly recommend that you profile your application to find the exact location of the bottleneck (e.g. is it in the database, or in a fully used-up channel, and so on) and add a description of it to the question.
EDIT 01/03:
When the bottleneck is the upload connection, you can only do the following:
1. Add archiving (compression) of messages to the service and client
2. Implement bulk operations and use them
3. Try to reduce the operation count per use case for the most frequent cases
4. Add a local database for the Windows clients, perform all operations against it, and synchronize the local db and the main one on a timer.
And SQL replication will not help you a lot in this case. The fastest and cheapest solution is to increase the upload connection, because all other ways (except the first one) will take a lot of time.
If you choose to rewrite the service to support bulking, I recommend you have a look at the Agatha project.
Actually, hearing how many they have on that one connection, it may be time to up the bandwidth at the office (not at all my normal response). If you factor out the CRM system, what else is a top user of the bandwidth? It may be that they have reached the point of needing more bandwidth, period.
But I am still curious to see how much of the information you are passing is actually getting used. Make sure you are transferring efficiently. Any chance you could add some quick, easy measures to see how much people are actually consuming when looking at the data?