MongoDB slow writes cause socket timeout exception - C#

I am having performance issues with MongoDB.
Running on:
MongoDB 2.0.1
Windows 2008 R2
12 GB RAM
2 TB HDD (5400 rpm)
I've written a daemon which removes and inserts records asynchronously. Each hour most of the collections are cleared and they get freshly inserted data (10-12 million deletes and 10-12 million inserts). The daemon uses ~60-80% of the CPU while inserting the data (due to calculating 1+ million knapsack problems). When I fire up the daemon it can do its job for about 1-2 minutes until it crashes due to a socket timeout (while writing data to the MongoDB server).
When I look in the logs I see it takes about 30 seconds to remove the data in a collection. It seems to have something to do with the CPU load and memory usage, because when I run the daemon on a different PC everything goes fine.
Is there any optimization possible, or am I just bound to using a separate PC for running the daemon (or picking another document store)?
UPDATE 11/13/2011 18:44 GMT+1
Still having problems. I've made some modifications to my daemon: I've decreased the concurrent number of writes. However, the daemon still crashes when the memory fills up (11.8 GB of 12 GB) and the server receives more load (loading data into the frontend). It crashes due to a long insert/remove in MongoDB (30 seconds), which results in the socket timeout exception. Of course there should be try/catch statements to catch such exceptions, but they should not happen in the first place. I'm looking for a solution to solve this issue instead of working around it.
Total storage size is: 8.1 GB
Index size is: 2.1 GB
I guess the problem is that the working set + indexes are too large to fit in memory and MongoDB needs to access the HDD (which is slow at 5400 rpm). However, why would this be a problem? Aren't there other strategies for storing the collections (e.g. in separate files instead of large chunks of 2 GB)? If a relational database can read/write data from disk in an acceptable amount of time, why can't MongoDB?
UPDATE 11/15/2011 00:04 GMT+1
Log file to illustrate the issue:
00:02:46 [conn3] insert bargains.auction-history-eu-bloodhoof-horde 421ms
00:02:47 [conn6] insert bargains.auction-history-eu-blackhand-horde 1357ms
00:02:48 [conn3] insert bargains.auction-history-eu-bloodhoof-alliance 577ms
00:02:48 [conn6] insert bargains.auction-history-eu-blackhand-alliance 499ms
00:02:49 [conn4] remove bargains.crafts-eu-agamaggan-horde 34881ms
00:02:49 [conn5] remove bargains.crafts-eu-aggramar-horde 3135ms
00:02:49 [conn5] insert bargains.crafts-eu-aggramar-horde 234ms
00:02:50 [conn2] remove bargains.auctions-eu-aerie-peak-horde 36223ms
00:02:52 [conn5] remove bargains.auctions-eu-aegwynn-horde 1700ms
UPDATE 11/18/2011 10:41 GMT+1
After posting this issue in the MongoDB user group we found out that "drop" wasn't being issued. Drop is much faster than a full remove of all records.
I am using the official mongodb-csharp-driver. I issued collection.Drop();, however it didn't work, so for the time being I used this:
public void Clear()
{
    if (collection.Exists())
    {
        // Run the native "drop" command against the database instead of removing documents one by one.
        var command = new CommandDocument {
            { "drop", collectionName }
        };
        collection.Database.RunCommand(command);
    }
}
The daemon is quite stable now, yet I have to find out why the collection.Drop() method doesn't work as it is supposed to, since the driver uses the native drop command as well.

Some optimizations may be possible:
Make sure your MongoDB is not running in verbose mode; this ensures minimal logging and hence minimal I/O. Otherwise it writes every operation to the log file.
If possible by application logic, convert your inserts to bulk inserts. Bulk insert is supported by most MongoDB drivers.
http://www.mongodb.org/display/DOCS/Inserting#Inserting-Bulkinserts
Instead of one remove operation per record, try to remove in bulk.
For example, collect the "_id" of 1000 documents, then fire a single remove query using the $in operator.
You will issue 1000 times fewer queries to MongoDB (see the sketch after this list).
If you are removing/inserting the same document to refresh data, consider an update instead.
What kind of daemon are you running? If you can share more info on that, it may be possible to optimize it too, to reduce the CPU load.
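For illustration, a minimal sketch of both ideas (bulk insert and bulk remove with $in), assuming the legacy 1.x C# driver API that was current around MongoDB 2.0; the batch size of 1000 and the helper names are arbitrary choices, not part of the question:

// Hedged sketch: batch inserts and removes so the daemon issues far fewer
// round trips. Assumes the legacy 1.x C# driver (MongoCollection, Query builders).
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

public static class BulkHelpers
{
    // Insert many documents per round trip instead of one insert per document.
    public static void BulkInsert(MongoCollection<BsonDocument> collection,
                                  IEnumerable<BsonDocument> documents)
    {
        foreach (var batch in Chunk(documents, 1000))
        {
            collection.InsertBatch(batch);
        }
    }

    // Remove many documents with a single $in query instead of one remove per _id.
    public static void BulkRemove(MongoCollection<BsonDocument> collection,
                                  IEnumerable<BsonValue> ids)
    {
        foreach (var batch in Chunk(ids, 1000))
        {
            collection.Remove(Query.In("_id", new BsonArray(batch)));
        }
    }

    private static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size);
            }
        }
        if (bucket.Count > 0) yield return bucket;
    }
}

Where the whole collection is replaced each hour, dropping it (as discussed above) is still faster than any remove, bulk or not.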

It could be totally unrelated, but there was an issue in 2.0.0 that had to do with CPU consumption: "after upgrade to 2.0.0 mongo starts consuming all cpu resources locking the system, complains of memory leak".

Unless I have misunderstood, your application is crashing, not mongod. Have you tried removing MongoDB from the picture and replacing the writes to MongoDB with, say, writes to the file system?
Maybe this will bring to light some other issue inside your application that is not related specifically to MongoDB.

I had something similar happen with SQL Server 2008 on Windows Server 2008 R2. For me, it ended up being the network card. The NIC was set to auto-sense the connection speed, which led to occasional dropped/lost packets, which in turn led to the socket timeout problems. To test, ping the box from your local workstation and kick off your process to load the Windows 2008 R2 server. If it is this problem, you'll eventually start to see timeouts on your ping command:
ping yourWin2008R2Server -n 1000
The solution ended up being to explicitly set the NIC connection speed:
Computer Management > Device Manager > Network Adapters > Properties, and then depending on the NIC you'll either have a link speed setting tab or have to go into another menu. You'll want to set this to exactly the speed of the network it is connected to. In my DEV environment it ended up being 100 Mbps half duplex.
These types of problems, as you know, can be a real pain to track down!
Best to you in figuring it out.

The daemon is stable now. After posting this issue in the MongoDB user group we found out that "drop" wasn't being issued. Drop is much faster than a full remove of all records.

Related

MongoDB Disk Usage [duplicate]

The MongoDB documentation says:
To compact this space, run db.repairDatabase() from the mongo shell (note this operation will block and is slow).
in http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
I wonder how to make MongoDB free deleted disk space automatically?
p.s. We store many download tasks in MongoDB, up to 20 GB, and finish them within half an hour.
In general, if you don't need to shrink your datafiles you shouldn't shrink them at all. This is because "growing" your datafiles on disk is a fairly expensive operation, and the more space MongoDB can allocate in datafiles, the less fragmentation you will have.
So, you should try to provide as much disk space as possible for the database.
However, if you must shrink the database you should keep two things in mind.
1. MongoDB grows its data files by doubling, so the datafiles may be 64 MB, then 128 MB, etc. up to 2 GB (at which point it stops doubling and keeps each file at 2 GB).
2. As with most any database, to do operations like shrinking you'll need to schedule a separate job; there is no "autoshrink" in MongoDB. In fact, of the major NoSQL databases (hate that name) only Riak will autoshrink. So you'll need to create a job using your OS's scheduler to run the shrink. You could use a bash script, or have a job run a PHP script, etc.
Server-side JavaScript
You can use server-side JavaScript to do the shrink and run that JS via mongo's shell on a regular basis via a job (like cron or the Windows scheduling service) ...
Assuming a collection called foo, you would save the JavaScript below into a file called bar.js and run ...
$ mongo foo bar.js
The JavaScript file would look something like this ...
// Get the current collection sizes.
var storage = db.foo.storageSize();
var total = db.foo.totalSize();
print('Storage Size: ' + tojson(storage));
print('TotalSize: ' + tojson(total));
print('-----------------------');
print('Running db.repairDatabase()');
print('-----------------------');
// Run repair
db.repairDatabase()
// Get new collection sizes.
var storage_a = db.foo.storageSize();
var total_a = db.foo.totalSize();
print('Storage Size: ' + tojson(storage_a));
print('TotalSize: ' + tojson(total_a));
This will run and return something like ...
MongoDB shell version: 1.6.4
connecting to: foo
Storage Size: 51351
TotalSize: 79152
-----------------------
Running db.repairDatabase()
-----------------------
Storage Size: 40960
TotalSize: 65153
Run this on a schedule (during non-peak hours) and you are good to go.
Capped Collections
However, there is one other option: capped collections.
Capped collections are fixed-sized collections that have a very high performance auto-FIFO age-out feature (age out is based on insertion order). They are a bit like the "RRD" concept if you are familiar with that.
In addition, capped collections automatically, with high performance, maintain insertion order for the objects in the collection; this is very powerful for certain use cases such as logging.
Basically you can limit the size of (or the number of documents in) a collection to, say, 20 GB, and once that limit is reached MongoDB will start to throw out the oldest records and replace them with newer entries as they come in.
This is a great way to keep a large amount of data, discarding the older data as time goes by and keeping the same amount of disk-space used.
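For illustration, a minimal sketch of creating such a capped collection from C#, assuming the legacy 1.x C# driver's CollectionOptions builder; the database/collection names and the 20 GB size are placeholders, not anything from the question:

// Hedged sketch (legacy 1.x C# driver): create a capped collection of roughly
// 20 GB so MongoDB discards the oldest documents automatically.
using MongoDB.Driver;
using MongoDB.Driver.Builders;

public static class CappedSetup
{
    public static void EnsureCappedLog(MongoDatabase database)
    {
        if (!database.CollectionExists("task_log"))
        {
            var options = CollectionOptions
                .SetCapped(true)
                .SetMaxSize(20L * 1024 * 1024 * 1024); // fixed size in bytes (~20 GB)

            database.CreateCollection("task_log", options);
        }
    }
}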
I have another solution that might work better than doing db.repairDatabase() if you can't afford for the system to be locked, or don't have double the storage.
You must be using a replica set.
My thought is: once you've removed all of the excess data that's gobbling your disk, stop a secondary replica, wipe its data directory, start it up and let it resynchronize with the master.
The process is time consuming, but it should only cost a few seconds of downtime, when you do the rs.stepDown().
Also, this cannot be automated. Well, it could, but I don't think I'm willing to try.
Running db.repairDatabase() requires space equal to the current size of the database to be available on the file system. This can be bothersome when you know that the collections left, or the data you need to retain, would use much less space than what is currently allocated, and you do not have enough free space to run the repair.
As an alternative, if you have few collections you actually need to retain, or only want a subset of the data, you can move the data you need to keep into a new database and drop the old one. If you need the same database name you can then move it back into a fresh database of the same name. Just make sure you recreate any indexes.
// Start with an empty scratch database.
use cleanup_database
db.dropDatabase();

// Copy the subset you want to keep out of the oversized database.
use oversize_database
db.collection.find({},{}).forEach(function(doc){
    db = db.getSiblingDB("cleanup_database");
    db.collection_subset.insert(doc);
});

// Drop the oversized database to release its disk space.
use oversize_database
db.dropDatabase();

// Copy the subset back into a fresh database with the original name.
use cleanup_database
db.collection_subset.find({},{}).forEach(function(doc){
    db = db.getSiblingDB("oversize_database");
    db.collection.insert(doc);
});

// Recreate any indexes the collection needs.
use oversize_database
<add indexes>
db.collection.ensureIndex({field:1});

// Finally, remove the scratch database.
use cleanup_database
db.dropDatabase();
An export/drop/import operation for databases with many collections would likely achieve the same result, but I have not tested it.
Also, as a policy you can keep permanent collections in a separate database from your transient/processing data and simply drop the processing database once your jobs complete. Since MongoDB is schema-less, nothing except the indexes would be lost, and your db and collections will be recreated when the inserts for the processes next run. Just make sure your jobs create any necessary indexes at an appropriate time.
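A minimal C# sketch of that policy, assuming the legacy 1.x driver; the "processing" database, "work_items" collection and "field" index are hypothetical names used only for illustration:

// Hedged sketch (legacy 1.x C# driver) of the "separate processing database" policy:
// drop the transient database after a job completes and recreate indexes for the next run.
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

public static class ProcessingDb
{
    public static void DropAfterJob(MongoServer server)
    {
        // Dropping the whole database releases its data files,
        // unlike deleting documents, which leaves the files allocated.
        server.GetDatabase("processing").Drop();
    }

    public static void PrepareForNextRun(MongoServer server)
    {
        var db = server.GetDatabase("processing");
        var collection = db.GetCollection<BsonDocument>("work_items");

        // Collections are recreated implicitly on first insert; indexes are not,
        // so recreate them explicitly before the job starts querying.
        collection.EnsureIndex(IndexKeys.Ascending("field"));
    }
}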
If you are using replica sets, which were not available when this question was originally written, then you can set up a process to automatically reclaim space without incurring significant disruption or performance issues.
To do so, you take advantage of the automatic initial sync capabilities of a secondary in a replica set. To explain: if you shut down a secondary, wipe its data files and restart it, the secondary will re-sync from scratch from one of the other nodes in the set (by default it picks the node closest to it by looking at ping response times). When this resync occurs, all data is rewritten from scratch (including indexes), effectively doing the same thing as a repair, and disk space is reclaimed.
By running this on secondaries (and then stepping down the primary and repeating the process) you can effectively reclaim disk space on the whole set with minimal disruption. You do need to be careful if you are reading from secondaries, since this will take a secondary out of rotation for a potentially long time. You also want to make sure your oplog window is sufficient for a successful resync, but that is generally something you would want to ensure whether you do this or not.
To automate this process you would simply need to have a script perform this action on separate days (or similar) for each member of your set, preferably during your quiet time or maintenance window. A very naive version of this script would look like this in bash:
NOTE: THIS IS BASICALLY PSEUDO CODE - FOR ILLUSTRATIVE PURPOSES ONLY - DO NOT USE FOR PRODUCTION SYSTEMS WITHOUT SIGNIFICANT CHANGES
#!/bin/bash
# First arg is the host MongoDB is running on, second arg is the MongoDB port
MONGO=/path/to/mongo
MONGOHOST=$1
MONGOPORT=$2
DBPATH=/path/to/dbpath
# make sure the node we are connecting to is not the primary
while [ "$($MONGO --quiet --host $MONGOHOST --port $MONGOPORT --eval 'db.isMaster().ismaster')" = "true" ]
do
    $MONGO --quiet --host $MONGOHOST --port $MONGOPORT --eval 'rs.stepDown()'
    sleep 2
done
echo "Node is no longer primary!"
# Now shut down that server
# something like (assuming the user is set up for key-based auth and has password-less sudo access, a la ec2-user in EC2)
ssh -t user@$MONGOHOST sudo service mongodb stop
# Wipe the data files for that server
ssh -t user@$MONGOHOST sudo rm -rf $DBPATH
ssh -t user@$MONGOHOST sudo mkdir $DBPATH
ssh -t user@$MONGOHOST sudo chown mongodb:mongodb $DBPATH
# Start up the server again
# similar to shutdown, something like
ssh -t user@$MONGOHOST sudo service mongodb start

MySQL C# performance of Insert

I am developing an application which checks the database on startup (updates for new data), and when the work is done (on shutdown/log-off) it pushes the performance logs to the database. The users themselves are not changing any data; they are only generating logs (the money comes from their use of the data ;)
When the users are done with their work, the application pushes the logs to the database (a MySQL database). I do not want to constantly push data, because the connections are expected to drop and go offline during the work day (mobile work); the less time online the better. This means the application has to be able to work in offline mode too.
The log pushed for a single user is usually about 2000 records, and each record contains about 70 bytes of data.
There are about 100 users at peak time (may grow to 300 in the near future), which makes it about 200,000 log records pushed to the MySQL database each day. Because the users work the same hours, there are going to be heavy peak times. Worst case is 200,000 records of 70 bytes each at the same time (~14 MB of data).
The database I am using is a MySQL database, chosen mostly because:
It is free (a selling argument)
I can find help online
It is a standard database, meaning other IT departments most likely know about it already
I am developing the application using: C# .Net 4.5
I have tried to use Entity Framework; it is very easy to start with, but it kind of fails on performance.
The 2000 logs (inserts) for a single user take about 7 seconds when I run the program + server on my developer machine, and 7 seconds is unacceptable (moreover, it will probably increase dramatically when 200 users are doing it at the same time).
As I have read about it, it appears Entity Framework executes every insert as a single SQL command, one SQL command at a time.
So I have tried to use MySQL Connector/Net. But I do not want to do it like Entity Framework and do each insert as a single command, so my eyes went to MySqlBulkLoader. However, it only accepts a file, not raw data. Is there a way to load MySqlBulkLoader with data from within the program? I would prefer not to save data to the hard disk just to be able to send it to the database; it feels like an unnecessary detour.
So my questions are (no more storytelling ;):
Can I load MySqlBulkLoader with data from memory without creating a file on the disk?
Should I use MySQL Connector/Net or is there another way I should do it (like raw SQL statements)?
EDIT: THE ANSWER IS
Use MySQL Connector/Net with raw SQL commands, and make the insert a batch insert LIKE THIS.
Suppose the records were 2,000,000 instead of just 2,000. EF, like other ORMs, is designed for ease of coding in normal transaction workloads rather than performance-critical intensive workloads.
The answer is simple: if you are not satisfied after you refactor your code to insert all items in a single DB transaction over a single connection (because 7 seconds is really too much for me too, unless you close/open the connection every time), you should use raw SQL statements in that part of the code and continue using EF elsewhere.
There is no other way; batch processing done right means plain old SQL.
And MySqlBulkLoader is made only for file-system input, though the file can be a temporary file.
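For illustration, a minimal sketch of such a batch insert with MySQL Connector/Net: one multi-row INSERT issued inside a single transaction. The perf_log table, its columns and the LogRecord type are assumptions for the example, not the asker's actual schema.

// Hedged sketch: build one parameterized multi-row INSERT so the whole log batch
// is sent in a single round trip instead of 2000 individual commands.
using System.Collections.Generic;
using System.Text;
using MySql.Data.MySqlClient;

public static class LogWriter
{
    public static void InsertBatch(string connectionString, IList<LogRecord> records)
    {
        using (var connection = new MySqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            using (var command = connection.CreateCommand())
            {
                command.Transaction = transaction;

                var sql = new StringBuilder("INSERT INTO perf_log (user_id, logged_at, payload) VALUES ");
                for (int i = 0; i < records.Count; i++)
                {
                    if (i > 0) sql.Append(",");
                    sql.AppendFormat("(@u{0}, @t{0}, @p{0})", i);
                    command.Parameters.AddWithValue("@u" + i, records[i].UserId);
                    command.Parameters.AddWithValue("@t" + i, records[i].LoggedAt);
                    command.Parameters.AddWithValue("@p" + i, records[i].Payload);
                }

                command.CommandText = sql.ToString();
                command.ExecuteNonQuery();   // one round trip for the whole batch
                transaction.Commit();
            }
        }
    }
}

public class LogRecord
{
    public int UserId { get; set; }
    public System.DateTime LoggedAt { get; set; }
    public string Payload { get; set; }
}

For much larger batches you would likely chunk the rows (say, 1,000 per statement) to stay under the server's max_allowed_packet limit.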

Cannot open the shared memory region error

I have a user reporting this error when they're using my application.
The application is a .NET WinForms application running on Windows XP Embedded, using SQL Server CE 3.5 SP1 and LINQ to SQL as the ORM. The database itself is located in a subdirectory my application creates in the My Documents folder. The user account is an administrator account on the system. There are no other applications or processes connecting to the database.
For the most part, the application seems to run fine. It starts up and can load data from and save data to the database. The user is using the application to access the database maybe a couple hundred times a day. They get this error, but only intermittently, maybe 3-4 times a day.
In the code itself, all of the calls to the database use a LINQ to SQL data context that's wrapped in a using clause. In other words:
using (MyDataContext db = new MyDataContext(ConnectionString))
{
    List<blah> someList = db.SomeTable.Where(//selection criteria).ToList();
    return someList;
}
That's what pretty much all of the calls to the database look like (except that the ones that save data obviously aren't selecting and returning anything). As I mentioned before, they have no issue 99% of the time, but get the shared memory error a few times a day.
My current "fix" is that on application startup I simply read all of the data out of the database (there's not a lot), cache it in memory, and convert my database calls to read from the in-memory lists. So far, this seems to have fixed the problem; for a day and a half now they've reported no problems. But this still bugs me, because I don't know what would cause the error in the first place.
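For illustration, a minimal sketch of that workaround: MyDataContext comes from the question, while SomeEntity, SomeTable and the Find helper are placeholders, not the real schema.

// Hedged sketch: load the (small) tables once at startup and serve reads from memory,
// so the SQL CE file is only touched once instead of on every call.
using System;
using System.Collections.Generic;
using System.Linq;

public class InMemoryCache
{
    private List<SomeEntity> someTable;

    public void Load(string connectionString)
    {
        using (var db = new MyDataContext(connectionString))
        {
            someTable = db.SomeTable.ToList();   // single read at startup
        }
    }

    // Reads now use LINQ to Objects instead of opening the database file.
    public List<SomeEntity> Find(Func<SomeEntity, bool> criteria)
    {
        return someTable.Where(criteria).ToList();
    }
}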
While the application is accessing the database a few hundred times a day, it's typically not in rapid-fire succession. It's usually once every few minutes at the least. However, there is one use-case where there might be two calls one right after the other, as fast as possible. In other words, something like:
//user makes a selection on the screen
DatabaseCall1();
DatabaseCall2();
Both of those would follow the pattern in the code block above, where they create a new context, do work, and then return. But these calls aren't asynchronous, so I would expect the connection to be closed and disposed of before DatabaseCall2 is invoked. However, could it be that something on the SQL Server CE end isn't closing the connection fast enough? It might explain why it's intermittent, since maybe most of the time it doesn't have a problem. I should also mention that this exact program, without the fix, is installed on a few other systems with the exact same hardware and software (they're clones of each other), and users of the other systems have not reported any errors.
I'm stuck scratching my head because I can't reproduce this error on my development machine or a test machine, and answers to questions about this exception here and other places typically revolve around insufficient user permissions or the database on a shared network folder.
Check this previous post; I think you will find your answer:
SQL Server CE - Internal error: Cannot open the shared memory region

Performance for reading files and inserting contents into database

I'm developing a system that isn't real-time, but there's an intervening standalone server between the end-user machines and the database. The idea is that instead of burdening the database server every time a user sends something up, a Windows service on the database machine sweeps the relay server at regular intervals, updates the database, and deletes the temporary files on the relay box.
There is a scenario where the client software installed on thousands of machines sends up information at nearly the same time. The following hold true:
The above scenario won't occur often but could occur once every other week.
For each machine, 24 bytes of data (4 KB on the disk) is written to the relay server, which we then want to pick up and update the database with. So although it's fine while the user base is only a few thousand for now, it may grow to millions over time.
I was thinking of a batch operation that only picks up some 15,000-20,000 files at a time and runs at a configurable interval (amendable from app.config). The problem is that if the user base grows to a few million, that will take days to complete. Yes, it doesn't have to be real-time information, but waiting days for all the data to reach the database isn't ideal either.
I think there will always be a bottleneck if the relay box is hammered, but are there better ways to improve performance and get the data across in a reasonable time (a day, two tops)?
Regards,
F.
I think you might consider having only one thread read the files (to avoid hammering the disk), hand off processing to multiple threads that write to the database, and return to the disk thread to delete the files after the commit; a rough sketch follows below. The number of DB threads could be "amendable from app.config" to find the best value for your hardware config.
Just my 2 cents to get you thinking.
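A rough sketch of that pipeline using a BlockingCollection (.NET 4); the relay folder path, the writer count and the SaveToDatabase helper are assumptions for illustration only:

// Hedged sketch: one disk thread reads and later deletes files; several threads
// write to the database; BlockingCollection is the hand-off queue between them.
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class RelaySweep
{
    public static void Process(string relayFolder, int dbWriterCount)
    {
        var pending = new BlockingCollection<Tuple<string, byte[]>>(1000);   // bounded queue
        var committed = new BlockingCollection<string>();

        // Single disk thread: read each small file, hand it to the DB writers,
        // then delete files as the writers report them committed.
        var diskThread = Task.Factory.StartNew(() =>
        {
            foreach (var file in Directory.EnumerateFiles(relayFolder))
            {
                pending.Add(Tuple.Create(file, File.ReadAllBytes(file)));
            }
            pending.CompleteAdding();

            foreach (var file in committed.GetConsumingEnumerable())
            {
                File.Delete(file);   // only after the DB commit
            }
        });

        // dbWriterCount threads write to the database ("amendable from app.config").
        var writers = new Task[dbWriterCount];
        for (int i = 0; i < dbWriterCount; i++)
        {
            writers[i] = Task.Factory.StartNew(() =>
            {
                foreach (var item in pending.GetConsumingEnumerable())
                {
                    SaveToDatabase(item.Item2);   // assumed helper: insert/update + commit
                    committed.Add(item.Item1);
                }
            });
        }

        Task.WaitAll(writers);
        committed.CompleteAdding();
        diskThread.Wait();
    }

    static void SaveToDatabase(byte[] payload) { /* database work goes here */ }
}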

Caching architecture advice for a specific scenario

SETUP:
We have a .NET application that is distributed over 6 local servers, each with a local database (Oracle), 1 main server and 1 load-balancing machine. Requests come to the load balancer, which redirects the incoming requests to one of the 6 local servers. At certain time intervals data is gathered on the main server and redistributed to the 6 local servers, to be able to make decisions with the complete data.
Each local server has a cache component that caches the incoming requests based on different parameters (location, incoming parameters, etc.). With each request a local server decides whether to go to the database (Oracle) or get the response from the cache. However, in both cases the local server has to go to the database to do 1 insert and 1 update per request.
PROBLEM:
On a peak day each local server receives 2000 requests per second and the system starts slowing down (CPU: 90%). I am trying to increase the capacity before adding another local server to the mix. After running some benchmarks, the bottleneck, as it always is, seems to be the inevitable 1 insert and 1 update per request to the database.
TRIED METHODS
To decrease the frequency I have created a Windows service that sits between the DB and the .NET application. It contains a pipe server, receives each insert and update from the main .NET application, and saves them in a Hashtable. The new service then goes to the database at certain time intervals to do batch inserts and updates. The point was to go to the database less frequently. Although this had a positive effect, it didn't reduce the system load as much as I expected. Most of the CPU load comes from oracle.exe as requests per second increase.
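For illustration, a minimal sketch of that buffering pattern: pending writes are queued in memory and flushed in batches on a timer. The queue type, the flush interval and the FlushBatch helper are assumptions for the example, not the actual service.

// Hedged sketch: buffer writes in memory and flush them in batches,
// so the database is hit once per interval instead of once per request.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public class WriteBuffer
{
    private readonly ConcurrentQueue<PendingWrite> queue = new ConcurrentQueue<PendingWrite>();
    private readonly Timer timer;

    public WriteBuffer(TimeSpan flushInterval)
    {
        timer = new Timer(_ => Flush(), null, flushInterval, flushInterval);
    }

    // Called by the pipe server for every insert/update received from the app.
    public void Enqueue(PendingWrite write)
    {
        queue.Enqueue(write);
    }

    private void Flush()
    {
        var batch = new List<PendingWrite>();
        PendingWrite item;
        while (queue.TryDequeue(out item))
        {
            batch.Add(item);
        }
        if (batch.Count > 0)
        {
            FlushBatch(batch);   // assumed helper: one batched insert/update round trip
        }
    }

    private void FlushBatch(List<PendingWrite> batch) { /* batched Oracle command goes here */ }
}

public class PendingWrite
{
    public string Key { get; set; }
    public string Payload { get; set; }
}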
I am trying to avoid going to the database as much as I can, and the only way to avoid the DB seems to be increasing the cache hit ratio, other than the above-mentioned solution I tried. My cache hit ratio is around 81% currently. Because each local machine has its own cache, I am actually missing lots of cacheable requests. When two similar requests are redirected to different servers, the second request cannot benefit from the cached result of the first one.
I don't have a lot of experience in system architecture so I would appreciate any help to this problem. Any suggestions on different caching architectures or setup, or any tools are welcome.
Thank you in advance; hopefully I have made my question clear.
To me this looks like an application for a TimesTen solution. In that case you can eliminate the local databases and return to just one. Where you now have the local Oracle databases, you can implement a cache grid. Most likely this is going to be an AWT (Async, Write Through) cache. See Oracle In-Memory Database Cache Concepts.
It's not a cheap option but it could be worth investigating.
You can keep concentrating on the business logic and have no worries about speed. This of course only works well if the application code is already tuned and the SQL is performant and scalable. The SQL has to be prepared (using bind variables) to get the best performance.
Your application connects to the cache and no longer to the database. You create the cache tables in the cache group for which you want to have caching. All tables referenced in a SQL statement should be cached; otherwise, the complete SQL is passed through to the Oracle database. In the grid, a cache fusion mechanism is in place, so you have no worries about where the data in your grid is located.
In the current release, support for .NET is included.
The data is consistent and asynchronously updated to the Oracle database. If the data that is needed is in the cache, you can take the Oracle database down and the app keeps running. As soon as the database is back up, the synchronization picks up again. Very powerful.
2000 requests per second per server means roughly 24,000 operations per second against the database (1 insert + 1 update per request across 6 servers). That's a huge load for a DB.
Try to optimize, scale up or cluster the database.
Maybe a NoSQL DB (Redis/Raven/Mongo) as middleware would suit you: the local servers read/write a sharded NoSQL DB, and the aggregated data is synchronized with Oracle during off-peak times.
I know the question is old now, but I wanted to let everyone know how we solved our issue.
After trying many optimizations, it turned out that all we needed was solid-state drives for the 6 local machines. The CPU usage dropped to 30% immediately after we installed them. This is the first time I've seen any kind of hardware upgrade contribute this much to performance.
If you have a high-load setup, before making any software or architecture changes, try upgrading to SSDs.
Thanks everyone for your answers.
