I am developing an application which checks the database on startup (pulls new data) and, when work is done (on shutdown/log-off), pushes the performance logs to the database. The users themselves do not change any data; they only generate logs (the money comes from their use of the data ;)
When a user is done with the work, the application pushes the logs to the database (MySQL). I do not want to constantly push data, because connections are expected to drop and go offline during the work day (mobile work); the less time online the better. This means the application has to be able to work in offline mode too.
The log pushed for a single user is usually about 2000 records, and each record contains about 70 bytes of data.
There are about 100 users at peak time (this may grow to 300 in the near future), which means about 200,000 log records are pushed to the MySQL database each day. Because the users work the same hours, there will be heavy peaks. The worst case is 200,000 records of 70 bytes each arriving at the same time (~14 MB of data).
The database I am using is MySQL, chosen mostly because:
It is free (a sales argument)
I can find help online
It is a standard database, which means other IT departments most likely know it already
I am developing the application using: C# .Net 4.5
I have tried Entity Framework; it is very easy to start with, but it falls short on performance.
The 2,000 log inserts for a single user take about 7 seconds when I run the program and the server on my development machine. 7 seconds is unacceptable (and it will probably increase dramatically when 200 users are doing it at the same time).
From what I have read, Entity Framework issues every insert as a separate SQL command, and processes one command at a time.
So I have tried MySQL Connector/Net instead. But I do not want to do it the Entity Framework way and issue each insert as a single command, so my eyes went to MySqlBulkLoader. However, it only accepts a file, not raw data. Is there a way to feed MySqlBulkLoader with data from within the program? I would prefer not to write the data to the hard disk just to send it to the database; it feels like an unnecessary detour.
So my questions are (no more storytelling ;):
Can I load MySqlBulkLoader with data from memory without creating a file on the disk?
Should I use MySQL Connector/Net or is there another way I should do it (like raw SQL statements)?
EDIT: THE ANSWER IS
Use MySQL Connector/Net with raw SQL commands and do the inserts as a batch insert, as in the sketch below.
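For reference, a minimal sketch of what such a batch insert can look like with Connector/Net, assuming a hypothetical logs table with user_id, logged_at and message columns; chunking at 1,000 rows per statement keeps each command comfortably under MySQL's default max_allowed_packet:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MySql.Data.MySqlClient;   // MySQL Connector/Net

class LogRecord
{
    public int UserId;
    public DateTime LoggedAt;
    public string Message;
}

static class LogPusher
{
    // Push all records in one transaction, using multi-row INSERT statements.
    public static void Push(string connectionString, IList<LogRecord> records)
    {
        const int chunkSize = 1000; // rows per INSERT statement

        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                for (int offset = 0; offset < records.Count; offset += chunkSize)
                {
                    var chunk = records.Skip(offset).Take(chunkSize).ToList();

                    var sql = new StringBuilder(
                        "INSERT INTO logs (user_id, logged_at, message) VALUES ");
                    using (var cmd = new MySqlCommand { Connection = conn, Transaction = tx })
                    {
                        for (int i = 0; i < chunk.Count; i++)
                        {
                            if (i > 0) sql.Append(",");
                            sql.AppendFormat("(@u{0}, @t{0}, @m{0})", i);
                            cmd.Parameters.AddWithValue("@u" + i, chunk[i].UserId);
                            cmd.Parameters.AddWithValue("@t" + i, chunk[i].LoggedAt);
                            cmd.Parameters.AddWithValue("@m" + i, chunk[i].Message);
                        }
                        cmd.CommandText = sql.ToString();
                        cmd.ExecuteNonQuery(); // one round trip per 1,000 rows
                    }
                }
                tx.Commit(); // a single commit for the whole batch
            }
        }
    }
}
```

For ~2,000 records this means two round trips and a single commit instead of 2,000 of each, which is usually where the 7 seconds go.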
Suppose the records were 2,000,000 instead of just 2,000. EF, like other ORMs, is designed for ease of coding in normal transactional workloads, not for performance-critical, write-intensive workloads.
The answer is simple: if you are not satisfied after you refactor your code to insert all items in a single DB transaction over a single connection (because 7 seconds is really too much for me too, unless you open/close the connection every time), you should use raw SQL statements in that part of the code and continue using EF elsewhere.
There is no other way. Batch processing done right means plain old SQL.
And MySqlBulkLoader only works with files on the file system, though the file can be a temporary file.
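If you do end up using MySqlBulkLoader, the temporary file is the usual workaround. A rough sketch, reusing the hypothetical LogRecord type and column names from the sketch above:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using MySql.Data.MySqlClient;

static class BulkLogPusher
{
    public static void Push(string connectionString, IEnumerable<LogRecord> records)
    {
        // Write the records to a temporary tab-separated file...
        string tempFile = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(tempFile, records.Select(r =>
                string.Join("\t",
                    r.UserId.ToString(CultureInfo.InvariantCulture),
                    r.LoggedAt.ToString("yyyy-MM-dd HH:mm:ss"),
                    r.Message.Replace("\t", " ").Replace("\n", " "))));

            using (var conn = new MySqlConnection(connectionString))
            {
                conn.Open();
                var loader = new MySqlBulkLoader(conn)
                {
                    TableName = "logs",      // assumed table name
                    FileName = tempFile,
                    FieldTerminator = "\t",
                    LineTerminator = "\n",
                    Local = true             // file lives on the client, so LOAD DATA LOCAL
                };
                loader.Columns.AddRange(new[] { "user_id", "logged_at", "message" });
                loader.Load();               // issues LOAD DATA INFILE under the hood
            }
        }
        finally
        {
            File.Delete(tempFile);           // ...and clean the detour up afterwards
        }
    }
}
```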
Related
I have the following scenario: I am building a dummy web app that pulls betting odds every minute, stores all the events, matches, odds, etc. in the database, and then updates the UI.
I have this structure: Sports > Events > Matches > Bets > Odds. I am using a code-first approach, and for all DB-related operations I am using EF.
When I run my application for the very first time and my database is empty, I receive XML with odds containing ~16 sports, ~145 events, ~675 matches, ~17,100 bets and ~72,824 odds.
Here comes the problem: how do I save all these entities in a timely manner? Parsing is not a time-consuming operation (about 0.2 seconds), but when I try to bulk-store all these entities I run into memory problems, and the save takes more than 1 minute, so the next odds pull is triggered before the save finishes, which is a nightmare.
I saw a suggestion somewhere to disable Configuration.AutoDetectChangesEnabled and recreate my context every 100/1000 records I insert, but I am not nearly there yet. Every suggestion will be appreciated. Thanks in advance.
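For reference, the AutoDetectChangesEnabled / context-recycling pattern mentioned in the question looks roughly like the sketch below (OddsContext and Odd are placeholder names for your own context and entity). It helps, but it is still row-by-row inserts under the hood, so it will remain slower than a true bulk copy:

```csharp
using System.Collections.Generic;
using System.Data.Entity;   // EF 6 code-first

// Placeholder context/entity; substitute your own types.
public class Odd { public int Id { get; set; } public decimal Value { get; set; } }
public class OddsContext : DbContext { public DbSet<Odd> Odds { get; set; } }

static class OddsImporter
{
    public static void Save(IEnumerable<Odd> odds)
    {
        OddsContext ctx = NewContext();
        try
        {
            int pending = 0;
            foreach (var odd in odds)
            {
                ctx.Odds.Add(odd);
                if (++pending % 1000 == 0)   // flush and recycle every 1,000 entities
                {
                    ctx.SaveChanges();
                    ctx.Dispose();
                    ctx = NewContext();      // a fresh context keeps the change tracker small
                }
            }
            ctx.SaveChanges();               // remainder
        }
        finally
        {
            ctx.Dispose();
        }
    }

    static OddsContext NewContext()
    {
        var ctx = new OddsContext();
        ctx.Configuration.AutoDetectChangesEnabled = false;  // skip per-Add change detection
        ctx.Configuration.ValidateOnSaveEnabled = false;     // skip per-entity validation
        return ctx;
    }
}
```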
When you are inserting large (though this is not that large) amounts of data like that, try using SqlBulkCopy. You could also use a table-valued parameter (TVP) and pass it to a stored procedure, but I do not suggest it for this case, as TVPs perform well for fewer than about 1,000 records. SqlBulkCopy is very easy to use, which is a big plus.
If you need to update many records, you can use SqlBulkCopy for that as well, with a little trick: create a staging table, insert the data into the staging table using SqlBulkCopy, then call a stored procedure that reads the records from the staging table and updates the target table. I have used SqlBulkCopy for both cases numerous times and it works very well.
Furthermore, with SqlBulkCopy you can do the insertion in batches and provide progress feedback to the user; in your case I do not think you need that, but the flexibility is there (see the sketch below).
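A minimal SqlBulkCopy sketch, assuming a dbo.Odds destination table and made-up column names; BatchSize and NotifyAfter are what give you the batching and progress feedback mentioned above:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class OddsBulkWriter
{
    public static void BulkInsert(string connectionString, DataTable odds)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Odds"; // or a staging table for the update trick
            bulk.BatchSize = 5000;                  // rows sent per batch
            bulk.NotifyAfter = 10000;               // progress callback granularity
            bulk.SqlRowsCopied += (s, e) =>
                Console.WriteLine("{0} rows copied...", e.RowsCopied);

            // Map DataTable columns to destination columns (names are assumptions).
            bulk.ColumnMappings.Add("BetId", "BetId");
            bulk.ColumnMappings.Add("Outcome", "Outcome");
            bulk.ColumnMappings.Add("Value", "Value");

            bulk.WriteToServer(odds);
        }
    }
}
```

The DataTable is built in memory from the parsed XML. For the update case, point DestinationTableName at the staging table instead and follow the bulk copy with a single stored procedure call that merges into the target table.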
Can I do it using EF only?
I have not tried it myself, but there is this library you can try.
I understand your situation, but everything you are doing depends on your machine specs and on the software itself.
If the machine cannot handle the load, it may be time to change the plan, for example by limiting the number of records inserted at a time until it is all done.
I have a system where data is inserted through a stored procedure that is called via a WCF service.
The system currently has 12,000+ actively logged-in users, each calling the WCF service every 30 seconds (effectively at least 200 requests per second).
On the SQL Server side, CPU usage shoots to 100%, and when I examined it, more than 90% of the time was spent in DB writes. This affects overall server performance.
I need suggestions for resolving this issue so that we have fewer DB write operations and more CPU stays free.
I am open to integrating another DB server, or to using Entity Framework or any other ORM combination if needed. I need a solution to handle this issue.
Other information that might be helpful:
The table has no indexes defined.
The database growth factor is set to 200 MB.
The SQL Server version is 2012.
Simple solution: batch the writes. Do not call into SQL Server for every insert.
Make a service that collects the writes and submits them more coarsely. The main problem is that transaction handling has a non-trivial cost, so in cases like this it makes sense to batch.
Do not call a stored procedure for every row; load the rows into a temp table and then process them in bulk (or use a table-valued parameter to give the stored procedure multiple rows at once, as sketched below).
This gets rid of a lot of issues, including a ton of commits (you are basically asking for around 200 transactions per second, which is quite heavy and not needed here).
How you do that is up to you, but for something this heavy I would stay away from an ORM (Entity Framework notoriously does not batch anything, so this would turn into tons of stored procedure calls) and use handcrafted SQL at least for this part. I love ORMs, but it is always nice to have a high-performance handcrafted approach when needed.
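One way to hand the stored procedure many rows per call is a table-valued parameter. The sketch below assumes a user-defined table type dbo.LogRowType and a stored procedure dbo.InsertLogRows that does the set-based insert on the server; both names are made up:

```csharp
using System.Data;
using System.Data.SqlClient;

static class BatchedWriter
{
    // Sends an in-memory batch of rows to the server in ONE call / ONE transaction,
    // instead of one stored procedure call per row.
    public static void Flush(string connectionString, DataTable rows)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.InsertLogRows", conn))   // assumed proc name
        {
            cmd.CommandType = CommandType.StoredProcedure;

            SqlParameter p = cmd.Parameters.AddWithValue("@Rows", rows);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.LogRowType";   // assumed user-defined table type

            conn.Open();
            cmd.ExecuteNonQuery();           // the proc inserts from @Rows as a set
        }
    }
}
```

The client collects rows into the DataTable (with columns matching the table type) and flushes every few seconds or every few thousand rows, which turns hundreds of commits per second into a handful.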
I'm quite new to databases, so I hope my question is OK.
I want to build an application with a database, using Entity Framework code first.
Info about the DB I will have:
Each day a new DB is created.
Each DB will contain approximately 9 tables, each with at most 50 columns.
The total DB file size should be about 2 GB.
The application will save data for 7 hours straight each day.
The application will have two threads: one for creating the data and putting it in a buffer, and one for taking the data from the buffer and saving it to the database.
My primary requirement is that SaveChanges() finishes as fast as possible, since a lot of data needs to be saved each day, and I'm afraid the "saving data" thread will not be as fast as the "creating data" thread and so the buffer will overflow.
Which SQL Server edition should I use?
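As an aside, the buffer described above maps naturally onto a bounded BlockingCollection drained in chunks by the saving thread; a rough sketch, where Sample and SaveBatch are placeholders for your own data type and persistence code:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public class Sample { /* your measurement fields */ }

public class BufferedSaver
{
    // Bounded buffer: the producing thread blocks if the saving thread falls
    // 100,000 items behind, instead of the buffer overflowing.
    private readonly BlockingCollection<Sample> _buffer =
        new BlockingCollection<Sample>(100000);

    // Called by the "creating data" thread.
    public void Add(Sample s) { _buffer.Add(s); }
    public void CompleteAdding() { _buffer.CompleteAdding(); }

    // Run this on the "saving data" thread.
    public void SaveLoop()
    {
        var batch = new List<Sample>(1000);
        foreach (Sample s in _buffer.GetConsumingEnumerable())
        {
            batch.Add(s);
            if (batch.Count == 1000)   // one SaveChanges()/bulk insert per 1,000 items
            {
                SaveBatch(batch);
                batch.Clear();
            }
        }
        if (batch.Count > 0) SaveBatch(batch);
    }

    private void SaveBatch(List<Sample> batch)
    {
        // Placeholder: insert the batch with EF (one short-lived context per batch)
        // or with SqlBulkCopy, whichever turns out fast enough.
    }
}
```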
The fastest one is going to be the most expensive one, i.e. the one that supports the most memory, cores, and CPUs (assuming you give it all that hardware to use; otherwise it won't really matter). On the same piece of hardware, licensing aside, the various editions (except CE) should perform at the same level. The question really should be which one is fast enough, and we don't have enough information to answer that. Otherwise you may be throwing a lot of money at a software license (and hardware) for performance you may not need, and/or that you could get by optimizing or changing some code.
You should be able to use almost any edition of SQL Server. Here is a comparison chart for the current versions of SQL Server: http://msdn.microsoft.com/en-us/library/cc645993(v=SQL.110).aspx
If you look there, the Express edition limits each database to 10 GB. So Express seems a good edition to start with, but if you get close to 10 GB, or want a little more speed, look into the Standard edition.
I'm developing an application which needs to store large amounts of data.
I cannot use SQL Server Express edition, since it requires a separate installation and our target customers have already flooded us with complaints about our previous release, which used SQL Server Express.
Now my choices are between SQL Server compact and Access.
We store huge amounts of reporting data (3 million a week). Which database can I use?
The application is portable and a product-based application.
Our company is asking us to provide the application in such a way that it can be downloaded and used by anyone from our website. Please help.
Thanks.
Edit: 40,000 records within an hour is the approximate rate at which data is stored. The data stored is just normal varchar, datetime, nvarchar, etc. No images and no binary or other special types.
What is "3 million data"? 3 million large images? 3 million bytes? There could be a vast difference there.
At any rate, I'd probably choose SQL CE over Access if the actual data size isn't going to exceed what SQL CE supports (4GB). I've used SQL CE for applications that collect a few hundred thousand records in a week without problem. The database is a single file, is portable, and has the huge benefit that full SQL Server can just attach to it and use it as a data source, even in replication scenarios.
40,000 records per hour is roughly 11 records per second. I'd suggest creating the required tables and indexes in both and testing first. Let the test run for a solid 8 hours and see what happens.
It's quite possible that the first x records insert reasonably well but then things get slower and slower, x being some number between 10K and 1M. "Slower and slower" is quite subjective and depends on the app. In Access I'd suggest compacting the database on a regular basis, e.g. after every 100K records, to clean up the indexes. However, if the app needs to insert records for 8 hours straight without a break, then clearly this won't work well for you.
Or you could try dropping the indexes, doing the inserts, and recreating the indexes. However, if the users want to query the data while the records are being inserted, then this won't work either.
Also, Access can work significantly faster if the database isn't shared. Again, that may not be practical.
Finally, if you still don't get decent performance, or even if you do, consider having the user install a solid-state drive and placing your database file on it. A 32 GB SSD for a few hundred dollars buys a lot of developer time otherwise spent mucking around with things.
If you are tightly coupled to .NET, SQL Server Compact would be the better choice.
If not, consider using SQLite.
I have an importer process running as a Windows service (in debug mode as an application) which processes various XML documents and CSVs and imports them into a SQL database. All was well until I had to process a large amount of data (120k rows) from another table (in the same way I process the XML documents).
I am now finding that SQL Server's memory usage hits a point where it just hangs. My application never receives a timeout from the server; everything just stops.
I am still able to make calls to the database server separately, but that application thread is just stuck, with no obvious thread in SQL Activity Monitor and no activity in Profiler.
Any ideas on where to begin solving this problem would be greatly appreciated as we have been struggling with it for over a week now.
The basic architecture is C# 2.0 using NHibernate as an ORM: data is pulled into the C# logic and processed, then written back into the same database, along with logs into other tables.
The only other problem that sometimes happens instead is that, for some reason, a cursor is opened on this massive table, which I can only assume is generated by ADO.NET; a statement like exec sp_cursorfetch 180153005,16,113602,100 is called thousands of times according to Profiler.
When are you COMMITting the data? Are there any locks or deadlocks (sp_who)? If 120,000 rows is considered large, how much RAM is SQL Server using? When the application hangs, is there anything about the point where it hangs (is it an INSERT, a lookup SELECT, or what?)?
It seems to me that the commit size is way too small. Usually in SSIS ETL tasks I use a batch size of 100,000 for narrow rows with sources over 1,000,000 in cardinality, and I never go below 10,000, even for very wide rows.
I would not use an ORM for large ETL unless the transformations are extremely complex with a lot of business rules. Even then, with a large number of relatively simple business transforms, I would consider loading the data into simple staging tables and using T-SQL to do all the inserts, lookups, etc.
Are you loading this into SQL Server using BCP? If not, the transaction log may not be able to keep up with your input. On a test machine, try switching the recovery model to Simple (minimally logged), or use the BCP methods to get the data in (they can take advantage of minimal logging).
Adding on to StingyJack's answer ...
If you're unable to use straight BCP due to processing requirements, have you considered performing the import against a separate SQL Server (separate box), using your tool, then running BCP?
The key to making this work would be keeping the staging machine clean -- that is, no data except the current working set. This should keep the RAM usage down enough to make the imports work, as you're not hitting tables with -- I presume -- millions of records. The end result would be a single view or table in this second database that could be easily BCP'ed over to the real one when all the processing is complete.
The downside is, of course, having another box ... And a much more complicated architecture. And it's all dependent on your schema, and whether or not that sort of thing could be supported easily ...
I've had to do this with some extremely large and complex imports of my own, and it's worked well in the past. Expensive, but effective.
I found out that it was NHibernate creating the cursor on the large table. I have yet to understand why, but in the meantime I have replaced the data access for that large table with straightforward ADO.NET calls.
Since you are rewriting it anyway, you may not be aware that you can do BCP-style bulk copies directly from .NET via the System.Data.SqlClient.SqlBulkCopy class. See this article for some interesting performance info.
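A rough sketch of that approach, streaming rows from the source query straight into the destination table so the 120k rows are never materialized as objects in memory (table and column names are made up):

```csharp
using System.Data;
using System.Data.SqlClient;

static class TableCopier
{
    // Streams rows from a source query directly into the destination table.
    public static void Copy(string sourceConnStr, string destConnStr)
    {
        using (SqlConnection source = new SqlConnection(sourceConnStr))
        using (SqlCommand select = new SqlCommand(
            "SELECT Col1, Col2, Col3 FROM dbo.SourceTable", source))   // assumed names
        {
            source.Open();
            using (SqlDataReader reader = select.ExecuteReader())
            using (SqlBulkCopy bulk = new SqlBulkCopy(destConnStr))
            {
                bulk.DestinationTableName = "dbo.ImportedRows";        // assumed name
                bulk.BatchSize = 10000;       // commit in chunks rather than one huge batch
                bulk.BulkCopyTimeout = 0;     // no timeout for the long-running copy
                bulk.WriteToServer(reader);   // pulls from the reader as it writes
            }
        }
    }
}
```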