This is my problem:
I have a table with more than 1 million records, and I am using these records to generate reports with Crystal Reports. When I select a large number of records, I sometimes get timeout errors, or the computer gets stuck.
I have already added indexes, and I am retrieving the data using stored procedures.
Several tables are joined to the main table (the one with 1 million records), and the data is also grouped inside the report.
So I am asking: can't we use MSSQL CLR to get this data from the database by compressing it or converting it to lightweight objects? If anyone has an idea, it will be appreciated.
Thanks
There are two separate issues in your post, and neither is likely to be solved by a CLR solution:
Timeout
You give no details about where the timeout actually occurs (on the RDBMS while performing the selection, on the client side while retrieving the data, or in the report engine while building the report), so I will assume it occurs on the RDBMS. Building a CLR library will not improve the time required to gather the data. The solution is indexing (which you already did) and query plan analysis to identify bottlenecks and issues. Had you provided some SQL code, it would have been possible to give you a relevant hint.
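For example, a quick way to see where the time goes is to capture SQL Server's IO/TIME statistics from the client. A minimal C# sketch (the connection string and procedure name are placeholders, not from your post):

    using System;
    using System.Data.SqlClient;

    class StatsProbe
    {
        static void Main()
        {
            using (var conn = new SqlConnection("Server=.;Database=Reports;Integrated Security=true"))
            {
                // SET STATISTICS output arrives as informational messages
                conn.InfoMessage += (s, e) => Console.WriteLine(e.Message);
                conn.Open();

                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandTimeout = 600; // raise the default 30 s while investigating
                    cmd.CommandText = "SET STATISTICS IO ON; SET STATISTICS TIME ON; EXEC dbo.usp_GetReportData;";

                    int rows = 0;
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read()) rows++; // time spent here is client-side retrieval, not the query
                    }
                    Console.WriteLine("Rows read: " + rows);
                }
            }
        }
    }

If the elapsed time and logical reads reported here are already huge, the problem is in the query and indexes, and no CLR trick will change that.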
Computer getting stuck
This looks like an issue related to the sheer amount of data, which makes the machine struggle, and there is very little you can do about it. Once again, a CLR library on the server will not lower the amount of data handed to the client, and IMHO it would only make the situation worse (the client would receive compressed data to uncompress: an additional heavy task for an already overloaded machine). Your best bets are to increase the amount of RAM, buy a better CPU (higher speed and/or more cores), and run the reports on a schedule rather than interactively.
There are no technical details at all in the post, so all of the above are more or less educated guesses; should you need a more detailed answer and advice, please post another question with the technical details of the issues.
https://stackoverflow.com/help/how-to-ask
Related
I am faced with a task where I have to design a web application on the .NET Framework. In this application, users will (99% of the time) have read-only access, as they will just view data (SELECT).
However, the backend database is going to be the beast, with records updated/inserted/deleted every minute. The projection is that, at the very minimum, about 10 million records will be added to the system per year, across fewer than 5 tables collectively.
Question/Request 1:
As these updates/inserts will happen very frequently (every minute or two at the latest), I was hoping to get some tips so that a SELECT query does not deadlock with the rows being changed, or vice versa.
Question/Request 2:
My calculated guess is that, in normal situations, only a few hundred records will be inserted every minute, 24/7 (and updated/deleted based on some conditions). If I write a C# tool that gets data from any number of sources (XML, CSV, or directly from some tables in a remote DB; a configuration file or registry setting will dictate which format the data is imported from) and then does the inserts/updates/deletes, will this be fast enough, and/or will it cause deadlock issues?
I hope my questions are detailed enough... Please let me know if any of this is vague...
Thanks in advance
I will answer your question #2 first: according to the scenario you've described, it will be fast enough. But remember to issue direct SQL commands to the database. I personally have a very, very similar scenario and the application runs without any problem, but when a scheduled job executes multiple inserts/deletes through a tool (like NHibernate), deadlocks do occur. So, again, if possible, execute direct SQL statements.
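For what it's worth, "direct SQL" can still be parameterized and wrapped in a short transaction. A minimal sketch (table and column names are made up for illustration):

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class DirectInsert
    {
        static void Main()
        {
            using (var conn = new SqlConnection("Server=.;Database=SalesDb;Integrated Security=true"))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                using (var cmd = conn.CreateCommand())
                {
                    cmd.Transaction = tx;
                    cmd.CommandText = "INSERT INTO dbo.Sales (ItemId, Quantity, SoldAt) VALUES (@id, @qty, @at)";
                    cmd.Parameters.Add("@id", SqlDbType.Int);
                    cmd.Parameters.Add("@qty", SqlDbType.Int);
                    cmd.Parameters.Add("@at", SqlDbType.DateTime);

                    for (int i = 0; i < 500; i++) // the "few hundred records per minute" case
                    {
                        cmd.Parameters["@id"].Value = i;
                        cmd.Parameters["@qty"].Value = 1;
                        cmd.Parameters["@at"].Value = DateTime.UtcNow;
                        cmd.ExecuteNonQuery();
                    }
                    tx.Commit(); // one short transaction keeps the locks brief
                }
            }
        }
    }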
Question #1: You can use "SELECT WITH NOLOCK".
For example:
SELECT * FROM table_sales WITH (NOLOCK)
It avoids blocking on the database, but you have to remember that you might be reading outdated info (once again, in the scenario you've described this will probably not be a problem).
You can also try READ COMMITTED SNAPSHOT, which has been supported since SQL Server 2005, but for this example I will keep it simple. Read up on it a little to decide which one may be the best choice for you.
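If you do go the READ COMMITTED SNAPSHOT route, it is a one-time, database-level switch rather than a per-query hint. A hedged sketch (the database name is illustrative; run it when no other sessions are connected to that database):

    using System.Data.SqlClient;

    class EnableRcsi
    {
        static void Main()
        {
            using (var conn = new SqlConnection("Server=.;Database=master;Integrated Security=true"))
            {
                conn.Open();
                using (var cmd = conn.CreateCommand())
                {
                    // After this, plain SELECTs under the default isolation level read
                    // row versions instead of taking shared locks, so readers stop blocking writers.
                    cmd.CommandText = "ALTER DATABASE SalesDb SET READ_COMMITTED_SNAPSHOT ON;";
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }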
I am trying to develop a system in which I will sync my database with a third-party database through a provided API.
The API has a format in which we can provide a From-Date and a To-Date.
Problems
There is no API which gives me only modified records.
The data is too large (1000 records/day average)
Need a scheduler so all the records are updated automatically
I also need to keep track of modified records (which is the biggest problem, as I can't get them by modified date).
Note: As per the previous requirement, I have already developed a system in which I can specify the From-Date and To-Date and the records get updated (it is complete, with a GUI; no AJAX was used). Even if I request just one day's records, the system gets a timeout error.
Note 2: I really shouldn't say it, but the client is very strict; he just needs the solution, nothing else will do.
Assuming that the data doesn't need to be "fresh", can you not write a process that runs hourly/nightly, fetching that day's worth of data and processing it into your DB?
Obviously this would only work if you're sure previous records are not updated.
Does the API provide batches?
Why did you choose a web client with AJAX to process this data? Would a Windows/console application be better suited?
If the data is too big to retrieve by any given query, you're just going to have to do it by ID. Figure out a good size (100 records? 250?), and just spin through every record in the system by groups of that size.
You didn't say if you're pulling down data, pushing up data, or both. If you're only pulling it down, then that's the best you can do, and it will get slower and slower as more records are added. If you're only pushing it, then you can track a "pushed date". If it's both, how do you resolve conflicts?
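If the ID-batching route is taken, the loop itself is simple. A hedged sketch (the fetch and save delegates stand in for whatever the third-party API and your data layer actually expose):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Record { public int Id; public string Payload; }

    class BatchSync
    {
        const int BatchSize = 250; // tune this: 100? 250? whatever stays safely under the timeout

        static void SyncAll(IList<int> allIds,
                            Func<IList<int>, IList<Record>> fetchByIds, // one small remote call per group
                            Action<IList<Record>> saveLocally)          // upsert into the local DB
        {
            for (int offset = 0; offset < allIds.Count; offset += BatchSize)
            {
                var batch = allIds.Skip(offset).Take(BatchSize).ToList();
                var records = fetchByIds(batch);
                saveLocally(records);
            }
        }
    }

Each request stays small enough not to time out, and the hourly/nightly schedule suggested above just calls SyncAll.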
I'm quite new to databases, so I hope my question is OK.
I want to build an application with a database using Entity Framework Code First.
Info about the DB I will have:
Each day a new DB is created.
A DB will contain approximately 9 tables, each with at most 50 columns.
The total DB file size should be about 2GB.
The application will save data for 7 hours straight each day.
The application will have two threads: one for creating the data and putting it in a buffer, and one for taking the data from the buffer and saving it in the database.
My primary requirement is that the SaveChanges() call finishes as fast as possible, since there is a lot of data to save each day, and I'm afraid the "saving data" thread will not be as fast as the "creating data" thread, so the buffer will overflow.
Which SQL Server edition should I use?
The fastest one is going to be the most expensive one, i.e. the one that supports the most memory, the most cores, and the most CPUs (assuming you give it all that hardware to use; otherwise it won't really matter). On the same piece of hardware, licensing aside, the various editions (except CE) should run at the same level of performance. The question really should be which one is fast enough, and we don't have enough information to answer that. Otherwise you may be throwing a lot of money at a software license (and hardware) to get performance you may not need, and/or that you could get by optimizing or changing some code.
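As an example of the kind of code-level change that usually buys more than a pricier edition: buffer the generated rows and call SaveChanges() once per batch instead of once per row. A minimal sketch, assuming a hypothetical Code First model (the Sample entity and LogContext context are illustrative):

    using System.Collections.Concurrent;
    using System.Data.Entity;

    public class Sample { public int Id { get; set; } public double Value { get; set; } }
    public class LogContext : DbContext { public DbSet<Sample> Samples { get; set; } }

    static class Writer
    {
        // the "creating data" thread calls Buffer.Add(...); bounded so it blocks instead of overflowing
        public static readonly BlockingCollection<Sample> Buffer = new BlockingCollection<Sample>(100000);

        public static void SaveLoop()
        {
            const int batchSize = 1000;
            while (!Buffer.IsCompleted)
            {
                using (var context = new LogContext())
                {
                    context.Configuration.AutoDetectChangesEnabled = false; // big win for bulk adds

                    int count = 0;
                    Sample item;
                    while (count < batchSize && Buffer.TryTake(out item, 500))
                    {
                        context.Samples.Add(item);
                        count++;
                    }

                    if (count > 0) context.SaveChanges(); // one round trip per batch
                } // disposing keeps the change tracker from growing all day
            }
        }
    }

A bounded BlockingCollection also answers the overflow worry: if the saver really can't keep up, the producer blocks briefly instead of exhausting memory.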
You should be able to use almost any edition of SQL Server. Here is a comparison chart for the current versions of SQL Server: http://msdn.microsoft.com/en-us/library/cc645993(v=SQL.110).aspx
If you look there, the Express edition only allows a 10 GB limit per database. So the Express edition seems a good place to start, but if you get close to 10 GB or want a little more speed, look into the Standard edition.
Does anyone have any experience with receiving and updating a large volume of data, storing it, sorting it, and visualizing it very quickly?
Preferably, I'm looking for a .NET solution, but that may not be practical.
Now for the details...
I will receive roughly 1,000 updates per second: some are updates to existing records, some are new rows of data. It can also be very bursty, with spikes of 5,000 updates and new rows at times.
By the end of the day, I could have 4 to 5 million rows of data.
I have to both store them and also show the user updates in the UI. The UI allows the user to apply a number of filters to the data to just show what they want. I need to update all the records plus show the user these updates.
I have a visual update rate of 1 fps.
Anyone have any guidance or direction on this problem? I can't imagine I'm the first one to have to deal with something like this...
At first thought, some sort of in-memory database seems right, but will it be fast enough when querying for updates near the end of the day, once the data set is large? Or does that all depend on smart indexing and queries?
Thanks in advance.
It's a very interesting and also challenging problem.
I would approach this with a pipeline design, with processors implementing sorting, filtering, aggregation, etc. The pipeline needs an asynchronous (thread-safe) input buffer that is processed in a timely manner (according to your 1 fps requirement, under a second). If you can't keep up, you need to queue the data somewhere, on disk or in memory, depending on the nature of your problem.
Consequently, the UI needs to be implemented in a pull style rather than a push style; you only want to update it once per second.
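A minimal sketch of that buffer-plus-pull arrangement (type names are illustrative): producers enqueue updates as they arrive, and a one-second timer drains the queue and hands one consolidated batch to the UI.

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;

    public class Update { public int RecordId; public double Value; }

    public static class Pipeline
    {
        static readonly ConcurrentQueue<Update> Input = new ConcurrentQueue<Update>();
        static Timer _tick; // keep a reference so the timer is not collected

        public static void Enqueue(Update u) { Input.Enqueue(u); } // called 1,000-5,000 times per second

        public static void Start(Action<List<Update>> pushToUi)
        {
            _tick = new Timer(_ =>
            {
                var batch = new List<Update>();
                Update u;
                while (Input.TryDequeue(out u)) batch.Add(u); // drain everything queued this second

                // sorting / filtering / aggregation processors would plug in here
                if (batch.Count > 0) pushToUi(batch);         // one UI refresh per tick, pull style
            }, null, 1000, 1000);
        }
    }

The persistence step (writing the same batches to the database) can hang off the same timer or its own consumer thread, so the UI never waits on the datastore.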
For the data store you have several options. Using a database is not a bad idea, since you need the data persisted (and, I guess, queryable) anyway. If you are using an ORM, you may find NHibernate in combination with its superior second-level cache a decent choice.
Many of the considerations might also be similar to those Ayende made when designing NHProf, a real-time profiler for NHibernate. He has written a series of posts about them on his blog.
Maybe Oracle is a more appropriate RDBMS solution for you. The problem with your question is that at these "critical" levels there are too many variables and conditions you need to deal with: not only the software, but the hardware you can get (it costs :)), the connection speed, your typical user's system setup, and more and more and more...
Good luck.
I have an importer process which runs as a Windows service (or in debug mode, as an application) and processes various XML documents and CSVs, importing them into a SQL database. All had been well until I had to process a large amount of data (120k rows) from another table (in the same way I do the XML documents).
I am now finding that the SQL Server's memory usage is hitting a point where it just hangs. My application never receives a timeout from the server; everything just STOPs.
I am still able to make calls to the database server separately, but that application thread is just stuck, with no obvious thread in SQL Activity Monitor and no activity in Profiler.
Any ideas on where to begin solving this problem would be greatly appreciated as we have been struggling with it for over a week now.
The basic architecture is C# 2.0 using NHibernate as an ORM. Data is pulled into the C# logic, processed, and then written back into the same database, along with logs into other tables.
The only other problem that sometimes happens instead is that, for some reason, a cursor is opened on this massive table, which I can only assume is generated by ADO.NET. A statement like exec sp_cursorfetch 180153005,16,113602,100 is being called thousands of times, according to Profiler.
When are you COMMITting the data? Are there any locks or deadlocks (sp_who)? If 120,000 rows is considered large, how much RAM is SQL Server using? When the application hangs, is there anything notable about the point where it hangs (is it an INSERT, a lookup SELECT, or what)?
It seems to me that that commit size is way too small. Usually in SSIS ETL tasks, I will use a batch size of 100,000 for narrow rows with sources over 1,000,000 in cardinality, but I never go below 10,000 even for very wide rows.
I would not use an ORM for large ETL, unless the transformations are extremely complex with a lot of business rules. Even still, with a large number of relatively simple business transforms, I would consider loading the data into simple staging tables and using T-SQL to do all the inserts, lookups etc.
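A rough illustration of the staging-table approach (all table and column names here are invented): bulk-load into a staging table first, then let one set-based T-SQL statement do the lookup and insert instead of row-by-row ORM calls.

    using System.Data.SqlClient;

    class StagingLoad
    {
        static void MergeStagedRows()
        {
            using (var conn = new SqlConnection("Server=.;Database=ImportDb;Integrated Security=true"))
            {
                conn.Open();
                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandTimeout = 0; // no timeout for the set-based load
                    cmd.CommandText = @"
                        INSERT INTO dbo.TargetRows (CustomerId, Amount, ImportedAt)
                        SELECT c.CustomerId, s.Amount, GETUTCDATE()
                        FROM dbo.StagingRows AS s
                        JOIN dbo.Customers  AS c ON c.ExternalCode = s.CustomerCode; -- lookup done in one pass

                        TRUNCATE TABLE dbo.StagingRows;";
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }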
Are you running this into SQL Server using BCP? If not, the transaction log may not be able to keep up with your input. On a test machine, try switching the recovery model to Simple, or use the BCP methods to get the data in (they keep transaction logging to a minimum).
Adding on to StingyJack's answer ...
If you're unable to use straight BCP due to processing requirements, have you considered performing the import against a separate SQL Server (separate box), using your tool, then running BCP?
The key to making this work would be keeping the staging machine clean -- that is, no data except the current working set. This should keep the RAM usage down enough to make the imports work, as you're not hitting tables with -- I presume -- millions of records. The end result would be a single view or table in this second database that could be easily BCP'ed over to the real one when all the processing is complete.
The downside is, of course, having another box ... And a much more complicated architecture. And it's all dependent on your schema, and whether or not that sort of thing could be supported easily ...
I've had to do this with some extremely large and complex imports of my own, and it's worked well in the past. Expensive, but effective.
I found out that it was NHibernate creating the cursor on the large table. I have yet to understand why, but in the meantime I have replaced the data access model for the large table with straightforward ADO.NET calls.
Since you are rewriting it anyway, you may not be aware that you can call BCP directly from .NET via the System.Data.SqlClient.SqlBulkCopy class. See this article for some interesting performance info.
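A short sketch of the SqlBulkCopy route for anyone landing here (the destination table and connection string are placeholders): it uses the same bulk-load path as BCP and skips the per-row ORM overhead entirely.

    using System.Data;
    using System.Data.SqlClient;

    class BulkLoader
    {
        static void Load(DataTable rows) // rows built from the parsed XML/CSV
        {
            const string connectionString = "Server=.;Database=ImportDb;Integrated Security=true";
            using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
            {
                bulk.DestinationTableName = "dbo.ImportedRows";
                bulk.BatchSize = 10000;   // commit in chunks so the transaction log stays manageable
                bulk.BulkCopyTimeout = 0; // no timeout for a long-running load
                bulk.WriteToServer(rows);
            }
        }
    }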