I have a DataReader... I use the result of that DataReader as a parameter for another DataReader, which is attached to a command of stored-procedure type. It works fast for now, but I worry about what happens once my database fills up with data. How can I speed things up? Thanks
Likely, your initial query could stand to join to the results generated by the sproc.
Essentially, you have two database round-trips instead of one. This may be a performance problem if you call this frequently, the result is small, and you have already optimized both the query and the stored procedure (so the round-trip overhead becomes significant relative to the actual useful work).
Benchmark and see if this piece of functionality is actually a bottleneck. If it is, you may try to merge these two operations at the SQL level, so they can be executed server-side in one go.
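As an illustration only, a minimal sketch of what the merged, single-round-trip version might look like from C# (the sproc name dbo.GetDetailsForCustomer and its parameter are invented; the idea is that the sproc runs the first query internally and feeds its result into the second one server-side):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.GetDetailsForCustomer", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@CustomerId", customerId);
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // process the combined result set: one round-trip instead of two
        }
    }
}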
I'm not sure if this is related to your question, but keep in mind that (depending on your DBMS / ADO.NET provider), multiple active readers on the same connection may or may not be supported. Are you closing the first DbDataReader before opening the second one? If not, and you happen to switch to a different DBMS, there may be trouble. If memory serves me well, Oracle (ODP.NET) and DB2 support multiple readers out of the box, while MS SQL Server (unless you enable MARS in the connection string) and PostgreSQL (Npgsql) don't.
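A minimal sketch of the portable pattern, assuming the first query returns integer keys (firstCommand and sprocCommand are illustrative names): buffer the keys, close the first reader, then run the stored procedure command.

// buffer the keys so the first reader can be closed before the second opens
var ids = new List<int>();
using (var reader = firstCommand.ExecuteReader())
{
    while (reader.Read())
        ids.Add(reader.GetInt32(0));
}
// the first reader is now closed; safe to execute the sproc command
foreach (var id in ids)
{
    sprocCommand.Parameters["@Id"].Value = id;
    sprocCommand.ExecuteNonQuery();
}
// SQL Server only: MARS can be enabled instead via the connection string,
// e.g. "Data Source=...;MultipleActiveResultSets=True"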
Given a data set where you will only know the relevant fields at run time, I want to select each row of data for analysis through a loop. Is it better to:
run a direct SQL query to get each row, directly opening and closing the database connection each time, or
pull all the applicable rows into a DataTable before the loop and then select them through LINQ from inside the loop?
For example, I am reading in a file that says to look for rows a, b, and c, so my query becomes "SELECT col1, col2, col3 FROM table WHERE col1 = 'a' OR col1 = 'b' OR col1 = 'c'"
But I don't know that it will be a, b, c at compile time, only after I run the program.
Thanks
Edit: better in terms of speed and best practice.
Depending on how long your analysis takes, the underlying transactional locking (if any), and the resources blocked by holding your result set on the DB server, it is either 1 or 2... but to me 2 seems rather unlikely (gigantic result sets held open for long periods would eat up the memory on your DB system, which alone suggests you should rethink your whole data-processing workflow). I'd just build the SQL at runtime, which is even possible using LINQ directly against your DB (see "Entity Framework"), and only if I encountered serious performance problems once that was running would I refactor...
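As a sketch of "building the SQL at runtime" without concatenating the values into the query text, you can generate one parameter per value (table and column names are taken from the question; the rest is illustrative and assumes an open SqlConnection):

string[] values = { "a", "b", "c" }; // known only at run time
var command = new SqlCommand { Connection = connection };
var parameterNames = new List<string>();
for (int i = 0; i < values.Length; i++)
{
    string name = "@p" + i;
    parameterNames.Add(name);
    command.Parameters.AddWithValue(name, values[i]);
}
command.CommandText = "SELECT col1, col2, col3 FROM table WHERE col1 IN ("
    + string.Join(", ", parameterNames.ToArray()) + ")";
using (var reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        // analyze the row
    }
}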
StackOverflow isn't really for "better"-type questions, because they're usually highly subjective, opinion-based, and lead to arguments. We are, however, free to decide whether subjective questions should be answered, and this is one that (in my opinion) can be.
I'd assert that in 99% of cases you'd be better off leaving data in the database and querying it using the SQL functionality built into the database. The reason is that databases are purpose-built for storing and querying data. It is somewhat ludicrous to transfer 100 million rows across a network into a poorly designed data-storage container and then naively search through it with a loop-in-a-loop (which is what LINQ in a loop would be), when you could leave the data in the purpose-built, well-indexed, high-powered enterprise software (on enterprise-grade hardware) where it currently resides and ask for a tiny subset of those records to be transferred over the (slow) network link to a limited-power client.
From what I've read, there appear to be only marginal performance benefits to using stored procedures versus simply building the commands in C# and calling them explicitly in the program's code, at least when the server program and DB engine share a machine (and when the procedures are simple). Most people seem to think it's a 'preference issue' and add a few other minor benefits to justify their case.
However, one thing I couldn't find any information on is the benefit of a stored procedure when the database engine is located on a separate physical machine from the main application.
If I am not mistaken, in a server farm, wouldn't a stored procedure offload processing from the main server application's CPU threads and have the primary processing done on the DB engine server's CPU instead? Or is that work done on the DB engine's CPU anyway, once the C# libraries have 'built' the information for the DB engine to process?
Specifically, I have a long-running transaction that I could execute as multiple calls in a C# transaction block, but I suspect that a stored proc will in fact have a huge performance benefit, both by reducing the network calls to the DB engine and by guaranteeing the processing is not done on the main server application.
Is this true?
Performance gains from a stored procedure (versus something like Dapper or an OR/M like Entity Framework) can range anywhere from nearly identical to a very noticeable improvement. I don't think your question can be answered without seeing the code that would be translated into a stored procedure.
Having said that, in my experience, yes: a single stored procedure call will likely be faster than issuing multiple statements from the application code.
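A hedged sketch of the difference (dbo.ProcessBatch, the Orders table, and the variables are invented): the first form pays one network round-trip per item, the second pays one in total.

// N round-trips: one statement per item
foreach (var item in items)
{
    using (var command = new SqlCommand("INSERT INTO Orders (Name) VALUES (@name)", connection))
    {
        command.Parameters.AddWithValue("@name", item);
        command.ExecuteNonQuery();
    }
}

// One round-trip: a hypothetical sproc does the whole unit of work server-side
using (var command = new SqlCommand("dbo.ProcessBatch", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@BatchId", batchId);
    command.ExecuteNonQuery();
}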
If the SP is just a simple query (i.e., one SELECT statement), the performance gain is that an SP is pre-compiled: its execution plan is built and cached in advance. While the query itself is running, you should not see any difference between a plain query and an SP.
I'm not sure of the effect if the SP is more complicated, because that would depend on the query.
The more important benefit of an SP is that all the data stays in the DBMS instead of being sent back and forth to the client. If you are dealing with large amounts of data, the benefit is more evident. The difference grows if your DB is located on a different machine, and even more if the connection between them is slow.
On the other hand, you must consider that an SP is usually not compiled to machine code, so if the SP implements very complex logic it could be faster to implement that logic on the client.
You should also consider that moving business logic to the server is not great for code maintenance: you may be taking on technical debt by implementing in the DB something that belongs in your client code.
So there is no solution valid for all seasons, but usually a well-written SP is faster than the same code running on the client.
There are a few issues at play here. As others have said, it kind of depends. For a raw SELECT statement the gain will be barely noticeable. If there's a hugely complex query, an SP can save a lot of repetitive parsing. If there's a lot of intermediate data, an SP will keep that data local, reducing network traffic. And if your DB server has a higher spec than the client, it might simply run faster due to CPU horsepower.
Downsides can be things like bogging down the DB server for everyone with processing that could be done on the client; this is generally a concern if you're running an underpowered SQL Server. Another subtle side to this is that licensing costs for a multi-core DB server can be impressive: your dollars per cycle on a SQL Server box can be many times what they are on your client.
After searching through Google I came to know that the SQLSRV32 ODBC driver does not support MARS. What are the workarounds for this? One way, I guess, is to stop looping through the results of several SQL commands. But in my case I have to create 30-40 tables and insert about 400-500 rows of data at a time. Is it a good idea to open and close the connection for every single SQL command? Please help.
Don't open and close the connection for each statement; open the connection once and create multiple commands that all use that one connection. Inserting ~15,000 records shouldn't take too long. I don't know if ODBC has support for it, but you can also look into SQL Server's bulk copy functionality for something like this.
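A minimal sketch of both suggestions, assuming System.Data.SqlClient rather than ODBC (table and column names are invented):

using (var connection = new SqlConnection(connectionString))
{
    connection.Open(); // opened once, reused by every command

    // one parameterized command, many executions
    using (var command = new SqlCommand("INSERT INTO MyTable (Col1) VALUES (@v)", connection))
    {
        command.Parameters.Add("@v", SqlDbType.NVarChar, 50);
        foreach (string value in values)
        {
            command.Parameters["@v"].Value = value;
            command.ExecuteNonQuery();
        }
    }

    // or, for larger batches, stream a whole DataTable in one shot
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "MyTable";
        bulkCopy.WriteToServer(dataTable);
    }
}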
A final word about MARS. MARS only matters when you want to have multiple simultaneous queries on the same connection that are returning result sets. That isn't really an issue here as you are doing inserts.
Also, there isn't anything stopping you from running multiple threads to do the inserts. I would perhaps use one thread per table, capped at roughly one thread per core. Parallel.ForEach could help out here.
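A rough sketch of that parallel variant; note each worker needs its own connection, since a SqlConnection must not be shared across threads (InsertRows is a hypothetical per-table helper):

Parallel.ForEach(
    tables,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    table =>
    {
        // each worker gets its own connection; SqlConnection is not thread-safe
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            InsertRows(connection, table); // hypothetical: inserts this table's rows
        }
    });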
I have the following situation:
.NET 3.5 WinForms client app accessing SQL Server 2008
Some queries returning relatively large amounts of data are used quite often by a form
Users are running a local SQL Express instance and restarting their machines at least daily
Other users are working remotely over slow network connections
The problem is that after a restart, the first time users open this form the queries are extremely slow, taking roughly 15s to execute even on a fast machine. Afterwards the same queries take only 3s. Of course this comes from the fact that no data is cached yet and it must first be loaded from disk.
My question:
Would it be possible to force the loading of the required data in advance into SQL Server cache?
Note
My first idea was to execute the queries in a background worker when the application starts, so that when the user opens the form the queries are already cached and execute fast. However, I don't want to pull the query results over to the client, as some users are working remotely or otherwise have slow networks.
So I thought of just executing the queries from a stored procedure and putting the results into temporary tables, so that nothing would be returned to the client.
It turned out that some of the result sets use dynamic columns, so I couldn't create the corresponding temp tables, and thus this isn't a solution.
Do you happen to have any other idea?
Are you sure it's execution-plan creation, or is it server memory caching that's going on? Maybe the first query loads quite a bit of data, and subsequent queries can use the already-cached data and so run much quicker. I've never seen an execution plan take more than a second to generate, so I'd suspect the plan itself isn't the cause.
Have you tried running the Index Tuning Wizard on your query? If it is the plan that's causing problems, maybe some statistics or an additional index will help you out; the optimizer is pretty good at recommending things.
I'm not sure how you are executing your queries, but you could do:
SqlCommand command = /* your command */;
// SchemaOnly returns metadata only: the plan is generated, but no rows come back
command.ExecuteReader(CommandBehavior.SchemaOnly).Dispose();
Executing your command with the schema-only command behavior adds SET FMTONLY ON to the query and causes SQL Server to return metadata about the result set (which requires generating the plan), but it will not actually execute the command.
To narrow down the source of the problem, you can always use the SQL Server objects in Perfmon to get a general idea of how the local SQL Server Express instance is performing.
In this case you would most likely see a low Buffer Cache Hit Ratio on the first request and a higher number on subsequent requests.
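If you'd rather check from code than from Perfmon, the same counters are exposed through the sys.dm_os_performance_counters DMV; a rough sketch (the ratio is only meaningful when divided by its 'base' counter):

// reads the raw buffer cache hit ratio counters over an open SqlConnection
string sql = @"SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name LIKE 'Buffer cache hit ratio%'";
using (var command = new SqlCommand(sql, connection))
using (var reader = command.ExecuteReader())
{
    while (reader.Read())
        Console.WriteLine(reader.GetString(0).Trim() + ": " + reader.GetInt64(1));
}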
Also, you may want to check out http://msdn.microsoft.com/en-us/library/ms191129.aspx, which describes how you can set a sproc to run automatically when the SQL Server service starts up.
If you retrieve the data you need with that sproc, the data may remain cached and improve performance the first time it is retrieved by an end user via your form.
In the end I still used the approach I tried first: executing the queries from a stored procedure and putting the results into temporary tables so that nothing is returned to the client. This 'caching' stored procedure is executed in the background whenever the application starts.
It just took some time to write the temporary tables, as the result sets are dynamic.
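For reference, a rough sketch of that background warm-up call (dbo.WarmUpCaches is an invented name; the sproc itself would SELECT the hot data INTO #temp tables so nothing travels over the wire):

ThreadPool.QueueUserWorkItem(delegate
{
    try
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.WarmUpCaches", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.CommandTimeout = 120; // the cold first run can be slow
            connection.Open();
            command.ExecuteNonQuery(); // results stay in #temp tables server-side
        }
    }
    catch (SqlException)
    {
        // warming the cache is best-effort; ignore failures
    }
});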
Thanks to all of you for your really fast help on the issue!
This is not a question about optimizing a SQL command. I'm wondering what ways there are to ensure that a SQL connection is kept open and ready to handle a command as efficiently as possible.
What I'm seeing right now is that I can execute a SQL command and that command will take ~1s; additional executions take ~300ms. This is after the command has previously been executed against the SQL server (from another application instance), so the SQL cache should be fully populated for the executed query prior to this application's initial execution. As long as I continually re-execute the query I see times of about 300ms, but if I leave the application idle for 5-10 minutes and return, the next request is back to ~1s (the same as the initial request).
Is there a way, via the connection string or some property on the SqlConnection, to direct the framework to keep the connection hydrated and ready to efficiently handle queries?
Have you checked the execution plans for your procedures? Execution plans, I believe, are loaded into memory on the server and then cleared after certain periods of time or depending on which tables etc. are accessed in the procedures. We've had cases where simplifying stored procedures (perhaps splitting them) reduces the amount of work the database server has to do in calculating the plans, and ultimately reduces the cost of the first call to a procedure. You can issue commands to force stored procedures to recompile each time, for testing whether you are reducing the initial call time.
We've had cases where the complexity of a stored procedure made the database server continually recompile it based on different parameters, which drastically slowed it down; splitting the SP, or simplifying large SELECT statements into multiple UPDATE statements etc., helped a considerable amount.
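For the recompile test mentioned above, a minimal sketch (the procedure name is a placeholder): sp_recompile marks the procedure so its plan is rebuilt on the next call.

// discard dbo.MyProcedure's cached plan; the next execution recompiles it
using (var command = new SqlCommand("EXEC sp_recompile 'dbo.MyProcedure'", connection))
{
    command.ExecuteNonQuery();
}
// alternatively, CREATE PROCEDURE ... WITH RECOMPILE skips plan caching entirely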
Another idea is to intermittently call a simple SELECT GETDATE() or similar every so often so that the SQL Server stays awake (hope that makes sense), much the same as keeping an ASP.NET app in memory in IIS.
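A sketch of that keep-alive idea using a timer (the five-minute interval is an arbitrary assumption):

// ping the server periodically so the connection pool and server caches stay warm
var keepAlive = new System.Threading.Timer(delegate
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT GETDATE()", connection))
    {
        connection.Open();
        command.ExecuteScalar();
    }
}, null, TimeSpan.Zero, TimeSpan.FromMinutes(5));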
The default value for Min Pool Size in a .NET connection pool is zero, meaning the pool is allowed to shrink until no connections remain open.
You can adjust this value in your connection string to 1 or more:
"data source=dbserver;...Asynchronous Processing=true;Min Pool Size=1"
See more about these options in MSDN.
You keep it open by not closing it. :) But that's not advisable, since connection pooling will handle connection management for you. Do you have it enabled?
By default, connection pooling is enabled in ADO.NET; it is controlled through the connection string used by the application. More info in Using Connection Pooling with SQL Server.
If you use more than one database connection, it may be more efficient. Having a single database connection means access is always going to be serialized, whereas having more than one connection gives your application an opportunity to overlap concurrent access a little more. I guess you're using .NET?
Also, if you're issuing the same SQL statement repeatedly, it's possible your database server is caching the result for a short period of time, therefore making the return of the result set quicker.