I have a query that executes in ~2 seconds in SSMS (returning 25K rows).
The same query run from .NET (SqlDataReader) takes a few minutes!
I've also tried executing only the reader
(commented out all the code in the while loop, leaving just reader.Read()) - still the same!
Any idea what's up?
I'm not a DBA and don't have the privileges to play with Profiler - I will ask my DBA and let you all know.
In the meantime, I've noticed a substantial performance boost after adding the "WITH RECOMPILE" option to the SP in question.
So, from my perspective, it seems to be an execution plan issue...
What do you think?
[EDIT]
I also checked by running the query below from QA and from .NET:
select @@options
My understanding is that it should return the same value in both environments.
(If not, different execution plans will be used.)
Am I right?
[EDIT2]
I've read (from http://www.sqldev.net/misc/fn_setopts.htm) that ARITHABORT is ON in QA (in .NET it is off).
Does anybody know how to force ARITHABORT ON for every .NET connection?
I would set up a trace in SQL Server Profiler to see what SET options settings the connection is using when connecting from .NET code, and what settings are being used in SSMS. By SET options settings, I mean
ARITHABORT
ANSI_NULLS
CONCAT_NULL_YIELDS_NULL
//etc
Take a look at MSDN for a table of options
I have seen the problem before where the options were different (in that case, ARITHABORT) and the performance difference was huge.
I had that problem too. Tick the "arithmetic abort" setting in the Connection Settings of the DB server.
Also, query analyzer does not download the full contents of the large text or large binary fields. Your SqlDataReader could take longer because it does download the full contents.
I would check to see how long the actual retrieval is taking.
for instance:
Private Sub timeCheck()
'NOTE: Assuming you have a sqlconnection object named conn
'Create stopwatch
Dim sw As New System.Diagnostics.Stopwatch
'Setup query
Dim com As New SqlClient.SqlCommand("QUERY GOES HERE", conn)
sw.Start()
'Run query
Dim dr As SqlClient.SqlDataReader = com.ExecuteReader()
sw.Stop()
'Check the time
Dim sql_query_time As String = CStr((sw.ElapsedMilliseconds / 1000)) & " seconds"
End Sub
This will allow you to see whether the hold-up is in the retrieval, or in the execution of the reader.
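The same idea can be extended to time the read loop separately from the ExecuteReader call, so the two phases show up as two numbers. The following is only a minimal C# sketch: QueryTimer and Measure are hypothetical names, and it assumes you pass in a delegate that calls ExecuteReader on your own command.

```csharp
using System;
using System.Data;
using System.Diagnostics;

public static class QueryTimer
{
    // Hypothetical helper: times ExecuteReader and the row-reading loop separately.
    public static (TimeSpan Execute, TimeSpan Read, int Rows) Measure(Func<IDataReader> execute)
    {
        var sw = Stopwatch.StartNew();
        using (IDataReader reader = execute())
        {
            // Time until the reader is available (query execution / first response)
            TimeSpan executeTime = sw.Elapsed;

            // Time spent pulling every row across the wire
            sw.Restart();
            int rows = 0;
            while (reader.Read())
            {
                rows++;
            }
            return (executeTime, sw.Elapsed, rows);
        }
    }
}
```

With a real command this would be called as QueryTimer.Measure(() => com.ExecuteReader()). If the Read figure dominates, the time is going into transferring rows rather than into running the query.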
If you are executing the reader in a loop, where it executes many times, then make sure you are using CommandBehavior.CloseConnection
SqlCommand cmd = new SqlCommand();
SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
If you don't, each time the loop processes the line, when it finishes and the rdr and the connection object drops out of scope, the connection object will not be explicitly closed, so it will only get closed and released back to the pool when the Garbage Collector finally gets around to finalizing it...
Then, if your loop is fast enough, (which is very likely), you will run out of connections. (The pool has a maximum limit it can generate)
This will cause extra latency and delays as the code keeps creating extra unnecessary connections, (up to the maximum) and waiting for the GC to "catch up" with the loop that is using them...
I expected to be able to include multiple SELECT statements, each separated by a semicolon, in my query, and get a dataset returned with the same number of datatables as individual SELECT statements.
I am starting to think that the only way that this can be done is to create a stored procedure with multiple refcursor output parameters.
string sql = @"SELECT
R.DERVN_RULE_NUM
,P.DERVN_PARAM_INPT_IND
,R.DERVN_PARAM_NM
,R.DERVN_PARAM_VAL_DESC
,P.DERVN_PARAM_SPOT_NUM
,R.DERVN_PARAM_VAL_TXT
FROM
FDS_BASE.DERVN_RULE R
INNER JOIN FDS_BASE.DERVN_PARAM P
ON R.DERVN_TY_CD = P.DERVN_TY_CD
AND R.DERVN_PARAM_NM = P.DERVN_PARAM_NM
WHERE
R.DERVN_TY_CD = :DERVN_TY_CD
ORDER BY
R.DERVN_RULE_NUM
,P.DERVN_PARAM_INPT_IND DESC
, P.DERVN_PARAM_SPOT_NUM";
var dataSet = new DataSet();
using (OracleConnection oracleConnection = new OracleConnection(connectionString))
{
oracleConnection.Open();
var oracleCommand = new OracleCommand(sql, oracleConnection)
{
CommandType = CommandType.Text
};
oracleCommand.Parameters.Add(":DERVN_TY_CD", derivationType);
var oracleDataAdapter = new OracleDataAdapter(oracleCommand);
oracleDataAdapter.Fill(dataSet);
}
I tried to apply what I read here:
https://www.intertech.com/Blog/executing-sql-scripts-with-oracle-odp/
including changing my SQL to enclose it in a BEGIN END BLOCK in this form:
string sql = @"BEGIN
SELECT 1 FROM DUAL;
SELECT 2 FROM DUAL;
END";
and replacing my end of line character
sql = sql.Replace("\r\n", "\n");
but nothing works.
Is this even possible without a stored procedure using ODP, or must I make a separate trip to the server for each query?
The simplest way to return multiple query results from a single statement is with the CURSOR SQL function. For example:
select
cursor(select * from all_tables) tables,
cursor(select * from all_objects) objects
from dual;
(However, I am not a C# programmer, so I don't know if this solution will work for you. Let me know if the code doesn't work - there's probably another solution using anonymous blocks and OUT parameters.)
must I make a separate trip to the server for each query?
The way this is asked makes it seem like there's a considerable effort or waste of resources going on somewhere that can be saved or reduced, like making a database query is the equivalent of walking to the shops to get milk, coming back, then walking to the shops again to get bread and coming back
There isn't any appreciable saving to be had; if this was going to the shops, db querying is like being able to clone yourself X times, the X of you all going to the shops, and coming back at different times - some of you found your small things instantly and sprint back with them, some of you found the massive things instantly and stagger back with them, some of you took ages to find your things etc. (These are metaphors for the speed of query execution and the time required to download large vs small result sets).
If you have two queries that take ten seconds each to run, you can set them going in parallel and have your results ready and retrieved to the client in 10+x seconds (x being the time required to drag the data over the network), or you could execute them in series and have it be 20+x
If you think about it, putting two queries in one statement is only the same thing as submitting two statements for execution over different connections. The set of steps the db must take, and the set of steps the client must do to read, are the same. Writing a sproc to handle it is more effort, more complexity to maintain and more places code lives in. Even writing a block to do it is more. None of it saves anything. Even the bytes in the header of the tcp packets, minutiae as they are, are offset by more complex multi line blocks. If one query takes considerably longer than the other you might even be hamstrung into having to wait for them all to finish before you can get the results
Write your "query statement x with y parameters and return resultset Z" as async, start two of them and Task.WhenAll to wait for them to finish; if you can handle it, don't do a WhenAll but instead read and use the results as they finish - that's a saving, if the process can logically proceed before all queries deliver
I get that you're thinking "surely I should just walk to the shops and carry milk and bread back with me - that's more efficient than going twice" but it's a faulty perspective when you consider that the shop is nanoseconds away because you run at the speed of light, you have multiple private unobstructed paths to it and the bigger spend of time is finding the items you want and loading them into sufficient chained-together carts/dragging them all home. With a cloning approach, if the milk is right there, one of you can take it home and spend 10 minutes making the béchamel with it while the other of you is still waiting 10 minutes for the shop to bake the bread that you'll eat directly when you get home - you can still eat in 10 minutes if you maintain the parallelism, and launching separate operations is not only simpler but it keeps you in easy control of that
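As a rough illustration of the "start two of them and Task.WhenAll" suggestion, here is a minimal C# sketch. RunQueryAsync is a hypothetical stand-in that only simulates query latency with Task.Delay; in real code each call would open its own connection and run its own command.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelQueries
{
    // Hypothetical stand-in for "query statement x with y parameters, return result Z".
    // The delay simulates execution plus download time.
    public static async Task<string> RunQueryAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs);
        return name + " done";
    }

    // Starts both queries before awaiting either, so they run concurrently.
    public static async Task<string[]> RunBothAsync()
    {
        Task<string> milk = RunQueryAsync("milk", 50);
        Task<string> bread = RunQueryAsync("bread", 100);

        // Results come back in the order the tasks were passed in
        return await Task.WhenAll(milk, bread);
    }
}
```

The total wait is roughly that of the slower query rather than the sum of both. To consume each result as soon as it is ready, the tasks could be awaited individually instead of through WhenAll.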
I looked at lots of questions but evidently my SO-fu isn't up to the task, so here I am. I am trying to efficiently use prepared statements, and I don't just mean parameterizing a single statement, but compiling one for reuse many times. My question lies around the parameters and reuse and how to implement that correctly.
Generally I follow this procedure (contrived example):
SqlConnection db = new SqlConnection(...);
SqlCommand s = new SqlCommand("select * from foo where a=@a", db);
s.Parameters.Add("@a", SqlDbType.VarChar, 8);
db.Open(); // Prepare() requires an open connection
s.Prepare();
...
s.Parameters["@a"].Value = "bozo";
s.ExecuteReader();
Super, that works. However, I don't want to do all of these steps (or the latter four) every time I run this query. That seems like it's counteracting the whole idea of prepared statements. In my mind I should only have to change the parameters and re-execute, but the question is how to do that?
I tried s.Parameters.Clear(), but this actually removes the parameters themselves, not just the values, so I would essentially need to re-Add the parameters and re-Prepare again, which would seem to break the whole point as well. No thanks.
At this point I am left with iterating through s.Parameters and setting them all to null or some other value. Is this correct? Unfortunately in my current project I have queries with ~15 parameters which need to be executed ~10,000 times per run. I can shunt this iteration off into a method but was wondering if there is a better way to do this (without stored procs).
My current workaround is an extension method, SqlParameterCollection.Nullify, that sets all the parameters to null, which is fine for my case. I just run this after an execute.
I found some virtually identical but (IMHO) unanswered questions:
Prepared statements and the built-in connection pool in .NET
SQLite/C# Connection Pooling and Prepared Statement Confusion (Serge was so close to answering!)
The best answer I could find is (1) common sense above and (2) this page:
http://msdn.microsoft.com/en-us/magazine/cc163799.aspx
When re-using a prepared SqlCommand, surely all you need to do is set the parameter values to the new ones? You don't need to clear them out after use.
For myself, I haven't seen a DBMS produced in the last 10 years which got any noticeable benefit from preparing a statement (I suppose if the DB Server was at the limits of its CPU it might, but this is not typical). Are you sure that Preparing is necessary?
Running the same command "~10,000 times per run" smells a bit to me, unless you're uploading from an external source. In that case, Bulk Loading might help? What is each run doing?
To add to Simon's answer, prior to SQL Server 2005 Command.Prepare() would have improved query plan caching of ad-hoc queries (SPROCs would generally be compiled). However, in more recent SQL Server versions, ad-hoc queries can also be cached provided they are parameterized, reducing the need for Prepare().
Here is an example of retaining the SqlParameter collection and changing just the values of those parameters which vary, to prevent repeated creation of the parameters (i.e. saving parameter object creation and garbage collection):
using (var sqlConnection = new SqlConnection("connstring"))
{
sqlConnection.Open();
using (var sqlCommand = new SqlCommand
{
Connection = sqlConnection,
CommandText = "dbo.MyProc",
CommandType = CommandType.StoredProcedure,
})
{
// Once-off setup per connection
// This parameter doesn't vary so is set just once
sqlCommand.Parameters.Add("ConstantParam0", SqlDbType.Int).Value = 1234;
// These parameters are defined once but set multiple times
sqlCommand.Parameters.Add(new SqlParameter("VarParam1", SqlDbType.VarChar));
sqlCommand.Parameters.Add(new SqlParameter("VarParam2", SqlDbType.DateTime));
// Tight loop - performance critical
foreach(var item in itemsToExec)
{
// No need to set ConstantParam0
// Reuses variable parameters, by just mutating values
sqlCommand.Parameters["VarParam1"].Value = item.Param1Value; // Or sqlCommand.Parameters[1].Value
sqlCommand.Parameters["VarParam2"].Value = item.Param2Date; // Or sqlCommand.Parameters[2].Value
sqlCommand.ExecuteNonQuery();
}
}
}
Notes:
If you are inserting a large number of rows, and concurrency with other inhabitants of the database is important, and if an ACID transaction boundary is not important, you might consider batching and committing updates such that fewer than 5000 row locks are held on a table at a time, to guard against table lock escalation.
Depending on what work your proc is actually doing, there may be an opportunity to parallelize the loop, e.g. with TPL. Connections and commands are not thread safe, so each Task will require its own connection and reusable command - the localInit overload of Parallel.ForEach is ideal for this.
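A minimal sketch of the localInit pattern mentioned in the last note, under stated assumptions: Worker is a hypothetical stand-in for "one open connection plus one reusable command", and the counters exist only to make the behaviour observable. In real code Worker would own a SqlConnection and a SqlCommand.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a connection and its reusable command
public sealed class Worker : IDisposable
{
    public static int Created;
    public static int Disposed;
    public static int Executed;

    public Worker() { Interlocked.Increment(ref Created); }

    // Real code would set parameter values and call ExecuteNonQuery here
    public void Execute(int item) { Interlocked.Increment(ref Executed); }

    public void Dispose() { Interlocked.Increment(ref Disposed); }
}

public static class ParallelLoop
{
    public static void Run(IEnumerable<int> items)
    {
        Parallel.ForEach(
            items,
            // localInit: runs once per task, so each task gets its own Worker
            () => new Worker(),
            // body: reuses that task's Worker for every item it processes
            (item, state, worker) => { worker.Execute(item); return worker; },
            // localFinally: runs once per task to release its Worker
            worker => worker.Dispose());
    }
}
```

The number of Workers created depends on how many tasks the scheduler decides to use, not on the number of items, which is what keeps the connection count bounded.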
I tried to make the title as specific as possible. Basically what I have running inside a backgroundworker thread now is some code that looks like:
SqlConnection conn = new SqlConnection(connstring);
SqlCommand cmd = new SqlCommand(query, conn);
conn.Open();
SqlDataAdapter sda = new SqlDataAdapter(cmd);
sda.Fill(Results);
conn.Close();
sda.Dispose();
Where query is a string representing a large, time consuming query, and conn is the connection object.
My problem now is I need a stop button. I've come to realize killing the backgroundworker would be worthless because I still want to keep what results are left over after the query is canceled. Plus it wouldn't be able to check the canceled state until after the query.
What I've come up with so far:
I've been trying to conceptualize how to handle this efficiently without taking too big of a performance hit.
My idea was to use a SqlDataReader to read the data from the query piece at a time so that I had a "loop" to check a flag I could set from the GUI via a button. The problem is as far as I know I can't use the Load() method of a datatable and still be able to cancel the sqlcommand. If I'm wrong please let me know because that would make cancelling slightly easier.
In light of what I discovered I came to the realization I may only be able to cancel the sqlcommand mid-query if I did something like the below (pseudo-code):
while(reader.Read())
{
//check flag status
//if it is set to 'kill' fire off the kill thread
//otherwise populate the datatable with what was read
}
However, it would seem to me this would be highly ineffective and possibly costly. Is this the only way to kill a sqlcommand in progress that absolutely needs to be in a datatable? Any help would be appreciated!
There are really two stages where cancelling matters:
Cancelling the initial query execution before the first rows are returned
Aborting the process of reading the rows as they are served
Depending on the nature of the actual SQL statement, either of these steps could be 99% of the time, so they both should be considered. For example, calling SELECT * on some table with a billion rows will take essentially no time to execute but will take a very long time to read. Conversely, requesting a super complicated join on poorly tuned tables and then wrapping it all in some aggregating clauses may take minutes to execute but negligible time to read the handful of rows once they are actually returned.
Well-tuned advanced database engines will also cache chunks of rows at a time for complicated queries, so you will see alternating pauses where the engine is executing the query on the next batch of rows and then fast bursts of data as it returns the next batch of results.
Cancelling the query execution
In order to be able to cancel a query while it is executing you can use one of the overloads of SqlCommand.BeginExecuteReader to start the query, and call SqlCommand.Cancel to abort it. Alternatively you can call ExecuteReader() synchronously in one thread and still call Cancel() from another. I'm not including code examples because there are plenty of them in the documentation.
Aborting the read operation
Here using a simple boolean flag is probably the easiest way. And remember it's really easy to fill a data table row using the Rows.Add() overload that takes an array of object, that is:
object[] buffer = new object[reader.FieldCount];
while(reader.Read()) {
if(cancelFlag) break;
reader.GetValues(buffer);
dataTable.Rows.Add(buffer);
}
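Fleshed out a little, the read loop above might look like the sketch below. It is only an illustration under assumptions: CancellableFill is a hypothetical helper, it uses a CancellationToken in place of the plain boolean flag, and it creates the DataTable columns from the reader's field names and types when the table has none.

```csharp
using System;
using System.Data;
using System.Threading;

public static class CancellableFill
{
    // Copies rows from the reader into the table until the reader is exhausted
    // or cancellation is requested. Rows already copied are kept.
    public static int Fill(IDataReader reader, DataTable target, CancellationToken token)
    {
        // Build matching columns if the caller passed an empty table
        if (target.Columns.Count == 0)
        {
            for (int i = 0; i < reader.FieldCount; i++)
            {
                target.Columns.Add(reader.GetName(i), reader.GetFieldType(i));
            }
        }

        object[] buffer = new object[reader.FieldCount];
        int rows = 0;

        // The token is checked once per row, before asking for the next one
        while (!token.IsCancellationRequested && reader.Read())
        {
            reader.GetValues(buffer);
            target.Rows.Add(buffer); // values are copied into the new row
            rows++;
        }
        return rows;
    }
}
```

A stop button handler would call Cancel() on the CancellationTokenSource that produced the token. The partially filled DataTable stays usable afterwards.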
Cancelling blocking calls to Read()
A sort of mixed case occurs when, as mentioned earlier, a call to reader.Read() causes the database engine to do another batch of intensive processing. As noted in the MSDN documentation, calls to Read() can be blocking in this case even if the original query was executed with BeginExecuteReader. You can still get around this by calling Read() in one thread that's handling all the reading but calling Cancel() in another thread. The way you know if your reader is in a blocking Read call is to have another flag that the reader thread updates while the monitoring thread reads:
...
inRead = true;
while(reader.Read()) {
inRead = false;
...
inRead = true;
}
// Somewhere else:
private void foo_onUITimerTick(...) {
status.Text = inRead ? "Waiting for server" : "Reading";
}
Regarding performance of Reader vs Adapter
A DataReader is usually faster than using DataAdapter.Fill(). The whole point of a DataReader is to be really, really fast and responsive for reading. Checking some boolean flag once per row would not add a measurable difference in time even over millions of rows.
The limiting factor for a big database query is not the local CPU processing time but the size of the I/O pipe (your network connection for a remote database or your disk speed for a local one) or a combination of the db server's own disk speed and CPU processing time for a complex query. Both a DataAdapter and a DataReader will spend time (perhaps the majority of the time) just waiting for a few nanoseconds at a time for the next row to be served.
One convenience of DataAdapter.Fill() is that it does the magic of dynamically generating the DataTable columns to match the query results, but that's not difficult to do yourself (see SqlDataReader.GetSchemaTable()).
Just a try
I would suggest running the time-consuming query in a BackgroundWorker and passing the command to it, so that you keep a reference to the command object. When a cancel request comes in, call command.Cancel() on the command that the in-progress BackgroundWorker is executing.
I have a simple query which returns 25,026 rows:
MySqlCommand cmd = new MySqlCommand("SELECT ID FROM People", DB);
MySqlDataReader reader = cmd.ExecuteReader();
(ID is an int.) If I just do this:
int i = 0;
while (reader.Read()) i++;
i will equal 25026. However, I need to do some processing on each ID in my loop; each iteration ends up taking somewhere in the hundreds of milliseconds.
int i = 0;
MySqlCommand updater = new MySqlCommand("INSERT INTO OtherTable (...)", anotherConnection);
updater.Prepare();
while (reader.Read()) {
int id = reader.GetInt32(0);
// do stuff, then
updater.ExecuteNonQuery();
i++;
}
However, after about 4:15 of processing, reader.Read() simply returns false. In most of my test runs, i equaled 14896, but it also sometimes stops at 11920. The DataReader quitting after the same number of records is suspicious, and the times it stops after a different number of rows seems even stranger.
Why is reader.Read() returning false when there's definitely more rows? There are no exceptions being thrown – not even first chance exceptions.
Update: I mentioned in my response to Shaun's answer that I was becoming convinced that MySqlDataReader.Read() is swallowing an exception, so I downloaded Connector/Net's source code (bzr branch lp:connectornet/6.2 C:/local/path) and added the project to my solution. Sure enough, after 6:15 of processing, an exception!
The call to resultSet.NextRow() throws a MySqlException with a message of "Reading from the stream has failed." The InnerException is a SocketException:
{ Message: "An existing connection was forcibly closed by the remote host",
ErrorCode: 10054,
SocketErrorCode: ConnectionReset }
10054 means the TCP socket was aborted with a RST instead of the normal disconnection handshake (FIN, FIN ACK, ACK), which tells me something screwy is happening to the network connection.
In my.ini, I cranked interactive_timeout and wait_timeout to 1814400 (seconds) to no avail.
So... why is my connection getting torn down after reading for 6:15 (375 sec)?
(Also, why is this exception getting swallowed when I use the official binary? It looks like it should bubble up to my application code.)
Perhaps you have a corrupted table - this guy's problem sounds very similar to yours:
http://forums.asp.net/t/1507319.aspx?PageIndex=2 - repair the table and see what happens.
If that doesn't work, read on:
My guess is that you are hitting some type of deadlock, especially considering you are reading and writing. This would explain why it works with the simple loop, but doesn't work when you do updates. It would also explain why it happens around the same row / time each time.
There was a weird bug in SqlDataReader that squelched exceptions (http://support.microsoft.com/kb/316667). There might be something similar in MySqlDataReader. After your final .Read() call, try calling .NextResult(). Even if it's not a deadlock, it might help you diagnose the problem. In these types of situations, you want to lean more towards "trust but verify": yes, the documentation says that an exception will be thrown on timeout, but sometimes (very rarely) the documentation lies :) This is especially true for 3rd party vendors - e.g. see http://bugs.mysql.com/bug.php?id=53439 - the MySQL .NET library has had a couple of problems like the one you are having in the past.
Another idea would be to watch what's happening in your database - make sure data is constantly being fetched up until the row that your code exits on.
Failing that, I would just read all the data in, cache it, and then do your modifications. By batching the modifications, the code would be less chatty and execute faster.
Alternatively, reset the reader every 1000 or so rows (and keep track of what row ID you were up to)
Hope something here helps you with your frustration! :)
Since I'm just reading ints, I ultimately just read the entire resultset into a List<int>, closed the reader, and then did my processing. This is fine for ints since even a million take up < 100 MB of RAM, but I'm still disappointed that the root issue isn't resolved – if I were reading more than a single int per row, memory would become a very large problem with a large dataset.
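For reference, the "read everything first, then process" workaround described above could be sketched roughly as follows. This is not the poster's actual code: IdBuffer and ReadAllIds are hypothetical names, and the helper assumes the first column of the result set is an int.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class IdBuffer
{
    // Drains the reader into memory and closes it, so the slow per-row
    // processing happens after the server connection is no longer mid-read.
    public static List<int> ReadAllIds(IDataReader reader)
    {
        var ids = new List<int>();
        using (reader)
        {
            while (reader.Read())
            {
                ids.Add(reader.GetInt32(0)); // assumes column 0 is an int
            }
        }
        return ids;
    }
}
```

The processing loop then iterates over the returned list and runs the INSERT for each ID, with no reader held open in the meantime.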
Try to set longer connection timeout.
There are 2 issues that make things a bit more confusing than it should be:
The first, as has been mentioned in another post, is that older versions of the MySQL .NET connector were swallowing a timeout exception. I was using mysql.data.dll version 6.1.x and after upgrading to 6.3.6 the exception was being properly thrown.
The second is the default MySQL server timeouts, specifically net_read_timeout and net_write_timeout (which default to 30 and 60 seconds respectively).
With older versions of mysql.data.dll, when you are performing an action with the data in the datareader loop and you exceed the 60 second default timeout it would just sit there and not do anything. With newer versions it properly throws a timeout exception which helps diagnose the problem.
Hope this helps someone as I stumbled upon this but the solution was to use a different approach, not an actual cause/fix.
TLDR: The fix is to increase net_read_timeout and net_write_timeout on the MySQL server in my.ini, although upgrading mysql.data.dll is a good idea.
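As a sketch of what that change might look like in my.ini (the value 600 is an arbitrary illustration, not a recommendation from the answer; pick values that cover your longest per-row processing gap):

```ini
[mysqld]
# Seconds the server waits for more data from a connection before aborting a read
net_read_timeout = 600
# Seconds the server waits for a block to be written to a connection before aborting
net_write_timeout = 600
```

The server has to be restarted for changes in my.ini to take effect.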
Maybe this is a timeout on the server side?
Try this
1. Add reference System.Transactions;
using(TransactionScope scope = new TransactionScope())
{
//Initialize connection
// execute command
:
:
scope.Complete();
}
Write your entire insert/update logic inside the scope's using block. This will definitely help you.
Add the following after creating your Command.
cmd.CommandTimeout = 0;
This sets the CommandTimeout to indefinite (no limit). The reason you're getting a timeout is probably that the connection, though executed, is still in the 'command' phase because of the Reader.
Either try setting CommandTimeout = 0 or reading everything first, then doing subsequent functions on the results. Otherwise the only other issue I could possibly see is that the Sql Server is dropping the result set against the specified process id due to a timeout on the server itself.
I've found an article here
http://corengen.wordpress.com/2010/06/09/mysql-connectornet-hangs-on-mysqldatareader-read/
What this guy experienced was something similar: a hang on the Read method at exactly the same moment, during reading of the same record (which is the same thing you are experiencing, I guess).
In his case he called another webservice during the Read() loop, and that one timed out, causing the Read() to hang without an exception.
Could it be the same on your machine: an update in the Read() loop times out (I think that update uses the default 30 second timeout) and causes the same effect?
Maybe a long shot, but reading the two stories, they sounded very similar.
Try setting the timeout on the MySqlCommand and not the MySqlConnection on both "cmd" and "updater".
What happens if you do : SELECT TOP 100 ID FROM PEOPLE ?
When you use a SqlDataReader, is the return set completely determined by the ExecuteReader step, or can you influence what you get by writing to the source table(s) while reading? Here is an example in very rough pseudo code.
sc = new SqlCommand("select * from people order by last, first",db) ;
sdr = sc.ExecuteReader() ;
while (sdr.Read())
{
l = (string) sdr["last"] ;
k = (string) sdr["key"] ;
if (l.Equals("Adams"))
{
sc2 = new SqlCommand("update people set last = @nm where key = @key") ;
sc2.Parameters.Add(new SqlParameter("@nm", "Ziegler"));
sc2.Parameters.Add(new SqlParameter("@key", k));
sc2.ExecuteNonQuery() ;
}
}
I've seen a lot of bad errors in other environments caused by writing to the table you are reading. Here record k gets bumped from the top of the list (Adams) to the bottom (Ziegler). I've assumed (ha!) that SqlDataReader is immune. True? False?
It depends on your transaction isolation level or other locking hints, but iirc by default reading from a table in SQL Server locks those records, and therefore the code you posted will either deadlock (sc2 will eventually time out) or the updates will go into the transaction log and none will be written until your reader is finished. I don't remember which off the top of my head.
One issue I see is that when the reader is open it owns the database connection, nothing else can use it while the reader is open. So the only way possible to do this is using a different database connection, and still it would depend on transaction level
If you want to assume that the read data is not altered by those updates, could you read the data into a temporary object container, and then after all the reading is done, then do your updates? It would make the issue moot.
Of course, I did find the question interesting from a "how does this really work" standpoint.
If you want to do updates while you're iterating on the query results you could read it all into a DataSet.
I know you didn't ask about this, and I also know this is pseudo-code, but be sure to wrap your sc, sdr, and sc2 variables in using () statements to ensure they're disposed properly.