When you use a SqlDataReader, is the result set completely determined by the ExecuteReader call, or can you influence what you get back by writing to the source table(s) while reading? Here is an example in very rough pseudo-code.
sc = new SqlCommand("select * from people order by last, first", db);
sdr = sc.ExecuteReader();
while (sdr.Read())
{
    l = (string) sdr["last"];
    k = (string) sdr["key"];
    if (l.Equals("Adams"))
    {
        sc2 = new SqlCommand("update people set last = @nm where key = @key", db);
        sc2.Parameters.Add(new SqlParameter("@nm", "Ziegler"));
        sc2.Parameters.Add(new SqlParameter("@key", k));
        sc2.ExecuteNonQuery();
    }
}
I've seen a lot of bad errors in other environments caused by writing to the table you are reading. Here record k gets bumped from the top of the list (Adams) to the bottom (Ziegler). I've assumed (ha!) that SqlDataReader is immune. True? False?
It depends on your transaction isolation level and any locking hints, but IIRC by default reading from a table in SQL Server locks those records, so the code you posted will either deadlock (sc2 will eventually time out) or the updates will go into the transaction log and none will be written until your reader is finished. I don't remember which off the top of my head.
One issue I see is that while the reader is open it owns the database connection; nothing else can use that connection until the reader is closed. So the only way to do this is with a second database connection, and even then the outcome depends on the transaction isolation level.
If you want to be sure the data you read is not altered by those updates, could you read the data into a temporary object container and, after all the reading is done, do your updates? That would make the issue moot.
Of course, I did find the question interesting from a "how does this really work" standpoint.
If you want to do updates while you're iterating on the query results you could read it all into a DataSet.
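For illustration, a minimal sketch of that approach, combined with the temporary-container suggestion above (the connection string is a placeholder, and [key] is bracketed here only because KEY is a reserved word in T-SQL):

using System.Data;
using System.Data.SqlClient;

// Sketch only: read everything into memory first, then run the updates afterwards.
var ds = new DataSet();
using (var db = new SqlConnection(connectionString))
{
    using (var adapter = new SqlDataAdapter("select * from people order by last, first", db))
    {
        adapter.Fill(ds, "people");   // the reader is fully consumed here
    }

    db.Open();
    foreach (DataRow row in ds.Tables["people"].Rows)
    {
        if ("Adams".Equals((string)row["last"]))
        {
            using (var update = new SqlCommand("update people set last = @nm where [key] = @key", db))
            {
                update.Parameters.AddWithValue("@nm", "Ziegler");
                update.Parameters.AddWithValue("@key", row["key"]);
                update.ExecuteNonQuery();   // safe: the select finished before any writes
            }
        }
    }
}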
I know you didn't ask about this, and I also know this is pseudo-code, but be sure to wrap your sc, sdr, and sc2 variables in using () statements to ensure they're disposed properly.
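For reference, a hedged sketch of the question's pseudo-code wrapped in using blocks (a second connection, db2 here, is assumed for the update, per the point above about the reader owning its connection):

using (var sc = new SqlCommand("select * from people order by last, first", db))
using (var sdr = sc.ExecuteReader())
{
    while (sdr.Read())
    {
        if ("Adams".Equals((string)sdr["last"]))
        {
            // db2: a second, separate SqlConnection - db is busy serving sdr
            using (var sc2 = new SqlCommand("update people set last = @nm where [key] = @key", db2))
            {
                sc2.Parameters.AddWithValue("@nm", "Ziegler");
                sc2.Parameters.AddWithValue("@key", (string)sdr["key"]);
                sc2.ExecuteNonQuery();
            }
        }
    }
}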
I have a query executing in ~2 seconds in SSMS (returning 25K rows).
The same query used in .NET (SqlDataReader) executes in a few minutes!
I've also tried executing only the reader
(commented out all the code in the while loop, leaving just reader.Read()) - still the same!
Any idea what's up?
I'm not a DBA and not privileged to play with Profiler - will ask my DBA and let you all know.
In the meantime I've noticed an essential performance boost after adding the "WITH RECOMPILE" option to the SP I'm talking about.
So, from my perspective it seems to be a case of execution plans...
What do you think?
[EDIT]
Another thing I checked was running the query below from both QA and .NET:
select @@options
My understanding is it should return the same value in both environments.
(If not, different execution plans will be used.)
Am I right?
[EDIT2]
I've read (from http://www.sqldev.net/misc/fn_setopts.htm) that ARITHABORT=ON in QA (in .NET it is OFF).
Does anybody know how to force ARITHABORT=ON for every .NET connection?
I would set up a trace in SQL Server Profiler to see which SET options the connection is using when connecting from .NET code, and which are being used in SSMS. By SET options, I mean
ARITHABORT
ANSI_NULLS
CONCAT_NULL_YIELDS_NULL
etc.
Take a look at MSDN for a table of options
I have seen the problem before where the options were different (in that case, ARITHABORT) and the performance difference was huge.
I had that problem too. Tick the "arithmetic abort" setting in the connection settings of the DB server.
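If changing the server-side setting isn't an option, a hedged workaround (a sketch only; the stored procedure name is a placeholder) is to issue the SET statement yourself on every connection the .NET code opens, so the session matches what QA/SSMS uses:

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // Align this session's SET options with the Query Analyzer / SSMS defaults.
    using (var set = new SqlCommand("SET ARITHABORT ON;", conn))
    {
        set.ExecuteNonQuery();
    }

    using (var cmd = new SqlCommand("dbo.MySlowProc", conn))  // hypothetical proc name
    {
        cmd.CommandType = CommandType.StoredProcedure;
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read()) { /* consume rows */ }
        }
    }
}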
Also, query analyzer does not download the full contents of the large text or large binary fields. Your SqlDataReader could take longer because it does download the full contents.
I would check to see how long the actual retrieval is taking.
for instance:
Private Sub timeCheck()
    'NOTE: Assuming you have a SqlConnection object named conn
    'Create stopwatch
    Dim sw As New System.Diagnostics.Stopwatch
    'Set up query
    Dim com As New SqlClient.SqlCommand("QUERY GOES HERE", conn)
    sw.Start()
    'Run query
    Dim dr As SqlClient.SqlDataReader = com.ExecuteReader()
    sw.Stop()
    'Check the time
    Dim sql_query_time As String = CStr(sw.ElapsedMilliseconds / 1000) & " seconds"
End Sub
This will allow you to see whether the hold-up is in the retrieval, or in the execution of the reader.
If you are executing the reader in a loop, where it executes many times, then make sure you are using CommandBehavior.CloseConnection:
SqlCommand cmd = new SqlCommand();
SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
If you don't, each time through the loop, when the rdr and the connection object drop out of scope, the connection will not be explicitly closed, so it will only get closed and released back to the pool when the Garbage Collector finally gets around to finalizing it...
Then, if your loop is fast enough, (which is very likely), you will run out of connections. (The pool has a maximum limit it can generate)
This will cause extra latency and delays as the code keeps creating extra unnecessary connections (up to the maximum) and waiting for the GC to "catch up" with the loop that is using them...
I looked at lots of questions but evidently my SO-fu isn't up to the task, so here I am. I am trying to efficiently use prepared statements, and I don't just mean parameterizing a single statement, but compiling one for reuse many times. My question lies around the parameters and reuse and how to implement that correctly.
Generally I follow this procedure (contrived example):
SqlConnection db = new SqlConnection(...);
db.Open();
SqlCommand s = new SqlCommand("select * from foo where a=@a", db);
s.Parameters.Add("@a", SqlDbType.VarChar, 8);
s.Prepare();
...
s.Parameters["@a"].Value = "bozo";
s.ExecuteReader();
Super, that works. However, I don't want to do all of these steps (or the latter four) every time I run this query. That seems like it's counteracting the whole idea of prepared statements. In my mind I should only have to change the parameters and re-execute, but the question is how to do that?
I tried s.Parameters.Clear(), but this actually removes the parameters themselves, not just the values, so I would essentially need to re-Add the parameters and re-Prepare again, which would seem to break the whole point as well. No thanks.
At this point I am left with iterating through s.Parameters and setting them all to null or some other value. Is this correct? Unfortunately in my current project I have queries with ~15 parameters which need to be executed ~10,000 times per run. I can shunt this iteration off into a method but was wondering if there is a better way to do this (without stored procs).
My current workaround is an extension method, SqlParameterCollection.Nullify, that sets all the parameters to null, which is fine for my case. I just run this after an execute.
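For illustration, such an extension method might look something like this (a sketch; the actual implementation mentioned above may differ):

using System;
using System.Data.SqlClient;

public static class SqlParameterCollectionExtensions
{
    // Resets every parameter's value so the prepared command can be reused for the next execute.
    public static void Nullify(this SqlParameterCollection parameters)
    {
        foreach (SqlParameter p in parameters)
        {
            p.Value = DBNull.Value;   // or null, depending on how "no value" should be sent
        }
    }
}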
I found some virtually identical but (IMHO) unanswered questions:
Prepared statements and the built-in connection pool in .NET
SQLite/C# Connection Pooling and Prepared Statement Confusion (Serge was so close to answering!)
The best answer I could find is (1) common sense above and (2) this page:
http://msdn.microsoft.com/en-us/magazine/cc163799.aspx
When re-using a prepared SqlCommand, surely all you need to do is set the parameter values to the new ones? You don't need to clear them out after use.
For myself, I haven't seen a DBMS produced in the last 10 years which got any noticeable benefit from preparing a statement (I suppose if the DB Server was at the limits of its CPU it might, but this is not typical). Are you sure that Preparing is necessary?
Running the same command "~10,000 times per run" smells a bit to me, unless you're uploading from an external source. In that case, Bulk Loading might help? What is each run doing?
To add to Simon's answer, prior to SQL Server 2005, Command.Prepare() would have improved query plan caching of ad-hoc queries (stored procedures would generally be compiled anyway). However, in more recent SQL Server versions, ad-hoc queries that are parameterized can also be cached, reducing the need for Prepare().
Here is an example of retaining a SqlParameter collection and changing just the values of the parameters that vary, to prevent repeated creation of the parameters (i.e. saving parameter object creation and collection overhead):
using (var sqlConnection = new SqlConnection("connstring"))
{
    sqlConnection.Open();
    using (var sqlCommand = new SqlCommand
    {
        Connection = sqlConnection,
        CommandText = "dbo.MyProc",
        CommandType = CommandType.StoredProcedure,
    })
    {
        // Once-off setup per connection
        // This parameter doesn't vary so is set just once
        sqlCommand.Parameters.Add("@ConstantParam0", SqlDbType.Int).Value = 1234;
        // These parameters are defined once but set multiple times
        sqlCommand.Parameters.Add(new SqlParameter("@VarParam1", SqlDbType.VarChar));
        sqlCommand.Parameters.Add(new SqlParameter("@VarParam2", SqlDbType.DateTime));

        // Tight loop - performance critical
        foreach (var item in itemsToExec)
        {
            // No need to set @ConstantParam0
            // Reuse the variable parameters, mutating just their values
            sqlCommand.Parameters["@VarParam1"].Value = item.Param1Value;   // or sqlCommand.Parameters[1].Value
            sqlCommand.Parameters["@VarParam2"].Value = item.Param2Date;    // or sqlCommand.Parameters[2].Value
            sqlCommand.ExecuteNonQuery();
        }
    }
}
Notes:
If you are inserting a large number of rows, and concurrency with other users of the database is important, and an ACID transaction boundary is not important, you might consider batching and committing the updates so that fewer than 5000 row locks are held on a table at a time, to guard against escalation to a table lock.
Depending on what work your proc is actually doing, there may be an opportunity to parallelize the loop, e.g. with TPL. Obviously connections and commands are not thread safe, so each Task will require its own connection and reusable command - the localInit overload of Parallel.ForEach is ideal for this.
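A hedged sketch of that parallel variant, reusing the dbo.MyProc shape from the example above (the connection string, the itemsToExec collection, and the parameter types are the same placeholders as before):

using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

Parallel.ForEach(
    itemsToExec,
    // localInit: one connection and one reusable command per worker
    () =>
    {
        var conn = new SqlConnection("connstring");
        conn.Open();
        var cmd = new SqlCommand("dbo.MyProc", conn) { CommandType = CommandType.StoredProcedure };
        cmd.Parameters.Add("@ConstantParam0", SqlDbType.Int).Value = 1234;
        cmd.Parameters.Add("@VarParam1", SqlDbType.VarChar);
        cmd.Parameters.Add("@VarParam2", SqlDbType.DateTime);
        return (conn: conn, cmd: cmd);
    },
    // body: reuse the task-local command, mutating only the values
    (item, state, local) =>
    {
        local.cmd.Parameters["@VarParam1"].Value = item.Param1Value;
        local.cmd.Parameters["@VarParam2"].Value = item.Param2Date;
        local.cmd.ExecuteNonQuery();
        return local;
    },
    // localFinally: dispose the task-local resources
    local =>
    {
        local.cmd.Dispose();
        local.conn.Dispose();
    });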
I am experiencing a very strange issue: the stored procedure doesn't return even though the data is properly and promptly inserted. The most staggering thing about this is that the record's timestamp field is always populated within milliseconds, so the data seems to get into the tables very fast. But the return never happens.
The important part of all this is that it only happens under load -- individual requests are just fine. Only when the DB is stressed enough does this start happening.
Any ideas are welcome because I have very little understanding what can be wrong.
Here is the simplified C# part of it:
try
{
    using (var conn = new SqlConnection(connString))
    {
        conn.Open();
        using (var cmd = new SqlCommand("dbo.Blah", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.ExecuteNonQuery();
            // THIS NEVER EXECUTES:
            ReportSuccess();
        }
    }
}
catch (TimeoutException)
{
    // EXCEPTION HERE
}
And the stored procedure:
CREATE PROCEDURE dbo.Blah
AS
BEGIN
    INSERT dbo.MyTable VALUES (...)
    INSERT dbo.MyTable2 ...
    -- Here is where everything stops.
END
UPDATE: we did our best to correlate the timeouts with SQL Server activity, and it appears non-app user activity was causing locks. The process is designed to allow very quick inserts and very quick reads. However, some individuals would execute quite expensive queries without actually using a dirty-read policy, which was tipping over the fragile hardware load balance. Thanks for all your tips though.
Based on the information provided, my best guess is that there is a contention problem in your stored procedure.
Try using transaction in your stored procedure. If my theory is correct, no record will be inserted.
Check if there are any locks being acquired on MyTable2. If you are doing a big select on that table elsewhere, ensure you are using NOLOCK.
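For example, a minimal sketch (the query itself is hypothetical; note that NOLOCK / READ UNCOMMITTED means the query may see uncommitted data, so only use it when that trade-off is acceptable):

using System.Data.SqlClient;

using (var conn = new SqlConnection(connString))
using (var cmd = new SqlCommand(
    "SELECT COUNT(*) FROM dbo.MyTable2 WITH (NOLOCK)", conn))  // hypothetical reporting query
{
    conn.Open();
    var count = (int)cmd.ExecuteScalar();   // does not block the fast inserts into MyTable2
}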
I have some code that, at the end of the program's life, uploads the entire contents of 6 different lists into a database. The problem is that they're parallel lists with about 14,000 items each, and I have to run an INSERT query for each item. This takes a long time; is there a faster way to do this? Here's a sample of the relevant code:
public void uploadContent()
{
    var cs = Properties.Settings.Default.Database;
    SqlConnection dataConnection = new SqlConnection(cs);
    dataConnection.Open();
    for (int i = 0; i < urlList.Count; i++)
    {
        SqlCommand dataCommand = new SqlCommand(Properties.Settings.Default.CommandString, dataConnection);
        try
        {
            dataCommand.Parameters.AddWithValue("@user", userList[i]);
            dataCommand.Parameters.AddWithValue("@computer", computerList[i]);
            dataCommand.Parameters.AddWithValue("@date", timestampList[i]);
            dataCommand.Parameters.AddWithValue("@itemName", domainList[i]);
            dataCommand.Parameters.AddWithValue("@itemDetails", urlList[i]);
            dataCommand.Parameters.AddWithValue("@timesUsed", hitsList[i]);
            dataCommand.ExecuteNonQuery();
        }
        catch (Exception e)
        {
            using (StreamWriter sw = File.AppendText("errorLog.log"))
            {
                sw.WriteLine(e);
            }
        }
    }
    dataConnection.Close();
}
Here is the command string the code is pulling from the config file:
CommandString:
INSERT dbo.InternetUsage VALUES (@user, @computer, @date, @itemName, @itemDetails, @timesUsed)
As mentioned in @alerya's answer, doing the following will help (explanation added here):
1) Move command and parameter creation outside of the for loop
Since the same command is being used each time, it doesn't make sense to re-create the command each time. In addition to creating a new object (which takes time), the command must also be verified each time it is created for several things (table exists, etc). This introduces a lot of overhead.
2) Put the inserts within a transaction
Putting all of the inserts within a transaction will speed things up because, by default, a command that is not within a transaction is considered its own transaction. Therefore, every time you insert something, the database server must verify that what it just inserted is actually saved (usually on a hard disk, which is limited by the speed of the disk). When multiple INSERTs are within one transaction, however, that check only needs to be performed once.
The downside to this approach, based on the code you've already shown, is that one bad INSERT will spoil the bunch. Whether or not this is acceptable depends on your specific requirements.
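As an illustration only, here is a hedged sketch of both changes applied to the question's loop (the SqlDbType choices are guesses, since the table's column types aren't shown; error handling is trimmed for brevity):

using System.Data;
using System.Data.SqlClient;

using (var dataConnection = new SqlConnection(cs))
{
    dataConnection.Open();
    using (var transaction = dataConnection.BeginTransaction())
    using (var dataCommand = new SqlCommand(Properties.Settings.Default.CommandString,
                                            dataConnection, transaction))
    {
        // Create the parameters once, outside the loop...
        dataCommand.Parameters.Add("@user", SqlDbType.NVarChar);
        dataCommand.Parameters.Add("@computer", SqlDbType.NVarChar);
        dataCommand.Parameters.Add("@date", SqlDbType.DateTime);
        dataCommand.Parameters.Add("@itemName", SqlDbType.NVarChar);
        dataCommand.Parameters.Add("@itemDetails", SqlDbType.NVarChar);
        dataCommand.Parameters.Add("@timesUsed", SqlDbType.Int);

        for (int i = 0; i < urlList.Count; i++)
        {
            // ...and only change the values inside the loop.
            dataCommand.Parameters["@user"].Value = userList[i];
            dataCommand.Parameters["@computer"].Value = computerList[i];
            dataCommand.Parameters["@date"].Value = timestampList[i];
            dataCommand.Parameters["@itemName"].Value = domainList[i];
            dataCommand.Parameters["@itemDetails"].Value = urlList[i];
            dataCommand.Parameters["@timesUsed"].Value = hitsList[i];
            dataCommand.ExecuteNonQuery();
        }

        transaction.Commit();   // one bad INSERT rolls back the whole batch
    }
}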
Aside
Another thing you really should be doing (though this won't speed things up in the short term) is properly using the IDisposable interface. This means either calling .Dispose() on all IDisposable objects (SqlConnection, SqlCommand), or, ideally, wrapping them in using() blocks:
using (SqlConnection dataConnection = new SqlConnection(cs))
{
    //Code goes here
}
This will prevent resource leaks from these spots, which will become a problem quickly if your loops get too large.
Move command and parameter creation outside of the for (int i = 0; i < urlList.Count; i++) loop.
Also, create the inserts within a transaction.
If possible, create a stored procedure and pass the parameters as a DataTable.
Sending INSERT commands one by one to a database will really make the whole process slow, because of the round trips to the database server. If you're worried about performance, you should consider using a bulk insert strategy. You could:
Generate a flat file with all your information, in the format that BULK INSERT understands.
Use the BULK INSERT command to import that file to your database (http://msdn.microsoft.com/en-us/library/ms188365(v=sql.90).aspx).
P.S. I guess when you say SQL you mean MS SQL Server.
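A minimal sketch of step 2, assuming the flat file has already been written to a path the SQL Server service account can read (the file path and terminators are placeholders):

using System.Data.SqlClient;

using (var conn = new SqlConnection(cs))
using (var cmd = new SqlCommand(
    @"BULK INSERT dbo.InternetUsage
      FROM 'C:\exports\internet_usage.csv'
      WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')", conn))  // hypothetical file path
{
    conn.Open();
    cmd.CommandTimeout = 0;   // bulk loads can exceed the 30-second default timeout
    cmd.ExecuteNonQuery();
}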
Why don't you run your uploadContent() method from a separate thread?
This way you don't need to worry about how much time the query takes to execute.
I have to implement an algorithm on data which is (for good reasons) stored inside SQL server. The algorithm does not fit SQL very well, so I would like to implement it as a CLR function or procedure. Here's what I want to do:
Execute several queries (usually 20-50, but up to 100-200) which all have the form select a,b,... from some_table order by xyz. There's an index which fits that query, so the result should be available more or less without any calculation.
Consume the results step by step. The exact stepping depends on the results, so it's not exactly predictable.
Aggregate some result by stepping over the results. I will only consume the first part of the results, but cannot predict how much I will need. The stopping criterion depends on some threshold inside the algorithm.
My idea was to open several SqlDataReader, but I have two problems with that solution:
You can have only one SqlDataReader per connection and inside a CLR method I have only one connection - as far as I understand.
I don't know how to tell SqlDataReader to read the data in chunks. I could not find documentation on how SqlDataReader is supposed to behave. As far as I understand, it prepares the whole result set and would load the whole result into memory, even if I consume only a small part of it.
Any hint how to solve that as a CLR method? Or is there a more low level interface to SQL server which is more suitable for my problem?
Update: I should have made two points more explicit:
I'm talking about big data sets, so a query might return 1 million records, but my algorithm would consume only the first 100-200 of them. As I said before, I don't know the exact number beforehand.
I'm aware that SQL might not be the best choice for that kind of algorithm. But due to other constraints it has to be a SQL server. So I'm looking for the best possible solution.
SqlDataReader does not read the whole data set; you are confusing it with the DataSet class. It reads row by row, as the .Read() method is called. If a client does not consume the result set, the server will suspend the query execution because it has no room to write the output into (the selected rows). Execution will resume as the client consumes more rows (as SqlDataReader.Read is called). There is even a special command behavior flag, SequentialAccess, that instructs ADO.NET not to pre-load the entire row in memory, which is useful for accessing large BLOB columns in a streaming fashion (see Download and Upload images from SQL Server via ASP.Net MVC for a practical example).
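For illustration, a sketch of that streaming pattern (the table, columns, and the conn variable here are made-up placeholders, not part of the question):

using System.Data;
using System.Data.SqlClient;

// conn: an already-open SqlConnection (placeholder)
using (var cmd = new SqlCommand("SELECT Id, Payload FROM dbo.BigBlobs", conn))   // hypothetical table
using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
{
    var buffer = new byte[8192];
    while (reader.Read())
    {
        // With SequentialAccess, columns must be read in ordinal order.
        int id = reader.GetInt32(0);

        long offset = 0;
        long read;
        while ((read = reader.GetBytes(1, offset, buffer, 0, buffer.Length)) > 0)
        {
            // Process 'read' bytes of the BLOB without materializing the whole column in memory.
            offset += read;
        }
    }
}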
You can have multiple active result sets (multiple SqlDataReaders) on a single connection when MARS is enabled. However, MARS is incompatible with SQLCLR context connections.
So you can create a CLR streaming TVF to do some of what you need in CLR, but only if you have one single SQL query source. Multiple queries would require you to abandon the context connection and instead use a fully fledged connection, i.e. connect back to the same instance in a loopback; this would allow MARS and thus let you consume multiple result sets. But loopback has its own issues, as it breaks the transaction boundaries you get from the context connection. Specifically, with a loopback connection your TVF won't be able to read the changes made by the same transaction that called the TVF, because it is a different transaction on a different connection.
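For what it's worth, a sketch of such a loopback connection (the server, database, and second query are placeholders; the transaction caveat above still applies):

using System.Data.SqlClient;

// An ordinary (non-context) connection back to the same instance, with MARS enabled.
var builder = new SqlConnectionStringBuilder
{
    DataSource = ".",                    // placeholder: the instance the CLR code runs in
    InitialCatalog = "MyDatabase",       // placeholder
    IntegratedSecurity = true,
    MultipleActiveResultSets = true
};

using (var conn = new SqlConnection(builder.ConnectionString))
{
    conn.Open();
    using (var cmd1 = new SqlCommand("select a, b from some_table order by xyz", conn))
    using (var cmd2 = new SqlCommand("select a, b from some_other_table order by xyz", conn))  // hypothetical second query
    using (var r1 = cmd1.ExecuteReader())
    using (var r2 = cmd2.ExecuteReader())   // a second open reader on the same connection - allowed with MARS
    {
        // Step through r1 and r2 incrementally, as the algorithm requires.
    }
}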
SQL is designed to work against huge data sets, and is extremely powerful. With set based logic it's often unnecessary to iterate over the data to perform operations, and there are a number of built-in ways to do this within SQL itself.
1) write set based logic to update the data without cursors
2) use deterministic User Defined Functions with set based logic (you can do this with the SqlFunction attribute in CLR code). A non-deterministic function has the effect of turning the query into a cursor internally; non-deterministic means the output is not always the same given the same input.
[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static int algorithm(int value1, int value2)
{
    int value3 = ...;
    return value3;
}
3) use cursors as a last resort. This is a powerful way to execute logic per row on the database but has a performance impact. It appears from this article that CLR can outperform SQL cursors (thanks Martin).
I saw your comment that the complexity of using set based logic was too much. Can you provide an example? There are many SQL ways to solve complex problems - CTE, Views, partitioning etc.
Of course you may well be right in your approach, and I don't know what you are trying to do, but my gut says leverage the tools of SQL. Spawning multiple readers isn't the right way to approach the database implementation. It may well be that you need multiple threads calling into a SP to run concurrent processing, but don't do this inside the CLR.
To answer your question, with CLR implementations (and IDataReader) you don't really need to page results in chunks, because you are not loading the data into memory or transporting it over the network. IDataReader gives you access to the data stream row by row. By the sounds of it, your algorithm determines the number of records that need updating, so when that happens simply stop calling Read() and end at that point.
SqlMetaData[] columns = new SqlMetaData[3];
columns[0] = new SqlMetaData("Value1", SqlDbType.Int);
columns[1] = new SqlMetaData("Value2", SqlDbType.Int);
columns[2] = new SqlMetaData("Value3", SqlDbType.Int);

SqlDataRecord record = new SqlDataRecord(columns);
SqlContext.Pipe.SendResultsStart(record);

SqlDataReader reader = comm.ExecuteReader();
bool flag = true;
while (reader.Read() && flag)
{
    int value1 = Convert.ToInt32(reader[0]);
    int value2 = Convert.ToInt32(reader[1]);

    // some algorithm
    int newValue = ...;

    // populate the output record and stream it back
    record.SetInt32(0, value1);
    record.SetInt32(1, value2);
    record.SetInt32(2, newValue);
    SqlContext.Pipe.SendResultsRow(record);

    // keep going?
    flag = newValue < 100;
}
SqlContext.Pipe.SendResultsEnd();
Cursors are a SQL-only feature. If you wanted to read chunks of data at a time, some sort of paging would be required so that only a certain number of records would be returned. If using LINQ,
.Skip(Skip)
.Take(PageSize)
Skip and Take can be used to limit the results returned.
You can simply iterate over the DataReader by doing something like this:
using (IDataReader reader = Command.ExecuteReader())
{
    while (reader.Read())
    {
        //Do something with this record
    }
}
This would be iterating over the results one at a time, similar to a cursor in SQL Server.
For multiple recordsets at once, try MARS
(if SQL Server)
http://msdn.microsoft.com/en-us/library/ms131686.aspx