I have been fretting over this problem for too long. However, as I have not had any definitive answers, I will try to state the situation again in a clearer fashion.
I have a questionnaire which allows administrators to add different categories and different questions for each category. Because of this, I cannot hard-code the insert parameters for each category, as I do not know how many questions the user will be answering.
Right now I have a stored procedure which inserts (or updates) one row:
CREATE PROCEDURE [dbo].[insertResults]
    @userId nvarchar(10),
    @groupId INT,
    @questionId INT,
    @answer INT
AS
BEGIN
    -- Check whether this user has already answered this question
    SELECT * FROM answers
    WHERE userId = @userId AND questionId = @questionId

    IF @@ROWCOUNT = 0
        INSERT INTO answers
            ( userId,
              groupId,
              questionId,
              answer
            )
        VALUES
            ( @userId,
              @groupId,
              @questionId,
              @answer
            )
    ELSE
        UPDATE answers
        SET answer = @answer
        WHERE userId = @userId AND questionId = @questionId
END
Then in C# I loop through all the questions:
foreach (GridViewRow gvr in GridView1.Rows)
{
    var rb = gvr.FindControl("answers_list") as RadioButtonList;
    var quest = rb.SelectedValue;
    if (quest == "")
    {
        quest = "0"; // no answer selected; store 0
    }
    int questionId = Convert.ToInt32(GridView1.DataKeys[gvr.RowIndex].Values[0].ToString());
    int groupId = Convert.ToInt32(GridView1.DataKeys[gvr.RowIndex].Values[1].ToString());
    int question = Convert.ToInt32(quest);
    var objDB01 = new dbconn();
    const string strSQL = "insertResults";
    objDB01.objCommand.Parameters.AddWithValue("@userId", logonName);
    objDB01.objCommand.Parameters.AddWithValue("@groupId", groupId);
    objDB01.objCommand.Parameters.AddWithValue("@questionId", questionId);
    objDB01.objCommand.Parameters.AddWithValue("@answer", question);
    try
    {
        objDB01.GetNonQuery(strSQL);
    }
    finally
    {
        objDB01.Dispose();
    }
}
I have asked my own server team whether opening and closing the DB connection so many times is bad practice (sometimes there can be over 100 people answering over 100 questions simultaneously), but I cannot get an answer. I have asked here whether there is a more efficient solution, or whether the use of LINQ in this scenario can improve performance, but I cannot get an answer.
I may not have the knowledge, but I am trying to learn to programme elegantly and your help would be most appreciated!
I'm assuming the app is a web-based ASP.NET one.
In this scenario, ASP.NET will use connection pooling, so while you may be opening and closing many connections in your code, in reality .NET is only using a small set of connections and reuses those as you require them.
If the aim is to reduce the load on the SQL server, I would recommend something like a web service, which collects responses as they come in and then batches them to the SQL server, instead of posting each individual one directly from the website itself.
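A sketch of that idea, assuming a hypothetical Answer DTO and an AnswerRepository.SaveBatch method that writes all queued rows in one call (none of these names exist in the question's code):
using System.Collections.Generic;
using System.Threading;

public static class AnswerCollector
{
    private static readonly Queue<Answer> pending = new Queue<Answer>();
    private static readonly object sync = new object();

    // Flush whatever has accumulated to the database every 5 seconds
    private static readonly Timer flushTimer =
        new Timer(delegate { Flush(); }, null, 5000, 5000);

    public static void Enqueue(Answer answer)
    {
        lock (sync) { pending.Enqueue(answer); }
    }

    private static void Flush()
    {
        List<Answer> batch;
        lock (sync)
        {
            batch = new List<Answer>(pending);
            pending.Clear();
        }
        if (batch.Count > 0)
            AnswerRepository.SaveBatch(batch); // one round trip for the whole batch
    }
}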
The answer is it depends.
It will be faster if you insert/update multiple questions in one operation (either with LINQ or with a stored proc). You may, however, want to save after every question, so that if the user is disconnected they won't lose their work.
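On SQL Server 2008 you could push all of a user's answers across in one call with a table-valued parameter. A sketch, assuming you create the user-defined table type dbo.AnswerList and a batch version of the proc yourself (both names are made up), plus your own connection string:
// requires: using System.Data; using System.Data.SqlClient;
// One-time T-SQL setup (hypothetical names):
//   CREATE TYPE dbo.AnswerList AS TABLE (questionId INT, groupId INT, answer INT);
//   CREATE PROCEDURE dbo.insertResultsBatch
//       @userId nvarchar(10), @answers dbo.AnswerList READONLY AS ...
var table = new DataTable();
table.Columns.Add("questionId", typeof(int));
table.Columns.Add("groupId", typeof(int));
table.Columns.Add("answer", typeof(int));
// ... add one row per GridView row here, instead of one DB call per row ...

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.insertResultsBatch", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@userId", logonName);
    var p = cmd.Parameters.AddWithValue("@answers", table);
    p.SqlDbType = SqlDbType.Structured; // marks the parameter as a TVP
    p.TypeName = "dbo.AnswerList";
    conn.Open();
    cmd.ExecuteNonQuery();              // all rows in one round trip
}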
You really don't need to recreate the connection object inside the foreach loop; you can reuse the existing connection for all of the rows.
As for the impact of reconnecting, in practice there is none, since ADO.NET uses connection pooling, which means that when you disconnect from the DB, the ADO subsystem keeps the connection open for future reuse by another connection object.
As for LINQ, it is used for querying queryable objects, not for the kind of updating in your use case.
If you're talking about 100 people (and not a web site with millions of users), I think you're fine. Don't do premature optimization, unless you see your SQL Server instance suffering.
As @Jroc says, ADO.NET reuses the same pool of connections, so you don't have to worry about opening and closing the connection. The pool is designed for exactly this, and it's actually better to close your connection as soon as you're done with it than to risk leaving connections open somewhere.
As for LINQ, it doesn't give you any performance advantage in this scenario (actually, for inserts and updates LINQ to SQL tends to do exactly what you're doing and issue one insert/update per row)
One last thing about your stored procedure: if you're on SQL 2008, you might want to take a look at the MERGE command, which combines your "check if row exists; if yes, update; if not, insert" logic into one single command, and may improve performance (and readability).
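For example, the upsert in the stored procedure above could collapse to a single statement along these lines (a sketch against the same answers table):
CREATE PROCEDURE [dbo].[insertResults]
    @userId nvarchar(10),
    @groupId INT,
    @questionId INT,
    @answer INT
AS
BEGIN
    MERGE answers AS target
    USING (SELECT @userId AS userId, @questionId AS questionId) AS source
        ON target.userId = source.userId
       AND target.questionId = source.questionId
    WHEN MATCHED THEN
        UPDATE SET answer = @answer
    WHEN NOT MATCHED THEN
        INSERT (userId, groupId, questionId, answer)
        VALUES (@userId, @groupId, @questionId, @answer);
END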
In my program, I want to select some bookIDs into a temp table (#tempdb) for later queries, like this (using the Dapper extension):
using (var conn = new SqlConnection(connStr)) {
    conn.Execute("SELECT bookID INTO #tempdb FROM Books WHERE ...");
    int count = conn.ExecuteScalar<int>("SELECT COUNT(*) FROM #tempdb");
    var authors = conn.Query("SELECT * FROM #tempdb LEFT JOIN BookAuthors ON ...");
}
However, when I execute the page, I get the following exception:
Invalid object name '#tempdb'.
It seems that the lifetime of #tempdb only spans the first query?
It looks like you're using the implicit connection opening/closing. This will indeed cause problems with transient objects. If you need temp tables between queries, you will need to manually open the connection before you execute any such queries. This should then work fine, and many examples in the test suite make use of temp tables in this way.
However, from a practical standpoint, making use of temporary tables to transfer state between queries is ... awkward. In addition to being brittle, it isn't good for the plan cache, as #foo has a different meaning between uses on different connections (including reset but reused connections).
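For example, the code from the question works once the connection is opened explicitly, so all three statements share one underlying connection:
using (var conn = new SqlConnection(connStr)) {
    conn.Open(); // explicit open: the same real connection now spans all queries
    conn.Execute("SELECT bookID INTO #tempdb FROM Books WHERE ...");
    int count = conn.ExecuteScalar<int>("SELECT COUNT(*) FROM #tempdb");
    var authors = conn.Query("SELECT * FROM #tempdb LEFT JOIN BookAuthors ON ...");
}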
I found a previous poster who ran into the same problem, and his solution:
Using dapper, why is a temp table created in one use of a connection not available in a second use of the same connection
The post indicates that you have to "CREATE TABLE #tempdb" explicitly in your SQL first, and then everything goes fine. Even the poster himself doesn't know why that style of coding works.
I'm querying Caché for a list of tables in two schemas and looping through those tables to obtain a row count for each. However, this is incredibly slow. For instance, 13 million records took 8 hours to return results. When I query an Oracle database with 13 million records (on the same network), it takes 1.1 seconds to return results.
I'm using a BackgroundWorker to carry out the work apart from the UI (Windows Form).
Here's the code I'm using with the Caché ODBC driver:
using (OdbcConnection odbcCon = new OdbcConnection(strConnection))
{
    try
    {
        odbcCon.Open();
        OdbcCommand odbcCmd = new OdbcCommand();
        odbcCmd.Connection = odbcCon;
        foreach (var item in lstSchema)
        {
            odbcCmd.CommandText = "SELECT Count(*) FROM " + item;
            AppendTextBox(item + " Count = " + Convert.ToInt32(odbcCmd.ExecuteScalar()) + "\r\n");
            int intPercentComplete = (int)((float)(lstSchema.IndexOf(item) + 1) / (float)intTotalTables * 100);
            worker.ReportProgress(intPercentComplete);
            ModifyLabel(" (" + (lstSchema.IndexOf(item) + 1) + " out of " + intTotalTables + " processed)");
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
        return;
    }
}
Is the driver the issue?
Thanks.
I suppose the devil is in the details. Your code does
SELECT COUNT(*) FROM Table
If the table has no indices then I wouldn't be surprised that it is slower than you expect. If the table has indices, especially bitmap indices, I would expect this to be on par with Oracle.
The other thing to consider is to understand how Caché is configured, i.e. what the global buffers are and what the performance of the disk looks like.
InterSystems Caché is slower for querying than any SQL database I have used, especially when you deal with large databases. Now add ODBC overhead to the picture and you will get even worse performance.
Some level of performance can be achieved through the use of bitmap indexes, but often the only way to get good performance is to maintain extra, redundant data.
You might even find that you can allocate more memory for the database (but that never seemed to do much for me).
For example, every time you add new data, force the database to increment a counter somewhere (or even maintain multiple counters for the purpose of grouping). Then you can keep count queries at a reasonable level of performance.
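To illustrate the counter idea in SQL Server syntax (Caché's trigger syntax differs; the table and trigger names here are invented):
CREATE TABLE RowCounts (TableName sysname PRIMARY KEY, Cnt bigint NOT NULL);
GO
CREATE TRIGGER trg_MyTable_Count ON MyTable AFTER INSERT, DELETE
AS
    UPDATE RowCounts
    SET Cnt = Cnt + (SELECT COUNT(*) FROM inserted)
                  - (SELECT COUNT(*) FROM deleted)
    WHERE TableName = 'MyTable';
GO
-- Reading the count is now a single-row lookup instead of a full scan:
-- SELECT Cnt FROM RowCounts WHERE TableName = 'MyTable'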
I wrote a little InterSystems Caché performance test post on my blog...
http://tesmond.blogspot.co.uk/2013/09/intersystems-cache-performance-woe-is-me.html
Caché has a built-in (smart) engine that determines how best to execute queries. Of course, having indexes, especially bitmapped ones, will drastically help query times. Still, a mere 13 million rows should take seconds at most. How much data is in each row? We have 260 million rows in many tables and 790 million rows in others, and we can mow through the whole thing in a couple of minutes. A non-indexed, complex query may take a day, though that is understandable.
Take a look at what's locking your globals. We have also discovered that queries apparently keep running even if the client is disconnected. You can kill the task with the Management Portal, but the system doesn't seem to like running more than one large ODBC query at once, because such a query takes gigs of temp data. We use DbVisualizer for a JDBC connection.
Someone mentioned TuneTable; that's great to run if your table changes a lot, or at least a couple of times in the table's life. This is NOT something that you want to overuse. http://docs.intersystems.com/ens20151/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_optimizing is where you can find some documentation and other useful information about this and about improving performance. If it's not fast, then someone broke it.
Someone also mentioned that SELECT COUNT(*) will count an index instead of the table itself where computed properties allow it. This is related to the decision engine that compiles your SQL queries and decides on the most efficient method to get your data. There is a tool in the portal that will show you how long a query takes and the alternative methods that the optimizer (I forget what it's called exactly) considered. You can see the Query Plan at the same page where you can execute SQL in the browser, mentioned below: /csp/sys/exp/UtilSqlQueryShowPlan.csp
RE: I can't run this query from within the Management Portal because the tables are only made available from within an application and/or ODBC.
That isn't actually true. Within the Management Portal, go to System Explorer, SQL, then Execute SQL Statements. Please note that you must have adequate privileges to see this; %ALL will allow access to everything. Also, you can run SQL queries natively in TERMINAL by executing do $system.SQL.Shell() and then typing your queries. This interface should be faster than ODBC, as I think it uses object access. Also, keep in mind that embedded SQL and object access are the fastest ways to get at the data.
Please let me know if you have any more questions!
Given the following code (which is mostly irrelevant except for the last two lines), what would your method be to get the value of the identity field for the new record that was just created? Would you make a second call to the database to retrieve it based on the primary key of the object (which could be problematic if there isn't one), or based on the last inserted record (which could be problematic in multithreaded apps)? Or is there maybe a more clever way to get the new value back at the same time you make the insert?
It seems like there should be a way to get an identity back based on the insert operation that was just made, rather than having to query for it by other means.
public void Insert(O obj)
{
    var sqlCmd = new SqlCommand() { Connection = con.Conn };
    var sqlParams = new SqlParameters(sqlCmd.Parameters, obj);
    var props = obj.Properties.Where(o => !o.IsIdentity);
    InsertQuery qry = new InsertQuery(this.TableAlias);
    qry.FieldValuePairs = props.Select(o => new SqlValuePair(o.Alias, sqlParams.Add(o))).ToList();
    sqlCmd.CommandText = qry.ToString();
    sqlCmd.ExecuteNonQuery();
}
EDIT: While this question isn't a duplicate in the strictest manner, it's almost identical to this one which has some really good answers: Best way to get identity of inserted row?
It strongly depends on your database server. For example, for Microsoft SQL Server you can read the value of the @@IDENTITY variable, which contains the last identity value assigned.
To prevent race conditions you must keep the insert query and the variable read inside a transaction.
Another solution could be to create a stored procedure for every type of insert you have to do, and have it accept the insert arguments and return the identity value.
Otherwise, inside a transaction you can implement whatever ID assignment logic you want and be preserved from concurrency problems.
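On SQL Server, SCOPE_IDENTITY() is generally safer than @@IDENTITY, since it is not affected by triggers that insert into other identity tables. A minimal sketch of how the Insert method from the question could return the new value in the same round trip (this assumes qry.ToString() produces a single INSERT statement):
public int Insert(O obj)
{
    var sqlCmd = new SqlCommand() { Connection = con.Conn };
    var sqlParams = new SqlParameters(sqlCmd.Parameters, obj);
    var props = obj.Properties.Where(o => !o.IsIdentity);
    InsertQuery qry = new InsertQuery(this.TableAlias);
    qry.FieldValuePairs = props.Select(o => new SqlValuePair(o.Alias, sqlParams.Add(o))).ToList();
    // Append a SELECT so the insert and the identity read happen in one batch;
    // ExecuteScalar returns the first column of the first row of the first
    // result set, and the INSERT itself produces no result set.
    sqlCmd.CommandText = qry.ToString() + "; SELECT SCOPE_IDENTITY();";
    return Convert.ToInt32(sqlCmd.ExecuteScalar());
}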
AFAIK there is no ready-made way.
I solved this by using client-generated IDs (GUIDs), so that my method generates the ID and returns it to the caller.
Perhaps you could analyse some SQL Server system tables in order to see what changed last, but you would get concurrency issues (what if someone else inserts a very similar record?).
So I would recommend a strategy change: generate the IDs on the client.
You can take a look at this link.
I may add that to avoid the problem of multiple matching rows, you can use transactions: put the insert and the select in the same transaction.
Good luck.
The proper approach is to learn SQL.
You can send a SQL command followed by a SELECT in one batch, so you can do the insert and return the assigned identity in a single round trip.
See
I know how to execute queries from C# but I want to provide a dropdown list in which people can write a query and it will execute and populate the list.
A problem is that I want to forbid all queries that modify the database in any way. I have not managed to find a way to do this, and I did my best with Google.
The only solution I can think of is to scan the query for INSERT, DELETE, and UPDATE and allow only SELECT statements. However, I want to allow users to call stored procedures as well, which means I need to get the body of each stored procedure and scan it before I execute it. How do I download a stored procedure's body, then?
If anyone knows a way to execute only read-only queries, do share! I have the feeling that scanning the text for INSERT, DELETE, and UPDATE doesn't prevent SQL injection.
The easiest way to do this might be to offload this job to the database. Just make sure that the database user that will be running the queries has read-access only. Then, any queries that do anything other than SELECT will fail, and you can report that failure back to the users.
If you don't go this route, the complexity becomes quite enormous, since you basically have to be prepared to parse an arbitrary SQL statement, not to mention arbitrary sequences of SQL statements if you allow stored procs to be run.
Even then, take care to ensure that you aren't leaking sensitive data through your queries. Directly input queries from site users can be dangerous if you're not careful. Even if you are, allowing these queries on anything but a specifically constructed sandbox database is one "whoops, I accidentally changed the user's permissions" away from becoming a security nightmare.
Another option is to write a "query creator" page, where users can pick the table and columns they'd like to see. You can then (a) only show tables and columns that are appropriate for a given user (possibly based on user roles, etc.) and (b) generate the SQL yourself, preferably using a parameterized query; a sketch of this idea follows this answer.
Update: As Yahia points out, if the user has execute privilege (so that they can run stored procs), then the permissions of the procedure itself are honoured. Given that, it might be better not to allow arbitrary stored proc execution, but rather to offer the users a list of procedures that are known to be safe. That will probably be difficult to maintain and error-prone, though, so disallowing stored procs altogether might be best.
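A sketch of that "query creator" idea, with a hardcoded whitelist standing in for whatever per-role configuration you would really use (table and column names are invented):
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public static class QueryCreator
{
    // Whitelist of tables and the columns each may expose (illustrative data;
    // in practice this would come from per-role configuration)
    private static readonly Dictionary<string, string[]> Allowed =
        new Dictionary<string, string[]>
        {
            { "Books",   new[] { "Title", "Author", "Year" } },
            { "Authors", new[] { "Name", "Country" } }
        };

    public static SqlCommand Build(string table, IEnumerable<string> columns, SqlConnection conn)
    {
        string[] allowedCols;
        if (!Allowed.TryGetValue(table, out allowedCols))
            throw new ArgumentException("Table not permitted: " + table);

        var safeCols = new List<string>();
        foreach (var col in columns)
        {
            if (Array.IndexOf(allowedCols, col) < 0)
                throw new ArgumentException("Column not permitted: " + col);
            safeCols.Add("[" + col + "]");
        }

        // Every identifier came from the whitelist, so nothing user-typed
        // ever reaches the SQL text
        var sql = "SELECT " + string.Join(", ", safeCols.ToArray()) + " FROM [" + table + "]";
        return new SqlCommand(sql, conn);
    }
}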
How about creating a user account on the database server which only has select (read-only) rights?
Perhaps you could set up a SQL user with read-only access to the database and issue the command using that user? Then you can catch the errors when/if they happen.
It seems to me that it's going to be very difficult and error-prone to try to parse the query to figure out if it modifies the database.
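Assuming SQL Server, a minimal setup for such a read-only account might look like this (names and password are placeholders):
-- Server-level login and a matching database user for the query feature
CREATE LOGIN query_reader WITH PASSWORD = 'choose-a-strong-password';
CREATE USER query_reader FOR LOGIN query_reader;

-- Read access to the whole database and nothing else
EXEC sp_addrolemember 'db_datareader', 'query_reader';

-- Or restrict further to specific tables instead:
-- GRANT SELECT ON dbo.Books TO query_reader;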
You can't parse SQL like that reliably.
Use permissions instead:
- allow only SELECT on tables and views
- grant no permissions on stored procedures that change data (an end user by default won't be able to see a stored procedure's definition)
Best is to not allow users to enter SQL at all and to use only prepared/parameterized queries...
The next best way to prevent trouble is to use a restricted user with pure read access.
The above two can be combined...
BEWARE
To execute a stored procedure, the user must have execute privilege... IF the stored procedure modifies data, then this would happen without any error message, even with a restricted user, since the permission to modify is granted to the stored procedure itself!
IF you absolutely must allow users to enter SQL and can't restrict the login, then you would need to use a SQL parser - for example this...
As to how to download the body of a stored procedure - this depends on the DB you use (SQL Server, Oracle, etc.).
EDIT:
Another option is a so-called "database firewall" - you connect to the firewall instead of directly to the DB... In the firewall you configure several things, like time-based restrictions (when specific users/statements are or aren't allowed), statement-based restrictions (which statements are allowed...), quantity-based restrictions (you can get 100 records, but not download the whole table/DB...), etc.
There are commercial and opensource DB Firewalls out there - though these are by nature very dependent on the DB you use etc.
Examples:
Oracle Firewall - works with Oracle / SQL Server / DB2 etc.
SecureSphere - several including Oracle / SQL Server / DB2 etc.
GreenSQL - the opensource version supports Postgres + MySQL; the commercial one, MS SQL Server
Don't forget about things that are even worse than INSERT, UPDATE, and DELETE, like TRUNCATE... that's some bad stuff.
I think a SQL trigger is the best way to do what you want.
Your first move should be to create a DB user for this specific task with only the needed permissions (basically SELECT only), and with rights to see only the tables you need them to see (so they cannot SELECT from system tables or your users table).
More generally, it seems like a bad idea to let users execute code directly on your database. Even if you protect it against data modification, they will still be able to write ugly-looking joins that make your DB run slow, for instance.
Whichever language you're programming the UI with, you could try to look online for a custom control that allows filtering on a database. Google it...
This is not perfect, but it might be what you want. It allows a keyword to appear when it's part of a bigger alphanumeric string:
using System.Text.RegularExpressions;

public static bool ValidateQuery(string query)
{
    // Reject the query if it contains any data-modifying keyword
    return !ValidateRegex("delete", query) && !ValidateRegex("exec", query) &&
           !ValidateRegex("insert", query) && !ValidateRegex("alter", query) &&
           !ValidateRegex("create", query) && !ValidateRegex("drop", query) &&
           !ValidateRegex("truncate", query);
}

public static bool ValidateRegex(string term, string query)
{
    // this regex finds all keywords {0} that are not led or trailed by an alphanumeric
    return new Regex(string.Format("([^0-9a-z]{0}[^0-9a-z])|(^{0}[^0-9a-z])", term), RegexOptions.IgnoreCase).IsMatch(query);
}
You can see how it works here: regexstorm
See these regex cheat sheets: cheatsheet1, cheatsheet2
Notice this is not perfect, since it might block a query that merely contains one of the keywords inside a quoted string; but if you write the queries yourself and this is just a precaution, it might do the trick.
You can also take a different approach: try the query, and if it affects the database, do a rollback:
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static bool IsDbAffected(string query, string conn, List<SqlParameter> parameters = null)
{
    var response = false;
    using (var sqlConnection = new SqlConnection(conn))
    {
        sqlConnection.Open();
        using (var transaction = sqlConnection.BeginTransaction("Test Transaction"))
        using (var command = new SqlCommand(query, sqlConnection, transaction))
        {
            command.CommandType = CommandType.Text;
            if (parameters != null)
                command.Parameters.AddRange(parameters.ToArray());
            // ExecuteNonQuery() does not return data at all: only the number of
            // rows affected by an insert, update, or delete.
            if (command.ExecuteNonQuery() > 0)
            {
                // The query modified rows, so undo the damage and report it
                transaction.Rollback("Test Transaction");
                response = true;
            }
            // Disposing an uncommitted transaction rolls it back as well
        }
    }
    return response;
}
You can also combine the two.
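Combining them might look something like this (a sketch that reuses the two methods above):
public static bool TryRunReadOnly(string query, string conn)
{
    // Cheap keyword screen first, then the transactional test
    if (!ValidateQuery(query) || IsDbAffected(query, conn))
        return false;

    // At this point the query looks read-only, so execute it for real
    using (var sqlConnection = new SqlConnection(conn))
    using (var command = new SqlCommand(query, sqlConnection))
    {
        sqlConnection.Open();
        using (var reader = command.ExecuteReader())
        {
            // ... bind the reader to the UI here ...
        }
    }
    return true;
}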
I am building an application and I want to batch multiple queries into a single round trip to the database. For example, let's say a single page needs to display a list of users, a list of groups, and a list of permissions.
So I have stored procs (or just simple SQL commands like "select * from Users"), and I want to execute three of them. However, to populate this one page I have to make 3 round trips.
Now I could write a single stored proc ("getUsersTeamsAndPermissions") or execute a single SQL command "select * from Users;exec getTeams;select * from Permissions".
But I was wondering if there was a better way to specify that 3 operations should be done in a single round trip. Benefits include being easier to unit test and allowing the database engine to parallelize the queries.
I'm using C# 3.5 and SQL Server 2008.
Something like this. Here's a cleaned-up version that disposes its objects properly:
using (var connection = new SqlConnection(ConnectionString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    command.CommandText = "select id from test1; select id from test2";
    using (var reader = command.ExecuteReader())
    {
        do
        {
            while (reader.Read())
            {
                Console.WriteLine(reader.GetInt32(0));
            }
            Console.WriteLine("--next command--");
        } while (reader.NextResult());
    }
}
The single multi-part command and the stored procedure that you mention are the two options. You can't execute them in such a way that they are "parallelized" on the db. However, both of those options do result in a single round trip, so you're good there. There's no way to send them more efficiently. In SQL Server 2005 onwards, a multi-part command that is fully parameterized is very efficient.
Edit: adding information on why cram into a single call.
Although you don't want to care too much about reducing calls, there can be legitimate reasons for this.
I once was limited to a crummy ODBC driver against a mainframe, and there was a 1.2 second overhead on each call! I'm serious. There were times when I crammed a little extra into my db calls. Not pretty.
You also might find yourself in a situation where you have to configure your SQL queries somewhere, and you can't just make 3 calls: it has to be one. It shouldn't be that way (bad design), but it is. You do what you gotta do!
Sometimes of course it can be very good to encapsulate multiple steps in a stored procedure. Usually not for saving round trips though, but for tighter transactions, getting ID for new records, constraining for permissions, providing encapsulation, blah blah blah.
Making one round trip vs. three will indeed be more efficient. The question is whether it is worth the trouble. The entire ADO.NET and C# 3.5 toolset and framework opposes what you're trying to do. TableAdapters, LINQ to SQL, EF: all of these like to deal with simple one-call == one-resultset semantics. So you may lose some serious productivity by trying to beat the Framework into submission.
I would say that unless you have some serious measurements showing that you need to reduce the number of roundtrips, abstain. If you do end up requiring this, then use a stored procedure to at least give an API kind of semantics.
But if your query really is what you posted (i.e. select all users, all teams, and all permissions), then you obviously have much bigger fish to fry before reducing the round trips... reduce the result sets first.
I think this link might be helpful.
Consider at least reusing the same opened connection; according to what it says there, opening a connection is at the very top of the performance costs in Entity Framework.
Firstly, 3 round trips isn't really a big deal. If you were talking about 300 round trips then that would be another matter, but for just 3 round trips I would consider this to definitely be a case of premature optimisation.
That said, the way I'd do this would probably be to execute the 3 stored procedures using SQL:
exec dbo.p_myproc_1 @param_1 = @in_param_1, @param_2 = @in_param_2
exec dbo.p_myproc_2
exec dbo.p_myproc_3
You can then iterate through the returned result sets as you would if you had directly executed multiple rowsets.
Build a temp table? Insert all the results into the temp table and then select * from #temptable,
as in:
CREATE TABLE #temptable (field INT, field2 INT)
INSERT INTO #temptable (field) SELECT mytable.field FROM mytable
INSERT INTO #temptable (field2) SELECT mytable2.field2 FROM mytable2
etc... Only one trip to the database, though I'm not sure it is actually more efficient.