Is SQL code faster than C# code? [closed] - c#

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
A few months ago I started working at this programming company. One of the practices they use is to do as much work as possible in SQL rather than in C#.
So, let's say I have this simple example of writing out a list of files:
Is something like this:
string SQL = @"
SELECT f.FileID,
       f.FileName,
       f.FileExtension,
       '/files/' + CAST(u.UserGuid AS VARCHAR(MAX)) + '/' + (f.FileName + f.FileExtension) AS FileSrc,
       FileSize =
           CASE
               WHEN f.FileSizeB < 1048576 THEN CAST(CAST((f.FileSizeB / 1024) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' KB'
               ELSE CAST(CAST((f.FileSizeB / 1048576) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' MB'
           END
FROM Files f
INNER JOIN Users u
    ON f.UserID = u.UserID
";
// some loop for writing results {
// write...
// }
Faster or better than something like this:
string SQL = @"
SELECT u.UserGuid,
       f.FileID,
       f.FileName,
       f.FileExtension,
       f.FileSizeB
FROM Files f
INNER JOIN Users u
    ON f.UserID = u.UserID";
// some loop for writing results {
string FileSrc = "/Files/" + result["UserGuid"] + "/" + result["FileName"] + result["FileExtension"];
string FileSize = ConvertToKbOrMb(result["FileSizeB"]);
// write...
// }
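ConvertToKbOrMb isn't defined in this post; a minimal sketch of such a helper, mirroring the CASE expression in the first query, might look like this:
static string ConvertToKbOrMb(long fileSizeB)
{
    // Mirrors the SQL CASE: below 1 MB report KB, otherwise MB.
    return fileSizeB < 1048576
        ? (fileSizeB / 1024.0).ToString("0.00") + " KB"
        : (fileSizeB / 1048576.0).ToString("0.00") + " MB";
}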
This particular code doesn't matter (it's just some basic example) ... the question is about this kind of thing in general ... is it better to put more load on SQL or 'normal' code?

It's just bad programming practice. You should separate and isolate the different parts of your program for ease of future maintenance (think of the next programmer!).
Performance
Many solutions suffer from poor DB performance, so most developers limit SQL database access to the smallest transaction possible. Ideally, the transformation of raw data into human-readable form should happen at the last possible point. The memory footprint of unformatted data is also much smaller; while memory is cheap, you shouldn't waste it, and every extra byte that has to be buffered, cached, and transmitted takes time and reduces available server resources.
For example, in a web application, formatting should be done by local JavaScript templates from a JSON data packet. This reduces the workload of the backend SQL database and application servers, and reduces the data that needs to be transmitted over the network, all of which speeds up performance.
Formatting and Localisation
Many solutions have different output needs for the same transaction, e.g. different views, different localisations, etc. By embedding formatting into the SQL transaction you will have to create a new transaction for each localisation, which will become a maintenance nightmare.
Formatted transactions also cannot be used for an API interface; you would need yet another set of transactions for the API, with no formatting.
With C# you should be using a well-tested template or string-handling library, or at least string.Format(); avoid building strings with repeated '+' concatenation in loops, as it is slow.
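For example, a minimal sketch using the fields from the second query (names assumed from the question):
string fileSrc = string.Format("/files/{0}/{1}{2}",
    result["UserGuid"], result["FileName"], result["FileExtension"]);
For repeated appends inside a loop, a StringBuilder avoids reallocating intermediate strings.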
Share the load
Most solutions have multiple clients for one DB, so the client-side formatting load is shared across the clients' CPUs rather than concentrated on the single SQL database CPU.
I seriously doubt SQL is faster than C#; you should perform a simple benchmark and post the results here :-)

The reason the second approach may be a little slower is that you need to pull the data out of SQL Server and hand it to the C# part of the code, and this takes more time.
Every extra read, like ConvertToKbOrMb(result["FileSizeB"]), can take some more time, and also depends on your DAL layer. I have seen some DALs that are really slow.
If you leave the work on the SQL Server, you save this extra processing of getting the data out; that's all.
From experience, one of my optimizations is always to pull out only the data I need. The more data you read from the SQL server and move elsewhere (ASP.NET, console, a C# program, etc.), the more time you spend moving it around, especially if it is big strings, or if you make a lot of conversions from strings to numbers.
To answer the direct question of what is faster: I say you cannot compare them. Both are as fast as possible if you write good code and good queries. SQL Server also keeps a lot of statistics and optimizes the queries it runs; C# has nothing comparable, so what is there to compare?
One test of my own
OK, I have a lot of data here from a project, and I made a quick test; it does not actually prove that one is faster than the other.
I ran two cases:
SELECT TOP 100 PERCENT cI1, cI2, cI3
FROM [dbo].[ARL_Mesur] WITH (NOLOCK) WHERE [dbo].[ARL_Mesur].[cWhen] > @cWhen0;
foreach (var Ena in cAllOfThem)
{
    // this is the line that I move inside SQL Server to see how it changes the speed
    var results = Ena.CI1 + Ena.CI2 + Ena.CI3;
    sbRender.Append(results);
    sbRender.Append(Ena.CI2);
    sbRender.Append(Ena.CI3);
}
vs
SELECT TOP 100 PERCENT (cI1 + cI2 + cI3) AS cI1, cI2, cI3
FROM [dbo].[ARL_Mesur] WITH (NOLOCK) WHERE [dbo].[ARL_Mesur].[cWhen] > @cWhen0;
foreach (var Ena in cAllOfThem)
{
    sbRender.Append(Ena.CI1);
    sbRender.Append(Ena.CI2);
    sbRender.Append(Ena.CI3);
}
and the results show that the speeds are nearly the same.
- All the parameters are doubles.
- The reads are optimized; I make no extra reads at all, just move the processing from one part to the other.
On 165,766 lines, here are some results:
Start 0ms +0ms
c# processing 2005ms +2005ms
sql processing 4011ms +2006ms
Start 0ms +0ms
c# processing 2247ms +2247ms
sql processing 4514ms +2267ms
Start 0ms +0ms
c# processing 2018ms +2018ms
sql processing 3946ms +1928ms
Start 0ms +0ms
c# processing 2043ms +2043ms
sql processing 4133ms +2090ms
So, the speed can be affected by many factors... we do not know what issue at your company makes the C# slower than the SQL processing.

As a general rule of thumb: SQL is for manipulating data, not formatting how it is displayed.
Do as much as you can in SQL, yes, but only as long as it serves that goal. I'd take a long hard look at your "SQL example", solely on that ground. Your "C# example" looks like a cleaner separation of responsibilities to me.
That being said, please don't take it too far and stop doing things in SQL that should be done in SQL, such as filtering and joining. For example reimplementing INNER JOIN Users u ON f.UserID = u.UserID in C# would be a catastrophe, performance-wise.
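To illustrate with a hypothetical sketch (LoadAll is a made-up helper that materializes a query into a list), the anti-pattern would be pulling both tables across the wire and joining in memory:
// Anti-pattern: client-side join; transfers every row of both tables.
var files = LoadAll("SELECT * FROM Files");
var users = LoadAll("SELECT * FROM Users");
var joined = from f in files
             join u in users on f.UserID equals u.UserID
             select new { f.FileName, u.UserGuid };
Letting the database perform the INNER JOIN sends only the matched rows.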
As for performance in this particular case:
I'd expect "C# example" (not all C#, just this example) to be slightly faster, simply because...
f.FileSizeB
...looks narrower than...
'/files/' + CAST(u.UserGuid AS VARCHAR(MAX)) + '/' + (f.FileName + f.FileExtension) AS FileSrc,
FileSize=
CASE
WHEN f.FileSizeB < 1048576 THEN CAST(CAST((f.FileSizeB / 1024) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' KB'
ELSE CAST(CAST((f.FileSizeB / 1048576) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' MB'
END
...which should conserve some network bandwidth. And network bandwidth tends to be a scarcer resource than CPU (especially client-side CPU).
Of course, your mileage may vary, but either way the performance difference is likely to be small enough that other concerns, such as the overall maintainability of the code, become relatively more important. Frankly, your "C# example" looks better to me in that regard.

There are good reasons to do as much as you can on the database server. Minimizing the amount of data that has to be passed back and forth and giving the server as much leeway in optimizing the process is a good thing.
However, that is not really illustrated in your example. Both approaches pass much the same data back and forth (the first perhaps passes more), and the only difference is who does the calculation; it may well be that the client does this better.

Your question is about whether the string manipulation operations should be done in C# or SQL. I would argue that this example is so small that any performance gain, one way or the other, is irrelevant. The question is "where should this be done"?
If the code is "one-off" code for part of an application, then doing it at the application level makes a lot of sense. If this code is repeated throughout the application, then you want to encapsulate it. I would argue that the best way to encapsulate it is a SQL Server computed column, view, table-valued function, or scalar function (with the computed column being preferable in this case). This ensures that the same processing occurs no matter where it is called from.
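For example, a sketch of the computed-column option, reusing the CASE expression from the question (hypothetical DDL, shown as a C# string in the style of the question):
string DDL = @"
ALTER TABLE Files
ADD FileSize AS
    CASE
        WHEN FileSizeB < 1048576 THEN CAST(CAST((FileSizeB / 1024) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' KB'
        ELSE CAST(CAST((FileSizeB / 1048576) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' MB'
    END;";
Every query that selects FileSize then gets identical formatting, no matter which part of the application asks for it.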
There is a key difference between database code and C# code in terms of performance. The database code automatically runs in parallel. So, if your database server is multi-threaded, then separate threads might be doing those string manipulations at the same time (no promises, the key word here is "might").
In general when thinking about the split, you want to minimize the amount of data being passed back and forth. The difference in this case seems to be minimal.
So, if this is one place in an application that has this logic, then do it in the application. If the application is filled with references to this table that want this logic, then think about a computed column. If the application has lots of similar requests on different tables, then think about a scalar valued function, although this might affect the ability of queries to take advantage of parallelism.

It really depends on what you're doing.
Don't forget about SQL CLR. There are many operations that T-SQL code is just slower at.

Usually in production environments the database infrastructure tier is given twice, sometimes three times, as much resource as the application tier.
Also, SQL code that runs natively on the database has a clear advantage over SQL that is run from the application and passed through a database driver.

Related

Is it safe to not parameterize an SQL query when the parameter is not a string?

In terms of SQL injection, I completely understand the necessity to parameterize a string parameter; that's one of the oldest tricks in the book. But when can it be justified to not parameterize an SqlCommand? Are any data types considered "safe" to not parameterize?
For example: I don't consider myself anywhere near an expert in SQL, but I can't think of any cases where it would be potentially vulnerable to SQL injection to accept a bool or an int and just concatenate it right into the query.
Is my assumption correct, or could that potentially leave a huge security vulnerability in my program?
For clarification, this question is tagged c# which is a strongly-typed language; when I say "parameter," think something like public int Query(int id).
I think it's safe... technically, but it's a terrible habit to get into. Do you really want to be writing queries like this?
var sqlCommand = new SqlCommand("SELECT * FROM People WHERE IsAlive = " + isAlive +
                                " AND FirstName = @firstName");
sqlCommand.Parameters.AddWithValue("firstName", "Rob");
It also leaves you vulnerable in the situation where a type changes from an integer to a string (think of an employee number which, despite its name, may contain letters).
So, we've changed the type of EmployeeNumber from int to string, but forgot to update our SQL queries. Oops.
When using a strongly-typed platform on a computer you control (like a web server), you can prevent code injection for queries with only bool, DateTime, or int (and other numeric) values. The real concerns are the performance issues caused by forcing SQL Server to re-compile every query, and by preventing it from gathering good statistics on which queries run with what frequency (which hurts cache management).
But that "on a computer you control" part is important, because otherwise a user can change the behavior used by the system for generating strings from those values to include arbitrary text.
I also like to think long-term. What happens when today's old-and-busted strongly-typed code base gets ported via automatic translation to the new-hotness dynamic language, and you suddenly lose the type checking, but don't have all the right unit tests yet for the dynamic code?
Really, there's no good reason not to use query parameters for these values. It's the right way to go about this. Go ahead and hard-code values into the sql string when they really are constants, but otherwise, why not just use a parameter? It's not like it's hard.
Ultimately, I wouldn't call this a bug, per se, but I would call it a smell: something that falls just short of a bug by itself, but is a strong indication that bugs are nearby, or will be eventually. Good code avoids leaving smells, and any good static analysis tool will flag this.
I'll add that this is not, unfortunately, the kind of argument you can win straight up. It sounds like a situation where being "right" is no longer enough, and stepping on your co-workers' toes to fix this issue on your own isn't likely to promote good team dynamics; it could ultimately hurt more than it helps. A better approach in this case may be to promote the use of a static analysis tool. That would give legitimacy and credibility to efforts aimed at going back and fixing existing code.
In some cases, it IS possible to perform SQL injection attack with non-parametrized (concatenated) variables other than string values - see this article by Jon: http://codeblog.jonskeet.uk/2014/08/08/the-bobbytables-culture/ .
The thing is that when ToString is called, some custom culture provider can transform a non-string parameter into a string representation that injects some SQL into the query.
This is not safe even for non-string types. Always use parameters. Period.
Consider the following code example:
var utcNow = DateTime.UtcNow;
var sqlCommand = new SqlCommand("SELECT * FROM People WHERE created_on <= '" + utcNow + "'");
At first glance the code looks safe, but everything changes if you make some changes in Windows Regional Settings and add an injection to the short date format:
Now the resulting command text looks like this:
SELECT * FROM People WHERE created_on <= '26.09.2015' OR '1'<>' 21:21:43'
The same can be done for the int type, since the user can define a custom negative sign which can easily be turned into SQL injection.
One could argue that invariant culture should be used instead of current culture, but I have seen string concatenations like this so many times and it is quite easy to miss when concatenating strings with objects using +.
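A hedged sketch of the parameterized alternative, which sidesteps culture-sensitive ToString() entirely:
var utcNow = DateTime.UtcNow;
var sqlCommand = new SqlCommand("SELECT * FROM People WHERE created_on <= @created_on");
// The value travels as a typed parameter and is never formatted as text.
sqlCommand.Parameters.Add("@created_on", SqlDbType.DateTime).Value = utcNow;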
"SELECT * FROM Table1 WHERE Id=" + intVariable.ToString()
Security
It is OK.
Attackers cannot inject anything via your typed int variable.
Performance
Not OK.
It's better to use parameters, so the query is compiled once and its plan cached for subsequent use. The next time, even with different parameter values, the cached plan is reused and the database server doesn't need to recompile.
Coding Style
Poor practice.
Parameters are more readable.
It may also get you used to writing queries without parameters; then one day you slip and use a string value this way, and you should probably say goodbye to your data. Bad habit!
"SELECT * FROM Product WHERE Id=" + TextBox1.Text
Although it is not your question, it may be useful for future readers:
Security
Disaster!
Even when the Id field is an integer, your query may be subject to SQL injection. Suppose you have a query in your application built as "SELECT * FROM Table1 WHERE Id=" + TextBox1.Text. An attacker can type 1; DELETE Table1 into the text box, and the query becomes:
SELECT * FROM Table1 WHERE Id=1; DELETE Table1
If you don't want to use a parametrized query here, you should use typed values:
string.Format("SELECT * FROM Table1 WHERE Id={0}", int.Parse(TextBox1.Text))
Your Question
My question arose because a coworker wrote a bunch of queries concatenating integer values, and I was wondering whether it was a waste of my time to go through and fix all of them.
I think changing that code is not a waste of time; indeed, the change is recommended! If your coworker uses int variables there is no security risk, but the change makes the code more readable and more maintainable, and makes execution faster.
There are actually two questions in one, and the question in the title has very little to do with the concerns expressed by the OP in the comments afterwards.
Although I realize that for the OP it is their particular case that matters, for readers coming from Google it is important to answer the more general question, which can be phrased as "is concatenation as safe as prepared statements if I make sure that every literal I am concatenating is safe?". So, I would like to concentrate on this latter one. And the answer is
Definitely NO.
The explanation is not as direct as most readers would like, but I'll try my best.
I have been pondering the matter for a while, resulting in an article (though based on the PHP environment) where I tried to sum everything up. It occurred to me that the question of protection from SQL injection often drifts toward related but narrower topics, like string escaping, type casting and such. Although some of those measures can be considered safe taken by themselves, there is no system, nor a simple rule to follow, which makes it very slippery ground, placing too much weight on the developer's attention and experience.
The question of SQL injection cannot be reduced to a matter of some particular syntax issue. It is wider than the average developer tends to think. It is a methodological question as well: not only "which particular formatting do we have to apply", but "how does it have to be done" as well.
(From this point of view, the article from Jon Skeet cited in the other answer does more harm than good, as it again nitpicks an edge case, concentrating on a particular syntax issue and failing to address the problem as a whole.)
When you try to address the question of protection not as a whole but as a set of separate syntax issues, you face a multitude of problems:
the list of possible formatting choices is really huge, which means one can easily overlook some, or confuse them (by using string escaping for an identifier, for example).
Concatenation means that all protection measures have to be applied by the programmer, not the program. This issue alone leads to several consequences:
such formatting is manual, and manual means extremely error-prone: one can simply forget to apply it.
moreover, there is a temptation to move formatting procedures into some centralized function, messing things up even more and spoiling data that is not going to the database.
when more than one developer is involved, the problems multiply tenfold.
when concatenation is used, one cannot tell a potentially dangerous query at a glance: they are all potentially dangerous!
Unlike that mess, prepared statements are indeed The Holy Grail:
it can be expressed in the form of one simple rule that is easy to follow.
it is an essentially undetachable measure, which means the developer cannot interfere and, willingly or unwillingly, spoil the process.
protection from injection is really only a side effect of prepared statements, whose real purpose is to produce a syntactically correct statement. And a syntactically correct statement is 100% injection-proof. Yet we need our syntax to be correct regardless of any injection possibility.
if used all the way through, it protects the application regardless of the developer's experience. Say, there is a thing called second-order injection, and a very strong delusion that reads "in order to protect, Escape All User Supplied Input". Combined together, they lead to injection if a developer takes the liberty of deciding what needs to be protected and what doesn't.
(Thinking further, I discovered that the current set of placeholders is not enough for real-life needs and has to be extended, both for complex data structures like arrays, and even for SQL keywords or identifiers, which sometimes have to be added to the query dynamically too; a developer is left unarmed for such cases and forced to fall back to string concatenation, but that's a matter for another question.)
Interestingly, this question's controversy is provoked by the rather controversial nature of Stack Overflow itself. The site's idea is to make use of particular questions from users who ask directly, in order to build a database of general-purpose answers suitable for users who come from search. The idea is not bad per se, but it fails in a situation like this: a user asks a very narrow question, particularly to win an argument with a colleague (or to decide whether it is worth refactoring the code), while most of the experienced participants write their answers with the mission of Stack Overflow as a whole in mind, making them good for as many readers as possible, not only for the OP.
Let's not just think about security or type-safety considerations.
The reason you use parametrized queries is to improve performance at the database level. From a database perspective, a parametrized query is one query in the SQL buffer (to use Oracle's terminology although I imagine all databases have a similar concept internally). So, the database can hold a certain amount of queries in memory, prepared and ready to execute. These queries do not need to be parsed and will be quicker. Frequently run queries will usually be in the buffer and will not need parsing every time they are used.
UNLESS
Somebody doesn't use parametrized queries. In this case, the buffer gets continually flushed through by a stream of nearly identical queries each of which needs to be parsed and run by the database engine and performance suffers all-round as even frequently run queries end up being re-parsed many times a day. I have tuned databases for a living and this has been one of the biggest sources of low-hanging fruit.
NOW
To answer your question, IF your query has a small number of distinct numeric values, you will probably not be causing issues and may in fact improve performance infinitesimally. IF however there are potentially hundreds of values and the query gets called a lot, you are going to affect the performance of your system so don't do it.
Yes, you can increase the SQL buffer, but it's always ultimately at the expense of other, more critical uses for memory, like caching indexes or data. Moral: use parametrized queries pretty religiously so you can optimize your database and use more server memory for the stuff that matters...
To add some info to Maciek's answer:
It is easy to alter the culture info of a .NET third party app by calling the main-function of the assembly by reflection:
using System;
using System.Globalization;
using System.Reflection;
using System.Threading;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            Assembly asm = Assembly.LoadFile(@"C:\BobbysApp.exe");
            MethodInfo mi = asm.GetType("Test").GetMethod("Main");
            mi.Invoke(null, null);
            Console.ReadLine();
        }

        static Program()
        {
            InstallBobbyTablesCulture();
        }

        static void InstallBobbyTablesCulture()
        {
            CultureInfo bobby = (CultureInfo)CultureInfo.InvariantCulture.Clone();
            bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
            bobby.DateTimeFormat.LongTimePattern = "";
            bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
            Thread.CurrentThread.CurrentCulture = bobby;
        }
    }
}
This only works if the Main function of BobbysApp is public. If Main is not public, there may be other public functions you can call.
In my opinion, if you can guarantee that the parameter you are working with will never contain a string, it is safe, but I would not do it in any case. Also, you will see a slight performance drop due to the concatenation. The question I would ask you is: why don't you want to use parameters?
It is OK, but never safe... and the security always depends on the inputs. For example, if the input comes from a TextBox, attackers can do something tricky, since a textbox accepts strings, so you have to add some kind of validation/conversion to prevent the wrong input. But the thing is, it is not safe. As simple as that.
No, you can get an SQL injection attack that way. I have written an old article in Turkish which shows how. The article's example is in PHP and MySQL, but the concept works the same in C# and SQL Server.
Basically you attack in the following way. Let's say you have a page which shows information according to an integer id value, and you do not parameterize this value, like below.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=24
Okay, I assume you are using MySQL, and I attack in the following way.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII((SELECT%20DATABASE()))
Note that here the injected value is not a string. We are changing a char value to int using the ASCII function. You can accomplish the same thing in SQL Server using "CAST(YourVarcharCol AS INT)".
After that I use the length and substring functions to find out your database name.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=LEN((SELECT%20DATABASE()))
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII(SUBSTR(SELECT%20DATABASE(),1,1))
Then, using the database name, you start to get the table names in the database.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII(SUBSTR((SELECT table_name FROM INFORMATION_SCHEMA.TABLES LIMIT 1),1,1))
Of course you have to automate this process, since you only get ONE character per query, but you can easily automate it. My article shows one example in Watir. Using only one page and a non-parameterized ID value, I can learn every table name in your database. After that I can look for important tables. It will take time, but it is doable.

Querying Intersystem Caché through ODBC

I'm querying Caché for a list of tables in two schemas and looping through those tables to obtain a count for each one. However, this is incredibly slow. For instance, 13 million records took 8 hours to return results. When I query an Oracle database with 13 million records (on the same network), it takes 1.1 seconds to return results.
I'm using a BackgroundWorker to carry out the work apart from the UI (Windows Form).
Here's the code I'm using with the Caché ODBC driver:
using (OdbcConnection odbcCon = new OdbcConnection(strConnection))
{
    try
    {
        odbcCon.Open();
        OdbcCommand odbcCmd = new OdbcCommand();
        foreach (var item in lstSchema)
        {
            odbcCmd.CommandText = "SELECT Count(*) FROM " + item;
            odbcCmd.Connection = odbcCon;
            AppendTextBox(item + " Count = " + Convert.ToInt32(odbcCmd.ExecuteScalar()) + "\r\n");
            int intPercentComplete = (int)((float)(lstSchema.IndexOf(item) + 1) / (float)intTotalTables * 100);
            worker.ReportProgress(intPercentComplete);
            ModifyLabel(" (" + (lstSchema.IndexOf(item) + 1) + " out of " + intTotalTables + " processed)");
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
        return;
    }
}
Is the driver the issue?
Thanks.
I suppose the devil is in the details. Your code does
SELECT COUNT(*) FROM Table
If the table has no indices, then I wouldn't be surprised if it is slower than you expect. If the table has indices, especially bitmap indices, I would expect this to be on par with Oracle.
The other thing to consider is how Caché is configured, i.e. what the global buffers are and what the disk performance looks like.
InterSystems Caché is slower for querying than any SQL database I have used, especially when you deal with large databases. Now add ODBC overhead to the picture and you will get even worse performance.
Some level of performance can be achieved through use of bitmap indexes, but often the only way to get good performance is to create more data.
You might even find that you can allocate more memory for the database (but that never seemed to do much for me)
For example, every time you add new data, have the database increment a counter somewhere for your count (or even multiple entries for grouping purposes). Then you can have performance at a reasonable level.
I wrote a little Intersystems performance test post on my blog...
http://tesmond.blogspot.co.uk/2013/09/intersystems-cache-performance-woe-is-me.html
Caché has a built-in (smart) function that determines how best to execute queries. Of course, having indexes, especially bitmapped ones, will drastically help query times. Still, a mere 13 million rows should take seconds at most. How much data is in each row? We have 260 million rows in many tables and 790 million rows in others, and we can mow through the whole thing in a couple of minutes. A non-indexed, complex query may take a day, though that is understandable.
Take a look at what's locking your globals. We have also discovered that queries apparently keep running even if the client disconnects. You can kill the task with the Management Portal, but the system doesn't seem to like doing more than one large ODBC query at once, because such a query takes gigs of temp data. We use DbVisualizer for a JDBC connection.
Someone mentioned TuneTable; that's great to run if your table changes a lot, or at least a couple of times in the table's life. This is NOT something you want to overuse. http://docs.intersystems.com/ens20151/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_optimizing is where you can find documentation and other useful information about this and improving performance. If it's not fast, then someone broke it.
Someone also mentioned that SELECT COUNT(*) will count an index instead of the table itself when there are computed properties. This is related to the decision engine that compiles your SQL queries and decides the most efficient method to get your data. There is a tool in the portal that shows how long a query takes, along with the other methods the smart interpreter considered (I forget what it's called). You can see the Query Plan at the same page where you can execute SQL in the browser, mentioned below: /csp/sys/exp/UtilSqlQueryShowPlan.csp
RE: I can't run this query from within the Management Portal because the tables are only made available from within an application and/or ODBC.
That isn't actually true. Within the Management Portal, go to System Explorer, SQL, then Execute SQL Statements. Please note that you must have adequate privileges to see this; %ALL will allow access to everything. You can also run SQL queries natively in TERMINAL by executing do $system.SQL.Shell() and then typing your queries. This interface should be faster than ODBC, as I think it uses object access. Also, keep in mind that embedded SQL and object access are the fastest ways to access data.
Please let me know if you have any more questions!

tsql conditions vs c# conditions performance

I would like to know which performs faster and is preferable: a condition in a T-SQL query like this:
select case 'color' when 'red' then 1 when 'blue' then 2 else 3 end
or performing the same switch in C# code after getting the value from the DB?
switch (color)
{
    case "red":
        return 1;
    case "blue":
        return 2;
    default:
        return 3;
}
To add more data: in my specific case we have a SQL query that returns 5800+ records in some cases (depending on date filters and so on); then we concatenate those results in C# (one text line per record) and generate a txt file.
We have one server that is the SQL server + web server (ASP.NET), and it takes 10 or more minutes to generate the file... So we were thinking about doing all the conditions on the SQL side, maybe concatenating the fields into one at the SQL level too, vs. using a C# loop with StringBuilder?
Right now the SQL takes 1 second to execute, and all the time is taken by the concatenating loop; there are 5873 records with 11 fields each.
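For what it's worth, concatenating ~5,900 rows should normally take milliseconds, not minutes. A hypothetical sketch with StringBuilder (the table and field access are made up, since the real code isn't shown):
var sb = new StringBuilder(5873 * 128); // rough preallocation: rows x estimated line length
foreach (DataRow row in table.Rows)
{
    for (int i = 0; i < 11; i++)
    {
        sb.Append(Convert.ToString(row[i]));
        if (i < 10) sb.Append('\t'); // field separator
    }
    sb.AppendLine();
}
File.WriteAllText("export.txt", sb.ToString());
If it still takes minutes with something like this, the time is probably going elsewhere (per-row queries, the web tier, I/O), not into the switch/CASE itself.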
I think you are prematurely optimizing. "Make it work, make it right, then make it fast."
I know this statement (and others like it) brings about a lot of debate, but I think you should be putting this logic in the layer where it is most appropriate: where it has the least duplication, the most re-usability, and is easiest to maintain. If you have a performance problem at that point, you can make actual measurements in your environment with your own loads.
As an example, rather than some naked switch like this (that must be maintained), perhaps this should be in a lookup table in the DB and brought back with a join, or maybe it's better exposed as a property of some class based upon an enum; see the sketch below. These might be better patterns to follow.
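A sketch of the enum/lookup idea in C# (hypothetical names; the mapping could equally live in a DB table and be brought back with a join):
private static readonly Dictionary<string, int> ColorCodes =
    new Dictionary<string, int> { { "red", 1 }, { "blue", 2 } };

public static int GetColorCode(string color)
{
    int code;
    // The fallback mirrors the 'else 3' / 'default' in the original.
    return ColorCodes.TryGetValue(color, out code) ? code : 3;
}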
All things being equal on processors, your performance is probably going to depend more on workload and bandwidth.
Will you be saving any bandwidth by replacing the string with an integer or simply adding an integer column? Will there be any filtering on rows which result in significantly less data going across the wire?
If you have 2 web servers and 1 sql server, the processor work will be divided by doing it on the web server. If you have thousands of rich clients and 1 sql server, the processor work will be completely distributed by doing it on the clients.
That's not really possible to say, as there are too many unknown factors.
It depends for example on how much data you return from the database, how you handle the data returned, and whether the database server or the application server is at capacity.
The switch in itself would be faster than the select, but that could easily be outweighed by the fact that returning a number instead of a string from the database could be faster to handle in the code.

Execute multiple SQL commands in one round trip

I am building an application and I want to batch multiple queries into a single round-trip to the database. For example, let's say a single page needs to display a list of users, a list of groups and a list of permissions.
So I have stored procs (or just simple sql commands like "select * from Users"), and I want to execute three of them. However, to populate this one page I have to make 3 round trips.
Now I could write a single stored proc ("getUsersTeamsAndPermissions") or execute a single SQL command "select * from Users;exec getTeams;select * from Permissions".
But I was wondering if there was a better way to specify that 3 operations should happen in a single round trip. Benefits include being easier to unit test, and allowing the database engine to parallelize the queries.
I'm using C# 3.5 and SQL Server 2008.
Something like this. The example as originally posted didn't properly dispose its objects, but you get the idea; here's a cleaned-up version:
using (var connection = new SqlConnection(ConnectionString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    command.CommandText = "select id from test1; select id from test2";
    using (var reader = command.ExecuteReader())
    {
        do
        {
            while (reader.Read())
            {
                Console.WriteLine(reader.GetInt32(0));
            }
            Console.WriteLine("--next command--");
        } while (reader.NextResult());
    }
}
The single multi-part command and the stored procedure that you mention are the two options. You can't issue them in such a way that they are "parallelized" on the db. However, both of those options do result in a single round trip, so you're good there. There's no way to send them more efficiently. In SQL Server 2005 onwards, a multi-part command that is fully parameterized is very efficient.
Edit: adding information on why cram into a single call.
Although you don't want to care too much about reducing calls, there can be legitimate reasons for this.
I once was limited to a crummy ODBC driver against a mainframe, and there was a 1.2 second overhead on each call! I'm serious. There were times when I crammed a little extra into my db calls. Not pretty.
You also might find yourself in a situation where you have to configure your sql queries somewhere, and you can't just make 3 calls: it has to be one. It shouldn't be that way, bad design, but it is. You do what you gotta do!
Sometimes, of course, it can be very good to encapsulate multiple steps in a stored procedure. Usually not for saving round trips though, but for tighter transactions, getting IDs for new records, constraining permissions, providing encapsulation, blah blah blah.
Making one round-trip vs. three will indeed be more efficient. The question is whether it is worth the trouble. The entire ADO.NET and C# 3.5 toolset and framework opposes what you are trying to do. TableAdapters, Linq2SQL, EF: all of these like to deal with simple one-call==one-resultset semantics. So you may lose some serious productivity by trying to beat the framework into submission.
I would say that unless you have some serious measurements showing that you need to reduce the number of roundtrips, abstain. If you do end up requiring this, then use a stored procedure to at least give an API kind of semantics.
But if your query really is what you posted (i.e. select all users, all teams and all permissions), then you obviously have much bigger fish to fry before reducing the round-trips... reduce the resultsets first.
I think this link might be helpful.
Consider at least reusing the same open connection; according to what it says here, opening a connection is almost the biggest performance cost in Entity Framework.
Firstly, 3 round trips isn't really a big deal. If you were talking about 300 round trips then that would be another matter, but for just 3 round trips I would consider this to definitely be a case of premature optimisation.
That said, the way I'd do this would probably be to execute the 3 stored procedures using SQL:
exec dbo.p_myproc_1 @param_1 = @in_param_1, @param_2 = @in_param_2
exec dbo.p_myproc_2
exec dbo.p_myproc_3
You can then iterate through the returned result sets as you would if you had directly executed multiple rowsets.
Build a temp table? Insert all results into the temp table and then select * from #temptable, as in:
CREATE TABLE #temptable (...)
INSERT INTO #temptable (field) SELECT field FROM mytable
INSERT INTO #temptable (field2) SELECT field2 FROM mytable2
etc... Only one trip to the database, though I'm not sure it is actually more efficient.

How can I get a percentage of LINQ to SQL submitchanges?

I wonder if anyone else has asked a similar question.
Basically, I have a huge tree I'm building up in RAM using LINQ objects, and then I dump it all in one go using DataContext.SubmitChanges().
It works, but I can't find a way to give the user any visual indication of how far the query has progressed. If I could ultimately implement a sort of progress bar, that would be great, even if there is a minimal loss in performance.
Note that I have quite a large amount of rows to put into the DB, over 750,000 rows.
I haven't timed it exactly, but it does take a long while to put them in.
Edit: I thought I'd better give some indication of what I'm doing.
Basically, I'm building a suffix tree from the Lord of the Rings. Thus, there are a lot of Nodes, and certain Nodes have positions associated to them (Nodes that happen to be at the end of a suffix). I am building the Linq objects along these lines.
suffixTreeDB.NodeObjs.InsertOnSubmit(new NodeObj()
{
    NodeID = 0,
    ParentID = 0,
    Path = "$"
});
After the suffix tree has been fully generated in RAM (which only takes a few seconds), I then call suffixTreeDB.SubmitChanges();
What I'm wondering is if there is any faster way of doing this. Thanks!
Edit 2: I timed it with a stopwatch, and apparently it takes precisely 6 minutes for the DB to be written.
I suggest you divide the calls you are doing, as they are sent to the db in separate calls anyway. This will also reduce the size of the transaction (which LINQ creates when calling SubmitChanges).
If you divide them into 10 blocks of 75,000, you can provide a rough estimate on a 1/10 scale.
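A rough sketch of that batching idea (names assumed from the question; nodes would be the full list of NodeObj built in RAM):
const int batchSize = 75000;
for (int i = 0; i < nodes.Count; i += batchSize)
{
    suffixTreeDB.NodeObjs.InsertAllOnSubmit(nodes.Skip(i).Take(batchSize));
    suffixTreeDB.SubmitChanges(); // one transaction per block
    ReportProgress(Math.Min(100, (i + batchSize) * 100 / nodes.Count)); // hypothetical callback
}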
Update 1: After re-reading your post and your new comments, I think you should take a look at SqlBulkCopy instead. If you need to improve the time of the operation, that's the way to go. Check this related question/answer: What's the fastest way to bulk insert a lot of data in SQL Server (C# client)
I was able to get percentage progress for ctx.SubmitChanges() by using ctx.Log and an ActionTextWriter:
ctx.Log = new ActionTextWriter(s =>
{
    if (s.StartsWith("INSERT INTO"))
        insertsCount++;
    ReportProgress(insertsCount);
});
more details are available at my blog post
http://epandzo.wordpress.com/2011/01/02/linq-to-sql-ctx-submitchanges-progress/
and stackoverflow question
LINQ to SQL SubmitChanges() progress
This isn't ideal, but you could create another thread that periodically queries the table you're populating to count the number of records that have been inserted. I'm not sure how/if this will work if you are running in a transaction though, since there could be locking/etc.
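A sketch of that polling approach (the table name, connection string, and ReportProgress callback are assumptions; the NOLOCK hint sidesteps blocking at the cost of dirty reads):
var timer = new System.Threading.Timer(_ =>
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT COUNT(*) FROM NodeObjs WITH (NOLOCK)", conn))
    {
        conn.Open();
        int inserted = (int)cmd.ExecuteScalar();
        ReportProgress(inserted * 100 / 750000); // total row count from the question
    }
}, null, 0, 1000); // poll once per second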
What I really think I need is a form of bulk insert; however, it appears that LINQ doesn't support it.
