T-SQL conditions vs C# conditions performance - C#

I would like to know which performs faster and is preferable: a condition in a T-SQL query like this:
select case color when 'red' then 1 when 'blue' then 2 else 3 end
or performing the same switch in C# code after getting the value from the DB?
switch (color)
{
    case "red":
        return 1;
    case "blue":
        return 2;
    default:
        return 3;
}
To add more data: in my specific case we have a SQL query that returns 5800+ records in some cases (depending on date filters and so on); we then concatenate those results in C# (one txt line per record) and generate a txt file.
We have one server that is both the SQL Server and the web server (ASP.NET), and it takes 10 or more minutes to generate the file... So we were thinking about doing all the conditions on the SQL side, maybe concatenating the fields into one at the SQL level too, versus using a C# loop with StringBuilder?
Right now the SQL takes 1 second to execute and all the time is spent in the concatenating loop; there are 5,873 records with 11 fields each.
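For context, here is a minimal sketch of the kind of export loop being described, using a single StringBuilder for the whole file; the query, table name, and connection string are placeholders, not from the original post:

using System.Data.SqlClient;
using System.IO;
using System.Text;

string connectionString = "..."; // placeholder
var sb = new StringBuilder();

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("select * from Records", connection)) // hypothetical query
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // One txt line per record; appending fields directly avoids
            // allocating intermediate strings for every row.
            for (int i = 0; i < reader.FieldCount; i++)
            {
                if (i > 0) sb.Append('\t');
                sb.Append(reader[i]);
            }
            sb.AppendLine();
        }
    }
}

File.WriteAllText("export.txt", sb.ToString());

If the existing loop builds the file with '+' or '+=' on strings per record instead, that alone can account for much of the runtime at this row count.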

I think you are prematurely optimizing. "Make it work, make it right, then make it fast."
I know this statement (and others like it) brings about a lot of debate, but I think you should put this logic in the layer that is most appropriate; that is, where it has the least duplication, the most reusability, and is easiest to maintain. If you have a performance problem at that point, you can make actual measurements in your environment with your own loads.
As an example, rather than some naked switch like this (that must be maintained), perhaps this mapping should live in a lookup table in the DB and be brought back with a join, or maybe it is better exposed as a property of some class based upon an enum, as sketched below. These might be better patterns to follow.
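For illustration, a minimal sketch of the enum-based approach; the Color enum and its values are my own assumptions, not from the question:

// Hypothetical enum that owns the name-to-code mapping in one place.
public enum Color
{
    Red = 1,
    Blue = 2,
    Other = 3
}

public static class ColorParser
{
    // Callers use (int)ColorParser.FromName(name) instead of
    // repeating the switch throughout the codebase.
    public static Color FromName(string name)
    {
        switch (name)
        {
            case "red": return Color.Red;
            case "blue": return Color.Blue;
            default: return Color.Other;
        }
    }
}

(int)ColorParser.FromName("red") yields 1, matching the original CASE expression, and there is now exactly one place to maintain the mapping.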

All things being equal on processors, your performance is probably going to depend more on workload and bandwidth.
Will you be saving any bandwidth by replacing the string with an integer or simply adding an integer column? Will there be any filtering on rows which result in significantly less data going across the wire?
If you have 2 web servers and 1 sql server, the processor work will be divided by doing it on the web server. If you have thousands of rich clients and 1 sql server, the processor work will be completely distributed by doing it on the clients.

That's not really possible to say, as there are too many unknown factors.
It depends for example on how much data you return from the database, how you handle the data returned, and whether the database server or the application server is at capacity.
The switch in itself would be faster than the select, but that could easily be outweighed by the fact that returning a number instead of a string from the database could be faster to handle in the code.

Related

SQL : SELECT * FROM method

Just out of curiosity, how exactly does SELECT * FROM table WHERE column = 'something' work?
Is the underlying principle the same as that of a for/foreach loop with an if condition, like:
foreach (var row in table)
{
    if (row.Column == "something")
        Console.WriteLine(row); // print the matching rows
}
If I am dealing with, say, 100 records, will there be any considerable performance difference between the two approaches in getting the data I want?
SQL is a 4th-generation language, which makes it very different from most programming languages. Instead of telling the computer how to do something (loop through rows, compare columns), you tell the computer what you want (the rows matching a condition).
The DBMS may or may not use a loop. It could just as well use hashes and buckets, pre-sort a data set, whatever. It is free to choose.
On the technical side, you can provide an index in the database, so the DBMS can quickly look up keys to access the rows (like quickly finding names in a telephone book). This gives the DBMS an option for how to access the data, but it is still free to use a completely different approach, e.g. read the whole table sequentially.
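As a loose C# analogy (my own illustration, not from the answer above): a full table scan behaves like a linear search over every row, while an index behaves like a pre-built lookup structure:

using System;
using System.Linq;

var rows = new[] { ("alice", 1), ("bob", 2), ("carol", 3) };

// "Full table scan": check every row, O(n) per lookup.
var scanned = rows.First(r => r.Item1 == "bob").Item2;

// "Index": build the lookup once, then each access is O(1),
// like finding a name in a telephone book.
var index = rows.ToDictionary(r => r.Item1, r => r.Item2);
var indexed = index["bob"];

Console.WriteLine($"{scanned} == {indexed}"); // prints "2 == 2"

The DBMS makes the scan-versus-index choice itself, based on its statistics about the data.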

Is it better to do an additional query, or to get all the data and filter in the client?

I have a query with EF Core in which I would like to include a navigation property and, since that property is an ICollection, filter which of its items to get.
It is something like this:
myDbContext.MyEntity.Where(x => x.ID == 1).Include(x => x.MyCollection.Where(y => y.isEnabled == true));
However, I get an error because it is not possible to filter included properties.
In fact, the items in the collection will be few, about 6 or 7, so I was thinking that I could include them all and filter the data later in the client.
Another option would be to get the main entity first and then, in a second query, get only the children that I really need.
I always read that connections to the database are expensive, so it is better to do as few queries as possible; but I have also read that the best practice is to get only the data that I need, filtering in the query rather than in the client.
But in this case, with EF Core, it seems that I can't filter in the query, so I would like to know which is better: two queries that get only the data I need, or one query that gets all the data and filters later in the client.
Which is longer? One long piece of string, or two shorter pieces of string?
You don't know, because I haven't told you the actual lengths. You don't know if it's a 1m string versus two 5cm strings, or a 10cm string versus two 8cm strings.
And your question here is the same. It's better to do fewer queries than many, and it's better to do short queries than long queries. When a choice is on only one of those metrics (e.g. the shorter query from doing a simple Where on the database vs a simple Where in memory on all results) then we can make sound a priori judgements about which is likely to be the more efficient, and choose accordingly.
When, though, we have competing factors in play, we have to:
1. Decide whether we even care: if they're going to be pretty fast either way, it might not be worth worrying about; find bigger fish to fry.
2. Measure.
3. Make sure what we are measuring is realistic.
The third point is important as one can often create data sets that would make one come out the victor, and other data sets that would make the other win. We need to make sure we're correctly modelling what is encountered in real life.
When the difference is small, or if they are both fast either way (and/or the use is so rare that it's still not a big deal), then just go for whichever is easier to code and maintain.
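For concreteness, here is a sketch of the two options discussed above, against a hypothetical model (the entity classes and key names are my own assumptions):

using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class MyEntity
{
    public int ID { get; set; }
    public ICollection<MyChild> MyCollection { get; set; }
}

public class MyChild
{
    public int ID { get; set; }
    public int MyEntityID { get; set; }
    public bool isEnabled { get; set; }
}

public static class LoadStrategies
{
    // Option A: one round trip; pull the whole (small) collection
    // and filter it in memory.
    public static List<MyChild> OneQueryFilterInClient(DbContext db)
    {
        var entity = db.Set<MyEntity>()
                       .Include(x => x.MyCollection)
                       .Single(x => x.ID == 1);
        return entity.MyCollection.Where(y => y.isEnabled).ToList();
    }

    // Option B: two round trips; only the enabled children cross the wire.
    public static List<MyChild> TwoQueries(DbContext db)
    {
        var entity = db.Set<MyEntity>().Single(x => x.ID == 1);
        return db.Set<MyChild>()
                 .Where(y => y.MyEntityID == entity.ID && y.isEnabled)
                 .ToList();
    }
}

With only 6 or 7 children either way, both should be fast; measure before committing to the more complex option.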

Having a table for fixed data, or an Enum?

I have a table that holds constant values. Is it better to keep this table in my database (which is SQL Server), or to have an Enum in my code and delete the table?
My table has only 2 columns and at most 20 rows; these rows are fixed and get filled once, the first time the application runs.
I would suggest creating an Enum for your case. Since the values are fixed (and I am assuming the table is not going to change very often), you can use an Enum. Keeping the table in the database requires an unnecessary hit to the database and a database connection, which could be skipped if you use an Enum.
A lot may also depend on how many operations you are going to do with your values. For example, it is tedious to query your Enum values the way you would get distinct values from your table, whereas with the table approach it would be a simple SELECT DISTINCT. So look at your needs and the operations you will perform on these values.
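For what it's worth, enumerating all values of an enum from code is short, though it runs in the application rather than as a query (the Status enum here is hypothetical):

using System;

public enum Status { Active = 1, Suspended = 2, Deleted = 3 }

public static class Program
{
    public static void Main()
    {
        // The code-side counterpart of SELECT DISTINCT on a lookup table.
        foreach (Status s in Enum.GetValues(typeof(Status)))
        {
            Console.WriteLine($"{(int)s}: {s}");
        }
    }
}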
As far as the performance is concerned you can look at: Enum Fields VS Varchar VS Int + Joined table: What is Faster?
As you can see, ENUM and VARCHAR results are almost the same, but join query performance is 30% lower. Also note the times themselves: traversing about the same amount of rows, a full table scan performs about 25 times better than accessing rows via index (for the case when data fits in memory!).
So, if you have an application and you need to have some table field with a small set of possible values, I'd still suggest you use ENUM, but now we can see that the performance hit may not be as large as you expect. Though again, a lot depends on your data and queries.
That depends on your needs.
You may want to translate the Enum values (if you are showing them in a GUI) and order a set of records based on the translated values. For example, imagine you have an Employees table and a Position column. If the record set is big and you want to sort or order by the translated Position column, then you have to keep the enum values plus their translations in the database.
Otherwise, KISS and keep them in code. You will spare yourself the round trip of asking the database for the values.
It depends on the character of those constants.
If they are low-level system constants that should never change (like pi = 3.1415), then it is better to keep them only in the code, in some config file. Also, if performance is a critical parameter and you use them very often (on almost every request), it is better to keep them in code.
If they are constants that can change in the future (business constants, for example), it is OK to put them in a table; then you have more flexibility to change them (for instance, from an admin panel).
It really depends on what you actually need.
With an Enum:
- It is faster to access.
- It is bound to one application (you can share it via an assembly reference, but that does not look as clean as using the DB).
- You can use it in a switch statement.
- An enum usually does not care about its value, and it is limited to an integral type (int by default).
With the DB:
- It is slower, because you have to make a connection and a query.
- The data can be shared widely.
- You can set the value to be anything (any type, any value).
So, if you will use it only in one application, an Enum is good enough. But if several applications are going to use it, then the DB would be the better option.

Is SQL code faster than C# code? [closed]

A few months ago I started to work at this programming company. One of the practices there is to do as much work as possible in SQL rather than in C#.
So, let's say I have this simple example of writing out a list of some files:
Is something like this:
string SQL = @"
SELECT f.FileID,
f.FileName,
f.FileExtension,
'/files/' + CAST(u.UserGuid AS VARCHAR(MAX)) + '/' + (f.FileName + f.FileExtension) AS FileSrc,
FileSize=
CASE
WHEN f.FileSizeB < 1048576 THEN CAST(CAST((f.FileSizeB / 1024) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' KB'
ELSE CAST(CAST((f.FileSizeB / 1048576) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' MB'
END
FROM Files f
INNER JOIN Users u
ON f.UserID = u.UserID
";
// some loop for writing results {
// write...
// }
faster or better than something like this:
string SQL = @"
SELECT u.UserGuid,
f.FileID,
f.FileName,
f.FileExtension,
f.FileSizeB
FROM Files f
INNER JOIN Users u
ON f.UserID = u.UserID";
// some loop for writing results {
string FileSrc = "/Files/" + result["UserGuid"] + "/" + result["FileName"] + result["FileExtension"];
string FileSize = ConvertToKbOrMb(result["FileSizeB"]);
// write...
// }
This particular code doesn't matter (it's just a basic example)... the question is about this kind of thing in general: is it better to put more load on SQL, or on 'normal' code?
It is simply bad programming practice. You should separate and isolate the different parts of your program for ease of future maintenance (think of the next programmer!).
Performance
Many solutions suffer from poor DB performance, so most developers limit SQL database access to the smallest transaction possible. Ideally, the transformation of raw data into human-readable form should happen at the last point possible. The memory usage of non-formatted data is also much smaller, and while memory is cheap, you shouldn't waste it: every extra byte to be buffered, cached, and transmitted takes up time and reduces available server resources.
E.g. for a web application, formatting should be done by local JavaScript templates from a JSON data packet. This reduces the workload of the backend SQL database and application servers, and it reduces the data that needs to be transmitted over the network, all of which improves performance.
Formatting and Localisation
Many solutions have different output needs for the same transaction, e.g. different views, different localisations, etc. By embedding formatting into the SQL transaction, you will have to create a new transaction for each localisation, which will become a maintenance nightmare.
Formatted transactions also cannot be used for an API interface; you would need yet another set of transactions for the API, with no formatting.
In C# you should use a well-tested template or string-handling library, or at least string.Format(); do not build strings with the '+' operator in a loop, as repeated concatenation is very slow.
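A minimal sketch of why this matters in a loop (my own illustration): '+' re-copies the accumulated string on every iteration, while StringBuilder appends into a growing buffer:

using System.Text;

// O(n^2): each '+=' allocates a new string and copies all previous content.
string slow = "";
for (int i = 0; i < 10000; i++)
{
    slow += i + ",";
}

// O(n): the builder appends in place and materializes the string once.
var sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
{
    sb.Append(i).Append(',');
}
string fast = sb.ToString();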
Share the load
Most solutions have multiple clients for one DB, so the client-side formatting load is shared across the clients' CPUs rather than landing on the single SQL database CPU.
I seriously doubt SQL is faster than C#; you should run a simple benchmark and post the results here :-)
The reason the second version may be a little slower is that you need to pull the data out of SQL Server and hand it to the C# part of the code, and this takes more time.
Every extra read, like ConvertToKbOrMb(result["FileSizeB"]), can take some more time, and it also depends on your DAL; I have seen some DALs that are really slow.
If you leave that work on SQL Server, you save this extra processing of getting the data out; that's all.
From experience, one of my optimizations is always to pull out only the needed data: the more data you read from the SQL server and move elsewhere (ASP.NET, console, a C# program, etc.), the more time you spend moving it around, especially if the values are big strings or you make a lot of conversions from strings to numbers.
To answer the direct question of which is faster: I say you cannot compare them. They are both as fast as possible, if you write good code and good queries. SQL Server also keeps a lot of statistics and optimizes the queries it runs; C# has no counterpart to that, so what is there to compare?
One test of my own
OK, I have a lot of data here from a project, and I made a quick test; it does not actually prove that one is faster than the other.
I ran two cases:
SELECT TOP 100 PERCENT cI1, cI2, cI3
FROM [dbo].[ARL_Mesur] WITH (NOLOCK)
WHERE [dbo].[ARL_Mesur].[cWhen] > @cWhen0;
foreach (var Ena in cAllOfThem)
{
// this is the computation that I move into SQL Server to see how the speed changes
var results = Ena.CI1 + Ena.CI2 + Ena.CI3;
sbRender.Append(results);
sbRender.Append(Ena.CI2);
sbRender.Append(Ena.CI3);
}
vs
SELECT TOP 100 PERCENT (cI1 + cI2 + cI3) AS cI1, cI2, cI3
FROM [dbo].[ARL_Mesur] WITH (NOLOCK)
WHERE [dbo].[ARL_Mesur].[cWhen] > @cWhen0;
foreach (var Ena in cAllOfThem)
{
sbRender.Append(Ena.CI1);
sbRender.Append(Ena.CI2);
sbRender.Append(Ena.CI3);
}
and the results show that the speeds are nearly the same.
- All the fields are doubles.
- The reads are optimized; I make no extra reads at all, I just move the processing from one side to the other.
On 165,766 rows, here are some results:
Start 0ms +0ms
c# processing 2005ms +2005ms
sql processing 4011ms +2006ms
Start 0ms +0ms
c# processing 2247ms +2247ms
sql processing 4514ms +2267ms
Start 0ms +0ms
c# processing 2018ms +2018ms
sql processing 3946ms +1928ms
Start 0ms +0ms
c# processing 2043ms +2043ms
sql processing 4133ms +2090ms
So, the speed can be affected by many factors... we do not know what, in your case, makes the C# side slower than the SQL processing.
As a general rule of thumb: SQL is for manipulating data, not formatting how it is displayed.
Do as much as you can in SQL, yes, but only as long as it serves that goal. I'd take a long hard look at your "SQL example", solely on that ground. Your "C# example" looks like a cleaner separation of responsibilities to me.
That being said, please don't take it too far and stop doing things in SQL that should be done in SQL, such as filtering and joining. For example reimplementing INNER JOIN Users u ON f.UserID = u.UserID in C# would be a catastrophe, performance-wise.
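To see why, here is a sketch (my own, with hypothetical row types) of what the C# reimplementation would entail: both full tables crossing the wire, then being matched in memory:

using System.Collections.Generic;
using System.Linq;

// Hypothetical rows, each list already pulled from the database in full.
var files = new List<(int FileID, int UserID, string FileName)>();
var users = new List<(int UserID, string UserGuid)>();

// Client-side INNER JOIN: every row of both tables has to be
// transferred and matched here instead of on the database.
var joined = files.Join(users,
                        f => f.UserID,
                        u => u.UserID,
                        (f, u) => new { f.FileID, f.FileName, u.UserGuid });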
As for performance in this particular case:
I'd expect "C# example" (not all C#, just this example) to be slightly faster, simply because...
f.FileSizeB
...looks narrower than...
'/files/' + CAST(u.UserGuid AS VARCHAR(MAX)) + '/' + (f.FileName + f.FileExtension) AS FileSrc,
FileSize=
CASE
WHEN f.FileSizeB < 1048576 THEN CAST(CAST((f.FileSizeB / 1024) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' KB'
ELSE CAST(CAST((f.FileSizeB / 1048576) AS DECIMAL(6, 2)) AS VARCHAR(8)) + ' MB'
END
...which should conserve some network bandwidth. And network bandwidth tends to be a scarcer resource than CPU (especially client-side CPU).
Of course, your mileage may vary, but either way the performance difference is likely to be small enough, so other concerns, such as the overall maintainability of code, become relatively more important. Frankly, your "C# example" looks better to me here, in that regard.
There are good reasons to do as much as you can on the database server. Minimizing the amount of data that has to be passed back and forth and giving the server as much leeway in optimizing the process is a good thing.
However, that is not really illustrated in your example. Both versions pass roughly as much data back and forth (the first perhaps more), and the only difference is who does the calculation; it may well be that the client does it better.
Your question is about whether the string-manipulation operations should be done in C# or in SQL. I would argue that this example is so small that any performance gain, one way or the other, is irrelevant. The real question is: where should this be done?
If the code is "one-off" code for part of an application, then doing it at the application level makes a lot of sense. If this code is repeated throughout the application, then you want to encapsulate it. I would argue that the best way to encapsulate it is a SQL Server computed column, view, table-valued function, or scalar function (with the computed column being preferable in this case). This ensures that the same processing occurs no matter where it is called from.
There is a key difference between database code and C# code in terms of performance: the database engine can automatically parallelize a query. So, if your database server has spare cores, separate threads might be doing those string manipulations at the same time (no promises; the key word here is "might").
In general when thinking about the split, you want to minimize the amount of data being passed back and forth. The difference in this case seems to be minimal.
So, if this is one place in an application that has this logic, then do it in the application. If the application is filled with references to this table that want this logic, then think about a computed column. If the application has lots of similar requests on different tables, then think about a scalar valued function, although this might affect the ability of queries to take advantage of parallelism.
It really depends on what you're doing.
Don't forget about SQL CLR. There are many operations that plain T-SQL code is just slower at.
Usually, in production environments, the database tier is given twice, sometimes three times, as much resource as the application tier.
Also, SQL code that runs natively on the database has a clear advantage over SQL code sent from the application through a database driver.
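As a sketch of the SQL CLR idea (my own example; the function name and KB/MB formatting mirror the question above, and the attributes are the standard ones from Microsoft.SqlServer.Server):

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class ClrFunctions
{
    // Compiled to an assembly, registered with CREATE ASSEMBLY /
    // CREATE FUNCTION, and then callable from T-SQL like any UDF.
    [SqlFunction(IsDeterministic = true, IsPrecise = false)]
    public static SqlString FormatFileSize(SqlInt64 fileSizeB)
    {
        if (fileSizeB.IsNull)
            return SqlString.Null;

        long b = fileSizeB.Value;
        return b < 1048576
            ? new SqlString((b / 1024.0).ToString("0.00") + " KB")
            : new SqlString((b / 1048576.0).ToString("0.00") + " MB");
    }
}

String manipulation like this is one of the cases where a CLR function can often beat the equivalent T-SQL expression.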

Execute multiple SQL commands in one round trip

I am building an application and I want to batch multiple queries into a single round trip to the database. For example, let's say a single page needs to display a list of users, a list of groups, and a list of permissions.
So I have stored procs (or just simple SQL commands like "select * from Users"), and I want to execute three of them. However, to populate this one page I have to make 3 round trips.
Now, I could write a single stored proc ("getUsersTeamsAndPermissions") or execute a single SQL command: "select * from Users; exec getTeams; select * from Permissions".
But I was wondering if there is a better way to issue 3 operations in a single round trip. The benefits would include being easier to unit test and allowing the database engine to parallelize the queries.
I'm using C# 3.5 and SQL Server 2008.
Something like this. Here's a cleaned-up version that properly disposes its objects:
using (var connection = new SqlConnection(ConnectionString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    // two statements, one round trip, two result sets
    command.CommandText = "select id from test1; select id from test2";
    using (var reader = command.ExecuteReader())
    {
        do
        {
            while (reader.Read())
            {
                Console.WriteLine(reader.GetInt32(0));
            }
            Console.WriteLine("--next command--");
        } while (reader.NextResult()); // advance to the next result set
    }
}
The single multi-part command and the stored procedure that you mention are the two options. They cannot be executed in such a way that they are "parallelized" on the DB, but both of them do result in a single round trip, so you're good there. There's no way to send them more efficiently. From SQL Server 2005 onwards, a multi-part command that is fully parameterized is very efficient.
Edit: adding information on why to cram everything into a single call.
Although you don't want to care too much about reducing calls, there can be legitimate reasons for it.
I was once limited to a crummy ODBC driver against a mainframe, and there was a 1.2 second overhead on each call! I'm serious. There were times when I crammed a little extra into my DB calls. Not pretty.
You also might find yourself in a situation where you have to configure your SQL queries somewhere, and you can't just make 3 calls: it has to be one. It shouldn't be that way (bad design), but it is. You do what you gotta do!
Sometimes, of course, it can be very good to encapsulate multiple steps in a stored procedure: usually not to save round trips, but for tighter transactions, getting the ID of a new record, constraining permissions, providing encapsulation, blah blah blah.
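For reference, a sketch of the "fully parameterized multi-part command" mentioned above; the tables, parameter, and connection string are my own invention:

using System;
using System.Data.SqlClient;

string connectionString = "..."; // placeholder

using (var connection = new SqlConnection(connectionString))
using (var command = connection.CreateCommand())
{
    // Two parameterized statements, still a single round trip.
    command.CommandText =
        "select * from Users where GroupID = @groupId; " +
        "select * from Permissions where GroupID = @groupId";
    command.Parameters.AddWithValue("@groupId", 42);

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        do
        {
            while (reader.Read())
            {
                Console.WriteLine(reader[0]);
            }
        } while (reader.NextResult()); // move to the next result set
    }
}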
Making one round trip vs three will indeed be more efficient. The question is whether it is worth the trouble. The entire ADO.NET and C# 3.5 toolset and framework opposes what you're trying to do: TableAdapters, Linq2SQL, EF, all of these prefer simple one-call == one-resultset semantics. So you may lose some serious productivity by trying to beat the framework into submission.
I would say that unless you have serious measurements showing that you need to reduce the number of round trips, abstain. If you do end up requiring this, then use a stored procedure to at least give it API-like semantics.
But if your query really is what you posted (i.e. select all users, all teams, and all permissions), then you obviously have much bigger fish to fry before reducing the round trips... reduce the result sets first.
I think this link might be helpful.
Consider at least reusing the same open connection: according to what it says there, opening a connection is near the top of the performance costs in Entity Framework.
Firstly, 3 round trips isn't really a big deal. If you were talking about 300 round trips then that would be another matter, but for just 3 round trips I would consider this to definitely be a case of premature optimisation.
That said, the way I'd do this would probably be to execute the 3 stored procedures in one batch of SQL:
exec dbo.p_myproc_1 @param_1 = @in_param_1, @param_2 = @in_param_2
exec dbo.p_myproc_2
exec dbo.p_myproc_3
You can then iterate through the returned result sets just as you would if you had directly executed multiple rowsets.
Build a temp table? Insert all the results into the temp table and then select * from the temp table,
as in (the column names and types here are guesses at what the original pseudo-SQL intended):
CREATE TABLE #temptable (field INT, field2 INT);
INSERT INTO #temptable (field)  SELECT mytable.field  FROM mytable;
INSERT INTO #temptable (field2) SELECT mytable2.field2 FROM mytable2;
SELECT * FROM #temptable;
etc... Only one trip to the database, though I'm not sure it is actually more efficient.
