Performance comparison: SQL query vs. LINQ data query - C#

I am weighing two options for manipulating data in a C# programming environment:
Run a filtered query (select * from ... where ...) in SQL and fetch only the matching data.
Run (select * from ...) to fetch all the data, then run a LINQ query over the object list.
What is the performance difference between these options for large or average-sized data sets? Can I use both of them?

The generic answer to performance questions is to try it on your data and see which works better.
In your case, though, there is a right answer: Do the work in the database.
Filtering the data in the database (using the where clause) has two advantages. First, it reduces the amount of data sent from the database to the application. This is almost always a win (unless almost all rows are returned).
Second, it allows the database to optimize the query, using (for instance) available indexes to speed the query.
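To make the contrast concrete, here is a minimal LINQ to SQL sketch. The ShopDataContext and its Orders table are hypothetical, but the pattern is general: the first query ships the filter to the database, while AsEnumerable() forces the filter to run in application memory.

using System.Linq;

// Hypothetical DataContext and Orders table, for illustration only.
using (var db = new ShopDataContext())
{
    // Filtered in the database: translates to SELECT ... WHERE Total > 100,
    // so only matching rows cross the wire and indexes can be used.
    var bigOrders = db.Orders.Where(o => o.Total > 100).ToList();

    // Filtered in memory: AsEnumerable() pulls every row across first,
    // then applies the predicate row by row on the client.
    var bigOrdersInMemory = db.Orders.AsEnumerable()
                                     .Where(o => o.Total > 100)
                                     .ToList();
}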

Personally, if you can reduce the amount of data you pull into memory from the database, do it. Why download 10M records when you only need 100k? You can then refine the result further with LINQ for simplicity, perhaps using local conditions. For small data you can probably try both, although depending on what your LINQ is connected to object-wise, you could still be executing SQL anyway.

I assume you're talking about LINQ to SQL here and that the resulting queries are equivalent. If that is the case, the only difference in terms of performance is the LINQ to SQL overhead of translating a C# expression tree to a SQL query. And that overhead is not trivial: the process involves the DB provider, which uses reflection and complex logic to convert the tree.
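If you want to see that translation happen, DataContext.GetCommand returns the command LINQ to SQL would execute for a given query, without running it. A small sketch (ShopDataContext and Orders are again hypothetical):

using System;
using System.Linq;

using (var db = new ShopDataContext())
{
    var query = db.Orders.Where(o => o.Total > 100);

    // GetCommand performs the expression-tree-to-SQL translation and hands
    // back the resulting DbCommand, so you can inspect the generated TSQL
    // without touching the database.
    Console.WriteLine(db.GetCommand(query).CommandText);
}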

Related

What is the best way: raw SQL queries or LINQ?

I'm more familiar with raw SQL queries. Most of the time I use stored procedures for complex queries, while Insert, Delete, Update and single-record Select operations are done with simple Entity Framework methods and LINQ queries. What are the advantages and disadvantages of using LINQ versus raw SQL queries, and what is the best practice?
SQL will almost always be (a lot) quicker, as it is highly optimised towards returning specific sets of data. It uses complex indexing to know where to look for the data. This does, however, often depend on your database maintenance: for example, you can speed up searches by adding indexes, so as you can see, SQL requires more work than simply writing a stored procedure if you want to optimise performance.
Linq, on the other hand, is a lot quicker to implement than SQL stored procedures, which tend to take longer to write, and it doesn't require you to perform maintenance on the data. Personally I find SQL difficult to read, whereas Linq and programmatic code come quite naturally to me.
Therefore I would say SQL is quicker but more tedious, whereas the programmatic approach is slower but easier to implement.
If you are working on a small dataset you can probably get away with Linq, but if you're working with a large database, SQL is almost always the way to go.

Joins are for lazy people?

I recently had a discussion with another developer who claimed to me that JOINs (SQL) are useless. This is technically true, but he added that using joins is less efficient than making several requests and linking the tables in the code (C# or Java).
For him joins are for lazy people that don't care about performance. Is this true? Should we avoid using joins?
No, we should avoid developers who hold such incredibly wrong opinions.
In many cases, a database join is several orders of magnitude faster than anything done via the client, because it avoids DB roundtrips, and the DB can use indexes to perform the join.
Off the top of my head, I can't even imagine a single scenario where a correctly used join would be slower than the equivalent client-side operation.
Edit: There are some rare cases where custom client code can do things more efficiently than a straightforward DB join (see comment by meriton). But this is very much the exception.
It sounds to me like your colleague would do well with a NoSQL document database or key-value store. Those are themselves very good tools and a good fit for many problems.
However, a relational database is heavily optimised for working with sets. There are many, many ways of querying the data based on joins that are vastly more efficient than lots of round trips. This is where the versatility of an RDBMS comes from. You can achieve the same in a NoSQL store too, but you often end up building a separate structure suited to each different kind of query.
In short: I disagree. In an RDBMS, joins are fundamental. If you aren't using them, you aren't using it as an RDBMS.
Well, he is wrong in the general case.
Databases are able to optimize using a variety of methods, helped by optimizer hints, table indexes, foreign key relationships and possibly other database vendor specific information.
No, you shouldn't.
Databases are specifically designed to manipulate sets of data (obviously....). Therefore they are incredibly efficient at doing this. By doing what is essentially a manual join in his own code, he is attempting to take over the role of something specifically designed for the job. The chances of his code ever being as efficient as that in the database are very remote.
As an aside, without joins, what's the point in using a database? He may as well just use text files.
If "lazy" is defined as people who want to write less code, then I agree. If "lazy" is defined as people who want to have tools do what they are good at doing, I agree. So if he is merely agreeing with Larry Wall (regarding the attributes of good programmers), then I agree with him.
Ummm, joins are how relational databases relate tables to each other. I'm not sure what he's getting at.
How can making several calls to the database be more efficient than one call? Plus, SQL engines are optimized for doing this sort of thing.
Maybe your coworker is too lazy to learn SQL.
"This is technicaly true" - similarly, a SQL database is useless: what's the point in using one when you can get the same result by using a bunch of CSV files, and correlating them in code? Heck, any abstraction is for lazy people, let's go back to programming in machine code right on the hardware! ;)
Also, his asssertion is untrue in all but the most convoluted cases: RDBMSs are heavily optimized to make JOINs fast. Relational database management systems, right?
Yes, you should.
And you should use C++ instead of C# because of performance. C# is for lazy people.
No, no, no. You should use C instead of C++ because of performance. C++ is for lazy people.
No, no, no. You should use assembly instead of C because of performance. C is for lazy people.
Yes, I am joking. You can make faster programs without joins, and you can make programs that use less memory without joins. BUT in many cases your development time is more important than CPU time and memory. Give up a little performance and enjoy your life. Don't waste your time chasing tiny performance gains. And tell him, "Why don't you build a straight highway from your place to your office?"
The last company I worked for didn't use SQL joins either. Instead they moved this work to the application layer, which was designed to scale horizontally. The rationale for this design was to avoid work at the database layer: it is usually the database that becomes the bottleneck, and it's easier to replicate the application layer than the database. There could be other reasons, but this is the one I can recall now.
Yes, I agree that joins done at the application layer are inefficient compared to joins done by the database. There is more network communication as well.
Please note that I'm not taking a hard stand on avoiding SQL joins.
Without joins how are you going to relate order items to orders?
That is the entire point of a relational database management system.
Without joins there is no relational data and you might as well use text files to process the data.
Sounds like he doesn't understand the concept, so he's trying to make it seem like joins are useless. He's the same type of person who thinks Excel is a database application.
Slap him silly and tell him to read more about databases. Making multiple connections, pulling data, and merging the data in C# is the wrong way to do things.
I don't understand the logic of the statement "joins in SQL are useless".
Is it useful to filter and limit the data before working on it? As your other respondents have stated, this is what database engines do; it should be what they are good at.
Perhaps a lazy programmer would stick to technologies with which they were familiar and eschew other possibilities for non-technical reasons.
I leave it to you to decide.
Let's consider an example: a table with invoice records, and a related table with invoice line item records. Consider the client pseudo code:
foreach (var invoice in invoices)
{
    var invoiceLines = FindLinesFor(invoice); // one database query per invoice
    // ...
}
If you have 100,000 invoices with 10 lines each, this code will look up 10 invoice lines from a table of 1 million, and it will do that 100,000 times. As the table size increases, the number of select operations increases, and the cost of each select operation increases.
Because computers are fast, you may not notice a performance difference between the two approaches if you have several thousand records or fewer. But because the cost increase is more than linear, you'll begin to notice a difference as the number of records increases (into the millions, say), and the difference will become less tolerable as the size of the data set grows.
The join, however, will use the table's indexes and merge the two data sets. This means that you're effectively scanning the second table once rather than randomly accessing it N times. If there's a foreign key defined, the database already has the links between the related records stored internally.
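For comparison, here is a hedged sketch of the set-based version in LINQ to SQL (the Invoices/InvoiceLines entities are hypothetical); one statement replaces the 100,000 lookups above:

using System.Linq;

// One query, one round trip: the database merges the two tables itself,
// using the index on InvoiceLines.InvoiceId.
var lines =
    from invoice in db.Invoices
    join line in db.InvoiceLines on invoice.Id equals line.InvoiceId
    select new { invoice.Id, line.Description, line.Amount };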
Imagine doing this yourself. You have an alphabetical list of students and a notebook with all the students' grade reports (one page per class). The notebook is sorted in order by the students' names, in the same order as the list. How would you prefer to proceed?
Read a name from the list.
Open the notebook.
Find the student's name.
Read the student's grades, turning pages until you reach the next student or the last page.
Close the notebook.
Repeat.
Or:
Open the notebook to the first page.
Read a name from the list.
Read any grades for that name from the notebook.
Repeat steps 2-3 until you get to the end
Close the notebook.
Sounds like a classic case of "I can write it better." In other words, he's seeing something that he sees as kind of a pain in the neck (writing a bunch of joins in SQL) and saying "I'm sure I can write that better and get better performance." You should ask him if he is a) smarter and b) more educated than the typical person that's knee deep in the Oracle or SQL Server optimization code. Odds are he isn't.
He is most certainly wrong. While there are definite pros to data manipulation within languages like C# or Java, joins are fastest in the database due to the nature of SQL itself.
SQL keeps detailed statistics regarding the data, and if you have created your indexes correctly, it can very quickly find one record among a couple of million. Besides, why would you want to drag all your data into C# to do a join when you can just do it right at the database level?
The pros for using C# come into play when you need to do something iteratively. If you need to do some function for each row, it's likely faster to do so within C#, otherwise, joining data is optimized in the DB.
I will say that I have run into a case where it was faster breaking the query down and doing the joins in code. That being said, it was only with one particular version of MySQL that I had to do that. Everything else, the database is probably going to be faster (note that you may have to optimize the queries, but it will still be faster).
I suspect he has a limited view of what databases should be used for. One approach to maximising performance is to read the entire database into memory. In that situation you may get better performance, and you may want to perform joins in memory for efficiency. However, that is not really using a database as a database, IMHO.
No: not only are joins better optimized in database code than in ad-hoc C#/Java, but usually several filtering techniques can also be applied, which yields even better performance.
He is wrong; joins are what competent programmers use. There may be a few limited cases where his proposed method is more efficient (and in those I would probably be using a document database), but I can't see it if you have any decent amount of data. For example, take this query:
select t1.field1
from table1 t1
join table2 t2
on t1.id = t2.id
where t1.field2 = 'test'
Assume you have 10 million records in table1 and 1 million records in table2. Assume 9 million of the records in table1 meet the where clause, but only 15 of them are also in table2. You can run this SQL statement, which if properly indexed will take milliseconds and return 15 records across the network with only one column of data. Or you can send 10 million records with two columns of data, separately send another 1 million records with one column of data across the network, and combine them on the web server.
Or, of course, you could keep the entire contents of the database on the web server at all times, which is just plain silly if you have more than a trivial amount of data or data that is continually changing. If you don't need the qualities of a relational database, then don't use one. But if you do, then use it correctly.
I've heard this argument quite often during my career as a software developer. Almost every time it has been stated, the person making the claim didn't have much knowledge about relational database systems, how they work, or how such systems should be used.
Yes, when used incorrectly, joins seem to be useless or even dangerous. But when used in the correct way, there is a lot of potential for database implementation to perform optimizations and to "help" the developer retrieving the correct result most efficiently.
Don't forget that by using a JOIN you tell the database how you expect the pieces of data to relate to each other, giving it more information about what you are trying to do and therefore making it better able to fit your needs.
So the answer is definitely: no, JOINs aren't useless at all!
This is "technically true" only in one case which is not used often in applications (when all the rows of all the tables in the join(s) are returned by the query). In most queries only a fraction of the rows of each table is returned. The database engine often uses indexes to eliminate the unwanted rows, sometimes even without reading the actual row as it can use the values stored in indexes. The database engine is itself written in C, C++, etc. and is at least as efficient as code written by a developer.
Unless I've seriously misunderstood, the logic in the question is very flawed.
If there are 20 rows in B for each A, then 1,000 rows in A implies 20k rows in B.
There can't be just 100 rows in B unless there is a many-to-many table "AB" with 20k rows containing the mapping.
So to get all the information about which 20 of the 100 B rows map to each A row, you need table AB too. This would be either:
3 result sets of 100, 1,000, and 20k rows, plus a client JOIN, or
a single JOINed A-AB-B result set with 20k rows.
So the "JOIN" in the client doesn't add any value when you examine the data. Not that it's always a bad idea: if I were retrieving one object from the database, then maybe it makes more sense to break it down into separate result sets. For a report-type call, I'd almost always flatten it out into one.
In any case, I'd say there is almost no use for a cross join of this magnitude. It's a poor example.
You have to JOIN somewhere, and that's what RDBMS are good at. I'd not like to work with any client code monkey who thinks they can do better.
Afterthought:
To join in the client requires persistent objects such as DataTables (in .NET). If you have one flattened resultset, it can be consumed via something lighter, like a DataReader. High volume = a lot of client resources used to avoid a database JOIN.
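As a rough illustration of that last point, a sketch of consuming one server-joined, flattened resultset with a forward-only reader; the connection string, table, and column names are placeholders:

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"SELECT i.Id, l.Description, l.Amount
      FROM Invoices i JOIN InvoiceLines l ON l.InvoiceId = i.Id", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // Each joined row is streamed as it arrives; nothing is buffered
        // client-side the way a DataTable-per-table approach would require.
        while (reader.Read())
        {
            var id = reader.GetInt32(0);
            var description = reader.GetString(1);
            // ...
        }
    }
}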

Efficiency of LINQ to SQL vs stored procedure

Hi, I'm writing an app which has a search page that searches the database.
I'm wondering whether I should do this in LINQ or a stored procedure.
Is the performance of a stored procedure much better than that of LINQ to SQL?
I'm thinking it would be, because in order to write the LINQ query you need to use the DataContext to access the table being queried. I imagine this in itself means that if the table is big it might become inefficient.
That is if you were using:
context.GetTable<T>();
Can any one advise me here?
There is unlikely to be much difference UNLESS you encounter a situation where the TSQL produced by Linq to SQL is not optimal.
If you want absolute control over the TSQL use a stored procedure.
If speed is critical, benchmark both and also examine the TSQL produced by your Linq to SQL solution.
Also, you should be wary of pulling back entire tables (unless they are small, such as frequently accessed lookup data) across the wire in either solution.
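One easy way to do that examination is the DataContext.Log property, which echoes the generated TSQL to any TextWriter. A minimal sketch, assuming a hypothetical ShopDataContext with a Products table:

using System;
using System.Linq;

using (var db = new ShopDataContext())
{
    // Every query's generated TSQL is written to the console before it runs.
    db.Log = Console.Out;

    var results = db.Products.Where(p => p.Price > 10m).ToList();
}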
If the speed is so critical to you then you should go ahead and benchmark both options on a reasonable set of data. Technically I would expect the SP to be faster but it might not be that much of a difference.
What does "efficient" mean to you?
I'm working on a website where sub-second response (preferably under 500ms) is the goal. We're using LINQ for search on most of our stuff. The only time we actually use an SP is when we're using hierarchyid and other SQL Server data types that don't exist in EF.
GetTable probably isn't going to be that different between the two, as fundamentally it's just SELECT * FROM T. You'll see more significant gains from stored procedures in cases where the query isn't being written very optimally by LINQ, or in some very high-load situations where caching the execution plan makes a difference.
Benchmarking it is the best answer, but from what it looks like you're doing I don't think the difference is going to amount to much.

should linq to sql be used for websites that have high traffic

I have read many articles about LINQ to SQL performance. The conclusion I reached is that it is slower than the normal approach (a DAL or the Microsoft Enterprise Library): slower for both read and write operations, even after performance tuning like disabling ObjectTracking and other tricks. I know it has pros like quick development, clean code, etc., but what about the performance?
What if I used it only for read operations?
Please give your suggestions.
It seems to work well enough for stackoverflow ;-p Especially if you use compiled queries, this is unlikely to be your bottleneck, compared (for example) to appropriate DB design and fetching the right columns / rows, and avoiding n+1 loading.
For read operations LINQ to SQL should be roughly as fast as directly writing the SQL, because that's exactly what it does. The overhead of creating the SQL shouldn't be noticeable. There might be a few queries that it doesn't get as optimal as if you wrote the query by hand, but in my experience it does very well in most cases.
For bulk updates, LINQ To SQL is typically slower because it handles rows one at a time. You can't do something like UPDATE Foo SET x = 0 WHERE id BETWEEN 100 AND 200 in LINQ to SQL without fetching all the rows. It's currently best to write the SQL by hand for this type of operation.
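When you do drop to hand-written SQL for such an update, you don't have to leave the DataContext: ExecuteCommand issues one set-based statement with curly-brace parameter placeholders. A sketch reusing the example statement above (ShopDataContext is hypothetical):

// One statement, zero rows fetched; {0} and {1} become SQL parameters.
using (var db = new ShopDataContext())
{
    db.ExecuteCommand(
        "UPDATE Foo SET x = 0 WHERE id BETWEEN {0} AND {1}", 100, 200);
}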
Updates and deletes are where LINQ to SQL currently suffers, since a separate statement is generated for each affected object.
That said, this blog post details how to get both operations down to 1 statement, which should help improve performance: Batch Updates and Deletes with LINQ to SQL.
Compiled queries will also come in handy for frequently used queries, especially those where parameters are used to get specific results. You might also find this post helpful: 10 Tips to Improve your LINQ to SQL Application Performance.
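For reference, a compiled query is built with CompiledQuery.Compile, which caches the expression-tree-to-SQL translation so a frequently run, parameterized query pays that cost only once. The context and entity names below are hypothetical:

using System;
using System.Data.Linq;
using System.Linq;

// Declared once (e.g. as a field in a repository class): the translation
// happens at Compile time, and each call just supplies parameter values.
static readonly Func<ShopDataContext, decimal, IQueryable<Product>> ProductsAbove =
    CompiledQuery.Compile((ShopDataContext db, decimal minPrice) =>
        db.Products.Where(p => p.Price >= minPrice));

// Usage: var expensive = ProductsAbove(db, 100m).ToList();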

Can we convert all SQL scripts to LINQ to SQL expressions, or are there limitations?

I want to convert all of my db stored procedures to LINQ to SQL expressions. Is there any limitation to this work? Note that there are some complicated queries in my db.
Several features of SQL Server are not supported by Linq to SQL:
Batch updates (unless you use non-standard extensions);
Table-Valued Parameters;
CLR types, including spatial types and hierarchyid;
DDL statements (I'm thinking specifically of table variables and temporary tables);
The OUTPUT INTO clause;
The MERGE statement;
Recursive Common Table Expressions, i.e. hierarchical queries on a nested set;
Optimized paging queries using SET ROWCOUNT (ROW_NUMBER is not the most efficient);
Certain windowing functions like DENSE_RANK and NTILE;
Cursors - although these should obviously be avoided, sometimes you really do need them;
Analytical queries using ROLLUP, CUBE, COMPUTE, etc.
Statistical aggregates such as STDEV, VAR, etc.
PIVOT and UNPIVOT queries;
XML columns and integrated XPath;
...and so on...
With some of these things you could technically write your own extension methods, parse the expression trees and actually generate the correct SQL, but that won't work for all of the above, and even when it is a viable option, it will often simply be easier to write the SQL and invoke the command or stored procedure. There's a reason that the DataContext gives you the ExecuteCommand, ExecuteQuery and ExecuteMethodCall methods.
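As a sketch of that escape hatch, ExecuteQuery maps raw, parameterized SQL straight onto an entity type; the Customer entity and table here are illustrative:

using System.Linq;

using (var db = new ShopDataContext())
{
    // Raw SQL for the cases L2S can't express; {0} is parameterized,
    // and the resulting rows are materialized as Customer objects.
    var customers = db.ExecuteQuery<Customer>(
        "SELECT * FROM Customers WHERE City = {0}", "London").ToList();
}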
As I've stated in the past, ORMs such as Linq to SQL are great tools, but they are not silver bullets. I've found that for larger, database-heavy projects, L2S can typically handle about 95% of the tasks, but for that other 5% you need to write UDFs or Stored Procedures, and sometimes even bypass the DataContext altogether (object tracking does not play nice with server triggers).
For smaller/simpler projects it is highly probable that you could do everything in Linq to SQL. Whether or not you should is a different question entirely, and one that I'm not going to try to answer here.
I've found that in almost all cases where I've done a new project with L2S, I've completely removed the need for stored procedures. In fact, many of the cases where I would have been forced to use a stored proc, multivariable filters for instance, I've found that by building the query dynamically in LINQ, I've actually gotten better queries in the vast majority of cases since I don't need to include those parts of the query that get translated to "don't care" in the stored proc. So, from my perspective, yes -- you should be able to translate your stored procs to LINQ.
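A minimal sketch of that dynamic composition, with hypothetical Order fields and nullable filter parameters; each Where is appended only when the caller supplied a value, so the generated SQL contains no "don't care" predicates:

using System.Linq;

IQueryable<Order> query = db.Orders;

// Compose the filter from whatever the caller actually provided.
if (customerId.HasValue)
    query = query.Where(o => o.CustomerId == customerId.Value);
if (fromDate.HasValue)
    query = query.Where(o => o.OrderDate >= fromDate.Value);

var results = query.ToList(); // one SELECT containing only the supplied filters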
A better question, though, might be: should you translate your stored procs to LINQ? The answer to that, I think, depends on the state of the project, your relative expertise with C#/VB and LINQ versus SQL, the size of the conversion, etc. On an existing project I'd only make the effort if it improves the maintainability or extensibility of the code base, or if I was making significant changes and the new code would benefit. In the latter case you may choose to incrementally move your code to pure LINQ as you touch it to make changes. You can use stored procs with LINQ, so you may not need to convert them to make use of LINQ.
I'm not a fan of this approach. This is a major architectural change, because you are now removing a major interface layer you previously put in place to gain a decoupling advantage.
With stored procedures, you have already chosen the interface your database exposes. You will now need to grant users SELECT privileges on all the underlying tables/views instead of EXECUTE on just the application stored procedures and potentially you will need to restrict column read rights at the column level in the tables/views. Now you will need to re-implement at a lower level every explicit underlying table/view/column rights which your stored procedure was previously implementing with a single implicit EXECUTE right.
Whereas before the services expected from the database could be enumerated by an appropriate inventory of stored procedures, now the potential database operations are limited only by the exposed tables/views/columns, vastly increasing the coupling and the potential difficulty of estimating the scope of database refactorings and feature implementations.
Unless there are specific cases where the stored procedure interface is difficult to create/maintain, I see little benefit of changing a working SP-based architecture en masse. In cases where LINQ generates a better implementation because of application-level data coupling (for instance joining native collections to database), it can be appropriate. Even then, you might want to LINQ to the stored procedure on the database side.
If you chose LINQ from the start, you would obviously have done a certain amount of work up front in determining column/view/table permissions and limiting the scope of application code affecting database implementation details.
What does this mean? Do you want to use L2S to call your stored procedures, or do you want to convert all the T-SQL statements in your stored procs to L2S? If it's the latter, you should not have too many problems; most T-SQL statements can be represented in LINQ without trouble.
I might suggest you investigate a tool like Linqer to help you with your T-SQL conversion. It will convert almost any T-SQL statement into LINQ. It has saved me quite a bit of time in converting some of my queries.
There are many constructs in T-SQL that have no parallel in LINQ to SQL, starting with flow control, the ability to return multiple row sets, and recursive queries.
You will need to approach this on a case-by-case basis. Remember that when an SP does significant filtering work on the database, much of that filtering may end up on the client, requiring far more data to move from server to client.
If you already have tested and working stored procedures, why convert them at all? That's just making work for no reason.
If you were starting a new product from scratch and were wondering whether to use stored procedures or not, that would be an entirely different question.
