I've created a LINQ to Entities query and used
Enumerable.Distinct<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer)
in it.
Because of my custom "Comparer" class, the query runs on the client side. It works as expected and I get the desired result, but it is slow because the tables involved have tons of rows. So I was wondering whether I could use SQL CLR to implement the whole query, including the Comparer class, so that everything runs on the server side.
Is it possible?
Any ideas are welcome.
SQL Server 2005's CLR integration only supports the .NET 2.0 framework by default. I have imported the .NET 3.5 framework into SQL Server before, but it is somewhat messy and opens up some security holes (I don't recall all the details). The quick and dirty version is, in particular:
CREATE ASSEMBLY [System.Core]
AUTHORIZATION [dbo]
FROM
'C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll'
WITH PERMISSION_SET = UNSAFE
The other roadblock you typically encounter with CLR procs is that the security context of the user running the stored proc is not inherited by the CLR proc, so some things can be harder to accomplish than you would expect.
Debugging a CLR proc is not as easy. You can do it, but I was never able to get a full stack trace (with line numbers, etc.), so debug as much as possible before you move the code into the CLR.
Other than that, I have not had much trouble with CLR procs -- they work pretty well as long as you have a handle on the performance implications of running complex code inside SQL Server. And don't allocate lots of memory inside your CLR proc -- you will regret it.
ADDED
One other issue I thought of: writing CLR proc code takes a higher level of proficiency than typical client-side code or stored procs. This is a maintenance consideration.
Updates to a CLR proc are also more complicated than updates to client-side code or SQL procs, which again complicates maintenance.
I'm not trying to discourage CLR procs; the upsides are also good -- performance, flexibility, security (you have to have DB permissions to update them). But I wanted to document the issues that they don't tell you about when you read about how great and simple CLR procs are.
ADDED
You don't give much detail, but if you are data-bound (lots of rows) and you can write the logic as set-based T-SQL, performance is almost certain to be much better. T-SQL is slow if you try to do lots of computation procedurally -- scripting runs slowly, database I/O runs fast. T-SQL is not very flexible as a programming language, though, which is where the CLR adds flexibility and faster code execution.
If you are not familiar with using APPLY in a SELECT statement (SQL 2005+), you should take the time to understand it, as it can be very useful for keeping a complex query set-based -- you never want to use a cursor if you can avoid it, as cursors are slow and chew up database resources.
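For example, here is a hedged sketch (table, column, and connection-string names are made up) of a "latest 3 orders per customer" query kept set-based with CROSS APPLY, sent from C# as an ordinary command:

using System.Data.SqlClient;

const string sql = @"
SELECT c.CustomerId, c.Name, o.OrderDate, o.Total
FROM dbo.Customers AS c
CROSS APPLY
(
    SELECT TOP (3) OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerId = c.CustomerId
    ORDER BY OrderDate DESC
) AS o;";

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(sql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume rows...
        }
    }
}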
You might save yourself some drag from sending the results across the wire, though if you are on a gigabit network it might not matter, especially since it is fairly vague what "tons of rows" means. Are we talking hundreds of thousands, or millions?
In either case, I am not sure that there is a clear understanding here of what SQL CLR does, based on the idea of "using SQL CLR to implement the whole query, including the Comparer class, so that the whole query runs server-side". You cannot write a query in SQL CLR. Creating .NET / CLR stored procedures and functions does not replace T-SQL for interaction with the database. This means that you are still going to need to execute a SELECT statement and get the results back. Using LINQ to SQL within a SQL CLR object will still execute the same SQL as it does right now from the client.
If more details were provided as to the end goal of the comparison, it might be possible to plan a more appropriate solution. But given the question as asked, it seems doubtful that moving this code into SQL Server, assuming the comparison is still done in .NET code, will provide much, if any, benefit.
EDIT:
To be clearer: transferring the business logic server-side in a way that avoids pulling all rows into memory for comparison with your custom comparer would require creating a SQL CLR function and using it in a WHERE clause. The model essentially changes to sending one row at a time to the function for comparison, rather than having all rows available in a collection.
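A minimal sketch of that shape, assuming the comparison can be reduced to a scalar test over column values (IsEquivalent and the normalization logic are hypothetical stand-ins for whatever your Comparer actually does):

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class ComparerFunctions
{
    // Deployed via CREATE ASSEMBLY / CREATE FUNCTION, then used in T-SQL as:
    //   WHERE dbo.IsEquivalent(t.SomeColumn, @target) = 1
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean IsEquivalent(SqlString a, SqlString b)
    {
        if (a.IsNull || b.IsNull)
        {
            return SqlBoolean.False;
        }
        // Stand-in for the custom comparison logic.
        return a.Value.Trim().ToUpperInvariant() == b.Value.Trim().ToUpperInvariant();
    }
}

Keep in mind the function is invoked once per candidate row, so a plain index-friendly T-SQL predicate will still beat it whenever one can express the same test.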
Related
What are the performance differences between querying a database from ASP.NET code-behind and using a SQL stored procedure?
For ease of use, coding the query inline is easier, especially when it comes to maintenance and changes.
However, a stored procedure is compiled in a database and run by the database.
Which one is more efficient and better to use?
SQL Server caches the execution plan of any query, sproc or not. There is virtually no difference here. Basically, you save sending the query text over the network when using a sproc, nothing more. Source: http://msdn.microsoft.com/en-us/library/ms181055.aspx
Absent any special reason, do whatever is more convenient for you.
The other answers suggesting generally better performance for sprocs are mistaken.
As long as it is a database-centric query, the stored procedure will most of the time be the faster choice (performance).
But it is harder to maintain because it's not in your regular source bundle.
"Better to use" depends on the requirements. If its okay when the query is a tad slower (like 1 ms VS 3 ms) then keep your code together and have it in ASP. If performance is the thing you want put it in the Database.
I put most of my queries in the code and only the ones that NEED the performance in the database.
Also it depends on the Database System used, of course.
Your question is very incomplete as to what you are actually comparing.
Whether the SQL code is in a stored procedure or a full-blown inline SQL statement submitted from the client usually makes little difference to performance (assuming proper parameterization and non-pathological SQL). It can make a large difference in the security architecture: access must be granted to base tables or views rather than just execution rights on procedures. Stored procs encourage parameterization as a requirement, but parameterization is also possible with inline SQL.
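For illustration, a parameterized inline statement from the client (a sketch; the table, connection string, and region variable are hypothetical) gets essentially the same plan-cache treatment as an equivalent proc call:

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT CustomerId, Name FROM dbo.Customers WHERE Region = @region", conn))
{
    // Parameterizing (rather than concatenating the value into the SQL text)
    // enables plan reuse and avoids injection.
    cmd.Parameters.AddWithValue("@region", region);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume rows...
        }
    }
}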
If you are talking about performing logic against sets returned from the database versus doing the work in the database, this can go both ways - it depends upon the type of operation, the type of indexing, the bandwidth between client and database and number of requests needed to be serviced.
Usually, I'd look first at doing it in the database to keep the join/looping logic abstracted from the client and reduce data on the wire (both columns and rows) and present a simple data set API to the client, but IT DEPENDS.
This is an "it depends" question.
Presuming this is SQL Server 2008 R2 or higher, Standard or Enterprise edition: stored procedures cache differently than ad-hoc T-SQL statements. Complex T-SQL statements will almost always perform worse than a stored procedure due to things such as parameterization, code compilation, parameter sniffing, and various other optimizations. In general, I prefer stored procedures as they are MUCH easier to optimize. Plus, you can change a stored procedure without re-compiling and re-deploying any code. And optimizations such as OPTION (OPTIMIZE FOR UNKNOWN) or WITH RECOMPILE (useful when parameter values vary drastically) can be applied to a stored procedure and undone without end users even noticing (well, except for a performance change).
A stored procedure will always end up in the plan cache after a single run and will never be treated as an ad-hoc query. Ad-hoc queries, depending on SQL settings, may or may not be stored in the plan cache. Plus, adding or removing a single character (presuming the query is not parameterized) will cause SQL Server to build a new plan, and building new plans is a slow operation.
TL;DR - presuming SQL Server 2008 R2 or higher, Standard/Enterprise: for simple queries you will notice no difference; for complex queries, a stored procedure (if written properly) will almost always outperform ad-hoc T-SQL. Stored procedures are also easier to optimize at a later date.
Edit - added the SQL version. I am uncertain about older SQL versions.
I am working on an ETL process for a data warehouse using C# that supports both SQL Server and Oracle. During development I have been writing stored procedures that synchronize data from one database to another. The stored procedure code is rather ugly because it involves dynamic SQL: it needs to build the SQL strings since we have dynamic database names.
My team lead wants to use C# code to do the ETL. We have code generation that automatically generates new classes when the database definition changes. That's also why I decided not to use Rhino ETL.
Here are the pros and cons:
Stored procedures:
Pros:
fast loading process, everything is handled by the database
easy deployment, no compiling needed
Cons:
poor readability due to dynamic SQL
need to maintain both T-SQL and PL/SQL scripts when the database definition changes
slow development because there is no IntelliSense when writing dynamic SQL
C# code:
Pros:
easier to develop the ETL process because we get IntelliSense from our generated classes
easier to maintain because of the generated classes
better logging and error handling
Cons:
slow performance compared with stored procedures
I would prefer to use application code for the ETL process, but the performance was horrible compared with stored procedures. In one test where I tried to update 10,000 rows, the stored procedure took only 1 second, while my ETL code took 70 seconds. Even if I somehow manage to reduce the overhead, 20% of those 70 seconds is purely the cost of issuing the update statements from application code.
Could someone provide suggestions or comments on how to speed up the ETL process using application code?
My next idea is to try a parallel ETL process, opening multiple database connections and performing the updates and inserts concurrently.
Thanks
You say you have code generation that automatically generates new classes -- why don't you have code generation that automatically generates new stored procedures?
That should give you the best of both worlds: encapsulate it in a few nice classes that can inspect the database and update things as necessary, and you can -- well, not increase the readability of the dynamic SQL, but at least hide it (you would not need to update the SPs manually).
Also, the difference should not be so huge; it sounds as if you are doing something wrong (not reusing connections, moving data unnecessarily from the server to the application, or processing data in overly small batches -- row by row?).
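For example, on the SQL Server side a common way to collapse thousands of per-row update calls into two round trips is to bulk-load a staging table and then apply one set-based statement (a sketch; the table names and the populated DataTable are hypothetical):

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1) One bulk round trip instead of 10,000 single-row updates.
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.Staging_Customers";
        bulk.WriteToServer(sourceTable); // a populated DataTable
    }

    // 2) One set-based statement applies the changes.
    const string applyChanges = @"
UPDATE t
SET t.Name = s.Name, t.Region = s.Region
FROM dbo.Customers AS t
JOIN dbo.Staging_Customers AS s ON s.CustomerId = t.CustomerId;";

    using (var cmd = new SqlCommand(applyChanges, conn))
    {
        cmd.ExecuteNonQuery();
    }
}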
Also, regarding better logging: care to elaborate on that? You can have logging at the database layer too, or you can design your SPs so that the application layer can still do the logging.
If your C# code is already slow with 10,000 rows, I cannot imagine it in a real environment...
Most ETL is done either within the database (stored procedures, packages, or even code compiled into the database -- PL/SQL, or Java for Oracle). Those can handle millions of rows.
Alternatively, professional tools can be used (Informatica, among others), but they will still be slower than stored procedures, though easier to manage.
So my conclusion is: if you want to come anywhere close to stored procedure performance, you will have to code an application as good as the professional ones on the market that took years to develop and mature... Do you think you can?
Plus, if you have to handle different database types (SQL Server, Oracle), you CANNOT make a generic application AND optimize it at the same time; it's a choice, because Oracle does not work the same way SQL Server does.
To give you an idea: in ETLs for Oracle, hints are used (like the parallel execution hints), and some indexes may be dropped or integrity constraints disabled temporarily to optimize the ETL.
There is no way that I know of to do the exact same thing in SQL Server (it may have similar options, but with different syntax).
So "one ETL for all databases" can hardly be done without losing efficiency and speed.
So I think your pros and cons are very accurate; you have to choose between speed and ease of development, and you cannot have both.
You might consider tuning up your application.
A few tricks of mine:
Don't call connection.Open() and connection.Close() more often than you need to.
In some cases LINQ will slow things down.
Use a procedure and pass more parameters per call when loading, to reduce the number of calls; for example, change proc_load_to_table(p1 text) to proc_load_to_table(p1 text, p2 text, p3 text, p4 text, p5 text). A sketch of this follows below.
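A rough sketch of that batching trick (assuming an open SqlConnection conn and a List<string> rows; proc_load_to_table matches the hypothetical example above):

using System;
using System.Data;
using System.Data.SqlClient;

// Five values per round trip instead of one call per value.
for (int start = 0; start < rows.Count; start += 5)
{
    using (var cmd = new SqlCommand("proc_load_to_table", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        for (int i = 0; i < 5; i++)
        {
            // Pad with NULL when the final batch is short.
            object value = (start + i < rows.Count)
                ? (object)rows[start + i]
                : DBNull.Value;
            cmd.Parameters.AddWithValue("@p" + (i + 1), value);
        }
        cmd.ExecuteNonQuery();
    }
}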
I am trying to convince someone that using an external DLL to manage SQL data is better than using stored procedures. Currently the person I am working with uses VBA and calls SQL stored procedures to get the complicated data they need from many different sources. It is my understanding that the best way to go about this kind of thing is to use a DLL / some intermediate layer to get the data and format it as needed.
Some things to keep in mind:
The person I am working with doesn't care much about being able to scale much further than we are now
They don't care to be able to switch to different platforms
They don't see too much of a performance problem with the current setup
Using a DLL requires more work, and in a different direction
They don't want to switch if there isn't a problem with doing it the way it is now (so "just because it's not the right way" won't work... I tried)
So can anyone tell me some benefits of using an external DLL over SQL stored procedures?
Use stored procedures, and write your data access layer, which calls them via parameterized commands, in a separate DLL. Stored procedures are a standard and give you a ton of benefits, and parameterized commands give you automatic string safety.
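In code, the combination looks roughly like this (a sketch; the proc name, connection string, and Customer type are hypothetical):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static List<Customer> GetCustomersByRegion(string connectionString, string region)
{
    var customers = new List<Customer>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.usp_GetCustomersByRegion", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@region", region); // parameterized: string-safe
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                customers.Add(new Customer
                {
                    Id = reader.GetInt32(0),
                    Name = reader.GetString(1)
                });
            }
        }
    }
    return customers;
}

The database then only needs to grant EXECUTE on the proc; the DLL is the only place that knows the SQL surface.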
This type of design is basically so standardized and has been for years now that Microsoft has included a framework that constructs it for you in .NET 4.
More or less, both you and this other fellow are right: use sprocs for security, and separate your DAL into its own DLL for security, reusability, and lots of other reasons.
ORM/DLL Approach
Pro:
You don't have to learn SQL or stored procedure syntax
Con:
Complicates multiple operations in a single transaction
Risks increasing trips between the application and the database, which means data sync/concurrency issues
Utterly fails at complex queries; most ORMs support calling stored procedures because of this
You can save SQL, including stored procedures, in flat files. The file extension could be .txt, but most use .sql -- which makes the source-control point moot: SQL source can live in CVS etc. just like .NET or Java source code.
I agree with the points about controlling the code: much easier in a DLL. Same with source control. However, from a pure performance perspective, stored procedures will win the day because they are compiled, not just cached. I don't know if it makes enough of a difference, but I thought I'd throw that in.
Using stored procedures can also be much more secure, as you can lock down access to only the stored procedures, and you don't (have to) expose your table data to anyone with a connection.
I guess I'm not really answering your question as much as pointing out holes in your argument. Sorry about that but I'm looking at it from their perspective.
I really think it comes down to a matter of preference. Personally, I like ORM and saved queries in a DLL vs. stored procs; I find them much easier to maintain and distribute than deploying S.Procs to a DB. There are certain advantages that an S.Proc has over a raw query, though: some optimizations, and some server-side logic that can improve performance in some areas.
All in all though, personally I prefer to work in code than in DB mumbo-jumbo, so that's really why I opt for the DLL approach.
Plus you can keep your source code in source control too, which is much harder to do with a stored proc.
Just my 2c.
I want to convert all of my DB stored procedures to LINQ to SQL expressions. Are there any limitations to this? Note that there are some complicated queries in my DB.
Several features of SQL Server are not supported by Linq to SQL:
Batch updates (unless you use non-standard extensions);
Table-Valued Parameters;
CLR types, including spatial types and hierarchyid;
DDL statements (I'm thinking specifically of table variables and temporary tables);
The OUTPUT INTO clause;
The MERGE statement;
Recursive Common Table Expressions, i.e. hierarchical queries on a nested set;
Optimized paging queries using SET ROWCOUNT (ROW_NUMBER is not the most efficient);
Certain windowing functions like DENSE_RANK and NTILE;
Cursors - although these should obviously be avoided, sometimes you really do need them;
Analytical queries using ROLLUP, CUBE, COMPUTE, etc.;
Statistical aggregates such as STDEV, VAR, etc.;
PIVOT and UNPIVOT queries;
XML columns and integrated XPath;
...and so on...
With some of these things you could technically write your own extension methods, parse the expression trees and actually generate the correct SQL, but that won't work for all of the above, and even when it is a viable option, it will often simply be easier to write the SQL and invoke the command or stored procedure. There's a reason that the DataContext gives you the ExecuteCommand, ExecuteQuery and ExecuteMethodCall methods.
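For example, dropping down to ExecuteQuery for a windowing function that L2S cannot express (RankedOrder is a hypothetical plain class whose OrderId, Total, and Rank properties match the result columns; {0} is L2S's parameter placeholder, not string formatting):

var ranked = context.ExecuteQuery<RankedOrder>(@"
    SELECT OrderId, Total,
           DENSE_RANK() OVER (ORDER BY Total DESC) AS Rank
    FROM dbo.Orders
    WHERE CustomerId = {0}", customerId).ToList();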
As I've stated in the past, ORMs such as Linq to SQL are great tools, but they are not silver bullets. I've found that for larger, database-heavy projects, L2S can typically handle about 95% of the tasks, but for that other 5% you need to write UDFs or Stored Procedures, and sometimes even bypass the DataContext altogether (object tracking does not play nice with server triggers).
For smaller/simpler projects it is highly probable that you could do everything in Linq to SQL. Whether or not you should is a different question entirely, and one that I'm not going to try to answer here.
I've found that in almost all cases where I've done a new project with L2S, I've completely removed the need for stored procedures. In fact, in many of the cases where I would previously have been forced to use a stored proc -- multivariable filters, for instance -- building the query dynamically in LINQ has actually given me better queries in the vast majority of cases, since I don't need to include the parts of the query that would translate to "don't care" in the stored proc. So, from my perspective: yes, you should be able to translate your stored procs to LINQ. A sketch of the dynamic-filter pattern follows below.
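That pattern looks roughly like this (a sketch; the entity and the optional filter variables are made up). Each predicate is appended only when the caller supplied a value, so the generated SQL contains none of the "don't care" branches a one-size-fits-all proc would need:

using System.Linq;

IQueryable<Order> query = context.Orders;

if (customerId.HasValue)
    query = query.Where(o => o.CustomerId == customerId.Value);
if (fromDate.HasValue)
    query = query.Where(o => o.OrderDate >= fromDate.Value);
if (!string.IsNullOrEmpty(region))
    query = query.Where(o => o.Region == region);

var orders = query.ToList(); // one SQL statement with only the supplied filters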
A better question, though, might be: should you translate your stored procs to LINQ? The answer, I think, depends on the state of the project, your relative expertise with C#/VB and LINQ vs. SQL, the size of the conversion, etc. On an existing project I'd only make the effort if it improves the maintainability or extensibility of the code base, or if I were making significant changes and the new code would benefit. In the latter case you may choose to incrementally move your code to pure LINQ as you touch it to make changes. You can call stored procs from LINQ, so you may not need to change them to make use of LINQ.
I'm not a fan of this approach. This is a major architectural change, because you are now removing a major interface layer you previously put in place to gain a decoupling advantage.
With stored procedures, you have already chosen the interface your database exposes. You will now need to grant users SELECT privileges on all the underlying tables/views instead of EXECUTE on just the application stored procedures, and potentially you will need to restrict read rights at the column level in the tables/views. Now you will need to re-implement, at a lower level, every explicit underlying table/view/column right which your stored procedure was previously implementing with a single implicit EXECUTE right.
Whereas before, the services expected from the database could be enumerated by an appropriate inventory of stored procedures, now the potential database operations are limited only by the exposed tables/views/columns, vastly increasing the coupling and the difficulty of estimating the scope of changes for database refactorings and feature implementations.
Unless there are specific cases where the stored procedure interface is difficult to create/maintain, I see little benefit of changing a working SP-based architecture en masse. In cases where LINQ generates a better implementation because of application-level data coupling (for instance joining native collections to database), it can be appropriate. Even then, you might want to LINQ to the stored procedure on the database side.
If you chose LINQ from the start, you would obviously have done a certain amount of work up front in determining column/view/table permissions and limiting the scope of application code affecting database implementation details.
What does this mean? Do you want to use L2S to call your stored procedures, or do you want to convert all the T-SQL statements in your stored procs to L2S? If it's the latter, you should not have too many problems. Most T-SQL statements can be represented in LINQ without problems.
I might suggest you investigate a tool like Linqer to help with your T-SQL conversion. It will convert almost any T-SQL statement into LINQ. It has saved me quite a bit of time in converting some of my queries.
There are many constructs in T-SQL which have no parallel in LINQ to SQL, starting with flow control, the ability to return multiple row sets, and recursive queries.
You will need to approach this on a case-by-case basis. Remember that any time the SP does significant filtering work in the database, much of that filtering may end up on the client instead, requiring far more data to move from server to client.
If you already have tested and working stored procedures, why convert them at all? That's just making work for no reason.
If you were starting a new product from scratch and were wondering whether to use stored procedures or not, that would be an entirely different question.
My question is: what is best practice for optimizing performance when using LINQ to SQL?
By performance I mean response time in the user interface.
Right now I have some sales data in a SQL Server 2008 database and I display this data (MAT, yearly, in different segments, growth in segment, percent of market growth, ...)
in charts in an ASP.NET application, using LINQ to SQL to construct IQueryable expressions that are then executed.
The challenge I see is that I used LINQ to construct all the queries, so I have no control over what SQL is generated (I can trace it, but ...), and since I don't use stored procedures, how my data is fetched is something of a black box.
Right now I run some unit tests, manually test the application, and use the Database Engine Tuning Advisor to decide what indexes etc. to create...
In addition to that, I'll usually use both SQL profiler and CLR profiler with some simulated users on a large-ish data set, and watch for long-running queries and/or long-running calls through the datacontext (which may signify multiple round-trips happening under the covers). My personal preference is also to disable deferred loading and object tracking on all my datacontexts by default, so I have to opt-IN to multiple round-trips in most cases. While you can't directly affect the SQL that's generated, you can be careful with LoadWith/AssociateWith and make sure that you're not fetching horribly large/inefficient result sets, and break up queries that have lots of expensive joins (sometimes multiple round-trips are cheaper than mondo joins on big tables).
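For reference, those opt-out defaults plus one explicit eager load look like this on a plain DataContext (a sketch; SalesDataContext, the entities, and cutoff are hypothetical):

using System.Data.Linq;
using System.Linq;

using (var ctx = new SalesDataContext(connectionString))
{
    // Read-only usage: skips change tracking, which also disables
    // deferred (lazy) loading, so no silent extra round trips.
    ctx.ObjectTrackingEnabled = false;

    // Opt in to the one eager join we actually want.
    var loadOptions = new DataLoadOptions();
    loadOptions.LoadWith<Order>(o => o.OrderLines);
    ctx.LoadOptions = loadOptions;

    var orders = ctx.Orders.Where(o => o.OrderDate >= cutoff).ToList();
}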
It's all about measurement- use whatever tools you can get your hands on.
Profiling, profiling, profiling. :)
Measure not only timings, but pay attention to I/O as well. A frequently executed query that is I/O-intensive can run fast due to caching, but can in turn have a negative effect on overall db-server performance, since fewer resources will be available for other queries.
As you say, L2S can be a bit of a black box, so you need to try to replicate all scenarios and/or profile while the app is in use by real users. Then use that to 1) tweak queries 2) add indexes 3) make any other changes needed to get the performance you need.
I have a profiling tool made specifically for Linq-to-SQL to make it a little bit 'less black box' - it allows you to do runtime profiling while tying the generated queries to the code (call stack) that resulted in a specific query being executed. You can download it and get a free trial license at http://www.huagati.com/L2SProfiler/
The background reason for my profiler is outlined in a bit more detail here:
http://huagati.blogspot.com/2009/06/profiling-linq-to-sql-applications.html
...and some advanced profiling options are covered here:
http://huagati.blogspot.com/2009/08/walkthrough-of-newest-filters-and.html
Another thing that may help if you have a lot of tables with a lot of columns is to get index information into the code editor. This is done by adding XML doc-comments with that info to the entity classes and member properties; that info is then displayed in the VS code editor's tooltips.
That way you can see, already while typing queries, whether there is an index covering the column(s) used in WHERE clauses etc. To avoid having to type all of that in, I have created a tool for that too. See the 'update documentation' feature in http://www.huagati.com/dbmltools/