I have a query that deletes all rows that have been marked for deletion. There is a column in a table named IsDeleted. It is a boolean data type; if it is true, the row is supposed to be deleted along with all related rows in different tables.
If an article row is marked for deletion, then the article's comments and votes are also supposed to be deleted. Which ORM can efficiently handle this?
Edit
I need this for C# .NET
DataObjects.Net offers an intermediate solution:
Currently it can't perform server-side deletion of entities selected by query. This will be implemented some day, but for now there is another solution.
On the other hand, it supports so-called generalized batching: the queries it sends are batched in groups of up to 25 items when possible. "Possible" means the query result isn't needed right away, which is almost always the case for creates, updates and deletes. Since such queries always lead to a single seek operation (or a few, if there is inheritance), they're pretty cheap. When they're sent in batches, SQL Server can cache plans for the whole batch, not just for the individual queries in it.
So this is very fast, although not yet ideal:
For now DO4 doesn't use IN (...) to optimize such deletions.
So far it doesn't support asynchronous batch execution. When this is done (I hope in a month or so), its speed on CUD (a subset of CRUD) operations will be nearly the same as that of SqlBulkCopy (roughly 1.5 to 2 times faster than now).
So with DO, bulk deletion looks as follows:
var customersToRemove =
    from customer in Query<Customer>.All
    where customer.IsDeleted
    select customer;
foreach (var customer in customersToRemove)
    customer.Remove(); // This will be automatically batched
I can name a benefit of this approach: any of these objects will be able to react to its own deletion, and Session event subscribers will be notified about each deletion as well. So any common logic related to deletions will work as expected. This is impossible if such an operation is executed on the server.
Code for soft delete would look like this:
var customersToRemove =
    from customer in Query<Customer>.All
    where ...
    select customer;
foreach (var customer in customersToRemove)
    customer.IsRemoved = true; // This will be automatically batched
Obviously, such an approach is slower than a bulk server-side update. By our estimates, what we have now is about 5 times slower than true server-side deletion in the worst case ([bigint Id, bigint Value] table, clustered primary index, no other indexes); in real-life cases (more columns, more indexes, more data) it should already offer comparable performance (i.e. be 2-3 times slower). Asynchronous batch execution will improve this further.
Btw, we shared tests for bulk CUD operations with entities for various ORM frameworks at ORMBattle.NET. Note that the tests there don't use bulk server-side updates (in fact, such a test would measure database performance rather than the ORM); instead they test whether the ORM is capable of optimizing this. Anyway, the info provided there plus the test code might be helpful if you're evaluating multiple ORM tools.
Typically, if you are already using an IsDeleted flag paradigm, the flagged items are simply ignored by the application object model. This is efficient and reliable because no referential integrity needs to be checked (no cascade), and no data is permanently destroyed.
If you want IsDeleted rows purged on a regular basis, it is far more efficient to schedule these as batch jobs in the RDBMS using native SQL, as long as you remove things in the right order so that referential integrity is not compromised. If you do not enforce referential integrity at the DB-level, then the order doesn't matter.
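If the purge is driven from application code rather than a scheduled job, a minimal sketch of the "right order" idea might look like this (the table names and connection string are assumptions for illustration; child rows are deleted first so referential integrity is never violated):
// requires System.Data.SqlClient
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var tx = connection.BeginTransaction())
    {
        // Delete child rows first, then the parent rows
        string[] purgeStatements =
        {
            "DELETE v FROM Votes v JOIN Articles a ON v.ArticleId = a.Id WHERE a.IsDeleted = 1",
            "DELETE c FROM Comments c JOIN Articles a ON c.ArticleId = a.Id WHERE a.IsDeleted = 1",
            "DELETE FROM Articles WHERE IsDeleted = 1"
        };
        foreach (var sql in purgeStatements)
        {
            using (var cmd = new SqlCommand(sql, connection, tx))
                cmd.ExecuteNonQuery();
        }
        tx.Commit();
    }
}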
Even with strong referential integrity and constraints in all my database designs over the years, I have never used cascading RI - it has never been desirable in my designs.
NHibernate supports HQL (the object-oriented Hibernate Query Language) updates and deletes.
There are some examples in this Blog Post by Fabio Maulo and this Blog Post by Ayende Rahien.
It would probably look like this:
using (var session = OpenSession())
using (var tx = session.BeginTransaction())
{
    session
        .CreateQuery("delete from Whatever where IsDeleted = true")
        .ExecuteUpdate();
    tx.Commit();
}
Note: this is not SQL. This is HQL containing class names and property names and it translates to (almost) any database.
Which ORMs support "criteria"-based deletes? Both of the ones I've worked with (Propel, Doctrine) do. I would think that nearly all do unless they are early in development, as it's a pretty basic thing. But what language are you working in?
As far as your deletion cascade goes, this is best implemented at the database level with foreign keys. Most RDBMSs support this. If you're using one that doesn't, some ORMs implement this as well when native support isn't available. But my advice would be to just use an RDBMS that does support it - it will be fewer headaches in the long run.
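For illustration in C#: with Entity Framework code-first, such a database-level cascade can be declared in the model configuration (the Article/Comment entities and their properties here are assumptions), and EF will emit the foreign key with ON DELETE CASCADE:
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Deleting an Article also deletes its Comments at the database level
    modelBuilder.Entity<Comment>()
        .HasRequired(c => c.Article)
        .WithMany(a => a.Comments)
        .HasForeignKey(c => c.ArticleId)
        .WillCascadeOnDelete(true);
}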
I am using LLBLgen, which can do cascading deletes. You might want to try it, it's very good.
Example: delete all users in usernames[] from all roles in rolenames[]
string[] usernames;
string[] rolenames;
UserRoleCollection userRoles = new UserRoleCollection();
PredicateExpression filter = new PredicateExpression();
filter.Add(new FieldCompareRangePredicate(UserFields.logincode, usernames));
filter.AddWithAnd(new FieldCompareRangePredicate(RoleFields.Name, rolenames));
userRoles.DeleteMulti(filter);
Most ORMs will allow you to either give SQL hints, or execute SQL within their framework.
For example, you can use the ExecuteQuery method in DLINQ (LINQ to SQL) to do what you want. Here is a brief tutorial on using custom SQL with DLINQ:
http://weblogs.asp.net/scottgu/archive/2007/08/27/linq-to-sql-part-8-executing-custom-sql-expressions.aspx
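As a rough sketch of that approach (the NorthwindDataContext name and the Articles table are just placeholders):
using (var db = new NorthwindDataContext())
{
    // ExecuteCommand runs arbitrary DML and returns the number of affected rows
    int deleted = db.ExecuteCommand("DELETE FROM Articles WHERE IsDeleted = 1");

    // ExecuteQuery maps the results of a custom SELECT back onto entity or DTO types
    var remaining = db.ExecuteQuery<Article>("SELECT * FROM Articles WHERE IsDeleted = 0").ToList();
}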
I would expect that the Entity Framework also allows this; I have never used it, but you can look into it.
Basically, find an ORM that has the features you need, and then you could ask how to do this query in your selected ORM. I think picking an ORM for this one feature is risky as there are many other factors that should go into the selection.
Related
I would like to know if the following scenario is possible with Entity Framework:
I want to load several tables with the option AsNoTracking since they are all like static tables that cannot be changed by the user.
Those tables also happen to be navigation properties of others. Up till now I have relied on the AutoMapping feature of the Entity Framework, and don't use the .Include() or lazy loading functionality.
So instead of:
var result = from x in context.TestTable
.Include("ChildTestTable")
select x;
I am using it like this:
context.ChildTestTable.Load();
context.TestTable.Load();
var result = context.TestTable.Local;
This is working smoothly because the application is designed so that the tables in the database are very small; no table exceeds 600 rows (and that's already a pretty high value in my app).
Now my way of loading data isn't working with .AsNoTracking().
Is there any way to make it work?
So I can write:
context.ChildTestTable.AsNoTracking().ToList();
var result = context.TestTable.AsNoTracking().ToList();
Instead of:
var result = from x in context.TestTable.AsNoTracking()
.Include("ChildTestTable")
select x;
So basically, I want to have one or more tables loaded with the AutoMapping feature on but without loading them into the Object State Manager - is that possible?
The simple answer is no. For normal tracking queries, the state manager is used for both identity resolution (finding a previously loaded instance of a given entity and using it instead of creating a new instance) and fixup (connecting navigation properties together). When you use a no-tracking query it means that the entities are not tracked in the state manager. This means that fixup between entities from different queries cannot happen because EF has no way of finding those entities.
If you were to use Include with your no-tracking query then EF would attempt to do some fixup between entities within the query, and this will work a lot of the time. However, some queries can result in referencing the same entity multiple times and in some of those cases EF has no way of knowing that it is the same entity being referenced and hence you may get duplicates.
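For reference, a minimal sketch of that Include-with-no-tracking pattern, using the entity set names from the question:
var result = context.TestTable
    .Include("ChildTestTable")
    .AsNoTracking()
    .ToList();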
I guess the thing you don't really say is why you want to use no-tracking. If your tables don't have a lot of data then you're unlikely to see significant perf improvements, although many factors can influence this. (As a digression, using the ObservableCollection returned by .Local could also impact perf and should not be necessary if the data never changes.) Generally speaking you should only use no-tracking if you have an explicit need to do so since otherwise it ends up adding complexity without benefit.
I need to sync a SQL table with data from a DataTable (which is a modified copy of the SQL table). I'd like to update/delete/insert only the differences, so I need to compare both and find the key value (in my case ID) and the change type. Is there an efficient way, perhaps via some built-in method? I'd like to hit the database as little as possible.
Create a data adapter, set its commands, and fill your DataTable. Work with your DataTable.
Then get a DataTable containing only the changes:
DataTable updateDt = originalDt.GetChanges();
dataAdapter.Update(updateDt);
This is the basic logic of working in disconnected mode and updating the database.
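Spelled out, a minimal sketch of that disconnected flow (the Customers table and the connection string are assumptions):
// requires System.Data and System.Data.SqlClient
var adapter = new SqlDataAdapter("SELECT * FROM Customers", connectionString);
var builder = new SqlCommandBuilder(adapter); // generates the INSERT/UPDATE/DELETE commands

var originalDt = new DataTable();
adapter.Fill(originalDt);

// ... modify rows in originalDt (add, update, delete) ...

DataTable updateDt = originalDt.GetChanges(); // only the rows that actually changed
if (updateDt != null)
    adapter.Update(updateDt);                 // pushes just the differences to the database
originalDt.AcceptChanges();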
I can't recommend a specific practical strategy that you can code yourself, but I recommend that you look at the services provided by the Entity Framework ( SaveChanges, the ObjectContext, adding/removing Entities from the ObjectContext, Include keyword, Navigation and Association properties of related Entities, etc. ) as an estimate of the range and complexity of issues you need to solve.
SQL Server replication framework can also give you some hints ( Merge conflict resolution strategies ).
Right now I'm working on a pretty complex database. Our object model is designed to be mapped to the database. We're using EF 5 with POCO classes, manually generated.
Everything is working, but there's some complaining about performance. I've never had performance problems with EF, so I'm wondering if this time I just did something terribly wrong, or if the problem lies somewhere else.
The main query may be composed of dynamic parameters. I have several if and switch blocks that are conceptually like this:
if (parameter != null) { query = query.Where(c => c.Field == parameter); }
Also, for some complex And/Or combinations I'm using LinqKit extensions from Albahari.
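For illustration, a sketch of how such a dynamic And combination might be composed with LinqKit's PredicateBuilder (the Order entity, fromDate and customerIds are placeholders):
// requires LinqKit
var predicate = PredicateBuilder.True<Order>();
if (fromDate != null)
    predicate = predicate.And(o => o.Date >= fromDate.Value);
if (customerIds != null && customerIds.Any())
    predicate = predicate.And(o => customerIds.Contains(o.CustomerId));

// AsExpandable lets EF translate the composed predicate
var query = context.Orders.AsExpandable().Where(predicate);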
The query is against a big table of "Orders", containing years and years of data. The average use is a 2 months range filter though.
Now when the main query is composed, it gets paginated with a Skip/Take combination, where the Take is set to 10 elements.
After all this, the IQueryable is sent through the layers and reaches the MVC layer, where AutoMapper is employed.
Here, when AutoMapper starts iterating (and thus the query is really executed), it touches a bunch of navigation properties, which have their own navigation properties and so on. Everything is set to lazy loading, following the EF recommendation to avoid eager loading when you have more than 3 or 4 distinct entities to include. My scenario is something like this:
Orders (maximum 10)
Many navigation properties under Order
Some of these have other navigation under them (localization entities)
Order details (many order details per order)
Many navigation properties under each Order detail
Some of these have other navigation under them (localization entities)
This easily leads to a total of 300+ queries for a single rendered "page". Each of those queries is very fast, running in a few milliseconds, but there are still 2 main concerns:
The lazy-loaded properties are fetched sequentially rather than in parallel, which takes more time
As a consequence of the previous point, there is some dead time between queries, since for each one the database has to receive the SQL, run it and return the results
Just to see how it went, I tried the same query with eager loading, and as I predicted it was a total disaster, with a translated SQL of more than 7K lines (yes, seven thousand) and much slower overall.
Now I'm reluctant to think that EF and Linq are not the right choice for this scenario. Some are saying that if they were to write a stored procedure which fetches all the needed data, it would run tens of times faster. I don't believe that to be true, and we would lose the automatic materialization of all related entities.
I thought of some things I could do to improve, like:
Table splitting to reduce the selected columns
Turn off object tracking, as this scenario is read-only (use untracked entities)
With all of this said, the main complaint is that the result page (done in MVC 4) renders too slowly, and after a bit of diagnostics it seems to be all "Server Time" and not "Network Time", taking about 8 to 12 seconds of server time.
From my experience, this should not be happening. I'm wondering if I'm approaching this query in the wrong way, or if I have to turn my attention to something else (maybe a badly configured IIS server, or anything else; I'm really clueless). Needless to say, the database has its indexes in order, checked very carefully by our DBA.
So if anyone has any tip, advice, best practice I'm missing about this, or just can tell me that I'm dead wrong in using EF with Lazy Loading for this scenario... you're all welcome.
For a very complex query that brings up tons of hierarchical data, stored procs won't generally help you performance-wise over LINQ/EF if you take the right approach. As you've noted, the two "out of the box" options with EF (lazy and eager loading) don't work well in this scenario. However, there are still several good ways to optimize this:
(1) Rather than reading a bunch of entities into memory and then mapping via automapper, do the "automapping" directly in the query where possible. For example:
var mapped = myOrdersQuery.Select(o => new OrderInfo { Order = o, DetailCount = o.Details.Count, ... })
// by deferring the load until here, we can bring only the information we actually need
// into memory with a single query
.ToList();
This approach works really well if you only need a subset of the fields in your complex hierarchy. Also, EF's ability to select hierarchical data makes this much easier than using stored procs if you need to return something more complex than flat tabular data.
(2) Run multiple LINQ queries by hand and assemble the results in memory. For example:
// read with AsNoTracking() since we'll be manually setting associations
var myOrders = myOrdersQuery.AsNoTracking().ToList();
var orderIds = myOrders.Select(o => o.Id);
var myDetails = context.Details.Where(d => orderIds.Contains(d.OrderId)).ToLookup(d => d.OrderId);
// reassemble in memory
myOrders.ForEach(o => o.Details = myDetails[o.Id].ToList());
This works really well when you need all the data and still want to take advantage of as much EF materialization as possible. Note that in most cases a stored proc approach can do no better than this (it's working with raw SQL, so it has to run multiple tabular queries), but it can't reuse logic you've already written in LINQ.
(3) Use Include() to manually control which associations are eager-loaded. This can be combined with #2 to take advantage of EF loading for some associations while giving you the flexibility to manually load others.
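As a rough sketch of combining the two (the Details and Customer navigation properties here are assumptions):
var orders = myOrdersQuery
    .Include("Details")   // eager-load the order details in the same query
    .Include("Customer")  // while leaving the remaining navigations to manual or lazy loading
    .ToList();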
Try to think of an efficient yet simple SQL query to get the data for your views.
Is it even possible?
If not, try to decompose (denormalize) your tables so that fewer joins are required to get the data. Also, are there efficient indexes on the table columns to speed up data retrieval?
If yes, forget EF, write a stored procedure and use it to get the data.
Turning tracking off for selected queries is a must for a read-only scenario. Take a look at my numbers:
http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html
As you can see, the difference between the tracking and no-tracking scenarios is significant.
I would experiment with eager loading, but not everywhere (so you don't end up with a 7k-line query), only in selected subqueries.
One point to consider: EF definitely helps make development time much quicker. However, you must remember that when you're returning lots of data from the DB, EF is using dynamic SQL. This means that EF must 1. create the SQL, and 2. SQL Server then needs to create an execution plan; this happens before the query is run.
When using stored procedures, SQL Server can cache the execution plan (which can also be tuned for performance), which makes it faster than using EF. BUT... you can always create your stored proc and then execute it from EF. Any complex procedures or queries I would convert to stored procs and then call from EF. Then you can see your performance gain(s) and re-evaluate from there.
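For example, a minimal sketch of calling a stored procedure from an EF 5 DbContext (the GetOrdersByDateRange procedure and the OrderDto type are assumptions):
// requires System.Data.SqlClient for SqlParameter
var from = new SqlParameter("@from", fromDate);
var to = new SqlParameter("@to", toDate);

// Executes the proc and materializes the flat result set into a simple DTO
var orders = context.Database
    .SqlQuery<OrderDto>("EXEC GetOrdersByDateRange @from, @to", from, to)
    .ToList();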
In some cases, you can use Compiled Queries (MSDN) to improve query performance drastically. The idea is that if you have a common query that is run many times and might generate the same SQL call with different parameters, you compile the query the first time it's run and then pass it around as a delegate, eliminating the overhead of Entity Framework regenerating the SQL for each subsequent call.
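A sketch of the idea, assuming an ObjectContext-based model named MyEntities with an Orders set (note that CompiledQuery works with ObjectContext rather than DbContext):
// CompiledQuery lives in System.Data.Objects in EF 5 on .NET 4.x
static readonly Func<MyEntities, int, IQueryable<Order>> OrdersByCustomer =
    CompiledQuery.Compile((MyEntities ctx, int customerId) =>
        ctx.Orders.Where(o => o.CustomerId == customerId));

// Subsequent calls reuse the already-translated query:
// var orders = OrdersByCustomer(context, 42).ToList();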
Currently our new database design is changing rapidly and I don't always have time to keep up to date with the latest changes being made. Therefore I would like to create some basic integration tests that are basically sanity checks on my mappings against the database.
Here are a few of the things I'd like to accomplish in these tests:
Detect columns I have not defined in my mapping but exist in the database
Detect columns I have mapped but do NOT exist in the database
Detect columns that I have mapped where the data types between the database and my business objects no longer jive with each other
Detect column name changes between database and my mapping
I found the following article by Ayende but I just want to see what other people out there are doing to handle these sorts of things. Basically I'm looking for simplified tests that cover a lot of my mappings but do not require me to write separate queries for every business object in my mappings.
I'm happy with this test, that comes from the Ayende proposed one:
[Test]
public void PerformSanityCheck()
{
    foreach (var s in NHHelper.Instance.GetConfig().ClassMappings)
    {
        Console.WriteLine(" *************** " + s.MappedClass.Name);
        NHHelper.Instance.CurrentSession
            .CreateQuery(string.Format("from {0} e", s.MappedClass.Name))
            .SetFirstResult(0).SetMaxResults(50).List();
    }
}
I'm using a plain old query since this version comes from a very old project and I'm too lazy to update it to QueryOver or Linq2NH or something else...
It basically pings all the mapped entities in the configuration and grabs some data too, in order to see that everything is OK. It does not catch the case where a field exists in the table but not in the mapping, which can cause persistence problems if the column is not nullable.
I'm aware that Fabio Maulo has something eventually more accurate.
As a personal consideration, if you are thinking about improvements, I would try a strategy like this: since the mappings are browsable through the API, look for every explicit/implicit table declaration in the map and ping it against the database using the standard schema helper classes available inside NH (they ultimately use the ADO.NET schema classes, but they insulate you from all the configuration work already done in NH itself). By playing a little with the naming strategy, you can achieve a one-by-one table/field checklist. A further improvement would be, in case of an unmatched field, to look for a candidate by applying the Levenshtein distance to all the available names and choosing one if some threshold requirements are satisfied. This is of course useless in class-first scenarios where the DB schema is generated by NH itself.
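A rough sketch of that idea, assuming an NHibernate Configuration instance and an open ADO.NET connection are in scope (the check itself is illustrative, not an existing NH API):
// Compare each mapped table's columns against the live database schema
foreach (var persistentClass in configuration.ClassMappings)
{
    string tableName = persistentClass.Table.Name;

    // ADO.NET schema query: the restriction array is { catalog, schema, table, column }
    var schemaColumns = new HashSet<string>(
        connection.GetSchema("Columns", new string[] { null, null, tableName, null })
            .Rows.Cast<DataRow>()
            .Select(r => (string)r["COLUMN_NAME"]),
        StringComparer.OrdinalIgnoreCase);

    foreach (var column in persistentClass.Table.ColumnIterator)
        if (!schemaColumns.Contains(column.Name))
            Console.WriteLine("{0}: mapped column '{1}' not found in database", tableName, column.Name);
}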
I use this one too:
Verifying NHibernate Entities Contain Only Virtual Members
Using LINQ to Entities sounds like a great way to query against a database and get actual CLR objects that I can modify, data bind against and so forth. But if I perform the same query a second time do I get back references to the same CLR objects or an entirely new set?
I do not want multiple queries to generate an ever-growing number of copies of the same actual data. The problem is that I could alter the contents of one entity and save it back to the database, but another instance of the entity would still exist elsewhere, holding the old data.
Within the same DataContext, my understanding is that you'll always get the same objects - for queries which return full objects instead of projections.
Different DataContexts will fetch different objects, however - so there's a risk of seeing stale data there, yes.
In the same DataContext you would get the same object if it's queried (the DataContext maintains an internal cache for this).
Be aware that the objects you deal with are most likely mutable, so instead of one problem (data duplication) you can get another (concurrent access).
Depending on business case it may be ok to let the second transaction with stale data to fail on commit.
Also, imagine a good old IDataReader/DataSet scenario. Two queries would return two different readers that would fill different DataSets. So the data duplication problem isn't ORM-specific.
[oops; note that this reply applies to Linq-to-SQL, not Entity Framework.]
I've left it here (rather than delete) because it is partly on-topic, and might be useful.
Further to the other replies, note that the data context also has the ability to avoid a round-trip for simple "by primary key" queries - it will check the cache first.
Unfortunately, it was completely broken in 3.5, and is still half-broken in 3.5SP1, but it works for some queries. This can save a lot of time if you are getting individual objects.
So basically, IIRC you need to use:
// uses object identity cache (IIRC)
var obj = ctx.Single(x => x.Id == id);
But not:
// causes round-trip (IIRC)
var obj = ctx.Where(x => x.Id == id).Single();