C# & EF: BulkInsert related objects

I am currently playing with EntityFramework.BulkInsert.
While it really helps with the performance of simple inserts (16 seconds for 1,000,000 rows), I can't find any information about inserting objects mapped over multiple tables. The only thing related to that is an old (2014) topic on the official website stating that it isn't possible. Is this still the case?
If so: are there any good workarounds?

EntityFramework.BulkInsert is a very good library which supports simple scenarios. However, the library is limited and no longer supported.
So far there is only one good workaround, and that's using a library which supports everything!
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library supports everything including all associations and inheritance.
For example, for saving multiple entities in different tables, you can use BulkSaveChanges, which works exactly like SaveChanges but way faster!
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
The library also does more than just inserting. It supports all bulk operations:
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
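To the best of my knowledge these are exposed in the same extension-method style as BulkSaveChanges above; here is a minimal sketch (MyDbContext, Customer and the customers list are placeholders, and each call is shown independently for illustration only):
// Sketch only - each call illustrated on its own, not as one workflow
using (var context = new MyDbContext())
{
    context.BulkInsert(customers);   // insert new rows only
    context.BulkUpdate(customers);   // update existing rows
    context.BulkMerge(customers);    // "upsert": insert or update as needed
    context.BulkDelete(customers);   // delete the given rows
}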
However unlike EntityFramework.BulkInsert, this library is not free.
EDIT: Answering a sub-question
You say way faster - do you have any metrics or a link to metrics?
@Mark: You can find metrics on our website homepage. We report BulkSaveChanges to be at least 15x faster than SaveChanges.
However, metrics are heavily biased. Too many things can affect them, such as indexes, triggers, latency, etc.!
People usually report performance improvements of 25x, 50x, even 80x!
One thing people usually forget when benchmarking is to call our library once before the test, for JIT compilation! Like Entity Framework, the first hit to the library may take several ms.
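For what it's worth, a minimal benchmarking sketch of that advice - one warm-up call before timing (MyContext, Customer and the customers list are hypothetical; Stopwatch is System.Diagnostics.Stopwatch):
// Warm-up call so JIT compilation isn't charged to the measurement
using (var warmup = new MyContext())
{
    warmup.Customers.Add(new Customer { Name = "warmup" });
    warmup.BulkSaveChanges();
}
// Now time the real payload
var sw = System.Diagnostics.Stopwatch.StartNew();
using (var context = new MyContext())
{
    context.Customers.AddRange(customers);   // e.g. the 1,000,000-row list
    context.BulkSaveChanges();
}
sw.Stop();
Console.WriteLine("BulkSaveChanges: {0} ms", sw.ElapsedMilliseconds);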

If I had a bulk insert problem, I'd not use EF. EF is meant to map objects representing entities during the normal use-cases of your application, where any given transaction should only really touch one entity (assuming your entities are designed around sensible consistency boundaries).
If I were moving lots of data around (imports/exports/transformations, etc.) I would use SQL more directly, where I have more control.
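For the record, a minimal sketch of that more direct route using SqlBulkCopy from System.Data.SqlClient (the Person table, its columns and the people list are made up for illustration):
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void BulkCopyPeople(IEnumerable<Person> people, string connectionString)
{
    // Stage the rows in a DataTable, then push them in one streamed operation
    var table = new DataTable();
    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("Email", typeof(string));
    foreach (var p in people)
        table.Rows.Add(p.Name, p.Email);

    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Person";
        bulk.BatchSize = 10000;
        bulk.WriteToServer(table);
    }
}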

Related

Improving Entity Framework Performance during testing

I'm trying to reduce the startup time for tests against an EF 6.x datastore. The tests are within a transaction and the db gets rolled back once done. I would appreciate any suggestions on how to retain an instance of the DbContext between test sessions so that EF doesn't have to go through the whole view-generation process again.
I don't want to use mocks/fakes or a non-Microsoft branch of EF; interactive views are already in place. Thank you.
There are different options. As you did not mention your aim of testing and there is no code, the options are:
1. If you are inserting many records into your tables, you can do a bulk insert. The best library for doing this is EntityFramework.BulkInsert-ef6; you can install it through the NuGet console (see the sketch after this list).
2. If you see slowness while working with data and you have many load/manipulation/save operations, you have to do in-memory operations, as Sampath recommends.
3. If you are loading data, load only the columns that you need. You should also use the lazy loading option (which, from your post, I think you know well).
4. Some portion of the slowness could be because of the architecture of your database. The key column types have a considerable effect on Where operations!
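As promised in point 1, a minimal sketch of a bulk insert with that package - the namespace and method name are from memory, so treat them as assumptions (MyContext and Customer are placeholders):
using System.Linq;
using EntityFramework.BulkInsert.Extensions;   // assumed namespace from the NuGet package

var customers = Enumerable.Range(0, 100000)
                          .Select(i => new Customer { Name = "Customer " + i })
                          .ToList();

using (var context = new MyContext())
{
    // One SqlBulkCopy round trip instead of one INSERT per entity via SaveChanges()
    context.BulkInsert(customers);
}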
I would like to recommend using in-memory data for that. I have also used this pattern, and it works really well and is very fast. This is a pattern recommended by the industry and trouble-free in the long run. Always try to use best practices when you develop a software app.
When writing tests for your application it is often desirable to avoid hitting the database. Entity Framework allows you to achieve this by creating a context – with behavior defined by your tests – that makes use of in-memory data.
Here is the article about how to do that: Testing with a mocking framework
Another article for you: Unit testing in C# using xUnit, Entity Framework, Effort and ASP.NET Boilerplate
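For a flavour of what the first article walks through, here is a condensed sketch using Moq; Blog, BloggingContext and BlogService are invented names, and the context's set properties must be declared virtual for the mock to override them:
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;
using Moq;

var data = new List<Blog>
{
    new Blog { Name = "AAA" },
    new Blog { Name = "BBB" },
}.AsQueryable();

// Wire the mock DbSet's IQueryable plumbing to the in-memory list
var mockSet = new Mock<DbSet<Blog>>();
mockSet.As<IQueryable<Blog>>().Setup(m => m.Provider).Returns(data.Provider);
mockSet.As<IQueryable<Blog>>().Setup(m => m.Expression).Returns(data.Expression);
mockSet.As<IQueryable<Blog>>().Setup(m => m.ElementType).Returns(data.ElementType);
mockSet.As<IQueryable<Blog>>().Setup(m => m.GetEnumerator()).Returns(() => data.GetEnumerator());

var mockContext = new Mock<BloggingContext>();
mockContext.Setup(c => c.Blogs).Returns(mockSet.Object);

// Code under test never touches the database
var service = new BlogService(mockContext.Object);
var names = service.GetAllBlogNames();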

Subsonic ORM experience

I'm looking for a new ORM for an important project. I'm used to nHibernate with ActiveRecord, and I already had a very bad experience with EF4: performance problems and GUI crashes.
Searching the web I found SubSonic, and I liked what I read in the documentation.
So, I would like to know if anyone has already used SubSonic and whether the experience was good.
Hmm ... well ... how should I put it....
I am currently (as in right now) expending effort to replace SubSonic with PetaPoco. I suppose that says something.
It's not that SubSonic was bad exactly, but it didn't fit my way of developing very well. And for people looking to adopt it at this point, it seems very important to note the absolute lack of activity on the project.
First, the biggest reason SubSonic didn't fit me was LINQ.
There is allure in having compiler checking of all property use, to be sure. However, in practice, it simply was not well suited to querying.
If you stick very closely to class-per-table & ActiveRecord use, I suppose it is ok. But whenever we had to make any query beyond that (anything involving multiple tables or anything beyond the simplest where clauses), it was a nightmare. Associations cannot be used directly in a SubSonic LINQ query, like they can in EF or nHibernate, which was probably the largest pain point.
For example, a query like this will not work in SubSonic, but it would in EF:
db.Accounts.Where(a => a.OwningUser.Email != null);
Where I ended up was either making many round trips to the database to assemble a result, or using SubSonic's CodingHorror class to query directly with SQL, and being unable to simply materialize them as a POCO (again, when going beyond simple class-per-table).
I also found that every LINQ provider supports different sets of operations, and sometimes the same logical operation has slightly different syntax and usage between providers. This made writing most queries very time consuming and error prone. SubSonic's LINQ provider has no shortage of quirks and missing features. It doesn't come anywhere close to Linq-2-SQL, Entity Framework, or LINQ to nHibernate in terms of supported operations, usability, or speed of execution (be ready to learn new ways of writing joins in LINQ just for SubSonic - and be ready to have some common operations simply not be possible with SubSonic's LINQ provider, despite being known bugs for a year).
In addition to the drag on productivity, it is easy to forget that the LINQ code you are writing is very provider specific. ANSI SQL is far more standard and cross-compatible than LINQ.
LINQ also seduced me with the possibilities of reusing code with techniques like Specifications, but fleshing these out was far from easy, and the end result was not even close to worth the effort. The roadblocks I encountered here were largely due to the fact that SubSonic's LINQ provider had no support for associations.
SubSonic's facilities outside of LINQ I felt were mediocre at best (in my opinion).
Second, it is important to know that by all measures SubSonic is not an active project.
The initial creator of SubSonic, Rob Conery, no longer works on the project. The last commit Rob made was in July 2010.
The last commit to the project at all was 3 months ago, despite nearly 100 outstanding issues. And as far as I can tell there hasn't been any release, not even a minor point release, since Rob ceased working on SubSonic (though the folks still hanging around the project have been talking about a release for more than half a year).
The Google Group for SubSonic used to be active, but these days not so much. And also the official website for the SubSonic project has been yellow-screening-of-death for a while (The site no longer yellow screens).
The new hotness in data access is micro-ORMs. SubSonic's creator, actually, kind of kicked this trend off with Massive, followed soon after by the StackExchange crew releasing Dapper, and later PetaPoco came out. There are a couple more, too. And while we're giving up a little compiler checking by having SQL snippets in our code base, I find the micro-ORM fits my development style much better than SubSonic did.
My experience (albeit limited) with nHibernate was that it is overly complicated for most scenarios, and even when it is appropriate it absolutely murdered my application start-up times. There was also a high learning curve (which you may be past), and there are several ways to do... basically everything... so it just adds that many more decisions to my process (slowing me down).
With PetaPoco, I can write familiar SQL - I am quick and reasonably good with that - and materialize them into POCO's, which I know what the heck to do with immediately. A little sprinkling of architecture and organization and automated integration testing and I don't at all feel dirty about embedding bits of SQL.
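To make that concrete, a minimal PetaPoco sketch - the Account POCO, the SQL and the "mainDb" connection-string name are all made up for illustration:
// Plain POCO - no base class, no ORM-specific baggage
public class Account
{
    public int Id { get; set; }
    public string Email { get; set; }
    public decimal Balance { get; set; }
}

// Familiar SQL in, hydrated POCOs out
var db = new PetaPoco.Database("mainDb");
var accounts = db.Fetch<Account>(
    "SELECT a.* FROM Account a INNER JOIN [User] u ON u.Id = a.OwningUserId WHERE u.Email IS NOT NULL");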
Oh, and I suppose last thing - SubSonic is far from the fastest way to get data. May not be important, but it turned out to be for us.
In conclusion (sorry for the wall of text):
It's not that SubSonic is bad in any absolute sense. It just didn't seem to fit the ways I tried to use it well at all - and a large part of that is because LINQ is still a leaky abstraction, and it is leaky in different ways than I am used to.
The fact that development efforts are nearly non-existent is good and bad. Good, it is stable and considered "finished" in a sense. Bad, it lacks features, possibly has some bugs, and isn't the best performer - and there's no one working to improve that.
Some time ago, I was looking for a simple ORM for a small application and SubSonic was just what I needed. The setup is easy and I didn't need much time to add some persistence to my domain classes. What I liked about it was the option to auto-migrate the database model based on the domain classes.
The downside is that the feature set is rather limited. The things I missed most were the option to fetch complete object graphs and support for additional indexes. SubSonic has its use as a persistence tool for small apps, but for important or big apps I would rather use nHibernate or a commercial ORM like LLBLGen.
Before choosing an ORM, you should decide on the basic data access requirements. Do you want to use the Active Record pattern or the Adapter pattern? What about concurrency, performance, inheritance, etc...
I used SubSonic; it's good as long as you are using simple queries. When I started to have more complex queries, I saw that it lacks LINQ features. After googling a little I switched to http://bltoolkit.net, and since that time (about 2 years now) I'm very happy with it. Plus, it is one of the fastest ORMs according to http://ormeter.net/. Take a look at it, you won't be sorry.

pluggable data store architectures

I have a pluggable system management tool. The architecture of this kind of thing is well understood (interfaces, publish/subscribe, ...). How about the data store, though? What do people do?
I need plugins to be able to add new entities, extend existing entities, establish new relationships, etc.
My thoughts (SQL), not necessarily well thought out
Each plugin simply extends the schema when it is installed. In the old days changing the schema was a big no-no; now databases are very relaxed about this.
Plugins have their own tables. If two of them have an entity (say) Person, then there are two tables, p1_person and p2_person.
Plugins have their own database.
Invent some sort of flexible scheme where the tables are softly typed. Maybe many attributes packed into a single attribute. The ultimate is one big table called data, with a key of table name & column name and a single data value.
Not SQL
Object DB. I have no experience with these - anybody care to pass on experience? db4o, for example. Can I change the 'schema' of objects as the app evolves?
NO-SQL
This is 'where it's at' at the moment. Most of these seem to be aimed slightly differently from my needs. Anybody want to pass on experience with these?
Apologies for the open ended question
My suggestion is to go read about the Entity Framework:
a lot of the situations you are describing can be solved (very elegantly) using table inheritance.
Your idea of one big table called data makes the hamsters in my computer cry ;)
The general trend is away from weakly typed schemas because they cannot be debugged at compile time. What you get from something like Entity Framework is a strongly typed, extensible schema that you can code against using LINQ.
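For illustration, a minimal sketch of table-per-type inheritance with EF Code First (the types, table names and the 4.1-style DbContext API are assumptions, not part of the original discussion):
using System.Data.Entity;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// A plugin's extension of the core entity gets its own table,
// joined to the base table on the shared primary key
public class AuditedPerson : Person
{
    public string AuditTrail { get; set; }
}

public class PluginContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Person>().ToTable("core_person");
        modelBuilder.Entity<AuditedPerson>().ToTable("audit_person");
    }
}
Querying context.People.OfType<AuditedPerson>() then returns only the extended rows, with the SQL join generated for you.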
Object databases:
Like you, I haven't played with them massively - however, the time when I was considering them was a time when there was no good ORM for .NET and writing ADO.NET code was slowly killing me.
As for NoSQL, these are databases that meet a performance need. SQL performs badly in situations where there are lots of small writes occurring. I say badly tongue in cheek - it performs very well, but when you scale to millions of concurrent users everything changes. My understanding of NoSQL is that it is a non-normalised format designed for lots of small, fast writes and reads. The scale of sites that use these is usually very large.
OK - in response
I am currently lucky enough to be on a greenfield project, so I am using EF to generate my schema.
On non-greenfield projects I use SQL scripts to update my table structures. As for implementing table inheritance in SQL, it's very easy once you know the concept: it's essentially a one-to-many relationship with a constraint that it will only ever be 0-1.
I wouldn't write .NET code that updates the database structure... that sounds like a disaster waiting to happen to me.
Beginning to think I have misunderstood what you are looking for. I find databases to be second nature, as I have spent so long with them.
I haven't found a replacement for being meticulous about script management.

DAL "Typed DataSets" or Custom Business Object

I would like your opinions regarding "DataSet Designer" and DAL (Data Access Layer) best practices.
I use Visual Studio 2010 and .NET Framework 4.0.
From my understanding, the "DataSet Designer" allows me to automatically create strictly typed DataSets with DataTables and Adapters; this constitutes a DAL directly in Visual Studio 2010.
I would like to know:
- Whether, in a real scenario, the "DataSet Designer" works well, or whether it is better to write custom business objects.
- Whether other new solutions were introduced in .NET 4.0.
Thanks for your support! :-)
I have to work with typed datasets and it is a nightmare. If you have an option never use them. Everything is better.
With the advent of the .NET 4.0 framework and the introduction of LINQ to SQL, I've been adopting a customized DAL of strictly written business objects. We experimented with Entity Framework briefly, but ultimately concluded that it is very similar to DataSets in that the auto-generated code, while handy, is just too bloated with extra junk that we ultimately didn't use.
We've found that by writing LINQ in our DAL and extracting data pulls into our custom classes, we are able to streamline our data access and control the usage of the data functionally. It has been a very handy process, but it has taken a little while for the junior developers to get a grip on it.
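As an illustration of that approach, a minimal sketch - every name here (CustomerSummary, SalesDataContext, the columns) is invented; the point is the LINQ projection straight into a hand-written class:
using System.Collections.Generic;
using System.Linq;

// Hand-written business object - no generated base class, no tracking baggage
public class CustomerSummary
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int OpenOrders { get; set; }
}

public class CustomerRepository
{
    public IList<CustomerSummary> GetActiveCustomers()
    {
        using (var db = new SalesDataContext())   // generated LINQ to SQL context, assumed
        {
            return db.Customers
                     .Where(c => c.IsActive)
                     .Select(c => new CustomerSummary
                     {
                         Id = c.CustomerId,
                         Name = c.Name,
                         OpenOrders = c.Orders.Count(o => !o.IsShipped)
                     })
                     .ToList();
        }
    }
}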
I would suggest an ORM like Entity Framework or NHibernate.
DataSets smell too much of the database way of thinking, and I personally had a lot of problems working with them. They just break quite often and throw weird errors that are hard to troubleshoot.
Some other related questions you may find interesting
What are the advantages of using an ORM?
ASP.NET DataSet vs Business Objects / ORM
Use ADO.NET Entity Framework, which is where the future of Microsoft's ORM is going. Or, consider an open-source one like NHibernate...
HTH.
At my company we've been using Typed DataSets for a little while now, and have had a generally positive experience. I understand that many people don't like DataSets, and there are certainly newer data access tools out there, but since you asked about a real-world scenario, here are some of my requirements and findings:
Need to be able to read SQL Server, MS Access, and FoxPro data sources
SQL Server access is only through SPROC calls (not my choice)
Relatively easy to learn, especially to developers new to ASP.NET
I've personally explored low-level ADO.NET access, typed DataSets, LINQ to SQL, and simply writing custom data access classes. I have not looked at the Entity Framework yet, as the version included in VS2008 seemed to have mixed reviews, and I did not have access to VS2010 until just recently (I do plan to review EF sometime this year).
We chose to use Typed DataSets because they seemed to offer faster development against SPROCS and we found a very comprehensive tutorial by Scott Mitchell on the asp.net site: http://www.asp.net/data-access/tutorials.
As to our experience thus far, it has mostly been good. The DataSet designer generates a huge amount of code even for a small number of tables (<20). Making changes in the SPROCs has caused a few headaches, but I'd like to be shown a tool that would make this easier.
One thing you might try to make your decision easier: Come up with a small domain problem like a customer edit page or order entry page, and implement it multiple times using a variety of technologies. It takes some time to do this, but it is a good way to learn and you can compare the technologies for yourself. We did this and it seemed to help a lot.
I personally prefer custom business objects for their flexibility, but it's more work. Also look at Entity Framework and LINQ to SQL. Entity Fx has a lot more flexibility in .NET 4.0. This article should get you started with Entity Fx.
If anything I think you should look into Entity Framework. There are lots of great tutorials out there to get you started.
I personally agree with Joel Etherton, conditionally.
If you have a small enough project that even with EF's bloat you're still not looking at too much shenanigan-code, I would say the expediency it offers is worthwhile. However, in larger codebases, so much bloat can become a lot to get your hands around.
The other benefit of EF vs. older-style business objects, which goes unmentioned, is that with an EF implementation you will probably get easier upgrades to newer .NET versions, taking advantage of improvements in the next .NET without having to rewrite a bunch of code by hand. (This can also be a double-edged sword, as upgrading to a new .NET version with EF may affect the behaviour of your DAL, whereas a hand-written DAL is less likely to be affected.)
That said, I agree with Joel Etherton: write the simplest, smallest DAL you can using LINQ; the DAL is too important to make overly complex whenever that can be avoided.
If you do not want to waste your time, do not learn DataSets. Study the general concepts of object-relational mapping, their pros and cons. Look at projects like Hibernate for Java or Doctrine for PHP. The approach behind DataTables and DataSets, which merely wrap database objects, is over. Your framework should guide you to design your domain model, not the database schema.
NHibernate. Especially if you are using Oracle.

Please recommend .NET ORM for N-tier development [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I need to carefully choose a .NET ORM for an N-tier application.
That means I will have a server (WCF service), which exposes the data, and a client, which displays it.
The ORM should handle all the related serialization issues smoothly - the objects, collections of objects, or whatever must travel across process boundaries. Ideally, usage in a multi-process environment should be the same as in a single process.
The criteria are:
Flexibility of db schema mapping to objects (preferred)
Ease of use
Free, open source (preferred)
Must be suitable for N-tier (multi-process multi-domain applications)
Performance
Tools to integrate with Visual Studio (preferred)
Testability
Adoption, availability of documentation
Wide range of RDBMS supported (preferred; we are using MSSQL, but I wouldn't like to be tied to it)
DB agnostic - different DBs, same API
Having worked with the following:
NHibernate
LLBLGen
Entity Framework
LINQ to SQL
DataObjects.Net
OpenAccess
DataTables
I can most certainly say that DataTables are superior...no just kidding. All of them do have their strengths and weaknesses.
Mainly, I have found that these strengths and weaknesses are associated with the general type of ORM, which falls into one of the following two categories:
Heavy-weight
LLBLGen, OpenAccess, Entity Framework (pre 4.0), and DataObjects all fall into this category. Heavy-weight ORMs typically have entities and collections that inherit from a base class specific to the ORM (i.e. EntityBase). These ORMs often offer rich design-time support, code generation and in-depth runtime features (such as state tracking, transaction tracking, association maintenance, etc.).
The Pro: Easier, faster development upfront, leveraging the built-in API for interacting with the entities themselves at runtime (e.g. from LLBLGen: entity.Fields["MyField"].IsChanged, entity.IsNew, or entity.Fields["MyField"].DbValue).
The Con: Heaviness and dependencies. With these ORMs, your business objects are now tied directly into the ORM API. What if you want to change to another ORM? And what's to prevent junior developers from abusing advanced features of the ORM API to fix a simple problem with a complex solution (I've seen this a ton)? Memory usage is also a big problem with these ORMs. A collection of 5,000+ entities can easily take up 100MB of RAM with some of the above ORMs. Serialization is another problem. With the heaviness of the objects, serialization can be very slow... and it probably won't work correctly when deserializing on the other side of the wire (WCF or .NET Remoting, etc.). Associations may end up not re-associated correctly, or certain fields may not be preserved. Several of the ORMs above have built-in serialization mechanisms to improve support and speed... but none that I've seen offer full support for different formats (i.e. you get binary, but not JSON or XML serialization support).
Light-weight
LINQ to SQL, Entity Framework POCO, and NHibernate (sort of) fall into this category. Light-weight ORMs typically use POCOs that you can design yourself in a file in VS (of course you can use a T4 template or a code generator too).
The Pro: Lightweight. Keeps it simple. ORM agnostic business objects.
The Con: Fewer features, such as entity graph maintenance, state tracking, etc.
Regardless of which ORM you choose, my personal preference is to stick to LINQ as an ORM-independent retrieval syntax (not the ORM's own API for fetching, if it has one).
With regard to the specific ones mentioned, here are my brief thoughts:
NHibernate: Behind the times tech-wise. Lots of maintenance of XML mapping files (though Fluent NHibernate does alleviate this).
LLBLGen: Most mature ORM I've worked with. Easy to start up a new project and get going. Best designer. VERY heavy weight. Rich VERY powerful API. Best speed I've encountered. Because it's not the new kid on the block, some of the newer features leveraging newer technology aren't implemented as well as they should be (LINQ specifically).
Entity Framework: POCO support in 4.0 looks promising. 3.5 doesn't have this and I wouldn't even consider it. Even in 4.0, it doesn't have great LINQ support and generates poor SQL (it can make hundreds of DB queries for a single LINQ query). Designer support is poor when it comes to larger projects.
LINQ to SQL: Great LINQ support (better than any other except DataObjects.Net). Mediocre persistence (save/update) support. Very lightweight (POCO). Designer support is poor all around (no refresh from DB). Poor performance on advanced LINQ queries (can make hundreds of DB queries)
DataObjects.Net: Really great LINQ support and performance. Offers the closest thing I've seen to POCO in a heavy-weight ORM. Really new, powerful, promising technology. Very flexible.
OpenAccess: Haven't worked with it a ton, but it reminds me somewhat of LLBLGen, but not as feature rich or mature.
DataTables: No comment
Why not try NHibernate?
I would recommend Entity Framework v4. It has improved dramatically since v1, and supports everything you require except being open source:
EF supports a very wide variety of mappings, including TPH, TPT, and TPC. Supports POCO mapping, allowing you to keep your persistence logic separate from your domain.
EF has extensive and excellent support for LINQ, providing easy to use, compile-time checked querying of your model. EF Futures components such as Code-Only simplify working with EF even more, providing a pure code, compile-time checked, fluent API for defining your model. By opting for convention over configuration, Code-Only can radically reduce your model design time, allowing you to get down to business without all the hassle of tinkering with a visual model and multiple XML mapping files.
It is free as part of .NET 4. (Sorry, Open Source preference can't be met here.)
EF provides an excellent N-Tier solution OOB via self-tracking entities
Self-tracking information uses an open xml format to transfer tracking data, so tracking support could be added to non-.NET platforms
Performance of EF v4 is very good, as extensive work was done on the query generator
See the ADO.NET Blog entry on the subject
EF provides extremely rich visual design tools, and allows extensive customization of code generation via custom T4 templates and workflows
EF v4 introduced numerous interfaces, including the IObjectSet<T> and IDbSet<T> interfaces, which greatly improve the unit testability of your custom contexts
EF v4 is an integral part of .NET 4 and a central component of all of Microsoft's current and future data initiatives. As a part of .NET, its documentation is quite extensive: MSDN, the EF Design Blog, the ADO.NET Blog, and dozens of .NET and programming sites and blogs provide a tremendous amount of documentation and support for the platform.
Another vote for EF here.
Very easily unit testable. You can write your own domain entities and keep them reasonably free of persistence awareness using the POCO approach. You can then mock the database interface and test the application logic without an actual database.
It supports LINQ, so if you write your LINQ properly, it will translate into a single SQL statement sent to the server.
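A rough sketch of that persistence-ignorant setup - IOrderContext, Order and OrderTotals are invented names, and the EF 4.1-style DbContext/IDbSet API is assumed; the idea is that tests supply a fake IOrderContext backed by in-memory collections:
using System.Data.Entity;
using System.Linq;

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public decimal Amount { get; set; }
}

public interface IOrderContext
{
    IDbSet<Order> Orders { get; }
    int SaveChanges();
}

// Production implementation - a plain DbContext exposing IDbSet<T>
public class OrderContext : DbContext, IOrderContext
{
    public IDbSet<Order> Orders { get; set; }
}

// Application logic depends only on the interface, so a test can pass a
// fake whose Orders property wraps an in-memory list
public class OrderTotals
{
    public decimal TotalFor(IOrderContext db, int customerId)
    {
        return db.Orders
                 .Where(o => o.CustomerId == customerId)
                 .Sum(o => o.Amount);
    }
}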
I've been using a product called LightSpeed; it works very well and seamlessly integrates into Visual Studio 2010 & 2008. I have been using it with SQLite, but it supports numerous RDBMSs. It also has a very nice feature that allows you to create POCO objects that can be used with WCF - a great time saver! At first I was using the free Express Edition but soon upgraded.
LightSpeed is the best high performance .NET domain modeling and O/R mapping framework available. First class LINQ support, Visual Studio 2008 & 2010 designer integration and our famous high performance core framework means you can create rich domain-driven models more quickly and easily than ever before.
Having used OpenAccess on several projects, I must say that it meets all the above criteria.
One system I worked on was based on WCF services talking to several client types (smart, web, other WCF services, etc.). Through a layered architecture, the WCF services used OpenAccess as the persistence mechanism.
I am especially fond of the scaling that OpenAccess provides. The intelligent Level 2 cache (L2 cache) does a perfect job there, and it is of course distributable.
Actually, I wouldn't call OA heavy-weight... You don't even inherit from a base class. Also, it is a big plus that there are tools to perform the day-to-day developer tasks (create a new DB schema, merge schemas and so on) integrated into Visual Studio.
About EF4...
Please do not use it in a large project, with many tables and lots of data and many users. I've made this mistake and now I'm looking for a replacement.
1. Bad query generation, especially in large TPT hierarchies. Be prepared for a 5000-line query for a hierarchy of 15 tables!
2. Extremely slow designer when the number of tables grows: 45 seconds just to collapse/expand an entity in a model with 240 entities.
3. Serious problems with x-to-many relationships. Suppose you have Order and Customer entities. Each Order has a Customer, and each Customer has many Orders. There is a property named Orders in the Customer class that will be populated without you ever actually needing that data. This meant, in our system, that collections of up to 1,800,000 entities were fetched for no actual reason. When this happens inside a transaction with Snapshot isolation level... it brings the whole system to failure. There is no solution to this problem that doesn't have serious drawbacks. Just read DataObjects.Net's documentation and see how they've tackled this problem. I found that paying 200 or 500 euros is nothing compared to what you get. I may even get the version with source code.
If I'm unable to integrate my system with DO.Net, I'll look for another one, but this EF thing, it has to go!
