We have a rather large ASP.NET MVC project using LINQ to SQL which we are in the process of migrating to Windows Azure.
Now we need to serialize objects for storage in the Azure distributed cache, and setting "Serialization Mode" to "Unidirectional" in the .dbml file, which automatically decorates the generated classes and properties with DataContract and DataMember attributes, seems to be the recommended way. However, this causes any relationships not yet loaded by LINQ to SQL to be lost during serialization and saved as null.
What would be the preferred way of proceeding, taking a couple of things into account:
As mentioned, it's a rather large project with the generated *.designer.cs file being close to 1.5MB
Disabling lazy loading completely would most likely be a big performance hit due to many deep class relationships.
Changing ORM tool is something we are considering, but doing this at the same time as switching platforms would probably be a bad thing.
If this boils down to manually specifying which objects and relations to serialize across the whole project, then using something like protobuf-net for some extra performance gains would probably not be a huge step.
However, this causes any relationships not yet loaded by LINQ to SQL to be lost when serializing and saved as null.
Yes, this is normal and expected when serializing - you are essentially taking a snapshot of what was available at that time, because lazy loading depends on the object being attached to a data-context. It would be inadvisable for any tool to crawl the entire model looking for things to load, because that could keep going indefinitely, essentially bringing large chunks of unwanted data into play.
Options:
explicitly fetch the data you are interested in before serializing (either pre-emptively via LoadWith, or by touching the appropriate properties)
or, load the data into a completely separate DTO model for serialization - in many ways, this is a re-statement of the first, since it will by necessity involve iterating over (projecting) the data you want, but it means you are creating the DTO to suit the exact shape you actually want to send
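As a rough sketch of those two options (the context and entity names, MyDataContext, Customer and Order, are hypothetical, not from the question):

```csharp
using System.Collections.Generic;
using System.Data.Linq; // DataLoadOptions
using System.Linq;

// Hypothetical DTOs shaped to exactly what you want to send (option 2)
public class OrderDto { public int ID { get; set; } }
public class CustomerDto
{
    public int ID { get; set; }
    public string Name { get; set; }
    public List<OrderDto> Orders { get; set; }
}

public static class SerializationPrep
{
    // Option 1: pre-emptively load the relationships before serializing
    public static Customer LoadForSerialization(MyDataContext db, int id)
    {
        var options = new DataLoadOptions();
        options.LoadWith<Customer>(c => c.Orders); // eager-load Orders with each Customer
        db.LoadOptions = options;                  // must be set before the first query
        return db.Customers.First(c => c.ID == id); // Orders now populated, survives serialization
    }

    // Option 2: project into the DTO model in a single query
    public static CustomerDto LoadDto(MyDataContext db, int id)
    {
        return db.Customers
            .Where(c => c.ID == id)
            .Select(c => new CustomerDto
            {
                ID = c.ID,
                Name = c.Name,
                Orders = c.Orders.Select(o => new OrderDto { ID = o.ID }).ToList()
            })
            .ToList()    // materialize before leaving the data layer
            .Single();
    }
}
```

Option 2 looks like more typing, but it gives you a serialization shape that cannot accidentally drag in the rest of the graph.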
Related
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for return to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then throws a stack overflow error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clearing them causes those changes to be saved back to the database on the next .SaveChanges() call (NOTE: this is an assumption, but it seemed pretty sound. If anyone knows differently, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw an exception about non-tracked entities not being allowed to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer which just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post in a page. Write data services and methods that do things relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and message and returning entire EF objects containing large amounts of irrelevant data such as IDs, auditing data such as creation time.
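A minimal sketch of that difference, assuming hypothetical Post/User entities and an EF context named MyDbContext:

```csharp
using System.Collections.Generic;
using System.Linq;

// Only what the page needs: poster name and message
public class PostSummaryDto
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public class PageDto
{
    public int PageNumber { get; set; }
    public List<PostSummaryDto> Posts { get; set; }
}

public static class PostService
{
    public static PageDto GetPage(MyDbContext db, int pageNumber, int pageSize)
    {
        return new PageDto
        {
            PageNumber = pageNumber,
            Posts = db.Posts
                .OrderByDescending(p => p.CreatedAt)
                .Skip((pageNumber - 1) * pageSize)
                .Take(pageSize)
                .Select(p => new PostSummaryDto
                {
                    PosterName = p.User.Name, // becomes a JOIN in SQL, no lazy load
                    Message = p.Text
                })
                .ToList()
        };
    }
}
```

Because the projection happens inside the query, EF sends one SQL statement and the serializer only ever sees the small DTO graph.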
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. The alternative is to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
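A minimal sketch of the lazy-load/proxy recommendation in EF6 (MyDbContext and the connection-string name are placeholders; with Database First the constructor lives in the generated context file, so in practice you would put these lines there or wherever the context is constructed):

```csharp
using System.Data.Entity;

public class MyDbContext : DbContext
{
    public MyDbContext() : base("name=MyConnection")
    {
        Configuration.LazyLoadingEnabled = false;   // no surprise loads during serialization
        Configuration.ProxyCreationEnabled = false; // plain POCOs instead of dynamic proxies
    }
}

public static class ResourceQueries
{
    // With lazy loading off, relationships must be requested explicitly
    public static Resource GetWithUsers(MyDbContext db, int id)
    {
        return db.Resources
            .Include(r => r.Users) // System.Data.Entity.Include
            .Single(r => r.ID == id);
    }
}
```

Anything not explicitly Include-d stays null, which is exactly what you want when the result is about to be serialized.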
I had a similar project, a stressful one: I needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1- Instead of using EF to load the data, I called old-school stored procedures (and, for further optimization, grouped data to reduce the result set as much as possible; e.g., one query returns a table from which several chart datasets can be extracted).
2- More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance is worth mentioning (it is really fast, but not compatible with complex objects, for example view models that contain lists of other models, and so on).
It is better to read the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3- The relational database design is the prime suspect in this problem. It may be good to keep the tables holding raw data to be processed (most likely for analytics) in a denormalized schema, which improves query performance.
Also beware of using the model classes generated from the database by the EF designer for reading or selecting data, especially when you want to serialize them. (Sometimes I think about splitting the same schema into two sets of otherwise identical classes/models, one for writing and one for reading, so that the write models keep the virtual collections that come from foreign keys while the read models ignore them; I am not sure about this.)
NOTE: in the case of very, very large data it is better to go deeper and set up In-Memory OLTP for the tables that contain facts or raw data; in that case the table acts like a non-relational, NoSQL-style table.
NOTE: for example, in MS SQL you can take advantage of SQLCLR, which lets you write procedures in C#, VB, etc. and call them from T-SQL; in other words, you can handle data processing at the database level.
4- For interactive views that need to load data, consider which information should be processed server-side and which can be handled client-side (sometimes it is better to query data from the client side, though you should keep in mind that data on the client side can be accessed by the user); it is situation-dependent.
5- For large raw-data tables in views, datatables.min.js is a good idea, and everyone suggests server-side paging for tables.
6- For importing and exporting data from big files, OLE DB is the best choice, I think.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;)
I have fiddled with a similar problem using EF model first, and found the following solution satisfactory for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use these for later look-ups.
Set the get/set modifiers of any "Navigation Properties" (sub-collections) in your EF entity to private.
This will give you an object that does not expose the sub-collections, so only the main properties are serialized. This workaround requires some restructuring of your LINQ queries: query your table of SubItems directly, with the foreign key property as your filtering option, like this:
var myFitnessClubs = context.FitnessClubs
    .Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may of course choose to implement this solution partially, affecting only the sub-collections that you strongly do not want serialized.
Note 2:
For "Many to Many" relations, at least one of the entities needs a public representation of the collection, since the relation cannot be retrieved using a single ID property.
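A compile-oriented sketch of the shape this produces, borrowing the FitnessClub names from the example (internal is used here for illustration; the answer itself sets the accessors to private via the EF designer):

```csharp
using System.Collections.Generic;

public class FitnessClub
{
    public int ID { get; set; }
    public int FitnessClubChainID { get; set; } // foreign key kept public for look-ups
    public string Name { get; set; }
}

public class FitnessClubChain
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Navigation collection hidden behind a non-public accessor,
    // so serializers never walk into it
    internal virtual ICollection<FitnessClub> FitnessClubs { get; set; }
}
```

Serializing a FitnessClubChain now emits only ID and Name; the clubs are fetched separately by the foreign key query shown above.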
We are currently using NHibernate for accessing our database.
We have a database which stores a "configuration". We have the notion of a "revision": every table in our database has a revision, and when we make any change (even a small one), every record is duplicated (except for the values that actually change).
The goal is to be able to switch easily from one revision to another, and to be able to delete one configuration and still switch to an earlier or later revision.
This implies that when we change the configuration, we do a lot of writing to the database (and other applications will also have to read it).
These steps can take a very long time (5-10 minutes; we have a lot of tables), compared to 10-20 seconds to store the same data as XML.
After spending some time on analysis, we have the impression that NHibernate has to do a lot of reflection to map the database to C# objects (using our hbm.xml files).
My questions:
How does NHibernate read/write properties in every object: with reflection, right?
Does it use reflection on every write, or is there some optimization (caching, ...)?
Is there a way to avoid this reflection, like having classes created/compiled at build time (as is possible with Entity Framework)?
If I've a working NHibernate model, is there something I can do to "tune" the DB access, without changing the database?
Thank you very much for your help.
Before assuming it is reflection, I would strongly recommend downloading NHProf (there is a free trial) or some other database profiler to see what is taking NHibernate so long.
If I had to guess, it would be the number of database updates required here, but I wouldn't guess about performance without getting some metrics first ;)
For instance, it might be that you need to increase your batch size in NHibernate if you are doing a lot of small updates in one session, which can be done in your NHibernate config file:
<property name="adonet.batch_size">300</property>
You cannot avoid reflection, but it is not a problem. NHibernate uses a lot of reflection up front to prepare and emit dynamic code; the generated code is fast, since MSIL allows things that C# itself does not expose.
If the reflection implementation really is your issue, you can extend NHibernate by writing your own bytecode provider.
In my experience, custom proxy generators, user types, and property accessors can be faster than the built-in implementations.
Generally, performance issues caused by NHibernate come from:
missing eager fetching
data-precision mismatches causing unwanted updates
bad mapping (structure or simply data types)
bad database/table configuration
a bad commit strategy (flush mode, for example)
I want to add a history feature for one of my application's entities. My application is using linq to sql as O/RM.
The solution I have selected for saving previous versions of an entity is to serialize the old version and store the serialized form somewhere.
Anyway, I was looking for the best way to serialize LINQ to SQL entities. One of the best solutions I found was simply setting the serialization mode to Unidirectional on the DataContext object and then using the .NET serializer. The method seems to work well.
But my question is: does this method have any side effects on the data context, or any performance hit? For example, does lazy loading of objects still work after turning unidirectional serialization mode on?
I've searched a lot but found nothing.
And, do you think I should go another way to serialize objects? Any cheaper method?
PS 1: I don't like using DTOs, because I would have to maintain compatibility between the DataContext objects and the DTOs.
PS 2: Please note that the application I'm working on is relatively large and currently under heavy load, so I have to be careful about changes in O/RM behavior or performance issues, because I cannot review the entire application plus the online versions.
Thanks a lot :)
Re "and then use .net serializer", the intent of unidirectional mode is to support DataContractSerializer (specifically). The effect is basically just the addition of the [DataContract]/[DataMember] flags, but: based on experience, and despite your reservations, I would strongly advise you to go the DTO route:
ORMs, due to lazy loading of both sub-collections and (sometimes) individual properties, make it very hard to predict exactly what is going to happen
and at the same time, make it hard to monitor when data loading is happening (an N+1 in your serialization code can go unnoticed)
you have very little control over the size of the model (how many levels of descendants etc) are going to be serialized
and don't even get me started on parent navigation
the deserialized objects will not have a data-context, which means that their lazy loading won't work - this can lead to unexpected behaviours
you can get into all sorts of problems if you can't version your data model separately from your serialization model
there are some... "peculiarities" that impact LINQ-to-SQL and serialization (from memory, the most "fun" one is a difference in how the IList.Add and IList<T>.Add is implemented, meaning that one triggers data-change flags and one doesn't; nasty)
you are making everything dependent on your data model, which also means that you can never change your data model (at least, not easily)
a lot of scenarios that involve DTOs are read-based, not write-based, so you might not even need the ORM overheads in the first place - for example, you might be able to replace some hot code-path pieces with lighter tools (dapper, etc.) - you can't do that well if everything is tied to the data model
I strongly advocate biting the bullet and creating a DTO model that matches what you want to serialize; the DTO model would be incredibly simple, and should only take a few minutes to throw together, including the code to map to/from your data model. This will give you total control over what is serialized, while also giving you freedom to iterate without headaches, and do things like use different serializers
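For illustration, a DTO model of that kind might look like the following (Order, OrderLine and all property names here are hypothetical). The explicit DataMember Order values also keep the contract stable if you later switch to a serializer such as protobuf-net:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Serialization;

[DataContract]
public class OrderLineDto
{
    [DataMember(Order = 1)] public string Product { get; set; }
    [DataMember(Order = 2)] public int Quantity { get; set; }
}

[DataContract]
public class OrderDto
{
    [DataMember(Order = 1)] public int ID { get; set; }
    [DataMember(Order = 2)] public DateTime PlacedAt { get; set; }
    [DataMember(Order = 3)] public List<OrderLineDto> Lines { get; set; }
}

public static class OrderMapper
{
    // Explicit and boring, which is the point: nothing loads lazily here,
    // and the serialized shape is fixed by the DTO, not by the ORM
    public static OrderDto ToDto(Order entity)
    {
        return new OrderDto
        {
            ID = entity.ID,
            PlacedAt = entity.PlacedAt,
            Lines = entity.Lines.Select(l => new OrderLineDto
            {
                Product = l.ProductName,
                Quantity = l.Quantity
            }).ToList()
        };
    }
}
```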
So I have a design decision to make. I'm building a website, so speed is the most important thing. I have values that depend on other values. I have two options:
1- Retrieve my objects from the database, and then generate the dependent values/objects.
2- Retrieve the objects with the dependent values already stored in the database.
I'm using ASP.NET MVC with Entity Framework.
What considerations should I have in making that choice?
You will almost certainly see no performance benefit in storing the derived values. Obviously this can change if the dependency is incredibly complex or relies on a huge amount of data, but you don't mention anything specific about the data so I can only speak in generalities.
In other words, don't store values that are completely derivative, as they introduce update anomalies (someone has to know about and code for these dependencies when updating your data, rather than everything being as self-explanatory and clear as possible).
Ask yourself this question:
Are the dependent values based on business rules?
If so, then don't store them in the database - not because you can't or shouldn't, but because it is good practice: you should only put business rules in the database if that is the best or only place to have them, not just because you can.
Serializing your objects to the database will usually be slower than creating the objects in normal compiled code. Database access is normally pretty quick, it is the act of serialization that is slow. However if you have a complicated object creation process that is time consuming then serialization could end up quicker, especially if you use a custom serialization method.
Sooooo.... if your 'objects' are relatively normal data objects with some calculated/derived values then I would suggest that you store the values of the 'objects' in the database, read those values from the database and map them to data objects created in the compiled code*, then calculate your dependent values.
*Note that this is standard data retrieval - some people use an ORM, some manually map the values to objects.
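As a tiny illustration of the suggestion, with hypothetical names: store only the base values and let the mapped data object compute the dependent one on read:

```csharp
public class OrderLine
{
    public decimal UnitPrice { get; set; } // stored in the database
    public int Quantity { get; set; }      // stored in the database

    // Derived value: computed when read, never stored,
    // so it cannot drift out of sync with the base values
    public decimal LineTotal
    {
        get { return UnitPrice * Quantity; }
    }
}
```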
I'm exploring Mongo as an alternative to relational databases but I'm running into a problem with the concept of schemaless collections.
In theory it sounds great, but as soon as you tie a model to a collection, the model becomes your de facto schema. You can no longer just add or remove fields from your model and expect it to continue to work. I see the same problems managing changes here as with a relational database, in that you need some sort of script to migrate from one version of the schema to the next.
Am I approaching this from the wrong angle? What approaches do members here take to ensure that their collection items stay in sync with their domain model when making updates to their domain model?
Edit: It's worth noting that these problems obviously exist in relational databases as well, but I'm asking specifically for strategies in mitigating the problem using schemaless databases and more specifically Mongo. Thanks!
Schema migration with MongoDB is actually a lot less painful than with, say, SQL server.
Adding a new field is easy: old records will come back with it set to null, or you can use attributes to control the default value, e.g. [BsonDefaultValue("abc", SerializeDefaultValue = false)].
The [BsonIgnoreIfNull] attribute is also handy for omitting objects that are null from the document when it is serialized.
Removing a field is fairly easy too: you can use [BsonExtraElements] (see docs) to collect the leftover values and preserve them, or you can use [BsonIgnoreExtraElements] to simply throw them away.
With these in place there really is no need to go convert every record to the new schema, you can do it lazily as needed when records are updated, or slowly in the background.
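Putting those attributes together in one document class (C# MongoDB driver; the class and default value are illustrative, not from the original answer):

```csharp
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class UserDoc
{
    public string Name { get; set; }

    [BsonDefaultValue("abc", SerializeDefaultValue = false)]
    public string Plan { get; set; } // old records without this field come back as "abc"

    [BsonIgnoreIfNull]
    public string Nickname { get; set; } // omitted from the document when null

    [BsonExtraElements]
    public BsonDocument ExtraFields { get; set; } // removed/unknown fields collect here
}

// Or simply discard fields the class no longer declares:
[BsonIgnoreExtraElements]
public class SlimUserDoc
{
    public string Name { get; set; }
}
```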
PS, since you are also interested in using dynamic with Mongo, here's an experiment I tried along those lines. And here's an updated post with a complete serializer and deserializer for dynamic objects.
My current thought on this is to use the same sort of implementation I would using a relational database. Have a database version collection which stores the current version of the database.
My repositories would have a minimum required version which they need in order to accurately serialize and deserialize the collections' items. If the current db version is lower than the required version, I just throw an exception. Then I use migrations which do all the conversion necessary to bring the collections to the required state and update the database version number.
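A sketch of that version gate, using the current C# MongoDB driver API with hypothetical collection and field names:

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

public class UserRepository
{
    // Minimum schema version this repository can (de)serialize correctly
    private const int MinimumRequiredVersion = 4;

    public UserRepository(IMongoDatabase db)
    {
        // Single-document collection holding the current schema version
        var versionDoc = db.GetCollection<BsonDocument>("databaseVersion")
                           .Find(FilterDefinition<BsonDocument>.Empty)
                           .FirstOrDefault();
        int current = versionDoc == null ? 0 : versionDoc["version"].AsInt32;

        if (current < MinimumRequiredVersion)
            throw new InvalidOperationException(
                "Database schema version " + current +
                " is below required " + MinimumRequiredVersion + "; run migrations first.");
    }
}
```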
With statically-typed languages like C#, whenever an object gets serialized somewhere, then its original class changes, then it's deserialized back into the new class, you're probably going to run into problems somewhere along the line. That is fairly unavoidable, whether it's MongoDB, WCF, XmlSerializer or whatever.
You've usually got some flexibility with serialization options, for example with Mongo you can change a class property name but still have its value map to the same field name (e.g. using the BsonElement attribute). Or you can tell the deserializer to ignore Mongo fields that don't have a corresponding class property, using the BsonIgnoreExtraElements attribute, so deleting a property won't cause an exception when the old field is loaded from Mongo.
Overall though, for any structural schema changes you'll probably need to reload the data or run a migration script. The other alternative is to use C# dynamic variables, although that doesn't really solve the underlying problem, you'll just get fewer serialization errors.
I've been using MongoDB for a little over a year now, though not for very large projects. I use hugo's csmongo or the fork here. I like the dynamic approach it introduces; it is especially useful for projects where the database structure is volatile.