How should one handle deep objects with EF+Web API? - c#

This is my first question here, so be gentle.
I have a database model consisting of about 60 objects describing information and various features of an industrial process. The end result is an object graph roughly 10 levels deep.
My intention was to
send a top level object to the client in JSON
two-way bind said object (in angular, but nvm that) and manipulate it
make the client's AJAX calls refer to that top-level object
rebuild said object via one or two constructor calls in the Web API
alter the object and save the changes via a top-level object method
My solution was to create an additional layer of objects based on the EF ones, which allows omitting/adding data in the objects being sent to the client at will and circumvents issues with circular references and other problems caused by eager/lazy loading. These objects are fed to the Web API.
Now here's where the trouble begins:
as a result of the additional layer, the EF objects have to be reconstructed from the layer objects whenever changes need to be saved anywhere down the chain. It is getting increasingly arduous to keep up with all of this.
The objects are highly interconnected and constrained. Should I just write extensions for the EF objects to emulate the features of the additional layer?
If that's the case, won't the JavaScriptSerializer try to serialize all the objects in all of the relationships (where the serialized object's key is defined as a FK in another object)? Because that's what I've gathered from the error messages.
Or am I doing this all wrong?

In a disconnected application like yours, I'd remove all navigation properties. They may seem convenient at first, but will cause headaches along the way.
I believe that accessing all entities via Id is the way to go.
You can write a JavaScript class which is responsible for retrieving entities by Id and can therefore cache them.
So each time you need an entity on the client, you get it through this class.
This would result in having one controller for each entity.
Another advantage is that you don't always have to send and receive the whole object graph, which seems like a lot of data (10 levels deep is a lot).
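As a rough illustration of the no-navigation-properties, fetch-by-Id approach (all type names here are hypothetical, not from the question):

// Hypothetical entity: plain foreign-key Ids instead of navigation properties.
public class ProcessStep
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int ParentProcessId { get; set; }   // FK kept as an Id; resolved via a separate request
}

// One small Web API controller per entity; the client-side class fetches and caches by Id.
public class ProcessStepsController : ApiController
{
    private readonly AppDbContext db = new AppDbContext();   // AppDbContext is assumed

    // GET api/processsteps/5
    public ProcessStep Get(int id)
    {
        return db.ProcessSteps.Find(id);
    }
}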
won't the JavaScriptSerializer try to serialize all the objects in all of the relationships
Yes it will. That's one reason why objects with navigation properties, especially circular ones, are very difficult to serialize.

Related

Store a very complex object with circular references to avoid needing a Singleton in my web app

Currently I generate a singleton object for my site which is created from a flat file. The object is never changed by the application, it is purely used as reference by some of the functions on my site. It effectively describes the schema for another file type we use, similar to XML & XSD
The singleton object generated is fairly large (5000+ child objects, with up to 500 properties each) and it contains circular references as child objects can reference the parent object due to 2 way references.
This all works fine currently, however the first time the app loads, it takes over a minute to generate the singleton. Which is the reason I am using a singleton here, so I don't have to regenerate it every request. But it also means every time the app pool restarts, the first request takes well over a minute to load. Every consecutive request is fast once the object is in memory.
Seeing that the flat file rarely changes, I would like to find a good way to generate the object once and store the object in way that I can quickly retrieve it when needed.
I tried serializing the object to JSON and storing it in the database; however, due to the circular references, Json.NET either fails or I end up losing information if I configure it to ignore the references when serializing.
Are there any better ways of handling such an object, or am I stuck with the singleton for now?
Given the static nature of this object, serialization would be one option.
The circular reference issue you mention with Json.Net can be easily remedied using appropriate JsonSerializerSettings (refer to this answer)
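For reference, a minimal sketch with Json.NET that preserves references instead of ignoring them, so cycles survive a round trip without losing data ("schemaRoot"/"SchemaRoot" are placeholders for your singleton's root object and type):

var settings = new JsonSerializerSettings
{
    // Write shared/circular references as $id/$ref pairs instead of dropping them
    PreserveReferencesHandling = PreserveReferencesHandling.Objects,
    ReferenceLoopHandling = ReferenceLoopHandling.Serialize
};

string json = JsonConvert.SerializeObject(schemaRoot, settings);
SchemaRoot restored = JsonConvert.DeserializeObject<SchemaRoot>(json, settings);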
If speed is of the essence, then you may want to investigate other serialization options (netserializer claims to be one of the fastest).
Ultimately though, you should look to put the file / object structure into some sort of cache that sits outside of the app pool (Redis perhaps), or even load the flat file's data into a well designed database schema (i.e. parent - child relationships etc).
Creating a massive object graph is rather inefficient and will potentially create memory issues.

Serializing complex EF model over JSON

I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then fails with a stack overflow error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post on a page. Instead, write data services and methods which do things that are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data such as creation time.
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
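A rough sketch of the Page/Posts example described above (all type and property names are illustrative, not from the question; "db" stands for an assumed EF context with Posts and Accounts):

public class PostSummaryDto
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public class PageDto
{
    public int PageNumber { get; set; }
    public List<PostSummaryDto> Posts { get; set; }
}

public PageDto GetPage(int pageNumber, int pageSize)
{
    return new PageDto
    {
        PageNumber = pageNumber,
        Posts = db.Posts
                  .OrderByDescending(p => p.CreatedAt)
                  .Skip((pageNumber - 1) * pageSize)
                  .Take(pageSize)
                  .Select(p => new PostSummaryDto
                  {
                      PosterName = p.Account.Name,   // only the columns the page actually needs
                      Message = p.Text
                  })
                  .ToList()
    };
}

Because the Select projects straight into the DTO, EF generates one focused SQL query instead of materializing the full entity graph.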
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF (see the sketch below). The alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
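A minimal sketch of the lazy-load/proxy recommendation for EF6's DbContext ("AppDbContext" and the connection string name are assumed):

public class AppDbContext : DbContext
{
    public AppDbContext() : base("name=DefaultConnection")
    {
        Configuration.LazyLoadingEnabled = false;     // nothing gets loaded behind your back
        Configuration.ProxyCreationEnabled = false;   // serializers see plain POCOs, not proxies
    }
}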
I had a similarly stressful project, where I also needed to load large amounts of data, process it from different angles, and feed it to a complex dashboard of charts and tables.
My optimizations were:
1. Instead of using EF to load data, I called old-school stored procedures (and, as a further optimization, grouped the data to keep the result tables for the charts as small as possible, e.g. one query returns a table from which several chart datasets can be extracted). A sketch follows at the end of this answer.
2. More importantly, instead of Newtonsoft's JSON serializer I used fastJSON, whose performance is notable (it is really fast, but it does not cope well with complex objects, for example view models that contain lists of other models nested several levels deep).
It is better to read up on the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. The relational database design is the prime suspect in this kind of problem. It can be worthwhile to keep the tables that hold raw data needing heavy processing (most likely for analytics) in a denormalized schema, which improves query performance.
Also be wary of using the model classes generated by the EF designer (database-first) for reading/selecting data, especially when you want to serialize them. Sometimes I consider splitting the same schema into two sets of near-identical classes/models for writing and reading, such that the write models keep the virtual collections derived from foreign keys while the read models omit them... I am not sure about this.
NOTE: For very large data sets it may be better to go further and set up in-memory OLTP for the particular tables that contain facts or raw data; however, in that case the table behaves more like a non-relational (NoSQL-style) table.
NOTE: In SQL Server, for example, you can take advantage of SQLCLR, which lets you write code in C#, VB, etc. and call it from T-SQL; in other words, you handle the data processing at the database level.
4. For interactive views that need to load data, consider which information should be processed server-side and which can be handled client-side (sometimes it is better to query data from the client side, but keep in mind that data on the client can be inspected by the user). It depends on the situation.
5. For large raw-data tables in a view, using datatables.min.js is a good idea, and everyone recommends server-side paging for such tables.
6. For importing and exporting data from big files, I think OLE DB is the best choice.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;)
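As a sketch of point 1 (the stored procedure and DTO names are made up), EF can materialize a pre-aggregated result set directly into a small DTO via raw SQL, bypassing the entity graph entirely:

public class ChartRowDto
{
    public string Category { get; set; }
    public decimal Total { get; set; }
}

var rows = context.Database
    .SqlQuery<ChartRowDto>(
        "EXEC dbo.GetDashboardAggregates @Year",
        new SqlParameter("@Year", 2016))
    .ToList();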
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use this for later look-up.
Set the get/set modifiers of any "Navigation Properties" (sub-collections) in your EF entities to private.
This will give you an object that does not expose the sub-collections, so only the main properties get serialized. This workaround requires some restructuring of your LINQ queries: query directly against your table of sub-items, using the foreign key property as your filter, like this:
var myFitnessClubs = context.FitnessClubs
    ?.Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may of course choose to implement this solution only partly, affecting just the sub-collections that you definitely do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.

serializing LinqToSql generated entities keeping relations and lazy loading

We have a rather large ASP.NET MVC project using LINQ to SQL which we are in the process of migrating to Windows Azure.
Now, we need to serialize objects for storing in the Azure distributed cache, and setting "Serialization Mode" to "Unidirectional" in the .dbml file, thus automatically decorating the generated classes and properties with DataContract and DataMember attributes, seems to be the recommended way. However, this causes any relationships not yet loaded by LINQ to SQL to be lost when serializing and saved as null.
What would be the preferred way of proceeding, taking into account a couple of things:
As mentioned, it's a rather large project with the generated *.designer.cs file being close to 1.5MB
Disabling lazy loading completely would most likely be a big performance hit due to many deep class relationships.
Changing ORM tool is something we are considering, but doing this at the same time as switching platforms would probably be a bad thing.
If this boils down to somehow manually specifying which objects and relations to serialize across the whole project, then using something like protobuf-net for some extra performance gains would probably not be a huge step.
However, this causes any relationships not yet loaded by LINQ to SQL to be lost when serializing and saved as null.
Yes, this is normal and expected when serializing - you are essentially taking a snapshot of what was available at that time, because lazy loading depends on the data being loaded via a data context. It would be inadvisable for any tool to crawl the entire model looking for things to load, because that could keep going indefinitely, essentially bringing large chunks of unwanted data into play.
Options:
explicitly fetch the data you are interested in before serializing (either pre-emptively via LoadWith, or by touching the appropriate properties); a sketch follows below
or, load the data into a completely separate DTO model for serialization - in many ways, this is a re-statement of the first, since it will by necessity involve iterating over (projecting) the data you want, but it means you are creating the DTO to suit the exact shape you actually want to send
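A sketch of the first option using LINQ to SQL's DataLoadOptions (the entity names are hypothetical): pre-emptively load the relations you need, then serialize the resulting snapshot.

using (var db = new MyDataContext())
{
    var options = new DataLoadOptions();
    options.LoadWith<Order>(o => o.Lines);      // eager-load these relations up front
    options.LoadWith<Order>(o => o.Customer);
    db.LoadOptions = options;

    var order = db.Orders.Single(o => o.Id == orderId);
    // serialize 'order' here; Lines and Customer are already populated in the snapshot
}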

Recommended Pattern for Lazy-loading Portions of Object Graph from Cache

I'm using memcache behind a web app to minimize the hits to our SQL database. I'm storing C# objects into this cache by marking them with SerializableAttribute. We make heavy use of dependency injection via Ninject in our app.
Some of these objects are large, and I'd like to break them up. However, they come from a single stored procedure call (i.e. one stored procedure call gets cooked into the full object graph), and I'd like to be able to break these objects up and lazy-load specific subgraphs from the cache separately rather than load the entire object graph into memory all at once.
What are some patterns that would help me accomplish this?
As far as patterns go, I'd say the one large complex object that's built from a single stored procedure is suspect. I'm not sure if your caching is a requirement or just the current state of its implementation.
The pattern that I'm used to is a type of repository pattern, using operations that fulfil specific contracts. Those operations house one or more data sources that call stored procedures in the database, which are used to build ONE of those sub-graphs you speak of. With that said, if you're going to lazy-load data from a database, then I can only assume that many of the object members are not used much of the time, which furthers my point - break that object up.
A couple things about it:
It can be chatty if the entire object is being used regularly
It is fully injectable via the Operations
The datasources contain the reader for the specific object, thus only performing ONE task (SOLID)
Can be modified to use Entity Framework, without too much fuss
Can be designed to implement an interface, making it more reusable
Will require you to break up that proc into smaller, chewable pieces, which will likely only benefit you in the long run.
A complex object like the one you describe really shouldn't exist if only parts of it are going to be used. Instead, consider segregating those objects. However, it really depends on how this object is being used.
UPDATE:
Using your cache as the repository, I would probably approach it like this: store the legacy object, but in your operations use it to build more relevant DTOs that are returned to the client.
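A rough sketch of that shape (every name below is made up): one narrow contract per sub-graph, with the implementation checking the cache before falling back to its own small query or stored procedure.

public class CustomerSummaryDto
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public int OpenOrderCount { get; set; }
}

public interface ICache
{
    T Get<T>(string key) where T : class;
    void Set(string key, object value);
}

public class CustomerSummaryOperation
{
    private readonly ICache cache;
    private readonly Func<int, CustomerSummaryDto> load;   // e.g. wraps one small stored procedure

    public CustomerSummaryOperation(ICache cache, Func<int, CustomerSummaryDto> load)
    {
        this.cache = cache;
        this.load = load;
    }

    public CustomerSummaryDto Get(int customerId)
    {
        var key = "customer-summary:" + customerId;
        var dto = cache.Get<CustomerSummaryDto>(key);
        if (dto == null)
        {
            dto = load(customerId);   // one focused query builds one sub-graph
            cache.Set(key, dto);
        }
        return dto;
    }
}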
I know NHibernate does lazy loading by replacing objects with proxy objects. In the proxy object there is some kind of check that causes the loading of the real object the first time you try to access it.
I'm not sure of any design patterns that would cover that, but you could look at the NHibernate source code.
A down side of using proxy objects is you have to be careful with inheritance and type checks as you could be checking the type of the proxy and not the actual object.
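This is not NHibernate's actual implementation, just a hand-rolled illustration of the proxy idea described above: the real object is only loaded the first time one of its members is touched.

public class Order
{
    public virtual decimal Total { get; set; }
}

public class LazyOrderProxy : Order
{
    private readonly Func<Order> loader;   // e.g. a cache or database lookup
    private Order real;

    public LazyOrderProxy(Func<Order> loader) { this.loader = loader; }

    public override decimal Total
    {
        get
        {
            if (real == null) real = loader();   // load on first access
            return real.Total;
        }
        set
        {
            if (real == null) real = loader();
            real.Total = value;
        }
    }
}

Note that a variable typed as Order may actually hold a LazyOrderProxy, which is exactly the type-checking pitfall mentioned above.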

how can I save/keep-in-sync an in-memory graph of objects with the database?

Question - What is a good best-practice approach for saving/keeping in sync an in-memory graph of objects with the database?
Background:
That is to say, I have the classes Node and Relationship, and the application builds up a graph of related objects using these classes. There might be 1000 nodes with various relationships between them. The application needs to query the structure, hence an in-memory approach is no doubt good for performance (e.g. traversing the graph from Node X to find the root parents).
The graph does need to be persisted however into a database with tables NODES and RELATIONSHIPS.
Therefore, what is a good best-practice approach for saving/keeping in sync an in-memory graph of objects with the database?
Ideal requirements would include:
build up changes in-memory and then 'save' afterwards (mandatory)
when saving, apply updates to database in correct order to avoid hitting any database constraints (mandatory)
keep persistence mechanism separate from model, for ease in changing persistence layer if needed, e.g. don't just wrap an ADO.net DataRow in the Node and Relationship classes (desirable)
mechanism for doing optimistic locking (desirable)
Or is the overhead of all this just not worth it for a smallish application, and should I just hit the database each time for everything (assuming the response times were acceptable)? [I would still like to avoid that, if it isn't too much extra overhead, in order to remain somewhat scalable performance-wise.]
I'm using the self-tracking entities in Entity Framework 4. After the entities are loaded into memory, StartTracking() MUST be called on every entity. Then you can modify your entity graph in memory without any DB operations. When you're done with the modifications, you call the context extension method ApplyChanges(rootOfEntityGraph) and then SaveChanges(), so your modifications are persisted. Then you have to start the tracking again on every entity in the graph. Two hints/ideas I'm using at the moment:
1.) call StartTracking() at the beginning on every entity
I'm using an interface IWorkspace to abstract the ObjectContext (this simplifies testing; see the open-source implementation bbv.DomainDrivenDesign on SourceForge). It also uses a QueryableContext. So I created a further concrete Workspace and QueryableContext implementation and intercept the loading process with my own IEnumerable implementation. When the workspace's consumer executes the query obtained with CreateQuery(), my intercepting IEnumerable object registers an event handler on the context's ChangeTracker. In this event handler I call StartTracking() for every entity loaded and added into the context (this doesn't work if you load the objects with NoTracking, because in that case the objects aren't added to the context and the event handler will not be fired). After the enumeration in the hand-made iterator, the event handler on the ObjectStateManager is deregistered.
2.) call StartTracking() after ApplyChanges()/SaveChanges()
In the workspace implementation, I ask the context's ObjectStateManager for the modified entities, e.g.:
var addedEntities = this.context.ObjectStateManager.GetObjectStateEntries(EntityState.Added);
--> analogous for modified entities
Cast them to IObjectWithChangeTracker and call the AcceptChanges() method on the entity itself. This starts the object's change tracker again.
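A sketch of that second step, following the answer's description (IObjectWithChangeTracker and AcceptChanges() come from the EF4 self-tracking-entities T4 templates):

var changedEntries = this.context.ObjectStateManager
    .GetObjectStateEntries(EntityState.Added | EntityState.Modified);

foreach (var entry in changedEntries)
{
    var tracked = entry.Entity as IObjectWithChangeTracker;
    if (tracked != null)
    {
        tracked.AcceptChanges();   // per the answer above, this restarts the entity's change tracker
    }
}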
For my project I have the same mandatory points as you. I played around with EF 3.5 and didn't find a satisfactory solution, but the new self-tracking entities in EF 4 seem to fit my requirements (as far as I have explored the functionality).
If you're interested, I'll send you my "spike"-project.
Does anyone have an alternative solution? My project is a server application which holds objects in memory for fast operations, while modifications should also be persisted (no round trip to the DB). At some points in the code, object graphs are marked as deleted/terminated and removed from the in-memory container. With the solution explained above I can reuse the model generated by EF and don't have to code and wrap all the objects myself again. The generated code for the self-tracking entities comes from T4 templates, which can be adapted very easily.
Thanks a lot for any other ideas/criticism.
The short answer is that you can still keep a graph (a collection of linked objects) in memory and write the changes to the database as they occur. If this is taking too long, you could put the changes onto a message queue (though that is probably overkill) or execute the updates and inserts on a separate thread.
