How to paginate OwnsMany collections? - c#

Consider a domain entity of User. I want to identify this user by a fingerprint: a set of properties such as IP, email, phone, and user-agent. Since by DDD principles a Fingerprint can't be an entity, I defined it as a ValueObject. Each time a user tries to make a transaction, I look up a matching fingerprint to associate a user with the request.
EF Core suggests using OwnsMany() for ValueObjects. My problem is that this collection of owned entities is loaded immediately, without any pagination. I may load 100 users per page, and each of them may have hundreds of fingerprints, because each time the IP or user-agent changes I have to create a new one.
My questions are
Is there a way to paginate those fingerprints? I can't do it, because the repository has a constraint for aggregate roots only.
Can I actually use OwnsMany ValueObjects for situations when there are more than 1k objects?
If not, how do I solve this problem?

I do not know your whole domain model, so I assume there is a reason why User is an entity and contains a collection of Fingerprints.
Each time when a user tries to make a transaction I make a lookup for a matching fingerprint to associate a user with the request.
You should rather not load all records from the database into application memory. As I understand it, what you really need when a User is making a transaction is not all the fingerprints, but only the matching one. A solution would be to prepare your object properly when loading it from the database: do not load all the Fingerprints, but only the one you are interested in. You can add another property for that in your User class and populate it with a LINQ query or SQL select in your infrastructure layer.
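For illustration, a minimal sketch of such an infrastructure-layer lookup in EF Core; the context, set and property names (AppDbContext, Users, Fingerprints, Ip, UserAgent) are assumptions, not taken from the question:

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class UserFingerprintLookup
{
    private readonly AppDbContext _db; // assumed DbContext exposing a Users DbSet

    public UserFingerprintLookup(AppDbContext db)
    {
        _db = db;
    }

    // Finds the id of the user whose owned Fingerprints collection contains a match.
    // Projecting only the id means the owned collection is never materialized in memory.
    public Task<Guid> FindUserIdByFingerprintAsync(string ip, string userAgent)
    {
        return _db.Users
            .Where(u => u.Fingerprints.Any(f => f.Ip == ip && f.UserAgent == userAgent))
            .Select(u => u.Id)
            .FirstOrDefaultAsync();
    }
}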
This solution might cause another problem: you will have that extra property in your User class even when you don't need it, and when you add a new Fingerprint the property will not update automatically. If you do not feel comfortable with that, I would suggest separating your read and write models. You can use tactical DDD patterns for saving, but you probably don't need them for simple reads.

Related

How to implement a C# Winforms database application with synchronisation?

Background
I am developing a C# WinForms application - currently up to about 11,000 LOC. The UI and logic are about 75% done, but there is no persistence yet. There are hundreds of attributes on the forms, and there are 23 entities/data classes.
Requirement
The data needs to be kept in an SQL database. Most of the users operate remotely and we cannot rely on them having a connection, so we need a solution that maintains a database locally and keeps it in sync with the central database.
Edit: Most of the remote users will only require a subset of the database in their local copy. This is because if they don't have access permissions (as defined and stored in my application) to view other users' records, they will not receive copies of them during synchronisation.
How can I implement this?
Suggested Solution
I could use the Microsoft Entity Framework to create a database and the link between database and code. This would save a lot of manual work as there are hundreds of attributes. I am new to this technology but have done a "hello world" project in it.
For data sync, each entity would have an integer primary key ID. Additionally, it would have a secondary ID column which relates it to the central database. This secondary column would contain nulls in the central database but would be populated in the local databases.
For synchronisation, I would write code which copies the records and assigns the IDs accordingly. I would need to handle conflicts.
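As a rough sketch, an entity in this scheme might look like the following (the Customer name and properties are illustrative only, not taken from the actual 23 data classes):

public class Customer
{
    public int Id { get; set; }          // local integer primary key
    public int? CentralId { get; set; }  // id of the matching row in the central database;
                                         // null in the central database, populated in local copies
    public string Name { get; set; }
    // ... the rest of the hundreds of attributes
}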
Can anyone foresee any stumbling blocks to doing this? Would I be better off using one of the recommended solutions for data synchronisation, and if so, would these work with the Entity Framework?
Syncing data between relational databases is a pain. Your best course of action probably depends on: how many users there will be; how probable conflicts are (i.e. whether the users will work offline on the same data); and possibly what kind of manpower you have (do you have proper DBAs/SQL Server devs standing by to assist with the SQL part, or are you just .NET devs?).
I don't envy you this task, it smells of trouble. I'd especially be worried about data corruption and spreading that corruption to all clients rapidly. I'd put extreme countermeasures in place before any data in the remote DB gets updated.
If you predict a lot of conflicts - the same chunk of data gets modified many times by multiple users - I'd probably at least consider creating an additional 'merge' layer to figure out what the correct order of operations to perform on the remote db is.
One thought - it might be very wrong and crazy, but just the thing that popped into my mind - would be to use JSON Patch on the entities, be they actual domain objects or some configuration containers. All the changes the user makes are recorded as JSON Patch statements, then applied to the local db, and when the user is online they are submitted - with timestamps! - to a merge provider. The JSON Patch statements from different clients could be grouped by entity id and sorted by timestamp, and the user could get feedback on what other operations from different users are queued - and manually make amends to it. Those grouped statements could even be stored in files in a git repo. Then, at some pre-defined intervals or triggered manually, the update would be performed by a server-side app and saved to the remote db. After this the users' local copies would be refreshed from the server.
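As a rough sketch, a recorded change could be wrapped in an envelope like the one below (the class shape and all names are made up for illustration; the patch itself follows RFC 6902, the JSON Patch spec):

using System;

public class RecordedChange
{
    public string EntityType { get; set; }         // e.g. "Customer"
    public int EntityId { get; set; }              // used to group changes per entity
    public string UserName { get; set; }
    public DateTimeOffset Timestamp { get; set; }  // used to order changes on the merge side
    public string JsonPatch { get; set; }          // the RFC 6902 document, e.g.:
    // [
    //   { "op": "replace", "path": "/phone",        "value": "555-1234" },
    //   { "op": "replace", "path": "/address/city", "value": "Leeds" }
    // ]
}
// The merge provider groups RecordedChange items by EntityId, sorts them by Timestamp,
// lets users review or amend conflicting groups, and only then applies them to the remote db.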
It's just a rough idea, but I think that you need something with a similar capability - it doesn't have to be JSON Patch + Git, you can do it in probably hundreds of ways. I don't think, though, that you will get away with just going through the local/remote db and making updates/merges. Imagine the scenario where one user updates some data (let's say 20 fields) offline, another makes completely different updates to 20 fields, and 10 of those are common between the users. Now, what should the sync process do? Apply the earlier and then the latter changes? I'm fairly certain that both users would be furious, because their input was 'atomic' - either everything is changed, or nothing is. The latter 'commit' must either be rejected, or users should have an option to amend it in light of the new data. That highly depends on what your data is and, as I said, on the number and behaviour of users. Even time zones become important here - if all your users are in one time zone you might get away with predefined times of day when the system syncs - but there is no way you'll convince people with many different business hours that the 'sync session' will happen at e.g. 11 AM, when they are usually giving a presentation to management or something ;)

Serializing complex EF model over JSON

I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point is that when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then blows the stack.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
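(For reference, the ReferenceLoopHandling setting and the no-tracking query mentioned above look roughly like this; "MyEntities" and "Resources" are assumed names for the Database First context and entity set:)

using System.Collections.Generic;
using System.Data.Entity;   // EF6 AsNoTracking()
using System.Linq;
using Newtonsoft.Json;

public static class SerializationAttempts
{
    public static string GetResourcesJson(MyEntities context)
    {
        var settings = new JsonSerializerSettings
        {
            // The setting from the first item above.
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore
        };

        // The no-tracking query from the item above: EF does not track the materialized entities.
        List<Resource> resources = context.Resources.AsNoTracking().ToList();

        return JsonConvert.SerializeObject(resources, settings);
    }
}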
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up just having a data layer which wraps the database and destroys performance because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post in a page. Instead, write data services and methods which do things that are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data like creation time.
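As a rough sketch of that difference, assuming EF entities named Post and Account (all DTO and property names below are illustrative only):

using System.Collections.Generic;
using System.Linq;

public class PostSummary
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public class PageOfPosts
{
    public int PageNumber { get; set; }
    public IReadOnlyList<PostSummary> Posts { get; set; }
}

public static class PostQueries
{
    // Projecting straight to the DTO means EF selects only these columns
    // and never wanders off into the rest of the object graph.
    public static PageOfPosts GetPage(MyEntities context, int pageNumber, int pageSize)
    {
        var posts = context.Posts
            .OrderByDescending(p => p.CreatedOn)     // property name assumed
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .Select(p => new PostSummary
            {
                PosterName = p.Account.Name,         // navigation used only inside the projection
                Message = p.Text
            })
            .ToList();

        return new PageOfPosts { PageNumber = pageNumber, Posts = posts };
    }
}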
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. Your alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
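A minimal sketch of those two settings in EF6 ("MyEntities" is an assumed name; with a Database First model the two lines go in the generated context's constructor):

using System.Data.Entity;

public class MyEntities : DbContext
{
    public MyEntities() : base("name=MyEntities")
    {
        // Entities handed to the serializer are now plain POCOs: no proxies,
        // and navigation properties stay null unless you load them explicitly.
        Configuration.ProxyCreationEnabled = false;
        Configuration.LazyLoadingEnabled = false;
    }
}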
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had a similar project, and it was a stressful one. I also needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1. Instead of using EF to load data, I called old-school stored procedures (and, for further optimization, grouped the results to keep the returned table as small as possible for the charts; e.g. the query returns one table from which the datasets for multiple charts can be extracted). A sketch of this is shown after the list.
2. More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance was remarkable (it is really fast, but not compatible with complex objects - for example, view models that contain lists of other models nested further and further). It is better to read about the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. The relational database design is the prime suspect in this problem. It might be good to put the tables that hold raw data to be processed (most probably for analytics) into a denormalized schema, which saves time when querying the data.
Also beware of using the model classes generated by the EF designer from the database for reading/selecting data, especially when you want to serialize them. (Sometimes I think about splitting the same schema into two sets of otherwise identical classes/models, one for writing and one for reading, so that the write models keep the virtual collections coming from foreign keys and the read models ignore them - I am not sure about this.)
NOTE: In case of very, very large data it is better to go deeper and set up an in-memory OLTP table for the particular table containing facts or raw data; in that case your table acts more like a non-relational, NoSQL-style table.
NOTE: For example, in MSSQL you can take advantage of SQLCLR, which lets you write routines in C#, VB, etc. and call them from T-SQL - in other words, handle the data processing at the database level.
4. For interactive views that need to load data, I think it is better to consider which information should be processed server side and which can be handled client side (sometimes it is better to query data from the client side, but you should consider that data sent to the client can be accessed by the user). It is situation-dependent.
5. For large raw data tables in a view, datatables.min.js is a good idea, and everyone suggests using server-side paging for such tables.
6. For importing and exporting data from big files, I think OLE DB is the best choice.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;)
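A minimal sketch of item 1, calling a stored procedure directly with ADO.NET instead of loading entities through EF (the procedure name, parameters and row shape are made up for illustration):

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class ChartRow
{
    public string Category { get; set; }
    public decimal Value { get; set; }
}

public static class DashboardQueries
{
    public static List<ChartRow> GetDashboardData(string connectionString, DateTime from, DateTime to)
    {
        var rows = new List<ChartRow>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.GetDashboardData", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@From", from);
            command.Parameters.AddWithValue("@To", to);

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                // One grouped result set can feed the datasets of several charts.
                while (reader.Read())
                {
                    rows.Add(new ChartRow
                    {
                        Category = reader.GetString(0),
                        Value = reader.GetDecimal(1)
                    });
                }
            }
        }
        return rows;
    }
}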
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use this for later look-up.
Set the get/set modifiers of any "Navigation Properties" (sub-collections) in your EF entities to private.
This will give you an object that does not expose the sub-collections, and only the main properties will get serialized. This workaround requires some restructuring of your LINQ queries, querying your table of sub-items directly with the foreign key property as your filtering option, like this:
var myFitnessClubs = context.FitnessClubs
?.Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
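For illustration, the entity shape this relies on could look roughly like the following (FitnessClub/FitnessClubChain come from the snippet above; everything else is assumed, and with Model First the accessibility is actually set in the designer rather than written by hand):

using System.Collections.Generic;

public class FitnessClubChain
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Navigation property kept private, so it is neither exposed nor serialized.
    private ICollection<FitnessClub> FitnessClubs { get; set; }
}

public class FitnessClub
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Explicit foreign key property used for the look-up shown above.
    public int FitnessClubChainID { get; set; }
}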
Note 1:
You may of course choose to implement this solution only partially, so that it affects only the sub-collections you definitely do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.

Is this a good approach to select the database based on the user?

I'm developing a system using ASP.NET MVC + WebAPI + AngularJS which has the following property: users can log in, and different users have totally different data. The reason is simple: the system allows management of data, but although the schema is the same for everyone, the data is totally disconnected between users. For reasons of organization, consistency and security, each user would need a separate database.
This gives rise to a problem: although every single database should be the same, i.e. same tables and columns, and hence the same EF data context, the connections are different. This confuses me because I'm used to specifying the connection string in the config XML file, and that can't be done here, since the connection string would be dynamic.
I've thought about a solution, though I don't know whether it's the best one: I create one repository which, in its constructor, receives the username of the logged-in user. The repository then goes to the system's own database and looks up the connection data for that user (this data would be provided when the user registers). Finally, the repository builds the connection string and feeds it into the DbContext.
Is this a good approach to this problem? Or are there more recommended ways to deal with this kind of thing? Security is an important concern here, and because of that I'm unsure of my approach.
Each Data Context in an Entity Framework solution has a constructor overload that allows you to specify a connection string. You can find out how to build and use that connection string at the link below.
Reference
How to: Build an EntityConnection Connection String
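A minimal sketch of that approach (EF6 style; the context name, repository shape and look-up method are assumptions, not from the question):

using System.Data.Entity;

public class MyDataContext : DbContext
{
    // DbContext has an overload taking a connection string (or a connection string name).
    public MyDataContext(string connectionString) : base(connectionString) { }

    // ... DbSets for the per-user schema
}

public class RepositoryFactory
{
    public MyDataContext CreateForUser(string userName)
    {
        // Look up the user's connection data in the system database, then build the
        // connection string and hand it to the context.
        string connectionString = LookUpConnectionStringFor(userName);
        return new MyDataContext(connectionString);
    }

    private string LookUpConnectionStringFor(string userName)
    {
        // Hypothetical: read the server/database/credentials stored at registration time.
        throw new System.NotImplementedException();
    }
}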
That said, unless you have very special requirements, it's much better from a maintenance and operational standpoint to simply put a UserID in the appropriate tables, and filter on the currently logged in User ID.
One database per user seems like a crazy solution to me.
Include a user_id column in tables that contain per user data and filter on it appropriately.
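As a minimal sketch of that idea ("Orders", "UserId" and the context name are assumed):

using System.Collections.Generic;
using System.Linq;

public static class PerUserQueries
{
    // Every table holding per-user data carries a user id column,
    // and every query filters on the currently logged-in user.
    public static List<Order> GetOrders(MyContext db, int currentUserId)
    {
        return db.Orders
            .Where(o => o.UserId == currentUserId)
            .ToList();
    }
}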
I think this depends on how complex your per-user database will be. If we are talking about 5-10 tables, then it is easier to add, manage and query an additional ID column in all tables than it is to manage multiple databases. But the more complex the model gets and the more tables there are in the database, the easier it becomes to have one database per user compared to having one more column in each table and adding the user checks to all your queries, especially the more complicated ones.
Same goes for performance. If you are expecting user databases to grow large in volume of data, then having separate databases could allow you to scale horizontally by putting different databases into different servers.
A shared database will also pose a problem if a user requests raw access to their data. And that can happen, for example when they want to migrate the data.

How to manage session/transaction lifetime for processing many entities

In the project my team is working on, there is a Windows service which iterates through all the entities in a certain table and updates some of their fields based on some rules we defined. We use NHibernate as our ORM tool. Currently, we open one session and one transaction for the entire process, which means the transaction is committed after all the entities have been processed. I think this approach isn't good, and I wanted to hear some more opinions:
Should we keep our current way of managing the session, Or should move to a different approach?
One option I thought about is opening a transaction per entity, and another suggestion was to open a new session for each entity.
Which approach do you think will work best?
There isn't a single way to do it; it all depends on the specific cases.
In the app I'm working on, I have examples of the three approaches, and there's a reason for choosing each one. For example:
The whole process must have transactional atomicity: use a single transaction
The process has a lot of common data, but each record in the "master" table can be considered a unit of work: use a single session, multiple transactions
Processing each record in the master table should be independent from the others (including error handling): use a session per record
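For illustration, a minimal sketch of the third variant (a session and transaction per record) with NHibernate; the entity type, the id list and the ApplyRules method are assumptions about your service, not actual code from it:

using System.Collections.Generic;
using NHibernate;

public class EntityUpdater
{
    private readonly ISessionFactory _sessionFactory;

    public EntityUpdater(ISessionFactory sessionFactory)
    {
        _sessionFactory = sessionFactory;
    }

    public void ProcessAll(IEnumerable<long> entityIds)
    {
        foreach (var id in entityIds)
        {
            // One session and one transaction per record: a failure in one record
            // is rolled back and logged without affecting records already committed.
            using (var session = _sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                try
                {
                    var entity = session.Get<MyEntity>(id);
                    ApplyRules(entity);       // the rules your service applies
                    tx.Commit();
                }
                catch
                {
                    tx.Rollback();
                    // log and continue with the next record
                }
            }
        }
    }

    private void ApplyRules(MyEntity entity)
    {
        // domain rules go here
    }
}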

Why is the asp.NET profile designed in such a horrible way?

In the current project I'm working on, we are using the ASP.NET profile to store information about users, such as their involvement in a mailing list.
Now, in order to get a list of all the users in the mailing list, I cannot simply do a database query, as the asp.NET profile table is, simply put, awful.
For those who do not know, the profile table has two main columns, the "keys" column and the "values" column, and they are organised like so:
Keys:
Key1:dataType:startIndex:endIndex:Key2:dataType ... etc.
Values:
value1value2value3...
This is pretty much impossible to query using SQL, so the only option to find users that have a specific property is to load up a list of ALL the users and loop through it.
In a site with over 150k members, this is understandably very slow!
Are there specific reasons why the Profile was designed like this, or is it just a terrible way of doing dynamically-generated data?
I agree that it's a pretty bad way to store profile data, but I suspect the use case was just to get the profile data for a user with a single query but in such a way that it can be extended to handle any number of different profile properties. If you don't like it, you can always write your own, custom profile provider that separates each value out into its own column. Having implemented various membership and role providers, I don't think that this would be too complicated a task. The number of methods doesn't look too large.
The whole point of the Provider model is that it abstracts away the data source. The idea is that, as a developer, you don't need to know how the data is stored or in what format - you just have a common set of methods for accessing it. This means you can swap providers without changing a single line of code. It also means that you specifically do not try and access data direct from the data source (eg. going straight to the database) by bypassing the provider methods - that defeats the whole point.
The default ASP.NET profile provider is actually very powerful, as it can not only store simple value types (strings, ints, etc.) but can also store complex objects and entire collections in a single field. Try doing that in a relational database! However, the downside of this genericism is that it comes at the cost of efficiency. Which is why, if you have a specific need, you are supposed to implement your own provider. For example, see SearchableSqlProfileProvider - The Searchable SQL Profile Provider.
Of course, your third option is to simply not use the profile provider - nobody is forcing you to! You could implement your own classes/database entirely, as you would have had to do in other frameworks.
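For example, a rough sketch of that third option with ordinary columns (all names such as UserProfile and ProfilesContext are made up), so finding the mailing list members becomes a plain indexed query instead of a scan over 150k serialized profiles:

using System.Collections.Generic;
using System.Linq;

public class UserProfile
{
    public int Id { get; set; }
    public string UserName { get; set; }
    public bool IsOnMailingList { get; set; }
    // ... other profile properties as ordinary, queryable columns
}

public static class MailingListQueries
{
    public static List<UserProfile> GetMailingListMembers(ProfilesContext db)
    {
        return db.UserProfiles
            .Where(p => p.IsOnMailingList)
            .ToList();
    }
}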
I have implemented various custom providers (Membership/SiteMap/Roles etc.) and haven't really looked at the ASP.NET Profile Provider after seeing that kind of thing (name/value pairs or XML data). I am not sure, but I think the Profile is primarily intended for user preferences/settings that are only required for a specific user; I don't think the Profile is meant for user "data" that needs to be queried.
Note: This is an assumption based on what I think I know; please comment if otherwise.
