How to manage session/transaction lifetime for processing many entities - C#

In the project my team is working on, there is a Windows service which iterates through all the entities in a certain table and updates some of their fields based on rules we defined. We use NHibernate as our ORM tool. Currently, we open one session and one transaction for the entire process, which means the transaction is committed only after all the entities have been processed. I don't think this approach is good, and I wanted to hear some more opinions:
Should we keep our current way of managing the session, or should we move to a different approach?
One option I thought about is opening a transaction per entity, and another suggestion was to open a new session for each entity.
Which approach do you think will work best?

There isn't a single way to do it; it all depends on the specific cases.
In the app I'm working on, I have examples of all three approaches, and there's a reason for choosing each one (a sketch of the middle option follows the list). For example:
The whole process must have transactional atomicity: use a single transaction
The process has a lot of common data, but each record in the "master" table can be considered a unit of work: use a single session, multiple transactions
Processing each record in the master table should be independent from the others (including error handling): use a session per record
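As a rough sketch of that middle option (single session, transaction per record), assuming an NHibernate ISessionFactory; the MyEntity type, the id query and UpdateFields are placeholders, not your actual code:

    using System.Linq;
    using NHibernate;
    using NHibernate.Linq;

    public class MyEntity
    {
        public virtual int Id { get; set; }
        // ...the fields your rules update...
    }

    public class BatchUpdater
    {
        private readonly ISessionFactory _sessionFactory;

        public BatchUpdater(ISessionFactory sessionFactory)
        {
            _sessionFactory = sessionFactory;
        }

        public void Run()
        {
            using (ISession session = _sessionFactory.OpenSession())
            {
                // Fetch only the ids up front so the session doesn't have to track everything at once.
                var ids = session.Query<MyEntity>().Select(e => e.Id).ToList();

                foreach (var id in ids)
                {
                    using (ITransaction tx = session.BeginTransaction())
                    {
                        var entity = session.Get<MyEntity>(id);
                        UpdateFields(entity);   // placeholder for your business rules
                        tx.Commit();            // each record commits on its own
                    }
                    session.Clear();            // keep the first-level cache from growing
                }
            }
        }

        private void UpdateFields(MyEntity entity)
        {
            // apply the rules you defined for each entity
        }
    }

For the session-per-record variant you would instead open (and dispose) a new session inside the loop, so a failure on one record cannot poison the session used for the others.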

How to paginate OwnsMany collections?

Consider a domain entity User. I want to identify (fingerprint) this user by a set of properties such as IP, email, phone and user-agent. Since by DDD principles a Fingerprint can't be an entity, I defined it as a ValueObject. Each time a user tries to make a transaction, I look up a matching fingerprint to associate a user with the request.
EF Core suggests using OwnsMany() for ValueObjects. My problem is that this collection of owned entities is loaded immediately, without any pagination. I may load 100 users per page, and each of them may have hundreds of fingerprints, because each time the IP or user-agent changes I have to create a new one.
My questions are
Is there a way to paginate those fingerprints? I can't do it, because the repository has a constraint for aggregate roots only.
Can I actually use OwnsMany ValueObjects for situations when there are more than 1k objects?
If not, how do I solve this problem?
I do not know your whole domain model, so I assume there is a reason why User is an entity and contains a collection of Fingerprints.
Each time when a user tries to make a transaction I make a lookup for a matching fingerprint to associate a user with the request.
You should not load all the records from the database into your application's memory. As I understand it, what you really need when a User is making a transaction is not all the fingerprints, but only the matching one. A solution would be to properly prepare your object when you load it from the database: do not load all the Fingerprints, only the one you are interested in. You can make another property for that in your User class. Write a LINQ query or SQL select in your infrastructure layer.
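A hedged sketch of such a targeted lookup in the infrastructure layer, assuming an EF Core context where Users owns Fingerprints via OwnsMany; the AppDbContext name, the Guid id and the Ip/UserAgent properties are illustrative:

    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    public static class FingerprintLookup
    {
        // Returns the id of the user behind a matching fingerprint without materialising
        // the whole Fingerprints collection into memory.
        public static async Task<Guid?> FindUserIdAsync(AppDbContext dbContext, string ip, string userAgent)
        {
            var match = await dbContext.Users
                .Where(u => u.Fingerprints.Any(f => f.Ip == ip && f.UserAgent == userAgent))
                .Select(u => new { u.Id })
                .SingleOrDefaultAsync();

            return match?.Id;
        }
    }

The owned collection is only used inside the query, so the filtering happens in SQL and only the matching user id comes back.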
This solution might cause another problem: you will have that extra property in your User class, and it will not update automatically when you add a new Fingerprint. If you do not feel comfortable with that, I would suggest separating your read and write models. You can use tactical DDD patterns for saving, but you probably don't need them for simple reads.

How to implement a C# Winforms database application with synchronisation?

Background
I am developing a C# WinForms application - currently up to about 11000 LOC. The UI and logic are about 75% done, but there is no persistence yet. There are hundreds of attributes on the forms and 23 entities/data classes.
Requirement
The data needs to be kept in an SQL database. Most of the users operate remotely and we cannot rely on them having a connection so we need a solution that maintains a database locally and keeps it in synch with the central database.
Edit: Most of the remote users will only require a subset of the database in their local copy. This is because if they don't have access permissions (as defined and stored in my application) to view other users' records, they will not receive copies of them during synchronisation.
How can I implement this?
Suggested Solution
I could use the Microsoft Entity Framework to create a database and the link between database and code. This would save a lot of manual work as there are hundreds of attributes. I am new to this technology but have done a "hello world" project in it.
For data synch, each entity would have an integer primary key ID. Additionally it would have a secondary ID column which relates to the central database. This secondary column would contain nulls in the central database but would be populated in the local databases.
For synchronisation, I would write code which copies the records and assigns the IDs accordingly. I would need to handle conflicts.
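Roughly, I imagine each entity class looking something like this (the names are only illustrative):

    public class Customer
    {
        public int Id { get; set; }           // local integer primary key (identity)
        public int? CentralId { get; set; }   // id of the matching row in the central database;
                                              // null in the central database itself, populated in
                                              // the local copies once the record has been synchronised
        public string Name { get; set; }
        // ...the rest of the attributes...
    }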
Can anyone foresee any stumbling blocks to doing this? Would I be better off using one of the recommended solutions for data synchronisation, and if so, would these work with Entity Framework?
Synching data between relational databases is a pain. Your best course of action probably depends on: how many users will there be? How probable are conflicts (i.e. how likely is it that users will work offline on the same data)? And possibly, what kind of manpower do you have (do you have proper DBAs/SQL Server devs standing by to assist with the SQL part, or are you just .NET devs)?
I don't envy you this task, it smells of trouble. I'd especially be worried about data corruption and spreading that corruption to all clients rapidly. I'd put extreme countermeasures in place before any data in the remote DB gets updated.
If you predict a lot of conflicts - the same chunk of data gets modified many times by multiple users - I'd probably at least consider creating an additional 'merge' layer to figure out the correct order of operations to perform on the remote DB.
One thought - it might be very wrong and crazy, but it's just what popped into my mind - would be to use JSON Patch on the entities, be they actual domain objects or some configuration containers. All the changes the user makes are recorded as JSON Patch statements and applied to the local DB; when the user is online, they are submitted - with timestamps! - to a merge provider. The JSON Patch statements from different clients could be grouped by entity id and sorted by timestamp, and users could get feedback on what other operations from other users are queued - and manually make amends to them. Those grouped statements could even be stored in files in a git repo. Then at some pre-defined intervals, or triggered manually, the update would be performed by a server-side app and saved to the remote DB. After this the users' local copies would be refreshed from the server.
It's just a rough idea, but I think you need something with similar capability - it doesn't have to be JSON Patch + Git; you could do it in probably hundreds of ways. I don't think, though, that you will get away with just walking through the local/remote DB and making updates/merges. Imagine the scenario where one user updates some data (let's say 20 fields) offline, another makes completely different updates to 20 fields, and 10 of those are common between the users. Now, what should the sync process do? Apply the earlier and then the later changes? I'm fairly certain that both users would be furious, because their input was 'atomic' - either everything is changed, or nothing is. The later 'commit' must either be rejected, or users should have an option to amend it in light of the new data. That highly depends on what your data is and, as I said, on the number/behaviour of the users. Duh, even time zones become important here - if your users are all in one time zone you might get away with predefined times of day when the system synchs, but there's no way you'll convince people with many different business hours that the 'synch session' will happen at e.g. 11 AM, when they are usually giving a presentation to management or something ;)
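To make the recording/grouping part slightly more concrete, here is a hedged C# sketch; the PatchRecord shape and all its property names are my own invention, not a prescribed design:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // One recorded change, in the spirit of a JSON Patch operation plus sync metadata.
    public class PatchRecord
    {
        public Guid EntityId { get; set; }
        public string ClientId { get; set; }
        public DateTimeOffset Timestamp { get; set; }
        public string Op { get; set; }      // "replace", "add", "remove"
        public string Path { get; set; }    // e.g. "/address/line1"
        public string Value { get; set; }   // serialised new value
    }

    public static class MergePlanner
    {
        // Group queued statements per entity and order them by time, so the server-side
        // merge step (or a human) can review conflicting updates before applying them.
        public static IEnumerable<IGrouping<Guid, PatchRecord>> Plan(IEnumerable<PatchRecord> pending)
        {
            return pending
                .OrderBy(p => p.Timestamp)
                .GroupBy(p => p.EntityId);
        }
    }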

C#: update single DB field or whole object?

This might seem like an odd question, but it's been bugging me for a while now. Given that I'm not a hugely experienced programmer, and I'm the sole application/C# developer in the company, I felt the need to sanity check this with you guys.
We have created an application that handles shipping information internally within our company; this application works with a central DB at our IT office.
We've recently switched the DB from MySQL to MSSQL, and during the transition we decided to forgo the web services previously used and connect directly to the DB using an Application Role. For added security we only allow access to stored procedures, and all CRUD operations are handled via these.
However, we currently have stored procedures for updating every field in one of our objects, which is quite a few stored procedures, and as such quite a bit of work on the client for the DataRepository (needing separate code to call the procedure and pass the right params for each one).
So I'm thinking: would it be better to simply update the entire object (in this case, an object represents a table, for example shipments), given that a lot of that data would be changed one field at a time after the initial insert, and that we are trying to keep network usage down, as some of the clients will run with limited internet?
What's the standard practice for this kind of thing? Or is there a method that I've overlooked?
I would say that updating all the columns for the entire row is a much more common practice.
If you have a proc for each field, and you change multiple fields in one update, you will have to wrap all the stored procedure calls into a single transaction to avoid the database getting into an inconsistent state. You also have to detect which field changed (which means you need to compare the old row to the new row).
Look into using an Object Relational Mapper (ORM) like Entity Framework for these kinds of operations. You will find that there is no general consensus on whether ORMs are a great solution for all data access needs, but it's hard to deny that they solve the problem of CRUD pretty comprehensively.
Connecting directly to the DB over the internet isn't something I'd switch to in a hurry.
"we decided to forgo the webservices previously used and connect directly to the DB"
What made you decide this?
If you are intent on this model, then a single SPROC to update an entire row would be advantageous over one per column. I have a similar application which uses SPROCs in this way; however, the data from the client comes in via XML, and a middleware application on our server end deals with updating the DB.
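For example, a single whole-row update could be called roughly like this; the proc name, parameters and the assumed Shipment class with matching properties are illustrative, not your schema:

    using System.Data;
    using System.Data.SqlClient;

    public static class ShipmentDataAccess
    {
        public static void UpdateShipment(string connectionString, Shipment shipment)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("dbo.UpdateShipment", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@ShipmentId", shipment.Id);
                cmd.Parameters.AddWithValue("@Status", shipment.Status);
                cmd.Parameters.AddWithValue("@Carrier", shipment.Carrier);
                // ...one parameter per column; the proc sets every column in the row...
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }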
The standard practice is not to connect to the DB over the internet.
Even for a small app, this should be the overall model:
Client app -> over internet -> server-side app (WCF web service) -> LAN/localhost -> SQL DB
Benefits:
your client app would not even know that you have switched DB implementations.
It would not know anything about DB security, etc.
you, as a programmer, would not be thinking in terms of "rows" and "columns" on the client side. Those would be objects and fields.
you would be able to use different protocols: send only single field updates between client app and server app, but update entire rows between server app and DB.
Now, given your situation, updating the entire row (the entire object) is definitely more of a standard practice than updating a single column.
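A hedged sketch of the "different protocols" point above, assuming WCF; the contract name and operation are illustrative:

    using System.ServiceModel;

    [ServiceContract]
    public interface IShipmentService
    {
        // The client sends only the field that changed; the service implementation can still
        // perform a whole-row update (or whatever is most efficient) against the database.
        [OperationContract]
        void UpdateShipmentField(int shipmentId, string fieldName, string newValue);
    }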
It's better to only update what you change, if you know what you change (if using an ORM like Entity Framework, for example), but if you're going down the stored proc route then yes, definitely update everything in a row at once; that's granular enough.
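For instance, with EF's change tracking only the modified properties end up in the generated UPDATE; a rough sketch, where ShippingContext, Shipments and shipmentId are illustrative names:

    using (var db = new ShippingContext())
    {
        var shipment = db.Shipments.Find(shipmentId);
        shipment.Status = "Dispatched";   // only this property is modified
        db.SaveChanges();                 // the generated UPDATE sets only the changed column(s)
    }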
You should take the switch as an opportunity to change over to LINQ to Entities, since you're already in the middle of a big change anyway, and ditch stored procedures in the process wherever possible.

Is this a good approach to select the database based on the user?

I'm developing a system using ASP.NET MVC + WebAPI + AngularJS which has the following property: users can log in, and different users have totally different data. The reason is simple: the system allows management of data, but although the schema is the same for everyone, the data is totally disconnected between users. If only for organization, consistency and security, each user would need a separate database.
This gives rise to a problem: although every single database should be the same, i.e. same tables and columns, and hence the same EF data context, the connections are different. This confuses me because I'm used to specifying the connection string in the config XML file, and that can't be done here, since the connection string would be dynamic.
I've thought about a solution, though I don't know if it's the best one: I create one repository which receives, in its constructor, the username of the logged-in user. The repository then goes to the system's database and looks up the connection data for that logged-in user (this data would be provided when the user registers). The repository then builds the connection string and feeds it into the DbContext.
Is this a good approach to the problem? Or are there more recommended ways to deal with this kind of thing? Security is one important concern here, and because of that I'm unsure of my approach.
Each Data Context in an Entity Framework solution has a constructor overload that allows you to specify a connection string. You can find out how to build and use that connection string at the link below.
Reference
How to: Build an EntityConnection Connection String
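A hedged sketch of that overload in use, EF6-style; the TenantContext, Customer and GetConnectionStringFor names are illustrative, and how the lookup against your system database works is up to you:

    using System;
    using System.Data.Entity;

    public class TenantContext : DbContext
    {
        // The base constructor accepts either a connection-string name or a full connection string.
        public TenantContext(string connectionString) : base(connectionString) { }

        public DbSet<Customer> Customers { get; set; }
    }

    public class UserRepository
    {
        public TenantContext CreateContextFor(string userName)
        {
            // Look up the user's connection data in the system database and build the string.
            string connectionString = GetConnectionStringFor(userName);
            return new TenantContext(connectionString);
        }

        private string GetConnectionStringFor(string userName)
        {
            throw new NotImplementedException();   // placeholder for the lookup against the system DB
        }
    }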
That said, unless you have very special requirements, it's much better from a maintenance and operational standpoint to simply put a UserID in the appropriate tables, and filter on the currently logged in User ID.
One database per user seems like a crazy solution to me.
Include a user_id column in tables that contain per user data and filter on it appropriately.
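For example, with a shared context db and illustrative Items/UserId names:

    // Every per-user query filters on the id of the logged-in user.
    var items = db.Items
        .Where(i => i.UserId == currentUserId)
        .ToList();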
I think this depends on how complex your per-user database will be. If we are talking about 5-10 tables, then it is easier to add, manage and query an additional ID column in all tables than it is to manage multiple databases. But the more complex the model gets and the more tables there are in the database, the easier it becomes to just have one database per user, compared to having one more column in each table and having to add the user checks to all your queries, especially the more complicated ones.
The same goes for performance. If you expect user databases to grow large in volume of data, then having separate databases could allow you to scale horizontally by putting different databases onto different servers.
A shared database will also pose a problem if a user requests raw access to their data, which can happen, for example, when they want to migrate the data.

Entity Framework and ADO.NET with Unit of Work pattern

We have a system built using Entity Framework 5 for creating, editing and deleting data, but the problem we have is that sometimes EF is too slow, or it simply isn't possible to use Entity Framework (views which build data for tables based on users participating in certain groups in the database, etc.), and we are having to use a stored procedure to update the data.
However, we have gotten ourselves into a situation where we have to save the changes through EF in order to have the data in the database, and then call the stored procedures. We can't use TransactionScope as it always escalates to a distributed transaction and/or locks the table(s) for selects during the transaction.
We are also trying to introduce a DomainEvents pattern which will queue events and raise them after the save changes, so we have the data we need in the DB, but then we may end up with the first part succeeding and the second part failing.
Are there any good ways to handle this or do we need to move away from EF entirely for this scenario?
I had a similar scenario. I later broke the process into small ones and used EF only, keeping each small process short. Even though the overall time is longer, the system is easier to maintain and scale. I also minimized joins, updated only the entity itself, and disabled EF's AutoDetectChangesEnabled and ValidateOnSaveEnabled.
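For reference, a minimal sketch of those two settings, assuming an EF5/EF6-style DbContext; MyContext, Items and batch are placeholder names for one small unit of work:

    using (var context = new MyContext())
    {
        context.Configuration.AutoDetectChangesEnabled = false;   // don't scan tracked entities on every operation
        context.Configuration.ValidateOnSaveEnabled = false;      // skip validation during SaveChanges

        foreach (var change in batch)
        {
            var entity = context.Items.Find(change.Id);
            entity.Value = change.Value;
        }

        context.ChangeTracker.DetectChanges();                    // detect all changes once, just before saving
        context.SaveChanges();
    }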
Sometimes if you look at your problem in different ways, you may find a better solution.
Good luck!
