How to solve Optimistic Concurrency Updates in c# .NET N-tier applications?

Hi everyone.
I'm developing an N-tier CRM framework in C# .NET (VS 2008), and when it's done I want to share it.
The architecture is based on:
Data Access Layer, Entity Framework, Business Logic Layer, WCF, and finally the presentation layer (WinForms).
I read somewhere that more than two tiers are problematic because of optimistic concurrency updates (multiple clients working on the same data).
In solutions with at most two tiers this should not be a problem, because controls such as DataGridView handle it themselves, so I'm wondering whether it would be better to stick with two tiers and avoid the optimistic concurrency problem.
However, I want to build an N-tier solution for large projects, not a 2-tier one. I don't know how to solve concurrency problems like this and hope to get help here.
Surely there are good mechanisms for this. Any suggestions or examples?
Thanks in advance.
Best regards,
Jooj

It's not really a question of the number of tiers. The question is how does your data access logic deal with concurrency. Dealing with concurrency should happen in whichever tier handles your data access regardless of how many tiers you have. But I understand where you're coming from as the .NET controls and components can hide this functionality and reduce the number of tiers needed.
There are two common methods of optimistic concurrency resolution.
The first is using a timestamp on rows to determine if the version the user was looking at when they started their edit has been modified by the time they commit their edit. Keep in mind that this is not necessarily a proper Timestamp database data type. Different systems will use different data types each with their own benefits and drawbacks. This is the simpler approach and works fine with most well designed databases.
The second common approach is, when committing changes, to identify the row in question not only by id but by all of the original values for the fields that the user changed. If the original values of the fields and id don't match on the record being edited you know that at least one of those fields has been changed by another user. This option has the benefit that even if two users edit the same record, as long as they don't change the same fields, the commit works. The downside is that there is possible extra work involved to guarantee that the data in the database record is in a consistent state as far as business rules are concerned.
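The two checks described above can be sketched in C# against an in-memory stand-in for a table. This is only an illustration under stated assumptions: `CustomerRow`, `CustomerTable` and the `Version` field are hypothetical names, and a real data access layer would issue the equivalent `UPDATE ... WHERE` statement and inspect the affected row count instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type: Version plays the role of the timestamp/row-version column.
public class CustomerRow
{
    public int Id;
    public string Name;
    public string Phone;
    public int Version;
}

// In-memory stand-in for a table (assumption for illustration only).
public class CustomerTable
{
    private readonly Dictionary<int, CustomerRow> rows = new Dictionary<int, CustomerRow>();

    public void Insert(int id, string name, string phone)
    {
        rows[id] = new CustomerRow { Id = id, Name = name, Phone = phone, Version = 1 };
    }

    // Returns a copy so the caller edits a snapshot, as a remote client would.
    public CustomerRow Read(int id)
    {
        CustomerRow r = rows[id];
        return new CustomerRow { Id = r.Id, Name = r.Name, Phone = r.Phone, Version = r.Version };
    }

    // Approach 1: succeed only if nobody has bumped the version since the row was read.
    public bool TryUpdateByVersion(CustomerRow edited)
    {
        CustomerRow current = rows[edited.Id];
        if (current.Version != edited.Version) return false; // someone else saved first
        current.Name = edited.Name;
        current.Phone = edited.Phone;
        current.Version++;
        return true;
    }

    // Approach 2: compare only the changed field against its original value.
    public bool TryUpdatePhone(int id, string originalPhone, string newPhone)
    {
        CustomerRow current = rows[id];
        if (current.Phone != originalPhone) return false; // this field changed meanwhile
        current.Phone = newPhone;
        return true;
    }
}
```

If two clients read the same row, the first `TryUpdateByVersion` call succeeds and the second returns `false` because its snapshot is stale. `TryUpdatePhone` fails only if the phone field itself was changed in the meantime.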
Here's a decent explanation of how to implement simple optimistic concurrency in EF.

We use a combination of manual merging (determining change sets and collisions) and last-man-wins, depending on the data requirements. If the changes collide (the same field changed from a common original value), merge-type exceptions are thrown and the clients handle them.
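A minimal sketch of the merge described above, assuming each record is represented as a field-name-to-value dictionary. `FieldMerger` and `MergeConflictException` are hypothetical names, not the answerer's actual code.

```csharp
using System;
using System.Collections.Generic;

// Thrown when both sides changed the same field away from the common original.
public class MergeConflictException : Exception
{
    public MergeConflictException(string field)
        : base("Conflicting edits to field '" + field + "'") { }
}

public static class FieldMerger
{
    // original: the values both users started from; mine/theirs: the two edited copies.
    // All three dictionaries are assumed to contain the same keys.
    public static Dictionary<string, string> Merge(
        IDictionary<string, string> original,
        IDictionary<string, string> mine,
        IDictionary<string, string> theirs)
    {
        var merged = new Dictionary<string, string>();
        foreach (string field in original.Keys)
        {
            bool iChanged = mine[field] != original[field];
            bool theyChanged = theirs[field] != original[field];

            // Collision: same field changed by both sides to different values.
            if (iChanged && theyChanged && mine[field] != theirs[field])
                throw new MergeConflictException(field);

            merged[field] = iChanged ? mine[field] : theirs[field];
        }
        return merged;
    }
}
```

Edits to different fields merge cleanly. The exception is raised only when both copies changed the same field to different values, which is the point where the client has to decide.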

A few things come to my mind:
1) If you are using EF, surely you don't have a Data Access Layer?! Do you mean the database itself?
2) The question of tiers is both a physical and a logical one. Do you mean physical or logical tiers?
3) Any application, whatever the number of tiers, has this concurrency issue. Even in client-server, someone could open a form, go somewhere, come back and then save after the data has been changed by someone else. You can use a timestamp, checked on save, to make sure the row has not been updated since you read it.
4) Don't think too much about fewer or more tiers. Just implement the functionality as simply as possible, with the minimum number of layers.

Related

Application DAL design

Hello and thanks for looking.
I have a DAL question for an application I'm working on. The app is going to extract some data from 5-6 tables from a production RDBMS that serves a much more critical role in the org. What the app has to do is use the data in these tables, analyze, apply some business logic/rules and then present.
The restrictions are that since the storage model is critical in nature to the org, I need to restrict how the app will request the data. Since the tables are relatively small, I created my data access to use DataTables to load the entirety of the db tables on a fixed interval using a timer.
My questions are really around my current design and the potential use of EF or LINQ to SQL:
Can EF/LINQ to SQL work within the restrictions of the RDBMS? In most tutorials I've seen, the storage exists solely for the application. Can access to the storage be controlled, and/or can EF use DataTables rather than an RDBMS?
Since the entirety of each table is going to be loaded, is there a best practice for creating classes to consume the data within these tables? I will have to do in-memory joins and querying/logic to get at the actual data I need.
Sorry if I'm being generic. I'm looking more for thoughts and opinions than for a solution to my problem. Please don't hesitate to share your thoughts. Thanks.

For your first question: yes, Entity Framework can use an existing DB as its source. The term to search for when looking for Entity Framework tutorials on this topic is "Database First".
For your second question, let me first preface it with a warning: many ORMs are not designed for loading an entire table and doing bulk operations on it, especially if you will be modifying the result set and pushing the data back to the server in large quantities. The updates will be row based, not set based, because you did the modifications in C# code, not in a T-SQL query. Most ORMs are built around the expectation that you will be doing CRUD operations at the row level, not ETL operations or set-level CRUD operations (except for Read, which most ORMs will do as a set operation).
If you will not be updating the data, only pulling it out with Entity Framework and building reports and whatnot off of it, you should be fine. If you are bulk inserting into the database, things get more problematic. See this SO question for more information.
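For the read-only case, the in-memory joins mentioned in the question can be written with LINQ over plain classes once the tables are loaded. The following is a sketch only. `Customer`, `Order` and `CachedQueries` are hypothetical shapes standing in for two of the cached tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for two of the cached tables.
public class Customer { public int Id; public string Name; }
public class Order { public int Id; public int CustomerId; public decimal Total; }

public static class CachedQueries
{
    // Joins the two in-memory snapshots and totals the orders per customer name.
    public static Dictionary<string, decimal> TotalsByCustomer(
        IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        return customers
            .Join(orders, c => c.Id, o => o.CustomerId, (c, o) => new { c.Name, o.Total })
            .GroupBy(x => x.Name)
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Total));
    }
}
```

Because the query runs against the snapshot held in memory, it puts no additional load on the production RDBMS between refreshes.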

Is using CRUD stored procedures against a view with NOLOCK bad?

Our DBAs have created a pattern where our database layer is exposed to EF via views and CRUD stored procedures. The CRUD works against the view. All the views have the NOLOCK hint. From what I understand, NOLOCK is a dirty read, and that makes me nervous. Our databases are not high volume, but it seems like blanket NOLOCK is not very scalable while maintaining data integrity. I get that the decoupling is a good idea, but the problem there is we don't. Our externally exposed objects look just like our views which map 1 to 1 with our tables.
"If we want to change the underlying data model, we can." ... but we don't. I won't touch on what a PITA this all is from a VS/EF tooling viewpoint.
Is NOLOCK used in this situation bad? Since our database looks exactly like our class library, I think it makes sense to just get rid of the whole view/sproc layer and hit the DB directly from EF. Does it?

Issuing a NOLOCK is absolutely a dirty read. There are times when this has no impact, but in some scenarios you may get result sets with missing records or duplicates. Itzik Ben-Gan has some Q&A regarding this topic. The reasons for using stored procs to abstract your CRUD operations become pretty obvious when you want to do some storage optimizations after the project goes into maintenance mode. Think of the views as a way for you to not need to worry about that later. It can also be easier for your DBAs to optimize the data access code without consuming your time as a developer. I cannot say that your DBAs are right or wrong based only on the data in this post. There are simply too many variables that may go into the decision. A blanket implementation of NOLOCK being the correct option would be rare, though. HTH

Bob Beauchemin's blog has many good articles about gauging the strengths and weaknesses of ORM wrappers from an expert DB designer's perspective. It's good to check out to learn what is actually going on when you use EF. Regarding the NOLOCK hint: this will be good until it isn't! As you already seem to be aware, when you scale to a certain extent you will run into all types of integrity issues, but this depends on what your tolerance is for phantom reads, writes, etc. Basically, the more precise you want to be with your atomicity, the more of a bad idea it is.

Database Design In SQL Server or C#?

Should a database be designed on SQL Server or C#?
I always thought it was more appropriate to design it on SQL Server, but recently I started reading a book (Pro ASP.NET MVC Framework) which, to my understanding, basically says that it's probably a better idea to write it in C# since you will be accessing the model through C#, which does make sense.
I was wondering what everyone else's opinion on this matter was...
I mean, for example, do you consider it "correct" to have a table that specifies constants, like an AccessLevel table that is always supposed to contain the following rows?
1 Everyone
2 Developers
3 Administrators
4 Supervisors
5 Restricted
Wouldn't it be more robust and streamlined to just have an enum for that same purpose?
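For comparison, the enum alternative raised here might look like the sketch below. The numeric values mirror the rows listed above. `AccessLevelCheck` is a hypothetical helper showing how a table row could be checked against the enum if both are kept.

```csharp
using System;

// Mirrors the rows listed in the question; the numeric values must stay
// in step with the AccessLevel table if both exist.
public enum AccessLevel
{
    Everyone = 1,
    Developers = 2,
    Administrators = 3,
    Supervisors = 4,
    Restricted = 5
}

public static class AccessLevelCheck
{
    // Returns true when an (id, name) row from the table matches the enum.
    public static bool Matches(int id, string name)
    {
        return Enum.IsDefined(typeof(AccessLevel), id)
            && ((AccessLevel)id).ToString() == name;
    }
}
```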

A database schema should be designed on paper or with an ERD tool.
It should be implemented in the database.
Are you thinking about ORMs like Entity Framework that let you use code to generate the database?
Personally, I would rather think through my design on paper before committing it to a DB myself. I would be happy to use an ORM or class generator from this DB later on.

Before VS.NET 2010 I used SQL Server Management Studio to design my databases. Now I use the EF 4.0 designer; for me it's the best way to go.

If your problem domain is complex, or its complexity grows as the system evolves, you'll soon discover you need some metadata to make life easier. C# can be a good choice as a host language for such stuff, as you can utilize its type system to enforce some invariants (like char-column lengths, null/not null restrictions or check constraints; you can declare them as consts, enums, etc.). Unfortunately I don't know of utilities that can do this out of the box (sqlmetal.exe can export some metadata, but only as XML), although some CASE tools can probably be customized. I'd go for a custom-made generator to produce the DB schema from C# (just a few hours' work compared to learning, for example, the customization options offered by Sybase PowerDesigner).

ORMs have their place; that place is NOT database design. There are many considerations in designing a database that need to be thought through, not automatically generated, no matter how appealing the idea of not thinking about design might be. There are often many things to consider that have nothing to do with the application: things like data integrity, reporting, audit tables and data imports. Using an ORM to create a database that looks like an object model may not be the best design for performance and may not give you the things you really need in terms of data integrity. Remember, even if you think nothing except the application will ever touch the database, this is not true. At some point the database will need someone to do a major data revision (to fix a problem) that is done directly on the database, not through the application. At some point you are going to need to import a million records from some other company you just bought, and you are going to need an ETL process outside the application. Putting all your hopes and dreams for the database (as well as your data integrity rules) in the application is short-sighted.

What is a better practice ? Working with Dataset or Database

I have been developing many applications and have been confused about using DataSets.
To date I haven't used DataSets; my applications work directly against the database, using queries and procedures that run on the database engine.
But I would like to know which is the better practice:
Using a DataSet?
or
Working directly on the database?
Please also give some cases where a DataSet should be used, along with the operations involved (insert/update).
Can we set a read/write lock on a DataSet with respect to our database?

You should either embrace stored procedures, or make your database dumb. That means that you have no logic whatsoever in your DB, only CRUD operations. If you go with the dumb database model, DataSets are bad. You are better off working with real objects so you can add business logic to them. This approach is more complicated than just operating directly on your database with stored procs, but you can manage complexity better as your system grows. If you have a large system with lots of little rules, stored procedures become very difficult to manage.

In ye olde times before MVC was a mere twinkle in Haack's eye, it was jolly handy to have DataSet handle sorting, multiple relations and caching and whatnot.
Us real developers didn't care about such trivia as locks on the database. No, we had conflict resolution strategies that generally just stamped all over the most recent edits. User friendliness? < Pshaw >.
But in these days of decent generic collections, a plethora of ORMs and an awareness of separation of concerns they really don't have much place any more. It would be fair to say that whenever I've seen a DataSet recently I've replaced it. And not missed it.

As a rule of thumb, I would put logic that refers to data consistency, integrity etc. as close to that data as possible, i.e. in the database. Also, if I have to fetch my data in a way that is interdependent (i.e. fetch from tables A, B and C where the relationship between A, B and C's contribution is known at request time), then it makes sense to save on callout overhead and do it in one go, via a database object such as a function or procedure (as already pointed out by OMGPonies). For logic that is a level or two removed, it makes sense to have it where dealing with it "procedurally" is a bit more intuitive, such as in a DataSet. Having said all that, rules of thumb are sometimes what their acronym implies... ROT!
In past .NET projects I've often done data imports/transformations (e.g. for bank transaction data files) in the database (one callout, all logic is encapsulated in a procedure and is transaction protected), but have "parsed" items from that same data in a second stage, in my .NET code, using DataTables and the like (although these days I would most likely skip the DataSet stage and work on them at a higher level of abstraction, using class objects).

I have seen DataSets used very well in one application, but that is in 7 years of development on quite a few different applications (at least double figures).
There are so many best practices around these days that point towards developing with objects rather than DataSets for enterprise development. Objects along with an ORM like NHibernate or Entity Framework can be very powerful and take a lot of the grunt work out of creating CRUD stored procedures. This is the way I favour developing applications, as I can separate business logic nicely this way in a domain layer.
That is not to say that DataSets don't have their place. I am sure in certain circumstances they may be a better fit than objects, but I would need to be very sure of this before going with them.

I have been wondering about this too, having not needed DataSets in my source code for months.
Actually, if your objects are O/R-mapped, and use serialization and generics, you would never need DataSets.
But DataSet has a great use in generating reports.
This is because reports have no specific structure that can be or should be O/R-mapped.
I only use DataSets in tandem with reporting tools.

Some data changes in the database. How can I trigger some C# code doing some work upon these changes?

Suppose I have some application A with a database. Now I want to add another application B, which should keep track of the database changes of application A. Application B should do some calculations, when data has changed. There is no direct communication between both applications. Both can only see the database.
The basic problem is: Some data changes in the database. How can I trigger some C# code doing some work upon these changes?
To give some stimulus for answers, I mention some approaches, which I am currently considering:
1) Make application B poll for changes in the tables of interest. Advantage: Simple approach. Disadvantage: Lots of traffic, especially when many tables are involved.
2) Introduce triggers, which will fire on certain events. When they fire they should write some entry into an “event table”. Application B only needs to poll that “event table”. Advantage: Less traffic. Disadvantage: Logic is placed into the database in the form of triggers. (It’s not a question of the “evilness” of triggers. It’s a design question, which makes it a disadvantage.)
3) Get rid of the polling approach and use the SqlDependency class to get notified of changes. Advantage: (Maybe?) Less traffic than the polling approach. Disadvantage: Not database independent. (I am aware of OracleDependency in ODP.NET, but what about the other databases?)
Which approach is more favorable? Maybe I have missed some major (dis)advantage in the mentioned approaches? Maybe there are some other approaches I haven’t thought of?
Edit 1: Database independence is a factor for the ... let's call them ... "sales people". I can use SqlDependency or OracleDependency. For DB2 or other databases I can fall back to the polling approach. It's just a question of cost and benefit, which I want to at least think about so I can discuss it.

I'd go with #1. It's not actually as much traffic as you might think. If your data doesn't change frequently, you can be pessimistic about it and only fetch something that gives you a yay or nay about table changes.
If you design your schema with polling in mind you may not really incur that much of a hit per poll.
If you're only adding records, not changing them, then checking the highest id might be enough on a particular table.
If you're updating them all then you can store a timestamp column and index it, then look for the maximum timestamp.
And you can send an uber query that polls multiple tables (efficiently) and returns the list of changed tables.
Nothing in this answer is particularly clever, I'm just trying to show that #1 may not be quite as bad as it at first seems.
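The "highest id" or "maximum timestamp" check described above amounts to tracking a high-water mark per table. A minimal sketch, assuming the database query is supplied as a delegate that returns the current maximum as a number. `TablePoller` is a hypothetical name, and the actual query (for example a `SELECT MAX(...)` on an indexed column) is left to the caller.

```csharp
using System;

// Tracks the highest change marker seen so far for one table.
public class TablePoller
{
    // Supplied by the caller; assumed to run a cheap query such as SELECT MAX(Id).
    private readonly Func<long> readMaxMarker;
    private long lastSeen;

    public TablePoller(Func<long> readMaxMarker)
    {
        this.readMaxMarker = readMaxMarker;
        this.lastSeen = readMaxMarker(); // start from the current state
    }

    // Returns true when the marker has advanced since the previous poll.
    public bool HasChanged()
    {
        long current = readMaxMarker();
        if (current <= lastSeen) return false;
        lastSeen = current;
        return true;
    }
}
```

Each poll then costs one small indexed lookup per table, and application B runs its heavier calculations only when `HasChanged()` returns `true`.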

I would go with solution #1 (polling), because avoiding dependencies and direct connections between separate apps can help reduce complexity and problems.

I think you have covered the approaches I have thought of, there's no absolute "best" way to do it, what matters are your requirements and priorities.
Personally I like the elegance of the SqlDependency class; and what does database independence matter in the real world for most applications anyway? But if it's a priority for you to have database independence then you can't use that.
Polling is my second favourite because it keeps the database clean from triggers and application logic; it really isn't a bad option anyway because as you say it's simple. If application B can wait a few minutes at a time before "noticing" database changes, it would be a good option.
So my answer is: it depends. :)
Good luck!

Do you really care about database independence?
Would it really be that hard to create a different mechanism for each database type, all sharing the same public interface?
"I am aware of OracleDependency in ODP.NET, but what about the other databases?"
SQL Server has something like that, but I've never used it.
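One way to picture the "same public interface" idea is sketched below. `IChangeNotifier` and `PollingChangeNotifier` are hypothetical names. A SQL Server implementation could presumably wrap `SqlDependency` and raise the same event, while databases without a native mechanism fall back to polling.

```csharp
using System;

// Hypothetical common interface; each database type gets its own implementation.
public interface IChangeNotifier
{
    event EventHandler Changed;

    // Gives the notifier a chance to look for changes (a no-op for push-based ones).
    void Check();
}

// Fallback for databases without a native notification mechanism.
public class PollingChangeNotifier : IChangeNotifier
{
    // Supplied by the caller; assumed to return a value that differs after any change.
    private readonly Func<long> readMarker;
    private long last;

    public event EventHandler Changed;

    public PollingChangeNotifier(Func<long> readMarker)
    {
        this.readMarker = readMarker;
        last = readMarker();
    }

    public void Check()
    {
        long now = readMarker();
        if (now == last) return; // nothing new since the previous check
        last = now;
        EventHandler handler = Changed;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

Application B would subscribe to `Changed` once and call `Check()` on a timer, without needing to know which mechanism sits behind the interface.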

You can make a MySqlDependency class, and implement SqlDependency or SqlDependencyForOracle (polling)

You can use an SQL trigger inside a SQL CLR Database Project and run your code in that project, see: https://msdn.microsoft.com/en-us/library/938d9dz2.aspx
Or, from a trigger inside the SQL CLR Database Project, you could make a request to the project you actually want to act on the trigger.
