checking whether data is null

Is it best practice to check whether a value coming from a database is null, even though there are constraints on the column that disallow nulls?
Thanks in advance

What you're talking about is defensive programming. I like to think it's good to practice it when you don't trust your input. You might think you can trust your DB now, but what if in the future you decide that column should have a NULL value somewhere? Then you'd need to change your code everywhere you assumed it couldn't be NULL.
If you don't think you'll ever change it (like it's a primary key or something) then I don't think you need to. It's more future proofing in case you one day decide to change your schema. If that column will never have a case where NULL makes sense, then you probably don't need to check. In the event you get a NULL, like commenters have said, you have a bigger problem in that your DB is probably hosed.

I'd say go ahead and add null checks to your business logic. They would be particularly useful during unit testing too, maybe not now but in the future.

It depends on a lot of factors.
In many of my projects I strive to enable unit testing, which means decoupling the code that processes data from the code that retrieves it (i.e., the code that talks to the database).
That way I could also potentially reuse the code that processes the data by feeding it data from other sources. I would absolutely safeguard input values to the logic layer in this case.
Also, applications evolve over time, so even if it is impossible for you to get null values right now, at some point nullable values might be introduced, and a lot of old code would then suddenly get things it wasn't written to handle. I would personally want that code to fail fast rather than silently process data incorrectly in some cases.
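To make the fail-fast idea concrete, here is a minimal sketch of a guard at the boundary between data retrieval and logic. The class and method names (RequiredColumn, ReadString) are made up for illustration; only the IDataRecord calls are real ADO.NET members.

```csharp
using System;
using System.Data;

public static class RequiredColumn
{
    // Reads a string column that the schema currently declares NOT NULL.
    // If a NULL shows up anyway (for example after a schema change), this
    // throws immediately with a clear message instead of passing bad data on.
    public static string ReadString(IDataRecord record, string column)
    {
        int ordinal = record.GetOrdinal(column);
        if (record.IsDBNull(ordinal))
        {
            throw new InvalidOperationException(
                $"Column '{column}' was NULL, but the code assumes it never is.");
        }
        return record.GetString(ordinal);
    }
}
```

Because it only depends on IDataRecord, you can exercise it in a unit test without a database, for instance by filling a DataTable and calling CreateDataReader() on it.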

Related

InsertedDate, set Default Value in DB or Set in application?

I am writing a new application and I am in the design phase. A few of the DB tables need an InsertedDate field that records when the record was inserted.
I am thinking of setting the Default Value to GetDate().
Is there an advantage to doing this in the application over setting the default value in the database?

I think it's better to set the Default Value to GetDate() in SQL Server, rather than in your application. You can use that column to order the rows by insertion time. Setting it from the application looks like unnecessary overhead, unless you want to specify some particular date with the insert, which I believe defeats the purpose of it.

If you set the default value in your application, then whenever you need to manually INSERT a record into the database you'll have to remember to set this field yourself, or you'll end up with NULL values.
Personally, I prefer to set default values in the database where possible, but others might have differing opinions on this.

If you do it in your application, you can unit test it. In the projects I've been on, especially when using an ORM, we do all default operations in the code.
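As a rough sketch of what "you can unit test it" could look like: the timestamp comes from an injected clock, so a test can pin it to a known value. Order and OrderFactory are hypothetical names, not taken from any ORM.

```csharp
using System;

public sealed class Order
{
    public string Description { get; }
    public DateTime InsertedDate { get; }

    public Order(string description, DateTime insertedDate)
    {
        Description = description;
        InsertedDate = insertedDate;
    }
}

public sealed class OrderFactory
{
    private readonly Func<DateTime> _utcNow;

    // Production code would pass () => DateTime.UtcNow;
    // a test passes a function that returns a fixed value.
    public OrderFactory(Func<DateTime> utcNow) => _utcNow = utcNow;

    // The application, not the database, stamps the record.
    public Order Create(string description) => new Order(description, _utcNow());
}
```

The trade-off mentioned above still applies: rows inserted by hand or by another application bypass this factory, so they only get a date if the column also has a database default.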

When designing, I always put a lot of importance on separation of concerns, which for me, in the context of "database functionality vs application functionality", boils down to the question: "Who owns the data?". I have always been of the opinion that my code owns my data - never the database. The database is simply the container for data. This is similar to saying that I own my clothes, rather than my dresser owning my clothes. My dresser performs an important function, making my clothes available in an organized fashion, but I am always the agent putting clothes in the dresser, and I am responsible for their organization.
I'm sure many will have a problem with this analogy, saying that modern databases are much more powerful than my dresser, but in my experience the more functionality I put in the database layer, the more confusing projects get and the more blurred the line between data and functionality (e.g. database stored procedures, etc). Admittedly, yours is a simple example of this concept, but once a precedent is set, anything goes.
Another thing I'd like to address is the ease-of-use factor. I reject the idea that because a particular implementation is convenient (such as avoiding nulls, different server times, etc.) then I should choose it. To me, choosing such implementations is equivalent to saying: "It's alright if my code doesn't work. I'll avoid using my code rather than fixing it and making it robust."
I'm sure there are many cases, perhaps at extreme scale or due to other business requirements, when database-layer functionality is not only warranted but necessary, but my experience tells me the more you can keep your functionality in your code, the cleaner, simpler, and more robust your application will be.

Why should a C# property getter not read from a DB?

Some of the answers and comments to this question: Simplest C# code to poll a property?, imply that retrieving data from a database in a property's getter is Generally a Bad Idea.
Why is it so bad?
(If you have sources for your information, please mention them.)
I will usually be storing the information in a variable after the first "get" for reuse, if that influences your answer.

Because retrieving data from a database could cause any number of exceptions, and property getters, as a rule, should never throw exceptions.
The expected behavior of a property getter is just to return a value; if it's actually doing a lot more than that, it should be a method.
Microsoft's guide for Property Design explains the reasons:
https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/property

It's bad because (among other things) it violates the Principle of Least Astonishment.
Programmers generally expect properties to do simple gets/sets. Encapsulating data access in a property, which could throw exceptions, cause side effects, and change the state of the data in the database, is not what is generally expected.
I'm not saying there is no case for complex properties - sometimes, it can be a good solution. But, it is not the expected way to do things.

The Short Version: Making a property getter directly access a database would violate The Separation of Concerns principle.
More Detail:
Generally speaking, a property is intended to represent data associated with an object, such as the FirstName property of a Person object. Property values may be set internally or externally, but the act of modifying and retrieving this data on the object should be separated from the act of retrieving or committing that data to a permanent store.

Any time you accessed the getter, you'd be making another call out to your database.

A getter, by definition, should just be encapsulating data, not functionality.
Also, you would likely be redefining functionality and making many trips to the database if you had more than one getter that needed to round-trip to the database. Why not handle that in a centralized place rather than splitting that out among multiple properties?
Methods are for class-level data modification and functionality, properties are for individual data modification and retrieval only.

Besides exceptions being likely, querying a database is a slooooow operation. In both aspects, using property getters as a database accessor violates the principle of least astonishment to client code. If I see a library class with a property, I don't expect it to do a lot of work, but to access a value that's easily retrieved.
The least astonishing option here is to provide an old fashioned, simple Get function.
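Something like the following is what I have in mind, as a sketch only. WidgetSettings is an invented class, and the Func<string> passed to it stands in for whatever query you actually run. It also caches the value after the first call, which matches what the question describes.

```csharp
using System;

public sealed class WidgetSettings
{
    private readonly Func<string> _loadTitle; // stand-in for the real database query
    private string _cachedTitle = "";
    private bool _loaded;

    public WidgetSettings(Func<string> loadTitle) => _loadTitle = loadTitle;

    // A method (rather than a property) tells the caller that this may be
    // slow or may throw. The value is loaded on the first call and reused.
    public string GetTitle()
    {
        if (!_loaded)
        {
            _cachedTitle = _loadTitle();
            _loaded = true;
        }
        return _cachedTitle;
    }
}
```

If the load throws, nothing is cached, so the next call simply tries again.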

If you are indeed only retrieving the value from the database and not also writing its value back, such as with a read-only property, there is nothing inherently wrong, especially if the property cannot exist without its parent. However, in implementation it can cause maintainability problems. You are coupling the process of retrieving stored information with access to that information. If you continue to follow this pattern for other properties, and aspects of your storage system change (for example, a table name or a column data type), the change could proliferate throughout your code base. This is why it is bad to have database calls in your property getter.
On a side note: if the database is throwing exceptions when you try to retrieve the value, then obviously there is a bug in your code (or the calling client's code) and the exception will still surface regardless of where you put the data access code. Many times data is backed by some sort of collection within the class that can throw exceptions and it is standard practice to store property values in this manner (see EventHandlerList).
Properties were designed specifically because programmers need to perform additional logic when getting and setting values, such as validation.
With all of that having been said, reexamine your code and ask yourself "how easy will this be to change later?" From there you should be on your way to a more maintainable solution.

Which .NET exception to throw for invalid database state?

I am writing some data access code and I want to check for potentially "invalid" data states in the database. For instance, I am returning a widget out of the database and I only expect one. If I get two, I want to throw an exception. Even though referential integrity should prevent this from occurring, I do not want to depend on the DBAs never changing the schema (to clarify this, if the primary key constraint is removed and I get a dupe, I want to break quickly and clearly).
I would like to use System.IO.InvalidDataException, except that I am not dealing with a file stream, so it would be misleading. I ended up going with a generic ApplicationException. Anyone have a better idea?

InvalidDataException seems pretty reasonable to me:
The name fits perfectly
The description fits pretty reasonably when you consider that it's effectively a data "stream" from the database
Nothing in the description mentions files, so I wouldn't be worried about that side of things
You're effectively deserializing data from a store. It happens to be an RDBMS, but that's relatively unimportant. The data is invalid, so InvalidDataException fits well.
To put it another way - if you were loading the data from a file, would you use InvalidDataException? Assuming you would, why should it matter where the data is coming from, in terms of the exception being thrown?

If you need an exception that would exactly describe the situation you're dealing with, why not make your own exception?
Just inherit it from System.Exception.

I might be tempted to use one of the following:
InvalidConstraintException
NotSupportedException
OverflowException
Or, just go ahead and create my own: TooManyRowsException
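For the custom route, a minimal sketch could look like this. RowGuard and ExpectSingle are names I made up; TooManyRowsException is the custom type suggested above, and none of these exist in the framework.

```csharp
using System;
using System.Collections.Generic;

// Custom exception for "the query returned more rows than the code allows".
public class TooManyRowsException : Exception
{
    public TooManyRowsException(string message) : base(message) { }
}

public static class RowGuard
{
    // Returns the only row, or throws if there are zero rows or several.
    public static T ExpectSingle<T>(IEnumerable<T> rows, string what)
    {
        using IEnumerator<T> e = rows.GetEnumerator();
        if (!e.MoveNext())
            throw new InvalidOperationException($"No {what} was found.");

        T first = e.Current;
        if (e.MoveNext())
            throw new TooManyRowsException(
                $"Expected exactly one {what} but found more than one.");

        return first;
    }
}
```

LINQ's Enumerable.Single does the same check but throws a plain InvalidOperationException, so a dedicated type is mainly useful if callers or logs need to tell this case apart.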

You could write a custom exception if you do not find any suitable standard exception.
But, you say:
"Even though referential integrity should prevent this from occurring, I do not want to depend on the DBAs never changing the schema."
When someone changes the DB schema, chances are pretty good that you'll have to make some modifications to your application / data-access code as well.

DAO/Repository/NHibernate And Handling Edge DB Cases

One thing that keeps stumping me, and I do not see much mention of it in books/blogs, is how to handle DB operations in a system that really don't fall under the jurisdiction of DAOs or Repositories. I like using the approach of generic DAOs/Repositories to handle common DB operations, but what about dealing with things that aren't entities? For example, say I am building a system and in a few cases I need to call a stored procedure to run a batch operation and just return a success code. Or, I need to just load a date from a miscellaneous table. Or, I want to load a list of US states from a table. These cases certainly do occur and they really may not have anything to do with an entity or other object in the system. Without dropping in a nasty "misc" DB class that forgoes something like NHibernate to manually use ADO.NET to do these types of operations, what are some other standard approaches from the OOP crowd?

Bypassing the DAO and working directly with the ADO connector (or native driver) is exactly what everyone does, and there is nothing "nasty" or wrong about it. If these are indeed edge cases for you, then what kind of framework would you expect? What is worse is when people wrap all kinds of weird shenanigans around their DAO to do something it sucks at just in the name of "not going around <insert DAO here>".
I mean if you have a stored proc, then you have obviously decided that DB agnosticism is out the window (it's an overrated goal anyway), so why have misgivings about using ADO.NET? Just make it very explicit in the code, don't hide it. Say it loud and proud "I'm using the database, and I don't give a flick what anyone thinks!". Oh, and please make sure it's still separated from the rest of your logic. I don't want my unit tests to get slow because of your stored proc.
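Here is one way to keep it explicit but separated, as a sketch. The interface, both classes, and the UsStates table and Name column are all hypothetical; the ADO.NET calls (CreateCommand, ExecuteReader, Read, GetString) are the standard System.Data interfaces.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// The rest of the application depends only on this interface.
public interface IStateLookup
{
    IReadOnlyList<string> GetStateNames();
}

// Plain ADO.NET, out in the open, but confined to one small class.
public sealed class AdoStateLookup : IStateLookup
{
    private readonly Func<IDbConnection> _openConnection;

    // The caller supplies a function that returns an already opened connection.
    public AdoStateLookup(Func<IDbConnection> openConnection) =>
        _openConnection = openConnection;

    public IReadOnlyList<string> GetStateNames()
    {
        var names = new List<string>();
        using (IDbConnection connection = _openConnection())
        using (IDbCommand command = connection.CreateCommand())
        {
            // Hypothetical table and column names.
            command.CommandText = "SELECT Name FROM UsStates ORDER BY Name";
            using (IDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                    names.Add(reader.GetString(0));
            }
        }
        return names;
    }
}

// Used by unit tests so they never touch the database.
public sealed class InMemoryStateLookup : IStateLookup
{
    public IReadOnlyList<string> GetStateNames() => new[] { "Alabama", "Alaska" };
}
```

A stored procedure call would follow the same shape, with CommandType.StoredProcedure set on the command.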

Some data changes in the database. How can I trigger some C# code doing some work upon these changes?

Suppose I have some application A with a database. Now I want to add another application B, which should keep track of the database changes of application A. Application B should do some calculations, when data has changed. There is no direct communication between both applications. Both can only see the database.
The basic problem is: Some data changes in the database. How can I trigger some C# code doing some work upon these changes?
To give some stimulus for answers, I mention some approaches, which I am currently considering:
1. Make application B poll for changes in the tables of interest. Advantage: Simple approach. Disadvantage: Lots of traffic, especially when many tables are involved.
2. Introduce triggers, which will fire on certain events. When they fire they should write some entry into an “event table”. Application B only needs to poll that “event table”. Advantage: Less traffic. Disadvantage: Logic is placed into the database in the form of triggers. (It’s not a question of the “evilness” of triggers. It’s a design question, which makes it a disadvantage.)
3. Get rid of the polling approach and use the SqlDependency class to get notified of changes. Advantage: (Maybe?) Less traffic than the polling approach. Disadvantage: Not database independent. (I am aware of OracleDependency in ODP.NET, but what about the other databases?)
What approach is more favorable? Maybe I have missed some major (dis)advantage in the mentioned approaches? Maybe there are some other approaches I haven’t thought of?
Edit 1: Database independence is a factor for the ... let's call them ... "sales people". I can use SqlDependency or OracleDependency. For DB2 or other databases I can fall back to the polling approach. It's just a question of cost and benefit, which I want to at least think about so I can discuss it.

I'd go with #1. It's not actually as much traffic as you might think. If your data doesn't change frequently, you can be pessimistic about it and only fetch something that gives you a yay or nay about table changes.
If you design your schema with polling in mind you may not really incur that much of a hit per poll.
If you're only adding records, not changing them, then checking the highest id might be enough on a particular table.
If you're updating them all then you can store a timestamp column and index it, then look for the maximum timestamp.
And you can send an uber query that polls multiple tables (efficiently) and returns the list of changed tables.
Nothing in this answer is particularly clever, I'm just trying to show that #1 may not be quite as bad as it at first seems.
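To illustrate the highest-id / maximum-timestamp idea, here is a small sketch. TableChangePoller is an invented name, and the Func<long> stands in for whatever cheap query you run per poll (for example, one that returns the maximum id, or the ticks of the maximum timestamp).

```csharp
using System;

public sealed class TableChangePoller
{
    private readonly Func<long> _readHighWaterMark; // stand-in for the cheap MAX(...) query
    private long _lastSeen;

    public TableChangePoller(Func<long> readHighWaterMark, long initialValue)
    {
        _readHighWaterMark = readHighWaterMark;
        _lastSeen = initialValue;
    }

    // Returns true when the marker has moved since the previous poll,
    // meaning application B should go and do its calculations.
    public bool HasChanged()
    {
        long current = _readHighWaterMark();
        if (current == _lastSeen)
            return false;

        _lastSeen = current;
        return true;
    }
}
```

Application B would call HasChanged() on a timer and only fetch the actual rows when it returns true, so each idle poll costs one small query.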

I would go with solution #1 (polling), because avoiding dependencies and direct connections between separate apps can help reduce complexity and problems.

I think you have covered the approaches I have thought of. There's no absolute "best" way to do it; what matters are your requirements and priorities.
Personally I like the elegance of the SqlDependency class; and what does database independence matter in the real world for most applications anyway? But if it's a priority for you to have database independence then you can't use that.
Polling is my second favourite because it keeps the database clean from triggers and application logic; it really isn't a bad option anyway because as you say it's simple. If application B can wait a few minutes at a time before "noticing" database changes, it would be a good option.
So my answer is: it depends. :)
Good luck!

Do you really care about database independence?
Would it really be that hard to create a different mechanism for each database type, all sharing the same public interface?
"I am aware of OracleDependency in ODP.NET, but what about the other databases?"
SQL Server has something like that, but I've never used it.

You can make a MySqlDependency class that mirrors SqlDependency or OracleDependency, and implement it with polling.

You can use an SQL trigger inside a SQL CLR Database Project and run your code in that project, see: https://msdn.microsoft.com/en-us/library/938d9dz2.aspx
Or, from the trigger inside the SQL CLR Database Project, you could make a request to the project you actually want to react to the change.
