Some of the answers and comments to this question, "Simplest C# code to poll a property?", imply that retrieving data from a database in a property's getter is Generally a Bad Idea.
Why is it so bad?
(If you have sources for your information, please mention them.)
I will usually be storing the information in a variable after the first "get" for reuse, if that influences your answer.
Because retrieving data from a database could cause any number of exceptions, and property getters, as a rule, should never throw exceptions.
The expected behavior of a property getter is just to return a value; if it's actually doing a lot more than that, it should be a method.
Microsoft's guide for Property Design explains the reasons:
https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/property
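To make the contrast concrete, here is a minimal sketch with hypothetical names (Customer, Accounts, and CustomerId are illustrative, not from the question):

    using System.Data;

    public class Customer
    {
        private readonly IDbConnection _connection;
        public int Id { get; }

        public Customer(IDbConnection connection, int id)
        {
            _connection = connection;
            Id = id;
        }

        // Questionable: reads like a cheap field access, but hits the
        // network and can throw (SqlException, timeouts, ...).
        public decimal Balance => QueryBalance();

        // Clearer: a method signals "this does work and may fail".
        public decimal GetBalance() => QueryBalance();

        private decimal QueryBalance()
        {
            using (var command = _connection.CreateCommand())
            {
                command.CommandText =
                    "SELECT Balance FROM Accounts WHERE CustomerId = @id";
                var idParameter = command.CreateParameter();
                idParameter.ParameterName = "@id";
                idParameter.Value = Id;
                command.Parameters.Add(idParameter);
                return (decimal)command.ExecuteScalar();
            }
        }
    }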
It's bad because (among other things) it violates the Principle of Least Astonishment.
Programmers generally expect properties to do simple gets/sets. Encapsulating data access in a property, which could throw exceptions, cause side effects, and change the state of the data in the database, is not what is generally expected.
I'm not saying there is no case for complex properties - sometimes they can be a good solution. But it is not the expected way to do things.
The Short Version: Making a property getter directly access a database would violate The Separation of Concerns principle.
More Detail:
Generally speaking, a property is intended to represent data associated with an object, such as the FirstName property of a Person object. Property values may be set internally or externally, but the act of modifying and retrieving this data on the object should be separated from the act of retrieving or committing that data to a permanent store.
Any time you accessed the getter, you'd be making another call out to your database.
A getter, by definition, should just be encapsulating data, not functionality.
Also, you would likely be duplicating functionality and making many trips to the database if you had more than one getter that needed to round-trip to the database. Why not handle that in a centralized place rather than splitting it out among multiple properties?
Methods are for class-level data modification and functionality, properties are for individual data modification and retrieval only.
Besides exceptions being likely, querying a database is a slooooow operation. On both counts, using property getters as database accessors violates the principle of least astonishment for client code. If I see a library class with a property, I don't expect it to do a lot of work, but to access a value that's easily retrieved.
The least astonishing option here is to provide an old-fashioned, simple Get function.
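If the value should still be cached after the first retrieval, as the question describes, a method can do that too; a rough sketch using Lazy<T> (the Widget class and LoadWidgetName helper are hypothetical):

    using System;

    public class Widget
    {
        private readonly Lazy<string> _name;

        public Widget()
        {
            // The database is hit at most once, on first access, and the
            // method name still makes the potential cost visible.
            _name = new Lazy<string>(LoadWidgetName);
        }

        public string GetName() => _name.Value;

        private string LoadWidgetName()
        {
            // The real implementation would round-trip to the database here.
            return "example";
        }
    }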
If you are only retrieving the value from the database and not also writing it back, such as with a read-only property, there is nothing inherently wrong, especially if the property cannot exist without its parent. In practice, however, it can cause maintainability problems: you are coupling the process of retrieving stored information with access to that information. If you continue to follow this pattern for other properties, and aspects of your storage system change (a table name or a column data type, for example), the change can proliferate throughout your code base. This is why it is bad to have database calls in your property getter.
On a side note: if the database is throwing exceptions when you try to retrieve the value, then obviously there is a bug in your code (or the calling client's code), and the exception will surface regardless of where you put the data access code. Data is often backed by some sort of collection within the class that can throw exceptions, and it is standard practice to store property values in this manner (see EventHandlerList).
Properties were designed specifically because programmers need to perform additional logic when getting and setting values, such as validation.
With all of that having been said, re-examine your code and ask yourself, "how easy will this be to change later?" From there, you should be on your way to a more maintainable solution.
Related
I am writing a new application and I am in the design phase. A few of the db tables require to have an InsertedDate field which will display when the record was inserted.
I am thinking of setting the Default Value to GetDate()
Is there an advantage to doing this in the application over setting the default value in the database?
I think it's better to set the default value to GetDate() in SQL Server rather than in your application. You can use it to get a table ordered by insertion time. Setting it from the application looks like overhead, unless you want to specify some particular date with the insert, which I believe defeats the purpose of it.
If you set the default value in your application, then any time you manually INSERT a record into the database you'll need to remember to set this field yourself to avoid NULLs.
Personally, I prefer to set default values in the database where possible, but others might have differing opinions on this.
If you do it in your application, you can unit test it. In the projects I've been on, especially when using an ORM, we do all default operations in the code.
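For contrast, here is a minimal sketch of both placements; Order is a hypothetical entity, and the mapping snippet assumes EF Core against SQL Server:

    using System;

    // Application-side default: trivially unit-testable, but rows inserted
    // by hand-written SQL will not get a value.
    public class Order
    {
        public int Id { get; set; }
        public DateTime InsertedDate { get; set; } = DateTime.UtcNow;
    }

    // Database-side default, expressed through an EF Core mapping so the
    // choice is still visible in the code base:
    //
    //   modelBuilder.Entity<Order>()
    //       .Property(o => o.InsertedDate)
    //       .HasDefaultValueSql("GETDATE()");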
When designing, I always put a lot of importance on separation of concern, which for me, in the context of "database functionality vs application functionality", boils down to the question: "Who owns the data?". I have always been of the opinion that my code owns my data - never the database. The database is simply the container for data. This is similar to saying that I own my clothes, rather than my dresser owning my clothes. My dresser performs an important function, making my clothes available in an organized fashion, but I am always the agent putting clothes in the dresser, and I am responsible for their organization.
I'm sure many will have a problem with this analogy, saying that modern databases are much more powerful than my dresser, but in my experience the more functionality I put in the database layer, the more confusing projects get and the more blurred the line between data and functionality (e.g. database stored procedures, etc). Admittedly, yours is a simple example of this concept, but once a precedent is set, anything goes.
Another thing I'd like to address is the ease-of-use factor. I reject the idea that because a particular implementation is convenient (such as avoiding nulls, different server times, etc.) then I should choose it. To me, choosing such implementations is equivalent to saying: "It's alright if my code doesn't work. I'll avoid using my code rather than fixing it and making it robust."
I'm sure there are many cases, perhaps at extreme scale or due to other business requirements, when database-layer functionality is not only warranted but necessary, but my experience tells me the more you can keep your functionality in your code, the cleaner, simpler, and more robust your application will be.
I'm using memcache behind a web app to minimize the hits to our SQL database. I'm storing C# objects into this cache by marking them with SerializableAttribute. We make heavy use of dependency injection via Ninject in our app.
Some of these objects are large, and I'd like to break them up. However, they come from a single stored procedure call (i.e. one stored procedure call gets cooked into the full object graph), and I'd like to be able to break these objects up and lazy-load specific subgraphs from the cache separately rather than load the entire object graph into memory all at once.
What are some patterns that would help me accomplish this?
As far as patterns go, I'd say the one large complex object that's built from a single stored procedure is suspect. I'm not sure if your caching is a requirement or just the current state of its implementation.
The pattern that I'm used to is a type of repository pattern, using operations that fulfill specific contracts. Those operations house one or many datasources that call stored procedures in the database, each used to build ONE of those sub-graphs you speak of (a sketch follows the list below). With that said, if you're going to lazy load data from a database, then I can only assume that many of the object members are not used much of the time, which furthers my point - break that object up.
A couple things about it:
It can be chatty if the entire object is being used regularly
It is fully injectable via the Operations
The datasources contain the reader for the specific object, thus only performing ONE task (SOLID)
Can be modified to use Entity Framework, without too much fuss
Can be designed to implement an interface, making it more reusable
Will require you to break up that proc into smaller, chewable pieces, which will likely only benefit you in the long run.
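Here is a rough sketch of the operation/datasource shape I mean; IBookSummaryOperation and the other types are illustrative names, not an existing API:

    public class BookSummary
    {
        public int Id { get; set; }
        public string Title { get; set; }
    }

    public interface IBookSummaryDataSource
    {
        BookSummary ReadSummary(int bookId);
    }

    public interface IBookSummaryOperation
    {
        BookSummary Execute(int bookId);
    }

    public class BookSummaryOperation : IBookSummaryOperation
    {
        private readonly IBookSummaryDataSource _dataSource;

        // Injectable via the constructor (e.g. through Ninject bindings).
        public BookSummaryOperation(IBookSummaryDataSource dataSource)
        {
            _dataSource = dataSource;
        }

        public BookSummary Execute(int bookId)
        {
            // One operation -> one focused stored procedure -> one sub-graph,
            // which can be cached independently of the rest of the graph.
            return _dataSource.ReadSummary(bookId);
        }
    }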
A complex object like the one you describe really shouldn't exist if only parts of it are going to be used. Instead, consider segregating those objects. However, it really depends on how this object is being used.
UPDATE:
Using your cache as the repository, I would probably approach it like this: store the legacy object, but in your operations, use it to build more relevant DTOs that are returned to the client.
I know NHibernate does lazy loading by replacing objects with proxy objects. In the proxy object there is some kind of check that causes the real object to be loaded the first time you try to access it.
I'm not sure of any design patterns that would cover that, but you could look at the NHibernate source code.
A downside of using proxy objects is that you have to be careful with inheritance and type checks, as you could be checking the type of the proxy and not the actual object.
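A hand-rolled sketch of the idea (NHibernate generates its proxies at runtime; the Document types here are illustrative):

    using System;

    public class Document
    {
        public virtual string Body { get; set; }
    }

    public class DocumentProxy : Document
    {
        private readonly Lazy<Document> _real;

        public DocumentProxy(Func<Document> loader)
        {
            // The real object is loaded from the database on first access.
            _real = new Lazy<Document>(loader);
        }

        public override string Body
        {
            get { return _real.Value.Body; }
            set { _real.Value.Body = value; }
        }
    }

This also shows the pitfall: calling GetType() on the proxy reports DocumentProxy rather than Document, even though an is Document check still succeeds.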
I am writing some data access code and I want to check for potentially "invalid" data states in the database. For instance, I am returning a widget out of the database and I only expect one. If I get two, I want to throw an exception. Even though referential integrity should prevent this from occurring, I do not want to depend on the DBAs never changing the schema (to clarify this, if the primary key constraint is removed and I get a dupe, I want to break quickly and clearly).
I would like to use System.IO.InvalidDataException, except that I am not dealing with a file stream, so it would be misleading. I ended up going with a generic ApplicationException. Anyone have a better idea?
InvalidDataException seems pretty reasonable to me:
The name fits perfectly
The description fits pretty reasonably when you consider that it's effectively a data "stream" from the database
Nothing in the description mentions files, so I wouldn't be worried about that side of things
You're effectively deserializing data from a store. It happens to be an RDBMS, but that's relatively unimportant. The data is invalid, so InvalidDataException fits well.
To put it another way - if you were loading the data from a file, would you use InvalidDataException? Assuming you would, why should it matter where the data is coming from, in terms of the exception being thrown?
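For instance, a minimal sketch of the guard itself (Widget and LoadWidgets are hypothetical stand-ins for the real data access):

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    public static class WidgetRepository
    {
        public static Widget GetWidgetById(int id)
        {
            List<Widget> rows = LoadWidgets(id);
            if (rows.Count > 1)
                throw new InvalidDataException(
                    $"Expected one widget with id {id} but found {rows.Count}.");
            return rows.Single(); // also throws if zero rows came back
        }

        private static List<Widget> LoadWidgets(int id)
        {
            // The real implementation would query the database here.
            return new List<Widget>();
        }
    }

    public class Widget { }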
If you need an exception that would exactly describe the situation you're dealing with, why not make your own exception?
Just inherit it from System.Exception.
I might be tempted to use one of the following:
InvalidConstraintException
NotSupportedException
OverflowException
Or, just go ahead and create my own: TooManyRowsException
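If you go that route, the exception itself is only a few lines; a minimal sketch (the standard serialization constructor is omitted):

    using System;

    public class TooManyRowsException : Exception
    {
        public TooManyRowsException() { }

        public TooManyRowsException(string message)
            : base(message) { }

        public TooManyRowsException(string message, Exception innerException)
            : base(message, innerException) { }
    }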
You could write a custom exception if you do not find a suitable standard exception.
But, you say:
"Even though referential integrity should prevent this from occurring, I do not want to depend on the DBAs never changing the schema."
When someone changes the DB schema, chances are pretty good that you'll have to make some modifications to your application / data-access code as well.
Why and when do we need immutable (i.e. readonly) classes (I am not talking about string. I am talking about Business Objects) in business or database applications?
Can anyone give me a real-world example of a scenario?
Though Jon certainly makes a compelling case for the benefits of immutable objects, I'd take a slightly different tack.
When you're modeling a business process in code, obviously what you want to do is to use mechanisms in the code to represent facts about the model. For example, if a customer is a kind of person, then you'd probably have a Person base class and a Customer derived class, and so on.
Immutability is just another one of those mechanisms. So in your business process, think about what things happen once and then never change, in contrast with what things change over time.
For example, consider "Customer". A customer has a name. Does the name of a customer ever change? Sure. Customer names change all the time, typically when they get married. So, should Customer be an immutable class? Probably not. Logically, when a customer changes her name, you do not create a new customer out of the old one; the old customer and the new customer are the same object, but the name property has changed.
Now consider "contract". Does a contract ever change? No. An amendment to an existing contract produces a new, different contract. The dates, parties, clauses, and so on in a particular contract are frozen in time. A contract object could be immutable.
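A minimal sketch of what an immutable Contract might look like (the members are illustrative):

    using System;
    using System.Collections.Generic;

    public sealed class Contract
    {
        public DateTime EffectiveDate { get; }
        public IReadOnlyList<string> Clauses { get; }

        public Contract(DateTime effectiveDate, IReadOnlyList<string> clauses)
        {
            EffectiveDate = effectiveDate;
            Clauses = clauses;
        }

        // "Mutation" becomes a functional operation that returns a new
        // Contract; the original is untouched and stays valid.
        public Contract Amend(string newClause)
        {
            var clauses = new List<string>(Clauses) { newClause };
            return new Contract(DateTime.UtcNow, clauses);
        }
    }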
Now the interesting question is what to do when a contract mentions a customer, and the customer changes their name. It's the interactions between mutable and immutable objects that make this a tricky design problem.
Immutable types are easier to reason about than mutable ones - when you've got a reference to an instance, you know you can rely on it not changing. You can build up a functional style of working, where any mutation you would want to perform becomes an operation creating a new instance (just as it does with string). Those functional operations can be composed safely, with no concerns about what happens if one of the operations changes the object in a way that would harm the other operations.
Once you've made a decision based on the state of an immutable value, you know that decision will remain valid for that value, because the value itself won't be able to change.
Additionally, immutability is useful for threading: it avoids a lot of the concerns around data races when you want to use a single object across multiple threads.
A lot of these benefits can be useful for business objects, but you do need to approach problems with a different mindset. In particular, if your database rows aren't immutable (i.e. you will be modifying rows rather than always creating new "versions" of rows) then you need to be aware that any given value may no longer represent the database state for that row.
Once you have printed out an invoice and issued it to the customer, that invoice is frozen in time forever. Any adjustments would need to be applied on a subsequent invoice.
I am looking to see what approaches people might have taken to detect changes in entities that are a part of their aggregates. I have something that works, but I am not crazy about it. Basically, my repository is responsible for determining if the state of an aggregate root has changed. Let's assume that I have an aggregate root called Book and an entity called Page within the aggregate. A Book contains one or more Page entities, stored in a Pages collection.
Primarily, insert vs. update scenarios are distinguished by inspecting the aggregate root and its entities for the presence of a key. If the key is present, it is presumed that the object has, at some time, been saved to the underlying data source. That makes it a candidate for an update, but for the entities it is not definitive on its own. With the aggregate root the answer is obvious: since there is only one and it is the singular point of entry, key presence can be assumed to dictate the operation. It is an acceptable scenario, in my case, to save the aggregate root itself back again so that I can capture a modification date.
To help facilitate this behavior for the entities themselves, my EntityBase class contains two simple properties, IsUpdated and IsDeleted, both of which default to false. I don't need to know whether an entity is new, because I can make that determination based upon the presence of the key, as mentioned previously. On the implementation, in this case the Page, each method that changes the backing data sets IsUpdated to true.
So, for example, Page has a method called UpdateSectionName() which changes the backing value of the read-only SectionName property. This approach is used consistently because the method that performs the data setting provides a logical attachment point for validators (preventing the entity from entering an invalid state). The end result is that I have to put this.IsUpdated = true; at the end of the method.
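A condensed sketch of that shape (the validation rule is illustrative):

    using System;

    public abstract class EntityBase
    {
        public bool IsUpdated { get; protected set; }
        public bool IsDeleted { get; protected set; }
    }

    public class Page : EntityBase
    {
        public string SectionName { get; private set; }

        public void UpdateSectionName(string sectionName)
        {
            // Validators attach here, before the backing value changes.
            if (string.IsNullOrWhiteSpace(sectionName))
                throw new ArgumentException("Section name is required.",
                    nameof(sectionName));

            SectionName = sectionName;
            this.IsUpdated = true;
        }
    }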
When the aggregate root is sent into the repository for the Save() (a logic switch to either an Insert() or Update() operation), it can then iterate over the Pages collection in the Book, looking for any pages that match one of three scenarios (sketched after the list):
No key: the Page will be inserted.
IsDeleted is true: a delete trumps an update, and the deletion will be committed, ignoring any update for the Page.
IsUpdated is true: an update will be committed for the Page.
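A rough sketch of that dispatch; the types are re-declared minimally here, and the data-access stubs stand in for the real SQL:

    using System.Collections.Generic;

    public class Book
    {
        public int Id { get; set; }
        public List<Page> Pages { get; } = new List<Page>();
    }

    public class Page
    {
        public int Id { get; set; }
        public bool IsUpdated { get; set; }
        public bool IsDeleted { get; set; }
    }

    public class BookRepository
    {
        public void Save(Book book)
        {
            // Key presence decides insert vs. update for the root itself.
            if (book.Id == 0) InsertBook(book); else UpdateBook(book);

            foreach (var page in book.Pages)
            {
                if (page.IsDeleted)
                    DeletePage(page);   // a delete trumps an update
                else if (page.Id == 0)
                    InsertPage(page);   // no key -> insert
                else if (page.IsUpdated)
                    UpdatePage(page);   // only changed pages are written
            }
        }

        // Stubs; the real implementations would issue the SQL.
        private void InsertBook(Book book) { }
        private void UpdateBook(Book book) { }
        private void InsertPage(Page page) { }
        private void UpdatePage(Page page) { }
        private void DeletePage(Page page) { }
    }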
Doing it this way prevents me from just blindly updating everything that is in the Pages collection, which could be daunting if there were several hundred Page entities in the Book, for example. I had been considering retrieving a copy of the Book, and doing a comparison and only committing changes detected, (inserts, updates, and deletes based upon presence and/or comparison), but it seemed to be an awfully chatty way to go about it.
The main drawback is that the developer has to remember to set IsUpdated in each method in the entity. Forget one, and the repository will not be able to detect changes for that value. I have toyed with the idea of some sort of custom backing store that could transparently timestamp changes, which could in turn make IsUpdated a read-only property that the repository could use to aggregate updates.
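One way to reduce the "forgot the flag" risk is to route every change through a protected setter on the base class that marks the entity dirty automatically; a hypothetical sketch:

    using System.Collections.Generic;

    public abstract class EntityBase
    {
        public bool IsUpdated { get; private set; }

        protected void SetField<T>(ref T field, T value)
        {
            if (!EqualityComparer<T>.Default.Equals(field, value))
            {
                field = value;
                IsUpdated = true; // set transparently, no per-method bookkeeping
            }
        }
    }

    public class Page : EntityBase
    {
        private string _sectionName;
        public string SectionName { get { return _sectionName; } }

        public void UpdateSectionName(string sectionName)
        {
            // Validation still lives in the method; dirty-tracking does not.
            SetField(ref _sectionName, sectionName);
        }
    }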
The repository is using a unit of work pattern implementation that bases its actions on the timestamp generated when the aggregate root was added to it. Since there might be multiple entities queued for operations, entity operations are rolled up and executed immediately after the operations for the aggregate root they belong to. I could see taking it a step further and creating another unit of work just to handle the entity operations, basing them off some sort of event tracking in the entity (which is how I assume some of the ORM products on the market accomplish a similar level of functionality).
Before I keep on moving in this direction, though, I would love to hear ideas/recommendations/experiences regarding this.
Edit: A few additional pieces of information that might be helpful to know:
The current language that I am working with is C#, although I tried to keep as much language-specific information out as possible, because this is more of a theoretical discussion.
The code for the repositories/services/entities/etc. is based upon Tim McCarthy's concept in his book, ".NET Domain-Driven Design with C#", and the supporting code on CodePlex. It provides a working illustration of the type of approach taken, although what I am working with has largely been rewritten from the ground up.
In short, my answer is that I went with what I proposed. It is working, although I am sure that there is room for improvement. The changes actually took very little time, so I feel I didn't stray too far from the KISS or YAGNI principles in this case. :-)
I still feel that there is room for timing-related issues in the operations, but I should be able to work around them in the repository implementations. Not the ideal solution, but I am not sure it is worth reinventing the wheel to correct a problem that can be avoided in less time than it takes to fix.