I am new to C# and SQL, but over the last few years of learning both in college, one question has really begun to burn inside me. Here it is:
It seems to me that there are really two very generic ways to handle input validation (i.e. checking for required fields, data in the correct ranges, etc.).
The first, and the way traditionally shown, is this: once you have developed your UI and connected it to a database back end in some manner, you check for correct input on the user interface, such as blank text boxes, number ranges, or whether a radio button or check box is selected.
The second, and the way shown in database development, is to set constraints on fields, such as no nulls allowed, unique values, ranges, and required fields.
My dilemma is this. Modern languages like C# give you general exception handling, and most databases like SQL Server have major-league fault tolerance built in for data changes, committing either all of a change or none of it. Details like this, at this level, would be hard to program yourself in anything but the simplest of programs.
So my question is: why not build all the requirements directly into the table at the database back end, take advantage of the aforementioned fault tolerance, forget about programming if statements to ensure correct data is input, and instead just use a generic catch-all exception handler for when the data is not committed?
Perhaps that is how it is done; if so, I would really like to know for sure. If not, why? My preference is to avoid writing code whenever possible. Less code means less debugging and fewer problems when it comes to updating. So I would tend to go with the approach of letting the DB back end do the work. Is this generally the correct thing to do?
I know that general exception handling is considered "expensive" in terms of resources. But surely once you get past 5 or 10 if statements to handle different fields and their constraints, it must be more efficient code-wise to just use a general exception handler. It certainly seems easier to understand overall (at least the way I do it).
Thanks for your help with this.
OK, here is why you need it in both places.
First, the integrity of the data should be paramount, and data can be changed directly in database tables (deliberately through a script to, say, update a million prices, or by accident, or even by disgruntled or criminal employees trying to disrupt the database or steal from the company). Therefore it is reckless to avoid using constraints directly in the database, and doing so leads to bad data.
Now, at the user interface level, you want to prevent the user from wasting his time submitting bad data, and you want to prevent the servers and networks from wasting their time trying to process it, so you write checks at that level. Plus, you don't want the data left in an inconsistent state if you need to insert into several tables and aren't using a transaction (which you should be using, but I suspect that happens less often than it should). Plus, users hate it when the insert fails and tells them that X is wrong, and then they fix X and now Y is wrong, although Y was wrong before too; the process just didn't get as far as Y the first time.
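To make the transaction point concrete, here is a minimal sketch. It uses Python with the standard sqlite3 module purely so that it is self-contained and runnable; the same all-or-nothing idea applies to SQL Server from C#. The `orders` and `order_lines` tables and both function names are made up for illustration.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two related, constrained tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE order_lines ("
        " order_id INTEGER NOT NULL,"
        " qty INTEGER NOT NULL CHECK (qty > 0))"
    )
    return conn

def save_order(conn, customer, quantities):
    """Insert an order and its lines as one unit: all rows or none."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", (customer,))
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_lines (order_id, qty) VALUES (?, ?)",
            [(order_id, qty) for qty in quantities],
        )
```

If one of the lines violates the CHECK constraint, the exception propagates to the caller and the order header inserted a moment earlier is rolled back with it, so the two tables never disagree.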
You do both.
Create constraints at the DB level, and check for those constraints at the client level as well.
The validation on the DB makes sure that no invalid data gets in your DB, no matter how the data is inputted.
The validation at the client side improves the user-experience.
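As a small illustration of the database acting as the last line of defence, here is a sketch in Python with the standard sqlite3 module (used only to keep the example self-contained; SQL Server CHECK constraints behave analogously). The `products` table and the `bulk_update_rejected` helper are hypothetical. The point is that a maintenance script which bypasses the UI entirely is still stopped by the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " sku TEXT NOT NULL UNIQUE,"
    " price REAL NOT NULL CHECK (price >= 0))"
)
conn.execute("INSERT INTO products VALUES ('A1', 9.99)")
conn.commit()

def bulk_update_rejected(conn, sql):
    """Run a maintenance statement; return True if a constraint stopped it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

With this setup, `bulk_update_rejected(conn, "UPDATE products SET price = price - 20")` returns True and the stored price is unchanged, while a harmless update such as `price * 2` goes through.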
You generally can't build all the logic for checking into the database. Also not validating user input sufficiently is a good way to open yourself up to attack.
One way to write less guard code in every method is Code Contracts, a product of Microsoft Research.
All input should be validated both client and server side. Always.
Also, with a giant catch it would be hard to tell which field was in error, so you would end up writing a lot of "which field exploded" code at the other end.
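A rough sketch of the alternative to the giant catch: an explicit validator that reports every bad field at once, so the user does not have to fix X only to be told about Y on the next attempt. This is plain Python for brevity; the field names, the 0 to 130 age range, and the messages are invented for the example.

```python
def validate_person(form):
    """Return a dict of field name -> message for every invalid field."""
    errors = {}

    name = (form.get("name") or "").strip()
    if not name:
        errors["name"] = "Name is required."

    try:
        age = int(form.get("age"))
    except (TypeError, ValueError):
        errors["age"] = "Age must be a whole number."
    else:
        if not 0 <= age <= 130:
            errors["age"] = "Age must be between 0 and 130."

    return errors
```

A single database exception, by contrast, generally reports only the first constraint that failed, and the caller has to parse the message to work out which field it belongs to.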
While I generally advocate putting as much in the database as possible (which means that you can have as high a degree of confidence as possible about the "raw" data), that isn't always possible, even with the powerful constraints and triggers available in SQL.
In addition, there are high-level "integrity" things which may change over time, and it is not realistic to always have temporally-dynamic conditions in constraints. For example: all HR records since 2007 must have a non-NULL birthdate, but prior ones are allowed to remain NULL, and no row can ever be set back to NULL.
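The HR rule above is easy to state in application code even though it is awkward as a static constraint. Here is a hedged sketch in Python; the function name, the parameters, and the reading of "since 2007" as "record dated 1 January 2007 or later" are all assumptions made for the example.

```python
from datetime import date

CUTOFF = date(2007, 1, 1)  # assumed interpretation of "since 2007"

def birthdate_change_allowed(record_date, old_birthdate, new_birthdate):
    """Decide whether a proposed birthdate value may be saved for one row.

    Pass old_birthdate=None for a brand-new row or a row that is still NULL.
    """
    if new_birthdate is not None:
        return True                 # a real value is always acceptable
    if old_birthdate is not None:
        return False                # never set a known birthdate back to NULL
    return record_date < CUTOFF     # NULL may remain only on pre-2007 records
```

The rule depends on both the old and the new value of the row as well as on a date cutoff, which is the kind of condition that tends to end up in a trigger or in a higher layer rather than in a simple column constraint.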
My point is you can almost never put it all in the database.
Put the things in that you can, and put others at higher levels in the system. The database is a very important part of any system, but it isn't the only part. As long as its design helps it protect its perimeter and be able to provide reliable service and guarantee what it says it will guarantee so that other parts of the system can rely on their assumptions, then that's about the most you can ask for.
In addition to all the answers here (for example, that UI validation dramatically improves the UX and can completely change the "image" of your app): validation in the DB exists so that data is inserted into the DB correctly, but validation on the client has to exist so that the client's own data is captured correctly.
Consider the example of a standalone enterprise app. A client works from home and fills in 20 invoices late at night on his notebook in Mongolia. The next day he comes back and syncs them with his office SAP server. If the errors are discovered only during the sync, you can imagine how awful that situation is.
Just an example. There could be plenty of others, I'm sure.
Good luck.
It's 2 years later and I have a decent amount of experience now. I am not going to accept my answer as the right one, as many here have done a great job and I am very happy with their answers. But I want to add another important consideration that, looking back over my experience, has not been highlighted here. I also use Stack Overflow for reference as I progress, and I always find myself looking back over my questions and answers, which is another reason I wanted to add this. Like a note to my future self.
While working at that company, I was asked to build an app that would do job abc. With this I also had to build part of the database. As I was finishing with the company I learned that they were writing another app which would use my database. Effectively my point is, that as many have pointed out, data is paramount, and you don't know how it is going to be accessed when you're gone.
I have also learned that there are 3 places that data needs to be verified:
on the actual database as explained
on the server side code behind which is not the same as the DB or client side validation
on the client side
There is another worry. With the advent of new tech like tablets and smart phones, there is yet another place where validation has to be implemented: the same rules for a 4th time (unless it's a web app).
I later learned that prior to MVC we had CGI forms which had something to do with handling data over the network (I humbly admit ignorance on hardware side) but from what was explained to me it seems there may even be a 5th place to do validation (although I am open to being totally wrong about that).
I think the next guru in computer science will make a name for himself if he can find a way to abstract all that verification and validation to one place so that such rules don't have to be altered in a bunch of places.
worst case:
DB
Server side code
Client side code for web apps
What about if:
There may be a native client app (i.e. Windows, Linux or Mac; at least 6 now)
There may be various phone apps (Android, iPhone, and Windows Phone, to name 3; at least 9 now)
There may be some CGI or whatever
This totals 10+ places without much exaggeration and there are other operating systems.
Even for a simple age range this is getting to be messy, but what if they bring out some new email format, or other complicated validation, or you have to change a bunch of validation rules. Now you have to modify them across at least 3 or 4 places which in itself is bad.
The major problem with that is that you are modifying a lot of code and infrastructure that has been invested in, tested, and usually proven to work and delivered to the market...
As the number of client sides grows, modifying well-tested code can't be a good thing. I think this is going to be a major headache for the future. I wonder if there will be a design pattern or best practice to resolve it. If anyone knows of one, please tell me.
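One possible direction, offered as an illustration rather than an established best practice: describe each rule once as data, and derive the database constraint, the server-side check, and a payload for clients from that single description. The sketch below is Python with invented names (`AGE_RULE`, `check_clause`, `is_valid`, `client_payload`), and it only covers a simple numeric range.

```python
import json

# Hypothetical single source of truth for one rule.
AGE_RULE = {"field": "age", "min": 0, "max": 130}

def check_clause(rule):
    """Render the rule as a SQL CHECK clause for a table definition.

    The rule comes from trusted configuration, never from user input,
    so formatting it into SQL text is acceptable here.
    """
    return "CHECK ({field} BETWEEN {min} AND {max})".format(**rule)

def is_valid(rule, value):
    """Apply the same rule in server-side code."""
    return rule["min"] <= value <= rule["max"]

def client_payload(rule):
    """Serialize the rule so a web or phone client can enforce it too."""
    return json.dumps(rule, sort_keys=True)
```

Changing the range then means editing one dictionary and regenerating, although each layer still needs code that knows how to interpret the rule format, and complicated rules may not fit such a format at all.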
I've got an ASP.NET web application, that is essentially our intranet site. I made a lot of progress on the administration office's employee management pages. It ties into an SQL server database, and I'm using a three layered design (Objects, Logic, DataAccess). It was all reviewed and all of it was accepted, except! for the part that manages vacations and vacation histories.
My question, before I go into details is, how does one efficiently "untangle" code that is no longer necessary?
For example: previously I was treating each VacationDay as its own entity with its own history, such that I could track the history of an individual day. To help in tracking, I have an enum called VacationDayAction, which includes options such as .Submitted, .RequestDenied, .CancellationRequested, and so on. This was an attempt to provide meticulous detail for each day. It was then determined that we no longer need that. We do, however, still need VacationDays and all the basic functions of that (saving days, getting days, etc.), but now we no longer need any of the "history" related classes.
My problem is, when I right click a class that I no longer need in VS and go to "Show All References," I get a ton of results scattered across several pages. I need to get rid of all of them without breaking the rest of the application. Is there not some kind of "smart" technique or method for easily untangling parts that are no longer necessary? This is particularly difficult because 90% of what I did was just fine and needs to stay like it is. Yet scattered in that 90% is 10% of stuff that is no longer needed. I can't just go storming through with the delete key either, because with the removal of each reference, I need to be sure that any dependencies on that reference are also fixed in a way that they don't call stuff that isn't there anymore. And I still need the application in a compilable state, so that I can test along the way that the rest of the application didn't fall apart as a result of some deletion.
To give you an idea of my low level of experience, I started two years ago with having never used C#, ASP.Net, or Visual Studio. It blew my mind when, way after starting and as I was learning, someone taught me that I could use breakpoints. And then it really really blew my mind when I learned about multi-layered design. I'm wondering if there is not some technique or trick or feature that can help in scenarios like this, where you have to "untangle" and throw away unnecessary stuff.
This is not a simple question. In fact, I would say this is one of the major challenges for any systems developer: how to handle and get rid of old code which is not in use. There is lots of literature on this, and few really excellent answers. A good book may be "Working Effectively with Legacy Code" by Michael Feathers, which deals with many related problems. It is no light read, though, and will probably take some time to get through, but it will likely help you become a better coder, and better at these kinds of tasks in particular.
Maybe you can have a look at the ReSharper tool? ( http://www.jetbrains.com/resharper/ ) It is a productivity tool which, among other things, shows "dead" code (unused code) in grey and lets you remove it. It will also help you remove unused references from each class (again, they will be greyed out and you can remove them automatically).
Drawing diagrams where each major piece of code /component is a box with a line linking it to any related component might help you get a better overview; try to draw a hierarchy showing how different parts of the code are related and dependent.
The bottom line as far as I know, is that you just have to muddle through it, commenting out code a little at a time, then recompiling and testing it. If it still works, fine, now you can remove the commented out code completely. This would be easier if you had unit-tests covering your code, but I take it as a given that you don't, as is unfortunately often the case.
In my scenario, let's say there is an ASP.NET 4.0 C# page containing a form with several inputs on it. Based upon which state the user is in, the form needs to act in entirely different ways: some fields might be required, some not visible at all, some might have different requirements (state A might only allow numbers 1-5, state B numbers 5-10), etc.
So, to simplify things, let's just say for any given input on the form, I need to determine whether or not it's required for the user, again based on their state. For those of you who run into this scenario quite a bit, what's the best way of implementing a system to handle this? I can see the following options:
Hardcoded - Difficult to maintain, obviously
Custom Database Rule Framework - This seems like it would work; however, it would be somewhat of a pain to maintain depending on how complicated the logic is
Windows Workflow Foundation - This would be able to handle just about any kind of logic, and be decent to maintain, but I'm not sure how this would do performance wise. (could be stored externally in database)
Dynamic Code - Store the logic in a database and run it directly based upon the user. I've never done this.. is it possible?
That's all I've come up with at this point, but I'm hoping someone out there has found an elegant solution to handle scenarios with complicated forms like this.
Thanks!
I have never worked with WWF, but I have encountered a scenario like this and implemented an entry form for it that works well and is easy to maintain once you understand the system.
I will discourage you from using hardcoded logic because any degree of complexity will quickly become impossible to maintain. I tried a hybrid approach that included some hardcoding initially and it did not turn out well.
I ended up creating, as you call it, a custom database rule framework. It is a little extra work to set up config forms to associate user groups with certain codes and pieces of functionality, but in the end it is well worth it for everything to automatically configure itself. Also in my case I was able to farm out user & code setup work to a supervisor in the department that uses the application, so that is a big plus.
Hardcoding -- not so hard to maintain, just depending on how fluid the rules are. I.e., if your "states" are relatively fixed, you're not adding new ones or changing the way those states interact with the page, then hardcoding might be fine. My only recommendation in this case would be to keep it in a separate class so you can re-use it, modify & re-publish easier, etc.
If you want the flexibility to change the rules a lot, create new states (I'm thinking of these as "roles"), then storing the info in a database would make more sense.
Personally, I use the database approach. It saves me some re-publishing of the app, and it has allowed me to build additional interfaces for my end-users to have limited capability to manage their own app in terms of role assignments ("states" as you put it), etc. For example, my end-users can grant one of their clients (based on the client's login) access to a certain report. Or in your situation, they could change the min number for some range-validator your .aspx is using.
Since this approach lets me delegate some admin functions to my end-users, it allows them to do on-the-fly changes (to a limited extent), and also saves me a lot of rush-work / do it yesterday work as far as my own to-do list is concerned.
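For a feel of what a minimal database rule framework can look like, here is a sketch in Python with the standard sqlite3 module (chosen only so the example runs on its own; the same table shape would work in SQL Server behind an ASP.NET page). The `field_rules` table, its columns, and the `rating` field are invented for illustration, using the 1-5 and 5-10 ranges from the question.

```python
import sqlite3

def make_rules_db():
    """Create an in-memory rules table keyed by (state, field)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE field_rules ("
        " state TEXT NOT NULL, field TEXT NOT NULL,"
        " required INTEGER NOT NULL, min_value INTEGER, max_value INTEGER,"
        " PRIMARY KEY (state, field))"
    )
    conn.executemany(
        "INSERT INTO field_rules VALUES (?, ?, ?, ?, ?)",
        [("A", "rating", 1, 1, 5), ("B", "rating", 0, 5, 10)],
    )
    conn.commit()
    return conn

def validate(conn, state, form):
    """Check a form (field -> int or None) against the rules for one state."""
    errors = {}
    rows = conn.execute(
        "SELECT field, required, min_value, max_value"
        " FROM field_rules WHERE state = ?",
        (state,),
    )
    for field, required, lo, hi in rows:
        value = form.get(field)
        if value is None:
            if required:
                errors[field] = "required"
        elif (lo is not None and value < lo) or (hi is not None and value > hi):
            errors[field] = f"must be between {lo} and {hi}"
    return errors
```

Because the rules are rows rather than code, an admin screen can change a minimum or flip a field to required with a plain UPDATE, and the next request picks it up without the app being republished.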
I just had a conversation with my manager relating to check-in/check-out policies on a project I'm currently working on. Basically I tried to edit a file that was already checked out by another developer and I couldn't. I asked my manager why we couldn't edit the same class at the same time, and he gave this reason for turning that functionality off: We had a lot of problems with developers editing the same Form (or anything visual done in the designer) and then checking it in. Merging the changes in the designer generated code was a lot of hassle...
As I'm writing this I'm struggling to see what problem they were having - surely they were getting the latest code before trying to check something in??
Have any of you come across problems with editing the same Form (or something in the designer) as another developer and then checking into TFS? If so how did your team get around the problem? Did you also turn off the ability for developers to work on the same class?
EDIT: The following post (found here) is exactly the problem my manager was describing. Anyone know of a simpler way to resolve the issue than the ones in that post?
I would argue that the solution to your problem would be to establish best practices for source code modification.
Discourage people from going into UI code and arbitrarily jiggling the components around in the designer. Any reasonable UI modifications should be easily mergeable. Your best bet is to try and educate people as to the best way to merge in any given source control system. Also, as helpful as the designer is, ignorance of what code is being automatically generated in the background will be significantly detrimental in the long-term.
People who insist on locking checked-out files for the reasons you stated in your post typically wait long periods of time to check their code in. Naturally, the more time passes, the more code gets modified, so it makes merging difficult for these people. Checking in early, often, and incrementally requires people to think about their changes in stages, and for some coders, this is a rather painful cultural/psychological adjustment.
I've just checked back through the histories of some of my .designer.cs files and I can't see any changes that would cause merge problems. There were no wholesale rearrangements of code, for example.
Another thing to consider is to make sure that everyone does a "get latest" at regular intervals, so that any individual merge/resolution isn't going to be that large, thus minimising the chances of anything going wrong.
It might also be worth investigating a 3rd party merge tool. There are plenty around.
Now it could be that the changes I've done are simple compared to the ones you've got so you should take my anecdotal data with a pinch of salt.
It can cause problems (in general) when a lot of people are editing UI concurrently. The merge logic will do a fine job merging things, but in a lot of cases the UI is drawn according to how things are added to the form. Your UI can get messed up quickly.
I don't know if I would use this as an excuse to enforce exclusive checkouts across the board, though. I might go from a (non programmatic) policy standpoint that says shared checkout for business logic, but exclusive for UI changes.
I would couple that with a strong MVP, MVC, or MVVM approach, though, which should limit the number of people that have to touch the UI concurrently.
As others have alluded to, keep one of the seminal rules of SCM in mind: merge early and often, and your problems are reduced. (Along with that is "always get latest before you start working on the code".)
I need to allow my users to be able to define formulas which will calculate values based on data. For example
//Example 1
return GetMonetaryAmountFromDatabase("Amount due") * 1.2;
//Example 2
return GetMonetaryAmountFromDatabase("Amount due") * GetFactorFromDatabase("Discount");
I will need to allow / * + - operations, also to assign local variables and execute IF statements, like so
var amountDue = GetMonetaryAmountFromDatabase("Amount due");
if (amountDue > 100000) return amountDue * 0.75;
if (amountDue > 50000) return amountDue * 0.9;
return amountDue;
The scenario is complicated because I have the following structure..
Customer (a few hundred)
Configuration (about 10 per customer)
Item (about 10,000 per customer configuration)
So I will perform a 3-level loop. At each "Configuration" level I will start a DB transaction and compile the formulas; each "Item" will use the same transaction + compiled formulas (there are about 20 formulas per configuration, and each item will use all of them).
This further complicates things because I can't just use the compiler services as it would result in continued memory usage growth. I can't use a new AppDomain per each "Configuration" loop level because some of the references I need to pass cannot be marshalled.
Any suggestions?
--Update--
This is what I went with, thanks!
http://www.codeproject.com/Articles/53611/Embedding-IronPython-in-a-C-Application
IronPython allows you to embed a scripting engine into your application. There are many other solutions. In fact, you can google something like "C# embedded scripting" and find a whole bunch of options. Some are easier than others to integrate, and some are easier than others to code up the scripts.
Of course, there is always VBA. But that's just downright ugly.
You could create a simple class at runtime, just by writing your logic into a string or the like, compile it, run it and make it return the calculations you need. This article shows you how to access the compiler from runtime: http://www.codeproject.com/KB/cs/codecompilation.aspx
I faced a similar problem a few years ago. I had a web app with moderate traffic that needed to allow equations, and it needed similar features to yours, and it had to be fast. I went through several ideas.
The first solution involved adding calculated columns to our database. Our tables for the app store the properties in columns (e.g., there's a column for Amount Due, another Discount, etc.). If the user typed in a formula like PropertyA * 2, the code would alter the underlying table to have a new calculated column. It's messy as far as adding and removing columns. It does have a few advantages though: the database (SQL Server) was really fast at doing the calculations; the database handled a lot of error detection for us; and I could pretend that the calculated values were the same as the non-calculated values, which meant that I didn't have to modify any existing code that worked with the non-calculated values.
That worked for a while until we needed the ability for a formula to reference another formula, and SQL Server doesn't allow that. So I switched to a scripting engine. IronPython wasn't very mature back then, so I chose another engine... I can't remember which one right now. Anyway, it was easy to write, but it was a little slow. Not a lot, maybe a few milliseconds per query, but for a web app the time really added up over all the requests.
That was when I decided to write my own parser for the formulas. That is, I have a PlusToken class to add two values, an ItemToken class that corresponds to GetValue("Discount"), etc. When the user enters a new formula, a validator parses the formula, makes sure it's valid (things like, did they reference a column that doesn't exist?), and stores it in a semi-compiled form that's easy to parse later. When the user requests a calculated value, a parser reads the formula, parses it, figures out what data is needed from the database, and computes the final answer. It took a fair amount of work up front, but it works well and it's really fast. Here's what I learned:
If the user enters a formula that leads to a cycle in the formulas, and you try to compute the value of the formula, you'll run out of stack space. If you're running this on a web app, the entire web server will stop working until you reset it. So it's important to detect cycles at the validation stage.
If you have more than a couple formulas, aggregate all the database calls in one place, then request all the data at once. Much faster.
Users will enter wacky stuff into formulas. A parser that provides useful error messages will save a lot of headaches later on.
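The cycle check from the first lesson above can be sketched as a depth-first search over "formula references formula" edges. This is an illustrative Python version, not the parser described in the answer; the shape of the `dependencies` mapping is an assumption, and a very deep chain of formulas would call for an iterative version to avoid hitting the recursion limit of the validator itself.

```python
def find_cycle(dependencies):
    """Return a list of formula names forming a cycle, or None.

    `dependencies` maps each formula name to the names it references.
    Names that are not keys (plain database columns) are treated as leaves.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(name):
        state[name] = IN_PROGRESS
        path.append(name)
        for ref in dependencies.get(name, ()):
            ref_state = state.get(ref, UNSEEN)
            if ref_state == IN_PROGRESS:
                # ref is already on the current path, so we looped back to it.
                return path[path.index(ref):] + [ref]
            if ref_state == UNSEEN:
                found = visit(ref)
                if found:
                    return found
        path.pop()
        state[name] = DONE
        return None

    for name in dependencies:
        if state.get(name, UNSEEN) == UNSEEN:
            found = visit(name)
            if found:
                return found
    return None
```

Running this when a formula is saved, and rejecting the save if it returns a cycle, keeps the evaluator from ever being handed a formula that refers back to itself. Returning the offending chain also gives the user a useful error message.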
If the custom scripts don't get more complex than the ones that you show above, I would agree with Sylvestre: create your own parser, make a tree, and do the logic yourself. You can generate a .NET expression tree, or just walk the syntax tree yourself and perform the operations within your own code (ANTLR, below, will help you generate such code).
Then you are in complete control of your references, and you are always within C#, so you don't need to worry about memory management (any more than you normally would), etc. IMO ANTLR is the best tool for doing this in C#. The site has examples for little languages like your scenario.
But... if this is really just a beginning and in the end you need almost the full power of a proper scripting language, you would need to look into embedding a scripting language in your system. With your numbers, you will have problems with performance, memory management, and probably references, as you noted. There are several approaches, but I cannot really give one recommendation for your scenario: I've never done it at such a scale.
You could build two base classes UnaryOperator (if, square, root...) and BinaryOperator (+ - / *) and build a tree from the expression. Then evaluate the tree for each item.
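A bare-bones sketch of that operator tree, written in Python for brevity. `UnaryOperator` and `BinaryOperator` come from the suggestion above; the `Value` and `Constant` leaf classes and the idea of passing each item as a dictionary are additions made up for the example.

```python
import math
import operator
from dataclasses import dataclass
from typing import Callable

@dataclass
class Value:
    """Leaf node: look up a named value on the current item."""
    name: str
    def evaluate(self, item):
        return item[self.name]

@dataclass
class Constant:
    """Leaf node: a literal number from the formula."""
    number: float
    def evaluate(self, item):
        return self.number

@dataclass
class UnaryOperator:
    """Node with one child, e.g. a square root."""
    func: Callable[[float], float]
    operand: object
    def evaluate(self, item):
        return self.func(self.operand.evaluate(item))

@dataclass
class BinaryOperator:
    """Node with two children, e.g. + - * /."""
    func: Callable[[float, float], float]
    left: object
    right: object
    def evaluate(self, item):
        return self.func(self.left.evaluate(item), self.right.evaluate(item))

# "Amount due" * "Discount", built once and reused for every item.
formula = BinaryOperator(operator.mul, Value("Amount due"), Value("Discount"))
```

The tree is built once per formula and then `formula.evaluate(item)` is called for each item, which matches the "compile per configuration, evaluate per item" shape of the question without generating any assemblies.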
I'm trying to debug portions of the current application I'm working on, however when I try and check the value of a property/variable I get the error:
Cannot evaluate expression because a thread is stopped at a point where garbage collection is impossible, possibly because the code is optimized.
This is just a regular ASP.NET project. In some portions of the application I can view the properties and variables perfectly fine. I haven't figured out what's different between the blocks of code where I can see the values of the variables and those where I cannot.
The problem was documented on an MSDN blog, as being a size limitation of certain types in certain situations, more details in the link. I believe it was 256 bytes and/or the total size/count of the number of arguments passed to a function. Sorry to say there does not seem to be a quick fix, but hopefully the MSDN blog entry will help you identify a way to solve your problem.
This article, Rules of Funceval, gives a number of reasons why this can occur. If debugging is already turned on and optimisation turned off, there doesn't seem to be much else you can do about this problem.
Are you making release builds? Try changing the configuration to "debug" and see if it improves.
We have the same problem in two of our WinForm user controls. In both cases the user controls contain a lot of business logic (2000 and 3000 lines of code respectively) and make use of multiple fairly heavy objects (they have 30+ properties that get populated automatically from the database the first time one of the properties is accessed). When you try to step through the (somewhat complicated) validation and saving methods, you get this same message when trying to access object properties.
We have come to the conclusion that the size and complexity of the user control combined with the size and complexity of the objects used and conditional database access just becomes too much for the debugger to handle and that we should probably just do some major refactoring to move most of the business logic out of the user control. It would be interesting to know if your problem arises from the same kind of situation and whether doing the said kind of refactoring actually does make a difference (we have not had the time and/or courage :) to do so).