Data integrity for a large database table - C#

I have to provide data integrity for a large database table. So, if a crafty admin manually changes the table (not via the UI), I want to be able to detect it.
My idea is to have an HMAC for each record and calculate an incremental HMAC for the table whenever a user changes it via the UI:
Calculate the HMAC for the first record - HMAC_Current.
Calculate the HMAC for a new record - HMAC_i.
Calculate the new HMAC for the table as HMAC_Current = HMAC(HMAC_Current + HMAC_i), where + is concatenation.
Pros:
There is no need to recalculate the HMAC for the entire table each time a user adds a record via the UI.
Cons:
When a user deletes or changes a record, I have to recalculate the chain of HMACs from that record to the end of the table.
When I want to check data integrity, I have to check the HMAC for each record, then recompute the chained HMAC for the entire table from top to bottom and compare it with HMAC_Current.
Is there a better way to do it?
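For illustration, a minimal C# sketch of the chaining step described above, using HMACSHA256; key management and record serialization are left out, and the "+" from the question is taken to mean byte concatenation:

using System.Security.Cryptography;
using System.Text;

public static class TableHmac
{
    // HMAC_Current = HMAC(HMAC_Current || HMAC_i), where || is concatenation.
    public static byte[] Chain(byte[] key, byte[] hmacCurrent, string newRecord)
    {
        using (var hmac = new HMACSHA256(key))
        {
            // HMAC_i: the HMAC of the newly added record.
            byte[] hmacI = hmac.ComputeHash(Encoding.UTF8.GetBytes(newRecord));

            byte[] combined = new byte[hmacCurrent.Length + hmacI.Length];
            hmacCurrent.CopyTo(combined, 0);
            hmacI.CopyTo(combined, hmacCurrent.Length);

            return hmac.ComputeHash(combined); // the new HMAC_Current
        }
    }
}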

I see a number of problems with this approach:
If your sysdba has access to all the data, what's stopping them from messing with the HMACs as well? E.g., they revert all changes made to the table in the last month, then put back the HMAC from last month. Is data integrity "preserved" in this case?
What stops them from subverting the application to mess with the HMACs? E.g., if they don't have access to the application, they change the password for a user and access the application as that user to mess with records.
Even if you can get this to work, what's it good for? Say you find an HMAC mismatch. Now who do you hold responsible? An admin? A user? Data corruption?
The better solution is to use auditing. You can set up all kinds of auditing in Oracle and have the audit trail saved somewhere even the DBA can't touch. Additionally, there's a huge advantage in using auditing: you can know who changed what. With your scheme, you can't possibly know that.
You can even set up FGA (fine-grained auditing) so that it'll only audit specific columns and also know what the values were before and after a change, which isn't possible with standard auditing.
Reference: Configuring and Administering Auditing

Well, the first issue is that you don't trust your admins. If so, why are they still there? Admins need full rights to production databases, so they must be trustworthy.
If the issue is that there are occasional disputes about who made changes, then set up audit tables with triggers. Trustworthy admins will not bypass the triggers (even though they can). Only admins should have delete rights to audit tables.
Audit tables are a requirement for most enterprise systems. If you did not set rights through stored procs, it is likely that many internal users have the rights to affect the database directly, which makes it easier for people to commit fraud. It may not be the admins at all who are affecting the data. Make sure you record who made each change and when, as well as the change itself.
SQL Server also has a way to audit structural changes to the db. I don't know if Oracle does as well, but this is also a handy thing to audit.

Are triggers available in your environment? If so, you can write managed triggers using C# and add any logic you want to that code.
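A minimal sketch of a SQL CLR managed trigger (SQL Server only; the dbo.Records table and the audit table are hypothetical):

using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class Triggers
{
    // Fires inside SQL Server after inserts on dbo.Records (hypothetical table).
    [SqlTrigger(Name = "trgRecordsInsert", Target = "dbo.Records", Event = "FOR INSERT")]
    public static void RecordsInserted()
    {
        SqlTriggerContext ctx = SqlContext.TriggerContext;
        if (ctx.TriggerAction != TriggerAction.Insert)
            return;

        // The context connection joins the transaction that fired the trigger.
        using (var conn = new SqlConnection("context connection=true"))
        {
            conn.Open();
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.AuditLog (RecordId, ChangedAt) " +
                "SELECT Id, GETUTCDATE() FROM INSERTED", conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}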

This approach to 'integrity' is not really an approach to integrity; it is more like security patchwork.
So, first of all, try to accomplish the same thing with a better security model.
In your scenario you have to calculate, store, and check the HMAC, and escalate whenever a check fails.
If you set up your security properly (it is almost always possible to arrange things so that no admin needs direct write access to your tables), then you don't have to check at all.
Moving as much of your business logic as possible into the database lets you make stored procedures the only interface for changing the data; in that case you would have integrity guaranteed.
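As a sketch of what that looks like from the application side (the dbo.AddRecord procedure is hypothetical): the application's login holds EXECUTE rights on the procedure and no direct INSERT/UPDATE/DELETE rights on the underlying tables.

using System.Data;
using System.Data.SqlClient;

public static class RecordGateway
{
    // All data changes go through stored procedures; the tables themselves
    // are not writable by the application login.
    public static void AddRecord(string connectionString, string payload)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.AddRecord", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@Payload", payload);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}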

Related

Is there anything in C# that can be used as database Trigger

I have an ERP database "A" on which I have only read permission, so I can't create triggers on its tables.
A belongs to an ERP system (a program unknown to me). I have another database, "B", that is private to my application; the application works with both databases. I want to reflect A's changes (any insert/update/delete) to B instantly.
Is there any functionality in C# that can work exactly the way a trigger works in a database?
You have a few solutions; the best one depends on which kind of database you have to support.
Generic solution, changes in A database aren't allowed
If you can't change the master database and this must work with every kind of database, then you have only one option: polling.
You shouldn't check too often (so forget doing it more or less instantly), to save network traffic, and it's better to do it in different ways for insert/update/delete. What you can do depends on how the database is structured; for example (a minimal polling sketch follows this list):
Insert: to catch an insert you may simply check for the highest row ID (assuming the table you need to monitor has an integer column used as its key).
Update: for updates you may check a timestamp column (if one is present).
Delete: this is trickier to detect; a first check is to count the rows. If the count changed and no insert occurred, you detected a delete; otherwise subtract the number of inserts.
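Here is the minimal polling sketch referred to above, assuming SQL Server syntax and a hypothetical dbo.Orders table with an integer key; the same idea ports to any DBMS:

using System.Data.SqlClient;

public class TablePoller
{
    private long _lastMaxId;
    private long _lastRowCount;

    // Call this on a timer; compare the highest ID and the row count
    // against the values remembered from the previous poll.
    public void Poll(string connectionString)
    {
        long maxId, rowCount;
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT CAST(ISNULL(MAX(Id), 0) AS bigint), CAST(COUNT(*) AS bigint) FROM dbo.Orders", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                reader.Read();
                maxId = reader.GetInt64(0);
                rowCount = reader.GetInt64(1);
            }
        }

        long inserts = maxId - _lastMaxId;                 // rows added since last poll
        long deletes = _lastRowCount + inserts - rowCount; // rows that disappeared

        if (inserts > 0) { /* read rows WHERE Id > _lastMaxId and copy them to B */ }
        if (deletes > 0) { /* reconcile: find which IDs vanished and remove them from B */ }

        _lastMaxId = maxId;
        _lastRowCount = rowCount;
    }
}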
Generic solution, changes in A database are allowed
If you can change the original database you can decrease network traffic (and complexity) by using triggers on the database side: when a trigger fires, it just puts a record in an internal log table (just a few columns: one for the change type, one for the affected table, one for the affected record).
You will need to poll only this table (using a simple query to check whether the number of rows increased). Because the action (insert/update/delete) is stored in the table, you just need to switch on that column to execute the proper action, as in the sketch below.
This has a big disadvantage (in my view): it puts logic related to your application inside the master database. That may or may not be a problem, depending on many factors.
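A sketch of the consumer side, assuming a hypothetical dbo.ChangeLog table (LogId bigint identity, Action char, TableName, RecordId) filled by the triggers:

using System.Data.SqlClient;

public class ChangeLogPoller
{
    private long _lastSeenLogId;

    // Poll only the small log table and replay each action against database B.
    public void ProcessNewEntries(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT LogId, Action, TableName, RecordId FROM dbo.ChangeLog " +
            "WHERE LogId > @last ORDER BY LogId", conn))
        {
            cmd.Parameters.AddWithValue("@last", _lastSeenLogId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    _lastSeenLogId = reader.GetInt64(0);
                    switch (reader.GetString(1))
                    {
                        case "I": /* copy the new record into B */ break;
                        case "U": /* re-read the record and update it in B */ break;
                        case "D": /* delete the record from B */ break;
                    }
                }
            }
        }
    }
}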
SQL Server/Vendor specific
If your application is tied to Microsoft SQL Server you can use the SqlDependency class to track changes. It works for SQL Server only, though there may be similar implementations for other databases. The disadvantage is that this will always be specific to one vendor (so if database A moves to a different DBMS...you'll have to change your code too).
From MSDN:
SqlDependency was designed to be used in ASP.NET or middle-tier services where there is a relatively small number of servers having dependencies active against the database. It was not designed for use in client applications, where hundreds or thousands of client computers would have SqlDependency objects set up for a single database server.
Anyway, if you're using SQL Server you have other options; just follow the links in the MSDN documentation.
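For reference, a minimal SqlDependency sketch (Service Broker must be enabled on the database, and the query must follow the notification rules: explicit column list, two-part table names; dbo.Orders is hypothetical):

using System.Data.SqlClient;

public class OrderWatcher
{
    // Call once at startup; pair with SqlDependency.Stop(...) at shutdown.
    public void Subscribe(string connectionString)
    {
        SqlDependency.Start(connectionString);

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Id, Name FROM dbo.Orders", conn))
        {
            var dependency = new SqlDependency(cmd);
            dependency.OnChange += delegate(object sender, SqlNotificationEventArgs e)
            {
                // Fires once per subscription; re-run the query and
                // re-subscribe here, then push the changes into B.
            };

            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read()) { /* initial snapshot */ }
            }
        }
    }
}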
Addendum: if you need finer control you may check the TraceServer and Object:Altered (and friends) classes. This is even more tied to Microsoft SQL Server, but it should be usable in a wider context (and you may keep your applications unaware of these things).
You may find useful, depending on your DBMS:
Change Data Capture (MS SQL)
http://msdn.microsoft.com/en-us/library/bb522489%28v=SQL.100%29.aspx
Database Change Notification (Oracle)
http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_dcn.htm
http://www.oracle.com/technetwork/issue-archive/2006/06-mar/o26odpnet-093584.html
Unfortunately, there's no SQL-92 solution for data change notification.
There is an excellent post on this here; please check it out:
http://devzone.advantagedatabase.com/dz/webhelp/advantage9.1/mergedprojects/devguide/part1point5/creating_triggers_in_c_with_visual_studio_net.htm
If this post solves your question, then mark it as answered.
Thanks

How to create two database records at the same time, that do not have sequential Ids

We currently have a website with user login.
We have a user table with userId.
We now want users to have a duplicate profile that is entirely separate from their main profile. This needs to be a secret profile.
Now we could just add two records into the db, but the IDs would be sequential (until we hit a high rate of signups), so a user could work out that the related userId is one ID higher.
I realise this is not an ideal solution, but it is a late change to a big software project so we are trying to be as pragmatic as possible, while requiring as little code change as possible.
Options:
Turn off auto-increment IDs and build our own keyGeneration table. Start one sequence at 0 and the secret one at 1000000000. We can then turn off auto-incrementing IDs in the user table and use these keys.
The problem is, does having keys running 1, 1000000000, 2, 1000000001, 3, 1000000002 cause a massive indexing problem? Would we have to force index rebuilds all the time?
We keep a separate table just for the 2nd profile IDs, again starting at 1000000000. We then modify all our code to check for IDs > 999999999 and flip the logic on the server side so the lookups work correctly.
That means doing that check everywhere a user ID is passed into the site from the front end.
As we don't do that too much (we obviously mainly grab the userId of the logged-in user securely), it might not be that bad.
Anyway, just wondering if anyone has any thoughts on this?
///////Edit
To put this into further context, imagine on stackoverflow or facebook, you have 2 profiles that you can control, that have no link between them. Like the way multiple users on Facebook can all act as a Page account, yet there is no link back from that account to the real user profile.
Essentially, to avoid breaking referential integrity or rewriting too much code, I really want to pass an ID (int) back down to the front end for these accounts. Then the whole system just keeps ticking over as it has done.
Guid could be cool! But it would have a performance overhead (NOT THAT I CARE ABOUT THAT REALLY ;) and it would also mean writing a lot of code to handle Guids being what is passed to the front end (not that we rely on front-end variables being right), which is why I suggest the high-int solution above. As we still have ASP.NET Membership lurking in the background I almost thought that could be good, but a) we plan to remove that one day (or migrate to SimpleMembership) and b) we use sequential Guid generation in our user table for performance (sorry again to talk about optimization before it's needed).
GUIDs, random numbers from a web service... plenty of solutions. I would rather go with a GUID.
THAT SAID: this is a business key; I would still use an autoincrement-style technical key for referential integrity.
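A sketch of that split, with illustrative names: the int identity key stays internal for joins and foreign keys; only the GUID ever reaches the front end.

using System;

public class UserProfile
{
    public int Id { get; set; }          // autoincrement technical key, used for referential integrity
    public Guid PublicId { get; set; }   // business key: Guid.NewGuid(), the only ID the UI ever sees
    public bool IsSecret { get; set; }
}

// At the boundary, look up by the public key only:
//   SELECT Id, ... FROM UserProfiles WHERE PublicId = @publicId
// so neither profile's PublicId reveals anything about the other.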

Asp MVC 2: Obfuscate Entity-IDs

Project type: Asp MVC 2/NHibernate/C#
Problem
If you have an edit page in a web application, you will run into the problem that you have to send and then receive the ID of the entity you're editing, plus the IDs of sub-entities and of entities that can be selected via drop-down menus, ...
As it is possible to modify a form post, an evil user could try to send back another ID, which might grant him more rights (if, for example, that ID was related to a security entity).
My approach
Create a GUID and associate it with the ID
Save the association in the HTTP session.
Wait for the response and extract the real ID from the received GUID (a minimal sketch follows below).
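A minimal sketch of this approach in ASP.NET MVC (names are illustrative; entries accumulate in the session, so real code would also need some eviction):

using System;
using System.Collections.Generic;
using System.Web;

public class IdProtector
{
    private const string MapKey = "IdMap";

    // Outbound: mint a GUID token for a real entity ID and remember the pair.
    public Guid Protect(HttpSessionStateBase session, int entityId)
    {
        var map = session[MapKey] as Dictionary<Guid, int>;
        if (map == null)
        {
            map = new Dictionary<Guid, int>();
            session[MapKey] = map;
        }
        Guid token = Guid.NewGuid();
        map[token] = entityId;
        return token; // render this in the form instead of the real ID
    }

    // Inbound: resolve the posted token back to the real ID, rejecting unknowns.
    public int Resolve(HttpSessionStateBase session, Guid token)
    {
        var map = session[MapKey] as Dictionary<Guid, int>;
        int id;
        if (map == null || !map.TryGetValue(token, out id))
            throw new InvalidOperationException("Unknown or expired ID token.");
        return id;
    }
}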
Question:
What techniques do you use to obfuscate an entity-ID?
If you're doing that much for GUIDs, why not just use GUIDs for the identity of the entity itself that's actually stored in the database (though I'd advise against it)?
Or you could have a server-side encryption scheme that encrypts and subsequently decrypts the ID (this is along the same lines as what you're doing, except you're not storing anything random like this in the session (yuck :) ).
You could even forget trying to do this at all since a lot of sites are "affected" by this issue, and it's obviously not a problem (StackOverflow for example). The overhead is just too much.
Also, if you're worried about security, why not set some sort of granular permissions at the individual action or even entity level? This would solve some problems as well.
EDIT:
Another problem with your solution is inconsistent unique identifiers. If a user says "ID as23423he423fsda has 'invalid' data", how do you know which ID it belongs to if it changes on every request (assuming you're going to change the ID in the URL as well)? You'd be much better off with an encryption algorithm that always maps the same ID to the same value: you can easily perform a lookup (if you need one) and the user sees consistent identifiers.
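For illustration, a sketch of such a deterministic scheme using AES with a fixed IV (the key below is a placeholder; the deterministic output is deliberate so identifiers stay stable, at the cost that equal IDs always produce equal tokens):

using System;
using System.Security.Cryptography;
using System.Text;

public static class IdCipher
{
    // Demo key only -- in real code load a 32-byte secret from configuration.
    private static readonly byte[] Key = Encoding.ASCII.GetBytes("0123456789abcdef0123456789abcdef");
    // Fixed IV: makes the output deterministic so the same ID always yields the same token.
    private static readonly byte[] Iv = new byte[16];

    public static string Encrypt(int id)
    {
        using (var aes = Aes.Create())
        using (var enc = aes.CreateEncryptor(Key, Iv))
        {
            byte[] plain = BitConverter.GetBytes(id);
            byte[] cipher = enc.TransformFinalBlock(plain, 0, plain.Length);
            return Convert.ToBase64String(cipher); // URL-encode before putting in a link
        }
    }

    public static int Decrypt(string token)
    {
        using (var aes = Aes.Create())
        using (var dec = aes.CreateDecryptor(Key, Iv))
        {
            byte[] cipher = Convert.FromBase64String(token);
            byte[] plain = dec.TransformFinalBlock(cipher, 0, cipher.Length);
            return BitConverter.ToInt32(plain, 0);
        }
    }
}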
Your controllers should be immune to modified POST data. Before displaying or modifying records belonging to a user, you should always check whether the records in question belong to the authenticated user.
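A sketch of that check in an MVC 2 action; the repository and the OwnerUserName property are illustrative:

using System.Web.Mvc;

public class RecordsController : Controller
{
    private readonly IRecordRepository _repository; // illustrative repository interface

    public RecordsController(IRecordRepository repository)
    {
        _repository = repository;
    }

    [Authorize]
    public ActionResult Edit(int id)
    {
        var record = _repository.FindById(id);

        // Never trust the posted ID: verify the record belongs to the current user.
        if (record == null || record.OwnerUserName != User.Identity.Name)
            return new HttpUnauthorizedResult();

        return View(record);
    }
}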

Comparing Data Graphs Using C# GetHashCode()

I have a graph of data that I'm pulling from an OAuth source using several REST calls and storing relationally in a database. The data structure ends up having about 5-10 tables with several one-to-many relationships. I'd like to periodically go and re-retrieve that information to see whether updates to my database are necessary.
Since I'm going to be doing this for many users and their data will likely not change very often, my goal is to avoid loading the database unnecessarily. My strategy is to query the data from my OAuth provider, hash the results, and compare the hash to the last one I generated for the same dataset. If the hashes don't match, I simply start a transaction in the database, blow away all the data for that user, re-write it, and close the transaction. This saves me the time of reading the data back from the database and doing all the compare work to see what changed: which rows were added, deleted, changed, etc.
So my question: if I glue all my data together in memory as a big string and use C#'s GetHashCode(), is that a fairly reliable mechanism to check whether my data has changed? Or are there better techniques for skinning this cat?
Thanks
Yes, hashing is a fairly reliable mechanism to detect changes, with two caveats about GetHashCode(): it returns only 32 bits, so collisions are rare but possible, and its value is not guaranteed to be stable across processes or .NET versions, so it should not be persisted. A cryptographic hash avoids both problems.
Better methods: can't the data have a version stamp or timestamp that is set every time something changes?
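If you want something sturdier than GetHashCode(), here is a sketch that swaps in SHA-256 over the glued-together string; the 256-bit digest is stable across processes and safe to persist:

using System;
using System.Security.Cryptography;
using System.Text;

public static class SnapshotHasher
{
    // Hash the glued-together data; store this string and compare it next run.
    public static string Hash(string gluedData)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(gluedData));
            return Convert.ToBase64String(digest);
        }
    }
}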

C# and Access 2000

I have developed a network application that has been in use in my company for the last few years.
At the start it managed information about users, rights, etc.
Over time it grew with other functionality, to the point that I have tables with, let's say, 10-20 columns and even 20,000-40,000 records.
I keep hearing that Access is not good for multi-user environments.
The second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client.
It happens because there is no database engine on the server side and data filtering is done on the client side.
I would migrate this project to SQL Server, but unfortunately that cannot be done in this case.
I was wondering if there is a more reliable solution for me than the Access database while still staying with a single-file database system.
We have a quite huge system using dBase IV.
As far as I know it is a fully multi-user database system.
Maybe it would be good to use it instead of Access?
What makes me unsure is the fact that dBase IV is much older than Access 2000.
I am not sure if it would be a good solution.
Maybe there are some other options?
If you're having problems with your Jet/ACE back end with the number of records you mentioned, it sounds like you have schema design problems or an inefficiently-structured application.
As I said in my comment to your original question, Jet does not retrieve full tables. This is a myth propagated by people who don't have a clue what they are talking about. If you have appropriate indexes, only the index pages will be requested from the file server (and then, only those pages needed to satisfy your criteria), and then the only data pages retrieved will be those that have the records that match the criteria in your request.
So, you should look at your indexing if you're seeing full table scans.
You don't mention your user population. If it's over 25 or so, you probably would benefit from upsizing your back end, especially if you're already comfortable with SQL Server.
But the problem you described for such tiny tables indicates a design error somewhere, either in your schema or in your application.
FWIW, I've had Access apps with Jet back ends with 100s of thousands of records in multiple tables, used by a dozen simultaneous users adding and updating records, and response time retrieving individual records and small data sets was nearly instantaneous (except for a few complex operations like checking newly entered records for duplication against existing data -- that's slower because it uses lots of LIKE comparisons and evaluation of expressions for comparison). What you're experiencing, while not an Access front end, is not commensurate with my long experience with Jet databases of all sizes.
You may wish to read this informative thread about Access: Is MS Access (JET) suitable for multiuser access?
For the record this answer is copied/edited from another question I answered.
Aristo,
You CAN use Access as your centralized data store.
It is simply NOT TRUE that Access will choke in multi-user scenarios--at least up to 15-20 users.
It IS true that you need a good backup strategy with the Access data file. But last I checked you need a good backup strategy with SQL Server, too. (With the very important caveat that SQL Server can do "hot" backups but not Access.)
So...you CAN use Access as your data store. Then, if you can get beyond the company politics controlling your network, perhaps you could begin moving toward upfitting your current application to use SQL Server.
I recently answered another question on how to split your database into two files. Here is the link.
Creating the Front End MDE
Splitting your database file into a front end and a back end is sort of a key to making it more performant. (Assume, as David Fenton mentioned, that you have a reasonably good design.)
If I may mention one last thing...it is ridiculous that your company won't give you other deployment options. Surely there is someone there with some power who you can get to "imagine life without your application." I am just wondering if you have more power than you might realize.
Seth
The problems you experience with an Access database shared among your users will be the same with any file-based database.
A read will pull a lot of data into memory, and writes are guarded with some type of file lock. In your environment it sounds like you are going to have to make the best of what you have.
"Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client. "
Actually, no. This is a common misstatement spread by folks who do not understand how Jet, the database engine inside Access, works. Pulling down all the records, or an excessive number of records, happens because the fields used in the selection criteria or sorting are not in an index. We've also found that indexing yes/no (boolean) fields can make a huge difference in some queries.
What really happens is that Jet brings down only the index pages and data pages which are required. While this is more data than a server-based database engine would send, it is not the entire table.
I also have clients with 600K and 800K records in various tables and performance is just fine.
We have an Access database application that is used pretty heavily. I have had 23 users on it all at the same time without any issues. As long as they don't access the same record, I don't have any problems.
I do have a couple of forms that are used and updated by several different departments. For instance, I have a Quoting form that contains 13 different tabs and 10-20 fields on each tab. Users are typically in a single record for minutes, editing and looking for information. To avoid any write conflicts I call the function below any time a field is changed. As long as it is not a new record being entered, it updates.
Function funSaveTheRecord()
    If ([chkNewRecord].Value = False And Me.Dirty) Then
        'To save the record, turn off the form's Dirty property
        Me.Dirty = False
    End If
End Function
The way I have everything set up is as follows:
PDC.mdb <-- Front end, stored on the user's machine. Every user has their own copy. Links to tables found in PDC_be.mdb. Contains all forms, reports, queries, macros, and modules. I created a form that I can use to toggle the shift-key bypass on and off. Only I have access to it.
PDC_be.mdb <-- Back end, stored on the server. Contains all data. The only form and VBA it contains is to toggle the shift-key bypass on and off. Only I have access to it.
Secured.mdw <-- Security file, stored on the server.
Then I put a shortcut on the users desktop that ties the security file to the front end and also provides their login credentials.
This database has been running without error or corruption for over 6 years.
Access is not a flat file database system! It's a relational database system.
You can't use SQL Server Express?
Otherwise, MySQL is a good database.
But if you can't install ANYTHING (you should get into those politics sooner rather than later -- or it WILL be later), just use your existing database system.
Basically, Access cannot handle more than 5 people connected at the same time, or it will corrupt on you.
