SQL Numeric Column forbid Gaps

I'm building an invoice table in my database. I need an integer column to store the invoice number. This column must not allow gaps between numbers. An identity column doesn't work because a rollback can produce a gap.
So what I want is:
InvoiceId (primary key, identity)   InvoiceNumber (unique, NOT NULL)
1                                   1
2                                   2
10                                  3
13                                  4
Is there a special way to do this in SQL?
If there is no solution in SQL, how should I do it in C# with Entity Framework?
EDIT 1:
Additional information: a row will never be deleted.
EDIT 2:
Why I need a gapless column: it's a legal requirement (French law). Invoice numbers have to be gapless to show that you didn't remove an invoice.

There is no way to prevent gaps if several processes create invoices in your database in parallel.
Simple case:
Process A) creates an invoice. (#1)
Process B) creates an invoice. (#2)
Process A) rolls back its transaction.
Gap.
You could lock your whole table for the whole transaction. That means that only one of your processes can create an invoice at a time. That might be acceptable for small companies where one person creates all invoices by hand.
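A minimal sketch of the serialized approach, written in Python with SQLite standing in for the real database. The `invoice_counter` table, the column names and the `create_invoice` helper are all hypothetical. The idea it illustrates is that the number is taken from a counter row inside the same transaction as the insert, so a rollback undoes the increment as well and leaves no gap, at the cost of one writer at a time.

```python
import sqlite3

# Hypothetical schema: a single-row counter plus the invoice table.
SCHEMA = """
CREATE TABLE invoice_counter (
    id          INTEGER PRIMARY KEY CHECK (id = 1),
    last_number INTEGER NOT NULL
);
INSERT INTO invoice_counter VALUES (1, 0);
CREATE TABLE invoice (
    invoice_id     INTEGER PRIMARY KEY,
    invoice_number INTEGER NOT NULL UNIQUE,
    customer       TEXT NOT NULL
);
"""

def create_invoice(conn, customer, fail=False):
    """Allocate the next number and insert the invoice in ONE transaction."""
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
        conn.execute(
            "UPDATE invoice_counter SET last_number = last_number + 1 WHERE id = 1"
        )
        number = conn.execute(
            "SELECT last_number FROM invoice_counter WHERE id = 1"
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO invoice (invoice_number, customer) VALUES (?, ?)",
            (number, customer),
        )
        if fail:  # stand-in for any error later in the transaction
            raise RuntimeError("simulated failure")
        conn.execute("COMMIT")
        return number
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # also undoes the counter increment
        raise
```

The connection is expected to be opened with `isolation_level=None` so that the explicit `BEGIN`/`COMMIT` statements control the transaction. On SQL Server or PostgreSQL the same pattern would rely on the row lock taken by the `UPDATE` rather than a database-wide write lock.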
Or you could leave the column empty and once per night you lock the whole table and write those numbers that are not set yet. That means you get the invoice number later in your invoice process, but it's without gaps.
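The nightly numbering job could look roughly like the following sketch (Python with SQLite as a stand-in; the schema and the `assign_missing_numbers` name are assumptions for illustration). Invoices are inserted with a NULL number during the day, and the job numbers them in `invoice_id` order, continuing from the highest number already assigned.

```python
import sqlite3

# Hypothetical schema: invoice_number stays NULL until the batch job runs.
SCHEMA = """
CREATE TABLE invoice (
    invoice_id     INTEGER PRIMARY KEY,
    invoice_number INTEGER UNIQUE
);
"""

def assign_missing_numbers(conn):
    """Number all unnumbered invoices in invoice_id order; return how many."""
    conn.execute("BEGIN IMMEDIATE")  # block other writers while numbering
    try:
        last = conn.execute(
            "SELECT COALESCE(MAX(invoice_number), 0) FROM invoice"
        ).fetchone()[0]
        pending = conn.execute(
            "SELECT invoice_id FROM invoice "
            "WHERE invoice_number IS NULL ORDER BY invoice_id"
        ).fetchall()
        for offset, (invoice_id,) in enumerate(pending, start=1):
            conn.execute(
                "UPDATE invoice SET invoice_number = ? WHERE invoice_id = ?",
                (last + offset, invoice_id),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return len(pending)
```

Because only committed rows are visible to the job, rolled-back inserts never receive a number, which is what keeps the sequence dense.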
Or you could read the requirements again. Germany had something equally stupid, but it was only meant to have those numbers on the form for the tax department. So you could use your normal invoice numbers with gaps and when sending them to this bureaucratic monstrosity, you would generate a unique, gap free number on export only.

Because there are multiple users, you can't recalculate the value on the client side. Instead, create insert/delete triggers in your database that recalculate the InvoiceNumber for the entire table.

Related

Best approach to track Amount field on Invoice table when InvoiceItem items change?

I'm building an app where I need to store invoices from customers so we can track who has paid and who has not, and if not, see how much they owe in total. Right now my schema looks something like this:
Customer
- Id
- Name
Invoice
- Id
- CreatedOn
- PaidOn
- CustomerId
InvoiceItem
- Id
- Amount
- InvoiceId
Normally I'd fetch all the data using Entity Framework and calculate everything in my C# service, (or even do the calculation on SQL Server) something like so:
var amountOwed = Invoice.Where(i => i.CustomerId == customer.Id)
    .SelectMany(i => i.InvoiceItems)
    .Select(ii => ii.Amount)
    .Sum();
But calculating everything every time I need to generate a report doesn't feel like the right approach this time, because down the line I'll have to generate reports that should calculate what all the customers owe (sometimes go even higher on the hierarchy).
For this scenario I was thinking of adding an Amount field on my Invoice table and possibly an AmountOwed on my Customer table which will be updated or populated via the InvoiceService whenever I insert/update/delete an InvoiceItem. This should be safe enough and make the report querying much faster.
But I've also been searching some on this subject and another recommended approach is using triggers on my database. I like this method best because even if I were to directly modify a value using SQL and not the app services, the other tables would automatically update.
My question is:
How do I add a trigger to update all the parent tables whenever an InvoiceItem is changed?
And from your experience, is this the best (safer, less error-prone) solution to this problem, or am I missing something?

There are many examples of triggers that you can find on the web. Many are poorly written unfortunately. And for future reference, post DDL for your tables, not some abbreviated list. No one should need to ask about the constraints and relationships you have (or should have) defined.
To start, how would you write a query to calculate the total amount at the invoice level? Presumably you know the T-SQL to do that. So write it, test it, verify it. Then add your amount column to the invoice table. Now how would you write an update statement to set that new amount column to the sum of the associated item rows? Again - write it, test it, verify it. At this point you have all the code you need to implement your trigger.
Since this process involves changes to the item table, you will need to write triggers to handle all three types of DML statements - insert, update, and delete. Write a trigger for each to simplify your learning and debugging. Triggers have access to special tables - go learn about them. And go learn about the false assumption that a trigger works with a single row - it doesn't. Triggers must be written to work correctly if 0 (yes, zero), 1, or many rows are affected.
In an insert statement, the inserted table will hold all the rows inserted by the statement that caused the trigger to execute. So you merely sum the values (using the appropriate grouping logic) and update the appropriate rows in the invoice table. Having written the update statement mentioned in the previous paragraphs, this should be a relatively simple change to that query. But since you can insert a new row for an old invoice, you must remember to add the summed amount to the value already stored in the invoice table. This should be enough direction for you to start.
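The set-based update described above can be sketched as follows. This is not a trigger: it is a Python and SQLite illustration in which a temporary table named `inserted` plays the role of the trigger's inserted table, and the schema and the `insert_items` helper are hypothetical. What it shows is the grouping logic: sum the new rows per invoice and add that sum to the stored amount, in a way that behaves correctly for zero, one, or many rows. Amounts are stored as integer cents to keep the arithmetic exact.

```python
import sqlite3

# Hypothetical schema; amounts are integer cents.
SCHEMA = """
CREATE TABLE invoice (
    id     INTEGER PRIMARY KEY,
    amount INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE invoice_item (
    id         INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoice(id),
    amount     INTEGER NOT NULL
);
"""

def insert_items(conn, rows):
    """Insert a batch of (invoice_id, amount) rows and update invoice totals."""
    with conn:  # commit on success, roll back on error
        # 'inserted' mimics the pseudo-table a T-SQL insert trigger would see.
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS inserted "
            "(invoice_id INTEGER, amount INTEGER)"
        )
        conn.execute("DELETE FROM inserted")
        conn.executemany("INSERT INTO inserted VALUES (?, ?)", rows)
        conn.execute(
            "INSERT INTO invoice_item (invoice_id, amount) "
            "SELECT invoice_id, amount FROM inserted"
        )
        # Add the per-invoice sum of the new rows to the stored total.
        conn.execute(
            """
            UPDATE invoice
            SET amount = amount + (
                SELECT SUM(i.amount) FROM inserted AS i
                WHERE i.invoice_id = invoice.id
            )
            WHERE id IN (SELECT invoice_id FROM inserted)
            """
        )
```

The `WHERE id IN (...)` clause is what makes the empty batch a no-op and keeps untouched invoices from being overwritten with NULL.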
And to answer your second question - the safest and easiest way is to calculate the value every time. I fear you are trying to solve a problem that you do not have and that you may never have. Generally speaking, no one cares about invoices that are of "significant" age. You might care about unpaid invoices for a period of time, but eventually you write these things off (especially if the amounts are not significant). Another relatively easy approach is to create an indexed view to calculate and materialize the total amount. But remember - nothing is free. An indexed view must be maintained and it will add extra processing for DML statements affecting the item table. Indexed views do have limitations - which are documented.
And one last comment. I would strongly hesitate to maintain a total amount at any level higher than invoice. Above that level one frequently wants to filter the results in many ways: date, location, type, customer, etc. At this level you are approaching data warehouse functionality, which is not appropriate for an OLTP system.

First of all, never use triggers for business logic. Triggers are tricky and easy to forget, which makes such an application hard to maintain.
For most cases you can easily populate your reporting data via Entity Framework or a SQL query. But if that requires lots of joins, consider using staging tables, because reporting requires denormalized data. To populate the staging tables you can use SQL jobs or another scheduling mechanism (Azure Scheduler, maybe). This way you won't need to work with lots of joins and your reports will populate faster.

Gap-less sequence where multiple transactions with multiple tables are involved

I have a requirement (by law) for gap-less numbers on different tables. The IDs can have holes in them but not the sequences.
This is something I have to either solve in the C# code or in the database (Postgres, MS SQL and Oracle).
This is my problem:
Start transaction 1
Start transaction 2
Insert row on table "Portfolio" in transaction 1
Get next number in sequence for column Portfolio_Sequence (1)
Insert row on table "Document" in transaction 1
Get next number in sequence for column Document_Sequence (1)
Insert row on table "Portfolio" in transaction 2
Get next number in sequence for column Portfolio_Sequence (2)
Insert row on table "Document" in transaction 2
Get next number in sequence for column Document_Sequence (2)
Problem occurred in transaction 1
Rollback transaction 1
Commit transaction 2
Problem: Gap in sequence for both Portfolio_Sequence and Document_Sequence.
Note that this is very simplified and there are many more tables included in each of the transactions.
How can I deal with this?
I have seen suggestions where you "lock" the sequence until the transaction is either committed or rolled back, but this would bring the system to a halt when this many tables and such long, complex transactions are involved.

As you have already seemed to conclude, gapless sequences simply do not scale. Either you run the risk of dropping values when a rollback occurs, or you have a serialization point that will prevent a multi-user, concurrent transaction system from scaling. You cannot have both.
My thought would be, what about a post processing action, where every day, you have a process that runs at close of business, checks for gaps, and renumbers anything that needs to be renumbered?
One final thought: I don't know your requirement, but, I know you said this is "required by law". Well, ask yourself, what did people do before there were computers? How would this "requirement" be met? Assuming you have a stack of blank forms that come preprinted with a "sequence" number in the upper right corner? And what happens if someone spilled coffee on that form? How was that handled? It seems you need a similar method to handle that in your system.
Hope that helps.

This problem is impossible to solve in principle because any transaction can roll back (bugs, timeouts, deadlocks, network errors, ...).
You will have a serial contention point. Try to reduce contention as much as possible: keep the transaction that is allocating numbers as small as possible. Also, allocate numbers as late as possible in the transaction, because contention only arises once you allocate a number. If you're doing 1000ms of uncontended work and then allocate a number (taking 10ms), you still have a degree of parallelism of about 100, which is enough.
So maybe you can insert all rows (of which you say there are many) with dummy sequence numbers, and only at the end of the transaction you quickly allocate all real sequence numbers and update the rows that are already written. This would work well if there are more inserts than updates, or the updates are quicker than the inserts (which they will be), or there is other processing or waiting interleaved between the inserts.
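The parallelism estimate in this answer can be written down as a small calculation. The sketch below assumes a simple model that is not stated in the answer itself: each transaction does its uncontended work freely and only the number-allocating part is serialized, so the number of transactions that can be in flight is bounded by total time divided by serialized time. The function name is hypothetical.

```python
def max_parallelism(uncontended_ms: float, contended_ms: float) -> float:
    """Upper bound on concurrent transactions when only the contended
    (number-allocating) part of each transaction is serialized."""
    if contended_ms <= 0:
        raise ValueError("contended_ms must be positive")
    return (uncontended_ms + contended_ms) / contended_ms

# The figures from the answer: 1000 ms of free work, 10 ms holding the lock.
bound = max_parallelism(1000, 10)
```

With those figures the bound comes out at roughly the 100 quoted above; shrinking the serialized part is what raises it.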

Gap-less sequences are hard to come by. I suggest using a plain serial column instead. Create a view with the window function row_number() to produce a gap-less sequence:
CREATE VIEW foo AS
SELECT *, row_number() OVER (ORDER BY serial_col) AS gapless_id
FROM tbl;
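To see the view in action, here is a small sketch that runs the same statement through Python's sqlite3 module (an assumption made for the sake of a runnable example; window functions need SQLite 3.25 or later). The `payload` column and the sample rows are made up, with serial values chosen to contain holes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl (serial_col INTEGER PRIMARY KEY, payload TEXT);
    -- serial values with holes, as left behind by rolled-back inserts
    INSERT INTO tbl VALUES (1, 'a'), (2, 'b'), (10, 'c'), (13, 'd');

    CREATE VIEW foo AS
    SELECT *, row_number() OVER (ORDER BY serial_col) AS gapless_id
    FROM tbl;
    """
)

rows = conn.execute(
    "SELECT serial_col, gapless_id FROM foo ORDER BY serial_col"
).fetchall()
for serial_col, gapless_id in rows:
    print(serial_col, gapless_id)
```

Note that the numbers are computed at query time, so a row's gapless_id can shift if a row with a lower serial value is deleted or is committed later.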

Here is an idea that should support both high performance and high concurrency:
Use a highly concurrent, cached Oracle sequence to generate a dumb unique identifier for the gap-less table row. Call this entity MASTER_TABLE.
Use the dumb unique identifier for all internal referential integrity from the MASTER_TABLE to other dependent detail tables.
Now your gap-less MASTER_TABLE sequence number can be implemented as an additional attribute on the MASTER_TABLE, populated by a process that is separate from the MASTER_TABLE row creation. In fact, the gap-less additional attribute should be maintained in a 4th normal form attribute table of the MASTER_TABLE, so that a single background thread can populate it at leisure, without concern for any row locks on the MASTER_TABLE.
All queries that need to display the gap-less sequence number on a screen, report, or elsewhere would join the MASTER_TABLE with the gap-less additional attribute 4th normal form table. Note that these joins will be satisfied only after the background thread has populated the gap-less additional attribute 4th normal form table.

Database design for handling individual and recurring charges

We have a billing system where we process individual charges as well as recurring charges (subscriptions).
There are two SQL tables:
StandardCharges
RecurringCharges
StandardCharges table holds individual items purchased by customers during the month.
RecurringCharges table holds recurring items with a charge by date. When the time comes our system automatically creates a recur request which adds a row to the StandardCharges table and increases the charge by date to next month in RecurringCharges table.
At the end of each month we get the total values for each customer from StandardCharges table and create an invoice.
Is there a kind of design pattern or another way of doing this? Is this the right database design? Ideally I would like to hold all charges in one Charges table and manage recurring charges from there as well?
Thanks

I suspect that your design is indeed correct.
When thinking about the data in real world terms it makes no sense to have "possible" transactions (I.E., transactions which have not yet happened and may not materialize, perhaps because the customer had overrun their credit limit) mixed in with committed and actual transactions.
Merging the data into a single table can also make reporting difficult as you have to apply special filtering criteria and store extra meta data - like TransactionCompleted and TransactionIsFutureCharge.
If I were to make a suggestion, it would be to rename StandardCharges to something closer to the data it holds, like CompletedTransactions, and RecurringCharges to something like PendingTransactions.

The current design seems reasonable to me. But if you want to merge the two tables, you could simply add a BIT column called IsRecurring or IsFuture or IsScheduled or whatever you want to use to designate the charges that would have otherwise gone in RecurringCharges. Then when your due date is hit for a recurring charge, you just insert into the same table instead of a different table. As for the invoice, you'd just add a condition to the query to filter out the charges that have the BIT column set.

How to provide Unique ID in Offline mode?

In a client-server accounting application, when a user saves an invoice in the invoice form, the client gets an invoice number like 90134 from the server and saves the invoice with that number. The invoice number is needed for the customer.
So in offline mode (for example, when the network has dropped), how can I provide a unique id?
Is it a good idea to use a string id following this pattern: client + incremental number?
I don't want to use GUIDs.

If you know in advance how many invoice numbers you will generate per client during an offline period, would you be able to pre-allocate invoice numbers? e.g. if each client is likely only to generate 4 invoices per offline period, you could allocate a block of 4 numbers to each client. This may involve an extra column in your DB to store a value indicating whether the number is an invoice already created, or a preallocation of a number. Depending on the structure and constraints within your DB, you may also need to store some dummy data to enforce referential integrity.
The downsides would be that your block of numbers may not get used sequentially, or indeed at all, so your invoice numbers would not be in chronological order. Also, you would run into problems if the pool of available numbers is used up.
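A rough in-memory sketch of the block pre-allocation idea, with hypothetical `NumberServer` and `OfflineClient` classes (a real system would persist both the server's high-water mark and each client's pool). It also makes the two downsides visible: numbers are handed out out of chronological order across clients, and a client fails once its pool runs dry.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NumberServer:
    """Hands out disjoint blocks of invoice numbers while clients are online."""
    next_free: int = 1

    def allocate_block(self, size: int) -> range:
        block = range(self.next_free, self.next_free + size)
        self.next_free += size
        return block

@dataclass
class OfflineClient:
    """Consumes numbers from its pre-allocated pool while offline."""
    pool: List[int] = field(default_factory=list)

    def refill(self, server: NumberServer, size: int) -> None:
        self.pool.extend(server.allocate_block(size))

    def next_invoice_number(self) -> int:
        if not self.pool:
            raise RuntimeError("pre-allocated invoice numbers are used up")
        return self.pool.pop(0)
```

For example, two clients that each reserve four numbers receive 1 to 4 and 5 to 8 respectively, so an invoice numbered 5 can easily be older than one numbered 2.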

You can use a Guid:
var myUniqueID = Guid.NewGuid();
The corresponding type in SQL Server is uniqueidentifier.
In general, a Guid is a 128-bit number.
You can read more about Guids here:
http://en.wikipedia.org/wiki/Globally_unique_identifier
http://msdn.microsoft.com/en-us/library/system.guid.aspx

I suppose the invoice number (integer) is incremental. In this case, since you have no way of knowing the last invoice number, you could save the invoice in a local db/cache/xml without the invoice number and wait for the network connection to insert the new records in the DB (the invoice number would be generated then).

You could start your numbers for each client at a different range... e.g.:
client 1: 1,000,000
client 2: 2,000,000
client 3: 3,000,000
Update them every now and then when there is a connection to avoid overlaps.
It's not 100% bulletproof, but at least it's better than nothing.
My favorite would still be a GUID for this, since collisions are practically impossible.

There is a workaround, but it is merely a "dirty hack". You should seriously reconsider accepting new data entries while offline, especially when dealing with unique IDs to be inserted in many tables.
Say you have an "orders" table and another "orderDetails" table in your local dataset:
1- add a tmpID of type integer in your "orders" table to temporarily identify each unique order.
2- use the tmpID of your newly created order in the rest of the process (say for adding products to the current order in the orderDetails table)
--> once you are connected to the server, in a single transaction do the following
1- insert the first order in the "orders" table
2- get its uniqueID generated on your SQL server
3- search for each line in "orderDetails" that has a tmpID of currentOrder.tmpID and insert it in the "orderDetails" table on your server
4- commit the transaction and continue to the following row.
Keep in mind that this is very bad coding and that it can get real dirty and hard to maintain.
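The sync steps above can be sketched as follows, using Python with SQLite as a stand-in for the server. The table layout, the dictionary shape of the local rows, and the `sync_orders` name are assumptions made for illustration. Each offline order is uploaded in its own transaction, the server-generated id is read back, and the detail rows are inserted under that id instead of the temporary one.

```python
import sqlite3

# Hypothetical server-side schema.
SERVER_SCHEMA = """
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    customer TEXT NOT NULL
);
CREATE TABLE order_details (
    id       INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product  TEXT NOT NULL
);
"""

def sync_orders(server, local_orders, local_details):
    """Upload offline orders; return a {tmp_id: server_id} mapping."""
    mapping = {}
    for order in local_orders:
        with server:  # one transaction per order: commit or roll back as a unit
            cur = server.execute(
                "INSERT INTO orders (customer) VALUES (?)", (order["customer"],)
            )
            server_id = cur.lastrowid  # id generated by the server
            server.executemany(
                "INSERT INTO order_details (order_id, product) VALUES (?, ?)",
                [
                    (server_id, detail["product"])
                    for detail in local_details
                    if detail["tmp_id"] == order["tmp_id"]
                ],
            )
        mapping[order["tmp_id"]] = server_id
    return mapping
```

Using negative temporary ids on the client is one way to make sure they can never be confused with real server ids.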

It looks impossible to create unique numbers on two different systems that are both offline when the numbers must be chronological and without gaps.
IMHO, if the last number on the server was 10, there is no way to know whether I should return 11 or 12; I would have to know whether 11 was already used by another person.
The only thing I can imagine is to use a temporary number and renumber later, but if the invoices are printed and the number cannot be changed, I don't know how you could make that work.

TSQL: Generate human readable ids

We have a large database of enquiries, each of which is referenced using a Guid. The Guid isn't very customer friendly, so we want to add an additional 5-digit "human id" (this is OK as we very likely won't have more than 99999 enquiries active at any time, and it's OK if a human id references multiple enquiries as they aren't used for anything important).
1) Is there any way to have an IDENTITY column reset to 1 after 99999?
My current workaround is to use an INT IDENTITY(1,1) NOT NULL column and, when presenting a HumanId, take HumanId % 100000.
2) Is there any way to automatically "randomly distribute" the ids over [0..99999] so that two enquiries created one after the other don't get adjacent ids? I guess I'm looking for a two-way one-to-one hash function?
... Ideally I'd like to do this in T-SQL, automatically creating these ids when an enquiry is created.

If performance and concurrency aren't too much of an issue, you can use triggers and the MAX() function to calculate a 'next human ID' value. You would probably want to keep your IDENTITY column as is, and have the 'human ID' in a separate column.
EDIT: On a side note, this sounds like a 'presentation layer' issue, which shouldn't be in your database. Your presentation layer of your application should have the code to worry about presenting a record in a human readable manner. Just a thought...

If you absolutely need to do this in the database, then why not derive your human-friendly value directly from the GUID column?
-- human_id doesn't have to be calculated when you retrieve the data
-- you could create a computed column on the table itself if you prefer
SELECT (CAST(your_guid_column AS BINARY(3)) % 100000) AS human_id
FROM your_table
This will give you a random-ish value between 0 and 99999, derived from the first 3 bytes of the GUID. If you want a larger, or smaller, range then adjust the divisor accordingly.
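For checking the arithmetic outside the database, the same derivation can be sketched in Python. This assumes that SQL Server's binary form of a uniqueidentifier matches the little-endian layout exposed by `uuid.UUID.bytes_le`, and that the 3-byte binary value is read as a big-endian integer before the modulo; treat both as assumptions to verify against your server. The `human_id` name is hypothetical.

```python
import uuid

def human_id(guid: uuid.UUID, modulus: int = 100_000) -> int:
    """Derive a short, random-ish id from the first three bytes of a GUID."""
    first_three = guid.bytes_le[:3]  # assumed to match CAST(... AS BINARY(3))
    return int.from_bytes(first_three, "big") % modulus

example = human_id(uuid.uuid4())  # some value in the range 0..99999
```

Different GUIDs can map to the same value, which the question explicitly accepts.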

I would strongly recommend relooking at your logic. Your approach has a few dangers, including:
It is always a bad idea to re-use IDs, even if the original record has become "obsolete" - do you lose anything by continuing to grow IDs beyond 99999? The problem here is more likely to be with long-term maintenance, especially if there is any danger of the system developing over time. Another thing to consider - is there any chance a user will take this reference number, and use it to reference your system at some stage in the future?
When manually assigning a generated / random ID, you will need to ensure that multiple records are not assigned the same ID. There are a few ways to do this (for example, using transactions), however you should ensure that the scope of the transactions is not going to leave you open to problems with concurrent transactions being blocked - this may cause a few problems, e.g. with performance. You may be best served by generating your ID externally (as SQL does not do random especially well), and then enforcing a unique constraint on your DB, perhaps in the way suggested by Firoz Ansari.
If you still want to reset the identity column, this can be done with the DBCC CHECKIDENT command.
An example of generating random seeds in SQL server can be found here:
http://weblogs.sqlteam.com/jeffs/archive/2004/11/22/2927.aspx

You can create composite primary key with two columns, say..BatchId and HumanId.
Records in these columns will look like this:
BatchId, HumanId
1, 1
1, 2
1, 3
.
.
1, 99998
1, 99999
2, 1
2, 2
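2, 3
Use MAX or ORDER BY ... DESC to get the next available HumanId, together with the BatchId:
DECLARE @NextHumanId INT;
-- Take the most recent HumanId in the latest batch and move one past it
SELECT TOP 1 @NextHumanId = HumanId + 1
FROM [THAT_TABLE]
ORDER BY BatchId DESC, HumanId DESC;
-- Start again at 1 (in a new batch) once the 5-digit range is exhausted
IF @NextHumanId > 99999 SET @NextHumanId = 1;
Hope this helps.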

You could have a table of available HUMANIDs, each time you add an enquiry you could randomly pull a HUMANID from the table (and DELETE it), and each time you delete the enquiry you could add it back (by INSERTing).
