How to provide Unique ID in Offline mode?


In a client-server accounting application, when a user saves an invoice in the invoice form, the invoice gets a number such as 90134 from the server and is saved with that number. The customer needs this invoice number.
So in offline mode (for example, when the network has dropped), how can I provide a unique ID?
Is it a good idea to use a string ID that follows the pattern client + incremental number?
I don't want to use GUIDs.

If you know in advance how many invoice numbers you will generate per client during an offline period, would you be able to pre-allocate invoice numbers? e.g. if each client is likely only to generate 4 invoices per offline period, you could allocate a block of 4 numbers to each client. This may involve an extra column in your DB to store a value indicating whether the number is an invoice already created, or a preallocation of a number. Depending on the structure and constraints within your DB, you may also need to store some dummy data to enforce referential integrity.
The downsides would be that your block of numbers may not get used sequentially, or indeed at all, so your invoice numbers would not be in chronological order. Also, you would run into problems if the pool of available numbers is used up.
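A minimal sketch of the pre-allocation idea, assuming the server has already reserved a contiguous block for this client while it was online. The class name InvoiceNumberBlock and its members are hypothetical, and the sketch is not thread-safe.

```csharp
using System;

// Hypothetical helper: a block of invoice numbers that the server reserved
// for this client before the connection was lost.
public sealed class InvoiceNumberBlock
{
    private readonly int _last;   // last number in the reserved block (inclusive)
    private int _next;            // next number to hand out

    public InvoiceNumberBlock(int first, int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
        _next = first;
        _last = checked(first + count - 1);
    }

    // Numbers still available for offline use.
    public int Remaining => _last - _next + 1;

    // Returns the next reserved number, or throws once the block is used up.
    public int Take()
    {
        if (_next > _last)
            throw new InvalidOperationException(
                "Pre-allocated block exhausted; reconnect to reserve more numbers.");
        return _next++;
    }
}
```

The exception on exhaustion makes the "pool is used up" downside explicit: the client has to stop issuing invoices (or fall back to another scheme) until it can reach the server again.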

You can use a Guid:
var myUniqueID = Guid.NewGuid();
The corresponding type in SQL Server is uniqueidentifier.
A Guid is a 128-bit number.
You can read more about Guids here:
http://en.wikipedia.org/wiki/Globally_unique_identifier
http://msdn.microsoft.com/en-us/library/system.guid.aspx

I suppose the invoice number (integer) is incremental. In that case, since you have no way of knowing the last invoice number while offline, you could save the invoice in a local DB/cache/XML file without an invoice number and wait for the network connection before inserting the new records into the DB (the invoice number would be generated then).

You could start your numbers for each client at a different range... e.g.:
client 1: 1,000,000
client 2: 2,000,000
client 3: 3,000,000
Update them every now and then when there is a connection to avoid overlaps.
It's not 100% bulletproof, but at least it's better than nothing.
My favorite would still be a GUID for this, since collisions are practically impossible.

There is a workaround, but it is merely a "dirty hack". You should seriously reconsider accepting new data entries while offline, especially when dealing with unique IDs that are inserted into many tables.
Say you have an "orders" table and an "orderDetails" table in your local dataset:
1- Add a tmpID column of type integer to your "orders" table to temporarily identify each order.
2- Use the tmpID of your newly created order in the rest of the process (say, for adding products to the current order in the orderDetails table).
--> Once you are connected to the server, do the following in a single transaction:
1- Insert the first order into the "orders" table.
2- Get its unique ID, generated on your SQL Server.
3- Find each local "orderDetails" line whose tmpID equals currentOrder.tmpID and insert it into the "orderDetails" table on your server, using the server-generated ID.
4- Commit the transaction and continue with the next order.
Keep in mind that this is very bad coding and that it can get really dirty and hard to maintain.
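The upload steps above can be sketched in memory as follows. All type and member names are hypothetical, and the insertOrder delegate merely stands in for "insert the order on the server and return its generated ID". No real database or transaction is involved here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical local (offline) rows, keyed by a temporary ID.
public record LocalOrder(int TmpId, string Customer);
public record LocalOrderDetail(int TmpOrderId, string Product, int Quantity);

// Hypothetical server-side detail row, keyed by the real order ID.
public record ServerOrderDetail(int OrderId, string Product, int Quantity);

public static class OfflineSync
{
    // In a real system each loop iteration would run inside its own
    // database transaction, committed before moving to the next order.
    public static List<ServerOrderDetail> Upload(
        IEnumerable<LocalOrder> orders,
        IEnumerable<LocalOrderDetail> details,
        Func<LocalOrder, int> insertOrder)
    {
        var detailsByTmpId = details.ToLookup(d => d.TmpOrderId);
        var uploaded = new List<ServerOrderDetail>();

        foreach (var order in orders)
        {
            // Steps 1-2: insert the order and obtain the server-generated ID.
            int serverId = insertOrder(order);

            // Step 3: re-key the matching detail rows with the real ID.
            foreach (var d in detailsByTmpId[order.TmpId])
                uploaded.Add(new ServerOrderDetail(serverId, d.Product, d.Quantity));
        }
        return uploaded;
    }
}
```

The temporary ID never reaches the server; it only links local rows until the real ID is known.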

It looks impossible to create unique numbers on two different systems that are both offline when the numbers must be chronological and have no gaps.
IMHO, if the last number on the server was 10, there is no way to know whether I should return 11 or 12; I would have to know whether 11 was already used by someone else.
The only thing I can imagine is using a temporary number and renumbering later, but if the invoices are printed and the number cannot be changed, I don't know how you could make that work.

Related

Generating Unique & Random codes in SQL Server and .NET

I have a requirement to generate a semi-random code in C#/ASP.NET that has to be unique in the SQL Server database.
These codes need to be generated in batches of up to 100 codes per run.
Given the requirements, I'm not sure how I can do this without generating a code and then checking the database to see if it exists, which seems like a horrible way of doing it.
Here are the requirements:
Maximum 10 characters long (alpha-numeric only)
Must not be case sensitive
User can specify an optional 3 character prefix for the code
Must not violate 2 column unique constraint in the database, i.e. must be a unique "code text" within the "category" (CONSTRAINT ucCodes UNIQUE (ColumnCodeText, ColumnCategoryId))
So, given the 10 character limit, GUIDs are not an option. Given the case insensitivity requirement, the mathematical probability of database collisions is fairly high, I think.
At the same time, there are enough possible combinations that a straight look-up table in the DB would be prohibitive, I believe.
Is there a reasonably performant way of generating codes with these requirements that doesn't involve saving them to the DB one code at a time and waiting for a unique key violation to see if it goes through?

You have two options here.
Generate a new ID and insert it. If the insert throws a duplicate-key exception, try again until you succeed, or bail out if you run out of IDs. Performance will suffer badly once most of the IDs are used up.
Pregenerate all the possible IDs and store them in a table. Whenever you need one, remove a row at a random index and use that as the ID. The database takes care of concurrency for you, so the ID is guaranteed to be unique. If the first three letters are given, you can simply add a WHERE clause to restrict the rows to that prefix.
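A rough sketch of the first option (generate, try to insert, retry on collision). The names CodeGenerator, NewCode and NewBatch are hypothetical, and the tryInsert delegate stands in for the database INSERT: it is assumed to return false when the unique constraint rejects the code.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class CodeGenerator
{
    // Upper-case letters and digits only, so codes are case-insensitive by construction.
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Builds one code of totalLength characters, starting with the optional prefix.
    public static string NewCode(string prefix = "", int totalLength = 10)
    {
        prefix = (prefix ?? "").ToUpperInvariant();
        if (prefix.Length > totalLength)
            throw new ArgumentException("Prefix is longer than the code.", nameof(prefix));

        var chars = new char[totalLength - prefix.Length];
        for (int i = 0; i < chars.Length; i++)
            chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];
        return prefix + new string(chars);
    }

    // Generates a batch of codes, retrying whenever tryInsert reports a collision.
    public static List<string> NewBatch(
        int count, string prefix, Func<string, bool> tryInsert, int maxAttempts = 1000)
    {
        var batch = new List<string>(count);
        for (int attempts = 0; batch.Count < count; attempts++)
        {
            if (attempts >= maxAttempts)
                throw new InvalidOperationException(
                    "Too many collisions; the code space may be nearly full.");

            string code = NewCode(prefix);
            if (tryInsert(code)) batch.Add(code);
        }
        return batch;
    }
}
```

The attempt limit is the "bail out" part of the option: without it, a nearly exhausted code space would loop for a very long time.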

How to implement Identity for group of records?

I have a table that contains a non-primary-key column, RequestID. When I do a bulk insert, all the records must have the same RequestID. But if I do another bulk insert, the newly inserted rows must have an incremented RequestID:
NewRequestID = PreviousRequestID + 1
The only solution I have found so far (and I don't like it, by the way) is to get the last record every time before inserting the new records.
Why don't I like this approach? Because the database is supposed to be relational, which means there is "no specific order". Besides, I don't have primary keys or dates to order by.
What is the best way to implement this?
(I've added the c# tag because I am using EF, in case there is an easy solution with EF.)

You could take a number of different approaches:
Are you guaranteed that your RequestID's are always incremented? If so, you could query table for largest RequestID and that should represent the "last one inserted."
You could track state somewhere in your application, but this is likely dangerous in scenarios where service fails/restarts (unless state is tracked externally).
Assuming you have control over the schema, if you don't want to update the particular table schema you are speaking of, you could create another table to track the last RequestID used, and retrieve it from there (which would protect you against service restarts/failures).
Those are a few that come to mind.
UPDATE:
Assuming RequestID isn't a particular type of identifier, you could use timestamp - which will always be incremented when you do a new batch, however, I'm not sure if you needed it to always be incremented by exactly '1' which would preclude this approach.

SQL Numeric Column forbid Gaps

I'm building an invoice table in my database. I need an integer column to store the invoice number. This column must not allow gaps between numbers. Identity doesn't work because a rollback can produce a gap.
So what I want is:
InvoiceId (primary key, identity) InvoiceNumber (unique, NOT NULL)
1 1
2 2
10 3
13 4
Is there a special way to do this in SQL?
If there is no solution in SQL, how should I do it in C# + Entity Framework?
EDIT 1:
Additional information: a row will never be deleted.
EDIT 2:
Why I need a gapless column: it's a legal requirement (French law). Invoice numbers have to be gapless to show that you didn't remove any invoice.

There is no way to protect from gaps if you access your database in parallel.
Simple case:
Process A) creates an invoice. (#1)
Process B) creates an invoice. (#2)
Process A) rolls back its transaction.
Gap.
You could either lock your whole table for the whole transaction. That means that only one of your processes can create an invoice at the same time. That might be acceptable for small companies where one person creates all invoices by hand.
Or you could leave the column empty and once per night you lock the whole table and write those numbers that are not set yet. That means you get the invoice number later in your invoice process, but it's without gaps.
Or you could read the requirements again. Germany had something equally stupid, but it was only meant to have those numbers on the form for the tax department. So you could use your normal invoice numbers with gaps and when sending them to this bureaucratic monstrosity, you would generate a unique, gap free number on export only.
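The "leave the column empty and fill it in later" variant can be sketched in memory like this. The Invoice class and AssignMissingNumbers method are hypothetical; the sketch assumes nothing else writes to the data while it runs, which is what the table lock provides in the database version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical invoice row: the key may have gaps, the number may not.
public sealed class Invoice
{
    public int InvoiceId { get; init; }        // identity-style key, gaps allowed
    public DateTime CreatedAt { get; init; }
    public int? InvoiceNumber { get; set; }    // null until the numbering pass runs
}

public static class GaplessNumbering
{
    // Numbers every invoice that has no number yet, continuing from the highest
    // number already assigned. Assumes exclusive access for the duration of the call.
    public static void AssignMissingNumbers(IList<Invoice> invoices)
    {
        int next = invoices
            .Select(i => i.InvoiceNumber ?? 0)
            .DefaultIfEmpty(0)
            .Max() + 1;

        var unnumbered = invoices
            .Where(i => i.InvoiceNumber is null)
            .OrderBy(i => i.CreatedAt)
            .ThenBy(i => i.InvoiceId)
            .ToList();

        foreach (var invoice in unnumbered)
            invoice.InvoiceNumber = next++;
    }
}
```

Because numbers are only handed to rows that are already committed, a rolled-back invoice never consumes one, so no gap can appear.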

Because there are multiple users, you can't recalculate the value on the client side. Just create triggers in your database for insert/delete that recalculate the InvoiceNumber for the entire table.

Bulk insert strategy from c# to SQL Server

In our current project, customers will send collections of complex/nested messages to our system. These messages arrive at a rate of approximately 1000-2000 messages per second.
These complex objects contain the transaction data (to be added) as well as master data (which will be added if not found). But instead of passing the IDs of the master data, the customer passes the 'name' column.
The system checks whether master data exists for these names. If found, it uses the IDs from the database; otherwise it creates the master data first and then uses the new IDs.
Once the master data IDs are resolved, the system inserts the transactional data into a SQL Server database (using the master data IDs). There are around 15-20 master entities per message.
The following are some strategies we can adopt.
1) We can resolve the master IDs first from our C# code (inserting master data if not found) and store these IDs in a C# cache. Once all IDs are resolved, we can bulk insert the transactional data using the SqlBulkCopy class. We would hit the database 15 times to fetch the IDs for the different entities and then hit the database one more time to insert the final data. We can use the same connection and close it after all this processing.
2) We can send all these messages, containing master data and transactional data, to the database in a single hit (in the form of multiple TVPs) and then, inside a stored procedure, create the missing master data first and then insert the transactional data.
Could anyone suggest the best approach in this use case?
Due to some privacy issue, I cannot share the actual object structure. But here is the hypothetical object structure which is very close to our business object.
One such message will contain information about one product (its master data) and its price details (transaction data) from different vendors:
Master data (which need to be added if not found)
Product name: ABC, ProductCategory: XYZ, Manufacturer: XXX and some other details (the number of properties is in the range of 15-20).
Transaction data (which will always be added)
Vendor Name: A, ListPrice: XXX, Discount: XXX
Vendor Name: B, ListPrice: XXX, Discount: XXX
Vendor Name: C, ListPrice: XXX, Discount: XXX
Vendor Name: D, ListPrice: XXX, Discount: XXX
Most of the master data will remain the same across messages belonging to one product (and will change less frequently), but the transaction data will always fluctuate. So the system will check whether the product 'XXX' exists in the system. If not, it checks whether the 'Category' mentioned with this product exists. If not, it will insert a new record for the category and then for the product. The same will be done for Manufacturer and other master data.
Multiple vendors will be sending data about multiple products (2000-5000) at the same time.
So, assume that we have 1000 vendors, each sending data about 10-15 different products. Every 2-3 seconds, each vendor sends us price updates for these products. A vendor may start sending data about new products, but that will not be very frequent.

You would likely be best off with your #2 idea (i.e. sending all of the 15 - 20 entities to the DB in one shot using multiple TVPs and processing as a whole set of up to 2000 messages).
Caching master data lookups at the app layer and translating prior to sending to the DB sounds great, but misses something:
You are going to have to hit the DB to get the initial list anyway
You are going to have to hit the DB to insert new entries anyway
Looking up values in a dictionary to replace with IDs is exactly what a database does (assume a Non-Clustered Index on each of these name-to-ID lookups)
Frequently queried values will have their datapages cached in the buffer pool (which is a memory cache)
Why duplicate at the app layer what is already provided and happening right now at the DB layer, especially given:
The 15 - 20 entities can have up to 20k records (which is a relatively small number, especially when considering that the Non-Clustered Index only needs to be two fields: Name and ID which can pack many rows into a single data page when using a 100% Fill Factor).
Not all 20k entries are "active" or "current", so you don't need to worry about caching all of them. So whatever values are current will be easily identified as the ones being queried, and those data pages (which may include some inactive entries, but no big deal there) will be the ones to get cached in the Buffer Pool.
Hence, you don't need to worry about aging out old entries OR forcing any key expirations or reloads due to possibly changing values (i.e. updated Name for a particular ID) as that is handled naturally.
Yes, in-memory caching is wonderful technology and greatly speeds up websites, but those scenarios / use-cases are for when non-database processes are requesting the same data over and over in pure read-only purposes. But this particular scenario is one in which data is being merged and the list of lookup values can be changing frequently (moreso due to new entries than due to updated entries).
That all being said, Option #2 is the way to go. I have done this technique several times with much success, though not with 15 TVPs. It might be that some optimizations / adjustments need to be made to the method to tune this particular situation, but what I have found to work well is:
Accept the data via TVP. I prefer this over SqlBulkCopy because:
it makes for an easily self-contained Stored Procedure
it fits very nicely into the app code to fully stream the collection(s) to the DB without needing to copy the collection(s) to a DataTable first, which is duplicating the collection, which is wasting CPU and memory. This requires that you create a method per each collection that returns IEnumerable<SqlDataRecord>, accepts the collection as input, and uses yield return; to send each record in the for or foreach loop.
TVPs are not great for statistics and hence not great for JOINing to (though this can be mitigated by using a TOP (@RecordCount) in the queries), but you don't need to worry about that anyway since they are only used to populate the real tables with any missing values
Step 1: Insert missing Names for each entity. Remember that there should be a NonClustered Index on the [Name] field for each entity, and assuming that the ID is the Clustered Index, that value will naturally be a part of the index, hence [Name] only will provide a covering index in addition to helping the following operation. And also remember that any prior executions for this client (i.e. roughly the same entity values) will cause the data pages for these indexes to remain cached in the Buffer Pool (i.e. memory).
;WITH cte AS
(
  SELECT DISTINCT tmp.[Name]
  FROM @EntityNumeroUno tmp
)
INSERT INTO EntityNumeroUno ([Name])
  SELECT cte.[Name]
  FROM cte
  WHERE NOT EXISTS(
    SELECT *
    FROM EntityNumeroUno tab
    WHERE tab.[Name] = cte.[Name]
  )
Step 2: INSERT all of the "messages" in simple INSERT...SELECT where the data pages for the lookup tables (i.e. the "entities") are already cached in the Buffer Pool due to Step 1
Finally, keep in mind that conjecture / assumptions / educated guesses are no substitute for testing. You need to try a few methods to see what works best for your particular situation since there might be additional details that have not been shared that could influence what is considered "ideal" here.
I will say that if the Messages are insert-only, then Vlad's idea might be faster. The method I am describing here I have used in situations that were more complex and required full syncing (updates and deletes) and did additional validations and creation of related operational data (not lookup values). Using SqlBulkCopy might be faster on straight inserts (though for only 2000 records I doubt there is much difference if any at all), but this assumes you are loading directly to the destination tables (messages and lookups) and not into intermediary / staging tables (and I believe Vlad's idea is to SqlBulkCopy directly to the destination tables). However, as stated above, using an external cache (i.e. not the Buffer Pool) is also more error prone due to the issue of updating lookup values. It could take more code than it's worth to account for invalidating an external cache, especially if using an external cache is only marginally faster. That additional risk / maintenance needs to be factored into which method is overall better for your needs.
UPDATE
Based on info provided in comments, we now know:
There are multiple Vendors
There are multiple Products offered by each Vendor
Products are not unique to a Vendor; Products are sold by 1 or more Vendors
Product properties are singular
Pricing info has properties that can have multiple records
Pricing info is INSERT-only (i.e. point-in-time history)
Unique Product is determined by SKU (or similar field)
Once created, a Product coming through with an existing SKU but different properties otherwise (e.g. category, manufacturer, etc) will be considered the same Product; the differences will be ignored
With all of this in mind, I will still recommend TVPs, but to re-think the approach and make it Vendor-centric, not Product-centric. The assumption here is that Vendors send files whenever. So when you get a file, import it. The only lookup you would be doing ahead of time is the Vendor. Here is the basic layout:
Seems reasonable to assume that you already have a VendorID at this point because why would the system be importing a file from an unknown source?
You can import in batches
Create a SendRows method that:
accepts a FileStream or something that allows for advancing through a file
accepts something like int BatchSize
returns IEnumerable<SqlDataRecord>
creates a SqlDataRecord to match the TVP structure
loops through the FileStream until either BatchSize has been met or there are no more records in the file
perform any necessary validations on the data
map the data to the SqlDataRecord
call yield return;
Open the file
While there is data in the file
call the stored proc
pass in VendorID
pass in SendRows(FileStream, BatchSize) for the TVP
Close the file
Experiment with:
opening the SqlConnection before the loop around the FileStream and closing it after the loops are done
Opening the SqlConnection, executing the stored procedure, and closing the SqlConnection inside of the FileStream loop
Experiment with various BatchSize values. Start at 100, then 200, 500, etc.
The stored proc will handle inserting new Products
Using this type of structure you will be sending in Product properties that are not used (i.e. only the SKU is used for the look up of existing Products). BUT, it scales very well as there is no upper-bound regarding file size. If the Vendor sends 50 Products, fine. If they send 50k Products, fine. If they send 4 million Products (which is the system I worked on and it did handle updating Product info that was different for any of its properties!), then fine. No increase in memory at the app layer or DB layer to handle even 10 million Products. The time the import takes should increase in step with the amount of Products sent.
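A simplified sketch of the streaming method described above, reading from an in-memory collection instead of a file and leaving out batching and validation. The VendorPrice record, the column names and the column sizes are assumptions; they would have to match the actual user-defined table type on the server. It uses SqlDataRecord and SqlMetaData from the Microsoft.Data.SqlClient NuGet package.

```csharp
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient.Server;   // from the Microsoft.Data.SqlClient NuGet package

// Hypothetical shape of one incoming price row.
public record VendorPrice(string VendorName, decimal ListPrice, decimal Discount);

public static class TvpStreaming
{
    // Assumed column layout; it must match the table type defined in the database.
    private static readonly SqlMetaData[] Columns =
    {
        new SqlMetaData("VendorName", SqlDbType.NVarChar, 100),
        new SqlMetaData("ListPrice", SqlDbType.Decimal, 18, 4),
        new SqlMetaData("Discount", SqlDbType.Decimal, 18, 4),
    };

    // Streams rows one at a time; nothing is copied into a DataTable first.
    // The same record instance is reused, so each row must be consumed
    // before the enumerator advances to the next one.
    public static IEnumerable<SqlDataRecord> SendRows(IEnumerable<VendorPrice> prices)
    {
        var record = new SqlDataRecord(Columns);
        foreach (var p in prices)
        {
            record.SetString(0, p.VendorName);
            record.SetDecimal(1, p.ListPrice);
            record.SetDecimal(2, p.Discount);
            yield return record;
        }
    }
}
```

The returned sequence would be assigned as the value of a SqlParameter whose SqlDbType is Structured and whose TypeName names the table type; the rows are then pulled from the iterator while the command executes.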
UPDATE 2
New details related to Source data:
comes from Azure EventHub
comes in the form of C# objects (no files)
Product details come in through O.P.'s system's APIs
is collected in single queue (just pull data out insert into database)
If the data source is C# objects then I would most definitely use TVPs as you can send them over as is via the method I described in my first update (i.e. a method that returns IEnumerable<SqlDataRecord>). Send one or more TVPs for the Price/Offer per Vendor details but regular input params for the singular Property attributes. For example:
CREATE PROCEDURE dbo.ImportProduct
(
  @SKU VARCHAR(50),
  @ProductName NVARCHAR(100),
  @Manufacturer NVARCHAR(100),
  @Category NVARCHAR(300),
  @VendorPrices dbo.VendorPrices READONLY,
  @DiscountCoupons dbo.DiscountCoupons READONLY
)
AS
SET NOCOUNT ON;
-- Insert Product if it doesn't already exist
IF (NOT EXISTS(
      SELECT *
      FROM dbo.Products pr
      WHERE pr.SKU = @SKU
    )
)
BEGIN
  INSERT INTO dbo.Products (SKU, ProductName, Manufacturer, Category, ...)
  VALUES (@SKU, @ProductName, @Manufacturer, @Category, ...);
END;
...INSERT data from TVPs
-- might need OPTION (RECOMPILE) per each TVP query to ensure proper estimated rows

From a DB point of view, nothing is as fast as BULK INSERT (from CSV files, for example). The best approach is to bulk-load all the data as soon as possible and then process it with stored procedures.
A C# layer will just slow the process down, since round trips between C# and SQL Server can be orders of magnitude slower than what SQL Server can handle directly.

TSQL: Generate human readable ids

We have a large database of enquiries, each referenced by a Guid. The Guid isn't very customer friendly, so we want to add an additional 5-digit "human id" (this is OK because we very likely won't have more than 99999 enquiries active at any time, and it's fine if a human id references multiple enquiries, as they aren't used for anything important).
1) Is there any way to have an IDENTITY column reset to 1 after 99999?
My current workaround is to use an INT IDENTITY(1,1) NOT NULL column and, when presenting a HumanId, take HumanId % 100000.
2) Is there any way to automatically "randomly distribute" the ids over [0..99999] so that two enquiries created one after the other don't get adjacent ids? I guess I'm looking for a two-way one-to-one hash function?
... Ideally I'd like to do this in T-SQL, creating these ids automatically when an enquiry is created.

If performance and concurrency isn't too much of an issue, you can use triggers and the MAX() function to calculate a 'next human ID' value. You probably would want to keep your IDENTITY column as is, and have the 'human ID' in a separate column.
EDIT: On a side note, this sounds like a 'presentation layer' issue, which shouldn't be in your database. Your presentation layer of your application should have the code to worry about presenting a record in a human readable manner. Just a thought...

If you absolutely need to do this in the database, then why not derive your human-friendly value directly from the GUID column?
-- human_id doesn't have to be calculated when you retrieve the data
-- you could create a computed column on the table itself if you prefer
SELECT (CAST(your_guid_column AS BINARY(3)) % 100000) AS human_id
FROM your_table
This will give you a random-ish value between 0 and 99999, derived from the first 3 bytes of the GUID. If you want a larger, or smaller, range then adjust the divisor accordingly.
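If the value were derived in application code instead, the same idea could look roughly like this in C#. The class and method names are hypothetical. The sketch is meant to mirror the T-SQL expression, but whether it yields exactly the same number as SQL Server for a given GUID is an assumption that has not been verified here.

```csharp
using System;

public static class HumanId
{
    // Folds the first three bytes of the GUID's byte array into an integer
    // (0..16,777,215) and reduces it to at most five digits.
    public static int FromGuid(Guid id)
    {
        byte[] b = id.ToByteArray();
        int value = (b[0] << 16) | (b[1] << 8) | b[2];
        return value % 100000;
    }
}
```

As with the SQL version, different GUIDs can map to the same human id, which the question states is acceptable.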

I would strongly recommend relooking at your logic. Your approach has a few dangers, including:
It is always a bad idea to re-use ID's, even if the original record has become "obsolete" - do you lose anything by continuing to grow ID's beyond 99999? The problem here is more likely to be with long term maintenance, especially if there is any danger of the system developing over time. Another thing to consider - is there any chance a user will take this reference number, and use it to reference your system at some stage in the future?
When manually assigning a generated / random ID, you will need to ensure that multiple records are not assigned the same ID. There are a few ways to do this (for example, using transactions); however, you should ensure that the scope of the transactions does not leave you open to problems with concurrent transactions being blocked, which may cause performance problems, for example. You may be best served by generating your ID externally (as SQL does not do random especially well) and then enforcing a unique constraint on your DB, perhaps in the way suggested by Firoz Ansari.
If you still want to reset the identity column, this can be done with the DBCC CHECKIDENT command.
An example of generating random seeds in SQL server can be found here:
http://weblogs.sqlteam.com/jeffs/archive/2004/11/22/2927.aspx

You can create composite primary key with two columns, say..BatchId and HumanId.
Records in these columns will look like this:
BatchId, HumanId
1, 1
1, 2
1, 3
.
.
1, 99998
1, 99999
2, 1
2, 2
2, 3
Use MAX or ORDER BY ... DESC to get the last HumanId used in the latest BatchId, then add 1:
DECLARE @NextHumanId INT
SELECT TOP 1 @NextHumanId = HumanId + 1
FROM [THAT_TABLE]
ORDER BY BatchId DESC, HumanId DESC
-- wrap around after 99999 (the new row then belongs to the next BatchId)
IF @NextHumanId > 99999 SET @NextHumanId = 1
Hope this helps.

You could have a table of available HUMANIDs, each time you add an enquiry you could randomly pull a HUMANID from the table (and DELETE it), and each time you delete the enquiry you could add it back (by INSERTing).
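An in-memory sketch of the same pool idea, with a list standing in for the table of available IDs. The HumanIdPool class is hypothetical and not thread-safe; in the database version, the DELETE and INSERT statements and the engine's locking would play these roles.

```csharp
using System;
using System.Collections.Generic;

public sealed class HumanIdPool
{
    private readonly List<int> _available = new();
    private readonly Random _random;

    // Fills the pool with 1..maxId; an optional seed makes the draws repeatable.
    public HumanIdPool(int maxId, int? seed = null)
    {
        for (int id = 1; id <= maxId; id++) _available.Add(id);
        _random = seed.HasValue ? new Random(seed.Value) : new Random();
    }

    public int Count => _available.Count;

    // Removes and returns a random ID (the DELETE in the table-based version).
    public int Take()
    {
        if (_available.Count == 0)
            throw new InvalidOperationException("No human IDs left in the pool.");

        int index = _random.Next(_available.Count);
        int id = _available[index];

        // Swap-remove: overwrite with the last element instead of shifting the list.
        _available[index] = _available[^1];
        _available.RemoveAt(_available.Count - 1);
        return id;
    }

    // Puts an ID back when its enquiry is deleted (the INSERT).
    public void Return(int id) => _available.Add(id);
}
```

Because an ID is only in the pool while unused, two active enquiries can never share one, at the cost of failing outright when all IDs are taken.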
