I have a C# Web API hosted in IIS with a POST method that takes a list of document IDs to insert into a Lotus Notes database.
The POST method can be called multiple times, and I want to prevent the insertion of duplicate documents.
This is the code (in a static class) that is called from the POST:
lock (thisLock)
{
    var id = "some unique id";
    NotesDocument doc = vw.GetDocumentByKey(id, false);
    if (doc == null)
    {
        NotesDocument docNew = db.CreateDocument();
        // some more processing
        docNew.Save(true, false, false);
    }
}
Even with the lock in place, I am running into scenarios where duplicate documents are inserted. Is it because a request can be executed on a new process? What is the best way to prevent this from happening?
Your problem is that GetDocumentByKey depends on the view index being up to date, and on a busy server there is no guarantee that it is. You can try to call vw.Update, but unfortunately this does not trigger a rebuild of the view index, so it might have no effect: it only updates the vw object to reflect what has changed in the backend, and if the backend index has not been updated, it does nothing.
You could use db.Search("IdField = \"" + id + "\"", null, 0) instead, as a search does not rely on the index being rebuilt. This will be slightly slower, but should be far more accurate.
Alternatively, you might want to store the inserted IDs in some singleton object, or even simply a static list, and lock on that list: whoever obtains the lock verifies that the IDs it wants to insert are not present, and only then adds them to the list.
You only need to keep the entries for a short time, long enough that two concurrent POSTs with the same content cannot both insert, and that the view index has a chance to catch up. So store a timestamp along with each ID, so you can clean out older records if the list grows long.
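To illustrate, a minimal sketch of that timestamped static list (the five-minute window and all the names here are my assumptions, not part of the original answer):

using System;
using System.Collections.Generic;
using System.Linq;

public static class RecentInserts
{
    private static readonly object Lock = new object();
    private static readonly Dictionary<string, DateTime> Seen = new Dictionary<string, DateTime>();
    private static readonly TimeSpan Ttl = TimeSpan.FromMinutes(5); // assumed window

    // Returns true if this call won the right to insert the id.
    public static bool TryClaim(string id)
    {
        lock (Lock)
        {
            // Purge entries older than the TTL so the list stays small.
            var cutoff = DateTime.UtcNow - Ttl;
            foreach (var stale in Seen.Where(kv => kv.Value < cutoff).Select(kv => kv.Key).ToList())
                Seen.Remove(stale);

            if (Seen.ContainsKey(id))
                return false; // another request already claimed this id

            Seen[id] = DateTime.UtcNow;
            return true;
        }
    }
}

Bear in mind that a static list only helps within a single worker process; if IIS runs the application as a web garden with multiple worker processes, the duplicate check still has to happen in the database (for example via db.Search as above).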
I am a newbie to Cassandra, and my current project calls for me to create a table with the following columns:
id uuid PRIMARY KEY,
connections list<text>,
username text
I am using Cassandra's IMapper interface to handle my CRUD operations. I found documentation that describes how to use the Mapper component for basic operations here:
http://docs.datastax.com/en/developer/csharp-driver/2.5/csharp-driver/reference/mapperComponent.html
but I could not find documentation that outlines how to add and remove items in the list column of a specific record using the Mapper component. I tried retrieving the record from the database, updating the entity, and saving the changes, but the record is not updated in the database; it remains the same after the update. However, the insert operation works, and the stored row mirrors the entity down to the items in the list.
User user = await _mapper.SingleAsync<User>("where username = '" + name + "'");
user.Connections = user.Connections.Concat(new string[] { connection });
await _mapper.UpdateAsync<User>(user);
How should this scenario be handled in Cassandra?
You can use the plus (+) and minus (-) CQL operators to append / prepend or remove items from a list.
In your case it would be:
// When using parameters, use query markers (?) instead of
// hardcoded stringified values.
User user = await _mapper.SingleAsync<User>("where id = ?", id);
await _mapper.UpdateAsync<User>(
    "SET connections = connections + ? WHERE id = ?", newConnections, id);
Note that append and prepend operations are not idempotent by nature, so if one of these operations times out, retrying it is not safe: it may (or may not) end up appending/prepending the value twice.
I think that in order for this to work and be efficient, you may need a couple of things:
1) A partial update. It is atomic and doesn't require you to fetch the record first; specifying only the fields you want to update also avoids sending unnecessary payload over the wire and relieves pressure on the compactor.
2) CqlOperator.Append / CqlOperator.Prepend / CqlOperator.SubstractAssign, which let you specify only the collection items you want to add or remove.
Both of these optimizations are available via the Table API (sketched below); I'm not sure about the Mapper.
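For illustration, a minimal sketch using the LINQ Table component (User, session, and newConnections are my assumptions based on the schema above):

using Cassandra.Data.Linq;

var table = new Table<User>(session);

// Appends newConnections to the existing list server-side, without reading
// the row first; generates "UPDATE ... SET connections = connections + ?".
table.Select(u => new User { Connections = CqlOperator.Append(newConnections) })
     .Where(u => u.Id == id)
     .Update()
     .Execute();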
I'm stuck on a problem and am wondering if I have just coded something incorrectly. The application polls every few seconds and grabs every record from a table whose sole purpose is to signal which records to act upon.
Please note that I've left out the error-handling code for space and readability.
// Producing thread; this is triggered every 5 seconds... UGH, I hate timers
foreach (var Record in GetRecordsFromDataBase()) // returns a dictionary
{
    if (!ConcurrentDictionary.ContainsKey(Record.Key))
        ConcurrentDictionary.TryAdd(Record.Key, Record.Value);
}
This code works great, apart from the irritating fact that it may (and will) select the same record multiple times until said record(s) are processed. By processed, I mean that each selected record is written into its own newly created, uniquely named file; then a stored procedure is called with that record's key to remove it from the database, at which point the key is also removed from the ConcurrentDictionary.
// Consuming thread, located within another loop to allow
// the below code to continue to cycle until instructed
// to terminate
while (!ConcurrentDictionary.IsEmpty)
{
    var Record = ConcurrentDictionary.Take(1).First();
    WriteToNewFile(Record.Value);
    RemoveFromDatabase(Record.Key);
    ConcurrentDictionary.TryRemove(Record.Key, out _);
}
For a throughput test, I added 20k+ records to the table and then turned the application loose. I was quite surprised when I noticed 22k+ files, a count that continued to climb well into 100k+ territory.
What am I doing wrong??? Have I completely misunderstood what the concurrent dictionary is used for? Did I forget a semi-colon somewhere?
First, eliminate the call to ContainsKey. TryAdd already checks for duplicates and returns false if the item is already present.
foreach (var Record in GetRecordsFromDataBase()) // returns a dictionary
{
    ConcurrentDictionary.TryAdd(Record.Key, Record.Value);
}
The next problem I see is that ConcurrentDictionary.Take(1).First() is not a good way to get an item from the dictionary, since it isn't atomic. I think you want to use a BlockingCollection<T> instead; it is specifically designed for implementing a producer-consumer pattern (see the sketch below).
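For illustration, a minimal producer/consumer sketch with BlockingCollection<T> (the key/value types here are placeholders for your record type):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

class ProducerConsumerSketch
{
    // By default BlockingCollection<T> wraps a ConcurrentQueue<T>.
    static readonly BlockingCollection<KeyValuePair<int, string>> Queue =
        new BlockingCollection<KeyValuePair<int, string>>();

    static void Produce(KeyValuePair<int, string> record)
    {
        Queue.Add(record); // thread-safe enqueue
    }

    static void Consume()
    {
        // GetConsumingEnumerable() blocks until an item is available and
        // removes each item atomically - no Take(1).First() race.
        foreach (var record in Queue.GetConsumingEnumerable())
        {
            Console.WriteLine("Processing " + record.Key);
        }
    }
}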
Lastly, I think your problem doesn't really have to do with the dictionary, but with the database. The dictionary itself is thread-safe, but your dictionary is not atomic with the database. Suppose record A is in the database: GetRecordsFromDataBase() pulls it and adds it to the dictionary, and processing of record A begins (I assume in another thread). Then the first loop calls GetRecordsFromDataBase() again and gets record A a second time. Simultaneously, record A is processed and removed from the database. But it's too late! GetRecordsFromDataBase() already grabbed it, so the initial loop adds it to the dictionary again, after it has been removed.
I think you may need to move records that are about to be processed into another table entirely, so that they won't get picked up a second time. Doing this at the C# level, rather than the database level, is going to be a problem. Either that, or you should not be adding records to the queue while records are being processed.
What am I doing wrong???
The foreach (add) loop adds to the dictionary any record from the database that is not already there.
The while (remove) loop removes items from the database and then from the dictionary, writing them to files along the way.
This logic looks correct, but there is a race:
GetRecordsFromDataBase(); // returns records 1 through 10
// context switch to the remove loop
WriteToNewFile(Record.Value); // write record 5
RemoveFromDatabase(Record.Key); // remove record 5 from the db
ConcurrentDictionary.TryRemove(Record.Key, out _); // remove record 5 from the dictionary
// context switch back to the add loop
ConcurrentDictionary.TryAdd(Record.Key, Record.Value); // adds record 5 even though it is no longer in the DB, because it was part of the set already returned by GetRecordsFromDataBase()
After the item is removed the foreach loop adds it again. This is why your file count is multiplying.
foreach (var Record in GetRecordsFromDataBase()) // returns a dictionary
{
    if (!ConcurrentDictionary.ContainsKey(Record.Key)) // this check is not required; TryAdd alone will do
        ConcurrentDictionary.TryAdd(Record.Key, Record.Value);
}
Try something like this:
The add loop:
foreach (var record in GetRecordsFromDataBase()) // returns a dictionary
{
    if (ConcurrentDictionary.TryAdd(record.Key, false)) // only adds the record if it has not been seen yet
    {
        ConcurrentQueue.Enqueue(record); // enqueue the record for the consumer
    }
}
The remove loop:
KeyValuePair<int, string> record; // substitute your actual key/value types
if (ConcurrentQueue.TryDequeue(out record))
{
    if (ConcurrentDictionary.TryUpdate(record.Key, true, false)) // flip the flag from false to true
    {
        WriteToNewFile(record.Value); // write the record to its file
        RemoveFromDatabase(record.Key); // remove it from the db
    }
}
This leaves an entry in the dictionary for each processed record. You can remove them from the dictionary eventually, but multithreading that involves a database can be tricky.
I have a simple client registration system that runs over a network. The system is supposed to generate a unique three-digit ID (the primary key) with the current year concatenated (e.g. 001-2013). However, I've encountered a problem: the same primary key gets generated when two users on different computers (over a LAN) try to register different clients at the same time.
Also, what if the user cancels the registration after an ID has already been generated? I have to reuse that ID for another client. I've read about static variables, but that didn't solve my problem. I'd really appreciate your ideas.
Unique and sequential IDs are hard to implement. To achieve it completely you would have to serialize the commit that creates the client record, so that the ID is generated only when the data is actually stored; otherwise you'll end up with holes whenever something goes wrong during submission.
If you don't need strictly sequential numbers, giving out ranges of IDs (1-22, 23-44, ...) to each system is a common approach (see the sketch below). Instead of ranges you can give out lists of IDs to use ({1, 3, 233, 234}, {235, 236, 237}) if you need to use as many IDs as possible.
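A rough sketch of the range idea (often called hi/lo); the block size and the server-side reservation are my assumptions:

using System;

class IdRangeAllocator
{
    private readonly object _lock = new object();
    private int _next;
    private int _end;
    private const int BlockSize = 100; // assumed block size

    public int NextId()
    {
        lock (_lock)
        {
            if (_next >= _end)
            {
                // Reserve a fresh block from the central database, e.g. by
                // atomically incrementing a counter row by BlockSize.
                _next = ReserveBlockFromServer(BlockSize);
                _end = _next + BlockSize;
            }
            return _next++;
        }
    }

    private int ReserveBlockFromServer(int size)
    {
        // Placeholder: implement as a single atomic UPDATE against a
        // shared counter table so two machines never get the same block.
        throw new NotImplementedException();
    }
}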
The issue:
1. New item -001 is created, but not saved yet.
2. New item -002 is created, but not saved yet.
3. Item -001 is cancelled.
What to do with ID -001?
The easiest solution is to simply not assign an ID until an item is definitely stored.
An alternative: when finally saving an item, look up the first free ID. If the item from step 2 (#2) is saved before the one from step 1, #2 gets ID -001; when #1 is then saved, the saving logic sees that its claimed ID (-001) is in use and assigns -002 instead. So IDs get reassigned.
Finally, you can simply find the next free ID when creating a new item. In the three steps described above, this means you initially have a gap where -001 is supposed to be. If you now create a new item, your code will see that -001 is unused and assign it to the new item.
But, and this depends entirely on requirements you didn't specify, -001 has now been created later in time than -002; I do not know whether that is allowed. Furthermore, at any given moment you can have a gap in your numbering where an item has been cancelled, and if that happens at the end of a reporting period, it will cause errors (-033, -034, -036).
You might also want to include an auto-incrementing primary key instead of using this invoice number (or whatever it is) as the key.
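To illustrate the "assign only at save time" approach in C# (SQL Server syntax; connectionString and the Clients/ClientId names are my assumptions):

using System;
using System.Data;
using System.Data.SqlClient;

// Compute and claim the ID inside one serializable transaction at commit
// time, so a cancelled registration never consumes a number and two
// machines cannot compute the same MAX simultaneously.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction(IsolationLevel.Serializable))
    {
        var cmd = new SqlCommand(
            @"SELECT ISNULL(MAX(CAST(LEFT(ClientId, 3) AS int)), 0) + 1
              FROM Clients WITH (UPDLOCK, HOLDLOCK)
              WHERE ClientId LIKE '%-' + @year", conn, tx);
        cmd.Parameters.AddWithValue("@year", DateTime.Now.Year.ToString());
        int next = (int)cmd.ExecuteScalar();
        string id = next.ToString("000") + "-" + DateTime.Now.Year;

        // ... INSERT the client row with this id on the same transaction ...
        tx.Commit();
    }
}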
On every page of my website, a token is passed in as a query string parameter. The server-side code checks whether the token already exists in the database (the token is a uniqueidentifier field). If the token exists, the existing row is used; if not, a new row is created with the new token.
The problem is that once in a while I see a duplicate record in the database (two rows with the same uniqueidentifier). I have noticed the insertion times of such records were about half a second apart. My only guess is that when the site is visited for the first time, the aspx pages aren't fully compiled yet; it takes a few seconds, the user navigates to another page of the site by typing in a different URL, and the two requests end up executing almost at the same time.
Is there a way to prevent this duplicate record problem from happening (on the server side or in the database)?
This is the code in question; it is part of every page of the website.
var record = (from x in db.Items
              where x.Token == token
              select x).FirstOrDefault();
if (record == null)
{
    var x = new Item();
    x.Id = Guid.NewGuid();
    x.Token = token;
    db.Items.InsertOnSubmit(x);
    db.SubmitChanges();
}
Yes, create a unique index on your token field.
create unique index tab_token on your_table(token);
This way, the database will make sure you never store two records with the same token value. Keep in mind that your insert may now fail because of the index constraint, so make sure you catch that exception in your code and handle it accordingly.
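For example, a sketch of handling it with LINQ to SQL (error numbers 2601 and 2627 are SQL Server's unique-constraint violations; depending on your setup, SubmitChanges may surface a wrapped exception instead, so treat this as an assumption to verify):

using System;
using System.Data.SqlClient;

try
{
    var x = new Item { Id = Guid.NewGuid(), Token = token };
    db.Items.InsertOnSubmit(x);
    db.SubmitChanges();
}
catch (SqlException ex) when (ex.Number == 2601 || ex.Number == 2627)
{
    // Another request inserted the same token first; ignore the error
    // and fall back to reading the existing row instead.
}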
What is probably happening is that two requests are served at exactly the same time, and a race condition lets both of them insert a row for the same token.
I am new to threads and in need of help. I have a data entry app where inserting a new record takes an exorbitant amount of time (50-75 seconds). My solution was to send the insert, which returns the new record ID, off via the ThreadPool and let the user begin entering the rest of the record's data while that insert is running. My problem is that the user can hit save before the new ID has come back from that insert.
I tried putting in a Boolean variable that gets set to true via an event from that thread when it is safe to save. I then put in:
while (safeToSave == false)
{
    Thread.Sleep(200);
}
I think that is a bad idea. If I run the save method before that thread returns, it gets stuck.
So my questions are:
Is there a better way of doing this?
What am I doing wrong here?
Thanks for any help.
Doug
Edit, for more information:
It is doing an insert into a very large (approaching max size) FoxPro database. The file has about 200 fields and almost as many indexes on it.
And before you ask: no, I cannot change its structure; it was here before I was, and there is a ton of legacy code hitting it. The first problem is that, in order to get a new ID, I must first find max(id) in the table, then increment and checksum it; that takes about 45 seconds. The first insert is then simply an insert of that new ID and an enterdate field. This table is not (and cannot be) put into a DBC, which rules out auto-generated IDs and the like.
@joshua.ewer: You have the process correct, and I think for the short term I will just disable the save button, but I will look into your idea of passing it into a queue. Do you have any references to MSMQ that I should take a look at?
1) Many :). For example, you could disable the "save" button while the thread is inserting the object, or you could set up a worker thread that handles a queue of "save requests" (but I think the problem here is that the user wants to modify the newly created record, so disabling the button is probably better).
2) I think we need some more code to be able to understand... (or maybe it is a synchronization issue; I am not a big fan of threads either).
By the way, I just don't understand why an insert should take so long; I think you should check that code first! <- just as Charles stated before (sorry, didn't read the post) :)
Everyone else, including you, has addressed the core problems (the insert time, and why you do an insert followed by an update), so I'll stick to the technical concerns with your proposed solution. If I have the flow right:
1. Thread 1: start data entry for the record.
2. Thread 2: background call to the DB to retrieve the new ID.
3. The save button is always enabled; if the user tries to save before Thread 2 completes, you put #1 to sleep for 200 ms?
The simplest, though not best, answer is to just keep the button disabled and have that thread call back into a delegate that enables the button; the user can't start the update operation until you're sure things are set up appropriately. A sketch follows.
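A minimal sketch of that callback, assuming a WinForms UI (saveButton, currentRecordId, and GetNewRecordId are placeholders of mine):

using System;
using System.Threading;

saveButton.Enabled = false;
ThreadPool.QueueUserWorkItem(_ =>
{
    int newId = GetNewRecordId(); // the slow max(id) + checksum lookup

    // Marshal back to the UI thread before touching any controls.
    saveButton.BeginInvoke((Action)(() =>
    {
        currentRecordId = newId;
        saveButton.Enabled = true;
    }));
});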
Though I think a much better solution (it might be overblown if you're just building a Q&D front end to FoxPro) would be to throw those save operations into a queue. The user can key as quickly as possible, and the requests go into something like MSMQ to complete asynchronously in their own time.
Use a future rather than a raw ThreadPool action. Start the future, let the user do whatever they want, and when they hit save on the second record, request the value from the future. If the first insert has already finished, you'll get the ID right away and the second insert can kick off; if you are still waiting on the first operation, the future will block until the ID is available, and then the second operation can execute.
You're not saving any time unless the user is slower than the operation.
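In modern .NET the future would be a Task<int>; a rough sketch (InsertAndGetNewId, SaveRecord, and enteredData are placeholders):

using System.Threading.Tasks;

// Kick off the slow insert immediately; it yields the new record ID.
Task<int> idFuture = Task.Run(() => InsertAndGetNewId());

// ... the user keys in the record meanwhile ...

void OnSave()
{
    // Result returns immediately if the insert already finished;
    // otherwise it blocks until the ID is available.
    int newId = idFuture.Result;
    SaveRecord(newId, enteredData);
}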
First, you should probably find out, and fix, the reason why an insert takes so long: 50-75 seconds is unreasonable for a single-row insert in any modern database, and it indicates that something else needs to be addressed, like indexes or blocking.
Secondly, why are you inserting the record before you have the data? Normally, data entry apps are coded so that the insert is not attempted until all the necessary data has been gathered from the user. Are you doing this because you are trying to get the new ID back from the database first, so you can "update" the new empty record with the user-entered data later? If so, almost every database vendor has a mechanism that lets you do the insert only once, without knowing the new ID in advance, and have the database return the new ID as well. Which vendor's database are you using?
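For illustration, this is what that mechanism looks like in SQL Server (the poster's free-table FoxPro database may not support anything similar; Records and its columns are my assumptions):

using System;
using System.Data.SqlClient;

// A single INSERT that returns the generated key in the same round trip,
// so no separate MAX(id) lookup or follow-up UPDATE is needed.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var cmd = new SqlCommand(
        "INSERT INTO Records (EnterDate) OUTPUT INSERTED.Id VALUES (@enterDate)",
        conn);
    cmd.Parameters.AddWithValue("@enterDate", DateTime.Now);
    int newId = (int)cmd.ExecuteScalar();
}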
Is a solution like this possible:
Pre-calculate the unique IDs before a user even starts to add. Keep a list of unique IDs that are already in the table but are effectively placeholders. When a user starts an insert, reserve one of those unique IDs for them; when the user presses save, the placeholder is replaced with their data.
PS: It's difficult to confirm this, but be aware of the following concurrency issue with what you are proposing (with or without threads): user A starts to add, user B starts to add, user A calculates ID 1234 as the next free ID, user B calculates ID 1234 as the next free ID, user A inserts ID 1234, user B inserts ID 1234 = boom!