Add column to existing SQL Server table - Implications - C#

I have an existing table in SQL Server with existing entries (over 1 million in fact).
This table gets updated, inserted and selected from on a regular basis by a front-end application. I want/need to add a datetime column e.g. M_DateModified that can be updated like so:
UPDATE Table SET M_DateModified = GETDATE()
whenever a button gets pressed on the front-end and a stored procedure gets called. This column will be added to an existing report as requested.
My problem, and question, is this. Being one of the core tables of our app, will ALTERING the table and adding an additional column break other existing queries? Obviously, you can't insert into a table without specifying all values for all columns, so any existing INSERT queries will break (WHICH is a massive problem).
Any help would be much appreciated on the best solution regarding this problem.

First, as marc_s says, it should only affect SELECT * queries, and not even all of those would necessarily be affected.
Secondly, you only need to supply values for non-nullable columns without defaults on an INSERT, so if you make the new column NULL-able, you don't have to worry about that. Further, for a Created_Date-type column, it is typical to add a DEFAULT of GETDATE(), which will fill it in for you if it is not specified.
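For example (a sketch; YourTable stands in for the real table name, which isn't given in the question):

ALTER TABLE YourTable
ADD M_DateModified datetime NULL
    CONSTRAINT DF_YourTable_M_DateModified DEFAULT (GETDATE());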
Thirdly, if you are still worried about impacting your existing code-base, then do the following:
Rename your table to something like "physicalTable".
Create a View with the same name that your table formerly had, that does a SELECT ... FROM physicalTable, listing the columns explicitly and in the same order, but do not include the M_DateModified field in it.
Leave your code unmodified; it will now reference the View instead of directly accessing the table.
Now your code can safely interact with the table without any changes (SQL DML code cannot tell the difference between a Table and a writeable View like this).
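In T-SQL the swap might look like this (a sketch; PkId, Col1, and Col2 are placeholders for your table's actual columns):

EXEC sp_rename 'YourTable', 'physicalTable';
GO
CREATE VIEW YourTable
AS
SELECT PkId, Col1, Col2   -- every original column, explicitly, in the original order
FROM physicalTable;       -- M_DateModified deliberately omitted
GO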
Finally, this kind of "ModifiedDate" column is a common need and is most often handled, first by making it NULL-able, then by adding an Insert & Update trigger that sets it automatically:
UPDATE y
SET M_DateModified = GETDATE()
FROM physicalTable y
JOIN inserted i ON y.PkId = i.PkId
This way the application does not have to maintain the field itself. As an added bonus, neither can the application set it incorrectly or falsely (this is a common and acceptable use of triggers in SQL).
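Wrapped in DDL, the trigger might look like this (a sketch; the trigger name is made up, and PkId stands in for your table's actual key):

CREATE TRIGGER trg_physicalTable_SetModified
ON physicalTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE y
    SET M_DateModified = GETDATE()
    FROM physicalTable y
    JOIN inserted i ON y.PkId = i.PkId;
END

By default, SQL Server will not recursively re-fire the trigger for the UPDATE it performs, since the RECURSIVE_TRIGGERS database option is off unless you have enabled it.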

If the new column is not mandatory you have nothing to worry about. Unless you have some knuckleheads who wrote select statements with a "*" instead of a column list.

Well, as long as your SELECTs are not *, those should be fine. For the INSERTs, if you give the field a default of GETDATE() and allow NULLs, you can exclude it and it will still be filled.

Depends on how your other queries are set up. If they are SELECT [Item1], [Item2], etc., then you won't face any issues. If it's a SELECT * FROM, then you may experience some unexpected results.
Keep in mind how you want to set it up: you'll either have to make it nullable, which could give you fits down the road, or set a default date, which could give you incorrect data for reporting, retrieval, queries, etc.

Related

Fastest Way To Upsert Using Postgres and C#

I am writing an application in C# that will copy data from one postgres table to another on a regular basis. I am using the NPGSql library.
I have run into the following issue: When there are thousands of rows to be copied (> 10k), the program runs very slowly.
I have tried:
For my first attempt, I pulled the entirety of the destination table, then compared the data I was inserting to the data that already existed. Then, I would write an insert or update statement depending on whether it already existed but had alterations, or whether it did not exist at all. This was the worst solution, as every individual statement had to be sent as a command.
Next, I tried an "on conflict" clause, backed by a unique constraint on the actual table. This let me send all of the inserts as bulk INSERT INTO ... statements, and the database would take care of updates. This was significantly faster, but not fast enough.
I read about Postgres's COPY method, but it does not seem to suit my needs. It seems that COPY will do ONLY an insert, and NOT an upsert. Because I am modifying this table several times, some of the data will be new, but some will be old rows that need updating.
Has anyone come up with a fast way to UPSERT, provided that I need an option to EDIT a row, not just do a blanket mass INSERT of all of my data?
Please let me know if I can provide any other information
Thank you so much for your time
First of all, I assume the tables are on different databases, otherwise I would just do this all in DML.
I think COPY is definitely your friend. There is no faster way to extract or load data, and then you can let the database do the heavy lifting.
On the source database:
copy source_table
to '/var/tmp/foo.csv' csv;
On the destination database:
truncate temp_table;
copy temp_table
from '/var/tmp/foo.csv' csv;
insert into destination_table
select *
from temp_table t
where not exists (
    select null
    from destination_table d
    where t.id = d.id
);

update destination_table d
set
    field1 = t.field1,
    field2 = t.field2
from temp_table t
where
    d.id = t.id and
    (d.field1 is distinct from t.field1 or
     d.field2 is distinct from t.field2);
Couple of other comments:
the insert into uses an anti-join, and this is my favorite construct to insert missing records
on the update, it's important to specify the criteria for what you update -- don't update everything; only those records that have changed. This will make a big difference in performance. Hopefully there is a set number of fields you can use to determine whether a record has changed.
If there is a field that indicates the record has been updated (last_update_date or something similar), a slightly lazier and wonderful approach is to delete those records and let the anti-join insert re-insert them, as sketched below. This would eliminate the need for the update statement and would be much less code for tables with lots of columns.
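That lazier approach might look something like this (a sketch, assuming a last_update_date column exists on both tables):

delete from destination_table d
using temp_table t
where d.id = t.id
  and t.last_update_date > d.last_update_date;

The anti-join insert above then re-inserts the fresh rows from temp_table.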

Primary key violation error in sql server 2008

I have created two threads in C# and I am calling two separate functions in parallel. Both functions read the last ID from the XYZ table and insert a new record with the value ID+1. Here the ID column is the primary key. When I execute both functions I get a primary key violation error. Both functions run the query below:
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
It seems both functions are reading the value at the same time and trying to insert with the same value.
How can I solve this problem?
Let the database handle selecting the ID for you. It's obvious from your code above that what you really want is an auto-incrementing integer ID column, which the database can definitely handle doing for you. So set up your table properly and instead of your current insert statement, do this:
insert into XYZ values('Name')
If your database table is already set up, I believe you can issue a statement similar to:
alter table your_table modify column your_table_id int(size) auto_increment
(Note that this syntax is MySQL-flavored; as a later answer points out, SQL Server does not let you turn an existing column into an identity column with ALTER TABLE.)
Finally, if none of these solutions are adequate for whatever reason (including, as you indicated in the comments section, the inability to edit the table schema), then you can do as one of the other users suggested in the comments and create a synchronized method to find the next ID. You would basically just create a static method that returns an int, issue your select-ID statement in that static method, and use the returned result to insert your next record into the table. Since this method would not guarantee a successful insert (because external applications can also insert into the same table), you would also have to catch exceptions and retry on failure.
Set the ID column to be an "Identity" column. Then you can execute your queries as:
insert into XYZ values('Name')
I think you can't use ALTER TABLE to change a column to be Identity after the column is created. Use Management Studio to set this column to be Identity. If your table has many rows, this can be a long-running process, because it will actually copy your data to a new table (it performs a table re-creation).
Most likely that option is disabled in your Management Studio. In order to enable it, open Tools->Options->Designers and uncheck the option "Prevent saving changes that require table re-creation". Depending on your table size, you will probably have to raise the timeout, too. Your table will be locked during that time.
A solution for such problems is to generate the ID using some kind of sequence.
For example, in SQL Server (2012 and later) you can create a sequence using the command below:
CREATE SEQUENCE Test.CountBy1
    START WITH 1
    INCREMENT BY 1;
GO
Then in C#, you can retrieve the next value from the sequence and assign it to the ID before inserting.
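Fetching the value is a single scalar query (this assumes the sequence created above):

SELECT NEXT VALUE FOR Test.CountBy1;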
It sounds like you want a higher transaction isolation level or more restrictive locking.
I don't use these features too often, so hopefully somebody will suggest an edit if I'm wrong, but you want one of these:
-- specify the strictest isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
or
-- make locks exclusive so other transactions cannot access the same rows
insert into XYZ values((SELECT max(ID)+1 from XYZ WITH (XLOCK)),'Name')

How to update many databases when we update any table?

I am creating a C# Windows application based on a medical inventory. In this application I have mainly three forms: PurchaseDetail, SalesDetail, and StockDetail.
Now I want functionality whereby, if I insert or modify the records in PurchaseDetail or SalesDetail, the data in StockDetail is also modified. (For example, if I insert some quantity of medicines in PurchaseDetail then the Quantity in StockDetail should also be modified, and the same for SalesDetail.)
Columns in PurchaseDetail:
Id (primary key, auto-increment int), BatchNumber, MedicineName, ManufacturingDate, ExpiryDate, Rate, MRP, Tax, Discount, Quantity
Columns in SalesDetail:
Id (primary key, auto-increment int), BillNumber, CustomerName, BatchNumber, Quantity, Rate, SalesDate
Columns in StockDetail:
Id (primary key, auto-increment int), ProductId, ProductName, OpeningStock, ClosingStock, PurchaseQty, DispenseQty, PurchaseReturn, DispenseReturn
Please help me.
I'm guessing that you are talking about 3 tables in one database (although the title seems to indicate otherwise).
You could use triggers to achieve what you are asking for. (Note that I gave you a T-SQL example in the link.)
OR,
You could write a transactional procedure to perform all steps (or none) at once.
I'm assuming that you mean 'update many tables', not multiple databases.
There are at least 3 ways to do this:
Database triggers on PurchaseDetail and SalesDetail which also update StockDetail (see the sketch after this list).
Wrap the inserts into PurchaseDetail or SalesDetail with Stored Procs, which also update the StockDetail accordingly
Do this from code (in your C# layer), similar to 2.
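For illustration, option 1 might look something like this (a sketch only; it assumes medicines are matched by name, since PurchaseDetail has MedicineName while StockDetail has ProductName, and your real matching rule may differ):

CREATE TRIGGER trg_PurchaseDetail_UpdateStock
ON PurchaseDetail
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE s
    SET PurchaseQty  = s.PurchaseQty + i.Quantity,
        ClosingStock = s.ClosingStock + i.Quantity
    FROM StockDetail s
    JOIN inserted i ON s.ProductName = i.MedicineName;
END

A matching trigger on SalesDetail would adjust DispenseQty and ClosingStock the same way.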

Linq to Entities or EF version of Set Identity_Insert <TableName> ON

Similar to this question using LINQ to SQL, but I don't want to just execute SQL commands from the code. I could write a stored procedure.
I am writing the year rollover functions for an application and I would like to be able to make sure that the next year uses the next available PK slot so that I can use math to go back between years.
The user wants a roll back function also, so there is the distinct possibility of gaps since a year will be deleted at that point.
This also begs the question of whether relying on pk values to be sequential is too brittle...
Question: Is there a way to short-circuit the way EF inserts records and specify the primary key I would like inserted with the record?
I would say your design is absolutely too brittle. The PK really should not be an application concern except for retrieving a given record, imo.
That said, if you must do it that way, you can set the StoreGeneratedPattern flag to "None" and then insert whatever PK you want from the app, but of course if the DB itself is using an auto-incrementing key of some kind (e.g. IDENTITY), then you'll still break.
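For completeness, the raw SQL the title refers to looks like this (a sketch; MyTable and its columns are stand-ins), and it could be wrapped in the stored procedure you mentioned:

SET IDENTITY_INSERT MyTable ON;
INSERT INTO MyTable (Id, Name) VALUES (42, 'example');
SET IDENTITY_INSERT MyTable OFF;

Note that while IDENTITY_INSERT is ON you must supply an explicit column list, as above.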
Update
Why do the requirements to a) have one row per year and b) roll back each year translate into anything at all for the PK? Why not just have a 'year' column (set to UNIQUE or not) which can be used in your query?

How to avoid a database race condition when manually incrementing PK of new row

I have a legacy data table in SQL Server 2005 that has a PK with no identity/autoincrement and no power to implement one.
As a result, I am forced to create new records in ASP.NET manually via the ol' "SELECT MAX(id) + 1 FROM table"-before-insert technique.
Obviously this creates a race condition on the ID in the event of simultaneous inserts.
What's the best way to gracefully resolve the event of a race collision? I'm looking for VB.NET or C# code ideas along the lines of detecting a collision and then re-attempting the failed insert by getting yet another max(id) + 1. Can this be done?
Thoughts? Comments? Wisdom?
Thank you!
NOTE: What if I cannot change the database in any way?
Create an auxiliary table with an identity column. In a transaction, insert into the aux table, retrieve the value, and use it to insert into your legacy table. At this point you can even delete the row inserted in the aux table; the point is just to use it as a source of incremented values.
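In T-SQL that might look like this (a sketch; IdSource, LegacyTable, and the Name column are made-up names):

-- One-time setup: seed the identity past the legacy table's current MAX(Id).
CREATE TABLE IdSource (Id int IDENTITY(100000, 1) PRIMARY KEY);

BEGIN TRAN;
    DECLARE @NewId int;
    INSERT INTO IdSource DEFAULT VALUES;
    SET @NewId = SCOPE_IDENTITY();
    INSERT INTO LegacyTable (Id, Name) VALUES (@NewId, 'example');
    DELETE FROM IdSource WHERE Id = @NewId;  -- optional: keeps the aux table small
COMMIT;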
Not being able to change the database schema is harsh.
If you insert an existing PK into the table you will get a SqlException with a message indicating a PK constraint violation. Catch this exception and retry the insert a few times until you succeed. If you find that the collision rate is too high, you may try max(id) + <small-random-int> instead of max(id) + 1. Note that with this approach your IDs will have gaps and the ID space will be exhausted sooner.
Another possible approach is to emulate an auto-incrementing ID outside of the database. For instance, create a static integer, call Interlocked.Increment on it every time you need the next ID, and use the returned value. The tricky part is initializing this static counter to a good value. I would do it with Interlocked.CompareExchange:
class Autoincrement {
    static int id = -1;

    public static int NextId() {
        if (id == -1) {
            // Not initialized yet: read the current max from the database.
            // QueryMaxIdFromDb() is a placeholder for your own data-access
            // call that runs "select max(id)" against the table.
            int lastId = QueryMaxIdFromDb();
            // Only set id if it is still -1; the first thread wins,
            // any concurrent initializers become no-ops.
            Interlocked.CompareExchange(ref id, lastId, -1);
        }
        // Get the next id atomically.
        return Interlocked.Increment(ref id);
    }
}
Obviously the latter works only if all inserted IDs are obtained via Autoincrement.NextId within a single process.
The key is to do it in one statement or one transaction.
Can you do this?
INSERT INTO table (PKcol, col2, col3, ...)
SELECT (SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...
Without testing, this might work too (though note that SQL Server does not allow subqueries in a VALUES list, so the SELECT form above is the safer bet):
INSERT INTO table (PKcol, col2, col3, ...)
VALUES ((SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...)
If you can't, another way is to do it in a trigger.
The trigger is part of the INSERT transaction
Use HOLDLOCK, UPDLOCK for the MAX. This holds the row lock until commit
The row being updated is locked for the duration
A second insert will wait until the first completes.
The downside is that you are changing the primary key.
An auxiliary table needs to be part of a transaction.
Or change the schema as suggested...
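Put together, the trigger approach might look like this (a sketch; the table, column, and trigger names are assumptions, following the HOLDLOCK/UPDLOCK pattern above):

CREATE TRIGGER trg_LegacyTable_AssignId
ON LegacyTable
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Assign keys ourselves, handling multi-row inserts via ROW_NUMBER.
    INSERT INTO LegacyTable (Id, Name)
    SELECT (SELECT ISNULL(MAX(Id), 0) FROM LegacyTable WITH (HOLDLOCK, UPDLOCK))
           + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
           Name
    FROM inserted;
END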
Note: All you need is a source of ever-increasing integers. It doesn't have to come from the same database, or even from a database at all.
Personally, I would use SQL Express because it is free and easy.
If you have a single web server:
Create a SQL Express database on the web server with a single table [ids] with a single autoincrementing field [new_id]. Insert a record into this [ids] table, get the [new_id], and pass that onto your database layer as the PK of the table in question.
If you have multiple web servers:
It's a pain to setup, but you can use the same trick by setting appropriate seed/increment (i.e. increment = 3, and seed = 1/2/3 for three web servers).
What about running the whole batch (the SELECT for the ID and the INSERT) in a serializable transaction?
That should get you around needing to make changes in the database.
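Something like this, perhaps (a sketch; the names are stand-ins, and be aware that two serializable transactions can still deadlock here, in which case the loser has to retry):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
    DECLARE @NewId int;
    SELECT @NewId = MAX(Id) + 1 FROM LegacyTable;
    INSERT INTO LegacyTable (Id, Name) VALUES (@NewId, 'example');
COMMIT;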
Is the main concern concurrent access? I mean, will multiple instances of your app (or, God forbid, other apps outside your control) be performing inserts concurrently?
If not, you can probably manage the inserts through a central, synchronized module in your app, and avoid race conditions entirely.
If so, well... like Joel said, change the database. I know you can't, but the problem is as old as the hills, and it's been solved well -- at the database level. If you want to fix it yourself, you're just going to have to loop (insert, check for collisions, delete) over and over and over again. The fundamental problem is that you can't perform a transaction (I don't mean that in the SQL "TRANSACTION" sense, but in the larger data-theory sense) if you don't have support from the database.
The only further thought I have is that if you at least have control over who has access to the database (e.g., only "authorized" apps, either written or approved by you), you could implement a side-band mutex of sorts, where a "talking stick" is shared by all the apps and ownership of the mutex is required to do an insert. That would be its own hairy ball of wax, though, as you'd have to figure out policy for dead clients, where it's hosted, configuration issues, etc. And of course a "rogue" client could do inserts without the talking stick and hose the whole setup.
The best solution is to change the database. You may not be able to change the column to be an identity column, but you should be able to make sure there's a unique constraint on the column and add a new identity column seeded with your existing PK's. Then either use the new column instead or use a trigger to make the old column mirror the new, or both.
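A sketch of that last suggestion (table, column, and trigger names are assumptions; seed the identity past your current maximum key):

ALTER TABLE LegacyTable ADD NewId int IDENTITY(100000, 1);
GO
CREATE TRIGGER trg_LegacyTable_MirrorId
ON LegacyTable
AFTER INSERT
AS
-- Overwrite the manually supplied key with the identity value,
-- so the old column simply mirrors the new one.
UPDATE t
SET Id = t.NewId
FROM LegacyTable t
JOIN inserted i ON t.NewId = i.NewId;
GO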

Categories

Resources