I've searched through numerous threads to try to find an answer to this but any answer I've found suggests using a unique constraint on a single column, or multiple columns.
My problem is, I'm writing an application in C# with a SQL Server back end. One of the features is to allow a user to import a .CSV file into the database after a little bit of pre-processing. I need to find the quickest method to prevent the user from importing the same data more than once. The data will look something like
ID -- will be auto-generated in SQL Server (PK)
Date Time(datetime)
Machine(nchar)
...
...
...
Name(nchar)
Age(int)
I want to allow any number of the columns to hold duplicate values, as long as the entire record is not a duplicate.
I was thinking of creating another column in the database, obtained by hashing all of the columns together, and making it unique, but wasn't sure if that was the most efficient method, or if the resulting hash would be guaranteed unique. The CSV files will only be around 60 MB, but there will be tens of thousands of them.
Any help would be appreciated.
Thanks
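On the hashing idea raised in the question: a hash is never guaranteed unique, since two different records can in principle collide, although with a wide digest such as SHA-256 this is extremely unlikely. A minimal sketch of a per-record hash, written in Python rather than C# purely for illustration; the function name `row_hash` and the sample values are hypothetical, and it assumes every field can be meaningfully converted with `str()`:

```python
import hashlib

def row_hash(fields):
    """Return a SHA-256 hex digest for one record.

    Each field is length-prefixed before hashing so that
    ("ab", "c") and ("a", "bc") do not produce the same digest.
    """
    h = hashlib.sha256()
    for value in fields:
        data = str(value).encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))  # 4-byte length prefix
        h.update(data)
    return h.hexdigest()

# Hypothetical sample records: they differ only in the last column.
row_a = ["2015-03-01 10:00:00", "M1", "Alice", 30]
row_b = ["2015-03-01 10:00:00", "M1", "Alice", 31]

same = row_hash(row_a) == row_hash(list(row_a))
different = row_hash(row_a) != row_hash(row_b)
```

Storing such a digest in a uniquely indexed column keeps the index narrow, at the cost of computing the hash for every imported row.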
You should be able to resolve this by creating a unique constraint which includes all the columns.
create table #a (col1 varchar(10), col2 varchar(10))
ALTER TABLE #a
ADD CONSTRAINT UQ UNIQUE NONCLUSTERED
(col1, col2)
-- Works, duplicate entries in columns
insert into #a (col1, col2)
values ('a', 'b')
,('a', 'c')
,('b', 'c')
-- Fails, full duplicate record:
insert into #a (col1, col2)
values ('a1', 'b1')
,('a1', 'b1')
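The same behaviour can be tried without a SQL Server instance. The sketch below uses Python's built-in `sqlite3` module as a stand-in, on the assumption that a multi-column UNIQUE constraint behaves the same way for this case; the helper `try_insert` is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col1 TEXT, col2 TEXT, UNIQUE (col1, col2))")

# Rows that share a value in one column are accepted.
conn.executemany(
    "INSERT INTO a (col1, col2) VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c")],
)

def try_insert(row):
    """Insert one row; return False if the full record already exists."""
    try:
        conn.execute("INSERT INTO a (col1, col2) VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# The first insert succeeds, the identical second one is rejected.
results = [try_insert(("a1", "b1")), try_insert(("a1", "b1"))]
row_count = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
```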
The code below can work to ensure that you don't duplicate the [Date Time], Machine, [Name] and Age columns when you insert the data.
It's important that, at the time the code runs, each row of the incoming dataset has a unique ID. The query simply skips every import row whose ID is returned by the subquery, that is, every row whose four other values already exist in the destination table.
INSERT INTO MAIN_TABLE ([Date Time],Machine,[Name],Age)
SELECT [Date Time],Machine,[Name],Age
FROM IMPORT_TABLE WHERE ID NOT IN
(
SELECT I.ID FROM IMPORT_TABLE I INNER JOIN MAIN_TABLE M
ON I.[Date Time]=M.[Date Time]
AND I.Machine=M.Machine
AND I.[Name]=M.[Name]
AND I.Age=M.Age
)
My database contains three tables called Object_Table, Data_Table and Link_Table. The link table just contains two columns, the identity of an object record and an identity of a data record.
I want to copy the data from Data_Table where it is linked to one given object identity and insert corresponding records into Data_Table and Link_Table for a different given object identity.
I can do this by selecting into a table variable and then looping through it, doing two inserts for each iteration.
Is this the best way to do it?
Edit: I want to avoid a loop for two reasons. The first is that I'm lazy, and a loop/temp table requires more code, which means more places to make a mistake. The second is a concern about performance.
I can copy all the data in one insert, but how do I get the link table to link to the new data records when each record has a new id?
In one statement: No.
In one transaction: Yes
BEGIN TRANSACTION
DECLARE @DataID int;
INSERT INTO DataTable (Column1 ...) VALUES (....);
SELECT @DataID = scope_identity();
INSERT INTO LinkTable VALUES (@ObjectID, @DataID);
COMMIT
The good news is that the above code is also guaranteed to be atomic, and can be sent to the server from a client application with one sql string in a single function call as if it were one statement. You could also apply a trigger to one table to get the effect of a single insert. However, it's ultimately still two statements and you probably don't want to run the trigger for every insert.
You still need two INSERT statements, but it sounds like you want to get the IDENTITY from the first insert and use it in the second, in which case, you might want to look into OUTPUT or OUTPUT INTO: http://msdn.microsoft.com/en-us/library/ms177564.aspx
The following sets up the situation I had, using table variables.
DECLARE @Object_Table TABLE
(
Id INT NOT NULL PRIMARY KEY
)
DECLARE @Link_Table TABLE
(
ObjectId INT NOT NULL,
DataId INT NOT NULL
)
DECLARE @Data_Table TABLE
(
Id INT NOT NULL Identity(1,1),
Data VARCHAR(50) NOT NULL
)
-- create two objects '1' and '2'
INSERT INTO @Object_Table (Id) VALUES (1)
INSERT INTO @Object_Table (Id) VALUES (2)
-- create some data
INSERT INTO @Data_Table (Data) VALUES ('Data One')
INSERT INTO @Data_Table (Data) VALUES ('Data Two')
-- link all data to first object
INSERT INTO @Link_Table (ObjectId, DataId)
SELECT Objects.Id, Data.Id
FROM @Object_Table AS Objects, @Data_Table AS Data
WHERE Objects.Id = 1
Thanks to another answer that pointed me towards the OUTPUT clause I can demonstrate a solution:
-- now I want to copy the data from object 1 to object 2 without looping
INSERT INTO @Data_Table (Data)
OUTPUT 2, INSERTED.Id INTO @Link_Table (ObjectId, DataId)
SELECT Data.Data
FROM @Data_Table AS Data INNER JOIN @Link_Table AS Link ON Data.Id = Link.DataId
INNER JOIN @Object_Table AS Objects ON Link.ObjectId = Objects.Id
WHERE Objects.Id = 1
It turns out, however, that it is not that simple in real life because of the following error:
the OUTPUT INTO clause cannot be on either side of a (primary key, foreign key) relationship
I can still OUTPUT INTO a temp table and then finish with a normal insert. So I can avoid my loop, but I cannot avoid the temp table.
I want to stress the importance of using
SET XACT_ABORT ON;
for any MSSQL transaction that contains multiple SQL statements.
See: https://msdn.microsoft.com/en-us/library/ms188792.aspx
They provide a very good example.
So, the final code should look like the following:
SET XACT_ABORT ON;
BEGIN TRANSACTION
DECLARE @DataID int;
INSERT INTO DataTable (Column1 ...) VALUES (....);
SELECT @DataID = scope_identity();
INSERT INTO LinkTable VALUES (@ObjectID, @DataID);
COMMIT
It sounds like the Link table captures the many:many relationship between the Object table and Data table.
My suggestion is to use a stored procedure to manage the transactions. When you want to insert to the Object or Data table perform your inserts, get the new IDs and insert them to the Link table.
This allows all of your logic to remain encapsulated in one easy to call sproc.
If you want the actions to be more or less atomic, I would make sure to wrap them in a transaction. That way you can be sure both happened or both didn't happen as needed.
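The both-or-neither behaviour described above can be tried in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server and a plain function in place of a stored procedure; the table names, column names, and the helper `add_linked_data` are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_table (id INTEGER PRIMARY KEY, data TEXT NOT NULL);
    CREATE TABLE link_table (object_id INTEGER NOT NULL,
                             data_id INTEGER NOT NULL);
""")

def add_linked_data(object_id, data):
    """Insert a data row and its link row; both commit or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO data_table (data) VALUES (?)", (data,))
        conn.execute(
            "INSERT INTO link_table (object_id, data_id) VALUES (?, ?)",
            (object_id, cur.lastrowid))
    return cur.lastrowid

new_id = add_linked_data(2, "Data One")

try:
    # The second insert violates NOT NULL, so the first is rolled back too.
    add_linked_data(None, "Data Two")
except sqlite3.IntegrityError:
    pass

data_rows = conn.execute("SELECT COUNT(*) FROM data_table").fetchone()[0]
link_rows = conn.execute("SELECT COUNT(*) FROM link_table").fetchone()[0]
```

After the failed call, neither table contains a row for 'Data Two', which is the guarantee the transaction is meant to provide.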
You might create a View selecting the column names required by your insert statement, add an INSTEAD OF INSERT Trigger, and insert into this view.
Before being able to do a multitable insert in Oracle, you could use a trick involving an insert into a view that had an INSTEAD OF trigger defined on it to perform the inserts. Can this be done in SQL Server?
Insert can only operate on one table at a time. Multiple Inserts have to have multiple statements.
I don't know that you need to do the looping through a table variable - can't you just use a mass insert into one table, then the mass insert into the other?
By the way - I am guessing you mean copy the data from Object_Table; otherwise the question does not make sense.
// if you want to insert the same values as in the first table
$qry = "INSERT INTO table (one, two, three) VALUES('$one','$two','$three')";
$result = @mysql_query($qry);
$qry2 = "INSERT INTO table2 (one, two, three) VALUES('$one','$two','$three')";
$result = @mysql_query($qry2);
// or if you want to insert only certain parts of table one
$qry = "INSERT INTO table (one, two, three) VALUES('$one','$two','$three')";
$result = @mysql_query($qry);
$qry2 = "INSERT INTO table2 (two) VALUES('$two')";
$result = @mysql_query($qry2);
// I know it looks too good to be right, but it works, and you can keep adding queries: just change the
// number in "$qry" and the number in @mysql_query($qry)
I have 17 tables this has worked in.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE InsetIntoTwoTable
(
@name nvarchar(50),
@Email nvarchar(50)
)
AS
BEGIN
SET NOCOUNT ON;
insert into dbo.info(name) values (@name)
insert into dbo.login(Email) values (@Email)
END
GO
First, sorry for my bad English; it is not my native language.
My problem is: I have a table with around 10 million records of bank transactions. It has no PK and is not sorted by any column.
My task is to create a page to filter the data and export it to CSV, but the limit on rows per CSV export is around 200k records.
I have some ideas, like:
create 800 tables for the 800 ATMs (just an idea, I know it's stupid) and send data from the main table to them once per day => export to 800 CSV files
use Linq to get 100k records at a time and skip those on the next pass. But I am stuck because Skip requires an OrderBy, and I get an OutOfMemoryException with it
db.tblEJTransactions.OrderBy(u => u.Id).Take(100000).ToList()
Can anyone help me? Every idea is welcome (my boss said I can use anything, including creating hundreds of tables, using NoSQL ...)
If you don't have a primary key in your table, then add one.
The simplest and easiest is to add an int IDENTITY column.
ALTER TABLE dbo.T
ADD ID int NOT NULL IDENTITY (1, 1)
ALTER TABLE dbo.T
ADD CONSTRAINT PK_T PRIMARY KEY CLUSTERED (ID)
If you can't alter the original table, create a copy.
Once the table has a primary key you can sort by it and select chunks/pages of 200K rows with predictable results.
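Selecting chunks by primary key can be sketched as below. This is an illustration using Python's built-in `sqlite3` module, not the LINQ or T-SQL code itself; the `Amount` column and the helper `iter_chunks` are hypothetical. In T-SQL the equivalent query shape would be `SELECT TOP (@n) ... WHERE ID > @last_id ORDER BY ID`:

```python
import sqlite3

def iter_chunks(conn, chunk_size):
    """Yield lists of rows ordered by ID, at most chunk_size rows each."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT ID, Amount FROM T WHERE ID > ? ORDER BY ID LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last ID already exported

# Small demo table standing in for the 10-million-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Amount INTEGER)")
conn.executemany("INSERT INTO T (Amount) VALUES (?)",
                 [(i,) for i in range(10)])

chunk_sizes = [len(chunk) for chunk in iter_chunks(conn, 4)]
```

Seeking on `ID > last_id` lets each chunk be read through the primary key index, and only one chunk is held in memory at a time.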
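I'm not sure about my solution, but you can try it:
select top 1000000 *, row_number() over (order by (select null)) from tblEJTransactions
The above query assigns a row number to each row. Note that ordering by (select null) makes the numbering arbitrary, so it is not guaranteed to be the same between runs.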
And then you can use Linq to get the result.
I have an application where users can add, update and delete records, and I want to know the best ways to avoid duplicate records. In this application I created an index on the table to avoid duplicates. Is that good practice, or are there other ways?
There are a few ways to do this. If you have a unique index on a field and you try to insert a duplicate value, SQL Server will throw an error. My preferred way is to test for existence before the insert by using
IF NOT EXISTS (SELECT ID FROM MyTable WHERE MyField = @ValueToBeInserted)
BEGIN
INSERT INTO MyTable (Field1, Field2) Values (@Value1, @Value2)
END
You can also return a value to let you know if the INSERT took place using an ELSE on the above code.
If you choose to index a field you can set IGNORE_DUP_KEY to simply ignore any duplicate inserts. If you were inserting multiple rows any duplicates would be ignored and the non duplicates would continue to be inserted.
You can use UNIQUE constraints on columns or on a set of columns that you don't want to be duplicated; see also http://www.w3schools.com/sql/sql_unique.asp.
Here is an example for both a single-column and a multi-column unique constraint:
CREATE TABLE [Person]
(
…
[SSN] VARCHAR(…) UNIQUE, -- only works for single-column UNIQUE constraint
…
[Name] NVARCHAR(…),
[DateOfBirth] DATE,
…
UNIQUE ([Name], [DateOfBirth]) -- works for any number of columns
)
In my opinion, an id column is almost compulsory for a table. To avoid duplicates when inserting a row, you can simply use:
INSERT IGNORE INTO Table(id, name) VALUES (null, "blah")
This works in MySQL; I'm not sure about SQL Server.
I have 4 tables say, table1, table2, table3 and table4, which are interrelated.
Table1 will generate a primary key, that will be used in rest of the tables as reference key.
I have to insert multiple records in table 4 using this primary key.
The requirement is that the transaction should either commit successfully or roll back all the changes, which is why I thought of writing this as a stored procedure.
But I got stuck when I had to pass multiple rows of data for table4.
Can anyone please suggest, how can I achieve this?
Thanks, in advance.
I guess you want to do something like this:
CREATE OR REPLACE PROCEDURE myproc
(
invId IN NUMBER,
cusId IN NUMBER
)
IS
temp_id NUMBER;
BEGIN
INSERT INTO myTable (INV_ID)
VALUES (invId)
returning id into temp_id;
INSERT INTO anotherTable (ID, custID)
VALUES (temp_id, cusId);
END myproc;
I have a WinForm with fields that need to be filled in by a user. Not all of the fields belong to one table: the data will go to the Customer table and the CustomerPhone table, so I decided to do multiple inserts. I will insert the appropriate data into CustomerPhone first, then insert the rest of the data into the Customer table.
Is it possible to join an insert, or insert a join? If you could show me a rough sample, I would be grateful.
Many Thanks
Strictly speaking, you can chain inserts and updates in a single statement using the OUTPUT clause. For example, the code below inserts into two distinct tables at once:
create table A (
id_a int not null identity(1,1) primary key,
name varchar(100))
create table B (
id_b int not null identity(1,1) primary key,
id_a int null,
name_hash varbinary(16));
insert into A (name)
output inserted.id_a, hashbytes('MD5', inserted.name)
into B (id_a, name_hash)
values ('Jonathan Doe')
select * from A
select * from B
If you're asking whether you can somehow insert into two tables with one SQL statement: No, you need to do the two separate insert statements.
If you're asking something else, please elaborate.
You can make a view which has those columns and do an insert to the view. Normally, a view which combines multiple tables will not be updateable - however, you can make an instead of trigger which inserts into both tables and/or enforces your business logic.
This can be a very powerful tool if you use it in an organized way to make a layer of views for both selects and inserts/updates.