Possible to get PrimaryKey IDs back after a SQL BulkCopy? - c#

I am using C# and SqlBulkCopy. I have a problem though: I need to do a mass insert into one table and then another mass insert into a second table.
These two tables have a PK/FK relationship.
Table A
Field1 - PK, auto-incrementing (easy to do with SqlBulkCopy, as it's straightforward)
Table B
Field1 - PK/FK - This field creates the relationship and is also the PK of this table. It is not auto-incrementing and needs to have the same ID as the corresponding row in Table A.
So these tables have a one-to-one relationship, but I am unsure how to get back all the PK IDs that the mass insert generated, since I need them for Table B.
Edit
Could I do something like this?
SELECT *
FROM Product
WHERE NOT EXISTS (SELECT * FROM ProductReview WHERE Product.ProductId = ProductReview.ProductId AND Product.Qty IS NULL AND Product.ProductName != 'Ipad')
This should find all the rows that were just inserted with the SQL bulk copy. I am not sure how to take the results from this and then do a mass insert with them from a stored procedure.
The only problem I can see with this is that if a user is adding records one at a time and this statement runs at the same time, it could try to insert a row twice into the ProductReview table.
So say I have one user using the manual way and another user doing the mass way at about the same time.
Manual way:
1. User submits data.
2. A LINQ to SQL Product object is made, filled with the data, and submitted.
3. This object now contains the ProductId.
4. Another LINQ to SQL object is made for the ProductReview table and is inserted (the ProductId from step 3 is sent along).
Mass way:
1. User grabs data from a user sharing the data.
2. All Product rows from the sharing user are grabbed.
3. SQL bulk copy insert on the Product rows happens.
4. My stored procedure selects all rows that only exist in the Product table and meet some other conditions.
5. Mass insert happens with those rows.
So what happens if step 3 (manual way) is happening at the same time as step 4 (mass way)? I think it would try to insert the same row twice, causing a primary key constraint exception.

In that scenario, I would use SqlBulkCopy to insert into a staging table (i.e. one that looks like the data I want to import, but isn't part of the main transactional tables), and then at the DB do an INSERT/SELECT to move the data into the first real table.
Now I have two choices depending on the server version; I could do a second INSERT/SELECT to the second real table, or I could use the INSERT/OUTPUT clause to do the second insert, using the identity rows from the first table.
For example:
-- dummy schema
CREATE TABLE TMP (data varchar(max))
CREATE TABLE [Table1] (id int not null identity(1,1), data varchar(max))
CREATE TABLE [Table2] (id int not null identity(1,1), id1 int not null, data varchar(max))
-- imagine this is the SqlBulkCopy
INSERT TMP VALUES('abc')
INSERT TMP VALUES('def')
INSERT TMP VALUES('ghi')
-- now push into the real tables
INSERT [Table1]
OUTPUT INSERTED.id, INSERTED.data INTO [Table2](id1,data)
SELECT data FROM TMP
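On the C# side, a minimal sketch of that flow might look like this (assuming an open SqlConnection named conn and a DataTable named sourceTable holding the rows to import; both are placeholders, and the table/column names match the dummy schema above):
// requires System.Data and System.Data.SqlClient
// 1) bulk copy the raw rows into the staging table
using (var bulk = new SqlBulkCopy(conn))
{
    bulk.DestinationTableName = "TMP";
    bulk.WriteToServer(sourceTable); // DataTable with a "data" column
}

// 2) push the staged rows into Table1 and, via OUTPUT, into Table2
using (var cmd = new SqlCommand(@"
    INSERT [Table1]
    OUTPUT INSERTED.id, INSERTED.data INTO [Table2](id1, data)
    SELECT data FROM TMP;
    DELETE FROM TMP; -- clear the staging table for the next batch", conn))
{
    cmd.ExecuteNonQuery();
}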

If your app allows it, you could add another column in which you store an identifier of the bulk insert (a guid for example). You would set this id explicitly.
Then after the bulk insert, you just select the rows that have that identifier.
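For example, a rough sketch of that idea, assuming a hypothetical BatchId uniqueidentifier column added to the Product table and to the DataTable dt being copied, and an open SqlConnection conn:
// requires System, System.Data and System.Data.SqlClient
var batchId = Guid.NewGuid();
foreach (DataRow row in dt.Rows)
    row["BatchId"] = batchId; // stamp every row belonging to this bulk insert

using (var bulk = new SqlBulkCopy(conn))
{
    bulk.DestinationTableName = "Product";
    bulk.WriteToServer(dt);
}

// read back the identity values generated for exactly this batch
using (var cmd = new SqlCommand("SELECT ProductId FROM Product WHERE BatchId = @batchId", conn))
{
    cmd.Parameters.AddWithValue("@batchId", batchId);
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            int productId = reader.GetInt32(0);
            // use productId to build the ProductReview rows
        }
    }
}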

I had the same issue where I had to get back ids of the rows inserted with SqlBulkCopy.
My ID column was an identity column.
Solution:
I have inserted 500+ rows with bulk copy, and then selected them back with the following query:
SELECT TOP (@InsertedRowCount) *
FROM MyTable
ORDER BY ID DESC
This query returns the rows I had just inserted, with their IDs. In my case I had another unique column, so I selected that column and the ID, then mapped them with an IDictionary like so:
IDictionary<string, int> mymap = new Dictionary<string, int>();
mymap[name] = id; // unique column value -> identity ID generated by the insert
Hope this helps.
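Reading those rows back and building the map could look roughly like this (assuming an open SqlConnection conn, that Name is the other unique column, and that insertedRowCount holds the number of rows just copied; all of these are placeholders):
// requires System.Data.SqlClient and System.Collections.Generic
var mymap = new Dictionary<string, int>();
using (var cmd = new SqlCommand("SELECT TOP (@rowCount) ID, Name FROM MyTable ORDER BY ID DESC", conn))
{
    cmd.Parameters.AddWithValue("@rowCount", insertedRowCount);
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // unique column value -> identity ID generated by the bulk insert
            mymap[reader.GetString(1)] = reader.GetInt32(0);
        }
    }
}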

My approach is similar to what RiceRiceBaby described, except for one important thing: the call to retrieve Max(Id) needs to be part of a transaction, along with the call to SqlBulkCopy.WriteToServer. Otherwise, someone else may insert during your transaction and this would make your IDs incorrect. Here is my code:
public static void BulkInsert<T>(List<ColumnInfo> columnInfo, List<T> data,
    string destinationTableName, SqlConnection conn = null, string idColumn = "Id")
{
    NLogger logger = new NLogger();
    var closeConn = false;
    if (conn == null)
    {
        closeConn = true;
        conn = new SqlConnection(_connectionString);
        conn.Open();
    }
    SqlTransaction tran =
        conn.BeginTransaction(System.Data.IsolationLevel.Serializable);
    try
    {
        var options = SqlBulkCopyOptions.KeepIdentity;
        var sbc = new SqlBulkCopy(conn, options, tran);
        var command = new SqlCommand(
            $"SELECT Max({idColumn}) from {destinationTableName};", conn, tran);
        var id = command.ExecuteScalar();
        int maxId = 0;
        if (id != null && id != DBNull.Value)
        {
            maxId = Convert.ToInt32(id);
        }
        // assign sequential IDs to the objects, starting after the current max
        data.ForEach(d =>
        {
            maxId++;
            d.GetType().GetProperty(idColumn).SetValue(d, maxId);
        });
        var dt = ConvertToDataTable(columnInfo, data);
        sbc.DestinationTableName = destinationTableName;
        foreach (System.Data.DataColumn dc in dt.Columns)
        {
            sbc.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
        }
        sbc.WriteToServer(dt);
        tran.Commit();
        if (closeConn)
        {
            conn.Close();
            conn = null;
        }
    }
    catch (Exception ex)
    {
        tran.Rollback();
        logger.Write(LogLevel.Error, $@"An error occurred while performing a bulk
            insert into table {destinationTableName}. The entire
            transaction has been rolled back.
            {ex.ToString()}");
        throw;
    }
}

Depending on your needs and how much control you have of the tables, you may want to consider using UNIQUEIDENTIFIERs (Guids) instead of your IDENTITY primary keys. This moves key management outside of the database and into your application. There are some serious tradeoffs to this approach, so it may not meet your needs. But it may be worth considering. If you know for sure that you'll be pumping a lot of data into your tables via bulk-insert, it is often really handy to have those keys managed in your object model rather than your application relying on the database to give you back the data.
You could also take a hybrid approach with staging tables as suggested before. Get the data into those tables using GUIDs for the relationships, and then via SQL statements you could get the integer foreign keys in order and pump data into your production tables.
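As a sketch of what client-side key management can look like (sharedItems and its properties are made up for illustration, and the real tables would use uniqueidentifier keys):
// requires System and System.Data; the keys are generated in the app,
// so parent and child rows can be wired up before anything touches the database
var products = new DataTable();
products.Columns.Add("ProductGuid", typeof(Guid));
products.Columns.Add("ProductName", typeof(string));

var reviews = new DataTable();
reviews.Columns.Add("ProductGuid", typeof(Guid)); // FK to Product
reviews.Columns.Add("ReviewText", typeof(string));

foreach (var item in sharedItems) // sharedItems: whatever your source data is
{
    var key = Guid.NewGuid(); // key decided by the app, not by the database
    products.Rows.Add(key, item.Name);
    reviews.Rows.Add(key, item.ReviewText);
}
// each DataTable can now be bulk copied into its table (or staging table)
// in either order, because no identity values need to come back from SQL Server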

I would:
Turn on identity insert on the table
Grab the Id of the last row of the table
Loop over the rows of your DataTable (for example, for (int i = 0; i < datatable.Rows.Count; i++)).
In the loop, assign the ID of row i to lastId + i + 1.
Run your SQL bulk insert with your keep identity turned on.
Turn identity insert back off
I think that's the safest way to get your IDs on a SQL bulk insert, because it prevents the mismatched IDs that could be caused by the application executing on another thread.

Disclaimer: I'm the owner of the project C# Bulk Operations.
The library overcomes SqlBulkCopy limitations and adds flexible features like outputting the inserted identity values.
Behind the scenes, it does exactly what the accepted answer does, but it is much easier to use.
var bulk = new BulkOperation(connection);
// Output Identity
bulk.ColumnMappings.Add("ProductID", ColumnMappingDirectionType.Output);
// ... Column Mappings...
bulk.BulkInsert(dt);

Related

ADO.NET and SQLite single cell select performance

I want to create a simple database at runtime, fill it with data from an internal resource, and then read each record in a loop. Previously I used LiteDB for that, but I couldn't squeeze out any more time, so I chose SQLite.
I think there are a few things to improve that I am not aware of.
Database creation process:
The first step is to create the table:
using var create = transaction.Connection.CreateCommand();
create.CommandText = "CREATE TABLE tableName (Id TEXT PRIMARY KEY, Value TEXT) WITHOUT ROWID";
create.ExecuteNonQuery();
Next, the insert command is defined:
var insert = transaction.Connection.CreateCommand();
insert.CommandText = "INSERT OR IGNORE INTO tableName VALUES (@Id, @Value)";
var idParam = insert.CreateParameter();
var valueParam = insert.CreateParameter();
idParam.ParameterName = "@" + IdColumn;
valueParam.ParameterName = "@" + ValueColumn;
insert.Parameters.Add(idParam);
insert.Parameters.Add(valueParam);
Then, in a loop, each value is inserted:
idParam.Value = key;
valueParam.Value = value.ValueAsText;
insert.ExecuteNonQuery();
The transaction is then committed with transaction.Commit();
Then an index is created:
using var index = transaction.Connection.CreateCommand();
index.CommandText = "CREATE UNIQUE INDEX idx_tableName ON tableName(Id);";
index.ExecuteNonQuery();
And after that I perform a million selects (each retrieving a single value):
using var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
var param = command.CreateParameter();
param.ParameterName = "@id";
param.Value = id;
command.Parameters.Add(param);
return command.ExecuteReader(CommandBehavior.SingleResult).ToString();
For all selects, one connection is shared and never closed. The inserts are quite fast (less than a minute), but the selects are very troublesome here. Is there a way to improve them?
The table is quite big (around 2 million records) and Value contains quite heavy serialized objects.
The System.Data.SQLite provider is used, and the connection string contains these additional options: Version=3;Journal Mode=Off;Synchronous=Off;
If you are going for performance, you need to consider this: each independent SELECT command is a round trip to the DB with some extra cost. It's similar to an N+1 select problem in the case of parent-child relations.
The best thing you can do is to get a LIST of items (values) in one query:
SELECT Value FROM tableName WHERE Id IN (1, 2, 3, 4, ...);
Here's a link on how to code that: https://www.mikesdotnetting.com/article/116/parameterized-in-clauses-with-ado-net-and-linq
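A rough sketch of building such a parameterized IN clause against the question's _connection (ids here is assumed to be the batch of keys you want to fetch):
// requires System.Collections.Generic; fetch a batch of values in one round trip
using (var command = _connection.CreateCommand())
{
    // build "@p0, @p1, ..." and add one parameter per id
    var names = new List<string>();
    for (int i = 0; i < ids.Count; i++)
    {
        var name = "@p" + i;
        names.Add(name);
        var p = command.CreateParameter();
        p.ParameterName = name;
        p.Value = ids[i];
        command.Parameters.Add(p);
    }
    command.CommandText = "SELECT Id, Value FROM tableName WHERE Id IN (" + string.Join(", ", names) + ");";

    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            var id = reader.GetString(0);
            var value = reader.GetString(1);
            // use id/value here
        }
    }
}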
You could also avoid recreating the select command for every Id: create it once and only execute it for each Id. From your code it seems every select does CreateCommand/CreateParameter and so on. See this for example: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.prepare?view=net-5.0 - you run .Prepare() once and then only execute (the prepared command doesn't have to be a non-query).
You could then try to see if you can be faster with ExecuteScalar, avoiding creating a reader for a single-value result: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.executescalar?view=net-5.0
If ExecuteScalar does not prove to be faster, you could try CommandBehavior.SingleRow instead of SingleResult in your ExecuteReader for possible performance optimizations. According to https://learn.microsoft.com/en-us/dotnet/api/system.data.commandbehavior?view=net-5.0 it might work. I doubt it, but if the first two don't help, why not try it too.
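And a small sketch of the prepare-once / execute-many idea combined with ExecuteScalar, again reusing the question's _connection and table names (idsToLookUp is a placeholder for your keys):
// create and prepare the command once...
using (var select = _connection.CreateCommand())
{
    select.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
    var idParam = select.CreateParameter();
    idParam.ParameterName = "@id";
    select.Parameters.Add(idParam);
    select.Prepare();

    // ...then only change the parameter value for each lookup
    foreach (var id in idsToLookUp)
    {
        idParam.Value = id;
        var value = select.ExecuteScalar() as string; // null if the Id is missing
        // use value here
    }
}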

Oracle Stored Procedure - Can I empty temp table after cursor is created

I have a web service API that takes a list of item IDs as an input parameter and a data table as an output parameter (among other parameters irrelevant to this question). This API calls an Oracle stored procedure within a package to get the content of the output data table.
The stored procedure loops through each item ID and determines an outcome for it. It then uses a temp table to store the result for each item ID (item ID, outcome, sysdate). At the end, a cursor is used to query this temp table and return the results.
My question is that, as time goes by, the content of this table gets too big (millions of records). I know I can have a clean-up process, but I was wondering whether it is acceptable to delete the content after the cursor is opened.
This is a watered-down version of the web service API and stored procedure:
public static EnumGlobal.Errorcode GetOutcomeByItem(string itemIDs, out DataTable dtOutcome, ...)
{
    OracleDbContext dbContext = new OracleDbContext();
    List<OracleParameter> spParams = new List<OracleParameter>();
    DataSet dsOutcome = new DataSet();
    ...
    try
    {
        spParams.Add(new OracleParameter("IPSITEMIDS", OracleDbType.Varchar2, itemIDs, ParameterDirection.Input));
        ...
        spParams.Add(new OracleParameter("CUR_OUT", OracleDbType.RefCursor, ParameterDirection.Output));
        try
        {
            dbContext.Open();
            dbContext.ExecuteStoredProcedure("PKGSOMEQUERY.USPGETOUTCOMEBYITEM", spParams, ref dsOutcome);
        }
        // catch/finally trimmed from this watered-down version
    }
    // catch and return trimmed from this watered-down version
}
PROCEDURE USPGETOUTCOMEBYITEM
(
IPSITEMIDS VARCHAR2,
...
CUR_OUT OUT GETDATACURSOR
)
IS
LVSQUERY VARCHAR2(4000):='';
V_OUTCOME VARCHAR2(5);
V_NEWITEMSLIST VARCHAR2(4000) := REPLACE(IPSITEMIDS, '''', '');
CURSOR cur IS
SELECT REGEXP_SUBSTR(V_NEWITEMSLIST, '[^,]+', 1, LEVEL) V_NEWITEM2 FROM DUAL CONNECT BY instr(V_NEWITEMSLIST, ',',1, LEVEL -1) > 0;
BEGIN
-- Loop through each ITEM ID, determine its outcome, and add the ITEM ID and OUTCOME to the temp table
FOR rec IN cur LOOP
V_NEWITEM := rec.V_NEWITEM2;
...
-- Determine V_OUTCOME
...
INSERT INTO TEMPOUTCOME
(
ITEMID,
OUTCOME,
ORIGINDATE
)
VALUES
(
V_NEWITEM,
V_OUTCOME,
SYSDATE
);
COMMIT;
END LOOP;
LVSQUERY:='SELECT ITEMID, OUTCOME, ORIGINDATE FROM TEMPOUTCOME WHERE ITEMID IN (' || IPSITEMIDS || ')';
OPEN CUR_OUT FOR LVSQUERY;
COMMIT;
-- Can I do this?
-- Delete from temp table all item IDs used in this session, in one shot
-- DELETE FROM TEMPOUTCOME WHERE ITEMID IN (select REGEXP_SUBSTR(IPSITEMIDS, '\''(.*?)\''(?:\,)?', 1, LEVEL, NULL, 1) FROM dual CONNECT BY LEVEL <= REGEXP_COUNT(IPSITEMIDS, '''(?: +)?(\,)(?: +)?''', 1) + 1);
EXCEPTION WHEN OTHERS THEN
PKGHANDLEERROR.USPHANDLEERROR('USPGETOUTCOMEBYITEM', LVIERRORCODE);
OPIERRORCODE:=LVIERRORCODE;
END USPGETOUTCOMEBYITEM;
I haven't really tested that, but from a general Oracle knowledge perspective: as soon as you open a cursor, you are no longer dealing with the stored data; instead you are iterating over a read-consistent snapshot. So I believe it should work, unless there's a huge amount of data and Oracle tries to page the results (not sure if that actually happens, though)...
As a simple/safe option, you can delete the records that are a day/hour/minute old (depending on the utilization).
Also, as a suggestion: if you get SYSDATE once into a variable and use it in your insert, it may be much easier to deal with the data set, as you can then just query by ORIGINDATE.
It will also make the inserts a bit faster.
One more thing to take a look at (maybe even the best option) is Oracle global temporary tables.

Dapper + MSAccess: How to get identifier of inserted row

I am using Dapper with C#, and the back end is MS Access. My DAL method inserts a record into the database. I want to return the unique identifier (or the updated POCO containing the unique identifier) of the inserted row.
I expect my function to be something like the following (I know this does not work; it is just to explain what I want):
public MyPoco Insert(MyPoco myPoco)
{
    var sql = @"INSERT INTO MyTable (Field1, Field2) VALUES (@Field1, @Field2)";
    var param = GetMappedParams(myPoco); // ID property here is null.
    var result = _connection.Query<MyPoco>(sql, param, null, false, null, CommandType.Text).Single();
    return result; // This result now contains the ID that is created by the database.
}
I am from the NHibernate world, and there the POCO updates automatically with NH. If not, we can call the Refresh method and it updates the ID.
I am not aware how to achieve this with Dapper.
I read this question on SO, which is not relevant as it talks about SQL Server.
Another question does not have an accepted answer.
I read this question where the accepted answer explains the pitfalls of using @@Identity.
This is what works for me:
static MyPoco Insert(MyPoco myPoco)
{
    string sql = "INSERT INTO MyTable (Field1, Field2) VALUES (@Field1, @Field2)";
    _connection.Execute(sql, new { myPoco.Field1, myPoco.Field2 });
    myPoco.ID = _connection.Query<int>("SELECT @@IDENTITY").Single();
    return myPoco; // This result now contains ID that is created by database.
}
Note that this will work with an OleDbConnection to the Access database, but it will not work with an OdbcConnection.
Edit re: comment
To ensure that the Connection remains open between the INSERT and the SELECT calls, we could do this:
static void Insert(MyPoco myPoco)
{
    string sql = "INSERT INTO MyTable (Field1, Field2) VALUES (@Field1, @Field2)";
    bool connAlreadyOpen = (_connection.State == System.Data.ConnectionState.Open);
    if (!connAlreadyOpen)
    {
        _connection.Open();
    }
    _connection.Execute(sql, new { myPoco.Field1, myPoco.Field2 });
    myPoco.ID = _connection.Query<int>("SELECT @@IDENTITY").Single();
    if (!connAlreadyOpen)
    {
        _connection.Close();
    }
    return; // (myPoco now contains the ID that is created by the database.)
}
Just a couple of extra thoughts: if the @@IDENTITY pitfalls are an issue, then another option would be to create a new GUID ahead of time in code and then insert that GUID with the rest of the data, rather than letting Access create the identity value when it creates the new record.
I appreciate that this will only work if your particular situation allows a GUID primary key for the table, but it does guarantee that you know the true value of the key for the record you just inserted.
Alternatively, if you don't want a GUID key, you could create a table with a single row that holds the current seed value for any manually managed keys in your application. You then manually increment that seed's value each time you want to insert a new record. As with the GUID approach, you'd then manually insert the ID with the record; this time the ID would be the newly incremented seed you just retrieved.
Again, that should guarantee you a unique key for each insert, although now you are doing a read and two writes for each insert.
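A rough sketch of that seed-table idea with Dapper; the KeySeed table, its columns, and the transaction handling here are assumptions for illustration, not something established in this thread:
// requires Dapper and System.Data.OleDb; assumes a single-row-per-table seed table:
//   KeySeed (SeedName TEXT, NextValue LONG)
static MyPoco InsertWithSeed(MyPoco myPoco)
{
    _connection.Open();
    using (var tran = _connection.BeginTransaction())
    {
        // read the next key and bump the seed inside one transaction
        int nextId = _connection.Query<int>(
            "SELECT NextValue FROM KeySeed WHERE SeedName = @name",
            new { name = "MyTable" }, tran).Single();

        _connection.Execute(
            "UPDATE KeySeed SET NextValue = NextValue + 1 WHERE SeedName = @name",
            new { name = "MyTable" }, tran);

        // insert the row with the key we just reserved
        _connection.Execute(
            "INSERT INTO MyTable (ID, Field1, Field2) VALUES (@ID, @Field1, @Field2)",
            new { ID = nextId, myPoco.Field1, myPoco.Field2 }, tran);

        tran.Commit();
        myPoco.ID = nextId;
    }
    _connection.Close();
    return myPoco;
}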

How to get the last number in a database column then increment it to include in another record?

I have been given the task of rewriting an old work application from classic ASP to ASP.NET that includes a database table without an auto-incremented primary key. We want to continue to use this table to maintain database integrity (it also has 80,000+ records!). The problem I am running into is that I need to pull the last item from the ID column of the table, regardless of how old the record is, increment that number, and then include it in the new record to be inserted as the new record's ID number. How would I go about doing this? I have tried ListItem, DataReader, DataTables, generic Lists (as objects), and ArrayLists. I can pull the information and store it, but I cannot get the last item in the collection by itself. Any suggestions would be appreciated.
protected void GetPrimaryKey()
{
    string strSQL = "";
    try
    {
        OleDbConnection dbConn = new OleDbConnection();
        dbConn.ConnectionString = System.Web.Configuration.WebConfigurationManager.ConnectionStrings["ConnectionString"].ToString();
        strSQL = "SELECT observationID FROM Observation";
        OleDbCommand myCmd = new OleDbCommand(strSQL, dbConn);
        OleDbDataReader reader;
        ListItem item;
        if (dbConn.State == ConnectionState.Closed) dbConn.Open();
        reader = myCmd.ExecuteReader();
        while (reader.Read())
        {
            item = new ListItem();
            item.Text = reader["observationID"].ToString();
        }
        reader.Close();
        dbConn.Close();
        myCmd.Dispose();
    }
    catch
    {
        // error handling trimmed from the original post
        throw;
    }
}
Populating the list is where this code is at. The last item still needs to be found, then incremented, and then returned to the submit button event handler that starts this whole process. I know this code is missing a lot, but I didn't want to send my entire commented mess. Again, any help is appreciated. Thank you.
SELECT TOP 1 ObservationId FROM Observation ORDER BY ObservationId DESC
This will return the last row's ID.
If more than one person tries to get this value to insert, you will run into an issue where you end up with the same IDs, unless that column has a unique constraint, in which case the second insert will throw an error.
To minimize issues, you can do an inline select in your insert statement:
INSERT INTO Observation (ObservationId) SELECT TOP 1 ObservationId + 1 FROM Observation ORDER BY ObservationId DESC
Not sure if my syntax is completely correct, but it should lead you in the right direction.
Try getting the max observation ID with a SQL statement:
SELECT MAX(observationID) FROM Observation
Then increment it.
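In C# with OleDb that could look roughly like this (reusing the connection string name from the question's code):
// requires System.Data.OleDb and System.Web.Configuration
int nextId;
using (var dbConn = new OleDbConnection(
    System.Web.Configuration.WebConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString))
using (var myCmd = new OleDbCommand("SELECT MAX(observationID) FROM Observation", dbConn))
{
    dbConn.Open();
    object result = myCmd.ExecuteScalar();
    int lastId = (result == null || result == DBNull.Value) ? 0 : Convert.ToInt32(result);
    nextId = lastId + 1; // the ID to use for the new record
}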
SELECT MAX(observationID) FROM Observation
will always return the max value, regardless of how old the record is.
Just ask for the current identity value of the table:
SELECT IDENT_CURRENT('table_name');
Enjoy xD.
Why don't you do your query like this?
SELECT TOP 1 observationID FROM Observation ORDER BY observationID DESC
If you have some sort of parameters or configuration table, I suggest you store the last value there and retrieve/update it each time you do an insert. This will prevent any issues in case you have 2 or more clients trying to insert a new record at the same time.

SQL Insert one row or multiple rows data?

I am working on a console application to insert data into a MS SQL Server 2005 database. I have a list of objects to be inserted. Here I use an Employee class as an example:
List<Employee> employees;
What I can do is insert one object at a time, like this:
foreach (Employee item in employees)
{
    string sql = @"INSERT INTO Mytable (id, name, salary)
                   values ('@id', '@name', '@salary')";
    // replace @par with values
    cmd.CommandText = sql; // cmd is IDbCommand
    cmd.ExecuteNonQuery();
}
Or I can build a bulk insert query like this:
string sql = #"INSERT INTO MyTable (id, name, salary) ";
int count = employees.Count;
int index = 0;
foreach (Employee item in employees)
{
sql = sql + string.format(
"SELECT {0}, '{1}', {2} ",
item.ID, item.Name, item.Salary);
if ( index != (count-1) )
sql = sql + " UNION ALL ";
index++
}
cmd.CommandType = sql;
cmd.ExecuteNonQuery();
I guess the latter case is going to insert all the rows of data at once. However, if I have several thousand rows of data, is there any limit on the length of the SQL query string?
I am not sure whether one insert with multiple rows is better than multiple inserts with one row each, in terms of performance.
Any suggestions to do it in a better way?
Actually, the way you have it written, your first option will be faster.
Your second example has a problem in it. You are doing sql = sql + etc. This is going to cause a new string object to be created for each iteration of the loop (check out the StringBuilder class). Technically, you are going to be creating a new string object in the first example too, but the difference is that it doesn't have to copy all the information from the previous string over.
The way you have it set up, SQL Server is going to have to potentially evaluate a massive query when you finally send it, which is definitely going to take some time to figure out what it is supposed to do. I should state that this depends on how many inserts you need to do. If n is small, you are probably going to be OK, but as it grows your problem will only get worse.
Bulk inserts are faster than individual ones due to how SQL Server handles batch transactions. If you are going to insert data from C#, you should take the first approach and wrap, say, every 500 inserts in a transaction and commit it, then do the next 500, and so on. This also has the advantage that if a batch fails, you can trap it, figure out what went wrong, and re-insert just those rows. There are other ways to do it, but that would definitely be an improvement over the two examples provided.
var iCounter = 0;
IDbTransaction tran = null;
foreach (Employee item in employees)
{
    if (iCounter == 0)
    {
        // start a new batch transaction on the connection
        tran = cmd.Connection.BeginTransaction();
        cmd.Transaction = tran;
    }
    string sql = @"INSERT INTO Mytable (id, name, salary)
                   values ('@id', '@name', '@salary')";
    // replace @par with values
    cmd.CommandText = sql; // cmd is IDbCommand
    cmd.ExecuteNonQuery();
    iCounter++;
    if (iCounter >= 500)
    {
        tran.Commit(); // commit every 500 inserts
        iCounter = 0;
    }
}
if (iCounter > 0)
    tran.Commit();
In MS SQL Server 2008 you can create a table-valued user-defined type (UDT) that matches your table:
CREATE TYPE MyUdt AS TABLE (Id int, Name nvarchar(50), salary int)
Then you can use this UDT in your stored procedures and your C# code to do batch inserts.
SP:
CREATE PROCEDURE uspInsert
(@MyTvp AS MyUdt READONLY)
AS
INSERT INTO [MyTable]
SELECT * FROM @MyTvp
C# (imagine that the records you need to insert are already contained in the table "MyTable" of the DataSet ds):
using (conn)
{
    SqlCommand cmd = new SqlCommand("uspInsert", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter myParam = cmd.Parameters.AddWithValue(
        "@MyTvp", ds.Tables["MyTable"]);
    myParam.SqlDbType = SqlDbType.Structured;
    myParam.TypeName = "dbo.MyUdt";
    // Execute the stored procedure
    cmd.ExecuteNonQuery();
}
So, this is the solution.
Finally, I want to warn you against using code like yours (building up a string and then executing that string), because this way of executing is open to SQL injection.
Look at this thread; I've answered there about table-valued parameters.
Bulk copy is usually faster than doing inserts on your own.
If you still want to do it in one of your suggested ways, you should make it so that you can easily change the size of the queries you send to the server. That way you can optimize for speed in your production environment later on. Query times may vary a lot depending on the query size.
The batch size for a SQL Server query is listed as 65,536 * network packet size. The network packet size is 4 KB by default but can be changed. Check out the Maximum Capacity article for SQL 2008 to get the scope. SQL 2005 also appears to have the same limit.
