SQL Bulk Upload - C#

Is there a way to bulk upload data into a SQL table without running an INSERT for each row? I'm using MySQL.
I currently have a program that constantly pulls live data off the internet and stores it in a database. When doing this, it erases a table and then puts the new data in. When it does this though, it does it a row at a time. This would be OK, except that other programs pull that data at asynchronous times, and as such there is no guarantee that they pick up the complete table. Sometimes they will pull 10 rows, 20, etc. If there is a way I could insert all rows at once, so that other programs will pull 0 rows (just after the table is erased) or all rows, that would be awesome.
Thanks, and any thoughts much appreciated!

If your MySQL table type is InnoDB, just use a transaction. Then all the updates will become visible at the same time when you commit.
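For illustration, a minimal sketch of that approach using the MySql.Data ADO.NET connector; the table name live_data and its columns are placeholders, not taken from the question:

    // Minimal sketch, assuming the MySql.Data connector and an InnoDB table;
    // "live_data" and its columns are placeholders.
    using System.Collections.Generic;
    using MySql.Data.MySqlClient;

    public static class LiveDataWriter
    {
        public static void ReplaceContents(string connectionString, IEnumerable<string[]> rows)
        {
            using (var conn = new MySqlConnection(connectionString))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    // Delete and re-insert inside a single transaction: readers see
                    // either the old rows or the new rows, never a partial table.
                    using (var delete = new MySqlCommand("DELETE FROM live_data", conn, tx))
                        delete.ExecuteNonQuery();

                    using (var insert = new MySqlCommand(
                        "INSERT INTO live_data (col1, col2) VALUES (@c1, @c2)", conn, tx))
                    {
                        insert.Parameters.Add("@c1", MySqlDbType.VarChar);
                        insert.Parameters.Add("@c2", MySqlDbType.VarChar);
                        foreach (var row in rows)
                        {
                            insert.Parameters["@c1"].Value = row[0];
                            insert.Parameters["@c2"].Value = row[1];
                            insert.ExecuteNonQuery();
                        }
                    }

                    tx.Commit(); // all rows become visible at this point
                }
            }
        }
    }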

You can use LOAD DATA INFILE to quickly load the table from a file.
If you are concerned about clients getting a partial view of the data, the easiest way to avoid that would be to do your load into a temp table, then do some table renaming to make the change appear to be instantaneous.
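For example, a rough sketch of the staging-table-plus-rename idea driven from C# with the MySql.Data connector; the table names, column layout, and file path are placeholders, and LOAD DATA INFILE requires the file to be readable by the MySQL server:

    // Sketch only: load into a staging table, then swap it in with an atomic RENAME TABLE.
    using MySql.Data.MySqlClient;

    public static class TableSwapper
    {
        public static void SwapInNewData(string connectionString, string csvPath)
        {
            using (var conn = new MySqlConnection(connectionString))
            {
                conn.Open();

                // 1. Rebuild the staging table and bulk-load it from the file.
                Exec(conn, "DROP TABLE IF EXISTS live_data_staging");
                Exec(conn, "CREATE TABLE live_data_staging LIKE live_data");
                Exec(conn, "LOAD DATA INFILE '" + csvPath + "' INTO TABLE live_data_staging " +
                           "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");

                // 2. RENAME TABLE is atomic, so readers see either the old or the new table.
                Exec(conn, "RENAME TABLE live_data TO live_data_old, live_data_staging TO live_data");
                Exec(conn, "DROP TABLE live_data_old");
            }
        }

        private static void Exec(MySqlConnection conn, string sql)
        {
            using (var cmd = new MySqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }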

Continuing from Eric's alternate solution: another way to do it, instead of a temp table, is to use effective dates. The SQL that accesses the table would change so that it selects the most recent date for each piece of data. If a query runs halfway through an update, the select will grab half new data and half old data. Once the new data has fully arrived, you can feel free to delete the old data.
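A rough sketch of what the query side might look like under this scheme, assuming each logical row has an item_id key and an effective_date column (both names are made up for illustration):

    // Assumed schema: live_data(item_id, effective_date, ...); all names are placeholders.
    public static class EffectiveDateQueries
    {
        // Readers always take the most recent version of each piece of data.
        public const string SelectMostRecent = @"
            SELECT d.*
            FROM live_data d
            JOIN (SELECT item_id, MAX(effective_date) AS latest
                  FROM live_data
                  GROUP BY item_id) m
              ON m.item_id = d.item_id AND d.effective_date = m.latest";

        // Once a new batch has fully arrived, older versions can be purged.
        public const string PurgeOldVersions =
            "DELETE FROM live_data WHERE effective_date < @currentBatchDate";
    }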

How to perform Update and Insert in SQL Server 2012 using SQL Bulk insert C# and ignore the duplicate values if already present in database

Summary
I have a requirement to modify the contents of a database based on an input .txt file. Each file modifies thousands of records in the database, and each file represents one business transaction (around 50 transactions will be performed daily).
My application reads that .txt file and performs the modifications to the data in a SQL Server database.
The current application imports the data from the DB, performs the data modifications in memory (in a DataTable), and afterwards pushes the results back into a SQL Server 2008 database table using SqlBulkCopy.
Does anyone know of a way to use SqlBulkCopy while preventing duplicate rows, without a primary key? Or any suggestion for a different way to do this?
Already implemented and dropped due to performance issues:
Before this I was generating SQL statements automatically for the data modifications, but that was really slow, so I thought of loading the complete database table into a DataTable in C# memory, performing the lookups and modifications there, and accepting the changes to that in-memory copy ...
Here is one more approach I could implement; please give me some feedback on it and correct me if I am wrong ...
Steps
Load the database table into a C# DataTable (fill the DataTable using a SqlDataAdapter).
Once the DataTable is in memory, perform the data modifications on it.
Load the base table from the database again, compare it in memory, and prepare the records that do not yet exist.
Finally, push those records to the database table using bulk insert.
I can't have a primary key!
Please give me any suggestions on this workflow, and tell me whether I am taking the right approach to my problem.
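For illustration only, a rough sketch of steps 3-4 under these constraints, assuming SQL Server, a placeholder table dbo.Records, and treating the combination of all columns as the duplicate key (since there is no primary key):

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Linq;

    public static class DeltaLoader
    {
        public static void InsertNewRowsOnly(string connectionString, DataTable modified)
        {
            // Step 3: reload the base table.
            var existing = new DataTable();
            using (var conn = new SqlConnection(connectionString))
            using (var adapter = new SqlDataAdapter("SELECT * FROM dbo.Records", conn))
            {
                adapter.Fill(existing);
            }

            // Build a set of "fingerprints" of existing rows (all columns concatenated),
            // since there is no primary key to compare on.
            var seen = new HashSet<string>(
                existing.Rows.Cast<DataRow>()
                        .Select(r => string.Join("|", r.ItemArray)));

            // Keep only the rows that are not already in the table.
            var toInsert = modified.Clone();
            foreach (DataRow row in modified.Rows)
            {
                if (!seen.Contains(string.Join("|", row.ItemArray)))
                    toInsert.ImportRow(row);
            }

            // Step 4: push the non-existing rows with SqlBulkCopy.
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Records" })
                    bulk.WriteToServer(toInsert);
            }
        }
    }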

Azure SQL handling table locks for huge inserts

In my application, I have a Windows service (hosted as a Quartz job on an Azure web app) which runs after a particular time, reads a file, and inserts the data. The data can be of any length, so the column type in the DB is "text". All records are inserted into one table.
The problem is that the service might run in parallel and try to insert records into the table at the same time. Since the data might be huge and I also care about performance, I want to let the service run in parallel. I am using EF 6.0 and LINQ. Is there a way to insert huge amounts of data without locking the table?
Note: bulk insert might not work, as the data to insert includes the 'text' type as well.
I am assuming that there is one column of type "text" in your table and nothing else. Based on the above assumptions, I would think that there would not be fundamental semantic problems if you want to run this in parallel.
There are a couple of ways of doing this:
Use the SqlBulkCopy class with row batches. This will copy batches of rows and load them into the table, and it can be done in parallel (see the sketch below).
Use a temporary staging table for each service instance, and then run INSERT INTO ... SELECT for each staging table. The INSERT portion will run serially because it takes an X lock on the table, but this may still be faster than loading one row at a time.
Neither of these methods uses EF 6.0 and LINQ, but they will get the job done.
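A minimal sketch of option 1, assuming a destination table dbo.ImportedData with a single large text column (the table, column, and batch size are placeholders):

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public static class ParallelLoader
    {
        public static void BulkLoad(string connectionString, IEnumerable<string> records)
        {
            var table = new DataTable();
            table.Columns.Add("Payload", typeof(string));   // maps to the text/nvarchar(max) column

            foreach (var record in records)
                table.Rows.Add(record);

            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.ImportedData";
                bulk.BatchSize = 5000;          // commit rows in batches
                bulk.BulkCopyTimeout = 0;       // no timeout for large loads
                bulk.ColumnMappings.Add("Payload", "Payload");
                bulk.WriteToServer(table);      // each parallel service instance can run its own copy
            }
        }
    }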
Hope this helps!

Efficient Update of Table from One SQL Server to Another, Same Table Structure

I have one database server, acting as the main SQL Server, containing a table that holds all the data. Other database servers (different instances of SQL Server) come in and out. When they come online, they need to download data from the main table (for a given time period); they then generate their own additional data into the same table in their local SQL Server database, and then want to update the main server with only the new data, using a C# program run by a scheduled service every so often. Multiple additional servers could be generating data at the same time, although it's not going to be that many.
The main table will always be online. The additional non-main database table is not always online, and should not be an identical copy of the main one: first it will contain a subset of the main data, then it generates its own additional data into the local table and updates the main table every so often with its changes. A decent number of rows could be generated and/or downloaded, so an efficient algorithm is needed to copy from the extra database to the main table.
What is the most efficient way to transfer this in C#? SqlBulkCopy doesn't look like it will work because I can't have duplicate entries on the main server, and it would fail the constraint checks since some entries already exist.
You could do it in the DB or in C#. In either case you must do something like Using FULL JOINs to Compare Datasets. You know that already.
The most important thing is to do it in a transaction. If you have 100k rows, split them into 1,000 rows per transaction, or try to determine what number of rows per transaction is best for you.
Use Dapper. It's really fast.
If you have all your data in C#, use a TVP (table-valued parameter) to pass it to a DB stored procedure. In the stored procedure, use MERGE to UPDATE/DELETE/INSERT the data (see the sketch below).
And lastly, in C# use Dictionary<TKey, TValue> or something else with O(1) access time.
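For the TVP route, a minimal sketch of the C# side; the table type dbo.MainTableType and the procedure dbo.UpsertMainTable (which would contain the MERGE) are assumed names, not existing objects:

    using System.Data;
    using System.Data.SqlClient;

    public static class ChangePusher
    {
        public static void PushChanges(string connectionString, DataTable rows)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("dbo.UpsertMainTable", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;

                // Pass the DataTable as a table-valued parameter.
                var tvp = cmd.Parameters.AddWithValue("@Rows", rows);
                tvp.SqlDbType = SqlDbType.Structured;
                tvp.TypeName = "dbo.MainTableType";

                conn.Open();
                cmd.ExecuteNonQuery();   // the procedure's MERGE does the INSERT/UPDATE/DELETE
            }
        }
    }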
SqlBulkCopy is the fastest way to insert data into a table from a C# program. I have used it to copy data between databases, and so far nothing beats it speed-wise. Here is a nice generic example: Generic bulk copy.
I would use an IsProcessed flag in the main server's table and keep track of the main table's primary keys when you download data to the local DB server. Then you should be able to do a delete and update against the main server again.
Here's how I would do it:
Create a stored procedure on the main database which receives a user-defined table type parameter with the same structure as the main table.
It should do something like:
INSERT INTO yourtable SELECT * FROM @tablevar
Or you could use the MERGE statement for insert-or-update functionality.
In code (a Windows service), load all (or part) of the data from the secondary table and send it to the stored procedure as a table-valued parameter.
You could do this in batches of 1,000 or so, and each time a batch is uploaded, mark it in the source table / source-updater code.
Can you use linked servers for this? If so, it will make copying data to and from the main server much easier.
When copying data back to the main server, I'd use IF EXISTS before each INSERT statement to additionally make sure there are no duplicates, and wrap all the INSERT statements in a transaction so that if an error occurs the transaction is rolled back (sketched below).
I also agree with the others on doing this in batches of 1,000 or so records, so that if something goes wrong you can limit the damage.
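A rough sketch of that idea, combining the IF EXISTS guard with batched transactions; the table dbo.MainData, its columns, and the batch size of 1,000 are placeholders:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    public static class BatchedUploader
    {
        public static void UploadInBatches(string connectionString, DataTable rows, int batchSize = 1000)
        {
            const string upsertSql = @"
                IF NOT EXISTS (SELECT 1 FROM dbo.MainData WHERE Id = @Id)
                    INSERT INTO dbo.MainData (Id, Value) VALUES (@Id, @Value);";

            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                for (int start = 0; start < rows.Rows.Count; start += batchSize)
                {
                    using (var tx = conn.BeginTransaction())
                    {
                        int end = Math.Min(start + batchSize, rows.Rows.Count);
                        for (int i = start; i < end; i++)
                        {
                            using (var cmd = new SqlCommand(upsertSql, conn, tx))
                            {
                                cmd.Parameters.AddWithValue("@Id", rows.Rows[i]["Id"]);
                                cmd.Parameters.AddWithValue("@Value", rows.Rows[i]["Value"]);
                                cmd.ExecuteNonQuery();
                            }
                        }
                        tx.Commit();   // if anything throws, disposing the transaction rolls the batch back
                    }
                }
            }
        }
    }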

Limiting Number of Rows Inserted into a SQL Server Database

I have a C# program in Visual Studio that runs a main form.
That main form exports data into tables in a SQL database via stored procedures. It exports a lot of data (600,000+ rows).
I have a problem, though. On my main form I need a "database write-out interval": a number specifying how many rows will be imported into the database.
My problem, however, is how to implement that interval. The main form runs, and when the main program is done, SQL still takes in data for another 5-10 minutes.
Therefore, if I close the main form, the rest of the data will not be imported.
Do you professional programmers out there know a way I can communicate with SQL so that it only imports data for a user-specified interval?
This has to be done from my C# class.
I don't know where to begin.
I don't think a timer would be a good idea, because different computers and CPUs perform differently. Any advice would be appreciated.
If the data is in a fixed format (i.e., there are going to be the same columns for every row and it's not going to change much), you should look at BULK INSERT. It's incredibly fast at inserting large numbers of rows.
The basics are that you write your data out to a text file (e.g., CSV, but you can specify whatever delimiter you want), then execute a BULK INSERT command against the server. One of the arguments is the path to the file you wrote out. It's a bit of a pain to use because you have to write the file to a folder on the server (or a UNC path that the server has access to), which means configuring Windows shares or setting up FTP on the server. It sounds like exactly what you want to use, though.
Here's the MSDN documentation on BULK INSERT:
http://msdn.microsoft.com/en-us/library/ms188365.aspx
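A rough sketch of that flow from C#; the share path, table name, and delimiter are placeholders, and the path has to be one the SQL Server service account can read:

    using System.Data.SqlClient;
    using System.IO;
    using System.Linq;

    public static class BulkInserter
    {
        public static void BulkInsertRows(string connectionString, string[][] rows)
        {
            // A share (or server-local folder) that the SQL Server instance can read.
            const string filePath = @"\\dbserver\import\export.txt";

            // 1. Write a delimited file (pick a delimiter that never appears in the data).
            File.WriteAllLines(filePath, rows.Select(r => string.Join("|", r)));

            // 2. Ask SQL Server to load it.
            const string sql = @"
                BULK INSERT dbo.ExportedData
                FROM '\\dbserver\import\export.txt'
                WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.CommandTimeout = 0;   // large loads can exceed the default 30-second timeout
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }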
Instead of exporting all of your data to SQL and then trying to abort or manage the load, a better process might be to split your load into smaller chunks (10,000 records or so) and check whether the user wants to continue after each load. This gives you a lot more flexibility and control over the load than dumping all 600,000 records into SQL and trying to manage the process.
Also, what Tim Coker mentioned is spot on. Even if your stored proc does some data manipulation, it is a lot faster to load the data via bulk insert and run a query after the load to do whatever work you have to do than to run all 600,000 records through the stored proc.
Like all the other comments before, I suggest you use bulk insert. You will be amazed at how fast it is with large datasets, and perhaps your concept of an interval will no longer be required. Inserting 100k records may take only seconds.
Depending on how your code is written, ADO.NET has native support for bulk insert through SqlBulkCopy; see the article below:
http://www.knowdotnet.com/articles/bulkcopy_intro1.html
If you have been using LINQ to SQL in your code, there is already some clever code written as an extension method on the DataContext which transforms the LINQ changeset into a DataSet and internally uses ADO.NET to achieve the bulk insert:
http://blogs.microsoft.co.il/blogs/aviwortzel/archive/2008/05/06/implementing-sqlbulkcopy-in-linq-to-sql.aspx

Maintain a local copy of a table from an external database table, ADO.NET

We have built an application which needs a local copy of a table from another database. I would like to write an ADO.NET routine which will keep the local table in sync with the master. Using .NET 2.0, C#, and ADO.NET.
Please note I really have no control over the master table which is in a third party, mission critical app I don't wish to mess with.
For example, here is the master data table:
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
OtherUneededField int
OtherUneededField2 int
The local table we need to keep in sync...
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
Perhaps a better way to frame this question is: what have you done in the past for this type of problem? What has worked best for you, or what should be avoided at all costs?
My goal with this question is to determine a good way to handle this. Often I am combining data from two or more disjointed data sources. I haven't specified database platforms for this reason; it really shouldn't matter. In the current situation both databases are MSSQL, but I'd prefer the solution not use linked databases or DTS, etc.
Sure, truncating the local table and refilling it each time from the master is an option, but with thousands of rows I don't think this is very efficient. Do you?
EDIT: First, recognize that what you are doing is hand-rolled replication and replication is never simple.
You need to track and apply all of the CRUD state changes. That said, ADO.NET can do this.
To track changes to the source you can use Query Notification with your source database. This requires special permission against the database so the owner of the source database will need to take action to enable this solution. I haven't used this technique myself, but here is a description of it.
See "Query Notifications in SQL Server (ADO.NET)"
Query notifications were introduced in Microsoft SQL Server 2005 and the System.Data.SqlClient namespace in ADO.NET 2.0. Built upon the Service Broker infrastructure, query notifications allow applications to be notified when data has changed. This feature is particularly useful for applications that provide a cache of information from a database, such as a Web application, and need to be notified when the source data is changed.
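A minimal sketch of wiring up such a notification with SqlDependency, assuming Service Broker is enabled on the source database and the caller has the required query-notification permissions; the class and table names are placeholders, while the columns come from the example above. Note the query must use a two-part table name and an explicit column list:

    using System.Data;
    using System.Data.SqlClient;

    public class ProjectCodeWatcher
    {
        private readonly string _connectionString;

        public ProjectCodeWatcher(string connectionString)
        {
            _connectionString = connectionString;
            SqlDependency.Start(_connectionString);   // opens the notification listener
            Subscribe();
        }

        private void Subscribe()
        {
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand(
                "SELECT ProjectCodeId, ProjectCode, ProjectDescrip FROM dbo.ProjectCodes", conn))
            {
                var dependency = new SqlDependency(cmd);
                dependency.OnChange += (s, e) =>
                {
                    // The source table changed: re-sync the local copy, then re-subscribe
                    // (a SqlDependency fires only once).
                    Subscribe();
                };

                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    // Executing the command registers the subscription; consume the results here.
                    while (reader.Read()) { /* refresh the local cache from the reader */ }
                }
            }
        }
    }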
To apply changes from the source db table you need to retrieve the data from the target db table, apply the changes to the target rows and post the changes back to the target db.
To apply the changes you can either
1) Delete and reinsert all of the rows (simple), or
2) Merge row-by-row changes (hard).
Delete and reinsert is self explanatory, so I won't go into detail on that.
For row-by-row change tracking here is an approach. (I am assuming here that Query Notification doesn't give you row-by-row change information, so you have to calculate it.)
You need to determine which rows were modified and identify inserted and deleted rows. Create a DataView with a sort for each table, so you get a Find method you can use to look up matching rows by ID.
Identify modified rows by using a datetime/timestamp column, or by comparing all field values. Copy modified values to the target row.
Identify added and deleted rows by looping over the respective table DataViews and using the Find method of the other DataView to identify rows that do not appear in the first table. Insert or delete rows from the target table as required. (The Delete method doesn't remove the row but marks it for deletion by the TableAdapter Update.)
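For the row-by-row path, a minimal sketch of the DataView-based compare described above, keyed on ProjectCodeId from the example tables (the field-by-field comparison is simplified here):

    using System.Data;

    public static class LocalTableMerger
    {
        public static void MergeIntoLocal(DataTable source, DataTable local)
        {
            var sourceView = new DataView(source, "", "ProjectCodeId", DataViewRowState.CurrentRows);
            var localView  = new DataView(local,  "", "ProjectCodeId", DataViewRowState.CurrentRows);

            // Modified or inserted rows: walk the source and look each row up in the local table.
            foreach (DataRowView src in sourceView)
            {
                int i = localView.Find(src["ProjectCodeId"]);
                if (i < 0)
                {
                    // Not present locally: insert.
                    local.Rows.Add(src["ProjectCodeId"], src["ProjectCode"], src["ProjectDescrip"]);
                }
                else
                {
                    // Present: copy across the current values (a real implementation would
                    // compare first, or use a timestamp column).
                    DataRow target = localView[i].Row;
                    target["ProjectCode"]    = src["ProjectCode"];
                    target["ProjectDescrip"] = src["ProjectDescrip"];
                }
            }

            // Deleted rows: anything local that no longer exists in the source.
            for (int i = localView.Count - 1; i >= 0; i--)
            {
                if (sourceView.Find(localView[i]["ProjectCodeId"]) < 0)
                    localView[i].Row.Delete();   // marked for deletion; the adapter Update removes it
            }

            // A TableAdapter / DataAdapter Update(local) call would then persist these changes.
        }
    }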
Good luck!
+tom
I would push in the direction of having the application that inserts the data insert into one db/table and then the other in the same function. Make the application do the work; the data will already be in both databases.
Some questions: what DB platform? How are you using the data?
I'm going to assume you're just using this data as a lookup... and since you have no timestamp and no ability to modify the existing table, I'd just blow away the local copy periodically and pull it down from the master table again.
Unless you've got a hell of a lot of data, the overhead for this should be pretty small.
If you need to sync back to the master table, you'll need to do something a bit more exotic.
Can you use SQL replication? This would be preferable to writing code to do it, no?
