I have a Windows Service application that receives a stream of data with the following format
IDX|20120512|075659|00000002|3|AALI |Astra Agro Lestari Tbk. |0|ORDI_PREOPEN|12 |00000001550.00|00000001291.67|00001574745000|00001574745000|00500|XDS1BXO1| |00001574745000|ݤ
IDX|20120512|075659|00000022|3|ALMI |Alumindo Light Metal Industry Tbk. |0|ORDI |33 |00000001300.00|00000001300.00|00000308000000|00000308000000|00500|--U3---2| |00000308000000|õÄ
This data comes in millions of rows, in sequence (00000002 ... 00198562), and I have to parse it and insert it into a database table in that order.
My question is: what is the most efficient way to insert this data into my database? I have tried a simple approach: open a SqlConnection, build a string of SQL INSERT statements, and execute it with a SqlCommand object, but this method takes too long.
I read that I can use SQL BULK INSERT, but it has to read from a text file. Is it possible to use BULK INSERT in this scenario? (I have never used it before.)
Thank you
Update: I'm aware of SqlBulkCopy, but it requires me to build a DataTable first. Is that good for performance? If possible I want to insert directly from my data source into SQL Server without using an in-memory DataTable.
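For reference, this is roughly what my current (slow) approach looks like; the table name, columns, and field indexes are simplified placeholders, not my real schema:

// Simplified sketch of what I am doing now: concatenate INSERT statements into
// one script and execute it with a single SqlCommand. "Quotes" and the field
// indexes are placeholders for my real table and parsing logic.
using System.Data.SqlClient;
using System.Globalization;
using System.Text;

class CurrentSlowApproach
{
    static void Insert(string connectionString, string[] lines)
    {
        var script = new StringBuilder();
        foreach (string line in lines)
        {
            string[] f = line.Split('|');
            script.AppendFormat(
                "INSERT INTO Quotes (Sequence, Symbol, Price) VALUES ({0}, '{1}', {2});\n",
                int.Parse(f[3]),
                f[5].Trim(),
                decimal.Parse(f[10], CultureInfo.InvariantCulture));
        }

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(script.ToString(), conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();   // this single huge script is the part that takes too long
        }
    }
}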
If you are writing this in C# you might want to look at the SqlBulkCopy class.
Lets you efficiently bulk load a SQL Server table with data from another source.
First, download the free LumenWorks.Framework.IO.Csv library.
Second, use code like this:
StreamReader sr = new StreamReader(yourStream);
var sbc = new SqlBulkCopy(connectionString);
sbc.DestinationTableName = "YourTable";   // the target table must already exist
sbc.WriteToServer(new LumenWorks.Framework.IO.Csv.CsvReader(sr, false, '|'));
Yeah, it is really that easy.
You can use SSIS (SQL Server Integration Services) to move data from a source data flow to a destination data flow.
The source can be a text file and the destination can be a SQL Server table. The transfer executes in bulk insert mode.
Summary
I have a requirement to modify data in a database based on an input .txt file; each file modifies thousands of records, and each file is one business transaction (around 50 transactions are performed daily).
My application reads that .txt file and applies the modifications to the data in a SQL Server database.
The current application imports the data from the database, performs the modifications in memory (in a DataTable), and then pushes the changes back using SqlBulkCopy into a SQL Server 2008 database table.
Does anyone know of a way to use SqlBulkCopy while preventing duplicate rows, without a primary key? Or any suggestion for a different way to do this?
Already implemented and dropped because of performance issues:
Before this I was generating SQL statements automatically for the data modifications, but that was really slow, so I thought of loading the complete database table into a DataTable in C# memory, performing the lookups and modifications there, and accepting the changes in memory ...
Here is one more approach I am considering; please give me feedback on it and correct me if I am wrong.
Steps
Load the database table into a C# DataTable (fill the DataTable using a SqlDataAdapter).
Once the DataTable is in memory, perform the data modifications on it.
Load the base table from the database again, compare it with the in-memory data, and prepare the records that don't exist yet.
Finally, push those new records to the database using bulk insert (SqlBulkCopy); a rough sketch of this workflow is shown below.
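Here is a rough sketch of what I have in mind; "MyTable" and "BusinessKey" are placeholders, not my real schema:

// Rough sketch of the planned workflow. Since the table has no primary key,
// the compare step assumes some column (or combination of columns) can
// identify a row.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

class WorkflowSketch
{
    static void Run(string connectionString)
    {
        // 1. Load the table into a DataTable
        var table = new DataTable();
        using (var da = new SqlDataAdapter("SELECT * FROM MyTable", connectionString))
        {
            da.Fill(table);
        }

        // 2. Perform the data modifications on the in-memory table (from the .txt file) ...

        // 3. Reload the base table and keep only the rows that don't exist yet
        var baseTable = new DataTable();
        using (var da = new SqlDataAdapter("SELECT * FROM MyTable", connectionString))
        {
            da.Fill(baseTable);
        }
        var existingKeys = new HashSet<string>(
            baseTable.AsEnumerable().Select(r => r.Field<string>("BusinessKey")));
        DataRow[] newRows = table.AsEnumerable()
            .Where(r => !existingKeys.Contains(r.Field<string>("BusinessKey")))
            .ToArray();

        // 4. Bulk insert only the new rows
        if (newRows.Length > 0)
        {
            using (var bcp = new SqlBulkCopy(connectionString))
            {
                bcp.DestinationTableName = "MyTable";
                bcp.WriteToServer(newRows);
            }
        }
    }
}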
I can't have a primary key!
Please give me any suggestions about my workflow, and tell me whether I am taking the right approach to this problem.
I have a text file that contains about a million records. What is the best way to insert them into a SQL Server database from C#?
Can I use BULK INSERT?
The best way is to use the bcp utility or an SSIS workflow. Those tools have refinements like caching and batching that you will miss in a naive implementation. The next best option is the BULK INSERT statement, as long as the SQL Server engine itself can reach the file. The last option would be the SqlBulkCopy class, which lets your app read the file, optionally process and transform it, and then feed the data to SqlBulkCopy through an IDataReader.
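For example, if the file sits on a path the SQL Server service account can read, BULK INSERT can be issued from C# along these lines; the file path, table name, and field terminator here are only illustrative assumptions:

// Hypothetical sketch: issue a BULK INSERT from C#. The server process, not the
// application, reads the file, so the path must be accessible to SQL Server.
using System.Data.SqlClient;

class BulkInsertSketch
{
    static void Load(string connectionString)
    {
        const string sql = @"
            BULK INSERT dbo.MyTable
            FROM 'C:\data\records.txt'
            WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK)";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0;   // a million rows can take a while
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}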
I recently worked on the same kind of problem, and I realized there are a couple of solutions to it.
I wrote a batch program (for batch program design read this - http://msdn.microsoft.com/en-us/magazine/cc164014.aspx).
You can use the SQL Server utilities BCP.exe or OSQL.exe, or the .NET Framework's SqlBulkCopy class.
I ended up using BCP (I had a CSV file and used a format file to load the data) and OSQL (I used OSQL where I had to supply a file to the stored proc).
I also used the .NET Process class and its OutputDataReceived event to log all of BCP.exe's output to the console (read this - http://msdn.microsoft.com/en-us/library/system.diagnostics.process.outputdatareceived.aspx); this worked pretty well.
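Something along these lines; the bcp arguments (server, database, table, data file, format file) are placeholders for illustration:

// Hypothetical sketch: launch bcp.exe and capture its output asynchronously
// via OutputDataReceived. Server, table, and file names are placeholders.
using System;
using System.Diagnostics;

class BcpRunner
{
    static int Run()
    {
        var psi = new ProcessStartInfo
        {
            FileName = "bcp.exe",
            Arguments = "MyDb.dbo.MyTable in C:\\data\\records.csv -S myServer -T -f records.fmt",
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (var proc = new Process { StartInfo = psi })
        {
            proc.OutputDataReceived += (s, e) => { if (e.Data != null) Console.WriteLine(e.Data); };
            proc.ErrorDataReceived  += (s, e) => { if (e.Data != null) Console.Error.WriteLine(e.Data); };
            proc.Start();
            proc.BeginOutputReadLine();
            proc.BeginErrorReadLine();
            proc.WaitForExit();
            return proc.ExitCode;   // non-zero usually means bcp failed
        }
    }
}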
I also tried the SqlBulkCopy class, but it can be slow if you load the data into a DataTable first (http://msdn.microsoft.com/en-us/library/ex21zs8x.aspx); if you use an IDataReader (http://msdn.microsoft.com/en-us/library/434atets.aspx) it can be fast.
Since I had millions of rows, I tried using CsvReader (http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader), which is pretty fast. But down the line there were too many problems with data conversion, and I did not have much flexibility on the SQL Server side.
I ended up using BCP and OSQL.
Is using C# a requirement? The fastest way is to use the bcp command line tool: http://msdn.microsoft.com/en-us/library/ms162802.aspx
I'm importing data from a remote MySQL server. I'm connecting to the MySQL database through an SSH connection and then pulling the data into MS SQL Server. There are a couple of type checks that need to be performed, especially converting the MySQL DATETIME to the MS SQL DateTime. Initially I thought about using the MySqlDataReader to read the data into a List<T> to ensure correct types, then pushing the data into a DataSet, and then into MS SQL Server.
Is this a good approach or should I be looking into doing this a different way? I can certainly do a bulk insert into SQL Server but then I'll have to deal with the data types later.
Thoughts?
I personally wouldn't use a DataSet in the process; reading each row into a .NET type and then using a parameterized SQL statement will work just fine.
If you have a very large set, you might think of looking at a bulk insert, but that will depend on the size of the set.
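A minimal sketch of that parameterized approach, assuming a MySqlDataReader as the source and made-up table and column names:

// Hypothetical sketch: read from MySQL, convert types, and insert into
// SQL Server with a parameterized command. Table and column names are
// placeholders, not a real schema.
using System;
using System.Data;
using System.Data.SqlClient;
using MySql.Data.MySqlClient;

class MySqlToMsSql
{
    static void Copy(string mySqlCs, string msSqlCs)
    {
        using (var src = new MySqlConnection(mySqlCs))
        using (var dest = new SqlConnection(msSqlCs))
        {
            src.Open();
            dest.Open();

            using (var select = new MySqlCommand("SELECT Id, Name, CreatedAt FROM source_table", src))
            using (MySqlDataReader reader = select.ExecuteReader())
            using (var insert = new SqlCommand(
                "INSERT INTO DestTable (Id, Name, CreatedAt) VALUES (@id, @name, @createdAt)", dest))
            {
                insert.Parameters.Add("@id", SqlDbType.Int);
                insert.Parameters.Add("@name", SqlDbType.NVarChar, 100);
                insert.Parameters.Add("@createdAt", SqlDbType.DateTime);

                while (reader.Read())
                {
                    insert.Parameters["@id"].Value = reader.GetInt32(0);
                    insert.Parameters["@name"].Value = reader.GetString(1);
                    // Guard dates outside the range SQL Server's DATETIME can store
                    // (its minimum is 1753-01-01).
                    DateTime created = reader.GetDateTime(2);
                    insert.Parameters["@createdAt"].Value =
                        created < new DateTime(1753, 1, 1) ? (object)DBNull.Value : created;
                    insert.ExecuteNonQuery();
                }
            }
        }
    }
}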
Here's a Microsoft guideline for going from MySQL to SQL Server 2000:
http://technet.microsoft.com/en-us/library/cc966396.aspx
SQL Server has a rich set of tools and utilities to ease the migration from MySQL. SQL Server 2000 Data Transformation Services (DTS) is a set of graphical tools and programmable objects for extraction, transformation, and consolidation of data from disparate sources into single or multiple destinations.
Based on this article, you can import your MySQL data without writing a line of C#.
You know, if it's a process you need to perform but you aren't limited to writing your own code, you might want to look at Talend. It's an open-source tool for ETL (essentially data transforms between data sources).
It has a nice GUI for designing the transform: where things come from and where they go, plus what happens in the middle.
http://www.talend.com/index.php
Just a thought, but if you're just trying to reach a goal as opposed to write the tool, it may be quicker and more flexible in the long run for you.
The easiest way is to convert it to a Timestamp.
Timestamp SetupStart_ts = rs.getTimestamp("SetupStart");
String SetupStart = SetupStart_ts.toString();
Push it to the MS SQL server straight away and it will be saved automatically as a datetime, but verify it. Thank you.
I am facing a problem: I want to pass a DataSet to a SQL Server stored procedure, and I have no idea how to do it; there is (I think) no alternative solution. Let me explain what I want ...
I have an Excel file to read. I read it successfully, and all the data from this Excel workbook is imported into a DataSet. Now this data needs to be inserted into two different tables, and there are too many rows in the Excel workbook to insert them one by one from code-behind. That's why I want to pass this DataSet to a stored procedure and then ........
Please suggest a solution.
Not knowing what database version you're working with, here are a few hints:
if you need to read the Excel file regularly, and split it up into two or more tables, maybe you need to use something like SQL Server Integration Services for this. With SSIS, you should be able to achieve this quite easily
you could load the Excel file into a temporary staging table, and then read the data from that staging table inside your stored procedure. This works, but it gets a bit messy when there's a chance that multiple concurrent calls need to be handled
if you're using SQL Server 2008 and up, you should look at table-valued parameters - you basically load the Excel file into a .NET DataSet and pass that to the stored proc as a special parameter (see the sketch after this list). Works great, but wasn't available in SQL Server before the 2008 release
since you're using SQL Server 2005 and table-valued parameters aren't available, you might want to look at Erland Sommarskog's excellent article Arrays and Lists in SQL Server 2005 - depending on how big your data set is, one of his approaches might work for you (e.g. passing as XML which you parse/shred inside the stored proc)
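If table-valued parameters are an option (SQL Server 2008 and up), a sketch of that approach could look like the following; the table type name, procedure name, and columns are assumptions for illustration only:

// Hypothetical sketch of passing a DataTable as a table-valued parameter.
// Assumes a user-defined table type and a stored procedure like:
//   CREATE TYPE dbo.ImportRowType AS TABLE (Name nvarchar(100), Amount decimal(18,2));
//   CREATE PROCEDURE dbo.ImportRows @rows dbo.ImportRowType READONLY AS ...
using System.Data;
using System.Data.SqlClient;

class TvpExample
{
    static void Send(string connectionString, DataTable rows)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.ImportRows", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;

            SqlParameter p = cmd.Parameters.AddWithValue("@rows", rows);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.ImportRowType";   // must match the table type on the server

            conn.Open();
            cmd.ExecuteNonQuery();              // the proc splits the rows into the two tables
        }
    }
}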
Using C# (vs2005) I need to copy a table from one database to another. Both database engines are SQL Server 2005. For the remote database, the source, I only have execute access to a stored procedure to get the data I need to bring locally.
I have more control over the local database, as it's used by the ASP.NET application that needs a local copy of this remote table. We would like it local for easier lookups and joins with other tables, etc.
Could you please explain to me an efficient method of copying this data to our local database.
The local table can be created with the same schema as the remote one, if it makes things simpler. The remote table has 9 columns, none of which are identity columns. There are approximately 5400 rows in the remote table, and this number grows by about 200 a year. So not a quickly changing table.
Perhaps SqlBulkCopy; use SqlCommand.ExecuteReader to get the reader that you use in the call to SqlBulkCopy.WriteToServer. This is the same as bulk insert, so very quick. It should look something like this (untested):
using (SqlConnection connSource = new SqlConnection(csSource))
using (SqlCommand cmd = connSource.CreateCommand())
using (SqlBulkCopy bcp = new SqlBulkCopy(csDest))
{
    bcp.DestinationTableName = "SomeTable";
    cmd.CommandText = "myproc";
    cmd.CommandType = CommandType.StoredProcedure;
    connSource.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        bcp.WriteToServer(reader);
    }
}
The Bulk Copy feature of ADO.NET might help you; take a look at these:
MSDN - Multiple Bulk Copy Operations (ADO.NET)
An example article
I would first look at using SQL Server Integration Services (SSIS, formerly Data Transformation Services (DTS)).
It is designed for moving/comparing/processing/transforming data between databases, and IIRC it allows an arbitrary expression for the source. You would need it installed on your database server (that shouldn't be a problem; it is part of a default install).
Otherwise, for a code solution, given the small data size, pull all the data from the remote system into an internal structure and then look for rows which don't exist locally and insert them.
If you can avoid it at all, DON'T do this with a program. If you have any way of talking to someone who controls the source server, see if they will set up some sort of export of the data. If the data is as small as you say, then XML or CSV output would be 100x better than writing something in C# (or any language).
So let's assume they can't export; still, avoid writing a program. You say you have more control over the destination. Can you set up an SSIS package, or set up a linked server? If so, you'll have a much easier time migrating the data.
If, at a bare minimum, you set up the source as a linked server, you could write a small T-SQL batch:
TRUNCATE TABLE DestTable
INSERT INTO DestTable
SELECT * FROM [SourceServer].[SourceDatabase].[Schema].[Table]
It wouldn't be as nice as SSIS (SSIS gives you a better visual of what's happening), but the T-SQL above is pretty clear.
Since I would not take the programming route, the best solution I could give you would be, if you absolutely had to:
Use the SqlClient namespace.
Create two SqlConnections and two SqlCommands, and get a SqlDataReader from the source command.
Iterate through the source reader, and for each row execute the destination insert SqlCommand with the values just read (see the sketch below).
It'll be ugly, but it'll work.
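A minimal sketch of that approach, assuming the remote proc is called "myproc" and with the destination table and columns made up for illustration:

// Hypothetical sketch: read rows via the remote stored procedure and insert
// them one by one locally. "myproc", "DestTable", and the columns are placeholders.
using System.Data;
using System.Data.SqlClient;

class RowByRowCopy
{
    static void Copy(string csSource, string csDest)
    {
        using (var connSource = new SqlConnection(csSource))
        using (var connDest = new SqlConnection(csDest))
        {
            connSource.Open();
            connDest.Open();

            using (var srcCmd = new SqlCommand("myproc", connSource))
            {
                srcCmd.CommandType = CommandType.StoredProcedure;

                using (SqlDataReader reader = srcCmd.ExecuteReader())
                using (var insert = new SqlCommand(
                    "INSERT INTO DestTable (Col1, Col2) VALUES (@c1, @c2)", connDest))
                {
                    insert.Parameters.Add("@c1", SqlDbType.NVarChar, 100);
                    insert.Parameters.Add("@c2", SqlDbType.Int);

                    while (reader.Read())
                    {
                        insert.Parameters["@c1"].Value = reader.GetValue(0);
                        insert.Parameters["@c2"].Value = reader.GetValue(1);
                        insert.ExecuteNonQuery();   // one round trip per row: ugly, but it works
                    }
                }
            }
        }
    }
}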
It doesn't seem like a huge quantity of data you have to synchronize. Under the conditions you described (only a stored procedure to access the remote DB, and no way to get anything else), you can go for Marc Gravell's solution.
If the data can only grow and existing data cannot be changed, you can compare the record counts in the remote and local databases to optimize the operation; if there is no change in the remote DB, there is no need to copy.