How can I do this? I have about 10,000 records in an Excel file and I want to insert all of them as fast as possible into an Access database.
Any suggestions?
What you can do is something like this (VB.NET, using the Jet OLE DB provider; this example imports from a text file, but the same SELECT INTO pattern works with the Excel ISAM driver, e.g. [Excel 8.0;HDR=YES;Database=C:\Book1.xls].[Sheet1$]):

Dim AccessConn As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Test Files\db1 XP.mdb")
AccessConn.Open()
' One SELECT INTO statement pulls the whole external file into a new Access table.
Dim AccessCommand As New System.Data.OleDb.OleDbCommand("SELECT * INTO [ReportFile] FROM [Text;DATABASE=C:\Documents and Settings\...\My Documents\My Database\Text].[ReportFile.txt]", AccessConn)
AccessCommand.ExecuteNonQuery()
AccessConn.Close()
Switch off the indexing on the affected tables before starting the load, then rebuild the indexes from scratch after the bulk load has finished. Rebuilding an index from scratch is faster than keeping it up to date while loading large amounts of data into a table.
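In Access/Jet SQL that might look roughly like this (the index, table, and column names here are placeholders, not from the question):

```sql
-- Drop the index before the bulk load (names are hypothetical).
DROP INDEX idxCustomerName ON Customers;

-- ... run the bulk INSERT / SELECT INTO here ...

-- Rebuild the index from scratch once the load is done.
CREATE INDEX idxCustomerName ON Customers (CustomerName);
```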
If you choose to insert row by row, then you may want to consider using transactions: open a transaction, insert 1000 records, commit, and repeat. This should work fine.
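A rough C# sketch of that pattern, assuming an OleDb connection to the Access file and a hypothetical two-column target table (the connection string, table/column names, and the `rowsFromExcel` source are all made up for illustration):

```csharp
using System.Data.OleDb;

// Hypothetical connection string; adjust the path to your .mdb file.
using (var conn = new OleDbConnection(
    @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Test Files\db1 XP.mdb"))
{
    conn.Open();
    OleDbTransaction tx = conn.BeginTransaction();
    int count = 0;

    foreach (var row in rowsFromExcel) // however you read the workbook
    {
        using (var cmd = new OleDbCommand(
            "INSERT INTO MyTable (Col1, Col2) VALUES (?, ?)", conn, tx))
        {
            // OleDb uses positional parameters, so the names are ignored.
            cmd.Parameters.AddWithValue("?", row.Col1);
            cmd.Parameters.AddWithValue("?", row.Col2);
            cmd.ExecuteNonQuery();
        }

        // Commit every 1000 rows and start a new transaction.
        if (++count % 1000 == 0)
        {
            tx.Commit();
            tx = conn.BeginTransaction();
        }
    }

    tx.Commit(); // commit the final partial batch
}
```

Committing in batches avoids the per-row transaction overhead without holding one giant transaction open for all 10,000 inserts.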
Use the default data import features in Access. If that does not suit your needs and you want to use C#, use standard ADO.NET and simply write record-for-record. 10K records should not take too long.
Problem Statement: The requirement is straightforward: we have a flat file (CSV, basically) which we need to load into one of the tables in a SQL Server database. The problem arises when we have to derive a new column (not present in the flat file) and populate it along with the rest of the columns from the file.
The derivation logic for the new column is: find the maximum "TransactionDate".
The entire exercise is to be performed in SSIS. We were hoping to get it done with a Data Flow Task, but we are stuck on how to derive the new column and add it to the destination flow.
Ideas:
Use a Data Flow Task to read the file and store it in a recordset, so that in the Control Flow we could use a Script Task to read it as a DataTable, use LINQ (or similar) to determine the max value, and push it through another Data Flow to be consumed by the SQL table (but this, I guess, would require creating a table type in the database, which I would rather avoid).
Perform the entire operation in the Data Flow Task itself, which would need an asynchronous transformation (to buffer all the data and find the max value).
We are out of ideas here; any lead would be much appreciated, and do let us know if any further information is required.
Run a Data Flow Task to insert the data into your destination table. Follow that up with an Execute SQL Task that calculates MAX(TransactionDate) from the values in the table and back-fills the rows whose MaxTransactionDate is NULL (or whatever your new-record indicator is).
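The Execute SQL Task could then run something like this (table and column names are assumptions, since the question doesn't give them):

```sql
-- Back-fill the derived column for the rows just loaded by the Data Flow Task.
UPDATE dbo.MyDestinationTable
SET MaxTransactionDate = (SELECT MAX(TransactionDate)
                          FROM dbo.MyDestinationTable)
WHERE MaxTransactionDate IS NULL;
```

The NULL check acts as the new-record indicator, so re-running the package only touches the freshly loaded rows.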
I'd like to transfer a large amount of data from SQL Server to MongoDB (around 80 million records) using a solution I wrote in C#.
I want to transfer, say, 200,000 records at a time, but my problem is keeping track of what has already been transferred. Normally I'd do it as follows:
Gather IDs from destination to exclude from source scope
Read from source (Excluding IDs already in destination)
Write to destination
Repeat
The problem is that I build a string in C# containing all the IDs that exist in the destination, for the purpose of excluding those from the source selection, e.g.
select * from source_table where id not in (<My large list of IDs>)
Now you can imagine what happens when I have already inserted 600,000+ records: the string of IDs gets huge and slows things down even more. So I'm looking for a way to iterate through, say, 200,000 records at a time, like a cursor, but I have never done something like this, so I'm here looking for advice.
Just as a reference, I do my reads as follows
SqlConnection conn = new SqlConnection(myConnStr);
conn.Open();
SqlCommand cmd = new SqlCommand("select * from mytable where id not in (" + bigListOfIDs + ")", conn);
SqlDataReader reader = cmd.ExecuteReader();
if (reader.HasRows)
{
    while (reader.Read())
    {
        // Populate objects for insertion into MongoDB
    }
}
So basically, I want to know how to iterate through large amounts of data without selecting all that data in one go, or having to filter the data using large strings. Any help would be appreciated.
Need more rep to comment, but if you sort by your id column you could change your WHERE clause to become

select * from source_table where *lastusedid* < id and id <= *lastusedid + 200000*

which gives you the range of 200,000 you asked for, and you only need to store a single integer.
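Sketched in C#, that keyset-style paging on the id column could look like the loop below (the connection string and table name are placeholders, and the id column is assumed to be a BIGINT; tracking the last id seen also copes with gaps in the id sequence):

```csharp
long lastId = 0;
const int batchSize = 200000;

using (var conn = new SqlConnection(myConnStr))
{
    conn.Open();
    while (true)
    {
        int rowsRead = 0;
        using (var cmd = new SqlCommand(
            "SELECT TOP (@batch) * FROM source_table " +
            "WHERE id > @lastId ORDER BY id", conn))
        {
            cmd.Parameters.AddWithValue("@batch", batchSize);
            cmd.Parameters.AddWithValue("@lastId", lastId);

            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    rowsRead++;
                    // Remember where we got to (assumes id is BIGINT).
                    lastId = reader.GetInt64(reader.GetOrdinal("id"));
                    // Populate objects for insertion into MongoDB
                }
            }
        }
        if (rowsRead == 0) break; // no more rows to transfer
    }
}
```

Because each query seeks on the indexed id column instead of scanning a giant NOT IN list, the cost per batch stays roughly constant no matter how many records have already been transferred.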
There are many different ways of doing this, but I would suggest first that you don't try to reinvent the wheel: look at existing programs.
There are many programs designed to export and import data between different databases. Some are very flexible and expensive, others have free options, and most DBMS products include something.
Option 1:
Use SQL Server Management Studio (SSMS) Export wizards.
This allows you to export to different destinations. You can even write complex queries if required. More information here:
https://www.mssqltips.com/sqlservertutorial/202/simple-way-to-export-data-from-sql-server/
Option 2:
Export your data in ascending ID order.
Store the last exported ID in a table.
Export the next set of data where ID > lastExportedID
Option 3:
Create a copy of your data in a back-up table.
Export from this table, and delete the records as you export them.
I'm deleting data (through a C# app) from a database which is about 1.8 GB.
The same operation on smaller databases (~600 MB) runs without problems, but on the big one I get:
Lock wait timeout exceeded; try restarting transaction.
Will increasing innodb_lock_wait_timeout fix the problem, or is there another way?
I don't think optimizing the queries is a solution, because there is no way to make them simpler.
I'm deleting parts of the data based on conditions and relations, not all the data.
You can split the delete statement into smaller parts that won't time out.
For example: delete the rows with IDs 1 to 1,000, execute and commit, then do the same for IDs 1,001 to 2,000, and so on.
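A minimal sketch of that batching, assuming an integer primary key `id` and a hypothetical filter condition standing in for your real conditions and relations:

```sql
-- Delete in ranges of the primary key so each transaction stays small.
DELETE FROM my_table
WHERE id BETWEEN 1 AND 1000
  AND some_condition = 1;
COMMIT;

DELETE FROM my_table
WHERE id BETWEEN 1001 AND 2000
  AND some_condition = 1;
COMMIT;

-- Alternatively, MySQL lets you repeat a LIMITed delete
-- until it affects zero rows:
DELETE FROM my_table WHERE some_condition = 1 LIMIT 1000;
```

Each small transaction releases its row locks quickly, which is what prevents the lock wait timeout.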
You mentioned that you were '...deleting parts of the data based on some conditions and relations, not all the data'. I would check that there are appropriate indexes on all the keys you are using to filter the data to delete.
If you were to show us your schema and where clause we could suggest ones that may help.
You should also consider splitting your delete into multiple batches of smaller numbers of rows.
Another alternative is to do a SELECT INTO a new table with only the data you want to keep, drop the original, and then rename the new table.
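Since the question mentions InnoDB, the MySQL flavour of that approach would be roughly the following (table name and keep-condition are placeholders):

```sql
-- Copy only the rows you want to keep into a new table.
CREATE TABLE my_table_keep AS
SELECT * FROM my_table
WHERE keep_condition = 1;

-- Swap the new table in for the old one.
DROP TABLE my_table;
RENAME TABLE my_table_keep TO my_table;
```

Note that CREATE TABLE ... AS SELECT does not copy indexes or foreign keys, so those would need to be recreated on the new table.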
Right-click the table --> Script Table As --> Create To --> New Query, and save the script.
Then right-click the table --> Delete.
Refresh your database (and IntelliSense) so it forgets the table, then run the saved script, which recreates the table. That's how you end up with an empty table.
Or you can simply increase the setting for innodb_lock_wait_timeout (or table_lock_wait_timeout, I'm not sure which) if you don't want to delete all the data in the table.
If you're deleting all the rows in the table, use
Truncate table *tablename*
The DELETE command logs every row it removes in the transaction log, while TRUNCATE is minimally logged (it deallocates the data pages rather than logging each row), which is why it is so much faster.
We are pulling a huge amount of data from a SQL Server DB: around 25,000 rows with 2,500 columns. The requirement is to read the data and export it to a spreadsheet, so pagination is not an option. With fewer records it is able to pull the data, but at the size mentioned above it throws an exception.
public DataSet Exportexcel(string Username)
{
    Database db = DatabaseFactory.CreateDatabase(Config);
    DbCommand dbCommand = db.GetStoredProcCommand("Sp_ExportADExcel");
    db.AddInParameter(dbCommand, "@Username", DbType.String, Username);
    return db.ExecuteDataSet(dbCommand);
}
Please help me in resolving this issue.
"The requirement is to read the data and export it to a spreadsheet, so pagination is not a choice."
Why not read the data in steps? Instead of getting all records at once, fetch a limited number of records each time and write them to Excel. Continue until you have processed all the records.
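One way to sketch that in C# (using SQL Server 2012+ OFFSET/FETCH paging; the stored procedure is replaced by a plain query here, and the table name, ordering column, and `AppendToSpreadsheet` writer are all hypothetical):

```csharp
const int pageSize = 5000;
int offset = 0;

using (var conn = new SqlConnection(connStr))
{
    conn.Open();
    while (true)
    {
        var page = new DataTable();
        using (var cmd = new SqlCommand(
            "SELECT * FROM dbo.MyWideTable ORDER BY Id " +
            "OFFSET @offset ROWS FETCH NEXT @page ROWS ONLY", conn))
        {
            cmd.Parameters.AddWithValue("@offset", offset);
            cmd.Parameters.AddWithValue("@page", pageSize);
            new SqlDataAdapter(cmd).Fill(page);
        }

        if (page.Rows.Count == 0) break; // all records processed

        AppendToSpreadsheet(page);       // hypothetical writer method
        offset += pageSize;
    }
}
```

Only one page of rows is ever held in memory at a time, which avoids the exception caused by materialising the full 25,000 x 2,500 result set in a single DataSet.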
Your problem is purely down to the fact that you are trying to extract so much data in one go.
You may get around the problem by installing more memory in the machine doing the query, but this is just a bodge.
You're best off retrieving such amounts of data in steps.
You could quite easily read the data back row by row and export/append that in CSV format to a file and this could all be done in a stored procedure.
You don't say what database you are using, but handling such large amounts of data is what database engines are designed to cope with.
Other than that, when handling large quantities of data in C# it's best to use generic collections, as these avoid the boxing that non-generic collections impose and so reduce the memory footprint.
You can use batch-processing logic to fetch records in batches, say 5,000 records per execution, and store the results in a temp DataSet; once all processing is done, dump the data from the temp DataSet to Excel.
You can use C# BulkCopy class for this purpose.
If it is enough to have the data available for Excel as CSV, you can use the bcp bulk copy utility:
bcp "select col1, col2, col3 from database.schema.SomeTable" queryout "c:\MyData.txt" -c -t"," -r"\n" -S ServerName -T
This is magnitudes faster and has a small footprint.
I am working on a feature that exports some tables (~50) to a disk file and imports the file back into the database. Export is quite easy: serialize the DataSet to a file stream. But when importing, the table structure needs to be determined dynamically. What I am doing now:
foreach table in dataset:
    compare the table schema in the db with the imported dataset
    define a batch command
    foreach row in table:
        construct a single INSERT SqlCommand and add it to the batch command
    execute the batch insert command
This is very inefficient, and I also have problems converting data types between the DataSet's DataTable and the database table. Is there a better method?
Edit:
In fact, import and export are two functions (buttons) in the program. On the UI there is a grid listing lots of tables; what I need to implement is exporting the selected tables' data to a disk file and importing the data back into the database later.
Why not use SQL Server's native Backup and Restore functionality? You can do incremental Restores on the data, and it's by far the fastest way to export and then import data again.
There are a lot of very advanced options to take into account some fringe cases, but at its heart, it's two commands: Backup Database and Restore Database.
backup database mydb to disk = 'c:\my\path\to\backup.bak'
restore database mydb from disk = 'c:\my\path\to\backup.bak'
When doing this against TB-sized databases, it takes about 45 minutes to an hour in my experience. Much faster than trying to go through every row!
I'm guessing you are using SQL Server? If so, I would:
a) make sure the table names are showing up in the export
b) look into the SqlBulkCopy class. That will allow you to push in an entire table, so you can loop through the DataTables and bulk copy each one in.
using (SqlBulkCopy copy = new SqlBulkCopy(MySQLExpConn))
{
    // Map the first seven columns one-to-one (source ordinal -> destination ordinal).
    for (int col = 0; col <= 6; col++)
    {
        copy.ColumnMappings.Add(col, col);
    }
    copy.DestinationTableName = ds.Tables[i].TableName;
    copy.WriteToServer(ds.Tables[i]);
}
You can use XML serialization, but you will need a good ORM tool like NHibernate to help you with it. XML serialization will maintain the data types and work flawlessly.
You can read an entire table and serialize all values into an XML file, then read the XML file back into a list of objects and store them in the database. With a good ORM tool you will not need to write any SQL, and it can work across different database servers as well.
I finally chose SqlCommandBuilder to build the insert commands automatically.
See
SqlCommandBuilder Class
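For reference, a minimal SqlCommandBuilder sketch (connection string, table name, and the `importedTable` variable are placeholders): the builder derives the INSERT command from the adapter's SELECT, so nothing is hand-written per table.

```csharp
using (var conn = new SqlConnection(connStr))
using (var adapter = new SqlDataAdapter("SELECT * FROM MyTable", conn))
using (var builder = new SqlCommandBuilder(adapter))
{
    // The builder generates INSERT/UPDATE/DELETE commands
    // from the schema of the adapter's SELECT statement.
    adapter.InsertCommand = builder.GetInsertCommand();

    // Rows deserialized from the file must be in the Added row state
    // for Update to issue INSERTs for them.
    adapter.Update(importedTable);
}
```

Because the SELECT is built from the table name at runtime, the same code handles all ~50 tables in the dataset without per-table INSERT statements.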