I have a stored procedure that returns a large result set (nearly 20 million records). I need to save this result to multiple XML files. I am currently using ADO.NET to fill a DataSet, but it quickly throws a System.OutOfMemoryException. What other methods can I use to accomplish this?
Are you using SQL Server?
If so, there is a SQL clause (FOR XML) that automatically converts the result of a query into an XML structure; you would then get it as a string in the application.
Options:
Split the string into several pieces and save them to files (in the app).
Modify the stored procedure to split the result into several XML objects, return them as separate strings (1 row => 1 object), and save each of them to a file.
Write a new stored procedure that calls the original one, splits the result into X XML objects, and returns X XML strings that you just have to save in the application.
Not using SQL Server?
Do the XML formatting in the stored procedure, or write a new one that does it.
Either way, I think it will be easier to do the XML formatting server side.
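If you do go the FOR XML route, the XML can be streamed from the command straight to disk without ever building a DataSet. A minimal C# sketch, assuming a hypothetical stored procedure dbo.usp_GetRecordsAsXml that ends in a FOR XML query; the connection string and output path are placeholders:

using System.Data;
using System.Data.SqlClient;
using System.Xml;

using (var conn = new SqlConnection(connectionString))              // connectionString is a placeholder
using (var cmd = new SqlCommand("dbo.usp_GetRecordsAsXml", conn))   // hypothetical SP ending in FOR XML
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandTimeout = 0;                                          // 0 = no command timeout
    conn.Open();

    var settings = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment, Indent = true };
    using (XmlReader reader = cmd.ExecuteXmlReader())
    using (XmlWriter writer = XmlWriter.Create(@"C:\export\records.xml", settings))
    {
        reader.Read();
        while (!reader.EOF)
            writer.WriteNode(reader, true);                          // copy node by node; memory stays flat
    }
}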
Assuming you are using SQL Server - you can use paging in your stored procedure. ROW_NUMBER is an option. SQL Server 2012 and above support OFFSET and FETCH.
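A rough sketch of the paging idea, assuming the stored procedure is changed to take hypothetical @Offset and @PageSize parameters and uses OFFSET ... FETCH internally (SQL Server 2012+). Each page is written to its own XML file, so no single DataTable ever has to hold anywhere near 20 million rows:

using System.Data;
using System.Data.SqlClient;

const int pageSize = 500000;                // tune to what comfortably fits in memory
int offset = 0, fileIndex = 0, rowsRead;

using (var conn = new SqlConnection(connectionString))               // connectionString is a placeholder
{
    conn.Open();
    do
    {
        var page = new DataTable("Record");
        using (var cmd = new SqlCommand("dbo.usp_GetRecordsPaged", conn))   // hypothetical paged SP
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.CommandTimeout = 0;
            cmd.Parameters.AddWithValue("@Offset", offset);
            cmd.Parameters.AddWithValue("@PageSize", pageSize);
            new SqlDataAdapter(cmd).Fill(page);
        }

        rowsRead = page.Rows.Count;
        if (rowsRead > 0)
            page.WriteXml($@"C:\export\records_{fileIndex++}.xml");   // one XML file per page

        offset += pageSize;
    } while (rowsRead == pageSize);         // a short page means the last rows have been read
}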
Also, how many DataTables are you filling? There are row limits for DataTables: the maximum number of rows that a DataTable can store is 16,777,216.
https://msdn.microsoft.com/en-us/library/system.data.datatable.aspx
Problem Statement: The requirement is straightforward: we have a flat file (CSV, basically) which we need to load into one of the tables in a SQL Server database. The problem arises when we have to derive a new column (not present in the flat file) and populate it along with the rest of the columns from the file.
The derivation logic for the new column is: find the max date of "TransactionDate".
The entire exercise is to be performed in SSIS, and we were hoping to get it done with a Data Flow Task, but we are stuck on how to derive the new column and then add it to the destination flow.
Ideas:
Use a Data Flow Task to read the file and store it in a Recordset destination, so that in the Control Flow we could use a Script Task to read it as a DataTable, use something LINQ-like to determine the max value, and push it to another Data Flow to be consumed by the SQL table (but I guess this would require creating a table type in the database, which I would rather avoid).
Perform the entire operation in the Data Flow Task itself, which would need an asynchronous (blocking) transformation to get all the data and find the max value.
We are kind of out of ideas here; any lead would be much appreciated, and do let us know if any further information is required.
Run a Data Flow Task to insert the data into your destination table. Follow that up with an Execute SQL Task that calculates MAX(TransactionDate) and applies it to the rows in the table that still have a NULL (or other new-record indicator) MaxTransactionDate.
I've been working on a VB application which parses multiple XML files and creates an Excel file from them.
The main problem is that I am simply reading each line of each XML file and outputting it to the Excel file when a specific node is found. I would like to know whether there is any way to store the data from each element, so that I only use it once everything (all the XML files) has been parsed.
I was thinking about a database, but I think that is excessive and unnecessary. Maybe you can give me some ideas to make this work.
System.Data.DataSet can be used as an "in memory database".
You can use a DataSet to store information in memory - a DataSet can contain multiple DataTables and you can add columns to those at runtime, even if there are already rows in the DataTable. So even if you don't know the XML node names ahead of time, you can add them as columns as they appear.
You can also use DataViews to filter the data inside the DataSet.
My typical way of pre-parsing XML is to create a two-column DataTable with the XPATH address of each node and its value. You can then do a second pass that matches XPATH addresses to your objects/dataset.
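To make that concrete, here is a small sketch (in C#, easy to translate to VB) of the two-column XPath/value pre-parse plus a DataView filter. The folder path and the "Price" node name in the filter are made-up examples:

using System.Data;
using System.IO;
using System.Linq;
using System.Xml.Linq;

DataTable nodes = new DataTable("Nodes");
nodes.Columns.Add("XPath", typeof(string));
nodes.Columns.Add("Value", typeof(string));

foreach (string file in Directory.GetFiles(@"C:\xml", "*.xml"))      // placeholder folder
{
    XDocument doc = XDocument.Load(file);
    foreach (XElement element in doc.Descendants().Where(e => !e.HasElements))   // leaf nodes only
    {
        // build a simple /root/child/leaf style address for the element
        string path = "/" + string.Join("/", element.AncestorsAndSelf()
                                                    .Reverse()
                                                    .Select(e => e.Name.LocalName));
        nodes.Rows.Add(path, element.Value);
    }
}

// once every file has been parsed, filter/inspect the collected data in one go
DataView prices = new DataView(nodes) { RowFilter = "XPath LIKE '%/Price'" };   // "Price" is a made-up node name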
I have a CSV file with 3 million lines and want to store it in a database using C#. The CSV file looks like "device;date;value".
Shall I write it into an array or directly into a System.Data.DataTable? And what is the fastest way to store this DataTable in a database (SQL Server, for example)?
I tried to store the lines using 3 million INSERT INTO statements, but it was too slow :)
Thanks
You can load the data into a DataTable and then use SqlBulkCopy to copy the data to the table in SQL Server.
The SqlBulkCopy class can be used to write data only to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with an IDataReader instance.
I'd guess bulk copy (SqlBulkCopy) would be pretty fast. Once you have the data in a DataTable you can try
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(yourConnectionString))
{
    bulkCopy.DestinationTableName = "TargetTable";
    bulkCopy.WriteToServer(dataTable);
}
I think the best way is to open a StreamReader and build the rows line by line: use ReadLine in a while loop and Split to get the different parts.
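A sketch that puts this together with the SqlBulkCopy approach above: read the file line by line, Split on ';', add rows to a DataTable, then bulk copy. The destination table name ("Measurements"), the column types and connectionString are assumptions based on the question's "device;date;value" format:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

var table = new DataTable();
table.Columns.Add("device", typeof(string));
table.Columns.Add("date", typeof(DateTime));
table.Columns.Add("value", typeof(double));

using (var reader = new StreamReader(@"C:\data\input.csv"))     // placeholder file path
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] parts = line.Split(';');                       // "device;date;value"
        table.Rows.Add(parts[0], DateTime.Parse(parts[1]), double.Parse(parts[2]));
    }
}

using (var bulkCopy = new SqlBulkCopy(connectionString))        // connectionString is a placeholder
{
    bulkCopy.DestinationTableName = "Measurements";             // assumed destination table
    bulkCopy.BatchSize = 10000;                                 // send in chunks rather than one huge push
    bulkCopy.WriteToServer(table);
}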
Sending 3 million insert statements is bordering on crazy slow!
Batch it by using transactions: read in, for example, 200-1000 lines at a time (the smaller your rows, the more you can read in at once), then, after reading those lines, commit the inserts to the database in one go.
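For completeness, a rough sketch of this batching approach with plain ADO.NET against SQL Server: one parameterized INSERT reused for every row, committed every 1000 rows. The table, column names and connectionString are the same kind of placeholders as in the sketch above:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

using (var conn = new SqlConnection(connectionString))          // connectionString is a placeholder
{
    conn.Open();
    SqlTransaction tx = conn.BeginTransaction();
    var cmd = new SqlCommand(
        "INSERT INTO Measurements (device, date, value) VALUES (@device, @date, @value)", conn, tx);
    cmd.Parameters.Add("@device", SqlDbType.NVarChar, 50);
    cmd.Parameters.Add("@date", SqlDbType.DateTime);
    cmd.Parameters.Add("@value", SqlDbType.Float);

    int count = 0;
    foreach (string line in File.ReadLines(@"C:\data\input.csv"))
    {
        string[] parts = line.Split(';');
        cmd.Parameters["@device"].Value = parts[0];
        cmd.Parameters["@date"].Value = DateTime.Parse(parts[1]);
        cmd.Parameters["@value"].Value = double.Parse(parts[2]);
        cmd.ExecuteNonQuery();

        if (++count % 1000 == 0)                                 // commit in batches instead of per row
        {
            tx.Commit();
            tx = conn.BeginTransaction();
            cmd.Transaction = tx;
        }
    }
    tx.Commit();                                                 // commit the final partial batch
}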
I have a stored procedure that returns a few thousand records. I need to apply XSL to the output of this SP. What would be the best way to do that?
Read the data into a DataSet, use XmlDataDocument, and apply the XSL
Output XML from the SP and apply the XSL to that
For option 2, I am worried that the XML will be too big and the C# code reading it will time out. Please suggest.
You can use the SqlXmlCommand type from the namespace Microsoft.Data.SqlXml.
From MSDN:
SqlXmlCommand cmd = new SqlXmlCommand(ConnString);
cmd.CommandText = "SELECT TOP 20 FirstName, LastName FROM Person.Contact FOR XML AUTO";
cmd.XslPath = "MyXSL.xsl";
cmd.RootTag = "root";
...etc
If you're worried about timing out, just set a larger timeout on your connection string.
We use a similar approach on our platform, but we do not apply the XSL directly in the stored procedure, because we use MS SQL Server Express and it does not allow applying the XSL there. At least that is what I found when looking into this particular subject; a higher edition of MS SQL Server is needed for that kind of transformation.
The stored procedure that outputs the XML is big and filthy (but runs smoothly) and we have no issues reading back the results: a couple of seconds per request at most, usually with a couple of thousand rows in the result and several tables joined. We are using VS2010 and VB.Net with Telerik components.
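If you end up applying the XSL in application code (option 2 in the question), XslCompiledTransform can consume the XmlReader returned by the stored procedure directly, so the whole document never has to sit in a string. A sketch with a hypothetical SP name and placeholder paths, assuming the SP returns a single rooted document (e.g. FOR XML ..., ROOT):

using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

var xslt = new XslCompiledTransform();
xslt.Load(@"C:\xsl\MyXSL.xsl");                                  // placeholder stylesheet path

using (var conn = new SqlConnection(connString))                 // connString is a placeholder
using (var cmd = new SqlCommand("dbo.usp_GetRecordsAsXml", conn))   // hypothetical SP using FOR XML ..., ROOT
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandTimeout = 600;                                    // raise the command timeout for large results
    conn.Open();

    using (XmlReader source = cmd.ExecuteXmlReader())
    using (var output = File.Create(@"C:\out\report.html"))      // placeholder output path
    {
        xslt.Transform(source, null, output);                    // stream the SP's XML through the stylesheet
    }
}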
How can I do this? I have about 10,000 records in an Excel file and I want to insert all the records into an Access database as fast as possible.
Any suggestions?
What you can do is something like this:
Dim AccessConn As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Test Files\db1 XP.mdb")
AccessConn.Open()
Dim AccessCommand As New System.Data.OleDb.OleDbCommand("SELECT * INTO [ReportFile] FROM [Text;DATABASE=C:\Documents and Settings\...\My Documents\My Database\Text].[ReportFile.txt]", AccessConn)
AccessCommand.ExecuteNonQuery()
AccessConn.Close()
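The snippet above pulls in a text file; the same SELECT ... INTO trick works for the Excel workbook in the question via the Excel ISAM. A C# sketch, where the paths, the target table name ImportedRecords, the sheet name Sheet1 and HDR=YES are assumptions about the workbook (Jet 4.0 handles .xls; an .xlsx file would need the ACE provider instead):

using System.Data.OleDb;

using (var conn = new OleDbConnection(
    @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Test Files\db1 XP.mdb"))
{
    conn.Open();
    var cmd = new OleDbCommand(
        @"SELECT * INTO [ImportedRecords] " +                                     // target table (assumed name)
        @"FROM [Excel 8.0;HDR=YES;Database=C:\Test Files\records.xls].[Sheet1$]", conn);
    cmd.ExecuteNonQuery();      // Access creates and fills the table in one statement
}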
Switch off the indexing on the affected tables before starting the load, then rebuild the indexes from scratch after the bulk load has finished. Rebuilding the indexes from scratch is faster than trying to keep them up to date while loading a large amount of data into a table.
If you choose to insert row by row, then you may want to consider using transactions: open a transaction, insert 1000 records, commit the transaction. This should work fine.
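A short sketch of that batching pattern with OleDb against the Access database; the connection string, table and column names are placeholders, and "rows" stands for whatever you have read from the worksheet:

using System.Data.OleDb;

using (var conn = new OleDbConnection(accessConnectionString))   // placeholder connection string
{
    conn.Open();
    OleDbTransaction tx = conn.BeginTransaction();
    var cmd = new OleDbCommand(
        "INSERT INTO [ImportedRecords] ([Device], [Amount]) VALUES (?, ?)", conn, tx);   // assumed table/columns
    cmd.Parameters.Add("@p1", OleDbType.VarWChar, 255);
    cmd.Parameters.Add("@p2", OleDbType.Double);

    int count = 0;
    foreach (var row in rows)                                    // "rows" = whatever you read from the worksheet
    {
        cmd.Parameters[0].Value = row.Device;
        cmd.Parameters[1].Value = row.Amount;
        cmd.ExecuteNonQuery();

        if (++count % 1000 == 0)                                 // commit every 1000 inserts
        {
            tx.Commit();
            tx = conn.BeginTransaction();
            cmd.Transaction = tx;
        }
    }
    tx.Commit();
}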
Use the default data import features in Access. If that does not suit your needs and you want to use C#, use standard ADO.NET and simply write record-for-record. 10K records should not take too long.