Split DataTable without using AsEnumerable() - c#

I am writing a C# script that will be run inside another project that makes use of the C# compiler. This engine does not have System.Data.DataSetExtensions referenced, nor will it be allowed to reference it.
That being said, I still need to take a DataTable of 100,000 rows and break it up into smaller DataTables of 10,000 rows max. Normally I would use DataSetExtensions with something like:
var filteredDataTable = myDataTable.AsEnumerable().Take(10000).CopyToDataTable();
What would be an efficient way to go about this without using DataSetExtensions? Should I be resigned to using a foreach loop and copying over 10,000 rows into new DataTables?

Should I be resigned to using a foreach loop and copying over 10,000
rows into new DataTables?
Yes. You may also consider writing your own extension method to slice the table and reuse it wherever required. Something like this:
public static class DataTableExtensions
{
    // Copies up to rowsCount rows, starting after skipRows, into a new table with the same schema.
    public static DataTable SliceTable(this DataTable dt, int rowsCount, int skipRows = 0)
    {
        DataTable dtResult = dt.Clone();
        for (int i = skipRows; i < dt.Rows.Count && rowsCount > 0; i++)
        {
            dtResult.ImportRow(dt.Rows[i]);
            rowsCount--;
        }
        return dtResult;
    }
}
Usage:
DataTable myData; // original table
var slice1 = myData.SliceTable(1000); // get slice of first 1000 rows
var slice2 = myData.SliceTable(1000,1000); // get rows from 1001 to 2000
var slice3 = myData.SliceTable(1000,2000); // get rows from 2001 to 3000
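With that in place, the original 100,000-row table could be split into 10,000-row pieces in a simple loop. A minimal sketch, assuming the source table is called myDataTable (add a using for System.Collections.Generic):
var chunks = new List<DataTable>();
const int chunkSize = 10000;
for (int skip = 0; skip < myDataTable.Rows.Count; skip += chunkSize)
{
    // each slice holds at most 10,000 rows
    chunks.Add(myDataTable.SliceTable(chunkSize, skip));
}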

Related

Is it feasible to run many dynamic SQL queries on a DataTable object

A DataTable object, say Stocks(Ticker, Sector, Price)
I want to support dynamic queries (i.e. not known at compile time)
like this "Ticker like '%A' and price>7.50 "
I understand that DataView.RowFilter supports a certain degree of SQL expressions,
but the problem is that every time you set .RowFilter, the DataView rebuilds its internal index, and that's too much overhead.
And I'm trying not to query SQL Server because:
1) it's a small dataset that doesn't change much at runtime, so it can stay in memory
2) I want to evaluate tens of thousands of these free-form SQL expressions
What options do I have? Thanks!
UPDATE:
Thanks jp2code for DataTable.Select.
I haven't verified it, but this link http://www.codeproject.com/Tips/807941/DataTable-Select-is-Slow says that building a DataView creates an index which appears to speed up DataTable.Select, so I will just store all data in a DataTable object and create a DataView for each of the columns that will be involved in queries.
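For what it's worth, a minimal (unverified) sketch of the setup described above; the schema matches the Stocks example, and which columns get a DataView is just an assumption:
var stocks = new DataTable("Stocks");
stocks.Columns.Add("Ticker", typeof(string));
stocks.Columns.Add("Sector", typeof(string));
stocks.Columns.Add("Price", typeof(decimal));
// ... load rows ...

// Keeping these views referenced keeps their sort indexes alive.
var tickerView = new DataView(stocks, null, "Ticker", DataViewRowState.CurrentRows);
var priceView = new DataView(stocks, null, "Price", DataViewRowState.CurrentRows);

// Free-form filters are then evaluated with DataTable.Select:
DataRow[] hits = stocks.Select("Ticker LIKE '%A' AND Price > 7.50");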
Call GetData(sqlCmd) below once to load and cache your data, then for each query call the overload GetData(table, sqlFilter):
private DataTable m_table;

public DataTable GetData(string sqlCmd)
{
    // Stub: run sqlCmd once, fill the table, and cache it (e.g. in m_table) for later queries.
    DataTable table = new DataTable();
    return table;
}
public DataTable GetData(DataTable table, string sqlFilter)
{
    if (!String.IsNullOrEmpty(sqlFilter))
    {
        var copy = table.Clone();
        foreach (DataRow row in table.Select(sqlFilter))
        {
            var newRow = copy.NewRow();
            for (int i = 0; i < row.ItemArray.Length; i++)
            {
                newRow[i] = row.ItemArray[i];
            }
            copy.Rows.Add(newRow);
        }
        return copy;
    }
    return table;
}
There may be other ways, but this should get you started.
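Hypothetical usage of the two methods, reusing the column names from the question:
DataTable stocks = GetData("SELECT Ticker, Sector, Price FROM Stocks"); // load once
DataTable result = GetData(stocks, "Ticker LIKE '%A' AND Price > 7.50"); // filter as often as needed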

Insert large amount of data into database using LINQ

I want to insert around 1 million records into a database using LINQ in ASP.NET MVC. But when I tried the following code it didn't work: it throws an OutOfMemoryException, and it also took 3 days in the loop. Can anyone please help me with this?
db.Database.ExecuteSqlCommand("DELETE FROM [HotelServices]");

DataTable tblRepeatService = new DataTable();
tblRepeatService.Columns.Add("HotelCode", typeof(System.String));
tblRepeatService.Columns.Add("Service", typeof(System.String));
tblRepeatService.Columns.Add("Category", typeof(System.String));

foreach (DataRow row in xmltable.Rows)
{
    string[] servicesarr = Regex.Split(row["PAmenities"].ToString(), ";");
    for (int a = 0; a < servicesarr.Length; a++)
    {
        tblRepeatService.Rows.Add(row["HotelCode"].ToString(), servicesarr[a], "PA");
    }

    string[] servicesarrA = Regex.Split(row["RAmenities"].ToString(), ";");
    for (int b = 0; b < servicesarrA.Length; b++)
    {
        tblRepeatService.Rows.Add(row["HotelCode"].ToString(), servicesarrA[b], "RA");
    }
}

HotelAmenties _hotelamenties;
foreach (DataRow hadr in tblRepeatService.Rows)
{
    _hotelamenties = new HotelAmenties();
    _hotelamenties.Id = Guid.NewGuid();
    _hotelamenties.ServiceName = hadr["Service"].ToString();
    _hotelamenties.HotelCode = hadr["HotelCode"].ToString();
    db.HotelAmenties.Add(_hotelamenties);
}
db.SaveChanges();
tblRepeatService table has around 1 million rows.
Bulk inserts like this are highly inefficient in LINQ to SQL / Entity Framework. Every insert creates at least three objects (the DataRow, the HotelAmenties object, and the tracking record for it), chewing up memory on objects you don't need.
Given that you already have a DataTable, you can use System.Data.SqlClient.SqlBulkCopy to push the content of the table to a temporary table on the SQL server, then use a single insert statement to load the data into its final destination. This is the fastest way I have found so far to move many thousands of records from memory to SQL.
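A rough sketch of that pattern, assuming the real table is HotelServices with the same three columns as tblRepeatService (the table name and column types here are assumptions, not taken from the question):
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1) Create a staging table and bulk-load the in-memory DataTable into it.
    using (var create = new SqlCommand(
        "CREATE TABLE #HotelServicesStaging (HotelCode nvarchar(50), Service nvarchar(200), Category nvarchar(10))", conn))
    {
        create.ExecuteNonQuery();
    }
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#HotelServicesStaging" })
    {
        bulk.WriteToServer(tblRepeatService);
    }

    // 2) One set-based insert moves everything into the real table.
    using (var insert = new SqlCommand(
        "INSERT INTO HotelServices (HotelCode, Service, Category) " +
        "SELECT HotelCode, Service, Category FROM #HotelServicesStaging", conn))
    {
        insert.ExecuteNonQuery();
    }
}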
If performance doesn't matter and this is a one-shot job, you can stick with the approach you're using. Your problem is that you're only saving at the end, so Entity Framework has to track and generate the SQL for 1 million operations at once. Modify your code so that you save every 1,000 or so inserts instead of only at the end, and it should work just fine.
int i = 0;
foreach (DataRow hadr in tblRepeatService.Rows)
{
    _hotelamenties = new HotelAmenties();
    _hotelamenties.Id = Guid.NewGuid();
    _hotelamenties.ServiceName = hadr["Service"].ToString();
    _hotelamenties.HotelCode = hadr["HotelCode"].ToString();
    db.HotelAmenties.Add(_hotelamenties);
    if ((i % 1000) == 0)
    {
        db.SaveChanges(); // flush pending inserts roughly every 1,000 rows
    }
    i++;
}
db.SaveChanges(); // save whatever is left after the loop

Update all cells of a datatable in one go in c#

I have a situation where I need my DataRows (string columns) to be HTML-encoded. Can we update all cells of a DataRow in one go, using LINQ or any other way?
Basically I want to avoid loops.
I have a DataTable oDT.
The DataTable has the following columns: id_season, season, modifiedon (date).
To save this DataTable I have a function:
void SaveTable(DataTable oDT)
{
    // Here I have to update the modifiedon column to DateTime.Now
    foreach (DataRow dr in oDT.Rows)
    {
        dr["modifiedon"] = DateTime.Now;
    }
    // I need to avoid this loop as the DataTable can have 35,000+ records
}
You must update every element, so you are looking at O(n) time regardless.
On my system, updating 35,000 elements takes between 0.01 and 0.03 seconds, 350,000 items takes 0.1 to 0.3 seconds, and so on. You can get a small performance boost by getting the value of DateTime.Now outside the loop, like this:
var now = DateTime.Now;
foreach (DataRow dr in oDT.Rows)
{
    dr["ModifiedOn"] = now;
}

Copy DataTable.Rows to DataRow[]. Faster way than Select() with the same functionality?

I now have a problem with a very old system of ours. (It is more than 7 years old, and I have no budget or resources to make bigger changes to its structure, so the decision is to improve the old logic as much as we can.)
We have a grid control we wrote ourselves. Basically it is like a normal ASP.NET grid: you can add, change, and delete elements.
The problem is that the grid has a BindGrid() method where, for further usage, the rows of the data source table are copied into a DataRow[]. I need to keep the DataRow[], but I would like to implement the best way to copy the rows from the table into the array.
The current solution:
DataRow[] rows = DataSource.Select("1=1", SortOrderString);
As far as I have experienced, if I need a specific sort order, that could be the best way (I'm also interested in whether there is a quicker way or not).
BUT there are some simplified pages, where the SortOrder is not needed.
So I could make two method one for the sort order and one for without.
The real problem is the second one:
DataRow[] rows = DataSource.Select("1=1");
Because it is very slow. I made some tests and it is roughly 15 times slower than the CopyTo() solution:
DataRow[] rows = new DataRow[DataSource.Rows.Count];
DataSource.Rows.CopyTo(rows,0);
I would like to use the faster way, BUT when I ran the tests some old function simply crashed. It seems there is another difference, which I only noticed now:
Select() returns the rows as if the row changes had been accepted.
So if I delete a row and do not call AcceptChanges() (which I unfortunately can't do), then with Select("1=1") the row is in the DataSource but not in the DataRow[].
With a simple .CopyTo() the row is there, and that is bad news for me.
My questions are:
1) Is Select("1=1") the best way to get the rows according to their row changes? (I doubt it a bit, because that part of the code is about 6 years old.)
2) And if not, is it possible to achieve the same result as .Select("1=1") in a faster way?
UPDATE:
Here is a very basic test app that I used for speed testing:
DataTable dt = new DataTable("Test");
dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("Name", typeof(string));

for (int i = 0; i < 10000; i++)
{
    DataRow row = dt.NewRow();
    row["Id"] = i;
    row["Name"] = "Name" + i;
    dt.Rows.Add(row);
}
dt.AcceptChanges();

DateTime start = DateTime.Now;
DataRow[] rows = dt.Select();
//DataRow[] rows = new DataRow[dt.Rows.Count];
//dt.Rows.CopyTo(rows, 0);
Console.WriteLine(DateTime.Now - start);
You can call Select without an argument: DataRow[] allRows = DataSource.Select(); That would certainly be more efficient than "1=1", which applies a pointless RowFilter.
Another way is using Linq-To-DataSet to order and filter the DataTable. That isn't more efficient but more readable and maintainable.
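For example, something along these lines (requires a reference to System.Data.DataSetExtensions; the sort column is an assumption):
DataRow[] orderedRows = DataSource.AsEnumerable()
    .OrderBy(row => row.Field<string>("Name"))
    .ToArray();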
I have no example or measurement yet, but it is obvious that a RowFilter of "1=1" is more expensive than none. Select is implemented this way:
public Select(DataTable table, string filterExpression, string sort, DataViewRowState recordStates)
{
    this.table = table;
    this.IndexFields = table.ParseSortString(sort);
    this.indexDesc = Select.ConvertIndexFieldtoIndexDesc(this.IndexFields);
    // the following would be skipped if you used DataSource.Select() without "1=1"
    if (filterExpression != null && filterExpression.Length > 0)
    {
        this.rowFilter = new DataExpression(this.table, filterExpression);
        this.expression = this.rowFilter.ExpressionNode;
    }
    this.recordStates = recordStates;
}
If you want to be able to select also the rows that are currently not accepted, you can use the overload of Select:
DataRow[] allRows = DataSource.Select("", "", DataViewRowState.CurrentRows | DataViewRowState.Deleted);
This will select all rows, including the rows that are deleted, even if AcceptChanges has not been called yet.

C# Optimisation: Inserting 200 million rows into database

I have the following (simplified) code which I'd like to optimise for speed:
long inputLen = 50000000; // 50 million
DataTable dataTable = new DataTable();
DataRow dataRow;
object[] objectRow = new object[4];

while (inputLen-- > 0)
{
    objectRow[0] = ...
    objectRow[1] = ...
    objectRow[2] = ...

    // Generate output for this input
    output = ...

    for (int i = 0; i < outputLen; i++) // outputLen can range from 1 to 20,000
    {
        objectRow[3] = output[i];
        dataRow = dataTable.NewRow();
        dataRow.ItemArray = objectRow;
        dataTable.Rows.Add(dataRow);
    }
}

// Bulk copy
SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
bulkTask.DestinationTableName = "newTable";
bulkTask.BatchSize = dataTable.Rows.Count;
bulkTask.WriteToServer(dataTable);
bulkTask.Close();
I'm already using SqlBulkCopy in an attempt to speed things up, but it appears that assigning values to the DataTable itself proves to be slow.
I don't know how DataTables work, so I'm wondering: am I creating unnecessary overhead by first creating a reusable array, then assigning it to a DataRow, and then adding the DataRow to the DataTable? Or is using a DataTable not optimal in the first place? The input comes from a database.
I don't care much about LOC, just speed. Can anyone give some advice on this?
For such a big table, you should instead use the
public void WriteToServer(IDataReader reader)
method.
It may mean you'll have to implement a "fake" IDataReader yourself in your code (if you don't get the data from an existing IDataReader), but this way you'll get "streaming" from end to end and will avoid a 200-million-row loop.
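Since the input already comes from a database, a hedged sketch of the end-to-end streaming idea is to hand an open SqlDataReader straight to WriteToServer. The connection strings and query below are placeholders, and the per-row transformation from the question would still need a custom IDataReader wrapper around the source reader:
using (var source = new SqlConnection(sourceConnectionString))
using (var destination = new SqlConnection(destinationConnectionString))
{
    source.Open();
    destination.Open();

    using (var cmd = new SqlCommand("SELECT Col1, Col2, Col3, Col4 FROM SourceData", source))
    using (var reader = cmd.ExecuteReader())
    using (var bulkTask = new SqlBulkCopy(destination, SqlBulkCopyOptions.TableLock, null))
    {
        bulkTask.DestinationTableName = "newTable";
        bulkTask.BulkCopyTimeout = 0;    // no timeout for a long-running copy
        bulkTask.WriteToServer(reader);  // rows stream through; no 200-million-row DataTable in memory
    }
}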
Instead of holding a huge DataTable in memory, I would suggest implementing an IDataReader which serves up the data as the bulk copy proceeds. This will reduce the need to keep everything in memory up front, and should thus improve performance.
You should not construct the entire DataTable in memory. Use the overload of WriteToServer that takes an array of DataRow, and just split your data into chunks.
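A minimal sketch of that chunking idea; GenerateRows() stands in for whatever produces the object[] rows in the question, and the schema, chunk size, and table name are assumptions:
const int chunkSize = 10000;
var buffer = new DataTable();
buffer.Columns.Add("Col1", typeof(int));
buffer.Columns.Add("Col2", typeof(string));

using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
    bulk.DestinationTableName = "newTable";

    foreach (object[] values in GenerateRows()) // hypothetical row source
    {
        buffer.Rows.Add(values);
        if (buffer.Rows.Count == chunkSize)
        {
            DataRow[] chunk = new DataRow[buffer.Rows.Count];
            buffer.Rows.CopyTo(chunk, 0);
            bulk.WriteToServer(chunk); // push one chunk to SQL Server
            buffer.Rows.Clear();       // keep memory bounded
        }
    }

    if (buffer.Rows.Count > 0)
    {
        DataRow[] rest = new DataRow[buffer.Rows.Count];
        buffer.Rows.CopyTo(rest, 0);
        bulk.WriteToServer(rest); // remaining rows
    }
}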
