Fastest way to copy a list of ADO.NET DataTables - C#

I have a list of DataTables like
List<DataTable> a = new List<DataTable>();
I want to make a deep copy of this list (i.e. copying each DataTable). My code currently looks like
List<DataTable> aCopy = new List<DataTable>();
for(int i = 0; i < a.Count; i++) {
aCopy.Add(a[i].Copy());
}
The performance is absolutely terrible, and I am wondering if there is a known way to speed up such a copy?
Edit: do not worry about why I have this or need to do this, just accept that it is part of a legacy code base that I cannot change

If you have to copy a DataTable, it is essentially an O(N) operation. If the table is very large and causing a large amount of allocation, you may be able to speed up the operation by doing a section at a time, but you are essentially bounded by the working set.
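One possible reading of "a section at a time" is to copy into several smaller tables instead of one big one, so each copy's working set stays smaller. A minimal sketch, assuming a sectionSize tuning parameter (the total work is still O(N)):
public static IEnumerable<DataTable> CopyInSections(DataTable source, int sectionSize)
{
    for (int start = 0; start < source.Rows.Count; start += sectionSize)
    {
        DataTable section = source.Clone(); // structure only
        int end = Math.Min(start + sectionSize, source.Rows.Count);
        for (int i = start; i < end; i++)
        {
            section.ImportRow(source.Rows[i]); // copies values and row state
        }
        yield return section;
    }
}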

You can try the following - it gave me a performance boost, although your mileage may vary. I've adapted it to your example to demonstrate how to copy a DataTable using an alternative mechanism: clone the table, then stream the data in. You could easily put this in an extension method.
List<DataTable> aCopy = new List<DataTable>();
for(int i = 0; i < a.Count; i++) {
DataTable sourceTable = a[i];
DataTable copyTable = sourceTable.Clone(); //Clones structure only
copyTable.Load(sourceTable.CreateDataReader()); //Streams the data in
aCopy.Add(copyTable);
}
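As a minimal sketch of the extension method mentioned above (FastCopy is just a suggested name):
public static class DataTableCopyExtensions
{
    // Sketch: clone the structure, then stream the rows in via a DataTableReader.
    public static DataTable FastCopy(this DataTable source)
    {
        DataTable copy = source.Clone();
        copy.Load(source.CreateDataReader());
        return copy;
    }
}
The loop body then becomes aCopy.Add(a[i].FastCopy());.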
This was many times faster (around 6x in my use case) than the following:
DataTable copyTable = sourceTable.Clone();
foreach(DataRow dr in sourceTable.Rows)
{
copyTable.ImportRow(dr);
}
Also, if we look at what DataTable.Copy is doing using ILSpy:
public DataTable Copy()
{
IntPtr intPtr;
Bid.ScopeEnter(out intPtr, "<ds.DataTable.Copy|API> %d#\n", this.ObjectID);
DataTable result;
try
{
DataTable dataTable = this.Clone();
foreach (DataRow row in this.Rows)
{
this.CopyRow(dataTable, row);
}
result = dataTable;
}
finally
{
Bid.ScopeLeave(ref intPtr);
}
return result;
}
internal void CopyRow(DataTable table, DataRow row)
{
int num = -1;
int newRecord = -1;
if (row == null)
{
return;
}
if (row.oldRecord != -1)
{
num = table.recordManager.ImportRecord(row.Table, row.oldRecord);
}
if (row.newRecord != -1)
{
if (row.newRecord != row.oldRecord)
{
newRecord = table.recordManager.ImportRecord(row.Table, row.newRecord);
}
else
{
newRecord = num;
}
}
DataRow dataRow = table.AddRecords(num, newRecord);
if (row.HasErrors)
{
dataRow.RowError = row.RowError;
DataColumn[] columnsInError = row.GetColumnsInError();
for (int i = 0; i < columnsInError.Length; i++)
{
DataColumn column = dataRow.Table.Columns[columnsInError[i].ColumnName];
dataRow.SetColumnError(column, row.GetColumnError(columnsInError[i]));
}
}
}
It's not surprising that the operation will take a long time; not only is it row by row, but it also does additional validation.

You should specify the capacity of the list, otherwise it will have to grow internally to accommodate the data. See here for a detailed explanation.
List<DataTable> aCopy = new List<DataTable>(a.Count);

I found the following approach much more efficient than other ways of filtering records, such as LINQ, provided your search criteria are simple:
public static DataTable FilterByEntityID(this DataTable table, int EntityID)
{
table.DefaultView.RowFilter = "EntityId = " + EntityID.ToString();
return table.DefaultView.ToTable();
}
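Usage is then a one-liner (assuming the source table really does have a numeric EntityId column; the variable names below are made up):
// Returns a new DataTable containing only the rows matching the filter.
DataTable filtered = sourceTable.FilterByEntityID(42);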

Related

Copy a range of data rows to another DataTable - C#

I have a DataTable that contains 100 rows, and I want to copy a range of rows (the 31st to the 50th) to another DataTable.
I am following the logic below.
DataTable dtNew = table.Clone();
for(int k=30; k < 50 && k < table.Rows.Count; k++)
{
dtNew.ImportRow(table.Rows[k]);
}
Is there any better approach to do this?
Using LINQ you can do something like:
DataTable dtNew = table.Select().Skip(30).Take(20).CopyToDataTable();
Performance-wise, using LINQ won't do any better; it does, however, make it more readable.
EDIT: Added handling check
int numOfRowsToSkip = 30;
int numOfRowsToTake = 20;
if (table.Rows.Count > numOfRowsToSkip + numOfRowsToTake)
{
DataTable dtNew = table.Select().Skip(numOfRowsToSkip).Take(numOfRowsToTake).CopyToDataTable();
}
If it's about readability, then a good idea would be to throw this into an extension method.
Without changing your logic:
public static class Utils
{
public static void CopyRows(this DataTable from, DataTable to, int min, int max)
{
for (int i = min; i < max && i < from.Rows.Count; i++)
to.ImportRow(from.Rows[i]);
}
}
Then you can always reuse it without all the fancy syntax, and you know that it does exactly what you need if performance is a concern:
DataTable dt1 = new DataTable();
DataTable dt2 = new DataTable();
dt1.CopyRows(dt2, 30, 50);

C# - Specified cast is not valid using DataTable and Field<int>

I have a csv file with 8 columns, and I am trying to populate an object with 8 variables, each being a list to hold the columns in the csv file. Firstly, I am populating a DataTable with my csv data.
I am now trying to populate my object with the data from the DataTable
DataTable d = GetDataTableFromCSVFile(file);
CoolObject l = new CoolObject();
for (int i = 0; i < d.Rows.Count; i++)
{
l.column1[i] = d.Rows[i].Field<int>("column1"); <-- error here
}
And here is my CoolObject
public class CoolObject
{
public List<int> column1 { set; get; }
public CoolObject()
{
column1 = new List<int>();
}
}
Unfortunately I am receiving an error on the highlighted line:
System.InvalidCastException: Specified cast is not valid
Why is this not allowed? How do I work around it?
Your DataTable evidently contains columns of type string, so do the integer validation in the GetDataTableFromCSVFile method; consumers of that method then don't need to worry about it.
private DataTable GetDataTableFromCSVFile()
{
var data = new DataTable();
data.Columns.Add("Column1", typeof(int));
// Read lines of file
// 'line' is an imaginary object which contains the values of one row of CSV data
foreach(var line in lines)
{
var row = data.NewRow();
int.TryParse(line.Column1Value, out int column1Value);
row.SetField("Column1", column1Value); // will set 0 if the value is invalid
// other columns
data.Rows.Add(row);
}
return data;
}
Then there is another problem with your code: you assign new values to the List<int> through an index, but the list is empty.
l.column1[i] = d.Rows[i].Field<int>("column1");
The line above will throw an exception because an empty list doesn't have an item at index i.
So in the end your method will look like:
DataTable d = GetDataTableFromCSVFile(file);
CoolObject l = new CoolObject();
foreach (DataRow row in d.Rows)
{
l.column1.Add(row.Field<int>("column1"));
}
If you are using a third-party library to read the CSV data into a DataTable, check whether that library provides a way to validate/convert string values to the expected types in the DataTable.
Sounds like someone didn't enter a number in one of the cells. You'll have to perform a validation check before reading the value.
for (int i = 0; i < d.Rows.Count; i++)
{
object o = d.Rows[i]["column1"];
if (!(o is int)) continue;
l.column1.Add((int)o);
}
Or perhaps it is a number but for some reason is coming through as a string. You could try it this way:
for (int i = 0; i < d.Rows.Count; i++)
{
int n;
bool ok = int.TryParse(d.Rows[i]["column1"].ToString(), out n);
if (!ok) continue;
l.column1.Add(n);
}

Correct way to lock a DataTable in C# for multithreading?

Is this the correct way to lock and modify a DataTable that is shared by multiple threads? If not, what would be the correct way to do it?
private void DoGeocodeLookup(object info)
{
ParameterRow data = (ParameterRow)info;
DataRow dr = data.Dr;
int localIndex = data.Index;
ManualResetEvent doneEvent = data.m_doneEvent;
Geocode_Google.GeoCode gc = new GeoCode();
gc.Addr_In = m_dtAddress.Rows[localIndex]["AddressStd_tx"].ToString();
gc.GeoCodeOne();
if (gc.Status == "OK")
{
//write back to temporary datatable
lock( m_TempTable.Rows.SyncRoot )
{
m_TempTable.Rows[localIndex]["GL_Address"] = gc.Thoroughfare_tx;
}
}
doneEvent.Set();
}
My Structure:
struct ParameterRow
{
private DataRow m_row;
private int m_index;
public DataRow Dr
{
get { return m_row; }
set { m_row = value; }
}
public int Index
{
get { return m_index; }
set { m_index = value; }
}
public ManualResetEvent m_doneEvent;
public ParameterRow( DataRow row, int index, ManualResetEvent doneEvent)
{
m_row = row;
m_index = index;
m_doneEvent = doneEvent;
}
}
Snippet where I start all threads:
//make temporary table
m_TempTable = new DataTable();
m_TempTable = m_dtAddress.Copy();
for (int index = 0; index < m_geocodes; index++)
{
doneEvents[index] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(DoGeocodeLookup, new ParameterRow( m_dtAddress.Rows[index], index, doneEvents[index]));
}
WaitHandle.WaitAll(doneEvents);
Your example does not require any locking of the DataTable. Inside DoGeocodeLookup you are only performing reads of the DataTable. The only access you are performing on the table is to look up a row, which counts as a read. The DataTable class is marked as safe for multi-threaded read operations. If you were doing something like adding new rows in DoGeocodeLookup, then you would require locking.
The only thing you are changing is the data in a single DataRow specified by localIndex. Since each call to DoGeocodeLookup uses a different row, a single row in your table will only ever be updated by one thread, so you do not have a synchronization issue there. That requires no locking either.
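If, on the other hand, each worker also had to add rows to the shared table, those writes would need to be serialized. A minimal sketch, where m_tableLock is an assumed extra field and AddResultRow a hypothetical spot in DoGeocodeLookup, not part of your code:
private readonly object m_tableLock = new object(); // assumed shared lock object

// Inside the worker, if it also needed to add rows:
// DataTable is not safe for concurrent structural changes, so take the lock before adding.
lock (m_tableLock)
{
    DataRow newRow = m_TempTable.NewRow();
    newRow["GL_Address"] = gc.Thoroughfare_tx;
    m_TempTable.Rows.Add(newRow);
}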
This thread is very informative and may help with your question. The general consensus is to use Interlocked.Increment on the value being updated. Locking is slow, and you have to remember all the places where you may have locked the object you are updating. And volatile doesn't necessarily mean that CPU A will see what CPU B just changed right away.
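As a minimal sketch of the Interlocked idea (m_completedLookups is a hypothetical counter field, not part of the original code):
private static int m_completedLookups; // hypothetical shared counter

// Atomically increments the counter from any worker thread, no lock needed.
Interlocked.Increment(ref m_completedLookups);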

Merging DataTables - disregarding the first row

How can I merge DataTable objects ignoring the first row?
The DataTable I need to merge with the one I've got comes from a parsed CSV file, and its first row (sometimes) still contains headers, which are obviously not supposed to end up in the resulting table...
The DataTable.Merge method does not seem to offer such an option. What's the best way to do that? Just removing the first row beforehand? But that alters the "original", and what if I wanted it to stay as it was? Removing and reinserting after the merge? Smells like "clever coding". Is there really no better way?
Editing my previous answer:
I wrote code along similar lines and ended up with all rows of dt1 intact and dt2 containing only rows 2 and 3 from dt1.
var dt1 = new DataTable("Test");
dt1.Columns.Add("id", typeof(int));
dt1.Columns.Add("name", typeof(string));
var dt2 = new DataTable("Test");
dt2.Columns.Add("id", typeof(int));
dt2.Columns.Add("name", typeof(string));
dt1.Rows.Add(1, "Apple"); dt1.Rows.Add(2, "Oranges");
dt1.Rows.Add(3, "Grapes");
dt1.AcceptChanges();
dt1.Rows[0].Delete();
dt2.Merge(dt1);
dt2.AcceptChanges();
dt1.RejectChanges();
Let me know if you find it acceptable.
Vijay
You could go through the rows separately and merge them into the table, something like
public static class DataTableExtensions
{
public static void MergeRange(this DataTable dest, DataTable table, int startIndex, int length)
{
List<string> matchingColumns = new List<string>();
for (int i = 0; i < table.Columns.Count; i++)
{
// Only copy columns with the same name and type
string columnName = table.Columns[i].ColumnName;
if (dest.Columns.Contains(columnName))
{
if (dest.Columns[columnName].DataType == table.Columns[columnName].DataType)
{
matchingColumns.Add(columnName);
}
}
}
for (int i = 0; i < length; i++)
{
int row = i + startIndex;
DataRow destRow = dest.NewRow();
foreach (string column in matchingColumns)
{
destRow[column] = table.Rows[row][column];
}
dest.Rows.Add(destRow);
}
}
}
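Used against the parsed CSV table, skipping the (possible) header row could then look like this (parsedCsv and existingTable are assumed variable names):
// Merge everything except the first row of the parsed CSV table.
existingTable.MergeRange(parsedCsv, 1, parsedCsv.Rows.Count - 1);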

What's a clean way to break up a DataTable into chunks of a fixed size with Linq?

Update: Here's a similar question
Suppose I have a DataTable with a few thousand DataRows in it.
I'd like to break up the table into chunks of smaller rows for processing.
I thought C#3's improved ability to work with data might help.
This is the skeleton I have so far:
DataTable Table = GetTonsOfData();
// Chunks should be any IEnumerable<Chunk> type
var Chunks = ChunkifyTableIntoSmallerChunksSomehow; // ** help here! **
foreach(var Chunk in Chunks)
{
// Chunk should be any IEnumerable<DataRow> type
ProcessChunk(Chunk);
}
Any suggestions on what should replace ChunkifyTableIntoSmallerChunksSomehow?
I'm really interested in how someone would do this with C# 3 tools. If attempting to apply these tools is inappropriate, please explain!
Update 3 (revised chunking as I really want tables, not IEnumerables; going with an extension method - thanks Jacob):
Final implementation:
Extension method to handle the chunking:
public static class HarenExtensions
{
public static IEnumerable<DataTable> Chunkify(this DataTable table, int chunkSize)
{
for (int i = 0; i < table.Rows.Count; i += chunkSize)
{
DataTable Chunk = table.Clone();
foreach (DataRow Row in table.Select().Skip(i).Take(chunkSize))
{
Chunk.ImportRow(Row);
}
yield return Chunk;
}
}
}
Example consumer of that extension method, with sample output from an ad hoc test:
class Program
{
static void Main(string[] args)
{
DataTable Table = GetTonsOfData();
foreach (DataTable Chunk in Table.Chunkify(100))
{
Console.WriteLine("{0} - {1}", Chunk.Rows[0][0], Chunk.Rows[Chunk.Rows.Count - 1][0]);
}
Console.ReadLine();
}
static DataTable GetTonsOfData()
{
DataTable Table = new DataTable();
Table.Columns.Add(new DataColumn());
for (int i = 0; i < 1000; i++)
{
DataRow Row = Table.NewRow();
Row[0] = i;
Table.Rows.Add(Row);
}
return Table;
}
}
This is quite readable and only iterates through the sequence once, perhaps saving you the rather bad performance characteristics of repeated redundant Skip() / Take() calls:
public IEnumerable<IEnumerable<DataRow>> Chunkify(DataTable table, int size)
{
List<DataRow> chunk = new List<DataRow>(size);
foreach (DataRow row in table.Rows)
{
chunk.Add(row);
if (chunk.Count == size)
{
yield return chunk;
chunk = new List<DataRow>(size);
}
}
if(chunk.Any()) yield return chunk;
}
This seems like an ideal use-case for Linq's Skip and Take methods, depending on what you want to achieve with the chunking. This is completely untested and never entered into an IDE, but your method might look something like this.
private List<List<DataRow>> ChunkifyTable(DataTable table, int chunkSize)
{
List<List<DataRow>> chunks = new List<List<DataRow>>();
for (int i = 0; i < table.Rows.Count / chunkSize; i++)
{
chunks.Add(table.Rows.Cast<DataRow>().Skip(i * chunkSize).Take(chunkSize).ToList());
}
return chunks;
}
Here's an approach that might work:
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> InPages<T>(this IEnumerable<T> enumOfT, int pageSize)
{
if (null == enumOfT) throw new ArgumentNullException("enumOfT");
if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
var enumerator = enumOfT.GetEnumerator();
while (enumerator.MoveNext())
{
yield return InPagesInternal(enumerator, pageSize);
}
}
private static IEnumerable<T> InPagesInternal<T>(IEnumerator<T> enumeratorOfT, int pageSize)
{
var count = 0;
while (true)
{
yield return enumeratorOfT.Current;
if (++count >= pageSize) yield break;
if (false == enumeratorOfT.MoveNext()) yield break;
}
}
public static string Join<T>(this IEnumerable<T> enumOfT, object separator)
{
var sb = new StringBuilder();
if (enumOfT.Any())
{
sb.Append(enumOfT.First());
foreach (var item in enumOfT.Skip(1))
{
sb.Append(separator).Append(item);
}
}
return sb.ToString();
}
}
[TestFixture]
public class Tests
{
[Test]
public void Test()
{
// Arrange
var ints = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var expected = new[]
{
new[] { 1, 2, 3 },
new[] { 4, 5, 6 },
new[] { 7, 8, 9 },
new[] { 10 },
};
// Act
var pages = ints.InPages(3);
// Assert
var expectedString = (from x in expected select x.Join(",")).Join(" ; ");
var pagesString = (from x in pages select x.Join(",")).Join(" ; ");
Console.WriteLine("Expected : " + expectedString);
Console.WriteLine("Pages : " + pagesString);
Assert.That(pagesString, Is.EqualTo(expectedString));
}
}
Jacob wrote:
This seems like an ideal use-case for Linq's Skip and Take methods, depending on what you want to achieve with the chunking. This is completely untested and never entered into an IDE, but your method might look something like this.
private List<List<DataRow>> ChunkifyTable(DataTable table, int chunkSize)
{
List<List<DataRow>> chunks = new List<List<DataRow>>();
for (int i = 0; i < table.Rows.Count / chunkSize; i++)
{
chunks.Add(table.Rows.Cast<DataRow>().Skip(i * chunkSize).Take(chunkSize).ToList());
}
return chunks;
}
Thanks for this Jacob - useful for me, but I think the test in your example should be <= not <. If you use < and the number of rows is less than chunkSize, the loop is never entered. Similarly, the last partial chunk is not captured, only full chunks. As you've stated, the example is untested, etc., so this is just an FYI in case someone else uses your code verbatim ;-)
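A minimal sketch of an alternative fix is a ceiling division for the loop bound, which captures the trailing partial chunk and avoids producing an empty chunk when the row count is an exact multiple of chunkSize:
// Ceiling division: e.g. 105 rows with chunkSize 20 gives 6 chunks, the last holding 5 rows.
int chunkCount = (table.Rows.Count + chunkSize - 1) / chunkSize;
for (int i = 0; i < chunkCount; i++)
{
    chunks.Add(table.Rows.Cast<DataRow>().Skip(i * chunkSize).Take(chunkSize).ToList());
}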
Here is a completely different approach. No memory is allocated for the chunks.
public static IEnumerable<IEnumerable<DataRow>> Chunkify(
this DataTable dataTable, int chunkSize)
{
for (int i = 0; i < dataTable.Rows.Count; i += chunkSize)
{
yield return GetChunk(i, Math.Min(i + chunkSize, dataTable.Rows.Count));
}
IEnumerable<DataRow> GetChunk(int from, int toExclusive)
{
for (int j = from; j < toExclusive; j++)
{
yield return dataTable.Rows[j];
}
}
}
Usage example:
var dataTable = GetTonsOfData();
foreach (var chunk in dataTable.Chunkify(1000))
{
Console.WriteLine($"Processing chunk of {chunk.Count()} rows");
foreach (var dataRow in chunk)
{
Console.WriteLine(dataRow[0]);
}
}
.NET (Core) 6 introduced the Chunk extension method that can be used to easily split a DataTable into batches:
IEnumerable<DataRow[]> chunks=myTable.AsEnumerable()
.Chunk(1000);
In earlier versions MoreLINQ's Batch extension method can be used to do the same:
IEnumerable<IEnumerable<DataRow>> chunks=myTable.AsEnumerable()
.Batch(1000);
Both can be used to split a DataTable into smaller ones. The following extension method does this, using a LoadRows helper to extract the row-loading code:
public static IEnumerable<DataTable> Chunk(this DataTable source, int size)
{
ArgumentNullException.ThrowIfNull(source);
foreach (var chunk in source.AsEnumerable().Chunk(size))
{
var chunkTable = source.Clone();
chunkTable.MinimumCapacity = size;
chunkTable.LoadRows(chunk);
yield return chunkTable;
}
}
public static DataTable LoadRows(this DataTable table, IEnumerable<DataRow> rows)
{
ArgumentNullException.ThrowIfNull(table);
ArgumentNullException.ThrowIfNull(rows);
foreach (var row in rows)
{
table.ImportRow(row);
}
return table;
}
ArgumentNullException.ThrowIfNull(source); is another .NET 6 addition that throws an ArgumentNullException, using the parameter name, if the parameter is null.
Finally, chunkTable.MinimumCapacity = size; is used to reserve space for each table's rows, to avoid reallocations.
