Copying Datatable Columns - c#

I have the following method that returns a trimmed down copy of a datatable based on the user selecting which 6 columns to keep. My problem is the datatable can be quite large and is taking up quite a bit of memory. By creating the initial copy, this is causing the system to have to start writing to the page file and slowing the application down considerably.
I'm wondering if it is possible to create a datatable copy but of only the specified columns (can be identified through name or index, doesn't matter) rather than creating the copy then removing the unnecessary columns?
This question appears to be asking the same thing but in VB.net.
private DataTable CreateCleanData()
{
var cleanedDataTable = _loadedDataData.Copy();
var columnsToKeep = new List<string>();
columnsToKeep.Add(1.SelectedValue.ToString());
columnsToKeep.Add(2.SelectedValue.ToString());
columnsToKeep.Add(3.SelectedValue.ToString());
columnsToKeep.Add(4.SelectedValue.ToString());
columnsToKeep.Add(5.SelectedValue.ToString());
columnsToKeep.Add(6.SelectedValue.ToString());
for (var i = cleanedDataTable.Columns.Count - 1; i >= 0; i--)
if (!columnsToKeep.Contains(cleanedDataTable.Columns[i].ColumnName))
cleanedDataTable.Columns.Remove(cleanedDataTable.Columns[i]);
cleanedDataTable.AcceptChanges();
GC.Collect();
return cleanedDataTable;
}

You could use this method, basically just use Clone instead of Copy:
public static DataTable CreateCleanData(DataTable source, params int[] keepColumns)
{
var cleanedDataTable = source.Clone(); // empty table but same columns
for (int i = cleanedDataTable.Columns.Count - 1; i >= 0; i--)
{
if (!keepColumns.Contains(i))
cleanedDataTable.Columns.RemoveAt(i);
}
cleanedDataTable.BeginLoadData();
foreach (DataRow sourceRow in source.Rows)
{
DataRow newRow = cleanedDataTable.Rows.Add();
foreach (DataColumn c in cleanedDataTable.Columns)
{
newRow.SetField(c, sourceRow[c.ColumnName]);
}
}
cleanedDataTable.EndLoadData();
return cleanedDataTable;
}
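If the columns are identified by name rather than index, another option worth trying is DataView.ToTable, which builds a new table from only the named columns, so the full copy is never made. This is a minimal sketch, assuming every name in columnsToKeep exists in the source table; the class and method names are illustrative and not from the original post:

```csharp
using System.Data;

public static class ColumnSubset
{
    // Hypothetical helper: returns a new table holding only the named columns.
    public static DataTable CopyColumns(DataTable source, params string[] columnsToKeep)
    {
        // The view indexes the source rows; it does not duplicate their data.
        var view = new DataView(source);

        // distinct: false keeps duplicate rows. Columns come out in the
        // order given in columnsToKeep.
        return view.ToTable(false, columnsToKeep);
    }
}
```

Usage would be along the lines of ColumnSubset.CopyColumns(_loadedDataData, columnsToKeep.ToArray()). Whether this is lighter than Clone plus a row loop on a very large table is something to measure rather than assume.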

Related

Find matching records in DataTable as fast as possible

I have C# DataTables with very large numbers of rows, and in my importer app I must query these hundreds of thousands of times in a given import. So I'm trying to find the fastest possible way to search. Thus far I am puzzling over very strange results. First, here are 2 different approaches I have been experimenting with:
APPROACH #1
public static bool DoesRecordExist(string keyColumn, string keyValue, DataTable dt)
{
if (dt != null && dt.Rows.Count > 0)
return dt.Select($"{keyColumn} = '{SafeTrim(keyValue)}'").Count() > 0;
else
return false;
}
APPROACH #2
public static bool DoesRecordExist(string keyColumn, string keyValue, DataTable dt)
{
if (dt != null && dt.Rows.Count > 0)
{
int counter = dt.AsEnumerable().Where(r => string.Equals(SafeTrim(r[keyColumn]), keyValue, StringComparison.CurrentCultureIgnoreCase)).Count();
return counter > 0;
}
else
return false;
}
In a mock test I run each method 15,000 times, handing in hardcoded data. This is apples-to-apples, a fair test. Approach #1 is dramatically faster. But in actual app execution, Approach #1 is dramatically slower.
Why the counterintuitive results? Is there some other faster way to query datatables that I haven't tried?
EDIT: The reason I use datatables as opposed to other types of
collections is because all my datasources are either MySQL tables or
CSV files. So datatables seemed like a logical choice. Some of these
tables contain 10+ columns, so different types of collections seemed
an awkward match.
If you want a faster access and still want to stick to the DataTables, use a dictionary to store the row numbers for given keys. Here I assume that each key is unique in the DataTable. If not, you would have to use a Dictionary<string, List<int>> or Dictionary<string, HashSet<int>> to store the indexes.
var indexes = new Dictionary<string, int>();
for (int i = 0; i < dt.Rows.Count; i++) {
indexes.Add((string)dt.Rows[i][keyColumn], i);
}
Now you can access a row in a super fast way with
var row = dt.Rows[indexes[theKey]];
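For the non-unique case mentioned above, the index can map each key to a list of row positions. This is a minimal sketch, assuming case-insensitive, trimmed string keys; the helper name and signature are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowIndex
{
    // Hypothetical helper: maps each key value to the positions of all rows holding it.
    public static Dictionary<string, List<int>> Build(DataTable dt, string keyColumn)
    {
        var indexes = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < dt.Rows.Count; i++)
        {
            // Convert.ToString turns DBNull into an empty string instead of throwing.
            string key = Convert.ToString(dt.Rows[i][keyColumn]).Trim();

            List<int> rows;
            if (!indexes.TryGetValue(key, out rows))
            {
                rows = new List<int>();
                indexes.Add(key, rows);
            }
            rows.Add(i);
        }
        return indexes;
    }
}
```

An existence check then becomes indexes.ContainsKey(keyValue). Because the dictionary stores row positions, it has to be rebuilt whenever rows are inserted or removed.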
I have a very similar issue except that I need the actual First Occurrence of a matching row.
Using the .Select.FirstOrDefault (Approach 1) takes 38 minutes to run.
Using the .Where.FirstOrDefault (Approach 2) takes 6 minutes to run.
In a similar situation where I didn't need the FirstOrDefault, but just needed to find and work with the uniquely matching record, what I found to be the fastest by far is to use a HashTable where the Key is the Combined Values of any Columns you are trying to match, and the Value is the Data Row itself. Finding a Match is near instant.
The Function is
public Hashtable ConvertToLookup(DataTable myDataTable, params string[] pKeyFieldNames)
{
Hashtable myLookup = new Hashtable(StringComparer.InvariantCultureIgnoreCase); //Makes the Key Case Insensitive
foreach (DataRow myRecord in myDataTable.Rows)
{
string myHashKey = "";
foreach (string strKeyFieldName in pKeyFieldNames)
{
myHashKey += Convert.ToString(myRecord[strKeyFieldName]).Trim();
}
if (myLookup.ContainsKey(myHashKey) == false)
{
myLookup.Add(myHashKey, myRecord);
}
}
return myLookup;
}
The usage is...
//Build the Lookup Table
Hashtable myLookUp = ConvertToLookup(myDataTable, "Col1Name", "Col2Name");
//Use it
if (myLookUp.ContainsKey(mySearchForValue) == true)
{
DataRow myRecord = (DataRow)myLookUp[mySearchForValue];
}
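One thing to watch with keys built by plain concatenation: without a separator, the value pairs ("ab", "c") and ("a", "bc") produce the same key. This is a minimal sketch of a key builder that joins the parts with a delimiter, assuming the delimiter never occurs in the data; the helper name and the choice of separator are illustrative:

```csharp
using System;
using System.Data;

public static class LookupKey
{
    // Unit separator (U+001F); assumed never to appear in the column values.
    private const string Separator = "\u001F";

    // Hypothetical helper: joins the trimmed values of the given columns into one key.
    public static string FromRow(DataRow row, params string[] keyFieldNames)
    {
        var parts = new string[keyFieldNames.Length];
        for (int i = 0; i < keyFieldNames.Length; i++)
        {
            parts[i] = Convert.ToString(row[keyFieldNames[i]]).Trim();
        }
        return string.Join(Separator, parts);
    }
}
```

The same function has to be used both when filling the lookup and when searching it, so that both sides produce identical keys.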
Bingo! I'm posting this as a separate answer because my previous one suits a slightly different situation. In this scenario I was able to go from 8 minutes down to 6 seconds, using neither of the two approaches.
Again, the key is a HashTable, or in my case a dictionary because I had multiple records. To recap, for me, I needed to delete 1 row from my DataTable for every matching record I found in another DataTable. With the goal that in the end, my First Datatable only contained the "Missing" records.
This uses a different function...
// -----------------------------------------------------------
// Creates a Dictionary with Grouping Counts from a DataTable
public Dictionary<string, Int32> GroupBy(DataTable myDataTable, params string[] pGroupByFieldNames)
{
Dictionary<string, Int32> myGroupBy = new Dictionary<string, Int32>(StringComparer.InvariantCultureIgnoreCase); //Makes the Key Case Insensitive
foreach (DataRow myRecord in myDataTable.Rows)
{
string myKey = "";
foreach (string strGroupFieldName in pGroupByFieldNames)
{
myKey += Convert.ToString(myRecord[strGroupFieldName]).Trim();
}
if (myGroupBy.ContainsKey(myKey) == false)
{
myGroupBy.Add(myKey, 1);
}
else
{
myGroupBy[myKey] += 1;
}
}
return myGroupBy;
}
Now.. say you have a Table of Records that you want to use as the "Match Values" based on Col1 and Col2
Dictionary<string, Int32> myQuickLookUpCount = GroupBy(myMatchTable, "Col1", "Col2");
And now the magic. We are looping through your Primary Table, and removing 1 instance of a record for each instance in the Matching Table. This is the part that took 8 minutes with Approach #2, or 38 minutes with Approach #1.. but now only takes seconds.
myDataTable.AcceptChanges(); //Trick that allows us to delete during a ForEach!
foreach (DataRow myDataRow in myDataTable.Rows)
{
//Grab the Key Values
string strKey1Value = Convert.ToString(myDataRow["Col1"]).Trim(); //Trim to match how GroupBy builds its keys
string strKey2Value = Convert.ToString(myDataRow["Col2"]).Trim();
if (myQuickLookUpCount.TryGetValue(strKey1Value + strKey2Value, out Int32 intTotalCount) == true && intTotalCount > 0)
{
myDataRow.Delete(); //Mark our Row to Delete
myQuickLookUpCount [strKey1Value + strKey2Value ] -= 1; //Decrement our Counter
}
}
myDataTable.AcceptChanges(); //Commits our changes and actually deletes the rows.

How do I delete a datarow from a datarow array?

I am looping through a array of datarows and when a particular random item is not valid I want to remove that item and get the new total to get another random item.
But when I delete a datarow the datarow does not go away... And yes there is probably a much better way to do this but I am not smart enough to do it..
Instead of removing the row I see this inside
ItemArray = podLps[1].ItemArray threw an exception of type System.Data.RowNotInTableException
//PHASE 1: Get all LPs in the pod and add to collection
List<DataRow> allLps = dtLp.AsEnumerable().ToList();
DataRow[] podLps = allLps.Where(x => x.ItemArray[0].ToString() == loPod).ToArray();
//PHASE 2: Pick a random LP from collection that has valid WAVE1
for (int i = podLps.Count(); i > 0; i--)
{
//Recount items in collection since one may have been removed
int randomIndex = random.Next(podLps.Count());
var randomLpUserId = podLps[randomIndex].ItemArray[1].ToString();
var randomLpWave1 = int.Parse(podLps[randomIndex].ItemArray[2].ToString());
//Get WAVE1 # for selected LP
lpNumberOfLoans = GetNumberOfLoans(session, randomLpUserId);
//check if LP has valid WAVE1 then use this person
if (randomLpWave1 > lpNumberOfLoans)
{
return randomLpUserId;
}
else
{
podLps[randomIndex].Delete();
}
}
Look at this example; it should point you in the right direction for removing rows. I just tested it and it works:
for (int i = myDataTable.Rows.Count - 1; i >= 0; i--)
{
DataRow row = myDataTable.Rows[i]; // candidate for removal
if (myDataTable.Rows[i][0].ToString() == string.Empty)
{
myDataTable.Rows.Remove(row);
}
}
I would suggest using a List for podLps instead of an array.
Then you can use .RemoveAt as Jaco mentioned (it doesn't work for arrays).
DataRow.Delete() just flags the Row to be deleted in the next update of your DataTable.
The easiest method is to convert your array of DataRow[] to a List, call RemoveAt and then convert the list back to an array:
var dest = new List<DataRow>(podLps);
dest.RemoveAt(randomIndex);
podLps = dest.ToArray();
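The difference between marking a row with DataRow.Delete() and removing it with Rows.Remove() can be seen in a small experiment. This is a minimal sketch on a throwaway table whose rows have already been accepted; the class and method names are illustrative only:

```csharp
using System.Data;

public static class DeleteVersusRemove
{
    // Returns { state after Delete, count after Delete,
    //           count after AcceptChanges, count after Rows.Remove }.
    public static object[] Demo()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Rows.Add(1);
        table.Rows.Add(2);
        table.AcceptChanges();                      // both rows are now Unchanged

        DataRow first = table.Rows[0];
        first.Delete();                             // only marks the row
        DataRowState state = first.RowState;        // Deleted
        int countAfterDelete = table.Rows.Count;    // still 2

        table.AcceptChanges();                      // the marked row is dropped here
        int countAfterAccept = table.Rows.Count;    // 1

        table.Rows.Remove(table.Rows[0]);           // removed immediately
        int countAfterRemove = table.Rows.Count;    // 0

        return new object[] { state, countAfterDelete, countAfterAccept, countAfterRemove };
    }
}
```

Neither call shrinks a separate DataRow[] such as podLps, since that array is only a snapshot of references, which is why the List-based approach above is needed.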

Is there a way to dynamically create an object at run time in .NET 3.5?

I'm working on an importer that takes tab delimited text files. The first line of each file contains 'columns' like ItemCode, Language, ImportMode etc and there can be varying numbers of columns.
I'm able to get the names of each column, whether there's one or 10 and so on. I use a method to achieve this that returns List<string>:
private List<string> GetColumnNames(string saveLocation, int numColumns)
{
var data = (File.ReadAllLines(saveLocation));
var columnNames = new List<string>();
for (int i = 0; i < numColumns; i++)
{
var cols = from lines in data
.Take(1)
.Where(l => !string.IsNullOrEmpty(l))
.Select(l => l.Split(delimiter.ToCharArray(), StringSplitOptions.None))
.Select(value => string.Join(" ", value))
let split = lines.Split(' ')
select new
{
Temp = split[i].Trim()
};
foreach (var x in cols)
{
columnNames.Add(x.Temp);
}
}
return columnNames;
}
If I always knew what columns to be expecting, I could just create a new object, but since I don't, I'm wondering is there a way I can dynamically create an object with properties that correspond to whatever GetColumnNames() returns?
Any suggestions?
For what it's worth, here's how I used DataTables to achieve what I wanted.
// saveLocation is file location
// numColumns comes from another method that gets number of columns in file
var columnNames = GetColumnNames(saveLocation, numColumns);
var table = new DataTable();
foreach (var header in columnNames)
{
table.Columns.Add(header);
}
// itemAttributeData is the file split into lines
foreach (var row in itemAttributeData)
{
table.Rows.Add(row);
}
Although there was a bit more work involved to be able to manipulate the data in the way I wanted, Karthik's suggestion got me on the right track.
You could create a dictionary of strings where the first string references the "properties" name and the second string its characteristic.
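A minimal sketch of that dictionary idea, pairing each header name with the value in the same position of one data line. It assumes a single-character delimiter and is written to compile on .NET 3.5; the class and method names are illustrative:

```csharp
using System;
using System.Collections.Generic;

public static class DelimitedRows
{
    // Hypothetical helper: builds one "record" keyed by column name.
    public static Dictionary<string, string> ToRecord(IList<string> columnNames, string line, char delimiter)
    {
        string[] values = line.Split(delimiter);
        var record = new Dictionary<string, string>();
        for (int i = 0; i < columnNames.Count; i++)
        {
            // Lines shorter than the header get empty strings for the missing fields.
            record[columnNames[i]] = i < values.Length ? values[i].Trim() : string.Empty;
        }
        return record;
    }
}
```

A value is then read with record["ItemCode"], whatever set of columns the file happens to contain.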

How to read tables from a particular place in a document?

When I use the line below, it reads all the tables in the document:
foreach (Microsoft.Office.Interop.Word.Table tableContent in document.Tables)
But I want to read only the tables in a particular part of the document, for example from one identifier to another.
The identifiers have a form like [SRS oraganisation_123] for the start and [SRS Oraganisation_456] for the end.
I want to read only the tables between those two identifiers.
Suppose the 34th page contains my first identifier: I want to read all tables from that point until I come across my second identifier. I don't want to read the remaining tables.
Please ask me for any clarification in the question.
Say start and end Identifiers are stored in variables called myStartIdentifier and myEndIdentifier -
Range myRange = doc.Range();
int iTagStartIdx = 0;
int iTagEndIdx = 0;
if (myRange.Find.Execute(myStartIdentifier))
iTagStartIdx = myRange.Start;
myRange = doc.Range();
if (myRange.Find.Execute(myEndIdentifier))
iTagEndIdx = myRange.Start;
foreach (Table tbl in doc.Range(iTagStartIdx,iTagEndIdx).Tables)
{
// Your code goes here
}
Not sure how your program is structured... but if you can access the identifier in tableContent then you should be able to write a LINQ query.
var identifiers = new List<string>();
identifiers.Add("myIdentifier");
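// Identifier is a placeholder for however you determine a table's identifier
var tablesWithOnlyTheIdentifiersIWant = document.Tables.Cast<Microsoft.Office.Interop.Word.Table>().Where(tableContent => identifiers.Contains(tableContent.Identifier));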
foreach(var tableContent in tablesWithOnlyTheIdentifiersIWant)
{
//Do something
}
Have a look at the following code; it may help.
System.Data.DataTable dt = new System.Data.DataTable();
foreach (Microsoft.Office.Interop.Word.Cell c in r.Cells)
{
if(c.Range.Text=="Content you want to compare")
dt.Columns.Add(c.Range.Text);
}
foreach (Microsoft.Office.Interop.Word.Row row in newTable.Rows)
{
System.Data.DataRow dr = dt.NewRow();
int i = 0;
foreach (Cell cell in row.Cells)
{
if (!string.IsNullOrEmpty(cell.Range.Text)&&(cell.Range.Text=="Text you want to compare with"))
{
dr[i] = cell.Range.Text;
}
i++; // advance to the next column for every cell
}
dt.Rows.Add(dr);
}
Also see the third answer to this linked question:
Replace bookmark text in Word file using Open XML SDK

Problem removing row in datatable while enumerating

I get the following error while I try to delete a row while looping through it.
C#: Collection was modified; enumeration operation may not execute
I've been doing some research for a while, and I've read some similar posts here, but I still haven't found the right answer.
foreach (DataTable table in JobsDS.Tables)
{
foreach (DataRow row in table.Rows)
{
if (row["IP"].ToString() != null && row["IP"].ToString() != "cancelled")
{
string newWebServiceUrl = "http://" + row["IP"].ToString() + "/mp/Service.asmx";
webService.Url = newWebServiceUrl;
string polledMessage = webService.mpMethod(row["IP"].ToString(), row["ID"].ToString());
if (polledMessage != null)
{
if (polledMessage == "stored")
{
removeJob(id);
}
}
}
}
}
Any help would be greatly appreciated.
Instead of using foreach, use a reverse for loop:
for(int i = table.Rows.Count - 1; i >= 0; i--)
{
DataRow row = table.Rows[i];
//do your stuff
}
Removing the row indeed modifies the original collection of rows. Most enumerators are designed to explode if they detect the source sequence has changed in the middle of an enumeration - rather than try to handle all the weird possibilities of foreaching across something that is changing and probably introduce very subtle bugs, it is safer to simply disallow it.
You cannot modify a collection inside of a foreach around it.
Instead, you should use a backwards for loop.
If you want to remove Elements from a loop on a list of Elements, the trick is to use a for loop, start from the last Element and go to the first Element.
In your example :
int t_size = table.Rows.Count -1;
for (int i = t_size; i >= 0; i--)
{
DataRow row = table.Rows[i];
// your code ...
}
Edit : not quick enough :)
Also, if you depend on the order in which you process the rows and a reverse loop does not work for you, you can add the rows that you want to delete to a List and then delete them after you exit the foreach loop. For example:
foreach (DataTable table in JobsDS.Tables)
{
List<DataRow> rowsToRemove = new List<DataRow>();
foreach (DataRow row in table.Rows)
{
if (row["IP"].ToString() != null && row["IP"].ToString() != "cancelled")
{
string newWebServiceUrl = "http://" + row["IP"].ToString() + "/mp/Service.asmx";
webService.Url = newWebServiceUrl;
string polledMessage = webService.mpMethod(row["IP"].ToString(), row["ID"].ToString());
if (polledMessage != null)
{
if (polledMessage == "stored")
{
//removeJob(id);
rowsToRemove.Add(row);
}
}
}
}
rowsToRemove.ForEach(r => removeJob(r["ID"].ToString()));
}
Somehow removeJob(id) changes one of the IEnumerables you're enumerating (table.Rows or JobsDS.Tables; from the name of the method I guess it would be the latter), maybe via data binding.
I'm not sure the backwards for is going to work directly because it seems you're removing an element enumerated in the outer foreach from within the inner foreach. It's hard to tell without more info about what happens in removeJob(id).
