Combine Referenced DataTables in C#

I'm new to C# and my current code seems a bit of a hack. How can I keep combining referenced DataTables more efficiently (fewer lines of code), or at least more readably?
(I have no say in the framework, BTW, I'm playing it as it lies ;))
I lose the last DataTable once I reference it again.
I can't import rows into a DataTable that doesn't share the same schema.
This code addresses both issues, but man, it seems fugly.
DataTable dt_ref = new DataTable();
DataTable dt_final = new DataTable();
bool datatable_is_cloned = false;
String error_msg = "";
String[] customer_ids = { "cust_a", "cust_b", "cust_c" };
foreach (String id in customer_ids)
{
    if (!obj_cust.Select_by_customer_id(id, ref dt_ref, ref error_msg))
    {
        //do error handling
    }
    if (!datatable_is_cloned)
    {
        //copy the schema (columns only) once, so ImportRow has a matching target
        dt_final = dt_ref.Clone();
        datatable_is_cloned = true;
    }
    foreach (DataRow r in dt_ref.Rows)
        dt_final.ImportRow(r);
}
Edited for clarity.
The problem is that every time I loop and hit the database with another customer id, my dt_ref loses all of its previous results and contains only the new ones (yes, this is expected behavior).
I want to keep a running total of all the results from the select method.
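For what it's worth, if the per-customer result sets all share a schema, DataTable.Merge can replace the clone flag and the ImportRow loop entirely. A minimal sketch, assuming the same Select_by_customer_id signature as above:
DataTable dt_ref = new DataTable();
DataTable dt_final = new DataTable();
String error_msg = "";
String[] customer_ids = { "cust_a", "cust_b", "cust_c" };
foreach (String id in customer_ids)
{
    if (!obj_cust.Select_by_customer_id(id, ref dt_ref, ref error_msg))
    {
        //do error handling
    }
    //Merge adopts the schema on the first call and appends rows after that
    dt_final.Merge(dt_ref);
}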

SSIS C# script modification to get file created time in loop/list

I"m using SSIS package with Script task to get files not older then n days and it's working fine, but now I need to bring into next step CreatedTime for each file. Below I pasted the body of my script. It works partially I just can't pass new var into LastUpdated. Frankly don't know how to deal with this structure, can I add another dimension to into existing list of create another list. I plan to use User:LastUpdated in the same way as FileNameArray.
Tx much !)
DataTable NewList = new DataTable();
DataColumn col = new DataColumn("FileName");
NewList.Columns.Add(col);
DataColumn col2 = new DataColumn("LastUpdated", System.Type.GetType("System.DateTime"));
NewList.Columns.Add(col2);
foreach (string f in MyDirFiles)
{
    System.IO.FileInfo finf = new System.IO.FileInfo(f);
    if (finf.LastWriteTime > DateTime.Now.AddDays(-7))
    {
        NewList.Rows.Add(System.IO.Path.GetFileName(f),
                         System.IO.File.GetCreationTime(f));
    }
}
Dts.Variables["User::FileNameArray"].Value = NewList.Columns["FileName"]; //<--- need convert into object
////**Dts.Variables["User::LastUpdated"].Value = NewList(xxx);
Dts.TaskResult = (int)ScriptResults.Success;
From your code and comments, we can conclude the following:
The NewList2 variable has the DataTable type (not present in code).
The User::LastUpdated SSIS package variable has the DateTime type.
In this case, you are trying to assign a complex structure (a DataTable) to a single-value DateTime variable, which certainly raises an error. To fix this, change the type of User::LastUpdated to Object.
One can extend the NewList table to contain both columns, as in the example below:
DataTable NewList = new DataTable();
DataColumn col = new DataColumn("FileName");
NewList.Columns.Add(col);
DataColumn col2 = new DataColumn("LastUpdated", System.Type.GetType("System.DateTime"));
NewList.Columns.Add(col2);
Adding a new row will be more awkward.
DataRow newRow = NewList.NewRow();
newRow["FileName"] = System.IO.Path.GetFileName(f);
newRow["LastUpdated"] = System.IO.File.GetCreationTime(f);
NewList.Rows.Add(newRow);
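Once User::LastUpdated is of type Object, the whole DataTable can be assigned to it directly; a minimal sketch using the names from the question:
//Works only after changing User::LastUpdated to Object in the package
Dts.Variables["User::LastUpdated"].Value = NewList;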

Find matching records in DataTable as fast as possible

I have C# DataTables with very large numbers of rows, and in my importer app I must query these hundreds of thousands of times in a given import. So I'm trying to find the fastest possible way to search. Thus far I am puzzling over very strange results. First, here are 2 different approaches I have been experimenting with:
APPROACH #1
public static bool DoesRecordExist(string keyColumn, string keyValue, DataTable dt)
{
if (dt != null && dt.Rows.Count > 0)
return dt.Select($"{keyColumn} = '{SafeTrim(keyValue)}'").Count() > 0;
else
return false;
}
APPROACH #2
public static bool DoesRecordExist(string keyColumn, string keyValue, DataTable dt)
{
if (dt != null && dt.Rows.Count > 0)
{
int counter = dt.AsEnumerable().Where(r => string.Equals(SafeTrim(r[keyColumn]), keyValue, StringComparison.CurrentCultureIgnoreCase)).Count();
return counter > 0;
}
else
return false;
}
In a mock test I run each method 15,000 times, handing in hardcoded data. This is apples-to-apples, a fair test. Approach #1 is dramatically faster. But in actual app execution, Approach #1 is dramatically slower.
Why the counterintuitive results? Is there some other faster way to query datatables that I haven't tried?
EDIT: The reason I use datatables as opposed to other types of
collections is because all my datasources are either MySQL tables or
CSV files. So datatables seemed like a logical choice. Some of these
tables contain 10+ columns, so different types of collections seemed
an awkward match.
If you want faster access and still want to stick to DataTables, use a dictionary to store the row numbers for given keys. Here I assume that each key is unique in the DataTable. If not, you would have to use a Dictionary&lt;string, List&lt;int&gt;&gt; or Dictionary&lt;string, HashSet&lt;int&gt;&gt; to store the indexes.
var indexes = new Dictionary<string, int>();
for (int i = 0; i < dt.Rows.Count; i++) {
    indexes.Add((string)dt.Rows[i][keyColumn], i);
}
Now you can access a row in a super fast way with
var row = dt.Rows[indexes[theKey]];
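If the key column is not unique, the same idea works with a list of row indexes per key, as noted above. A minimal sketch of that variant:
var indexes = new Dictionary<string, List<int>>();
for (int i = 0; i < dt.Rows.Count; i++) {
    string key = (string)dt.Rows[i][keyColumn];
    if (!indexes.TryGetValue(key, out List<int> rowsForKey)) {
        rowsForKey = new List<int>();
        indexes.Add(key, rowsForKey);
    }
    rowsForKey.Add(i);
}
//All matching rows for a key:
//foreach (int i in indexes[theKey]) { var row = dt.Rows[i]; /* ... */ }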
I have a very similar issue except that I need the actual First Occurrence of a matching row.
Using the .Select.FirstOrDefault (Approach 1) takes 38 minutes to run.
Using the .Where.FirstOrDefault (Approach 2) takes 6 minutes to run.
In a similar situation where I didn't need the FirstOrDefault, but just needed to find and work with the uniquely matching record, what I found to be the fastest by far is to use a HashTable where the Key is the Combined Values of any Columns you are trying to match, and the Value is the Data Row itself. Finding a Match is near instant.
The Function is
public Hashtable ConvertToLookup(DataTable myDataTable, params string[] pKeyFieldNames)
{
Hashtable myLookup = new Hashtable(StringComparer.InvariantCultureIgnoreCase); //Makes the Key Case Insensitive
foreach (DataRow myRecord in myDataTable.Rows)
{
string myHashKey = "";
foreach (string strKeyFieldName in pKeyFieldNames)
{
myHashKey += Convert.ToString(myRecord[strKeyFieldName]).Trim();
}
if (myLookup.ContainsKey(myHashKey) == false)
{
myLookup.Add(myHashKey, myRecord);
}
}
return myLookup;
}
The usage is...
//Build the Lookup Table
Hashtable myLookUp = ConvertToLookup(myDataTable, "Col1Name", "Col2Name");
//Use it
if (myLookUp.ContainsKey(mySearchForValue) == true)
{
DataRow myRecord = (DataRow)myLookUp[mySearchForValue];
}
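One caveat worth noting: the search key has to be built exactly the way ConvertToLookup builds it, i.e. the trimmed column values concatenated in the same order. A hypothetical example for a two-column key:
//Hypothetical values for the two key columns; same trimming and order as ConvertToLookup
string mySearchForValue = col1Value.Trim() + col2Value.Trim();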
BINGO! I wanted to share this as a different answer just because my previous one might be suited to a bit of a different approach. In this scenario, I was able to go from 8 MINUTES down to 6 SECONDS, using neither of those approaches...
Again, the key is a HashTable, or in my case a dictionary, because I had multiple records. To recap: I needed to delete 1 row from my DataTable for every matching record I found in another DataTable, with the goal that in the end my first DataTable only contained the "missing" records.
This uses a different function...
// -----------------------------------------------------------
// Creates a Dictionary with Grouping Counts from a DataTable
public Dictionary<string, Int32> GroupBy(DataTable myDataTable, params string[] pGroupByFieldNames)
{
Dictionary<string, Int32> myGroupBy = new Dictionary<string, Int32>(StringComparer.InvariantCultureIgnoreCase); //Makes the Key Case Insensitive
foreach (DataRow myRecord in myDataTable.Rows)
{
string myKey = "";
foreach (string strGroupFieldName in pGroupByFieldNames)
{
myKey += Convert.ToString(myRecord[strGroupFieldName]).Trim();
}
if (myGroupBy.ContainsKey(myKey) == false)
{
myGroupBy.Add(myKey, 1);
}
else
{
myGroupBy[myKey] += 1;
}
}
return myGroupBy;
}
Now.. say you have a Table of Records that you want to use as the "Match Values" based on Col1 and Col2
Dictionary<string, Int32> myQuickLookUpCount = GroupBy(myMatchTable, "Col1", "Col2");
And now the magic. We are looping through your Primary Table, and removing 1 instance of a record for each instance in the Matching Table. This is the part that took 8 minutes with Approach #2, or 38 minutes with Approach #1.. but now only takes seconds.
myDataTable.AcceptChanges(); //Trick that allows us to delete during a ForEach!
foreach (DataRow myDataRow in myDataTable.Rows)
{
    //Grab the Key Values
    string strKey1Value = Convert.ToString(myDataRow["Col1"]);
    string strKey2Value = Convert.ToString(myDataRow["Col2"]);
    if (myQuickLookUpCount.TryGetValue(strKey1Value + strKey2Value, out Int32 intTotalCount) == true && intTotalCount > 0)
    {
        myDataRow.Delete(); //Mark our Row to Delete
        myQuickLookUpCount[strKey1Value + strKey2Value] -= 1; //Decrement our Counter
    }
}
myDataTable.AcceptChanges(); //Commits our changes and actually deletes the rows.

Fastest way to update all rows to have same value in one column in datatable, without loop in C#

I have a datatable "users" with a column "is_published" in it. I have about 100k rows.
What is the fastest way to update the value in that column so that every row has the same value = 1?
I tried a classic foreach loop and it's slow; I also tried LINQ:
dsData.Tables["users"].Select().ToList().ForEach(x => x["is_published"] = 1);
and it still isn't fast enough.
The variant with Expression doesn't work for me either, because after that the field is ReadOnly and I can't change the value again.
This is C#.
When you create your table, you can simply set a default value on your column:
DataTable dt = new DataTable();
dt.Columns.Add("is_published", typeof(System.Int32));
dt.Columns["is_published"].DefaultValue = 1;
Then when you need to reset a row to the default value (or will you need to?):
// Say your user selects the row whose index is 2..
// And say your column's index is 5..
dt.Rows[2][5] = dt.Columns["is_published"].DefaultValue;
(Note that writing through Rows[2].ItemArray[5] would not persist, because ItemArray returns a copy of the row's values.)
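One caveat: DataColumn.DefaultValue only affects rows created after it is set; it does not rewrite rows that already exist. A minimal illustration:
DataTable dt = new DataTable();
dt.Columns.Add("is_published", typeof(int));
dt.Rows.Add(0);                            // added before the default is set: keeps 0
dt.Columns["is_published"].DefaultValue = 1;
dt.Rows.Add(dt.NewRow());                  // new row: is_published defaults to 1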
Separate the select and the update into two operations. Skip the ToList() operation and instead iterate over the IEnumerable collection afterwards using foreach, updating the value:
var rows = dsData.Tables["users"].Select();
foreach (var row in rows)
{
    row["is_published"] = 1;
}
ToList forces an immediate query evaluation, which in this case acts as a copy of all items from the IEnumerable collection, so you can gain some speed here. I ran some tests, and the result (using your code and the modification) is: ToList is 3 times slower than iterating over the IEnumerable and updating in place!
IMO 40 seconds is an awful lot for 100K items. If your DataTable is bound to a DataGridView or some other UI control, I believe the update of the GUI is what takes so long, not the update of the values itself. In my tests the update using ToList took fractions of a second (on my simple Lenovo netbook with an AMD E-450 processor, and I assume you are not using a 386 machine). Try suspending the UI before updating and refreshing the values, then enable it again - example in this SO post.
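A common way to do that, sketched below under the assumption that the table is bound to a DataGridView named grid (a hypothetical control name), is to detach the binding, update, and reattach:
var table = dsData.Tables["users"];
grid.DataSource = null;            // detach so the grid stops reacting to every change
foreach (DataRow row in table.Rows)
    row["is_published"] = 1;
grid.DataSource = table;           // reattach: one repaint instead of 100k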
My original post (as I can see you gained some speed using the code - interesting):
More an experiment on my part, but it is possible to:
convert the table to XML
fetch all elements that should be changed
change them
write the changed XML back to the table
The code:
// temp table
var dataTable = new DataTable("Table 1");
dataTable.Columns.Add("title", typeof(string));
dataTable.Columns.Add("number", typeof(int));
dataTable.Columns.Add("subnum1", typeof(int));
dataTable.Columns.Add("subnum2", typeof(int));
// add temp data
Enumerable.Range(1, 100000).ToList().ForEach(e =>
{
dataTable.Rows.Add(new object[] { "A", 1, 2, 3 });
});
// "bulk update"!
var sb = new StringBuilder();
var xmlWriter = XmlWriter.Create(sb);
dataTable.WriteXml(xmlWriter);
var xml = XDocument.Parse(sb.ToString());
// take column to change
var elementsToChange = xml.Descendants("title").ToList();
// the list is referenced to the XML, so the XML is changed too!
elementsToChange.ForEach(e => e.Value = "Z");
// clear current table
dataTable.Clear();
// write changed data back to table
dataTable.ReadXml(xml.CreateReader());
The table is updated. IMO the parts that make this solution slow are
the conversion from and to XML
and the filling of the StringBuilder
Conversely, a pure update of a list is probably faster than the table update.
Finally! I sped up the update so it takes 2-3 seconds. I added BeginLoadData() and EndLoadData():
DataTable dt = ToDataSet().Tables["users"];
var sb = new StringBuilder();
var xmlWriter = XmlWriter.Create(sb);
dt.WriteXml(xmlWriter);
var xml = XDocument.Parse(sb.ToString());
xml.Descendants("is_published").ToList().ForEach(e => e.Value = "1");
dt.Clear();
dt.BeginLoadData();
dt.ReadXml(xml.CreateReader());
dt.EndLoadData();
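Since BeginLoadData() suspends notifications, index maintenance, and constraint checking, it may be worth testing whether the XML round-trip is needed at all; a hedged sketch wrapping the plain loop in the same calls:
DataTable dt = ToDataSet().Tables["users"];
dt.BeginLoadData();                 // suspend notifications, indexes, constraints
foreach (DataRow row in dt.Rows)
    row["is_published"] = 1;
dt.EndLoadData();                   // re-enable and validate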

Query - row based table

I have no control over how the data is saved in this table. However, I have to query the table and combine the data for each pn_id into one row/record.
For instance, the current data structure is as follows,
Here we have the same pn_id repeated with different question ids. In my opinion, this should really have been saved as one pn_id with each question as a separate column. However, I have to retrieve the data below as one record, like this..
Any idea how this can be done?
Thanks
Here's some pseudocode for the transform algorithm. Note that it requires scanning the entire data set twice; there are a few other opportunities to improve the efficiency, for example, if the input data can be sorted. Also, since it's pseudocode, I haven't added handling for null values.
var columnNames = new HashSet<string> { "pn_id" };
foreach (var record in data)
columnNames.Add(record.question_id.ToString());
var table = new DataTable();
foreach (var name in columnNames)
table.Columns.Add(new DataColumn(name, typeof(string)));
foreach (var record in data)
{
var targetRecord = CreateNewOrGetExistingRecord(table, record.pn_id);
targetRecord[record.question_id.ToString()] = record.char_value ?? record.date_value.ToString();
}
And here's a sketch of the helper method:
DataRow CreateNewOrGetExistingRecord(DataTable table, object primaryKeyValue)
{
    // assumes table.PrimaryKey has been set to the pn_id column
    var result = table.Rows.Find(primaryKeyValue);
    if (result != null)
        return result;
    // create a new row, add it to the table, and return it to the caller
    var newRow = table.NewRow();
    newRow["pn_id"] = primaryKeyValue;
    table.Rows.Add(newRow);
    return newRow;
}
The structure is fine. It wouldn't make sense to have one column per question, because you would have to add a new column every time a new question was added.
Your problem can easily be solved with PIVOT. Take a look at this link for an explanation.

C# convert Datatable to class

I am trying to convert a DataTable to a C# class.
I am using the following method to convert.
This I developed as a console application; I have not referenced Entity Framework in the console application.
Class1 items = dt.AsEnumerable().Select(row =>
    new Class1
    {
        id = row.Field<string>("id"),
        name = row.Field<string>("name")
    }).FirstOrDefault();
When I applied this code to my real-time project,
I got the following error:
The type 'System.Data.Objects.DataClasses.IEntityWithKey' is defined in an assembly that is not referenced.
I do not want to reference Entity Framework, and in my console application I have not referenced anything; it works perfectly. Why am I getting this error in my real-time project?
Is there any other way to convert a DataTable to a C# class?
My application is a C#, Visual Studio 2008 console application.
The error is showing on Class1.
The console and real-time projects are both in VS 2008.
Once you have your referencing issue sorted out, and assuming you do actually want to convert from rows in a DataTable to instances of a .NET class, take a look at this blog post.
It lets you do this:
// create and fill table
DataTable table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Rows.Add(new object[]{1});
table.Rows.Add(new object[]{2});
table.Rows.Add(new object[]{3});
// create a wrapper around Rows
LinqList<DataRow> rows = new LinqList<DataRow>(table.Rows);
// do a simple select
IEnumerable<DataRow> selectedRows = from r in rows
where (int)r["Id"] == 2
select r;
// output result
foreach (DataRow row in selectedRows)
Console.WriteLine(row["Id"]);
I just wrote this as a quick test - literally just add a new console app and the code below:
static void Main(string[] args)
{
var table = new DataTable();
table.Columns.Add("id", typeof(string));
table.Columns.Add("name", typeof(string));
table.Rows.Add(new object[] { 1, "test" });
var item = table.AsEnumerable().Select(row =>
new {
id = row.Field<string>("id"),
name = row.Field<string>("name")
}).First();
Console.WriteLine(item.name);
}
I'm only referencing System.Data and System.Data.DataSetExtensions (well.. and System/System.Core/Microsoft.CSharp/System.Xml/System.Xml.Linq). The problem isn't with the code you posted; it lies somewhere we can't see. Can you post the full list of DLLs that you are referencing?
You definitely do not require a reference to System.Data.Entity.dll to do what you're doing. Search through all your source files for System.Data.Objects.DataClasses to see where you are referencing a class in the Entity Framework library. It might also be pulled in from a library you are referencing.
