Averaging every column in a DataTable

Averaging every column in a DataTable - c#

I am currently using the following LINQ statement to pull certain data out of my DataTable
var possibleRows = _Data.Select("Distance > " + (location.Distance - delta) + " AND Distance < " + (location.Distance + delta));
I would like to end up with a DataRow that contains an average of every column that was in the select statement above without having to iterate through each column. I have over 100 columns and all the data is numeric. Is there a simple way to do this?

You could just do something like:
var rows = possibleRows.Cast<DataRow>();
var averages = table.Columns.Cast<DataColumn>()
.Select(col => new {
Column = col,
Average = rows.Average(row => (double)row[col])
}).ToList();
Note this assumes double throughout.
Note this doesn't result in a DataRow; it results in a List<>, where each item in the list has the column and that column's average.

Related

Change order of rows in DataTable in C#

Is it possible to change order of rows in DataTable so for example the one with current index of 5 moves to place with index of 3, etc.?
I have this legacy, messy code where dropdown menu get it's values from DataTable, which get it's values from database. It is impossible to make changes in database, since it has too many columns and entries. My original though was to add new column in db and order by it's values, but this is going to be hard.
So since this is only matter of presentation to user I was thinking to just switch order of rows in that DataTable. Does someone knows the best way to do this in C#?
This is my current code:
DataTable result = flokkurDao.GetMCCHABAKflokka("MSCODE");
foreach (DataRow row in result.Rows)
{
m_cboReasonCode.Properties.Items.Add(row["FLOKKUR"].ToString().Trim() + " - " + row["SKYRING"]);
}
For example I want to push row 2011 - Credit previously issued to the top of the DataTable.
SOLUTION:
For those who might have problems with ordering rows in DataTable and working with obsolete technology that doesn't supports Linq this might help:
DataRow firstSelectedRow = result.Rows[6];
DataRow firstNewRow = result.NewRow();
firstNewRow.ItemArray = firstSelectedRow.ItemArray; // copy data
result.Rows.Remove(firstSelectedRow);
result.Rows.InsertAt(firstNewRow, 0);
You have to clone row, remove it and insert it again with a new index. This code moves row with index 6 to first place in the DataTable.

If you really want randomness you could use Guid.NewGuid in LINQ's OrderBy:
DataTable result = flokkurDao.GetMCCHABAKflokka("MSCODE");
var randomOrder = result.AsEnumerable().OrderBy(r => Guid.NewGuid());
foreach (DataRow row in randomOrder)
{
// ...
}
If you actually don't want randomness but you want specific values at the top, you can use:
var orderFlokkur2011 = result.AsEnumerable()
.OrderBy(r => r.Field<int>("FLOKKUR") == 2011 ? 0 : 1);

You can use linq to order rows:
DataTable result = flokkurDao.GetMCCHABAKflokka("MSCODE");
foreach (DataRow row in result.Rows.OrderBy(x => x.ColumnName))
{
m_cboReasonCode.Properties.Items.Add(row["FLOKKUR"].ToString().Trim() + " - " + row["SKYRING"]);
}
To order by multiple columns:
result.Rows.OrderBy(x => x.ColumnName).ThenBy(x => x.OtherColumnName).ThenBy(x.YetAnotherOne)
To order by a specific value:
result.Rows.OrderBy(x => (x.ColumnName == 2001 or x.ColumnName == 2002) ? 0 : 1).ThenBy(x => x.ColumName)
You can use the above code to "pin" certain rows to the top, if you want more granular than that you can use a switch for example to sort specific values into sorted values of 1, 2, 3, 4 and use a higher number for the rest.

You can not change the order or delete a row in a foreach loop, you should create a new datatable and randomly add the rows to new datatable, you should also track the inserted rows not to duplicate

Use a DataView
DataTable result = flokkurDao.GetMCCHABAKflokka("MSCODE");
DateView view = new DateView(result);
view.Sort = "FLOKKUR";
view.Filter = "... you can even apply an in memory filter here ..."
foreach (DataRowView row in view.Rows)
{
....
Every data table comes with a view DefaultView which you can use, this way you can apply the default sorting / filtering in your datalayer.
public DataTable GetMCCHABAKflokka(string tableName, string sort, string filter)
{
var result = GetMCCHABAKflokka(tableName);
result.DefaultView.Sort = sort;
result.DefaultView.Filter = filter;
return result;
}
// use like this
foreach (DataRowView row in result.DefaultView)

Filter data from DataTable

DataTable dtt = (DataTable)Session["ByBrand"];
var filldt = (dtt.Select("Price >= " + HiddenField1.Value + " and Price <= " + HiddenField2.Value + "")).CopyToDataTable();
this code is working fine when it found values in selected DataTable but it showing error when Values are not found in DataTable. So please tell me how to check if no record found.

Simply check if your Select returns anything?
DataTable dtt = (DataTable)Session["ByBrand"];
DataRow[] rows = dtt.Select("Price >= " + HiddenField1.Value + " and Price <= " + HiddenField2.Value + "");
if(rows.Length > 0)
{
var filldt = rows.CopyToDataTable();
}
Well, the Linq example from Tim is really nice, but to complete my answer.
The Select method returns Always a DataRow array also if there is no row selected, but then you cannot ask to build a datatable from this empty array. Think about it. What schema the CopyToDataTable should build for the resulting table if no rows are present in the array?

You have tagged Linq but you are using DataTable.Select which is an old method to filter a DataTable. Use Enumerable.Where and the strongyl typed Field extension method.
decimal priceFrom = decimal.Parse(HiddenField1.Value);
decimal priceTo = decimal.Parse(HiddenField2.Value);
var dtFiltered = dtt.AsEnumerable()
.Where(row => row.Field<decimal>("Price") >= priceFrom
&& row.Field<decimal>("Price") <= priceTo))
.CopyToDataTable();
Presuming that the type of the column is decimal, if it's a different type you need to use that in Field or convert it first.
Note that you need to add System.Linq(file) and a reference to System.Data.DataSetExtensions(project).
Update
but it showing error when Values are not found in DataTable
CopyToDataTable throws an exception if the input sequence is empty. In my opinion the best approach is to handle that case separately:
DataTable tblFiltered = dtt.Clone(); // clones only structure not data
var filteredRows = dtt.AsEnumerable()
.Where(row => row.Field<decimal>("Price") >= priceFrom
&& row.Field<decimal>("Price") <= priceTo));
if(filteredRows.Any())
{
tblFiltered = filteredRows.CopyToDataTable();
}
or this approach that might be more efficient since it doesn't need to use Any which can cause an additional full enumeration in worst case:
foreach(DataRow row in filteredRows)
{
tblFiltered.ImportRow(row);
}

Select statamen of a Datarow array

I have the following code, first I filter by a foreign key, but then with that result I need to filter more with dates.
But I cant understand the syntax of the select() method of a datarow array given below.
UC021_WizardStepSelectUnitDataSet.WizardStepSelectUnits_UnitsSelectedInOtherAgreementsRow[] datarows =
_uc021_WizardStepSelectUnitDataSet.WizardStepSelectUnits_UnitsSelectedInOtherAgreements.Select(
"UnitId = " + row.UnitID).Cast
<UC021_WizardStepSelectUnitDataSet.WizardStepSelectUnits_UnitsSelectedInOtherAgreementsRow>().ToArray();
DataRow[] dr = _uc021_WizardStepSelectUnitDataSet.
WizardStepSelectUnits_UnitsSelectedInOtherAgreements.Select(
"UnitId = " + row.UnitID);
if (datarows.Length > 0)
{
dr.Select("");
}

The Select on DataTable is similar to a Where clause you add to a query, in this case its filtering the records matching the row.UnitID which is to be found under column UnitId of the DataTable.
You can add multiple conditions by using AND within select like
.Select("UnitId = " + row.UnitID+ " AND IsActive='Y'")

How to query a datatable

I am using a XML file and the data from the XML is set into a dataset and user selected table is stored as a datatable.A query is been generated with filter criteria, grouping, aggregate function, expressions etc.Is it possible to query the datatable?
I did come accross table.Select(filter criteria, sort) method.but kindly let me know how grouping, aggregate function and expression (eg: Column1 + Column2 as SumColumn) can be got.

You could query the data using LINQ - assuming you are using a .Net Framework version that supports it. Check out LINQ To Dataset

Unfortunately, table.Select(filterCriteria, sort) is your only option without LINQ (I'm not a LINQ guru, so don't ask me what it can do).
Anytime I need something specific, I create/add that column to the DataTable.
DataTable table = new DataTable();
// code that populates the table
DataColumn c = table.Columns.Add("Column1 + Column2", typeof(int));
int Sum = 0;
for (int i = 0; i < table.Rows.Count; i++) {
r = table.Rows[i];
int col1 = (int)r["Column1"];
int col2 = (int)r["Column2"];
int both = col1 + col2;
Sum += both;
r[c] = string.Format("{0}", both);
}
DataRow summaryRow = table.NewRow();
summaryRow[c] = (int)((float)Sum / table.Rows.Count + 0.5); // add 0.5 to round
table.Rows.Add(summaryRow);
HTH.

DataTable.Select and Performance Issue in C#

I'm importing the data from three Tab delimited files in the DataTables and after that I need to go thru every row of master table and find all the rows in two child tables. Against each DataRow[] array I found from the child tables, I have to again go thru individually each row and check the values based upon different paramenters and at the end I need to create a final record which will be merger of master and two child table columns.
Now I have done that and its working but the problem is its Performance. I'm using the DataTable.Select to find all child rows from child table which I believe making it very slow.
Please remember the None of the table has any Primary key as the duplicate rows are acceptable.
At the moment I have 1200 rows in master table and aroun 8000 rows in child table and the total time it takes to do that is 8 minutes.
Any idea how can I increase the Performance.
Thanks in advance
The code is below ***************
DataTable rawMasterdt = importMasterFile();
DataTable rawDespdt = importDescriptionFile();
dsHelper = new DataSetHelper();
DataTable distinctdt = new DataTable();
distinctdt = dsHelper.SelectDistinct("DistinctOffers", rawMasterdt, "C1");
if (distinctdt.Rows.Count > 0)
{
int count = 0;
foreach (DataRow offer in distinctdt.Rows)
{
string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
DataRow masterRow = rawMasterdt.Select(exp)[0];
count++;
txtBlock1.Text = "Importing Offer " + count.ToString() + " of " + distinctdt.Rows.Count.ToString();
if (masterRow != null )
{
Product newProduct = new Product();
newProduct.Code = masterRow["C4"].ToString();
newProduct.Name = masterRow["C5"].ToString();
// -----
newProduct.Description = getProductDescription(offer[0].ToString(), rawDespdt);
newProduct.Weight = getProductWeight(offer[0].ToString(), rawDespdt);
newProduct.Price = getProductRetailPrice(offer[0].ToString(), rawDespdt);
newProduct.UnitPrice = getProductUnitPrice(offer[0].ToString(), rawDespdt);
// ------- more functions similar to above here
productList.Add(newProduct);
}
}
txtBlock1.Text = "Import Completed";
public string getProductDescription(string offercode, DataTable dsp)
{
string exp = "((C1 = " + "'" + offercode + "')" + " AND ( C6 = 'c' ))";
DataRow[] dRows = dsp.Select( exp);
string descrip = "";
if (dRows.Length > 0)
{
for (int i = 0; i < dRows.Length - 1; i++)
{
descrip = descrip + " " + dRows[i]["C12"];
}
}
return descrip;
}

.Net 4.5 and the issue is still there.
Here are the results of a simple benchmark where DataTable.Select and different dictionary implementations are compared for CPU time (results are in milliseconds)
#Rows Table.Select Hashtable[] SortedList[] Dictionary[]
1000 43,31 0,01 0,06 0,00
6000 291,73 0,07 0,13 0,01
11000 604,79 0,04 0,16 0,02
16000 914,04 0,05 0,19 0,02
21000 1279,67 0,05 0,19 0,02
26000 1501,90 0,05 0,17 0,02
31000 1738,31 0,07 0,20 0,03
Problem:
The DataTable.Select method creates a "System.Data.Select" class instance internally, and this "Select" class creates indexes based on the fields (columns) specified in the query. The Select class makes re-use of the indexes it had created but the DataTable implementation does not re-use the Select class instance hence the indexes are re-created every time DataTable.Select is invoked. (This behaviour can be observed by decompiling System.Data)
Solution:
Assume the following query
DataRow[] rows = data.Select("COL1 = 'VAL1' AND (COL2 = 'VAL2' OR COL2 IS NULL)");
Instead, create and fill a Dictionary with keys corresponding to the different value combinations of the values of the columns used as the filter. (This relatively expensive operation must be done only once and the dictionary instance must then be re-used)
Dictionary<string, List<DataRow>> di = new Dictionary<string, List<DataRow>>();
foreach (DataRow dr in data.Rows)
{
string key = (dr["COL1"] == DBNull.Value ? "<NULL>" : dr["COL1"]) + "//" + (dr["COL2"] == DBNull.Value ? "<NULL>" : dr["COL2"]);
if (di.ContainsKey(key))
{
di[key].Add(dr);
}
else
{
di.Add(key, new List<DataRow>());
di[key].Add(dr);
}
}
Query the Dictionary (multiple queries may be required) to filter the rows and combine the results into a List
string key1 = "VAL1//VAL2";
string key2 = "VAL1//<NULL>";
List<DataRow>() results = new List<DataRow>();
if (di.ContainsKey(key1))
{
results.AddRange(di[key1]);
}
if (di.ContainsKey(key2))
{
results.AddRange(di[key2]);
}

I know this is an old question, and code underpinning this issue may have changed, but I've recently encountered (and gain some insight into) this very issue.
For anyone coming along at a later date ... here's what I found.
Performance of the DataTable.Select(condition) is quite sensitive to the nature and structure of the 'condition' you provide. This looks like a bug to me (where would I report it to Microsoft?) but it may merely be a quirk.
I've written a set of tests to demonstrate the issue that are structured as follows:
Define a datatable with a few simple columns,like this:
var dataTable = new DataTable();
var idCol = dataTable.Columns.Add("Id", typeof(Int32));
dataTable.Columns.Add("Code", typeof(string));
dataTable.Columns.Add("Name", typeof(string));
dataTable.Columns.Add("FormationDate", typeof(DateTime));
dataTable.Columns.Add("Income", typeof(Decimal));
dataTable.Columns.Add("ChildCount", typeof(Int32));
dataTable.Columns.Add("Foreign", typeof(Boolean));
dataTable.PrimaryKey = new DataColumn[1] { idCol };
Populate the table with 40000 records, each with a unique 'Code' field.
Perform a batch of 'selects' (each with different parameters) against the datatable using two similar, but differently formatted, queries and record and compare the total time taken by each of the two formats.
You get remarkable results. Testing, for example, the below two conditions side-by-side:
Q1: [Code] = 'XX'
Q2: ([Code] = 'XX')
[ I do multiple Select calls using the above two queries, each iteration I replace the XX with a valid code that exists in the datatable ]
The result?
Time comparison for 320 lookups against 40000 records: 180 msec total search time with no brackets, 6871 msec total search time for search WITH brackets
Yes - 38 times slower if you just have the extra brackets surrounding the condition.
There are other scenarios which react differently.
For example,
[Code] = '{searchCode}' OR 1=0 vs ([Code] = '{searchCode}' OR 1=0) take similar (slow) times to execute, but:
[Code] = '{searchCode}' AND 1=1 vs ([Code] = '{searchCode}' AND 1=1) again shows the non-bracketed version to be close to 40 times faster.
I've not investigated all scenarios, but it seems that the introduction of brackets - either redundantly around a simple comparison check, or as required to specify sub-expression precedence - or the presence of an 'OR' slows the query down considerably.
I could speculate that the issue is caused by how the datatable parses the condition you use and how it creates and uses internal indexes ... but I won't.

You can speed it up a lot by using a dictionary. For example:
if (distinctdt.Rows.Count > 0)
{
// build index of C1 values to speed inner loop
Dictionary<string, DataRow> masterIndex = new Dictionary<string, DataRow>();
foreach (DataRow row in rawMasterdt.Rows)
masterIndex[row["C1"].ToString()] = row;
int count = 0;
foreach (DataRow offer in distinctdt.Rows)
{
Then in place of
string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
DataRow masterRow = rawMasterdt.Select(exp)[0];
You would do this
DataRow masterRow;
if (masterIndex.ContainsKey(offer[0].ToString())
masterRow = masterIndex[offer[0].ToString()];
else
masterRow = null;

If you create a DataRelation between your parent and child DataTables, you can look up child rows by invoking DataRow.GetChildRows(DataRelation) on the parent row (resp. DataRow.GetChildRelName in case of typed DataSets). The search will apply a TreeMap lookup, and performance should be fine even with a lot of child rows.
In case you have to search for rows based on other criteria than on a DataRelation's foreign keys, I recommend to use DataView.Sort / DataView.FindRows() instead of DataTable.Select(), as soon as you have to query the data more than once. DataView.FindRows() will be based on TreeMap lookup (O(log(N)), where as DataTable.Select() has to scan all rows (O(N)). This article contains more details: http://arnosoftwaredev.blogspot.com/2011/02/when-datatableselect-is-slow-use.html

DataTables can be made to have Relationships with other DataTables in a DataSet. See http://msdn.microsoft.com/en-us/library/ay82azad%28VS.71%29.aspx for a bit of discussion and as a start point to browsing. I've not much experience of using them but as I understand it they will do what you want (assuming your tables are in a suitable format). I would assume that these have greater efficiency than a manual process of doing the same but I may be wrong. Might be worth seeing if they work for you and benchmarking to see if they are an improvement or not...

Have you ran it through a profiler? That should be the first step. Anyhow, this might help:
Read the master text file into memory line by line. Put the master record into a dictionary as the key. Add it to the dataset (1 pass through master).
Read child text file line by line, add this as a value for the appropriate master record in the dictionary created above
Now you have everything in the dictionary in memory, only doing 1 pass through each file.
Do a final pass through the dictionary/children and process each column and perform final calcs.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Averaging every column in a DataTable - c#

Related

Change order of rows in DataTable in C#

Filter data from DataTable

Select statamen of a Datarow array

How to query a datatable

DataTable.Select and Performance Issue in C#

Categories

Resources