Select from DataTable most recent result for each item - c#

I have a DataTable containing thousands of rows. The table has a serial number column and a test number column. If a serial is tested more than once, the test number increments. I need to select the most recent test for each serial from my DataTable and insert it into another DataTable. Currently I am using this:
DataTable newdata = data.AsEnumerable().Where(x => x.Field<Int16>("Test") ==
    data.AsEnumerable().Where(y => y.Field<string>("SerialNumber") ==
        x.Field<string>("SerialNumber")).Select(y =>
        y.Field<Int16>("Test")).Max()).CopyToDataTable();
This does the job, but it is clearly very inefficient: the inner query rescans the whole table for every row. Is there a more efficient way to select the top row of data for each serial number?
Thank you
Solution
So following on from Cam Bruce's answer I implemented the following code with a Dictionary rather than with a join:
//Get all of the serial numbers and their max test numbers
Dictionary<string, Int16> dict = data.AsEnumerable().GroupBy(x => x.Field<string>("SerialNumber")).ToDictionary(x => x.Key, x => x.Max(y => y.Field<Int16>("Test")));
//Create a datatable with only the max rows
DataTable newdata = data.AsEnumerable().Where(x => x.Field<Int16>("Test") ==
dict[x.Field<string>("SerialNumber")]).Select(x => x).CopyToDataTable();
//Clear the dictionary
dict.Clear();

This will give you each serial number, and the Max test. You can then join that result set back to the DataTable to get all the max rows.
var maxTest = data.AsEnumerable()
    .GroupBy(r => r.Field<string>("SerialNumber"))
    .Select(g => new
    {
        SerialNumber = g.Key,
        Test = g.Max(r => r.Field<Int16>("Test"))
    });
var maxRows = from d in data.AsEnumerable()
join m in maxTest
on new { S = d.Field<string>("SerialNumber"), T = d.Field<Int16>("Test") }
equals new { S = m.SerialNumber, T = m.Test }
select d;
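As an alternative to joining back, the grouping itself can return the winning row. The following is a minimal sketch, assuming the column names from the question and a small hypothetical sample table; unlike the join, it returns exactly one row per serial even if the maximum test number appears twice.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data mirroring the question's columns.
var data = new DataTable();
data.Columns.Add("SerialNumber", typeof(string));
data.Columns.Add("Test", typeof(short));
data.Rows.Add("A1", (short)1);
data.Rows.Add("A1", (short)2);
data.Rows.Add("B7", (short)1);

// Group once, then keep the row with the highest Test in each group.
DataTable newest = data.AsEnumerable()
    .GroupBy(r => r.Field<string>("SerialNumber"))
    .Select(g => g.OrderByDescending(r => r.Field<short>("Test")).First())
    .CopyToDataTable();

foreach (DataRow row in newest.Rows)
    Console.WriteLine($"{row["SerialNumber"]}: {row["Test"]}");
```

Note that CopyToDataTable throws an InvalidOperationException when the source sequence is empty, so an empty input table needs a guard.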

Related

List<T> joins DataTable

I have a List of objects (lst) and DataTable (dt). I want to join the lst and dt on the common field (code as string) and need to return all matching rows in the lst.
My List contains two columns i.e code and name along with values below:
code name
==== ====
1 x
2 y
3 z
The DataTable contains two columns i.e code and value along with values below:
code value
==== =====
3 a
4 b
5 c
The result is:
3 z
Below is my code, but I know it is not a correct statement, so I am seeking your advice here. I would much appreciate it if you could guide me on how to write the correct statement.
var ld = from l in lst
join d in dt.AsEnumerable() on l.code equals d.code
select new { l.code, l.name };
You can use a LINQ query or the Join extension method to join the collections on code. The only catch is that when you read a value from a DataTable row, you need to use the Field<T> method. Use any of the following.
Query1:
var ld = lst.Join(dt.AsEnumerable(),
l => l.code,
d => d.Field<string>("code"),
(l, d) => new
{
l.code,
l.name,
value = d.Field<string>("value")
}).ToList();
Query2:
var ld = (from l in lst
join d in dt.AsEnumerable()
on l.code equals d.Field<string>("code")
select new
{
l.code,
l.name,
value = d.Field<string>("value")
}).ToList();
Query3:
var ld = (from l in lst
join d in dt.AsEnumerable()
on l.code equals d.Field<string>("code")
let value = d.Field<string>("value")
select new
{
l.code,
l.name,
value
}).ToList();
You can try either of the below. Since code is a string, read it with Field<string>:
var ld = from l in lst
join d in dt.AsEnumerable() on l.code equals d.Field<string>("code")
select new { l.code, l.name };
var ld = lst.Join(dt.AsEnumerable(), l => l.code, d => d.Field<string>("code"), (l,d) => new { l.code, l.name });
It's not clear what your required output is, but it looks as if you are correctly getting only the common records. You could extend your select to
select new { l.code, l.name, value = d.Field<string>("value") }
which would give all the columns from both tables.
code name value
==== ==== =====
3 z a
Try this:
var ld = from l in lst
join d in dt.Rows.Cast<DataRow>() on l.code equals d["code"].ToString()
select new { l.code, l.name };
So you have a List and a DataTable. You don't plan to use the values of the DataTable, only the codes.
You want to keep those List items whose Code is also a code in the DataTable.
If you plan to use your DataTable for other things than just this problem, my advice would be to first create a procedure that converts your DataTable into an enumerable sequence.
This way you can add LINQ statements, not only for this problem, but also for other problems.
Let's create an extension method for your DataTable that converts each row into the item it represents. See extension methods demystified.
Alas, I don't know what's in your DataTable, so let's assume that your DataTable contains CustomerOrders:
class CustomerOrder
{
public int Id {get; set;}
public int CustomerId {get; set;}
public int Code {get; set;}
public string Value {get; set;}
...
}
The extension method that extends functionality of class DataTable:
public static IEnumerable<CustomerOrder> ToCustomerOrders(this DataTable table)
{
    return table.AsEnumerable().Select(row => new CustomerOrder
    {
        Id = ...,
        CustomerId = ...,
        Code = ...,
        Value = ...,
    });
}
I'm not really familiar with DataTables, but you know how to convert the cells of the row into the proper value.
Usage:
DataTable table = ...
int customerId = 14;
var ordersOfThisCustomer = table.ToCustomerOrders()
    .Where(customerOrder => customerOrder.CustomerId == customerId)
    .FirstOrDefault();
In words: convert the DataTable into CustomerOrders, row by row, and check for every converted CustomerOrder whether it has a CustomerId equal to 14. Stop as soon as one is found. Return null if there is no such row.
Now that you've got a nice reusable procedure that is also easy to test, debug and change, we can answer your question.
Given a DataTable with CustomerOrders, and a sequence of items that contain Code and Name, keep only those items from the sequence that have a Code that is also a Code in the DataTable.
var dataTable = ... // your DataTable, filled with CustomerOrders.
var codeNames = ... // your list with Codes and Names
var codesInDataTable = dataTable.ToCustomerOrders()
.Select(customerOrder => customerOrder.Code)
.Distinct();
This will create an enumerable sequence that will convert your DataTable row by row and extract property Code. Duplicate Code values will be removed.
If Codes are unique, you don't need Distinct.
Note: the enumerable sequence is not enumerated yet!
var result = codeNames
.Where(codeName => codesInDataTable.Contains(codeName.Code))
.ToList();
In words: for every [Code, Name] combination in your list, keep only those [Code, Name] combinations that have a value for Code that is also in codesInDataTable.
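Because a deferred sequence is re-enumerated on every Contains call, it can help to materialize the codes once into a set. The following is a minimal sketch using the sample values from the question; the tuple list stands in for the asker's list type, which is not shown, and the codes are treated as strings as the question states.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample data from the question; tuples stand in for the list's real element type.
var lst = new List<(string code, string name)> { ("1", "x"), ("2", "y"), ("3", "z") };

var dt = new DataTable();
dt.Columns.Add("code", typeof(string));
dt.Columns.Add("value", typeof(string));
dt.Rows.Add("3", "a");
dt.Rows.Add("4", "b");
dt.Rows.Add("5", "c");

// Read the codes once so each lookup is a hash probe instead of a rescan of the table.
var codesInTable = new HashSet<string>(
    dt.AsEnumerable().Select(row => row.Field<string>("code")));

var matches = lst.Where(item => codesInTable.Contains(item.code)).ToList();

foreach (var (code, name) in matches)
    Console.WriteLine($"{code} {name}");   // prints "3 z"
```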

c# datatable groupby and sum column's values (without knowing the names)

I need to do a group by and sum the values of each column. So far I've been able to create a DataTable as:
DataTable stats = dt.AsEnumerable().GroupBy(r => r["Data"]).OrderByDescending(r => r.Key).Select(g => g.OrderBy(r => r["Data"]).First()).CopyToDataTable();
Basically I also need to sum the values of each column in the original DataTable (dt). Please consider that, apart from a couple of columns, I might not know how many columns there are or what they are called.
In a previous test I used:
var query = from stat in stats.AsEnumerable()
group stat by stat.Field<string>("Data") into data
orderby data.Key
select new
{
Data = data.Key,
TotTWorked = data.Sum(stat => stat.Field<int>("Time_Work")),
TotTHold = data.Sum(stat => stat.Field<int>("Time_Hold")),
TotTAlarm = data.Sum(stat => stat.Field<int>("Time_Alarm")),
Productivity = 0,
};
But now I need to be more flexible so I can't specify the column name as above. Any help?
So assuming you have at least the list of column names, I'd go with the approach of creating a dictionary as part of the select and then transform it later to whatever form you need it. Here's an example:
var query = from stat in stats.AsEnumerable()
group stat by stat.Field<string>("Data") into data
orderby data.Key
select new
{
Data = data.Key,
SumsDictionary = listOfColumnNames
.Select(colName => new { ColName = colName, Sum = data.Sum(stat => stat.Field<int>(colName)) })
.ToDictionary(d => d.ColName, d => d.Sum),
Productivity = 0,
};
So that if you were to serialize the result object it would look something like this:
{
"Data": {},
"SumsDictionary": {
"Time_Work": 10,
"Time_Hold": 20,
"Time_Alarm": 30
},
"Productivity": 0
}
Hope it helps!
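If the column names are not known in advance, they can be read from the table's schema. The following is a minimal sketch, assuming every int column other than the grouping key "Data" should be summed; the sample rows and the variable names sumColumns and totals are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample table; only "Data" is known up front.
var dt = new DataTable();
dt.Columns.Add("Data", typeof(string));
dt.Columns.Add("Time_Work", typeof(int));
dt.Columns.Add("Time_Hold", typeof(int));
dt.Rows.Add("2020-01-01", 5, 1);
dt.Rows.Add("2020-01-01", 7, 2);
dt.Rows.Add("2020-01-02", 3, 4);

// Discover the columns to sum from the schema: every int column except the key.
List<string> sumColumns = dt.Columns.Cast<DataColumn>()
    .Where(c => c.DataType == typeof(int) && c.ColumnName != "Data")
    .Select(c => c.ColumnName)
    .ToList();

// One dictionary of column sums per distinct "Data" value.
var totals = dt.AsEnumerable()
    .GroupBy(r => r.Field<string>("Data"))
    .ToDictionary(
        g => g.Key,
        g => sumColumns.ToDictionary(
            col => col,
            col => g.Sum(r => r.Field<int?>(col) ?? 0)));   // DBNull counts as 0

Console.WriteLine(totals["2020-01-01"]["Time_Work"]);   // prints 12
```

Reading the cells as int? keeps the sum from throwing on DBNull values; columns of other numeric types would need their own branch.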

SQL to LINQ expres

I'm trying to convert a SQL expression to LINQ but I can't make it work. Can anyone help?
SELECT
COUNT(descricaoFamiliaNovo) as quantidades
FROM VeiculoComSeminovo
group by descricaoFamiliaNovo
I tried this:
ViewBag.familiasCount = db.VeiculoComSeminovo.GroupBy(a => a.descricaoFamiliaNovo).Count();
I need to know how many times each value repeats, but this way it shows me how many distinct values there are in the column.
You can try the following (ViewBag is a dynamic property, not a type, so project into an anonymous type instead):
var list = from a in db.VeiculoComSeminovo
           group a by a.descricaoFamiliaNovo into g
           select new
           {
               familiasCount = g.Count()
           };
or
var list = db.VeiculoComSeminovo.GroupBy(a => a.descricaoFamiliaNovo)
    .Select(g => new
    {
        familiasCount = g.Count()
    });
If you also need the column value:
new
{
    FieldName = g.Key,
    familiasCount = g.Count()
};
You don't need the GROUP BY unless there are fields other than the one in COUNT. Try
SELECT
COUNT(descricaoFamiliaNovo) as quantidades
FROM VeiculoComSeminovo
UPDATE, from your comment:
SELECT
COUNT(descricaoFamiliaNovo) as quantidades,
descricaoFamiliaNovo
FROM VeiculoComSeminovo
GROUP BY descricaoFamiliaNovo
That's it as SQL. In LINQ it is something like:
var response = db.VeiculoComSeminovo.GroupBy(a => a.descricaoFamiliaNovo)
    .Select(n => new
    {
        Name = n.Key,
        Count = n.Count()
    });
Not tested.
Thank you all for the help.
I solved the problem using these lines:
// get the objects from the db
var list = db.VeiculoComSeminovo.ToList();
// lists to receive the data
List<int> totaisFamilia = new List<int>();
List<int> totaisFamiliaComSN = new List<int>();
// loop through the families and add the values I need to their lists
foreach (var item in ViewBag.familias)
{
    totaisFamilia.Add(list.Count(a => a.descricaoFamiliaNovo == item && a.valorSeminovo == null));
    totaisFamiliaComSN.Add(list.Count(a => a.descricaoFamiliaNovo == item && a.valorSeminovo != null));
}
The query was a little slower than I expected, but I got the data.
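The loop above scans the whole list twice per family, which may explain the slowness. A single grouped pass can produce both counts. The following is a minimal sketch, assuming an in-memory list with the two properties used above; the tuple list, its sample values, and the names SemSN and ComSN are hypothetical stand-ins for the real entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for db.VeiculoComSeminovo.ToList().
var list = new List<(string descricaoFamiliaNovo, decimal? valorSeminovo)>
{
    ("Hatch", null), ("Hatch", 30000m), ("Sedan", 45000m), ("Hatch", null),
};

// Group once; each group yields the count without and with a valorSeminovo.
var totals = list
    .GroupBy(v => v.descricaoFamiliaNovo)
    .ToDictionary(
        g => g.Key,
        g => (SemSN: g.Count(v => v.valorSeminovo == null),
              ComSN: g.Count(v => v.valorSeminovo != null)));

var hatch = totals["Hatch"];
Console.WriteLine($"{hatch.SemSN} without, {hatch.ComSN} with");   // prints "2 without, 1 with"
```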

How to convert to int and then compare in linq query c#

IEnumerable<classB> list = getItems();
//dt is datatable
list = list.Where(x => Convert.ToInt32( !dt.Columns["Id"]) == (x.Id));
I want to keep only the items in the list whose Id matches a value in the DataTable's Id column. The rest should be removed. I'm not doing it right.
The datatable can have: ID - 1,3,4,5,7
The list can have: ID - 1,2,3,4,5,6,7,8,9,10
I want the output list to have: ID - 1,3,4,5,7
Your code won't work because you're comparing a definition of a column to an integer value. That's not a sensible comparison to make.
What you can do is put all of the values from the data table into a collection that can be effectively searched and then get all of the items in the list that are also in that collection:
var ids = new HashSet<int>(dt.AsEnumerable()
    .Select(row => row.Field<int>("Id")));
list = list.Where(x => ids.Contains(x.Id));
Try this one
var idList = dt.AsEnumerable().Select(d => (int) d["Id"]).ToList();
list = list.Where(x => idList.Contains(x.Id));
You can't do it like that. dt.Columns["Id"] returns the DataColumn, not the value stored in that column for a specific DataRow. You need to join two LINQ queries: the first one you already have, and the other you need to build from the DataTable.
var queryDt = (from dtRow in dt.AsEnumerable()
               where !dtRow.IsNull("Id")
               select Convert.ToInt32(dtRow["Id"])).ToList();
Now the join:
var qry = from nonNull in queryDt
          join existing in list on nonNull equals existing.Id
          select existing;

C# - Remove rows with the same column value from a DataTable

I have a DataTable which looks like this:
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
2 cc 1.2.12
3 cd 2.3.12
Which is the fastest way to remove the rows with the same ID, to get something like this (keep the first occurrence, delete the next ones):
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
3 cd 2.3.12
I don't want to double pass the table rows, because the row number is big.
I want to use some LinQ if possible, but I guess it will be a big query and I have to use a comparer.
You can use LINQ to DataSet. To get distinct rows based on the ID column, group by that column and then select the first row of each group:
var result = dt.AsEnumerable()
.GroupBy(r => r.Field<int>("ID"))
.Select(g => g.First())
.CopyToDataTable();
I was solving the same situation and found it quite interesting and would like to share my finding.
If rows are to be distinct based on ALL COLUMNS.
DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");
The columns you mention here, only those will be returned back in newDatatable.
If distinct based on one column and column type is int then I would prefer LINQ query.
DataTable newDatatable = dt.AsEnumerable()
    .GroupBy(dr => dr.Field<int>("ID"))
    .Select(dg => dg.First())
    .CopyToDataTable();
If distinct based on one column and column type is string then I would prefer loop.
List<string> toExclude = new List<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    var idValue = (string)dt.Rows[i]["ID"];
    if (toExclude.Contains(idValue))
    {
        // Duplicate ID: remove the row and re-check the index that shifted into its place.
        dt.Rows.Remove(dt.Rows[i]);
        i--;
    }
    else
    {
        toExclude.Add(idValue);
    }
}
The third is my favorite.
I may have answered a few things that were not asked in the question. It was done with good intent and a little excitement as well.
Hope it helps.
You can try this:
DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");
Not necessarily the most efficient approach, but maybe the most readable:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup.First())
.CopyToDataTable();
Linq is also more powerful. For example, if you want to change the logic and not select the first (arbitrary) row of each id-group but the last according to DateBirth:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup
.OrderByDescending(r => r.Field<DateTime>("DateBirth"))
.First())
.CopyToDataTable();
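Since the question asks to avoid passing over the rows twice, a single-pass variant is also possible. The following is a minimal sketch, assuming an int ID column and a cut-down version of the question's sample rows; it relies on HashSet<T>.Add returning false for a value that is already in the set.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample rows from the question (DateBirth omitted for brevity).
var dt = new DataTable();
dt.Columns.Add("ID", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Rows.Add(1, "aa");
dt.Rows.Add(2, "bb");
dt.Rows.Add(2, "cc");
dt.Rows.Add(3, "cd");

// Add returns false for an ID already seen, so only first occurrences pass the filter.
var seen = new HashSet<int>();
DataTable distinct = dt.AsEnumerable()
    .Where(row => seen.Add(row.Field<int>("ID")))
    .CopyToDataTable();

Console.WriteLine(string.Join(",", distinct.AsEnumerable().Select(r => r.Field<string>("Name"))));
// prints "aa,bb,cd"
```

The filter has a side effect on seen, so the query should be enumerated only once, as CopyToDataTable does here.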
Get a record count for each ID and keep only the IDs that occur more than once:
var rowsToDelete =
(from row in dataTable.AsEnumerable()
group row by row.Field<int>("ID") into g
where g.Count() > 1
Determine which record to keep (I don't know your criteria; I will just sort by DateBirth, then Name, and keep the first record) and select the rest:
select g.OrderBy(dr => dr.Field<DateTime>("DateBirth")).ThenBy(dr => dr.Field<string>("Name")).Skip(1))
Flatten:
.SelectMany(g => g);
Delete the rows (materialize the query first; IEnumerable<T> has no ForEach method):
foreach (var dr in rowsToDelete.ToList()) dr.Delete();
Accept changes:
dataTable.AcceptChanges();
Here's a way to achieve this.
All you need is the MoreLINQ library and its DistinctBy function.
Code:
protected void Page_Load(object sender, EventArgs e)
{
var DistinctByIdColumn = getDT2().AsEnumerable()
.DistinctBy(
row => new { Id = row["Id"] });
DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}
public DataTable getDT2()
{
DataTable dt = new DataTable();
dt.Columns.Add("Id", typeof(string));
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Dob", typeof(string));
dt.Rows.Add("1", "aa","1.1.11");
dt.Rows.Add("2", "bb","2.3.11");
dt.Rows.Add("2", "cc","1.2.12");
dt.Rows.Add("3", "cd","2.3.12");
return dt;
}
Output: as you expected.
For MoreLINQ sample code, view my blog.
