I am trying to understand LINQ method notations better.
Say I have a DataTable:
DataTable table = new DataTable("Products");
table.Columns.Add("ProductID", typeof(int));
table.Columns.Add("ProductName", typeof(string));
table.Columns.Add("Category", typeof(string));
table.Columns.Add("UnitPrice", typeof(decimal));
table.Columns.Add("UnitsInStock", typeof(int));
Assume that variable products is being loaded from the DataTable("Products")
var products = testDS.Tables["Products"].AsEnumerable();
So I know I can do queries like:
var productNameGroups = products.GroupBy(x => x.Field<string>("ProductName").Substring(0, 1)).Select(x => new { FirstLetter = x.Key, Words = x});
var productGroups = products.GroupBy(p => p.Field<string>("Category")).Select(x => new { Category = x.Key, Products = x });
I'm having trouble grasping the x.Key in the Select method. I'm not sure how it's set or when I can or can't use it.
The x.Key is specific to processing results of a GroupBy method.
When you do this
var res = someData.GroupBy(item => item.Property);
the result is an IEnumerable<IGrouping<TKey, TElement>>. Each IGrouping is a sequence of the items that share one key, and its Key property holds the value of Property on which those items were grouped.
Since in your case the grouping is done on the first letter of ProductName in one query and on Category in the other, that is what you get when you reference x.Key in each of the groups returned by the query.
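To make the shape of the GroupBy result concrete, here is a minimal sketch. The sample rows are made up for illustration, and it assumes a C# 9+ project (top-level statements) where the DataTable LINQ extensions are available.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data mirroring two columns of the Products table.
var table = new DataTable("Products");
table.Columns.Add("ProductName", typeof(string));
table.Columns.Add("Category", typeof(string));
table.Rows.Add("Chai", "Beverages");
table.Rows.Add("Chang", "Beverages");
table.Rows.Add("Tofu", "Produce");

// Each element of the result is an IGrouping<string, DataRow>:
// Key holds the Category value, and the group itself enumerates its rows.
var groups = table.AsEnumerable()
    .GroupBy(row => row.Field<string>("Category"))
    .ToList();

foreach (var group in groups)
{
    var names = group.Select(row => row.Field<string>("ProductName"));
    Console.WriteLine($"{group.Key}: {string.Join(", ", names)}");
}
```

So x.Key is available whenever the thing you are projecting from is a group, which is the case in any Select that directly follows a GroupBy.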
Suppose I have a call log DataTable where each row represents a call placed with the following columns:
AccountNumber1, AccountNumber2, AccountListDate, AccountDisposition
I want to GroupBy column AccountNumber1 and want a new DataTable with the same columns + 1 additional column NumCalls which will be the count of calls for each AccountNumber1.
New DataTable after GroupBy:
AccountNumber1, AccountNumber2, AccountListDate, AccountDisposition, NumCalls
So far I have the following:
table.AsEnumerable()
.GroupBy(x => x.Field<int>("AccountNumber1"))
.Select(x => new { AccountNumber1 = x.Key, NumCalls = x.Count() })
.CopyToDataTable()
This gives me a DataTable with just two columns, AccountNumber1 and NumCalls. How do I get the other columns as described above? I would appreciate any help. Thank you.
There's no magic; you need to use a loop and initialize the new table with the new column:
DataTable tblResult = table.Clone();
tblResult.Columns.Add("NumCalls", typeof(int));
var query = table.AsEnumerable().GroupBy(r => r.Field<string>("AccountNumber1"));
foreach (var group in query)
{
    DataRow newRow = tblResult.Rows.Add();
    DataRow firstOfGroup = group.First();
    newRow.SetField<string>("AccountNumber1", group.Key);
    newRow.SetField<string>("AccountNumber2", firstOfGroup.Field<string>("AccountNumber2"));
    newRow.SetField<DateTime>("AccountListDate", firstOfGroup.Field<DateTime>("AccountListDate"));
    newRow.SetField<string>("AccountDisposition", firstOfGroup.Field<string>("AccountDisposition"));
    newRow.SetField<int>("NumCalls", group.Count());
}
This takes the other column values from the first row of each group, which is an arbitrary choice if the rows in a group differ, but that seems to be what you want.
I need to do a group by and sum the values for each column. So far I've been able to create a DataTable as:
DataTable stats = dt.AsEnumerable().GroupBy(r => r["Data"]).OrderByDescending(r => r.Key).Select(g => g.OrderBy(r => r["Data"]).First()).CopyToDataTable();
Basically I also need to sum the values of each column in the original DataTable (dt). Please note that, apart from a couple of columns, I might not know how many columns there are or what they are called.
In a previous test I used:
var query = from stat in stats
group stat by stat.Field<string>("Data") into data
orderby data.Key
select new
{
Data = data.Key,
TotTWorked = data.Sum(stat => stat.Field<int>("Time_Work")),
TotTHold = data.Sum(stat => stat.Field<int>("Time_Hold")),
TotTAlarm = data.Sum(stat => stat.Field<int>("Time_Alarm")),
Productivity = 0,
};
But now I need to be more flexible, so I can't specify the column names as above. Any help?
Assuming you have at least the list of column names, I'd create a dictionary as part of the select and then transform it later into whatever form you need. Here's an example:
var query = from stat in stats.AsEnumerable()
            group stat by stat.Field<string>("Data") into data
            orderby data.Key
            select new
            {
                Data = data.Key,
                SumsDictionary = listOfColumnNames
                    .Select(colName => new { ColName = colName, Sum = data.Sum(stat => stat.Field<int>(colName)) })
                    .ToDictionary(d => d.ColName, d => d.Sum),
                Productivity = 0,
            };
So that if you were to serialize the result object it would look something like this:
{
"Data": {},
"SumsDictionary": {
"Time_Work": 10,
"Time_Hold": 20,
"Time_Alarm": 30
},
"Productivity": 0
}
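If the column names are not known up front, one option is to read them from the table schema at run time. The following is only a sketch under some assumptions: the sample table is made up, every column to be summed is of type int, and none of the values are DBNull (Field<int> would throw on those).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample table; only the "Data" column name is known up front.
var stats = new DataTable();
stats.Columns.Add("Data", typeof(string));
stats.Columns.Add("Time_Work", typeof(int));
stats.Columns.Add("Time_Hold", typeof(int));
stats.Rows.Add("A", 1, 10);
stats.Rows.Add("A", 2, 20);
stats.Rows.Add("B", 5, 50);

// Discover the summable columns from the schema instead of hard-coding them.
List<string> listOfColumnNames = stats.Columns.Cast<DataColumn>()
    .Where(c => c.DataType == typeof(int))
    .Select(c => c.ColumnName)
    .ToList();

// One result object per group, with a dictionary of column name -> sum.
var result = stats.AsEnumerable()
    .GroupBy(r => r.Field<string>("Data"))
    .OrderBy(g => g.Key)
    .Select(g => new
    {
        Data = g.Key,
        SumsDictionary = listOfColumnNames.ToDictionary(
            col => col,
            col => g.Sum(r => r.Field<int>(col)))
    })
    .ToList();

foreach (var item in result)
{
    var parts = item.SumsDictionary.Select(kv => $"{kv.Key}={kv.Value}");
    Console.WriteLine($"{item.Data}: {string.Join(", ", parts)}");
}
```

Filtering on DataType is just one way to pick the columns; excluding a known list of non-numeric column names would work as well.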
Hope it helps!
I have a DataTable in C# with columns defined as follows:
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
"UserName", "UserExId", and "UserEmail" are all unique and they are grouped by "OrgName" and "OrgExId"
I want to write a LINQ query to make a new DataTable that contains the unique "OrgExId" and "OrgName" values.
This is as far as I got:
var results = from row in dt.AsEnumerable()
group row by row["OrgExId"] into orgs
select orgs;
Specifically in this query, I don't understand how I am supposed to select the rows from the original DataTable. Visual Studio says orgs is of the type IGrouping, but I have never really seen this type before and am not sure how to manipulate it.
Is this a key value pair?
Sorry about that all. I did not specify my end result.
I want to end up with a DataTable with two columns, distinct "OrgExId" and "OrgName". (There is a one to one relationship between "OrgExId" and "OrgName")
All you really need is a Distinct clause
var output = dt.AsEnumerable()
.Select(x => new {OrgExId = x["OrgExId"], OrgName = x["OrgName"]})
.Distinct();
You can then iterate over this and build a DataTable or whatever you need.
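As a rough sketch of that last step, the distinct pairs can be copied into a new two-column table. The sample rows and the name orgTable are made up for illustration; the approach relies on anonymous types comparing by value, which is what lets Distinct collapse the duplicates.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data with a subset of the question's columns.
var dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Rows.Add("Acme", "org-1", "alice");
dt.Rows.Add("Acme", "org-1", "bob");
dt.Rows.Add("Globex", "org-2", "carol");

// Anonymous types have value-based Equals/GetHashCode, so Distinct works on them.
var distinctOrgs = dt.AsEnumerable()
    .Select(x => new
    {
        OrgExId = x.Field<string>("OrgExId"),
        OrgName = x.Field<string>("OrgName")
    })
    .Distinct();

// Build the result table by hand, one row per distinct pair.
var orgTable = new DataTable();
orgTable.Columns.Add("OrgExId", typeof(string));
orgTable.Columns.Add("OrgName", typeof(string));
foreach (var org in distinctOrgs)
{
    orgTable.Rows.Add(org.OrgExId, org.OrgName);
}

Console.WriteLine(orgTable.Rows.Count); // prints 2
```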
UPDATE: You asked for the output to be a DataTable, and the above solution didn't quite sit well with me since it requires extra work to build one. To avoid that, you could write a custom equality comparer.
Your LINQ looks like this:
// This returns a DataTable
var output = dt.AsEnumerable()
.Distinct(new OrgExIdEqualityComparer())
.CopyToDataTable();
And your comparer looks like this...
public class OrgExIdEqualityComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        return x["OrgExId"].Equals(y["OrgExId"]);
    }

    public int GetHashCode(DataRow obj)
    {
        return obj["OrgExId"].GetHashCode();
    }
}
Use the Key property of IGrouping:
var results = from row in dt.AsEnumerable()
              group row by new {
                  OrgExId = row.Field<string>("OrgExId"),
                  OrgName = row.Field<string>("OrgName")
              } into orgs
              select orgs.Key;
This gives you a collection of anonymous-type objects. To get a DataTable, iterate over the results and add each one as a row to a new DataTable.
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
// put some data for testing purpose
var id = Guid.NewGuid().ToString();
for (var i = 0; i < 10; i++)
dt.Rows.Add(id, i.ToString(), "user_name", Guid.NewGuid().ToString());
var names = dt.Rows.Cast<DataRow>().Select(r => r.Field<string>("UserName")).Distinct();
Console.WriteLine(string.Join(", ", names));
IEnumerable<classB> list = getItems();
//dt is datatable
list = list.Where(x => Convert.ToInt32( !dt.Columns["Id"]) == (x.Id));
I want to keep only the items in the list whose Id appears in the DataTable's Id column; the rest should be removed. I'm not doing it right.
The datatable can have: ID - 1,3,4,5,7
The list can have: ID - 1,2,3,4,5,6,7,8,9,10
I want the output list to have: ID - 1,3,4,5,7
Your code won't work because you're comparing a definition of a column to an integer value. That's not a sensible comparison to make.
What you can do is put all of the values from the data table into a collection that can be effectively searched and then get all of the items in the list that are also in that collection:
var ids = new HashSet<int>(dt.AsEnumerable()
    .Select(row => row.Field<int>("Id")));
list = list.Where(x => ids.Contains(x.Id));
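Here is a small self-contained sketch of the same idea using the IDs from the question. The anonymous objects are a hypothetical stand-in for classB, which the question does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// DataTable holding the IDs 1, 3, 4, 5, 7 as in the question.
var dt = new DataTable();
dt.Columns.Add("Id", typeof(int));
foreach (var id in new[] { 1, 3, 4, 5, 7 })
{
    dt.Rows.Add(id);
}

// Hypothetical stand-in for the list of classB items with IDs 1..10.
var list = Enumerable.Range(1, 10).Select(i => new { Id = i }).ToList();

// Collect the table's IDs once; each Contains lookup is then a hash lookup.
var ids = new HashSet<int>(dt.AsEnumerable().Select(row => row.Field<int>("Id")));
var kept = list.Where(x => ids.Contains(x.Id)).ToList();

Console.WriteLine(string.Join(",", kept.Select(x => x.Id))); // prints 1,3,4,5,7
```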
Try this one
var idList = dt.AsEnumerable().Select(d => (int) d["Id"]).ToList();
list = list.Where(x => idList.Contains(x.Id));
You can't do it like that. dt.Columns["Id"] returns the DataColumn, not the value stored in that column for a specific DataRow. You need a join between two LINQ sequences: the list you already have, and the Id values you extract from the DataTable.
var queryDt = (from dtRow in dt.AsEnumerable()
               where !dtRow.IsNull("Id")
               select int.Parse(dtRow["Id"].ToString())).ToList();
Now the join:
var qry = from nonNull in queryDt
          join existing in list on nonNull equals existing.Id
          select existing;
I have a DataTable which looks like this:
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
2 cc 1.2.12
3 cd 2.3.12
Which is the fastest way to remove the rows with the same ID, to get something like this (keep the first occurrence, delete the next ones):
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
3 cd 2.3.12
I don't want to double pass the table rows, because the row number is big.
I want to use some LINQ if possible, but I guess it will be a big query and I will have to use a comparer.
You can use LINQ to DataSet for this. To get rows that are distinct on the ID column, group by that column and then select the first row of each group:
var result = dt.AsEnumerable()
.GroupBy(r => r.Field<int>("ID"))
.Select(g => g.First())
.CopyToDataTable();
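A quick way to check this against the sample data from the question is the sketch below. The DateBirth column is left out because the date format in the question is ambiguous. GroupBy yields groups in order of first key occurrence and keeps source order inside each group, so First() returns the first occurrence.

```csharp
using System;
using System.Data;
using System.Linq;

// Sample rows from the question (ID and Name only).
var dt = new DataTable();
dt.Columns.Add("ID", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Rows.Add(1, "aa");
dt.Rows.Add(2, "bb");
dt.Rows.Add(2, "cc");
dt.Rows.Add(3, "cd");

// Keep the first row seen for each ID.
DataTable result = dt.AsEnumerable()
    .GroupBy(r => r.Field<int>("ID"))
    .Select(g => g.First())
    .CopyToDataTable();

foreach (DataRow row in result.Rows)
{
    Console.WriteLine($"{row["ID"]} {row["Name"]}");
}
// prints:
// 1 aa
// 2 bb
// 3 cd
```

Note that CopyToDataTable throws an InvalidOperationException when the source sequence is empty, so an empty input table needs a separate check.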
I was solving the same problem, found it quite interesting, and would like to share my findings.
If rows are to be distinct based on ALL COLUMNS:
DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");
Only the columns you list here are returned in newDatatable.
If rows are to be distinct based on one column and the column type is int, I would prefer a LINQ query:
DataTable newDatatable = dt.AsEnumerable()
    .GroupBy(dr => dr.Field<int>("ID"))
    .Select(dg => dg.First())
    .CopyToDataTable();
If rows are to be distinct based on one column and the column type is string, I would prefer a loop:
List<string> toExclude = new List<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    var idValue = (string)dt.Rows[i]["ID"];
    if (toExclude.Contains(idValue))
    {
        // Duplicate ID: remove the row and re-check the same index.
        dt.Rows.RemoveAt(i);
        i--;
    }
    else
    {
        toExclude.Add(idValue);
    }
}
The third is my favorite.
I may have answered a few things that were not asked in the question. It was done with good intent, and with a little excitement as well.
Hope it helps.
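Since the question is about avoiding a second pass over a large table, here is a sketch of a variation on the loop idea. It swaps the List for a HashSet, so each duplicate check is a hash lookup rather than a scan of the list. It builds a new table instead of removing rows in place, and the sample data and the name deduped are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample rows with a string ID column.
var dt = new DataTable();
dt.Columns.Add("ID", typeof(string));
dt.Columns.Add("Name", typeof(string));
dt.Rows.Add("1", "aa");
dt.Rows.Add("2", "bb");
dt.Rows.Add("2", "cc");
dt.Rows.Add("3", "cd");

// HashSet.Add returns false when the value is already present,
// so the Where clause keeps only the first row for each ID.
var seen = new HashSet<string>();
DataTable deduped = dt.AsEnumerable()
    .Where(r => seen.Add((string)r["ID"]))
    .CopyToDataTable();

Console.WriteLine(deduped.Rows.Count); // prints 3
```

The rows are enumerated once. The cast assumes the ID column contains no DBNull values.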
You can try this (note that the resulting table contains only the ID column):
DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");
Not necessarily the most efficient approach, but maybe the most readable:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup.First())
.CopyToDataTable();
LINQ is also more powerful. For example, if you want to change the logic and select not the first (arbitrary) row of each ID group but the last one according to DateBirth:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup
.OrderByDescending(r => r.Field<DateTime>("DateBirth"))
.First())
.CopyToDataTable();
// Group the rows by ID and keep only the IDs that occur more than once
var rowsToDelete =
    (from row in dataTable.AsEnumerable()
     group row by row.Field<int>("ID") into g
     where g.Count() > 1
     // Decide which record to keep (I don't know your criteria, so this sorts
     // by DateBirth, then Name, and keeps the first record) and select the rest
     select g.OrderBy(dr => dr.Field<DateTime>("DateBirth"))
             .ThenBy(dr => dr.Field<string>("Name"))
             .Skip(1))
    // Flatten
    .SelectMany(g => g)
    // Materialize the list; List<T> provides ForEach, IEnumerable<T> does not
    .ToList();
// Delete rows
rowsToDelete.ForEach(dr => dr.Delete());
// Accept changes
dataTable.AcceptChanges();
Here's a way to achieve this.
All you need is the MoreLINQ library and its DistinctBy function.
Code:
protected void Page_Load(object sender, EventArgs e)
{
    var DistinctByIdColumn = getDT2().AsEnumerable()
        .DistinctBy(
            row => new { Id = row["Id"] });
    DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}

public DataTable getDT2()
{
    DataTable dt = new DataTable();
    dt.Columns.Add("Id", typeof(string));
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("Dob", typeof(string));
    dt.Rows.Add("1", "aa", "1.1.11");
    dt.Rows.Add("2", "bb", "2.3.11");
    dt.Rows.Add("2", "cc", "1.2.12");
    dt.Rows.Add("3", "cd", "2.3.12");
    return dt;
}
Output: as you expected.
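For what it's worth, on .NET 6 or later a DistinctBy method is built into System.Linq, so the same query can be sketched without MoreLINQ. This assumes a .NET 6+ project; if MoreLINQ is also referenced, the two DistinctBy extensions may conflict.

```csharp
using System;
using System.Data;
using System.Linq;

// Same sample rows as getDT2() above.
var dt = new DataTable();
dt.Columns.Add("Id", typeof(string));
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Dob", typeof(string));
dt.Rows.Add("1", "aa", "1.1.11");
dt.Rows.Add("2", "bb", "2.3.11");
dt.Rows.Add("2", "cc", "1.2.12");
dt.Rows.Add("3", "cd", "2.3.12");

// Enumerable.DistinctBy (available since .NET 6) keeps one row per key.
DataTable distinct = dt.AsEnumerable()
    .DistinctBy(row => row.Field<string>("Id"))
    .CopyToDataTable();

Console.WriteLine(distinct.Rows.Count); // prints 3
```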