I'm new to LINQ, so I'm sure there's an error in my logic below.
I have a list of objects:
class Characteristic
{
public string Name { get; set; }
public string Value { get; set; }
public bool IsIncluded { get; set; }
}
Using each object in the list, I want to build a query in LINQ that starts with a DataTable, and filters it based on the object values, and yields a DataTable as the result.
My Code so far:
DataTable table = MyTable;
// Also tried: DataTable table = MyTable.Clone();
foreach (Characteristic c in characteristics)
{
if (c.IsIncluded)
{
var q = (from r in table.AsEnumerable()
where r.Field<string>(c.Name) == c.Value
select r);
table = q.CopyToDataTable();
}
else
{
var q = (from r in table.AsEnumerable()
where r.Field<string>(c.Name) != c.Value
select r);
table = q.CopyToDataTable();
}
}
UPDATE
I was in a panicked hurry and I made a mistake; my DataTable was not empty, I just forgot to bind it to the DataGrid. But also, Henk Holterman pointed out that I was overwriting my result set each iteration, which was a logic error.
Henk's code seems to work the best so far, but I need to do more testing.
Spinon's answer also helped bring clarity to my mind, but his code gave me an error.
I need to try to understand Timwi's code better, but in its current form, it did not work for me.
NEW CODE
DataTable table = new DataTable();
foreach (Characteristic c in characteristics)
{
EnumerableRowCollection<DataRow> rows = null;
if (c.IsIncluded)
{
rows = (from r in MyTable.AsEnumerable()
where r.Field<string>(c.Name) == c.Value
select r);
}
else
{
rows = (from r in MyTable.AsEnumerable()
where r.Field<string>(c.Name) != c.Value
select r);
}
table.Merge(rows.CopyToDataTable());
}
dataGrid.DataContext = table;
The logic in your posting is wonky; here is my attempt at what I think you are trying to achieve.
DataTable table = MyTable.AsEnumerable()
.Where(r => characteristics.All(c => !c.IsIncluded ||
r.Field<string>(c.Name) == c.Value))
.CopyToDataTable();
If you actually want to use the logic in your posting, change || to ^, but that seems to make little sense.
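For what it's worth, here is a minimal sketch of the variant that follows the question's include/exclude logic (== for included characteristics, != for excluded ones). The sample columns and values are made up, a value tuple stands in for the Characteristic class, and it is written as a top-level program, which assumes C# 9 or later. CopyToDataTable throws an InvalidOperationException when the sequence is empty, so the sketch falls back to an empty clone of the table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample data; column names and values are made up for illustration.
var myTable = new DataTable();
myTable.Columns.Add("Color", typeof(string));
myTable.Columns.Add("Size", typeof(string));
myTable.Rows.Add("Red", "L");
myTable.Rows.Add("Red", "S");
myTable.Rows.Add("Blue", "L");

// Tuples stand in for the Characteristic class from the question.
var characteristics = new List<(string Name, string Value, bool IsIncluded)>
{
    ("Color", "Red", true),   // Color must equal "Red"
    ("Size", "S", false),     // Size must NOT equal "S"
};

// Keep a row only if it satisfies every characteristic:
// included ones must match (==), excluded ones must not match (!=).
var matches = myTable.AsEnumerable()
    .Where(r => characteristics.All(c =>
        (r.Field<string>(c.Name) == c.Value) == c.IsIncluded))
    .ToList();

// CopyToDataTable throws on an empty sequence, so fall back to an
// empty table with the same schema.
DataTable result = matches.Count > 0 ? matches.CopyToDataTable() : myTable.Clone();

Console.WriteLine(result.Rows.Count);
```

Because all characteristics are checked in a single Where, nothing is overwritten from one iteration to the next.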
You overwrite the table variable for each characteristic, so in the end it only holds the results from the last round, and that apparently is empty.
What you could do is something like:
// untested
var t = q.CopyToDataTable();
table.Merge(t);
And I suspect your query should use MyTable as the source:
var q = (from r in MyTable.AsEnumerable() ...
But that's not entirely clear.
If you are trying to just insert the rows into your table then try calling the CopyToDataTable method this way:
q.CopyToDataTable(table, LoadOption.PreserveChanges);
This way rather than reassigning the table variable you can just update it with the new rows that are to be inserted.
EDIT: Here is an example of what I was talking about:
DataTable table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Value", typeof(string));
Related
I have a List of objects (lst) and DataTable (dt). I want to join the lst and dt on the common field (code as string) and need to return all matching rows in the lst.
My List contains two columns i.e code and name along with values below:
code name
==== ====
1 x
2 y
3 z
The DataTable contains two columns i.e code and value along with values below:
code value
==== =====
3 a
4 b
5 c
The result is:
3 z
Below is my code, but I know it is not a correct statement, so I am seeking your advice here. I would much appreciate it if you could guide me on how to write the correct statement.
var ld = from l in lst
join d in dt.AsEnumerable() on l.code equals d.code
select new { l.code, l.name };
You can use a LINQ query or the Join extension method to join the collections on code. Note that when you read data from the DataTable rows, you need to use the Field method (d.Field<string>(...)). Use any of the following queries.
Query1:
var ld = lst.Join(dt.AsEnumerable(),
l => l.code,
d => d.Field<string>("code"),
(l, d) => new
{
l.code,
l.name,
value = d.Field<string>("value")
}).ToList();
Query2:
var ld = (from l in lst
join d in dt.AsEnumerable()
on l.code equals d.Field<string>("code")
select new
{
l.code,
l.name,
value = d.Field<string>("value")
}).ToList();
Query3:
var ld = (from l in lst
join d in dt.AsEnumerable()
on l.code equals d.Field<string>("code")
let value = d.Field<string>("value")
select new
{
l.code,
l.name,
value
}).ToList();
You can try either of the below (code is a string in the question, so the field is read as a string).
var ld = from l in lst
join d in dt.AsEnumerable() on l.code equals d.Field<string>("code")
select new { l.code, l.name };
var ld = lst.Join(dt.AsEnumerable(), l => l.code, d => d.Field<string>("code"), (l,d) => new { l.code, l.name });
It's not clear what your required output is, but it looks as if you are correctly getting only the common records. You could extend your select to
select new { l.code, l.name, value = d.Field<string>("value") }
which would give all the data/columns from both tables.
code name value
==== ==== =====
3 z a
Try this:
var ld = from l in lst
join d in dt.Rows.Cast<DataRow>() on l.code equals d["code"].ToString()
select new { l.code, l.name };
So you have a List and a DataTable. You don't plan to use the Values of the DataTable, only the Codes.
You want to keep those List items that have a Code that is also a Code in the DataTable.
If you plan to use your DataTable for other things than just this problem, my advice would be to first create a procedure to convert your DataTable into an enumerable sequence.
This way you can add LINQ statements, not only for this problem, but also for other problems.
Let's create an extension method for your DataTable that converts the data into the items that are in the DataTable. See extension methods demystified.
Alas, I don't know what's in your DataTable, let's assume that your DataTable contains Orders
class CustomerOrder
{
public int Id {get; set;}
public int CustomerId {get; set;}
public int Code {get; set;}
public string Value {get; set;}
...
}
The extension method that extends functionality of class DataTable:
public static IEnumerable<CustomerOrder> ToCustomerOrders(this DataTable table)
{
return table.AsEnumerable().Select(row => new CustomerOrder
{
Id = ...
CustomerId = ...
Code = ...
Value = ...
});
}
I'm not really familiar with DataTables, but you know how to convert the cells of the row into the proper value.
Usage:
DataTable table = ...
int customerId = 14;
var ordersOfThisCustomer = table.ToCustomerOrders()
.Where(customerOrder => customerOrder.CustomerId == customerId)
.FirstOrDefault();
In words: convert the DataTable into CustomerOrders, row by row, and check for every converted CustomerOrder whether it has a CustomerId equal to 14. Stop as soon as one is found; return null if there is no such row.
Now that you've got a nice reusable procedure that is also easy to test, debug and change, we can answer your question.
Given a DataTable with CustomerOrders, and a sequence of items that contain Code and Name, keep only those items from the sequence that have a Code that is also a Code in the DataTable.
var dataTable = ... // your DataTable, filled with CustomerOrders.
var codeNames = ... // your list with Codes and Names
var codesInDataTable = dataTable.ToCustomerOrders()
.Select(customerOrder => customerOrder.Code)
.Distinct();
This will create an enumerable sequence that will convert your DataTable row by row and extract property Code. Duplicate Code values will be removed.
If Codes are unique, you don't need Distinct.
Note: the enumerable sequence is not enumerated yet!
var result = codeNames
.Where(codeName => codesInDataTable.Contains(codeName.Code))
.ToList();
In words: for every [Code, Name] combination in your list, keep only those [Code, Name] combinations that have a value for Code that is also in codesInDataTable.
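One caveat with the deferred sequence: Contains on a plain IEnumerable walks the table again for every list item. A possible variation, sketched below, materializes the codes into a HashSet once. It reads the column directly with Field instead of going through CustomerOrder, uses the sample data and string codes from the question, and a value tuple stands in for the list's item type (all of that is my assumption). It is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample data copied from the question; string codes are an assumption.
var dt = new DataTable();
dt.Columns.Add("code", typeof(string));
dt.Columns.Add("value", typeof(string));
dt.Rows.Add("3", "a");
dt.Rows.Add("4", "b");
dt.Rows.Add("5", "c");

// A tuple stands in for the list's item type.
var lst = new List<(string code, string name)> { ("1", "x"), ("2", "y"), ("3", "z") };

// Enumerate the table once and keep its codes in a set.
var codesInTable = new HashSet<string>(
    dt.AsEnumerable().Select(row => row.Field<string>("code")));

// Each lookup is now a hash probe instead of a scan over the table.
var result = lst.Where(item => codesInTable.Contains(item.code)).ToList();

foreach (var item in result)
    Console.WriteLine($"{item.code} {item.name}");   // prints "3 z"
```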
My table has 108000 rows. The DataTable is really just a tab-delimited text file that I read in for processing.
I have code similar to this structure:
private void Foo(DataTable myDataTable)
{
List<string> alreadyProcessed = new List<string>();
foreach(DataRow row in myDataTable.Rows)
{
string current = row["Emp"].ToString().Trim();
if (alreadyProcessed.Contains(current))
continue;
alreadyProcessed.Add(current);
var empRows = from p in myDataTable.AsEnumerable()
where p["Emp"].ToString().Trim() == current
select new EmpClass
{
LastName = (string) p["lName"],
// some more similar stuff
};
// do some stuff with that EmpClass but they shouldn't take too long
}
}
Running such a thing is taking more than 15 minutes. How can I improve this?
Here is a rather naive rewrite of your code.
Instead of tracking which employees you have already processed, let's just group all rows by their employees and process them each separately.
var rowsPerEmployee =
(from DataRow row in dt.Rows
let emp = row["Emp"].ToString().Trim()
group row by emp into g
select g)
.ToDictionary(
g => g.Key,
g => g.ToArray());
foreach (var current in rowsPerEmployee.Keys)
{
var empRows = rowsPerEmployee[current];
// ... rest of your code here; note that empRows is now all the rows for a single employee
// ... and not just the last name or similar
}
This will first group the entire datatable by the employee and create a dictionary from employee to rows for that employee, and then loop on the employees and get the rows.
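A similar idea can be sketched with ToLookup, which buckets the rows in one pass and avoids re-scanning the whole table for every distinct employee. The sample rows below are hypothetical; only the "Emp" and "lName" column names come from the question. It is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample rows; "Emp" and "lName" are the column names from the question.
var myDataTable = new DataTable();
myDataTable.Columns.Add("Emp", typeof(string));
myDataTable.Columns.Add("lName", typeof(string));
myDataTable.Rows.Add(" E1 ", "Smith");
myDataTable.Rows.Add("E2", "Jones");
myDataTable.Rows.Add("E1", "Smith");

// One pass over the table: bucket rows by trimmed employee id.
// A null (DBNull) id is treated as an empty string.
var rowsPerEmployee = myDataTable.AsEnumerable()
    .ToLookup(row => (row.Field<string>("Emp") ?? "").Trim());

foreach (var employee in rowsPerEmployee)
{
    // employee.Key is the id; enumerating employee yields that employee's rows.
    Console.WriteLine($"{employee.Key}: {employee.Count()} row(s)");
}
```

Indexing the lookup with a key that is not present returns an empty sequence rather than throwing, which can be handy compared to a dictionary.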
You should group by "Emp"; otherwise you're going through each row and, for some rows, querying the whole table. Something like this:
from p in myDataTable.AsEnumerable()
group p by p.Field<string>("Emp") into g
select new { Emp = g.Key,
Data = g.Select(gg=>new EmpClass
{
LastName = gg.Field<string>("lName")
}
)
}
One thing that might slow the LINQ statement down is how much data you are selecting. You write 'select new EmpClass', and depending on how many columns (and rows) you project into your output, this may slow things down drastically. Other tips and tricks for this problem may be found at: http://visualstudiomagazine.com/articles/2010/06/24/five-tips-linq-to-sql.aspx
I have a DataTable in C# with columns defined as follows:
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
"UserName", "UserExId", and "UserEmail" are all unique and they are grouped by "OrgName" and "OrgExId"
I want to write a LINQ query to make a new DataTable that contains unique "OrgExId's" and "OrgName's"
This is as far as I got:
var results = from row in dt.AsEnumerable()
group row by row["OrgExId"] into orgs
select orgs;
Specifically in this query, I don't understand how I am supposed to select the rows from the original DataTable. Visual Studio says orgs is of the type IGrouping, but I have never really seen this type before and am not sure how to manipulate it.
Is this a key value pair?
Sorry about that all. I did not specify my end result.
I want to end up with a DataTable with two columns, distinct "OrgExId" and "OrgName". (There is a one to one relationship between "OrgExId" and "OrgName")
All you really need is a Distinct clause
var output = dt.AsEnumerable()
.Select(x => new {OrgExId = x["OrgExId"], OrgName = x["OrgName"]})
.Distinct();
You can then iterate over this and build a DataTable or whatever you need.
UPDATE: You asked for the output to be a DataTable and the above solution didn't quite sit well with me since it requires extra work. To make this more efficient you could do a custom equality comparer.
Your linq looks like this...
// This returns a DataTable
var output = dt.AsEnumerable()
.Distinct(new OrgExIdEqualityComparer())
.CopyToDataTable();
And your comparer looks like this...
public class OrgExIdEqualityComparer : IEqualityComparer<DataRow>
{
public bool Equals(DataRow x, DataRow y)
{
return x["OrgExId"].Equals(y["OrgExId"]);
}
public int GetHashCode(DataRow obj)
{
return obj["OrgExId"].GetHashCode();
}
}
Use Key property of IGrouping:
var results = from row in dt.AsEnumerable()
group row by new {
OrgExId = row.Field<string>("OrgExId"),
OrgName = row.Field<string>("OrgName")
} into orgs
select orgs.Key;
This gives you a collection of anonymous-type objects. To get a DataTable, simply iterate over the results and add each one to a new DataTable.
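The "iterate and add" step could look roughly like this. It is only a sketch: the sample rows are hypothetical, the distinct pairs are produced with Select and Distinct rather than a group, and it is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data using three of the question's columns.
var dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Rows.Add("Acme", "A1", "alice");
dt.Rows.Add("Acme", "A1", "bob");
dt.Rows.Add("Globex", "G1", "carol");

// Distinct (OrgExId, OrgName) pairs; anonymous types compare by value.
var pairs = dt.AsEnumerable()
    .Select(r => new
    {
        OrgExId = r.Field<string>("OrgExId"),
        OrgName = r.Field<string>("OrgName")
    })
    .Distinct();

// Copy the pairs into a new two-column table.
var orgs = new DataTable();
orgs.Columns.Add("OrgExId", typeof(string));
orgs.Columns.Add("OrgName", typeof(string));
foreach (var p in pairs)
    orgs.Rows.Add(p.OrgExId, p.OrgName);

Console.WriteLine(orgs.Rows.Count);   // prints 2
```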
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
// put some data for testing purpose
var id = Guid.NewGuid().ToString();
for (var i = 0; i < 10; i++)
dt.Rows.Add(id, i.ToString(), "user_name", Guid.NewGuid().ToString());
var userNames = dt.Rows.Cast<DataRow>().Select(row => row.Field<string>("UserName")).Distinct();
Console.WriteLine(string.Join(", ", userNames));
I have a DataTable which looks like this:
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
2 cc 1.2.12
3 cd 2.3.12
Which is the fastest way to remove the rows with the same ID, to get something like this (keep the first occurrence, delete the next ones):
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
3 cd 2.3.12
I don't want to double pass the table rows, because the row number is big.
I want to use some LinQ if possible, but I guess it will be a big query and I have to use a comparer.
You can use LINQ to DataSet. To get distinct rows based on the ID column, group by that column and then select the first row of each group:
var result = dt.AsEnumerable()
.GroupBy(r => r.Field<int>("ID"))
.Select(g => g.First())
.CopyToDataTable();
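Since the question asks for a single pass, another option is to filter with a HashSet instead of grouping. This is only a sketch: the sample rows come from the question, the ID column is assumed to be an int, DateBirth is kept as a string for brevity, and it is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample data from the question; DateBirth is kept as a string (an assumption).
var dt = new DataTable();
dt.Columns.Add("ID", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("DateBirth", typeof(string));
dt.Rows.Add(1, "aa", "1.1.11");
dt.Rows.Add(2, "bb", "2.3.11");
dt.Rows.Add(2, "cc", "1.2.12");
dt.Rows.Add(3, "cd", "2.3.12");

// HashSet<T>.Add returns false when the value is already present,
// so only the first row seen for each ID passes the filter.
var seen = new HashSet<int>();
DataTable unique = dt.AsEnumerable()
    .Where(row => seen.Add(row.Field<int>("ID")))
    .CopyToDataTable();

foreach (DataRow row in unique.Rows)
    Console.WriteLine($"{row["ID"]} {row["Name"]}");
```

The filter has a side effect (it mutates seen), so the query should be enumerated only once, which CopyToDataTable does here.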
I was solving the same situation and found it quite interesting and would like to share my finding.
If rows are to be distinct based on ALL COLUMNS.
DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");
Only the columns you list here are returned in newDatatable.
If distinct based on one column and column type is int then I would prefer LINQ query.
DataTable newDatatable = dt.AsEnumerable()
.GroupBy(dr => dr.Field<int>("ID"))
.Select(dg => dg.First())
.CopyToDataTable();
If distinct based on one column and column type is string then I would prefer loop.
List<string> toExclude = new List<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
var idValue = (string)dt.Rows[i]["ID"];
if (toExclude.Contains(idValue))
{
dt.Rows.Remove(dt.Rows[i]);
i--;
}
toExclude.Add(idValue);
}
Third being my favorite.
I may have answered few things which are not asked in the question. It was done in good intent and with little excitement as well.
Hope it helps.
You can try this (note that it returns only the distinct ID column):
DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");
Not necessarily the most efficient approach, but maybe the most readable:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup.First())
.CopyToDataTable();
Linq is also more powerful. For example, if you want to change the logic and not select the first (arbitrary) row of each id-group but the last according to DateBirth:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup
.OrderByDescending(r => r.Field<DateTime>("DateBirth"))
.First())
.CopyToDataTable();
Get a record count for each ID
var rowsToDelete =
(from row in dataTable.AsEnumerable()
group row by row.Field<int>("ID") into g
where g.Count() > 1
Determine which record to keep (don't know your criteria; I will just sort by DoB then Name and keep first record) and select the rest
select g.OrderBy( dr => dr.Field<DateTime>( "DateBirth" ) ).ThenBy( dr => dr.Field<string>( "Name" ) ).Skip(1))
Flatten
.SelectMany( g => g );
Delete rows
rowsToDelete.ToList().ForEach( dr => dr.Delete() );
Accept changes
dataTable.AcceptChanges();
Here's a way to achieve this.
All you need is the MoreLINQ library and its DistinctBy function.
Code:
protected void Page_Load(object sender, EventArgs e)
{
var DistinctByIdColumn = getDT2().AsEnumerable()
.DistinctBy(
row => new { Id = row["Id"] });
DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}
public DataTable getDT2()
{
DataTable dt = new DataTable();
dt.Columns.Add("Id", typeof(string));
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Dob", typeof(string));
dt.Rows.Add("1", "aa","1.1.11");
dt.Rows.Add("2", "bb","2.3.11");
dt.Rows.Add("2", "cc","1.2.12");
dt.Rows.Add("3", "cd","2.3.12");
return dt;
}
Output: as expected.
For moreLinq sample code view my blog
I have existing code that works very well and it finds the maximum value of a data column in the data table. Now I would like to refine this and find the maximum value per empid.
What change would be needed? I do not want to use LINQ.
I am right now using this: memberSelectedTiers.Select("Insert_Date = MAX(Insert_Date)")
and I need to group it by Empid.
My code is as below.
DataTable memberApprovedTiers = GetSupplierAssignedTiersAsTable(this.Customer_ID, this.Contract_ID);
//get row with maximum Insert_Date in memberSelectedTiers
DataRow msRow = null;
if (memberSelectedTiers != null && memberSelectedTiers.Rows != null && memberSelectedTiers.Rows.Count > 0)
{
DataRow[] msRows = memberSelectedTiers.Select("Insert_Date = MAX(Insert_Date)");
if (msRows != null && msRows.Length > 0)
{
msRow = msRows[0];
}
}
You can use LINQ to achieve this. I think the following will work (don't have VS to test):
var grouped = memberSelectedTiers.AsEnumerable()
.GroupBy(r => r.Field<int>("EmpId"))
.Select(grp =>
new {
EmpId = grp.Key
, MaxDate = grp.Max(e => e.Field<DateTime>("Insert_Date"))
});
Daniel Kelley, your answer helped me and that's great, but did you notice the OP stated he didn't want to use LINQ?
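For the no-LINQ requirement, one option is a plain loop that remembers the newest row per employee in a dictionary. This is a rough sketch, not tested against the OP's data: the sample rows are hypothetical, and the column types (int EmpId, DateTime Insert_Date) are assumed from the answer above. It is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical sample data; column names and types are assumed from the answer above.
var memberSelectedTiers = new DataTable();
memberSelectedTiers.Columns.Add("EmpId", typeof(int));
memberSelectedTiers.Columns.Add("Insert_Date", typeof(DateTime));
memberSelectedTiers.Rows.Add(1, new DateTime(2013, 1, 5));
memberSelectedTiers.Rows.Add(1, new DateTime(2013, 3, 1));
memberSelectedTiers.Rows.Add(2, new DateTime(2013, 2, 10));

// Track the newest row seen so far for each employee.
var latestByEmp = new Dictionary<int, DataRow>();
foreach (DataRow row in memberSelectedTiers.Rows)
{
    int empId = (int)row["EmpId"];
    DateTime inserted = (DateTime)row["Insert_Date"];

    // Store the row if this employee is new or this row is more recent.
    if (!latestByEmp.TryGetValue(empId, out var best)
        || inserted > (DateTime)best["Insert_Date"])
    {
        latestByEmp[empId] = row;
    }
}

foreach (var pair in latestByEmp)
    Console.WriteLine($"{pair.Key}: {pair.Value["Insert_Date"]}");
```

The loop assumes neither column contains DBNull; rows with missing values would need an IsNull check before the casts.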