I have code structured like this.
My table has 108,000 rows.
The DataTable is really just a tab-delimited text file that I read in so I could process it.
private void Foo(DataTable myDataTable)
{
List<string> alreadyProcessed = new List<string>();
foreach(DataRow row in myDataTable.Rows)
{
string current = row["Emp"].ToString().Trim();
if (alreadyProcessed.Contains(current))
continue;
alreadyProcessed.Add(current);
var empRows = from p in myDataTable.AsEnumerable()
where p["Emp"].ToString().Trim() == current
select new EmpClass
{
LastName = (string) p["lName"],
// some more similar stuff
};
// do some stuff with that EmpClass but they shouldn't take too long
}
}
Running such a thing is taking more than 15 minutes. How can I improve this?
Here is a rather naive rewrite of your code.
Instead of tracking which employees you have already processed, let's just group all rows by their employees and process them each separately.
var rowsPerEmployee =
(from DataRow row in dt.Rows
let emp = row["Emp"].ToString().Trim()
group row by emp into g
select g)
.ToDictionary(
g => g.Key,
g => g.ToArray());
foreach (var current in rowsPerEmployee.Keys)
{
var empRows = rowsPerEmployee[current];
// ... rest of your code here; note that empRows is now all the rows for a single employee,
// and not just the last name or similar
}
This will first group the entire datatable by the employee and create a dictionary from employee to rows for that employee, and then loop on the employees and get the rows.
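As a minimal sketch of the grouping step, here is the same idea run against a tiny in-memory table. The sample rows are made up, only the Emp and lName column names come from the question, and the code is written as C# 9 top-level statements for brevity. The point is that the table is scanned once, instead of once per distinct employee.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data standing in for the tab-delimited file.
var table = new DataTable();
table.Columns.Add("Emp", typeof(string));
table.Columns.Add("lName", typeof(string));
table.Rows.Add("E1 ", "Smith");   // trailing space on purpose: keys are trimmed
table.Rows.Add("E2", "Jones");
table.Rows.Add("E1", "Smith");

// One pass over the table: bucket every row under its trimmed Emp value.
var rowsPerEmployee = table.AsEnumerable()
    .GroupBy(row => row["Emp"].ToString().Trim())
    .ToDictionary(g => g.Key, g => g.ToArray());

foreach (var pair in rowsPerEmployee)
{
    // pair.Value holds every DataRow for this employee.
    Console.WriteLine($"{pair.Key}: {pair.Value.Length} row(s)");
}
```

With 108,000 rows the original loop re-queries the whole table for every new employee, so the work grows with rows times employees; the dictionary is built in a single pass.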
You should do Group By "EMP", otherwise you're going through each row and for some rows you're querying the whole table. Something like this
from p in myDataTable.AsEnumerable()
group p by p.Field<string>("Emp") into g
select new { Emp = g.Key,
Data = g.Select(gg=>new EmpClass
{
LastName = gg.Field<string>("lName")
}
)
}
One thing that might slow the LINQ statement down is how much data you are selecting. You write 'select new EmpClass', and depending on how many columns and rows you project into the output, that can slow things down drastically. Other tips and tricks for this kind of problem may be found in: http://visualstudiomagazine.com/articles/2010/06/24/five-tips-linq-to-sql.aspx
I have a List of objects (lst) and DataTable (dt). I want to join the lst and dt on the common field (code as string) and need to return all matching rows in the lst.
My List contains two columns i.e code and name along with values below:
code name
==== ====
1 x
2 y
3 z
The DataTable contains two columns i.e code and value along with values below:
code value
==== =====
3 a
4 b
5 c
The result is:
3 z
Below is my code, but I know it is not a correct statement, so I am seeking your advice here. I would much appreciate it if you could guide me on how to write the correct statement.
var ld = from l in lst
join d in dt.AsEnumerable() on l.code equals d.code
select new { l.code, l.name };
You can use either a LINQ query or the Join extension method to join the two collections on code. The only catch is that when you read a value from a DataTable row, you need to use the Field<T> method, for example d.Field<string>("code"). Use any of the following.
Query1:
var ld = lst.Join(dt.AsEnumerable(),
l => l.code,
d => d.Field<string>("code"),
(l, d) => new
{
l.code,
l.name,
value = d.Field<string>("value")
}).ToList();
Query2:
var ld = (from l in lst
join d in dt.AsEnumerable()
on l.code equals d.Field<string>("code")
select new
{
l.code,
l.name,
value = d.Field<string>("value")
}).ToList();
Query3:
var ld = (from l in lst
join d in dt.AsEnumerable()
on l.code equals d.Field<string>("code")
let value = d.Field<string>("value")
select new
{
l.code,
l.name,
value
}).ToList();
You can try any of the below.
var ld = from l in lst
join d in dt.AsEnumerable() on l.code equals d.Field<string>("code")
select new { l.code, l.name };
var ld = lst.Join(dt.AsEnumerable(), l => l.code, d => d.Field<string>("code"), (l,d) => new { l.code, l.name });
It's not clear what your required output is, but it looks as if you are correctly getting only the common records. You could extend your select to
select new { l.code, l.name, value = d.Field<string>("value") }
which would give the columns from both sources:
code name value
==== ==== =====
3 z a
Try this:
var ld = from l in lst
join d in dt.Rows.Cast<DataRow>() on l.code equals d["code"].ToString()
select new { l.code, l.name };
So you have a List and a DataTable. You don't plan to use the values of the DataTable, only the codes.
You want to keep those List items whose code is also a code in the DataTable.
If you plan to use your DataTable for more than just this problem, my advice would be to first create a procedure that converts your DataTable into an enumerable sequence.
This way you can add LINQ statements, not only for this problem, but also for other problems.
Let's create an extension method for your DataTable that converts the data into the items that are in the DataTable. See extension methods demystified.
Alas, I don't know what's in your DataTable, let's assume that your DataTable contains Orders
class CustomerOrder
{
public int Id {get; set;}
public int CustomerId {get; set;}
public int Code {get; set;}
public string Value {get; set;}
...
}
The extension method that extends functionality of class DataTable:
public static IEnumerable<CustomerOrder> ToCustomerOrders(this DataTable table)
{
return table.AsEnumerable().Select(row => new CustomerOrder
{
Id = ...,
CustomerId = ...,
Code = ...,
Value = ...,
});
}
I'm not really familiar with DataTables, but you know how to convert the cells of the row into the proper value.
Usage:
DataTable table = ...
int customerId = 14;
var ordersOfThisCustomer = table.ToCustomerOrders()
.Where(customerOrder => customerOrder.CustomerId == customerId)
.FirstOrDefault();
In words: convert the DataTable into CustomerOrders, row by row, and check every converted CustomerOrder for a CustomerId equal to 14. Stop as soon as one is found; return null if there is no such row.
Now that you've got a nice reusable procedure that is also easy to test, debug and change, we can answer your question.
Given a DataTable with CustomerOrders, and a sequence of items that contain Code and Name, keep only those items from the sequence that have a Code that is also a Code in the DataTable.
var dataTable = ... // your DataTable, filled with CustomerOrders.
var codeNames = ... // your list with Codes and Names
var codesInDataTable = dataTable.ToCustomerOrders()
.Select(customerOrder => customerOrder.Code)
.Distinct();
This will create an enumerable sequence that will convert your DataTable row by row and extract property Code. Duplicate Code values will be removed.
If Codes are unique, you don't need Distinct.
Note: the enumerable sequence is not enumerated yet!
var result = codeNames
.Where(codeName => codesInDataTable.Contains(codeName.Code))
.ToList();
In words: for every [Code, Name] combination in your list, keep only those [Code, Name] combinations that have a value for Code that is also in codesInDataTable.
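One caveat with the query above: because codesInDataTable is a deferred sequence, every Contains call re-enumerates it and so re-reads the DataTable. A possible variation, sketched here with the sample data from the question and reading the rows directly rather than through ToCustomerOrders, is to materialize the codes into a HashSet first. The tuple list is a stand-in for your own item class, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sample data from the question; codes are strings there.
var dataTable = new DataTable();
dataTable.Columns.Add("code", typeof(string));
dataTable.Columns.Add("value", typeof(string));
dataTable.Rows.Add("3", "a");
dataTable.Rows.Add("4", "b");
dataTable.Rows.Add("5", "c");

// Hypothetical stand-in for the list of [Code, Name] items.
var codeNames = new[]
{
    (Code: "1", Name: "x"),
    (Code: "2", Name: "y"),
    (Code: "3", Name: "z"),
};

// Enumerate the table exactly once; the set also removes duplicate codes.
var codesInDataTable = new HashSet<string>(
    dataTable.AsEnumerable().Select(row => row.Field<string>("code")));

// Each Contains is now a hash lookup instead of a scan of the table.
var result = codeNames
    .Where(codeName => codesInDataTable.Contains(codeName.Code))
    .ToList();

foreach (var item in result)
{
    Console.WriteLine($"{item.Code} {item.Name}");
}
```

For three rows the difference is invisible, but with large tables it avoids scanning the table once per list item.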
I am very unfamiliar with Entity Framework and LINQ. I have a single entity set with some columns, from which I want to filter out some special rows.
Four of the columns are named Guid (string), Year (short), Month (short) and FileIndex (short). I want to get all rows which have the maximum FileIndex for each existing combination of Guid-Year-Month.
My current solution looks like this:
var maxFileIndexRecords = from item in context.Udps
group item by new { item.Guid, item.Year, item.Month }
into gcs
select new { gcs.Key.Guid, gcs.Key.Year, gcs.Key.Month,
gcs.OrderByDescending(x => x.FileIndex).FirstOrDefault().FileIndex };
var result = from item in context.Udps
join j in maxFileIndexRecords on
new
{
item.Guid,
item.Year,
item.Month,
item.FileIndex
}
equals
new
{
j.Guid,
j.Year,
j.Month,
j.FileIndex
}
select item;
I think there should be a shorter solution with better performance. Does anyone have a hint for me?
Thank you
You were close. It's not necessary to actually select the grouping key. You can simply select the first item of each group:
var maxFileIndexRecords =
from item in context.Udps
group item by new { item.Guid, item.Year, item.Month }
into gcs
select gcs.OrderByDescending(x => x.FileIndex).FirstOrDefault();
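To see what that query returns, here is a small in-memory sketch of the same shape. It runs on LINQ to Objects with made-up tuples in place of context.Udps, so it says nothing about how Entity Framework translates the query to SQL; it only illustrates the "first item of each group" idea. It is written as C# 9 top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for context.Udps.
var udps = new[]
{
    (Guid: "a", Year: (short)2020, Month: (short)1, FileIndex: (short)1),
    (Guid: "a", Year: (short)2020, Month: (short)1, FileIndex: (short)3),
    (Guid: "a", Year: (short)2020, Month: (short)2, FileIndex: (short)2),
    (Guid: "b", Year: (short)2020, Month: (short)1, FileIndex: (short)5),
};

// Group by the three key columns, then keep the row with the highest
// FileIndex from each group. Groups are never empty, so First() is safe here.
var maxFileIndexRecords = udps
    .GroupBy(item => (item.Guid, item.Year, item.Month))
    .Select(g => g.OrderByDescending(x => x.FileIndex).First())
    .ToList();

foreach (var r in maxFileIndexRecords)
{
    Console.WriteLine($"{r.Guid} {r.Year}-{r.Month}: {r.FileIndex}");
}
```

Each result is a whole row, not just the key plus the maximum, which is why the second join in the question becomes unnecessary.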
//SELECT table1.GG_ITEM, Sum(table1.REM_QTY) AS SumPerGG_ITEM
//FROM table1
//WHERE (table1.SUGG_DOCK_DATE Is Not Null)
//GROUP BY table1.GG_ITEM
//ORDER BY table1.GG_ITEM;
var try1 = (from row in db2.Dumps select new { Type1 = row.GA_ITEM, Type2 = row.REM_QTY });
Debug.Print(":::::try1:::::");
foreach (var row in try1)
{
Debug.Print(row.Type1.ToString());
Debug.Print(row.Type2.ToString());
}
var try2 = (from row in db2.Dumps group row by row.GA_ITEM into g select new { Type1 = g.Key, Type2 = g.ToList() });
Debug.Print("::::try2:::::");
foreach (var row in try2)
{
Debug.Print(row.Type1.ToString());
Debug.Print(row.Type2.ToString());
}
I'm converting an Access SQL query to Linq. The two columns I am selecting from my table Dumps are GA_ITEM and REM_QTY. My try1 is working out just fine and I see the contents of both columns printed out. My try1 is not yet duplicating the functionality of the Access SQL query.
My try2 is an attempt at grouping. For my try2 row.Type1.ToString() is readable however row.Type2.ToString() is showing up in the output window as:
System.Collections.Generic.List`1[garminaspsandbox3.Models.Dump]
What I really would like to do is in try2 select GA_ITEM and REM_QTY like I did in try1 and group by GA_ITEM however those fields aren't showing up in my autocomplete for the g object.
Does anyone know how to do this in Linq?
Thank you for posting...
Your Type2 property holds a List, not a single item, so you need another loop to iterate over the items in each group:
foreach (var row in try2)
{
Debug.Print(row.Type1.ToString());
foreach(var item in row.Type2)
{
Debug.Print(item.GA_ITEM.ToString());
Debug.Print(item.REM_QTY.ToString());
}
}
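If the goal is to reproduce the Access query itself (a sum per item, skipping rows without a dock date), a rough sketch could look like the following. It uses made-up in-memory tuples instead of db2.Dumps, assumes REM_QTY is an integer and SUGG_DOCK_DATE is a nullable date, and groups on GA_ITEM as in your code (the SQL comment says GG_ITEM). It is written as C# 9 top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for db2.Dumps; column types are assumptions.
var dumps = new[]
{
    (GA_ITEM: "A", REM_QTY: 5, SUGG_DOCK_DATE: (DateTime?)new DateTime(2020, 1, 1)),
    (GA_ITEM: "A", REM_QTY: 7, SUGG_DOCK_DATE: (DateTime?)new DateTime(2020, 1, 2)),
    (GA_ITEM: "B", REM_QTY: 2, SUGG_DOCK_DATE: (DateTime?)null),
    (GA_ITEM: "B", REM_QTY: 4, SUGG_DOCK_DATE: (DateTime?)new DateTime(2020, 1, 3)),
};

// WHERE ... Is Not Null, GROUP BY item, Sum(REM_QTY), ORDER BY item.
var sums = (from row in dumps
            where row.SUGG_DOCK_DATE != null
            group row by row.GA_ITEM into g
            orderby g.Key
            select new { Item = g.Key, SumPerItem = g.Sum(r => r.REM_QTY) })
           .ToList();

foreach (var s in sums)
{
    Console.WriteLine($"{s.Item}: {s.SumPerItem}");
}
```

Inside the select, g is the whole group, so individual columns are reached through aggregates such as g.Sum(...) rather than directly on g; that is why the fields do not show up in autocomplete.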
I have a DataTable similar to this.
If the adults value and child value are the same across rows, I need to remove the duplicates and count them. I need output similar to this.
Can anyone please help me with this?
Thank you,
You want to group by adults+child:
var groups = tblRoooms.AsEnumerable()
.GroupBy(r => new{ Adults = r.Field<int>("Adults"), Child = r.Field<int>("Child") });
var tblRooomsCopy = tblRoooms.Clone(); // creates an empty clone of the table
foreach(var grp in groups)
{
int roomCount = grp.Sum(r => r.Field<int>("Roomcount"));
DataRow row = tblRooomsCopy.Rows.Add();
row.SetField("RoomNo", grp.First().Field<int>("RoomNo"));
row.SetField("Roomcount", roomCount);
row.SetField("Adults", grp.Key.Adults);
row.SetField("Child", grp.Key.Child);
}
Now you have your desired result in tblRooomsCopy.
I won't write the complete code for you, but I will describe a suggested approach. First order the DataTable by adults and child, which makes identical rows consecutive, and create a list to hold the rows to be deleted.
Then use foreach to compare each row with the previous one; if it has the same values, add it to the list of rows to be removed. Finally, delete the rows in the list.
I'm new to LINQ, so I'm sure there's an error in my logic below.
I have a list of objects:
class Characteristic
{
public string Name { get; set; }
public string Value { get; set; }
public bool IsIncluded { get; set; }
}
Using each object in the list, I want to build a query in LINQ that starts with a DataTable, and filters it based on the object values, and yields a DataTable as the result.
My Code so far:
DataTable table = MyTable;
// Also tried: DataTable table = MyTable.Clone();
foreach (Characteristic c in characteristics)
{
if (c.IsIncluded)
{
var q = (from r in table.AsEnumerable()
where r.Field<string>(c.Name) == c.Value
select r);
table = q.CopyToDataTable();
}
else
{
var q = (from r in table.AsEnumerable()
where r.Field<string>(c.Name) != c.Value
select r);
table = q.CopyToDataTable();
}
}
UPDATE
I was in a panicked hurry and I made a mistake; my DataTable was not empty, I just forgot to bind it to the DataGrid. But also, Henk Holterman pointed out that I was overwriting my result set each iteration, which was a logic error.
Henk's code seems to work the best so far, but I need to do more testing.
Spinon's answer also helped bring clarity to my mind, but his code gave me an error.
I need to try to understand Timwi's code better, but in its current form, it did not work for me.
NEW CODE
DataTable table = new DataTable();
foreach (Characteristic c in characteristics)
{
EnumerableRowCollection<DataRow> rows = null;
if (c.IsIncluded)
{
rows = (from r in MyTable.AsEnumerable()
where r.Field<string>(c.Name) == c.Value
select r);
}
else
{
rows = (from r in MyTable.AsEnumerable()
where r.Field<string>(c.Name) != c.Value
select r);
}
table.Merge(rows.CopyToDataTable());
}
dataGrid.DataContext = table;
The logic in your posting is wonky; here is my attempt at what I think you are trying to achieve.
DataTable table = MyTable.AsEnumerable()
.Where(r => characteristics.All(c => !c.IsIncluded ||
r.Field<string>(c.Name) == c.Value))
.CopyToDataTable();
If you actually want to use the logic in your posting, change || to ^, but that seems to make little sense.
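For comparison, here is a sketch of the per-row condition that the loop in the question appears to apply: a row is kept only if, for every characteristic, "the column equals the value" agrees with IsIncluded. That reading of the question is an assumption, the table contents are made up, and tuples stand in for the Characteristic class. It is written as C# 9 top-level statements.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical table and characteristics for illustration only.
var myTable = new DataTable();
myTable.Columns.Add("Color", typeof(string));
myTable.Columns.Add("Size", typeof(string));
myTable.Rows.Add("Red", "L");
myTable.Rows.Add("Red", "S");
myTable.Rows.Add("Blue", "L");

var characteristics = new[]
{
    (Name: "Color", Value: "Red", IsIncluded: true),   // keep rows where Color == Red
    (Name: "Size", Value: "S", IsIncluded: false),     // drop rows where Size == S
};

// Included characteristics must match; excluded ones must not.
var matches = myTable.AsEnumerable()
    .Where(r => characteristics.All(c =>
        c.IsIncluded == (r.Field<string>(c.Name) == c.Value)))
    .ToList();

// CopyToDataTable throws InvalidOperationException when there are no rows,
// so fall back to an empty table with the same schema.
DataTable table = matches.Count > 0 ? matches.CopyToDataTable() : myTable.Clone();

Console.WriteLine(table.Rows.Count);
```

Filtering once with a combined predicate avoids rebuilding a DataTable per characteristic, and the empty-result guard sidesteps the exception CopyToDataTable raises on an empty sequence.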
You overwrite the table variable for each characteristic, so in the end it only holds the results from the last round, and that apparently is empty.
What you could do is something like:
// untested
var t = q.CopyToDataTable();
table.Merge(t);
And I suspect your query should use MyTable as the source:
var q = (from r in MyTable.AsEnumerable() ...
But that's not entirely clear.
If you are trying to just insert the rows into your table then try calling the CopyToDataTable method this way:
q.CopyToDataTable(table, LoadOption.PreserveChanges);
This way rather than reassigning the table variable you can just update it with the new rows that are to be inserted.
EDIT: Here is an example of what I was talking about:
DataTable table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Value", typeof(string));