Linq with DataTable .ToList() very slow - c#

facts.UnderlyingDataTable is a DataTable
var queryResults4 = //get all facts
    (from f in facts.UnderlyingDataTable.AsEnumerable()
     where f.RowState != DataRowState.Deleted &&
           FactIDsToSelect.Contains(f.Field<int>("FactID"))
     select f);
var queryResults5 = (from f in queryResults4.AsEnumerable()
                     orderby UF.Rnd.Next()
                     select f);
return queryResults5.ToList();
The problem is this line: queryResults5.ToList();
It returns a list of DataRows, but it is very slow.
I am happy to return any object that implements IEnumerable. What should I do? It seems the conversion from whatever the var is to List<DataRow> is slow.
Thanks for your time.

First, it is not ToList itself that is slow. The slow part is the query, which only executes when ToList enumerates it (LINQ uses deferred execution). Perhaps your DataTable contains many rows. I also assume that FactIDsToSelect is large. If it is a List<int>, Contains is a linear search, so the check for every row becomes slow.
You could use CopyToDataTable to create a new DataTable with the same schema instead of a List, since that is more natural for an IEnumerable<DataRow>. However, as I mentioned, that would not solve your performance issue.
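You could optimize the query with a join, which is much more efficient. Enumerable.Join builds a hash lookup of the ids once instead of scanning the id list for every row: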
var q = from row in UnderlyingDataTable.AsEnumerable()
        where row.RowState != DataRowState.Deleted
        join id in FactIDsToSelect
            on row.Field<int>("FactID") equals id
        select row;
// CopyToDataTable throws an InvalidOperationException if q contains no rows
var newTable = q.CopyToDataTable();
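You can also get much of the same speed-up without changing the query shape. The cost comes from List<int>.Contains scanning the id list once per row, so copying the ids into a HashSet<int> makes each lookup constant time on average.

The sketch below is a minimal version under some assumptions. It uses the FactID column name from the question. The FactFilter class and SelectFacts method are hypothetical names, and a Random parameter stands in for the question's UF.Rnd:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class FactFilter
{
    // Hypothetical helper: filter non-deleted rows whose FactID is in the id set,
    // then return them in random order.
    public static List<DataRow> SelectFacts(DataTable table, IEnumerable<int> factIdsToSelect, Random rnd)
    {
        // Build the set once; HashSet<int>.Contains is O(1) on average,
        // unlike List<int>.Contains, which scans the whole list.
        var ids = new HashSet<int>(factIdsToSelect);

        return table.AsEnumerable()
            // The RowState check comes first, so Field<int> is never read from a deleted row.
            .Where(r => r.RowState != DataRowState.Deleted && ids.Contains(r.Field<int>("FactID")))
            // OrderBy computes each key once per element, so a random key gives a random order.
            .OrderBy(_ => rnd.Next())
            .ToList();
    }
}
```

If the caller only needs an IEnumerable<DataRow>, the final ToList() can be dropped. The query then runs lazily when the result is enumerated.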
See also: Why is LINQ JOIN so much faster than linking with WHERE?

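Please try the following:
List<DataRow> list = new List<DataRow>(UnderlyingDataTable.Select("FactID = " + id.ToString(), null, DataViewRowState.Unchanged));
You may need to change the DataViewRowState argument of the Select method. For example, DataViewRowState.CurrentRows also includes added and modified rows but still excludes deleted ones. DataTable.Select takes the row-state filter as its third argument, after the sort expression (null here).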
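If you need several ids, one Select call can cover them all. The filter syntax of DataColumn.Expression supports the IN operator.

This sketch assumes FactID is an int column, so the ids need no quoting. FactLookup.SelectByFactIds is a hypothetical helper name:

```csharp
using System.Collections.Generic;
using System.Data;

public static class FactLookup
{
    // Hypothetical helper: one DataTable.Select call for a whole list of ids.
    public static List<DataRow> SelectByFactIds(DataTable table, IReadOnlyCollection<int> ids)
    {
        if (ids.Count == 0)
            return new List<DataRow>(); // "FactID IN ()" would not be a valid filter

        // e.g. "FactID IN (2,3,4)"; ints need no quoting in the filter expression
        string filter = "FactID IN (" + string.Join(",", ids) + ")";

        // CurrentRows = Unchanged | Added | ModifiedCurrent, so deleted rows are skipped
        return new List<DataRow>(table.Select(filter, null, DataViewRowState.CurrentRows));
    }
}
```

For very long id lists, the HashSet or join approach from the other answer avoids building a large filter string.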

Related

Why is linq reversing order in group by

I have a linq query which seems to be reversing one column of several in some rows of an earlier query:
var dataSet = from fb in ds.Feedback_Answers
              where fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID == criteriaType
                    && fb.UpdatedDate >= dateFeedbackFrom && fb.UpdatedDate <= dateFeedbackTo
              select new
              {
                  fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID,
                  fb.QuestionID,
                  fb.Feedback_Questions.Text,
                  fb.Answer,
                  fb.UpdatedBy
              };
Gets the first dataset and is confirmed working.
This is then grouped like this:
var groupedSet = from row in dataSet
                 group row by row.UpdatedBy
                 into grp
                 select new
                 {
                     Survey = grp.Key,
                     QuestionID = grp.Select(i => i.QuestionID),
                     Question = grp.Select(q => q.Text),
                     Answer = grp.Select(a => a.Answer)
                 };
While grouping, the resulting set (of type: string, list int, list string, list int) sometimes, but not always, reverses the question order without reversing the answers or question IDs, which throws everything off.
For example, if the set is questionID 1,2,3 and question A,B,C, it sometimes returns 1,2,3 and C,B,A.
Can anyone advise why it may be doing this? Why only on the one column? Thanks!
Edit: Got it, thanks all! In case it helps anyone in future, here is the solution used:
var groupedSet = from row in dataSet
                 group row by row.UpdatedBy
                 into grp
                 select new
                 {
                     Survey = grp.Key,
                     QuestionID = grp.OrderBy(x => x.QuestionID).Select(i => i.QuestionID),
                     Question = grp.OrderBy(x => x.QuestionID).Select(q => q.Text),
                     Answer = grp.OrderBy(x => x.QuestionID).Select(a => a.Answer)
                 };
The reversal is a coincidence. IQueryable<T>'s GroupBy guarantees no particular order, either for the groups or for the elements inside each group. The in-memory Enumerable.GroupBy documents the order of its groups and their elements, but a query executed by the RDBMS depends on the provider's implementation. Each of the three projected sequences is ordered independently, so nothing keeps them aligned with one another. From the documentation:
The query behavior that occurs as a result of executing an expression tree that represents calling GroupBy<TSource,TKey,TElement>(IQueryable<TSource>, Expression<Func<TSource,TKey>>, Expression<Func<TSource,TElement>>) depends on the implementation of the type of the source parameter.
If you would like to have your rows in a specific order, you need to add OrderBy to your query to force it.
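How do I do it and maintain the relative list order, rather than applying an order to the resulting set?
One approach is to group your data after bringing it into memory. Apply ToList() to dataSet to bring the data into memory. The subsequent GroupBy then runs in memory (LINQ to Objects), which keeps the elements of each group in the same order as the materialized list, so the three projected lists stay consistent with one another. A drawback is that the grouping is no longer done in the RDBMS. Also note that dataSet itself has no orderby, so the database may still return its rows in any order.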
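Here is a small in-memory sketch of that approach. A hypothetical FeedbackRow record stands in for the anonymous type that dataSet projects, and InMemoryGrouping.Group is also a hypothetical name. Once the rows are in a list, Enumerable.GroupBy yields each group's elements in source order, so the three lists line up:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape standing in for the anonymous type produced by dataSet.
public record FeedbackRow(int QuestionID, string Text, int Answer, string UpdatedBy);

public static class InMemoryGrouping
{
    // Group already-materialized rows (e.g. the result of dataSet.ToList()).
    // Enumerable.GroupBy keeps each group's elements in source order, so the
    // three projected lists are aligned index by index.
    public static List<(string Survey, List<int> QuestionIDs, List<string> Questions, List<int> Answers)>
        Group(IEnumerable<FeedbackRow> rows)
    {
        return rows
            .GroupBy(r => r.UpdatedBy)
            .Select(g => (Survey: g.Key,
                          QuestionIDs: g.Select(r => r.QuestionID).ToList(),
                          Questions: g.Select(r => r.Text).ToList(),
                          Answers: g.Select(r => r.Answer).ToList()))
            .ToList();
    }
}
```

Materializing each projection with ToList() also ensures the group is enumerated once per column rather than lazily on every later access.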

Filter items from database based on a List<>

I have a method that accepts two List<int> for which I need to get data from the database based on the List<>s.
So, I receive a List<PersonId> and a List<NationalityId>, for example, and I need to get a result set where records match the PersonIds and NationalityIds.
public List<PersonDTO> SearchPeople(List<int> persons, List<int> nationalities)
{
    var results = (from c in myDbContext.People where .... select c).ToList();
}
Note that either list might be null.
Is there an efficient way?
I was going to try:
where ((persons != null && persons.Count > 0) && persons.Contains(x.PersonId))
But this would generate rather inefficient SQL, and as I add more search parameters, the linq may get very messy.
Is there an efficient way to achieve this?
The join method may be easier to read, but the issue I face is that if an input list is empty, it shouldn't filter at all. That is, if nationalities is empty, don't filter any records out:
var results = (from c in entities.Persons
               join p in persons on c.PersonId equals p
               join n in nationalities on c.NationalityId equals n
               select c).ToList();
This would return no results if any of the lists were empty, which is bad.
If you join an IQueryable with an IEnumerable (in this case, entities.Persons and persons), your filtering will not happen within your query. Instead, your IQueryable is enumerated, retrieving all of your records from the database, while the join is performed in memory using the IEnumerable join method.
To perform your filtering against a list within your query, there are two main options:
1. Join using an IQueryable on both sides. This might be possible if your list of ids comes from the execution of another query, in which case you can use the underlying query in your join instead of the resulting set of ids.
2. Use the Contains operator against your list. This is only possible with small lists, because each additional id requires its own query parameter. If you have many ids, you can possibly extend this approach with batching.
If you want to skip filtering when the list is empty, consider calling the extension methods directly (method syntax) instead of using query syntax. This allows you to use an if statement:
IQueryable<Person> persons = entities.Persons;
List<int> personIds = new List<int>(); // the ids passed in to your method
if (personIds != null && personIds.Count > 0)
{
    persons = persons.Where(p => personIds.Contains(p.PersonId));
}
var results = persons.ToList();
Note that the Where predicate uses option #2 above, and is only applied if there are any ids in the collection.
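The same pattern extends to both lists from the question. This is a sketch with a hypothetical Person class and a hypothetical PersonSearch.Search method. When trying it out, an in-memory AsQueryable() source can stand in for entities.Persons:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shape; in the question this would be the entity behind entities.Persons.
public class Person
{
    public int PersonId { get; set; }
    public int NationalityId { get; set; }
}

public static class PersonSearch
{
    // Each filter is appended only when its id list is non-null and non-empty,
    // so an empty or null list means "don't filter on this field".
    // The Where calls only compose the query; ToList() executes it once.
    public static List<Person> Search(IQueryable<Person> people, List<int> personIds, List<int> nationalityIds)
    {
        IQueryable<Person> query = people;
        if (personIds != null && personIds.Count > 0)
            query = query.Where(p => personIds.Contains(p.PersonId));
        if (nationalityIds != null && nationalityIds.Count > 0)
            query = query.Where(p => nationalityIds.Contains(p.NationalityId));
        return query.ToList();
    }
}
```

Adding further optional search parameters then means adding one more if block each, instead of growing a single where clause.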
If you want to return all records when a list is null or empty, and filter by it only when it has values, you can evaluate those checks before the query and keep the filters in the where clause:
List<int> personsIds = ...;
List<int> nationalitiesIds = ...;
bool filterByPerson = personsIds != null && personsIds.Count > 0;
bool filterByNationality = nationalitiesIds != null && nationalitiesIds.Count > 0;
var pIds = personsIds ?? new List<int>();
var nIds = nationalitiesIds ?? new List<int>();
var results = (from c in entities.Persons
               where (!filterByPerson || pIds.Contains(c.PersonId))
                  && (!filterByNationality || nIds.Contains(c.NationalityId))
               select c).ToList();

c# - Copy only selected data to new datatable with linq

I've searched the web for quite some time now and can't seem to find an elegant way to
read data from one datatable,
group it by two variables with linq
select only those two variables (forget about the others in the source datatable) and
copy these items to a new datatable.
I got it working without selecting specific variables, but at the amount of data the program is going to process later I'd rather only copy what's really needed.
var temp123 = from row in oldDataTable.AsEnumerable()
              orderby row["Column1"] ascending
              group row by new { Column1 = row["Column1"], Column2 = row["Column2"] } into grp
              select grp.First();
newDataTable = temp123.CopyToDataTable();
Can anyone please be so kind to help me out here? Thanks!
You can use a custom implementation of the CopyToDataTable method from the article "How to: Implement CopyToDataTable Where the Generic Type T Is Not a DataRow":
newDataTable =
    oldDataTable
        .AsEnumerable()
        .GroupBy(r => new { Column1 = r["Column1"], Column2 = r["Column2"] })
        .Select(g => g.Key) // keep only the two grouped columns
        .OrderBy(x => x.Column1)
        .CopyToDataTable(); // your custom extension
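The article's implementation is not reproduced here. As a rough stand-in, a much simpler extension can build one column per public property of T.

The name ToDataTable is hypothetical, chosen so it does not clash with the built-in CopyToDataTable for DataRow sequences. Unlike the article's version, it cannot load into an existing table:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;

public static class SequenceToDataTable
{
    // Simplified sketch (not the article's code): one column per public instance
    // property of T, one row per element. Works for anonymous types such as the
    // grouping keys new { Column1, Column2 }.
    public static DataTable ToDataTable<T>(this IEnumerable<T> source)
    {
        PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
        var table = new DataTable();
        foreach (PropertyInfo p in props)
        {
            // DataColumn does not accept Nullable<T>; use the underlying type instead
            Type colType = Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType;
            table.Columns.Add(p.Name, colType);
        }
        foreach (T item in source)
        {
            // null property values are stored as DBNull.Value
            object[] values = props.Select(p => p.GetValue(item) ?? DBNull.Value).ToArray();
            table.Rows.Add(values);
        }
        return table;
    }
}
```

With keys typed as object (as in r["Column1"]), the resulting columns are of type object. Reading the values with r.Field<int>(...) or r.Field<string>(...) gives typed columns.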
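Another option, as Tim suggested, is to create the DataTable manually:
var newDataTable = new DataTable();
// Copy the column types from the source table (Columns.Add(name) alone creates string columns)
newDataTable.Columns.Add("Column1", oldDataTable.Columns["Column1"].DataType);
newDataTable.Columns.Add("Column2", oldDataTable.Columns["Column2"].DataType);
// temp123 (from the question) yields one DataRow per group
foreach (DataRow row in temp123)
    newDataTable.Rows.Add(row["Column1"], row["Column2"]);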
The last option, if possible: don't use a DataTable at all; simply use a collection of strongly typed objects.
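This particular task copies distinct combinations of two columns into a new table. For that, the framework's DataView.ToTable(bool distinct, params string[] columnNames) overload is a further alternative not mentioned in the answers above.

This sketch takes the column names from the question and uses a hypothetical DistinctColumns.Copy wrapper:

```csharp
using System.Data;

public static class DistinctColumns
{
    // Alternative sketch: DataView.ToTable copies only the named columns and,
    // with distinct = true, keeps one row per distinct (Column1, Column2) pair.
    public static DataTable Copy(DataTable source)
    {
        // Sort the view by Column1, mirroring the question's orderby
        var view = new DataView(source) { Sort = "Column1 ASC" };
        return view.ToTable(true, "Column1", "Column2");
    }
}
```

The new table keeps the source column types, and no LINQ grouping or custom copy extension is needed.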

Why do these two linq queries return different numbers of results?

In a web application that I work with I found a slow piece of code that I wanted to speed up a bit. Original code below:
foreach (Guid g in SecondaryCustomersIds)
{
    var Customer = (from d in Db.CustomerRelationships
                    join c in Db.Customers on
                        d.PrimaryCustomerId equals c.CustomerId
                    where c.IsPrimary == true && d.SecondaryCustomerId == g
                    select c).Distinct().SingleOrDefault();
    //Add this customer to a List<>
}
I thought it might be faster to load this all into a single query, so I attempted to rewrite it as the query below:
var Customers = (from d in Db.CustomerRelationships
                 join c in Db.Customers on
                     d.PrimaryCustomerId equals c.CustomerId
                 where c.IsPrimary == true && SecondaryCustomersIds.Contains(d.SecondaryCustomerId)
                 select c).Distinct();
Which is indeed faster, but now the new query returns fewer records than the first. It seems to me that these two chunks of code are doing the same thing and should return the same number of records. Can anyone see why they would not? What am I missing here?
It's possible for the first query to add a null object to the list. SingleOrDefault returns the default for the type (null in this case) if it can't find a matching entity. So for every secondary customer id without a matching relationship, you could be adding a null object to that List<>, which would increase the count.
In your first scenario, does your final List<Customers> have duplicates?
You're calling Distinct, but also looping, which means you're not doing Distinct on your entire collection.
Your second example is calling Distinct on the entire collection.
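Both answers can be seen with in-memory data. The sketch below is hypothetical: a list of (Secondary, Primary) pairs stands in for the CustomerRelationships join, and the IsPrimary filter is omitted. In the usage data, two secondary ids share one primary customer and a third has no relationship:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryCountDemo
{
    // Hypothetical stand-in data: each pair is (secondary customer id, primary customer id).
    public static (int LoopCount, int SingleQueryCount) Compare(
        List<(Guid Secondary, Guid Primary)> relationships, List<Guid> secondaryIds)
    {
        // Original pattern: one lookup per id. SingleOrDefault adds null when nothing
        // matches, and Distinct only removes duplicates within that one id's results.
        var loopResults = new List<Guid?>();
        foreach (Guid g in secondaryIds)
        {
            Guid? primary = relationships
                .Where(r => r.Secondary == g)
                .Select(r => (Guid?)r.Primary)
                .Distinct()
                .SingleOrDefault();
            loopResults.Add(primary);
        }

        // Rewritten pattern: one query, with Distinct applied across all ids at once.
        var singleResults = relationships
            .Where(r => secondaryIds.Contains(r.Secondary))
            .Select(r => r.Primary)
            .Distinct()
            .ToList();

        return (loopResults.Count, singleResults.Count);
    }
}
```

With that data, the loop produces three entries (the shared customer twice, plus a null), while the single query produces one.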

Why does this additional join increase # of queries?

I'm having trouble coming up with an efficient LINQ-to-SQL query. I am attempting to do something like this:
from x in Items
select new
{
    Name = x.Name,
    TypeARelated = from r in x.Related
                   where r.Type == "A"
                   select r
}
As you might expect, it produces a single query from the "Items" table, with a left join on the "Related" table. Now if I add another few similar lines...
from x in Items
select new
{
    Name = x.Name,
    TypeARelated = from r in x.Related
                   where r.Type == "A"
                   select r,
    TypeBRelated = from r in x.Related
                   where r.Type == "B"
                   select r
}
The result is that a similar query to the first attempt is run, followed by an individual query to the "Related" table for each record in "Items". Is there a way to wrap this all up in a single query? What would be the cause of this? Thanks in advance for any help you can provide.
The above query if written directly in SQL would be written like so (pseudo-code):
SELECT
    X.NAME AS NAME,
    (CASE R.TYPE WHEN 'A' THEN R ELSE NULL END) AS TypeARelated,
    (CASE R.TYPE WHEN 'B' THEN R ELSE NULL END) AS TypeBRelated
FROM Items AS X
JOIN Related AS R ON <some field>
However, LINQ to SQL does not translate it that efficiently. As you describe, it runs one join and then issues a separate query for each record. A better way would be to use two LINQ queries similar to your first example, which would generate two SQL queries. Then join the results of those two queries in memory, which does not generate any further SQL. This limits the number of queries executed to 2.
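Here is a sketch of the in-memory combination step, assuming each query's results have already been materialized with ToList(). The Item and Related classes (with Id, Name, ItemId and Type) are hypothetical shapes for the two tables, and CombineInMemory.Combine is a hypothetical name. ToLookup groups the related rows by item id once, so combining them needs no further queries:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for the Items and Related tables.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Related
{
    public int ItemId { get; set; }
    public string Type { get; set; }
}

public static class CombineInMemory
{
    // items, typeA and typeB stand for the already-materialized results of separate
    // queries; everything below runs in memory (LINQ to Objects), so no SQL is issued.
    public static List<(string Name, List<Related> TypeARelated, List<Related> TypeBRelated)> Combine(
        List<Item> items, List<Related> typeA, List<Related> typeB)
    {
        ILookup<int, Related> aByItem = typeA.ToLookup(r => r.ItemId);
        ILookup<int, Related> bByItem = typeB.ToLookup(r => r.ItemId);

        // Indexing a lookup with a missing key returns an empty sequence,
        // so items without related rows still appear (like a left join).
        return items
            .Select(x => (Name: x.Name,
                          TypeARelated: aByItem[x.Id].ToList(),
                          TypeBRelated: bByItem[x.Id].ToList()))
            .ToList();
    }
}
```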
If the number of conditions (such as r.Type == "A") is going to increase over time, or different conditions will be added, you're better off using a stored procedure, which always runs as a single SQL query.
Hasanain
You can use eager loading to do a single join on the server to see if that helps. Give this a try.
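// DataLoadOptions lives in the System.Data.Linq namespace
using (MyDataContext context = new MyDataContext())
{
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<Item>(i => i.Related);
    // LoadOptions must be set before the context executes its first query
    context.LoadOptions = options;
    // Do your query now.
}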
