Dynamic LINQ to join on dynamic columns on DataTable - c#

I have a situation that I am currently unsure of how to proceed.
I have two DataTables that are populated from a database, and I also have a list of column names that can be used to join these two DataTables together. I wish to write a set of LINQ queries that will:
Show rows from both DataTables (inner join, used for updating one from the other).
Show rows from one DataTable that don't exist in the other (one query using a left join for inserts, the other a right join for deletes).
Now I know how to do this with normal LINQ to Objects or DataTables; however, in this case I need to apply the join columns dynamically, and there could be more than one. Consider the following partial example code:
table1.AsEnumerable()
    .Join(table2.AsEnumerable(),
        dr1 => dr1.Field<string>("ID"),
        dr2 => dr2.Field<string>("ID"),
        (dr1, dr2) => new
        {
            FieldID = dr1.Field<string>("ID"),
            CdGroup = dr2.Field<string>("Name")
        })
The issues are that I don't know the field type, so the .Field<string> parts of the statement can't be applied. Also, if there are multiple join columns, then I will need multiple join conditions.
I have read up on Dynamic LINQ and it seems quite promising; however, I haven't managed to find any information on dynamic LINQ joins like the one I am trying to do. I know I could probably get the same results using nested loops or the .Select() method on the DataTable, but I am trying to apply LINQ to some of the tougher queries that I require.
Does anyone have any pointers or examples of how I could achieve this, or should I just revert to a non-LINQ approach?
Thanks very much.

If you are using Entity Framework, you could download the Microsoft Dynamics CRM SDK and use the pattern from the answer to this question: Is there way to structure a QueryExpression so that you could dynamically handle a unknown number of conditions
There is a construct called QueryExpression with which you can model dynamic queries. See this MSDN article: http://msdn.microsoft.com/en-us/library/microsoft.xrm.sdk.query.queryexpression.aspx
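Outside the Dynamics CRM route, one workaround for plain DataTables (not taken from the linked answer, just a sketch) is to build a composite key per row from the runtime list of column names, which sidesteps the unknown field types entirely:
// A minimal sketch, not from the answers above: build one comparable key per row from
// the runtime list of join columns, so the field types never need to be known at
// compile time. Assumes the values compare correctly by their string form; "\u0001"
// is just an unlikely separator character.
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

string[] joinColumns = { "ID" };                 // supplied at runtime, may hold several names
Func<DataRow, string> keyOf = row =>
    string.Join("\u0001", joinColumns.Select(c => Convert.ToString(row[c])));

// Inner join: rows present in both tables.
var matched = table1.AsEnumerable()
    .Join(table2.AsEnumerable(), keyOf, keyOf,
          (r1, r2) => new { Left = r1, Right = r2 });

// Anti join: rows of table1 with no match in table2 (swap the tables for the delete case).
var table2Keys = new HashSet<string>(table2.AsEnumerable().Select(keyOf));
var unmatched = table1.AsEnumerable().Where(r => !table2Keys.Contains(keyOf(r)));
Because every join column is folded into the one key, the multi-column case needs no extra join clauses.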

Related

LINQ Grouping a List of Objects into Anonymous Type

I am having difficulty using LINQ to query a SQL database in such a way as to group all objects (b) in one table that are associated with an object (a) in another table into an anonymous type containing both (a) and a list of (b)s. Essentially, I have a database with a table of offers, and another table with histories of actions taken related to those offers. What I'd like to be able to do is group them in such a way that I have a list of an anonymous type that contains every offer and a list of every action taken on that offer, so the signature would be:
List<'a>
where 'a is new { Offer offer, List<OfferHistories> offerHistories}
Here is what I tried initially, which obviously will not work
var query = (from offer in context.Offers
             join offerHistory in context.OffersHistories on offer.TransactionId equals offerHistory.TransactionId
             group offerHistory by offerHistory.TransactionId into offerHistories
             select { offer, offerHistories.ToList() }).ToList();
Normally I wouldn't come to SE with this little information but I have tried many different ways and am at a loss for how to proceed.
Try to avoid .ToList() calls unless they are really necessary. One important question: do you really need all columns of OffersHistories? Grouping a full object is very expensive, so try grouping only the necessary columns instead. If you really do need all offerHistories for one offer, then I suggest writing a sub-select (which also costs more in performance):
var query = (from offer in context.Offers
             select new
             {
                 offer,
                 offerHistories = (from offerHistory in context.OffersHistories
                                   where offerHistory.TransactionId == offer.TransactionId
                                   select offerHistory)
             });
P.S.: it's a good idea to create indexes on foreign key columns and on columns used in WHERE and GROUP BY clauses; those will make the query faster.
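If every column of OffersHistories really is needed, a group join is another shape worth trying; a hedged sketch using the names from the question (whether it stays a single SQL statement depends on the EF version):
// Sketch only: the classic "join ... into" group join. EF6 translates this to a single
// query; some EF Core versions may instead require the correlated sub-select shown above.
var query = from offer in context.Offers
            join offerHistory in context.OffersHistories
                on offer.TransactionId equals offerHistory.TransactionId into offerHistories
            select new { offer, offerHistories };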

Linq query timing out, how to streamline query

Our front-end UI has a filtering system that, in the back end, operates over millions of rows. It uses an IQueryable that is built up over the course of the logic and then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has two types of data in it, and the checked items need to be ORed together, then ANDed with the rest of the query.
Due to the large amount of rows it is operating over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable
    .Where(cd => filter.CarIds.Contains(cd.CarId)) //get driver IDs for each car ID listed in filter object
    .Select(cd => cd.DriverId)
    .Distinct()
    .ToList();
driverIds = driverIds.Concat(
        db.DriverShopManyToManyTable
            .Where(ds => filter.ShopIds.Contains(ds.ShopId)) //Get driver IDs for each Shop listed in filter object
            .Select(ds => ds.DriverId)
            .Distinct())
    .Distinct().ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. All of them require keeping the parts of the query as IQueryable<T>, i.e. not using the ToList, ToArray, AsEnumerable, etc. methods that force them to be executed and evaluated in memory.
One way is to create a Union query containing the filtered IDs (which will be unique by definition) and use the Join operator to apply it to the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
    .Where(cd => filter.CarIds.Contains(cd.CarId))
    .Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
    .Where(ds => filter.ShopIds.Contains(ds.ShopId))
    .Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way could be to use two OR-ed Any-based conditions, which would translate to an EXISTS(...) OR EXISTS(...) SQL filter:
query = query.Where(d =>
    db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
    ||
    db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination: .Skip(PageNum * PageSize).Take(PageSize). I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever smaller number seems reasonable to you.
You've mentioned that you need joins to get the data you need. These joins can be done while forming your IQueryable (Entity Framework), rather than in memory (LINQ to Objects). Read up on join syntax in LINQ.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with Navigation Properties which will greatly simplify your code.
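As a hedged illustration (db.Drivers, d.CarDrivers and d.DriverShops are assumed names, not from the question), with the foreign keys and generated navigation properties in place the filter needs no explicit join at all:
// Sketch only: navigation properties let EF build the joins for you.
var filteredDrivers = db.Drivers
    .Where(d => d.CarDrivers.Any(cd => filter.CarIds.Contains(cd.CarId))
             || d.DriverShops.Any(ds => filter.ShopIds.Contains(ds.ShopId)));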
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an Entity Framework LINQ query that uses explicit joins as a last resort.
To speed such queries up, you will likely need indexes on all of the "key" columns that you are joining on. The best way to figure out what indexes you need is to take the SQL query generated by your EF LINQ and bring it over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for your @p parameters, just as an example. Once you've done this, right-click on the query and use either "Display Estimated Execution Plan" or "Include Actual Execution Plan". If indexing can improve your query performance, there is a pretty good chance this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me like using the extension-method versions of the LINQ operators is creating several collections before you're done. Using the query-syntax (from ...) versions should cut that down quite a bit:
driverIds = (from record in db.CarDriversManyToManyTable
             where filter.CarIds.Contains(record.CarId)
             select record.DriverId)
            .Concat(from record in db.DriverShopManyToManyTable
                    where filter.ShopIds.Contains(record.ShopId)
                    select record.DriverId)
            .Distinct();
Also, using the GroupBy extension would give better performance than querying each driver ID individually.

LINQ: Translating a SQL WITH clause to LINQ and Entity Framework

I have an app using Entity Framework. I want to add a tree view listing products, grouped by their categories. I have an old SQL query that will grab all of the products and categories and arrange them into parent nodes and children. I am trying to translate it into LINQ that uses the EF. But the SQL has a WITH sub-query that I am not familiar with using. I have tried using Linqer and LinqPad to sort it out, but they choke on the WITH clause and I am not sure how to fix it. Is this sort of thing possible in LINQ?
Here is the query:
declare @id int
set @id = 0
WITH ChildIDs(id,parentid,type,ChildLevel) AS
(
SELECT id,parentid,type,0 AS ChildLevel
FROM dbo.brooks_product
WHERE id = @id
UNION ALL
SELECT e.id,e.parentid,e.type,ChildLevel + 1
FROM dbo.brooks_product AS e
INNER JOIN ChildIDs AS d
ON e.parentid = d.id
WHERE showitem='yes' AND tribflag=1
)
SELECT ID,parentid,type,ChildLevel
FROM ChildIDs
WHERE type in('product','productchild','productgroup','menu')
ORDER BY ChildLevel, type
OPTION (MAXRECURSION 10);
When I run the query, I get data that looks like this (a few thousand rows, truncated here):
ID       parentid   type           ChildLevel
35429    0          menu           1
49205    0          menu           1
49206    49205      menu           2
169999   49206      product        3
160531   169999     productchild   4
and so on.
The WITH block is a Common Table Expression, and in this case is used to create a recursive query.
This will be VERY difficult in LINQ, as LINQ doesn't play well with recursion. If you need all of the data in one result set, a stored procedure would be easier. Another option is to do the recursion in C# (not in LINQ, but in a recursive function) and make multiple round-trips. The performance will not be as good, but if your result set is small it may not make much difference (and you will get a better object model).
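A rough sketch of that recursion-in-C# option, assuming the entity and property names map from dbo.brooks_product (they are guesses, not from the question):
// Hedged sketch: walk the hierarchy one parent at a time instead of using a recursive CTE.
// BrooksProduct, ParentId, ShowItem and TribFlag are assumed names; each call is one round-trip.
List<BrooksProduct> GetDescendants(MyContext context, int parentId, int level = 0, int maxDepth = 10)
{
    if (level >= maxDepth)
        return new List<BrooksProduct>();            // mirrors OPTION (MAXRECURSION 10)

    var children = context.BrooksProducts
        .Where(p => p.ParentId == parentId && p.ShowItem == "yes" && p.TribFlag == 1)
        .ToList();

    return children
        .Concat(children.SelectMany(c => GetDescendants(context, c.Id, level + 1, maxDepth)))
        .ToList();
}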
You may be able to solve this using LINQ to Entities, but it is non-trivial and I suspect it will be very time consuming.
In situations like this, you may prefer to build a SQL View or Table-Valued Function that returns the results for which you're looking. Then import that View or Table-Valued Function into your EF model and you can pull data directly from it using LINQ.
Querying the View in LINQ is no different than querying a table.
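For example, assuming the view is imported into the model as ProductTreeView (a made-up name):
// Hedged sketch: an imported view is queried exactly like a table.
var productNodes = from node in _db.ProductTreeView
                   where node.type == "product"
                   select node;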
To get data from a Table-Valued Function in LINQ, you pass the function's parameters in after the name of the function, like so:
var query = from tvf in _db.MyTableValuedFunction(parameters)
            select tvf;
EDIT
As suggested by @thepirat000, Table-Valued Function support is not available in Entity Framework versions prior to version 5. In order to use this functionality, EF must be running with .NET 4.5 or higher.
At the end of the day, I could not get this to work. I ended up writing out a SQL query dynamically and sending that straight to the database. It works fine, and I am not relying on any direct user input so there is no chance of SQL injection. But it seems so old school! For the rest of my program I am using EF and LINQ.
Thanks for the replies!
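For anyone taking the same raw-SQL route while keeping the EF context around, a hedged sketch using EF6's Database.SqlQuery (the ProductNode type, recursiveSql string and rootId value are placeholders, not names from the thread):
// Sketch: run the recursive CTE as raw SQL and map the rows onto a plain DTO by column name.
using System.Data.SqlClient;

var nodes = context.Database
    .SqlQuery<ProductNode>(recursiveSql, new SqlParameter("@id", rootId))
    .ToList();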

DataTable reader loading is very slow

I need to fetch some data based on a keyword. The query is tested to be 100% accurate, but the problem is that the loading of the reader is pretty slow. I have tried replacing this query with one that does not contain inner joins at all, and the loading was pretty fast. So I wonder, since I am only selecting one column as a result, why does DataTable.Load() take so much time? Is it SQLite's ExecuteReader that loads the whole result set and not just the one column?
Before using the DataTable, the average time of executing each reader.Read() was 7 seconds.
This is my code:
_database.Connect();
var selectCommand = new SQLiteCommand(
    @"SELECT A.ID AS MY_ID FROM MD
      INNER JOIN TMD ON MD.ID = TMD.ID_MD
      INNER JOIN TR ON TR.ID = TMD.ID_TR
      INNER JOIN P ON P.ID = TR.ID_P
      INNER JOIN DP ON DP.ID_P = P.ID
      INNER JOIN CD ON CD.ID = DP.ID_CD
      WHERE CD.DESC = @desc"
);
selectCommand.Parameters.AddWithValue("@desc", value);
using (DbDataReader reader = _database.ExecuteQuery(selectCommand))
{
    DataTable data = new DataTable("MyData");
    data.Load(reader);
}
_database.Disconnect();
I think this happens due to the nature of SQLite and the great number of joins.
Try to refactor the database schema, e.g. denormalize the data for faster access.
The SQLite Query Planner offers some hints about query optimization for SQLite.
Some items that may apply to your question:
1.) Due to the implementation in SQLite you might try to re-order the multiple joins:
The current implementation of SQLite uses only loop joins. That is to
say, joins are implemented as nested loops. The default order of the
nested loops in a join is for the left-most table in the FROM clause to
form the outer loop and the right-most table to form the inner loop.
So, depending on how the JOINs are constructed there might be a difference in performance.
SQLite tries to optimize this automatically, but as far as I understand the documentation there is no guarantee of success (highlights mine):
However, SQLite will nest the loops in a different order if doing so
will help it to select better indices.
[...]
Join reordering is automatic and usually works well enough that programmers
do not have to think about it, especially if ANALYZE has been used to gather
statistics about the available indices. But occasionally some hints from the
programmer are needed.
2.) Also, please note that INNER JOINS are internally converted into WHERE clauses, so any of the performance tips in the WHERE section of the document might apply, too:
The ON and USING clauses of an inner join are converted into
additional terms of the WHERE clause prior to WHERE clause analysis
described above in paragraph 1.0. Thus with SQLite, there is no
computational advantage to use the newer SQL92 join syntax over the
older SQL89 comma-join syntax. They both end up accomplishing exactly
the same thing on inner joins.
3.) You might consider selecting more columns in your statement if there are indexes on them:
It is not necessary for every column of an index to appear in a WHERE
clause term in order for that index to be used. But there can not be
gaps in the columns of the index that are used.
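As a concrete, hedged follow-up to points 1-3 (the index names are made up, access to the underlying SQLiteConnection is assumed, and DESC needs quoting because it is an SQLite keyword), indexing the filtered column and the join keys and then running ANALYZE is usually the first thing to try:
// Sketch only, not part of the quoted documentation: create indexes covering the WHERE
// column and two of the join keys from the question's query, then run ANALYZE so the
// planner has statistics to pick a better nested-loop order.
using (var cmd = new SQLiteCommand(@"
    CREATE INDEX IF NOT EXISTS IX_CD_DESC   ON CD(""DESC"");
    CREATE INDEX IF NOT EXISTS IX_DP_ID_CD  ON DP(ID_CD);
    CREATE INDEX IF NOT EXISTS IX_TMD_ID_TR ON TMD(ID_TR);
    ANALYZE;", connection))
{
    cmd.ExecuteNonQuery();
}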

SubSonic .Paged() query returns duplicate records

Using a SubSonic (2.2) SqlQuery object, I am querying a view that contains distinct rows from another table. The results of the query, however, contain multiple rows for certain rows in the view. It appears to be because of a join on a temporary table in the query generated to achieve paging. How can I avoid this duplication of rows?
Bonus points: I have to use the view because SubSonic can't do .Paged() and .Distinct() at the same time. Why not?
If I remember correctly, you have to use Distinct() in the right position.
var query = DB.Select().From<Products>()
    .Where(Products.CategoryColumn).IsEqualTo(5).Distinct();

var query = DB.Select().Distinct().From<Products>()
    .Where(Products.CategoryColumn).IsEqualTo(5);
Both statements compile, but the first generates invalid SQL. A good starting point for debugging SubSonic SqlQueries is to generate the SQL output:
var sql = query.BuildSqlStatement();
Another solution could be to use Group instead of Distinct, so you can avoid the view in the first place.
