LINQ to SQL where ID not in some_list - c#

I'm trying to make a LINQ to SQL statement which filters results where the ID is not in some list of integers. I realise the .contains() method cannot be used in Linq to SQL but for the purposes of explaining what I'd like to do, here's what I'd like to do:
nextInvention = (from inv in iContext.Inventions
where !historyList.Contains(inv.Id)
orderby inv.DateSubmitted ascending
select inv).First<Invention>();
Any idea how I might go about doing this?
Thanks!

Contains can be used in LINQ to SQL... it's the normal way of performing "IN" queries. I have a vague recollection that there are some restrictions, but it definitely can work. The restrictions may be around the types involved... is historyList a List<T>? It probably isn't supported for arbitrary IEnumerable<T>, for example.
Now, I don't know whether inverting the result works to give a "NOT IN" query, but it's at least worth trying. Have you tried the exact query from your question?
One point to note: I think the readability of your query would improve if you tried to keep one clause per line:
var nextInvention = (from inv in iContext.Inventions
where !historyList.Contains(inv.Id)
orderby inv.DateSubmitted ascending
select inv)
.First();
Also note that in this case, the method syntax is arguably simpler:
var nextInvention = iContext.Inventions
.Where(inv => !historyList.Contains(inv.Id))
.OrderBy(inv => inv.DateSubmitted)
.First();

Related

How is OrderBy Applied?

I have the following query:
var vendors = (from pp in this.ProductPricings
join pic in this.ProductItemCompanies
on pp.CompanyId equals pic.CompanyId into left
from pic in left.DefaultIfEmpty()
orderby pp.EffectiveDate descending
group pp by new { pp.Company, SortOrder = (pic != null) ? pic.SortOrder : short.MinValue } into v
select v).OrderBy(z => z.Key.SortOrder);
Does anyone know how the last OrderBy() is applied? Does that become part of the SQL query, or are all the results loaded in to memory and then passed to OrderBy()?
And if it's the second case, is there any way to make it all one query? I only need the first item and it would be very inefficent to return all the results.
Well it will try to apply the OrderBy to the original query since you are still using an IQueryable - meaning it hasn't been converted to an IEnumerable or hydrated to a collection using ToList or an equivalent.
Whether it can or not depends on the complexity of the resulting query. You'd have to try it to find out. My guess is it will turn the main query into a subquery and layer on a "SELECT * FROM (...) ORDER BY SortOrder" outer query.
Given your specific example the order by in this situation most, likely be appliead as part of the expression tree when it getting build, there for it will be applied to sql generated by the LINQ query, if you would convert it to Enumarable like ToList as mentioned in another answer then Order by would be applied as an extension to Enumerable.
Might use readable code, because as you write it is not understandable.
You will have a problem in the future with the linq statement. The problem is that if your statement does not return any value the value will be null and whenever you make cause a exception.
You must be careful.
I recommend you to do everything separately to understand the code friend.

LINQ c# efficiency

I need to write a query pulling distinct values from columns defined by a user for any given data set. There could be millions of rows so the statements must be as efficient as possible. Below is the code I have.
What is the order of this LINQ query? Is there a more efficient way of doing this?
var MyValues = from r in MyDataTable.AsEnumerable()
orderby r.Field<double>(_varName)
select r.Field<double>(_varName);
IEnumerable result= MyValues.Distinct();
I can't speak much to the AsEnumerable() call or the field conversions, but for the LINQ side of things, the orderby is a stable quick sort and should be O(n log n). If I had to guess, everything but the orderby should be O(n), so overall you're still just O(n log n).
Update: the LINQ Distinct() call should also be O(n).
So altogether, the Big-Oh for this thing is still O(Kn log n), where K is some constant.
Is there a more efficent way of doing this?
You could get better efficiency if you do the sort as part of the query that initializes MyDataTable, instead of sorting in memory afterwards.
from comments
I actually use MyDistinct.Distinct()
If you want distinct _varName values and you cannot do this all in the select query in dbms(what would be the most efficient way), you should use Distinct before OrderBy. The order matters here.
You would need to order all million of rows before you start to filter out the duplicates. If you use distinct first, you need to order only the rest.
var values = from r in MyDataTable.AsEnumerable()
select r.Field<double>(_varName);
IEnumerable<double> orderedDistinctValues = values.Distinct()
.OrderBy(d => d);
I have asked a related question recently which E.Lippert answered with a good explanation when order matters and when not:
Order of LINQ extension methods does not affect performance?
Here's a little demo where you can see that the order matters, but you can also see that it does not really matter since comparing doubles is trivial for a cpu:
Time for first orderby then distinct: 00:00:00.0045379
Time for first distinct then orderby: 00:00:00.0013316
your above query (linq) is good if you want all the million records and you have enough memory on a 64bit memory addressing OS.
the order of the query is, if you see the underlying command, would be transalated to
Select <_varname> from MyDataTable order by <_varname>
and this is as good as it is when run on the database IDE or commandline.
to give you a short answer regarding performance
put in a where clause if you can (with columns that are indexed)
ensure that the user can choose colums (_varname) that are indexed. Imagine the DB trying to sort million records on an unindexed column, which is evidently slow, but endangers linq to receive the badpress
Ensure that (if possible) initilisation of the MyDataTable is done correctly with the records that are of value (again based on a where clause)
profile your underlying query,
if possible, create storedprocs (debatable). you can create an entity model which includes storedprocs aswell
it may be faster today, but with the tablespace growing, and if your data is not ordered (indexed) thats where things get slowerr (even if you had a good linq expression)
Hope this helps
that said, if your db is not properly indexed, meaning

How to Linq-ify this query?

I currently have this Linq query:
return this.Context.StockTakeFacts
.OrderByDescending(stf => stf.StockTakeId)
.Where(stf => stf.FactKindId == ((int)kind))
.Take(topCount)
.ToList<IStockTakeFact>();
The intent is to return every fact for the topCount of StockTakes but instead I can see that I will only get the topCount number of facts.
How do I Linq-ify this query to achieve my aim?
I could use 2 queries to get the top-topCount StockTakeId and then do a "between" but I wondered what tricks Linq might have.
This is what I'm trying to beat. Note that it's really more about learning that not being able to find a solution. Also concerned about performance not for these queries but in general, I don't want to just to easy stuff and find out it's thrashing behind the scenes. Like what is the penalty of that contains clause in my second query below?
List<long> stids = this.Context.StockTakes
.OrderByDescending(st => st.StockTakeId)
.Take(topCount)
.Select(st => st.StockTakeId)
.ToList<long>();
return this.Context.StockTakeFacts
.Where(stf => (stf.FactKindId == ((int)kind)) && (stids.Contains(stf.StockTakeId)))
.ToList<IStockTakeFact>();
What about this?
return this.Context.StockTakeFacts
.OrderByDescending(stf => stf.StockTakeId)
.Where(stf => stf.FactKindId == ((int)kind))
.Take(topCount)
.Select(stf=>stf.Fact)
.ToList();
If I've understood what you're after correctly, how about:
return this.Context.StockTakes
.OrderByDescending(st => st.StockTakeId)
.Take(topCount)
.Join(
this.Context.StockTakeFacts,
st => st.StockTakeId,
stf => stf.StockTakeId,
(st, stf) => stf)
.OrderByDescending(stf => stf.StockTakeId)
.ToList<IStockTakeFact>();
Here's my attempt using mostly query syntax and using two separate queries:
var stids =
from st in this.Context.StockTakes
orderby st.StockTakeId descending
select st.StockTakeId;
var topFacts =
from stid in stids.Take(topCount)
join stf in this.Context.StockTakeFacts
on stid equals stf.StockTakeId
where stf.FactKindId == (int)kind
select stf;
return topFacts.ToList<IStockTakeFact>();
As others suggested, what you were looking for is a join. Because the join extension has so many parameters they can be a bit confusing - so I prefer query syntax when doing joins - the compiler gives errors if you get the order wrong, for instance. Join is by far preferable to a filter not only because it spells out how the data is joined together, but also for performance reasons because it uses indexes when used in a database and hashes when used in linq to objects.
You should note that I call Take in the second query to limit to the topCount stids used in the second query. Instead of having two queries, I could have used an into (i.e., query continuation) on the select line of the stids query to combine the two queries, but that would have created a mess for limiting it to topCount items. Another option would have been to put the stids query in parentheses and invoked Take on it. Instead, separating it out into two queries seemed the cleanest to me.
I ordinarily avoid specifying generic types whenever I think the compiler can infer the type; however, IStockTakeFact is almost certainly an interface and whatever concrete type implements it is likely contained by this.Context.StockTakeFacts; which creates the need to specify the generic type on the ToList call. Ordinarily I omit the generic type parameter to my ToList calls - that seems to be an element of my personal tastes, yours may differ. If this.Context.StockTakeFacts is already a List<IStockTakeFact> you could safely omit the generic type on the ToList call.

Does LINQ to SQL support the t-sql "in" statement

I dont't know how to describe the title better (so feel free to change) but essntial I want to produce the equivilent of the following statement in LINQ to SQL:
select * from products where category in ('shoes','boots')
The fields will be coming from the query string essentially (more secure than this but for ease of explanation) i.e
string[] myfields = request.querystring["categories"].split(',');
Thanks
Yes, you use Contains:
db.Products.Where(product => myFields.Contains(product.Category))
aka
from product in db.Products
where myFields.Contains(product.Category)
select product
As other have mentioned, yes it does using the .Contains method. To benefit the other random people that may arrive here via Bing (or any of the other search engines): Linq-To-Entities does not support the .Contains method in the current version. However, with a simple extension method, you can do so:
http://george.tsiokos.com/posts/2007/11/30/linq-where-x-in/
In theory you would use contains:
var productList = from product in context.Products
where myfields.contains(product.category)
select product
However you'll need to test this - I seem to recall there being a bug in the Linq2Sql optimizer when trying to deal with List<> and array values being used like this (it may have only occured if you tried to cast them as IQueryable)

Fluent and Query Expression — Is there any benefit(s) of one over other?

LINQ is one of the greatest improvements to .NET since generics and it saves me tons of time, and lines of code. However, the fluent syntax seems to come much more natural to me than the query expression syntax.
var title = entries.Where(e => e.Approved)
.OrderBy(e => e.Rating).Select(e => e.Title)
.FirstOrDefault();
var query = (from e in entries
where e.Approved
orderby e.Rating
select e.Title).FirstOrDefault();
Is there any difference between the two or is there any particular benefit of one over other?
Neither is better: they serve different needs. Query syntax comes into its own when you want to leverage multiple range variables. This happens in three situations:
When using the let keyword
When you have multiple generators (from clauses)
When doing joins
Here's an example (from the LINQPad samples):
string[] fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" };
var query =
from fullName in fullNames
from name in fullName.Split()
orderby fullName, name
select name + " came from " + fullName;
Now compare this to the same thing in method syntax:
var query = fullNames
.SelectMany (fName => fName.Split().Select (name => new { name, fName } ))
.OrderBy (x => x.fName)
.ThenBy (x => x.name)
.Select (x => x.name + " came from " + x.fName);
Method syntax, on the other hand, exposes the full gamut of query operators and is more concise with simple queries. You can get the best of both worlds by mixing query and method syntax. This is often done in LINQ to SQL queries:
var query =
from c in db.Customers
let totalSpend = c.Purchases.Sum (p => p.Price) // Method syntax here
where totalSpend > 1000
from p in c.Purchases
select new { p.Description, totalSpend, c.Address.State };
I prefer to use the latter (sometimes called "query comprehension syntax") when I can write the whole expression that way.
var titlesQuery = from e in entries
where e.Approved
orderby e.Rating
select e.Titles;
var title = titlesQuery.FirstOrDefault();
As soon as I have to add (parentheses) and .MethodCalls(), I change.
When I use the former, I usually put one clause per line, like this:
var title = entries
.Where (e => e.Approved)
.OrderBy (e => e.Rating)
.Select (e => e.Title)
.FirstOrDefault();
I find that a little easier to read.
Each style has their pros and cons. Query syntax is nicer when it comes to joins and it has the useful let keyword that makes creating temporary variables inside a query easy.
Fluent syntax on the other hand has a lot more methods and operations that aren't exposed through the query syntax. Also since they are just extension methods you can write your own.
I have found that every time I start writing a LINQ statement using the query syntax I end up having to put it in parenthesis and fall back to using fluent LINQ extension methods. Query syntax just doesn't have enough features to use by itself.
In VB.NET i very much prefer query syntax.
I hate to repeat the ugly Function-keyword:
Dim fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" };
Dim query =
fullNames.SelectMany(Function(fName) fName.Split().
Select(Function(Name) New With {Name, fName})).
OrderBy(Function(x) x.fName).
ThenBy(Function(x) x.Name).
Select(Function(x) x.Name & " came from " & x.fName)
This neat query is much more readable and maintainable in my opinion:
query = From fullName In fullNames
From name In fullName.Split()
Order By fullName, name
Select name & " came from " & fullName
VB.NET's query syntax is also more powerful and less verbose than in C#: https://stackoverflow.com/a/6515130/284240
For example this LINQ to DataSet(Objects) query
VB.NET:
Dim first10Rows = From r In dataTable1 Take 10
C#:
var first10Rows = (from r in dataTable1.AsEnumerable()
select r)
.Take(10);
I don't get the query syntax at all. There's just no reason for it in my mind. let can be acheived with .Select and anonymous types. I just think things look much more organized with the "punctuation" in there.
The fluent interface if there's just a where. If I need a select or orderby, I generally use the Query syntax.
Fluent syntax does seem more powerful indeed, it should also work better for organizing code into small reusable methods.
I know this question is tagged with C#, but the Fluent syntax is painfully verbose with VB.NET.
I really like the Fluent syntax and I try to use it where I can, but in certain cases, for example where I use joins, I usually prefer the Query syntax, in those cases I find it easier to read, and I think some people are more familiar to Query (SQL-like) syntax, than lambdas.
While I do understand and like the fluent format , I've stuck to Query for the time being for readability reasons. People just being introduced to LINQ will find Query much more comfortable to read.
I prefer the query syntax as I came from traditional web programming using SQL. It is much easier for me to wrap my head around. However, it think I will start to utilize the .Where(lambda) as it is definitely much shorter.
I've been using Linq for about 6 months now. When I first started using it I preferred the query syntax as it's very similar to T-SQL.
But, I'm gradually coming round to the former now, as it's easy to write reusable chunks of code as extension methods and just chain them together. Although I do find putting each clause on its own line helps a lot with readability.
I have just set up our company's standards and we enforce the use of the Extension methods. I think it's a good idea to choose one over the other and don't mix them up in code. Extension methods read more like the other code.
The comprehension syntax does not have all operators and using parentheses around the query and add extension methods after all just begs me for using extension methods from the start.
But for the most part it is just personal preference with a few exceptions.
From Microsoft's docs:
As a rule when you write LINQ queries, we recommend that you use query syntax whenever possible and method syntax whenever necessary. There is no semantic or performance difference between the two different forms. Query expressions are often more readable than equivalent expressions written in method syntax.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/#query-expression-overview
They also say:
To get started using LINQ, you do not have to use lambdas extensively. However, certain queries can only be expressed in method syntax and some of those require lambda expressions.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/query-syntax-and-method-syntax-in-linq

Categories

Resources