Is anything faster than .Any() to check that condition? - c#

I have been searching for a way to do the following with a faster EF query:
using (DAL.MandatsDatas db = new DAL.MandatsDatas())
{
    if (db.ARTICLE.Any(t => t.condition == condition))
        oneArticle = db.ARTICLE.First(t => t.condition == condition);
}
It works fine, but the more of these I add, the slower it feels.
It just looks like it goes through all the rows twice (I don't know if that's the case).
I've been searching and saw people using Count() > 0 and other irrelevant stuff...
Is there a faster way to check if something exists and then take it?
Also, I was wondering if FirstOrDefault() could help my case. How does it work?

Yes, FirstOrDefault is better here:
oneArticle = db.ARTICLE.FirstOrDefault(t => t.condition == condition);
Basically, Any will issue one SELECT, and then First will issue one more, while FirstOrDefault does the same thing First does but simply returns null if there was no result, eliminating the need for a second query.
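A minimal sketch of the single-query version, reusing the DAL context, ARTICLE entity and oneArticle variable from the question:

using (DAL.MandatsDatas db = new DAL.MandatsDatas())
{
    // One round trip: the first matching row, or null if there is none.
    oneArticle = db.ARTICLE.FirstOrDefault(t => t.condition == condition);
}
if (oneArticle != null)
{
    // a match was found – use oneArticle here
}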

Yes, FirstOrDefault will be faster because it only queries once. If no rows are available it returns null; if rows are available it returns the first row based on whatever ordering you applied (if any).

Related

LINQ - Using FirstOrDefault or DefaultIfEmpty?

I'm getting confused about how to use FirstOrDefault or DefaultIfEmpty.
The snippet below may be empty, but if it's not, I definitely want the first one.
var vThr = _context.PostThrs.FirstOrDefault(m =>
m.ThrZero == zero
&& m.ThrText.Substring(0,8) == "SERVICE-");
If it is empty, I would like the result to be "Empty". How would I do that?
I've taken some stabs at it, but I'm not sure that it's helpful to share.
EDIT: After posting, I realized that the question doesn't really work as you cannot insert a single string into the result.
To summarize the difference:

.FirstOrDefault() queries for the first item that satisfies the condition.
- If there are matching items, it returns the first T item.
- If not, it returns default(T).

.DefaultIfEmpty() queries for an IEnumerable<T> and is used to supply a default item when the sequence is empty.
- If there are matching items, it returns one or more T items as an IEnumerable<T>.
- If not, the defaultValue parameter is used to build the result: it returns an IEnumerable<T> containing that single default item (Count = 1).

So, based on your requirement, .FirstOrDefault() is what you are looking for: check whether the returned result is null and proceed accordingly.
This doesn't cover the part where you want to assign an "Empty" string to the variable when it is null; as you found out, that isn't feasible because the variable is of type T, and a string would conflict with that type.
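A minimal sketch of that null check, reusing the names from your snippet; the status variable is made up here, and the "Empty" marker has to live in a separate string because vThr itself is an entity, not a string:

var vThr = _context.PostThrs.FirstOrDefault(m =>
    m.ThrZero == zero
    && m.ThrText.Substring(0, 8) == "SERVICE-");

string status = vThr == null ? "Empty" : vThr.ThrText;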
References
.FirstOrDefault()
.DefaultIfEmpty()
Use FirstOrDefault(): it will find the first element matching the condition, and if there is none it will simply return null.

Faster version of LINQ .Any() and .Count()

I'm checking if a list has an element whose source and target are already in the list. If not, I'm adding that element to the list. I'm doing it this way:
if (!objectToSerialize.elements
        .Any(x => x.data.source == edgetoAdd.data.source &&
                  x.data.target == edgetoAdd.data.target))
    objectToSerialize.elements.Add(edgetoAdd);
This works but very slowly. Is there a way to make this part faster? Are there faster implementations of Any() or Count? Thanks in advance.
You can pre-index the data into something like a HashSet<T> for some T. Since you are comparing two values, a tuple might help:
var existingValues = new HashSet<(string,string)>(
objectToSerialize.elements.Select(x => (x.data.source, x.data.target)));
now you can test
existingValues.Contains((edgetoAdd.data.source, edgetoAdd.data.target))
efficiently. But!! Building the index is not free. This mainly helps if you are going to be testing lots of values. If you're only adding one, a linear search is probably your best bet.
Note that you can use the index approach with an index that lasts between multiple Add calls, but you would also need to remember to .Add it to the index each time. You can short-cut the test/add pair by using the return value of .Add on the hashset:
if(existingValues.Add((edgetoAdd.data.source, edgetoAdd.data.target)))
{
    // a new value, yay!
    objectToSerialize.elements.Add(edgetoAdd);
}
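For illustration, a slightly fuller sketch of keeping that index alive across many inserts; elements and data.source/data.target are the names from the question, while edgesToAdd is a made-up collection of incoming edges:

var existingValues = new HashSet<(string, string)>(
    objectToSerialize.elements.Select(x => (x.data.source, x.data.target)));

foreach (var edgeToAdd in edgesToAdd)
{
    // HashSet<T>.Add returns false when the pair is already present,
    // so the membership test and the index update happen in one O(1) call.
    if (existingValues.Add((edgeToAdd.data.source, edgeToAdd.data.target)))
        objectToSerialize.elements.Add(edgeToAdd);
}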

Trying to sort IQueryable by dynamic properties

I have the following problem that I would like to solve with a single Linq Query:
I have my data in a database which I retrieve with the help of Entity Framework's Linq to SQL, then I would like to apply a sorting. For this I have a simple string I get from the client which I then map to a property dynamically. After that I only get the chunk I need to display via a simple Skip and Take.
The problem now seems to be that the actions I try to apply don't really go together well. I use an IQueryable for the Linq result since I get my data from the database. As soon as the Linq query tries to execute, I get the error that with Linq to SQL I cannot use "ToValue()", which I need for my dynamic sorting like this:
x.GetType().GetProperty(propertyName).GetValue(x, null);
Is what I'm trying to do even possible and can someone point me in the right direction? I've been playing around for what seems like forever with different approaches, but to no avail.
This is my last approach with some variables hardcoded, but it doesn't work either (and it might be clunky, since I've been working on it for some time now).
IQueryable<OA_FileUpload> result;
Expression<Func<MyObject, object>> orderBy = x => x.GetType().GetProperty(propertyName).GetValue(x, null);
result = Db.MyObject.Where(f => f.isSomething == true && f.isSomethingElse == false)
    .OrderBy(orderBy)
    .Skip(20)
    .Take(20);
As soon as I later try to do something with the result it fails completely.
No, it is not possible the way you're trying it. The Entity Framework engine cannot translate x.GetType().GetProperty(propertyName).GetValue(x, null) to SQL. You're applying OrderBy before Skip and Take, which is the right order, but it also means that your sorting will be translated as part of the generated SQL.
What you can do, though, is build your query incrementally (i.e. in several steps). Then you can add the sorting with the conditions you want. Something like this:
IQueryable<OA_FileUpload> result =
    Db.MyObject.Where(f => f.isSomething == true && f.isSomethingElse == false);

//Add your conditional sorting.
if(...) //Your first condition for sorting.
    result = result.OrderBy(....); //Your sorting for that condition.
else if(...) //Your second condition for sorting.
    result = result.OrderBy(....); //Your sorting for that condition.

//Now apply the paging.
result = result.Skip(20).Take(20);
The LINQ above is still translated into and executed as one single query, but now you can add all the conditions you want.
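A concrete sketch of that conditional sorting, mapping the client's propertyName string onto strongly typed key selectors that EF can translate; the properties Name, CreatedDate and Id on MyObject are assumed for illustration:

IQueryable<MyObject> result =
    Db.MyObject.Where(f => f.isSomething == true && f.isSomethingElse == false);

switch (propertyName)
{
    case "Name":
        result = result.OrderBy(x => x.Name);
        break;
    case "CreatedDate":
        result = result.OrderBy(x => x.CreatedDate);
        break;
    default:
        result = result.OrderBy(x => x.Id); // stable fallback so Skip/Take stays deterministic
        break;
}

result = result.Skip(20).Take(20);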

Which is faster in .NET, .Contains() or .Count()?

I want to compare an array of modified records against a list of records pulled from the database, and delete those records from the database that do not exist in the incoming array. The modified array comes from a client app that maintains the database, and this code runs in a WCF service app, so if the client deletes a record from the array, that record should be deleted from the database. Here's the sample code snippet:
public void UpdateRecords(Record[] recs)
{
    // look for deleted records
    foreach (Record rec in UnitOfWork.Records.ToList())
    {
        var copy = rec;
        if (!recs.Contains(rec)) // use this one?
        if (0 == recs.Count(p => p.Id == copy.Id)) // or this one?
        {
            // if not in the new collection, remove from database
            Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
            UnitOfWork.Remove(deleted);
        }
    }
    // rest of method code deleted
}
My question: is there a speed advantage (or other advantage) to using the Count method over the Contains method? The Id property is guaranteed to be unique and to identify that particular record, so you don't need to do a bitwise compare, as I assume Contains might do.
Anyone?
Thanks, Dave
This would be faster:
if (!recs.Any(p => p.Id == copy.Id))
This has the same advantages as using Count(), but it also stops after it finds the first match, unlike Count().
You should not even consider Count since you are only checking for the existence of a record. You should use Any instead.
Using Count forces iteration over the entire enumerable to get the correct count, while Any stops enumerating as soon as it finds the first element.
As for the use of Contains, you need to consider whether, for the specified type, reference equality is equivalent to the Id comparison you are performing, which by default it is not.
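For reference, a minimal sketch of an Id-based equality override; the int Id member is taken from the question and any other members of Record are omitted:

public class Record
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        // Two records are considered equal when their Ids match.
        var other = obj as Record;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

With an override like this in place, recs.Contains(rec) compares records by Id rather than by reference.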
Assuming Record implements both GetHashCode and Equals properly, I'd use a different approach altogether:
// I'm assuming it's appropriate to pull down all the records from the database
// to start with, as you're already doing it.
foreach (Record recordToDelete in UnitOfWork.Records.ToList().Except(recs))
{
UnitOfWork.Remove(recordToDelete);
}
Basically there's no need to have an N * M lookup time - the above code will end up building a set of records from recs based on their hash code, and find non-matches rather more efficiently than the original code.
If you've actually got more to do, you could use:
HashSet<Record> recordSet = new HashSet<Record>(recs);
foreach (Record recordFromDb in UnitOfWork.Records.ToList())
{
    if (!recordSet.Contains(recordFromDb))
    {
        UnitOfWork.Remove(recordFromDb);
    }
    else
    {
        // Do other stuff
    }
}
(I'm not quite sure why your original code is refetching the record from the database using Single when you've already got it as rec...)
Contains() is going to use Equals() against your objects. If you have not overridden this method, it's even possible Contains() is returning incorrect results. If you have overridden it to use the object's Id to determine identity, then Count() and Contains() are almost doing the exact same thing, except Contains() will short-circuit as soon as it hits a match, whereas Count() will keep on counting. Any() might be a better choice than both of them.
Do you know for certain this is a bottleneck in your app? It feels like premature optimization to me. Which is the root of all evil, you know :)
Since you're guaranteed that there will be one and only one, Any might be faster, because as soon as it finds a record that matches it will return true.
Count will traverse the entire list counting each occurrence. So if the item is #1 in the list of 1000 items, it's going to check each of the 1000.
EDIT
Also, this might be a time to mention not doing a premature optimization.
Wire up both your methods and put a stopwatch before and after each one.
Create a sufficiently large list (1,000 items or more, depending on your domain) and see which one is faster.
My guess is that we're talking on the order of ms here.
I'm all for writing efficient code, just make sure you're not taking hours to save 5 ms on a method that gets called twice a day.
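A rough sketch of that measurement using System.Diagnostics.Stopwatch; CheckWithContains and CheckWithCount are hypothetical wrappers around the two approaches being compared:

var sw = System.Diagnostics.Stopwatch.StartNew();
CheckWithContains(recs);   // the Contains-based version
sw.Stop();
Console.WriteLine("Contains: " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
CheckWithCount(recs);      // the Count-based version
sw.Stop();
Console.WriteLine("Count: " + sw.ElapsedMilliseconds + " ms");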
It could be done like this:
UnitOfWork.Records.RemoveAll(r => !recs.Any(rec => rec.Id == r.Id));
May I suggest an alternative approach that I believe should be faster, since Count would continue even after the first match.
public void UpdateRecords(Record[] recs)
{
    // look for deleted records
    foreach (Record rec in UnitOfWork.Records.ToList())
    {
        var copy = rec;
        if (!recs.Any(x => x.Id == copy.Id))
        {
            // if not in the new collection, remove from database
            Record deleted = UnitOfWork.Records.Single(p => p.Id == copy.Id);
            UnitOfWork.Remove(deleted);
        }
    }
    // rest of method code deleted
}
That way you are sure to break on the first match instead of continuing to count.
If you need to know the actual number of elements, use Count(); it's the only way. If you are checking for the existence of a matching record, use Any() or Contains(). Both are MUCH faster than Count(), and both will perform about the same, but Contains will do an equality check on the entire object while Any() will evaluate a lambda predicate based on the object.

Would this SingleOrDefault() optimization be worthwhile or is it overkill / harmful?

I was messing around with LinqToSQL and LINQPad and I noticed that SingleOrDefault() doesn't do any filtering or limiting in the generated SQL (I had almost expected the equivalent of Take(1)).
So, assuming you wanted to protect yourself from large quantities accidentally being returned, would the following snippet be useful or is it a bad idea?
// SingleType is my LinqToSQL generated type
// Singles is the table that contains many SingleType's
// context is my datacontext
public SingleType getSingle(int id)
{
    var query = from s in context.Singles where s.ID == id select s;
    var result = query.Take(2).SingleOrDefault();
    return result;
}
As opposed to the normal way I would have done it (notice there is no .Take(2)):
public SingleType getSingle(int id)
{
    var query = from s in Singles where s.ID == id select s;
    var result = query.SingleOrDefault();
    return result;
}
I figured with the Take(2) I would still get the functionality of SingleOrDefault(), with the added benefit of never having to worry about returning {n} rows accidentally, but I'm not sure if it's even worth it unless I'm constantly expecting to accidentally return {n} rows with my query.
So, is this worthwhile? Is it harmful? Are there any pro's / con's that I'm not seeing?
Edit:
SQL Generated without the Take(2)
SELECT [t0].[blah], (...)
FROM [dbo].[Single] AS [t0]
WHERE [t0].[ID] = @p0
SQL Generated with the Take(2)
SELECT TOP 2 [t0].[blah], (...)
FROM [dbo].[Single] AS [t0]
WHERE [t0].[ID] = @p0
Also, when I speak of SingleOrDefault's functionality, I specifically desire to have it throw an exception if 2 or more are returned, which is why I'm doing a Take(2). The difference being, without the .Take(2), it will return {n} rows from the database, when it really only needs to return 2 (just enough to make it throw).
Single is more of a convenience method to get the single element of a query than a way to limit the number of results. By using Single you are, in effect, saying "I know this query can only have one item, so just give it to me," just like when doing someArray[0] when you know there will only be one element. SingleOrDefault adds the ability to return null rather than throwing an exception when dealing with sequences of length 0. You shouldn't be using Single or SingleOrDefault with queries that may return more than 1 result: an InvalidOperationException will be thrown.
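A quick behaviour sketch with in-memory sequences (not database queries), assuming the usual System.Linq and System.Collections.Generic usings, just to illustrate the rules described above:

var empty = new List<int>();
var one   = new List<int> { 1 };
var two   = new List<int> { 1, 2 };

empty.SingleOrDefault();   // 0 (the default for int) – no exception
one.SingleOrDefault();     // 1
// two.SingleOrDefault();  // throws InvalidOperationException: more than one element
// empty.Single();         // throws InvalidOperationException: sequence contains no elements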
If ID in your query is the primary key of the table, or a UNIQUE column, the database will ensure that the result set contains 1 row or none with no need for a TOP clause.
However, if you are selecting on a non-unique / non-key column and want the first result or last result (note that these have no meanings unless you also introduce an OrderBy) then you can use First or Last (which both have OrDefault counterparts) to get the SQL you desire:
var query = from s in context.Singles
            where s.Id == id
            orderby s.someOtherColumn
            select s;
var item = query.FirstOrDefault();
On a side note, you can save yourself some typing if you are, indeed, doing a query for a single element:
var query = from s in context.Singles where s.Id == id select s;
var item = query.SingleOrDefault();
can become:
var item = context.Singles.SingleOrDefault(s => s.Id == id);
You already mentioned the point. If you expect the query to return a huge number of rows quite often, it might give you a performance gain. But if this is an exceptional case - and SingleOrDefault() clearly indicates that - it is not worth the effort. It just pollutes your code, and you should document it if you decide to leave it in.
UPDATE
Just noticed that you are querying the id. I assume it is the primary key, so you will get one or zero rows as a consequence. So in theory you should not care much about using Single(), First(), Take(1) or whatever. But I would still consider it good design to use Single() to explicitly state that you expect exactly one row. A colleague told me a few weeks ago that they even had a project where something went terribly wrong and the primary key was no longer unique because of a major database malfunction. So better safe than sorry.
SingleOrDefault (and IEnumerable<T>.SingleOrDefault()) both raise an InvalidOperationException if the sequence has more than one element.
Your case above can never happen - it will throw an exception.
Edit:
My suggestion here would depend on your usage scenario. If you think there will be cases where you're going to regularly return more than a few rows from this query, then add the .Take(2) line. This will give you the same behavior, and the same exception, but eliminate the potential for returning many records from the DB.
However, your use of SingleOrDefault() is suggesting that there should never be >1 row returned. If that's really the case, I would leave this off and just treat this as an exception. In my opinion, you're reducing the readability of the code by suggesting that it would be normal to have >2 records when you add .Take(2), and in this case, I don't believe that's true. I'd take the perf. hit in the exceptional case for the simplicity of leaving it off.
Allen, is ID the primary key for the Singles table? If it is, I don't fully understand your issue, as your second query will return one record or null, and the SQL will be where ID = ###...
Using Take(2).SingleOrDefault() defeats the purpose of SingleOrDefault.
