Does LINQ do any optimization by sorting/converting data structures?
i.e.
Does LINQ iterate in this code, or is there any type of sorting/conversion done on the data to optimize the find operation?
var list = new List<IdCard> {new IdCard(){Name = "a", Number = 1}, new IdCard() { Name = "b", Number = 2 } };
var idA = list.FirstOrDefault(id => id.Name == "a");
var idB = list.FirstOrDefault(id => id.Name == "b");
I'm trying to understand if my LINQ code is directly translated to an iterative approach. If it is, then it would be better for the above code to use a dictionary lookup (given the case there were to be many look-ups) instead of relying on LINQ, right?
Does LINQ do any optimization by sorting/converting data structures?
Yes. There are all sorts of optimizations that take place throughout various LINQ methods.
Does LINQ iterate in this code
Yes.
is there any type of sorting/conversion done on the data to optimize the find operation?
No. Constructing an entirely new data structure, sorted or hashed, would be more work than simply iterating the sequence until the first match is found. Building such a collection inside the LINQ implementation would mean more processing per item (you're not only executing the predicate but also doing whatever work the collection needs to store the item for later), considerably more memory (the items are held onto for longer), and you'd lose the ability to quit as soon as you find the match, the way a simple iteration can.
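To make the "just iterate" point concrete, here is a rough sketch of what a predicate-based FirstOrDefault boils down to (simplified; the real implementation also validates its arguments):
using System;
using System.Collections.Generic;

static class EnumerableSketch
{
    // Simplified picture of FirstOrDefault(predicate): a linear scan that
    // returns as soon as the predicate matches, otherwise the default value.
    public static TSource FirstOrDefaultSketch<TSource>(
        IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        foreach (TSource element in source)
        {
            if (predicate(element))
            {
                return element;          // stop at the first match
            }
        }
        return default(TSource);         // null for reference types such as IdCard
    }
}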
then it would be better for the above code to use a dictionary lookup (given the case there were to be many look-ups) instead of relying on LINQ, right?
Yes, that code would be better off if you constructed the collection using a type that natively provides the operations you want to perform on it (in this case, a collection that optimizes lookup by key, so a dictionary or a sorted dictionary), rather than using an ill-suited collection type and falling back on LINQ methods.
Using LINQ is useful when the best algorithm for finding what you want for whatever collection you have (or can have, if you can control what collection you use) is the same as the algorithm you'd use for any arbitrary sequence. That's true in a lot of cases. This isn't one of those cases.
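For the example above, a minimal sketch of that dictionary approach (assuming Name is unique in the list, since ToDictionary throws on duplicate keys) might look like:
// Build the dictionary once (O(n)); each subsequent lookup is O(1) on average.
var byName = list.ToDictionary(id => id.Name);

var idA = byName["a"];                  // throws KeyNotFoundException if "a" is missing
IdCard idB;
byName.TryGetValue("b", out idB);       // safe lookup: idB stays null if "b" is missing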
Thank you for your answers! They are very informative.
It looks like the answer for this query is really that some LINQ code may be optimized, and the best way to check is to simply look at the source.
Please correct me if I'm wrong, but the code appears to be available here:
https://github.com/Microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs
No. All System.Linq extension methods that take a predicate enumerate the source collection; they don't modify or optimize it. Some of them, such as OrderBy, GroupBy, and Intersect, use a local collection internally to avoid enumerating the source more than once.
A Lookup (similar to a Dictionary<K, List<V>>) can be used as an alternative to a Dictionary:
var list = new List<IdCard> {new IdCard(){Name = "a", Number = 1}, new IdCard() { Name = "b", Number = 2 } };
var lookup = list.ToLookup(c => c.Name);
var idA = lookup["a"].FirstOrDefault();
var idB = lookup["b"].FirstOrDefault();
Related
Normally when I need to have a list of ints/strings/etc. I create a list like:
var list = new List<string>
And then I create a hashtable that contains all the strings, and I only insert into the list if the string isn't already in the hashtable, i.e. to enforce unique items in the list.
Is there a datatype that can satisfy both of these requirements for me?
There is. Use HashSet:
var set = new HashSet<int>();
set.Add(4);
set.Add(4); // there is already such an element in the set, no new elements added
Keep in mind, though, that it does not guarantee you the order of elements.
Do you just mean HashSet<string> ?
All elements in a HashSet<T> are unique; the Add() method returns a bool to indicate if a new item was actually added, or whether it was a no-op.
Is there a datatype that can satisfy both of these requirements for me?
No. A hashtable gives you direct access to an element given its unique key, whereas in a list you don't need a key and you can definitely have duplicates.
You can use the HashSet<T> data type (documented on MSDN), which will only allow you to have a single copy of each value.
If you are after a set of unique values only (and don't subsequently care about ordering) then you should look at a HashSet<T>
Technically, there is System.Collections.Specialized.OrderedDictionary. However, this is an old non-updated (non-generic) class and I would generally recommend avoiding it ;-)
Represents a collection of key/value pairs that are accessible by the key or index.
In practice I would create a minimal wrapper class that exposes the required operations; see the sketch below. (I would likely use a HashSet<T> (for existence) and a List<T> (for ordering), although just a single List<T> is more than sufficient for a relatively small n in most cases -- remember Big-O is about limits.)
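A minimal sketch of such a wrapper, under the assumption that you want insertion order preserved and duplicates rejected (the UniqueList<T> name is just for illustration):
using System.Collections;
using System.Collections.Generic;

public class UniqueList<T> : IEnumerable<T>
{
    private readonly HashSet<T> seen = new HashSet<T>();   // fast existence checks
    private readonly List<T> items = new List<T>();        // preserves insertion order

    // Returns false (and adds nothing) if the item is already present.
    public bool Add(T item)
    {
        if (!seen.Add(item))
            return false;
        items.Add(item);
        return true;
    }

    public bool Contains(T item) { return seen.Contains(item); }
    public int Count { get { return items.Count; } }

    public IEnumerator<T> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}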
Happy coding.
HashSet<string> set = new HashSet<string>();
bool inserted = set.Add("Item");
bool insertedDuplicate = set.Add("Item");
inserted.Dump("Result1");
insertedDuplicate.Dump("Result2");
//Result
//Result1 = true
//Result2 = false
You can run this in LinqPad to see the functionality and how it works.
We are still using .NET Framework 2.0 / VS 2005, so I do not have LINQ. If I don't want to go with the poor man's LINQ solution, what are some other alternatives for being able to query in-memory custom objects in a dictionary?
I'm not sure if one of your poor man's LINQ solutions is LINQBridge, but I used it for a few weeks and it seemed to be working okay before we actually switched to .NET 3.5 and the real deal.
Dictionary<TKey, TValue> would seem like a good choice, although you haven't provided much information about what you mean by "query." Are you just looking to retrieve data based on some key value? Get a count of total items? Do a sum or average based on some condition? You really need to give more information to get a better answer.
To elaborate on what Chesso said, you'll have to iterate over the collection just like LINQ does...
for example:
static T FindFirst<T>(IEnumerable<T> col, Predicate<T> predicate)
{
    foreach (T t in col)
    {
        if (predicate(t))
        {
            return t;
        }
    }
    return default(T);
}
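For example, calling it with a C# 2.0 anonymous delegate (no LINQ required):
List<string> words = new List<string>(new string[] { "ab", "abcd", "xyz" });
// Find the first string longer than three characters.
string match = FindFirst<string>(words, delegate(string s) { return s.Length > 3; });
// match == "abcd"; returns null (default(string)) if nothing matches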
I was not aware of the Predicate delegate; that seems to be pretty much what I was looking for. As far as the context for which I'm querying:
Say I have an object X with properties A (name, guaranteed to be unique) and B (age).
1) I have a series of objects in a dictionary whose keys are, say, property A of a given object, and of course the value is the object itself.
Now I want to retrieve all objects in this dictionary which meet a certain criterion on B, say age > 20.
I can add all the values of the dictionary into a list and then call .FindAll on it, passing in a delegate. I can create an anonymous delegate to do this, but say I will reuse this many times. How can I dynamically specify an age criterion for the delegate method? Would the only choice be to encapsulate the predicate method in a class, then create a new instance of that class with my criterion as an instance variable?
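Encapsulating the predicate in a class, as you describe, works fine on .NET 2.0; here is a sketch with hypothetical Person and MinAgeFilter types (an anonymous delegate that captures a local int would achieve the same thing without the extra class):
// Hypothetical type standing in for "object X": A = Name (unique), B = Age.
public class Person
{
    public string Name;
    public int Age;
}

// Reusable, parameterized predicate: the criterion lives in an instance field.
public class MinAgeFilter
{
    private readonly int minAge;
    public MinAgeFilter(int minAge) { this.minAge = minAge; }
    public bool Matches(Person p) { return p.Age > minAge; }
}

// Usage (byName is assumed to be your Dictionary<string, Person>):
// List<Person> people = new List<Person>(byName.Values);
// List<Person> over20 = people.FindAll(new MinAgeFilter(20).Matches);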
I'm looking for the fastest way to look up whether a List, Set, or Dictionary contains a specific keyword (string). I don't need to store any data inside; I just want to know if my keyword is in the collection.
I thought about some possibilities like:
Dictionary<string, bool> myDictionary = new Dictionary<string, bool>();
if (myDictionary.ContainsKey(valueToSearch))
{
// do something
}
but I don't need a value.
string[] myArray = {"key1", "key2", "key3"};
if (Array.IndexOf(myArray, valueToSearch) != -1)
{
// do something
}
Then I found:
List<string> list = new List<string>();
if (list.Contains(valueToSearch))
{
// do something
}
The lookup will happen very often and has to be very fast.
Any idea what's the fastest way to check if a value equals one of a given list of keys?
Of the standard collection types, Dictionary will be the fastest, since I don't think you have HashSet<T> in the compact framework. The other two do a sequential search.
In general, a Dictionary lookup is the usual solution to a problem like this, as long as your keys are good hash values that get a somewhat even distribution in the dictionary's lookup table.
However, there may be certain cases where a list lookup appears to run faster, depending on how the data is sorted and what exactly you are looking up.
The best way to tell is to run a profile of each case, and see which performs better.
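If you do want to profile it, a rough Stopwatch comparison along these lines works (treat it as a sketch; results will vary with key count, key distribution, and the device, especially on the Compact Framework):
using System;
using System.Collections.Generic;
using System.Diagnostics;

class LookupBenchmark
{
    static void Main()
    {
        string[] keys = new string[10000];
        for (int i = 0; i < keys.Length; i++) keys[i] = "key" + i;

        List<string> list = new List<string>(keys);
        Dictionary<string, bool> dict = new Dictionary<string, bool>();
        foreach (string k in keys) dict[k] = true;

        string needle = "key9999";      // worst case for the sequential scans
        const int iterations = 10000;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) list.Contains(needle);
        Console.WriteLine("List.Contains:          {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) Array.IndexOf(keys, needle);
        Console.WriteLine("Array.IndexOf:          {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) dict.ContainsKey(needle);
        Console.WriteLine("Dictionary.ContainsKey: {0} ms", sw.ElapsedMilliseconds);
    }
}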
I agree with Andy. You could also look at SortedList. It's essentially a Dictionary that's sorted by its keys. It should make searching quicker if it's already sorted...
Each Car has a property:
string Code
and about 10 others.
commonCodes is a list of strings (string[]), cars is a list of cars (Car[]), and filteredListOfCars is a List<Car>.
for (int index = 0; index < cars.Length; index++)
{
    Car car = cars[index];
    if (commonCodes.Contains(car.Code))
    {
        filteredListOfCars.Add(car);
    }
}
Unfortunately this method takes too long to execute. I have about 50k records.
How can I lower the execution time?
The easiest optimization is to convert commonCodes from a string[] to a faster lookup structure such as a Dictionary<string, object> or, if you are using .NET 3.5 or above, a HashSet<string>. This will reduce the big-O complexity of the loop and, depending on the size of commonCodes, should make it execute noticeably faster.
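In terms of the original loop, that change looks roughly like this (HashSet<string> needs .NET 3.5; on earlier versions a Dictionary<string, object> with ContainsKey works the same way):
HashSet<string> commonCodeSet = new HashSet<string>(commonCodes);   // build once, O(n)

for (int index = 0; index < cars.Length; index++)
{
    Car car = cars[index];
    if (commonCodeSet.Contains(car.Code))    // O(1) on average instead of scanning the array
    {
        filteredListOfCars.Add(car);
    }
}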
Jared has correctly pointed out that you can optimize this with a HashSet, but I would also like to point out that the entire method is unnecessary, wasting memory for the output list and making the code less clear.
You could write the entire method as:
var commonCodesLookup = new HashSet<string>(commonCodes);
var filteredCars = cars.Where(c => commonCodesLookup.Contains(c.Code));
Execution of the filteredCars filtering operation will be deferred, so that if the consumer of it only wants the first 10 elements, i.e. by using filteredCars.Take(10), then this doesn't need to build the entire list (or any list at all).
To do what you want, I would use the Linq ToLookup method to create an ILookup instead of using a dictionary. ToLookup was made especially for this type of scenario. It is basically an indexed look up on groups. You want to group your cars by Code.
var carCodeLookup = cars.ToLookup(car => car.Code);
The creation of the carCodeLookup would be slow but then you can use it for fast lookup of cars based on Code. To get your list of cars that are in your list of common codes you can do a fast lookup.
var filteredCarsQuery = commonCodes.SelectMany(code => carCodeLookup[code]);
This assumes that your list of cars does not change very often and it is your commonCodes that are dynamic between queries.
You could use the LINQ Join method, like:
var filteredListOfCars = cars.Join(commonCodes, c => c.Code, cC => cC, (car, code) => car).ToArray();
Here's an alternative to the LINQ options (which are also good ideas): if you're trying to do filtering quickly, I would suggest taking advantage of built-in types. You could create a DataTable that has two fields, the index of the car in your array and the code (you can add the other 10 things if they matter as well). Then you can create a DataView around it and use its RowFilter property. It uses some really fast indexing internally (B-trees, I believe), so you probably won't be able to beat its performance manually unless you're an algorithms whiz, which if you were, you wouldn't be asking here. It depends what you're doing and how much performance matters.
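A sketch of that DataTable/DataView idea (the column names are illustrative, and the filter string assumes the codes contain no single quotes):
using System.Data;

DataTable table = new DataTable();
table.Columns.Add("CarIndex", typeof(int));
table.Columns.Add("Code", typeof(string));

for (int i = 0; i < cars.Length; i++)
{
    table.Rows.Add(i, cars[i].Code);
}

DataView view = new DataView(table);
// RowFilter uses the DataColumn expression syntax, which supports IN (...).
view.RowFilter = "Code IN ('" + string.Join("','", commonCodes) + "')";

foreach (DataRowView row in view)
{
    filteredListOfCars.Add(cars[(int)row["CarIndex"]]);
}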
It looks like what you're really checking is whether the "code" is common, not the car. You could consider a flyweight pattern, where cars share common instances of Code objects. The Code object can then have an IsCommon property and a Value property.
You can then do something to the effect of updating the shared Code objects whenever the commonCodes list changes.
Now, when you do your filtering, you only need to check each car code's IsCommon property.
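An illustrative sketch of that flyweight (the CarCode name and the update/filter snippets are hypothetical, since the pattern means changing Car to hold a shared Code object rather than a string):
// Every car with the same code string shares one CarCode instance.
public class CarCode
{
    public string Value;
    public bool IsCommon;
}

public class Car
{
    public CarCode Code;
    // ... the other ~10 properties
}

// When commonCodes changes, flip the flag once per distinct code:
//   foreach (CarCode code in allDistinctCodes) code.IsCommon = commonCodeSet.Contains(code.Value);
// Filtering is then just a flag check per car, with no string hashing or comparison:
//   if (car.Code.IsCommon) filteredListOfCars.Add(car);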
Ok, understand that I come from Cold Fusion so I tend to think of things in a CF sort of way, and C# and CF are as different as can be in general approach.
So the problem is: I want to pull a "table" (thats how I think of it) of data from a SQL database via LINQ and then I want to do some computations on it in memory. This "table" contains 6 or 7 values of a couple different types.
Right now, my solution is that I do the LINQ query using a generic List of a custom type. So my example is the RelevanceTable. I pull some data out that I want to do some evaluation of, which starts with .Contains. It appears that .Contains wants to act on the whole object or nothing. So I can use it if I have a List<string>, but if I have a List<ReferenceTableEntry> where ReferenceTableEntry is my custom type, I would need to implement IEquatable and tell the compiler what exactly "Equals" means.
While this doesn't seem unreasonable, it does seem like a long way to go for a simple problem so I have this sneaking suspicion that my approach is flawed from the get go.
If I want to use LINQ and .Contains, is implementing the interface the only way? It seems like there should just be a way to say which field to operate on. Is there another collection type besides List that maybe has this ability? I have started using List a lot for this, and while I have looked and looked, I see some other, but not necessarily superior, approaches.
I'm not looking for some fine point of performance or compactness or readability, just wondering if I am using a Phillips-head screwdriver on a hex screw. If my approach is a "decent" one, but not the best, of course I'd like to know a better one, but just knowing that it's in the ballpark would give me a little "Yeah! I'm not stupid!" and I would finish at least what I am doing completely before switching to another method.
Hope I explained that well enough. Thanks for your help.
What exactly is it you want to do with the table? It isn't clear. However, the standard LINQ (-to-Objects) methods will be available on any typed collection (including List<T>), allowing any range of Where, First, Any, All, etc.
So: what is it you are trying to do? If you had the table, what value(s) do you want?
As a guess (based on the Contains stuff) - do you just want:
bool found = table.Any(x => x.Foo == foo); // or someObj.Foo
?
There are overloads for some of the methods in the List<T> class that take a delegate (optionally in the form of a lambda expression), which you can use to specify what field to look for.
For example, to look for the item where the Id property is 42:
ReferenceTableEntry found = theList.Find(r => r.Id == 42);
The found variable will have a reference to the first item that matches, or null if no item matched.
There are also some LINQ extensions that take a delegate or an expression. This will do the same as the Find method:
ReferenceTableEntry found = theList.FirstOrDefault(r => r.Id == 42);
Ok, so if I'm reading this correctly, you want to use the Contains method. When using this with collections of objects (such as ReferenceTableEntry) you need to be careful, because what you're saying is that you're checking whether the collection contains an object that IS the same as the object you're comparing against.
If you use the .Find() or .FindAll() method you can specify the criteria that you want to match on using an anonymous method.
So, for example, if you want to find all ReferenceTableEntry records in your list that have an Id greater than 1, you could do something like this:
List<ReferenceTableEntry> listToSearch = //populate list here
var matches = listToSearch.FindAll(x => x.Id > 1);
matches will be a list of ReferenceTableEntry records that have an ID greater than 1.
Having said all that, it's not completely clear that this is what you're trying to do.
Here is the LINQ query involved that creates the object I am talking about, and the problem line is:
.Where (searchWord => queryTerms.Contains(searchWord.Word))
List<queryTerm> queryTerms = MakeQueryTermList();
public static List<RelevanceTableEntry> CreateRelevanceTable(List<queryTerm> queryTerms)
{
    SearchDataContext myContext = new SearchDataContext();
    var productRelevance = (from pwords in myContext.SearchWordOccuranceProducts
                            where (myContext.SearchUniqueWords
                                       .Where(searchWord => queryTerms.Contains(searchWord.Word))
                                       .Select(searchWord => searchWord.Id)).Contains(pwords.WordId)
                            orderby pwords.WordId
                            select new { pwords.WordId, pwords.Weight, pwords.Position, pwords.ProductId });
}
This query returns a list of WordIds that match the submitted search string (when it was a List<string> and it was just the word, that worked fine because, as an answerer mentioned before, they were the same type of objects). My custom type here is queryTerms, a List that contains WordId, ProductId, Position, and Weight. From there I go about calculating the relevance by doing various operations on the created object: sum Weight by product, use position matches to bump up Weights, etc. My point in keeping this separate was that the rules for doing those operations will change, but the basic factors involved will not. I would have even rather it be MORE separate (I'm still learning, I don't want to get fancy), but the rules for local and interpreted LINQ queries seem to trip me up when I do.
Since CF has supported queries of queries forever, that's how I tend to lean. Pull the data you need from the db, then do your operations (which includes queries with Aggregate functions) on the in-memory table.
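As a sketch of that in-memory "query of queries" step with LINQ to Objects, assuming the rows end up in a hypothetical List<RelevanceTableEntry> variable called relevanceTable with the ProductId and Weight properties mentioned above:
// Total weight per product, highest first; position-based adjustments would slot in here too.
var relevanceByProduct =
    relevanceTable
        .GroupBy(entry => entry.ProductId)
        .Select(g => new { ProductId = g.Key, TotalWeight = g.Sum(e => e.Weight) })
        .OrderByDescending(r => r.TotalWeight)
        .ToList();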
I hope that makes it more clear.