How to optimize this code - C#

Each Car has a property:
string Code
and 10 others.
commonCodes is a list of strings (string[]).
cars is a list of cars (Car[]).
filteredListOfCars is a List<Car>.
for (int index = 0; index < cars.Length; index++)
{
    Car car = cars[index];
    if (commonCodes.Contains(car.Code))
    {
        filteredListOfCars.Add(car);
    }
}
Unfortunately this piece of the method executes for too long.
I have about 50k records.
How can I lower the execution time?

The easiest optimization is to convert commonCodes from a string[] to a faster lookup structure such as a Dictionary<string, object> or a HashSet<string> if you are using .NET 3.5 or above. Each Contains call then becomes O(1) instead of a linear scan of commonCodes, so the loop drops from O(n·m) to roughly O(n), and depending on the size of commonCodes this should make it execute noticeably faster.
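For example, a minimal sketch of that change, reusing the variables from the question:
// Build the set once: construction is O(m); each Contains call is then O(1).
var commonCodeSet = new HashSet<string>(commonCodes);
for (int index = 0; index < cars.Length; index++)
{
    Car car = cars[index];
    if (commonCodeSet.Contains(car.Code))
    {
        filteredListOfCars.Add(car);
    }
}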

Jared has correctly pointed out that you can optimize this with a HashSet, but I would also like to point out that the entire method is unnecessary, wasting memory for the output list and making the code less clear.
You could write the entire method as:
var commonCodesLookup = new HashSet<string>(commonCodes);
var filteredCars = cars.Where(c => commonCodesLookup.Contains(c.Code));
Execution of the filteredCars filtering operation will be deferred, so if the consumer only wants the first 10 elements, e.g. by using filteredCars.Take(10), this never needs to build the entire list (or any list at all).
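For instance (an illustrative consumer, not part of the original answer):
// Nothing is filtered until the sequence is consumed; Take(10) stops the
// enumeration after the first 10 matching cars.
var firstTenMatches = filteredCars.Take(10).ToList();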

To do what you want, I would use the LINQ ToLookup method to create an ILookup instead of using a dictionary. ToLookup was made especially for this type of scenario: it is basically an indexed lookup on groups. You want to group your cars by Code.
var carCodeLookup = cars.ToLookup(car => car.Code);
The creation of carCodeLookup would be the slow part, but afterwards you can use it for fast lookups of cars by Code. To get the list of cars whose codes are in your list of common codes, you can do a fast lookup for each common code.
var filteredCarsQuery = commonCodes.SelectMany(code => carCodeLookup[code]);
This assumes that your list of cars does not change very often and it is your commonCodes that are dynamic between queries.

You could use the LINQ Join method, like:
var filteredListOfCars = cars.Join(commonCodes, c => c.Code, cC => cC, (car, code) => car).ToArray();
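For what it's worth, Enumerable.Join builds a hash-based lookup over the inner sequence (commonCodes here) before streaming the outer one, so it avoids the repeated linear scans of the original loop in much the same way as the explicit HashSet approach.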

Here's an alternative to the LINQ options (which are also good ideas): if you're trying to do filtering quickly, I would suggest taking advantage of built-in types. You could create a DataTable with two columns, the id of the car in your array and the code (you can add the other 10 properties if they matter as well). Then you can create a DataView around it and use its RowFilter property. It uses some really fast indexing internally (B-trees, I believe), so you probably won't be able to beat its performance manually unless you're an algorithms whiz, and if you were, you wouldn't be asking here. Whether it's worth it depends on what you're doing and how much performance matters.
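A rough sketch of that idea; the column names and the way the filter string is built are illustrative assumptions, not code from the answer:
// requires: using System.Data; using System.Linq;
// Build a DataTable holding the array index and the code of each car.
var table = new DataTable();
table.Columns.Add("CarIndex", typeof(int));
table.Columns.Add("Code", typeof(string));
for (int i = 0; i < cars.Length; i++)
{
    table.Rows.Add(i, cars[i].Code);
}
// Filter with a DataView; RowFilter supports an IN (...) expression.
string codeList = string.Join(",", commonCodes.Select(c => "'" + c.Replace("'", "''") + "'").ToArray());
var view = new DataView(table) { RowFilter = "Code IN (" + codeList + ")" };
// Map the filtered rows back to the original Car objects.
var filteredCars = view.Cast<DataRowView>()
                       .Select(rv => cars[(int)rv["CarIndex"]])
                       .ToList();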

It looks like what you're really checking is whether the "code" is common, not the car. You could consider a flyweight pattern, where cars share common instances of Code objects. The Code object can then have an IsCommon property and a Value property.
You can then do something to the effect of updating the shared Code objects whenever the commonCodes list changes.
Now when you do your filtering you only need to check each car's Code.IsCommon property.
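A hedged sketch of that pattern; the Code class shape and the update helper are illustrative, not from the answer:
// Shared, flyweight code object: many cars reference the same instance.
public class Code
{
    public string Value { get; set; }
    public bool IsCommon { get; set; }
}
// Re-flag the shared Code instances once, whenever the common codes change.
static void UpdateCommonFlags(IEnumerable<Code> allCodes, HashSet<string> commonCodes)
{
    foreach (var code in allCodes)
        code.IsCommon = commonCodes.Contains(code.Value);
}
// Filtering then becomes a cheap property check per car.
var filteredCars = cars.Where(c => c.Code.IsCommon).ToList();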

Related

Are LINQ Query Expressions Optimized?

Does LINQ do any optimization by sorting/converting data structures?
i.e.
Does LINQ iterate in this code, or is there any type of sorting/conversion done on the data to optimize the find operation?
var list = new List<IdCard> {new IdCard(){Name = "a", Number = 1}, new IdCard() { Name = "b", Number = 2 } };
var idA = list.FirstOrDefault(id => id.Name == "a");
var idB = list.FirstOrDefault(id => id.Name == "b");
I'm trying to understand if my LINQ code is directly translated to an iterative approach. If it is, then it would be better for the above code to use a dictionary lookup (given the case there were to be many look-ups) instead of relying on LINQ, right?
Does LINQ do any optimization by sorting/converting data structures?
Yes. There are all sorts of optimizations that take place throughout various LINQ methods.
Does LINQ iterate in this code
Yes.
is there any type of sorting/conversion done on the data to optimize the find operation?
No. The work spent constructing an entirely new data structure, in some sorted or hashed manner, would be more than the work of just iterating the sequence until you find the first item. Building a collection inside the LINQ implementation would not only mean more processing per item (you're not just executing the predicate, but also whatever work that collection needs to store the item for later) and more memory (the items are kept around longer), but you also couldn't quit as soon as you find the match, the way you can with a simple iteration.
then it would be better for the above code to use a dictionary lookup (given the case there were to be many look-ups) instead of relying on LINQ, right?
Yes, that code would be better off if you simply construct the collection using a type that natively provides the operations you want to perform on it (in this case, a collection that optimizes lookup by key, so either a dictionary or a sorted dictionary), rather than using an improper collection type and LINQ methods.
Using LINQ is useful when the best algorithm for finding what you want for whatever collection you have (or can have, if you can control what collection you use) is the same as the algorithm you'd use for any arbitrary sequence. That's true in a lot of cases. This isn't one of those cases.
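A minimal sketch of the dictionary approach being suggested, reusing the IdCard list from the question (and assuming Name is unique):
// Build the dictionary once; each subsequent lookup is O(1) instead of a scan.
var byName = list.ToDictionary(id => id.Name);
var idA = byName["a"];
var idB = byName["b"];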
Thank you for your answers! They are very informative.
It looks like the answer for this query is really that some LINQ code may be optimized, and the best way to check is to simply look at the source.
Please correct me if I'm wrong, but the code appears to be available here:
https://github.com/Microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs
No, all System.Linq extensions with a predicate parameter enumerate the source collection; they don't modify or optimize it. Some methods, such as OrderBy, GroupBy, and Intersect, use a local collection internally to avoid enumerating the source more than once.
Lookup (it's similar to a Dictionary<K, List<V>>) can be used as an alternative to Dictionary:
var list = new List<IdCard> {new IdCard(){Name = "a", Number = 1}, new IdCard() { Name = "b", Number = 2 } };
var lookup = list.ToLookup(c => c.Name);
var idA = lookup["a"].FirstOrDefault();
var idB = lookup["b"].FirstOrDefault();

Add, Update, Remove between collections using Linq

Here is my scenario. I am using WPF and making use of two-way binding to show a collection of objects received from a service call every 60 seconds. On the first call I create a collection of objects that will be displayed from the collection of service objects. On subsequent calls I need to compare the service collection to the existing collection and then do one of three things:
If the Item exists in both collections then update ALL of the values for the object in the Display collection with the values from the object in the service collection.
If the item Exists in the Service Collection and not the Display Collection then add it to the Display Collection.
If the Item exists in the Display collection and not the Service Collection then remove it from the Display collection.
I am looking for the best way to do this.
Adding & Removing
Is it smarter to do a left join here and return everything essentially unique to one side or the other, and then add or remove that as appropriate?
Should I attempt to do a Union, since LINQ is supposed to merge the two and ignore the duplicates?
If so how does it decide uniqueness? Is it evaluating all the properties? Can I specify which collection to keep from and which to discard in merging?
Should I use Except to create a list of differences and somehow use that?
Should I create a new list to add and remove using Where / Not In logic?
Updating
Since the collections aren't dictionaries, what is the best way to do the comparison:
list1.ForEach(x => list2[x.Id].SomeProperty = x.SomeProperty);
Is there some way of copying ALL the property values other than specifying each one of them similar to above? Can I perform some kind of shallow copy within Linq Without replacing the actual object that is there?
I don't want to just clear my list and re-add everything each time because the object is bound in the display and I have logic in the properties that is tracking deviations as values change.
You can use the except and intersect methods to accomplish most of what you are looking to do.
However, depending on the size of your objects this can be very resource intensive.
I would recommend the following.
var listIDsA = collectionA.Select(s => s.Id).Distinct().ToList();
var listIDsB = collectionB.Select(s => s.Id).Distinct().ToList();
var idsToRemove = listIDsB.Where(id => !listIDsA.Contains(id)).ToList();
var idsToUpdate = listIDsB.Where(id => listIDsA.Contains(id)).ToList();
var idsToAdd = listIDsA.Where(id => !listIDsB.Contains(id)).ToList();
Then using the three new collections you can add/remove/update the appropriate records.
You can also use a HashSet instead of plain IEnumerables for better performance. This will require you to create an extension class to add that functionality. Here is a good explanation of how to do that (it's not complicated).
How to convert linq results to HashSet or HashedSet
If you do this, you will need to replace the .ToList() in the first two lines with .ToHashSet().
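Alternatively, a small sketch of doing it without the extension method (assuming Id is an int; names mirror the code above):
// HashSet<T> has a constructor that takes any IEnumerable<T>, so on .NET 3.5
// the extension class is optional.
var idsA = new HashSet<int>(collectionA.Select(s => s.Id));
var idsB = new HashSet<int>(collectionB.Select(s => s.Id));
var idsToRemove = idsB.Where(id => !idsA.Contains(id)).ToList();
var idsToUpdate = idsB.Where(id => idsA.Contains(id)).ToList();
var idsToAdd = idsA.Where(id => !idsB.Contains(id)).ToList();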
For your comparison you need to override Equals and GetHashCode:
Object.GetHashCode Method
Then you can use List<T>.Contains:
List.Contains Method
If you can use a HashSet then you will get better performance.
Code not tested
ListDisplay.RemoveAll(x => !ListService.Contains(x));
foreach (ListItem li in ListDisplay)
{
    ListItem lis = ListService.FirstOrDefault(x => x.Equals(li));
    if (lis == null) continue;
    // perform update
}
foreach (ListItem li in ListService.Where(x => !ListDisplay.Contains(x))) ListDisplay.Add(li);
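The Equals/GetHashCode override mentioned above might look roughly like this (assuming a hypothetical ListItem identified by an Id property):
public class ListItem
{
    public int Id { get; set; }

    // Two ListItems count as the same record when their Ids match.
    public override bool Equals(object obj)
    {
        var other = obj as ListItem;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}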

HashSet or Distinct to read distinct values of property in List<> of objects

This is in some way related to this question (Getting all unique Items in a C# list).
The above question is talking about a simple array of values though. I have an object returned from a third party web service:
public class X
{
    public Enum y { get; set; }
}
I have a List of these objects (List<X> xs;), about 100 records in total but variable. Now I want all the possible values of the property y in the list, and I want to bind this to a CheckBoxList.DataSource (in case that makes a difference).
What's the most efficient way to do this?
I can think of two algorithms:
var data = new HashSet<Enum>(xs.Select(s => s.y));
chkBoxList.DataSource = data;
Or
var data = xs.Select(s => s.y).Distinct();
chkBoxList.DataSource = data;
My gut feeling is the HashSet, but I'm not 100% sure.
Open to better ideas if anyone has any.
If it is a one-time operation, use .Distinct. If you are going to add elements again and again, use a HashSet.
The HashSet one, since it keeps the objects around after the HashSet has been constructed, and foreach-ing over it will not require expensive operations.
On the other hand, the Distinct enumerator will likely be re-evaluated every time the DataSource is enumerated, and all the work of removing duplicate values will be repeated.
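If that re-evaluation matters, either option can be materialized once before binding (an illustrative tweak, not from the answers above):
// Snapshot the distinct values so the DataSource is not re-enumerated
// through Distinct() every time the control reads it.
var data = xs.Select(s => s.y).Distinct().ToList();
chkBoxList.DataSource = data;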

C# in-memory query of objects without LINQ

We are still using .NET Framework 2.0 / VS 2005, so I do not have LINQ. If I don't want to go with the poor man's LINQ solution, what are some other alternatives for querying in-memory custom objects in a dictionary?
I'm not sure if one of your poor man's LINQ solutions is LINQBridge, but I used it for a few weeks and it seemed to be working okay before we actually switched to .NET 3.5 and the real deal.
Dictionary<TKey, TValue> would seem like a good choice, although you haven't provided much information about what you mean by "query." Are you just looking to retrieve data based on some key value? Get a count of total items? Do a sum or average based on some condition? You really need to give more information to get a better answer.
To elaborate on what Chesso said, you'll have to iterate over the collection just like LINQ does...
for example:
static T FindFirst<T>(IEnumerable<T> col, Predicate<T> predicate)
{
    foreach (T t in col)
    {
        if (predicate(t))
        {
            return t;
        }
    }
    return default(T);
}
I was not aware of the Predicate delegate; that seems to be pretty much what I was looking for. As far as the context for which I'm querying:
Say I have an object X with properties A (name, guaranteed to be unique) and B (age).
1) I have a series of objects in a dictionary whose keys are, say, property A of a given object, and of course the value is the object itself.
Now I want to retrieve all objects in this dictionary which meet a certain criterion on B, say age > 20.
I can add all the values of the dictionary into a list and then call .FindAll on it, passing in a delegate. I can create an anonymous delegate to do this, but say I will reuse this many times. How can I dynamically specify an age criterion for the delegate method? Would the only choice be to encapsulate the Predicate method in a class, then create a new instance of that class with my criteria as an instance variable?
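A hedged sketch of one alternative on .NET 2.0: an anonymous method can capture a local variable, so the age threshold can be supplied at call time without writing a separate predicate class (X, A, and B follow the description above; the helper name is made up):
static List<X> FindOlderThan(Dictionary<string, X> objectsByName, int minAge)
{
    List<X> all = new List<X>(objectsByName.Values);
    // The anonymous method closes over minAge, so the criterion is dynamic.
    return all.FindAll(delegate(X item) { return item.B > minAge; });
}
// usage: List<X> over20 = FindOlderThan(myDictionary, 20);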

Common problem for me in C#, is my solution good, stupid, reasonable? (Advanced Beginner)

Ok, understand that I come from Cold Fusion so I tend to think of things in a CF sort of way, and C# and CF are as different as can be in general approach.
So the problem is: I want to pull a "table" (thats how I think of it) of data from a SQL database via LINQ and then I want to do some computations on it in memory. This "table" contains 6 or 7 values of a couple different types.
Right now, my solution is that I do the LINQ query using a generic List of a custom type. So my example is the RelevanceTable. I pull some data out that I want to do some evaluation of, which first starts with .Contains. It appears that .Contains wants to act on the whole object or nothing. So I can use it if I have a List<string>, but if I have a List<ReferenceTableEntry> where ReferenceTableEntry is my custom type, I would need to implement IEquatable and tell the compiler what exactly "Equals" means.
While this doesn't seem unreasonable, it does seem like a long way to go for a simple problem so I have this sneaking suspicion that my approach is flawed from the get go.
If I want to use LINQ and .Contains, is implementing the interface the only way? It seems like there should just be a way to say which field to operate on. Is there another collection type besides List that maybe has this ability? I have started using List a lot for this, and while I have looked and looked, I see some other but not necessarily superior approaches.
I'm not looking for some fine point of performance or compactness or readability, just wondering if I am using a Phillips-head screwdriver on a hex screw. If my approach is a "decent" one, but not the best, of course I'd like to know a better one, but just knowing that it's in the ballpark would give me a little "Yeah! I'm not stupid!" and I would finish at least what I am doing completely before switching to another method.
Hope I explained that well enough. Thanks for your help.
What exactly is it you want to do with the table? It isn't clear. However, the standard LINQ (-to-Objects) methods will be available on any typed collection (including List<T>), allowing any range of Where, First, Any, All, etc.
So: what is it you are trying to do? If you had the table, what value(s) do you want?
As a guess (based on the Contains stuff) - do you just want:
bool exists = table.Any(x => x.Foo == foo); // or someObj.Foo
?
There are overloads for some of the methods in the List<T> class that take a delegate (optionally in the form of a lambda expression), which you can use to specify what field to look for.
For example, to look for the item where the Id property is 42:
ReferenceTableEntry found = theList.Find(r => r.Id == 42);
The found variable will have a reference to the first item that matches, or null if no item matched.
There are also some LINQ extensions that take a delegate or an expression. This will do the same as the Find method:
ReferenceTableEntry found = theList.FirstOrDefault(r => r.Id == 42);
Ok, so if I'm reading this correctly you want to use the contains method. When using this with collections of objects (such as ReferenceTableEntry) you need to be careful because what you're saying is you're checking to see if the collection contains an object that IS the same as the object you're comparing against.
If you use the .Find() or .FindAll() method you can specify the criteria that you want to match on using an anonymous method.
So for example if you want to find all ReferenceTableEntry records in your list that have an Id greater than 1 you could do something like this
List<ReferenceTableEntry> listToSearch = //populate list here
var matches = listToSearch.FindAll(x => x.Id > 1);
matches will be a list of ReferenceTableEntry records that have an ID greater than 1.
Having said all that, it's not completely clear that this is what you're trying to do.
Here is the LINQ query involved that creates the object I am talking about, and the problem line is:
.Where (searchWord => queryTerms.Contains(searchWord.Word))
List<queryTerm> queryTerms = MakeQueryTermList();
public static List<RelevanceTableEntry> CreateRelevanceTable(List<queryTerm> queryTerms)
{
    SearchDataContext myContext = new SearchDataContext();
    var productRelevance = (from pwords in myContext.SearchWordOccuranceProducts
                            where (myContext.SearchUniqueWords
                                       .Where(searchWord => queryTerms.Contains(searchWord.Word))
                                       .Select(searchWord => searchWord.Id)).Contains(pwords.WordId)
                            orderby pwords.WordId
                            select new { pwords.WordId, pwords.Weight, pwords.Position, pwords.ProductId });
}
This query returns a list of WordIds that match the submitted search string (when queryTerms was a List<string> and it was just the word, this worked fine, because as an answerer mentioned before, they were the same type of objects). My custom type here is queryTerm, which contains WordId, ProductId, Position, and Weight. From there I go about calculating the relevance by doing various operations on the created object: sum Weight by product, use position matches to bump up weights, etc. My point in keeping this separate was that the rules for doing those operations will change, but the basic factors involved will not. I would have rather it be even MORE separate (I'm still learning, I don't want to get fancy), but the rules for local vs. interpreted LINQ queries seem to trip me up when I do.
Since CF has supported queries of queries forever, that's how I tend to lean: pull the data you need from the db, then do your operations (which include queries with aggregate functions) on the in-memory table.
I hope that makes it more clear.
