I have an IEnumerable<T> where T is a complex object. I need to check whether there are 5 or more items in the list that match a lambda expression. Currently I am using something like this:
if(myList.Count(c=> c.PropertyX == desiredX && c.Y != undesiredY) >= 5)...
However, as myList grows to contain 10K+ objects this becomes a huge bottleneck, and more than likely a match will be found within the first 100 items (but I can't make that assumption).
How can I do this as efficiently as possible?
You can use Where to filter, then Skip the first 4 matches and use Any, which stops iterating once it hits the 5th match. The case where there are fewer than 5 matches will still have to iterate the entire list, though.
if(myList.Where(c=> c.PropertyX == desiredX && c.Y != undesiredY).Skip(4).Any())
How about iterating through the list using a plain old for loop?:
int count = 0;
for (int i = 0; i < myList.Count; ++i)
{
    if (myList[i].PropertyX == desiredX && myList[i].Y != undesiredY)
        count++;
    if (count == 5)
        break; // stop scanning as soon as the 5th match is found
}
This should be pretty much as fast as it gets on a single thread. Since you can't make any assumption about where in the list these items may be, the time complexity won't be better than O(n), where n is the number of items in the list; i.e. in the worst case you may have to iterate through the entire list. And there is no faster way of iterating through a list than a plain for loop, as far as I know :)
You can Skip 4 elements and check whether your collection has any further elements. This way you will not count all the elements in the collection.
var result = myList.Where(c=>c.PropertyX == desiredX && c.Y != undesiredY);
if(result.Skip(4).Any())
{
//has >= 5 elements.
}
Related
I have a list of objects. Each object has a field called val. This value shouldn't be smaller than zero, but there are such objects in this list. I want to replace those less-than-zero values with zero. The easiest solution is:
foreach(Obj item in list)
{
if (item.val < 0)
{
item.val = 0;
}
}
But I want to do this using LINQ. The important thing is I do not want a list of updated elements. I want the same list just with the necessary values replaced. Thanks in advance.
As I read the comments I realized that what I wanted to do is less efficient and pointless. LINQ is for querying and creating new collections, not for updating existing ones. A possible solution I came across was this:
list.Select(c => { if (c.val < 0) c.val = 0; return c; }).ToList();
But my initial foreach solution is more efficient than this. So don't make the same mistake I did and overcomplicate things.
You can try this one, which can be faster because of parallelism:
Parallel.ForEach(list, item =>
{
item.val = item.val < 0 ? 0 : item.val;
});
Parallel.ForEach in C# provides a parallel version of the standard, sequential foreach loop. In a standard foreach loop each iteration processes a single item from the collection, one item at a time. The Parallel.ForEach method instead executes multiple iterations at the same time on different processors or processor cores. This opens up the possibility of synchronization problems, so the loop is ideally suited to processes where each iteration is independent of the others.
More Details - LINK
A 'for' loop is often slightly faster than 'foreach', so you can use this one:
for (int i = 0; i < list.Count; i++)
{
    if (list[i].val < 0)
    {
        list[i].val = 0;
    }
}
I have the following code:
var tempResults = new Dictionary<Record, List<Record>>();
errors = new List<Record>();
foreach (Record record in diag)
{
var code = Convert.ToInt16(Regex.Split(record.Line, @"\s{1,}")[4], 16);
var cond = codes.Where(x => x.Value == code && x.Active).FirstOrDefault();
if (cond == null)
{
errors.Add(record);
continue;
}
var min = record.Datetime.AddSeconds(downDiff);
var max = record.Datetime.AddSeconds(upDiff);
// PROBLEM PART - it takes around 4.5 ms
var possibleResults = cas.Where(x => x.Datetime >= min && x.Datetime <= max).ToList();
if (possibleResults.Count == 0)
errors.Add(record);
else
{
if (!CompareCond(record, possibleResults, cond, ref tempResults, false))
{
errors.Add(record);
}
}
}
The variable diag is a List<Record>.
The variable cas is a List<Record> with around 50k items.
The problem is that it's too slow. The part with the first Where clause takes around 4.66 ms; for 3000 records in diag that makes 3000 × 4.66 ms ≈ 14 seconds. Is there any option to optimize the code?
You can speed up that specific statement you emphasized
cas.Where(x => x.Datetime >= min && x.Datetime <= max).ToList();
with a binary search over the cas list. First, pre-sort cas by Datetime:
cas.Sort((a,b) => a.Datetime.CompareTo(b.Datetime));
Then create a comparer for Record which compares only the Datetime properties (the implementation assumes there are no null records in the list):
private class RecordDateComparer : IComparer<Record> {
public int Compare(Record x, Record y) {
return x.Datetime.CompareTo(y.Datetime);
}
}
Then you can translate your Where clause like this:
var index = cas.BinarySearch(new Record { Datetime = min }, new RecordDateComparer());
if (index < 0)
index = ~index;
var possibleResults = new List<Record>();
// go backwards, for duplicates
for (int i = index - 1; i >= 0; i--) {
var res = cas[i];
if (res.Datetime <= max && res.Datetime >= min)
possibleResults.Add(res);
else break;
}
// go forward until item bigger than max is found
for (int i = index; i < cas.Count; i++) {
var res = cas[i];
if (res.Datetime <= max && res.Datetime >= min)
possibleResults.Add(res);
else break;
}
The idea is to find the first record with a Datetime equal to or greater than your min, using BinarySearch. If an exact match is found, it returns the index of the matched element. If not, it returns a negative value, which the ~index operation translates into the index of the first element greater than the target.
Once that element is found, we can walk forward through the list and grab items until we hit one with a Datetime greater than max (the list is sorted, so everything after that is out of range too). We also need to walk a little backwards, because if there are duplicates, binary search will not necessarily return the first one.
Additional improvements might include:
Putting the active codes in a Dictionary (keyed by Value) outside of the loop, replacing the codes Where search with a dictionary lookup (see the sketch after this list).
As suggested in the comments by @Digitalsa1nt: parallelize the foreach loop using Parallel.For, PLINQ, or similar techniques. It's a perfect case for parallelization, because the loop contains only CPU-bound work. Of course you need a few adjustments to make it thread-safe, such as using a thread-safe collection for errors (or locking around adds to it).
If cas is loaded from a database with Entity Framework, try adding AsNoTracking to the query that loads it.
The AsNoTracking method can save both execution time and memory usage. Applying this option really becomes important when we retrieve a large amount of data from the database.
Note that AsNoTracking is defined on IQueryable, so it has to be applied before the data is materialized; it cannot be called on an in-memory list after Where.
There are a few improvements you can make here.
It might only be a minor performance increase, but you could try GroupBy instead of Where in this circumstance.
So instead you would have something like this:
var possibleResults = cas.GroupBy(x => x.Datetime >= min && x.Datetime <= max).Where(g => g.Key).SelectMany(g => g).ToList();
This usually works well for searching through lists for distinct values, but in your case I'm unsure whether it will provide any benefit when grouping on a boolean clause.
A few other things you can do throughout your code:
Avoid using ToList when possible and stick to IEnumerable. ToList performs an eager evaluation, which may be causing a lot of slowdown in your query.
Use .Any() instead of Count when checking whether values exist (this only applies if the sequence is a lazy IEnumerable; List.Count is already O(1)).
I have a List of longs from a DB query. The total number in the List is always an even number, but the quantity of items can be in the hundreds.
List item [0] is the lower boundary of a "good range", item [1] is the upper boundary of that range. A numeric range between item [1] and item [2] is considered "a bad range".
Sample:
var seekset = new SortedList();
var skd = 500;
while (skd < 1000000)
{
    seekset.Add(skd, 0);
    skd = skd + 100;
}
If an input number is compared to the List items, if the input number is between 500-600 or 700-800 it is considered "good", but if it is between 600-700 it is considered "bad".
Using the above sample, can anyone comment on the right/fast way to determine that the number 655 is a "bad" number, i.e. not within any good-range boundary (C#, .NET 4.5)?
If a SortedList is not the proper container for this (eg it needs to be an array), I have no problem changing, the object is static (lower case "s") once it is populated but can be destroyed/repopulated by other threads at any time.
The following works, assuming the list is already sorted and both limits of each pair are treated as "good" values:

public static bool IsGood<T>(List<T> list, T value)
{
    int index = list.BinarySearch(value);
    // A non-negative index is an exact hit on a boundary, which counts as good.
    // Otherwise index == ~insertionPoint (a negative number), and it is even
    // exactly when the insertion point is odd, i.e. the value falls between a
    // lower bound (even position) and an upper bound (odd position).
    return index >= 0 || index % 2 == 0;
}
If you only have a few hundred items then it's really not that bad. You can just use a regular List and do a linear search to find the first larger item. If the index of that item is even then the value is no good; if it's odd then it's good:

var index = data.Select((n, i) => new { n, i })
                .SkipWhile(item => item.n < someValue)
                .First().i;
bool isValid = index % 2 == 1;
If you have enough items that a linear search isn't desirable then you can use BinarySearch to find the next largest item:

var index = data.BinarySearch(someValue);
// An exact hit on a boundary counts as valid; otherwise ~index is the
// insertion point, and an odd insertion point means we're inside a good range.
bool isValid = index >= 0 || ~index % 2 == 1;
I am thinking that LINQ may not be best suited for this problem, because an IEnumerable forgets about item[0] by the time it is ready to process item[1].
Yes, this is freshman CS, but the fastest in this case may be just
// untested code; assumes seekset is an indexable list of boundary pairs
bool found = false;
for (int i = 0; i < seekset.Count; i += 2)
{
    if (valueOfInterest >= seekset[i] &&
        valueOfInterest <= seekset[i + 1])
    {
        found = true;
        break; // or return;
    }
}
I apologize for not directly answering your question about "Best approach in Linq", but I sense that you are really asking about the best approach for performance.
I have a list of numbers and I’d like to remove all the even ones. I think my code is right:
System.Collections.Generic.List<int> list = ...
foreach (int i in list)
{
if (i % 2 == 0)
list.Remove(i);
}
but when I run it I get an exception. What am I doing wrong?
You can't modify a collection while you're iterating over it with a foreach loop; in particular, you can't remove an item from the list you're enumerating.
Instead of the foreach loop, just use this single line of code:
list.RemoveAll(i => i % 2 == 0);
You cannot modify the collection during a foreach loop. A foreach loop uses an enumerator to loop through the collection, and when the collection is modified, this is what happens to the enumerator:

An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined.
You can use a regular for loop.
for (int i = 0; i < list.Count; i++)
{
    int n = list[i];
    if (n % 2 == 0)
    {
        list.RemoveAt(i--); // step back so the item that shifted down isn't skipped
    }
}
The foreach uses an IEnumerator under the covers, when an element in your list is removed, it leaves the enumerator in a potentially inconsistent state. The 'safest' thing for it to do is throw an exception.
To work around this, make a local copy of your collection first:
var local = new List<int>(list);
foreach (int i in local)
{
    if (i % 2 == 0)
        list.Remove(i);
}
If you're removing items from a list (or even an array) you should iterate backward through it, since removing an item shifts all items after it down by one position. Iterating forward without adjusting the index makes you skip the item that follows each removed one.
Which exception did you get? A foreach can't remove items from the collection it is iterating, because modifying the collection invalidates the enumerator. Instead, use for (and go backwards!):

for (int i = list.Count - 1; i > -1; i--)
to follow #Chris Filstow's method....
this will take your list and replace it with a new one containing only the elements that meet your criteria (here, keeping the odd numbers):

System.Collections.Generic.List<int> list = ...
list = list.Where(n => n % 2 != 0).ToList();
You could try something like this instead. (It creates a new sequence of just the odd numbers rather than removing the evens from the existing list, so it depends on what you're looking to do.)

var numbers = Enumerable.Range(1, 100);
var odds = numbers.Where(n => n % 2 == 1);
All you're getting out of the foreach loop is read-only access; trying to change the list while iterating it is what gives you the exception.
This article right here explains why.
You could always switch to a for loop, iterating backwards so removals don't shift unvisited items:

for (int i = list.Count - 1; i >= 0; i--)
{
    if (list[i] % 2 == 0)
        list.RemoveAt(i);
}
I have a list of items to remove from an ordered collection in C#.
What's the best way to go about this?
If I remove an item in the middle, the index changes but what If I want to remove multiple items?
To avoid index changes, start at the end and go backwards to index 0.
Something along these lines:
for (int i = myList.Count - 1; i >= 0; i--)
{
if(NeedToDelete(myList[i]))
{
myList.RemoveAt(i);
}
}
What is the type of the collection? If it implements ICollection<T>, you can just loop over the list of items to remove and call the Remove() method on the collection.
For Example:
object[] itemsToDelete = GetObjectsToDeleteFromSomewhere();
ICollection<object> orderedCollection = GetCollectionFromSomewhere();
foreach (object item in itemsToDelete)
{
orderedCollection.Remove(item);
}
If the collection is a List<T> you can also use the RemoveAll method:
list.RemoveAll(x => otherlist.Contains(x));
Assuming that the list of items to delete is relatively short, you can first sort it. Then traverse the source list while keeping an index into the delete list, advancing whichever side is currently behind.
Suppose the source list is haystack and the list of items to delete is needle:
needle.Sort(); // not needed if it's known that `needle` is sorted
// haystack is known to be sorted
int haystackIdx = 0;
int needleIdx = 0;
while (needleIdx < needle.Count && haystackIdx < haystack.Count)
{
if (haystack[haystackIdx] < needle[needleIdx])
haystackIdx++;
else if (haystack[haystackIdx] > needle[needleIdx])
needleIdx++;
else
haystack.RemoveAt(haystackIdx);
}
This way you have only one traversal of both haystack and needle, plus the time to sort needle, provided deletion is O(1) (which is often the case for linked lists and similar collections). If the collection is a List<...>, deletion costs O(collection size) because of data shifts, so you'd better start from the end of both collections and move toward the beginning:
needle.Sort(); // not needed if it's known that `needle` is sorted
// haystack is known to be sorted
int haystackIdx = haystack.Count - 1;
int needleIdx = needle.Count - 1;
while (needleIdx >= 0 && haystackIdx >= 0)
{
if (haystack[haystackIdx] > needle[needleIdx])
haystackIdx--;
else if (haystack[haystackIdx] < needle[needleIdx])
needleIdx--;
else
haystack.RemoveAt(haystackIdx--);
}