count objects that meet certain condition in List-collection - c#

I want to count the occurrences of objects within a List<T> that match a certain condition.
For example like this
int List<T>.Count(Predicate<T> match)
So, for example, if I have a list of chores, I can see how many are overdue.
int overdue = listOfChores.Count((element) => { return element.DueDate <= DateTime.Today; });
I know that does not exist and so far I solve problems like that in the following way:
int overdue = listOfChores.FindAll([...]).Count;
However that allocates and initializes a new List etc. only to get the count.
A way to do this with less allocation overhead etc.:
int good = 0;
foreach (Chore element in listOfChores)
    if (element.DueDate <= DateTime.Today)
        good++;
The last approach can also be extended to count several conditions without iterating over the list more than once. (I already found that getting the Count property only takes O(1), but building the List to count from still eats a lot of time.)
int a = 0;
int b = 0;
foreach (Chore element in listOfChores)
{
    if (element.CondA)
        a++;
    if (element.CondB)
        b++;
}
Given this I could even imagine something like
int[] List<T>.Count(Predicate<T>[] matches)
My question(s):
Is there such a thing that I just haven't found yet?
If not: what would be a way to implement such functionality?
EDIT :
Adding LINQ looks like it fixes it.

You just have your syntax slightly off. This is how to use Count :
int overdue = listOfChores.Count(element => element.DueDate <= DateTime.Today);
If you already have a Predicate<T> and want to pass it to Count just call it like a function:
Predicate<Chore> p = (element) => element.DueDate <= DateTime.Today;
int overdue = listOfChores.Count(element => p(element));

There is a Count method that takes a predicate: see Enumerable.Count<TSource>(IEnumerable<TSource>, Func<TSource, bool>).
Note that this method is an extension method, so you can use it only if you add a using directive for the System.Linq namespace.
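The second part of the question (counting several conditions in one pass) has no built-in LINQ method as far as I know, but it can be sketched as a small extension method. CountAll and CountExtensions are hypothetical names chosen for this illustration, not part of the framework:

```csharp
using System;
using System.Collections.Generic;

public static class CountExtensions
{
    // Hypothetical helper: counts, in a single pass over the source,
    // how many items satisfy each of the given predicates.
    public static int[] CountAll<T>(this IEnumerable<T> source,
                                    params Func<T, bool>[] predicates)
    {
        var counts = new int[predicates.Length];
        foreach (T item in source)
        {
            for (int i = 0; i < predicates.Length; i++)
            {
                if (predicates[i](item))
                    counts[i]++;
            }
        }
        return counts;
    }
}
```

With the chores from the question it would be called as listOfChores.CountAll(c => c.CondA, c => c.CondB), and the result holds one count per predicate, in the same order.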

Related

C# sort List<int> recursively

There's an exercise I need to do: given a List, I need to sort its contents using ONLY recursive methods (no while, do while, for, foreach).
So... I've been struggling (for over 2 hours now) and I don't know how to even begin.
The function must be
List<int> SortHighestToLowest (List<int> list) {
}
I THINK I should check if the previous number is greater than the current number and so on, but what if the last number is greater than the first number in the list? That's why I'm having a headache.
I appreciate your help, thanks a lot.
[EDIT]
I delivered the exercise, but then the teacher said I shouldn't use external variables like I did here:
List<int> _tempList2 = new List<int>();
int _actualListIndex = 0;
int _actualMaxNumber = 0;
int _actualMaxNumberIndex = 0;
List<int> SortHighestToLowest(List<int> list)
{
    if (list.Count == 0)
        return _tempList2;
    if (_actualListIndex == 0)
        _actualMaxNumber = list[0];
    if (_actualListIndex < list.Count -1)
    {
        _actualListIndex++;
        if (list[_actualListIndex] > _actualMaxNumber)
        {
            _actualMaxNumberIndex = _actualListIndex;
            _actualMaxNumber = list[_actualListIndex];
        }
        return SortHighestToLowest(list);
    }
    _tempList2.Add(_actualMaxNumber);
    list.RemoveAt(_actualMaxNumberIndex);
    _actualListIndex = 0;
    _actualMaxNumberIndex = 0;
    return SortHighestToLowest(list);
}
The exercise is done and I passed (thanks to other exercises as well), but I was wondering if there's a way of doing this without external variables and without using System.Linq like String.Empty's response (I'm just curious; the community helped me solve my issue and I'm thankful).
I am taking your instructions to the letter here.
Only recursive methods
No while, do while, for, foreach
Signature must be List<int> SortHighestToLowest(List<int> list)
Now, I do assume you may use at least the built-in properties and methods of the List<T> type. If not, you would have a hard time even reading the elements of your list.
That said, any calls to Sort or OrderBy methods would be beside the point here, since they would render any recursive method useless.
I also assume it is okay to use other lists in the process, since you didn't mention anything in regards to that.
With all that in mind, I came to this piece below, making use of the Remove method of List<T>, the Max extension method (which comes from System.Linq), and a new list of integers for each recursive call:
public static List<int> SortHighestToLowest(List<int> list)
{
    // recursion breaker: an empty or single-item list is already sorted
    if (list.Count <= 1)
        return list;
    // remove highest item (note that this modifies the list passed in)
    var max = list.Max();
    list.Remove(max);
    // put the highest item first, followed by the sorted remainder of the list
    var sorted = new List<int> { max };
    sorted.AddRange(SortHighestToLowest(list));
    return sorted;
}
To solve this problem, try to solve smaller subproblems first. Consider the following list
[1,5,3,2]
Take the last element out of the list and assume the rest has already been sorted, which gives [1,3,5] and 2. Now the problem reduces to inserting this 2 at its correct position. If we can do that, the whole list is sorted. The same idea can be applied recursively. Note that this sorts from lowest to highest; flip the comparison in Insert to sort from highest to lowest.
Every recursive solution needs a base condition that matches the assumption we make. For the sorting problem the base condition is a list with a single element: a single-element list is always sorted.
For the insert problem the base condition is an empty list, or a list whose last element is less than or equal to the element to be inserted. In both cases the element is inserted at the end.
Algorithm
---------
Sort(list)
    if (list.count <= 1)
        return
    temp = last element of list
    remove last element from list
    Sort(list)
    Insert(list, temp)

Insert(list, temp)
    if (list.count == 0 || list[list.count - 1] <= temp)
        append temp to list
        return
    last = last element of list
    remove last element from list
    Insert(list, temp)
    append last to list
After the base condition, Insert keeps calling itself on a shorter list until it finds the correct position for temp, and then puts the removed elements back as the calls return.
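The recursive insertion idea described above can be sketched in C# as follows. This is one possible reading of it, adapted to the signature from the question and to highest-to-lowest order; it uses no loops, no LINQ and no fields outside the methods. The class name RecursiveSort and the helper Insert are names chosen for this illustration, and the sketch assumes that sorting the list in place is acceptable:

```csharp
using System.Collections.Generic;

public static class RecursiveSort
{
    // Sorts the list in place from highest to lowest and returns the same list.
    public static List<int> SortHighestToLowest(List<int> list)
    {
        if (list.Count <= 1)
            return list;                   // base case: already sorted

        int last = list[list.Count - 1];
        list.RemoveAt(list.Count - 1);     // take the last element out
        SortHighestToLowest(list);         // sort the rest
        Insert(list, last);                // put it back at its correct position
        return list;
    }

    // Inserts value into a list that is already sorted from highest to lowest.
    private static void Insert(List<int> list, int value)
    {
        if (list.Count == 0 || list[list.Count - 1] >= value)
        {
            list.Add(value);               // base case: value belongs at the end
            return;
        }

        int last = list[list.Count - 1];
        list.RemoveAt(list.Count - 1);     // set the smaller tail element aside
        Insert(list, value);               // insert into the shorter list
        list.Add(last);                    // restore the tail element
    }
}
```

The recursion depth grows with the length of the list, so this is suitable for an exercise rather than for very large inputs.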

LINQ lazy evaluation causing issues with array iterator

I have a class that contains four EnumerableRowCollections, which all point to the same DataTable. The main one will need different combinations of the other three filtered out in different class instances. Since three of them are related, I put them in an array.
EnumerableRowCollection<DataRow> valid;
EnumerableRowCollection<DataRow>[] pending;
All of these collections are defined in the class constructor, but evaluated later due to LINQ's lazy evaluation.
I also have an array of Booleans, which are used to determine which "pending" collections are filtered out of the "valid" collection. These are also assigned in the constructor, and are never changed.
Boolean[] pendingIsValid;
The "valid" collection is filtered like this:
for (var i = 0; i < pending.Length; i++)
    if (pendingIsValid[i] && pending[i].Count() > 0)
        valid = valid.Where(r => !pending[i].Contains(r));
This also occurs in the constructor, but the Where clause is evaluated lazily, as expected.
This works most of the time, however, in a few cases I got a weird exception when the collection evaluation took place down the road.
I get an IndexOutOfRangeException because the local iterator variable, i, in my for loop above is set to 3.
Questions:
Can I make "Where" evaluate the array indexer (or other sub-expressions) non-lazily?
How does the iterator get incremented to 3 at all? Does this lazy evaluation count as "re-entering" the loop?
!?!?
Change it to this:
for (var i = 0; i < pending.Length; i++)
    if (pendingIsValid[i] && pending[i].Count() > 0)
    {
        var j = i;
        valid = valid.Where(r => !pending[j].Contains(r));
    }
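For question #1: you can make it not lazy by adding .ToList() at the end. However, with the above fix, you can keep it lazy.
For question #2: the lambda captures the variable i itself, not the value it had at that moment, so by the time the query is evaluated the loop has finished and i equals pending.Length. Have a read of "Captured variable in a loop in C#" for the full explanation.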
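The captured-variable behaviour can be reproduced without any DataTable. The following is a minimal sketch; CaptureDemo and its method names are made up for this illustration:

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Every lambda captures the one loop variable i, so when the lambdas
    // finally run (after the loop) they all see its final value.
    public static List<int> CapturedLoopVariable()
    {
        var actions = new List<Func<int>>();
        for (var i = 0; i < 3; i++)
            actions.Add(() => i);
        return actions.ConvertAll(a => a());
    }

    // Copying i into a variable declared inside the loop body gives
    // each lambda its own variable to capture.
    public static List<int> CapturedCopy()
    {
        var actions = new List<Func<int>>();
        for (var i = 0; i < 3; i++)
        {
            var j = i;
            actions.Add(() => j);
        }
        return actions.ConvertAll(a => a());
    }
}
```

CapturedLoopVariable() yields 3, 3, 3 while CapturedCopy() yields 0, 1, 2, which mirrors why pending[i] fails with index 3 and pending[j] does not.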
Excellent, Rob. I also figured out this while I was waiting for a response, but yours looks a bit cleaner.
for (var i = 0; i < pending.Length; i++) {
    var p = pending[i];
    if (pendingIsValid[i] && p.Count() > 0)
        valid = valid.Where(r => !p.Contains(r));
}

Simple List<string> vs IEnumerable<string> Performance issues

I've tested iterating over a List<string> vs an IEnumerable<string> with for and foreach loops. Is it possible that the List is much faster?
These are 2 of the few links I could find that publicly state that iterating an IEnumerable performs better than iterating a List.
Link1
Link2
My test was loading 10K lines from a text file that holds a list of URLs.
I first loaded it into a List, then assigned the List to an IEnumerable:
List<string> StrByLst = ...method to load records from the file .
IEnumerable StrsByIE = StrByLst;
So each has 10K items of type string.
Looping over each collection 100 times, meaning 1M iterations in total, resulted in
List<string> being faster than the IEnumerable<string> by an amazing 50x.
Is that to be expected?
update
this is the code that is doing the tests
string WorkDirtPath = HostingEnvironment.ApplicationPhysicalPath;
string fileName = "tst.txt";
string fileToLoad = Path.Combine(WorkDirtPath, fileName);
List<string> ListfromStream = new List<string>();
ListfromStream = PopulateListStrwithAnyFile(fileToLoad) ;
IEnumerable<string> IEnumFromStream = ListfromStream ;
string trslt = "";
Stopwatch SwFr = new Stopwatch();
Stopwatch SwFe = new Stopwatch();
string resultFrLst = "",resultFrIEnumrable, resultFe = "", Container = "";
SwFr.Start();
for (int itr = 0; itr < 100; itr++)
{
    for (int i = 0; i < ListfromStream.Count(); i++)
    {
        Container = ListfromStream.ElementAt(i);
    }
    //the stop() was here , i was doing changes , so my mistake.
}
SwFr.Stop();
resultFrLst = SwFr.Elapsed.ToString();
//forgot to do this reset though still it is faster (x56??)
SwFr.Reset();
SwFr.Start();
for (int itr = 0; itr < 100; itr++)
{
    for (int i = 0; i < IEnumFromStream.Count(); i++)
    {
        Container = IEnumFromStream.ElementAt(i);
    }
}
SwFr.Stop();
resultFrIEnumrable = SwFr.Elapsed.ToString();
Update ... final
I moved the count out of the for loops,
int counter = ..countfor both IEnumerable & List
and then used counter (an int) as the total number of items, as suggested by @ScottChamberlain.
I rechecked that everything is in place, and now the IEnumerable comes out about 5% faster.
So that concludes it: choose by scenario and use case... there is no real performance difference.
You are doing something wrong.
The times that you get should be very close to each other, because you are running essentially the same code.
IEnumerable is just an interface, which List implements, so when you call some method on the IEnumerable reference it ends up calling the corresponding method of List.
There is no code implemented in the IEnumerable - this is what interfaces are - they only specify what functionality a class should have, but say nothing about how it's implemented.
You have a few problems with your test. One is the IEnumFromStream.Count() inside the for loop: the call is repeated on every pass because the value is not cached between iterations, and for a sequence that is not a collection Count() must enumerate the entire sequence each time. Move that call outside of the for loop, save the result in an int, and use that value in the for loop; you will see a shorter time for your IEnumerable.
Also, IEnumFromStream.ElementAt(i) behaves similarly to Count(): on a plain sequence it must iterate from the start up to i every time (e.g. the first time it visits 0, the second time 0,1, the third 0,1,2, and so on...), whereas a List can jump directly to the index it needs. You should be working with the IEnumerator returned from GetEnumerator() instead.
IEnumerables and for loops don't mix well. Use the correct tool for the job: either call GetEnumerator() and work with that, or use a foreach loop.
Now, I know a lot of you may be saying "But it is an interface, it will just be mapping the calls and it should make no difference", but there is a key thing: IEnumerable<T> does not have a Count() or ElementAt() method! Those methods are extension methods added by LINQ, and all they are handed is an IEnumerable<T>, so in general they have to iterate every time they are called. (Enumerable.Count() and Enumerable.ElementAt() do check whether the underlying object implements ICollection<T> or IList<T>, as List<T> does, and take a shortcut when it does, so in this particular test the overhead is the extra call and type check rather than a full enumeration.)
IEnumerable using IEnumerator
using(var enu = IEnumFromStream.GetEnumerator())
{
    //You have to call "MoveNext()" once before getting "Current" the first time,
    // this is done so you can have a nice clean while loop like this.
    while(enu.MoveNext())
    {
        Container = enu.Current;
    }
}
The above code is basically the same thing as
foreach(var enu in IEnumFromStream)
{
    Container = enu;
}
The important thing to remember is that IEnumerables do not have a length; in fact they can be infinitely long. There is a whole field of computer science on detecting an infinitely long IEnumerable.
Based on the code you posted I think the problem is with your use of the Stopwatch class.
You declare two of these, SwFr and SwFe, but only use the former. Because of this, the last call to SwFr.Elapsed will get the total amount of time across both sets of for loops.
If you are wanting to reuse that object in this way, place a call to SwFr.Reset() right after resultFrLst = SwFr.Elapsed.ToString();.
Alternatively, you could use SwFe when running the second test.
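Putting the advice from both answers together, a fairer measurement could look roughly like the sketch below. It is only an illustration: IterationTiming, Measure, WalkList and WalkSequence are names invented here, and a serious benchmark would also need warm-up runs and repeated measurements.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class IterationTiming
{
    // Runs the action `rounds` times and returns the time of this measurement only.
    public static TimeSpan Measure(int rounds, Action action)
    {
        var sw = Stopwatch.StartNew();   // fresh stopwatch, nothing carries over
        for (int r = 0; r < rounds; r++)
            action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Walks a list by index, reading the count once, and sums the string lengths.
    public static int WalkList(List<string> list)
    {
        int totalLength = 0;
        int count = list.Count;          // hoisted out of the loop condition
        for (int i = 0; i < count; i++)
            totalLength += list[i].Length;
        return totalLength;
    }

    // Walks any sequence with foreach and sums the string lengths.
    public static int WalkSequence(IEnumerable<string> sequence)
    {
        int totalLength = 0;
        foreach (string item in sequence)
            totalLength += item.Length;
        return totalLength;
    }
}
```

Each test then gets its own timing, for example IterationTiming.Measure(100, () => IterationTiming.WalkList(list)) and IterationTiming.Measure(100, () => IterationTiming.WalkSequence(list)).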

get next available integer using LINQ

Say I have a list of integers:
List<int> myInts = new List<int>() {1,2,3,5,8,13,21};
I would like to get the next available integer, ordered by increasing integer. Not the last or highest one, but in this case the next integer that is not in this list. In this case the number is 4.
Is there a LINQ statement that would give me this? As in:
var nextAvailable = myInts.SomeCoolLinqMethod();
Edit: Crap. I said the answer should be 2 but I meant 4. I apologize for that!
For example: Imagine that you are responsible for handing out process IDs. You want to get the list of current process IDs, and issue a next one, but the next one should not just be the highest value plus one. Rather, it should be the next one available from an ordered list of process IDs. You could get the next available starting with the highest, it does not really matter.
I see a lot of answers that write a custom extension method, but it is possible to solve this problem with the standard linq extension methods and the static Enumerable class:
List<int> myInts = new List<int>() {1,2,3,5,8,13,21};
// This will set firstAvailable to 4.
int firstAvailable = Enumerable.Range(1, Int32.MaxValue).Except(myInts).First();
The answer provided by @Kevin has an undesirable performance profile. The logic will access the source sequence numerous times: once for the .Count call, once for the .FirstOrDefault call, and once for each .Contains call. If the IEnumerable<int> instance is a deferred sequence, such as the result of a .Select call, this will cause at least 2 calculations of the sequence, along with once for each number. Even if you pass a list to the method, it will potentially go through the entire list for each checked number. Imagine running it on the sequence { 1, 1000000 } and you can see how it would not perform well.
LINQ strives to iterate source sequences no more than once. This is possible in general and can have a big impact on the performance of your code. Below is an extension method which will iterate the sequence exactly once. It does so by looking for the difference between each successive pair, then adds 1 to the first lower number which is more than 1 away from the next number:
public static int? FirstMissing(this IEnumerable<int> numbers)
{
    int? priorNumber = null;
    foreach(var number in numbers.OrderBy(n => n))
    {
        var difference = number - priorNumber;
        if(difference != null && difference > 1)
        {
            return priorNumber + 1;
        }
        priorNumber = number;
    }
    return priorNumber == null ? (int?) null : priorNumber + 1;
}
Since this extension method can be called on any arbitrary sequence of integers, we make sure to order them before we iterate. We then calculate the difference between the current number and the prior number. If this is the first number in the list, priorNumber will be null and thus difference will be null. If this is not the first number in the list, we check to see if the difference from the prior number is greater than 1. If so, we know there is a gap and we can add 1 to the prior number.
You can adjust the return statement to handle sequences with 0 or 1 items as you see fit; I chose to return null for empty sequences and n + 1 for the sequence { n }.
This will be fairly efficient:
static int Next(this IEnumerable<int> source)
{
    int? last = null;
    foreach (var next in source.OrderBy(_ => _))
    {
        if (last.HasValue && last.Value + 1 != next)
        {
            return last.Value + 1;
        }
        last = next;
    }
    return last.HasValue ? last.Value + 1 : Int32.MaxValue;
}
public static class IntExtensions
{
    public static int? SomeCoolLinqMethod(this IEnumerable<int> ints)
    {
        int counter = ints.Count() > 0 ? ints.First() : -1;
        while (counter < int.MaxValue)
        {
            if (!ints.Contains(++counter)) return counter;
        }
        return null;
    }
}
Usage:
var nextAvailable = myInts.SomeCoolLinqMethod();
Ok, here is the solution that I came up with that works for me.
var nextAvailableInteger = Enumerable.Range(myInts.Min(),myInts.Max()).FirstOrDefault( r=> !myInts.Contains(r));
If anyone has a more elegant solution I would be happy to accept that one. But for now, this is what I'm putting in my code and moving on.
Edit: this is what I implemented after Kevin's suggestion to add an extension method. And that was the real answer - that no single LINQ extension would do so it makes more sense to add my own. That is really what I was looking for.
public static int NextAvailableInteger(this IEnumerable<int> ints)
{
    return NextAvailableInteger(ints, 1); // by default we use one
}
public static int NextAvailableInteger(this IEnumerable<int> ints, int defaultValue)
{
    if (ints == null || ints.Count() == 0) return defaultValue;
    var ordered = ints.OrderBy(v => v);
    int counter = ints.Min();
    int max = ints.Max();
    while (counter < max)
    {
        if (!ordered.Contains(++counter)) return counter;
    }
    return (++counter);
}
Not sure if this qualifies as a cool Linq method, but using the left outer join idea from This SO Answer
var thelist = new List<int> {1,2,3,4,5,100,101};
var nextAvailable = (from curr in thelist
join next in thelist
on curr + 1 equals next into g
from newlist in g.DefaultIfEmpty()
where !g.Any ()
orderby curr
select curr + 1).First();
This puts the processing on the sql server side if you're using Linq to Sql, and allows you to not have to pull the ID lists from the server to memory.
var nextAvailable = myInts.Prepend(0).TakeWhile((x,i) => x == i).Last() + 1;
It is 7 years later, but there are better ways of doing this than the selected answer or the answer with the most votes.
The list is already in order, and based on the example 0 doesn't count. We can just prepend 0 and check if each item matches its index. TakeWhile will stop evaluating once it hits a number that doesn't match, or at the end of the list.
The answer is the last item that matches, plus 1.
TakeWhile is more efficient than enumerating all the possible numbers and then excluding the existing numbers using Except, because TakeWhile only goes through the list until it finds the first available number, and the resulting sequence has at most n + 1 items (the list plus the prepended 0).
The answer using Except has to load every existing number into a set before it can return anything, even though only the first result is needed. Except is evaluated lazily, so First() stops at the first missing number, but it is still slower and more memory intensive than TakeWhile.

Are queries faster than a for loop when trying to find a nonexistent value in a certain range?

I have a method which should consider a collection of instances of a class and find the first positive number not present as an attribute in those instances.
Here is my situation: I have a class called GestorePersonale (an employee manager class) which administers a List of instances of Dipendente (an employee class). Each Dipendente has an ID which has to be unique among all of the other instances of Dipendente present in the List.
When creating a new Dipendente I have to find a unique ID to assign to it.
For this task, I first find out the highest ID (Matricola) among all of the instances in the list and then cycle through all of the numbers from 0 to that ID to try to find a gap ID to use for the new Dipendente. If all else fails, I'll just assign an ID corresponding to max + 1.
Here is the method MatricolaMax() which is in charge of returning the highest ID between those of all of the instances in the List (I'm posting this code just for clarity, it is not the part the question focuses on, even though any suggestion for performance improvement would be highly appreciated here as well):
private uint MatricolaMax ()
{
// Looking for the highest ID
return dipendenti.OrderByDescending( dipendente => dipendente.Matricola ).First().Matricola;
}
and here is the method this question's title refers to:
private uint MatricolaLibera ()
{
    var max = MatricolaMax();
    for ( uint i = 0; i < max; i++ )
    {
        var conto = dipendenti.Where( dipendente => dipendente.Matricola == i ).Count();
        if ( conto == 0 )
            return i;
    }
    return max + 1;
}
As you can see in the code above, to find a gap ID I'm using a Where query to check whether a Dipendente instance with a Matricola (ID) corresponding to i exists.
If I were to do this using a for loop instead of the query, this would be the code I'd write:
private uint MatricolaLibera ()
{
    var max = MatricolaMax();
    bool found;
    for ( uint i = 0; i < max; i++ )
    {
        found = false;
        for( int j = 0; j < dipendenti.Count; j++)
            if ( dipendenti[j].Matricola == i )
            {
                found = true;
                break;
            }
        if ( !found )
            return i;
    }
    return max + 1;
}
basically adding an inner for loop and a bool check to see if a free ID was found.
My question to you is the following:
Which of the two methods presented (query vs. inner for loop) performs the best? Does an even better solution exist?
Of course as Eric says, you should run the code to answer your question of which performs better. (But it should be run on sizes similar to what you will see in real use, not just small sizes.)
Some things I'll suggest:
Your MatricolaMax method is sorting the list to find the highest value. Sorting is at minimum O(n*logn), whereas simply enumerating the list and comparing the values would be O(n).
Both MatricolaLibera functions are going through the entire collection for each possible value. This is O(n*m). I would suggest going through the list once, putting each key in a dictionary, then enumerating the candidate values to find the first one that is not in the dictionary. This should be much faster.
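The single-pass suggestion could be sketched like this. Note that it swaps the dictionary for a HashSet<uint>, since only the keys are needed. The Dipendente class below is a stand-in with just the one property from the question, and MatricolaFinder is a name invented for the example:

```csharp
using System.Collections.Generic;

// Stand-in for the employee class from the question (assumed shape).
public class Dipendente
{
    public uint Matricola { get; set; }
}

public static class MatricolaFinder
{
    // Returns the lowest ID not used by any employee.
    public static uint MatricolaLibera(IEnumerable<Dipendente> dipendenti)
    {
        // One pass over the list: remember every ID that is taken.
        var used = new HashSet<uint>();
        foreach (var dipendente in dipendenti)
            used.Add(dipendente.Matricola);

        // With n employees, at least one of the values 0..n is free,
        // so this loop runs at most n + 1 times.
        uint candidate = 0;
        while (used.Contains(candidate))
            candidate++;
        return candidate;
    }
}
```

Unlike the original methods, this sketch needs no separate MatricolaMax call and returns 0 for an empty list instead of throwing.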
Eric Lippert has the best answer to your main question, but to your "is there a better way question", here is an answer.
You can use an integer to hold the highest option found, as well as a bool array to hold used values. Assuming you have a good upper bound, you can do this with a single array; if you do not, you will need to handle growing the array as needed. Then you can simply enumerate your list, marking used values and updating max if needed. If you have no idea of an upper bound, or it can be arbitrarily high, then use a different algorithm.
However at this point you are looking at some very non-trivial code (that can perform horribly if an upper bound is not known), so if your existing solutions work, use them.
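For what it is worth, when the goal is only the lowest free ID, the number of items itself can serve as the upper bound: with n IDs in use, the first free value is at most n. Under that assumption the bool array idea shrinks to the sketch below (GapFinder and FirstFreeId are hypothetical names, and the IDs are passed as plain uint values to keep the example short):

```csharp
using System.Collections.Generic;

public static class GapFinder
{
    // Returns the lowest value not present in ids, using a bool array of flags.
    public static uint FirstFreeId(IReadOnlyList<uint> ids)
    {
        // n + 1 flags are enough: IDs larger than n cannot affect the answer.
        var used = new bool[ids.Count + 1];
        for (int i = 0; i < ids.Count; i++)
        {
            if (ids[i] < (uint)used.Length)
                used[ids[i]] = true;
        }

        // At most n flags are set, so a free slot always exists within the array.
        uint candidate = 0;
        while (candidate < (uint)used.Length && used[candidate])
            candidate++;
        return candidate;
    }
}
```

This trades the hashing of a set for a flat array of n + 1 flags, which only works because values above n can safely be ignored.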
