Is there a classy way to do:
foreach(item i in ListItems1)
do ...
foreach(item i in ListItems2)
do ...
foreach(item i in ListItems3)
do ...
...
In a single foreach (using LINQ, I suppose?) in C#, without hurting performance (especially on the memory side)?
The best way to manage this is to factor it into a function that you call 3 times:
public void ProcessList(List<myListType> theList)
{
//Do some cool stuff here...
}
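Called three times, a minimal usage sketch might look like this (assuming all three lists are List<myListType>, as in the method above):
ProcessList(ListItems1);
ProcessList(ListItems2);
ProcessList(ListItems3);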
But it's borderline whether you get much maintainability benefit with this. If you don't want to increase memory usage then this is probably the best refactor to make your code better. Unless your lists are huge the memory differences are likely to be negligible anyway.
You can always do
foreach (item i in ListItems1.Concat(ListItems2).Concat(ListItems3))
{
// do things
}
There's also the similar .Union() that removes duplicate items. However, since it is removing duplicates, it's less performant.
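To make the difference concrete, here is a small self-contained sketch (hypothetical int lists, not from the question) showing Concat keeping duplicates while Union removes them:
using System;
using System.Collections.Generic;
using System.Linq;

class ConcatVsUnion
{
    static void Main()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 3, 4 };

        // Concat keeps every element from both lists.
        Console.WriteLine(string.Join(",", a.Concat(b))); // 1,2,3,3,4

        // Union removes duplicates, which costs extra work.
        Console.WriteLine(string.Join(",", a.Union(b)));  // 1,2,3,4
    }
}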
For the sake of answering your question and being a little creative, this here "technically" would work, but I wouldn't advise using it.
int listItemTotalItemCount = ListItems1.Count() + ListItems2.Count() + ListItems3.Count();
for(int i = 0; i < listItemTotalItemCount; i++)
{
int listIndex = i;
item retrievedItem;
if(listIndex >= ListItems1.Count())
{
if (i - ListItems1.Count() >= ListItems2.Count())
{
retrievedItem = ListItems3[i - ListItems1.Count() - ListItems2.Count()];
}
else
{
retrievedItem = ListItems2[i - ListItems1.Count()];
}
}
else
{
retrievedItem = ListItems1[listIndex];
}
retrievedItem.DoSomethingSpecial();
}
This could probably also be improved with an array of the lists you want to iterate over, but as I said before, I would not recommend this approach; I'm guessing it is significantly worse than your original method.
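For what it's worth, a sketch of that "array of lists" variant could use SelectMany to flatten them (this assumes the three lists share the question's item type and its DoSomethingSpecial method):
var allLists = new[] { ListItems1, ListItems2, ListItems3 };
foreach (var i in allLists.SelectMany(list => list))
{
    i.DoSomethingSpecial();
}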
I have 2 methods that can do the work for me: one is serial and the other is parallel.
The reason for parallelizing is that there are lots of iterations (about 100,000 or so).
For some reason, the parallel one skips some iterations or does some twice, and I don't have a clue how to debug it.
The serial method
for(int i = somenum; i >= 0; i-- ){
foreach (var nue in nuelist)
{
foreach (var path in nue.pathlist)
{
foreach (var conn in nue.connlist)
{
Func(conn,path);
}
}
}
}
The parallel method
for(int i = somenum; i >= 0; i-- ){
Parallel.ForEach(nuelist,nue =>
{
Parallel.ForEach(nue.pathlist,path=>
{
Parallel.ForEach(nue.connlist, conn=>
{
Func(conn,path);
});
});
});
}
Inside Path class
Nue firstnue;
public void Func(Conn conn,Path path)
{
List<Conn> list = new(){conn};
list.AddRange(path.list);
_ = new Path(list);
}
public Path(List<Conn> list)
{
//other things
firstnue.pathlist.Add(this);
/*
firstnue is another nue that will be
in the next iteration of for loop
*/
}
They are the same method except, of course, for the foreach versus Parallel.ForEach loops.
The code is from here (GitHub page).
List<T>, which I assume you use for firstnue.pathlist, isn't thread-safe. That means that when you add/remove items in the same List<T> from multiple threads at the same time, your data will get corrupted. To avoid that problem, the simplest solution is to use a lock, so that multiple threads don't try to modify the list at once.
However, a lock essentially serializes the list operations, and if the only thing you do in Func is to change a list, you may not gain much by parallelizing the code. But, if you still want to give it a try, you just need to change this:
firstnue.pathlist.Add(this);
to this:
lock (firstnue.pathlist)
{
firstnue.pathlist.Add(this);
}
Thanks to sedat-kapanoglu, I found the problem is really about thread safety. The solution was to change every List<T> to ConcurrentBag<T>.
For everyone who, like me, runs into "parallel not working with collections": the solution is to switch from System.Collections.Generic to System.Collections.Concurrent.
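To illustrate why the switch helps, here is a minimal self-contained sketch (not the project's code) showing ConcurrentBag<T> safely absorbing concurrent Adds that would corrupt a List<T>:
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class ConcurrentAddDemo
{
    static void Main()
    {
        // ConcurrentBag<T>.Add is safe to call from many threads at once;
        // doing the same with List<T>.Add loses items or corrupts the list.
        var bag = new ConcurrentBag<int>();
        Parallel.ForEach(Enumerable.Range(0, 100_000), i => bag.Add(i));
        Console.WriteLine(bag.Count); // always 100000
    }
}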
Let's assume you have a function that returns a lazily-enumerated object:
struct AnimalCount
{
    public int Chickens;
    public int Goats;
}
IEnumerable<AnimalCount> FarmsInEachPen()
{
....
yield return new AnimalCount(x, y);
....
}
You also have two functions that consume two separate IEnumerables, for example:
ConsumeChicken(IEnumerable<int>);
ConsumeGoat(IEnumerable<int>);
How can you call ConsumeChicken and ConsumeGoat without a) converting FarmsInEachPen() with ToList() beforehand, because it might have two zillion records, and b) using multi-threading?
Basically:
ConsumeChicken(FarmsInEachPen().Select(x => x.Chickens));
ConsumeGoats(FarmsInEachPen().Select(x => x.Goats));
But without forcing the double enumeration.
I can solve it with multithread, but it gets unnecessarily complicated with a buffer queue for each list.
So I'm looking for a way to split the AnimalCount enumerator into two int enumerators without fully evaluating AnimalCount. There is no problem running ConsumeGoat and ConsumeChicken together in lock-step.
I can feel the solution just out of my grasp, but I'm not quite there. I'm thinking along the lines of a helper function that returns an IEnumerable that is fed into ConsumeChicken, and each time the iterator is used, it internally calls ConsumeGoat, thus executing the two functions in lock-step. Except, of course, I don't want to call ConsumeGoat more than once.
I don't think there is a way to do what you want, since ConsumeChickens(IEnumerable<int>) and ConsumeGoats(IEnumerable<int>) are being called sequentially, each of them enumerating a list separately - how do you expect that to work without two separate enumerations of the list?
Depending on the situation, a better solution is to have ConsumeChicken(int) and ConsumeGoat(int) methods (which each consume a single item), and call them in alternation. Like this:
foreach(var animal in animals)
{
ConsumeChicken(animal.Chickens);
ConsumeGoat(animal.Goats);
}
This will enumerate the animals collection only once.
Also, a note: depending on your LINQ-provider and what exactly it is you're trying to do, there may be better options. For example, if you're trying to get the total sum of both chickens and goats from a database using linq-to-sql or linq-to-entities, the following query..
from a in animals
group a by 0 into g
select new
{
TotalChickens = g.Sum(x => x.Chickens),
TotalGoats = g.Sum(x => x.Goats)
}
will result in a single query, and do the summation on the database-end, which is greatly preferable to pulling the entire table over and doing the summation on the client end.
The way you have posed your problem, there is no way to do this. IEnumerable<T> is a pull enumerable - that is, you can GetEnumerator to the front of the sequence and then repeatedly ask "Give me the next item" (MoveNext/Current). You can't, on one thread, have two different things pulling from the animals.Select(a => a.Chickens) and animals.Select(a => a.Goats) at the same time. You would have to do one then the other (which would require materializing the second).
The suggestion BlueRaja made is one way to change the problem slightly. I would suggest going that route.
The other alternative is to utilize IObservable<T> from Microsoft's reactive extensions (Rx), a push enumerable. I won't go into the details of how you would do that, but it's something you could look into.
Edit:
The above is assuming that ConsumeChickens and ConsumeGoats are both returning void or are at least not returning IEnumerable<T> themselves - which seems like an obvious assumption. I'd appreciate it if the lame downvoter would actually comment.
Actually, the simplest way to achieve what you want is to convert the FarmsInEachPen return value into a push collection (an IObservable) and use Reactive Extensions to work with it:
var observable = new Subject<AnimalCount>();
observable.Subscribe(x => DoSomethingWithChicken(x.Chickens));
observable.Subscribe(x => DoSomethingWithGoat(x.Goats));
foreach (var item in FarmsInEachPen())
{
    observable.OnNext(item);
}
I figured it out, thanks in large part to the path that @Lee put me on.
You need to share a single enumerator between the two zips, and use an adapter function to project the correct element into the sequence.
private static IEnumerable<object> ConsumeChickens(IEnumerable<int> xList)
{
foreach (var x in xList)
{
Console.WriteLine("X: " + x);
yield return null;
}
}
private static IEnumerable<object> ConsumeGoats(IEnumerable<int> yList)
{
foreach (var y in yList)
{
Console.WriteLine("Y: " + y);
yield return null;
}
}
private static IEnumerable<int> SelectHelper(IEnumerator<AnimalCount> enumerator, int i)
{
bool c = i != 0 || enumerator.MoveNext();
while (c)
{
if (i == 0)
{
yield return enumerator.Current.Chickens;
c = enumerator.MoveNext();
}
else
{
yield return enumerator.Current.Goats;
}
}
}
private static void Main(string[] args)
{
var enumerator = GetAnimals().GetEnumerator();
var chickensList = ConsumeChickens(SelectHelper(enumerator, 0));
var goatsList = ConsumeGoats(SelectHelper(enumerator, 1));
var temp = chickensList.Zip(goatsList, (i, i1) => (object) null);
temp.ToList();
Console.WriteLine("Total iterations: " + iterations);
}
JavaScript supports a goto-like syntax (a labeled break) for breaking out of nested loops. It's not a great idea in general, but it's considered acceptable practice. C# does not directly support the break labelName syntax...but it does support the infamous goto.
I believe the equivalent can be achieved in C#:
int i = 0;
while(i <= 10)
{
Debug.WriteLine(i);
i++;
for(int j = 0; j < 3; j++)
if (i > 5)
{
goto Break;//break out of all loops
}
}
Break:
By the same logic as JavaScript, is the nested-loop scenario an acceptable use of goto? Otherwise, the only way I am aware of to achieve this functionality is by setting a bool with appropriate scope.
My opinion: complex code flows with nested loops are hard to reason about; branching around, whether it is with goto or break, just makes it harder. Rather than writing the goto, I would first think really hard about whether there is a way to eliminate the nested loops.
A couple of useful techniques:
First technique: Refactor the inner loop to a method. Have the method return whether or not to break out of the outer loop. So:
for(outer blah blah blah)
{
for(inner blah blah blah)
{
if (whatever)
{
goto leaveloop;
}
}
}
leaveloop:
...
becomes
for(outer blah blah blah)
{
if (Inner(blah blah blah))
break;
}
...
bool Inner(blah blah blah)
{
for(inner blah blah blah)
{
if (whatever)
{
return true;
}
}
return false;
}
Second technique: if the loops do not have side effects, use LINQ.
// fulfill the first unfulfilled order over $100
foreach(var customer in customers)
{
foreach(var order in customer.Orders)
{
if (!order.Filled && order.Total >= 100.00m)
{
Fill(order);
goto leaveloop;
}
}
}
leaveloop:
instead, write:
var orders = from customer in customers
             from order in customer.Orders
             where !order.Filled
             where order.Total >= 100.00m
             select order;
var orderToFill = orders.FirstOrDefault();
if (orderToFill != null) Fill(orderToFill);
No loops, so no breaking out required.
Alternatively, as configurator points out in a comment, you could write the code in this form:
var orderToFill = customers
.SelectMany(customer=>customer.Orders)
.Where(order=>!order.Filled)
.Where(order=>order.Total >= 100.00m)
.FirstOrDefault();
if (orderToFill != null) Fill(orderToFill);
The moral of the story: loops emphasize control flow at the expense of business logic. Rather than trying to pile more and more complex control flow on top of each other, try refactoring the code so that the business logic is clear.
I would personally try to avoid using goto here by simply putting the loop into a different method - while you can't easily break out of a particular level of loop, you can easily return from a method at any point.
In my experience this approach has usually led to simpler and more readable code with shorter methods (doing one particular job) in general.
Let's get one thing straight: there is nothing fundamentally wrong with using the goto statement, it isn't evil - it is just one more tool in the toolbox. It is how you use it that really matters, and it is easily misused.
Breaking out of a nested loop of some description can be a valid use of the statement, although you should first look to see if it can be redesigned. Can your loop exit expressions be rewritten? Are you using the appropriate type of loop? Can you filter the list of data you may be iterating over so that you don't need to exit early? Should you refactor some loop code into a separate function?
IMO it is acceptable in languages that do not support break n;, where n specifies the number of loops it should break out of.
At least it's much more readable than setting a variable that is then checked in the outer loop.
I believe the 'goto' is acceptable in this situation. C# does not support any nifty ways to break out of nested loops unfortunately.
It's a bit of an unacceptable practice in C#. If there's no way your design can avoid it, well, you've got to use it. But exhaust all other alternatives first; it will make for better readability and maintainability. For your example, I've crafted one such potential refactoring:
void Original()
{
int i = 0;
while(i <= 10)
{
Debug.WriteLine(i);
i++;
if (Process(i))
{
break;
}
}
}
bool Process(int i)
{
for(int j = 0; j < 3; j++)
if (i > 5)
{
return true;
}
return false;
}
I recommend using continue if you want to skip just that one item, and break if you want to exit the loop. For deeper nesting, put it in a method and use return. I would personally rather use a status bool than a goto; use goto only as a last resort.
Anonymous functions
You can almost always bust the inner loop out into an anonymous function or lambda. Here you can see that the function used to be an inner loop where I would have had to use goto.
private void CopyFormPropertiesAndValues()
{
MergeOperationsContext context = new MergeOperationsContext() { GroupRoot = _groupRoot, FormMerged = MergedItem };
// set up filter functions caller
Func<string, string, bool> CheckFilters = (key, value) =>
{
foreach (var FieldFilter in MergeOperationsFieldFilters)
{
if (!FieldFilter(key, value, context))
return false;
}
return true;
};
// Copy values from form to FormMerged
foreach (var key in _form.ValueList.Keys)
{
var MyValue = _form.ValueList[key];
if (CheckFilters(key, MyValue))
    MergedItem.ValueList[key] = MyValue;
}
}
This often occurs when manually searching for multiple items in a dataset, as well. Sad to say, proper use of goto is better than booleans/flags from a clarity standpoint, but this is clearer than either and avoids the taunts of your co-workers.
For high-performance situations a goto would be fitting, but only by about 1%, let's be honest here...
int i = 0;
while(i <= 10)
{
Debug.WriteLine(i);
i++;
for(int j = 0; j < 3 && i <= 5; j++)
{
//Whatever you want to do
}
}
Unacceptable in C#.
Just wrap the loop in a function and use return.
EDIT: On SO, downvoting is meant for incorrect answers, not for answers you disagree with. As the OP explicitly asked "is it acceptable?", answering "unacceptable" is not incorrect (although you might disagree).
For removing an object where a property equals a value, which is faster?
foreach(object o in objects)
{
if(o.name == "John Smith")
{
objects.Remove(o);
break;
}
}
or
objects.RemoveAll(o => o.Name == "John Smith");
Thanks!
EDIT:
I should have mentioned that this removes one object from the collection and then breaks out of the loop, which prevents the errors you have described, although using a for loop with the count is the better option!
If you really want to know if one thing is faster than another, benchmark it. In other words, measure, don't guess! This is probably my favorite mantra.
As well as the fact that you're breaking the rules in the first one (modifying the list during the processing of it, leading me to invoke my second mantra: You can't get any more unoptimised than "wrong"), the second is more readable and that's usually what I aim for first.
And, just to complete my unholy trinity of mantras: Optimise for readability first, then optimise for speed only where necessary :-)
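In that spirit, a rough measurement sketch (not a rigorous benchmark; the list size and names here are made up) could look like this:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class RemoveBenchmark
{
    static void Main()
    {
        var source = Enumerable.Range(0, 10_000)
                               .Select(i => i == 5_000 ? "John Smith" : "Someone Else")
                               .ToList();

        // Time RemoveAll on one copy of the data.
        var listA = new List<string>(source);
        var sw = Stopwatch.StartNew();
        listA.RemoveAll(name => name == "John Smith");
        sw.Stop();
        Console.WriteLine($"RemoveAll:   {sw.ElapsedTicks} ticks");

        // Time the manual loop (remove the first match, then break) on another copy.
        var listB = new List<string>(source);
        sw.Restart();
        for (int i = 0; i < listB.Count; i++)
        {
            if (listB[i] == "John Smith")
            {
                listB.RemoveAt(i);
                break;
            }
        }
        sw.Stop();
        Console.WriteLine($"Manual loop: {sw.ElapsedTicks} ticks");
    }
}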
From a List<string> of 10,000 items, the speeds are:
for loop: 110,000 ticks
lambda: 1,000 ticks
From this information, we can conclude that the lambda expression is faster.
The source code I used can be found here.
Note that I substituted your foreach with a for loop, since we can't modify the collection while enumerating it with foreach.
Assuming you meant something like
for(int i = 0; i < objects.Count; i++)
{
if(objects[i].name == "John Smith")
{
objects.Remove(objects[i--]);
}
}
RemoveAll would be faster in this case, because with Remove you iterate over the list again (IndexOf) even though you already have the position.
Here is List<T>.Remove:
public bool Remove(T item)
{
int index = this.IndexOf(item);
if (index >= 0x0)
{
this.RemoveAt(index);
return true;
}
return false;
}
At the moment I do it like this:
lock (LockObj)
{
foreach (var o in Oo)
{
var val = o.DateActive;
if (val.AddSeconds(30) < DateTime.Now) Oo.Remove(o);
}
}
and I get this error:
Collection was modified; enumeration operation may not execute
How should this be done?
You have to use a regular for loop.
for (int i = 0; i < Oo.Count; ++i)
{
    var val = Oo[i].DateActive;
    if (val.AddSeconds(30) < DateTime.Now)
{
Oo.RemoveAt(i);
i--; // since we just removed an element
}
}
The reason you cannot modify a collection inside a foreach loop is that foreach uses an enumerator over the collection, and that enumerator is invalidated as soon as the collection changes.
You can't modify a collection you are enumerating.
To change it, enumerate a copy of it:
foreach (var k in Oo.ToList())
    .....
or
use the count and iterate the collection by index:
for (int i = 0; i < Oo.Count(); i++)
    .....
You simply cannot modify the collection while you are iterating over it with foreach. You have two options: loop with for instead of foreach, or create another collection and modify that.
This problem is completely unrelated to locking.
If you add/remove elements from a List all iterators pointing to that list become invalid.
One alternative to using an iterator is manually working with indices. Then you can iterate backwards and remove elements with RemoveAt.
for(int i=Oo.Count-1;i>=0;i--)
{
var o=Oo[i];
if (o.DateActive.AddSeconds(30)<DateTime.Now)
Oo.RemoveAt(i);
}
Unfortunately this naive implementation is O(n^2). If you write it in a more complex way where you first move the surviving elements to their new positions and then truncate the list, it becomes O(n).
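A sketch of that O(n) "compact then truncate" idea (Item is a hypothetical stand-in for whatever Oo contains):
using System;
using System.Collections.Generic;

class Item
{
    public DateTime DateActive { get; set; }
}

static class ListCompaction
{
    // Copy the surviving elements forward, then cut the tail off in one RemoveRange call.
    public static void RemoveExpired(List<Item> items, TimeSpan maxAge)
    {
        DateTime cutoff = DateTime.Now - maxAge;
        int write = 0;
        for (int read = 0; read < items.Count; read++)
        {
            if (items[read].DateActive >= cutoff)
            {
                items[write++] = items[read];
            }
        }
        items.RemoveRange(write, items.Count - write);
    }
}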
But if Oo is a List<T>, there is a much better solution: you can use Oo.RemoveAll(o => o.DateActive.AddSeconds(30) < DateTime.Now). Unfortunately, there is no such method on IList<T> by default.
I'd write the code like this:
lock (LockObj)
{
DateTime deleteTime=DateTime.Now.AddSeconds(-30);
Oo.RemoveAll(o=>o.DateActive<deleteTime);
}
As a sidenote I'd personally use UTC times instead of local times for such code.
class Program
{
static void Main(string[] args)
{
List<OOItem> oo = new List<OOItem>();
oo.Add( new OOItem() { DateActive = DateTime.Now.AddSeconds(-31) });
lock(LockObj)
{
foreach( var item in oo.Where( ooItem => ooItem.DateActive.AddSeconds(30) < DateTime.Now ).ToArray())
{
oo.Remove(item);
}
}
Debug.Assert( oo.Count == 0);
}
}
public class OOItem
{
public DateTime DateActive { get; set; }
}
I'm going to suggest an approach that avoids messing around with decrementing loop indexes and other stuff that makes code difficult to understand.
I think the best bet is to write a nice query and then do a foreach over the result of turning the query into an array:
var inactives = from o in Oo
where o.DateActive < DateTime.Now
select o;
foreach (var o in inactives.ToArray())
{
Oo.Remove(o);
}
This avoids the issue of the collection changing and makes the code quite a bit more readable.
If you're a little more "functionally" oriented then here's another choice:
(from o in Oo
where o.DateActive < DateTime.Now
select o)
.ToList()
.ForEach(o => Oo.Remove(o));
Enjoy!
The problem is not related to the lock.
Use a for() loop instead of foreach().
I can't 100% replace your code because your code provides no hint of what collection type "Oo" is. Neither does the name "Oo". Perhaps one of the evils of var keyword overuse? Or maybe I just can't see enough of your code ;)
int size = Oo.Count;
for (int i = 0; i < size; i++)
{
    if (Oo[i].DateActive.AddSeconds(30) < DateTime.Now)
    {
        Oo.RemoveAt(i);
        i--;    // stay at the same index, since the next element shifted down
        size--; // compensate for the new size after removal
    }
}
You can use Parallel.ForEach(Oo, val => { Oo.Remove(val); });
Parallel.ForEach doesn't have the IEnumerator problem!