Instantiation in a loop: Verbosity vs. Performance - c#

So in the following bit of code, I really like the verbosity of scenario one, but I would like to know how big of a performance hit this takes when compared to scenario two. Is instantiation in the loop a big deal?
Is the syntactic benefit (which I like, some might not even agree with it being verbose or optimal) worth said performance hit? You can assume that the collections remain reasonably small (N < a few hundred).
// First scenario
var productCategoryModels = new List<ProductCategoryModel>();
foreach (var productCategory in productCategories)
{
var model = new ProductCategoryModel.ProductCategoryModelConverter(currentContext).Convert(productCategory);
productCategoryModels.Add(model);
}
// Second scenario
var productCategoryModels = new List<ProductCategoryModel>();
var modelConvert = new ProductCategoryModel.ProductCategoryModelConverter(currentContext);
foreach (var productCategory in productCategories)
{
var model = modelConvert.Convert(productCategory);
productCategoryModels.Add(model);
}
Would love to hear your guys' thoughts on this as I see this quite often.

I would approach this questions a bit differently. If whatever happens in new ProductCategoryModel.ProductCategoryModelConverter(currentContext) doesn't change during the loop, I see no reason to include it within the loop. If it is not part of the loop, it shouldn't be in there imo.
If you include it just because it looks better, you force the reader to figure out if it makes a difference or not.

Like Brian, I see no reason to create a new instance if you're not actually changing it - I'm assuming that Convert doesn't change the original object?
I'd recommend using LINQ to avoid the loop entirely though (in terms of your own code):
var modelConverter = new ProductCategoryModelConverter(currentContext);
var models = productCategories.Select(x => modelConverter.Convert(x))
.ToList();
In terms of performance, it would depend on what ProductCategoryModelConverter's constructor had to do. Just creating a new object is pretty cheap, in terms of overhead. It's not free, of course - but I wouldn't expect this to cause a bottleneck in most cases. However, I'd say that instantiating it in a loop would imply to the reader that it's necessary; that there was some reason not to use just a single object. A converter certainly sounds like something which will stay immutable as it does its work... so I'd be puzzled by the instantiate-in-a-loop version.

Keep whichever of the formats you're happiest with - if they both perform well enough for your application. Optimize later if you encounter performance problems.

Related

Is it bad practice to purposely rely on Linq Side Effects?

A programming pattern like this comes up every so often:
int staleCount = 0;
fileUpdatesGridView.DataSource = MultiMerger.TargetIds
.Select(id =>
{
FileDatabaseMerger merger = MultiMerger.GetMerger(id);
if (merger.TargetIsStale)
staleCount++;
return new
{
Id = id,
IsStale = merger.TargetIsStale,
// ...
};
})
.ToList();
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = staleCount > 0;
I'm not sure there is a more succinct way to code this?
Even if so, is it bad practice to do this?
No, it is not strictly "bad practice" (like constructing SQL queries with string concatenation of user input or using goto).
Sometimes such code is more readable than several queries/foreach or no-side-effect Aggregate call. Also it is good idea to at least try to write foreach and no-side-effect versions to see which one is more readable/easier to prove correctness.
Please note that:
it is frequently very hard to reason what/when will happen with such code. I.e. you sample hacks around the fact of LINQ queries executed lazily with .ToList() call, otherwise that value will not be computed.
pure functions can be run in parallel, once with side effects need a lot of care to do so
if you ever need to convert LINQ-to-Object to LINQ-to-SQL you have to rewrite such queries
generally LINQ queries favor functional programming style without side-effects (and hence by convention readers would not expect side-effects in the code).
Why not just code it like this:
var result=MultiMerger.TargetIds
.Select(id =>
{
FileDatabaseMerger merger = MultiMerger.GetMerger(id);
return new
{
Id = id,
IsStale = merger.TargetIsStale,
// ...
};
})
.ToList();
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r=>r.IsStale);
I would consider this a bad practice. You are making the assumption that the lambda expression is being forced to execute because you called ToList. That's an implementation detail of the current version of ToList. What if ToList in .NET 7.x is changed to return an object that semi-lazily converts the IQueryable? What if it's changed to run the lambda in parallel? All of a sudden you have concurrency issues on your staleCount. As far as I know, both of those are possibilities which would break your code because of bad assumptions your code is making.
Now as far as repeatedly calling MultiMerger.GetMerger with a single id, that really should be reworked to be a join as the logic for doing a join (w|c)ould be much more efficient than what you have coded there and would scale a lot better, especially if the implementation of MultiMerger is actually pulling data from a database (or might be changed to do so).
As far as calling ToList() before passing it to the Datasource, if the Datasource doesn't use all the fields in your new object, you would be (much) faster and take less memory to skip the ToList and let the datasource only pull the fields it needs. What you've done is highly couple the data to the exact requirements of the view, which should be avoided where possible. An example would be what if you all of a sudden need to display a field that exists in FileDatabaseMerger, but isn't in your current anonymous object? Now you have to make changes to both the controller and view to add it, where if you just passed in an IQueryable, you would only have to change the view. Again, faster, less memory, more flexible, and more maintainable.
Hope this helps.. And this question really should be posted of code review, not stackoverflow.
Update on further review, the following code would be much better:
var result=MultiMerger.GetMergersByIds(MultiMerger.TargetIds);
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r=>r.TargetIsStale);
or
var result=MultiMerger.GetMergers().Where(m=>MultiMerger.TargetIds.Contains(m.Id));
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r=>r.TargetIsStale);

Efficiently gather a list of objects

List<MyClass> options = new List<MyClass>();
foreach (MyClass entity in ExistingList) {
if (entity.IsCoolEnough) {
options.Add(entity);
}
}
I'm simply curious what the fastest, most efficient way of doing this is. The list isn't very large, but it's run frequently, so I'd like to keep it snappy. I'm not looking for a change in verbosity either. I just want runtime as fast as possible.
Well, using LINQ it reads more intuitively:
var options = (from e in ExistingList where e.IsCoolEnough select e).ToList();
I'm not sure whether it is faster or more efficient, though.
I'd say that for small lists, this is actually some kind of over-optimization, as for short lists, foreach, for and the above approach shouldn't make a difference at all. So instead of optimizing this, first check whether it imposes a runtime-problem at all.
I see two possible cases of "general" optimization, either at db-level or model-level depending on where the resource hog is.
If you can select via a where on IsCoolEnough in db-level:
var result = ExistingList.Where(e => e.IsCoolEnough) // still in db?
.ToList(); // enumerate
If you can't and the expensive operation is IsCoolEnough:
var result = new ConcurrentBag<MyClass>(); // Thread safe but unordered.
Parallel.ForEach(ExistingList, current =>
{
if (current.IsCoolEnough) result.Add(current);
});
It's impossible for me to give more solid advice given the quite unknown requirements. But you do not have any general "faults" in your implementation.
I like referring to this article, so I'm doing that again now. It's worth the read (Ie click somewhere on/in this line/sentence, it's a link).

foreach loop List performance difference

While working on a project I ran into the following piece of code, which raised a performance flag.
foreach (var sample in List.Where(x => !x.Value.Equals("Not Reviewed")))
{
//do other work here
count++;
}
I decided to run a couple of quick tests comparing the original loop to the following loop:
foreach (var sample in List)
{
if (!sample.Value.Equals("Not Reviewed"))
{
//do other work here
count++;
}
}
and threw this loop in too to see what happens:
var tempList = List.Where(x => !x.Value.Equals("Not Reviewed"));
foreach (var sample in tempList)
{
//do other work here
count++;
}
I also populated the original list 3 different ways: 50-50 (so 50% of values where "Not Reviewed" and the rest other), 10-90 and 90-10. These are my results, the first and last loops
are mostly the same but the second one is much faster, especially on 10-90 case. Why exactly? I always thought Lambda had good performance.
EDIT
The count++ is not actually what's inside the loop, I just added that here for demonstration purposes, I guess I should've used "//do something here"
EDIT 2
Results running each one 1000 times:
Basically, there's a small amount of extra indirection - both for the test via a delegate, and for the iterating part. Given just how little work is being done per iteration, that extra indirection is relatively expensive.
That's neither surprising nor worrying, in my view. It's the kind of micro-optimization you can easily perform if you're in the rare situation of it being significant in your real-world application. In my experience it's pretty rare for this sort of loop to be a significant bottleneck in the app. The normal approach should be:
Define performance requirements
Implement the functional requirements in the clearest, simplest way you can
Measure your performance against the requirements
If performance is found wanting, investigate why and only move away from clarity as little as you can, getting the biggest "bang for buck" that you can
Repeat until you're done
Responding to an edit:
The count++ is not actually what's inside the loop, I just added that here for demonstration purposes, I guess I should've used "//do something here"
Well that's the important bit - the more work that is done there, the less significant anything else will be. Just counting is pretty darned fast, so I'd expect to see a large discrepancy. Do any amount of real work, and the difference will be smaller.

Should I fully declare throw-away objects?

If I need to do something like this:
var connection = new Connection(host);
connection.Execute(Commands.Delete);
Is there anything wrong in doing this:
(new Connection(host)).Execute(Commands.Delete);
The first example may be more readable, but the second works better if I need to do this multiple times:
(new Connection(anotherHost)).Execute(Commands.Create);
(new Connection(someOtherHost)).Execute(Commands.Update);
(new Connection(host)).Execute(Commands.Delete);
Does your Connection class implement IDisposable? Then:
using (var connection = new Connection(host))
{
connection.Execute(Commands.Delete);
}
First thing I am a Java person and I havent used C#
But based on your code and knowing similarity with java what I can say is -
If your Connection class maintains a state information then its meaningful to create new object every time. But if it is stateless then its pretty inefficient to create multiple objects. You can create one and re-use the same.
i.e. If you cannot set the 'host' to a connection once created, then both approaches you mentioned should not make any difference.
The more verbose you are, the easier of a time you will have debugging.
The effect on your code readability really depends on how much you're trying to wrap into one line - if you've got a single idea that just takes a lot of different words to express, putting it in one line isn't a big deal in my opinion. But if you're trying to cram multiple ideas into one line you'll lose clarity.
For instance. We'll start with a simple idea that just takes some space to express:
transactionTime = generationTime + retrievalTime + processingTime + loggingTime
And here we have a more complex idea that we happen to express in one line:
logTransactionData(processTransaction(retrieveTransaction(generateTransactionQuery())))
I think that the first example is easier to understand at a glance than the second, even though they're about the same in character length.
So in general: consider how likely it is you'll need to debug the line, as well as your idea-complexity-to-line ratio.
Yes, you're initializing a new object, using it, and loosing it! you cannot reuse it again and it's gonna be there somewhere, until GC collects it! So, there's no harm in storing new initialized objects in a variable, and declaring them in another line makes your code more readable.

How to replace for-loops with a functional statement in C#?

A colleague once said that God is killing a kitten every time I write a for-loop.
When asked how to avoid for-loops, his answer was to use a functional language. However, if you are stuck with a non-functional language, say C#, what techniques are there to avoid for-loops or to get rid of them by refactoring? With lambda expressions and LINQ perhaps? If so, how?
Questions
So the question boils down to:
Why are for-loops bad? Or, in what context are for-loops to avoid and why?
Can you provide C# code examples of how it looks before, i.e. with a loop, and afterwards without a loop?
Functional constructs often express your intent more clearly than for-loops in cases where you operate on some data set and want to transform, filter or aggregate the elements.
Loops are very appropriate when you want to repeatedly execute some action.
For example
int x = array.Sum();
much more clearly expresses your intent than
int x = 0;
for (int i = 0; i < array.Length; i++)
{
x += array[i];
}
Why are for-loops bad? Or, in what
context are for-loops to avoid and
why?
If your colleague has a functional programming, then he's probably already familiar with the basic reasons for avoiding for loops:
Fold / Map / Filter cover most use cases of list traversal, and lend themselves well to function composition. For-loops aren't a good pattern because they aren't composable.
Most of the time, you traverse through a list to fold (aggregate), map, or filter values in a list. These higher order functions already exist in every mainstream functional language, so you rarely see the for-loop idiom used in functional code.
Higher order functions are the bread and butter of function composition, meaning you can easily combine simple function into something more complex.
To give a non-trivial example, consider the following in an imperative language:
let x = someList;
y = []
for x' in x
y.Add(f x')
z = []
for y' in y
z.Add(g y')
In a functional language, we'd write map g (map f x), or we can eliminate the intermediate list using map (f . g) x. Now we can, in principle, eliminate the intermediate list from the imperative version, and that would help a little -- but not much.
The main problem with the imperative version is simply that the for-loops are implementation details. If you want change the function, you change its implementation -- and you end up modifying a lot of code.
Case in point, how would you write map g (filter f x) in imperatively? Well, since you can't reuse your original code which maps and maps, you need to write a new function which filters and maps instead. And if you have 50 ways to map and 50 ways to filter, how you need 50^50 functions, or you need to simulate the ability to pass functions as first-class parameters using the command pattern (if you've ever tried functional programming in Java, you understand what a nightmare this can be).
Back in the the functional universe, you can generalize map g (map f x) in way that lets you swap out the map with filter or fold as needed:
let apply2 a g b f x = a g (b f x)
And call it using apply2 map g filter f or apply2 map g map f or apply2 filter g filter f or whatever you need. Now you'd probably never write code like that in the real world, you'd probably simplify it using:
let mapmap g f = apply2 map g map f
let mapfilter g f = apply2 map g filter f
Higher-order functions and function composition give you a level of abstraction that you cannot get with the imperative code.
Abstracting out the implementation details of loops let's you seamlessly swap one loop for another.
Remember, for-loops are an implementation detail. If you need to change the implementation, you need to change every for-loop.
Map / fold / filter abstract away the loop. So if you want to change the implementation of your loops, you change it in those functions.
Now you might wonder why you'd want to abstract away a loop. Consider the task of mapping items from one type to another: usually, items are mapped one at a time, sequentially, and independently from all other items. Most of the time, maps like this are prime candidates for parallelization.
Unfortunately, the implementation details for sequential maps and parallel maps aren't interchangeable. If you have a ton of sequential maps all over your code, and you want swap them out for parallel maps, you have two choices: copy/paste the same parallel mapping code all over your code base, or abstract away mapping logic into two functions map and pmap. Once you're go the second route, you're already knee-deep in functional programming territory.
If you understand the purpose of function composition and abstracting away implementation details (even details as trivial as looping), you can start to appreciate just how and why functional programming is so powerful in the first place.
For loops are not bad. There are many very valid reasons to keep a for loop.
You can often "avoid" a for loop by reworking it using LINQ in C#, which provides a more declarative syntax. This can be good or bad depending on the situation:
Compare the following:
var collection = GetMyCollection();
for(int i=0;i<collection.Count;++i)
{
if(collection[i].MyValue == someValue)
return collection[i];
}
vs foreach:
var collection = GetMyCollection();
foreach(var item in collection)
{
if(item.MyValue == someValue)
return item;
}
vs. LINQ:
var collection = GetMyCollection();
return collection.FirstOrDefault(item => item.MyValue == someValue);
Personally, all three options have their place, and I use them all. It's a matter of using the most appropriate option for your scenario.
There's nothing wrong with for loops but here are some of the reasons people might prefer functional/declarative approaches like LINQ where you declare what you want rather than how you get it:-
Functional approaches are potentially easier to parallelize either manually using PLINQ or by the compiler. As CPUs move to even more cores this may become more important.
Functional approaches make it easier to achieve lazy evaluation in multi-step processes because you can pass the intermediate results to the next step as a simple variable which hasn't been evaluated fully yet rather than evaluating the first step entirely and then passing a collection to the next step (or without using a separate method and a yield statement to achieve the same procedurally).
Functional approaches are often shorter and easier to read.
Functional approaches often eliminate complex conditional bodies within for loops (e.g. if statements and 'continue' statements) because you can break the for loop down into logical steps - selecting all the elements that match, doing an operation on them, ...
For loops don't kill people (or kittens, or puppies, or tribbles). People kill people.
For loops, in and of themselves, are not bad. However, like anything else, it's how you use them that can be bad.
Sometime you don't kill just one kitten.
for (int i = 0; i < kittens.Length; i++)
{
kittens[i].Kill();
}
Sometimes you kill them all.
You can refactor your code well enough so that you won't see them often. A good function name is definitely more readable that a for loop.
Taking the example from AndyC :
Loop
// mystrings is a string array
List<string> myList = new List<string>();
foreach(string s in mystrings)
{
if(s.Length > 5)
{
myList.add(s);
}
}
Linq
// mystrings is a string array
List<string> myList = mystrings.Where<string>(t => t.Length > 5)
.ToList<string();
Wheter you use the first or the second version inside your function, It's easier to read
var filteredList = myList.GetStringLongerThan(5);
Now that's an overly simple example, but you get my point.
Your colleague is not right. For loops are not bad per se. They are clean, readable and not particularly error prone.
Your colleague is wrong about for loops being bad in all cases, but correct that they can be rewritten functionally.
Say you have an extension method that looks like this:
void ForEach<T>(this IEnumerable<T> collection, Action <T> action)
{
foreach(T item in collection)
{
action(item)
}
}
Then you can write a loop like this:
mycollection.ForEach(x => x.DoStuff());
This may not be very useful now. But if you then replace your implementation of the ForEach extension method for use a multi threaded approach then you gain the advantages of parallelism.
This obviously isn't always going to work, this implementation only works if the loop iterations are completely independent of each other, but it can be useful.
Also: always be wary of people who say some programming construct is always wrong.
A simple (and pointless really) example:
Loop
// mystrings is a string array
List<string> myList = new List<string>();
foreach(string s in mystrings)
{
if(s.Length > 5)
{
myList.add(s);
}
}
Linq
// mystrings is a string array
List<string> myList = mystrings.Where<string>(t => t.Length > 5).ToList<string>();
In my book, the second one looks a lot tidier and simpler, though there's nothing wrong with the first one.
Sometimes a for-loop is bad if there exists a more efficient alternative. Such as searching, where it might be more efficient to sort a list and then use quicksort or binary sort. Or when you are iterating over items in a database. It is usually much more efficient to use set-based operations in a database instead of iterating over the items.
Otherwise if the for-loop, especially a for-each makes the most sense and is readable, then I would go with that rather than rafactor it into something that isn't as intuitive. I personally don't believe in these religious sounding "always do it this way, because that is the only way". Rather it is better to have guidelines, and understand in what scenarios it is appropriate to apply those guidelines. It is good that you ask the Why's!
For loop is, let's say, "bad" as it implies branch prediction in CPU, and possibly performance decrease when branch prediction miss.
But CPU (having a branch prediction accuracy of 97%) and compiler with tecniques like loop unrolling, make loop performance reduction negligible.
If you abstract the for loop directly you get:
void For<T>(T initial, Func<T,bool> whilePredicate, Func<T,T> step, Action<T> action)
{
for (T t = initial; whilePredicate(t); step(t))
{
action(t);
}
}
The problem I have with this from a functional programming perspective is the void return type. It essentially means that for loops do not compose nicely with anything. So the goal is not to have a 1-1 conversion from for loop to some function, it is to think functionally and avoid doing things that do not compose. Instead of thinking of looping and acting think of the whole problem and what you are mapping from and to.
A for loop can always be replaced by a recursive function that doesn't involve the use of a loop. A recursive function is a more functional stye of programming.
But if you blindly replace for loops with recursive functions, then kittens and puppies will both die by the millions, and you will be done in by a velocirapter.
OK, here's an example. But please keep in mind that I do not advocate making this change!
The for loop
for (int index = 0; index < args.Length; ++index)
Console.WriteLine(args[index]);
Can be changed to this recursive function call
WriteValuesToTheConsole(args, 0);
static void WriteValuesToTheConsole<T>(T[] values, int startingIndex)
{
if (startingIndex < values.Length)
{
Console.WriteLine(values[startingIndex]);
WriteValuesToTheConsole<T>(values, startingIndex + 1);
}
}
This should work just the same for most values, but it is far less clear, less effecient, and could exhaust the stack if the array is too large.
Your colleague may be suggesting under certain circumstances where database data is involved that it is better to use an aggregate SQL function such as Average() or Sum() at query time as opposed to processing the data on the C# side within an ADO .NET application.
Otherwise for loops are highly effective when used properly, but realize that if you find yourself nesting them to three or more orders, you might need a better algorithm, such as one that involves recursion, subroutines or both. For example, a bubble sort has a O(n^2) runtime on its worst-case (reverse order) scenario, but a recursive sort algorithm is only O(n log n), which is much better.
Hopefully this helps.
Jim
Any construct in any language is there for a reason. It's a tool to be used to accomplish a task. Means to an end. In every case, there are manners in which to use it appropriately, that is, in a clear and concise way and within the spirit of the language AND manners to abuse it. This applies to the much-misaligned goto statement as well as to your for loop conundrum, as well as while, do-while, switch/case, if-then-else, etc. If the for loop is the right tool for what you're doing, USE IT and your colleague will need to come to terms with your design decision.
It depends upon what is in the loop but he/she may be referring to a recursive function
//this is the recursive function
public static void getDirsFiles(DirectoryInfo d)
{
//create an array of files using FileInfo object
FileInfo [] files;
//get all files for the current directory
files = d.GetFiles("*.*");
//iterate through the directory and print the files
foreach (FileInfo file in files)
{
//get details of each file using file object
String fileName = file.FullName;
String fileSize = file.Length.ToString();
String fileExtension =file.Extension;
String fileCreated = file.LastWriteTime.ToString();
io.WriteLine(fileName + " " + fileSize +
" " + fileExtension + " " + fileCreated);
}
//get sub-folders for the current directory
DirectoryInfo [] dirs = d.GetDirectories("*.*");
//This is the code that calls
//the getDirsFiles (calls itself recursively)
//This is also the stopping point
//(End Condition) for this recursion function
//as it loops through until
//reaches the child folder and then stops.
foreach (DirectoryInfo dir in dirs)
{
io.WriteLine("--------->> {0} ", dir.Name);
getDirsFiles(dir);
}
}
The question is if the loop will be mutating state or causing side effects. If so, use a foreach loop. If not, consider using LINQ or other functional constructs.
See "foreach" vs "ForEach" on Eric Lippert's Blog.

Categories

Resources