LINQ Design Curiosity: Skip/Take vs. SkipWhile/TakeWhile - c#

Is there any particular reason to have separate methods Skip and SkipWhile, rather than simply having overloads of the same method?
What I mean is, instead of Skip(int), SkipWhile(Func<TSource,bool>), and SkipWhile(Func<TSource,int,bool>), why not have Skip(int), Skip(Func<TSource,bool>), and Skip(Func<TSource,int,bool>)? I'm sure there's some reason for it, as the whole LINQ system was designed by people with much more experience than me, but that reasoning is not apparent.
The only possibility that's come to mind has been issues with the parser for the SQL-like syntax, but that already distinguishes between things like Select(Func<TSource,TResult>) and Select(Func<TSource,int,TResult>), so I doubt that's why.
The same question applies to Take and TakeWhile, which are complementary to the above.
Edit: To clarify, I am aware of the functional differences between the variants, I'm merely asking about the design decision on the naming of the methods.

IMO, the only reason would be better readability. Skip sounds like “Skip N records”, while SkipWhile sounds like “Skip for as long as a condition holds”. These names are self-explanatory.

The "While" indicates that LINQ will only skip while the lambda expression evaluates to true, and will stop skipping as soon as it is no longer true. This is a very different thing from just skipping a fixed number of items.
The same reasoning holds true for Take, of course.
All is well in the interest of clarity!
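To make the difference concrete, here is a minimal sketch using a made-up array (the values and variable names are only for illustration). It contrasts the count-based and predicate-based variants on the same input:

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 5, 1, 7 };

// Skip(int) drops a fixed number of leading elements.
int[] afterSkip = numbers.Skip(1).ToArray();                     // 2, 5, 1, 7

// SkipWhile drops leading elements only while the predicate holds,
// then yields everything that is left, including later matches.
int[] afterSkipWhile = numbers.SkipWhile(n => n < 3).ToArray();  // 5, 1, 7

// Take and TakeWhile mirror the two methods above.
int[] afterTake = numbers.Take(3).ToArray();                     // 1, 2, 5
int[] afterTakeWhile = numbers.TakeWhile(n => n < 3).ToArray();  // 1, 2

Console.WriteLine(string.Join(", ", afterSkipWhile));
```

Note that SkipWhile is not a filter: the second 1 survives because skipping stops at the first element that fails the predicate.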

Related

What are we guaranteed regarding side-effects in LINQ predicates?

I just saw this bit of code that has a count++ side-effect in the .GroupBy predicate. (originally here).
object[,] data; // This contains all the data.
int count = 0;
List<string[]> dataList = data.Cast<string>()
    .GroupBy(x => count++ / data.GetLength(1))
    .Select(g => g.ToArray())
    .ToList();
This terrifies me because I have no idea how many times the implementation will invoke the key selector function. And I also don't know if the function is guaranteed to be applied to each item in order. I realize that, in practice, the implementation may very well just call the function once per item in order, but I never assumed that as being guaranteed, so I'm paranoid about depending on that behaviour -- especially given what may happen on other platforms, other future implementations, or after translation or deferred execution by other LINQ providers.
As it pertains to a side-effect in the predicate, are we offered some kind of written guarantee, in terms of a LINQ specification or something, as to how many times the key selector function will be invoked, and in what order?
Please, before you mark this question as a duplicate, I am looking for a citation of documentation or specification that says one way or the other whether this is undefined behaviour or not.
For what it's worth, I would have written this kind of query the long way, by first performing a select query with a predicate that takes an index, then creating an anonymous object that includes the index and the original data, then grouping by that index, and finally selecting the original data out of the anonymous object. That seems more like a correct way of doing functional programming. And it also seems more like something that could be translated to a server-side query. The side-effect in the predicate just seems wrong to me - and against the principles of both LINQ and functional programming, so I would assume there would be no guarantee specified and that this may very well be undefined behaviour. Is it?
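As a rough sketch of that "long way", assuming a small hypothetical 2-by-3 string array in place of the original data, the index can be carried along by the indexed Select overload so that the key selector depends only on its argument:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: 2 rows by 3 columns.
string[,] data =
{
    { "a", "b", "c" },
    { "d", "e", "f" }
};
int columns = data.GetLength(1);

List<string[]> rows = data.Cast<string>()
    .Select((value, index) => new { value, index })    // pair each cell with its position
    .GroupBy(x => x.index / columns)                   // key is computed from the element alone
    .Select(g => g.Select(x => x.value).ToArray())     // strip the index again
    .ToList();

Console.WriteLine(string.Join(" | ", rows.Select(r => string.Join(",", r))));
```

The key selector here has no side effect, so how often it is invoked no longer matters. The query still relies on the source being enumerated in row-major order.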
I realize this question may be difficult to answer if the documentation and LINQ specification actually say nothing regarding side effects in predicates. I want to know specifically whether:
Specs say it's permissible and how. (I doubt it)
Specs say it's undefined behaviour (I suspect this is true and am looking for a citation)
Specs say nothing. (Sloppy spec, if you ask me, but it would be nice to know if others have searched for language regarding side-effects and also come up empty. Just because I can't find it doesn't mean it doesn't exist.)
According to the official C# Language Specification, on page 203, we can read (emphasis mine):
12.17.3.1 The C# language does not specify the execution semantics of query expressions. Rather, query expressions are
translated into invocations of methods that adhere to the
query-expression pattern (§12.17.4). Specifically, query expressions
are translated into invocations of methods named Where, Select,
SelectMany, Join, GroupJoin, OrderBy, OrderByDescending, ThenBy,
ThenByDescending, GroupBy, and Cast. These methods are expected to
have particular signatures and return types, as described in §12.17.4.
These methods may be instance methods of the object being queried or
extension methods that are external to the object. These methods
implement the actual execution of the query.
From looking at the source code of GroupBy in corefx on GitHub, it does seem that the key selector function is indeed called once per element, in the order in which the previous IEnumerable provides them. I would in no way consider this a guarantee though.
In my view, any IEnumerables which cannot be enumerated multiple times safely are a big red flag that you may want to reconsider your design choices. An interesting issue that could arise from this is that for example if you view the contents of this IEnumerable in the Visual Studio debugger, it will probably break your code, since it would cause the count variable to go up.
The reason this code hasn't exploded up until now is probably because the IEnumerable is never stored anywhere, since .ToList is called right away. Therefore there is no risk of multiple enumerations (again, with the caveat about viewing it in the debugger and so on).
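The multiple-enumeration risk can be illustrated with a small sketch. The array and the group width of 3 are made up for the example, and the sizes shown reflect how the current LINQ to Objects implementation behaves, not a documented guarantee:

```csharp
using System;
using System.Linq;

string[] cells = { "a", "b", "c", "d" };
int count = 0;

// Deferred execution: the key selector runs each time the query is enumerated.
var groups = cells.GroupBy(x => count++ / 3);

// First enumeration: count runs 0..3, keys are 0,0,0,1.
int[] firstSizes = groups.Select(g => g.Count()).ToArray();

// Second enumeration: count continues from 4, keys are 1,1,2,2.
int[] secondSizes = groups.Select(g => g.Count()).ToArray();

Console.WriteLine(string.Join(",", firstSizes));    // 3,1
Console.WriteLine(string.Join(",", secondSizes));   // 2,2
```

The same query object yields different groupings on each pass, which is exactly what a debugger watch window or a second foreach would trigger.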

Is there a reason to return null inside parentheses

I found return(null) in
http://msdn.microsoft.com/ru-ru/library/system.xml.serialization.ixmlserializable.aspx
I wonder whether there is some reason for these parentheses. Why not just return null?
Hmm this was a curious question so I did some browsing.
I found this post, which was answered by Jon Skeet. He states that it sometimes increases readability but has no performance or logical impact.
Another user suggests it is a holdover from long ago, when some C compilers required them.
Interesting to see an example on MSDN with it though, nice find.
MSDN also has this
Many programmers use parentheses to enclose the expression argument of the return statement. However, C does not require the parentheses.
There is no reason for that, you can just as easily type
return null;
In C#, the expressions foo and (foo) evaluate to exactly the same thing. The parens have no effect when the expression only has one term.
I actively prefer not using them, though, for a couple of reasons:
To me, at first glance, return(null) resembles a function call just a little too closely.
They hint at complexity that isn't there, and people -- even people who know better, but are just having a brain fart -- end up asking questions about what's special about that statement. This very question can be considered evidence.
If you do use them, you should probably stick a space after the return.

Which is the better method to use LINQ?

I have these two lines that do exactly the same thing but are written differently. Which is the better practice, and why?
firstRecordDate = (DateTime)(from g in context.Datas
select g.Time).Min();
firstRecordDate = (DateTime)context.Datas.Min(x => x.Time);
there is no semantic difference between method syntax and query
syntax. In addition, some queries, such as those that retrieve the
number of elements that match a specified condition, or that retrieve
the element that has the maximum value in a source sequence, can only
be expressed as method calls.
http://msdn.microsoft.com/en-us/library/bb397947.aspx
Also look here: .NET LINQ query syntax vs method chain
It comes down to what you are comfortable with and what you find is more readable.
The second one uses lambda expressions. I like it as it is compact and easier to read (although some will find the former easier to read).
Also, the first is better suited if you have a SQL background.
I'd say go with what is most readable or understandable with regards to your development team. Come back in a year or so and see if you can remember that LINQ... well, this particular LINQ is obviously simple so that's moot :-)
Best practice is also quite opinionated, you aren't going to get one answer here. In this case, I'd go for the second item because it's concise and I can personally read and understand it faster than the first, though only slightly faster.
I personally much prefer using lambda expressions. As far as I know there is no real difference as you say you can do exactly the same thing both ways. We agreed to all use the lambda as it is easy to read, follow and to pick up for people who don't like SQL.
There is absolutely no difference in terms of the results, assuming you do actually write equivalent statements in each format.
Go for the most readable one for any given query. Complex queries with joins and many where clauses etc are often easier to write/read in the linq query syntax, but really simple ones like context.Employees.SingleOrDefault(e => e.Id == empId) are easier using the method-chaining syntax. There's no general "one is better" rule, and two people may have a difference of opinion for any given example.
There is no semantic difference between the two statements. Which you choose is purely a matter of style preference.
Do you need the explicit cast in either of them? Isn't Time already a DateTime?
Personally I prefer the second approach as I find the extension method syntax more familiar than the LINQ syntax, but it is really just personal preference, they perform the same.
The second one, written to match the first more exactly, would be context.Datas.Select(x => x.Time).Min(). So you can see that the way you wrote it, with Min(x => x.Time), might be slightly more efficient, because you only have one operation instead of two.
The query comprehension syntax is actually compiled down to a series of calls to the extension methods, which means that the two syntaxes are semantically identical. Whichever style you prefer is the one you should use.
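A minimal sketch of that equivalence, using a hypothetical in-memory array as a stand-in for context.Datas (the dates are invented), shows the three spellings producing the same value:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for context.Datas.
var datas = new[]
{
    new { Time = new DateTime(2020, 5, 1) },
    new { Time = new DateTime(2019, 3, 7) }
};

// Query syntax; the compiler rewrites the query part as datas.Select(g => g.Time).
DateTime viaQuery = (from g in datas select g.Time).Min();

// The same thing written directly as method calls.
DateTime viaSelect = datas.Select(x => x.Time).Min();

// The overload of Min that takes the selector itself.
DateTime viaMin = datas.Min(x => x.Time);

Console.WriteLine(viaQuery == viaSelect && viaSelect == viaMin);   // True
```

Against a real LINQ provider the generated query may differ between the forms, so this only demonstrates the equivalence for in-memory sequences.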

Do you like languages that let you put the "then" before the "if"?

I was reading through some C# code of mine today and found this line:
if (ProgenyList.ItemContainerGenerator.Status != System.Windows.Controls.Primitives.GeneratorStatus.ContainersGenerated) return;
Notice that you can tell without scrolling that it's an "if" statement that works with ItemContainerGenerator.Status, but you can't easily tell that if the "if" clause evaluates to "true" the method will return at that point.
Realistically I should have moved the "return" statement to a line by itself, but it got me thinking about languages that allow the "then" part of the statement first. If C# permitted it, the line could look like this:
return if (ProgenyList.ItemContainerGenerator.Status != System.Windows.Controls.Primitives.GeneratorStatus.ContainersGenerated);
This might be a bit "argumentative", but I'm wondering what people think about this kind of construct. It might serve to make lines like the one above more readable, but it also might be disastrous. Imagine this code:
return 3 if (x > y);
Logically we can only return if x > y, because there's no "else", but part of me looks at that and thinks, "are we still returning if x <= y? If so, what are we returning?"
What do you think of the "then before the if" construct? Does it exist in your language of choice? Do you use it often? Would C# benefit from it?
Let's reformat that a bit and see:
using System.Windows.Controls.Primitives;
...
if (ProgenyList.ItemContainerGenerator.Status != GeneratorStatus.ContainersGenerated)
{
return;
}
Now how hard is it to see the return statement? Admittedly in SO you still need to scroll over to see the whole of the condition, but in an IDE you wouldn't have to... partly due to not trying to put the condition and the result on the same line, and partly due to the using directive.
The benefit of the existing C# syntax is that the textual order reflects the execution order - if you want to know what will happen, you read the code from top to bottom.
Personally I'm not a fan of "return if..." - I'd rather reformat code for readability than change the ordering.
I don't like the ambiguity this invites. Consider the following code:
doSomething(x)
if (x > y);
doSomethingElse(y);
What is it doing? Yes, the compiler could figure it out, but it would look pretty confusing for a programmer.
Yes.
It reads better. Ruby has this as part of its syntax - the term being 'statement modifiers'
irb(main):001:0> puts "Yay Ruby!" if 2 == 2
Yay Ruby!
=> nil
irb(main):002:0> puts "Yay Ruby!" if 2 == 3
=> nil
To close, I need to stress that you need to 'use this with discretion'. The ruby idiom is to use this for one-liners. It can be abused - however I guess this falls into the realm of responsible development - don't constrain the better developers by building in restrictions to protect the poor ones.
It looks ugly to me. The existing syntax is much better.
if (x > y) return 3;
I think it's probably OK if the scope were limited to just return statements. As I said in my comment, imagine if this were allowed:
{
doSomething();
doSomethingElse();
// 50 lines...
lastThink();
} if (a < b);
But even just allowing it only on return statements is probably a slippery slope. People will ask, "return x if (a); is allowed, so why not something like doSomething() if (a);?" and then you're on your way down the slope :)
I know other languages do get away with it, but C#'s philosophy is more about making The One Right Way™ easy, and having more than one way to do something is usually avoided (though with exceptions). Personally, I think it works pretty well, because I can look at someone else's code and know that it's pretty much in the same style that I'd write it in.
I don't see any problem with
return 3 if (x > y);
It probably bothers you because you are not accustomed to the syntax. It is also nice to be able to say
return 3 unless y <= x
This is a nice syntax option, but I don't think that C# needs it.
I think Larry Wall was very smart when he put this feature into Perl. The idea is that you want to put the most important part at the beginning where it's easy to see. If you have a short statement (i.e. not a compound statement), you can put it before the if/while/etc. If you have a long (i.e. compound) statement, it goes in braces after the condition.
Personally I like languages that let me choose.
That said, if you refactor as well as reformat, it probably doesn't matter what style you use, because they will be equally readable:
using System.Windows.Controls.Primitives;
...
var isContainersGenerated =
ProgenyList.ItemContainerGenerator.Status == GeneratorStatus.ContainersGenerated;
if (!isContainersGenerated) return;
//alternatively
return if (!isContainersGenerated);
One concern when reading such code is that you think a statement will execute, only to find out later that it merely might execute.
For example if you read "doSomething(x)", you're thinking "okay so this calls doSomething(x)" but then you read the "if" after it and have to realise that the previous call is conditional on the if statement.
When the "if" is first you know immediately that the following code might happen and can treat it as such.
We tend to read sequentially, so reading and going in your mind "the following might happen" is a lot easier than reading and then realising everything you just read needs to be reparsed and that you need to evaluate everything to see if it's within the scope of your new if statement.
Both Perl and Ruby have this and it works fine. Personally I'm fine with as much functionality as you want to throw at me. The more choices I have to write my code the better the overall quality, right? "The right tool for the job" and all that.
Realistically though, it's kind of a moot point since it's pretty late for such a fundamental addition in C#'s lifecycle. We're at the point where any minor syntax change would take a lot of work to implement due to the size of the compiler code and its syntax parsing algorithm. For a new addition to be even considered it would have to bring quite a bit of functionality, and this is just a (not so) different way of saying the same thing.
Humans read beginning to end. In analyzing code flow, limits of the short term memory make it more difficult to read postfix conditions due to additional backtracking required. For short expressions, this may not be a problem, but for longer expressions it will incur significant overhead for users that are not seasoned in the language they are reading.
Agreed that it is confusing. I had never heard of this construct before, so I think the correct way of putting the "then" before the "if" must always include the result for the else case, like
return (x > y) ? 3 : null;
Otherwise there is no point in using imperative constructs like
return 3 if (x > y);
return 4 if (x == y);
return 5 if (x < y);
IMHO it's kind of weird, because I have no idea where to use it...
It's like a lot of things really, it makes perfect sense when you use it in a limited context(a one liner), and makes absolutely no sense if you use it anywhere else.
The problem with that of course is that it'd be almost impossible to restrict the use to where it makes sense, and allowing its use where it doesn't make sense is just odd.
I know that there's a movement coming out of scripting languages to try and minimize the number of lines of code, but when you're talking about a compiled language, readability is really the key and as much as it might offend your sense of style, the 4 line model is clearer than the reversed if.
I think it's a useful construct and a programmer would use it to emphasize what is important in the code and to de-emphasize what is not important. It is about writing intention-revealing code.
I use something like this (in coffeescript):
index = bla.find 'a'
return if index is -1
The most important thing in this code is to get out (return) if nothing is found - notice the words I just used to explain the intention were in the same order as that in the code.
So this construct helps me to code in a way which reflects my intention slightly better.
It shouldn't be too surprising to realize that the order that correct English or traditional programming-language grammar has typically required isn't always the most effective or simplest way to create meaning.
Sometimes you need to let everything hang out and truly reassess what is really the best way to do something.
It's considered grammatically incorrect to put the answer before the question, why would it be any different in code?

.net code readability and maintainability [closed]

There is currently a local debate as to which code is more readable.
We have one programmer who comes from a C background, and when that programmer codes it looks like
string foo = "bar";
if (foo[foo.Length - 1] == 'r')
{
}
We have another programmer that doesn't like this methodology and would rather use
if (foo.EndsWith("r"))
{
}
Which way of doing these types of operations is better?
EndsWith is more readable to someone who has never seen C or C++, C#, or any other programming language.
The second one is more declarative in style, but I can't tell you objectively if it is more readable since readability is very subjective. I personally find the second one more readable myself but that is just my opinion.
Here is an excerpt from my article:
Most C# developers are very familiar
with writing imperative code (even
though they may not know it by that
name). In this article, I will
introduce you to an alternative style
of programming called declarative
programming. Proper declarative code
is easier to read, understand, and
maintain.
As professionals, we should be
striving to write better code each
day. If you cannot look at code you
wrote three months ago with a critical
eye and notice things that could be
better, then you have not improved and
are not challenging yourself. I
challenge you to write code that is
easier to read and understand by using
declarative code.
Number 2 is better to read and to maintain.
Example: Verify the last 2 characters ...
Option 1)
if (foo[foo.Length - 1] == 'r' && foo[foo.Length - 2] == 'a')
{
}
Option 2)
if (foo.EndsWith("ar"))
{
}
last 3? last 4?...
I come from a C/C++ background and I vote for EndsWith!
Readability rules, especially if it implies intent.
With the first example I must discover the intent - which is left for interpretation. If it appears to have a bug, how do I know that's not intentional?
The second example is telling me the intent. We want to find the end character. Armed with that knowledge I can proceed with evaluating the implementation.
I think the second way is better because it is easier to read and because the first one duplicates the logic of the EndsWith method, which is bad practice.
I think the right answer would be the one that is actually correct. EndsWith properly returns false for empty string input whereas the other test will throw an exception trying to index with -1.
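A short sketch of that edge case, using the same variable name foo as the question but set to the empty string:

```csharp
using System;

string foo = string.Empty;

// EndsWith handles the empty string and simply reports no match.
bool viaEndsWith = foo.EndsWith("r");

// The indexer version evaluates foo[-1], which throws.
bool indexerThrew = false;
try
{
    Console.WriteLine(foo[foo.Length - 1] == 'r');
}
catch (IndexOutOfRangeException)
{
    indexerThrew = true;
}

Console.WriteLine($"EndsWith: {viaEndsWith}, indexer threw: {indexerThrew}");
```

To make the indexer version safe, it would need an extra foo.Length > 0 check, which is part of what EndsWith already does for you.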
Not only is EndsWith more readable, but it is also more 'correct'.
As a rule, if there is a framework method provided to do the job ... use it.
What if foo == string.Empty?
IMO, the intent of the original author is clearer in the second example. In the first, the reader must evaluate what the author is trying to accomplish by pulling the last index. It is not difficult, but requires more effort on the part of the reader.
Both approaches are valid, but the EndsWith method is easier to read in my opinion. It also does away with the potential to make typing mistakes etc. with the more complicated form.
EndsWith is probably safer. But the indexer is probably faster.
EndsWith probably checks to see if the input string is empty. They will probably both throw null reference exceptions. And the indexer will fail if the length is 0.
As for readability, they both say the same thing to me, but I have been programming for a while. The .EndsWith(...) is probably faster to grasp without considering context.
It pretty much does the same thing. However, it gets more complicated with more than one character in the endswith argument. However, the first example is slightly faster as it uses no actual functions and thus requires no stack. You might want to define a macro which can be used to simply make everything uniform.
I think the main criteria should be which of these most clearly says what the developer wants to do. What does each sample actually say?
1) Access the character at position one less than the length, and check if it equals the character 'r'
2) Check if it ends with the string "r"
I think that makes it clear which is the more maintainable answer.
As long as it does not affect program performance, you can use either way without a problem. But adding code comments is very important for conveying what is being accomplished.
From an error handling standpoint, EndsWith.
I much prefer the second (EndsWith) version. It's clear enough for even my manager to understand!
The best practice is to write code that is easily readable. If you used the first one, developers that are debugging your code may say, "What is this dev trying to do?" You need to utilize methods that are easily explained. If a method is too complicated to figure out, extract several methods out of it.
I would definitely say the second one, legibility and simplicity are key!
Also, if the "if" statement has one line, DON'T BOTHER USING BRACES, USE A SINGLE INDENTATION
Remember that in classic C, the only difference between a "string" and an array of characters is that terminating null character '\0', so we had to more actively treat them accordingly and to make sure that we did not run off the end of the array. So the first block of code bases its thought process on the concept of an array of characters.
The second block of code bases the thought process on how you handle a string more abstractly and with less regard to its implementation under the covers.
So in a nutshell, if you are talking about processing characters as the main idea behind your project, go with the first piece. If you are talking about preparing a string for something greater and that does not necessarily need to focus on the mechanics of the ways that strings are built -- or you just want to keep it simple -- go with the second style.
Some of this might summarize others' contributions at this point, but more analogously put, are you playing "Bingo();" or are you "playing a game with a two-dimensional array of random integers, etc.?"
Hopefully this helps.
Jim
"Code is written to be read by humans and incidentally run by computers" SICP
EndsWith FTW!!
