Is the from keyword preferable to direct method calls in C#? - c#

Of these two options:
var result = from c in coll where c % 2 == 0 select c;
var result = coll.Where ( c => c % 2 == 0 );
Which is preferable?
Is there any advantage to using one over the other? To me the second one looks better, but I would like to hear other people's opinions.

If you've only got one or two clauses, I'd go for "dot notation". When you start doing joins, groupings, or anything else that introduces transparent identifiers, query syntax starts to appeal a lot more.
It's often worth trying it both ways and seeing what's the most readable for that particular situation.
In terms of the generated code, they'll be exactly the same in most cases. Occasionally there'll be an overload you can use in dot notation which makes it simpler than the query expression syntax, but value readability over everything else in most cases.
I also have a blog post on this topic. I would definitely recommend that developers should be comfortable with both options - I'd be quite concerned if a colleague were using LINQ but didn't understand the fundamentals of what query expressions were about, for example. (They don't need to know every translation involved, but some idea of what's going on will make their lives a lot easier.)
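To make the "try it both ways" advice concrete, here is a minimal sketch written as C# top-level statements (C# 9 or later). The `people` and `pets` data are invented for illustration and are not from the question. It shows the filter from the question in both forms, then a join, which is the kind of clause where query syntax tends to read more easily.

```csharp
using System;
using System.Linq;

int[] coll = { 1, 2, 3, 4, 5, 6 };

// Query-expression form from the question.
var viaQuery = (from c in coll where c % 2 == 0 select c).ToList();

// Dot-notation form; the compiler translates the query above into this call.
var viaDot = coll.Where(c => c % 2 == 0).ToList();

Console.WriteLine(viaQuery.SequenceEqual(viaDot)); // True

// Hypothetical data, only here to have something to join.
var people = new[] { (Id: 1, Name: "Ann"), (Id: 2, Name: "Bob") };
var pets = new[] { (OwnerId: 1, Pet: "Rex"), (OwnerId: 2, Pet: "Tom"), (OwnerId: 1, Pet: "Kit") };

// A join as a query expression.
var joinQuery = (from p in people
                 join pet in pets on p.Id equals pet.OwnerId
                 select p.Name + " owns " + pet.Pet).ToList();

// The same join in dot notation needs explicit key and result selectors.
var joinDot = people.Join(pets, p => p.Id, pet => pet.OwnerId,
                          (p, pet) => p.Name + " owns " + pet.Pet).ToList();

Console.WriteLine(string.Join("; ", joinQuery));
Console.WriteLine(joinQuery.SequenceEqual(joinDot)); // True
```

Which form reads better is still a judgment call; the point is only that both produce the same sequence.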

I always use the lambda syntax because to me it's clearer what's actually happening and it looks cool to boot. But we have some devs here who always do the opposite (SQL nerds, I guess :) ). Fortunately, tools like ReSharper can just transform between the two with a click.

Related

Efficient(?) string comparison

What could possibly be the reasons to use -
bool result = String.Compare(fieldStr, "PIN", true).Equals(0);
instead of,
bool result = String.Equals(fieldStr, "PIN", StringComparison.CurrentCultureIgnoreCase);
or, even simpler -
bool result = fieldStr.Equals("PIN", StringComparison.CurrentCultureIgnoreCase);
for comparing two strings in .NET with C#?
I've been assigned to a project with a large code-base that makes abundant use of the first one for simple equality comparison. I couldn't (not yet) find any reason why those senior guys used that approach, and not something simpler like the second or the third one. Is there any performance issue with the Equals (static or instance) method? Or is there any specific benefit to using the String.Compare method that outweighs the cost of the extra .Equals(0) call it entails?
I can't give immediate examples, but I suspect there are cases where the first would return true, but the second would return false. Two values may be equal in terms of sort order, while still being distinct even under case-ignoring rules. For example, one culture may decide not to treat accents as important while sorting, but still view two strings differing only in accented characters as unequal. (Or it's possible that the reverse may be true - that two strings may be considered equal, but one comes logically before the other.)
If you're basically interested in the sort order rather than equality, then using Compare makes sense. It also potentially makes sense if the code is going to be translated - e.g. for LINQ - and that overload of Compare is supported but that overload of Equals isn't.
I'll try to come up with an example where they differ. I would certainly say it's rare. EDIT: No luck so far, having tried accents, Eszett, Turkish "I" handling, and different kinds of spaces. That's a long way from saying it can't happen though.
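For reference, a small sketch of the three forms side by side, written as C# top-level statements. It pins the current culture to the invariant culture so the result is the same on every machine; that is an assumption of this sketch, since real code normally inherits the user's culture. It does not show a case where the forms disagree, only that Compare also carries ordering information that Equals does not.

```csharp
using System;
using System.Globalization;

// Assumption for this sketch: fix the culture so the culture-sensitive calls are deterministic.
CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;

string fieldStr = "pin"; // hypothetical field value

// Form 1: sort-order comparison, then test for "sorts as equal".
bool viaCompare = String.Compare(fieldStr, "PIN", true) == 0;

// Form 2: static equality test.
bool viaStaticEquals = String.Equals(fieldStr, "PIN", StringComparison.CurrentCultureIgnoreCase);

// Form 3: instance equality test.
bool viaInstanceEquals = fieldStr.Equals("PIN", StringComparison.CurrentCultureIgnoreCase);

Console.WriteLine($"{viaCompare} {viaStaticEquals} {viaInstanceEquals}"); // True True True

// Compare also says which string sorts first, which Equals cannot.
int order = Math.Sign(String.Compare("apple", "Banana", true));
Console.WriteLine(order); // -1
```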

LINQ: Why is it called "Comprehension Syntax"

Why is the following LINQ syntax (sometimes called "query" syntax) called "comprehension" syntax? What's being comprehended (surely that's wrong)? Or, what is comprehensively represented (maybe I'm on the right track, now)?
It comes from the more language-agnostic term List Comprehension which many languages follow. The history apparently is:
The SETL programming language (late 1960s) had a set formation construct, and the computer algebra system AXIOM (1973) has a similar construct that processes streams, but the first use of the term "comprehension" for such constructs was in Rod Burstall and John Darlington's description of their functional programming language NPL from 1977.
FOLDOC mostly echoes this as well:
According to a note by Rishiyur Nikhil (August 1992), the term itself seems to have been coined by Phil Wadler circa 1983-5, although the programming construct itself goes back much further (most likely Jack Schwartz and the SETL language).
The term "list comprehension" appears in the references below.
The earliest reference to the notation is in Rod Burstall and John Darlington's description of their language, NPL.
["The OL Manual" Philip Wadler, Quentin Miller and Martin Raskovsky, probably 1983-1985].
["How to Replace Failure by a List of Successes" FPCA September 1985, Nancy, France, pp. 113-146].
I suspect this is related to the second meaning of Comprehend:
to take in or embrace; include; comprise
This syntax has to do with defining what should be included in a set.
I think this paper can shed light http://dl.acm.org/citation.cfm?id=181564
I.e., they argue and define (I think) what a comprehension syntax is. It was published in 1994 and maybe it affected the design concepts of LINQ.
My understanding of the term LINQ comprehension syntax, as a .NET developer, is that it allows you to write LINQ in a familiar query-language style. As a person's understanding of LINQ improves, they may move to what is known in .NET as extension method syntax, which is also the form the compiler translates query expressions into at compile time.
Since the terms "comprehension" and "comprehensive" are very often used in English to indicate "wholeness" and "completeness", one meaning of comprehension syntax could be a syntax for building expressions that generate sets of values that "include all values" (comprehend) satisfying the rules those expressions state.
Another meaning could relate more to generating subsets of values (lists) from some specified set: the subset belongs to the starting set and is "comprised" in it. For that reason, comprehension syntax could be the syntax for a programming language's constructs that generate subsets of values comprised in a specified original set.

Ternary operator nesting

I am wondering: what is the best instruction in terms of performance between those 2 versions:
Background = Application.Current.Resources[condition ? BackgroundName1 : BackgroundName2] as Brush;
and:
Background = condition ? Application.Current.Resources[BackgroundName1] as Brush : Application.Current.Resources[BackgroundName2] as Brush;
Is there any difference? And if yes, which one is better?
NB: BackgroundName1 & 2 are simply strings
The first one is shorter and more readable.
It's also easier to maintain.
If you later change it to read a different Resources dictionary, you might forget to change the second half of the second one.
The first one is also more clearly reading from the same dictionary.
First: Use a profiler to find the slowest thing. If you're having a performance problem it doesn't make sense to spend hours or days working on making something faster that is already fast enough.
Second: You can determine the answer to your question by trying it both ways and carefully measuring to see if there is a difference. Don't ask us which is faster; we don't know because we haven't tried it and have no ability to try it.
Don't get too caught up in micro-optimizations! The performance gain you'll get will be nil. Go for the code that is more readable and easier to understand in the end.
No difference whatsoever.
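A quick way to convince yourself the two versions behave the same is to run them against a stand-in dictionary. This sketch uses C# top-level statements; `resources` is a plain Dictionary standing in for Application.Current.Resources, and the keys and string values are made up (strings replace Brush so it runs without WPF).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Application.Current.Resources.
var resources = new Dictionary<string, object>
{
    ["Light"] = "light-brush",
    ["Dark"] = "dark-brush",
};
string backgroundName1 = "Light", backgroundName2 = "Dark";

// Version 1: choose the key, then do a single lookup.
string PickKeyFirst(bool condition) =>
    resources[condition ? backgroundName1 : backgroundName2] as string;

// Version 2: two lookup expressions, only one of which is evaluated.
string PickValue(bool condition) =>
    condition ? resources[backgroundName1] as string : resources[backgroundName2] as string;

Console.WriteLine(PickKeyFirst(true) == PickValue(true));   // True
Console.WriteLine(PickKeyFirst(false) == PickValue(false)); // True
```

Either way exactly one lookup runs per call, which is why the answers above settle on readability rather than speed.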

Is there a standard way to count statements in C#

I was looking at some code length metrics other than Lines of Code. Something that Source Monitor reports is statements. This seemed like a valuable thing to know, but the way Source Monitor counted some things seemed unintuitive. For example, a for statement is one statement, even though it contains a variable definition, a condition, and an increment statement. And if a method call is nested in an argument list to another method, the whole thing is considered one statement.
Is there a standard way that statements are counted, and are there rules governing such a thing?
The first rule of metrics is "be careful what you measure". You ask for a count of statements, that's what you're going to get. As you note, that figure is perhaps not actually relevant.
If you're interested in other measures, like how "complex" code is, consider looking into other code metrics, like cyclomatic complexity.
http://en.wikipedia.org/wiki/Cyclomatic_complexity
UPDATE: Re: your comment
I agree that "doing too much" is an interesting metric. My rule of thumb is that one statement should have one side effect (usually a "local" side effect like mutating a local variable, but sometimes a visible side effect, like writing to a file) and therefore "number of statements" should be roughly correlated with how much the method is "doing" in terms of its number of side effects.
In practice, of course no one's code, my own included, actually meets that bar all the time. You might consider a metric for "how much the method is doing" to count not just statements but also, say, method calls.
To actually answer your question: I'm not aware of any industry standard that regulates what "number of statements" is. The C# specification certainly defines what a "statement" is lexically, but then of course you have to do some interpretation to do a count. For example:
void M()
{
    try
    {
        if (blah)
        {
            Frob();
            Blob();
        }
    }
    catch(Exception ex)
    { /* eat it */ }
    finally
    {
        Grob();
    }
}
How many statements are there in M? Well, the body of M consists of one statement, a try-catch-finally. So is the answer one? The body of the try contains one statement, an "if" statement. The consequence of the "if" contains one statement -- remember, a block is a statement. The block contains two statements. The finally contains one statement. The catch block contains no statements -- a catch block is not a statement, lexically -- but it certainly is highly relevant to the operation of the method!
So how many statements is that altogether? One could make a reasonable case for any number from one to six, depending on whether you count blocks as "real" statements, whether you consider child statements as in addition to their parent statement or not, and so on. There is no standards body which regulates the answer to this question that I'm aware of.
The closest you might get to a formal definition of "what is a statement" would be the C# specification itself. Good luck working out whether a particular tool's measurement agrees with your reading of the specification.
Given that metrics are best used as a guide to better/worse code, and not a strict formula, does the exact definition used by the tool make much difference?
If I have three methods, with "statement lengths" of 2500, 1500 and 150, I know which method I'll be examining first; that another tool might report 2480, 1620 and 174 isn't too important.
One of the best tools I've seen for measuring metrics is NDepend, though again I'm not 100% sure what definitions it is using. According to the website, NDepend has 82 separate metrics, including Number of instructions and Cyclomatic Complexity.
The C# Metrics Tool defines the things being counted ("statements", "operands", etc.) by using a precise C# BNF language definition. (In fact, it precisely parses the code according to a full C# grammar and then computes structural metrics by walking over the parse tree; the SLOC count it gets by counting lines, as you'd expect.)
You might still argue that such a definition is unintuitive (grammars rarely are intuitive), but it is precise. I agree with other posters here, however, that the precise measure isn't as important as the relative value that one block of code has with respect to another. A value of "173.92" complexity just isn't very helpful by itself; compared to another complexity value of "81.02", we can say there's a good indication that the first one is more complex than the second, and that's enough to provide a focus of attention.
I think that metrics are also useful in trending; if last week this code was "81.02" complex, and this week it is "173.92", I should wonder why all that is happening in this part of the code.
You might also consider a ratio of a structural metric (e.g., Cyclomatic) to SLOC as an indication of "doing too much", or at least an indication of writing code that is way too dense to understand.
One simple metric is to just count the punctuation marks (;, ,, .) between tokens (so as to avoid those in strings, comments, or numbers). Thus, for (x = 0, y = 1; x < foo.Count; x++, y++) bar[y] = foo[x]; would count as 6.
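The punctuation-count idea can be sketched in a few lines. This is a hand-rolled scanner written as C# top-level statements, not a real tokenizer: `CountPunctuation` is a name made up here, and it only knows about ordinary string and character literals, the two comment forms, and decimal points followed by a digit (verbatim and interpolated strings are not handled).

```csharp
using System;

// Counts ';', ',' and '.' outside string/char literals, comments and numbers.
int CountPunctuation(string code)
{
    int count = 0;
    for (int i = 0; i < code.Length; i++)
    {
        char ch = code[i];
        if (ch == '"' || ch == '\'')
        {
            // Skip to the matching quote, stepping over backslash escapes.
            char quote = ch;
            for (i++; i < code.Length && code[i] != quote; i++)
                if (code[i] == '\\') i++;
        }
        else if (ch == '/' && i + 1 < code.Length && code[i + 1] == '/')
        {
            // Line comment: skip to the end of the line.
            while (i < code.Length && code[i] != '\n') i++;
        }
        else if (ch == '/' && i + 1 < code.Length && code[i + 1] == '*')
        {
            // Block comment: skip to the closing "*/" (or the end of input).
            int end = code.IndexOf("*/", i + 2, StringComparison.Ordinal);
            i = end < 0 ? code.Length : end + 1;
        }
        else if (ch == '.' && i + 1 < code.Length && char.IsDigit(code[i + 1]))
        {
            // Decimal point inside a numeric literal such as 3.14: not counted.
        }
        else if (ch == ';' || ch == ',' || ch == '.')
        {
            count++;
        }
    }
    return count;
}

// The loop from the answer above.
Console.WriteLine(CountPunctuation(
    "for (x = 0, y = 1; x < foo.Count; x++, y++) bar[y] = foo[x];")); // 6
```

A parser-based tool would be more robust; this only shows how little machinery the metric needs.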

Semicolons in C#

Why are semicolons necessary at the end of each line in C#?
Why can't the compiler just know where each line is ended?
The statement terminator character lets you break a statement across multiple lines.
On the other hand, languages like VB have a line continuation character (and may raise a compile error for a semicolon). I personally think it's much cleaner to terminate statements with a semicolon rather than continue lines using an underscore.
Finally, languages like JavaScript (JS) and Swift have optional semicolons, but at least JS has a convention to always put semicolons (even if not required, which prevents accidents).
No, the compiler doesn't know that a line break is for statement termination, nor should it. It allows you to carry a statement across multiple lines if you like.
See:
string sql = @"SELECT foo
               FROM bar
               WHERE baz=42";
Or how about large method overloads:
CallMyMethod(thisIsSomethingForArgument1,
             thisIsSomethingForArgument2,
             thisIsSomethingForArgument3,
             thisIsSomethingForArgument4,
             thisIsSomethingForArgument5,
             thisIsSomethingForArgument6,
             thisIsSomethingForArgument7);
And the reverse, the semi-colon also allows multi-statement lines:
string s = ""; int i = 0;
How many statements is this?
for (int i = 0; i < 100; i++) // <--- should there be a semi-colon here?
Console.WriteLine("foo")
Semicolons are needed to eliminate ambiguity.
So that whitespace isn't significant except inside identifiers and keywords and such.
I personally agree with having a distinct character as a line terminator. It makes it much easier for the compiler to figure out what you are trying to do.
And contrary to popular belief, it is not possible 100% of the time for the compiler to figure out where one statement ends and another begins without assistance! There are edge cases where it is ambiguous whether it is a single statement or multiple statements spanning several lines.
Read this article from Paul Vick, the technical lead of Visual Basic, to see why it's not as easy as it sounds.
Strictly speaking, this is true: if a human could figure out where a statement ends, so can the compiler. This hasn't really caught on yet, and few languages implement anything of that kind. The next version of VB will probably be the first language to implement a proper handling of statements that require neither explicit termination nor line continuation [source]. This would allow code like this:
Dim a = OneVeryLongExpression +
        AnotherLongExpression
Dim b = 2 * a
Let's keep our fingers crossed.
On the other hand, this does make parsing much harder and can potentially result in poor error messages (see Haskell).
That said, the reason for C# to use a C-like syntax was probably due to marketing reasons more than anything else: people are already familiar with languages like C, C++ and Java. No need to introduce yet another syntax. This makes sense for a variety of reasons but it obviously inherits a lot of weaknesses from these languages.
It can be done. What you refer to is called "semicolon insertion". JavaScript does it with much success; the reason why it is not applied in C# is up to its designers. Maybe they did not know about it, or feared it might cause confusion among programmers.
For more details on semicolon insertion in JavaScript, please refer to the ECMA-262 standard, where JavaScript is specified.
I quote from page 22 (in the PDF, page 34):
When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.
When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation “[no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (section 12.6.3).
[...]
The specification document even contains examples!
Another good reason for semicolons is to isolate syntax errors. When syntax errors occur the semicolons allow the compiler to get back on track so that something like
a = b + c = d
can be disambiguated between
a = b + c; = d
with the error in the second statement or
a = b + ; c = d
with the error in the first statement. Without the semicolons, it can be impossible to say where a statement ends in the presence of a syntax error. A missing parenthesis might mean that the entire latter half of your program may be considered one giant syntax error rather than being syntax checked line by line.
It also helps the other way - if you meant to write
a = b; c = d;
but typoed and left out the "c" then without semis it would look like
a = b = d
which is valid and you'd have a running program with a bad and difficult to locate bug so semicolons can often help catch errors that otherwise would look like valid syntax. Also, I agree with everybody on readability. I don't like working in languages without some sort of statement terminator for that reason.
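The "valid but wrong" case is easy to reproduce in C# itself, since chained assignment is legal. A small sketch as C# top-level statements, with made-up starting values:

```csharp
using System;

int a = 0, b = 1, c = 0, d = 2;

// Intended: two separate statements.
a = b; c = d;
Console.WriteLine($"{a} {b} {c}"); // 1 1 2

// Reset, then the version with the "c" lost and nothing marking the gap:
// it still compiles, as one chained assignment that also overwrites b.
a = 0; b = 1; c = 0;
a = b = d;
Console.WriteLine($"{a} {b} {c}"); // 2 2 0
```

With the semicolons in place, the same typo would read `a = b; = d;` and the compiler would reject it.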
I've been mulling this question a bit and if I may take a guess at the motivations of the language designers:
C# obviously has semicolons because of its heritage from C. I've been rereading the K&R book lately and it's pretty obvious that Dennis Ritchie really didn't want to force programmers to code the way he thought was best. The book is rife with comments like, "Although we are not dogmatic about the matter, it does seem that goto statements should be used rarely, if at all" and in the section on functions they mention that they chose one of many format styles, it doesn't matter which one you pick, just be consistent.
So the use of an explicit statement terminator allows the programmer to format their code however they like. For better or worse, it seems consistent with how C was originally designed: do it your way.
I would say that the biggest reason that semicolons are necessary after each statement is familiarity for programmers who already know C, C++, and/or Java. C# inherits many syntactical choices from those languages and is not simply named similarly to them. Semicolon-terminated statements are just one of the many syntax choices borrowed from those languages.
Semi-colons are a remnant from the C language, when programmers often wanted to save space by combining statements on one line. i.e.
int i; for( i = 0; i < 10; i++ ) { printf("hello world.\n"); printf("%d instance.\n", i); }
It also helped the compiler, which was not smart enough to simply infer the end of a statement. In almost all cases, combining statements on one line is not looked favorably upon by most C# developers for readability reasons. The above is typically written like so:
int i;
for( i = 0; i < 10; i++ )
{
    printf("hello world.\n");
    printf("%d instance.\n", i);
}
Very verbose! For modern languages, compilers can easily be developed to infer end of statements. C# could be altered into another language which uses no unnecessary delimiters other than a space and indenting tab, i.e.
int i
for i=0 i<10 i++
    printf "hello world.\n"
    printf "%d instance.\n" i
That would certainly save some typing and it looks neater. If indents are used rather than spaces, the code becomes much more readable. We can do one better if we allow types to be inferred and make a special case of for, to read (for [value]=[initial value] to [final value]):
for i=1 to 10 // i is inferred to be an integer
    printf "hello world.\n"
    printf "%d instance.\n" i
Now it's beginning to look like F#, and F#, in some ways, is almost like C# without the unnecessary punctuation. However, F# lacks so many extras (like special .NET language constructs, code completion and good IntelliSense). So, in the end, F# can be more work than C# or VB.NET to implement, sadly.
Personally, my work required VB.NET and I have been happier not having to deal with semicolons. C# is a dated language. LINQ has allowed me to cut down on the number of lines of code I have to write. Still, if I had the time, I would write a version of C# which had many of the features of F#.
You could accurately argue that requiring a semicolon to terminate a statement is superfluous. It is technically possible to remove the semicolon from the C# language and still have it work. The problem is that it leaves room for misinterpretation by humans. I would argue that the necessity of semicolons is the disambiguation for the sake of humans, not the compiler. Without some form of statement delimitation, it is much harder for humans to interpret concise statements such as this:
int i = someFlag ? 12 : 5 int j = i + 3
The compiler should be able to handle this just fine, but to a human the version below looks much better:
int i = someFlag ? 12 : 5; int j = i + 3;
