Why doesn't ; ; result in a build error in VS? - c#

No matter how many ; you place at the end of a line of C# code, the compiler will not show an error and the build succeeds.
In almost all other languages, like C, C++ and Java, this is not allowed.

Your contention that this pattern is illegal in C, C++ and Java is completely false.
I refer you to:
The C Programming Language, 2nd edition, section A9.2:
... the construction is called a null statement; it is often used to supply an empty body to an iteration statement...
The C++ Programming Language, 2nd edition, section r.6.2
An expression statement with the expression missing is called a null statement; it is useful ... to supply a null body to an iteration statement ...
The Java Language Specification, 1st edition, section 14.5
An empty statement does nothing.
The C# Language Specification, 4th edition, section 8.3:
An empty statement is used when there are no operations to perform in a context where a statement is required.
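As a small illustration of the quotes above, here is a sketch of the empty statement in C#: once as the body of a loop, and once as the stray extra semicolons the question asks about. The names `EmptyStatementDemo` and `SkipSpaces` are made up for this example.

```csharp
using System;

public static class EmptyStatementDemo
{
    // Returns the index of the first non-space character.
    // All the work happens in the loop header, so the body is an empty statement.
    public static int SkipSpaces(string text)
    {
        int i = 0;
        for (; i < text.Length && text[i] == ' '; i++)
            ; // empty statement used as the loop body
        return i;
    }

    public static void Main()
    {
        int x = 1;;;   // one declaration followed by two empty statements
        Console.WriteLine(x);
        Console.WriteLine(SkipSpaces("   abc"));
    }
}
```

The extra semicolons after the declaration are simply two more statements that do nothing, which is why the build succeeds.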

The empty statement:
http://msdn.microsoft.com/en-us/library/aa664739(v=vs.71).aspx
No matter how many you have - still does nothing....
You can do the same thing in C/C++, and probably Java too:
Why are empty expressions legal in C/C++?

Interestingly, this is related to Eric Lippert's blog post on why something like this is not a warning.
From the post: "I am often asked why a particular hunk of bad-smelling code does not produce a compiler warning."
http://blogs.msdn.com/b/ericlippert/archive/2011/03/03/danger-will-robinson.aspx
The point being: would it be a good use of the compiler team's valuable time to introduce such a warning?

Why not? ; delimits statements in many forms of code flow. A single ; by itself simply means "nothing happens here". Putting a bunch of ;s together still results in, well, nothing happening!

; is an empty statement, and it's perfectly legitimate. What's your objection to having a series of consecutive ;'s? :-)

Adding multiple ; means you add empty statements. They are legal.

Empty statements (a semicolon with nothing in front of it) are allowed in C, C++, C#, and Java.

It results in blank commands, that's all. Probably Microsoft decided to let their shiny language shoot blanks, if that's what the programmer wants. :)
The compiler is most probably removing or ignoring them altogether as part of optimization.

Also, consider languages where newlines or whitespace is used to mark the end of a code block instead of semi-colons. In these languages it's not an error to leave blank lines.

Related

How does Visual Studio syntax-highlight strings in the Regex constructor?

Hi fellow programmers and nerds!
When creating regular expressions in Visual Studio, the IDE will highlight the string if it's preceded by a verbatim identifier (for example, @"Some string"). This looks something like this:
(Notice the way the string is highlighted). Most of you will have seen this by now, I'm sure.
My problem: I am using a package acquired from NuGet which deals with regular expressions, and they have a function which takes in a regular expression string, however their function doesn't have the syntax highlighting.
As you can see, this just makes reading the Regex string just a pain. I mean, it's not all-too-important, but it would make a difference if we can just have that visually-helpful highlighting to reduce the time and effort one's brain uses trying to decipher the expression, especially in a case like mine where there will be quite a quantity of these expressions.
The question
So what I'm wanting to know is, is there a way to make a function highlight the string this way*, or is it just something that's hardwired into the IDE for the specific case of the Regex c-tor? Is there some sort of annotation which can be tacked onto the function to achieve this with minimal effort, or would it be necessary to use some sort of extension?
*I have wrapped the call to AddStyle() into one of my own functions anyway, and the string will be passed as a parameter, so if any modifications need to be made to achieve the syntax-highlight, they can be made to my function. Therefore the fact that the AddStyle() function is from an external library should be irrelevant.
If it's a lot of work then it's not worth my time, somebody else is welcome to develop an extension to solve this, but if there is a way...
Important distinction
Please bear in mind I am talking about Visual Studio, NOT Visual Studio Code.
Also, if there is a way to pull the original expression string from the Regex, I might do it that way, since performance isn't a huge concern here as this is a once-on-startup thing, however I would prefer not to do it that way. I don't actually need the Regex object.
According to https://devblogs.microsoft.com/dotnet/visual-studio-2019-net-productivity/#regex-language-support and https://www.meziantou.net/visual-studio-tips-and-tricks-regex-editing.htm you can mark the string with a special comment to get syntax highlighting:
// language=regex
var str = @"[A-Z]\d+";
or
MyMethod(/* language=regex */ @"[A-Z]\d+");
(the comment may contain more than just this language=regex part)
The first linked blog talks about a preview, but this feature is also present in the final product.
.NET 7 introduces the new [StringSyntax(...)] attribute, which is used in .NET 7 on more than 350 string, string[], and ReadOnlySpan<char> parameters, properties, and fields to highlight to an interested tool what kind of syntax is expected to be passed or set.
https://devblogs.microsoft.com/dotnet/regular-expression-improvements-in-dotnet-7/?WT_mc_id=dotnet-35129-website&hmsr=joyk.com&utm_source=joyk.com&utm_medium=referral
So for a method argument you should just use:
void MyMethod([StringSyntax(StringSyntaxAttribute.Regex)] string regex);
Here is a video demonstrating the feature: https://youtu.be/Y2YOaqSAJAQ
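For the wrapper function mentioned in the question, a minimal sketch could look like the following. It assumes .NET 7 or later. The class `Styles`, its `AddStyle` wrapper and the `Count` property are hypothetical names for this example; the call to the real library is only indicated by a comment, since its signature is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Text.RegularExpressions;

public static class Styles
{
    private static readonly List<string> Patterns = new List<string>();

    // Hypothetical wrapper: the attribute tells interested tooling that
    // 'pattern' is expected to contain a regular expression.
    public static void AddStyle([StringSyntax(StringSyntaxAttribute.Regex)] string pattern)
    {
        // Constructing a Regex validates the pattern; an invalid one throws here.
        _ = new Regex(pattern);
        Patterns.Add(pattern);
        // A real wrapper would forward 'pattern' to the library's AddStyle() here.
    }

    public static int Count => Patterns.Count;

    public static void Main()
    {
        AddStyle(@"[A-Z]\d+");
        Console.WriteLine(Count);
    }
}
```

The attribute is only a hint for tools; it does not change how the string is compiled or passed at run time.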

@ prefix for identifiers in C#

The "@" character is allowed as a prefix to enable keywords to be used as identifiers.
The majority of .NET developers know about this.
But what we may not know:
Two identifiers are considered the same if they are identical after the "@" prefix is removed.
So
static void Main(string[] args)
{
    int x = 123;
    Console.WriteLine(@x);
}
is absolutely valid code and prints 123 to the console.
I'm curious why we have such a rule in the specs, and how this feature may be used in real-world situations (it doesn't make sense to prefix identifiers with "@" if they are not keywords, right?).
It is totally logical. @ is not part of the name but is a special indicator to not treat what comes after as a keyword but as an identifier.
Eric Lippert has a very good post about it: Verbatim Identifier
I’m occasionally asked why it is that any identifier can be made into
a verbatim identifier. Why not restrict the verbatim identifiers to
the reserved and contextual keywords?
The answer is straightforward. Imagine that we are back in the day
when C# 2.0 just shipped. You have a C# 1.0 program that uses yield
as an identifier, which is entirely reasonable; “yield” is a common
term in many business and scientific applications. Now, C# 2.0 was
carefully designed so that C# 1.0 programs that use yield as an
identifier are still legal C# 2.0 programs; it only has its special
meaning when it appears before return, and that never happened in a C#
1.0 program. But still, you decide that you’re going to mark the usages of yield in your program as verbatim identifiers so that it is
more clear to the future readers of the code that it is being used as
an identifier, not as part of an iterator
Let's consider the example of a program that generates C# code -- for example, something that takes the columns in a database table and creates a comparable C# POCO object, with one property per column.
What if one of the column names matches a C# keyword? The code generator doesn't have to remember which words are keywords or not if all of the property names are prefixed with @.
It's a fail-safe. The extra @ characters don't hurt the code at all!!
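A rough sketch of such a generator, assuming the column names already consist of valid identifier characters. `PocoGenerator` and `Generate` are hypothetical names, and every property is emitted as a `string` purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PocoGenerator
{
    // Emits one property per column. Every name gets the '@' prefix,
    // so the generator never needs to consult a list of keywords.
    public static string Generate(string className, IEnumerable<string> columns)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public class @" + className);
        sb.AppendLine("{");
        foreach (string column in columns)
        {
            sb.AppendLine("    public string @" + column + " { get; set; }");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }

    public static void Main()
    {
        // "class" and "event" are keywords; "id" is not. All are treated alike.
        Console.Write(Generate("Order", new[] { "id", "class", "event" }));
    }
}
```

Consumers of the generated class can still write `order.id` without the prefix, but must write `order.@class` for the keyword-named property.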
The other answers are pretty clear about why the behavior exists, but I think it might be worthwhile to look at the rules for which identifiers are treated as equal.
Quoting the specification section 2.4.2:
Two identifiers are considered the same if they are identical after the following transformations are applied, in order:
The prefix "@", if used, is removed.
Each unicode-escape-sequence is transformed into its corresponding Unicode character.
Any formatting-characters are removed.
Following those rules, @x is identical to x.
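The first two transformations can be seen in a short sketch. `IdentifierEquality` and `Demo` are names made up for this example; `\u0078` is the Unicode escape for the letter x.

```csharp
using System;

public static class IdentifierEquality
{
    public static int Demo()
    {
        int x = 123;
        @x += 1;       // the '@' prefix is removed, so this is the variable x
        \u0078 += 1;   // the escape is turned into 'x', so this is x as well
        return x;      // all three spellings referred to one variable
    }

    public static void Main()
    {
        Console.WriteLine(Demo());
    }
}
```

Because there is only one variable, declaring both `x` and `@x` in the same scope would be a duplicate declaration.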
It provides certainty:
Using @word is future-proof.
No changes are needed if it becomes a keyword later.
Most programmers will not be familiar with every keyword (C# has approx. 100 keywords)
The more recent keywords are "contextual", so sometimes they are not keywords.

Is there a syntactically legal expression that has 2 consecutive identifiers separated only by white space in C#?

That might not be the best way to phrase it, but I'm considering writing a tool that converts identifiers separated by spaces in my code to camel case. A quick example:
var zoo animals = GetZooAnimals(); // i can't help but type this
var zooAnimals = GetZooAnimals(); // i want it to rewrite it like this
I was wondering if writing a tool like this would run into any ambiguities assuming it ignores all keywords. The only reason I can think of is if there is a syntactically valid expression with 2 identifiers only separated by white space.
Looking through the grammar I could not immediately find a place that allows it, but perhaps someone else would know better.
On a side note, I realize this is not a practical solution to a real problem a lot of people have, but just something I do all the time and wanted to take a stab at fixing with tools instead of forcing myself to always write camel case.
It is hard to tell whether a space-separated sequence of identifiers represents a single variable or not without doing full semantic analysis. For example
Myclass myVariable;
is a pair of space-separated identifiers which are perfectly valid. This would cause an ambiguity if you want to camel-case both type names and variable names.
If one enters:
csharp> var i j = 3;
(1,7): error CS1525: Unexpected symbol `j', expecting `,', `;', or `='
in the csharp interactive shell, one gets an error generated by the parser (an (LA)LR parser keeps track of which tokens it expects next). Such a parser works left to right, so it doesn't know which characters will come next; it simply knows that the next token must be one of those listed above.
So there is probably no way to use spaces in a name, at least when declaring a variable.
Furthermore, based on this context-free grammar for C#, there doesn't seem to be a case where one can place two identifiers next to each other. It is, for instance, possible that a primary expression is an identifier, but there is no situation where a primary expression is placed next to an identifier (or with an optional part in between).
As @dasblinkenlight says, you can indeed see the rule "local-variable-declaration":
type variable-declarator
with type that can be evaluated to an identifier and variable-declarator starting with an identifier. You can however know that the type is the first identifier (or the var keyword). Some kind of rewrite rule is thus:
(\w+)(\s+\w+)+ -> \1 concat(\2)
where you need to combine (concat) the identifiers of the second group, at least in the case of an assignment.
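The rewrite rule above could be sketched with .NET regular expressions roughly as follows. This is only an approximation under stated assumptions: the line is a simple declaration with an assignment, the first word is the type (or var), and no modifiers such as static precede it (a line like "static int x = 1;" would be mangled). `SpaceJoiner` and `Rewrite` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpaceJoiner
{
    // Group 1: the type (or var). Group 2, repeated: the pieces of the name.
    // The lookahead requires a single '=' so that only assignments are touched.
    private static readonly Regex Declaration =
        new Regex(@"^(\s*\w+)(\s+\w+)+(?=\s*=(?!=))");

    public static string Rewrite(string line)
    {
        return Declaration.Replace(line, m =>
        {
            // .NET keeps every repetition of group 2 in its Captures collection.
            string[] words = m.Groups[2].Captures.Cast<Capture>()
                              .Select(c => c.Value.Trim())
                              .ToArray();
            // concat: keep the first piece, capitalize the first letter of the rest.
            string name = words[0] + string.Concat(
                words.Skip(1).Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
            return m.Groups[1].Value + " " + name;
        });
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("var zoo animals = GetZooAnimals();"));
    }
}
```

A declaration that already has a single-word name, such as "Myclass myVariable = null;", passes through unchanged because concatenating one piece yields the same piece.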

How to find the suspicious statement, like "Name = Name;" in C# by Regex?

My C# code has a lot of statements like "this.Name=...". To make my code neat, I let the text editor replace every "this." with nothing. The code still worked, but later I found that it caused me a lot of new trouble, because I had written some statements like:
this.Name = Name; // the second Name is a parameter.
After the replacement, it became:
Name = Name;
Now I have too much code to check by hand. How can I find suspicious code like "Name = Name;" with a regex in VS 2010?
Thanks,
Ying
Why would you want to use Regex when you can simply compile the solution and look for the CS1717 warning:
Assignment made to same variable; did
you mean to assign something else?
Also note that in C# it is a good convention to have your parameters start with a lowercase letter.
I would agree that Darin's approach is more robust and should be done first. However, you might have commented-out sections of code, which will be missed with this approach.
To try and find those you can use "Find in Files". In the Find box tick "Use regular expressions" and enter {:i}:Wh*=:Wh*\1
:i C Style identifier ("tagged" expression by enclosing in braces)
:Wh* Zero or more white space chars
\1 back reference to tagged identifier found
This approach might bring back some false positives so you could try :Wh+{:i}:Wh*=:Wh*\1:Wh+ if there are too many but at the risk of missing some matches (e.g. where the closing comment mark is immediately after the assignment statement)
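Outside the VS 2010 Find dialog, the same idea can be sketched with .NET regex syntax, which uses a numbered group and a backreference instead of {:i} and :Wh. The lookbehind and the trailing semicolon are additions that are not in the pattern above; they are meant to cut down on false positives such as "this.Name = Name;". `SelfAssignmentFinder` and `FindLines` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SelfAssignmentFinder
{
    // (?<![\w.])  the identifier must not continue a longer name or follow a '.'
    // (\w+)       the identifier, captured as group 1
    // \s*=\s*     a single '=' with optional white space around it
    // \1\s*;      the same identifier again, then the end of the statement
    private static readonly Regex SelfAssignment =
        new Regex(@"(?<![\w.])(\w+)\s*=\s*\1\s*;");

    // Returns the 1-based numbers of the lines that contain a self-assignment.
    public static List<int> FindLines(string[] lines)
    {
        var hits = new List<int>();
        for (int i = 0; i < lines.Length; i++)
        {
            if (SelfAssignment.IsMatch(lines[i]))
            {
                hits.Add(i + 1);
            }
        }
        return hits;
    }

    public static void Main()
    {
        string[] source =
        {
            "this.Name = Name;",   // fine: a member assigned from a parameter
            "Name = Name;",        // suspicious
            "if (a == a) { }",     // a comparison, not an assignment
            "// Age=Age;"          // suspicious, even though commented out
        };
        Console.WriteLine(string.Join(", ", FindLines(source)));
    }
}
```

Like the Find in Files pattern, this works on text only, so it also reports matches inside comments and string literals.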
You could restore your last commit from your CVS, if you haven't changed too much since.
The problem with doing what you ask is that there might be other cases where "this" shouldn't have been replaced and you haven't seen the problem yet.

C# method contents validation

I need to validate the contents of a C# method.
I do not care about syntax errors that do not affect the method's scope.
I do care about characters that will invalidate parsing of the rest of the code. For example:
method()
{
/* valid comment */
/* <-- bad
for (i..) {
}
for (i..) { <-- bad
}
I need to validate/fix any non-paired characters.
This includes /* */, { }, and maybe others.
How should I go about this?
My first thought was Regex, but that clearly isn't going to get the job done.
You'll need to scope your problem more carefully in order to get a sensible answer.
For example, what are you going to do about methods that contain preprocessor directives?
void M()
{
#if FOO
for(foo;bar;blah) {
#else
while(abc) {
#endif
Blah();
}
}
This is silly but legal, so you have to handle it. Are you going to count that as a mismatched brace or not?
Can you provide a detailed specification of exactly what you want to determine? As we've seen several times on this site, people cannot successfully build a routine that divides two numbers without a specification. You're talking about analysis that is far more complex than dividing two numbers; the code which does what you're describing in the actual compiler is tens of thousands of lines long.
A regex is certainly not the answer to this problem. Regexes are useful tools for certain types of data validation. But once you get into the business of more complicated data, like matching braces or comment blocks, a regex no longer gets the job done.
Here is a blog article on the limitations encountered when using a regex to validate input.
http://blogs.msdn.com/ianhu/archive/2009/11/16/intellitrace-itrace-files.aspx
In order to do this you will have to write a parser of sorts which does the validation.
A regular expression isn't a very convenient thing for such a task. This is often implemented using a stack with an algorithm like the following:
Create an empty stack S.
While there are characters left:
    Read a character ch.
    If ch is an opening paren (of any kind), push it onto S.
    Else if ch is a closing paren (of any kind), look at the top of S:
        If S is empty at this point, report failure.
        If the top of S is the opening paren that corresponds to ch,
        then pop S and continue with the next character; this paren matches OK.
        Else report failure.
If at the end of input the stack S is not empty, return failure.
Else return success.
for more information check http://www.ccs.neu.edu/home/sbratus/com1101/lab4.html and http://codeidol.com/csharp/csharpckbk2/Data-Structures-and-Algorithms/Determining-Where-Characters-or-Strings-Do-Not-Balance/
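The stack algorithm described above might be written in C# along these lines. This is a minimal sketch that looks only at (), [] and {}; it does not know about comments, string literals or preprocessor directives, so it is not a validator for real C# source. `BracketChecker` and `IsBalanced` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BracketChecker
{
    // Maps each closing bracket to the opening bracket it must match.
    private static readonly Dictionary<char, char> Openers =
        new Dictionary<char, char> { { ')', '(' }, { ']', '[' }, { '}', '{' } };

    public static bool IsBalanced(string text)
    {
        var stack = new Stack<char>();
        foreach (char ch in text)
        {
            if (ch == '(' || ch == '[' || ch == '{')
            {
                stack.Push(ch);
            }
            else if (Openers.TryGetValue(ch, out char expected))
            {
                // A closer with nothing open, or with the wrong opener on top, fails.
                if (stack.Count == 0 || stack.Pop() != expected)
                {
                    return false;
                }
            }
        }
        // Anything still on the stack was opened but never closed.
        return stack.Count == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("for (i = 0; i < n; i++) { a[i] = 0; }"));
        Console.WriteLine(IsBalanced("for (i..) {"));
    }
}
```

Extending it to /* */ pairs would require scanning two characters at a time and skipping everything inside a comment, which is where a hand-written checker starts turning into a lexer.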
If you're trying to "validate" the contents of a string defining a method, then you may be better off just trying to use the CodeDom classes and compile the method on the fly into an in memory assembly.
Writing your own fully-functional parser to do validation will be very, very difficult, especially if you want to support C# 3 or later. Lambda expressions and other constructs like that will be very difficult to "validate" cleanly.
You're drawing a false dichotomy between "characters that will invalidate parsing of the rest of the code" and "syntax errors". Lacking a closing curly brace (one of the problems you mention) is a syntax error. It looks like you mean you're looking for syntax errors that potentially break scope boundaries? Unfortunately, there's no robust way to do this short of using a full parser.
As an example:
method()
{ <-- is missing closing brace
/* valid comment */
/* <-- bad
for (i..) {
}
for (i..) {
} <-- will be interpreted as the closing brace for the for loop
There's no general, practical way to infer that it's the for loop that's missing its closing brace, rather than the method.
If you're really interested in looking for these sorts of things, you should consider running the compiler programmatically and parsing the results; that's the best approach with the lowest entry threshold.
