@ prefix for identifiers in C# - c#

The "@" character is allowed as a prefix to enable keywords to be used as identifiers.
The majority of .NET developers know about this.
But what we may not know:
Two identifiers are considered the same if they are identical after the "@" prefix is removed.
So
static void Main(string[] args)
{
    int x = 123;
    Console.WriteLine(@x);
}
is absolutely valid code and prints 123 to the console.
I'm curious why we have such a rule in the spec, and how this feature may be used in real-world situations (it doesn't make sense to prefix identifiers with "@" if they are not keywords, right?).

It is totally logical. The @ is not part of the name; it is a special indicator that what comes after it should be treated as an identifier, not as a keyword.

Eric Lippert has a very good post about it: Verbatim Identifier
I’m occasionally asked why it is that any identifier can be made into
a verbatim identifier. Why not restrict the verbatim identifiers to
the reserved and contextual keywords?
The answer is straightforward. Imagine that we are back in the day
when C# 2.0 just shipped. You have a C# 1.0 program that uses yield
as an identifier, which is entirely reasonable; “yield” is a common
term in many business and scientific applications. Now, C# 2.0 was
carefully designed so that C# 1.0 programs that use yield as an
identifier are still legal C# 2.0 programs; it only has its special
meaning when it appears before return, and that never happened in a C#
1.0 program. But still, you decide that you’re going to mark the
usages of yield in your program as verbatim identifiers so that it is
more clear to the future readers of the code that it is being used as
an identifier, not as part of an iterator.

Let's consider the example of a program that generates C# code: for example, something that takes the columns in a database table and creates a comparable C# POCO object, with one property per column.
What if one of the column names matches a C# keyword? The code generator doesn't have to remember which words are keywords if all of the property names are prefixed with @.
It's a fail-safe. The extra @ characters don't hurt the code at all!
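A minimal sketch of that idea follows. The class name PocoGenerator, the method Generate, and the choice of string for every property type are assumptions made for illustration; the sketch also assumes each column name already consists of valid identifier characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical generator: emits one string property per column name.
public static class PocoGenerator
{
    // Every name gets the "@" prefix, so the generator never needs
    // to know which names happen to be C# keywords.
    public static string Generate(string className, IEnumerable<string> columns)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public class @" + className);
        sb.AppendLine("{");
        foreach (string column in columns)
        {
            sb.AppendLine("    public string @" + column + " { get; set; }");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }

    public static void Main()
    {
        // "class" and "event" are keywords; "id" is not. All three compile once prefixed.
        Console.Write(Generate("Order", new[] { "id", "class", "event" }));
    }
}
```

Because the prefix is not part of the name, a property emitted as @id is still accessed as order.id by callers.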

The other answers are pretty clear about why the behavior exists, but I think it might be worthwhile to look at the rules for which identifiers are treated as equal.
Quoting the specification, section 2.4.2:
Two identifiers are considered the same if they are identical after the following transformations are applied, in order:
The prefix "@", if used, is removed.
Each unicode-escape-sequence is transformed into its corresponding Unicode character.
Any formatting-characters are removed.
Following those rules, @x is identical to x.
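The first two transformations can be observed directly. The following is a small sketch, not taken from the specification; the class and method names are made up for illustration. It relies on nameof returning the identifier without the "@" prefix and on \u0078 being the Unicode escape for the letter x.

```csharp
using System;

public static class VerbatimIdentifierDemo
{
    // Returns the names the compiler sees, plus the final value of x.
    public static string[] Names()
    {
        int x = 123;
        int @class = 456;   // "class" is a keyword, so the prefix is required here

        @x += 1;            // same variable as x
        \u0078 += 1;        // \u0078 is 'x', so this is the same variable again

        return new[] { nameof(x), nameof(@x), nameof(@class), x.ToString() };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Names()));
    }
}
```

All three spellings refer to one local, so both increments land on the same variable.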

It provides certainty:
Using @word is future-proof.
No changes are needed if it becomes a keyword later.
Most programmers will not be familiar with every keyword (C# has approx. 100 keywords).
The more recent keywords are "contextual", so sometimes they are not keywords.

Related

Is there a syntactically legal expression that has 2 consecutive identifiers separated only by white space in C#?

That might not be the best way to phrase it, but I'm considering writing a tool that converts identifiers separated by spaces in my code to camel case. A quick example:
var zoo animals = GetZooAnimals(); // i can't help but type this
var zooAnimals = GetZooAnimals(); // i want it to rewrite it like this
I was wondering if writing a tool like this would run into any ambiguities assuming it ignores all keywords. The only reason I can think of is if there is a syntactically valid expression with 2 identifiers only separated by white space.
Looking through the grammar I could not immediately find a place that allows it, but perhaps someone else would know better.
On a side note, I realize this is not a practical solution to a real problem a lot of people have, but just something I do all the time and wanted to take a stab at fixing with tools instead of forcing myself to always write camel case.
It is hard to tell whether a space-separated sequence of identifiers represents a single variable or not without doing full semantic analysis. For example:
Myclass myVariable;
is a pair of space-separated identifiers which are perfectly valid. This would cause an ambiguity if you want to camel-case both type names and variable names.
If one enters:
csharp> var i j = 3;
(1,7): error CS1525: Unexpected symbol `j', expecting `,', `;', or `='
in the csharp interactive shell, one gets an error generated by the parser (an (LA)LR parser keeps track of what to expect next). Such a parser works left to right, so it doesn't know which characters will come next. It simply knows that the next token must be one of those in the list shown above.
So there is probably no way to declare a variable, at least, with spaces in its name.
Furthermore, based on this context-free grammar for C#, there doesn't seem to be a case where one can place two identifiers next to each other. It is, for instance, possible that a primary expression is an identifier, but there is no situation where a primary expression is placed next to an identifier (or with an optional part in between).
As @dasblinkenlight says, you can indeed see the rule "local-variable-declaration":
type variable-declarator
with a type that can evaluate to an identifier and a variable-declarator that starts with an identifier. You do know, however, that the type is the first identifier (or the var keyword). Some kind of rewrite rule is thus:
(\w+)(\s+\w+)+ -> \1 concat(\2)
where, in the case of an assignment, you need to combine (concat) the identifiers of the second group.
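One way to sketch that rewrite rule with .NET regular expressions is shown below. The names CamelCaseRewriter and Rewrite are hypothetical, and the pattern is narrower than the rule above by assumption: it only fires at the start of a line, only when at least two words follow the type, and only when "=" or ";" comes next, so an ordinary declaration such as Myclass myVariable; is left alone.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseRewriter
{
    // Group 1: the type (or var). Group 2: two or more further words.
    // The lookahead requires "=" or ";" right after the last word.
    private static readonly Regex Declaration =
        new Regex(@"^(\s*\w+)((?:\s+\w+){2,})(?=\s*[=;])");

    public static string Rewrite(string line)
    {
        return Declaration.Replace(line, m =>
        {
            // Split group 2 on whitespace and drop the empty leading entry.
            string[] words = m.Groups[2].Value
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Keep the first word as typed, capitalize the first letter of the rest.
            string name = words[0] + string.Concat(
                words.Skip(1).Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));

            return m.Groups[1].Value + " " + name;
        });
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("var zoo animals = GetZooAnimals();"));
        Console.WriteLine(Rewrite("Myclass myVariable;"));
    }
}
```

This is purely textual, so it inherits the limitation discussed above: without semantic analysis it cannot tell a mistyped variable name from other legal sequences of words, such as modifiers followed by a type and a name.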

Working with Unicode Blocks in Regex

I am trying to add a feature that works with certain Unicode groups in a string. I found this question that suggests the following solution, which removes every character outside the stated range:
s = Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
This works fine.
In my research, though, I came across the use of unicode blocks, which I find to be far more readable.
InBasic_Latin = U+0000–U+007F
More often, I saw recommendations pointing people to use the actual codes themselves (\u0000-\u007F) rather than these blocks (InBasic_Latin). I could see the benefit of explicitly declaring a range when you need some subset of that block or a specific unicode, but when you really just want that entire grouping using the block declaration it seems more friendly to readability and even programmability to use the block name instead.
So, generally, my question is: why would \u0000-\u007F be considered a better syntax than InBasic_Latin?
It depends on your regex engine, but some (like .NET, Java, Perl) do support Unicode blocks:
if (Regex.IsMatch(subjectString, @"\p{IsBasicLatin}")) {
    // Successful match
}
Others don't (like JavaScript, PCRE, Python, Ruby, R and most others), so you need to spell out those codepoints manually or use an extension like Steve Levithan's XRegExp library for JavaScript.
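For .NET specifically, the two spellings can be compared side by side. The sketch below uses made-up helper names (BasicLatinFilter, StripWithRange, StripWithBlock); \P{IsBasicLatin} with a capital P is the negated form of the block, which plays the same role as the negated character class in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BasicLatinFilter
{
    // Explicit code point range, as in the question.
    public static string StripWithRange(string s) =>
        Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);

    // Named block; \P{...} matches anything NOT in the block.
    public static string StripWithBlock(string s) =>
        Regex.Replace(s, @"\P{IsBasicLatin}", string.Empty);

    public static void Main()
    {
        // Escapes are used so the sample does not depend on the file encoding.
        string input = "na\u00EFve caf\u00E9 \u2615";
        Console.WriteLine(StripWithRange(input) == StripWithBlock(input));
    }
}
```

Both calls remove the same three characters from the sample string, so in .NET the choice between them is mostly about readability and portability to other engines.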

Using @ prefix for variable name

In my domain model I have an Event entity. This means that I sometimes have to declare variables as @event, since event is a reserved keyword.
I've read in a few Stack Overflow posts (like What's the use/meaning of the @ character in variable names in C#?) that this is not recommended unless you're interacting with other programming languages. My question is: why is it not recommended? What is the issue with using @?
I could use an "Occasion" entity instead, but that would mean that in my UI layer I would have events which map to occasions?
What you're trying to do is the entire purpose of the @ prefix: to prevent name clashes.
But this text from MSDN says everything you're asking:
The prefix "@" enables the use of keywords as identifiers, which is
useful when interfacing with other programming languages. The
character @ is not actually part of the identifier, so the identifier
might be seen in other languages as a normal identifier, without the
prefix. An identifier with an @ prefix is called a verbatim
identifier. Use of the @ prefix for identifiers that are not keywords
is permitted, but strongly discouraged as a matter of style.
Source
So it just comes down to style. In your case it's the right solution if you don't want to rename your entity.
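As a small illustration of the situation in the question: the Event class below and its Name property are stand-ins, since the actual entity is not shown.

```csharp
using System;

// Stand-in for the domain entity described in the question.
public class Event
{
    public string Name { get; set; }
}

public static class EventDemo
{
    // "event" is a reserved keyword, so the parameter needs the prefix.
    public static string Describe(Event @event)
    {
        return "Event: " + @event.Name;
    }

    public static void Main()
    {
        Event @event = new Event { Name = "Launch" };
        Console.WriteLine(Describe(@event));
    }
}
```

The type name Event needs no prefix because keywords are case-sensitive; only the lowercase variable name clashes.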

@-symbol in C# code [duplicate]

This question already has answers here:
What's the use/meaning of the @ character in variable names in C#?
(9 answers)
Closed 9 years ago.
I was given another developer's project, and while looking through the code I found this:
public interface IPackage
{
    void @Do();
}
I've only seen @ used with strings. What does it mean in this context? Can somebody explain, please? Thank you in advance!
From MSDN:
The prefix "@" enables the use of keywords as identifiers, which is
useful when interfacing with other programming languages. The
character @ is not actually part of the identifier, so the identifier
might be seen in other languages as a normal identifier, without the
prefix. An identifier with an @ prefix is called a verbatim
identifier. Use of the @ prefix for identifiers that are not keywords
is permitted, but strongly discouraged as a matter of style.
The @ character allows you to give your variables names that are reserved as keywords. A common and useful use case for this is:
public static void AnExtensionMethod(this SomeObject @this)
{
    @this.AMethod();
}
In your example the @ seems unnecessary in C#, as Do is not a keyword (do is). It is in VB.NET though, so if this interface will be consumed by VB.NET clients it is needed. In your case it is fine only if there really wasn't any better name for the method than Do. As it is now, it does not seem very readable. Conceptually "package.do" can mean anything, so you could call it x as well. But maybe in your domain language this is a more precise term.
The documentation states:
The prefix "@" enables the use of keywords as identifiers, which is useful when interfacing with other programming languages. The character @ is not actually part of the identifier, so the identifier might be seen in other languages as a normal identifier, without the prefix. An identifier with an @ prefix is called a verbatim identifier. Use of the @ prefix for identifiers that are not keywords is permitted, but strongly discouraged as a matter of style.

how to check that a string isn't a keyword or type in c#

I am writing a code generator in which the variable names are given by the user.
Previous answers have suggested using Regex or CodeDomProvider. The former will tell you if the identifier is valid but doesn't check keywords; the latter checks keywords but doesn't appear to check all types known to the code.
How to determine if a string is a valid variable name?
For instance, a user could name a variable List, or Type, but that is not desirable. How would I prevent this?
The easiest way is to add a list of C# keywords to your application. MSDN has a complete list here.
If you really want to get fancy, you could dynamically compile your generated code and check for the specific errors that you're concerned about. In this case, you're specifically looking for error CS1041:
error CS1041: Identifier expected; '**' is a keyword
You'll probably want to ignore any errors regarding unresolved references, undeclared identifiers, etc.
As others have suggested, you could just prepend your identifiers with @, which is fine if you don't want the user to examine the generated code. If it's something they're going to have to maintain, however, I'd avoid that as (in my opinion) it makes the code noisy, just like $ all over the place in PHP or guys that insist on putting this. in front of every freaking field reference.
I'm not sure there is a full API available which will give you what you're looking for. However, the end result you seem to be looking for is the generation of code which will not cause conflicts with reserved C# keywords or existing types. If that is the case, one approach you can take is to escape all identifiers given by the user with the @ symbol. This allows even reserved keywords in C# to be treated as identifiers.
For example, the following is a completely valid C# program:
class Program
{
    static void Main(string[] args)
    {
        int @byte = 42;
        int @string = @byte;
        int @Program = 0;
    }
}
One option here would be to have your code generator prefix the user-specified name with @. As described in 2.4.2, the @ sign (verbatim identifier):
The prefix "@" enables the use of keywords as identifiers, which is useful when interfacing with other programming languages. The character @ is not actually part of the identifier, so the identifier might be seen in other languages as a normal identifier, without the prefix. An identifier with an @ prefix is called a verbatim identifier. Use of the @ prefix for identifiers that are not keywords is permitted, but strongly discouraged as a matter of style.
This would allow you to check for the main keywords, and deny them as needed, but not worry about all of the conflicting type information, etc.
You could just prepend a @ character to the variable - for instance, @private is a valid variable name.
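The keyword-list suggestion and the prefix suggestion can be combined so that only names that need it are escaped, which keeps the generated code less noisy. The sketch below is illustrative only: IdentifierEscaper and its methods are hypothetical names, and the keyword set was typed in by hand from the reserved-keyword table, so it should be checked against the official list before being relied on. Contextual keywords are deliberately left out, and type names such as List or Type are not detected at all.

```csharp
using System;
using System.Collections.Generic;

public static class IdentifierEscaper
{
    // Reserved keywords only; hand-transcribed, verify against the documentation.
    private static readonly HashSet<string> ReservedKeywords =
        new HashSet<string>(StringComparer.Ordinal)
    {
        "abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char",
        "checked", "class", "const", "continue", "decimal", "default", "delegate",
        "do", "double", "else", "enum", "event", "explicit", "extern", "false",
        "finally", "fixed", "float", "for", "foreach", "goto", "if", "implicit",
        "in", "int", "interface", "internal", "is", "lock", "long", "namespace",
        "new", "null", "object", "operator", "out", "override", "params", "private",
        "protected", "public", "readonly", "ref", "return", "sbyte", "sealed",
        "short", "sizeof", "stackalloc", "static", "string", "struct", "switch",
        "this", "throw", "true", "try", "typeof", "uint", "ulong", "unchecked",
        "unsafe", "ushort", "using", "virtual", "void", "volatile", "while"
    };

    // Keywords are case-sensitive, hence the ordinal comparer above.
    public static bool IsReservedKeyword(string name) => ReservedKeywords.Contains(name);

    // Adds the "@" prefix only when the name would otherwise be a keyword.
    public static string Escape(string name) =>
        IsReservedKeyword(name) ? "@" + name : name;

    public static void Main()
    {
        foreach (string name in new[] { "private", "List", "Do", "do" })
        {
            Console.WriteLine(name + " -> " + Escape(name));
        }
    }
}
```

A generator that wants to reject keywords instead of escaping them can call IsReservedKeyword and report an error to the user.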
