Do C# results vary based on curly brace placement? - c#

We are currently using C# and want to know if C# bracket placements can change the results.
In Javascript, it matters as results vary based on the curly brace placement .
Why do results vary based on curly brace placement?
In JS they should be kept on the same line, if there are problems with browsers incorrectly interpretting it.
if (x == a)
{
...
}
if (x == a) {
...
Does bracket placement matter for C#?

No, they don't.
In JavaScript, you can write code without ending your lines of code with a semicolon, and JavaScript will automatically fill in the missing semicolons when it interprets your code. That's what this answer to the question you linked is essentially stating. That is to say: the brace placement isn't the real issue in JS; it's the ability to write code with/without semicolons and have JS automatically fill these in for you. The brace placement issue is more of a side effect of this functionality.
In C#, a "line" doesn't end until the semicolon is reached (even if that "line" spans multiple physical lines), and writing code without semicolons isn't something that is automagically taken care of for you by the compiler; it will simply fail to compile. The brace placement in C# therefore is unimportant.

Related

.NET Regular Expression (perl-like) for detecting text that was pasted twice in a row

I've got a ton of json files that, due to a UI bug with the program that made them, often have text that was accidentally pasted twice in a row (no space separating them).
Example: {FolderLoc = "C:\testC:\test"}
I'm wondering if it's possible for a regular expression to match this. It would be per-line. If I can do this, I can use FNR, which is a batch text processing tool that supports .NET RegEx, to get rid of the accidental duplicates.
I regret not having an example of one of my attempts to show, but this is a very unique problem and I wasn't able to find anything on search engines resembling it to even start to base a solution off of.
Any help would be appreciated.
Can collect text along the string (.+ style) followed by a lookahead check for what's been captured up to that point, so what would be a repetition of it, like
/(.+)(?=\1)/; # but need more restrictions
However, this gets tripped even just on double leTTers, so it needs at least a little more. For example, our pattern can require the text which gets repeated to be at least two words long.
Here is a basic and raw example. Please also see the note on regex at the end.
use warnings;
use strict;
use feature 'say';
my #lines = (
q(It just wasn't able just wasn't able no matter how hard it tried.),
q(This has no repetitions.),
q({FolderLoc = "C:\testC:\test"}),
);
my $re_rep = qr/(\w+\W+\w+.+)(?=\1)/; # at least two words, and then some
for (#lines) {
if (/$re_rep/) {
# Other conditions/filtering on $1 (the capture) ?
say $1
}
}
This matches at least two words: word (\w+) + non-word-chars + word + anything. That'll still get some legitimate data, but it's a start that can now be customized to your data. We can tweak the regex and/or further scrutinize our catch inside that if branch.
The pattern doesn't allow for any intervening text (the repetition must follow immediately), what is changed easily if needed; the question is whether then some legitimate repetitions could get flagged.
The program above prints
just wasn't able
C:\test
Note on regex This quest, to find repeated text, is much too generic
as it stands and it will surely pick on someone's good data. It is enough to note that I had to require at least two words (with one word that that is flagged), which is arbitrary and still insufficient. For one, repeated numbers realistically found in data files (3,3,3,3,3) will be matched as well.
So this needs further specialization, for what we need to know about data.

Regex to identify C# functions

I need to find all functions in my VS solution with a certain attribute and insert a line of code at the end and at the beginning of each one. For identifying the functions, I've got as far as
\[attribute\]\r?\n(.*)void(.*)\r?\n.*\{\r?\n([^\{\}]*)\}
But that only works on functions that don't contain any other blocks of code delimited by braces. If I set the last capturing group to [\s\S] (all characters), it simply selects all text from the start of the first function to the end of the last one. Is there a way to get around this and select just one whole function?
I am afraid balancing constructs themselves are not enough since you may have unbalanced number of them in the method body. You can still try this regex that will handle most of the caveats:
\[attribute\](?<signature>[^{]*)(?<body>(?:\{[^}]*\}|//.*\r?\n|"[^"]*"|[\S\s])*?\{(?:\{[^}]*\}|//.*\r?\n|"[^"]*"|[\S\s])*?)\}
See demo on RegexStorm
The regex will ignore all { and } in the string literals and //-like comments, and will consume {...} blocks. The only thing it does not support is /*...*/ multiline comments. Please let me know if you also need to account for them.
The bad news is that you can't do that by the Search-And-Replace feature because it doesn't support balancing groups. You can write a separate program in C# that does it for you.
The construct to get the matching closing brace is:
(?=\{)(?:(?<open>\{)|(?<-open>\})|[^\{\}])+?(?(open)(?!))
This matches a block of {...}. But as #DmitryBychenko mentioned it doesn't respect comments or strings.

How to arrange the matched output in c#?

i'm matching words to create simple lexical analyzer.
here is my example code and output
example code:
public class
{
public static void main (String args[])
{
System.out.println("Hello");
}
}
output:
public = identifier
void = identifier
main = identifier
class = identifier
as you all can see my output is not arranged as the input comes. void and main comes after class but in output the class comes at the end. i want to print result as the input is matched.
c# code:
private void button1_Click(object sender, EventArgs e)
{
if (richTextBox1.Text.Contains("public"))
richTextBox2.AppendText("public = identifier\n");
if (richTextBox1.Text.Contains("void"))
richTextBox2.AppendText("void = identifier\n");
if (richTextBox1.Text.Contains("class"))
richTextBox2.AppendText("class = identifier\n");
if (richTextBox1.Text.Contains("main"))
richTextBox2.AppendText("main = identifier\n");
}
Your code is asking the following qustions:
Does the input contain the text "public"? If so, write down "public = identifier".
Does the input contain the text "void"? If so, write down "void = identifier".
Does the input contain the text "class"? If so, write down "class = identifier".
Does the input contain the text "main"? If so, write down "main = identifier".
The answer to all of these questions is yes, and since they're executed in that exact order, the output you get should not be surprising. Note: public, void, class and main are keywords, not identifiers.
Splitting on whitespace?
So your approach is not going to help you tokenize that input. Something slightly more in the right direction would be input.Split() - that will cut up the input at whitespace boundaries and give you an array of strings. Still, there's a lot of whitespace entries in there.
input.Split(new char[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) is a little better, giving us the following output: public, class, {, public, static, void, main, (String, args[]), {, System.out.println("Hello");, } and }.
But you'll notice that some of these strings contain multiple 'tokens': (String, args[]) and System.out.println("Hello");. And if you had a string with whitespace in it it would get split into multiple tokens. Apparently, just splitting on whitespace is not sufficient.
Tokenizing
At this point, you would start writing a loop that goes over every character in the input, checking if it's whitespace or a punctuation character (such as (, ), {, }, [, ], ., ;, and so on). Those characters should be treated as the end of the previous token, and punctuation characters should also be treated as a token of their own. Whitespace can be skipped.
You'll also have to take things like string literals and comments into account: anything in-between two double-quotes should not be tokenized, but be treated as part of a single 'string' token (including whitespace). Also, strings can contain escape sequences, such as \", that produce a single character (that double quote should not be treated as the end of the string, but as part of its content).
Anything that comes after two forward slashes should be ignored (or parsed as a single 'comment' token, if you want to process comments somehow), until the next newline (newline characters/sequences differ across operating systems). Anything after a /* should be ignored until you encounter a */ sequence.
Numbers can optionally start with a minus sign, can contain a dot (or start with a dot), a scientific notation part (e..), which can also be negative, and there are type suffixes...
In other words, you're writing a state machine, with different behaviour depending on what state you're in: 'string', 'comment', 'block comment', 'numeric literal', and so on.
Lexing
It's useful to assign a type to each token, either while tokenizing or as a separate step (lexing). public is a keyword, main is an identifier, 1234 is an integer literal, "Hello" is a string literal, and so on. This will help during the next step.
Parsing
You can now move on to parsing: turning a list of tokens into an abstract syntax tree (AST). At this point you can check if a list of tokens is actually valid code. You basically repeat the above step, but at a higher level.
For example, public, protected and private are keyword tokens, and they're all access modifiers. As soon as you encounter one of these, you know that either a class, a function, a field or a property definition must follow. If the next token is a while keyword, then you signal an error: public while is not a valid C# construct. If, however, the next token is a class keyword, then you know it's a class definition and you continue parsing.
So you've got a state machine once again, but this time you've got states like 'class definition', 'function definition', 'expression', 'binary expression', 'unary expression', 'statement', 'assignment statement', and so on.
Conclusion
This is by no means complete, but hopefully it'll give you a better idea of all the steps involved and how to approach this. There are also tools available that can generate parsing code from a grammar specification, which can ease the job somewhat (though you still need to learn how to write such grammars).
You may also want to read the C# language specification, specifically the part about its grammar and lexical structure. The spec can be downloaded for free from one of Microsofts websites.
CodeCaster is right. You are not on the right path.
I have an lexical analyzer made by me some time ago as a project.
I know, I know I'm not supposed to put things on a plate here, but the analyzer is for c++ so you'll have to change a few things.
Take a look at the source code and please try to understand how it works at least: C++ Lexical Analyzer
In the strictest sense, the reason for the described behaviour is that in the evaluating code, the search for void comes before the search for class. However, the approach in total seems far too simple for a lexical analysis, as it simply checks for substrings. I totally second the comments above; depending on what you are trying to achieve in the big picture, a more sophisticated approach might be necessary.

Regex to Identify Try-Catch and Try-Catch finally blocks contained in a .cs file

Hi I am trying to create an App to help me switch out the test contents of my Catch-Block and replace them with production contents. I am able to read through my file, and parse the contents, but having problems creating a regex( I am brand new to this still) to identify the try-catch block, so I can either choose to delete or change the contents of the catch block. Anyone able to help me solve this problem please??
so far I have the expression below(does not work at all)
try{*}catch(*){*)
Thanks in advance.
You can't write a regex that does this, because regex can't be used to match nested patterns. Which means that it won't be able to identify when your closing brace occurs vs other nested braces in your code. You will need a parser generator such as ANTLR like the linked answer suggests to accomplish this.
I'd suggest taking a look at Microsoft's Roslyn compiler that is under development. It's APIs should probably allow you to accomplish whatever it is you're looking to do. It is currently in preview.
I think this would drive you to a solution:
try\s*\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\}\s*catch\s*\([^()]*(\([^()]*\))*\)\s*\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\}
try\s* matches try followed by zero or more spaces.
\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\} matches a block ({ followed by zero or more characters except { and } followed by zero or more blocks with any number of characters preceding and succeeding each)
\s*catch\s*\([^()]*(\([^()]*\))*\) matches zero or more spaces preceding and following catch, then something inside brackets ()
\s*\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\} similar to try block
Note: May fail in case of ones with comments containing {s or }s

C# method contents validation

I need to validate the contents of a C# method.
I do not care about syntax errors that do not affect the method's scope.
I do care about characters that will invalidate parsing of the rest of the code. For example:
method()
{
/* valid comment */
/* <-- bad
for (i..) {
}
for (i..) { <-- bad
}
I need to validate/fix any non-paired characters.
This includeds /* */, { }, and maybe others.
How should I go about this?
My first thought was Regex, but that clearly isn't going to get the job done.
You'll need to scope your problem more carefully in order to get a sensible answer.
For example, what are you going to do about methods that contain preprocessor directives?
void M()
{
#if FOO
for(foo;bar;blah) {
#else
while(abc) {
#endif
Blah();
}
}
This is silly but legal, so you have to handle it. Are you going to count that as a mismatched brace or not?
Can you provide a detailed specification of exactly what you want to determine? As we've seen several times on this site, people cannot successfully build a routine that divides two numbers without a specification. You're talking about analysis that is far more complex than dividing two numbers; the code which does what you're describing in the actual compiler is tens of thousands of lines long.
A regex is certainly not the answer to this problem. Regex's are useful tools for certain types of data validation. But once you get into the business of more complicated data like matching braces or comment blocks a regex no longer gets the job done.
Here is a blog article on the limitations encountered when using a regex to validate input.
http://blogs.msdn.com/ianhu/archive/2009/11/16/intellitrace-itrace-files.aspx
In order to do this you will have to write a parser of sorts which does the validation.
A regular expression isn't a very convenient thing for such a task. This is often implemented using a stack with an algorithm like the following:
Create an empty stack S.
While( there are characters left ){
Read a character ch.
If is ch an opening paren (of any kind), push it onto S
Else
If ch is a closing paren (of any kind), look at the top of S.
If S is empty as this point, report failure.
If the top of S is the opening paren that corresponds to c,
then pop S and continue to 1, this paren matches OK.
Else report failure.
If at the end of input the stack S is not empty, return failure.
Else return success.
for more information check http://www.ccs.neu.edu/home/sbratus/com1101/lab4.html and http://codeidol.com/csharp/csharpckbk2/Data-Structures-and-Algorithms/Determining-Where-Characters-or-Strings-Do-Not-Balance/
If you're trying to "validate" the contents of a string defining a method, then you may be better off just trying to use the CodeDom classes and compile the method on the fly into an in memory assembly.
Writing your own fully-functional parser to do validation will be very, very difficult, especially if you want to support C# 3 or later. Lambda expressions and other constructs like that will be very difficult to "validate" cleanly.
You're drawing a false dichotomy between "characters that will invalidating parsing the rest of the code" and "syntax errors". Lacking a closing curly brace (one of the problems you mention) is a syntax error. It looks like you mean you're looking for syntax errors that potentially break scope boundaries? Unfortunately, there's no robust way to do this short of using a full parser.
As an example:
method()
{ <-- is missing closing brace
/* valid comment */
/* <-- bad
for (i..) {
}
for (i..) {
} <-- will be interpreted as the closing brace for the for loop
There's no general, practical way to infer that it's the for loop that's missing its closing brace, rather than the method.
If you're really interested in looking for these sort of things, you should consider running the compiler programmatically and parsing the results - that's the best approach with the lowest entry threshold.

Categories

Resources