Regex to identify C# functions - c#

I need to find all functions in my VS solution with a certain attribute and insert a line of code at the end and at the beginning of each one. For identifying the functions, I've got as far as
\[attribute\]\r?\n(.*)void(.*)\r?\n.*\{\r?\n([^\{\}]*)\}
But that only works on functions that don't contain any other blocks of code delimited by braces. If I set the last capturing group to [\s\S] (all characters), it simply selects all text from the start of the first function to the end of the last one. Is there a way to get around this and select just one whole function?

I am afraid balancing constructs themselves are not enough since you may have unbalanced number of them in the method body. You can still try this regex that will handle most of the caveats:
\[attribute\](?<signature>[^{]*)(?<body>(?:\{[^}]*\}|//.*\r?\n|"[^"]*"|[\S\s])*?\{(?:\{[^}]*\}|//.*\r?\n|"[^"]*"|[\S\s])*?)\}
See demo on RegexStorm
The regex will ignore all { and } in the string literals and //-like comments, and will consume {...} blocks. The only thing it does not support is /*...*/ multiline comments. Please let me know if you also need to account for them.

The bad news is that you can't do that by the Search-And-Replace feature because it doesn't support balancing groups. You can write a separate program in C# that does it for you.
The construct to get the matching closing brace is:
(?=\{)(?:(?<open>\{)|(?<-open>\})|[^\{\}])+?(?(open)(?!))
This matches a block of {...}. But as #DmitryBychenko mentioned it doesn't respect comments or strings.

Related

.NET Regular Expression (perl-like) for detecting text that was pasted twice in a row

I've got a ton of json files that, due to a UI bug with the program that made them, often have text that was accidentally pasted twice in a row (no space separating them).
Example: {FolderLoc = "C:\testC:\test"}
I'm wondering if it's possible for a regular expression to match this. It would be per-line. If I can do this, I can use FNR, which is a batch text processing tool that supports .NET RegEx, to get rid of the accidental duplicates.
I regret not having an example of one of my attempts to show, but this is a very unique problem and I wasn't able to find anything on search engines resembling it to even start to base a solution off of.
Any help would be appreciated.
Can collect text along the string (.+ style) followed by a lookahead check for what's been captured up to that point, so what would be a repetition of it, like
/(.+)(?=\1)/; # but need more restrictions
However, this gets tripped even just on double leTTers, so it needs at least a little more. For example, our pattern can require the text which gets repeated to be at least two words long.
Here is a basic and raw example. Please also see the note on regex at the end.
use warnings;
use strict;
use feature 'say';
my #lines = (
q(It just wasn't able just wasn't able no matter how hard it tried.),
q(This has no repetitions.),
q({FolderLoc = "C:\testC:\test"}),
);
my $re_rep = qr/(\w+\W+\w+.+)(?=\1)/; # at least two words, and then some
for (#lines) {
if (/$re_rep/) {
# Other conditions/filtering on $1 (the capture) ?
say $1
}
}
This matches at least two words: word (\w+) + non-word-chars + word + anything. That'll still get some legitimate data, but it's a start that can now be customized to your data. We can tweak the regex and/or further scrutinize our catch inside that if branch.
The pattern doesn't allow for any intervening text (the repetition must follow immediately), what is changed easily if needed; the question is whether then some legitimate repetitions could get flagged.
The program above prints
just wasn't able
C:\test
Note on regex This quest, to find repeated text, is much too generic
as it stands and it will surely pick on someone's good data. It is enough to note that I had to require at least two words (with one word that that is flagged), which is arbitrary and still insufficient. For one, repeated numbers realistically found in data files (3,3,3,3,3) will be matched as well.
So this needs further specialization, for what we need to know about data.

c# regex - changing pattern matches until find specific word

usually i can workaround and get everything works by myself, but this one is kinda tricky, even msdn references and examples confuses more than helps.
i have testing some codes and stuck at mixing a capture grouping for changing with a non-capturing group, to stop the matchings when i wish
a simpler code that i want to change is:
stats = "label:100,value:7878,label:110,value:7879,something,label:200,value:8888";
valor = "value:8080";
i know if i use
pattern = #"value:(\d+)";
i can change every value number to 8080 when i do
Regex.Replace(stats, pattern, valor);
but i need he stops changing these when find 'something' string
i managed to change every single char to 'valor' until he finds 'something' using
pattern = #"^(?:(?!something).)*";
is there a way to only change 'value:(\d+)' numbers to 'valor' , along with the ?:(?!something) to stop the matchings in the same sentence?
ive seen lots of examples but they never said something like this so i dunno if its possible to merge both conditions at same time
You can make use of a look-behind solution that makes sure there is no something before the value:
(?<!\bsomething\b.*)value:\d+
See demo
Note that something is matched as a whole word due to \b word boundaries.
The result of replace operation:
Note that (?:(?!something).) is very inefficient and should be used when no other means works. In .NET, there is a powerful variable-width look-behind, which is the right tool for this task.
Also note that if you are not using capture group backreferences, you do not need those capturing groups in your pattern (I remove parentheses from around \d+).

Regex to Identify Try-Catch and Try-Catch finally blocks contained in a .cs file

Hi I am trying to create an App to help me switch out the test contents of my Catch-Block and replace them with production contents. I am able to read through my file, and parse the contents, but having problems creating a regex( I am brand new to this still) to identify the try-catch block, so I can either choose to delete or change the contents of the catch block. Anyone able to help me solve this problem please??
so far I have the expression below(does not work at all)
try{*}catch(*){*)
Thanks in advance.
You can't write a regex that does this, because regex can't be used to match nested patterns. Which means that it won't be able to identify when your closing brace occurs vs other nested braces in your code. You will need a parser generator such as ANTLR like the linked answer suggests to accomplish this.
I'd suggest taking a look at Microsoft's Roslyn compiler that is under development. It's APIs should probably allow you to accomplish whatever it is you're looking to do. It is currently in preview.
I think this would drive you to a solution:
try\s*\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\}\s*catch\s*\([^()]*(\([^()]*\))*\)\s*\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\}
try\s* matches try followed by zero or more spaces.
\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\} matches a block ({ followed by zero or more characters except { and } followed by zero or more blocks with any number of characters preceding and succeeding each)
\s*catch\s*\([^()]*(\([^()]*\))*\) matches zero or more spaces preceding and following catch, then something inside brackets ()
\s*\{[^{}]*([^{}]*\{[^{}]*\}[^{}]*)*[^{}]*\} similar to try block
Note: May fail in case of ones with comments containing {s or }s

how to replace the exact word by another in a list?

I have a list like :
george fg
michel fgu
yasser fguh
I would like to replace fg, fgu, and fguh by "fguhCool" I already tried something like this :
foreach (var ignore in NameToPoulate)
{
tempo = ignore.Replace("fg", "fguhCool");
NameToPoulate_t.Add(tempo);
}
But then "fgu" become "fguhCoolu" and "fguh" become "fguhCooluh" is there are a better idea ?
Thanks for your help.
I assume that this is a homework assignment and that you are being tested for the specific algorihm rather than any code that does the job.
This is probably what your teacher has in mind:
Students will realize that the code should check for "fguh" first, then "fgu" then "fg". The order is important because replacing "fg" will, as you have noticed, destroy a "fguh".
This will by some students be implemented as a loop with if-else conditions in them. So that you will not replace a "fg" that is within an already replaced "fguhCool".
But then you will find that the algorithm breaks down if "fg" and "fgu" are both within the same string. You cannot then allow the presence of "fgu" prevent you to check for "fg" at a different part of the string.
The answer that your teacher is looking for is probably that you should first locate "fguh", "fgu" and "fg" (in that order) and replace them with an intermediary string that doesn't contain "fg". Then after you have done that, you can search for that intermediary string and replace it with "fguhCool".
You could use regular expressions:
Regex.Replace(#"\bfg\b", "fguhCool");
The \b matches a so-called word boundary which means it matches the beginnnig or end of a word (roughly, but for this purpose enough).
Use a regular expression:
Regex.Replace("fg(uh?)?", "fguhCool");
An alternative would be replacing the long words for the short ones first, then replacing the short for the end value (I'm assuming all words - "fg", "fgu" and "fguh" - would map to the same value "fguhCool", right?)
tempo = ignore
.Replace("fguh", "fg")
.Replace("fgu", "fg")
.Replace("fg", "fguhCool");
Obs.: That assumes those words can appear anywhere in the string. If you're worried about whole words (i.e. cases where those words are not substrings of a bigger word), then see #Joey's answer (in this case, simple substitutions won't do, regexes are really the best option).

Regex Help (again)

I don't really know what to entitle this, but I need some help with regular expressions. Firstly, I want to clarify that I'm not trying to match HTML or XML, although it may look like it, it's not. The things below are part of a file format I use for a program I made to specify which details should be exported in that program. There is no hierarchy involved, just that each new line contains a 'tag':
<n>
This is matched with my program to find an enumeration, which tells my program to export the name value, anyway, I also have tags like this:
<adr:home>
This specifies the home address. I use the following regex:
<((?'TAG'.*):(?'SUBTAG'.*)?)?(\s+((\w+)=('|"")?(?'VALUE'.*[^'])('|"")?)?)?>
The problem is that the regex will split the adr:home tag fine, but fail to find the n tag because it lacks a colon, but when I add a ? or a *, it then doesn't split the adr:home and similar tags. Can anyone help? I'm sure it's only simple, it's just this is my first time at creating a regular expression. I'm working in C#, by the way.
Will this help
<((?'TAG'.*?)(?::(?'SUBTAG'.*))?)?(\s+((\w+)=('|"")?(?'VALUE'.*[^'])('|"")?)?)?>
I've wrapped the : capture into a non capturing group round subtag and made the tag capture non greedy
Not entirely sure what your aim is but try this:
(?><)(?'TAG'[^:\s>]*)(:(?'SUBTAG'[^\s>:]*))?(\s\w+=['"](?'VALUE'[^'"]*)['"])?(?>>)
I find this site extremely useful for testing C# regex expressions.
What if you put the colon as part of the second tag?
<((?'TAG'.*)(?':SUBTAG'.*)?)?(\s+((\w+)=('|"")?(?'VALUE'.*[^'])('|"")?)?)?>

Categories

Resources