Regular Expression Pattern Matching - c#

Hi, I need to transform a string like this:
Actually **ctu** is a good university but **ctu's** is not. There are many **,ctus,** present.
I want to wrap each occurrence of ctu in the string, like this:
Actually **<s>ctu<e>** is a good university but **<s>ctu's<e>** is not. There are many **,<s>ctus<e>,** present.
But with the following pattern
**\\bctu*(?:['\\\\|""\\\\]*)\\w+\\b**
I'm getting this output:
A**<s>ctu<e>**ally **<s>ctu<e>** is a good university but **<s>ctu's<e>** is not. There are many **,ctus,** present.
I don't want to replace ctu inside words such as Actually, and I also need to replace " ,ctus, " with " ,<s>ctus<e>, ".
How do I achieve this using a regex? I need this in C#.
Thanks in advance.

The following regex matches all the cases listed in your example:
@"(\bctu(?:'\w+)?\w*\b)"
Then just replace the match with "<s>$1<e>", where $1 refers to the first capture group of the match above (in .NET replacement strings, groups are written as $1, not \1).
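As a minimal sketch of how the pattern and replacement above might be wired together in C# (the class and method names here are made up for illustration, not part of the original answer):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class CtuTagger
{
    // \bctu      : "ctu" at the start of a word (so not inside "Actually")
    // (?:'\w+)?  : optional apostrophe suffix such as 's
    // \w*\b      : any remaining word characters (e.g. the "s" in "ctus")
    private static readonly Regex Pattern =
        new Regex(@"(\bctu(?:'\w+)?\w*\b)");

    // "$1" inserts the text captured by the first group.
    public static string Tag(string input)
    {
        return Pattern.Replace(input, "<s>$1<e>");
    }
}
```

Calling CtuTagger.Tag on the sample sentence from the question should leave "Actually" untouched and wrap ctu, ctu's and ctus. Note that the pattern also wraps any other word that merely starts with ctu.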

Are you looking for @"\bctu\b" ("ctu" with word boundaries on both sides, so it matches the standalone ctu but not the ctu inside Actually or ,ctus,) as the first search pattern, and ",ctus," (exactly the string ,ctus,, regardless of where it falls) as the second? Note that @"\bctu\b" also matches the ctu part of ctu's, because the apostrophe is not a word character. To search for both at once, you could use @"(\bctu\b|,ctus,)".
As a slight aside, in C# regex patterns are much easier to write with the @"" notation (verbatim strings) than with regular "" strings. For example, for the regex engine to see a word boundary \b, you can write @"\b" or "\\b", and to match a literal \ you can write @"\\" or "\\\\". The verbatim form is easier to read, especially in more complex patterns.
If this doesn't answer your question, please give a clear example of expected input/output.
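To make the verbatim-string point concrete, here is a small sketch (class and method names are hypothetical) that compares the two literal styles and applies the combined pattern suggested above:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class for illustration only.
public static class VerbatimDemo
{
    // The regular literal needs doubled backslashes; the verbatim
    // literal does not. Both produce the same string.
    public static bool SameString()
    {
        return "\\bctu\\b" == @"\bctu\b";
    }

    // Wraps every match of the combined pattern in <s>...<e>.
    public static string TagBoth(string input)
    {
        return Regex.Replace(input, @"(\bctu\b|,ctus,)", "<s>$1<e>");
    }
}
```

With this pattern the commas around ctus end up inside the tags, and only the ctu part of ctu's is wrapped, so it is not a drop-in match for the output requested in the question.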

Related

how can I use unnamed Regex groups in C# inside my regex?

Hey, my current regex is @"(into)(to)add\s[^\s]{1,}\1|\2[^\s]{1,}". I want the input to look like "add word into/to category". The regex works fine in general, except for the \1|\2 part. I tried groups and all sorts of other solutions, but I can't figure out how to make the input accept either into or to.
Can anyone help me out? (this is in C# and using the Regex class)
If I have understood you correctly, you don't need backreferences to (unnamed) groups; a simple alternation will do, like this:
@"add \w+ (into|to) \w+"
That will select either into or to in the search string.
Edit:
Let's get a little more 'advanced', using the optional quantifier '?':
@"add \w+ (in)?to \w+"
This will match 'in' zero or one time, followed by 'to', so it will match into as well as to, exactly as the original RegEx.
Edit2:
I have a feeling you want to use a variable inside your regex. You can of course do that like this:
string search = "into|to";
Regex regex = new Regex(@"add \w+ (" + search + @") \w+");
From your example, I think you're looking for a regex like add\s\w+\s(into|to)\s\w+. Your current regex can only match text containing "intoto", which is probably not what you want.
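A rough sketch of the alternation approach in use, assuming the input has the shape "add <word> into|to <category>" and that both parts are single \w+ tokens (the class name, method name and the anchors are illustrative additions):

```csharp
using System.Text.RegularExpressions;

// Hypothetical parser for illustration only.
public static class AddCommandParser
{
    // (?:into|to) accepts either keyword without capturing it;
    // group 1 is the word, group 2 is the category.
    private static readonly Regex Pattern =
        new Regex(@"^add\s+(\w+)\s+(?:into|to)\s+(\w+)$");

    public static bool TryParse(string input, out string word, out string category)
    {
        Match m = Pattern.Match(input);
        word = m.Success ? m.Groups[1].Value : string.Empty;
        category = m.Success ? m.Groups[2].Value : string.Empty;
        return m.Success;
    }
}
```

For example, TryParse("add apple into fruit", out var w, out var c) should succeed with w set to "apple" and c set to "fruit", while "add apple onto fruit" should be rejected.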

Regex - negative look-behind anywhere on line

How do I match a pattern only if there isn't a specific character before it on the same line?
I have the following regex code:
pattern = @"(?<=^|[\s.(<;])(?<!//)(" + Regex.Escape(keyword) + @")(?=[\s.(>])";
replacement = "<span style='" + keywordStyle + "'>$1</span>";
code = Regex.Replace(code, pattern, replacement);
I would like to add a criterion so that it only matches if there aren't 2 slashes before it on the same line (a C# comment).
I played around with it and modified the pattern:
pattern = @"(?<!\/\/)(?<=^|[\s.(<;])(?<!//)(" + Regex.Escape(keyword) + @")(?=[\s.(>])";
But apparently this only works if the 2 slashes are 2 characters right before the keyword.
So this pattern wouldn't match "//foreach", but would match "// foreach".
Can negative look-behinds be used in this case, or can I accomplish this some other way, besides negative look-behinds?
Thank you.
EDIT:
Guess I wasn't clear enough. To reiterate my problem:
I'm working on syntax highlighting, and I need to find matches for C# keywords, like "foreach". However, I also need to take into account comments, which are introduced by 2 slashes. I don't want to match the keyword "foreach" if it is part of a comment (2 slashes anywhere before it on the same line).
The negative lookbehind doesn't help me in this case because the slashes will not necessarily be right before the keyword, for example "// some text foreach" - I don't want this foreach to match.
So again, my question is: How can I modify my pattern to only match if 2 slashes aren't anywhere before it on the same line?
Hope my question is clear now.
Simplifying your regex pattern a bit, what about the following? It makes use of the non-greedy match on "//" plus 0 or more characters thereafter.
(?<!//.*?)(?<Keyword>foreach)
Without knowing exactly what you're attempting, it's hard to say what the best solution is, but most likely it's simply to check the beginning of the line for // before you bother trying the regex, especially if there can be more than one keyword per line.
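The lookbehind idea above relies on .NET allowing variable-length lookbehinds. A minimal sketch, assuming plain // comments, the default behaviour where . does not match a newline, and a made-up [keyword] marker in place of the real span markup (class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical highlighter for illustration only.
public static class KeywordHighlighter
{
    public static string Highlight(string code, string keyword)
    {
        // (?<!//.*) fails when "//" followed by any characters on the
        // same line ends right before the keyword. Because "." does not
        // match "\n" by default, the check never crosses a line break.
        string pattern = @"(?<!//.*)\b(" + Regex.Escape(keyword) + @")\b";

        // "[$1]" is a stand-in for the real highlighting markup.
        return Regex.Replace(code, pattern, "[$1]");
    }
}
```

One known limitation of this sketch: a // inside a string literal (for example a URL) also suppresses matches later on that line, so a real highlighter would need to track string literals as well.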
Try this:
^\s*(?<!//.*)\s*foreach
For C# code analysis, try the reliable, open-source Irony - .NET Language Implementation Kit from CodePlex.
If you're doing things with Syntax Highlighting, you really should take a look at this CodeProject article: Fast Colored TextBox for Syntax Highlighting
This project is a code editor window that also does syntax highlighting, and it uses regular expressions. Maybe it does what you need (and maybe more). The author seems to have given syntax highlighting a lot of thought. I tried the foreach case you described, including a "foreach" inside a comment, and it displayed nicely.

Regex : replace a string

I'm currently facing a (little) blocking issue. I'd like to replace one substring with another using a regular expression. But here is the catch: I suck at regex.
Regex.Replace(contenu, "Request.ServerVariables("*"))",
"ServerVariables('test')");
Basically, I'd like to replace whatever is between the double quotes with "test". I tried ".{*}" as a pattern, but it doesn't work.
Could you give me some tips, I'd appreciate it!
There are several issues you need to take care of.
1. You are using special characters in your regex (., parens, quotes) -- you need to escape these with a backslash. And you need to escape the backslashes with another backslash as well because we're in a C# string literal, unless you prefix the string with @, in which case the escaping rules are different.
2. The expression to match "any number of whatever characters" is .*. In this case, you would want to match any number of non-quote characters, which is [^"]*.
3. In contrast to (1) above, the replacement string is not a regular expression, so you don't want any backslashes there.
4. You need to store the return value of the replace somewhere.
The end result is
var result = Regex.Replace(contenu,
@"Request\.ServerVariables\(""[^""]*""\)",
"Request.ServerVariables('test')");
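The end result above can be wrapped into a small, self-contained sketch (class and method names are hypothetical; the pattern and replacement are the ones from the answer):

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration only.
public static class ServerVariablesRewriter
{
    public static string Rewrite(string contenu)
    {
        // In a verbatim string, "" stands for one double quote.
        // [^""]* matches any run of characters that are not quotes.
        return Regex.Replace(
            contenu,
            @"Request\.ServerVariables\(""[^""]*""\)",
            "Request.ServerVariables('test')");
    }
}
```

Regex.Replace returns a new string and leaves its input unchanged, which is why the result has to be assigned or returned.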
Based purely on my knowledge of regex (and not how they are done in C#), the pattern you want is probably:
"[^"]*"
ie - match a " then match everything that's not a " then match another "
You may need to escape the double-quotes to make your regex-parser actually match on them... that's what I don't know about C#
Try to avoid '.*' in regexes where you can; you can usually get what you want by excluding specific characters instead, for example [^"]+ for text without quotes, or ([^)]+) for text without a closing parenthesis. So you may just want "([^"]+)", which should give you the whole match in [0] and the text between the quotes ('test') in [1].
You could also just replace '"' with '' I think.
Taryn East's regex includes the *. You should remove it if it is just a placeholder for any value (without the *, the pattern matches exactly one character between the quotes):
"[^"]"
BTW: You can test this regex with this cool editor: http://rubular.com/r/1MMtJNF3kM

How to Check if a String is a "string" or a RegEx?

How can I check whether a string in a textbox is a plain string or a regex?
I'm searching through a text file line by line.
Either by .Contains(Textbox.Text); or by Regex(Textbox.Text) Match(currentLine)
(I know, syntax isn't working like this, it's just for presentation)
Now my Program is supposed to autodetect if Textbox.Text is in form of a RegEx or if it is a normal String.
Any suggestions? Should I write my own little regex to detect whether the textbox contains a regex?
Edit:
I failed to add that my strings can be very simple, like Foo or 0005.
I'm trying the suggested solutions right away!
You can't detect regular expressions with a regular expression, as regular expressions themselves are not a regular language.
However, the easiest you probably could do is trying to compile a regex from your textbox contents and when it succeeds you know that it's a regex. If it fails, you know it's not.
But this would classify ordinary strings like "foo" as regular expressions too. Depending on what you need to do, this may or may not be a problem. If it's a search string, then the results are identical in this case. In the case of "foo.bar" they would differ, though, since it's a valid regex but matches different things than the string itself.
My advice, also stated in another comment, would be to simply always enable regex search, since splitting the code paths here gains you nothing apart from a dubious performance benefit (which is unlikely to make any difference, if there is much of a benefit at all).
Many strings could be a regex, every regex could actually be a string.
Consider "thin.": it could be either a string ('.' is a dot) or a regex ('.' is any character).
I would just add a checkbox where the user indicates if he enters a regex, as usual in many applications.
One possible solution depending on your definition of string and regex would be to check if the string contains any regex typical characters.
You could do something like this:
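string s = "I'm not a Regex";
// Regex.Escape leaves s unchanged only if it has no regex metacharacters.
// Note: it also escapes white space and '#', so this sample string
// (which contains spaces) does not pass the check.
if (s == Regex.Escape(s))
{
    // no regex metacharacters found
}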
Try and use it in a regex and see if an exception is thrown.
This approach only checks if it is a valid regex, not whether it was intended to be one.
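The two checks discussed in this thread (try to compile the text, and look for metacharacters via Regex.Escape) can be sketched as follows. The class and method names are made up for illustration; the sketch relies on the Regex constructor throwing an ArgumentException for an invalid pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical classifier for illustration only.
public static class RegexInputClassifier
{
    // True if the text compiles as a .NET regex. This says nothing about
    // whether the user intended it as one: "Foo" is a valid pattern too.
    public static bool IsValidPattern(string text)
    {
        try
        {
            new Regex(text);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    // True if Regex.Escape had to change the text, i.e. it contains
    // metacharacters (which, for Regex.Escape, include white space and '#').
    public static bool HasMetacharacters(string text)
    {
        return text != Regex.Escape(text);
    }
}
```

Neither check can read the user's mind: "foo.bar" is both a valid regex and a plausible literal search string, which is why an explicit checkbox is often the more reliable option.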
Another approach could be to check whether it is surrounded by slashes (i.e. '/foo/'). Surrounding regexes with slashes is common practice (although you must remove the slashes before feeding the pattern into the regex library).

Referring to an already existing group in a regex, C#

I have a regex where
%word% can occur multiple times, separated by a "<"
%word% is defined as ".*?"|[a-zA-Z]+
So I wrote:
(".*"|[a-zA-Z]+)([<](".*"|[a-zA-Z]+))*
Is there any way I can shrink it using capturing groups, for example:
(".*"|[a-zA-Z]+)([<]\1)*
But I don't think \1 can be used, as it would mean "repeat the text of the first capture", and I don't know in advance what was captured: it can be a quoted string or a word.
Is there anything similar I can use to refer to the previously written group's pattern? I'm working in C#.
No, there is no way to repeat the regex group's pattern literally, but you can use String.Format to avoid the repetition:
String.Format("{0}([<]{0})*", @"("".*""|[a-zA-Z]+)")
As the feature is not supported yet, I made a string replacer: I wrote the specific words that need to be replaced between % signs, and then wrote a program that replaces each of them with the regular expression defined for that word.
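A minimal sketch of the String.Format approach, using the %word% definition from the question (".*?"|[a-zA-Z]+). The class and member names, the ^ and $ anchors and the non-capturing groups are illustrative additions:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class WordListPattern
{
    // One "word": a quoted string or a run of letters.
    private const string Word = @"(?:"".*?""|[a-zA-Z]+)";

    // Substitute the sub-pattern twice instead of typing it twice.
    public static readonly string Pattern =
        String.Format("^{0}(?:<{0})*$", Word);

    public static bool IsMatch(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

This only works as written because the pattern contains no literal { or } characters. A quantifier such as {1,} in the format string would have to be doubled ({{1,}}) for String.Format, or the pattern could be built with plain string concatenation instead.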
