I am new to Stack Overflow (this is my first post) and to regex.
I am currently working on a quick-and-dirty app to replace base-class properties with constructor-injected fields (because I need to edit about 400 files).
It should find this:
ClassName(WiredObjectRegistry registry) : base(registry)
{
and replace with:
ClassName(IDependency paramName, ISecondDependency secondParam, ... )
{
_fieldName = paramName;
...
So I need to replace the two old lines with three or more new lines.
Basically, I was thinking:
find this ->
className + ctorParams + zero or more
whitespaces + newline + zero or more
whitespaces + {
replace with ->
className + newCtorParams + newline +
{
my field assignments
I tried this regex for .NET:
className + ctorParam + @"\w*" + "\r|\n" + @"\w*" + @"\{"
which does not replace the "{" and the whitespace correctly.
The replaced file content looks like this:
public CacheManager(ICallManager callManager, ITetraEventManager tetraEventManager, IConferenceManager conferenceManager, IAudioManager audioManager)
{
_callManager = callManager;
_tetraEventManager = tetraEventManager;
_conferenceManager = conferenceManager;
_audioManager = audioManager;
{
Can you please help me with this? :-|
david
If you're translating
className + ctorParams + zero or more whitespaces + newline + zero or more whitespaces + {
into regex as
className + ctorParam + @"\w*" + "\r|\n" + @"\w*" + @"\{"
then you're making several errors.
First, the character class for whitespace is \s. \w matches a "word" character (letters, digits and the underscore).
Second, "\r|\n" puts the alternation operator | at the top level of the pattern, which splits the entire regex into two alternatives (= "match either the regex before the | or the regex after the |"). In your case, you don't need this bit at all, since \s already matches spaces, tabs and newlines. If you do want a regex that matches a Unix, Mac or DOS newline, use \r?\n? (note that this can also match the empty string; a grouped (?:\r\n|\r|\n) is stricter).
But, as the comments show, unless you show us what you really want to do, we can't help you further.
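For what it's worth, here is a minimal sketch of how the corrected pattern could be used, written as a C# 9+ top-level program. The sample source text, the single ICallManager dependency and the variable names are assumptions made up for illustration; only the \s* idea comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical file content with the old constructor signature.
string source =
    "public CacheManager(WiredObjectRegistry registry) : base(registry)\r\n" +
    "    {\r\n" +
    "    }";

string className = "CacheManager";

// Escape the literal part so that "(" and ")" are not treated as regex groups.
string oldCtor = Regex.Escape(className + "(WiredObjectRegistry registry) : base(registry)");

// \s* matches spaces, tabs and newlines, so no separate "\r|\n" part is needed.
string pattern = oldCtor + @"\s*\{";

string replacement =
    className + "(ICallManager callManager)\r\n" +
    "    {\r\n" +
    "        _callManager = callManager;";

// A match evaluator is used so that "$" in the replacement is never read as a substitution.
string result = Regex.Replace(source, pattern, _ => replacement);
Console.WriteLine(result);
```

Because the "{" is part of the match, it is consumed and re-emitted by the replacement, so no stray second brace is left behind.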
Related
I have the following pattern:
UnallowedCharacters = @"<>\{\}" + "\"";
@"^(?<contactType>\d+):(?<contactIdentifier>[^;" + UnallowedCharacters + @"]+)(;(?<parameterName>[A-Za-z0-9_-]+)=(?<parameterValue>[^;=" + UnallowedCharacters + "]+))*$"
I need to allow semicolons in the contactIdentifier part, but I cannot simply stop treating the semicolon as a separator, because the later split would no longer work.
Two examples of input and expected output are the following:
input: "8:test;aliases=1:test@outlook.com,4:test" => after parsing, expected output should be "8:test" for contactIdentifier part
input: "8:test;.person@domain.com;aliases=1:test@outlook.com,4:test" => after parsing, expected output should be "8:test;.person@domain.com" for contactIdentifier part
The semicolons are used for splitting the unparsed string into multiple parts during parsing, but I want to allow them in the contactIdentifier character group without affecting the existing matching and parsing logic.
Any ideas?
If I have understood the question, you can do this:
UnallowedCharacters = @"<>{}"""; (no need to escape braces inside a character class)
(?<contactIdentifier>(?:[;]|[^" + UnallowedCharacters + @"])+)
Explanation:
I changed the <contactIdentifier> group to:
?<contactIdentifier> the name
(?: start of (non capturing) group
[;]| ';' OR:
[^" + UnallowedCharacters + @"] one character not in class
)+ The whole group repeated one or more times.
) end of the named group
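Here is a rough sketch of that idea against the two inputs from the question, as a C# 9+ top-level program. One thing in it is my own assumption and not part of the answer: the identifier group uses the lazy quantifier +? instead of +. With a greedy +, the identifier (which may now contain ';' and '=') would run to the end of the string and swallow the parameters.

```csharp
using System;
using System.Text.RegularExpressions;

string unallowed = @"<>{}""";   // the characters < > { } and "

string pattern =
    // Lazy +? (an assumption of this sketch): take as little as possible,
    // so the ";name=value" parameters are left for the next group.
    @"^(?<contactType>\d+):(?<contactIdentifier>(?:;|[^" + unallowed + @"])+?)" +
    @"(;(?<parameterName>[A-Za-z0-9_-]+)=(?<parameterValue>[^;=" + unallowed + @"]+))*$";

string[] inputs =
{
    "8:test;aliases=1:test@outlook.com,4:test",
    "8:test;.person@domain.com;aliases=1:test@outlook.com,4:test"
};

foreach (string input in inputs)
{
    Match m = Regex.Match(input, pattern);
    Console.WriteLine(m.Success
        ? m.Groups["contactType"].Value + ":" + m.Groups["contactIdentifier"].Value
        : "no match");
}
// prints:
// 8:test
// 8:test;.person@domain.com
```

In the second input, ";.person@domain.com" stays in the identifier because ".person" cannot match the parameter-name class, so the engine has to extend the identifier until the real ";aliases=" parameter.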
I want to split this string with a regex, but I really don't know how. I have figured out how to get the ID, but not the other parts.
string text = "1cb07348-34a4-4741-b50f-c41e584370f7 Youtuber https://youtube.com/lol love youtube";
string regexstring = "[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]*(?<id>)";
code
Match m = Regex.Match(text, regexstring);
if(m.Success)
Console.WriteLine(m.Groups[0]);
Output
1cb07348-34a4-4741-b50f-c41e584370f7
Now I want the output to be this:
1cb07348-34a4-4741-b50f-c41e584370f7
Youtuber
https://youtube.com/lol
love youtube
What I have so far produces the first line of the output, but I don't know how to match the other parts.
(\w+-){4}\w+ is a cleaner way to write what you already have (the ID has four hyphens, not five).
\w means [a-zA-Z0-9_] (in .NET it also matches other Unicode letters and digits by default).
Then, if your string always has a URL preceded and followed by words separated by spaces, you can do this:
string regexstring = @"((\w*-){4})(\w*) (.+?)[A-Za-z]?(https://[^ ]+?) (.+)";
Code
Match m = Regex.Match(text, regexstring);
if(m.Success)
// Group 2 is the repeated inner (\w*-) group and only holds its last capture, so it is skipped here.
Console.WriteLine(m.Groups[1].Value + m.Groups[3].Value + "\n" + m.Groups[4].Value + "\n" + m.Groups[5].Value + "\n" + m.Groups[6].Value);
Assuming all your inputs look like the sample, this expression might be close to what you have in mind, though I'm not sure:
^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s+(.*?)\s+[A-Z](https?:\/\/\S+)\s+(.*)$
The expression is explained on the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs, if you like.
Reference
Searching for UUIDs in text with regex
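As a rough sketch, the same idea can be written with named groups against the sample string from the question (C# 9+ top-level program). The group names are made up here, and the pattern is a variation rather than either answer's exact expression: it drops the letter that both patterns allow or require directly before "https", since the sample string has no such character.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "1cb07348-34a4-4741-b50f-c41e584370f7 Youtuber https://youtube.com/lol love youtube";

string pattern =
    @"^(?<id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+" + // UUID: 8-4-4-4-12 hex digits
    @"(?<name>.*?)\s+" +                                        // everything up to the URL (lazy)
    @"(?<url>https?://\S+)\s+" +                                // URL runs until the next whitespace
    @"(?<rest>.*)$";                                            // remainder of the line

Match m = Regex.Match(text, pattern);
if (m.Success)
{
    Console.WriteLine(m.Groups["id"].Value);
    Console.WriteLine(m.Groups["name"].Value);
    Console.WriteLine(m.Groups["url"].Value);
    Console.WriteLine(m.Groups["rest"].Value);
}
// prints:
// 1cb07348-34a4-4741-b50f-c41e584370f7
// Youtuber
// https://youtube.com/lol
// love youtube
```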
Regexes are simple, yet complex at times. I am stuck replacing variables in an expression, assuming a variable has the following pattern:
\w+(\.\w+)*
I want to replace the dot (.) in every occurrence of a variable, because I eventually have to tokenize the expression and the tokenizer does not recognize variables containing dots. So I thought I would replace the dots with underscores before parsing. After tokenizing, however, I want to get each variable token back with its original value.
Expression:
(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3
Three Variables:
x1.y2.z3
y2_z1
x1.y2.z3
Desired Output:
(x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
Question 1: How do I use Regex.Replace in this case?
Question 2: Is there a better way to address this problem? A variable can itself contain an underscore, so replacing dots with underscores does not let me recover the original variable from the tokens.
This regex pattern seems to work: [a-zA-Z]+\d+\S+
To replace a dot found only in a match you use MatchEvaluator:
private static char charToReplaceWith = '_';
static void Main(string[] args)
{
string s = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
Console.WriteLine(Regex.Replace(s, @"[a-zA-Z]+\d+\S+", new MatchEvaluator(ReplaceDotWithCharInMatch)));
Console.Read();
}
private static string ReplaceDotWithCharInMatch(Match m)
{
return m.Value.Replace('.', charToReplaceWith);
}
Which gives this output:
(x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
I don't fully understand your second question and how to deal with tokenizing variables that already have underscores, but you should be able to choose the replacement character (i.e., if the string already contains '_', you choose a different character). You will probably have to maintain a dictionary that records "I replaced all dots with underscores, and all underscores with ^", etc.
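One way the dictionary idea could look, as a sketch only (C# 9+ top-level program): each dotted variable is swapped for a generated placeholder, and the placeholder is mapped back to the original after tokenizing. The "__var" prefix and the variable pattern are assumptions of this sketch; it presumes no real variable already starts with "__var" and that variables begin with a letter or underscore.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string expression = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";

var toPlaceholder = new Dictionary<string, string>(); // original name -> placeholder
var toOriginal = new Dictionary<string, string>();    // placeholder -> original name

// An identifier followed by at least one ".part"; numbers such as 9.99 do not match
// because the first character must be a letter or underscore.
string encoded = Regex.Replace(expression, @"\b[A-Za-z_]\w*(?:\.\w+)+", m =>
{
    if (!toPlaceholder.TryGetValue(m.Value, out var token))
    {
        token = "__var" + toPlaceholder.Count;   // hypothetical placeholder scheme
        toPlaceholder[m.Value] = token;
        toOriginal[token] = m.Value;
    }
    return token;
});

Console.WriteLine(encoded);               // (__var0 + 9.99) + y2_z1 - __var0
Console.WriteLine(toOriginal["__var0"]);  // x1.y2.z3
```

After tokenizing the encoded string, any token found in toOriginal can be translated back, and y2_z1 is never touched, so there is no ambiguity between dots and underscores.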
Try this:
string input = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
string output = Regex.Replace(input, "\\.(?=[a-z])", "_");
This will replace only periods which are followed by a letter (a-z).
Use Regex' negative lookahead by making a group that starts with (?!
A dot followed by something non-numeric would be as simple as this:
// matches any dot NOT followed by a character in the range 0-9
String output = Regex.Replace(input, "\\.(?![0-9])", "_");
This has the advantage that although [0-9] is part of the expression, it is only checked against what follows the match and is not actually part of the match.
I made a regex pattern and tested it on this site: http://rubular.com/
I entered the pattern into the first box on that site exactly like this:
<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>
I left the second box empty.
According to that site, my regex pattern works perfectly fine.
But I can't get it working in C#.
I'm trying this:
WebClient client = new WebClient();
string MainPage = client.DownloadString("http://www.vatanbilgisayar.com/cep-telefonu-modelleri/");
string ItemPattern = "<div class=\"product clearfix\">\\n+" + // <div class="product clearfix">\n
"<div class=\"img\">\\n" + // <div class="img">\n
"+<a href=\"(.*?)\">\\n" + // +<a href="(.*?)">\n
"+<img class=\"lazyload\"" + // +<img class="lazyload"
"id='.*' data-original=\"(.*?)\"" + // id='.*' data-original="(.*?)"
"alt=\".*\" title=\"(.*?)\"\\/>"; // alt=".*" title="(.*?)" \/>
MatchCollection matches = Regex.Matches(MainPage, ItemPattern);
foreach (Match match in matches)
{
Console.WriteLine("Area Code: {0}", match.Groups[1].Value);
Console.WriteLine("Telephone number: {0}", match.Groups[2].Value);
Console.WriteLine();
}
I simply escaped every " with \. I really don't understand why it's not working, and this is starting to drive me crazy.
You need two layers of escape sequences: one for C# and one for the regex syntax.
If you want to escape characters for the regex, you have to escape the \ too, so change each \ to \\ for escape sequences at the regex level.
Use two \'s for every single \ in your string, not counting the escaping you already did for the quotes, since \ is an escape character in C# as well. This mainly affects "\n", which occurs three times.
Original String:
"product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/
Also, you can break that up into more than one line. C# ignores whitespace between tokens, so just close the quote and add a "+" to the end of the line, then continue by starting with another quote. Whitespace inside the quotes still counts, so keep the spaces that separate the attributes.
C# String:
string ItemPattern = "<div class=\"product clearfix\">\\n" + // <div class="product clearfix">\n
"+<div class=\"img\">\\n" + // +<div class="img">\n
"+<a href=\"(.*?)\">\\n" + // +<a href="(.*?)">\n
"+<img class=\"lazyload\" " + // +<img class="lazyload" (keep the trailing space)
"id='.*' data-original=\"(.*?)\" " + // id='.*' data-original="(.*?)" (keep the trailing space)
"alt=\".*\" title=\"(.*?)\" \\/>"; // alt=".*" title="(.*?)" \/>
If you still have a problem with it, something else is wrong, probably in the Regex.Matches(MainPage, ItemPattern) call. According to the debugging you did, it sounds like the string is being created successfully but the MatchCollection is empty. So the problem is either in how you obtain the matches or in how you reference them.
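If the doubled backslashes get hard to read, a verbatim string is another option. The sketch below (C# 9+ top-level program) assumes a made-up HTML fragment in place of the downloaded page, and it uses \s+ instead of \n+ between the tags on the assumption that the real page may use \r\n line endings. Inside @"..." a backslash is literal and a quote is written as "".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical markup standing in for the result of DownloadString.
string page =
    "<div class=\"product clearfix\">\r\n" +
    "<div class=\"img\">\r\n" +
    "<a href=\"/phone-1\">\r\n" +
    "<img class=\"lazyload\" id='p1' data-original=\"/img/1.jpg\" alt=\"Phone 1\" title=\"Phone 1\" />";

// Verbatim strings: only the quotes need doubling, regex escapes stay as written.
string itemPattern =
    @"<div class=""product clearfix"">\s+" +
    @"<div class=""img"">\s+" +
    @"<a href=""(.*?)"">\s+" +
    @"<img class=""lazyload"" id='.*?' data-original=""(.*?)"" alt="".*?"" title=""(.*?)"" />";

MatchCollection matches = Regex.Matches(page, itemPattern);
foreach (Match match in matches)
{
    Console.WriteLine("Link:  {0}", match.Groups[1].Value);
    Console.WriteLine("Image: {0}", match.Groups[2].Value);
    Console.WriteLine("Title: {0}", match.Groups[3].Value);
}
```

Note that every space between attributes is inside the quotes; a missing space at a line join is enough to make the whole pattern fail silently.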
I found the following code to find the n-th occurrence of a value in a text here.
This is the code:
public static int NthIndexOf(this string target, string value, int n)
{
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");
if (m.Success)
return m.Groups[2].Captures[n - 1].Index;
else
return -1;
}
I tried to find the index of the second occurrence of "< /form>" (the space does not appear in the original string) in a webpage, and it failed, although the string definitely occurs in the text. I also cut off a prefix of the webpage so that the second occurrence became the first, and then I did manage to find the expression as the first occurrence.
In one of the comments on this code, someone wrote that "This Regex does not work if the target string contains linebreaks.".
My two questions are:
Why doesn't this code work if the target string contains linebreaks?
How can I fix this code so that it also works for strings that contain linebreaks (replacing/removing the linebreaks is not considered a good solution for me)?
I don't look for other techniques to do the same thing.
By default, . does not match a newline, so .*? only matches up to the end of the line.
For what you want you need to use the Singleline mode, so your code should look something like this:
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);
By default, the . in a regular expression stops at a newline. To fix this, you need to specify the Singleline option (Multiline only changes the meaning of ^ and $, it does not affect the dot):
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);
You can find more information about RegExOptions here.
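Putting this together, a sketch of the extension method with the Singleline option might look like this (C# 9+ top-level program). Two things are my own additions and not in the original snippet: the call to Regex.Escape, so that a value containing regex metacharacters is matched literally, and the StringExtensions class name with the small sample string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical multi-line input with two closing form tags.
string html = "<form>\n  a\n</form>\n<form>\n  b\n</form>\n";

int second = html.NthIndexOf("</form>", 2);
Console.WriteLine(second);
Console.WriteLine(second == html.IndexOf("</form>", html.IndexOf("</form>") + 1));

public static class StringExtensions
{
    public static int NthIndexOf(this string target, string value, int n)
    {
        // Escape the value so characters such as "." or "(" are taken literally.
        string pattern = "((" + Regex.Escape(value) + ").*?){" + n + "}";

        // Singleline lets "." match "\n", so ".*?" can span several lines.
        Match m = Regex.Match(target, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[2].Captures[n - 1].Index : -1;
    }
}
```

Without RegexOptions.Singleline the same call returns -1 for this input, because the two occurrences sit on different lines.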