I want to search through my code and replace all instances of someObject.ToString() with Convert.ToString(someObject).
For example if I have:
var x = someClassInstance.ToString()
I want to replace it with:
var x = Convert.ToString(someClassInstance)
Is it possible to do this through regular expression?
The solution will differ slightly based on your environment, but for example in Notepad++:
Search for ([0-9a-zA-Z_]+)\.toString\(\).
Replace with Convert.toString\($1\).
In C#, you can use the following regex:
\b(\w+)\.ToString\(\)
It starts by matching a Word boundary and then graps all Word characters before the dot and ToString(). Note the escaped characters, they have special meaning in regex,
You then need to replace it with:
Convert.ToString($1)
Here '$1' will be replaced by the matched Group 1 from the regex (the name of the method).
Edit:
The above regex will fail if the method name is a call to a method, like 'myMethod(param).ToString()'.
I have changed the regex to accept anything not being a dot followed by 'ToString' (since the code can already compile, there's no need for further syntax checking):
\b((?!\.ToString)(?:[\w.()+*/-])*?)\.ToString\(\)
Now it should include function calls.
Example of match: 'SomeFunction(Int32.MaxValue-1).ToString()'
It will fail, if there are Spaces in the match.
Related
So i have the following RegEx for the purpose of finding and adding whitespace:
(\S)(\()
So for a string like "SomeText(Somemoretext)" I want to update this to "SomeText (Somemoretext)" it matches "t(" and so my replace eliminates the "t" from the string which is not good. I also do not know what the character could be, I'm merely trying to find the non-existence of whitespace.
Is there a better expression to use or is there a way to exclude the found character from the match returned so that I can safely replace without catching characters i do not want to replace?
Thanks
I find lookarounds hard to read and would prefer using substitutions in the replacement string instead:
var s = Regex.Replace("test1() test2()", #"(\S)\(", "$1 (");
Debug.Assert(s == "test1 () test2 ()");
$1 inserts the first capture group from the regex into the replacement string which is the non-space character before the opening parenthesis (.
If you need to detect the absence of space before a specific character (such as bracket) after a word, how about the following?
\b(?=[^\s])\(
This will detect words ( [a-zA-z0-9_] that are followed by a bracket, without a space).
(if I got your problem correctly) you can replace the full match with ( and get exactly what you need.
In case you need to look for absence spaces before a symbol (like a bracket) in any kind of text (as in the text may be non-word, such as punctuation) you might want to use the following instead.
^(?:\S*)(\()(?:\S*)$
When using this, your result will be in group 1, instead of just full match (which now contains the whole line, if a line is matched).
I'm trying to learn regex, but still have no clue. I have this line of code, which successfully seperates the placeholder 'FirstWord' by the '{' delimiter from all following text:
var regexp = new Regex(#"(?<FirstWord>.*?)\{(?<TextBetweenCurlyBrackets>.*?)\}");
Which reads this string with no problem:
Greetings{Hello World}
What I want to do is to replace the '{' with a character chain like for instance '/>>'
so I tried this:
var regexp = new Regex(#"(?<FirstWord>.*?)\/>>(?<OtherText>.*?)\");
I removed the last bracket and replaced the first one with '/>>' But it throws an ArgumentException. How would the correct character combination look like?
/ does not need to be escaped, unless you use it as the pattern-delimiter.:
#"(?<FirstWord>.*?)/>>(?<OtherText>.*?)\"
Also your last \ will basically escape the " which should end the String (c#-wise: remove it):
#"(?<FirstWord>.*?)/>>(?<OtherText>.*?)"
And since you want most likely fetch until the END of the String (.*? will fetch as less characters as required to satisfy the expression), you should use the $ at the end or use any other sort of delimiter (whitspace, linebreak, etc...).
#"(?<FirstWord>.*?)/>>(?<OtherText>.*?)$"
Example:
(.*?)/>>(.*?)$
Debuggex Demo
Removing the trailing $ will fetch the empty string for the second match group, because "" is the shortest string possible satisfying the expression .*?
(.*?)/>>(.*?)$ on This/>>Test One will match This and Test One
(.*?)/>>(.*?)\s on This/>>Test One will match This and Test
(.*?)/>>(.*?) on This/>>Test One will match This and ""
Note: I'm saying "" is the shortest string possible satisfying the expression .?* on purpose! A frequent Misstake is to interpret .*?a as "everything until a":
Regex is greedy by default!
Searching for the expressiong (.*?)a$ on "caba" will NOT fail to match - it will return cab!, because cab followed by a is satisfying the expression AND cab is the shortest string possible for any match.
One might also expect b to be matched - but regex is working from left to right, hence aborting once it found cab - even if b would be shorter.
I'm having a hard time understanding why the following expression \\[B.+\\] and code returns a Matches count of 1:
string r = "\\[B.+\\]";
return Regex.Matches(Markup, sRegEx);
I want to find all the instances (let's call them 'tags') (in a variable length HTML string Markup that contains no line breaks) that are prefixed by B and are enclosed in square brackets.
If the markup contains [BName], I get one match - good.
If the markup contains [BName] [BAddress], I get one match - why?
If the markup contains [BName][BAddress], I also only get one match.
On some web-based regex testers, I've noticed that if the text contains a CR character, I'll get a match per line - but I need some way to specify that I want matches returned independent of line breaks.
I've also poked around in the Groups and Captures collections of the MatchCollection, but to no avail - always just one result.
You are getting only one match because, by default, .NET regular expressions are "greedy"; they try to match as much as possible with a single match.
So if your value is [BName][BAddress] you will have one match - which will match the entire string; so it will match from the [B at the beginning all the way to the last ] - instead of the first one. If you want two matches, use this pattern instead: \\[B.+?\\]
The ? after the + tells the matching engine to match as little as possible... leaving the second group to be its own match.
Slaks also noted an excellent option; specifying specifically that you do not wish to match the ending ] as part of the content, like so: \\[B[^\\]]+\\] That keeps your match 'greedy', which might be useful in some other case. In this specific instance, there may not be much difference - but it's an important thing to keep in mind depending on what data/patterns you might be dealing with specifically.
On a side note, I recommend using the C# "literal string" specifier # for regular expression patterns, so that you do not need to double-escape things in regex patterns; So I would set the pattern like so:
string pattern = #"\[B.+?\]";
This makes it much easier to figure out regular expressions that are more complex
Try the regex string \\[B.+?\\] instead. .+ on it's own (same is pretty much true for .*) will match against as many characters as possible, whereas .+? (or .*?) will match against the bare minimum number of characters whilst still satisfying the rest of the expression.
.+ is a greedy match; it will match as much as possible.
In your second example, it matches BName] [BAddress.
You should write \[B[^\]]+\].
[^\]] matches every character except ], so it is forced to stop before the first ].
I'm trying to split a string into tokens (via regular expressions)
in the following way:
Example #1
input string: 'hello'
first token: '
second token: hello
third token: '
Example #2
input string: 'hello world'
first token: '
second token: hello world
third token: '
Example #3
input string: hello world
first token: hello
second token: world
i.e., only split up the string if it is NOT in single quotation marks, and single quotes should be in their own token.
This is what I have so far:
string pattern = #"'|\s";
Regex RE = new Regex(pattern);
string[] tokens = RE.Split("'hello world'");
This will work for example #1 and example #3 but it will NOT work for example #2.
I'm wondering if there's theoretically a way to achieve what I want with regular expressions
You could build a simple lexer, which would involve consuming each of the tokens one by one. So you would have a list of regular expressions and would attempt to match one of them at each point. That is the easiest and cleanest way to do this if your input is anything beyond the very simple.
Use a token parsor to split into tokens. Use regex to find a string patterns
'[^']+' will match text inside single quotes. If you want it grouped, (')([^']+)('). If no matches are found, then just use a regular string split. I don't think it makes sense to try to do the whole thing in one regular expression.
EDIT: It seems from your comments on the question that you actually want this applied over a larger block of text rather than just simple inputs like you indicated. If that's the case, then I don't think a regular expression is your answer.
While it would be possible to match ' and the text inside separately, and also alternatively match the text alone, RegExp does not allow an indefinite number of matches. Or better said, you can only match those objects you explicitely state in the expression. So ((\w+)+\b) could theoretically match all words one-by-one. The outer group will correctly match the whole text, and also the inner group will match the words separately correctly, but you will only be able to reference the last match.
There is no way to match a group of matched matches (weird sentence). The only possible way would be to match the string and then split it into separate words.
Not exactly what you are trying to do, but regular expression conditions might help out as you look for a solution:
(?<quot>')?(?<words>(?(quot)[^']|\w)+)(?(quot)')
If a quote is found, then it matches until a non-quote is found. Otherwise looks at word characters. Your results are in groups named "quot" and "words".
You'll have hard time using Split here, but you can use a MatchCollection to find all matches in your string:
string str = "hello world, 'HELLO WORLD': we'll be fine.";
MatchCollection matches = Regex.Matches(str, #"(')([^']+)(')|(\w+)");
The regex searches for a string between single quotes. If it cannot find one, it takes a single word.
Now it gets a little tricky - .net returns a collection of Matchs. Each Match has several Groups - the first Group has the whole string ('hello world'), but the rest have sub-matches (',hello world,'). Also, you get many empty unsuccessful Groups.
You can still iterate easily and get your matches. Here's an example using LINQ:
var tokens = from match in matches.Cast<Match>()
from g in match.Groups.Cast<Group>().Skip(1)
where g.Success
select g.Value;
tokens is now a collection of strings:
hello, world, ', HELLO WORLD, ', we, ll, be, fine
You can first split on quoted string, and then further tokenize.
foreach (String s in Regex.Split(input, #"('[^']+')")) {
// Check first if s is a quote.
// If so, split out the quotes.
// If not, do what you intend to do.
}
(Note: you need the brackets in the pattern to make sure Regex.Split returns those too)
Try this Regular Expression:
([']*)([a-z]+)([']*)
This finds 1 or more single quotes at the beginning and end of a string. It then finds 1 or more characters in the a-z set (if you don't set it to be case insensitive it will only find lower case characters). It groups these so that group 1 has the ', group 2 (or more) has the words which are split by anything that is not a character a - z and the last group has the single quote if it exists.
Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
Which does my work by splitting the words on every occurance of a ",". What i want to know is say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find me the words v1,v2,v3,v4..
I tried ->
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their statemachine implementation. Doesn't it work in the same way.
If a string starts with Value_name followed by one or more whitespaces. Go to Next State. In That State read a word until a "," comes. Then do it again! And each word will be grouped!
Am i wrong in understanding it?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modified (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the ,, and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In you trials, the Regex wouldn't match multiple times: you have to remember that if you want a Regex to match multiple occurrences in a strings, the whole Regex needs to match every single occurrence in the string, so you have to build your Regex not only to match the start of the string 'value_name', but also match every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(#"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
for (int i = 1; i < m.Groups.Count; i++) {
Group g = m.Groups[i];
if (g.Success) {
// matched text: g.Value
// match start: g.Index
// match length: g.Length
}
}
m = m.NextMatch();
}
I would expect it only to get v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this is going to depend on the methods you use on the regular expression, but it may make sense to make two passes, first grab all the fields seperated by commas and then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g will theoreticly grab them, but you will have to use RegExp.exec() in a loop to get the capture, rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);