This is the input string: 23x^45*y or 2x^2 or y^4*x^3.
I am matching ^[0-9]+ after letter x. In other words I am matching x followed by ^ followed by numbers. Problem is that I don't know that I am matching x, it could be any letter that I stored as variable in my char array.
For example:
foreach (char cEle in myarray) // cEle is letter in char array x, y, z, ...
{
match cEle in regex(input) // PSEUDOCODE
}
I am new to regex. I know that this can be done if I build the regex from variables, but I don't know how.
You can use the pattern @"[cEle]\^\d+", which you can create dynamically from your character array:
string s = "23x^45*y or 2x^2 or y^4*x^3";
char[] letters = { 'e', 'x', 'L' };
string regex = string.Format(@"[{0}]\^\d+",
    Regex.Escape(new string(letters)));
foreach (Match match in Regex.Matches(s, regex))
Console.WriteLine(match);
Result:
x^45
x^2
x^3
A few things to note:
It is necessary to escape the ^ inside the regular expression; otherwise it has the special meaning "start of line".
It is a good idea to use Regex.Escape when inserting literal strings from a user into a regular expression, so that characters they type are not misinterpreted as special characters.
This will also match the x at the end of variables with longer names, like tax^2. This can be avoided by requiring a word boundary (\b) before the letter, although that also rejects a letter that directly follows a digit, as in 23x^45.
If you write x^1 as just x then this regular expression will not match it. This can be fixed by using (\^\d+)?.
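The last two notes can be combined in one pattern. Because \b in front of the letter would also reject 23x^45 (the digit 3 and the letter x are both word characters), this sketch uses lookarounds that only forbid a neighbouring letter. The class name, method name and sample input are made up for illustration, and it assumes the character array holds plain letters only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VariableTerms
{
    // Finds each listed variable, with an optional ^exponent, unless the
    // letter is part of a longer name such as "tax".
    public static List<string> Find(string input, char[] letters)
    {
        // Assumption: letters contains only plain letters, so the escaped
        // string is safe to place inside a character class.
        string cls = Regex.Escape(new string(letters));

        // (?<![A-Za-z])  the variable is not preceded by a letter
        // (?![A-Za-z])   the variable is not followed by a letter
        // (?:\^\d+)?     the exponent is optional, so a bare x also matches
        string pattern = string.Format(
            @"(?<![A-Za-z])[{0}](?![A-Za-z])(?:\^\d+)?", cls);

        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            result.Add(m.Value);
        return result;
    }

    public static void Main()
    {
        // Prints x^45, y and x on separate lines; the x in "tax" is skipped.
        foreach (string term in Find("tax^2 + 23x^45*y + x", new[] { 'x', 'y' }))
            Console.WriteLine(term);
    }
}
```

One consequence of this rule is that directly adjacent variables such as xy are skipped as well, so it only suits inputs where variables are separated by digits or operators.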
The easiest and fastest way to implement this, from my point of view, is the following:
Input: This?_isWhat?IWANT
string tokenRef = "?";
Regex pattern = new Regex($@"([^{tokenRef}\/>]+)");
The pattern skips the tokenRef characters and produces the following output:
Group1 This
Group2 _isWhat
Group3 IWANT
Try using this pattern for capturing the number but excluding the x^ prefix:
(?<=x\^)[0-9]+
string strInput = "23x^45*y or 2x^2 or y^4*x^3";
foreach (Match match in Regex.Matches(strInput, @"(?<=x\^)[0-9]+"))
    Console.WriteLine(match);
This should print:
45
2
3
Do not forget to use the option IgnoreCase for matching, if required.
I have some string in a file in the format
rid="deqn1-2"
rid="deqn3"
rid="deqn4-5a"
rid="deqn5b-7"
rid="deqn7-8"
rid="deqn9a-10v"
rid="deqn11a-12c"
I want a regex to match each deqnX-Y where X and Y are either both integers or both a combination of an integer and a letter, and, if there is a match, to store X and Y in some variables.
I tried using the regex (^(\d+)-(\d+)$|^(\d+[a-z])-(\d+[a-z]))$, but how do I get the values of the matched groups into variables?
For a match between two integers the groups would be (I think)
Groups[2].Value
Groups[3].Value
and for a match between two integer-and-letter combinations they would be
Groups[4].Value
Groups[5].Value
How do I determine which match actually occurred and then capture the matching groups accordingly?
As branch reset (?|...) is not supported in C#, we can use named capturing groups that share the same name, like:
deqn(?:(?<match1>\d+)-(?<match2>\d+)|(?<match1>\d+\w+)-(?<match2>\d+\w+))\b
C# code
String sample = "deqn1-2";
Regex regex = new Regex("deqn(?:(?<match1>\\d+)-(?<match2>\\d+)|(?<match1>\\d+\\w+)-(?<match2>\\d+\\w+))\\b");
Match match = regex.Match(sample);
if (match.Success) {
Console.WriteLine(match.Groups["match1"].Value);
Console.WriteLine(match.Groups["match2"].Value);
}
You could simply not care. One of the pairs will be empty anyway. So what if you just interpret the result as a combination of both? Just slap them together. First value of the first pair plus first value of the second pair, and second value of the first pair plus second value of the second pair. This always gives the right result.
Regex regex = new Regex("^deqn(?:(\\d+)-(\\d+)|(\\d+[a-z])-(\\d+[a-z]))$");
foreach (String str in listData)
{
Match match = regex.Match(str);
if (!match.Success)
continue;
String value1 = match.Groups[1].Value + match.Groups[3].Value;
String value2 = match.Groups[2].Value + match.Groups[4].Value;
// process your strings
// ...
}
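If you do need to know which alternative matched, each Group exposes a Success property that is true only when that group took part in the match. The following is a minimal sketch of that check using the same pattern as above; the class name, method name and returned string format are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DeqnParser
{
    private static readonly Regex Pattern =
        new Regex(@"^deqn(?:(\d+)-(\d+)|(\d+[a-z])-(\d+[a-z]))$");

    // Returns "int:X:Y" or "alnum:X:Y", or null when the input does not match.
    public static string Describe(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success)
            return null;

        // Group 1 participates only when the all-integer alternative matched.
        if (match.Groups[1].Success)
            return "int:" + match.Groups[1].Value + ":" + match.Groups[2].Value;

        // Otherwise the integer-plus-letter alternative matched.
        return "alnum:" + match.Groups[3].Value + ":" + match.Groups[4].Value;
    }

    public static void Main()
    {
        foreach (string s in new[] { "deqn1-2", "deqn9a-10v", "deqn3", "deqn4-5a" })
            Console.WriteLine("{0} -> {1}", s, Describe(s) ?? "no match");
    }
}
```

With this pattern, mixed forms such as deqn4-5a and single values such as deqn3 do not match at all, which agrees with the requirement that X and Y are of the same kind.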
I have this string
TEST_TEXT_ONE_20112017
I want to eliminate _20112017, which is an underscore followed by numbers; those numbers can vary. My goal is to have only
TEST_TEXT_ONE
So far I have this but I get the entire string, is there something I'm missing?
Regex r = new Regex(@"\b\w+[0-9]+\b");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach(Match word in words)
{
string w = word.Groups[0].Value;
//I still get the entire string
}
Notes for your consideration:
You should use parentheses to mark groups for capture, or use named groups. The first group (index 0) is the entire match; you probably want index 1 instead.
\w stands for "word character" and already includes both the underscore and digits. If you want to match anything before the numbers, consider using . instead of \w.
By default + is greedy, so your \w+ will consume the last underscore and all but the very last digit as well. You probably want to explicitly require an underscore before the last block of digits.
Consider whether you want to find a matching substring or match the entire string. If the latter, use the start and end anchors ^ and $.
If you know you want to eliminate exactly 8 digits, you could give an explicit count, like \d{8}.
For example this should work:
Regex r = new Regex(@"^(.+)_\d+$");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach (Match word in words)
{
string w = word.Groups[1].Value;
}
Alternative
Use a zero-width positive lookahead assertion to check what comes next without capturing it. The syntax is (?=stuff). This gives shorter code and avoids digging through Groups altogether:
Regex r = new Regex(@"^.+(?=_\d+$)");
String result = r.Match("TEST_TEXT_ONE_20112017").Value;
Note that we require the end marker $ within the positive lookahead group.
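Since the goal is to drop the suffix rather than to capture the prefix, another option is to replace the trailing block with an empty string. This is only a sketch of that idea using Regex.Replace; the class and method names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixRemover
{
    // Removes a trailing "_<digits>" block. Input without such a block
    // is returned unchanged, because there is nothing to replace.
    public static string RemoveNumericSuffix(string input)
    {
        return Regex.Replace(input, @"_\d+$", "");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveNumericSuffix("TEST_TEXT_ONE_20112017")); // TEST_TEXT_ONE
        Console.WriteLine(RemoveNumericSuffix("NO_SUFFIX"));              // NO_SUFFIX
    }
}
```

Unlike the match-based versions, this returns the original string instead of an empty result when no numeric suffix is present.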
Regex r = new Regex(@"(\b.+)_([0-9]+)\b");
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[1].Value; //TEST_TEXT_ONE
or:
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[2].Value; //20112017
This seems a bit overkill for Regex in my opinion. As an alternative you could just split on the _ character and rebuild the string:
// Requires: using System.Linq; (for Take)
private static string RemoveDate(string input)
{
string[] parts = input.Split('_');
return string.Join("_", parts.Take(parts.Length - 1));
}
Or if the date suffix is always the same length, you could also just substring:
private static string RemoveDateFixedLength(string input)
{
//Removes last 9 characters (8 for date, 1 for underscore)
return input.Substring(0, input.Length - 9);
}
However, I feel the first approach is better; this is just another option.
I'm reading from a file, and need to find a string that is enclosed by two identical non-ASCII values/control separators, in this case 'RS'.
How would I go about doing this? Would I need some form of regex?
RS stands for Record Separator, and it has a value of 30 (or 0x1E in hexadecimal). You can use this regular expression:
\x1E([\w\s]*?)\x1E
That matches the RS, then any letters, digits, underscores or whitespace, and then the RS again. The ? makes the regex match as few characters as possible, in case there are more RS characters afterwards.
If you prefer not to match numbers, you could use [a-zA-Z\s] instead of [\w\s].
Example:
string fileContents = "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";
MatchCollection matches = Regex.Matches(fileContents, @"\x1E([\w\s]*?)\x1E");
if (matches.Count == 0)
return; // Not found, display an error message and exit.
foreach (Match match in matches)
{
if (match.Groups.Count > 1)
Console.WriteLine(match.Groups[1].Value);
}
As you can see, you get a collection of Match objects, and each match.Value holds the whole matched string, including the separators. match.Groups holds all matched groups: the first is again the whole matched string (that's the default), followed by each of your own groups (those in parentheses). In this case there is only one group in the regex, so you just need the second entry in that list.
Using regex you can do something like this:
string pattern = string.Format("{0}(.*){1}",firstString,secondString);
var matches = Regex.Matches(myString, pattern);
foreach (Match match in matches)
{
foreach (Capture capture in match.Captures)
{
//Do stuff, with the current you should remove firstString and secondString from the capture.Value
}
}
Regex.Matches finds every substring that matches the pattern built above.
Remember to escape any special regex characters in firstString and secondString, for example with Regex.Escape.
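A minimal sketch of that advice follows. It escapes both delimiters with Regex.Escape, uses a lazy (.*?) so that each match stops at the nearest closing delimiter, and reads the enclosed text from group 1 so the delimiters do not have to be stripped afterwards. The class name, method name and sample delimiters are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedText
{
    // Returns the text found between each start/end delimiter pair.
    public static List<string> Between(string input, string start, string end)
    {
        // Regex.Escape makes delimiters such as "[" or "." match literally.
        // (.*?) is lazy, so each match ends at the nearest end delimiter.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);

        var result = new List<string>();
        // Singleline lets "." match newlines, so enclosed text may span lines.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        // Prints "one" and "two" on separate lines.
        foreach (string s in Between("a [one] b [two] c", "[", "]"))
            Console.WriteLine(s);
    }
}
```

With the greedy (.*) from the snippet above, the same input would produce a single match running from the first [ to the last ].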
You can use Regex.Matches, I'm using X as the separator in this example:
var fileContents = "Xsomething1X Xsomething2X Xsomething3X";
var results = Regex.Matches(fileContents, @"(X).*?(\1)");
Then you can loop over results to do anything you want with the matches.
The \1 in the regex means "reference the first group". I've put X between () so that it becomes group 1, then I use \1 to say that the match at this place should be exactly the same as group 1.
You don't need a regular expression for that.
Read the contents of the file (File.ReadAllText).
Split on the separator character (String.Split).
If you know there's only one occurrence of your string, take the second array element (result[1]). Otherwise, take every other entry (result.Where((x, i) => i % 2 == 1)).
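The steps above can be sketched as follows. The sketch assumes the separators come in balanced pairs, so that every odd-numbered piece after the split lies between an opening and a closing RS. The class and method names are invented for illustration, and the sample string stands in for the result of File.ReadAllText.

```csharp
using System;
using System.Linq;

public static class RecordSeparatorSplit
{
    private const char RS = '\u001E'; // Record Separator, decimal 30

    // Returns the pieces that sit between pairs of RS characters.
    public static string[] Enclosed(string contents)
    {
        // After splitting, pieces at odd indexes are the enclosed ones
        // (assuming the separators are balanced).
        return contents.Split(RS)
                       .Where((piece, index) => index % 2 == 1)
                       .ToArray();
    }

    public static void Main()
    {
        // In real use the text would come from File.ReadAllText(path).
        string contents =
            "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";

        // Prints "your string" and "another text" on separate lines.
        foreach (string piece in Enclosed(contents))
            Console.WriteLine(piece);
    }
}
```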
I have a String
String test = @"Lists/Versions/2_.000";
I'm a bit confused on how to use regex to do this.
I'm using the pattern
String pattern = @"\D+";
The MSDN page for regular expressions says \D "matches any character other than a decimal digit".
So shouldn't it be returning 'Lists/Versions/' , '2'?
However, it's returning
'' , '2', '000'
I would like to match only the 2 (or any integer). How would I do that?
String url = @"Lists/Versions/2_.000";
String pattern = @"\D+";
string[] substrings = Regex.Split(url, pattern);
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
The reason you are seeing this is that \D matches non-digits, and Regex.Split cuts the string wherever the pattern matches. The string is therefore split at every run of non-digits, which leaves an empty string before Lists/Versions/, then 2, then 000 (the last two are separated by _.). So you have a couple of choices:
Break the string into manageable portions, then index into the array.
Build a better pattern to separate on.
So the question is: what are you trying to parse? 2.00? Or are you trying to separate the numeric values in your string?
I'm assuming you also have a typo:
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\w Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
You should be able to do the following:
string url = @"Lists/Versions/2_.000";
var data = Regex.Split(url, @"\D+");
// data[0] is the empty string that precedes "Lists/Versions/"
Console.WriteLine("Value: {0} and Secondary Value: {1}", data[1], data[2]);
That should find all integer values, so data[1] and data[2] should hold:
2
000
The result is returned as a normal string[]. My syntax or expression may be off, but a regular expression cheat sheet is a handy reference. You'll also want to ensure you check the bounds of the array.
https://dotnetfiddle.net/BU6gp2
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
String url = @"Lists/Versions/2_.000";
String pattern = @"\D+";
string[] substrings = Regex.Split(url, pattern);
Console.WriteLine("'{0}'", substrings[1]);
}
}
Please try the following:
// using System.Linq;
String url = @"Lists/Versions/2_.000";
String pattern = @"(?<=/)\d+";
string[] substrings = Regex.Matches(url, pattern)
.Cast<Match>()
.Select(_ => _.Value)
.ToArray();
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
Alternatively, if you don't need an array:
String url = @"Lists/Versions/2_.000";
String pattern = @"(?<=/)\d+";
Console.WriteLine("'{0}'", Regex.Match(url, pattern).Value);
I have a string like
[123,234,321,....]
Can I use a regular expression to extract only the numbers?
This is the code I am using now:
public static string[] Split()
{
string s = "[123,234,345,456,567,678,789,890,100]";
var temp = s.Replace("[", "").Replace("]", "");
char[] separator = { ',' };
return temp.Split(separator);
}
You can use string.Split for this - no need for a regular expression, though your code can be simplified:
var nums = s.Split('[', ']', ',');
Though you may want to exclude empty entries in the returned array:
var nums = s.Split(new[] { '[', ']', ',' },
StringSplitOptions.RemoveEmptyEntries);
There's an overload to Trim() that takes a character.
You could do this.
string s = "[123,234,345,456,567,678,789,890,100]";
var nums = s.Trim('[').Trim(']').Split(',');
If you want to use a regular expression, try:
string s = "[123,234,345,456,567,678,789,890,100]";
var matches = Regex.Matches(s, @"[0-9]+", RegexOptions.Compiled);
However, regular expressions tend to make your code less readable, so you might stick with your original approach.
Try using the string.Split method:
string s = "[123,234,345,456,567,678,789,890,100]";
var numbers = s.Split('[',']', ',');
foreach(var i in numbers )
Console.WriteLine(i);
EDIT: As Oded mentioned, you may want to use StringSplitOptions.RemoveEmptyEntries also.
string s = "[123,234,345,456,567,678,789,890,100]";
MatchCollection matches = Regex.Matches(s, @"(\d+)[,\]]");
string[] result = matches.OfType<Match>().Select(m => m.Groups[1].Value).ToArray();
Here the @ signifies a verbatim string literal, which allows the escape character '\' to be used directly in regular expression notation without escaping it as "\\".
\d is a digit and \d+ means one or more digits. The parentheses signify a group, so (\d+) means "I want a group of digits". (See how the group is used a little later.)
[,\]] Square brackets, in brief, mean "choose any one of my elements", so this matches either the comma , or a closing square bracket ], which I had to escape.
So the regular expression finds runs of digits followed by a , or a ]. Matches returns the set of matches (which we use because there are several), then we go through each match, with some LINQ, and grab the group at index 1, which is the second group. "But we only made one group?" We only specified one group; the first group (index 0) is the entire regular expression match, which in our case includes the , or ] that we don't want.
While you can and probably should use string.Split as other answers indicate, the question specifically asks if you can do it with regex, and yes, you can:
var r = new Regex(@"\d+", RegexOptions.Compiled);
var matches = r.Matches("[123,234,345,456,567,678,789,890,100]");
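If the numbers are needed as integers rather than strings, the matches can be converted in the same pass. This is only a sketch: the class and method names are invented, and it assumes every value fits in an int (int.Parse throws an OverflowException otherwise).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Returns every run of ASCII digits in the input as an int.
    // Assumption: each run is small enough to fit in an int.
    public static int[] Extract(string input)
    {
        return Regex.Matches(input, "[0-9]+")
                    .Cast<Match>()
                    .Select(m => int.Parse(m.Value))
                    .ToArray();
    }

    public static void Main()
    {
        int[] numbers = Extract("[123,234,345,456,567,678,789,890,100]");
        Console.WriteLine(string.Join(" ", numbers));
    }
}
```

[0-9] is used here instead of \d because \d in .NET also matches digits from other scripts unless RegexOptions.ECMAScript is set.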