I want to extract the string "2023" from the example below. The code I have written so far follows, but I suspect there is a simpler way. What is the simplest way to get "2023" out of the given string?
const string raw = @"TAAD, Türkiye Adalet Akademisi'nin 95. Kuruluş Yıl Dönümü Armağanı, y.78, S.179, Temmuz 2023, s.108-157";
var frst = raw.Split(',').FirstOrDefault(x => x.Any(char.IsDigit) && Convert.ToInt32(new string(x.Where(char.IsDigit).Take(4).ToArray())) > 2000);
var scnd = new string(frst?.Where(char.IsDigit).Take(4).ToArray());
if (scnd.Length > 0 && Convert.ToInt32(scnd) > 2000) MessageBox.Show(scnd);
Try regular expressions:
var pattern = @"\b(\d{4})\b";
foreach (Match match in Regex.Matches(raw, pattern))
{
    // do something with match.Value
}
If you want to match only a 4-digit number greater than 2000, you can use:
\b[2-9][0-9]{3}\b(?<!2000)
In parts, the pattern matches:
\b A word boundary to prevent a partial match
[2-9] Match a digit 2-9
[0-9]{3} Match 3 digits 0-9
\b A word boundary
(?<!2000) Negative lookbehind, assert not 2000 directly to the left
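Note that in .NET, \d also matches Unicode decimal digits from other scripts (for example Arabic-Indic digits) unless RegexOptions.ECMAScript is set. Use [0-9] if you want ASCII digits only.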
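As a minimal sketch of how the pattern above could be wrapped for reuse: the class and method names (YearFinder, FirstYearAfter2000) are made up for illustration and are not part of any library. The method returns the first whole-word 4-digit number from 2001 to 9999, or an empty string when there is none.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answer.
public static class YearFinder
{
    // [2-9][0-9]{3} keeps the match to ASCII digits; the lookbehind rejects exactly 2000.
    private static readonly Regex YearPattern =
        new Regex(@"\b[2-9][0-9]{3}\b(?<!2000)");

    // Returns the first matching number as text.
    // Match.Value is the empty string when nothing matched.
    public static string FirstYearAfter2000(string text)
    {
        return YearPattern.Match(text).Value;
    }
}
```

Calling YearFinder.FirstYearAfter2000(raw) with the string from the question should give "2023", because 95, 78, 179, 108 and 157 are all shorter than four digits.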
I want to get a Substring out of a String.
The Substring I want is a sequence of numerical characters.
Input
"abcdefKD-0815xyz42ghijk";
"dag4ah424KD-42ab333k";
"BeverlyHills90210KD-433Nokia3310";
In general it could be any string, but they all have one thing in common: there is a part that starts with KD- and ends with a number. Everything after that number should be dropped.
In the examples above this number would be 0815, 42, 433 respectively. But it could be any number
Right now I have a Substring that contains all numerical characters after KD- but I would like to have only the 0815ish part of the string.
What I have so far:
String toMakeSub = "abcdef21KD-0815xyz429569468949489694694689ghijk";
toMakeSub = toMakeSub.Substring(toMakeSub.IndexOf("KD-") + "KD-".Length);
String result = Regex.Replace(toMakeSub, "[^0-9]", "");
The result is 0815429569468949489694694689, but I want only 0815 (the number can be any length, so cutting after four digits is not an option).
It's as easy as the following pattern:
(?<=KD-)\d+
The way to read this
(?<=subpattern) : Zero-width positive lookbehind assertion. Continues matching only if subpattern matches on the left.
\d : Matches any decimal digit.
+ : Matches previous element one or more times.
Example
var input = "abcdef21KD-0815xyz429569468949489694694689ghijk";
var regex = new Regex(@"(?<=KD-)\d+");
var match = regex.Match(input);
if (match.Success)
{
Console.WriteLine(match.Value);
}
input = "abcdef21KD-0815xyz429569468949489694694689ghijk, KD-234dsfsdfdsf";
// or to match multiple times
var matches = regex.Matches(input);
foreach (var matchValue in matches)
{
Console.WriteLine(matchValue);
}
Using C#, I am stuck trying to extract a specific substring while limiting how much of the string is matched. Here is my input string:
NPS_CNTY01_10112018_Adult_Submittal.txt
I would like to extract the 01 after CNTY and ignore anything after it.
So far my regex is:
(?!NPS_CNTY)\d{2}
But this regex also matches many other digit pairs in the input string. One approach I was considering was to limit the input to 9 characters so that only 01 is matched, but I have not managed to do that. Any help is appreciated.
I would like to add that the only variable data in this input string is:
NPS_CNTY[two-digit county code, without the brackets]_[date in MMDDYYYY format, without the brackets]_Adult_Submittal.txt
Also, please limit solutions to regular expressions.
The (?!NPS_CNTY)\d{2} pattern matches a location that is not immediately followed by NPS_CNTY and then matches 2 digits. Since a position where two digits begin can never also be the start of NPS_CNTY, the negative lookahead always succeeds and is redundant.
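To see the redundancy concretely, here is a small sketch that collects all matches for a pattern so the two variants can be compared. LookaheadCheck and AllMatches are illustrative names, not an existing API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing patterns side by side.
public static class LookaheadCheck
{
    // Collects every match value for the pattern, in order of appearance.
    public static string[] AllMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For the input NPS_CNTY01_10112018_Adult_Submittal.txt, both AllMatches(s, @"(?!NPS_CNTY)\d{2}") and AllMatches(s, @"\d{2}") should return 01, 10, 11, 20, 18, which shows that the lookahead filters nothing out.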
You may use a positive lookbehind like this to get 01:
var m = Regex.Match(s, @"(?<=NPS_CNTY)\d+");
var result = "";
if (m.Success)
{
result = m.Value;
}
Here, (?<=NPS_CNTY), a positive lookbehind, matches a location that is immediately preceded by NPS_CNTY, and then \d+ matches 1 or more digits.
An equivalent solution using a capturing group is:
var m = Regex.Match(s, @"NPS_CNTY(\d+)");
var result = "";
if (m.Success)
{
result = m.Groups[1].Value;
}
If the string always starts with NPS_CNTY and you have to extract exactly 2 digits, then you don't need a regular expression. Just use the Substring() method:
string text = @"NPS_CNTY01_01141980_Adult_Submittal.txt";
string digits = text.Substring(8, 2);
EDIT:
In case you need to match N digits after NPS_CNTY you can use the following code:
string text = @"NPS_CNTY012_01141980_Adult_Submittal.txt";
// The char[] overload of Split is available on both .NET Framework and .NET Core.
string digits = text.Replace("NPS_CNTY", string.Empty)
                    .Split(new[] { '_' }, StringSplitOptions.RemoveEmptyEntries)
                    .FirstOrDefault();
My task is to extract the first digits in the following string:
GLB=VSCA|34|speed|1|
My pattern is the following:
(?x:VSCA(\|){1}(\d.))
Basically I need to extract "34", the first run of digits after "VSCA". My pattern gives me a group, but would it be possible to get only the number? This is my C# snippet:
string regex = @"(?x:VSCA(\|){1}(\d.))";
Regex rx = new Regex(regex);
string s = "GLB=VSCA|34|speed|1|";
if (rx.Match(s).Success)
{
var test = rx.Match(s).Groups[1].ToString();
}
You could match 34 (the first digits after VSCA) using a positive lookbehind (?<=VSCA\D*) to assert that what is on the left is VSCA followed by zero or more non-digit characters (\D*), and then match one or more digits with \d+:
(?<=VSCA\D*)\d+
If the pipe must come directly after VSCA, you could include it in the lookbehind:
(?<=VSCA\|)\d+
This regex pattern: (?<=VSCA\|)\d+?(?=\|) will match only the number. (If your number can be negative / have decimal places you may want to use (?<=VSCA\|).+?(?=\|) instead)
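A minimal sketch of how the lookbehind/lookahead pattern might be used to get an integer directly. The names VscaReader and TryReadNumber are hypothetical, and the sketch assumes the field holds a plain non-negative integer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, assuming the field right after "VSCA|" is a non-negative integer.
public static class VscaReader
{
    // Digits that sit directly between "VSCA|" and the next "|".
    private static readonly Regex FirstField =
        new Regex(@"(?<=VSCA\|)\d+?(?=\|)");

    // Returns true and sets number when the field is found and fits in an int.
    public static bool TryReadNumber(string line, out int number)
    {
        number = 0;
        Match m = FirstField.Match(line);
        return m.Success && int.TryParse(m.Value, out number);
    }
}
```

With the sample line GLB=VSCA|34|speed|1| this should yield 34. A line whose first field is not numeric makes the method return false instead of throwing.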
You don't need Regex for this, you can simply split on the '|' character:
string s = "GLB=VSCA|34|speed|1|";
string[] parts = s.Split('|');
if(parts.Length >= 2)
{
Console.WriteLine(parts[1]); //prints 34
}
The benefit here is that you can access all parts of the original string based on the index:
[0] - "GLB=VSCA"
[1] - "34"
[2] - "speed"
[3] - "1"
While the other answers work really well, if you really must use a regular expression, or are interested in how to get at the number directly, you can use a named group for it. Consider the following code:
string regex = @"(?x:VSCA(\|){1}(?<number>\d+))";
Regex rx = new Regex(regex);
string s = "GLB=VSCA|34|speed|1|";
var match = rx.Match(s);
if (match.Success) Console.WriteLine(match.Groups["number"]);
How about (?<=VSCA\|)[0-9]+ as the pattern?
The code below performs the following steps, which I intend to integrate into a larger application:
Split the large input string at every dot (.) character.
Store the resulting substrings in the array result[].
In the foreach loop, test each substring for an occurrence of the keyword.
If a match occurs, print up to 300 characters of the original input string, starting from the position of the matched substring.
string[] result = input.Split('.');
foreach (string str in result)
{
    //Console.WriteLine(str);
    Match m = Regex.Match(str, keyword);
    if (m.Success)
    {
        int start = input.IndexOf(str);
        if ((input.Length - start) < 300)
        {
            Console.WriteLine(input.Substring(start, input.Length - start));
            break;
        }
        else
        {
            Console.WriteLine(input.Substring(start, 300));
            break;
        }
    }
}
The input is in fact a large amount of text, and I think this should be done with a regular expression. Being a novice, I am not able to put everything together using regular expressions. The pieces I have so far:
Match the keyword: Match m = Regex.Match(str, keyword);
Print up to 300 characters starting from the dot (.), i.e. from the start of the matched sentence: "^.\w{0,300}"
What I intend to do is:
Search for the keyword in the input text.
As soon as a match is found, start from the sentence containing the keyword and print up to 300 characters from the input string.
How should I proceed? Please help.
If I got it right, all you need to do is find your keyword and capture everything that follows until you reach the first dot or the maximum number of characters:
@"keyword([^\.]{0,300})"
C# code:
var regex = new Regex(@"keyword([^\.]{0,300})");
foreach (Match match in regex.Matches(input))
{
var result = match.Groups[1].Value;
// work with the result
}
Try this regex:
(?<=\.?)([\w\s]{0,300}keyword.*?)(?=\.)
Explanation:
(?= subexpression) Zero-width positive lookahead assertion.
(?<= subexpression) Zero-width positive lookbehind assertion.
*? Matches the previous element zero or more times, but as few times as possible.
And a simple code example:
foreach (Match match in Regex.Matches(input,
    @"(?<=\.?)([\w\s]{0,300}print.*?)(?=\.)"))
{
Console.WriteLine(match.Groups[1].Value);
}
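For comparison, here is a sketch of the requirement as stated in the question (start at the sentence containing the keyword, print up to 300 characters) using plain string methods instead of a regular expression. It treats the keyword as literal text rather than a pattern, and SnippetFinder, SnippetAround and maxLength are illustrative names only.

```csharp
using System;

// Hypothetical helper; uses IndexOf/LastIndexOf rather than Regex.
public static class SnippetFinder
{
    // Returns up to maxLength characters of input, starting at the sentence
    // that contains the first occurrence of keyword; empty string if not found.
    public static string SnippetAround(string input, string keyword, int maxLength = 300)
    {
        int hit = input.IndexOf(keyword, StringComparison.Ordinal);
        if (hit < 0) return string.Empty;

        // The sentence starts just after the previous dot. LastIndexOf returns -1
        // when there is no earlier dot, which makes the start index 0.
        int start = hit == 0 ? 0 : input.LastIndexOf('.', hit - 1) + 1;

        int length = Math.Min(maxLength, input.Length - start);
        return input.Substring(start, length);
    }
}
```

Math.Min replaces the if/else around Substring in the question's code. Any whitespace that follows the previous dot is kept at the start of the result, so call TrimStart() on it if that is unwanted.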