Text between 2 optional strings with OR condition using Regex - c#

I have a string with 2 possibilities:
var desc = "Keyword1: That text I want \r\n Keyword2: Value2 \r\n Keyword3: Value3 \r\n Keyword4: Value4"
var desc = "Keyword1: That text I want Keyword2: Value2 \r\n Keyword3: Value3 \r\n Keyword4: Value4"
Here the keywords that follow "That text I want" (Keyword2, Keyword3, Keyword4) can appear in any order, and all of them are optional.
I tried with the Regex Keyword1:(\s+)(.*)(\W+?)(\r\n?)(?=Keyword2:|Keyword3:|Keyword4:)
It does not work. Not sure what is wrong in my regex.
Any help is highly appreciated.
Thanks in advance!

See here for the solution.
In your case you could simply use (regex between two strings):
(?<=Keyword1:)(.*)(?=Keyword2)
Try it out
Hope it helps.

Assuming those \r\n are actual special characters in the string and not the literals, this should work:
Keyword1: (.*?)(Keyword2:|Keyword3:|Keyword4:|\r\n)
The text you want is in the first capturing group, which is the second item in the match's Groups collection: match.Groups[1].
This regex matches Keyword1:, followed by the minimum number of characters necessary, followed by either Keyword2:, Keyword3:, Keyword4:, or \r\n (special characters). If those are literals in your input string, you will need to double the backslashes.
You can check it here. Note that on the right, Group 1 contains your text in both cases.
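As a rough illustration of the lazy-match idea above, here is a minimal sketch. The helper name ExtractKeyword1, the added \s* trimming, and the $ alternative (for a value that runs to the end of the string) are assumptions made for this example and are not part of the answer's original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text after "Keyword1:" up to the next
// keyword, a line break, or the end of the string ("" when nothing matches).
static string ExtractKeyword1(string desc)
{
    // (.*?) is lazy, so it stops at the first position where the
    // terminator group (keyword, line break, or end of input) can match.
    var match = Regex.Match(
        desc,
        @"Keyword1:\s*(.*?)\s*(?:Keyword2:|Keyword3:|Keyword4:|\r?\n|$)");
    return match.Success ? match.Groups[1].Value : "";
}

var withLineBreak = "Keyword1: That text I want \r\n Keyword2: Value2 \r\n Keyword3: Value3";
var sameLine = "Keyword1: That text I want Keyword2: Value2 \r\n Keyword3: Value3";

Console.WriteLine(ExtractKeyword1(withLineBreak));
Console.WriteLine(ExtractKeyword1(sameLine));
```

Both calls should yield the same value, because the lazy group ends at whichever terminator comes first.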

var pattern = keywordName + @":\s+(.+?)\r?\n";
var regex = new Regex(pattern);
var match = regex.Match(description);
if (!match.Success) return null;
var firstMatch = match.Groups[1].Value;
//Find if there's another keyword in the extracted Value
var lstKeywords = Enum.GetValues(typeof(Keywords)).Cast<Keywords>().Where(k => k != keywordName);
//Add : to the last value so that it's recognized as a keyword
var sOtherKeywords = string.Join(":|", lstKeywords) + ":";
var pattern2 = @"(" + sOtherKeywords + @")(\s+)";
regex = new Regex(pattern2);
match = regex.Match(firstMatch);
//If there's no other keyword in the same line then return the expression that is extracted from the first regex
if (!match.Success) return firstMatch;
var secondMatch = match.Groups[1].Value;
var pattern3 = keywordName + @":\s+(.+)(\r?\n?)" + secondMatch;
regex = new Regex(pattern3);
match = regex.Match(description);
return match.Success ? match.Groups[1].Value.TrimEnd() : null;

Related

Regular expression split string, extract string value before and numeric value between square brackets

I need to parse a string that looks like "Abc[123]". The numerical value between the brackets is needed, as well as the string value before the brackets.
Most of the examples I tested work fine, but some special cases fail to parse.
This code seems to work fine for "normal" cases, but has some problems handling "special" cases:
var pattern = @"\[(.*[0-9])\]";
var query = "Abc[123]";
var numVal = Regex.Matches(query, pattern).Cast<Match>().Select(m => m.Groups[1].Value).FirstOrDefault();
var stringVal = Regex.Split(query, pattern)
.Select(x => x.Trim())
.FirstOrDefault();
How should the code be adjusted to handle also some special cases?
For instance, for the string "Abc[]" the parser should correctly return "Abc" as the string value and indicate an empty numeric value (which could eventually be defaulted to 0).
For the string "Abc[xy33]" the parser should return "Abc" as the string value and indicate an invalid numeric value.
For the string "Abc" the parser should return "Abc" as the string value and indicate a missing numeric value. The blanks before/after or inside the brackets should be trimmed "Abc [ 123 ] ".
Try this pattern: ^([^\[]+)\[([^\]]*)\]
Explanation of a pattern:
^ - match beginning of a string
([^\[]+) - match one or more of any character except [ and store it inside the first capturing group
\[ - match [ literally
([^\]]*) - match zero or more of any character except ] and store inside second capturing group
\] - match ] literally
Here's tested code:
var pattern = @"^([^\[]+)\[([^\]]*)\]";
var queries = new string[]{ "Abc[123]", "Abc[xy33]", "Abc[]", "Abc[ 33 ]", "Abc" };
foreach (var query in queries)
{
string beforeBrackets;
string insideBrackets;
var match = Regex.Match(query, pattern);
if (match.Success)
{
beforeBrackets = match.Groups[1].Value;
insideBrackets = match.Groups[2].Value.Trim();
if (insideBrackets == "")
insideBrackets = "0";
else if (!int.TryParse(insideBrackets, out int i))
insideBrackets = "incorrect value!";
}
else
{
beforeBrackets = query;
insideBrackets = "no value";
}
Console.WriteLine($"Input string {query} : before brackets: {beforeBrackets}, inside brackets: {insideBrackets}");
}
Console.ReadKey();
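Output:
Input string Abc[123] : before brackets: Abc, inside brackets: 123
Input string Abc[xy33] : before brackets: Abc, inside brackets: incorrect value!
Input string Abc[] : before brackets: Abc, inside brackets: 0
Input string Abc[ 33 ] : before brackets: Abc, inside brackets: 33
Input string Abc : before brackets: Abc, inside brackets: no value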
We can try doing a regex replacement on the input, for a one-liner solution:
string input = "Abc[123]";
string letters = Regex.Replace(input, "\\[.*\\]", "");
string numbers = Regex.Replace(input, ".*\\[(\\d+)\\]", "$1");
Console.WriteLine(letters);
Console.WriteLine(numbers);
This prints:
Abc
123
There are probably language-level techniques for this that I am not aware of, but with a regular expression we could capture everything using capturing groups and then check the groups one by one, for example:
^([A-Za-z]+)\s*(\[?)\s*([A-Za-z]*)(\d*)\s*(\]?)\s*$
If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com. There you can also see how it matches against some sample inputs.
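To make "check the groups one by one" concrete, here is a small sketch built on that expression. The helper name Describe and the result labels (missing, invalid, empty) are made up for this example. Note that an input such as "Abc[33xy]" does not match the expression at all, which the sketch reports separately.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: interprets the five groups of the expression above.
static string Describe(string query)
{
    var m = Regex.Match(
        query.Trim(),
        @"^([A-Za-z]+)\s*(\[?)\s*([A-Za-z]*)(\d*)\s*(\]?)\s*$");
    if (!m.Success) return "no match";

    var name = m.Groups[1].Value;
    var hasBrackets = m.Groups[2].Value == "[" && m.Groups[5].Value == "]";

    if (!hasBrackets) return $"{name}: missing";                   // no [..] part
    if (m.Groups[3].Value.Length > 0) return $"{name}: invalid";   // letters inside
    if (m.Groups[4].Value.Length == 0) return $"{name}: empty";    // "[]"
    return $"{name}: {m.Groups[4].Value}";                         // digits inside
}

foreach (var q in new[] { "Abc[123]", "Abc[]", "Abc[xy33]", "Abc", "Abc [ 123 ] " })
    Console.WriteLine(Describe(q));
```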
You can achieve that easily without using regex
string temp = "Abc[123]";
string[] arr = temp.Split('[');
string name = arr[0];
string value = arr[1].ToString().TrimEnd(']');
Output: name = Abc and value = 123

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.Split, because there isn't really a fixed delimiter.
You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the text between ISA and ESA itself contains the pattern we are splitting on (ESA|ISA), you will have to find a way around that.
To explain the regex a bit:
(?<=ESA) is a look-behind assertion: the text before the current position must be ESA, but it is not consumed by the match.
(?=ISA) is a look-ahead assertion: the text after the current position must be ISA, but it is not consumed by the match.
With these look-around assertions, only the | that sits between ESA and ISA is matched, and the split happens there.
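The same look-around idea can also be applied when the separator character is not known in advance, as in the question, where ? stands for any single character. A minimal sketch, assuming every segment ends with ESA followed by exactly one character:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "ISA~this is date~ESA|ISA~this is more data~ESA|";

// Split at the zero-width position that follows "ESA" plus one arbitrary
// character and precedes "ISA". Nothing is consumed by the match, so the
// separator character stays attached to the end of each segment.
string[] parts = Regex.Split(input, @"(?<=ESA.)(?=ISA)");

foreach (var part in parts)
    Console.WriteLine(part);
```

Because the pattern is made only of assertions, no characters are removed from the input, unlike the version that splits on a literal |.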
Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the substring from 0 up to that index and the substring from that index to the end (add 1 to the index if the separator character should stay with the first part).
Edit:
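If ? is not known, we can use the regex Pattern and Matcher instead.
Matcher matcher = Pattern.compile("ISA.*?ESA").matcher(whateverString);
// The first find() locates the first segment, the second find() the next one.
if (matcher.find() && matcher.find()) {
    int x = matcher.start(); // start index of the second segment
}
Here x is the start index of the second match, which is where the string should be split.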
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy, so each ISA...ESA segment is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
    Console.WriteLine(m.Value);
    m = m.NextMatch(); // more matches
}
A regex will probably be the best fit for this.
The pattern would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
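A short sketch of the Regex.Matches variant, for inputs with any number of segments. The pattern ISA.*?ESA. and the Singleline option are choices made for this example (Singleline lets . also match line breaks inside a segment); they are not taken from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "ISA~this is date~ESA|ISA~this is more data~ESA|";

// Each match is one segment: "ISA", the shortest possible body, "ESA",
// and the single trailing separator character.
var segments = Regex.Matches(input, @"ISA.*?ESA.", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

foreach (var segment in segments)
    Console.WriteLine(segment);
```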
It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.
You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart == "ISA~this is date~ESA|"
// secondPart == "ISA~this is more data~ESA|"
Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}
Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}
There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
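The difference is easy to see on a small input. This sketch uses a shortened sample string chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "ISA~a~ESA|ISA~b~ESA|";

// string.Split with RemoveEmptyEntries drops the empty piece before the first "ISA".
string[] viaStringSplit = input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);

// Regex.Split keeps that leading empty piece.
string[] viaRegexSplit = Regex.Split(input, "ISA");

Console.WriteLine(viaStringSplit.Length); // 2
Console.WriteLine(viaRegexSplit.Length);  // 3
```

In both cases the delimiter "ISA" itself is removed from the pieces, so it has to be added back if the original segments are needed.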

Split string and print the value of string separately in autocad

534-W1A-R1 is my file name, and I want to split it so that it prints like
Code=534 Phase=1 Zone=A
in my Autocad file.
The below split code should work:
string str = @"534-W1A-R1";
var split = str.Split('-');
string code = split.First();
string phase = new string(split.ElementAt(1).Skip(1).Take(1).ToArray());
string zone = new string(split.ElementAt(1).Skip(2).Take(1).ToArray());
string result = String.Format("Code={0} Phase={1} Zone={2}", code, phase, zone);
Console.WriteLine(result);
Output:
Code=534 Phase=1 Zone=A
Use the Substring() method.
string input = "534-W1A-R1";
string sub = input.Substring(0, 3);
string sub2 = input.Substring(5, 1);
string sub3 = input.Substring(6, 1);
Console.WriteLine("Code={0} Phase={1} Zone={2}", sub, sub2, sub3);
Output:
Code=534 Phase=1 Zone=A
There are different ways to do it. If you are sure about the format of the text, you can just use this:
var str= "534-W1A-R1";
var parts=str.Split('-');
var code= parts[0];
var secondPart= parts[1];
var phase=secondPart.Substring(1,secondPart.Length-2);
var zone=secondPart[secondPart.Length-1];
You can also use Regex if it is more complicated.
Using Regex
Edit: added some comments (pattern description)
var pattern = @"^(\d+)-[A-Z](\d+)([A-Z])-";
/* pattern description:
^(\d+) group 1: one or more digits at the beginning
- one hyphen (literal)
[A-Z] one alphabetic character
(\d+) group 2: one or more digits
([A-Z]) group 3: one alphabetic character
- one hyphen (literal)
*/
var input = "534-W1A-R1";
var groups = Regex.Match(input, pattern, RegexOptions.IgnoreCase).Groups;
var code = groups[1].Value;
var phase = groups[2].Value;
var zone = groups[3].Value;
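The regex version can be wrapped so that a file name in an unexpected format is detected instead of silently producing empty values. A minimal sketch; the helper name FormatFileName and the choice to return an empty string on failure are assumptions for this example:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the formatted line, or "" when the name does
// not follow the <code>-<letter><phase><zone>-... layout assumed above.
static string FormatFileName(string fileName)
{
    var match = Regex.Match(fileName, @"^(\d+)-[A-Z](\d+)([A-Z])-", RegexOptions.IgnoreCase);
    if (!match.Success) return "";
    return $"Code={match.Groups[1].Value} Phase={match.Groups[2].Value} Zone={match.Groups[3].Value}";
}

Console.WriteLine(FormatFileName("534-W1A-R1")); // Code=534 Phase=1 Zone=A
```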

C# How to replace a shorter string than the matched string?

How can I replace only part of a matched regex string? I need to find some strings that are inside brackets like < >. In this example I need to match 23 characters and replace only 3 of them:
string input = "<tag abc=\"hello world\"> abc=\"whatever\"</tag>";
string output = Regex.Replace(input, ???, "def");
// wanted output: <tag def="hello world"> abc="whatever"</tag>
So I need to either find abc inside <tag abc="hello world">, or find <tag abc="hello world"> and replace just abc. Do regular expressions or C# allow that? And even if I solve the problem differently, is it possible to match a big string but replace only a small part of it?
I'd have to look up the .NET regex dialect, but in general you want to capture the parts you don't want to replace and refer to them in your replacement string.
string output = Regex.Replace(input, "(<tag )abc(=\"hello world\">)", "$1def$2");
Another option would be to use lookaround to match "abc" where it follows "<tag " and precedes "="hello world">"
string output = Regex.Replace(input, "(?<=<tag )abc(?==\"hello world\")", "def");
Instead of Regex.Replace use Regex.Match, then you can use the properties on the Match object to figure out where the match occurred.. then the regular string functions (String.Substring) can be used to replace the bit you want replaced.
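A minimal sketch of that Match-then-Substring approach. The pattern <tag\s+(abc)= is an assumption chosen to fit the sample input and is not part of the answer above:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "<tag abc=\"hello world\"> abc=\"whatever\"</tag>";

// Capture only the attribute name; the rest of the pattern just provides context.
var match = Regex.Match(input, @"<tag\s+(abc)=");

var output = input;
if (match.Success)
{
    // Group.Index and Group.Length locate the captured text within the input.
    Group attr = match.Groups[1];
    output = input.Substring(0, attr.Index)
           + "def"
           + input.Substring(attr.Index + attr.Length);
}

Console.WriteLine(output);
```

Only the captured group is swapped out, so the second abc, which is outside the tag, is left untouched.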
Working sample with named groups:
string input = @"<tag abc=""hello world""> abc=whatever</tag>";
Regex regex = new Regex(@"<(?<Tag>\w+)\s+(?<Attr>\w+)=.*?>.*?</\k<Tag>>");
string output = regex.Replace(input, match =>
{
    var attr = match.Groups["Attr"];
    var value = match.Value;
    // Group.Index is relative to the input, so shift it to the start of this match.
    var offset = attr.Index - match.Index;
    var left = value.Substring(0, offset);
    var right = value.Substring(offset + attr.Length);
    return left + attr.Value.Replace("abc", "def") + right;
});

C# Regex to replace invalid character to make it as perfect float number

for example if the string is "-234.24234.-23423.344"
the result should be "-234.2423423423344"
if the string is "898.4.44.4"
the result should be "898.4444"
if the string is "-898.4.-"
the result should be "-898.4"
the result should always make sense as a double type
What I can make is this:
string pattern = String.Format(@"[^\d\{0}\{1}]",
NumberFormatInfo.CurrentInfo.NumberDecimalSeparator,
NumberFormatInfo.CurrentInfo.NegativeSign);
string result = Regex.Replace(value, pattern, string.Empty);
// this will not be able to deal with something like this "-.3-46821721.114.4"
Is there any perfect way to deal with those cases?
It's probably a bad idea, but you can do this with regex like this:
Regex.Replace(input, @"[^-.0-9]|(?<!^)-|(?<=\..*)\.", "")
The regex matches:
[^-.0-9] # anything which isn't ., -, or a digit.
| # or
(?<!^)- # a - which is not at the start of the string
| # or
(?<=\..*)\. # a dot which is not the first dot in the string
This works on your examples, and additionally this case: "9-1.1" becomes "91.1".
You could also change (?<!^)- to (?<!^[^-.0-9]*)- if you'd like "asd-8" to become "-8" rather than "8".
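To check the expression against the examples from the question, it can be wrapped in a small function. This is only a sketch; the name CleanNumber and the use of the invariant culture for parsing are assumptions for this example:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: strips invalid characters, any minus sign that is not
// at the very start, and every dot after the first one.
static string CleanNumber(string input) =>
    Regex.Replace(input, @"[^-.0-9]|(?<!^)-|(?<=\..*)\.", "");

foreach (var sample in new[] { "-234.24234.-23423.344", "898.4.44.4", "-898.4.-", "9-1.1" })
{
    var cleaned = CleanNumber(sample);
    // Confirm the cleaned text really parses as a double.
    var parses = double.TryParse(cleaned, NumberStyles.Float, CultureInfo.InvariantCulture, out _);
    Console.WriteLine($"{sample} -> {cleaned} (parses: {parses})");
}
```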
Using regex alone is not a good way to achieve your goal, since regular expressions lack AND and NOT logic.
Try the code below, it will do the same thing.
var str = @"-.3-46821721.114.4";
var validHead = new Regex(@"(\d\.)" /* use @"\." if you think "-.5" is also valid */, RegexOptions.Compiled);
// Regex.Replace returns a new string, so the result has to be assigned back.
str = Regex.Replace(str, @"[^0-9\.-]", "");
var match = validHead.Match(str); // assumes the input contains a digit followed by a dot
var rawBefore = str.Substring(0, match.Index);
// Keep a leading minus sign, then drop everything except digits.
var beforeHead = (rawBefore.StartsWith("-") ? "-" : "") + Regex.Replace(rawBefore, @"[^0-9]", "");
// Slice by the match position, because the cleaned beforeHead may be shorter than rawBefore.
var afterHead = Regex.Replace(str.Substring(match.Index + match.Length), @"[^0-9]", "");
var validFloatNumber = beforeHead + match.Value + afterHead; // "-346821721.1144"
String must be trimmed before operation.
