Regex split and filter construction - C#

I need to filter the input to get only the strings inside the parentheses: Test1, Test2, Test3. I have tried the following, but it is not working.
string input = "test test test #T(Test1) sample text #T(Test2) Something else #T(Test3) ";
string pattern = @"[#]";
string[] substrings = Regex.Split(input, pattern);

You can use a simple match instead.
(?<=#T\().*?(?=\))
string strRegex = @"(?<=#T\().*?(?=\))";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = "test test test #T(Test1) sample text #T(Test2) Something else #T(Test3) ";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    if (myMatch.Success)
    {
        // myMatch.Value holds the text inside the parentheses.
        // Add your code here
    }
}

Note that the @"[#]" regex pattern matches a single # character anywhere in the input string. When you split on it, you are bound to get more than you need.
You should be matching rather than splitting:
string input = "test test test #T(Test1) sample text #T(Test2) Something else #T(Test3) ";
string pattern = @"#T\((?<val>[^()]*)\)";
// Requires using System.Linq; and using System.Text.RegularExpressions;
string[] substrings = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(p => p.Groups["val"].Value)
    .ToArray();
The #T\((?<val>[^()]*)\) regex will match:
#T - literal #T
\( - a literal (
(?<val>[^()]*) - (Group with "val" name) 0 or more characters other than ( or )
\) - a literal )

Related

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in C#?
I can't use string.Split, because there isn't really a delimiter.

You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
    Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the data between ISA and ESA can itself contain the pattern we are looking for, you will have to find some smart way around it.
To explain the regex a bit:
(?<=ESA) Look-behind assertion. The text before the split point must be ESA, but it is not consumed by the match.
(?=ISA) Look-ahead assertion. The text after the split point must be ISA, but it is not consumed by the match.
Using these look-around assertions you can find the correct | character to split on, while ESA and ISA stay in the results.
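The look-around idea can also be applied when the character after ESA is unknown, as in the question. The following is only a sketch: SegmentSplitter and SplitSegments are made-up names, and it assumes that ESA followed by one character and then ISA never occurs inside a segment's data.

```csharp
using System.Text.RegularExpressions;

public static class SegmentSplitter
{
    // Hypothetical helper: splits at every position that is preceded by
    // "ESA" plus one arbitrary character and followed by "ISA".
    // Both look-arounds are zero-width, so no characters are removed.
    public static string[] SplitSegments(string input)
    {
        // Singleline lets '.' also match a newline used as the terminator.
        return Regex.Split(input, @"(?<=ESA.)(?=ISA)", RegexOptions.Singleline);
    }
}
```

Calling SegmentSplitter.SplitSegments on the sample string from the answer should return two elements, each still ending with its | terminator.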

Simply use IndexOf:
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the two substrings: from 0 up to and including index x, and from index x + 1 to the end.
Edit:
If ? is not known, we can use a regex with Pattern and Matcher:
Matcher matcher = Pattern.compile("ISA.*?ESA").matcher(whateverString);
if (matcher.find()) {
    int x = matcher.start();
}
Here x gives the start index of the first match.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy, so each ISA...ESA block is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
    Console.WriteLine(m.Value);
    m = m.NextMatch(); // more matches
}

A regex will probably be the best tool for this.
The pattern would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you two groups with the data you need:
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.", RegexOptions.IgnoreCase);
if (match.Success)
{
    var data1 = match.Groups["data1"].Value;
    var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
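As a rough illustration of the Regex.Matches suggestion, the sketch below collects the data from any number of ISA...ESA blocks. IsaExtractor and ExtractData are hypothetical names, and the pattern assumes every block ends with exactly one terminator character after ESA.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IsaExtractor
{
    // Hypothetical helper: returns the text between each ISA ... ESA pair.
    public static List<string> ExtractData(string input)
    {
        var results = new List<string>();
        // Lazy .*? stops at the first ESA; the trailing '.' consumes the terminator.
        foreach (Match m in Regex.Matches(input, @"ISA(?<data>.*?)ESA.", RegexOptions.Singleline))
        {
            results.Add(m.Groups["data"].Value);
        }
        return results;
    }
}
```

For the sample input used elsewhere in this thread, this should return the two data parts without the surrounding ISA and ESA markers.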

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitInput[1]}{end}{splitInput[2]}";
var secondPart = $"{start}{splitInput[3]}{end}{splitInput[4]}";
// firstPart is "ISA~this is date~ESA|"
// secondPart is "ISA~this is more data~ESA|"

Use a regex like ISA(.+?)ESA and select the first group:
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA", RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
    RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
    string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
    string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
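To make the difference concrete, here is a small sketch that wraps both calls. SplitComparison and its method names are made up for illustration, and Regex.Escape is added on the assumption that the separator should always be treated as literal text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // string.Split with RemoveEmptyEntries drops the empty string
    // that appears when the input starts with the separator.
    public static string[] WithStringSplit(string input, string separator)
    {
        return input.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex.Split keeps empty entries; Regex.Escape makes the separator literal.
    public static string[] WithRegexSplit(string input, string separator)
    {
        return Regex.Split(input, Regex.Escape(separator));
    }
}
```

With an input that begins with ISA, the first method returns only the non-empty pieces, while the second also returns a leading empty string.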

Extract multiple values from a string

I need to extract values from a string.
string sTemplate = "Hi [FirstName], how are you and [FriendName]?";
Values I need returned:
FirstName
FriendName
Any ideas on how to do this?

You can use the following regex globally:
\[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
Example:
string input = "Hi [FirstName], how are you and [FriendName]?";
string pattern = @"\[(.*?)\]";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
if (matches.Count > 0)
{
    Console.WriteLine("{0} ({1} matches):", input, matches.Count);
    foreach (Match match in matches)
        Console.WriteLine(" " + match.Groups[1].Value); // group 1 excludes the brackets
}

If the format/structure of the text won't be changing at all, and assuming the square brackets were used as markers for the variable, you could try something like this:
string sTemplate = "Hi FirstName, how are you and FriendName?";
// Split the string into two parts. Before and after the comma.
string[] clauses = sTemplate.Split(',');
// Grab the last word in each part.
string[] names = new string[]
{
    clauses[0].Split(' ').Last(), // Using LINQ for .Last()
    clauses[1].Split(' ').Last().TrimEnd('?')
};
return names;

You will need to tokenize the text and then extract the terms.
char delimiter = ' ';
string[] tokenizedTerms = sTemplate.Split(delimiter);
string firstName = tokenizedTerms[1];
string friendName = tokenizedTerms[6];
char[] firstNameChars = firstName.ToCharArray();
firstName = new String(firstNameChars, 0, firstNameChars.Length - 1);
char[] friendNameChars = friendName.ToCharArray();
friendName = new String(friendNameChars, 0, friendNameChars.Length - 1);
Explanation:
Tokenizing separates the string into a string array in which each element is the character sequence between two delimiters; here the delimiter is a space, so the elements are the words. From this word array we want the 2nd word (index 1) and the 7th word (index 6). However, each of these terms has a punctuation mark at the end, so we convert each string to a char array and then back to a string without the last character.
Note:
This method assumes that each name is a single word. If the name is just Will, it will work, but if one of the names is Will Fisher (first and last name), it will not.

Two Capturing Groups in Regex

I have strings such as
(1)ABC(Some other text)
(2343)DEFGHIJ
(99)Q
I wanted a regex that would capture these strings into two groups like so:
1st: (1) 2nd: ABC(Some other text)
1st: (2343) 2nd: DEFGHIJ
1st: (99) 2nd: Q
So I wrote this regex:
var regex = new Regex("^\\((\\d+)(.*)\\)");
Match match = regex.Match(str);
But instead of the two groups I expected, I get three groups.
In the first example I get
(1)ABC(Some other text)
1
)ABC(Some other text
What's wrong?

The regex you are looking for is probably
@"^(\(\d+\))(.*)"
You had the opening parentheses in the wrong order: the capturing group has to start before the escaped \(, and the trailing \) is not needed. Note that there will be three groups because, as someone pointed out, group 0 is always the entire matched text. So:
string str = "(1)ABC(Some other text)";
var regex = new Regex(@"^(\(\d+\))(.*)");
Match match = regex.Match(str);
if (match.Success)
{
    string gr1 = match.Groups[1].Value; // (1)
    string gr2 = match.Groups[2].Value; // ABC(Some other text)
}

Get values between curly braces c#

I have never used regex before. I was able to find similar questions on the forum, but not exactly what I'm looking for.
I have a string like the following and need to get the values between the curly braces.
Ex: "{name}{name@gmail.com}"
I need to get the following split strings:
name and name@gmail.com
I tried the following, and it gives me back the same string.
string s = "{name}{name@gmail.com}";
string pattern = "({})";
string[] result = Regex.Split(s, pattern);

Use Matches of Regex rather than Split to accomplish this easily:
string input = "{name}{name@gmail.com}";
var regex = new Regex("{(.*?)}");
var matches = regex.Matches(input);
foreach (Match match in matches) // you can loop through your matches like this
{
    var valueWithoutBrackets = match.Groups[1].Value; // name, name@gmail.com
    var valueWithBrackets = match.Value; // {name}, {name@gmail.com}
}

Is using regex a must? In this particular example I would write:
s.Split(new char[] { '{', '}' }, StringSplitOptions.RemoveEmptyEntries)

Here you go:
string s = "{name}{name@gmail.com}";
s = s.Substring(1, s.Length - 2); // remove first and last characters
string pattern = "}{"; // split pattern "}{"
string[] result = Regex.Split(s, pattern);
or
string s = "{name}{name@gmail.com}";
s = s.TrimStart('{');
s = s.TrimEnd('}');
string pattern = "}{";
string[] result = Regex.Split(s, pattern);

How to split string into numerics and alphabets using Regex

I want to split a string like "001A" into "001" and "A"

string[] data = Regex.Split("001A", "([A-Z])");
data[0] -> "001"
data[1] -> "A"

Match match = Regex.Match(s, @"^(\d+)(.+)$");
string numeral = match.Groups[1].Value;
string tail = match.Groups[2].Value;

This is Java, but it should be translatable to other flavors with little modification.
String s = "123XYZ456ABC";
String[] arr = s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
System.out.println(Arrays.toString(arr));
// prints "[123, XYZ, 456, ABC]"
As you can see, this splits a string wherever \d is followed by a \D or vice versa. It uses lookbehind and lookahead assertions to find the places to split, so no characters are consumed.
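The same pattern should carry over to C# with a verbatim string, since .NET's Regex.Split also accepts zero-width matches. This is a sketch only; DigitLetterSplitter and SplitRuns are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class DigitLetterSplitter
{
    // Hypothetical helper: splits at every boundary between a digit and a
    // non-digit. The look-arounds match positions, not characters, so
    // nothing is dropped from the pieces.
    public static string[] SplitRuns(string input)
    {
        return Regex.Split(input, @"(?<=\d)(?=\D)|(?<=\D)(?=\d)");
    }
}
```

Applied to "001A" from the question, this should give "001" and "A"; applied to the Java sample string, it should give the same four pieces.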

If your code is as simple|complicated as your 001A sample, you should not be using a Regex but a for-loop.
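One possible shape for such a loop is sketched below. PrefixSplitter and SplitLeadingDigits are made-up names, and the sketch assumes the input is a run of digits followed by everything else, as in the 001A sample.

```csharp
public static class PrefixSplitter
{
    // Hypothetical helper: returns { leading digits, remainder }.
    public static string[] SplitLeadingDigits(string input)
    {
        int i;
        // Advance until the first character that is not a digit.
        for (i = 0; i < input.Length; i++)
        {
            if (!char.IsDigit(input[i]))
            {
                break;
            }
        }
        return new[] { input.Substring(0, i), input.Substring(i) };
    }
}
```

If the input has no leading digits, the first element is an empty string; if it is all digits, the second element is empty.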

And if there is more, like 001A002B, then you could do this:
var s = "001A002B";
var matches = Regex.Matches(s, "[0-9]+|[A-Z]+");
var numbers_and_alphas = new List<string>();
foreach (Match match in matches)
{
    numbers_and_alphas.Add(match.Value);
}

You could try something like this to retrieve the integers from the string:
StringBuilder sb = new StringBuilder();
Regex regex = new Regex(@"\d+"); // \d+ rather than \d*, which would also match empty strings
MatchCollection matches = regex.Matches(inputString);
for (int i = 0; i < matches.Count; i++) {
    sb.Append(matches[i].Value + " ");
}
Then change the regex to match letters and perform the same loop.
