How to split string by another string - c#

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.split, because it doesn't really have a delimeter.

You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($#"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if your string between the ISA and ESA have the same pattern that we are looking for, then you will have to find some smart way around it.
To explain the Regex a bit:
(?<=ESA) Look-behind assertion. This portion is not captured but still matched
(?=ISA) Look-ahead assertion. This portion is not captured but still matched
Using these look-around assertions you can find the correct | character for splitting

Simply use the
int x = whateverString.indexOf("?ISA"); // replace ? with the actual character here
and then just use the substring from 0 to that indexOf, indexOf to length.
Edit:
If ? is not known,
can we just use the regex Pattern and Matcher.
Matcher matcher = Patter.compile("ISA.*ESA").match(whateverString);
if(matcher.find()) {
matcher.find();
int x = matcher.start();
}
Here x would give that start index of that match.
Edit: I mistakenly saw it as java one, for C#
string pattern = #"ISA.*ESA";
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.writeLine(m.value);
m = m.NextMatch(); // more matches
}

RegEx will probably be the best for this. See this link
Mask would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, #"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches If you need multiple matches found, and specify different RegexOptions if needed.

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
firstPart = "ISA~this is date~ESA|"
secondPart = "ISA~this is more data~ESA|";

Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, #"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(#"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.

Related

In Perl you use brackets to extract your matches what is the equivalent of that in c#

For instance in Perl I can do
$x=~/$(\d+)\s/ which is basically saying from variable x find any number preceded by $ sign and followed by any white space character. Now $1 is equal to the number.
In C# I tried
Regex regex = new Regex(#"$(\d+)\s");
if (regex.IsMatch(text))
{
// need to access matched number here?
}
First off, your regex there $(\d+)\s actually means: find a number after the end of the string. It can never match. You have to escape the $ since it's a metacharacter.
Anyway, the equivalent C# for this is:
var match = Regex.Match(text, #"\$(\d+)\s");
if (match.Success)
{
var number = match.Groups[1].Value;
// ...
}
And, for better maintainability, groups can be named:
var match = Regex.Match(text, #"\$(?<number>\d+)\s");
if (match.Success)
{
var number = match.Groups["number"].Value;
// ...
}
And in this particular case you don't even have to use groups in the first place:
var match = Regex.Match(text, #"(?<=\$)\d+(?=\s)");
if (match.Success)
{
var number = match.Value;
// ...
}
To get a matched result, use Match instead of IsMatch.
var regex = new Regex("^[^#]*#(?<domain>.*)$");
// accessible via
regex.Match("foo#domain.com").Groups["domain"]
// or use an index
regex.Match("foo#domain.com").Matches[0]
Use the Match method instead of IsMatch and you need to escape $ to match it literally because it is a character of special meaning meaning "end of string".
Match m = Regex.Match(s, #"\$(\d+)\s");
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}

Get values between curly braces c#

I never used regex before. I was abel to see similar questions in forum but not exactly what im looking for
I have a string like following. need to get the values between curly braces
Ex: "{name}{name#gmail.com}"
And i Need to get the following splitted strings.
name and name#gmail.com
I tried the following and it gives me back the same string.
string s = "{name}{name#gmail.com}";
string pattern = "({})";
string[] result = Regex.Split(s, pattern);
Use Matches of Regex rather than Split to accomplish this easily:
string input = "{name}{name#gmail.com}";
var regex = new Regex("{(.*?)}");
var matches = regex.Matches(input);
foreach (Match match in matches) //you can loop through your matches like this
{
var valueWithoutBrackets = match.Groups[1].Value; // name, name#gmail.com
var valueWithBrackets = match.Value; // {name}, {name#gmail.com}
}
Is using regex a must? In this particular example I would write:
s.Split(new char[] { '{', '}' }, StringSplitOptions.RemoveEmptyEntries)
here you go
string s = "{name}{name#gmail.com}";
s = s.Substring(1, s.Length - 2);// remove first and last characters
string pattern = "}{";// split pattern "}{"
string[] result = Regex.Split(s, pattern);
or
string s = "{name}{name#gmail.com}";
s = s.TrimStart('{');
s = s.TrimEnd('}');
string pattern = "}{";
string[] result = Regex.Split(s, pattern);

regex to grab 2 strings from line

is this the correct way to handle this?
string item = "strawb bana .93";
string itemPattern = #"\w*";
string pricePattern = #"\d*\.\d*";
var match = Regex.Match(item, itemPattern, RegexOptions.IgnoreCase);
var match2 = Regex.Match(item, pricePattern, RegexOptions.IgnoreCase);
if (match.Success & match2.Success)
{
Console.WriteLine("match");
Console.WriteLine(match.Value);
Console.WriteLine(match2.Value);
}
else
Console.WriteLine("no match");
is there a more concise way perhaps? Actually, I'm not grabbing the item correctly. Basically, I want to grab the item and price.
Just change the line to this and it should match your item even if it contains spaces:
string itemPattern = #"[a-z\s]*";
UPDATE: A better approach is to use groups:
string item = "strawb bana as .93";
string itemPattern = #"([a-z\s]*)(\d*\.*\d*)";
var match = Regex.Match(item, itemPattern, RegexOptions.IgnoreCase);
if (match.Success)
{
Console.WriteLine("match");
Console.WriteLine("name: "+match.Groups[1].Value);
Console.WriteLine("Price: "+match.Groups[2].Value);
}
else
Console.WriteLine("no match");
Console.Read();
([a-zA-Z\s]+).*?(\d*\.\d{2}) //item in group 1, price in group 2
*case insensitive, matches prices of .93 or 11.93 (digits preceding decimals are optional), also will match a slighter weirder string like "Strawb bana-11.98"
updated: to match items with numbers in them:
([\w\s]+?).?(\d*\.\d{2}) //matches 'item42 Bananas .55'
(clearly you can keep dreaming up inputs and making the pattern progressively more complicated, but maybe I should just go to bed :)
You likely want to combine to one regex and use grouping to capture the parts you want (check my C# as I'm not a C# developer!)
string item = "strawb bana .93";
string pattern = #"([a-z\s]+)(\d*\.\d{2})"
var match = Regex.Match(item,pattern,RegexOptions.IgnoreCase);
if( match.Success ) {
item_name = match.Groups[1].value;
price = match.Groups[2].value;
}
As for the regex itself:
([a-z\s]+) matches one or more (the + sign) letters or spaces and captures them as group 1
(\d*\. starts group two and optionally matches one or more numbers followed by a period
\d{2}) matches exactly two digits and closes the second group
Edit: There seems to be some disagreement about how ? is interpreted by C#. I've changed \d+? to \d* which means "0 or more digits"
Try this regex:
[\w\s]*\d*\.\d*
and by using Grouping constructs, your code should be like this:
string item = "strawb bana .93";
foreach(Match match in Regex.Matches(item, #"(?<str>[\w\s]*)(?<num>\d*\.\d+)"))
{
String str = match.Groups["str"].Value;
String nuum = match.Groups["num"].Value;
}
explain:
(?< name > subexpression)
Captures the matched subexpression into a named group.

Search string pattern

If I have a string like MCCORMIC 3H R Final 08-26-2011.dwg or even MCCORMIC SMITH 2N L Final 08-26-2011.dwg and I wanted to capture the R in the first string or the L in the second string in a variable, what is the best method for doing so? I was thinking about trying the below statement but it does not work.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg"
string WhichArea = "";
int WhichIndex = 0;
WhichIndex = filename.IndexOf("Final");
WhichArea = filename.Substring(WhichIndex - 1,1); //Trying to get the R in front of word Final
Just split by space:
var parts = filename.Split(new [] {' '},
StringSplitOptions.RemoveEmptyEntries);
WhichArea = parts[parts.Length - 3];
It looks like the file names have a very specific format, so this will work just fine.
Even with any number of spaces, using StringSplitOptions.RemoveEmptyEntries means spaces will not be part of the split result set.
Code updated to deal with both examples - thanks Nikola.
I had to do something similar, but with Mirostation drawings instead of Autocad. I used regex in my case. Here's what I did, just in case you feel like making it more complex.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string filename2 = "MCCORMIC SMITH 2N L Final 08-26-2011.dwg";
Console.WriteLine(TheMatch(filename));
Console.WriteLine(TheMatch(filename2));
public string TheMatch(string filename) {
Regex reg = new Regex(#"[A-Za-z0-9]*\s*([A-Z])\s*Final .*\.dwg");
Match match = reg.Match(filename);
if(match.Success) {
return match.Groups[1].Value;
}
return String.Empty;
}
I don't think Oded's answer covers all cases. The first example has two words before the wanted letter, and the second one has three words before it.
My opinion is that the best way to get this letter is by using RegEx, assuming that the word Final always comes after the letter itself, separated by any number of spaces.
Here's the RegEx code:
using System.Text.RegularExpressions;
private string GetLetter(string fileName)
{
string pattern = "\S(?=\s*?Final)";
Match match = Regex.Match(fileName, pattern);
return match.Value;
}
And here's the explanation of RegEx pattern:
\S(?=\s*?Final)
\S // Anything other than whitespace
(?=\s*?Final) // Positive look-ahead
\s*? // Whitespace, unlimited number of repetitions, as few as possible.
Final // Exact text.

How to split string into numerics and alphabets using Regex

I want to split a string like "001A" into "001" and "A"
string[] data = Regex.Split("001A", "([A-Z])");
data[0] -> "001"
data[1] -> "A"
Match match = Regex.Match(s, #"^(\d+)(.+)$");
string numeral = match.Groups[1].Value;
string tail = match.Groups[2].Value;
This is Java, but it should be translatable to other flavors with little modification.
String s = "123XYZ456ABC";
String[] arr = s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
System.out.println(Arrays.toString(arr));
// prints "[123, XYZ, 456, ABC]"
As you can see, this splits a string wherever \d is followed by a \D or vice versa. It uses positive and negative lookarounds to find the places to split.
If your code is as simple|complicated as your 001A sample, your should not be using a Regex but a for-loop.
And if there's more like 001A002B then you could
var s = "001A002B";
var matches = Regex.Matches(s, "[0-9]+|[A-Z]+");
var numbers_and_alphas = new List<string>();
foreach (Match match in matches)
{
numbers_and_alphas.Add(match.Value);
}
You could try something like this to retrieve the integers from the string:
StringBuilder sb = new StringBuilder();
Regex regex = new Regex(#"\d*");
MatchCollection matches = regex.Matches(inputString);
for(int i=0; i < matches.count;i++){
sb.Append(matches[i].value + " ");
}
Then change the regex to match on characters and perform the same loop.

Categories

Resources