Split string on Nth occurrence of char

Split string on Nth occurrence of char - c#

I have this a lot of strings like this:
29/10/2018 14:50:09402325 671
I want to split these string so they are like this:
29/10/2018 14:50
09402325 671
These will then be added to a data set and analysed later.
The issue I am having is if I use this code:
string[] words = emaildata.Split(':');
it splits them twice; I only want to split it once on the second occurrence of the :.
How can I do that?

You can use LastIndexOf() and some subsequent Substring() calls:
string input = "29/10/2018 14:50:09402325 671";
int index = input.LastIndexOf(':');
string firstPart = input.Substring(0, index);
string secondPart = input.Substring(index + 1);
Fiddle here
However, another thing to ask yourself is if you even need to make it more complicated than it needs to be. It looks like this data will always be of a the same length until that second : instance right? Why not just split at a known index (i.e not finding the : first):
string firstPart = input.Substring(0, 16);
string secondPart = input.Substring(17);

you can reverse the string, then call the regular split method asking for a single result, and then reverse back the two results

and with a regex : https://dotnetfiddle.net/Nfiwmv
using System;
using System.Text.RegularExpressions;
public class Program {
public static void Main() {
string input = "29/10/2018 14:50:09402325 671";
Regex rx = new Regex(#"(.*):([^:]+)",
RegexOptions.Compiled | RegexOptions.IgnoreCase);
MatchCollection matches = rx.Matches(input);
if ( matches.Count >= 1 ) {
var m = matches[0].Groups;
Console.WriteLine(m[1]);
Console.WriteLine(m[2]);
}
}
}

Related

Find pattern to solve regex in one step

I have a problem to find the pattern that solves the problem in onestep.
The string looks like this:
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$Text5$Text6 etc.
What i want to get is: Take up to 4x Text. If there are more than "4xText" take only the last sign.
Example:
Text1$Text2$Text3$Text4$Text5$Text6 -> Text1$Text2$Text3$Text4&56
My current solution is:
First pattern:
^([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?
After this i will do a substitution with the first pattern
New string: Text5$Text6
second pattern is:
([^\$])\b
result: 56
combine both and get the result:
Text1$Text2$Text3$Text4$56
For me it is not clear why i cant easily put the second pattern after the first pattern into one pattern. Is there something like an anchor that tells the engine to start the pattern from here like it would do if is would be the only pattern ?

You might use an alternation with a positive lookbehind and then concatenate the matches.
(?<=^(?:[^$]+\$){0,3})[^$]+\$?|[^$](?=\$|$)
Explanation
(?<= Positive lookbehind, assert what is on the left is
^(?:[^$]+\$){0,3} Match 0-3 times any char except $ followed by an optional $
) Close lookbehind
[^$]+\$? Match 1+ times any char except $, then match an optional $
| Or
[^$] Match any char except $
(?=\$|$) Positive lookahead, assert what is directly to the right is either $ or the end of the string
.NET regex demo | C# demo
Example
string pattern = #"(?<=^(?:[^$]*\$){0,3})[^$]*\$?|[^$](?=\$|$)";
string[] strings = {
"Text1",
"Text1$Text2$Text3",
"Text1$Text2$Text3$Text4$Text5$Text6"
};
Regex regex = new Regex(pattern);
foreach (String s in strings) {
Console.WriteLine(string.Join("", from Match match in regex.Matches(s) select match.Value));
}
Output
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$56

I strongly believe regular expression isn't the way to do that. Mostly because of the readability.
You may consider using simple algorithm like this one to reach your goal:
using System;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var result = "";
for(var i=0; i<parts.Length; i++){
result += (i <= 4 ? parts[i] + "$" : parts[i].Substring(4));
}
Console.WriteLine(result);
}
}
There are also linq alternatives :
using System;
using System.Linq;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var first4 = parts.Take(4);
var remainings = parts.Skip(4);
var result2 = string.Join("$", first4) + "$" + string.Join("", remainings.Select( r=>r.Substring(4)));
Console.WriteLine(result2);
}
}
It has to be adjusted to the actual needs but the idea is there

Try this code:
var texts = new string[] {"Text1", "Text1$Text2$Text3", "Text1$Text2$Text3$Text4$Text5$Text6" };
var parsed = texts
.Select(s => Regex.Replace(s,
#"(Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)",
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
)).ToArray();
// parsed is now: string[3] { "Text1$", "Text1$Text2$Text3$", "Text1$Text2$Text3$Text4$56" }
Explanation:
solution uses regex pattern: (Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)
(...) - first capturing group
(?:...) - non-capturing group
Text\d{1,3}(?:\$Text\d{1,3} - match Text literally, then match \d{1,3}, which is 1 up to three digits, \$ matches $ literally
Rest is just repetition of it. Basically, first group captures first four pieces, second group captures the rest, if any.
We also use MatchEvaluator here which is delegate type defined as:
public delegate string MatchEvaluator(Match match);
We define such method:
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
We use it to evaluate match, so takee first capturing group and concatenate with second, removing unnecessary text.

It's not clear to me whether your goal can be achieved using exclusively regex. If nothing else, the fact that you want to introduce a new character '&' into the output adds to the challenge, since just plain matching would never be able to accomplish that. Possibly using the Replace() method? I'm not sure that would work though...using only a replacement pattern and not a MatchEvaluator, I don't see a way to recognize but still exclude the "$Text" portion from the fifth instance and later.
But, if you are willing to mix regex with a small amount of post-processing, you can definitely do it:
static readonly Regex regex1 = new Regex(#"(Text\d(?:\$Text\d){0,3})(?:\$Text(\d))*", RegexOptions.Compiled);
static void Main(string[] args)
{
for (int i = 1; i <= 6; i++)
{
string text = string.Join("$", Enumerable.Range(1, i).Select(j => $"Text{j}"));
WriteLine(KeepFour(text));
}
}
private static string KeepFour(string text)
{
Match match = regex1.Match(text);
if (!match.Success)
{
return "[NO MATCH]";
}
StringBuilder result = new StringBuilder();
result.Append(match.Groups[1].Value);
if (match.Groups[2].Captures.Count > 0)
{
result.Append("&");
// Have to iterate (join), because we don't want the whole match,
// just the captured text.
result.Append(JoinCaptures(match.Groups[2]));
}
return result.ToString();
}
private static string JoinCaptures(Group group)
{
return string.Join("", group.Captures.Cast<Capture>().Select(c => c.Value));
}
The above breaks your requirement into three different capture groups in a regex. Then it extracts the captured text, composing the result based on the results.

Substitute only one group when dealing with an unknown number of capturing groups

Assuming I have this input:
/green/blah/agriculture/apple/blah/
I'm only trying to capture and replace the occurrence of apple (need to replace it with orange), so I have this regex
var regex = new Regex("^/(?:green|red){1}(?:/.*)+(apple){1}(?:/.*)");
So I'm grouping sections of the input, but as non-capturing, and only capturing the one I'm concerned with. According to this $` will retrieve everything before the match in the input string, and $' will get everything after, so theoretically the following should work:
"$`Orange$'"
But it only retrieves the match ("apple").
Is it possible to do this with just substitutions and NOT match evaluators and looping through groups?
The issue is that apple can occur anywhere in that url scheme, hence an unknown number of capture groups.
Thanks.

To achieve what you want, I slightly changed your regex.
The new regex looks like this look for the updated version at the end of the answer:
What I am doing here is, I want all the other groups to become captured groups. Doing this I can use them as follow:
String replacement = "$1Orange$2";
string result = Regex.Replace(text, regex.ToString(), replacement);
I am using group 1,2 and 4 and in the middle of everything (where I suspect 'apple') I replace it with Orange.
A complete example looks like this:
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
String replacement = "$1$2Orange$4";
string result = Regex.Replace(text, regex.ToString(), replacement);
Console.WriteLine(result);
}
}
And as well a running example is here
See the updated regex, I needed to change it again to capture things like this:
/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple
With the above regex it matched the last apple and not the first as prio designated. I changed the regex to this:
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
I updated the code as well as the running example.

If you really want to replace only the first occurence of apple and dont mind about the URL structure then can you use one of the following methods:
First simply use apple as regex and use the overloaded Replace method.
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex(Regex.Escape("apple"));
String replacement = "Orange";
string result = regex.Replace(text, replacement.ToString(), 1);
Console.WriteLine(result);
}
}
See working Example
Second is the use of IndexOf and Substring which could be much quick as the use of the regex classes.
See the following Example:
class Program
{
static void Main(string[] args)
{
string search = "apple";
string text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
int idx = text.IndexOf(search);
int endIdx = idx + search.Length;
int secondStrLen = text.Length - endIdx;
if (idx != -1 && idx < text.Length && endIdx < text.Length && secondStrLen > -1)
{
string first = text.Substring(0, idx);
string second = text.Substring(endIdx, secondStrLen);
string result = first + "Orange" + second;
Console.WriteLine(result);
}
}
}
Working Example

Split string and string arrays

string s= abc**xy**efg**xy**ijk123**xy**lmxno**xy**opq**xy**rstz;
I want the output as string array, where it get splits at "xy". I used
string[] lines = Regex.Split(s, "xy");
here it removes xy. I want array along with xy. So, after I split my string to string array, array should be as below.
lines[0]= abc;
lines[1]= xyefg;
lines[2]= xyijk123;
lines[3]= xylmxno;
lines[4]= xyopq ;
lines[5]= xyrstz;
how can i do this?

(?=xy)
You need to split on 0 width assertion.See demo.
https://regex101.com/r/fM9lY3/50
string strRegex = #"(?=xy)";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = #"abcxyefgxyijk123xylmxnoxyopqxyrstz";
return myRegex.Split(strTargetString);
Output:
abc
xyefg
xyijk123
xylmxno
xyopq
xyrstz

It seems fairly simple to do this:
string s = "abc**xy**efg**xy**ijk123**xy**lmxno**xy**opq**xy**rstz";
string[] lines = Regex.Split(s, "xy");
lines = lines.Take(1).Concat(lines.Skip(1).Select(l => "xy" + l)).ToArray();
I get the following result:
I don't know if you wanted to keep the ** - your question doesn't make it clear. Changing the RegEx to #"\*\*xy\*\*" will remove the **.

If you're not married to Regex, you could make your own extension method:
public static IEnumerable<string> Ssplit(this string InputString, string Delimiter)
{
int idx = InputString.IndexOf(Delimiter);
while (idx != -1)
{
yield return InputString.Substring(0, idx);
InputString = InputString.Substring(idx);
idx = InputString.IndexOf(Delimiter, Delimiter.Length);
}
yield return InputString;
}
Usage:
string s = "abc**xy**efg**xy**ijk123**xy**lmxno**xy**opq**xy**rstz";
var x = s.Ssplit("xy");

How about simply looping throgh the array starting with index 1 and adding the "xy" string to each entry?
Alternatively implement your own version of split that cuts the string how you want it.
Yeat another solution would be matching "xy*" in a non-greedy way and your array would be the list of all matches. Depending on language this probably won't be called split BTW.

String "de-concatenation"

I have two strings like this
string s = "abcdef";
string t = "def";
I would like to remove t from s. Can I do this like this?
s = s - t?
EDIT
I will have two strings s and t, t will be an ending substring of s. I want to remove t from s.

No, but you can do this:
var newStr = "abcdef".Replace("def", "");
Per your comments, if you want to only remove the trailing pattern you can use a Regex:
var newStr = Regex.Replace("defdefdef", "(def)$", "");
The '$' will anchor to the end of the string, so it will only remove the final 'def'
Turning this into an extension method:
public static String ReplaceEnd(this string input, string subStr, string replace = "")
{
//Per Alexei Levenkov's comments, the string should
// be escaped in order to avoid accidental injection
// of special characters into the Regex pattern
var escaped = Regex.Escape(subStr);
var pattern = String.Format("({0})$", escaped);
return Regex.Replace(input, pattern, replace);
}
Using this method with your code above would become:
string s = "abcdef";
string t = "def";
s = s.ReplaceEnd(t); // Ta Da!

Like this:
if (s.EndsWith(t))
{
s = s.Substring(0, s.LastIndexOf(t));
}

s = s.Substring(0, s.Length - t.Length)
Substring takes two arguments: start and length. You want to take things from the start of abcdef, that's index 0, and you want to take all the characters minus the characters from t, which is the difference of length of the two strings.
This assumes the OP's contract of "t will be an ending substring of s". If in fact this precondition is not guaranteed, it needs if (s.EndsWith(t)) around it.

Extract the last word from a string using C#

My string is like this:
string input = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
Here actually I want to extract the last word, 'API', and return.
What would be the C# code to do the above extraction?

Well, the naive implementation to that would be to simply split on each space and take the last element.
Splitting is done using an instance method on the String object, and the last of the elements can either be retrieved using array indexing, or using the Last LINQ operator.
End result:
string lastWord = input.Split(' ').Last();
If you don't have LINQ, I would do it in two operations:
string[] parts = input.Split(' ');
string lastWord = parts[parts.Length - 1];
While this would work for this string, it might not work for a slightly different string, so either you'll have to figure out how to change the code accordingly, or post all the rules.
string input = ".... ,API";
Here, the comma would be part of the "word".
Also, if the first method of obtaining the word is correct, that is, everything after the last space, and your string adheres to the following rules:
Will always contain at least one space
Does not end with one or more spaces (in case of this you can trim it)
Then you can use this code that will allocate fewer objects on the heap for GC to worry about later:
string lastWord = input.Substring(input.LastIndexOf(' ') + 1);
However, if you need to consider commas, semicolons, and whatnot, the first method using splitting is the best; there are fewer things to keep track of.

First:
using System.Linq; // System.Core.dll
then
string last = input.Split(' ').LastOrDefault();
// or
string last = input.Trim().Split(' ').LastOrDefault();
// or
string last = input.Trim().Split(' ').LastOrDefault().Trim();

var last = input.Substring(input.LastIndexOf(' ')).TrimStart();
This method doesn't allocate an entire array of strings as the others do.

string workingInput = input.Trim();
string last = workingInput.Substring(workingInput.LastIndexOf(' ')).Trim();
Although this may fail if you have no spaces in the string. I think splitting is unnecessarily intensive just for one word :)

static class Extensions
{
private static readonly char[] DefaultDelimeters = new char[]{' ', '.'};
public string LastWord(this string StringValue)
{
return LastWord(StringValue, DefaultDelimeters);
}
public string LastWord(this string StringValue, char[] Delimeters)
{
int index = StringValue.LastIndexOfAny(Delimeters);
if(index>-1)
return StringValue.Substring(index);
else
return null;
}
}
class Application
{
public void DoWork()
{
string sentence = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
string lastWord = sentence.LastWord();
}
}

var lastWord = input.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries).Last();

string input = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
var a = input.Split(' ');
Console.WriteLine(a[a.Length-1]);

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Split string on Nth occurrence of char - c#

you can reverse the string, then call the regular split method asking for a single result, and then reverse back the two results

Related

Find pattern to solve regex in one step

Substitute only one group when dealing with an unknown number of capturing groups

Split string and string arrays

String "de-concatenation"

Extract the last word from a string using C#

Categories

Resources