Get a special, signed part of a string [duplicate] - c#

I'm trying to develop a method that will match all strings between two strings:
I've tried this but it returns only the first match:
string ExtractString(string s, string start,string end)
{
// You should check for errors in real-world code, omitted for brevity
int startIndex = s.IndexOf(start) + start.Length;
int endIndex = s.IndexOf(end, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
}
Let's suppose we have this string
String Text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
I would like a C# function doing the following:
public List<string> ExtractFromString(String Text,String Start, String End)
{
List<string> Matched = new List<string>();
.
.
.
return Matched;
}
// Example of use
ExtractFromString("A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2","A1","A2")
// Will return :
// FIRSTSTRING
// SECONDSTRING
// THIRDSTRING
Thank you for your help !

private static List<string> ExtractFromBody(string body, string start, string end)
{
List<string> matched = new List<string>();
int indexStart = 0;
int indexEnd = 0;
bool exit = false;
while (!exit)
{
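indexStart = body.IndexOf(start);
// Look for the end marker only after the start marker, so the two cannot overlap.
indexEnd = indexStart == -1 ? -1 : body.IndexOf(end, indexStart + start.Length);
if (indexEnd != -1)
{
matched.Add(body.Substring(indexStart + start.Length, indexEnd - indexStart - start.Length));
body = body.Substring(indexEnd + end.Length);
}
else
{
// No further start marker, or a start marker without a matching end marker.
exit = true;
}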
}
return matched;
}

Here is a solution using RegEx. Don't forget to include the following using statement.
using System.Text.RegularExpressions;
It will correctly return only text between the start and end strings given.
Will not be returned:
akslakhflkshdflhksdf
Will be returned:
FIRSTSTRING
SECONDSTRING
THIRDSTRING
It uses the regular expression pattern [start string].+?[end string]
The start and end strings are escaped in case they contain regular expression special characters.
private static List<string> ExtractFromString(string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
You could make that into an extension method of String like this:
public static class StringExtensionMethods
{
public static List<string> EverythingBetween(this string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
}
Usage:
string source = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
string start = "A1";
string end = "A2";
List<string> results = source.EverythingBetween(start, end);

// Note: this also returns the text outside the markers (akslakhflkshdflhksdf in the example).
text.Split(new[] {"A1", "A2"}, StringSplitOptions.RemoveEmptyEntries);

You can split the string into an array on the start identifier with the following code:
String str = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
String[] arr = str.Split(new[] { "A1" }, StringSplitOptions.None);
Then iterate through the array and keep only the part of each element that comes before the first A2. Simply removing the last 2 characters is not enough, because an element can carry extra text after the A2 (here SECONDSTRINGA2akslakhflkshdflhksdf). You'll also need to discard the first array element, which is empty when the string starts with A1.
Code is untested, currently on a mobile
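The split-based idea above could be sketched roughly as follows. This is only a sketch under assumptions: the names SplitExtractor and ExtractBySplitting are made up for illustration, and each piece is cut at the first end marker so that trailing text after A2 is dropped.

```csharp
using System;
using System.Collections.Generic;

public static class SplitExtractor
{
    // Hypothetical helper: split on the start marker, then cut each piece at the end marker.
    public static List<string> ExtractBySplitting(string text, string start, string end)
    {
        var results = new List<string>();
        string[] pieces = text.Split(new[] { start }, StringSplitOptions.None);

        // pieces[0] is whatever precedes the first start marker, so it is skipped.
        for (int i = 1; i < pieces.Length; i++)
        {
            int endIndex = pieces[i].IndexOf(end, StringComparison.Ordinal);
            if (endIndex >= 0)
            {
                // Keep only the text before the first end marker.
                results.Add(pieces[i].Substring(0, endIndex));
            }
        }
        return results;
    }
}
```

Pieces that contain no end marker are ignored, which seems the safest reading of the question, but that choice is an assumption.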

This is a generic solution, and I believe more readable code. Not tested, so beware.
public static IEnumerable<IList<T>> SplitBy<T>(this IEnumerable<T> source,
Func<T, bool> startPredicate,
Func<T, bool> endPredicate,
bool includeDelimiter)
{
var l = new List<T>();
foreach (var s in source)
{
if (startPredicate(s))
{
if (l.Any())
{
l = new List<T>();
}
l.Add(s);
}
else if (l.Any())
{
l.Add(s);
}
// Only close a group that was actually opened; otherwise GetRange below would throw.
if (endPredicate(s) && l.Any())
{
if (includeDelimiter)
yield return l;
else
yield return l.GetRange(1, l.Count - 2);
l = new List<T>();
}
}
}
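In your case the call would look like the code below. Note that the predicates receive one element at a time, so the source has to be a sequence of tokens in which A1 and A2 are separate elements; calling it directly on a string (a sequence of char) with x == "A1" does not compile.
var tokens = new[] { "A1", "FIRSTSTRING", "A2", "A1", "SECONDSTRING", "A2", "akslakhflkshdflhksdf", "A1", "THIRDSTRING", "A2" };
var splits = tokens.SplitBy(x => x == "A1", x => x == "A2", false);
This is not the most efficient approach when you do not want the delimiters included in the result (as in your case), but it is efficient in the opposite case. To speed up your case, you can call GetEnumerator directly and make use of MoveNext.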

Related

How do I find the common string and the differing strings in a List<string>?

I have list like this:
List<string> myList = new List<string>()
{
"AS2258B43C014AI9954803",
"AS2258B43C014AI9954603",
"AS2258B43C014AI9954703",
"AS2258B43C014AI9954503",
"AS2258B43C014AI9954403",
"AS2258B43C014AI9954203",
"AS2258B43C014AI9954303",
"AS2258B43C014AI9954103",
};
I want to output something in the format sameString+diffString0\diffString1\diffString2..., like this: "AS2258B43C014AI9954803\603\703\503\403\203\303\103"
What should I do?
You can create a method which returns the difference between two strings:
private static string GetDiff (string s1, string s2)
{
int i;
for(i = 0; i < Math.Min(s1.Length,s2.Length); i++)
{
if(s1[i] != s2[i])
{
break;
}
}
return s2.Substring(i);
}
This method iterates until the first character that differs and returns the remaining characters of the second string.
With that method, you can obtain your result with the LINQ query:
string first = myList[0];
string result = first + "\\" + string.Join("\\", myList.Skip(1).Select(x => GetDiff(first, x)));
Online demo: https://dotnetfiddle.net/TPkhmz
Alphanumeric sorting using LINQ
Create a static method that pads the numbers:
public static string PadNumbers(string input)
{
return Regex.Replace(input, "[0-9]+", match => match.Value.PadLeft(10, '0'));
}
Then call it:
var result = myList.OrderBy(x => PadNumbers(x));
After that you can find the difference.
The simplest solution is to create a function like this :
public static string Output(List<String> ListString)
{
// The first 19 characters ("AS2258B43C014AI9954") are common to every item in the example.
string sameString = ListString.First().Substring(0, 19);
string output = sameString;
foreach (var item in ListString)
{
output += item.Substring(19);
output += "\\";
}
return output.Substring(0, output.Length - 1);
}
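If the length of the shared part is not known in advance, it can be computed instead of hard-coded. The following is a minimal sketch under that assumption; PrefixJoiner, CommonPrefixLength and JoinOnCommonPrefix are illustrative names, not part of any answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixJoiner
{
    // Length of the longest prefix shared by every string in the list.
    public static int CommonPrefixLength(IReadOnlyList<string> items)
    {
        if (items.Count == 0) return 0;
        int length = items[0].Length;
        foreach (string item in items)
        {
            int i = 0;
            while (i < length && i < item.Length && item[i] == items[0][i]) i++;
            length = i; // the shared prefix can only shrink as more items are checked
        }
        return length;
    }

    // Emits the shared prefix once, followed by each item's remainder joined by the separator.
    public static string JoinOnCommonPrefix(IReadOnlyList<string> items, string separator)
    {
        if (items.Count == 0) return string.Empty;
        int prefixLength = CommonPrefixLength(items);
        string prefix = items[0].Substring(0, prefixLength);
        return prefix + string.Join(separator, items.Select(s => s.Substring(prefixLength)));
    }
}
```

Called as PrefixJoiner.JoinOnCommonPrefix(myList, "\\"), this should produce the format asked for in the question, with every differing part cut at the same position.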

Extract index numbers from list of string

I have a list of strings, each of which contains an index, and I need to extract those indexes and put them in a List<int>.
Here is an example list:
List<string> values = new List<string>();
values.Add("cohabitantGender");
values.Add("additionalDriver0LastName");
values.Add("additionalDriver0AgeWhenLicensed");
values.Add("vehicle0City");
values.Add("vehicle1City");
values.Add("vehicle2City");
values.Add("vehicle3City");
From this list I need to extract the indexes from the values of the form vehicleXCity.
I have this code right now:
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
var selectedMatches = values.Where(v => v.StartsWith(prefix) && v.EndsWith(suffix)).Select(v=> v).ToList();
var indexes = new List<int>();
foreach (var v in selectedMatches) indexes.Add(int.Parse(Regex.Match(v, @"\d+").Value));
return indexes;
}
And I'm using it like this:
List<int> indexes = FormObjectIndexExtractor(values, "vehicle", "City");
But if I have a value like vehicle4AnotherCity, the code picks it up as well, which is wrong.
Does anyone have some alternative to this code that may help?
Below is a helper class in case you have a larger list and need multiple exclusion options:
public class INumberList : List<string>
{
public List<int> GetNumberList()
{
List<int> numberList = new List<int>();
for (int i = 0; i < this.Count; i++)
{
numberList.Add(GetIntFromString(this[i]));
}
return numberList;
}
public INumberList ExcludeIndex(string prefix, string suffix)
{
// Iterate backwards so that removing an item does not skip the one after it.
for (int i = this.Count - 1; i >= 0; i--)
{
if (this[i].StartsWith(prefix) && this[i].EndsWith(suffix))
{
//remove non needed indexes
this.RemoveAt(i);
}
}
return this;
}
public static int GetIntFromString(String input)
{
// Remove everything that is not a digit.
String inputCleaned = Regex.Replace(input, "[^0-9]", "");
int value = 0;
// Tries to parse the int, returns false on failure.
if (int.TryParse(inputCleaned, out value))
{
// The result from parsing can be safely returned.
return value;
}
return 0; // Or any other default value.
}
}
Then use like this:
INumberList values = new INumberList();
values.Add("cohabitantGender");
values.Add("additionalDriver0LastName");
values.Add("additionalDriver0AgeWhenLicensed");
values.Add("vehicle0City");
values.Add("vehicle1City");
values.Add("vehicle2City");
values.Add("vehicle3City");
//Get filtered index list with multiple exclusion option
List<int> indexList = values.ExcludeIndex("cohabitantGender","")
.ExcludeIndex("additionalDriver","AgeWhenLicensed")
.GetNumberList();
//will return [0,0,1,2,3]
Try this:
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
var s = "^" + prefix + @"(\d+)\.*?" + suffix + "$";
return values
.Where(v => Regex.Match(v, s).Success)
.Select(v=> int.Parse(Regex.Match(v, s).Groups[1].Value))
.ToList();
}
Here is a solution (using modern C# features) that does not use Regex:
public static List<int> FormObjectIndexExtractor(IEnumerable<string> values, string prefix, string suffix)
{
int? TryParseItem(string val)
{
if (val.Length <= prefix.Length + suffix.Length || !val.StartsWith(prefix) || !val.EndsWith(suffix))
return null;
var subStr = val.Substring(prefix.Length, val.Length - prefix.Length - suffix.Length);
if (int.TryParse(subStr, out var number))
return number;
return null;
}
return values.Select(TryParseItem).Where(v => v.HasValue).Select(v => v.Value).ToList();
}
This version splits all the parts of string.
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
List<int> ret = new List<int>();
Regex r = new Regex("^([a-zA-Z]+)(\\d+)([a-zA-Z]+)$");
foreach (var s in values)
{
var match = r.Match(s);
if (match.Success)
{
if (match.Groups[1].ToString() == prefix && match.Groups[3].ToString() == suffix)
{
ret.Add(int.Parse(match.Groups[2].ToString()));
}
}
}
return ret;
}
or alternatively:
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
List<int> ret = new List<int>();
// A verbatim interpolated string is needed here; "\d" is not a valid escape in a regular string.
Regex r = new Regex($@"^{prefix}(\d+){suffix}$");
foreach (var s in values)
{
var match = r.Match(s);
if (match.Success)
{
ret.Add(int.Parse(match.Groups[1].ToString()));
}
}
return ret;
}
This is the more generic version.
Regex match:
starts with 'vehicle'
matches a number
ends with 'City'.
Parse and return as List<int>
var indexes = values.Where(a => Regex.IsMatch(a, @"^vehicle\d+City$")).
Select(k => int.Parse(Regex.Match(k, @"\d+").Value)).ToList();

Split string into substrings based on the separator from the left

In C#, is there an elegant way to split a string like "a.b.c" into a, a.b, a.b.c?
The number of separators is not fixed, so it could be "a.b", which will output {a, a.b}, or "a.b.c.d", which will output {a, a.b, a.b.c, a.b.c.d}.
The only thing I can think of is split the string into individual components and then concatenate it again.
This is what I have so far:
var fieldNames = new List<string>();
var fieldSeparator ='.';
var myString = "a.b.c.d";
var individualFields = myString.Split(fieldSeparator);
string name = "";
foreach(var fieldName in individualFields)
{
name = string.IsNullOrEmpty(name) ? fieldName : $"{name}{fieldSeparator}{fieldName}";
fieldNames.Add(name);
}
Maybe this extension?
public static string[] SplitCombineFirst(this string str, params string[] delimiter)
{
string[] tokens = str.Split(delimiter, StringSplitOptions.RemoveEmptyEntries);
var allCombinations = new List<string>(tokens.Length);
for(int take = 1; take <= tokens.Length; take++)
{
string combination = string.Join(delimiter[0], tokens.Take(take));
allCombinations.Add(combination);
}
return allCombinations.ToArray();
}
Call:
string[] result = "a.b.c".SplitCombineFirst(".");
This looks like a classic case for recursion.
List<string> splitCombine(string source, string delimiter, int startIndex)
{
List<string> result = new List<string>();
var indx = source.IndexOf(delimiter, startIndex);
if (indx >= 0)
{
if (indx > 0)
{
result.Add(source.Substring(0, indx));
}
result.AddRange(splitCombine(source, delimiter, ++indx));
}
else
{
result.Add(source);
}
return result;
}
Call:
var result = splitCombine("a.b.c.d.e", ".", 0);
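The same result can also be produced with a plain loop over the separator positions, without splitting and re-joining and without recursion. This is a small sketch; PrefixSplitter and Prefixes are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSplitter
{
    // Returns every prefix of source that ends just before a separator, plus source itself.
    public static List<string> Prefixes(string source, char separator)
    {
        var result = new List<string>();
        int index = source.IndexOf(separator);
        while (index >= 0)
        {
            result.Add(source.Substring(0, index));
            // Continue searching after the separator that was just found.
            index = source.IndexOf(separator, index + 1);
        }
        result.Add(source);
        return result;
    }
}
```

Unlike the recursive version, this sketch does not skip an empty prefix when the string starts with the separator; whether that matters depends on the input.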

How do I produce a full set of combinations with string manipulation?

I have a small project where I have an input sentence where it is possible for the user to specify variations:
The {small|big} car is {red|blue}
Above is a sample sentence I want to split into 4 sentences, like this:
The small car is red
The big car is red
The small car is blue
The big car is blue
I can't seem to wrap my mind around the problem. Maybe someone can help me, please.
Edit
Here is my initial code
Regex regex = new Regex("{(.*?)}", RegexOptions.Singleline);
MatchCollection collection = regex.Matches(richTextBox1.Text);
string data = richTextBox1.Text;
//build amount of variations
foreach (Match match in collection)
{
string[] alternatives = match.Value.Split(new char[] { '|', '{', '}' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string alternative in alternatives)
{
//here i get problems
}
}
It sounds like you need a dynamic Cartesian product function for this; see Eric Lippert's blog post written in response to the question Generating all Possible Combinations.
Firstly, we need to parse the input string:
Regex ex = new Regex(@"(?<=\{)(?<words>\w+(\|\w+)*)(?=\})");
var sentence = "The {small|big} car is {red|blue}";
then the input string should be modified to be used in string.Format-like functions:
int matchCount = 0;
var pattern = ex.Replace(sentence, me =>
{
return (matchCount++).ToString();
});
// pattern now contains "The {0} car is {1}"
then we need to find all the matches and to apply Eric's excellent CartesianProduct extension method:
var set = ex.Matches(sentence)
.Cast<Match>()
.Select(m =>
m.Groups["words"].Value
.Split('|')
).CartesianProduct();
foreach (var item in set)
{
Console.WriteLine(pattern, item.ToArray());
}
this will produce:
The small car is red
The small car is blue
The big car is red
The big car is blue
and, finally, the CartesianProduct method (taken from Eric Lippert's blog post mentioned above):
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item}));
}
private void ExpandString( List<string> result, string text )
{
var start = text.IndexOf('{');
var end = text.IndexOf('}');
if (start >= 0 && end > start)
{
var head = text.Substring(0, start);
var list = text.Substring(start + 1, end - start - 1).Split('|');
var tail = text.Substring(end + 1);
foreach (var item in list)
ExpandString(result, head + item + tail);
}
else
result.Add(text);
}
Use like:
var result = new List<string>();
ExpandString(result, "The {small|big} car is {red|blue}");
If you don't know the number of variations, recursion is your friend:
static public IEnumerable<string> permute(string template)
{
List<string> options;
string before;
string after;
if (FindFirstOptionList(template, out options, out before, out after))
{
foreach (string option in options)
{
foreach (string permutation in permute(before + option + after))
{
yield return permutation;
}
}
}
else
{
yield return template;
}
}
static public bool FindFirstOptionList(string template, out List<string> options, out string before, out string after)
{
before = string.Empty;
after = string.Empty;
options = new List<string>(0);
if (template.IndexOf('{') == -1)
{
return false;
}
before = template.Substring(0, template.IndexOf('{'));
template = template.Substring(template.IndexOf('{') + 1);
if (template.IndexOf('}') == -1)
{
return false;
}
after = template.Substring(template.IndexOf('}') + 1);
options = template.Substring(0, template.IndexOf('}')).Split('|').ToList();
return true;
}
Usage is similar to danbystrom's solution, except that this one returns an IEnumerable instead of modifying one of the caller's parameters. Beware of syntax errors, etc.
static void Main()
{
foreach(string permutation in permute("The {small|big} car is {red|blue}"))
{
Console.WriteLine(permutation);
}
}
I would propose splitting the input text into an ordered list of static and dynamic parts. Each dynamic part contains a list that stores its values and an index that represents the currently selected value. This index is initially set to zero.
To print out all possible combinations, you first have to implement a method that prints the complete list using the currently selected indices of the dynamic parts. For the first call, all indices will be zero.
Now you can increment the index of the first dynamic part and print the complete list again. This will give you the first variation. Repeat this until you have printed all possible values of the remaining dynamic parts.
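One possible reading of this index-based approach is sketched below. It is an interpretation rather than the answerer's own code: TemplateExpander and Expand are illustrative names, the {a|b} parsing regex is an assumption, and the indices are advanced like an odometer, carrying into the next dynamic part on overflow.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TemplateExpander
{
    public static List<string> Expand(string template)
    {
        // Split the template into static parts and the option lists between them.
        var staticParts = new List<string>();
        var options = new List<string[]>();
        int position = 0;
        foreach (Match m in Regex.Matches(template, @"\{(.*?)\}"))
        {
            staticParts.Add(template.Substring(position, m.Index - position));
            options.Add(m.Groups[1].Value.Split('|'));
            position = m.Index + m.Length;
        }
        staticParts.Add(template.Substring(position));

        var indices = new int[options.Count]; // currently selected value per dynamic part
        var results = new List<string>();
        while (true)
        {
            // Build the sentence for the currently selected indices.
            var sb = new StringBuilder();
            for (int i = 0; i < options.Count; i++)
            {
                sb.Append(staticParts[i]).Append(options[i][indices[i]]);
            }
            sb.Append(staticParts[options.Count]);
            results.Add(sb.ToString());

            // Advance the first index; on overflow reset it and carry into the next part.
            int k = 0;
            while (k < indices.Length && ++indices[k] == options[k].Length)
            {
                indices[k] = 0;
                k++;
            }
            if (k == indices.Length) break; // every index wrapped around: all combinations done
        }
        return results;
    }
}
```

Because the first dynamic part changes fastest, the sentences come out in the same order as listed in the question.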
Consider nesting iterative loops. Something like...
foreach(string s in someStringSet)
{
foreach(string t in someOtherStringSet)
{
// do something with s and t
}
}
Perhaps you are looking for this:
Edited version
static void Main(string[] args)
{
var thisstring = "The {Small|Big} car is {Red|Blue}";
string FirstString = thisstring.Substring(thisstring.IndexOf("{"), (thisstring.IndexOf("}") - thisstring.IndexOf("{")) + 1);
string[] FirstPossibility = FirstString.Replace("{", "").Replace("}", "").Split('|');
thisstring = thisstring.Replace(FirstString, "[0]");
string SecondString = thisstring.Substring(thisstring.IndexOf("{"), (thisstring.IndexOf("}") - thisstring.IndexOf("{")) + 1);
string[] SecondPosibility = SecondString.Replace("{", "").Replace("}", "").Split('|');
thisstring = thisstring.Replace(SecondString, "{1}").Replace("[0]", "{0}");
foreach (string tempFirst in FirstPossibility)
{
foreach (string tempSecond in SecondPosibility)
{
Console.WriteLine(string.Format(thisstring, tempFirst, tempSecond));
}
}
Console.Read();
}
Something like this should work:
private void Do()
{
string str = "The {small|big} car is {red|blue}";
Regex regex = new Regex("{(.*?)}", RegexOptions.Singleline);
int i = 0;
var strWithPlaceHolders = regex.Replace(str, m => "{" + (i++).ToString() + "}");
var collection = regex.Matches(str);
var alternatives = collection.OfType<Match>().Select(m => m.Value.Split(new char[] { '|', '{', '}' }, StringSplitOptions.RemoveEmptyEntries));
var replacers = GetReplacers(alternatives);
var combinations = new List<string>();
foreach (var replacer in replacers)
{
combinations.Add(string.Format(strWithPlaceHolders, replacer));
}
}
private IEnumerable<object[]> GetReplacers(IEnumerable<string[]> alternatives)
{
return GetAllPossibilities(0, alternatives.ToList());
}
private IEnumerable<object[]> GetAllPossibilities(int level, List<string[]> list)
{
if (level == list.Count - 1)
{
foreach (var elem in list[level])
yield return new[] { elem };
}
else
{
foreach (var elem in list[level])
{
var thisElemAsArray = new object[] { elem };
foreach (var subPossibilities in GetAllPossibilities(level + 1, list))
yield return thisElemAsArray.Concat(subPossibilities).ToArray();
}
}
yield break;
}
string color = SomeMethodToGetColor();
string size = SomeMethodToGetSize();
string sentence = string.Format("The {0} car is {1}", size, color);
