Find all pattern indexes in string in C#

How can I find all indexes of a pattern in a string using c#?
For example I want to find all ## pattern indexes in a string like this 45##78$$#56$$JK01UU

string pattern = "##";
string sentence = "45##78$$#56$$J##K01UU";
IList<int> indices = new List<int>();
foreach (Match match in Regex.Matches(sentence, pattern))
{
indices.Add(match.Index);
}
indices will contain 2 and 14.

Edited the code to make it a cleaner function.
public IEnumerable<int> FindAllIndexes(string str, string pattern)
{
int prevIndex = -pattern.Length; // so we start at index 0
int index;
while((index = str.IndexOf(pattern, prevIndex + pattern.Length)) != -1)
{
prevIndex = index;
yield return index;
}
}
string str = "45##78$$#56$$JK01UU";
string pattern = "##";
var indexes = FindAllIndexes(str, pattern);
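Both the Regex.Matches loop and FindAllIndexes above report non-overlapping matches, so for the input "###" and the pattern "##" only index 0 is returned. If overlapping occurrences are wanted, a minimal sketch could step forward by one character instead of by the pattern length. The names PatternSearch and FindAllIndexesOverlapping are hypothetical, and StringComparison.Ordinal is an assumption chosen to avoid culture-sensitive matching:

```csharp
using System;
using System.Collections.Generic;

public static class PatternSearch
{
    // Hypothetical helper: yields every index where pattern occurs,
    // including occurrences that overlap each other.
    public static IEnumerable<int> FindAllIndexesOverlapping(string str, string pattern)
    {
        // An empty pattern would match at every position; treat it as "no matches" here.
        if (string.IsNullOrEmpty(pattern)) yield break;

        int index = str.IndexOf(pattern, 0, StringComparison.Ordinal);
        while (index != -1)
        {
            yield return index;
            // Step one character forward so overlapping occurrences are found too.
            index = str.IndexOf(pattern, index + 1, StringComparison.Ordinal);
        }
    }
}
```

For example, PatternSearch.FindAllIndexesOverlapping("45###78", "##") yields 2 and 3, whereas FindAllIndexes would yield only 2.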

You can get all the indices of a pattern in a string by using a regex search like this.
string input = "45##78$$#56$$JK01UU", pattern = Regex.Escape("##");
Regex rx = new Regex(pattern);
var indices = new List<int>();
var matches = rx.Matches(input);
for (int i=0 ; i<matches.Count ; i++)
{
indices.Add(matches[i].Index);
}

Another one that tries to be efficient:
public IEnumerable<int> FindPatternIndexes(string input, string search)
{
var sb = new StringBuilder(input);
for (var i = 0; search.Length <= sb.Length; i++)
{
if (sb.ToString().StartsWith(search)) yield return i;
sb.Remove(0,1);
}
}

Tested. Worked. But somewhat dumb.
string foo = "45##78$$#56$$JK01UU";
char[] fooChar = foo.ToCharArray();
int i = 0;
bool register = false;
foreach (char fc in fooChar)
{
if (fc == '#' && register == true)
{
MessageBox.Show("Index: " + (i-1));
}
else if (fc == '#')
{
register = true;
}
else
{
register = false;
}
i++;
}

Related

Delete part of string value

I want to mix two strings into one randomly using foreach, but I don't know how to delete the part of the string I have already used inside the foreach, like this:
string s = "idontknow";
string sNew = "";
foreach(char ss in s){
s = s + ss;
ss.Delete(s); //don't exist
}
Here is the full code of what I'm trying to do:
do
{
if (state == 0)
{
for (int i = 0; random.Next(1, 5) > variable.Length; i++)
{
foreach (char ch in variable)
{
fullString = fullString + ch;
}
}
state++;
}
else if (state == 1)
{
for (int i = 0; random.Next(1, 5) > numbers.Length; i++)
{
foreach (char n in numbers)
{
fullString = fullString + n;
}
}
state--;
}
} while (variable.Length != 0 && numbers.Length != 0);
Your first code snippet cannot work as written: char has no Delete method, and reassigning s inside the loop has no effect on the iteration, because foreach keeps enumerating the string it started with.
Regarding your requirement to shuffle two strings together, this code sample might do the job:
public static string ShuffleStrings(string s1, string s2){
List<char> charPool = new();
foreach (char c in s1) {
charPool.Add(c);
}
foreach (char c in s2) {
charPool.Add(c);
}
Random rand = new();
char[] output = new char[charPool.Count];
for(int i = 0; i < output.Length; i++) {
int randomIndex = rand.Next(0, charPool.Count);
output[i] = charPool[randomIndex];
charPool.RemoveAt(randomIndex);
}
return new string(output);
}
In case you just want to shuffle one string into another string, just use an empty string as the first or second parameter.
Example:
string shuffled = ShuffleStrings("TEST", "string");
Console.WriteLine(shuffled);
// Output:
// EgsTtSnrTi
There are possibly other solutions, which are much shorter, but I think this code is pretty easy to read and understand.
Concerning performance, the code above should work for both small and large strings.
Since strings are immutable, each modify-operation on any string, e.g. "te" + "st" or "test".Replace("t", ""), will allocate and create a new string in the memory, which is - in a large scale - pretty bad.
For that very reason, I initialized a char array, which will then be filled.
Alternatively, you can use:
using System.Text;
StringBuilder sb = new();
// append each randomly picked char
sb.Append(c);
// Create a string from appended chars
sb.ToString();
And if your question was just how to remove the first char of a string:
string myStr = "Test";
foreach (char c in myStr) {
// do with c whatever you want
myStr = myStr[1..]; // assign a substring excluding the first char (start at index 1)
Console.WriteLine($"c = {c}; myStr = {myStr}");
}
// Output:
// c = T; myStr = est
// c = e; myStr = st
// c = s; myStr = t
// c = t; myStr =
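One of the shorter alternatives alluded to above could be a Fisher-Yates shuffle over a single char array, which avoids the repeated RemoveAt calls on the list. This is only a sketch: the names StringShuffler and ShuffleStringsInPlace are hypothetical, and passing the Random in as a parameter is an assumption made so that callers can seed it.

```csharp
using System;

public static class StringShuffler
{
    // Hypothetical alternative to ShuffleStrings: shuffle the combined characters in place.
    public static string ShuffleStringsInPlace(string s1, string s2, Random rand)
    {
        char[] chars = (s1 + s2).ToCharArray();

        // Fisher-Yates: walk from the end, swapping each slot with a random slot at or before it.
        for (int i = chars.Length - 1; i > 0; i--)
        {
            int j = rand.Next(i + 1); // 0 <= j <= i
            (chars[i], chars[j]) = (chars[j], chars[i]);
        }

        return new string(chars);
    }
}
```

Usage would look like StringShuffler.ShuffleStringsInPlace("TEST", "string", new Random()); the result always has the same characters as the two inputs combined, in random order.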

Get a special, signed part of a string [duplicate]

I'm trying to develop a method that will match all strings between two strings:
I've tried this but it returns only the first match:
string ExtractString(string s, string start,string end)
{
// You should check for errors in real-world code, omitted for brevity
int startIndex = s.IndexOf(start) + start.Length;
int endIndex = s.IndexOf(end, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
}
Let's suppose we have this string
String Text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
I would like a c# function doing the following :
public List<string> ExtractFromString(String Text,String Start, String End)
{
List<string> Matched = new List<string>();
.
.
.
return Matched;
}
// Example of use
ExtractFromString("A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2","A1","A2")
// Will return :
// FIRSTSTRING
// SECONDSTRING
// THIRDSTRING
Thank you for your help !
private static List<string> ExtractFromBody(string body, string start, string end)
{
List<string> matched = new List<string>();
int searchFrom = 0;
while (true)
{
int indexStart = body.IndexOf(start, searchFrom);
if (indexStart == -1)
{
break; // no further start marker
}
int contentStart = indexStart + start.Length;
// Look for the end marker only after the start marker
int indexEnd = body.IndexOf(end, contentStart);
if (indexEnd == -1)
{
break; // start marker without a matching end marker
}
matched.Add(body.Substring(contentStart, indexEnd - contentStart));
searchFrom = indexEnd + end.Length;
}
return matched;
}
Here is a solution using RegEx. Don't forget to include the following using statement.
using System.Text.RegularExpressions;
It will correctly return only text between the start and end strings given.
Will not be returned:
akslakhflkshdflhksdf
Will be returned:
FIRSTSTRING
SECONDSTRING
THIRDSTRING
It uses the regular expression pattern [start string].+?[end string]
The start and end strings are escaped in case they contain regular expression special characters.
private static List<string> ExtractFromString(string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
You could make that into an extension method of String like this:
public static class StringExtensionMethods
{
public static List<string> EverythingBetween(this string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
}
Usage:
string source = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
string start = "A1";
string end = "A2";
List<string> results = source.EverythingBetween(start, end);
text.Split(new[] {"A1", "A2"}, StringSplitOptions.RemoveEmptyEntries);
// Note: this also returns text that lies outside the delimiters (here akslakhflkshdflhksdf).
You can split the string into an array using the start identifier, as in the following code:
String str = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
String[] arr = str.Split(new[] { "A1" }, StringSplitOptions.None);
Then iterate through the array and cut each element at the first occurrence of A2. Removing only the last 2 characters is not enough, because other text can follow the A2 (as in the second element here). You'll also need to discard the first array element, as it will be empty assuming the string starts with A1.
Code is untested, currently on a mobile
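A minimal sketch of the split-based idea, assuming each segment should be cut at the first end marker rather than trimmed by a fixed two characters (text can follow the A2). The names SplitExtractor and ExtractBySplit are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class SplitExtractor
{
    // Hypothetical helper: split on the start marker, then cut each piece at the end marker.
    public static List<string> ExtractBySplit(string text, string start, string end)
    {
        var results = new List<string>();
        string[] parts = text.Split(new[] { start }, StringSplitOptions.None);

        // parts[0] is whatever precedes the first start marker, so skip it.
        for (int i = 1; i < parts.Length; i++)
        {
            int endIndex = parts[i].IndexOf(end, StringComparison.Ordinal);

            // Keep only the text before the first end marker; ignore pieces that have none.
            if (endIndex >= 0)
            {
                results.Add(parts[i].Substring(0, endIndex));
            }
        }

        return results;
    }
}
```

With the sample string and the markers "A1" and "A2", this returns FIRSTSTRING, SECONDSTRING and THIRDSTRING, and drops akslakhflkshdflhksdf.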
This is a generic solution, and I believe more readable code. Not tested, so beware.
public static IEnumerable<IList<T>> SplitBy<T>(this IEnumerable<T> source,
Func<T, bool> startPredicate,
Func<T, bool> endPredicate,
bool includeDelimiter)
{
var l = new List<T>();
foreach (var s in source)
{
if (startPredicate(s))
{
if (l.Any())
{
l = new List<T>();
}
l.Add(s);
}
else if (l.Any())
{
l.Add(s);
}
if (endPredicate(s))
{
if (includeDelimiter)
yield return l;
else
yield return l.GetRange(1, l.Count - 2);
l = new List<T>();
}
}
}
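Note that SplitBy works on a sequence of elements, so calling it directly on a string would compare single chars against "A1" and "A2", which does not compile. It fits input that is already tokenized, for example:
var tokens = new[] { "A1", "FIRSTSTRING", "A2", "A1", "SECONDSTRING", "A2", "akslakhflkshdflhksdf", "A1", "THIRDSTRING", "A2" };
var splits = tokens.SplitBy(x => x == "A1", x => x == "A2", false);
This is not the most efficient approach when you do not want the delimiters included in the result (as in your case), but it is efficient for the opposite case. To speed up your case, one can call GetEnumerator directly and make use of MoveNext.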

How to cast an 'int' to a 'char' in C#?

I have a string variable which has a mixture of numbers and letters. I want to create a new string that only has int values of the previous string variable. So I found two ways to cast int to char. However, they do not work. Here's what I've tried
string onlyNumberString = "";
foreach (char onlyNum in puzzleData)
{
for (int i = 1; i < 10; i++)
{
if (onlyNum == (char)i)
{
onlyNumberString += onlyNum;
}
}
}
and
string onlyNumberString = "";
foreach (char onlyNum in puzzleData)
{
for (int i = 1; i < 10; i++)
{
if (onlyNum == Convert.ToChar(i))
{
onlyNumberString += onlyNum;
}
}
}
Use Char.IsDigit instead, far simpler.
StringBuilder onlyNumber = new StringBuilder();
foreach (char onlyNum in puzzleData)
{
if (Char.IsDigit(onlyNum))
{
onlyNumber.Append(onlyNum);
}
}
int iNum = 2;
char cChar = iNum.ToString()[0];
This works for iNum when 0 <= iNum <= 9.
You can cast an int to a char directly, but the int is interpreted as a character code, not as a digit:
var myChar = (char)20; // the control character U+0014, not the text "20"
But to do what you want I suggest using a regular expression:
var onlyNumerals = Regex.Replace(myString, @"[^0-9]", "");
The above will replace any character that is not 0-9 with an empty string, i.e. remove it.
An alternative, using LINQ and char.IsDigit:
var onlyNumeral = new string(myString.Where(c => Char.IsDigit(c)).ToArray());
You can do it as:
string justNumbers = new String(text.Where(Char.IsDigit).ToArray());
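A few ways (i is an int, c is a char):
(char)i // i is treated as a character code
Or
Convert.ToChar(i); // same result as the cast for values in the char range
For the opposite direction, from a digit char to its int value:
int.Parse(c.ToString())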
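The comparisons in the question never succeed because (char)1 is the control character U+0001, not the digit character '1'. To turn a digit value into its character, the offset from '0' can be added before casting. A minimal sketch, where DigitChars and DigitToChar are hypothetical names:

```csharp
using System;

public static class DigitChars
{
    // Hypothetical helper: maps the values 0..9 to the characters '0'..'9'.
    public static char DigitToChar(int digit)
    {
        if (digit < 0 || digit > 9)
        {
            throw new ArgumentOutOfRangeException(nameof(digit));
        }

        // '0' has character code 48, so adding the digit gives the code of the digit character.
        return (char)('0' + digit);
    }
}
```

Here DigitChars.DigitToChar(1) == '1' is true, while (char)1 == '1' is false.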

How to extract phrases and then words in a string of text?

I have a search method that takes in a user-entered string, splits it at each space character and then proceeds to find matches based on the list of separated terms:
string[] terms = searchTerms.ToLower().Trim().Split( ' ' );
Now I have been given a further requirement: to be able to search for phrases via double quote delimiters a la Google. So if the search terms provided were:
"a line of" text
The search would match occurrences of "a line of" and "text" rather than the four separate terms [the open and closing double quotes would also need to be removed before searching].
How can I achieve this in C#? I would assume regular expressions would be the way to go, but haven't dabbled in them much so don't know if they are the best solution.
If you need any more info, please ask. Thanks in advance for the help.
Here's a regex pattern that would return matches in groups named 'term':
("(?<term>[^"]+)"\s*|(?<term>[^ ]+)\s*)+
So for the input:
"a line" of text
The output items identified by the 'term' group would be:
a line
of
text
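In C#, the captures of the 'term' group can be read from a single match. The sketch below uses the pattern from this answer written as a verbatim string (with the quotes doubled); the names SearchTermParser and ParseTerms are hypothetical, and trimming the input first is an assumption:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SearchTermParser
{
    // The pattern from the answer above, as a C# verbatim string.
    private const string Pattern = @"(""(?<term>[^""]+)""\s*|(?<term>[^ ]+)\s*)+";

    // Hypothetical helper: returns quoted phrases (without quotes) and single words, in order.
    public static List<string> ParseTerms(string searchTerms)
    {
        var terms = new List<string>();
        Match match = Regex.Match(searchTerms.Trim(), Pattern);
        if (!match.Success)
        {
            return terms;
        }

        // .NET keeps every capture made by a repeated group, not just the last one.
        foreach (Capture capture in match.Groups["term"].Captures)
        {
            terms.Add(capture.Value);
        }

        return terms;
    }
}
```

For the search string "a line of" text (with the quotes), ParseTerms returns the two terms a line of and text.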
Regular expressions would definitely be the way to go...
You should check this MSDN link out for some info on the Regex class:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
and here is an excellent link to learn some regular expression syntax:
http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx
Then to add some code examples, you could be doing it something along these lines:
string searchString = "a line of";
Match m = Regex.Match(textToSearch, searchString);
or if you just want to find out if the string contains a match or not:
bool success = Regex.Match(textToSearch, searchString).Success;
use the regular expression builder here
http://gskinner.com/RegExr/
and you will be able to manipulate the regular expression to how you need it displayed
Use regexes:
string textToSearchIn = "\"a line of\" text";
string result = Regex.Match(textToSearchIn, "(?<=\").*?(?=\")").Value;
or if there is more than one, put them into a match collection:
MatchCollection allPhrases = Regex.Matches(textToSearchIn, "(?<=\").*?(?=\")");
The Knuth-Morris-Pratt (KMP) algorithm is a well-known linear-time algorithm for finding substrings in strings (well, technically not strings but byte-arrays).
using System.Collections.Generic;
namespace KMPSearch
{
public class KMPSearch
{
public static int NORESULT = -1;
private string _needle;
private string _haystack;
private int[] _jumpTable;
public KMPSearch(string haystack, string needle)
{
Haystack = haystack;
Needle = needle;
}
public void ComputeJumpTable()
{
// JumpTable[i] is the length of the longest proper prefix of Needle[0..i]
// that is also a suffix of it (the classic KMP failure function).
JumpTable = new int[Needle.Length];
int k = 0;
for (int i = 1; i < Needle.Length; i++)
{
while (k > 0 && Needle[i] != Needle[k])
k = JumpTable[k - 1];
if (Needle[i] == Needle[k])
k++;
JumpTable[i] = k;
}
}
public int[] MatchAll()
{
List<int> matches = new List<int>();
int offset = 0;
int needleLength = Needle.Length;
int m = Match(offset);
while (m != NORESULT)
{
matches.Add(m);
// Continue after the match, so only non-overlapping matches are reported
offset = m + needleLength;
m = Match(offset);
}
return matches.ToArray();
}
public int Match()
{
return Match(0);
}
public int Match(int offset)
{
int haystackLength = Haystack.Length;
int needleLength = Needle.Length;
// An empty needle, or a needle longer than the remaining haystack, cannot match
if ((needleLength == 0) || (offset < 0) || (offset >= haystackLength) || (needleLength > (haystackLength - offset)))
return NORESULT;
ComputeJumpTable();
int needleIndex = 0; // number of needle characters matched so far
for (int haystackIndex = offset; haystackIndex < haystackLength; haystackIndex++)
{
// On a mismatch, fall back to the longest prefix that can still match
while (needleIndex > 0 && Haystack[haystackIndex] != Needle[needleIndex])
needleIndex = JumpTable[needleIndex - 1];
if (Haystack[haystackIndex] == Needle[needleIndex])
needleIndex++;
if (needleIndex == needleLength)
return haystackIndex - needleLength + 1; // start index of the match
}
return NORESULT;
}
public string Needle
{
get { return _needle; }
set { _needle = value; }
}
public string Haystack
{
get { return _haystack; }
set { _haystack = value; }
}
public int[] JumpTable
{
get { return _jumpTable; }
set { _jumpTable = value; }
}
}
}
Usage :-
using System;
using System.Collections.Generic;
using System.Text;
using System.Reflection;
namespace KMPSearch
{
class Program
{
static void Main(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: " + Environment.GetCommandLineArgs()[0] + " haystack needle");
}
else
{
KMPSearch search = new KMPSearch(args[0], args[1]);
int[] matches = search.MatchAll();
foreach (int i in matches)
Console.WriteLine("Match found at position " + (i + 1)); // parentheses force addition before string concatenation
}
}
}
}
Try this. It returns an array of the matched text, e.g. for the input "a line of" text " notepad":
string textToSearch = "\"a line of\" text \" notepad\"";
MatchCollection allPhrases = Regex.Matches(textToSearch, "(?<=\").*?(?=\")");
var RegArray = allPhrases.Cast<Match>().Select(m => m.Value).ToArray(); // requires using System.Linq;
output: {"a line of"," text "," notepad"}
