I have a string of unknown length
it is in the format
\nline
\nline
\nline
with out know how long it is how can i just take the last 10 lines of the string
a line being separated by "\n"
As the string gets larger, it becomes more important to avoid processing characters that don't matter. Any approach using string.Split is inefficient, as the whole string will have to be processed. An efficient solution will have to run through the string from the back. Here's a regular expression approach.
Note that it returns a List<string>, because the results need to be reversed before they're returned (hence the use of the Insert method)
private static List<string> TakeLastLines(string text, int count)
{
List<string> lines = new List<string>();
Match match = Regex.Match(text, "^.*$", RegexOptions.Multiline | RegexOptions.RightToLeft);
while (match.Success && lines.Count < count)
{
lines.Insert(0, match.Value);
match = match.NextMatch();
}
return lines;
}
var result = text.Split('\n').Reverse().Take(10).ToArray();
Split() the string on \n, and take the last 10 elements of the resulting array.
If this is in a file and the file is particularly large, you may want to do this efficiently. A way to do it is to read the file backwards, and then only take the first 10 lines. You can see an example of using Jon Skeet's MiscUtil library to do this here.
var lines = new ReverseLineReader(filename);
var last = lines.Take(10);
Here's one way to do it that has the advantage that it doesn't create copies of the entire source string so is fairly efficient. Most of the code would be placed in a class along with other general purpose extension methods so the end result is that you can do it with 1 line of code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string x = "a\r\nb\r\nc\r\nd\r\ne\r\nf\r\ng\r\nh\r\ni\r\nj\r\nk\r\nl\r\nm\r\nn\r\no\r\np";
foreach(var line in x.SplitAsEnumerable("\r\n").TakeLast(10))
Console.WriteLine(line);
Console.ReadKey();
}
}
static class LinqExtensions
{
public static IEnumerable<string> SplitAsEnumerable(this string source)
{
return SplitAsEnumerable(source, ",");
}
public static IEnumerable<string> SplitAsEnumerable(this string source, string seperator)
{
return SplitAsEnumerable(source, seperator, false);
}
public static IEnumerable<string> SplitAsEnumerable(this string source, string seperator, bool returnSeperator)
{
if (!string.IsNullOrEmpty(source))
{
int pos = 0;
do
{
int newPos = source.IndexOf(seperator, pos, StringComparison.InvariantCultureIgnoreCase);
if (newPos == -1)
{
yield return source.Substring(pos);
break;
}
yield return source.Substring(pos, newPos - pos);
if (returnSeperator) yield return source.Substring(newPos, seperator.Length);
pos = newPos + seperator.Length;
} while (true);
}
}
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int count)
{
List<T> items = new List<T>();
foreach (var item in source)
{
items.Add(item);
if (items.Count > count) items.RemoveAt(0);
}
return items;
}
}
}
EDIT: It has been pointed out that this could be more efficient because it iterates the entire string. I also think that RemoveAt(0) with a list is probably inefficient also. To resolve this the code could be modified to search through the string backwards. This would eliminate the need for the TakeLast function as we could just use Take.
space efficient approach
private static void PrintLastNLines(string str, int n)
{
int idx = str.Length - 1;
int newLineCount = 0;
while (newLineCount < n)
{
if (str[idx] == 'n' && str[idx - 1] == '\\')
{
newLineCount++;
idx--;
}
idx--;
}
PrintFromIndex(str, idx + 3);
}
private static void PrintFromIndex(string str, int idx)
{
for (int i = idx; i < str.Length; i++)
{
if (i < str.Length - 1 && str[i] == '\\' && str[i + 1] == 'n')
{
Console.WriteLine();
i++;
}
else
{
Console.Write(str[i]);
}
}
Console.WriteLine();
}
Related
Problem: I want to write a method that takes a message/index pair like this:
("Hello, I am *Name1, how are you doing *Name2?", 2)
The index refers to the asterisk delimited name in the message. So if the index is 1, it should refer to *Name1, if it's 2 it should refer to *Name2.
The method should return just the name with the asterisk (*Name2).
I have attempted to play around with substrings, taking the first delimited * and ending when we reach a character that isn't a letter, number, underscore or hyphen, but the logic just isn't setting in.
I know this is similar to a few problems on SO but I can't find anything this specific. Any help is appreciated.
This is what's left of my very vague attempt so far. Based on this thread:
public string GetIndexedNames(string message, int index)
{
int strStart = message.IndexOf("#") + "#".Length;
int strEnd = message.LastIndexOf(" ");
String result = message.Substring(strStart, strEnd - strStart);
}
If you want to do it the old school way, then something like:
public static void Main(string[] args)
{
string message = "Hello, I am *Name1, how are you doing *Name2?";
string name1 = GetIndexedNames(message, "*", 1);
string name2 = GetIndexedNames(message, "*", 2);
Console.WriteLine(message);
Console.WriteLine(name1);
Console.WriteLine(name2);
Console.ReadLine();
}
public static string GetIndexedNames(string message, string singleCharDelimiter, int index)
{
string valid = "abcdefghijlmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-";
string[] parts = message.Split(singleCharDelimiter.ToArray());
if (parts.Length >= index)
{
StringBuilder sb = new StringBuilder();
for(int i = 0; i < parts[index].Length; i++)
{
string character = parts[index].Substring(i, 1);
if (valid.Contains(character))
{
sb.Append(character);
}
else
{
return sb.ToString();
}
}
return sb.ToString();
}
return "";
}
You can try using regular expressions to match the names. Assuming that name is a sequence of word characters (letters or digits):
using System.Linq;
using System.Text.RegularExpressions;
...
// Either name with asterisk *Name or null
// index is 1-based
private static ObtainName(string source, int index) => Regex
.Matches(source, #"\*\w+")
.Cast<Match>()
.Select(match => match.Value)
.Distinct() // in case the same name repeats several times
.ElementAtOrDefault(index - 1);
Demo:
string name = ObtainName(
"Hello, I am *Name1, how are you doing *Name2?", 2);
Console.Write(name);
Outcome:
*Name2
Perhaps not the most elegant solution, but if you want to use IndexOf, use a loop:
public static string GetIndexedNames(string message, int index, char marker='*')
{
int lastFound = 0;
for (int i = 0; i < index; i++) {
lastFound = message.IndexOf(marker, lastFound+1);
if (lastFound == -1) return null;
}
var space = message.IndexOf(' ', lastFound);
return space == -1 ? message.Substring(lastFound) : message.Substring(lastFound, space - lastFound);
}
Using C#, write an algorithm to find the three longest unique palindromes in a string. For the three longest palindromes, report the palindrome text, start index and length in descending order of length. For example, the output for string,
sqrrqabccbatudefggfedvwhijkllkjihxymnnmzpop
should be:
Text: hijkllkjih, Index: 23, Length: 10 Text: defggfed, Index: 13, Length: 8 Text: abccba, Index: 5 Length: 6
Now I got to the part where I can write out the palindromes and its length but I have a problem with the index. Need help on how to include the index of the palindrome and how to get unique lengths
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string inputString = "sqrrqabccbatudefggfedvwhijkllkjihxymnnmzpop";
string currentStr = string.Empty;
List<string> listOfPalindromes = new List<string>();
char[] inputStrArr = inputString.ToCharArray();
for (int i = 0; i < inputStrArr.Length; i++)
{
for (int j = i+1; j < inputStrArr.Length; j++)
{
currentStr = inputString.Substring(i, j - i + 1);
if (IsPalindrome(currentStr))
{
listOfPalindromes.Add(currentStr);
}
}
}
var longest = (from str in listOfPalindromes
orderby str.Length descending
select str).Take(3);
foreach (var item in longest)
{
Console.WriteLine("Text: " + item.ToString() + " Index: " + + " Length: " + item.Length.ToString());
}
}
private static bool IsPalindrome(String str)
{
bool IsPalindrome = true;
if (str.Length > 0)
{
for (int i = 0; i < str.Length / 2; i++)
{
if (str.Substring(i, 1) != str.Substring(str.Length - (i + 1), 1))
{
IsPalindrome = false;
}
}
}
else
{
IsPalindrome = false;
}
return IsPalindrome;
}
}
}
ok now that's out of the way how do I get distinct lengths? can it be done by using DISTINCT or do I need to edit something else?
You need to store more information when a palindrome is found.
First define a class:
class PalindromeResult
{
public string Text { get; set; }
public int Index { get; set; }
}
Then instead of your List<string>, create a list of this class:
List<PalindromeResult> listOfPalindromes = new List<PalindromeResult>();
When a result is found, ad it like this
if (IsPalindrome(currentStr))
{
listOfPalindromes.Add(new PalindromeResult { Text = currentStr, Index = i});
}
You would have to update your sorting and printing accordingly.
The most optimal solution (as pointed out by Sinatr) would be to store the index of the palindromes as you find them.
You could instead use the IndexOf function to find the index of the first occurrence of a substring within a string.
For example inputString.IndexOf(item) could be used in your Console.WriteLine function.
Try This
public static bool IsPalindromic(int l)
{
IEnumerable<char> forwards = l.ToString().ToCharArray();
return forwards.SequenceEqual(forwards.Reverse());
}
public int LongestPalindrome(List<int> integers)
{
int length=0;
int num;
foreach (var integer in integers)
{
if (integer.ToString().Length > length)
{
num = integer;
length = integer.ToString().Length;
}
}
return num;
}
public void MyFunction(string input)
{
var numbers = Regex.Split(input, #"\D+").ToList();
var allPalindromes = (from value in numbers where !string.IsNullOrEmpty(value) select int.Parse(value) into i where IsPalindromic(i) select i).ToList();
if (allPalindromes.Count>0)
Console.WriteLine(LongestPalindrome(allPalindromes));
else
Console.WriteLine("Any Palindrome number was found");
}
you cans mix the 2 two functions to have a beautiful code but i done it like this to simplify.
I want to break a string up into lines of a specified maximum length, without splitting any words, if possible (if there is a word that exceeds the maximum line length, then it will have to be split).
As always, I am acutely aware that strings are immutable and that one should preferably use the StringBuilder class. I have seen examples where the string is split into words and the lines are then built up using the StringBuilder class, but the code below seems "neater" to me.
I mentioned "best" in the description and not "most efficient" as I am also interested in the "eloquence" of the code. The strings will never be huge, generally splitting into 2 or three lines, and it won't be happening for thousands of lines.
Is the following code really bad?
private static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
stringToSplit = stringToSplit.Trim();
var lines = new List<string>();
while (stringToSplit.Length > 0)
{
if (stringToSplit.Length <= maximumLineLength)
{
lines.Add(stringToSplit);
break;
}
var indexOfLastSpaceInLine = stringToSplit.Substring(0, maximumLineLength).LastIndexOf(' ');
lines.Add(stringToSplit.Substring(0, indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine : maximumLineLength).Trim());
stringToSplit = stringToSplit.Substring(indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine + 1 : maximumLineLength);
}
return lines.ToArray();
}
Even when this post is 3 years old I wanted to give a better solution using Regex to accomplish the same:
If you want the string to be splitted and then use the text to be displayed you can use this:
public string SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Replace(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)", "$1\n");
}
If on the other hand you need a collection you can use this:
public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Matches(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)");
}
NOTES
Remember to import regex (using System.Text.RegularExpressions;)
You can use string interpolation on the match:
$#"(.{{1,{maximumLineLength}}})(?:\s|$)"
The MatchCollection works almost like an Array
Matching example with explanation here
How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ').Concat(new [] { "" });
return
words
.Skip(1)
.Aggregate(
words.Take(1).ToList(),
(a, w) =>
{
var last = a.Last();
while (last.Length > maximumLineLength)
{
a[a.Count() - 1] = last.Substring(0, maximumLineLength);
last = last.Substring(maximumLineLength);
a.Add(last);
}
var test = last + " " + w;
if (test.Length > maximumLineLength)
{
a.Add(w);
}
else
{
a[a.Count() - 1] = test;
}
return a;
});
}
I reworked this as prefer this:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ');
var line = words.First();
foreach (var word in words.Skip(1))
{
var test = $"{line} {word}";
if (test.Length > maximumLineLength)
{
yield return line;
line = word;
}
else
{
line = test;
}
}
yield return line;
}
I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
Here is my solution.
private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
{
string[] words = stringToSplit.Split(' ');
StringBuilder line = new StringBuilder();
foreach (string word in words)
{
if (word.Length + line.Length <= maxLineLength)
{
line.Append(word + " ");
}
else
{
if (line.Length > 0)
{
yield return line.ToString().Trim();
line.Clear();
}
string overflow = word;
while (overflow.Length > maxLineLength)
{
yield return overflow.Substring(0, maxLineLength);
overflow = overflow.Substring(maxLineLength);
}
line.Append(overflow + " ");
}
}
yield return line.ToString().Trim();
}
It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.
Try this (untested)
private static IEnumerable<string> SplitToLines(string value, int maximumLineLength)
{
var words = value.Split(' ');
var line = new StringBuilder();
foreach (var word in words)
{
if ((line.Length + word.Length) >= maximumLineLength)
{
yield return line.ToString();
line = new StringBuilder();
}
line.AppendFormat("{0}{1}", (line.Length>0) ? " " : "", word);
}
yield return line.ToString();
}
~6x faster than the accepted answer
More than 1.5x faster than the Regex version in Release Mode (dependent on line length)
Optionally keep the space at the end of the line or not (the regex version always keeps it)
static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength, bool removeSpace = true)
{
int start = 0;
int end = 0;
for (int i = 0; i < stringToSplit.Length; i++)
{
char c = stringToSplit[i];
if (c == ' ' || c == '\n')
{
if (i - start > maximumLineLength)
{
string substring = stringToSplit.Substring(start, end - start); ;
start = removeSpace ? end + 1 : end; // + 1 to remove the space on the next line
yield return substring;
}
else
end = i;
}
}
yield return stringToSplit.Substring(start); // remember last line
}
Here is the example code used to test speeds (again, run on your own machine and test in Release mode to get accurate timings)
https://dotnetfiddle.net/h5I1GC
Timings on my machine in release mode .Net 4.8
Accepted Answer: 667ms
Regex: 368ms
My Version: 117ms
My requirement was to have a line break at the last space before the 30 char limit.
So here is how i did it. Hope this helps anyone looking.
private string LineBreakLongString(string input)
{
var outputString = string.Empty;
var found = false;
int pos = 0;
int prev = 0;
while (!found)
{
var p = input.IndexOf(' ', pos);
{
if (pos <= 30)
{
pos++;
if (p < 30) { prev = p; }
}
else
{
found = true;
}
}
outputString = input.Substring(0, prev) + System.Environment.NewLine + input.Substring(prev, input.Length - prev).Trim();
}
return outputString;
}
An approach using recursive method and ReadOnlySpan (Tested)
public static void SplitToLines(ReadOnlySpan<char> stringToSplit, int index, ref List<string> values)
{
if (stringToSplit.IsEmpty || index < 1) return;
var nextIndex = stringToSplit.IndexOf(' ');
var slice = stringToSplit.Slice(0, nextIndex < 0 ? stringToSplit.Length : nextIndex);
if (slice.Length <= index)
{
values.Add(slice.ToString());
nextIndex++;
}
else
{
values.Add(slice.Slice(0, index).ToString());
nextIndex = index;
}
if (stringToSplit.Length <= index) return;
SplitToLines(stringToSplit.Slice(nextIndex), index, ref values);
}
I am attempting to build a string extension method to trim a string to a certain length but with not breaking a word. I wanted to check to see if there was anything built into the framework or a more clever method than mine. Here's mine so far (not thoroughly tested):
public static string SmartTrim(this string s, int length)
{
StringBuilder result = new StringBuilder();
if (length >= 0)
{
if (s.IndexOf(' ') > 0)
{
string[] words = s.Split(' ');
int index = 0;
while (index < words.Length - 1 && result.Length + words[index + 1].Length <= length)
{
result.Append(words[index]);
result.Append(" ");
index++;
}
if (result.Length > 0)
{
result.Remove(result.Length - 1, 1);
}
}
else
{
result.Append(s.Substring(0, length));
}
}
else
{
throw new ArgumentOutOfRangeException("length", "Value cannot be negative.");
}
return result.ToString();
}
I'd use string.LastIndexOf - at least if we only care about spaces. Then there's no need to create any intermediate strings...
As yet untested:
public static string SmartTrim(this string text, int length)
{
if (text == null)
{
throw new ArgumentNullException("text");
}
if (length < 0)
{
throw new ArgumentOutOfRangeException();
}
if (text.Length <= length)
{
return text;
}
int lastSpaceBeforeMax = text.LastIndexOf(' ', length);
if (lastSpaceBeforeMax == -1)
{
// Perhaps define a strategy here? Could return empty string,
// or the original
throw new ArgumentException("Unable to trim word");
}
return text.Substring(0, lastSpaceBeforeMax);
}
Test code:
public class Test
{
static void Main()
{
Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(20));
Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(3));
Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(4));
Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(5));
Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(7));
}
}
Results:
'foo bar baz'
'foo'
'foo'
'foo'
'foo bar'
How about a Regex based solution ? You will probably want to test some more, and do some bounds checking; but this is what spring to my mind:
using System;
using System.Text.RegularExpressions;
namespace Stackoverflow.Test
{
static class Test
{
private static readonly Regex regWords = new Regex("\\w+", RegexOptions.Compiled);
static void Main()
{
Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(8));
Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(20));
Console.WriteLine("Hello, I am attempting to build a string extension method to trim a string to a certain length but with not breaking a word. I wanted to check to see if there was anything built into the framework or a more clever method than mine".SmartTrim(100));
}
public static string SmartTrim(this string s, int length)
{
var matches = regWords.Matches(s);
foreach (Match match in matches)
{
if (match.Index + match.Length > length)
{
int ln = match.Index + match.Length > s.Length ? s.Length : match.Index + match.Length;
return s.Substring(0, ln);
}
}
return s;
}
}
}
Try this out. It's null-safe, won't break if length is longer than the string, and involves less string manipulation.
Edit: Per recommendations, I've removed the intermediate string. I'll leave the answer up as it could be useful in cases where exceptions are not wanted.
public static string SmartTrim(this string s, int length)
{
if(s == null || length < 0 || s.Length <= length)
return s;
// Edit a' la Jon Skeet. Removes unnecessary intermediate string. Thanks!
// string temp = s.Length > length + 1 ? s.Remove(length+1) : s;
int lastSpace = s.LastIndexOf(' ', length + 1);
return lastSpace < 0 ? string.Empty : s.Remove(lastSpace);
}
string strTemp = "How are you doing today";
int nLength = 12;
strTemp = strTemp.Substring(0, strTemp.Substring(0, nLength).LastIndexOf(' '));
I think that should do it. When I ran that, it ended up with "How are you".
So your function would be:
public static string SmartTrim(this string s, int length)
{
return s.Substring(0, s.Substring(0, length).LastIndexOf(' '));;
}
I would definitely add some exception handling though, such as making sure the integer length is no greater than the string length and not less than 0.
Obligatory LINQ one liner, if you only care about whitespace as word boundary:
return new String(s.TakeWhile((ch,idx) => (idx < length) || (idx >= length && !Char.IsWhiteSpace(ch))).ToArray());
Use like this
var substring = source.GetSubstring(50, new string[] { " ", "." })
This method can get a sub-string based on one or many separator characters
public static string GetSubstring(this string source, int length, params string[] options)
{
if (string.IsNullOrWhiteSpace(source))
{
return string.Empty;
}
if (source.Length <= length)
{
return source;
}
var indices =
options.Select(
separator => source.IndexOf(separator, length, StringComparison.CurrentCultureIgnoreCase))
.Where(index => index >= 0)
.ToList();
if (indices.Count > 0)
{
return source.Substring(0, indices.Min());
}
return source;
}
I'll toss in some Linq goodness even though others have answered this adequately:
public string TrimString(string s, int maxLength)
{
var pos = s.Select((c, idx) => new { Char = c, Pos = idx })
.Where(item => char.IsWhiteSpace(item.Char) && item.Pos <= maxLength)
.Select(item => item.Pos)
.SingleOrDefault();
return pos > 0 ? s.Substring(0, pos) : s;
}
I left out the parameter checking that others have merely to accentuate the important code...
I have a search method that takes in a user-entered string, splits it at each space character and then proceeds to find matches based on the list of separated terms:
string[] terms = searchTerms.ToLower().Trim().Split( ' ' );
Now I have been given a further requirement: to be able to search for phrases via double quote delimiters a la Google. So if the search terms provided were:
"a line of" text
The search would match occurrences of "a line of" and "text" rather than the four separate terms [the open and closing double quotes would also need to be removed before searching].
How can I achieve this in C#? I would assume regular expressions would be the way to go, but haven't dabbled in them much so don't know if they are the best solution.
If you need any more info, please ask. Thanks in advance for the help.
Here's a regex pattern that would return matches in groups named 'term':
("(?<term>[^"]+)"\s*|(?<term>[^ ]+)\s*)+
So for the input:
"a line" of text
The output items identified by the 'term' group would be:
a line
of
text
Regular expressions would definitely be the way to go...
You should check this MSDN link out for some info on the Regex class:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
and here is an excellent link to learn some regular expression syntax:
http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx
Then to add some code examples, you could be doing it something along these lines:
string searchString = "a line of";
Match m = Regex.Match(textToSearch, searchString);
or if you just want to find out if the string contains a match or not:
bool success = Regex.Match(textToSearch, searchString).Success;
use the regular expression builder here
http://gskinner.com/RegExr/
and you will be able to manipulate the regular expression to how you need it displayed
Use Regexs....
string textToSearchIn = ""a line of" text";
string result = Regex.Match(textToSearchIn, "(?<=").*?(?=")").Value;
or if more then one, put this into a match collection...
MatchCollection allPhrases = Regex.Matches(textToSearchIn, "(?<=").*?(?=")");
The Knuth-Morris-Pratt (KMP algorithm)is recognised as the fastest algorithm for finding substrings in strings (well, technically not strings but byte-arrays).
using System.Collections.Generic;
namespace KMPSearch
{
public class KMPSearch
{
public static int NORESULT = -1;
private string _needle;
private string _haystack;
private int[] _jumpTable;
public KMPSearch(string haystack, string needle)
{
Haystack = haystack;
Needle = needle;
}
public void ComputeJumpTable()
{
//Fix if we are looking for just one character...
if (Needle.Length == 1)
{
JumpTable = new int[1] { -1 };
}
else
{
int needleLength = Needle.Length;
int i = 2;
int k = 0;
JumpTable = new int[needleLength];
JumpTable[0] = -1;
JumpTable[1] = 0;
while (i <= needleLength)
{
if (i == needleLength)
{
JumpTable[needleLength - 1] = k;
}
else if (Needle[k] == Needle[i])
{
k++;
JumpTable[i] = k;
}
else if (k > 0)
{
JumpTable[i - 1] = k;
k = 0;
}
i++;
}
}
}
public int[] MatchAll()
{
List<int> matches = new List<int>();
int offset = 0;
int needleLength = Needle.Length;
int m = Match(offset);
while (m != NORESULT)
{
matches.Add(m);
offset = m + needleLength;
m = Match(offset);
}
return matches.ToArray();
}
public int Match()
{
return Match(0);
}
public int Match(int offset)
{
ComputeJumpTable();
int haystackLength = Haystack.Length;
int needleLength = Needle.Length;
if ((offset >= haystackLength) || (needleLength > ( haystackLength - offset)))
return NORESULT;
int haystackIndex = offset;
int needleIndex = 0;
while (haystackIndex < haystackLength)
{
if (needleIndex >= needleLength)
return haystackIndex;
if (haystackIndex + needleIndex >= haystackLength)
return NORESULT;
if (Haystack[haystackIndex + needleIndex] == Needle[needleIndex])
{
needleIndex++;
}
else
{
//Naive solution
haystackIndex += needleIndex;
//Go back
if (needleIndex > 1)
{
//Index of the last matching character is needleIndex - 1!
haystackIndex -= JumpTable[needleIndex - 1];
needleIndex = JumpTable[needleIndex - 1];
}
else
haystackIndex -= JumpTable[needleIndex];
}
}
return NORESULT;
}
public string Needle
{
get { return _needle; }
set { _needle = value; }
}
public string Haystack
{
get { return _haystack; }
set { _haystack = value; }
}
public int[] JumpTable
{
get { return _jumpTable; }
set { _jumpTable = value; }
}
}
}
Usage :-
using System;
using System.Collections.Generic;
using System.Text;
using System.Reflection;
namespace KMPSearch
{
class Program
{
static void Main(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: " + Environment.GetCommandLineArgs()[0] + " haystack needle");
}
else
{
KMPSearch search = new KMPSearch(args[0], args[1]);
int[] matches = search.MatchAll();
foreach (int i in matches)
Console.WriteLine("Match found at position " + i+1);
}
}
}
}
Try this, It'll return an array for text. ex: { "a line of" text "notepad" }:
string textToSearch = "\"a line of\" text \" notepad\"";
MatchCollection allPhrases = Regex.Matches(textToSearch, "(?<=\").*?(?=\")");
var RegArray = allPhrases.Cast<Match>().ToArray();
output: {"a line of","text"," notepad" }