Trigrams comparison - c#

I am fairly new to coding so I guess I´m not seeing the obvious answer by myself so I am sorry if this is a silly question but I´m really stucked here. I am trying to compare two sets of trigrams from two different texts (A and B). If there are no trigrams in B present on A, then I would say those two texts are different, at least for my present purpose. I am using Nuve for trigram extraction.
So far I have this:
var paragraph = "This is not a phrase. This is not a sentence.";
var paragraph2 = "This is a phrase. This is a sentence. This have nothing to do with sentences.";
ITokenizer tokenizer = new ClassicTokenizer(true);
SentenceSegmenter segmenter = new TokenBasedSentenceSegmenter(tokenizer);
var sentences = segmenter.GetSentences(paragraph);
ITokenizer tokenizer2 = new ClassicTokenizer(true);
SentenceSegmenter segmenter2 = new TokenBasedSentenceSegmenter(tokenizer2);
var sentences2 = segmenter2.GetSentences(paragraph2);
var extractor = new NGramExtractor(3);
var grams1 = extractor.ExtractAsList(sentences);
var grams2 = extractor.ExtractAsList(sentences2);
var nonintersect = grams2.Except(grams1);
foreach (var nGram in nonintersect)
{
var current = nGram;
bool found = false;
foreach (var n in grams2)
{
if (!found)
{
if (n == current)
{
found = true;
}
}
}
if (!found)
{
var result = current;
string finalresult = Convert.ToString(result);
textBox3.AppendText(finalresult+ "\n");
}
This way I hope to get the sentences that, being in B, are not present in A (i.e. all the sentences from B in the example), but now I would have to compare each trigram from B to each trigram from A to see if sentences are really different from each other. I have tried to do so with another nested foreach but I get just nonsense data, as follows:
foreach (var sentence2 in sentences2)
{
var actual = sentence2;
bool found1 = false;
foreach (var sentence in sentences)
{
if (!found1)
{
if (actual == sentence)
{
found1 = true;
}
}
}
if (!found1)
{
string finalresult= Convert.ToString(actual);
textBox3.AppendText(finalresult+ "\n");
}
}
Doing this I try to verify if the trigrams from each sentence in B are equal to the trigrams from each sentence in A and, if they are, then textBox3 would be empty.
Briefly, I am trying to code something similar to Ferret but for C# and only for comparing two given plain texts. As far as I know, there is nothing similar already done for C#.
Any help or tip would be really appreciated. Thanks!

Comparing bodies of text
Comparing two bodies of text and marking them as similar if they have at least one sentence-level trigram in common is fairly straight-forward:
public bool AreTextsSimilar(string a, string b)
{
// We can reuse these objects - they could be stored in member fields:
ITokenizer tokenizer = new ClassicTokenizer(true);
SentenceSegmenter segmenter = new TokenBasedSentenceSegmenter(tokenizer);
NGramExtractor trigramExtractor = new NGramExtractor(3);
IEnumerable<string> sentencesA = segmenter.GetSentences(a);
IEnumerable<string> sentencesB = segmenter.GetSentences(b);
// The order of trigrams doesn't matter, so we'll fetch them as sets instead,
// to make comparisons between their elements more efficient:
ISet<NGram> trigramsA = trigramExtractor.ExtractAsSet(sentencesA);
ISet<NGram> trigramsB = trigramExtractor.ExtractAsSet(sentencesB);
// 'Intersect' returns all elements that are found in both collections:
IEnumerable<NGram> sharedTrigrams = trigramsA.Intersect(trigramsB);
// 'Any' only returns true if the collection isn't empty:
return sharedTrigrams.Any();
}
Without Linq methods (Intersect, Any), those last two lines could be implemented as a loop:
foreach (NGram trigramA in trigramsA)
{
// As soon as we find a shared sentence trigram we can conclude that
// the two bodies of text are indeed similar:
if (trigramsB.Contains(trigramA))
return true;
}
return false;
}
Sentences without shared word trigrams
Retrieving all sentences that do not share word-level trigrams takes some more work:
public IEnumerable<string> GetUniqueBSentences(string a, string b)
{
// We can reuse these objects - they could be stored in member fields:
ITokenizer tokenizer = new ClassicTokenizer(true);
SentenceSegmenter segmenter = new TokenBasedSentenceSegmenter(tokenizer);
NGramExtractor trigramExtractor = new NGramExtractor(3);
IEnumerable<string> sentencesA = segmenter.GetSentences(a);
IEnumerable<string> sentencesB = segmenter.GetSentences(b);
ITokenizer wordTokenizer = new ClassicTokenizer(false);
foreach (string sentenceB in sentencesB)
{
IList<string> wordsB = wordTokenizer.Tokenize(sentenceB);
ISet<NGram> wordTrigramsB = trigramExtractor.ExtractAsSet(wordsB);
bool foundMatchingSentence = false;
foreach (string sentenceA in sentencesA)
{
// This will be repeated for every sentence in B. It would be more efficient
// to generate trigrams for all sentences in A once, before we enter these loops:
IList<string> wordsA = wordTokenizer.Tokenize(sentenceA);
ISet<NGram> wordTrigramsA = trigramExtractor.ExtractAsSet(wordsA);
if (wordTrigramsA.Intersect(wordTrigramsB).Any())
{
// We found a sentence in A that shares word-trigrams, so stop comparing:
foundMatchingSentence = true;
break;
}
}
// No matching sentence in A? Then this sentence is unique to B:
if (!foundMatchingSentence)
yield return sentenceB;
}
}
Apparently segmenter also returns an extra, empty sentence, which you may want to filter out (or figure out how to prevent segmenter from doing that).
I'm sure the above code can be optimized if performance is a concern, but I'll leave that up to you.

Related

How to iterate through multiple variables?

I'd like to create a short program to download several pictures from a website.
On a form, I would like to enter a root-link to a website with placeholders.
The placeholders can be defined with Start/End value and asc/desc.
For example: the original link is
google.de/1236-01.jpg
and I'd like to generate all links from
google.de/1236-1.jpg
up to
google.de/9955-12.jpg
So my input would be "google.de/[0]-[1].jpg" and placeholders are set to:
[0] = start 1236|end 9955|asc
[1] = start 1|end 12|asc
Via GetValidCharacters() I get a String-List of valid combinations for each entered placeholder (can be selected via ascending/descending + start&end).
The goal I'm struggling with is to build all combinations of this link, because I need to determine while runtime, how much placeholders I have.
My idea was to loop over an queue and enquueue each new build line, until there is none left with placeholders, but I don't know how to do this.
I need to make sure that all combinations are entered and they are entered only once.
private static void CreateDownloadList()
{
Queue<string> tmpQueue = new Queue<string>(); //temp queue
tmpQueue.Enqueue(DL_path); //DL_Path = google.de/[0]-[1].jpg
string line = "";
while ((line = tmpQueue.Dequeue()) != null) //not empty
{
if (line.Contains("[")) //placeholder
{
string tmpLine = line;
//how to determine, which placeholder is next?? need to know this and replace this with every combination, I get from GetValidCharacters(start, end, DESC)
}
else //done
{
_urlList.Add(line);
}
}
}
how about a simple for loop?
for (int i = 1236; i <= 9955; i++)
{
for (int j = 1; j <= 12; j++)
{
tmpQueue.Enqueue(string.Format("google.de/{0}-{1}.jpg", i, j));
}
}
I'm not going give you the full code but here is some pseudo code that would solve the problem.
given :
todostack -- stack object that holds a list of unresolved items
replace_map -- map object that holds marker string and map of all values
marker_list -- list of all markers
final_list -- list object that holds the results
(note you can probably use marker_list and replace_map in one object -- I have them separate to make my code clearer)
init :
push todostack with your starting string
set marker_list and replace_map to correct values (from parameters I assume)
clear final_list
algorithm :
while (there are any items in todostack)
{
curitem = todostack.pop
if (curitem contains a marker in marker_list)
{
loop for each replacement in replace_map
{
new_item = curitem replaced with replacement
todostack.push(new_item)
}
}
else
add curitem to final_list
}
#Hogan this was the hint to the correct way.
solution is this
private void CreateDownloadList()
{
Queue<string> tmpQueue = new Queue<string>();
tmpQueue.Enqueue(downloadPathWithPlaceHolders);
while(tmpQueue.Count > 0)
{
string currentItem = tmpQueue.Dequeue();
bool test = false;
if(currentItem.Contains("["))
{
foreach(Placeholder p in _placeholders)
{
if(currentItem.Contains(p.PlaceHolder))
{
foreach(string s in p.Replacements)
{
tmpQueue.Enqueue(currentItem.Replace(p.PlaceHolder, s));
}
test = true;
}
if(test)
break;
}
}
else
{
_downloadLinkList.Add(currentItem);
}
}
}

C# human name checking and matching algorithm

Is there an algorith or C# library to determine if a human name is correct or not and, if not, to find its nearest matching?
I found algorithms for string matching like the Levenshtein's distance algorithm, but all of them check the matching between one string and another, and i want to check the matching between one name and all the possible names in English (for example), to check if the name was wrongly written.
For example:
Someone inserts the name "Giliam" while it should be "william". I want to know if there are any algorithm (or group of them) to detect the error and propose a correction.
All solutions that come to my mind involves the implementation of a huge human name's dictionary and use it to check for the correctness of each input name matching one by one... And it sounds terrorific performace to me, so i want to ask for a better approach.
Thanks.
What you are in effect asking is how to create a spell checker with a given dictionary. One way to do this that doesn't involve looking up and testing every possible entry in a list is to do the inverse of the problem: Generate a list of possible permutations from the user input, and test each one of those to see if they're in a list. This is a much more manageable problem.
For instance, you could use a function like this to generate each possible permutation that one "edit" could get from a given word:
static HashSet<string> GenerateEdits(string word)
{
// Normalize the case
word = word.ToLower();
var splits = new List<Tuple<string, string>>();
for (int i = 0; i < word.Length; i++)
{
splits.Add(new Tuple<string, string>(word.Substring(0, i), word.Substring(i)));
}
var ret = new HashSet<string>();
// All cases of one character removed
foreach (var cur in splits)
{
if (cur.Item2.Length > 0)
{
ret.Add(cur.Item1 + cur.Item2.Substring(1));
}
}
// All transposed possibilities
foreach (var cur in splits)
{
if (cur.Item2.Length > 1)
{
ret.Add(cur.Item1 + cur.Item2[1] + cur.Item2[0] + cur.Item2.Substring(2));
}
}
var letters = "abcdefghijklmnopqrstuvwxyz";
// All replaced characters
foreach (var cur in splits)
{
if (cur.Item2.Length > 0)
{
foreach (var letter in letters)
{
ret.Add(cur.Item1 + letter + cur.Item2.Substring(1));
}
}
}
// All inserted characters
foreach (var cur in splits)
{
foreach (var letter in letters)
{
ret.Add(cur.Item1 + letter + cur.Item2);
}
}
return ret;
}
And then exercise the code to see if a given user input can be easily convoluted to one of these entries. Finding the best match can be done by weighted averages, or simply by presenting the list of possibilities to the user:
// Example file from:
// https://raw.githubusercontent.com/smashew/NameDatabases/master/NamesDatabases/first%20names/all.txt
string source = #"all.txt";
var names = new HashSet<string>();
using (var sr = new StreamReader(source))
{
string line;
while ((line = sr.ReadLine()) != null)
{
names.Add(line.ToLower());
}
}
var userEntry = "Giliam";
var found = false;
if (names.Contains(userEntry.ToLower()))
{
Console.WriteLine("The entered value of " + userEntry + " looks good");
found = true;
}
if (!found)
{
// Try edits one edit away from the user entry
foreach (var test in GenerateEdits(userEntry))
{
if (names.Contains(test))
{
Console.WriteLine(test + " is a possibility for " + userEntry);
found = true;
}
}
}
if (!found)
{
// Try edits two edits away from the user entry
foreach (var test in GenerateEdits(userEntry))
{
foreach (var test2 in GenerateEdits(test))
{
if (names.Contains(test))
{
Console.WriteLine(test + " is a possibility for " + userEntry);
found = true;
}
}
}
}
kiliam is a possibility for Giliam
liliam is a possibility for Giliam
viliam is a possibility for Giliam
wiliam is a possibility for Giliam
Of course, since you're talking about human names, you had, at best, make this a suggestion, and be very prepared for odd spellings, and spellings of things you've never seen. And if you want to support other languages, the implementation of GenerateEdits gets more complex as you consider what counts for a 'typo'

how to compare 2 strings in c#

I have some string like this:
str1 = STA001, str2 = STA002, str3 = STA003
and have code to compare strings:
private bool IsSubstring(string strChild, string strParent)
{
if (!strParent.Contains(strChild))
{
return false;
}
else return true;
}
If I have strChild = STA001STA002 and strParent = STA001STA002STA003 then return true but when I enter strChild = STA001STA003 and check with strParent = STA001STA002STA003 then return false although STA001STA003 have contains in strParent. How can i resolve it?
What you're describing is not a substring. It is basically asking of two collections the question "is this one a subset of the other one?" This question is far easier to ask when the collection is a set such as HashSet<T> than when the collection is a big concatenated string.
This would be a much better way to write your code:
var setOne = new HashSet<string> { "STA001", "STA003" };
var setTwo = new HashSet<string> { "STA001", "STA002", "STA003" };
Console.WriteLine(setOne.IsSubsetOf(setTwo)); // True
Console.WriteLine(setTwo.IsSubsetOf(setOne)); // False
Or, if the STA00 part was just filler to make it make sense in the context of strings, then use ints directly:
var setOne = new HashSet<int> { 1, 3 };
var setTwo = new HashSet<int> { 1, 2, 3 };
Console.WriteLine(setOne.IsSubsetOf(setTwo)); // True
Console.WriteLine(setTwo.IsSubsetOf(setOne)); // False
The Contains method only looks for an exact match, it doesn't look for parts of the string.
Divide the child string into its parts, and look for each part in the parent string:
private bool IsSubstring(string child, string parent) {
for (int i = 0; i < child.Length; i+= 6) {
if (!parent.Contains(child.Substring(i, 6))) {
return false;
}
}
return true;
}
You should however consider it cross-part matches are possible, and if that is an issue. E.g. looking for "1STA00" in "STA001STA002". If that would be a problem, then you should divide the parent string similarly, and only make direct comparisons between the parts, not using the Contains method.
Note: Using hungarian notation for the data type of variables is not encouraged in C#.
This may be a bit on the overkill side, but it might be incredibly beneficial.
private static bool ContainedWord(string input, string phrase)
{
var pattern = String.Format(#"\b({0})", phrase);
var result = Regex.Match(input, pattern);
if(string.Compare(result, phrase) == 0)
return true;
return false;
}
If the expression finds a match, then compare the result to your phrase. If they're zero, it matched. I may be misunderstanding your intent.

C#: Get an integer representation of a string array

I am new to C# and I ran into the following problem (I have looked for a solution here and on google but was not successful):
Given an array of strings (some columns can possibly be doubles or integers "in string format") I would like to convert this array to an integer array.
The question only concerns the columns with actual string values (say a list of countries).
Now I believe a Dictionary can help me to identify the unique values in a given column and associate an integer number to every country that appears.
Then to create my new array which should be of type int (or double) I could loop through the whole array and define the new array via the dictionary. This I would need to do for every column which has string values.
This seems inefficient, is there a better way?
In the end I would like to do multiple linear regression (or even fit a generalized linear model, meaning I want to get a design matrix eventually) with the data.
EDIT:
1) Sorry for being unclear, I will try to clarify:
Given:
MAKE;VALUE ;GENDER
AUDI;40912.2;m
WV;3332;f
AUDI;1234.99;m
DACIA;0;m
AUDI;12354.2;m
AUDI;123;m
VW;21321.2;f
I want to get a "numerical" matrix with identifiers for the the string valued columns
MAKE;VALUE;GENDER
1;40912.2;0
2;3332;1
1;1234.99;0
3;0;0
1;12354.2;0
1;123;0
2;21321.2;1
2) I think this is actually not what I need to solve my problem. Still it does seem like an interesting question.
3) Thank you for the responses so far.
This will take all the possible strings which represent an integer and puts them in a List.
You can do the same with strings wich represent a double.
Is this what you mean??
List<int> myIntList = new List<int>()
foreach(string value in stringArray)
{
int myInt;
if(Int.TryParse(value,out myInt)
{
myIntList.Add(myInt);
}
}
Dictionary is good if you want to map each string to a key like this:
var myDictionary = new Dictionary<int,string>();
myDictionary.Add(1,"CountryOne");
myDictionary.Add(2,"CountryTwo");
myDictionary.Add(3,"CountryThree");
Then you can get your values like:
string myCountry = myDictionary[2];
But still not sure if i'm helping you right now. Do you have som code to specify what you mean?
I'm not sure if this is what you are looking for but it does output the result you are looking for, from which you can create an appropriate data structure to use. I use a list of string but you can use something else to hold the processed data. I can expand further, if needed.
It does assume that the number of "columns", based on the semicolon character, is equal throughout the data and is flexible enough to handle any number of columns. Its kind of ugly but it should get what you want.
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication3
{
class StringColIndex
{
public int ColIndex { get; set; }
public List<string> StringValues {get;set;}
}
class Program
{
static void Main(string[] args)
{
var StringRepresentationAsInt = new List<StringColIndex>();
List<string> rawDataList = new List<string>();
List<string> rawDataWithStringsAsIdsList = new List<string>();
rawDataList.Add("AUDI;40912.2;m");rawDataList.Add("VW;3332;f ");
rawDataList.Add("AUDI;1234.99;m");rawDataList.Add("DACIA;0;m");
rawDataList.Add("AUDI;12354.2;m");rawDataList.Add("AUDI;123;m");
rawDataList.Add("VW;21321.2;f ");
foreach(var rawData in rawDataList)
{
var split = rawData.Split(';');
var line = string.Empty;
for(int i= 0; i < split.Length; i++)
{
double outValue;
var isNumberic = Double.TryParse(split[i], out outValue);
var txt = split[i];
if (!isNumberic)
{
if(StringRepresentationAsInt
.Where(x => x.ColIndex == i).Count() == 0)
{
StringRepresentationAsInt.Add(
new StringColIndex { ColIndex = i,
StringValues = new List<string> { txt } });
}
var obj = StringRepresentationAsInt
.First(x => x.ColIndex == i);
if (!obj.StringValues.Contains(txt)){
obj.StringValues.Add(txt);
}
line += (string.IsNullOrEmpty(line) ?
string.Empty :
("," + (obj.StringValues.IndexOf(txt) + 1).ToString()));
}
else
{
line += "," + split[i];
}
}
rawDataWithStringsAsIdsList.Add(line);
}
rawDataWithStringsAsIdsList.ForEach(x => Console.WriteLine(x));
Console.ReadLine();
/*
Desired output:
1;40912.2;0
2;3332;1
1;1234.99;0
3;0;0
1;12354.2;0
1;123;0
2;21321.2;1
*/
}
}
}

Check if a String is combined with Several Substring in C#

Lets say I have several short string:
string[] shortStrings = new string[] {"xxx","yyy","zzz"};
(this definition can change length on array and on string too, so not a fixed one)
When a given string, I like to check if it combines with the shortStrings ONLY, how?
let say function is like bool TestStringFromShortStrings(string s)
then
TestStringFromShortStrings("xxxyyyzzz") = true;
TestStringFromShortStrings("xxxyyyxxx") = true;
TestStringFromShortStrings("xxxyyy") = true;
TestStringFromShortStrings("xxxxxx") = true;
TestStringFromShortStrings("xxxxx") = false;
TestStringFromShortStrings("xxxXyyyzzz") = false;
TestStringFromShortStrings("xxx2yyyxxx") = false;
Please suggest a memory not tense and relatively fast method.
[EIDT] What this function for?
I will personally use this function to test if a string is a combination of a PINYIN ok, some Chinese stuff. Following Chinese are same thing if you cannot read it.
检测一个字符串是否为汉语拼音(例如检测是否拼音域名)
所有的汉语拼音字符串有:
(To detect whether a string is Hanyu Pinyin (e.g. detect the phonetic domain) of the Pinyin string:)
Regex PinYin = new Regex(#"^(a|ai|an|ang|ao|ba|bai|ban|bang|bao|bei|ben|beng|bi|bian|biao|bie|bin|bing|bo|bu|ca|cai|can|cang|cao|ce|cen|ceng|cha|chai|chan|chang|chao|che|chen|cheng|chi|chong|chou|chu|chua|chuai|chuan|chuang|chui|chun|chuo|ci|cong|cou|cu|cuan|cui|cun|cuo|da|dai|dan|dang|dao|de|den|dei|deng|di|dia|dian|diao|die|ding|diu|dong|dou|du|duan|dui|dun|duo|e|ei|en|eng|er|fa|fan|fang|fei|fen|feng|fo|fou|fu|ga|gai|gan|gang|gao|ge|gei|gen|geng|gong|gou|gu|gua|guai|guan|guang|gui|gun|guo|ha|hai|han|hang|hao|he|hei|hen|heng|hong|hou|hu|hua|huai|huan|huang|hui|hun|huo|ji|jia|jian|jiang|jiao|jie|jin|jing|jiong|jiu|ju|juan|jue|jun|ka|kai|kan|kang|kao|ke|ken|keng|kong|kou|ku|kua|kuai|kuan|kuang|kui|kun|kuo|la|lai|lan|lang|lao|le|lei|leng|li|lia|lian|liang|liao|lie|lin|ling|liu|long|lou|lu|lv|luan|lue|lve|lun|luo|ma|mai|man|mang|mao|me|mei|men|meng|mi|mian|miao|mie|min|ming|miu|mo|mou|mu|na|nai|nan|nang|nao|ne|nei|nen|neng|ni|nian|niang|niao|nie|nin|ning|niu|nong|nou|nu|nv|nuan|nuo|nun|ou|pa|pai|pan|pang|pao|pei|pen|peng|pi|pian|piao|pie|pin|ping|po|pou|pu|qi|qia|qian|qiang|qiao|qie|qin|qing|qiong|qiu|qu|quan|que|qun|ran|rang|rao|re|ren|reng|ri|rong|rou|ru|ruan|rui|run|ruo|sa|sai|san|sang|sao|se|sen|seng|sha|shai|shan|shang|shao|she|shei|shen|sheng|shi|shou|shu|shua|shuai|shuan|shuang|shui|shun|shuo|si|song|sou|su|suan|sui|sun|suo|ta|tai|tan|tang|tao|te|teng|ti|tian|tiao|tie|ting|tong|tou|tu|tuan|tui|tun|tuo|wa|wai|wan|wang|wei|wen|weng|wo|wu|xi|xia|xian|xiang|xiao|xie|xin|xing|xiong|xiu|xu|xuan|xue|xun|ya|yan|yang|yao|ye|yi|yin|ying|yo|yong|you|yu|yuan|yue|yun|za|zai|zan|zang|zao|ze|zei|zen|zeng|zha|zhai|zhan|zhang|zhao|zhe|zhei|zhen|zheng|zhi|zhong|zhou|zhu|zhua|zhuai|zhuan|zhuang|zhui|zhun|zhuo|zi|zong|zou|zu|zuan|zui|zun|zuo)+$");
用下面的正则表达式方法,试过了,最简单而且效果非常好,就是有点慢:(
递归的方式对长字符串比较麻烦,容易内存溢出
(Tried it with the regular expression: it's the most simple and gives very good results, but it's a bit slow. The recursive way on the long string is too much trouble, it's too easy to overflow the stack.)
Edit: Simplified this a lot thanks to L.B and millimoose.
Regular Expressions to the rescue! Using System.Text.RegularExpressions.Regex, we get:
public static bool TestStringFromShortStrings(string checkText, string[] pieces) {
// Build the expression. Ultimate result will be
// of the form "^(xxx|yyy|zzz)+$".
var expr = "^(" +
String.Join("|", pieces.Select(Regex.Escape)) +
")+$";
// Check whether the supplied string matches the expression.
return Regex.IsMatch(checkText, expr);
}
This should be able to properly handle cases that have multiple repeated patterns of different lenghts. E.g. if you the list of possible pieces includes strings "xxx" and "xxxx".
Copy the target string to string builder. For each string in shortstring array, remove all occurences from target. If u end up in zero length string, true else false.
Edit:
This approach is not correct. Please refer to comments. Keeping this answer still here as it may look reasonably correct initially.
You could compare the start of the input string with each of the short strings. As soon as you have a match, you take the rest of the string and repeat. As soon as you have no more string left, you're done. For example:
string[] shortStrings = new string[] { "xxx", "yyy", "zzz" };
bool Test(string input)
{
if (input.Length == 0)
return true;
foreach (string shortStr in shortStrings)
{
if (input.StartsWith(shortStr))
{
if (Test(input.Substring(shortStr.Length)))
return true;
}
}
return false;
}
You might optimize this by removing the recursion, or by sorting the short strings and do a binary instead of a linear search.
Here is a non-recursive version, that uses a Stack object instead. No chance of getting a StackOverflowException:
string[] shortStrings = new string[] { "xxx", "yyy", "zzz" };
bool Test(string input)
{
Stack<string> stack = new Stack<string>();
stack.Push(input);
while (stack.Count > 0)
{
string str = stack.Pop();
if (str.Length == 0)
return true;
foreach (string shortStr in shortStrings)
{
if (str.StartsWith(shortStr))
stack.Push(str.Substring(shortStr.Length));
}
}
return false;
}

Categories

Resources