Lets say I have several short string:
string[] shortStrings = new string[] {"xxx","yyy","zzz"};
(this definition can change length on array and on string too, so not a fixed one)
When a given string, I like to check if it combines with the shortStrings ONLY, how?
let say function is like bool TestStringFromShortStrings(string s)
then
TestStringFromShortStrings("xxxyyyzzz") = true;
TestStringFromShortStrings("xxxyyyxxx") = true;
TestStringFromShortStrings("xxxyyy") = true;
TestStringFromShortStrings("xxxxxx") = true;
TestStringFromShortStrings("xxxxx") = false;
TestStringFromShortStrings("xxxXyyyzzz") = false;
TestStringFromShortStrings("xxx2yyyxxx") = false;
Please suggest a memory not tense and relatively fast method.
[EIDT] What this function for?
I will personally use this function to test if a string is a combination of a PINYIN ok, some Chinese stuff. Following Chinese are same thing if you cannot read it.
检测一个字符串是否为汉语拼音(例如检测是否拼音域名)
所有的汉语拼音字符串有:
(To detect whether a string is Hanyu Pinyin (e.g. detect the phonetic domain) of the Pinyin string:)
Regex PinYin = new Regex(#"^(a|ai|an|ang|ao|ba|bai|ban|bang|bao|bei|ben|beng|bi|bian|biao|bie|bin|bing|bo|bu|ca|cai|can|cang|cao|ce|cen|ceng|cha|chai|chan|chang|chao|che|chen|cheng|chi|chong|chou|chu|chua|chuai|chuan|chuang|chui|chun|chuo|ci|cong|cou|cu|cuan|cui|cun|cuo|da|dai|dan|dang|dao|de|den|dei|deng|di|dia|dian|diao|die|ding|diu|dong|dou|du|duan|dui|dun|duo|e|ei|en|eng|er|fa|fan|fang|fei|fen|feng|fo|fou|fu|ga|gai|gan|gang|gao|ge|gei|gen|geng|gong|gou|gu|gua|guai|guan|guang|gui|gun|guo|ha|hai|han|hang|hao|he|hei|hen|heng|hong|hou|hu|hua|huai|huan|huang|hui|hun|huo|ji|jia|jian|jiang|jiao|jie|jin|jing|jiong|jiu|ju|juan|jue|jun|ka|kai|kan|kang|kao|ke|ken|keng|kong|kou|ku|kua|kuai|kuan|kuang|kui|kun|kuo|la|lai|lan|lang|lao|le|lei|leng|li|lia|lian|liang|liao|lie|lin|ling|liu|long|lou|lu|lv|luan|lue|lve|lun|luo|ma|mai|man|mang|mao|me|mei|men|meng|mi|mian|miao|mie|min|ming|miu|mo|mou|mu|na|nai|nan|nang|nao|ne|nei|nen|neng|ni|nian|niang|niao|nie|nin|ning|niu|nong|nou|nu|nv|nuan|nuo|nun|ou|pa|pai|pan|pang|pao|pei|pen|peng|pi|pian|piao|pie|pin|ping|po|pou|pu|qi|qia|qian|qiang|qiao|qie|qin|qing|qiong|qiu|qu|quan|que|qun|ran|rang|rao|re|ren|reng|ri|rong|rou|ru|ruan|rui|run|ruo|sa|sai|san|sang|sao|se|sen|seng|sha|shai|shan|shang|shao|she|shei|shen|sheng|shi|shou|shu|shua|shuai|shuan|shuang|shui|shun|shuo|si|song|sou|su|suan|sui|sun|suo|ta|tai|tan|tang|tao|te|teng|ti|tian|tiao|tie|ting|tong|tou|tu|tuan|tui|tun|tuo|wa|wai|wan|wang|wei|wen|weng|wo|wu|xi|xia|xian|xiang|xiao|xie|xin|xing|xiong|xiu|xu|xuan|xue|xun|ya|yan|yang|yao|ye|yi|yin|ying|yo|yong|you|yu|yuan|yue|yun|za|zai|zan|zang|zao|ze|zei|zen|zeng|zha|zhai|zhan|zhang|zhao|zhe|zhei|zhen|zheng|zhi|zhong|zhou|zhu|zhua|zhuai|zhuan|zhuang|zhui|zhun|zhuo|zi|zong|zou|zu|zuan|zui|zun|zuo)+$");
用下面的正则表达式方法,试过了,最简单而且效果非常好,就是有点慢:(
递归的方式对长字符串比较麻烦,容易内存溢出
(Tried it with the regular expression: it's the most simple and gives very good results, but it's a bit slow. The recursive way on the long string is too much trouble, it's too easy to overflow the stack.)
Edit: Simplified this a lot thanks to L.B and millimoose.
Regular Expressions to the rescue! Using System.Text.RegularExpressions.Regex, we get:
public static bool TestStringFromShortStrings(string checkText, string[] pieces) {
// Build the expression. Ultimate result will be
// of the form "^(xxx|yyy|zzz)+$".
var expr = "^(" +
String.Join("|", pieces.Select(Regex.Escape)) +
")+$";
// Check whether the supplied string matches the expression.
return Regex.IsMatch(checkText, expr);
}
This should be able to properly handle cases that have multiple repeated patterns of different lenghts. E.g. if you the list of possible pieces includes strings "xxx" and "xxxx".
Copy the target string to string builder. For each string in shortstring array, remove all occurences from target. If u end up in zero length string, true else false.
Edit:
This approach is not correct. Please refer to comments. Keeping this answer still here as it may look reasonably correct initially.
You could compare the start of the input string with each of the short strings. As soon as you have a match, you take the rest of the string and repeat. As soon as you have no more string left, you're done. For example:
string[] shortStrings = new string[] { "xxx", "yyy", "zzz" };
bool Test(string input)
{
if (input.Length == 0)
return true;
foreach (string shortStr in shortStrings)
{
if (input.StartsWith(shortStr))
{
if (Test(input.Substring(shortStr.Length)))
return true;
}
}
return false;
}
You might optimize this by removing the recursion, or by sorting the short strings and do a binary instead of a linear search.
Here is a non-recursive version, that uses a Stack object instead. No chance of getting a StackOverflowException:
string[] shortStrings = new string[] { "xxx", "yyy", "zzz" };
bool Test(string input)
{
Stack<string> stack = new Stack<string>();
stack.Push(input);
while (stack.Count > 0)
{
string str = stack.Pop();
if (str.Length == 0)
return true;
foreach (string shortStr in shortStrings)
{
if (str.StartsWith(shortStr))
stack.Push(str.Substring(shortStr.Length));
}
}
return false;
}
Related
I have some string like this:
str1 = STA001, str2 = STA002, str3 = STA003
and have code to compare strings:
private bool IsSubstring(string strChild, string strParent)
{
if (!strParent.Contains(strChild))
{
return false;
}
else return true;
}
If I have strChild = STA001STA002 and strParent = STA001STA002STA003 then return true but when I enter strChild = STA001STA003 and check with strParent = STA001STA002STA003 then return false although STA001STA003 have contains in strParent. How can i resolve it?
What you're describing is not a substring. It is basically asking of two collections the question "is this one a subset of the other one?" This question is far easier to ask when the collection is a set such as HashSet<T> than when the collection is a big concatenated string.
This would be a much better way to write your code:
var setOne = new HashSet<string> { "STA001", "STA003" };
var setTwo = new HashSet<string> { "STA001", "STA002", "STA003" };
Console.WriteLine(setOne.IsSubsetOf(setTwo)); // True
Console.WriteLine(setTwo.IsSubsetOf(setOne)); // False
Or, if the STA00 part was just filler to make it make sense in the context of strings, then use ints directly:
var setOne = new HashSet<int> { 1, 3 };
var setTwo = new HashSet<int> { 1, 2, 3 };
Console.WriteLine(setOne.IsSubsetOf(setTwo)); // True
Console.WriteLine(setTwo.IsSubsetOf(setOne)); // False
The Contains method only looks for an exact match, it doesn't look for parts of the string.
Divide the child string into its parts, and look for each part in the parent string:
private bool IsSubstring(string child, string parent) {
for (int i = 0; i < child.Length; i+= 6) {
if (!parent.Contains(child.Substring(i, 6))) {
return false;
}
}
return true;
}
You should however consider it cross-part matches are possible, and if that is an issue. E.g. looking for "1STA00" in "STA001STA002". If that would be a problem, then you should divide the parent string similarly, and only make direct comparisons between the parts, not using the Contains method.
Note: Using hungarian notation for the data type of variables is not encouraged in C#.
This may be a bit on the overkill side, but it might be incredibly beneficial.
private static bool ContainedWord(string input, string phrase)
{
var pattern = String.Format(#"\b({0})", phrase);
var result = Regex.Match(input, pattern);
if(string.Compare(result, phrase) == 0)
return true;
return false;
}
If the expression finds a match, then compare the result to your phrase. If they're zero, it matched. I may be misunderstanding your intent.
I want to check if a string contains more than one character in the string?
If i have a string 12121.23.2 so i want to check if it contains more than one . in the string.
You can compare IndexOf to LastIndexOf to check if there is more than one specific character in a string without explicit counting:
var s = "12121.23.2";
var ch = '.';
if (s.IndexOf(ch) != s.LastIndexOf(ch)) {
...
}
You can easily count the number of occurences of a character with LINQ:
string foo = "12121.23.2";
foo.Count(c => c == '.');
If performance matters, write it yourself:
public static bool ContainsDuplicateCharacter(this string s, char c)
{
bool seenFirst = false;
for (int i = 0; i < s.Length; i++)
{
if (s[i] != c)
continue;
if (seenFirst)
return true;
seenFirst = true;
}
return false;
}
In this way, you only make one pass through the string's contents, and you bail out as early as possible. In the worst case you visit all characters only once. In #dasblinkenlight's answer, you would visit all characters twice, and in #mensi's answer, you have to count all instances, even though once you have two you can stop the calculation. Further, using the Count extension method involves using an Enumerable<char> which will run more slowly than directly accessing the characters at specific indices.
Then you may write:
string s = "12121.23.2";
Debug.Assert(s.ContainsDuplicateCharacter('.'));
Debug.Assert(s.ContainsDuplicateCharacter('1'));
Debug.Assert(s.ContainsDuplicateCharacter('2'));
Debug.Assert(!s.ContainsDuplicateCharacter('3'));
Debug.Assert(!s.ContainsDuplicateCharacter('Z'));
I also think it's nicer to have a function that explains exactly what you're trying to achieve. You could wrap any of the other answers in such a function too, however.
Boolean MoreThanOne(String str, Char c)
{
return str.Count(x => x==c) > 1;
}
There are segments in the below-mentioned string. Each segment is started with a tilt(~) sign and I want to check if there exists a segment in which the AAA segment appears and on its 3rd index a number 63 is present.
ISA*ABC**TODAY*ALEXANDER GONZALEZ~HL*CDD*DKKD*S~EB*1*AKDK**DDJKJ~AAA*Y**50*P~AAA*N**50*P~AAA*N**63*C~AAA*N**50*D~AAA*N**45*D
I want to do it with a regular expression to avoid lengthy coding. I have tried and come up with this (~AAA) to check if this segment exists or not but because I am new to regular expressions I don't know how to check if 63 appears on the 3rd index or not? If anyone can help I will be very thankful.
I have to agree with Sebastian's comment. This can be accomplished using simple Split operations.
private static bool Check(string input)
{
int count = 0;
foreach (string segment in input.Split('~'))
{
string[] tokens = segment.Split('*');
if (tokens[0] == "AAA")
{
count++;
if (count == 3)
{
if (tokens[3] == "63") return true;
else return false;
}
}
}
return false;
}
EDIT:
Since you want fewer lines of codes, how about LINQ?
private bool Check(string input)
{
return input.Split('~').Select(x => x.Split('*')).Any(x => x.Length >= 4 && x[0].Equals("AAA") && x[3].Equals("63"));
}
EDIT2:
For completeness, here's a Regex solution as well:
private bool Check(string input)
{
return Regex.IsMatch(input, #".*~AAA\*([^\*~]*\*){2}63.*");
}
There are many, many, sites and tools that can help you with regex. Google can help you further.
You could use this:
~AAA\*[A-Z]\*\*63
Note the \ is used to escape the * and [A-Z] matches any uppercase alphabetical character.
string s = "ISA*ABC**TODAY*ALEXANDER GONZALEZ~HL*CDD*DKKD*S~EB*1*AKDK**DDJKJ~AAA*Y**50*P~AAA*N**50*P~AAA*N**63*C~AAA*N**50*D~AAA*N* *45*D";
string[] parts = s.Split('~');
for (int i = 0; i < parts.Count(); i++)
{
MessageBox.Show("part "+i.ToString()+" contains 63: "+parts[i].Contains("63").ToString());
}
this is just an example. you can do a lot more, I checked existance of 63 in any segment.
so if you exactly want to check if segment 3 the code would be:
bool segment3Contains63 = s.Slplit('~')[3].Contains("63");
Here I am using Split function to get the parts of string.
string[] OrSets = SubLogic.Split('|');
foreach (string OrSet in OrSets)
{
bool OrSetFinalResult = false;
if (OrSet.Contains('&'))
{
OrSetFinalResult = true;
if (OrSet.Contains('0'))
{
OrSetFinalResult = false;
}
//string[] AndSets = OrSet.Split('&');
//foreach (string AndSet in AndSets)
//{
// if (AndSet == "0")
// {
// // A single "false" statement makes the entire And statement FALSE
// OrSetFinalResult = false;
// break;
// }
//}
}
else
{
if (OrSet == "1")
{
OrSetFinalResult = true;
}
}
if (OrSetFinalResult)
{
// A single "true" statement makes the entire OR statement TRUE
FinalResult = true;
break;
}
}
How can I replace the Split operation , along with replacement of foreach constructs.
Hypothesis #1
Depending of the kind of your process, you can parallellize the work :
var OrSets = SubLogic.Split('|').AsParallel();
foreach (string OrSet in OrSets)
{
...
....
}
However, this can often leads to problems with multithreaded apps (locking resource, etc.).
And you have also to measure the benefits. Switching from one thread to another can be costly. If the job is small, the AsParallel will be slower than a simple sequential loop.
This is very efficient when you have latency with network resource, or any kind of I/O.
Hypothesis #2
Your SubLogic variable is very very very big
You can, in this case, walk sequentially the file :
class Program
{
static void Main(string[] args)
{
var SubLogic = "darere|gfgfgg|gfgfg";
using (var sr = new StringReader(SubLogic))
{
var str = string.Empty;
int charValue;
do
{
charValue = sr.Read();
var c = (char)charValue;
if (c == '|' || (charValue == -1 && str.Length > 0))
{
Process(str);
str = string.Empty; // Reset the string
}
else
{
str += c;
}
} while (charValue >= 0);
}
Console.ReadLine();
}
private static void Process(string str)
{
// Your actual Job
Console.WriteLine(str);
}
Also, depending of the length of each chunk between |, you may want to use a StringBuilder and not a simple string concatenation.
Chances are that if you need to optimize to improve the performance of your application, that the code inside of the foreach loop is what needs to be optimized, not the string.Split method.
[EDIT:]
There are a number of good answers elsewhere on StackOverflow related to optimized string parsing:
Fastest Way to Parse Large Strings (multi threaded)
Fast string parsing in C#
String.Split() likely does more than you can do on your own to actually split the string up in a well-optimized manner. That assumes that you are interesting in returning true or false for each split section of your input, of course. Otherwise, you can just focus on searching your string.
As others have mentioned, if you need to search through a huge string (many hundreds of megabytes) and, especially, do so repeatedly and continuously, then look at what .NET 4 gives you with the Task Parallel Library.
For searching through strings, you can look at this example on MSDN for how to use IndexOf, LastIndexOf, StartsWith, and EndsWith methods. Those should perform better than the Contains method.
Of course, the best solution is dependent upon the facts of your particular situation. You'll want to use the System.Diagnostics.Stopwatch class to see how long your implementations (both current and new) take to see what works best.
You could possibly deal with it by using StringBuilder.
Try reading char-by-char from your source string into StringBuilder, till you find '|', then process what a StringBuilder contains.
That is how you'll avoid creation of tonns of String objects and save a lot of memory.
If you would have used Java, I'd recommend using StringTokenizer and StreamTokenizer classes. It's a pity there are no similar classes in .NET
How to find whether a string array contains some part of string?
I have array like this
String[] stringArray = new [] { "abc#gmail.com", "cde#yahoo.com", "#gmail.com" };
string str = "coure06#gmail.com"
if (stringArray.Any(x => x.Contains(str)))
{
//this if condition is never true
}
i want to run this if block when str contains a string thats completely or part of any of array's Item.
Assuming you've got LINQ available:
bool partialMatch = stringArray.Any(x => str.Contains(x));
Even without LINQ it's easy:
bool partialMatch = Array.Exists(stringArray, x => str.Contains(x));
or using C# 2:
bool partialMatch = Array.Exists(stringArray,
delegate(string x) { return str.Contains(x)); });
If you're using C# 1 then you probably have to do it the hard way :)
If you're looking for if a particular string in your array contains just "#gmail.com" instead of "abc#gmail.com" you have a couple of options.
On the input side, there are a variety of questions here on SO which will point you in the direction you need to go to validate that your input is a valid email address.
If you can only check on the back end, I'd do something like:
emailStr = "#gmail.com";
if(str.Contains(emailStr) && str.length == emailStr.length)
{
//your processing here
}
You can also use Regex matching, but I'm not nearly familiar enough with that to tell you what pattern you'd need.
If you're looking for just anything containing "#gmail.com", Jon's answer is your best bets.