Find all possible combinations of word with and without hyphens - c#

For a string that may have zero or more hyphens in it, I need to extract all the different possibilities with and without hyphens.
For example, the string "A-B" would result in "A-B" and "AB" (two possibilities).
The string "A-B-C" would result in "A-B-C", "AB-C", "A-BC" and "ABC" (four possibilities).
The string "A-B-C-D" would result in "A-B-C-D", "AB-C-D", "A-BC-D", "A-B-CD", "AB-CD", "ABC-D", "A-BCD" and "ABCD" (eight possibilities).
...etc, etc.
I've experimented with some nested loops but haven't been able to get anywhere near the desired result. I suspect I need something recursive unless there is some simple solution I am overlooking.
NB. This is to build a SQL query (shame that SQL Server does't have MySQL's REGEXP pattern matching).
Here is one attempt I was working on. This might work if I do this recursively.
string keyword = "A-B-C-D";
List<int> hyphens = new List<int>();
int pos = keyword.IndexOf('-');
while (pos != -1)
{
hyphens.Add(pos);
pos = keyword.IndexOf('-', pos + 1);
}
for (int i = 0; i < hyphens.Count(); i++)
{
string result = keyword.Substring(0, hyphens[i]) + keyword.Substring(hyphens[i] + 1);
Response.Write("<p>" + result);
}
A B C D are words of varying length.

Take a look at your sample cases. Have you noticed a pattern?
With 1 hyphen there are 2 possibilities.
With 2 hyphens there are 4 possibilities.
With 3 hyphens there are 8 possibilities.
The number of possibilities is 2n.
This is literally exponential growth, so if there are too many hyphens in the string, it will quickly become infeasible to print them all. (With just 30 hyphens there are over a billion combinations!)
That said, for smaller numbers of hyphens it might be interesting to generate a list. To do this, you can think of each hyphen as a bit in a binary number. If the bit is 1, the hyphen is present, otherwise it is not. So this suggests a fairly straightforward solution:
Split the original string on the hyphens
Let n = the number of hyphens
Count from 2n - 1 down to 0. Treat this counter as a bitmask.
For each count begin building a string starting with the first part.
Concatenate each of the remaining parts to the string in order, preceded by a hyphen only if the corresponding bit in the bitmask is set.
Add the resulting string to the output and continue until the counter is exhausted.
Translated to code we have:
public static IEnumerable<string> EnumerateHyphenatedStrings(string s)
{
string[] parts = s.Split('-');
int n = parts.Length - 1;
if (n > 30) throw new Exception("too many hyphens");
for (int m = (1 << n) - 1; m >= 0; m--)
{
StringBuilder sb = new StringBuilder(parts[0]);
for (int i = 1; i <= n; i++)
{
if ((m & (1 << (i - 1))) > 0) sb.Append('-');
sb.Append(parts[i]);
}
yield return sb.ToString();
}
}
Fiddle: https://dotnetfiddle.net/ne3N8f

You should be able to track each hyphen position, and basically say its either there or not there. Loop through all the combinations, and you got all your strings. I found the easiest way to track it was using a binary, since its easy to add those with Convert.ToInt32
I came up with this:
string keyword = "A-B-C-D";
string[] keywordSplit = keyword.Split('-');
int combinations = Convert.ToInt32(Math.Pow(2.0, keywordSplit.Length - 1.0));
List<string> results = new List<string>();
for (int j = 0; j < combinations; j++)
{
string result = "";
string hyphenAdded = Convert.ToString(j, 2).PadLeft(keywordSplit.Length - 1, '0');
// Generate string
for (int i = 0; i < keywordSplit.Length; i++)
{
result += keywordSplit[i] +
((i < keywordSplit.Length - 1) && (hyphenAdded[i].Equals('1')) ? "-" : "");
}
results.Add(result);
}

This works for me:
Func<IEnumerable<string>, IEnumerable<string>> expand = null;
expand = xs =>
{
if (xs != null && xs.Any())
{
var head = xs.First();
if (xs.Skip(1).Any())
{
return expand(xs.Skip(1)).SelectMany(tail => new []
{
head + tail,
head + "-" + tail
});
}
else
{
return new [] { head };
}
}
else
{
return Enumerable.Empty<string>();
}
};
var keyword = "A-B-C-D";
var parts = keyword.Split('-');
var results = expand(parts);
I get:
ABCD
A-BCD
AB-CD
A-B-CD
ABC-D
A-BC-D
AB-C-D
A-B-C-D

I've tested this code and it is working as specified in the question. I stored the strings in a List<string>.
string str = "AB-C-D-EF-G-HI";
string[] splitted = str.Split('-');
List<string> finalList = new List<string>();
string temp = "";
for (int i = 0; i < splitted.Length; i++)
{
temp += splitted[i];
}
finalList.Add(temp);
temp = "";
for (int diff = 0; diff < splitted.Length-1; diff++)
{
for (int start = 1, limit = start + diff; limit < splitted.Length; start++, limit++)
{
int i = 0;
while (i < start)
{
temp += splitted[i++];
}
while (i <= limit)
{
temp += "-";
temp += splitted[i++];
}
while (i < splitted.Length)
{
temp += splitted[i++];
}
finalList.Add(temp);
temp = "";
}
}

I'm not sure your question is entirely well defined (i.e. could you have something like A-BCD-EF-G-H?). For "fully" hyphenated strings (A-B-C-D-...-Z), something like this should do:
string toParse = "A-B-C-D";
char[] toParseChars = toPase.toCharArray();
string result = "";
string binary;
for(int i = 0; i < (int)Math.pow(2, toParse.Length/2); i++) { // Number of subsets of an n-elt set is 2^n
binary = Convert.ToString(i, 2);
while (binary.Length < toParse.Length/2) {
binary = "0" + binary;
}
char[] binChars = binary.ToCharArray();
for (int k = 0; k < binChars.Length; k++) {
result += toParseChars[k*2].ToString();
if (binChars[k] == '1') {
result += "-";
}
}
result += toParseChars[toParseChars.Length-1];
Console.WriteLine(result);
}
The idea here is that we want to create a binary word for each possible hyphen. So, if we have A-B-C-D (three hyphens), we create binary words 000, 001, 010, 011, 100, 101, 110, and 111. Note that if we have n hyphens, we need 2^n binary words.
Then each word maps to the output you desire by inserting the hyphen where we have a '1' in our word (000 -> ABCD, 001 -> ABC-D, 010 -> AB-CD, etc). I didn't test the code above, but this is at least one way to solve the problem for fully hyphenated words.
Disclaimer: I didn't actually test the code

Related

How to split characters of string equally between buttons?

I have an array of string elements of various words. I need the characters of each word be split equally into the text component of 3 buttons. For example, the array could hold the elements "maybe", "his", "car". In each game one of these words will be pulled from the array and its characters divided into the 3 buttons. For example, button 1 will have "ma", button 2 will have "yb" and button 3 "e" (for the word maybe). I then hide the text element of one button for the user to drag and drop the correct missing letter(s) into the space. The purpose of the game is to help children learn to spell. Does anyone know how I could go about dividing the characters equally into the 3 buttons?
Here's a function that would split the word into the amount of segments you want. You can then iterate over that list to set each segment to a button.Text.
public List<string> SplitInSegments(string word, int segments)
{
int wordLength = word.Length;
// The remainder tells us how many segments will get an extra letter
int remainder = wordLength % segments;
// The base length of a segment
// This is a floor division, because we're dividing ints.
// So 5 / 3 = 1
int segmentLength = wordLength / segments;
var result = new List<string>();
int startIndex = 0;
for (int i = 0; i < segments; i++)
{
// This segment may get an extra letter, if its index is smaller then the remainder
int currentSegmentLength = segmentLength + (i < remainder ? 1 : 0);
string currentSegment = word.Substring(startIndex, currentSegmentLength);
// Set the startindex for the next segment.
startIndex += currentSegmentLength;
result.Add(currentSegment);
}
return result;
}
usage:
// returns ["ma", "yb", "e"]
var segments = SplitInSegments("maybe", 3);
Edit
I like the fact that this is for teaching children. So here comes.
Regarding your question on splitting the string based on specific letter sequences: After you've split the string using regex, you will have an array of strings. Then determine the amount of items in the splitted string and concatenate or split further based on the number of segments:
// sequences to split on first
static readonly string[] splitSequences = {
"el",
"ol",
"bo"
};
static readonly string regexDelimiters = string.Join('|', splitSequences.Select(s => "(" + s + ")"));
// Method to split on sequences
public static List<string> SplitOnSequences(string word)
{
return Regex.Split(word, regexDelimiters).Where(s => !string.IsNullOrEmpty(s)).ToList();
}
public static List<string> SplitInSegments(string word, int segments)
{
int wordLength = word.Length;
// The remainder tells us how many segments will get an extra letter
int remainder = wordLength % segments;
// The base length of a segment
// This is a floor division, because we're dividing ints.
// So 5 / 3 = 1
int segmentLength = wordLength / segments;
var result = new List<string>();
int startIndex = 0;
for (int i = 0; i < segments; i++)
{
// This segment may get an extra letter, if its index is smaller then the remainder
int currentSegmentLength = segmentLength + (i < remainder ? 1 : 0);
string currentSegment = word.Substring(startIndex, currentSegmentLength);
// Set the startindex for the next segment.
startIndex += currentSegmentLength;
result.Add(currentSegment);
}
return result;
}
// Splitword will now always return 3 segments
public static List<string> SplitWord(string word)
{
if (word == null)
{
throw new ArgumentNullException(nameof(word));
}
if (word.Length < 3)
{
throw new ArgumentException("Word must be at least 3 characters long", nameof(word));
}
var splitted = SplitOnSequences(word);
var result = new List<string>();
if (splitted.Count == 1)
{
// If the result is not splitted, just split it evenly.
result = SplitInSegments(word, 3);
}
else if (splitted.Count == 2)
{
// If we've got 2 segments, split the shortest segment again.
if (splitted[1].Length > splitted[0].Length
&& !splitSequences.Contains(splitted[1]))
{
result.Add(splitted[0]);
result.AddRange(SplitInSegments(splitted[1], 2));
}
else
{
result.AddRange(SplitInSegments(splitted[0], 2));
result.Add(splitted[1]);
}
}
else // splitted.Count >= 3
{
// 3 segments is good.
result = splitted;
// More than 3 segments, combine some together.
while (result.Count > 3)
{
// Find the shortest combination of two segments
int shortestComboCount = int.MaxValue;
int shortestComboIndex = 0;
for (int i = 0; i < result.Count - 1; i++)
{
int currentComboCount = result[i].Length + result[i + 1].Length;
if (currentComboCount < shortestComboCount)
{
shortestComboCount = currentComboCount;
shortestComboIndex = i;
}
}
// Combine the shortest segments and replace in the result.
string combo = result[shortestComboIndex] + result[shortestComboIndex + 1];
result.RemoveAt(shortestComboIndex + 1);
result[shortestComboIndex] = combo;
}
}
return result;
}
Now when you call the code:
// always returns three segments.
var splitted = SplitWord(word);
Here is another approach.
First make sure that the word can be divided by the desired segments (add a dummy space if necessary) , then use a Linq statement to get your parts and when adding the result trim away the dummy characters.
public static string[] SplitInSegments(string word, int segments)
{
while(word.Length % segments != 0) { word+=" ";}
var result = new List<string>();
for(int x=0; x < word.Count(); x += word.Length / segments)
result.Add((new string(word.Skip(x).Take(word.Length / segments).ToArray()).Trim()));
return result.ToArray();
}
You can split your string into a list and generate buttons based on your list. The logic for splitting the word into a string list would be something similar to this:
string test = "maybe";
List list = new List();
int i = 0, len = 2;
while(i <= test.Length)
{
int lastIndex = test.Length - 1;
list.Add(test.Substring(i, i + len > lastIndex? (i + len) - test.Length : len));
i += len;
}
HTH

Xor operation between binary values in C#

My question is that i have a list of binary string like below :
list=<"1111","1010","1010","0011">
and an input string of binary value st1=1010. I want to Xor between :
st3=st1 Xor list<0>
then :
st3=st3 Xor list<1>
st3=st3Xor list <2>;
st3=st3 Xor list <3>;
where the operation will be st1 Xor with first key in keys list and the result Xor with the second key in keys list and the result Xor with the third key in keys list and so on . Can any one help me please?
i have tried this code but it does not work as i expected :
foreach (string k in keys)
{
string st1 = textBox1.text;
string st2 = k;
string st3;
st3 = "";
//i wanted to make the length of both strings st1 and st2 equal
//here if the length of st1 greater than st2
if (st1.Length > st2.Length)
{
int n = st1.Length - st2.Length;
string pad = "";
for (int j = 1; j <= n; j++)
{ pad += 0; }
string recover = pad.ToString() + st2;
//this is my Xor operation that i made for string values
for (int counter = 0; counter < st1.Length; counter++)
{
if (st1[counter] != recover[counter])
{
st3 = st3 + '1';
}
else
{ st3 = st3 + '0'; }
}
listBox4.Items.Add("Xor :" + st3.ToString());
}
//here if st1 is less than st2
else if (st1.Length < st2.Length)
{
int nn = st2.Length - st1.Length;
string ppad = "";
for (int j = 1; j <= nn; j++)
{
ppad += 0;
}
string recover = ppad.ToString() + st1;
for (int counter = 0; counter < st2.Length; counter++)
{
if (st2[counter] != recover[counter])
{
st3 = st3 + '1';
}
else
{ st3 = st3 + '0'; }
}
listBox4.Items.Add("Xor :" + st3.ToString());}
//here if st1 equal st2
else
{
for (int counter = 0; counter < st1.Length; counter++)
{
if (st1[counter] != st2[counter])
{
st3 = st3 + '1';
}
else
{ st3 = st3 + '0'; }
}
listBox4.Items.Add("Xor :" + st3.ToString());
}
}
the result that i do not expected is :
Here's one approach (Arbitrary length binary strings):
Convert the strings back to integers BigIntegers, so that we can actually get the utility of existing bitwise Xor operator (^).
Use LINQ's Aggregate to consecutively left-fold the seed value (st1) with the converted list with Xor.
Since you seem interested only in the lowest 4 bits, I've applied a mask, although if all your numbers are strictly 4 bits, this isn't actually necessary (since 0 Xor 0 stays 0)
You can convert the int back to a binary string with Convert.ToString(x, 2) and then PadLeft to replace any missing leading zeroes.
Edit - OP has changed the question from an example 4 bit number and the requirement is now to work with arbitrary length binary strings. This approach still works, but we'll need to use BigInteger (which still has an XOR ^ operator), but we need helpers to parse and format binary strings, as these aren't built into BigInteger. The BitMask and padding have also been removed, since the strings aren't fixed length - the result will have at most 1 leading zero:
var list = new List<string>{"10101010101010101101","1101010101010101011",
"1110111111010101101","11111111111111111111111111","10101010110101010101"};
var listNum = list.Select(l => BinaryStringToBigInteger(l));
var st1 = "000000001";
var seedNumber = BinaryStringToBigInteger(st1);
var chainedXors = listNum.Aggregate(seedNumber, (prev, next) => prev ^ next);
// Back to binary representation of the string
var resultString = chainedXors.ToBinaryString();
And because there's no native support for converting BigIntegers to / from binary strings, you'll need a conversion helper such as Douglas's one here:
BigInteger BinaryStringToBigInteger(string binString)
{
return binString.Aggregate(BigInteger.Zero, (prev, next) => prev * 2 + next - '0');
}
And for the reverse operation, ToBinaryString is from this helper.
32 Bit Integer answer
If the Binary strings are 32 bits or less, then a much simpler solution exists, since there are out of the box conversions to / from binary strings. The same approach should apply for 64 bit longs.
var list = new List<string>{"1111","1010","1010","0011","0011"};
var listNum = list.Select(l => Convert.ToInt32(l, 2));
// If you only want the last 4 bits. Change this to include as many bits as needed.
var bitMask = Convert.ToInt32("00000000000000000000000000001111", 2);
var st1 = "1010";
var someNum = Convert.ToInt32(st1, 2);
var chainedXors = listNum.Aggregate(someNum, (prev, next) => prev ^ next);
// If you need the result back as a 4 bit binary-string, zero padded
var resultString = Convert.ToString(chainedXors & bitMask, 2)
.PadLeft(4, '0');
Try this code:
static void Main(string[] args)
{
List<string> list = new List<string> { "1111", "1010", "1010", "0011" };
string st1 = "1010";
foreach (string item in list)
{
st1 = XorBins(st1, item);
Console.WriteLine(st1);
}
Console.ReadKey();
}
private static string XorBins(string bin1, string bin2)
{
int len = Math.Max(bin1.Length, bin2.Length);
string res = "";
bin1 = bin1.PadLeft(len, '0');
bin2 = bin2.PadLeft(len, '0');
for (int i = 0; i < len; i++)
res += bin1[i] == bin2[i] ? '0' : '1';
return res;
}
Here is an Xor method for you:
public static string Xor(string s1, string s2) {
// find the length of the longest of the two strings
int longest = Math.Max(s1.Length, s2.Length);
// pad both strings to that length. You don't need to write the padding
// logic yourself! There is already a method that does that!
string first = s1.PadLeft(longest, '0');
string second = s2.PadLeft(longest, '0');
// Enumerable.Zip takes two sequences (in this case sequences of char, aka strings)
// and lets you transform each element in the sequences. Here what
// I did was check if the two chars are not equal, in which case
// I transform the two elements to a 1, 0 otherwise
return string.Join("", Enumerable.Zip(first, second, (x, y) => x != y ? '1' : '0'));
}
You can use it like this:
Xor("1111", "1010") // 0101

C# cant understand count

I am trying to count words in this program but i don't understand why the program is counting for 1 number less than it must be.
For example:
sun is hot
program will show me that there is only 2 words.
Console.WriteLine("enter your text here");
string text = Convert.ToString(Console.ReadLine());
int count = 0;
text = text.Trim();
for (int i = 0; i < text.Length - 1; i++)
{
if (text[i] == 32)
{
if (text[i + 1] != 32)
{
count++;
}
}
}
Console.WriteLine(count);
Regular expression works the best for this.
var str = "this,is:my test string!with(difffent?.seperators";
int count = Regex.Matches(str, #"[\w]+").Count;
the result is 8.
Counts all words, does not include spaces or any special characters, regardless if they repeat or not.

What's the fastest way to remove characters from an alpha-numeric string?

Say we have the following strings that we pass as parameters to the function below:
string sString = "S104";
string sString2 = "AS105";
string sString3 = "ASRVT106";
I want to be able to extract the numbers from the string to place them in an int variable. Is there a quicker and/or more efficient way of removing the letters from the strings than the following code?: (*These strings will be populated dynamically at runtime - they are not assigned values at construction.)
Code:
public GetID(string sCustomTag = null)
{
m_sCustomTag = sCustomTag;
try {
m_lID = Convert.ToInt32(m_sCustomTag); }
catch{
try{
int iSubIndex = 0;
char[] subString = sCustomTag.ToCharArray();
//ITERATE THROUGH THE CHAR ARRAY
for (int i = 0; i < subString.Count(); i++)
{
for (int j = 0; j < 10; j++)
{
if (subString[i] == j)
{
iSubIndex = i;
goto createID;
}
}
}
createID: m_lID = Convert.ToInt32(m_sCustomTag.Substring(iSubIndex));
}
//IF NONE OF THAT WORKS...
catch(Exception e)
{
m_lID = 00000;
throw e;
}
}
}
}
I've done things like this before, but I'm not sure if there's a more efficient way to do it. If it was just going to be a single letter at the beginning, I could just set the subStringIndex to 1 every time, but the users can essentially put in whatever they want. Generally, they will be formatted to a LETTER-then-NUMBER format, but if they don't, or they want to put in multiple letters like sString2 or sString3, then I need to be able to compensate for that. Furthermore, if the user puts in some whacked-out, non-traditional format like string sString 4 = S51A24;, is there a way to just remove any and all letters from the string?
I've looked about, and can't find anything on MSDN or Google. Any help or links to it are greatly appreciated!
You can use a regular expression. It's not necessarily faster, but it's more concise.
string sString = "S104";
string sString2 = "AS105";
string sString3 = "ASRVT106";
var re = new Regex(#"\d+");
Console.WriteLine(re.Match(sString).Value); // 104
Console.WriteLine(re.Match(sString2).Value); // 105
Console.WriteLine(re.Match(sString3).Value); // 106
You can use a Regex, but it's probably faster to just do:
public int ExtractInteger(string str)
{
var sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
if(Char.IsDigit(str[i])) sb.Append(str[i]);
return int.Parse(sb.ToString());
}
You can simplify further with some LINQ at the expense of a small performance penalty:
public int ExtractInteger(string str)
{
return int.Parse(new String(str.Where(c=>Char.IsDigit(c)).ToArray()));
}
Now, if you only want to parse the first sequence of consecutive digits, do this instead:
public int ExtractInteger(string str)
{
return int.Parse(new String(str.SkipWhile(c=>!Char.IsDigit(c)).TakeWhile(c=>Char.IsDigit(c)).ToArray()));
}
Fastest is to parse the string without removing anything:
var s = "S51A24";
int m_lID = 0;
for (int i = 0; i < s.Length; i++)
{
int d = s[i] - '0';
if ((uint)d < 10)
m_lID = m_lID * 10 + d;
}
Debug.Print(m_lID + ""); // 5124
string removeLetters(string s)
{
for (int i = 0; i < s.Length; i++)
{
char c = s[i];
if (IsEnglishLetter(c))
{
s = s.Remove(i, 1);
}
}
return s;
}
bool IsEnglishLetter(char c)
{
return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
While you asked "what's the fastest way to remove characters..." what you really appear to be asking is "how do I create an integer by extracting only the digits from the string"?
Going with this assumption, your first call to Convert.ToInt32 will be slow for the case where you have other than digits because of the exception throwing.
Let's try another approach. Let's think about each of the cases.
The string always starts with a series of digits (e.g. 123ABC => 123)
The string always ends with a series of digits (e.g. ABC123 => 123)
A string has a series of contiguous digits in the middle (e.g. AB123C ==> 123)
The digits are possibly noncontiguous (e.g. A77C12 => 7712)
Case 4 is the "safest" assumption (after all, it is a superset of Case 1, 2 and 3. So, we need an algorithm for that. As a bonus I'll provide algorithms specialized to the other cases.
The Main Algorithm, All Cases
Using in-place unsafe iteration of the characters of the string, which uses fixed, we can extract digits and convert them to a single number without the data copy in ToCharArray(). We can also avoid the allocations of, say, a StringBuilder implementation and a possibly slow regex solution.
NOTE: This is valid C# code though it's using pointers. It does look like C++, but I assure you it's C#.
public static unsafe int GetNumberForwardFullScan(string s)
{
int value = 0;
fixed (char* pString = s)
{
var pChar = pString;
for (int i = 0; i != s.Length; i++, pChar++)
{
// this just means if the char is not between 0-9, we exit the loop (i.e. stop calculating the integer)
if (*pChar < '0' || *pChar > '9')
continue;
// running recalculation of the integer
value = value * 10 + *pChar - '0';
}
}
return value;
}
Running this against any of the inputs: "AS106RVT", "ASRVT106", "106ASRVT", or "1AS0RVT6" results in pulling out 1, 0, 6 and calculating on each digit as
0*10 + 1 == 1
1*10 + 0 == 10
10*10 + 6 == 106
Case 1 Only Algorithm (Digits at Start of String)
This algorithm is identical to the one above, but instead of continue we can break as soon as we reach a non-digit. This would be much faster if we can assume all the inputs start with digits and the strings are long.
Case 2 Only Algorithm (Digits at End of String)
This is almost the same as Case 1 Only except you have to
iterate from the end of the string to the beginning (aka backwards) stopping on the first non-digit
change the calculation to sum up powers of ten.
Both of those are a bit tricky, so here's what that looks like
public static unsafe int GetNumberBackward(string s)
{
int value = 0;
fixed (char* pString = s)
{
char* pChar = pString + s.Length - 1;
for (int i = 0; i != -1; i++, pChar--)
{
if (*pChar < '0' || *pChar > '9')
break;
value = (*pChar - '0') * (int)Math.Pow(10, i) + value;
}
}
return value;
}
So each of the iteration of the calculation looks like
6*100 + 0 == 6
0*101 + 6 == 6
1*102 + 6 == 106
While I used Math.Pow in these examples, you can find integer only versions that might be faster.
Cases 1-3 Only (i.e. All Digits Contiguous Somewhere in the String
This algorithm says to
Scan all non-digits
Then scan only digits
First non-digit after that, stop
It would look like
public static unsafe int GetContiguousDigits(string s)
{
int value = 0;
fixed (char* pString = s)
{
var pChar = pString;
// skip non-digits
int i = 0;
for (; i != s.Length; i++, pChar++)
if (*pChar >= '0' && *pChar <= '9')
break;
for (; i != s.Length; i++, pChar++)
{
if (*pChar < '0' || *pChar > '9')
break;
value = value * 10 + *pChar - '0';
}
}
return value;
}

More efficient way to get all indexes of a character in a string

Instead of looping through each character to see if it's the one you want then adding the index your on to a list like so:
var foundIndexes = new List<int>();
for (int i = 0; i < myStr.Length; i++)
{
if (myStr[i] == 'a')
foundIndexes.Add(i);
}
You can use String.IndexOf, see example below:
string s = "abcabcabcabcabc";
var foundIndexes = new List<int>();
long t1 = DateTime.Now.Ticks;
for (int i = s.IndexOf('a'); i > -1; i = s.IndexOf('a', i + 1))
{
// for loop end when i=-1 ('a' not found)
foundIndexes.Add(i);
}
long t2 = DateTime.Now.Ticks - t1; // read this value to see the run time
I use the following extension method to yield all results:
public static IEnumerable<int> AllIndexesOf(this string str, string searchstring)
{
int minIndex = str.IndexOf(searchstring);
while (minIndex != -1)
{
yield return minIndex;
minIndex = str.IndexOf(searchstring, minIndex + searchstring.Length);
}
}
usage:
IEnumerable<int> result = "foobar".AllIndexesOf("o"); // [1,2]
Side note to a edge case: This is a string approach which works for one or more characters. In case of "fooo".AllIndexesOf("oo") the result is just 1 https://dotnetfiddle.net/CPC7D2
How about
string xx = "The quick brown fox jumps over the lazy dog";
char search = 'f';
var result = xx.Select((b, i) => b.Equals(search) ? i : -1).Where(i => i != -1);
The raw iteration is always better & most optimized.
Unless it's a bit complex task, you never really need to seek for a better optimized solution...
So I would suggest to continue with :
var foundIndexes = new List<int>();
for (int i = 0; i < myStr.Length; i++)
if (myStr[i] == 'a') foundIndexes.Add(i);
If the string is short, it may be more efficient to search the string once and count up the number of times the character appears, then allocate an array of that size and search the string a second time, recording the indexes in the array. This will skip any list re-allocations.
What it comes down to is how long the string is and how many times the character appears. If the string is long and the character appears few times, searching it once and appending indicies to a List<int> will be faster. If the character appears many times, then searching the string twice (once to count, and once to fill an array) may be faster. Exactly where the tipping point is depends on many factors that can't be deduced from your question.
If you need to search the string for multiple different characters and get a list of indexes for those characters separately, it may be faster to search through the string once and build a Dictionary<char, List<int>> (or a List<List<int>> using character offsets from \0 as the indicies into the outer array).
Ultimately, you should benchmark your application to find bottlenecks. Often the code that we think will perform slowly is actually very fast, and we spend most of our time blocking on I/O or user input.
public static List<int> GetSubstringLocations(string text, string searchsequence)
{
try
{
List<int> foundIndexes = new List<int> { };
int i = 0;
while (i < text.Length)
{
int cindex = text.IndexOf(searchsequence, i);
if (cindex >= 0)
{
foundIndexes.Add(cindex);
i = cindex;
}
i++;
}
return foundIndexes;
}
catch (Exception ex) { }
return new List<int> { };
}
public static String[] Split(this string s,char c = '\t')
{
if (s == null) return null;
var a = new List<int>();
int i = s.IndexOf(c);
if (i < 0) return new string[] { s };
a.Add(i);
for (i = i+1; i < s.Length; i++) if (s[i] == c) a.Add(i);
var result = new string[a.Count +1];
int startIndex = 0;
result[0] = s.Remove(a[0]);
for(i=0;i<a.Count-1;i++)
{
result[i + 1] = s.Substring(a[i] + 1, a[i + 1] - a[i] - 1);
}
result[a.Count] = s.Substring(a[a.Count - 1] + 1);
return result;
}

Categories

Resources