Bitwise OR on strings for large strings in c#

Bitwise OR on strings for large strings in c# - c#

I have two strings(with 1's and 0's) of equal lengths(<=500) and would like to apply Logical OR on these strings.
How should i approach on this. I'm working with c#.
When i consider the obvious solution, reading each char and applying OR | on them, I have to deal with apx, 250000 strings each with 500 length. this would kill my performance.
Performance is my main concern.
Thanks in advance!

This is fastest way:
string x="";
string y="";
StringBuilder sb = new StringBuilder(x.Length);
for (int i = 0; i < x.Length;i++ )
{
sb.Append(x[i] == '1' || y[i] == '1' ? '1' : '0');
}
string result = sb.ToString();

Since it was mentioned that speed is a big factor, it would be best to use bit-wise operations.
Take a look at an ASCII table:
The character '0' is 0x30, or 00110000 in binary.
The character '1' is 0x31, or 00110001 in binary.
Only the last bit of the character is different. As such - we can safely say that performing a bitwise OR on the characters themselves will produce the correct character.
Another important thing we can do is do to optimize speed is to use a StringBuilder, initialized to the initial capacity of our string. Or even better: we can reuse our StringBuilder for multiple operations, although we have to ensure the StringBuilder has enough capacity.
With those optimizations considered, we can make this method:
string BinaryStringBitwiseOR(string a, string b, StringBuilder stringBuilder = null)
{
if (a.Length != b.Length)
{
throw new ArgumentException("The length of given string parameters didn't match");
}
if (stringBuilder == null)
{
stringBuilder = new StringBuilder(a.Length);
}
else
{
stringBuilder.Clear().EnsureCapacity(a.Length);
}
for (int i = 0; i < a.Length; i++)
{
stringBuilder.Append((char)(a[i] | b[i]));
}
return stringBuilder.ToString();
}
Note that this will work for all bit-wise operations you would like to perform on your strings, you only have to modify the | operator.

I've found this to be faster than all proposed solutions. It combines elements from #Gediminas and #Sakura's answers, but uses a pre-initialized char[] rather than a StringBuilder.
While StringBuilder is efficient at memory management, each Append operation requires some bookkeeping of the marker, and performs more actions than only an index into an array.
string x = ...
string y = ...
char[] c = new char[x.Length];
for (int i = 0; i < x.Length; i++)
{
c[i] = (char)(x[i] | y[i]);
}
string result = new string(c);

I have two strings(with 1's and 0's) of equal lengths(<=500) and would
like to apply Logical OR on these strings.
You can write a custom logical OR operator or function which takes two characters as input and produces result (e.g. if at least one of input character is '1' return '1' - otherwise return '0'). Apply this function to each character in your strings.
You can also look at this approach. You'd first need to convert each character to boolean (e.g. '1' corresponds to true), perform OR operation between two boolean values, convert back result to character '0' or '1' - depending if result of logical OR was false or true respectively. Then just append each result of this operation to each other.

You can use a Linq query to zip and then aggregate the results:
var a = "110010";
var b = "001110";
var result = a.Zip(b, (i, j) => i == '1' || j == '1' ? '1' : '0')
.Select(i => i + "").Aggregate((i, j) => i + j);
Basically, the Zip extension method, takes two sequences and apply an action on each corresponding elements of the two sequences. Then I use Select to cast from char to String and finally I aggregate the results from a sequence of strings (of "0" and "1") to a String.

Related

C# locating where the * is in a string separated by pipes

I have to find where a * is at when it could be none at all , 1st position | 2nd position | 3rd position.
The positions are separated by pipes |
Thus
No * wildcard would be
`ABC|DEF|GHI`
However, while that could be 1 scenario, the other 3 are
string testPosition1 = "*|DEF|GHI";
string testPosition2 = "ABC|*|GHI";
string testPosition3 = "ABC|DEF|*";
I gather than I should use IndexOf , but it seems like I should incorporate | (pipe) to know the position ( not just the length as the values could be long or short in each of the 3 places. So I just want to end up knowing if * is in first, second or third position ( or not at all )
Thus I was doing this but i'm not going to know about if it is before 1st or 2nd pipe
if(testPosition1.IndexOf("*") > 0)
{
// Look for pipes?
}

There are lots of ways you could approach this. The most readable might actually just be to do it the hard way (i.e. scan the string to find the first '*' character, keeping track of how many '|' characters you see along the way).
That said, this could be a similarly readable and more concise:
int wildcardPosition = Array.IndexOf(testPosition1.Split('|'), "*");
Returns -1 if not found, otherwise 0-based index for which segment of the '|' delimited string contains the wildcard string.
This only works if the wildcard is exactly the one-character string "*". If you need to support other variations on that, you will still want to split the string, but then you can loop over the array looking for whatever criteria you need.

You can try with linq splitting the string at the pipe character and then getting the index of the element that contains just a *
var x = testPosition2.Split('|').Select((k, i) => new { text = k, index = i}).FirstOrDefault(p => p.text == "*" );
if(x != null) Console.WriteLine(x.index);
So the first line starts splitting the string at the pipe creating an array of strings. This sequence is passed to the Select extension that enumerates the sequence passing the string text (k) and the index (i). With these two parameters we build a sequences of anonymous objects with two properties (text and index). FirstOrDefault extract from this sequence the object with text equals to * and we can print the property index of that object.

The other answers are fine (and likely better), however here is another approach, the good old fashioned for loop and the try-get pattern
public bool TryGetStar(string input, out int index)
{
var split = input.Split('|');
for (index = 0; index < split.Length; index++)
if (split[index] == "*")
return true;
return false;
}
Or if you were dealing with large strings and trying to save allocations. You could remove the Split entirely and use a single parse O(n)
public bool TryGetStar(string input, out int index)
{
index = 0;
for (var i = 0; i < input.Length; i++)
if (input[i] == '|') index++;
else if (input[i] == '*') return true;
return false;
}
Note : if performance was a consideration, you could also use unsafe and pointers, or Span<Char> which would afford a small amount of efficiency.

Try DotNETFiddle:
testPosition.IndexOf("*") - testPosition.Replace("|","").IndexOf("*")
Find the index of the wildcard ("*") and see how far it moves if you remove the pipe ("|") characters. The result is a zero-based index.

From the question you have the following code segment:
if(testPosition1.IndexOf("*") > 0)
{
}
If you're now inside the if statement, you're sure the asterisk exists.
From that point, an efficient solution could be to check the first two chars, and the last two chars.
if (testPosition1.IndexOf("*") > 0)
{
if (testPosition1[0] == '*' && testPosition[1] == '|')
{
// First position.
}
else if (testPosition1[testPosition.Length - 1] == '*' && testPosition1[testPosition.Length - 2] == '|')
{
// Third (last) position.
}
else
{
// Second position.
}
}
This assumes that no more than one * can exist, and also assumes that if an * exist, it can only be surrounded by pipes. For example, I assume an input like ABC|DEF|G*H is invalid.
If you want to remove this assumptions, you could do a one-pass loop over the string and keeping track with the necessary information.

how to add a sign between each letter in a string in C#?

I have a task, in which i have to write a function called accum, which transforms given string into something like this:
Accumul.Accum("abcd"); // "A-Bb-Ccc-Dddd"
Accumul.Accum("RqaEzty"); // "R-Qq-Aaa-Eeee-Zzzzz-Tttttt-Yyyyyyy"
Accumul.Accum("cwAt"); // "C-Ww-Aaa-Tttt"
So far I only converted each letter to uppercase and... Now that I am writing about it, I think it could be easier for me to - firstly multiply the number of each letter and then add a dash there... Okay, well let's say I already multiplied the number of them(I will deal with it later) and now I need to add the dash. I tried several manners to solve this, including: for and foreach(and now that I think of it, I can't use foreach if I want to add a dash after multiplying the letters) with String.Join, String.Insert or something called StringBuilder with Append(which I don't exactly understand) and it does nothing to the string.
One of those loops that I tried was:
for (int letter = 0; letter < s.Length-1; letter += 2) {
if (letter % 2 == 0) s.Replace("", "-");
}
and
for (int letter = 0; letter < s.Length; letter++) {
return String.Join(s, "-");
}
The second one returns "unreachable code" error. What am I doing wrong here, that it does nothing to the string(after uppercase convertion)? Also, is there any method to copy each letter, in order to increase the number of them?

As you say string.join can be used as long as an enumerable is created instead of a foreach. Since the string itself is enumerable, you can use the Linq select overload which includes an index:
var input = "abcd";
var res = string.Join("-", input.Select((c,i) => Char.ToUpper(c) + new string(Char.ToLower(c),i)));
(Assuming each char is unique or can be used. e.g. "aab" would become "A-Aa-Bbb")
Explanation:
The Select extension method takes a lambda function as parameter with c being a char and i the index. The lambda returns an uppercase version of the char (c) folowed by a string of the lowercase char of the index length (new string(char,length)), (which is an empty string for the first index). Finally the string.join concatenates the resulting enumeration with a - between each element.

Use this code.
string result = String.Empty;
for (int i = 0; i < s.Length; i++)
{
char c = s[i];
result += char.ToUpper(c);
result += new String(char.ToLower(c), i);
if (i < s.Length - 1)
{
result += "-";
}
}
It will be better to use StringBuilder instead of strings concatenation, but this code can be a bit more clear.

Strings are immutable, which means that you cannot modify them once you created them. It means that Replace function return a new string that you need to capture somehow:
s = s.Replace("x", "-");
you currently are not assigning the result of the Replace method anywhere, that's why you don't see any results

For the future, the best way to approach problems like this one is not to search for the code snippet, but write down step by step algorithm of how you can achieve the expected result in plain English or some other pseudo code, e.g.
Given I have input string 'abcd' which should turn into output string 'A-Bb-Ccc-Dddd'.
Copy first character 'a' from the input to Buffer.
Store the index of the character to Index.
If Buffer has only one character make it Upper Case.
If Index is greater then 1 trail Buffer with Index-1 lower case characters.
Append dash '-' to the Buffer.
Copy Buffer content to Output and clear Buffer.
Copy second character 'b' from the input to Buffer.
...
etc.
Aha moment often happens on the third iteration. Hope it helps! :)

Shuffling a string so that no two adjacent letters are the same

I've been trying to solve this interview problem which asks to shuffle a string so that no two adjacent letters are identical
For example,
ABCC -> ACBC
The approach I'm thinking of is to
1) Iterate over the input string and store the (letter, frequency)
pairs in some collection
2) Now build a result string by pulling the highest frequency (that is > 0) letter that we didn't just pull
3) Update (decrement) the frequency whenever we pull a letter
4) return the result string if all letters have zero frequency
5) return error if we're left with only one letter with frequency greater than 1
With this approach we can save the more precious (less frequent) letters for last. But for this to work, we need a collection that lets us efficiently query a key and at the same time efficiently sort it by values. Something like this would work except we need to keep the collection sorted after every letter retrieval.
I'm assuming Unicode characters.
Any ideas on what collection to use? Or an alternative approach?

You can sort the letters by frequency, split the sorted list in half, and construct the output by taking letters from the two halves in turn. This takes a single sort.
Example:
Initial string: ACABBACAB
Sort: AAAABBBCC
Split: AAAA+BBBCC
Combine: ABABABCAC
If the number of letters of highest frequency exceeds half the length of the string, the problem has no solution.

Why not use two Data Structures: One for sorting (Like a Heap) and one for key retrieval, like a Dictionary?

The accepted answer may produce a correct result, but is likely not the 'correct' answer to this interview brain teaser, nor the most efficient algorithm.
The simple answer is to take the premise of a basic sorting algorithm and alter the looping predicate to check for adjacency rather than magnitude. This ensures that the 'sorting' operation is the only step required, and (like all good sorting algorithms) does the least amount of work possible.
Below is a c# example akin to insertion sort for simplicity (though many sorting algorithm could be similarly adjusted):
string NonAdjacencySort(string stringInput)
{
var input = stringInput.ToCharArray();
for(var i = 0; i < input.Length; i++)
{
var j = i;
while(j > 0 && j < input.Length - 1 &&
(input[j+1] == input[j] || input[j-1] == input[j]))
{
var tmp = input[j];
input[j] = input[j-1];
input[j-1] = tmp;
j--;
}
if(input[1] == input[0])
{
var tmp = input[0];
input[0] = input[input.Length-1];
input[input.Length-1] = tmp;
}
}
return new string(input);
}
The major change to standard insertion sort is that the function has to both look ahead and behind, and therefore needs to wrap around to the last index.
A final point is that this type of algorithm fails gracefully, providing a result with the fewest consecutive characters (grouped at the front).

Since I somehow got convinced to expand an off-hand comment into a full algorithm, I'll write it out as an answer, which must be more readable than a series of uneditable comments.
The algorithm is pretty simple, actually. It's based on the observation that if we sort the string and then divide it into two equal-length halves, plus the middle character if the string has odd length, then corresponding positions in the two halves must differ from each other, unless there is no solution. That's easy to see: if the two characters are the same, then so are all the characters between them, which totals ⌈n/2⌉&plus;1 characters. But a solution is only possible if there are no more than ⌈n/2⌉ instances of any single character.
So we can proceed as follows:
Sort the string.
If the string's length is odd, output the middle character.
Divide the string (minus its middle character if the length is odd) into two equal-length halves, and interleave the two halves.
At each point in the interleaving, since the pair of characters differ from each other (see above), at least one of them must differ from the last character output. So we first output that character and then the corresponding one from the other half.
The sample code below is in C++, since I don't have a C# environment handy to test with. It's also simplified in two ways, both of which would be easy enough to fix at the cost of obscuring the algorithm:
If at some point in the interleaving, the algorithm encounters a pair of identical characters, it should stop and report failure. But in the sample implementation below, which has an overly simple interface, there's no way to report failure. If there is no solution, the function below returns an incorrect solution.
The OP suggests that the algorithm should work with Unicode characters, but the complexity of correctly handling multibyte encodings didn't seem to add anything useful to explain the algorithm. So I just used single-byte characters. (In C# and certain implementations of C++, there is no character type wide enough to hold a Unicode code point, so astral plane characters must be represented with a surrogate pair.)
#include <algorithm>
#include <iostream>
#include <string>
// If possible, rearranges 'in' so that there are no two consecutive
// instances of the same character.
std::string rearrange(std::string in) {
// Sort the input. The function is call-by-value,
// so the argument itself isn't changed.
std::string out;
size_t len = in.size();
if (in.size()) {
out.reserve(len);
std::sort(in.begin(), in.end());
size_t mid = len / 2;
size_t tail = len - mid;
char prev = in[mid];
// For odd-length strings, start with the middle character.
if (len & 1) out.push_back(prev);
for (size_t head = 0; head < mid; ++head, ++tail)
// See explanatory text
if (in[tail] != prev) {
out.push_back(in[tail]);
out.push_back(prev = in[head]);
}
else {
out.push_back(in[head]);
out.push_back(prev = in[tail]);
}
}
}
return out;
}

you can do that by using a priority queue.
Please find the below explanation.
https://iq.opengenus.org/rearrange-string-no-same-adjacent-characters/

Here is a probabilistic approach. The algorithm is:
10) Select a random char from the input string.
20) Try to insert the selected char in a random position in the output string.
30) If it can't be inserted because of proximity with the same char, go to 10.
40) Remove the selected char from the input string and go to 10.
50) Continue until there are no more chars in the input string, or the failed attempts are too many.
public static string ShuffleNoSameAdjacent(string input, Random random = null)
{
if (input == null) return null;
if (random == null) random = new Random();
string output = "";
int maxAttempts = input.Length * input.Length * 2;
int attempts = 0;
while (input.Length > 0)
{
while (attempts < maxAttempts)
{
int inputPos = random.Next(0, input.Length);
var outputPos = random.Next(0, output.Length + 1);
var c = input[inputPos];
if (outputPos > 0 && output[outputPos - 1] == c)
{
attempts++; continue;
}
if (outputPos < output.Length && output[outputPos] == c)
{
attempts++; continue;
}
input = input.Remove(inputPos, 1);
output = output.Insert(outputPos, c.ToString());
break;
}
if (attempts >= maxAttempts) throw new InvalidOperationException(
$"Shuffle failed to complete after {attempts} attempts.");
}
return output;
}
Not suitable for strings longer than 1,000 chars!
Update: And here is a more complicated deterministic approach. The algorithm is:
Group the elements and sort the groups by length.
Create three empty piles of elements.
Insert each group to a separate pile, inserting always the largest group to the smallest pile, so that the piles differ in length as little as possible.
Check that there is no pile with more than half the total elements, in which case satisfying the condition of not having same adjacent elements is impossible.
Shuffle the piles.
Start yielding elements from the piles, selecting a different pile each time.
When the piles that are eligible for selection are more than one, select randomly, weighting by the size of each pile. Piles containing near half of the remaining elements should be much preferred. For example if the remaining elements are 100 and the two eligible piles have 49 and 40 elements respectively, then the first pile should be 10 times more preferable than the second (because 50 - 49 = 1 and 50 - 40 = 10).
public static IEnumerable<T> ShuffleNoSameAdjacent<T>(IEnumerable<T> source,
Random random = null, IEqualityComparer<T> comparer = null)
{
if (source == null) yield break;
if (random == null) random = new Random();
if (comparer == null) comparer = EqualityComparer<T>.Default;
var grouped = source
.GroupBy(i => i, comparer)
.OrderByDescending(g => g.Count());
var piles = Enumerable.Range(0, 3).Select(i => new Pile<T>()).ToArray();
foreach (var group in grouped)
{
GetSmallestPile().AddRange(group);
}
int totalCount = piles.Select(e => e.Count).Sum();
if (piles.Any(pile => pile.Count > (totalCount + 1) / 2))
{
throw new InvalidOperationException("Shuffle is impossible.");
}
piles.ForEach(pile => Shuffle(pile));
Pile<T> previouslySelectedPile = null;
while (totalCount > 0)
{
var selectedPile = GetRandomPile_WeightedByLength();
yield return selectedPile[selectedPile.Count - 1];
selectedPile.RemoveAt(selectedPile.Count - 1);
totalCount--;
previouslySelectedPile = selectedPile;
}
List<T> GetSmallestPile()
{
List<T> smallestPile = null;
int smallestCount = Int32.MaxValue;
foreach (var pile in piles)
{
if (pile.Count < smallestCount)
{
smallestPile = pile;
smallestCount = pile.Count;
}
}
return smallestPile;
}
void Shuffle(List<T> pile)
{
for (int i = 0; i < pile.Count; i++)
{
int j = random.Next(i, pile.Count);
if (i == j) continue;
var temp = pile[i];
pile[i] = pile[j];
pile[j] = temp;
}
}
Pile<T> GetRandomPile_WeightedByLength()
{
var eligiblePiles = piles
.Where(pile => pile.Count > 0 && pile != previouslySelectedPile)
.ToArray();
Debug.Assert(eligiblePiles.Length > 0, "No eligible pile.");
eligiblePiles.ForEach(pile =>
{
pile.Proximity = ((totalCount + 1) / 2) - pile.Count;
pile.Score = 1;
});
Debug.Assert(eligiblePiles.All(pile => pile.Proximity >= 0),
"A pile has negative proximity.");
foreach (var pile in eligiblePiles)
{
foreach (var otherPile in eligiblePiles)
{
if (otherPile == pile) continue;
pile.Score *= otherPile.Proximity;
}
}
var sumScore = eligiblePiles.Select(p => p.Score).Sum();
while (sumScore > Int32.MaxValue)
{
eligiblePiles.ForEach(pile => pile.Score /= 100);
sumScore = eligiblePiles.Select(p => p.Score).Sum();
}
if (sumScore == 0)
{
return eligiblePiles[random.Next(0, eligiblePiles.Length)];
}
var randomScore = random.Next(0, (int)sumScore);
int accumulatedScore = 0;
foreach (var pile in eligiblePiles)
{
accumulatedScore += (int)pile.Score;
if (randomScore < accumulatedScore) return pile;
}
Debug.Fail("Could not select a pile randomly by weight.");
return null;
}
}
private class Pile<T> : List<T>
{
public int Proximity { get; set; }
public long Score { get; set; }
}
This implementation can suffle millions of elements. I am not completely convinced that the quality of the suffling is as perfect as the previous probabilistic implementation, but should be close.

func shuffle(str:String)-> String{
var shuffleArray = [Character](str)
//Sorting
shuffleArray.sort()
var shuffle1 = [Character]()
var shuffle2 = [Character]()
var adjacentStr = ""
//Split
for i in 0..<shuffleArray.count{
if i > shuffleArray.count/2 {
shuffle2.append(shuffleArray[i])
}else{
shuffle1.append(shuffleArray[i])
}
}
let count = shuffle1.count > shuffle2.count ? shuffle1.count:shuffle2.count
//Merge with adjacent element
for i in 0..<count {
if i < shuffle1.count{
adjacentStr.append(shuffle1[i])
}
if i < shuffle2.count{
adjacentStr.append(shuffle2[i])
}
}
return adjacentStr
}
let s = shuffle(str: "AABC")
print(s)

Most efficient way to determine first character in string?

Which of these methods are the most efficient one or is there a better way to do it?
this.returnList[i].Title[0].ToString()
or
this.returnList[i].Title.Substring(0, 1)

They're both very fast:
Char Index
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
clock.Start();
for (var j = 0; j < 10000000; j++)
{
var first = sample[0].ToString();
}
clock.Stop();
Console.Write(clock.Elapsed);
clock.Reset();
}
// Results
00:00:00.2012243
00:00:00.2207168
00:00:00.2184807
00:00:00.2258847
00:00:00.2296456
00:00:00.2261465
00:00:00.2120131
00:00:00.2221702
00:00:00.2346083
00:00:00.2330840
Substring
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
clock.Start();
for (var j = 0; j < 10000000; j++)
{
var first = sample.Substring(0, 1);
}
clock.Stop();
Console.Write(clock.Elapsed);
clock.Reset();
}
// Results
00:00:00.3268155
00:00:00.3337077
00:00:00.3439908
00:00:00.3273090
00:00:00.3380794
00:00:00.3400650
00:00:00.3280275
00:00:00.3333719
00:00:00.3295982
00:00:00.3368425
I also agree with BrokenGlass that using the char index is a cleaner way of writing it. Plus if you're doing it 10 trillion times it'll be much faster!

There is a big loophole in your code that may cause problems, depending on what you mean by "first character" and what returnList contains.
C# strings contain UTF-16, which is a variable-length encoding, and if returnList is an array of strings, then returnList[i] might only be one char of a Unicode point. If you want to return the first Unicode grapheme of a string:
string s = returnList[i].Title;
if (string.IsNullOrEmpty(s))
return s;
int charsInGlyph = char.IsSurrogatePair(s, 0) ? 2 : 1;
return s.Substring(0, charsInGlyph);
You can run into the same problems with BOMs, tagged, and combining characters; these are all valid characters but are not meaningful if displayed to a user.
If you want Unicode points or graphemes, not chars, you must use strings; Unicode graphemes can be more than one char.

I don't think it would matter much efficiency wise, but in my opinion the clearer, more idiomatic and hence more maintainable way of returning the first character is using the index operator:
char c = returnList[i].Title[0];
This assumes of course there is at least one character, if that's not a given you have to check for that.

Those should be close to identical in performance.
The expensive part of the operation is to create the string, and there is no more efficient way to do that.
Unless of couse you want to pre-create strings for all possible characters and store in a dictionary, but that would use up a lot of memory for such a trivial task.

returnList[I].Title[0] is much faster as it does not need to create a new string, only accessing a char from the original one.
Of course, it will throw an exception if the string is empty, so you should check that first.
As a rule of thumb, never use strings with a fixed length of 1. that's what char is for.
The performance difference is not likely to matter though, but the better readability certainly will.

Generate all Permutations of text from a regex pattern in C#

So i have a regex pattern, and I want to generate all the text permutations that would be allowed from that pattern.
Example:
var pattern = "^My (?:biological|real)? Name is Steve$";
var permutations = getStringPermutations(pattern);
This would return the list of strings below:
My Name is Steve
My real Name is Steve
My biological Name is Steve
Update:
Obviously a regex has an infinate number of matches, so i only want to generate off of optional string literals as in the (?:biological|real)? from my example above. Something like (.)* has too many matches, so I will not be generating them off of that.

If you restrict yourself to the subset of regular expressions that are anchored at both ends, and involve only literal text, single-character wildcards, and alternation, the matching
strings should be pretty easy to enumerate. I'd probably rewrite the regex as a BNF grammar
and use that to generate an exhaustive list of matching strings. For your example:
<lang> -> <begin> <middle> <end>
<begin> -> "My "
<middle> -> "" | "real" | "biological"
<end> -> " name is Steve"
Start with the productions that have only terminal symbols on the RHS, and enumerate
all the possible values that the nonterminal on the LHS could take. Then work your
way up to the productions with nonterminals on the RHS. For concatenation of nonterminal symbols, form the Cartesian product of the sets represented by each RHS nonterminal.
For alternation, take the union of the sets represented by each option. Continue
until you've worked your way up to <lang>, then you're done.
However, once you include the '*' or '+' operators, you have to contend with infinite
numbers of matching strings. And if you also want to handle advanced features like backreferences...you're probably well on your way to something that's isomorphic
to the Halting Problem!

One method that might be a bit weird would be to put the possible choices into an array first, and then generate the regex based on the array and then use the same array to generate the permutations.

Here's a sketch of a function I wrote to take a List of Strings and return a list of all the permutated possibilities: (taking on char from each)
public static List<string> Calculate(List<string> strings) {
List<string> returnValue = new List<string>();
int[] numbers = new int[strings.Count];
for (int x = 0; x < strings.Count; x++) {
numbers[x] = 0;
}
while (true) {
StringBuilder value = new StringBuilder();
for (int x = 0; x < strings.Count; x++) {
value.Append(strings[x][numbers[x]]);
//int absd = numbers[x];
}
returnValue.Add(value.ToString());
numbers[0]++;
for (int x = 0; x < strings.Count-1; x++) {
if (numbers[x] == strings[x].Length) {
numbers[x] = 0;
numbers[x + 1] += 1;
}
}
if (numbers[strings.Count-1] == strings[strings.Count-1].Length)
break;
}
return returnValue;
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Bitwise OR on strings for large strings in c# - c#

This is fastest way: string x=""; string y=""; StringBuilder sb = new StringBuilder(x.Length); for (int i = 0; i < x.Length;i++ ) { sb.Append(x[i] == '1' || y[i] == '1' ? '1' : '0'); } string result = sb.ToString();

Related

C# locating where the * is in a string separated by pipes

how to add a sign between each letter in a string in C#?

Shuffling a string so that no two adjacent letters are the same

Most efficient way to determine first character in string?

Generate all Permutations of text from a regex pattern in C#

Categories

Resources