Generate all Permutations of text from a regex pattern in C# - c#

So I have a regex pattern, and I want to generate all the text permutations that would be allowed by that pattern.
Example:
var pattern = "^My (?:biological|real)? Name is Steve$";
var permutations = getStringPermutations(pattern);
This would return the list of strings below:
My Name is Steve
My real Name is Steve
My biological Name is Steve
Update:
Obviously a regex can have an infinite number of matches, so I only want to generate from optional string literals, as in the (?:biological|real)? from my example above. Something like (.)* has too many matches, so I will not be generating from that.

If you restrict yourself to the subset of regular expressions that are anchored at both ends, and involve only literal text, single-character wildcards, and alternation, the matching
strings should be pretty easy to enumerate. I'd probably rewrite the regex as a BNF grammar
and use that to generate an exhaustive list of matching strings. For your example:
<lang> -> <begin> <middle> <end>
<begin> -> "My "
<middle> -> "" | "real" | "biological"
<end> -> " Name is Steve"
Start with the productions that have only terminal symbols on the RHS, and enumerate
all the possible values that the nonterminal on the LHS could take. Then work your
way up to the productions with nonterminals on the RHS. For concatenation of nonterminal symbols, form the Cartesian product of the sets represented by each RHS nonterminal.
For alternation, take the union of the sets represented by each option. Continue
until you've worked your way up to <lang>, then you're done.
However, once you include the '*' or '+' operators, you have to contend with infinite
numbers of matching strings. And if you also want to handle advanced features like backreferences...you're probably well on your way to something that's isomorphic
to the Halting Problem!
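The bottom-up enumeration described above can be sketched in C#. This is only a minimal illustration, not a regex parser: the class name PatternEnumerator and its two helpers are hypothetical, and the grammar has to be written out by hand as sets of strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: each nonterminal is represented by the set of strings it can produce.
public static class PatternEnumerator
{
    // Alternation: the union of the string sets of each option.
    public static IEnumerable<string> Union(params IEnumerable<string>[] options) =>
        options.SelectMany(o => o).Distinct();

    // Concatenation: the Cartesian product of the parts, joined left to right.
    public static IEnumerable<string> Concat(params IEnumerable<string>[] parts) =>
        parts.Aggregate(
            (IEnumerable<string>)new[] { "" },
            (prefixes, part) => prefixes.SelectMany(p => part.Select(s => p + s)));
}
```

For the example grammar, calling PatternEnumerator.Concat(new[] { "My " }, new[] { "", "real", "biological" }, new[] { " Name is Steve" }) yields three strings, one per alternative in the middle production.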

One method that might be a bit weird would be to put the possible choices into an array first, and then generate the regex based on the array and then use the same array to generate the permutations.

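Here's a sketch of a function I wrote that takes a List of strings and returns a list of all the possible combinations, taking one char from each string:
// Requires: using System.Collections.Generic; using System.Text;
// Assumes the list is non-empty and contains no empty strings.
public static List<string> Calculate(List<string> strings) {
    List<string> returnValue = new List<string>();
    // numbers[x] is the index of the char currently selected from strings[x].
    int[] numbers = new int[strings.Count];
    for (int x = 0; x < strings.Count; x++) {
        numbers[x] = 0;
    }
    while (true) {
        // Build one combination from the currently selected chars.
        StringBuilder value = new StringBuilder();
        for (int x = 0; x < strings.Count; x++) {
            value.Append(strings[x][numbers[x]]);
        }
        returnValue.Add(value.ToString());
        // Advance like an odometer: bump the first index and carry overflow to the right.
        numbers[0]++;
        for (int x = 0; x < strings.Count-1; x++) {
            if (numbers[x] == strings[x].Length) {
                numbers[x] = 0;
                numbers[x + 1] += 1;
            }
        }
        // Once the last index overflows, every combination has been produced.
        if (numbers[strings.Count-1] == strings[strings.Count-1].Length)
            break;
    }
    return returnValue;
}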

Related

Using LINQ to check if a string contains a list of strings or characters

I've seen this question asked in different ways (check if a list contains a certain string or whether a string contains any given character) but I need something else.
The programme I'm working on is Poker-related and displays list of poker hands in the format [rank][suit], e.g. AhKd5h3c, in multiple DataGridViews.
Right now, I have this rudimentary textbox filter in place which is working fine.
for (int i = 0; i < allFilteredRows.Count; i++)
{
allFilteredRows[i] = new BindingList<CSVModel>(allRows[i].Where
(x => x.Combo.Contains(txtHandFilter.Text)).ToList());
}
allFilteredRows is the data source for my DataGridViews. allRows is the unfiltered list of hands from an SQL database. Combo is an individual poker hand.
That only filters for the exact sequence of characters in the textbox, though. What I want is to filter for each rank and suit individually. So if the user types 'AK', all combos that contain (at least) one ace and one king should be displayed. If the input is 'sss', it should filter for all hands with at least three spades. The order of the characters should not matter ('KA' is equal to 'AK') but every character needs to be included and ranks and suits can be combined, e.g. AKh should filter for all hands with at least one ace and the king of hearts.
This goes beyond my knowledge of LINQ so I'd be grateful for any help.
It seems to me you have two split operations you need to perform. First, you need to split up your filter string so that it consists of the individual card filters in it - either rank, suit, or a particular card. You can do this using regular expressions.
First, create character sets representing possible ranks and possible suits:
var ranks = "[23456789TJQKA]";
var suits = "[hsdc]";
Then, create a regular expression to extract the individual card filters:
var aCardFilter = $"{ranks}{suits}";
Finally, using an extension method that returns the values of the matches from a regular expression (or inlining it), you can split the filter and then group the similar filters. You will end up with two sets of filters: individual card filters and rank/suit filters:
var cardFilters = txtHandFilter.Text.Matches(aCardFilter).ToList();
var suitRankFilters = txtHandFilter.Text.GroupBy(filterCh => filterCh).ToList();
Now you need to split the hand string into a collection of cards. Since each card is two characters, you can just split on substrings at every 2nd position. I wrapped this in a local function to make the code clearer:
IEnumerable<string> handToCards(string hand) => Enumerable.Range(0, hand.Length / 2).Select(p => hand.Substring(2 * p, 2));
Now you can test a hand for matching the card filters by checking that each card occurs in the hand, and for matching the suit/rank filters by checking that each occurs at least as often in the hand as in the filters:
bool handMatchesCardFilters(string hand, List<string> filters)
=> filters.All(filter => handToCards(hand).Contains(filter));
bool handMatchesFilters(string hand, List<IGrouping<char, char>> filters)
=> filters.All(fg => handToCards(hand).Count(card => card.Contains(fg.Key)) >= fg.Count());
Finally you are ready to filter the rows:
for (int i = 0; i < allFilteredRows.Count; ++i)
allFilteredRows[i] = new BindingList<CSVModel>(
allRows[i].Where(row => handMatchesCardFilters(row.Combo, cardFilters) &&
handMatchesFilters(row.Combo, suitRankFilters))
.ToList());
The extension method needed is
public static class StringExt {
public static IEnumerable<string> Matches(this string s, string sre) => Regex.Matches(s, sre).Cast<Match>().Select(m => m.Value);
}
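Putting the pieces of this answer together, a self-contained sketch could look like the following. The class name HandFilter and the method Matches are hypothetical and stand in for the separate helpers above; the sketch assumes hands are always a sequence of two-character cards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper combining the card filter and the rank/suit filter.
public static class HandFilter
{
    // Split a hand such as "AhKd5h3c" into two-character cards.
    static IEnumerable<string> Cards(string hand) =>
        Enumerable.Range(0, hand.Length / 2).Select(p => hand.Substring(2 * p, 2));

    public static bool Matches(string hand, string filter)
    {
        var cards = Cards(hand).ToList();

        // Exact cards named in the filter, e.g. "Kh" in "AKh".
        var exactCards = Regex.Matches(filter, "[23456789TJQKA][hsdc]")
                              .Cast<Match>().Select(m => m.Value);
        if (!exactCards.All(c => cards.Contains(c))) return false;

        // Every rank or suit character must occur at least as often as it was typed.
        return filter.GroupBy(ch => ch)
                     .All(g => cards.Count(card => card.IndexOf(g.Key) >= 0) >= g.Count());
    }
}
```

With this in place, the row filter reduces to allRows[i].Where(row => HandFilter.Matches(row.Combo, txtHandFilter.Text)).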
1. Convert your filter string to a List of cards (from what I see, each item has length 2 if the 2nd char is lowercase, length 1 otherwise).
2. Use GroupBy to get the count of each card, so you should have struct { "K", 3 } instead of "KKK".
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.groupby?view=net-5.0
3. Repeat on each hand (I assume that each hand is a List instead of just the "AhKd5h3c" string).
4. Iterate through the hands, checking whether each contains all items from the step 2 output (I assume that { "K", 3 } should also return a combo that has 4 "K"):
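// Both arguments map a card (e.g. "K") to its count; the dictionary types are an assumption.
public bool Match(IDictionary<string, int> groupedCombo, IDictionary<string, int> filter)
{
    foreach (var item in filter)
    {
        // The hand must contain each filtered card at least as often as requested.
        if (!groupedCombo.TryGetValue(item.Key, out int count) || count < item.Value)
            return false;
    }
    return true;
}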
It sounds like you want to evaluate multiple conditions for your filter...
The conditions that you need to evaluate look like they are going to require you to parse through a string and extrapolate representations of card combinations. Problem here is it doesn't have any delimiters.
For my suggestion you will need them:
for (int i = 0; i < allFilteredRows.Count; i++)
{
allFilteredRows[i] = new BindingList<CSVModel>(allRows[i].Where
(x => txtHandFilter.Text.Split(' ').All(y => x.Combo.Contains(y))).ToList());
}
In the textbox you should have the card combos separated by spaces.
You could find some other way to delimit the txtHandFilter if you like, but I think this answers your base question.
Edit: I can think of only two options for coming up with the string array from the text box that you want:
1: Delimit it.
2: Parse through it character by character to find strings that match a card type representation from a dictionary.(To me this seems more impractical)
As for how to count occurrences I think that Michał Woliński has the right idea.
using System;
using System.Linq;
var cards = from card in txtHandFilter.Text.Split(" ")
group card by card
into g
select new { Card = g.Key, Count = g.Count() };
for (int i = 0; i < allFilteredRows.Count; i++)
{
allFilteredRows[i] = new BindingList<CSVModel>(allRows[i]
.Where(x => cards
.All(y => x.Combo.Contains(String.Concat(Enumerable.Repeat(y.Card, y.Count)))))
.ToList());
}

Package 3 chars of Array into an integer C#

So, I have this array which contains a bunch of numbers. I want to always take 3 of those chars and make one integer out of them. I haven't found anything on this yet.
here is an example:
string number = "123456xyz";
The string is what I have, these integers are what I want
int goal1 = 123;
int goal2 = 456;
int goaln = xyz;
It should go through all the chars and always split them into groups of three. I think foreach() is going to help me, but I'm not quite sure how to do it.
Something like this:
var goals = new List<int>();
for (int i = 0; i + 2 < number.Length; i += 3)
{
goals.Add(int.Parse(number.Substring(i,3)));
}
This has no error checking but it shows the general outline. Foreach isn't a great option because it would go through the characters one at a time when you want to look at them three at a time.
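If error checking is wanted, one option is int.TryParse, which reports failure instead of throwing. A minimal sketch, where the class name Chunker and the choice to skip invalid groups (rather than report them) are assumptions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parses consecutive groups of three characters.
public static class Chunker
{
    public static List<int> ParseGroupsOfThree(string number)
    {
        var goals = new List<int>();
        // Trailing characters that do not fill a whole group are ignored.
        for (int i = 0; i + 3 <= number.Length; i += 3)
        {
            // Skip any group that is not a valid integer instead of throwing.
            if (int.TryParse(number.Substring(i, 3), out int value))
                goals.Add(value);
        }
        return goals;
    }
}
```

For the literal string "123456xyz" from the question, this returns 123 and 456 and silently drops "xyz".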
var numbers = (from Match m in Regex.Matches(number, @"\d{3}")
select m.Value).ToList();
var goal1 = Convert.ToInt32(numbers[0]);
var goal2 = Convert.ToInt32(numbers[1]);
...

Bitwise OR on strings for large strings in c#

I have two strings (of 1's and 0's) of equal length (<= 500) and would like to apply a logical OR to them.
How should I approach this? I'm working with C#.
When I consider the obvious solution, reading each char and applying OR | on them, I have to deal with approximately 250,000 strings, each of length 500. This would kill my performance.
Performance is my main concern.
Thanks in advance!
This is the fastest way:
string x="";
string y="";
StringBuilder sb = new StringBuilder(x.Length);
for (int i = 0; i < x.Length;i++ )
{
sb.Append(x[i] == '1' || y[i] == '1' ? '1' : '0');
}
string result = sb.ToString();
Since it was mentioned that speed is a big factor, it would be best to use bit-wise operations.
Take a look at an ASCII table:
The character '0' is 0x30, or 00110000 in binary.
The character '1' is 0x31, or 00110001 in binary.
Only the last bit of the character is different. As such - we can safely say that performing a bitwise OR on the characters themselves will produce the correct character.
Another important thing we can do to optimize speed is to use a StringBuilder, initialized to the capacity of our string. Or even better: we can reuse our StringBuilder for multiple operations, although we have to ensure the StringBuilder has enough capacity.
With those optimizations considered, we can make this method:
string BinaryStringBitwiseOR(string a, string b, StringBuilder stringBuilder = null)
{
if (a.Length != b.Length)
{
throw new ArgumentException("The length of given string parameters didn't match");
}
if (stringBuilder == null)
{
stringBuilder = new StringBuilder(a.Length);
}
else
{
stringBuilder.Clear().EnsureCapacity(a.Length);
}
for (int i = 0; i < a.Length; i++)
{
stringBuilder.Append((char)(a[i] | b[i]));
}
return stringBuilder.ToString();
}
Note that this also works for a bitwise AND if you replace the | operator with &. XOR needs one extra step, because '0' ^ '1' is 0x01 rather than a digit character: use (char)((a[i] ^ b[i]) | '0') instead.
I've found this to be faster than all proposed solutions. It combines elements from @Gediminas and @Sakura's answers, but uses a pre-initialized char[] rather than a StringBuilder.
While StringBuilder is efficient at memory management, each Append operation requires some bookkeeping of the marker, and performs more actions than only an index into an array.
string x = ...
string y = ...
char[] c = new char[x.Length];
for (int i = 0; i < x.Length; i++)
{
c[i] = (char)(x[i] | y[i]);
}
string result = new string(c);
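The char[] approach can be extended to AND and XOR. The sketch below is an assumption-laden illustration, not a benchmarked implementation: the class name BinaryStrings is hypothetical, and it works on the low bit of each character because XOR-ing the raw characters ('0' ^ '1' is 0x01) does not produce a digit character.

```csharp
using System;

// Hypothetical helper for bitwise operations on strings of '0' and '1'.
public static class BinaryStrings
{
    public static string Or(string a, string b) => Combine(a, b, (x, y) => x | y);
    public static string And(string a, string b) => Combine(a, b, (x, y) => x & y);
    public static string Xor(string a, string b) => Combine(a, b, (x, y) => x ^ y);

    static string Combine(string a, string b, Func<int, int, int> op)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Strings must have the same length.");
        var result = new char[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            // Operate on the low bit only, then map 0/1 back to '0'/'1'.
            int bit = op(a[i] & 1, b[i] & 1);
            result[i] = (char)('0' + bit);
        }
        return new string(result);
    }
}
```

The delegate call per character costs something, so for the hot path with 250,000 strings the inlined loop above is likely the better choice; the generic version mainly shows why the result has to be mapped back to a digit.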
I have two strings(with 1's and 0's) of equal lengths(<=500) and would
like to apply Logical OR on these strings.
You can write a custom logical OR operator or function which takes two characters as input and produces result (e.g. if at least one of input character is '1' return '1' - otherwise return '0'). Apply this function to each character in your strings.
You can also look at this approach. You'd first need to convert each character to boolean (e.g. '1' corresponds to true), perform OR operation between two boolean values, convert back result to character '0' or '1' - depending if result of logical OR was false or true respectively. Then just append each result of this operation to each other.
You can use a Linq query to zip and then aggregate the results:
var a = "110010";
var b = "001110";
var result = a.Zip(b, (i, j) => i == '1' || j == '1' ? '1' : '0')
.Select(i => i + "").Aggregate((i, j) => i + j);
Basically, the Zip extension method takes two sequences and applies a function to each pair of corresponding elements. Then I use Select to convert each char to a String, and finally I aggregate the resulting sequence of strings ("0" and "1") into a single String.

Select certain part in string as variable c#

I do have a string like the following
"1 1/2 + 2 2/3"
Now I want the "1 1/2" as a variable, and the "2 2/3" as a different variable.
How do I do this?
Thanks.
If you are always going to have a '+' in between, you could simply do:
var splitStrings = stringWithPlus.Split('+');
for (int i = 0; i < splitStrings.Length; i++) {
splitStrings[i] = splitStrings[i].Trim();
}
edit: If you really wanted to put these two parts into two separate variables, you could do so. But it's quite unnecessary. The type of the var is going to be string[] but to get them into two variables:
var splitStrings = stringWithPlus.Split('+');
for (int i = 0; i < splitStrings.Length; i++) {
splitStrings[i] = splitStrings[i].Trim();
}
string firstHalf = splitStrings[0];
string secondHalf = splitStrings[1];
It would be better though, to just access these strings via the array, as then you're not allocating any more memory for the same data.
If you are comfortable with Linq and want to shorten this (the above example illustrates exactly what happens) you can do the split & foreach in one line:
var splitStrings = stringWithPlus.Split('+').Select(aString => aString.Trim()).ToArray();
string firstHalf=splitStrings[0];
string secondHalf=splitStrings[1];
If this syntax is confusing, you should do some searches on Linq, and more specifically Linq to Objects.
To make it shorter I used Linq to Trim the strings. Then I converted it back to an array.
string[] parts = stringWithPlus.Split('+').Select(p => p.Trim()).ToArray();
Use them as:
parts[0], parts[1]... parts[n - 1]
where n = parts.Length.

Time complexity of a powerset generating function

I'm trying to figure out the time complexity of a function that I wrote (it generates a power set for a given string):
public static HashSet<string> GeneratePowerSet(string input)
{
HashSet<string> powerSet = new HashSet<string>();
if (string.IsNullOrEmpty(input))
return powerSet;
int powSetSize = (int)Math.Pow(2.0, (double)input.Length);
// Start at 1 to skip the empty string case
for (int i = 1; i < powSetSize; i++)
{
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
string set = string.Empty;
for (int j = 0; j < pset.Length; j++)
{
if (pset[j] == '1')
{
set = string.Concat(set, input[j].ToString());
}
}
powerSet.Add(set);
}
return powerSet;
}
So my attempt is this:
let the size of the input string be n
in the outer for loop, must iterate 2^n times (because the set size is 2^n).
in the inner for loop, we must iterate 2*n times (at worst).
1. So Big-O would be O((2^n)*n) (since we drop the constant 2)... is that correct?
And n*(2^n) is worse than n^2.
if n = 4 then
(4*(2^4)) = 64
(4^2) = 16
if n = 10 then
(10*(2^10)) = 10240
(10^2) = 100
2. Is there a faster way to generate a power set, or is this about optimal?
A comment:
the above function is part of an interview question where the program is supposed to take in a string, then print out the words in the dictionary whose letters are an anagram subset of the input string (e.g. Input: tabrcoz Output: boat, car, cat, etc.). The interviewer claims that a n*m implementation is trivial (where n is the length of the string and m is the number of words in the dictionary), but I don't think you can find valid sub-strings of a given string. It seems that the interviewer is incorrect.
I was given the same interview question when I interviewed at Microsoft back in 1995. Basically the problem is to implement a simple Scrabble playing algorithm.
You are barking up completely the wrong tree with this idea of generating the power set. Nice thought, clearly way too expensive. Abandon it and find the right answer.
Here's a hint: run an analysis pass over the dictionary that builds a new data structure more amenable to efficiently solving the problem you actually have to solve. With an optimized dictionary you should be able to achieve O(nm). With a more cleverly built data structure you can probably do even better than that.
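One possible reading of this hint, offered only as a sketch, is to compare letter counts instead of generating subsets: a word can be spelled from the input if it needs no letter more often than the input supplies it. The class name AnagramSubsets is hypothetical, the code assumes lowercase ASCII letters, and it is not necessarily the data structure the answer has in mind. The per-word counts could be precomputed once in an analysis pass over the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: finds dictionary words spellable from the input's letters.
public static class AnagramSubsets
{
    // Count how often each letter a-z occurs (other characters are ignored).
    static int[] LetterCounts(string s)
    {
        var counts = new int[26];
        foreach (char c in s)
            if (c >= 'a' && c <= 'z') counts[c - 'a']++;
        return counts;
    }

    public static List<string> FindWords(string input, IEnumerable<string> dictionary)
    {
        int[] available = LetterCounts(input);
        return dictionary
            .Where(word =>
            {
                // A word qualifies if it needs no letter more often than the input has it.
                int[] needed = LetterCounts(word);
                for (int i = 0; i < 26; i++)
                    if (needed[i] > available[i]) return false;
                return true;
            })
            .ToList();
    }
}
```

Each word is checked in time proportional to its length plus a constant, so no power set is ever built.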
2. Is there a faster way to generate a power set, or is this about optimal?
Your algorithm is reasonable, but your string handling could use improvement.
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
All you're doing here is setting up a bitfield, but using a string. Just skip this, and use variable i directly.
for (int j = 0; j < input.Length; j++)
{
if ((i & (1 << j)) != 0)
{
When you build the string, use a StringBuilder, not creating multiple strings.
// At the beginning of the method
StringBuilder set = new StringBuilder(input.Length);
...
// Inside the loop
set.Clear();
...
set.Append(input[j]);
...
powerSet.Add(set.ToString());
Will any of this change the complexity of your algorithm? No. But it will significantly reduce the number of extra String objects you create, which will provide you a good speedup.
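Combining both suggestions, the whole method might look like the sketch below. The class name PowerSets and the length guard are assumptions added for illustration; the guard exists because an int bitmask cannot cover more than 30 characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical rewrite of GeneratePowerSet using a bitmask and a reused StringBuilder.
public static class PowerSets
{
    public static HashSet<string> Generate(string input)
    {
        var powerSet = new HashSet<string>();
        if (string.IsNullOrEmpty(input)) return powerSet;
        if (input.Length > 30)
            throw new ArgumentException("Input too long for an int bitmask.");

        var set = new StringBuilder(input.Length);
        int count = 1 << input.Length;
        // Start at 1 to skip the empty subset, as in the original.
        for (int i = 1; i < count; i++)
        {
            set.Clear();
            // Bit j of i decides whether input[j] belongs to this subset.
            for (int j = 0; j < input.Length; j++)
                if ((i & (1 << j)) != 0) set.Append(input[j]);
            powerSet.Add(set.ToString());
        }
        return powerSet;
    }
}
```

The loop structure still does about n steps for each of the 2^n masks, so the complexity stays at O(n * 2^n); only the constant factor from string allocation shrinks.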
