Team:
I am building a business rules engine that is contextually aware -- but it's weighted -- or in other words each business rule has a level of granularity defined by the segments of a key. The segments are not combinatorial in that they cannot be weighted in any order, but rather permutable like a combination lock (interestingly enough improperly named but widely accepted).
However, to reduce the amount of code that's necessary to provide the business rules we are only building exclusion files meaning that each segment could end up with a specific key value or ALL.
So, now that we have an abstract background, let's take a concrete example. The defined segments are as follows:
Line of Business (LOB)
Company
State
Now, let's assume for this example that the LOB is ABC, Company is G and State is WY. If you break that down I should get the following permutations:
ABC_G_WY
ABC_G_ALL
ABC_ALL_WY
ABC_ALL_ALL
ALL_G_WY
ALL_G_ALL
ALL_ALL_WY
ALL_ALL_ALL
However, I need an algorithm to solve that problem. The segments must also be returned in the aforementioned order because you must always find the most finite rule first.
I look forward to your responses and thank you all in advance!
public static void Main(string[] args)
{
List<string> inputValues = new List<string>() { "ABC", "G", "WY" };
List<string> results = new List<string>();
int permutations = (int)Math.Pow(2.0, (double)inputValues.Count);
for (int i = 0; i < permutations; i++)
{
int mask = 1;
Stack<string> lineValues = new Stack<string>();
for (int j = inputValues.Count-1; j >= 0; j--, mask <<= 1)
{
if ((i & mask) == 0)
{
lineValues.Push(inputValues[j]);
}
else
{
lineValues.Push("ALL");
}
}
results.Add(string.Join("_", lineValues.ToArray())); //ToArray can go away in 4.0(?) I've been told. I'm still on 3.5
}
foreach (string s in results)
{
Console.WriteLine(s);
}
Console.WriteLine("Press any key to exit...");
Console.ReadKey(true);
}
If I get the question right, You should:
-Generate all binary strings of length N (there will be 2^N of them)
-sort them by number of bits set
-generate rules. Rule has 'ALL' in position i, if bit number i in the corresponding binary string is set
Related
I have a function that needs to generate a sequence of num chars to test a security algorithm.
For instance, patternLength of 4 it would generate:
["0000", "0001", "0002", ... , "9999"]
and 3 would generate:
["000", "001", ... "999"] and so on.
As we know, recursion can be pretty expensive and begins to slow to a crawl at higher lengths so I'm hoping to speed it up by using some caching or DP. Is this at all possible?
Current function with recursion:
private static List<char> PossibleCharacters = new List<char>()
{
'0','1','2','3','4','5','6','7','8','9'
};
public static List<string> SequenceGenerator(int patternLength)
{
List<string> result = new List<string>();
if (patternLength > 0)
{
List<string> prev = SequenceGenerator(patternLength - 1);
foreach (string entry in prev)
{
foreach (char ch in PossibleCharacters)
{
result.Add(entry + ch);
}
}
}
else
{
result.Add("");
}
return result;
}
My messy attempt. I'm building the list starting with length 1, then 2, and so on. It gets to 999 -> 1000 where it becomes wrong, it should be 999 -> 0000. Of course, I'll need to clear the contents of the cache where the lengths aren't what I want.
// patternLength = 4
string[] result = new string[10000];
string[] cache = new string[10000];
result.Append("");
dp[0] = "0";
int i = 1;
int j = 0;
foreach (string entry in result)
{
foreach (char ch in PossibleCharacters)
{
cache[i] = entry + ch;
i++;
}
result[j + 1] = j.ToString();
j++;
}
return cache.toList();
Thanks all for your time.
Your code is inherently bounded by an O(10^n) minimum time complexity (where n is the length of the tested string) and there is no way to bypass this limit without changing its' functionality. In other words, your code is likely mostly slow because you are doing a lot of work and not because you implemented the aforementioned work in a particularly inefficient way. Optimizing string generation is very unlikely to provide any significant improvement in run time. Utilizing multithreading is more likely to provide real-world performance improvements in your described scenario, so personally I would start from there. Additionally, if you only need to access each tested string once, it is preferable to to generate them one-by-one as they are used in order to reduce your program's space complexity. Currently you are storing every string for the entire duration of your program or function's execution, which is a potentially wasteful usage of memory if they are only needed once.
using System;
namespace reverse
{
class Program
{
static void Main(string[] args)
{
int[] a = new int [10];
for (int i= 0; i<a.Length; i++)
{
a[i] = int.Parse(Console.ReadLine());
}
}
}
}
here I can get values from a user by pressing enter key after each time i give the value but I want to give value as a whole with comma. Thanks!
I would suggest to gradually step towards functional programming.
Why?
Weel, with words from Eric Lippert from "Functional programming for beginners"
I will talk about the “why” a little bit, but basically it boils down to
a great many bugs are caused by bad mutations.
By taking a page
from functional style and designing programs so that variables and
data structures change as little as possible, we can eliminate a
large source of bugs.
Structuring programs into small functions
whose outputs each depend solely on inputs
makes for programs that
are easy to unit test.
the ability to pass functions as data
allows us to create new and interesting control flows and patterns,
like LINQ.
Rewriting your code
Use Linq in a single and simple line:
int [] res =
Console.ReadLine () // read the input
.Split (',') // transform it into an array
.Take (10) // consider only the first 10 strings
.Select (int.Parse) // transform them into int
.ToArray (); // and finally get an array
You can add a check after the Split and before Take:
.Where (d => {int t; return int.TryParse (d, out t);}).Take
Try this one and read comments to get more info :
static void Main()
{
string input = Console.ReadLine(); // grab user input in one whole line
string[] splitted = input.Split(','); // split user input by comma sign ( , )
int[] a = new int[splitted.Length]; // create a new array that matches user input values length
for(int i = 0; i < splitted.Length; i++) // iterate through all user input values
{
int temp = -1; // create temporary field to hold result
int.TryParse(splitted[i], out temp); // check if user inpu value can be parsed into int
a[i] = temp; // assign parsed int value
}
}
This method will ensure that program will execute even if user wont input numerics. For example if user input will be :
1 , 2,3,45,8,9898
The output will be :
{ 1, 2, 3, 45, 8, 9898 }
But if the input will be :
1,adsadsa,13,4,6,dsd
The output will be :
{ 1, 0, 13, 4, 6, 0 }
In short - I want to convert the first answer to the question here from Python into C#. My current solution to splitting conjoined words is exponential, and I would like a linear solution. I am assuming no spacing and consistent casing in my input text.
Background
I wish to convert conjoined strings such as "wickedweather" into separate words, for example "wicked weather" using C#. I have created a working solution, a recursive function using exponential time, which is simply not efficient enough for my purposes (processing at least over 100 joined words). Here the questions I have read so far, which I believe may be helpful, but I cannot translate their responses from Python to C#.
How can I split multiple joined words?
Need help understanding this Python Viterbi algorithm
How to extract literal words from a consecutive string efficiently?
My Current Recursive Solution
This is for people who only want to split a few words (< 50) in C# and don't really care about efficiency.
My current solution works out all possible combinations of words, finds the most probable output and displays. I am currently defining the most probable output as the one which uses the longest individual words - I would prefer to use a different method. Here is my current solution, using a recursive algorithm.
static public string find_words(string instring)
{
if (words.Contains(instring)) //where words is my dictionary of words
{
return instring;
}
if (solutions.ContainsKey(instring.ToString()))
{
return solutions[instring];
}
string bestSolution = "";
string solution = "";
for (int i = 1; i < instring.Length; i++)
{
string partOne = find_words(instring.Substring(0, i));
string partTwo = find_words(instring.Substring(i, instring.Length - i));
if (partOne == "" || partTwo == "")
{
continue;
}
solution = partOne + " " + partTwo;
//if my current solution is smaller than my best solution so far (smaller solution means I have used the space to separate words fewer times, meaning the words are larger)
if (bestSolution == "" || solution.Length < bestSolution.Length)
{
bestSolution = solution;
}
}
solutions[instring] = bestSolution;
return bestSolution;
}
This algorithm relies on having no spacing or other symbols in the entry text (not really a problem here, I'm not fussed about splitting up punctuation). Random additional letters added within the string can cause an error, unless I store each letter of the alphabet as a "word" within my dictionary. This means that "wickedweatherdykjs" would return "wicked weather d y k j s" using the above algorithm, when I would prefer an output of "wicked weather dykjs".
My updated exponential solution:
static List<string> words = File.ReadLines("E:\\words.txt").ToList();
static Dictionary<char, HashSet<string>> compiledWords = buildDictionary(words);
private void btnAutoSpacing_Click(object sender, EventArgs e)
{
string text = txtText.Text;
text = RemoveSpacingandNewLines(text); //get rid of anything that breaks the algorithm
if (text.Length > 150)
{
//possibly split the text up into more manageable chunks?
//considering using textSplit() for this.
}
else
{
txtText.Text = find_words(text);
}
}
static IEnumerable<string> textSplit(string str, int chunkSize)
{
return Enumerable.Range(0, str.Length / chunkSize)
.Select(i => str.Substring(i * chunkSize, chunkSize));
}
private static Dictionary<char, HashSet<string>> buildDictionary(IEnumerable<string> words)
{
var dictionary = new Dictionary<char, HashSet<string>>();
foreach (var word in words)
{
var key = word[0];
if (!dictionary.ContainsKey(key))
{
dictionary[key] = new HashSet<string>();
}
dictionary[key].Add(word);
}
return dictionary;
}
static public string find_words(string instring)
{
string bestSolution = "";
string solution = "";
if (compiledWords[instring[0]].Contains(instring))
{
return instring;
}
if (solutions.ContainsKey(instring.ToString()))
{
return solutions[instring];
}
for (int i = 1; i < instring.Length; i++)
{
string partOne = find_words(instring.Substring(0, i));
string partTwo = find_words(instring.Substring(i, instring.Length - i));
if (partOne == "" || partTwo == "")
{
continue;
}
solution = partOne + " " + partTwo;
if (bestSolution == "" || solution.Length < bestSolution.Length)
{
bestSolution = solution;
}
}
solutions[instring] = bestSolution;
return bestSolution;
}
How I would like to use the Viterbi Algorithm
I would like to create an algorithm which works out the most probable solution to a conjoined string, where the probability is calculated according to the position of the word in a text file that I provide the algorithm with. Let's say the file starts with the most common word in the English language first, and on the next line the second most common, and so on until the least common word in my dictionary. It looks roughly like this
the
be
and
...
attorney
Here is a link to a small example of such a text file I would like to use.
Here is a much larger text file which I would like to use
The logic behind this file positioning is as follows...
It is reasonable to assume that they follow Zipf's law, that is the
word with rank n in the list of words has probability roughly 1/(n log
N) where N is the number of words in the dictionary.
Generic Human, in his excellent Python solution, explains this much better than I can. I would like to convert his solution to the problem from Python into C#, but after many hours spent attempting this I haven't been able to produce a working solution.
I also remain open to the idea that perhaps relative frequencies with the Viterbi algorithm isn't the best way to split words, any other suggestions for creating a solution using C#?
Written text is highly contextual and you may wish to use a Markov chain to model sentence structure in order to estimate joint probability. Unfortunately, sentence structure breaks the Viterbi assumption -- but there is still hope, the Viterbi algorithm is a case of branch-and-bound optimization aka "pruned dynamic programming" (something I showed in my thesis) and therefore even when the cost-splicing assumption isn't met, you can still develop cost bounds and prune your population of candidate solutions. But let's set Markov chains aside for now... assuming that the probabilities are independent and each follows Zipf's law, what you need to know is that the Viterbi algorithm works on accumulating additive costs.
For independent events, joint probability is the product of the individual probabilities, making negative log-probability a good choice for the cost.
So your single-step cost would be -log(P) or log(1/P) which is log(index * log(N)) which is log(index) + log(log(N)) and the latter term is a constant.
Can't help you with the Viterbi Algorithm but I'll give my two cents concerning your current approach. From your code its not exactly clear what words is. This can be a real bottleneck if you don't choose a good data structure. As a gut feeling I'd initially go with a Dictionary<char, HashSet<string>> where the key is the first letter of each word:
private static Dictionary<char, HashSet<string>> buildDictionary(IEnumerable<string> words)
{
var dictionary = new Dictionary<char, HashSet<string>>();
foreach (var word in words)
{
var key = word[0];
if (!dictionary.ContainsKey(key))
{
dictionary[key] = new HashSet<string>();
}
dictionary[key].Add(word);
}
return dictionary;
}
And I'd also consider serializing it to disk to avoid building it up every time.
Not sure how much improvement you can make like this (dont have full information of you current implementation) but benchmark it and see if you get any improvement.
NOTE: I'm assuming all words are cased consistently.
I'm making a program in winforms application where in it generates random patterns with seven length of characters such as 2Vowels-5consonants, 3Vowels-4Consonants and so on..after which it generates random letters specific to the pattern generated..
After the letters are generated, I want to list all possible combinations of letters to that generated letters.. and try to check if the generated combinations are present on the system's dictionary..
-Sample Input-
Pattern : 3V-4C Letters: AIOLMNC
Combinations: AIO AOL OIL .... MAIL
.... CLAIM .... and so on...
-Output-
Words Found: OIL MAIL CLAIM ... and so on...
This program is intended for word games.. I'm asking for help and any suggestions that may help me to solve my problem. I can't think of proper way to start the algorithm and how to code it..
I don't normally do this, but I came up with an even better solution for your problem, and it deserves its own answer! This AnagramSolver solution is WAY more optimized than my other answer, because it doesn't create every-single-permutation of a word, and dictionary lookups are very optimized. Try it out:
Usage Example:
string[] dictionary = ReadDictionary(...);
var solver = new AnagramSolver(dictionary);
int minimumLength = 1;
IEnumerable<string> results = solver.SolveAnagram("AEMNS", minimumLength);
// Output the results:
foreach (var result in results)
{
Console.WriteLine(result);
}
// Output example:
// "NAMES", "MANES", "MEANS", "AMEN", "MANE", "MEAN", "NAME", "MAN", "MAE", "AM", "NA", "AN", "MA", "A",
Code:
public class AnagramSolver
{
public AnagramSolver(IEnumerable<string> dictionary)
{
// Create our lookup by keying on the sorted letters:
this.dictionary = dictionary.ToLookup<string, string>(SortLetters);
}
private ILookup<string, string> dictionary;
public IEnumerable<string> SolveAnagram(string anagram, int minimumLength)
{
return CreateCombinations(anagram, minimumLength)
// Sort the letters:
.Select<string, string>(SortLetters)
// Make sure we don't have duplicates:
.Distinct()
// Find all words that can be made from these letters:
.SelectMany(combo => dictionary[combo])
;
}
private static string SortLetters(string letters)
{
char[] chars = letters.ToCharArray();
Array.Sort(chars);
return new string(chars);
}
/// <summary> Creates all possible combinations of all lengths from the anagram. </summary>
private static IEnumerable<string> CreateCombinations(string anagram, int minimumLength)
{
var letters = anagram.ToCharArray();
// Create combinations of every length:
for (int length = letters.Length; length >= minimumLength; length--)
{
yield return new string(letters, 0, length);
// Swap characters to form every combination:
for (int a = 0; a < length; a++)
{
for (int b = length; b < letters.Length; b++)
{
// Swap a <> b if necessary:
char temp = letters[a];
if (temp != letters[b]) // reduces duplication
{
letters[a] = letters[b];
letters[b] = temp;
yield return new string(letters, 0, length);
}
}
}
}
}
}
Here's a summary of the algorithm:
The basic idea is that every set of anagrams derive from the same set of letters.
If we sort the letters, we can group together sets of anagrams.
I got this idea from Algorithm for grouping anagram words.
For example, a set of anagrams ("NAMES", "MANES", "MEANS") can be keyed on "AEMNS".
Therefore, once we create our dictionary lookup, it's incredibly easy and fast to solve the anagram -- simply sort the letters of the anagram and perform the lookup.
The next challenge is to find all "smaller" anagrams -- for example, finding "NAME", "SANE", "MAN", "AN", "A", etc.
This can be done by finding all combinations of the anagram.
Combinations are much easier to find than permutations. No recursion is needed. I implemented complete combinations with 3 loops and a simple swap! It took a while to get the algorithm right, but now that it's cleaned up, it's very pretty.
For each combination found, we must again sort the letters and perform the lookup.
This gives us all possible solutions to the anagram!
Update
This solution directly answers your question ("How do I form all combinations of characters"), but this is not a very efficient way to solve anagrams. I decided to create a better solution for solving anagrams, so please see my other answer.
This sounds like a fun puzzle.
To begin, here's how you can create your "random" inputs:
Random rng = new Random();
const string vowels = "AEIOU";
const string consonants = "BCDFGHJKLMNPQRSTVWXYZ";
string CreatePuzzle(int vowelCount, int consonantCount){
var result = new StringBuilder(vowelCount + consonantCount);
for (int i = 0; i < vowelCount; i++) {
result.Append(vowels[rng.Next(5)]);
}
for (int i = 0; i < consonantCount; i++) {
result.Append(consonants[rng.Next(21)]);
}
return result.ToString();
}
Then you'll need to create all permutations of these letters. This is a great job for recursion. The following code is an implementation of Heap's Algorithm that I found at http://www.cut-the-knot.org/do_you_know/AllPerm.shtml. Another helpful resource is http://www.codeproject.com/KB/recipes/Combinatorics.aspx
/// <summary>
/// Returns all permutations of the puzzle.
/// Uses "Heap's Algorithm" found at http://www.cut-the-knot.org/do_you_know/AllPerm.shtml
/// </summary>
IEnumerable<string> CreatePermutations(string puzzle) {
// It is easier to manipulate an array; start off the recursion:
return CreatePermutations(puzzle.ToCharArray(), puzzle.Length);
}
IEnumerable<string> CreatePermutations(char[] puzzle, int n) {
if (n == 0) {
// Convert the char array to a string and return it:
yield return new string(puzzle);
} else {
// Return the sub-string:
if (n < puzzle.Length) {
yield return new string(puzzle, n, puzzle.Length - n);
}
// Create all permutations:
for (int i = 0; i < n; i++) {
// Use recursion, and pass-through the values:
foreach (string perm in CreatePermutations(puzzle, n-1)) {
yield return perm;
}
// Create each permutation by swapping characters: (Heap's Algorithm)
int swap = (n % 2 == 1) ? 0 : i;
char temp = puzzle[n-1];
puzzle[n-1] = puzzle[swap];
puzzle[swap] = temp;
}
}
}
Note that this algorithm doesn't check for duplicates, so an input like "AAA" will still result in 6 permutations. Therefore, it might make sense to call .Distinct() on the results (although the CodeProject article has an algorithm that skips duplicates, but is more complicated).
The final step, as you stated, is to check all permutations against your dictionary.
Optimizations
This solution is fairly simple, and will probably work really well if your puzzles remain small. However, it is definitely a "brute force" kind of method, and as the puzzle gets bigger, performance drops exponentially!
I'm trying to figure out the time complexity of a function that I wrote (it generates a power set for a given string):
public static HashSet<string> GeneratePowerSet(string input)
{
HashSet<string> powerSet = new HashSet<string>();
if (string.IsNullOrEmpty(input))
return powerSet;
int powSetSize = (int)Math.Pow(2.0, (double)input.Length);
// Start at 1 to skip the empty string case
for (int i = 1; i < powSetSize; i++)
{
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
string set = string.Empty;
for (int j = 0; j < pset.Length; j++)
{
if (pset[j] == '1')
{
set = string.Concat(set, input[j].ToString());
}
}
powerSet.Add(set);
}
return powerSet;
}
So my attempt is this:
let the size of the input string be n
in the outer for loop, must iterate 2^n times (because the set size is 2^n).
in the inner for loop, we must iterate 2*n times (at worst).
1. So Big-O would be O((2^n)*n) (since we drop the constant 2)... is that correct?
And n*(2^n) is worse than n^2.
if n = 4 then
(4*(2^4)) = 64
(4^2) = 16
if n = 100 then
(10*(2^10)) = 10240
(10^2) = 100
2. Is there a faster way to generate a power set, or is this about optimal?
A comment:
the above function is part of an interview question where the program is supposed to take in a string, then print out the words in the dictionary whose letters are an anagram subset of the input string (e.g. Input: tabrcoz Output: boat, car, cat, etc.). The interviewer claims that a n*m implementation is trivial (where n is the length of the string and m is the number of words in the dictionary), but I don't think you can find valid sub-strings of a given string. It seems that the interviewer is incorrect.
I was given the same interview question when I interviewed at Microsoft back in 1995. Basically the problem is to implement a simple Scrabble playing algorithm.
You are barking up completely the wrong tree with this idea of generating the power set. Nice thought, clearly way too expensive. Abandon it and find the right answer.
Here's a hint: run an analysis pass over the dictionary that builds a new data structure more amenable to efficiently solving the problem you actually have to solve. With an optimized dictionary you should be able to achieve O(nm). With a more cleverly built data structure you can probably do even better than that.
2. Is there a faster way to generate a power set, or is this about optimal?
Your algorithm is reasonable, but your string handling could use improvement.
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
All you're doing here is setting up a bitfield, but using a string. Just skip this, and use variable i directly.
for (int j = 0; j < input.Length; j++)
{
if (i & (1 << j))
{
When you build the string, use a StringBuilder, not creating multiple strings.
// At the beginning of the method
StringBuilder set = new StringBuilder(input.Length);
...
// Inside the loop
set.Clear();
...
set.Append(input[j]);
...
powerSet.Add(set.ToString());
Will any of this change the complexity of your algorithm? No. But it will significantly reduce the number of extra String objects you create, which will provide you a good speedup.