C# Algorithm that Re-arranges Chars in a String - c#

I would like a C# algorithm that re-arranges the chars in a string that is dynamic in length. Having trouble finding one and I know there has to be one out there.
The algorithm has to realign elements to form new strings in all possible combinations.
For instance, "cat" would produce the following:
cat cta tca tac act atc

This is a fairly frequently asked question here. Try doing a search on "permutation" and you'll find a lot of good answers about how to do this in various languages.
There is a library of permutation and combination algorithms in C# here:
http://www.codeproject.com/KB/recipes/Combinatorics.aspx

There are also the operators I contributed to the MoreLinq project on Google Code in the EvenMoreLinq branch. If you're just curious about how the algorithm is implemented, you can see the code for Permutations<T>() here.
They are designed to blend well with LINQ, and use both deferred and streaming evaluation. Permutations is an interesting one, since generating all permutations is a N! operation ... which becomes very large for even small values of N. Depending on how you generate permutations, you may (or may not) be able to actually enumerate them all.
You'll also find algorithms for other combinatoric operations (Subsets, PermutedSubsets, Cartesian Products, Random Subsets, Slices, Partitions, etc) in that same codebase.
Here's how you can use the MoreLinq extensions to permute a sequence. So for instance, you could permute a string (which is treated as a sequence of chars) as follows:
using MoreLinq;
string input = "cat";
var permutations = input.Permutations();
foreach( var permutation in permutations )
{
// 'permutation' is a char[] here, so convert back to a string
Console.WriteLine( new string(permutation) );
}

static void Main(string[] args)
{
Console.WriteLine("Enter String:");
string inputString = Console.ReadLine();
Console.WriteLine();
List<string> lstAnagrams = new List<string>();
int numAnagram = 1;
permute(inputString.ToCharArray(), 0, inputString.Length - 1, lstAnagrams);
foreach(string anagram in lstAnagrams)
{
Console.WriteLine(numAnagram.ToString() + " " + anagram);
numAnagram++;
}
Console.ReadKey();
}
static void permute(char[] word, int start, int end, List<string> lstAnagrams)
{
if (start == end)
lstAnagrams.Add(string.Join("", word));
else
{
for (int position = start; position <= end; position++)
{
swap(ref word[start], ref word[position]);
permute(word, start + 1, end,lstAnagrams);
swap(ref word[start], ref word[position]);
}
}
}
static void swap(ref char a, ref char b)
{
char tmp;
tmp = a;
a = b;
b = tmp;
}

Related

Regular expression to allow a set of characters that are repeated different times

I'm new on regex and for learning purpose, i'm coding a word finding program. It takes a pool of characters and when i press "find" button, it lists all possible words from that character set. Looping in word list, the program compares each word with regex pattern.
Inside that, i wrote a simple pattern to make it work. for example:
^[mase]+$
but it doesn't work properly.
brief explation and what i try to achieve, with one example below :
if the char pool is ahbbhhh then i want to match words which contain a minimum of one a, h, or b AND a maximum of 1 a, 4 h's, and 2 b's. it shouldn't contain any other characters.
I don't think RegEx would support this.
However its a simple test on any programming language.
Here is an example on C#:
private Dictionary<char, int> WordToChars(string word)
{
var result = new Dictionary<char, int>();
foreach (var c in word)
{
if (result.ContainsKey(c))
{
result[c] += result[c] + 1;
}
else
{
result[c] = 1;
}
}
return result;
}
private bool DoesMatchPattern(string patternString, string testString)
{
var pattern = WordToChars(patternString);
var test = WordToChars(testString);
return test.All(x => pattern.TryGetValue(x.Key, out int qty) && qty >= x.Value);
}
In few words: WordToChar() converts any string to characters with repeated quantities.
And DoesMatchPattern() compares that test string has only chars found in pattern string and corresponding repeated qty is less or equal to the pattern.

How can I implement the Viterbi algorithm in C# to split conjoined words?

In short - I want to convert the first answer to the question here from Python into C#. My current solution to splitting conjoined words is exponential, and I would like a linear solution. I am assuming no spacing and consistent casing in my input text.
Background
I wish to convert conjoined strings such as "wickedweather" into separate words, for example "wicked weather" using C#. I have created a working solution, a recursive function using exponential time, which is simply not efficient enough for my purposes (processing at least over 100 joined words). Here the questions I have read so far, which I believe may be helpful, but I cannot translate their responses from Python to C#.
How can I split multiple joined words?
Need help understanding this Python Viterbi algorithm
How to extract literal words from a consecutive string efficiently?
My Current Recursive Solution
This is for people who only want to split a few words (< 50) in C# and don't really care about efficiency.
My current solution works out all possible combinations of words, finds the most probable output and displays. I am currently defining the most probable output as the one which uses the longest individual words - I would prefer to use a different method. Here is my current solution, using a recursive algorithm.
static public string find_words(string instring)
{
if (words.Contains(instring)) //where words is my dictionary of words
{
return instring;
}
if (solutions.ContainsKey(instring.ToString()))
{
return solutions[instring];
}
string bestSolution = "";
string solution = "";
for (int i = 1; i < instring.Length; i++)
{
string partOne = find_words(instring.Substring(0, i));
string partTwo = find_words(instring.Substring(i, instring.Length - i));
if (partOne == "" || partTwo == "")
{
continue;
}
solution = partOne + " " + partTwo;
//if my current solution is smaller than my best solution so far (smaller solution means I have used the space to separate words fewer times, meaning the words are larger)
if (bestSolution == "" || solution.Length < bestSolution.Length)
{
bestSolution = solution;
}
}
solutions[instring] = bestSolution;
return bestSolution;
}
This algorithm relies on having no spacing or other symbols in the entry text (not really a problem here, I'm not fussed about splitting up punctuation). Random additional letters added within the string can cause an error, unless I store each letter of the alphabet as a "word" within my dictionary. This means that "wickedweatherdykjs" would return "wicked weather d y k j s" using the above algorithm, when I would prefer an output of "wicked weather dykjs".
My updated exponential solution:
static List<string> words = File.ReadLines("E:\\words.txt").ToList();
static Dictionary<char, HashSet<string>> compiledWords = buildDictionary(words);
private void btnAutoSpacing_Click(object sender, EventArgs e)
{
string text = txtText.Text;
text = RemoveSpacingandNewLines(text); //get rid of anything that breaks the algorithm
if (text.Length > 150)
{
//possibly split the text up into more manageable chunks?
//considering using textSplit() for this.
}
else
{
txtText.Text = find_words(text);
}
}
static IEnumerable<string> textSplit(string str, int chunkSize)
{
return Enumerable.Range(0, str.Length / chunkSize)
.Select(i => str.Substring(i * chunkSize, chunkSize));
}
private static Dictionary<char, HashSet<string>> buildDictionary(IEnumerable<string> words)
{
var dictionary = new Dictionary<char, HashSet<string>>();
foreach (var word in words)
{
var key = word[0];
if (!dictionary.ContainsKey(key))
{
dictionary[key] = new HashSet<string>();
}
dictionary[key].Add(word);
}
return dictionary;
}
static public string find_words(string instring)
{
string bestSolution = "";
string solution = "";
if (compiledWords[instring[0]].Contains(instring))
{
return instring;
}
if (solutions.ContainsKey(instring.ToString()))
{
return solutions[instring];
}
for (int i = 1; i < instring.Length; i++)
{
string partOne = find_words(instring.Substring(0, i));
string partTwo = find_words(instring.Substring(i, instring.Length - i));
if (partOne == "" || partTwo == "")
{
continue;
}
solution = partOne + " " + partTwo;
if (bestSolution == "" || solution.Length < bestSolution.Length)
{
bestSolution = solution;
}
}
solutions[instring] = bestSolution;
return bestSolution;
}
How I would like to use the Viterbi Algorithm
I would like to create an algorithm which works out the most probable solution to a conjoined string, where the probability is calculated according to the position of the word in a text file that I provide the algorithm with. Let's say the file starts with the most common word in the English language first, and on the next line the second most common, and so on until the least common word in my dictionary. It looks roughly like this
the
be
and
...
attorney
Here is a link to a small example of such a text file I would like to use.
Here is a much larger text file which I would like to use
The logic behind this file positioning is as follows...
It is reasonable to assume that they follow Zipf's law, that is the
word with rank n in the list of words has probability roughly 1/(n log
N) where N is the number of words in the dictionary.
Generic Human, in his excellent Python solution, explains this much better than I can. I would like to convert his solution to the problem from Python into C#, but after many hours spent attempting this I haven't been able to produce a working solution.
I also remain open to the idea that perhaps relative frequencies with the Viterbi algorithm isn't the best way to split words, any other suggestions for creating a solution using C#?
Written text is highly contextual and you may wish to use a Markov chain to model sentence structure in order to estimate joint probability. Unfortunately, sentence structure breaks the Viterbi assumption -- but there is still hope, the Viterbi algorithm is a case of branch-and-bound optimization aka "pruned dynamic programming" (something I showed in my thesis) and therefore even when the cost-splicing assumption isn't met, you can still develop cost bounds and prune your population of candidate solutions. But let's set Markov chains aside for now... assuming that the probabilities are independent and each follows Zipf's law, what you need to know is that the Viterbi algorithm works on accumulating additive costs.
For independent events, joint probability is the product of the individual probabilities, making negative log-probability a good choice for the cost.
So your single-step cost would be -log(P) or log(1/P) which is log(index * log(N)) which is log(index) + log(log(N)) and the latter term is a constant.
Can't help you with the Viterbi Algorithm but I'll give my two cents concerning your current approach. From your code its not exactly clear what words is. This can be a real bottleneck if you don't choose a good data structure. As a gut feeling I'd initially go with a Dictionary<char, HashSet<string>> where the key is the first letter of each word:
private static Dictionary<char, HashSet<string>> buildDictionary(IEnumerable<string> words)
{
var dictionary = new Dictionary<char, HashSet<string>>();
foreach (var word in words)
{
var key = word[0];
if (!dictionary.ContainsKey(key))
{
dictionary[key] = new HashSet<string>();
}
dictionary[key].Add(word);
}
return dictionary;
}
And I'd also consider serializing it to disk to avoid building it up every time.
Not sure how much improvement you can make like this (dont have full information of you current implementation) but benchmark it and see if you get any improvement.
NOTE: I'm assuming all words are cased consistently.

How to form combinations out of a random generated characters?

I'm making a program in winforms application where in it generates random patterns with seven length of characters such as 2Vowels-5consonants, 3Vowels-4Consonants and so on..after which it generates random letters specific to the pattern generated..
After the letters are generated, I want to list all possible combinations of letters to that generated letters.. and try to check if the generated combinations are present on the system's dictionary..
-Sample Input-
Pattern : 3V-4C Letters: AIOLMNC
Combinations: AIO AOL OIL .... MAIL
.... CLAIM .... and so on...
-Output-
Words Found: OIL MAIL CLAIM ... and so on...
This program is intended for word games.. I'm asking for help and any suggestions that may help me to solve my problem. I can't think of proper way to start the algorithm and how to code it..
I don't normally do this, but I came up with an even better solution for your problem, and it deserves its own answer! This AnagramSolver solution is WAY more optimized than my other answer, because it doesn't create every-single-permutation of a word, and dictionary lookups are very optimized. Try it out:
Usage Example:
string[] dictionary = ReadDictionary(...);
var solver = new AnagramSolver(dictionary);
int minimumLength = 1;
IEnumerable<string> results = solver.SolveAnagram("AEMNS", minimumLength);
// Output the results:
foreach (var result in results)
{
Console.WriteLine(result);
}
// Output example:
// "NAMES", "MANES", "MEANS", "AMEN", "MANE", "MEAN", "NAME", "MAN", "MAE", "AM", "NA", "AN", "MA", "A",
Code:
public class AnagramSolver
{
public AnagramSolver(IEnumerable<string> dictionary)
{
// Create our lookup by keying on the sorted letters:
this.dictionary = dictionary.ToLookup<string, string>(SortLetters);
}
private ILookup<string, string> dictionary;
public IEnumerable<string> SolveAnagram(string anagram, int minimumLength)
{
return CreateCombinations(anagram, minimumLength)
// Sort the letters:
.Select<string, string>(SortLetters)
// Make sure we don't have duplicates:
.Distinct()
// Find all words that can be made from these letters:
.SelectMany(combo => dictionary[combo])
;
}
private static string SortLetters(string letters)
{
char[] chars = letters.ToCharArray();
Array.Sort(chars);
return new string(chars);
}
/// <summary> Creates all possible combinations of all lengths from the anagram. </summary>
private static IEnumerable<string> CreateCombinations(string anagram, int minimumLength)
{
var letters = anagram.ToCharArray();
// Create combinations of every length:
for (int length = letters.Length; length >= minimumLength; length--)
{
yield return new string(letters, 0, length);
// Swap characters to form every combination:
for (int a = 0; a < length; a++)
{
for (int b = length; b < letters.Length; b++)
{
// Swap a <> b if necessary:
char temp = letters[a];
if (temp != letters[b]) // reduces duplication
{
letters[a] = letters[b];
letters[b] = temp;
yield return new string(letters, 0, length);
}
}
}
}
}
}
Here's a summary of the algorithm:
The basic idea is that every set of anagrams derive from the same set of letters.
If we sort the letters, we can group together sets of anagrams.
I got this idea from Algorithm for grouping anagram words.
For example, a set of anagrams ("NAMES", "MANES", "MEANS") can be keyed on "AEMNS".
Therefore, once we create our dictionary lookup, it's incredibly easy and fast to solve the anagram -- simply sort the letters of the anagram and perform the lookup.
The next challenge is to find all "smaller" anagrams -- for example, finding "NAME", "SANE", "MAN", "AN", "A", etc.
This can be done by finding all combinations of the anagram.
Combinations are much easier to find than permutations. No recursion is needed. I implemented complete combinations with 3 loops and a simple swap! It took a while to get the algorithm right, but now that it's cleaned up, it's very pretty.
For each combination found, we must again sort the letters and perform the lookup.
This gives us all possible solutions to the anagram!
Update
This solution directly answers your question ("How do I form all combinations of characters"), but this is not a very efficient way to solve anagrams. I decided to create a better solution for solving anagrams, so please see my other answer.
This sounds like a fun puzzle.
To begin, here's how you can create your "random" inputs:
Random rng = new Random();
const string vowels = "AEIOU";
const string consonants = "BCDFGHJKLMNPQRSTVWXYZ";
string CreatePuzzle(int vowelCount, int consonantCount){
var result = new StringBuilder(vowelCount + consonantCount);
for (int i = 0; i < vowelCount; i++) {
result.Append(vowels[rng.Next(5)]);
}
for (int i = 0; i < consonantCount; i++) {
result.Append(consonants[rng.Next(21)]);
}
return result.ToString();
}
Then you'll need to create all permutations of these letters. This is a great job for recursion. The following code is an implementation of Heap's Algorithm that I found at http://www.cut-the-knot.org/do_you_know/AllPerm.shtml. Another helpful resource is http://www.codeproject.com/KB/recipes/Combinatorics.aspx
/// <summary>
/// Returns all permutations of the puzzle.
/// Uses "Heap's Algorithm" found at http://www.cut-the-knot.org/do_you_know/AllPerm.shtml
/// </summary>
IEnumerable<string> CreatePermutations(string puzzle) {
// It is easier to manipulate an array; start off the recursion:
return CreatePermutations(puzzle.ToCharArray(), puzzle.Length);
}
IEnumerable<string> CreatePermutations(char[] puzzle, int n) {
if (n == 0) {
// Convert the char array to a string and return it:
yield return new string(puzzle);
} else {
// Return the sub-string:
if (n < puzzle.Length) {
yield return new string(puzzle, n, puzzle.Length - n);
}
// Create all permutations:
for (int i = 0; i < n; i++) {
// Use recursion, and pass-through the values:
foreach (string perm in CreatePermutations(puzzle, n-1)) {
yield return perm;
}
// Create each permutation by swapping characters: (Heap's Algorithm)
int swap = (n % 2 == 1) ? 0 : i;
char temp = puzzle[n-1];
puzzle[n-1] = puzzle[swap];
puzzle[swap] = temp;
}
}
}
Note that this algorithm doesn't check for duplicates, so an input like "AAA" will still result in 6 permutations. Therefore, it might make sense to call .Distinct() on the results (although the CodeProject article has an algorithm that skips duplicates, but is more complicated).
The final step, as you stated, is to check all permutations against your dictionary.
Optimizations
This solution is fairly simple, and will probably work really well if your puzzles remain small. However, it is definitely a "brute force" kind of method, and as the puzzle gets bigger, performance drops exponentially!

How to find all the different combinations as units of different lengths of the characters of a string

Hi I want to find all the different combinations rather linear selections of characters from a given string without losing sequence as units of different sizes. Example:
Lets say a word "HAVING"
Then it can have combinations like (spaces separating individual units).
HA VI N G
HAV ING
H AV I N G
HAVIN G
H AVING
H AVIN G
HA VING
H AVI NG
....
Like this all the different selections of units of different lengths.
Can someone give a prototype code or algo idea.
Thanks,
Kalyana
In a string of size n, you have n-1 positions where you could place spaces (= between each pair of consecutive letters). Thus, you have 2^(n-1) options, each represented by a binary number with n-1 digits, e.g., your first example would be 01011.
That should give you enough information to get started (no full solution; this sounds like homework).
Simple recursive solution. Two sets are the first letter and the rest of the word. Find all combinations on the rest of the word. Then put the second letter with the first, and find all combinations on the rest of the word. Repeat until the rest of the word is 1 letter.
It's just a power set isn't it? For every position between two letters in your string you do or do not have a space. So for a string of length 6 there are 2 to the power of 5 possibilities.
A simple recursive function will enumerate all the possibilities. Do you need help writing such a function or were you just looking for the algorithm?
Between each letter, there's either a separator or there isn't. So recursively go through the word, trying out all combinations of separators/no-separators.
function f( word )
set = word[0] + f( substring(word, 1) )
set += word[0] + " " + f( substring(word, 1) )
return set;
My solution
Make it something like
void func(string s)
{
int len=s.length();
if(len==0)
return;
for(int i=1;i<=len;i++)
{
for(int j=0;j<i;j++)
cout<<s[j];
cout<<" ";
func(s[i]);
}
return;
}
Recursion. In java:
public static List<List<String>> split (String str) {
List<List<String>> res = new ArrayList<List<String>>();
if (str == null) {
return res;
}
for (int i = 0; i < str.length() - 1; i++) {
for (List<String> list : split(str.substring(i + 1))) {
List<String> tmpList = new ArrayList<String>();
tmpList.add(str.substring(0, i + 1));
for (String s : list) {
tmpList.add(s);
}
res.add(tmpList);
}
}
List<String> tmpList = new ArrayList<String>();
tmpList.add(str);
res.add(tmpList);
return res;
}
public static void main(String[] args) {
for (List<String> intermed : split("HAVING")) {
for (String str : intermed) {
System.out.print(str);
System.out.print(" ");
}
System.out.println();
}
}

Calculating Nth permutation step?

I have a char[26] of the letters a-z and via nested for statements I'm producing a list of sequences like:
aaa, aaz... aba, abb, abz, ... zzy, zzz.
Currently, the software is written to generate the list of all possible values from aaa-zzz and then maintains an index, and goes through each of them performing an operation on them.
The list is obviously large, it's not ridiculously large, but it's gotten to the point where the memory footprint is too large (there are also other areas being looked at, but this is one that I've got).
I'm trying to produce a formula where I can keep the index, but do away with the list of sequences and calculate the current sequence based on the current index (as the time between operations between sequences is long).
Eg:
char[] characters = {a, b, c... z};
int currentIndex = 29; // abd
public string CurrentSequence(int currentIndex)
{
int ndx1 = getIndex1(currentIndex); // = 0
int ndx2 = getIndex2(currentIndex); // = 1
int ndx3 = getIndex3(currentIndex); // = 3
return string.Format(
"{0}{1}{2}",
characters[ndx1],
characters[ndx2],
characters[ndx3]); // abd
}
I've tried working out a small example using a subset (abc) and trying to index into that using modulo division, but I'm having trouble thinking today and I'm stumped.
I'm not asking for an answer, just any kind of help. Maybe a kick in the right direction?
Hint: Think of how you would print a number in base 26 instead of base 10, and with letters instead of digits. What's the general algorithm for displaying a number in an arbitrary base?
Spoiler: (scroll right to view)
int ndx1 = currentIndex / 26 / 26 % 26;
int ndx2 = currentIndex / 26 % 26;
int ndx3 = currentIndex % 26;
Something like this ought to work, assuming 26 characters:
public string CurrentSequence(int currentIndex) {
return characters[currentIndex / (26 * 26)]
+ characters[(currentIndex / 26) % 26]
+ characters[currentIndex % 26];
}
Wow, two questions in one day that can be solved via Cartesian products. Amazing.
You can use Eric Lippert's LINQ snippet to generate all combinations of the index values. This approach results in a streaming set of values, so they don't require storage in memory. This approach nicely separates the logic of generating the codes from maintaining state or performing computation with the code.
Eric's code for all combinations:
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item}));
}
You can now write:
public static IEnumerable<string> AllCodes()
{
char[] characters = {a, b, c... z};
IEnumerable<char[]> codeSets = new[] { characters, characters, characters };
foreach( var codeValues in codeSets.CartesianProduct() )
{
yield return
string.Format( "{0}{1}{2}", codeValues[0], codeValues[1], codeValues[2]);
}
}
The code above generates a streaming sequence of all code strings from aaa to zzz. You can now use this elsewhere where you perform your processing:
foreach( var code in AllCodes() )
{
// use the code value somehow...
}
There are multiple ways to solve your problem, but an option is to generate the sequence on the fly instead of storing it in a list:
IEnumerable<String> Sequence() {
for (var c1 = 'a'; c1 <= 'z'; ++c1)
for (var c2 = 'a'; c2 <= 'z'; ++c2)
for (var c3 = 'a'; c3 <= 'z'; ++c3)
yield return String.Format("{0}{1}{2}", c1, c2, c3);
}
You can then enumerate all the strings:
foreach (var s in Sequence())
Console.WriteLine(s);
This code doesn't use indices at all but it allows you to create a loop around the sequence of strings using simple code and without storing the strings.

Categories

Resources