Calculate variations on a string - c#

I have a series of incorrectly encoded base36 values - these were encoded from integers using a string of letters, missing the "i" and "o". They now need to be converted back to integers using C#.
There are multiple permutations because of the rollover effect.
"0" can either equal 0 or 34;
"1" can either equal 1 or 35.
So, for instance, if I have a string "a110", it has six possible values.
I'm having a hard time trying to figure how to code for this.
All the examples I've looked at compute variations for a set of elements, for example
char[] = { a, b, c }
int[] = { 1, 2, 3 }
However, in my case, there are conditionals involved too, and it's making my head hurt. Can anyone help?

You can compute the list of all possible input strings. First, read the input into a list of ints. Now, you know that each of those (if it's a sufficiently low value) could be one of two things. So then you can create an enumerator that returns all of the possible inputs, via a recursive descent.

I managed to do it with the following code. It was actually a little simpler than I expected, since I only had two conditions, and two options. It uses recursion and steps through each character in the string. If that character is a 0 or 1, it then diverges, and continues building the string.
It actually generates a few duplicates, so I had to add a condition to only add it to the string list if it doesn't already exist. If someone else can point me to slightly better logic I'd appreciate it
public string st = "101"; // hardcoded for now
public char[] cs;
public List<string> variations;
static void Main()
{
cs = st.ToCharArray();
variations = new List<string>();
vary("",0);
}
static void vary(string m, int n)
{
for (int i = n; i < cs.Count(); i++)
{
if (cs[i] == '0' || cs[i] == '1')
{
// recurse
combo(m + (cs[i] == '0' ? "0" : "1"), i + 1);
combo(m + (cs[i] == '0' ? "Y" : "Z"), i + 1);
}
m += cs[i];
}
if(!variations.Contains(m))
variations.Add(m);
}
for the string "101" I get the following combinations
101
10Z
1Y1
1YZ
Z01
Z0Z
ZY1
ZYZ

Related

C# locating where the * is in a string separated by pipes

I have to find where a * is at when it could be none at all , 1st position | 2nd position | 3rd position.
The positions are separated by pipes |
Thus
No * wildcard would be
`ABC|DEF|GHI`
However, while that could be 1 scenario, the other 3 are
string testPosition1 = "*|DEF|GHI";
string testPosition2 = "ABC|*|GHI";
string testPosition3 = "ABC|DEF|*";
I gather than I should use IndexOf , but it seems like I should incorporate | (pipe) to know the position ( not just the length as the values could be long or short in each of the 3 places. So I just want to end up knowing if * is in first, second or third position ( or not at all )
Thus I was doing this but i'm not going to know about if it is before 1st or 2nd pipe
if(testPosition1.IndexOf("*") > 0)
{
// Look for pipes?
}
There are lots of ways you could approach this. The most readable might actually just be to do it the hard way (i.e. scan the string to find the first '*' character, keeping track of how many '|' characters you see along the way).
That said, this could be a similarly readable and more concise:
int wildcardPosition = Array.IndexOf(testPosition1.Split('|'), "*");
Returns -1 if not found, otherwise 0-based index for which segment of the '|' delimited string contains the wildcard string.
This only works if the wildcard is exactly the one-character string "*". If you need to support other variations on that, you will still want to split the string, but then you can loop over the array looking for whatever criteria you need.
You can try with linq splitting the string at the pipe character and then getting the index of the element that contains just a *
var x = testPosition2.Split('|').Select((k, i) => new { text = k, index = i}).FirstOrDefault(p => p.text == "*" );
if(x != null) Console.WriteLine(x.index);
So the first line starts splitting the string at the pipe creating an array of strings. This sequence is passed to the Select extension that enumerates the sequence passing the string text (k) and the index (i). With these two parameters we build a sequences of anonymous objects with two properties (text and index). FirstOrDefault extract from this sequence the object with text equals to * and we can print the property index of that object.
The other answers are fine (and likely better), however here is another approach, the good old fashioned for loop and the try-get pattern
public bool TryGetStar(string input, out int index)
{
var split = input.Split('|');
for (index = 0; index < split.Length; index++)
if (split[index] == "*")
return true;
return false;
}
Or if you were dealing with large strings and trying to save allocations. You could remove the Split entirely and use a single parse O(n)
public bool TryGetStar(string input, out int index)
{
index = 0;
for (var i = 0; i < input.Length; i++)
if (input[i] == '|') index++;
else if (input[i] == '*') return true;
return false;
}
Note : if performance was a consideration, you could also use unsafe and pointers, or Span<Char> which would afford a small amount of efficiency.
Try DotNETFiddle:
testPosition.IndexOf("*") - testPosition.Replace("|","").IndexOf("*")
Find the index of the wildcard ("*") and see how far it moves if you remove the pipe ("|") characters. The result is a zero-based index.
From the question you have the following code segment:
if(testPosition1.IndexOf("*") > 0)
{
}
If you're now inside the if statement, you're sure the asterisk exists.
From that point, an efficient solution could be to check the first two chars, and the last two chars.
if (testPosition1.IndexOf("*") > 0)
{
if (testPosition1[0] == '*' && testPosition[1] == '|')
{
// First position.
}
else if (testPosition1[testPosition.Length - 1] == '*' && testPosition1[testPosition.Length - 2] == '|')
{
// Third (last) position.
}
else
{
// Second position.
}
}
This assumes that no more than one * can exist, and also assumes that if an * exist, it can only be surrounded by pipes. For example, I assume an input like ABC|DEF|G*H is invalid.
If you want to remove this assumptions, you could do a one-pass loop over the string and keeping track with the necessary information.

How to split a string into efficient way c#

I have a string like this:
-82.9494547,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0
I need to have output like this:
-82.9494547 36.2913021, -83.0784938 36.2347521, -82.9537782,36.079235
I have tried this following to code to achieve the desired output:
string[] coordinatesVal = coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.None);
for (int i = 0; i < coordinatesVal.Length - 1; i++)
{
coordinatesVal[i] = coordinatesVal[i].Trim();
coordinatesVal[i] = coordinatesVal[i].Replace(',', ' ');
numbers.Append(coordinatesVal[i]);
if (i != coordinatesVal.Length - 1)
{
coordinatesVal.Append(", ");
}
}
But this process does not seem to me the professional solution. Can anyone please suggest more efficient way of doing this?
Your code is okay. You could dismiss temporary results and chain method calls
var numbers = new StringBuilder();
string[] coordinatesVal = coordinateTxt
.Trim()
.Split(new string[] { ",0" }, StringSplitOptions.None);
for (int i = 0; i < coordinatesVal.Length - 1; i++) {
numbers
.Append(coordinatesVal[i].Trim().Replace(',', ' '))
.Append(", ");
}
numbers.Length -= 2;
Note that the last statement assumes that there is at least one coordinate pair available. If the coordinates can be empty, you would have to enclose the loop and this last statement in if (coordinatesVal.Length > 0 ) { ... }. This is still more efficient than having an if inside the loop.
You ask about efficiency, but you don't specify whether you mean code efficiency (execution speed) or programmer efficiency (how much time you have to spend on it).
One key part of professional programming is to judge which one of these is more important in any given situation.
The other answers do a good job of covering programmer efficiency, so I'm taking a stab at code efficiency. I'm doing this at home for fun, but for professional work I would need a good reason before putting in the effort to even spend time comparing the speeds of the methods given in the other answers, let alone try to improve on them.
Having said that, waiting around for the program to finish doing the conversion of millions of coordinate pairs would give me such a reason.
One of the speed pitfalls of C# string handling is the way String.Replace() and String.Trim() return a whole new copy of the string. This involves allocating memory, copying the characters, and eventually cleaning up the garbage generated. Do that a few million times, and it starts to add up. With that in mind, I attempted to avoid as many allocations and copies as possible.
enum CurrentField
{
FirstNum,
SecondNum,
UnwantedZero
};
static string ConvertStateMachine(string input)
{
// Pre-allocate enough space in the string builder.
var numbers = new StringBuilder(input.Length);
var state = CurrentField.FirstNum;
int i = 0;
while (i < input.Length)
{
char c = input[i++];
switch (state)
{
// Copying the first number to the output, next will be another number
case CurrentField.FirstNum:
if (c == ',')
{
// Separate the two numbers by space instead of comma, then move on
numbers.Append(' ');
state = CurrentField.SecondNum;
}
else if (!(c == ' ' || c == '\n'))
{
// Ignore whitespace, output anything else
numbers.Append(c);
}
break;
// Copying the second number to the output, next will be the ,0\n that we don't need
case CurrentField.SecondNum:
if (c == ',')
{
numbers.Append(", ");
state = CurrentField.UnwantedZero;
}
else if (!(c == ' ' || c == '\n'))
{
// Ignore whitespace, output anything else
numbers.Append(c);
}
break;
case CurrentField.UnwantedZero:
// Output nothing, just track when the line is finished and we start all over again.
if (c == '\n')
{
state = CurrentField.FirstNum;
}
break;
}
}
return numbers.ToString();
}
This uses a state machine to treat incoming characters differently depending on whether they are part of the first number, second number, or the rest of the line, and output characters accordingly. Each character is only copied once into the output, then I believe once more when the output is converted to a string at the end. This second conversion could probably be avoided by using a char[] for the output.
The bottleneck in this code seems to be the number of calls to StringBuilder.Append(). If more speed were required, I would first attempt to keep track of how many characters were to be copied directly into the output, then use .Append(string value, int startIndex, int count) to send an entire number across in one call.
I put a few example solutions into a test harness, and ran them on a string containing 300,000 coordinate-pair lines, averaged over 50 runs. The results on my PC were:
String Split, Replace each line (see Olivier's answer, though I pre-allocated the space in the StringBuilder):
6542 ms / 13493147 ticks, 130.84ms / 269862.9 ticks per conversion
Replace & Trim entire string (see Heriberto's second version):
3352 ms / 6914604 ticks, 67.04 ms / 138292.1 ticks per conversion
- Note: Original test was done with 900000 coord pairs, but this entire-string version suffered an out of memory exception so I had to rein it in a bit.
Split and Join (see Ɓukasz's answer):
8780 ms / 18110672 ticks, 175.6 ms / 362213.4 ticks per conversion
Character state machine (see above):
1685 ms / 3475506 ticks, 33.7 ms / 69510.12 ticks per conversion
So, the question of which version is most efficient comes down to: what are your requirements?
Your solution is fine. Maybe you could write it a bit more elegant like this:
string[] coordinatesVal = coordinateTxt.Trim().Split(new string[] { ",0" },
StringSplitOptions.RemoveEmptyEntries);
string result = string.Empty;
foreach (string line in coordinatesVal)
{
string[] numbers = line.Trim().Split(',');
result += numbers[0] + " " + numbers[1] + ", ";
}
result = result.Remove(result.Count()-2, 2);
Note the StringSplitOptions.RemoveEmptyEntries parameter of Split method so you don't have to deal with empty lines into foreach block.
Or you can do extremely short one-liner. Harder to debug, but in simple cases does the work.
string result =
string.Join(", ",
coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.RemoveEmptyEntries).
Select(i => i.Replace(",", " ")));
heres another way without defining your own loops and replace methods, or using LINQ.
string coordinateTxt = #" -82.9494547,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0";
string[] coordinatesVal = coordinateTxt.Replace(",", "*").Trim().Split(new string[] { "*0", Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
string result = string.Join(",", coordinatesVal).Replace("*", " ");
Console.WriteLine(result);
or even
string coordinateTxt = #" -82.9494540,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0";
string result = coordinateTxt.Replace(Environment.NewLine, "").Replace($",", " ").Replace(" 0", ", ").Trim(new char[]{ ',',' ' });
Console.WriteLine(result);

Dice Sorensen Distance error calculating Bigrams without using Intersect method

I have been programming an object to calculate the DiceSorensen Distance between two strings. The logic of the operation is not so difficult. You calculate how many two letter pairs exist in a string, compare it with a second string and then perform this equation
2(x intersect y)/ (|x| . |y|)
where |x| and |y| is the number of bigram elements in x & y. Reference can be found here for further clarity https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
So I have tried looking up how to do the code online in various spots but every method I have come across uses the 'Intersect' method between two lists and as far as I am aware this won't work because if you have a string where the bigram already exists it won't add another one. For example if I had a string
'aaaa'
I would like there to be 3 'aa' bigrams but the Intersect method will only produce one, if i am incorrect on this assumption please tell me cause i wondered why so many people used the intersect method. My assumption is based on the MSDN website https://msdn.microsoft.com/en-us/library/bb460136(v=vs.90).aspx
So here is the code I have made
public static double SorensenDiceDistance(this string source, string target)
{
// formula 2|X intersection Y|
// --------------------
// |X| + |Y|
//create variables needed
List<string> bigrams_source = new List<string>();
List<string> bigrams_target = new List<string>();
int source_length;
int target_length;
double intersect_count = 0;
double result = 0;
Console.WriteLine("DEBUG: string length source is " + source.Length);
//base case
if (source.Length == 0 || target.Length == 0)
{
return 0;
}
//extract bigrams from string 1
bigrams_source = source.ListBiGrams();
//extract bigrams from string 2
bigrams_target = target.ListBiGrams();
source_length = bigrams_source.Count();
target_length = bigrams_target.Count();
Console.WriteLine("DEBUG: bigram counts are source: " + source_length + " . target length : " + target_length);
//now we have two sets of bigrams compare them in a non distinct loop
for (int i = 0; i < bigrams_source.Count(); i++)
{
for (int y = 0; y < bigrams_target.Count(); y++)
{
if (bigrams_source.ElementAt(i) == bigrams_target.ElementAt(y))
{
intersect_count++;
//Console.WriteLine("intersect count is :" + intersect_count);
}
}
}
Console.WriteLine("intersect line value : " + intersect_count);
result = (2 * intersect_count) / (source_length + target_length);
if (result < 0)
{
result = Math.Abs(result);
}
return result;
}
In the code somewhere you can see I call a method called listBiGrams and this is how it looks
public static List<string> ListBiGrams(this string source)
{
return ListNGrams(source, 2);
}
public static List<string> ListTriGrams(this string source)
{
return ListNGrams(source, 3);
}
public static List<string> ListNGrams(this string source, int n)
{
List<string> nGrams = new List<string>();
if (n > source.Length)
{
return null;
}
else if (n == source.Length)
{
nGrams.Add(source);
return nGrams;
}
else
{
for (int i = 0; i < source.Length - n; i++)
{
nGrams.Add(source.Substring(i, n));
}
return nGrams;
}
}
So my understanding of the code step by step is
1) pass in strings
2) 0 length check
3) create list and pass up bigrams into them
4) get the lengths of each bigram list
5) nested loop to check in source position[i] against every bigram in target string and then increment i until no more source list to check against
6) perform equation mentioned above taken from wikipedia
7) if result is negative Math.Abs it to return a positive result (however i know the result should be between 0 and 1 already this is what keyed me into knowing i was doing something wrong)
the source string i used is source = "this is not a correct string" and the target string was, target = "this is a correct string"
the result I got was -0.090909090908
I'm SURE (99%) that what I'm missing is something small like a mis-calculated length somewhere or a count mis-count. If anyone could point out what i'm doing wrong I'd be really grateful. Thank you for your time!
This looks like homework, yet this similarity metric on strings is new to me so I took a look.
Algorith implementation in various languages
As you may notice the C# version uses HashSet and takes advantage of the IntersectWith method.
A set is a collection that contains no duplicate elements, and whose
elements are in no particular order.
This solves your string 'aaaa' puzzle. Only one bigram there.
My naive implementation on Rextester
If you prefer Linq then I'd suggest Enumerable.Distinct, Enumerable.Union and Enumerable.Intersect. These should mimic very well the duplicate removal capabilities of the HashSet.
Also found this nice StringMetric framework written in Scala.

Algorithm for finding the next number without two or more consecutive 6s

I have been given a rather odd requirement to fulfill for a particular solution. The requirement is to write a function given the current number, to find the next consecutive number that excludes numbers with two or more consecutive 6s.
So far I have the following code (in C#) which I tested with a few inputs and it works. I know it's not the most efficient solution but it does the job, I just want to see if there's a more efficient way of doing this. The way I go about it is by converting the number into a string and using a simple regular expression to see if the next sequence is a valid one given the requirement. Also I am aware that it will throw a error once the number reaches its (2^31) - 1 limit, but at the moment that's not an issue.
public int GetNextSequenceNumber(int currentSequenceNumber)
{
var nextSequenceCandidate = currentSequenceNumber + 1;
var strNum = nextSequenceCandidate.ToString();
if (IsValidSequenceNumber(strNum))
{
return nextSequenceCandidate;
}
else
{
do
{
strNum = (++nextSequenceCandidate).ToString();
} while (!IsValidSequenceNumber(strNum));
return nextSequenceCandidate;
}
}
private bool IsValidSequenceNumber(string sequenceNumber)
{
return !Regex.IsMatch(sequenceNumber, "[6]{2,}");
}
I'm thinking there's another way where one would use division and the modulus operation to find out the digits position and increment that just as needed. Any input is appreciated, thanks!
The most efficient solution that I see would actually use string replacement, but this works only if you are incrementing and returning all values of the sequence. Just replace 66 by 67.
If any starting number is allowed, you will have to append just as many 0s as there were digits after the first occurrence of 66 in your number string.
Convert your number to decimal format, say to a byte array, scan for 66.
If you can't find it, your are done.
Other wise, change it to 67, followed by all zeros.
I think you best bet is to not think in terms of incrementing a number, but in terms of validating a string.
Something like the following would prevent potentially long loop runs, and should provide the next lowest possible value that doesn't have "66".
public static int GetNextSequenceNumber(int currentSequenceNumber)
{
int nextNumber = currentSequenceNumber += 1;
string nextNumberStr = nextNumber.ToString();
if (!nextNumberStr.Contains("66"))
{
return nextNumber;
}
else
{
//travel from left to right, find the 66, and increment the last 6 and reset the remaining values.
bool doreset = false;
bool lastwassix = false;
string newString = string.Empty;
for (int i = 0; i < nextNumberStr.Length; i++)
{
if (doreset) { newString += '0'; continue; }
char c = nextNumberStr[i];
if (c == '6')
{
if (lastwassix)
{
newString += '7';
doreset = true;
continue;
}
lastwassix = true;
}
newString += c;
}
return Convert.ToInt32(newString);
}
}

How to display the items from a dictionary in a random order but no two adjacent items being the same

First of all, there is actually more restrictions than stated in the title. Plz readon.
say, i have a dictionary<char,int> where key acts as the item, and value means the number of occurrence in the output. (somewhat like weighting but without replacement)
e.g. ('a',2) ('b',3) ('c',1)
a possible output would be 'babcab'
I am thinking of the following way to implement it.
build a new list containing (accumulated weightings,char) as its entry.
randomly select an item from the list,
recalculate the accumulated weightings, also set the recent drawn item weighing as 0.
repeat.
to some extent there might be a situation like such: 'bacab' is generated, but can do no further (as only 'b' left, but the weighting is set to 0 as no immediate repetition allowed). in this case i discard all the results and start over from the very beginning.
Is there any other good approach?
Also, what if i skip the "set the corresponding weighting to 0" process, instead I reject any infeasible solution. e.g. already i got 'bab'. In the next rng selection i get 'b', then i redo the draw process, until i get something that is not 'b', and then continue. Does this perform better?
How about this recursive algorithm.
Create a list of all characters (candidate list), repeating them according to their weight.
Create an empty list of characters (your solution list)
Pick a random entry from the candidate list
If the selected item (character) is the same as the last in solution list then start scanning for another character in the candidate list (wrapping around if needed).
If no such character in step 4 can be found and candidate list is not empty then backtrack
Append the selected character to the solution list.
If the candidate list is empty print out the solution and 'backtrack', else go to step 3.
I'm not quite sure about the 'backtrack' step yet, but you should get the general idea.
Try this out, it should generate a (pseudo) random ordering of the elements in your enumerable. I'd recommend flattening from your dictionary to a list:
AKA Dictionary of
{b, 2}, {a, 3} becomes {b} {b} {a} {a} {a}
public static IEnumerable<T> RandomPermutation<T>(this IEnumerable<T> enumerable)
{
if (enumerable.Count() < 1)
throw new InvalidOperationException("Must have some elements yo");
Random random = new Random(DateTime.Now.Millisecond);
while (enumerable.Any())
{
int currentCount = enumerable.Count();
int randomIndex = random.Next(0, currentCount);
yield return enumerable.ElementAt(randomIndex);
if (randomIndex == 0)
enumerable = enumerable.Skip(1);
else if (randomIndex + 1 == currentCount)
enumerable = enumerable.Take(currentCount - 1);
else
{
T removeditem = enumerable.ElementAt(randomIndex);
enumerable = enumerable.Where(item => !item.Equals(removeditem));
}
}
}
If you need additional permutations, simply call it again for another random ordering. While this wont get you every permutation, you should find something useful. You can also use this as a base line to get a solution going.
This should be split into some seperate methods and could use some refactoring but the idea is to implement it in such a way that it does not depend on randomly moving things around till you get a valid result. That way you can't predict how long it would take
Concatenate all chars to a string and randomize that string
Loop through the randomized string and find any char that violates the rule
Remove that char from the string
Pick a random number. Use this number as "put the removed char at the nth valid position")
Loop around the remaining string to find the Nth valid postion to put the char back.
If there is no valid position left drop the char
Repeat from step 2 until no more violations are found
using System;
using System.Collections.Generic;
namespace RandomString
{
class Program
{
static void Main(string[] args)
{
Random rnd = new Random(DateTime.Now.Millisecond);
Dictionary<char, int> chars = new Dictionary<char, int> { { 'a', 2 }, { 'b', 3 }, { 'c', 1 } };
// Convert to a string with all chars
string basestring = "";
foreach (var pair in chars)
{
basestring += new String(pair.Key, pair.Value);
}
// Randomize the string
string randomstring = "";
while (basestring.Length > 0)
{
int randomIndex = rnd.Next(basestring.Length);
randomstring += basestring.Substring(randomIndex, 1);
basestring = basestring.Remove(randomIndex, 1);
}
// Now fix 'violations of the rule
// this can be optimized by not starting over each time but this is easier to read
bool done;
do
{
Console.WriteLine("Current string: " + randomstring);
done = true;
char lastchar = randomstring[0];
for (int i = 1; i < randomstring.Length; i++)
{
if (randomstring[i] == lastchar)
{
// uhoh violation of the rule. We pick a random position to move it to
// this means it gets placed at the nth location where it doesn't violate the rule
Console.WriteLine("Violation at position {0} ({1})", i, randomstring[i]);
done = false;
char tomove = randomstring[i];
randomstring = randomstring.Remove(i, 1);
int putinposition = rnd.Next(randomstring.Length);
Console.WriteLine("Moving to {0}th valid position", putinposition);
bool anyplacefound;
do
{
anyplacefound = false;
for (int replace = 0; replace < randomstring.Length; replace++)
{
if (replace == 0 || randomstring[replace - 1] != tomove)
{
// then no problem on the left side
if (randomstring[replace] != tomove)
{
// no problem right either. We can put it here
anyplacefound = true;
if (putinposition == 0)
{
randomstring = randomstring.Insert(replace, tomove.ToString());
break;
}
putinposition--;
}
}
}
} while (putinposition > 0 && anyplacefound);
break;
}
lastchar = randomstring[i];
}
} while (!done);
Console.WriteLine("Final string: " + randomstring);
Console.ReadKey();
}
}
}

Categories

Resources