Generate a unique string based on a pair of strings - c#

I have two strings, StringA and StringB. I want to generate a unique string that denotes this pair.
That is, f(x, y) should be unique for every pair x, y, and f(x, y) = f(y, x), where x and y are strings.
Any ideas?

Compute a message digest of both strings and XOR the values:
MD5(x) ^ MD5(y)
The message digest gives you a distinct value for each string with very high probability (MD5 collisions exist, so uniqueness is not strictly guaranteed), and because XOR is commutative, f(x, y) equals f(y, x).
EDIT: As @Phil H observed, you have to handle the case in which both input strings are equal, because the XOR would then produce 0. You could return something like MD5(x + y) when x and y are the same, and MD5(x) ^ MD5(y) otherwise.
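A minimal C# sketch of this approach, assuming the strings are hashed as UTF-8 and a runtime of .NET 5 or later (for Convert.ToHexString). Md5 and PairDigest are hypothetical helper names, and the result is distinct only with very high probability, not guaranteed:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// MD5 digest of the UTF-8 bytes of a string (16 bytes).
static byte[] Md5(string s)
{
    using var md5 = MD5.Create();
    return md5.ComputeHash(Encoding.UTF8.GetBytes(s));
}

// Order-independent pair digest: XOR is commutative, so PairDigest(x, y) == PairDigest(y, x).
static string PairDigest(string x, string y)
{
    // Equal inputs would XOR to all zeros, so hash the concatenation instead.
    if (x == y)
        return Convert.ToHexString(Md5(x + y));

    byte[] a = Md5(x);
    byte[] b = Md5(y);
    var combined = new byte[a.Length];
    for (int i = 0; i < a.Length; i++)
        combined[i] = (byte)(a[i] ^ b[i]);
    return Convert.ToHexString(combined);
}

Console.WriteLine(PairDigest("StringA", "StringB"));
Console.WriteLine(PairDigest("StringB", "StringA")); // same value as the line above
```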

Just create a new class and override Equals and GetHashCode:
class StringTuple
{
    public string StringA { get; set; }
    public string StringB { get; set; }
    public override bool Equals(object obj)
    {
        var stringTuple = obj as StringTuple;
        if (stringTuple == null)
            return false;
        // Equal when the strings match in either order; the static string.Equals also handles nulls
        return (string.Equals(StringA, stringTuple.StringA) && string.Equals(StringB, stringTuple.StringB)) ||
               (string.Equals(StringA, stringTuple.StringB) && string.Equals(StringB, stringTuple.StringA));
    }
    public override int GetHashCode()
    {
        // Order of operands is irrelevant when using *, so swapped pairs get the same hash
        return (StringA?.GetHashCode() ?? 0) * (StringB?.GetHashCode() ?? 0);
    }
}

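Just find a unique way of ordering them and concatenate them with a separator (this assumes the separator never appears inside the strings).
def uniqueStr(strA, strB, sep):
    if strA <= strB:
        return strA + sep + strB
    else:
        return strB + sep + strA
For arbitrarily long lists of strings, deduplicate the list with a set if you need to, then sort it (a set on its own has no stable order) and concatenate with a separator:
def uniqueStr(sep, strList):
    return sep.join(sorted(set(strList)))
Preferably, if the strings are long or the separator choice is a problem, hash each string and hash the result. Python's built-in hash() of a str is randomized per process, so use hashlib for a stable value; the fixed-length hex digests need no separator:
import hashlib
def uniqueStr(strList):
    digests = sorted(hashlib.sha256(s.encode()).hexdigest() for s in set(strList))
    return hashlib.sha256(''.join(digests).encode()).hexdigest()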
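For reference, a C# sketch of the sorted-join idea for lists. UniqueStr mirrors the Python name; it assumes the separator never occurs inside the strings, and the ordinal (culture-independent) ordering is a choice, not a requirement:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Deduplicate, sort with ordinal comparison, then join.
// Assumes sep never occurs inside any of the strings.
static string UniqueStr(string sep, IEnumerable<string> strList) =>
    string.Join(sep, strList.Distinct().OrderBy(s => s, StringComparer.Ordinal));

Console.WriteLine(UniqueStr("|", new[] { "potato", "apple", "potato" })); // apple|potato
```

As with the Python set version, duplicates collapse, so a list like { "x", "x" } gives the same key as { "x" }.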

I think the following should yield unique strings:
string f = (string.CompareOrdinal(StringA, StringB) < 0 ? StringA : StringB).Replace("#", "##") + "}#{" + (string.CompareOrdinal(StringA, StringB) < 0 ? StringB : StringA).Replace("#", "##");
(That is, every "#" inside the strings is doubled, so there is only one place in the result where a single "#" can appear, and we don't have to worry about a run of "#"s at the end of StringA being confused with a run of "#"s at the start of StringB.)
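The escaping idea as a self-contained C# sketch. PairKey is a hypothetical name, and ordinal comparison is assumed so that the order does not depend on the current culture:

```csharp
using System;

// Escape every '#' as "##", so the only single '#' in the result is the one in the "}#{" separator.
static string PairKey(string a, string b)
{
    // Order the pair with an ordinal comparison so PairKey(a, b) == PairKey(b, a).
    if (string.CompareOrdinal(a, b) > 0)
        (a, b) = (b, a);
    return a.Replace("#", "##") + "}#{" + b.Replace("#", "##");
}

Console.WriteLine(PairKey("two", "one")); // one}#{two
Console.WriteLine(PairKey("a#", "b"));    // a##}#{b
```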

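You can use x.GetHashCode(). That does not guarantee uniqueness, but collisions are rare. Note also that on .NET Core and later, string hash codes are randomized per process, so the value should not be persisted or shared between runs. See more information in this question.
For example:
public int GetUniqueValue(string x, string y)
{
    unchecked
    {
        // Multiplication is commutative, so GetUniqueValue(x, y) == GetUniqueValue(y, x)
        return x.GetHashCode() * y.GetHashCode();
    }
}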

Why not take the alphabetical order of the strings into consideration before combining them? If you always put them in the same order, f(x, y) = f(y, x) will hold.
if (string.CompareOrdinal(x, y) > 0)
    c = x + y;
else
    c = y + x;

What about StringC = StringA + StringB;?
That is unique for any combination of StringA and StringB, except that the boundary between them is lost: "ab" + "c" and "a" + "bc" both produce "abc". Or did you have some other considerations for the string?
For example, you can combine the strings and take the MD5 hash of the result. That gives you a string that is probably "unique enough" for your needs. You cannot reverse the hash back into the original strings, but hashing the same strings again will always produce the same value.
EDIT
I see your edit now, but I feel it's only a matter of sorting the strings first in that case. So something like:
StringC = StringA.CompareTo(StringB) < 0 ? StringA + StringB : StringB + StringA;

You could just sort them and concatenate them, along with, let's say, the length of the first word.
That way f("one", "two") = "onetwo3" and f("two", "one") = "onetwo3", while a different split such as ("onet", "wo") yields a different string, "onetwo4".
However, this will be an abysmal solution for reasonably long strings.
You could also do some sort of hash code calculation, like this:
first.GetHashCode() ^ second.GetHashCode()
That would be reasonably unique; however, you can't guarantee uniqueness (and two equal strings always XOR to 0).
It would be nice if the OP provided a little more context, because this does not sound like a sensible solution to any problem.
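One caveat with appending the length at the end: its digits can run into the concatenated text, so ("aaaaaaaaaa", "b") and ("", "aaaaaaaaaab1") both produce "aaaaaaaaaab10". A sketch of a variant that puts the length first, followed by a ':' delimiter, so the key can always be split back apart (PairKey is a hypothetical name; ordinal ordering is assumed):

```csharp
using System;

// Sort the pair, then prefix the length of the first string followed by ':'.
// The length is read up to the first ':', so the split point can always be recovered.
static string PairKey(string x, string y)
{
    if (string.CompareOrdinal(x, y) > 0)
        (x, y) = (y, x);
    return x.Length + ":" + x + y;
}

Console.WriteLine(PairKey("two", "one")); // 3:onetwo
Console.WriteLine(PairKey("onet", "wo")); // 4:onetwo
```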

public static String getUniqString(String x,String y){
return (x.compareTo(y)<0)?(x+y):(y+x);
}


Is it safe to use float.NaN as Dictionary key?

Code (from the interactive shell):
> var a = new Dictionary<float, string>();
> a.Add(float.NaN, "it is NaN");
> a[float.NaN]
"it is NaN"
So it is possible, but is it safe?
Paraphrasing from https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/Single.cs;
public const float NaN = (float)0.0 / (float)0.0;
public static unsafe bool IsNaN(float f) => f != f;
public int CompareTo(object? value){
...
if (m_value < f) return -1;
if (m_value > f) return 1;
if (m_value == f) return 0;
if (IsNaN(m_value))
return IsNaN(f) ? 0 : -1;
else // f is NaN.
return 1;
}
public bool Equals(float obj)
{
if (obj == m_value)
{
return true;
}
return IsNaN(obj) && IsNaN(m_value);
}
public override int GetHashCode()
{
int bits = Unsafe.As<float, int>(ref Unsafe.AsRef(in m_value));
// Optimized check for IsNan() || IsZero()
if (((bits - 1) & 0x7FFFFFFF) >= 0x7F800000)
{
// Ensure that all NaNs and both zeros have the same hash code
bits &= 0x7F800000;
}
return bits;
}
You can see that NaN requires special handling in each of these cases. In IEEE 754, any value whose exponent bits are all ones and whose mantissa is non-zero is a NaN, so many different bit patterns represent NaN, and a NaN compares unequal to everything, including itself, even when the bit patterns are identical.
However, you can also see that both GetHashCode() and Equals() treat any two NaNs as equivalent. So I believe that using NaN as a dictionary key should be fine.
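A small C# check of the behavior described above, assuming a runtime whose Single.GetHashCode matches the source quoted (NaN payload bits masked out). 0x7FC00001 is an arbitrary NaN bit pattern chosen for illustration:

```csharp
using System;
using System.Collections.Generic;

var map = new Dictionary<float, string>();
map.Add(float.NaN, "it is NaN");

float zero = 0f;
float computedNaN = zero / zero;                               // NaN produced at run time
float payloadNaN = BitConverter.Int32BitsToSingle(0x7FC00001); // a NaN with a different bit pattern

Console.WriteLine(computedNaN == float.NaN);      // False: IEEE 754 comparison
Console.WriteLine(computedNaN.Equals(float.NaN)); // True: Equals treats all NaNs as equal
Console.WriteLine(map[computedNaN]);              // it is NaN
Console.WriteLine(map.ContainsKey(payloadNaN));   // True: GetHashCode masks the NaN payload bits
```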
That depends on what you mean by safe.
If you expect people to be able to use the dictionary and compare its keys to other floats, they will have to handle a key value of NaN correctly themselves. Since float.NaN == float.NaN evaluates to false, that may cause issues down the line.
However, the Dictionary succeeds in performing the lookup, and other operations work correctly as well.
The real question here is why you need it in the first place.
It's a bad idea to use float as the key of a Dictionary.
In theory you can do it, but when you work with float/double/decimal you should use some epsilon to compare two values, with a formula like this:
Math.Abs(a1 - a2) < Epsilon
This is needed because floating-point operations round their results, and irrational numbers cannot be represented exactly. For example, how would you compare with PI or sqrt(2)?
So, in this case, using float as a dictionary key is a bad idea.
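The rounding problem is easy to reproduce with double keys (double is used here only because the error shows up reliably; float is subject to the same kind of rounding). TryFindNear is a hypothetical helper and 1e-9 an arbitrary tolerance; the point is that an epsilon lookup cannot use hashing and has to scan the keys:

```csharp
using System;
using System.Collections.Generic;

var table = new Dictionary<double, string> { [0.3] = "three tenths" };

double a = 0.1, b = 0.2;
double sum = a + b; // 0.30000000000000004 after rounding

Console.WriteLine(table.ContainsKey(sum)); // False: the exact key 0.3 is not found

// Epsilon comparison; 1e-9 is an arbitrary choice.
const double Epsilon = 1e-9;
Console.WriteLine(Math.Abs(sum - 0.3) < Epsilon); // True

// A tolerant lookup cannot use the hash table; it has to scan every key.
static bool TryFindNear(Dictionary<double, string> map, double key, double eps, out string? value)
{
    foreach (var pair in map)
    {
        if (Math.Abs(pair.Key - key) < eps)
        {
            value = pair.Value;
            return true;
        }
    }
    value = null;
    return false;
}

Console.WriteLine(TryFindNear(table, sum, Epsilon, out var found) ? found : "not found"); // three tenths
```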

Regex for words that miss a letter and also two letters that are inverted in order

I want to write a regex that covers the following scenarios:
The searched word is "potato".
It matches if the user searches for "potaot" (he typed quickly and the "o" finger was faster than the "t" finger). (done)
It matches if the user searches for "ptato" (he forgot one letter). (done)
With my knowledge of regex, the furthest I could get was:
(?=[potato]{5,6})p?o?t?a?t?o?
The problem with this is that it matches reversed words like "otatop", which is a little clever but a little bizarre, and "ooooo", which is totally undesirable. So now I'll describe what I don't want:
I don't want repeated letters to match, such as "ooooo", "ooopp", etc. (unable)
By the way I'm using C#.
Don't use regular expressions.
The best solution is the simple one. There are only eleven possible inexact matches, so just enumerate them:
List<string> inexactMatches = new List<string> {
"otato", "ptato", "poato", "potto", "potao", "potat",
"optato", "ptoato", "poatto", "pottao", "potaot"};
...
bool hasInexactMatch = inexactMatches.Contains(whatever);
It took less than a minute to type those out; use the easy specific solution rather than trying to do some crazy regular expression that's going to take you hours to find and debug.
If you're going to insist on using a regular expression, here's one that works:
^(?:otato|ptato|poato|potto|potao|potat|optato|ptoato|poatto|pottao|potaot)$
Again: simpler is better.
Now, one might suppose that you want to solve this problem for more words than "potato". In that case, you might have said so -- but regardless, we can come up with some easy solutions.
First, let's enumerate all the strings that have an omission of one letter from a target string. Strings are IEnumerable<char> so let's solve the general problem:
static IEnumerable<T> OmitAt<T>(this IEnumerable<T> items, int i) =>
items.Take(i).Concat(items.Skip(i + 1));
That's a bit gross enumerating the sequence twice but I'm not going to stress about it. Now let's make a specific version for strings:
static IEnumerable<string> Omits(this string s) =>
Enumerable
.Range(0, s.Length)
.Select(i => new string(s.OmitAt(i).ToArray()));
Great. Now we can say "frob".Omits() and get back rob, fob, frb, fro.
Now let's do the swaps. Again, solve the general problem first:
static void Swap<T>(ref T x, ref T y)
{
T t = x;
x = y;
y = t;
}
static T[] SwapAt<T>(this IEnumerable<T> items, int i)
{
T[] newItems = items.ToArray();
Swap(ref newItems[i], ref newItems[i + 1]);
return newItems;
}
And now we can solve it for strings:
static IEnumerable<string> Swaps(this string s) =>
Enumerable
.Range(0, s.Length - 1)
.Select(i => new string(s.SwapAt(i)));
And now we're done:
string source = "potato";
string target = whatever;
bool match =
source.Swaps().Contains(target) ||
source.Omits().Contains(target);
Easy peasy. Solve general problems using simple, straightforward, correct algorithms that can be composed into larger solutions. None of my algorithms there was more than three lines long and they can easily be seen to be correct.
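A self-contained sketch of the same approach, written with local functions instead of extension methods (extension methods must be declared in a static class) and using string.Remove in place of OmitAt. IsNearMatch is a hypothetical name. The Math.Max guard keeps Swaps from throwing on an empty string, where Range(0, s.Length - 1) would get a negative count:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Every string obtained by deleting exactly one character of s.
static IEnumerable<string> Omits(string s) =>
    Enumerable.Range(0, s.Length).Select(i => s.Remove(i, 1));

// Every string obtained by swapping one pair of adjacent characters of s.
static IEnumerable<string> Swaps(string s) =>
    Enumerable.Range(0, Math.Max(s.Length - 1, 0)).Select(i =>
    {
        char[] chars = s.ToCharArray();
        (chars[i], chars[i + 1]) = (chars[i + 1], chars[i]);
        return new string(chars);
    });

static bool IsNearMatch(string source, string target) =>
    Swaps(source).Contains(target) || Omits(source).Contains(target);

Console.WriteLine(string.Join(", ", Omits("frob"))); // rob, fob, frb, fro
Console.WriteLine(IsNearMatch("potato", "potaot"));  // True
Console.WriteLine(IsNearMatch("potato", "ooooo"));   // False
```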
The weapon of choice here is a similarity (or distance) matching algorithm.
The question "Compare similarity algorithms" gives a good overview of the most common distance metrics/algorithms.
The problem is that there is no single best metric. The choice depends, e.g. on input type, accuracy requirements, speed, resources availability, etc. Nevertheless, comparing algorithms can be messy.
Two of the most commonly suggested metrics are the Levenshtein distance and Jaro-Winkler:
Levenshtein distance, which provides a similarity measure between two strings, is arguably less forgiving, and more intuitive to understand than some other metrics. (There are modified versions of the Levenshtein distance like the Damerau-Levenshtein distance, which includes transpositions, that could be even more appropriate to your use case.)
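As one concrete example of such a metric, here is a sketch of the optimal string alignment variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions, and adjacent transpositions. Accepting a distance of at most 1 (an assumed threshold) matches "potaot" and "ptato" but rejects "ooooo" and "otatop"; note that it also accepts single substitutions and insertions, which the question did not ask for:

```csharp
using System;

// Optimal string alignment distance (restricted Damerau-Levenshtein).
static int OsaDistance(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(
                d[i - 1, j] + 1,          // deletion
                d[i, j - 1] + 1),         // insertion
                d[i - 1, j - 1] + cost);  // substitution
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1); // adjacent transposition
        }
    }
    return d[a.Length, b.Length];
}

Console.WriteLine(OsaDistance("potato", "potaot")); // 1
Console.WriteLine(OsaDistance("potato", "ptato"));  // 1
Console.WriteLine(OsaDistance("potato", "otatop")); // 2
```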
Some claim that the Jaro-Winkler distance, which measures the similarity between two strings while allowing for character transpositions to a degree and giving extra weight to common prefixes, is "one of the most performant and accurate approximate string matching algorithms currently available [Cohen, et al.], [Winkler]." However, the choice still depends very much on the use case, and one cannot draw general conclusions from specific studies, e.g. the name-matching study by Cohen, et al. 2003.
You can find plenty of packages on NuGet that offer a variety of similarity algorithms, fuzzy matching, phonetic matching, etc. to add this feature to your site or app.
Fuzzy matching can also be done directly in the database layer. An implementation of the Levenshtein distance can be found for most database systems (e.g. MySQL, SQL Server) or is already built in (Oracle, PostgreSQL).
Depending on your exact use case, you could also use a cloud-based solution (e.g. a microservice based on AWS, Azure, etc., or one you roll yourself) to get autosuggest-like fuzzy search/matching.
It's easiest to do it this way:
static void Main(string[] args)
{
string correctWord = "Potato";
string incorrectSwappedWord = "potaot";
string incorrectOneLetter = "ptato";
// Returns true
bool swapped = SwappedLettersMatch(correctWord, incorrectSwappedWord);
// Returns true
bool oneLetter = OneLetterOffMatch(correctWord, incorrectOneLetter);
}
public static bool OneLetterOffMatch(string str, string input)
{
// Check for null/whitespace before calling ToLower(), which would throw on null
if (string.IsNullOrWhiteSpace(str) || string.IsNullOrWhiteSpace(input))
{
return false;
}
str = str.ToLower();
input = input.ToLower();
int ndx = 0;
while (ndx < str.Length)
{
string newStr = str.Remove(ndx, 1);
if (input == newStr)
{
return true;
}
ndx++;
}
return false;
}
public static bool SwappedLettersMatch(string str, string input)
{
if (string.IsNullOrWhiteSpace(str) || string.IsNullOrWhiteSpace(input))
{
return false;
}
if (str.Length != input.Length)
{
return false;
}
str = str.ToLower();
input = input.ToLower();
int ndx = 0;
while (ndx < str.Length)
{
if (ndx == str.Length - 1)
{
return false;
}
string newStr = str[ndx + 1].ToString() + str[ndx];
if (ndx > 0)
{
newStr = str.Substring(0, ndx) + newStr;
}
if (str.Length > ndx + 2)
{
newStr = newStr + str.Substring(ndx + 2);
}
if (newStr == input)
{
return true;
}
ndx++;
}
return false;
}
OneLetterOffMatch will return true if there is a match that's off by just one missing character. SwappedLettersMatch will return true if there is a match when just two adjacent letters are swapped. These functions are case-insensitive, but if you want a case-sensitive version, just remove the calls to .ToLower().

Property or indexer 'string.this[int]' cannot be assigned to -- it's read only

I don't understand the problem; I was trying to do something simple:
for(i = x.Length-1, j = 0 ; i >= 0 ; i--, j++)
{
backx[j] = x[i];
}
Both are declared:
String x;
String backx;
What is the problem? The error I get is the one in the title.
If that is a problem, is there another way to do it?
The result (as the name 'backx' hints) is that backx should contain the string x backwards.
P.S. x is not empty - it contains a substring from another string.
Strings are immutable: you can retrieve the character at a certain position, but you cannot change the character to a new one directly.
Instead you'll have to build a new string with the change. There are several ways to do this, but StringBuilder does the job in a similar fashion to what you already have:
StringBuilder sb = new StringBuilder(backx);
sb[j] = x[i];
backx = sb.ToString();
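To do the whole reversal with StringBuilder, the question's loop can be kept almost unchanged. A sketch (Reverse is a hypothetical name); setting Length first pads the builder with '\0' characters so every index exists before it is written:

```csharp
using System;
using System.Text;

// StringBuilder's indexer has a setter (string's does not), so the question's loop works on it.
static string Reverse(string x)
{
    var sb = new StringBuilder(x.Length);
    sb.Length = x.Length; // pad with '\0' so every index j exists before it is assigned
    for (int i = x.Length - 1, j = 0; i >= 0; i--, j++)
        sb[j] = x[i];
    return sb.ToString();
}

Console.WriteLine(Reverse("abcd")); // dcba
```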
EDIT: If you take a look at the string public facing API, you'll see this indexer:
public char this[int index] { get; }
This shows that you can "get" a value, but because no "set" is available, you cannot assign values to that indexer.
EDITx2: If you're looking for a way to reverse a string, there are a few different ways, but here's one example with an explanation as to how it works: http://www.dotnetperls.com/reverse-string
String is immutable in .NET - this is why you get the error.
You can get a reverse string with LINQ:
using System.Linq;
string x = "abcd";
string backx = new string(x.Reverse().ToArray());
Console.WriteLine(backx); // output: "dcba"
Strings are immutable. You have to convert the string to a char array, and then you will be able to modify it.
Or you can use StringBuilder.
For example:
char[] wordArray = word.ToCharArray();
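The char-array route applied to the question's own loop, as a sketch; backxChars is a name introduced here:

```csharp
using System;

string x = "abcd";
// A char[] is mutable, so the question's loop can write into it directly.
char[] backxChars = new char[x.Length];
for (int i = x.Length - 1, j = 0; i >= 0; i--, j++)
    backxChars[j] = x[i];
string backx = new string(backxChars);
Console.WriteLine(backx); // dcba
```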
In C#, strings are immutable. You cannot "set" the Xth character to whatever you want. If you want to construct a new string, or be able to "edit" a string, use e.g. the StringBuilder class.
Strings are immutable in C#. You can read more about it here: http://msdn.microsoft.com/en-us/library/362314fe.aspx
Both of your variables are strings, but you are treating them as if they were arrays. Reading characters from a string through the indexer is valid, but you cannot assign to it that way.
Since you are trying to reverse a string, do take a look at this post. It has a lot of information.
public static string ReverseName( string theName)
{
string revName = string.Empty;
foreach (char a in theName)
{
revName = a + revName;
}
return revName;
}
This is simple and does not involve arrays directly.
The code below fills the result from both ends at once, so you only have to iterate halfway through the original string, which is pretty efficient if you're dealing with a lot of characters. The result is the original string reversed. I tested this with a string consisting of 100 characters and it executed in 0.0000021 seconds.
private string ReverseString(string testString)
{
    int j = testString.Length - 1;
    char[] charArray = new char[testString.Length];
    for (int i = 0; i <= j; i++)
    {
        // When i == j (the middle character of an odd-length string) both
        // assignments write the same character, so it is not left as '\0'
        charArray[i] = testString[j];
        charArray[j] = testString[i];
        j--;
    }
    return new string(charArray);
}
In case you need to replace, e.g., index 2 in a string, use this (it is ugly, but it works and is easy to maintain). Note that String.Replace(char, char) replaces every occurrence of a character, and that string methods return a new string, so use Remove/Insert and assign the result.
V1: you know what you want to put there. In pseudocode: string[2] = 'R';
row3String = row3String.Remove(2, 1).Insert(2, "R");
V2: you need to put either 'R' or 'Y' there: string[2] becomes 'R' if it was 'Y', and 'Y' otherwise.
row3String = row3String.Remove(2, 1).Insert(2, row3String[2] == 'Y' ? "R" : "Y");
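A sketch of a helper that changes exactly one position (SetCharAt is a hypothetical name), next to Replace for contrast; Replace(char, char) swaps every occurrence of the character, not just the one at the given index:

```csharp
using System;

// Hypothetical helper: returns a copy of s with the character at index replaced by c.
static string SetCharAt(string s, int index, char c)
{
    char[] chars = s.ToCharArray();
    chars[index] = c; // throws IndexOutOfRangeException if index is out of range
    return new string(chars);
}

string row3String = "YAYB";
Console.WriteLine(row3String.Replace(row3String[2], 'R')); // RARB: every 'Y' is replaced
Console.WriteLine(SetCharAt(row3String, 2, 'R'));          // YARB: only index 2 changes
```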

How to store a ratio in a single variable and read it back in C#

Let's say I have a system that must store how many people voted for fighter A and how many for fighter B.
Let's say the ratio is 200:1.
How can I store that value in a single variable, instead of storing both values (the number of voters for A and the number of voters for B) in two variables?
How would you do that?
From the way that the question is worded this might not be the answer you are looking for, but the easiest way is to use a struct:
struct Ratio
{
public Ratio(int a, int b)
{
this.a = a;
this.b = b;
}
public int a;
public int b;
}
You will almost certainly want to use properties instead of fields and you will probably also want to overload == and !=, something like:
public static bool operator ==(Ratio x, Ratio y)
{
if (x.b == 0 || y.b == 0)
return x.a == y.a;
// There is some debate on the most efficient / accurate way of doing the following
// (see the comments), however you get the idea! :-)
return (x.a * y.b) == (x.b * y.a);
}
public static bool operator !=(Ratio x, Ratio y)
{
return !(x == y);
}
public override string ToString()
{
return string.Format("{0}:{1}", this.a, this.b);
}
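The cross-multiplication test on its own, as a sketch with plain ints (SameRatio is a hypothetical name). Widening to long avoids overflow for large vote counts; note that 0:0 compares equal to every ratio under this test:

```csharp
using System;

// Two ratios a1:b1 and a2:b2 are equal when a1 * b2 == b1 * a2 (cross-multiplication).
static bool SameRatio(int a1, int b1, int a2, int b2) =>
    (long)a1 * b2 == (long)b1 * a2;

Console.WriteLine(SameRatio(200, 1, 400, 2)); // True
Console.WriteLine(SameRatio(1, 2, 2, 1));     // False
```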
A ratio is, by definition, the result of dividing two numbers. Therefore, just do that division and store the result in a double:
double ratio = (double)a/b;
string ratio="200:1"; // simple :)
You can use struct like this:
struct Ratio
{
public void VoteA()
{
A++;
}
public void VoteB()
{
B++;
}
public int A { get; private set; }
public int B { get; private set; }
public override string ToString()
{
return A + ":" + B;
}
}
That's enough to implement voting if you have only two options. Otherwise, you should implement a constructor accepting the number of options, a data structure to store the number of votes, and methods for voting or an indexer.
If you need the ratio for some integer math, you might want to implement a GCD method.
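A sketch of the GCD idea: divide both counts by their greatest common divisor (Euclid's algorithm), so 400:2 is stored or displayed as 200:1. Gcd and Reduced are hypothetical names:

```csharp
using System;

// Euclid's algorithm for the greatest common divisor.
static int Gcd(int a, int b)
{
    a = Math.Abs(a);
    b = Math.Abs(b);
    while (b != 0)
    {
        (a, b) = (b, a % b);
    }
    return a;
}

// Reduce a vote count such as 400:2 to its lowest terms.
static string Reduced(int a, int b)
{
    int g = Gcd(a, b);
    return g == 0 ? "0:0" : $"{a / g}:{b / g}";
}

Console.WriteLine(Reduced(400, 2)); // 200:1
Console.WriteLine(Reduced(6, 4));   // 3:2
```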
I think you need to use a string to keep it in a single variable:
string s=string.Format("{0}:{1}", 3, 5);
An array? int[] ratio = new int[2] is less code than a whole struct/class for 2 variables. Though if you want to add helper methods to it, a struct is the way to go.
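In my opinion, you can use doubles, with the first number as the integer part and the second number as the digits after the decimal point:
Ratio    Number
1:200    1.200
200:1    200.1
0:1      0.1
1:0      1.0
0:0      0.0
It's easy to use:
int firstNumber = (int)Number;
double secondNumber = Number - firstNumber; // the fractional part: about 0.2 for 1.200, not 200
Be aware that trailing zeros are lost (1:2 and 1:20 both become 1.2), so the second number cannot always be recovered.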

C# Regex sorting letter and number strings

I have a list that needs ordering, say:
R1-1
R1-11
R2-2
R1-2
This needs to be ordered as:
R1-1
R1-2
R1-11
R2-2
Currently I am using the C# Regex.Replace method and adding a 0 before a single digit at the end of a string, with something like:
Regex.Replace(inString, @"(?<!\d)([1-9])$", "0$1")
I'm sure there is a nicer way to do this which I just can't figure out.
Does anyone have a nice way of sorting letter and number strings with regex?
I have used Greg's method below to complete this and just thought I should add the code I am using for completeness:
public static List<Rack> GetRacks(Guid aisleGUID)
{
log.Debug("Getting Racks with aisleId " + aisleGUID);
List<Rack> result = dataContext.Racks.Where(
r => r.aisleGUID == aisleGUID).ToList();
return result.OrderBy(r => r.rackName, new NaturalStringComparer()).ToList();
}
I think what you're after is natural sort order, like Windows Explorer does? If so then I wrote a blog entry a while back showing how you can achieve this in a few lines of C#.
Note: I just checked and using the NaturalStringComparer in the linked entry does return the order you are looking for with the example strings.
You can write your own comparator and use regular expressions to compare the number between "R" and "-" first, followed by the number after "-", if the first numbers are equal.
Sketch:
public int Compare(string x, string y)
{
int releaseX = ...;
int releaseY = ...;
int revisionX = ...;
int revisionY = ...;
if (releaseX == releaseY)
{
return revisionX - revisionY;
}
else
{
return releaseX - releaseY;
}
}
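A runnable version of the sketch, assuming every name has the shape R<release>-<revision> as in the question's examples; Parse and CompareNames are hypothetical names. It compares with CompareTo instead of subtraction, since CompareTo cannot overflow:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumes every name looks like "R<release>-<revision>", e.g. "R1-11".
static (int Release, int Revision) Parse(string name)
{
    Match m = Regex.Match(name, @"^R(\d+)-(\d+)$");
    if (!m.Success)
        throw new FormatException($"Unexpected name: {name}");
    return (int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value));
}

// Compare by release first, then by revision when the releases are equal.
static int CompareNames(string x, string y)
{
    var (releaseX, revisionX) = Parse(x);
    var (releaseY, revisionY) = Parse(y);
    int byRelease = releaseX.CompareTo(releaseY);
    return byRelease != 0 ? byRelease : revisionX.CompareTo(revisionY);
}

var names = new List<string> { "R1-1", "R1-11", "R2-2", "R1-2" };
names.Sort(CompareNames);
Console.WriteLine(string.Join(", ", names)); // R1-1, R1-2, R1-11, R2-2
```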
