Optimal way to find pairs in a list of strings

I have a list of unique strings and need to find the pairing elements. Two strings A and B are a pair if either of the following conditions holds:
A ends in "_N" and B ends in "_P", or vice versa, e.g. ABCD_N and ABCD_P are a pair.
Replacing an occurrence of "N" in A with "P" gives B, or vice versa, e.g. ABNX and ABPX are a pair.
Right now I loop over the list and search the rest of the list for each string's pair, which is at least O(n^2), and my list can be huge, up to a million strings:
for(int i = 0; i < list.Count; i++)
{
for(int j=0;j<list.Count && j != i; j++)
{
if(list[i].EndsWith("_N") || list[j].EndsWith("_P"))
{
//Call method to find corresponding pair for this string; O(n)
//Call my processPairsmethod()
}
}
}
Another option is Regex, which I can use for the _N / _P ending condition, but I am not sure how to write a regex for the second condition.
PS: No duplicates exist in the list
Any pointers would be helpful

If you know the strings are unique, you could just create a function that creates a unique key, e.g. replacing both N and P with a non-existing char (like '*'):
// matches one single "N" or "P", that will be replaced by a single "*"
var rx = new Regex("N|P", RegexOptions.Compiled);
// pairs will be a list of string[2] arrays
var pairs = list.Select(s => new { Original = s, Key = rx.Replace(s, "*") })
.ToLookup(obj => obj.Key)
// each grp should be of length 2. In this select, validity checks should
// be performed
.Select(grp => grp.Select(obj => obj.Original).ToArray())
.ToList();
Using ToLookup should give you better performance: the whole pass should be roughly O(n), because each lookup insertion is (around) O(1).
Bear in mind the code is untested.
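A minimal, self-contained sketch of the key-based grouping described above, including one possible validity check. The names `PairFinder`, `FindPairs` and `DifferingPositions` are hypothetical. It assumes that '*' never occurs in the input and that a group only counts as a pair when it holds exactly two strings differing in a single position. Because every N and P is replaced, strings such as "NN", "NP" and "PP" share one key, so larger groups are possible; they are simply skipped here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairFinder
{
    // Hypothetical helper: groups strings by a key in which every 'N' and 'P'
    // is replaced by '*' (assumed not to occur in the input), then keeps only
    // the groups that form a valid pair.
    public static List<string[]> FindPairs(IEnumerable<string> list)
    {
        return list
            .GroupBy(s => s.Replace('N', '*').Replace('P', '*'))
            .Select(g => g.ToArray())
            // Assumption: a pair is exactly two strings that differ in one position.
            .Where(g => g.Length == 2 && DifferingPositions(g[0], g[1]) == 1)
            .ToList();
    }

    // Counts the positions where two strings differ; strings sharing a key
    // always have the same length.
    private static int DifferingPositions(string a, string b)
    {
        return a.Zip(b, (x, y) => x != y).Count(different => different);
    }
}
```

Usage would look like `PairFinder.FindPairs(list)`, with each returned array holding the two members of one pair. Grouping is a single pass over the list, so the cost grows roughly linearly with the number of strings.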

First, your algorithm is flawed because it assumes that any strings ending with _N and _P pair up, but what about ABC_N and ABCDE_P? They don't pair (different lengths).
What you have to do is compare each pair of strings character by character and accept them only if exactly one character differs, being N in one string and P in the other.
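for (int i = 0; i < list.Count; i++)
{
    for (int j = i + 1; j < list.Count; j++) // start at i + 1: each unordered pair is checked once
    {
        string s = list[i], t = list[j];
        if (s.Length != t.Length)
            continue; // strings of different length can never pair
        int diff = -1;      // index of the single differing character, -1 if none yet
        bool isPair = true;
        // C# strings are not '\0'-terminated, so the scan is bounded by Length
        for (int a = 0; a < s.Length && isPair; a++)
        {
            if (s[a] == t[a])
                continue;
            bool swapsNP = (s[a] == 'N' && t[a] == 'P') || (s[a] == 'P' && t[a] == 'N');
            isPair = diff < 0 && swapsNP; // only one N/P difference is allowed
            diff = a;
        }
        if (isPair && diff >= 0)
            processPairsmethod(s, t); // both list items match
    }
}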
O(N) algorithm (linear in the number of strings)
// requires: using System.Collections.Generic; using System.Text;
var map = new Dictionary<string, string>();
for (int i = 0; i < list.Count; i++)
{
    string s = list[i];
    if (map.TryGetValue(s, out string partner))
    {
        // an earlier string registered s as its N/P-swapped variant
        processPairsmethod(partner, s);
        continue;
    }
    for (int a = 0; a < s.Length; a++)
    {
        if (s[a] == 'N' || s[a] == 'P')
        {
            var sb = new StringBuilder(s);
            sb[a] = (char)(s[a] ^ 'N' ^ 'P'); // swaps N and P: N ^ N ^ P = P and P ^ N ^ P = N
            map[sb.ToString()] = s; // e.g. map["ANBC"] = "APBC"; later strings only need a key lookup
        }
    }
}

Related

Extract elements from List of List in C#

I'm trying to solve the HackerRank exercise "Non-Divisible Subset":
https://www.hackerrank.com/challenges/non-divisible-subset/
Exercise description
The exercise is about creating a program that takes a list of integers and a number 'k', and outputs the count of the maximum number of integers in the list that are not divisible by 'k' and are non-repeating.
My problem is that my result differs from the expected output.
Can you spot any problems in my code? It is probably a logic error, but I'm stuck. Please help me.
With k = 9 and input list = 422346306, 940894801, 696810740, 862741861, 85835055, 313720373,
the output should be 5, but my code returns 6.
public static int nonDivisibleSubset(int k, List<int> s)
{
var x = GetPerm(s);
var y = x.Where(x => x.Value % k != 0).Select(x=>x.Key).ToList();
var a = y.SelectMany(x => x).ToHashSet();
return a.Count();
}
static Dictionary<List<int>,int> GetPerm (List<int> list)
{
Dictionary<List<int>,int> perm = new Dictionary<List<int>, int>();
for (int i = 0; i < list.Count; i++)
{
for (int j = i+1; j < list.Count; j++)
{
List<int> sumCouple = new List<int>();
sumCouple.Add(list[i]);
sumCouple.Add(list[j]);
perm.Add(sumCouple, sumCouple.Sum());
}
}
return perm;
}

As far as I can see, the actual problem is quite different:
Given a set S of distinct integers, print the size of a maximal subset S' of S where the sum of any 2 numbers in S' is not evenly divisible by k.
If we have a look at the example:
list = {422346306, 940894801, 696810740, 862741861, 85835055, 313720373}
k = 9
we can't take all 6 numbers since 940894801 + 313720373 is evenly divisible by k = 9. The required subset is all but last item: {422346306, 940894801, 696810740, 862741861, 85835055}
And the solution will be different as well:
public static int nonDivisibleSubset(int k, List<int> s)
{
Dictionary<int, int> remainders = s
.GroupBy(item => item % k)
.ToDictionary(group => group.Key, group => group.Count());
int result = 0;
foreach (var pair in remainders) {
if (pair.Key == 0 || pair.Key % (k / 2) == 0 && k % 2 == 0)
result += 1;
else if (!remainders.TryGetValue(k - pair.Key, out int count))
result += pair.Value;
else if (count < pair.Value)
result += pair.Value;
else if (count == pair.Value && pair.Key < k - pair.Key)
result += pair.Value;
}
return result;
}
The idea is to group all the numbers by their remainder when divided by k. Then we do the following:
if the remainder is 0 or k / 2 (for even k), we can take just one such number into the subset
if the remainder is x, we can add to the subset either all such numbers or all the numbers with remainder k - x, whichever group is larger.
Time complexity: O(n)
Space complexity: O(n)

Not an answer to the question, but there is at least one problem with your code: a List can't be used as a dictionary key as is, because it does not override Equals/GetHashCode, so the dictionary performs reference comparison. You can provide a custom equality comparer:
class PairListEqComparer : IEqualityComparer<List<int>>
{
public static PairListEqComparer Instance { get; } = new PairListEqComparer();
public bool Equals(List<int> x, List<int> y)
{
if (ReferenceEquals(x, y)) return true;
if (ReferenceEquals(x, null)) return false;
if (ReferenceEquals(y, null)) return false;
if (x.Count != 2 || y.Count != 2) return false; // or throw
return x[0] == y[0] && x[1] == y[1];
}
public int GetHashCode(List<int> obj) => HashCode.Combine(obj.Max(), obj.Min(), obj.Count);
}
And usage:
Dictionary<List<int>,int> perm = new Dictionary<List<int>, int>(PairListEqComparer.Instance);
Or consider using ordered value tuples (the compiler generates the needed methods). From there you can think about optimizations and a better algorithm.
As for the solution itself: a valid brute-force approach would be to generate all combinations of all sizes, i.e. from 1 to s.Count, and find the largest one that satisfies the condition (though I doubt it will be efficient enough for HackerRank).
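The value-tuple suggestion could look roughly like the sketch below. `PairSums` and `GetPairSums` are hypothetical names standing in for the question's `GetPerm`; the sketch assumes C# 7 or later, stores each pair as (smaller, larger), and uses `long` for the sum as a precaution against overflow, which the original does not do.

```csharp
using System;
using System.Collections.Generic;

public static class PairSums
{
    // Hypothetical variant of GetPerm from the question: keys are ordered value
    // tuples, which compare by value, so no custom comparer is needed.
    public static Dictionary<(int Min, int Max), long> GetPairSums(List<int> list)
    {
        var sums = new Dictionary<(int Min, int Max), long>();
        for (int i = 0; i < list.Count; i++)
        {
            for (int j = i + 1; j < list.Count; j++)
            {
                var key = (Math.Min(list[i], list[j]), Math.Max(list[i], list[j]));
                // The indexer overwrites a repeated pair instead of throwing like Add.
                sums[key] = (long)list[i] + list[j];
            }
        }
        return sums;
    }
}
```

A lookup such as `sums[(1, 3)]` then finds the entry regardless of the order in which the two numbers appeared in the input list.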

Finding the longest word (and write it out) in string without split, distinct and foreach [duplicate]

I got an assignment to make a method to find the longest word in a string without split, distinct and foreach.
I was able to split the words and count the length but I am stuck on how can I actually compare and write them out.
static void Main(string[] args)
{
String s1 = "Alex has 2 hands.";
longestWord(s1);
Console.
}
static void longestWord(String s1)
{
char emp = ' ';
int count = 0;
char[] ee = s1.ToCharArray();
for (int i = 0; i < ee.Length; i++)
{
if (ee[i] == emp || ee[i] == '.')
{
count++;
Console.Write(" " + (count-1));
Console.WriteLine();
count = 0;
}
else
{
Console.Write(ee[i]);
count++;
}
}
}
The output right now looks like this:
Alex 4
has 3
2 1
hands 5
I am pretty sure I could show only the longest length by comparing count with a temporary int before the reset, but how do I write out the word along with it?
Or maybe there is an easier way, which there probably is.

You are on the right track. Instead of printing the words directly, store the length and start position of the longest word and print it at the end, like so:
static void longestWord(String s1)
{
char emp = ' ';
int longestLength = 0;
int longestStart = 0;
int currentStart = 0;
for (int i = 0; i < s1.Length; i++)
{
if (s1[i] == emp || s1[i] == '.')
{
// calculate the current word length
int currentLength = i - currentStart;
// test if this is longer than the currently longest
if(currentLength > longestLength)
{
longestLength = currentLength;
longestStart = currentStart;
}
// a new word starts at the next character
currentStart = i + 1;
}
}
// print the longest word
Console.WriteLine($"Longest word has length {longestLength}: \"{s1.Substring(longestStart, longestLength)}\"");
}
There is no need for the .ToCharArray(). You can access the string directly.
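One case the loop above does not cover is a last word that is not followed by '.' or a space. A possible variant, sketched under the assumption that the end of the string should count as a delimiter too, is shown below; `WordScanner` and `LongestWord` are hypothetical names, and the method returns the word instead of printing it.

```csharp
using System;

public static class WordScanner
{
    // Hypothetical variant: returns the longest word and treats the end of the
    // string as a delimiter, so a final word without a trailing '.' or ' ' is
    // still considered. On ties the first longest word wins.
    public static string LongestWord(string text)
    {
        int longestStart = 0, longestLength = 0, currentStart = 0;
        for (int i = 0; i <= text.Length; i++)
        {
            bool atDelimiter = i == text.Length || text[i] == ' ' || text[i] == '.';
            if (!atDelimiter)
                continue;

            int currentLength = i - currentStart;
            if (currentLength > longestLength)
            {
                longestLength = currentLength;
                longestStart = currentStart;
            }
            currentStart = i + 1; // the next word starts after the delimiter
        }
        return text.Substring(longestStart, longestLength);
    }
}
```

It still uses no Split, Distinct or foreach, in line with the assignment's restrictions.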

I would question whether you are actually supposed to treat "2" as a word and count it at all. Using regular expressions lets you approach the problem with a LINQ one-liner:
static void Main(string[] args)
{
String s1 = "Alex has 2 hands.";
var word = longestWord(s1);
Console.WriteLine(word);
//Console.ReadLine();
}
static string longestWord(string s1) {
return Regex.Matches(s1,"[A-Za-z]+") // find all sequences of alphabetical characters; you can add digits as well: [A-Za-z0-9]
.Cast<Match>() // cast results to IEnumerable<Match> so we can apply LINQ to it
.OrderByDescending(m => m.Length) // luckily for us, Match comes with Length, so we just sort our results by it
.Select(m => m.Value) // instead of picking up everything, we only want the actual word
.First(); // since we ordered by descending - first item will be the longest word
}

You can collect the characters of each word in a new list of chars (a list because the length is dynamic),
and if the new word is longer than the previous longest word, convert it to a string.
If two words have the same length, the first one is kept.
If you want the last one instead, change "maxLength < count" to "maxLength <= count".
static string longestWord(String s1)
{
char emp = ' ';
int count = 0;
int maxLength = 0;
string maxWord = string.Empty;
List<char> newWord = new List<char>();
char[] ee = s1.ToCharArray();
for (int i = 0; i < ee.Length; i++)
{
if (ee[i] == emp || ee[i] == '.')
{
if (maxLength < count)
{
maxLength = count;
maxWord = new string(newWord.ToArray());
}
count = 0;
newWord = new List<char>();
}
else
{
newWord.Add(ee[i]);
count++;
}
}
return maxWord;
}

find number with no pair in array

I am having trouble with a small bit of code that works on an array of random size filled with pairs of random numbers, except for one number which has no pair.
I need to find that number which has no pair.
arLength is the length of the array.
But I am having trouble actually matching the pairs and finding the one which has no pair.
for (int i = 0; i <= arLength; i++)
{ // go through the array one by one..
var number = nArray[i];
// now search through the array for a match.
for (int e = 0; e <= arLength; e++)
{
if (e != i)
{
}
}
}
I have also tried this :
var findValue = nArray.Distinct();
I have searched around, but so far, i haven't been able to find a method for this.
This code is what generates the array, but this question isn't about this part of the code, only for clarity.
Random num = new Random();
int check = CheckIfOdd(num.Next(1, 1000000));
int counter = 1;
while (check <= 0)
{
if (check % 2 == 0)
{
check = CheckIfOdd(num.Next(1, 1000000)); ;
}
counter++;
}
int[] nArray = new int[check];
int arLength = 0;
//generate arrays with pairs of numbers, and one number which does not pair.
for (int i = 0; i < check; i++)
{
arLength = nArray.Length;
if (arLength == i + 1)
{
nArray[i] = i + 1;
}
else
{
nArray[i] = i;
nArray[i + 1] = i;
}
i++;
}

You can do it using the bitwise operator ^, and the complexity is O(n).
Theory
The operator ^ (XOR) has the following truth table:
a b a ^ b
0 0 0
0 1 1
1 0 1
1 1 0
So x ^ x == 0 and x ^ 0 == x. If there is only one number without a pair, all the pairs cancel out because their two values are the same:
var element = nArray[0];
for(int i = 1; i < arLength; i++)
{
element = element ^ nArray[i];
}
At the end, the variable element holds the number without a pair.
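The same XOR idea can be packaged as a small self-contained method. This is only a sketch: `UnpairedFinder` and `FindUnpaired` are hypothetical names, and it assumes exactly one value occurs an odd number of times.

```csharp
using System;
using System.Linq;

public static class UnpairedFinder
{
    // XOR-folds the array: equal values cancel (x ^ x == 0) and 0 is the
    // identity (x ^ 0 == x), so only the unpaired value remains.
    // Assumption: exactly one value occurs an odd number of times.
    public static int FindUnpaired(int[] values)
    {
        return values.Aggregate(0, (acc, v) => acc ^ v);
    }
}
```

Starting the fold from the seed 0 means an empty array yields 0 rather than throwing, which may or may not be what a caller wants.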

Distinct will give you back the array with distinct values; it will not find the value you need.
You can use GroupBy and select the values whose Count modulo 2 equals 1.
var noPairs = nArray.GroupBy(i => i)
.Where(g => g.Count() % 2 == 1)
.Select(g=> g.Key);

You can use a dictionary to store the number of occurrences of each value in the array. To find the value without pairs, look for a (single) number of occurrences smaller than 2.
using System;
using System.Collections.Generic;
using System.Linq;
int[] data = new[] {1, 2, 3, 4, 5, 3, 2, 4, 1};
// key is the number, value is its count
var numberCounts = new Dictionary<int, int>();
foreach (var number in data) {
if (numberCounts.ContainsKey(number)) {
numberCounts[number]++;
}
else {
numberCounts.Add(number, 1);
}
}
var noPair = numberCounts.Single(kvp => kvp.Value < 2);
Console.WriteLine(noPair.Key);
Time complexity is O(n) because you traverse the array only a single time and then traverse the dictionary a single time. The same dictionary can also be used to find triplets etc.

An easy and fast way to do this is with a frequency table: keep a dictionary with the number as key and the number of times you found it as value. This way you only have to run through your array once.
Your example should work too with some changes, but it will be a lot slower if you have a big array.
for (int i = 0; i < arLength; i++)
{
    bool hasMatch = false;
    for (int e = 0; e < arLength; e++)
    {
        // Compare the elements, not the indexes, and skip comparing an element with itself.
        if (e != i && nArray[e] == nArray[i])
        {
            hasMatch = true;
            break;
        }
    }
    // if hasMatch == false, nArray[i] is the item without a pair.
}

All you have to do is to Xor all the numbers:
int result = nArray.Aggregate((s, a) => s ^ a);
All items that have a pair cancel out: a ^ a == 0, and you are left with the distinct item: 0 ^ 0 ^ ... ^ 0 ^ distinct ^ 0 ^ ... ^ 0 == distinct

Because you mentioned you like short and simple in a comment, how about getting rid of most of your other code as well?
var total = new Random().Next(500000) * 2 + 1;
var myArray = new int[total];
for (var i = 1; i < total; i+=2)
{
myArray[i] = i;
myArray[i -1] = i;
}
myArray[total - 1] = total;
Then indeed use Linq to get what you are looking for. Here is a slight variation, returning the key of the item in your array:
var key = myArray.GroupBy(t => t).FirstOrDefault(g=>g.Count()==1)?.Key;

C# char Conversion/Append to/with int, int64, string

Lately I have been working through Project Euler, specifically
https://projecteuler.net/problem=4
I create two arrays
Multiply them together
Convert the product into a char array
Compare the digits
If they match, my problem arises
I attempt to convert the char back to an int, or long, or string,
and
I have attempted to append the char to an int, or long, or string, or whatever
void Main()
{
int[] arrOne = new int[900]; // Initializing Array One
int[] arrTwo = new int[900]; // Initializing Array Two
Console.WriteLine(PopulateAndConvert(arrOne, arrTwo)); // Sending info into class
}
int PopulateAndConvert(int[] a, int[] b)
{
char[] c = new char[1]; // char used to store tested number
//string[] m = new string[a.Length*b.Length];
long l = 0; // Used for testing code
for(int i = 0; i < a.Length; i++) // Populating Arrays One and Two
{
a[i] = i + 100;
b[i] = i + 100;
}
for(int j = a.Length-1; j >= 0; j--) // Beginning for-loops for multiplication and testing
{
//Console.WriteLine(j);
for(int k = b.Length-1; k >= 0; k--) // Second part of for-loop previously mentioned
{
//Console.WriteLine(k);
c = (a[j] * b[k]).ToString().ToCharArray(); // Where the math and conversion happens
//Console.WriteLine(c);
if(c.Length > 5) // Checking if digit of product is greater than 5
{
if((c[0] == c[c.Length-1]) && // Comparing first and second half of product
(c[1] == c[c.Length-2]) &&
(c[2] == c[c.Length-3]))
{
/*for(int n = 0; n < c.Length; n++) // Last tidbit of code that was being attempted
sb[l].Append(Convert.ToInt32(c[0]));
l++;
Console.WriteLine(sb); */
}
}
else if (c.Length < 5) // Product with less than 6 digits go here
{
if((Convert.ToInt32(c[0]) == Convert.ToInt32(c[4])) &&
(Convert.ToInt32(c[1]) == Convert.ToInt32(c[3])))
{
//m[l] = Convert.ToChar(c); l++;
}
}
}
}
// Everything below was used to check the code that I have been trying to work through
// And to place the given products in a ascending or descending order
//foreach (char x in m)
// Console.WriteLine(m);
//IEnumerable<char> sortDescendingQuery =
// from num in c
// orderby num descending
// select num;
return 0;
}

After some time (resting the mind is always beneficial) I found a solution:
if(c.Length > 5) // Checking if digit of product is greater than 5
{
int[] n = new int[c.Length];
StringBuilder sb = new StringBuilder();
if((c[0] == c[c.Length-1]) && // Comparing first and second half of product
(c[1] == c[c.Length-2]) &&
(c[2] == c[c.Length-3]))
{
for(int l = 0; l < c.Length; l++) // Converting each value in the char array to a stringbuilder
{
sb.Append(Convert.ToInt32(new string(c[l], 1)));
}
m[q] = Int32.Parse(sb.ToString()); // Converting the StringBuilder into a string and then into an int
q++;
}
}
I had to convert each individual value within the char array c[] to a string, then an int, then append it to the string builder sb.
After that I then convert sb to a string (via ToString()) and Parse it to an int.
It seems like a long work around, but it works.
Now I need to Sort it numerically (another hurdle).
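For comparison, here is a shorter sketch of the same task that avoids the char round trip by testing the digits of the product directly. It is not the approach from the post above, and the names `PalindromeProducts`, `IsPalindrome`, `LargestPalindromeProduct` and `DigitValue` are hypothetical. One detail behind the workaround: Convert.ToInt32(char) returns the character's code unit (55 for '7'), not the digit it represents, whereas subtracting '0' gives the digit value.

```csharp
using System;

public static class PalindromeProducts
{
    // True when the decimal digits of n read the same in both directions.
    public static bool IsPalindrome(int n)
    {
        string digits = n.ToString();
        for (int i = 0, j = digits.Length - 1; i < j; i++, j--)
        {
            if (digits[i] != digits[j])
                return false;
        }
        return true;
    }

    // Largest palindrome that is a product of two factors in [min, max];
    // returns 0 when no such palindrome exists.
    public static int LargestPalindromeProduct(int min, int max)
    {
        int best = 0;
        for (int a = max; a >= min; a--)
        {
            for (int b = a; b >= min; b--)
            {
                int product = a * b;
                if (product <= best)
                    break; // a smaller b only makes the product smaller
                if (IsPalindrome(product))
                    best = product;
            }
        }
        return best;
    }

    // A digit character converts to its numeric value by subtracting '0'.
    public static int DigitValue(char c) => c - '0';
}
```

Because only the running maximum is kept, no sorting step is needed afterwards.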

More efficient way to get all indexes of a character in a string

Instead of looping through each character to see if it's the one you want and then adding the index you're on to a list, like so:
var foundIndexes = new List<int>();
for (int i = 0; i < myStr.Length; i++)
{
if (myStr[i] == 'a')
foundIndexes.Add(i);
}

You can use String.IndexOf, see example below:
string s = "abcabcabcabcabc";
var foundIndexes = new List<int>();
long t1 = DateTime.Now.Ticks;
for (int i = s.IndexOf('a'); i > -1; i = s.IndexOf('a', i + 1))
{
// for loop end when i=-1 ('a' not found)
foundIndexes.Add(i);
}
long t2 = DateTime.Now.Ticks - t1; // read this value to see the run time

I use the following extension method to yield all results:
public static IEnumerable<int> AllIndexesOf(this string str, string searchstring)
{
int minIndex = str.IndexOf(searchstring);
while (minIndex != -1)
{
yield return minIndex;
minIndex = str.IndexOf(searchstring, minIndex + searchstring.Length);
}
}
usage:
IEnumerable<int> result = "foobar".AllIndexesOf("o"); // [1,2]
Side note on an edge case: this is a string-based approach which works for search strings of one or more characters. Matches do not overlap, so "fooo".AllIndexesOf("oo") yields just index 1. https://dotnetfiddle.net/CPC7D2

How about
string xx = "The quick brown fox jumps over the lazy dog";
char search = 'f';
var result = xx.Select((b, i) => b.Equals(search) ? i : -1).Where(i => i != -1);

A plain loop is usually the simplest option and already close to optimal here.
Unless the task is more complex, you rarely need to look for a more optimized solution.
So I would suggest continuing with:
var foundIndexes = new List<int>();
for (int i = 0; i < myStr.Length; i++)
if (myStr[i] == 'a') foundIndexes.Add(i);

If the string is short, it may be more efficient to search the string once and count up the number of times the character appears, then allocate an array of that size and search the string a second time, recording the indexes in the array. This will skip any list re-allocations.
What it comes down to is how long the string is and how many times the character appears. If the string is long and the character appears few times, searching it once and appending indices to a List<int> will be faster. If the character appears many times, then searching the string twice (once to count, and once to fill an array) may be faster. Exactly where the tipping point is depends on many factors that can't be deduced from your question.
If you need to search the string for multiple different characters and get a list of indexes for those characters separately, it may be faster to search through the string once and build a Dictionary<char, List<int>> (or a List<List<int>> using character offsets from \0 as the indices into the outer list).
Ultimately, you should benchmark your application to find bottlenecks. Often the code that we think will perform slowly is actually very fast, and we spend most of our time blocking on I/O or user input.
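The count-then-fill idea described above could be sketched as follows. `IndexCollector` and `IndexesOf` are hypothetical names, and whether this beats a plain List<int> depends on the input, so it is something to benchmark rather than assume.

```csharp
using System;

public static class IndexCollector
{
    // Two-pass approach: count the matches first, then fill an exactly sized
    // array, which avoids the re-allocations of a growing List<int>.
    public static int[] IndexesOf(string text, char target)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == target)
                count++;
        }

        var indexes = new int[count];
        int next = 0;
        // Stop early once every counted match has been recorded.
        for (int i = 0; i < text.Length && next < count; i++)
        {
            if (text[i] == target)
                indexes[next++] = i;
        }
        return indexes;
    }
}
```

The second pass can stop as soon as the last match is stored, so on strings where the matches cluster near the start it does less work than a full scan.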

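public static List<int> GetSubstringLocations(string text, string searchsequence)
{
    var foundIndexes = new List<int>();
    // Guard against null input explicitly instead of swallowing exceptions.
    if (text == null || searchsequence == null)
        return foundIndexes;
    int i = 0;
    while (i < text.Length)
    {
        int cindex = text.IndexOf(searchsequence, i);
        if (cindex < 0)
            break; // no further occurrences, stop scanning
        foundIndexes.Add(cindex);
        i = cindex + 1; // resume one past the match start, so overlapping matches are found
    }
    return foundIndexes;
}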

public static String[] Split(this string s,char c = '\t')
{
if (s == null) return null;
var a = new List<int>();
int i = s.IndexOf(c);
if (i < 0) return new string[] { s };
a.Add(i);
for (i = i+1; i < s.Length; i++) if (s[i] == c) a.Add(i);
var result = new string[a.Count +1];
int startIndex = 0;
result[0] = s.Remove(a[0]);
for(i=0;i<a.Count-1;i++)
{
result[i + 1] = s.Substring(a[i] + 1, a[i + 1] - a[i] - 1);
}
result[a.Count] = s.Substring(a[a.Count - 1] + 1);
return result;
}
