How to group array of char/string with UNION? - c#

I have a two dimensional array of char, called Letters[ ][ ]
Letters[0][0] = A
[0][1] = B
Letters[1][0] = C
[1][1] = D
Letters[2][0] = B
[2][1] = A
[2][2] = F
Letters[3][0] = I
[3][1] = F
[3][2] = J
I need to group it, so it will be something like this:
group[0] [0] = A
group[0] [1] = B
group[0] [2] = F
group[0] [3] = I
group[0] [4] = J
group[1] [0] = C
group[1] [1] = D
My logic so far is to compare every element with every other element. If two rows share a letter, they are merged into one group together with all of their other elements, with no duplicated elements. However, I'm not sure whether to use C# LINQ Union or just standard array access.
What is the best way to group it? Or are there any other solutions for this?

I think a pure LINQ solution would be overly complex. This isn't (if I understand your specification correctly) a simple union operation. You want to union based on non-empty intersections. That would mean having to first rearrange the data so LINQ can do a join, to find the data that matches, and since LINQ will only join on equality, doing that while preserving the original grouping information is going to result in syntax that would be more trouble than it's worth, IMHO.
Here is a non-LINQ approach that works for the example you've given:
static void Main(string[] args)
{
    char[][] letters =
    {
        new [] { 'A', 'B' },
        new [] { 'C', 'D' },
        new [] { 'B', 'A', 'F' },
        new [] { 'I', 'F', 'J' },
    };
    List<HashSet<char>> sets = new List<HashSet<char>>();
    foreach (char[] row in letters)
    {
        List<int> setIndexes = Enumerable.Range(0, sets.Count)
            .Where(i => row.Any(ch => sets[i].Contains(ch))).ToList();
        CoalesceSets(sets, row, setIndexes);
    }
    foreach (HashSet<char> set in sets)
    {
        Console.WriteLine("{ " + string.Join(", ", set) + " }");
    }
}
private static void CoalesceSets(List<HashSet<char>> sets, char[] row, List<int> setIndexes)
{
    if (setIndexes.Count == 0)
    {
        sets.Add(new HashSet<char>(row));
    }
    else
    {
        HashSet<char> targetSet = sets[setIndexes[0]];
        targetSet.UnionWith(row);
        for (int i = setIndexes.Count - 1; i >= 1; i--)
        {
            targetSet.UnionWith(sets[setIndexes[i]]);
            sets.RemoveAt(setIndexes[i]);
        }
    }
}
It builds up sets of the input data by scanning the previously identified sets to find which ones the current row of data intersects with, and then coalesces these sets into a single set containing all of the members (your specification appears to impose transitive membership…i.e. if one letter joins sets A and B, and a different letter joins set B and C, you want A, B, and C all joined into a single set).
This isn't an optimal solution, but it's readable. You could avoid the O(N^2) search by maintaining a Dictionary<char, int> to map each character to the set which contains it. Then instead of scanning all the sets, it's a simple lookup for each character in the current row, to build up the list of set indexes. But there's a lot more "housekeeping" code going that approach; I would not bother implementing it that way unless you find a proven performance issue doing it the more basic way.
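For comparison, here is a minimal sketch of the lookup-based idea, written as a disjoint-set (union-find) structure rather than the Dictionary<char, int> described above. This is a swapped-in technique, not the code from this answer: the names LetterGrouping, Find, and Group are hypothetical, and the sketch assumes only that letters sharing a row belong to the same group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetterGrouping
{
    // Follows parent links until it reaches the representative of ch's group.
    static char Find(Dictionary<char, char> parent, char ch)
    {
        while (parent[ch] != ch)
        {
            parent[ch] = parent[parent[ch]]; // path halving keeps chains short
            ch = parent[ch];
        }
        return ch;
    }

    public static List<List<char>> Group(char[][] rows)
    {
        var parent = new Dictionary<char, char>();
        foreach (char[] row in rows)
        {
            foreach (char ch in row)
            {
                if (!parent.ContainsKey(ch))
                    parent[ch] = ch; // a new letter starts as its own group
                // Merge this letter's group into the group of the row's first letter.
                parent[Find(parent, ch)] = Find(parent, row[0]);
            }
        }
        // Snapshot the keys first, because Find rewrites dictionary values.
        return parent.Keys.ToList()
            .GroupBy(ch => Find(parent, ch))
            .Select(g => g.OrderBy(ch => ch).ToList())
            .OrderBy(g => g[0])
            .ToList();
    }

    static void Main()
    {
        char[][] letters =
        {
            new[] { 'A', 'B' },
            new[] { 'C', 'D' },
            new[] { 'B', 'A', 'F' },
            new[] { 'I', 'F', 'J' },
        };
        foreach (List<char> group in Group(letters))
            Console.WriteLine("{ " + string.Join(", ", group) + " }");
        // prints { A, B, F, I, J } and then { C, D }
    }
}
```

Each letter is looked up rather than every existing set being scanned, at the cost of the extra bookkeeping mentioned above. The groups are sorted only to make the output deterministic.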
By the way: I have a vague recollection I've seen this type of question before on Stack Overflow, i.e. this sort of transitive unioning of sets. I looked for the question but couldn't find it. You may have more luck, and may find there is additional helpful information with that question and its answers.

Related

Building a Matrix of Combinations

I'm sure this has been asked a million times, but when I searched all the examples didn't quite fit, so I thought I should ask it anyway.
I have two arrays which will always contain 6 items each. For example:
string[] Colors=
new string[] { "red", "orange", "yellow", "green", "blue", "purple" };
string[] Foods=
new string[] { "fruit", "grain", "dairy", "meat", "sweet", "vegetable" };
Between these two arrays, there are 36 possible combinations(e.g. "red fruit", "red grain").
Now I need to further group these into sets of six unique values.
For example:
meal[0]=
new Pair[] {
new Pair { One="red", Two="fruit" },
new Pair { One="orange", Two="grain" },
new Pair { One="yellow", Two="dairy" },
new Pair { One="green", Two="meat" },
new Pair { One="blue", Two="sweet" },
new Pair { One="purple", Two="vegetable" }
};
where meal is
Pair[][] meal;
No element can be repeated in my list of "meals". So there is only ever a single "Red" item, and a single "meat" item, etc.
I can easily create the pairs based on the first two arrays, but I am drawing a blank on how best to then group them into unique combinations.
OK, you want a sequence containing all 720 possible sequences. This is a bit trickier but it can be done.
The basic idea is the same as in my previous answer. In that answer we:
generated a permutation at random
zipped the permuted second array with the unpermuted first array
produced an array from the query
Now we'll do the same thing except instead of producing a permutation at random, we'll produce all the permutations.
Start by getting this library:
http://www.codeproject.com/Articles/26050/Permutations-Combinations-and-Variations-using-C-G
OK, we need to make all the permutations of six items:
Permutations<string> permutations = new Permutations<string>(foods);
What do we want to do with each permutation? We already know that. We want to first zip it with the colors array, turning it into a sequence of pairs, which we then turn into an array. Instead, let's turn it into a List<Pair> because, well, trust me, it will be easier.
IEnumerable<List<Pair>> query =
from permutation in permutations
select colors.Zip(permutation, (color, food)=>new Pair(color, food)).ToList();
And now we can turn that query into a list of results;
List<List<Pair>> results = query.ToList();
And we're done. We have a list with 720 items in it. Each item is a list with 6 pairs in it.
The heavy lifting is done by the library code, obviously; the query laid on top of it is straightforward.
(I've been meaning to write a blog article for some time on ways to generate permutations in LINQ; I might use this as an example!)
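If pulling in the library is not an option, the same result can be sketched with a small hand-rolled generator. This is an illustrative assumption, not the library's implementation: MealPermutations and Permute are hypothetical names, and value tuples stand in for the Pair class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MealPermutations
{
    // Lazily yields every ordering of items (n! results for n distinct items).
    public static IEnumerable<List<T>> Permute<T>(IReadOnlyList<T> items)
    {
        if (items.Count == 0)
        {
            yield return new List<T>();
            yield break;
        }
        for (int i = 0; i < items.Count; i++)
        {
            // Put items[i] in front, then permute whatever is left.
            List<T> rest = items.Where((item, index) => index != i).ToList();
            foreach (List<T> tail in Permute(rest))
            {
                tail.Insert(0, items[i]);
                yield return tail;
            }
        }
    }

    static void Main()
    {
        string[] colors = { "red", "orange", "yellow", "green", "blue", "purple" };
        string[] foods = { "fruit", "grain", "dairy", "meat", "sweet", "vegetable" };

        // Zip each permutation of foods with the unpermuted colors.
        var meals = Permute(foods)
            .Select(p => colors.Zip(p, (color, food) => (color, food)).ToList())
            .ToList();

        Console.WriteLine(meals.Count); // 720
    }
}
```

The recursion is simple rather than fast; for six items that hardly matters, but a dedicated library is likely the better choice for larger inputs.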
There are 720 possible combinations that meet your needs. It is not clear from your question whether you want to enumerate all 720 or choose one at random or what. I'm going to assume the latter.
UPDATE: Based on comments, this assumption was incorrect. I'll start a new answer.
First, produce a permutation of the second array. You can do it in-place with the Fisher-Yates-Knuth shuffle; there are many examples of how to do so on Stack Overflow. Alternatively, you could produce a permutation with LINQ by sorting with a random key.
The former technique is fast even if the number of items is large, but mutates an existing array. The second technique is slower, particularly if the number of items is extremely large, which it isn't.
The most common mistake people make with the second technique is sorting on a guid. Guids are guaranteed to be unique, not guaranteed to be random.
Anyway, produce a query which, when executed, permutes the second array:
Random random = new Random();
IEnumerable<string> shuffled = from food in foods
orderby random.NextDouble()
select food;
A few other caveats:
Remember, the result of a query expression is a query, not a set of results. The permutation doesn't happen until you actually turn the thing into an array at the other end.
If you make two instances of Random within the same millisecond, you get the same sequence out of them both.
Random is pseudo-random, not truly random.
Random is not threadsafe.
Now you can zip-join your permuted sequence to the first array:
IEnumerable<Pair> results = colors.Zip(shuffled, (color, food)=>new Pair(color, food));
Again, this is still a query representing the action of zipping the two sequences together. Nothing has happened yet except building some queries.
Finally, turn it into an array. This actually executes the queries.
Pair[] finalResults = results.ToArray();
Easy peasy.
Upon request, I will be specific about how I view the problem in terms of shuffling. I know that since C# is a higher-level language there are plenty of quick and easy libraries and objects that can reduce this to minimal code. This answer instead attempts to solve the question by implementing the shuffling logic directly.
When initially reading this question I was reminded of shuffling a deck of cards. The two arrays are very similar to an array of suits and an array of face values. Since one way to deal a random card is to randomize both arrays and then pick a card that combines a value from each, you could apply the same logic here.
Shuffling as a possible solution
The Fisher-Yates shuffle essentially loops through all the indices of the array, swapping the element at the current index with the element at a randomly chosen index at or before it. This makes for a fairly efficient shuffling method. So how does this apply to the problem at hand? One possible implementation could be...
static Random rdm = new Random();
public string[] Shuffle(string[] c)
{
    for (int i = c.Length; i > 1; i--)
    {
        // Pick a random position in [0, i) and swap it into slot i - 1.
        int iRdm = rdm.Next(i);
        string cTemp = c[iRdm];
        c[iRdm] = c[i - 1];
        c[i - 1] = cTemp;
    }
    return c;
}
Source: Fisher-Yates Shuffle
The code above randomizes the positions of values within the string array. If you passed the Colors and Food arrays into this function, you would get unique pairings for your Pairs by referencing a specific index of both.
Since the array is shuffled, the pairing of the two arrays at index 0,1,2,etc are unique. The problem however asks for Pairs to be created. A Pair class should then be created that takes in a value at a specific index for both Colors and Foods. ie...Colors[3] and Foods[3]
public class Pair
{
public string One;
public string Two;
public Pair(string m1, string m2)
{
One = m1;
Two = m2;
}
}
Since we have shuffled arrays and a class to contain the unique pairings, we simply create the meal array and populate it with Pairs.
If we wanted to create a new pair we would have...
Pair temp = new Pair(Colors[0],Foods[0]);
With this information we can finally populate the meal array.
Pair[] meal = new Pair[Colors.Length];
for (int i = 0; i < Colors.Length; i++)
{
    meal[i] = new Pair(Colors[i], Foods[i]);
}
This section of code creates the meal array and sizes it by the length of Colors. The code then loops over every index of Colors, creating a new pair at each one and storing it in meal. This method assumes the two arrays have identical lengths; a check for the shorter array could easily be added.
Full Code
private void Form1_Load(object sender, EventArgs e)
{
string[] Colors = new string[] { "red", "orange", "yellow", "green", "blue", "purple" };
string[] Foods = new string[] { "fruit", "grain", "dairy", "meat", "sweet", "vegetable" };
Colors = Shuffle(Colors);
Foods = Shuffle(Foods);
Pair[] meal = new Pair[Colors.Length];
for (int i = 0; i < Colors.Length; i++)
{
    meal[i] = new Pair(Colors[i], Foods[i]);
}
}
static Random rdm = new Random();
public string[] Shuffle(string[] c)
{
var random = rdm;
for (int i = c.Length; i > 1; i--)
{
int iRdm = rdm.Next(i);
string cTemp = c[iRdm];
c[iRdm] = c[i - 1];
c[i - 1] = cTemp;
}
return c;
}
}
public class Pair
{
public string One;
public string Two;
public Pair(string m1, string m2)
{
One = m1;
Two = m2;
}
}
-Original Post-
You can simply shuffle the array. This will allow for the same method to populate meal, but with different results. There is a post on Fisher-Yates shuffle Here

C# : How to compare two collections (System.Collections.Generic.List<T>) using Linq/Lambda?

I'm having two collections of String like
List<String> l_lstOne = new List<String> { "100", "1X0", "X11", "XXX" },
l_lstTwo = new List<String> { "000", "110", "100", "000" };
I need to compare the two lists and make the second list like
{ "000", "1X0", "X00", "XXX" }
Note:
Both lists will contain the same number of elements, and every element will have the same length.
The comparison works like this:
If the mth element in l_lstOne has an 'X' in the nth position, then the nth position of the mth element in l_lstTwo should be replaced by 'X'.
Example
l_lstOne l_lstTwo Output
100 000 000
1X0 110 1X0
X11 100 X00
So, to solve this i used nested for loop , here is my source code,
for (int l_nIndex = 0; l_nIndex < l_lstTwo.Count; l_nIndex++)
{
    String l_strX = String.Empty;
    for (int l_nInnerIndex = 0; l_nInnerIndex < l_lstTwo[l_nIndex].Length; l_nInnerIndex++)
    {
        l_strX += l_lstOne[l_nIndex][l_nInnerIndex] == 'X' ? 'X' : l_lstTwo[l_nIndex][l_nInnerIndex];
    }
    l_lstTwo[l_nIndex] = l_strX;
}
This code works, but it takes a long time to execute: almost 600 milliseconds to process 200000 elements, each of length 16.
Moreover, I need a Linq or Lambda method to solve this. So please help me to do this. Thanks in advance.
LINQ will not help you here; LINQ is not meant to modify collections.
You can make your code substantially faster by building a char[] instead of a string; right now, you're building 3.2 million string objects because of the +=.
Instead, you can write
char[] l_strX = new char[l_lstTwo[l_nIndex].Length];
for (int l_nInnerIndex = 0; l_nInnerIndex < l_strX.Length; l_nInnerIndex++)
{
    l_strX[l_nInnerIndex] = l_lstOne[l_nIndex][l_nInnerIndex] == 'X' ? 'X' : l_lstTwo[l_nIndex][l_nInnerIndex];
}
l_lstTwo[l_nIndex] = new string(l_strX);
You could do it with the following statement in .NET 3.5
IEnumerable <String> result =
Enumerable.Range(0, l_lstOne.Count)
.Select(i => Enumerable.Range(0, l_lstOne[i].Length)
.Aggregate(string.Empty, (innerResult, x) => innerResult += l_lstOne[i][x] == 'X' ? 'X' : l_lstTwo[i][x]));
Mh, if I understand it correctly the words in l_lstOne act as a mask for the words in l_lstTwo where the mask is transparent unless it's an X. How about this:
l_lstOne.Zip(l_lstTwo,
    (w1, w2) => new String(w1.Zip(w2, (c1, c2) => c1 == 'X' ? c1 : c2).ToArray()))
Zip is a Linq extension method, available from .NET 4 on, which combines the elements of two lists like a zipper. The outer Zip creates the word pairs to iterate over, and the inner one applies the mask (take every character from the second word unless the first word has an X in that position).
Also note that this creates a new sequence of strings rather than replacing the ones in l_lstTwo - that's the Linq way of doing things.

Calculating Nth permutation step?

I have a char[26] of the letters a-z and via nested for statements I'm producing a list of sequences like:
aaa, aaz... aba, abb, abz, ... zzy, zzz.
Currently, the software is written to generate the list of all possible values from aaa-zzz and then maintains an index, and goes through each of them performing an operation on them.
The list is obviously large, it's not ridiculously large, but it's gotten to the point where the memory footprint is too large (there are also other areas being looked at, but this is one that I've got).
I'm trying to produce a formula where I can keep the index, but do away with the list of sequences and calculate the current sequence based on the current index (as the time between operations between sequences is long).
Eg:
char[] characters = {a, b, c... z};
int currentIndex = 29; // abd
public string CurrentSequence(int currentIndex)
{
int ndx1 = getIndex1(currentIndex); // = 0
int ndx2 = getIndex2(currentIndex); // = 1
int ndx3 = getIndex3(currentIndex); // = 3
return string.Format(
"{0}{1}{2}",
characters[ndx1],
characters[ndx2],
characters[ndx3]); // abd
}
I've tried working out a small example using a subset (abc) and trying to index into that using modulo division, but I'm having trouble thinking today and I'm stumped.
I'm not asking for an answer, just any kind of help. Maybe a kick in the right direction?
Hint: Think of how you would print a number in base 26 instead of base 10, and with letters instead of digits. What's the general algorithm for displaying a number in an arbitrary base?
Spoiler: (scroll right to view)
int ndx1 = currentIndex / 26 / 26 % 26;
int ndx2 = currentIndex / 26 % 26;
int ndx3 = currentIndex % 26;
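The same digit-by-digit idea extends to any sequence length and any alphabet. The following is a minimal sketch; SequenceIndex and FromIndex are hypothetical names, and it assumes the index lies between 0 and alphabet.Length raised to the power of length, minus one.

```csharp
using System;

static class SequenceIndex
{
    // Writes index as a fixed-width number in base alphabet.Length,
    // using the alphabet's characters as the digits.
    public static string FromIndex(int index, int length, string alphabet)
    {
        var digits = new char[length];
        for (int position = length - 1; position >= 0; position--)
        {
            digits[position] = alphabet[index % alphabet.Length]; // lowest digit first
            index /= alphabet.Length;                             // shift to the next digit
        }
        return new string(digits);
    }

    static void Main()
    {
        const string alphabet = "abcdefghijklmnopqrstuvwxyz";
        Console.WriteLine(FromIndex(0, 3, alphabet));     // aaa
        Console.WriteLine(FromIndex(29, 3, alphabet));    // abd
        Console.WriteLine(FromIndex(17575, 3, alphabet)); // zzz
    }
}
```

Because the string is computed from the index alone, nothing but the current index needs to be kept in memory.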
Something like this ought to work, assuming 26 characters:
public string CurrentSequence(int currentIndex) {
    // Build the string from a char[]; adding chars with + would produce an int.
    return new string(new[] {
        characters[currentIndex / (26 * 26)],
        characters[(currentIndex / 26) % 26],
        characters[currentIndex % 26]
    });
}
Wow, two questions in one day that can be solved via Cartesian products. Amazing.
You can use Eric Lippert's LINQ snippet to generate all combinations of the index values. This approach results in a streaming set of values, so they don't require storage in memory. This approach nicely separates the logic of generating the codes from maintaining state or performing computation with the code.
Eric's code for all combinations:
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
    return sequences.Aggregate(
        emptyProduct,
        (accumulator, sequence) =>
            from accseq in accumulator
            from item in sequence
            select accseq.Concat(new[] {item}));
}
You can now write:
public static IEnumerable<string> AllCodes()
{
    char[] characters = {a, b, c... z};
    IEnumerable<char[]> codeSets = new[] { characters, characters, characters };
    foreach( var codeValues in codeSets.CartesianProduct() )
    {
        // codeValues is an IEnumerable<char>, so it cannot be indexed;
        // concatenate its three characters instead.
        yield return string.Concat(codeValues);
    }
}
The code above generates a streaming sequence of all code strings from aaa to zzz. You can now use this elsewhere where you perform your processing:
foreach( var code in AllCodes() )
{
// use the code value somehow...
}
There are multiple ways to solve your problem, but an option is to generate the sequence on the fly instead of storing it in a list:
IEnumerable<String> Sequence() {
    for (var c1 = 'a'; c1 <= 'z'; ++c1)
        for (var c2 = 'a'; c2 <= 'z'; ++c2)
            for (var c3 = 'a'; c3 <= 'z'; ++c3)
                yield return String.Format("{0}{1}{2}", c1, c2, c3);
}
You can then enumerate all the strings:
foreach (var s in Sequence())
    Console.WriteLine(s);
This code doesn't use indices at all but it allows you to create a loop around the sequence of strings using simple code and without storing the strings.

Algorithm for dynamic combinations

My code has a list called INPUTS, that contains a dynamic number of lists, let's call them A, B, C, .. N. These lists contain a dynamic number of Events
I would like to call a function with each combination of Events. To illustrate with an example:
INPUTS: A(0,1,2), B(0,1), C(0,1,2,3)
I need to call my function this many times for each combination (the input count is dynamic, in this example it is three parameter, but it can be more or less)
function(A[0],B[0],C[0])
function(A[0],B[1],C[0])
function(A[0],B[0],C[1])
function(A[0],B[1],C[1])
function(A[0],B[0],C[2])
function(A[0],B[1],C[2])
function(A[0],B[0],C[3])
function(A[0],B[1],C[3])
function(A[1],B[0],C[0])
function(A[1],B[1],C[0])
function(A[1],B[0],C[1])
function(A[1],B[1],C[1])
function(A[1],B[0],C[2])
function(A[1],B[1],C[2])
function(A[1],B[0],C[3])
function(A[1],B[1],C[3])
function(A[2],B[0],C[0])
function(A[2],B[1],C[0])
function(A[2],B[0],C[1])
function(A[2],B[1],C[1])
function(A[2],B[0],C[2])
function(A[2],B[1],C[2])
function(A[2],B[0],C[3])
function(A[2],B[1],C[3])
This is what I have thought of so far:
My approach so far is to build a list of combinations. The element combination is itself a list of "index" to the input arrays A, B and C. For our example:
my list iCOMBINATIONS contains the following iCOMBO lists
(0,0,0)
(0,1,0)
(0,0,1)
(0,1,1)
(0,0,2)
(0,1,2)
(0,0,3)
(0,1,3)
(1,0,0)
(1,1,0)
(1,0,1)
(1,1,1)
(1,0,2)
(1,1,2)
(1,0,3)
(1,1,3)
(2,0,0)
(2,1,0)
(2,0,1)
(2,1,1)
(2,0,2)
(2,1,2)
(2,0,3)
(2,1,3)
Then I would do this:
foreach( iCOMBO in iCOMBINATIONS)
{
foreach ( P in INPUTS )
{
COMBO.Clear()
foreach ( i in iCOMBO )
{
COMBO.Add( P[ iCOMBO[i] ] )
}
function( COMBO ) --- (instead of passing the events separately)
}
}
But I need to find a way to build the list iCOMBINATIONS for any given number of INPUTS and their events. Any ideas?
Is there actually a better algorithm than this?
any pseudo code to help me with will be great.
C# (or VB)
Thank You
You can use an array to hold the indexes for each list. Example:
List<List<int>> lists = new List<List<int>> {
    new List<int> { 0,1,2 },
    new List<int> { 0,1 },
    new List<int> { 0,1,2,3 }
};
int[] cnt = new int[lists.Count];
int index;
do {
    Console.WriteLine(String.Join(",", cnt.Select((c,i) => lists[i][c].ToString()).ToArray()));
    index = cnt.Length - 1;
    do {
        cnt[index] = (cnt[index] + 1) % lists[index].Count;
    } while(cnt[index--] == 0 && index != -1);
} while (index != -1 || cnt[0] != 0);
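The counter array behaves like an odometer: the last position advances on every step, and a wrap-around carries into the position to its left. Here is a minimal sketch of the same idea packaged as a reusable iterator, so that each combination can be handed to a function. Odometer and Combinations are hypothetical names, and the string items are placeholders for the question's Events.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Odometer
{
    // Lazily yields one T[] per combination, taking one item from each list.
    // The last list varies fastest, like the rightmost wheel of an odometer.
    public static IEnumerable<T[]> Combinations<T>(IReadOnlyList<IReadOnlyList<T>> lists)
    {
        if (lists.Count == 0 || lists.Any(list => list.Count == 0))
            yield break; // nothing to choose from

        var counters = new int[lists.Count];
        while (true)
        {
            yield return counters.Select((c, i) => lists[i][c]).ToArray();

            // Advance the rightmost counter; on wrap-around, carry to the left.
            int position = counters.Length - 1;
            while (position >= 0 && ++counters[position] == lists[position].Count)
            {
                counters[position] = 0;
                position--;
            }
            if (position < 0)
                yield break; // every counter wrapped: all combinations are done
        }
    }

    static void Main()
    {
        var inputs = new List<IReadOnlyList<string>>
        {
            new[] { "A0", "A1", "A2" },
            new[] { "B0", "B1" },
            new[] { "C0", "C1", "C2", "C3" },
        };
        foreach (string[] combo in Combinations(inputs))
            Console.WriteLine(string.Join(",", combo)); // 24 lines, A0,B0,C0 first
    }
}
```

The number of input lists is discovered at run time, so the same loop works for three inputs or thirty. Note that the order differs from the listing in the question, where B varies fastest.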
This is a permutation problem. You may want to take a look at this:
http://www.interact-sw.co.uk/iangblog/2004/09/16/permuterate
I had a similar problem some time ago (generating combinations) and used the code from http://www.merriampark.com/comb.htm. It's Java, but I had no problems translating it into C#.
Put A,B,C in matrix!
M=[A,B,C]
recursive_caller(d,params):
if d == len(M):
function(params)
return
for i in M[d]:
params[d]=i
recursive_caller(d+1,params)
It would seem that what you really want, is neither a permutation, nor a combination, per se. You want to look at the cartesian product (see here) of several sets, the iteration over which may involve iterating through combinations of individual sets.
However, this is unlike a combination problem, because you are looking for the ways to choose 1 element from each set. The number of ways to do this is the product of the sizes of the sets (3 * 2 * 4 = 24 in the question's example). Combination problems usually involve choosing k-many things from a set of n-many things, where k=1 or k=n is trivial.
Several methods of producing iterators in C# have been discussed here. (Including one by Jon Skeet).
If you are using .NET, you may also be interested in developed combinatorics modules, such as KwCombinatorics at CodePlex.
edit Now, with LINQ to the rescue:
private void cartesian1()
{
textAppend("Cartesian 1");
var setA = new[] { "whole wheat", "white", "rye" };
var setB = new[] { "cold cut", "veggie", "turkey", "roast beef" };
var setC = new[] { "everything", "just mayo" };
var query =
from bread in setA
from meat in setB
from toppings in setC
let sandwich = String.Format("{1} on {0} with {2}",
bread, meat, toppings)
select sandwich;
foreach( string sandwich in query )
{
textAppend(sandwich);
}
}
A modified version of @Guffa's answer. I am by no means the creator of this code.
List<int> lists = new List<int> { 3, 2, 4 };
int[] cnt = new int[lists.Count];
int index;
do
{
    Console.WriteLine(String.Join(",", cnt));
    index = cnt.Length - 1;
    do
    {
        cnt[index] = (cnt[index] + 1) % lists[index];
    } while (cnt[index--] == 0 && index != -1);
} while (index != -1 || cnt[0] != 0);
Instead of using a List<List<int>> of possible values, use a List<int> describing the number of elements in each collection. The output is the same as in the original answer. The performance is better.

Best way to reduce sequences in an array of strings

Please, now that I've re-written the question, and before it suffers from further fast-gun answers or premature closure by eager editors let me point out that this is not a duplicate of this question. I know how to remove duplicates from an array.
This question is about removing sequences from an array, not duplicates in the strict sense.
Consider this sequence of elements in an array;
[0] a
[1] a
[2] b
[3] c
[4] c
[5] a
[6] c
[7] d
[8] c
[9] d
In this example I want to obtain the following...
[0] a
[1] b
[2] c
[3] a
[4] c
[5] d
Notice that duplicate elements are retained but that sequences of the same element have been reduced to a single instance of that element.
Further, notice that when two lines repeat they should be reduced to one set (of two lines).
[0] c
[1] d
[2] c
[3] d
...reduces to...
[0] c
[1] d
I'm coding in C# but algorithms in any language appreciated.
EDIT: made some changes and new suggestions
What about a sliding window...
REMOVE LENGTH 2: (no other length has other matches)
//the lower case letters are the matches
ABCBAbabaBBCbcbcbVbvBCbcbcAB
__ABCBABABABBCBCBCBVBVBCBCBCAB
REMOVE LENGTH 1 (duplicate characters):
//* denote that a string was removed to prevent continual contraction
//of the string, unless this is what you want.
ABCBA*BbC*V*BC*AB
_ABCBA*BBC*V*BC*AB
RESULT:
ABCBA*B*C*V*BC*AB == ABCBABCVBCAB
This is of course starting with length=2, increase it to L/2 and iterate down.
I'm also thinking of two other approaches:
digraph - Build a stateful digraph from the data and walk it with the string; if a cycle is found, you have a duplication. I'm not sure how easy it is to check for these cycles... possibly some dynamic programming, so it could be equivalent to the distance matrix method below. I'm going to have to think about this one longer as well.
distance matrix - Using a Levenshtein distance matrix, you might be able to detect duplication from diagonal movement (off the main diagonal) with cost 0. This could indicate duplication of data. I will have to think about this more.
Here's a C# app I wrote that solves this problem.
takes
aabccacdcd
outputs
abcacd
Probably looks pretty messy, took me a bit to get my head around the dynamic pattern length bit.
class Program
{
private static List<string> values;
private const int MAX_PATTERN_LENGTH = 4;
static void Main(string[] args)
{
values = new List<string>();
values.AddRange(new string[] { "a", "a", "b", "c", "c", "a", "c", "d", "c", "d" });
for (int i = MAX_PATTERN_LENGTH; i > 0; i--)
{
RemoveDuplicatesOfLength(i);
}
foreach (string s in values)
{
Console.WriteLine(s);
}
}
private static void RemoveDuplicatesOfLength(int dupeLength)
{
for (int i = 0; i < values.Count; i++)
{
if (i + dupeLength > values.Count)
break;
if (i + dupeLength + dupeLength > values.Count)
break;
var patternA = values.GetRange(i, dupeLength);
var patternB = values.GetRange(i + dupeLength, dupeLength);
bool isPattern = ComparePatterns(patternA, patternB);
if (isPattern)
{
values.RemoveRange(i, dupeLength);
}
}
}
private static bool ComparePatterns(List<string> pattern, List<string> candidate)
{
for (int i = 0; i < pattern.Count; i++)
{
if (pattern[i] != candidate[i])
return false;
}
return true;
}
}
fixed the initial values to match the questions values
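One case worth checking is a pattern that repeats more than twice in a row (a, a, a): because the loop index moves on after each removal, a copy can be left behind. The following is a sketch of a variation that stays at the same position after a removal. SequenceReducer and Reduce are hypothetical names, and like the app above it assumes longer patterns should be collapsed before shorter ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceReducer
{
    // Collapses immediately repeated patterns, trying the longest pattern length first.
    public static List<string> Reduce(IEnumerable<string> input, int maxPatternLength)
    {
        List<string> values = input.ToList();
        for (int length = maxPatternLength; length >= 1; length--)
        {
            int i = 0;
            while (i + 2 * length <= values.Count)
            {
                bool repeats = values.GetRange(i, length)
                    .SequenceEqual(values.GetRange(i + length, length));
                if (repeats)
                    values.RemoveRange(i, length); // stay at i: more copies may follow
                else
                    i++;
            }
        }
        return values;
    }

    static void Main()
    {
        string[] input = { "a", "a", "b", "c", "c", "a", "c", "d", "c", "d" };
        Console.WriteLine(string.Join("", Reduce(input, 4))); // abcacd
    }
}
```

The result still depends on the order in which pattern lengths are tried, so this is one reasonable reading of the question rather than the only one.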
I would dump them all into your favorite Set implementation.
EDIT: Now that I understand the question, your original solution looks like the best way to do this. Just loop through the array once, keeping an array of flags to mark which elements to keep, plus a counter to keep track of the size of the new array. Then loop through again to copy all the keepers to a new array.
I agree that if you can just dump the strings into a Set, then that might be the easiest solution.
If you don't have access to a Set implementation for some reason, I would just sort the strings alphabetically and then go through once and remove the duplicates. How to sort them and remove duplicates from the list will depend on what language and environment you are running your code.
EDIT: Oh, ick.... I see based on your clarification that you expect that patterns might occur even over separate lines. My approach won't solve your problem. Sorry. Here is a question for you. If I had the following file.
a
a
b
c
c
a
a
b
c
c
Would you expect it to simplify to
a
b
c
