How to remove duplicate rows from nested list?

I have a nested list that I use as a 2D array:
public List<List<float?>> arrayFloatValue = new List<List<float?>>();
The parent list holds 2000 columns, and each child list holds 20000 float values. Now I want to find repeated rows and remove them from the child lists. Below is sample code.
//Define capacity
int ColumnsCount = 2000;
int RowsCount = 20000;
//Create the parent list with ColumnsCount (2000) slots; ToList() needs using System.Linq
List<List<float?>> arrayFloatValue = new List<float?>[ColumnsCount].ToList();
//Initiate child list i.e. 20000
for (int i = 0; i < ColumnsCount; i++)
{
arrayFloatValue[i] = new float?[RowsCount].ToList();
}
//Fill dummy data.
for (int x = 0; x < ColumnsCount; x++)
{
for (int y = 0; y < RowsCount; y++)
{
if (y % 50 != 0)
arrayFloatValue[x][y] = x + y; // Assign dummy value
else
arrayFloatValue[x][y] = 0; // Forcefully 0 value added for each 50th row.
}
}
Now the data looks like this (one column per child list, one line per row index):
// col: [0] [1] [2] ...
// [0] [0] [0] ...
// [1] [2] [3] ...
// [2] [3] [4] ...
// [3] [4] [5] ...
// [4] [5] [6] ...
// [5] [6] [7] ...
// [6] [7] [8] ...
// [7] [8] [9] ...
// [8] [9] [10] ...
// [9] [10] [11] ...
// ... ... ...
// [49] [50] [51] ...
// [0] [0] [0] ...
//
// And so on..
//
Now I want to remove the repeated values in each column. In the example above, the value 0 repeats at every 50th row index (50, 100, 150, and so on), and I want to remove those rows.
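In this layout each child list is a column, so a "row" is the set of values at one index across all child lists. The following is a minimal sketch of removing repeated rows under that reading, keeping the first occurrence of each row. It assumes every column has the same length; `RowDedup`, `RowIndexComparer`, and `RemoveDuplicateRows` are hypothetical names. Instead of copying rows into arrays, the comparer compares row indices by reading each column at both indices, which avoids allocating a second copy of the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: removes repeated rows from column-major data,
// where columns[c][r] is the value of column c in row r.
public static class RowDedup
{
    // Treats two row indices as equal when every column holds equal values at both.
    private sealed class RowIndexComparer : IEqualityComparer<int>
    {
        private readonly List<List<float?>> _columns;
        public RowIndexComparer(List<List<float?>> columns) => _columns = columns;

        public bool Equals(int a, int b)
        {
            foreach (var col in _columns)
                if (!EqualityComparer<float?>.Default.Equals(col[a], col[b])) return false;
            return true;
        }

        public int GetHashCode(int r)
        {
            unchecked
            {
                int hash = 17;
                foreach (var col in _columns)
                    hash = hash * 31 + col[r].GetHashCode(); // a null float? hashes to 0
                return hash;
            }
        }
    }

    // Keeps the first occurrence of each row and drops later repeats.
    // Assumes every column has the same number of rows.
    public static List<List<float?>> RemoveDuplicateRows(List<List<float?>> columns)
    {
        int rowCount = columns.Count == 0 ? 0 : columns[0].Count;
        var seen = new HashSet<int>(new RowIndexComparer(columns));
        var keep = new List<int>();
        for (int r = 0; r < rowCount; r++)
            if (seen.Add(r))      // Add returns false when an equal row was already seen
                keep.Add(r);

        // Rebuild each column from the kept row indices, preserving order.
        return columns.Select(col => keep.Select(r => col[r]).ToList()).ToList();
    }
}
```

With the sample data from the question, this keeps row 0 and drops rows 50, 100, 150, and so on, since the all-zero row is the only one that repeats.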

You can try good old Distinct with a custom IEqualityComparer<T> (we are going to compare lists with SequenceEqual):
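public class ListComparer<T> : IEqualityComparer<IEnumerable<T>> {
public bool Equals(IEnumerable<T> x, IEnumerable<T> y) {
if (ReferenceEquals(x, y)) return true; // same instance, or both null
if (x == null || y == null) return false;
return Enumerable.SequenceEqual(x, y); // same items in the same order
}
// Equal sequences always have equal counts, so Count is a valid (if coarse) hash
public int GetHashCode(IEnumerable<T> obj) {
return obj == null ? -1 : obj.Count();
}
}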
Now call Distinct:
List<List<float?>> list = new List<List<float?>>() {
new List<float?>() { 1, 2, 3},
new List<float?>() { 4, 5, 6, 7},
new List<float?>() { 1, 2, 3},
new List<float?>() { null },
new List<float?>() { 1, 2, null },
new List<float?>() { null },
new List<float?>() { 1, 2 },
};
var result = list
.Distinct(new ListComparer<float?>());
string report = string.Join(Environment.NewLine,
result.Select(line => $"{string.Join(", ", line)}"));
Console.Write(report);
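Outcome (the third line is empty: it is the { null } list, because string.Join writes a null item as an empty string):
1, 2, 3
4, 5, 6, 7

1, 2,
1, 2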
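The GetHashCode above is legal, but every list of the same length lands in the same hash bucket, so Distinct has to run SequenceEqual against every distinct list of that length kept so far; with 2000 lists of 20000 values, that is every list. A sketch of a comparer with the same equality that also hashes the contents (the name `ContentListComparer<T>` is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of ListComparer<T>: same equality, but the hash also
// depends on the elements, so lists of equal length rarely collide.
public class ContentListComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (T item in obj)
                // Order-sensitive combination; null items contribute 0.
                hash = hash * 31 + (item == null ? 0 : EqualityComparer<T>.Default.GetHashCode(item));
            return hash;
        }
    }
}
```

It is used exactly like the original: `list.Distinct(new ContentListComparer<float?>())`.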

If I understand your question correctly, you can use a HashSet to filter your list, but you have to define an IEqualityComparer<List<float>> that checks the elements for equality.
Here is an example:
using System;
using System.Collections.Generic;
using System.Linq;
namespace MyNamespace
{
public class Program
{
public static void Main()
{
List<List<float>> arrayFloatValue = new List<List<float>>
{
new List<float> {1, 2, 3},
new List<float> {1, 3, 2},
new List<float> {1, 2, 3},
new List<float> {3, 5, 7}
};
var hsArrayFloatValue = new HashSet<List<float>>(arrayFloatValue, new ListComparer());
List<List<float>> filteredArrayFloatValue = hsArrayFloatValue.ToList();
DisplayNestedList(filteredArrayFloatValue);
//output:
//1 2 3
//1 3 2
//3 5 7
}
public static void DisplayNestedList(List<List<float>> nestedList)
{
foreach (List<float> list in nestedList)
{
foreach (float f in list)
Console.Write(f + " ");
Console.WriteLine();
}
Console.ReadLine();
}
}
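public class ListComparer : IEqualityComparer<List<float>>
{
public bool Equals(List<float> x, List<float> y)
{
if (x == null && y == null)
return true;
if (x == null || y == null || x.Count != y.Count)
return false;
// equal when no position holds different values
return !x.Where((t, i) => t != y[i]).Any();
}
public int GetHashCode(List<float> obj)
{
// Combine the element hashes in order. OR-ing them (result |= ...) is valid
// but weak: {1, 2, 3} and {1, 3, 2} would always get the same hash.
int result = 17;
unchecked
{
foreach (float f in obj)
result = result * 31 + f.GetHashCode();
}
return result;
}
}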
}
That said, I don't recommend comparing floats for exact equality; consider using decimal instead.
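The reason: binary floating-point types (float, double) cannot represent most decimal fractions exactly, so values that look equal can differ in the last bit, while decimal stores base-10 digits and represents such fractions exactly. A small illustration (the class name `FloatEqualityDemo` is hypothetical):

```csharp
using System;

public static class FloatEqualityDemo
{
    // 0.1 and 0.2 have no exact binary representation; their double sum is
    // 0.30000000000000004, which is not the double nearest to 0.3.
    public static bool DoubleSumEquals() => 0.1 + 0.2 == 0.3;   // false

    // decimal stores base-10 digits, so the same sum is exact.
    public static bool DecimalSumEquals() => 0.1m + 0.2m == 0.3m; // true
}
```

Note that decimal only helps when the data really are decimal fractions; a value such as 1/3 is inexact in decimal too.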

Related

Most efficient way to distribute non unique elements across multiple lists

Suppose that I have a list of integer or whatever
List<int> motherlist = new List<int> { 1, 1, 2, 5, 7, 2, 2, 2, 6, 1 };
Console.WriteLine(motherlist.Count); // 10
I would like to find all duplicates and, instead of removing them from the list, distribute them across other lists, so that the total count of all children equals the count of motherlist:
var children = new List<List<int>> { new List<int> { 1, 2, 5, 7, 6 }, new List<int> { 1, 2 }, new List<int> { 1, 2 }, new List<int> { 2 } };
Console.WriteLine(children.Sum(l => l.Count)); // 10, same as motherlist
So far I have tried a brute-force approach: loop through all elements of motherlist, compare each element with all the others to check for duplicates, and, if a duplicate is found, add it to a list of buckets (a List of Lists), and so on until the last element.
But the brute-force approach takes 7 CPU seconds for a mother list of only 300 items.
I imagine that with 1000 items this would take forever.
Is there a faster way to do this in C# .NET?

I suggest grouping the duplicates and then looping, taking the size of each group into account:
public static IEnumerable<List<T>> MyDo<T>(IEnumerable<T> source,
IEqualityComparer<T> comparer = null) {
if (null == source)
throw new ArgumentNullException(nameof(source));
var groups = new Dictionary<T, List<T>>(comparer ?? EqualityComparer<T>.Default);
int maxLength = 0;
foreach (T item in source) {
if (!groups.TryGetValue(item, out var list))
groups.Add(item, list = new List<T>());
list.Add(item);
maxLength = Math.Max(maxLength, list.Count);
}
for (int i = 0; i < maxLength; ++i) {
List<T> result = new List<T>();
foreach (var value in groups.Values)
if (i < value.Count)
result.Add(value[i]);
yield return result;
}
}
Demo:
int[] source = new int[] { 1, 1, 2, 5, 7, 2, 2, 2, 6, 1 };
var result = MyDo(source).ToList();
string report = string.Join(Environment.NewLine, result
.Select(line => $"[{string.Join(", ", line)}]"));
Console.Write(report);
Outcome:
[1, 2, 5, 7, 6]
[1, 2]
[1, 2]
[2]
Stress Demo:
Random random = new Random(1234); // seed, the results to be reproducible
// 1000 items shouldn't take forever; let's try 1_000_000 items
int[] source = Enumerable
.Range(1, 1_000_000)
.Select(x => random.Next(1, 1000))
.ToArray();
Stopwatch sw = new Stopwatch(); // Stopwatch is in System.Diagnostics
sw.Start();
var result = MyDo(source).ToList();
sw.Stop();
Console.WriteLine($"Time: {sw.ElapsedMilliseconds} ms");
Outcome (may vary from workstation to workstation):
Time: 50 ms

I would GroupBy the elements of the list and then use each group's count to know how many sublists the element has to be added to:
List<int> motherlist = new List<int> { 1, 1, 2, 5, 7, 2, 2, 2, 6, 1 };
var childrens = motherlist.GroupBy(x => x).OrderByDescending(x => x.Count());
var result = new List<List<int>>();
foreach (var children in childrens)
{
for (var i = 0; i < children.Count(); i++)
{
if (result.Count() <= i) result.Add(new List<int>());
result[i].Add(children.Key);
}
}
Console.WriteLine("{");
foreach (var res in result)
{
Console.WriteLine($"\t{{ { string.Join(", ", res) } }}");
}
Console.WriteLine("}");
This outputs:
{
{ 2, 1, 5, 7, 6 }
{ 2, 1 }
{ 2, 1 }
{ 2 }
}

Just a quick attempt, but it seems to work quite well:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp2
{
class Program
{
static void Main(string[] args)
{
List<int> motherlist = new List<int> { 1, 1, 2, 5, 7, 2, 2, 2, 6, 1 };
var rnd = new Random(1);
for (int i = 0; i < 1000; i++)
{
motherlist.Add(rnd.Next(1, 200));
}
var resultLists = new List<IEnumerable<int>>();
while (motherlist.Any())
{
var subList = motherlist.Distinct().OrderBy(x => x).ToList();
subList.ForEach(x => motherlist.Remove(x));
resultLists.Add(subList);
}
}
}
}

You can use a Dictionary<int, int> to keep track of the number of occurrences of each element and build the child lists in a single pass, without any LINQ. This runs in O(n) time on average, since dictionary lookups are O(1) on average:
var motherlist = new List<int>() { 1, 1, 2, 5, 7, 2, 2, 2, 6, 1 };
var counts = new Dictionary<int, int>();
var children = new List<List<int>>();
foreach(var element in motherlist)
{
counts.TryGetValue(element, out int count);
counts[element] = ++count;
if (children.Count < count)
{
children.Add(new List<int>() { element });
}
else
{
children[count - 1].Add(element);
}
}
OUTPUT
{ 1, 2, 5, 7, 6 }
{ 1, 2 }
{ 2, 1 }
{ 2 }

How to check whether a two-dimensional array is present in a List?

I have two lists of two-dimensional arrays (each a 4x4 matrix):
List<double[,]> list1 = new List<double[,]>();
List<double[,]> list2 = new List<double[,]>();
The length of lists are not necessarily equal.

What you have does not work because Contains will do a reference comparison to check equality when iterating the list. Unless your 2d arrays in each list refer to the same object reference, even if they're semantically the same it would not identify them as being equal.
For example, in this case the match would be found:
var my2d = new double[2, 2] { { 1, 3 }, { 3, 5 } };
List<double[,]> list1 = new List<double[,]>() { my2d };
List<double[,]> list2 = new List<double[,]>() { my2d };
foreach (var matrix in list1)
if (list2.Contains(matrix))
Console.WriteLine("FOUND!");
But if we change the lists to have separate instances of the 2d array, it would not:
List<double[,]> list1 = new List<double[,]>() { new double[2, 2] { { 1, 3 }, { 3, 5 } } };
List<double[,]> list2 = new List<double[,]>() { new double[2, 2] { { 1, 3 }, { 3, 5 } } };
One way to overcome this is to pass your own IEqualityComparer to the LINQ Contains overload (Enumerable.Contains, in System.Linq), telling it how to perform the comparison. For example, here is a comparer that compares two-dimensional arrays element by element:
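public class TwoDimensionCompare<T> : IEqualityComparer<T[,]>
{
public bool Equals(T[,] x, T[,] y)
{
if (ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
// fail fast if the sizes aren't the same
if (y.GetLength(0) != x.GetLength(0)) return false;
if (y.GetLength(1) != x.GetLength(1)) return false;
// compare element by element
for (int i = 0; i < y.GetLength(0); i++)
for (int z = 0; z < y.GetLength(1); z++)
if (!EqualityComparer<T>.Default.Equals(x[i, z], y[i, z])) return false;
return true;
}
// Arrays that Equals considers equal must get equal hash codes, so hash the
// contents; obj.GetHashCode() would be reference-based.
public int GetHashCode(T[,] obj)
{
if (obj == null) return 0;
unchecked
{
int hash = 17;
foreach (T item in obj)
hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
return hash;
}
}
}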
Usage:
List<double[,]> list1 = new List<double[,]>() { new double[2, 2] { { 1, 3 }, { 3, 5 } } };
List<double[,]> list2 = new List<double[,]>() { new double[2, 2] { { 1, 3 }, { 3, 5 } } };
foreach (var matrix in list1)
if (list2.Contains(matrix, new TwoDimensionCompare<double>()))
Console.WriteLine("FOUND!");
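If you need to test many matrices, a hash-based lookup avoids scanning list2 for each one. The following is a sketch of a more compact comparer: multidimensional arrays implement the non-generic IEnumerable and enumerate their elements in row-major order, so Cast<T>() flattens them and SequenceEqual does the element comparison after a shape check. Its hash is derived from the shape and contents, which lets it back a HashSet. `MatrixComparer<T>` and `MatrixLookup.Common` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical compact comparer for 2D arrays.
public sealed class MatrixComparer<T> : IEqualityComparer<T[,]>
{
    public bool Equals(T[,] x, T[,] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Same shape, then same elements in row-major order.
        return x.GetLength(0) == y.GetLength(0)
            && x.GetLength(1) == y.GetLength(1)
            && x.Cast<T>().SequenceEqual(y.Cast<T>());
    }

    public int GetHashCode(T[,] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            int hash = obj.GetLength(0) * 397 ^ obj.GetLength(1);
            foreach (T item in obj)
                hash = hash * 31 + (item == null ? 0 : EqualityComparer<T>.Default.GetHashCode(item));
            return hash;
        }
    }
}

public static class MatrixLookup
{
    // Returns the matrices in list1 that also occur (by value) in list2.
    // The HashSet is built once, so each lookup costs one hash of the matrix
    // on average instead of a scan over all of list2.
    public static List<double[,]> Common(List<double[,]> list1, List<double[,]> list2)
    {
        var set = new HashSet<double[,]>(list2, new MatrixComparer<double>());
        return list1.Where(m => set.Contains(m)).ToList();
    }
}
```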

Why is my array index out-of-bounds in this algorithm?

So I'm doing a little practice exercise, which is explained by the comments in the code below:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static int[,] GetPairs ( int [] arr )
{
// given an array arr of unique integers, returns all the pairs
// e.g. GetPairs(new int [] { 1, 2, 3, 4, 5 }) would return
// { {1, 2}, {1, 3}, {1, 4}, {1, 5}, {2, 3}, {2, 4}, {2, 5}, {3, 4}, {3, 5}, {4, 5} }
int n = (arr.Length * (arr.Length - 1))/2; // number of unique pairs in an array of unique ints
if ( n < 1 ) return new int[0,2] {}; // if array is empty or length 1
int[,] pairs = new int[n,2]; // array to store unique pairs
// populate the pairs array:
for ( int i = 0, j = 0; i < arr.Length; ++i )
{
for ( int k = i + 1; k < arr.Length; ++k )
{
pairs[j,0] = arr[i];
pairs[j,1] = arr[k];
++j;
}
}
return pairs;
}
public static void Main()
{
int [] OneThroughFour = new int [4] { 1, 2, 3, 4 };
int [,] Pairs = GetPairs(OneThroughFour);
for ( int i = 0; i < Pairs.Length; ++i )
{
Console.WriteLine("{0},{1}",Pairs[i,0],Pairs[i,1]);
}
}
}
and the error I'm getting is
[System.IndexOutOfRangeException: Index was outside the bounds of the array.]
in the loop
for ( int i = 0; i < Pairs.Length; ++i )
{
Console.WriteLine("{0},{1}",Pairs[i,0],Pairs[i,1]);
}
which doesn't make any sense to me. What is out-of-bounds? Surely not the i, for it is in the range 0, 1, ..., Pairs.Length - 1. Surely not the 0 or 1, for those are valid indices.
Also, is it possible to do this better than O(n^2) and is there a way with .NET that is more compact and efficient?

For two-dimensional arrays, the Length property returns the total number of elements: the length of the first dimension multiplied by the length of the second. In your case that is n * 2, so i runs past the last row.
What you want, as far as I can tell, is to loop through the first dimension only.
Use the GetLowerBound and GetUpperBound methods like this:
for (int i = Pairs.GetLowerBound(0); i <= Pairs.GetUpperBound(0); ++i)
{
//...
}
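A complete sketch of the fix, using GetLength(0) (the number of rows) as the loop bound. On the complexity question: an array of k elements has k(k-1)/2 pairs, so any method that returns all of them must do at least O(k^2) work, and the nested loop is already optimal up to a constant factor. A more compact (not faster) LINQ formulation is included that returns value tuples instead of a 2D array; `PairsDemo` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairsDemo
{
    // Same algorithm as GetPairs in the question.
    public static int[,] GetPairs(int[] arr)
    {
        int n = arr.Length * (arr.Length - 1) / 2; // number of unique pairs
        var pairs = new int[n, 2];                 // for n == 0 this is an empty 0x2 array
        for (int i = 0, j = 0; i < arr.Length; ++i)
            for (int k = i + 1; k < arr.Length; ++k, ++j)
            {
                pairs[j, 0] = arr[i];
                pairs[j, 1] = arr[k];
            }
        return pairs;
    }

    // Compact LINQ alternative; still O(k^2) because it yields every pair.
    public static List<(int First, int Second)> GetPairsLinq(int[] arr) =>
        arr.SelectMany((a, i) => arr.Skip(i + 1).Select(b => (First: a, Second: b))).ToList();

    public static void Print(int[,] pairs)
    {
        // GetLength(0) is the number of rows; Length would be rows * 2.
        for (int i = 0; i < pairs.GetLength(0); ++i)
            Console.WriteLine("{0},{1}", pairs[i, 0], pairs[i, 1]);
    }
}
```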

c# - How can I add a value to a multi array?

I have this array here:
float[, ,] vectors;
int pos = 0;
void Start() {
vectors = new float[,,] { {
{ 0, 1, 1 },
{ 0, 2, 2 } }
};
}
This works. I fill the array with numbers.
Now I want to add more values at a given position. How?
These do not work:
vectors[pos] = new float[,,] { { { 33, 44, 55 } } };
or
vectors[pos] = { { { 33, 44, 55 } } };
I searched but didn't find the right answer.
EDIT:
I want something like this:
[0]+
[0] {1, 2, 3},
[1] {4, 5, 6}
[1]+
[0] {11, 22, 33},
[1] {44, 55, 66},
[2] {77, 88, 99}
...
etc.
Now, e.g., I want to add the values {10,10,10} at pos = 0. How?

If you want to add values, I suggest using generic lists instead of arrays. You should also create your own Vector class, or find an existing one that suits your needs, like this:
public class Vector
{
public float X { get; private set; }
public float Y { get; private set; }
public float Z { get; private set; }
public Vector(float x, float y, float z)
{
X = x;
Y = y;
Z = z;
}
}
Then you can do the following
var vectors = new List<List<Vector>>
{
new List<Vector>{
new Vector(0, 1, 1),
new Vector(0, 2, 2)
}
};
vectors[0].Add(new Vector(33,44,55));
And your vectors will contain
[0]
[0] {0, 1, 1}
[1] {0, 2, 2}
[2] {33, 44, 55}
Note that to add to the first dimension, you have to do this:
vectors.Add(new List<Vector>());
vectors[1].Add(new Vector(1, 2, 3));
And now you have
[0]
[0] {0, 1, 1}
[1] {0, 2, 2}
[2] {33, 44, 55}
[1]
[0] {1, 2, 3}

You have to specify the other positions within the array; you are only specifying one. If your problem cannot be solved with lists,
you can try an array of arrays (a jagged array) as follows:
float[][][] x = new float[n][][]; // only the first dimension can be sized here
x[0] = new float[m][]; // second dimension of x[0]
// initialize the third dimension until m is reached
x[0][0] = new float[] { 1, 2, 3, 4, 5 }; // specify whatever values you want
x[0][1] = new float[] { 3, 2, 4 };
x[0][2] = new float[3];
// etc. until m is reached
// do the same for the n dimension (x[1], x[2], ...)

This will work for you. Once an array is allocated its size is fixed, so you cannot add to it in place; you must create a new, larger array that holds the old values plus the new ones:
float[, ,] vectors;
int pos = 0;
vectors = new float[,,]
{
{
{ 0, 1, 2 }, { 0, 3, 4 }
}
};
vectors = new float[,,]
{
{
{vectors[0,0,0], vectors[0,0,1], vectors[0,0,2]}, { vectors[0,1,0], vectors[0,1,1], vectors[0,1,2] }, { 33,44,55}
}
};
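The same idea as a reusable helper: a sketch, assuming you want to append one vector at index `pos` of the first dimension. Because a rectangular float[,,] requires every slice to have the same number of vectors, the helper grows the second dimension for all slices and leaves the new slot of the other slices zero-filled. The names `ArrayGrowth` and `AppendVector` are hypothetical. If the slices need different lengths, the List or jagged-array answers above fit better.

```csharp
using System;

public static class ArrayGrowth
{
    // Returns a new array one slot larger in dimension 1, with 'vector' stored
    // in the new last slot of slice 'pos'. Other slices get zeros in that slot.
    public static float[,,] AppendVector(float[,,] source, int pos, float[] vector)
    {
        int d0 = source.GetLength(0), d1 = source.GetLength(1), d2 = source.GetLength(2);
        if (pos < 0 || pos >= d0) throw new ArgumentOutOfRangeException(nameof(pos));
        if (vector.Length != d2)
            throw new ArgumentException("Vector length must match dimension 2.", nameof(vector));

        var result = new float[d0, d1 + 1, d2];          // new arrays are zero-initialized
        for (int i = 0; i < d0; i++)
            for (int j = 0; j < d1; j++)
                for (int k = 0; k < d2; k++)
                    result[i, j, k] = source[i, j, k];    // copy the existing values

        for (int k = 0; k < d2; k++)
            result[pos, d1, k] = vector[k];               // store the new vector
        return result;
    }
}
```

For the question's example, `vectors = ArrayGrowth.AppendVector(vectors, 0, new float[] { 10, 10, 10 });` would add {10,10,10} at pos = 0.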

Split array into array of arrays

There's an array:
var arr = new int[] { 1, 1, 2, 6, 6, 7, 1, 1, 0 };
Is there a simple way to split it into arrays of consecutive equal values, like this?
var arrs = new int[][] {
new int[] { 1, 1 },
new int[] { 2 },
new int[] { 6, 6 },
new int[] { 7 },
new int[] { 1, 1 },
new int[] { 0 } };
I would prefer a LINQ solution but couldn't find one on my first attempt.

I would write an extension method for this:
public static class SOExtensions
{
public static IEnumerable<IEnumerable<T>> GroupSequenceWhile<T>(this IEnumerable<T> seq, Func<T, T, bool> condition)
{
List<T> list = new List<T>();
using (var en = seq.GetEnumerator())
{
if (en.MoveNext())
{
var prev = en.Current;
list.Add(en.Current);
while (en.MoveNext())
{
if (condition(prev, en.Current))
{
list.Add(en.Current);
}
else
{
yield return list;
list = new List<T>();
list.Add(en.Current);
}
prev = en.Current;
}
if (list.Any())
yield return list;
}
}
}
}
and use it as
var arr = new int[] { 1, 1, 2, 6, 6, 7, 1, 1, 0 };
var result = arr.GroupSequenceWhile((x, y) => x == y).ToList();

var grouped = arr.GroupBy(x => x).Select(x => x.ToArray()); // groups all equal values, wherever they occur
I didn't notice at first that you were after groups of neighbouring values; the following should work for that:
var arr = new[] { 1, 1, 2, 6, 6, 7, 1, 1, 0 };
var groups = new List<int[]>();
for (int i = 0; i < arr.Length; i++)
{
var neighbours = arr.Skip(i).TakeWhile(x => arr[i] == x).ToArray();
groups.Add(neighbours);
i += neighbours.Length - 1; // skip the rest of this run
}

This will do the trick:
var arrs = arr.Select((x, index) =>
{
var ar = arr.Skip(index)
.TakeWhile(a => a == x)
.ToArray();
return ar;
}).Where((x, index) => index == 0 || arr[index - 1] != arr[index]).ToArray();
Basically this will generate an array for each sequence item with a length of 1 or greater and will only choose the arrays which correspond to an item in the original sequence which is either the first element or an element that differs from its predecessor.

You can try this:
int index = 0;
var result = arr.Select(number =>
{
var ar = arr.Skip(index)
.TakeWhile(a => a == number)
.ToArray();
index += ar.Length;
return ar;
}).Where(x => x.Any()).ToArray();

An extension method like the answer by L.B, but a bit more functionally oriented:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(this IEnumerable<T> source, Func<T, T, bool> func)
{
// Check for emptiness explicitly: for value types such as int, firstElement == null is never true
if (!source.Any()) return Enumerable.Empty<IEnumerable<T>>();
var firstElement = source.First();
return source.Skip(1).Aggregate(new
{
current = Tuple.Create(firstElement, ImmutableList<T>.Empty.Add(firstElement)),
results = ImmutableList<ImmutableList<T>>.Empty,
}, (acc, x) =>
func(acc.current.Item1, x)
? new { current = Tuple.Create(x, acc.current.Item2.Add(x)), results = acc.results }
: new { current = Tuple.Create(x, ImmutableList<T>.Empty.Add(x)), results = acc.results.Add(acc.current.Item2) },
x => x.results.Add(x.current.Item2).Select(r => r));
}
Note that the extension method uses Microsoft's immutable collections library (System.Collections.Immutable), which can be installed from NuGet.
Usage:
var arr = new int[] { 1, 1, 2, 6, 6, 7, 1, 1, 0 };
var result = arr.GroupWhile((prev, current) => prev == current);
var printFormattedResult = result.Select((x, i) => Tuple.Create(i, string.Join(",", x)));
foreach (var array in printFormattedResult)
Console.WriteLine("Array {0} = {1}", array.Item1, array.Item2);
Output:
Array 0 = 1,1
Array 1 = 2
Array 2 = 6,6
Array 3 = 7
Array 4 = 1,1
Array 5 = 0
Benchmark
Just for the sake of fun, I tried to benchmark the answers.
I used the following code:
var rnd = new Random();
var arr = Enumerable.Range(0, 100000).Select(x => rnd.Next(10)).ToArray();
var time = Stopwatch.StartNew(); // Stopwatch is in System.Diagnostics
var result = <answer>.ToArray();
Console.WriteLine(time.ElapsedMilliseconds);
And got the following results:
| Solution  | Time   | Complexity |
|-----------|--------|------------|
| L.B       | 3 ms   | O(n)       |
| ebb       | 41 ms  | O(n)       |
| James     | 137 ms | O(n^2)     |
| Robert S. | 155 ms | O(n^2)     |
| Selman22  | 155 ms | O(n^2)     |
The slight time overhead of my solution (the 41 ms) comes from using immutable collections. Adding an item to, e.g., a List<T> modifies that List<T> object, whereas adding an item to an ImmutableList<T> leaves it unchanged and returns a new ImmutableList<T> that contains the new item. Internally the new list shares most of its structure with the old one rather than copying every element, but the extra allocations still add a little overhead.
