I hope you can help me out on this one. I have a List<List<double[]>> and I want to remove everything that is duplicated in it. That is:
1) Within each List<double[]>, some of the double[] arrays are duplicates. I want to keep only the non-duplicate double[] arrays within each List<double[]>. See lists 1 and 5 in the picture.
2) Within the List<List<double[]>>, some of the List<double[]> entries are duplicates. I want to keep only the non-repeated lists. See lists 0 & 2 and lists 1 & 3.
The desired output is shown in the picture.
I have tried the following, but it doesn't work:
public static List<List<double[]>> CleanListOfListsOfDoubleArray(List<List<double[]>> input)
{
    var output = new List<List<double[]>>();
    for (int i = 0; i < input.Count; i++)
    {
        var temp = input[i].Distinct().ToList();
        output.Add(temp);
    }
    return output.Distinct().ToList();
}
Can you please help me on this?
Your code (excluding the ToList collectors) seems logically equivalent to:
return input.Select(t => t.Distinct()).Distinct();
You're trying to use Distinct on collections. That's reasonable, since you expect to get distinct collections.
The problem is that Distinct has no logic for comparing these collections. Without a comparer, it uses EqualityComparer<T>.Default, which for arrays and List<T> means reference equality: two separate arrays with the same contents are not considered equal, because neither type compares its individual members.
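A minimal sketch of that default behaviour (the class and method names are illustrative, not from the question): two arrays with identical contents are still two different objects, so Distinct keeps both, while SequenceEqual compares them element by element.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DefaultEqualityDemo
{
    // Returns how many "distinct" arrays the default comparer sees.
    public static int DefaultDistinctCount()
    {
        var a = new[] { 1.0, 2.0 };
        var b = new[] { 1.0, 2.0 }; // same contents, different instance
        // EqualityComparer<double[]>.Default falls back to reference equality
        return new List<double[]> { a, b }.Distinct().Count();
    }

    // Element-by-element comparison of two arrays with the same contents.
    public static bool ContentsEqual()
    {
        var a = new[] { 1.0, 2.0 };
        var b = new[] { 1.0, 2.0 };
        return a.SequenceEqual(b);
    }
}
```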
There is another overload of Distinct that takes an IEqualityComparer<T> as an argument. To use it, you'll have to implement such a comparer first. A reasonable implementation (adapted from Cédric Bignon's answer) could look like this:
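using System.Collections.Generic;
using System.Linq;

public class ArrayComparer<T> : IEqualityComparer<T[]>
{
    // Equal if same instance, or both non-null with equal elements in the same order
    public bool Equals(T[] x, T[] y)
    {
        return ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y));
    }

    // A constant hash is valid (equal arrays share it) but puts every array in one bucket
    public int GetHashCode(T[] obj)
    {
        return 0;
    }
}

public class ListOfArrayComparer<T> : IEqualityComparer<List<T[]>>
{
    // Equal if the lists' arrays are pairwise equal according to ArrayComparer<T>
    public bool Equals(List<T[]> x, List<T[]> y)
    {
        return ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y, new ArrayComparer<T>()));
    }

    public int GetHashCode(List<T[]> obj)
    {
        return 0;
    }
}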
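Returning 0 from GetHashCode is legal, because equal objects only need to share a hash code, but it makes Distinct compare every element with every other one. A sketch of a content-based hash for the array comparer, assuming the element type's own GetHashCode is consistent with its Equals (true for double); the class name HashingArrayComparer is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public class HashingArrayComparer<T> : IEqualityComparer<T[]>
{
    private static readonly EqualityComparer<T> ElementComparer = EqualityComparer<T>.Default;

    public bool Equals(T[] x, T[] y)
    {
        return ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y, ElementComparer));
    }

    public int GetHashCode(T[] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Combine the element hashes so arrays with equal contents get equal hashes
            int hash = 17;
            foreach (T item in obj)
                hash = hash * 31 + (item == null ? 0 : ElementComparer.GetHashCode(item));
            return hash;
        }
    }
}
```

The same idea carries over to ListOfArrayComparer<T>: combine the per-array hashes in order.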
Your code should then look like this:
public static List<List<double[]>> CleanListOfListsOfDoubleArray(List<List<double[]>> input)
{
    var output = new List<List<double[]>>();
    for (int i = 0; i < input.Count; i++)
    {
        var temp = input[i].Distinct(new ArrayComparer<double>()).ToList();
        output.Add(temp);
    }
    return output.Distinct(new ListOfArrayComparer<double>()).ToList();
}
Or even just:
public static List<List<double[]>> CleanListOfListsOfDoubleArray(List<List<double[]>> input)
{
    var output = input.Select(t => t.Distinct(new ArrayComparer<double>()).ToList()).ToList();
    return output.Distinct(new ListOfArrayComparer<double>()).ToList();
}
Keep in mind that this would be a lot less complicated if you used more specific types for describing your problem.
If, for example, instead of double[], you used a more specific pair type (like Tuple<double, double>), you would only need to implement one comparer (the first Distinct call could be left with its default behavior, if I remember correctly).
If, instead of the List<double[]>, you had a specialized PairCollection that implements its own equality methods, you wouldn't need the second equality comparer either (your original code would most probably work as it is).
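A sketch of both suggestions together, assuming each double[] really holds exactly two values (the question does not say so) and using a hypothetical PairCollection class: Tuple<double, double> already compares by value, so the inner Distinct needs no comparer, and PairCollection overrides Equals and GetHashCode, so the outer Distinct needs none either.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection of pairs with value-based equality.
public sealed class PairCollection : IEquatable<PairCollection>
{
    private readonly List<Tuple<double, double>> _pairs;

    public PairCollection(IEnumerable<Tuple<double, double>> pairs)
    {
        _pairs = pairs.ToList();
    }

    public IReadOnlyList<Tuple<double, double>> Pairs => _pairs;

    public bool Equals(PairCollection other)
    {
        // Tuple<,> compares its items by value, so SequenceEqual works without a comparer
        return other != null && _pairs.SequenceEqual(other._pairs);
    }

    public override bool Equals(object obj) => Equals(obj as PairCollection);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            foreach (var pair in _pairs)
                hash = hash * 31 + pair.GetHashCode();
            return hash;
        }
    }
}

public static class PairCleaner
{
    // Same shape as the original attempt: Distinct inside, then Distinct outside.
    public static List<PairCollection> Clean(IEnumerable<IEnumerable<Tuple<double, double>>> input)
    {
        return input.Select(list => new PairCollection(list.Distinct())).Distinct().ToList();
    }
}
```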
So, to avoid problems like this in the future, try to declare specialized types for your problem (instead of relying on the generic lists and arrays and nesting them like here).
I know there are many answers out there suggesting overriding Equals and GetHashCode, but in my case that is not possible because the objects come from DLLs.
First, I have a list of objects called DeploymentData.
These objects, along with other properties, contain the following two: Location(double x, double y, double z) and Duct(int id).
The goal is to remove those that have the same Location.
First, I grouped them by Duct, as two Locations cannot be the same if they are on different ducts.
var groupingByDuct = deploymentDataList.GroupBy(x => x.Duct.Id).ToList();
Then the actual algorithm:
List<DeploymentData> uniqueDeploymentData = new List<DeploymentData>();
foreach (var group in groupingByDuct) {
    uniqueDeploymentData
        .AddRange(group
            .Select(x => x)
            .GroupBy(d => new { d.Location.X, d.Location.Y, d.Location.Z })
            .Select(x => x.First()).ToList());
}
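For exact matches, the per-duct loop above can be collapsed into a single GroupBy whose key also includes the duct id, which removes the outer loop, the Select(x => x) and the intermediate ToList calls. A sketch using small stand-in classes, because the real DeploymentData, Location and Duct types come from a DLL; their member names here are assumptions based on the question:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the library types described in the question.
public class Location { public double X, Y, Z; }
public class Duct { public int Id; }
public class DeploymentData { public Location Location; public Duct Duct; }

public static class ExactDedup
{
    public static List<DeploymentData> RemoveExactDuplicates(IEnumerable<DeploymentData> items)
    {
        // The anonymous-type key compares duct id and all three coordinates by value
        return items
            .GroupBy(d => new { DuctId = d.Duct.Id, d.Location.X, d.Location.Y, d.Location.Z })
            .Select(g => g.First())
            .ToList();
    }
}
```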
This does the job, but to properly check that they are indeed duplicates, the locations should be compared as whole points, with a tolerance. For this, I've written the following method:
private bool CompareXYZ(XYZ point1, XYZ point2, double tolerance = 10)
{
    if (System.Math.Abs(point1.X - point2.X) < tolerance &&
        System.Math.Abs(point1.Y - point2.Y) < tolerance &&
        System.Math.Abs(point1.Z - point2.Z) < tolerance) {
        return true;
    }
    return false;
}
But I have no idea how to apply it to the code above. To sum up:
How can I write the algorithm above without all those method calls?
How can I adjust the algorithm above to use the CompareXYZ method for a better precision?
Efficiency?
An easy way to filter duplicates is to use a HashSet<T> with a custom equality comparer, that is, a class that implements IEqualityComparer<T>, e.g.:
public class DeploymentDataEqualityComparer : IEqualityComparer<DeploymentData>
{
    private readonly double _tolerance;

    public DeploymentDataEqualityComparer(double tolerance)
    {
        _tolerance = tolerance;
    }

    public bool Equals(DeploymentData a, DeploymentData b)
    {
        if (a.Duct.Id != b.Duct.Id)
            return false; // Different Duct, therefore not equal
        if (System.Math.Abs(a.Location.X - b.Location.X) < _tolerance &&
            System.Math.Abs(a.Location.Y - b.Location.Y) < _tolerance &&
            System.Math.Abs(a.Location.Z - b.Location.Z) < _tolerance) {
            return true;
        }
        return false;
    }

    public int GetHashCode(DeploymentData dd)
    {
        // Items that are Equals must return the same hash code. Two locations within the
        // tolerance can still hash differently, so only the duct id is safe to use here.
        return dd.Duct.Id.GetHashCode();
    }
}
In order to filter duplicates, you can then add them to a HashSet:
var hashSet = new HashSet<DeploymentData>(new DeploymentDataEqualityComparer(10));
foreach (var deploymentData in deploymentDataList)
    hashSet.Add(deploymentData);
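This way, you do not need to group by duct yourself, and you get the fast lookups of the HashSet. Because the hash code can only safely use the duct id, items on the same duct are still compared with each other one by one.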
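A self-contained sketch of the HashSet approach that also keeps the original order, using the return value of HashSet<T>.Add. The Location, Duct and DeploymentData classes are hypothetical stand-ins for the DLL types. Keep in mind that "within tolerance" is not transitive (A near B and B near C does not make A near C), so which items survive can depend on the input order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the library types.
public class Location { public double X, Y, Z; }
public class Duct { public int Id; }
public class DeploymentData { public Location Location; public Duct Duct; }

public class DeploymentDataEqualityComparer : IEqualityComparer<DeploymentData>
{
    private readonly double _tolerance;

    public DeploymentDataEqualityComparer(double tolerance) { _tolerance = tolerance; }

    public bool Equals(DeploymentData a, DeploymentData b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        if (a.Duct.Id != b.Duct.Id) return false; // different duct: never a duplicate
        return Math.Abs(a.Location.X - b.Location.X) < _tolerance
            && Math.Abs(a.Location.Y - b.Location.Y) < _tolerance
            && Math.Abs(a.Location.Z - b.Location.Z) < _tolerance;
    }

    // Only the duct id is hashed: nearby points may have unrelated coordinate hashes.
    public int GetHashCode(DeploymentData dd) => dd.Duct.Id.GetHashCode();
}

public static class ToleranceDedup
{
    public static List<DeploymentData> RemoveNearDuplicates(IEnumerable<DeploymentData> items, double tolerance)
    {
        var seen = new HashSet<DeploymentData>(new DeploymentDataEqualityComparer(tolerance));
        var result = new List<DeploymentData>();
        foreach (var item in items)
        {
            // Add returns false when an "equal" item is already in the set
            if (seen.Add(item))
                result.Add(item);
        }
        return result;
    }
}
```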
I have a general question, concerning performance and best practice.
When working with a List (or any other data type) that belongs to a different class, which is better practice: copying it at the beginning, working with the local copy and then copying it back to the original, or always accessing the original?
An example:
Accessing the original:
public class A
{
    public static List<int> list = new List<int>();
}
public class B
{
    public static void insertString(int i)
    {
        // insert at right place
        int count = A.list.Count;
        if (count == 0)
        {
            A.list.Add(i);
        }
        else
        {
            for (int j = 0; j < count; j++)
            {
                if (A.list[j] >= i)
                {
                    A.list.Insert(j, i);
                    break;
                }
                if (j == count - 1)
                {
                    A.list.Add(i);
                }
            }
        }
    }
}
As you can see, I access the original list A.list several times. Here is the alternative:
Copying:
public class A
{
    public static List<int> list = new List<int>();
}
public class B
{
    public static void insertString(int i)
    {
        List<int> localList = A.list;
        // insert at right place
        int count = localList.Count;
        if (count == 0)
        {
            localList.Add(i);
        }
        else
        {
            for (int j = 0; j < count; j++)
            {
                if (localList[j] >= i)
                {
                    localList.Insert(j, i);
                    break;
                }
                if (j == count - 1)
                {
                    localList.Add(i);
                }
            }
        }
        A.list = localList;
    }
}
Here I access the list in the other class only twice (getting it at the beginning and setting it at the end). Which is better?
Please note that this is a general question and that the algorithm is only an example.
I won't bother thinking about performance here and instead focus on best practice:
Giving out the whole List violates encapsulation. B can modify the List and all its elements without A noticing (This is not a problem if A never uses the List itself but then A wouldn't even need to store it).
A simple example: A creates the List and immediately adds one element. Subsequently, A never bothers to check List.Count, because it knows that the List cannot be empty. Now B comes along and empties the List...
So any time B is changed, you need to also check A to see if all the assumptions of A are still correct. This is enough of a headache if you have full control over the code. If another programmer uses your class A, he may do something unexpected with the List and never check if that's ok.
Solution(s):
If B only needs to iterate over the elements, write an IEnumerable accessor. If B mustn't modify the elements, make the accessor deliver copies.
If B needs to modify the List (add/remove elements), either give B a copy of the List (containing copies of the elements if they needn't be modified) and accept a new List from B or use an accessor as before and implement the necessary List operations. In both cases, A will know if B modifies the List and can react accordingly.
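If B is handed a copy, it has to be a real copy. In the question's second example, List<int> localList = A.list; copies only the reference, so localList and A.list are the same object and the final A.list = localList changes nothing. A small sketch of the difference (CopyDemo and its methods are illustrative names):

```csharp
using System.Collections.Generic;

public static class CopyDemo
{
    // Assigning a List<T> variable copies the reference, not the contents.
    public static bool AliasSeesChange()
    {
        var original = new List<int> { 1, 2 };
        List<int> alias = original;
        alias.Add(3);
        return original.Count == 3; // both variables point to the same list
    }

    // new List<T>(source) creates an independent (shallow) copy.
    public static bool CopySeesChange()
    {
        var original = new List<int> { 1, 2 };
        var copy = new List<int>(original);
        copy.Add(3);
        return original.Count == 3; // the copy is a separate list
    }
}
```

Note that the copy is shallow: for reference-type elements, both lists still share the same element objects.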
Example:
class A
{
    private List<ItemType> internalList = new List<ItemType>();

    public IEnumerable<ItemType> Items()
    {
        foreach (var item in internalList)
        {
            // or maybe item.Copy() or new ItemType(item), depending on ItemType
            yield return item;
        }
    }

    public void RemoveFromList(ItemType toRemove)
    {
        internalList.Remove(toRemove);
        // do other things necessary to keep A in a consistent state
    }
}
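Applied to the question's example, A can own the sorted-insert logic itself and hand out only a read-only view, so B can no longer break the ordering. A sketch using List<T>.AsReadOnly and List<T>.BinarySearch; the names SortedStore, Items and Insert are illustrative:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class SortedStore
{
    private readonly List<int> _list = new List<int>();

    // Callers can read and enumerate, but the wrapper rejects Add/Remove/indexer writes.
    public ReadOnlyCollection<int> Items => _list.AsReadOnly();

    // A decides where new values go, so the list always stays sorted.
    public void Insert(int value)
    {
        int index = _list.BinarySearch(value);
        if (index < 0)
            index = ~index; // BinarySearch returns the complement of the insertion point
        _list.Insert(index, value);
    }
}
```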
I'm trying to use the static Array.ForEach() method to loop through a list of filtered elements from an array and then modify those values. Unfortunately, that doesn't seem to work, I'm guessing because it doesn't actually modify the referenced value of each element.
Is there any way to do this besides storing the results of the Array.ForEach() in a separate array and then cloning that array into the original one? I know I could obviously do all of this without cloning if I used a for loop, but doing it this way would be cleaner and less code.
Here's the snippet:
Array.ForEach(Array.FindAll(starts, e => e < 0), e => e = 0);
ForEach simply isn't intended to do this - just like you wouldn't be able to do this with a foreach loop.
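A minimal sketch of why the snippet has no effect (ForEachDemo and the sample values are illustrative): Array.ForEach passes each element to the lambda by value, so e = 0 only overwrites the lambda's own copy. In the original snippet, Array.FindAll also returns a new array, so even a working assignment would not reach starts.

```csharp
using System;

public static class ForEachDemo
{
    public static int[] AssignInsideForEach()
    {
        int[] starts = { -1, 2, -3 };
        // e is a copy of the element; assigning to it leaves the array unchanged
        Array.ForEach(starts, e => e = 0);
        return starts;
    }
}
```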
Personally I'd just use a for loop - it's easy to read and clear:
for (int i = 0; i < array.Length; i++)
{
    // Alternatively, use Math.Max to pull up any negative values to 0
    if (array[i] < 0)
    {
        array[i] = 0;
    }
}
It really is simple - anyone will be able to understand it.
Now you could write your own extension method instead. You could write one to replace all values which satisfy a predicate with a fixed value, or you could write one to replace all values entirely... but I don't think it's really worth it. As an example of the latter:
public static void ReplaceElements<T>(this T[] array,
                                      Func<T, T> replacementFunction)
{
    // TODO: Argument validation
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = replacementFunction(array[i]);
    }
}
Then call it with:
starts.ReplaceElements(x => Math.Max(x, 0));
I'd personally still use the for loop though.
(You could potentially change the above very slightly to make it take IList<T> and use Count instead. That would still work with arrays, but also List<T> etc too.)
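A sketch of that IList<T> variant (the class name ListExtensions is illustrative). Because both T[] and List<T> implement IList<T>, one method covers both; the argument-validation TODO is filled in with simple null checks:

```csharp
using System;
using System.Collections.Generic;

public static class ListExtensions
{
    // Works for T[], List<T>, and anything else implementing IList<T>.
    public static void ReplaceElements<T>(this IList<T> list, Func<T, T> replacementFunction)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (replacementFunction == null) throw new ArgumentNullException(nameof(replacementFunction));
        for (int i = 0; i < list.Count; i++)
        {
            list[i] = replacementFunction(list[i]);
        }
    }
}
```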
You can do that with ref and delegates. However, I don't think it adds much value.
public delegate void RefAction<T>(ref T value);
public static void ForEachRef<T>(this T[] array, RefAction<T> action)
{
for (int i = 0; i < array.Length; ++i) action(ref array[i]);
}
You can use it as follows:
var myArray = new int[10];
myArray.ForEachRef((ref int i) => i = whateverYouLike()); // whateverYouLike() stands for any int expression
In principle, there could be an interface IRefEnumerable<T> that iterates over a container with assignable elements.
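Something close to that exists in newer C#: the enumerator of Span<T> returns its elements by reference, so foreach (ref var x in span) can assign to them. A sketch, assuming C# 7.3 or later and a runtime with Span<T> (.NET Core 2.1+, or the System.Memory package on .NET Framework); SpanDemo is an illustrative name:

```csharp
using System;

public static class SpanDemo
{
    public static void ClampNegativesToZero(int[] array)
    {
        // Span<T>.Enumerator.Current is a ref T, so x aliases the array element
        foreach (ref int x in array.AsSpan())
        {
            if (x < 0)
                x = 0;
        }
    }
}
```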
array = array.Select(x => (x < 0) ? 0: x).ToArray();
I have a generic GetMinimum method. It accepts an array of an IComparable<T> type (so it may be string[] or double[]). In the case of double[], how can I implement this method to ignore double.NaN values? (I'm looking for good practices.)
When I pass this array:
double[] inputArray = { double.NaN, double.NegativeInfinity, -2.3, 3 };
it returns the double.NaN!
public T GetMinimum<T>(T[] array) where T : IComparable<T>
{
    T result = array[0];
    foreach (T item in array)
    {
        if (result.CompareTo(item) > 0)
        {
            result = item;
        }
    }
    return result;
}
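With the comparison operators, both NaN < x and NaN > x are always false, so asking for the minimum of a collection that can contain NaN is simply not defined. It is like dividing by zero: there is no valid answer. (Double.CompareTo, which GetMinimum uses, sidesteps this by ordering NaN before every other value, which is exactly why NaN comes back as the minimum.)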
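A quick check of both behaviours (NaNDemo is an illustrative name): the operators follow IEEE 754, while Double.CompareTo is documented to treat NaN as less than every other value, including NegativeInfinity.

```csharp
public static class NaNDemo
{
    // IEEE 754 operators: every ordered comparison with NaN is false
    public static bool OperatorLess() => double.NaN < double.NegativeInfinity;
    public static bool OperatorGreater() => double.NaN > double.NegativeInfinity;

    // CompareTo imposes a total order in which NaN comes first
    public static int CompareToSign() => System.Math.Sign(double.NaN.CompareTo(double.NegativeInfinity));
}
```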
So the logical approach would be to pre-filter the values. That will not be generic but that should be OK.
var results = inputArray.EliminateNaN().GetMinimum();
Separation of concerns: the filtering should not be the responsibility (and burden) of GetMinimum().
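EliminateNaN is not a built-in method. A minimal sketch of it as an extension method, together with a GetMinimum that keeps the question's logic but is written as a generic extension over IEnumerable<T> so the two can be chained (both names and signatures are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinimumExtensions
{
    // Hypothetical helper: drops NaN values before any comparison happens.
    public static IEnumerable<double> EliminateNaN(this IEnumerable<double> source)
    {
        return source.Where(x => !double.IsNaN(x));
    }

    // The question's GetMinimum logic, knowing nothing about NaN.
    public static T GetMinimum<T>(this IEnumerable<T> source) where T : IComparable<T>
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements.");
            T result = e.Current;
            while (e.MoveNext())
            {
                if (result.CompareTo(e.Current) > 0)
                    result = e.Current;
            }
            return result;
        }
    }
}
```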
You can't do it from inside the method, because there you have no idea what T can be. Maybe you could with some casting, but ideally this should be your approach:
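public T GetMinimum<T>(T[] array, params T[] ignorables) where T : IComparable<T>
{
    // Requires using System; and using System.Linq; (Contains on arrays)
    bool found = false;
    T result = default(T);
    foreach (T item in array)
    {
        // Contains uses Equals, and double.NaN.Equals(double.NaN) is true
        if (ignorables.Contains(item))
            continue;
        // Start from the first value that is not ignored
        if (!found || result.CompareTo(item) > 0)
        {
            result = item;
            found = true;
        }
    }
    if (!found)
        throw new InvalidOperationException("The array contains no values that are not ignored.");
    return result;
}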
Now call this:
double[] inputArray = { double.NaN, double.NegativeInfinity, -2.3, 3 };
GetMinimum(inputArray, double.NaN);
If you're sure there is only one item to be ignored, you can take just a T as the second parameter (perhaps as an optional parameter).
Or, more concisely:
inputArray.Where(x => !x.Equals(double.NaN)).Min();
According to this Q&A, it's complicated: Sorting an array of Doubles with NaN in it
Fortunately, you can hack around it:
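// fitem == Double.NaN is always false, so IsNaN is required;
// T must be boxed to object before it can be unboxed to its real type
if (item is Single && Single.IsNaN((Single)(object)item)) continue;
if (item is Double && Double.IsNaN((Double)(object)item)) continue;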
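Put together, a sketch of a GetMinimum variant that applies this check inside the loop (the class and method names are illustrative). It skips NaN only for float and double and leaves every other T untouched:

```csharp
using System;

public static class NaNAwareMinimum
{
    public static T GetMinimumIgnoringNaN<T>(T[] array) where T : IComparable<T>
    {
        bool found = false;
        T result = default(T);
        foreach (T item in array)
        {
            // Box to object so the value can be unboxed to its real type
            if (item is float && float.IsNaN((float)(object)item)) continue;
            if (item is double && double.IsNaN((double)(object)item)) continue;
            if (!found || result.CompareTo(item) > 0)
            {
                result = item;
                found = true;
            }
        }
        if (!found)
            throw new InvalidOperationException("No non-NaN elements.");
        return result;
    }
}
```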
Below, l is a list of Product objects with Name and Price properties.
The list can be sorted alphabetically using the following class, ProductNameComparer, which implements IComparer<Product>:
List<Product> l = p.GetList();
l.Sort(new ProductNameComparer());
MessageBox.Show(l[0].Name);
public class ProductNameComparer : IComparer<Product>
{
    public int Compare(Product x, Product y)
    {
        return x.Name.CompareTo(y.Name);
    }
}
I do not understand how the list is being sorted. According to MSDN CompareTo returns an Int32 type value of less than zero, zero, or greater than zero. If I have:
string c = "Apple";
string d = "Orange";
return c.CompareTo(d);
The function will return "-1".
But if I call l.Sort(-1) instead of l.Sort(new ProductNameComparer()), the code doesn't compile.
Also, why does Compare(Product x, Product y) take only two Products as arguments and yet manage to compare and sort a list of more than two products?
The Sort method doesn't just call Compare once - it calls it multiple times, whenever it needs to compare two items. It's a general sort algorithm which is able to sort any collection of items, so long as it can compare any two of them in a consistent way.
The code doesn't compile if you try to call l.Sort(-1) because that's just trying to pass in an integer - what would that even mean?
You need to understand that you're not giving the Sort method one comparison result - you're giving it the ability to compare whichever items it needs to.
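A small experiment makes this visible: a comparer that counts its calls is invoked repeatedly while List<T>.Sort orders five items. Product, CountingNameComparer and SortDemo are illustrative stand-ins; the exact number of calls depends on the sort algorithm, so only "more than once" should be relied on.

```csharp
using System.Collections.Generic;

public class Product { public string Name; }

public class CountingNameComparer : IComparer<Product>
{
    public int Calls { get; private set; }

    public int Compare(Product x, Product y)
    {
        Calls++; // Sort decides which pairs to compare; this only counts them
        return x.Name.CompareTo(y.Name);
    }
}

public static class SortDemo
{
    public static List<Product> Run(out int calls)
    {
        var list = new List<Product>
        {
            new Product { Name = "Pear" }, new Product { Name = "Apple" },
            new Product { Name = "Orange" }, new Product { Name = "Kiwi" },
            new Product { Name = "Banana" }
        };
        var comparer = new CountingNameComparer();
        list.Sort(comparer);
        calls = comparer.Calls;
        return list;
    }
}
```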
For the purpose of demonstration here is a possible implementation of the Sort method (a highly inefficient one, I know):
public void Sort(System.Collections.Generic.IComparer<T> comparer)
{
    for (int i = 0; i < this.Count - 1; i++)
    {
        for (int j = i + 1; j < this.Count; j++)
        {
            if (comparer.Compare(this[i], this[j]) > 0)
            {
                T tmp = this[i];
                this[i] = this[j];
                this[j] = tmp;
            }
        }
    }
}
The Sort method overload used in your example (new ProductNameComparer()) requires an argument that implements the IComparer<Product> interface. Calling Sort(-1) won't work, since int doesn't implement this interface. As @JonSkeet explains, the sorting strategy uses the results of these comparisons (here, the result of CompareTo()) to order the list.