Remove 'duplicates' from a list of pairings


Title could be misleading, so an example:
I have a class:
class Pair
{
    // Public so that the object initializers below compile.
    public Book Book1;
    public Book Book2;
}
I have a list of these:
var list = new List<Pair>();
list.Add(new Pair() {
Book1 = new Book() { Id = 123 },
Book2 = new Book() { Id = 456 }
});
list.Add(new Pair() {
Book1 = new Book() { Id = 456 },
Book2 = new Book() { Id = 123 }
});
Now, despite the fact the books are 'flipped', my system should treat these as duplicates.
I need a method to remove one of these 'duplicates' from the list (any one - so let's say the first to make it simple).
What I've Tried
var tempList = new List<Pair>();
tempList.AddRange(pairs);
foreach (var dup in pairs)
{
var toRemove = pairs.FirstOrDefault(o => o.Book1.Id == dup.Book2.Id
&& o.Book2.Id == dup.Book1.Id);
if (toRemove != null)
tempList.Remove(toRemove);
}
return tempList;
This returns no items (given the example above), because both Pair objects satisfy the condition in the lambda. I only want to remove one of them, though.
NOTE: This wouldn't happen if I just removed the element from the collection straight away (rather than from a temporary list) - but then I wouldn't be able to iterate over it without exceptions.

You can set up an IEqualityComparer<Pair> concrete class and pass that to the .Distinct() method:
class PairComparer : IEqualityComparer<Pair>
{
public bool Equals(Pair x, Pair y)
{
return (x.Book1.Id == y.Book1.Id && x.Book2.Id == y.Book2.Id)
|| (x.Book1.Id == y.Book2.Id && x.Book2.Id == y.Book1.Id);
}
public int GetHashCode(Pair obj)
{
return obj.Book1.Id.GetHashCode() ^ obj.Book2.Id.GetHashCode();
}
}
And then use it like so:
var distinctPairs = list.Distinct(new PairComparer());

The problem is that you are removing both duplicates.
Try this:
var uniquePairs = list
    .ToLookup(p => Tuple.Create(Math.Min(p.Book1.Id, p.Book2.Id),
                                Math.Max(p.Book1.Id, p.Book2.Id)))
    .Select(g => g.First())
    .ToList();
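The same idea can be written as a stand-alone sketch: normalise each pair to a key in which the smaller Id always comes first, then keep only the first pair seen for each key. The Book and Pair classes below are minimal stand-ins for the ones in the question, PairDeduplicator.RemoveFlippedDuplicates is a hypothetical helper name, and the value-tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public class Book { public int Id; }

public class Pair
{
    public Book Book1;
    public Book Book2;
}

public static class PairDeduplicator
{
    // Returns the pairs in their original order, keeping only the first
    // pair for each unordered combination of book Ids.
    public static List<Pair> RemoveFlippedDuplicates(IEnumerable<Pair> pairs)
    {
        var seen = new HashSet<(int, int)>();
        var result = new List<Pair>();
        foreach (var pair in pairs)
        {
            // Order-independent key: (smaller Id, larger Id).
            var key = (Math.Min(pair.Book1.Id, pair.Book2.Id),
                       Math.Max(pair.Book1.Id, pair.Book2.Id));
            // HashSet.Add returns false if the key was already present.
            if (seen.Add(key))
            {
                result.Add(pair);
            }
        }
        return result;
    }
}
```

Because the key ignores the order of the two books, this also catches duplicates whose books are in the same order, not only flipped ones.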

I would use the following
foreach (var dup in pairs)
{
var toRemove = pairs.FirstOrDefault(o => o.Book1.Id == dup.Book2.Id
&& o.Book2.Id == dup.Book1.Id
&& o.Book1.Id > o.Book2.Id);
if (toRemove != null)
tempList.Remove(toRemove);
}
This will specifically remove the duplicate that is "out of order". But this (and your original) will fail if the duplicate pairs have the books in the same order.
A better solution (since we're looping over every pair anyway) would be to use a HashSet:
var hashSet = new HashSet<Tuple<int,int>>();
foreach (var item in pairs)
{
    // Tuples are immutable, so build the normalised key (smaller Id first) in one step.
    var tuple = item.Book1.Id < item.Book2.Id
        ? Tuple.Create(item.Book1.Id, item.Book2.Id)
        : Tuple.Create(item.Book2.Id, item.Book1.Id);
    // Add returns false when an equal tuple is already in the set.
    if (!hashSet.Add(tuple))
    {
        tempList.Remove(item);
    }
}

I've managed to find a solution, but it's one I'm not happy with. It seems too verbose for the job I'm trying to do. I'm now doing an additional check to see whether a duplicate has already been added to the list:
if(toRemove != null && tempList.Any(o => o.Book1.Id == toRemove.Book2.Id
&& o.Book2.Id == toRemove.Book1.Id))
tempList.Remove(toRemove);
I'm very much open to alternative suggestions.

Related

How do I pick out values between a duplicate value in a collection?

I have a method that returns a collection that has a duplicate value.
static List<string> GenerateItems()
{
var _items = new List<string>();
_items.Add("Tase");
_items.Add("Ray");
_items.Add("Jay");
_items.Add("Bay");
_items.Add("Tase");
_items.Add("Man");
_items.Add("Ran");
_items.Add("Ban");
return _items;
}
I want to search through that collection and find the first place that duplicate value is located and start collecting all the values from the first appearance of the duplicate value to its next appearance. I want to put this in a collection but I only want the duplicate value to appear once in that collection.
This is what I have so far:
static void Main(string[] args)
{
string key = "Tase";
var collection = GenerateItems();
int index = collection.FindIndex(a => a == key);
var matchFound = false;
var itemsBetweenKey = new List<string>();
foreach (var item in collection)
{
if (item == key)
{
matchFound = !matchFound;
}
if (matchFound)
{
itemsBetweenKey.Add(item);
}
}
foreach (var item in itemsBetweenKey)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
There must be an easier way of doing this. Perhaps with Indexing or a LINQ query?

You can do something like this:
string key = "Tase";
var collection = GenerateItems();
int indexStart = collection.FindIndex(a => a == key);
int indexEnd = collection.FindIndex(indexStart+1, a => a == key);
var result = collection.GetRange(indexStart, indexEnd-indexStart);
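The snippet above assumes the key occurs at least twice. A slightly more defensive sketch, which checks both FindIndex results for -1, might look like this. The RangeHelper.BetweenFirstTwo name and the choice to return an empty list when the key does not occur twice are assumptions made for illustration.

```csharp
using System.Collections.Generic;

public static class RangeHelper
{
    // Returns the items from the first occurrence of key up to, but not
    // including, its second occurrence.
    public static List<string> BetweenFirstTwo(List<string> items, string key)
    {
        int start = items.FindIndex(a => a == key);
        if (start < 0)
        {
            return new List<string>(); // key not present at all
        }
        int end = items.FindIndex(start + 1, a => a == key);
        if (end < 0)
        {
            return new List<string>(); // key occurs only once
        }
        return items.GetRange(start, end - start);
    }
}
```

With the list from the question and the key "Tase", this returns Tase, Ray, Jay, Bay.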

You can use LINQ Select and GroupBy to find the first and last index of every duplicate. (Keep in mind that if a value appears in the list more than twice, the middle occurrences are ignored.)
But I personally think the LINQ for this is overcomplicated. I would stick with simple for loops and if statements (just turn it into a method so it reads better).
Here is a LINQ solution that gets every duplicated value together with all the values between its occurrences, including the duplicate itself once, as you mentioned.
var collection = GenerateItems();
var Duplicates = collection.Select((x, index) => new { index, value = x })
    .GroupBy(x => x.value) // group by the strings
    .Where(x => x.Count() > 1) // only take duplicates
    .Select(x => new {
        Value = x.Key,
        FirstIndex = x.Min(y => y.index), // first occurrence
        LastIndex = x.Max(y => y.index) // last occurrence
    }).ToList();
var resultlist = new List<List<string>>();
foreach (var duplicaterange in Duplicates)
    resultlist.Add(collection.GetRange(duplicaterange.FirstIndex, duplicaterange.LastIndex - duplicaterange.FirstIndex));

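Try this function:
public List<string> PickOut(List<string> collection, string key)
{
    // Locate the first occurrence of the key.
    var index = collection.IndexOf(key);
    if (index < 0)
    {
        return null; // key not found
    }
    // Keep the key once, then everything up to (but excluding) its next occurrence.
    var result = new List<string> { key };
    result.AddRange(collection.Skip(index + 1).TakeWhile(x => x != key));
    return result;
}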

First find the duplicated key, then find its second occurrence, and take everything between the two.
var firstDuplicate = collection.GroupBy(x => x)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key).First();
// Indexes of the first and second occurrence of the duplicated value.
int first = collection.IndexOf(firstDuplicate);
int second = collection.IndexOf(firstDuplicate, first + 1);
// The value is known to occur at least twice, so second > first.
var result = collection.Skip(first).Take(second - first).ToList();

How to find out duplicate Elements in Xelement

I am trying to find duplicate elements in a list of XElement and write a generic function that removes them. Something like:
public List<XElement> RemoveDuplicatesFromXml(List<XElement> xele)
{
    // Takes the XElement list and returns it with the duplicate entries removed.
    return xele;
}
The XML is as follows:
<Execute ID="7300" Attrib1="xyz" Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7301" Attrib1="xyz" Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7302" Attrib1="xyz1" Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
I want to find elements that are duplicates on every attribute except ID, and then delete the one with the lesser ID.
Thanks,

You can implement custom IEqualityComparer for this task
class XComparer : IEqualityComparer<XElement>
{
public IList<string> _exceptions;
public XComparer(params string[] exceptions)
{
_exceptions = new List<string>(exceptions);
}
public bool Equals(XElement a, XElement b)
{
var attA = a.Attributes().ToList();
var attB = b.Attributes().ToList();
var setA = AttributeNames(attA);
var setB = AttributeNames(attB);
if (!setA.SetEquals(setB))
{
return false;
}
foreach (var e in setA)
{
var xa = attA.First(x => x.Name.LocalName == e);
var xb = attB.First(x => x.Name.LocalName == e);
if (xa.Value == null && xb.Value == null)
continue;
if (xa.Value == null || xb.Value == null)
return false;
if (!xa.Value.Equals(xb.Value))
{
return false;
}
}
return true;
}
private HashSet<string> AttributeNames(IList<XAttribute> e)
{
return new HashSet<string>(e.Select(x =>x.Name.LocalName).Except(_exceptions));
}
public int GetHashCode(XElement e)
{
var h = 0;
var atts = e.Attributes().ToList();
var names = AttributeNames(atts);
foreach (var a in names)
{
var xa = atts.First(x => x.Name.LocalName == a);
if (xa.Value != null)
{
h = h ^ xa.Value.GetHashCode();
}
}
return h;
}
}
Usage:
var comp = new XComparer("ID");
var distXEle = xele.Distinct(comp);
Please note that the IEqualityComparer implementation in this answer compares only LocalName and doesn't take the namespace into consideration. If an element has several attributes with the same local name, this implementation will use the first one.
You can see the demo here : https://dotnetfiddle.net/w2DteS
Edit
If you want to "delete the one having lesser ID", that means you want to keep the element with the largest ID. You can then chain the .Distinct call with .Select:
var comp = new XComparer("ID");
var distXEle = xele
.Distinct(comp)
.Select(z => xele
.Where(a => comp.Equals(z, a))
.OrderByDescending(a => int.Parse(a.Attribute("ID").Value))
.First()
);
It will guarantee that you get the element with largest ID.

Use Linq GroupBy
var doc = XDocument.Parse(yourXmlString);
var groups = doc.Root
.Elements()
.GroupBy(element => new
{
Attrib1 = element.Attribute("Attrib1").Value,
Attrib2 = element.Attribute("Attrib2").Value,
Attrib3 = element.Attribute("Attrib3").Value,
Attrib4 = element.Attribute("Attrib4").Value,
Attrib5 = element.Attribute("Attrib5").Value
});
var duplicates = groups.SelectMany(group =>
{
if(group.Count() == 1) // remove this if you want only duplicates
{
return group;
}
int minId = group.Min(element => int.Parse(element.Attribute("ID").Value));
return group.Where(element => int.Parse(element.Attribute("ID").Value) > minId);
});
The solution above removes the element with the lowest ID from every group of elements whose other attributes are identical.
If you want to return only elements that have duplicates, remove the if branch from the last lambda.
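A variation on the GroupBy approach that does not hard-code the attribute names is sketched below. It builds a grouping key from every attribute except ID and keeps the element with the highest ID in each group. XmlDeduplicator.KeepHighestId is a hypothetical name; the sketch assumes every element carries an integer ID attribute and that attribute values never contain the "|" or "=" separators used to build the key.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlDeduplicator
{
    public static List<XElement> KeepHighestId(IEnumerable<XElement> elements)
    {
        return elements
            // Key: all attributes except ID, sorted by name so that
            // attribute order in the XML does not matter.
            .GroupBy(e => string.Join("|", e.Attributes()
                .Where(a => a.Name.LocalName != "ID")
                .OrderBy(a => a.Name.LocalName)
                .Select(a => a.Name.LocalName + "=" + a.Value)))
            // From each group keep the element with the largest ID.
            .Select(g => g.OrderByDescending(e => (int)e.Attribute("ID")).First())
            .ToList();
    }
}
```

For the three Execute elements in the question, this keeps the elements with ID 7301 and 7302.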

Combine two list values into one

Is there a way to combine these two list items into one list item? I am sorry if this is a beginner mistake.
List<string> values = new List<string>();
foreach (Feature f in allFeatures)
{
if (f.ColumnValues.ContainsKey(layercode)&& f.ColumnValues.ContainsKey(layercode2))
{
if (!values.Contains(f.ColumnValues[layercode].ToString()) && !values.Contains(f.ColumnValues[layercode2].ToString()))
{
values.Add(f.ColumnValues[layercode].ToString());
values.Add(f.ColumnValues[layercode2].ToString());
}
}
}

You can use a List of Tuples, a Dictionary, or create a class. I will not go into depth explaining these as you should be able to easily search and find other questions all about these. Some of this is from memory so syntax might be a bit off.
List of Tuples
List<Tuple<string,string>> values = new List<Tuple<string,string>>();
//...
if ( !values.Any(v=>v.Item1 == f.ColumnValues[layercode].ToString()) && !values.Any(v=>v.Item2 == f.ColumnValues[layercode2].ToString()) )
{
values.Add( Tuple.Create(f.ColumnValues[layercode].ToString(),
f.ColumnValues[layercode2].ToString()) );
}
Dictionary
Dictionary<string,string> values = new Dictionary<string,string> ();
//...
if (!values.ContainsKey(f.ColumnValues[layercode].ToString()) && !values.ContainsValue(f.ColumnValues[layercode2].ToString()))
{
values[f.ColumnValues[layercode].ToString()] = f.ColumnValues[layercode2].ToString();
}
List of class instances
public class LayerCodePair {
public string Code1 {get;set;}
public string Code2 {get;set;}
} // declared outside of method
...
List<LayerCodePair> values = new List<LayerCodePair>();
//...
if (!values.Any(v=> v.Code1 == f.ColumnValues[layercode].ToString()) && !values.Any(v=>v.Code2 == f.ColumnValues[layercode2].ToString()))
{
values.Add(new LayerCodePair{
Code1 = f.ColumnValues[layercode].ToString(),
Code2 = f.ColumnValues[layercode2].ToString()
});
}

This should work for you, using the ";" character as a separator:
List<string> values = new List<string>();
foreach (Feature f in allFeatures)
{
    var columnValues = f.ColumnValues;
    // Check that both keys exist before reading them; reading a missing key may throw.
    if (columnValues.ContainsKey(layercode) && columnValues.ContainsKey(layercode2))
    {
        var firstLayerCode = columnValues[layercode].ToString();
        var secondLayerCode = columnValues[layercode2].ToString();
        var combinedValue = firstLayerCode + ";" + secondLayerCode;
        // The list holds combined strings, so test the combined value for duplicates.
        if (!values.Contains(combinedValue))
        {
            values.Add(combinedValue);
        }
    }
}
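For larger inputs, List.Contains scans the whole list on every iteration. A possible alternative, sketched here under assumptions, tracks the combined strings in a HashSet and reads each column with TryGetValue. Since the Feature type is not shown in the question, each feature's ColumnValues is represented by a plain IDictionary<string, object>; LayerCodeCombiner.Combine is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class LayerCodeCombiner
{
    // rows stands in for the ColumnValues of each feature (an assumption).
    public static List<string> Combine(
        IEnumerable<IDictionary<string, object>> rows,
        string layercode, string layercode2)
    {
        var seen = new HashSet<string>();
        var values = new List<string>();
        foreach (var columnValues in rows)
        {
            // TryGetValue checks for the key and reads the value in one step.
            if (!columnValues.TryGetValue(layercode, out object first) ||
                !columnValues.TryGetValue(layercode2, out object second))
            {
                continue; // skip rows that lack either column
            }
            string combined = first + ";" + second;
            // Add returns false when the combined value was already seen.
            if (seen.Add(combined))
            {
                values.Add(combined);
            }
        }
        return values;
    }
}
```

This version treats two rows as duplicates only when both codes match; whether that is the right rule depends on the data.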

Linq to objects - Search collection with value from other collection

I've tried to search SO for solutions and questions that could be similar to my case.
I got 2 collections of objects:
public class BRSDocument
{
public string IdentifierValue { get; set;}
}
public class BRSMetadata
{
public string Value { get; set;}
}
I fill the list from my datalayer:
List<BRSDocument> colBRSDocuments = Common.Instance.GetBRSDocuments();
List<BRSMetadata> colBRSMetadata = Common.Instance.GetMessageBRSMetadata();
I now want to find that one object in colBRSDocuments where x.IdentifierValue is equal to the one object in colBRSMetadata y.Value. I just need to find the BRSDocument that matches a value from the BRSMetadata objects.
I used an ordinary foreach loop and a simple LINQ search to find the data and break when the value is found. I'm wondering if the search can be done completely with LINQ?
foreach (var item in colBRSMetadata)
{
BRSDocument res = colBRSDocuments.FirstOrDefault(x => x.IdentifierValue == item.Value);
if (res != null)
{
//Do work
break;
}
}
Hope that some of you guys can push me in the right direction...

Why not do a join?
var docs = from d in colBRSDocuments
join m in colBRSMetadata on d.IdentifierValue equals m.Value
select d;
If there's only meant to be one then you can do:
var doc = docs.Single(); // will throw if there is not exactly one element
If you want to return both objects, then you can do the following:
var docsAndData = from d in colBRSDocuments
join m in colBRSMetadata on d.IdentifierValue equals m.Value
select new
{
Doc = d,
Data = m
};
then you can access like:
foreach (var dd in docsAndData)
{
// dd.Doc
// dd.Data
}

Use Linq ?
Something like this should do the job :
foreach (var res in colBRSMetadata.Select(item => colBRSDocuments.FirstOrDefault(x => x.IdentifierValue == item.Value)).Where(res => res != null))
{
//Do work
break;
}
If you are only interested in the first item, the code would be:
var brsDocument = colBRSMetadata.Select(item => colBRSDocuments.FirstOrDefault(x => x.IdentifierValue == item.Value)).FirstOrDefault(res => res != null);
if (brsDocument != null)
//Do Stuff
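The queries above rescan colBRSDocuments for every metadata item. One way to avoid that, sketched here, is to collect the metadata values in a HashSet first and then make a single pass over the documents. The two classes repeat the definitions from the question; DocumentFinder.FindFirstMatch is a hypothetical name. Note that this returns the first matching document in document order, whereas the loop in the question returns the match for the first metadata item that has one.

```csharp
using System.Collections.Generic;
using System.Linq;

public class BRSDocument
{
    public string IdentifierValue { get; set; }
}

public class BRSMetadata
{
    public string Value { get; set; }
}

public static class DocumentFinder
{
    // Returns the first document whose IdentifierValue equals any
    // metadata Value, or null when there is no match.
    public static BRSDocument FindFirstMatch(
        IEnumerable<BRSDocument> documents, IEnumerable<BRSMetadata> metadata)
    {
        // Hash the metadata values once so each document check is O(1).
        var wanted = new HashSet<string>(metadata.Select(m => m.Value));
        return documents.FirstOrDefault(d => wanted.Contains(d.IdentifierValue));
    }
}
```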

firstorDefault performance rising

part of the code:
Dictionary<Calculation, List<PropertyValue>> result = new Dictionary<Calculation, List<PropertyValue>>();
while (reader != null && reader.Read()) //it loops about 60000, and it will be bigger
{
#region create calc and propvalue variables
//...
#endregion
//this FirstOrDefault needs a lot of time
tmpElementOfResult = result.Keys.FirstOrDefault(r => r.InnerID == calc.InnerID);
if (tmpElementOfResult == null)
{
result.Add(calc, new List<PropertyValue> { propValue });
}
else
{
result[tmpElementOfResult].Add(propValue);
}
}
Could you give me some idea how to make it faster, because now it's approximately 25 sec :( ?

It sounds like you should have a dictionary from the type of calc.InnerID, instead of a Dictionary<Calc, ...>. That way you can do the lookup far more quickly. Do you actually need to store the Calc itself at all, or are you only interested in the ID?
For example:
Dictionary<Guid, List<PropertyValue>> result =
new Dictionary<Guid, List<PropertyValue>>();
while (reader.Read())
{
// Work out calc
List<PropertyValue> list;
if (!result.TryGetValue(calc.InnerID, out list))
{
list = new List<PropertyValue>();
result[calc.InnerID] = list;
}
list.Add(propValue);
}
Alternatively, if you can convert the reader to an IEnumerable<Calc> you could use:
ILookup<Guid, PropertyValue> result = items.ToLookup(x => x.InnerID,
// Or however you get it...
x => x.PropertyValue);
EDIT: It sounds like two Calc values should be deemed equal if they have the same InnerID, right? So override Equals and GetHashCode within Calc to refer to the InnerID. Then you can just use:
ILookup<Calc, PropertyValue> result = items.ToLookup(x => x,
// Or however you get it...
x => x.PropertyValue);
... or you can use code like the first snippet, but with a Dictionary<Calc, ...>:
Dictionary<Calc, List<PropertyValue>> result =
new Dictionary<Calc, List<PropertyValue>>();
while (reader.Read())
{
// Work out calc
List<PropertyValue> list;
if (!result.TryGetValue(calc, out list))
{
list = new List<PropertyValue>();
result[calc] = list;
}
list.Add(propValue);
}
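What overriding Equals and GetHashCode might look like is sketched below, together with the TryGetValue grouping loop. The Calc class here is a minimal stand-in, the Guid type for InnerID is an assumption carried over from the snippets above, and CalcGrouping.Group is a hypothetical helper that takes already-parsed (calc, value) rows instead of a data reader.

```csharp
using System;
using System.Collections.Generic;

public sealed class Calc
{
    public Calc(Guid innerId) { InnerID = innerId; }

    // Read-only so the hash code cannot change while the object is a dictionary key.
    public Guid InnerID { get; }

    // Two Calc objects are equal when their InnerID values are equal.
    public override bool Equals(object obj)
    {
        var other = obj as Calc;
        return other != null && other.InnerID == InnerID;
    }

    public override int GetHashCode()
    {
        return InnerID.GetHashCode();
    }
}

public static class CalcGrouping
{
    public static Dictionary<Calc, List<TValue>> Group<TValue>(
        IEnumerable<KeyValuePair<Calc, TValue>> rows)
    {
        var result = new Dictionary<Calc, List<TValue>>();
        foreach (var row in rows)
        {
            List<TValue> list;
            // Hash lookup by InnerID instead of a linear scan over the keys.
            if (!result.TryGetValue(row.Key, out list))
            {
                list = new List<TValue>();
                result[row.Key] = list;
            }
            list.Add(row.Value);
        }
        return result;
    }
}
```

Each row then costs one hash lookup, so the total work grows linearly with the number of rows rather than quadratically.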

Instead of
tmpElementOfResult = result.Keys.FirstOrDefault(r => r.InnerID == calc.InnerID);
use
result.ContainsKey(calc.InnerID);
to check whether a key is present. This assumes the dictionary is keyed by InnerID rather than by the Calculation object itself.

Is it possible to do something like this:
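lookUpForResult = result.ToLookup(x => x.Key.InnerID, x => x.Value);
if (!lookUpForResult.Contains(calc.InnerID))
{
    // No entry for this InnerID yet, so create one.
    result.Add(calc, new List<PropertyValue> { propValue });
}
else
{
    // The lookup yields a sequence of lists per key; take the single list and append to it.
    lookUpForResult[calc.InnerID].First().Add(propValue);
}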
