How to find out duplicate Elements in Xelement

I am trying to find the duplicate elements in a list of XElement and write a generic function to remove them. Something like:
public List<XElement> RemoveDuplicatesFromXml(List<XElement> xele)
{
// Pass the XElement list as the argument and get the list back after deleting the duplicate entries.
return xele;
}
The XML is as follows:
<Execute ID="7300" Attrib1="xyz" Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7301" Attrib1="xyz" Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7302" Attrib1="xyz1" Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
I want to find elements that are duplicates on every attribute except ID, and then delete the one with the lesser ID.
Thanks,

You can implement a custom IEqualityComparer for this task:
class XComparer : IEqualityComparer<XElement>
{
public IList<string> _exceptions;
public XComparer(params string[] exceptions)
{
_exceptions = new List<string>(exceptions);
}
public bool Equals(XElement a, XElement b)
{
var attA = a.Attributes().ToList();
var attB = b.Attributes().ToList();
var setA = AttributeNames(attA);
var setB = AttributeNames(attB);
if (!setA.SetEquals(setB))
{
return false;
}
foreach (var e in setA)
{
var xa = attA.First(x => x.Name.LocalName == e);
var xb = attB.First(x => x.Name.LocalName == e);
if (xa.Value == null && xb.Value == null)
continue;
if (xa.Value == null || xb.Value == null)
return false;
if (!xa.Value.Equals(xb.Value))
{
return false;
}
}
return true;
}
private HashSet<string> AttributeNames(IList<XAttribute> e)
{
return new HashSet<string>(e.Select(x =>x.Name.LocalName).Except(_exceptions));
}
public int GetHashCode(XElement e)
{
var h = 0;
var atts = e.Attributes().ToList();
var names = AttributeNames(atts);
foreach (var a in names)
{
var xa = atts.First(x => x.Name.LocalName == a);
if (xa.Value != null)
{
h = h ^ xa.Value.GetHashCode();
}
}
return h;
}
}
Usage:
var comp = new XComparer("ID");
var distXEle = xele.Distinct(comp);
Please note that the IEqualityComparer implementation in this answer only compares LocalName and doesn't take the namespace into consideration. If an element has several attributes with the same local name, this implementation uses the first one.
You can see the demo here : https://dotnetfiddle.net/w2DteS
Edit: If you want to "delete the one having lesser ID", that means you want to keep the largest ID. You can then chain the .Distinct call with .Select:
var comp = new XComparer("ID");
var distXEle = xele
.Distinct(comp)
.Select(z => xele
.Where(a => comp.Equals(z, a))
.OrderByDescending(a => int.Parse(a.Attribute("ID").Value))
.First()
);
This guarantees that you get the element with the largest ID.

Use Linq GroupBy
var doc = XDocument.Parse(yourXmlString);
var groups = doc.Root
.Elements()
.GroupBy(element => new
{
Attrib1 = element.Attribute("Attrib1").Value,
Attrib2 = element.Attribute("Attrib2").Value,
Attrib3 = element.Attribute("Attrib3").Value,
Attrib4 = element.Attribute("Attrib4").Value,
Attrib5 = element.Attribute("Attrib5").Value
});
var duplicates = groups.SelectMany(group =>
{
if(group.Count() == 1) // remove this if you want only duplicates
{
return group;
}
int minId = group.Min(element => int.Parse(element.Attribute("ID").Value));
return group.Where(element => int.Parse(element.Attribute("ID").Value) > minId);
});
The solution above removes the element with the lowest ID from each group of elements that are duplicates by attributes.
If you want to return only elements that have duplicates, remove the if branch from the last lambda.
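The grouping idea can also be written without naming each attribute, which is closer to the generic function the question asks for. The following is only a sketch: the class name XmlDeduplicator and the BuildKey helper are made up for illustration, it assumes every element has an integer ID attribute, and it assumes attribute values never contain the \u0001 separator character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlDeduplicator
{
    // Hypothetical helper: groups elements by all attributes except ID
    // and keeps the element with the largest ID from each group.
    public static List<XElement> RemoveDuplicatesFromXml(List<XElement> xele)
    {
        return xele
            .GroupBy(BuildKey)
            .Select(g => g.OrderByDescending(e => (int)e.Attribute("ID")).First())
            .ToList();
    }

    // Builds a key from the sorted name=value pairs of every attribute but ID.
    // Assumption: "\u0001" never occurs inside an attribute value.
    private static string BuildKey(XElement element)
    {
        var parts = element.Attributes()
            .Where(a => a.Name.LocalName != "ID")
            .OrderBy(a => a.Name.ToString(), StringComparer.Ordinal)
            .Select(a => a.Name + "=" + a.Value);
        return string.Join("\u0001", parts);
    }
}

public static class Demo
{
    public static void Main()
    {
        var elements = XElement.Parse(
            "<Root>" +
            "<Execute ID=\"7300\" Attrib1=\"xyz\" Attrib2=\"abc\" />" +
            "<Execute ID=\"7301\" Attrib1=\"xyz\" Attrib2=\"abc\" />" +
            "<Execute ID=\"7302\" Attrib1=\"xyz1\" Attrib2=\"abc\" />" +
            "</Root>").Elements().ToList();

        foreach (var e in XmlDeduplicator.RemoveDuplicatesFromXml(elements))
        {
            Console.WriteLine((string)e.Attribute("ID")); // prints 7301, then 7302
        }
    }
}
```

Because the key uses the full attribute name, attributes in different namespaces are treated as different attributes here.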

Related

How do I pick out values between a duplicate value in a collection?

I have a method that returns a collection that has a duplicate value.
static List<string> GenerateItems()
{
var _items = new List<string>();
_items.Add("Tase");
_items.Add("Ray");
_items.Add("Jay");
_items.Add("Bay");
_items.Add("Tase");
_items.Add("Man");
_items.Add("Ran");
_items.Add("Ban");
return _items;
}
I want to search through that collection and find the first place that duplicate value is located and start collecting all the values from the first appearance of the duplicate value to its next appearance. I want to put this in a collection but I only want the duplicate value to appear once in that collection.
This is what I have so far:
static void Main(string[] args)
{
string key = "Tase";
var collection = GenerateItems();
int index = collection.FindIndex(a => a == key);
var matchFound = false;
var itemsBetweenKey = new List<string>();
foreach (var item in collection)
{
if (item == key)
{
matchFound = !matchFound;
}
if (matchFound)
{
itemsBetweenKey.Add(item);
}
}
foreach (var item in itemsBetweenKey)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
There must be an easier way of doing this. Perhaps with Indexing or a LINQ query?

You can do something like this:
string key = "Tase";
var collection = GenerateItems();
int indexStart = collection.FindIndex(a => a == key);
int indexEnd = collection.FindIndex(indexStart+1, a => a == key);
var result = collection.GetRange(indexStart, indexEnd-indexStart);

You can use LINQ Select and GroupBy to find the first and last index of every duplicate. (Keep in mind that if something is in the list more than twice, the middle occurrences are ignored.)
Personally, I think the LINQ for this is overcomplicated. I would stick with simple for loops and if statements (just turn it into a method so it reads better).
Here is a LINQ solution that gets every duplicate and all values between its occurrences, including the duplicate itself once, as you mentioned:
var collection = GenerateItems();
var duplicates = collection.Select((x, index) => new { index, value = x })
    .GroupBy(x => x.value) // group by the strings
    .Where(x => x.Count() > 1) // only take duplicates
    .Select(x => new {
        Value = x.Key,
        FirstIndex = x.Min(y => y.index), // first occurrence
        LastIndex = x.Max(y => y.index) // last occurrence
    }).ToList();
var resultList = new List<List<string>>();
foreach (var range in duplicates)
    resultList.Add(collection.GetRange(range.FirstIndex, range.LastIndex - range.FirstIndex));
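The plain-loop alternative mentioned above could look roughly like the sketch below. The names RangeFinder and ItemsBetween are invented for this example. It also assumes that when the key occurs only once, everything from that occurrence to the end of the list should be returned, which the question does not specify.

```csharp
using System;
using System.Collections.Generic;

public static class RangeFinder
{
    // Hypothetical helper: returns the items from the first occurrence of key
    // up to, but not including, its next occurrence.
    public static List<string> ItemsBetween(List<string> items, string key)
    {
        var result = new List<string>();
        var started = false;
        foreach (var item in items)
        {
            if (item == key)
            {
                if (started)
                {
                    return result; // second occurrence reached: stop collecting
                }
                started = true;
            }
            if (started)
            {
                result.Add(item);
            }
        }
        // Assumption: with no second occurrence, return whatever was collected
        // (an empty list if the key never appeared).
        return result;
    }
}

public static class Demo
{
    public static void Main()
    {
        var items = new List<string> { "Tase", "Ray", "Jay", "Bay", "Tase", "Man", "Ran", "Ban" };
        // prints "Tase, Ray, Jay, Bay"
        Console.WriteLine(string.Join(", ", RangeFinder.ItemsBetween(items, "Tase")));
    }
}
```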

Try this function
public List<string> PickOut(List<string> collection, string key)
{
    var index = 0;
    foreach (var item in collection)
    {
        if (item == key)
        {
            // Keep the key once, then take the items after it until the key appears again.
            return new[] { key }
                .Concat(collection.Skip(index + 1).TakeWhile(x => x != key))
                .ToList();
        }
        index++;
    }
    return null;
}

First find the duplicated key, then find its first and second occurrence, and then take the items in between:
// The first key that occurs more than once.
var firstDuplicate = collection.GroupBy(x => x)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .FirstOrDefault();
var result = new List<string>();
if (firstDuplicate != null)
{
    var first = collection.IndexOf(firstDuplicate);
    var second = collection.IndexOf(firstDuplicate, first + 1);
    // Take from the first occurrence up to, but not including, the second.
    result = collection.Skip(first).Take(second - first).ToList();
}

How do you call a function to an Linq List query that uses the exist function

I have this method with a LINQ statement below. I'm not a fan of multiple if statements, and I'm trying to find the best way to replace them with a private method.
My field values are set like this:
var fieldValues = await GetFields // then it's being passed to my method.
public static AppraisalContactBorrower BuildCoBorrower(List<LoanFieldValue> fieldValues) {
var coborrower = new AppraisalContactBorrower();
if (fieldValues.Exists(f => f.FieldId == "CX.OS.AO.COBORRNAME")) {
coborrower.Name = fieldValues.First(v => v.FieldId == "CX.OS.AO.COBORRNAME").Value;
}
if (fieldValues.Exists(f => f.FieldId == "CX.OS.AO.BORRCONTACTZIP")) {
borrower.Zip = fieldValues.First(v => v.FieldId == "CX.OS.AO.BORRCONTACTZIP").Value;
}
if (fieldValues.Exists(f => f.FieldId == "CX.OS.AO.BORRCONTACTZIP")) {
borrower.Zip = fieldValues.First(v => v.FieldId == "CX.OS.AO.BORRCONTACTZIP").Value;
}
What I'm trying to do is, instead of this:
coborrower.Name = fieldValues.First(v => v.FieldId == "CX.OS.AO.COBORRNAME").Value;
have something similar to this:
if (fieldValues.Exists(f => f.FieldId == "CX.OS.AO.BORRCONTACTZIP")) {
coborrower.Name = SETVALUE("CX.OS.AO.BORRCONTACTZIP")}

First, use Enumerable.ToDictionary to index the field values by FieldId, then use TryGetValue to fetch the values that exist:
public static AppraisalContactBorrower BuildCoBorrower(List<LoanFieldValue> fieldValues) {
    // Note: ToDictionary throws if two entries share the same FieldId.
    var groupedFieldValues = fieldValues.ToDictionary(f => f.FieldId);
    var coborrower = new AppraisalContactBorrower();
    if (groupedFieldValues.TryGetValue("CX.OS.AO.COBORRNAME", out var name)) {
        coborrower.Name = name.Value;
    }
    if (groupedFieldValues.TryGetValue("CX.OS.AO.BORRCONTACTZIP", out var zip)) {
        coborrower.Zip = zip.Value;
    }
    return coborrower;
}
Using Dictionary makes it faster to check the appropriate field existence as it is O(1) and with TryGetValue you combine two operations into one (existence check + obtaining the value).
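To get closer to the single-call style the question asks for (the SETVALUE idea), the TryGetValue check can be wrapped in a small private helper. This is a sketch under assumptions: LoanFieldValue and AppraisalContactBorrower are minimal stand-ins for the real classes, SetIfPresent and BorrowerBuilder are made-up names, and FieldId values are assumed to be unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the types in the question; the real classes likely have more members.
public class LoanFieldValue
{
    public string FieldId { get; set; }
    public string Value { get; set; }
}

public class AppraisalContactBorrower
{
    public string Name { get; set; }
    public string Zip { get; set; }
}

public static class BorrowerBuilder
{
    public static AppraisalContactBorrower BuildCoBorrower(List<LoanFieldValue> fieldValues)
    {
        // Assumption: FieldId values are unique; ToDictionary throws on duplicate keys.
        var byId = fieldValues.ToDictionary(f => f.FieldId, f => f.Value);
        var coborrower = new AppraisalContactBorrower();
        SetIfPresent(byId, "CX.OS.AO.COBORRNAME", v => coborrower.Name = v);
        SetIfPresent(byId, "CX.OS.AO.BORRCONTACTZIP", v => coborrower.Zip = v);
        return coborrower;
    }

    // Hypothetical helper: runs the setter only when the field exists.
    private static void SetIfPresent(
        Dictionary<string, string> values, string fieldId, Action<string> setter)
    {
        if (values.TryGetValue(fieldId, out var value))
        {
            setter(value);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var fields = new List<LoanFieldValue>
        {
            new LoanFieldValue { FieldId = "CX.OS.AO.COBORRNAME", Value = "Jane Doe" }
        };
        var coborrower = BorrowerBuilder.BuildCoBorrower(fields);
        Console.WriteLine(coborrower.Name);        // prints "Jane Doe"
        Console.WriteLine(coborrower.Zip == null); // prints "True": the zip field was absent
    }
}
```

Passing the setter as an Action keeps each field on one line; properties that are missing from the list simply keep their default value.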

Your last two statements are almost identical. The equivalent of:
if (groupedFieldValues.TryGetValue("CX.OS.AO.COBORRNAME", out var name)) {
coborrower.Name = name.Value;
}
is:
coborrower.Name = fieldValues.FirstOrDefault(v => v.FieldId == "CX.OS.AO.COBORRNAME")?.Value
?? coborrower.Name;
As in the original code, coborrower.Name is not updated if the field doesn't exist in the list.

Remove 'duplicates' from a list of pairings

Title could be misleading, so an example:
I have a class:
class Pair
{
Book Book1;
Book Book2;
}
I have a list of these:
var list = new List<Pair>();
list.Add(new Pair() {
Book1 = new Book() { Id = 123 },
Book2 = new Book() { Id = 456 }
});
list.Add(new Pair() {
Book1 = new Book() { Id = 456 },
Book2 = new Book() { Id = 123 }
});
Now, despite the fact the books are 'flipped', my system should treat these as duplicates.
I need a method to remove one of these 'duplicates' from the list (any one - so let's say the first to make it simple).
What I've Tried
var tempList = new List<Pair>();
tempList.AddRange(pairs);
foreach (var dup in pairs)
{
var toRemove = pairs.FirstOrDefault(o => o.Book1.Id == dup.Book2.Id
&& o.Book2.Id == dup.Book1.Id);
if (toRemove != null)
tempList.Remove(toRemove);
}
return tempList;
This returns no items (given the example above), because both Pair objects satisfy the condition in the lambda. I only want to remove one, though.
NOTE: This wouldn't happen if I just removed the element from the collection straight away (rather than from a temporary list) - but then I wouldn't be able to iterate over it without exceptions.

You can set up an IEqualityComparer<Pair> concrete class and pass that to the .Distinct() method:
class PairComparer : IEqualityComparer<Pair>
{
public bool Equals(Pair x, Pair y)
{
return (x.Book1.Id == y.Book1.Id && x.Book2.Id == y.Book2.Id)
|| (x.Book1.Id == y.Book2.Id && x.Book2.Id == y.Book1.Id);
}
public int GetHashCode(Pair obj)
{
return obj.Book1.Id.GetHashCode() ^ obj.Book2.Id.GetHashCode();
}
}
And then use it like so:
var distinctPairs = list.Distinct(new PairComparer());

The problem is that you are removing both duplicates.
Try this:
var uniquePairs = list.ToLookup( p => Tuple.Create(Math.Min(p.Book1.Id, p.Book2.Id), Math.Max(p.Book1.Id, p.Book2.Id)) ).Select( g => g.First() ).ToList();
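The one-liner works by normalizing each pair to a key of (smaller Id, larger Id), so flipped pairs end up in the same group and only the first of each group is kept. A self-contained sketch of the same idea follows. The Book and Pair classes are minimal stand-ins, PairDeduplicator is an invented name, and the value-tuple key assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the classes in the question.
public class Book
{
    public int Id { get; set; }
}

public class Pair
{
    public Book Book1 { get; set; }
    public Book Book2 { get; set; }
}

public static class PairDeduplicator
{
    // Keeps the first pair seen for each unordered combination of book ids.
    public static List<Pair> RemoveFlippedDuplicates(IEnumerable<Pair> pairs)
    {
        return pairs
            // Normalized key: smaller id first, larger id second.
            .GroupBy(p => (Math.Min(p.Book1.Id, p.Book2.Id), Math.Max(p.Book1.Id, p.Book2.Id)))
            .Select(g => g.First())
            .ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        var list = new List<Pair>
        {
            new Pair { Book1 = new Book { Id = 123 }, Book2 = new Book { Id = 456 } },
            new Pair { Book1 = new Book { Id = 456 }, Book2 = new Book { Id = 123 } }
        };
        var unique = PairDeduplicator.RemoveFlippedDuplicates(list);
        Console.WriteLine(unique.Count); // prints 1
    }
}
```

Unlike the loop in the question, this also treats two pairs with the books in the same order as duplicates.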

I would use the following
foreach (var dup in pairs)
{
var toRemove = pairs.FirstOrDefault(o => o.Book1.Id == dup.Book2.Id
&& o.Book2.Id == dup.Book1.Id
&& o.Book1.Id > o.Book2.Id);
if (toRemove != null)
tempList.Remove(toRemove);
}
This will specifically remove the duplicate that is "out of order". But this (and your original) will fail if the duplicate pairs have the books in the same order.
A better solution (since we're looping over every pair anyway) would be to use a HashSet:
var hashSet = new HashSet<Tuple<int, int>>();
foreach (var item in pairs)
{
    // Normalize the pair so the smaller Id always comes first.
    var tuple = Tuple.Create(
        Math.Min(item.Book1.Id, item.Book2.Id),
        Math.Max(item.Book1.Id, item.Book2.Id));
    // Add returns false when the normalized pair has already been seen.
    if (!hashSet.Add(tuple))
    {
        tempList.Remove(item);
    }
}

I've managed to find a solution, but it's one I'm not happy with. It seems too verbose for the job I'm trying to do. I'm now doing an additional check to see whether a duplicate has already been added to the list:
if(toRemove != null && tempList.Any(o => o.Book1.Id == toRemove.Book2.Id
&& o.Book2.Id == toRemove.Book1.Id))
tempList.Remove(toRemove);
I'm very much open to alternative suggestions.

Merge two or more T in List<T> based on condition

I have the below class:
public class FactoryOrder
{
public string Text { get; set; }
public int OrderNo { get; set; }
}
and collection holding the list of FactoryOrders
List<FactoryOrder>()
here is the sample data
FactoryOrder("Apple",20)
FactoryOrder("Orange",21)
FactoryOrder("WaterMelon",42)
FactoryOrder("JackFruit",51)
FactoryOrder("Grapes",71)
FactoryOrder("mango",72)
FactoryOrder("Cherry",73)
My requirement is to merge the Text of FactoryOrders where orderNo are in sequence and retain the lower orderNo for the merged FactoryOrder
- so the resulting output will be
FactoryOrder("Apple Orange",20) //Merged Apple and Orange and retained Lower OrderNo 20
FactoryOrder("WaterMelon",42)
FactoryOrder("JackFruit",51)
FactoryOrder("Grapes mango Cherry",71)//Merged Grapes,Mango,cherry and retained Lower OrderNo 71
I am new to Linq so not sure how to go about this. Any help or pointers would be appreciated

As commented, if your logic depends so heavily on consecutive items, LINQ is not the easiest approach. Use a simple loop.
You can still order the items first with LINQ, using orders.OrderBy(x => x.OrderNo):
var consecutiveOrdernoGroups = new List<List<FactoryOrder>> { new List<FactoryOrder>() };
FactoryOrder lastOrder = null;
foreach (FactoryOrder order in orders.OrderBy(o => o.OrderNo))
{
if (lastOrder == null || lastOrder.OrderNo == order.OrderNo - 1)
consecutiveOrdernoGroups.Last().Add(order);
else
consecutiveOrdernoGroups.Add(new List<FactoryOrder> { order });
lastOrder = order;
}
Now you just need to build the list of FactoryOrder with the joined names for every group. This is where LINQ and String.Join can come in handy:
orders = consecutiveOrdernoGroups
.Select(list => new FactoryOrder
{
Text = String.Join(" ", list.Select(o => o.Text)),
OrderNo = list.First().OrderNo // is the minimum number
})
.ToList();
With your sample data, this produces the four merged orders listed in the question.

I'm not sure this can be done using a single comprehensible LINQ expression. What would work is a simple enumeration:
private static IEnumerable<FactoryOrder> Merge(IEnumerable<FactoryOrder> orders)
{
var enumerator = orders.OrderBy(x => x.OrderNo).GetEnumerator();
FactoryOrder previousOrder = null;
FactoryOrder mergedOrder = null;
while (enumerator.MoveNext())
{
var current = enumerator.Current;
if (mergedOrder == null)
{
mergedOrder = new FactoryOrder(current.Text, current.OrderNo);
}
else
{
if (current.OrderNo == previousOrder.OrderNo + 1)
{
mergedOrder.Text += " " + current.Text;
}
else
{
yield return mergedOrder;
mergedOrder = new FactoryOrder(current.Text, current.OrderNo);
}
}
previousOrder = current;
}
if (mergedOrder != null)
yield return mergedOrder;
}
This assumes FactoryOrder has a constructor accepting Text and OrderNo.

Linq implementation using side effects:
var groupId = 0;
var previous = Int32.MinValue;
var grouped = GetItems()
.OrderBy(x => x.OrderNo)
.Select(x =>
{
var currentGroup = x.OrderNo != previous + 1 ? (groupId = x.OrderNo) : groupId;
previous = x.OrderNo;
return new
{
GroupId = currentGroup,
Item = x
};
})
.GroupBy(x => x.GroupId)
.Select(x => new FactoryOrder(
String.Join(" ", x.Select(y => y.Item.Text).ToArray()),
x.Key))
.ToArray();
foreach (var item in grouped)
{
Console.WriteLine(item.Text + "\t" + item.OrderNo);
}
output:
Apple Orange 20
WaterMelon 42
JackFruit 51
Grapes mango Cherry 71
Or, eliminate the side effects by using a generator extension method:
public static class IEnumerableExtensions
{
public static IEnumerable<IList<T>> MakeSets<T>(this IEnumerable<T> items, Func<T, T, bool> areInSameGroup)
{
var result = new List<T>();
foreach (var item in items)
{
if (!result.Any() || areInSameGroup(result[result.Count - 1], item))
{
result.Add(item);
continue;
}
yield return result;
result = new List<T> { item };
}
if (result.Any())
{
yield return result;
}
}
}
and your implementation becomes
var grouped = GetItems()
.OrderBy(x => x.OrderNo)
.MakeSets((prev, next) => next.OrderNo == prev.OrderNo + 1)
.Select(x => new FactoryOrder(
String.Join(" ", x.Select(y => y.Text).ToArray()),
x.First().OrderNo))
.ToList();
foreach (var item in grouped)
{
Console.WriteLine(item.Text + "\t" + item.OrderNo);
}
The output is the same but the code is easier to follow and maintain.

LINQ + sequential processing = Aggregate.
That said, Aggregate is not always the best option. Sequential processing in a for(each) loop usually makes for more readable code (see Tim's answer). Anyway, here's a pure LINQ solution.
It loops through the orders and first collects them in a dictionary having the first Id of consecutive orders as Key, and a collection of orders as Value. Then it produces a result using string.Join:
Class:
class FactoryOrder
{
public FactoryOrder(int id, string name)
{
this.Id = id;
this.Name = name;
}
public int Id { get; set; }
public string Name { get; set; }
}
The program:
IEnumerable<FactoryOrder> orders =
new[]
{
new FactoryOrder(20, "Apple"),
new FactoryOrder(21, "Orange"),
new FactoryOrder(22, "Pear"),
new FactoryOrder(42, "WaterMelon"),
new FactoryOrder(51, "JackFruit"),
new FactoryOrder(71, "Grapes"),
new FactoryOrder(72, "Mango"),
new FactoryOrder(73, "Cherry"),
};
var result = orders.OrderBy(t => t.Id).Aggregate(new Dictionary<int, List<FactoryOrder>>(),
(dir, curr) =>
{
var prevId = dir.SelectMany(d => d.Value.Select(v => v.Id))
.OrderBy(i => i).DefaultIfEmpty(-1)
.LastOrDefault();
var newKey = dir.Select(d => d.Key).OrderBy(i => i).LastOrDefault();
if (prevId == -1 || curr.Id - prevId > 1)
{
newKey = curr.Id;
}
if (!dir.ContainsKey(newKey))
{
dir[newKey] = new List<FactoryOrder>();
}
dir[newKey].Add(curr);
return dir;
}, c => c)
.Select(t => new
{
t.Key,
Items = string.Join(" ", t.Value.Select(v => v.Name))
}).ToList();
As you see, it's not really straightforward what happens here, and chances are that it performs badly when there are "many" items, because the growing dictionary is accessed over and over again.
Which is a long-winded way to say: don't use Aggregate.

Here is a compact method. It assumes the list is already sorted by OrderNo, and it merges in place, modifying the list that is passed in:
static List<FactoryOrder> MergeValues(List<FactoryOrder> dirtyList)
{
FactoryOrder[] temp1 = dirtyList.ToArray();
int index = -1;
for (int i = 1; i < temp1.Length; i++)
{
if (temp1[i].OrderNo - temp1[i - 1].OrderNo != 1) { index = -1; continue; }
if(index == -1 ) index = dirtyList.IndexOf(temp1[i - 1]);
dirtyList[index].Text += " " + temp1[i].Text;
dirtyList.Remove(temp1[i]);
}
return dirtyList;
}

Linq Conditional .Any() Select

How can I perform a conditional select on a column value, where I have a preference over which value is returned. If I can't find the top choice, I settle on the next, if available, and then if not the next, etc. As it looks right now, it would take 3 total queries. Is there a way to simplify this further?
var myResult = string.Empty;
if (myTable.Where(x => x.ColumnValue == "Three").Any())
{
myResult = "Three"; // Can also be some list.First().Select(x => x.ColumnValue) if that makes it easier;
}
else if (myTable.Where(x => x.ColumnValue == "One").Any())
{
myResult = "One";
}
else if (myTable.Where(x => x.ColumnValue == "Two").Any())
{
myResult = "Two";
}
else
{
myResult = "Four";
}

You could use a string[] for your preferences, ordered from most to least preferred:
string[] prefs = new[]{ "Three", "One", "Two" };
string myResult = prefs.FirstOrDefault(p => myTable.Any(x => x.ColumnValue == p));
if(myResult == null) myResult = "Four";
Edit: Enumerable.Join is an efficient, hash-table-based method, and it needs only one query:
string myResult = prefs.Select((pref, index) => new { pref, index })
.Join(myTable, xPref => xPref.pref, x => x.ColumnValue, (xPref, x) => new { xPref, x })
.OrderBy(x => x.xPref.index)
.Select(x => x.x.ColumnValue)
.DefaultIfEmpty("Four")
.First();

I wrote an extension method that effectively mirrors Tim Schmelter's answer (was testing this when he posted his update. :-()
public static T PreferredFirst<T>(this IEnumerable<T> data, IEnumerable<T> queryValues, T whenNone)
{
var matched = from d in data
join v in queryValues.Select((value,idx) => new {value, idx}) on d equals v.value
orderby v.idx
select new { d, v.idx };
var found = matched.FirstOrDefault();
return found != null ? found.d : whenNone;
}
// usage:
myResult = myTable.Select(x => x.ColumnValue)
.PreferredFirst(new [] {"Three", "One", "Two"}, "Four");
I've also written one that can quit earlier:
public static T PreferredFirst<T>(this IEnumerable<T> data, IList<T> orderBy, T whenNone)
{
// probably should consider a copy of orderBy if it can vary during runtime
var minIndex = int.MaxValue;
foreach(var d in data)
{
var idx = orderBy.IndexOf(d);
if (idx == 0) return d; // best case; quit now
if (idx > 0 && idx < minIndex) minIndex = idx;
}
// return the best found or "whenNone"
return minIndex == int.MaxValue ? whenNone : orderBy[minIndex];
}

I use a weighted approach in SQL where I assign a weight to each conditional value. The solution would then be found by finding the highest or lowest weight depending on your ordering scheme.
Below would be the equivalent LINQ query. Note that in this example I am assigning a lower weight a higher priority:
void Main()
{
// Assume below list is your dataset
var myList =new List<dynamic>(new []{
new {ColumnKey=1, ColumnValue ="Two"},
new {ColumnKey=2, ColumnValue ="Nine"},
new {ColumnKey=3, ColumnValue ="One"},
new {ColumnKey=4, ColumnValue ="Eight"}});
var result = myList.Select(p => new
{
ColVal = p.ColumnValue,
OrderKey = p.ColumnValue == "Three" ? 1 :
p.ColumnValue == "One" ? 2 :
p.ColumnValue == "Two" ? 3 : 4
}).Where(i=> i.OrderKey != 4)
.OrderBy(i=>i.OrderKey)
.Select(i=> i.ColVal)
.FirstOrDefault();
Console.WriteLine(result ?? "Four");
}

How about something like this:
var results = myTable.Select(x => x.ColumnValue).Distinct().ToList();
if (results.Contains("Three")) {
myResult = "Three";
} else if (results.Contains("One")) {
myResult = "One";
} else if (results.Contains("Two")) {
myResult = "Two";
} else {
myResult = "Four";
}
