Remove Duplicate item from datatable that starts with alphabet - c#

I'm trying to remove duplicate data from a DataTable, but not by simply keeping the first entry and removing the second and later duplicates. I need to set a condition so that the incorrect entry is the one that gets removed.
For example:
ID Value
111 A
222 B
333 C
444 A
I want to remove the row with ID 111 and keep 444, because both have the duplicate value A. The other solutions I found remove 444 instead.
The closest thing I can find that relates to my question is this.
Remove Duplicate item from list based on condition
The answer uses LINQ, which I'm not familiar with. I was thinking of using "StartsWith" to filter the data I want to keep, but I have no idea how to work it into the query.
var result = items
.GroupBy(item => item.Name)
.SelectMany(g => g.Count() > 1 ? g.Where(x => x.Price != 500) : g); // <-- I want to apply StartsWith here
Really appreciate if someone could help me with this.

I think you need something like
var result = items
.GroupBy(item => item.Name)
// keep only the groups that contain duplicates and whose key is "A"
.Where(g => g.Count() > 1 && g.Key == "A") // or g.Key.StartsWith("A")
.SelectMany(g => g);
This returns a sequence of all the "A" elements, and then you can decide which ones you'd like to delete.
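To select every duplicate except the last inserted element (the one with the highest ID), so that you can delete them:
var result = items
.GroupBy(item => item.Name)
// groups with a single element have nothing to delete
.Where(g => g.Count() > 1)
.SelectMany(g =>
{
// the element to keep: the one with the highest ID in the group
var mainElement = g.OrderByDescending(x => x.ID).First();
// everything else in the group is a duplicate to delete
return g.Where(x => x.ID != mainElement.ID);
});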

You forgot to say why you want to keep item 444 and not item 111 instead of the other way around.
LINQ is developed to query data. LINQ will never change the original source sequence.
You can use LINQ to query the items that you want to remove, and then use a foreach to remove the items one by one.
To query the items with duplicates is easy. If you need this function more often, consider creating an extension function for this:
static IEnumerable<IGrouping<TKey, TSource>> GetDuplicates<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> propertySelector)
{
// TODO: check source and propertySelector not null
// make groups of source items that have the same value for property:
return source.GroupBy(item => propertySelector(item))
// keep only the groups that have more than one element
// it would be a waste to Count() the whole group, so stop as soon as a second element is found
.Where(group => group.Skip(1).Any());
}
This will give you groups of all source items that have duplicate values for the selected property.
In your case:
var itemsWithDuplicateValues = mySourceItems.GetDuplicates(item => item.Value);
This will give you all your source items that have duplicate values for item.Value, grouped by same item.Value
Now that you've got time to find out why you want to keep item with Id 444 and not 111, you can write a function that takes a group of duplicates and returns the elements that you want to remove.
static IEnumerable<TSource> SelectItemsIWantToRemove<TSource>(
IEnumerable<TSource> source)
{
// TODO: check source not null
// select the items that you want to remove:
foreach (var item in source)
{
if (ShouldRemove(item)) // ShouldRemove is a placeholder for your own "I want to remove this item" condition
yield return item;
}
// TODO: make sure there is always one item that you want to keep
// or decide what to do if there isn't any item that you want to keep
}
Now that you've got a function that selects the items that you want to remove it is easy to create a LINQ that will select from your sequence of duplicates the item that you want to remove:
static IEnumerable<TSource> WhereIWantToRemove<TKey, TSource>(
this IEnumerable<IGrouping<TKey, TSource>> duplicateGroups)
{
foreach (var group in duplicateGroups)
{
// every group is itself a sequence of source items
foreach (var sourceItem in SelectItemsIWantToRemove(group))
{
yield return sourceItem;
}
}
}
You could also use a SelectMany for this.
Now put everything together:
static IEnumerable<TSource> WhereIWantToRemove<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> propertySelector)
{
return source.GetDuplicates(propertySelector)
.WhereIWantToRemove();
}
Usage:
var itemsToRemove = mySourceItems.WhereIWantToRemove(item => item.Value);
You can see that I chose to create several fairly small and easy to understand extension functions. Of course you can put them all together in one big LINQ statement. However, I'm not sure you could convince your project leader that this would make your code more readable, testable, maintainable and reusable. So my advice would be to stick to the small extension functions.

You can group the DataRows by Value and then select all the rows that don't match your conditions, and then delete all those rows:
var result = items.AsEnumerable()
.GroupBy(item => item.Field<string>("Value"))
.Where(g => g.Count() > 1)
.SelectMany(g => g.Where(x => !x.Field<string>("ID").StartsWith("4")));
foreach (var r in result) {
r.Delete();
}
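As a self-contained sketch of the same idea: the helper below keeps, for every Value that occurs more than once, the row with the highest ID and removes the others. The class and method names are made up for illustration, the ID column is assumed to be an int, and "keep the highest ID" is only an assumed rule, since the question never states why 444 should win.

```csharp
using System.Data;
using System.Linq;

public static class DuplicateRowRemover
{
    // Hypothetical helper: for every Value that occurs more than once,
    // keep the row with the highest ID and remove all other rows.
    public static void KeepHighestIdPerValue(DataTable table)
    {
        var rowsToRemove = table.AsEnumerable()
            .GroupBy(row => row.Field<string>("Value"))
            // only groups with at least two rows contain duplicates
            .Where(group => group.Skip(1).Any())
            // skip the row to keep (highest ID), select the rest
            .SelectMany(group => group
                .OrderByDescending(row => row.Field<int>("ID"))
                .Skip(1))
            .ToList(); // materialize before modifying the table

        foreach (var row in rowsToRemove)
        {
            table.Rows.Remove(row);
        }
    }
}
```

With the example data from the question, calling DuplicateRowRemover.KeepHighestIdPerValue(table) should leave the rows 222, 333 and 444.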

Related

Sublists of consecutive elements that fit a condition in a list c# linq

Suppose we have a parking area represented as a Dictionary<int,bool>:
every parking lot has an id and a boolean (free or filled).
Like this:
Dictionary<int,bool> parking..
parking[0]= true // means that the first parking lot is free
My question: I want to get all sublists of consecutive elements that match a condition, namely that the parking lot is free.
First, I can easily get the elements that match this condition:
parking.Where(X => X.Value).Select(x => x.Key).ToList();
But I don't know how to use LINQ operations to split that result into runs of consecutive lots.
Can I do this without a thousand foreach/while loops checking the items one by one? Is there an easier way with LINQ?
This method gets a list of consecutive free parking lots
data:
0-free,
1-free,
2-filled ,
3-free
The results will be two lists:
First One will contain => 0 ,1
Second One will contain=> 3
These are the lists of consecutive parking lots that are free.
public List<List<int>> ConsecutiveParkingLotFree(int numberOfConsecutive){}
You can always write your own helper function to do things like this. For example
public static IEnumerable<List<T>> GroupSequential<T>(
this IEnumerable<T> self,
Func<T, bool> condition)
{
var list = new List<T>();
using var enumerator = self.GetEnumerator();
if (enumerator.MoveNext())
{
var current = enumerator.Current;
var oldValue = condition(current);
if (oldValue)
{
list.Add(current);
}
while (enumerator.MoveNext())
{
current = enumerator.Current;
var newValue = condition(current);
if (newValue)
{
list.Add(current);
}
else if (oldValue)
{
yield return list;
list = new List<T>();
}
oldValue = newValue;
}
if (list.Count > 0)
{
yield return list;
}
}
}
This will put all the items with a true-value in a list. When a true->false transition is encountered the list is returned and recreated. I would expect that there are more compact ways to write functions like this, but it should do the job.
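You can apply a GroupWhile solution here. Note that GroupWhile is not one of the standard System.Linq operators; it is a custom extension method that keeps neighboring elements in the same group while a condition holds.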
parking.Where(X => X.Value)
.Select(x => x.Key)
.GroupWhile((x, y) => y - x == 1)
.ToList()
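Since GroupWhile has to be supplied by you, here is a minimal sketch of what such a helper could look like. The name and signature are assumptions chosen to match the call above, not an existing library API.

```csharp
using System;
using System.Collections.Generic;

public static class GroupWhileExtensions
{
    // Sketch of a GroupWhile helper (not part of LINQ): starts a new group
    // whenever the predicate fails for a pair of neighboring elements.
    public static IEnumerable<List<T>> GroupWhile<T>(
        this IEnumerable<T> source,
        Func<T, T, bool> belongsTogether)
    {
        using var enumerator = source.GetEnumerator();
        if (!enumerator.MoveNext())
        {
            yield break; // empty input: no groups
        }

        var group = new List<T> { enumerator.Current };
        var previous = enumerator.Current;

        while (enumerator.MoveNext())
        {
            var current = enumerator.Current;
            if (!belongsTogether(previous, current))
            {
                // the run is broken: hand out the finished group, start a new one
                yield return group;
                group = new List<T>();
            }
            group.Add(current);
            previous = current;
        }

        yield return group; // the last (possibly only) group
    }
}
```

A Dictionary does not guarantee enumeration order, so it is safer to add .OrderBy(key => key) before .GroupWhile(...) when the keys must be consecutive.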

get priority items from a list and insert them into the same list at given position c#

I have a list of (say) 50 sorted items, a few of which are priority items (assume they have a flag set to 1).
By default, I have to show the latest items (based on date) first, but the priority items should appear after some 'x' number of records, like below:
index 0: Item
index 1: Item
index 2: Priority Item (insert priority items from this position)
index 3: Priority Item
index 4: Priority Item
index 5: Item
index 6: Item
The index 'x' at which priority items should be inserted is pre-defined.
To achieve this, i am using following code
These are my 50 sorted items
var list= getMyTop50SortedItems();
fetching all priority items and storing it in another list
var priorityItems = list.Where(x => x.flag == 1).ToList();
filtering out the priority items from main list
list.RemoveAll(x => x.flag == 1);
inserting priority items in the main list at given position
list.InsertRange(1, priorityItems);
This process does the job correctly and gives me the expected result. But I am not sure whether it is the correct way to do it, or whether there is a better way (considering performance).
Please provide your suggestions.
Also, how is performance affected by doing several operations (filter, remove, insert) when the number of records grows from 50 to 100000 (or any number)?
Update: How can I use IQueryable to decrease the number of operations on the list?
As per documentation on InsertRange:
This method is an O(n * m) operation, where n is the number of
elements to be added and m is Count.
O(n * m) isn't very good, so I would use LINQ's Concat method to build a whole new sequence from three smaller ones, instead of modifying the existing list.
var allItems = getMyTop50();
var topPriorityItems = allItems.Where(x => x.flag == 1).ToList();
var topNonPriorityItems = allItems.Where(x => x.flag != 1).ToList();
// constant is the predefined index at which the priority items should start
var result = topNonPriorityItems
.Take(constant)
.Concat(topPriorityItems)
.Concat(topNonPriorityItems.Skip(constant));
I am not sure how fast the Concat, Skip and Take methods for List<T> are, though, but I'd bet they are not slower than O(n).
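To make the cost concrete, here is a minimal sketch that partitions the list in one pass and then builds the result with three appends, so the whole operation is linear in the number of items. The helper name, the isPriority delegate and the clamping of the index are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PriorityPlacement
{
    // Hypothetical helper: returns a new list in which all priority items are
    // placed at 'insertIndex', keeping the relative order within each group.
    public static List<T> PlacePriorityItems<T>(
        IReadOnlyList<T> sortedItems,
        Func<T, bool> isPriority,
        int insertIndex)
    {
        var priority = new List<T>();
        var others = new List<T>();
        foreach (var item in sortedItems)
        {
            (isPriority(item) ? priority : others).Add(item);
        }

        // Clamp so that a short list does not cause an out-of-range error.
        var index = Math.Min(Math.Max(insertIndex, 0), others.Count);

        var result = new List<T>(sortedItems.Count);
        result.AddRange(others.GetRange(0, index));
        result.AddRange(priority);
        result.AddRange(others.GetRange(index, others.Count - index));
        return result;
    }
}
```

For the question's data this could be called as PriorityPlacement.PlacePriorityItems(list, item => item.flag == 1, x).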
It seems like the problem you're actually trying to solve is just sorting the list of items. If this is the case, you don't need to be concerned with removing the priority items and reinserting them at the correct index, you just need to figure out your sort ordering function. Something like this ought to work:
// Set "x" to be whatever you want based on your requirements --
// this is the number of items that will precede the "priority" items in the
// sorted list
var x = 3;
var sortedList = list
.Select((item, index) => Tuple.Create(item, index))
.OrderBy(item => {
// If the original position of the item is below whatever you've
// defined "x" to be, then keep the original position
if (item.Item2 < x) {
return item.Item2;
}
// Otherwise, ensure that "priority" items appear first
return item.Item1.flag == 1 ? x + item.Item2 : list.Count + x + item.Item2;
}).Select(item => item.Item1);
You may need to tweak this slightly based on what you're trying to do, but it seems much simpler than removing/inserting from multiple lists.
Edit: Forgot that .OrderBy doesn't provide an overload that provides the original index of the item; updated answer to wrap the items in a Tuple that contains the original index. Not as clean as the original answer, but it should still work.
This can be done using a single enumeration of the original collection using linq-to-objects. IMO this also reads pretty clearly based on the original requirements you defined.
First, define the "buckets" that we'll be sorting into: I like using an enum here for clarity, but you could also just use an int.
enum SortBucket
{
RecentItems = 0,
PriorityItems = 1,
Rest = 2,
}
Then we'll define the logic for which "bucket" a particular item will be sorted into:
private static SortBucket GetBucket(Item item, int position, int recentItemCount)
{
if (position < recentItemCount) // position is zero-based, so this keeps exactly recentItemCount items
{
return SortBucket.RecentItems;
}
return item.IsPriority ? SortBucket.PriorityItems : SortBucket.Rest;
}
And then a fairly straightforward linq-to-objects statement to sort first into the buckets we defined, and then by the original position. Written as an extension method:
static IEnumerable<Item> PrioritySort(this IEnumerable<Item> items, int recentItemCount)
{
return items
.Select((item, originalPosition) => new { item, originalPosition })
.OrderBy(o => GetBucket(o.item, o.originalPosition, recentItemCount))
.ThenBy(o => o.originalPosition)
.Select(o => o.item);
}

Select records with max property value per group

I've got a data set like this:
GroupName GroupValue MemberName MemberValue
'Group1' 1 'Member1' 1
'Group1' 1 'Member2' 2
'Group2' 2 'Member3' 3
'Group2' 2 'Member4' 2
'Group3' 2 'Member5' 4
'Group3' 2 'Member6' 1
What I want to select is the rows that have the maximum MemberValue per GroupName, but only for those GroupNames that have the largest GroupValue, and pass them into a delegate function. Like this:
'Group2' 2 'Member3' 3
'Group3' 2 'Member5' 4
So far I've tried this format...
data.Where(maxGroupValue =>
maxGroupValue.GroupValue == data.Max(groupValue => groupValue.GroupValue))
.Select(FunctionThatTakesData)
...but that just gives me every member of Group2 and Group3. I've tried putting a GroupBy() before the Select(), but that turns the output into an IGrouping<string, DataType> so FunctionThatTakesData() doesn't know what to do with it, and I can't do another Where() to filter out only the maximum MemberValues.
What can I do to get this data set properly filtered and passed into my function?
You can do that with the following Linq.
var results = data.GroupBy(r => r.GroupValue)
.OrderByDescending(g => g.Key)
.FirstOrDefault()
?.GroupBy(r => r.GroupName)
.Select(g => g.OrderByDescending(r => r.MemberValue).First());
First you have to group on the GroupValue then order the groups in descending order by the Key (which is the GroupValue) and take the first one. Now you have all the rows with the max GroupValue. Then you group those on the GroupName and from those groups order the MemberValue in descending order and take the First row to get the row in each GroupName group with the max MemberValue. Also I'm using the C# 6 null conditional operator ?. after FirstOrDefault in case data is empty. If you're not using C# 6 then you'll need to handle that case up front and you can just use First instead.
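The two steps described above can also be written out separately, which some readers find easier to follow. The sketch below does this; the Member record and the method name are hypothetical stand-ins for the question's data type.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type matching the columns in the question.
public sealed record Member(string GroupName, int GroupValue, string MemberName, int MemberValue);

public static class MaxPerGroup
{
    public static List<Member> TopMembersOfTopGroups(IEnumerable<Member> data)
    {
        var rows = data.ToList();
        if (rows.Count == 0)
        {
            return new List<Member>(); // nothing to select from
        }

        // Step 1: find the largest GroupValue.
        var maxGroupValue = rows.Max(r => r.GroupValue);

        // Step 2: within the rows that have it, take the top member per GroupName.
        return rows
            .Where(r => r.GroupValue == maxGroupValue)
            .GroupBy(r => r.GroupName)
            .Select(g => g.OrderByDescending(r => r.MemberValue).First())
            .ToList();
    }
}
```

With the sample data from the question this should return Member3 (for Group2) and Member5 (for Group3), which can then be passed on with .Select(FunctionThatTakesData).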
So basically what you want, is to divide your data elements into groups with the same value for GroupName. From every group you want to take one element, namely the one with the largest value for property MemberValue.
Whenever you have a sequence of items, and you want to divide this sequence into groups based on the value of one or more properties of the items in the sequence you use Enumerable.GroupBy
'GroupBy' takes your sequence as input and an extra input parameter: a function that selects which properties of your items you want to compare in your decision in which group you want the item to appear.
In your case, you want to divide your sequence into groups where all elements in a group have the same GroupName.
var groups = mySequence.GroupBy(element => element.GroupName);
What it does, it takes from every element in mySequence the property GroupName, and puts this element into a group of elements that have this value of GroupName.
Using your example data, you'll have three groups:
The group with all elements with GroupName == "Group1". The first two elements of your sequence will be in this group
The group with all elements with GroupName == "Group2". The third and fourth element of your sequence will be in this group
The group with all elements with GroupName == "Group3". The last two elements of your sequence will be in this group
Each group has a property Key, containing your selection value. This key identifies the group and is guaranteed to be unique within your collection of groups. So you'll have a group with Key == "Group1", a group with Key == "Group2", etc.
Besides the Key, every group is a sequence of the elements in the group (note: the group IS an enumerable sequence; it does not HAVE an enumerable sequence).
Your second step would be to take from every group the element in the group with the largest value for MemberValue. For this you would order the elements in the group by descending value for property MemberValue and take the first one.
var myResult = mySequence.GroupBy(element => element.GroupName)
// intermediate result: groups where all elements have the same GroupName
.Select(group => group.OrderByDescending(groupElement => groupElement.MemberValue)
// intermediate result: the elements of each group, ordered by descending MemberValue
.First());
Result: from every group ordered by descending memberValue, take the first element, which should be the largest one.
It is not very efficient to order the complete group if you only want the element with the largest value for MemberValue. The answer for this can be found here on StackOverflow.
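One way to avoid the sort, sketched here under the assumption that the compared value implements IComparable<T>, is to scan each group once with Aggregate and remember the best element seen so far. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupMaxSketch
{
    // For each key, keeps the element with the largest value. Aggregate scans
    // each group once instead of sorting it; on ties the first element wins.
    public static IEnumerable<TSource> MaxPerGroup<TSource, TKey, TValue>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TSource, TValue> valueSelector)
        where TValue : IComparable<TValue>
    {
        return source
            .GroupBy(keySelector)
            // a group is never empty, so Aggregate without a seed is safe here
            .Select(group => group.Aggregate((best, next) =>
                valueSelector(next).CompareTo(valueSelector(best)) > 0 ? next : best));
    }
}
```

For the question's data the call would look like GroupMaxSketch.MaxPerGroup(mySequence, e => e.GroupName, e => e.MemberValue).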
The easier way to solve this problem is to use the new (.NET 6) MaxBy LINQ operator, along with the GroupBy and Select operators:
IEnumerable<Record> query = records
.GroupBy(x => x.GroupName)
.Select(g => g.MaxBy(x => x.MemberValue));
This is an easy but not memory-efficient solution. The reason is that it generates a full-blown Lookup<TKey, TSource> structure under the hood, which is a dictionary-like container that contains all the records associated with each key. This structure is generated before starting to compare the elements contained in each grouping, in order to select the maximum element.
In most cases this inefficiency is not a problem, because the records are not that many, and they are already stored in memory. But if you have a truly deferred enumerable sequence that contains a humongous number of elements, you might run out of memory. In this case you could use the GroupMaxBy operator below. This operator stores in memory only the currently maximum element per key:
/// <summary>
/// Groups the elements of a sequence according to a specified key selector
/// function, and then returns the maximum element in each group according to
/// a specified value selector function.
/// </summary>
public static IEnumerable<TSource> GroupMaxBy<TSource, TKey, TValue>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TValue> valueSelector,
IEqualityComparer<TKey> keyComparer = default,
IComparer<TValue> valueComparer = default)
{
// Arguments validation omitted
valueComparer ??= Comparer<TValue>.Default;
var dictionary = new Dictionary<TKey, (TSource Item, TValue Value)>(keyComparer);
foreach (var item in source)
{
var key = keySelector(item);
var value = valueSelector(item);
if (dictionary.TryGetValue(key, out var existing) &&
valueComparer.Compare(existing.Value, value) >= 0) continue;
dictionary[key] = (item, value);
}
foreach (var entry in dictionary.Values)
yield return entry.Item;
}
Usage example:
IEnumerable<Record> query = records
.GroupMaxBy(x => x.GroupName, x => x.MemberValue);
The reverse GroupMinBy can be implemented similarly by replacing the >= with <=.
Below is a demonstration of the difference in memory-efficiency between the two approaches:
var source = Enumerable.Range(1, 1_000_000);
{
var mem0 = GC.GetTotalAllocatedBytes(true);
source.GroupBy(x => x % 1000).Select(g => g.MaxBy(x => x % 3333)).Count();
var mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Allocated: {mem1 - mem0:#,0} bytes");
}
{
var mem0 = GC.GetTotalAllocatedBytes(true);
source.GroupMaxBy(x => x % 1000, x => x % 3333).Count();
var mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Allocated: {mem1 - mem0:#,0} bytes");
}
Output:
Allocated: 8,571,168 bytes
Allocated: 104,144 bytes
Try it on Fiddle.

Query IEnumerable as IEnumerable<Type>

I have a problem I need to solve efficiently.
I require the index of an element in an IEnumerable source, one way I could do this is with the following
var items = source.Cast<ObjectType>().Where(obj => obj.Start == forDate);
This would give me an IEnumerable of all the items that match the predicate.
if(items != null && items.Any()){
// I now need the ordinal from the original list
return source.IndexOf(items[0]);
}
However, the list could be vast and the operation will be carried out many times. I believe this is inefficient and there must be a better way to do this.
I would be grateful if anyone can point me in the correct direction.
Sometimes, it's good to forget about Linq and go back to basics:
int index = 0;
foreach (ObjectType element in source)
{
if (element.Start == forDate)
{
return index;
}
index++;
}
// No element found
return -1;
Using Linq, you can take the index of each object before filtering them:
source
.Cast<ObjectType>()
.Select((obj, i) => new { Obj = obj, I = i })
.Where(x => x.Obj.Start == forDate)
.Select(x => x.I)
.FirstOrDefault();
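However, this is not really efficient, and FirstOrDefault() returns 0 when nothing matches, which cannot be told apart from a match at index 0. The following does the same without allocations: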
int i = 0;
foreach (ObjectType obj in source)
{
if (obj.Start == forDate)
{
return i;
}
i++;
}
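If this loop is needed in several places, it can be wrapped in a small extension method. The sketch below is one possible shape; the name IndexOfFirst and the convention of returning -1 when nothing matches are assumptions, not an existing framework API.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceIndexExtensions
{
    // Hypothetical helper: zero-based index of the first element matching
    // the predicate, or -1 when nothing matches.
    public static int IndexOfFirst<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var index = 0;
        foreach (var item in source)
        {
            if (predicate(item))
            {
                return index; // stop at the first match
            }
            index++;
        }
        return -1;
    }
}
```

For the question's data the call would be source.Cast<ObjectType>().IndexOfFirst(obj => obj.Start == forDate).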
Your second code sample was invalid: since items is an IEnumerable, you cannot call items[0]. You can use First(). Anyway:
var items = source.Cast<ObjectType>()
.Select((item, index) => new KeyValuePair<int, ObjectType>(index, item))
.Where(obj => obj.Value.Start == forDate);
and then:
if (items != null && items.Any()) {
return items.First().Key;
}
If you need to do this multiple times I would create a lookup for the indices.
ILookup<DateTime, int> lookup =
source
.Cast<ObjectType>()
.Select((e, i) => new { e, i })
.ToLookup(x => x.e.Start, x => x.i);
Now given a forDate you can do this:
IEnumerable<int> indices = lookup[forDate];
Since the lookup is basically like a dictionary that returns multiple values you get the results instantly. So repeating this for multiple values is super fast.
And since this returns IEnumerable<int> you know when there are duplicate values within the source list. If you only need the first one then just do a .First().

Use LINQ to search items in a SortedDictionary

I have a SortedDictionary of the type:
SortedDictionary<PriorityType, List<T>> dictionary;
where PriorityType is an Enum class, and the List contains various string values.
I want to use LINQ to search for the string items in the list, that have an even length.
As in:
IEnumerable<T> filteredList = new List<T>();
// Stores items in list whose string length is even
filteredList = //LINQ code;
I have tried a lot of implementations of LINQ but, it seems tough to traverse a List in a SortedDictionary using LINQ (taking into account I'm relatively new to LINQ).
Please help me with the LINQ code. Thanks!
If I understand you correctly, then you need items from lists which have even count of items:
filteredList = dictionary.Select(kvp => kvp.Value)
.Where(l => l != null && l.Count % 2 == 0)
.SelectMany(l => l)
.ToList();
UPDATE: If you want to select strings with even length, then you should use List<string> instead of generic list of T:
SortedDictionary<PriorityType, List<string>> dictionary;
filteredList = dictionary.SelectMany(kvp => kvp.Value)
.Where(s => s.ToString().Length % 2 == 0)
.ToList();
The solution provided by @Sergey is correct and meets my requirements.
I also found another easy solution using LINQ query syntax.
filteredList = from list in dictionary.Values from item in list where item.ToString().Length % 2 == 0 select item;
Hope this helps!
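For completeness, here is a small self-contained sketch of the even-length filter with null checks added. The PriorityType values and the class and method names are invented for illustration, since the question does not show them.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PriorityType enum from the question.
public enum PriorityType { High, Medium, Low }

public static class EvenLengthFilter
{
    // Flattens all lists in key order and keeps the strings of even length.
    public static List<string> EvenLengthStrings(
        SortedDictionary<PriorityType, List<string>> dictionary)
    {
        return dictionary.Values
            .Where(list => list != null)
            .SelectMany(list => list)
            .Where(s => s != null && s.Length % 2 == 0)
            .ToList();
    }
}
```

Because a SortedDictionary enumerates its values in key order, the result lists the matches for the lowest enum value first.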
