linq groupby and max count - c#

I have a class like
public class Test
{
    public string name;
    public int status;
    public Test(string name, int status) { this.name = name; this.status = status; }
}
Example data
new Test("Name1", 1);
new Test("Name2", 2);
new Test("Name3", 3);
new Test("Name4", 1);
new Test("Name5", 2);
new Test("Name6", 2);
new Test("Name7", 3);
I'm looking for some linq to return the value 2 - which is the status that occurs the most.
Currently I have the following which is not correct.
var status = listTest.GroupBy(x => x.status).Select(x => x.OrderByDescending(t => t.status).First()).FirstOrDefault().status;
But hoping there is something cleaner?

I think this is what you want
You need to order the groups themselves, not what is in each group.
var status = listTest
.GroupBy(x => x.Status)
.OrderByDescending(g => g.Count())
.FirstOrDefault()?.Key;

You can group and pick the top after sorting them descending
var value = list.GroupBy(q => q.status)
.OrderByDescending(gp => gp.Count())
.First().Key;
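For what it's worth, on .NET 6 or later the sort can be avoided entirely with Enumerable.MaxBy, which finds the largest group in a single pass; a minimal sketch against the same listTest:
var status = listTest
    .GroupBy(x => x.status)
    .MaxBy(g => g.Count())
    ?.Key;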

Requirement: Given a sequence of objects of class Test, where every Test has an int property Status, give me the value of Status that occurs the most.
For this, make groups of Test objects that have the same value for property Status. Count the number of elements in each group. Order the result such that the group with the largest count comes first, and take the first element.
IEnumerable<Test> testSequence = ...
var statusThatOccursMost = testSequence
// make Groups of Tests that have the same value for Status:
.GroupBy(test => test.Status,
// parameter resultSelector: for every occurring Status value and all
// Tests with this common status value, make one new object,
// containing the common Status value and the number of Tests that have
// this common Status value
(commonStatusValue, testsThatHaveThisCommonStatusValue) => new
{
Status = commonStatusValue,
Count = testsThatHaveThisCommonStatusValue.Count(),
})
Result: a sequence of [Status, Count] combinations. The Status occurs at least once in testSequence. Count is the number of times that Status occurs. So we know that Count >= 1.
Order this sequence of [Status, Count] combinations by descending value of Count, so the first element is the one with the largest value for Count:
.OrderByDescending(statusCountCombination => statusCountCombination.Count)
Result: a sequence of [Status, Count] combinations, where the combination with the largest value of Count comes first.
Extract the value of Status from the combination, and take the first one:
.Select(statusCountCombination => statusCountCombination.Status)
.FirstOrDefault();
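Putting the pieces together, the whole query reads:
var statusThatOccursMost = testSequence
    .GroupBy(test => test.Status,
        (commonStatusValue, testsThatHaveThisCommonStatusValue) => new
        {
            Status = commonStatusValue,
            Count = testsThatHaveThisCommonStatusValue.Count(),
        })
    .OrderByDescending(statusCountCombination => statusCountCombination.Count)
    .Select(statusCountCombination => statusCountCombination.Status)
    .FirstOrDefault();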
Optimization
Although this LINQ is fairly simple, it is not very efficient to count all Status values and order all Status/Count combinations if you only want the one with the largest Count.
Consider creating an extension method. If you are not familiar with extension methods, read Extension Methods Demystified.
Make a Dictionary: the key is the Status, the value is the number of times this Status has occurred. Then take the Status with the largest count.
public static int ToMostOccurringStatusValueOrDefault(
    this IEnumerable<Test> testSequence)
{
    // return default if testSequence is empty
    if (!testSequence.Any()) return 0;

    Dictionary<int, int> statusCountCombinations = new Dictionary<int, int>();
    foreach (Test test in testSequence)
    {
        if (statusCountCombinations.TryGetValue(test.Status, out int count))
        {
            // Status value already in dictionary: increase count:
            statusCountCombinations[test.Status] = count + 1;
        }
        else
        {
            // Status value not in dictionary yet. Add with count 1
            statusCountCombinations.Add(test.Status, 1);
        }
    }
GroupBy works similarly to the above, except that it first makes a Dictionary where every Value is a list of Tests. Then it counts the number of Tests and throws away the list. In the extension method we don't have to make the List at all.
Continuing the extension method: find the KeyValuePair that has the largest Value. We can use Enumerable.Aggregate, or enumerate:
    using (var enumerator = statusCountCombinations.GetEnumerator())
    {
        // we know there is at least one element
        enumerator.MoveNext();

        // the first element is the largest until now:
        KeyValuePair<int, int> largest = enumerator.Current;

        // enumerate the rest:
        while (enumerator.MoveNext())
        {
            if (enumerator.Current.Value > largest.Value)
            {
                // found a new largest one
                largest = enumerator.Current;
            }
        }
        return largest.Key;
    }
}
In this method we only have to enumerate testSequence once and the Dictionary once. If you used LINQ GroupBy / OrderByDescending, the result of GroupBy would be enumerated several times.
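The Enumerable.Aggregate alternative mentioned above would look like this sketch; a Dictionary<int, int> enumerates as KeyValuePair<int, int> elements, and Aggregate is safe here because the method already returned for an empty sequence:
int mostOccurring = statusCountCombinations
    .Aggregate((largest, next) => next.Value > largest.Value ? next : largest)
    .Key;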
Usage:
IEnumerable<Test> testSequence = ...
var mostCommonStatus = testSequence.ToMostOccurringStatusValueOrDefault();

Related

Sublists of consecutive elements that fit a condition in a list c# linq

So suppose we have a parking area, represented as a Dictionary<int, bool>:
Every parking lot has an id and a boolean (free or filled).
This way:
Dictionary<int, bool> parking;
parking[0] = true // means that the first parking lot is free
My question is: I want to get all the sublists of consecutive elements that match a condition: the parking lot is free.
First, I can easily get the elements that fit this condition:
parking.Where(x => x.Value).Select(x => x.Key).ToList();
But then, using LINQ operations, I don't know how to get the generated lists of consecutive matches.
Can I do this without thousands of foreach/while loops iterating one by one? Is there an easier way with LINQ?
This method gets a list of consecutive free parking lots
data:
0-free,
1-free,
2-filled,
3-free
The result will be two lists:
the first one will contain => 0, 1
the second one will contain => 3
These are the lists of consecutive parking lots that are free.
public List<List<int>> ConsecutiveParkingLotFree(int numberOfConsecutive){}
You can always write your own helper function to do things like this. For example
public static IEnumerable<List<T>> GroupSequential<T>(
    this IEnumerable<T> self,
    Func<T, bool> condition)
{
    var list = new List<T>();
    using var enumerator = self.GetEnumerator();
    if (enumerator.MoveNext())
    {
        var current = enumerator.Current;
        var oldValue = condition(current);
        if (oldValue)
        {
            list.Add(current);
        }
        while (enumerator.MoveNext())
        {
            current = enumerator.Current;
            var newValue = condition(current);
            if (newValue)
            {
                list.Add(current);
            }
            else if (oldValue)
            {
                yield return list;
                list = new List<T>();
            }
            oldValue = newValue;
        }
        if (list.Count > 0)
        {
            yield return list;
        }
    }
}
This will put all the items with a true-value in a list. When a true->false transition is encountered the list is returned and recreated. I would expect that there are more compact ways to write functions like this, but it should do the job.
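A usage sketch for the parking example, assuming the lot ids are the dense keys 0..n, so that after ordering by key, "adjacent in the sequence" means "consecutive ids":
List<List<int>> freeRuns = parking
    .OrderBy(pair => pair.Key)           // dictionary order is not guaranteed
    .GroupSequential(pair => pair.Value) // runs of free lots
    .Select(run => run.Select(pair => pair.Key).ToList())
    .ToList();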
You can apply the GroupWhile solution here.
parking.Where(X => X.Value)
.Select(x => x.Key)
.GroupWhile((x, y) => y - x == 1)
.ToList()
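GroupWhile is not a built-in LINQ operator; a minimal sketch of such an extension, assuming the (previous, current) predicate signature used above:
public static IEnumerable<List<T>> GroupWhile<T>(
    this IEnumerable<T> source, Func<T, T, bool> condition)
{
    using var enumerator = source.GetEnumerator();
    if (!enumerator.MoveNext())
        yield break;
    var group = new List<T> { enumerator.Current };
    var previous = enumerator.Current;
    while (enumerator.MoveNext())
    {
        if (condition(previous, enumerator.Current))
        {
            // still in the same run: keep collecting
            group.Add(enumerator.Current);
        }
        else
        {
            // run broken: emit the group and start a new one
            yield return group;
            group = new List<T> { enumerator.Current };
        }
        previous = enumerator.Current;
    }
    yield return group;
}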

Is it possible to always take 3 objects and if only 2 exists it returns 3 but one has null values in?

I need to return the 3 latest elements in a collection. If I use LINQ, e.g. .OrderByDescending(a => a.Year).Take(3), then this is fine as long as the collection contains at least 3 elements. What I want is for it always to return 3, so for example if there are only 2 items then the last item would be a blank/initialised element (ideally where I could configure what was returned).
Is this possible?
You can concatenate the sequence with another (lazily created) sequence of 3 elements:
var result = query
.OrderByDescending(a => a.Year)
.Concat(Enumerable.Range(0, 3).Select(_ => new ResultElement()))
.Take(3);
Or perhaps:
var result = query
.OrderByDescending(a => a.Year)
.Concat(Enumerable.Repeat(new ResultElement(), 3))
.Take(3);
(The latter will end up with duplicate references and will always create an empty element, so I'd probably recommend the former... but it depends on the context. You might want to use Enumerable.Repeat(null, 3) and handle null elements instead.)
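Note that a bare null would leave Repeat unable to infer its type argument, so the null variant would need the type spelled out; a sketch, assuming the same ResultElement type:
var result = query
    .OrderByDescending(a => a.Year)
    .Concat(Enumerable.Repeat<ResultElement>(null, 3))
    .Take(3);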
You could write your own extension method:
public static IEnumerable<T> TakeAndCreate<T>(this IEnumerable<T> input, int amount, Func<T> defaultElement)
{
int counter = 0;
foreach(T element in input.Take(amount))
{
yield return element;
counter++;
}
for(int i = 0; i < amount - counter; i++)
{
yield return defaultElement.Invoke();
}
}
Usage is
var result = input.OrderByDescending(a => a.Year).TakeAndCreate(3, () => new ResultElement());
One advantage of this solution is that it will create new elements only if they are actually needed, which might be good for performance if you have a lot of elements to be created or their creation is not trivial.
Online demo: https://dotnetfiddle.net/HHexGd

Flatten a Dictionary<int, List<object>>

I have a dictionary which has an integer Key that represents a year, and a Value which is a list of object Channel. I need to flatten the data and create a new object from it.
Currently, my code looks like this:
Dictionary<int, List<Channel>> myDictionary;
foreach (var x in myDictionary)
{
    var result = (from a in x.Value
                  from b in anotherList
                  where a.ChannelId == b.ChannelId
                  select new NewObject
                  {
                      NewObjectYear = x.Key,
                      NewObjectName = x.Value.First().ChannelName,
                  }).ToList();
    list.AddRange(result);
}
Notice that I am using the Key to be the value of property NewObjectYear.
I want to get rid of foreach since the dictionary contains a lot of data and doing some joins inside the iteration makes it very slow. So I decided to refactor and came up with this:
var flatten = myDictionary.SelectMany(x => x.Value.Select(y =>
new KeyValuePair<int, Channel>(x.Key, y))).ToList();
But with this, I couldn't get the Key directly. Using something like flatten.Select(x => x.Key) is definitely not the correct way. So I tried finding other ways to flatten that would be favorable for my scenario, but failed. I also thought about creating a class which would contain the year and the list from the flattened result, but I don't know how.
Please help me with this.
Also, is there a way that doesn't require creating a new class?
It seems to me you are trying to do only filtering, you do not need join for that:
var anotherListIDs = new HashSet<int>(anotherList.Select(c => c.ChannelId));
foreach (var x in myDictionary)
{
    list.AddRange(x.Value
        .Where(c => anotherListIDs.Contains(c.ChannelId))
        .Select(c => new NewObject
        {
            NewObjectYear = x.Key,
            NewObjectName = c.ChannelName,
        }));
}
You do realise that if the second element of the list in a specific dictionary element has a matching ChannelId, you return the first element of this list, don't you?
var otherList = new OtherItem[]
{
    new OtherItem() { ChannelId = 1, ... }
};
var dictionary = new Dictionary<int, List<Channel>>
{
    { 10, // Key
      new List<Channel>() // Value
      {
          new Channel() { ChannelId = 100, ChannelName = "100" },
          new Channel() { ChannelId = 1, ChannelName = "1" },
      }
    },
};
Although the 2nd element has a matching ChannelId, you return the Name of the first element.
Anyway, let's assume this is what you really want. You are right, your function isn't very efficient.
Your dictionary implements IEnumerable<KeyValuePair<int, List<Channel>>>. Therefore every x in your foreach is a KeyValuePair<int, List<Channel>>. Every x.Value is a List<Channel>.
So for every element in your dictionary (which is a KeyValuePair<int, List<Channel>>), you take the complete list and perform a full inner join of the complete list with otherList, and for the result you take the key of the KeyValuePair and the first element of the List in the KeyValuePair.
And even though you might not use the complete result, but only the first or the first few (because of FirstOrDefault() or Take(3)), you do this for every element of every list in your Dictionary.
Indeed your query could be much more efficient.
As you use the ChannelIds in your OtherList only to find out if it is present, one of the major improvements would be to convert the ChannelIds of OtherList to a HashSet<int> where you have superior fast lookup to check if the ChannelId of one of the values in your Dictionary is in the HashSet.
So for every element in your dictionary, you only have to check every ChannelId in the list to see if one of them is in the HashSet. As soon as you've found one, you can stop and return only the first element of the List and the Key.
My solution is an extension method on Dictionary<int, List<Channel>>. See Extension Methods Demystified.
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
    IEnumerable<OtherItem> otherList)
{
    // I'll only use the ChannelIds of the otherList, so extract them
    IEnumerable<int> otherChannelIds = otherList
        .Select(otherItem => otherItem.ChannelId);
    return dictionary.ExtractNewObjects(otherChannelIds);
}
This calls the other ExtractNewObjects:
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
    IEnumerable<int> otherChannelIds)
{
    var otherChannelIdsSet = new HashSet<int>(otherChannelIds);
    // duplicate channelIds will be removed automatically
    foreach (KeyValuePair<int, List<Channel>> keyValuePair in dictionary)
    {
        // is any ChannelId in the list also in otherChannelIdsSet?
        // every keyValuePair.Value is a List<Channel>
        // every Channel has a ChannelId
        // a channelId is found if any of these ChannelIds is in the HashSet
        bool channelIdFound = keyValuePair.Value
            .Any(channel => otherChannelIdsSet.Contains(channel.ChannelId));
        if (channelIdFound)
        {
            yield return new NewObject()
            {
                NewObjectYear = keyValuePair.Key,
                NewObjectName = keyValuePair.Value
                    .Select(channel => channel.ChannelName)
                    .FirstOrDefault(),
            };
        }
    }
}
Usage:
IEnumerable<OtherItem> otherList = ...
Dictionary<int, List<Channel>> dictionary = ...
IEnumerable<NewObject> extractedNewObjects = dictionary.ExtractNewObjects(otherList);
var someNewObjects = extractedNewObjects
.Take(5) // here we see the benefit from the yield return
.ToList();
We can see four efficiency improvements:
the use of HashSet<int> enables a very fast lookup to see if a ChannelId is in otherList
the use of Any() stops enumerating the List<Channel> as soon as we've found a matching ChannelId in the HashSet
the use of yield return means you don't enumerate more elements of your Dictionary than you'll actually use
the use of Select and FirstOrDefault when creating NewObjectName prevents exceptions if the List<Channel> is empty

get priority items from a list and insert them into the same list at given position c#

I have a list of (say) 50 sorted items in which a few items are priority ones (assume they have a flag set to 1).
By default, I have to show the latest items first (based on date), but the priority items should appear after some 'x' number of records, like below:
index 0: Item
index 1: Item
index 2: Priority Item (insert priority items from this position)
index 3: Priority Item
index 4: Priority Item
index 5: Item
index 6: Item
The index 'x' at which the priority items should be inserted is pre-defined.
To achieve this, I am using the following code.
These are my 50 sorted items
var list = getMyTop50SortedItems();
fetching all priority items and storing them in another list
var priorityItems = list.Where(x => x.flag == 1).ToList();
filtering out the priority items from main list
list.RemoveAll(x => x.flag == 1);
inserting priority items in the main list at given position
list.InsertRange(1, priorityItems);
This process is doing the job correctly and giving me the expected result. But I am not sure whether it is the correct way to do it, or whether there is a better way (considering performance).
Please provide your suggestions.
Also, how is performance affected, as I am doing many operations (filter, remove, insert), considering an increase in the number of records from 50 to 100000 (any number)?
Update: How can I use IQueryable to decrease the number of operations on the list?
As per documentation on InsertRange:
This method is an O(n * m) operation, where n is the number of
elements to be added and m is Count.
n*m isn't very good, so I would use LINQ's Concat method to create a whole new list from three smaller lists, instead of modifying an existing one.
var allItems = getMyTop50();
var topPriorityItems = allItems.Where(x => x.flag == 1).ToList();
var topNonPriorityItems = allItems.Where(x => x.flag != 1).ToList();
var result = topNonPriorityItems
    .Take(constant)
    .Concat(topPriorityItems)
    .Concat(topNonPriorityItems.Skip(constant));
I am not sure how fast the Concat, Skip and Take methods are for List<T>, but I'd bet they are not slower than O(n).
It seems like the problem you're actually trying to solve is just sorting the list of items. If this is the case, you don't need to be concerned with removing the priority items and reinserting them at the correct index, you just need to figure out your sort ordering function. Something like this ought to work:
// Set "x" to be whatever you want based on your requirements --
// this is the number of items that will precede the "priority" items in the
// sorted list
var x = 3;
var sortedList = list
.Select((item, index) => Tuple.Create(item, index))
.OrderBy(item => {
// If the original position of the item is below whatever you've
// defined "x" to be, then keep the original position
if (item.Item2 < x) {
return item.Item2;
}
// Otherwise, ensure that "priority" items appear first
return item.Item1.flag == 1 ? x + item.Item2 : list.Count + x + item.Item2;
}).Select(item => item.Item1);
You may need to tweak this slightly based on what you're trying to do, but it seems much simpler than removing/inserting from multiple lists.
Edit: Forgot that .OrderBy doesn't provide an overload that provides the original index of the item; updated answer to wrap the items in a Tuple that contains the original index. Not as clean as the original answer, but it should still work.
This can be done using a single enumeration of the original collection using linq-to-objects. IMO this also reads pretty clearly based on the original requirements you defined.
First, define the "buckets" that we'll be sorting into: I like using an enum here for clarity, but you could also just use an int.
enum SortBucket
{
RecentItems = 0,
PriorityItems = 1,
Rest = 2,
}
Then we'll define the logic for which "bucket" a particular item will be sorted into:
private static SortBucket GetBucket(Item item, int position, int recentItemCount)
{
if (position < recentItemCount)
{
return SortBucket.RecentItems;
}
return item.IsPriority ? SortBucket.PriorityItems : SortBucket.Rest;
}
And then a fairly straightforward linq-to-objects statement to sort first into the buckets we defined, and then by the original position. Written as an extension method:
static IEnumerable<Item> PrioritySort(this IEnumerable<Item> items, int recentItemCount)
{
return items
.Select((item, originalPosition) => new { item, originalPosition })
.OrderBy(o => GetBucket(o.item, o.originalPosition, recentItemCount))
.ThenBy(o => o.originalPosition)
.Select(o => o.item);
}
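A hypothetical usage, with recentItemCount playing the role of 'x' from the question:
var sorted = items.PrioritySort(recentItemCount: 2).ToList();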

Take specific number of array first then process

I have this code below that:
InstanceCollection instances = this.MyService(typeID, referencesIDs);
My problem here is that when referencesIDs.Count() is greater than a specific count, it throws an error related to SQL.
It was suggested to me to call this.MyService multiple times so it won't process as many referencesIDs at once.
What is the way to do that? I am thinking of using a while loop like this:
while (referencesIDs.Count() != maxCount)
{
newReferencesIDs = referencesIDs.Take(500).ToArray();
instances = this.MyService(typeID, newReferencesIDs);
maxCount += newReferencesIDs.Count();
}
The problem I can see here is: how can I remove the first 500 referencesIDs before taking the next batch? Because if I don't remove the first 500 after the first loop, it will keep taking the same referencesIDs.
Are you just looking to update the referencesIDs value? Something like this?:
referencesIDs = referencesIDs.Skip(500);
Then the next time you call .Take(500) on referencesIDs it'll get the next 500 values.
Conversely, without updating the referencesIDs variable, you can include the Skip in your loop. Something like this:
var pageSize = 500;
var skipCount = 0;
while(...)
{
newReferencesIDs = referencesIDs.Skip(skipCount).Take(pageSize).ToArray();
skipCount += pageSize;
...
}
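For completeness, a full sketch of that loop, assuming we stop once a page comes back empty; InstanceCollection's API is not shown in the question, so the results are collected into a plain List of a hypothetical Instance element type:
var pageSize = 500;
var skipCount = 0;
var allInstances = new List<Instance>(); // hypothetical element type
while (true)
{
    var page = referencesIDs.Skip(skipCount).Take(pageSize).ToArray();
    if (page.Length == 0)
        break;
    // assumes InstanceCollection is enumerable over Instance
    allInstances.AddRange(this.MyService(typeID, page));
    skipCount += pageSize;
}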
My first choice would be to fix the service, if you have access to it. A SQL-specific error could be a result of an incomplete database configuration, or a poorly written SQL query on the server. For example, Oracle limits IN lists in SQL queries to about 1000 items by default, but your Oracle DBA should be able to re-configure this limit for you. Alternatively, server side programmers could rewrite their query to avoid hitting this limit in the first place.
If this does not work, you could split your list into blocks of max size that does not trigger the error, make multiple calls to the server, and combine the instances on your end, like this:
InstanceCollection instances = referencesIDs
.Select((id, index) => new {Id = id, Index = index})
.GroupBy(p => p.Index / 500) // 500 is the max number of IDs
.SelectMany(g => this.MyService(typeID, g.Select(item => item.Id).ToArray()))
.ToList();
If you want a general way of splitting lists into chunks, you can use something like:
/// <summary>
/// Split a source IEnumerable into smaller (more manageable) lists.
/// </summary>
public static IEnumerable<IList<TSource>>
SplitIntoChunks<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
long i = 1;
var list = new List<TSource>();
foreach (var t in source)
{
list.Add(t);
if (i++ % chunkSize == 0)
{
yield return list;
list = new List<TSource>();
}
}
if (list.Count > 0)
yield return list;
}
And then you can use SelectMany to flatten results:
InstanceCollection instances = referencesIDs
.SplitIntoChunks(500)
.SelectMany(chunk => MyService(typeID, chunk))
.ToList();
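For what it's worth, on .NET 6 or later the built-in Enumerable.Chunk does the same splitting, so no helper is needed; a sketch, assuming MyService accepts an array of IDs:
var instances = referencesIDs
    .Chunk(500) // arrays of at most 500 IDs each
    .SelectMany(chunk => this.MyService(typeID, chunk))
    .ToList();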
