Group list of strings by most common case variant - c#

I have a function that is given a list of strings from a database to display as options in a select filter.
It handles case variants with StringComparer.InvariantCultureIgnoreCase:
public static List<string> GetMostCommonItems(List<string> values)
{
    var filterData = values.Where(rawValue => !string.IsNullOrEmpty(rawValue))
        .GroupBy(item => item.ToLower())
        .ToDictionary(g => g.Key, g => g.GroupBy(gx => gx, StringComparer.InvariantCultureIgnoreCase).ToDictionary(gy => gy.Key, gy => gy.Count()));
    var data = filterData.Select(element => element.Value.OrderByDescending(a => a.Value).FirstOrDefault().Key).OrderBy(c => c).ToList();
    if (values.FirstOrDefault(x => string.IsNullOrEmpty(x)) != null)
    {
        data.Add(null);
    }
    return data;
}
Now, when I load the list of options, it displays only the lower-case variant if there is one; if not, the Capitalized variant; and if not, the UPPER-CASE one.
I would like to count the lower-case, capitalized and upper-case variants and add only the one that occurs most often.

Here's my solution:
List<string> values = new List<string>{
    "aaa", "aaa", "Aaa", "BBB", "BBB", "bbb"
};
var filterData = values
    .Where(rawValue => !string.IsNullOrEmpty(rawValue))
    .GroupBy(gx => gx, StringComparer.InvariantCultureIgnoreCase)
    .Select(g => new
    {
        g.Key,
        Best = g.GroupBy(x => x)
            .Select(g2 => new { g2.Key, Count = g2.Count() })
            .OrderByDescending(x => x.Count)
            .FirstOrDefault()
    });
foreach (var d in filterData)
{
    Console.WriteLine($"{d.Best.Key} # {d.Best.Count}");
}
This prints:
aaa # 2
BBB # 2
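A minimal sketch of how the same idea could be folded back into the original GetMostCommonItems signature. The class name FilterOptions is made up for illustration. The trailing null entry and the tie-breaking rule (the spelling seen first wins) are assumptions, not something the answer above specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterOptions
{
    // Returns one entry per case-insensitive group: the exact spelling that occurs most often.
    public static List<string> GetMostCommonItems(List<string> values)
    {
        var data = values
            .Where(v => !string.IsNullOrEmpty(v))
            .GroupBy(v => v, StringComparer.InvariantCultureIgnoreCase)
            // Inside each group, count the exact (case-sensitive) spellings and keep the most frequent.
            // OrderByDescending is a stable sort, so ties go to the spelling that appears first.
            .Select(g => g.GroupBy(v => v)
                          .OrderByDescending(exact => exact.Count())
                          .First()
                          .Key)
            .OrderBy(v => v, StringComparer.InvariantCultureIgnoreCase)
            .ToList();

        // Assumption: one trailing null stands for "no value" when any input is null or empty.
        if (values.Any(string.IsNullOrEmpty))
        {
            data.Add(null);
        }
        return data;
    }
}
```

Grouping with the case-insensitive comparer first and the default (ordinal) comparer second avoids the ToLower() call, so the key never has to be a lowered copy of the value.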

Related

Linq - Get a list and sort it by a list of string values

I have a list of GUIDs as strings.
This is how I retrieve my list of string GUIDs:
List<string> p0 = ctx.PrProjectRating.Select(k => k).GroupBy(g => new { g.PrPIdG }, (key, group) => new { sumR = group.Sum(k => k.PrValue), pidG = key.PrPIdG }).Select(t => t.pidG).ToList();
Now I have another list that contains a field called pidG, and this list needs to be ordered by the list of GUID strings above.
How do I achieve this?
I tried:
List<PProject> p = p.OrderBy(c => p0.Contains(c.PIdG)).ToList();
but the list is still not ordered by the string GUIDs in the first list, p0.
You have to do a join here:
List<string> p0 = ctx.PrProjectRating
.Select(k => k)
.GroupBy(g => new { g.PrPIdG }, (key, group) =>
new { sumR = group.Sum(k => k.PrValue), pidG = key.PrPIdG })
.Select(t => t.pidG).ToList();
var result = p0.Join(p, x => x, c => c.PIdG, (x, c) => c)
.ToList();
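The join works because Enumerable.Join yields its results in the order of the outer sequence, here p0. A small in-memory sketch, where the PProject record is a hypothetical stand-in for the real entity and the names OrderByIdList and Arrange are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity in the question.
public record PProject(string PIdG, string Name);

public static class OrderByIdList
{
    // Returns the projects arranged in the order their ids appear in orderedIds.
    public static List<PProject> Arrange(List<string> orderedIds, List<PProject> projects)
    {
        // Join preserves the order of the outer sequence (orderedIds).
        // Projects whose id does not appear in orderedIds are dropped.
        return orderedIds
            .Join(projects, id => id, project => project.PIdG, (id, project) => project)
            .ToList();
    }
}
```

By contrast, OrderBy(c => p0.Contains(c.PIdG)) sorts on a bool, so it only separates matching from non-matching items and says nothing about their position in p0.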

Merge multiple dictionaries and aggregate values where required

I have three Dictionaries created by calling ToDictionary on a GroupBy projection in LINQ.
var dictionaryOne = _repositoryOne.GetData()
.GroupBy(d => new { d.Property1, d.Property2, d.LocalCcyId})
.ToDictionary(d =>
new
{
d.Key.Property1,
d.Key.Property2,
d.Key.LocalCcyId
},
v => v.Sum(l => ConvertToUsd(effectiveDate, l.LocalCcyId, l.Amount)));
var dictionaryTwo = _repositoryTwo.GetData()
.GroupBy(d => new { d.Property1, d.Property2, d.LocalCcyId})
.ToDictionary(d =>
new
{
d.Key.Property1,
d.Key.Property2,
d.Key.LocalCcyId
},
v => v.Sum(l => ConvertToUsd(effectiveDate, l.LocalCcyId, l.Balance)));
var dictionaryThree = _repositoryThree.GetData()
.GroupBy(d => new { d.Property1, d.Property2, d.LocalCcyId})
.ToDictionary(d =>
new
{
d.Key.Property1,
d.Key.Property2,
d.Key.LocalCcyId
},
v => v.Sum(l => ConvertToUsd(effectiveDate, l.LocalCcyId, l.Total)));
I want to merge these into one dictionary and
i) sum up the values, which are in USD, and
ii) drop the LocalCcyId column from the grouping key.
There will be instances of the same key occurring in each of the three dictionaries, and I need to aggregate the sums for all such cases. How do I achieve this in LINQ?
Seems to me that this is all you need:
var finalDictionary =
dictionaryOne
.Concat(dictionaryTwo)
.Concat(dictionaryThree)
.GroupBy(x => new { x.Key.Property1, x.Key.Property2 }, x => x.Value)
.ToDictionary(x => new { x.Key.Property1, x.Key.Property2 }, x => x.Sum());
Or, using LINQ query syntax as much as possible:
var finalDictionary =
(
from x in dictionaryOne.Concat(dictionaryTwo).Concat(dictionaryThree)
group x.Value by new { x.Key.Property1, x.Key.Property2 }
)
.ToDictionary(x => new { x.Key.Property1, x.Key.Property2 }, x => x.Sum());
Assuming you are querying a remote data source, running the queries twice over the data or converting to USD twice doesn't seem more efficient than taking the dictionaries and combining them, so that's what I did.
First you need to convert each Dictionary to a new anonymous object having the data you need, then group by the properties summing the values:
var allDictionary = dictionaryOne.Select(kv => new { kv.Key.Property1, kv.Key.Property2, kv.Value })
.Concat(dictionaryTwo.Select(kv => new { kv.Key.Property1, kv.Key.Property2, kv.Value }))
.Concat(dictionaryThree.Select(kv => new { kv.Key.Property1, kv.Key.Property2, kv.Value }))
.GroupBy(k2v => new { k2v.Property1, k2v.Property2 })
.ToDictionary(k2vg => new { k2vg.Key.Property1, k2vg.Key.Property2 }, k2vg => k2vg.Sum(k2v => k2v.Value));

How to rank a list with original order in c#

I want to rank the values in a list and output the ranks in the original order.
This is my code so far:
var data = new[] { 7.806468478, 7.806468478, 7.806468478, 7.173501754, 7.173501754, 7.173501754, 3.40877696, 3.40877696, 3.40877696,
4.097010736, 4.097010736, 4.097010736, 4.036494085, 4.036494085, 4.036494085, 38.94333318, 38.94333318, 38.94333318, 14.43588131, 14.43588131, 14.43588131 };
var rankings = data.OrderByDescending(x => x)
.GroupBy(x => x)
.SelectMany((g, i) =>
g.Select(e => new { Col1 = e, Rank = i + 1 }))
.ToList();
However, the result is ordered by value, descending.
What I want is to display the ranks in the original order of the data,
e.g.: Rank = 3, Rank = 3, Rank = 3, Rank = 4, Rank = 4, Rank = 4, etc...
Thank You.
Using what you have, one method would be to keep track of the original order and sort a second time (ugly and potentially slow):
var rankings = data.Select((x, i) => new {Item = x, Index = i})
.OrderByDescending(x => x.Item)
.GroupBy(x => x.Item)
.SelectMany((g, i) =>
g.Select(e => new {
Index = e.Index,
Item = new { Col1 = e.Item, Rank = i + 1 }
}))
.OrderBy(x => x.Index)
.Select(x => x.Item)
.ToList();
I would instead suggest creating a dictionary with your rankings and joining this back with your list:
var rankings = data.Distinct()
.OrderByDescending(x => x)
.Select((g, i) => new { Key = g, Rank = i + 1 })
.ToDictionary(x => x.Key, x => x.Rank);
var output = data.Select(x => new { Col1 = x, Rank = rankings[x] })
.ToList();
As @AntonínLejsek kindly pointed out, replacing the above GroupBy call with Distinct() is the way to go.
Note that double is not an exact type, so doubles are poor candidates for keys in a lookup table; nor would I recommend using GroupBy/Distinct with a floating-point value as a key. Be mindful of your precision and consider using an appropriate string conversion. In light of this, you may want to define an epsilon value and forgo LINQ's GroupBy entirely, opting instead to encapsulate each data point in a (non-anonymous) reference type, then loop through a sorted list and assign ranks. For example (disclaimer: untested):
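class DataPoint
{
    // Public, and double rather than decimal, so that the code below compiles against the double[] data.
    public double Value { get; set; }
    public int Rank { get; set; }
}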
var dataPointsPreservingOrder = data.Select(x => new DataPoint {Value = x}).ToList();
var sortedDescending = dataPointsPreservingOrder.OrderByDescending(x => x.Value).ToList();
var epsilon = 1E-15; //use a value that makes sense here
int rank = 0;
double? currentValue = null;
foreach(var x in sortedDescending)
{
if(currentValue == null || Math.Abs(x.Value - currentValue.Value) > epsilon)
{
currentValue = x.Value;
++rank;
}
x.Rank = rank;
}
Looking at the data, you will need to iterate over it twice.
The first iteration captures the rankings:
var sorted = data
.OrderByDescending(x => x)
.GroupBy(x => x)
.Select((g, i) => new { Col1 = g.First(), Rank = i + 1 })
.ToList();
Now we have a ranking from highest to lowest with the correct rank values. Next we iterate over the data again and look up each value's rank:
var rankings = (from i in data
let rank = sorted.First(x => x.Col1 == i)
select new
{
Col1 = i,
Rank = rank.Rank
}).ToList();
This results in a ranked list in the original order of the data.
A bit shorter:
var L = data.Distinct().ToList(); // because SortedSet<T> doesn't have BinarySearch :[
L.Sort();
var rankings = Array.ConvertAll(data,
x => new { Col1 = x, Rank = L.Count - L.BinarySearch(x) });

Foreach Loop In LINQ in C#

I would like to replace the foreach loop in the following code with LINQ ForEach() Expression:
List<int> idList = new List<int>() { 1, 2, 3 };
IEnumerable<string> nameList = new List<string>();
foreach (int id in idList)
{
var Name = db.Books.Where(x => x.BookId == id).Select(x => x.BookName);
nameList.Add(Name);
}
Any Help Please!!
Your code doesn't quite work (you're adding an IEnumerable<string> to a List<string>). You also won't need ForEach, since you're constructing the list.
You can do this:
var nameList = idList.SelectMany(id => db.Books.Where(x => x.BookId == id)
.Select(x => x.BookName)).ToList();
But then you're hitting the database once for each ID. You can grab all the books at once with:
var nameList = db.Books.Where(b => idList.Contains(b.BookId))
.Select(b => b.BookName).ToList();
Which will only hit the database once.
Why not a select?
List<int> idList = new List<int>() { 1, 2, 3 };
List<string> nameList = idList
.SelectMany(id => db.Books.Where(x => x.BookId == id).Select(x => x.BookName)) // SelectMany flattens the per-id results into one list of names
.ToList();
Or better yet, refactor and select:
int[] idList = new int[] { 1, 2, 3 };
List<string> nameList = db.Books
.Where(x => idList.Contains(x.BookId))
.Select(x => x.BookName)
.ToList();
nameList.AddRange(
db.Books.Where(x => idList.Contains(x.BookId))
.Select(x => x.BookName)
.ToList());
This will generate an IN statement in the SQL, thereby only doing a single select.
One thing to be aware of is that the performance of IN degrades as the set (idList in this case) gets bigger. For a large set, you can split it into batches and run one query per batch:
int start = 0;
int batch = 1000;
while (start < idList.Count())
{
var batchSet = idList.Skip(start).Take(batch);
nameList.AddRange(
db.Books.Where(x => batchSet.Contains(x.BookId))
.Select(x => x.BookName)
.ToList());
start += batch;
}
To answer your specific question, you can do this:
List<int> idList = new List<int>() { 1, 2, 3 };
List<string> nameList = new List<string>();
idList.ForEach(id => {
// The query returns a sequence of names, so append them with AddRange rather than Add.
var names = db.Books.Where(x => x.BookId == id).Select(x => x.BookName);
nameList.AddRange(names);
});

How to select non-distinct elements along with their indexes

List<string> str = new List<string>() {
"Alpha", "Beta", "Alpha", "Alpha", "Gamma", "Beta", "XYZ" };
Expected output:
String | Indexes
----------------------------
Alpha | 0, 2, 3
Beta | 1, 5
Gamma and XYZ are distinct, so they are ignored.
I've done this by comparing the strings manually. Is there an easier way to do it with LINQ?
foreach (var grp in
str.Select((s, i) => new { s, i })
.ToLookup(pair => pair.s, pair => pair.i)
.Where(pair => pair.Count() > 1))
{
Console.WriteLine("{0}: {1}", grp.Key, string.Join(", ", grp));
}
Something like this should work:
var elements = str
.Select((Elem, Idx) => new {Elem, Idx})
.GroupBy(x => x.Elem)
.Where(x => x.Count() > 1);
If you want to get a Dictionary<string,List<int>> having the duplicated string as key and the indexes as value, just add
.ToDictionary(x => x.Key, x => x.Select(e => e.Idx).ToList() );
after the Where() call.
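Put together, the dictionary variant described above might look like the following sketch. The class and method names are made up for illustration, and it assumes the list contains no null strings, since a null key would make ToDictionary throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateIndexes
{
    // Maps each string that occurs more than once to the list of positions where it occurs.
    public static Dictionary<string, List<int>> Find(IEnumerable<string> source)
    {
        return source
            // Pair every element with its position in the sequence.
            .Select((elem, idx) => new { Elem = elem, Idx = idx })
            .GroupBy(x => x.Elem)
            // Keep only the groups with more than one member, i.e. the non-distinct strings.
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.Select(e => e.Idx).ToList());
    }
}
```

For the sample list in the question, Find would return entries for Alpha and Beta only, each holding the indexes from the expected output table.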
You can get the non-distinct strings by grouping, then you can get the index for each non-distinct string and group them to create an array for each string:
var distinct = new HashSet<string>(
str.GroupBy(s => s)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
);
var index =
str.Select((s, i) => new {
Str = s,
Index = i
})
.Where(s => distinct.Contains(s.Str))
.GroupBy(i => i.Str).Select(g => new {
Str = g.Key,
Index = g.Select(s => s.Index).ToArray()
});
foreach (var i in index) {
Console.WriteLine("{0} : {1}", i.Str, String.Join(", ", i.Index.Select(n => n.ToString())));
}
Output:
Alpha : 0, 2, 3
Beta : 1, 5
