Suppose I have a list of strings: [city01, city01002, state02, state03, city04, statebg, countryqw, countrypo]
How do I group them into a Dictionary<string, List<string>> like this:
city - [city01, city04, city01002]
state - [state02, state03, statebg]
country - [countryqw, countrypo]
If not code, can anyone please help with how to approach this?
As shown in other answers, you can use LINQ's GroupBy method to create this grouping based on any condition you want. Before you can group your strings, you need to know the condition that decides which group a string belongs to. It could be that the string starts with one of a set of predefined prefixes, that strings are grouped by whatever comes before the first digit, or any other condition you can describe in code. In my code example, GroupBy calls another method for every string in your list; in that method you can place whatever logic you need, returning the key to group the given string under. You can test this example online with dotnetfiddle: https://dotnetfiddle.net/UHNXvZ
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
List<string> ungroupedList = new List<string>() {"city01", "city01002", "state02", "state03", "city04", "statebg", "countryqw", "countrypo", "theFirstTown"};
var groupedStrings = ungroupedList.GroupBy(x => groupingCondition(x));
foreach (var a in groupedStrings) {
Console.WriteLine("key: " + a.Key);
foreach (var b in a) {
Console.WriteLine("value: " + b);
}
}
}
public static string groupingCondition(String s) {
if(s.StartsWith("city") || s.EndsWith("Town"))
return "city";
if(s.StartsWith("country"))
return "country";
if(s.StartsWith("state"))
return "state";
return "unknown";
}
}
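The example above prints the groups, while the question asks for a Dictionary<string, List<string>>. A minimal sketch of that last step follows, assuming the key is whatever comes before the first digit. The names PrefixGrouping, KeyBeforeFirstDigit and ToGroupedDictionary are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixGrouping
{
    // Hypothetical key selector: everything before the first digit,
    // or the whole string when it contains no digit.
    public static string KeyBeforeFirstDigit(string s) =>
        new string(s.TakeWhile(c => !char.IsDigit(c)).ToArray());

    // Groups the strings by the selector and materializes each group as a list.
    public static Dictionary<string, List<string>> ToGroupedDictionary(
        IEnumerable<string> items, Func<string, string> keySelector) =>
        items.GroupBy(keySelector)
             .ToDictionary(g => g.Key, g => g.ToList());
}
```

Calling PrefixGrouping.ToGroupedDictionary(list, PrefixGrouping.KeyBeforeFirstDigit) on "city01", "city04", "state02" gives the keys "city" and "state". Any other selector, such as groupingCondition above, can be passed instead.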
You can use LINQ:
var input = new List<string>()
{ "city01", "city01002", "state02",
"state03", "city04", "statebg", "countryqw", "countrypo" };
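// Key = the characters before the first digit, truncated to 4 characters,
// so the keys come out as "city", "stat" and "coun".
var output = input.GroupBy(c => string.Join("", c.TakeWhile(d => !char.IsDigit(d))
.Take(4))).ToDictionary(c => c.Key, c => c.ToList());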
I suppose you have a list of prefixes that you are searching for in the list:
var list = new List<string>()
{ "city01", "city01002", "state02",
"state03", "city04", "statebg", "countryqw", "countrypo" };
var tofound = new List<string>() { "city", "state", "country" }; // prefixes to look for
var result = new Dictionary<string, List<string>>();
foreach (var f in tofound)
{
result.Add(f, list.FindAll(x => x.StartsWith(f)));
}
The result is the dictionary you want. If no values are found for a prefix, the value for that key is an empty list (FindAll never returns null).
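A quick way to check what FindAll yields for a prefix with no matches is the sketch below. It builds the same dictionary with ToDictionary instead of a loop. PrefixLookup and GroupByPrefixes are hypothetical names, and the ordinal comparison is an assumption about how the prefixes should be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixLookup
{
    // Builds one entry per prefix. List<T>.FindAll returns an empty list
    // (not null) when no element satisfies the predicate.
    public static Dictionary<string, List<string>> GroupByPrefixes(
        List<string> items, IEnumerable<string> prefixes) =>
        prefixes.ToDictionary(
            p => p,
            p => items.FindAll(x => x.StartsWith(p, StringComparison.Ordinal)));
}
```

With items "city01", "state02" and prefixes "city", "country", the entry for "country" is an empty list. Note that ToDictionary throws if the same prefix is listed twice.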
Warning: This answer has a combinatorial expansion and will fail if your original string set is large. For 65 words I gave up after running for a couple of hours.
Using some IEnumerable extension methods to find Distinct sets and to find all possible combinations of sets, you can generate a group of prefixes and then group the original strings by these.
public static class IEnumerableExt {
public static bool IsDistinct<T>(this IEnumerable<T> items) {
var hs = new HashSet<T>();
foreach (var item in items)
if (!hs.Add(item))
return false;
return true;
}
public static bool IsEmpty<T>(this IEnumerable<T> items) => !items.Any();
public static IEnumerable<IEnumerable<T>> AllCombinations<T>(this IEnumerable<T> start) {
IEnumerable<IEnumerable<T>> HelperCombinations(IEnumerable<T> items) {
if (items.IsEmpty())
yield return items;
else {
var head = items.First();
var tail = items.Skip(1);
foreach (var sequence in HelperCombinations(tail)) {
yield return sequence; // Without first
yield return sequence.Prepend(head);
}
}
}
return HelperCombinations(start).Skip(1); // don't return the empty set
}
}
var keys = Enumerable.Range(0, src.Count - 1)
.SelectMany(n1 => Enumerable.Range(n1 + 1, src.Count - n1 - 1).Select(n2 => new { n1, n2 }))
.Select(n1n2 => new { s1 = src[n1n2.n1], s2 = src[n1n2.n2], Dist = src[n1n2.n1].TakeWhile((ch, n) => n < src[n1n2.n2].Length && ch == src[n1n2.n2][n]).Count() })
.SelectMany(s1s2d => new[] { new { s = s1s2d.s1, s1s2d.Dist }, new { s = s1s2d.s2, s1s2d.Dist } })
.Where(sd => sd.Dist > 0)
.GroupBy(sd => sd.s.Substring(0, sd.Dist))
.Select(sdg => sdg.Distinct())
.AllCombinations()
.Where(sdgc => sdgc.Sum(sdg => sdg.Count()) == src.Count)
.Where(sdgc => sdgc.SelectMany(sdg => sdg.Select(sd => sd.s)).IsDistinct())
.OrderByDescending(sdgc => sdgc.Sum(sdg => sdg.First().Dist)).First()
.Select(sdg => sdg.First())
.Select(sd => sd.s.Substring(0, sd.Dist))
.ToList();
var groups = src.GroupBy(s => keys.First(k => s.StartsWith(k)));
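The Dist value computed in the query above is the number of leading characters that a pair of strings has in common. A small sketch of just that step is shown here in isolation; PrefixMath and CommonPrefixLength are hypothetical names used only for illustration.

```csharp
using System.Linq;

public static class PrefixMath
{
    // Counts how many leading characters the two strings share.
    public static int CommonPrefixLength(string a, string b) =>
        a.TakeWhile((ch, i) => i < b.Length && ch == b[i]).Count();
}
```

For example, CommonPrefixLength("city01", "city04") is 5 (the shared prefix is "city0"), and CommonPrefixLength("city01", "state02") is 0, which is why such pairs are dropped by the Dist > 0 filter.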
I need to keep all the items that are duplicated as well as the items whose line numbers are not sequential. My code so far only handles the duplicated items.
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF1"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF1"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF2"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "2",
Pdf = "PDF2"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "3",
Pdf = "PDF2"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF3"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "3",
Pdf = "PDF3"
});
Here is my code
var result = _lst.GroupBy(x => new { x.Line, x.Pdf })
.Where(x => x.Skip(1).Any()).ToList();
and the result is
Column = 1,
Line = "1",
Pdf = "PDF1"
But I also need the items whose line numbers are not sequential, so I also need these:
Column = 1,
Line = "1",
Pdf = "PDF3"
Column = 1,
Line = "3",
Pdf = "PDF3"
How can I solve this? I searched for a solution and tested what I found, but I can't solve it; it doesn't return what I expected.
var distinctItems = _lst.Distinct();
To match on only some of the properties, create a custom equality comparer, e.g.:
class DistinctItemComparer : IEqualityComparer<Item> {
public bool Equals(Item x, Item y) {
return x.Column == y.Column &&
x.Line == y.Line &&
x.Pdf == y.Pdf;
}
public int GetHashCode(Item obj) {
return obj.Column.GetHashCode() ^
obj.Line.GetHashCode() ^
obj.Pdf.GetHashCode();
}
}
Then use it like this:
var distinctItems = _lst.Distinct(new DistinctItemComparer());
Or try this:
var distinctItems = _lst.GroupBy(x => x.Id).Select(y => y.First());
Using Zip to pair each item with the one that follows it, you can compare adjacent items and select those that do not continue the sequence; that may do the trick. This example is a little oversimplified, as you may want to compare the Pdf field as well. The Union adds the duplicates to the non-sequential items.
return _lst.Zip(_lst.Skip(1), (a, b) => new { a, b})
.Where(w => int.Parse(w.b.Line) != int.Parse(w.a.Line) + 1) // Line is a string, so parse it before comparing
.Select(w => w.b)
.Union(_lst.GroupBy(x => new { x.Line, x.Pdf })
.Where(x => x.Skip(1).Any()).ToList()
.SelectMany(s => s));
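The Zip-with-Skip(1) pairing can be tried on plain integers first. The sketch below assumes the line numbers have already been parsed to int; AdjacentPairs and BreaksInSequence are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AdjacentPairs
{
    // Pairs every element with its successor and returns each successor
    // that is not exactly one more than the element before it.
    public static List<int> BreaksInSequence(IReadOnlyList<int> lines) =>
        lines.Zip(lines.Skip(1), (prev, cur) => new { prev, cur })
             .Where(p => p.cur != p.prev + 1)
             .Select(p => p.cur)
             .ToList();
}
```

For the input 1, 2, 3, 1, 3 this returns 1 and 3: the pair (3, 1) and the pair (1, 3) both break the sequence. The first element is never returned because it has no predecessor.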
Using some handy extension methods:
public static class Ext {
public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, TKey seedKey, Func<(TKey Key, T Value), T, TKey> combine) {
using (var srce = src.GetEnumerator()) {
if (srce.MoveNext()) {
var prevkv = (seedKey, srce.Current);
while (srce.MoveNext()) {
yield return prevkv;
prevkv = (combine(prevkv, srce.Current), srce.Current);
}
yield return prevkv;
}
}
}
public static IEnumerable<IGrouping<int, TRes>> GroupByWhile<T, TRes>(this IEnumerable<T> src, Func<T, T, bool> test, Func<T, TRes> result) =>
src.ScanPair(1, (kvp, cur) => test(kvp.Value, cur) ? kvp.Key : kvp.Key + 1).GroupBy(kvp => kvp.Key, kvp => result(kvp.Value));
public static IEnumerable<IGrouping<int, TRes>> GroupBySequential<T, TRes>(this IEnumerable<T> src, Func<T, int> SeqNum, Func<T, TRes> result) =>
src.GroupByWhile((prev, cur) => SeqNum(prev) + 1 == SeqNum(cur), result);
public static IEnumerable<IGrouping<int, T>> GroupBySequential<T>(this IEnumerable<T> src, Func<T, int> SeqNum) => src.GroupBySequential(SeqNum, e => e);
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> src, Func<T, TKey> keyFun, IEqualityComparer<TKey> comparer = null) {
var seenKeys = new HashSet<TKey>(comparer);
foreach (var e in src)
if (seenKeys.Add(keyFun(e)))
yield return e;
}
public static int ToInteger(this string s) => Convert.ToInt32(s);
}
ScanPair is a variation of my APL-inspired Scan operator (which is like Aggregate, except that it returns the intermediate results). I discovered I was doing a lot of Scan with tuples to carry the original information, so ScanPair combines the intermediate results with the original values.
Using ScanPair, GroupByWhile runs a test on each pair of consecutive elements and keeps adding to the current group while the test is true.
Using GroupByWhile, GroupBySequential groups elements for as long as each element's sequence number is one more than the previous one.
DistinctBy returns the distinct objects based on a key selection function. I cheat and use this rather than create an IEqualityComparer for MSheetValue.
Finally, ToInteger is just a handy extension for reading flow.
With these extension methods, processing the _lst is relatively straightforward:
var nonSeq = _lst.GroupBy(m => m.Pdf) // need to test each Pdf
.Select(mg => mg.GroupBySequential(m => m.Line.ToInteger())) // get the sequential groups
.Where(mg => mg.Count() > 1) // keep the ones with non-sequential lines
// parse each non-sequential group into just the unique entries and flatten
.Select(mg => mg.SelectMany(m => m).DistinctBy(m => new { m.Column, m.Line, m.Pdf }));
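The core idea, splitting a sequence into runs of consecutive numbers and treating more than one run as "non-sequential", can also be sketched with a plain loop. This is not the extension-method implementation above, only an illustration of the same grouping rule; SequentialRuns and Split are hypothetical names.

```csharp
using System.Collections.Generic;

public static class SequentialRuns
{
    // Splits the input into runs in which each number is exactly
    // one more than the number before it.
    public static List<List<int>> Split(IEnumerable<int> numbers)
    {
        var runs = new List<List<int>>();
        foreach (var n in numbers)
        {
            var last = runs.Count > 0 ? runs[runs.Count - 1] : null;
            if (last != null && last[last.Count - 1] + 1 == n)
                last.Add(n);                    // continues the current run
            else
                runs.Add(new List<int> { n });  // starts a new run
        }
        return runs;
    }
}
```

Lines 1, 2, 3 (PDF2 in the question) give a single run, while lines 1, 3 (PDF3) give two runs, which is the condition the Count() > 1 filter tests for.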
I need to return only the records that have a unique AccessionNumber, each with its corresponding LoginId, so that at the end the data looks something like:
A1,L1
A2,L1
A3,L2
However, my issue is with this line of code, because Distinct() returns an IEnumerable of string and not an IEnumerable of string[]. Therefore, the compiler complains about string not containing a definition for AccessionNumber and LoginId.
yield return new[] { record.AccessionNumber, record.LoginId };
This is the code that I am trying to execute:
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.Select(x => x.AccessionNumber).Distinct();
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
}
That's because you are selecting only the AccessionNumber property with the following line:
var z = data.Select(x => x.AccessionNumber).Distinct();
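You probably want to select the entire StudentAssessmentTestData record instead. Note that Distinct() without a comparer relies on the type's Equals and GetHashCode, so this only removes duplicates if StudentAssessmentTestData defines equality the way you need: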
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString()).Distinct();
foreach (var record in data)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
Instead of using Distinct, use GroupBy. This:
var z = data.Select(x => x.AccessionNumber).Distinct();
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
should be something like this:
return data.GroupBy(x => x.AccessionNumber)
.Select(r => new[] { r.Key, r.First().LoginId }); // string[] to match the method's return type
The GroupBy() call ensures there is only one entry per AccessionNumber, and First() ensures that only the first LoginId with that AccessionNumber is returned.
This assumes that your data is sorted so that, if there are multiple logins with the same AccessionNumber, the first login is the correct one.
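A self-contained sketch of this GroupBy/First pattern follows. The Row class is a hypothetical stand-in for StudentAssessmentTestData with only the two properties used here, and FirstPerKey and FirstLoginPerAccession are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FirstPerKey
{
    // Hypothetical stand-in for StudentAssessmentTestData.
    public sealed class Row
    {
        public string AccessionNumber { get; set; } = "";
        public string LoginId { get; set; } = "";
    }

    // Keeps the first LoginId encountered for each AccessionNumber.
    public static List<string[]> FirstLoginPerAccession(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.AccessionNumber)
            .Select(g => new[] { g.Key, g.First().LoginId })
            .ToList();
}
```

For the rows (A1, L1), (A1, L2), (A2, L1) this yields (A1, L1) and (A2, L1). For in-memory sequences, GroupBy keeps groups in order of first appearance and elements in their original order, so "first" means first in the input.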
If you want to choose distinct values based on a certain property you can do it in several ways.
If it is always the same property you wish to use for comparison, you can override the Equals and GetHashCode methods in the StudentAssessmentTestData class, which lets the Distinct method recognize how instances differ from each other. An example can be found in this question.
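As a rough sketch of the override approach, the class below defines equality by AccessionNumber only. TestRow is a hypothetical stand-in for StudentAssessmentTestData, not the real class, and it assumes AccessionNumber is never null.

```csharp
using System;

// Hypothetical record type: two rows are equal when their
// AccessionNumber matches, regardless of LoginId.
public sealed class TestRow : IEquatable<TestRow>
{
    public string AccessionNumber { get; set; } = "";
    public string LoginId { get; set; } = "";

    public bool Equals(TestRow other) =>
        other != null && AccessionNumber == other.AccessionNumber;

    public override bool Equals(object obj) => Equals(obj as TestRow);

    // Must be consistent with Equals: equal rows return equal hash codes.
    public override int GetHashCode() => AccessionNumber.GetHashCode();
}
```

With this in place, calling Distinct() on a sequence of TestRow keeps one row per AccessionNumber without needing a separate comparer.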
However, you can also implement a custom IEqualityComparer<T> for your implementation, for example the following version
// Custom comparer taking generic input parameter and a delegate function to do matching
public class CustomComparer<T> : IEqualityComparer<T> {
private readonly Func<T, object> _match;
public CustomComparer(Func<T, object> match) {
_match = match;
}
// compares the values that the match delegate returns for both arguments
public bool Equals(T data1, T data2) {
return object.Equals(_match(data1), _match(data2));
}
// overly simplistic implementation
public int GetHashCode(T data) {
var matchValue = _match(data);
if (matchValue == null) {
return 42.GetHashCode();
}
return matchValue.GetHashCode();
}
}
This class can then be used as an argument for the Distinct function, for example in this way
// compare by accession number
var accessComparer = new CustomComparer<StudentAssessmentTestData>(d => d.AccessionNumber );
// compare by login id
var loginComparer = new CustomComparer<StudentAssessmentTestData>(d => d.LoginId );
foreach (var d in data.Distinct( accessComparer )) {
Console.WriteLine( "{0}, {1}", d.AccessionNumber, d.LoginId);
}
foreach (var d in data.Distinct( loginComparer )) {
Console.WriteLine( "{0}, {1}", d.AccessionNumber, d.LoginId);
}
A full example you can find in this dotnetfiddle
Add a LinqExtension method DistinctBy as below.
public static class LinqExtensions
{
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (seenKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
}
Use it in your code like this:
var z = data.DistinctBy(x => x.AccessionNumber);
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.DistinctBy(x => x.AccessionNumber);
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
}
This is the code that finally worked:
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
var data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.GroupBy(x => new{x.AccessionNumber})
.Select(x => new StudentAssessmentTestData(){ AccessionNumber = x.Key.AccessionNumber, LoginId = x.FirstOrDefault().LoginId});
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
}
This returns a sequence that looks similar to this:
Acc1, Login1
Acc2, Login1
Acc3, Login2
Acc4, Login1
Acc5, Login3
You can try this. It works for me.
IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.GroupBy(x => x.AccessionNumber).SelectMany(y => y.Take(1));
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
I'm not 100% sure what you're asking. You either want (1) only records with a unique AccessionNumber, so that if two or more records have the same AccessionNumber none of them is returned, or (2) only the first record for each AccessionNumber.
Here are both options:
(1)
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
return
DataGetter
.GetTestData("MyTestData")
.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString())
.GroupBy(x => x.AccessionNumber)
.Where(x => !x.Skip(1).Any())
.SelectMany(x => x)
.Select(x => new [] { x.AccessionNumber, x.LoginId });
}
(2)
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
return
DataGetter
.GetTestData("MyTestData")
.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString())
.GroupBy(x => x.AccessionNumber)
.SelectMany(x => x.Take(1))
.Select(x => new [] { x.AccessionNumber, x.LoginId });
}
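The difference between the two options is easiest to see on a small input. The sketch below works on bare keys rather than full records; UniqueVsFirst, OnlyUnique and FirstOfEach are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UniqueVsFirst
{
    // Option (1): keep only the keys that occur exactly once.
    public static List<string> OnlyUnique(IEnumerable<string> keys) =>
        keys.GroupBy(k => k)
            .Where(g => !g.Skip(1).Any())
            .Select(g => g.Key)
            .ToList();

    // Option (2): keep one representative for every key.
    public static List<string> FirstOfEach(IEnumerable<string> keys) =>
        keys.GroupBy(k => k)
            .SelectMany(g => g.Take(1))
            .ToList();
}
```

For the input A1, A1, A2, option (1) returns only A2, while option (2) returns A1 and A2.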
Suppose I have a list of strings, like this:
var candidates = new List<String> { "Peter", "Chris", "Maggie", "Virginia" };
Now I'd like to verify that another List<String>, let's call it list1, contains each of those candidates exactly once.
How can I do that succinctly? I think I can use Intersect(). I also want to get the missing candidates.
private bool ContainsAllCandidatesOnce(List<String> list1)
{
????
}
private IEnumerable<String> MissingCandidates(List<String> list1)
{
????
}
Order doesn't matter.
This may not be optimal in terms of speed, but both queries are short enough to fit on a single line, and are easy to understand:
private bool ContainsAllCandidatesOnce(List<String> list1)
{
return candidates.All(c => list1.Count(v => v == c) == 1);
}
private IEnumerable<String> MissingCandidates(List<String> list1)
{
return candidates.Where(c => list1.Count(v => v == c) != 1);
}
Here we are talking about Except, Intersect and Distinct. I could have used a lambda expression, but it would have to loop over each and every item. That functionality is already available in predefined methods.
For your first method:
var candidates = new List<String> { "Peter", "Chris", "Maggie", "Virginia" };
private bool ContainsAllCandidatesOnce(List<String> list1)
{
return list1.Intersect(candidates).Distinct().Any();
}
This returns true if list1 and candidates have at least one element in common; note that it does not check that each candidate occurs exactly once. You can also write it the other way around:
candidates.Intersect(list1).Distinct().Any();
For your second method:
private IEnumerable<String> MissingCandidates(List<String> list1)
{
return list1.Except(candidates).AsEnumerable();
}
This returns all elements of list1 that are not in candidates. If you want it the other way around (the candidates that are missing from list1), you can do:
candidates.Except(list1).AsEnumerable();
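Because Except is not symmetric, the two directions answer different questions. A short sketch, with hypothetical names ExceptDirection, Missing and Unexpected:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptDirection
{
    // Candidates that never appear in list1.
    public static List<string> Missing(
        IEnumerable<string> candidates, IEnumerable<string> list1) =>
        candidates.Except(list1).ToList();

    // Entries of list1 that are not candidates at all.
    public static List<string> Unexpected(
        IEnumerable<string> candidates, IEnumerable<string> list1) =>
        list1.Except(candidates).ToList();
}
```

With candidates Peter, Chris and list1 Peter, Bob, Missing returns Chris and Unexpected returns Bob. Except is a set operation, so neither result says anything about how many times a name occurs.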
This should be quite efficient:
IEnumerable<string> strings = ...
var uniqueStrings = from str in strings
group str by str into g
where g.Count() == 1
select g.Key;
var missingCandidates = candidates.Except(uniqueStrings).ToList();
bool isValid = !missingCandidates.Any();
The idea is to filter out the strings that are repeated, then ensure that all the candidates occur in the filtered set.
GroupJoin is the right tool for the job. From MSDN:
GroupJoin produces hierarchical results, which means that elements
from outer are paired with collections of matching elements from
inner. GroupJoin enables you to base your results on a whole set of
matches for each element of outer.
If there are no correlated elements in inner for a given element of outer, the sequence of matches for that element will be empty but
will still appear in the results.
So, GroupJoin will find any matches from the target, for each item in the source. Items in the source are not filtered if no matches are found in the target. Instead they are matched to an empty group.
Dictionary<string, int> counts = candidates
.GroupJoin(
list1,
c => c,
s => s,
(c, g) => new { Key = c, Count = g.Count() }
)
.ToDictionary(x => x.Key, x => x.Count);
List<string> missing = counts.Keys
.Where(key => counts[key] == 0)
.ToList();
List<string> tooMany = counts.Keys
.Where(key => 1 < counts[key])
.ToList();
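Putting the GroupJoin counts together with the original question, a complete sketch might look like the following. CandidateCounts, CountOccurrences and ContainsAllOnce are hypothetical names, and the candidates are assumed to be distinct, since ToDictionary throws on duplicate keys.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CandidateCounts
{
    // Counts how often each candidate occurs in list1.
    // Candidates that are absent from list1 get a count of 0.
    public static Dictionary<string, int> CountOccurrences(
        IEnumerable<string> candidates, IEnumerable<string> list1) =>
        candidates
            .GroupJoin(list1, c => c, s => s,
                       (c, g) => new { Key = c, Count = g.Count() })
            .ToDictionary(x => x.Key, x => x.Count);

    // True when every candidate occurs exactly once in list1.
    public static bool ContainsAllOnce(
        IEnumerable<string> candidates, IEnumerable<string> list1) =>
        CountOccurrences(candidates, list1).Values.All(n => n == 1);
}
```

With candidates Peter, Chris, Maggie and list1 Peter, Peter, Maggie, the counts are Peter = 2, Chris = 0, Maggie = 1, so ContainsAllOnce is false.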
private bool ContainsAllCandidatesOnce(List<String> list1)
{
return list1.Where(s => candidates.Contains(s)).Count() == candidates.Count();
}
private IEnumerable<String> MissingCandidates(List<String> list1)
{
return candidates.Where(s => list1.Count(c => c == s) != 1);
}
How about using a HashSet instead of List?
private static bool ContainsAllCandidatesOnce(List<string> lotsOfCandidates)
{
foreach (string candidate in allCandidates)
{
if (lotsOfCandidates.Count(t => t.Equals(candidate)) != 1)
{
return false;
}
}
return true;
}
private static IEnumerable<string> MissingCandidates(List<string> lotsOfCandidates)
{
List<string> missingCandidates = new List<string>();
foreach (string candidate in allCandidates)
{
if (lotsOfCandidates.Count(t => t.Equals(candidate)) != 1)
{
missingCandidates.Add(candidate);
}
}
return missingCandidates;
}
int j = 0;
foreach (var e in XmlData.Elements())
{
xDictionary.Add(j++, e.Value);
}
You probably shouldn't be using a dictionary if the key is simply the positional index. I'd suggest using a list instead:
var xList = XmlData.Elements().ToList();
Well, this would do it, using the overload of Select which provides the index, and ToDictionary:
var dictionary = XmlData.Elements()
.Select((value, index) => new { value, index })
.ToDictionary(x => x.index, x => x.value);
That's assuming xDictionary was empty before you started.
To create a new dictionary, something like this:
var dict = XmlData.Elements()
.Select((e, i) => new {Element = e, Index = i})
.ToDictionary(p => p.Index, p => p.Element.Value);
Also if you want to add to an existing dictionary you can use an AddRange convenience extension method:
xDictionary.AddRange(XmlData.Elements()
.Select((e, i) => new KeyValuePair<int, string>(i, e.Value)));
And the extension method implementation:
public static void AddRange<T>(this ICollection<T> source, IEnumerable<T> elements)
{
foreach (T element in elements)
{
source.Add(element);
}
}