Group and Merge multiple values from single List using LINQ - c#

I have the following enums and class
internal enum flag_dead_alive
{
    none = 0,
    dead = 1,
    alive = 2
}
internal enum flag_blocked_unblocked
{
    none = 0,
    blocked = 1,
    unblocked = 2
}
internal class cls_entry
{
internal string id_number { get; set; }
internal flag_dead_alive dead_alive { get; set; }
internal flag_blocked_unblocked blocked_unblocked { get; set; }
}
I have a List that contains hundreds of thousands of records, so for testing purposes I have created a sample list containing the same sort of records, shown below (id_number is deliberately a string, for reasons that are irrelevant right now):
List<cls_entry> output = new List<cls_entry>()
{
    new cls_entry() { id_number = "1", dead_alive = flag_dead_alive.alive, blocked_unblocked = flag_blocked_unblocked.none },
    new cls_entry() { id_number = "1", dead_alive = flag_dead_alive.none, blocked_unblocked = flag_blocked_unblocked.blocked },
    new cls_entry() { id_number = "2", dead_alive = flag_dead_alive.none, blocked_unblocked = flag_blocked_unblocked.blocked },
    new cls_entry() { id_number = "3", dead_alive = flag_dead_alive.dead, blocked_unblocked = flag_blocked_unblocked.none },
    new cls_entry() { id_number = "3", dead_alive = flag_dead_alive.none, blocked_unblocked = flag_blocked_unblocked.unblocked }
};
From the list, I want to get output of grouped and merged (is that the correct term here?) records from my list. However, any enum that is set to "none" should be discarded if a matched record has a different value to "none", otherwise it must remain "none". For example, the output for the above list should be
1 : id_number = 1, dead_alive = alive, blocked_unblocked = blocked
2 : id_number = 2, dead_alive = none, blocked_unblocked = blocked
3 : id_number = 3, dead_alive = dead, blocked_unblocked = unblocked
The code
var groups = output.GroupBy(x => x.id_number);
returns the records in the correct groups, but I have no idea where to go from here. I also have
var groups = output.GroupBy(x => x.id_number).Select(g => g.Select(y => new { y.id_number, y.blocked_unblocked, y.dead_alive }));
but that returns the same result as the first query. I need to figure out how to select one value from y.dead_alive and one value from y.blocked_unblocked, so that the result keeps only the relevant values and combines them into a single record.
All help would be appreciated.

For your output test list, you can take the Max of dead_alive and blocked_unblocked after grouping, as in the following code:
var groups = output.GroupBy(x => x.id_number)
.Select(y => new cls_entry
{
id_number = y.Key,
dead_alive = y.Max(e => e.dead_alive),
blocked_unblocked = y.Max(e => e.blocked_unblocked)
}).ToList();
Documentation of the Max method: Enumerable.Max
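The following is a minimal, self-contained sketch of the Max approach above. It assumes the enum values are ordered so that none (0) is the smallest, which is what lets Max prefer any real flag over none. The names Entry, LifeState, BlockState and MergeDemo are renamed stand-ins for the question's types, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's enums and class.
public enum LifeState { None = 0, Dead = 1, Alive = 2 }
public enum BlockState { None = 0, Blocked = 1, Unblocked = 2 }

public class Entry
{
    public string IdNumber { get; set; }
    public LifeState Life { get; set; }
    public BlockState Block { get; set; }
}

public static class MergeDemo
{
    // Collapse all entries sharing an IdNumber into one entry.
    // Max works here only because None is 0, so any other flag outranks it.
    public static List<Entry> Merge(IEnumerable<Entry> entries) =>
        entries.GroupBy(e => e.IdNumber)
               .Select(g => new Entry
               {
                   IdNumber = g.Key,
                   Life = g.Max(e => e.Life),
                   Block = g.Max(e => e.Block)
               })
               .ToList();

    // The sample data from the question.
    public static List<Entry> Sample() => new List<Entry>
    {
        new Entry { IdNumber = "1", Life = LifeState.Alive, Block = BlockState.None },
        new Entry { IdNumber = "1", Life = LifeState.None, Block = BlockState.Blocked },
        new Entry { IdNumber = "2", Life = LifeState.None, Block = BlockState.Blocked },
        new Entry { IdNumber = "3", Life = LifeState.Dead, Block = BlockState.None },
        new Entry { IdNumber = "3", Life = LifeState.None, Block = BlockState.Unblocked }
    };

    public static void Main()
    {
        foreach (var e in Merge(Sample()))
            Console.WriteLine($"{e.IdNumber}: {e.Life}, {e.Block}");
    }
}
```

One caveat of this design: if a group ever contains two different non-none values (for example both dead and alive), Max silently keeps the numerically larger one, so conflicting records would need a separate check.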

Related

Get list items by checking priority of another list using linq

I need to implement something,
I have an item list with different priorities and I need to select the items based on these priorities. For that, I have another list of priorities. If priority "one" does not match, it should check the second priority, and so on. This can be done with a simple foreach loop and the .Any() function, but I'm wondering whether it can be done with a single LINQ query.
enum Brand
{
Nike,
Adidas,
Levis
}
class Info
{
    public Brand Brand { get; set; }
    public int Status { get; set; }
    // Declared as string because the sample data below assigns "A", "B", ...
    public string Group { get; set; }
}
var list = new Info[]
{
new Info{Brand = Brand.Nike, Status = 0, Group = "C"},
new Info{Brand = Brand.Adidas, Status = 0, Group = "D"},
new Info{Brand = Brand.Nike, Status = 4, Group = "A"},
new Info{Brand = Brand.Levis, Status = 0, Group = "B"},
new Info{Brand = Brand.Adidas, Status = 5, Group = "B"}
};
List<string> groupPririties = new List<string>() { "B", "D", "A", "E" };
According to this item list and priority list, I should get the Levis and Adidas items only. But if those two are not in the list, it should return the other Adidas item, and so on. If no items in my list match any priority, it should return null.
Can I achieve this with a LINQ query only?
"But I'm wondering this can be done with a single LINQ query"
Yes, for example:
var priorityItems = list
.GroupBy(x => x.Group)
.Where(grp => groupPririties.Contains(grp.Key))
.OrderBy(grp => groupPririties.IndexOf(grp.Key))
.FirstOrDefault();
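To see the query above run end to end, here is a small self-contained sketch. It assumes Group is a string (the sample data assigns "A", "B", and so on), and the wrapper names PriorityDemo and PickByPriority are hypothetical. FirstOrDefault yields null when no group matches any priority, which covers the "return null" requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Brand { Nike, Adidas, Levis }

public class Info
{
    public Brand Brand { get; set; }
    public int Status { get; set; }
    public string Group { get; set; } // assumption: string, to match the sample data
}

public static class PriorityDemo
{
    // Returns the group with the highest-priority key (earliest in priorities),
    // or null when no item belongs to any listed group.
    public static IGrouping<string, Info> PickByPriority(
        IEnumerable<Info> items, IList<string> priorities) =>
        items.GroupBy(x => x.Group)
             .Where(g => priorities.Contains(g.Key))
             .OrderBy(g => priorities.IndexOf(g.Key))
             .FirstOrDefault();

    public static Info[] Sample() => new[]
    {
        new Info { Brand = Brand.Nike, Status = 0, Group = "C" },
        new Info { Brand = Brand.Adidas, Status = 0, Group = "D" },
        new Info { Brand = Brand.Nike, Status = 4, Group = "A" },
        new Info { Brand = Brand.Levis, Status = 0, Group = "B" },
        new Info { Brand = Brand.Adidas, Status = 5, Group = "B" }
    };

    public static void Main()
    {
        var priorities = new List<string> { "B", "D", "A", "E" };
        var best = PickByPriority(Sample(), priorities);
        if (best == null) { Console.WriteLine("no match"); return; }
        foreach (var item in best)
            Console.WriteLine($"{item.Brand} (group {item.Group})");
    }
}
```

For long priority lists, IndexOf and Contains are linear scans; precomputing a dictionary from group name to rank would avoid that, at the cost of a slightly longer query.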

Optimising a Get Next Value List function

We retrieve a list of objects from a database and cannot rely on the Id order to guarantee they will be in the right sequence, as objects may have been edited, deleted etc.
They look like this:
Id NextId
1 3
2 0
3 17
17 2
So the correct order is 1, 3, 17, 2.
I came up with this code to solve the problem:
// Single() throws unless exactly one step terminates the chain (NextId == 0)
var last = steps.Single(x => x.NextId == 0);
long lastStep = last.Id;
//Probably should be a guard clause for nulls
List<MyObject> orderedSteps = new List<MyObject>() { last };
int retries = 0;
do
{
    foreach (var entry in steps)
    {
        if (lastStep == entry.NextId)
        {
            orderedSteps.Add(entry);
            // Move backwards so the next match is this entry's predecessor
            lastStep = entry.Id;
        }
        retries++;
    }
} while (orderedSteps.Count < steps.Count() && retries < 10000);
//Flip the order so it runs first to last
orderedSteps.Reverse();
return orderedSteps;
I think this works... but it feels kind of hacky, and I suspect there's a safer and more efficient way of doing it.
Any Suggestions? Thanks!
You could do this directly in the database using a recursive CTE:
WITH SequenceQuery (Id, NextId, Ordering)
AS
(
SELECT Id,
NextId,
0 AS Ordering
FROM Steps
WHERE Id = 1
UNION ALL
SELECT Steps.Id,
Steps.NextId,
SequenceQuery.Ordering + 1 AS Ordering
FROM SequenceQuery INNER JOIN Steps
ON SequenceQuery.NextId = Steps.Id
)
SELECT *
FROM SequenceQuery
ORDER BY Ordering
In the event of a cycle, this will return an error once it hits the maximum recursion depth. The maximum depth is by default 100; if your data set could legitimately have more than 100 elements, you can increase the limit with the following clause (which goes right at the end of the query, after the SELECT statement):
OPTION (MAXRECURSION 1000) -- (for example)
This will be by far the fastest way to get the data back, provided that the Id column is properly indexed.
If you prefer to do it in code, then you'll need to load the entire table into a dictionary beforehand and then walk through it. The advantage to this is that you can explicitly detect cycles instead of depending on a numeric limit to the number of levels.
var steps = ...;
var stepById = steps.ToDictionary(step => step.Id);
var stepsInOrder = new List<int>();
var visited = new HashSet<int>();
// Make sure that when we hit 0, we'll definitely stop.
Debug.Assert(!stepById.ContainsKey(0));
int currentStepId = 1;
while (stepById.TryGetValue(currentStepId, out Step step))
{
stepsInOrder.Add(currentStepId);
int nextStepId = step.NextId;
if (!visited.Add(nextStepId))
throw new Exception($"Cycle found at step {nextStepId}");
currentStepId = nextStepId;
}
(SQL tested, C# code untested)
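As a rough, runnable variant of the dictionary walk above, the sketch below checks the current id against the visited set before adding it, so a cycle is reported before any step is listed twice. The Step class, the ChainWalker name and the startId parameter are assumptions made for illustration; the answer itself hard-codes the starting id as 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape of a step row.
public class Step
{
    public int Id { get; set; }
    public int NextId { get; set; }
}

public static class ChainWalker
{
    // Follows NextId links from startId until it reaches an id with no
    // matching step (such as the terminating 0). Throws on a cycle.
    public static List<int> Order(IEnumerable<Step> steps, int startId)
    {
        var stepById = steps.ToDictionary(s => s.Id);
        var ordered = new List<int>();
        var visited = new HashSet<int>();
        int currentId = startId;
        while (stepById.TryGetValue(currentId, out Step step))
        {
            if (!visited.Add(currentId))
                throw new InvalidOperationException($"Cycle found at step {currentId}");
            ordered.Add(currentId);
            currentId = step.NextId;
        }
        return ordered;
    }

    public static void Main()
    {
        var steps = new[]
        {
            new Step { Id = 1, NextId = 3 },
            new Step { Id = 2, NextId = 0 },
            new Step { Id = 3, NextId = 17 },
            new Step { Id = 17, NextId = 2 }
        };
        Console.WriteLine(string.Join(", ", Order(steps, 1))); // prints "1, 3, 17, 2"
    }
}
```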
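Here's my solution. It requires several assumptions to hold: a single chain, terminated by a NextId of 0. The head is found by XOR-ing every Id and NextId together: each Id other than the head's also appears once as some item's NextId and cancels out, and the terminating 0 contributes nothing, so only the head's Id remains.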
public class Item
{
public int Id;
public int NextId;
public override string ToString()
{
return string.Format("Item {0} (links to {1})", Id, NextId);
}
};
class Program
{
static void Main(string[] args)
{
Item[] items = new Item[] {
new Item() { Id = 1, NextId = 3 },
new Item() { Id = 2, NextId = 0 },
new Item() { Id = 3, NextId = 17 },
new Item() { Id = 17, NextId = 2 }
};
Dictionary<int, int> idToIndex = new Dictionary<int, int>();
int headId = 0;
for (int index = 0; index < items.Length; ++index)
{
idToIndex.Add(items[index].Id, index);
headId = headId ^ items[index].NextId ^ items[index].Id;
}
int currentId = headId;
while (currentId != 0)
{
var item = items[idToIndex[currentId]];
Console.WriteLine(item);
currentId = item.NextId;
}
}
}
My suggestion is as follows:
class MyObject
{
public long Id;
public long NextId;
public override string ToString() => Id.ToString();
};
public void q48710242()
{
var items = new[]
{
new MyObject{ Id = 1, NextId = 3 },
new MyObject{ Id = 2, NextId = 0 },
new MyObject{ Id = 3, NextId = 17 },
new MyObject{ Id = 17, NextId = 2 }
};
var nextIdIndex = items.ToDictionary(item => item.NextId);
var orderedSteps = new List<MyObject>();
var currentStep = new MyObject() { Id = 0 };
while (nextIdIndex.TryGetValue(currentStep.Id, out currentStep))
{
orderedSteps.Add(currentStep);
}
orderedSteps.Reverse();
var output = string.Join(", ", orderedSteps);
}
Returns:
output = "1, 3, 17, 2"
This uses a dictionary to build an index of the items as in Jonathan's answer, but by using NextId as the key. The algorithm then proceeds backwards from the 0 as in the original question to build the list in reverse. This approach has no problems with loops in the data as any such loop will never be entered assuming that Id is unique.
If the data contains multiple elements with the same NextId, then it forms a tree structure:
var items = new[]
{
new { Id = 1, NextId = 3 },
new { Id = 2, NextId = 0 },
new { Id = 3, NextId = 17 },
new { Id = 17, NextId = 2 },
new { Id = 100, NextId = 2 }
};
This will cause the .ToDictionary() call to fail with System.ArgumentException: An item with the same key has already been added.
If the data contains no entries with a NextId equal to 0, it will return an empty list.
Update: Changed to return a list of objects rather than the indices.
Hope this helps

LINQ grouping without changing the order [duplicate]

Let's say I have following data:
Time Status
10:00 On
11:00 Off
12:00 Off
13:00 Off
14:00 Off
15:00 On
16:00 On
How could I group that using Linq into something like
[On, [10:00]], [Off, [11:00, 12:00, 13:00, 14:00]], [On, [15:00, 16:00]]
Create a GroupAdjacent extension, such as the one listed here.
And then it's as simple as:
var groups = myData.GroupAdjacent(data => data.OnOffStatus);
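Since the linked extension is not reproduced here, the following is one possible sketch of a GroupAdjacent method, not the implementation the answer refers to. It returns each run as a KeyValuePair of the key and the run's items, which is an assumption of this sketch; the original may well return IGrouping instances instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdjacentGrouping
{
    // Yields one (key, items) pair per run of consecutive elements sharing a key.
    // A new run starts whenever the key differs from the previous element's key.
    public static IEnumerable<KeyValuePair<TKey, List<TSource>>> GroupAdjacent<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<TSource> run = null;
        TKey runKey = default(TKey);
        foreach (var item in source)
        {
            var key = keySelector(item);
            if (run != null && !comparer.Equals(key, runKey))
            {
                // Key changed: emit the finished run and start over.
                yield return new KeyValuePair<TKey, List<TSource>>(runKey, run);
                run = null;
            }
            if (run == null)
            {
                run = new List<TSource>();
                runKey = key;
            }
            run.Add(item);
        }
        if (run != null)
            yield return new KeyValuePair<TKey, List<TSource>>(runKey, run);
    }
}

public static class AdjacentDemo
{
    public static void Main()
    {
        var data = new[]
        {
            (Time: "10:00", Status: "On"), (Time: "11:00", Status: "Off"),
            (Time: "12:00", Status: "Off"), (Time: "13:00", Status: "Off"),
            (Time: "14:00", Status: "Off"), (Time: "15:00", Status: "On"),
            (Time: "16:00", Status: "On")
        };
        foreach (var g in data.GroupAdjacent(d => d.Status))
            Console.WriteLine($"{g.Key}: {string.Join(", ", g.Value.Select(d => d.Time))}");
    }
}
```

Unlike GroupBy, this makes a single pass and never merges non-adjacent runs, so the two separate "On" runs stay separate and the input order is preserved.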
You could also do this with one Linq query using a variable to keep track of the changes, like this.
int key = 0;
var query = data.Select(
(n,i) => i == 0 ?
new { Value = n, Key = key } :
new
{
Value = n,
Key = n.OnOffFlag == data[i - 1].OnOffFlag ? key : ++key
})
.GroupBy(a => a.Key, a => a.Value);
Basically, it assigns each item a key that increments whenever the current item differs from the previous one. Of course, this assumes that your data is in a List or array (the query indexes data[i - 1]); otherwise you'd have to try a different method.
Here is a hardcore LINQ solution by using Enumerable.Zip to compare contiguous elements and generate a contiguous key:
var adj = 0;
var t = data.Zip(data.Skip(1).Concat(new TimeStatus[] { null }),
(x, y) => new { x, key = (x == null || y == null || x.Status == y.Status) ? adj : adj++ }
).GroupBy(i => i.key, (k, g) => g.Select(e => e.x));
It can be done as follows:
1) Iterate over the collection.
2) Use TakeWhile with a predicate that checks whether each element has the same status (On or Off) as the first element of the remaining collection.
3) Skip past the subset found in the previous step, repeat for the rest of the collection, and concatenate the results.
Hope it helps.
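The steps above can be sketched roughly as follows. This is one interpretation of the description, assuming the data fits in memory; the names TakeWhileRuns and SplitRuns are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeWhileRuns
{
    // Splits the input into runs of consecutive elements with equal keys,
    // using Skip to jump to the start of each run and TakeWhile to collect it.
    public static List<List<T>> SplitRuns<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var items = source.ToList();
        var comparer = EqualityComparer<TKey>.Default;
        var runs = new List<List<T>>();
        int start = 0;
        while (start < items.Count)
        {
            var key = keySelector(items[start]);
            var run = items.Skip(start)
                           .TakeWhile(x => comparer.Equals(keySelector(x), key))
                           .ToList();
            runs.Add(run);
            start += run.Count; // always at least 1, so the loop terminates
        }
        return runs;
    }

    public static void Main()
    {
        var statuses = new[] { "On", "Off", "Off", "Off", "Off", "On", "On" };
        foreach (var run in SplitRuns(statuses, s => s))
            Console.WriteLine($"{run[0]} x{run.Count}");
    }
}
```

Depending on the runtime, Skip may re-enumerate from the beginning on each iteration, so this can cost more than a single-pass loop when there are many short runs.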
You could parse the list and assign a contiguous key e.g define a class:
public class TimeStatus
{
public int ContiguousKey { get; set; }
public string Time { get; set; }
public string Status { get; set; }
}
You would assign values to the contiguous key by looping through, maintaining a count and detecting when the status changes from On to Off and so forth which would give you a list like this:
List<TimeStatus> timeStatuses = new List<TimeStatus>
{
new TimeStatus { ContiguousKey = 1, Status = "On", Time = "10:00"},
new TimeStatus { ContiguousKey = 1, Status = "On", Time = "11:00"},
new TimeStatus { ContiguousKey = 2, Status = "Off", Time = "12:00"},
new TimeStatus { ContiguousKey = 2, Status = "Off", Time = "13:00"},
new TimeStatus { ContiguousKey = 2, Status = "Off", Time = "14:00"},
new TimeStatus { ContiguousKey = 3, Status = "On", Time = "15:00"},
new TimeStatus { ContiguousKey = 3, Status = "On", Time = "16:00"}
};
Then using the following query you can extract the Status and grouped Times:
var query = timeStatuses.GroupBy(t => t.ContiguousKey)
.Select(g => new { Status = g.First().Status, Times = g });
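The key-assignment loop described above is not shown in the answer, so here is a minimal sketch of it, followed by the same GroupBy query. The helper name ContiguousKeys.Assign is hypothetical, and the sample data is taken from the question rather than from the list in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TimeStatus
{
    public int ContiguousKey { get; set; }
    public string Time { get; set; }
    public string Status { get; set; }
}

public static class ContiguousKeys
{
    // Walks the list once, bumping the key each time the status changes,
    // so every run of equal statuses shares one key (starting at 1).
    public static void Assign(IEnumerable<TimeStatus> items)
    {
        int key = 0;
        string previous = null;
        foreach (var item in items)
        {
            if (key == 0 || item.Status != previous)
                key++;
            item.ContiguousKey = key;
            previous = item.Status;
        }
    }

    public static List<TimeStatus> Sample() => new List<TimeStatus>
    {
        new TimeStatus { Time = "10:00", Status = "On" },
        new TimeStatus { Time = "11:00", Status = "Off" },
        new TimeStatus { Time = "12:00", Status = "Off" },
        new TimeStatus { Time = "13:00", Status = "Off" },
        new TimeStatus { Time = "14:00", Status = "Off" },
        new TimeStatus { Time = "15:00", Status = "On" },
        new TimeStatus { Time = "16:00", Status = "On" }
    };

    public static void Main()
    {
        var timeStatuses = Sample();
        Assign(timeStatuses);
        var query = timeStatuses.GroupBy(t => t.ContiguousKey)
                                .Select(g => new { Status = g.First().Status, Times = g });
        foreach (var group in query)
            Console.WriteLine($"{group.Status}: {string.Join(", ", group.Times.Select(t => t.Time))}");
    }
}
```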

Linq query to group by field1, count field2 and filter by count between values of joined collection

I'm having trouble getting my LINQ query correct. I've been resisting doing this with foreach loops because I'm trying to better understand LINQ.
I have following data in LinqPad.
void Main()
{
var events = new[] {
new {ID = 1, EventLevel = 1, PatientID = "1", CodeID = "2", Occurences = 0 },
new {ID = 2, EventLevel = 2, PatientID = "1", CodeID = "2", Occurences = 0 },
new {ID = 3, EventLevel = 1, PatientID = "2", CodeID = "1", Occurences = 0 },
new {ID = 4, EventLevel = 3, PatientID = "2", CodeID = "2", Occurences = 0 },
new {ID = 5, EventLevel = 1, PatientID = "3", CodeID = "3", Occurences = 0 },
new {ID = 6, EventLevel = 3, PatientID = "1", CodeID = "4", Occurences = 0 }
};
var filter = new FilterCriterion();
var searches = new List<FilterCriterion.Occurence>();
searches.Add(new FilterCriterion.Occurence() { CodeID = "1", MinOccurences = 2, MaxOccurences = 3 });
searches.Add(new FilterCriterion.Occurence() { CodeID = "2", MinOccurences = 2, MaxOccurences = 3 });
filter.Searches = searches;
var summary = from e in events
let de = new
{
PatientID = e.PatientID,
CodeID = e.CodeID
}
group e by de into t
select new
{
PatientID = t.Key.PatientID,
CodeID = t.Key.CodeID,
Occurences = t.Count(d => t.Key.CodeID == d.CodeID)
};
var allCodes = filter.Searches.Select(i => i.CodeID);
summary = summary.Where(e => allCodes.Contains(e.CodeID));
// How do I find the original ID property from the "events" collection and how do I
// eliminate the instances where the Occurences is not between MinOccurences and MaxOccurences.
foreach (var item in summary)
Console.WriteLine(item);
}
public class FilterCriterion
{
public IEnumerable<Occurence> Searches { get; set; }
public class Occurence
{
public string CodeID { get; set; }
public int? MinOccurences { get; set; }
public int? MaxOccurences { get; set; }
}
}
The problem I have is that I need to filter the results by the MinOccurences and MaxOccurences filter properties, and in the end I want the "events" objects where the IDs are 1, 2, 3 and 4.
Thanks in advance if you can provide help.
To access event.ID at the end of processing you need to pass it with your first query. Alter select to this:
// ...
group e by de into t
select new
{
PatientID = t.Key.PatientID,
CodeID = t.Key.CodeID,
Occurences = t.Count(d => t.Key.CodeID == d.CodeID),
// taking original items with us
Items = t
};
Having done that, your final query (including occurrences filter) might look like this:
var result = summary
// get all necessary data, including filter that matched given item
.Select(Item => new
{
Item,
Filter = searches.FirstOrDefault(f => f.CodeID == Item.CodeID)
})
// get rid of those without matching filter
.Where(i => i.Filter != null)
// this is your occurrences filtering
.Where(i => i.Item.Occurences >= i.Filter.MinOccurences
&& i.Item.Occurences <= i.Filter.MaxOccurences)
// and finally extract original events IDs
.SelectMany(i => i.Item.Items)
.Select(i => i.ID);
This produces 1, 2 as result. 3 and 4 are left out as they don't get past occurrences filtering.
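For reference, the same filtering can also be written as a single pipeline with a Join, as in this hedged sketch. It swaps the anonymous types for small named classes (PatientEvent is a made-up name), assumes at most one search per CodeID, and treats a null MinOccurences or MaxOccurences as "no bound", which differs slightly from the query above where a null bound excludes the group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical named versions of the question's anonymous event type and filter.
public class PatientEvent
{
    public int ID { get; set; }
    public string PatientID { get; set; }
    public string CodeID { get; set; }
}

public class Occurence
{
    public string CodeID { get; set; }
    public int? MinOccurences { get; set; }
    public int? MaxOccurences { get; set; }
}

public static class OccurenceFilter
{
    // Groups events per (patient, code), keeps groups whose size lies within the
    // matching search's bounds, and returns the IDs of the surviving events.
    public static List<int> MatchingEventIds(
        IEnumerable<PatientEvent> events, IEnumerable<Occurence> searches) =>
        events.GroupBy(e => new { e.PatientID, e.CodeID })
              .Join(searches, g => g.Key.CodeID, s => s.CodeID,
                    (g, s) => new { Items = g, Count = g.Count(), Search = s })
              .Where(x => (x.Search.MinOccurences == null || x.Count >= x.Search.MinOccurences)
                       && (x.Search.MaxOccurences == null || x.Count <= x.Search.MaxOccurences))
              .SelectMany(x => x.Items)
              .Select(e => e.ID)
              .ToList();

    public static List<PatientEvent> SampleEvents() => new List<PatientEvent>
    {
        new PatientEvent { ID = 1, PatientID = "1", CodeID = "2" },
        new PatientEvent { ID = 2, PatientID = "1", CodeID = "2" },
        new PatientEvent { ID = 3, PatientID = "2", CodeID = "1" },
        new PatientEvent { ID = 4, PatientID = "2", CodeID = "2" },
        new PatientEvent { ID = 5, PatientID = "3", CodeID = "3" },
        new PatientEvent { ID = 6, PatientID = "1", CodeID = "4" }
    };

    public static List<Occurence> SampleSearches() => new List<Occurence>
    {
        new Occurence { CodeID = "1", MinOccurences = 2, MaxOccurences = 3 },
        new Occurence { CodeID = "2", MinOccurences = 2, MaxOccurences = 3 }
    };

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", MatchingEventIds(SampleEvents(), SampleSearches()))); // prints "1, 2"
    }
}
```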
I have run your program in linqpad.
My understanding is that you want to filter using filter.MinOccurences and filter.MaxOccurences on Occurences count of result data set.
You can add additional filters using Where clause.
if (filter.MinOccurences.HasValue)
summary = summary.Where (x=> x.Occurences >= filter.MinOccurences);
if (filter.MaxOccurences.HasValue)
summary = summary.Where (x=> x.Occurences <= filter.MaxOccurences);

LINQ-to-objects index within a group + for different groupings (aka ROW_NUMBER with PARTITION BY equivalent)

After much Google searching and code experimentation, I'm stumped on a complex C# LINQ-to-objects problem which in SQL would be easy to solve with a pair of ROW_NUMBER()...PARTITION BY functions and a subquery or two.
Here's, in words, what I'm trying to do in code-- the underlying requirement is removing duplicate documents from a list:
1) First, group a list by (Document.Title, Document.SourceId), assuming a (simplified) class definition like this:
class Document
{
    string Title;
    int SourceId; // sources are prioritized (ID=1 better than ID=2)
}
2) Within that group, assign each document an index (e.g. Index 0 == 1st document with this title from this source, Index 1 == 2nd document with this title from this source, etc.). I'd love the equivalent of ROW_NUMBER() in SQL!
3) Now group by (Document.Title, Index), where Index was computed in Step #2. For each group, return only one document: the one with the lowest Document.SourceId.
Step #1 is easy (e.g. codepronet.blogspot.com/2009/01/group-by-in-linq.html), but I'm getting stumped on steps #2 and #3. I can't seem to build a red-squiggle-free C# LINQ query to solve all three steps.
Anders Hejlsberg's post on this thread is, I think, the answer to Steps #2 and #3 above, if I could get the syntax right.
I'd prefer to avoid using an external local variable to do the Index computation, as recommended on slodge.blogspot.com/2009/01/adding-row-number-using-linq-to-objects.html, since that solution breaks if the external variable is modified.
Optimally, the group-by-Title step could be done first, so the "inner" groupings (first by Source to compute the index, then by Index to filter out duplicates) can operate on small numbers of objects in each "by title" group, since the number of documents in each by-title group is usually under 100. I really don't want an O(N²) solution!
I could certainly solve this with nested foreach loops, but it seems like the kind of problem which should be simple with LINQ.
Any ideas?
I think jpbochi missed that you want your groupings to be by pairs of values (Title+SourceId then Title+Index). Here's a LINQ query (mostly) solution:
var selectedFew =
from doc in docs
group doc by new { doc.Title, doc.SourceId } into g
from docIndex in g.Select((d, i) => new { Doc = d, Index = i })
group docIndex by new { docIndex.Doc.Title, docIndex.Index } into g
select g.Aggregate((a,b) => (a.Doc.SourceId <= b.Doc.SourceId) ? a : b);
First we group by Title+SourceId (I use an anonymous type because the compiler builds a good hashcode for the grouping lookup). Then we use Select to attach the grouped index to the document, which we use in our second grouping. Finally, for each group we pick the lowest SourceId.
Given this input:
var docs = new[] {
new { Title = "ABC", SourceId = 0 },
new { Title = "ABC", SourceId = 4 },
new { Title = "ABC", SourceId = 2 },
new { Title = "123", SourceId = 7 },
new { Title = "123", SourceId = 7 },
new { Title = "123", SourceId = 7 },
new { Title = "123", SourceId = 5 },
new { Title = "123", SourceId = 5 },
};
I get this output:
{ Doc = { Title = ABC, SourceId = 0 }, Index = 0 }
{ Doc = { Title = 123, SourceId = 5 }, Index = 0 }
{ Doc = { Title = 123, SourceId = 5 }, Index = 1 }
{ Doc = { Title = 123, SourceId = 7 }, Index = 2 }
Update: I just saw your question about grouping by Title first. You can do this using a subquery on your Title groups:
var selectedFew =
from doc in docs
group doc by doc.Title into titleGroup
from docWithIndex in
(
from doc in titleGroup
group doc by doc.SourceId into idGroup
from docIndex in idGroup.Select((d, i) => new { Doc = d, Index = i })
group docIndex by docIndex.Index into indexGroup
select indexGroup.Aggregate((a,b) => (a.Doc.SourceId <= b.Doc.SourceId) ? a : b)
)
select docWithIndex;
To be honest, I'm quite confused by your question. Maybe you should explain what you're trying to solve. Anyway, I'll try to answer what I understood.
1) First, I'll assume that you already have a list of documents grouped by Title+SourceId. For testing purposes, I hardcoded a list as follow:
var docs = new [] {
new { Title = "ABC", SourceId = 0 },
new { Title = "ABC", SourceId = 4 },
new { Title = "ABC", SourceId = 2 },
new { Title = "123", SourceId = 7 },
new { Title = "123", SourceId = 5 },
};
2) To put an index on every item, you can use the Select extension method overload whose selector function also receives the element's index. Like this:
var docsWithIndex
= docs
.Select( (d, i) => new { Doc = d, Index = i } );
3) From what I understood, the next step would be to group the last result by Title. Here's how to do it:
var docsGroupedByTitle
= docsWithIndex
.GroupBy( a => a.Doc.Title );
The GroupBy function (used above) returns an IEnumerable<IGrouping<string,DocumentWithIndex>>. Since a group is enumerable too, we now have an enumerable of enumerables.
4) Now, for each of the groups above, we'll get only the item with the minimum SourceId. This operation needs two levels of nesting. In LINQ, the outer level is a selection (for each group, get one of its items), and the inner level is an aggregation (get the item with the lowest SourceId):
var selectedFew
= docsGroupedByTitle
.Select(
g => g.Aggregate(
(a, b) => (a.Doc.SourceId <= b.Doc.SourceId) ? a : b
)
);
Just to ensure that it works, I tested it with a simple foreach:
foreach (var a in selectedFew) Console.WriteLine(a);
//The result will be:
//{ Doc = { Title = ABC, SourceId = 0 }, Index = 0 }
//{ Doc = { Title = 123, SourceId = 5 }, Index = 4 }
I'm not sure that's what you wanted. If not, please comment on the answer and I can fix it. I hope this helps.
Note: All the classes used in my tests were anonymous, so you don't really need to define a DocumentWithIndex type. Actually, I haven't even declared a Document class.
Method Based Syntax:
var selectedFew = docs.GroupBy(doc => new {doc.Title, doc.SourceId}, doc => doc)
.SelectMany((grouping) => grouping.Select((doc, index) => new {doc, index}))
.GroupBy(anon => new {anon.doc.Title, anon.index})
.Select(grouping => grouping.Aggregate((a, b) => a.doc.SourceId <= b.doc.SourceId ? a : b));
Would you say the above is the equivalent Method based syntax?
I implemented an extension method. It supports multiple partition by fields as well as multiple order conditions.
public static IEnumerable<TResult> Partition<TSource, TKey, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<IEnumerable<TSource>, IOrderedEnumerable<TSource>> sorter,
Func<TSource, int, TResult> selector)
{
// Plain null check in place of the external AssertUtilities helper
if (source == null) throw new ArgumentNullException(nameof(source));
return source
.GroupBy(keySelector)
.Select(arg => sorter(arg).Select(selector))
.SelectMany(arg => arg);
}
Usage:
var documents = new[]
{
new { Title = "Title1", SourceId = 1 },
new { Title = "Title1", SourceId = 2 },
new { Title = "Title2", SourceId = 15 },
new { Title = "Title2", SourceId = 14 },
new { Title = "Title3", SourceId = 100 }
};
var result = documents
.Partition(
arg => arg.Title, // partition by
arg => arg.OrderBy(x => x.SourceId), // order by
(arg, rowNumber) => new { RowNumber = rowNumber, Document = arg }) // select
.Where(arg => arg.RowNumber == 0)
.Select(arg => arg.Document)
.ToList();
Result:
{ Title = "Title1", SourceId = 1 },
{ Title = "Title2", SourceId = 14 },
{ Title = "Title3", SourceId = 100 }
