Linq: Group by with OR condition - c#

Below are the records where we are trying to group the records by the following OR conditions:
Name is same
Email is same
Phone is same
Is there a way in LINQ to Group By with Or condition?
Name       Email        Phone       Id
---------  -----------  ----------  --
Rohan      rohan@s.com  NULL        1
R. Mehta   rohan@s.com  9999999999  2
Alex       alex@j.com   7777777777  3
Lisa John  john@j.com   6666666666  4
Lisa       lisa@j.com   6666666666  5
Siri       siri@s.com   NULL        6
RM         info@s.com   9999999999  7
Lisa       NULL         NULL        8
Lisa John  m@s.com      7777777757  9
Output Expected
Group 1:
Key: Rohan
RecordIds: 1,2,7 (As `Id:1` has same email as `Id:2`, `Id:2` has same
phone number as `Id:7`.)
Group 2:
Key: Lisa John
RecordIds: 4,5,8,9 (As `Id:4` has same phone number as `Id:5`. While `Id:5`
has the same name as `Id:8`. As `Id:9` has the same name
as `Id: 4`, include that)
Records 3 and 6 are not part of the output, because the output contains only groups with more than one record.
The key can be anything; I just picked an arbitrary one.
If record 9 had the email rohan@s.com, then:
Output
Group 1:
Key: Rohan
RecordIds: 1,2,7,4,5,8,9
NOTE: The input is a SQL table read through LINQ to SQL, so query performance has to be taken into account too.
Crude solution:
A dirty solution would be the following:
1. Group the records by Name and store the result in gl-1.
2. Group the records by Email and store the result in gl-2.
3. Group the records by Phone and store the result in gl-3.
4. Take each result in gl-1 and check whether the corresponding id is present in gl-2 or gl-3. If so, include those ids in gl-1.
5. Take each result in gl-2 and check whether the corresponding id is present in any result in gl-1. If so, add the ids that are missing to that gl-1 record. If the loop encounters a result that is not present in gl-1, add it to gl-1 as a new result.
6. Do step 5 for gl-3.

GroupBy requires some definition of "equality". You could define an EqualityComparer with the logic you want, but you'll get inconsistent results. Your grouping breaks the transitive property of equality needed for grouping. In other words, if A=B and B=C then A=C must be true.
For example, the following pairs of items would be in the same group ("equal"):
A, B, C and A, D, E
A, D, E and F, G, E
but
A, B, C and F, G, E
would not be in the same group.
To get the output you want (e.g. item 9 in multiple groups) you'd need to use standard looping to recursively find all items that are "equal" to the first, then all items that are "equal" to that group, then all items that are "equal" to the third group, etc. Linq is not going to be very helpful here (except possibly for the searching within each recursive call).
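The "find everything equal to the group, then repeat" loop described above is the connected-components problem, and one common way to solve it is a union-find (disjoint set) structure. The following is a minimal sketch, not the answerer's code: it assumes the rows have already been loaded into memory, and `Contact`, `OrGrouping` and `GroupLinked` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; the fields mirror the question's columns.
public record Contact(int Id, string Name, string Email, string Phone);

public static class OrGrouping
{
    // Groups records that are linked, directly or through a chain, by a shared
    // non-null Name, Email or Phone. Groups with a single record are dropped.
    public static List<List<int>> GroupLinked(IReadOnlyList<Contact> records)
    {
        // parent[i] == i means record i is currently the root of its set.
        var parent = Enumerable.Range(0, records.Count).ToArray();

        int Find(int i)
        {
            while (parent[i] != i)
            {
                parent[i] = parent[parent[i]]; // path halving keeps trees shallow
                i = parent[i];
            }
            return i;
        }

        // Remember the first record seen for each field value and merge every
        // later record that carries the same value into that record's set.
        var firstSeen = new Dictionary<string, int>();
        for (int i = 0; i < records.Count; i++)
        {
            var r = records[i];
            var fields = new[] { ("N", r.Name), ("E", r.Email), ("P", r.Phone) };
            foreach (var (tag, value) in fields)
            {
                if (value == null) continue;       // NULL never links two records
                string key = tag + ":" + value;    // the tag keeps the columns apart
                if (firstSeen.TryGetValue(key, out int other))
                    parent[Find(i)] = Find(other);
                else
                    firstSeen[key] = i;
            }
        }

        return Enumerable.Range(0, records.Count)
            .GroupBy(i => Find(i))
            .Where(g => g.Count() > 1)
            .Select(g => g.Select(i => records[i].Id).ToList())
            .ToList();
    }
}
```

For the nine rows in the question this yields the two groups 1,2,7 and 4,5,8,9. Only the Id, Name, Email and Phone columns are needed, so the query against the table can stay a plain projection while the grouping runs in memory.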

LINQ queries run linearly, which means that once the query has passed a possible new group it can't go back and work with it.
Let's assume:
public class aperson
{
    public string Name;
    public string Email;
    public string Phone;
    public int ID;

    public aperson(string name, string email, string phone, int id)
    {
        Name = name;
        Email = email;
        Phone = phone;
        ID = id;
    }
}
Example:
new aperson("a", "a@", "1", 1),
new aperson("b", "b@", "2", 2),
new aperson("a", "c@", "2", 3)
Iteration 1: create group 1 with ("a","a#","1") values
Iteration 2: create group 2 with ("b","b#","2") values
Iteration 3: here the system will have to group it with either group 1 or with group 2 but not both.
To fix this your iterator will have to go back to group 2 and group 1 and join them.
To solve this you will have to break it into steps.
Step1. Create the groups
Step2. Group by the created groups.
I think there are much better ways to do this. I am just illustrating the flow how this problem needs to be approached and why.
Code for solution
// Maps every Email, Phone and Name value seen so far to its group id.
public static Dictionary<string, int> group = new Dictionary<string, int>();

public static void adduniquevalue(aperson person, int id)
{
    if (person.Email != null && !group.Keys.Contains(person.Email))
    {
        group.Add(person.Email, id);
    }
    if (person.Phone != null && !group.Keys.Contains(person.Phone))
    {
        group.Add(person.Phone, id);
    }
    if (person.Name != null && !group.Keys.Contains(person.Name))
    {
        group.Add(person.Name, id);
    }
}

public static void CreateGroupKeys(aperson person)
{
    int id = group.Count;
    List<int> groupmatches = new List<int>();
    if (person.Email != null && group.Keys.Contains(person.Email))
        groupmatches.Add(group[person.Email]);
    if (person.Phone != null && group.Keys.Contains(person.Phone))
        groupmatches.Add(group[person.Phone]);
    if (person.Name != null && group.Keys.Contains(person.Name))
        groupmatches.Add(group[person.Name]);
    // The person matches more than one existing group: merge those groups.
    if (groupmatches.GroupBy(x => x).Count() > 1)
    {
        int newid = groupmatches[0];
        group.Keys.Where(key => groupmatches.Contains(group[key]))
            .ToList()
            .ForEach(key => { group[key] = newid; });
    }
    if (groupmatches.Count == 0)
        adduniquevalue(person, id);
    else
        adduniquevalue(person, groupmatches[0]);
}

public static int GetGroupKey(aperson person)
{
    if (person.Email != null && group.Keys.Contains(person.Email))
        return group[person.Email];
    if (person.Phone != null && group.Keys.Contains(person.Phone))
        return group[person.Phone];
    if (person.Name != null && group.Keys.Contains(person.Name))
        return group[person.Name];
    else
        return 0;
}
This will create your groups in a dictionary which you could use in a normal group by method later on.
Like so:
people.ForEach(x => CreateGroupKeys(x));
var groups = people.GroupBy(x => GetGroupKey(x)).ToList();

Related

Find Common value against different id c# Linq

I have one table which is looking like this
ID  UserID  UserEncryptValue
--  ------  ----------------
1   1       abcd
2   2       1234
3   3       qwert
4   1       rstuv (Common value for user 1 and 2)
5   2       rstuv (Common value for user 1 and 2)
6   2       78901 (Common value for user 2 and 3)
7   3       78901 (Common value for user 2 and 3)
8   1       Hello123 (Common value for user 1,2 and 3)
9   2       Hello123 (Common value for user 1,2 and 3)
10  3       Hello123 (Common value for user 1,2 and 3)
Now I want to find out, using LINQ, whether users 1 and 2, or users 1, 2 and 3, have a common value.
Assuming you're mapping that table to an actual object like 'UserData' like this:
public class UserData
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public string UserEncryptValue { get; set; }
}
You can get the common values like this (userData is a list of UserData and represents your data):
var searchId = 1;
var commonValues = userData.GroupBy(user => user.UserEncryptValue)
    .Where(grp => grp.Count() > 1 && grp.Any(usr => usr.UserId == searchId))
    .SelectMany(u => u);
This groups on the UserEncryptValue and only selects groups that have more than 1 value (has a match) and at least 1 of the user ids is equal to the searchId.
Table.Where(n => Table.Any(o => o.ID != n.ID && o.UserEncryptValue == n.UserEncryptValue)).Select(n => n.UserID)
This returns the user ids of the rows in Table for which at least one other row (a row with a different ID) has the same UserEncryptValue.
Learn LINQ to understand how this works and what you can do to tweak it.
One way is to use GroupBy. In this case you would group by UserEncryptValue.
You can then examine each group and check which users are in each group.
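To make "check which users are in each group" concrete, here is a small in-memory sketch that returns the values shared by every user in a given set. It is an illustration under assumptions: `CommonValues` and `SharedByAll` are hypothetical names, and `UserData` is redeclared as a record only so the snippet stands alone.

```csharp
using System.Collections.Generic;
using System.Linq;

// Redeclared here so the sketch is self-contained; it mirrors the table's columns.
public record UserData(int Id, int UserId, string UserEncryptValue);

public static class CommonValues
{
    // Returns each UserEncryptValue that every user in requiredUserIds has at least once.
    public static List<string> SharedByAll(
        IEnumerable<UserData> rows, IReadOnlyCollection<int> requiredUserIds)
    {
        return rows
            .GroupBy(r => r.UserEncryptValue)
            // Keep a group only if each required user appears in it.
            .Where(g => requiredUserIds.All(id => g.Any(r => r.UserId == id)))
            .Select(g => g.Key)
            .ToList();
    }
}
```

With the ten rows from the question, asking for users 1 and 2 gives rstuv and Hello123, while asking for users 1, 2 and 3 gives only Hello123. An empty result means the users have no value in common.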

How to get records in EF that match a list of combinations (key/values)?

I have a database table with records for each user/year combination.
How can I get data from the database using EF and a list of userId/year combinations?
Sample combinations:
UserId Year
1 2015
1 2016
1 2018
12 2016
12 2019
3 2015
91 1999
I only need the records defined in above combinations. Can't wrap my head around how to write this using EF/Linq?
List<UserYearCombination> userYears = GetApprovedYears();
var records = dbcontext.YearResults.Where(?????);
Classes
public class YearResult
{
    public int UserId;
    public int Year;
    public DateTime CreatedOn;
    public int StatusId;
    public double Production;
    public double Area;
    public double Fte;
    public double Revenue;
    public double Diesel;
    public double EmissionsCo2;
    public double EmissionInTonsN;
    public double EmissionInTonsP;
    public double EmissionInTonsA;
    ....
}
public class UserYearCombination
{
    public int UserId;
    public int Year;
}
This is a notorious problem that I discussed before here. Krishna Muppalla's solution is among the solutions I came up with there. Its disadvantage is that it's not sargable, i.e. it can't benefit from any indexes on the involved database fields.
In the meantime I coined another solution that may be helpful in some circumstances. Basically it groups the input data by one of the fields and then finds and unions database data by grouping key and a Contains query of group elements:
IQueryable<YearResult> items = null;
foreach (var yearUserIds in userYears.GroupBy(t => t.Year, t => t.UserId))
{
    var userIds = yearUserIds.ToList();
    var grp = dbcontext.YearResults
        .Where(x => x.Year == yearUserIds.Key
                 && userIds.Contains(x.UserId));
    items = items == null ? grp : items.Concat(grp);
}
I use Concat here because Union will waste time making results distinct and in EF6 Concat will generate SQL with chained UNION statements while Union generates nested UNION statements and the maximum nesting level may be hit.
This query may perform well enough when indexes are in place. In theory, the maximum number of UNIONs in a SQL statement is unlimited, but the number of items in an IN clause (that Contains translates to) should not exceed a couple of thousand. That means that the content of your data will determine which grouping field performs better, Year or UserId. The challenge is to minimize the number of UNIONs while keeping the number of items in all IN clauses below approx. 5000.
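One way to decide between Year and UserId is to count, in memory, how many groups (UNIONs) and how many IN items each choice would produce for the combinations at hand. The sketch below is only an illustration of that trade-off: `UserYear`, `UnionPlanner` and `PreferGroupByYear` are hypothetical names, and the default limit of 5000 is taken from the rough figure above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for UserYearCombination.
public record UserYear(int UserId, int Year);

public static class UnionPlanner
{
    // Returns true when grouping by Year looks like the better choice:
    // every IN list stays within maxInItems and there are no more groups
    // (UNIONs) than grouping by UserId would give.
    public static bool PreferGroupByYear(
        IReadOnlyCollection<UserYear> combos, int maxInItems = 5000)
    {
        // Size of the IN list for each group, for both candidate grouping fields.
        var inSizesByYear = combos.GroupBy(c => c.Year)
            .Select(g => g.Select(c => c.UserId).Distinct().Count()).ToList();
        var inSizesByUser = combos.GroupBy(c => c.UserId)
            .Select(g => g.Select(c => c.Year).Distinct().Count()).ToList();

        bool yearFits = inSizesByYear.All(n => n <= maxInItems);
        bool userFits = inSizesByUser.All(n => n <= maxInItems);

        // If only one option respects the IN limit, take that one.
        if (yearFits != userFits) return yearFits;

        // Otherwise prefer the option with fewer UNIONs.
        return inSizesByYear.Count <= inSizesByUser.Count;
    }
}
```

For the seven sample combinations in the question, grouping by Year gives five groups and grouping by UserId gives four, so this heuristic would pick UserId. Real data may call for measuring both variants against the actual indexes.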
You can try this:
//add the possible filters to a list
var searchIds = new List<string> { "1-2015", "1-2016", "2-2018" };
//use the list to check in the Where clause
var result = (from x in YearResults
              where searchIds.Contains(x.UserId.ToString() + '-' + x.Year.ToString())
              select new UserYearCombination
              {
                  UserId = x.UserId,
                  Year = x.Year
              }).ToList();
Method 2
var d = YearResults
    .Where(x => searchIds.Contains(x.UserId.ToString() + '-' + x.Year.ToString()))
    .Select(x => new UserYearCombination
    {
        UserId = x.UserId,
        Year = x.Year
    }).ToList();

How to compare value of a field in a list to another value in another list in Where clause

I have a list of Employee Appraisal table, this table has TotalResult field, the values in this field are between 1 and 10. Another table Result Segmentation has the following columns:
Id int, Max double, Min double, Desc string
Let's say I have this data for Employee Appraisal:
EmpId EmpName TotalResult
--- ------- -----------
1 Jaims 1.5
2 Johny 8.3
3 Moon 5.6
4 Michle 7
5 Mariam 9
6 Kamel 4
Result Segmentation Values
Id Max Min Desc
--- --- --- -----
1 3 1 ~ 30%
2 4 3 40%
3 5 4 50%
4 6 5 60%
5 7 6 70%
6 10 7 ~ 80%
Now, the user has a multi select list of the Rate Segmentation table
if the user chooses 70% and 40%, the query should show these employee appraisals:
EmpId EmpName TotalResult
----- ------- -----------
3 Moon 5.6
6 Kamel 4
4 Michle 7
I wrote this code:
if (rateSegIds != null)
{
    var rateSegs = _repositoryRateSeg.Query(x => rateSegIds.Contains(x.Id)).ToList();
    if (rateSegs.Any())
    {
        foreach (var segmentation in rateSegs)
        {
            query = query.Where(x => x.TotalResult > segmentation.Min && x.TotalResult <= segmentation.Max);
        }
    }
}
rateSegIds is a list of integers that holds the user's selection.
rateSegs contains the records from the Rate Segmentation table that match those ids.
query is a queryable object over the EmployeeAppraisal table.
This code works only if the user chooses one value from the list; if they choose multiple values, the query returns nothing.
That is because the conditions are combined with "AND". They should be combined with "OR", but I don't know how to write that.
This was something that had been bugging me a while back, and the question prompted me to dig into it a bit. .Where() will append conditions, but as you noted, with an AndAlso operation. To get EF and LINQ to support an OrElse condition more dynamically, you need to rebuild the expression tree a little to OR the conditions together. Kudos to user743382's answer on "Exception using OrElse and AndAlso expression methods".
You'll need a couple classes to enable an expression visitor to line up the parameters for multiple expressions to be Or'd together. Something like:
private Expression<Func<EmployeeAppraisal, bool>> buildFilterExpression(IEnumerable<Segment> segments)
{
    // Start from "false" so that OR-ing the first segment gives just that segment.
    Expression<Func<EmployeeAppraisal, bool>> exp = c => false;
    foreach (var segment in segments)
    {
        Expression<Func<EmployeeAppraisal, bool>> filter = x => x.TotalResult >= segment.Min && x.TotalResult <= segment.Max;
        exp = Expression.Lambda<Func<EmployeeAppraisal, bool>>(
            Expression.OrElse(
                exp.Body,
                new ExpressionParameterReplacer(filter.Parameters, exp.Parameters).Visit(filter.Body)),
            exp.Parameters);
    }
    return exp;
}

private class ExpressionParameterReplacer : ExpressionVisitor
{
    public ExpressionParameterReplacer(IList<ParameterExpression> fromParameters, IList<ParameterExpression> toParameters)
    {
        ParameterReplacements = new Dictionary<ParameterExpression, ParameterExpression>();
        for (int i = 0; i != fromParameters.Count && i != toParameters.Count; i++)
            ParameterReplacements.Add(fromParameters[i], toParameters[i]);
    }

    private IDictionary<ParameterExpression, ParameterExpression> ParameterReplacements
    {
        get;
        set;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        ParameterExpression replacement;
        if (ParameterReplacements.TryGetValue(node, out replacement))
            node = replacement;
        return base.VisitParameter(node);
    }
}
Then in your EF Linq expression:
var rateSegs = _repositoryRateSeg.Query(x => rateSegIds.Contains(x.Id)).ToList();
if (rateSegs.Any())
    query = query.Where(buildFilterExpression(rateSegs));
The ExpressionParameterReplacer and supporting classes accommodate OR-ing the different expression bodies together and ensuring that they are associating to the same expression parameter so that Linq will evaluate them correctly as a single expression.
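The same idea can be tried without a database by compiling the combined expression and running it over an in-memory list. This is a condensed sketch rather than the code above: `Appraisal`, `Segment`, `OrFilter`, `SwapParameter` and `InAnySegment` are hypothetical names, and the bounds are inclusive on both ends, as in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory stand-ins for the two tables.
public record Appraisal(int EmpId, string EmpName, double TotalResult);
public record Segment(int Id, double Min, double Max);

public static class OrFilter
{
    // Rewrites every use of one lambda parameter to another, so that all
    // OR-ed bodies refer to the same parameter instance.
    private sealed class SwapParameter : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public SwapParameter(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
            => node == _from ? _to : node;
    }

    // Builds x => false || (in segment 1) || (in segment 2) || ...
    public static Expression<Func<Appraisal, bool>> InAnySegment(IEnumerable<Segment> segments)
    {
        var x = Expression.Parameter(typeof(Appraisal), "x");
        Expression body = Expression.Constant(false);
        foreach (var s in segments)
        {
            Expression<Func<Appraisal, bool>> one =
                a => a.TotalResult >= s.Min && a.TotalResult <= s.Max;
            var rebased = new SwapParameter(one.Parameters[0], x).Visit(one.Body);
            body = Expression.OrElse(body, rebased);
        }
        return Expression.Lambda<Func<Appraisal, bool>>(body, x);
    }
}
```

Calling `OrFilter.InAnySegment(segments).Compile()` gives a delegate that can be passed to `Where` on a list; against an `IQueryable` the expression would be passed uncompiled. With an empty segment list the filter is simply `false`, so nothing is returned.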
Cross join can be one solution like:
var rateSegIds = new int[] { 2, 5 }; //40% and 70%
var result = from emp in EmployeeAppraisals
             from segment in Segments.Where(x => rateSegIds.Contains(x.Id))
             where emp.TotalResult >= segment.Min && emp.TotalResult <= segment.Max
             select emp;

Perform condition on last record of table and get id of that record

I want to perform condition on last record of my below model :
public partial class TestPart
{
    public int Id { get; set; }
    public int TestId { get; set; }
    public int Status { get; set; }
    public virtual ICollection<Job> Jobs { get; set; }
}
Query :
var query = context.TestPart.OrderByDescending(tp => tp.Id)
    .Take(1)
    .Where(tp => tp.TestId == 100 &&
                 tp.Status == 1 &&
                 tp.Jobs.All(j => j.Type == "Recurring"));
Here I want to get the Id of the TestPart whose Status is 1 and whose jobs are all recurring, but only the last TestPart record should be checked.
But I am unable to select the Id of the last TestPart with the above query.
Update :
Id TestId Status
1 100 1
2 100 2
3 100 0
4 100 1
5 101 1
So here I want to filter the data by TestId, select the last record for that specific test, and then check whether all job types are recurring for that last TestPart, i.e. TestPartId = 4 in the case above.
The explanation is a bit fragmented. In order to make sure that I'm answering to the right problem, these are my assumptions:
One Test has many TestPart children.
You want the last TestPart of a given test, not just the last entry of the table (regardless of test id).
You're trying to ascertain if this last item fits the criteria, thus making the end result of your code a boolean value.
You need to split the data retrieval and data validation steps here.
When you merge them, you get different results. You ask for the last item that fits the criteria. This means that in a list of 10 items (numbered 1 through 10 chronologically) you might end up getting item 8 if it fits the criteria and 9 and 10 do not fit the criteria.
From your description, I surmise that's not what you want. You want to take item 10 (regardless of whether it fits the criteria), and only then check if this item fits the criteria or not.
Think of it this way:
I want the last person named John who entered this building.
I want to see if the last person who entered the building is named John.
Your code is trying to do the first. But what you really want to do is the second.
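The difference between the two readings shows up clearly on an in-memory list. The sketch below is only an illustration: `Part`, `LastRecordDemo` and both method names are hypothetical, and the Jobs check is left out to keep it short.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for TestPart (no Jobs collection).
public record Part(int Id, int TestId, int Status);

public static class LastRecordDemo
{
    // "The last person named John who entered": filter first, then take the last.
    public static int? LastMatching(IEnumerable<Part> parts, int testId) =>
        parts.Where(p => p.TestId == testId && p.Status == 1)
             .OrderByDescending(p => p.Id)
             .Select(p => (int?)p.Id)
             .FirstOrDefault();

    // "Is the last person who entered named John?": take the last, then check it.
    public static int? LastIfMatching(IEnumerable<Part> parts, int testId)
    {
        var last = parts.Where(p => p.TestId == testId)
                        .OrderByDescending(p => p.Id)
                        .FirstOrDefault();
        return last != null && last.Status == 1 ? last.Id : (int?)null;
    }
}
```

If the last part of test 100 has Status 0 while an earlier part has Status 1, `LastMatching` returns the earlier part's Id, whereas `LastIfMatching` returns null because the last part itself does not qualify.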
The correct version of your code:
//Get the last testpart of the test.
TestPart item = context.TestPart
    .Include(tp => tp.Jobs) //possibly optional dependent on lazy/eager loading.
    .OrderByDescending(tp => tp.Id)
    .First(tp => tp.TestId == 100);
//Check if this item fits the criteria
bool isValid =
    item.Status == 1
    && item.Jobs.All(j => j.Type == "Recurring");
isValid contains your answer.
Edit - just for completeness
There are ways to merge this into one query, but this makes the code easily prone to misinterpretation.
bool isLastItemValid = context.TestPart
    .Where(tp => tp.TestId == 100)
    .OrderByDescending(tp => tp.Id)
    .Take(1)
    .Any(tp =>
        tp.Status == 1
        && tp.Jobs.All(j => j.Type == "Recurring"));
This gives you the same result. It relies on the "trick" that calling Any() on a list with only one item really just evaluates the one item.
While technically correct, I find this version unnecessarily complicated, less readable, and more prone to developer misinterpretation.
Replace .Take(1).Where() with FirstOrDefault()
TestPart item = context.TestPart.OrderByDescending(tp => tp.Id)
    .FirstOrDefault(tp => tp.TestId == 100 &&
                          tp.Status == 1 &&
                          tp.Jobs.All(j => j.Type == "Recurring"));
int result = item.Id;
I think the appropriate thing to do is break it into steps. I do love a big LINQ statement like the next guy, but only when it elegantly represents the required logic. In this case you're to get a record, check its status, and return its ID, so why not express that in ROC?
var lastPart = context.TestPart.OrderByDescending(tp => tp.Id)
    .First();
bool match = (lastPart.TestId == 100 &&
              lastPart.Status == 1 &&
              lastPart.Jobs.All(j => j.Type == "Recurring"));
if (match) return lastPart.Id;
Relevant: Writing ROC (Really Obvious Code).

Find a combination of two elements that have not been viewed together (LINQ, SQL or C#)

I have a page that displays two objects and then the user picks one of these. I record the preference and the combination in a MSSQL database and end up storing data like this:
UserId=1, BetterObjectId=1, WorseObjectId=2
Now I would like to avoid showing that combination of objects (1,2 / 2,1) ever again.
So how do I generate random combinations to show the user excluding previously viewed combinations?
This seems like it should be a really straightforward question but like most programmers I'm short on sleep and coffee so your help is much appreciated :-)
The very naive approach is something like this (and all calls to this function would have to be wrapped in a check to see if the user has already rated as many times as nCr where n is the item count and r is 2):
public List<Item> GetTwoRandomItems(int userId)
{
    Item i = null, i2 = null;
    List<Item> r = null;
    while (i == null || i2 == null)
    {
        r = GetTwoRandomItemsRaw();
        i = r[0];
        i2 = r[1];
        if (GetRating(i.Id, i2.Id, userId) != null) /* Checks if viewed */
        {
            i = null;
            i2 = null;
        }
    }
    return r;
}
private List<Item> GetTwoRandomItemsRaw()
{
    return Items.ToList().OrderBy(i => Guid.NewGuid()).Take(2).ToList();
}
Edits
Using some SQL I can generate a list of all items that aren't complete (i.e. there is a combination involving the item that the user hasn't seen), but I don't think that is particularly useful.
I can also imagine generating every possible combination and eliminating already viewed ones before picking 2 random items, but this is another terrible solution.
A possibility (memory intensive for large n) is to generate all possible combinations and store the combinationId in the rating. Then I can just do a SELECT of all combinations WHERE combinationId IS NOT IN (SELECT combinationId FROM ratings WHERE userId=x) with some changes to reflect the symmetric relationship of combinations.
Table Item: ItemId
Table Rating: UserId, ItemId1, ItemId2, WinnerId
If you require that ItemId1 < ItemId2 in the Rating table, you only have to check the Rating table once.
var pair = db.Items
    .SelectMany(i1 => db.Items, (i1, i2) => new { i1, i2 }) //produce all pairs (cross join; a Join on ItemId would only pair each item with itself)
    .Where(x => x.i1.ItemId < x.i2.ItemId) //filter diagonal to unique pairs
    .Where(x =>
        !db.Ratings
            .Where(r => r.UserId == userId
                     && r.ItemId1 == x.i1.ItemId
                     && r.ItemId2 == x.i2.ItemId)
            .Any() //not any ratings for this user and pair
    )
    .OrderBy(x => db.GetNewId()) //in-database random ordering
    .First(); // just give me the first one
return new List<Item>() { pair.i1, pair.i2 };
Here's a blog about getting "random" translated into the database.
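For a small item list, the same "store each pair with the lower id first" idea can be tried in memory. This is a minimal sketch under assumptions, not a replacement for the in-database query: `PairPicker`, `Normalize` and `PickUnseen` are hypothetical names, item ids are assumed to be distinct, and all pairs are materialized, which is memory intensive for large n.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairPicker
{
    // Normalises a pair so that (2, 1) and (1, 2) compare equal.
    public static (int Low, int High) Normalize(int a, int b) => a < b ? (a, b) : (b, a);

    // Returns a random pair of item ids the user has not rated yet,
    // or null when every pair has already been seen.
    public static (int Low, int High)? PickUnseen(
        IReadOnlyList<int> itemIds, IEnumerable<(int, int)> ratedPairs, Random rng)
    {
        var seen = new HashSet<(int, int)>(
            ratedPairs.Select(p => Normalize(p.Item1, p.Item2)));

        // Enumerate each unordered pair exactly once (j always after i).
        var unseen = new List<(int, int)>();
        for (int i = 0; i < itemIds.Count; i++)
        {
            for (int j = i + 1; j < itemIds.Count; j++)
            {
                var pair = Normalize(itemIds[i], itemIds[j]);
                if (!seen.Contains(pair)) unseen.Add(pair);
            }
        }

        if (unseen.Count == 0) return null;
        return unseen[rng.Next(unseen.Count)];
    }
}
```

A null result plays the role of the nCr check mentioned in the question: it signals that the user has rated every combination, so there is no retry loop that could spin forever.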
One solution is this:
SELECT TOP 1 i.id item1, i2.id item2 FROM item i, item i2
WHERE i.id <> i2.id
  AND (SELECT COUNT(*) FROM Rating WHERE userId=@userId AND FK_ItemBetter=i.id AND FK_ItemWorse=i2.id) = 0
  AND (SELECT COUNT(*) FROM Rating WHERE userId=@userId AND FK_ItemBetter=i2.id AND FK_ItemWorse=i.id) = 0
ORDER BY NEWID()
I wasn't aware of the cross join method of just listing multiple FROM tables before.
Assuming that the list of available items is in the database, I would handle this problem entirely in the database. You are hitting the database already, no matter what, so why not get it done there?
What about putting all the objects in a queue or a stack, and then pop 2 and 2 off until they are empty?
