Remove duplicate rows using linq query - c#

I am not the greatest with LINQ, but it is the language of choice. I'm trying to write the query in a SQL-like way. Standard scenario: I have an invoice, and that invoice has invoice details. When joining the tables together, the invoices that have multiple details are of course going to repeat. In standard SQL I could use DISTINCT or GROUP BY. I've tried to do the same with LINQ, but I either get errors or the duplicates are not filtered out.
Here is my query:
var result = (from invoice in invoices
join invoiceItem in invItems on invoice.Id equals invoiceItem.InvoiceId
orderby invoice.InvoiceNo
select new InvoiceReceiveShipmentVM
{
dtInvoiced = invoice.dtInvoiced,
InvoiceNumber = invoice.InvoiceNo,
InvoiceType = invoice.InvoiceType,
InvoiceStatus = invoice.InvoiceStatus,
Lines = invoiceItem.Line,
Total = invoice.Total,
Carrier = invoice.Carrier,
});
return result.Distinct();
I've also tried:
var myList = result.GroupBy(x => x.InvoiceNumber)
.Select(g => g.First()).ToList();
return myList.Skip(fetch.Skip).Take(fetch.Take).AsQueryable();

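To use Distinct(), override Equals and GetHashCode in InvoiceReceiveShipmentVM so that two instances with the same field values compare as equal. Keep in mind that Lines comes from the invoice detail row: if it differs between the details of one invoice, those rows are still not equal, so leave it out of both Equals and GetHashCode if you want a single row per invoice.
public class InvoiceReceiveShipmentVM
{
// ... properties from the query (dtInvoiced, InvoiceNumber, InvoiceType, InvoiceStatus, Lines, Total, Carrier) ...
public override bool Equals(object obj)
{
if (!(obj is InvoiceReceiveShipmentVM invoice)) return false;
return invoice.InvoiceNumber == InvoiceNumber
&& invoice.InvoiceType == InvoiceType
&& invoice.InvoiceStatus == InvoiceStatus
&& invoice.Lines == Lines
&& invoice.Total == Total
&& invoice.Carrier == Carrier;
}
public override int GetHashCode()
{
// Hash the same fields that Equals compares. HashCode.Combine (System namespace, .NET Core 2.1+)
// also copes with null members, which calling GetHashCode() on each member does not.
return HashCode.Combine(InvoiceNumber, InvoiceType, InvoiceStatus, Lines, Total, Carrier);
}
}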
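A minimal in-memory sketch of the idea, with InvoiceRow and InvoiceDedup as hypothetical names: the row type deliberately keeps only invoice-level fields, so two detail rows of the same invoice become equal and Distinct() can collapse them. It assumes LINQ to Objects and .NET Core 2.1 or later for HashCode.Combine, and it also includes the GroupBy/First variant from the question for comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down row type holding only invoice-level fields (no per-detail Line).
public class InvoiceRow
{
    public string InvoiceNumber { get; set; }
    public decimal Total { get; set; }
    public string Carrier { get; set; }

    // Two rows are equal when every invoice-level field matches.
    public override bool Equals(object obj) =>
        obj is InvoiceRow other
        && other.InvoiceNumber == InvoiceNumber
        && other.Total == Total
        && other.Carrier == Carrier;

    // Must hash the same fields that Equals compares; HashCode.Combine tolerates nulls.
    public override int GetHashCode() => HashCode.Combine(InvoiceNumber, Total, Carrier);
}

public static class InvoiceDedup
{
    // Distinct() relies on the Equals/GetHashCode overrides above (in-memory LINQ to Objects).
    public static List<InvoiceRow> ByValue(IEnumerable<InvoiceRow> rows) =>
        rows.Distinct().ToList();

    // Alternative that needs no overrides: keep the first row seen for each invoice number.
    public static List<InvoiceRow> ByInvoiceNumber(IEnumerable<InvoiceRow> rows) =>
        rows.GroupBy(r => r.InvoiceNumber).Select(g => g.First()).ToList();
}
```

If Line were added back into Equals and GetHashCode, the two "A-1" rows in a test like the one below would only collapse when their Line values happen to match.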

Union Lists using IEqualityComparer

I've got two lists of my class Nomen:
var N1 = new List<Nomen>();
var N2 = new List<Nomen>();
public class Nomen
{
public string Id;
public string NomenCode;
...
public string ProducerName;
public decimal? minPrice;
}
I need to join them. I used to do it like this:
result = N2.Union(N1, new NomenComparer()).ToList();
class NomenComparer : IEqualityComparer<Nomen>
{
public bool Equals(Nomen x, Nomen y)
{
return x.Equals(y);
}
public int GetHashCode(Nomen nomen)
{
return nomen.GetHashCode();
}
}
public override int GetHashCode()
{
return (Id + NomenCode + ProducerName).GetHashCode();
}
public bool Equals(Nomen n)
{
if (!String.IsNullOrEmpty(Id) && Id == n.Id) return true;
return (NomenCode == n.NomenCode && ProducerName == n.ProducerName);
}
As you can see, if the Ids are equal, or NomenCode and ProducerName are both equal, it's the same Nomen for me.
Now my task has changed: when two items are equal, I need to take the one with the lower minPrice. Please help me solve this problem.
I tried to do the same with LINQ, but failed:
var groups = (from n1 in N1
join n2 in N2
on new { n1.Id, n1.NomenCode, n1.ProducerName } equals new { n2.Id, n2.NomenCode, n2.ProducerName }
group new { n1, n2 } by new { n1.Id, n1.NomenCode, n1.ProducerName } into q
select new Nomen()
{
NomenCode = q.Key.NomenCode,
ProducerName = q.Key.ProducerName,
minPrice = q.Min(item => item.minPrice)
}).ToList();
Mostly because I need to join the lists on Id OR {NomenCode, ProducerName}, and I don't know how to do that.
How about Concat, GroupBy and then Select again? For example (less untested than before):
var nomens = N1.Concat(N2)
.GroupBy(n => n, new NomenComparer())
// Aggregate without a seed starts from each group's first item; a null minPrice counts as the highest price
.Select(group => group.Aggregate((min, n) => (n.minPrice ?? Decimal.MaxValue) < (min.minPrice ?? Decimal.MaxValue) ? n : min));
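One caveat about the GroupBy above: NomenComparer delegates to Nomen.GetHashCode, which hashes Id + NomenCode + ProducerName, while Nomen.Equals also treats two items as equal when only their Ids match. Such items usually land in different hash buckets, so Equals is never called for them and they are not grouped. A comparer-free sketch, assuming LINQ to Objects, instead looks each item up by Id and by (NomenCode, ProducerName) in two dictionaries and keeps the cheaper one. NomenMerge and MergeKeepCheapest are hypothetical names, and the class is trimmed to the fields used.

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copy of the question's Nomen class (the elided fields are left out).
public class Nomen
{
    public string Id;
    public string NomenCode;
    public string ProducerName;
    public decimal? minPrice;
}

public static class NomenMerge
{
    // Merges both lists, treating two items as the same Nomen when their non-empty Ids
    // match OR their (NomenCode, ProducerName) pairs match, and keeps the lower minPrice.
    public static List<Nomen> MergeKeepCheapest(IEnumerable<Nomen> first, IEnumerable<Nomen> second)
    {
        var result = new List<Nomen>();
        var slotById = new Dictionary<string, int>();             // Id -> index in result
        var slotByKey = new Dictionary<(string, string), int>();  // (NomenCode, ProducerName) -> index

        // Points both lookup keys of an item at its slot in result.
        void Register(Nomen item, int slot)
        {
            if (!string.IsNullOrEmpty(item.Id)) slotById[item.Id] = slot;
            slotByKey[(item.NomenCode, item.ProducerName)] = slot;
        }

        foreach (var n in first.Concat(second))
        {
            int slot = -1;
            bool seen = (!string.IsNullOrEmpty(n.Id) && slotById.TryGetValue(n.Id, out slot))
                        || slotByKey.TryGetValue((n.NomenCode, n.ProducerName), out slot);

            if (!seen)
            {
                result.Add(n);
                Register(n, result.Count - 1);
            }
            else if ((n.minPrice ?? decimal.MaxValue) < (result[slot].minPrice ?? decimal.MaxValue))
            {
                // Cheaper duplicate: it replaces the stored item and its keys now point to the same slot.
                result[slot] = n;
                Register(n, slot);
            }
        }
        return result;
    }
}
```

It keeps first-seen order and treats a null minPrice as the highest price. It only follows a match through the item that is kept in each slot, so entries that are equal only through a chain of matches (A to B by Id, B to C by NomenCode and ProducerName) may stay separate; a full solution would have to decide how to handle that.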
LINQ joins with OR conditions have been answered in this Stack Overflow post:
Linq - left join on multiple (OR) conditions
In short, as Jon Skeet explains in that post, you should do something like:
from a in tablea
from b in tableb
where a.col1 == b.col1 || a.col2 == b.col2
select ...
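Applied to the Nomen lists, the pattern could look like the following sketch (LINQ to Objects assumed; NomenOrJoin and MatchingPairs are hypothetical names). It only finds the matching pairs; choosing the cheaper item of each pair would be a further Select.

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copy of the question's Nomen class.
public class Nomen
{
    public string Id;
    public string NomenCode;
    public string ProducerName;
    public decimal? minPrice;
}

public static class NomenOrJoin
{
    // A join clause only supports an equality on one key, so the OR condition is written
    // as a cross join (two from clauses) filtered by where. In LINQ to Objects this
    // compares every pair, i.e. |n1| * |n2| comparisons.
    public static List<(Nomen First, Nomen Second)> MatchingPairs(IEnumerable<Nomen> n1, IEnumerable<Nomen> n2) =>
        (from a in n1
         from b in n2
         where (!string.IsNullOrEmpty(a.Id) && a.Id == b.Id)
            || (a.NomenCode == b.NomenCode && a.ProducerName == b.ProducerName)
         select (First: a, Second: b)).ToList();
}
```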

What is the appropriate LINQ query to this specific case?

Given the following two classes:
public class Apple
{
public int Id { get; set; }
public string Name { get; set; }
}
public class Worm
{
public int AppleId { get; set; }
public int WormType { get; set; }
public int HungerValue { get; set; }
}
Every Worm instance has an AppleId equal to the Id of a randomly chosen existing Apple.
public void DoLINQ(List<Apple> apples, List<Worm> worms, string targetAppleName, List<int> wormTypes )
{
// Write LINQ Query here
}
How can we write a LINQ query which
finds all the elements in 'apples' whose 'Name' matches 'targetAppleName'
AND
(does not "contain" any worm whose WormType is in wormTypes
OR
only contains worms with a HungerValue equal to 500)?
Note that an instance of Apple does not actually 'contain' any elements of Worm, since the relation is the other way around. This is also what complicates things and why it is more difficult to figure out.
--Update 1--
My attempt, which returns the same apple (same Id) multiple times:
var query =
from a in apples
join w in worms
on a.Id equals w.AppleId
where (a.Name == targetAppleName) && (!wormTypes.Any(p => p == w.WormType) || w.HungerValue == 500)
select a;
--Update 2--
This is closer to a solution. Here we use two queries and then merge the results:
var query =
from a in apples
join w in worms
on a.Id equals w.AppleId
where (a.Name == targetAppleName) && !wormTypes.Any(p => p == w.WormType)
group a by a.Id into q
select q;
var query2 =
from a in apples
join w in worms
on a.Id equals w.AppleId
where (a.Name == targetAppleName) && wormTypes.Any(p => p == w.WormType) && w.HungerValue == 500
group a by a.Id into q
select q;
var merged = query.Concat(query2).Distinct();
--Update 3--
For the input we expect the LINQ query to use the parameters in the method, and those only.
For the output we want all apples which satisfy the condition described above.
You can use a let construct to find the worms of a given apple if you want to use query syntax:
var q =
from a in apples
let ws = from w in worms where w.AppleId == a.Id select w
where
(ws.All(w => w.HungerValue == 500)
|| ws.All(w => !wormTypes.Any(wt => wt == w.WormType)))
&& a.Name == targetAppleName
select a;
In method chain syntax this is equivalent to introducing an intermediary anonymous object using Select:
var q =
apples.Select(a => new {a, ws = worms.Where(w => w.AppleId == a.Id)})
.Where(t => (t.ws.All(w => w.HungerValue == 500)
|| t.ws.All(w => wormTypes.All(wt => wt != w.WormType)))
&& t.a.Name == targetAppleName).Select(t => t.a);
I wouldn't exactly call this more readable, though :-)
var result = apples.Where(apple =>
{
var wormsInApple = worms.Where(worm => worm.AppleId == apple.Id);
return apple.Name == targetAppleName
&& (wormsInApple.Any(worm => wormTypes.Contains(worm.WormType)) == false
|| wormsInApple.All(worm => worm.HungerValue == 500));
});
For each apple, build the collection of worms in that apple. Return only the apples that match the required name AND (contain no worms whose WormType is in wormTypes OR only contain worms with a HungerValue of 500).
You were very close in your first attempt. But instead of a Join, which multiplies the apples, you really need GroupJoin, which "Correlates the elements of two sequences based on key equality and groups the results". In query syntax it's represented by the join ... into clause.
var query =
from apple in apples
join worm in worms on apple.Id equals worm.AppleId into appleWorms
where apple.Name == targetAppleName
&& (!appleWorms.Any(worm => wormTypes.Contains(worm.WormType))
|| appleWorms.All(worm => worm.HungerValue == 500))
select apple;
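For reference, the same GroupJoin written in method syntax, as a self-contained LINQ to Objects sketch. AppleQueries and FindApples are hypothetical names; the method returns the matching apples instead of running the query inside DoLINQ.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Apple
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Worm
{
    public int AppleId { get; set; }
    public int WormType { get; set; }
    public int HungerValue { get; set; }
}

public static class AppleQueries
{
    // Method-syntax equivalent of "join ... into appleWorms": GroupJoin yields each apple
    // exactly once, paired with its (possibly empty) sequence of worms.
    public static List<Apple> FindApples(IEnumerable<Apple> apples, IEnumerable<Worm> worms,
                                         string targetAppleName, IEnumerable<int> wormTypes) =>
        apples
            .GroupJoin(worms, apple => apple.Id, worm => worm.AppleId,
                       (apple, appleWorms) => new { apple, appleWorms })
            .Where(x => x.apple.Name == targetAppleName
                        && (!x.appleWorms.Any(w => wormTypes.Contains(w.WormType))
                            || x.appleWorms.All(w => w.HungerValue == 500)))
            .Select(x => x.apple)
            .ToList();
}
```

An apple with no worms passes the filter, because none of its worms has a listed WormType.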
Using lambdas, it would look like this:
var result = apples.Where(a =>
a.Name == targetAppleName &&
(worms.Any(w => w.AppleId == a.Id && w.HungerValue >= 500) ||
worms.All(w => w.AppleId != a.Id)));
I think the lambda makes the code a bit cleaner and easier to read. I'd also guess that using .Any() and .All() is cheaper than a full-on join, but I haven't tested it with any heavy data, so it's hard to speak with authority here (plus, there can't be that many apples...!). Note that in LINQ to Objects each Any() or All() call can scan the entire worms list for every apple.
BTW, this is the entire body of code. Kind of surprised it doesn't work for you. Maybe you missed something...?
public class Apple
{
public int Id { get; set; }
public string Name { get; set; }
}
public class Worm
{
public int AppleId { get; set; }
public int WormType { get; set; }
public int HungerValue { get; set; }
}
void Main()
{
var apples = Enumerable.Range(1, 9).Select(e => new Apple { Id = e, Name = "Apple_" + e}).ToList();
var worms = Enumerable.Range(1, 9).SelectMany(a =>
Enumerable.Range(1, 5).Select((e, i) => new Worm { AppleId = a, WormType = e, HungerValue = i %2 == 0 ? a * e * 20 : 100 })).ToList();
DoLINQ(apples, worms, "Apple_4", new[] {4, 5});
}
public void DoLINQ(IList<Apple> apples, IList<Worm> worms, string targetAppleName, IList<int> wormTypes)
{
// Write LINQ Query here
var result = apples.Where(a =>
a.Name == targetAppleName &&
(worms.All(w => w.AppleId != a.Id) || worms.Any(w => w.AppleId == a.Id && w.HungerValue >= 500)));
result.Dump(); // comment this out if you're not using LINQPad
apples.Dump(); // comment this out if you're not using LINQPad
worms.Dump(); // comment this out if you're not using LINQPad
}
I have modified your query but haven't tested it yet; have a look and try it. Hopefully it will solve your problem.
var query =
from a in apples
join w in worms
on a.Id equals w.AppleId into pt
from w in pt.DefaultIfEmpty()
where (a.Name == targetAppleName) && (w == null || !wormTypes.Any(p => p == w.WormType) || (w.HungerValue == 500))
select a;
Thanks.

Optimize LINQ instead of creating new collections/loops

I have two tables:
Invoices (InvoiceID, InvoiceNumber)
Invoices_Products (InvoiceID, ProductID, IsFinalized)
I show a list of all invoices, and there are buttons to filter by "finalized" or "not finalized" invoices. A finalized invoice is one where every product on it is IsFinalized==true.
At the moment I have the following code which is performing quite slowly:
IEnumerable<Invoice> invoices = db.Invoices;
if (isFinalized) // filter by finalized invoices
{
List<Invoice> unfinalizedInvoices = new List<Invoice>();
foreach (var invoice in invoices)
{
int invoicesProductsCountTotal = db.Invoices_Products.Where(l => l.InvoiceID == invoice.InvoiceID).Count();
int invoicesProductsCountFinalized = db.Invoices_Products.Where(l => l.InvoiceID == invoice.InvoiceID && l.IsFinalized == true).Count();
if (invoicesProductsCountTotal != invoicesProductsCountFinalized)
{
unfinalizedInvoices.Add(invoice);
}
}
invoices = invoices.Except(unfinalizedInvoices);
}
else
{
List<Invoice> finalizedInvoices = new List<Invoice>();
foreach (var invoice in invoices)
{
int invoicesProductsCountTotal = db.Invoices_Products.Where(l => l.InvoiceID == invoice.InvoiceID).Count();
int invoicesProductsCountFinalized = db.Invoices_Products.Where(l => l.InvoiceID == invoice.InvoiceID && l.IsFinalized == true).Count();
if (invoicesProductsCountTotal == invoicesProductsCountFinalized && invoicesProductsCountFinalized > 0)
{
finalizedInvoices.Add(invoice);
}
}
invoices = invoices.Except(finalizedInvoices);
}
I realize this isn't optimal, but I like spreading out my LINQ so that I can read and understand it. My question: is there any way I could make this query faster using .All or .Any or something similar, or do I need to rethink my database design (possibly adding an extra column to the Invoices table)?
Edit: the third table is Products (ProductID, ProductNumber), but you knew that.
At the moment you're loading all your invoices and then querying the products of each invoice separately. This is bound to be slow (and it will become a lot slower as you add more invoices).
You should create a many-to-many relationship in Entity Framework (see example).
Your classes would look like this:
class Invoice
{
public List<Product> Products {get; set;}
}
class Product
{
public bool IsFinalized {get; set;}
}
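Now you can use LINQ so that only a single SQL statement is executed, fetching only the data you want:
var invoices = isFinalized
? db.Invoices.Where(i => i.Products.All(p => p.IsFinalized))
// "not finalized" means at least one product is not finalized, not that none of them are
: db.Invoices.Where(i => i.Products.Any(p => !p.IsFinalized));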
Iterating over each invoice and then making additional requests to the database will be very slow. Let your query fetch all the information at once and iterate through the results instead.
var result = from invoice in db.Invoices
join invoicedProduct in db.Invoices_Products
on invoice.InvoiceID equals invoicedProduct.InvoiceID
select new
{
InvoiceId = invoice.InvoiceID,
ProductId = invoicedProduct.ProductID,
IsFinalized = invoicedProduct.IsFinalized
};
var grpResult = from record in result
group record by record.InvoiceId into invoiceGrp
select invoiceGrp;
foreach( var grp in grpResult )
{
Console.WriteLine( "InvoiceId: " + grp.Key.ToString( ) );
Console.WriteLine( "TotalCount: " + grp.Count( ).ToString( ) );
Console.WriteLine( "Finalized: " + grp.Count( item => item.IsFinalized ).ToString( ) );
}
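The per-invoice idea can be tried out in memory before touching the database. In this sketch the two record types (C# 9) are hypothetical stand-ins for the tables, InvoiceFilters and Filter are hypothetical names, and LINQ to Objects is assumed. Like the question's else branch ("Total == Finalized && Finalized > 0"), it counts an invoice with no products as not finalized. Against Entity Framework, the navigation-property queries in the other answers are the better fit; this version is only meant for in-memory data.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Invoices and Invoices_Products tables.
public record Invoice(int InvoiceID, string InvoiceNumber);
public record InvoiceProduct(int InvoiceID, int ProductID, bool IsFinalized);

public static class InvoiceFilters
{
    // GroupJoin collects each invoice's product rows in one pass, then each invoice is
    // classified once: finalized = at least one product row and every row IsFinalized.
    public static List<Invoice> Filter(IEnumerable<Invoice> invoices,
                                       IEnumerable<InvoiceProduct> rows, bool isFinalized) =>
        invoices
            .GroupJoin(rows, i => i.InvoiceID, r => r.InvoiceID,
                       (invoice, products) => new { invoice, products })
            .Where(x => (x.products.Any() && x.products.All(p => p.IsFinalized)) == isFinalized)
            .Select(x => x.invoice)
            .ToList();
}
```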
if (isFinalized)
{
invoices = invoices.Where(l => l.Invoices_Products.All(m => m.IsFinalized == true));
}
else
{
List<Invoice> finalizedInvoices = invoices.Where(l => l.Invoices_Products.All(m => m.IsFinalized == true)).ToList();
invoices = invoices.Except(finalizedInvoices);
}
The code above seems to have improved performance dramatically. Oh well, thanks for listening.

How to perform this kind of Distinct operation with LINQ?

I have the following foreach loop:
List<WorkingJournal> workingJournals = new List<WorkingJournal>();
foreach (WorkRoster workRoster in workRosters)
{
bool exists = workingJournals.Any(workingJournal => workingJournal.ServicePlan.Id == workRoster.ServicePlan.Id
&& workingJournal.Nurse.Id == workRoster.Nurse.Id
&& workingJournal.Month == workRoster.Start.Month
&& workingJournal.Year == workRoster.Start.Year);
if (exists == false)
{
WorkingJournal workingJournal = new WorkingJournal
{
ServicePlan = workRoster.ServicePlan,
Nurse = workRoster.Nurse,
Month = workRoster.Start.Month,
Year = workRoster.Start.Year
};
workingJournals.Add(workingJournal);
}
}
I started writing:
from workRoster in workRosters
select new WorkingJournal
{
ServicePlan = workRoster.ServicePlan,
Nurse = workRoster.Nurse,
Month = workRoster.Start.Month,
Year = workRoster.Start.Year
};
But now I am stuck with the comparison that produces distinct WorkingJournals.
I have a feeling that a group by clause should be here but I'm not sure how it should be done.
Assuming LINQ to Objects:
(from workRoster in workRosters
select new WorkingJournal
{
ServicePlan = workRoster.ServicePlan,
Nurse = workRoster.Nurse,
Month = workRoster.Start.Month,
Year = workRoster.Start.Year
}).Distinct();
Note that for this to work, WorkingJournal needs its own Equals and GetHashCode implementations. If it doesn't have them, see Anthony's answer to this question (How to perform this kind of Distinct operation with LINQ?).
If it's LINQ to SQL you could group by the new expression, then select the group key:
from workRoster in workRosters
group workRoster by new WorkingJournal
{
ServicePlan = workRoster.ServicePlan,
Nurse = workRoster.Nurse,
Month = workRoster.Start.Month,
Year = workRoster.Start.Year
} into workRosterGroup
select workRosterGroup.Key;
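For LINQ to Objects, the same grouping idea can use an anonymous-type key, since anonymous types compare by value and need no Equals override. The classes below are minimal hypothetical stand-ins containing only the members the question's loop uses, and Journals and FromRosters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal hypothetical stand-ins for the question's types.
public class ServicePlan { public int Id { get; set; } }
public class Nurse { public int Id { get; set; } }
public class WorkRoster
{
    public ServicePlan ServicePlan { get; set; }
    public Nurse Nurse { get; set; }
    public DateTime Start { get; set; }
}
public class WorkingJournal
{
    public ServicePlan ServicePlan { get; set; }
    public Nurse Nurse { get; set; }
    public int Month { get; set; }
    public int Year { get; set; }
}

public static class Journals
{
    // Group by the same four values the foreach loop compares. GroupBy yields groups in
    // order of first occurrence, and taking each group's first roster mirrors the loop,
    // which keeps the first roster it sees for each combination.
    public static List<WorkingJournal> FromRosters(IEnumerable<WorkRoster> workRosters) =>
        workRosters
            .GroupBy(r => new { PlanId = r.ServicePlan.Id, NurseId = r.Nurse.Id, r.Start.Month, r.Start.Year })
            .Select(g => g.First())
            .Select(r => new WorkingJournal
            {
                ServicePlan = r.ServicePlan,
                Nurse = r.Nurse,
                Month = r.Start.Month,
                Year = r.Start.Year
            })
            .ToList();
}
```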
If you have proper Equals and GetHashCode implementations inside your class, you can simply invoke Distinct().
var result = workRosters.Select(...).Distinct();
If you don't have such implementations, you can define an IEqualityComparer<WorkingJournal> implementation. It has you define Equals and GetHashCode for WorkingJournal, and it can then be used by a Dictionary or HashSet as well as by the overload of Distinct() that takes a comparer.
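class JournalComparer : IEqualityComparer<WorkingJournal>
{
public bool Equals(WorkingJournal left, WorkingJournal right)
{
// same service plan, nurse, month and year => same journal (mirrors the foreach loop's test)
if (ReferenceEquals(left, right)) return true;
if (left is null || right is null) return false;
return left.ServicePlan.Id == right.ServicePlan.Id
&& left.Nurse.Id == right.Nurse.Id
&& left.Month == right.Month
&& left.Year == right.Year;
}
public int GetHashCode(WorkingJournal obj)
{
// hash exactly the fields Equals compares (HashCode is in System, .NET Core 2.1+)
return HashCode.Combine(obj.ServicePlan.Id, obj.Nurse.Id, obj.Month, obj.Year);
}
}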
var comparer = new JournalComparer(); // implements the interface
var result = workRosters.Select(r => new WorkingJournal { ... }).Distinct(comparer);

dynamic join based on where expression - linq/c#

I have a stored procedure which builds a dynamic SQL query based on my input parameters. I tried replicating it in LINQ, but somehow it seems incorrect.
My LINQ:
var result = from R in db.Committees.Where(committeeWhere)
join C in db.Employees.Where(employeeWhere) on R.PID equals C.PID
join K in db.CommitteeTypes.Where(committeesWhere) on R.PID equals K.PID
select new { R };
The three input parameters I have are:
1. Committee ID and/or
2. Employee ID and/or
3. Committee Type ID
Based on these, I want to be able to decide which joins my LINQ query makes.
Note: I had to change the table and column names, so please don't give any thought to the names.
SQL snippet:
IF @committeeID is not null
set @wherestr = @wherestr + 'Committees.committeeID like ' + @committeeID + @andstr
-- ...
IF len(@wherestr) > 6
SELECT @qrystr = @selectstr + @fromstr + left(@wherestr, len(@wherestr)-3) + ' ORDER BY Committees.committeeID DESC'
EXEC (@qrystr)
Why do you need to use dynamic SQL? Wouldn't this work?
IQueryable<Committee> GetCommittees(int? committeeID, int? employeeID, int? committeeTypeID)
{
var result = from R in db.Committees.Where(c => committeeID == null || committeeID == c.ID)
join C in db.Employees.Where(e => employeeID == null || employeeID == e.ID)
on R.PID equals C.PID
join K in db.CommitteeTypes.Where(c => committeeTypeID == null || committeeTypeID == c.ID)
on R.PID equals K.PID
select R;
return result;
}
If that won't work, you can use different predicate expressions depending on your parameters:
Expression<Func<Committee, bool>> committeeWhere;
if(committeeID.HasValue)
{
int id = committeeID.Value;
committeeWhere = c => c.ID == id;
}
else
{
committeeWhere = c => true;
}
// etc
Update: Seeing your last comment, maybe you want something like this:
IQueryable<Committee> GetCommittees(int? committeeID, int? employeeID, int? committeeTypeID)
{
var result = db.Committees.Select(c => c);
if(committeeID.HasValue)
{
result = result.Where(c => c.ID == committeeID);
}
else if(employeeID.HasValue)
{
result = from R in result
join C in db.Employees.Where(e => employeeID == e.ID)
on R.PID equals C.PID
select R;
}
else if(committeeTypeID.HasValue)
{
result = from R in result
join K in db.CommitteeTypes.Where(ct => committeeTypeID == ct.ID)
on R.PID equals K.PID
select R;
}
return result;
}
If I may improve upon dahlbyk's answer: sometimes joining introduces duplicates. If you really intend to filter, then filter. Also, if you add the relationships in the LINQ to SQL designer, you'll get properties (such as Committee.Employees) that are translated into SQL for you.
IQueryable<Committee> GetCommittees(int? committeeID, int? employeeID, int? committeeTypeID){
IQueryable<Committee> result = db.Committees.AsQueryable();
if(committeeID.HasValue)
{
result = result.Where(c => c.ID == committeeID);
}
if(employeeID.HasValue)
{
result = result
.Where(committee => committee.Employees
.Any(e => employeeID == e.ID)
);
}
if(committeeTypeID.HasValue)
{
result = result
.Where(committee => committee.CommitteeTypes
.Any(ct => committeeTypeID == ct.ID)
);
}
return result;
}
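The filter-composition approach can be exercised without a database by running it over an in-memory IQueryable. In this sketch the entity classes, the extra committees parameter and the CommitteeSearch name are assumptions for illustration; with LINQ to SQL, Employees and CommitteeTypes would be the designer-generated association properties, and the exact SQL produced depends on the provider. OrderByDescending mirrors the SP's ORDER BY committeeID DESC.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory entities standing in for the mapped tables.
public class Employee { public int ID { get; set; } }
public class CommitteeType { public int ID { get; set; } }
public class Committee
{
    public int ID { get; set; }
    public List<Employee> Employees { get; set; } = new List<Employee>();
    public List<CommitteeType> CommitteeTypes { get; set; } = new List<CommitteeType>();
}

public static class CommitteeSearch
{
    // Each supplied parameter adds one more Where; null parameters add nothing, which
    // replaces the SP's "IF @x is not null" string building. Any() filters without the
    // duplicate rows a join could introduce.
    public static IQueryable<Committee> GetCommittees(IQueryable<Committee> committees,
        int? committeeID, int? employeeID, int? committeeTypeID)
    {
        var result = committees;
        if (committeeID.HasValue)
            result = result.Where(c => c.ID == committeeID.Value);
        if (employeeID.HasValue)
            result = result.Where(c => c.Employees.Any(e => e.ID == employeeID.Value));
        if (committeeTypeID.HasValue)
            result = result.Where(c => c.CommitteeTypes.Any(ct => ct.ID == committeeTypeID.Value));
        return result.OrderByDescending(c => c.ID);
    }
}
```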
