Another Q about Linq grouping - c#

I am using LINQ (together with EF) to access my database. I have a "Job" object that contains several properties, some of which are "complex". My goal is to group jobs by these properties and get a count for each group.
Here are my objects (simplified):
public class Job
{
    [Key]
    public int Id { get; set; }

    [Required]
    public Salary Salary { get; set; }

    [Required]
    public ICollection<Category> Categories { get; set; }
}
"Category" is a complex class, and looks like this:
public class Category
{
    [Key]
    public int Id { get; set; }

    public Industry Industry { get; set; } // Example: Software

    public Field Field { get; set; } // Example: .NET

    public Position Position { get; set; } // Example: Developer
}
The Industry, Field, Position and Salary classes each contain just an "int" id and a "string" name.
I need to group the list of Jobs by Industry, Field, Position and Salary and get a count for each group. This is how I am doing it right now:
var IndustryGroupsQuery = from t in Jobs.SelectMany(p => p.Categories)
group t by new { t.Industry} into g
select new
{
Tag = g.Key.Industry,
Count = g.Count()
};
var FieldsGroupsQuery = from t in Jobs.SelectMany(p => p.Categories)
group t by new { t.Field} into g
select new
{
Tag = g.Key.Field,
Count = g.Count()
};
var PositionsGroupsQuery = from t in Jobs.SelectMany(p => p.Categories)
group t by new { t.Position} into g
select new
{
Tag = g.Key.Position,
Count = g.Count()
};
var SalaryGroupsQuery = Jobs.GroupBy(job => job.Salary)
    .Select(group => new
    {
        Tag = group.Key,
        Count = group.Count()
    });
This works fine, but I am wondering whether its performance can be improved somehow.
Q1: I think that a single query will probably perform better than four. Is it possible to combine these queries into one?
Q2: When I ask LINQ to group by "Industry", how exactly is it able to distinguish one Industry from another? Is it implicitly comparing the records' keys? Would it be faster if I explicitly told LINQ which property to group by (e.g. "id")?
Thanks!

Answer in reverse order:
Q2:
When you group by an object instead of a primitive type, LINQ uses the default equality comparer (EqualityComparer<T>.Default, which calls the type's Equals and GetHashCode). For a class that does not override them, this is a simple reference comparison (http://msdn.microsoft.com/en-us/library/bsc2ak47(v=vs.110).aspx). If that suits, it works; otherwise you can implement a custom equality comparer (How to implement IEqualityComparer to return distinct values?)
Q1:
If you wanted sub-groups of the groups, then you can do it in a single query. If you just want the counts for each, then you are doing it exactly the right way.
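As a rough illustration of the Q2 point, here is a minimal sketch of grouping with a custom equality comparer. It assumes in-memory objects (LINQ to Objects); a database provider should not be expected to translate a custom comparer. The Industry class shown is a hypothetical stand-in, and the property names Id and Name, the comparer IndustryByIdComparer and the helper CountById are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Industry class; property names are assumed.
public class Industry
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Treats two Industry instances as equal when their Ids match.
public class IndustryByIdComparer : IEqualityComparer<Industry>
{
    public bool Equals(Industry x, Industry y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Industry obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class GroupingDemo
{
    // Counts items per industry Id, using the Id-based comparer for the group keys.
    public static Dictionary<int, int> CountById(IEnumerable<Industry> items)
    {
        return items
            .GroupBy(i => i, new IndustryByIdComparer())
            .ToDictionary(g => g.Key.Id, g => g.Count());
    }

    public static void Main()
    {
        var items = new List<Industry>
        {
            new Industry { Id = 1, Name = "Software" },
            new Industry { Id = 1, Name = "Software" }, // distinct instance, same Id
            new Industry { Id = 2, Name = "Finance" }
        };

        // Default comparer: reference equality, so every instance is its own group.
        Console.WriteLine(items.GroupBy(i => i).Count());

        // Id-based comparer: instances with the same Id share a group.
        foreach (var pair in CountById(items))
            Console.WriteLine("Industry {0}: {1}", pair.Key, pair.Value);
    }
}
```

Grouping directly by the key property (for example GroupBy(i => i.Id)) gives the same counts without a comparer class, which is usually the simpler choice when only the id matters.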

You can use a conditional GROUP BY.
You can define a variable to tell the query which column to use for grouping. You can define an ENUM for GROUP BY columns.
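int groupByCol = 1; // 1 = Industry, 2 = Field, anything else = Position
// Industry, Field and Position live on Category, not on Job, so flatten the categories first.
// The branches of ?: need a common type, so this groups by the name string
// ("Name" is an assumed property name; the question only says each class has a string name).
var GenericGroupsQuery = from t in Jobs.SelectMany(p => p.Categories)
    group t by new { GroupCol = (groupByCol == 1 ? t.Industry.Name : (groupByCol == 2 ? t.Field.Name : t.Position.Name)) } into g
    select new
    {
        Tag = g.Key.GroupCol,
        Count = g.Count()
    };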

Related

Linq: find elements with common properties but different types

I have two lists of objects:
class Class1
{
public int Id { get; set; }
public string Name { get; set; }
public Guid UniqueIdentifier { get; set; }
public bool IsValid { get; set; }
}
class Class2
{
public int Identifier { get; set; }
public string Producer{ get; set; }
public Guid Guid { get; set; }
public int SupplierId { get; set; }
}
Is there a way to use LINQ to get the elements of type Class1 from the first list that have the same Id (Identifier) and Guid as elements of type Class2 in the second list?
Here is one way to do it:
var result = list1
.Where(x => list2.Any(y => x.Id == y.Identifier && x.UniqueIdentifier == y.Guid))
.ToList();
Please note that this version is not optimized for large lists. If your lists are small, this is fine. If you have large lists, you need a solution that involves things like HashSets. Here is an example:
var list2HashSet = CreateHashset(list2.Select(x => new {x.Identifier, x.Guid}));
var result = list1
.Where(x => list2HashSet.Contains(new {Identifier = x.Id, Guid = x.UniqueIdentifier}))
.ToList();
Where CreateHashset is defined like this:
public static HashSet<T> CreateHashset<T>(IEnumerable<T> collection)
{
return new HashSet<T>(collection);
}
I had to create this simple method because C# does not infer generic type arguments for constructors, and an anonymous type has no name that could be written as the type argument. A generic method lets the compiler infer T.
You could try something like this:
var result = from item1 in list1
join item2 in list2 on
new { G = item1.UniqueIdentifier, Id = item1.Id }
equals new { G = item2.Guid, Id = item2.Identifier }
select new { item1, item2 };
foreach(var item in result)
{
Console.WriteLine($"Producer: {item.item2.Producer} with product: {item.item1.Name}");
}
Let's say you have the two lists
List<Class1> List1;
List<Class2> List2;
You can select all items of List1 containing the Id and Guid of the second list with
List1.Where(C1 => List2.Any(C2 => C1.Id == C2.Identifier && C1.UniqueIdentifier == C2.Guid));
Note that Guid is a struct, not a class, so == already compares the values rather than object references. If you prefer, you can call Equals, which gives the same result:
List1.Where(C1 => List2.Any(C2 => C1.Id == C2.Identifier && C1.UniqueIdentifier.Equals(C2.Guid)));
If it suffices for you that the ids or the guids match, see the answer from Jeroen van Langen. Otherwise, I see two options:
Add a where clause afterwards, i.e.,
var result = from item1 in list1
join item2 in list2 on item1.Id equals item2.Identifier
where item1.UniqueIdentifier == item2.Guid
select new { item1, item2 };
Create a tuple class and join on the tuples of Guid and Id. You cannot reuse the .NET 4 tuple types (Tuple<,>), but you could reuse the C# 7 tuple types as they correctly implement equality.
Both versions should also be fine with large lists. Basically, the whole thing should scale as long as you use join.
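The second option can be sketched as follows, assuming C# 7 or later and in-memory lists (tuple literals cannot appear in expression trees, so this is not meant for a database provider). Class1 and Class2 are copied from the question; the static class TupleJoinDemo and the method Matching are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Class1
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Guid UniqueIdentifier { get; set; }
    public bool IsValid { get; set; }
}

public class Class2
{
    public int Identifier { get; set; }
    public string Producer { get; set; }
    public Guid Guid { get; set; }
    public int SupplierId { get; set; }
}

public static class TupleJoinDemo
{
    // Returns the Class1 items whose (Id, Guid) pair also occurs in list2.
    // The join key is a value tuple, which compares element by element.
    public static List<Class1> Matching(IEnumerable<Class1> list1, IEnumerable<Class2> list2)
    {
        return (from item1 in list1
                join item2 in list2
                    on (Id: item1.Id, Guid: item1.UniqueIdentifier)
                    equals (Id: item2.Identifier, Guid: item2.Guid)
                select item1).ToList();
    }

    public static void Main()
    {
        var shared = Guid.NewGuid();
        var list1 = new List<Class1>
        {
            new Class1 { Id = 1, Name = "A", UniqueIdentifier = shared },
            new Class1 { Id = 2, Name = "B", UniqueIdentifier = Guid.NewGuid() }
        };
        var list2 = new List<Class2>
        {
            new Class2 { Identifier = 1, Producer = "P", Guid = shared },
            new Class2 { Identifier = 2, Producer = "Q", Guid = Guid.NewGuid() } // same id, different guid
        };

        foreach (var item in Matching(list1, list2))
            Console.WriteLine(item.Name);
    }
}
```

A join emits one result per matching pair, so a Class1 item appears more than once if list2 contains several entries with the same id and guid; add Distinct() if that matters.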

How to use LINQ to get multiple totals

Let's say I have the following data in a database.
class Data
{
public int Category { get; set; }
public int ValueA { get; set; }
public int ValueB { get; set; }
}
How can I write a LINQ query to get the sum of ValueA and also the sum of ValueB for all rows with Category == 1?
I know I could load all the data and then use Sum on the loaded data but would prefer to total them in the database.
I know I can use group by but I'm not grouping by anything. I just want these two totals from the data.
If you are using EF, you can try this:
var result= context.Data.Where(d=>d.Category == 1)
.GroupBy(d=>d.Category)
.Select(g=>new {
SumA=g.Sum(d=>d.ValueA),
SumB=g.Sum(d=>d.ValueB)
}
);
You can group by a constant
var result = from d in context.Data
where d.Category == 1
group d by 1 into g
select new
{
ASum = g.Sum(d => d.ValueA),
BSum = g.Sum(d => d.ValueB)
};
Or as octavioccl pointed out you can also group by Category since it will be a constant value because of the where clause. But using a constant is how you can achieve what you want in the general case.
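To see how grouping by a constant behaves, here is a small in-memory sketch. It assumes a plain list instead of an EF context, and the names TotalsDemo and TotalsForCategory are hypothetical. One detail it makes visible: when no row passes the filter there is no group at all, so the query yields an empty sequence rather than a row of zeros.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data
{
    public int Category { get; set; }
    public int ValueA { get; set; }
    public int ValueB { get; set; }
}

public static class TotalsDemo
{
    // Sums ValueA and ValueB for one category by grouping on a constant key.
    // Falls back to (0, 0) when no row matches, because then no group exists.
    public static (int SumA, int SumB) TotalsForCategory(IEnumerable<Data> rows, int category)
    {
        var result = rows
            .Where(d => d.Category == category)
            .GroupBy(d => 1)
            .Select(g => new
            {
                SumA = g.Sum(d => d.ValueA),
                SumB = g.Sum(d => d.ValueB)
            })
            .SingleOrDefault();

        return result == null ? (0, 0) : (result.SumA, result.SumB);
    }

    public static void Main()
    {
        var rows = new List<Data>
        {
            new Data { Category = 1, ValueA = 10, ValueB = 1 },
            new Data { Category = 1, ValueA = 20, ValueB = 2 },
            new Data { Category = 2, ValueA = 99, ValueB = 9 }
        };

        var totals = TotalsForCategory(rows, 1);
        Console.WriteLine("SumA = {0}, SumB = {1}", totals.SumA, totals.SumB);
    }
}
```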

Count the items of a List according to their status

My question is simple, but I can't find a solution in my search.
Well, I have here these Classes and Lists:
public class Table
{
public string TableName { get; set; }
public string Status { get; set; }
public List<Request> Requests { get; set; }
}
public class Request
{
public string ProductName { get; set; }
public double ProductPrice { get; set; }
public int Quantity { get; set; }
}
public List<Table> Tables = new List<Table>();
The tables can have two statuses: Unoccupied and Occupied. What I need is to count all the tables present in the Tables list and separate them by Status, but I don't know how to do this. For example, I have two tables with status occupied and three tables with status unoccupied. I need output like: You have 2 tables occupied and 3 tables unoccupied. I think I need to use a while loop, but I don't know how to count them separately.
Just use GroupBy
var results = Tables.GroupBy(t => t.Status)
.Select(g => new
{
Status = g.Key,
Count = g.Count()
});
foreach(var item in results)
{
Console.WriteLine("You have {0} tables of status {1}", item.Count, item.Status);
}
Note that this will only give you statuses that have at least one table, so if a status has none and you need that as well you can adjust to the following.
var results = Tables.GroupBy(t => t.Status)
.ToDictionary(g => g.Key, g => g.Count());
foreach(var status in new[] {"Unoccupied", "Occupied"})
{
int count;
results.TryGetValue(status, out count);
Console.WriteLine("You have {0} tables of status {1}", count, status);
}
Also you might want to consider creating an enum for the Status if it should only be one of two values.
public enum TableStatus
{
Unoccupied,
Occupied
}
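If Status were changed to this enum, the zero-count case could be handled without hard-coding the status names. The following is only a sketch under that assumption: the Table class here is a trimmed, hypothetical variant whose Status is a TableStatus rather than a string, and StatusCountDemo and CountByStatus are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum TableStatus
{
    Unoccupied,
    Occupied
}

// Hypothetical variant of Table that uses the enum instead of a string.
public class Table
{
    public string TableName { get; set; }
    public TableStatus Status { get; set; }
}

public static class StatusCountDemo
{
    // Returns a count for every enum value, including statuses with no tables.
    public static Dictionary<TableStatus, int> CountByStatus(IEnumerable<Table> tables)
    {
        var counts = tables
            .GroupBy(t => t.Status)
            .ToDictionary(g => g.Key, g => g.Count());

        return Enum.GetValues(typeof(TableStatus))
            .Cast<TableStatus>()
            .ToDictionary(s => s, s => counts.TryGetValue(s, out var c) ? c : 0);
    }

    public static void Main()
    {
        var tables = new List<Table>
        {
            new Table { TableName = "T1", Status = TableStatus.Occupied },
            new Table { TableName = "T2", Status = TableStatus.Occupied }
        };

        foreach (var pair in CountByStatus(tables))
            Console.WriteLine("You have {0} tables of status {1}", pair.Value, pair.Key);
    }
}
```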
You can use LINQ like:
List<Table> UnoccupiedTables = Tables.Where(r=> r.Status == "Unoccupied")
.ToList();
List<Table> OccupiedTables = Tables.Where(r=> r.Status == "Occupied")
.ToList();
If you want a case-insensitive comparison, you can use String.Equals like:
String.Equals(r.Status, "Occupied", StringComparison.InvariantCultureIgnoreCase)
I missed the part of getting count. You can use GroupBy as mentioned in the other answer, or you can get count for each item like:
int CountOfUnoccupiedTables = Tables.Count(r=> r.Status == "Unoccupied");

LINQ Union between two tables with the same fields and then returned in a collection

I have given up trying to create a LINQ query to retrieve a SQL Server view which is a union between two tables. I will now try to create a LINQ union.
I have two views, MemberDuesPaid and MemberDuesOwed. Both have the same fields (BatchNo, TranDate, DebitAmount, CreditAmount, ReceiptNo, CheckNo, SocSecNo).
I also have a helper class in my application called MemberTransaction. It has all the same properties.
How do I do a union between the two tables where SocSecNo equals the ssn passed in? I want to union the two tables and return an IEnumerable collection of MemberTransaction. After the two tables are unioned together, I want the returned collection ordered by TranDate in descending order.
You can do it in a Linq Union query:
var infoQuery =
(from paid in db.MemberDuesPaid
select new MemberTransaction() {
BatchNo = paid.BatchNo,
TranDate = paid.TranDate,
DebitAmount = paid.DebitAmount,
CreditAmount = paid.CreditAmount,
ReceiptNo = paid.ReceiptNo,
CheckNo = paid.CheckNo,
SocSecNo = paid.SocSecNo})
.Union
(from owed in db.MemberDuesOwed
select new MemberTransaction() {
BatchNo = owed.BatchNo,
TranDate = owed.TranDate,
DebitAmount = owed.DebitAmount,
CreditAmount = owed.CreditAmount,
ReceiptNo = owed.ReceiptNo,
CheckNo = owed.CheckNo,
SocSecNo = owed.SocSecNo});
That should return you a set with everything combined into a single list.
[Edit]
If you want distinct values, you can do something like this after the above statement (you can do it inline if you bracket everything, but this is simpler to explain):
infoQuery = infoQuery.Distinct();
The variable infoQuery will by this time be populated entirely with objects of type MemberTransaction rather than the two disparate types in the union statement.
Assuming you've got two collections, one representing each view:
var paid = new List<MemberDuesPaid>();
var owed = new List<MemberDuesOwed>();
Convert both collections above to instances of the third class before performing the union:
var everyone
= paid.Select(x => new MemberTransaction { BatchNo = x.BatchNo, ... })
.Union(owed.Select(x => new MemberTransaction { BatchNo = x.BatchNo, ... }))
.Where(x => x.SocSecNo == ssn)
.OrderByDescending(x => x.TranDate)
.ToList();
Now you've got a collection of MemberTransaction, but there's nothing to indicate how one MemberTransaction equals another. So if you just run the above, you'll end up with everything from both collections in the result, instead of a true union.
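You have to tell it what makes two instances equal by implementing IEquatable<T> on the MemberTransaction class. Override GetHashCode as well: Union stores items in a hash-based set, so two instances are only compared with Equals when their hash codes match.
public class MemberTransaction : IEquatable<MemberTransaction>
{
    public int BatchNo { get; set; }
    public DateTime TranDate { get; set; }
    public decimal DebitAmount { get; set; }
    public decimal CreditAmount { get; set; }
    public int ReceiptNo { get; set; }
    public int CheckNo { get; set; }
    public int SocSecNo { get; set; }

    public bool Equals(MemberTransaction other)
    {
        if (other == null) return false;
        return BatchNo == other.BatchNo
            && TranDate.Equals(other.TranDate)
            && DebitAmount == other.DebitAmount
            && CreditAmount == other.CreditAmount
            && ReceiptNo == other.ReceiptNo
            && CheckNo == other.CheckNo
            && SocSecNo == other.SocSecNo;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as MemberTransaction);
    }

    // Equal instances must produce equal hash codes; a subset of the fields is enough.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + BatchNo;
            hash = hash * 23 + TranDate.GetHashCode();
            hash = hash * 23 + ReceiptNo;
            hash = hash * 23 + SocSecNo;
            return hash;
        }
    }
}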

C# Looking for a sortable array that handles both ints and strings

I'm not really sure if what I'm looking for actually exists, so maybe you guys can help out.
I have the below data:
Apples|3211|12
Markers|221|9
Turtle|1023123123|22
The first column is always a string; the second and third columns are ints. However, what I want to do is be able to reference these as strings or ints, and then be able to sort by the third column ascending. Any ideas?
Something like MyTable[i].Column[i], and in this case MyTable[1].Column[2] would produce 12 as an int (because it's ordered).
If you want type safety you will need to create a class that holds each record:
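class Record
{
    // The properties must be public so that code outside the class (such as OrderBy below) can read them.
    public string Name { get; set; }
    public int SomeValue { get; set; }
    public int OrderNr { get; set; }
}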
Afterwards store them in a generic List<>, then you can order them, as you like:
List<Record> items = // read them into a list of items;
List<Record> orderedList = items.OrderBy(i => i.OrderNr).ToList();
UPDATE
Since it was requested I customized the answer from JustinNiessner to fit to my example:
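string data = // your data as string
List<Record> records = data
    // Split into lines first, then split each line into its three columns.
    .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(line => line.Split('|'))
    .Select(item => new Record
    {
        Name = item[0],
        SomeValue = int.Parse(item[1]),
        OrderNr = int.Parse(item[2])
    }).ToList();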
List<Record> orderedRecords = records.OrderBy(r => r.OrderNr).ToList();
This can be shortened by using var and by not calling ToList() on the intermediate list, but it is written this way to keep the concepts easy to follow.
Assuming you have your data stored in some sort of IEnumerable<string> type, you could try something like:
var sortedObjs = stringRows
    // Each row is one line such as "Apples|3211|12"; split it into columns.
    .Select(row => row.Split('|'))
    .Select(r => new
    {
        ColA = r[0],
        ColB = int.Parse(r[1]),
        ColC = int.Parse(r[2])
    })
    .OrderBy(r => r.ColC).ToList();
var specificVal = sortedObjs[1].ColC;
This speaks to a larger problem in your design. Using collections to hold a bunch of disparate types with the intent of organizing them into some sort of structure is fragile, error prone, and completely unnecessary.
Instead, create your own type to organize this information.
class MyType
{
public string Name { get; set; }
public int Whatever { get; set; }
public int AnotherProp { get; set; }
}
Now your data is logically grouped in a nice, tight, type safe package.
Your original post didn't specify what the ints were, but since you wanted to select them either by the Description(?) or the Id(?) and then sort them by the third column, perhaps something like this will work for you?
//Code tested in LinqPad
void Main()
{
//Apples|3211|12
//Markers|221|9
//Turtle|1023123123|22
//Create a list of items
var items = new List<Item>
{
new Item { Description = "Apples", Id = 3211, Sequence = 12 },
new Item { Description = "Markers", Id = 221, Sequence = 9 },
new Item { Description = "Turtle", Id = 1023123123, Sequence = 22 }
};
//Get Apples by Description, sorted by Sequence
var sortedByDescription = items.Where(i => i.Description == "Apples").OrderBy(i => i.Sequence);
//Get Markers by Id, sorted by Sequence
var sortedById = items.Where(i => i.Id == 221).OrderBy(i => i.Sequence);
}
public class Item
{
public string Description { get; set; }
public int Id { get; set; }
public int Sequence { get; set; }
}
