How do I construct a LINQ with multiple GroupBys? - c#

I have an entity that looks like this:
public partial class MemberTank
{
public int Id { get; set; }
public int AccountId { get; set; }
public int Tier { get; set; }
public string Class { get; set; }
public string TankName { get; set; }
public int Battles { get; set; }
public int Victories { get; set; }
public System.DateTime LastUpdated { get; set; }
}
A tiny sample of the data:
Id  AccountId  Tier  Class   TankName  Battles  Victories
--  ---------  ----  ------  --------  -------  ---------
1   432423     5     Heavy   KV        105      58
2   432423     6     Heavy   IS        70       39
3   544327     5     Heavy   KV        200      102
4   325432     7     Medium  KV-13     154      110
5   432423     7     Medium  KV-13     191      101
Ultimately I am trying to get a result that is a list of tiers, within the tiers is a list of classes, and within the class is a distinct grouping of the TankName with the sums of Battles and Victories.
Is it possible to do all this in a single LINQ statement? Or is there another way to easily get the result? (I know I can easily loop through the DbSet several times to produce the list I want; I am hoping for a more efficient way of getting the same result with LINQ.)

This should do it:
var output = from mt in MemberTanks
             group mt by new { mt.Tier, mt.Class, mt.TankName } into g
             select new
             {
                 g.Key.Tier,
                 g.Key.Class,
                 g.Key.TankName,
                 Fights = g.Sum(mt => mt.Battles),
                 Wins = g.Sum(mt => mt.Victories)
             };

You could also use method syntax. This should give you the same result as @TheEvilGreebo's answer:
var result = memberTanks.GroupBy(x => new {x.Tier, x.Class, x.TankName})
.Select(g => new { g.Key.Tier,
g.Key.Class,
g.Key.TankName,
Fights = g.Sum(mt => mt.Battles),
Wins = g.Sum(mt=> mt.Victories)
});
Which syntax you use comes down to preference.
Remove the .Select to return the IGrouping objects, which let you enumerate the groups:
var result = memberTanks.GroupBy(x => new { x.Tier, x.Class, x.TankName });
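A quick way to see what those IGrouping objects contain is to run the composite-key GroupBy in memory over the sample rows from the question. In this sketch, value tuples stand in for MemberTank entities (an assumption made so the example is self-contained); each group's Key exposes Tier, Class and TankName, and enumerating the group yields its rows.

```csharp
using System;
using System.Linq;

// Sample rows from the question as value tuples (stand-ins for MemberTank).
var memberTanks = new[]
{
    (Id: 1, AccountId: 432423, Tier: 5, Class: "Heavy",  TankName: "KV",    Battles: 105, Victories: 58),
    (Id: 2, AccountId: 432423, Tier: 6, Class: "Heavy",  TankName: "IS",    Battles: 70,  Victories: 39),
    (Id: 3, AccountId: 544327, Tier: 5, Class: "Heavy",  TankName: "KV",    Battles: 200, Victories: 102),
    (Id: 4, AccountId: 325432, Tier: 7, Class: "Medium", TankName: "KV-13", Battles: 154, Victories: 110),
    (Id: 5, AccountId: 432423, Tier: 7, Class: "Medium", TankName: "KV-13", Battles: 191, Victories: 101),
};

// Group on the composite key; each element is an IGrouping<key, row>.
var groups = memberTanks.GroupBy(x => new { x.Tier, x.Class, x.TankName }).ToList();

foreach (var g in groups)
{
    // g.Key holds the composite key; enumerating g yields the rows in that group.
    Console.WriteLine($"{g.Key.Tier} {g.Key.Class} {g.Key.TankName}: " +
                      $"{g.Count()} rows, {g.Sum(r => r.Battles)} battles, {g.Sum(r => r.Victories)} victories");
}
```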

I kept trying to get useful results out of The Evil Greebo's answer. While the answer does yield results (after fixing the compilation issues mentioned in the responses), it doesn't give me what I was really looking for (meaning I didn't explain myself well enough in the question).
Feanz left a comment in my question to check out the MS site with LINQ examples and, even though I thought I had looked there before, this time I found their example of nested group bys and I tried it their way. The following code gives me exactly what I was looking for:
var result = from mt in db.MemberTanks
group mt by mt.Tier into tg
select new
{
Tier = tg.Key,
Classes = from mt in tg
group mt by mt.Class into cg
select new
{
Class = cg.Key,
TankTypes = from mt in cg
group mt by mt.TankName into tng
select new
{
TankName = tng.Key,
Battles = tng.Sum(mt => mt.Battles),
Victories = tng.Sum(mt => mt.Victories),
Count = tng.Count()
}
}
};
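To check the shape of the nested result without a database, the sketch below runs the same three-level grouping (tier, then class, then tank name) over the sample rows in memory with LINQ to Objects. Value tuples stand in for MemberTank, the ToList() calls and the orderby are additions for the demo, and nothing here implies what SQL EF would produce for the database version.

```csharp
using System;
using System.Linq;

// Sample rows from the question as value tuples (stand-ins for MemberTank).
var memberTanks = new[]
{
    (Tier: 5, Class: "Heavy",  TankName: "KV",    Battles: 105, Victories: 58),
    (Tier: 6, Class: "Heavy",  TankName: "IS",    Battles: 70,  Victories: 39),
    (Tier: 5, Class: "Heavy",  TankName: "KV",    Battles: 200, Victories: 102),
    (Tier: 7, Class: "Medium", TankName: "KV-13", Battles: 154, Victories: 110),
    (Tier: 7, Class: "Medium", TankName: "KV-13", Battles: 191, Victories: 101),
};

// Same nesting as the query above: tier -> class -> tank name, with sums at the leaves.
var result = (from mt in memberTanks
              group mt by mt.Tier into tg
              orderby tg.Key
              select new
              {
                  Tier = tg.Key,
                  Classes = (from mt in tg
                             group mt by mt.Class into cg
                             select new
                             {
                                 Class = cg.Key,
                                 TankTypes = (from mt in cg
                                              group mt by mt.TankName into tng
                                              select new
                                              {
                                                  TankName = tng.Key,
                                                  Battles = tng.Sum(m => m.Battles),
                                                  Victories = tng.Sum(m => m.Victories),
                                                  Count = tng.Count()
                                              }).ToList()
                             }).ToList()
              }).ToList();

// Walk the three levels and print one line per tank type.
foreach (var tier in result)
    foreach (var cls in tier.Classes)
        foreach (var tank in cls.TankTypes)
            Console.WriteLine($"Tier {tier.Tier} / {cls.Class} / {tank.TankName}: " +
                              $"{tank.Battles} battles, {tank.Victories} victories ({tank.Count} rows)");
```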
I'll leave the answer by Mr. Greebo checked as most people will likely get the best results from that.

Related

Is it possible to select huge list from db by list of composite primary keys

I have 2 databases.
From my DB I'm taking a List<Item> (I'm getting it by Date; it can be up to 300,000 elements):
public class Item
{
public string A { get; set; }
public string B { get; set; }
public string C { get; set; }
public DateTime Date { get; set; }
}
In the other database (I don't control that DB; I can only read from it and can't change anything in it) I need to select a List<OtherDbItem>:
public class OtherDbItem
{
public string X { get; set; }
public string Y { get; set; }
public string Z { get; set; }
public string FewOtherProperties { get; set; }
}
X, Y, Z are the primary key. I need to select all OtherDbItems where Item.A = OtherDbItem.X and Item.B = OtherDbItem.Y and Item.C = OtherDbItem.Z (then map the OtherDbItems to my model and save them in my database).
I am using 2 different EF Core DbContexts to connect to the databases.
I tried:
var otherDbItems = new List<OtherDbItem>();
foreach (var item in Items)
{
var otherDbItem = await this.context.OtherDbItems.FindAsync(item.A, item.B, item.C);
if (otherDbItem != null)
{
otherDbItems.Add(otherDbItem);
}
}
return otherDbItems;
But there can be 300,000 items, so that's 300,000 requests to the database; obviously it's not optimal, and not acceptable.
I also tried:
var ids = items.Select(item => item.A + item.B + item.C).ToList();
var otherDbItems = await this.context.OtherDbItems.Where(otherDbItem => ids.Contains(otherDbItem.X + otherDbItem.Y + otherDbItem.Z)).ToListAsync();
But this results in a huge SQL query; it's slow and causes a connection timeout.
Is it possible to get the OtherDbItems quickly and reliably?
And do I have to get these items in parts? For example, .Skip(0).Take(1000) items per call? If so, how big should these parts be?
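One thing to watch with the concatenated key, independent of performance: string concatenation can make two different (A, B, C) triples look equal. The small in-memory sketch below uses made-up values (not from the question) to show the false match, and how comparing the parts as a tuple avoids it. It is only a LINQ-to-Objects illustration; it does not assume that any particular EF Core version can translate a tuple comparison to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical composite keys: (A, B, C) on our side, (X, Y, Z) on the other side.
var items = new List<(string A, string B, string C)> { ("ab", "c", "d") };
var otherDbItems = new List<(string X, string Y, string Z)> { ("a", "bc", "d") };

// Concatenated keys: "ab" + "c" + "d" and "a" + "bc" + "d" are both "abcd".
var ids = items.Select(i => i.A + i.B + i.C).ToList();
var concatMatches = otherDbItems.Where(o => ids.Contains(o.X + o.Y + o.Z)).ToList();

// Tuple keys compare component by component, so there is no false match.
var keySet = new HashSet<(string, string, string)>(items.Select(i => (i.A, i.B, i.C)));
var tupleMatches = otherDbItems.Where(o => keySet.Contains((o.X, o.Y, o.Z))).ToList();

Console.WriteLine($"concatenated: {concatMatches.Count}, tuple: {tupleMatches.Count}");
```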
I can't say for sure that this is the best approach because I'm not an EF expert, but I had a similar scenario recently: a sync from an external JSON export into an EF Core database. Part of that operation was validating that existing EF Core entries, which would grow based on the imported data, were still valid if the export changed. Suffice it to say that as the database grew towards a million or so records to validate, we ran into timeouts and expensive queries.
The approach we ended up with, which actually improved the speed of even our original process, was to batch the operations. The one thing we did differently from the plain Take()/Skip() approach was to batch on the input side: we took a collection of 1000 ids at a time, ran the query for it, and then moved on to the next. With your code and data that might look something like this:
int chunkIndex = 0;
int batch = 1000;
var ids = items.Select(item => item.A + item.B + item.C).ToList();
// Collect the results of every batch, not just the last one.
var otherDbItems = new List<OtherDbItem>();
while (chunkIndex < ids.Count)
{
    // Take at most `batch` ids starting at chunkIndex.
    var chunkIDs = ids.GetRange(chunkIndex, Math.Min(batch, ids.Count - chunkIndex));
    otherDbItems.AddRange(await this.context.OtherDbItems
        .Where(otherDbItem => chunkIDs.Contains(otherDbItem.X + otherDbItem.Y + otherDbItem.Z))
        .ToListAsync());
    chunkIndex += batch;
}
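On .NET 6 or later, the index arithmetic can be delegated to Enumerable.Chunk. The sketch below only shows how a key list splits into batches; the key list is hypothetical and the per-batch EF query is left as a comment, so treat it as a pattern rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the keys read from the first database.
var ids = Enumerable.Range(0, 2_500).Select(i => $"key{i}").ToList();
const int batchSize = 1000;

var batchSizes = new List<int>();
// Enumerable.Chunk (.NET 6+) yields arrays of at most batchSize elements,
// replacing the manual GetRange/index arithmetic.
foreach (string[] chunk in ids.Chunk(batchSize))
{
    // In the real code, one query per chunk would go here, e.g.
    // context.OtherDbItems.Where(o => chunk.Contains(...)), with its results
    // added to a single accumulating list.
    batchSizes.Add(chunk.Length);
}

Console.WriteLine(string.Join(", ", batchSizes)); // 1000, 1000, 500
```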
I think this makes your query less expensive, since the database doesn't have to run the entire thing and then limit the result. Where your situation differs slightly is that your source is also a database, whereas ours was JSON content. You could probably optimize further by applying the Skip()/Take() approach to your query of ids from the Items source table. The syntax on this might not be 100% right, but perhaps this gives the idea:
int chunkIndex = 0;
int batch = 1000;
var otherDbItems = new List<OtherDbItem>();
// Update dbItemsContext.Items to your source context and table
int totalRecords = dbItemsContext.Items.Count();
while (chunkIndex < totalRecords)
{
    // Update dbItemsContext.Items to your source context and table.
    // Skip must come before Take, and paging needs a stable order.
    var chunkIDs = dbItemsContext.Items
        .OrderBy(item => item.A).ThenBy(item => item.B).ThenBy(item => item.C)
        .Select(item => item.A + item.B + item.C)
        .Skip(chunkIndex).Take(batch)
        .ToList();
    otherDbItems.AddRange(await this.context.OtherDbItems
        .Where(otherDbItem => chunkIDs.Contains(otherDbItem.X + otherDbItem.Y + otherDbItem.Z))
        .ToListAsync());
    chunkIndex += batch;
}
I hope that helps demonstrate our approach, but with this route I think you'd need to lock the tables to avoid changes until your operations are complete. I welcome any feedback, since it could improve our process as well. I'll also note that our application/context is not set up to run async, so you might need some additional modifications, or you could possibly even run these batches asynchronously for your use case.
A final note regarding batch size: you may need to play with it a bit. Our query was quite a bit more complex, so 1000 seemed to be the sweet spot for us, but you may be able to take quite a bit more at a time. I'm not sure there's any way to determine the best batch size other than testing some different sizes.
OK, it was much easier than I thought. Both databases are on the same SQL Server, so it was a matter of a simple inner join.
I just added the properties from Item to OtherDbItem:
public class OtherDbItem
{
public string A { get; set; }
public string B { get; set; }
public string C { get; set; }
public DateTime Date { get; set; }
public string X { get; set; }
public string Y { get; set; }
public string Z { get; set; }
public string FewOtherProperties { get; set; }
}
And in OnModelCreating:
protected override void OnModelCreating(ModelBuilder builder)
{
builder.Entity<OtherDbItem>(
entity =>
{
entity.ToSqlQuery(@"SELECT
i.A,
i.B,
i.C,
i.Date,
o.X,
o.Y,
o.Z,
o.FewOtherProperties
FROM [DB1].[dbo].[Items] i
inner join [DB2].[dbo].[OtherDbItem] o on i.A = o.X and i.B = o.Y and i.C = o.Z");
entity.HasKey(o => new { o.X, o.Y, o.Z});
});
}
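And the last thing to do (the method name and signature are placeholders):
// Hypothetical method name/signature; only the body is from the original.
public Task<List<OtherDbItem>> GetOtherDbItemsAsync(DateTime date)
{
    return this.context.OtherDbItems
        .Where(x => x.Date == date)
        .Distinct()
        .ToListAsync();
}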

LINQ - AND, ANY and NOT Query

I am fairly new to LINQ. I can do some basic stuff with it, but I am in need of an expert.
I am using Entity Framework and I have a table that has 3 columns.
public class Aspect
{
[Key, Column(Order = 0)]
public int AspectID { get; set; }
[Key, Column(Order = 1)]
public int AspectFieldID { get; set; }
public string Value { get; set; }
}
I have 3 lists of words from a user's input. One contains phrases or words that must all be in the Value field (AND), another contains phrases or words of which at least one must be in the Value field (ANY), and the last contains phrases or words that cannot be found in the Value field (NOT).
I need to get every record that has all of the ALL words, any of the ANY words and none of the NOT words.
Here are my objects.
public class SearchAllWord
{
public string Word { get; set; }
public bool includeSynonoyms { get; set; }
}
public class SearchAnyWord
{
public string Word { get; set; }
public bool includeSynonoyms { get; set; }
}
public class SearchNotWord
{
public string Word { get; set; }
}
What I have so far is this,
var aspectFields = getAspectFieldIDs().Where(fieldID => fieldID > 0).ToList();//retrieves a list of AspectFieldID's that match user input
var result = db.Aspects
.Where(p => aspectFields.Contains(p.AspectFieldID))
.ToList();
Any and all help is appreciated.
First, let me say that if this is your requirement, your query will have to read every record in the table, because a substring Contains can't use an ordinary index. This is going to be a slow operation.
IQueryable<Aspect> query = db.Aspects.AsQueryable();
//note, if AllWords is empty, query is not modified.
foreach(SearchAllWord x in AllWords)
{
//important, lambda should capture local variable instead of loop variable.
string word = x.Word;
query = query.Where(aspect => aspect.Value.Contains(word));
}
foreach(SearchNotWord x in NotWords)
{
string word = x.Word;
query = query.Where(aspect => !aspect.Value.Contains(word));
}
if (AnyWords.Any()) //haha!
{
List<string> words = AnyWords.Select(x => x.Word).ToList();
query =
from aspect in query
from word in words //does this work in EF?
where aspect.Value.Contains(word)
group aspect by aspect into g
select g.Key;
}
If you're sending this query into Sql Server, be aware of the ~2100 parameter limit. Each word is going to be sent as a parameter.
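The same composition can be tried in memory. This sketch uses made-up strings in place of Aspect.Value and plain string arrays for the three word lists (synonyms are ignored). For the ANY list it uses one Where with an inner Any instead of the grouping trick; that works for LINQ to Objects, but whether EF can translate an inner Any over a local list depends on the EF version, so don't read this as the EF query. Also, since C# 5 a foreach loop variable is fresh on each iteration, so capturing it directly is safe; the extra local in the answer above only matters for older compilers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical values standing in for Aspect.Value rows.
var values = new List<string>
{
    "red apple pie",
    "green apple tart",
    "red cherry pie",
    "red apple crumble with nuts",
};

var allWords = new[] { "apple" };          // every one must appear
var anyWords = new[] { "pie", "crumble" }; // at least one must appear
var notWords = new[] { "nuts" };           // none may appear

IEnumerable<string> query = values;
foreach (var word in allWords)
    query = query.Where(v => v.Contains(word));   // AND: one filter per word
foreach (var word in notWords)
    query = query.Where(v => !v.Contains(word));  // NOT: exclude each word
if (anyWords.Any())
    query = query.Where(v => anyWords.Any(w => v.Contains(w))); // ANY: at least one match

var matches = query.ToList();
Console.WriteLine(string.Join(" | ", matches)); // red apple pie
```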
What you need are the set operators, specifically Intersect and Any.
Bundle up your "all" words into a string array (or some other enumerable) and then you can use intersect and count to check they are all present.
Here are two sets
var A = new string[] { "first", "second", "third" };
var B = new string[] { "second", "third" };
A is a superset of B?
var isSuperset = A.Intersect(B).Count() == B.Count();
A is disjoint with B?
var isDisjoint1 = !A.Intersect(B).Any();
var isDisjoint2 = !A.Any(a => B.Any(b => a == b)); //no set allocation, but up to |A|*|B| comparisons
Your objects are not strings, so you will want the overload that accepts an IEqualityComparer<T>.
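The checks above can be run as-is; the sketch below adds two details. Intersect returns distinct elements, so the superset test compares against B.Distinct().Count() to stay correct if B has duplicates. For objects, one option is to project each object to its word and pass a comparer; tuples stand in for SearchAllWord here (an assumption for the demo), and StringComparer.OrdinalIgnoreCase is just one example comparer.

```csharp
using System;
using System.Linq;

var A = new[] { "first", "second", "third" };
var B = new[] { "second", "third" };

// A is a superset of B when every distinct element of B survives the intersection.
bool isSuperset = A.Intersect(B).Count() == B.Distinct().Count();

// A and B are disjoint when the intersection is empty.
bool isDisjoint = !A.Intersect(B).Any();

// For objects: project to the key, then intersect with an IEqualityComparer<string>.
var searchWords = new[] { (Word: "SECOND", IncludeSynonyms: false), (Word: "Third", IncludeSynonyms: true) };
var wanted = searchWords.Select(w => w.Word).ToArray();
bool allPresent = A.Intersect(wanted, StringComparer.OrdinalIgnoreCase).Count()
                  == wanted.Distinct(StringComparer.OrdinalIgnoreCase).Count();

Console.WriteLine($"{isSuperset} {isDisjoint} {allPresent}"); // True False True
```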
And now some soapboxing.
Much as I love Linq2sql, it is not available in ASP.NET Core and the EF team wants to keep it that way, probably because jerks like me keep saying "gross inefficiency X of EF doesn't apply to Linq2Sql".
Core is the future. Small, fast and cross-platform, it lets you serve a Web API from a Raspberry Pi running Windows IoT or Linux, or get ridiculously high performance on big hardware.
EF is not and probably never will be a high performance proposition because it takes control away from you while insisting on being platform agnostic, which prevents it from exploiting the platform.
In the absence of Linq2sql, the solution seems to be libraries like Dapper, which handle parameters when sending the query and map results into object graphs when the result arrives, but otherwise don't do much. This makes them more or less platform agnostic but still lets you exploit the platform - apart from parameter substitution your SQL is passthrough.

Linq query projecting Id's not names

I have a simple setup.
FRUIT Table
Id Name
1 Gala Apples
2 Navel Oranges
3 Peach
4 Mandarin Oranges
5 Kiwi
6 Fuji Apples
INTERSECT TABLE
FruitId CrossRefFruitId
1 6
2 4
So if the user is looking at Gala Apples (1) they may also be interested in Fuji Apples (6).
I have a simple model that returns the Fruit
Model
public class FruitCategory
{
public int Id { get; set; }
public string FruitName { get; set; }
}
EF:
public IEnumerable<FruitCategory> GetFruitbyId(int id)
{
return _context.FruitTable.Where(q => q.FruitId == id);
}
This works fine, but now I also want to add the "SeeAlso" fruit. So I create a cross-ref model and a new field in my model.
CrossRef Model
public class FruitCrossRef
{
public int Id { get; set; }
public string CrossRefName { get; set; }
}
Model
public class FruitCategory
{
public int Id { get; set; }
public string FruitName { get; set; }
public List<FruitCrossRef> SeeAlsoFruits {get; set;}
}
Now I come to my difficulty: how to get a LINQ projection that will populate this model.
Since I don't know how to write this, I open LINQPad and start hacking and googling.
So far this is what I have come up with, but it returns the MATCHING id in the intersect table, whereas what I want is the cross-referenced ID and the FruitName from the Fruit table.
var seeAlso =
(from frt in FruitTable
where frt.Id == 1
select frt.Id)
.Intersect
(from frtCross in IntersectTable
select frtCross.FruitId);
seeAlso.Dump();
Now I can see a path where I get the job done with several loops: get the see-also references, then for each one go back to the Fruit table and fetch that record. However, it seems there ought to be a way to leverage the relationship and project my fully populated model.
Code Correction
For anyone else who comes across this: there were a couple of syntax errors in the answer, but the answer was still exactly what I wanted.
var seeAlso =
(from frt in FruitTable
join intsec in IntersectionTable
on frt.Id equals intsec.CrossRefFruitId
where intsec.FruitId == 1
select frt);
seeAlso.Dump();
Remember, this was written for LINQPad; a little more tweaking is needed for production code.
What you ultimately want is a list of fruit items that are also of interest based on some other fruit, given that fruit's Id. Therefore, rather than selecting the fruit corresponding to the Id you want, you should select the fruits that join to the Intersection table with that Id. For example:
var seeAlso =
    (from frt in FruitTable
     join intsec in IntersectionTable
         on frt.Id equals intsec.CrossRefFruitId
     where intsec.FruitId == 1
     select frt);
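To go one step further and fill the FruitCategory shape (Id, FruitName, SeeAlsoFruits) in one projection, a nested query can be used. The sketch runs in memory with tuples standing in for the two tables and anonymous types standing in for FruitCategory and FruitCrossRef; with EF you would start from the DbSets and select into the real model classes, and the generated SQL depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory copies of the two tables from the question (tuples stand in for entities).
var fruitTable = new[]
{
    (Id: 1, Name: "Gala Apples"), (Id: 2, Name: "Navel Oranges"), (Id: 3, Name: "Peach"),
    (Id: 4, Name: "Mandarin Oranges"), (Id: 5, Name: "Kiwi"), (Id: 6, Name: "Fuji Apples"),
};
var intersectionTable = new[] { (FruitId: 1, CrossRefFruitId: 6), (FruitId: 2, CrossRefFruitId: 4) };

int id = 1;

// Project the fruit plus its "see also" list: the inner query joins the
// cross-reference rows back to the fruit table to pick up the names.
var fruit = (from f in fruitTable
             where f.Id == id
             select new
             {
                 f.Id,
                 FruitName = f.Name,
                 SeeAlsoFruits = (from x in intersectionTable
                                  join other in fruitTable on x.CrossRefFruitId equals other.Id
                                  where x.FruitId == f.Id
                                  select new { other.Id, CrossRefName = other.Name }).ToList()
             }).Single();

Console.WriteLine($"{fruit.FruitName}: see also {string.Join(", ", fruit.SeeAlsoFruits.Select(s => s.CrossRefName))}");
```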

Efficient query involving count in subquery

Say I have this hypothetical many-to-many relationship:
public class Paper
{
public int Id { get; set; }
public string Title { get; set; }
public virtual ICollection<Author> Authors { get; set; }
}
public class Author
{
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<Paper> Papers { get; set; }
}
I want to use LINQ to build a query that will give me the "popularity" of each author compared to other authors, which is the number of papers the author contributed to divided by the total number of author contributions in general across all papers. I've come up with a couple queries to achieve this.
Option 1:
var query1 = from author in db.Authors
let sum = (double)db.Authors.Sum(a => a.Papers.Count)
select new
{
Author = author,
Popularity = author.Papers.Count / sum
};
Option 2:
var temp = db.Authors.Select(a => new
{
Auth = a,
Contribs = a.Papers.Count
});
var query2 = temp.Select(a => new
{
Author = a.Auth,
Popularity = a.Contribs / (double)temp.Sum(a2 => a2.Contribs)
});
Basically, my question is this: which of these is more efficient, and are there other single queries that are more efficient? How do any of those compare to two separate queries, like this:
double sum = db.Authors.Sum(a => a.Papers.Count);
var query3 = from author in db.Authors
select new
{
Author = author,
Popularity = author.Papers.Count / sum
};
Well, first of all, you can try them out yourself and measure which one takes longest.
The first thing you should look for is that they translate into SQL perfectly, or as closely as possible, so that the data doesn't all get loaded into memory just to apply those computations.
But I feel that option 2 might be your best shot, with one more optimization: cache the total number of papers contributed. This way you only make one call to the database to get the authors, which you need anyway; the rest runs in your code, where you can parallelize and do whatever you need to make it fast.
So something like this (sorry, I prefer the Fluent style of writing Linq):
//here you can even load only the needed info if you don't need the whole entity.
//I imagine you might only need the name and the Papers.Count which you can use below, this would be another optimization.
//All() requires a predicate; ToList() loads the authors (with lazy loading, each author's Papers is fetched separately).
var allAuthors = db.Authors.ToList();
var totalPaperCount = allAuthors.Sum(x => x.Papers.Count);
var theEndResult = allAuthors.Select(a => new
{
    Author = a,
    Popularity = a.Papers.Count / (double)totalPaperCount
});
Option 1 and 2 should generate the same SQL code. For readability I would go with option 1.
Option 3 will generate two SQL statements and be a little slower.
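Whichever query shape is used, the arithmetic is the same: each author's paper count divided by the sum of all authors' paper counts, so the popularities add up to 1. A minimal in-memory sketch with made-up counts (tuples instead of the Author entity) shows the computation with the total computed once, in the spirit of the second answer; it says nothing about which SQL EF would generate.

```csharp
using System;
using System.Linq;

// Hypothetical authors with the number of papers each contributed to.
var authors = new[] { (Name: "Ada", PaperCount: 3), (Name: "Brian", PaperCount: 1), (Name: "Cleo", PaperCount: 4) };

// Total contributions across all papers, computed once.
double total = authors.Sum(a => a.PaperCount);

// Popularity = author's contributions / total contributions.
var popularity = authors
    .Select(a => new { a.Name, Popularity = a.PaperCount / total })
    .ToList();

foreach (var p in popularity)
    Console.WriteLine($"{p.Name}: {p.Popularity:P1}");
```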

I need help with using linq

I want to filter the objects that I have by their topic.
I have many topics: Arts, Economics, Business, Politics. Topic is a property of each object, and I want to classify a list of those objects by it.
Here is part of my objects:
public class AllQuestionsPresented
{
public string Name{ get; set; }
public string ThreadName { get; set; }
public string Topic { get; set; }
public string Subtopic { get; set; }
public int Views { get; set; }
public int Replies { get; set; }
public int PageNumber { get; set; }
public DateTime Time { get; set; }
// snip
}
I created many of those objects, filled their properties with different values, and put them into a List:
List<AllQuestionsPresented> forumData;
Now I want to use LINQ to group them all by their topic:
var groupedByPages =
from n in forumData
group n by forumData
select .....
Basically, I don't know how to continue because I'm not used to dealing with LINQ. What I want to get is some dictionary:
Dictionary<string, AllQuestionsPresented> dictionary
If I don't use LINQ and just add every topic to a dictionary, it will try to add several AllQuestionsPresented objects with the same topic, which throws an exception. So I have to use group by, but I don't know how to achieve that.
You can use ToLookup, which will give you a key/list of values collection. Your key will be the Topic, and you will get a list of AllQuestionsPresented for each key.
var lookup = forumData.ToLookup(f => f.Topic);
Reference on ToLookup
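A small in-memory sketch of how the lookup behaves, with hypothetical questions (tuples stand in for AllQuestionsPresented): indexing by a topic returns all questions with that topic, and a topic that isn't present yields an empty sequence rather than an exception.

```csharp
using System;
using System.Linq;

// Hypothetical questions; tuples stand in for AllQuestionsPresented.
var forumData = new[]
{
    (Name: "Q1", Topic: "Arts"),
    (Name: "Q2", Topic: "Politics"),
    (Name: "Q3", Topic: "Arts"),
};

var lookup = forumData.ToLookup(f => f.Topic);

// Indexing returns every question with that topic.
var arts = lookup["Arts"].Select(q => q.Name).ToList();
// Unlike a Dictionary, a missing key yields an empty sequence instead of throwing.
var business = lookup["Business"].Count();

Console.WriteLine($"Arts: {string.Join(", ", arts)}; Business: {business}; topics: {lookup.Count}");
```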
var groupedByTopics =
    from n in forumData
    group n by n.Topic into g
    select new { Topic = g.Key, Questions = g };
You may also want to keep this around for reference :-)
http://msdn.microsoft.com/en-us/vcsharp/aa336746
The grouped results are returned as an IEnumerable<IGrouping<TKey, T>>, which in your case will be IEnumerable<IGrouping<string, AllQuestionsPresented>>.
The code below shows how you can access the data in the grouping.
var groupedByTopic = from question in forumData
group question by question.Topic;
foreach (var group in groupedByTopic)
{
Console.WriteLine(group.Key);
foreach (var question in group)
{
Console.WriteLine("\t" + question.Name);
}
}
To create a dictionary from the above, you can do the following:
var groupingDictionary = groupedByTopic.ToDictionary(q=>q.Key, q=>q.ToList());
Which will give you a Dictionary<string, List<AllQuestionsPresented>>
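The exception the question mentions comes from adding the same key twice; grouping first avoids it. A small in-memory sketch with hypothetical questions (tuples stand in for AllQuestionsPresented, and only the names are kept in the lists) contrasts the two.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical questions; tuples stand in for AllQuestionsPresented.
var forumData = new[]
{
    (Name: "Q1", Topic: "Arts"),
    (Name: "Q2", Topic: "Politics"),
    (Name: "Q3", Topic: "Arts"),
};

// One entry per question fails on the second "Arts" question...
bool threw = false;
try
{
    _ = forumData.ToDictionary(q => q.Topic, q => q);
}
catch (ArgumentException)
{
    threw = true; // duplicate key "Arts"
}

// ...whereas grouping first gives one entry per topic with a list of questions.
Dictionary<string, List<string>> byTopic = forumData
    .GroupBy(q => q.Topic)
    .ToDictionary(g => g.Key, g => g.Select(q => q.Name).ToList());

Console.WriteLine($"threw: {threw}; Arts has {byTopic["Arts"].Count} questions");
```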
If you went the Lookup route, which is nicely demonstrated by @wsanville, then you can get the dictionary the same way:
var lookup = forumData.ToLookup(q => q.Topic);
var groupingDictionary = lookup.ToDictionary(q => q.Key, q => q.ToList());
You can just call ToDictionary. The parameters are a function to select the keys and another to select the values:
var groupedByPages =
from n in forumData
group n by n.Topic;
IDictionary<string, IEnumerable<AllQuestionsPresented>> dictionary =
groupedByPages.ToDictionary(x => x.Key, x => x.AsEnumerable());
But if all you need from the IDictionary interface is the indexing operation, it's easier to just use an ILookup:
ILookup<string, AllQuestionsPresented> groupedByPages = forumData.ToLookup(x => x.Topic);
var groupedByPages =
    from n in forumData
    group n by n.Topic;
