Getting the DENSE_RANK() in C# over multiple splits of data - c#

I have a dataset that is built gradually in parts, and as each part is done, I'm associating the entries with their DENSE_RANK() with the following code (source: implement dense rank with linq):
aQueryable.GroupBy(x => x)
.Where(g => g.Any())
.OrderBy(g => g.Key.SortOrder1)
.ThenBy(g => g.Key.SortOrder2)
.ThenBy(g => g.Key.SortOrder3)
.Select((g, i) =>
{
++i;
foreach (var x in g)
{
x.DenseRank = i;
}
return g;
}).Select(g => g.Key)
SQL equivalent: DENSE_RANK() OVER ( ORDER BY SortOrder1, SortOrder2, SortOrder3 )
However, the DenseRank that I'm computing here doesn't match the DENSE_RANK() I get in SQL once the entire dataset is written. I suspect this is because I'm computing my DENSE_RANK() on a subset of the full dataset.
Is there any way I can compute the same DENSE_RANK() as SQL without waiting for my entire dataset to finish populating first?

If you wanted to stream the results through an IEnumerable, you can OrderBy on the database and then write your own select, checking for rank:
var lastSort1 = default(int);
var lastSort2 = default(int);
var lastSort3 = default(int);
var firstRun = true;
var rank = 1;
iqueryable
.OrderBy(i => i.SortOrder1)
.ThenBy(i => i.SortOrder2)
.ThenBy(i => i.SortOrder3)
.AsEnumerable()
.Select(i =>
{
if (!firstRun && (lastSort1 != i.SortOrder1 || lastSort2 != i.SortOrder2 || lastSort3 != i.SortOrder3))
{
rank++;
}
firstRun = false;
lastSort1 = i.SortOrder1;
lastSort2 = i.SortOrder2;
lastSort3 = i.SortOrder3;
i.DenseRank = rank;
return i;
});
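Items start flowing through the IEnumerable as soon as the first rows arrive from the database. Note that Select is lazy, so the ranks are only assigned while you enumerate the result, and a full enumeration will eventually materialize everything.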
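To illustrate why ranks computed on a partial dataset can disagree with the final SQL DENSE_RANK(), here is a minimal in-memory sketch. The class name DenseRankSketch, the Rank helper, and the tuple keys (S1, S2, S3) are hypothetical stand-ins for SortOrder1..3 and are not part of the original code. It suggests that a key arriving in a later part can shift the rank of keys already processed, so ranks assigned before the data is complete are only provisional.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DenseRankSketch
{
    // Maps each distinct sort key to its 1-based dense rank (ascending order).
    // Assumption: the three tuple fields stand in for SortOrder1..3.
    public static Dictionary<(int S1, int S2, int S3), int> Rank(
        IEnumerable<(int S1, int S2, int S3)> keys)
    {
        return keys
            .Distinct()
            .OrderBy(k => k.S1).ThenBy(k => k.S2).ThenBy(k => k.S3)
            .Select((k, i) => (Key: k, Rank: i + 1))
            .ToDictionary(p => p.Key, p => p.Rank);
    }

    public static void Main()
    {
        var part1 = new[] { (1, 0, 0), (3, 0, 0) };
        var part2 = new[] { (2, 0, 0) }; // arrives later, sorts between the two

        var ranksSoFar = Rank(part1);
        var ranksFull = Rank(part1.Concat(part2));

        Console.WriteLine(ranksSoFar[(3, 0, 0)]); // 2
        Console.WriteLine(ranksFull[(3, 0, 0)]);  // 3
    }
}
```

Under this reading, matching SQL exactly requires either waiting for the full set of keys or re-ranking once the last part has arrived.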

Related

Performing multiple Linq queries against the same Linq result

I have created a dashboard where all the displayed data shares four common elements (startDate, endDate, CompanyID, StoreID) that are used as Where clauses in a LINQ statement. The result of that statement is then queried in a variety of ways to group and sort the data for charts, lists, etc. Here is a short snippet showing the duplication that currently occurs:
var dashboardEntity = new BlueStreakSalesDWEntities();
//Get Total Sales
ViewBag.companySalesTotal = dashboardEntity.FactSales.Where(d => d.DateKey >= startDate)
.Where(d => d.DateKey <= endDate)
.Where(c => c.CompanyID == companyID)
.Sum(a => a.Amount);
//get list of all items sold
var companyStoreTotalItem = dashboardEntity.FactSales.Where(d => d.DateKey >= startDate)
.Where(d => d.DateKey <= endDate)
.Where(c => c.CompanyID == companyID).GroupBy(m => new { m.Description })
.Select(g => new DescriptionAmountModel { Amount = g.Sum(a => a.Amount).Value, Description = g.Key.Description })
.OrderByDescending(x => x.Amount);
I have about 15 of these calls on the dashboard, and it can get very slow at times. I imagine this is because each call hits the database separately, when in reality the database only needs to be queried once and that result can then be queried in different ways.
How can I do this?
Any help would be greatly appreciated
In your current solution each query executes separately against the same data. You can execute the shared part of the queries first and bring the results back from the database. In your examples, the shared part is these Where conditions:
//Executes in database
var entities = dashboardEntity.FactSales.Where(d => d.DateKey >= startDate)
.Where(d => d.DateKey <= endDate)
.Where(c => c.CompanyID == companyID)
.ToList();
Now that the data is filtered down to only what you need, you can do the rest of the aggregations in memory:
//Happens in the List<T> in memory
ViewBag.companySalesTotal = entities.Sum(a => a.Amount);
var companyStoreTotalItem = entities.GroupBy(m => new { m.Description })
.Select(g => new DescriptionAmountModel { Amount = g.Sum(a => a.Amount).Value, Description = g.Key.Description })
.OrderByDescending(x => x.Amount);
This is more efficient: the query executes a single time in the database, and the remaining work runs against the data already pulled into memory.
var result = dashboardEntity.FactSales.Where(d => d.DateKey >= startDate && d.DateKey <= endDate && d.CompanyID == companyID).ToList();
ViewBag.companySalesTotal = result.Sum(a => a.Amount);
//then get list of all items sold from in memory data
var companyStoreTotalItem = result.GroupBy(m => new { m.Description }).Select(g => new DescriptionAmountModel { Amount = g.Sum(a => a.Amount).Value, Description = g.Key.Description }).OrderByDescending(x => x.Amount);

Union of ordered sets in entity framework and then skip

I have a table of lessons, and I want to perform a text search over several fields of it. However the search should be ordered: for example lesson have a Keywords field and Description field. The search should give a priority over values found by Keywords. Everything should be also ordered by date but only after the priority is considered.
I'm also using ToPagedList() at the end, from https://github.com/troygoode/PagedList (I think it just uses Skip() and Take() to manage pages).
This is what I have so far:
string[] words = /*Search words*/
var data = db.LessonsLearneds.Where(dbRecord => words.Any(word =>
dbRecord.SearchKeywords.StartsWith(word + ",") ||
dbRecord.SearchKeywords.Contains("," + word + ",") ||
dbRecord.SearchKeywords.EndsWith("," + word)))
.Select(x => new { Record = x, Order = 1 });
data = data.Union(
db.LessonsLearneds
.Where(dbRecord => words.Any(word => dbRecord.Title.Contains(word)))
.Select(x => new { Record = x, Order = 2 }));
data = data.Union(
db.LessonsLearneds
.Where(dbRecord => words.Any(word => dbRecord.Description.Contains(word)))
.Select(x => new { Record = x, Order = 3}));
data = data.Union(
db.LessonsLearneds
.Where(dbRecord => words.Any(word => dbRecord.Lesson.Contains(word)))
.Select(x => new { Record = x, Order = 4 }));
return data
.Distinct()
.OrderBy(x => x.Order)
.ThenByDescending(x => x.Record.Date)
.Select(x => x.Record)
.ToPagedList(pageNumber, pageSize);
Overall this code does almost what I want, except for Distinct(). Each union here can retrieve the same record, so I may receive it several times, and Distinct() does not enforce uniqueness because of the synthetic Order field. I cannot put Distinct() after Select(x => x.Record) because ToPagedList(..) requires the set to be ordered (it results in the exception: The method 'Skip' is only supported for sorted input in LINQ to Entities.)
Any ideas?
I have one so far: add the Order field after calling Distinct(). But this means I would have to write those Contains checks twice, which I think is a very ugly solution.
First, since you are projecting unique records due to the different Order value, replace the Union operator with Concat (which is the LINQ equivalent of the SQL UNION ALL).
string[] words = /*Search words*/
var data = db.LessonsLearneds.Where(dbRecord => words.Any(word =>
dbRecord.SearchKeywords.StartsWith(word + ",") ||
dbRecord.SearchKeywords.Contains("," + word + ",") ||
dbRecord.SearchKeywords.EndsWith("," + word)))
.Select(x => new { Record = x, Order = 1 });
data = data.Concat(
db.LessonsLearneds
.Where(dbRecord => words.Any(word => dbRecord.Title.Contains(word)))
.Select(x => new { Record = x, Order = 2 }));
data = data.Concat(
db.LessonsLearneds
.Where(dbRecord => words.Any(word => dbRecord.Description.Contains(word)))
.Select(x => new { Record = x, Order = 3}));
data = data.Concat(
db.LessonsLearneds
.Where(dbRecord => words.Any(word => dbRecord.Lesson.Contains(word)))
.Select(x => new { Record = x, Order = 4 }));
Then replace the Distinct with GroupBy using x.Record as a key and taking min Order for each grouping, and do the rest as in your current query:
return data
.GroupBy(x => x.Record)
.Select(g => new { Record = g.Key, Order = g.Min(x => x.Order) })
.OrderBy(x => x.Order)
.ThenByDescending(x => x.Record.Date)
.Select(x => x.Record)
.ToPagedList(pageNumber, pageSize);
You can replace Distinct with GroupBy and Select, like this:
return data
.GroupBy(x => x.Record)
.Select(g => g.OrderBy(x => x.Order).ThenByDescending(x => x.Record.Date).First())
.OrderBy(x => x.Order)
.ThenByDescending(x => x.Record.Date)
.Select(x => x.Record)
.ToPagedList(pageNumber, pageSize);
The unfortunate side effect of this approach is that you need to repeat OrderBy inside the first Select, but it should produce the results that you are looking for.
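The GroupBy-and-Min idea can be checked in memory with LINQ to Objects. The following is a minimal sketch under stated assumptions: the Lesson record, the OrderedIds helper, and the sample data are hypothetical, records require C# 9 or later, and nothing here verifies how Entity Framework translates the query to SQL. It only shows that a record matched at several priorities appears once, at its best priority, with ties broken by date descending.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the LessonsLearned entity (value equality via record).
public sealed record Lesson(int Id, DateTime Date);

public static class PrioritySearchSketch
{
    // hits: one entry per (record, priority) match, duplicates allowed.
    public static List<int> OrderedIds(IEnumerable<(Lesson Record, int Order)> hits)
    {
        return hits
            .GroupBy(h => h.Record)
            // Keep each record once, with its best (lowest) priority.
            .Select(g => (Record: g.Key, Order: g.Min(h => h.Order)))
            .OrderBy(x => x.Order)
            .ThenByDescending(x => x.Record.Date)
            .Select(x => x.Record.Id)
            .ToList();
    }

    public static void Main()
    {
        var a = new Lesson(1, new DateTime(2020, 1, 1));
        var b = new Lesson(2, new DateTime(2021, 1, 1));
        var c = new Lesson(3, new DateTime(2022, 1, 1));

        // Lesson a matches twice: by keyword (1) and by description (3).
        var hits = new[] { (a, 1), (b, 2), (a, 3), (c, 1) };

        Console.WriteLine(string.Join(",", OrderedIds(hits))); // 3,1,2
    }
}
```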

Improve the time complexity of current Linq queries

I have the following lists:
RakeSnapshots, ProductMovements
The aim is to process both lists and get the count of elements that match a condition, as follows:
Consider RakeSnapshots with StatusCode == "Dumping"
Consider ProductMovement with Status == "InProgress"
Fetch the count of all elements from both lists that meet the condition RakeSnapshots.RakeCode equals ProductMovements.ProductCode
Following are my current options:
// Code 1:
var resultCount = ProductMovements.Where(x => RakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Any(y => y.RakeCode == x.ProductCode &&
x.Status == "InProgress"))
.Count();
// Code 2:
var productMovementsInprogress = ProductMovements.Where(x => x.Status == "InProgress");
var rakeSnapShotsDumping = RakeSnapshots.Where(r => r.StatusCode == "Dumping");
var resultCount = productMovementsInprogress.Zip(rakeSnapShotsDumping,(x,y) => (y.RakeCode == x.ProductCode) ? true : false)
.Where(x => x).Count();
The challenge is that both versions have O(n^2) complexity. Is there a way to improve this? It will hurt if the data is very large.
You can use an inner join to do this:
var dumpingRakeSnapshots = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");
var matches =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
select r;
int count = matches.Count(); // Here's the answer.
Note that (as Ivan Stoev points out) this only works if RakeCode is the primary key of RakeSnapshots.
If it is not, you will have to use a grouped join.
Here's the Linq query syntax version that you should use in that case, but note that this is exactly the same as Ivan's answer (only in Linq query form):
var matches =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
select gj;
For completeness, here's a compilable console app that demonstrates the different results you'll get if RakeCode and ProductCode are not primary keys:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp1
{
class RakeSnapshot
{
public string StatusCode;
public string RakeCode;
}
class ProductMovement
{
public string Status;
public string ProductCode;
}
sealed class Program
{
void run()
{
var rakeSnapshots = new List<RakeSnapshot>
{
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "2"}
};
var productMovements = new List<ProductMovement>
{
new ProductMovement {Status = "InProgress", ProductCode = "1"},
new ProductMovement {Status = "InProgress", ProductCode = "2"},
new ProductMovement {Status = "InProgress", ProductCode = "2"}
};
var dumpingRakeSnapshots = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");
// Inner join.
var matches1 =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
select r;
Console.WriteLine(matches1.Count());
// Grouped join.
var matches2 =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
select gj;
Console.WriteLine(matches2.Count());
// OP's code.
var resultCount =
productMovements
.Count(x => rakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Any(y => y.RakeCode == x.ProductCode && x.Status == "InProgress"));
Console.WriteLine(resultCount);
}
static void Main(string[] args)
{
new Program().run();
}
}
}
Sounds like Group Join which (as well as Join) is the most efficient LINQ way of correlating two sets:
var resultCount = ProductMovements.Where(p => p.Status == "InProgress")
.GroupJoin(RakeSnapshots.Where(r => r.StatusCode == "Dumping"),
p => p.ProductCode, r => r.RakeCode, (p, match) => match)
.Count(match => match.Any());
The time complexity of the above is O(N+M).
Normally, with an O(N^2), you'd look to create an intermediate 'search' data structure which speeds up the lookup. Something like a hash table for O(1) access, or a sorted list for O(log N) access.
Technically, you have two different lists, so the actual order would be O(P.R), where P is the number of product movements, and R is the number of rake snapshots.
In your case, this is your original code;
var resultCount = ProductMovements
.Where(x => RakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Any(y => y.RakeCode == x.ProductCode &&
x.Status == "InProgress"))
.Count();
This is O(P.R) because, for each P, the inner Where clause loops through every R. I'd look at creating a Dictionary<TKey, TValue> or HashSet<T>, then transforming your code into something like:
var rakeSnapshotSummary = ... magic happens here ...;
var resultCount = ProductMovements
.Where(x => rakeSnapshotSummary[x.ProductCode] == true)
.Count();
In this way, creating the snapshot is O(R), each lookup into the data structure is O(1), and creating the result is O(P), for a much healthier O(P+R). I think that is as good as it can be.
So my suggestion for your indexing routine would be something like;
var rakeSnapshotSummary = new HashSet<string>(RakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Select(r => r.RakeCode));
This creates a HashSet<string>, which has O(1) time complexity for testing the existence of a rake code. Then your final line looks like:
var resultCount = ProductMovements
.Where(x => x.Status == "InProgress" && rakeSnapshotSummary.Contains(x.ProductCode))
.Count();
So overall, O(P+R) or, roughly, O(2N) => O(N).
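As a rough check of the HashSet approach, here is a self-contained sketch that reuses the sample values from the console app earlier in this thread. The tuple shapes and the method names CountWithHashSet and CountWithInnerJoin are illustrative assumptions. It also contrasts the result with a plain inner join, which counts matching pairs and therefore returns a larger number when codes repeat.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountSketch
{
    // Counts in-progress movements whose code appears among the dumping rakes.
    public static int CountWithHashSet(
        IEnumerable<(string StatusCode, string RakeCode)> rakes,
        IEnumerable<(string Status, string ProductCode)> movements)
    {
        var dumpingCodes = new HashSet<string>(
            rakes.Where(r => r.StatusCode == "Dumping").Select(r => r.RakeCode));

        return movements.Count(m =>
            m.Status == "InProgress" && dumpingCodes.Contains(m.ProductCode));
    }

    // Counts matching (rake, movement) pairs, so duplicate codes multiply.
    public static int CountWithInnerJoin(
        IEnumerable<(string StatusCode, string RakeCode)> rakes,
        IEnumerable<(string Status, string ProductCode)> movements)
    {
        return rakes.Where(r => r.StatusCode == "Dumping")
            .Join(movements.Where(m => m.Status == "InProgress"),
                  r => r.RakeCode, m => m.ProductCode, (r, m) => r)
            .Count();
    }

    public static void Main()
    {
        var rakes = new[] { ("Dumping", "1"), ("Dumping", "1"), ("Dumping", "2") };
        var movements = new[] { ("InProgress", "1"), ("InProgress", "2"), ("InProgress", "2") };

        Console.WriteLine(CountWithHashSet(rakes, movements));   // 3
        Console.WriteLine(CountWithInnerJoin(rakes, movements)); // 4
    }
}
```

With this data the HashSet version agrees with the original nested Where/Any query (3), while the inner join does not.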

Groupby and selectMany and orderby doesn't bring back the data I need

I have two lists, rows1 and rows2. This is the data for rows1:
and the data for rows2:
I concatenate these two lists into one:
var rows = rows1.Concat(rows2).ToList();
The result would be this:
Then I want to group by a few fields, order by other fields, and make some changes to some of the data. This is my code:
var results = rows.GroupBy(row => new { row.FromBayPanel, row.TagNo })
.SelectMany(g => g.OrderBy(row => row.RowNo)
.Select((x, i) =>
new
{
TagGroup = x.TagGroup,
RowNo = (i == 0) ? (j++).ToString() : "",
TagNo = (i == 0) ? x.TagNo.ToString() : "",
FromBayPanel = x.FromBayPanel,
totalItem = x.totalItem
}).ToList());
which brings me back this result:
This is not what I want; I want this result instead. I want all data with the same "FromBayPanel" to be listed together.
Which part of my code is wrong?
I think when you want to order the elements within your group you have to use a different approach as SelectMany will simply flatten your grouped items into one single list. Thus instead of rows.GroupBy(row => new { row.FromBayPanel, row.TagNo }).SelectMany(g => g.OrderBy(row => row.RowNo) you may use this:
rows.OrderBy(x => x.FromBayPanel).ThenBy(x => x.TagNo) // this preserves the actual group-condition
.ThenBy(x => x.RowNo) // here you order the items within each group by RowNo
.GroupBy(row => new { row.FromBayPanel, row.TagNo })
.Select(...)
EDIT: You have to make your select WITHIN every group, not afterwards:
rows.GroupBy(row => new { row.FromBayPanel, row.TagNo })
.ToDictionary(x => x.Key,
x => x.OrderBy(y => y.RowNo)
.Select((y, i) =>
new
{
TagGroup = y.TagGroup,
RowNo = (i == 0) ? (j++).ToString() : "",
TagNo = (i == 0) ? y.TagNo.ToString() : "",
FromBayPanel = y.FromBayPanel,
totalItem = y.totalItem
})
)
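If a flat list is needed rather than a dictionary, one option is to sort the groups themselves before flattening them. The sketch below is an assumption-laden illustration: the Row record and its three fields are hypothetical (the real row type has more fields), records require C# 9 or later, and the expected layout is inferred from the sentence about keeping equal FromBayPanel values together. It replaces the external counter j with the group index supplied by the indexed SelectMany overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; only the fields used below are modelled.
public sealed record Row(string FromBayPanel, int TagNo, int RowNo);

public static class GroupedRowsSketch
{
    public static List<(string RowNo, string TagNo, string FromBayPanel)> Flatten(
        IEnumerable<Row> rows)
    {
        return rows
            .GroupBy(r => (r.FromBayPanel, r.TagNo))
            // Sorting the groups keeps equal FromBayPanel values adjacent.
            .OrderBy(g => g.Key.FromBayPanel)
            .ThenBy(g => g.Key.TagNo)
            // The indexed SelectMany overload supplies the group number.
            .SelectMany((g, groupIndex) => g
                .OrderBy(r => r.RowNo)
                .Select((r, i) => (
                    RowNo: i == 0 ? (groupIndex + 1).ToString() : "",
                    TagNo: i == 0 ? r.TagNo.ToString() : "",
                    FromBayPanel: r.FromBayPanel)))
            .ToList();
    }

    public static void Main()
    {
        var rows = new[] { new Row("B", 7, 1), new Row("A", 5, 2), new Row("A", 5, 1) };

        foreach (var line in Flatten(rows))
            Console.WriteLine($"{line.RowNo}|{line.TagNo}|{line.FromBayPanel}");
        // Prints three lines: 1|5|A, then ||A, then 2|7|B
    }
}
```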

Translating SQL to lambda with groupby

I'm trying to translate this sql statement
SELECT row, SUM(value) as VarSum, AVG(value) as VarAve, COUNT(value) as TotalCount
FROM MDNumeric
WHERE collectionid = 6 and varname in ('C3INEV1', 'C3INEVA2', 'C3INEVA3', 'C3INVA11', 'C3INVA17', 'C3INVA19')
GROUP BY row
into an EF 4 query using lambda expressions and am missing something.
I have:
sumvars = sv.staticvararraylist.Split(',');
var aavresult = _myIFR.MDNumerics
.Where(r => r.collectionid == _collid)
.Where(r => sumvars.Contains(r.varname))
.GroupBy(r1 =>r1.row)
.Select(rg =>
new
{
Row = rg.Key,
VarSum = rg.Sum(p => p.value),
VarAve = rg.Average(p => p.value),
TotalCount = rg.Count()
});
where the staticvararraylist has the string 'C3INEV1', 'C3INEVA2', 'C3INEVA3', 'C3INVA11', 'C3INVA17', 'C3INVA19' (without single quotes) and the _collid variable = 6.
While I'm getting the correct grouping, my sum, average, & count values aren't correct.
You didn't post your error message, but I suspect it's related to Contains. I've found that Any works just as well.
This should get you quite close:
var result =
from i in _myIFR.MDNumerics
where i.collectionid == _collid && sumvars.Any(v => i.varname == v)
group i by i.row into g
select new {
row = g.Key,
VarSum = g.Sum(p => p.value),
VarAve = g.Average(p => p.value),
TotalCount = g.Count()
};
Try this:
var aavresult = _myIFR.MDNumerics
.Where(r => r.collectionid == _collid && sumvars.Contains(r.varname))
.GroupBy(r1 =>r1.row,
(key,res) => new
{
Row = key,
VarSum = res.Sum(r1 => r1.value),
VarAve = res.Average(r1 => r1.value),
TotalCount = res.Count()
});
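One possible cause worth ruling out, offered as an assumption rather than a diagnosis: if staticvararraylist is stored with a space after each comma, as it is displayed in the question, Split(',') yields names with a leading space, and Contains then matches only the first variable, which would make the sums, averages, and counts too low. The helper name ParseNames and the sample string below are hypothetical.

```csharp
using System;
using System.Linq;

public static class VarNameSketch
{
    // Splits a comma-separated list and trims whitespace around each name.
    public static string[] ParseNames(string csv)
    {
        return csv
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(name => name.Trim())
            .ToArray();
    }

    public static void Main()
    {
        // Assumed storage format: names separated by a comma and a space.
        const string stored = "C3INEV1, C3INEVA2, C3INEVA3";

        var naive = stored.Split(',');
        var trimmed = ParseNames(stored);

        Console.WriteLine(naive.Contains("C3INEVA2"));   // False (element is " C3INEVA2")
        Console.WriteLine(trimmed.Contains("C3INEVA2")); // True
    }
}
```

If the stored string has no spaces, this is not the problem and the query itself should be compared against the generated SQL.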
