Right now, I have a class called TrainingPlan that looks like this:
public class TrainingPlan
{
public int WorkgroupId { get; set; }
public int AreaId { get; set; }
}
I'm given an array of these instances, and need to load the matching training plans from the database. The WorkgroupId and AreaId basically form a compound key. What I'm doing now is looping through each TrainingPlan like so:
foreach (TrainingPlan plan in plans)
LoadPlan(pid, plan.AreaId, plan.WorkgroupId);
Then, LoadPlan has a LINQ query to load the individual plan:
var q = from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
where tp.PROJECTID == pid && tp.AREAID == areaid &&
tp.WORKGROUPID == workgroupid
select tp;
return q.FirstOrDefault();
The Problem:
This works; however, it's very slow for a large array of plans. I believe this could be much faster if I could perform a single LINQ query that loads every matching TPM_TRAININGPLAN at once.
My Question:
Given an array of TrainingPlan objects, how can I load every matching WorkgroupId/AreaId combination at once? The query should translate into SQL similar to:
SELECT * FROM TPM_TRAININGPLANS
WHERE (AREAID, WORKGROUPID) IN ((1, 2), (3, 4), (5, 6), (7, 8));
I've used Contains to run a bulk filter similar to a SQL WHERE ... IN. I set up a rough approximation of your scenario, and the single-select queries actually ran quicker than Contains did. I recommend running a similar test on your end with the database tied in to see how your results wind up, and ideally see how it scales too. I'm running .NET 4.0 in Visual Studio 2012. I jammed in ToList() calls to force evaluation and avoid lazy-evaluation effects skewing the timings.
public class TrainingPlan
{
public int WorkgroupId { get; set; }
public int AreaId { get; set; }
public TrainingPlan(int workGroupId, int areaId)
{
WorkgroupId = workGroupId;
AreaId = areaId;
}
}
public class TrainingPlanComparer : IEqualityComparer<TrainingPlan>
{
public bool Equals(TrainingPlan x, TrainingPlan y)
{
//Check whether the compared objects reference the same data.
if (ReferenceEquals(x, y))
return true;
//Check whether either object is null.
if (ReferenceEquals(x, null) || ReferenceEquals(y, null))
return false;
//Check whether the key values match.
return x.WorkgroupId == y.WorkgroupId && x.AreaId == y.AreaId;
}
public int GetHashCode(TrainingPlan trainingPlan)
{
if (ReferenceEquals(trainingPlan, null))
return 0;
int wgHash = trainingPlan.WorkgroupId.GetHashCode();
int aHash = trainingPlan.AreaId.GetHashCode();
return wgHash ^ aHash;
}
}
internal class Class1
{
private static void Main()
{
var plans = new List<TrainingPlan>
{
new TrainingPlan(1, 2),
new TrainingPlan(1, 3),
new TrainingPlan(2, 1),
new TrainingPlan(2, 2)
};
var filter = new List<TrainingPlan>
{
new TrainingPlan(1, 2),
new TrainingPlan(1, 3),
};
Stopwatch resultTimer1 = new Stopwatch();
resultTimer1.Start();
var results = plans.Where(plan => filter.Contains(plan, new TrainingPlanComparer())).ToList();
resultTimer1.Stop();
Console.WriteLine("Elapsed Time for filtered result {0}", resultTimer1.Elapsed);
Console.WriteLine("Result count: {0}",results.Count());
foreach (var item in results)
{
Console.WriteLine("WorkGroup: {0}, Area: {1}",item.WorkgroupId, item.AreaId);
}
resultTimer1.Reset();
resultTimer1.Start();
var result1 = plans.Where(p => p.AreaId == filter[0].AreaId && p.WorkgroupId == filter[0].WorkgroupId).ToList();
var result2 = plans.Where(p => p.AreaId == filter[1].AreaId && p.WorkgroupId == filter[1].WorkgroupId).ToList();
resultTimer1.Stop();
Console.WriteLine("Elapsed time for single query result: {0}",resultTimer1.Elapsed);//single query is faster
Console.ReadLine();
}
}
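One more note: EF can't translate Contains with a custom IEqualityComparer, so the comparer above only helps once the rows are in memory. If you do want a single round trip for the composite key, here is a rough sketch, untested against your schema and with the entity and property names taken from your question: push a single-column Contains to the database (which translates to WHERE ... IN) and then apply the pair comparer in memory.
// Sketch only: one DB round trip via Contains on AreaId (WHERE AREAID IN (...)),
// then the exact (WorkgroupId, AreaId) pairs are filtered in memory using the
// comparer above. Entity and property names mirror the question.
var areaIds = plans.Select(p => p.AreaId).Distinct().ToList();

var candidates = context.TPM_TRAININGPLAN
    .Include("TPM_TRAININGPLANSOLUTIONS")
    .Where(tp => tp.PROJECTID == pid && areaIds.Contains(tp.AREAID))
    .ToList(); // may over-fetch: every plan in a matching area comes back

var wanted = new HashSet<TrainingPlan>(plans, new TrainingPlanComparer());
var results = candidates
    .Where(tp => wanted.Contains(new TrainingPlan(tp.WORKGROUPID, tp.AREAID)))
    .ToList();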
It seems to me that using Intersect() may get this done the way you want, but I don't have an environment set up to test this myself.
var q = (from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
where pid == tp.PROJECTID
select tp)
.Intersect
(from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
where plans.Any(p => p.AreaId == tp.AREAID)
select tp)
.Intersect
(from tp in context.TPM_TRAININGPLAN.Include("TPM_TRAININGPLANSOLUTIONS")
where plans.Any(p => p.WorkgroupId == tp.WORKGROUPID)
select tp);
My only concern is that Intersect could cause it to load more records into memory than you would want, but I'm unable to test whether that's the case.
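If you do try it, one way to see what would actually be sent to the server is to dump the generated SQL before enumerating. This is a sketch that assumes the classic ObjectContext-based Entity Framework that the string Include overload suggests:
// Sketch: inspect the SQL the composed query would generate. With classic
// Entity Framework, LINQ queries are ObjectQuery<T> instances; with DbContext
// this cast returns null and q.ToString() can be used instead.
var objectQuery = q as System.Data.Objects.ObjectQuery<TPM_TRAININGPLAN>;
if (objectQuery != null)
{
    Console.WriteLine(objectQuery.ToTraceString());
}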
Related
Here is a method to delete zero-quantity Inventory records from the Inventory table. I would like to reduce the code and the number of times that LINQ executes against the database.
Inventory Table
public class Inventory
{
public int itemCode { get; set; }
public decimal price { get; set; }
public decimal availQty { get; set; } // Can have Negative values.
}
example data
itemCode price availQty
1 10 10
1 12 -10
2 10 10
From the above records, I want to delete all records with itemCode == 1, as their net availQty is 0.
Here is my method
private void RemoveZeroInvs()
{
// Remove individual zero Inventorys
var rinvs = from ri in _context.Inventorys
where ri.availQty == 0
select ri;
_context.Inventorys.RemoveRange(rinvs);
_context.SaveChanges();
// Remove if group is zero in availQty, as it allows Negative Qty.
var result = from d in _context.Inventorys
group d by new
{
d.itemCode
}
into g
select new
{
g.Key.itemCode,
availQty = g.Sum(y => y.availQty)
};
var zrs = from r in result
where r.availQty == 0
select r;
foreach (var zr in zrs) // Here, zrs length may be more than 500
{
var ri = _context.Inventorys.Where(w => w.itemCode == zr.itemCode);
_context.Inventorys.RemoveRange(ri);
_context.SaveChanges();
}
}
I use ASP.NET Core 2.2. Is there any way to do this?
Also, I get the following error at the line _context.Inventorys.RemoveRange(ri); in the loop:
A command is already in progress: SELECT t."itemCode", t."availQty"
FROM (
SELECT d."itemCode", SUM(d."availQty") AS "availQty"
FROM "Inventorys" AS d
GROUP BY d."itemCode"
) AS t
var todelete = _context.Inventorys
.GroupBy(i => i.itemCode)
.Where(g => g.Sum(i => i.availQty) == 0)
.SelectMany(g => g);
This is a shorter version of your code. In terms of DB execution, one would have to compare the raw queries, but it may be lighter than yours.
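For what it's worth, here is a rough sketch of how it might be wired into your method; materializing with ToList() before calling RemoveRange also sidesteps the "command is already in progress" error, which comes from deleting rows while the grouped query is still being streamed:
// Sketch only: the grouped delete done in a single pass. Note that the
// separate "individual availQty == 0" removal from the original method is
// not covered by this query.
private void RemoveZeroInvs()
{
    var todelete = _context.Inventorys
        .GroupBy(i => i.itemCode)
        .Where(g => g.Sum(i => i.availQty) == 0)
        .SelectMany(g => g)
        .ToList(); // materialize before modifying the context

    _context.Inventorys.RemoveRange(todelete);
    _context.SaveChanges();
}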
I have two LINQ queries: one to get confirmedQty and another to get unconfirmedQty.
There is a condition for unconfirmedQty: it should be an average instead of a sum.
result = Sum(confirmedQty) + Avg(unconfirmedQty)
Is there any way to just write one query and get the desired result instead of writing two separate queries?
My Code
class Program
{
static void Main(string[] args)
{
List<Item> items = new List<Item>(new Item[]
{
new Item{ Qty = 100, IsConfirmed=true },
new Item{ Qty = 40, IsConfirmed=false },
new Item{ Qty = 40, IsConfirmed=false },
new Item{ Qty = 40, IsConfirmed=false },
});
int confirmedQty = Convert.ToInt32(items.Where(o => o.IsConfirmed == true).Sum(u => u.Qty));
int unconfirmedQty = Convert.ToInt32(items.Where(o => o.IsConfirmed != true).Average(u => u.Qty));
//Output => Total : 140
Console.WriteLine("Total : " + (confirmedQty + unconfirmedQty));
Console.Read();
}
public class Item
{
public int Qty { get; set; }
public bool IsConfirmed { get; set; }
}
}
Actually, the accepted answer enumerates your items collection 2N + 1 times and adds unnecessary complexity to your original solution. If I met this piece of code
(from t in items
let confirmedQty = items.Where(o => o.IsConfirmed == true).Sum(u => u.Qty)
let unconfirmedQty = items.Where(o => o.IsConfirmed != true).Average(u => u.Qty)
let total = confirmedQty + unconfirmedQty
select new { tl = total }).FirstOrDefault();
it would take me some time to understand what type of data you are projecting items into. Yes, this query is a strange projection. It creates a SelectIterator to project each item of the sequence, then it creates some range variables, which involves iterating items twice, and finally it selects the first projected item. Basically, you have wrapped your original queries in an additional, useless query:
items.Select(i => {
var confirmedQty = items.Where(o => o.IsConfirmed).Sum(u => u.Qty);
var unconfirmedQty = items.Where(o => !o.IsConfirmed).Average(u => u.Qty);
var total = confirmedQty + unconfirmedQty;
return new { tl = total };
}).FirstOrDefault();
The intent is hidden deep in the code, and you still have the same two nested queries. What can you do here? You can simplify your two queries, make them more readable, and show your intent clearly:
int confirmedTotal = items.Where(i => i.IsConfirmed).Sum(i => i.Qty);
// NOTE: Average will throw an exception if there are no unconfirmed items!
double unconfirmedAverage = items.Where(i => !i.IsConfirmed).Average(i => i.Qty);
int total = confirmedTotal + (int)unconfirmedAverage;
If performance is more important than readability, then you can calculate the total in a single pass (moved to an extension method for readability):
public static int Total(this IEnumerable<Item> items)
{
int confirmedTotal = 0;
int unconfirmedTotal = 0;
int unconfirmedCount = 0;
foreach (var item in items)
{
if (item.IsConfirmed)
{
confirmedTotal += item.Qty;
}
else
{
unconfirmedCount++;
unconfirmedTotal += item.Qty;
}
}
if (unconfirmedCount == 0)
return confirmedTotal;
// NOTE: Will not throw if there are no unconfirmed items
return confirmedTotal + unconfirmedTotal / unconfirmedCount;
}
Usage is simple:
items.Total();
BTW, the second solution from the accepted answer is not correct. It's just a coincidence that it returns the correct value, because all your unconfirmed items have equal Qty. That solution calculates a sum instead of an average. A solution with grouping would look like:
var total =
items.GroupBy(i => i.IsConfirmed)
.Select(g => g.Key ? g.Sum(i => i.Qty) : (int)g.Average(i => i.Qty))
.Sum();
Here you group the items into two groups, confirmed and unconfirmed. Then you calculate either a sum or an average based on the group key, and finally sum the two group values. This is neither a readable nor an efficient solution either, but it is correct.
Update 1, following Ayende's answer
This is my first journey into RavenDb, and to experiment with it I wrote a small map/reduce index, but unfortunately the result is empty.
I have around 1.6 million documents loaded into RavenDb
A document:
public class Tick
{
public DateTime Time;
public decimal Ask;
public decimal Bid;
public double AskVolume;
public double BidVolume;
}
and I want to get the Min and Max of Ask over a specific period of Time.
The collection by Time is defined as:
var ticks = session.Query<Tick>().Where(x => x.Time > new DateTime(2012, 4, 23) && x.Time < new DateTime(2012, 4, 24, 00, 0, 0)).ToList();
Which gives me 90280 documents, so far so good.
But then the map/reduce:
Map = rows => from row in rows
select new
{
Max = row.Bid,
Min = row.Bid,
Time = row.Time,
Count = 1
};
Reduce = results => from result in results
group result by new{ result.MaxBid, result.Count} into g
select new
{
Max = g.Key.MaxBid,
Min = g.Min(x => x.MaxBid),
Time = g.Key.Time,
Count = g.Sum(x => x.Count)
};
...
private class TickAggregationResult
{
public decimal MaxBid { get; set; }
public decimal MinBid { get; set; }
public int Count { get; set; }
}
I then create the index and try to Query it:
Raven.Client.Indexes.IndexCreation.CreateIndexes(typeof(TickAggregation).Assembly, documentStore);
var session = documentStore.OpenSession();
var g1 = session.Query<TickAggregationResult>(typeof(TickAggregation).Name);
var group = session.Query<Tick, TickAggregation>()
.Where(x => x.Time > new DateTime(2012, 4, 23) &&
x.Time < new DateTime(2012, 4, 24, 00, 0, 0)
)
.Customize(x => x.WaitForNonStaleResults())
.AsProjection<TickAggregationResult>();
But the group is just empty :(
As you can see, I've tried two different queries; I'm not sure about the difference. Can someone explain?
Now I get an error:
The groups are still empty :(
Let me explain what I'm trying to accomplish in pure SQL:
select min(Ask), count(*) as TickCount from Ticks
where Time between '2012-04-23' and '2012-04-24'
Unfortunately, Map/Reduce doesn't work that way. Well, at least the Reduce part of it doesn't. In order to reduce your set, you would have to predefine specific time ranges to group by, for example - daily, weekly, monthly, etc. You could then get min/max/count per day if you reduced daily.
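For example, a daily reduce would look something like this. It's a sketch along the lines of the Ticks_Aggregate index in the sample further down, and the Day field is something I'm introducing here rather than part of your documents:
// Sketch: a true map/reduce that pre-groups ticks by calendar day, so you can
// later query min/max/count for a given day. Day is an assumed field name.
class Ticks_ByDay : AbstractIndexCreationTask<Tick, Ticks_ByDay.Result>
{
    public class Result
    {
        public DateTime Day { get; set; }
        public decimal Min { get; set; }
        public decimal Max { get; set; }
        public int Count { get; set; }
    }

    public Ticks_ByDay()
    {
        Map = ticks => from tick in ticks
                       select new
                       {
                           Day = tick.Time.Date,
                           Min = tick.Bid,
                           Max = tick.Bid,
                           Count = 1
                       };

        Reduce = results => from result in results
                            group result by result.Day into g
                            select new
                            {
                                Day = g.Key,
                                Min = g.Min(x => x.Min),
                                Max = g.Max(x => x.Max),
                                Count = g.Sum(x => x.Count)
                            };
    }
}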
There is a way to get what you want, but it has some performance considerations. Basically, you don't reduce at all; instead you index by time and then do the aggregation when transforming results. This is similar to running your first query to filter and then aggregating in your client code. The only benefit is that the aggregation is done server-side, so you don't have to transmit all of that data to the client.
The performance concern here is how big a time range you are filtering to, or more precisely, how many items there will be inside your filter range. If it's relatively small, you can use this approach. If it's too large, you will be waiting while the server churns through the result set.
Here is a sample program that illustrates this technique:
using System;
using System.Linq;
using Raven.Client.Document;
using Raven.Client.Indexes;
using Raven.Client.Linq;
namespace ConsoleApplication1
{
public class Tick
{
public string Id { get; set; }
public DateTime Time { get; set; }
public decimal Bid { get; set; }
}
/// <summary>
/// This index is a true map/reduce, but its totals are for all time.
/// You can't filter it by time range.
/// </summary>
class Ticks_Aggregate : AbstractIndexCreationTask<Tick, Ticks_Aggregate.Result>
{
public class Result
{
public decimal Min { get; set; }
public decimal Max { get; set; }
public int Count { get; set; }
}
public Ticks_Aggregate()
{
Map = ticks => from tick in ticks
select new
{
Min = tick.Bid,
Max = tick.Bid,
Count = 1
};
Reduce = results => from result in results
group result by 0
into g
select new
{
Min = g.Min(x => x.Min),
Max = g.Max(x => x.Max),
Count = g.Sum(x => x.Count)
};
}
}
/// <summary>
/// This index can be filtered by time range, but it does not reduce anything
/// so it will not be performant if there are many items inside the filter.
/// </summary>
class Ticks_ByTime : AbstractIndexCreationTask<Tick>
{
public class Result
{
public decimal Min { get; set; }
public decimal Max { get; set; }
public int Count { get; set; }
}
public Ticks_ByTime()
{
Map = ticks => from tick in ticks
select new {tick.Time};
TransformResults = (database, ticks) =>
from tick in ticks
group tick by 0
into g
select new
{
Min = g.Min(x => x.Bid),
Max = g.Max(x => x.Bid),
Count = g.Count()
};
}
}
class Program
{
private static void Main()
{
var documentStore = new DocumentStore { Url = "http://localhost:8080" };
documentStore.Initialize();
IndexCreation.CreateIndexes(typeof(Program).Assembly, documentStore);
var today = DateTime.Today;
var rnd = new Random();
using (var session = documentStore.OpenSession())
{
// Generate 100 random ticks
for (var i = 0; i < 100; i++)
{
var tick = new Tick { Time = today.AddMinutes(i), Bid = rnd.Next(100, 1000) / 100m };
session.Store(tick);
}
session.SaveChanges();
}
using (var session = documentStore.OpenSession())
{
// Query items with a filter. This will create a dynamic index.
var fromTime = today.AddMinutes(20);
var toTime = today.AddMinutes(80);
var ticks = session.Query<Tick>()
.Where(x => x.Time >= fromTime && x.Time <= toTime)
.OrderBy(x => x.Time);
// Output the results of the above query
foreach (var tick in ticks)
Console.WriteLine("{0} {1}", tick.Time, tick.Bid);
// Get the aggregates for all time
var total = session.Query<Tick, Ticks_Aggregate>()
.As<Ticks_Aggregate.Result>()
.Single();
Console.WriteLine();
Console.WriteLine("Totals");
Console.WriteLine("Min: {0}", total.Min);
Console.WriteLine("Max: {0}", total.Max);
Console.WriteLine("Count: {0}", total.Count);
// Get the aggregates with a filter
var filtered = session.Query<Tick, Ticks_ByTime>()
.Where(x => x.Time >= fromTime && x.Time <= toTime)
.As<Ticks_ByTime.Result>()
.Take(1024) // max you can take at once
.ToList() // required!
.Single();
Console.WriteLine();
Console.WriteLine("Filtered");
Console.WriteLine("Min: {0}", filtered.Min);
Console.WriteLine("Max: {0}", filtered.Max);
Console.WriteLine("Count: {0}", filtered.Count);
}
Console.ReadLine();
}
}
}
I can envision a solution to the problem of aggregating over a time filter with a potentially large scope. The reduce would have to break things down into progressively smaller units of time at different levels. The code for this is a bit complex, but I am working on it for my own purposes. When complete, I will post it in the knowledge base at www.ravendb.net.
UPDATE
I was playing with this a bit more, and noticed two things in that last query.
You MUST do a ToList() before calling Single() in order to get the full result set.
The max you can have in the result range is 1024, and you have to specify Take(1024) or you get the default of 128. Since this runs on the server, I didn't expect that, but I guess it's because you don't normally do aggregations in the TransformResults section.
I've updated the code for this. However, unless you can guarantee that the range is small enough for this to work, I would wait for the better full map/reduce that I spoke of. I'm working on it. :)
I have 3 sets in Linq, like this:
struct Index
{
string code;
int indexValue;
}
List<Index> reviews
List<Index> products
List<Index> pages
These lists contain different codes.
I want to merge these lists as follows:
Take the first in reviews
Take the first in products
Take the first in pages
Take the second in reviews
... and so on. Note that these lists are not the same size.
How can I do this in Linq?
EDIT: Wait, is there a chance to do this without .NET 4.0?
Thank you very much
You could use Zip to do your bidding.
var trios = reviews
.Zip(products, (r, p) => new { Review = r, Product = p })
.Zip(pages, (rp, p) => new { rp.Review, rp.Product, Page = p });
Edit:
For .NET 3.5, it's possible to implement Zip quite easily, but there are a few gotchas. Jon Skeet has a great post series on how to implement the LINQ to Objects operators (for educational purposes), including this post on Zip. The source code of the whole series, edulinq, can be found on Google Code.
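A bare-bones version could look like this; it skips the argument validation that, as Jon explains, has to be split out of the iterator so it isn't deferred (one of the gotchas):
// Sketch of a .NET 3.5-compatible Zip. Argument validation is omitted; in a
// real implementation it should run eagerly, outside the iterator block.
public static class ZipExtensions
{
    public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        {
            // Stops as soon as either sequence runs out, just like the .NET 4 Zip.
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);
        }
    }
}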
The simple answer
To merge them into a common list without any common data, using the order in which they appear, you can use the Zip method:
var rows = reviews
.Zip(products, (r, p) => new { Review = r, Product = p })
.Zip(pages, (rp, page) => new { rp.Review, rp.Product, Page = page });
The problem with this solution is that the lists must be of identical length, or your result will be chopped to the length of the shortest of the three original lists.
Edit:
If you can't use .NET 4, check out Jon Skeet's blog posts on a clean-room implementation of LINQ, and his article on Zip in particular.
If you're using .NET 2, then try his library (possibly) or try LinqBridge.
How to deal with lists of different lengths
You can pre-pad the lists to the desired length. I couldn't find an existing method to do this, so I'd use an extension method:
public static class EnumerableExtensions
{
public static IEnumerable<T> Pad<T>(this IEnumerable<T> source,
int desiredCount, T padWith = default(T))
{
// Note: Not using source.Count() to avoid double-enumeration
int counter = 0;
using (var enumerator = source.GetEnumerator())
{
while (counter < desiredCount)
{
yield return enumerator.MoveNext()
? enumerator.Current
: padWith;
++counter;
}
}
}
}
You can use it like this:
var paddedReviews = reviews.Pad(desiredLength);
var paddedProducts = products.Pad(desiredLength,
new Product { Value2 = DateTime.Now }
);
Full compiling sample and corresponding output
using System;
using System.Collections.Generic;
using System.Linq;
class Review
{
public string Value1;
}
class Product
{
public DateTime Value2;
}
class Page
{
public int Value3;
}
public static class EnumerableExtensions
{
public static IEnumerable<T> Pad<T>(this IEnumerable<T> source,
int desiredCount, T padWith = default(T))
{
int counter = 0;
using (var enumerator = source.GetEnumerator())
{
while (counter < desiredCount)
{
yield return enumerator.MoveNext()
? enumerator.Current
: padWith;
++counter;
}
}
}
}
class Program
{
static void Main(string[] args)
{
var reviews = new List<Review>
{
new Review { Value1 = "123" },
new Review { Value1 = "456" },
new Review { Value1 = "789" },
};
var products = new List<Product>()
{
new Product { Value2 = DateTime.Now },
new Product { Value2 = DateTime.Now.Subtract(TimeSpan.FromSeconds(5)) },
};
var pages = new List<Page>()
{
new Page { Value3 = 123 },
};
int maxCount = Math.Max(Math.Max(reviews.Count, products.Count), pages.Count);
var rows = reviews.Pad(maxCount)
.Zip(products.Pad(maxCount), (r, p) => new { Review = r, Product = p })
.Zip(pages.Pad(maxCount), (rp, page) => new { rp.Review, rp.Product, Page = page });
foreach (var row in rows)
{
Console.WriteLine("{0} - {1} - {2}"
, row.Review != null ? row.Review.Value1 : "(null)"
, row.Product != null ? row.Product.Value2.ToString() : "(null)"
, row.Page != null ? row.Page.Value3.ToString() : "(null)"
);
}
}
}
123 - 9/7/2011 10:02:22 PM - 123
456 - 9/7/2011 10:02:17 PM - (null)
789 - (null) - (null)
On use of the Join tag
This operation isn't a logical Join. This is because you're matching on index, not on any data out of each object. Each object would have to have other data in common (besides their position in the lists) to be joined in the sense of a Join that you would find in a relational database.
I have a table which has a sort of child->child->parent setup inside it. (It's a patch-up I'm using on an existing old database, so it's a little dodgy.)
The class for the table:
public class Foo
{
public int ID {get;set;}
public int ParentID {get;set;}
public int BaseParentID {get;set;}
}
Let's say I have a few records in there:
ID: 10, ParentID: 5, BaseParentID: 5
ID: 05, ParentID: 1, BaseParentID: 5
ID: 01, ParentID: 1, BaseParentID: 0
What I want to do is get each of the ParentIDs until the BaseParentID is 0. So, in a way, it's stepping through the table from one record to another and collecting the IDs into a list.
The end result should be a list: { 10, 5, 1 }
This is what I'm doing now (there is a limit of 4 at the moment, but I'd prefer it if there were no limit):
var list = new List<int?>();
var id = 10; // The first ID is given when this method is started.
list.Add(id);
int? pid = db.Foo.Where(w => w.ID == id).Single().BaseParentID; // I have this as a compiled query function
if (pid != 0) {
list.Add(pid);
pid = db.Foo.Where(w => w.ID == pid).Single().BaseParentID; // for the sake of this example i'm just using the query here
if (pid != null) {
list.Add(pid);
// And so on
}
}
As you can see, it's a bit of a crappy way to do this, but I'm not sure whether there's a way to do it in a fancy LINQ query.
PS: The point of this is a sort of pseudo folder structure.
This is a good example of where you would write a separate iterator function:
IEnumerable<Foo> TraverseParents(Foo foo, IEnumerable<Foo> all)
{
while(foo != null)
{
yield return foo;
foo = (foo.BaseParentID == 0) ? null : all.FirstOrDefault(f => f.ID == foo.ParentID);
}
}
// In the calling code
var id = 10;
Foo root = db.Foo.FirstOrDefault(f => f.ID == id);
List<int> list = TraverseParents(root, db.Foo)
.Select(f => f.ID)
.ToList();
You can use the following method:
List<int> GetParentHierarchy(int startingId)
{
List<int> hierarchy = new List<int>();
using(Connection db = new Connection()) //change to your context
{
int parentId = startingId;
while(true)
{
var foo = db.Foo.SingleOrDefault(x => x.ID == parentId);
if(foo == null)
break;
parentId = foo.ParentID;
hierarchy.Add(foo.ID);
if(foo.BaseParentID == 0)
break;
}
}
return hierarchy;
}
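With the sample records from the question, calling it for ID 10 walks the chain up to the base parent:
// Assumes the three sample records from the question are in the table.
var hierarchy = GetParentHierarchy(10);
// hierarchy now contains { 10, 5, 1 }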