Cleaning up a simple foreach with linq - c#

The following method is pretty simple: I'm trying to determine a line-item rate by matching another property of the line item against a lookup from a parent object. There are a few things I don't like about it, and I'm looking for elegant solutions to make the method smaller, more efficient, or both. It works in its current state and it isn't noticeably inefficient. This isn't mission critical, more of a curiosity.
private decimal CalculateLaborTotal()
{
decimal result = 0;
foreach (ExtraWorkOrderLaborItem laborItem in Labor)
{
var rates = (from x in Project.ContractRates where x.ProjectRole.Name == laborItem.ProjectRole.Name select x).ToList();
if (rates != null && rates.Count() > 0)
{
result += laborItem.Hours * rates[0].Rate;
}
}
return result;
}
I like the idea of using List<T>.ForEach(), but I was having some trouble keeping it succinct enough to still be easy to read/maintain. Any thoughts?

Something like this should do it (untested!):
var result =
(from laborItem in Labor
let rate = (from x in Project.ContractRates
where x.ProjectRole.Name == laborItem.ProjectRole.Name
select x).FirstOrDefault()
where rate != null
select laborItem.Hours * rate.Rate).Sum();
Or (assuming only one rate can match) a join would be even neater:
var result =
(from laborItem in Labor
join rate in Project.ContractRates
on laborItem.ProjectRole.Name equals rate.ProjectRole.Name
select laborItem.Hours * rate.Rate).Sum();

Okay, well how about this:
// Lookup from name to IEnumerable<decimal>, assuming Rate is a decimal
var ratesLookup = Project.ContractRates.ToLookup(x => x.ProjectRole.Name,
x => x.Rate);
var query = (from laborItem in Labor
let rate = ratesLookup[laborItem.ProjectRole.Name].FirstOrDefault()
select laborItem.Hours * rate).Sum();
The advantage here is that you don't need to look through a potentially large list of contract rates every time - you build the lookup once. That may not be an issue, of course.
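As a minimal, self-contained sketch of the lookup idea, the snippet below uses made-up tuples in place of the real ContractRates and Labor collections (the role names, rates, and hours are assumptions for illustration only). It shows that a missing key yields an empty sequence, so unmatched labor items contribute zero.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for Project.ContractRates and Labor.
var contractRates = new[]
{
    (Role: "Electrician", Rate: 50m),
    (Role: "Plumber", Rate: 40m),
};
var labor = new[]
{
    (Role: "Electrician", Hours: 2m),
    (Role: "Plumber", Hours: 3m),
    (Role: "Painter", Hours: 5m), // no matching rate
};

// Build the lookup once: role name -> sequence of rates.
var ratesLookup = contractRates.ToLookup(r => r.Role, r => r.Rate);

// Indexing a lookup with a missing key yields an empty sequence,
// so FirstOrDefault() returns 0m and unmatched items add nothing.
decimal total = labor.Sum(item => item.Hours * ratesLookup[item.Role].FirstOrDefault());

Console.WriteLine(total); // 2*50 + 3*40 + 5*0 = 220
```

Whether silently treating a missing rate as zero is acceptable depends on the application; the original loop skipped such items, which gives the same total.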

Omit the rates != null check: a LINQ query may be empty, but the list returned by ToList() is never null.
If you just need the first element of a sequence, use First or FirstOrDefault.
There is no advantage in using List<T>.ForEach here.

Related

How to implement a linq expression for maximum in different nodes?

I have the following xml:
https://pastebin.pl/view/63af9294
I need to find the name of the battle in which the largest army fought.
(Yes, it's schoolwork, but I'm not getting any closer.)
So far my code looks like:
var q10 = (from a in
(from x in xdoc.Descendants("size") select x)
join b in
(from y in xdoc.Descendants("battle") select y)
on a equals b.Element("attacker").Element("size")
select new
{
size = a.Value,
battle = b.Element("name")
});
I am trying to get the highest number first, then join every battle by the size, and then use
.Max(x => x.size), but as you can see I have no clue how to do it for two different nodes (or whatever they are called).
I mean, I can join the attacker or the defender based on the size of an army, but I cannot join both of them at the same time unless I use two joins, and I guess it could be done much more easily and neatly. I don't want the code written for me, I just need some tips.
If I correctly understand what you need, I would go this way:
var battle = xdoc.Descendants("battle")
.OrderByDescending(b => GetMaxBattleArmySize(b))
.First();
where GetMaxBattleArmySize is
private static int GetMaxBattleArmySize(XElement battle)
{
XElement attacker = battle.Element("attacker");
XElement defender = battle.Element("defender");
var attackerSizeEl = attacker.Element("size");
var defenderSizeEl = defender.Element("size");
var attackerSize = attackerSizeEl == null
? 0
: int.Parse(attackerSizeEl.Value);
var defenderSize = defenderSizeEl == null
? 0
: int.Parse(defenderSizeEl.Value);
return Math.Max(attackerSize, defenderSize);
}
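A compact variant of the same idea, as a sketch only: the XML below is invented for illustration (the pastebin contents are not reproduced here), with element names taken from the question. It relies on the explicit XElement to int? conversion, which returns null for a missing element, and the SizeOf helper is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical XML; the element names follow the question, the data is made up.
var xdoc = XDocument.Parse(@"
<battles>
  <battle><name>A</name><attacker><size>1000</size></attacker><defender><size>4000</size></defender></battle>
  <battle><name>B</name><attacker><size>6000</size></attacker><defender /></battle>
  <battle><name>C</name><attacker><size>2000</size></attacker><defender><size>3000</size></defender></battle>
</battles>");

// (int?)element is null when the element is missing, so ?? 0 covers absent sizes.
int SizeOf(XElement battle, string side) =>
    (int?)battle.Element(side)?.Element("size") ?? 0;

// Order battles by the larger of the two armies and take the first one.
var largest = xdoc.Descendants("battle")
    .OrderByDescending(b => Math.Max(SizeOf(b, "attacker"), SizeOf(b, "defender")))
    .First();

Console.WriteLine((string)largest.Element("name")); // B
```

Sorting the whole sequence just to take the first element is fine for a small document; for large inputs a single pass that tracks the maximum would avoid the sort.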

LINQ Select Last & Unique Record from a DB using List

I have a table where all vehicles are registered and another table where I have millions of pings for each registered vehicle.
I'm trying to select the last ping from each vehicle that has sent a ping in the last 30 minutes using a LINQ query. I've written the code below using a foreach, but I'm not sure it is the best way to do it.
Is there a better way to select this in a single query? I know that I can group by vehicle_fleetNumber, but I couldn't get the proper result because Take() limits the final result.
var timeRestriction = DateTime.UtcNow.AddMinutes(-30);
var x = _db.Vehicles.Where(r=> r.isActive.Equals(true) && r.helperLastPing > timeRestriction);
foreach (var vehicle in x)
{
var firstOrDefault = _db.Tracks.OrderByDescending(r => r.collectedOn)
.FirstOrDefault(r => r.vehicle_fleetNumber.Equals(vehicle.fleetNumber));
}
return View();
Thank you,
Yes, you should do it in the database by joining both tables and using GroupBy:
var query = from v in _db.Vehicles
join t in _db.Tracks
on v.fleetNumber equals t.vehicle_fleetNumber
where v.isActive && v.helperLastPing > timeRestriction
group t by t.vehicle_fleetNumber into vehicleGroup
select vehicleGroup.OrderByDescending(x => x.collectedOn).First();
foreach(var track in query)
{
// ...
}
Instead of the foreach you can also use query.ToArray or ToList; I don't know what you want to do with the result.
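The "group, then take the newest row per group" shape can be tried out in memory first. This is a LINQ to Objects sketch with invented data (the tuple fields and timestamps are assumptions); how a database provider translates the same shape to SQL depends on the provider and version.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory pings; in the question these would come from _db.Tracks.
var start = new DateTime(2024, 1, 1, 12, 0, 0, DateTimeKind.Utc);
var tracks = new[]
{
    (FleetNumber: "V1", CollectedOn: start.AddMinutes(1)),
    (FleetNumber: "V1", CollectedOn: start.AddMinutes(9)),
    (FleetNumber: "V2", CollectedOn: start.AddMinutes(5)),
    (FleetNumber: "V2", CollectedOn: start.AddMinutes(3)),
};

// One group per vehicle, then the newest ping inside each group.
var latest = tracks
    .GroupBy(t => t.FleetNumber)
    .Select(g => g.OrderByDescending(t => t.CollectedOn).First())
    .OrderBy(t => t.FleetNumber)
    .ToList();

foreach (var t in latest)
    Console.WriteLine($"{t.FleetNumber}: {t.CollectedOn:O}");
```

Each group is guaranteed to be non-empty, which is why First() is safe inside the Select.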
If you get MoreLinq from NuGet you will find the MaxBy() method.
For example, in a different context:
//get the correct exchange rate
var rateList = _db.lists_ExchangeRates.Where(
rates => rates.Currency == currencyCode);
Decimal? exRate = rateList.MaxBy(rates => rates.LastUpdated).ExchangeRate;
See also the related question "MoreLinq maxBy vs LINQ max + where" for additional information.
In my case, when I want the last record that has been saved, I use this method:
var id = db.DPSlips.Max(item => item.Id);
So I thought this might work as well; just try it:
var timeRestriction = DateTime.UtcNow.AddMinutes(-30);
var x = _db.Vehicles.Max(a => a.isActive == true && a.helperLastPing > timeRestriction);

Multiple LINQ to SQL queries. Enumeration optimisation

I have a list of ~15,000 teams, each of which needs an individual LINQ query to return results.
Namely: select the last 10 games for the team.
public IEnumerable<ResultsByDate> SelectLast10Games(DateTime date, string team)
{
return
( from e in db.E0s
where e.DateFormatted < date &&
(e.HomeTeam == team || e.AwayTeam == team)
orderby e.Date descending
select new ResultsByDate
{
Date = e.Date,
HomeTeam = e.HomeTeam,
AwayTeam = e.AwayTeam,
HomeGoals = e.FTHG,
AwayGoals = e.FTAG
}
).Take(10);
}
This query is probably fine, it seems fast enough when called 15,000 times.
My real issue is that I have to enumerate each query and this really kills the performance.
For each of these queries I need to run a method on the 10 results and hence the queries need enumerating.
The question is how can I avoid 15,000 enumerations?
I thought about placing each of the results into a big list and then calling .ToList() or whatever's best, but adding to a List enumerates as it goes along so this doesn't seem viable.
Is there a way to combine all 15,000 LINQ queries into one giant LINQ query such as..
public IEnumerable<ResultsByDate> SelectLast10Games(DateTime date, List<string> Teams)
{
foreach(var team in Teams)
{ var query =
(from e in db.E0s
where e.DateFormatted < date &&
(e.HomeTeam == team || e.AwayTeam == team)
orderby e.Date descending
select new ResultsByDate
{
Date = e.Date,
HomeTeam = e.HomeTeam,
AwayTeam = e.AwayTeam,
HomeGoals = e.FTHG,
AwayGoals = e.FTAG
}
).Take(10);
}
}
So this would return one huge result set that I can then enumerate in one go and work from there?
I have tried but I can't seem to get the LINQ loop correct ( if it's even possible - and the best way to fix my issue).
The whole program takes ~29 minutes to complete. Without the enumeration it's around 30 seconds, which is not amazing but satisfactory given the criteria.
Thanks!
This can be accomplished using Teams.Select(team => ..):
var query = Teams
.Select(team =>
db.E0s
.Where(e => e.DateFormatted < date && (e.HomeTeam == team || e.AwayTeam == team))
.OrderByDescending(e => e.Date)
.Select(
e =>
new ResultsByDate {
Date = e.Date,
HomeTeam = e.HomeTeam,
AwayTeam = e.AwayTeam,
HomeGoals = e.FTHG,
AwayGoals = e.FTAG
}
)
.Take(10)
);
If you're looking for the best performance under heavy querying, consider using a SQL stored procedure and calling it via ADO.NET, Dapper, or Entity Framework (listed from the most optimal to the most trivial). My recommendation is Dapper. This will speed up your query, especially if the table is indexed correctly.
To feed 15k parameters efficiently to the server, you can use a table-valued parameter (TVP):
http://blog.mikecouturier.com/2010/01/sql-2008-tvp-table-valued-parameters.html
Regarding "My real issue is that I have to enumerate each query and this really kills the performance":
Unless you enumerate the result, there is no call to the server, so it is no wonder that it is fast without enumeration. That does not mean the enumeration itself is the problem.
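Deferred execution is easy to observe with an in-memory stand-in for the database. In this sketch the counter and the sample range are assumptions made for illustration; the point is only that building a query does no work until something enumerates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;

// Stand-in for a database table; the counter records when the work happens.
IEnumerable<int> source = Enumerable.Range(1, 5);

var query = source.Where(n =>
{
    evaluations++;      // runs only when the query is enumerated
    return n % 2 == 1;
});

Console.WriteLine(evaluations); // 0: building the query did no work

var results = query.ToList();   // enumeration executes the query

Console.WriteLine(evaluations); // 5: every element was tested
Console.WriteLine(results.Count); // 3 (1, 3, 5)
```

With a database-backed query the same rule applies: the 30-second run without enumeration most likely never sent the queries at all, so the two timings are not comparable.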

How can I make this LINQ query of an Enumerable DataTable of GTFS data faster?

I'm working with the GTFS data for the New York City MTA subway system. I need to find the stop times for each route at a specific stop. To do that, I get the stop times from a StopTimes DataTable that I have, for a specific stop_id. I only want stop times between now and the next 2 hours.
Then, I need to look up the trip for each stop time, using the trip_id value. From that trip, I have to look up the route, using the route_id value, in order to get the route name or number for the stop time.
Here are the counts for each DataTable: StopTimes(522712), Trips(19092), Routes(27).
Right now, this takes anywhere from 20 seconds to 40 seconds to execute. How can I speed this up? Any and all suggestions are appreciated. Thanks!
foreach (var r in StopTimes.OrderBy(z => z.Field<DateTime>("departure_time").TimeOfDay)
.Where(z => z.Field<string>("stop_id") == stopID &&
z["departure_time"].ToString() != "" &&
z.Field<DateTime>("departure_time").TimeOfDay >= DateTime.UtcNow.AddHours(-5).TimeOfDay &&
z.Field<DateTime>("departure_time").TimeOfDay <= DateTime.UtcNow.AddHours(-5).AddHours(2).TimeOfDay))
{
var trip = (from z in Trips
where z.Field<string>("trip_id") == r.Field<string>("trip_id") &&
z["route_id"].ToString() != ""
select z).Single();
var route = (from z in Routes
where z.Field<string>("route_id") == trip.Field<string>("route_id")
select z).Single();
// do stuff (not time-consuming)
}
Try this:
var now = DateTime.UtcNow;
var tod0 = now.AddHours(-5).TimeOfDay;
var tod1 = now.AddHours(-5).AddHours(2).TimeOfDay;
var sts =
from st in StopTimes
let StopID = st.Field<string>("stop_id")
where StopID == stopID
where st["departure_time"].ToString() != ""
let DepartureTime = st.Field<DateTime>("departure_time").TimeOfDay
where DepartureTime >= tod0
where DepartureTime <= tod1
let TripID = st.Field<string>("trip_id")
select new
{
StopID,
TripID,
DepartureTime,
};
Note that there is no orderby in this query and that we're returning an anonymous type. For your "do stuff (not time-consuming)" code to run you may need to add some more properties.
The same approach applies to Trips and Routes.
var ts =
from t in Trips
where t["route_id"].ToString() != ""
let TripID = t.Field<string>("trip_id")
let RouteID = t.Field<string>("route_id")
select new
{
TripID,
RouteID,
};
var rs =
from r in Routes
let RouteID = r.Field<string>("route_id")
select new
{
RouteID,
};
Since each lookup returns a single record, ToDictionary(...) is a good choice.
var tripLookup = ts.ToDictionary(t => t.TripID);
var routeLookup = rs.ToDictionary(r => r.RouteID);
Now your query looks like this:
var query = from StopTime in sts.ToArray()
let Trip = tripLookup[StopTime.TripID]
let Route = routeLookup[Trip.RouteID]
orderby StopTime.DepartureTime
select new
{
StopTime,
Trip,
Route,
};
Notice that I've used .ToArray() and I've put the orderby right at the end.
And you run your code like this:
foreach (var q in query)
{
// do stuff (not time-consuming)
}
Let me know if this helps.
I would make a Dictionary<string, Trip> from Trips where the key is the trip_id, and a Dictionary<string, Route> from Routes where the key is the route_id (the keys are strings in the question's code). Your code is iterating over the 19092 items in Trips once for every item in the filtered IEnumerable<StopTime>. Same deal for Routes, but at least there are only 27 items in there.
Edit:
Actually, looking at it more closely, the first dictionary would be a Dictionary<string, string> where the value is the route_id. And given that each trip_id maps to exactly one route_id, you could just build a dictionary from trip_id to Route and do one lookup.
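A sketch of the single-lookup dictionary, under the assumption that trip and route ids are strings as in the question's Field<string> calls. The tuples and names below are invented stand-ins for the DataTable rows.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the Trips and Routes DataTables.
var trips = new[]
{
    (TripId: "T1", RouteId: "A"),
    (TripId: "T2", RouteId: "A"),
    (TripId: "T3", RouteId: "7"),
};
var routes = new[]
{
    (RouteId: "A", Name: "A Line"),
    (RouteId: "7", Name: "7 Line"),
};

// Build both indexes once; ToDictionary throws if a key repeats.
var routeById = routes.ToDictionary(r => r.RouteId);
var routeByTripId = trips.ToDictionary(t => t.TripId, t => routeById[t.RouteId]);

// Each stop time now costs one hash lookup instead of a scan.
var stopTimeTripIds = new[] { "T3", "T1" };
var names = stopTimeTripIds.Select(id => routeByTripId[id].Name).ToList();

Console.WriteLine(string.Join(", ", names)); // 7 Line, A Line
```

The dictionaries cost one pass over Trips and Routes up front, after which the per-stop-time work no longer depends on the size of either table.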
It helps to understand deferred query execution so you can make case by case decisions on how to optimize your runtime. Here is a good blog post that can get you started: http://ox.no/posts/linq-vs-loop-a-performance-test

What's faster? Struct array or DataTable

I am using LinqToSQL to process data from SQL Server to dump it into an iSeries server for further processing. More details on that here.
My problem is that it is taking about 1.25 minutes to process those 350 rows of data. I am still trying to decipher the results from the SQL Server Profiler, but there are a TON of queries being run. Here is a bit more detail on what I am doing:
using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
var vehicles = from a in db.EquipmentMainGenerals
join b in db.EquipmentMainConditions on a.wdEquipmentMainGeneralOID equals b.wdEquipmentMainGeneralOID
where b.Retired == null
orderby a.VehicleId
select a;
et = new EquipmentTable[vehicles.Count()];
foreach (var vehicle in vehicles)
{
// Move data to the array
// Rates
GetVehcileRates(vehicle.wdEquipmentMainGeneralOID);
// Build the costs accumulators
GetPartsAndOilCosts(vehicle.VehicleId);
GetAccidentAndOutRepairCosts(vehicle.wdEquipmentMainGeneralOID);
// Last Month's Accumulators
et[i].lastMonthActualGasOil = GetFuel(vehicle.wdEquipmentMainGeneralOID) + Convert.ToDecimal(oilCost);
et[i].lastMonthActualParts = Convert.ToDecimal(partsCost);
et[i].lastMonthActualLabor = GetLabor(vehicle.VehicleId);
et[i].lastMonthActualOutRepairs = Convert.ToDecimal(outRepairCosts);
et[i].lastMonthActualAccidentCosts = Convert.ToDecimal(accidentCosts);
// Move more data to the array
i++;
}
}
The Get methods all look similar to:
private void GetPartsAndOilCosts(string vehicleKey)
{
oilCost = 0;
partsCost = 0;
using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
try
{
var costs = from a in db.WorkOrders
join b in db.MaterialLogs on a.WorkOrderId equals b.WorkOrder
join c in db.Materials on b.wdMaterialMainGeneralOID equals c.wdMaterialMainGeneralOID
where (monthBeginDate.Date <= a.WOClosedDate && a.WOClosedDate <= monthEndDate.Date) && a.EquipmentID == vehicleKey
group b by c.Fuel into d
select new
{
isFuel = d.Key,
totalCost = d.Sum(b => b.Cost)
};
foreach (var cost in costs)
{
if (cost.isFuel == 1)
{
oilCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
}
else
{
partsCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
}
}
}
catch (InvalidOperationException e)
{
oilCost = 0;
partsCost = 0;
}
}
return;
}
My thinking here is cutting down the number of queries to the DB should speed up the processing. If LINQ does a SELECT for every record, maybe I need to load every record into memory first.
I still consider myself a beginner with C# and OOP in general (I do mostly RPG programming on the iSeries). So I am guessing I am doing something stupid. Can you help me fix my stupidity (at least with this problem)?
Update: Thought I would come back and update you on what I have discovered. It appears that the database was poorly designed. Whatever LINQ was generating in the background, it was highly inefficient. I am not saying LINQ is bad, it was just bad for this database. I converted to a quickly thrown together .XSD setup and the processing time went from 1.25 minutes to 15 seconds. Once I do a proper redesign, I can only guess I'll shave a few more seconds off of that. Thank you all for your comments. I'll try LINQ again some other day on a better database.
There are a few things that I spot in your code:
1. You query the database multiple times for each item in the 'var vehicles' query. You might want to rewrite that query so that fewer database queries are needed.
2. When you don't need all the properties of the queried entity, or need sub-entities of that entity, it's better for performance to use an anonymous type in your select. LINQ to SQL will analyze this and retrieve less data from your database. Such a select might look like this: select new { a.VehicleId, a.Name }
3. The query in GetPartsAndOilCosts can be optimized by putting the calculation cost.totalCost * (1 + OVERHEAD_RATE) in the LINQ query. This way the query can be executed in the database completely, which should make it much faster.
4. You are doing a Count() on the var vehicles query, but you only use it to determine the size of the array. While LINQ to SQL will turn it into a very efficient SELECT COUNT(*) query, it takes an extra round trip to the database. Besides that (depending on your isolation level), an item could be added between the count and the moment you start iterating the query. In that case your array is too small and an IndexOutOfRangeException will be thrown. You can simply use .ToArray() on the query, or create a List<EquipmentTable> and call .ToArray() on that. This will normally be fast enough, especially when you only have 380 items in this collection, and it will certainly be faster than an extra round trip to the database (the count).
As you probably already expect, the number of database queries is the actual problem. Switching between a struct array and a DataTable will not make much difference.
After you have optimized away as many queries as you can, start analyzing the queries that are left (using SQL Profiler) and optimize them using the Index Tuning Wizard. It will propose some new indexes for you that could speed things up considerably.
A little extra explanation for point #1. What you're doing here is a bit like this:
var query = from x in A select something;
foreach (var row in query)
{
var query2 = from y in B where y.Value == row.Value select something;
foreach (var row2 in query2)
{
// do some computation.
}
}
What you should try to accomplish is to remove the query2 subquery, because it is executing on each row of the top query. So you could end up with something like this:
var query =
from x in A
from y in B
where x.Value == y.Value
select something;
foreach (var row in query)
{
}
Of course this example is simplistic, and in real life it gets pretty complicated (as you've already noticed), in your case also because you've got several of those 'sub queries'. It can take you some time to get this right, especially given your limited experience with LINQ to SQL (as you said yourself).
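One way to collapse several per-row sub-queries is to group the child rows once and index the totals by key. The following is an in-memory sketch only: the tuple fields, the boolean fuel flag, and the sample values are assumptions, not the real schema, and a LINQ to SQL version would need the grouping expressed so the provider can translate it.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows; names are illustrative, not the real schema.
var vehicles = new[] { "V1", "V2", "V3" };
var costs = new[]
{
    (VehicleId: "V1", IsFuel: true, Cost: 10m),
    (VehicleId: "V1", IsFuel: false, Cost: 25m),
    (VehicleId: "V2", IsFuel: false, Cost: 40m),
    (VehicleId: "V1", IsFuel: false, Cost: 5m),
};

// One pass over the cost rows, grouped by vehicle, instead of one query per vehicle.
var totalsByVehicle = costs
    .GroupBy(c => c.VehicleId)
    .ToDictionary(
        g => g.Key,
        g => (Oil: g.Where(c => c.IsFuel).Sum(c => c.Cost),
              Parts: g.Where(c => !c.IsFuel).Sum(c => c.Cost)));

foreach (var id in vehicles)
{
    // Vehicles without cost rows fall back to zero totals.
    var t = totalsByVehicle.TryGetValue(id, out var found) ? found : (Oil: 0m, Parts: 0m);
    Console.WriteLine($"{id}: oil {t.Oil}, parts {t.Parts}");
}
```

The loop over vehicles then does dictionary lookups only, so the number of queries no longer grows with the number of vehicles.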
If you can't figure it out, you can always ask again here at Stack Overflow, but please remember to strip your problem down to the smallest possible thing, because it's no fun to read through someone's mess (we're not getting paid for this) :-)
Good luck.
