Ways to enhance nested Parallel loops - C#

I have a method that contains nested parallel loops to populate one of the properties of each item in a list.
public static string CalculateFantasyPointsLeagueSettings(ref List<Projections> projection, League league, Teams teamList)
{
    string PPR = "";
    var stats = (from a in league.Settings.StatCategories.stats
                 join b in league.Settings.StatModifiers.stats on a.StatId equals b.StatId
                 select new Tuple<string, double>(a.Name.Replace(" ", string.Empty), b.Value)).ToList();
    var props = new Projections().GetType().GetProperties();
    double receptionValue = stats.Where(a => a.Item1 == "Receptions").Select(a => a.Item2).FirstOrDefault();
    if (receptionValue == 1.0)
    {
        PPR = "PPR";
    }
    else if (receptionValue == .5)
    {
        PPR = "Half PPR";
    }
    Parallel.ForEach(projection, proj =>
    {
        double points = 0;
        Parallel.ForEach(props, prop =>
        {
            var stat = (from a in stats
                        where a.Item1 == prop.Name
                        select Convert.ToDouble(prop.GetValue(proj, null)) * a.Item2).FirstOrDefault();
            points += stat;
        });
        proj.ProjectedPoints = Math.Round(points, 2);
        proj.FantasyTeam = (from a in teamList.TeamList
                            where a.Players.player.Select(b => b.Name.Full).Contains(proj.Name)
                            select a.Name).FirstOrDefault();
    });
    return PPR;
}
This method calculates the points for a fantasy football player based on their league settings (the stats object) and their projected stats (the projection list). Since stats is not a static list (leagues can have custom settings), I need to loop through every item in that list and calculate each one by hand. To do this, I have to get the property via reflection and then use its value. Since projection has about 35k records and stats will be well over 30 items, this method takes a long time to run, probably due to reflection. I am trying to figure out if there is anything I can do to make it run faster. Right now it takes about 2-3 seconds, which is not ideal. I cannot cache the result as it is fairly dynamic. Any help would be greatly appreciated.
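As an illustration only (not part of the original question): since the matching properties are known before the loop starts, one common way to cut the reflection cost is to resolve them once, compile their getters into delegates, and run a plain sequential inner loop per projection; this also avoids the data race where the inner Parallel.ForEach increments the shared points variable from multiple threads. The helper name BuildGetter and the assumption that every matched property is numeric are mine, not the asker's.
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using System.Threading.Tasks;

// Compile a strongly typed getter so the reflection cost is paid once per property,
// not once per projection record.
static Func<Projections, double> BuildGetter(PropertyInfo prop)
{
    var p = Expression.Parameter(typeof(Projections), "p");
    var body = Expression.Convert(Expression.Property(p, prop), typeof(double));
    return Expression.Lambda<Func<Projections, double>>(body, p).Compile();
}

// Inside CalculateFantasyPointsLeagueSettings, in place of the nested loops:
var scorers = stats
    .Select(s => new { Prop = typeof(Projections).GetProperty(s.Item1), Modifier = s.Item2 })
    .Where(x => x.Prop != null)
    .Select(x => new { Getter = BuildGetter(x.Prop), x.Modifier })
    .ToList();

Parallel.ForEach(projection, proj =>
{
    double points = 0;                  // local to each projection, so no shared state
    foreach (var s in scorers)          // ~30 stats: a sequential inner loop is cheap
        points += s.Getter(proj) * s.Modifier;
    proj.ProjectedPoints = Math.Round(points, 2);
});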

Related

Get Rank from list of object depending upon some condition

I have a list of objects of a class that has totalScore as one of its properties. I want to get the rank of each team based on its totalScore. Here is the list; I declared it as List<Scoreboard> data = new List<Scoreboard>();
so data contains objects of the Scoreboard class with a total score property.
I need the rank of each team based on totalScore. Here is the code that I tried, but it gives results like Rank = 1, 2, 2, 3, whereas I need Rank = 1, 2, 2, 4.
data.OrderByDescending(x => x.totalScore)
    .GroupBy(x => x.totalScore)
    .SelectMany((g, i) => g.Select(e => new { data = e.Rank = i + 1 }))
    .ToList();
The data list contains unique teams, but their total scores may be equal, and teams with the same totalScore must share the same rank. Please help me!
If you need to update the list in-place:
int position = 0;
int rank = 0;
decimal? prevValue = null;
foreach (var item in data.OrderByDescending(x => x.totalScore))
{
    position++;
    if (prevValue != item.totalScore)
        rank = position;            // standard competition ranking: ties share a rank, then skip (1, 2, 2, 4)
    item.Rank = rank;
    prevValue = item.totalScore;
}
A different notation (which I prefer for readability) but essentially the same answer as provided by user3185569.
var i = 1;
var results = (from d in data orderby d.totalScore descending select new { Obj = d, Rank = i++ } ).ToList();
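For completeness, a small sketch (mine, not from either answer above) that produces the 1, 2, 2, 4 ranking the asker describes, by grouping on totalScore and advancing the rank by the size of each group:
int nextRank = 1;
foreach (var tier in data.GroupBy(x => x.totalScore).OrderByDescending(g => g.Key))
{
    foreach (var item in tier)
        item.Rank = nextRank;       // every tied team gets the same rank
    nextRank += tier.Count();       // skip the ranks consumed by the tied teams
}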

avoid loop to generate nullable collection, increase performance

I've been using Stopwatch, and it looks like the query below is very expensive in terms of performance, even though I already applied what various reading suggested as optimal (replacing the foreach loop with for, using arrays instead of collections, using an anonymous type so as not to pull the whole table from the DB). Is there a way to make it faster? I need to fill the prices array, which needs to be nullable. Am I missing something?
public float?[] getPricesOfGivenProducts(string[] lookupProducts)
{
    var idsAndPrices = from r in myReadings
                       select new { ProductId = r.ProductId, Price = r.Price };
    float?[] prices = new float?[lookupProducts.Length];
    for (int i = 0; i < lookupProducts.Length; i++)
    {
        string id = lookupProducts[i];
        if (idsAndPrices.Any(r => r.ProductId == id))
        {
            prices[i] = idsAndPrices.Where(p => p.ProductId == id)
                                    .Select(a => a.Price).FirstOrDefault();
        }
        else
        {
            prices[i] = null;
        }
    }
    return prices;
}
It's likely that every time you call idsAndPrices.Any(r => r.ProductId == id) you are hitting the database, because you haven't materialized the result (.ToList() would somewhat fix it). That's probably the main cause of the poor performance. However, simply loading it all into memory still means you're searching the list for a ProductId every time (twice per product, in fact).
Use a Dictionary when you're trying to do lookups.
public float?[] getPricesOfGivenProducts(string[] lookupProducts)
{
var idsAndPrices = myReadings.ToDictionary(r => r.ProductId, r => r.Price);
float?[] prices = new float?[lookupProducts.Length];
for (int i = 0; i < lookupProducts.Length; i++)
{
string id = lookupProducts[i];
if (idsAndPrices.ContainsKey(id))
{
prices[i] = idsAndPrices[id];
}
else
{
prices[i] = null;
}
}
return prices;
}
To improve this further, we can identify that we only care about products passed to us in the array. So let's not load the entire database:
var idsAndPrices = myReadings
.Where(r => lookupProducts.Contains(r.ProductId))
.ToDictionary(r => r.ProductId, r => r.Price);
Now, we might want to avoid the 'return null price if we can't find the product' scenario. Perhaps the validity of the product id should be handled elsewhere. In that case, we can make the method a lot simpler (and we won't have to rely on having the array in order, either):
public Dictionary<string, float> getPricesOfGivenProducts(string[] lookupProducts)
{
return myReadings
.Where(r => lookupProducts.Contains(r.ProductId))
.ToDictionary(r => r.ProductId, r => r.Price);
}
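A possible call site for that dictionary-returning variant (illustrative only; the product IDs below are made up), using TryGetValue so missing products are handled explicitly by the caller:
var priceById = getPricesOfGivenProducts(new[] { "P100", "P200" });
foreach (var id in new[] { "P100", "P999" })
{
    if (priceById.TryGetValue(id, out var price))
        Console.WriteLine($"{id}: {price}");
    else
        Console.WriteLine($"{id}: not found");   // "P999" was not in the lookup result
}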
And a note unrelated to performance: you should use decimal for money.
Assuming that idsAndPrices is an IEnumerable<T>, you should change its initialization to:
var idsAndPrices = (from r in myReadings select
new { ProductId = r.ProductId, Price = r.Price })
.ToList();
It's likely that the calls to:
idsAndPrices.Any(r => r.ProductId == id)
and:
idsAndPrices.Where(p => p.ProductId == id)
are causing the IEnumerable<T> to be evaluated every time they're called.
Based on
using anonymous type not to take the whole table from DB
I assume myReadings is the database table and
var idsAndPrices =
from r in myReadings
select new { ProductId = r.ProductId, Price = r.Price };
is the database query.
Your implementation is far from optimal (quite inefficient, in fact) because the above query is executed twice for each element of the lookupProducts array - once for the idsAndPrices.Any(...) statement and once for idsAndPrices.Where(...).
The optimal approach, as I see it, is to filter the database query as much as possible, and then use the most efficient LINQ to Objects method for correlating two in-memory sequences - a join, in your case a left outer join:
var dbQuery =
    from r in myReadings
    where lookupProducts.Contains(r.ProductId)
    select new { ProductId = r.ProductId, Price = r.Price };
var query =
    from p in lookupProducts
    join r in dbQuery on p equals r.ProductId into rGroup
    from r in rGroup.DefaultIfEmpty().Take(1)
    select r?.Price;
var result = query.ToArray();
The Any and FirstOrDefault calls are both O(n), and the Any is redundant. You can get roughly a 50% speed-up just by removing the Any call: FirstOrDefault will give you back null when there is no match, so use it to get the matching item directly (and remove the Select). If you want to really speed it up, you should just loop through the products once and check whether prices[p.ProductId] != null before setting prices[p.ProductId] = p.Price.
A bit of extra code there:
var idsAndPrices = (from r in myReadings
                    select new { ProductId = r.ProductId, Price = r.Price })
                   .ToList();
float?[] prices = new float?[lookupProducts.Length];
for (int i = 0; i < lookupProducts.Length; i++)
{
    string id = lookupProducts[i];
    prices[i] = idsAndPrices.FirstOrDefault(p => p.ProductId == id)?.Price;
}
better yet
var dp = new Dictionary<string, float?>();
foreach (var reading in myReadings)
    dp.Add(reading.ProductId, reading.Price);

for (int i = 0; i < lookupProducts.Length; i++)
{
    string id = lookupProducts[i];
    if (dp.ContainsKey(id))
        prices[i] = dp[id];
    else
        prices[i] = null;
}
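A small variant of that last loop (illustrative only, reusing the prices array from the snippets above): TryGetValue does the existence check and the fetch in one call, and leaves the slot null when the product is missing.
for (int i = 0; i < lookupProducts.Length; i++)
{
    dp.TryGetValue(lookupProducts[i], out prices[i]);   // missing keys yield default(float?) == null
}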

How to sort a list after AddRange?

I'm new to C#, SQL and LINQ. I have two lists: dataTransactions (fuel from gas stations) and a similar one, dataTransfers (fuel from slip tanks).
They each access a different table from SQL and get combined later.
List<FuelLightTruckDataSource> data = new List<FuelLightTruckDataSource>();
using (SystemContext ctx = new SystemContext())
{
    List<FuelLightTruckDataSource> dataTransactions
        = ctx.FuelTransaction
             .Where(tx => DbFunctions.TruncateTime(tx.DateTime) >= from.Date
                 && DbFunctions.TruncateTime(tx.DateTime) <= to.Date
                 //&& tx.AssetFilled.AssignedToEmployee.Manager
                 && tx.AssetFilled.AssignedToEmployee != null
                 && tx.AssetFilled.AssetType.Code == "L"
                 && (tx.FuelProductType.FuelProductClass.Code == "GAS" || tx.FuelProductType.FuelProductClass.Code == "DSL"))
             .GroupBy(tx => new { tx.AssetFilled, tx.DateTime, tx.FuelProductType.FuelProductClass, tx.FuelCard.FuelVendor, tx.City, tx.Volume, tx.Odometer }) // added tx.Volume to have individual transactions
             .Select(g => new FuelLightTruckDataSource()
             {
                 Asset = g.FirstOrDefault().AssetFilled,
                 Employee = g.FirstOrDefault().AssetFilled.AssignedToEmployee,
                 ProductClass = g.FirstOrDefault().FuelProductType.FuelProductClass,
                 Vendor = g.FirstOrDefault().FuelCard.FuelVendor,
                 FillSource = FuelFillSource.Transaction,
                 Source = "Fuel Station",
                 City = g.FirstOrDefault().City.ToUpper(),
                 Volume = g.FirstOrDefault().Volume,
                 Distance = g.FirstOrDefault().Odometer,
                 Date = g.FirstOrDefault().DateTime
             })
             .ToList();
In the end, I use
data.AddRange(dataTransactions);
data.AddRange(dataTransfers);
to put the two lists together and generate a fuel consumption report.
Both lists are individually sorted by Date, but after AddRange the dataTransfers items are simply appended to the end, so the combined list is no longer sorted by Date. How do I sort the combined result by date again after using AddRange?
Try this:
data = data.OrderBy(d => d.Date).ToList();
Or if you want to order descending:
data = data.OrderByDescending(d => d.Date).ToList();
You can call List<T>.Sort(delegate).
https://msdn.microsoft.com/en-us/library/w56d4y5z(v=vs.110).aspx
Example:
data.Sort(delegate(FuelLightTruckDataSource x, FuelLightTruckDataSource y)
{
    // your sort logic here, e.g. ascending by date:
    return x.Date.CompareTo(y.Date);
});
Advantage: this sort doesn't create a new IList<T> instance the way OrderBy does. It's a small thing, but to some people it matters, especially in performance- and memory-sensitive situations.
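The same comparison can be written more compactly with a lambda, which may read better for a simple date sort (same behavior as the delegate form above):
data.Sort((x, y) => x.Date.CompareTo(y.Date));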

How can I make this LINQ query of an Enumerable DataTable of GTFS data faster?

I'm working with the GTFS data for the New York City MTA subway system. I need to find the stop times for each route at a specific stop. To do that, I get the stop times from a StopTimes DataTable that I have, for a specific stop_id. I only want stop times between now and the next 2 hours.
Then, I need to lookup the trip for each stop time, using the trip_id value. From that trip, I have to lookup the route, using the route_id value, in order to get the route name or number for the stop time.
Here are the counts for each DataTable: StopTimes(522712), Trips(19092), Routes(27).
Right now, this takes anywhere from 20 seconds to 40 seconds to execute. How can I speed this up? Any and all suggestions are appreciated. Thanks!
foreach (var r in StopTimes.OrderBy(z => z.Field<DateTime>("departure_time").TimeOfDay)
                           .Where(z => z.Field<string>("stop_id") == stopID &&
                                       z["departure_time"].ToString() != "" &&
                                       z.Field<DateTime>("departure_time").TimeOfDay >= DateTime.UtcNow.AddHours(-5).TimeOfDay &&
                                       z.Field<DateTime>("departure_time").TimeOfDay <= DateTime.UtcNow.AddHours(-5).AddHours(2).TimeOfDay))
{
    var trip = (from z in Trips
                where z.Field<string>("trip_id") == r.Field<string>("trip_id") &&
                      z["route_id"].ToString() != ""
                select z).Single();
    var route = (from z in Routes
                 where z.Field<string>("route_id") == trip.Field<string>("route_id")
                 select z).Single();
    // do stuff (not time-consuming)
}
Try this:
var now = DateTime.UtcNow;
var tod0 = now.AddHours(-5).TimeOfDay;
var tod1 = now.AddHours(-5).AddHours(2).TimeOfDay;
var sts =
    from st in StopTimes
    let StopID = st.Field<string>("stop_id")
    where StopID == stopID
    where st["departure_time"].ToString() != ""
    let DepartureTime = st.Field<DateTime>("departure_time").TimeOfDay
    where DepartureTime >= tod0
    where DepartureTime <= tod1
    let TripID = st.Field<string>("trip_id")
    select new
    {
        StopID,
        TripID,
        DepartureTime,
    };
Note that there is no orderby in this query and that we're returning an anonymous type. For your "do stuff (not time-consuming)" code to run you may need to add some more properties.
The same approach applies to Trips and Routes.
var ts =
    from t in Trips
    where t["route_id"].ToString() != ""
    let TripID = t.Field<string>("trip_id")
    let RouteID = t.Field<string>("route_id")
    select new
    {
        TripID,
        RouteID,
    };
var rs =
    from r in Routes
    let RouteID = r.Field<string>("route_id")
    select new
    {
        RouteID,
    };
Since you're getting a single record per lookup, ToDictionary(...) is a good choice here.
var tripLookup = ts.ToDictionary(t => t.TripID);
var routeLookup = rs.ToDictionary(r => r.RouteID);
Now your query looks like this:
var query = from StopTime in sts.ToArray()
let Trip = tripLookup[StopTime.TripID]
let Route = routeLookup[Trip.RouteID]
orderby StopTime.DepartureTime
select new
{
StopTime,
Trip,
Route,
};
Notice that I've used .ToArray() and I've put the orderby right at the end.
And you run your code like this:
foreach (var q in query)
{
// do stuff (not time-consuming)
}
Let me know if this helps.
I would make a Dictionary<string, Trip> from Trips where the key is the trip_id, and a Dictionary<string, Route> from Routes where the key is the route_id. Your code is iterating over the 19,092 items in Trips once for every item in the filtered stop-time sequence. The same goes for Routes, but at least there are only 27 items in there.
Edit:
Actually, looking at it more closely, the first dictionary could just be Dictionary<string, string>, where the value is the route_id. And given the one-to-one relationship between trip_id and route_id, you could go further and build a single dictionary from trip_id to Route and do one lookup.
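A rough sketch of that idea (mine, not the answerer's), assuming Trips and Routes are enumerable DataRow collections as in the question's code:
// Build the lookups once, before the stop-time loop.
var routesById = Routes.ToDictionary(r => r.Field<string>("route_id"));

var routeByTripId = Trips
    .Where(t => t["route_id"].ToString() != "")
    .ToDictionary(t => t.Field<string>("trip_id"),
                  t => routesById[t.Field<string>("route_id")]);

// Inside the stop-time loop, a single O(1) lookup replaces both Single() queries:
// var route = routeByTripId[r.Field<string>("trip_id")];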
It helps to understand deferred query execution so you can make case by case decisions on how to optimize your runtime. Here is a good blog post that can get you started: http://ox.no/posts/linq-vs-loop-a-performance-test
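As a tiny illustration of deferred execution (not from the linked post): a LINQ to Objects query is re-run every time it is enumerated, which is exactly why materializing with ToList() or ToDictionary() before a loop matters.
var numbers = new List<int> { 1, 2, 3 };
var evens = numbers.Where(n => n % 2 == 0);   // nothing executes yet
numbers.Add(4);
Console.WriteLine(evens.Count());             // 2 - the filter runs now and sees the added 4
Console.WriteLine(evens.Count());             // 2 - and it runs again on this second enumeration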

Adding where clause to nested Linq selects

I'm still new to Linq so if you see something I really shouldn't be doing, please feel free to suggest a change.
I am working on a new system to allow officers to sign up for overtime. Part of the data is displayed on a map with search criteria filtering unwanted positions. In order to make the data easier to work with, it is read into a hierarchy object structure using Linq. In this example, a job can contain multiple shifts and each shift can have multiple positions available. The Linq statement to read them in looks like the following.
var jobs = (from j in db.Job
join s in db.Shift on j.Id equals s.JobId into shifts
select new JobSearchResult
{
JobNumber = j.Id,
Name = j.JobName,
Latitude = j.LocationLatitude,
Longitude = j.LocationLongitude,
Address = j.AddressLine1,
Shifts = (from shift in shifts
join p in db.Position on shift.Id equals p.ShiftId into positions
select new ShiftSearchResult
{
Id = shift.Id,
Title = shift.ShiftTitle,
StartTime = shift.StartTime,
EndTime = shift.EndTime,
Positions = (from position in positions
select new PositionSearchResult
{
Id = position.Id,
Status = position.Status
}).ToList()
}).ToList()
});
That works fine and has been tested. There may be a better way to do it, and if you know of one, feel free to suggest it. My problem is this: after the query is created, search criteria will be added. I know that I could add them when the query is created, but for this case it's easier to do it afterwards. Now, I can easily add criteria that look like this.
jobs = jobs.Where(j => j.JobNumber == 1234);
However, I am having trouble figuring out how to do the same for Shifts or Positions. In other words, how would I could it to add the condition that a shift starts after a particular time? The following example is what I am trying to accomplish but will not (obviously) work.
jobs = jobs.Shifts.Where(s => s.StartTime > JobSearch.StartTime) //JobSearch.StartTime is a form variable.
Anyone have any suggestions?
Step 1: create associations so you can have the joins hidden behind EntitySet properties.
http://msdn.microsoft.com/en-us/library/bb629295.aspx
Step 2: construct your filters. You have 3 queryables and the possibility of filter interaction. Specify the innermost filter first so that the outer filters may make use of them.
Here are all jobs (unfiltered). Each job has only the shifts with 3 open positions. Each shift has those open positions.
Expression<Func<Position, bool>> PositionFilterExpression =
    p => p.Status == "Open";
Expression<Func<Shift, bool>> ShiftFilterExpression =
    s => s.Positions.Where(PositionFilterExpression).Count() == 3;
Expression<Func<Job, bool>> JobFilterExpression =
    j => true;
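For the start-time condition the asker wants, the shift filter could instead look like this (illustrative only, using the JobSearch.StartTime form variable mentioned in the question):
Expression<Func<Shift, bool>> ShiftFilterExpression =
    s => s.StartTime > JobSearch.StartTime;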
Step 3: put it all together:
List<JobSearchResult> jobs = db.Jobs
.Where(JobFilterExpression)
.Select(j => new JobSearchResult
{
JobNumber = j.Id,
Name = j.JobName,
Latitude = j.LocationLatitude,
Longitude = j.LocationLongitude,
Address = j.AddressLine1,
Shifts = j.Shifts
.Where(ShiftFilterExpression)
.Select(s => new ShiftSearchResult
{
Id = s.Id,
Title = s.ShiftTitle,
StartTime = s.StartTime,
EndTime = s.EndTime,
Positions = s.Positions
.Where(PositionFilterExpression)
.Select(p => new PositionSearchResult
{
Id = p.Id,
Status = p.Status
})
.ToList()
})
.ToList()
})
.ToList();
