I am using LINQ to SQL to process data from SQL Server and dump it into an iSeries server for further processing. More details on that here.
My problem is that it is taking about 1.25 minutes to process those 350 rows of data. I am still trying to decipher the results from the SQL Server Profiler, but there are a TON of queries being run. Here is a bit more detail on what I am doing:
using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
    var vehicles = from a in db.EquipmentMainGenerals
                   join b in db.EquipmentMainConditions
                       on a.wdEquipmentMainGeneralOID equals b.wdEquipmentMainGeneralOID
                   where b.Retired == null
                   orderby a.VehicleId
                   select a;

    et = new EquipmentTable[vehicles.Count()];
    int i = 0;

    foreach (var vehicle in vehicles)
    {
        // Move data to the array

        // Rates
        GetVehicleRates(vehicle.wdEquipmentMainGeneralOID);

        // Build the costs accumulators
        GetPartsAndOilCosts(vehicle.VehicleId);
        GetAccidentAndOutRepairCosts(vehicle.wdEquipmentMainGeneralOID);

        // Last Month's Accumulators
        et[i].lastMonthActualGasOil = GetFuel(vehicle.wdEquipmentMainGeneralOID) + Convert.ToDecimal(oilCost);
        et[i].lastMonthActualParts = Convert.ToDecimal(partsCost);
        et[i].lastMonthActualLabor = GetLabor(vehicle.VehicleId);
        et[i].lastMonthActualOutRepairs = Convert.ToDecimal(outRepairCosts);
        et[i].lastMonthActualAccidentCosts = Convert.ToDecimal(accidentCosts);

        // Move more data to the array
        i++;
    }
}
The Get methods all look similar to:
private void GetPartsAndOilCosts(string vehicleKey)
{
    oilCost = 0;
    partsCost = 0;

    using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
    {
        try
        {
            var costs = from a in db.WorkOrders
                        join b in db.MaterialLogs on a.WorkOrderId equals b.WorkOrder
                        join c in db.Materials on b.wdMaterialMainGeneralOID equals c.wdMaterialMainGeneralOID
                        where (monthBeginDate.Date <= a.WOClosedDate && a.WOClosedDate <= monthEndDate.Date)
                              && a.EquipmentID == vehicleKey
                        group b by c.Fuel into d
                        select new
                        {
                            isFuel = d.Key,
                            totalCost = d.Sum(b => b.Cost)
                        };

            foreach (var cost in costs)
            {
                if (cost.isFuel == 1)
                {
                    oilCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
                }
                else
                {
                    partsCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
                }
            }
        }
        catch (InvalidOperationException)
        {
            oilCost = 0;
            partsCost = 0;
        }
    }
}
My thinking here is that cutting down the number of queries to the database should speed up the processing. If LINQ to SQL issues a SELECT for every record, maybe I need to load all the records into memory first.
I still consider myself a beginner with C# and OOP in general (I do mostly RPG programming on the iSeries). So I am guessing I am doing something stupid. Can you help me fix my stupidity (at least with this problem)?
Update: Thought I would come back and update you on what I have discovered. It appears the database was poorly designed, and whatever LINQ was generating in the background was highly inefficient. I am not saying LINQ is bad; it was just bad for this database. I converted to a quickly thrown together .XSD setup and the processing time went from 1.25 minutes to 15 seconds. Once I do a proper redesign, I can only guess I'll shave a few more seconds off of that. Thank you all for your comments. I'll try LINQ again some other day on a better database.
There are a few things that I spot in your code:
1. You query the database multiple times for each item in the 'var vehicles' query; you might want to rewrite that query so that fewer database queries are needed (see the sketch further below).
2. When you don't need all the properties of the queried entity, or need sub-entities of it, it's better for performance to use an anonymous type in your select. LINQ to SQL will analyze this and retrieve less data from your database. Such a select might look like this: select new { a.VehicleId, a.Name }
3. The query in GetPartsAndOilCosts can be optimized by putting the calculation cost.totalCost * (1 + OVERHEAD_RATE) into the LINQ query. This way the query can be executed completely in the database, which should make it much faster.
4. You are doing a Count() on the var vehicles query, but you only use it to determine the size of the array. While LINQ to SQL turns this into a very efficient SELECT COUNT(*), it costs an extra round trip to the database. Besides that (depending on your isolation level), an item could be added between the count and the moment you start iterating the query; your array would then be too small and an IndexOutOfRangeException would be thrown. You can simply call .ToArray() on the query, or build a List<EquipmentTable> and call .ToArray() on that. This will normally be fast enough, especially with only about 350 items in the collection, and it will certainly be faster than the extra round trip to the database for the count.
As you probably already expect, the number of database queries is the actual problem; switching between a struct array and a DataTable will not make much of a difference.
After you have optimized away as many queries as you can, start analyzing the remaining ones (using SQL Profiler) and tune them with the Index Tuning Wizard. It will propose new indexes that could speed things up considerably.
A little extra explanation for point #1. What you're doing here is a bit like this:
var query = from x in A select something;

foreach (var row in query)
{
    var query2 = from y in B where y.Value == row.Value select something;

    foreach (var row2 in query2)
    {
        // do some computation.
    }
}
What you should try to do is remove the query2 subquery, because it executes once for every row of the outer query. You could end up with something like this:
var query =
    from x in A
    from y in B
    where x.Value == y.Value
    select something;

foreach (var row in query)
{
}
Of course this example is simplistic, and in real life it can get pretty complicated (as you've already noticed), especially in your case since you've got multiple of those 'sub queries'. It can take you some time to get this right, particularly given your lack of experience with LINQ to SQL (as you said yourself).
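To make it concrete for your case, here is a sketch of points #1 and #3 applied to GetPartsAndOilCosts. It reuses the table and column names from your question, so treat it as an illustration rather than a drop-in replacement: one grouped query fetches the costs for every vehicle at once, with the overhead calculation done by the database.

using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
    // One query, grouped per vehicle and per fuel flag, instead of one
    // query per vehicle. The overhead math runs in the database (point #3).
    var costsByVehicle =
        (from a in db.WorkOrders
         join b in db.MaterialLogs on a.WorkOrderId equals b.WorkOrder
         join c in db.Materials on b.wdMaterialMainGeneralOID equals c.wdMaterialMainGeneralOID
         where monthBeginDate.Date <= a.WOClosedDate && a.WOClosedDate <= monthEndDate.Date
         group b by new { a.EquipmentID, c.Fuel } into d
         select new
         {
             d.Key.EquipmentID,
             IsFuel = d.Key.Fuel,
             TotalCost = d.Sum(x => (double)x.Cost) * (1 + OVERHEAD_RATE)
         })
        .ToLookup(x => x.EquipmentID);   // one round trip; lookups happen in memory

    // Inside the vehicle loop the costs then come from memory instead of
    // from new queries:
    // foreach (var cost in costsByVehicle[vehicle.VehicleId]) { ... }
}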
If you can't figure it out, you can always ask again here on Stack Overflow, but please remember to strip your problem down to the smallest possible example, because it's no fun to read through someone's mess (we're not getting paid for this) :-)
Good luck.
Related
I have to update multiple records in EF and I've come up with the following two methods:
Method 1: ExecuteSqlCommand (direct SQL)
customDB.Database.ExecuteSqlCommand(
    @"UPDATE dbo.UserInfo
      SET dbo.UserInfo.Email = dbo.IAMUser.Email,
          dbo.UserInfo.Mobile = dbo.IAMUser.Phone
      FROM dbo.UserInfo
      INNER JOIN dbo.IAMUserMapping ON dbo.UserInfo.UserID = dbo.IAMUserMapping.fUserId
      INNER JOIN dbo.IAMUser ON IAMUser.IAMID = IAMUserMapping.fIAMID
      WHERE dbo.IAMUser.IAMID = @iamid",
    new SqlParameter("@iamid", SqlDbType.UniqueIdentifier) { Value = IAMID });
Method 2: Linq foreach:
var ui = from userInfo in customDB.UserInfo
         join userMapping in customDB.IAMUserMapping
             on userInfo.UserID equals userMapping.fUserId
         join iamUser in customDB.IAMUser
             on userMapping.fIAMID equals iamUser.IAMID
         where iamUser.IAMID == IAMID
         select new
         {
             userInfo.UserID,
             iamUser.Email,
             iamUser.Phone
         };

foreach (var x1 in ui)
{
    var u = new UserInfo
    {
        UserID = x1.UserID,
        Email = x1.Email,
        Mobile = x1.Phone
    };

    customDB.UserInfo.Attach(u);

    var entry = customDB.Entry(u);
    entry.Property(e => e.Email).IsModified = true;
    entry.Property(e => e.Mobile).IsModified = true;
}

customDB.SaveChanges();
Method #1 is the most efficient, resulting in a single SQL query.
Method #2 is just as efficient on the SQL Server side, but it generates far more round trips to the server: one SELECT to fetch the records, then one UPDATE for each record that is updated.
Method #2 will give a compile-time error if anything in the DB model is changed, while #1 will give a run-time error.
What do people consider the best practice in cases like these?
Is there any way to get the best of both worlds?
With the first approach, the problems you may face will be much more critical and harder to solve, e.g. when renaming or restructuring entities. Compile-time errors are much better, and easier to solve, than runtime ones.
Also, I am not sure how the ExecuteSqlCommand method works, but code like this looks like it could be vulnerable to SQL injection.
So I would definitely choose the LINQ approach.
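A possible middle ground (my own suggestion, not from either answer): keep the single-statement UPDATE of method 1, but build the updated column names with nameof() so that refactoring renames of the entity properties break the build instead of failing at run time. A sketch, assuming the entity property names match the database column names:

// Raw SQL for a single round trip, but with nameof() for the columns that
// are written, so a property rename surfaces as a compile-time error.
// The parameter stays a SqlParameter, so injection is not a concern here.
customDB.Database.ExecuteSqlCommand(
    $@"UPDATE ui
       SET ui.{nameof(UserInfo.Email)} = u.Email,
           ui.{nameof(UserInfo.Mobile)} = u.Phone
       FROM dbo.UserInfo ui
       INNER JOIN dbo.IAMUserMapping m ON ui.UserID = m.fUserId
       INNER JOIN dbo.IAMUser u ON u.IAMID = m.fIAMID
       WHERE u.IAMID = @iamid",
    new SqlParameter("@iamid", SqlDbType.UniqueIdentifier) { Value = IAMID });

This only protects the identifiers you route through nameof(), but it catches the most common refactoring breakage at no runtime cost.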
I have a list of ~15,000 'teams' that each need an individual LINQ query to return results.
Namely - [Select the last 10 games for 'team']
public IEnumerable<ResultsByDate> SelectLast10Games(DateTime date, string team)
{
    return (from e in db.E0s
            where e.DateFormatted < date &&
                  (e.HomeTeam == team || e.AwayTeam == team)
            orderby e.Date descending
            select new ResultsByDate
            {
                Date = e.Date,
                HomeTeam = e.HomeTeam,
                AwayTeam = e.AwayTeam,
                HomeGoals = e.FTHG,
                AwayGoals = e.FTAG
            }).Take(10);
}
This query is probably fine; it seems fast enough when called 15,000 times.
My real issue is that I have to enumerate each query and this really kills the performance.
For each of these queries I need to run a method on the 10 results and hence the queries need enumerating.
The question is how can I avoid 15,000 enumerations?
I thought about placing each of the results into a big list and then calling .ToList() or whatever's best, but adding to a List enumerates as it goes along, so this doesn't seem viable.
Is there a way to combine all 15,000 LINQ queries into one giant LINQ query, such as:
public IEnumerable<ResultsByDate> SelectLast10Games(DateTime date, List<string> Teams)
{
    foreach (var team in Teams)
    {
        var query = (from e in db.E0s
                     where e.DateFormatted < date &&
                           (e.HomeTeam == team || e.AwayTeam == team)
                     orderby e.Date descending
                     select new ResultsByDate
                     {
                         Date = e.Date,
                         HomeTeam = e.HomeTeam,
                         AwayTeam = e.AwayTeam,
                         HomeGoals = e.FTHG,
                         AwayGoals = e.FTAG
                     }).Take(10);
    }
}
So this would return one huge result set that I can then enumerate in one go and work from there?
I have tried, but I can't seem to get the LINQ loop correct (if it's even possible, and if it's the best way to fix my issue).
The whole program takes ~29 minutes to complete. Without the enumeration it's around 30 seconds, which is not amazing but satisfactory given the criteria.
Thanks!
This can be accomplished using Teams.Select(team => ...):
var query = Teams
    .Select(team =>
        db.E0s
            .Where(e => e.DateFormatted < date && (e.HomeTeam == team || e.AwayTeam == team))
            .OrderByDescending(e => e.Date)
            .Select(e => new ResultsByDate
            {
                Date = e.Date,
                HomeTeam = e.HomeTeam,
                AwayTeam = e.AwayTeam,
                HomeGoals = e.FTHG,
                AwayGoals = e.FTAG
            })
            .Take(10));
If you're looking for the best performance for heavy querying, you should consider using a SQL stored procedure and calling it with ADO.NET, Dapper, or Entity Framework (the choices are ordered from the most optimal to the most trivial). My recommendation is Dapper. This will speed up your query, especially if the table is indexed correctly.
To feed 15k parameters into the server efficiently, you can use a TVP:
http://blog.mikecouturier.com/2010/01/sql-2008-tvp-table-valued-parameters.html
"My real issue is that I have to enumerate each query and this really kills the performance."
Unless you enumerate the result, there is no call to the server, so it is no wonder it is fast without enumeration. But that does not mean the enumeration itself is the problem.
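If you want to get this down to a single round trip, one option is to fetch every relevant row once and do the per-team ordering and Take(10) in memory. A sketch, reusing the names from the question; note that with 15,000 teams the Contains calls produce an enormous IN clause, so in practice you would feed the team list through a TVP or temp table as linked above:

// One server query for all teams; the per-team top-10 happens client-side.
var rows = db.E0s
    .Where(e => e.DateFormatted < date &&
                (Teams.Contains(e.HomeTeam) || Teams.Contains(e.AwayTeam)))
    .Select(e => new ResultsByDate
    {
        Date = e.Date,
        HomeTeam = e.HomeTeam,
        AwayTeam = e.AwayTeam,
        HomeGoals = e.FTHG,
        AwayGoals = e.FTAG
    })
    .ToList();   // the single round trip

// Last 10 games per team, computed from the in-memory rows.
var last10PerTeam = Teams.ToDictionary(
    team => team,
    team => rows.Where(r => r.HomeTeam == team || r.AwayTeam == team)
                .OrderByDescending(r => r.Date)
                .Take(10)
                .ToList());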
What I want to do is basically what this question covers: SQL Server - How to display most recent records based on dates in two tables. The only difference is that I am using LINQ to SQL.
I have two tables:
Assignments
ForumPosts
These are not very similar, but they both have a "LastUpdated" field. I want to get the most recent records from the two tables combined. However, I also need take/skip functionality for paging (and no, I don't have SQL 2012).
I don't want to create a new list (with ToList and AddRange) containing ALL my records just so I can order the whole set; that seems extremely inefficient.
My attempt:
Please don't laugh at my inefficient code... well, OK, a little (both because it's inefficient and because... it doesn't do what I want when skip is more than 0).
public List<TempContentPlaceholder> LatestReplies(int take, int skip)
{
    using (GKDBDataContext db = new GKDBDataContext())
    {
        var forumPosts = db.dbForumPosts.OrderBy(c => c.LastUpdated).Skip(skip).Take(take).ToList();
        var assignMents = db.dbUploadedAssignments.OrderBy(c => c.LastUpdated).Skip(skip).Take(take).ToList();

        List<TempContentPlaceholder> fps =
            forumPosts.Select(c => new TempContentPlaceholder()
            {
                Id = c.PostId,
                LastUpdated = c.LastUpdated,
                Type = ContentShowingType.ForumPost
            }).ToList();

        List<TempContentPlaceholder> asm =
            assignMents.Select(c => new TempContentPlaceholder()
            {
                Id = c.UploadAssignmentId,
                LastUpdated = c.LastUpdated,
                Type = ContentShowingType.ForumPost
            }).ToList();

        fps.AddRange(asm);
        return fps.OrderBy(c => c.LastUpdated).ToList();
    }
}
Any awesome LINQ to SQL people who can throw me a hint? I am sure someone can join their way out of this!
First, you should be using OrderByDescending to get the most recent updates, since later dates have greater values than earlier ones. Second, what you are doing will work for the first page, but you need to take only the top take values from the joined list as well: if you want the last 20 entries from both tables combined, take the last 20 entries from each, merge them, then take the last 20 of the merged list. The problem comes in when you attempt paging, because you would need to know how many elements from each list went into making up the previous pages. I think your best bet is to merge them first, then apply skip/take. I know that's not what you want to hear, but other solutions are probably more complex. Alternatively, you could take the top skip + take values from each table, then merge, skip the skip values, and apply take (a sketch of that follows the code below).
using (GKDBDataContext db = new GKDBDataContext())
{
    var fps = db.dbForumPosts
        .Select(c => new TempContentPlaceholder()
        {
            Id = c.PostId,
            LastUpdated = c.LastUpdated,
            Type = ContentShowingType.ForumPost
        })
        .Concat(db.dbUploadedAssignments
            .Select(c => new TempContentPlaceholder()
            {
                Id = c.UploadAssignmentId,
                LastUpdated = c.LastUpdated,
                Type = ContentShowingType.ForumPost
            }))
        .OrderByDescending(c => c.LastUpdated)
        .Skip(skip)
        .Take(take)
        .ToList();

    return fps;
}
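The alternative mentioned at the end of the answer (take the top skip + take rows from each table before merging) would look something like this. It pulls at most 2 × (skip + take) rows instead of both tables in full; if LINQ to SQL balks at translating this shape, run the two inner Take(skip + take) queries separately with ToList() and do the final sort/skip/take in memory:

var fps = db.dbForumPosts
    .OrderByDescending(c => c.LastUpdated)
    .Take(skip + take)                  // only the newest skip+take can reach this page
    .Select(c => new TempContentPlaceholder()
    {
        Id = c.PostId,
        LastUpdated = c.LastUpdated,
        Type = ContentShowingType.ForumPost
    })
    .Concat(db.dbUploadedAssignments
        .OrderByDescending(c => c.LastUpdated)
        .Take(skip + take)
        .Select(c => new TempContentPlaceholder()
        {
            Id = c.UploadAssignmentId,
            LastUpdated = c.LastUpdated,
            Type = ContentShowingType.ForumPost   // kept as in the original answer
        }))
    .OrderByDescending(c => c.LastUpdated)
    .Skip(skip)
    .Take(take)
    .ToList();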
I have a query that returns parents with filtered children:
Context.ContextOptions.LazyLoadingEnabled = false;

var query1 = (from p in Context.Partners
              where p.PartnerCategory.Category == "03"
                    || p.PartnerCategory.Category == "02"
              select new
              {
                  p,
                  m = from m in p.Milk
                      where m.Date >= beginDate
                            && m.Date <= endDate
                            && m.MilkStorageId == milkStorageId
                      select m,
                  e = p.ExtraCodes,
                  ms = from ms in p.ExtraCodes
                       select ms.MilkStorage,
                  mp = from mp in p.MilkPeriods
                       where mp.Date >= beginDate
                             && mp.Date <= endDate
                       select mp
              })
             .Where(p => p.p.ExtraCodes.Select(ex => ex.MilkStorageId).Contains(milkStorageId))
             .OrderBy(p => p.p.Name);

var partners = query1.AsEnumerable().ToList();
The query returns 200 records, and converting from IOrderedQueryable with ToList() is very slow. Why?
After profiling the query in SQL Server Management Studio, I've noticed that it executes in 1 second and returns 2035 records.
There could be a number of reasons for this, and without any profiler information it's just guesswork; even highly educated guesswork by someone who knows the code and domain well is often wrong.
You should profile the code, and since it's likely that the bottleneck is in the DB, get the command text as @Likurg suggests and profile that in the database. It's likely that you are missing one or more indexes.
There are a few things you could do to the query itself as well, if for nothing else than to make it easier to understand, and potentially faster.
E.g.
p.p.ExtraCodes.Select(ex => ex.MilkStorageId).Contains(milkStorageId)
is really
p.p.ExtraCodes.Any(ex => ex.MilkStorageId == milkStorageId)
and could be moved into the first where clause, potentially lowering the number of anonymously typed objects you create (a sketch follows the index list below). That said, the most likely cause is that one of the many fields you use in your comparisons lacks an index, potentially resulting in a table scan for each element in the result set.
Some of the fields where an index might speed things up are:
p.p.Name
m.Date
m.MilkStorageId
mp.Date
PartnerCategory.Category
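The Any() form moved into the first where clause might look like this (a sketch using the property names from the question; the child selections are written in method syntax for brevity):

var query1 =
    from p in Context.Partners
    where (p.PartnerCategory.Category == "03" || p.PartnerCategory.Category == "02")
          // the Contains/Select pair expressed as Any, applied before the
          // anonymous type is built
          && p.ExtraCodes.Any(ex => ex.MilkStorageId == milkStorageId)
    orderby p.Name
    select new
    {
        p,
        m = p.Milk.Where(m => m.Date >= beginDate && m.Date <= endDate
                              && m.MilkStorageId == milkStorageId),
        e = p.ExtraCodes,
        ms = p.ExtraCodes.Select(ex => ex.MilkStorage),
        mp = p.MilkPeriods.Where(mp => mp.Date >= beginDate && mp.Date <= endDate)
    };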
The reason it is slow is that ToList is the point at which the actual query execution takes place. This is called deferred execution.
You may see: LINQ and Deferred Execution
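A minimal illustration of the point, using the Partners table from the question:

// Nothing is sent to the database on this line; the query is only defined.
var query = Context.Partners.Where(p => p.PartnerCategory.Category == "02");

// Execution happens here: enumerating the query (ToList, foreach, Count, ...)
// is what generates the SQL, sends it, and materializes the rows.
var result = query.ToList();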
I don't think you need AsEnumerable when converting it to a list; you can do it directly:
var partners = query1.ToList();
First, look at the generated query by using:
Context.GetCommand(query1).CommandText;
then run that command against the database and check with the profiler how many records it reads.
I've been searching for possible solutions and attempting this for several hours without luck. Any help will be greatly appreciated.
I've got a Sql statement which I'm trying to put together as a C# LINQ query.
Here is the working SQL:
SELECT up.UserProfileID
,up.FirstName
,up.LastName
,SUM(CASE WHEN ul.CompletionDate IS NULL THEN 0
ELSE ISNULL(ul.Score, 0)
END) AS TotalScore
FROM dbo.UserProfile up
LEFT OUTER JOIN dbo.UserLearning ul ON up.UserProfileID = ul.UserProfileID
WHERE up.ManagerUserProfileID IS NULL
GROUP BY up.UserProfileID, up.FirstName, up.LastName
I've tried several different ways, but I seem to end up with either a statement that doesn't return what I want or one that doesn't execute successfully.
My current (non-working) code looks something like this:
var pd = from up in db.UserProfiles
         join ul in db.UserLearnings on up.UserProfileID equals ul.UserProfileID into temp
         from upJOINul in temp.DefaultIfEmpty(new UserLearning() { Score = 0 })
         where up.ManagerUserProfileID.Equals(null)
         group up by new
         {
             UserProfileID = up.UserProfileID,
             FirstName = up.FirstName,
             LastName = up.LastName,
             TotalScore = up.UserLearnings.Sum(u => u.Score)
         };
Thank you for any help
After several more attempts and further use of google I finally managed to get a working solution. I hope it'll be of use to someone else.
var pd = db.UserProfiles.AsEnumerable()
    .Where(up => up.ManagerUserProfileID.Equals(null))
    .Select(up => new
    {
        UserProfileID = up.UserProfileID,
        FirstName = up.FirstName,
        LastName = up.LastName,
        TotalScore = up.UserLearnings
            .Where(ul => ul.CompletionDate.HasValue && ul.Score.HasValue)
            .DefaultIfEmpty()
            .Sum(ul => ul != null && ul.Score.HasValue ? ul.Score : 0)
    });
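A side note (my own observation, not part of the original solution): the AsEnumerable() at the top pulls every UserProfile into memory and does the filtering and summing client-side. If you want the aggregation done by SQL Server instead, something along these lines should translate, assuming Score is a nullable numeric column:

var pd = db.UserProfiles
    .Where(up => up.ManagerUserProfileID == null)   // translates to IS NULL
    .Select(up => new
    {
        up.UserProfileID,
        up.FirstName,
        up.LastName,
        // Sum of a nullable projection over an empty or all-null set
        // yields null, so coalesce to 0 to mirror the ISNULL in the SQL.
        TotalScore = up.UserLearnings
            .Where(ul => ul.CompletionDate != null)
            .Sum(ul => (decimal?)ul.Score) ?? 0
    });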
Not what you asked for, but if you have a working, fairly static, complex SQL query, put it in a stored procedure and drag that SP onto your LINQ DataContext.
The LINQ provider has to compile your query to SQL every time it's called, and that takes time and server CPU cycles. If it's a complex query, it can eat up significant resources. It may also miss some optimizations you could do with straight SQL.
Unless, of course, there is a purpose to it.
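For LINQ to SQL specifically, you can also cache the translation yourself with CompiledQuery.Compile (in System.Data.Linq), which pays the expression-to-SQL compilation cost once per application run instead of once per call. A sketch, with placeholder context and entity names and an assumed int? key type:

// Compiled once, on first use; subsequent calls reuse the translated SQL.
static readonly Func<MyDataContext, int?, IQueryable<UserProfile>> ProfilesByManager =
    CompiledQuery.Compile((MyDataContext db, int? managerId) =>
        db.UserProfiles.Where(p => p.ManagerUserProfileID == managerId));

// Usage:
// var topLevel = ProfilesByManager(context, null).ToList();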
If you have an ORM problem, grab the actual SQL commands, take a look at them, and compare with what you want to achieve. Can you show the generated SQL as well, so we can find the difference more easily?
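With LINQ to SQL, one way to grab those commands is the DataContext.Log property (GetCommand, shown in an earlier answer here, covers a single query). A sketch with a placeholder context name:

using (var db = new MyDataContext())
{
    // Every command LINQ to SQL sends is echoed to this TextWriter,
    // including parameter values.
    db.Log = Console.Out;

    var pd = db.UserProfiles.Where(up => up.ManagerUserProfileID == null).ToList();
    // The generated SELECT is now visible on the console.
}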