In my service, first I generate 40,000 possible combinations of home and host countries, like so (clientLocations contains 200 records, so 200 x 200 is 40,000):
foreach (var homeLocation in clientLocations)
{
foreach (var hostLocation in clientLocations)
{
allLocationCombinations.Add(new AirShipmentRate
{
HomeCountryId = homeLocation.CountryId,
HomeCountry = homeLocation.CountryName,
HostCountryId = hostLocation.CountryId,
HostCountry = hostLocation.CountryName,
HomeLocationId = homeLocation.LocationId,
HomeLocation = homeLocation.LocationName,
HostLocationId = hostLocation.LocationId,
HostLocation = hostLocation.LocationName,
});
}
}
Then, I run the following query to find existing rates for the locations above, but also include empty rows for the missing rates, resulting in a complete recordset of 40,000 rows.
var allLocationRates = (from l in allLocationCombinations
join r in Db.PaymentRates_AirShipment
on new { home = l.HomeLocationId, host = l.HostLocationId }
equals new { home = r.HomeLocationId, host = (Guid?)r.HostLocationId }
into matches
from rate in matches.DefaultIfEmpty(new PaymentRates_AirShipment
{
Id = Guid.NewGuid()
})
select new AirShipmentRate
{
Id = rate.Id,
HomeCountry = l.HomeCountry,
HomeCountryId = l.HomeCountryId,
HomeLocation = l.HomeLocation,
HomeLocationId = l.HomeLocationId,
HostCountry = l.HostCountry,
HostCountryId = l.HostCountryId,
HostLocation = l.HostLocation,
HostLocationId = l.HostLocationId,
AssigneeAirShipmentPlusInsurance = rate.AssigneeAirShipmentPlusInsurance,
DependentAirShipmentPlusInsurance = rate.DependentAirShipmentPlusInsurance,
SmallContainerPlusInsurance = rate.SmallContainerPlusInsurance,
LargeContainerPlusInsurance = rate.LargeContainerPlusInsurance,
CurrencyId = rate.RateCurrencyId
});
I have tried using .AsEnumerable() and .AsNoTracking(), and that has sped things up quite a bit. The following code shaves several seconds off my query:
var allLocationRates = (from l in allLocationCombinations.AsEnumerable()
join r in Db.PaymentRates_AirShipment.AsNoTracking()
But, I am wondering: How can I speed this up even more?
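One further option worth measuring here: since .AsEnumerable() already brings the join client-side, the existing rates can be fetched once into a dictionary keyed by the location pair, turning the join into constant-time lookups. A sketch, assuming the rates table fits comfortably in memory and each (HomeLocationId, HostLocationId) pair is unique:
var rateLookup = Db.PaymentRates_AirShipment
    .AsNoTracking()
    .ToDictionary(r => new { home = r.HomeLocationId, host = (Guid?)r.HostLocationId });

foreach (var l in allLocationCombinations)
{
    PaymentRates_AirShipment rate;
    if (rateLookup.TryGetValue(new { home = l.HomeLocationId, host = l.HostLocationId }, out rate))
    {
        // copy the rate fields onto the combination row
        l.Id = rate.Id;
        l.AssigneeAirShipmentPlusInsurance = rate.AssigneeAirShipmentPlusInsurance;
        // ... copy the remaining rate fields the same way
    }
}
The anonymous-type key mirrors the join key above, including the (Guid?) cast, so the matching semantics stay the same.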
Edit: I can't replicate the foreach functionality in LINQ.
allLocationCombinations = (from homeLocation in clientLocations
from hostLocation in clientLocations
select new AirShipmentRate
{
HomeCountryId = homeLocation.CountryId,
HomeCountry = homeLocation.CountryName,
HostCountryId = hostLocation.CountryId,
HostCountry = hostLocation.CountryName,
HomeLocationId = homeLocation.LocationId,
HomeLocation = homeLocation.LocationName,
HostLocationId = hostLocation.LocationId,
HostLocation = hostLocation.LocationName
});
I get an error on from hostLocation in clientLocations which says "cannot convert type IEnumerable to Generic.List."
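The cross-join query itself is fine; the error is just a type mismatch: the query expression yields an IEnumerable&lt;AirShipmentRate&gt;, while allLocationCombinations is evidently declared as a List. Materializing the query should fix it (a sketch, assuming that declaration):
allLocationCombinations = (from homeLocation in clientLocations
                           from hostLocation in clientLocations
                           select new AirShipmentRate
                           {
                               HomeCountryId = homeLocation.CountryId,
                               HomeCountry = homeLocation.CountryName,
                               HostCountryId = hostLocation.CountryId,
                               HostCountry = hostLocation.CountryName,
                               HomeLocationId = homeLocation.LocationId,
                               HomeLocation = homeLocation.LocationName,
                               HostLocationId = hostLocation.LocationId,
                               HostLocation = hostLocation.LocationName
                           }).ToList(); // materialize to List<AirShipmentRate>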
The fastest way to query a database is to use the power of the database engine itself.
While Linq is a fantastic technology to use, it still generates a select statement out of the Linq query, and runs this query against the database.
Your best bet is to create a database View, or a stored procedure.
Views and stored procedures can easily be integrated into Linq.
Materialized views (indexed views in MS SQL) can further speed up execution, and adding missing indexes is by far the most effective way to speed up database queries.
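For instance, once a view exists it can be queried like any other set (a sketch; vw_AllLocationRates is a hypothetical view name, and this assumes Db is an Entity Framework DbContext):
var allLocationRates = Db.Database
    .SqlQuery<AirShipmentRate>("SELECT * FROM dbo.vw_AllLocationRates")
    .ToList();
This pushes the cross join and the left join to the database engine, so only the finished 40,000-row resultset crosses the wire.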
How can I speed this up even more?
Optimizing is a bitch.
Your code looks fine to me. Make sure to define indexes on your DB schema where appropriate. And as already mentioned: capture the SQL your LINQ generates and run it directly against the database to get a better idea of the performance.
Well, but how to improve performance anyway?
You may want to have a glance at the following link:
10 tips to improve LINQ to SQL Performance
To me, probably the most important points listed (in the link above):
Retrieve Only the Number of Records You Need
Turn off ObjectTrackingEnabled Property of Data Context If Not Necessary
Filter Data Down to What You Need Using DataLoadOptions.AssociateWith
Use compiled queries when it's needed (please be careful with that one...)
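On that last point, a compiled query caches the translation from expression tree to SQL between calls. A minimal LINQ to SQL sketch (MyDataContext is a placeholder, and the table is patterned on the question's model):
static readonly Func<MyDataContext, Guid, IQueryable<PaymentRates_AirShipment>> ratesByHome =
    CompiledQuery.Compile((MyDataContext db, Guid homeId) =>
        db.PaymentRates_AirShipment.Where(r => r.HomeLocationId == homeId));

// usage: var rates = ratesByHome(db, someHomeLocationId).ToList();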
Related
I have a DataTable in memory and I need to select some records from it, walk through the records making changes to fields, and then save the changes back to the DataTable. I can do this with filters, views, and SQL, but I'm trying to do it in LINQ.
var results = (from rows in dtTheRows.AsEnumerable()
select new
{
rows.Job,
}).Distinct();
foreach (var row in results)
{
firstRow = true;
thisOnHand = 0;
var here = from thisRow in dtTheRows.AsEnumerable()
orderby thisRow.PromisedDate
select new
{
thisRow.OnHandQuantity,
thisRow.Balance,
thisRow.RemainingQuantity
};
foreach(var theRow in here)
{
// business logic here ...
theRow.OnHandQuantity = 5;
} // foreach ...
The first LINQ query and foreach obtain the list of subsets of data to be considered. I include it here in case it is relevant. My problem is at this line:
theRow.OnHandQuantity = 5;
My error is:
"Error 19 Property or indexer 'AnonymousType#1.OnHandQuantity' cannot be assigned to -- it is read only"
What am I missing here? Can I update this query back into the original datatable?
var here = from thisRow in dtTheRows.AsEnumerable()
orderby thisRow.PromisedDate
select new
{
thisRow.OnHandQuantity,
thisRow.Balance,
thisRow.RemainingQuantity
};
Instead of projecting the three properties into an anonymous type in the select, select thisRow itself. That should resolve the error on the statement theRow.OnHandQuantity = 5;
The error is self-descriptive: you can't update/modify an anonymous type's properties. You have to return the original entity you want to modify from your query.
select thisRow;
instead of
select new
{
thisRow.OnHandQuantity,
thisRow.Balance,
thisRow.RemainingQuantity
};
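Putting the two answers together (a sketch; property access like theRow.OnHandQuantity assumes a typed DataSet row, as in the question — with an untyped DataTable you would use theRow.SetField("OnHandQuantity", 5) instead):
var here = from thisRow in dtTheRows.AsEnumerable()
           orderby thisRow.PromisedDate
           select thisRow; // the row itself, not a read-only anonymous projection

foreach (var theRow in here)
{
    theRow.OnHandQuantity = 5; // compiles now, and writes back into dtTheRows
}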
I have started using the performance wizard in Visual Studio 2012 because there was a slow method which is basically used to get all users from the data context. I fixed the initial problem, but I am now curious whether I can make it faster.
Currently I am doing this:
public void GetUsers(UserManagerDashboard UserManagerDashboard)
{
try
{
using (GenesisOnlineEnties = new GenesisOnlineEntities())
{
var q = from u in GenesisOnlineEnties.vw_UserManager_Users
select u;
foreach (var user in q)
{
User u = new User();
u.UserID = user.UserId;
u.ApplicationID = user.ApplicationId;
u.UserName = user.UserName;
u.Salutation = user.Salutation;
u.EmailAddress = user.Email;
u.Password = user.Password;
u.FirstName = user.FirstName;
u.Surname = user.LastName;
u.CompanyID = user.CompanyId;
u.CompanyName = user.CompanyName;
u.GroupID = user.GroupId;
u.GroupName = user.GroupName;
u.IsActive = user.IsActive;
u.RowType = user.UserType;
u.MaximumConcurrentUsers = user.MaxConcurrentUsers;
u.ModuleID = user.ModuleId;
u.ModuleName = user.ModuleName;
UserManagerDashboard.GridUsers.users.Add(u);
}
}
}
catch (Exception ex)
{
}
}
It's a very straightforward method. Connect to the database using Entity Framework, get all users from the view "vw_usermanager_users", and populate the object which is part of a collection.
I was casting int? to int, so I changed the property to int? so that no cast is needed. I know that it is going to take longer because I am looping through records. But is it possible to speed this query up?
Ok, first things first, what does your vw_UserManager_Users object look like? If any of those properties you're referencing are navigational properties:-
public partial class UserManager_User
{
public string GroupName { get { return this.Group.Name; } }
// See how the getter traverses across the "Group" relationship
// to get the name?
}
then you're likely running face-first into this issue - basically you'll be querying your database once for the list of users, and then once (or more) for each user to load the relationships. Some people, when faced with a problem, think "I know, I'll use an O/RM". Now they have N+1 problems.
You're better to use query projection:-
var q = from u in GenesisOnlineEnties.vw_UserManager_Users
select new User()
{
UserID = u.UserId,
ApplicationID = u.ApplicationId,
GroupName = u.Group.Name, // Does the join on the database instead
...
};
That way, the data is already in the right shape, and you only send the columns you actually need across the wire.
If you want to get fancy, you can use AutoMapper to do the query projection for you; saves on some verbosity - especially if you're doing the projection in multiple places:-
var q = GenesisOnlineEnties.vw_UserManager_Users.Project().To<User>();
Next up, what grid are you using? Can you use databinding (or simply replace the Grid's collection) rather than populating it one-by-one with the results from your query?:-
UserManagerDashboard.GridUsers.users = q.ToList();
or:-
UserManagerDashboard.GridUsers.DataSource = q.ToList();
or maybe:-
UserManagerDashboard.GridUsers = new MyGrid(q.ToList());
The way you're adding the users to the grid right now is like moving sand from one bucket to another one grain at a time. If you're making a desktop app it's even worse, because adding an item to the grid will probably trigger a redraw of the UI (i.e. one grain at a time, describing every grain in the bucket to your buddy after each one). Either way you're doing unnecessary work; see what methods your grid gives you to avoid this.
How many users are in the table? If the number is very large, then you'll want to page your results. Make sure that the paging happens on the database rather than after you've got all the data - otherwise it kind of defeats the purpose:-
q = q.Skip(index).Take(pageSize);
though bear in mind that some grids interact with IQueryable to do paging out-of-the-box, in that case you'd just pass q to the grid directly.
Those are the obvious ones. If that doesn't fix your problem, post more code and I'll take a deeper look.
Yes, by turning off change tracking:
var q = from u in GenesisOnlineEnties.vw_UserManager_Users.AsNoTracking()
select u;
Unless you are using all the properties on the entity you can also select only the columns you want.
var q = from u in GenesisOnlineEnties.vw_UserManager_Users.AsNoTracking()
select new User
{
UserId = u.UserId,
...
}
I've read MANY different solutions for the separate pieces of LINQ that, when put together, would solve my issue. My problem is that I'm still trying to wrap my head around how to put LINQ statements together correctly. I can't seem to get the syntax right, or it comes out as a mish-mash of info and not quite what I want.
I apologize ahead of time if half of this seems like a duplicate. My question is more specific than just reading the file. I'd like it all to be in the same query.
To the point though..
I am reading in a text file with semi-colon separated columns of data.
An example would be:
US;Fort Worth;TX;Tarrant;76101
US;Fort Worth;TX;Tarrant;76103
US;Fort Worth;TX;Tarrant;76105
US;Burleson;TX;Tarrant;76097
US;Newark;TX;Tarrant;76071
US;Fort Worth;TX;Tarrant;76103
US;Fort Worth;TX;Tarrant;76105
Here is what I have so far:
var items = (from c in (from line in File.ReadAllLines(myFile)
let columns = line.Split(';')
where columns[0] == "US"
select new
{
City = columns[1].Trim(),
State = columns[2].Trim(),
County = columns[3].Trim(),
ZipCode = columns[4].Trim()
})
select c);
That works fine for reading the file. But my issue after that is I don't want the raw data. I want a summary.
Specifically I need the count of the number of occurrences of the City,State combination, and the count of how many times the ZIP code appears.
I'm eventually going to make a tree view out of it.
My goal is to have it laid out somewhat like this:
- Fort Worth,TX (5)
- 76101 (1)
- 76103 (2)
- 76105 (2)
- Burleson,TX (1)
- 76097 (1)
- Newark,TX (1)
- 76071 (1)
I can do the tree thing later because there is other processing to do.
So my question is: How do I combine the counting of the specific values in the query itself? I know of the GroupBy functions and I've seen Aggregates, but I can't get them to work correctly. How do I go about wrapping all of these functions into one query?
EDIT: I think I asked my question the wrong way. I don't mean that I HAVE to do it all in one query... I'm asking IS THERE a clear, concise, and efficient way to do this with LINQ in one query? If not I'll just go back to looping through.
If I can be pointed in the right direction it would be a huge help.
If someone has an easier idea in mind to do all this, please let me know.
I just wanted to avoid iterating through a huge array of values and using Regex.Split on every line.
Let me know if I need to clarify.
Thanks!
EDIT 6/15:
I figured it out. Thanks to those who answered; it helped out, but was not quite what I needed. As a side note, I ended up changing it all up anyway. LINQ was actually slower than doing it other ways that I won't go into, as they're not relevant. As for those who made multiple comments of "it's silly to have it in one query": that's the decision of the designer. "Best practices" don't work in all places; they are guidelines. Believe me, I do want to keep my code clear and understandable, but I also had a very specific reason for doing it the way I did.
I do appreciate the help and direction.
Below is the prototype that I used but later abandoned.
/* Inner LINQ query Reads the Text File and gets all the Locations.
* The outer query summarizes this by getting the sum of the Zips
* and orders by City/State then ZIP */
var items = from Location in (
//Inner Query Start
(from line in File.ReadAllLines(FilePath)
let columns = line.Split(';')
where columns[0] == "US" && !string.IsNullOrEmpty(columns[4])
select new
{
City = (FM.DecodeSLIC(columns[1].Trim()) + " " + columns[2].Trim()),
County = columns[3].Trim(),
ZipCode = columns[4].Trim()
}
))
//Inner Query End
orderby Location.City, Location.ZipCode
group Location by new { Location.City, Location.ZipCode , Location.County} into grp
select new
{
City = grp.Key.City,
County = grp.Key.County,
ZipCode = grp.Key.ZipCode,
ZipCount = grp.Count()
};
The downside of using File.ReadAllLines is that you have to pull the entire file into memory before operating over it. Also, using columns[] is a bit clunky. You might want to consider my article describing using DynamicObject and streaming the file as an alternative implementation. The grouping/counting operation is secondary to that discussion.
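As a lighter-weight step in that direction, File.ReadLines streams the file lazily instead of buffering it all; the asker's query needs only that one change (a sketch):
var items = from line in File.ReadLines(myFile) // streams, unlike ReadAllLines
            let columns = line.Split(';')
            where columns[0] == "US"
            select new
            {
                City = columns[1].Trim(),
                State = columns[2].Trim(),
                County = columns[3].Trim(),
                ZipCode = columns[4].Trim()
            };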
var items = (from c in
(from line in File.ReadAllLines(myFile)
let columns = line.Split(';')
where columns[0] == "US"
select new
{
City = columns[1].Trim(),
State = columns[2].Trim(),
County = columns[3].Trim(),
ZipCode = columns[4].Trim()
})
select c);
foreach (var i in items.GroupBy(an => an.City + "," + an.State))
{
Console.WriteLine("{0} ({1})",i.Key, i.Count());
foreach (var j in i.GroupBy(an => an.ZipCode))
{
Console.WriteLine(" - {0} ({1})", j.Key, j.Count());
}
}
There is no point forcing everything into one query. It's better to split the queries so each one stays meaningful. Try this on your results:
var grouped = items
    .GroupBy(a => new { a.City, a.State, a.ZipCode })
    .Select(a => new
    {
        City = a.Key.City,
        State = a.Key.State,
        ZipCode = a.Key.ZipCode,
        ZipCount = a.Count()
    })
    .ToList();
EDIT
Here is the one big long query which gives the same output
var itemsGrouped = File.ReadAllLines(myFile)
    .Select(a => a.Split(';'))
    .Where(a => a[0] == "US")
    .Select(a => new { City = a[1].Trim(), State = a[2].Trim(), County = a[3].Trim(), ZipCode = a[4].Trim() })
    .GroupBy(a => new { a.City, a.State, a.ZipCode })
    .Select(a => new { City = a.Key.City, State = a.Key.State, ZipCode = a.Key.ZipCode, ZipCount = a.Count() })
    .ToList();
I'm currently learning LINQ to SQL and I'm very surprised by the performance of selecting data. I'm retrieving joined data from a few tables, selecting about 40k rows. Mapping this data to objects takes about 35s using ADO.NET, about 130s using NHibernate, and, suspiciously, only 3.5s using LINQ to SQL. Additionally, I should note that I'm using immediate (eager) loading, which looks like:
THESIS th = new THESIS(connectionString);
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<NumericFormula>(x => x.RPN);
dlo.LoadWith<RPN>(x => x.RPNDetails);
dlo.LoadWith<RPNDetail>(x => x.Parameter);
th.LoadOptions = dlo;
th.Log = Console.Out;
Looking at the log while iterating, I can't see LINQ to SQL generating any additional queries to the database.
I'm very surprised by the huge differences in performance and I wonder whether I'm misunderstanding something.
Could someone explain me why it works so fast?
To measure time I'm using Stopwatch class.
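The timing pattern looks like this (a trivial sketch around the LINQ to SQL call shown below):
var sw = System.Diagnostics.Stopwatch.StartNew();
var nFormulas = th.NumericFormulas.ToList<NumericFormula>();
sw.Stop();
Console.WriteLine("LINQ to SQL: {0} ms", sw.ElapsedMilliseconds);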
ADO.NET Code:
public static List<NumericFormulaDO> SelectAllNumericFormulas()
{
var nFormulas = new List<NumericFormulaDO>();
string queryString = @"
SELECT *
FROM NumericFormula nf
Left Join Unit u on u.Unit_Id = nf.Unit_Id
Left Join UnitType ut on ut.UnitType_Id = u.UnitType_Id
Join RPN r on r.RPN_Id = nf.RPN_Id
Join RPNDetails rd on rd.RPN_Id = r.RPN_Id
Join Parameter par on par.Parameter_Id = rd.Parameter_Id where nf.NumericFormula_Id<=10000";
using (var connection = new SqlConnection(connectionString))
{
var command = new SqlCommand(queryString, connection);
connection.Open();
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
var det = new RPNDetailsDO();
det.RPNDetails_Id = Int32.Parse(reader["RPNDetails_Id"].ToString());
det.RPN_Id = Int32.Parse(reader["RPN_Id"].ToString());
det.Identifier = reader["Identifier"].ToString();
det.Parameter.Architecture = reader["Architecture"].ToString();
det.Parameter.Code = reader["Code"].ToString();
det.Parameter.Description = reader["Description"].ToString();
det.Parameter.Parameter_Id = Int32.Parse(reader["Parameter_Id"].ToString());
det.Parameter.ParameterType = reader["ParameterType"].ToString();
det.Parameter.QualityDeviationLevel = reader["QualityDeviationLevel"].ToString();
if (nFormulas.Count > 0)
{
if (nFormulas.Any(x => x.RPN.RPN_Id == Int32.Parse(reader["RPN_Id"].ToString())))
{
nFormulas.First(x=>x.RPN.RPN_Id == Int32.Parse(reader["RPN_Id"].ToString())).RPN.RPNDetails.Add(det);
}
else
{
NumericFormulaDO nFormula = CreatingNumericFormulaDO(reader, det);
nFormulas.Add(nFormula);
//System.Diagnostics.Trace.WriteLine(nFormulas.Count.ToString());
}
}
else
{
NumericFormulaDO nFormula = CreatingNumericFormulaDO(reader, det);
nFormulas.Add(nFormula);
//System.Diagnostics.Trace.WriteLine(nFormulas.Count.ToString());
}
}
}
}
return nFormulas;
}
private static NumericFormulaDO CreatingNumericFormulaDO(SqlDataReader reader, RPNDetailsDO det)
{
var nFormula = new NumericFormulaDO();
nFormula.CalculateDuringLoad = Boolean.Parse(reader["CalculateDuringLoad"].ToString());
nFormula.NumericFormula_Id = Int32.Parse(reader["NumericFormula_Id"].ToString());
nFormula.RPN.RPN_Id = Int32.Parse(reader["RPN_Id"].ToString());
nFormula.RPN.Formula = reader["Formula"].ToString();
nFormula.Unit.Name = reader["Name"].ToString();
if (reader["Unit_Id"] != DBNull.Value)
{
nFormula.Unit.Unit_Id = Int32.Parse(reader["Unit_Id"].ToString());
nFormula.Unit.UnitType.Type = reader["Type"].ToString();
nFormula.Unit.UnitType.UnitType_Id = Int32.Parse(reader["UnitType_Id"].ToString());
}
nFormula.RPN.RPNDetails.Add(det);
return nFormula;
}
LINQ to SQL Code:
THESIS th = new THESIS(connectionString);
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<NumericFormula>(x => x.RPN);
dlo.LoadWith<RPN>(x => x.RPNDetails);
dlo.LoadWith<RPNDetail>(x => x.Parameter);
th.LoadOptions = dlo;
th.Log = Console.Out;
var nFormulas =
th.NumericFormulas.ToList<NumericFormula>();
NHibernate Code:
IQueryable<NumericFormulaDO> nFormulas =
session.Query<NumericFormulaDO>()
.Where(x=>x.NumericFormula_Id <=10000);
List<NumericFormulaDO> nForList =
new List<NumericFormulaDO>();
nForList = nFormulas.ToList<NumericFormulaDO>();
Regarding your comments: in the ADO.NET part I'm using SqlDataReader, and in the LINQ part I try to force immediate execution.
Of course it is possible that my mapping "algorithm" in the ADO.NET part isn't very good, but NHibernate is much slower than ADO.NET (4x slower), so I wonder whether everything is really alright in the LINQ to SQL part, because I think the NHibernate part is fine and yet it is much slower than the somewhat confusing ADO.NET part.
Thank you guys for responses.
LINQ-to-SQL consumes ADO.NET and has additional overheads, so no: it shouldn't be faster unless it isn't doing the same work. There was mention of access via ordinals vs names, but frankly that affects micro-seconds, not seconds. It won't explain an order of magnitude change.
The only way to answer this is to trace what LINQ-to-SQL is doing. Fortunately this is simple - you can just do:
dbContext.Log = Console.Out;
which will write the TSQL it executes to the console. There are two options then:
you discover the TSQL isn't doing the same thing (maybe it isn't eager-loading)
you discover the TSQL is valid (=doing the same), but has a better plan - in which case... "borrow" it :p
Once you have the TSQL to compare, test that side-by-side, so you are testing the same work. If you want the convenience without the overheads, I'd look at "dapper" - takes away the boring grunt-work of mapping readers to objects, but very optimised.
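For illustration, the Dapper version of the raw read collapses to something like this (a sketch: Query&lt;T&gt; maps columns to properties by name, and it returns the flat joined rows, so reassembling the parent/child graph would still need grouping logic or Dapper's multi-mapping):
using Dapper; // NuGet package; adds Query<T> to IDbConnection

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    var rows = connection.Query<NumericFormulaDO>(queryString).ToList();
}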
Rewritten ADO.NET code based on the above remarks; this should be a lot faster. You could still improve it by using ordinals instead of column names and by reading the fields in exactly the same order as in the query, but those are micro-optimizations.
I've also removed a couple of duplications. You might also want to look at the performance of your typecasting and conversions, as the Parse(ToString()) route is very inefficient and can cause strange issues on systems running in different languages. There's also a chance of data loss in these conversions when decimal, float or double values are involved, as not all of their values can be converted to strings correctly (or can't round-trip back).
public static List<NumericFormulaDO> SelectAllNumericFormulas()
{
var nFormulas = new Dictionary<int, NumericFormulaDO>();
string queryString = @"
SELECT *
FROM NumericFormula nf
Left Join Unit u on u.Unit_Id = nf.Unit_Id
Left Join UnitType ut on ut.UnitType_Id = u.UnitType_Id
Join RPN r on r.RPN_Id = nf.RPN_Id
Join RPNDetails rd on rd.RPN_Id = r.RPN_Id
Join Parameter par on par.Parameter_Id = rd.Parameter_Id where nf.NumericFormula_Id<=10000";
using (var connection = new SqlConnection(connectionString))
{
connection.Open();
using (var command = new SqlCommand(queryString, connection))
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
var det = new RPNDetailsDO();
det.RPNDetails_Id = (int)reader["RPNDetails_Id"];
det.RPN_Id = (int)reader["RPN_Id"];
det.Identifier = (string)reader["Identifier"];
det.Parameter.Architecture = (string)reader["Architecture"];
det.Parameter.Code = (string)reader["Code"];
det.Parameter.Description = (string)reader["Description"];
det.Parameter.Parameter_Id = (int)reader["Parameter_Id"];
det.Parameter.ParameterType = (string)reader["ParameterType"];
det.Parameter.QualityDeviationLevel = (string)reader["QualityDeviationLevel"];
NumericFormulaDO parent = null;
if (!nFormulas.TryGetValue((int)reader["RPN_Id"], out parent))
{
parent = CreatingNumericFormulaDO(reader, det);
nFormulas.Add(parent.RPN.RPN_Id, parent);
}
else
{
parent.RPN.RPNDetails.Add(det);
}
}
}
}
return nFormulas.Values.ToList();
}
I have the following three tables, and need to bring in information from two dissimilar tables.
Table baTable has fields OrderNumber and Position.
Table accessTable has fields OrderNumber and ProcessSequence (among others)
Table historyTable has fields OrderNumber and Time (among others).
var progress = from ba in baTable
from ac in accessTable
where ac.OrderNumber == ba.OrderNumber
select new {
Position = ba.Position.ToString(),
Time = "",
Seq = ac.ProcessSequence.ToString()
};
progress = progress.Concat(from ba in baTable
from hs in historyTable
where hs.OrderNumber == ba.OrderNumber
select new {
Position = ba.Position.ToString(),
Time = String.Format("{0:hh:mm:ss}", hs.Time),
Seq = ""
});
int searchRecs = progress.Count();
The query compiles successfully, but when the SQL executes during the call to Count(), I get an error
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
Clearly the two lists each have three items, one of which is a constant. Other help boards suggested that the Visual Studio 2010 C# compiler was optimizing out the constants, and I have experimented with alternatives to the constants.
The most surprising thing is that, if the Time= entry within the select new {...} is commented out in both of the sub-queries, no error occurs when the SQL executes.
I actually think the problem is that SQL won't recognize your String.Format(...) method.
Change your second query to:
progress = progress.Concat(from ba in baTable
from hs in historyTable
where hs.OrderNumber == ba.OrderNumber
select new {
Position = ba.Position.ToString(),
Time = hs.Time.ToString(),
Seq = ""
});
After that you could always loop through the progress results and format the Time to your needs.
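A sketch of that last step (assuming the Time string the database produced can be parsed back to a DateTime):
var rows = progress.ToList(); // run the combined query once

var formatted = rows.Select(p => new
{
    p.Position,
    Time = string.IsNullOrEmpty(p.Time)
        ? ""
        : String.Format("{0:hh:mm:ss}", DateTime.Parse(p.Time)),
    p.Seq
}).ToList();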