We are seeing strange behaviour in my .NET 6.0 application in my production environment. Briefly described, the application's endpoint does the following:
Fetches a list of objects from a database
Maps the objects from SQLReader to a model (with some custom logic)
Returns the list
It seems simple and runs very fast most of the time. But sometimes we see a very long-running task during the mapping of the SQLReader output to the model class.
The mapping follows this object creation:
private static MyCustomObject MapRecord(int indexNo, long id, IDataReader reader)
{
    return new MyCustomObject(indexNo,
        reader.GetDateTime(DATE),
        id,
        reader.GetString(ID),
        reader.TryRead(TEXT, string.Empty),
        reader.GetDecimal(NUMBER1),
        reader.GetDecimal(NUMBER2),
        reader.TryRead(DATA, string.Empty),
        reader.TryRead(CATEGORY_ID, 0),
        GetCategoryScores(reader.TryRead(CATEGORIES, string.Empty)),
        reader.TryRead<bool?>(SOMEBOOL, null),
        reader.TryRead(SOMEOTHERBOOL, false),
        reader.TryRead(COMMENT, string.Empty));
}
At first we suspected the GetCategoryScores, which does a string split in two dimensions:
private static IList<(int, double)>? GetCategoryScores(string? scores)
{
if (scores != null)
{
return scores.Split(';', StringSplitOptions.RemoveEmptyEntries)
.Select(x =>
{
var split = x.Split(':');
var categoryScore = string.IsNullOrEmpty(split[1]) ? 0 : double.Parse(split[1], CultureInfo.InvariantCulture);
return (int.Parse(split[0]), categoryScore);
}).ToList();
}
return null;
}
My Jaeger traces showed that this was NOT the case, so the only remaining suspect I can see is the "TryRead" function. It looks like this:
public static T? TryRead<T>(this IDataReader reader, int ordinal, T? defaultOnNull) =>
reader.IsDBNull(ordinal)
? defaultOnNull
: (T)reader[ordinal];
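As an aside, the indexer in the last line returns object, so value types are boxed and then unboxed by the cast, which adds GC pressure per row. A hedged variant, assuming the concrete reader is actually a DbDataReader (SqlDataReader is), could look like this; note that support for Nullable&lt;T&gt; in GetFieldValue&lt;T&gt; depends on the provider:

using System.Data.Common;

// Sketch only: same null handling as TryRead above, but without the box/unbox
// that the object indexer forces for value-typed columns.
public static T? TryRead<T>(this DbDataReader reader, int ordinal, T? defaultOnNull) =>
    reader.IsDBNull(ordinal)
        ? defaultOnNull
        : reader.GetFieldValue<T>(ordinal);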
My other suspicion is that the delay happens when the GC collects unused objects. Usually the execution time of the MapRecord function is negligible, but when it is slow it takes between 1 s and 1.5 s.
The application is deployed as a Docker container and runs on a Kubernetes cluster.
Has anyone seen anything like this before, and do you have any suggestions for what to do?
If it's not GC, my guess is that it's the reader having to wait for data being streamed back from the database, most likely because other queries are locking tables and delaying the result, but maybe also due to general CPU/disk contention or networking issues.
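One way to tell those two causes apart (a diagnostic sketch, not from the original post; the 500 ms threshold is arbitrary) is to record GC collection counts around the suspect call:

// If a slow MapRecord call shows no Gen 2 collection, the time was probably spent
// waiting on data from the database rather than in the garbage collector.
var sw = System.Diagnostics.Stopwatch.StartNew();
int gen2Before = GC.CollectionCount(2);

var record = MapRecord(indexNo, id, reader);

sw.Stop();
if (sw.ElapsedMilliseconds > 500)
{
    int gen2Delta = GC.CollectionCount(2) - gen2Before;
    Console.WriteLine($"MapRecord took {sw.ElapsedMilliseconds} ms, Gen 2 collections during call: {gen2Delta}");
}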
I'm wondering if anyone else has experienced this before?
I should start by saying I'm not using ModelsBuilder for this project. I was having too many problems with it so abandoned that route.
I am however converting IPublishedContent items into Dtos within my app, using a converter class that basically maps the values. The problem I'm finding is that it's causing a massive slowdown in my code execution, especially in comparison to just getting the raw IPublishedContent collection.
To give you an example, I have a 'Job' document type. Jobs can be assigned to workers. In one of my services I need to get a collection of all jobs assigned to a worker:
public IEnumerable<IPublishedContent> GetJobsForWorker(int workerId)
{
var jobs = Umbraco.TypedContent(1234);
return jobs.Descendants("job").Where(j => j.GetPropertyValue<int>("assignedWorker") == workerId).ToList();
}
This function returns a collection of IPublishedContent, which comes back lightning fast, as I'd expect.
However if I try and convert the results to my Job Dto class, it goes from taking 0 seconds to around 7, and that's just returning a collection of ~7 from ~20 or so records:
public IEnumerable<Job> GetJobsCompletedByWorker(int workerId)
{
var jobs = Umbraco.TypedContent(1234);
return jobs.Descendants("job").Where(j => j.GetPropertyValue<int>("assignedWorker") == workerId).Select(node => _jobConverter.ConvertToModel(_umbracoHelper, node)).ToList();
}
Now I'm not doing any complex processing in this converter, it's just mapping the values as such:
public class JobConverter
{
public Job ConvertToModel(UmbracoHelper umbracoHelper, IPublishedContent node)
{
if (node != null)
{
var job = new Job
{
Id = node.Id,
Property1 = node.GetPropertyValue<string>("property1"),
Property2 = node.GetPropertyValue<string>("property2")
// ... more properties
};
return job;
}
return null;
}
}
I'm not really sure what best practice is here. Is there something I'm missing that's causing this slowdown? I only ask because I've used ModelsBuilder before, which essentially does the same thing, i.e. maps Umbraco fields to properties, and yet there's nowhere near the same delay.
Ultimately I could just use IPublishedContent, but it makes for messy code and it's far more difficult to understand.
I just wonder if anyone's been in this situation before and how they handled it?
Thanks
It turns out I actually had a helper method running on one of my properties that was querying member data, which made a database call, hence the slowdown!
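For anyone hitting the same thing, the pattern looked roughly like this (the names here are invented for illustration; the point is that a seemingly cheap mapped property can hide a database round trip per item):

public class Job
{
    public int Id { get; set; }
    public int AssignedWorkerId { get; set; }

    // Looks like plain mapping, but every evaluation queries member data,
    // i.e. one database call per job being converted.
    public string AssignedWorkerName => MemberHelper.GetMemberName(AssignedWorkerId);
}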
I was using servicestack.redis recently, and I need to query from IRedisTypedClient. I know all the data is in memory, but I still want to know: is there a speed difference between GetAll().Where() and GetByIds()?
GetAll() and GetByIds() are two methods provided by servicestack.redis.
With GetAll() I can keep searching within the result (with a lambda), which means I can use custom conditions, but I don't know whether that loads all the data from Redis and then searches in an IEnumerable<T>, and whether that search is slower than GetByIds().
I just did an experiment: I stored 1 million objects (PS: there is a ServiceStack bug, it can only store about half a million objects at once) and queried with these two methods.
DateTime beginDate = DateTime.Now;
Debug.WriteLine("Query 1 started");
Website site = WebsiteRedis.GetByCondition(w => w.Name == "网址2336677").First();
double time = (DateTime.Now - beginDate).TotalMilliseconds;
Debug.WriteLine("Elapsed: " + time + "ms");
DateTime beginDate2 = DateTime.Now;
Debug.WriteLine("Query 2 started");
Website site2 = WebsiteRedis.GetByID(new Guid("29284415-5de0-4781-bea4-5e01332814b2"));
double time2 = (DateTime.Now - beginDate2).TotalMilliseconds;
Debug.WriteLine("Elapsed: " + time2 + "ms");
The result is:
GetAll().Where() - takes 19 seconds,
GetById() - takes 190 ms.
I guess it's because ServiceStack uses the object id as the Redis key, so GetAll().Where() should never be used as a query; every object should be related to an id and queried with GetById(). GetAll() should only be used on types with few records.
You can have a look at the implementations of GetAll and GetByIds to see how they work.
GetByIds just converts all Ids to the fully qualified Key that each entry is stored under, then calls GetValues(), which creates a single MGET request to fetch all the values:
public IList<T> GetByIds(IEnumerable ids)
{
if (ids != null)
{
var urnKeys = ids.Map(x => client.UrnKey<T>(x));
if (urnKeys.Count != 0)
return GetValues(urnKeys);
}
return new List<T>();
}
public IList<T> GetAll()
{
var allKeys = client.GetAllItemsFromSet(this.TypeIdsSetKey);
return this.GetByIds(allKeys.ToArray());
}
GetAll fetches all the Ids from the TypeIdsSetKey (i.e. Redis SET containing all ids for that Type) then calls GetByIds().
So GetByIds is faster because it makes one less call to Redis, but together they only make 2 Redis operations.
Note they both return an in-memory .NET List<T>, so you can use LINQ to further filter the returned results. But GetAll returns all results for that Type and the filtering is performed on the client, so this isn't efficient for large datasets. Instead you should look at creating manual indexes using Redis SETs for common queries.
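As a hedged sketch of that last suggestion (the set naming scheme and the Category field are placeholders, not part of the post), you can maintain your own Redis SET per query value when storing, and then fetch only the matching entities by id:

using ServiceStack.Redis;

// Manual index: one Redis SET per category, holding entity ids.
using var redis = new RedisClient("localhost");
var websites = redis.As<Website>();

// When storing an entity, also add its Id to the index set for its category.
websites.Store(site);
redis.AddItemToSet("urn:website:category:" + site.Category, site.Id.ToString());

// When querying, read the ids from the index set and fetch only those entities.
var ids = redis.GetAllItemsFromSet("urn:website:category:news");
var matches = websites.GetByIds(ids);   // single MGET, no client-side scan of everything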
We're investigating a performance issue where EF 6.1.3 is being painfully slow, and we cannot figure out what might be causing it.
The database context is initialized with:
Configuration.ProxyCreationEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
We have isolated the performance issue to the following method:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
var writer = session.Writer<T>();
writer.Attach(entity);
await writer.UpdatePropertyAsync(entity, changedProperties.ToArray()).ConfigureAwait(false);
}
return entity.Id;
}
There are two names in the changedProperties list, and EF correctly generated an update statement that updates just these two properties.
This method is called repeatedly (to process a collection of data items) and takes about 15-20 seconds to complete.
If we replace the method above with the following, execution time drops to 3-4 seconds:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
var sql = $"update {entity.TypeName()}s set";
var separator = false;
foreach (var property in changedProperties)
{
sql += (separator ? ", " : " ") + property + " = #" + property;
separator = true;
}
sql += " where id = #Id";
var parameters = (from parameter in changedProperties.Concat(new[] { "Id" })
let property = entity.GetProperty(parameter)
select ContextManager.CreateSqlParameter(parameter, property.GetValue(entity))).ToArray();
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
await session.UnderlyingDatabase.ExecuteSqlCommandAsync(sql, parameters).ConfigureAwait(false);
}
return entity.Id;
}
The UpdatePropertyAsync method called on the writer (a repository implementation) looks like this:
public virtual async Task UpdatePropertyAsync(T entity, string[] changedPropertyNames, bool save = true)
{
if (changedPropertyNames == null || changedPropertyNames.Length == 0)
{
return;
}
Array.ForEach(changedPropertyNames, name => context.Entry(entity).Property(name).IsModified = true);
if (save)
await context.SaveChangesAsync().ConfigureAwait(false);
}
What is EF doing that completely kills performance? And is there anything we can do to work around it (short of using another ORM)?
By timing the code I was able to see that the additional time spent by EF was in the call to Attach the object to the context, and not in the actual query to update the database.
By eliminating all object references (setting them to null before attaching the object and restoring them after the update is complete) the EF code runs in "comparable times" (5 seconds, but with lots of logging code) to the hand-written solution.
So it looks like EF has a "bug" (some might call it a feature) causing it to inspect the attached object recursively even though change tracking and validation have been disabled.
Update: EF 7 appears to have addressed this issue by allowing you to pass in a GraphBehavior enum when calling Attach.
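A rough sketch of that workaround (the entity and navigation names here are invented for illustration; the original code is generic over T):

// Hypothetical illustration: sever the object graph before Attach so EF doesn't
// walk the references, then restore it once the update has been saved.
var customer = order.Customer;        // keep the references aside
var lines = order.Lines;
order.Customer = null;
order.Lines = null;

context.Orders.Attach(order);         // now only a single object is inspected
context.Entry(order).Property(p => p.Status).IsModified = true;
await context.SaveChangesAsync();

order.Customer = customer;            // restore the references afterwards
order.Lines = lines;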
The problem with Entity Framework is that when you call SaveChanges(), insert statements are sent to the database one by one; that's how EF works.
And there are actually 2 db hits per insert: the first db hit is the insert statement for the record, and the second one is the select statement to get the id of the inserted record.
So you have numOfRecords * 2 database trips * time for one database trip.
Write context.Database.Log = message => Debug.WriteLine(message); in your code to log the generated SQL, and you will see what I am talking about.
You can use BulkInsert, here is the link: https://efbulkinsert.codeplex.com/
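If I recall that library's API correctly (check its documentation for the exact namespace and signature), usage is roughly:

using EntityFramework.BulkInsert.Extensions;  // assumed namespace from the BulkInsert package

// One bulk operation instead of one INSERT (plus id SELECT) per entity.
// MyDbContext and entitiesToInsert are placeholders for your own types.
using (var context = new MyDbContext())
{
    context.BulkInsert(entitiesToInsert);
}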
Seeing as you have already tried setting:
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
And you are not using an ordered list, I think you are going to have to refactor your code and do some benchmarking.
I believe the bottleneck is coming from the foreach, as the context is having to deal with a potentially large amount of bulk data (not sure how many items that is in your case).
Try to cut the items contained in your array down into smaller batches before calling the SaveChanges(); or SaveChangesAsync(); methods, and note the performance deviations as opposed to letting the context grow too large.
Also, if you are still not seeing further gains, try disposing of the context after SaveChanges(); and then creating a new one; depending on the size of your entity list, flushing out the context may yield even further improvements.
But this all depends on how many entities we are talking about and may only be noticeable in the hundreds and thousands of record scenarios.
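A minimal sketch of that batching idea (MyDbContext, items, and ApplyChange are placeholders, not the poster's code): a fresh context per batch keeps the change tracker small, and SaveChanges runs once per batch instead of once per item.

const int batchSize = 100;
for (int i = 0; i < items.Count; i += batchSize)
{
    using (var context = new MyDbContext())
    {
        context.Configuration.AutoDetectChangesEnabled = false;
        context.Configuration.ValidateOnSaveEnabled = false;

        foreach (var item in items.Skip(i).Take(batchSize))
        {
            ApplyChange(context, item);   // attach/modify or add, as the scenario requires
        }

        context.SaveChanges();            // one round of SQL per batch, then dispose the context
    }
}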
I am writing a fairly large service centered around Stanford's Folding@home project. This portion of the project is a WCF service hosted inside a Windows Service. With proper database indices and a dual-core Core2Duo/7200rpm platter I am able to run approximately 1500 rows per second (SQL 2012 Datacenter instance). Each hour when I run this update, it takes a considerable amount of time to iterate through all 1.5 million users and add updates where necessary.
Looking at the performance profiler in SQL Server Management Studio 2012, I see that every user is being loaded via individual queries. Is there a way with EF to eagerly load a set of a given size of users, update them in memory, then save the updated users - using queries more elegant than single-select, single-update? I am currently using EF5, but if I need to move to 6 for improved performance, I will. The main source of delay on this process is waiting for database results.
Also, if there is anything I should change about the ForAll or pre-processing, feel free to mention it. The group pre-processing is very quick and dramatically increases the speed of the update by controlling each EF context's size - but if I can pre-process more and improve the overall time, I am more than willing to look into it!
private void DoUpdate(IEnumerable<Update> table)
{
var t = table.ToList();
var numberOfRowsInGroups = t.Count() / (Properties.Settings.Default.UpdatesPerContext); //Control each local context size. 120 works well on most systems I have.
//Split work groups out of the table of updates.
var groups = t.AsParallel()
.Select((update, index) => new {Value = update, Index = index})
.GroupBy(a => a.Index % numberOfRowsInGroups)
.ToList();
groups.AsParallel().ForAll(group =>
{
var ents = new FoldingDataEntities();
ents.Configuration.AutoDetectChangesEnabled = false;
ents.Configuration.LazyLoadingEnabled = true;
ents.Database.Connection.Open();
var count = 0;
foreach (var a in group)
{
var update = a.Value;
var data = UserData.GetUserData(update.Name, update.Team, ents); //(Name,Team) is a superkey; passing ents allows external context control
if (data.TotalPoints < update.NewCredit)
{
data.addUpdate(update.NewCredit, update.Sum); //basic arithmetic, very quick - may attach a row to the UserData.Updates collection. (does not SaveChanges here)
}
}
ents.ChangeTracker.DetectChanges();
ents.SaveChanges();
});
}
//from the UserData class which wraps the EF code.
public static UserData GetUserData(string name, long team, FoldingDataEntities ents)
{
return ents.Users.Local.FirstOrDefault(u => (u.Team == team && u.Name == name))
    ?? ents.Users.FirstOrDefault(u => (u.Team == team && u.Name == name))
    ?? ents.Users.Add(new User { Name = name, Team = team, StartDate = DateTime.Now, LastUpdate = DateTime.Now });
}
internal struct Update
{
public string Name;
public long NewCredit;
public long Sum;
public long Team;
}
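One way to avoid the per-user SELECT (a hedged sketch against the code above, not part of the original post) is to pre-load each group's candidate users into the context's Local cache with a single query, placed inside the ForAll body before the inner foreach, so GetUserData resolves almost everything from memory:

using System.Data.Entity;  // for the Load() extension

// One round trip loads all candidate users for this batch into ents.Users.Local.
// (Name, Team) is the superkey, so the in-memory check in GetUserData stays exact;
// the Name/Team cross-product may load a few extra rows, which is harmless here.
var names = group.Select(a => a.Value.Name).ToList();
var teams = group.Select(a => a.Value.Team).Distinct().ToList();

ents.Users
    .Where(u => names.Contains(u.Name) && teams.Contains(u.Team))
    .Load();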
EF is not the solution for raw performance... It's the "easy way" to do a Data Access Layer, or DAL, but comes with a fair bit of overhead. I'd highly recommend using Dapper or raw ADO.NET to do a bulk update... Would be a lot faster.
http://www.ormbattle.net/
Now, to answer your question, to do a batch update in EF, you'll need to download some extensions and third party plugins that will enable such abilities. See: Batch update/delete EF5
I have a console application with a few methods that:
insert data1 (customers) from db 1 to db 2
update data1 from db 1 to db 2
insert data2 (contacts) from db 1 to db 2
insert data2 from db 1 to db 2
and then some data from db 2 (accessed by web services) to db 1 (MySQL). The methods are run when the application executes.
With these inserts and updates I need to compare a field (country state) with a value in a list I get from a web service. To get the states I have to do:
GetAllRecord getAllStates = new GetAllRecord();
getAllStates.recordType = GetAllRecordType.state;
getAllStates.recordTypeSpecified = true;
GetAllResult stateResult = _service.getAll(getAllStates);
Record[] stateRecords = stateResult.recordList;
and I can then loop through the array and look for shortname/fullname with
if (stateResult.status.isSuccess)
{
foreach (State state in stateRecords)
{
if (addressState.ToUpper() == state.fullName.ToUpper())
{
addressState = state.shortname;
}
}
}
As it is now I have the code above in all my methods, but it takes a lot of time to fetch the state data and I have to do it many times (about 40k records, and the web service only lets me get 1k at a time, so I have to use a "searchNext" method 39 times, meaning that I query the web service 40 times for the states in each method).
I guess I could try to come up with something, but I'm just checking what best practice would be. If I create a separate method or class, how can I access this list with all its values many times without having to download them again?
Edit: should I do something like this:
GetAllRecord getAllStates = new GetAllRecord();
getAllStates.recordType = GetAllRecordType.state;
getAllStates.recordTypeSpecified = true;
GetAllResult stateResult = _service.getAll(getAllStates);
Record[] stateRecords = stateResult.recordList;
Dictionary<string, string> allStates = new Dictionary<string, string>();
foreach (State state in stateRecords)
{
allStates.Add(state.shortname, state.fullName);
}
I am not sure where to put it though and how to access it from my methods.
One thing first, you should add a break to your code when you get a match. No need to continue looping the foreach after you have a match.
addressState = state.shortname;
break;
40 thousand records isn't necessarily that much with today's computers, and I would definitely implement a cache of all the fullname <-> shortname pairs.
If the data doesn't change very often this is a perfectly good approach.
Create a Dictionary with fullName as the key and shortName as the value. Then you can just do a lookup in the methods which needs to translate the full name to the short name. You could either store this list as a static variable accessible from other classes, or have it in an instance class which you pass to your other objects as a reference.
If the data changes, you could refresh your cache every so often.
This way you only call the web service 40 times to get all the data, and all other lookups are in memory.
Code sample (not tested):
class MyCache
{
public static Dictionary<string,string> Cache = new Dictionary<string,string>();
public static void FillCache()
{
GetAllRecord getAllStates = new GetAllRecord();
getAllStates.recordType = GetAllRecordType.state;
getAllStates.recordTypeSpecified = true;
GetAllResult stateResult = _service.getAll(getAllStates);
Record[] stateRecords = stateResult.recordList;
if (stateResult.status.isSuccess)
{
foreach (State state in stateRecords)
{
Cache[state.fullName.ToUpper()] = state.shortname;
}
}
// and some code to do the rest of the web service calls until you have all results.
}
}
void Main()
{
// initialize the cache
MyCache.FillCache();
}
and in some method using it
...
string stateName = "something";
string shortName = MyCache.Cache[stateName.ToUpper()];
An easy way would be (and you really should) to cache the data locally. If I understand you correctly, you do the web service call every time something changes, which is likely unnecessary.
An easy implementation (if you can't or don't want to change your original data structures) would be to use a Dictionary somewhat like:
Dictionary<String, String> cache;
cache[addressState] = state.shortname;
BTW: You REALLY should not be using ToUpper for case-insensitive compares. Use String.Compare(a, b, StringComparison.OrdinalIgnoreCase) instead.
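Applied to the cache dictionary above, a small sketch of that idea: let the dictionary do the case-insensitive matching so no ToUpper calls are needed at all.

// Case-insensitive lookups without ToUpper: the comparer handles it.
var cache = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
cache[state.fullName] = state.shortname;

// Later, when translating a full name to its short name:
if (cache.TryGetValue(addressState, out var shortName))
    addressState = shortName;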
From what I gather, the first bit of code is inside some form of loop, and because of that the following line (which internally calls the web service) is being called 40 times:
GetAllResult stateResult = _service.getAll(getAllStates);
Perhaps you should try moving the stateResult variable to class-level scope: make it a private field or something, so it lives for the lifetime of the object. In the constructor of the class, or in some dedicated method, make the call to the web service once. If you go with a method, make sure it has been called before you execute your loop logic.
Hence you wouldn't have to call the ws all the time, just once.
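A hedged sketch of that suggestion (the class name, field name, and service type are invented; adapt them to the existing code): fetch the state list once, keep it in a private field, and reuse it from every insert/update method.

public class SyncRunner
{
    private readonly Record[] _stateRecords;

    public SyncRunner(MyWebServicePort service)
    {
        // Called once for the lifetime of the object; paging via "searchNext"
        // would be appended here the same way as in the original code.
        var getAllStates = new GetAllRecord
        {
            recordType = GetAllRecordType.state,
            recordTypeSpecified = true
        };
        _stateRecords = service.getAll(getAllStates).recordList;
    }

    // Each insert/update method now loops over _stateRecords (or a dictionary built
    // from it) instead of calling the web service again.
}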