We noticed that some very small web service calls were taking much longer than we expected. After some investigation and some timers, we narrowed it down to creating an instance of our Entity Framework 6 DbContext. Not the query itself, just the creation of the context. I've since added logging to see how long it actually takes on average to create an instance of our DbContext, and it's around 50ms.
Right after the application warms up, context creation is not slow: after an app recycle it starts out at 2-4ms (which is what we see in our dev environments). Context creation then seems to slow down over time, creeping up to the 50-80ms range over the next couple of hours before leveling off.
Our context is a fairly large code-first context with around 300 entities, including some pretty complex relationships between some of them. We are running EF 6.1.3. We do "one context per request", but most of our web API calls only make one or two queries. Taking 60+ms to create a context and then executing a 1ms query is a bit dissatisfying. We have about 10k requests per minute, so we aren't a lightly used site.
Here is a snapshot of what we see. Times are in ms; the big dip is a deploy which recycled the app domain. Each line is one of 4 different web servers. Notice it's not always the same server, either.
I did take a memory dump to try to figure out what's going on, and here are the heap stats:
00007ffadddd1d60 70821 2266272 System.Reflection.Emit.GenericFieldInfo
00007ffae02e88a8 29885 2390800 System.Linq.Enumerable+WhereSelectListIterator`2[[NewRelic.Agent.Core.WireModels.MetricDataWireModel, NewRelic.Agent.Core],[System.Single, mscorlib]]
00007ffadda7c1a0 1462 2654992 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Object, mscorlib],[System.Object, mscorlib]][]
00007ffadd4eccf8 83298 2715168 System.RuntimeType[]
00007ffadd4e37c8 24667 2762704 System.Reflection.Emit.DynamicMethod
00007ffadd573180 30013 3121352 System.Web.Caching.CacheEntry
00007ffadd2dc5b8 35089 3348512 System.String[]
00007ffadd6734b8 35233 3382368 System.RuntimeMethodInfoStub
00007ffadddbf0a0 24667 3749384 System.Reflection.Emit.DynamicILGenerator
00007ffae04491d8 67611 4327104 System.Data.Entity.Core.Metadata.Edm.MetadataProperty
00007ffadd4edaf0 57264 4581120 System.Signature
00007ffadd4dfa18 204161 4899864 System.RuntimeMethodHandle
00007ffadd4ee2c0 41900 5028000 System.Reflection.RuntimeParameterInfo
00007ffae0c9e990 21560 5346880 System.Data.SqlClient._SqlMetaData
00007ffae0442398 79504 5724288 System.Data.Entity.Core.Metadata.Edm.TypeUsage
00007ffadd432898 88807 8685476 System.Int32[]
00007ffadd433868 9985 9560880 System.Collections.Hashtable+bucket[]
00007ffadd4e3160 92105 10315760 System.Reflection.RuntimeMethodInfo
00007ffadd266668 493622 11846928 System.Object
00007ffadd2dc770 33965 16336068 System.Char[]
00007ffadd26bff8 121618 17335832 System.Object[]
00007ffadd2df8c0 168529 68677312 System.Byte[]
00007ffadd2d4d08 581057 127721734 System.String
0000019cf59e37d0 166894 143731666 Free
Total 5529765 objects
Fragmented blocks larger than 0.5 MB:
Addr Size Followed by
0000019ef63f2140 2.9MB 0000019ef66cfb40 Free
0000019f36614dc8 2.8MB 0000019f368d6670 System.Data.Entity.Core.Query.InternalTrees.SimpleColumnMap[]
0000019f764817f8 0.8MB 0000019f76550768 Free
0000019fb63a9ca8 0.6MB 0000019fb644eb38 System.Data.Entity.Core.Common.Utils.Set`1[[System.Data.Entity.Core.Metadata.Edm.EntitySet, EntityFramework]]
000001a0f6449328 0.7MB 000001a0f64f9b48 System.String
000001a0f65e35e8 0.5MB 000001a0f666e2a0 System.Collections.Hashtable+bucket[]
000001a1764e8ae0 0.7MB 000001a17659d050 System.RuntimeMethodHandle
000001a3b6430fd8 0.8MB 000001a3b6501aa0 Free
000001a4f62c05c8 0.7MB 000001a4f636e8a8 Free
000001a6762e2300 0.6MB 000001a676372c38 System.String
000001a7761b5650 0.6MB 000001a776259598 System.String
000001a8763c4bc0 2.3MB 000001a8766083a8 System.String
000001a876686f48 1.4MB 000001a8767f9178 System.String
000001a9f62adc90 0.7MB 000001a9f63653c0 System.String
000001aa362b8220 0.6MB 000001aa36358798 Free
That seems like quite a bit of metadata and TypeUsage.
Things we've tried:
Creating a simple test harness to replicate. It failed; my guess is because we weren't varying the traffic or the types of queries run. Just loading the context and executing a couple of queries over and over didn't result in the timing increase.
We've reduced the context significantly: it was 500 entities, now 300. It didn't make a difference in speed. My guess is because we weren't using those 200 entities at all.
(Edit) We use SimpleInjector to create our "context per request". To validate that it isn't SimpleInjector, I have spun up an instance of the context by just new'ing it up. Same slow create times.
(Edit) We have ngen'd EF. It didn't make any impact.
What can we investigate further? I understand the cache used by EF is extensive, to speed things up. Does having more things in the cache slow down context creation? Is there a way to see exactly what's in that cache, to spot anything weird in there? Does anyone know what specifically we can do to speed up context creation?
Update - 5/30/17
I took the EF6 source and compiled our own version to stick in some timings. We run a pretty popular site, so collecting a huge amount of timing info is tricky and I didn't get as far as I wanted, but basically we found that all of the slowdown is coming from this method:
public void ForceOSpaceLoadingForKnownEntityTypes()
{
    if (!_oSpaceLoadingForced)
    {
        // Attempting to get o-space data for types that are not mapped is expensive so
        // only try to do it once.
        _oSpaceLoadingForced = true;

        Initialize();
        foreach (var set in _genericSets.Values.Union(_nonGenericSets.Values))
        {
            set.InternalSet.TryInitialize();
        }
    }
}
Each iteration of that foreach runs for each one of the entities defined by a DbSet in our context. Each iteration is relatively short, 0.1-0.3 ms, but when you add in the 254 entities we had, it adds up. We still haven't figured out why it's fast at the beginning and slows down.
Here is where I would start in solving the problem, short of moving to a more enterprise-friendly solution.
Our context is a fairly large code-first context with around 300 entities
While EF has greatly improved over time, I would still start seriously looking at chopping things up once you get to 100 entities (actually I would start well before that, but that seems to be a magic number many people have stated - consensus?). Think of it as designing for "contexts", but use the word "domain" instead? That way you can sell your execs on the idea that you are applying "domain-driven design" to fix the application? Maybe you are designing for future "microservices"; then you get to use two buzzwords in a single paragraph. ;-)
I am not a huge fan of EF in the Enterprise space, so I tend to avoid it for high scale or high performance applications. Your mileage may vary. For SMB, it is probably perfectly fine. I do run into clients that use it, however.
I am not sure the following ideas are completely up to date, but they are some other things I would consider, based on experience.
Pre-gen your views. They are the most expensive part of the query. This will help even more with large models.
Move your model to a separate assembly. Not so much a perf thing as a pet peeve of mine about code organization.
Examine your application and model for caching possibilities. Query plan caching can often shave quite a bit of time off.
Use CompiledQuery.
When feasible, use NoTracking. This is a huge savings if you do not need change tracking (see the sketch just after this list).
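To make that last point concrete, here is a minimal sketch, assuming an EF6 code-first DbContext; MyDbContext, Things, and IsActive are illustrative names, not taken from the question. For read-only queries, skipping the change tracker avoids per-entity bookkeeping:
// Requires "using System.Data.Entity;" for AsNoTracking and "using System.Linq;" for Where/ToList.
using (var db = new MyDbContext())
{
    // Entities are materialized but never attached to the change tracker,
    // so there is no snapshot to maintain and nothing for DetectChanges to scan.
    var activeThings = db.Things
        .AsNoTracking()
        .Where(t => t.IsActive)
        .ToList();
}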
It looks like you are already running some type of profiler on the application, so I am going to assume you have also looked at your SQL queries and possible performance gains there. Yes, I know that is not the problem you are looking to solve, but it is something that can contribute to the overall issue from the user's perspective.
In response to #WiktorZichia's comment about not answering the question about the performance problem: the best way to get rid of these types of problems in an enterprise system is to get rid of Entity Framework. There are trade-offs in every decision. EF is a great abstraction and speeds up development, but it comes with some unnecessary overhead that can hurt systems at scale. Now, technically, I still did not answer the "how do I solve this problem the way I am trying to solve it" question, so this might still be seen as a fail.
Related
This might sound like a very strange question, but I work on a project which needs to have circular references within it. Actually, they are unavoidable, because users can create their own circular references within the GUI, and this is absolutely intended... Please don't ask why; that would take ages to explain.
All the questions, answers, and resources I have found which discuss circular references provide solutions and approaches for avoiding them. But none I have read contained a solution for how to build one without killing the underlying computational resources.
Issues I see
Such a circular reference seems to me to always have the potential to completely overwhelm the underlying system, be it a simple home computer or a research supercomputer on which this program is meant to run.
This is due to my understanding that the resources provided are always finite, but circular references are infinite by nature.
The resources I see which might be at issue here are:
computational power (CPU)
working memory (RAM)
Data storage
Network bandwidth
How could it be possible to mitigate those issues
Mitigation could take place by making sure that the program itself is only ever able to increase its need for computational resources in a very minor and incremental fashion. If measures are then implemented which, based on data gathered about the whole system as a unit, allow us to decide whether further evolutions are even necessary to improve the perceived quality of the system, that would help us cap the need for computational resources.
One way I could imagine this capping taking place is by introducing time as a limiting factor. The program could be designed in such a way that it only considers re-evaluating "itself" after a given amount of time. If this time and the quality limit are carefully chosen to match the underlying computational resources, I feel the resource issues with circular references could be mitigated.
Code Snippet
Find below a very simplified code snippet. Point 1 and Point 2 are completely independent in nature; they could even be on different threads (actually, that's one idea for how it could be done, but I don't understand multithreading well enough to decide whether it would be a good approach). The action only begins when they are attached to one another. I do not care whether the "first this, then that" behavior happens in a specific order. The only thing I do care about is that all interactions between those two Points have taken place at some point in the future (after their attachment).
namespace Circularity
{
    class Program
    {
        static void Main(string[] args)
        {
            Point Point1 = new Point();
            Point Point2 = new Point();

            Point1.attach(Point2);
        }
    }

    class Point
    {
        private ulong Value;

        public Point()
        {
            Value = ulong.MaxValue / 2;
        }

        public void attach(Point otherPoint)
        {
            if (Value < ulong.MaxValue) Value++;
            otherPoint.attach(this);
        }
    }
}
This code instantly leads to a stack overflow, but I do not understand the underlying concepts of the stack well enough to implement a countermeasure. I already tried to apply the time concept here, but it just takes longer for the stack overflow to happen.
The reason you're getting a stack overflow is that you're calling attach recursively, so you just keep adding stack frames; the CLR can't handle that many, and as you've witnessed, it quickly maxes out. One strategy here would be to use continuation passing style so that you avoid building up a stack of method calls.
When and how to use continuation passing style
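For illustration, here is a minimal sketch of that idea applied to the code above. Rather than full continuation passing style, it uses an explicit work queue (a simple trampoline), which gives the same benefit of not growing the call stack. The Interact method and the iteration budget are my additions, not part of the original code:
using System;
using System.Collections.Generic;

namespace Circularity
{
    class Program
    {
        static void Main(string[] args)
        {
            Point point1 = new Point();
            Point point2 = new Point();

            // Each queued pair means "Item1 interacts with Item2 next".
            var pending = new Queue<Tuple<Point, Point>>();
            pending.Enqueue(Tuple.Create(point1, point2));

            long iterations = 0;
            const long budget = 10000000; // cap by iterations (or elapsed time) instead of running forever

            while (pending.Count > 0 && iterations++ < budget)
            {
                var pair = pending.Dequeue();
                pair.Item1.Interact();

                // Re-enqueue the reverse interaction: the circular relationship is
                // preserved, but the call stack never gets deeper than one frame.
                pending.Enqueue(Tuple.Create(pair.Item2, pair.Item1));
            }
        }
    }

    class Point
    {
        private ulong Value = ulong.MaxValue / 2;

        public void Interact()
        {
            if (Value < ulong.MaxValue) Value++;
        }
    }
}
The queue lives on the heap, so its growth is bounded by memory rather than by the thread's stack size, and the budget (or a timer, as you suggested) is what keeps the circular interaction from consuming the machine.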
Sorry for my last question; my code was so stupid.
My base situation is: I want to construct a state tree which has 8! items in the last state, so the total count of iterations is about 100,000 (8!*2 + 7! + 6! + ...).
It currently takes less than one second, and I need to construct it every time my artificial intelligence makes a move. Of course, alpha/beta search is a solution, but before thinking about that I want to optimize my code so I really have the best possible performance.
What I already did:
I tried to replace every LINQ function with precalculations or collections with faster access (Dictionary), added more precalculations to skip whole operations and, of course, some approximations to avoid heavy calculations, and I use List constructors only when there's actually a change; if not, I just reuse the reference.
There'll be more calculations coming, so I really need more ideas for reducing the work - maybe something about which collection is fastest for my purpose.
My code
It's about the BuildChildNodes function and the TryCollect function it calls. My constructor does some small precalculations. My state tree knows everything, even the cards which aren't actually shown.
As the comment came up: I'm not asking you to read and understand my code to provide content-wise advice. I'm asking about the functions, operators, data types, and classes I'm using, and whether a replacement could be made that runs a bit faster - e.g. if there's a faster collection for my purpose, or if you have a better idea for replacing the collection's constructor with a faster method with regard to adding and removing afterwards.
Edit: okay, List is definitely the best type I can use. I tried [] arrays and even Dictionaries, and last of all I even tried LinkedLists - all with a significant loss.
I can see that RemoveAt() could be expensive, as it is proportional to the size of the list.
You can always use the Visual Studio performance profiler to find out where you should optimize your code the most.
If you can find a way to use fixed-size arrays that you allocate when your program starts, instead of dynamically allocated data structures like List, you will save a lot on memory allocation management overhead.
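As a rough sketch of that suggestion, assuming you know an upper bound on nodes per level up front (the question mentions 8! for the last state); Node, NodeBuffer, and the capacity are illustrative names, not taken from the linked code:
public class Node { /* your tree node */ }

public class NodeBuffer
{
    private readonly Node[] _nodes;
    private int _count;

    public NodeBuffer(int capacity)
    {
        _nodes = new Node[capacity]; // one allocation at startup
    }

    public void Add(Node node)
    {
        _nodes[_count++] = node;     // no resizing, no copying, no extra garbage
    }

    public void Clear()
    {
        _count = 0;                  // reuse the same array for the next move
    }

    public Node this[int index]
    {
        get { return _nodes[index]; }
    }

    public int Count
    {
        get { return _count; }
    }
}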
I've just been noodling about with a profiler looking at performance bottlenecks in a WCF application after some users complained of slowness.
To my surprise, almost all the problems came down to Entity Framework operations. We use a repository pattern and most of the "Add/Modify" code looks very much like this:
public void Thing_Add(Thing thing)
{
    Log.Trace("Thing_Add called with ThingID " + thing.ThingID);

    if (db.Things.Any(m => m.ThingID == thing.ThingID))
    {
        db.Entry(thing).State = System.Data.EntityState.Modified;
    }
    else
    {
        db.Things.Add(thing);
    }
}
This is obviously a convenient way to wrap an add/update check into a single function.
Now, I'm aware that EF isn't the most efficient thing when it comes to doing inserts and updates. However, my understanding was (which a little research bears out) that it should be capable of processing a few hundred records faster than a user would likely notice.
But this is causing big bottlenecks on small upserts. For example, in one case it takes six seconds to process about fifty records. That's a particularly bad example but there seem to be instances all over this application where small EF upserts are taking upwards of a second or two. Certainly enough to annoy a user.
We're using Entity Framework 5 with a Database First model. The profiler says it's not the Log.Trace that's causing the issue. What could be causing this, and how can I investigate and fix the issue?
I found the root of the problem on another SO post: DbContext is very slow when adding and deleting
Turns out that when you're working with a large number of objects, especially in a loop, the gradual accumulation of change tracking makes EF get slower and slower.
Refreshing the DbContext isn't enough in this instance as we're still working with too many linked entities. So I put this inside the repository:
public void AutoDetectChangesEnabled(bool detectChanges)
{
    db.Configuration.AutoDetectChangesEnabled = detectChanges;
}
And can now use it to turn AutoDetectChangesEnabled on and off before doing looped inserts:
try
{
    rep.AutoDetectChangesEnabled(false);

    foreach (var thing in thingsInFile)
    {
        rep.Thing_Add(new Thing(thing));
    }
}
finally
{
    rep.AutoDetectChangesEnabled(true);
}
This makes a hell of a difference, although it needs to be used with care, since it'll stop EF from recognizing potential updates to changed objects.
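For completeness, here is roughly what the same pattern looks like directly against the context rather than through the repository. This is a sketch: ThingsContext is a placeholder name, while Thing, Things, and thingsInFile are from the question. My understanding is that with AutoDetectChangesEnabled off, EF no longer calls DetectChanges for you anywhere, including SaveChanges, so one manual call before saving covers any modified entities:
using (var db = new ThingsContext()) // placeholder context type
{
    db.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        foreach (var thing in thingsInFile)
        {
            db.Things.Add(new Thing(thing));
        }

        // With automatic detection off, nothing rescans the tracked entities on
        // every Add; do a single scan at the end so pending changes are picked up.
        db.ChangeTracker.DetectChanges();
        db.SaveChanges();
    }
    finally
    {
        db.Configuration.AutoDetectChangesEnabled = true;
    }
}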
We have a problem which seems to be caused by the constant allocation and deallocation of memory:
We have a rather complex system here, where a USB device is measuring arbitrary points and sending the measurement data to the PC at a rate of 50k samples per second. These samples are then collected as MeasurementTasks in the software for each point and afterwards processed, which requires even more memory because of the demands of the calculations.
Simplified each MeasurementTask looks like the following:
public class MeasurementTask
{
    public LinkedList<Sample> Samples { get; set; }
    public ComplexSample[] ComplexSamples { get; set; }
    public Complex Result { get; set; }
}
Where Sample looks like:
public class Sample
{
    public ushort CommandIndex;
    public double ValueChannel1;
    public double ValueChannel2;
}
and ComplexSample like:
public class ComplexSample
{
    public double Channel1Real;
    public double Channel1Imag;
    public double Channel2Real;
    public double Channel2Imag;
}
In the calculation process, the Samples are first converted into one ComplexSample each and then processed further until we get our Complex Result. After these calculations are done we release all the Sample and ComplexSample instances, and the GC cleans them up soon after, but this results in a constant "up and down" of the memory usage.
This is how it looks at the moment with each MeasurementTask containing ~300k samples:
Now we sometimes have the problem that the sample buffer in our HW device overflows, as it can only store ~5000 samples (~100ms), and it seems the application is not always reading the device fast enough (we use BULK transfer with LibUSB/LibUSBDotNet). We tracked this problem down to this "memory up and down" via the following facts:
the reading from the USB device happens in its own thread which runs at ThreadPriority.Highest, so the calculations should not interfere
CPU usage is between 1-5% on my 8-core CPU => <50% of one core
if we have (much) faster MeasurementTasks with only a few hundred samples each, the memory only goes up and down very little and the buffer never overflows (but the number of instances per second is the same, as the device still sends 50k samples/second)
we had a bug before which did not release the Sample and ComplexSample instances after the calculations, so the memory only went up at ~2-3 MB/s and the buffer overflowed all the time
At the moment (after fixing the bug mentioned above) we have a direct correlation between the samples count per point and the overflows. More samples/point = higher memory delta = more overflows.
Now to the actual question:
Can this behaviour be improved (easily)?
Maybe there is a way to tell the GC/runtime to not release the memory so there is no need to re-allocate?
We also thought of an alternative approach: "re-using" the LinkedList<Sample> and ComplexSample[] by keeping a pool of such lists/arrays and, instead of releasing them, putting them back in the pool and "changing" these instances instead of creating new ones. But we are not sure this is a good idea, as it adds complexity to the whole system...
But we are open to other suggestions!
UPDATE:
I now optimized the code base with the following improvements and did various test runs:
converted Sample to a struct
got rid of the LinkedList<Sample> and replaced them with straight arrays (I actually had another one somewhere else which I also removed)
several minor optimizations I found during analysis and optimization
(optional - see below) converted ComplexSample to a struct
In any case it seems that the problem is gone now on my machine (long-term tests and tests on low-spec hardware will follow), but I first ran a test with both types as structs and got the following memory usage graph:
There it was still going up to ~300 MB on a regular basis (but with no overflow errors anymore); as this still seemed odd to me, I did some additional tests:
Side note: Each value of each ComplexSample is altered at least once during the calculations.
1) Add a GC.Collect after a task is processed and the samples are not referenced any more:
Now it was alternating between 140 MB and 150 MB (no noticeable performance hit).
2) ComplexSample as a class (no GC.Collect):
Using a class it is much more "stable" at ~140-200 MB.
3) ComplexSample as a class and GC.Collect:
Now it is going "up and down" a little in the range of 135-150 MB.
Current solution:
As we are not sure this is a valid case for manually calling GC.Collect, we are using "solution 2)" now, and I will start running the long-term (= several hours) and low-spec hardware tests...
Can this behaviour be improved (easily)?
Yes (depends on how much you need to improve it).
The first thing I would do is to change Sample and ComplexSample to be value types. This will reduce the complexity of the object graph dealt with by the GC: while the arrays and linked lists are still collected, they contain those values directly rather than references to them, and that simplifies the rest of the GC's work.
Then I'd measure performance at this point. The impact of working with relatively large structs is mixed. The guideline that value types should be less than 16 bytes comes from it being around that point where the performance benefits of using a reference type tend to overwhelm the performance benefits of using a value type, but that guideline is only a guideline because "tend to overwhelm" is not the same as "will overwhelm in your application".
After that, if it had either not improved things or not improved things enough, I would consider using a pool of objects, whether for those smaller objects, only the larger objects, or both. This will most certainly increase the complexity of your application, but if it's time-critical, then it might well help. (See, for example, How do these people avoid creating any garbage?, which discusses avoiding normal GC in a time-critical case.)
If you know you'll need a fixed maximum of a given type, this isn't too hard: create and fill an array of them up front and dole them out from that array, returning them to it as they are no longer used. It's still hard enough in that you no longer have GC being automatic and have to manually "delete" the objects by putting them back in the pool.
If you don't have such knowledge, it gets harder but is still possible.
If it is really vital that you avoid GC, be careful of hidden objects. Adding to most collection types can, for example, result in them moving up to a larger internal store and leaving the earlier store to be collected. Maybe this is fine in that you've still reduced GC use enough that it is no longer causing the problem you have, but maybe not.
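For illustration, a minimal sketch of the fixed-maximum pool idea for the ComplexSample[] buffers from the question. The pool class itself is hypothetical, not an existing API, and it assumes you can bound both the number of concurrent MeasurementTasks and the samples per task:
using System.Collections.Concurrent;

public sealed class ComplexSampleBufferPool
{
    private readonly ConcurrentBag<ComplexSample[]> _buffers = new ConcurrentBag<ComplexSample[]>();
    private readonly int _samplesPerBuffer;

    public ComplexSampleBufferPool(int bufferCount, int samplesPerBuffer)
    {
        _samplesPerBuffer = samplesPerBuffer;

        // Allocate everything up front so steady-state processing creates
        // no new large arrays for the GC to deal with.
        for (int i = 0; i < bufferCount; i++)
        {
            _buffers.Add(new ComplexSample[samplesPerBuffer]);
        }
    }

    public ComplexSample[] Rent()
    {
        ComplexSample[] buffer;

        // Fall back to allocating if the pool runs dry; you could also block
        // or throw here, depending on how hard your bound really is.
        return _buffers.TryTake(out buffer) ? buffer : new ComplexSample[_samplesPerBuffer];
    }

    public void Return(ComplexSample[] buffer)
    {
        // Callers must stop using the buffer once it has been returned.
        _buffers.Add(buffer);
    }
}
The same approach works for the Sample containers; whatever values are left in a returned buffer simply get overwritten the next time it is rented.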
I've rarely seen a LinkedList<> used in .NET... Have you tried using a List<>? Consider that the basic "element" of a LinkedList<> is a LinkedListNode<>, which is a class... so for each Sample there is the additional overhead of a whole extra object.
Note that if you want to use "big" value types (as suggested by others), the List<> could become slower again (because the List<> grows by allocating a new internal array of double the current size and copying from old to new), so the bigger the elements, the more memory the List<> has to copy around when it doubles itself.
If you go to List<>, you could try splitting the Sample into two parallel lists:
List<ushort> CommandIndex;
List<Sample> ValueChannels;
where Sample is then reduced to just the two double channels. This is because the doubles of Sample require 8-byte alignment, so as written Sample is 24 bytes with only 18 bytes used, while a struct holding only the two doubles is exactly 16 bytes.
This wouldn't be a good idea for LinkedList<>, because the LL has a big overhead per item.
Change Sample and ComplexSample to struct.
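A minimal sketch of what that looks like for the types from the question; an array of these then holds the values inline, so the GC tracks one array object instead of hundreds of thousands of small objects:
public struct Sample
{
    public ushort CommandIndex;
    public double ValueChannel1;
    public double ValueChannel2;
}

public struct ComplexSample
{
    public double Channel1Real;
    public double Channel1Imag;
    public double Channel2Real;
    public double Channel2Imag;
}
Keep the copy semantics in mind: assigning a struct copies all of its fields, so modify elements in place in the array rather than copying them out and back.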
I currently have a function:
public static Attribute GetAttribute(MemberInfo Member, Type AttributeType)
{
    Object[] Attributes = Member.GetCustomAttributes(AttributeType, true);

    if (Attributes.Length > 0)
        return (Attribute)Attributes[0];
    else
        return null;
}
I am wondering if it would be worthwhile to cache all the attributes on a property into an
Attribute = _cache[MemberInfo][Type] dictionary.
This would require using GetCustomAttributes without any type parameter and then enumerating over the result. Is it worth it?
You will get better bang for your buck if you replace the body of your method with this:
return Attribute.GetCustomAttribute(Member, AttributeType, false); // only look in the current member and don't go up the inheritance tree
If you really need to cache on a type-basis:
public static class MyCacheFor<T>
{
    static MyCacheFor()
    {
        // grab the data
        Value = ExtractExpensiveData(typeof(T));
    }

    public static readonly MyExpensiveToExtractData Value;

    private static MyExpensiveToExtractData ExtractExpensiveData(Type type)
    {
        // ...
    }
}
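For illustration, a call site just reads the static field; Customer here is a hypothetical type:
var data = MyCacheFor<Customer>.Value; // the static constructor runs once per T, on first use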
Beats dictionary lookups every time. Plus it's thread-safe. :)
Cheers,
Florian
PS: It depends how often you call this. I had some cases where doing a lot of serialization using reflection really called for caching. As usual, you want to measure the performance gain versus the memory usage increase. Instrument your memory use and profile your CPU time.
The only way you can know for sure is to profile it. I am sorry if this sounds like a cliche, but the reason a saying becomes a cliche is often that it's true.
Caching the attributes actually makes the code more complex and more error-prone, so you might want to take this into account - your development time - before you decide.
So like optimization, don't do it unless you have to.
From my experience (I am talking about an AutoCAD-like Windows application, with a lot of click-edit GUI operations and heavy number crunching), reading custom attributes has never - not even once - been the performance bottleneck.
I just had a scenario where GetCustomAttributes turned out to be the performance bottleneck. In my case it was getting called hundreds of thousands of times in a dataset with many rows and this made the problem easy to isolate. Caching the attributes solved the problem.
Preliminary testing led to a barely noticeable performance hit at about 5000 calls on a modern day machine. (And it became drastically more noticeable as the dataset size increased.)
I generally agree with the other answers about premature optimization; however, on a scale from CPU instruction to DB call, I'd suggest that GetCustomAttributes leans more towards the latter.
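For illustration, a minimal sketch of one way to cache these lookups, keyed on the member and attribute type; AttributeCache is an illustrative name. ConcurrentDictionary keeps it thread-safe, and reflection is only hit on the first miss per key:
using System;
using System.Collections.Concurrent;
using System.Reflection;

public static class AttributeCache
{
    private static readonly ConcurrentDictionary<Tuple<MemberInfo, Type>, Attribute> Cache =
        new ConcurrentDictionary<Tuple<MemberInfo, Type>, Attribute>();

    public static Attribute GetAttribute(MemberInfo member, Type attributeType)
    {
        // GetOrAdd only invokes the factory (the reflection call) when the key
        // is not already present; later calls are a plain dictionary lookup.
        return Cache.GetOrAdd(
            Tuple.Create(member, attributeType),
            key => Attribute.GetCustomAttribute(key.Item1, key.Item2, false));
    }
}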
Your question is a case of premature optimization.
You don't know the inner workings of the reflection classes and therefore are making assumptions about the performance implications of calling GetCustomAttributes multiple times. The method itself could well cache its output already, meaning your code would actually add overhead with no performance improvement.
Save your brain cycles for thinking about things which you already know are problems!
Old question, but GetCustomAttributes is costly/heavyweight.
Using a cache can be a good idea if it is causing performance problems.
The article I linked (Dodge Common Performance Pitfalls to Craft Speedy Applications) was taken down, but here is a link to an archived version:
https://web.archive.org/web/20150118044646/http://msdn.microsoft.com:80/en-us/magazine/cc163759.aspx
Are you actually having a performance problem? If not then don't do it until you need it.
It might help, depending on how often you call the method with the same parameters. If you only call it once per MemberInfo/Type combination then it won't do any good. Even if you do cache it, you are trading speed for memory consumption. That might be fine for your application.