I am using EF 4.2 and am having an issue that happens quite randomly and without warning. I have a Windows Service which updates the database. In the service I have a timer; when the timer elapses, a method gets called. This is the basic structure of the method:
IEnumerable<Foo> foos = GetFoosFromDB();
foreach (Foo foo in foos)
{
    if (some condition)
    {
        foo.Bar = 1;
    }
    if (some other condition)
    {
        foo.Bar = 2;
    }
    if (yet some other condition)
    {
        foo.Bar = 3;
    }
    else
    {
        int val = GetSomeValueFromDB();
        if (val == something)
        {
            if (GetSomeOtherValueFromDB())
            {
                foo.Bar = 4;
            }
            else
            {
                CallSomeMethodThatAlsoCallsSaveChanges();
                foo.Bat = SomeCalculatedValue();
            }
        }
    }
}
SaveChanges();
Now, the problem is that once we have been working with the database for a day and there are a few rows in its tables (we are talking about only 100 or 200 rows), then even though this method is called, SaveChanges doesn't seem to do what it should. What am I doing wrong?
thanks,
Sachin
Ignoring the other aspects of the code, this line stuck out as a likely problem:
else
{
    CallSomeMethodThatAlsoCallsSaveChanges();
    foo.Bat = SomeCalculatedValue();
}
// a few }} later...
SaveChanges();
When this logic branch is executed, your context's pending changes are committed to the DB (based on what you've provided). Depending on how you're creating and managing your DbContext objects, you've either cleared the modified list or introduced potential change conflicts. When SaveChanges() is called after the loop, it may or may not have pending changes to commit (depending on whether the conditional logic called your other method).
Consider what logical unit(s) of work are being performed with this logic and keep those UoW atomically separated. Think about how your DB contexts are being created, managed, and passed around, since those maintain the local state of your objects.
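For example (a sketch, with hypothetical method names): if the helper can be refactored so that it no longer calls SaveChanges itself, the whole loop becomes one unit of work with a single commit:

// Sketch: the helper only computes; nothing commits until the end.
IEnumerable<Foo> foos = GetFoosFromDB();
foreach (Foo foo in foos)
{
    // ... same conditional logic as before ...
    if (ShouldCalculate(foo)) // hypothetical predicate for the else-branch
    {
        DoWorkWithoutSaving();           // refactored: no SaveChanges inside
        foo.Bat = SomeCalculatedValue();
    }
}
SaveChanges(); // single commit for the whole timer tick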
If you are still having trouble, you can post more of your code and we can attempt to troubleshoot further.
So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save and clear the collection so as not to run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently, if the record count is above a certain threshold, I save the data in the current collection within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();

Parallel.ForEach(inputrecords, input =>
{
    lock (StackLock)
    {
        if (OutRecs.Count >= 50000)
        {
            Save(OutRecs);
            OutRecs.Clear();
        }
    }
    OutRecs.Push(CreateOutputRecord(input));
});
if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to the output collection? If not, is there a better way to do this?
Your lock will work correctly, but it will not be very efficient, because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so taking a lock in each iteration on each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
    // collection of objects to iterate over
    inputrecords,
    // delegate to initialize thread-local data
    () => new List<OutRecord>(),
    // body of loop
    (inputrecord, loopstate, localstorage) =>
    {
        localstorage.Add(CreateOutputRecord(inputrecord));
        if (localstorage.Count > 1000)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
        return localstorage;
    },
    // local finally delegate runs once for each task as it finishes
    localstorage =>
    {
        if (localstorage.Count > 0)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
    });
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name
{
    void WriteRecord(T record);
    void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there's too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
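For example, the parallel processing code could then shrink to something like this (a sketch; writer here is assumed to be any IRecordWriter<OutRecord> implementation):

// The processing class depends only on the interface, not on buffering or storage.
Parallel.ForEach(inputrecords, input =>
{
    writer.WriteRecord(CreateOutputRecord(input));
});
writer.Flush(); // push out whatever is still buffered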
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private bool _flushing;

    protected BufferedRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity && !_flushing)
            Flush();
    }

    public void Flush()
    {
        _flushing = true;
        try
        {
            var recordsToWrite = new List<T>();
            while (_buffer.TryDequeue(out T dequeued))
            {
                recordsToWrite.Add(dequeued);
            }
            if (recordsToWrite.Any())
                WriteRecords(recordsToWrite);
        }
        finally
        {
            _flushing = false;
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue, Flush can keep dequeuing records even as new ones are added.
That WriteRecords method could be anything specific to how you write your records. Instead of this being an abstract class, the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind, because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface, which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. The _flushing flag will avoid most concurrent executions, and it's okay if two concurrent executions both dequeue from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.
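For illustration, a concrete "final" writer might be as small as this (a sketch; the context and DbSet names are hypothetical, and it assumes EF 6's AddRange):

public class DbRecordWriter : BufferedRecordWriter<OutRecord>
{
    protected override void WriteRecords(IEnumerable<OutRecord> records)
    {
        // One short-lived context per batch keeps the change tracker small.
        using (var context = new MyDbContext())
        {
            context.OutRecords.AddRange(records);
            context.SaveChanges();
        }
    }
}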
In my tests I am reading a text file line by line and inserting an entity along with other related entities. The problem is that when too many are inserted I receive an OutOfMemoryException.
In my attempt to prevent this I create a new DbContext for every 50 rows and dispose of the old one. It was my understanding that this would free up memory from the earlier entity operations, but the memory continues to climb, and if the file is big enough an out of memory exception occurs. This is related to the entity code: if I remove the lines of code that add the entity, memory usage stays consistent.
Below is a simplified version of my code.
public class TestClass
{
    public void ImportData(byte[] fileBytes)
    {
        using (Stream stream = new MemoryStream(fileBytes))
        {
            TextFieldParser parser = new TextFieldParser(stream);
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            while (!parser.EndOfData)
            {
                // Processes 50 lines; creates a new DbContext each time it's called
                ImportBatch(parser);
            }
        }
    }

    public void ImportBatch(TextFieldParser parser)
    {
        using (myDbContext context = new myDbContext())
        {
            context.Configuration.AutoDetectChangesEnabled = false;

            int batchCount = 0;
            while (!parser.EndOfData && batchCount < 50)
            {
                string[] fields = parser.ReadFields();
                // Here I call some code that will add an entity and add related entities
                // in its navigation properties
                MyService.AddMyEntity(fields, myDbContext);
                batchCount++;
            }

            myDbContext.ChangeTracker.DetectChanges();
            myDbContext.SaveChanges();
        }
    }
}
As I am disposing of and creating a new context every 50 inserts, I would expect the memory usage to stay constant, but it is constant only for the first two thousand rows or so; after that the memory constantly climbs until an OutOfMemoryException is hit.
Is there a reason why disposing of a DbContext in this fashion would not result in the memory being released?
EDIT - added some simplified code of my add entity method
public void AddMyEntity(string[] fields, MyDbContext myDbContext)
{
    MyEntity myEntity = new MyEntity();
    myEntity.InsertDate = DateTime.UtcNow;
    myEntity.AmendDate = DateTime.UtcNow;

    // If I remove this line the memory does not consistently climb
    myDbContext.MyEntities.Add(myEntity);

    foreach (string item in fields)
    {
        RelatedEntity relatedEntity = new RelatedEntity();
        relatedEntity.Value = item;
        myEntity.RelatedEntities.Add(relatedEntity);
    }
}
Another Edit
Turns out, after more testing, that it is something to do with the Glimpse profiler. I have included Glimpse in my project, and the web.config has a section similar to the one below.
<glimpse defaultRuntimePolicy="On" endpointBaseUri="~/Glimpse.axd">
  <tabs>
    <ignoredTypes>
      <add type="Glimpse.Mvc.Tab.ModelBinding, Glimpse.Mvc5"/>
      <add type="Glimpse.Mvc.Tab.Metadata, Glimpse.Mvc5"/>
    </ignoredTypes>
  </tabs>
  <inspectors>
    <ignoredTypes>
      <add type="Glimpse.Mvc.Inspector.ModelBinderInspector, Glimpse.Mvc5"/>
    </ignoredTypes>
  </inspectors>
</glimpse>
Turning defaultRuntimePolicy to Off fixed the memory leak. I'm still not sure why, though.
Calling Dispose on an object does not necessarily free up memory. Objects are removed from memory by the garbage collector when they are no longer referenced by any live object. Calling Dispose might free up other resources (e.g. by closing an open SQL connection in the case of a DbContext), but it's only when the object is no longer referenced that it becomes a candidate for garbage collection.
There's no guarantee that the garbage collector will run at a specific point in time. Calling Dispose certainly doesn't cause it to run. Having said that, I'm surprised that it doesn't run before you run out of memory. You could force it to run with GC.Collect but that really shouldn't be necessary.
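If you want to rule the garbage collector in or out while diagnosing, a quick (diagnostic-only) check looks like this:

// Diagnostic only - don't ship this. Forces a full collection and reports
// how much memory is still reachable afterwards.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
Console.WriteLine("Live memory: {0:N0} bytes", GC.GetTotalMemory(true));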
It's possible that you still have a reference to your context objects that's causing them not to be considered eligible for garbage collection. For example, you pass myDbContext to your AddEntity method in your service layer - does that store something that includes a back-reference (even indirectly) to the context?
That's because whenever you call ImportBatch(parser), it creates a new DbContext - not one DbContext for each 50 rows. You could try a property getter that counts batches and hands the context back to you. Something like this:
int _batchCount = 0;
public myDbContext _db;
public myDbContext Db
{
    get
    {
        // If _batchCount > 50 or _db is not created, we need to create _db
        if (_db == null || _batchCount > 50)
        {
            // If _db is already created, _batchCount > 50
            if (_db != null)
            {
                _db.ChangeTracker.DetectChanges();
                _db.SaveChanges();
                _db.Dispose();
            }
            _db = new myDbContext();
            _db.Configuration.AutoDetectChangesEnabled = false;
            _batchCount = 0;
        }
        _batchCount++;
        return _db;
    }
}
Additionally, in MyService.AddMyEntity(fields) you are using the DbContext from the MyService class, not the one you created in the using statement.
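A sketch of the corrected batch method, passing the context created in the using block:

public void ImportBatch(TextFieldParser parser)
{
    using (myDbContext context = new myDbContext())
    {
        context.Configuration.AutoDetectChangesEnabled = false;

        int batchCount = 0;
        while (!parser.EndOfData && batchCount < 50)
        {
            string[] fields = parser.ReadFields();
            MyService.AddMyEntity(fields, context); // the context from this using block
            batchCount++;
        }

        context.ChangeTracker.DetectChanges();
        context.SaveChanges();
    }
}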
Is it possible to get the class that called the SaveChanges() method in the EventHandler?
That's because I have an entity called Activity whose status can be changed by several parts of the system, and I need to log that and save it in the database. In the log table I need to store the IDs of the entity that was updated or created and thus caused the activity status to change.
I think I can either do it this way or resort to the unmaintainable solution.
The unmaintainable solution would be to add some code to every part of the system that changes the activity status.
PS: I can't use database triggers.
I don't think trying to update another table as part of SaveChanges is the correct approach here; you would be coupling your logging mechanism to that particular context. What if you wanted to disable logging, or switch it out for a different type of logging, e.g. a local file?
I would update the log table along with the entity itself if the update was successful, i.e.:
var entity = ...
// update entity
if (context.SaveChanges() != 0)
{
    // update log table
}
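Concretely, that might look like this (a sketch; the Activity and log entity names are hypothetical):

var activity = context.Activities.Find(activityId);
activity.Status = newStatus;

if (context.SaveChanges() != 0) // the entity update actually wrote something
{
    context.ActivityLogs.Add(new ActivityLog
    {
        ActivityId = activity.Id,
        NewStatus = newStatus,
        LoggedAt = DateTime.UtcNow
    });
    context.SaveChanges(); // second save commits the log row
}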
It's possible (but I would recommend against it) using a StackTrace, e.g.:
using System;
using System.Diagnostics;

public class Test
{
    public event EventHandler AnEvent;

    public Test()
    {
        AnEvent += WhoDoneIt;
    }

    public void Trigger()
    {
        if (AnEvent != null)
            AnEvent(this, EventArgs.Empty);
    }

    public void WhoDoneIt(object sender, EventArgs eventArgs)
    {
        var stack = new StackTrace();
        for (var i = 0; i < stack.FrameCount; i++)
        {
            var frame = stack.GetFrame(i);
            var method = frame.GetMethod();
            Console.WriteLine("{0}:{1}.{2}", i, method.DeclaringType.FullName, method.Name);
        }
    }
}

public class Program
{
    static void Main(string[] args)
    {
        var test = new Test();
        test.Trigger();
        Console.ReadLine();
    }
}
If you look at the output of the program you can figure out which stack frame you want to look at and analyze the caller based on the Method of that frame.
HOWEVER, this can have serious performance implications - the stack trace is quite an expensive object to create, so I would really recommend changing your code to keep track of the caller in a different way. One idea is to store the caller in a [ThreadStatic] variable before calling SaveChanges and clear it afterwards.
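A minimal sketch of that last idea (all names hypothetical; DbContext requires System.Data.Entity):

public static class SaveChangesCaller
{
    [ThreadStatic]
    private static string _caller;

    // Whatever handles the save event can read this to identify the caller.
    public static string Current { get { return _caller; } }

    public static void Save(DbContext context, string caller)
    {
        _caller = caller;
        try { context.SaveChanges(); }
        finally { _caller = null; } // always clear, even if the save throws
    }
}

// Each call site identifies itself explicitly:
SaveChangesCaller.Save(context, "OrderProcessor");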
From your post it sounds like you're more interested in which entities are being updated than in which method called SaveChanges.
If that's the case, you can examine the pending changes and see which entities are either added or modified (or deleted if you care) and do your logging based on that information.
You would do that like this:
public override int SaveChanges()
{
    foreach (var dbEntityEntry in ChangeTracker.Entries())
    {
        switch (dbEntityEntry.State)
        {
            case EntityState.Added:
                // log your data
                break;
            case EntityState.Modified:
                // log your data
                break;
        }
    }
    return base.SaveChanges();
}
I wanted to parallelize a piece of code, but the code actually got slower, probably because of the overhead of Barrier and BlockingCollection. There would be two threads, where the first would find pieces of work which the second one would operate on. Neither operation is much work, so the overhead of synchronizing safely would quickly outweigh the benefit of two threads.
So I thought I would try to write some code myself to be as lean as possible, without using Barrier etc. It does not behave consistently, however. Sometimes it works, sometimes it does not, and I can't figure out why.
This code is just the mechanism I use to try to synchronize the two threads. It doesn't do anything useful, just the minimum amount of code you need to reproduce the bug.
So here's the code:
// node in linked list of work elements
class WorkItem {
    public int Value;
    public WorkItem Next;
}

static void Test() {
    WorkItem fst = null; // first element

    Action create = () => {
        WorkItem cur = null;
        for (int i = 0; i < 1000; i++) {
            WorkItem tmp = new WorkItem { Value = i }; // create new comm class
            if (fst == null) fst = tmp; // if it's the first, add it there
            else cur.Next = tmp;        // else add to back of list
            cur = tmp;                  // this is the current one
        }
        cur.Next = new WorkItem { Value = -1 }; // -1 means stop element
#if VERBOSE
        Console.WriteLine("Create is done");
#endif
    };

    Action consume = () => {
        //Thread.Sleep(1); // this also seems to cure it
#if VERBOSE
        Console.WriteLine("Consume starts"); // especially this one seems to matter
#endif
        WorkItem cur = null;
        int tot = 0;
        while (fst == null) { } // busy wait for first one
        cur = fst;
#if VERBOSE
        Console.WriteLine("Consume found first");
#endif
        while (true) {
            if (cur.Value == -1) break; // if stop element, break
            tot += cur.Value;
            while (cur.Next == null) { } // busy wait for next to be set
            cur = cur.Next; // move to next
        }
        Console.WriteLine(tot);
    };

    try { Parallel.Invoke(create, consume); }
    catch (AggregateException e) {
        Console.WriteLine(e.Message);
        foreach (var ie in e.InnerExceptions) Console.WriteLine(ie.Message);
    }

    Console.WriteLine("Consume done..");
    Console.ReadKey();
}
The idea is to have a linked list of work items. One thread adds items to the back of that list, and another thread reads them, does something, and polls the Next field to see if it is set. As soon as it is set, it moves to the new item and processes it. It polls the Next field in a tight busy loop because it should be set very quickly. Going to sleep, context switching, etc. would kill the benefit of parallelizing the code.
The time it takes to create a work item would be quite comparable to executing it, so the cycles wasted should be quite small.
When I run the code in release mode, sometimes it works, sometimes it does nothing. The problem seems to be in the consume thread; the create thread always seems to finish. (You can check by fiddling with the Console.WriteLines.)
It has always worked in debug mode. In release it's about a 50% hit and miss. Adding a few Console.WriteLines helps the success ratio, but even then it's not 100% (the #define VERBOSE stuff).
When I add the Thread.Sleep(1) in the consume thread it also seems to fix it. But not being able to reproduce a bug is not the same thing as knowing for sure it's fixed.
Does anyone here have a clue as to what goes wrong here? Is it some optimization that creates a local copy or something that does not get updated? Something like that?
There's no such thing as a partial update, right? Like a data race where one thread is half done writing and the other thread reads the partially written memory? Just checking.
Looking at it, I think it should just work. I guess once every few times the threads arrive in a different order and that makes it fail, but I don't get how. And how could I fix this without slowing it down?
Thanks in advance for any tips,
Gert-Jan
I do my damn best to avoid the utter minefield of closure/stack interaction at all costs.
This is PROBABLY a (language-level) race condition, but without reflecting Parallel.Invoke I can't be sure. Basically, sometimes fst is being changed by create() and sometimes not. Ideally, it should NEVER be changed (if C# had good closure behaviour). It could be due to which thread Parallel.Invoke chooses to run create() and consume() on. If create() runs on the main thread, it might change fst before consume() takes a copy of it. Or create() might be running on a separate thread and taking a copy of fst. Basically, as much as I love C#, it is an utter pain in this regard, so just work around it and treat all variables involved in a closure as immutable.
To get it working:
// Replace
WorkItem fst = null;
// with
WorkItem fst = WorkItem.GetSpecialBlankFirstItem();

// And replace
if (fst == null) fst = tmp;
// with
if (fst.Next == null) fst.Next = tmp;
A thread is allowed by the spec to cache a value indefinitely.
see Can a C# thread really cache a value and ignore changes to that value on other threads? and also http://www.yoda.arachsys.com/csharp/threads/volatility.shtml
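Applied to the code above, a minimal sketch of a fix along those lines: C# only allows volatile on fields, not on locals, so fst would have to move out of the method:

class WorkItem {
    public int Value;
    public volatile WorkItem Next; // consumer always re-reads this from memory
}

static volatile WorkItem fst;      // a field instead of a captured local,
                                   // since C# forbids volatile locals

// The create/consume bodies stay the same: every read of fst and cur.Next
// now has acquire semantics, so the busy-waits observe the producer's
// writes instead of spinning on a stale cached value.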
I'm dealing with a curious scenario.
I'm using Entity Framework to save (insert/update) into a SQL database in a multithreaded environment. The problem is I need to query the database to see whether a record with a particular key has already been created, in order to set a field to one value (executing) if it exists, or to a different value (pending) if it's new. Those records are identified by a unique GUID.
I've solved this problem by setting a lock, since I know the entity will not be present in any other process; in other words, I will not have the same GUID in different processes, and it seems to be working fine. It looks something like this:
static readonly object LockableObject = new object();

static void SaveElement(Entity e)
{
    lock (LockableObject)
    {
        Entity e2 = Repository.FindByKey(e);
        if (e2 == null)
        {
            Repository.Insert(e);
        }
        else
        {
            Repository.Update(e2);
        }
    }
}
But this implies that when I have a huge amount of requests to be saved, they will be queued.
I wonder if there is something like this (please, take it just as an idea):
static void SaveElement(Entity e)
{
    using (var protector = new ThisWouldBeAClassToProtectBasedOnACondition(e => e.UniqueId))
    {
        Entity e2 = Repository.FindByKey(e);
        if (e2 == null)
        {
            Repository.Insert(e);
        }
        else
        {
            Repository.Update(e2);
        }
    }
}
The idea would be to have a kind of protection that locks based on a condition, so each entity e would have its own lock based on its e.UniqueId property.
Any idea?
Don't use application-locks where database transactions or constraints are needed.
The use of a lock to prevent duplicate entries in a database is not a good idea. It limits the scalability of your application by forcing only a single instance to ever exist that can add or update such records. Worse, someone will eventually try to scale the application to multiple processes or servers, and it will cause data corruption (since locks are local to a single process).
What you should consider instead is using a combination of unique constraints in the database and transactions to ensure that no two attempts to add the same entry can both succeed. One will succeed - the other will be forced to rollback.
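With EF that could look something like this (a sketch, assuming a unique index on the GUID column; a real version would also detach the failed entity before retrying):

try
{
    Repository.Insert(e);   // optimistically insert
    context.SaveChanges();
}
catch (DbUpdateException)   // unique constraint violated: the row already exists
{
    // Another thread or process won the race, so update instead.
    Repository.Update(e);
    context.SaveChanges();
}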
This might work for you: you can just lock on the instance of e:
lock (e)
{
    Entity e2 = Repository.FindByKey(e);
    if (e2 == null)
    {
        Repository.Insert(e);
    }
    else
    {
        Repository.Update(e2);
    }
}
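Note that this only serializes callers that share the same object instance; if the same logical entity can arrive as two different instances, you would need a per-key lock, which is essentially the "protector" the question sketches. A rough sketch:

static readonly ConcurrentDictionary<Guid, object> Locks =
    new ConcurrentDictionary<Guid, object>();

static void SaveElement(Entity e)
{
    // One lock object per UniqueId. Note the dictionary grows unboundedly,
    // so a real version would need some eviction strategy.
    object keyLock = Locks.GetOrAdd(e.UniqueId, _ => new object());
    lock (keyLock)
    {
        Entity e2 = Repository.FindByKey(e);
        if (e2 == null)
            Repository.Insert(e);
        else
            Repository.Update(e2);
    }
}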