First I've read all the posts here regarding this issue and I manged to progress a bit. However it seems I do need your help :)
I have a program with several threads, sometimes (not always) the CPU usage of the program is increasing up to 100% and never reduced until I shut down the program.
As I read in other similar posts, I ran the app using the visual studio (2012 - Ultimate).
I paused the app, and open the threads window.
There I pauses the threads until I've found the 4 threads which stuck the app.
The all refer to the same line of code (a call for constructor).
I checked the constructor inside and outside and couldn't find any loop which could cause it.
To be more careful I've added break point to almost every line of code and resume the app. None of them have been triggered.
This is the line of code:
public static void GenerateDefacementSensors(ICrawlerManager cm)
{
m_SensorsMap = new Dictionary<DefacementSensorType, DefacementSensor>();
// Create instance of all sensors
// For any new defacement sensor, don't forget to add an appropriate line here
// m_SensorsMap.add(DefacementSensorType.[Type], new [Type]Sensor())
try
{
if (m_SensorsMap.Count <= 0)
{
m_SensorsMap.Add(DefacementSensorType.BackgroundSensor, new BackgroundSensor());
m_SensorsMap.Add(DefacementSensorType.TaglinesSensor, new TaglinesSensor(cm.Database));
m_SensorsMap.Add(DefacementSensorType.SingleImageSensor, new SingleImageSensor());
}
}
catch (Exception)
{
Console.WriteLine("There was a problem initializing defacement sensors");
}
}
The second "m_SensorsMap.Add" is marked with green arrow, as I understand it, it means it's still waiting to the first line to finish.
By the way, the m_SensorsMap.Count value is 3.
How can I find the problem?
Is it a loop?
Or maybe a deadlock (not make sense because it shouldn't be 100% cpu, right?)
It's pointless to upload a code because this is a huge project.
I need more general help like how to debug?
Is it could something else than a loop?
Because it's a bug that returns every while and than I'm not closing the app until I found the problem :)
Thanks in advance!!
Edit:
The constructors:
public TaglinesSensor(IDatabase db)
{
m_DB = db;
}
I couldn't found the problem so I've changed the design on order not to call those constructors anymore.
Thanks for the guys who tried to help.
Shaul
Related
I have a service using WCF. Internally it has a dictionary with lists that you can add to or get a subset of from different endpoints.
The code is something like this:
List<Data> list = null;
try
{
locker.EnterReadLock();
list = internalData[Something].Where(x => x.hassomething()).ToList();
}
finally
{
locker.ExitReadLock();
}
foreach (var y in list)
{
result[y.proprty1].Add(y.property2); // <-- here it hangs
}
return result;
So the internalData is locked with a ReaderWriterLockSlim for all operations, readerlock for reading and writerlock for adding. I make a copy of the items inside the lock and work on this copy later.
The issue is after a while, more and more cpu-cores goes to 100% and finally is uses all cores. It can run perfectly for days and million calls before it stops.
Attaching debugger and pausing shows that one treads hang on adding to the result dictionary. But as soon as I resume all threads will continue and a lot of memory is being released.
Is there something special happening when a debugger is attached, pausing and resuming that will release something like this?
I changed my locks to lock(something) { code... } and the problem with hangs went away. So it looks lilke i ran into the issue Steffen Winkler pointed out in his comment. http://joeduffyblog.com/2007/02/07/introducing-the-new-readerwriterlockslim-in-orcas/
I'm developing an app which basically performs some tasks on timer tick (in this case - searching for beacons) and sends results to the server. My goal was to create an app which does its job constantly in the background. Fortunately, I'm using logging all over the code, so when we started to test it we found that sometime later the timer's callback wasn't being called on time. There were some pauses which obviously had been caused by standby and doze mode. At that moment I was using a background service and System.Threading.Timer. Then, after some research, I rewrote the services to use Alarm Manager + Wake locks, but the pauses were still there. The next try was to make the service foreground and use it with a Handler to post delayed tasks and everything seemed to be fine while the device was connected to the computer. When the device is not connected to a charger those pauses are here again. The interesting thing is that we cannot actually predict this behavior. Sometimes it works perfectly fine and sometimes not. And this is really strange because the code to schedule it is pretty simple and straightforward:
...
private int scanThreadsCount = 0;
private Android.OS.Handler handler = new Android.OS.Handler();
private bool LocationInProgress
{
get { return Interlocked.CompareExchange(ref scanThreadsCount, 0, 0) != 0; }
}
public void ForceLocation()
{
if (!LocationInProgress) DoLocation();
}
private async void DoLocation()
{
Interlocked.Increment(ref scanThreadsCount);
Logger.Debug("Location is started");
try
{
// Location...
}
catch (Exception e)
{
Logger.Error(e, "Location cannot be performed due to an unexpected error");
}
finally
{
if (LocationInterval > 0)
{
# It's here. The location interval is 60 seconds
# and the service is running in the foreground!
# But in the screenshot we can see the delay which
# sometimes reaches 10 minutes or even more
handler.PostDelayed(ForceLocation, LocationInterval * 1000);
}
Logger.Debug("Location has been finished");
Interlocked.Decrement(ref scanThreadsCount);
}
}
...
Actually it can be ok, but I need that service to do its job strictly on time, but the callback is being called with a few seconds delay or a few minutes and that's not acceptable.
The Android documentation says that foreground services are not restricted by standby and doze mode, but I cannot really find the cause of that strange behavior. Why is the callback not being called on time? Where do these 10 minutes pauses come from? It's pretty frustrating because I cannot move further unless I have the robust basis. Does anybody know the reason of such a strange behavior or any suggestions how I can achieve the callback to be executed on time?
P.S. The current version of the app is here. I know, it's quite boring trying to figure out what is wrong with one's code, but there are only 3 files which have to do with that problem:
~/Services/BeaconService.cs
~/Services/BeaconServiceScanFunctionality.cs
~/Services/BeaconServiceSyncFunctionality.cs
The project was provided for those who would probably want to try it in action and figure it out by themselves.
Any help will be appreciated!
Thanks in advance
I have a Windows service that every 5 seconds checks for work. It uses System.Threading.Timer for handling the check and processing and Monitor.TryEnter to make sure only one thread is checking for work.
Just assume it has to be this way as the following code is part of 8 other workers that are created by the service and each worker has its own specific type of work it needs to check for.
readonly object _workCheckLocker = new object();
public Timer PollingTimer { get; private set; }
void InitializeTimer()
{
if (PollingTimer == null)
PollingTimer = new Timer(PollingTimerCallback, null, 0, 5000);
else
PollingTimer.Change(0, 5000);
Details.TimerIsRunning = true;
}
void PollingTimerCallback(object state)
{
if (!Details.StillGettingWork)
{
if (Monitor.TryEnter(_workCheckLocker, 500))
{
try
{
CheckForWork();
}
catch (Exception ex)
{
Log.Error(EnvironmentName + " -- CheckForWork failed. " + ex);
}
finally
{
Monitor.Exit(_workCheckLocker);
Details.StillGettingWork = false;
}
}
}
else
{
Log.Standard("Continuing to get work.");
}
}
void CheckForWork()
{
Details.StillGettingWork = true;
//Hit web server to grab work.
//Log Processing
//Process Work
}
Now here's the problem:
The code above is allowing 2 Timer threads to get into the CheckForWork() method. I honestly don't understand how this is possible, but I have experienced this with multiple clients where this software is running.
The logs I got today when I pushed some work showed that it checked for work twice and I had 2 threads independently trying to process which kept causing the work to fail.
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Unloaded AppDomain - at 09/14 10:15:10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
AppDomain is already unloaded - at 09/14 10:15:501255801
=== Starting Update Process === - at 09/14 10:15:513756009
Downloading File X - at 09/14 10:15:525631183
Downloading File Y - at 09/14 10:15:525631183
=== Starting Update Process === - at 09/14 10:15:525787359
Downloading File X - at 09/14 10:15:525787359
Downloading File Y - at 09/14 10:15:525787359
The logs are written asynchronously and are queued, so don't dig too deep on the fact that the times match exactly, I just wanted to point out what I saw in the logs to show that I had 2 threads hit a section of code that I believe should have never been allowed. (The log and times are real though, just sanitized messages)
Eventually what happens is that the 2 threads start downloading a big enough file where one ends up getting access denied on the file and causes the whole update to fail.
How can the above code actually allow this? I've experienced this problem last year when I had a lock instead of Monitor and assumed it was just because the Timer eventually started to get offset enough due to the lock blocking that I was getting timer threads stacked i.e. one blocked for 5 seconds and went through right as the Timer was triggering another callback and they both somehow made it in. That's why I went with the Monitor.TryEnter option so I wouldn't just keep stacking timer threads.
Any clue? In all cases where I have tried to solve this issue before, the System.Threading.Timer has been the one constant and I think its the root cause, but I don't understand why.
I can see in log you've provided that you got an AppDomain restart over there, is that correct? If yes, are you sure that you have the one and the only one object for your service during the AppDomain restart? I think that during that not all the threads are being stopped right in the same time, and some of them could proceed with polling the work queue, so the two different threads in different AppDomains got the same Id for work.
You probably could fix this with marking your _workCheckLocker with static keyword, like this:
static object _workCheckLocker;
and introduce the static constructor for your class with initialization of this field (in case of the inline initialization you could face some more complicated problems), but I'm not sure is this be enough for your case - during AppDomain restart static class will reload too. As I understand, this is not an option for you.
Maybe you could introduce the static dictionary instead of object for your workers, so you can check the Id for documents in process.
Another approach is to handle the Stopping event for your service, which probably could be called during the AppDomain restart, in which you will introduce the CancellationToken, and use it to stop all the work during such circumstances.
Also, as #fernando.reyes said, you could introduce heavy lock structure called mutex for a synchronization, but this will degrade your performance.
TL;DR
Production stored procedure has not been updated in years. Workers were getting work they should have never gotten and so multiple workers were processing update requests.
I was able to finally find the time to properly set myself up locally to act as a production client through Visual Studio. Although, I wasn't able to reproduce it like I've experienced, I did accidentally stumble upon the issue.
Those with the assumptions that multiple workers were picking up the work was indeed correct and that's something that should have never been able to happen as each worker is unique in the work they do and request.
It turns out that in our production environment, the stored procedure to retrieve work based on the work type has not been updated in years (yes, years!) of deploys. Anything that checked for work automatically got updates which meant when the Update worker and worker Foo checked at the same time, they both ended up with the same work.
Thankfully, the fix is database side and not a client update.
I wrote an API that automates a certain website. However, on the testing stage, I noticed that (not very sure), my thread is not being terminated correctly.
I am using the WebBrowser object to navigate inside a thread, so that it works synchronously with my program:
private void NavigateThroughTread(string url)
{
Console.WriteLine("Defining thread...");
var th = new Thread(() =>
{
_wb = new WebBrowser();
_wb.DocumentCompleted += PageLoaded;
_wb.Visible = true;
_wb.Navigate(url);
Console.WriteLine("Web browser navigated.");
Application.Run();
});
Console.WriteLine("Thread defined.");
th.SetApartmentState(ApartmentState.STA);
Console.WriteLine("Before thread start...");
th.Start();
Console.WriteLine("Thread started.");
while (th.IsAlive) { }
Console.WriteLine("Journey ends.");
}
private void PageLoaded(object sender, WebBrowserDocumentCompletedEventArgs e)
{
Console.WriteLine("Pages loads...");
.
.
.
switch (_action)
{
.
.
.
case ENUM.FarmActions.Idle:
_wb.Navigate(new Uri("about:blank"));
_action = ENUM.FarmActions.Exit;
return;
case ENUM.FarmActions.Exit:
Console.WriteLine("Disposing wb...");
_wb.DocumentCompleted -= PageLoaded;
_wb.Dispose();
break;
}
Application.ExitThread(); // Stops the thread
}
Here is how I call this function:
public int Attack(int x, int y, ArmyBuilder army)
{
// instruct to attack the village
_action = ENUM.FarmActions.Attack;
//get the army and coordinates
_army = army;
_enemyCoordinates[X] = x;
_enemyCoordinates[Y] = y;
//Place the attack command
_errorFlag = true; // the action is not complated, the flag will set as false once action is complete
_attackFlag = false; // attack is not made yet
Console.WriteLine("Journey starts");
NavigateThroughTread(_url.GetUrl(ENUM.Screens.RallyPoint));
return _errorFlag ? -1 : CalculateDistance();
}
So the problem is, when I call the Attack function, couple times like this:
_command.Attack(509, 355, new ArmyBuilder(testArmy_lc));
_command.Attack(509, 354, new ArmyBuilder(testArmy_lc));
_command.Attack(505, 356, new ArmyBuilder(testArmy_lc));
_command.Attack(504, 356, new ArmyBuilder(testArmy_lc));
_command.Attack(504, 359, new ArmyBuilder(testArmy_lc));
_command.Attack(505, 356, new ArmyBuilder(testArmy_lc));
_command.Attack(504, 356, new ArmyBuilder(testArmy_lc));
_command.Attack(504, 359, new ArmyBuilder(testArmy_lc));
My application most of the times, gets stuck in one of these function (usually happens after the 4th or 5th). When it gets stuck the last log that I see is
Web browser navigated.
I assume it is something to do with termination of my thread. Can someone show me how I can run a thread which runs the DocumentCompleted event ?
I don't see any obvious reason for deadlock, nor did it reproduce at all when testing the code. There are a number of flaws in the code but nothing that yells "here!" loudly. I can only make recommendations:
Consider that you do not need a thread at all. The while (th.IsAlive) { } hot loop blocks your main thread while you wait for the browser code to finish the job. That is not a useful way to use a thread, you might as well use your main thread. This instantly eliminates a large number of potential hang causes.
The state logic in PageLoaded is risky. We cannot see all of it but one glaring issue is that you dispose the WebBrowser twice. If you have a case where you use return without a Navigate() call then you'll hang as described. No need to unsubscribe the event but same story, if you do unsubscribe but don't all Application.Exit() then you'll hang as described. State machines can be hard to debug, thorough logging is necessary. Minimize the risk by moving the Dispose() call and unsubscribing the event out of the logic, it doesn't belong there. And you need to test what happens when any Navigate() call ends up in failure, redirecting to a page you did not expect.
The _wb.Dispose() call is risky. Note that you destroy the WebBrowser while its DocumentCompleted event is in flight. Technically that can return code execution to code that is no longer alive or present. That can trip a race condition in the browser. As well as in the debugger, there is a dedicated MDA that checks for this problem. It is trivially avoided by moving the Dispose() call after the Application.Run() call where it belongs.
The while-loop burns 100% core, potentially starving the worker thread. Not a good enough reason to explain deadlock, but certainly unnecessary. Use Thread.Join() instead.
You create a lot of WebBrowser objects in this code. It is a very heavy object, as you can imagine, you need to keep an eye on memory usage in your program. Especially the unmanaged kind. If the browser leaks, like they so often do, you could technically create a scenario where the WB initializes okay but does not have enough memory left to load the page. Strongly favor using only one WB.
You need to consider that this might well be an environmental problem. On the top of that list is forever anti-malware and firewall, they always have a very good reason to treat a browser specially since that is the most common malware injection vector. You'll need to run your test with anti-malware and firewall disabled to ensure that it is not the cause of the hang.
Another environmental problem is one I noticed while testing this code, Google got sulky about me hitting it so often and started to throttle the requests, greatly slowing down the code. Talk to the web site owner and ask if he's got similar blocking or throttling counter-measures in place, most do. You need to test your state logic to verify that it still works properly when the browser redirects to an error page.
Yet another environmental issue is the WB will display a dialog itself in certain cases. This can deadlock in 3rd party code, very hard to diagnose. You should at least set the WebBrower.ScriptErrorsSuppressed to true but beware of Javascript code in the web page you load that itself creates new windows or displays alert dialogs. Using one WB is the workaround.
Keep in mind that your program can only be as reliable as your Internet connection and the web page server. That's not a terribly good place to be of course, both are quite out of your reach and you don't get nice exceptions to help you diagnose such a failure. And consider that you probably have not yet tested your program well enough yet to check if it can survive such a failure, it doesn't happen enough.
Quite a laundry list, focus first on eliminating the unnecessary thread and temporarily suppressing anti-malware. That's quick, focus next on using only one WebBrowser.
Hans thank you, I was able to fix this issue with one of your ideas. As you spent your time giving me a long answer, I wanted respond in same manner.
2 - I built the state machine structure carefully and with a lot logs (you can see it from my git account) also did a lot of debugs. I am sure that after I'm done navigating, I use Application.ExitThread() and wb.Dispose() only once.
3 - I tried placing the wb.Dispose() outside the event, however I couldn't find any other place where the Thread is still alive. If I try disposing WebBrowser outside the thread which is created inside the thread, the application gives me an error.
4 - I changed the code while (th.IsAlive) { } with th.Join(2000) this is absolutely a better idea but did not change anything. It optimized the code and as you mentioned, it prevented burning 100% core of my CPU.
5 - I tried using a single WebBrowser object which is instantiated in the constructor. However when I tried to navigate inside the thread, the application wouldnt even fire the events anymore. For some reason, I couldn't make it running whit a single WB object.
6,7 - I tested my application with different PC's and diffrent networks(with firewall and non-firewall protection). I changed windows firewall options as well but no travail. On my original code I do have _wb.ScriptErrorsSuppressed = true; so this shouldn't also be the issue.
8,9 - If these are the reasons, I can't do anything about it. But I doubt the real problem is caused because of them.
1 - This one was a good suggestion. I tried implementing my code without using a thread and it is now working fine. Here is how it looks like (still needs a lot optimization)
// Constructer
public FarmActions(string token)
{
// set the urls using the token
_url = new URL(token);
// define web browser properties
_wb = new WebBrowser();
_wb.DocumentCompleted += PageLoaded;
_wb.Visible = true;
_wb.AllowNavigation = true;
_wb.ScriptErrorsSuppressed = true;
}
public int Attack(int x, int y, ArmyBuilder army)
{
// instruct to attack the village
_action = ENUM.FarmActions.Attack;
//get the army and coordinates
_army = army;
_enemyCoordinates[X] = x;
_enemyCoordinates[Y] = y;
//Place the attack command
_errorFlag = true; // the action is not complated, the flag will set as false once action is complete
_attackFlag = false; // attack is not made yet
_isAlive = true;
Console.WriteLine("-------------------------");
Console.WriteLine("Journey starts");
NavigateThroughTread(_url.GetUrl(ENUM.Screens.RallyPoint));
return _errorFlag ? -1 : CalculateDistance();
}
private void NavigateThroughTread(string url)
{
Console.WriteLine("Defining thread...");
_wb.Navigate(url);
while (_isAlive) Application.DoEvents();
}
private void PageLoaded(object sender, WebBrowserDocumentCompletedEventArgs e)
{
Console.WriteLine("Pages loads...");
.
.
.
switch (_action)
{
.
.
.
case ENUM.FarmActions.Idle:
_wb.Navigate(new Uri("about:blank"));
_action = ENUM.FarmActions.Exit;
return;
case ENUM.FarmActions.Exit:
break;
}
_isAlive = false;
}
This is how I was able to wait without using a thread.
The main problem was probably as you mentioned in number 3 or 5. But I wasn't able to fix the problem as I spent couple of hours.
Anyway thanks for your help it works.
I have the following code that throws an out of memory exception when writing large files. Is there something I'm missing?
I am not sure why it is throwing an out of memory error as I thought the Filestream would only use a maximum of 4096 bytes for the buffer? I am not entirely sure what it means by the Buffer to be honest and any advice would be appreciated.
public static async Task CreateRandomFile(string pathway, int size, IProgress<int> prog)
{
byte[] fileSize = new byte[size];
new Random().NextBytes(fileSize);
await Task.Run(() =>
{
using (FileStream fs = File.Create(pathway,4096))
{
for (int i = 0; i < size; i++)
{
fs.WriteByte(fileSize[i]);
prog.Report(i);
}
}
}
);
}
public static void p_ProgressChanged(object sender, int e)
{
int pos = Console.CursorTop;
Console.WriteLine("Progress Copied: " + e);
Console.SetCursorPosition (0, pos);
}
public static void Main()
{
Console.WriteLine("Testing CopyLearning");
//CopyFile()
Progress<int> p = new Progress<int>();
p.ProgressChanged += p_ProgressChanged;
Task ta = CreateRandomFile(#"D:\Programming\Testing\RandomFile.asd", 99999999, p);
ta.Wait();
}
Edit: the 99,999,999 was just created to make a 99MB file
Note: I have commented out prog.Report(i) and it will work fine.
It seems for some reason, the error occurs at the line
Console.writeline("Progress Copied: " + e);
I am not entirely sure why this causes an error? So the error might have been caused because of the progressEvent?
Edit 2: I have followed advice to change the code such that it reports progress every 4000 Bytes by using the following:
if (i%4000==0)
prog.Report(i);
For some reason. I am now able to write files up to 900MBs fine.
I guess the question is, why would the "Edit 2"'s code allow it to write up to 900MB just fine? Is it because it's reporting progress and writing to the console up to 4000x less than before? I didn't realize the Console would take up so much memory especially because I'm assuming all it's doing is outputting "Progress Copied"?
Edit 3:
For some reason when I change the following line as follows:
for (int i = 0; i < size; i++)
{
fs.WriteByte(fileSize[i]);
Console.Writeline(i)
prog.Report(i);
}
where there is a "Console.Writeline()" before the prog.Report(i), it would work fine and copy the file, albeit take a very long time to do so. This leads me to believe that this is a Console related issue for some reason but I am not sure as to what.
fs.WriteByte(fileSize[i]);
prog.Report(i);
You created a fire-hose problem. After deadlocks and threading races, probably the 3rd most likely problem caused by threads. And just as hard to diagnose.
Easiest to see by using the debugger's Debug + Windows + Threads window and look at thread that is executing CreateRandomFile(). With some luck, you'll see it is completed and has written all 99MB bytes. But the progress reported on the console is far behind this, having only reported 125KB bytes written, give or take.
Core issue is the way Progress<>.Report() works. It uses SynchronizationContext.Post() to invoke the ProgressChanged event handler. In a console mode app that will call ThreadPool.QueueUserWorkItem(). That's quite fast, your CreateRandomFile() method won't be bogged down much by it.
But the event handler itself is quite a lot slower, console output is not very fast. So in effect, you are adding threadpool work requests at an enormous rate, 99 million of them in a handful of seconds. No way for the threadpool scheduler to keep up, you'll have roughly 4 of them executing at the same time. All competing to write to the console as well, only one of them can acquire the underlying lock.
So it is the threadpool scheduler that causes OOM, forced to store so many work requests.
And sure, when you call Report() less frequently then the fire-hose problem is a lot less worse. Not actually that simple to ensure it never causes a problem, although directly calling Console.Write() is an obvious fix. Ultimately simple, create a usable UI that is useful to a human. Nobody likes a crazily scrolling window or a blur of text. Reporting progress no more frequently than 20 times per second is plenty good enough for the user's eyes, the console has no trouble keeping up with that.