I have a Windows service that polls a remote FTP server every three seconds. It checks a directory for files, downloads any files present, and deletes those files once downloaded. The average file size is 10 KB; only rarely do they go up to the 100 KB range.
Occasionally (I have noticed no pattern), the WebClient will throw the following:
System.Net.WebException: The operation has timed out.
at System.Net.WebClient.OpenRead(Uri address)
It will do this for one or more files, usually whatever files are in the remote directory at that time, and it will continue to do so indefinitely, churning on the "stuck" files at each polling interval. The bizarre part is that when I stop/start the Windows service, the "stuck" files download perfectly and the polling/downloading works again for long stretches of time. That is odd, because I download like this:
private object _pollingLock = new object();
public void PollingTimerElapsed(object sender, ElapsedEventArgs e)
{
if (Monitor.TryEnter(_pollingLock))
{
//FtpHelper lists content of files in directory
...
foreach(var file in files)
{
using(var client = new WebClient())
{
client.Proxy = null;
using (var data = client.OpenRead(file.Uri))
{
//Use data stream to write file locally
...
}
}
//FtpHelper deletes the file
...
}
}
//Release the _pollingLock inside a finally
}
I would assume that a new connection is opened and closed for each file (unless .NET is doing something behind the scenes). If a file download had an issue, it would get a fresh retry on the next polling interval (in 3 sec). Why would a service restart make things work?
I've begun to suspect that the issue has something to do with caching (file or connection). Recently I tried going into Internet Explorer and clearing the cache. About 30 seconds later, all the files downloaded with no service restart. But the next batch of files to arrive all got hung up again. I might try adding a line like this:
client.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
or try disabling KeepAlives, but I want to get some opinions before I start trying random stuff.
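For reference, combining both ideas would look something like this (just a sketch; the subclass name is mine and this is not code I am running yet):

using System;
using System.Net;
using System.Net.Cache;

// Sketch only: disable response caching and turn off FTP keep-alive by
// overriding GetWebRequest. The class name is purely illustrative.
class NoKeepAliveWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var ftp = request as FtpWebRequest;
        if (ftp != null)
            ftp.KeepAlive = false; // force a fresh control connection per file
        return request;
    }
}

// Used inside the polling loop the same way as before:
using (var client = new NoKeepAliveWebClient())
{
    client.Proxy = null;
    client.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
    using (var data = client.OpenRead(file.Uri))
    {
        // write the stream to a local file, as before
    }
}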
So: What is causing the occasional timeouts? Why does restarting the service work? Why did clearing the cache work?
Update
I made the cache policy and keep-alive changes mentioned above about two weeks ago. I just now got my first timeout since then. It appears to have reduced the frequency, but alas, it is still happening.
Update
As requested, this is how I am kicking off the Timer:
_pollingTimer.AutoReset = true;
_pollingTimer.Elapsed += PollingTimerElapsed;
_pollingTimer.Interval = 10000;
_pollingTimer.Enabled = true;
Looks like you are kicking off your processing using the System.Timers.Timer.Elapsed event.
One gotcha that I found is that if your Elapsed event takes longer to execute than the timer interval, your event can be called again from another thread before it has finished executing.
This is specifically mentioned in the docs:
If the SynchronizingObject property is null, the Elapsed event is raised on a ThreadPool thread. If the processing of the Elapsed event lasts longer than Interval, the event might be raised again on another ThreadPool thread. In this situation, the event handler should be reentrant.
Assuming you are indeed using a vanilla timer with AutoReset=true (it's on by default), the first thing to do would be to address this potential issue. You can use a SynchronizingObject; alternatively, you can do something like this:
//setup code
Timer myTimer = new Timer(30000);
myTimer.AutoReset = false;
....
//Elapsed handler
public void PollingTimerElapsed(object sender, ElapsedEventArgs e)
{
//do what you currently do
...
//when finished, kick off the timer again
myTimer.Start();
}
Either way, the main thing is to ensure that your code doesn't accidentally get called simultaneously by multiple threads - if that happens there's a good chance that occasionally you'll have one thread trying to download something from the site while another thread is simultaneously deleting the file.
The things that you mentioned, e.g. that it only happens occasionally, that file sizes are normally small, that it's fixed by a restart, etc., point me in the direction of this being the issue.
Related
I'm developing an app which basically performs some tasks on a timer tick (in this case, searching for beacons) and sends the results to the server. My goal was to create an app which does its job constantly in the background.

Fortunately, I use logging all over the code, so when we started to test it we found that some time later the timer's callback wasn't being called on time. There were pauses which had obviously been caused by standby and Doze mode. At that point I was using a background service and System.Threading.Timer. Then, after some research, I rewrote the services to use Alarm Manager + wake locks, but the pauses were still there. The next attempt was to make the service a foreground service and use a Handler to post delayed tasks, and everything seemed to be fine while the device was connected to the computer. When the device is not connected to a charger, the pauses are back.

The interesting thing is that we cannot actually predict this behavior. Sometimes it works perfectly fine and sometimes not. And this is really strange, because the code to schedule it is pretty simple and straightforward:
...
private int scanThreadsCount = 0;
private Android.OS.Handler handler = new Android.OS.Handler();
private bool LocationInProgress
{
get { return Interlocked.CompareExchange(ref scanThreadsCount, 0, 0) != 0; }
}
public void ForceLocation()
{
if (!LocationInProgress) DoLocation();
}
private async void DoLocation()
{
Interlocked.Increment(ref scanThreadsCount);
Logger.Debug("Location is started");
try
{
// Location...
}
catch (Exception e)
{
Logger.Error(e, "Location cannot be performed due to an unexpected error");
}
finally
{
if (LocationInterval > 0)
{
// It's here. The location interval is 60 seconds
// and the service is running in the foreground!
// But in the screenshot we can see the delay, which
// sometimes reaches 10 minutes or even more.
handler.PostDelayed(ForceLocation, LocationInterval * 1000);
}
Logger.Debug("Location has been finished");
Interlocked.Decrement(ref scanThreadsCount);
}
}
...
That might be OK in some cases, but I need this service to do its job strictly on time; the callback is being called seconds or even minutes late, and that's not acceptable.
The Android documentation says that foreground services are not restricted by standby and Doze mode, but I cannot really find the cause of this strange behavior. Why is the callback not being called on time? Where do these 10-minute pauses come from? It's pretty frustrating, because I cannot move further until I have a robust basis. Does anybody know the reason for such strange behavior, or have any suggestions on how I can get the callback executed on time?
P.S. The current version of the app is here. I know, it's quite boring trying to figure out what is wrong with one's code, but there are only 3 files which have to do with that problem:
~/Services/BeaconService.cs
~/Services/BeaconServiceScanFunctionality.cs
~/Services/BeaconServiceSyncFunctionality.cs
The project is provided for those who want to try it in action and figure it out for themselves.
Any help will be appreciated!
Thanks in advance
I have a Windows service that every 5 seconds checks for work. It uses System.Threading.Timer for handling the check and processing and Monitor.TryEnter to make sure only one thread is checking for work.
Just assume it has to be this way: the following code is part of 8 other workers that are created by the service, and each worker has its own specific type of work it needs to check for.
readonly object _workCheckLocker = new object();
public Timer PollingTimer { get; private set; }
void InitializeTimer()
{
if (PollingTimer == null)
PollingTimer = new Timer(PollingTimerCallback, null, 0, 5000);
else
PollingTimer.Change(0, 5000);
Details.TimerIsRunning = true;
}
void PollingTimerCallback(object state)
{
if (!Details.StillGettingWork)
{
if (Monitor.TryEnter(_workCheckLocker, 500))
{
try
{
CheckForWork();
}
catch (Exception ex)
{
Log.Error(EnvironmentName + " -- CheckForWork failed. " + ex);
}
finally
{
Monitor.Exit(_workCheckLocker);
Details.StillGettingWork = false;
}
}
}
else
{
Log.Standard("Continuing to get work.");
}
}
void CheckForWork()
{
Details.StillGettingWork = true;
//Hit web server to grab work.
//Log Processing
//Process Work
}
Now here's the problem:
The code above is allowing 2 Timer threads to get into the CheckForWork() method. I honestly don't understand how this is possible, but I have experienced this with multiple clients where this software is running.
The logs I got today when I pushed some work showed that it checked for work twice, and I had 2 threads independently trying to process it, which kept causing the work to fail.
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Unloaded AppDomain - at 09/14 10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
AppDomain is already unloaded - at 09/14 10:15:501255801
=== Starting Update Process === - at 09/14 10:15:513756009
Downloading File X - at 09/14 10:15:525631183
Downloading File Y - at 09/14 10:15:525631183
=== Starting Update Process === - at 09/14 10:15:525787359
Downloading File X - at 09/14 10:15:525787359
Downloading File Y - at 09/14 10:15:525787359
The logs are written asynchronously and are queued, so don't dig too deep into the fact that the times match exactly; I just wanted to point out what I saw in the logs to show that I had 2 threads hit a section of code that I believe should never have been allowed. (The log and times are real, though; only the messages are sanitized.)
Eventually the 2 threads start downloading a file big enough that one of them gets access denied on it, which causes the whole update to fail.
How can the above code actually allow this? I experienced this problem last year when I had a lock instead of Monitor, and I assumed it was because the lock's blocking gradually offset the Timer enough that timer threads stacked up: one blocked for 5 seconds and got through right as the Timer was triggering another callback, and they both somehow made it in. That's why I went with the Monitor.TryEnter option, so I wouldn't just keep stacking timer threads.
Any clue? In all cases where I have tried to solve this issue before, the System.Threading.Timer has been the one constant, and I think it's the root cause, but I don't understand why.
I can see in the log you've provided that you got an AppDomain restart in there; is that correct? If so, are you sure that you have one and only one instance of your service object across the AppDomain restart? I think that during the restart not all the threads are stopped at exactly the same time, so some of them could carry on polling the work queue, and two different threads in different AppDomains ended up with the same work Id.
You could probably fix this by marking your _workCheckLocker with the static keyword, like this:
static object _workCheckLocker;
and introduce a static constructor for your class that initializes the field (with inline initialization you could face some more complicated problems). But I'm not sure this would be enough in your case: during an AppDomain restart the static state is reloaded too, so as I understand it, this is not an option for you.
Maybe you could introduce a static dictionary instead of a plain object for your workers, so you can check the Ids of the documents already being processed.
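A rough sketch of that dictionary idea (the method and names below are illustrative, not from your code):

// requires System.Collections.Concurrent
// Illustrative: a process-wide registry of work ids currently in flight,
// so a second thread/worker can skip an id that is already being handled.
static readonly ConcurrentDictionary<string, byte> _workInProgress =
    new ConcurrentDictionary<string, byte>();

void ProcessWorkItem(string workId)   // hypothetical per-item entry point
{
    if (!_workInProgress.TryAdd(workId, 0))
        return;   // someone else is already processing this id

    try
    {
        // CheckForWork / process the item here
    }
    finally
    {
        byte removed;
        _workInProgress.TryRemove(workId, out removed);
    }
}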
Another approach is to handle the Stopping event for your service, which is probably raised during the AppDomain restart; there you would introduce a CancellationToken and use it to stop all work under such circumstances.
Also, as #fernando.reyes said, you could introduce a heavier locking structure, a mutex, for synchronization, but this will degrade your performance.
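For completeness, if you do go that route, a named mutex is the variant that stays visible across AppDomains (and processes); a minimal sketch, with a made-up mutex name:

// requires System.Threading
// Illustrative: a named mutex is machine-wide, so it still guards the critical
// section across an AppDomain restart, at the cost of a heavier primitive.
using (var mutex = new Mutex(false, @"Global\MyService.WorkCheck"))
{
    if (mutex.WaitOne(TimeSpan.FromMilliseconds(500)))
    {
        try
        {
            CheckForWork();
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }
}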
TL;DR
The production stored procedure had not been updated in years. Workers were getting work they should never have gotten, so multiple workers were processing update requests.
I was finally able to find the time to properly set myself up locally to act as a production client through Visual Studio. Although I wasn't able to reproduce it the way I had experienced it, I did accidentally stumble upon the issue.
Those who assumed that multiple workers were picking up the same work were indeed correct, and that is something that should never have been able to happen, as each worker is unique in the work it does and requests.
It turns out that in our production environment, the stored procedure that retrieves work based on the work type had not been updated in years (yes, years!) of deploys. Anything that checked for work automatically got update requests, which meant that when the Update worker and worker Foo checked at the same time, they both ended up with the same work.
Thankfully, the fix is database side and not a client update.
I created an ASP.NET website with Visual Studio 2010 in C#.
My program reads a config file to create some classes and display information.
The config file is not included in the project (it does not appear in the Solution Explorer). If I modify the file while my application is not running and run it afterwards, it still reads the old version, as if it were kept in a cache. I have to close Visual Studio for it to pick up the changes.
My second problem is related to (if not caused by) my first problem. I am using FileSystemWatcher to see if the config file is modified while the application is running, but the Changed event is never called.
private string _configFilePath;
private FileSystemWatcher _watcher;
protected void Page_Load(object sender, EventArgs e)
{
//Gets the config file in the application's parent directory
string appPath = this.MapPath("~");
string[] split = appPath.Split('\\');
_configFilePath = appPath.Substring(0, appPath.Length - split[split.Length-1].Length);
Application.Add("watcher", new FileSystemWatcher(_configFilePath.Substring(0, _configFilePath.Length-1), "*.xml"));
_watcher = (FileSystemWatcher)Application["watcher"];
_watcher.NotifyFilter = NotifyFilters.FileName;
_watcher.Changed += new System.IO.FileSystemEventHandler(Watcher_Changed);
_configFilePath += "ProductsConfig.xml";
UpdateDisplay();
}
private void Watcher_Changed(object source, FileSystemEventArgs e)
{
UpdateDisplay();
}
How can I solve this?
Thank you
My second problem is related to (if not caused by) my first problem. I am using FileSystemWatcher to see if the config file is modified while the application is running, but the Changed event is never called.
It's never called because by that point the thread that serviced the request has already been returned to the pool and the request has ended; the Watcher_Changed event will never fire.
You need to tackle this in a different manner. Remember that HTTP is a "disconnected" protocol: after the request has been served, don't expect any of the page events to fire "automagically" to notify connected users when something happens on the server side.
One way to do this is via Ajax. You'd need to constantly "ask" the server whether there's new information or not and update the sections of the page that need to be updated as a result of the change on the server.
There are 2 problems here.
1. You never called _watcher.EnableRaisingEvents = true;
2. You try to go to the parent folder of your root folder, which might not be allowed.
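Put together, a corrected setup might look roughly like this (the App_Data path and the extra LastWrite filter are illustrative, not from the original code):

// Illustrative sketch: watch a folder the web app can actually access,
// report content changes, and remember to enable the watcher.
var watcher = new FileSystemWatcher(Server.MapPath("~/App_Data"), "*.xml");
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName;
watcher.Changed += Watcher_Changed;
watcher.EnableRaisingEvents = true;   // the missing call from the question
Application["watcher"] = watcher;     // keep a reference so it is not collected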
I have a service that opens multiple watchers to watch multiple folders. After watching the folders for a certain amount of time, I get "The network BIOS command limit has been reached".
As I read on here, this is caused by having more long-term requests open than are allowed.
I believe this occurs due to the error handling code below, which is triggered by the watcher's Error event. It starts a new instance of the watcher by calling the WatchFile method again. I believe this leaves the old, now defunct watcher running while a new watcher is started, but I am afraid that stopping the watcher will either prevent it from starting up again or will stop all instances based on that watcher.
Or am I wrong, and does the error depend on the amount of changes? In that case, 100 files dropping in at the same time would cause this error.
I was thinking of stopping and starting the service whenever I run into this error, but that would not solve the problem itself, just hide it. Is there a better solution?
private static void watcherError(String directory, Boolean intray, ErrorEventArgs e, FileSystemWatcher watcher)
{
Exception watchException = e.GetException();
EventLog.WriteEntry("WhiteFileMover", String.Concat("error gedetecteerd, watcher werd herstart - ", watchException.Message), EventLogEntryType.Information);
watcher = new FileSystemWatcher();
while (!watcher.EnableRaisingEvents)
{
try
{
// This will throw an error at the
// watcher.NotifyFilter line if it can't get the path.
WatchFile(directory, intray);
}
catch(Exception exp)
{
// Sleep for a bit; otherwise, it takes a bit of
// processor time
EventLog.WriteEntry("WhiteFileMover", String.Concat("Failed to restart watcher, retrying in 5 seconds - ", exp.Message), EventLogEntryType.Warning);
System.Threading.Thread.Sleep(5000);
}
}
}
Look at this line:
watcher = new FileSystemWatcher();
You passed in a FileSystemWatcher variable, but completely ignored the passed value. Instead, you created a new instance. Not only that, but you fail to correctly dispose the instance. My guess is that you have a bunch of old FileSystemWatcher objects hanging around waiting to be collected. Each of those will hold on to some real file system resources from the operating system. Over time, you run out of available file handles.
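As an illustration of both points, and assuming WatchFile itself creates and enables the replacement watcher, the handler could dispose of the failed instance before retrying (a sketch, not drop-in code):

private static void watcherError(String directory, Boolean intray, ErrorEventArgs e, FileSystemWatcher watcher)
{
    EventLog.WriteEntry("WhiteFileMover",
        String.Concat("error detected, restarting watcher - ", e.GetException().Message),
        EventLogEntryType.Information);

    // Release the operating-system handles held by the failed watcher
    // before creating a replacement, so dead watchers don't pile up.
    watcher.EnableRaisingEvents = false;
    watcher.Dispose();

    bool restarted = false;
    while (!restarted)
    {
        try
        {
            WatchFile(directory, intray);   // assumed to create and enable the new watcher
            restarted = true;
        }
        catch (Exception exp)
        {
            EventLog.WriteEntry("WhiteFileMover",
                String.Concat("Failed to restart watcher, retrying in 5 seconds - ", exp.Message),
                EventLogEntryType.Warning);
            System.Threading.Thread.Sleep(5000);
        }
    }
}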
Why don't you create the filewatcher and define the event handlers?
http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx
This is the case:
A business has several sites; each site has several cameras that take a lot of pictures daily (about a thousand pictures each). These pictures are then stored in a folder (one folder per day) on one computer.
The business owns an image-analyzing program that takes an "in.xml" file as input and returns an "out.xml" file for the analysis of one picture. This program must be used and cannot be changed.
I wrote a UI for that program that runs over that folder and processes each camera from each site, sending picture after picture to the program, which runs as a separate process.
Because this processing is async, I used events at the start and end of every picture's handling, and the same for sites and for cameras within sites.
The program runs great at that business, but sometimes it gets stuck after handling a picture, as if it had missed the end_pic_analizing event and is still waiting for it to be raised.
I tried adding a timer for every picture that moves on to the next picture in such cases, but it still got stuck, acting like it was missing the timer event as well.
This bug happens far too often, even when running almost as the only process on that computer, and it has gotten stuck even at the start of a run (it happened at the third picture once). The bug doesn't depend on specific pictures either, because it can get stuck at different pictures, or not at all, while running repeatedly on the same folder.
Code samples:
On the Image class:
static public void image_timer_Elapsed(object sender, ElapsedEventArgs e)
{
//stop timer and calculate how much time is left until the next check for the file.
_timer.Stop();
_timerCount += (int)_timer.Interval;
int timerSpan = (int)(Daily.duration * 1000) - _timerCount;
//Daily.duration is the max duration for seeking the "out.xml" file before quitting.
if (timerSpan < _timer.Interval) _timer.Interval = timerSpan + 1;
//check for the file and analyze it.
String fileName = Daily.OutPath + @"\out.xml";
ResultHandler.ResultOut output = ResultHandler.GetResult(ref _currentImage);
//if no file found and there is time left wait and check again
if (output == ResultHandler.ResultOut.FileNotFound && timerSpan > 0)
{
_timer.Start();
}
else //file found or time ran out
{
if (MyImage.ImageCheckCompleted != null)
MyImage.ImageCheckCompleted(_currentImage); //throw event
// the program probably gets stuck here.
}
}
On the Camera class:
static public void Camera_ImageCheckCompleted(MyImage image)
{
//if this is not the last image (parent is a Camera)
if (image.Id + 1 < image.parent.imageList.Count)
{
image.parent.imageList[image.Id + 1].RunCheck(); //check next image
}
else
{
if (Camera.CameraCheckCompleted != null)
Camera.CameraCheckCompleted(image.parent); // throw event
}
}
You don't appear to have any error handling or logging code, so if an exception is thrown your program will halt and you might not have a record of what happened. This is especially true since your program is processing the images asynchronously, so the main thread may have already exited by the time an error occurs in one of your processing threads.
So first and foremost, I would suggest throwing a try/catch block around all the code that gets run in the separate thread. If an exception gets thrown there, you will want to catch that and either fire ImageCheckCompleted with some special event arguments to indicate there was an error or fire some other event that you create specifically for when errors occur. That way your program can continue to process even if an exception is thrown inside your code.
try
{
//... Do your processing
// This will happen if everything worked correctly.
InvokeImageCheckCompleted(new ImageCheckCompletedEventArgs());
}
catch (Exception e)
{
// This will happen if an exception got thrown.
InvokeImageCheckCompleted(new ImageCheckCompletedEventArgs(e));
}
For the sake of simplicity, I'd suggest using a for loop to process each image. You can use a ManualResetEvent to block execution until the ImageCheckCompleted event fires for each check. This should make it easier to log the execution of each loop, catch errors that may be preventing the ImageCheckCompleted event from firing, and even possibly move on to process the next image if one of them appears to be taking too long.
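A rough sketch of that loop (event wiring is simplified; "camera" stands for whichever Camera is being processed, and it assumes ImageCheckCompleted can be subscribed to from here and that Daily.duration is in seconds):

// requires System.Threading
// Illustrative: drive the checks from a plain loop and block on a
// ManualResetEvent until ImageCheckCompleted fires, with a timeout so one
// stuck image cannot hang the whole batch.
var done = new ManualResetEvent(false);
MyImage.ImageCheckCompleted += img => done.Set();

foreach (var image in camera.imageList)
{
    done.Reset();
    image.RunCheck();

    if (!done.WaitOne(TimeSpan.FromSeconds(Daily.duration + 5)))
    {
        // log the timeout here and fall through to the next image
    }
}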
Finally, if you can make your image processing thread-safe, you might consider using Parallel.ForEach to make it so that multiple images can be processed at the same time. This will probably significantly improve the overall speed of processing the batch.
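And if the per-image work can be made thread-safe, the same loop parallelizes naturally. ProcessImage below is a hypothetical synchronous wrapper around the in.xml/analyzer/out.xml round trip for a single image:

// requires System.Threading.Tasks
// Illustrative: process several images at once, bounding the parallelism so
// the external analyzer process is not swamped.
Parallel.ForEach(
    camera.imageList,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    image => ProcessImage(image));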