I just inherited some complex code that, unfortunately, I do not fully understand. It handles a large number of inventory records, reading from and writing to a database. The solution is extremely large and advanced, and I am still on the newer side of C#. The issue I am encountering is that periodically the program will throw an IOException. It doesn't actually return a failure code, but it messes up our output data.
The try/catch block is as follows:
private static void ReadRecords(OleDbRecordReader recordReader, long maxRows, int executionTimeout, BlockingCollection<List<ProcessRecord>> processingBuffer, CancellationTokenSource cts, Job theStack, string threadName) {
    ProcessRecord rec = null;
    try {
        Thread.CurrentThread.Name = threadName;
        if(null == cts)
            throw new InvalidOperationException("Passed CancellationToken was null.");
        if(cts.IsCancellationRequested)
            throw new InvalidOperationException("Passed CancellationToken has already been cancelled.");
        long reportingFrequency = (maxRows < 250000) ? 10000 : 100000;
        theStack.FireStatusEvent("Opening " + threadName);
        recordReader.Open(maxRows, executionTimeout);
        theStack.FireStatusEvent(threadName + " Opened");
        theStack.FireInitializationComplete();
        List<ProcessRecord> inRecs = new List<ProcessRecord>(500);
        ProcessRecord priorRec = rec = recordReader.Read();
        while(null != priorRec) { //-- note that this is priorRec, not rec. We process one row in arrears.
            if(cts.IsCancellationRequested)
                theStack.FireStatusEvent(threadName + " cancelling due to request or error.");
            cts.Token.ThrowIfCancellationRequested();
            if(rec != null) //-- We only want to count the loop when there actually is a record.
                theStack.RecordCountRead++;
            if(theStack.RecordCountRead % reportingFrequency == 0)
                theStack.FireProgressEvent();
            if((rec != null) && (priorRec.SKU == rec.SKU) && (priorRec.Store == rec.Store) && (priorRec.BatchId == rec.BatchId))
                inRecs.Add(rec); //-- just store it and keep going
            else { //-- otherwise, we need to process it
                processingBuffer.Add(inRecs.ToList(), cts.Token); //-- note that we don't enqueue the original LIST! That could be very bad.
                inRecs.Clear();
                if(rec != null) //-- Again, we need this check here to ensure that we don't try to enqueue the EOF null record.
                    inRecs.Add(rec); //-- Now, enqueue the record that fired this condition and start the loop again
            }
            priorRec = rec;
            rec = recordReader.Read();
        } //-- end While
    }
    catch(OperationCanceledException) {
        theStack.FireStatusEvent(threadName + " Canceled.");
    }
    catch(Exception ex) {
        theStack.FireExceptionEvent(ex);
        theStack.FireStatusEvent("Error in RecordReader. Requesting cancellation of other threads.");
        cts.Cancel(); // If an exception occurs, notify all other pipeline stages, then rethrow
        // throw; //-- This will also propagate Cancellation, but that's OK
    }
}
In the log of our job we see the output loader stopping and the exception is
System.Core: Pipe is broken.
Does anyone have any ideas as to what may cause this? More importantly, the individual who made this large-scale application is no longer here. When I debug all of my applications, I am able to add breakpoints in the solution and do the standard VS stepping through everything to find the issue. However, this application is huge and has a GUI that pops up when I debug it. I believe the GUI was made for testing purposes, but it hinders me from actually being able to step through everything. However, when the .exe is run from our actual job stream, there is no GUI; it just executes the way it's supposed to.
The help I am asking for is two things:
Just suggestions as to what may cause this. Could an OleDB driver be the cause? I ask because I have this running on two different servers, one test and one not. The one with the newer OleDB driver version does not fail (7.0, I believe, whereas the one that fails is on 6.0).
Is there any code that I could add that may give me a better indication as to what is causing the broken pipe? The error only happens periodically. If I run the job again right after, it may not happen. I'd say it throws the exception 30-40% of the time.
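On the code side, one generic diagnostic that might help (a sketch of my own, not part of the original application): subscribing to AppDomain.FirstChanceException logs every exception at the point it is thrown, before any catch block runs, so the logged stack trace shows where the IOException actually originates, even if something downstream swallows it.
// Sketch: register once, early in Main. Writes a trace entry for every
// first-chance IOException, including ones later handled elsewhere.
AppDomain.CurrentDomain.FirstChanceException += (sender, e) =>
{
    if (e.Exception is System.IO.IOException)
        System.Diagnostics.Trace.WriteLine(
            DateTime.Now.ToString("o") + " first-chance IOException: " + e.Exception);
};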
If you have any additional questions about the structure just let me know.
So I have a bit of C# code that looks like the below (simplified for the purpose of the question, any bugs are from me making these changes). This code can be called from multiple threads or contexts, in an asynchronous fashion. The whole purpose of this is to make sure that if a record already exists, it is used, and if it doesn't it gets created. May or may not be great design, but this works as expected.
var timeout = TimeSpan.FromMilliseconds(500);
bool lockTaken = false;
try
{
    Monitor.TryEnter(m_lock, timeout, ref lockTaken); // m_lock declared statically above
    if (lockTaken)
    {
        var myDBRecord = _DBContext.MyClass.SingleOrDefault(x => x.ForeignKeyId1 == ForeignKeyId1
                                                              && x.ForeignKeyId2 == ForeignKeyId2);
        if (myDBRecord == null)
        {
            myDBRecord = new MyClass
            {
                ForeignKeyId1 = ForeignKeyId1,
                ForeignKeyId2 = ForeignKeyId2
                // ...datapoints
            };
            _DBContext.MyClass.Add(myDBRecord);
            _DBContext.SaveChanges();
        }
    }
    else
    {
        throw new Exception("Can't get lock");
    }
}
finally
{
    if (lockTaken)
    {
        Monitor.Exit(m_lock);
    }
}
The problem is that if a lot of requests come in, they can overwhelm the monitor, timing out if they have to wait too long. While the timeout for the lock can certainly be shorter, what is the preferred approach, if any, to addressing this type of problem? Anything that checks whether the monitored code needs to be entered would need to be part of that atomic operation.
I would suggest that you get rid of the monitor altogether and instead handle the duplicate key exception. You have to handle the condition where you are trying to insert a duplicate value anyway, so why not do so directly?
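For example (a sketch only, assuming Entity Framework against SQL Server with a unique index on (ForeignKeyId1, ForeignKeyId2); error numbers 2601/2627 are SQL Server's duplicate-key codes and would differ on another provider):
var myDBRecord = _DBContext.MyClass.SingleOrDefault(x => x.ForeignKeyId1 == ForeignKeyId1
                                                      && x.ForeignKeyId2 == ForeignKeyId2);
if (myDBRecord == null)
{
    myDBRecord = new MyClass
    {
        ForeignKeyId1 = ForeignKeyId1,
        ForeignKeyId2 = ForeignKeyId2
        // ...datapoints
    };
    _DBContext.MyClass.Add(myDBRecord);
    try
    {
        _DBContext.SaveChanges();
    }
    catch (DbUpdateException ex) when (ex.GetBaseException() is SqlException sql
                                       && (sql.Number == 2601 || sql.Number == 2627))
    {
        // Another caller inserted the same key between our check and SaveChanges:
        // drop our duplicate and use the row that won the race.
        _DBContext.Entry(myDBRecord).State = EntityState.Detached;
        myDBRecord = _DBContext.MyClass.Single(x => x.ForeignKeyId1 == ForeignKeyId1
                                                 && x.ForeignKeyId2 == ForeignKeyId2);
    }
}
The unique index makes the database the single arbiter, so no process-wide Monitor is needed and concurrent requests no longer queue behind a lock.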
Before I used Nito.MVVM, I used plain async/await and it threw an AggregateException that I could inspect to see what I had. But since switching to Nito, my exceptions are ignored and the program jumps out of the async code block and continues executing. I know that it catches exceptions, because when I put a breakpoint on the catch(Exception ex) line it breaks there, but with ex = null. I know that NotifyTask has properties to check whether an exception was thrown, but where I put that check the Task is still uncompleted, so it doesn't tell me what I need when I need it.
View model:
public FileExplorerPageViewModel(INavigationService navigationService)
{
    _navigationService = navigationService;
    _manager = new FileExplorerManager();
    Files = NotifyTask.Create(GetFilesAsync("UniorDev", "GitRemote/GitRemote"));
}
Private method:
private async Task<ObservableCollection<FileExplorerModel>> GetFilesAsync(string login, string reposName)
{
    return new ObservableCollection<FileExplorerModel>(await _manager.GetFilesAsync(login, reposName));
}
Manager method (where the exception is thrown):
public async Task<List<FileExplorerModel>> GetFilesAsync(string login, string reposName)
{
    //try
    //{
    var gitHubFiles = await GetGitHubFilesAsync(login, reposName);
    var gitRemoteFiles = new List<FileExplorerModel>();
    foreach ( var file in gitHubFiles )
    {
        if ( file.Type == ContentType.Symlink || file.Type == ContentType.Submodule ) continue;
        var model = new FileExplorerModel
        {
            Name = file.Name,
            FileType = file.Type.ToString()
        };
        if ( model.IsFolder )
        {
            var nextFiles = await GetGitHubFilesAsync(login, reposName);
            var count = nextFiles.Count;
        }
        model.FileSize = file.Size.ToString();
        gitRemoteFiles.Add(model);
    }
    return gitRemoteFiles;
    //}
    //catch ( WebException ex )
    //{
    //    throw new Exception("Something wrong with internet connection, try to On Internet " + ex.Message);
    //}
    //catch ( Exception ex )
    //{
    //    throw new Exception("Getting ExplorerFiles from github failed! " + ex.Message);
    //}
}
With or without the try/catch it has the same effect. This behavior occurs anywhere I have a NotifyTask.
Update
There is no event that fires when an exception occurs, but there is a PropertyChanged event, so I used it and added this code:
private void FilesOnPropertyChanged(object sender, PropertyChangedEventArgs propertyChangedEventArgs)
{
    throw new Exception("EXCEPTION");
    bool failed;
    if ( Files.IsFaulted )
        failed = true;
}
And the exception does not fire.
I added a throw in the App class (the main class) and it fired. When I have exceptions that come from XAML, they fire too. So maybe it doesn't fire when the exception comes from a view model, or something else; I have no idea. I would be very happy for some help with this.
Update
We dealt with exception = null, but the question is still alive. What I want to add is that I only rarely see this issue when the app is starting up on the physical device. I read some info about it, and it doesn't seem to be related, but maybe it is.
I'm not entirely sure what your desired behavior is, but here's some information I hope you find useful.
NotifyTask is a data-bindable wrapper around Task. That's really all it does. So, if its Task faults with an exception, then it will update its own data-bindable properties regarding that exception. NotifyTask is intended for use when you want the UI to respond to a task completing, e.g., show a spinner while the task is in progress, an error message if the task faults, and data if the task completes successfully.
If you want your application to respond to the task faulting (with code, not just a UI update), then you should use try/catch like you have commented out in GetFilesAsync. NotifyTask doesn't change how those exceptions work; they should work just fine.
I know that it catch exceptions because when I put a breakpoint on catch(Exception ex) line it breaks here but with ex = null.
That's not possible. I suggest you try it again.
I know that NotifyTask has properties to check if an exception was thrown but where I put it, it checks when Task is uncompleted, not when I need it.
If you really want to (asynchronously) wait for the task to complete and then check for exceptions, then you can do so like this:
await Files.TaskCompleted;
var ex = Files.InnerException;
Or, if you just want to re-raise the exception:
await Files.Task;
Though I must say this usage is extremely unusual. The much more proper thing to do is to have a try/catch within your GetFilesAsync.
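For instance, a minimal sketch of GetFilesAsync with the commented-out handlers re-enabled (the loop body is condensed from the question; the only real addition is passing the original exception as the InnerException so its details are not lost):
public async Task<List<FileExplorerModel>> GetFilesAsync(string login, string reposName)
{
    try
    {
        var gitHubFiles = await GetGitHubFilesAsync(login, reposName);
        var gitRemoteFiles = new List<FileExplorerModel>();
        foreach ( var file in gitHubFiles )
        {
            if ( file.Type == ContentType.Symlink || file.Type == ContentType.Submodule ) continue;
            gitRemoteFiles.Add(new FileExplorerModel
            {
                Name = file.Name,
                FileType = file.Type.ToString(),
                FileSize = file.Size.ToString()
            });
        }
        return gitRemoteFiles;
    }
    catch ( WebException ex )
    {
        // Network problem: rethrow with a friendlier message, keeping the original exception.
        throw new Exception("Something is wrong with the internet connection. " + ex.Message, ex);
    }
    catch ( Exception ex )
    {
        throw new Exception("Getting explorer files from GitHub failed! " + ex.Message, ex);
    }
}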
I have the following background worker in my app, which is meant to start a user's session automatically if there is not already one available.
This is done on a BackgroundWorker (backgroundInit) on initialisation. As you can see below, I have a while loop which continues to run as long as the variable checker remains false:
var checker = false;
var i = 0;
while (checker == false)
{
    _session = funcs.GetSession(_servers, _name);
    _sessID = _session[0].Trim();
    _servName = _session[1];
    checker = funcs.CheckRunning("lync.exe");
    i++;
    if (i > 200)
    {
        break;
    }
}
The CheckRunning method just checks if a specified program (in this case, "lync") is currently running and returns either true or false accordingly (This is done via a CMD command).
When I run the app in an empty session, however, the while loop only iterates once before breaking out, even though "Lync" is definitely not running.
Is there any reason why running a process or too many processes from within a BackgroundWorker may cause it to exit?
As the comments mentioned, this was not an issue with the BackgroundWorker, but rather an exception occurring at _sessID = _session[0].Trim(); because the session had not yet started, so there was no ID.
To resolve this, I simply placed a Try/Catch block around this assignment, and let the program silently ignore the exception:
try
{
    _sessID = _session[0].Trim();
    _servName = _session[1];
}
catch (Exception exp)
{
    // MessageBox.Show(exp.Message);
}
This works for me, as the loop will continue checking until the counter i reaches the 200 limit, at which stage the program will accept failure.
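An alternative sketch (my own suggestion, not part of the original fix, and assuming GetSession returns an array; use Count for a list) avoids using exceptions for flow control by guarding the indexing instead:
// Only read the session details once GetSession has returned both fields.
if (_session != null && _session.Length >= 2)
{
    _sessID = _session[0].Trim();
    _servName = _session[1];
}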
I currently have a console application written in C# for a production server environment. It's quite simple, but I've been testing it for roughly 3 months now on my own server (semi-production, but not a disaster if the program fails). I've been getting stack overflow errors every few weeks or so, though.
Of course, pasting my entire source code here would be quite long, so I will try to explain it the best I can: the program runs in an infinite while loop (this is probably the cause of the issue), checks a few small things every second, and prints to the console every so often (if no activity, every 15 minutes; otherwise up to every second).
Now, why do I get stack overflow errors, and how can I fix them? Being able to run for a couple of weeks without issue may seem like a lot, but it is obviously not enough in a production server environment. My guess is that it's the fact I'm using a while loop, but what alternatives do I have?
Edit: here's a portion of the code:
int timeLoop = 0;
while (true)
{
    // If it has been +-10min since last read of messages file, read again
    if (timeLoop > 599)
    {
        Console.WriteLine(DateTime.Now.ToString("HH:mm:ss") + " Reading messages...");
        messagesFile = File.ReadAllText(@"messages.cfg");
        messages = messagesFile.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        timeLoop = 0;
    }
    // For each message, check if time specified equals current time, if so say message globally.
    foreach (string x in messages)
    {
        string[] messageThis = x.Split('~');
        int typeLoop = 0;
        if (messageThis[2] != "no-loop")
            typeLoop = Convert.ToInt16(messageThis[2].Remove(0, 5));
        DateTime checkDateTime = DateTime.ParseExact(messageThis[0], "HH:mm:ss", null);
        if (typeLoop == 0)
        {
            if (checkDateTime.ToString("HH:mm:ss") == DateTime.Now.ToString("HH:mm:ss"))
                publicMethods.sayGlobal(messageThis[1]);
        }
        else
        {
            DateTime originalDateTime = checkDateTime;
            do
            {
                checkDateTime = checkDateTime.AddHours(typeLoop);
                if (checkDateTime.ToString("HH:mm:ss") == DateTime.Now.ToString("HH:mm:ss"))
                    publicMethods.sayGlobal(messageThis[1]);
            } while (checkDateTime.ToString("HH:mm:ss") != originalDateTime.ToString("HH:mm:ss"));
        }
    }
    timeLoop++;
    Thread.Sleep(1000);
}
Also, I forgot that I actually have this code on GitHub, which probably helps a lot. I know you shouldn't just link to code here, which is why I included the snippet above, but if you are interested in helping me out the repository is located here. Another note: I know that doing it like this is not very accurate on timing, but that is not much of an issue at the moment.
BattleEyeClient.Connect can call itself in some (failing) circumstances, and this method can be called by sayGlobal - so you probably want to change this code block (from line 98):
catch
{
    if (disconnectionType == BattlEyeDisconnectionType.ConnectionLost)
    {
        Disconnect(BattlEyeDisconnectionType.ConnectionLost);
        Connect();
        return BattlEyeConnectionResult.ConnectionFailed;
    }
    else
    {
        OnConnect(loginCredentials, BattlEyeConnectionResult.ConnectionFailed);
        return BattlEyeConnectionResult.ConnectionFailed;
    }
}
Maybe keep track of how many reconnection attempts you make or transform this section of code so that it is a while loop rather than a recursive call during this failure mode.
Even worse, of course, if that recursive Connect call succeeds, this catch block still returns BattlEyeConnectionResult.ConnectionFailed, which it probably shouldn't.
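A rough sketch of that idea (my own illustration, using only the members visible in the snippet above; _reconnectAttempts and MaxReconnectAttempts are hypothetical additions to the class):
// Hypothetical fields added to the class to cap re-entry into Connect().
private int _reconnectAttempts;
private const int MaxReconnectAttempts = 5;

catch
{
    if (disconnectionType == BattlEyeDisconnectionType.ConnectionLost
        && _reconnectAttempts < MaxReconnectAttempts)
    {
        _reconnectAttempts++;
        Disconnect(BattlEyeDisconnectionType.ConnectionLost);
        var result = Connect();          // now bounded to MaxReconnectAttempts levels deep
        if (result != BattlEyeConnectionResult.ConnectionFailed)
            _reconnectAttempts = 0;      // reset the counter once a reconnect succeeds
        return result;                   // report the real outcome, not always ConnectionFailed
    }
    else
    {
        OnConnect(loginCredentials, BattlEyeConnectionResult.ConnectionFailed);
        return BattlEyeConnectionResult.ConnectionFailed;
    }
}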
I wanted to parallelize a piece of code, but the code actually got slower, probably because of the overhead of Barrier and BlockingCollection. There would be two threads, where the first would find pieces of work which the second one would operate on. Neither operation is much work, so the overhead of switching safely quickly outweighs the benefit of the two threads.
So I thought I would try to write some code myself to be as lean as possible, without using Barrier etc. It does not behave consistently, however. Sometimes it works, sometimes it does not, and I can't figure out why.
This code is just the mechanism I use to try to synchronize the two threads. It doesn't do anything useful, just the minimum amount of code you need to reproduce the bug.
So here's the code:
// node in linkedlist of work elements
class WorkItem {
    public int Value;
    public WorkItem Next;
}

static void Test() {
    WorkItem fst = null; // first element
    Action create = () => {
        WorkItem cur = null;
        for (int i = 0; i < 1000; i++) {
            WorkItem tmp = new WorkItem { Value = i }; // create new comm class
            if (fst == null) fst = tmp; // if it's the first add it there
            else cur.Next = tmp; // else add to back of list
            cur = tmp; // this is the current one
        }
        cur.Next = new WorkItem { Value = -1 }; // -1 means stop element
#if VERBOSE
        Console.WriteLine("Create is done");
#endif
    };
    Action consume = () => {
        //Thread.Sleep(1); // this also seems to cure it
#if VERBOSE
        Console.WriteLine("Consume starts"); // especially this one seems to matter
#endif
        WorkItem cur = null;
        int tot = 0;
        while (fst == null) { } // busy wait for first one
        cur = fst;
#if VERBOSE
        Console.WriteLine("Consume found first");
#endif
        while (true) {
            if (cur.Value == -1) break; // if stop element break;
            tot += cur.Value;
            while (cur.Next == null) { } // busy wait for next to be set
            cur = cur.Next; // move to next
        }
        Console.WriteLine(tot);
    };
    try { Parallel.Invoke(create, consume); }
    catch (AggregateException e) {
        Console.WriteLine(e.Message);
        foreach (var ie in e.InnerExceptions) Console.WriteLine(ie.Message);
    }
    Console.WriteLine("Consume done..");
    Console.ReadKey();
}
The idea is to have a linked list of work items. One thread adds items to the back of that list, and another thread reads them, does something, and polls the Next field to see if it is set. As soon as it is set it will move to the new item and process it. It polls the Next field in a tight busy loop because it should be set very quickly. Going to sleep, context switching, etc. would kill the benefit of parallelizing the code.
The time it takes to create a workitem would be quite comparable to executing it, so the cycles wasted should be quite small.
When I run the code in release mode, sometimes it works, sometimes it does nothing. The problem seems to be in the 'Consume' thread; the 'Create' thread always seems to finish. (You can check by fiddling with the Console.WriteLines.)
It has always worked in debug mode. In release it's about 50% hit and miss. Adding a few Console.WriteLines helps the success ratio, but even then it's not 100% (the #define VERBOSE stuff).
When I add the Thread.Sleep(1) in the 'Consume' thread it also seems to fix it. But not being able to reproduce a bug is not the same thing as knowing for sure it's fixed.
Does anyone here have a clue as to what goes wrong here? Is it some optimization that creates a local copy or something that does not get updated? Something like that?
There's no such thing as a partial update, right? Like a data race, but where one thread is half done writing and the other thread reads the partially written memory? Just checking..
Looking at it, I think it should just work. I guess once every few times the threads arrive in a different order and that makes it fail, but I don't get how. And how could I fix this without slowing it down?
Thanks in advance for any tips,
Gert-Jan
I do my damn best to avoid the utter minefield of closure/stack interaction at all costs.
This is PROBABLY a (language-level) race condition, but without reflecting Parallel.Invoke I can't be sure. Basically, sometimes fst is being changed by create() and sometimes not. Ideally, it should NEVER be changed (if C# had good closure behaviour). It could be due to which thread Parallel.Invoke chooses to run create() and consume() on. If create() runs on the main thread, it might change fst before consume() takes a copy of it. Or create() might be running on a separate thread and taking a copy of fst. Basically, as much as I love C#, it is an utter pain in this regard, so just work around it and treat all variables involved in a closure as immutable.
To get it working:
//Replace
WorkItem fst = null
//with
WorkItem fst = WorkItem.GetSpecialBlankFirstItem();
//And
if (fst == null) fst = tmp;
//with
if (fst.Next == null) fst.Next = tmp;
A thread is allowed by the spec to cache a value indefinitely.
See "Can a C# thread really cache a value and ignore changes to that value on other threads?" and also http://www.yoda.arachsys.com/csharp/threads/volatility.shtml
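To make the cross-thread writes visible without introducing locks or sleeps, one option (a sketch of my own, not from the answers above) is to mark the shared fields volatile; a captured local like fst cannot carry the volatile modifier, so it can be moved into a small holder object:
// Sketch: volatile guarantees the consumer's busy-waits re-read memory and
// eventually observe the producer's writes, instead of spinning on a stale value.
class WorkItem {
    public int Value;
    public volatile WorkItem Next;   // consumer spins on Next
}

class WorkList {
    public volatile WorkItem First;  // replaces the captured local 'fst'
}

// create:   if (list.First == null) list.First = tmp; else cur.Next = tmp;
// consume:  while (list.First == null) { }   // now guaranteed to see the assignment
Alternatively, keeping fst as a captured local and reading/writing it through Volatile.Read(ref fst) / Volatile.Write(ref fst, tmp) should give the same visibility, at the cost of more verbose call sites.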