Force loop containing asynchronous task to maintain sequence - c#

Something tells me this might be a stupid question and I have in fact approached my problem from the wrong direction, but here goes.
I have some code that loops through all the documents in a folder - The alphabetical order of these documents in each folder is important, this importance is also reflected in the order the documents are printed. Here is a simplified version:
var wordApp = new Microsoft.Office.Interop.Word.Application();
foreach (var file in Directory.EnumerateFiles(folder))
{
    fileCounter++;
    // Print file, referencing a previously instantiated word application object
    wordApp.Documents.Open(...);
    wordApp.PrintOut(...);
    wordApp.ActiveDocument.Close(...);
}
It seems (and I could be wrong) that the PrintOut code is asynchronous, and the application sometimes gets into a situation where the documents get printed out of order. This is confirmed because if I step through, or place a long enough Sleep() call, the order of all the files is correct.
How should I prevent the next print task from starting before the previous one has finished?
I initially thought that I could use a lock(someObject){} until I remembered that they are only useful for preventing multiple threads accessing the same code block. This is all on the same thread.
There are some events I can wire into on the Microsoft.Office.Interop.Word.Application object: DocumentOpen, DocumentBeforeClose and DocumentBeforePrint
I have just thought that this might actually be a problem with the print queue not being able to accurately distinguish lots of documents that are added within the same second. This can't be the problem, can it?
As a side note, this loop is within the code called from the DoWork event of a BackgroundWorker object. I'm using this to prevent UI blocking and to feedback the progress of the process.

Your event-handling approach seems like a good one. Instead of using a loop, you could add a handler to the DocumentBeforeClose event, in which you would get the next file to print, send it to Word, and continue. Something like this:
// Sort explicitly: the order returned by EnumerateFiles is not guaranteed.
List<string> m_files = Directory.EnumerateFiles(folder).OrderBy(f => f).ToList();
wordApp.DocumentBeforeClose += ProcessNextDocument;
...
void ProcessNextDocument(...)
{
    string file = null;
    lock (m_files)
    {
        if (m_files.Count > 0)
        {
            // Take from the front so the alphabetical order is preserved.
            file = m_files[0];
            m_files.RemoveAt(0);
        }
        else
        {
            // Done!
        }
    }
    if (file != null)
    {
        PrintDocument(file);
    }
}
void PrintDocument(string file)
{
    wordApp.Documents.Open(...);
    wordApp.PrintOut(...);
    wordApp.ActiveDocument.Close(...);
}

The first parameter of Application.PrintOut specifies whether the printing should take place in the background or not. By setting it to false it will work synchronously.

Related

program stuck because of missing thrown events

This is the case:
A business has several sites, and each site has several cameras that take a lot of pictures daily (about a thousand pictures each). These pictures are then stored in a folder (one folder per day) on one computer.
The business owns an image-analysis program that takes an "in.xml" file as input and returns an "out.xml" file for each picture it analyzes. This program must be used and cannot be changed.
I wrote a UI for that program that runs over that folder and processes each camera from each site, sending picture after picture to the program, which runs as a separate process.
Because this processing is asynchronous, I raise events at the start and end of every picture's handling, and likewise for cameras and sites.
The program mostly runs well at that business, but sometimes it gets stuck after handling a picture, as if it has missed the end_pic_analizing event and is still waiting for it to be raised.
I tried adding a timer for every picture that moves on to the next picture in such cases, but it still got stuck, as if it was missing the timer event as well.
This bug happens far too often, even when the program is almost the only process running on that computer, and it has even got stuck near the start of a run (once at the third picture). It doesn't depend on specific pictures either: across repeated runs on the same folder it can get stuck at different pictures or not at all.
Code samples:
on the Image class:
static public void image_timer_Elapsed(object sender, ElapsedEventArgs e)
{
    // Stop the timer and calculate how much time is left for the next file check.
    _timer.Stop();
    _timerCount += (int)_timer.Interval;
    int timerSpan = (int)(Daily.duration * 1000) - _timerCount;
    // Daily.duration is the max duration for seeking the "out.xml" file before quitting.
    if (timerSpan < _timer.Interval) _timer.Interval = timerSpan + 1;
    // Check for the file and analyze it.
    String fileName = Daily.OutPath + @"\out.xml";
    ResultHandler.ResultOut output = ResultHandler.GetResult(ref _currentImage);
    // If no file was found and there is time left, wait and check again.
    if (output == ResultHandler.ResultOut.FileNotFound && timerSpan > 0)
    {
        _timer.Start();
    }
    else // file found or time ran out
    {
        if (MyImage.ImageCheckCompleted != null)
            MyImage.ImageCheckCompleted(_currentImage); // raise event
        // The program probably gets stuck here.
    }
}
On camera class:
static public void Camera_ImageCheckCompleted(MyImage image)
{
//if this is not the last image. (parent as Camera )
if (image.Id + 1 < image.parent.imageList.Count)
{
image.parent.imageList[image.Id + 1].RunCheck(); //check next image
}
else
{
if (Camera.CameraCheckCompleted != null)
Camera.CameraCheckCompleted(image.parent); // throw event
}
}
You don't appear to have any error handling or logging code, so if an exception is thrown your program will halt and you might not have a record of what happened. This is especially true since your program is processing the images asynchronously, so the main thread may have already exited by the time an error occurs in one of your processing threads.
So first and foremost, I would suggest throwing a try/catch block around all the code that gets run in the separate thread. If an exception gets thrown there, you will want to catch that and either fire ImageCheckCompleted with some special event arguments to indicate there was an error or fire some other event that you create specifically for when errors occur. That way your program can continue to process even if an exception is thrown inside your code.
try
{
    //... Do your processing
    // This will happen if everything worked correctly.
    InvokeImageCheckCompleted(new ImageCheckCompletedEventArgs());
}
catch (Exception e)
{
    // This will happen if an exception got thrown.
    InvokeImageCheckCompleted(new ImageCheckCompletedEventArgs(e));
}
For the sake of simplicity, I'd suggest using a for loop to process each image. You can use a ManualResetEvent to block execution until the ImageCheckCompleted event fires for each check. This should make it easier to log the execution of each loop, catch errors that may be preventing the ImageCheckCompleted event from firing, and even possibly move on to process the next image if one of them appears to be taking too long.
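The loop-plus-wait-handle idea could be sketched roughly as follows. This is only an illustration under assumptions: ImageChecker, StartCheck, Completed and SequentialRunner are hypothetical stand-ins for the asker's classes, and the sketch uses ManualResetEventSlim (the lightweight sibling of ManualResetEvent) with one event per image so that a late signal from a timed-out image cannot release the wait for a later one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the asynchronous check; it raises Completed when done.
public class ImageChecker
{
    public event Action<string, Exception> Completed;

    public void StartCheck(string image)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            Exception error = null;
            try
            {
                // ... run the external analyzer and read out.xml here ...
            }
            catch (Exception ex)
            {
                error = ex; // report the failure instead of silently losing the event
            }
            var handler = Completed;
            if (handler != null) handler(image, error);
        });
    }
}

public static class SequentialRunner
{
    // Processes images strictly one at a time; returns the images that timed out.
    public static List<string> Run(IEnumerable<string> images, ImageChecker checker, TimeSpan timeout)
    {
        var timedOut = new List<string>();
        foreach (var image in images)
        {
            // One event per image: a stale completion cannot unblock a later wait.
            var done = new ManualResetEventSlim(false);
            Action<string, Exception> onCompleted = (img, err) =>
            {
                if (img == image) done.Set();
            };
            checker.Completed += onCompleted;
            try
            {
                checker.StartCheck(image);
                if (!done.Wait(timeout))
                    timedOut.Add(image); // log it and move on to the next image
            }
            finally
            {
                checker.Completed -= onCompleted;
            }
        }
        return timedOut;
    }
}
```

A caller might run `SequentialRunner.Run(files, checker, TimeSpan.FromSeconds(30))` and log whatever comes back in the returned list; the timeout value is an arbitrary choice for the example.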
Finally, if you can make your image processing thread-safe, you might consider using Parallel.ForEach to make it so that multiple images can be processed at the same time. This will probably significantly improve the overall speed of processing the batch.

Where to store progress information in ASP.Net web application

I'm creating a page that takes uploaded text files and builds them into multiple PDFs. They are just exports from Excel. Each row in the file corresponds to a new PDF that needs to be created.
Anyway, once the files are uploaded I want to begin processing them, but I don't want the user to have to stay on the page, or even still have their session open. For example they could close the browser and come back 10 minutes later, log in, and the progress information will say like 112/200 files processed or something. It will be a lot quicker than that though.
So two questions really, how can I pass this processing job to something (Handler?Thread?) that will continue to run when the page is closed, and will return as soon as the job has started (so the browser isn't stopped)? Secondly, where can I store this information so that when the user comes back to the page, they can see the current progress.
I realise that I can't use sessions, and since it will be processing about a file a second I don't really want to update a DB every second. Is there some way I can do this? Is it possible?
I solved this by using the link provided by astander above. I simply create an object in HttpContext.Application to store the progress variables, and then run the method that does my processing on a new Thread.
// Create the new progress object
BatchProgress bs = new BatchProgress(0);
if (Application["BatchProgress"] != null)
{
    // Should never happen
    Application["BatchProgress"] = bs;
}
else
{
    // Store the object itself, not the string "bs"
    Application.Add("BatchProgress", bs);
}
//Set up new thread, run batch is the method that does all the processing.
ThreadStart ts = new ThreadStart(RunBatch);
Thread t = new Thread(ts);
t.Start();
It then returns after the thread starts and I can use jQuery to get the Application["BatchProgress"] object at regular intervals. At the end of my thread the BatchProgress object has its status set to "Complete", then when jQuery queries it, it sees the complete status and removes the progress object from the application.
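The post does not show the BatchProgress class, so the following is only a guess at a workable shape. The member names (Total, Processed, ReportOne, MarkComplete) and the meaning of the constructor argument are assumptions. The point of the sketch is that the worker thread and the request threads polling for progress touch the same object, so the counters are updated with Interlocked and read with Volatile.

```csharp
using System.Threading;

// Hypothetical shape for the progress object; the original post does not show it.
public class BatchProgress
{
    private int _processed;
    private int _complete; // 0 = still running, 1 = complete

    // Assumption: the constructor argument is the total number of files.
    public BatchProgress(int total) { Total = total; }

    public int Total { get; private set; }
    public int Processed { get { return Volatile.Read(ref _processed); } }
    public bool IsComplete { get { return Volatile.Read(ref _complete) == 1; } }

    // Called by the worker thread after each file is processed.
    public void ReportOne() { Interlocked.Increment(ref _processed); }

    // Called by the worker thread when the batch has finished.
    public void MarkComplete() { Volatile.Write(ref _complete, 1); }

    public override string ToString()
    {
        return Processed + "/" + Total + (IsComplete ? " (Complete)" : "");
    }
}
```

The page polled by jQuery would then only need to serialize `Processed`, `Total` and `IsComplete` from the stored object.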

Monitoring a remote process

I have a method that stops a service(s) but I also need to delete the logs. Usually this is not a problem but the process can take a little bit of time before closing. Again, although the service appears stopped, the process does take additional time to close properly. Since the process is still running, I cannot delete the logs so I need to find a way to monitor the .exe to know when its safe to delete the logs.
So far my best option is a do-while loop. Unfortunately the first iteration of the delete statement throws an exception and stops the program.
do
{
// delete logs
}
while (System.Diagnostics.Process.GetProcessesByName(processName, machineName).Length > 0);
I'm sure there is a simple solution, but my lack of experience is the real problem.
This is probably not the best answer either, but you could invert the loop to:
while (System.Diagnostics.Process.GetProcessesByName(processName, machineName).Length > 0)
{
// delete log files.
}
This evaluates the loop condition before executing the body. Note, however, that the body then only runs while the process is still alive, which is the opposite of what you need: the logs can only be deleted once the process has exited.
A hackish way around this is to loop, wait while the process is still running, and break out manually once the work is done:
bool closeProcessOperation = true; // Control variable in case you want to abort the loop
while (closeProcessOperation)
{
    // The process is still running: wait a little and check again.
    if (System.Diagnostics.Process.GetProcessesByName(processName, machineName).Length > 0)
    {
        System.Threading.Thread.Sleep(500);
        continue;
    }
    // break if no logs exist
    // break for some other condition
    // etc
    // delete logs
    break; // logs deleted, done
}

MSWord automation:Get file contents after it was saved

I have an application that uses MSWord automation to edit some documents, after they save and close word I need to grab the modified file and put it back on the repository, there is only one scenario where I can't get it to work and that is
when the user makes changes to the file, selects to close word and selects yes to save the file
there are 2 events that I'm using:
DocumentBeforeSave
Quit
on the Quit event I'm trying to load the .docx file from disk but on this particular scenario I get an IOException because the file is still in use, somehow I need to wait until after the Quit event has been processed, which is when Word is actually closed and the file is no longer being used
right now I have it working using this
word.Visible = true;
while (!wordDone) { //gets changed to true on the Quit event
System.Threading.Thread.Sleep(100);
}
bool error = false;
do {
try { //need to load the contents of the modified file
ls.Content = System.IO.File.ReadAllBytes(provider.GetFileName());
error = false;
}
catch (System.IO.IOException) {
error = true;
System.Threading.Thread.Sleep(200);
}
} while (error);
While this works, it is very ugly. I need a way to fire an event after the Quit event has been handled, or to block the current thread while Word is still running, or to get an event after the document has been saved. The bottom line is that I need a clean way to load the file after it has been saved and Word is closed. DocumentAfterSave would be awesome, but doesn't seem to exist.
I also tried unhooking the Quit handler and calling word.Quit in the Quit handler; that made no difference.
I'm also investigating the use of ManualResetEvent or related classes. So far it almost works, but I still need to pause after it has been signaled to make sure Word is closed and the file is no longer in use.
I faced a similar problem in the past. I don't think there is a nice clean way, but instead of doing it as above, consider this (it will suit you if you have a controlled environment):
1. Create the Word application.
2. Get the process ID immediately by using GetProcesses and matching Winword; the last one in the returned list should be the one you are after. This is not 100% reliable in a multi-user environment.
3. After Word quits, use a Thread.Sleep loop to make sure the PID no longer exists.
4. Read the .docx for your custom operations.
I used to have the same problem. Using ReleaseComObject on all COM-related objects did the trick (that is, on your Word document object and your Word.Application object). That way you ensure that all dirty locks are removed after the COM object has been destroyed. Close the document and application with the Interop API. I use:
var localWordapp = new Word.Application();
localWordapp.Visible = false;
Word.Document doc = null;
// ...
if (doc != null)
{
doc.Close();
System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
}
localWordapp.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(localWordapp);

How to Lock a file and avoid readings while it's writing

My web application returns a file from the filesystem. These files are dynamic, so I have no way to know their names or how many of them there will be. When a file doesn't exist, the application creates it from the database. I want to prevent two different threads from recreating the same file at the same time, and to prevent a thread from returning the file while another thread is still creating it.
Also, I don't want to take a lock on an element that is common to all the files. Therefore I should lock only the file I'm creating.
So I want to lock a file until its recreation is complete; if another thread tries to access it, that thread will have to wait until the file is unlocked.
I've been reading about FileStream.Lock, but I have to know the file length and it won't prevent that other thread try to read the file, so it doesn't work for my particular case.
I've been reading also about FileShare.None, but it will throw an exception (which exception type?) if other thread/process try to access the file... so I should develop a "try again while is faulting" because I'd like to avoid the exception generation ... and I don't like too much that approach, although maybe there is not a better way.
The approach with FileShare.None would be this more or less:
static void Main(string[] args)
{
new Thread(new ThreadStart(WriteFile)).Start();
Thread.Sleep(1000);
new Thread(new ThreadStart(ReadFile)).Start();
Console.ReadKey(true);
}
static void WriteFile()
{
using (FileStream fs = new FileStream("lala.txt", FileMode.Create, FileAccess.Write, FileShare.None))
using (StreamWriter sw = new StreamWriter(fs))
{
Thread.Sleep(3000);
sw.WriteLine("trolololoooooooooo lolololo");
}
}
static void ReadFile()
{
Boolean readed = false;
Int32 maxTries = 5;
while (!readed && maxTries > 0)
{
try
{
Console.WriteLine("Reading...");
using (FileStream fs = new FileStream("lala.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
using (StreamReader sr = new StreamReader(fs))
{
while (!sr.EndOfStream)
Console.WriteLine(sr.ReadToEnd());
}
readed = true;
Console.WriteLine("Readed");
}
catch (IOException)
{
Console.WriteLine("Fail: " + maxTries.ToString());
maxTries--;
Thread.Sleep(1000);
}
}
}
But I don't like the fact that I have to catch exceptions, try several times and wait an inaccurate amount of time :|
You can handle this by using the FileMode.CreateNew argument to the stream constructor. One of the threads is going to lose: it finds out that the file was already created a microsecond earlier by another thread and gets an IOException.
It will then need to spin, waiting for the file to be fully created, which you enforce with FileShare.None. Catching exceptions here doesn't matter; it is spinning anyway. There's no other workaround unless you P/Invoke.
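A minimal sketch of that CreateNew approach might look like the following. FileCache, GetOrCreate, the produce delegate, the retry count and the delay are all hypothetical names and values chosen for illustration. One known gap: if produce throws, an empty file is left behind, so a real implementation would need cleanup.

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileCache
{
    // Returns the file's text, creating the file via 'produce' if it does not exist yet.
    public static string GetOrCreate(string path, Func<string> produce, int maxTries = 50)
    {
        try
        {
            // CreateNew throws IOException if the file already exists, so only one caller wins.
            // FileShare.None keeps readers out until the writer has finished.
            using (var fs = new FileStream(path, FileMode.CreateNew, FileAccess.Write, FileShare.None))
            using (var sw = new StreamWriter(fs))
            {
                string content = produce();
                sw.Write(content);
                return content;
            }
        }
        catch (IOException)
        {
            // Another caller created (or is still creating) the file: fall through and read it.
        }

        for (int attempt = 0; attempt < maxTries; attempt++)
        {
            try
            {
                using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
                using (var sr = new StreamReader(fs))
                {
                    return sr.ReadToEnd();
                }
            }
            catch (IOException)
            {
                Thread.Sleep(100); // the writer still holds the file exclusively
            }
        }
        throw new TimeoutException("File was not readable in time: " + path);
    }
}
```

The spinning and the exception handling are still there, as the answer says; they are just confined to one helper.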
I think the right approach would be the following:
create a set of strings in which you save the names of the files being handled,
so that only one thread processes a given file at a time, something like this:
//somewhere in your code, or put it on a singleton
static System.Collections.Generic.HashSet<String> filesAlreadyProcessed = new System.Collections.Generic.HashSet<String>();
//thread main method code
bool filealreadyprocessed = false;
lock (filesAlreadyProcessed)
{
    if (filesAlreadyProcessed.Contains(filename))
    {
        filealreadyprocessed = true;
    }
    else
    {
        filesAlreadyProcessed.Add(filename);
    }
}
if (!filealreadyprocessed)
{
    //ProcessFile
}
Do you have a way to identify what files are being created?
Say every one of those files corresponds to a unique ID in your database. You create a centralised location (Singleton?), where these IDs can be associated with something lockable (Dictionary). A thread that needs to read/write to one of those files does the following:
//Request access
ReaderWriterLockSlim fileLock = null;
bool needCreate = false;
lock(Coordination.Instance)
{
if(Coordination.Instance.ContainsKey(theId))
{
fileLock = Coordination.Instance[theId];
}
else if(!fileExists(theId)) //check if the file exists at this moment
{
Coordination.Instance[theId] = fileLock = new ReaderWriterLockSlim();
fileLock.EnterWriteLock(); //give no other thread the chance to get into write mode
needCreate = true;
}
else
{
//The file exists, and whoever created it, is done with writing. No need to synchronize in this case.
}
}
if(needCreate)
{
    createFile(theId); //Writes the file from the database
    lock(Coordination.Instance)
        Coordination.Instance.Remove(theId);
    fileLock.ExitWriteLock();
    fileLock = null;
}
if(fileLock != null)
fileLock.EnterReadLock();
//read your data from the file
if(fileLock != null)
fileLock.ExitReadLock();
Of course, threads that don't follow this exact locking protocol will have access to the file.
Now, locking over a Singleton object is certainly not ideal, but if your application needs global synchronization then this is a way to achieve it.
Your question really got me thinking.
Instead of having every thread responsible for file access and having them block, what if you used a queue of files that need to be persisted and have a single background worker thread dequeue and persist?
While the background worker is cranking away, you can have the web application threads return the db values until the file does actually exist.
I've posted a very simple example of this on GitHub.
Feel free to give it a shot and let me know what you think.
FYI, if you don't have git, you can use svn to pull it http://svn.github.com/statianzo/MultiThreadFileAccessWebApp
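Independently of the linked sample (whose contents are not reproduced here), the queue idea could be sketched like this. FilePersister and its members are hypothetical names, and BlockingCollection is simply one convenient way to build a single-consumer queue in .NET.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical sketch: request threads enqueue work, one background task writes the files.
public sealed class FilePersister : IDisposable
{
    private readonly BlockingCollection<Tuple<string, string>> _queue =
        new BlockingCollection<Tuple<string, string>>();
    private readonly Task _worker;

    public FilePersister()
    {
        // Single consumer: files are written one at a time, so writers never collide.
        _worker = Task.Factory.StartNew(() =>
        {
            foreach (var item in _queue.GetConsumingEnumerable())
            {
                string path = item.Item1;
                if (File.Exists(path)) continue; // already persisted by an earlier request

                // Write to a temp file first, then publish it, so readers never see a partial file.
                string temp = path + ".tmp";
                File.WriteAllText(temp, item.Item2);
                File.Move(temp, path);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Called from request threads; returns immediately.
    public void Enqueue(string path, string content)
    {
        _queue.Add(Tuple.Create(path, content));
    }

    // Stops accepting work and waits for the queue to drain.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Wait();
        _queue.Dispose();
    }
}
```

Request threads would call `Enqueue` and serve the database values in the meantime, switching to the file once `File.Exists` reports it.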
The question is old and there is already a marked answer. Nevertheless I would like to post a simpler alternative.
I think we can directly use the lock statement on the filename, as follows:
lock(string.Intern("FileLock:absoluteFilePath.txt"))
{
// your code here
}
Generally, locking a string is a bad idea because of String Interning. But in this particular case it should ensure that no one else is able to access that lock. Just use the same lock string before attempting to read. Here interning works for us and not against.
PS: The text 'FileLock' is just some arbitrary text to ensure that other string file paths are not affected.
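A variation on the same idea, swapping string.Intern for a private dictionary of lock objects, is sketched below. This is a different technique from the one in the answer: PathLocks and For are hypothetical names, and the case-insensitive comparer is an assumption that suits Windows paths. Unlike interned strings, these lock objects cannot be reached by unrelated code, but the dictionary grows with every distinct path and is never trimmed in this sketch.

```csharp
using System;
using System.Collections.Concurrent;

public static class PathLocks
{
    // One private lock object per path; nothing outside this class can lock on them by accident.
    private static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>(StringComparer.OrdinalIgnoreCase);

    // Returns the same lock object for the same path, and a different one for other paths.
    public static object For(string absolutePath)
    {
        return Locks.GetOrAdd(absolutePath, _ => new object());
    }
}
```

Usage would mirror the answer: wrap both the creation and the read in `lock (PathLocks.For(absoluteFilePath)) { ... }`.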
Why aren't you just using the database - e.g. if you have a way to associate a filename with the data from the db it contains, just add some information to the db that specifies whether a file exists with that information currently and when it was created, how stale the information in the file is etc. When a thread needs some information, it checks the db to see if that file exists and if not, it writes out a row to the table saying it's creating the file. When it's done it updates that row with a boolean saying the file is ready to be used by others.
the nice thing about this approach - all your information is in 1 place - so you can do nice error recovery - e.g. if the thread creating the file dies badly for some reason, another thread can come along and decide to rewrite the file because the creation time is too old. You can also create simple batch cleanup processes and get accurate data on how frequently certain data is being used for a file, how often information is updated (by looking at the creation times etc). Also, you avoid having to do many many disk seeks across your filesystem as different threads look for different files all over the place - especially if you decide to have multiple front-end machines seeking across a common disk.
The tricky thing - you'll have to make sure your db supports row-level locking on the table that threads write to when they create files because otherwise the table itself may be locked which could make this unacceptably slow.
