I'm working on a xml service at the moment , which is a sum of 20+ other xml's from other site's services.
So at first it was just ;
GetherDataAndCreateXML();
But obviously getting 20+ other xml , editing and serving it takes time , so i decided to cache it for like 10 minutes and added a final.xml file with a DateTime attribute to check if it's out of date etc. So it became something like ;
var de = DateTime.Parse(x.Element("root").Attribute("DateTime").Value).AddSeconds(10.0d);
if (de >= DateTime.Now)
return finalXML();
else
{
RefreshFinalXml();
return finalXML();
}
The problem now , is that any request after that 10 minute obviously takes too much time as it's waiting for my looong RefreshFinalXml() function. So i did this;
if (ndt >= DateTime.Now)
return finalXML();
else
{
ThreadStart start = RefreshFinalXml;
var thr = new Thread(start);
thr.IsBackground = true;
thr.Start();
return finalXML();
}
This way , even at the 11th minute i simply return the old final.xml but meanwhile i start another thread to refresh current xml at the background. So after something like 13th minute , users get fresh data without any delay.
But still there is a problem with this ; it creates a new thread for every single request between 10 to 13th minutes ( while first RefreshFinalXml is still working at the background ) and obviously i can't let that happen , right? And since I don't know much about locking files and detecting if it's lock , i added a little attribute , "Updating" to my final xml ;
if (ndt >= DateTime.Now)
return finalXML();
else
{
if (final.Element("root").Attribute("Updating").Value != "True")
{
final.Element("root").SetAttributeValue("Updating", "True");
final.Save(Path);
ThreadStart start = RefreshFinalXml;
//I change Updating Attribute back to False at the end of this function , right before saving Final Xml
var thr = new Thread(start);
thr.IsBackground = true;
thr.Start();
}
return finalXML();
}
So ,
0-10 minutes = return from cache
10~13 minutes = return from cache while just one thread is refreshing final.xml
13+ minutes = returns from cache
It works and seems decent at the moment , but the question/problem is ; I'm extremely inexperienced in these kind of stuff ( xml services , threading , locks etc ) so i'm not really sure if it'll work flawlessly under tougher situations. For example , will my custom locking create problems under heavy traffic, should i switch to lock file etc.
So I'm looking for any advice/correction about this process , what would be the "best practice" etc.
Thanks in advance
Full Code : http://pastebin.com/UH94S8t6
Also apologies for my English as it's not my mother language and it gets even worse when I'm extremely sleepless/tired as I'm at the moment.
EDIT : Oh I'm really sorry but somehow i forgot to mention a crucial thing ; this is all working on Asp.Net Mvc2. I think i could have done a little better if it wasn't a web application but i think that changes many things right?
You've got a couple of options here.
Approach #1
First, you can use .NET's asychronous APIs for fetching the data. Assuming you're using HttpWebRequest you'd want to take a look at BeginGetResponse and EndGetResponse, as well as the BeginRead and EndRead methods on the Stream you get back the response.
Example
var request = WebRequest.Create("http://someurl.com");
request.BeginGetResponse(delegate (IAsyncResult ar)
{
Stream responseStream = request.EndGetResponse(ar).GetResponseStream();
// use async methods on the stream to process the data -- omitted for brevity
});
Approach #2
Another approach is to use the thread pool to do your work, rather than creating and managing your own threads. This will effectively cap the number of threads you're running, as well as removing the performance hit you'd normally get when you create a new thread.
Now, you're right about not wanting to repeatedly fire updates while you wait for
Example #2
Your code might look something like this:
// We use a dictionary here for efficiency
var Updating = new Dictionary()<TheXMLObjectType, object>;
...
if (de >= DateTime.Now)
{
return finalXML();
}
else
{
// Lock the updating dictionary to prevent other threads from
// updating it before we're done.
lock (Updating)
{
// If the xml is already in the updating dictionary, it's being
// updated elsewhere, so we don't need to do anything.
// On the other hand, if it's not already being updated we need
// to queue RefreshFinalXml, and set the updating flag
if (!Updating.ContainsKey(xml))
{
// Use the thread pool for the work, rather than managing our own
ThreadPool.QueueUserWorkItem(delegate (Object o)
{
RefreshFinalXml();
lock(Updating)
{
Updating.Remove(xml);
}
});
// Set the xml in the updating dictionary
Updating[xml] = null;
}
}
return finalXML();
}
Hopefully that's enough for you to work off of.
I would go for a different method assuming the following
Your service is always running
You can afford/are allowed to getting the XML files even if you don't have any request to your service currently.
The XML files you fetch are the same files for all your requests. (that is the total number of XML files you need for all your responses are those 20 files)
The resulting XML file is not too big to keep in memory all the time
1
First of all I would not store the resulting XML in a file on disk but rather in a static variable.
2
Second I would create a timer set on 10 minutes that updates the cache even if you have no calls to your service. That way you always have quite recent data ready and cached even if your service was not called for a while. It also removes the need to think about if you already have a refresh "ongoing".
3
Third I would consider using threading/async calls to fetch all your 20 XML's in parallel. This is only useful if you want to reduce the refresh time. It could allow you to reduce the refresh interval from 10 to maybe 1-2 minutes if that is improving your service.
I would recommend 1 and 2, but 3 is more optional.
Related
Hi could someone guide me in the following problem, there must be tons of guides on this problem but for some reason I can't get google to find a nice how to, to follow
I'm implementing this in aspnet core API, but I think the problem/solution could go for any language.
The problem, i have to call a view from a database that is painfully slow it takes about 15-30 seconds to return the ~ 300 rows
It only returns the fields that are required. It joins from a lot of tables and multiple databases. (There are other applications that updates the data, I'm only interested in reading the result)
The DBA says there is nothing he can do, so I have to find a solution, and why not, it could be fun
Now the real problem is there are about 250 autonomous clients requesting data, a client request data about every 2 minutes, and with the time it takes to select data it doesn't take long for the system to become unresponsive. It is the same data in the response for all requests
It would be acceptable to cache the rows for 5 minutes. Now how would I implement it so only one request select form the database and update the cache while all others read from a cache and perhaps are waiting if the cache is empty for a short period, while new data is being loaded to cache from the database view?
(I could write a script to be scheduled to execute every x minute, but it would be more fun to solve this in the application.)
I could perhaps make some cache tables in the database and let the api call check if the cache table is empty, if not get the data from the slow view, populate the cache database and return result. But then what would be a god solution to only empty the cache and populate the cache once and not multiple times when there are going to come multiple requests in the timeframe it takes to load data from the view.
And perhaps there are better alternatives that caching in a database table?
Hope anyone can help
Your question is very unspecific to a technology. So you are asking for an concept. In general
check cache without locking
return data if it is up to date
perform lock
check cache again
update cache
unlock and return data
You may read/use https://learn.microsoft.com/en-us/dotnet/core/extensions/caching
// pseudo code
async Result QueryFromCache()
{
// check cache is up to date - without lock
var cacheData = await GetCacheData(); // latest data or null
if (cacheData == null)
{
// wait for cache data
cacheData = await UpdateCache();
return cacheData.Data;
}
// data is still up to date?
if (cacheData.UpdateDate.AddMinutes(5) > DateTime.Now())
{
return cacheData.Data;
}
// Cache Update is necessary
// Option 1: Start separate "fire and forget" Thread/Task to fill cache, but return old data
Task.Run(UpdateCache()); // do not await
return cacheData.Data; // return immediatly with old data > 5 Minutes
// Option 2:
cacheData = await UpdateCache(); // wait for an update
return cacheData.Data;
}
async CacheData UpdateCache()
{
var lock = GetLock(); // lock, Semaphore, Database-Lock => depends on your architecture
try
{
// doubled check cache is up to date - with lock
var cacheData = await GetCacheData();
if (cacheDate != null && cacheData.UpdateDate.AddMinutes(5) > DateTime.Now())
{
return cacheData;
}
// Query data
var result = await PerformLongQuery();
// update cache
var cacheData = new CacheData {
UpdateDate = DateTime.Now(),
Data = result;
}
await SetCacheData(cacheEntry);
return cacheData;
} finally {
lock.Release();
}
}
I have a Windows service that every 5 seconds checks for work. It uses System.Threading.Timer for handling the check and processing and Monitor.TryEnter to make sure only one thread is checking for work.
Just assume it has to be this way as the following code is part of 8 other workers that are created by the service and each worker has its own specific type of work it needs to check for.
readonly object _workCheckLocker = new object();
public Timer PollingTimer { get; private set; }
void InitializeTimer()
{
if (PollingTimer == null)
PollingTimer = new Timer(PollingTimerCallback, null, 0, 5000);
else
PollingTimer.Change(0, 5000);
Details.TimerIsRunning = true;
}
void PollingTimerCallback(object state)
{
if (!Details.StillGettingWork)
{
if (Monitor.TryEnter(_workCheckLocker, 500))
{
try
{
CheckForWork();
}
catch (Exception ex)
{
Log.Error(EnvironmentName + " -- CheckForWork failed. " + ex);
}
finally
{
Monitor.Exit(_workCheckLocker);
Details.StillGettingWork = false;
}
}
}
else
{
Log.Standard("Continuing to get work.");
}
}
void CheckForWork()
{
Details.StillGettingWork = true;
//Hit web server to grab work.
//Log Processing
//Process Work
}
Now here's the problem:
The code above is allowing 2 Timer threads to get into the CheckForWork() method. I honestly don't understand how this is possible, but I have experienced this with multiple clients where this software is running.
The logs I got today when I pushed some work showed that it checked for work twice and I had 2 threads independently trying to process which kept causing the work to fail.
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
Processing 0-3978DF84-EB3E-47F4-8E78-E41E3BD0880E.xml for Update Request. - at 09/14 10:15:501255801
Unloaded AppDomain - at 09/14 10:15:10:15:501255801
Stopping environments for Update request - at 09/14 10:15:501255801
AppDomain is already unloaded - at 09/14 10:15:501255801
=== Starting Update Process === - at 09/14 10:15:513756009
Downloading File X - at 09/14 10:15:525631183
Downloading File Y - at 09/14 10:15:525631183
=== Starting Update Process === - at 09/14 10:15:525787359
Downloading File X - at 09/14 10:15:525787359
Downloading File Y - at 09/14 10:15:525787359
The logs are written asynchronously and are queued, so don't dig too deep on the fact that the times match exactly, I just wanted to point out what I saw in the logs to show that I had 2 threads hit a section of code that I believe should have never been allowed. (The log and times are real though, just sanitized messages)
Eventually what happens is that the 2 threads start downloading a big enough file where one ends up getting access denied on the file and causes the whole update to fail.
How can the above code actually allow this? I've experienced this problem last year when I had a lock instead of Monitor and assumed it was just because the Timer eventually started to get offset enough due to the lock blocking that I was getting timer threads stacked i.e. one blocked for 5 seconds and went through right as the Timer was triggering another callback and they both somehow made it in. That's why I went with the Monitor.TryEnter option so I wouldn't just keep stacking timer threads.
Any clue? In all cases where I have tried to solve this issue before, the System.Threading.Timer has been the one constant and I think its the root cause, but I don't understand why.
I can see in log you've provided that you got an AppDomain restart over there, is that correct? If yes, are you sure that you have the one and the only one object for your service during the AppDomain restart? I think that during that not all the threads are being stopped right in the same time, and some of them could proceed with polling the work queue, so the two different threads in different AppDomains got the same Id for work.
You probably could fix this with marking your _workCheckLocker with static keyword, like this:
static object _workCheckLocker;
and introduce the static constructor for your class with initialization of this field (in case of the inline initialization you could face some more complicated problems), but I'm not sure is this be enough for your case - during AppDomain restart static class will reload too. As I understand, this is not an option for you.
Maybe you could introduce the static dictionary instead of object for your workers, so you can check the Id for documents in process.
Another approach is to handle the Stopping event for your service, which probably could be called during the AppDomain restart, in which you will introduce the CancellationToken, and use it to stop all the work during such circumstances.
Also, as #fernando.reyes said, you could introduce heavy lock structure called mutex for a synchronization, but this will degrade your performance.
TL;DR
Production stored procedure has not been updated in years. Workers were getting work they should have never gotten and so multiple workers were processing update requests.
I was able to finally find the time to properly set myself up locally to act as a production client through Visual Studio. Although, I wasn't able to reproduce it like I've experienced, I did accidentally stumble upon the issue.
Those with the assumptions that multiple workers were picking up the work was indeed correct and that's something that should have never been able to happen as each worker is unique in the work they do and request.
It turns out that in our production environment, the stored procedure to retrieve work based on the work type has not been updated in years (yes, years!) of deploys. Anything that checked for work automatically got updates which meant when the Update worker and worker Foo checked at the same time, they both ended up with the same work.
Thankfully, the fix is database side and not a client update.
I've got a routine called GetEmployeeList that loads when my Windows Application starts.
This routine pulls in basic employee information from our Active Directory server and retains this in a list called m_adEmpList.
We have a few Windows accounts set up as Public Profiles that most of our employees on our manufacturing floor use. This m_adEmpList gives our employees the ability to log in to select features using those Public Profiles.
Once all of the Active Directory data is loaded, I attempt to "auto logon" that employee based on the System.Environment.UserName if that person is logged in under their private profile. (employees love this, by the way)
If I do not thread GetEmployeeList, the Windows Form will appear unresponsive until the routine is complete.
The problem with GetEmployeeList is that we have had times when the Active Directory server was down, the network was down, or a particular computer was not able to connect over our network.
To get around these issues, I have included a ManualResetEvent m_mre with the THREADSEARCH_TIMELIMIT timeout so that the process does not go off forever. I cannot login someone using their Private Profile with System.Environment.UserName until I have the list of employees.
I realize I am not showing ALL of the code, but hopefully it is not necessary.
public static ADUserList GetEmployeeList()
{
if ((m_adEmpList == null) ||
(((m_adEmpList.Count < 10) || !m_gotData) &&
((m_thread == null) || !m_thread.IsAlive))
)
{
m_adEmpList = new ADUserList();
m_thread = new Thread(new ThreadStart(fillThread));
m_mre = new ManualResetEvent(false);
m_thread.IsBackground = true;
m_thread.Name = FILLTHREADNAME;
try {
m_thread.Start();
m_gotData = m_mre.WaitOne(THREADSEARCH_TIMELIMIT * 1000);
} catch (Exception err) {
Global.LogError(_CODEFILE + "GetEmployeeList", err);
} finally {
if ((m_thread != null) && (m_thread.IsAlive)) {
// m_thread.Abort();
m_thread = null;
}
}
}
return m_adEmpList;
}
I would like to just put a basic lock using something like m_adEmpList, but I'm not sure if it is a good idea to lock something that I need to populate, and the actual data population is going to happen in another thread using the routine fillThread.
If the ManualResetEvent's WaitOne timer fails to collect the data I need in the time allotted, there is probably a network issue, and m_mre does not have many records (if any). So, I would need to try to pull this information again the next time.
If anyone understands what I'm trying to explain, I'd like to see a better way of doing this.
It just seems too forced, right now. I keep thinking there is a better way to do it.
I think you're going about the multithreading part the wrong way. I can't really explain it, but threads should cooperate and not compete for resources, but that's exactly what's bothering you here a bit. Another problem is that your timeout is too long (so that it annoys users) and at the same time too short (if the AD server is a bit slow, but still there and serving). Your goal should be to let the thread run in the background and when it is finished, it updates the list. In the meantime, you present some fallbacks to the user and the notification that the user list is still being populated.
A few more notes on your code above:
You have a variable m_thread that is only used locally. Further, your code contains a redundant check whether that variable is null.
If you create a user list with defaults/fallbacks first and then update it through a function (make sure you are checking the InvokeRequired flag of the displaying control!) you won't need a lock. This means that the thread does not access the list stored as member but a separate list it has exclusive access to (not a member variable). The update function then replaces (!) this list, so now it is for exclusive use by the UI.
Lastly, if the AD server is really not there, try to forward the error from the background thread to the UI in some way, so that the user knows what's broken.
If you want, you can add an event to signal the thread to stop, but in most cases that won't even be necessary.
I've been trying to get what I believe to be the simplest possible form of threading to work in my application but I just can't do it.
What I want to do: I have a main form with a status strip and a progress bar on it. I have to read something between 3 and 99 files and add their hashes to a string[] which I want to add to a list of all files with their respective hashes. Afterwards I have to compare the items on that list to a database (which comes in text files).
Once all that is done, I have to update a textbox in the main form and the progressbar to 33%; mostly I just don't want the main form to freeze during processing.
The files I'm working with always sum up to 1.2GB (+/- a few MB), meaning I should be able to read them into byte[]s and process them from there (I have to calculate CRC32, MD5 and SHA1 of each of those files so that should be faster than reading all of them from a HDD 3 times).
Also I should note that some files may be 1MB while another one may be 1GB. I initially wanted to create 99 threads for 99 files but that seems not wise, I suppose it would be best to reuse threads of small files while bigger file threads are still running. But that sounds pretty complicated to me so I'm not sure if that's wise either.
So far I've tried workerThreads and backgroundWorkers but neither seem to work too well for me; at least the backgroundWorkers worked SOME of the time, but I can't even figure out why they won't the other times... either way the main form still froze.
Now I've read about the Task Parallel Library in .NET 4.0 but I thought I should better ask someone who knows what he's doing before wasting more time on this.
What I want to do looks something like this (without threading):
List<string[]> fileSpecifics = new List<string[]>();
int fileMaxNumber = 42; // something between 3 and 99, depending on file set
for (int i = 1; i <= fileMaxNumber; i++)
{
string fileName = "C:\\path\\to\\file" + i.ToString("D2") + ".ext"; // file01.ext - file99.ext
string fileSize = new FileInfo(fileName).Length.ToString();
byte[] file = File.ReadAllBytes(fileName);
// hash calculations (using SHA1CryptoServiceProvider() etc., no problems with that so I'll spare you that, return strings)
file = null; // I didn't yet check if this made any actual difference but I figured it couldn't hurt
fileSpecifics.Add(new string[] { fileName, fileSize, fileCRC, fileMD5, fileSHA1 });
}
// look for files in text database mentioned above, i.e. first check for "file bundles" with the same amount of files I have here; then compare file sizes, then hashes
// again, no problems with that so I'll spare you that; the database text files are pretty small so parsing them doesn't need to be done in an extra thread.
Would anybody be kind enough to point me in the right direction? I'm looking for the easiest way to read and hash those files quickly (I believe the hashing takes some time in which other files could already be read) and save the output to a string[], without the main form freezing, nothing more, nothing less.
I'm thankful for any input.
EDIT to clarify: by "backgroundWorkers working some of the time" I meant that (for the very same set of files), maybe the first and fourth execution of my code produces the correct output and the UI unfreezes within 5 seconds, for the second, third and fifth execution it freezes the form (and after 60 seconds I get an error message saying some thread didn't respond within that time frame) and I have to stop execution via VS.
Thanks for all your suggestions and pointers, as you all have correctly guessed I'm completely new to threading and will have to read up on the great links you guys posted.
Then I'll give those methods a try and flag the answer that helped me the most. Thanks again!
With .NET Framework 4.X
Use Directory.EnumerateFiles Method for efficient/lazy files enumeration
Use Parallel.For() to delegate parallelism work to PLINQ framework or use TPL to delegate single Task per pipeline Stage
Use Pipelines pattern to pipeline following stages: calculating hashcodes, compare with pattern, update UI
To avoid UI freeze use appropriate techniques: for WPF use Dispatcher.BeginInvoke(), for WinForms use Invoke(), see this SO answer
Considering that all this stuff has UI it might be useful adding some cancellation feature to abandon long running operation if needed, take a look at the CreateLinkedTokenSource class which allows triggering CancellationToken from the "external scope"
I can try adding an example but it's worth do it yourself so you would learn all this stuff rather than simply copy/paste - > got it working -> forgot about it.
PS: Must read - Pipelines paper at MSDN
TPL specific pipeline implementation
Pipeline pattern implementation: three stages: calculate hash, match, update UI
Three tasks, one per stage
Two Blocking Queues
//
// 1) CalculateHashesImpl() should store all calculated hashes here
// 2) CompareMatchesImpl() should read input hashes from this queue
// Tuple.Item1 - hash, Typle.Item2 - file path
var calculatedHashes = new BlockingCollection<Tuple<string, string>>();
// 1) CompareMatchesImpl() should store all pattern matching results here
// 2) SyncUiImpl() method should read from this collection and update
// UI with available results
var comparedMatches = new BlockingCollection<string>();
var factory = new TaskFactory(TaskCreationOptions.LongRunning,
TaskContinuationOptions.None);
var calculateHashesWorker = factory.StartNew(() => CalculateHashesImpl(...));
var comparedMatchesWorker = factory.StartNew(() => CompareMatchesImpl(...));
var syncUiWorker= factory.StartNew(() => SyncUiImpl(...));
Task.WaitAll(calculateHashesWorker, comparedMatchesWorker, syncUiWorker);
CalculateHashesImpl():
private void CalculateHashesImpl(string directoryPath)
{
foreach (var file in Directory.EnumerateFiles(directoryPath))
{
var hash = CalculateHashTODO(file);
calculatedHashes.Add(new Tuple<string, string>(hash, file.Path));
}
}
CompareMatchesImpl():
private void CompareMatchesImpl()
{
foreach (var hashEntry in calculatedHashes.GetConsumingEnumerable())
{
// TODO: obviously return type is up to you
string matchResult = GetMathResultTODO(hashEntry.Item1, hashEntry.Item2);
comparedMatches.Add(matchResult);
}
}
SyncUiImpl():
private void UpdateUiImpl()
{
foreach (var matchResult in comparedMatches.GetConsumingEnumerable())
{
// TODO: track progress in UI using UI framework specific features
// to do not freeze it
}
}
TODO: Consider using CancellationToken as a parameter for all GetConsumingEnumerable() calls so you easily can stop a pipeline execution when needed.
First off, you should be using a higher level of abstraction to solve this problem. You have a bunch of tasks to complete, so use the "task" abstraction. You should be using the Task Parallel Library to do this sort of thing. Let the TPL deal with the question of how many worker threads to create -- the answer could be as low as one if the work is gated on I/O.
If you do want to do your own threading, some good advice:
Do not ever block on the UI thread. That's is what is freezing your application. Come up with a protocol by which working threads can communicate with your UI thread, which then does nothing except for responding to UI events. Remember that methods of user interface controls like task completion bars must never be called by any other thread other than the UI thread.
Do not create 99 threads to read 99 files. That's like getting 99 pieces of mail and hiring 99 assistants to write responses: an extraordinarily expensive solution to a simple problem. If your work is CPU intensive then there is no point in "hiring" more threads than you have CPUs to service them. (That's like hiring 99 assistants in an office that only has four desks. The assistants spend most of their time waiting for a desk to sit at instead of reading your mail.) If your work is disk-intensive then most of those threads are going to be idle most of the time waiting for the disk, which is an even bigger waste of resources.
First, I hope you are using a built-in library for calculating hashes. It's possible to write your own, but it's far safer to use something that has been around for a while.
You may need only create as many threads as CPUs if your process is CPU intensive. If it is bound by I/O, you might be able to get away with more threads.
I do not recommend loading the entire file into memory. Your hashing library should support updating a chunk at a time. Read a chunk into memory, use it to update the hashes of each algorighm, read the next chunk, and repeat until end of file. The chunked approach will help lower your program's memory demands.
As others have suggested, look into the Task Parallel Library, particularly Data Parallelism. It might be as easy as this:
Parallel.ForEach(fileSpecifics, item => CalculateHashes(item));
Check out TPL Dataflow. You can use a throttled ActionBlock which will manage the hard part for you.
If my understanding that you are looking to perform some tasks in the background and not block your UI, then the UI BackgroundWorker would be an appropriate choice. You mentioned that you got it working some of the time, so my recommendation would be to take what you had in a semi-working state, and improve upon it by tracking down the failures. If my hunch is correct, your worker was throwing an exception, which it does not appear you are handling in your code. Unhandled exceptions that bubble out of their containing threads make bad things happen.
This code hashing one file (stream) using two tasks - one for reading, second for hashing, for more robust way you should read more chunks forward.
Because bandwidth of processor is much higher than of disk, unless you use some high speed Flash drive you gain nothing from hashing more files concurrently.
public void TransformStream(Stream a_stream, long a_length = -1)
{
Debug.Assert((a_length == -1 || a_length > 0));
if (a_stream.CanSeek)
{
if (a_length > -1)
{
if (a_stream.Position + a_length > a_stream.Length)
throw new IndexOutOfRangeException();
}
if (a_stream.Position >= a_stream.Length)
return;
}
System.Collections.Concurrent.ConcurrentQueue<byte[]> queue =
new System.Collections.Concurrent.ConcurrentQueue<byte[]>();
System.Threading.AutoResetEvent data_ready = new System.Threading.AutoResetEvent(false);
System.Threading.AutoResetEvent prepare_data = new System.Threading.AutoResetEvent(false);
Task reader = Task.Factory.StartNew(() =>
{
long total = 0;
for (; ; )
{
byte[] data = new byte[BUFFER_SIZE];
int readed = a_stream.Read(data, 0, data.Length);
if ((a_length == -1) && (readed != BUFFER_SIZE))
data = data.SubArray(0, readed);
else if ((a_length != -1) && (total + readed >= a_length))
data = data.SubArray(0, (int)(a_length - total));
total += data.Length;
queue.Enqueue(data);
data_ready.Set();
if (a_length == -1)
{
if (readed != BUFFER_SIZE)
break;
}
else if (a_length == total)
break;
else if (readed != BUFFER_SIZE)
throw new EndOfStreamException();
prepare_data.WaitOne();
}
});
Task hasher = Task.Factory.StartNew((obj) =>
{
IHash h = (IHash)obj;
long total = 0;
for (; ; )
{
data_ready.WaitOne();
byte[] data;
queue.TryDequeue(out data);
prepare_data.Set();
total += data.Length;
if ((a_length == -1) || (total < a_length))
{
h.TransformBytes(data, 0, data.Length);
}
else
{
int readed = data.Length;
readed = readed - (int)(total - a_length);
h.TransformBytes(data, 0, data.Length);
}
if (a_length == -1)
{
if (data.Length != BUFFER_SIZE)
break;
}
else if (a_length == total)
break;
else if (data.Length != BUFFER_SIZE)
throw new EndOfStreamException();
}
}, this);
reader.Wait();
hasher.Wait();
}
Rest of code here: http://hashlib.codeplex.com/SourceControl/changeset/view/71730#514336
I'm creating a page that get uploaded text files and builds them into multiple PDFs. They are just exports from Excel. Each row in the file corresponds to a new PDF that needs to be created.
Anyway, once the files are uploaded I want to begin processing them, but I don't want the user to have to stay on the page, or even still have their session open. For example they could close the browser and come back 10 minutes later, log in, and the progress information will say like 112/200 files processed or something. It will be a lot quicker than that though.
So two questions really, how can I pass this processing job to something (Handler?Thread?) that will continue to run when the page is closed, and will return as soon as the job has started (so the browser isn't stopped)? Secondly, where can I store this information so that when the user comes back to the page, they can see the current progress.
I realise that I can't use sessions, and since it will be processing about a file a second I don't really want to update a DB every second. Is there some way I can do this? Is it possible?
I solved this by using the link provided by astander above. I simply create an object in the HttpContext.Application to store progress variables, and then Set the method which does my processing inside a new Thread.
// Create the new progress object
BatchProgress bs = new BatchProgress(0);
if(Application["BatchProgress"] != null)
{
// Should never happen
Application["BatchProgress"] = bs;
}
else
{
Application.Add("BatchProgress","bs");
}
//Set up new thread, run batch is the method that does all the processing.
ThreadStart ts = new ThreadStart(RunBatch);
Thread t = new Thread(ts);
t.Start();
It then returns after the thread starts and I can use jQuery to get the Application["BatchProgress"] object at regular intervals. At the end of my thread the BatchProgress object has its status set to "Complete", then when jQuery queries it, it sees the complete status and removes the progress object from the application.