I have the following scenario, driven by a timer every x minutes:
download an item to work on from a REST service (written in PHP)
run a batch process to process the item
The application is fully functional, but I want to speed up the entire process by downloading the next item (if one is present in the REST service) while the application is processing the current one.
I think I need a buffer/queue to accomplish this, like BlockingCollection, but I have no idea how to use it.
What's the right way to accomplish what I'm trying to do?
Thank you in advance!
What you can do is create a function that checks for new files to download. Start this function as its own background thread running an infinite loop, checking for new downloads on each iteration. If it finds any files that need downloading, call a separate function on a new thread to download each file. That download function can then call the processing function on yet another thread once the file finishes downloading. With this approach, all tasks run in parallel across multiple files when needed.
A function can be started on a new thread like this:
Thread thread = new Thread(FunctionName);
thread.Start();
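A rough sketch of that pattern might look like the following (CheckForDownloads, DownloadFile, and ProcessFile are hypothetical placeholders for your own logic):
var poller = new Thread(() =>
{
    while (true)
    {
        foreach (var item in CheckForDownloads()) // hypothetical: returns items waiting to be fetched
        {
            var current = item; // copy so the closure captures this iteration's item
            new Thread(() =>
            {
                var file = DownloadFile(current);            // hypothetical download
                new Thread(() => ProcessFile(file)).Start(); // process once the download finishes
            }).Start();
        }
        Thread.Sleep(TimeSpan.FromMinutes(1)); // poll interval
    }
});
poller.IsBackground = true;
poller.Start();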
Use Microsoft's Reactive Framework (NuGet "System.Reactive"). Then you can do this:
var x_minutes = 5;
var query =
    from t in Observable.Interval(TimeSpan.FromMinutes(x_minutes))
    from i in Observable.Start(() => DownloadAnItem())
    from e in Observable.Start(() => ElaborateItem(i))
    select new { i, e };
var subscription =
    query.Subscribe(x =>
    {
        // Do something with each `x.i` & `x.e`
    });
Multi-threaded and simple.
If you want to stop processing then just call subscription.Dispose().
I want to write a program which will have 2 threads. One will download, the other will parse the downloaded file. The tricky part is that I cannot have 2 parsing threads at the same time, because the parsing library is single-threaded. Please help with a suggestion. Thank you.
foreach (string filename in filenames)
{
    // start downloading thread here
    readytoparse.Add(filename);
}
foreach (string filename in readytoparse)
{
    // start parsing here
}
I ended up with the following logic
bool parserrunning = false;
List<string> readytoparse = new List<string>();
List<string> filenames = new List<string>();

// downloading method
foreach (string filename in filenames)
{
    // start downloading thread here
    readytoparse.Add(filename);
    if (parserrunning == false)
    {
        // start parser method
    }
}

// parsing method
parserrunning = true;
List<string> _readytoparse = new List<string>(readytoparse);
foreach (string filename in _readytoparse)
{
    // start parsing here
}
parserrunning = false;
Yousuf, your question is pretty vague. You could take an approach where your main thread downloads the files and, each time a file finishes downloading, spawns a worker thread to parse that file. There is the Task API or QueueUserWorkItem for this sort of thing. It's possible you could end up with an awful lot of worker threads running concurrently this way, which isn't necessarily the key to getting the work done faster and could negatively impact other concurrent work on the computer.
If you want to limit this to two threads, you might consider having the download thread write the file name into a queue each time a download finishes. Then your parser thread monitors that queue (wake up every x seconds, check the queue to see if there's anything to do, do the work, check the queue again, if there's nothing to do, go back to sleep for x seconds, repeat).
If you want the parser to be resilient, make that queue persistent (a database, MSMQ, a running text file on disk--something persistent). That way, if there is an interruption (computer crashes, program crashes, power loss), the parser can start right back up where it left off.
Code synchronization comes into play in the sense that you obviously cannot have the parser trying to parse a file that the downloader is still downloading, and if you have two threads using a queue, then you obviously have to protect that queue from concurrent access.
Whether you use Monitors or Mutexes, or QueueUserWorkItem or the Task API is sort of academic. There is plenty of support in the .NET framework for synchronizing and parallelizing units of work.
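As a concrete sketch of that queue pattern, .NET's BlockingCollection<T> provides the queue plus the synchronization for free (Download and Parse here are hypothetical placeholders for your own code):
var queue = new BlockingCollection<string>();

// Producer: downloads files and queues their names.
var downloader = Task.Run(() =>
{
    foreach (var filename in filenames)
    {
        Download(filename);  // placeholder for your download code
        queue.Add(filename);
    }
    queue.CompleteAdding();  // tell the parser no more work is coming
});

// Consumer: a single thread, so the single-threaded parsing library stays safe.
var parser = Task.Run(() =>
{
    foreach (var filename in queue.GetConsumingEnumerable()) // blocks until items arrive
        Parse(filename);     // placeholder for your parsing code
});

Task.WaitAll(downloader, parser);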
I suggest avoiding all of the heartache of doing this yourself with primitives and instead using a library designed for this kind of thing.
I recommend Microsoft's Reactive Framework (Rx).
Here's the code:
var query =
    from filename in filenames.ToObservable(Scheduler.Default)
    from file in Observable.Start(() => /* read file */, Scheduler.Default)
    from parsed in Observable.Start(() => /* parse file */, Scheduler.Default)
    select new
    {
        filename,
        parsed,
    };

query.Subscribe(fp =>
{
    /* Do something with finished file */
});
Very simple.
If your parsing library is single-threaded only, then add this line:
var els = new EventLoopScheduler();
And then replace Scheduler.Default with els on the parsing line.
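The parsing line in the query above would then read:
from parsed in Observable.Start(() => /* parse file */, els)
EventLoopScheduler runs all of its work on one dedicated thread, which is what keeps the single-threaded parser safe.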
I need to call a stored proc that does quite a bit of work.
I want to be able to "fire and forget", i.e. not wait for the stored proc's response before moving on to the next record, since waiting slows things down and I need to move quickly.
What is the best method of calling a stored proc in C# without waiting for the result, just returning a success status and moving on to the next record?
I'd like to be able to just quickly loop through my object list of selections and call some method that does the DB call without waiting for the response before moving to the next item in the loop.
It's using C# 4.5.
I was thinking about using something like
Parallel.ForEach(sqlRecordset.AsEnumerable(), recordsetRow =>
{
    // get data for selection
    // call DB to save without waiting for response / success/failure status
});
But I don't know if this is the best approach or not.
Ideas?
Thanks in advance for any help.
Parallel.ForEach is parallel but synchronous - it will wait for all its iterations to finish.
You may use TPL (Task Parallel Library) to achieve what you want:
foreach (var recordsetRow_doNotUse in sqlRecordset.AsEnumerable())
{
    var recordsetRow = recordsetRow_doNotUse; // copy so each task captures its own row
    Task.Run(() =>
    {
        Console.WriteLine(recordsetRow);
        /* or do whatever you want with it here */
    });
}
Task.Run returns a task so if you need to do something when all of them are done you can put all of them in an array and then use Task.WhenAll to obtain a Task which will be complete when all iterations are complete, without blocking any threads.
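For example, a minimal sketch (assuming .NET 4.5 and an async calling method, so await is available; SaveRecord is a placeholder for your DB call):
var tasks = sqlRecordset.AsEnumerable()
    .Select(recordsetRow => Task.Run(() => SaveRecord(recordsetRow))) // placeholder DB call
    .ToArray();
await Task.WhenAll(tasks); // completes when every save has finished, without blocking a thread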
P.S. I don't know what you mean by C# 4.5; you probably mean .NET 4.5, which is C# 5. My sample code above won't work with earlier versions of C#.
An easier approach is to use ADO.NET's built-in asynchronous execution:
var cmd = new SqlCommand("StoredProcName", sqlConnection);
cmd.CommandType = CommandType.StoredProcedure;
cmd.BeginExecuteNonQuery();
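One caveat: the Begin/End (APM) pattern expects each Begin call to be paired with an End call to observe errors, so a slightly fuller sketch would pass a callback (and on .NET versions before 4.5 the connection string also needs Asynchronous Processing=true):
var cmd = new SqlCommand("StoredProcName", sqlConnection);
cmd.CommandType = CommandType.StoredProcedure;
cmd.BeginExecuteNonQuery(ar =>
{
    var command = (SqlCommand)ar.AsyncState;
    command.EndExecuteNonQuery(ar); // surfaces any exception from the stored proc
}, cmd);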
Since you would like to iterate through the records, get some values, and store them in the database without waiting for any result set, you could do something like the code below. This assumes you are using a Windows application and know threading:
foreach (DataRow row in dataTable.Rows) // you can use any enumerable object here
{
    string s = row["name"].ToString();
    Thread t = new Thread(new ParameterizedThreadStart(SaveName));
    t.IsBackground = true;
    t.Start(s);
}

public void SaveName(object s)
{
    // your code to save the value in 's'
}
This way, you will not have to wait for the database to respond, and every value will be saved. Each thread is destroyed as soon as it finishes saving and processing its record.
Hope this helps. Happy coding!
I am using Amazon SQS for image file uploads. I have a function that checks for new SQS messages, reads them in a loop, and runs a function for each one.
Code:
if (receiveMessageResponse.ReceiveMessageResult.Message.Count != 0)
{
    for (int i = 0; i < receiveMessageResponse.ReceiveMessageResult.Message.Count; i++)
    {
        string messageBody = receiveMessageResponse.ReceiveMessageResult.Message[i].Body; // read as JSON text
        dynamic dynResult = JObject.Parse(messageBody);
        ImageServiceReference.statePackage sp = new ImageServiceReference.statePackage();
        // ... some sp object initialization
        Task.Factory.StartNew(() =>
        {
            SaveImageProcedure(sp);
        });
    }
}
In the working code, I call SaveImageProcedure without a Task (synchronously). But now I want to make SaveImageProcedure run asynchronously.
In that function SaveImageProcedure I have a task:
var task = Task<int>.Factory.FromAsync(proxy.BeginSaveImage(sp, new AsyncCallback(CompleteSave), state), proxy.EndSaveImage);
This task calls a WCF service that does long-running image processing asynchronously.
My problem:
When I use Task.Factory.StartNew to call SaveImageProcedure asynchronously, the WCF service doesn't process the images, whereas when I run it without Task.Factory.StartNew (just calling the function directly), it runs fine and I see the image processed.
I can't figure out why changing SaveImageProcedure to run async causes the WCF service not to work as it does when running synchronously.
SaveImageProcedure is already asynchronous, since Task<int>.Factory.FromAsync is asynchronous. Why do you need the Task.Factory.StartNew around it? That uses an extra thread, and once you use up all the threads in your thread pool the program will become very slow, because the thread pool grows its size using a very slow hill-climbing algorithm. I'd be curious to look at the number of threads in your program while that routine is running.
I'm building this program in Visual Studio 2010 using C# and .NET 4.0.
The goal is to use thread and queue to improve performance.
I have a list of urls I need to process.
string[] urls = { url1, url2, url3, etc.} //up to 50 urls
I have a function that will take in each url and process them.
public void processUrl(string url)
{
    // some operation
}
Originally, I created a for-loop to go through each urls.
for (int i = 0; i < urls.Length; i++)
    processUrl(urls[i]);
The method works, but the program is slow because it goes through the urls one after another.
So the idea is to use threading to reduce the time, but I'm not sure how to approach that.
Say I want to create 5 threads to process at the same time.
When I start the program, it will start processing the first 5 urls. When one is done, the program start process the 6th url; when another one is done, the program starts processing the 7th url, and so on.
The problem is, I don't know how to actually create a 'queue' of urls and work through it, processing as I go.
Can anyone help me with this?
-- EDIT at 1:42PM --
I ran into another issue when I was running 5 processes at the same time.
The processUrl function involves writing to a log file, and if multiple threads time out at the same time, they write to the same log file simultaneously, which I think is throwing an error.
I'm assuming that's the issue because the error message I got was "The process cannot access the file 'data.log' because it is being used by another process."
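One way to fix that (a sketch; it assumes all the workers run in the same process) is to serialize log writes behind a single lock:
private static readonly object logLock = new object();

private static void WriteLog(string message)
{
    lock (logLock) // only one thread may touch data.log at a time
    {
        File.AppendAllText("data.log", message + Environment.NewLine);
    }
}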
The simplest option would be to just use Parallel.ForEach. Provided processUrl is thread safe, you could write:
Parallel.ForEach(urls, processUrl);
I wouldn't suggest restricting to 5 threads (the scheduler will automatically scale normally), but this can be done via:
Parallel.ForEach(urls, new ParallelOptions { MaxDegreeOfParallelism = 5}, processUrl);
That being said, URL processing is, by its nature, typically IO bound, and not CPU bound. If you could use Visual Studio 2012, a better option would be to rework this to use the new async support in the language. This would require changing your method to something more like:
public async Task ProcessUrlAsync(string url)
{
    // Use await with async methods in the implementation...
}
You could then use the new async support in the loop:
// Create an enumerable to Tasks - this will start all async operations..
var tasks = urls.Select(url => ProcessUrlAsync(url));
await Task.WhenAll(tasks); // "Await" until they all complete
Use a Parallel.ForEach with MaxDegreeOfParallelism set to the number of threads you want (or leave it unset and let .NET do the work for you):
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 5;
Parallel.ForEach(urls, parallelOptions, url =>
{
processUrl(url);
});
If you really want to create threads to accomplish your task instead of using parallel execution:
Suppose that I want one thread for each URL:
string[] urls = {"url1", "url2", "url3"};
I just start a new Thread instance for each URL (or each batch of 5 URLs):
foreach (var thread in urls.Select(url => new Thread(() => DownloadUrl(url))))
thread.Start();
And the method to download your URL:
private static void DownloadUrl(string url)
{
    Console.WriteLine(url);
}
I have to (well, want to) execute a search request against multiple sources.
Now, I've done some multithreading in the past, but it was all fire and forget.
What I want to do now is spin up 3 identical requests on 3 different objects, wait until they are all 'done' (and that raises the first question: how do they say 'I'm done'?), and then collect all the data they've sent me.
So in pseudocode I have this interface:
interface ISearch
    SearchResult SearchForContent(SearchCriteria criteria)
So in code i create the three search services:
ISearch s1 = new SearchLocal();
ISearch s2 = new SearchThere();
ISearch s3 = new SearchHere();
And then I call SearchForContent(SearchCriteria criteria) on all three of them in a multithreaded/async way,
and they all come back to me with their SearchResult, and after they are ALL done, I process their SearchResult objects.
I hope these lines of text kind of convey what is in my head :)
I'm working on an ASP.NET 3.5 C# project.
Create an AutoResetEvent for each search and pass them all to WaitHandle.WaitAll().
Basically:
1) Create an AutoResetEvent for each search, passing false to its constructor.
2) Create the threads and run a search in each one; at the end, call Set on the AutoResetEvent inside a finally block. It is very important that Set is called inside the finally block, otherwise WaitAll() will wait indefinitely.
3) In the code right after you have spawned the threads, call WaitHandle.WaitAll() and pass it all those AutoResetEvents. This call will wait until everything is finished, as in the sketch below.
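Putting those three steps together (a sketch using the ISearch types from the question; criteria is assumed to be in scope):
var searches = new ISearch[] { new SearchLocal(), new SearchThere(), new SearchHere() };
var results = new SearchResult[searches.Length];
var handles = new AutoResetEvent[searches.Length];
for (int i = 0; i < searches.Length; i++)
{
    handles[i] = new AutoResetEvent(false); // step 1
    int index = i; // copy the index so each closure captures its own value
    new Thread(() =>
    {
        try { results[index] = searches[index].SearchForContent(criteria); }
        finally { handles[index].Set(); } // step 2: always signal, even on failure
    }).Start();
}
WaitHandle.WaitAll(handles); // step 3: blocks until all three searches have signalled
// results now holds the three SearchResult objects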
Using tasks you could do a continuation like this:
Task[] t = new Task[2];
t[0] = Task.Factory.StartNew(() => { Thread.Sleep(1000); });
t[1] = Task.Factory.StartNew(() => { Thread.Sleep(2000); });
Task.Factory.ContinueWhenAll(t, myTasks => { Console.WriteLine("All done!"); });
Make an IEnumerable<ISearch>, add those items to it, and do .AsParallel().ForAll(...) on it.
Edit
ForAll won't return results; if you can change ISearch, give it a property for the results, and then once the ForAll is done you can read the results back through the IEnumerable.
And yes, sorry, this is 4.0.
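Alternatively, a sketch that collects the results directly with PLINQ's Select (also .NET 4.0), avoiding the need for a results property at all:
var searches = new List<ISearch> { new SearchLocal(), new SearchThere(), new SearchHere() };
// Runs the searches in parallel and blocks until every one has completed.
var results = searches
    .AsParallel()
    .Select(s => s.SearchForContent(criteria))
    .ToList();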