Let's say I have a list that contains list of files I want to download and a function that gets a file URL name and downloads it. I want to download at max 4 parallel downloads. I know I can use a perfect solution for this:
Parallel.ForEach
(
Result,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
file => DownloadSingleFile(file)
);
But what do you suggest if we don't want to use this method? What is best in your idea?
Thank you.
How about a good old-fashioned Thread like this:
for(int i = 0; i < numThreads; i++)
{
Thread t = new Thread(()=>
{
try
{
while(working)
{
file = DownloadSingleFile(blockingFileQueue.Dequeue());
}
}
catch(InterruptException)
{
// eat the exception and exit the thread
}
});
t.IsBackground = true;
t.Start();
}
We may start 4 threads or use ThreadPool to add 4 work items.
Related
By using the below code firstly some of the calls are not getting made lets say out of 250 , 238 calls are made and rest doesn't.Secondly I am not sure if the calls are made at the rate of 20 calls per 10 seconds.
public List<ShowData> GetAllShowAndTheirCast()
{
ShowResponse allShows = GetAllShows();
ShowCasts showCast = new ShowCasts();
showCast.showCastList = new List<ShowData>();
using (Semaphore pool = new Semaphore(20, 20))
{
for (int i = 0; i < allShows.Shows.Length; i++)
{
pool.WaitOne();
Thread t = new Thread(new ParameterizedThreadStart((taskId) =>
{
showCast.showCastList.Add(MapResponse(allShows.Shows[i]));
}));
pool.Release();
t.Start(i);
}
}
//for (int i = 0; i < allShows.Shows.Length; i++)
//{
// showCast.showCastList.Add(MapResponse(allShows.Shows[i]));
//}
return showCast.showCastList;
}
public ShowData MapResponse(Show s)
{
CastResponse castres = new CastResponse();
castres.CastlistResponse = (GetShowCast(s.id)).CastlistResponse;
ShowData sd = new ShowData();
sd.id = s.id;
sd.name = s.name;
if (castres.CastlistResponse != null && castres.CastlistResponse.Any())
{
sd.cast = new List<CastData>();
foreach (var item in castres.CastlistResponse)
{
CastData cd = new CastData();
cd.birthday = item.person.birthday;
cd.id = item.person.id;
cd.name = item.person.name;
sd.cast.Add(cd);
}
}
return sd;
}
public ShowResponse GetAllShows()
{
ShowResponse response = new ShowResponse();
string showUrl = ClientAPIUtils.apiUrl + "shows";
response.Shows = JsonConvert.DeserializeObject<Show[]>(ClientAPIUtils.GetDataFromUrl(showUrl));
return response;
}
public CastResponse GetShowCast(int showid)
{
CastResponse res = new CastResponse();
string castUrl = ClientAPIUtils.apiUrl + "shows/" + showid + "/cast";
res.CastlistResponse = JsonConvert.DeserializeObject<List<Cast>>(ClientAPIUtils.GetDataFromUrl(castUrl));
return res;
}
All the Calls should be made , but I am not sure where they are getting aborted and even please let me know how to check the rate of calls being made.
I'm assuming that your goal is to process all data about shows but no more than 20 at once.
For that kind of task you should probably use ThreadPool and limit maximum number of concurrent threads using SetMaxThreads.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool?view=netframework-4.7.2
You have to make sure that collection that you are using to store your results is thread-safe.
showCast.showCastList = new List<ShowData>();
I don't think that standard List is thread-safe. Thread-safe collection is ConcurrentBag (there are others as well). You can make standard list thread-safe but it requires more code. After you are done processing and need to have results in list or array you can convert collection to desired type.
https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentbag-1?view=netframework-4.7.2
Now to usage of semaphore. What your semaphore is doing is ensuring that maximum 20 threads can be created at once. Assuming that this loop runs in your app main thread your semaphore has no purpose. To make it work you need to release semaphore once thread is completed; but you are calling thread Start() after calling Release(). That results in thread being executed outside "critical area".
using (Semaphore pool = new Semaphore(20, 20)) {
for (int i = 0; i < allShows.Shows.Length; i++) {
pool.WaitOne();
Thread t = new Thread(new ParameterizedThreadStart((taskId) =>
{
showCast.showCastList.Add(MapResponse(allShows.Shows[i]));
pool.Release();
}));
t.Start(i);
}
}
I did not test this solution; additional problems might arise.
Another issue with this program is that it does not wait for all threads to complete. Once all threads are started; program will end. It is possible (and in your case I'm sure) that not all threads completed its operation; this is why ~240 data packets are done when program finishes.
thread.Join();
But if called right after Start() it will stop main thread until it is completed so to keep program concurrent you need to create a list of threads and Join() them at the end of program. It is not the best solution. How to wait on all threads that program can add to ThreadPool
Wait until all threads finished their work in ThreadPool
As final note you cannot access loop counter like that. Final value of loop counter is evaluated later and with test I ran; code has tendency to process odd records twice and skip even. This is happening because loop increases counter before previous thread is executed and causes to access elements outside bounds of array.
Possible solution to that is to create method that will create thread. Having it in separate method will evaluate allShows.Shows[i] to show before next loop pass.
public void CreateAndStartThread(Show show, Semaphore pool, ShowCasts showCast)
{
pool.WaitOne();
Thread t = new Thread(new ParameterizedThreadStart((s) => {
showCast.showCastList.Add(MapResponse((Show)s));
pool.Release();
}));
t.Start(show);
}
Concurrent programming is tricky and I would highly recommend to do some exercises with examples on common pitfalls. Books on C# programming are sure to have a chapter or two on the topic. There are plenty of online courses and tutorials on this topic to learn from.
Edit:
Working solution. Still might have some issues.
public ShowCasts GetAllShowAndTheirCast()
{
ShowResponse allShows = GetAllShows();
ConcurrentBag<ShowData> result = new ConcurrentBag<ShowData>();
using (var countdownEvent = new CountdownEvent(allShows.Shows.Length))
{
using (Semaphore pool = new Semaphore(20, 20))
{
for (int i = 0; i < allShows.Shows.Length; i++)
{
CreateAndStartThread(allShows.Shows[i], pool, result, countdownEvent);
}
countdownEvent.Wait();
}
}
return new ShowCasts() { showCastList = result.ToList() };
}
public void CreateAndStartThread(Show show, Semaphore pool, ConcurrentBag<ShowData> result, CountdownEvent countdownEvent)
{
pool.WaitOne();
Thread t = new Thread(new ParameterizedThreadStart((s) =>
{
result.Add(MapResponse((Show)s));
pool.Release();
countdownEvent.Signal();
}));
t.Start(show);
}
I have an array of filepath in List<string> with thousands of files. I want to process them in a function parallel with 8 threads.
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(fileList, opt, item => DoSomething(item));
This code works fine for me but it guarantees to run max 8 threads and I want to run 8 threads always. CLR decides the number of threads to be use as per CPU load.
Please suggest a way in threading that always 8 threads are used in computing with minimum overhead.
Use a producer / consumer model. Create one producer and 8 consumers. For example:
BlockingCollection<string> _filesToProcess = new BlockingCollection<string>();
// start 8 tasks to do the processing
List<Task> _consumers = new List<Task>();
for (int i = 0; i < 8; ++i)
{
var t = Task.Factory.StartNew(ProcessFiles, TaskCreationOptions.LongRunning);
_consumers.Add(t);
}
// Populate the queue
foreach (var filename in filelist)
{
_filesToProcess.Add(filename);
}
// Mark the collection as complete for adding
_filesToProcess.CompleteAdding();
// wait for consumers to finish
Task.WaitAll(_consumers.ToArray(), Timeout.Infinite);
Your processing method removes things from the BlockingCollection and processes them:
void ProcessFiles()
{
foreach (var filename in _filesToProcess.GetConsumingEnumerable())
{
// do something with the file name
}
}
That will keep 8 threads running until the collection is empty. Assuming, of course, you have 8 cores on which to run the threads. If you have fewer available cores, then there will be a lot of context switching, which will cost you.
See BlockingCollection for more information.
Within a static counter, you might be able to get the number of current threads.
Every time you call start a task there is the possibility to use the Task.ContinueWith (http://msdn.microsoft.com/en-us/library/dd270696.aspx) to notify that it's over and you can start another one.
This way there is going to be always 8 tasks running.
OrderablePartitioner<Tuple<int, int>> chunkPart = Partitioner.Create(0, fileList.Count, 1);//Partition the list in chunk of 1 entry
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(chunkPart, opt, chunkRange =>
{
for (int i = chunkRange.Item1; i < chunkRange.Item2; i++)
{
DoSomething(fileList[i].FullName);
}
});
I'm using C# Parallel.ForEach to process more than thousand subsets of data. One set takes 5-30 minutes to process, depending on size of the set. In my computer with option
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount
I'll get 8 parallel processes. As I understood, processes are divided equally between parallel tasks (e.g. the first task gets jobs number 1,9,17 etc, the second gets 2,10,18 etc.); therefore, one task can finish own jobs sooner than others. Because those sets of data took less time than others.
The problem is that four parallel tasks finish their jobs within 24 hours, but the last one finish in 48 hours. It there some chance to organize parallelism so that all parallel tasks are finishing equally? It means all parallel tasks continue working until all jobs are done?
Since the jobs are not equal, you can't split the number of jobs between processors and have them finish at about the same time. I think what you need here is 8 worker threads that retrieve the next job in line. You will have to use a lock on the function to get the next job.
Somebody correct me if I'm wrong, but off the top of my head... a worker thread could be given a function like this:
public void ProcessJob()
{
for (Job myJob = GetNextJob(); myJob != null; myJob = GetNextJob())
{
// process job
}
}
And the function to get the next job would look like:
private List<Job> jobs;
private int currentJob = 0;
private Job GetNextJob()
{
lock (jobs)
{
Job job = null;
if (currentJob < jobs.Count)
{
job = jobs[currentJob];
currentJob++;
}
return job;
}
}
It seems that there is no ready-to-use solution and it has to be created.
My previous code was:
var ListOfSets = (from x in Database
group x by x.SetID into z
select new { ID = z.Key}).ToList();
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;
Parallel.ForEach(ListOfSets, po, SingleSet=>
{
AnalyzeSet(SingleSet.ID);
});
To share work equally between all CPU-s, I still use Parallel to do the work, but instead of ForEach I use For and an idea from Matt. The new code is:
Parallel.For(0, Environment.ProcessorCount, i=>
{
while(ListOfSets.Count() > 0)
{
double SetID = 0;
lock (ListOfSets)
{
SetID = ListOfSets[0].ID;
ListOfSets.RemoveAt(0);
}
AnalyzeSet(SetID);
}
});
So, thank you for your advice.
One option, as suggested by others, is to manage your own producer consumer queue. I'd like to note that using the BlockingCollection makes this very easy to do.
BlockingCollection<JobData> queue = new BlockingCollection<JobData>();
//add data to queue; if it can be done quickly, just do it inline.
//If it's expensive, start a new task/thread just to add items to the queue.
foreach (JobData job in data)
queue.Add(job);
queue.CompleteAdding();
for (int i = 0; i < Environment.ProcessorCount; i++)
{
Task.Factory.StartNew(() =>
{
foreach (var job in queue.GetConsumingEnumerable())
{
ProcessJob(job);
}
}, TaskCreationOptions.LongRunning);
}
Our company has a web service which I want to send XML files (stored on my drive) via my own HTTPWebRequest client in C#. This already works. The web service supports 5 synchronuous requests at the same time (I get a response from the web service once the processing on the server is completed). Processing takes about 5 minutes for each request.
Throwing too many requests (> 5) results in timeouts for my client. Also, this can lead to errors on the server side and incoherent data. Making changes on the server side is not an option (from different vendor).
Right now, my Webrequest client will send the XML and wait for the response using result.AsyncWaitHandle.WaitOne();
However, this way, only one request can be processed at a time although the web service supports 5. I tried using a Backgroundworker and Threadpool but they create too many requests at same, which make them useless to me. Any suggestion, how one could solve this problem? Create my own Threadpool with exactly 5 threads? Any suggestions, how to implement this?
The easy way is to create 5 threads ( aside: that's an odd number! ) that consume the xml files from a BlockingCollection.
Something like:
var bc = new BlockingCollection<string>();
for ( int i = 0 ; i < 5 ; i++ )
{
new Thread( () =>
{
foreach ( var xml in bc.GetConsumingEnumerable() )
{
// do work
}
}
).Start();
}
bc.Add( xml_1 );
bc.Add( xml_2 );
...
bc.CompleteAdding(); // threads will end when queue is exhausted
If you're on .Net 4, this looks like a perfect fit for Parallel.ForEach(). You can set its MaxDegreeOfParallelism, which means you are guaranteed that no more items are processed at one time.
Parallel.ForEach(items,
new ParallelOptions { MaxDegreeOfParallelism = 5 },
ProcessItem);
Here, ProcessItem is a method that processes one item by accessing your server and blocking until the processing is done. You could use a lambda instead, if you wanted.
Creating your own threadpool of five threads isn't tricky - Just create a concurrent queue of objects describing the request to make, and have five threads that loop through performing the task as needed. Add in an AutoResetEvent and you can make sure they don't spin furiously while there are no requests that need handling.
It can though be tricky to return the response to the correct calling thread. If this is the case for how the rest of your code works, I'd take a different approach and create a limiter that acts a bit like a monitor but allowing 5 simultaneous threads rather than only one:
private static class RequestLimiter
{
private static AutoResetEvent _are = new AutoResetEvent(false);
private static int _reqCnt = 0;
public ResponseObject DoRequest(RequestObject req)
{
for(;;)
{
if(Interlocked.Increment(ref _reqCnt) <= 5)
{
//code to create response object "resp".
Interlocked.Decrement(ref _reqCnt);
_are.Set();
return resp;
}
else
{
if(Interlocked.Decrement(ref _reqCnt) >= 5)//test so we don't end up waiting due to race on decrementing from finished thread.
_are.WaitOne();
}
}
}
}
You could write a little helper method, that would block the current thread until all the threads have finished executing the given action delegate.
static void SpawnThreads(int count, Action action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
new Thread(() =>
{
action();
countdown.Signal();
}).Start();
}
countdown.Wait();
}
And then use a BlockingCollection<string> (thread-safe collection), to keep track of your xml files. By using the helper method above, you could write something like:
static void Main(string[] args)
{
var xmlFiles = new BlockingCollection<string>();
// Add some xml files....
SpawnThreads(5, () =>
{
using (var web = new WebClient())
{
web.UploadFile(xmlFiles.Take());
}
});
Console.WriteLine("Done");
Console.ReadKey();
}
Update
An even better approach would be to upload the files async, so that you don't waste resources on using threads for an IO task.
Again you could write a helper method:
static void SpawnAsyncs(int count, Action<CountdownEvent> action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
action(countdown);
}
countdown.Wait();
}
And use it like:
static void Main(string[] args)
{
var urlXML = new BlockingCollection<Tuple<string, string>>();
urlXML.Add(Tuple.Create("http://someurl.com", "filename"));
// Add some more to collection...
SpawnAsyncs(5, c =>
{
using (var web = new WebClient())
{
var current = urlXML.Take();
web.UploadFileCompleted += (s, e) =>
{
// some code to mess with e.Result (response)
c.Signal();
};
web.UploadFileAsyncAsync(new Uri(current.Item1), current.Item2);
}
});
Console.WriteLine("Done");
Console.ReadKey();
}
I have an application that has many cases. Each case has many multipage tif files. I need to covert the tf files to pdf file. Since there are so many file, I thought I could thread the conversion process. I'm currently limiting the process to ten conversions at a time (i.e ten treads). When one conversion completes, another should start.
This is the current setup I'm using.
private void ConvertFiles()
{
List<AutoResetEvent> semaphores = new List<AutoResetEvet>();
foreach(String fileName in filesToConvert)
{
String file = fileName;
if(semaphores.Count >= 10)
{
WaitHandle.WaitAny(semaphores.ToArray());
}
AutoResetEvent semaphore = new AutoResetEvent(false);
semaphores.Add(semaphore);
ThreadPool.QueueUserWorkItem(
delegate
{
Convert(file);
semaphore.Set();
semaphores.Remove(semaphore);
}, null);
}
if(semaphores.Count > 0)
{
WaitHandle.WaitAll(semaphores.ToArray());
}
}
Using this, sometimes results in an exception stating the WaitHandle.WaitAll() or WaitHandle.WaitAny() array parameters must not exceed a length of 65. What am I doing wrong in this approach and how can I correct it?
There are a few problems with what you have written.
1st, it isn't thread safe. You have multiple threads adding, removing and waiting on the array of AutoResetEvents. The individual elements of the List can be accessed on separate threads, but anything that adds, removes, or checks all elements (like the WaitAny call), need to do so inside of a lock.
2nd, there is no guarantee that your code will only process 10 files at a time. The code between when the size of the List is checked, and the point where a new item is added is open for multiple threads to get through.
3rd, there is potential for the threads started in the QueueUserWorkItem to convert the same file. Without capturing the fileName inside the loop, the thread that converts the file will use whatever value is in fileName when it executes, NOT whatever was in fileName when you called QueueUserWorkItem.
This codeproject article should point you in the right direction for what you are trying to do: http://www.codeproject.com/KB/threads/SchedulingEngine.aspx
EDIT:
var semaphores = new List<AutoResetEvent>();
foreach (String fileName in filesToConvert)
{
String file = fileName;
AutoResetEvent[] array;
lock (semaphores)
{
array = semaphores.ToArray();
}
if (array.Count() >= 10)
{
WaitHandle.WaitAny(array);
}
var semaphore = new AutoResetEvent(false);
lock (semaphores)
{
semaphores.Add(semaphore);
}
ThreadPool.QueueUserWorkItem(
delegate
{
Convert(file);
lock (semaphores)
{
semaphores.Remove(semaphore);
}
semaphore.Set();
}, null);
}
Personally, I don't think I'd do it this way...but, working with the code you have, this should work.
Are you using a real semaphore (System.Threading)? When using semaphores, you typically allocate your max resources and it'll block for you automatically (as you add & release). You can go with the WaitAny approach, but I'm getting the feeling that you've chosen the more difficult route.
Looks like you need to remove the handle the triggered the WaitAny function to proceed
if(semaphores.Count >= 10)
{
int index = WaitHandle.WaitAny(semaphores.ToArray());
semaphores.RemoveAt(index);
}
So basically I would remove the:
semaphores.Remove(semaphore);
call from the thread and use the above to remove the signaled event and see if that works.
Maybe you shouldn't create so many events?
// input
var filesToConvert = new List<string>();
Action<string> Convert = Console.WriteLine;
// limit
const int MaxThreadsCount = 10;
var fileConverted = new AutoResetEvent(false);
long threadsCount = 0;
// start
foreach (var file in filesToConvert) {
if (threadsCount++ > MaxThreadsCount) // reached max threads count
fileConverted.WaitOne(); // wait for one of started threads
Interlocked.Increment(ref threadsCount);
ThreadPool.QueueUserWorkItem(
delegate {
Convert(file);
Interlocked.Decrement(ref threadsCount);
fileConverted.Set();
});
}
// wait
while (Interlocked.Read(ref threadsCount) > 0) // paranoia?
fileConverted.WaitOne();