.NET Core API seems to deadlock - c#

We have a .NET Core API running in production which can run stably for days or even weeks and then suddenly freezes. Such a freeze can even happen multiple times a day, completely at random. What happens: the code seems to be frozen and doesn't accept any new requests. No new requests are logged, the thread count rises sky-high and the memory rises steadily until it's maxed out.
I created a memory dump to analyze. It tells me that most threads are waiting for a lock to be released at a specific function, which looks like a deadlock. I analyzed this function and cannot see why it would cause issues. Can someone help me out? Obviously I suspect AsParallel() of being thread-unsafe, but the internet says no, it is thread-safe.
public async Task<bool> TryStorePropertiesAsync(string sessionId, Dictionary<string, string> keyValuePairs, int ttl = 1500)
{
    try
    {
        await Task.WhenAll(keyValuePairs.AsParallel().Select(async item =>
        {
            var doc = await _cosmosDbRepository.GetItemByKeyAsync(GetId(sessionId, item.Key), sessionId) ?? new Document();
            doc.SetPropertyValue("_partitionKey", sessionId);
            doc.SetPropertyValue("key", GetId(sessionId, item.Key));
            doc.SetPropertyValue("name", item.Key.ToLowerInvariant());
            doc.SetPropertyValue("value", item.Value);
            doc.TimeToLive = ttl;
            await _cosmosDbRepository.UpsertDocumentAsync(doc, "_partitionKey");
        }));
        return true;
    }
    catch (Exception ex)
    {
        ApplicationInsightsLogger.TrackException(ex, new Dictionary<string, string>
        {
            { "sessionID", sessionId },
            { "action", "TryStoreItems" }
        });
        return false;
    }
}

The code has serious issues. For, say, 100 items, it ends up firing off 100 concurrent operations, started a handful at a time (one per core). The code inside the loop reads a document from CosmosDB, sets all its properties, then calls a method named similarly to DocumentClient.UpsertDocumentAsync, which doesn't require pre-loading anything. Without knowing what _cosmosDbRepository is and what its methods do, one can only guess. It's possible it creates extra conflicts, though, by trying to lock things while the (probably useless) load/update cycle takes place.
For starters, AsParallel() is only meant for data parallelism: partition some data in memory and use as many workers as there are cores to crunch each partition. There's no data to crunch here though, just calls to async operations. That's why, for 100 items, this code will fire off 100 concurrent tasks.
That could hit any number of CosmosDB throttling limits, even if it doesn't cause concurrency conflicts. It could also lead to networking issues, as the same cable is used for all those concurrent connections.
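To make that concrete: because the lambda is async, it returns a Task almost immediately, so AsParallel() only parallelizes the starting of the tasks; all of them then run concurrently. The original is therefore equivalent to this plain LINQ version (UpsertItemAsync is a hypothetical stand-in for the load/update cycle inside the loop):
// Starts one task per item; all upserts are in flight at once.
var tasks = keyValuePairs.Select(item => UpsertItemAsync(sessionId, item, ttl));
await Task.WhenAll(tasks);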
Not taking CosmosDB into account, the correct way to make lots of calls to a remote service is to queue them and execute them with a limited number of workers. This is very easy to do with .NET's ActionBlock. The code could change to something like this :
class Payload
{
    public string SessionId { get; set; }
    public string Key { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
    public int TTL { get; set; }
}

//Allow only 10 concurrent upserts
var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10
};
var upsertBlock = new ActionBlock<Payload>(myPosterAsync, options);

foreach (var payload in payloads)
{
    upsertBlock.Post(payload);
}
//Tell the block we're done
upsertBlock.Complete();
//Await for all queued operations to complete
await upsertBlock.Completion;
Where myPosterAsync contains the posting code :
async Task myPosterAsync(Payload item)
{
    try
    {
        var doc = await _cosmosDbRepository.GetItemByKeyAsync(GetId(item.SessionId, item.Key),
                                                              item.SessionId)
                  ?? new Document();
        doc.SetPropertyValue("_partitionKey", item.SessionId);
        doc.SetPropertyValue("key", GetId(item.SessionId, item.Key));
        doc.SetPropertyValue("name", item.Name);
        doc.SetPropertyValue("value", item.Value);
        doc.TimeToLive = item.TTL;
        await _cosmosDbRepository.UpsertDocumentAsync(doc, "_partitionKey");
    }
    catch (Exception ex)
    {
        //Handle the error in some way, eg log it
        ApplicationInsightsLogger.TrackException(ex, new Dictionary<string, string>
        {
            { "sessionID", item.SessionId },
            { "action", "TryStoreItems" }
        });
    }
}

Related

Running parallel async tasks and return result in .NET Core Web API

Hi, recently I was working on a .NET Core Web API project which downloads files from an external API.
In this .NET Core API we recently found some issues when the number of files is large, say more than 100: the API downloads a maximum of 50 files and skips the others. The Web API is deployed on AWS Lambda and the timeout is 15 minutes.
Actually, the operation is timing out due to the long download process.
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        bool DownloadFlag = false;
        foreach (DownloadAttachment downloadAttachment in downloadAttachments)
        {
            DownloadFlag = await DownloadAttachment(downloadAttachment.id);
            //update the download status in database
            if (DownloadFlag)
            {
                bool UpdateFlag = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
                if (UpdateFlag)
                {
                    await DeleteAttachment(downloadAttachment.id);
                }
            }
        }
        return true;
    }
    catch (Exception ext)
    {
        //the loop variable is out of scope here, so log without the id
        log.Error(ext, "Error in saving attachments");
        return false;
    }
}
Document service code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
return await _documentRepository.UpdateAttachmentDownloadStatus(AttachmentID);
}
And DB update code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
using (var db = new SqlConnection(_connectionString.Value))
{
var Result = 0; bool SuccessFlag = false;
var parameters = new DynamicParameters();
parameters.Add("#pm_AttachmentID", AttachmentID);
parameters.Add("#pm_Result", Result, System.Data.DbType.Int32, System.Data.ParameterDirection.Output);
var result = await db.ExecuteAsync("[Loan].[UpdateDownloadStatus]", parameters, commandType: CommandType.StoredProcedure);
Result = parameters.Get<int>("#pm_Result");
if (Result > 0) { SuccessFlag = true; }
return SuccessFlag;
}
}
How can I move this async task to run in parallel and get the results? I tried the following code:
var task = Task.Run(() => DownloadAttachment(downloadAttachment.id));
bool result = task.Result;
Is this approach fine? How can I improve the performance? How do I get the result from each parallel task, update the DB and delete based on the success flag? Or is this error due to the AWS timeout?
Please help.
If you extracted the code that handles individual files to a separate method :
private async Task DownloadSingleAttachment(DownloadAttachment attachment)
{
    try
    {
        var download = await DownloadAttachment(attachment.id);
        if (download)
        {
            var update = await _DocumentService.UpdateDownloadStatus(attachment.id);
            if (update)
            {
                await DeleteAttachment(attachment.id);
            }
        }
    }
    catch(....)
    {
        ....
    }
}
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        foreach (var attachment in downloadAttachments)
        {
            await DownloadSingleAttachment(attachment);
        }
    }
    ....
}
It would be easy to start all downloads at once, although not very efficient :
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        //Start all of them
        var tasks = downloadAttachments.Select(att => DownloadSingleAttachment(att));
        await Task.WhenAll(tasks);
    }
    ....
}
This isn't very efficient because external services hate receiving lots of concurrent calls from a single source, and almost certainly impose throttling. The database doesn't like lots of concurrent calls either, because in all database products concurrent calls lead to blocking one way or another. Even in databases that use multiversioning, this comes with an overhead.
Using Dataflow classes - Single block
One easy way to fix this is to use .NET's Dataflow classes to break the operation into a pipeline of steps, and execute each one with a different number of concurrent tasks.
We could put the entire operation into a single block, but that could cause problems if the update and delete operations aren't thread-safe :
var dlOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
};
var downloader = new ActionBlock<DownloadAttachment>(async att =>
{
    await DownloadSingleAttachment(att);
}, dlOptions);
foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment);
}
downloader.Complete();
await downloader.Completion;
Dataflow - Multiple steps
To avoid possible thread issues, the rest of the methods can go to their own blocks. They could both go into one ActionBlock that calls both Update and Delete, or they could go into separate blocks if the methods talk to different services with different concurrency requirements.
The downloader block will execute at most 10 concurrent downloads. By default, each block uses only a single task at a time.
The updater and deleter blocks keep the default DOP = 1, which means there's no risk of race conditions as long as they don't try to use, e.g., the same connection at the same time.
var downloader = new TransformBlock<string, (string id, bool download)>(
    async id =>
    {
        var download = await DownloadAttachment(id);
        return (id, download);
    }, dlOptions);
var updater = new TransformBlock<(string id, bool download), (string id, bool update)>(
    async item =>
    {
        if (item.download)
        {
            var update = await _DocumentService.UpdateDownloadStatus(item.id);
            return (item.id, update);
        }
        return (item.id, false);
    });
var deleter = new ActionBlock<(string id, bool update)>(
    async item =>
    {
        if (item.update)
        {
            await DeleteAttachment(item.id);
        }
    });
The blocks can be linked into a pipeline now and used. The setting PropagateCompletion = true means that as soon as a block is finished processing, it will tell all its connected blocks to finish as well :
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
downloader.LinkTo(updater, linkOptions);
updater.LinkTo(deleter,linkOptions);
We can pump data into the head block for as long as we need. When we're done, we call the head block's Complete() method. As each block finishes processing its data, it will propagate its completion to the next block in the pipeline. We need to await the completion of the last (tail) block to ensure all the attachments have been processed:
foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment.id);
}
downloader.Complete();
await deleter.Completion;
Each block has an input and (when necessary) an output buffer, which means the "producer" and "consumers" of the messages don't have to be in sync, or even know of each other. All the "producer" needs to know is where to find the head block in a pipeline.
Throttling and backpressure
One way to throttle is to use a fixed number of tasks through MaxDegreeOfParallelism.
It's also possible to put a limit to the input buffer, thus blocking previous steps or producers if a block can't process messages fast enough. This can be done simply by setting the BoundedCapacity option for a block:
var dlOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
    BoundedCapacity = 20,
};
var updaterOptions = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 20,
};
...
var downloader = new TransformBlock<...>(..., dlOptions);
var updater = new TransformBlock<...>(..., updaterOptions);
No other changes are necessary.
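One caveat worth noting: once BoundedCapacity is set, Post returns false immediately when the input buffer is full, instead of waiting for space. That's why a bounded pipeline should be fed with the awaitable SendAsync, as the examples above already do:
// Post() fails fast when the bounded buffer is full:
bool accepted = downloader.Post(attachment.id);
// SendAsync() asynchronously waits for a free slot, applying backpressure:
await downloader.SendAsync(attachment.id);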
To run multiple asynchronous operations you could do something like this:
public async Task RunMultipleAsync<T>(IEnumerable<T> myList)
{
    const int myNumberOfConcurrentOperations = 10;
    var mySemaphore = new SemaphoreSlim(myNumberOfConcurrentOperations);
    var tasks = new List<Task>();
    foreach (var myItem in myList)
    {
        await mySemaphore.WaitAsync();
        var task = RunOperation(myItem);
        tasks.Add(task);
        task.ContinueWith(t => mySemaphore.Release());
    }
    await Task.WhenAll(tasks);
}

private async Task RunOperation<T>(T myItem)
{
    // Do stuff
}
Put your code from DownloadAttachmentsAsync at the 'Do stuff' comment.
This will use a semaphore to limit the number of concurrent operations, since running too many concurrent operations is often a bad idea due to contention. You would need to experiment to find the optimal number of concurrent operations for your use case. Also note that error handling has been omitted to keep the example short.
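A common variant keeps the acquire/release pair in one method with try/finally instead of ContinueWith; a minimal sketch under the same assumptions (RunThrottled is a hypothetical name):
private async Task RunThrottled<T>(T myItem, SemaphoreSlim mySemaphore)
{
    // Wait for a free slot, run the operation, and always free the slot.
    await mySemaphore.WaitAsync();
    try
    {
        await RunOperation(myItem);
    }
    finally
    {
        mySemaphore.Release();
    }
}
The caller then starts RunThrottled for every item and awaits Task.WhenAll over the resulting tasks.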

Multi-threading in a foreach loop

I have read a few Stack Overflow threads about multi-threading in a foreach loop, but I am not sure I am understanding and using it correctly.
I have tried multiple scenarios, but I am not seeing much increase in performance.
Here is what I believe runs asynchronous tasks, but actually runs synchronously in the loop on a single thread:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
stopWatch.Stop();
Here is what I hoped to be running Asynchronously (still using a single thread) - I would have expected some speed improvement already here:
List<Task<ExchangeTicker>> exchTkrs = new List<Task<ExchangeTicker>>();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
exchTkrs.Add(selectedApi.GetTickerAsync(symbol));
}
}
ExchangeTicker[] retTickers = await Task.WhenAll(exchTkrs);
stopWatch.Stop();
Here is what I would have hoped to run Asynchronously in Multi-thread:
stopWatch.Start();
Parallel.ForEach(selectedApis, async (IExchangeAPI selectedApi) =>
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
});
stopWatch.Stop();
Stop watch results interpreted as follows:
Console.WriteLine("Time elapsed (ns): {0}", stopWatch.Elapsed.TotalMilliseconds * 1000000);
Console outputs:
Time elapsed (ns): 4183308100
Time elapsed (ns): 4183946299.9999995
Time elapsed (ns): 4188032599.9999995
Now, the speed improvement looks minuscule. Am I doing something wrong, or is that more or less what I should be expecting? I suppose writing to files would be a better way to check that.
Would you mind also confirming I am interpreting the different use cases correctly?
Finally, using a foreach loop in order to get the ticker from multiple platforms in parallel may not be the best approach. Suggestions on how to improve this would be welcome.
EDIT
Note that I am using the ExchangeSharp code base that you can find here
Here is what the GetTickerAsync() method looks like:
public virtual async Task<ExchangeTicker> GetTickerAsync(string marketSymbol)
{
marketSymbol = NormalizeMarketSymbol(marketSymbol);
return await Cache.CacheMethod(MethodCachePolicy, async () => await OnGetTickerAsync(marketSymbol), nameof(GetTickerAsync), nameof(marketSymbol), marketSymbol);
}
For the Kraken API, you then have:
protected override async Task<ExchangeTicker> OnGetTickerAsync(string marketSymbol)
{
JToken apiTickers = await MakeJsonRequestAsync<JToken>("/0/public/Ticker", null, new Dictionary<string, object> { { "pair", NormalizeMarketSymbol(marketSymbol) } });
JToken ticker = apiTickers[marketSymbol];
return await ConvertToExchangeTickerAsync(marketSymbol, ticker);
}
And the Caching method:
public static async Task<T> CacheMethod<T>(this ICache cache, Dictionary<string, TimeSpan> methodCachePolicy, Func<Task<T>> method, params object?[] arguments) where T : class
{
await new SynchronizationContextRemover();
methodCachePolicy.ThrowIfNull(nameof(methodCachePolicy));
if (arguments.Length % 2 == 0)
{
throw new ArgumentException("Must pass function name and then name and value of each argument");
}
string methodName = (arguments[0] ?? string.Empty).ToStringInvariant();
string cacheKey = methodName;
for (int i = 1; i < arguments.Length;)
{
cacheKey += "|" + (arguments[i++] ?? string.Empty).ToStringInvariant() + "=" + (arguments[i++] ?? string.Empty).ToStringInvariant("(null)");
}
if (methodCachePolicy.TryGetValue(methodName, out TimeSpan cacheTime))
{
return (await cache.Get<T>(cacheKey, async () =>
{
T innerResult = await method();
return new CachedItem<T>(innerResult, CryptoUtility.UtcNow.Add(cacheTime));
})).Value;
}
else
{
return await method();
}
}
First it should be pointed out that what you are trying to achieve is performance, not asynchrony. And you are trying to achieve it by running multiple operations concurrently, not in parallel. To keep the explanation simple I'll use a simplified version of your code, and I'll assume that each operation is a direct web request, without an intermediate caching layer, and with no dependencies on values existing in dictionaries.
foreach (var symbol in selectedSymbols)
{
var ticker = await selectedApi.GetTickerAsync(symbol);
}
The above code runs the operations sequentially. Each operation starts after the completion of the previous one.
var tasks = new List<Task<ExchangeTicker>>();
foreach (var symbol in selectedSymbols)
{
tasks.Add(selectedApi.GetTickerAsync(symbol));
}
var tickers = await Task.WhenAll(tasks);
The above code runs the operations concurrently. All operations start at once. The total duration is expected to be the duration of the longest running operation.
Parallel.ForEach(selectedSymbols, async symbol =>
{
var ticker = await selectedApi.GetTickerAsync(symbol);
});
The above code runs the operations concurrently, like the previous version with Task.WhenAll. It offers no advantage, while having the huge disadvantage that you no longer have a way to await the operations to complete. The Parallel.ForEach method will return immediately after launching the operations, because the Parallel class doesn't understand async delegates (it does not accept Func<Task> lambdas). Essentially there are a bunch of async void lambdas in there, that are running out of control, and in case of an exception they will bring down the process.
So the correct way to run the operations concurrently is the second way, using a list of tasks and Task.WhenAll. Since you've already measured this method and haven't observed any performance improvements, I am assuming that there is something else that serializes the concurrent operations. It could be something like a SemaphoreSlim hidden somewhere in your code, or some mechanism on the server side that throttles your requests. You'll have to investigate further to find where and why the throttling happens.
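As an aside, newer runtimes (.NET 6 and later) ship Parallel.ForEachAsync, which does understand async delegates and throttles them. A minimal sketch against the simplified example above (the degree of parallelism is an arbitrary value to tune):
// Unlike Parallel.ForEach, this awaits the async body and caps concurrency.
await Parallel.ForEachAsync(selectedSymbols,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (symbol, ct) =>
    {
        var ticker = await selectedApi.GetTickerAsync(symbol);
    });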
In general, when you do not see an increase from multi-threading, it is because your task is not CPU-bound or not large enough to offset the overhead.
In your example, i.e.:
selectedApi.GetTickerAsync(symbol);
There can be 2 reasons for this:
1: Looking up the ticker is brutally fast and it should not be async to start with, i.e. when you look it up in a dictionary.
2: This is running via an HTTP connection where the runtime is LIMITING THE NUMBER OF CONCURRENT CALLS. Regardless of how many tasks you open, it will not use more than 4 at the same time.
Oh, and 3: you think async is using threads. It is not. It is particularly not the case in code like this:
await selectedApi.GetTickerAsync(symbol);
where you basically IMMEDIATELY WAIT FOR THE RESULT. There is no multi-threading involved here at all.
foreach (IExchangeAPI selectedApi in selectedApis)
{
    if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
    {
        ticker = await selectedApi.GetTickerAsync(symbol);
    }
}
This is linear, non-threaded code using an async interface so as not to block the current thread while the (likely expensive IO) operation is in flight. It starts one, THEN WAITS FOR THE RESULT. No 2 queries ever start at the same time.
If you want a possible (just as an example) more scalable way:
In the foreach, do not await, but add the task to a list of tasks.
Then await once all the tasks have started, like in a 2nd loop; a sketch follows below.
Far from perfect, but at least the runtime has a CHANCE to do multiple lookups at the same time. Your await makes sure that you essentially run single-threaded code, except async, so your thread goes back into the pool (and is not waiting for results), increasing your scalability - an item possibly not relevant in this case and definitely not measured in your test.
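A minimal sketch of that suggestion, essentially the questioner's second snippet:
// 1st loop: start every lookup without awaiting it.
var tasks = new List<Task<ExchangeTicker>>();
foreach (IExchangeAPI selectedApi in selectedApis)
{
    if (exchangeSymbols.TryGetValue(selectedApi.Name, out var symbol))
    {
        tasks.Add(selectedApi.GetTickerAsync(symbol));
    }
}
// 2nd loop (really a single await): wait for all in-flight lookups at once.
ExchangeTicker[] tickers = await Task.WhenAll(tasks);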

Parallel.ForEach faster than Task.WaitAll for I/O bound tasks?

I have two versions of my program that submit ~3000 HTTP GET requests to a web server.
The first version is based off of what I read here. That solution makes sense to me because making web requests is I/O bound work, and the use of async/await along with Task.WhenAll or Task.WaitAll means that you can submit 100 requests all at once and then wait for them all to finish before submitting the next 100 requests so that you don't bog down the web server. I was surprised to see that this version completed all of the work in ~12 minutes - way slower than I expected.
The second version submits all 3000 HTTP GET requests inside a Parallel.ForEach loop. I use .Result to wait for each request to finish before the rest of the logic within that iteration of the loop can execute. I thought that this would be a far less efficient solution, since using threads to perform tasks in parallel is usually better suited to CPU-bound work, but I was surprised to see that this version completed all of the work in ~3 minutes!
My question is why is the Parallel.ForEach version faster? This came as an extra surprise because when I applied the same two techniques against a different API/web server, version 1 of my code was actually faster than version 2 by about 6 minutes - which is what I expected. Could performance of the two different versions have something to do with how the web server handles the traffic?
You can see a simplified version of my code below:
private async Task<ObjectDetails> TryDeserializeResponse(HttpResponseMessage response)
{
try
{
using (Stream stream = await response.Content.ReadAsStreamAsync())
using (StreamReader readStream = new StreamReader(stream, Encoding.UTF8))
using (JsonTextReader jsonTextReader = new JsonTextReader(readStream))
{
JsonSerializer serializer = new JsonSerializer();
ObjectDetails objectDetails = serializer.Deserialize<ObjectDetails>(
jsonTextReader);
return objectDetails;
}
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<HttpResponseMessage> TryGetResponse(string urlStr)
{
try
{
HttpResponseMessage response = await httpClient.GetAsync(urlStr)
.ConfigureAwait(false);
if (response.StatusCode != HttpStatusCode.OK)
{
throw new WebException("Response code is "
+ response.StatusCode.ToString() + "... not 200 OK.");
}
return response;
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<ObjectDetails> GetObjectDetailsAsync(string baseUrl, int id)
{
    string urlStr = baseUrl + @"objects/id/" + id + "/details";
    HttpResponseMessage response = await TryGetResponse(urlStr);
    ObjectDetails objectDetails = await TryDeserializeResponse(response);
    return objectDetails;
}
// With ~3000 objects to retrieve, this code will create 100 API calls
// in parallel, wait for all 100 to finish, and then repeat that process
// ~30 times. In other words, there will be ~30 batches of 100 parallel
// API calls.
private Dictionary<int, Task<ObjectDetails>> GetAllObjectDetailsInBatches(
    string baseUrl, Dictionary<int, MyObject> incompleteObjects)
{
    int batchSize = 100;
    int numberOfBatches = (int)Math.Ceiling(
        (double)incompleteObjects.Count / batchSize);
    Dictionary<int, Task<ObjectDetails>> objectTaskDict
        = new Dictionary<int, Task<ObjectDetails>>(incompleteObjects.Count);
    var orderedIncompleteObjects = incompleteObjects.OrderBy(pair => pair.Key);
    for (int i = 0; i < numberOfBatches; i++)
    {
        var batchOfObjects = orderedIncompleteObjects.Skip(i * batchSize)
            .Take(batchSize);
        // ToList() materializes the tasks once; re-enumerating the lazy
        // Select would fire every request a second time.
        var batchObjectsTaskList = batchOfObjects.Select(
            pair => GetObjectDetailsAsync(baseUrl, pair.Key)).ToList();
        Task.WaitAll(batchObjectsTaskList.ToArray());
        foreach (var objTask in batchObjectsTaskList)
            objectTaskDict.Add(objTask.Result.id, objTask);
    }
    return objectTaskDict;
}
public void GetObjectsVersion1()
{
    string baseUrl = @"https://mywebserver.com:/api";
    // GetIncompleteObjects is not shown, but it is not relevant to
    // the question
    Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
    Dictionary<int, Task<ObjectDetails>> objectTaskDict
        = GetAllObjectDetailsInBatches(baseUrl, incompleteObjects);
    foreach (KeyValuePair<int, MyObject> pair in incompleteObjects)
    {
        ObjectDetails objectDetails = objectTaskDict[pair.Key].Result
            .objectDetails;
        // Code here that copies fields from objectDetails to pair.Value
        // (the incompleteObject)
        AllObjects.Add(pair.Value);
    }
}
public void GetObjectsVersion2()
{
    string baseUrl = @"https://mywebserver.com:/api";
    // GetIncompleteObjects is not shown, but it is not relevant to
    // the question
    Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
    Parallel.ForEach(incompleteObjects, pair =>
    {
        ObjectDetails objectDetails = GetObjectDetailsAsync(
            baseUrl, pair.Key).Result.objectDetails;
        // Code here that copies fields from objectDetails to pair.Value
        // (the incompleteObject)
        AllObjects.Add(pair.Value);
    });
}
A possible reason why Parallel.ForEach may run faster is because it creates the side-effect of throttling. Initially x threads are processing the first x elements (where x is the number of available cores), and progressively more threads may be added depending on internal heuristics. Throttling IO operations is a good thing because it protects the network and the server that handles the requests from becoming overburdened. Your alternative improvised method of throttling, making requests in batches of 100, is far from ideal for many reasons, one of them being that 100 concurrent requests are a lot of requests! Another is that a single long-running operation may delay the completion of the batch until long after the completion of the other 99 operations.
Note that Parallel.ForEach is also not ideal for parallelizing IO operations. It just happened to perform better than the alternative, wasting memory all along. For better approaches look here: How to limit the amount of concurrent async I/O operations?
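One of the approaches behind that link is a SemaphoreSlim gate around each call; a minimal sketch reusing the question's GetObjectDetailsAsync (the limit of 10 is an arbitrary starting point to tune):
var gate = new SemaphoreSlim(10); // at most 10 requests in flight
var tasks = incompleteObjects.Keys.Select(async id =>
{
    await gate.WaitAsync();
    try
    {
        return await GetObjectDetailsAsync(baseUrl, id);
    }
    finally
    {
        gate.Release();
    }
});
ObjectDetails[] allDetails = await Task.WhenAll(tasks);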
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netframework-4.8
Basically the parallel foreach allows iterations to run in parallel, so you are not constraining the iteration to run serially. On a host that is not thread-constrained, this will tend to lead to improved throughput.
In short:
Parallel.Foreach() is most useful for CPU bound tasks.
Task.WaitAll() is more useful for IO bound tasks.
So in your case, you are getting information from web servers, which is IO. If the async methods are implemented correctly, they won't block any thread. (They will use IO completion ports to wait on.) This way the threads can do other stuff.
By running the async method GetObjectDetailsAsync(baseUrl, pair.Key).Result synchronously, you block a thread, so the thread pool will be flooded with waiting threads.
So I think the Task solution is a better fit.

Parallel.ForEach and blocking thread

I created a Windows Service application with the Quartz.NET library to schedule jobs for reporting purposes. The main part of the application fetches data from databases at different locations (~260), so I decided to use Parallel.ForEach for parallel fetching and for storing the data at a central location.
In a Quartz.NET job I run a static method from my utility class that does the parallel processing.
Utility class:
public class Helper
{
public static ConcurrentQueue<Exception> KolekcijaGresaka = new ConcurrentQueue<Exception>(); // Thread-safe
public static void Start()
{
List<KeyValuePair<string, string>> podaci = Aktivne(); // List of data for processing (260 items)
ParallelOptions opcije = new ParallelOptions { MaxDegreeOfParallelism = 50 };
Parallel.ForEach(podaci, opcije, p =>
{
UzmiPodatke(p.Key, p.Value, 2000);
});
}
public static void UzmiPodatke(string oznaka, string ipAdresa, int pingTimeout)
{
string datumTrenutneString = DateTime.Now.ToString("d.M.yyyy");
string datumPrethodneString = DatumPrethodneGodineString();
string sati = DateTime.Now.ToString("HH");
// Ping:
Ping ping = new Ping();
PingReply reply = ping.Send(ipAdresa, pingTimeout);
// If is online call method for copy data:
if (reply.Status == IPStatus.Success)
{
KopirajPodatke(oznaka, ipAdresa, datumTrenutneString, datumPrethodneString, sati, "TBL_DATA");
}
}
public static void KopirajPodatke(string oznaka, string ipAdresa, string datumTrenutneString, string datumPrethodneString, string sati, string tabelaDestinacija)
{
string lanString = "Database=" + ipAdresa + "://DBS//custdb.gdb; User=*******; Password=*******; Dialect=3;";
IDbConnection lanKonekcija = new FbConnection(lanString);
IDbCommand lanCmd = lanKonekcija.CreateCommand();
try
{
lanKonekcija.Open();
lanCmd.CommandText = "query ...";
DataTable podaciTabela = new DataTable();
// Get data from remote location:
try
{
podaciTabela.Load(lanCmd.ExecuteReader());
}
catch (Exception ex)
{
throw;
}
// Save data:
if (podaciTabela.Rows.Count > 0)
{
using (SqlConnection sqlKonekcija = new SqlConnection(Konekcije.DB("Podaci")))
{
sqlKonekcija.Open();
using (SqlBulkCopy bulkcopy = new SqlBulkCopy(sqlKonekcija))
{
bulkcopy.DestinationTableName = tabelaDestinacija;
bulkcopy.BulkCopyTimeout = 5; // seconds
bulkcopy.ColumnMappings.Add("A", "A");
bulkcopy.ColumnMappings.Add("B", "B");
bulkcopy.ColumnMappings.Add("C", "C");
bulkcopy.ColumnMappings.Add("D", "D");
try
{
bulkcopy.WriteToServer(podaciTabela);
}
catch (Exception ex)
{
throw;
}
}
}
}
}
catch (Exception ex)
{
KolekcijaGresaka.Enqueue(ex);
}
finally
{
lanCmd.Dispose();
lanKonekcija.Close();
lanKonekcija.Dispose();
}
}
}
The application works most of the time (the job executes 4 times per day), but it sometimes gets stuck and hangs (usually after processing ~200 items in parallel), blocking the main thread so the job never ends. It seems like one of the threads from the parallel processing gets blocked and prevents execution of the main thread. Can this be caused by deadlocks?
How can I ensure that no thread blocks application execution (even when fetching data fails)? What could be wrong with the code above?
How can I ensure that no thread blocks application execution (even when fetching data fails)? What could be wrong with the code above?
Parallel.ForEach is not asynchronous; it only executes each iteration in parallel, so it will wait for every operation to finish before proceeding. If you truly do not care to wait for all operations to finish before returning to the caller, then try using the Task factory to schedule these; the tasks use the thread pool by default.
i.e.
foreach(var p in podaci)
{
Task.Factory.StartNew(() => UzmiPodatke(p.Key, p.Value, 2000));
}
Or use ThreadPool.QueueUserWorkItem or BackgroundWorker, whatever you're familiar with and is applicable to the behavior you want.
This probably won't solve all your problems, just the unresponsive program. Most likely, if there is actually a problem with your code, one of your Tasks will eventually throw an exception which will crash your program if unhandled. Or worse yet, you will have "stuck" tasks just sitting there hogging resources if the Task(s) never finish. However, it may just be the case that occasionally one of these takes extremely long. In this case, you can handle this however you want (cancellation of long task, make sure all previously scheduled tasks complete before scheduling more, etc.), and the Task Parallel Library can support all these cases with some minor modifications.
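For instance, a middle ground is to collect the tasks instead of firing and forgetting, then wait with a timeout so a single hung connection cannot block the job forever (the 10-minute limit is an arbitrary value to tune):
// Start all fetches on the thread pool, then wait with an upper bound.
var tasks = podaci
    .Select(p => Task.Run(() => UzmiPodatke(p.Key, p.Value, 2000)))
    .ToArray();
bool allFinished = Task.WaitAll(tasks, TimeSpan.FromMinutes(10));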

Logging exceptions for each item in Parallel.ForEach and Task.Factory.StartNew from it

I am trying to use Parallel.ForEach on a list, and for each item in the list I am trying to make a database call. I am trying to log each item with or without an error. I just wanted to check with the experts here if I am doing things the right way. For this example, I am simulating the I/O using file access instead of database access.
static ConcurrentQueue<IdAndErrorMessage> queue = new ConcurrentQueue<IdAndErrorMessage>();
private static void RunParallelForEach()
{
List<int> list = Enumerable.Range(1, 5).ToList<int>();
Console.WriteLine("Start....");
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
Parallel.ForEach(list, (tempId) =>
{
string errorMessage = string.Empty;
try
{
ComputeBoundOperationTest(tempId);
try
{
/*Below task is I/O bound - so do this Async.*/
Task[] task = new Task[1]
{
Task.Factory.StartNew(() => FileAccess(tempId))
};
Task.WaitAll(task);
}
catch (Exception ex)
{
errorMessage = ex.ToString();
}
}
catch (Exception ex)
{
errorMessage = ex.ToString();
}
if (queue.SingleOrDefault((IdAndErrorMessageObj) => IdAndErrorMessageObj.Id == tempId) == null)
{
queue.Enqueue(new IdAndErrorMessage(tempId, errorMessage));
}
}
);
Console.WriteLine("Stop....");
Console.WriteLine("Total milliseconds :- " + stopWatch.ElapsedMilliseconds.ToString());
}
Below are the helper methods :-
private static byte[] FileAccess(int id)
{
if (id == 5)
{
throw new ApplicationException("This is some file access exception");
}
return File.ReadAllBytes(Directory.GetFiles(Environment.SystemDirectory).First());
//return File.ReadAllBytes("Files/" + fileName + ".docx");
}
private static void ComputeBoundOperationTest(int tempId)
{
//Console.WriteLine("Compute-bound operation started for :- " + tempId.ToString());
if (tempId == 4)
{
throw new ApplicationException("Error thrown for id = 4 from compute-bound operation");
}
Thread.Sleep(20);
}
private static void EnumerateQueue(ConcurrentQueue<IdAndErrorMessage> queue)
{
Console.WriteLine("Enumerating the queue items :- ");
foreach (var item in queue)
{
Console.WriteLine(item.Id.ToString() + (!string.IsNullOrWhiteSpace(item.ErrorMessage) ? item.ErrorMessage : "No error"));
}
}
There is no reason to do this:
/*Below task is I/O bound - so do this Async.*/
Task[] task = new Task[1]
{
Task.Factory.StartNew(() => FileAccess(tempId))
};
Task.WaitAll(task);
By scheduling this in a separate task, and then immediately waiting on it, you're just tying up more threads. You're better off leaving this as:
/*Below task is I/O bound - but just call it.*/
FileAccess(tempId);
That being said, given that you're making a logged value (exception or success) for every item, you might want to consider writing this into a method and then just calling the entire thing as a PLINQ query.
For example, if you write this into a method that handles the try/catch (with no threading), and returns the "logged string", ie:
string ProcessItem(int id) { // ...
You could write the entire operation as:
var results = theIDs.AsParallel().Select(id => ProcessItem(id));
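A minimal sketch of such a ProcessItem, reusing the question's helper methods; it swallows the exception and returns the logged string instead:
static string ProcessItem(int id)
{
    try
    {
        ComputeBoundOperationTest(id);
        FileAccess(id);
        return id + ": No error";
    }
    catch (Exception ex)
    {
        return id + ": " + ex.Message;
    }
}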
You might want to remove Console.WriteLine from the threaded code. The reason is that there can be only one console per Windows app, so if two or more threads write to the console in parallel, one has to wait.
As a replacement for your custom error queue, you might want to look at .NET 4's AggregateException: catch that and process the exceptions accordingly. The InnerExceptions property will give you the necessary list of exceptions. More here.
And a general code review comment: don't use magic numbers like 4 in if (tempId == 4). Instead have some const defined which tells what 4 stands for, e.g. if (tempId == Error.FileMissing).
Parallel.ForEach runs an action/func concurrently, up to a certain number of simultaneous instances. If what each of those iterations is doing is not inherently independent of the others, you're not getting any performance gains, and you are likely reducing performance by introducing expensive context switching and contention. You say that you want to make a "database call" and are simulating it with a file operation. If each iteration uses the same resource (the same row in a database table, for example, or trying to write to the same file in the same location), then they're not really going to run in parallel. Only one will be running at a time; the others will simply be waiting to get hold of the resource, needlessly making your code complex.
You haven't detailed what you want to do for each iteration; but when I've encountered situations like this with other programmers, they almost always aren't really doing things in parallel and have simply gone through and replaced each foreach with Parallel.ForEach in the hope of magically gaining performance or magically making use of multi-CPU/core processors.
