I'm running a Performance Testing on EF because we were having some issue running concurrent calls on our Server.
Here is my Tests which i'm executing against northwind database, with 20,000 extra employees in the database to slow down retrieval
single execution generally return in about 451 ms, however when calling in parallel it seems I'm seeing some weird extra execution time
When I Run 10 Concurrent calls I'm getting 4 times the over head. It may seem trivial but when I execute things on my database you can see it a whole lot worse.
When running sp_whoisactive you can see that sql has already finished and it is wait to send back the (246ms)ASYNC_NETWORK_IO
Sql is not bottle necked.
class Program
{
public static int MAX_IO_THREADS = 2000;
public static int MIN_IO_THREADS = 1000;
public static int CONCURRENT_CALLS = 10;
static void Main(string[] args)
{
ConfigureThreadPool();
WarmUpContexts();
ProfileAction();
Console.ReadKey();
}
protected static void ProfileAction()
{
var callRange = Enumerable.Range(0, CONCURRENT_CALLS);
var stopwatch = new Stopwatch();
stopwatch.Start();
var results = new ConcurrentDictionary<int, string>();
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 512 };
Parallel.ForEach(callRange, parallelOptions, i =>
{
var callSw = new Stopwatch();
callSw.Start();
var result = RunTest();
callSw.Stop();
results.TryAdd(i, $"Call Index: {i} took {callSw.ElapsedMilliseconds} milliseconds to complete");
});
stopwatch.Stop();
var milliseconds = stopwatch.ElapsedMilliseconds;
foreach (var result in results.OrderBy(x => x.Key))
{
Console.WriteLine(result);
}
Console.WriteLine("Total Time " + milliseconds);
}
public static void ConfigureThreadPool()
{
ThreadPool.GetMinThreads(out var defaultMinWorkerCount, out var _);
ThreadPool.GetMaxThreads(out var defaultMaxWorkerCount, out var _);
ThreadPool.SetMaxThreads(defaultMaxWorkerCount, MAX_IO_THREADS);
var successMin = ThreadPool.SetMinThreads(defaultMinWorkerCount, MIN_IO_THREADS);
}
static void WarmUpContexts()
{
using (var stagingContext = new NORTHWNDEntities())
{
stagingContext.Employees.First();
}
}
static int RunTest()
{
using (var context = new Entities())
{
var employees = context.Employees.ToArray();
return employees.Length;
}
}
}
How can I optimize the performance of my threads so they actually return NOTE if I call async this will hang forever,
could this be a bug in .net?
A Few reasons why my project code was Delaying.
1.Configure Thread Pool. Opening up your IO threads will help delay with request that are not only cpu bound. Such as writing a file to disk, or a network bound request.
Note: if you're setting higher values you need to be careful not to
have you min exceed the max, because these functions are poorly
named, and should be called TrySetMin/TrySetMax since they return a
bool whether successful or not
Be careful setting these I was only uses extremely high numbers for testing and learning purposes
ThreadPool.SetMaxThreads(defaultMaxWorkerCount, MAX_IO_THREADS);
ThreadPool.SetMinThreads(defaultMinWorkerCount, MIN_IO_THREADS);
2.GcServer Apparently, in .Net by Default the Garbage Collector is optimized for single Core performance instead of multicore Performance. You can read more on this issue on MSDN
Note: in .Net 4.0 Enabling GCServer Causing issues for application who had UI. In .Net 4.5 these issues were fixed by enabling gcConcurrent=true by Default
<configuration>
<runtime>
<gcServer enabled="true"/>
</runtime>
</configuration>
3.Additional Delays caused by EntityFrameWork, It looks like mapping relationships is really expensive, even if you are not pull out any related entities.
var employees = context.Employees.AsNoTracking().ToArray();
In was testing using an extremely large database, Moral is, if you are loading data for GET requests, if possible load without tracking.
Note: This may cause unexpected behavior as, different entities who share a common Foreign Key will have Different CLR object associated with them
Related
I have a data processing program in C# (.NET 4.6.2; WinForms for the UI). I'm experiencing a strange situation where computer speed seems to be causing Task.Factory.ContinueWhenAll to run earlier than expected or some Tasks are reporting complete before actually running. As you can see below, I have a queue of up to 390 tasks, with no more than 4 in queue at once. When all tasks are complete, the status label is updated to say complete. The ScoreManager involves retrieving information from a database, performing several client-side calculations, and saving to an Excel file.
When running the program from my laptop, everything functions as expected; when running from a substantially more powerful workstation, I experience this issue. Unfortunately, due to organizational limitations, I likely cannot get Visual Studio on the workstation to debug directly. Does anyone have any idea what might be causing this for me to investigate?
private void button1_Click(object sender, EventArgs e)
{
int startingIndex = cbStarting.SelectedIndex;
int endingIndex = cbEnding.SelectedIndex;
lblStatus.Text = "Running";
if (endingIndex < startingIndex)
{
MessageBox.Show("Ending must be further down the list than starting.");
return;
}
List<string> lItems = new List<string>();
for (int i = startingIndex; i <= endingIndex; i++)
{
lItems.Add(cbStarting.Items[i].ToString());
}
System.IO.Directory.CreateDirectory(cbMonth.SelectedItem.ToString());
ThreadPool.SetMaxThreads(4, 4);
List<Task<ScoreResult>> tasks = new List<Task<ScoreResult>>();
for (int i = startingIndex; i <= endingIndex; i++)
{
ScoreManager sm = new ScoreManager(cbStarting.Items[i].ToString(),
cbMonth.SelectedItem.ToString());
Task<ScoreResult> task = Task.Factory.StartNew<ScoreResult>((manager) =>
((ScoreManager)manager).Execute(), sm);
sm = null;
Action<Task<ScoreResult>> itemcomplete = ((_task) =>
{
if (_task.Result.errors.Count > 0)
{
txtLog.Invoke((MethodInvoker)delegate
{
txtLog.AppendText("Item " + _task.Result.itemdetail +
" had errors/warnings:" + Environment.NewLine);
});
foreach (ErrorMessage error in _task.Result.errors)
{
txtLog.Invoke((MethodInvoker)delegate
{
txtLog.AppendText("\t" + error.ErrorText +
Environment.NewLine);
});
}
}
else
{
txtLog.Invoke((MethodInvoker)delegate
{
txtLog.AppendText("Item " + _task.Result.itemdetail +
" succeeded." + Environment.NewLine);
});
}
});
task.ContinueWith(itemcomplete);
tasks.Add(task);
}
Action<Task[]> allComplete = ((_tasks) =>
{
lblStatus.Invoke((MethodInvoker)delegate
{
lblStatus.Text = "Complete";
});
});
Task.Factory.ContinueWhenAll<ScoreResult>(tasks.ToArray(), allComplete);
}
You are creating fire-and-forget tasks, that you don't wait or observe, here:
task.ContinueWith(itemcomplete);
tasks.Add(task);
Task.Factory.ContinueWhenAll<ScoreResult>(tasks.ToArray(), allComplete);
The ContinueWith method returns a Task. You probably need to attach the allComplete continuation to these tasks, instead of their antecedents:
List<Task> continuations = new List<Task>();
Task continuation = task.ContinueWith(itemcomplete);
continuations.Add(continuation);
Task.Factory.ContinueWhenAll<ScoreResult>(continuations.ToArray(), allComplete);
As a side note, you could make your code half in size and significantly more readable if you used async/await instead of the old-school ContinueWith and Invoke((MethodInvoker) technique.
Also: setting an upper limit to the number of ThreadPool threads in order to control the degree of parallelism is extremely inadvisable:
ThreadPool.SetMaxThreads(4, 4); // Don't do this!
You can use the Parallel class instead. It allows controlling the MaxDegreeOfParallelism quite easily.
After discovering state was IsFaulted, I added some code to add some exception information to the log (https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/exception-handling-task-parallel-library). Seems the problem is an underlying database issue where there are not enough connections left in the connection pool (Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.)--the additional speed allows queries to fire more quickly/frequently. Not sure entirely why, as I do have the SqlConnection enclosed in a using clause, but investigating a few things on that front. At any rate, the problem is clearly a little different than what I thought above, so marking this quasi-answered.
I have read a few stackoverflow threads about multi-threading in a foreach loop, but I am not sure I am understanding and using it right.
I have tried multiple scenarios, but I am not seeing much increase in performance.
Here is what I believe runs Asynchronous tasks, but running synchronously in the loop using a single thread:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
stopWatch.Stop();
Here is what I hoped to be running Asynchronously (still using a single thread) - I would have expected some speed improvement already here:
List<Task<ExchangeTicker>> exchTkrs = new List<Task<ExchangeTicker>>();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
exchTkrs.Add(selectedApi.GetTickerAsync(symbol));
}
}
ExchangeTicker[] retTickers = await Task.WhenAll(exchTkrs);
stopWatch.Stop();
Here is what I would have hoped to run Asynchronously in Multi-thread:
stopWatch.Start();
Parallel.ForEach(selectedApis, async (IExchangeAPI selectedApi) =>
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
});
stopWatch.Stop();
Stop watch results interpreted as follows:
Console.WriteLine("Time elapsed (ns): {0}", stopWatch.Elapsed.TotalMilliseconds * 1000000);
Console outputs:
Time elapsed (ns): 4183308100
Time elapsed (ns): 4183946299.9999995
Time elapsed (ns): 4188032599.9999995
Now, the speed improvement looks minuscule. Am I doing something wrong or is that more or less what I should be expecting? I suppose writing to files would be a better to check that.
Would you mind also confirming I am interpreting the different use cases correctly?
Finally, using a foreach loop in order to get the ticker from multiple platforms in parallel may not be the best approach. Suggestions on how to improve this would be welcome.
EDIT
Note that I am using the ExchangeSharp code base that you can find here
Here is what the GerTickerAsync() method looks like:
public virtual async Task<ExchangeTicker> GetTickerAsync(string marketSymbol)
{
marketSymbol = NormalizeMarketSymbol(marketSymbol);
return await Cache.CacheMethod(MethodCachePolicy, async () => await OnGetTickerAsync(marketSymbol), nameof(GetTickerAsync), nameof(marketSymbol), marketSymbol);
}
For the Kraken API, you then have:
protected override async Task<ExchangeTicker> OnGetTickerAsync(string marketSymbol)
{
JToken apiTickers = await MakeJsonRequestAsync<JToken>("/0/public/Ticker", null, new Dictionary<string, object> { { "pair", NormalizeMarketSymbol(marketSymbol) } });
JToken ticker = apiTickers[marketSymbol];
return await ConvertToExchangeTickerAsync(marketSymbol, ticker);
}
And the Caching method:
public static async Task<T> CacheMethod<T>(this ICache cache, Dictionary<string, TimeSpan> methodCachePolicy, Func<Task<T>> method, params object?[] arguments) where T : class
{
await new SynchronizationContextRemover();
methodCachePolicy.ThrowIfNull(nameof(methodCachePolicy));
if (arguments.Length % 2 == 0)
{
throw new ArgumentException("Must pass function name and then name and value of each argument");
}
string methodName = (arguments[0] ?? string.Empty).ToStringInvariant();
string cacheKey = methodName;
for (int i = 1; i < arguments.Length;)
{
cacheKey += "|" + (arguments[i++] ?? string.Empty).ToStringInvariant() + "=" + (arguments[i++] ?? string.Empty).ToStringInvariant("(null)");
}
if (methodCachePolicy.TryGetValue(methodName, out TimeSpan cacheTime))
{
return (await cache.Get<T>(cacheKey, async () =>
{
T innerResult = await method();
return new CachedItem<T>(innerResult, CryptoUtility.UtcNow.Add(cacheTime));
})).Value;
}
else
{
return await method();
}
}
At first it should be pointed out that what you are trying to achieve is performance, not asynchrony. And you are trying to achieve it by running multiple operations concurrently, not in parallel. To keep the explanation simple I'll use a simplified version of your code, and I'll assume that each operation is a direct web request, without an intermediate caching layer, and with no dependencies to values existing in dictionaries.
foreach (var symbol in selectedSymbols)
{
var ticker = await selectedApi.GetTickerAsync(symbol);
}
The above code runs the operations sequentially. Each operation starts after the completion of the previous one.
var tasks = new List<Task<ExchangeTicker>>();
foreach (var symbol in selectedSymbols)
{
tasks.Add(selectedApi.GetTickerAsync(symbol));
}
var tickers = await Task.WhenAll(tasks);
The above code runs the operations concurrently. All operations start at once. The total duration is expected to be the duration of the longest running operation.
Parallel.ForEach(selectedSymbols, async symbol =>
{
var ticker = await selectedApi.GetTickerAsync(symbol);
});
The above code runs the operations concurrently, like the previous version with Task.WhenAll. It offers no advantage, while having the huge disadvantage that you no longer have a way to await the operations to complete. The Parallel.ForEach method will return immediately after launching the operations, because the Parallel class doesn't understand async delegates (it does not accept Func<Task> lambdas). Essentially there are a bunch of async void lambdas in there, that are running out of control, and in case of an exception they will bring down the process.
So the correct way to run the operations concurrently is the second way, using a list of tasks and the Task.WhenAll. Since you've already measured this method and haven't observed any performance improvements, I am assuming that there something else that serializes the concurrent operations. It could be something like a SemaphoreSlim hidden somewhere in your code, or some mechanism on the server side that throttles your requests. You'll have to investigate further to find where and why the throttling happens.
In general, when you do not see an increase by multi threading, it is because your task is not CPU limited or large enough to offset the overhead.
In your example, i.e.:
selectedApi.GetTickerAsync(symbol);
This can hae 2 reasons:
1: Looking up the ticker is brutally fast and it should not be an async to start with. I.e. when you look it up in a dictionary.
2: This is running via a http connection where the runtime is LIMITING THE NUMBER OF CONCURRENT CALLS. Regardless how many tasks you open, it will not use more than 4 at the same time.
Oh, and 3: you think async is using threads. It is not. It is particularly not the case in a codel ike this:
await selectedApi.GetTickerAsync(symbol);
Where you basically IMMEDIATELY WAIT FOR THE RESULT. There is no multi threading involved here at all.
foreach (IExchangeAPI selectedApi in selectedApis) {
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
} }
This is linear non threaded code using an async interface to not block the current thread while the (likely expensive IO) operation is in place. It starts one, THEN WAITS FOR THE RESULT. No 2 queries ever start at the same time.
If you want a possible (just as example) more scalable way:
In the foreach, do not await but add the task to a list of tasks.
Then start await once all the tasks have started. Likein a 2nd loop.
WAY not perfect, but at least the runtime has a CHANCE to do multiple lookups at the same time. Your await makes sure that you essentially run single threaded code, except async, so your thread goes back into the pool (and is not waiting for results), increasing your scalability - an item possibly not relevant in this case and definitely not measured in your test.
We have a .NET Core API running in production which can run stable for days or even weeks and then suddenly freezes. Such a freeze can even happen multiple times a day, completely random. What happens: the code seems to be frozen and doesn't accept any new requests. No new requests are logged, the thread count rises sky-high and the memory rises steadily until it's maxed out.
I created a memory dump to analyze. It tells me that most threads are waiting for a lock to be released at a specific function, looking like a deadlock. I analysed this function and cannot see why this would cause issues. Can someone help me out? Obviously I suspect AsParallel() to be thread unsafe, but the internet says no, it is thread safe.
public async Task<bool> TryStorePropertiesAsync(string sessionId, Dictionary<string, string> keyValuePairs, int ttl = 1500)
{
try
{
await Task.WhenAll(keyValuePairs.AsParallel().Select(async item =>
{
var doc = await _cosmosDbRepository.GetItemByKeyAsync(GetId(sessionId, item.Key), sessionId) ?? new Document();
doc.SetPropertyValue("_partitionKey", sessionId);
doc.SetPropertyValue("key", GetId(sessionId, item.Key));
doc.SetPropertyValue("name", item.Key.ToLowerInvariant());
doc.SetPropertyValue("value", item.Value);
doc.TimeToLive = ttl;
await _cosmosDbRepository.UpsertDocumentAsync(doc, "_partitionKey");
}));
return true;
}
catch
{
ApplicationInsightsLogger.TrackException(ex, new Dictionary<string, string>
{
{ "sessionID", sessionId },
{ "action", "TryStoreItems" }
});
return false;
}
}
The code has serious issues. For eg 100 items, it fires off 100 concurrent operations, 4/8 at a time. The code inside the loop seems to read a document from CosmosDB, set all its properties then call a method named similar to DocumentClient.UpsertDocumentAsync which doesn't need pre-loading anything. Without knowing what _cosmosDbRepository is and what its methods do, one can only guess. It's possible it creates extra conflicts though by trying to lock stuff while the (probably useless) load/update cycle takes place.
For starters, AsParallel() is only meant for data parallelism: partition some data in memory and use as many workers are there are cores to crunch each partition. There's no data here though, just calls to async operations. That's why for 100 items, this code will fire off 100 concurrent tasks.
That could hit any number of CosmosDB throttling limits, even if it doesn't cause concurrency conflicts. It could also lead to networking issues, as the same cable is used for all those concurrent connections.
Not taking CosmosDB into account, the correct way to make lots of calls to a remote service is to queue them and execute them with a limited number of workers. This is very easy to do with .NET's ActionBlock. The code could change to something like this :
class Payload
{
public string SessionKey{get;set;}
public string Key{get;set;}
public string Name{get;set;}
public string Value{get;set;}
public int TTL{get;set;}
}
//Allow only 10 concurrent upserts
var options=new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
var upsertBlock=new ActionBlock<Payload>(myPosterAsync,options);
foreach(var payload in payloads)
{
block.Post(pair);
}
//Tell the block we're done
block.Complete();
//Await for all queued operations to complete
await block.Completion;
Where myPosterAsync contains the posting code :
async Task myPosterAsync(Payload item)
{
try
{
var doc = await _cosmosDbRepository.GetItemByKeyAsync(GetId(item.SessionId, item.Key),
item.SessionId)
?? new Document();
doc.SetPropertyValue("_partitionKey", item.SessionId);
doc.SetPropertyValue("key", GetId(sessionId, item.Key));
doc.SetPropertyValue("name", item.Name);
doc.SetPropertyValue("value", item.Value);
doc.TimeToLive = item.TTL;
await _cosmosDbRepository.UpsertDocumentAsync(doc, "_partitionKey");
catch (Exception ex)
{
//Handle the error in some way, eg log it
ApplicationInsightsLogger.TrackException(ex, new Dictionary<string, string>
{
{ "sessionID", item.SessionId },
{ "action", "TryStoreItems" }
});
}
}
I have two versions of my program that submit ~3000 HTTP GET requests to a web server.
The first version is based off of what I read here. That solution makes sense to me because making web requests is I/O bound work, and the use of async/await along with Task.WhenAll or Task.WaitAll means that you can submit 100 requests all at once and then wait for them all to finish before submitting the next 100 requests so that you don't bog down the web server. I was surprised to see that this version completed all of the work in ~12 minutes - way slower than I expected.
The second version submits all 3000 HTTP GET requests inside a Parallel.ForEach loop. I use .Result to wait for each request to finish before the rest of the logic within that iteration of the loop can execute. I thought that this would be a far less efficient solution, since using threads to perform tasks in parallel is usually better suited for performing CPU bound work, but I was surprised to see that the this version completed all of the work within ~3 minutes!
My question is why is the Parallel.ForEach version faster? This came as an extra surprise because when I applied the same two techniques against a different API/web server, version 1 of my code was actually faster than version 2 by about 6 minutes - which is what I expected. Could performance of the two different versions have something to do with how the web server handles the traffic?
You can see a simplified version of my code below:
private async Task<ObjectDetails> TryDeserializeResponse(HttpResponseMessage response)
{
try
{
using (Stream stream = await response.Content.ReadAsStreamAsync())
using (StreamReader readStream = new StreamReader(stream, Encoding.UTF8))
using (JsonTextReader jsonTextReader = new JsonTextReader(readStream))
{
JsonSerializer serializer = new JsonSerializer();
ObjectDetails objectDetails = serializer.Deserialize<ObjectDetails>(
jsonTextReader);
return objectDetails;
}
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<HttpResponseMessage> TryGetResponse(string urlStr)
{
try
{
HttpResponseMessage response = await httpClient.GetAsync(urlStr)
.ConfigureAwait(false);
if (response.StatusCode != HttpStatusCode.OK)
{
throw new WebException("Response code is "
+ response.StatusCode.ToString() + "... not 200 OK.");
}
return response;
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<ListOfObjects> GetObjectDetailsAsync(string baseUrl, int id)
{
string urlStr = baseUrl + #"objects/id/" + id + "/details";
HttpResponseMessage response = await TryGetResponse(urlStr);
ObjectDetails objectDetails = await TryDeserializeResponse(response);
return objectDetails;
}
// With ~3000 objects to retrieve, this code will create 100 API calls
// in parallel, wait for all 100 to finish, and then repeat that process
// ~30 times. In other words, there will be ~30 batches of 100 parallel
// API calls.
private Dictionary<int, Task<ObjectDetails>> GetAllObjectDetailsInBatches(
string baseUrl, Dictionary<int, MyObject> incompleteObjects)
{
int batchSize = 100;
int numberOfBatches = (int)Math.Ceiling(
(double)incompleteObjects.Count / batchSize);
Dictionary<int, Task<ObjectDetails>> objectTaskDict
= new Dictionary<int, Task<ObjectDetails>>(incompleteObjects.Count);
var orderedIncompleteObjects = incompleteObjects.OrderBy(pair => pair.Key);
for (int i = 0; i < 1; i++)
{
var batchOfObjects = orderedIncompleteObjects.Skip(i * batchSize)
.Take(batchSize);
var batchObjectsTaskList = batchOfObjects.Select(
pair => GetObjectDetailsAsync(baseUrl, pair.Key));
Task.WaitAll(batchObjectsTaskList.ToArray());
foreach (var objTask in batchObjectsTaskList)
objectTaskDict.Add(objTask.Result.id, objTask);
}
return objectTaskDict;
}
public void GetObjectsVersion1()
{
string baseUrl = #"https://mywebserver.com:/api";
// GetIncompleteObjects is not shown, but it is not relevant to
// the question
Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
Dictionary<int, Task<ObjectDetails>> objectTaskDict
= GetAllObjectDetailsInBatches(baseUrl, incompleteObjects);
foreach (KeyValuePair<int, MyObject> pair in incompleteObjects)
{
ObjectDetails objectDetails = objectTaskDict[pair.Key].Result
.objectDetails;
// Code here that copies fields from objectDetails to pair.Value
// (the incompleteObject)
AllObjects.Add(pair.Value);
};
}
public void GetObjectsVersion2()
{
string baseUrl = #"https://mywebserver.com:/api";
// GetIncompleteObjects is not shown, but it is not relevant to
// the question
Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
Parallel.ForEach(incompleteHosts, pair =>
{
ObjectDetails objectDetails = GetObjectDetailsAsync(
baseUrl, pair.Key).Result.objectDetails;
// Code here that copies fields from objectDetails to pair.Value
// (the incompleteObject)
AllObjects.Add(pair.Value);
});
}
A possible reason why Parallel.ForEach may run faster is because it creates the side-effect of throttling. Initially x threads are processing the first x elements (where x in the number of the available cores), and progressively more threads may be added depending on internal heuristics. Throttling IO operations is a good thing because it protects the network and the server that handles the requests from becoming overburdened. Your alternative improvised method of throttling, by making requests in batches of 100, is far from ideal for many reasons, one of them being that 100 concurrent requests are a lot of requests! Another one is that a single long running operation may delay the completion of the batch until long after the completion of the other 99 operations.
Note that Parallel.ForEach is also not ideal for parallelizing IO operations. It just happened to perform better than the alternative, wasting memory all along. For better approaches look here: How to limit the amount of concurrent async I/O operations?
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netframework-4.8
Basically the parralel foreach allows iterations to run in parallel so you are not constraining the iteration to run in serial, on a host that is not thread constrained this will tend to lead to improved throughput
In short:
Parallel.Foreach() is most useful for CPU bound tasks.
Task.WaitAll() is more useful for IO bound tasks.
So in your case, you are getting information from webservers, which is IO. If the async methods are implemented correctly, it won't block any thread. (It will use IO Completion ports to wait on) This way the threads can do other stuff.
By running the async methods GetObjectDetailsAsync(baseUrl, pair.Key).Result synchroniced, it will block a thread. So the threadpool will be flood by waiting threads.
So I think the Task solution will have a better fit.
I have an issue with data concurrent processing. My PC is running out of RAM quickly. Any advices on how to fix my concurrent implementation?
Common class:
public class CalculationResult
{
public int Count { get; set; }
public decimal[] RunningTotals { get; set; }
public CalculationResult(decimal[] profits)
{
this.Count = 1;
this.RunningTotals = new decimal[12];
profits.CopyTo(this.RunningTotals, 0);
}
public void Update(decimal[] newData)
{
this.Count++;
// summ arrays
for (int i = 0; i < 12; i++)
this.RunningTotals[i] = this.RunningTotals[i] + newData[i];
}
public void Update(CalculationResult otherResult)
{
this.Count += otherResult.Count;
// summ arrays
for (int i = 0; i < 12; i++)
this.RunningTotals[i] = this.RunningTotals[i] + otherResult.RunningTotals[i];
}
}
Single-core implementation of the code is following:
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
foreach (var i in itterations)
{
// do the processing
// ..
string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
if (combinations.ContainsKey(combination))
combinations[combination].Update(newData);
else
combinations.Add(combination, new CalculationResult(newData));
}
Multi-core implementation:
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
Parallel.ForEach(itterations, (i, state) =>
{
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
// do the processing
// ..
// add combination to combinations -> same logic as in single core implementation
results.Add(combinations);
});
Dictionary<string, CalculationResult> combinationsReal = new Dictionary<string, CalculationResult>();
foreach (var item in results)
{
foreach (var pair in item)
{
if (combinationsReal.ContainsKey(pair.Key))
combinationsReal[pair.Key].Update(pair.Value);
else
combinationsReal.Add(pair.Key, pair.Value);
}
}
The issue I am having is that almost each combinations dictionary ends up with 930k records in it, which is on average consumes 400 [MB] RAM memory.
Now, in single core implementation there is only one such dictionary. All checks are performed against one dictionary. But this is slow approach and I want to use multi-core optimizations.
In multi-core implementation there is a ConcurrentBag instance created which holds all combinations dictionaries. As soon as the multi-thread job is finished - all dictionaries are aggregated into one. This approach works well for small amount of concurrent iterations. For example, for 4 iterations my RAM usage was ~ 1.5 [GB]. The issue arises, when I set the full amount of parallel iterations, which is 200! No amount of PC RAM is enough to hold all dictionaries, with million records each!
I was thinking about using ConcurrentDictioanary, until I found out that the "TryAdd" method does not guarantee integrity of added data in my situation, as I also need to run updates on running totals.
The only real multi-threaded option is, instead of adding all combinations to dictionary - is to save them to some DB. Data aggregation will then be a matter of 1 SQL select statement with a group by clause... but I don't like the idea of creating a temporary table and running DB instance just for that..
Is there a work around on how to processes data concurrently and not run out of RAM?
EDIT:
Maybe the real question should have been - how to make updating of RunningTotals thread-safe when using ConcurrentDictionary? I have just ran across this thread, with a similar issue with ConcurrentDictionary, but my situation seems to be more complicated as I have an array that needs to be updated. I am still investigating this matter.
EDIT2: Here is a working solution with ConcurrentDictionary. All I needed to do is to add a lock for the dictionary key.
ConcurrentDictionary<string, CalculationResult> combinations = new ConcurrentDictionary<string, CalculationResult>();
Parallel.ForEach(itterations, (i, state) =>
{
// do the processing
// ..
string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
if (combinations.ContainsKey(combination)) {
lock(combinations[combination])
combinations[combination].Update(newData);
}
else
combinations.TryAdd(combination, new CalculationResult(newData));
});
Single-thread code execution time is 1m 48s, whereas this solution execution time is 1m 7s for 4 iterations (37% performance increase). I am still wondering if SQL approach will be any faster, with millions of records? I will test it out possibly tomorrow and update.
Edit 3: For those of you wondering what's wrong with ConcurrentDictionary updates on a value - run this code with and without the lock.
public class Result
{
public int Count { get; set; }
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Start");
List<int> keys = new List<int>();
for (int i = 0; i < 100; i++)
keys.Add(i);
ConcurrentDictionary<int, Result> dict = new ConcurrentDictionary<int, Result>();
Parallel.For(0, 8, i =>
{
foreach(var key in keys)
{
if (dict.ContainsKey(key))
{
//lock (dict[key]) // uncomment this
dict[key].Count++;
}
else
dict.TryAdd(key, new Result());
}
});
// any output here is incorrect behavior. best result = no lines
foreach (var item in dict)
if (item.Value.Count != 7) { Console.WriteLine($"{item.Key}; {item.Value.Count}"); }
Console.WriteLine($"Finish");
Console.ReadKey();
}
}
Edit 4: After trials and errors I couldn't optimize SQL approach. This turned out to be the worst idea :) I have used an SQL Lite database. In-memory and in-file. With transaction and reusable SQL command parameters. Due to the huge amount of records that needed to be inserted - the performance is lacking. Data aggregation is the easiest part, but it takes a huge amount of time just to insert 4 millions of rows, I can't even begin to imagine how the 240 million of data could be processed efficiently.. So far (and also strangely), ConcurrentBag approach seems to be the fastest on my PC. Followed by a ConcurrentDictionary approach. ConcurrentBag is a bit heavier on memory, though. Thanks to the work of #Alisson - it is now perfectly fine to use it for larger set of iterations!
So, you just need to be sure you'll have no more than 4 concurrent iterations, that's the limit of your computer resources and by using only this computer, there is no magic.
I created a class to control the concurrent execution and the number of concurrent tasks it will perform.
The class will hold these properties:
public class ConcurrentCalculationProcessor
{
private const int MAX_CONCURRENT_TASKS = 4;
private readonly IEnumerable<int> _codes;
private readonly List<Task<Dictionary<string, CalculationResult>>> _tasks;
private readonly Dictionary<string, CalculationResult> _combinationsReal;
public ConcurrentCalculationProcessor(IEnumerable<int> codes)
{
this._codes = codes;
this._tasks = new List<Task<Dictionary<string, CalculationResult>>>();
this._combinationsReal = new Dictionary<string, CalculationResult>();
}
}
I made the number of concurrent tasks a const, but it could be a parameter in the constructor.
I created a method to handle the processing. For test purposes, I simulated a loop through 900k itens, adding them to a dictionary, and finally returning them:
private async Task<Dictionary<string, CalculationResult>> ProcessCombinations()
{
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
// do the processing
// here we should do something that worth using concurrency
// like querying databases, consuming APIs/WebServices, and other I/O stuff
for (int i = 0; i < 950000; i++)
combinations[i.ToString()] = new CalculationResult(new decimal[] { 1, 10, 15 });
return await Task.FromResult(combinations);
}
The main method will start tasks in parallel, adding them to a list of tasks, so we can keep track of them lately.
Everytime the list reaches the maximum concurrent tasks, we await a method called ProcessRealCombinations.
public async Task<Dictionary<string, CalculationResult>> Execute()
{
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
for (int i = 0; i < this._codes.Count(); i++)
{
// start the task imediately
var task = ProcessCombinations();
this._tasks.Add(task);
if (this._tasks.Count() >= MAX_CONCURRENT_TASKS)
{
// if we have more than MAX_CONCURRENT_TASKS in progress, we start processing some of them
// this will await any of the current tasks to complete, them process it (and any other task which may have been completed as well)...
await ProcessCompletedTasks().ConfigureAwait(false);
}
}
// keep processing until all the pending tasks have been completed...it should be no more than MAX_CONCURRENT_TASKS
while(this._tasks.Any())
await ProcessCompletedTasks().ConfigureAwait(false);
return this._combinationsReal;
}
The next method ProcessCompletedTasks will wait for at least one of the existing tasks to complete. After that, it will take all the completed tasks from the list (that one which finished and any other which may have been finished together), and get the result of them (the combinations).
With each processedCombinations, it'll merge with this._combinationsReal (using the same logic you provided in your question).
private async Task ProcessCompletedTasks()
{
await Task.WhenAny(this._tasks).ConfigureAwait(false);
var completedTasks = this._tasks.Where(t => t.IsCompleted).ToArray();
// completedTasks will have at least one task, but it may have more ;)
foreach (var completedTask in completedTasks)
{
var processedCombinations = await completedTask.ConfigureAwait(false);
foreach (var pair in processedCombinations)
{
if (this._combinationsReal.ContainsKey(pair.Key))
this._combinationsReal[pair.Key].Update(pair.Value);
else
this._combinationsReal.Add(pair.Key, pair.Value);
}
this._tasks.Remove(completedTask);
}
}
For each processedCombinations merged in _combinationsReal, it will remove its respective task from the list, and move on (start adding more tasks again). This will happen until we have created all the tasks for all iterations.
Finally, we keep processing it, until there are no more tasks in the list.
If you monitor the RAM consumption, you'll notice it will increase to about 1.5 GB (when we have 4 tasks being processed concurrently), then decrease to about 0.8 GB (when we remove tasks from the list). At least this is what happened in my computer.
Here is a fiddle, however I had to decrease the number of itens from 900k to 100, because fiddle limits the memory usage to avoid abuse.
I hope this help you somehow.
One thing to notice about all this stuff, is that you will benefit from using concurrent tasks mostly if your ProcessCombinations (the method that is executed concurrently when processing those 900k items) calls external resources, like reading files from your HD, executing a query in a database, calling an API/WebService method. I guess that code is probably reading 900k items from an external resource, then this will reduce the time needed to process it.
If the items were previously loaded and ProcessCombinations is just reading data that was already in memory, then the concurrency won't help at all (actually I believe it would make your code ran slower). If that's the case, then we are applying concurrency in the wrong place.
Using async calls in parallel is likely to help more when said calls are going to access external resources (either to get or store data), and depending on how many concurrent calls that external resources can support, it may still not make such a difference.