Parallel.Invoke and Index was outside the bounds of the array [duplicate] - c#

I am sporadically seeing the following error,
System.IndexOutOfRangeException: Index was outside the bounds of the
array. at System.Collections.Generic.List`1.Add(T item)
I have a console application that connects to multiple data sources and generates reports in Excel format.
To speed up the process, I am using Parallel.Invoke, which saved approximately 40% of the time.
The code structure is as follows:
public static void Execute(List<Members> activeRecords)
{
var resultList = new List<Recon>();
var lookupList = DataManager.GetLookUpData();
Parallel.Invoke(
() =>
{
GenerateReportA(activeRecords, lookupList, resultList);
},
() =>
{
GenerateReportB(activeRecords, lookupList, resultList);
},
() =>
{
GenerateReportC(activeRecords, lookupList, resultList);
},
() =>
{
GenerateReportD(activeRecords, lookupList, resultList);
}
// the remaining reports follow the same pattern
);
}
Each GenerateReport method has a similar structure but generates a different report as needed. There are 32 reports in total.
private static void GenerateReportA(List<Members> activeData, List<Recon> resultList)
{
var message = string.Empty;
var reportName = $"{area} {Name}";
IList reportDataList = null; // "var" cannot infer a type from null
try
{
//logic which compares generates report data
}
catch (Exception ex)
{
message = $"Error: {ex.Message}";
}
AddToResult(reportName, reportDataList, message, resultList);
}
The issue is happening in AddToResult method at resultList.Add(recon);
private static void AddToResult(string reportName, IList data, string message, List<Recon> resultList)
{
var recon = new Recon
{
ReportName = reportName,
ExcelData = ExcelManager.GetExcelData(reportName, message, data)
};
resultList.Add(recon);
}
I need suggestions or guidance on how to avoid this error while still using Parallel.Invoke. Any suggestion will be greatly appreciated.

The fundamental issue is that List<T>.Add is not thread-safe. By adding to the same list in parallel, you are most likely causing an internal collision, potentially while the list resizes its backing array, and that leads to the failed indexing.
The proper solution is to lock before adding:
private static readonly object _locker = new object();
private static void AddToResult(string reportName, IList data, string message, List<Recon> resultList)
{
var recon = new Recon
{
ReportName = reportName,
ExcelData = ExcelManager.GetExcelData(reportName, message, data)
};
lock(_locker)
{
resultList.Add(recon);
}
}

If you modify shared state from multiple threads, make sure that no two threads can do so at the same time. In your case it is easy: just lock on the list.
private static void AddToResult(string reportName, IList data, string message, List<Recon> resultList)
{
var recon = new Recon
{
ReportName = reportName,
ExcelData = ExcelManager.GetExcelData(reportName, message, data)
};
lock (resultList)
{
resultList.Add(recon);
}
}
Also ensure that your code does not modify the data list.
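To see the locking advice in isolation, here is a minimal sketch that is not taken from the original code: the class name LockedAddDemo, the method AddInParallel and the lock object gate are hypothetical, and Parallel.For stands in for the Parallel.Invoke call. It assumes the expensive work can be done outside the lock, so that only the Add itself is serialized.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedAddDemo
{
    // Adds `count` items to one shared List<int> from parallel workers,
    // taking a lock around every Add, and returns the final list size.
    public static int AddInParallel(int count)
    {
        var results = new List<int>();
        var gate = new object(); // dedicated lock object (hypothetical name)

        Parallel.For(0, count, i =>
        {
            // Stand-in for the expensive report work, done outside the lock.
            int value = i * 2;

            lock (gate)
            {
                // Only one thread at a time mutates the list.
                results.Add(value);
            }
        });

        return results.Count;
    }

    public static void Main()
    {
        Console.WriteLine(AddInParallel(10000));
    }
}
```

Without the lock, the same loop may lose items or throw, as in the question. A thread-safe collection such as ConcurrentBag<T> would be an alternative to locking, at the cost of changing the list type.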


C# Code takes too long to run. Is there a way to make it finish quicker?

I need some help. If you pass a directory to my code, it goes into every folder in that directory and collects every single file. This way I managed to get around the "AccessDeniedException", BUT if the directory contains a lot of data and folders (for example C:/), it takes far too much time.
I don't really know how to multithread and I could not find any help on the internet. Is there a way to make the code run faster by multithreading? Or is it possible to have the code use more memory or cores? I really don't know and could use some advice.
My code to visit every file in every subdirectory:
public static List<string> Files = new List<string>();
public static List<string> Exceptions = new List<string>();
public MainWindow()
{
InitializeComponent();
}
private static void GetFilesRecursively(string path)
{
try
{
// The parameter must not be named "Directory", or it hides System.IO.Directory.
foreach (string subDirectory in Directory.GetDirectories(path))
GetFilesRecursively(subDirectory);
foreach (string file in Directory.GetFiles(path))
AddtoList(file);
} catch (System.Exception ex) { Exceptions.Add(ex.ToString()); }
}
private static void AddtoList(string Result)
{
Files.Add(Result);
}
private void Btn_Click(object sender, RoutedEventArgs e)
{
GetFilesRecursively(Textbox1.Text);
foreach(string C in Files)
Textbox2.Text += $"{C} \n";
}
You don't need recursion to avoid inaccessible files. You can use the EnumerateFiles overload that accepts an EnumerationOptions parameter and set EnumerationOptions.IgnoreInaccessible to true:
var options=new EnumerationOptions
{
IgnoreInaccessible=true,
RecurseSubdirectories=true
};
var files=Directory.EnumerateFiles(somePath,"*",options);
The loop that appends file paths is very expensive too. Not only does it create a new temporary string on each iteration, it also forces a UI redraw. You could improve speed and memory usage (which, because of garbage collection, also affects performance) by building a single string, e.g. with String.Join or a StringBuilder:
var text=String.Join("\n",files);
Textbox2.Text=text;
String.Join uses a StringBuilder internally whose internal buffer gets reallocated each time it's full. The previous buffer has to be garbage-collected. One could avoid even this by using a StringBuilder with a specific capacity. Even a rough estimate can reduce reallocations significantly:
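var builder=new StringBuilder(4096);
foreach(var file in files)
{
builder.AppendLine(file);
}
// Assign the text once, after the loop, so the UI redraws only once.
Textbox2.Text=builder.ToString();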
Create a class so that you can add a private field that tracks the directory depth.
Add a TaskCompletionSource<T> property to the class (TaskSource in the code below) and await its Task only when the depth exceeds the limit, raising an event so that your UI can hook in and ask the user.
If the user cancels, the task fails; if the user confirms, the search continues.
Some logic code:
public class FileLocator
{
public FileLocator(int maxDeep = 6)
{
_maxDeep = maxDeep;
TaskSource = new TaskCompletionSource<bool>();
ConfirmTask = TaskSource.Task;
}
private readonly int _maxDeep;
private int _deep;
// Raised once when the depth limit is reached, so the UI can ask the user.
public event Action<FileLocator> OnReachMaxDeep;
public Task ConfirmTask { get; }
public TaskCompletionSource<bool> TaskSource { get; }
public async Task<List<string>> GetFilesRecursivelyAsync(string directory)
{
var result = new List<string>();
foreach(xxxxxxx)
{
xxxxxxxxxxxxxx;
this._deep += 1;
if (_deep == _maxDeep)
{ OnReachMaxDeep?.Invoke(this); }
if (_deep >= _maxDeep)
{
try
{
// Wait for the user's decision; completes normally on confirm.
await ConfirmTask;
continue;
}
catch
{
// The user cancelled: return what was found so far.
return result;
}
}
}
return result;
}
}
and call
var locator = new FileLocator();
locator.OnReachMaxDeep += x =>
{
if (UI.Confirm()) { x.TaskSource.SetResult(true); }
else { x.TaskSource.SetException(new Exception()); }
};
var result = await locator.GetFilesRecursivelyAsync("C:");

c# How do I handle multiple return values from multiple threads with multithreading?

I'm trying to convert my single thread function into a multithreaded function. I've been reading up on threading, but I'm unsure of the proper structure needed to be able to start a thread for each server in the array while at the same time waiting for the threads to finish and receiving a return value from each one before being able to parse the return value data.
So the order of operations should be
Start a new thread for each server name
Each thread, when it ends, should return the output of the function runPowerShellScript(server)
Wait for each thread to end and then organize the data. If I have 5 servers, I will have 5 different variables with return values
Also, how does the OS/compiler handle a situation like this, where the return variable (returnVal) has the same name in each thread that is opened? I understand this is probably basic for someone who was classroom taught; since I am self-taught, I'm not sure what to do.
private void Run_Click(object sender, EventArgs e)
{
string[] servers = { "server1", "server2", "server3", "server4", "server5" };
foreach (string server in servers)
{
Collection<PSObject> returnVal = new Collection<PSObject>();
Thread psquery = new Thread(() => returnVal = runPowerShellScript(server)); // lambda expression assigns return value to function
psquery.Start();
psquery.Join(); // waits for thread to finish
}
// process data here
}
You could use the TAP pattern (https://learn.microsoft.com/en-us/dotnet/standard/asynchronous-programming-patterns/task-based-asynchronous-pattern-tap). Since you want to wait for each task to finish you can have the following approach:
private async void Run_Click(object sender, EventArgs e)
{
string[] servers = { "server1", "server2", "server3", "server4", "server5" };
var returnVal = new List<PSObject>();
foreach (var server in servers)
{
var result = await Task.Run(() => runPowerShellScript(server));
returnVal.AddRange(result); // runPowerShellScript returns a Collection<PSObject>
}
// process data here
}
Each task is awaited for its result, which is added to the returnVal list so you can use it in your code.
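Note that the loop above awaits each task before starting the next one, so the servers are still queried one after another. If the goal is to query them concurrently, one option is to start all tasks first and then await them together with Task.WhenAll. The following is a minimal sketch under stated assumptions: QueryServer is a hypothetical stand-in for runPowerShellScript that returns a string instead of a Collection<PSObject>, and WhenAllDemo is an illustrative name.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for runPowerShellScript(server).
    private static string QueryServer(string server)
    {
        return "result from " + server;
    }

    // Starts one task per server, then waits for all of them together.
    public static async Task<string[]> QueryAllAsync(string[] servers)
    {
        Task<string>[] tasks = servers
            .Select(server => Task.Run(() => QueryServer(server)))
            .ToArray();

        // The result array is in the same order as the tasks array,
        // regardless of which task finished first.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        string[] results = await QueryAllAsync(new[] { "server1", "server2", "server3" });
        foreach (string r in results)
        {
            Console.WriteLine(r);
        }
    }
}
```

Each task carries its own result, so there is no shared return variable to collide on; results[i] corresponds to servers[i].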
There is an article on Stack Overflow for further reading about your options.
As for a code example based on your question, using the TPL API:
/// <summary>
/// I didn't test this, I am assuming you can work out the details
/// (I prefer this way)
/// </summary>
/// <returns></returns>
private void Run_Click(object sender, EventArgs e)
{
var response = new ConcurrentBag<Collection<PSObject>>(); // one Collection<PSObject> per server
var exceptions = new ConcurrentQueue<Exception>();
string[] servers = { "server1", "server2", "server3", "server4", "server5" };
try
{
Parallel.ForEach(servers, server =>
{
response.Add(runPowerShellScript(server));
});
}
catch (AggregateException ae)
{
// "e" is already the EventArgs parameter, so the loop variable needs another name.
foreach (var inner in ae.InnerExceptions)
{
exceptions.Enqueue(inner);
}
}
if (!exceptions.IsEmpty)
{
//there are exceptions, do something with them
//do something?
}
try
{
// quote: // process data here
}
catch (Exception ex) // renamed: "e" is already the EventArgs parameter
{
// do something with it
}
}

EF and MVC - approach to work together

I have used the following approach for a long time (approximately 5 years):
Create one big class that initializes XXXEntities, instantiate it in the controller, and add one method for each DB action. Example:
public class DBRepository
{
private MyEntities _dbContext;
public DBRepository()
{
_dbContext = new MyEntities();
}
public NewsItem NewsItem(int ID)
{
var q = from i in _dbContext.News where i.ID == ID select new NewsItem() { ID = i.ID, FullText = i.FullText, Time = i.Time, Topic = i.Topic };
return q.FirstOrDefault();
}
public List<Screenshot> LastPublicScreenshots()
{
var q = from i in _dbContext.Screenshots where i.isPublic == true && i.ScreenshotStatus.Status == ScreenshotStatusKeys.LIVE orderby i.dateTimeServer descending select i;
return q.Take(5).ToList();
}
public void SetPublicScreenshot(string filename, bool val)
{
var screenshot = Get<Screenshot>(p => p.filename == filename);
if (screenshot != null)
{
screenshot.isPublic = val;
_dbContext.SaveChanges();
}
}
public void SomeMethod()
{
SomeEntity1 s1 = new SomeEntity1() { field1="fff", field2="aaa" };
_dbContext.SomeEntity1.Add(s1);
SomeEntity2 s2 = new SomeEntity2() { SE1 = s1 };
_dbContext.SomeEntity2.Add(s2);
_dbContext.SaveChanges();
}
}
Some external code creates a DBRepository object and calls its methods.
This worked fine, but now async operations have come in. If I use code like
public async void AddStatSimplePageAsync(string IPAddress, string login, string txt)
{
DateTime dateAdded2MinsAgo = DateTime.Now.AddMinutes(-2);
if ((from i in _dbContext.StatSimplePages where i.page == txt && i.dateAdded > dateAdded2MinsAgo select i).Count() == 0)
{
StatSimplePage item = new StatSimplePage() { IPAddress = IPAddress, login = login, page = txt, dateAdded = DateTime.Now };
_dbContext.StatSimplePages.Add(item);
await _dbContext.SaveChangesAsync();
}
}
there can be a situation where the next code executes before SaveChangesAsync has completed, and one more entity is added to _dbContext that should not be saved before some other actions. For example:
DBRepository _rep = new DBRepository();
_rep.AddStatSimplePageAsync("A", "b", "c");
_rep.SomeMethod();
I worry that SaveChanges will be called after the line
_dbContext.SomeEntity1.Add(s1);
but before
_dbContext.SomeEntity2.Add(s2);
(i.e. these 2 actions should be one atomic operation)
Am I right? Is my approach wrong now? Which approach should be used?
PS. As I understand it, the sequence will be as follows:
1. AddStatSimplePageAsync is called.
2. await _dbContext.SaveChangesAsync(); starts inside AddStatSimplePageAsync.
3. SomeMethod() starts; meanwhile _dbContext.SaveChangesAsync() from AddStatSimplePageAsync is executing in another (child) thread.
4. _dbContext.SaveChangesAsync() completes in the child thread while the main thread is executing something in SomeMethod().
OK, this time I think I've got your problem.
First, it's odd that you have two separate calls to the SaveChanges method. Usually you should try to call it once, at the end of all your operations, and then dispose of the context.
Even so, yes, your concerns are justified, but some clarifications are needed here.
When you encounter async or await, do not think about threads but about tasks; they are two different concepts.
Have a read of this great article. There is an image that will explain practically everything.
In short, if you do not await an async method, you run the risk that a subsequent operation could "harm" the execution of the first one. To solve it, simply await it.
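The "simply await it" advice only works if the method returns a Task; an async void method gives the caller nothing to await. Below is a minimal sketch of the awaited version. It does not use Entity Framework: FakeContext is a hypothetical stand-in whose SaveChangesAsync persists everything staged so far after a simulated delay, and AwaitDemo, AddStatAsync and RunAsync are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an EF context: Add() stages an item,
// SaveChangesAsync() persists everything staged so far after a delay.
public class FakeContext
{
    private readonly List<string> _pending = new List<string>();
    public List<string> Saved { get; } = new List<string>();

    public void Add(string item) => _pending.Add(item);

    public async Task SaveChangesAsync()
    {
        await Task.Delay(50); // simulated database round trip
        Saved.AddRange(_pending);
        _pending.Clear();
    }
}

public static class AwaitDemo
{
    // Returns Task (not void) so that callers can await completion.
    public static async Task AddStatAsync(FakeContext db, string page)
    {
        db.Add(page);
        await db.SaveChangesAsync();
    }

    public static async Task<int> RunAsync()
    {
        var db = new FakeContext();

        // The save inside AddStatAsync has finished before the next line runs.
        await AddStatAsync(db, "stat");

        // Both entities are staged and then saved together.
        db.Add("entity1");
        db.Add("entity2");
        await db.SaveChangesAsync();

        return db.Saved.Count;
    }
}
```

If the first call were not awaited, the delayed save could run while the two later entities are being staged, which is the interleaving the question worries about.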

WebClient - wait until file has downloaded

I'm developing a function to return a collection, generated from an xml file.
Initially, I was using a local xml file for testing, but now I'm ready to have the app download the real xml file from a server. I'm struggling to see how I could do this due to the fact a WebClient object needs to be given an OpenReadCompleted event handler - I cannot return the collection data from this, and also by the time this handler executes, the original function has ended.
My original code is as follows:
public static ObservableCollection<OutletViewModel> GetNear(GeoCoordinate location)
{
ObservableCollection<OutletViewModel> Items = new ObservableCollection<OutletViewModel>();
// Load a local XML doc to simulate server response for location
XDocument xDoc = XDocument.Load("SampleRemoteServer/outlet_list.xml");
foreach (XElement outlet in xDoc.Descendants("outlet"))
{
Items.Add(new OutletViewModel()
{
Name = outlet.Attribute("name").Value,
Cuisine = outlet.Attribute("cuisine").Value
});
}
return Items;
}
How can I load the file in this function, have the event handler run, and then continue the function?
The only way I can think of is to add a loop that keeps checking a variable, which is updated by the event handler code... and that doesn't sound like a good solution.
Thanks,
Josh
You move the foreach() loop to the completed event.
And that indeed means you cannot return anything from the original method. Make it a void.
This is how async I/O works, better get used to it. You will need to rethink your design.
You should start to take a look at async programming.
One (old school) way would be to implement a public event and subscribe to that event in the calling class.
However, using callbacks is more elegant. I whipped up a simple (useless, but still conceptually valid) example that you can build upon:
public static void Main(string[] args)
{
List<string> list = new List<string>();
GetData(data =>
{
foreach (var item in data)
{
list.Add(item);
Console.WriteLine(item);
}
Console.WriteLine("Done");
});
Console.ReadLine();
}
public static void GetData(Action<IEnumerable<string>> callback)
{
WebClient webClient = new WebClient();
webClient.DownloadStringCompleted += (s, e) =>
{
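// Check for failure first: reading e.Result after a failed download throws.
if (e.Error != null)
{
callback(Enumerable.Empty<string>());
return;
}
List<string> data = new List<string>();
for (int i = 0; i < 5; i++)
{
data.Add(e.Result);
}
callback(data);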
};
webClient.DownloadStringAsync(new Uri("http://www.google.com"));
}
If you want to jump onto the C# async bandwagon (link for WP7 implementation), you can implement it using the new async and await keywords:
public static async void DoSomeThing()
{
List<string> list = await GetDataAsync();
foreach (var item in list)
{
Console.WriteLine(item);
}
}
public static async Task<List<string>> GetDataAsync()
{
WebClient webClient = new WebClient();
string result = await webClient.DownloadStringTaskAsync(new Uri("http://www.google.com"));
List<string> data = new List<string>();
for (int i = 0; i < 5; i++)
{
data.Add(result);
}
return data;
}

Questions about code using Task queue for parallel web gets

So I've got this code to drill down into a hierarchy of XML documents from a REST API. I posted earlier to get advice on how to make it recursive, then I went ahead and made it parallel.
First, I was SHOCKED by how fast it ran - it pulled down 318 XML docs in just under 12 seconds, compared to well over 10 minutes single threaded - I really didn't expect to gain that much. Is there some catch to this, because it seems too good to be true?
Second, I suspect this code is implementing a common pattern but possibly in a non "idiomatic" way. I have kind of a "producer-consumer queue" happening, with two separate locking objects. Is there a more standard way I could have done this?
Code.
public class ResourceGetter
{
public ResourceGetter(ILogger logger, string url)
{
this.logger = logger;
this.rootURL = url;
}
public List<XDocument> GetResources()
{
GetResources(rootURL);
while (NumTasks() > 0) RemoveTask().Wait();
return resources;
}
void GetResources(string url)
{
logger.Log("Getting resources at " + url);
AddTask(Task.Factory.StartNew(new Action(() =>
{
var doc = XDocument.Parse(GetXml(url));
if (deserializer.CanDeserialize(doc.CreateReader()))
{
var rl = (resourceList)deserializer.Deserialize(doc.CreateReader());
foreach (var item in rl.resourceURL)
{
GetResources(url + item.location);
}
}
else
{
logger.Log("Got resource for " + url);
AddResrouce(doc);
}
})));
}
object resourceLock = new object();
List<XDocument> resources = new List<XDocument>();
void AddResrouce(XDocument doc)
{
lock (resourceLock)
{
logger.Log("add resource");
resources.Add(doc);
}
}
object taskLock = new object();
Queue<Task> tasks = new Queue<Task>();
void AddTask(Task task)
{
lock (taskLock)
{
tasks.Enqueue(task);
}
}
Task RemoveTask()
{
lock (taskLock)
{
return tasks.Dequeue();
}
}
int NumTasks()
{
lock (taskLock)
{
logger.Log(tasks.Count + " tasks left");
return tasks.Count;
}
}
ILogger logger;
XmlSerializer deserializer = new XmlSerializer(typeof(resourceList));
readonly string rootURL;
}
Just offhand, I wouldn't bother with the code for managing the task list, all the locking, and the NumTasks() method. It would be simpler to use a CountdownEvent, which is thread-safe to begin with. Just increment it when you create a new task and decrement it when a task finishes, much as you are doing now but without the locking.
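The CountdownEvent suggestion could be sketched as follows. This is an illustration under stated assumptions, not the original ResourceGetter: the web requests are replaced by a hypothetical tree in which every node above maxDepth has two children, and CountdownDemo, Crawl and Visit are made-up names. The event starts at 1 to account for the root task, and each task increments the count for a child before starting it, so the count cannot reach zero while work is still being scheduled.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class CountdownDemo
{
    // Walks a hypothetical tree in which each node above maxDepth has two
    // children, and returns the number of leaf "resources" collected.
    public static int Crawl(int maxDepth)
    {
        var leaves = new ConcurrentBag<string>();

        // Initial count of 1 represents the root task.
        using (var pending = new CountdownEvent(1))
        {
            Visit("root", 0, maxDepth, leaves, pending);
            pending.Wait(); // blocks until every task has signaled
        }

        return leaves.Count;
    }

    private static void Visit(string name, int depth, int maxDepth,
                              ConcurrentBag<string> leaves, CountdownEvent pending)
    {
        Task.Run(() =>
        {
            try
            {
                if (depth == maxDepth)
                {
                    leaves.Add(name); // leaf: collect it, like AddResrouce(doc)
                    return;
                }

                for (int i = 0; i < 2; i++)
                {
                    // Increment before starting the child, while this task
                    // still holds its own count.
                    pending.AddCount();
                    Visit(name + "/" + i, depth + 1, maxDepth, leaves, pending);
                }
            }
            finally
            {
                pending.Signal(); // this task is finished
            }
        });
    }
}
```

Compared with the queue of tasks, there is no explicit lock and nothing to dequeue; the caller only waits for the count to reach zero.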
