I have a method that takes an argument, runs a query against the database, retrieves records, processes them, and saves the processed records to a new table. Running the method from the service with one parameter works. What I am trying to achieve now is to make the parameter dynamic. I have implemented a method to retrieve the parameters and it works fine. Now I am trying to run the method in parallel for each of the parameters in the list. My current implementation is:
WorkerClass WorkerClass = new WorkerClass();
var ParametersList = WorkerClass.GetParams();
foreach (var item in ParametersList)
{
    WorkerClass WorkerClass2 = new WorkerClass();
    Parallel.Invoke(
        () => WorkerClass2.ProcessAndSaveMethod(item)
    );
}
In the implementation above, I think defining a new WorkerClass2 on every iteration defeats the whole point of Parallel.Invoke, but I get data mix-ups when reusing the already defined WorkerClass. The reason for the mix-up is that the Oracle connection is opened inside the Init() method of the class, and static DataTable DataCollectionList; is declared at class level, so concurrent calls share it.
Inside the method ProcessAndSaveMethod(item) I have:
OracleCommand Command = new OracleCommand(Query, OracleConnection);
OracleDataAdapter Adapter = new OracleDataAdapter(Command);
Adapter.Fill(DataCollectionList);
Inside Init():
try
{
    OracleConnection = new OracleConnection(Passengers.OracleConString);
    DataCollectionList = new DataTable();
    OracleConnection.Open();
    return true;
}
catch (Exception ex)
{
    OracleConnection.Close();
    DataCollectionList.Clear();
    return false;
}
And the method calls aren't actually run in parallel as I intended. Is there another way to implement this?
To run it in parallel you need to call Parallel.Invoke only once, passing in all of the actions to be executed:
Parallel.Invoke(
    ParametersList.Select(item =>
        new Action(() => new WorkerClass().ProcessAndSaveMethod(item))
    ).ToArray()
);
Creating a separate WorkerClass instance per item also keeps the shared connection and DataTable out of the picture.
If you have a list of things and want them processed in parallel, there really is no easier way than PLINQ:
var parametersList = SomeObject.SomeFunction();
var resultList = parametersList.AsParallel()
.Select(item => new WorkerClass().ProcessAndSaveMethod(item))
.ToList();
The fact that you build up a new connection and use a lot of variables local to the one item you process is fine. It's actually the preferred way to do multi-threading: keep as much local to the thread as you can.
That said, you have to measure if multi-threading is actually the fastest way to solve your problem. Maybe you can do your processing sequentially and then do all your database stuff in one go with bulk inserts, temporary tables or whatever is suited to your specific problem. Splitting a task into smaller tasks for more processors to run is not always faster. It's a tool and you need to find out if that tool is helping in your specific situation.
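As a rough sketch of that bulk alternative (the `ProcessItem` and `BulkInsert` helpers here are hypothetical placeholders, not methods from the question):

```csharp
// Sequential processing, single bulk write at the end.
// ProcessItem and BulkInsert are hypothetical placeholders for the
// per-item work and a set-based insert (e.g. a DataAdapter batch
// update or a database-specific bulk copy).
var results = new List<ProcessedRecord>();
foreach (var item in parametersList)
{
    results.Add(ProcessItem(item)); // CPU-bound work, no DB round-trip here
}
BulkInsert(results); // one round-trip instead of one insert per item
```

Whether this beats the parallel version depends entirely on where the time actually goes, which is why measuring first matters.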
I achieved parallel processing using the code below, and also avoided the null reference exception from DbCon.Open() caused by connection pooling, by limiting the degree of parallelism with the MaxDegreeOfParallelism option.
Parallel.ForEach(ParametersList, new ParallelOptions { MaxDegreeOfParallelism = 5 }, item =>
{
    WorkerClass worker = new WorkerClass();
    worker.ProcessAndSaveMethod(item);
});
Related
I wanted to know which is the better approach for solving the problem below.
I have a situation where I will be making a service call and a DB call, and based on the inputs I will do some calculations and return the mismatches between the two.
I have the sample snippet below. Could anyone advise whether making the class static is better than using instance methods, given that the call will come from Parallel.For where multiple threads will be using it at the same time?
//Sample call: will actually control the number of parallel calls using MaxDegreeOfParallelism
Parallel.For(1, 10, i =>
{
    SomeClass c = new SomeClass();
    var res = c.SomeMethod("test", "test");
});
public class SomeClass
{
    private readonly IDbRepository _IDbRepository = null;
    private readonly IServiceRepository _IServiceRepository = null;

    public SomeClass()
    {
        _IDbRepository = new DbRepository(); // Could use DI
        _IServiceRepository = new ServiceRepository(); // Could use DI
    }

    //Returns the list of errors calculated from the two data sources
    public List<Errors> SomeMethod(string param1, string param2)
    {
        var err = new List<Errors>();
        var dbData = _IDbRepository.GetDbData(param1, param2);
        var serviceData = _IServiceRepository.GetServiceData(param1, param2);
        //Based on service data and DB data, calculate errors and return them
        //Code logic
        //Multiple logic
        return err;
    }
}
In my opinion it makes sense to prepare a cache holding a set of class instances and to take an instance from the cache when required. It is better to avoid creating new class instances during parallel operations, because doing so may trigger GC work that blocks the running threads (collections mostly run on a single thread, but if they pause the application that is an issue). By the way, the best way to check what is going on is to use a profiling tool. I prefer PerfView from Microsoft, since it helps diagnose problems with the GC.
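One minimal way to sketch such a cache is a small pool over `ConcurrentBag<T>` (an illustration only, assuming default-constructible workers):

```csharp
using System.Collections.Concurrent;

public sealed class SimplePool<T> where T : new()
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

    // Pre-create instances up front so no allocations happen
    // while the parallel loop is running.
    public SimplePool(int size)
    {
        for (int i = 0; i < size; i++)
            _items.Add(new T());
    }

    public T Rent()
    {
        T item;
        return _items.TryTake(out item) ? item : new T();
    }

    public void Return(T item)
    {
        _items.Add(item);
    }
}
```

Inside Parallel.For, each iteration would Rent() an instance, use it, and Return() it, instead of newing up SomeClass every time.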
Description of the project
I am creating two Windows Forms applications: one can insert, search, and export (into a Word doc) the data, and the other will update and delete the data. The data comes from an Access database that has 10 tables:
STAKEHOLDERS_ST
STAKEHOLDERS_STAFF_SST
STAKEHOLDERS_RELATED_DOCUMENT_STRD
RECENT_INTERACTIONS_RI
RECENT_INTERACTIONS_PARTICIPANTS_RIP
RECENT_INTERACTION_MATERIALS_PREPARED_RIMP
ODI_STAFF_OS (did not use yet)
MEETINGS_MET
MEETINGS_NOTES_MN
BRIDGE_ST_MET
These tables are related like this:
In order to access the data from the DB, I decided to create dictionaries and lists and copy all the data into those collections. With that, the program can display the data in the GUI at a reasonable speed. Also, whenever the user wants to display certain information about a stakeholder, the program uses lambda expressions and LINQ to display the data. My first attempt was to get the information directly from the DB using SQL queries, but I noticed that the process was very slow.
My problem
In general, all the main functions (displaying data in the GUI and inserting, updating, and deleting data in the DB) work fine. My only concern is the amount of time needed to load the data into the collections and display the forms. More specifically, whenever the user launches a form, its form_load event populates 9 collections: 3 dictionaries and 6 lists. I noticed that it takes some time (~6 sec for 10 records in each table) to load the data and display the GUI.
Here's a sample of my code:
//Collection data
Dictionary<int, ST_Stakeholder> List_ST;
List<SST_StakeholderStaff> List_SST;
List<STRD_Stakeholder_Related_Document> List_STRD;
Dictionary<int, RI_Recent_Interaction> List_RI;
List<RIP_Recent_Interaction_Participants> List_RIP;
List<RIMP_Recent_Interaction_Materials_Prepared> List_RIMP;
Dictionary<int, MET_Meeting> List_MET;
List<MN_Meeting_notes> List_MN;
List<Tuple<int, int>> List_BR;
private void EditSTInfo_Load(object sender, EventArgs e)
{
    try
    {
        //Fill the collections
        List_ST = new Dictionary<int, ST_Stakeholder>(DatabaseConnection.DicStakeholderInformation());
        List_SST = new List<SST_StakeholderStaff>(DatabaseConnection.ListStakeholderStaff());
        List_STRD = new List<STRD_Stakeholder_Related_Document>(DatabaseConnection.ListStakeholderDocument());
        List_RI = new Dictionary<int, RI_Recent_Interaction>(DatabaseConnection.DicRecentInteraction());
        List_RIP = new List<RIP_Recent_Interaction_Participants>(DatabaseConnection.ListRecentInteractionParticipants());
        List_RIMP = new List<RIMP_Recent_Interaction_Materials_Prepared>(DatabaseConnection.ListRecentInteractionMaterials());
        List_MET = new Dictionary<int, MET_Meeting>(DatabaseConnection.DicMeeting());
        List_MN = new List<MN_Meeting_notes>(DatabaseConnection.ListMeetingNotes());
        List_BR = new List<Tuple<int, int>>(DatabaseConnection.DicBrige());

        //Display the stakeholder names in the list box
        foreach (KeyValuePair<int, ST_Stakeholder> item in List_ST)
        {
            lbListStakeholder.Items.Add(item.Value.ST_Name);
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
        DatabaseConnection.CloseDB();
    }
}
PS: DatabaseConnection is a static class in which most of the functions execute SQL commands. Here is an example, DatabaseConnection.DicStakeholderInformation():
//Display the data
//Function that returns a dictionary of stakeholder information
public static Dictionary<int, ST_Stakeholder> DicStakeholderInformation()
{
    Dictionary<int, ST_Stakeholder> result = new Dictionary<int, ST_Stakeholder>();
    _CncxBD.Open();
    OleDbCommand cmd = new OleDbCommand(stListStakeholderinfo, _CncxBD);
    OleDbDataReader row = cmd.ExecuteReader();
    while (row.Read())
    {
        result.Add((int)row["ST_ID"], new ST_Stakeholder(
            (string)row["ST_Name"], (string)row["ST_Address"], (string)row["ST_City"],
            (string)row["ST_Province"], (string)row["ST_Postal_Code"], (string)row["ST_Vision"],
            (string)row["ST_Objectives"], (string)row["ST_Major_Successes"], (string)row["ST_Gorvernance"],
            (string)row["ST_Funding_Structure"], (string)row["ST_Type_of_work"], (string)row["ST_Scope"],
            row["ST_Additional_Information"].ToString()));
    }
    _CncxBD.Close();
    return result;
}
With that said, I was wondering if someone could help me optimize the speed of all this work! My first idea was to create multiple threads to fill the collections, but I thought threads couldn't be used with these collections because they are not thread-safe. Anyway, I just need some ideas for speeding up the process.
You can probably create different tasks and run them on different threads:
Task.Factory.StartNew(() => { /* your code here */ });
Later on, if your next step depends on this data being loaded, you can add all the tasks you have created to a list:
List<Task> lst = new List<Task>();
and then make your application wait for all those tasks to finish before continuing:
Task.WaitAll(lst.ToArray());
Imagine that you group your procedures by the time they take and create 4 tasks; every task will run on a different thread, and they will run in parallel.
I hope this helps; there is probably a better way to do it, but I sometimes use this approach to speed up heavy processing.
If your next piece of code doesn't depend on the dictionaries being filled, just don't wait for the tasks and continue; they will keep working in the background.
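Putting those pieces together for the form-load case might look like the sketch below. Note that the question's DatabaseConnection class shares one static connection (`_CncxBD`), so each method would need to open its own connection for concurrent calls to be safe:

```csharp
var tasks = new List<Task>
{
    Task.Factory.StartNew(() => List_ST = new Dictionary<int, ST_Stakeholder>(DatabaseConnection.DicStakeholderInformation())),
    Task.Factory.StartNew(() => List_SST = new List<SST_StakeholderStaff>(DatabaseConnection.ListStakeholderStaff())),
    Task.Factory.StartNew(() => List_MET = new Dictionary<int, MET_Meeting>(DatabaseConnection.DicMeeting()))
    // ...one task per remaining collection
};

// Only block if the code right after this needs the data;
// otherwise skip the wait and let the tasks finish in the background.
Task.WaitAll(tasks.ToArray());
```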
So, I have a requirement to read each record (line) of a large data file and then apply various validation rules to each of these lines. Rather than just applying the validations sequentially, I decided to see if I could use some pipelining to help speed things up. I need to apply the same set of business validation rules (5 at the moment) to all items in my collection. As there is no need to return output from each validation process, I don't need to worry about passing values from one validation routine to another. I do, however, need to make the same data available to all my validation steps, and to do this I came up with copying the same data (records) to 5 different buffers, which will be used by each of the validation stages.
Below is the code I have so far. But I have little confidence in this approach and wanted to know if there is a better way of doing this, please? I appreciate any help you can give. Thanks in advance.
public static void LoadBuffers(List<BlockingCollection<FlattenedLoadDetail>> outputs,
    BlockingCollection<StudentDetail> students)
{
    try
    {
        foreach (var student in students)
        {
            foreach (var stub in student.RecordYearDetails)
                foreach (var buffer in outputs)
                    buffer.Add(stub);
        }
    }
    finally
    {
        foreach (var buffer in outputs)
            buffer.CompleteAdding();
    }
}

public void Process(BlockingCollection<StudentRecordDetail> StudentRecords)
{
    //Validate header record before proceeding
    if (!IsHeaderRecordValid)
        throw new Exception("Invalid Header Record Found.");

    const int buffersize = 20;
    var buffer1 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
    var buffer2 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
    var buffer3 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
    var buffer4 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
    var taskmonitor = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.NotOnCanceled);
    using (var loadUpStartBuffer = taskmonitor.StartNew(() => LoadBuffers(
        new List<BlockingCollection<FlattenedLoadDetail>> { buffer1, buffer2, buffer3, buffer4 },
        StudentRecords)))
    {
        var recordcreateDateValidationStage =
            taskmonitor.StartNew(() => ValidateRecordCreateDateActivity.Validate(buffer1));
        var uniqueStudentIDValidationStage =
            taskmonitor.StartNew(() => ValidateUniqueStudentIDActivity.Validate(buffer2));
        var SSNNumberRangeValidationStage =
            taskmonitor.StartNew(() => ValidateDocSequenceNumberActivity.Validate(buffer3));
        var SSNRecordNumberMatchValidationStage =
            taskmonitor.StartNew(() => ValidateStudentSSNRecordNumberActivity.Validate(buffer4));

        Task.WaitAll(loadUpStartBuffer, recordcreateDateValidationStage, uniqueStudentIDValidationStage,
            SSNNumberRangeValidationStage, SSNRecordNumberMatchValidationStage);
    }
}
In fact, if I could tie the tasks together in such a way that once one fails, all the others stop, that would help me a lot, but I am a newbie to this pattern and am trying to figure out the best way to handle this problem. Should I just throw caution to the wind and have each of the validation steps load an output buffer to be passed on to the subsequent task? Is that a better way to go?
The first question you need to answer for yourself is whether you want to improve latency or throughput.
The strategy you depicted takes a single item and performs parallel calculations on it. This means that an item is serviced very fast, but at the expense of other items that are left waiting for their turn.
Consider an alternative concurrent approach: treat the entire validation process as a sequential operation, but service more than one item at a time.
It seems to me that in your case you will benefit more from the latter approach, especially from the perspective of simplicity and since I am guessing that latency is not as important here.
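A sketch of that second approach, assuming per-record Validate overloads rather than the BlockingCollection-based ones in the question:

```csharp
// Each record runs the whole rule chain sequentially;
// many records are serviced concurrently.
Parallel.ForEach(StudentRecords, new ParallelOptions { MaxDegreeOfParallelism = 4 }, record =>
{
    ValidateRecordCreateDateActivity.Validate(record);
    ValidateUniqueStudentIDActivity.Validate(record);
    ValidateDocSequenceNumberActivity.Validate(record);
    ValidateStudentSSNRecordNumberActivity.Validate(record);
});
```

This also removes the buffer-copying step entirely, since every rule sees the same record instance.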
I have a scenario where a Windows service runs on the server. Every hour or so it reads a log file and saves the contents to the database.
Now there are going to be three files, and these must be read and saved to three different tables (I will read the connection strings etc. from a config file). I know this can be achieved with threading, so I want to call the existing 'read file' method on a thread.
I am not familiar with threading, but is this the way to go?
NameValueConfigurationCollection configs = ConfigurationManager.GetSection("LogDirectoryPath") as NameValueConfigurationCollection;
foreach (DictionaryEntry keyvalue in configs)
{
    string path = keyvalue.Key.ToString();
    Thread t = new Thread(() => ExecuteProcess(path));
    t.IsBackground = true;
    t.Start();
}
private void ExecuteProcess(string path)
{
    var xPathDocument = new XPathDocument(path);
    XPathNavigator xPathNavigator = xPathDocument.CreateNavigator();
    string connectionString = GetXPathQuery(xPathNavigator, "/connectionString/@value");
    string commandText = GetXPathQuery(xPathNavigator, "/commandText/@value");
    string filePath = GetXPathQuery(xPathNavigator, "/filePath/@value");
    SqlConnection sqlConnection = new SqlConnection(connectionString);
    sqlConnection.Open();
    ProcessFiles(sqlConnection, commandText, filePath);
}
Do I have to make the method static? What about the variables it uses?
In .NET 4 you could leverage the Task Parallel Library for this, so you would not have to explicitly create threads, but just express what code you want to execute in parallel:
Parallel.ForEach(configs.Select( x => x.Key.ToString()), path => ExecuteProcess(path));
My personal preference is to use the ThreadPool for work which I wish to run in parallel in a fire-and-forget scenario, e.g. something like:
foreach (DictionaryEntry keyvalue in configs)
{
    string path = keyvalue.Key.ToString(); // local copy to avoid capturing the loop variable
    ThreadPool.QueueUserWorkItem(state =>
    {
        ExecuteProcess(path);
    });
}
The Task Parallel Library is also useful, but it is not guaranteed to go parallel; as I understand it, it adapts to the number of processors. It does, however, synchronise all your worker threads back again at the end. So it depends on what you are trying to achieve.
Scenario
I have a line of code whereby I pass a good number of parameters into a method.
CODE as described above
foreach(Asset asset in assetList)
{
asset.ContributePrice(m_frontMonthPrice, m_Vol, m_divisor, m_refPrice, m_type,
m_overrideVol, i, m_decimalPlaces, metalUSDFID, metalEURFID);
}
What I really want to do...
What I really want to do is spawn a new thread every time I call this method so that the work gets done quicker (there are a lot of assets).
Envisaged CODE
foreach(Asset asset in assetList)
{
Thread myNewThread =
new Thread(new ThreadStart(asset.ContributePrice (m_frontMonthPrice, m_Vol,
m_divisor, m_refPrice, m_type, m_overrideVol, i, m_decimalPlaces, metalUSDFID,
metalEURFID)));
myNewThread.Start();
}
ISSUES
This is something which has always bothered me: why can't I pass the parameters into the thread? What difference does it make?
I can't see a way around this that won't involve lots of refactoring; this is an old application, built piece by piece as a result of feature creep, so the code itself is messy and hard to read/follow.
I thought I had pinpointed an area where I could save some time and increase the processing speed, but now I've hit a wall with this.
SUGGESTIONS?
Any help or suggestions would be greatly appreciated.
Cheers.
EDIT:
I'm using .Net 3.5.......I could potentially update to .Net 4.0
If you're using C# 3, the easiest way would be:
foreach(Asset asset in assetList)
{
Asset localAsset = asset;
ThreadStart ts = () => localAsset.ContributePrice (m_frontMonthPrice, m_Vol,
m_divisor, m_refPrice, m_type, m_overrideVol, i,
m_decimalPlaces, metalUSDFID, metalEURFID);
new Thread(ts).Start();
}
You need to take a "local" copy of the asset loop variable to avoid weird issues due to captured variables - Eric Lippert has a great blog entry on it.
In C# 2 you could do the same with an anonymous method:
foreach(Asset asset in assetList)
{
Asset localAsset = asset;
ThreadStart ts = delegate { localAsset.ContributePrice(m_frontMonthPrice,
m_Vol, m_divisor, m_refPrice, m_type, m_overrideVol, i,
m_decimalPlaces, metalUSDFID, metalEURFID); };
new Thread(ts).Start();
}
In .NET 4 it would probably be better to use Parallel.ForEach. Even before .NET 4, creating a new thread for each item may well not be a good idea - consider using the thread pool instead.
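For illustration, with .NET 4 the whole loop could collapse to something like this (same fields as in the question):

```csharp
Parallel.ForEach(assetList, asset =>
    asset.ContributePrice(m_frontMonthPrice, m_Vol, m_divisor, m_refPrice, m_type,
                          m_overrideVol, i, m_decimalPlaces, metalUSDFID, metalEURFID));
```

Here the lambda parameter is per-item, so there is no loop-variable capture problem, and the partitioner hands assets out to pool threads instead of paying the cost of one new thread per asset.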
Spawning a new thread for each task will most likely make the task run significantly slower. Use the thread pool for that as it amortizes the cost of creating new threads. If you're on .NET 4 take a look at the new Task class.
If you need to pass parameters to a thread when starting it, you must use the ParameterizedThreadStart delegate. If you need to pass several parameters, consider encapsulating them in a type.
You could use ParameterizedThreadStart. You'll need to wrap all of your parameters into a single object. (Untested code below; note that ContributePrice would need an overload taking a single object parameter to match the delegate's signature.)
struct ContributePriceParams
{
    public decimal FrontMonthPrice;
    public int Vol;
    //etc
}

//...

foreach (Asset asset in assetList)
{
    ContributePriceParams pStruct = new ContributePriceParams { FrontMonthPrice = m_frontMonthPrice, Vol = m_vol };
    ParameterizedThreadStart pStart = new ParameterizedThreadStart(asset.ContributePrice);
    Thread newThread = new Thread(pStart);
    newThread.Start(pStruct);
}