multithread doesn't improve the performance but make it even slower

multithread doesn't improve the performance but make it even slower - c#

I have a multithread application, compare CompareRow in the two List o1 and o2, and get the similarity, then store o1.CompareRow,o2.CompareRow, and Similarity in List, because the getSimilarity Process is very time consuming, and the data usually more than 1000, so using multithread should be helping, but in fact, it is not, pls
help point out, several things i already consider are
1. Database shouldnot be a problem, cause i already load the data into
two List<>
2. There is no shared writable data to complete
3. the order of the records is not a problem
pls help and it is urgent, the deadline is close....
public class OriginalRecord
{
public int PrimaryKey;
public string CompareRow;
}
===============================================
public class Record
{
// public ManualResetEvent _doneEvent;
public string r1 { get; set; }
public string r2 { get; set; }
public float similarity { get; set; }
public CountdownEvent CountDown;
public Record(string r1, string r2, ref CountdownEvent _countdown)
{
this.r1 = r1;
this.r2 = r2;
//similarity = GetSimilarity(r1, r2);
CountDown = _countdown;
}
public void ThreadPoolCallback(Object threadContext)
{
int threadIndex = (int)threadContext;
similarity = GetSimilarity(r1, r2);
CountDown.Signal();
}
private float GetSimilarity(object obj1, object obj2)
{
//Very time-consuming
ComparisionLibrary.MatchsMaker match
= new ComparisionLibrary.MatchsMaker (obj1.ToString(), obj2.ToString());
return match.Score;
}
}
================================================================
public partial class FormMain : Form
{
public FormMain()
{
InitializeComponent();
List<OriginalRecord> oList1=... //get loaded from database
List<OriginalRecord> oList2 =... //get loaded from database
int recordNum = oList1.Count * oList2.Count;
CountdownEvent _countdown = new CountdownEvent(recordNum);
Record[] rArray = new Record[recordNum];
int num = 0;
for (int i = 0; i <oList1.Count; i++)
{
for (int j = 0; j < oList2.Count; j++)
{
//Create a record instance
Record r
=new Record(oList1[i].CompareRow,oList2[j].CompareRow,ref _countdown);
rArray[num]=r;
//here use threadpool
ThreadPool.QueueUserWorkItem(r.ThreadPoolCallback, num);
num++;
}
}
_countdown.Wait();
List<Record> rList = rArray.ToList();
PopulateGridView(rList);
}
Here are the photos i capture in the debug mode
two things bring to my attention are
1. there are only 4 threads created to work but i set the minthreads in the threadpool is 10
2. as you can see, even 4 threads are created , but only one thread working at any time,
what is worse is sometimes none of the thread is working
by the way, the ComparisionLibrary is the library i download to do the heavy work
i cant post photo, would you pls leave me an email or sth that i can send the photos to you,thanks.

If you want to split your large task into several small tasks, do mind that parallelization only works if the small tasks are not too short in terms of run time. If you cut your 2 sec runtime job into 100.000 0.02 ms jobs, the overhead of distributing the jobs over the workers can be so great that the process runs much slower in parallel then it would normally do. Only if the overhead of parallelization is much smaller than the average runtime of one of the small tasks, you will see a performance gain. Try to cut up your problem in larger chunks.

Hi, my assumptions:
1) If your computer has 1 CPU than you won't gain any performance improvement. Indeed, you will lose in the performance, because of Context Switch.
2) Try to set a max threads value of the ThreadPool class.
ThreadPool.SetMaxThreads(x);
3) Try to use PLINQ.

As a guess, try to use TPL instead of plain ThreadPool, e.g.:
Replace
ThreadPool.QueueUserWorkItem(r.ThreadPoolCallback, num);
by such thing:
int nums = new int[1];
nums[0] = num;
Task t = null;
t = new Task(() =>
{
r.ThreadPoolCallback(nums[0]);
});
t.Start();
EDIT
You're trying to compare two list1, list2, each item in separate thread from pool, which is list1.Count*list2.Count thread pool calls, which looks for me like to much paralelization(this could cause many context switches) .
Try to split all comparison tasks onto several groups(e.g. 4-8) and start each one in separate thread. Maybe this could help. But again, this is only an assumption.

Related

C# - thread-safe implementation of a class with fixed array (queue/buffer?) (ADD, REMOVE, MODIFY)

I'm new to programming, threading and thread safe synchronization in general, been reading a lot and searching, found dozens of similar questions - however couldn't find anything that would help me out in the end - or either my lack of knowledge about the topic didn't allow me to notice/understand the answer.
I've created class with 3 data fields (string, int, double) and 1 method (will be used to calculate new value - which later will be compared to selected criteria for filtering).
Then I want my main thread to perform actions in such order:
Reads data (3 files) to selected data structure. - let's ignore these
Runs selected amount of threads (2 <= x <= n/4, n - record count in file), which will take records, perform method on them and if the result meets the criteria - record will be written to results data structure.
To the data structure, which was used to pass data via threads, inserts all read data.
Waits for all threads to finish.
Writes data from results structure to file. - let's ignore these
Thread reqs:
When all data is processed, threads automatically finish the job. While at least a single thread is working, main thread is waiting for it to finish.
Data to threads is passed via synchronized data structure: monitor. I need to implement it as a class or structure which contains add and remove methods.
Data inside the monitor class is stored to array (arr.size <= all data size / 2)
If an attempt is made to read/delete from the monitor when the internal array is empty or write/add when the internal array is full, the thread is blocked - via conditional synchronization, threads execute only after some condition is met, I assume. (Using lock, unlock, Monitor.Wait Pulse and PulseAll methods).
Result data structure is the same as initial data structure, except no fixed array size. Data to result structure must be added in order.
At the moment I have this (pseudo-ish):
Also I've been trying to search for different implementations of what I'm talking about and kept failing miserably, it's 7 AM now - so code is a mess, I excuse myself for that
class Computer // 3 field class.
{
public string _model { get; set; }
public int _ram { get; set; }
public double _screen { get; set; }
double methodForFilterValue(Computer computer)
{
return computer._screen + computer._ram;
}
}
class Computers // Monitor class.
{
Computer[] _computers; // array of computers - buffer.
int _bufferHead { get; set; } // circular buffer properties.
int _counter { get; set; }
int _bufferTail { get; set; }
int _length { get; private set; }
private readonly object _locker = new Object();
Computers (int length)
{
_bufferHead = _bufferTail = -1;
_length = length;
_computers = new Computer[length];
_counter = 0;
}
Computers(Computer[] computers)
{
_length = computers.Length;
_computers = computers;
_bufferHead = 0;
_bufferTail = _length - 1;
_counter = computers.ToList().Count;
}
Add(Computer computer)
{
lock (_locker)
{
if((_bufferHead == 0 && _bufferTail == _length - 1) || (_bufferTail +1 == _bufferHead))
{
//means Empty.
return;
}
else
{
if (_bufferTail == _length - 1)
_bufferTail = 0;
else
_bufferTail++;
_computers[_bufferTail] = computer;
_counter++;
}
if (_bufferHead == -1)
_bufferHead = 0;
}
}
Remove()
{
lock (_locker){} -- not implemented yet.
}
}
class ResultStructure // Result class - for filtered records, printing to file.
{
public Computer[] _resultComputers;
public int _capacity { get; set; }
public int _counter { get; set; }
private readonly object _locker = new Object();
private const int INITIAL_BUFFER_SIZE = 10;
public ResultStructure()
{
_resultComputers = new Computer[INITIAL_BUFFER_SIZE];
_capacity = INITIAL_BUFFER_SIZE;
_counter = 0;
}
public ResultStructure(int length)
{
_resultComputers = new Computer[length];
_capacity = length;
_counter = 0;
}
public void AddSorted(Computer computer)
{
-- yet to be implemented also.
}
}
class Program
{
static void Main(string[] args)
{
int size = 90; // single file size: 30 lines.
string path = AppDomain.CurrentDomain.BaseDirectory;
string[] filePaths = Directory.GetFiles(file_*.json");
int selectedCriteria = 20; // each computer's methodForFilterValue() will have to result in a higher value than 20 for it to be added to resultStructure.
Computers allComputers = new Computers(size/2);
Computers[] computers = new Computers[filePaths.Length];
ResultStructure result = new ResultStructure(size);
**//STEPS:**
**//1. Data read.**
for (int i = 0; i < filePaths.Length; i++)
{
computers[i] = ReadJson(filePaths[i]);
}
**//2. ??? - Need explanation on this part.**
**//Basically this part about: iterating through each record in computers[i] - calling methodForFilterValue() function, checking if the result of that function is > 20 and if yes, add record to result._resultComputers array. Items should be added while leaving the array sorted. "AddSorted()"**
**//3. Run threads - addAll items to the same data struct.**
var threadsAdd = Enumerable.Range(0, filePaths.Length).Select(i => new Thread(() =>
{
for (int j = 0; j < computers[i]._computers.Length; j++)
{
allComputers.Add(computers[i]._computers[j]);
}
}));
var threads = threadsAdd.ToList();
foreach (var thread in threads)
{
thread.Start();
}
**//4. wait for all threads to finish**
foreach (var thread in threads)
{
thread.Join();
}
**5.** WriteToFile();
Write("Program finished execution");
}
Edit:
As asked will narrow it down to:
How to create thread-safe add, remove methods - run them on multiple threads, while in Computers class having limited size array - blocking threads when array is empty or full.
In total records: 90
, Computers class array = [45];
Should say also that I'm specifically required not to use concurrent collections.
code if needed!

Since you are new to multithreading, the answer about what you should do is simple: lock every time you access a shared resource. "Access" means read and write. "Shared resource" means every variable or field or property that is accessed by more than one threads concurrently. Use the same lock object to protect the same variable. Release the lock as soon as possible.
This advice is not enough for creating a performant application, nor is enough for avoiding deadlocks, but at least your data will not be corrupted.

Be aware that some concurrent collections exists in C# and should be usefull for you.
You should use :
ConcurentDictionary<string, Computer>
if you have a computer identifier.

Correct way to check performance of piece of code in C#

Just assume I have two pieces of code and I want to check CPU usage and Memory of these codes and compare together, is this a good way to check performance:
public class CodeChecker: IDisposable
{
public PerformanceResult Check(Action<int> codeToTest, int loopLength)
{
var stopWatch = new Stopwatch();
stopWatch.Start();
for(var i = 0; i < loopLength; i++)
{
codeToTest.Invoke(i);
}
stopWatch.Stop();
var process = Process.GetCurrentProcess();
var result = new PerformanceResult(stopWatch.ElapsedMilliseconds, process.PrivateMemorySize64);
return result;
}
}
public class PerformanceResult
{
public long DurationMilliseconds { get; set; }
public long PrivateMemoryBytes { get; set; }
public PerformanceResult(long durationMilliseconds, long privateMemoryBytes)
{
DurationMilliseconds = durationMilliseconds;
PrivateMemoryBytes = privateMemoryBytes;
}
public override string ToString()
{
return $"Duration: {DurationMilliseconds} - Memory: {PrivateMemoryBytes}";
}
}
And:
static void Main(string[] args)
{
Console.WriteLine("Start!");
int loopLength = 10000000;
var collection = new Dictionary<int, Target>();
PerformanceResult result;
using (var codeChecker = new CodeChecker())
{
result = codeChecker.Check((int i) => collection.Add(i, new Target()) , loopLength);
}
Console.WriteLine($"Dict Performance: {result}");
var list = new List<Target>();
using(var codeChecker = new CodeChecker())
{
result = codeChecker.Check((int i) => list.Add(new Target()), loopLength);
}
Console.WriteLine($"List Performance: {result}");
Console.ReadLine();
}
I'm looking for checking performance programmatically and I want to check piece of code specifically not all the application.
Any suggestion to improve aforementioned code?
And I will open to any suggestion for using free tools.

There are lot of factors which may impose a bias into your measurement including CLR and JIT compiler influence, heap state, cold or hot run, overall load in your system, etc. Ideally you need to isolate the pieces of code you'd like to benchmark from each other to exclude mutual impact, benchmark only hot runs, not cold to exclude JIT compilation and other cold run factors and what is most important you need to conduct multiple runs to obtain statistical information as single run can be not representative especially on a system which implies multitasking. Luckily you don't have to do it everything manually - there is great library for bench-marking which does all things mentioned and much more and which is widely used in various .NET projects.

Is parallelism usefull with a lock for datatable write transactions

Will parallelism help with performance for a locked object, should it be run single threaded, or is there another technique?
I noticed that when accessing a dataset and adding rows from multiple threads exceptions were thrown. Therefore I created a "thread-safe" version to add rows by locking the table prior to updating the row. This implementation works but is appears slow with many transactions.
public partial class HaMmeRffl
{
public partial class PlayerStatsDataTable
{
public void AddPlayerStatsRow(int PlayerID, int Year, int StatEnum, int Value, DateTime Timestamp)
{
lock (TeamMemberData.Dataset.PlayerStats)
{
HaMmeRffl.PlayerStatsRow testrow = TeamMemberData.Dataset.PlayerStats.FindByPlayerIDYearStatEnum(PlayerID, Year, StatEnum);
if (testrow == null)
{
HaMmeRffl.PlayerStatsRow newRow = TeamMemberData.Dataset.PlayerStats.NewPlayerStatsRow();
newRow.PlayerID = PlayerID;
newRow.Year = Year;
newRow.StatEnum = StatEnum;
newRow.Value = Value;
newRow.Timestamp = Timestamp;
TeamMemberData.Dataset.PlayerStats.AddPlayerStatsRow(newRow);
}
else
{
testrow.Value = Value;
testrow.Timestamp = Timestamp;
}
}
}
}
}
Now I can call this safely from multiple threads, but does it actually buy me anything? Can I do this differently for better performance. For instance is there any way to use System.Collections.Concurrent namespace to optimize performance or any other methods?
In addition, I update the underlying database after the entire dataset is updated and that takes a very long time. Would that be considered an I/O operation and be worth using parallel processing by updating it after each row is updated in the dataset (or some number of rows).
UPDATE
I wrote some code to test concurrent vs sequential processing which shows it takes about 30% longer to do concurrent processing and I should use sequential processing here. I assume this is because the lock on the database is causing the overhead on the ConcurrentQueue to be more costly than the gains from parallel processing. Is this conclusion correct and is there anything that I can do to speed up the processing, or am I stuck as for a Datatable "You must synchronize any write operations".
Here is my test code which might not be scientifically correct. Here is the timer and calls between them.
dbTimer.Restart();
Queue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRow = InsertToPlayerQ(addUpdatePlayers);
Queue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRow = InsertToPlayerStatQ(addUpdatePlayers);
UpdatePlayerStatsInDB(addPlayerRow, addPlayerStatRow);
dbTimer.Stop();
System.Diagnostics.Debug.Print("Writing to the dataset took {0} seconds single threaded", dbTimer.Elapsed.TotalSeconds);
dbTimer.Restart();
ConcurrentQueue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows = InsertToPlayerQueue(addUpdatePlayers);
ConcurrentQueue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows = InsertToPlayerStatQueue(addUpdatePlayers);
UpdatePlayerStatsInDB(addPlayerRows, addPlayerStatRows);
dbTimer.Stop();
System.Diagnostics.Debug.Print("Writing to the dataset took {0} seconds concurrently", dbTimer.Elapsed.TotalSeconds);
In both examples I add to the Queue and ConcurrentQueue in an identical manner single threaded. The only difference is the insertion into the datatable. The single-threaded approach inserts as follows:
private static void UpdatePlayerStatsInDB(Queue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows, Queue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows)
{
try
{
HaMmeRffl.PlayersRow.PlayerValue row;
while (addPlayerRows.Count > 0)
{
row = addPlayerRows.Dequeue();
TeamMemberData.Dataset.Players.AddPlayersRow(
row.PlayerID, row.Name, row.PosEnum, row.DepthEnum,
row.TeamID, row.RosterTimestamp, row.DepthTimestamp,
row.Active, row.NewsUpdate);
}
}
catch (Exception)
{
TeamMemberData.Dataset.Players.RejectChanges();
}
try
{
HaMmeRffl.PlayerStatsRow.PlayerStatValue row;
while (addPlayerStatRows.Count > 0)
{
row = addPlayerStatRows.Dequeue();
TeamMemberData.Dataset.PlayerStats.AddUpdatePlayerStatsRow(
row.PlayerID, row.Year, row.StatEnum, row.Value, row.Timestamp);
}
}
catch (Exception)
{
TeamMemberData.Dataset.PlayerStats.RejectChanges();
}
TeamMemberData.Dataset.Players.AcceptChanges();
TeamMemberData.Dataset.PlayerStats.AcceptChanges();
}
The concurrent adds as follows
private static void UpdatePlayerStatsInDB(ConcurrentQueue<HaMmeRffl.PlayersRow.PlayerValue> addPlayerRows, ConcurrentQueue<HaMmeRffl.PlayerStatsRow.PlayerStatValue> addPlayerStatRows)
{
Action actionPlayer = () =>
{
HaMmeRffl.PlayersRow.PlayerValue row;
while (addPlayerRows.TryDequeue(out row))
{
TeamMemberData.Dataset.Players.AddPlayersRow(
row.PlayerID, row.Name, row.PosEnum, row.DepthEnum,
row.TeamID, row.RosterTimestamp, row.DepthTimestamp,
row.Active, row.NewsUpdate);
}
};
Action actionPlayerStat = () =>
{
HaMmeRffl.PlayerStatsRow.PlayerStatValue row;
while (addPlayerStatRows.TryDequeue(out row))
{
TeamMemberData.Dataset.PlayerStats.AddUpdatePlayerStatsRow(
row.PlayerID, row.Year, row.StatEnum, row.Value, row.Timestamp);
}
};
Action[] actions = new Action[Environment.ProcessorCount * 2];
for (int i = 0; i < Environment.ProcessorCount; i++)
{
actions[i * 2] = actionPlayer;
actions[i * 2 + 1] = actionPlayerStat;
}
try
{
// Start ProcessorCount concurrent consuming actions.
Parallel.Invoke(actions);
}
catch (Exception)
{
TeamMemberData.Dataset.Players.RejectChanges();
TeamMemberData.Dataset.PlayerStats.RejectChanges();
}
TeamMemberData.Dataset.Players.AcceptChanges();
TeamMemberData.Dataset.PlayerStats.AcceptChanges();
}
The difference in time is 4.6 seconds for the single-threaded and 6.1 for the parallel.Invoke.

Lock & transactions are not good for parallelism and performance.
1)Try avoid lock：Will different threads need to update the same Row in dataset?
2)minimize lock time.
For db operation use may try Batch Update future of ADO.NET: http://msdn.microsoft.com/en-us/library/ms810297.aspx

Multithreading can help upto an extent because once the data across your app boundary , you will start waiting for I/O , here you can do asynchronous processing because your app does not have control over various parameters ( Resource access , Network speed etc),this will give better user experience (If UI app).
Now for your scenario , you may want to use some sort of producer/consumer queue , as soon as a row is available in queue , a different thread start processing it but again this will work upto an extent.

Java is scaling much worse than C# over many cores?

I am testing spawning off many threads running the same function on a 32 core server for Java and C#. I run the application with 1000 iterations of the function, which is batched across either 1,2,4,8, 16 or 32 threads using a threadpool.
At 1, 2, 4, 8 and 16 concurrent threads Java is at least twice as fast as C#. However, as the number of threads increases, the gap closes and by 32 threads C# has nearly the same average run-time, but Java occasionally takes 2000ms (whereas both languages are usually running about 400ms). Java is starting to get worse with massive spikes in the time taken per thread iteration.
EDIT This is Windows Server 2008
EDIT2 I have changed the code below to show using the Executor Service threadpool. I have also installed Java 7.
I have set the following optimisations in the hotspot VM:
-XX:+UseConcMarkSweepGC -Xmx 6000
but it still hasnt made things any better. The only difference between the code is that im using the below threadpool and for the C# version we use:
http://www.codeproject.com/Articles/7933/Smart-Thread-Pool
Is there a way to make the Java more optimised? Perhaos you could explain why I am seeing this massive degradation in performance?
Is there a more efficient Java threadpool?
(Please note, I do not mean by changing the test function)
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
public class PoolDemo {
static long FastestMemory = 2000000;
static long SlowestMemory = 0;
static long TotalTime;
static int[] FileArray;
static DataOutputStream outs;
static FileOutputStream fout;
static Byte myByte = 0;
public static void main(String[] args) throws InterruptedException, FileNotFoundException {
int Iterations = Integer.parseInt(args[0]);
int ThreadSize = Integer.parseInt(args[1]);
FileArray = new int[Iterations];
fout = new FileOutputStream("server_testing.csv");
// fixed pool, unlimited queue
ExecutorService service = Executors.newFixedThreadPool(ThreadSize);
ThreadPoolExecutor executor = (ThreadPoolExecutor) service;
for(int i = 0; i<Iterations; i++) {
Task t = new Task(i);
executor.execute(t);
}
for(int j=0; j<FileArray.length; j++){
new PrintStream(fout).println(FileArray[j] + ",");
}
}
private static class Task implements Runnable {
private int ID;
public Task(int index) {
this.ID = index;
}
public void run() {
long Start = System.currentTimeMillis();
int Size1 = 100000;
int Size2 = 2 * Size1;
int Size3 = Size1;
byte[] list1 = new byte[Size1];
byte[] list2 = new byte[Size2];
byte[] list3 = new byte[Size3];
for(int i=0; i<Size1; i++){
list1[i] = myByte;
}
for (int i = 0; i < Size2; i=i+2)
{
list2[i] = myByte;
}
for (int i = 0; i < Size3; i++)
{
byte temp = list1[i];
byte temp2 = list2[i];
list3[i] = temp;
list2[i] = temp;
list1[i] = temp2;
}
long Finish = System.currentTimeMillis();
long Duration = Finish - Start;
TotalTime += Duration;
FileArray[this.ID] = (int)Duration;
System.out.println("Individual Time " + this.ID + " \t: " + (Duration) + " ms");
if(Duration < FastestMemory){
FastestMemory = Duration;
}
if (Duration > SlowestMemory)
{
SlowestMemory = Duration;
}
}
}
}

Summary
Below are the original response, update 1, and update 2. Update 1 talks about dealing with the race conditions around the test statistic variables by using concurrency structures. Update 2 is a much simpler way of dealing with the race condition issue. Hopefully no more updates from me - sorry for the length of the response but multithreaded programming is complicated!
Original Response
The only difference between the code is that im using the below
threadpool
I would say that is an absolutely huge difference. It's difficult to compare the performance of the two languages when their thread pool implementations are completely different blocks of code, written in user space. The thread pool implementation could have enormous impact on performance.
You should consider using Java's own built-in thread pools. See ThreadPoolExecutor and the entire java.util.concurrent package of which it is part. The Executors class has convenient static factory methods for pools and is a good higher level interface. All you need is JDK 1.5+, though the newer, the better. The fork/join solutions mentioned by other posters are also part of this package - as mentioned, they require 1.7+.
Update 1 - Addressing race conditions by using concurrency structures
You have race conditions around the setting of FastestMemory, SlowestMemory, and TotalTime. For the first two, you are doing the < and > testing and then the setting in more than one step. This is not atomic; there is certainly the chance that another thread will update these values in between the testing and the setting. The += setting of TotalTime is also non-atomic: a test and set in disguise.
Here are some suggested fixes.
TotalTime
The goal here is a threadsafe, atomic += of TotalTime.
// At the top of everything
import java.util.concurrent.atomic.AtomicLong;
...
// In PoolDemo
static AtomicLong TotalTime = new AtomicLong();
...
// In Task, where you currently do the TotalTime += piece
TotalTime.addAndGet (Duration);
FastestMemory / SlowestMemory
The goal here is testing and updating FastestMemory and SlowestMemory each in an atomic step, so no thread can slip in between the test and update steps to cause a race condition.
Simplest approach:
Protect the testing and setting of the variables using the class itself as a monitor. We need a monitor that contains the variables in order to guarantee synchronized visibility (thanks #A.H. for catching this.) We have to use the class itself because everything is static.
// In Task
synchronized (PoolDemo.class) {
if (Duration < FastestMemory) {
FastestMemory = Duration;
}
if (Duration > SlowestMemory) {
SlowestMemory = Duration;
}
}
Intermediate approach:
You may not like taking the whole class for the monitor, or exposing the monitor by using the class, etc. You could do a separate monitor that does not itself contain FastestMemory and SlowestMemory, but you will then run into synchronization visibility issues. You get around this by using the volatile keyword.
// In PoolDemo
static Integer _monitor = new Integer(1);
static volatile long FastestMemory = 2000000;
static volatile long SlowestMemory = 0;
...
// In Task
synchronized (PoolDemo._monitor) {
if (Duration < FastestMemory) {
FastestMemory = Duration;
}
if (Duration > SlowestMemory) {
SlowestMemory = Duration;
}
}
Advanced approach:
Here we use the java.util.concurrent.atomic classes instead of monitors. Under heavy contention, this should perform better than the synchronized approach. Try it and see.
// At the top of everything
import java.util.concurrent.atomic.AtomicLong;
. . . .
// In PoolDemo
static AtomicLong FastestMemory = new AtomicLong(2000000);
static AtomicLong SlowestMemory = new AtomicLong(0);
. . . . .
// In Task
long temp = FastestMemory.get();
while (Duration < temp) {
if (!FastestMemory.compareAndSet (temp, Duration)) {
temp = FastestMemory.get();
}
}
temp = SlowestMemory.get();
while (Duration > temp) {
if (!SlowestMemory.compareAndSet (temp, Duration)) {
temp = SlowestMemory.get();
}
}
Let me know what happens after this. It may not fix your problem, but the race condition around the very variables that track your performance is too dangerous to ignore.
I originally posted this update as a comment but moved it here so that I would have room to show code. This update has been through a few iterations - thanks to A.H. for catching a bug I had in an earlier version. Anything in this update supersedes anything in the comment.
Last but not least, an excellent source covering all this material is Java Concurrency in Practice, the best book on Java concurrency, and one of the best Java books overall.
Update 2 - Addressing race conditions in a much simpler way
I recently noticed that your current code will never terminate unless you add executorService.shutdown(). That is, the non-daemon threads living in that pool must be terminated or else the main thread will never exit. This got me to thinking that since we have to wait for all threads to exit, why not compare their durations after they finished, and thus bypass the concurrent updating of FastestMemory, etc. altogether? This is simpler and could be faster; there's no more locking or CAS overhead, and you are already doing an iteration of FileArray at the end of things anyway.
The other thing we can take advantage of is that your concurrent updating of FileArray is perfectly safe, since each thread is writing to a separate cell, and since there is no reading of FileArray during the writing of it.
With that, you make the following changes:
// In PoolDemo
// This part is the same, just so you know where we are
for(int i = 0; i<Iterations; i++) {
Task t = new Task(i);
executor.execute(t);
}
// CHANGES BEGIN HERE
// Will block till all tasks finish. Required regardless.
executor.shutdown();
executor.awaitTermination(10, TimeUnit.SECONDS);
for(int j=0; j<FileArray.length; j++){
long duration = FileArray[j];
TotalTime += duration;
if (duration < FastestMemory) {
FastestMemory = duration;
}
if (duration > SlowestMemory) {
SlowestMemory = duration;
}
new PrintStream(fout).println(FileArray[j] + ",");
}
. . .
// In Task
// Ending of Task.run() now looks like this
long Finish = System.currentTimeMillis();
long Duration = Finish - Start;
FileArray[this.ID] = (int)Duration;
System.out.println("Individual Time " + this.ID + " \t: " + (Duration) + " ms");
Give this approach a shot as well.
You should definitely be checking your C# code for similar race conditions.

...but Java occasionally takes 2000ms...
And
byte[] list1 = new byte[Size1];
byte[] list2 = new byte[Size2];
byte[] list3 = new byte[Size3];
The hickups will be the garbage collector cleaning up your arrays. If you really want to tune that I suggest you use some kind of cache for the arrays.
Edit
This one
System.out.println("Individual Time " + this.ID + " \t: " + (Duration) + " ms");
does one or more synchronized internally. So your highly "concurrent" code will be serialized quite good at this point. Just remove it and retest.

While #sparc_spread's answer is great, another thing I've noticed is this:
I run the application with 1000 iterations of the function
Notice that the HotSpot JVM is working on interpreted mode for the first 1.5k iterations of any function on client mode, and for 10k iterations on server mode. Computers with that many cores are automatically considered "servers" by the HotSpot JVM.
That would mean that C# would do JIT (and run in machine code) before Java does, and has a chance for better performance at the function runtime. Try increasing the iterations to 20,000 and start counting from 10k iteration.
The rationale here is that the JVM collects statistical data for how to do JIT best. It trusts that your function is going to be run a lot through time, so it takes a "slow bootstrapping" mechanism for a faster runtime overall. Or in their words "20% of the functions run 80% of the time", so why JIT them all?

Are you using java6? Java 7 comes with features to improve performance in parallel programing:
http://www.oracle.com/technetwork/articles/java/fork-join-422606.html

static method vs instance method, multi threading, performance

Can you help explain how multiple threads access static methods? Are multiple threads able to access the static method concurrently?
To me it would seem logical that if a method is static that would make it a single resouce that is shared by all the threads. Therefore only one thread would be able to use it at a time. I have created a console app to test this. But from the results of my test it would appear that my assumption is incorrect.
In my test a number of Worker objects are constructed. Each Worker has a number of passwords and keys. Each Worker has an instance method that hashes it's passwords with it's keys. There is also a static method which has exactly the same implementation, the only difference being that it is static. After all the Worker objects have been created the start time is written to the console. Then a DoInstanceWork event is raised and all of the Worker objects queue their useInstanceMethod to the threadpool. When all the methods or all the Worker objects have completed the time it took for them all to complete is calculated from the start time and is written to the console. Then the start time is set to the current time and the DoStaticWork event is raised. This time all the Worker objects queue their useStaticMethod to the threadpool. And when all these method calls have completed the time it took until they had all completed is again calculated and written to the console.
I was expecting the time taken when the objects use their instance method to be 1/8 of the time taken when they use the static method. 1/8 because my machine has 4 cores and 8 virtual threads. But it wasn't. In fact the time taken when using the static method was actually fractionally faster.
How is this so? What is happening under the hood? Does each thread get it's own copy of the static method?
Here is the Console app-
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading;
namespace bottleneckTest
{
public delegate void workDelegate();
class Program
{
static int num = 1024;
public static DateTime start;
static int complete = 0;
public static event workDelegate DoInstanceWork;
public static event workDelegate DoStaticWork;
static bool flag = false;
static void Main(string[] args)
{
List<Worker> workers = new List<Worker>();
for( int i = 0; i < num; i++){
workers.Add(new Worker(i, num));
}
start = DateTime.UtcNow;
Console.WriteLine(start.ToString());
DoInstanceWork();
Console.ReadLine();
}
public static void Timer()
{
complete++;
if (complete == num)
{
TimeSpan duration = DateTime.UtcNow - Program.start;
Console.WriteLine("Duration: {0}", duration.ToString());
complete = 0;
if (!flag)
{
flag = true;
Program.start = DateTime.UtcNow;
DoStaticWork();
}
}
}
}
public class Worker
{
int _id;
int _num;
KeyedHashAlgorithm hashAlgorithm;
int keyLength;
Random random;
List<byte[]> _passwords;
List<byte[]> _keys;
List<byte[]> hashes;
public Worker(int id, int num)
{
this._id = id;
this._num = num;
hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
keyLength = hashAlgorithm.Key.Length;
random = new Random();
_passwords = new List<byte[]>();
_keys = new List<byte[]>();
hashes = new List<byte[]>();
for (int i = 0; i < num; i++)
{
byte[] key = new byte[keyLength];
new RNGCryptoServiceProvider().GetBytes(key);
_keys.Add(key);
int passwordLength = random.Next(8, 20);
byte[] password = new byte[passwordLength * 2];
random.NextBytes(password);
_passwords.Add(password);
}
Program.DoInstanceWork += new workDelegate(doInstanceWork);
Program.DoStaticWork += new workDelegate(doStaticWork);
}
public void doInstanceWork()
{
ThreadPool.QueueUserWorkItem(useInstanceMethod, new WorkerArgs() { num = _num, keys = _keys, passwords = _passwords });
}
public void doStaticWork()
{
ThreadPool.QueueUserWorkItem(useStaticMethod, new WorkerArgs() { num = _num, keys = _keys, passwords = _passwords });
}
public void useInstanceMethod(object args)
{
WorkerArgs workerArgs = (WorkerArgs)args;
for (int i = 0; i < workerArgs.num; i++)
{
KeyedHashAlgorithm hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
hashAlgorithm.Key = workerArgs.keys[i];
byte[] hash = hashAlgorithm.ComputeHash(workerArgs.passwords[i]);
}
Program.Timer();
}
public static void useStaticMethod(object args)
{
WorkerArgs workerArgs = (WorkerArgs)args;
for (int i = 0; i < workerArgs.num; i++)
{
KeyedHashAlgorithm hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
hashAlgorithm.Key = workerArgs.keys[i];
byte[] hash = hashAlgorithm.ComputeHash(workerArgs.passwords[i]);
}
Program.Timer();
}
public class WorkerArgs
{
public int num;
public List<byte[]> passwords;
public List<byte[]> keys;
}
}
}

Methods are code - there's no problem with thread accessing that code concurrently since the code isn't modified by running it; it's a read-only resource (jitter aside). What needs to be handled carefully in multi-threaded situations is access to data concurrently (and more specifically, when modifying that data is a possibility). Whether a method is static or an instance method has nothing to do with whether or not it needs to ne serialized in some way to make it threadsafe.

In all cases, whether static or instance, any thread can access any method at any time unless you do explicit work to prevent it.
For example, you can create a lock to ensure only a single thread can access a given method, but C# will not do that for you.
Think of it like watching TV. A TV does nothing to prevent multiple people from watching it at the same time, and as long as everybody watching it wants to see the same show, there's no problem. You certainly wouldn't want a TV to only allow one person to watch it at once just because multiple people might want to watch different shows, right? So if people want to watch different shows, they need some sort of mechanism external to the TV itself (perhaps having a single remote control that the current viewer holds onto for the duration of his show) to make sure that one guy doesn't change the channel to his show while another guy is watching.

C# methods are "reentrant" (As in most languages; the last time I heard of genuinely non-reentrant code was DOS routines) Each thread has its own call stack, and when a method is called, the call stack of that thread is updated to have space for the return address, calling parameters, return value, local values, etc.
Suppose Thread1 and Thread2 calls the method M concurrently and M has a local int variable n. The call stack of Thread1 is seperate from the call stack of Thread2, so n will have two different instantiations in two different stacks. Concurrency would be a problem only if n is stored not in a stack but say in the same register (i.e. in a shared resource) CLR (or is it Windows?) is careful not to let that cause a problem and cleans, stores and restores the registers when switching threads. (What do you do in presence of multiple CPU's, how do you allocate registers, how do you implement locking. These are indeed difficult problems that makes one respect compiler, OS writers when one comes to think of it)
Being reentrant does not prove no bad things happen when two threads call the same method at the same time: it only proves no bad things happen if the method does not access and update other shared resources.

When you access an instance method, you are accessing it through an object reference.
When you access a static method, you are accessing it directly.
So static methods are a tiny bit faster.

When you instanciate a class you dont create a copy of the code. You have a pointer to the definition of the class, and the code is acceded through it. So, instance methods are accessed the sane way than static methods

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

multithread doesn't improve the performance but make it even slower - c#

Hi, my assumptions: 1) If your computer has 1 CPU than you won't gain any performance improvement. Indeed, you will lose in the performance, because of Context Switch. 2) Try to set a max threads value of the ThreadPool class. ThreadPool.SetMaxThreads(x); 3) Try to use PLINQ.

Related

C# - thread-safe implementation of a class with fixed array (queue/buffer?) (ADD, REMOVE, MODIFY)

Correct way to check performance of piece of code in C#

Is parallelism usefull with a lock for datatable write transactions

Java is scaling much worse than C# over many cores?

static method vs instance method, multi threading, performance

Categories

Resources