Ok, So, I just started screwing around with threading, now it's taking a bit of time to wrap my head around the concepts so i wrote a pretty simple test to see how much faster if faster at all printing out 20000 lines would be (and i figured it would be faster since i have a quad core processor?)
so first i wrote this, (this is how i would normally do the following):
System.DateTime startdate = DateTime.Now;
for (int i = 0; i < 10000; ++i)
{
Console.WriteLine("Producing " + i);
Console.WriteLine("\t\t\t\tConsuming " + i);
}
System.DateTime endtime = DateTime.Now;
Console.WriteLine(a.startdate.Second + ":" + a.startdate.Millisecond + " to " + endtime.Second + ":" + endtime.Millisecond);
And then with threading:
public class Test
{
static ProducerConsumer queue;
public System.DateTime startdate = DateTime.Now;
static void Main()
{
queue = new ProducerConsumer();
new Thread(new ThreadStart(ConsumerJob)).Start();
for (int i = 0; i < 10000; i++)
{
Console.WriteLine("Producing {0}", i);
queue.Produce(i);
}
Test a = new Test();
}
static void ConsumerJob()
{
Test a = new Test();
for (int i = 0; i < 10000; i++)
{
object o = queue.Consume();
Console.WriteLine("\t\t\t\tConsuming {0}", o);
}
System.DateTime endtime = DateTime.Now;
Console.WriteLine(a.startdate.Second + ":" + a.startdate.Millisecond + " to " + endtime.Second + ":" + endtime.Millisecond);
}
}
public class ProducerConsumer
{
readonly object listLock = new object();
Queue queue = new Queue();
public void Produce(object o)
{
lock (listLock)
{
queue.Enqueue(o);
Monitor.Pulse(listLock);
}
}
public object Consume()
{
lock (listLock)
{
while (queue.Count == 0)
{
Monitor.Wait(listLock);
}
return queue.Dequeue();
}
}
}
Now, For some reason i assumed this would be faster, but after testing it 15 times, the median of the results is ... a few milliseconds different in favor of non threading
Then i figured hey ... maybe i should try it on a million Console.WriteLine's, but the results were similar
am i doing something wrong ?
Writing to the console is internally synchronized. It is not parallel. It also causes cross-process communication.
In short: It is the worst possible benchmark I can think of ;-)
Try benchmarking something real, something that you actually would want to speed up. It needs to be CPU bound and not internally synchronized.
As far as I can see you have only got one thread servicing the queue, so why would this be any quicker?
I have an example for why your expectation of a big speedup through multi-threading is wrong:
Assume you want to upload 100 pictures. The single threaded variant loads the first, uploads it, loads the second, uploads it, etc.
The limiting part here is the bandwidth of your internet connection (assuming that every upload uses up all the upload bandwidth you have).
What happens if you create 100 threads to upload 1 picture only? Well, each thread reads its picture (this is the part that speeds things up a little, because reading the pictures is done in parallel instead of one after the other).
As the currently active thread uses 100% of the internet upload bandwidth to upload its picture, no other thread can upload a single byte when it is not active. As the amount of bytes that needs to be transmitted, the time that 100 threads need to upload one picture each is the same time that one thread needs to upload 100 pictures one after the other.
You only get a speedup if uploading pictures was limited to lets say 50% of the available bandwidth. Then, 100 threads would be done in 50% of the time it would take one thread to upload 100 pictures.
"For some reason i assumed this would be faster"
If you don't know why you assumed it would be faster, why are you surprised that it's not? Simply starting up new threads is never guaranteed to make any operation run faster. There has to be some inefficiency in the original algorithm that a new thread can reduce (and that is sufficient to overcome the extra overhead of creating the thread).
All the advice given by others is good advice, especially the mention of the fact that the console is serialized, as well as the fact that adding threads does not guarantee speedup.
What I want to point out and what it seems the others missed is that in your original scenario you are printing everything in the main thread, while in the second scenario you are merely delegating the entire printing task to the secondary worker. This cannot be any faster than your original scenario because you simply traded one worker for another.
A scenario where you might see speedup is this one:
for(int i = 0; i < largeNumber; i++)
{
// embarrassingly parallel task that takes some time to process
}
and then replacing that with:
int i = 0;
Parallel.For(i, largeNumber,
o =>
{
// embarrassingly parallel task that takes some time to process
});
This will split the loop among the workers such that each worker processes a smaller chunk of the original data. If the task does not need synchronization you should see the expected speedup.
Cool test.
One thing to have in mind when dealing with threads is bottlenecks. Consider this:
You have a Restaurant. Your kitchen can make a new order every 10
minutes (your chef has a bladder problem so he's always in the
bathroom, but is your girlfriend's cousin), so he produces 6 orders an
hour.
You currently employ only one waiter, which can attend tables
immediately (he's probably on E, but you don't care as long as the
service is good).
During the first week of business everything is fine: you get
customers every ten minutes. Customers still wait for exactly ten
minutes for their meal, but that's fine.
However, after that week, you are getting as much as 2 costumers every
ten minutes, and they have to wait as much as 20 minutes to get their
meal. They start complaining and making noises. And god, you have
noise. So what do you do?
Waiters are cheap, so you hire two more. Will the wait time change?
Not at all... waiters will get the order faster, sure (attend two
customers in parallel), but still some customers wait 20 minutes for
the chef to complete their orders.You need another chef, but as you
search, you discover they are lacking! Every one of them is on TV
doing some crazy reality show (except for your girlfriend's cousin who
actually, you discover, is a former drug dealer).
In your case, waiters are the threads making calls to Console.WriteLine; But your chef is the Console itself. It can only service so much calls a second. Adding some threads might make things a bit faster, but the gains should be minimal.
You have multiple sources, but only 1 output. It that case multi-threading will not speed it up. It's like having a road where 4 lanes that merge into 1 lane. Having 4 lanes will move traffic faster, but at the end it will slow back down when it merges into 1 lane.
Related
I have an application that uses hangfire to do long-running jobs for me (I know the time the job takes and it is always roughly the same), and in my UI I want to give an estimate for when a certain job is done. For that I need to query hangfire for the position of the job in the queue and the number of servers working on it.
I know I can get the number of enqueued jobs (in the "DEFAULT" queue) by
public long JobsInQueue() {
var monitor = JobStorage.Current.GetMonitoringApi();
return monitor.EnqueuedCount("DEFAULT");
}
and the number of servers by
public int HealthyServers() {
var monitor = JobStorage.Current.GetMonitoringApi();
return monitor.Servers().Count(n => (n.Heartbeat != null) && (DateTime.Now - n.Heartbeat.Value).TotalMinutes < 5);
}
(BTW: I exclude older heartbeats, because if I turn off servers they sometimes linger in the hangfire database. Is there a better way?), but to give a proper estimate I need to know the position of the job in the queue. How do I get that?
The problem you have is that hangfire is asynchronous, queued, parallel, exhibits an at-least-once durability semantic, and basically non-deterministic.
To know with certainty the order in which an item will finish being processed in such a system is impossible. In fact, if the requirement was to enforce strict ordering, then many of the benefits of hangfire would go away.
There is a very good blog post by #odinserj (the author of hangfire) where he outlines this point: http://odinserj.net/2014/05/10/are-your-methods-ready-to-run-in-background/
However, that said, it's not impossible to come up with a sensible estimation algorithm, but it would have to be one where the order of execution is approximated in some way. As to how you can arrive at such an algorithm I don't know but something like this might work (but probably won't):
Approximate seconds remaining until completion =
(
(average duration of job in seconds * queue depth)
/ (the lower of: number of hangfire threads OR queue depth)
)
- number of seconds already spent in queue
+ average duration of job in seconds
I have a for loop with more than 20k iterations,for each iteration it is taking around two or three seconds and total around 20minutes. how i can optimize this for loop. I am using .net3.5 so parallel foreach is not possible. so i splited the 200000 nos into small chunks and implemented some threading now i am able reduce the time by 50%. is there any other way to optimize these kind of for loops.
My sample code is given below
static double sum=0.0;
public double AsyncTest()
{
List<Item> ItemsList = GetItem();//around 20k items
int count = 0;
bool flag = true;
var newItemsList = ItemsList.Take(62).ToList();
while (flag)
{
int j=0;
WaitHandle[] waitHandles = new WaitHandle[62];
foreach (Item item in newItemsList)
{
var delegateInstance = new MyDelegate(MyMethod);
IAsyncResult asyncResult = delegateInstance.BeginInvoke(item.id, new AsyncCallback(MyAsyncResults), null);
waitHandles[j] = asyncResult.AsyncWaitHandle;
j++;
}
WaitHandle.WaitAll(waitHandles);
count = count + 62;
newItemsList = ItemsList.Skip(count).Take(62).ToList();
}
return sum;
}
public double MyMethod(int id)
{
//Calculations
return sum;
}
static public void MyAsyncResults(IAsyncResult iResult)
{
AsyncResult asyncResult = (AsyncResult) iResult;
MyDelegate del = (MyDelegate) asyncResult.AsyncDelegate;
double mySum = del.EndInvoke(iResult);
sum = sum + mySum;
}
It's possible to reduce number of loops by various techniques. However, this won't give you any noticeable improvement since the heavy computation is performed inside your loops. If you've already parallelized it to use all your CPU cores there is not much to be done. There is a certain amount of computation to be done and there is a certain computer power available. You can't squeeze from your machine more than it can provide.
You can try to:
Do a more efficient implementation of your algorithm if it's possible
Switch to faster environment/language, such as unmanaged C/C++.
Is there a rationale behind your batches size (62)?
Is "MyMethod" method IO bound or CPU bound?
What you do in each cycle is wait till all the batch completes and this wastes some cycles (you are actually waiting for all 62 calls to complete before taking the next batch).
Why won't you change the approach a bit so that you still keep N operations running simultaneosly, but you fire a new operation as soon as one of the executind operations completes?
According to this blog, for loops are more faster than foreach in case of collections. Try looping with for. It will help.
It sounds like you have a CPU intensive MyMethod. For CPU intensive tasks, you can gain a significant improvement by parallelization, but only to the point of better utilizing all CPU cores. Beyond that point, too much parallelization can start to hurt performance -- which I think is what you're doing. (This is unlike I/O intensive tasks where you pretty much parallelize as much as possible.)
What you need to do, in my opinion, is write another method that takes a "chunk" of items (not a single item) and returns their "sum":
double SumChunk(IEnumerable<Item> items)
{
return items.Sum(x => MyMethod(x));
}
Then divide the number of items by n (n being the degree of parallelism -- try n = number of CPU cores, and compare that to x2) and pass each chunk to an async task of SumChunk. And finally, sum up the sub-results.
Also, watch if any of the chunks is completed much before the other ones. If that's the case, then your task distributions is not homogen. You'd need to create smaller chunks (say chunks of 300 items) and pass those to SumChunk.
Correct me if I'm wrong, but it looks to me like your threading is at the individual item level - I wonder if this may be a little too granular.
You are already doing your work in blocks of 62 items. What if you were to take those items and process all of them within a single thread? I.e., you would have something like this:
void RunMyMethods(IEnumerable<Item> items)
{
foreach(Item item in items)
{
var result = MyMethod(item);
...
}
}
Keep in mind that WaitHandle objects can be slower than using Monitor objects: http://www.yoda.arachsys.com/csharp/threads/waithandles.shtml
Otherwise, the usual advice holds: profile the performance to find the true bottlenecks. In your question you state that it takes 2-3 seconds per iteration - with 20000 iterations, it would take a fair bit more than 20 minutes.
Edit:
If you are wanting to maximise your usage of CPU time, then it may be best to split your 20000 items into, say, four groups of 5000 and process each group in its own thread. I would imagine that this sort of "thick 'n chunky" concurrency would be more efficient than a very fine-grained approach.
To start with, the numbers just don't add:
20k iterations,for each iteration it is taking around two or three seconds and total around 20minutes
That's a x40 'parallelism factor' - you can never achieve that running on a normal machine.
Second, when 'optimizing' a CPU intensive computation, there's no sense in parallelizing beyond the number of cores. Try dropping that magical 62 to 16 and bench test - it will actually run faster.
I ran a deformed malversion of your code on my laptop, and got some 10-20% improvement using Parallel.ForEach
So maybe you can make it run 17 minutes instead of 20 - does it really matter ?
Does looping in C# occur at the same speed for all systems. If not, how can I control a looping speed to make the experience consistent on all platforms?
You can set a minimum time for the time taken to go around a loop, like this:
for(int i= 0; i < 10; i++)
{
System.Threading.Thread.Sleep(100);
... rest of your code...
}
The sleep call will take a minimum of 100ms (you cannot say what the maximum will be), so your loop wil take at least 1 second to run 10 iterations.
Bear in mind that it's counter to the normal way of Windows programming to sleep on your user-interface thread, but this might be useful to you for a quick hack.
You can never depend on the speed of a loop. Although all existing compilers strive to make loops as efficient as possible and so they probably produce very similar results (given enough development time), the compilers are not the only think influencing this.
And even leaving everything else aside, different machines have different performance. No two machines will yield the exact same speed for a loop. In fact, even starting the program twice on the same machine will yield slightly different performances. It depends on what other programs are running, how the CPU is feeling today and whether or not the moon is shining.
No, loops do not occur the same in all systems. There are so many factors to this question that it can not be appreciable answered without code.
This is a simple loop:
int j;
for(int i = 0; i < 100; i++) {
j = j + i;
}
this loop is too simple, it's merely a pair of load, add, store operations, with a jump and a compare. This will be only a few microops and will be really fast. However, the speed of those microops will be dependent on the processor. If the processor can do one microop in 1 billionth of a second (roughly one gigahertz) then the loop will take approximately 6 * 100 microops (this is all rough estimation, there are so many factors involved that I'm only going for approximation) or 6 * 100 billionths of a second, or slightly less than one millionth of a second. For the entire loop. You can barely measure this with most operating system functions.
I wanted to demonstrate the speed of the looping. I referenced above a processor of 1 billion microops per second. Now consider a processor that can do 4 billion microops per second. That processor would be four times faster (roughly) than the first processor. And we didn't change the code.
Does this answer the question?
For those who want to mention that the compiler might loop unroll this, ignore that for the sake of the learning.
One way of controlling this is by using the Stopwatch to control when you do your logic. See this example code:
int noofrunspersecond = 30;
long ticks1 = 0;
long ticks2 = 0;
double interval = (double)Stopwatch.Frequency / noofrunspersecond;
while (true) {
ticks2 = Stopwatch.GetTimestamp();
if (ticks2 >= ticks1 + interval) {
ticks1 = Stopwatch.GetTimestamp();
//perform your logic here
}
Thread.Sleep(1);
}
This will make sure that that the logic is performed at given intervals as long as the system can keep up, so if you try to execute 100 times per second, depending on the logic performed the system might not manage to perform that logic 100 times a second. In other cases this should work just fine.
This kind of logic is good for getting smooth animations that will not speed up or slow down on different systems for example.
I am trying to run the following program from the book.
The author claims that the resultant output
" should be "
1000
2000
....
10000
if you run the program on normal processor but on multiprocessor computer it could be
999
1998
...
9998
when using normal increment method (number+=1) but using the intelocked increment as shown in the program solves the problem(i.e. you get first output)
Now I have got 3 questions.
First why cant i use normal increment in the inner loop [i++ instead of Interlocked.Increment(ref i)]. Why has author choosed the other method?
Secondly what purpose does Thread.Sleep(1000) has in the context. When I comment out this line, I get second output even if I am using Interlocked method to increment number.
Thirdly I get correct output even by using normal increment method [number += 1] if I dont comment the Thread.Sleep(1000) line and second output if I do so.
Now I am running the program on Intel(R) Core(TM) i7 Q820 cpu if it makes any difference
static void Main(string[] args)
{
MyNum n = new MyNum();
for (int a = 0; a < 10; a++)
{
for (int i = 1; i <= 1000; Interlocked.Increment(ref i))
{
Thread t = new Thread(new ThreadStart(n.AddOne));
t.Start();
}
Thread.Sleep(1000);
Console.WriteLine(n.number);
}
}
class MyNum
{
public int number = 0;
public void AddOne()
{
Interlocked.Increment(ref number);
}
}
The sleep is easy--let the threads finish before you look at the result. It's not really a good answer, though--while they should finish in a second there is no guarantee they actually do.
The need for the interlocked increment in the MyNum class is clear--there are 1000 threads trying for the number, without protection it would be quite possible for one to read the number, then a second read it, then the first one put it back and then the second put it back, wiping out the change the first one made. Note that such errors are FAR more likely when there are multiple cores, otherwise it can only happen if a thread switch hits at the wrong time.
I can't see why i needs to be protected, though.
Edit: You are getting about the same result because the code executes too fast. The thread runs faster than it's created so they aren't running all at once.
Try:
public void AddOne()
{
int x = number + fibnocci(20) + 1 - fibnocci(20);
}
private int fibnocci(int n)
{
if (n < 3) return 1 else return fibnocci(n - 1) + fibnocci(n - 2);
}
(I hope the optimizer isn't good enough to kill this extra code)
The code is actually pretty strange. Since Thread t is declared locally on each iteration, it can possibly be garbage collected by .NET because no reference exists to the thread. Anyway...
To answer the first question, I don't see a need for Interlocked.Increment(ref i) to take place. The main thread is the only thread that will touch i. Using i++ is not a problem here.
For the second question, Thread.Sleep(1000) exists to give the program enough time to complete all the threads. Your i7 (quad core with hyper-threading) is probably finishing each item pretty fast.
For the third question, having the same result is not really a guaranteed with number += 1. The two cores might read the same numeral and increment the numerals to the same value (i.e., 1001, 1001).
Lastly, I'm not sure whether or not you are running the program in debug mode. Building the program in release mode may give you different behaviors and side effects that a multi-threaded program should do.
if you comment out the thread.sleep line, there is a good chance that the threads will not finish prior to the print line... in this case you will see a number smaller than the "correct" output, but not because the incrementer wasn't atomic.
On a true multicore system, it is possible for the non-atomic actions to collide. Perhaps you are doing too few iterations to see the collision.
I'm writing a UDP multicast client/server pair in C# and I need a delay on the order of 50-100 µsec (microseconds) to throttle the server transmission rate. This helps to avoid significant packet loss and also helps to keep from overloading the clients that are disk I/O bound. Please do not suggest Thread.Sleep or Thread.SpinWait. I would not ask if I needed either of those.
My first thought was to use some kind of a high-performance counter and do a simple while() loop checking the elapsed time but I'd like to avoid that as it feels kludgey. Wouldn't that also peg the CPU utilization for the server process?
Bonus points for a cross-platform solution, i.e. not Windows specific. Thanks in advance, guys!
Very short sleep times are generally best achieved by a CPU spin loop (like the kind you describe). You generally want to avoid using the high-precision timer calls as they can themselves take up time and skew the results. I wouldn't worry too much about CPU pegging on the server for such short wait times.
I would encapsulate the behavior in a class, as follows:
Create a class whose static constructor runs a spin loop for several million iterations and captures how long it takes. This gives you an idea of how long a single loop cycle would take on the underlying hardware.
Compute a uS/iteration value that you can use to compute arbitrary sleep times.
When asked to sleep for a particular period of time, divide uS to sleep by the uS/iteration value previously computed to identify how many loop iterations to perform.
Spin using a while loop until the estimated time elapses.
I would use stopwatch but would need a loop
read this to add more extension to the stopwatch, like ElapsedMicroseconds
or something like this might work too
System.Diagnostics.Stopwatch.IsHighResolution MUST be true
static void Main(string[] args)
{
Stopwatch sw;
sw = Stopwatch.StartNew();
int i = 0;
while (sw.ElapsedMilliseconds <= 5000)
{
if (sw.Elapsed.Ticks % 100 == 0)
{ i++; /* do something*/ }
}
sw.Stop();
}
I've experienced with such requirement when I needed more precision with my multicast application.
I've found that the best solution resides with the MultiMedia timers as seen in this example.
I've used this implementation and added TPL async invoke to it. You should see my SimpleMulticastAnalyzer project for more information.
static void udelay(long us)
{
var sw = System.Diagnostics.Stopwatch.StartNew();
long v = (us * System.Diagnostics.Stopwatch.Frequency )/ 1000000;
while (sw.ElapsedTicks < v)
{
}
}
static void Main(string[] args)
{
for (int i = 0; i < 100; i++)
{
Console.WriteLine("" + i + " " + DateTime.Now.Second + "." + DateTime.Now.Millisecond);
udelay(1000000);
}
}
I would discourage using spin loop as it consumes and creates blocking thread. thread.sleep is better, it doesn't use processor resource during sleep, it just slice the time. Try it, and you'll see from task manager how the CPU usage spike with the spin loop.
Have you looked at multimedia timers? You could probably find a .NET library somewhere that wraps the API calls somewhere.