To start, I'm new to Threads, never worked with them before, but since the current project is severely impacted by slow runtimes, I wanted to take a peek into multithreading and this post is a question on whether it is possible given my current project, and how I would approach this in its entirety.
To start, you must understand that I have a Windows Form UI, which has a button that, upon clicking, runs an algorithm that generates an image. This is done in 3 steps really: Extracting patterns (if the settings have changed only), running the algorithm and displaying the result bitmap.
The average time spent on each part is as follows:
Pattern extraction takes up the biggest chunk, usually 60%+ of the running time.
The Algorithm itself takes up 40% of the running time.
However, if no settings have been changed, simply re-running won't require the recalculation of the patterns and hence it's way faster.
The displaying of the result bitmap, due to the bitmap being rescaled, takes a fixed ~200ms (Which I think can be optimized but IDK how).
The problem I'm having with trying to grasp the threading issue is that the algorithm is based on the patterns extracted from the first step, and the resulting bitmap is dependent on the algorithm.
My algorithm does, however, compute each pixel one by one, so I was wondering if it was possible to, once a single pixel has been calculated, already display it, such that the displaying of the image and the calculation of the others can be done in parallel.
If anything is unclear, please feel free to ask any questions.
Thank you!
current project is severely impacted by slow runtime
I would advice that you start with doing some measurements/profiling before doing anything else. It is not uncommon for programs to waste the vast majority of time doing useless stuff. Avoiding such unnecessary work can give a much more performance improvement than multi threading.
The typical method for moving processing intensive work to a background work is using Task.Run and async/await for processing the result. Note that using a background thread to avoid blocking the UI thread is different from doing the processing in parallel to improve performance, but both methods can be combined if needed.
My algorithm does, however, compute each pixel one by one, so I was wondering if it was possible to, once a single pixel has been calculated, already display it, such that the displaying of the image and the calculation of the others can be done in parallel.
Updating the displayed image for every pixel is probably not the way to go, since that would be rather costly. And you are typically not allowed to touch objects used by the UI from a background thread.
One way to manage things like this would be to have a timer that updates the UI every so often, and a shared buffer for the processed data. Once the update is triggered you would have a method that copies the shared buffer to the displayed bitmap, without locks this would not guarantee that the latest values are included, but for simply showing progress it might be good enough.
You could also consider things like splitting the image into individual chunks, so that you can process complete chunks on the background thread, and then put them in a output queue for the UI thread to pickup and display. See for example channel
Related
Using C#.
I have 100,000+ pieces of test data that need to have some calculations run with. My actual data set will be in the millions of pieces of data. The test data currently runs sequentially and takes about a minute to process. I want to split this work up and have backgroundworkers process back to back so I will hopefully get the processing done quicker.
What I have in mind is to do a foreach loop with the data and start a backgroundworker with each piece of data. I know I need to limit the number of bw's to three as I have 4 cores on this machine. I have done some testing with simple bw's but not three at the same time.
I have no idea how to go about this. How would one execute three background workers to process this data?
The BackgroundWorker is designed for early learning work mostly. Maybe the odd alternative threading scenario. What you are doing sounds like a very advanced opeartion. You can still use BGW, but raw Threads, Tasks, Threadpools and the like would be better at this point.
There is also the general question if this operation can even be accelerated with Multithreading. I like to say "multithreading has to pick it's problems carefully". Pick it in the wrong scenario and you end with a programm that needs more memory, is more prone to errors and slower then a single BGW or sequential programm.
Your case could be one of the rare cases of a pleasingly paralell operation. Or it could be mostly memory bound. Wich means you run into Paralell slowdown almost instantly. Resist atempts at hardcoding the number of threads. Usually you can leave that load-balancing work to a ThreadPool. To get a better answer you need to get a lot more specific.
I developed a c# application that takes in input the streamRGB (640x480 rate:30fps) generated by Kinect device. After each frame is received I save it on disk as file.wmv.
The problem starts when I try to manipulate each frame before save it, cause the stream rate is 30fps and the operation of manipulating lasts about 200ms (so I can acquire just 5fps).
I know this is a common problem. What are the most common solution used in order to solve it?
This is a common problem that occurs when you need to do something in real-time, but which is actually too slow to be handled in real-time. The first and foremost 'solution' would be to increase performance of the real-time operations so it will be fast enough, however this is often not possible.
The more realistic option is to establish a queue to be processed on another thread. This is a perfect example for a consumer/producer design pattern, as you can will be producing frames and consuming them to be handled as fast as possible. To off-load memory, you can write the frames to the file disk and read them when consuming.
Also note that GDI+, the code behind bitmaps, is single threaded and will lock everything regarding image manipulation to a single thread. This can be migrated using different processes (one for each core) to optimize machine performance.
I have a lighting system in my xna game that loops through each light, and adds these lights to a final light map, which contains all the lights.
The process to create these lights involves many functions that have to do with the graphics device, such as using effects / shaders and drawing to render targets, and using graphics.device.clear to clear the render target, etc
So my question is, would it be possible to multi thread each light? Or would this not be possible because there is only 1 graphics device, and only 1 thread can use it at a time? If it is possible, would it improve performance?
Basically no. The GraphicsDevice in XNA is single-threaded for rendering. You can send resources (textures, vertex buffers, etc) to the GPU from multiple threads. But you can only call Draw (and other rendering functions like Present) from your main thread.
I have heard of people having success doing rendering-type-things from multiple threads with the appropriate locking in place. But that seems like "bad voodoo". As the linked post says: "the XNA Framework documentation doesn’t make any promises here". Not to mention: even getting the locking right is tricky.
I'm not really sure about making multiple graphics devices - I've not tried it myself. I think that it is possible, but that you can't share resources between the devices - making it fairly useless. Probably not worth the effort.
As jalf mentioned in a comment on your question - once you get to the GPU everything is handled in parallel already. So this would only be useful if you are CPU limited due to hitting the batch limit (because almost everything that isn't your batches can be moved to another thread). And in that case there are many optimisations to consider first to reduce the number of batches - before trying a crazy scheme like this. (And you have measured your performance, right?)
It sounds like what you might be trying to do is render a fairly complicated scene to a render target in the background, and spreading the load across many frames. In that case - if performance requirements dictate it - you could perhaps render across multiple frames, on the main thread, scheduling it manually. Don't forget to set RenderTargetUsage.PreserveContents so it doesn't get cleared each time you put it on the graphics device.
I've recently been getting a bit of lag since I moved all of my c# SlimDX DX11 rendering code from my Form (yes, I'm a lazy developer) to bespoke classes. I whacked my program into EQATEC Profiler and got this as the major contributor to my lag:
Now it's clear here that whatever's in postRender() is really hogging the precious milliseconds. In fact, whatever crazy, convoluted code I have in there is effectively reducing my frame rate to ~15 FPS on its own.
So what's in postRender()? Just one line of code:
swapChain.Present(0, PresentFlags.None);
I just have no idea what's caused it to slow down so much, I've not made any changes to the swapchain code at all. All I've altered is the screen resolution (1680x1050), but that should be absolutely fine (for reference, this machine can run crysis2 at maximum settings at that resolution without breaking a sweat).
Does anybody have any idea what might cause a swapchain to take so long on presenting or where I should look next for problems?
EDIT:
Looking at the structure of my code, my RenderFrame() function is as follows:
preRender();
DeferredRender(preShader);
//Composite scene to output image
CompositeScene(compositeShader);
//Post Process
PostProcess(postProcShader);
//Depth of Field
DoF(dofShader);
//Present the swapchain
postRender();
The results of some of these functions are based on the functions before (for example, DeferredRender uses four render targets to capture Diffuse lighting, Normals, Positions and Color in a per-pixel manner. CompositeScene then puts them all together. This would require the GPU to have computed the previous step before it can continue. This whole process continues along, with DoF requiring the results of PostProcess, etc. Therefore the only shader that could possibly be holding Swapchain.Present() up must be the shader which runs in the function DoF, as all the other shaders cause the CPU to lock until they're finished. Correct?
There are a few reasons why you might find Present() taking up that much time in your frame. The Present call is the primary method of synchronization between the CPU and the GPU; if the CPU is generating frames much faster than the GPU can process them, they'll get queued up. Once the buffer gets full, Present() turns into a glorified Sleep() call while it waits for it to empty out.
Of course, it's pretty much impossible to say with the little information that you've given here. Just because a machine runs Crysis doesn't mean you can get away with throwing anything you want at the card. Double check that you're not expecting to render crazy amounts of geometry and that your shaders aren't abnormally long and complex.
Also take a look at your app using one of the available GPU profilers; PIX is good as a base point, while NVIDIA and AMD each have their own more specific offerings for their own products. Finally, make sure your drivers are updated. If there's a bug in the driver, any chance you have at reasoning about the issue goes out the window.
I'm making a loading screen for a game in c#. Do I need to create a thread for drawing the spinning animation as well as a thread for loading the level?
I'm a bit confused as to how it works. I've spent quite a few hours messing with it to no avail. Any help would be appreciated.
Short of anything that XNA may provide for you, anytime you require doing multiple units of work at once, multiple threads are usually required - and almost certainly if you want to benefit from multiple CPUs. Depending upon exactly what you're looking to do, you're already in one thread (for your main method / program execution) - so you don't likely need to create 2 additional threads - but just one additional for either the loading of your level, or for the animation.
Alternatively, as was probably more common-place in older development when developers weren't concerned with multi-core CPUs, etc., you could use tricks such as doing both the level loading and the animation in the same thread - but at the expense of additional complexity for combining both concerns into the same unit of processing. (In every x # of lines of processing for loading the level, add code to update the loading animation.) However, given today's technology, you are almost certainly better off using multiple threads for this.
Loading takes time because it makes long calculations and long calculations are usually done in a different thread so that the program woun't freeze.
So the answer is yes.