I've recently been getting a bit of lag since I moved all of my c# SlimDX DX11 rendering code from my Form (yes, I'm a lazy developer) to bespoke classes. I whacked my program into EQATEC Profiler and got this as the major contributor to my lag:
Now it's clear here that whatever's in postRender() is really hogging the precious milliseconds. In fact, whatever crazy, convoluted code I have in there is effectively reducing my frame rate to ~15 FPS on its own.
So what's in postRender()? Just one line of code:
swapChain.Present(0, PresentFlags.None);
I just have no idea what's caused it to slow down so much, I've not made any changes to the swapchain code at all. All I've altered is the screen resolution (1680x1050), but that should be absolutely fine (for reference, this machine can run crysis2 at maximum settings at that resolution without breaking a sweat).
Does anybody have any idea what might cause a swapchain to take so long on presenting or where I should look next for problems?
EDIT:
Looking at the structure of my code, my RenderFrame() function is as follows:
preRender();
DeferredRender(preShader);
//Composite scene to output image
CompositeScene(compositeShader);
//Post Process
PostProcess(postProcShader);
//Depth of Field
DoF(dofShader);
//Present the swapchain
postRender();
The results of some of these functions are based on the functions before (for example, DeferredRender uses four render targets to capture Diffuse lighting, Normals, Positions and Color in a per-pixel manner. CompositeScene then puts them all together. This would require the GPU to have computed the previous step before it can continue. This whole process continues along, with DoF requiring the results of PostProcess, etc. Therefore the only shader that could possibly be holding Swapchain.Present() up must be the shader which runs in the function DoF, as all the other shaders cause the CPU to lock until they're finished. Correct?
There are a few reasons why you might find Present() taking up that much time in your frame. The Present call is the primary method of synchronization between the CPU and the GPU; if the CPU is generating frames much faster than the GPU can process them, they'll get queued up. Once the buffer gets full, Present() turns into a glorified Sleep() call while it waits for it to empty out.
Of course, it's pretty much impossible to say with the little information that you've given here. Just because a machine runs Crysis doesn't mean you can get away with throwing anything you want at the card. Double check that you're not expecting to render crazy amounts of geometry and that your shaders aren't abnormally long and complex.
Also take a look at your app using one of the available GPU profilers; PIX is good as a base point, while NVIDIA and AMD each have their own more specific offerings for their own products. Finally, make sure your drivers are updated. If there's a bug in the driver, any chance you have at reasoning about the issue goes out the window.
Related
To start, I'm new to Threads, never worked with them before, but since the current project is severely impacted by slow runtimes, I wanted to take a peek into multithreading and this post is a question on whether it is possible given my current project, and how I would approach this in its entirety.
To start, you must understand that I have a Windows Form UI, which has a button that, upon clicking, runs an algorithm that generates an image. This is done in 3 steps really: Extracting patterns (if the settings have changed only), running the algorithm and displaying the result bitmap.
The average time spent on each part is as follows:
Pattern extraction takes up the biggest chunk, usually 60%+ of the running time.
The Algorithm itself takes up 40% of the running time.
However, if no settings have been changed, simply re-running won't require the recalculation of the patterns and hence it's way faster.
The displaying of the result bitmap, due to the bitmap being rescaled, takes a fixed ~200ms (Which I think can be optimized but IDK how).
The problem I'm having with trying to grasp the threading issue is that the algorithm is based on the patterns extracted from the first step, and the resulting bitmap is dependent on the algorithm.
My algorithm does, however, compute each pixel one by one, so I was wondering if it was possible to, once a single pixel has been calculated, already display it, such that the displaying of the image and the calculation of the others can be done in parallel.
If anything is unclear, please feel free to ask any questions.
Thank you!
current project is severely impacted by slow runtime
I would advice that you start with doing some measurements/profiling before doing anything else. It is not uncommon for programs to waste the vast majority of time doing useless stuff. Avoiding such unnecessary work can give a much more performance improvement than multi threading.
The typical method for moving processing intensive work to a background work is using Task.Run and async/await for processing the result. Note that using a background thread to avoid blocking the UI thread is different from doing the processing in parallel to improve performance, but both methods can be combined if needed.
My algorithm does, however, compute each pixel one by one, so I was wondering if it was possible to, once a single pixel has been calculated, already display it, such that the displaying of the image and the calculation of the others can be done in parallel.
Updating the displayed image for every pixel is probably not the way to go, since that would be rather costly. And you are typically not allowed to touch objects used by the UI from a background thread.
One way to manage things like this would be to have a timer that updates the UI every so often, and a shared buffer for the processed data. Once the update is triggered you would have a method that copies the shared buffer to the displayed bitmap, without locks this would not guarantee that the latest values are included, but for simply showing progress it might be good enough.
You could also consider things like splitting the image into individual chunks, so that you can process complete chunks on the background thread, and then put them in a output queue for the UI thread to pickup and display. See for example channel
Hello I made a game server in c# for a big online game.
The only problem is that I get sometimes out of no where 90% cpu usage.
It also got stuck on that 90% when the bug is there it stays on the 90% for ever..
So my question is how can I find the code failure on a simple way because the server is huge.
Are there any .net profilers good foor or something like that?
It also got stuck on that 90% when the bug is there it stays on the 90% for ever
That's awesome! You just got handed the holy grail right there my friend!
Simply attach a debugger when it gets in that state and break. Chances are you'll break in the buggy code, if not keep running and breaking. You should get there really quick if it's using 90% of your CPU.
Alternatively run it through VS's profiler and zoom the graph in only the zone with extremely high sustained CPU usage, you'll get a list of functions that use the most time (assuming it's a CPU bound issue, if it's I/O I don't think it will show).
I've been working on a game in Visual C# (Not the best platform, I know), and, as would perhaps be expected, it has started to run rather slowly. Running some tests has shown that the main hold up is in drawing images. I've been told that Sprite Batching is a good fix for that.
Problem is, I can't find anything on sprite batching that isn't specific to XNA or OpenGL. I know little to nothing about the process, and I was hoping to get some information on whether such a thing can be implemented using Visual Studio's Visual C#, and (if so) where I can go to learn more about it. If not, are there any other useful methods of speeding the process up a bit? Thanks!
It basically comes down to batching together calls to save on state switches (textures, fill rate) and draw calls (sending a draw call 50,000 times isn't as efficient as sending a single draw call, surprisingly enough). You're going to have to check, when calling the equivalent of a SpriteBatch.Draw(...), the following:
An internal 'max size' of your batch
If the texture switches, flush your buffer (i.e. draw whatever you have)
If SpriteBatch.End(...) has been called (you're done; flush the buffer and draw)
If you're still having trouble, feel free to check out MonoGame's implementation.
Edit: found a great gamedev question about this.
I have a lighting system in my xna game that loops through each light, and adds these lights to a final light map, which contains all the lights.
The process to create these lights involves many functions that have to do with the graphics device, such as using effects / shaders and drawing to render targets, and using graphics.device.clear to clear the render target, etc
So my question is, would it be possible to multi thread each light? Or would this not be possible because there is only 1 graphics device, and only 1 thread can use it at a time? If it is possible, would it improve performance?
Basically no. The GraphicsDevice in XNA is single-threaded for rendering. You can send resources (textures, vertex buffers, etc) to the GPU from multiple threads. But you can only call Draw (and other rendering functions like Present) from your main thread.
I have heard of people having success doing rendering-type-things from multiple threads with the appropriate locking in place. But that seems like "bad voodoo". As the linked post says: "the XNA Framework documentation doesn’t make any promises here". Not to mention: even getting the locking right is tricky.
I'm not really sure about making multiple graphics devices - I've not tried it myself. I think that it is possible, but that you can't share resources between the devices - making it fairly useless. Probably not worth the effort.
As jalf mentioned in a comment on your question - once you get to the GPU everything is handled in parallel already. So this would only be useful if you are CPU limited due to hitting the batch limit (because almost everything that isn't your batches can be moved to another thread). And in that case there are many optimisations to consider first to reduce the number of batches - before trying a crazy scheme like this. (And you have measured your performance, right?)
It sounds like what you might be trying to do is render a fairly complicated scene to a render target in the background, and spreading the load across many frames. In that case - if performance requirements dictate it - you could perhaps render across multiple frames, on the main thread, scheduling it manually. Don't forget to set RenderTargetUsage.PreserveContents so it doesn't get cleared each time you put it on the graphics device.
I develop a utility that behaves like "Adobe photoshop". user can draw rectangle,circle,... with mouse pointer and then move or resize it. for this functionality, I assume each shape is a object that stored in a generic collection in a container object. when user wants to change anything I recognise that where he clicked and in behind of scence I select the target object and so on...
this way have a problem when objects in screen is lot or user loads a picture with high resolution.
What's your opinion?
How can I solve it?
This makes sense because the larger the data set, the more RAM and CPU will be required to handle it.
While performance issues are important to solve, a lot of it may be perceieved performance so something like a threading issue - where you have one thread trying to process the information and you block the UI thread which makes it look like the system is freezing.
There is a lot of information on StackOverflow that you may want to look at
C# Performance Optimization
C# Performance Best Practices
C# Performance Multi threading
C# Performance Collections (Since you said you were using a collection)
Use a profiler such as dotTrace and find out which method is the one most called and the one that takes the most amount of time to process. Those are the ones you want to try to optimize. Other than that, you may have to go down to the GPU to try to speed things up.
About these kind of problem, think about parallel extensions :
http://msdn.microsoft.com/en-us/concurrency/default.aspx
The more cpu you have, the faster your program is running.
The thing is that in hi resolution the computer needs to use more the processor, then this occurs, remember that this also occurs in The Gimp, even in Adobe Photoshop.
Regards.
Look into using a performance analyzing tool (such as ANTS Profiler) to help you pinpoint exactly where the bottle necks are occurring. True graphical computations on a high res photo require alot of resources, but I would assume the logic you are using to manage and find your objects require some tuning up as well.
I high-resolution image takes up a lot of memory (more bits-per-pixel). As such, any operation that you do to it means more bits to manipulate.
Does your program utilise "layers"?
If not, then I'm guessing you are adding components directly to the image - which means each operation has to manipulate the bits. So if you aren't using layers, then you should definitely start. Layers will allow you to draw operations to the screen but only merge them into the base high-resolution image once - when you save!
What library from Windows are you using to open the image?
If you are using System.Drawing then you are actually using GDI+ as it is a wrapper on top of it. GDI+ is nice for a lot of things because it simplies tons of operations, however it isn't the fastest in the world. For example using the [Get|Set]Pixel methods are MUCH slower than working directly on the BitmapData. As there are tons of articles on speeding up operations on top of GDI+ I will not re-iterate them here.
I hope the information I've provided answer some of your questions causes new ones. Good luck!