Issue: real-time frame manipulation - C#

I developed a C# application that takes as input the RGB stream (640x480 at 30 fps) generated by a Kinect device. After each frame is received, I save it to disk as file.wmv.
The problem starts when I try to manipulate each frame before saving it: the stream rate is 30 fps and the manipulation takes about 200 ms, so I can only acquire about 5 fps.
I know this is a common problem. What are the most common solutions used to solve it?

This is a common problem: you need to do something in real time, but the operation is actually too slow to be handled in real time. The first and foremost 'solution' would be to increase the performance of the real-time operation so it becomes fast enough, but this is often not possible.
The more realistic option is to establish a queue that is processed on another thread. This is a textbook case for the producer/consumer pattern, since you will be producing frames and consuming them as fast as possible (see the sketch below). To off-load memory, you can write the frames to disk and read them back when consuming.
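A minimal sketch of that producer/consumer setup, assuming a BlockingCollection<byte[]> as the queue; the frame format, the queue capacity, and the manipulate/save delegates are placeholders rather than code from the question:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class FramePipeline
{
    // Bounded queue: roughly 3 seconds of frames at 30 fps; adjust to taste.
    private readonly BlockingCollection<byte[]> _frames = new BlockingCollection<byte[]>(90);

    // Producer: called from the Kinect frame-ready event; returns immediately.
    public void OnFrameReady(byte[] framePixels)
    {
        _frames.TryAdd(framePixels); // drops the frame if the queue is full
    }

    // Consumer: runs on a background task and drains the queue as fast as it can.
    public Task StartConsumer(Func<byte[], byte[]> manipulate, Action<byte[]> save)
    {
        return Task.Run(() =>
        {
            foreach (var frame in _frames.GetConsumingEnumerable())
            {
                save(manipulate(frame)); // the ~200 ms manipulation runs off the capture thread
            }
        });
    }

    public void Complete() { _frames.CompleteAdding(); }
}

If a single consumer cannot keep up with 30 fps, several consumer tasks can drain the same collection, or the raw frames can be spooled to disk first and manipulated offline, as suggested above.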
Also note that GDI+, the code behind bitmaps, is single-threaded and will confine all image manipulation to a single thread. This can be mitigated by using separate processes (one per core) to make better use of the machine.

Related

Threading UI Application

To start, I'm new to threads and have never worked with them before, but since the current project is severely impacted by slow runtimes, I wanted to take a peek into multithreading. This post asks whether that is possible given my current project, and how I would approach it in its entirety.
For context: I have a Windows Forms UI with a button that, upon clicking, runs an algorithm that generates an image. This is really done in three steps: extracting patterns (only if the settings have changed), running the algorithm, and displaying the result bitmap.
The average time spent on each part is as follows:
Pattern extraction takes up the biggest chunk, usually 60%+ of the running time.
The Algorithm itself takes up 40% of the running time.
However, if no settings have been changed, simply re-running won't require the recalculation of the patterns and hence it's way faster.
The displaying of the result bitmap, due to the bitmap being rescaled, takes a fixed ~200 ms (which I think can be optimized, but I don't know how).
The problem I'm having with trying to grasp the threading issue is that the algorithm is based on the patterns extracted from the first step, and the resulting bitmap is dependent on the algorithm.
My algorithm does, however, compute each pixel one by one, so I was wondering whether, once a single pixel has been calculated, it could already be displayed, so that displaying the image and calculating the remaining pixels can happen in parallel.
If anything is unclear, please feel free to ask any questions.
Thank you!
current project is severely impacted by slow runtime
I would advise that you start by doing some measurements/profiling before anything else. It is not uncommon for programs to waste the vast majority of their time doing useless work, and avoiding such unnecessary work can give a much bigger performance improvement than multithreading.
The typical method for moving processing-intensive work to a background thread is Task.Run, with async/await for handling the result (a sketch follows). Note that using a background thread to avoid blocking the UI thread is different from doing the processing in parallel to improve performance, but both methods can be combined if needed.
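A minimal sketch, assuming a Windows Forms button handler; the control names and the ExtractPatterns/RunAlgorithm methods are placeholders for whatever the project actually uses:

private async void generateButton_Click(object sender, EventArgs e)
{
    generateButton.Enabled = false;

    // The heavy work runs on a thread-pool thread, so the UI stays responsive.
    Bitmap result = await Task.Run(() =>
    {
        var patterns = ExtractPatterns(); // the expensive step, when settings have changed
        return RunAlgorithm(patterns);    // produces the final bitmap
    });

    // Execution resumes here on the UI thread, so touching controls is safe.
    pictureBox.Image = result;
    generateButton.Enabled = true;
}

If intermediate progress needs to reach the UI, a Progress<T> instance can be passed into the background work; its callback is invoked back on the UI thread.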
My algorithm does, however, compute each pixel one by one, so I was wondering whether, once a single pixel has been calculated, it could already be displayed, so that displaying the image and calculating the remaining pixels can happen in parallel.
Updating the displayed image for every pixel is probably not the way to go, since that would be rather costly. And you are generally not allowed to touch objects used by the UI from a background thread.
One way to manage this would be to have a timer that updates the UI every so often, plus a shared buffer for the processed data. When the update is triggered, a method copies the shared buffer into the displayed bitmap. Without locks this does not guarantee that the latest values are included, but for simply showing progress it might be good enough.
You could also consider splitting the image into individual chunks, so that you can process complete chunks on the background thread and then put them in an output queue for the UI thread to pick up and display; see for example System.Threading.Channels, sketched below.
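A rough sketch of that chunk queue using System.Threading.Channels (available as a NuGet package); the Chunk type, the chunk size, and the drawChunk callback are assumptions, not code from the question:

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class ChunkedRenderer
{
    public class Chunk
    {
        public int X, Y;
        public int[] Pixels;
    }

    private readonly Channel<Chunk> _chunks = Channel.CreateUnbounded<Chunk>();

    // Background thread: compute complete chunks and hand them over.
    public Task ProduceAsync(int chunkCount)
    {
        return Task.Run(() =>
        {
            for (int i = 0; i < chunkCount; i++)
            {
                var chunk = new Chunk { X = i, Y = 0, Pixels = new int[64 * 64] }; // placeholder work
                _chunks.Writer.TryWrite(chunk);
            }
            _chunks.Writer.Complete();
        });
    }

    // Called from the UI side (e.g. an async event handler), so the continuation
    // after each await comes back on the UI thread and may touch controls.
    public async Task ConsumeAsync(Action<Chunk> drawChunk)
    {
        await foreach (var chunk in _chunks.Reader.ReadAllAsync())
        {
            drawChunk(chunk);
        }
    }
}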

Limits of (soft real-time) timing requirements in Windows OS

In the company I work for, we build machines that are controlled by software running on Windows. A C# application communicates with a bus controller (via a DLL). The bus controller runs on a tact time of 15 ms, which means we get updates of the actual sensors in the system with a heartbeat of 15 ms from the bus controller (which is real time).
Now the machines are evolving into a next generation with a new bus controller that runs on a tact of 1 ms. Since everybody realizes that Windows is not a real-time OS, the question arises: should we move the controlling part of the software to a real-time application (on a real-time OS, e.g. a (soft) PLC)?
If we stay on the Windows platform, we do not have guaranteed responsiveness. That in itself is not necessarily a problem; if we miss a few bus cycles (have a few hiccups), the machine will just produce slightly more slowly, which is acceptable.
The part that worries me, is Thread synchronization between the main machine controlling thread, and the updates we receive from the real time controller (every millisecond).
Where can I learn more about how Windows / .NET C# behaves when it goes down the path of thread synchronization at millisecond scale? I know that, for example, Thread.Sleep(1) can take up to 15 ms because Windows preempts other tasks, so how does this play out when I synchronize two threads with Monitor.PulseAll every millisecond? Can I expect the same unpredictable behavior? Am I asking for trouble when moving into soft real-time requirements of 1 ms in a Windows application?
I hope somebody with experience on these aspects of threading can shed some light on this. If I need to clarify more, by all means, shoot.
Your scenario sounds like a candidate for a kiosk-mode/dedicated application.
In the company I work for we build machines which are controlled by software running on Windows OS.
If so, you could rig the machines such that your low-latency I/O thread could run on a dedicated core with thread and process priorities maximized. Furthermore, ensure the machine has enough cores to handle a buffering thread as well as any others that process your data in transit. The buffer should allocate memory upfront if possible to avoid garbage collection bottlenecks.
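As a rough illustration of the dedicated-core idea (the core mask and the priority values below are arbitrary examples, and ProcessorAffinity is Windows-specific):

using System;
using System.Diagnostics;
using System.Threading;

class DedicatedCoreSetup
{
    static void Main()
    {
        var process = Process.GetCurrentProcess();
        process.PriorityClass = ProcessPriorityClass.RealTime; // maximize process priority
        process.ProcessorAffinity = (IntPtr)0x2;               // bitmask: pin the process to core 1

        var ioThread = new Thread(PollBusController);
        ioThread.Priority = ThreadPriority.Highest;            // maximize the low-latency I/O thread
        ioThread.Start();
        ioThread.Join();
    }

    static void PollBusController()
    {
        // the 1 ms bus polling / buffering loop would live here
    }
}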
@Aron's example is good for situations where data integrity can be compromised to a certain extent. In audio, latency matters a lot during recording for multiple reasons, but for pure playback, data loss is acceptable to a certain degree. I am assuming this is not an option in your case.
Of course Windows is not designed to be a real-time OS, but if you are using it for a dedicated app, you have control over every aspect of it and can turn off all unrelated services and background processes.
I have had a reasonable amount of success writing software to monitor how well UPS units cope with power fluctuations by measuring their power compensation response times (disclaimer: not for commercial purposes though). Since the data to measure per sample was very small, the GC was not problematic and we cycled pre-allocated memory blocks for buffers.
Some micro-optimizations that came in handy:
Using immutable structs to poll I/O data.
Optimizing data structures to work well with memory allocation.
Optimizing processing algorithms to minimize CPU cache misses.
Using an optimized buffer class to hold data in transit.
Using the Monitor and Interlocked classes for synchronization (see the sketch after this list).
Using unsafe code with (void*) to gain easy access to buffer arrays in various ways to decrease processing time. Minimal use of Marshal and Buffer.BlockCopy.
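For the Interlocked item above, here is a minimal lock-free hand-off of a filled buffer from the I/O thread to a processing thread; the field and method names are purely illustrative:

using System.Threading;

class BufferHandOff
{
    private byte[] _pendingBuffer; // written by the I/O thread, taken by the processing thread

    // I/O thread: publish a filled buffer (replacing any unconsumed one).
    public void Publish(byte[] filled)
    {
        Interlocked.Exchange(ref _pendingBuffer, filled);
    }

    // Processing thread: atomically take the buffer, leaving null behind.
    public byte[] Take()
    {
        return Interlocked.Exchange(ref _pendingBuffer, null);
    }
}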
Lastly, you could go the DDK way and write a small driver. Albeit off-topic, DFMirage is a good example of a video driver that provides both an event-based and a polling model for differential screen capture, so that the consumer application can choose on-the-fly based on system load.
As for Thread.Sleep, try to use it as sparingly as your energy consumption constraints allow. With redundant processes out of the way, Thread.Sleep(1) should not be as bad as you think. Try the following to see what you get. Note that this was coded in the SO editor, so I may have made mistakes.
using System;
using System.Diagnostics;
using System.Threading;

// Measure how long Thread.Sleep(1) actually takes when priorities are maximized.
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;

var ticks = 0L;
var iteration = 0D;
var timer = new Stopwatch();

do
{
    iteration++;
    timer.Restart();
    Thread.Sleep(1);
    timer.Stop();
    ticks += timer.Elapsed.Ticks;

    if (Console.KeyAvailable) { if (Console.ReadKey(true).Key == ConsoleKey.Escape) { break; } }

    Console.WriteLine(
        "Elapsed (ms): Last Iteration = {0:N2}, Average = {1:N2}.",
        timer.Elapsed.TotalMilliseconds,
        TimeSpan.FromTicks((long)(ticks / iteration)).TotalMilliseconds);
}
while (true);

Console.WriteLine();
Console.WriteLine();
Console.Write("Press any key to continue...");
Console.ReadKey(true);
Coming back to the actual problem itself: processing data at 1 ms intervals is quite feasible. Considering audio recording as an analogous (pun not intended) problem, you might find some inspiration in how it achieves its goals.
Bear in mind:
Even a modest setup can achieve a 44.1 kHz / 16-bit-per-channel sampling rate (that is one sample roughly every 23 microseconds, well under your 1 ms target).
Using ASIO you can achieve sub-10 ms latencies.
Most methods of achieving high sampling rates work by increasing the buffer size and sending data to your system in batches.
To achieve the best throughput, don't rely on threads alone: use DMA and interrupts to call back into your processing loop.
Given that sound cards routinely achieve these numbers, you have a chance.

Control a Memory-Hungry Multi-Threaded App

This is a VERY open question.
Basically, I have a computing application that launches test combinations for N Scenarios.
Each test is conducted in a single dedicated thread, and involves reading large binary data, processing it, and dropping results to DB.
If the number of threads is too large, the app goes rogue, eats all available memory, and hangs.
What is the most efficient way to exploit all CPU+RAM capabilities (high-performance computing, i.e. 12 cores / 16 GB RAM) without bringing the system to its knees (which happens if "too many" simultaneous threads are launched, "too many" being a relative notion of course)?
I should specify that I have a worker buffer queue with N workers; every time one finishes and dies, a new one is launched via the queue. This works fine as of now. But I would like to avoid "manually" and "empirically" setting the number of simultaneous threads, and instead have an intelligent, scalable system that launches as many threads at a time as the system can properly handle and stops at a "reasonable" memory usage (the target server is dedicated to the app, so there is no problem regarding other applications, only the system).
PS: I know that .NET 3.5 comes with thread pools and .NET 4 has interesting TPL capabilities, which I am still considering right now (I have never gone very deep into this so far).
PS 2: After reading this post I was a bit puzzled by the "don't do this" answers. Though I think such a request is fair for a memory-demanding computing program.
EDIT
After reading this post, I will try to use the WMI features.
None of the built-in threading capabilities in .NET adjust according to memory usage; you need to build this yourself.
You can either predict memory usage or react to low-memory conditions. The alternatives:
Look at the amount of free memory on the system before launching a new task. If it is below 500 MB, wait until enough has been freed.
Launch tasks as they come and throttle as soon as some of them start to fail because of OOM, restarting them later. This alternative is painful, because your process will run garbage collections like crazy trying to avoid the OOMs.
I recommend (1); a rough sketch follows below.
You can look either at free system memory or at your own process's memory usage. To get the latter, I recommend looking at private bytes via the Process class.
If you set aside 1 GB of buffer on your 16 GB system, you run at about 94% efficiency and are pretty safe.
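A rough sketch of alternative (1); the 500 MB threshold and the one-second polling interval are arbitrary, and the "Memory / Available MBytes" counter is Windows-specific:

using System.Diagnostics;
using System.Threading;

static class MemoryThrottle
{
    private static readonly PerformanceCounter AvailableMb =
        new PerformanceCounter("Memory", "Available MBytes");

    // Block until the machine has at least minFreeMb of free physical memory.
    public static void WaitForFreeMemory(float minFreeMb = 500f)
    {
        while (AvailableMb.NextValue() < minFreeMb)
        {
            Thread.Sleep(1000); // let running workers finish and release memory
        }
    }
}

// Before launching each worker:
//     MemoryThrottle.WaitForFreeMemory();
//     StartNextWorker();

To throttle on your own process instead, Process.GetCurrentProcess().PrivateMemorySize64 gives the private bytes mentioned above.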

Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU-bound but I/O-bound. It doesn't help much to state that your disk I/O is "only at 10%"; I'm not sure such a performance counter even exists.
The reason it gets slower with more threads is that those threads are all trying to get at their respective files at the same time, while the disk subsystem has a hard time accommodating all of them. Even with a modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there is still a penalty involved.
Rather, you should conclude that your problem is disk-bound, and a single thread will probably be the fastest way to solve it.
One could argue that you could use asynchronous techniques to process a block that has already been read while the next block is being read in the background, but I think you'll see very little performance improvement there.
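For completeness, a sketch of that overlapped read-and-process idea; the block size and the processBlock callback are placeholders:

using System;
using System.IO;
using System.Threading.Tasks;

static class OverlappedReader
{
    // Read the next block asynchronously while the current block is being processed.
    public static async Task ProcessFileAsync(string path, Action<byte[], int> processBlock)
    {
        const int BlockSize = 1 << 20; // 1 MB per block
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        {
            var current = new byte[BlockSize];
            var next = new byte[BlockSize];

            int read = await stream.ReadAsync(current, 0, BlockSize);
            while (read > 0)
            {
                Task<int> nextRead = stream.ReadAsync(next, 0, BlockSize); // start the next read
                processBlock(current, read);                               // work on the current block
                read = await nextRead;

                var tmp = current; current = next; next = tmp;             // swap buffers
            }
        }
    }
}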
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive, and I found that the CPU is way too fast compared to the storage system; I got similar results when trying to squeeze out more performance by using more threads.
Using the Task Parallel Library did not alleviate this problem.
First of all, on a 24-core box, if you are using only 4 threads, the most CPU they could ever use is 16.7% of the total, so at 10% overall you are really getting about 60% utilization out of those threads, which is fairly good.
It is hard to tell whether your program is I/O-bound at this point; my guess is that it is. You need to run a profiler on your project and see in which sections of code it spends most of its time. If it is sitting on a read/write operation, it is I/O-bound.
It is possible you have some form of inter-thread locking in use. That would cause the program to slow down as you add more threads; yes, running a second process would sidestep that, but fixing your locking would too.
What it all boils down to is that, without profiling information, we cannot say whether using a second process will speed things up or slow them down; we need to know whether the program is hanging on an I/O operation, a locking operation, or just spending a long time in a function that could be parallelized better.
I think you will find that the file cache is not ideal when one process writes data to many files concurrently. The file cache syncs to disk when the number of dirty pages exceeds a threshold, and concurrent writers in one process seem to hit that threshold faster than a single-threaded writer would. You can read about the file system cache here: File Cache Performance and Tuning.
Try using the Task library from .NET 4 (System.Threading.Tasks). It has built-in optimizations for different numbers of processors; a short sketch follows.
I have no real clue what your problem is, though, maybe because your code snippet is not very informative.
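A minimal sketch of that suggestion; the file list, the Summarize method, and the degree-of-parallelism cap are placeholders:

using System.Threading.Tasks;

static class FileProcessor
{
    public static void ProcessAll(string[] files)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // cap this if the work turns out to be I/O-bound

        Parallel.ForEach(files, options, file =>
        {
            Summarize(file); // read the large file and write its summary
        });
    }

    static void Summarize(string file)
    {
        // placeholder for the per-file work
    }
}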

Fast data recording/logging on a separate thread in C#

We're developing an application which reads data from a number of external hardware devices continuously. The data rate is between 0.5MB - 10MB / sec, depending on the external hardware configuration.
The reading of the external devices is currently being done on a BackgroundWorker. Trying to write the acquired data to disk with this same BackgroundWorker does not appear to be a good solution, so what we want to do is, to queue this data to be written to a file, and have another thread dequeue the data and write to a file. Note that there will be a single producer and single consumer for the data.
We're thinking of using a synchronized queue for this purpose. But we thought this wheel must have been invented so many times already, so we should ask the SO community for some input.
Any suggestions or comments on things that we should watch out for would be appreciated.
I would do a combination of what mr 888 suggests.
Basically, you have two background workers:
one that reads from the hardware device,
one that writes the data to disk.
Hardware background worker:
Adds chunks of data from the hardware to the Queue<>, in whatever format you have it in.
Write background worker:
Parses the data if needed and dumps it to disk.
One thing to consider here: is getting the data from the hardware to disk as fast as possible important?
If yes, then I would have the write background worker sit in a loop with a 10-100 ms sleep per iteration, checking whether the queue has data.
If no, then I would have it sleep a similar amount (assuming the rate you get from your hardware changes periodically) and only write to disk once it has accumulated around 50-60 MB of data. I would consider doing it this way because a modern desktop hard drive can write about 60 MB per second (enterprise drives can be much quicker), and constantly writing data to it in small chunks is a waste of I/O bandwidth.
I am pretty confident that your queue itself will be pretty much fine. But make sure you use an efficient method of storing and retrieving the data, so you don't bog down your logging procedure with memory allocation/deallocation. I would go for a pre-allocated memory buffer used as a circular queue; a rough sketch follows.
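A rough sketch of such a pre-allocated circular queue for a single producer and a single consumer; the slot count, slot size, and class name are made up for illustration:

using System;
using System.Threading;

public sealed class PreallocatedRingBuffer
{
    private readonly byte[][] _slots;
    private readonly int[] _lengths;
    private long _writeIndex; // advanced only by the producer
    private long _readIndex;  // advanced only by the consumer

    public PreallocatedRingBuffer(int slotCount, int slotSize)
    {
        _slots = new byte[slotCount][];
        _lengths = new int[slotCount];
        for (int i = 0; i < slotCount; i++) _slots[i] = new byte[slotSize];
    }

    // Producer: copy data into the next free slot; false means the buffer is full.
    public bool TryWrite(byte[] data, int length)
    {
        long write = _writeIndex;
        if (write - Volatile.Read(ref _readIndex) >= _slots.Length) return false;
        int slot = (int)(write % _slots.Length);
        Buffer.BlockCopy(data, 0, _slots[slot], 0, length);
        _lengths[slot] = length;
        Volatile.Write(ref _writeIndex, write + 1); // publish only after the copy
        return true;
    }

    // Consumer: hand the oldest filled slot to a callback; false means empty.
    public bool TryRead(Action<byte[], int> consume)
    {
        long read = _readIndex;
        if (read >= Volatile.Read(ref _writeIndex)) return false;
        int slot = (int)(read % _slots.Length);
        consume(_slots[slot], _lengths[slot]);
        Volatile.Write(ref _readIndex, read + 1); // release the slot after it has been consumed
        return true;
    }
}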
You might need queuing.
e.g. code:
// Note: Queue<T> is not thread-safe, so use ConcurrentQueue<T> (or lock around a plain Queue<T>).
protected ConcurrentQueue<byte[]> myQ = new ConcurrentQueue<byte[]>();
or
protected ConcurrentQueue<Stream> myQ = new ConcurrentQueue<Stream>();
// when you have the content, enqueue it:
myQ.Enqueue(content);
and use another thread to drain the queue:
// another thread
protected void Logging()
{
    while (true)
    {
        byte[] content;
        while (myQ.TryDequeue(out content))
        {
            // save content
        }
        System.Threading.Thread.Sleep(1000);
    }
}
I had a similar situation. In my case I used an asynchronous lock-free queue with a LIFO of synchronization objects.
Basically, the threads that write to the queue set a sync object in the LIFO, while the other "worker" threads reset sync objects in the LIFO.
We have a fixed number of sync objects, equal to the number of threads. The reason for using a LIFO is to keep the minimum number of threads running and make better use of the cache.
Have you tried MSMQ?
