Debug.WriteLine("Timer is high-resolution: {0}", Stopwatch.IsHighResolution);
Debug.WriteLine("Timer frequency: {0}", Stopwatch.Frequency);
Result:
Timer is high-resolution: True
Timer frequency: 2597705
This article (from 2005!) mentions a Frequency of 3,579,545, about a million higher than mine. This blog post mentions a Frequency of 3,325,040,000, which is insane.
Why is my Frequency so much lower by comparison? I'm on an i7 920 machine, so shouldn't it be faster?
3,579,545 is the magic number. That's the frequency in Hertz before dividing it by 3 and feeding it into the 8253 timer chip in the original IBM PC. The odd-looking number wasn't chosen by accident; it is the frequency of the color burst signal in the NTSC TV system used in the US and Japan. The IBM engineers were looking for a cheap crystal to implement the oscillator, and nothing was cheaper than the one used in every TV set.
Once IBM clones became widely available, it was still important for their designers to choose the same frequency. A lot of MS-DOS software relied on the timer ticking at that rate. Directly addressing the chip was a common crime.
That changed once Windows came around. A version of Windows 2 was the first one to virtualize the timer chip. In other words, software wasn't allowed to directly address the timer chip anymore. The processor was configured to run in protected mode and intercepted any attempt to use the I/O instruction, running kernel code instead so the return value of the instruction could be faked. It was now possible to have multiple programs using the timer without them stepping on each other's toes, an important first step toward breaking the dependency on how the hardware is actually implemented.
The Win32 API (Windows NT 3.1 and Windows 95) formalized access to the timer with an API: QueryPerformanceCounter() and QueryPerformanceFrequency(). A kernel-level component, the Hardware Abstraction Layer, allows the BIOS to pass that frequency through. Now it was possible for the hardware designers to really drop the dependency on the exact frequency. That took a long time, by the way; around 2000 the vast majority of machines still had the legacy rate.
But the never-ending quest to cut costs in PC design put an end to that. Nowadays, the hardware designer just picks any frequency that happens to be readily available in the chipset. 3,325,040,000 would be such a number; it is most probably the CPU clock rate. High frequencies like that are common in cheap designs, especially the ones that have an AMD core. Your number is pretty unusual; good odds that your machine wasn't cheap, and that the timer is a lot more accurate, since CPU clocks have typical electronic component tolerances.
The frequency depends on the HAL (Hardware Abstraction Layer). Back in the Pentium days, it was common to use the CPU tick (which was based on the CPU clock rate), so you ended up with really high-frequency timers.
With multi-processor and multi-core machines, and especially with variable-rate CPUs (the CPU clock slows down for low-power states), using the CPU tick as the timer becomes difficult and error-prone, so the writers of the HAL seem to have chosen a slower but more reliable hardware clock, like the real-time clock.
The Stopwatch.Frequency value is per second, so your frequency of 2,597,705 means you have more than 2.5 million ticks per second. Exactly how much precision do you need?
As for the variations in frequency, that is a hardware-dependent thing. Some of the most common hardware differences are the number of cores, the frequency of each core, the current power state of your CPU (or cores), whether you have enabled the OS to dynamically adjust the CPU frequency, and so on. Your frequency will not always be the same, and depending on what state your CPU is in when you check it, it may be lower or higher, but generally around the same value (for you, probably around 2.5 million).
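To make the ticks-per-second point concrete, here is a small sketch (not from either answer) that converts raw Stopwatch ticks into time using Frequency:

using System;
using System.Diagnostics;
using System.Threading;

class FrequencyDemo
{
    static void Main()
    {
        // Resolution implied by the reported Frequency (ticks per second).
        Console.WriteLine("One tick = {0:F3} microseconds", 1000000.0 / Stopwatch.Frequency);

        var sw = Stopwatch.StartNew();
        Thread.Sleep(10);                               // something to measure
        sw.Stop();

        // ElapsedTicks are raw counter ticks; divide by Frequency to get seconds.
        double seconds = (double)sw.ElapsedTicks / Stopwatch.Frequency;
        Console.WriteLine("Measured {0:F3} ms", seconds * 1000.0);
    }
}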
I think 2,597,705 = your processor frequency. Mine is 2,737,822 (i7 930).
I'm converting some code over from .NET Micro Framework which I was running on a Netduino. The code measures the frequency of a square-wave oscillator that has a maximum frequency of about 1000 Hz, or a period of about 1 millisecond. The application is a rain detector that varies its capacitance depending on how wet it is. Capacitance increases with wetness, which reduces the oscillator frequency.
On the Netduino, I used an InterruptPin. It's not a genuine interrupt, but it schedules a .NET event, and the EventArgs contains a timestamp of when the pin value changed. On the Netduino, I could also configure whether the rising or falling edge would trigger the event. I managed to get this working fairly well, and 1 KHz was approaching the maximum throughput that the Netduino could reliably measure.
On the Raspberry Pi, things don't go as well. It's running Windows 10 IoT Core, which to be sure is quite a different environment to the Netduino. I have a ValueChanged event that I can tap into, but there is no timestamp and it occurs twice as fast because it gets triggered by both halves of the waveform. I hoped that, with its faster quad-core CPU, the Raspberry Pi might be able to cope with this, but in fact the best throughput I can get appears to be in the order of 1 event every 30 milliseconds, an order of magnitude worse than what I got on the Netduino, which means I'm falling a long way short of timing a 1 KHz square wave.
So I'm looking for ideas. I've thought about slowing the oscillator down. The original circuit was running at around 1 MHz and I've added a lot of resistors to increase the time constant, bringing it down to around 1 KHz. I could go on adding resistors but there comes a point where it starts to get silly and I'm worried about component tolerances making the thing hard to calibrate.
It would be handy if the Raspberry Pi exposed some counter/timer functionality, but none of these 'maker' boards seem to do that, for some unfathomable reason.
One approach could be to use an A-to-D converter to somehow get a direct reading, but the electronics is a bit beyond me (hey, I'm a software guy!).
There is enough grunt in the Raspberry Pi that I ought to be able to get this to work! Has anyone found a way of getting faster throughput to the GPIO pins?
In the company I work for we build machines which are controlled by software running on Windows OS. A C# application communicates with a bus controller (via a DLL). The bus controller runs on a tact (cycle) time of 15 ms. That means that we get updates of the actual sensors in the system with a heartbeat of 15 ms from the bus controller (which is real-time).
Now the machines are evolving into a next generation, where we get a new bus controller which runs on a tact of 1 ms. Since everybody realizes that Windows is not a real-time OS, the question arises: should we move the controlling part of the software to a real-time application (on a real-time OS, e.g. a (soft) PLC)?
If we stay on the Windows platform, we do not have guaranteed responsiveness. That in itself is not necessarily a problem; if we miss a few bus cycles (have a few hiccups), the machine will just produce slightly slower (which is acceptable).
The part that worries me is thread synchronization between the main machine-controlling thread and the updates we receive from the real-time controller (every millisecond).
Where can I learn more about how Windows / .NET C# behaves when it comes to thread synchronization at millisecond scale? I know that e.g. Thread.Sleep(1) can take up to 15 ms because Windows is preempting other tasks, so how does this play out when I synchronize between two threads with Monitor.PulseAll every millisecond? Can I expect the same unpredictable behavior? Is it asking for trouble when I am moving into the soft real-time requirements of 1 ms in Windows applications?
I hope somebody with experience on these aspects of threading can shed some light on this. If I need to clarify more, by all means, shoot.
Your scenario sounds like a candidate for a kiosk-mode/dedicated application.
In the company I work for we build machines which are controlled by software running on Windows OS.
If so, you could rig the machines such that your low-latency I/O thread could run on a dedicated core with thread and process priorities maximized. Furthermore, ensure the machine has enough cores to handle a buffering thread as well as any others that process your data in transit. The buffer should allocate memory upfront if possible to avoid garbage collection bottlenecks.
@Aron's example is good for situations where data integrity can be compromised to a certain extent. In audio, latency matters a lot during recording for multiple reasons, but for pure playback, data loss is acceptable to a certain degree. I am assuming this is not an option in your case.
Of course Windows is not designed to be a real-time OS but if you are using it for a dedicated app, you have control over every aspect of it and can turn off all unrelated services and background processes.
I have had a reasonable amount of success writing software to monitor how well UPS units cope with power fluctuations by measuring their power compensation response times (disclaimer: not for commercial purposes though). Since the data to measure per sample was very small, the GC was not problematic and we cycled pre-allocated memory blocks for buffers.
Some micro-optimizations that came in handy:
Using immutable structs to poll I/O data.
Optimizing data structures to work well with memory allocation.
Optimizing processing algorithms to minimize CPU cache misses.
Using an optimized buffer class to hold data in transit (a minimal sketch follows this list).
Using the Monitor and Interlocked classes for synchronization.
Using unsafe code with (void*) to gain easy access to buffer arrays in various ways to decrease processing time. Minimal use of Marshal and Buffer.BlockCopy.
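To make a few of those points concrete (the struct samples, the pre-allocated buffers, and the Monitor-based synchronization), here is a minimal sketch; the Sample layout, buffer size, and class names are hypothetical, not taken from the UPS-monitoring project described above.

using System;

struct Sample                                   // hypothetical layout: a small struct used to poll I/O data
{
    public long Timestamp;
    public double Value;
}

class SwapBuffer
{
    readonly object _gate = new object();       // lock(...) uses Monitor under the hood
    Sample[] _writeBuffer = new Sample[1024];   // both buffers are allocated up front and reused,
    Sample[] _readBuffer = new Sample[1024];    // so the GC never runs on the hot path
    int _writeCount;

    // Called on the I/O thread for every incoming sample.
    public void Add(Sample s)
    {
        lock (_gate)
        {
            if (_writeCount < _writeBuffer.Length)
                _writeBuffer[_writeCount++] = s;
        }
    }

    // Called on the processing thread: swap the buffers and hand back whatever was captured.
    public ArraySegment<Sample> Swap()
    {
        lock (_gate)
        {
            var full = _writeBuffer;
            int count = _writeCount;
            _writeBuffer = _readBuffer;
            _readBuffer = full;
            _writeCount = 0;
            return new ArraySegment<Sample>(full, 0, count);
        }
    }
}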
Lastly, you could go the DDK way and write a small driver. Albeit off-topic, DFMirage is a good example of a video driver that provides both an event-based and a polling model for differential screen capture, such that the consumer application can choose on the fly based on system load.
As for Thread.Sleep, you should use it as sparingly as possible, considering your energy consumption boundaries. With redundant processes out of the way, Thread.Sleep(1) should not be as bad as you think. Try the following to see what you get. Note that this has been coded in the SO editor, so I may have made mistakes.
using System;
using System.Diagnostics;
using System.Threading;

// Raise the thread and process priority so the sleep test is disturbed as little as possible.
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;

var ticks = 0L;
var iteration = 0D;
var timer = new Stopwatch();
do
{
    iteration++;
    timer.Restart();
    Thread.Sleep(1);        // how long does a 1 ms sleep really take?
    timer.Stop();
    ticks += timer.Elapsed.Ticks;
    if (Console.KeyAvailable) { if (Console.ReadKey(true).Key == ConsoleKey.Escape) { break; } }
    Console.WriteLine("Elapsed (ms): Last Iteration = {0:N2}, Average = {1:N2}.", timer.Elapsed.TotalMilliseconds, TimeSpan.FromTicks((long)(ticks / iteration)).TotalMilliseconds);
}
while (true);
Console.WriteLine();
Console.WriteLine();
Console.Write("Press any key to continue...");
Console.ReadKey(true);
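Along the same lines, and not part of the original answer, here is a hedged sketch for the Monitor.PulseAll part of the question: one thread pulses roughly every millisecond, a second thread waits on the monitor, and the wake-up latency is measured with Stopwatch. The sample count and pacing are illustrative; treat the printed numbers as indicative only.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

class PulseLatencyDemo
{
    static readonly object Gate = new object();
    static long _sequence;          // incremented by the pulsing thread, under the lock
    static long _signalTimestamp;   // Stopwatch timestamp taken just before PulseAll
    static volatile bool _done;

    static void Main()
    {
        const int samples = 1000;   // illustrative sample count
        var latencies = new List<double>(samples);

        var waiter = new Thread(() =>
        {
            long seen = 0;
            lock (Gate)
            {
                while (latencies.Count < samples)
                {
                    while (_sequence == seen) Monitor.Wait(Gate);   // releases the lock while waiting
                    seen = _sequence;
                    // Time from the producer's PulseAll to this thread resuming.
                    latencies.Add((Stopwatch.GetTimestamp() - _signalTimestamp) * 1000.0 / Stopwatch.Frequency);
                }
            }
            _done = true;
        });
        waiter.Start();

        while (!_done)
        {
            Thread.Sleep(1);        // stands in for the 1 ms heartbeat from the bus controller
            lock (Gate)
            {
                _signalTimestamp = Stopwatch.GetTimestamp();
                _sequence++;
                Monitor.PulseAll(Gate);
            }
        }
        waiter.Join();

        latencies.Sort();
        Console.WriteLine("Median wake-up latency: {0:F3} ms", latencies[samples / 2]);
        Console.WriteLine("Worst  wake-up latency: {0:F3} ms", latencies[samples - 1]);
    }
}

On a typical desktop the median will likely look fine, while the worst case can show the multi-millisecond outliers the question worries about, which is why the buffering advice above matters.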
Coming to think about the actual problem itself, processing data at 1 ms is pretty easy. When considering audio recording as an analogous (pun not intended) problem, you might be able to find some inspiration in how to achieve your goals.
Bear in mind:
Even a modest setup can achieve a 44.1 kHz @ 16-bit-per-channel sampling rate; that is roughly 23 microseconds between samples, a small fraction of your 1 ms target (see the quick arithmetic after this list).
Using ASIO you can achieve sub-10 ms latencies.
Most methods of achieving high sampling rates work by increasing your buffer size and sending data to your system in batches.
To achieve the best throughput, don't rely on threads: use DMA and interrupts to call back into your processing loop.
Given that sound cards routinely can achieve your goals, you might have a chance.
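For reference (not part of the original answer), the arithmetic behind that analogy, using the 44.1 kHz figure from the list above:

using System;

class AudioMath
{
    static void Main()
    {
        const int sampleRate = 44100;                      // samples per second per channel
        double microsecondsPerSample = 1000000.0 / sampleRate;
        int samplesPerMillisecond = sampleRate / 1000;     // what a 1 ms batch would hold

        Console.WriteLine("{0:F1} us between samples", microsecondsPerSample);     // ~22.7 us
        Console.WriteLine("{0} samples arrive per 1 ms cycle", samplesPerMillisecond);
    }
}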
I don't understand the meaning of timer precision and resolution. Can anyone explain it to me?
NOTE: This question is related to Stopwatch.
Accuracy and precision are opposing goals; you can't get both. An example of a very accurate timing source is DateTime.UtcNow. It provides absolute time that's automatically corrected for clock rate errors by the kernel, using a timing service to periodically re-calibrate the clock. You probably heard of time.windows.com, the NTP server that most Windows PCs use. Very accurate, you can count on less than a second of error over an entire year. But not precise: the value only updates 64 times per second. It is useless for timing anything that takes less than a second with any kind of decent precision.
The clock source for Stopwatch is very different. It uses a free-running counter that is driven by a frequency source available somewhere in the chipset. This used to be a dedicated crystal running at the color burst frequency (3.579545 MHz), but relentless cost cutting has eliminated that from most PCs. Stopwatch is very precise; you can tell from its Frequency property. You should get something between a megahertz and the CPU clock frequency, allowing you to time down to a microsecond or better. But it is not accurate; it is subject to electronic part tolerances. Particularly mistrust any Frequency beyond a gigahertz, since that's derived from a multiplier, which also multiplies the error. And beware the Heisenberg principle: starting and stopping the Stopwatch takes non-zero overhead that will affect the accuracy of very short measurements. Another common accuracy problem with Stopwatch is the operating system switching out your thread to allow other code to run. You need to take multiple samples and use the median value.
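That last piece of advice, taking multiple samples and using the median, looks roughly like the following generic sketch (the helper name and the run count are arbitrary, not from the answer):

using System;
using System.Diagnostics;

static class MedianTiming
{
    // Time the same operation several times and report the median, which is much less
    // sensitive to the OS switching the thread out in the middle of one measurement.
    public static double MedianMilliseconds(Action operation, int runs = 15)
    {
        var results = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            operation();
            sw.Stop();
            results[i] = sw.Elapsed.TotalMilliseconds;
        }
        Array.Sort(results);
        return results[runs / 2];   // median of the sorted measurements
    }
}

// Usage (SomethingToMeasure is a placeholder): double ms = MedianTiming.MedianMilliseconds(() => SomethingToMeasure());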
They are the same as with any measurement. See this Wikipedia article for more details --
http://en.wikipedia.org/wiki/Accuracy_and_precision
There are different types of timers in .NET (3 or 4 of them, if I remember correctly), each working with its own algorithm. The precision of a timer means how accurately it informs the consuming application of the ticking events. For example, if you use a timer and set it to trigger its ticking event every 1000 ms, the precision of the timer means how close to the specified 1000 ms it will actually tick.
For more information (at least in C#), I suggest you read the MSDN page on timers:
From MSDN Stopwatch Class: (emphasis mine)
"The Stopwatch measures elapsed time by counting timer ticks in the underlying timer mechanism. If the installed hardware and operating system support a high-resolution performance counter, then the Stopwatch class uses that counter to measure elapsed time. Otherwise, the Stopwatch class uses the system timer to measure elapsed time. Use the Frequency and IsHighResolution fields to determine the precision and resolution of the Stopwatch timing implementation."
System.Environment.ProcessorCount shows me N processors (N in my case = 8), which I want to make use of. Now the problem is that the Windows Resource Monitor says that 4 of my CPUs are 'parked', and the 8 threads I start just get spread across the 4 unparked CPUs.
Now, is there a way to use the parked CPUs too?
When Windows "parks" a CPU core, it means that there is not enough work for that core to do so it puts that core in a low-power state. In order to "unpark" the CPU, you just have to create enough work.
If you are starting 8 threads and Windows isn't unparking the CPUs, the threads probably are doing I/O, blocking, or completing too quickly. If you post what your threads are doing, maybe somebody can explain why they're not running on the parked cores.
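A quick way to see this for yourself (purely illustrative, not from the answer above) is to start one CPU-bound, non-blocking thread per logical processor and watch Resource Monitor while it runs:

using System;
using System.Threading;

class SaturateCores
{
    static void Main()
    {
        // One busy thread per logical CPU: no I/O, no blocking, no early exit.
        // Parked cores should come back online while this runs.
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            new Thread(() =>
            {
                var spin = 0.0;
                while (true) spin += Math.Sqrt(spin + 1);   // pure CPU work
            }) { IsBackground = true }.Start();
        }
        Console.ReadKey(true);   // run until a key is pressed
    }
}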
Usually, you should be able to do it this way:
Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)0x00FF;
see documentation for it here:
http://msdn.microsoft.com/en-us/library/system.diagnostics.process.processoraffinity.aspx
but it also says that, by default, your process is assigned to all cores.
On the other hand, you could try ProcessThread.ProcessorAffinity and try to set it manually (if you want to force each thread to use another core).
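For reference, the 0x00FF mask above hard-codes 8 logical CPUs; here is a small sketch (my own, not from the answer) that derives the mask from Environment.ProcessorCount instead:

using System;
using System.Diagnostics;

class AffinityDemo
{
    static void Main()
    {
        long mask = (1L << Environment.ProcessorCount) - 1;   // one bit per logical CPU
        Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)mask;
        Console.WriteLine("Affinity mask: 0x{0:X}", mask);
    }
}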
Win7/2K8R2 won't unpark cores until the other ones are saturated or near saturation.
The whole point of parking cores is to consolidate work. It's more power efficient to use 4 cores at 80% than 8 cores at 40%. Also, the performance difference should be almost non-existent.
Also, depending on how much data is shared, consolidating the work will actually be faster, because there is less synchronization overhead when fewer hardware threads are involved, and recent data changes from one thread are more likely to still be in cache.
So, the common worst case is about the same performance with less power used, and the common best case is better performance with less power used.
The parking is not controlled by the CPU affinity setting of your process, it is done automatically by the Windows CPU Scheduler. Adjustments to your CPU affinity can perhaps force utilization of certain cores, but then Windows will just park different cores. The parking is turned on or off dynamically, very quickly, in accordance with system load. It is actually surprisingly aggressive by default (maybe too much so on some platforms). You can watch it in the Resource Monitor, as you saw.
Setting your own CPU affinity is something you should do with extreme caution. You must consider HyperThreaded cores, or in the case of AMD Bulldozer, paired cores that share computational units (their HyperThreading without being HyperThreading ;p). You don't want to end up 'stuck' on a Hyper-Threaded core that offers a mere fraction of the performance of a real core. The CPU scheduler is aware of such things, so usually the affinity is best left to it -- unless you know what you're doing, and have checked that system's CPU.
However, you can enable/disable or tweak CPU Parking very easily, without rebooting. I wrote a HOW-TO, complete with a simple GUI, here: How to Enable/Disable or Tweak CPU Parking Without a Reboot, and without Registry Edits
It also includes more information about CPU Parking, and how to tweak it using PowerCfg.exe. You can actually make the option show up in the standard Advanced Power Profile settings in Windows, but it takes some tweaking I won't get into here.
I have to build a simulator with C#. This simulator should be able to run a second thread with configurable CPU speed and limited RAM size, e.g. 144 MHz and 50 MB.
Of course I know that a simulator can never be as accurate as the real hardware. But I try to get almost similar performance.
At the moment I'm thinking about creating a thread which I will stop/sleep from time to time. Depending on the desired CPU speed, the simulator would adjust the sleep time of this thread and thereby simulate a different CPU frequency. To measure the achieved speed I thought about using PerformanceCounters. But with this approach I have the problem that I don't know how to limit the RAM size the thread could use.
Do you have any ideas how to realize such a simulator?
Thanks in advance!!
Limiting memory is easy with virtual machines like VMware. You can change the CPU speed with some clock-tweaking tools, for example http://cpu.rightmark.org/products/rmclock.shtml
Good luck!
CPU speed limiting? You should check this; perhaps it will be useful (to some degree at least).
CPU Emulation and locking to a specific clock speed
If you are concerned with simulating an operating system environment then one answer would be to use a virtual machines environment where you can control memory and CPU parameters, etc.
The thread pause/stop approach may help you simulate CPU frequency, but it is going to be terribly inaccurate: when you pause the thread it is de-scheduled, and it is then up to the operating system to re-schedule it at some "random" point in time, i.e. a point you have no control over.
As for limiting the memory, one option is to start a new process that will host your code and then limit the memory of that process, e.g.:
http://www.codeproject.com/KB/threads/Setting_Max_Memory_Limit.aspx
This will not really simulate overall OS memory limitations though.
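For what it is worth, the pause/sleep idea from the question boils down to a duty-cycle throttle like the sketch below. The 144 MHz guest on a 1 GHz host ratio and the 10 ms period are made-up numbers, and, as noted above, the scheduler makes the result coarse.

using System;
using System.Diagnostics;
using System.Threading;

class ThrottledWorker
{
    static void Main()
    {
        const double targetFraction = 144.0 / 1000.0;   // pretend: 144 MHz guest on a 1 GHz host
        const int periodMs = 10;                        // throttling period
        int runMs = (int)(periodMs * targetFraction);   // ~1 ms of work per 10 ms period

        var sw = new Stopwatch();
        while (!Console.KeyAvailable)
        {
            sw.Restart();
            while (sw.ElapsedMilliseconds < runMs)
            {
                // ... execute a slice of simulated guest work here ...
            }
            Thread.Sleep(periodMs - runMs);             // stay idle for the rest of the period
        }
    }
}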
A thread that sleeps to slow down the software execution of your guest opcodes? I think it works, but it will feel a little weird, like fast-forward, pause, fast-forward, pause, etc.
If you just want to slow down a process, try this: use the CPU single-step feature and "debug" the process. You have to write a custom handler for the CPU single-stepping trap, and the handler's job is only a big loop of NOPs.
That gives you a fine-grained delay between each instruction.