Mutithreading in C# queries - c#

I am new to multithreading in C# . I have a 3D array of size (x)(y)(z) and say i want to calculate the average of all the z samples for every (x,y) values. I wish to do that using multithreading (say 2 threads) where i will send half the array of size (x/2)*y*z for processing to thread1 and the other half to thread2.
How to do it? How do I pass and retrieve arguments from individual threads? A code example will be helpful.
Regards

I would recommend using PLINQ for this instead of threading this yourself.
It will let you run your query using LINQ syntax, but parallelize it (across all of your cores) automatically.

There are many reasons why it makes sense to use something PLINQ (as mentioned by Reed) or Parallel.For as implementing a low-overhead scheduler for distributing jobs over several cpus is a bit challenging.
So if I understood you correctly maybe this could get you started (on my 4 core machine the parallel version is 3 times faster than the single core version):
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
class Program
{
static void AverageOfZ (
double[] input,
double[] result,
int x,
int y,
int z
)
{
Debug.Assert(input.Length == x*y*z);
Debug.Assert(result.Length == x*y);
//Replace Parallel with Sequential to compare with non-parallel loop
//Sequential.For(
Parallel.For(
0,
x*y,
i =>
{
var begin = i*z;
var end = begin + z;
var sum = 0.0;
for (var iter = begin; iter < end; ++iter)
{
sum += input[iter];
}
result[i] = sum/z;
});
}
static void Main(string[] args)
{
const int X = 64;
const int Y = 64;
const int Z = 64;
const int Repetitions = 40000;
var random = new Random(19740531);
var samples = Enumerable.Range(0, X*Y*Z).Select(x => random.NextDouble()).ToArray();
var result = new double[X*Y];
var then = DateTime.Now;
for (var iter = 0; iter < Repetitions; ++iter)
{
AverageOfZ(samples, result, X, Y, Z);
}
var diff = DateTime.Now - then;
Console.WriteLine(
"{0} samples processed {1} times in {2} seconds",
samples.Length,
Repetitions,
diff.TotalSeconds
);
}
}
static class Sequential
{
public static void For(int from, int to, Action<int> action)
{
for (var iter = from; iter < to; ++iter)
{
action(iter);
}
}
}
PS. When going for concurrent performance its important to consider how the different cores access memory as its very easy to get disappointing performance otherwise.

Dot Net 3.5 and onward introduced many shortcut keywords that abstract away the complexity of things like Parallel for multi threading or Async for Async IO. Unfortunately this also provides no opportunity for understanding whats involved in these tasks. For example a colleague of mine was recently trying to use Async for a login method which returned an authentication token.
Here is the full blown multi threaded sample code for your scenario. to make it more real the sample code pretends that:
X is Longitude
Y is Lattitude
and Z is Rainfall Samples at the coordinates
The sample code also follows the Unit of Work design pattern where Rainfall Samples at each coordinate becomes a work item. It also creates discrete foreground threads instead of using a background threadpool.
Due to the simplicity of the work item and short compute time involved I've split the thread synchronization locks into two locks. one for the work queue and one for the output data.
Note: I have not used any Dot net shortcuts such as Lync so this code should run on Dot Net 2.0 as well.
In real world app development something like whats below would only be needed in complex scenarios such as stream processing of a continuous stream of work items in which case you would also need to implement output data buffers cleared regularly as the threads would effectively run forever.
public static class MultiThreadSumRainFall
{
const int LongitudeSize = 64;
const int LattitudeSize = 64;
const int RainFallSamplesSize = 64;
const int SampleMinValue = 0;
const int SampleMaxValue = 1000;
const int ThreadCount = 4;
public static void SumRainfallAndOutputValues()
{
int[][][] SampleData;
SampleData = GenerateSampleRainfallData();
for (int Longitude = 0; Longitude < LongitudeSize; Longitude++)
{
for (int Lattitude = 0; Lattitude < LattitudeSize; Lattitude++)
{
QueueWork(new WorkItem(Longitude, Lattitude, SampleData[Longitude][Lattitude]));
}
}
System.Threading.ThreadStart WorkThreadStart;
System.Threading.Thread WorkThread;
List<System.Threading.Thread> RunningThreads;
WorkThreadStart = new System.Threading.ThreadStart(ParallelSum);
int NumThreads;
NumThreads = ThreadCount;
if (ThreadCount < 1)
{
NumThreads = 1;
}
else if (NumThreads > (Environment.ProcessorCount + 1))
{
NumThreads = Environment.ProcessorCount + 1;
}
OutputData = new int[LongitudeSize, LattitudeSize];
RunningThreads = new List<System.Threading.Thread>();
for (int I = 0; I < NumThreads; I++)
{
WorkThread = new System.Threading.Thread(WorkThreadStart);
WorkThread.Start();
RunningThreads.Add(WorkThread);
}
bool AllThreadsComplete;
AllThreadsComplete = false;
while (!AllThreadsComplete)
{
System.Threading.Thread.Sleep(100);
AllThreadsComplete = true;
foreach (System.Threading.Thread WorkerThread in RunningThreads)
{
if (WorkerThread.IsAlive)
{
AllThreadsComplete = false;
}
}
}
for (int Longitude = 0; Longitude < LongitudeSize; Longitude++)
{
for (int Lattitude = 0; Lattitude < LattitudeSize; Lattitude++)
{
Console.Write(string.Concat(OutputData[Longitude, Lattitude], #" "));
}
Console.WriteLine();
}
}
private class WorkItem
{
public WorkItem(int _Longitude, int _Lattitude, int[] _RainFallSamples)
{
Longitude = _Longitude;
Lattitude = _Lattitude;
RainFallSamples = _RainFallSamples;
}
public int Longitude { get; set; }
public int Lattitude { get; set; }
public int[] RainFallSamples { get; set; }
}
public static int[][][] GenerateSampleRainfallData()
{
int[][][] Result;
Random Rnd;
Rnd = new Random();
Result = new int[LongitudeSize][][];
for(int Longitude = 0; Longitude < LongitudeSize; Longitude++)
{
Result[Longitude] = new int[LattitudeSize][];
for (int Lattidude = 0; Lattidude < LattitudeSize; Lattidude++)
{
Result[Longitude][Lattidude] = new int[RainFallSamplesSize];
for (int Sample = 0; Sample < RainFallSamplesSize; Sample++)
{
Result[Longitude][Lattidude][Sample] = Rnd.Next(SampleMinValue, SampleMaxValue);
}
}
}
return Result;
}
private static object SyncRootWorkQueue = new object();
private static Queue<WorkItem> WorkQueue = new Queue<WorkItem>();
private static void QueueWork(WorkItem SamplesWorkItem)
{
lock(SyncRootWorkQueue)
{
WorkQueue.Enqueue(SamplesWorkItem);
}
}
private static WorkItem DeQueueWork()
{
WorkItem Samples;
Samples = null;
lock (SyncRootWorkQueue)
{
if (WorkQueue.Count > 0)
{
Samples = WorkQueue.Dequeue();
}
}
return Samples;
}
private static int QueueSize()
{
lock(SyncRootWorkQueue)
{
return WorkQueue.Count;
}
}
private static object SyncRootOutputData = new object();
private static int[,] OutputData;
private static void SetOutputData(int Longitude, int Lattitude, int SumSamples)
{
lock(SyncRootOutputData)
{
OutputData[Longitude, Lattitude] = SumSamples;
}
}
private static void ParallelSum()
{
WorkItem SamplesWorkItem;
int SummedResult;
SamplesWorkItem = DeQueueWork();
while (SamplesWorkItem != null)
{
SummedResult = 0;
foreach (int SampleValue in SamplesWorkItem.RainFallSamples)
{
SummedResult += SampleValue;
}
SetOutputData(SamplesWorkItem.Longitude, SamplesWorkItem.Lattitude, SummedResult);
SamplesWorkItem = DeQueueWork();
}
}
}

Related

Design a thread-safe Hit Counter

I have designed a Hit Counter which can conveniently be used to design a Rate Limiter
Original question: https://leetcode.com/problems/design-hit-counter/
To make it thread safe, I used lock statement.
public class HitCounter {
private readonly int DURATION = 300;
private readonly int[] hits, times;
//https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/lock
private readonly Object _lockObj = new Object();
public HitCounter() {
times = new int[DURATION];
hits = new int[DURATION];
}
public void Hit(int timestamp) {
var idx = timestamp % DURATION;
lock (_lockObj){
if(times[idx] != timestamp){
times[idx] = timestamp;
hits[idx] = 1;
}else{
hits[idx]++;
}
}
}
public int GetHits(int timestamp) {
var res = 0;
lock (_lockObj){
for(var i = 0; i < DURATION; ++i){
if(timestamp - times[i] < DURATION){
res += hits[i];
}
}
}
return res;
}
}
My questions are:
Is it just about surrounding the part which the two arrays are accessed with locks
Isn't it better to use ConcurrentDictionary as an index-array?
Any insight is appreciated.

c# 1D-byte array to 2D-double array

I'm dealing with c# concurrent-queue and multi-threading in socket-programming tcp/ip
First, I've already done with socket-programming itself. That means, I've already finished coding about client, server and stuffs about communication itself
basic structure is pipe-lined(producer-consumer problem) and now I'm doing with bit conversion
below is brief summary about my code
client-socket ->server-socket -> concurrent_queue_1(with type byte[65536],Thread_1 process this) -> concurrent_queue_2(with type double[40,3500], Thread_2 process this) -> display-data or other work(It can be gpu-work)
*(double[40,3500] can be changed to other size)
Till now,I've implemented putting_data into queue1(Thread1) and just dequeuing all(Thread2) and, its speed is about 700Mbps
The reason I used two concurrent_queue is, I want communication,and type conversion work to be processed in background regardless of main procedure about control things.
Here is the code about my own concurrent_queue with Blocking
public class BlockingConcurrentQueue<T> : IDisposable
{
private readonly ConcurrentQueue<T> _internalQueue;
private AutoResetEvent _autoResetEvent;
private long _consumed;
private long _isAddingCompleted = 0;
private long _produced;
private long _sleeping;
public BlockingConcurrentQueue()
{
_internalQueue = new ConcurrentQueue<T>();
_produced = 0;
_consumed = 0;
_sleeping = 0;
_autoResetEvent = new AutoResetEvent(false);
}
public bool IsAddingCompleted
{
get
{
return Interlocked.Read(ref _isAddingCompleted) == 1;
}
}
public bool IsCompleted
{
get
{
if (Interlocked.Read(ref _isAddingCompleted) == 1 && _internalQueue.IsEmpty)
return true;
else
return false;
}
}
public void CompleteAdding()
{
Interlocked.Exchange(ref _isAddingCompleted, 1);
}
public void Dispose()
{
_autoResetEvent.Dispose();
}
public void Enqueue(T item)
{
_internalQueue.Enqueue(item);
if (Interlocked.Read(ref _isAddingCompleted) == 1)
throw new InvalidOperationException("Adding Completed.");
Interlocked.Increment(ref _produced);
if (Interlocked.Read(ref _sleeping) == 1)
{
Interlocked.Exchange(ref _sleeping, 0);
_autoResetEvent.Set();
}
}
public bool TryDequeue(out T result)
{
if (Interlocked.Read(ref _consumed) == Interlocked.Read(ref _produced))
{
Interlocked.Exchange(ref _sleeping, 1);
_autoResetEvent.WaitOne();
}
if (_internalQueue.TryDequeue(out result))
{
Interlocked.Increment(ref _consumed);
return true;
}
return false;
}
}
My question is here
As I mentioned above, concurrent_queue1's type is byte[65536] and 65536 bytes = 8192 double data.
(40 * 3500=8192 * 17.08984375)
I want merge multiple 8192 double data into form of double[40,3500](size can be changed)and enqueue to concurrent_queue2 with Thread2
It's easy to do it with naive-approach(using many complex for loop) but it's slow cuz, It copys all the
data and expose to upper class or layer.
I'm searching method automatically enqueuing with matched size like foreach loop automatically iterates through 2D-array in row-major way, not yet found
Is there any fast way to merge 1D-byte array into form of 2D-double array and enqueue it?
Thanks for your help!
I try to understand your conversion rule, so I write this conversion code. Use Parallel to speed up the calculation.
int maxSize = 65536;
byte[] dim1Array = new byte[maxSize];
for (int i = 0; i < maxSize; ++i)
{
dim1Array[i] = byte.Parse((i % 256).ToString());
}
int dim2Row = 40;
int dim2Column = 3500;
int byteToDoubleRatio = 8;
int toDoubleSize = maxSize / byteToDoubleRatio;
double[,] dim2Array = new double[dim2Row, dim2Column];
Parallel.For(0, toDoubleSize, i =>
{
int row = i / dim2Column;
int col = i % dim2Column;
int originByteIndex = row * dim2Column * byteToDoubleRatio + col * byteToDoubleRatio;
dim2Array[row, col] = BitConverter.ToDouble(
dim1Array,
originByteIndex);
});

Task process ending before finishing all the work

I've been having trouble running multiple tasks with heavy operations.
It seems as if the task processes is killed before all the operations are complete.
The code here is an example code I used to replicate the issue. If I add something like Debug.Write(), the added wait for writing fixes the issue. The issue is gone if I test on a smaller sample size too. The reason there is a class in the example below is to create complexity for the test.
The real case where I encountered the issue first is too complicated to explain for a post here.
public static class StaticRandom
{
static int seed = Environment.TickCount;
static readonly ThreadLocal<Random> random =
new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref seed)));
public static int Next()
{
return random.Value.Next();
}
public static int Next(int maxValue)
{
return random.Value.Next(maxValue);
}
public static double NextDouble()
{
return random.Value.NextDouble();
}
}
// this is the test function I run to recreate the problem:
static void tasktest()
{
var testlist = new List<ExampleClass>();
for (var index = 0; index < 10000; ++index)
{
var newClass = new ExampleClass();
newClass.Populate(Enumerable.Range(0, 1000).ToList());
testlist.Add(newClass);
}
var anotherClassList = new List<ExampleClass>();
var threadNumber = 5;
if (threadNumber > testlist.Count)
{
threadNumber = testlist.Count;
}
var taskList = new List<Task>();
var tokenSource = new CancellationTokenSource();
CancellationToken cancellationToken = tokenSource.Token;
int stuffPerThread = testlist.Count / threadNumber;
var stuffCounter = 0;
for (var count = 1; count <= threadNumber; ++count)
{
var toSkip = stuffCounter;
var threadWorkLoad = stuffPerThread;
var currentIndex = count;
// these ifs make sure all the indexes are covered
if (stuffCounter + threadWorkLoad > testlist.Count)
{
threadWorkLoad = testlist.Count - stuffCounter;
}
else if (count == threadNumber && stuffCounter + threadWorkLoad < testlist.Count)
{
threadWorkLoad = testlist.Count - stuffCounter;
}
taskList.Add(Task.Factory.StartNew(() => taskfunc(testlist, anotherClassList, toSkip, threadWorkLoad),
cancellationToken, TaskCreationOptions.None, TaskScheduler.Default));
stuffCounter += stuffPerThread;
}
Task.WaitAll(taskList.ToArray());
}
public class ExampleClass
{
public ExampleClassInner[] Inners { get; set; }
public ExampleClass()
{
Inners = new ExampleClassInner[5];
for (var index = 0; index < Inners.Length; ++index)
{
Inners[index] = new ExampleClassInner();
}
}
public void Populate(List<int> intlist) {/*adds random ints to the inner class*/}
public ExampleClass(ExampleClass copyFrom)
{
Inners = new ExampleClassInner[5];
for (var index = 0; index < Inners.Length; ++index)
{
Inners[index] = new ExampleClassInner(copyFrom.Inners[index]);
}
}
public class ExampleClassInner
{
public bool SomeBool { get; set; } = false;
public int SomeInt { get; set; } = -1;
public ExampleClassInner()
{
}
public ExampleClassInner(ExampleClassInner copyFrom)
{
SomeBool = copyFrom.SomeBool;
SomeInt = copyFrom.SomeInt;
}
}
}
static int expensivefunc(int theint)
{
/*a lot of pointless arithmetic and loops done only on primitives and with primitives,
just to increase the complexity*/
theint *= theint + 1;
var anotherlist = Enumerable.Range(0, 10000).ToList();
for (var index = 0; index < anotherlist.Count; ++index)
{
theint += index;
if (theint % 5 == 0)
{
theint *= index / 2;
}
}
var yetanotherlist = Enumerable.Range(0, 50000).ToList();
for (var index = 0; index < yetanotherlist.Count; ++index)
{
theint += index;
if (theint % 7 == 0)
{
theint -= index / 3;
}
}
while (theint > 8)
{
theint /= 2;
}
return theint;
}
// this function is intentionally creating a lot of objects, to simulate complexity
static void taskfunc(List<ExampleClass> intlist, List<ExampleClass> anotherClassList, int skip, int take)
{
if (take == 0)
{
take = intlist.Count;
}
var partial = intlist.Skip(skip).Take(take).ToList();
for (var index = 0; index < partial.Count; ++index)
{
var testint = expensivefunc(index);
var newClass = new ExampleClass(partial[index]);
newDna.Inners[StaticRandom.Next(5)].SomeInt = testint;
anotherClassList.Add(new ExampleClass(newClass));
}
}
The expected result is that the list anotherClassList will be the same size as testlist and this happens when the lists are smaller or the complexity of the task operations is smaller. However, when I increase the volume of operations, the anotherClassList has a few indexes missing and sometimes some of the indexes in the list are null objects.
Example result:
Why does this happen, I have Task.WaitAll?
Your problem is it's just not thread-safe; you just can't add to a list<T> in a multi-threaded environment and expect it to play nice.
One way is to use lock or a thread safe collection, but I feel this all should be refactored (my OCD is going off all over the place).
private static object _sync = new object();
...
private static void TaskFunc(List<ExampleClass> intlist, List<ExampleClass> anotherClassList, int skip, int take)
{
...
var partial = intlist.Skip(skip).Take(take).ToList();
...
// note that locking here will likely drastically decrease any performance threading gain
lock (_sync)
{
for (var index = 0; index < partial.Count; ++index)
{
// this is your problem, you are adding to a list from multiple threads
anotherClassList.Add(...);
}
}
}
In short, I think you need to better thinking about the threading logic of your method, identify what you are trying to achieve, and how to make it conceptually thread safe (while keeping your performance gains).
After TheGeneral enlightened me that Lists are not thread safe, I changed the List to which I was adding in a thread, to an Array type and this fixed my issue.

Thread safety Parallel.For c#

im frenchi so sorry first sorry for my english .
I have an error on visual studio (index out of range) i have this problem only with a Parallel.For not with classic for.
I think one thread want acces on my array[i] and another thread want too ..
It's a code for calcul Kmeans clustering for building link between document (with cosine similarity).
more information :
IndexOutOfRange is on similarityMeasure[i]=.....
I have a computer with 2 Processor (12logical)
with classic for , cpu usage is 9-14% , time for 1 iteration=9min..
with parallel.for , cpu usage is 70-90% =p, time for 1 iteration =~1min30
Sometimes it works longer before generating an error
My function is :
private static int FindClosestClusterCenter(List<Centroid> clustercenter, DocumentVector obj)
{
float[] similarityMeasure = new float[clustercenter.Count()];
float[] copy = similarityMeasure;
object sync = new Object();
Parallel.For(0, clustercenter.Count(), (i) => //for(int i = 0; i < clustercenter.Count(); i++) Parallel.For(0, clustercenter.Count(), (i) => //
{
similarityMeasure[i] = SimilarityMatrics.FindCosineSimilarity(clustercenter[i].GroupedDocument[0].VectorSpace, obj.VectorSpace);
});
int index = 0;
float maxValue = similarityMeasure[0];
for (int i = 0; i < similarityMeasure.Count(); i++)
{
if (similarityMeasure[i] > maxValue)
{
maxValue = similarityMeasure[i];
index = i;
}
}
return index;
}
My function is call here :
do
{
prevClusterCenter = centroidCollection;
DateTime starttime = DateTime.Now;
foreach (DocumentVector obj in documentCollection)//Parallel.ForEach(documentCollection, parallelOptions, obj =>//foreach (DocumentVector obj in documentCollection)
{
int ind = FindClosestClusterCenter(centroidCollection, obj);
resultSet[ind].GroupedDocument.Add(obj);
}
TimeSpan tempsecoule = DateTime.Now.Subtract(starttime);
Console.WriteLine(tempsecoule);
//Console.ReadKey();
InitializeClusterCentroid(out centroidCollection, centroidCollection.Count());
centroidCollection = CalculMeanPoints(resultSet);
stoppingCriteria = CheckStoppingCriteria(prevClusterCenter, centroidCollection);
if (!stoppingCriteria)
{
//initialisation du resultat pour la prochaine itération
InitializeClusterCentroid(out resultSet, centroidCollection.Count);
}
} while (stoppingCriteria == false);
_counter = counter;
return resultSet;
FindCosSimilarity :
public static float FindCosineSimilarity(float[] vecA, float[] vecB)
{
var dotProduct = DotProduct(vecA, vecB);
var magnitudeOfA = Magnitude(vecA);
var magnitudeOfB = Magnitude(vecB);
float result = dotProduct / (float)Math.Pow((magnitudeOfA * magnitudeOfB),2);
//when 0 is divided by 0 it shows result NaN so return 0 in such case.
if (float.IsNaN(result))
return 0;
else
return (float)result;
}
CalculMeansPoint :
private static List<Centroid> CalculMeanPoints(List<Centroid> _clust)
{
for (int i = 0; i < _clust.Count(); i++)
{
if (_clust[i].GroupedDocument.Count() > 0)
{
for (int j = 0; j < _clust[i].GroupedDocument[0].VectorSpace.Count(); j++)
{
float total = 0;
foreach (DocumentVector vspace in _clust[i].GroupedDocument)
{
total += vspace.VectorSpace[j];
}
_clust[i].GroupedDocument[0].VectorSpace[j] = total / _clust[i].GroupedDocument.Count();
}
}
}
return _clust;
}
You may have some side effects in FindCosineSimilarity, make sure it does not modify any field or input parameter. Example: resultSet[ind].GroupedDocument.Add(obj);. If resultSet is not a reference to locally instantiated array, then that is a side effect.
That may fix it. But FYI you could use AsParallel for this rather than Parallel.For:
similarityMeasure = clustercenter
.AsParallel().AsOrdered()
.Select(c=> SimilarityMatrics.FindCosineSimilarity(c.GroupedDocument[0].VectorSpace, obj.VectorSpace))
.ToArray();
You realize that if you synchronize the whole Content of the Parallel-For, it's just the same as having a normal synchrone for-loop, right? Meaning the code as it is doesnt do anything in parallel, so I dont think you'll have any Problems with concurrency. My guess from what I can tell is clustercenter[i].GroupedDocument is propably an empty Array.

Monitoring the FPS of a Direct X Application

I am looking to create an external application that monitors the 'FPS' of a DirectX application (like FRAPS without the recording). I have read several Microsoft articles on performance measuring tools - but I am looking to get the feedback (and experience) of the community.
My question: what is the best method for obtaining the FPS of a DirectX application?
Windows has some Event Tracing for Windows providers related to DirectX profiling. The most intresting ones are Microsoft-Windows-D3D9 and Microsoft-Windows-DXGI, which allow tracing of the frame presentation events. The simplest way to calculate FPS is to count the number of PresentStart events withing a time interval and divide that by the length of the interval.
To work with ETW in C#, install Microsoft.Diagnostics.Tracing.TraceEvent package.
The following code sample displays FPS of running processes:
using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Threading;
using Microsoft.Diagnostics.Tracing.Session;
namespace ConsoleApp1
{
//helper class to store frame timestamps
public class TimestampCollection
{
const int MAXNUM = 1000;
public string Name { get; set; }
List<long> timestamps = new List<long>(MAXNUM + 1);
object sync = new object();
//add value to the collection
public void Add(long timestamp)
{
lock (sync)
{
timestamps.Add(timestamp);
if (timestamps.Count > MAXNUM) timestamps.RemoveAt(0);
}
}
//get the number of timestamps withing interval
public int QueryCount(long from, long to)
{
int c = 0;
lock (sync)
{
foreach (var ts in timestamps)
{
if (ts >= from && ts <= to) c++;
}
}
return c;
}
}
class Program
{
//event codes (https://github.com/GameTechDev/PresentMon/blob/40ee99f437bc1061a27a2fc16a8993ee8ce4ebb5/PresentData/PresentMonTraceConsumer.cpp)
public const int EventID_D3D9PresentStart = 1;
public const int EventID_DxgiPresentStart = 42;
//ETW provider codes
public static readonly Guid DXGI_provider = Guid.Parse("{CA11C036-0102-4A2D-A6AD-F03CFED5D3C9}");
public static readonly Guid D3D9_provider = Guid.Parse("{783ACA0A-790E-4D7F-8451-AA850511C6B9}");
static TraceEventSession m_EtwSession;
static Dictionary<int, TimestampCollection> frames = new Dictionary<int, TimestampCollection>();
static Stopwatch watch = null;
static object sync = new object();
static void EtwThreadProc()
{
//start tracing
m_EtwSession.Source.Process();
}
static void OutputThreadProc()
{
//console output loop
while (true)
{
long t1, t2;
long dt = 2000;
Console.Clear();
Console.WriteLine(DateTime.Now.ToString() + "." + DateTime.Now.Millisecond.ToString());
Console.WriteLine();
lock (sync)
{
t2 = watch.ElapsedMilliseconds;
t1 = t2 - dt;
foreach (var x in frames.Values)
{
Console.Write(x.Name + ": ");
//get the number of frames
int count = x.QueryCount(t1, t2);
//calculate FPS
Console.WriteLine("{0} FPS", (double)count / dt * 1000.0);
}
}
Console.WriteLine();
Console.WriteLine("Press any key to stop tracing...");
Thread.Sleep(1000);
}
}
public static void Main(string[] argv)
{
//create ETW session and register providers
m_EtwSession = new TraceEventSession("mysess");
m_EtwSession.StopOnDispose = true;
m_EtwSession.EnableProvider("Microsoft-Windows-D3D9");
m_EtwSession.EnableProvider("Microsoft-Windows-DXGI");
//handle event
m_EtwSession.Source.AllEvents += data =>
{
//filter out frame presentation events
if (((int)data.ID == EventID_D3D9PresentStart && data.ProviderGuid == D3D9_provider) ||
((int)data.ID == EventID_DxgiPresentStart && data.ProviderGuid == DXGI_provider))
{
int pid = data.ProcessID;
long t;
lock (sync)
{
t = watch.ElapsedMilliseconds;
//if process is not yet in Dictionary, add it
if (!frames.ContainsKey(pid))
{
frames[pid] = new TimestampCollection();
string name = "";
var proc = Process.GetProcessById(pid);
if (proc != null)
{
using (proc)
{
name = proc.ProcessName;
}
}
else name = pid.ToString();
frames[pid].Name = name;
}
//store frame timestamp in collection
frames[pid].Add(t);
}
}
};
watch = new Stopwatch();
watch.Start();
Thread thETW = new Thread(EtwThreadProc);
thETW.IsBackground = true;
thETW.Start();
Thread thOutput = new Thread(OutputThreadProc);
thOutput.IsBackground = true;
thOutput.Start();
Console.ReadKey();
m_EtwSession.Dispose();
}
}
}
Based on the source code of PresentMon project.
Fraps inserts a DLL into every running application and hooks specific DX calls to figure out the framerate and capture video, pretty sure that you'll have to do something similar. After a bit of poking around I found a Github project that does some basic DX hooking for doing captures and overlays, so that might be a good spot to start out with. Though I've not used it personally so I can't totally vouch for the quality.
http://spazzarama.com/2011/03/14/c-screen-capture-and-overlays-for-direct3d-9-10-and-11-using-api-hooks/
Building on https://stackoverflow.com/a/54625953/12047161:
I had more success not using the stopwatch as the event triggers seems to be asynchronous with the actual frames. I kept getting batches of 20-50 frames all at once, making the estimated FPS fluctuate between 50 and 250% of the actual value.
Instead i used TimeStampRelativeMSec
//handle event
m_EtwSession.Source.AllEvents += data =>
{
//filter out frame presentation events
if((int) data.ID == EventID_DxgiPresentStart && data.ProviderGuid == DXGI_provider)
{
int pid = data.ProcessID;
long t;
t = watch.ElapsedMilliseconds;
//if process is not yet in Dictionary, add it
if (!frames.ContainsKey(pid))
{
frames[pid] = new TimestampCollection();
string name = "";
var proc = Process.GetProcessById(pid);
if (proc != null)
{
using (proc)
{
name = proc.ProcessName;
}
}
else name = pid.ToString();
frames[pid].Name = name;
}
frames[pid].Add((long)data.TimeStampRelativeMSec);
}
};
property from the TraceEvent class, and calculate FPS by rounding the average time between an arbitrary number of past entries:
public double GetFrameTime(int count)
{
double returnValue = 0;
int listCount = timestamps.Count;
if(listCount > count)
{
for(int i = 1; i <= count; i++)
{
returnValue += timestamps[listCount - i] - timestamps[listCount - (i + 1)];
}
returnValue /= count;
}
return returnValue;
}
This method gave me far more accurate (Compared to, as available, in-game counters) of several different games i've tried.

Categories

Resources