I have a working implementation of NAudio's wasapi loopback recording and the FFT of the data.
Most of the data I get is just as it should be but every once in a while (10 sec to minutes intervals) it shows amplitude on almost all frequencies.
Basicly the picture is rolling from right to left with time and frequencies going on logarithmic scale from lowest frequencies on the bottom. The lines are the errors. As far as i can tell those are not supposed to be there.
I get the audio buffer and send the samples to an aggregator (applies Hamming window) which implements the NAudio FFT. I have checked the data (FFT result) before I modify it in any way (the picture is not from the raw FFT output, but desibel scaled) confirming the FFT result is giving those lines. I could also point out the picture is modified with LockBits so I thought I had something wrong with the logic there, but that's why I checked the FFT output data which shows the same problem.
Well I could be wrong and the problem might be somewhere I said it isn't but it really seems it originates from the FFT OR the buffer data (data itself or the aggregation of samples). Somehow I doubt the buffer itself is corrupted like this.
If anyone has any idea what could cause this I would greatly appreciate it!
UPDATE
So I decided to draw the whole FFT result range rather than half of it. It showed something strange. I'm not sure of FFT but I thought Fourier transformation should give a result that is mirrored around the middle. This certainly is not the case here.
The picture is in linear scale so the exact middle of the picture is the middle point of the FFT result. Bottom is the first and top is the last.
I was playing a 10kHz sine wave which gives the two horizontal lines there but the top part is beyond me. It also seems like the lines are mirrored around the bottom quarter of the picture so that seems strange to me as well.
UPDATE 2
So I increased the FFT size from 4096 to 8192 and tried again. This is the output with me messing with the sine frequency.
It would seem the result is mirrored twice. Once in the middle and then again on the top and bottom halves. And the huge lines are now gone.. And it would seem like the lines only appear on the bottom half now.
After some further testing with different FFT lengths it seems the lines are completely random in that account.
UPDATE 3
I have done some testing with many things. The latest thing I added was overlapping of samples so that I reuse the last half of the sample array in the beginning of the next FFT. On Hamming and Hann windows it gives me massive intensities (quite like in the second picture I posted) but not with BlackmannHarris. Disabling overlapping removes the biggest errors on every window function. The smaller errors like in the top picture still remain even with BH window. I still have no idea why those lines appear.
My current form allows control over which window function to use (of the three previously mentioned), overlapping (on/off) and multiple different drawing options. This allows me to compare all the affecting parties effects when changed.
I shall investigate further (I am quite sure I have made a mistake at some point) but good suggestions are more than welcome!
The problem was in the way I handled the data arrays. Working like a charm now.
Code (removed excess and might have added mistakes):
// Other inputs are also usable. Just look through the NAudio library.
private IWaveIn waveIn;
private static int fftLength = 8192; // NAudio fft wants powers of two!
// There might be a sample aggregator in NAudio somewhere but I made a variation for my needs
private SampleAggregator sampleAggregator = new SampleAggregator(fftLength);
public Main()
{
sampleAggregator.FftCalculated += new EventHandler<FftEventArgs>(FftCalculated);
sampleAggregator.PerformFFT = true;
// Here you decide what you want to use as the waveIn.
// There are many options in NAudio and you can use other streams/files.
// Note that the code varies for each different source.
waveIn = new WasapiLoopbackCapture();
waveIn.DataAvailable += OnDataAvailable;
waveIn.StartRecording();
}
void OnDataAvailable(object sender, WaveInEventArgs e)
{
if (this.InvokeRequired)
{
this.BeginInvoke(new EventHandler<WaveInEventArgs>(OnDataAvailable), sender, e);
}
else
{
byte[] buffer = e.Buffer;
int bytesRecorded = e.BytesRecorded;
int bufferIncrement = waveIn.WaveFormat.BlockAlign;
for (int index = 0; index < bytesRecorded; index += bufferIncrement)
{
float sample32 = BitConverter.ToSingle(buffer, index);
sampleAggregator.Add(sample32);
}
}
}
void FftCalculated(object sender, FftEventArgs e)
{
// Do something with e.result!
}
And the Sample Aggregator class:
using NAudio.Dsp; // The Complex and FFT are here!
class SampleAggregator
{
// FFT
public event EventHandler<FftEventArgs> FftCalculated;
public bool PerformFFT { get; set; }
// This Complex is NAudio's own!
private Complex[] fftBuffer;
private FftEventArgs fftArgs;
private int fftPos;
private int fftLength;
private int m;
public SampleAggregator(int fftLength)
{
if (!IsPowerOfTwo(fftLength))
{
throw new ArgumentException("FFT Length must be a power of two");
}
this.m = (int)Math.Log(fftLength, 2.0);
this.fftLength = fftLength;
this.fftBuffer = new Complex[fftLength];
this.fftArgs = new FftEventArgs(fftBuffer);
}
bool IsPowerOfTwo(int x)
{
return (x & (x - 1)) == 0;
}
public void Add(float value)
{
if (PerformFFT && FftCalculated != null)
{
// Remember the window function! There are many others as well.
fftBuffer[fftPos].X = (float)(value * FastFourierTransform.HammingWindow(fftPos, fftLength));
fftBuffer[fftPos].Y = 0; // This is always zero with audio.
fftPos++;
if (fftPos >= fftLength)
{
fftPos = 0;
FastFourierTransform.FFT(true, m, fftBuffer);
FftCalculated(this, fftArgs);
}
}
}
}
public class FftEventArgs : EventArgs
{
[DebuggerStepThrough]
public FftEventArgs(Complex[] result)
{
this.Result = result;
}
public Complex[] Result { get; private set; }
}
And that is it I think. I might have missed something though.
Hope this helps!
Related
I am using Visual Studio C# 2010 on Windows 10.
EDIT: My objective is to create an abort button which allows me to use the UI to cancel a loop.
I'm using an example I found online which is similar to this:
Here is the code:
public void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
e.Result = "";
for (int i = 0; i < 100; i++)
{
System.Threading.Thread.Sleep(50); //do some intense task here.
if (backgroundWorker1.CancellationPending)
{
e.Cancel = true;
return;
}
}
}
(There was a bit of code sharing the date/time but I removed it since it was superfluous to the basic example).
How I'm understanding this code is that, 100 times, the code pauses the main thread and works on "some intense task" for 50ms. Is this the case?
If so, I think I've ran into a problem. I have this code I want to run:
private void btn_runtest_Click(object sender, EventArgs e)
{
// Exterior Platform Loop
Output.AppendText("Starting Test \n");
for (int i = 0; i <= 3200; i += 178)
{
// Interior Platform Loop
for (int ii = 0; i <= 6400; ii += 178)
{
comport7_interior.Write("#1N178\r"); //change to have actual length?
comport7_interior.Write("#1G\r");
Output.AppendText("Interior shift 5 degrees \n");
Thread.Sleep(4000);
// ***********************************************************************************
// Read data from comport2_lightsensor, Parse byte, store Lux value, store angle (i,ii)
// ***********************************************************************************
}
comport8_exterior.Write("#0N178\r");
comport8_exterior.Write("#0G\r");
Output.AppendText("Exterior shift 5 degrees \n");
Thread.Sleep(4000);
//flip direction for internal
if (IsOdd(jjj))
{
//
comport7_interior.Write("#1-\r");
Output.AppendText("Interior switching direction to counter clockwise \n");
}
else
{
comport7_interior.Write("#1+\r");
Output.AppendText("Interior switching direction to clockwise \n");
}
jjj = jjj + 1;
// ***********************************************************************************
// Read data from compart2_lightsensor, Parse byte, store Lux value, store angle (i,ii)
// ***********************************************************************************
}
Output.AppendText("Loop Ended");
// manually reverse mount direction
// repeat the whole double for loop
}
This is pretty "fatty" code, so to summarize, it controls two stepper motors, rotates them to follow the desired path, pauses for 4 seconds, and will eventually log data. We could do the exact math but simply estimating we can realize this code will take 1-2 hours. With all of this pausing, it does not seem conducive to being split into 100 little chunks on worked on separately (if my understanding of the previously stated code is correct). Correct me I'm wrong but I don't think this code would fit in the above code.
Does this mean I'm looking at the wrong approach?
Thank you all in advance.
thats how i wrote your beautiful code(some simple changes for me for easier understanding)
private void Form1_Load(object sender, EventArgs e)
{
prev = GetDesktopImage();//get a screenshot of the desktop;
cur = GetDesktopImage();//get a screenshot of the desktop;
var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
ApplyXor(locked1, locked2);
compressionBuffer = new byte[1920* 1080 * 4];
// Compressed buffer -- where the data goes that we'll send.
int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
backbuf = new CompressedCaptureScreen(backbufSize);
MessageBox.Show(compressionBuffer.Length.ToString());
int length = Compress();
MessageBox.Show(backbuf.Data.Length.ToString());//prints the new buffer size
}
the compression buffer length is for example 8294400
and the backbuff.Data.length is 8326947
I didn't like the compression suggestions, so here's what I would do.
You don't want to compress a video stream (so MPEG, AVI, etc are out of the question -- these don't have to be real-time) and you don't want to compress individual pictures (since that's just stupid).
Basically what you want to do is detect if things change and send the differences. You're on the right track with that; most video compressors do that. You also want a fast compression/decompression algorithm; especially if you go to more FPS that will become more relevant.
Differences. First off, eliminate all branches in your code, and make sure memory access is sequential (e.g. iterate x in the inner loop). The latter will give you cache locality. As for the differences, I'd probably use a 64-bit XOR; it's easy, branchless and fast.
If you want performance, it's probably better to do this in C++: The current C# implementation doesn't vectorize your code, and that will help you a great deal here.
Do something like this (I'm assuming 32bit pixel format):
for (int y=0; y<height; ++y) // change to PFor if you like
{
ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
for (int x=0; x<width; x += 2)
row2[x] ^= row1[x];
}
Fast compression and decompression usually means simpler compression algorithms. https://code.google.com/p/lz4/ is such an algorithm, and there's a proper .NET port available for that as well. You might want to read on how it works too; there is a streaming feature in LZ4 and if you can make it handle 2 images instead of 1 that will probably give you a nice compression boost.
All in all, if you're trying to compress white noise, it simply won't work and your frame rate will drop. One way to solve this is to reduce the colors if you have too much 'randomness' in a frame. A measure for randomness is entropy, and there are several ways to get a measure of the entropy of a picture ( https://en.wikipedia.org/wiki/Entropy_(information_theory) ). I'd stick with a very simple one: check the size of the compressed picture -- if it's above a certain limit, reduce the number of bits; if below, increase the number of bits.
Note that increasing and decreasing bits is not done with shifting in this case; you don't need your bits to be removed, you simply need your compression to work better. It's probably just as good to use a simple 'AND' with a bitmask. For example, if you want to drop 2 bits, you can do it like this:
for (int y=0; y<height; ++y) // change to PFor if you like
{
ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
ulong mask = 0xFFFCFCFCFFFCFCFC;
for (int x=0; x<width; x += 2)
row2[x] = (row2[x] ^ row1[x]) & mask;
}
PS: I'm not sure what I would do with the alpha component, I'll leave that up to your experimentation.
Good luck!
The long answer
I had some time to spare, so I just tested this approach. Here's some code to support it all.
This code normally run over 130 FPS with a nice constant memory pressure on my laptop, so the bottleneck shouldn't be here anymore. Note that you need LZ4 to get this working and that LZ4 is aimed at high speed, not high compression ratio's. A bit more on that later.
First we need something that we can use to hold all the data we're going to send. I'm not implementing the sockets stuff itself here (although that should be pretty simple using this as a start), I mainly focused on getting the data you need to send something over.
// The thing you send over a socket
public class CompressedCaptureScreen
{
public CompressedCaptureScreen(int size)
{
this.Data = new byte[size];
this.Size = 4;
}
public int Size;
public byte[] Data;
}
We also need a class that will hold all the magic:
public class CompressScreenCapture
{
Next, if I'm running high performance code, I make it a habit to preallocate all the buffers first. That'll save you time during the actual algorithmic stuff. 4 buffers of 1080p is about 33 MB, which is fine - so let's allocate that.
public CompressScreenCapture()
{
// Initialize with black screen; get bounds from screen.
this.screenBounds = Screen.PrimaryScreen.Bounds;
// Initialize 2 buffers - 1 for the current and 1 for the previous image
prev = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
cur = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
// Clear the 'prev' buffer - this is the initial state
using (Graphics g = Graphics.FromImage(prev))
{
g.Clear(Color.Black);
}
// Compression buffer -- we don't really need this but I'm lazy today.
compressionBuffer = new byte[screenBounds.Width * screenBounds.Height * 4];
// Compressed buffer -- where the data goes that we'll send.
int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
backbuf = new CompressedCaptureScreen(backbufSize);
}
private Rectangle screenBounds;
private Bitmap prev;
private Bitmap cur;
private byte[] compressionBuffer;
private int backbufSize;
private CompressedCaptureScreen backbuf;
private int n = 0;
First thing to do is capture the screen. This is the easy part: simply fill the bitmap of the current screen:
private void Capture()
{
// Fill 'cur' with a screenshot
using (var gfxScreenshot = Graphics.FromImage(cur))
{
gfxScreenshot.CopyFromScreen(screenBounds.X, screenBounds.Y, 0, 0, screenBounds.Size, CopyPixelOperation.SourceCopy);
}
}
As I said, I don't want to compress 'raw' pixels. Instead, I'd much rather compress XOR masks of previous and the current image. Most of the times this will give you a whole lot of 0's, which is easy to compress:
private unsafe void ApplyXor(BitmapData previous, BitmapData current)
{
byte* prev0 = (byte*)previous.Scan0.ToPointer();
byte* cur0 = (byte*)current.Scan0.ToPointer();
int height = previous.Height;
int width = previous.Width;
int halfwidth = width / 2;
fixed (byte* target = this.compressionBuffer)
{
ulong* dst = (ulong*)target;
for (int y = 0; y < height; ++y)
{
ulong* prevRow = (ulong*)(prev0 + previous.Stride * y);
ulong* curRow = (ulong*)(cur0 + current.Stride * y);
for (int x = 0; x < halfwidth; ++x)
{
*(dst++) = curRow[x] ^ prevRow[x];
}
}
}
}
For the compression algorithm I simply pass the buffer to LZ4 and let it do its magic.
private int Compress()
{
// Grab the backbuf in an attempt to update it with new data
var backbuf = this.backbuf;
backbuf.Size = LZ4.LZ4Codec.Encode(
this.compressionBuffer, 0, this.compressionBuffer.Length,
backbuf.Data, 4, backbuf.Data.Length-4);
Buffer.BlockCopy(BitConverter.GetBytes(backbuf.Size), 0, backbuf.Data, 0, 4);
return backbuf.Size;
}
One thing to note here is that I make it a habit to put everything in my buffer that I need to send over the TCP/IP socket. I don't want to move data around if I can easily avoid it, so I'm simply putting everything that I need on the other side there.
As for the sockets itself, you can use a-sync TCP sockets here (I would), but if you do, you will need to add an extra buffer.
The only thing that remains is to glue everything together and put some statistics on the screen:
public void Iterate()
{
Stopwatch sw = Stopwatch.StartNew();
// Capture a screen:
Capture();
TimeSpan timeToCapture = sw.Elapsed;
// Lock both images:
var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
try
{
// Xor screen:
ApplyXor(locked2, locked1);
TimeSpan timeToXor = sw.Elapsed;
// Compress screen:
int length = Compress();
TimeSpan timeToCompress = sw.Elapsed;
if ((++n) % 50 == 0)
{
Console.Write("Iteration: {0:0.00}s, {1:0.00}s, {2:0.00}s " +
"{3} Kb => {4:0.0} FPS \r",
timeToCapture.TotalSeconds, timeToXor.TotalSeconds,
timeToCompress.TotalSeconds, length / 1024,
1.0 / sw.Elapsed.TotalSeconds);
}
// Swap buffers:
var tmp = cur;
cur = prev;
prev = tmp;
}
finally
{
cur.UnlockBits(locked1);
prev.UnlockBits(locked2);
}
}
Note that I reduce Console output to ensure that's not the bottleneck. :-)
Simple improvements
It's a bit wasteful to compress all those 0's, right? It's pretty easy to track the min and max y position that has data using a simple boolean.
ulong tmp = curRow[x] ^ prevRow[x];
*(dst++) = tmp;
hasdata |= tmp != 0;
You also probably don't want to call Compress if you don't have to.
After adding this feature you'll get something like this on your screen:
Iteration: 0.00s, 0.01s, 0.01s 1 Kb => 152.0 FPS
Using another compression algorithm might also help. I stuck to LZ4 because it's simple to use, it's blazing fast and compresses pretty well -- still, there are other options that might work better. See http://fastcompression.blogspot.nl/ for a comparison.
If you have a bad connection or if you're streaming video over a remote connection, all this won't work. Best to reduce the pixel values here. That's quite simple: apply a simple 64-bit mask during the xor to both the previous and current picture... You can also try using indexed colors - anyhow, there's a ton of different things you can try here; I just kept it simple because that's probably good enough.
You can also use Parallel.For for the xor loop; personally I didn't really care about that.
A bit more challenging
If you have 1 server that is serving multiple clients, things will get a bit more challenging, as they will refresh at different rates. We want the fastest refreshing client to determine the server speed - not slowest. :-)
To implement this, the relation between the prev and cur has to change. If we simply 'xor' away like here, we'll end up with a completely garbled picture at the slower clients.
To solve that, we don't want to swap prev anymore, as it should hold key frames (that you'll refresh when the compressed data becomes too big) and cur will hold incremental data from the 'xor' results. This means you can basically grab an arbitrary 'xor'red frame and send it over the line - as long as the prev bitmap is recent.
H264 or Equaivalent Codec Streaming
There are various compressed streaming available which does almost everything that you can do to optimize screen sharing over network. There are many open source and commercial libraries to stream.
Screen transfer in Blocks
H264 already does this, but if you want to do it yourself, you have to divide your screens into smaller blocks of 100x100 pixels, and compare these blocks with previous version and send these blocks over network.
Window Render Information
Microsoft RDP does lot better, it does not send screen as a raster image, instead it analyzes screen and creates screen blocks based on the windows on the screen. It then analyzes contents of screen and sends image only if needed, if it is a text box with some text in it, RDP sends information to render text box with a text with font information and other information. So instead of sending image, it sends information on what to render.
You can combine all techniques and make a mixed protocol to send screen blocks with image and other rendering information.
Instead of handling data as an array of bytes, you can handle it as an array of integers.
int* p = (int*)((byte*)scan0.ToPointer() + y * stride);
int* p2 = (int*)((byte*)scan02.ToPointer() + y * stride2);
for (int x = 0; x < nWidth; x++)
{
//always get the complete pixel when differences are found
if (*p2 != 0)
*p = *p2
++p;
++p2;
}
I'm currently using Naudio to create a mixer with multiple inputs and multiple outputs.
For each input, while the audio is playing, I would like to be able to add a sample sized delay whenever I choose to.
The effect will be similar to pausing for, say 100 samples, and then playing again.
For example, a mixingSampleProvider is currently playing audio to output Channel 5, using multiplexingSampleProvider. On a certain button click, i would like to delay mixingSampleProvider by 200 samples, before it continues playing from the point it paused for a moment.
I would like to know if it's possible to do so, and what options can be looked at to achieve this effect.
Edit:
The specific problem which requires fixing, would be the knowledge gap I have in inserting Zeros to the buffer, while pushing all the rest of the samples back, while audio is playing. Here, I'm not looking for code to be written, but am looking for resources to attempt to create a new class to solve this issue. Or whether it is even possible at all.
Edit 2: Code I have tried (failed)
public int Read(float[] buffer, int offset, int count)
{
int sampleRead = source.Read(buffer,offset,count);
int delaySamples = Math.Min(count, DelayBySamples - delayPos);
float[] copyBuffer = buffer;
for (int n = 0; n < delaySamples;n++ )
{
buffer[offset + n] = 0;
}
for (int n = 0; n < sampleRead;n++ )
{
buffer[offset + delaySamples + n] = copyBuffer[offset+n];
}
delayPos += delaySamples;
sampleRead += delaySamples;
return sampleRead;
}
You should implement this by creating your own custom ISampleProvider that wraps each input to your mixer (Decorator pattern). Then have a method that tells it to insert n samples of silence. Then in the Read method of your SampleProvider if silence is required, then put up to the desired number of silent samples into the read buffer. And when the pause finishes, start returning samples from the source provider again.
You can have a look at the source code to the OffsetSampleProvider for inspiration, which allows you to add a specified number of silence samples at the start of your stream.
I have a compute shader and the C# script which goes with it used to modify an array of vertices on the y axis simple enough to be clear.
But despite the fact that it runs fine the shader seems to forget the first vertex of my shape (except when that shape is a closed volume?)
Here is the C# class :
Mesh m;
//public bool stopProcess = false; //Useless in this version of exemple
MeshCollider coll;
public ComputeShader csFile; //the compute shader file added the Unity way
Vector3[] arrayToProcess; //An array of vectors i'll use to store data
ComputeBuffer cbf; //the buffer CPU->GPU (An early version with exactly
//the same result had only this one)
ComputeBuffer cbfOut; //the Buffer GPU->CPU
int vertexLength;
void Awake() { //Assigning my stuff
coll = gameObject.GetComponent<MeshCollider>();
m = GetComponent<MeshFilter>().sharedMesh;
vertexLength = m.vertices.Length;
arrayToProcess = m.vertices; //setting the first version of the vertex array (copy of mesh)
}
void Start () {
cbf = new ComputeBuffer(vertexLength,32); //Buffer in
cbfOut = new ComputeBuffer(vertexLength,32); //Buffer out
csFile.SetBuffer(0,"Board",cbf);
csFile.SetBuffer(0,"BoardOut",cbfOut);
}
void Update () {
csFile.SetFloat("time",Time.time);
cbf.SetData(m.vertices);
csFile.Dispatch(0,vertexLength,vertexLength,1); //Dispatching (i think there is my mistake)
cbfOut.GetData(arrayToProcess); //getting back my processed vertices
m.vertices = arrayToProcess; //assigning them to the mesh
//coll.sharedMesh = m; //collider stuff useless in this demo
}
And my compute shader script :
#pragma kernel CSMain
RWStructuredBuffer<float3> Board : register(s[0]);
RWStructuredBuffer<float3> BoardOut : register(s[1]);
float time;
[numthreads(1,1,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
float valx = (sin((time*4)+Board[id.x].x));
float valz = (cos((time*2)+Board[id.x].z));
Board[id.x].y = (valx + valz)/5;
BoardOut[id.x] = Board[id.x];
}
At the beginning I was reading and writing from the same buffer, but as I had my issue I tried having separate buffers, but with no success. I still have the same problem.
Maybe I misunderstood the way compute shaders are supposed to be used (and I know I could use a vertex shader but I just want to try compute shaders for further improvements.)
To complete what I said, I suppose it is related with the way vertices are indexed in the Mesh.vertices Array.
I tried a LOT of different Blocks/Threads configuration but nothing seems to solve the issue combinations tried :
Block Thread
60,60,1 1,1,1
1,1,1 60,60,3
10,10,3 3,1,1
and some others I do not remember. I think the best configuration should be something with a good balance like :
Block : VertexCount,1,1 Thread : 3,1,1
About the closed volume: I'm not sure about that because with a Cube {8 Vertices} everything seems to move accordingly, but with a shape with an odd number of vertices, the first (or last did not checked that yet) seems to not be processed
I tried it with many different shapes but subdivided planes are the most obvious, one corner is always not moving.
EDIT :
After further study i found out that it is simply the compute shader which does not compute the last (not the first i checked) vertices of the mesh, it seems related to the buffer type, i still dont get why RWStructuredBuffer should be an issue or how badly i use it, is it reserved to streams? i cant understand the MSDN doc on this one.
EDIT : After resolution
The C# script :
using UnityEngine;
using System.Collections;
public class TreeObject : MonoBehaviour {
Mesh m;
public bool stopProcess = false;
MeshCollider coll;
public ComputeShader csFile;
Vector3[] arrayToProcess;
ComputeBuffer cbf;
ComputeBuffer cbfOut;
int vertexLength;
// Use this for initialization
void Awake() {
coll = gameObject.GetComponent<MeshCollider>();
m = GetComponent<MeshFilter>().mesh;
vertexLength = m.vertices.Length+3; //I add 3 because apparently
//vertexnumber is odd
//arrayToProcess = new Vector3[vertexLength];
arrayToProcess = m.vertices;
}
void Start () {
cbf = new ComputeBuffer(vertexLength,12);
cbfOut = new ComputeBuffer(vertexLength,12);
csFile.SetBuffer(0,"Board",cbf);
csFile.SetBuffer(0,"BoardOut",cbfOut);
}
// Update is called once per frame
void Update () {
csFile.SetFloat("time",Time.time);
cbf.SetData(m.vertices);
csFile.Dispatch(0,vertexLength,1,1);
cbfOut.GetData(arrayToProcess);
m.vertices = arrayToProcess;
coll.sharedMesh = m;
}
}
I had already rolled back to a
Blocks VCount,1,1
Before your answer because it was logic that i was using VCount*VCount so processing the vertices "square-more" times than needed.
To complete, you were absolutely right the Stride was obviously giving issues could you complete your answer with a link to doc about the stride parameter? (from anywhere because Unity docs are VOID and MSDN did not helped me to get why it should be 12 and not 32 (as i thought 32 was the size of a float3)
so Doc needed please
In the mean time i'll try to provide a flexible enough (generic?) version of this to make it stronger, and start adding some nice array processing functions in my shader...
I'm familiar with Compute Shaders but have never touched Unity, but having looked over the documentation for Compute Shaders in Unity a couple of things stand out.
The cbf and cbfOut ComputeBuffers are created with a stride of 32 (bytes?). Both your StructuredBuffers contain float3s which have a stride of 12 bytes, not 32. Where has 32 come from?
When you dispatch your compute shader you're requesting a two-dimensional dispatch (vertexLength,vertexLength, 1) but you're operating on a 1D array of float3s. You will end up with a race condition where many different threads think they're responsible for updating each element of the array. Although awful for performance, if you want a thread group size of [numthreads(1,1,1)] then you should dispatch (vertexLength, 1, 1) numbers of waves/wavefronts when calling Dispatch (ie, Dispatch (60,1,1) with numThreads(1,1,1)).
For best/better performance the number of threads in your thread group / wave should at least be a multiple of 64 for best efficiency on AMD hardware. You then need only dispatch ceil(numVertices/64) wavefronts and then simply insert some logic into the shader to ensure id.x is not out of bounds for any given thread.
EDIT:
The documentation for the ComputeBuffer constructor is here: Unity ComputeBuffer Documentation
While it doesn't explicitly say "stride" is in bytes, it's the only reasonable assumption.
I am plotting to my data to ZedGraph. Using FileStream to read files. Sometimes my data is greater than 200 megabyte. To draw this amount of data i should calculate peak values or must apply a window. However i want to see the all points of zoomed area. Please share any suggestion.
PointPairList list1 = new PointPairList();
int read;
int count = 0;
while (file.Position < file.Length)
{
read = file.Read(mainBuffer, 0, mainBuffer.Length);
for (int i = 0; i < read / window; i++)
{
list1.Add(count++, BitConverter.ToSingle(mainBuffer, i * window));
count++;
}
}
myCurve1 = zgc.MasterPane.PaneList[1].AddCurve(null, list1, Color.Lime, SymbolType.None);
myCurve1.IsX2Axis = true;
zgc.MasterPane.PaneList[1].XAxis.Scale.MaxAuto = true;
zgc.MasterPane.PaneList[1].XAxis.Scale.MinAuto = true;
zgc.AxisChange();
zgc.Invalidate();
window=2048 for file size between 100 megabyte to 300 megabyte.
Instead of using a PointPairList, I would suggest to use a FilteredPointList instead. By this way, you can keep every points in memory, ZedGraph will only show the points that are necessary for display.
The FilteredPointList class is well explained here.
You will have to change your code a bit this way:
// Load the X, Y points in two double arrays
// ...
var list1 = new FilteredPointList(xArray, yArray);
// ...
// Use the ZoomEvent to adjust the bounds of the filtered point list
void zedGraphControl1_ZoomEvent(ZedGraphControl sender, ZoomState oldState, ZoomState newState)
{
// The maximum number of point to displayed is based on the width of the graphpane, and the visible range of the X axis
list1.SetBounds(sender.GraphPane.XAxis.Scale.Min, sender.GraphPane.XAxis.Scale.Max, (int)zgc.GraphPane.Rect.Width);
// This refreshes the graph when the button is released after a panning operation
if (newState.Type == ZoomState.StateType.Pan)
sender.Invalidate();
}
Edit
If you can't not host all the points in memory, then you will have to provide your own IPointList implementation for ZedGraph using the logic in the code you describe above. You can inspire from the FilteredPointList itself.
I would use the SetBounds method to preload the points from the disk, based on the decimation algorithm you already implemented, using the min, max and MaxPts in parameters.