Remove noise and extreme values from data?


I have a program that reads data over serial from an ADC on a PSoC.
The numbers are sent in the format <uint16>, inclusive of the '<' and '>' symbols, transmitted in binary format 00111100 XXXXXXXX XXXXXXXX 00111110 where the 'X's make up the 16 bit unsigned int.
Occasionally the read goes wrong and the program uses the binary data for the '>' symbol as part of the number, resulting in the glitch shown in this screenshot of 2500 samples (ignore the drop between samples 800 and 1500; that was me playing with the ADC input):
You can clearly see that the glitch causes the data to sample roughly the same value each time it happens.
The data is sent ten times a second, so what I was planning on doing was to take ten samples, remove any glitches (where the value is far away from the other samples) and then average the remaining values to smooth out the curve a bit. The output can go anywhere from 0 to 50000+ so I can't just remove values below a certain number.
I'm uncertain how to remove the values that are a long way out of the range of the other values in the 10-sample group, because there may be instances where there are two samples that are affected by this glitch. Perhaps there's some other way of fixing this glitchy data instead of just working around it!
What is the best way of doing this? Here's my code so far (this is inside the DataReceivedEvent method):
SerialPort sp = (SerialPort)sender; // set up serial port
byte[] spBuffer = new byte[4];
int indata = 0;
sp.Read(spBuffer, 0, 4);
indata = BitConverter.ToUInt16(spBuffer, 1); // bytes 1 and 2 hold the uint16
object[] o = { numSamples, nudDutyCycle.Value, freqMultiplied, nudDistance.Value, pulseWidth, indata };
lock (dt) // lock for multithread safety
{
    dt.Rows.Add(o); // add data to datatable
}
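The ten-sample plan described above (collect a group, drop values far from the rest, average what remains) could look roughly like this. It is a minimal sketch, not a tested implementation: it uses the group median as the reference point because the median is not pulled away by one or two glitched samples out of ten, and `SampleFilter`, `RobustAverage` and the `maxDeviation` threshold are hypothetical names you would tune against your own data. Because the threshold is measured from the median rather than from zero, it works anywhere in the 0 to 50000+ range.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: median-based outlier rejection followed by averaging.
static class SampleFilter
{
    // Returns the mean of the samples lying within maxDeviation of the group median.
    // A minority of glitched values cannot move the median far, so two bad
    // samples in a group of ten are still rejected.
    public static double RobustAverage(IReadOnlyList<int> samples, int maxDeviation)
    {
        if (samples.Count == 0)
            throw new ArgumentException("Need at least one sample.", nameof(samples));

        int[] sorted = samples.OrderBy(s => s).ToArray();
        int mid = sorted.Length / 2;
        double median = sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;

        // Keep only the samples close to the median, then average them.
        List<int> kept = samples.Where(s => Math.Abs(s - median) <= maxDeviation).ToList();
        return kept.Count > 0 ? kept.Average() : median;
    }
}
```

For example, a group of ten readings around 1000 that contains two glitches near 16000 averages to the value of the eight good readings.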

I suspect your problem may be that you are reading fewer bytes from the serial port than you think you are.
For example, sp.Read(spBuffer, 0, 4); won't necessarily read 4 bytes. It could read 1, 2, 3 or 4 bytes (but never 0).
If you know you should be reading a certain number of bytes, try something like this:
public static void BlockingRead(SerialPort port, byte[] buffer, int offset, int count)
{
    while (count > 0)
    {
        // SerialPort.Read() blocks until at least one byte has been read, or SerialPort.ReadTimeout milliseconds
        // have elapsed. If a timeout occurs a TimeoutException will be thrown.
        // Because SerialPort.Read() blocks until some data is available this is not a busy loop,
        // and we do NOT need to issue any calls to Thread.Sleep().
        int bytesRead = port.Read(buffer, offset, count);
        offset += bytesRead;
        count -= bytesRead;
    }
}
If there's a timeout during the read, a TimeoutException will be thrown, so there is no need to add your own timeout.
Then change calls like this:
sp.Read(spBuffer, 0, 4);
To this:
BlockingRead(sp, spBuffer, 0, 4);

A common method in engineering is to add a damping function. A damping function basically acts on the differential of a parameter, i.e. the difference between successive values. There are no hard and fast rules about how to choose a damping function, and mostly they are tweaked to produce a reasonable result.
So in your case what that means is that you compare the latest value with the one previous to it. If it is greater than a certain amount, either default the latest value to the previous one or reduce the latest value by some fixed factor, say 10% or 1%. That way you don't lose information but also don't have sudden jumps and glitches.
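A minimal sketch of that damping idea, assuming one reading of "reduce by a fixed factor": when the jump from the previous accepted value exceeds a threshold, only a fraction of the jump is let through (a factor of 0 simply repeats the previous value). `DampingFilter`, `maxStep` and `factor` are hypothetical names and tuning parameters.

```csharp
using System;

// Hypothetical damping filter: limits how far each new sample may move
// away from the previously accepted value.
class DampingFilter
{
    private readonly int _maxStep;   // largest jump accepted unchanged (tuning assumption)
    private readonly double _factor; // fraction of an oversized jump that is let through
    private int? _previous;

    public DampingFilter(int maxStep, double factor)
    {
        _maxStep = maxStep;
        _factor = factor;
    }

    public int Next(int sample)
    {
        if (!_previous.HasValue)
        {
            _previous = sample; // first sample: nothing to compare with
            return sample;
        }

        int prev = _previous.Value;
        int delta = sample - prev;
        int result = Math.Abs(delta) <= _maxStep
            ? sample
            // Oversized jump: move only part of the way towards the new sample.
            : prev + (int)Math.Round(delta * _factor);

        _previous = result;
        return result;
    }
}
```

With `factor` at 0.1, a glitch jump of 15000 moves the output by only 1500. The trade-off is that a genuine step change is followed over several samples instead of immediately.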

First of all, I would strongly suggest just fixing the parsing issue; then you won't have to worry about glitch values.
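One way to fix the parsing side, sketched under assumptions: keep a small buffer of received bytes and only accept a value when a '<' (0x3C) and a '>' (0x3E) sit exactly three bytes apart, dropping single bytes until the stream is back in sync. `FrameParser` and `Feed` are hypothetical names; the little-endian byte order mirrors the question's `BitConverter.ToUInt16` call on a typical x86/x64 PC and should be checked against what the PSoC actually sends.

```csharp
using System.Collections.Generic;

// Hypothetical parser for frames of the form '<' + uint16 + '>' (4 bytes).
class FrameParser
{
    private const byte Start = (byte)'<'; // 0x3C
    private const byte End = (byte)'>';   // 0x3E
    private readonly List<byte> _pending = new List<byte>();

    // Appends newly received bytes and returns every complete value found.
    public List<ushort> Feed(byte[] data, int count)
    {
        var values = new List<ushort>();
        for (int i = 0; i < count; i++)
            _pending.Add(data[i]);

        while (_pending.Count >= 4)
        {
            if (_pending[0] == Start && _pending[3] == End)
            {
                // Low byte first (assumed little-endian).
                values.Add((ushort)(_pending[1] | (_pending[2] << 8)));
                _pending.RemoveRange(0, 4);
            }
            else
            {
                // Out of sync: discard one byte and retry at the next position.
                _pending.RemoveAt(0);
            }
        }
        return values; // a partial frame stays in _pending until more bytes arrive
    }
}
```

In the DataReceived handler you would pass whatever `sp.Read(buffer, 0, buffer.Length)` returned, e.g. `foreach (ushort value in parser.Feed(buffer, n))`, and add each value to the table. Since a data byte can itself be 0x3C or 0x3E, a badly misaligned stream can occasionally produce a false frame; a checksum byte in the protocol would be needed to rule that out completely.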
However, if you still decide to go down the route of fixing data afterwards:
I see all the glitched data is around a certain value: ~16000. In fact, judging from the graph, I'd say it's almost identical every time. You could simply ignore the data which is in the glitched value range (you would have to do some testing to find the exact bounds), and use the last non-glitched value instead.
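A sketch of that band filter. The bounds are a guess: if the glitch puts the '>' byte (0x3E) into the high byte of the number, the result lands between 0x3E00 = 15872 and 0x3EFF = 16127, which would be consistent with the ~16000 seen in the graph, but you should confirm the range from your own data. `GlitchBandFilter` is a hypothetical name.

```csharp
// Hypothetical filter: values inside [low, high] are treated as glitches and
// replaced with the last value that fell outside the band.
class GlitchBandFilter
{
    private readonly int _low;
    private readonly int _high;
    private int _lastGood;

    public GlitchBandFilter(int low, int high, int initialValue)
    {
        _low = low;
        _high = high;
        _lastGood = initialValue;
    }

    public int Next(int sample)
    {
        if (sample >= _low && sample <= _high)
            return _lastGood; // inside the glitch band: reuse the last good value

        _lastGood = sample;
        return sample;
    }
}
```

Any genuine reading that falls inside the band is discarded too, which is the cost of this approach when the real signal can pass through that range.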

Related

How to Plot Data Faster

I want to plot all the data points I get from the TCP server, but I could not figure out a way to do it. Currently I print the string to a text box instead, and only the first line is printed.
This is real-time data plotting for an oscilloscope GUI.
How can I plot all the values?
I tested with a sine wave from an I2S mic; it gave a distorted signal when plotted with the following code.
int t;
private void Timer1_Tick(object sender, EventArgs e)
{
    one = new Thread(test);
    one.Start();
    t++;
}
public void test()
{
    byte[] bytes = new byte[client.ReceiveBufferSize];
    var readCount = stream.Read(bytes, 0, bytes.Length);
    string datastring = Encoding.UTF8.GetString(bytes);
    txtdata.Invoke((MethodInvoker)(() => txtdata.Text = datastring.Substring(0, 100)));
    txtno.Invoke((MethodInvoker)(() =>
        txtno.Text = ("\nnumber of bytes read: " + readCount)
    ));
    String ch1 = txtdata.Text;
    String[] ch1y = ch1.Split(new char[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
    for (int a = 1; a < ch1y.Length - 1; a++)
    {
        chart1.Invoke((MethodInvoker)(() =>
            chart1.Series[0].Points.AddXY(t, Convert.ToDouble(ch1y[a]))
        ));
    }
}

The issue here is not how fast you plot the data, but the fact that you are trying to plot real-time analogue values using a non-uniform asynchronous pattern.
It sounds like you are trying to approach this from a first-principles perspective, but you are skipping a lot of background understanding that is necessary to pull this off.
The oscilloscope has many functions that allow you to focus on a specific width or length of the analogue sample shown on the display, which might represent a tiny sample of audio. The amplitude of the line on the display usually represents the voltage of the electrical signal at a relative point in time, not a specific point in time.
If your sine wave is a constant 2 kHz and you can see 2 peaks (and 2 troughs) on the screen, then you are looking at data captured over an interval of 1 ms. Drawing a wavy line like that in a chart requires many points, so plotting it as single X,Y ordinates would need many (perhaps hundreds of) discrete points within that 1 ms to render the same sort of line. You would also have to be very careful that the X interval represents the exact amount of time, or the points you plot will not be in the right place relative to the time each value was sampled.
This frequency of processing is not something you want to try to achieve with C#, that would need to be done way down at the hardware level.
The ESP32 you are using to sample the analogue input will be capturing a specific packet of data at a specific interval; you may have seen phrases like "8-bit 16 kHz mono" used to describe audio quality.
Depending on what processing is going on in the ESP32, the bytes you receive will usually represent the whole sample. We then need to use the bit depth to determine how to break the bytes into an array of values; it is not a single value.
So 8-bit mono means that every 8 bits represent a single value, so in this case each byte is a separate value. Mono means that this is a one-dimensional array of values; in stereo (2-channel interleaved) the bytes represent 2 separate arrays of values, so every second byte actually goes into the second array.
So you don't simply convert the bytes to text using UTF-8 encoding; you need to decode them, either with a library such as NAudio or manually.
In simple C# terms, assuming 8-bit mono, you could plot the entire sample using the array index as X and the byte value at that index as Y; however, this still isn't going to quite match what you see on the oscilloscope.
Instead, think about what you actually want to see on the screen, and do some research into using Fast Fourier Transform (FFT) calculations to analyse the analogue readings. There are even ways to do this directly on the ESP32, which may reduce the processing you do in the C# code.
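A sketch of that decoding step, assuming the ESP32 sends raw PCM. The byte formats are assumptions to check against your firmware: 8-bit samples are treated as unsigned (silence at 128, as in WAV files), 16-bit samples as signed little-endian, and stereo as interleaved channels. `PcmDecoder` and its methods are hypothetical names.

```csharp
// Hypothetical helpers for turning a received byte buffer into sample values.
static class PcmDecoder
{
    // 8-bit PCM: one unsigned byte per sample; subtracting 128 centres it on zero.
    public static int[] Decode8BitMono(byte[] data, int count)
    {
        var samples = new int[count];
        for (int i = 0; i < count; i++)
            samples[i] = data[i] - 128;
        return samples;
    }

    // 16-bit PCM: two bytes per sample, low byte first (assumed little-endian).
    public static short[] Decode16BitMono(byte[] data, int count)
    {
        var samples = new short[count / 2]; // a trailing odd byte is ignored
        for (int i = 0; i < samples.Length; i++)
            samples[i] = unchecked((short)(data[2 * i] | (data[2 * i + 1] << 8)));
        return samples;
    }

    // 8-bit stereo interleaved as L, R, L, R...: every second byte is the second channel.
    public static void SplitInterleaved(byte[] data, int count, out byte[] left, out byte[] right)
    {
        left = new byte[(count + 1) / 2];
        right = new byte[count / 2];
        for (int i = 0; i < count; i++)
        {
            if (i % 2 == 0) left[i / 2] = data[i];
            else right[i / 2] = data[i];
        }
    }
}
```

You would pass the `readCount` returned by `stream.Read` rather than the buffer length, then plot each sample with its index as X, e.g. `chart1.Series[0].Points.AddXY(i, samples[i])`.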
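To illustrate what a Fourier transform tells you, here is a naive discrete Fourier transform that finds the strongest frequency in a block of samples. It is O(n²) and only meant to show the idea; for real-time use, an FFT library (or the ESP32-side processing mentioned above) computes the same spectrum much faster. `Spectrum` and `DominantFrequency` are hypothetical names, and the sample rate passed in must match the rate the ESP32 actually samples at.

```csharp
using System;

// Hypothetical illustration: naive DFT magnitudes and the dominant frequency.
static class Spectrum
{
    // Magnitude of each frequency bin from 0 (DC) up to n/2 (Nyquist).
    public static double[] Magnitudes(double[] x)
    {
        int n = x.Length;
        var mags = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++)
        {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.Cos(angle);
                im -= x[t] * Math.Sin(angle);
            }
            mags[k] = Math.Sqrt(re * re + im * im);
        }
        return mags;
    }

    // Frequency in Hz of the strongest non-DC bin; each bin is sampleRate / n wide.
    public static double DominantFrequency(double[] x, double sampleRate)
    {
        if (x.Length < 2)
            throw new ArgumentException("Need at least two samples.", nameof(x));

        double[] mags = Magnitudes(x);
        int best = 1;
        for (int k = 2; k < mags.Length; k++)
            if (mags[k] > mags[best]) best = k;
        return best * sampleRate / x.Length;
    }
}
```

With a 16 kHz sample rate and 160 samples, each bin is 100 Hz wide, so a clean 2 kHz sine lands in bin 20.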

I reduced the baud rate, and after that the plot shows an almost pure 2 kHz sine wave. However, only the 2 kHz wave looks like that. Instead of running a thread, the chart is now plotted on each timer tick with the Add command.

What is correct way to receive data from Socket in c#?

Assuming that I know the payload length beforehand, what is the correct way to receive all the bytes? Currently, I am doing something like this:
byte[] buffer = new byte[payloadLength];
socket.Receive(buffer, buffer.Length, SocketFlags.None);
But then I thought: what if the payload is big and Receive cannot get all the data in one go? So I was planning to do something like this:
byte[] buffer = new byte[payloadLength];
int remained = payloadLength;
int size = 0;
do {
    size = socket.Receive(buffer, payloadLength - remained, remained, SocketFlags.None);
    remained -= size;
} while (remained > 0 && size > 0);
Which one is more correct? Or do you guys have a better idea?

Definitely some variant like the second. Ignoring the return value from Receive is one of the most common beginner bugs found on SO, because all the contract guarantees when Receive returns (if you don't ask for 0 bytes) is that it has read at least one byte. It makes no guarantee that it will try to read as many bytes as you asked for.[1]
Any message framing (such as fixed-size messages, as here apparently) is up to you to implement on top of TCP's stream of bytes.
[1] Even for relatively small receive sizes there's no guarantee. So if you know how big your message is going to be because you send the length first (another common means of message framing), you need to loop even to get the 2/4/8 bytes that make up the message length as a short, int or long.
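A minimal sketch of the second approach with the two points above made explicit: it treats a 0 return (the peer closed the connection) as an error instead of silently stopping with a half-filled buffer, and it loops even for the length prefix. The 4-byte big-endian length prefix is an assumed layout for illustration; `SocketReceive`, `ReceiveExactly` and `ReceiveMessage` are hypothetical names.

```csharp
using System.IO;
using System.Net.Sockets;

static class SocketReceive
{
    // Loops until exactly `count` bytes have arrived.
    public static void ReceiveExactly(Socket socket, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int received = socket.Receive(buffer, offset, count, SocketFlags.None);
            if (received == 0) // 0 means the peer closed the connection
                throw new EndOfStreamException("Connection closed before the full message arrived.");
            offset += received;
            count -= received;
        }
    }

    // Length-prefixed framing: a 4-byte big-endian length (assumed), then the payload.
    public static byte[] ReceiveMessage(Socket socket)
    {
        var header = new byte[4];
        ReceiveExactly(socket, header, 0, 4); // loop even for the small header
        int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (length < 0)
            throw new InvalidDataException("Negative message length.");

        var payload = new byte[length];
        ReceiveExactly(socket, payload, 0, length);
        return payload;
    }
}
```

Both ends have to agree on the prefix size and byte order; the sender would write the 4 length bytes followed by the payload.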

In practice the first option usually works. However you cannot guarantee it, so if it's a serious program that you care about then take the 2nd approach.
Certainly if you switched to ReceiveAsync, which you should if you have high performance needs, you would need to take the 2nd approach.

.Net SerialPort taking over 0.5 seconds to read byte when bytes available

I'm using the .NET SerialPort class in C# to read bytes from a port. On receipt of a DataReceived event I check the serial port to see if bytes are available to be read. However, even if bytes are available, the port can take over half a second to read a single byte. The code is roughly as follows:
...
while (Port.BytesToRead > 0)
{
    StopWatch.Restart();
    Int32 BytesRead = Port.Read(Read, 0, 1);
    StopWatch.Stop();
    if (StopWatch.ElapsedMilliseconds > 100)
    {
        // Record the time. The stopwatch code
        // was only added after performance issues were observed.
    }
}
Note that the time which I've measured is not the time to read all bytes, rather the time to read a single byte. Frequently I'll receive a DataReceived event and have to wait 0.5 seconds for the first byte to be read.
I've actually tried setting the Port's ReadTimeout property to something smaller to prevent it from sitting there indefinitely, but this property seems to be ignored.
Any help greatly appreciated.

It turns out that running connected to the debugger was causing the problem. Running outside the debugger, the maximum time recorded to read a byte was around 20 ms, as opposed to up to 700 ms when running within it (with no breakpoints, conditional or otherwise, enabled).
Bit of a red herring, as the real cause of the comms problem when running a release build probably lay elsewhere.

Fastest way to delete the first few bytes of a file

I am using a Windows Mobile 6.5 (Compact Edition) phone and am writing binary data received over Bluetooth to a file. These files get quite large (16 MB+). Once the file is written, I need to search it for a start character and delete everything before it, thus eliminating garbage. I cannot do this inline as the data comes in because of graphing and speed issues: I get a lot of data, and there are already too many if conditions on the incoming data, so I figured it was best to post-process. Here is my dilemma: searching for the start bytes and rewriting the file sometimes takes 5 minutes or more. I basically move the file to a temp file, parse through it and write a whole new file. I have to do this byte by byte.
private void closeFiles() {
    try {
        // Close file stream for raw data.
        if (this.fsRaw != null) {
            this.fsRaw.Flush();
            this.fsRaw.Close();
            // Move file, seek the first sync bytes,
            // write to fsRaw stream with sync byte and rest of data after it
            File.Move(this.s_fileNameRaw, this.s_fileNameRaw + ".old");
            FileStream fsRaw_Copy = File.Open(this.s_fileNameRaw + ".old", FileMode.Open);
            this.fsRaw = File.Create(this.s_fileNameRaw);
            int x = 0;
            bool syncFound = false;
            // search for sync byte algorithm
            while (x != -1) {
                // ... logic to search for sync byte
                if (x != -1 && syncFound) {
                    this.fsRaw.WriteByte((byte)x);
                }
            }
            this.fsRaw.Close();
            fsRaw_Copy.Close();
            File.Delete(this.s_fileNameRaw + ".old");
        }
    } catch (IOException e) {
        CLogger.WriteLog(ELogLevel.ERROR, "Exception in writing: " + e.Message);
    }
}
There has got to be a faster way than this!
------------ Testing times using the answer below ------------
Initial test, my way with a one-byte read and a one-byte write:
27 Kb/sec
Using the answer below and a 32768-byte buffer:
321 Kb/sec
Using the answer below and a 65536-byte buffer:
501 Kb/sec

You're doing a byte-wise copy of the entire file. That can't be efficient for a load of reasons. Search for the start offset (and end offset if you need both), then copy from one stream to another the entire contents between the two offsets (or the start offset and end of file).
EDIT
You don't have to read the entire contents to make the copy. Something like this (untested, but you get the idea) would work.
private void CopyPartial(string sourceName, byte syncByte, string destName)
{
    using (var input = File.OpenRead(sourceName))
    using (var reader = new BinaryReader(input))
    using (var output = File.Create(destName))
    {
        var start = 0;
        // seek to sync byte (ReadByte throws EndOfStreamException if it is never found)
        while (reader.ReadByte() != syncByte)
        {
            start++;
        }
        // the loop above consumed the sync byte; write it so the output starts with it
        output.WriteByte(syncByte);
        var buffer = new byte[4096]; // 4k page - adjust as you see fit
        do
        {
            var actual = reader.Read(buffer, 0, buffer.Length);
            output.Write(buffer, 0, actual);
        } while (reader.PeekChar() >= 0);
    }
}
EDIT 2
I actually needed something similar to this today, so I decided to write it without the PeekChar() call. Here's the kernel of what I did - feel free to integrate it with the second do...while loop above.
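var buffer = new byte[1024];
var total = 0;
int actual;
// Read returns 0 only at end of stream, so this also terminates when the
// reader has already been advanced past the junk at the start of the file.
while ((actual = reader.Read(buffer, 0, buffer.Length)) > 0)
{
    writer.Write(buffer, 0, actual);
    total += actual;
}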

Don't discount an approach because you're afraid it will be too slow. Try it! It'll only take 5-10 minutes to give it a try and may result in a much better solution.
If the detection process for the start of the data is not too complex/slow, then avoiding writing data until you hit the start may actually make the program skip past the junk data more efficiently.
How to do this:
Use a simple bool to know whether or not you have detected the start of the data. If you are reading junk, then don't waste time writing it to the output, just scan it to detect the start of the data. Once you find the start, then stop scanning for the start and just copy the data to the output. Just copying the good data will incur no more than an if (found) check, which really won't make any noticeable difference to your performance.
You may find that in itself solves the problem. But you can optimise it if you need more performance:
What can you do to minimise the work you do to detect the start of the data? Perhaps if you are looking for a complex sequence, you only need to check for one particular byte value that starts the sequence, and it's only if you find that start byte that you need to do any more complex checking. There are some very simple but efficient string searching algorithms that may help in this sort of case too. Or perhaps you can allocate a buffer (e.g. 4 kB) and gradually fill it with bytes from your incoming stream. When the buffer is filled, then and only then search for the end of the "junk" in your buffer. By batching the work you can make use of memory/cache locality to make the processing considerably more efficient than it would be if you did the same work byte by byte.
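A sketch combining the ideas above: skip junk a whole buffer at a time using `Array.IndexOf`, then copy everything from the sync byte onward in bulk with no further per-byte checks. It assumes a single-byte sync marker (a multi-byte marker would also need to handle a match split across two buffers) and that the generic `Array.IndexOf(array, value, startIndex, count)` overload is available on your .NET Compact Framework version; `SyncTrimmer` and `CopyFromSync` are hypothetical names.

```csharp
using System;
using System.IO;

static class SyncTrimmer
{
    // Copies everything from the first occurrence of syncByte (inclusive) to the
    // end of the input. Returns false if the sync byte never occurs.
    public static bool CopyFromSync(Stream input, Stream output, byte syncByte, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        bool found = false;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (!found)
            {
                // Scan the whole chunk at once instead of one byte per call.
                int index = Array.IndexOf(buffer, syncByte, 0, read);
                if (index < 0)
                    continue; // the entire chunk is junk: skip it without writing
                found = true;
                output.Write(buffer, index, read - index);
            }
            else
            {
                output.Write(buffer, 0, read); // good data: plain bulk copy
            }
        }
        return found;
    }
}
```

Once the sync byte is found, the only per-chunk cost is the `found` check, which matches the two-phase (skip junk, copy data) split described here.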
Do all the other "conditions on the incoming data" need to be continually checked? How can you minimise the amount of work you need to do but still achieve the required results? Perhaps some of the ideas above might help here too?
Do you actually need to do any processing on the data while you are skipping junk? If not, then you can break the whole thing into two phases (skip junk, copy data), and skipping the junk won't cost you anything when it actually matters.

Issue with BinaryReader.ReadChars()

I've run into what I believe is an issue with the BinaryReader.ReadChars() method. When I wrap a BinaryReader around a raw socket NetworkStream occasionally I get a stream corruption where the stream being read gets out of sync. The stream in question contains messages in a binary serialisation protocol.
I've tracked this down to the following
It only happens when reading a Unicode string (encoded using Encoding.BigEndianUnicode)
It only happens when the string in question is split across two tcp packets (confirmed using wireshark)
I think what is happening is the following (in the context of the example below):
1. BinaryReader.ReadChars() is called asking it to read 3 characters (string lengths are encoded before the string itself).
2. The first loop internally requests a read of 6 bytes (3 remaining characters * 2 bytes/char) off the network stream.
3. The network stream only has 3 bytes available.
4. 3 bytes are read into the local buffer.
5. The buffer is handed to the Decoder.
6. The Decoder decodes 1 char and keeps the other byte in its own internal buffer.
7. The second loop internally requests a read of 4 bytes! (2 remaining characters * 2 bytes/char)
8. The network stream has all 4 bytes available.
9. 4 bytes are read into the local buffer.
10. The buffer is handed to the Decoder.
11. The Decoder decodes 2 chars and keeps the remaining 4th byte internally.
12. The string decode is complete.
13. The serialisation code attempts to unmarshal the next item and croaks because of stream corruption.
char[] buffer = new char[3];
int charIndex = 0;
Decoder decoder = Encoding.BigEndianUnicode.GetDecoder();
// pretend 3 of the 6 bytes arrives in one packet
byte[] b1 = new byte[] { 0, 83, 0 };
int charsRead = decoder.GetChars(b1, 0, 3, buffer, charIndex);
charIndex += charsRead;
// pretend the remaining 3 bytes plus a final byte, for something unrelated,
// arrive next
byte[] b2 = new byte[] { 71, 0, 114, 3 };
charsRead = decoder.GetChars(b2, 0, 4, buffer, charIndex);
charIndex += charsRead;
I think the root is a bug in the .NET code which uses charsRemaining * bytes/char each loop to calculate the remaining bytes required. Because of the extra byte hidden in the Decoder this calculation can be off by one causing an extra byte to be consumed off the input stream.
Here's the .NET framework code in question
while (charsRemaining>0) {
    // We really want to know what the minimum number of bytes per char
    // is for our encoding. Otherwise for UnicodeEncoding we'd have to
    // do ~1+log(n) reads to read n characters.
    numBytes = charsRemaining;
    if (m_2BytesPerChar)
        numBytes <<= 1;
    numBytes = m_stream.Read(m_charBytes, 0, numBytes);
    if (numBytes==0) {
        return (count - charsRemaining);
    }
    charsRead = m_decoder.GetChars(m_charBytes, 0, numBytes, buffer, index);
    charsRemaining -= charsRead;
    index+=charsRead;
}
I'm not entirely sure if this is a bug or just a misuse of the API. To work round this issue I'm just calculating the bytes required myself, reading them, and then running the byte[] through the relevant Encoding.GetString(). However this wouldn't work for something like UTF-8.
I'd be interested to hear people's thoughts on this and whether I'm doing something wrong or not. And maybe it will save the next person a few hours/days of tedious debugging.
EDIT: posted to Microsoft Connect as a tracking item.

I have reproduced the problem you mentioned with BinaryReader.ReadChars.
Although the developer always needs to account for lookahead when composing things like streams and decoders, this seems like a fairly significant bug in BinaryReader because that class is intended for reading data structures composed of various types of data. In this case, I agree that ReadChars should have been more conservative in what it read to avoid losing that byte.
There is nothing wrong with your workaround of using the Decoder directly, after all that is what ReadChars does behind the scenes.
Unicode is a simple case. If you think about an arbitrary encoding, there really is no general purpose way to ensure that the correct number of bytes are consumed when you pass in a character count instead of a byte count (think about varying length characters and cases involving malformed input). For this reason, avoiding BinaryReader.ReadChars in favor of reading the specific number of bytes provides a more robust, general solution.
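A minimal sketch of that byte-count approach for the UTF-16BE case in the question: read exactly `2 * charCount` bytes with `BinaryReader.ReadBytes`, which keeps reading until it has that many bytes or the stream ends, and decode them in one go. It only works because this encoding is fixed-width at 2 bytes per UTF-16 code unit; `BinaryReaderStrings` and `ReadBigEndianUnicode` are hypothetical names.

```csharp
using System.IO;
using System.Text;

static class BinaryReaderStrings
{
    // Reads charCount UTF-16 code units as big-endian Unicode without consuming
    // any byte that belongs to the next field.
    public static string ReadBigEndianUnicode(BinaryReader reader, int charCount)
    {
        int byteCount = checked(charCount * 2);
        byte[] bytes = reader.ReadBytes(byteCount); // returns fewer bytes only at end of stream
        if (bytes.Length != byteCount)
            throw new EndOfStreamException("Stream ended inside a string.");
        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

Fed the same bytes as the example in the question ({ 0, 83, 0, 71, 0, 114, 3 }), it returns "SGr" and leaves the unrelated trailing byte (3) in the stream for the next read.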
I would suggest that you bring this to Microsoft's attention via http://connect.microsoft.com/visualstudio.

Interesting; you could report this on "connect". As a stop-gap, you could also try wrapping with BufferedStream, but I expect this is papering over a crack (it may still happen, but less frequently).
The other approach, of course, is to pre-buffer an entire message (but not the entire stream); then read from something like MemoryStream - assuming your network protocol has logical (and ideally length-prefixed, and not too big) messages. Then when it is decoding all the data is available.
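A sketch of the pre-buffering approach, assuming each message starts with a 4-byte big-endian length (the real protocol's framing may differ): the whole message is read into a byte array first, and the returned `BinaryReader` works on a `MemoryStream`, so all of the message's data is available when it is decoded. `MessageBuffering`, `ReadMessage` and `ReadFully` are hypothetical names.

```csharp
using System.IO;

static class MessageBuffering
{
    // Reads one complete length-prefixed message from the network stream and
    // returns a reader over an in-memory copy of just that message.
    public static BinaryReader ReadMessage(Stream network)
    {
        byte[] header = ReadFully(network, 4);
        int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (length < 0)
            throw new InvalidDataException("Negative message length.");

        byte[] body = ReadFully(network, length);
        return new BinaryReader(new MemoryStream(body, false)); // read-only copy
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static byte[] ReadFully(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended mid-message.");
            offset += read;
        }
        return buffer;
    }
}
```

The bytes of the following message stay in the network stream until the next call, so the per-message reader can never pull them in.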

This reminds of one of my own questions (Reading from a HttpResponseStream fails) where I had an issue that when reading from a HTTP response stream the StreamReader would think it had hit the end of the stream prematurely so my parsers would bomb out unexpectedly.
Like Marc suggested for your problem I first tried pre-buffering in a MemoryStream which works well but means you may have to wait a long time if you have a large file to read (especially from the network/web) before you can do anything useful with it. I eventually settled on creating my own extension of TextReader which overrides the Read methods and defines them using the ReadBlock method (which does a blocking read i.e. it waits until it can get exactly the number of characters you ask for)
Your problem, like mine, is probably due to the fact that Read methods aren't guaranteed to return the number of characters you ask for. For example, if you look at the documentation for the BinaryReader.Read method (http://msdn.microsoft.com/en-us/library/ms143295.aspx) you'll see that it states:
Return Value
Type: System.Int32
The number of characters read into buffer. This might be less than the number of bytes requested if that many bytes are not available, or it might be zero if the end of the stream is reached.
Since BinaryReader has no ReadBlock methods like a TextReader all you can do is take your own approach of monitoring the position yourself or Marc's of pre-caching.

I'm working with Unity3D/Mono at the moment, and the ReadChars method might contain even more errors. I made a string like this:
mat.name = new string(binaryReader.ReadChars(64));
mat.name appeared to contain the correct string, but I could only add strings before it; everything after the string just disappeared, even with String.Format. My solution so far is to avoid the ReadChars method and instead read the data as a byte array and convert it to a string:
byte[] str = binaryReader.ReadBytes(64);
int lengthOfStr = Array.IndexOf(str, (byte)0); // e.g. 4 for "clip\0"
if (lengthOfStr < 0) lengthOfStr = str.Length; // no terminator: use all bytes read
mat.name = System.Text.Encoding.ASCII.GetString(str, 0, lengthOfStr);
