Reference numbers, and using them to compare numbers in a text file - C#

The project is based on an eye tracker. Let me briefly explain the idea behind the project so that my problem is easier to understand.
I have a Tobii C eye tracker. It gives out the X and Y coordinates of where I am looking, but the device is very sensitive: when I look at one point, it sends out many different coordinates, which I found to lie within a ± 100 range. Even when you stare at one point, your eyes keep moving, so the tracker produces many samples. These samples (float numbers) are saved in a text file. I only need one value (the X coordinate) that represents the point I am staring at, instead of all the samples within the ± 100 range, and I want to move that value to a new text file.
I have no idea how I should code to do that.
These are the float numbers in the text file.
200
201
198
202
250
278
310
315
360
389
500
568
579
590
When I stare at point 1, the values are between 200 and 300, i.e. within the ± 100 range. I want to set 200 as the reference point, subtract each following number from it, and check whether the difference is within 100; if it is, remove that number. The reference point should keep doing this with the following numbers until one falls outside the ± 100 range. That number is 310, so 310 becomes the next reference point, and the same subtraction and check are applied to the numbers after it. When a number falls outside the range again, it is 500, which becomes the new reference point, and so on. That is my objective. Put simply, the reference points should be moved into a new file.
This is my code so far, which gets the gaze coordinates and stores them in a text file:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;
using Tobii.Interaction;
namespace ConsoleApp1
{
    class Program
    {
        private static void programintro()
        {
            Console.WriteLine("Press Any Keys To Start");
        }
        public static void Main(string[] args)
        {
            programintro();
            Console.ReadKey();
            double currentX = 0.0;
            double currentY = 0.0;
            double timeStampCurrent = 0.0;
            double diffX = 0.0;
            double diffY = 0.0;
            int counter = 0;
            var host = new Host();
            host.EnableConnection();
            var gazePointDataStream = host.Streams.CreateGazePointDataStream();
            // Called for every gaze sample the tracker delivers
            gazePointDataStream.GazePoint((gazePointX, gazePointY, timestamp) =>
            {
                diffX = gazePointX - currentX;
                diffY = gazePointY - currentY;
                currentX = gazePointX;
                currentY = gazePointY;
                timeStampCurrent = timestamp;
                // Only record a sample that moved 100 or more in X or Y since the previous sample
                if (Math.Abs(diffX) >= 100 || Math.Abs(diffY) >= 100)
                {
                    counter++;
                    using (StreamWriter writer = new StreamWriter("C:\\Users\\Student\\Desktop\\FYP 2019\\ConsoleApp1\\ConsoleApp1\\Data\\TextFile1.txt", true))
                    {
                        writer.WriteLine("Recorded Data " + counter + "\n=================================================================================================================\nX: {0} Y:{1}\nData collected at {2}", currentX, currentY, timeStampCurrent);
                        writer.WriteLine("=================================================================================================================");
                    }
                    Console.WriteLine("Recorded Data " + counter + "\n=================================================================================================================\nX: {0} Y:{1}\nData collected at {2}", currentX, currentY, timeStampCurrent);
                    Console.WriteLine("=================================================================================================================");
                }
            });
            // Wait until 10 samples have been recorded, sleeping instead of busy-spinning the CPU
            while (counter < 10)
            {
                Thread.Sleep(50);
            }
            host.DisableConnection();
            Environment.Exit(0);
        }
    }
}
Now my question is: how do I write code that reads the text file, sets a reference number, subtracts each following number from it, checks whether the difference is within 100, and picks a new reference number when a value is outside the ± 100 range? Those reference numbers should then be stored in a new text file.
If there is a code example, I will create a new program, put it there and test it out first.

Assuming that the initial data is present in a list, the logic to get all reference points is as follows:
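var initialData = new List<float> { 200, 201, 198, 202, 250, 278, 310, 315, 360, 389, 500, 568, 579, 590 };
var lstReferencePoints = new List<float>();
if (initialData.Count > 0)
{
    // The first value is always the first reference point.
    // (Using default(float), i.e. 0, as a "not set yet" marker would break if a value were 0.)
    var referencePoint = initialData[0];
    foreach (var num in initialData)
    {
        // A value more than 100 away from the current reference point starts a new group
        if (Math.Abs(referencePoint - num) > 100)
        {
            lstReferencePoints.Add(referencePoint);
            referencePoint = num;
        }
    }
    lstReferencePoints.Add(referencePoint);
}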
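lstReferencePoints now contains the reference points (200, 310 and 500 for the sample data).
Edit: reading the float numbers from a text file into a List:
// needs: using System.Globalization; using System.IO; using System.Linq;
var initialData = File.ReadAllLines(your_file_path)
.Where(line => !string.IsNullOrWhiteSpace(line)) // skip empty lines
.Select(line => float.Parse(line, CultureInfo.InvariantCulture)) // always '.' as the decimal separator
.ToList();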
Storing the lstReferencePoints to a new text file:
using(TextWriter tw = new StreamWriter("newFile_Path"))
{
foreach (var item in lstReferencePoints)
tw.WriteLine(item);
}
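Putting the three snippets above together, here is a minimal sketch of the whole read, filter, write step. The class name ReferencePointFilter, its methods and the threshold parameter are hypothetical; the sketch assumes one number per line with . as the decimal separator, and uses double rather than float. It adds each new reference point as soon as it is found, which gives the same result as the loop above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper combining the snippets above; not the answer's own code.
public static class ReferencePointFilter
{
    // Keeps the first value of every run of values that stay within
    // ±threshold of that run's first value (the "reference point").
    public static List<double> FindReferencePoints(IEnumerable<double> values, double threshold = 100)
    {
        var result = new List<double>();
        bool hasReference = false;
        double reference = 0;
        foreach (var value in values)
        {
            // The first value, or a value outside the range, becomes the new reference
            if (!hasReference || Math.Abs(value - reference) > threshold)
            {
                result.Add(value);
                reference = value;
                hasReference = true;
            }
        }
        return result;
    }

    // Reads one number per line (skipping blank lines) and writes the reference points to outputPath.
    public static void ReduceFile(string inputPath, string outputPath, double threshold = 100)
    {
        var values = File.ReadLines(inputPath)
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => double.Parse(line.Trim(), CultureInfo.InvariantCulture));
        var points = FindReferencePoints(values, threshold);
        File.WriteAllLines(outputPath, points.Select(p => p.ToString(CultureInfo.InvariantCulture)));
    }
}
```

Usage might be ReferencePointFilter.ReduceFile(inputPath, outputPath), where both paths are your own files.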


Random number with Probabilities in C#

I have converted this Java program into a C# program.
using System;
using System.Collections.Generic;
namespace RandomNumberWith_Distribution__Test
{
public class DistributedRandomNumberGenerator
{
private Dictionary<Int32, Double> distribution;
private double distSum;
public DistributedRandomNumberGenerator()
{
distribution = new Dictionary<Int32, Double>();
}
public void addNumber(int val, double dist)
{
distribution.Add(val, dist);// are these two
distSum += dist; // lines correctly translated?
}
public int getDistributedRandomNumber()
{
double rand = new Random().NextDouble();//generate a double random number
double ratio = 1.0f / distSum;//why is ratio needed?
double tempDist = 0;
foreach (Int32 i in distribution.Keys)
{
tempDist += distribution[i];
if (rand / ratio <= tempDist)//what does "rand/ratio" signify? What does this comparison achieve?
{
return i;
}
}
return 0;
}
}
public class MainClass
{
public static void Main(String[] args)
{
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(2, 0.3d);
drng.addNumber(3, 0.5d);
//=================
// start simulation
int testCount = 1000000;
Dictionary<Int32, Double> test = new Dictionary<Int32, Double>();
for (int i = 0; i < testCount; i++)
{
int random = drng.getDistributedRandomNumber();
if (test.ContainsKey(random))
{
double prob = test[random]; // are these
prob = prob + 1.0 / testCount;// three lines
test[random] = prob; // correctly translated?
}
else
{
test.Add(random, 1.0 / testCount);// is this line correctly translated?
}
}
foreach (var item in test.Keys)
{
Console.WriteLine($"{item}, {test[item]}");
}
Console.ReadLine();
}
}
}
I have several questions:
Can you check whether the lines marked with comments are correctly translated, or explain them?
Why doesn't getDistributedRandomNumber() check whether the sum of the distribution is 1 before doing any further calculations?
The method
public void addNumber(int val, double dist)
is not correctly translated, because you are missing the following lines from the Java original:
if (this.distribution.get(value) != null) {
distSum -= this.distribution.get(value);
}
Those lines cover the case where you call the following (based on your example code):
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(1, 0.5d);
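So if addNumber is called twice with the same first argument, the missing code checks whether that key is already present in the dictionary and, if so, subtracts its old value from distSum before the new value is stored. In C# you also have to remove the old entry (or overwrite it through the indexer), because Dictionary.Add throws an ArgumentException when the key already exists.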
Your method should look like this:
public void addNumber(int val, double dist)
{
if (distribution.TryGetValue(val, out var oldDist)) //get the old "dist" value, based on the "val"
{
distribution.Remove(val); //remove the old entry
distSum -= oldDist; //subtract the old "dist" value from "distSum"
}
distribution.Add(val, dist); //add the "val" with the current "dist" value to the dictionary
distSum += dist; //add the current "dist" value to "distSum"
}
Now to your second method
public int getDistributedRandomNumber()
Instead of initializing a new instance of Random every time this method is called, you should initialize it only once, so change the line
double rand = new Random().NextDouble();
to this
double rand = _random.NextDouble();
and initialize the field _random once, in the class declaration outside any method, like this:
public class DistributedRandomNumberGenerator
{
private Dictionary<Int32, Double> distribution;
private double distSum;
private Random _random = new Random();
// ... rest of your code
}
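This prevents new Random().NextDouble() from producing the same number over and over again when it is called in a tight loop (on .NET Framework, new Random() is seeded from the system clock, so instances created in quick succession get the same seed).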
You can read about this problem here: Random number generator only generating one random number
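The underlying issue is that Random is deterministic for a given seed. As a small illustration (RandomSeedDemo and FirstValues are hypothetical names), two instances created with the same seed return exactly the same sequence, which is what happens when several parameterless instances are created within the same clock tick on .NET Framework.

```csharp
using System;

// Hypothetical demo: the same seed always yields the same sequence.
public static class RandomSeedDemo
{
    public static double[] FirstValues(int seed, int count)
    {
        var random = new Random(seed);
        var values = new double[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = random.NextDouble(); // each value is in [0.0, 1.0)
        }
        return values;
    }
}
```

Sharing one instance (the _random field) avoids the problem, because each call then advances a single sequence instead of restarting a new one.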
As a side note, private fields in C# are conventionally named with a leading underscore. You could consider renaming distribution to _distribution, and the same applies to distSum.
Next:
double ratio = 1.0f / distSum;//why is ratio needed?
The ratio is needed because the method tries to do its job as well as it can with the information you have provided. Imagine you only call this:
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
int random = drng.getDistributedRandomNumber();
With those lines you told the class you want to have the number 1 in 20% of the cases, but what about the other 80%?
And that's where the ratio variable comes in: it scales the random value according to the sum of the probabilities you have given.
For example:
double ratio = 1.0f / distSum;
With the last example, drng.addNumber(1, 0.2d), distSum will be 0.2, which corresponds to a probability of 20%.
double ratio = 1.0f / 0.2;
The ratio is 5.0: with a total probability of 20%, the ratio is 5 because 100% / 5 = 20%.
Now let's look at how the code behaves when the ratio is 5:
double tempDist = 0;
foreach (Int32 i in distribution.Keys)
{
tempDist += distribution[i];
if (rand / ratio <= tempDist)
{
return i;
}
}
rand will always be a value greater than or equal to 0.0 and less than 1.0 (that's how NextDouble works), so let's assume rand is 0.254557522132321.
Now let's look at what happens step by step:
double tempDist = 0; //initialize with 0
foreach (Int32 i in distribution.Keys) //step through the added probabilities
{
tempDist += distribution[i]; //get the probabilities and add it to a temporary probability sum
//as a reminder
//rand = 0.254557522132321
//ratio = 5
//rand / ratio = 0.0509115044264642
//tempDist = 0.2
//so the if condition is true
if (rand / ratio <= tempDist)
{
return i;
}
}
If we didn't apply the ratio, the if condition would be false (0.254557522132321 > 0.2), but that would be wrong: there is only a single value in the dictionary, so no matter what rand is, the method should return it. That is the purpose of rand / ratio.
It "fixes" the randomly generated number according to the sum of the probabilities we added, so rand / ratio only matters if the probabilities you provided don't sum exactly to 1 (= 100%).
For example, take your original code:
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(2, 0.3d);
drng.addNumber(3, 0.5d);
You can see that the provided probabilities add up to 1 (0.2 + 0.3 + 0.5), so in this case the line
if (rand / ratio <= tempDist)
would look like this:
if (rand / 1 <= tempDist)
Dividing by 1 never changes the value (rand / 1 = rand), so this division only matters when the probabilities you provided don't add up to exactly 100%, whether they sum to more or less.
As a side note, I would suggest changing your code to this
//call the dictionary distributions (notice the plural)
//don't iterate over .Keys
//each element (distribution) is a KeyValuePair<int, double>
foreach (var distribution in distributions)
{
//access the .Value member of the KeyValuePair
tempDist += distribution.Value;
if (rand / ratio <= tempDist)
{
//return the key, i.e. the number itself
return distribution.Key;
}
}
Your test routine seems to be correctly translated.
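Putting the corrections from this answer together, a sketch of the whole class might look like the following. The PascalCase method names and the optional seed parameter (handy for reproducible tests) are assumptions, not part of the Java original. It multiplies by _distSum, which is the same as dividing by ratio = 1 / distSum, and it overwrites duplicate keys through the indexer instead of calling Remove and Add.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the corrected generator; names and the seed parameter are assumptions.
public class DistributedRandomNumberGenerator
{
    private readonly Dictionary<int, double> _distribution = new Dictionary<int, double>();
    private double _distSum;
    private readonly Random _random; // one shared instance instead of new Random() per call

    public DistributedRandomNumberGenerator(int? seed = null)
    {
        _random = seed.HasValue ? new Random(seed.Value) : new Random();
    }

    public void AddNumber(int val, double dist)
    {
        // If the key already exists, take its old weight out of the total first
        if (_distribution.TryGetValue(val, out var oldDist))
        {
            _distSum -= oldDist;
        }
        _distribution[val] = dist; // the indexer overwrites; Add would throw on a duplicate key
        _distSum += dist;
    }

    public int GetDistributedRandomNumber()
    {
        // rand * _distSum == rand / (1.0 / _distSum): maps [0, 1) onto [0, _distSum)
        double target = _random.NextDouble() * _distSum;
        double tempDist = 0;
        foreach (var pair in _distribution)
        {
            tempDist += pair.Value;
            if (target <= tempDist)
            {
                return pair.Key;
            }
        }
        return 0; // fallback (e.g. empty dictionary), as in the original
    }
}
```

For example, after AddNumber(1, 0.2) alone, every call returns 1, which is the single-value case discussed above.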

Setting a reference number and comparing that to other data in textfile

Based on your sample data, here is code that keeps only the numbers that differ by more than 100 from the current reference number.
static void Main(string[] args)
{
string filename = @"C:\PowershellScripts\test.txt"; // INPUT FILE
String outputFile = @"C:\PowershellScripts\result.txt"; // OUTPUT FILE
string[] data = File.ReadAllLines(filename); // READING FROM FILE
int TotalLine = data.Length; // COUNT TOTAL NO OF ROWS
List<string> FinalList = new List<string>(); // INITIALIZE LIST FOR FINAL RESULT
if (TotalLine <= 0) // CHECK IF FILE HAS NO DATA
{
Console.WriteLine("No Data found !");
return;
}
double CurrentNumber = double.Parse(data[0]), NextNumber = 0, diff = 0; // INITIALIZE OF LOCAL VARIABLES, CURRENT NUMBER = FIRST NO FROM FILE
for (int cntr = 1; cntr < TotalLine; cntr++) // FOR LOOP FOR EACH LINE
{
NextNumber = double.Parse(data[cntr]); //PARSING NEXT NO
diff = CurrentNumber - NextNumber; // GETTING DIFFERENCE
if (diff <= 100 && diff >= -100) // MATCH THE DIFFERENCE
{
continue; // SKIP THE LOGIC IF THE DIFFERENCE IS WITHIN ±100
}
else
{
FinalList.Add(CurrentNumber.ToString()); // ADDING THE NO TO LIST
CurrentNumber = NextNumber; // POINTING TO NEXT NO
}
}
FinalList.Add(CurrentNumber.ToString()); // ADDING LAST NO.
foreach (string d in FinalList) // FOR EACH LOOP TO PRINT THE FINAL LIST
Console.WriteLine(d);
File.WriteAllLines(outputFile, FinalList); // SAVING TO THE FILE
}
For the sample data, the program above produces this output:
200
310
500

Referencing 2 coordinates and check differential value

This question is an extension of a previous question I asked:
Setting a reference number and comparing that to other data in textfile
I have a set of X & Y data coordinates in a text file.
Recorded Data 1
X: 1081.02409791506 Y:136.538121516361
Data collected at 208786.9115
Recorded Data 2
X: 1082.82841293328 Y:139.344405668078
Data collected at 208810.0446
Recorded Data 4
X: 1525.397051187 Y:1163.1786031393
Data collected at 245756.8823
Recorded Data 5
X: 1524.98201445054 Y:1166.38589429581
Data collected at 245769.489
Recorded Data 6
X: 509.002294087998 Y:913.213486303154
Data collected at 277906.6251
Recorded Data 7
X: 479.826998339658 Y:902.689393940613
Data collected at 277912.9958
I want to set the first data point, X: 1081.02409791506 Y:136.538121516361, as the reference point, subtract the next data point's X and Y from it, and check whether both the X and Y differences are within 100; if they are, continue. The reference point should keep doing this with the following data until a point falls outside the ± 100 range. Here that point is X: 1525.397051187 Y:1163.1786031393, because its difference from the first point is over 100, so it becomes the next reference point, and the same comparison continues with the data below it. When a point falls outside the range again, X: 509.002294087998 Y:913.213486303154, it becomes the new reference point, and so on. That is my objective. Put simply, the reference points should be moved into a new file.
This code can do the above, but only for single numbers like the ones below:
278
299
315
360
389
400
568
579
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace ReadTextFile
{
class Program
{
static void Main(string[] args)
{
string inputFile = @"C:\Users\Student\Desktop\ConsoleApp1\ConsoleApp1\Data\TextFile.txt"; // INPUT FILE
string outputFile = @"C:\Users\Student\Desktop\Test.txt"; // OUTPUT FILE
string[] data = File.ReadAllLines(inputFile); // READING FROM FILE
int TotalLine = data.Length; // COUNT TOTAL NUMBER OF ROWS
List<string> FinalList = new List<string>(); // INITIALIZE LIST FOR FINAL RESULT
double CurrentNumber = double.Parse(data[0]), NextNumber, diff; // INITIALIZE OF LOCAL VARIABLES, CURRENT NUMBER = FIRST NUMBER FROM FILE
for (int cntr = 1; cntr < TotalLine; cntr++) // FOR LOOP FOR EACH LINE
{
NextNumber = double.Parse(data[cntr]); //PARSING NEXT NUMBER
diff = CurrentNumber - NextNumber; // GETTING DIFFERENCE
if (diff <= 100 && diff >= -100) // MATCH THE DIFFERENCE
{
continue; // SKIP THE LOGIC IF THE DIFFERENCE IS WITHIN ±100
}
else
{
FinalList.Add(CurrentNumber.ToString()); // ADDING THE NUMBER TO LIST
CurrentNumber = NextNumber; // POINTING TO NEXT NUMBER
}
}
FinalList.Add(CurrentNumber.ToString()); // ADDING LAST NUMBER
foreach (string d in FinalList) // FOR EACH LOOP TO PRINT THE FINAL LIST
Console.WriteLine(d);
File.WriteAllLines(outputFile, FinalList); // SAVING TO THE FILE
}
}
}
How do I do the same for two coordinates?
1st condition: if at least one of the X or Y differences is outside the ± 100 range, that data point becomes the new reference point.
2nd condition: if both the X and Y differences are within the ± 100 range, continue with the next data point.
Here is a solution, provided the source file content is as stated above:
Recorded Data 1
X: 1081.02409791506 Y:136.538121516361
Data collected at 208786.9115
Recorded Data 2
X: 1082.82841293328 Y:139.344405668078
Data collected at 208810.0446
..
and the target file as follows:
X: 1081.02409791506 Y:136.538121516361
X: 1525.397051187 Y:1163.1786031393
..
Solution
using System;
using System.Collections;
using System.Collections.Generic;
using System.Dynamic;
using System.Globalization;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
var points = ParseFromFile(
@"C:\Users\Student\Desktop\ConsoleApp1\ConsoleApp1\Data\TextFile.txt");
RenderToFile(
@"C:\Users\Student\Desktop\Test.txt",
Merge(points).ToArray());
}
static void RenderToFile(string fileName, (double x, double y)[] points)
{
var formatProvider = new CultureInfo("en-US", false);
var builder = new StringBuilder();
foreach (var point in points)
{
// AppendLine so that each point ends up on its own line
builder.AppendLine(
$"X: {point.x.ToString(formatProvider)} Y:{point.y.ToString(formatProvider)}");
}
System.IO.File.WriteAllText(fileName, builder.ToString());
}
static (double x, double y)[] ParseFromFile(string fileName)
{
return Parse(System.IO.File.ReadAllText(fileName)).ToArray();
}
static IEnumerable<(double x, double y)> Merge((double x, double y)[] points)
{
points = points ?? throw new ArgumentNullException(nameof(points));
if (points.Length == 0) yield break;
var std = 100;
var current = points[0];
if (points.Length == 1)
{
yield return current;
yield break;
}
for (var i = 1; i < points.Length; i++)
{
var dx = Math.Abs(points[i].x - current.x);
var dy = Math.Abs(points[i].y - current.y);
if (dx <= std && dy <= std)
{
continue;
}
yield return current;
current = points[i];
}
yield return current;
}
static IEnumerable<(double x, double y)> Parse(string raw)
{
raw = raw ?? throw new ArgumentNullException(nameof(raw));
var formatProvider = new CultureInfo("en-US", false);
var pattern = new Regex(@"^[Xx][:] (?<x>\d*([.]\d+)?) [Yy][:](?<y>\d*([.]\d+)?)$");
// Split on both '\r' and '\n' so that "\n" inside "\r\n"-terminated lines is handled too
foreach (var line in raw.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
.Select(l => l.Trim())
.Where(l => l.StartsWith("x", StringComparison.OrdinalIgnoreCase)))
{
var match = pattern.Match(line);
if (!match.Success)
{
continue; // skip lines that start with "X" but are not coordinates
}
var x = double.Parse(match.Groups["x"].Value, formatProvider);
var y = double.Parse(match.Groups["y"].Value, formatProvider);
yield return (x: x, y: y);
}
}
}
}
Explained
First we need to parse the data.
The format provider is required so that a double is parsed correctly from a string that always uses . as the decimal separator, regardless of the machine's culture:
// can parse 1525.397051187, but not 1525,397051187
// en-US is the format you comply with
// 'false' (useUserOverride) means the default en-US settings are used, not any user overrides
var formatProvider = new CultureInfo("en-US", false);
The pattern ensures we parse the x and y coordinate correctly.
// X: 1525.397051187 Y:1163.1786031393
// we use named groups to capture x (?<x>\d*([.]\d+)?)
// and y (?<y>\d*([.]\d+)?)
var pattern = new Regex(@"^[Xx][:] (?<x>\d*([.]\d+)?) [Yy][:](?<y>\d*([.]\d+)?)$");
Once parsed, we can merge the (x, y) coordinates according to your specification. std is the maximum allowed difference (100) for our deltas dx and dy; despite the name, it is not a statistical standard deviation.
var dx = Math.Abs(points[i].x - current.x);
var dy = Math.Abs(points[i].y - current.y);
if (dx <= std && dy <= std)
{
continue;
}
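To try the merge rule without any files, here is a sketch that takes the points directly. GazeReferencePoints, Find and the threshold parameter are hypothetical names; unlike Merge, it yields each reference point as soon as it is found, which produces the same sequence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: same rule as Merge, but usable on in-memory points.
public static class GazeReferencePoints
{
    public static IEnumerable<(double X, double Y)> Find(
        IEnumerable<(double X, double Y)> points, double threshold = 100)
    {
        bool hasReference = false;
        (double X, double Y) reference = default;
        foreach (var p in points)
        {
            // 1st condition: X or Y is more than threshold away from the reference
            bool outside = Math.Abs(p.X - reference.X) > threshold
                        || Math.Abs(p.Y - reference.Y) > threshold;
            // 2nd condition (both within range) simply moves on to the next point
            if (!hasReference || outside)
            {
                hasReference = true;
                reference = p;
                yield return p;
            }
        }
    }
}
```

With the six sample points from the question, it returns the three reference points listed in the expected target file.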
A note on IEnumerable<T>:
Using it as the return type allows us to use the yield return syntax. In C# such a method is called an iterator (other languages call it a generator function).
A note on the value tuple (double x, double y):
Named tuples let us avoid declaring small intermediate classes just to hold a pair of values.

Using GPU/TPL to speed up C# code that takes 40 minutes

I want to perform some calculations on a text file that has one number (0 or 1) on each line and almost 1 million lines.
I want to count how many times each sequence occurs in the whole file, where the sequences are built according to the sequence length. For example, my file is:
01100101011... up to 1 million digits (each digit on a new line)
Code
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
public class Program
{
static void Main(string[] args)
{
Stopwatch time = new Stopwatch();
time.Start();
try
{
// I have hard-coded fileName and the sequence length, which I normally take from the user
string data = "", fileName = "10.txt"; // this file has almost 1 Million records
int first = 0, last = 0;
// reads data and make a string of that data
// which means "data" = "1001011001010100101 .... upto 1 million"
data = string.Join("", File.ReadAllLines(fileName));
last = Convert.ToInt32("15"); // sequence length
int l = data.Length; // calculated once so that it doesn't have to be recomputed every time
// I use a List because an array sized in advance is sometimes not filled completely
// and ends up with null values at the end
List<string> dataList = new List<string>();
while (first + last < l+1)
{
dataList.Add((data.Substring(first, last)));
first++;
}
// converts list to Array so array will have values and no Null
// and will use Array.FindAll() later
string[] dataArray = dataList.ToArray(), value;
// get a file ready to write the results to
StreamWriter sw = new StreamWriter(fileName.Substring(0, fileName.Length - 4) + "Results.txt");
//THIS IS THE PART THATS TAKING around 40 minutes
for (int j = 0; j < dataArray.Length; j++)
{
// finds a value in whole array and make array of that finding
value = Array.FindAll(dataArray, str => str.Equals(dataArray[j]));
// value.Length means the count of the Number in the whole array
sw.WriteLine(value.Length);
}
sw.Close();
time.Stop();
Console.WriteLine("Time : " + time.Elapsed);
Console.ReadLine();
}
catch (Exception ex)
{
Console.WriteLine("Exception " + ex.StackTrace);
Console.ReadLine();
}
}
}
I set the sequence length to 3; my program then makes this array:
dataArray = {"011" , "110" , "100" , "001" , "010" , "101" , "011"}
by using String.Substring(). Now I simply want to calculate the frequency of each element of the array.
Data in Resultant .txt file
011 - 2
110 - 0
100 - 0
001 - 0
010 - 0
101 - 0
011 - 2
Now, it seems pretty simple, but it is not: I can't convert the sequences to int, because I don't want to lose the zeros at the front of a sequence.
Right now my program loops 1 million times (each element) × 1 million times (comparing with every element of the array) = 1 trillion times, and it takes almost 40 minutes. How can I make it faster? I don't know how to use Parallel.For or the TPL. It should be done in seconds.
My Systems Specs
32 GB RAM
i7-5820K 3.30 GHz
64 bit
2x nvidia gtx 970
If I'm understanding your code and question correctly, you need to "slide a window" (of length N, last in your original code) over the text, and count how many times each substring exists in the text.
If that's right, the following code does it in 0.292 seconds or thereabouts on a million-character file, and you don't need parallelism or GPU at all.
The idea here is to tally the chunk counts into a Dictionary as we're sliding that window over the text.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
public class Program
{
static Dictionary<string, int> CountChunks(string data, int chunkLength)
{
var chunkCounts = new Dictionary<string, int>();
var l = data.Length;
for (var i = 0; i <= l - chunkLength; i++) // <= so that the last window is counted too
{
var chunk = data.Substring(i, chunkLength);
int count = 0;
chunkCounts.TryGetValue(chunk, out count);
chunkCounts[chunk] = count + 1;
}
return chunkCounts;
}
static void Main(string[] args)
{
var time = new Stopwatch();
time.Start();
var fileName = "10.txt";
var data = string.Join("", File.ReadAllLines(fileName)); // join the lines so that no newlines end up in the windows
var chunkCounts = CountChunks(data, 15);
using (var sw = new StreamWriter(fileName.Substring(0, fileName.Length - 4) + "Results.txt"))
{
foreach (var pair in chunkCounts)
{
sw.WriteLine($"{pair.Key} - {pair.Value}");
}
}
time.Stop();
Console.WriteLine("Time : " + time.Elapsed);
}
}
The output 10Results.txt looks something like
011100000111100 - 34
111000001111000 - 37
110000011110001 - 27
100000111100010 - 28
000001111000101 - 37
000011110001010 - 36
000111100010100 - 44
001111000101001 - 35
011110001010011 - 41
111100010100110 - 42
etc.
EDIT: Here's the equivalent Python program. It's a little slower at about 0.9 seconds.
import time
from collections import Counter
t0 = time.time()
c = Counter()
with open('10.txt') as f:
    data = ''.join(line.strip() for line in f)  # drop the newline after each digit
l = 15
for i in range(0, len(data) - l + 1):  # + 1 so that the last window is counted too
    c[data[i : i + l]] += 1
with open('10Results2.txt', 'w') as outf:
    for key, value in c.items():
        print(f'{key} - {value}', file=outf)
print(time.time() - t0)
A for loop like this will give you terrible performance, because for each of the million strings it does a million string comparisons.
I would suggest using a dictionary instead of a list, with the sequence as the key and its count as the value.
It should give you much better performance than the nested loops.
All you need is a small tweak from a performance point of view; you may not even need to use the GPU or the TPL unless that is your sole purpose.
Something like the following should get you going:
string keyString = string.Empty;
Dictionary<string,int> dataList = new Dictionary<string,int>();
while (first + last < l+1)
{
keyString = data.Substring(first, last);
if (dataList.ContainsKey(keyString))
{
dataList[keyString] = dataList[keyString] + 1;
}
else
{
dataList.Add(keyString,1);
}
first++;
}
The rest of the code you need just prints this dictionary.
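On the concern from the question about losing leading zeros when converting to int: this is a different technique from the answers above, not theirs. Because every window consists only of 0s and 1s, it can be packed into an int as a binary number and turned back into text with Convert.ToString(value, 2).PadLeft(length, '0'). This sketch assumes the newlines have already been removed and that the window length is at most 24, so that the count array (2^length entries, 32,768 for length 15) stays small. BinaryWindowCounter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; assumes only '0'/'1' characters and 1 <= chunkLength <= 24.
public static class BinaryWindowCounter
{
    public static Dictionary<string, int> Count(string bits, int chunkLength)
    {
        if (chunkLength < 1 || chunkLength > 24)
            throw new ArgumentOutOfRangeException(nameof(chunkLength));

        // counts[v] = number of windows whose digits, read as binary, equal v
        var counts = new int[1 << chunkLength];
        int mask = (1 << chunkLength) - 1;
        int window = 0;
        for (int i = 0; i < bits.Length; i++)
        {
            char c = bits[i];
            if (c != '0' && c != '1')
                throw new ArgumentException("Only '0' and '1' are allowed.", nameof(bits));
            // Shift the new digit in on the right; the mask drops the oldest digit
            window = ((window << 1) | (c - '0')) & mask;
            // Count only once the first full window has been read
            if (i >= chunkLength - 1)
                counts[window]++;
        }

        // Convert back to text; PadLeft restores the leading zeros ("11" -> "011")
        var result = new Dictionary<string, int>();
        for (int v = 0; v < counts.Length; v++)
        {
            if (counts[v] > 0)
                result[Convert.ToString(v, 2).PadLeft(chunkLength, '0')] = counts[v];
        }
        return result;
    }
}
```

Usage might look like BinaryWindowCounter.Count(string.Join("", File.ReadAllLines("10.txt")), 15). Each digit is visited once, so the work grows linearly with the file size.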

Detecting spikes and drops in a long list of integers C#

Hi there, I'm trying to write a method that reads every number in a list and detects where it spikes and drops. This is what I have so far:
My idea is to loop through the list, loop through it again to get the next number in the list, and then detect whether it is higher or lower. If it is higher, it is saved to one list, and vice versa.
What I want this method to do is determine where there is a spike of 100 or more, save the point where that happens (which is 'counter'), and also save the points where the numbers drop.
So far it only notices a drop, and then it saves every number in the list until it spikes again; once it has spiked it shows no numbers until it drops again, and so on.
I've added 'check' and 'check2' to try to stop it from saving every number after it notices a drop, so that it saves only once, but with no luck.
public void intervalDetection()
{
//Counter is the point in the list
int counter = 0;
int spike = 0;
int drop = 0;
//Loop through power list
for (int i = 0; i < powerList.Count(); i++)
{
counter++;
int firstNumber = powerList[i];
//Loop again to get the number after??
for (int j = 1; j < 2; j++)
{
//Detect Spike
spike = firstNumber + 100;
drop = firstNumber - 100;
if (powerList[j] > spike)
{
if (check2 == false)
{
intervalStartList.Add(counter);
check2 = true;
check = false;
}
}
//Detect Drop
else if (powerList[j] < drop)
{
if (check == false)
{
intervalEndList.Add(counter);
check = true;
check2 = false;
}
}
}
}
}
1. Create an integer "average".
2. Loop through the List/Array and add each value to average.
3. Divide average by the count of the List/Array.
4. Loop through the List/Array again and check each value's deviation from average.
Code example:
using System.Collections.Generic;

public class DSDetector {
    // Returns { drops, spikes }: values more than `deviation` below / above the average
    public static List<int>[] getDropsnSpikes(List<int> values, int deviation) {
        List<int> drops = new List<int>();
        List<int> spikes = new List<int>();
        if (values.Count == 0) {
            return new List<int>[] { drops, spikes }; // avoid dividing by zero
        }
        int average = 0;
        foreach (int val in values) {
            average += val;
        }
        average = average / values.Count;
        foreach (int val in values) {
            if (val < average - deviation) {
                drops.Add(val);
            }
            if (val > average + deviation) {
                spikes.Add(val);
            }
        }
        return new List<int>[] { drops, spikes };
    }
}
Not tested, but I think it works. Just try it.
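The loop in the question compares against powerList[1] every time, because its inner loop only ever runs with j = 1. A sketch of the question's original idea, comparing each value with the one right after it and recording only alternating spike/drop events, could look like this. IntervalDetector and Detect are hypothetical names, and it returns the 0-based index of the value after each jump instead of the question's counter.

```csharp
using System.Collections.Generic;

// Hypothetical helper: consecutive-value spike/drop detection.
public static class IntervalDetector
{
    public static (List<int> Starts, List<int> Ends) Detect(IList<int> values, int threshold = 100)
    {
        var starts = new List<int>();
        var ends = new List<int>();
        bool inSpike = false; // plays the role of check2 in the question
        bool inDrop = false;  // plays the role of check in the question
        for (int i = 0; i + 1 < values.Count; i++)
        {
            int next = values[i + 1];
            if (next > values[i] + threshold && !inSpike)
            {
                starts.Add(i + 1); // index of the value that jumped up
                inSpike = true;
                inDrop = false;
            }
            else if (next < values[i] - threshold && !inDrop)
            {
                ends.Add(i + 1); // index of the value that fell
                inDrop = true;
                inSpike = false;
            }
        }
        return (starts, ends);
    }
}
```

For the list 112, 111, 113, 250, 112, 111, 1, 113 it reports spikes at indices 3 and 7 and a drop at index 4; the second drop (to 1) is ignored because no spike happened in between.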
What exactly do you mean by "peaks" and "drops"?
Let's say you have the following list of integers:
112, 111, 113, 250, 112, 111, 1, 113
In this case, 250 is a peak and 1 is a drop relative to the average value, and you can find them using Kai_Jan_57's answer.
But 250 is also a peak relative to the previous value 113, and 112 is a drop relative to 250.
If you want to find local peaks and drops, check each value relative to the previous and next ones: compute avg = (val[i-1] + val[i+1]) / 2 and check whether val[i] > avg + 100 (peak) or val[i] < avg - 100 (drop).
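A sketch of that neighbour-average check follows. LocalExtremes and Find are hypothetical names; the first and last values are skipped because they have only one neighbour, and the average is computed in floating point to avoid integer division.

```csharp
using System.Collections.Generic;

// Hypothetical helper: a value is a local peak/drop if it is more than
// threshold above/below the average of its two neighbours.
public static class LocalExtremes
{
    public static (List<int> Peaks, List<int> Drops) Find(IList<int> values, double threshold = 100)
    {
        var peaks = new List<int>();
        var drops = new List<int>();
        for (int i = 1; i + 1 < values.Count; i++)
        {
            double avg = (values[i - 1] + values[i + 1]) / 2.0;
            if (values[i] > avg + threshold) peaks.Add(i);      // index of a local peak
            else if (values[i] < avg - threshold) drops.Add(i); // index of a local drop
        }
        return (peaks, drops);
    }
}
```

For the list above it reports index 3 (250) as a peak and index 6 (1) as a drop.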
