Check how many instances of strings in a list

Check how many instances of strings in a list - c#

I previously asked how to check whether certain keywords can be found in a list, so that the answer counts as correct if they are all there. Found here: Check if the string contains all inputs on the list
What I would also like to know is how many of the keywords are present, then divide that by the total and get a percentage, so the user knows how accurately they answered each question.
public String KeyWords_Found()
{
int Return_Value = 0;
foreach (String s in KeyWords)
{
if (textBox1.Text.Contains(s))
{
Return_Value++;
}
}
int Holder = Return_Value / KeyWords.Count;
int Fixed = Holder * 100;
return Fixed + "%";
}
What I want the code to do is count how many of the keywords in the list KeyWords appear in the text, then get the percentage by dividing by the total number of keywords and multiplying by 100. But it says that both values are 0 and I can't divide by 0. I'm not sure why they would be zero. Confused! Help!

You should first check whether KeyWords is empty:
public String KeyWords_Found()
{
if (KeyWords.Count == 0)
return "0%";
// rest of the code
}
Alternatively, you could use LINQ instead of writing your own loop:
int nOfOccurrences = KeyWords.Where(k => textBox1.Text.Contains(k)).Count();
Make sure you have using System.Linq; at the top of the file for that to work.
You'll still need to check for KeyWords.Count == 0 and compute the percentage yourself, though.

You should use floating point maths instead of integer maths in your calculations.
int i=100;
int a=51;
(a/i)==0 //true, integer division truncates 0.51 to 0, which ruins percentages
((double)a/i)==0 //false, actually equals 0.51
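Putting the two answers together, a minimal sketch of the percentage calculation could look like the following. The class name KeywordScore and the method PercentFound are made up for illustration, and the answer text is passed in as a parameter instead of being read from textBox1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordScore
{
    // Returns the percentage (0 to 100) of keywords that occur in the answer text.
    // Hypothetical helper: the original code reads textBox1.Text and KeyWords directly.
    public static double PercentFound(string answer, List<string> keywords)
    {
        // Guard against dividing by zero when no keywords are defined.
        if (keywords.Count == 0)
            return 0.0;

        int found = keywords.Count(k => answer.Contains(k));

        // 100.0 is a double, so the division is done in floating point.
        return 100.0 * found / keywords.Count;
    }
}
```

The result can then be formatted for display, for example with PercentFound(text, keywords).ToString("F0") + "%".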

Related

Reducing a BigInteger value in C#

I'm somewhat new to working with BigIntegers and have tried some stuff to get this system working, but feel a little stuck at the moment and would really appreciate a nudge in the right direction or a solution.
I'm currently working on a system which reduces BigInteger values down to a more readable form, and this is working fine with my current implementation, but I would like to further expand on it to get decimals implemented.
To better give a picture of what I'm attempting, I'll break it down.
In this context, we have a method which is taking a BigInteger, and returning it as a string:
public static string ShortenBigInt (BigInteger moneyValue)
With this in mind, when a number such as 10,000 is passed to this method, 10k will be returned. Same for 1,000,000 which will return 1M.
This is done by doing:
for(int i = 0; i < prefixes.Length; i++)
{
if(!(moneyValue >= BigInteger.Pow(10, 3*i)))
{
moneyValue = moneyValue / BigInteger.Pow(10, 3*(i-1));
return moneyValue + prefixes[i-1];
}
}
This system is working by grabbing a string from an array of prefixes and reducing numbers down to their simplest forms and combining the two and returning it when inside that prefix range.
So with that context, the question I have is:
How might I go about returning this in the same way, where passing 100,000 would return 100k, but also doing something like 1,111,111 would return 1.11M?
Currently, passing 1,111,111 returns 1M, but I would like that additional .11 tagged on. No more than 2 decimals.
My original thought was to convert the big integer into a string, then chunk out the first few characters into a new string and parse a decimal in there, but since prefixes don't change until values reach their 1000th mark, it's harder to tell when to place the decimal place.
My next thought was using BigInteger.Log to reduce the value down into a decimal friendly number and do a simple division to get the value in its decimal form, but doing this didn't seem to work with my implementation.
This system should work for the following prefixes, dynamically:
k, M, B, T, qd, Qn, sx, Sp,
O, N, de, Ud, DD, tdD, qdD, QnD,
sxD, SpD, OcD, NvD, Vgn, UVg, DVg,
TVg, qtV, QnV, SeV, SPG, OVG, NVG,
TGN, UTG, DTG, tsTG, qtTG, QnTG, ssTG,
SpTG, OcTG, NoTG, QdDR, uQDR, dQDR, tQDR,
qdQDR, QnQDR, sxQDR, SpQDR, OQDDr, NQDDr,
qQGNT, uQGNT, dQGNT, tQGNT, qdQGNT, QnQGNT,
sxQGNT, SpQGNT, OQQGNT, NQQGNT, SXGNTL
Would anyone happen to know how to do something like this? Any language is fine, C# is preferable, but I'm all good with translating. Thank you in advance!

Formatting it manually could work a bit like this:
(prefixes is a string here, which can be indexed like a char[])
public static string ShortenBigInt(BigInteger moneyValue)
{
    // Index 0 is a blank prefix for values below 1000.
    string prefixes = " kMGTP";
    double m2 = (double)moneyValue;
    // Start at 0 so that small values are not reported as fractions of "k".
    for (int i = 0; i < prefixes.Length; i++)
    {
        var step = Math.Pow(10, 3 * i);
        if (m2 / step < 1000)
        {
            return String.Format("{0:F2}", (m2 / step)) + prefixes[i];
        }
    }
    return "err";
}

Although Falco's answer does work, it doesn't do what was requested. This is the solution I was looking for, and I received some help from a friend on it. It keeps going until there are no more prefixes left in your string array of prefixes. If you do run out of bounds, the exception is thrown and handled by returning "Infinity".
This solution is better because nothing is crunched down to doubles or decimals along the way. It has no number cap; the only limit is the number of prefixes you provide.
public static string ShortenBigInt(BigInteger moneyValue)
{
    if (moneyValue < 1000)
        return "" + moneyValue;
    try
    {
        string moneyAsString = moneyValue.ToString();
        string prefix = prefixes[(moneyAsString.Length - 1) / 3];
        BigInteger chopAmmount = (moneyAsString.Length - 1) % 3 + 1;
        int insertPoint = (int)chopAmmount;
        chopAmmount += 2;
        moneyAsString = moneyAsString.Remove(Math.Min(moneyAsString.Length - 1, (int)chopAmmount));
        moneyAsString = moneyAsString.Insert(insertPoint, ".");
        return moneyAsString + " " + prefix;
    }
    catch (Exception exceptionToBeThrown)
    {
        return "Infinity";
    }
}
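For comparison, here is a sketch of the same idea done with BigInteger arithmetic instead of string chopping. It is only an illustration under some assumptions: the class name BigIntFormatter and the short Prefixes array are made up (with an empty entry at index 0 for values below 1,000), the input is assumed to be non-negative, and a zero fraction is dropped so that 100,000 gives 100k while 1,111,111 gives 1.11M.

```csharp
using System;
using System.Numerics;

public static class BigIntFormatter
{
    // Hypothetical, shortened prefix table; index 0 covers values below 1,000.
    private static readonly string[] Prefixes = { "", "k", "M", "B", "T" };

    public static string Shorten(BigInteger value)
    {
        if (value < 1000)
            return value.ToString();

        // Each prefix covers three more decimal digits.
        int group = (value.ToString().Length - 1) / 3;
        if (group >= Prefixes.Length)
            return "Infinity";

        // Keep two decimal digits by scaling with 100 before the integer division.
        BigInteger scaled = value * 100 / BigInteger.Pow(1000, group);
        int whole = (int)(scaled / 100); // always below 1000
        int frac = (int)(scaled % 100);  // always below 100

        string number = frac == 0 ? whole.ToString() : whole + "." + frac.ToString("00");
        return number + Prefixes[group];
    }
}
```

The fraction is truncated, not rounded, which matches the string-based version above.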

Pitfalls in C# for a new user. (FWHM calculation)

This is my attempt at a simple math module (function) that can be called from another main program. It calculates the FWHM (full width at half maximum) of a curve. This is my first try at Visual Studio and C#, so I would like to know a few basic programming structures I should learn in C#, coming from a Mathematica background.
Does double fwhm(double[] data, int c) indicate that the input arguments to this function fwhm should be a double array and an integer value? Did I get this right?
I find it difficult to write complex mathematical expressions (line 32/33), grouping terms in parentheses and dividing one by another. What's the right way to do that?
How can I perform mathematical operations such as division on the elements of an array and store the results in the same array?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace DEV_2
{
class fwhm
{
static double fwhm(double[] data, int c) // data as 2d data and c is integer
{
double[] datax;
double[] datay;
int L;
int Mag = 4;
double PP = 2.2;
int CI;
int k;
double Interp;
double Tlead;
double Ttrail;
double fwhm;
L = datay.Length;
// Create datax as index for the number of elemts in data from 1-Length(data).
for (int i = 1; i <= data.Length; i++)
{
datax[i] = (i + 1);
}
//Find max in datay and divide all elements by maxValue.
var m = datay.Length; // Find length of datay
Array.ForEach(datay, (x) => {datay[m++] = x / datay.Max();}); // Divide all elements of datay by max(datay)
double maxValue = datay.Max();
CI = datay.ToList().IndexOf(maxValue); // Push that index to CI
// Start to search lead
int k = 2;
while (Math.Sign(datay[k]) == Math.Sign(datay[k-1]-0.5))
{
k=k+1;
}
Interp = (0.5-datay[k-1])/(datay[k]-datay[k-1]);
Tlead = datax[k-1]+Interp*(datax[k]-datax[k-1]);
CI = CI+1;
// Start search for the trail
while (Math.Sign(datay[k]-0.5) == Math.Sign(datay[k-1]-0.5) && (k<=L-1))
{
k=k+1;
}
if (k != L)
{
Interp = (0.5-datay[k-1])/(datay[k]-datay[k-1]);
Ttrail = datax[k-1] + Interp*(datax[k]-datax[k-1]);
fwhm =((Ttrail-Tlead)*PP)/Mag;
}
}//end main
}//end class
}//end namespace

There are plenty of pitfalls in C#, but working through problems is a great way to find and learn them!
Yes, when passing parameters to a method the correct syntax is MethodName(varType varName), separated by commas for multiple parameters. Some pitfalls arise here from the differences between passing value types and reference types. If you're interested here is some reading on the subject.
Edit: As pointed out in the comments, you should write code so that it requires as few comments as possible (thus the paragraph between #3 and #4); however, if you need to do very specific and slightly complex math then you should comment to clarify what is occurring.
If you mean difficulties understanding, make sure you comment your code properly. If you mean difficulties writing it, you can create intermediate variables to make your code easier to read (though this is generally unnecessary) or look up functions or libraries to help you. This is a bit of an open-ended question; if you have a particular functionality in mind, perhaps we could be of more help.
You can access your array by index: array[i] gets the element at index i. You can then manipulate that element in any way you wish, for example array[i] = Math.Pow(array[i] / 24, 3) or array[i] = doMath(array[i]). Note that ^ is the XOR operator in C#, not exponentiation.
A couple of things you can do to clean up a little, though they are preference based: there is no need to declare int CI; int k; before you initialize them with int k = 2; (although you can if it helps you). The other thing is to name your variables well; common practice is a more descriptive camelCase name, so perhaps instead of int CI = datay.ToList().IndexOf(maxValue); you could use int indexMaxValueYData = datay.ToList().IndexOf(maxValue);
As per your comment question "What would this method return?": the method will return a double, as declared above (returnType methodName(parameters)). However, you need to add that to your code; as of now I see no return line, such as return doubleVar; where doubleVar is a variable of type double.
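As a small illustration of the array and return-value points above, the following sketch divides every element of an array by the array's maximum, stores the results back in the same array, and returns the maximum it used. The names ArrayMath and NormalizeByMax are invented for this example, and it assumes a non-empty array whose maximum is not zero.

```csharp
using System;
using System.Linq;

public static class ArrayMath
{
    // Divides each element by the largest element, in place,
    // and returns the maximum that was used as the divisor.
    public static double NormalizeByMax(double[] data)
    {
        // Compute the maximum once, before the loop changes any element.
        double max = data.Max();

        for (int i = 0; i < data.Length; i++) // C# arrays are indexed from 0
        {
            data[i] = data[i] / max;
        }

        return max; // a method declared as double must return a double
    }
}
```

Computing the maximum inside the loop, as the Array.ForEach line in the question does, would recompute it for every element while the array is being modified.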

C# getting first digit of int in custom class

I am trying to build a help function in my guess the number game, whereby the user gets the first digit of the number he/she has to guess. So if the generated number is 550, he will get the 5.
I have tried a lot of things, maybe one of you has an idea what is wrong?
public partial class Class3
{
public Class3()
{
double test = Convert.ToDouble(globalVariableNumber.number);
while (test > 10)
{
double firstDigit = test / 10;
test = Math.Round(test);
globalVariableNumber.helpMe = Convert.ToString(firstDigit);
}
}
}
Under the helpButton clicked I have:
private void helpButton_Click(object sender, EventArgs e)
{
label3.Text = globalVariableNumber.helpMe;
label3.AutoSize = true;
That is my latest try. I put all of this in a custom class. In the main form I put the code to show what is in the helpMe string.
If you need more code, please tell me.

Why not ToString the number and use Substring to get the first character?
var number = 550;
var result = number.ToString().Substring(0, 1);
If for some reason you don't want to use string manipulation, you could do this mathematically like this:
var number = 550;
var result = Math.Floor(number / Math.Pow(10, Math.Floor(Math.Log10(number))));

What's wrong - you have an infinite while loop there. Math.Round(test) will leave the value of test unchanged after the first iteration.
You may have intended to use firstDigit as the variable controlling the loop.
Anyway, as suggested by others, you can set helpMe to the first digit by converting to a string and using the first character.
As an aside, you should consider supplying the number as a parameter and returning the helpMe string from the method. Your current approach is a little brittle.
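A minimal sketch of that suggestion, with the number supplied as a parameter and the hint returned to the caller. The names Hints and FirstDigit are hypothetical; the original code keeps both values in globalVariableNumber.

```csharp
using System;

public static class Hints
{
    // Returns the leading decimal digit of the number as a string.
    public static string FirstDigit(int number)
    {
        // Work on the absolute value; long avoids overflow for int.MinValue.
        long value = Math.Abs((long)number);

        // Integer division drops the last digit on each pass.
        while (value >= 10)
        {
            value /= 10;
        }

        return value.ToString();
    }
}
```

The click handler could then assign label3.Text = Hints.FirstDigit(numberToGuess), where numberToGuess stands for wherever the generated number is stored.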

The problem with your code is that you are doing the division and storing that in a separate variable, then you round the original value. That means that the original value only changes in the first iteration of the loop (and is only rounded, not divided), and unless that happens to make the loop condition false (i.e. for values between 10 and 10.5), the loop will never end.
Changes:
Use an int instead of a double; that gets you away from a whole bunch of potential precision problems.
Use the >= operator rather than >. If you get the value 10 then you want the loop to go on for another iteration to get a single digit.
You would use Math.Floor instead of Math.Round as you don't want the first digit to be rounded up, i.e. getting the first digit for 460 as 5. However, if you are using an integer then the division truncates the result, so there is no need to do any rounding at all.
Divide the value and store it back into the same variable.
Use the value after the loop, there is no point in updating it while you still have multiple digits in the variable.
Code:
int test = (int)globalVariableNumber.number;
while (test >= 10) {
test = test / 10;
}
globalVariableNumber.helpMe = test.ToString();

By using Math.Round() in your example, you're rounding 5.5 to 6 (the nearest even integer, per the documentation). Use Math.Floor instead; this drops the fractional part and gives you the number you're expecting for this test.
i.e.
double test = Convert.ToDouble(globalVariableNumber.number);
while (test >= 10)
{
    test = Math.Floor(test / 10);
}
// test now holds only the first digit
globalVariableNumber.helpMe = Convert.ToString(test);
Like @Sam Greenhalgh mentions, though, returning the first character of the number as a string will be cleaner, quicker and easier.
globalVariableNumber.helpMe = test >= 10
    ? test.ToString().Substring(0, 1)
    : "Hint not possible, number is less than ten";
This assumes that helpMe is a string.
Per our discussion in the comments, you'd be better off doing it like this:
private void helpButton_Click(object sender, EventArgs e)
{
label3.Text = GetHelpText();
label3.AutoSize = true;
}
// Always good practice to name a method that returns something Get...
// Also good practice to give it a descriptive name.
private string GetHelpText()
{
    return test >= 10 // The ?: operator just means if the first part is true...
        ? test.ToString().Substring(0, 1) // use this, otherwise...
        : "Hint not possible, number is less than ten"; // use this.
}

Compare each element in an array to each other

I need to compare each element of a 1-dimensional array with every other element. The array contains a list of strings sorted from longest to shortest. No 2 items in the array are equal; however, there will be items with the same length. Currently I am making N*(N+1)/2 comparisons (127.8 billion) and I'm trying to reduce the overall number of comparisons.
I have implemented a feature that basically says: if the strings differ in length by more than x percent then don't bother, they are not equal, AND the items below are not equal either, so just break the loop and move on to the next element.
I am currently trying to further reduce this by saying that: If element A matches element C and D then it stands to reason that elements C and D would also match so don't bother checking them (i.e. skip that operation). This is as far as I've factored since I don't currently know of a data structure that will allow me to do that.
The question here is: Does anyone know of such a data structure? or Does anyone know how I can further reduce my comparisons?
My current implementation is estimated to take 3.5 days to complete in a time window of 10 hours (i.e. it's too long), and my only options left are either to reduce the execution time, which may or may not be possible, or distribute the workload across dozens of systems, which may not be practical.
Update: My bad. Replace the word "equal" with "closely matches". I'm calculating the Levenshtein distance.
The idea is to find out, for each element in the array, whether there are other strings in the array that closely match it. The output is a database mapping of the strings that were closely related.
Here is the partial code from the method. Prior to executing this code block there is code that loads items into the database.
public static void RelatedAddressCompute() {
TableWipe("RelatedAddress");
decimal _requiredDistance = Properties.Settings.Default.LevenshteinDistance;
SqlConnection _connection = new SqlConnection(Properties.Settings.Default.AML_STORE);
_connection.Open();
string _cacheFilter = "LevenshteinCache NOT IN ('','SAMEASABOVE','SAME')";
SqlCommand _dataCommand = new SqlCommand(@"
SELECT
COUNT(DISTINCT LevenshteinCache)
FROM
Address
WHERE
" + _cacheFilter + @"
AND
LEN(LevenshteinCache) > 12", _connection);
_dataCommand.CommandTimeout = 0;
int _addressCount = (int)_dataCommand.ExecuteScalar();
_dataCommand = new SqlCommand(@"
SELECT
Data.LevenshteinCache,
Data.CacheCount
FROM
(SELECT
DISTINCT LevenshteinCache,
COUNT(LevenshteinCache) AS CacheCount
FROM
Address
WHERE
" + _cacheFilter + @"
GROUP BY
LevenshteinCache) Data
WHERE
LEN(LevenshteinCache) > 12
ORDER BY
LEN(LevenshteinCache) DESC", _connection);
_dataCommand.CommandTimeout = 0;
SqlDataReader _addressReader = _dataCommand.ExecuteReader();
string[] _addresses = new string[_addressCount + 1];
int[] _addressInstance = new int[_addressCount + 1];
int _itemIndex = 1;
while (_addressReader.Read()) {
string _address = (string)_addressReader[0];
int _count = (int)_addressReader[1];
_addresses[_itemIndex] = _address;
_addressInstance[_itemIndex] = _count;
_itemIndex++;
}
_addressReader.Close();
decimal _comparasionsMade = 0;
decimal _comparisionsAttempted = 0;
decimal _comparisionsExpected = (decimal)_addressCount * ((decimal)_addressCount + 1) / 2;
decimal _percentCompleted = 0;
DateTime _startTime = DateTime.Now;
Parallel.For(1, _addressCount, delegate(int i) {
for (int _index = i + 1; _index <= _addressCount; _index++) {
_comparisionsAttempted++;
decimal _percent = _addresses[i].Length < _addresses[_index].Length ? (decimal)_addresses[i].Length / (decimal)_addresses[_index].Length : (decimal)_addresses[_index].Length / (decimal)_addresses[i].Length;
if (_percent < _requiredDistance) {
decimal _difference = new Levenshtein().threasholdiLD(_addresses[i], _addresses[_index], 50);
_comparasionsMade++;
if (_difference <= _requiredDistance) {
InsertRelatedAddress(ref _connection, _addresses[i], _addresses[_index], _difference);
}
}
else {
_comparisionsAttempted += _addressCount - _index;
break;
}
}
if (_addressInstance[i] > 1 && _addressInstance[i] < 31) {
InsertRelatedAddress(ref _connection, _addresses[i], _addresses[i], 0);
}
_percentCompleted = (_comparisionsAttempted / _comparisionsExpected) * 100M;
TimeSpan _estimatedDuration = new TimeSpan((long)((((decimal)(DateTime.Now - _startTime).Ticks) / _percentCompleted) * 100));
TimeSpan _timeRemaining = _estimatedDuration - (DateTime.Now - _startTime);
string _timeRemains = _timeRemaining.ToString();
});
}
InsertRelatedAddress is a function that updates the database, and there are 500,000 items in the array.

OK. With the updated question, I think it makes more sense. You want to find pairs of strings with a Levenshtein distance less than a preset distance. I think the key is that you don't compare every pair of strings, and instead rely on the properties of Levenshtein distance to search for strings within your preset limit. The answer involves computing the tree of possible changes. That is, compute the possible changes to a given string with distance < n and see if any of those strings are in your set. I suppose this is only faster if n is small.
It looks like the question posted here: Finding closest neighbour using optimized Levenshtein Algorithm.
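One property that is cheap to exploit is that the Levenshtein distance between two strings is never smaller than the difference in their lengths, and that a comparison can stop as soon as every entry in a row of the dynamic programming table exceeds the limit. The sketch below shows that bounded comparison; it is not the tree-of-changes search described above, and the names EditDistance and Bounded are made up for illustration.

```csharp
using System;

public static class EditDistance
{
    // Returns the Levenshtein distance between a and b,
    // or -1 as soon as it is known to exceed maxDistance.
    public static int Bounded(string a, string b, int maxDistance)
    {
        // The distance is at least the difference in length.
        if (Math.Abs(a.Length - b.Length) > maxDistance)
            return -1;

        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.Min(rowMin, curr[j]);
            }

            // If the whole row is above the limit, the final distance will be too.
            if (rowMin > maxDistance)
                return -1;

            int[] swap = prev;
            prev = curr;
            curr = swap;
        }

        return prev[b.Length] <= maxDistance ? prev[b.Length] : -1;
    }
}
```

This only makes each individual comparison cheaper; it does not reduce the number of pairs that are considered.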

More info required. What is your desired outcome? Are you trying to get a count of all unique strings? You state that you want to see if 2 strings are equal and that if 'they are different in length by x percent then don't bother they not equal'. Why are you checking with a constraint on length by x percent? If you're checking for them to be equal they must be the same length.
I suspect you are trying to do something slightly different from determining an exact match, in which case I need more info.
Thanks
Neil

Standard deviation of generic list? [duplicate]

This question already has answers here:
How do I determine the standard deviation (stddev) of a set of values?
(12 answers)
Standard Deviation in LINQ
(8 answers)
Closed 9 years ago.
I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is the code that is relevant to it, without getting into too much detail:
namespace ValveTesterInterface
{
public class ValveDataResults
{
private List<ValveData> m_ValveResults;
public ValveDataResults()
{
if (m_ValveResults == null)
{
m_ValveResults = new List<ValveData>();
}
}
public void AddValveData(ValveData valve)
{
m_ValveResults.Add(valve);
}
Here is the function where the standard deviation needs to be calculated:
public float LatchStdev()
{
float sumOfSqrs = 0;
float meanValue = 0;
foreach (ValveData value in m_ValveResults)
{
meanValue += value.LatchTime;
}
meanValue = (meanValue / m_ValveResults.Count) * 0.02f;
for (int i = 0; i <= m_ValveResults.Count; i++)
{
sumOfSqrs += Math.Pow((m_ValveResults - meanValue), 2);
}
return Math.Sqrt(sumOfSqrs /(m_ValveResults.Count - 1));
}
}
}
Ignore what's inside the LatchStdev() function because I'm sure it's not right. It's just my poor attempt to calculate the standard deviation. I know how to do it for a list of doubles, but not for a generic list of data objects. If someone has experience with this, please help.

The example above is slightly incorrect and could have a divide by zero error if your population set is 1. The following code is somewhat simpler and gives the "population standard deviation" result. (http://en.wikipedia.org/wiki/Standard_deviation)
using System;
using System.Linq;
using System.Collections.Generic;
public static class Extend
{
public static double StandardDeviation(this IEnumerable<double> values)
{
double avg = values.Average();
return Math.Sqrt(values.Average(v=>Math.Pow(v-avg,2)));
}
}

This article should help you. It creates a function that computes the deviation of a sequence of double values. All you have to do is supply a sequence of appropriate data elements.
The resulting function is:
private double CalculateStandardDeviation(IEnumerable<double> values)
{
double standardDeviation = 0;
if (values.Any())
{
// Compute the average.
double avg = values.Average();
// Perform the Sum of (value-avg)^2.
double sum = values.Sum(d => Math.Pow(d - avg, 2));
// Put it all together.
standardDeviation = Math.Sqrt((sum) / (values.Count()-1));
}
return standardDeviation;
}
This is easy enough to adapt for any generic type, so long as we provide a selector for the value being computed. LINQ is great for that: the Select function lets you project, from your generic list of custom types, a sequence of numeric values for which to compute the standard deviation:
List<ValveData> list = ...
var result = CalculateStandardDeviation(list.Select(v => (double)v.SomeField));
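If you would like to call this directly on the list, one possible shape is an extension method that takes the selector as a parameter. This is a sketch only: the ValveData class below is a stand-in containing just the LatchTime field from the question, and StdDevExtensions is an invented name. It returns the sample standard deviation (dividing by n - 1), and returns 0 when there are fewer than two values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's ValveData type.
public class ValveData
{
    public float LatchTime { get; set; }
}

public static class StdDevExtensions
{
    // Sample standard deviation of the values picked out by the selector.
    public static double StandardDeviation<T>(this IEnumerable<T> source, Func<T, double> selector)
    {
        // Materialize once so the source is only enumerated a single time.
        List<double> values = source.Select(selector).ToList();
        if (values.Count < 2)
            return 0.0;

        double avg = values.Average();
        double sum = values.Sum(v => (v - avg) * (v - avg));
        return Math.Sqrt(sum / (values.Count - 1));
    }
}
```

With that in place, the method in the question could reduce to something like m_ValveResults.StandardDeviation(v => v.LatchTime).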

Even though the accepted answer seems mathematically correct, it is wrong from the programming perspective - it enumerates the same sequence 4 times. This might be ok if the underlying object is a list or an array, but if the input is a filtered/aggregated/etc linq expression, or if the data is coming directly from the database or network stream, this would cause much lower performance.
I would highly recommend not reinventing the wheel and instead using one of the better open source math libraries, Math.NET. We have been using that library in our company and are very happy with the performance.
PM> Install-Package MathNet.Numerics
var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();
See http://numerics.mathdotnet.com/docs/DescriptiveStatistics.html for more information.
Lastly, for those who want the fastest possible result and are willing to sacrifice some precision, read about the "one-pass" algorithms: https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods
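As an illustration of a single-pass approach, here is a sketch using Welford's running-update method, which is one of the techniques covered on that page. The class name OnePassStats is invented for this example, and it returns the sample standard deviation, or 0 when there are fewer than two values.

```csharp
using System;
using System.Collections.Generic;

public static class OnePassStats
{
    // Sample standard deviation computed in a single enumeration (Welford's method).
    public static double SampleStandardDeviation(IEnumerable<double> values)
    {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared differences from the mean

        foreach (double x in values)
        {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean); // uses the updated mean
        }

        return n < 2 ? 0.0 : Math.Sqrt(m2 / (n - 1));
    }
}
```

Because the sequence is enumerated only once, this also works for inputs that are expensive to re-read, such as a query result or a stream.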

I see what you're doing, and I use something similar. It seems to me you're not going far enough. I tend to encapsulate all data processing into a single class, that way I can cache the values that are calculated until the list changes.
For instance:
// requires: using System.Collections.Generic; using System.Linq;
public class StatProcessor
{
    private readonly List<double> _data = new List<double>(); // this holds the current data
    private double _avg;    // we cache the average here
    private bool _avgValid; // a flag to say whether the cached average is still valid
    private void CalcAvg()  // calculate the average of the list, cache it in _avg, and set _avgValid
    {
        _avg = _data.Count == 0 ? 0 : _data.Average();
        _avgValid = true;
    }
    public double Average
    {
        get
        {
            if (!_avgValid) // if we don't HAVE to calculate the average, skip it
                CalcAvg();  // if we do, go ahead, cache it, then set the flag.
            return _avg;    // now _avg is guaranteed to be good, so return it.
        }
    }
    // ...more stuff
    public void Add(double value)
    {
        _data.Add(value);  // add to the list here, and reset the flag
        _avgValid = false;
    }
}
You'll notice that with this approach, only the first request for the average actually computes it. After that, as long as we don't add anything to the list (or remove or modify anything, but those operations aren't shown), we can get the average for basically nothing.
Additionally, since the average is used in the algorithm for the standard deviation, computing the standard deviation first gives us the average for free, and computing the average first gives us a small performance boost in the standard deviation calculation, assuming we remember to check the flag.
Furthermore, places like the average function, where you're already looping through every value, are a great opportunity to cache things like the minimum and maximum values. Of course, requests for this information need to first check whether the values have been cached, and that can cause a relative slowdown compared to just finding the max from the list, since it does all the extra work of setting up every related cache, not just the one you're accessing.
