How to use math to create more efficient code? - c#

I'm horrible with math, but I'm working diligently to change that. I hate writing ugly, inefficient code. I'm currently developing a small hobby game and I need to be able to calculate adjacent tiles. I decided against using x,y coordinates; instead, each tile has a unique id. I have the following working code, but I'm struggling to find a way to make it less ugly:
this.AdjacentTiles = new int[8];
this.AdjacentTiles[0] = (id - 9) - 1;
this.AdjacentTiles[1] = (id - 9);
this.AdjacentTiles[2] = (id - 9) + 1;
this.AdjacentTiles[3] = (id - 1);
this.AdjacentTiles[4] = (id + 1);
this.AdjacentTiles[5] = (id + 9) - 1;
this.AdjacentTiles[6] = (id + 9);
this.AdjacentTiles[7] = (id + 9) + 1;
I know it's possible to calculate all this within a single, tidy for loop, but I just can't wrap my head around how to do so. I don't necessarily want to be spoon-fed, because, like I said, I'm trying to improve myself; I just need some direction.
Thank you very much.

The math is relatively easy, with something like this (pseudo-code, where 9 is the number of tiles per row, as in your code):
index = 0
for row = -1 to 1 inclusive: # prev/this/next row
    for col = -1 to 1 inclusive: # prev/this/next col
        if row != 0 or col != 0: # ignore current cell
            array[index++] = id + row * 9 + col
That will set the elements to what you need. However, you may want to consider edge cases, such as when id is in the leftmost column. In that case there are no adjacent cells to the left (id - 1 would actually be the last tile of the previous row), unless you've created an array with empty edges.
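Here is the same nested loop as a C# sketch that also skips neighbours off the edge of the board. It assumes (the question doesn't say) that ids start at 0 in the top-left corner and run row by row, and that the board's width and height are known. `TileGrid` and `GetAdjacentTiles` are hypothetical names. It returns a list rather than a fixed 8-element array, because edge and corner tiles have fewer neighbours.

```csharp
using System.Collections.Generic;

public static class TileGrid
{
    // Assumes ids run row by row from 0, with `width` tiles per row.
    public static List<int> GetAdjacentTiles(int id, int width, int height)
    {
        int x = id % width;   // column of this tile
        int y = id / width;   // row of this tile
        var result = new List<int>(8);

        for (int dy = -1; dy <= 1; dy++)        // previous/current/next row
        {
            for (int dx = -1; dx <= 1; dx++)    // previous/current/next column
            {
                if (dx == 0 && dy == 0) continue;                  // skip the tile itself
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                    continue;                                      // neighbour is off the board
                result.Add(id + dy * width + dx);                  // same arithmetic as the unrolled array
            }
        }
        return result;
    }
}
```

On a 9x9 board, `GetAdjacentTiles(10, 9, 9)` gives the same eight ids as the hand-written array, in the same order. The corner tile 0 gets only three neighbours.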

Related

C# - Taking top 10 values and multiplying them

I am currently working on a Bayesian spam filter. I made a filter using an algorithm, but it will not work for long emails: there are just too many values to multiply, and the result exceeds the range of double. I thought about taking only the 10 or 20 most important values (the highest values for both spam and ham) and multiplying only those. I thought about making another Dictionary inside and then multiplying the values from it.
This is how it looks right now:
if (countsWordOccurenceSpam.ContainsKey(word.Key) && (!countsWordOccurenceOk.ContainsKey(word.Key)))
{
int spamValue = 0;
countsWordOccurenceSpam.TryGetValue(word.Key, out spamValue);
totals = spamValue;
fprob_spam = ((double)spamValue) / ile_spam;
sum_spam = (((weight * probability) + (totals * fprob_spam)) / (totals + weight));
sum_ok = ((weight * probability) / (totals + weight));
sum_spam = Math.Pow(sum_spam, word.Value);
sum_ok = Math.Pow(sum_ok, word.Value);
wp_spam_1 = wp_spam_1 * sum_spam;
last_o_1 = last_o_1 * sum_ok;
}
This is one part of the algorithm. Now I am thinking about putting all the values from sum_spam into one Dictionary and all the values from sum_ok into another, then using .Take(10) to select the 10 highest values and multiplying all of them.
Does this seem right? I suspect it would be very inefficient. Is there a better way to do it?
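The plan above (collect the per-word factors, keep the largest ones, multiply only those) could look roughly like this sketch. It assumes the `sum_spam` (or `sum_ok`) values have already been collected into a sequence. `SpamScoring` and `ProductOfTop` are hypothetical names. A plain list is enough, since the dictionary keys wouldn't be used. The values must be sorted before `.Take(10)`, because `Take` just returns the first elements in their existing order.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SpamScoring
{
    // Multiplies only the `count` largest factors instead of all of them.
    public static double ProductOfTop(IEnumerable<double> factors, int count)
    {
        return factors
            .OrderByDescending(v => v)                      // largest values first
            .Take(count)                                    // keep at most `count` of them
            .Aggregate(1.0, (product, v) => product * v);   // multiply them together
    }
}
```

Inside the existing loop you would add each word's `sum_spam` to a hypothetical `List<double> spamFactors`, then call `ProductOfTop(spamFactors, 10)` once per email. Do the same for `sum_ok`.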

Efficiency of string.Split() vs. string.Substring() in C#?

I'm working on a project that involves taking large text files and parsing each line. The point is to parse the whole text file into cells, much like an Excel spreadsheet. Unfortunately, there are no delimiters for most of the files, so I need some sort of index-based method to manually create the cells, even if the column is blank.
Previously, lines were parsed by splitting on null, which worked well. However, new data has made this method unreliable because it does not include blank cells, so I had to write a new method of parsing lines, which uses Substring. The method takes an array of integer indices and splits the string at the given indices:
private string[] SetCols3(int[] fixedWidthValues, string line)
{
string[] cols = new string[fixedWidthValues.Length];
int columnLength;
int FWV;
int FWV2;
bool lastOfFWV;
bool outOfBounds;
for (int x = 0; x < fixedWidthValues.Length; x++)
{
FWV = fixedWidthValues[x];
lastOfFWV = x + 1 >= fixedWidthValues.Length;
outOfBounds = lastOfFWV ? true : fixedWidthValues[x + 1] >= line.Length;
FWV2 = lastOfFWV || outOfBounds ? line.Length : fixedWidthValues[x + 1];
columnLength = FWV2 - FWV;
columnLength *= columnLength < 0 ? -1 : 1;
if (FWV < line.Length)
{
cols[x] = line.Substring(FWV, columnLength).Trim();
}
}
return cols;
}
Quick breakdown of the code: the integers and booleans are just to handle blank columns, lines that are shorter than normal, etc., and to make the code cleaner for other people to understand a little better (as opposed to one long, convoluted if statement).
My question: is there a way to make this more efficient? For some reason, this method takes significantly longer than the previous method. I understand it does more, so more time was expected. However, the difference is surprisingly huge. One iteration (with 15 indices) takes around 0.07 seconds, which is huge considering this method gets called several thousand times per file. The method that splits on null takes 0.00002 seconds at the high end. Is there something I can change in my code to noticeably increase its efficiency? I haven't been able to find anything particularly useful after hours of searching online.
Also, the number of indices/columns greatly affects the speed. For 15 columns, it takes around 0.07 seconds compared to 0.05 for 10 columns.
First,
outOfBounds = lastOfFWV ? true : fixedWidthValues[x + 1] >= line.Length;
could be changed to
outOfBounds = lastOfFWV || fixedWidthValues[x + 1] >= line.Length;
Next,
columnLength = FWV2 - FWV;
columnLength *= columnLength < 0 ? -1 : 1;
could be changed to
columnLength = Math.Abs(FWV2 - FWV);
And last,
if (FWV < line.Length)
{
could be moved to just after the FWV assignment at the top of the loop and inverted to
if (FWV >= line.Length)
continue;
But I don't think any of these changes would have a significant impact on speed. Changing what's passed in might have more impact. Currently you pass in the column starting positions and recalculate the column widths for every line, even though the widths don't change from line to line. Instead, pass in both the starting positions and the column widths. That way there's no per-line calculation involved.
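Here is a C# sketch of that suggestion. It assumes the column start positions are sorted in ascending order. The widths are computed once per file, and each line then needs only the two arrays. `FixedWidthParser`, `ComputeWidths` and `SplitLine` are hypothetical names. Unlike `SetCols3`, this version fills missing columns on short lines with an empty string rather than leaving them `null`. It also skips columns that start past the end of the line with an early `continue`.

```csharp
using System;

public static class FixedWidthParser
{
    // Width of each column, computed once from the (sorted) start positions.
    // The last column runs to the end of the line, so it gets int.MaxValue
    // and is clipped per line in SplitLine.
    public static int[] ComputeWidths(int[] starts)
    {
        var widths = new int[starts.Length];
        for (int i = 0; i < starts.Length; i++)
            widths[i] = i + 1 < starts.Length ? starts[i + 1] - starts[i] : int.MaxValue;
        return widths;
    }

    // Cuts one line into trimmed cells using the precomputed starts and widths.
    public static string[] SplitLine(string line, int[] starts, int[] widths)
    {
        var cols = new string[starts.Length];
        for (int i = 0; i < starts.Length; i++)
        {
            int start = starts[i];
            if (start >= line.Length)
            {
                cols[i] = string.Empty;   // column missing on a short line
                continue;
            }
            int length = Math.Min(widths[i], line.Length - start);  // don't read past the end
            cols[i] = line.Substring(start, length).Trim();
        }
        return cols;
    }
}
```

Call `ComputeWidths` once when the column layout is known, then call `SplitLine` for every line.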
But rather than guessing, it'd be best to profile the method to find the hot spot(s).
The issue was two stray .ToInt32() calls I had accidentally included (I don't know why they were there). This was not Convert.ToInt32() but a different method from my company's codebase, and for some reason it was extremely inefficient at converting numbers. For reference, the issue was on the following lines:
FWV = fixedWidthValues[x].ToInt32();
...
FWV2 = lastOfFWV || outOfBounds ? line.Length : fixedWidthValues[x + 1].ToInt32();
Removing them made the method 60 times faster.

How to compare 2 strings and get the number of errors as an int?

I've searched online for a diff algorithm, but none of them do what I am looking for. It is for a texting contest (as in cell phone), and I need to compare the entry text to the master text, recording the errors along the way. I am semi-new to C# and I understand most of the string functions. I didn't think this was going to be that hard a problem, but alas, I just can't wrap my head around it.
I have a form with 2 rich-text boxes (one on top of the other) and 2 buttons. The top box is the master text (string) and the bottom box is the entry text (string). Every contestant sends a text to an email account. From the email we copy and paste the text into the Entry RTB and compare it to the Master RTB. Each single word and each single space counts as one thing to check. A word, no matter how many errors it has, is still 1 error. Every error adds 1 second to the contestant's time.
Examples:
Hello there! <= 3 checks (2 words and 1 space)
Helothere! <= 2 errors (Helo and space)
Hello there!! <= 1 error (extra ! at end of there!)
Hello there! How are you? <= 9 checks (5 words and 4 spaces)
Helothere!! How a re you? <= still 9 checks, 4 errors (Helo, no space, extra !, and a space in are)
Hello there!# Ho are yu?? <= 3 errors (# at the end of there!, no w, and no o plus an extra ? in yu??, which is still only 1 word)
What I have so far:
I've created 6 arrays (3 for master, 3 for entry) and they are
CharArray of all chars
StringArray of all strings (words), including the spaces
IntArray with the length of each string in StringArray
My biggest trouble is if the entry text is wrong and it's shorter or longer than the master. I keep getting IndexOutOfRange exceptions (understandably) but can't fathom how to go about checking and writing the code to compensate.
I hope I have made clear what I need help with. Any code examples or pointers in the right direction would be very helpful.
Have you looked into the Levenshtein distance algorithm? It returns the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other, which in your case would be texting errors. My implementation, based on the pseudo-code on the Wikipedia page, passes the first 3 of your 4 use cases:
Assert.AreEqual(2, LevenshteinDistance("Hello there!", "Helothere!"));
Assert.AreEqual(1, LevenshteinDistance("Hello there!", "Hello there!!"));
Assert.AreEqual(4, LevenshteinDistance("Hello there! How are you?", "Helothere!! How a re you?"));
Assert.AreEqual(3, LevenshteinDistance("Hello there! How are you?", "Hello there!# Ho are yu??")); //fails, returns 4 errors
So while not perfect out of the box, it is probably a good starting point for you. Also, if you have too much trouble implementing your scoring rules, it might be worth revisiting them.
hth
Update:
Here is the result for the string you requested in the comments:
Assert.AreEqual(7, LevenshteinDistance("Hello there! How are you?", "Hlothere!! Hw a reYou?")); //fails, returns 8 errors
And here is my implementation of the Levenshtein Distance algorithm:
int LevenshteinDistance(string left, string right)
{
if (left == null || right == null)
{
return -1;
}
if (left.Length == 0)
{
return right.Length;
}
if (right.Length == 0)
{
return left.Length;
}
int[,] distance = new int[left.Length + 1, right.Length + 1];
for (int i = 0; i <= left.Length; i++)
{
distance[i, 0] = i;
}
for (int j = 0; j <= right.Length; j++)
{
distance[0, j] = j;
}
for (int i = 1; i <= left.Length; i++)
{
for (int j = 1; j <= right.Length; j++)
{
if (right[j - 1] == left[i - 1])
{
distance[i, j] = distance[i - 1, j - 1];
}
else
{
distance[i, j] = Min(distance[i - 1, j] + 1, //deletion
distance[i, j - 1] + 1, //insertion
distance[i - 1, j - 1] + 1); //substitution
}
}
}
return distance[left.Length, right.Length];
}
int Min(int val1, int val2, int val3)
{
return Math.Min(val1, Math.Min(val2, val3));
}
You need to come up with a scoring system that works for your situation.
I would split the text into an array of words on each space.
If a word is found at the same index: +5.
If a word is found within ±1 of the same index: +3 (keep a counter of how many words are off, to adjust the ± correction).
If a needed word is found as part of another word: +2.
Etc. Matching words is hard; coming up with a rules engine that works is 'easier'.
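Here is a rough C# sketch of those rules. The point values come from the list above. To keep it short, it uses a fixed ±1 window instead of the adaptive counter, splits only on single spaces, and ignores the separate space checks. `WordScorer.Score` is a hypothetical name. A higher score means a closer match, with a maximum of 5 points per master word.

```csharp
using System;

public static class WordScorer
{
    // Points from the rules above (the values themselves are arbitrary).
    const int SameIndex = 5, NeighbourIndex = 3, PartOfWord = 2;

    public static int Score(string master, string entry)
    {
        string[] m = master.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] e = entry.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int score = 0;

        for (int i = 0; i < m.Length; i++)
        {
            bool before = i - 1 >= 0 && i - 1 < e.Length && e[i - 1] == m[i];
            bool after = i + 1 < e.Length && e[i + 1] == m[i];

            if (i < e.Length && e[i] == m[i])
                score += SameIndex;                        // right word, right place
            else if (before || after)
                score += NeighbourIndex;                   // right word, shifted by one
            else if (Array.Exists(e, w => w.Contains(m[i])))
                score += PartOfWord;                       // word glued inside another word
        }
        return score;
    }
}
```

For example, "Hellothere" scores 4 against "Hello there", because both master words are found only inside another word. A perfect entry scores 10.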
I once implemented an algorithm (which I can't find at the moment; I'll post code when I find it) that looked at the total number of PAIRS in the target string. For example, "Hello, World!" would have 12 pairs, { "He", "el", "ll", ..., "ld", "d!" }.
You then do the same thing on an input string such as "Helo World" so you have { "He",...,"ld" }.
You can then calculate accuracy as a function of correct pairs (input pairs that are in the list of target pairs) and incorrect pairs (input pairs that do not exist in the list of target pairs), relative to the total number of target pairs. Over long enough sentences, this measure would be very accurate and fair.
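The answer doesn't give an exact formula, so this C# sketch picks one as an assumption: (correct pairs - incorrect pairs) divided by the number of target pairs. Under that formula a perfect copy scores 1.0. `PairAccuracy` is a hypothetical name. For simplicity the target pairs are stored in a set, so a pair that occurs once in the target can match more than once in the input.

```csharp
using System.Collections.Generic;

public static class PairAccuracy
{
    // Adjacent character pairs: "Hello" -> "He", "el", "ll", "lo".
    public static List<string> Pairs(string s)
    {
        var pairs = new List<string>();
        for (int i = 0; i + 1 < s.Length; i++)
            pairs.Add(s.Substring(i, 2));
        return pairs;
    }

    // Assumed formula: (correct - incorrect) / number of target pairs.
    public static double Score(string target, string input)
    {
        List<string> targetPairs = Pairs(target);
        if (targetPairs.Count == 0) return 0.0;               // nothing to compare against
        var lookup = new HashSet<string>(targetPairs);
        int correct = 0, incorrect = 0;
        foreach (string pair in Pairs(input))
        {
            if (lookup.Contains(pair)) correct++;              // pair also occurs in the target
            else incorrect++;                                  // pair the target never has
        }
        return (double)(correct - incorrect) / targetPairs.Count;
    }
}
```

For "Helo World" against "Hello, World!", 8 of the 9 input pairs appear in the target and 1 ("o ") doesn't. That gives (8 - 1) / 12, about 0.58.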
A simple algorithm would be to check letter by letter. If the letters differ, increment the number of errors. If the next two letters are simply swapped, it's a transposed letter, so just continue. If the mistyped letter matches the next master letter, it's an omission; treat it accordingly. If the next entry letter matches the expected letter, it's an insertion; treat it accordingly. Otherwise the person really messed up; just move on.
This doesn't get everything but with a few modifications this could become comprehensive.
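Here is one way to turn that description into C#. This is a sketch, not the answerer's code, and `LetterChecker.CountErrors` is a hypothetical name. It counts one error per mistake rather than per word and looks only one letter ahead. Any leftover characters at the end of either string count as errors.

```csharp
public static class LetterChecker
{
    public static int CountErrors(string master, string entry)
    {
        int m = 0, e = 0, errors = 0;
        while (m < master.Length && e < entry.Length)
        {
            if (master[m] == entry[e]) { m++; e++; continue; }   // letters agree
            errors++;
            bool hasNextM = m + 1 < master.Length;
            bool hasNextE = e + 1 < entry.Length;
            if (hasNextM && hasNextE && master[m] == entry[e + 1] && master[m + 1] == entry[e])
            {
                m += 2; e += 2;   // swapped letters, e.g. "hte" typed for "the"
            }
            else if (hasNextM && entry[e] == master[m + 1])
            {
                m++;              // omission: the entry skipped master[m]
            }
            else if (hasNextE && entry[e + 1] == master[m])
            {
                e++;              // insertion: the entry has an extra letter
            }
            else
            {
                m++; e++;         // plain wrong letter
            }
        }
        // Whatever is left over on either side is missing or extra text.
        return errors + (master.Length - m) + (entry.Length - e);
    }
}
```

Against "Hello there!", it reports 2 errors for "Helothere!" and 1 for "Hello there!!".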
A weak attempt at pseudocode:
Edit: new idea; see the comments in the code. I don't know the string functions off the top of my head, so you'll have to figure that part out. The algorithm somewhat fails for words that repeat a lot, though...
string entry; //we'll pretend that this has stuff inside
string master; // this too...
string tempentry = entry; //stuff will be deleted, so I need a copy to mess up
int e = 0; //index of the last master word that was found in the entry
int m = 0; //word index for master
int errors = 0;
while (there are words in tempentry and words left in master)
    string mword = the next word in master;
    m++;
    int eplace = find mword in tempentry; //eplace is the index where mword starts in tempentry
    if (eplace == -1) //word not there...
        continue;
    else
        errors += m - e - 1; //master words skipped since the last match
        errors += number of spaces before eplace //extra words typed before this one
        e = m //remember where the last match was
        tempentry = strip off everything up to and including mword //Substring?
all words and spaces left in master after the last match are considered errors.
There are a couple of bounds-checking errors that need to be fixed here, but it's a good start.

Matching Two Lists

I have a table that contains human-entered observation data. One column is supposed to correspond to another list: each human-entered value should exactly match a value in a sort of master list of possibilities.
The problem, however, is that the human data is abbreviated, misspelled, etc. Is there a mechanism that does some sort of similarity search to find what the human-entered data should actually be?
Examples
| Human Entered | Should Be |
|---|---|
| Carbon-12 | Carbon(12) |
| South Korea | Republic of Korea |
| farenheit | Fahrenheit |
The only idea I really have is to break the Human Entered data into 3-character sections and check whether they are contained in the Should Be entries. It would then just pick the highest-rated entry. As a later addition, it could present the user with a choice of the top 10 or so.
I'm also not necessarily looking for an absolutely perfect solution. If it were right about 70% of the time, it would save A LOT of time going through the list.
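The 3-character idea can be sketched in C# as follows. `TrigramMatcher` and `Rank` are hypothetical names. Comparisons are case-insensitive, and each master entry is scored by how many distinct trigrams it shares with the input. Returning the top few entries rather than just one covers the "offer the top 10" idea.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TrigramMatcher
{
    // All 3-character slices of a lower-cased string: "korea" -> kor, ore, rea.
    static HashSet<string> Trigrams(string s)
    {
        s = s.ToLowerInvariant();
        var grams = new HashSet<string>();
        for (int i = 0; i + 3 <= s.Length; i++)
            grams.Add(s.Substring(i, 3));
        return grams;
    }

    // Master-list entries ordered by how many trigrams they share with the input.
    public static List<string> Rank(string input, IEnumerable<string> masterList, int top)
    {
        HashSet<string> inputGrams = Trigrams(input);
        return masterList
            .OrderByDescending(candidate => Trigrams(candidate).Count(g => inputGrams.Contains(g)))
            .Take(top)
            .ToList();
    }
}
```

With the three example values as the master list, the top-ranked entry for each human-entered value is the expected one. That includes "South Korea", which shares " ko", "kor", "ore" and "rea" with "Republic of Korea".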
One option is to look for a small Levenshtein distance between two strings rather than requiring an exact match. This would help find matches where there are minor spelling differences or typos.
Another option is to normalize the strings before comparing them. The normalization techniques that make sense depend on your specific application, but they could, for example, involve:
Removing all punctuation.
Converting UK spellings to US spellings.
Using the scientific name for a substance instead of its common names.
etc.
You can then compare the normalized forms of the members of each list instead of the original forms. You may also want to consider using a case-insensitive comparison instead of a case-sensitive comparison.
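Here is a small C# sketch of the punctuation-removal and case-insensitive ideas. `Normalizer` is a hypothetical name. This version also drops whitespace, an extra assumption that may not suit every column. Normalizing alone can't map "South Korea" to "Republic of Korea"; that still needs a lookup table or a fuzzy comparison.

```csharp
using System;
using System.Linq;

public static class Normalizer
{
    // Keeps only letters and digits, lower-cased: "Carbon (12)" -> "carbon12".
    public static string Normalize(string s) =>
        new string(s.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToLowerInvariant();

    // Ordinal comparison of the normalized forms.
    public static bool SameAfterNormalizing(string a, string b) =>
        string.Equals(Normalize(a), Normalize(b), StringComparison.Ordinal);
}
```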
You can try to calculate the similarity of two strings using Levenshtein distance:
private static int CalcLevensteinDistance(string left, string right)
{
if (left == right)
return 0;
int[,] matrix = new int[left.Length + 1, right.Length + 1];
for (int i = 0; i <= left.Length; i++)
// delete
matrix[i, 0] = i;
for (int j = 0; j <= right.Length; j++)
// insert
matrix[0, j] = j;
for (int i = 0; i < left.Length; i++)
{
for (int j = 0; j < right.Length; j++)
{
if (left[i] == right[j])
matrix[i + 1, j + 1] = matrix[i, j];
else
{
// deletion or insertion
matrix[i + 1, j + 1] = System.Math.Min(matrix[i, j + 1] + 1, matrix[i + 1, j] + 1);
// substitution
matrix[i + 1, j + 1] = System.Math.Min(matrix[i + 1, j + 1], matrix[i, j] + 1);
}
}
}
return matrix[left.Length, right.Length];
}
Then calculate the similarity between two strings as a value from 0 to 1 (multiply by 100 for a percentage):
public static double CalcSimilarity(string left, string right, bool ignoreCase)
{
if (ignoreCase)
{
left = left.ToLower();
right = right.ToLower();
}
double distance = CalcLevensteinDistance(left, right);
if (distance == 0.0)
return 1.0;
double longestStringSize = System.Math.Max(left.Length, right.Length);
double percent = distance / longestStringSize;
return 1.0 - percent;
}
Have you considered using one (or several) drop-down list(s) to enforce correct input? In my opinion, that would be a better approach in most cases when considering usability and user-friendliness. It would also make handling this input a lot easier. With free-text input, you'd probably get many different ways to write the same thing, and you'll "never" be able to anticipate every way of writing anything complex.
Example: as you wrote, "carbon-12", "Carbon 12", "Carbon ( 12 )", "Carbon (12)", "Carbon - 12", etc. Just for this, the possibilities are nearly endless. When you also consider things like "South Korea" vs "Republic of Korea", where the mapping is not 1:1 (what about North Korea? Or just "Korea"?), this gets even harder.
Of course, I know nothing about your application and might be completely wrong. But usually, when you expect complex values in a certain format, a drop-down list would in many cases both make your job as a developer easier and give the end user a better experience.

Random directions, with no repeat (bad description)

Hey there, so I'm knocking together a random pattern generator.
My code so far:
int permutes = 100;
int y = 31;
int x = 63;
while (permutes > 0) {
int rndTurn = random(1, 4);
if (rndTurn == 1) { y = y - 1; } //go up
if (rndTurn == 2) { y = y + 1; } //go down
if (rndTurn == 3) { x = x - 1; } //go left
if (rndTurn == 4) { x = x + 1; } //go right
setP(x, y, 1);
delay(250);
permutes--; //count down so the loop stops after 100 steps
}
My question is, how would I go about getting the code to not go back on itself?
For example, the code says "Go Left", but the next time through the loop it says "Go Right". How can I stop this?
NOTE: setP turns a specific pixel on.
Cheers peoples!
It depends on what you mean.
If you mean "avoid going back to the spot I was on most recently", then you have to remember the direction of the last movement: if you move up, your next movement can't be down.
If you mean "avoid going back to any spot you've ever been on", then you're going to have to remember every spot you've been on. This can be implemented efficiently with a hash table keyed by a class that represents a coordinate and overrides Equals/GetHashCode appropriately.
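In C#, value tuples such as `(int, int)` already implement `Equals` and `GetHashCode`, so a `HashSet<(int, int)>` can stand in for the custom coordinate class. This sketch uses `System.Random` in place of the question's `random()` and returns the path instead of calling `setP`. `VisitedWalk` and `Walk` are hypothetical names, and the sketch does not clip to the screen's edges. It picks randomly among the unvisited neighbours and stops early if it boxes itself in.

```csharp
using System;
using System.Collections.Generic;

public static class VisitedWalk
{
    // Up, down, left, right as (dx, dy) offsets.
    static readonly (int dx, int dy)[] Steps = { (0, -1), (0, 1), (-1, 0), (1, 0) };

    public static List<(int x, int y)> Walk(int x, int y, int maxSteps, Random rng)
    {
        var visited = new HashSet<(int, int)> { (x, y) };   // value tuples compare by value
        var path = new List<(int x, int y)> { (x, y) };
        for (int step = 0; step < maxSteps; step++)
        {
            var options = new List<(int, int)>();
            foreach (var (dx, dy) in Steps)
            {
                if (!visited.Contains((x + dx, y + dy)))
                    options.Add((x + dx, y + dy));      // unvisited neighbour
            }
            if (options.Count == 0) break;              // boxed in: every neighbour already used
            (x, y) = options[rng.Next(options.Count)];  // move to a random unvisited neighbour
            visited.Add((x, y));
            path.Add((x, y));
        }
        return path;
    }
}
```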
Since each square corresponds to a pixel, your coordinate space must be finite, so you could keep track of coordinates you've already visited.
If there's a corresponding getP function to determine if a pixel has already been turned on, you could just use that.
You remember the last direction and, using random(1,3), pick one of the three remaining directions, then store that as the last one.
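Here is a C# sketch of that trick using the question's numbering, where 1 and 2 (y - 1, y + 1) are opposites, as are 3 and 4 (x - 1, x + 1). It assumes 0 means "no previous move". `NoReverseWalk` and `NextDirection` are hypothetical names. `System.Random.Next(min, max)` excludes `max`, so `Next(1, 4)` returns 1 to 3, which appears to match what `random(1,3)` is meant to do in this answer.

```csharp
using System;

public static class NoReverseWalk
{
    // 1 <-> 2 and 3 <-> 4 are opposite directions.
    static int Opposite(int dir) => dir % 2 == 1 ? dir + 1 : dir - 1;

    public static int NextDirection(Random rng, int last)
    {
        if (last == 0) return rng.Next(1, 5);     // first move: any of 1..4
        int banned = Opposite(last);              // the move that would go straight back
        int pick = rng.Next(1, 4);                // 1..3: which of the three allowed moves
        return pick >= banned ? pick + 1 : pick;  // shift past the banned direction
    }
}
```

Shifting past the banned value means a single random draw picks each of the other three directions with equal probability, with no re-rolling.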
Not sure if this approach will work.
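Create a new int variable called lastRndTurn before the loop, starting at 0, and assign rndTurn to it after your if statements.
Then, right after your int rndTurn = random(1, 4), add a while loop that re-rolls whenever the new direction would undo the last one (1 and 2 are opposites, as are 3 and 4):
int oppositeTurn = (lastRndTurn % 2 == 1) ? lastRndTurn + 1 : lastRndTurn - 1;
while (rndTurn == oppositeTurn)
{
rndTurn = random(1, 4);
}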
