How to compare two strings and get the number of errors as an int? - C#

I've searched online for a diff algorithm, but none of them does what I am looking for. This is for a texting contest (as in cell phones), and I need to compare the entry text to the master text, recording the errors along the way. I am semi-new to C# and I understand most of the string functions. I didn't think this was going to be that hard a problem, but alas I just can't wrap my head around it.
I have a form with 2 rich-text boxes (one on top of the other) and 2 buttons. The top box is the master text (string) and the bottom box is the entry text (string). Every contestant sends a text to an email account; from the email we copy and paste the text into the Entry RTB and compare it to the Master RTB. Every single word and every single space counts as one thing to check. A word, no matter how many errors it has, still counts as 1 error. Every error adds 1 second to the contestant's time.
Examples:
Hello there! <= 3 checks (2 words and 1 space)
Helothere! <= 2 errors (Helo and space)
Hello there!! <= 1 error (extra ! at end of there!)
Hello there! How are you? <= 9 checks (5 words and 4 spaces)
Helothere!! How a re you? <= still 9 checks, 4 errors(helo, no space, extra !, and a space in are)
Hello there!# Ho are yu?? <= 3 errors (# at end of there!, no w, no o and extra ?; all errors within one word still count as 1 error)
What I have so far:
I've created 6 arrays (3 for master, 3 for entry) and they are
CharArray of all chars
StringArray of all strings(words) including the spaces
IntArray with length of the string in each StringArray
My biggest trouble is if the entry text is wrong and it's shorter or longer than the master. I keep getting IndexOutOfRange exceptions (understandably) but can't fathom how to go about checking and writing the code to compensate.
I hope I have made myself clear enough as to what I need help with. If anyone could give some code examples or something to point me in the right direction, that would be very helpful.

Have you looked into the Levenshtein distance algorithm? It returns the number of differences between two strings, which, in your case, would be texting errors. Implementing the algorithm based on the pseudo-code found on the Wikipedia page passes the first 3 of your 4 use cases:
Assert.AreEqual(2, LevenshteinDistance("Hello there!", "Helothere!"));
Assert.AreEqual(1, LevenshteinDistance("Hello there!", "Hello there!!"));
Assert.AreEqual(4, LevenshteinDistance("Hello there! How are you?", "Helothere!! How a re you?"));
Assert.AreEqual(3, LevenshteinDistance("Hello there! How are you?", "Hello there!# Ho are yu??")); //fails, returns 4 errors
So while not perfect out of the box, it is probably a good starting point for you. Also, if you have too much trouble implementing your scoring rules, it might be worth revisiting them.
hth
Update:
Here is the result of the string you requested in the comments:
Assert.AreEqual(7, LevenshteinDistance("Hello there! How are you?", "Hlothere!! Hw a reYou?")); //fails, returns 8 errors
And here is my implementation of the Levenshtein Distance algorithm:
int LevenshteinDistance(string left, string right)
{
    if (left == null || right == null)
    {
        return -1;
    }
    if (left.Length == 0)
    {
        return right.Length;
    }
    if (right.Length == 0)
    {
        return left.Length;
    }
    int[,] distance = new int[left.Length + 1, right.Length + 1];
    for (int i = 0; i <= left.Length; i++)
    {
        distance[i, 0] = i;
    }
    for (int j = 0; j <= right.Length; j++)
    {
        distance[0, j] = j;
    }
    for (int i = 1; i <= left.Length; i++)
    {
        for (int j = 1; j <= right.Length; j++)
        {
            if (right[j - 1] == left[i - 1])
            {
                distance[i, j] = distance[i - 1, j - 1];
            }
            else
            {
                distance[i, j] = Min(distance[i - 1, j] + 1, //deletion
                                     distance[i, j - 1] + 1, //insertion
                                     distance[i - 1, j - 1] + 1); //substitution
            }
        }
    }
    return distance[left.Length, right.Length];
}
int Min(int val1, int val2, int val3)
{
    return Math.Min(val1, Math.Min(val2, val3));
}

You need to come up with a scoring system that works for your situation.
I would split the text into a word array at each space.
If a word is found at the same index: +5.
If a word is found within ±1 of the same index: +3 (keep a counter of how far the words have drifted so you can widen the ± correction).
If a needed word is found as part of another word: +2.
And so on. Matching words is hard; coming up with a rules engine that works is 'easier'.

I once implemented an algorithm (which I can't find at the moment; I'll post the code when I find it) that looked at the total number of adjacent character PAIRS in the target string, i.e. "Hello, World!" would have 12 pairs, { "He", "el", "ll",...,"ld", "d!" }.
You then do the same thing on an input string such as "Helo World" so you have { "He",...,"ld" }.
You can then calculate accuracy as a function of correct pairs (i.e. input pairs that are in the list of target pairs) and incorrect pairs (i.e. input pairs that do not exist in the list of target pairs), compared to the total list of target pairs. Over long enough sentences, this measure would be accurate and fair.
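Since the original code was never posted, here is a minimal C# sketch of the pair idea, not the author's implementation. The class name PairScore and its methods are made up for illustration, and the final score uses a Dice-style ratio (2 * correct / (target pairs + input pairs)), which is my assumption about how "a function of correct pairs" could look.

```csharp
using System;
using System.Collections.Generic;

static class PairScore
{
    // Splits text into its adjacent two-character pairs ("Helo" -> "He", "el", "lo").
    public static List<string> Pairs(string text)
    {
        var pairs = new List<string>();
        for (int i = 0; i + 1 < text.Length; i++)
        {
            pairs.Add(text.Substring(i, 2));
        }
        return pairs;
    }

    // Counts input pairs that also occur in the target.
    // Each target pair can be matched only once, so repeats are not over-counted.
    public static int CorrectPairs(string target, string input)
    {
        List<string> remaining = Pairs(target);
        int correct = 0;
        foreach (string pair in Pairs(input))
        {
            if (remaining.Remove(pair))
            {
                correct++;
            }
        }
        return correct;
    }

    // Assumed scoring formula (Dice-style): 1.0 means every pair matches.
    public static double Accuracy(string target, string input)
    {
        int total = Pairs(target).Count + Pairs(input).Count;
        if (total == 0)
        {
            return 1.0; // nothing to compare, treat as a perfect match
        }
        return 2.0 * CorrectPairs(target, input) / total;
    }
}
```

Calling PairScore.Accuracy(masterText, entryText) gives a value between 0 and 1; how that maps to penalty seconds is left open, because the contest rules count errors per word rather than per pair.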

A simple algorithm would be to check letter by letter. If the letters differ, increment the number of errors. If the next pair of letters matches, it's a switched letter, so just continue. If the mistyped letter matches the next letter of the master, it is an omission; treat it accordingly. If the next letter of the entry matches the expected one, it's an insertion; treat it accordingly. Otherwise the person really messed up, so just continue.
This doesn't get everything but with a few modifications this could become comprehensive.
a weak attempt at pseudocode:
edit: new idea. look at comments. I don't know the string functions off the top of my head so you'll have to figure that part out. The algorithm kinda fails for words that repeat a lot though...
string entry; //we'll pretend that this has stuff inside
string master; // this too...
string tempentry = entry; //stuff will be deleted so I need a copy to mess up
int e =0; //word index for entry
int m = 0; //word index for master
int errors = 0;
while(there are words in tempentry) //!tempentry.empty() ?
string mword = the next word in master;
m++;
int eplace = find mword in tempentry; //eplace is the index of where the mword starts in tempentry
if(eplace == -1) //word not there...
continue;
else
errors += m - e;
errors += find number of spaces before eplace
e = m // there is an error
tempentry = stripoff everything between the beginning and the next word// substring?
all words and spaces left in master are considered errors.
There are a couple of bounds-checking errors that need to be fixed here, but it's a good start.

Related

Can't get a pathfinding algorithm to work correctly in all cases

I am having trouble with an assignment about finding the shortest total path in a grid, while visiting all the correct tiles in the correct order.
We are supposed to emulate manually inputting a word, like when using a controller to write something, and find the least amount of commands (up, down, left, right) needed to do so.
Our input is the grid, parameters, and the word we are supposed to work with.
I store them like this (with example inputs):
Height = 2;
Width = 2;
Content = "ABCC";
Word = "ABC";
grid = new char[Height, Width];
Contents = Content.ToCharArray();
Words = Word.ToCharArray();
int ch = 0;
for (int i = 0; i < Height; i++)
{
for (int j = 0; j < Width; j++)
{
if (ch < Contents.Length)
{
grid[i, j] = Contents[ch];
ch++;
}
}
}
The actual way I compute the shortest path is like so:
public void GridSearch( int a, int FirstX, int FirstY, int PathLength)
{
int NewPath;
int NewX;
int NewY;
int SecondX = 0;
int SecondY = 0;
for (int i = 0; i < Height; i++)
{
for (int j = 0; j < Width; j++)
{
if (grid[i, j] == Words[a])
{
SecondX = i;
SecondY = j;
NewPath = PathLength;
NewPath += Math.Abs(FirstX - SecondX);
NewPath += Math.Abs(FirstY - SecondY);
NewX = SecondX;
NewY = SecondY;
if (a < Words.Length-1)
{
GridSearch(a+1, NewX, NewY, NewPath);
}
else
{
if (FinalPath > NewPath ^ FinalPath == -1)
{
FinalPath = NewPath;
}
}
}
}
}
}
We are also supposed to "click" when on the correct tile, so I am adding the length of "Words" to the total of commands.
In this case, the shortest path between the letters would be 2 (right, down) and the length is 3, so 5 is the correct answer.
This is also what my program gets, however when I try to send it in, the automated checker says it only passes 1 out of 5 tests, which is an improvement over the 0 that I had until recently, but still not actually good.
Sadly it does not say which inputs it used to make my program fail, and after a day of trying things I am out of ideas on how to fix it, could anyone please point out the, no doubt, obvious mistake I am making and help me fix this program?
EDIT: The assignment instructions as written (since a commenter asked for them):
Some devices allow text entry using a grid of letters. The grid
contains a movable cursor, which begins in the upper-left-hand corner.
Arrow keys move the cursor up, down, left and right and Enter key
chooses the letter under cursor.
For example, if the input grid looks like this:
ABCDEFGH
IJKLMNOP
QRSTUVWX
YZ
we can enter the text "HELLO" with the following sequence of keys
(which is only one of many possible sequences):
right
right
right
right
right
right
right
Enter
left
left
left
Enter
down
left
Enter
Enter
right
right
right
Enter
Write a program that for a given grid (which may contain both
lowercase and uppercase letters) and text (which may also contain
non-alphabetic characters) determines and writes out the minimum
number of keystrokes required to enter the given text.
Caution: Each letter may appear more than once in the grid!
The input begins with numbers indicating the width and height of the
grid (each on its own line).
A single line follows containing the contents of the entire grid (in
row-major order, i.e. with one row after another).
The rest of the lines contain the text to be entered. ! You should
ignore any characters in the text that are not present in the grid.
Example:
Input:
3
3
ABCBFECDF
ABCDEFA
Output:
15
In this example, the grid has the form
ABC
BFE
CDF
It is possible to enter the text ABCDEFA in many possible ways; 15
keystrokes is the length of the shortest of these.
Revising my previous answer, it is likely that you have not counted the "enter" keystroke. I.e. you should add one to the candidate path length for each letter:
...
NewY = SecondY;
NewPath++; // added line: count the Enter keystroke for this letter
if (a < Words.Length - 1)
...
This gives a correct length of 15 keypresses on your example set of "ABCBFECDF" / "ABCDEFA".
Note that this type of code greatly benefits from a type that represents a pair of x/y coordinate, like a Point or Vector2i, so you don't have to repeat a bunch of calculations for both x and y coordinates. I would also recommend following common coding conventions like
declare local variables in the smallest possible scope, not at the top of the method
Use "camelCasing" for local variables
Prefer pure methods whenever possible, i.e.
I would still recommend reading up on Dijkstra's algorithm or A*, since these are more generally applicable and more efficient.
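For comparison, here is a hedged sketch of a different approach than the recursive search in the question: a small dynamic program that, for each letter of the text, keeps the cheapest cost of ending on every grid cell holding that letter. GridTyping and MinKeystrokes are hypothetical names. The sketch assumes the cursor starts at row 0, column 0, that the grid string is row-major, and that characters missing from the grid are skipped, as the assignment text says.

```csharp
using System;
using System.Collections.Generic;

static class GridTyping
{
    // Returns the minimum number of keystrokes: arrow moves plus one Enter per typed letter.
    public static int MinKeystrokes(int width, string content, string text)
    {
        // Cells the cursor may be on after the previous letter, with the cheapest cost so far.
        var previous = new List<(int Row, int Col, int Cost)> { (0, 0, 0) };

        foreach (char letter in text)
        {
            var current = new List<(int Row, int Col, int Cost)>();
            for (int index = 0; index < content.Length; index++)
            {
                if (content[index] != letter)
                {
                    continue;
                }
                int row = index / width;
                int col = index % width;
                int best = int.MaxValue;
                foreach (var p in previous)
                {
                    // Manhattan distance for the arrow keys, +1 for Enter.
                    int cost = p.Cost + Math.Abs(p.Row - row) + Math.Abs(p.Col - col) + 1;
                    best = Math.Min(best, cost);
                }
                current.Add((row, col, best));
            }
            // A letter that is not in the grid leaves the cursor candidates unchanged.
            if (current.Count > 0)
            {
                previous = current;
            }
        }

        int result = int.MaxValue;
        foreach (var p in previous)
        {
            result = Math.Min(result, p.Cost);
        }
        return result;
    }
}
```

This visits each pair of candidate cells once per letter instead of exploring every full path, so it stays fast when letters repeat many times in the grid. Whether the missing Enter count, the ignored characters, or the running time caused the failed tests cannot be told from the checker output quoted in the question.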

Writing the letter O in C#, consisting of "#"

I am new here and also new to C# in general. I have as a task to output the letter O. Furthermore, the program should read a number that means the corresponding height.
The output should look like the given photo illustrates.
If the number is now larger, the object should be adjusted in width accordingly. And this is now my problem. I already have a code block, which outputs this above constellation, but is not customizable.
I have already tried using WriteLine and if statements to scale a width with spaces, but that doesn't work the way I want it to.
In addition, the convert is still missing at the beginning, which introduces the entered number.
So what I want:
you should enter a number that defines the height.
the output should be adjusted to the height and scaled logically
no more lines should be added that contain the #, it should only be the total of 8 "#" (stretched, you could say)
My code so far is:
class Program
{
static void Main(string[] args)
{
for (int i = 0; i <= 7; i++)
if (i == 0)
Console.WriteLine(" #");
for (int j = 1; j <= 7; j++)
if (j == 1)
Console.WriteLine(" # #");
for (int k = 2; k <= 7; k++)
if (k == 2)
Console.WriteLine("# #");
for (int m = 3; m <= 7; m++)
if (m == 3)
Console.WriteLine(" # #");
for (int m = 4; m <= 7; m++)
if (m == 4)
Console.WriteLine(" #");
Console.ReadKey();
}
}
Have a close look at the diamond (the letter O).
The first and last line contain only one #. On these lines you need a loop to print the variable number of spaces before.
___#
Between these two lines you have a variable number of lines. You need a loop to produce these lines. Or two loops. One to produce the ^ shape
__#_#
_#___#
#_____#
and another to produce the v shape.
_#___#
__#_#
Nested inside these loops you need two more loops. One which prints the spaces before and one which prints the spaces between the first # and the second #.
Note that if the total number of lines is odd, the middle line is the longest and the ^ and v parts do not have the same size. If the number is even, then you will have two longest lines and two parts having the same height.
and the last
___#
With Console.Write you can write characters on the same line. With Console.WriteLine you end the line, and the next write will write on the next line.
Well, and then you need some basic arithmetic to calculate the number of spaces depending on the total height and the current line number.
Note that the greatest effort goes into dissecting this shape into its basic components and analyzing its geometry. Once you have this, the coding itself is rather simple.
Since this looks like homework, I'll let you figure out the details and do the coding part.
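For readers who want to check their own attempt afterwards, here is one possible sketch of the arithmetic described above. It assumes the 7-line shape drawn in this answer (the picture from the question is not available), and LetterO and BuildLines are hypothetical names. It builds the lines as strings rather than using nested Console.Write loops.

```csharp
using System;

static class LetterO
{
    // Builds the diamond-shaped "O" with the given number of lines.
    public static string[] BuildLines(int height)
    {
        var lines = new string[height];
        int half = (height - 1) / 2; // index of the (first) longest line

        for (int row = 0; row < height; row++)
        {
            // Distance of this row from the nearest tip (top or bottom line).
            int k = Math.Min(row, height - 1 - row);
            string leading = new string(' ', half - k);

            lines[row] = k == 0
                ? leading + "#"                                     // tip: a single #
                : leading + "#" + new string(' ', 2 * k - 1) + "#"; // two # with a gap
        }
        return lines;
    }
}
```

Reading the height with int.Parse(Console.ReadLine()) and passing each returned line to Console.WriteLine prints the shape. An even height produces two equally long middle lines, as noted above.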

Caesar Cipher (C#)

I would like to preface this by saying it is for an assessment, so I don't want you to directly give me the answer, I would like you to point me in the right direction, slightly tweak what I have done or just tell me what I should look into doing.
I am trying to create a Caesar Cipher to decipher a text document we have been given. It needs to print all the possible shifts in the console and output the final one to a text document. I'm thinking of trying to add frequency analysis to this later to find the correct one, but I have some problems now that I need to sort out before I can do that.
This is what I have made so far:
using System;
using System.IO;
class cipher
{
public static void Main(string[] args)
{
string output = "";
int shift;
bool userright = false;
string cipher = File.ReadAllText("decryptme.txt");
char[] decr = cipher.ToCharArray();
do {
Console.WriteLine("How many times would you like to shift? (Between 0 and 26)");
shift = Convert.ToInt32(Console.ReadLine());
if (shift > 26) {
Console.WriteLine("Over the limit");
userright = false;
}
if (shift < 0) {
Console.WriteLine("Under the limit");
userright = false;
}
if (shift <= 26 && shift >= 0)
{
userright = true;
}
} while (userright == false);
for (int i = 0; i < decr.Length; i++)
{
{
char character = decr[i];
character = (char)(character + shift);
if (character == '\'' || character == ' ')
continue;
if (character > 'Z')
character = (char)(character - 26);
else if (character < 'A')
character = (char)(character + 26);
output = output + character;
}
Console.WriteLine("\nShift {0} \n {1}", i + 1, output);
}
StreamWriter file = new StreamWriter("decryptedtext.txt");
file.WriteLine(output);
file.Close();
}
}
Right now it compiles and reads the document, but when it runs in the console it prints shift 1 as 1 letter from the encoded text, shift 2 as 2 letters from it, etc.
I have no idea what I have done wrong and any help would be greatly appreciated. I have also started thinking about ASCII values for letters but have no idea how to implement this.
Once again, please don't just give me the answer or I will not have learned anything from this - and I have been trying to crack this myself but had no luck.
Thanks.
Break the problem down into smaller bite-sized chunks. Start by printing a single shifted line, say with a shift of 1.
When you have that part working correctly (and only then) extend your code to print 26 lines with shifts of 0, 1, 2, 3, ... 26. I am not sure if your instructor wants either or both of shift 0 at the start and shift 26 at the end. You will need to ask.
Again, get that working correctly, and write new code to analyse one line only, and give it some sort of score. Get that working properly.
Now calculate the scores for all the lines and pick out the line with the best score. That should be the right answer. If it isn't then you will need to check your scoring method.
Writing small incremental changes to a very simple starting program is usually a lot easier than trying to go straight from a blank screen to the full, complex, program. Add the complexity gradually, one piece at a time, testing as you go.
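As an illustration of what that first small step might look like, here is a sketch of a helper that shifts a single line. CaesarStep and ShiftLine are hypothetical names. The sketch assumes the text uses uppercase A to Z only, as the comparisons in the question suggest, and it leaves every other character untouched.

```csharp
using System;

static class CaesarStep
{
    // Shifts every uppercase letter by the given amount, wrapping around the alphabet.
    public static string ShiftLine(string line, int shift)
    {
        var result = new char[line.Length];
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c >= 'A' && c <= 'Z')
            {
                // Work with 0..25 so the wrap-around is one modulo operation;
                // adding 26 keeps the value non-negative for negative shifts.
                int offset = (c - 'A' + shift % 26 + 26) % 26;
                result[i] = (char)('A' + offset);
            }
            else
            {
                result[i] = c; // spaces, punctuation and other characters stay as they are
            }
        }
        return new string(result);
    }
}
```

Once one line shifts correctly, calling this in a loop over the shift values is the second step described above.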

What would be the shortest way to sum up the digits in odd and even places separately

I've always loved reducing number of code lines by using simple but smart math approaches. This situation seems to be one of those that need this approach. So what I basically need is to sum up digits in the odd and even places separately with minimum code. So far this is the best way I have been able to think of:
string number = "123456789";
int sumOfDigitsInOddPlaces=0;
int sumOfDigitsInEvenPlaces=0;
for (int i=0;i<number.length;i++){
if(i%2==0)//Means odd ones
sumOfDigitsInOddPlaces+=number[i];
else
sumOfDigitsInEvenPlaces+=number[i];
}
//The rest is not important
Do you have a better idea? Something without needing to use if else
int* sum[2] = {&sumOfDigitsInOddPlaces,&sumOfDigitsInEvenPlaces};
for (int i=0;i<number.length;i++)
{
*(sum[i&1])+=number[i];
}
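The same index trick can be written in C# without pointers by using a two-element array. This is a sketch under the assumption that the string contains only the ASCII digits 0 to 9; DigitSums and SumByIndexParity are made-up names. Note that it subtracts '0', because adding number[i] directly would add the character code rather than the digit value.

```csharp
using System;

static class DigitSums
{
    // sums[0] holds the digits at even 0-based indexes, sums[1] those at odd indexes.
    public static int[] SumByIndexParity(string number)
    {
        var sums = new int[2];
        for (int i = 0; i < number.Length; i++)
        {
            // i & 1 is 0 for even indexes and 1 for odd ones, so no if/else is needed.
            sums[i & 1] += number[i] - '0';
        }
        return sums;
    }
}
```

Whether sums[0] counts as the "odd places" or the "even places" depends on whether you number the digits from 1 or from 0.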
You could use two separate loops, one for the odd indexed digits and one for the even indexed digits.
Also your modulus conditional may be wrong, you're placing the even indexed digits (0,2,4...) in the odd accumulator. Could just be that you're considering the number to be 1-based indexing with the number array being 0-based (maybe what you intended), but for algorithms sake I will consider the number to be 0-based.
Here's my proposition
number = 123456789;
sumOfDigitsInOddPlaces=0;
sumOfDigitsInEvenPlaces=0;
//even digits
for (int i = 0; i < number.length; i = i + 2){
sumOfDigitsInEvenPlaces += number[i];
}
//odd digits, note the start at j = 1
for (int j = 1; j < number.length; j = j + 2){
sumOfDigitsInOddPlaces += number[j];
}
On the large scale this doesn't improve efficiency, still an O(N) algorithm, but it eliminates the branching
Since you added C# to the question:
// requires: using System.Linq;
var numString = "123456789";
// Split() would return the whole string as a single element, so convert each char to its digit instead
var digits = numString.Select(c => c - '0').ToArray();
var sumOfOdds = digits.Where((v, i) => i % 2 == 1).Sum();
var sumOfEvens = digits.Where((v, i) => i % 2 == 0).Sum();
Do you like Python?
num_string = "123456789"
odds = sum(map(int, num_string[::2]))
evens = sum(map(int, num_string[1::2]))
This Java solution requires no if/else, has no code duplication and is O(N):
number = "123456789";
int[] sums = new int[2]; //sums[0] == sum of even digits, sums[1] == sum of odd
for(int arrayIndex=0; arrayIndex < 2; ++arrayIndex)
{
for (int i=0; i < number.length()-arrayIndex; i += 2)
{
sums[arrayIndex] += Character.getNumericValue(number.charAt(i+arrayIndex));
}
}
Assuming number.length is even, it is quite simple. The only corner case is the last element when number.length is odd.
int i=0;
while(i<number.length-1)
{
sumOfDigitsInEvenPlaces += number[ i++ ];
sumOfDigitsInOddPlaces += number[ i++ ];
}
if( i < number.length )
sumOfDigitsInEvenPlaces += number[ i ];
Because the loop advances i two at a time, subtracting 1 from number.length changes nothing when the length is even.
If number.length is odd, it excludes the last item from the loop.
If number.length is odd, the value of i on exiting the loop is the index of the last, not yet visited element.
If number.length is odd, that last element sits at an even index, so it has to be added to sumOfDigitsInEvenPlaces.
This seems slightly more verbose, but also more readable, to me than Anonymous' (accepted) answer. However, benchmarks to come.
Well, the compiler seems to find my code more understandable as well, since it removes it all if I don't print the results (which explains why I kept getting a time of 0 all along...). The other code, though, is obfuscated enough to defeat even the compiler.
In the end, even with huge arrays, it's pretty hard for clock_t to tell the difference between the two. You get about a third less instructions in the second case, but since everything's in cache (and your running sums even in registers) it doesn't matter much.
For the curious, I've put the disassembly of both versions (compiled from C) here : http://pastebin.com/2fciLEMw

Text Justification

I am looking for a C# function or routine that will center-justify text.
For example, I have noticed that when a sentence is justified to the edges of the screen, extra spaces are placed in the line. The inserted spaces start in the center and move out from there on both sides as needed.
Is there a C# function that I can pass my string, say 50 chars, and get back a pretty 56 char string?
Thanks in advance,
Rob
Nice task. Here's a solution based on Linq extension methods. If you do not wish to use them, see history for previous version of code. In this example spaces on the left and right sides from center are 'equal' with respect to order of inserting.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
public static String Justify(String s, Int32 count)
{
if (count <= 0)
return s;
Int32 middle = s.Length / 2;
IDictionary<Int32, Int32> spaceOffsetsToParts = new Dictionary<Int32, Int32>();
String[] parts = s.Split(' ');
for (Int32 partIndex = 0, offset = 0; partIndex < parts.Length; partIndex++)
{
spaceOffsetsToParts.Add(offset, partIndex);
offset += parts[partIndex].Length + 1; // +1 to count space that was removed by Split
}
foreach (var pair in spaceOffsetsToParts.OrderBy(entry => Math.Abs(middle - entry.Key)))
{
count--;
if (count < 0)
break;
parts[pair.Value] += ' ';
}
return String.Join(" ", parts);
}
static void Main(String[] args) {
String s = "skvb sdkvkd s kc wdkck sdkd sdkje sksdjs skd";
String j = Justify(s, 5);
Console.WriteLine("Initial: " + s);
Console.WriteLine("Result: " + j);
Console.ReadKey();
}
}
As far as I know, there is no "built-in" C# or .net library function for this, so you'd have to implement something on your own (or find some code online whose license suits your needs).
A simple greedy algorithm shouldn't be too difficult to implement, however:
Until the required number of characters is reached:
Extend the shortest sequence of spaces by one
(choose one randomly if there is more than one such sequence).
I'd distribute the spaces randomly rather than starting at the center, to make sure the spaces are evenly distributed among your text (rather than concentrated at one position). Oh, and keep in mind that some people consider fully-justified fixed-font text harder to read than left-justified fixed-font text.
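The greedy steps above could be sketched roughly as follows. GreedyJustify is a hypothetical name, the Random instance is passed in so that runs can be made repeatable, and the sketch assumes words are separated by plain spaces (runs of several spaces in the input are collapsed to one before stretching).

```csharp
using System;
using System.Collections.Generic;

static class GreedyJustify
{
    // Pads the gaps between words until the text reaches targetLength characters.
    public static string Justify(string text, int targetLength, Random random)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length < 2)
        {
            return text; // there is no gap that could be stretched
        }

        // gaps[i] is the number of spaces between words[i] and words[i + 1].
        var gaps = new int[words.Length - 1];
        int length = gaps.Length;
        for (int i = 0; i < gaps.Length; i++)
        {
            gaps[i] = 1;
        }
        foreach (string word in words)
        {
            length += word.Length;
        }

        while (length < targetLength)
        {
            // Find the shortest gap length, then pick one such gap at random.
            int shortest = int.MaxValue;
            foreach (int gap in gaps)
            {
                shortest = Math.Min(shortest, gap);
            }
            var candidates = new List<int>();
            for (int i = 0; i < gaps.Length; i++)
            {
                if (gaps[i] == shortest)
                {
                    candidates.Add(i);
                }
            }
            gaps[candidates[random.Next(candidates.Count)]]++;
            length++;
        }

        string result = words[0];
        for (int i = 0; i < gaps.Length; i++)
        {
            result += new string(' ', gaps[i]) + words[i + 1];
        }
        return result;
    }
}
```

Because the shortest gap is always extended first, no gap ends up more than one space wider than any other. Text that is already at least targetLength long is returned without extra padding.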
