Searching for a phrase - c#

I've got an array of words, which are taken from a sentence (each word in the sentence is placed in an array).
A user can search for a phrase to see whether it occurs in this sentence. This is determined through the characters' offset values: each word of the phrase is first checked separately to see whether it exists, and then a check is carried out to see whether the words follow each other (separated by a space in the sentence).
The words are stored in a tree, so the offset values (character positions) are the only thing that determines which word comes after which (separated by a space).
My problem is that words which are the same (and already stored in the tree) share one node, so each word stores a data structure of all the offset values attributed to that specific word. This is the code I've got so far, which works perfectly except that it fails in the following case:
For example, I've got this sentence: this is a test to see if this is working.
If I search for 'this is a', then a standalone 'this is' is returned as well as 'this is a'.
Here's the code:
for (int i = 0; i < offsets.Count - 1; i++)
{
LinkedList<int> current = allOffsets[i];
LinkedList<int> next = allOffsets[i + 1];
for (int j = 0; j < current.Count; j++)
{
for (int k = 0; k < next.Count; k++)
{
if (current.ElementAt(j) + words[i].Length - 1 + 2 == next.ElementAt(k))
{
if (!finalResult.Contains(current.ElementAt(j)))
{
finalResult.Add(current.ElementAt(j));
}
if (!finalResult.Contains(next.ElementAt(k)))
{
finalResult.Add(next.ElementAt(k));
}
}
}
}
}
return finalResult;
Please note that finalResult is a list which stores all the 'valid' offsets, and offsets stores all the offsets in the tree. words is an array which contains all the words after they're split from the sentence.
EDIT: Also please note that I'm checking whether the words follow each other by adding 2 to the offset of the last letter of a word (one to step over the space, one more to reach the next character). If the next word follows, this equals the offset of its first letter.

var source = new List<string>() { "this", "is", "a", "test", "to", "see", "if", "this", "is", "working" };
var userInput = "this is a";
var inputList = userInput.Split(' ');
var inputListCount = inputList.Length;
var exists = false;
// Only try start positions where the whole phrase still fits in the sentence
for (int i = 0; i + inputListCount <= source.Count; i++)
{
    if (inputList[0] == source[i])
    {
        var found = true;
        for (int j = 1; j < inputListCount; j++)
        {
            // Compare with the word j positions after the match, not with source[j]
            if (inputList[j] != source[i + j])
            {
                found = false;
                break;
            }
        }
        if (found)
        {
            exists = true;
            break;
        }
    }
}
Console.WriteLine(exists);
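If you want to keep working with the offsets from the question, here is a minimal sketch of a different check. It assumes each word's offsets can be collected into a set; the `IndexOffsets` helper below is hypothetical and only stands in for the tree, and it assumes words are separated by single spaces (as the question's +2 check does). Instead of accepting any two adjacent words, it starts from each offset of the first phrase word and requires every following word to begin exactly `length + 1` characters later. A start offset is kept only when the whole chain holds, which is what rules out the partial 'this is' match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseOffsets
{
    // Returns the start offset of every complete occurrence of the phrase.
    // offsetsPerWord[i] holds every character offset at which phraseWords[i]
    // occurs in the sentence (assumed input shape, e.g. read from the tree).
    public static List<int> FindPhraseStarts(string[] phraseWords, IList<ISet<int>> offsetsPerWord)
    {
        var starts = new List<int>();
        if (phraseWords.Length == 0) return starts;

        foreach (int start in offsetsPerWord[0].OrderBy(o => o))
        {
            int expected = start;
            bool complete = true;
            for (int i = 0; i < phraseWords.Length; i++)
            {
                if (!offsetsPerWord[i].Contains(expected))
                {
                    complete = false;
                    break;
                }
                // The next word must begin after this word plus one separating space
                expected += phraseWords[i].Length + 1;
            }
            if (complete) starts.Add(start);
        }
        return starts;
    }

    // Hypothetical helper: builds a word -> offsets index for a sentence
    // whose words are separated by single spaces.
    public static Dictionary<string, ISet<int>> IndexOffsets(string sentence)
    {
        var index = new Dictionary<string, ISet<int>>();
        int offset = 0;
        foreach (string word in sentence.Split(' '))
        {
            if (!index.TryGetValue(word, out var set))
            {
                set = new HashSet<int>();
                index[word] = set;
            }
            set.Add(offset);
            offset += word.Length + 1;
        }
        return index;
    }
}
```

For "this is a test to see if this is working", searching "this is a" should yield only offset 0, while "this is" yields offsets 0 and 25.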


Text from file to the end of 2d char array

I'm trying to solve the problem, but I just can't find the answer.
I need to read a names.txt file containing 5 words. After that, I need to convert them into chars and put them on the left side of the matrix, at the bottom (see the picture below). The other empty cells need to be filled with the symbol "+".
I've tried many variations, but it doesn't display correctly.
Please help!
String txtFromFile = File.ReadAllText(@"C:\Users\source\names.txt");
Console.WriteLine("Words from file:\n{0}", txtFromFile);
int rows = 10;
int column = 10;
char[,] charArray = new char[rows, column];
for (int a = 0; a < rows; a++)
{
for (int b = 0; b < column; b++)
{
charArray[a, b] = '+';
Console.Write(string.Format("{0} ", charArray[a, b]));
}
Console.Write(Environment.NewLine + Environment.NewLine);
}
If you are inexperienced with LINQ, here is a solution without it (apart from ToArray, which needs System.Linq).
int rows = 10;
int column = 10;
string emptyLine = new string('+', column); //create a line containing only +
string[] lines = File.ReadLines(@"C:\Users\source\names.txt").ToArray(); //read all lines and store them in a string array
//print the lines with only + first, so the words end up at the bottom
for (int row = 0; row < rows - lines.Length; row++)
{
    Console.WriteLine(emptyLine);
}
//pad each read line with + and print it
for (int lineCount = 0; lineCount < lines.Length; lineCount++)
{
    lines[lineCount] = lines[lineCount].PadRight(column, '+'); //pad the line and store it back in the array
    Console.WriteLine(lines[lineCount]);
}
This solution uses strings instead of char[]. However, if you need the array, you can get it from the read lines collection with
char[] charArray = lines[i].ToCharArray();
for an arbitrary index i in the read lines collection.
You can do it in a single LINQ statement:
using System.Linq;
...
//Read all lines instead of reading the whole input as one text.
//Note: expects each word to be stored on its own line.
string[] txtFromFile = File.ReadAllLines(@"C:\Users\source\names.txt");
var result = Enumerable.Range(0, 10) //Iterate for 10 lines
.Select(x => x < 5 // Check for line number
? new string('+', 10) //If line is from 0..4, then print ++++++++++
: txtFromFile[x-5].PadRight(10, '+') //else print word then pad it with ++
);
//Print the result
Console.WriteLine(string.Join(Environment.NewLine, result));
output:
++++++++++
++++++++++
++++++++++
++++++++++
++++++++++
DOG+++++++
SHEEP+++++
CHIMPANZEE
BREAVER+++
LION++++++
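The answers above print strings rather than filling the `char[,]` the question asks for. Here is a minimal sketch that fills a `rows x columns` char array with '+' and then writes the words left-aligned into the bottom rows. The class and method names are hypothetical, and cutting off words longer than a row is an assumption.

```csharp
using System;
using System.Text;

public static class BottomLeftGrid
{
    // Builds a rows x columns grid filled with '+', then writes each word
    // left-aligned into the bottom rows (the first word goes into the first of those rows).
    public static char[,] Build(string[] words, int rows, int columns)
    {
        if (words.Length > rows)
            throw new ArgumentException("More words than rows.");

        var grid = new char[rows, columns];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < columns; c++)
                grid[r, c] = '+';

        int firstRow = rows - words.Length;
        for (int w = 0; w < words.Length; w++)
        {
            // Characters beyond the last column are dropped (assumption)
            int len = Math.Min(words[w].Length, columns);
            for (int c = 0; c < len; c++)
                grid[firstRow + w, c] = words[w][c];
        }
        return grid;
    }

    // Turns the grid back into one line of text per row.
    public static string Render(char[,] grid)
    {
        var sb = new StringBuilder();
        for (int r = 0; r < grid.GetLength(0); r++)
        {
            for (int c = 0; c < grid.GetLength(1); c++)
                sb.Append(grid[r, c]);
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

Usage: `Console.Write(BottomLeftGrid.Render(BottomLeftGrid.Build(File.ReadAllLines(@"C:\Users\source\names.txt"), 10, 10)));`. With the five sample words this should print the same 10 lines as the output above.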

Appending to StringBuilder while there is nothing left

I have the task of rearranging the words in a sentence backwards, but I am only able to do it for the first letter (the last letter of each word). Example: Fun exam right. What I have so far:
var sentance = Console.ReadLine().Split(' ');
var rearrangedSentence = new StringBuilder();
for (int i = 0; i < sentance.Length; i++)
{
rearrangedSentence.Append(sentance[i].Last()); //this gives me "nmt"
}
My question is how to make this loop repeat until there is nothing left.
Any help will be greatly appreciated :)
EDIT: To clarify the question:
If I have the sentence "Fun exam right", the result should be "nmtuahFxgeir". We first take the last character of each word and append them, which gives "nmt"; then we take the next characters from the end and append them, giving "nmtuah", and so on.
When you use sentance[i].Last(), you only pick up the last character of each word.
EDIT: As per your updated requirements, you can use this code.
//Get the sentence array
var sentence = Console.ReadLine().Split(' ');
var rearrangedSentence = new StringBuilder();
//Get the length of the longest word in the array (needs System.Linq)
int loopLength = sentence.Max(w => w.Length);
// x is how far from the end of each word we read: 0 = last letter, 1 = second to last, ...
for (int x = 0; x < loopLength; x++)
{
    // Pick up one character from every word on each pass
    for (var j = 0; j < sentence.Length; j++)
    {
        // Position of the character to pick up in this word
        int val = sentence[j].Length - (x + 1);
        // Skip words whose characters are already used up
        if (val >= 0)
        {
            // Pick the character and append it to the StringBuilder
            rearrangedSentence.Append(sentence[j][val]);
        }
    }
}
Console.WriteLine(rearrangedSentence.ToString());
Console.ReadLine();

C# can't understand count

I am trying to count words in this program, but I don't understand why it counts one word fewer than it should.
For example:
sun is hot
the program tells me that there are only 2 words.
Console.WriteLine("enter your text here");
string text = Convert.ToString(Console.ReadLine());
int count = 0;
text = text.Trim();
for (int i = 0; i < text.Length - 1; i++)
{
if (text[i] == 32)
{
if (text[i + 1] != 32)
{
count++;
}
}
}
Console.WriteLine(count);
A regular expression works best for this (needs using System.Text.RegularExpressions).
var str = "this,is:my test string!with(difffent?.seperators";
int count = Regex.Matches(str, @"[\w]+").Count;
The result is 8.
This counts every run of word characters (letters, digits and underscores), ignoring spaces and other special characters, and repeated words are counted each time they occur.
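The loop in the question counts the positions where a space is followed by a non-space character, which is the number of gaps between words, so N words give N - 1. Here is a minimal sketch of a loop-based fix that counts where each word starts instead. Like the original `== 32` check, it treats only the space character as a separator; `WordCounter` is a hypothetical name.

```csharp
using System;

public static class WordCounter
{
    // Counts words in space-separated text: a word is a run of non-space
    // characters, so we count every position where such a run begins.
    public static int CountWords(string text)
    {
        int count = 0;
        bool inWord = false;
        foreach (char ch in text)
        {
            if (ch == ' ')
            {
                inWord = false;   // a space ends the current word
            }
            else if (!inWord)
            {
                inWord = true;    // first character of a new word
                count++;
            }
        }
        return count;
    }
}
```

Because it counts word starts rather than gaps, leading, trailing and repeated spaces need no special handling, so the Trim call is no longer required.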

Find all possible combinations of word with and without hyphens

For a string that may have zero or more hyphens in it, I need to extract all the different possibilities with and without hyphens.
For example, the string "A-B" would result in "A-B" and "AB" (two possibilities).
The string "A-B-C" would result in "A-B-C", "AB-C", "A-BC" and "ABC" (four possibilities).
The string "A-B-C-D" would result in "A-B-C-D", "AB-C-D", "A-BC-D", "A-B-CD", "AB-CD", "ABC-D", "A-BCD" and "ABCD" (eight possibilities).
...etc, etc.
I've experimented with some nested loops but haven't been able to get anywhere near the desired result. I suspect I need something recursive unless there is some simple solution I am overlooking.
NB. This is to build a SQL query (shame that SQL Server doesn't have MySQL's REGEXP pattern matching).
Here is one attempt I was working on. This might work if I do this recursively.
string keyword = "A-B-C-D";
List<int> hyphens = new List<int>();
int pos = keyword.IndexOf('-');
while (pos != -1)
{
hyphens.Add(pos);
pos = keyword.IndexOf('-', pos + 1);
}
for (int i = 0; i < hyphens.Count(); i++)
{
string result = keyword.Substring(0, hyphens[i]) + keyword.Substring(hyphens[i] + 1);
Response.Write("<p>" + result);
}
A, B, C and D stand for words of varying length.
Take a look at your sample cases. Have you noticed a pattern?
With 1 hyphen there are 2 possibilities.
With 2 hyphens there are 4 possibilities.
With 3 hyphens there are 8 possibilities.
The number of possibilities is 2^n, where n is the number of hyphens.
This is literally exponential growth, so if there are too many hyphens in the string, it will quickly become infeasible to print them all. (With just 30 hyphens there are over a billion combinations!)
That said, for smaller numbers of hyphens it might be interesting to generate a list. To do this, you can think of each hyphen as a bit in a binary number. If the bit is 1, the hyphen is present, otherwise it is not. So this suggests a fairly straightforward solution:
Split the original string on the hyphens
Let n = the number of hyphens
Count from 2^n - 1 down to 0. Treat this counter as a bitmask.
For each count begin building a string starting with the first part.
Concatenate each of the remaining parts to the string in order, preceded by a hyphen only if the corresponding bit in the bitmask is set.
Add the resulting string to the output and continue until the counter is exhausted.
Translated to code we have:
public static IEnumerable<string> EnumerateHyphenatedStrings(string s)
{
string[] parts = s.Split('-');
int n = parts.Length - 1;
if (n > 30) throw new Exception("too many hyphens");
for (int m = (1 << n) - 1; m >= 0; m--)
{
StringBuilder sb = new StringBuilder(parts[0]);
for (int i = 1; i <= n; i++)
{
if ((m & (1 << (i - 1))) > 0) sb.Append('-');
sb.Append(parts[i]);
}
yield return sb.ToString();
}
}
Fiddle: https://dotnetfiddle.net/ne3N8f
You should be able to track each hyphen position and say it's either there or not there. Loop through all the combinations, and you've got all your strings. I found the easiest way to track this was with a binary number, since Convert.ToString(j, 2) gives you the binary digits of a counter directly.
I came up with this:
string keyword = "A-B-C-D";
string[] keywordSplit = keyword.Split('-');
int combinations = Convert.ToInt32(Math.Pow(2.0, keywordSplit.Length - 1.0));
List<string> results = new List<string>();
for (int j = 0; j < combinations; j++)
{
string result = "";
string hyphenAdded = Convert.ToString(j, 2).PadLeft(keywordSplit.Length - 1, '0');
// Generate string
for (int i = 0; i < keywordSplit.Length; i++)
{
result += keywordSplit[i] +
((i < keywordSplit.Length - 1) && (hyphenAdded[i].Equals('1')) ? "-" : "");
}
results.Add(result);
}
This works for me:
Func<IEnumerable<string>, IEnumerable<string>> expand = null;
expand = xs =>
{
if (xs != null && xs.Any())
{
var head = xs.First();
if (xs.Skip(1).Any())
{
return expand(xs.Skip(1)).SelectMany(tail => new []
{
head + tail,
head + "-" + tail
});
}
else
{
return new [] { head };
}
}
else
{
return Enumerable.Empty<string>();
}
};
var keyword = "A-B-C-D";
var parts = keyword.Split('-');
var results = expand(parts);
I get:
ABCD
A-BCD
AB-CD
A-B-CD
ABC-D
A-BC-D
AB-C-D
A-B-C-D
I stored the strings in a List<string>. Note that this code only inserts hyphens as one contiguous run, so for "A-B-C-D" it produces 7 of the 8 strings listed in the question (A-BC-D is missing).
string str = "AB-C-D-EF-G-HI";
string[] splitted = str.Split('-');
List<string> finalList = new List<string>();
string temp = "";
for (int i = 0; i < splitted.Length; i++)
{
temp += splitted[i];
}
finalList.Add(temp);
temp = "";
for (int diff = 0; diff < splitted.Length-1; diff++)
{
for (int start = 1, limit = start + diff; limit < splitted.Length; start++, limit++)
{
int i = 0;
while (i < start)
{
temp += splitted[i++];
}
while (i <= limit)
{
temp += "-";
temp += splitted[i++];
}
while (i < splitted.Length)
{
temp += splitted[i++];
}
finalList.Add(temp);
temp = "";
}
}
I'm not sure your question is entirely well defined (e.g. could you have something like A-BCD-EF-G-H?). For "fully" hyphenated strings with single-character parts (A-B-C-D-...-Z), something like this should do:
string toParse = "A-B-C-D";
char[] toParseChars = toParse.ToCharArray();
int hyphenCount = toParse.Length / 2; // assumes single-character parts: "A-B-C-D" has 3 hyphens
for (int i = 0; i < (int)Math.Pow(2, hyphenCount); i++) { // Number of subsets of an n-elt set is 2^n
    string binary = Convert.ToString(i, 2).PadLeft(hyphenCount, '0');
    char[] binChars = binary.ToCharArray();
    string result = ""; // start a fresh string for every binary word
    for (int k = 0; k < binChars.Length; k++) {
        result += toParseChars[k*2].ToString();
        if (binChars[k] == '1') {
            result += "-";
        }
    }
    result += toParseChars[toParseChars.Length-1];
    Console.WriteLine(result);
}
The idea here is that we want to create a binary word for each possible hyphen. So, if we have A-B-C-D (three hyphens), we create binary words 000, 001, 010, 011, 100, 101, 110, and 111. Note that if we have n hyphens, we need 2^n binary words.
Then each word maps to the output you desire by inserting the hyphen where we have a '1' in our word (000 -> ABCD, 001 -> ABC-D, 010 -> AB-CD, etc). I didn't test the code above, but this is at least one way to solve the problem for fully hyphenated words.

Find most accurate match in strings

I'm developing a tool which fixes incorrect filenames by searching the correct names on a YouTube playlist. This tool gets the YouTube playlist videos' titles and stores them in a List:
static List<string> tracksList = new List<string>();
After storing all the correct names in this List, the tool searches a folder, looking only at files with the '.mp3' extension:
DirectoryInfo dir = new DirectoryInfo(@"C:\folder");
FileInfo[] files = dir.GetFiles("*.mp3", SearchOption.TopDirectoryOnly);
After storing all MP3 files in a FileInfo array, it loops through them. For each file, it takes the filename and checks which value in the tracksList List is the most similar. I have already tried this, but it returned an empty array:
var trackMatch = tracksList.Where(track => track.Contains(file.Name.Replace(".mp3", "")))
.ToArray();
Is there any way I could do that?
String similarity can be measured with the Levenshtein distance algorithm.
The following function, which counts how many single-character edits (insertions, deletions or substitutions) are needed to turn one string into the other, is taken from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C.23:
public static int LevenshteinDistance(string source, string target)
{
if (String.IsNullOrEmpty(source))
{
if (String.IsNullOrEmpty(target)) return 0;
return target.Length;
}
if (String.IsNullOrEmpty(target)) return source.Length;
if (source.Length > target.Length)
{
var temp = target;
target = source;
source = temp;
}
var m = target.Length;
var n = source.Length;
var distance = new int[2, m + 1];
// Initialize the distance 'matrix'
for (var j = 1; j <= m; j++) distance[0, j] = j;
var currentRow = 0;
for (var i = 1; i <= n; ++i)
{
currentRow = i & 1;
distance[currentRow, 0] = i;
var previousRow = currentRow ^ 1;
for (var j = 1; j <= m; j++)
{
var cost = (target[j - 1] == source[i - 1] ? 0 : 1);
distance[currentRow, j] = Math.Min(Math.Min(
distance[previousRow, j] + 1,
distance[currentRow, j - 1] + 1),
distance[previousRow, j - 1] + cost);
}
}
return distance[currentRow, m];
}
Therefore, if we use this function to compare an input string with every string stored in tracksList, we get a Levenshtein distance for each one; the lowest distance means the most similar string:
var matchList = new List<int>();
foreach (string track in tracksList)
{
matchList.Add(LevenshteinDistance(track, "Dailucia Where My Heart Matches The Beat (Ft Poprebel) [FULL HQ + HD]"));
}
string match = tracksList.ElementAt(matchList.IndexOf(matchList.Min()));
This is a non-trivial task.
The problem, of course, is that the errors in the filenames can be anything, from spelling errors to left-out words to added spaces.
This means that any character can be affected in any way.
Therefore neither a simplistic Contains nor even a smart RegEx will work reliably.
I would split the filename into words and count how many of those words I find in each list title. The one with the highest count has the best chance of being the right one.
I would also go for a semi-automatic program, where I am offered the choices ordered by hit count and can then confirm, correct or skip.
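The word-count idea above can be sketched as follows; this is a minimal illustration, not a tested tool. `TitleMatcher` and `Rank` are hypothetical names, and splitting on `\w+` (letters, digits, underscore) and ignoring case are assumptions. It returns every title with its hit count, highest first, which also gives the ordered list of choices for the semi-automatic approach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleMatcher
{
    // Distinct lower-cased words of a string; spaces and punctuation act as separators.
    static HashSet<string> Words(string text) =>
        new HashSet<string>(Regex.Matches(text.ToLowerInvariant(), @"\w+")
                                 .Cast<Match>()
                                 .Select(m => m.Value));

    // Ranks titles by how many distinct filename words they contain, highest first.
    // OrderByDescending is a stable sort, so ties keep the original playlist order.
    public static List<(string Title, int Hits)> Rank(string fileName, IEnumerable<string> titles)
    {
        var fileWords = Words(fileName);
        return titles
            .Select(t => (Title: t, Hits: Words(t).Count(w => fileWords.Contains(w))))
            .OrderByDescending(r => r.Hits)
            .ToList();
    }
}
```

Usage: `var choices = TitleMatcher.Rank(Path.GetFileNameWithoutExtension(file.Name), tracksList);`. Stripping the extension keeps "mp3" from being counted as a word.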
