Find the exact occurrence of a string in HTML file


I would like to count the exact matches of a string.
Suppose the string is 'My Computer'. I want to find its occurrences in this string:
This is My computer,this is a good Computer,this is my Computer,this is my Computers
So at the end I should get a count of 2.
I have tried the following formula, with mykeyWord as the string to be found:
int strength = (innerDocument.DocumentNode.InnerText.Length - innerDocument.DocumentNode.InnerText.ToLower().Replace(mykeyWord.ToLower(), "").Length) / mykeyWord.Length;
But it also counts strings like 'my Computers', which is wrong.

This is a perfect place to use regular expressions, just as you tagged your post:
Regex re = new Regex("\\b" + Regex.Escape(mykeyWord) + "\\b", RegexOptions.IgnoreCase);
int count = re.Matches(innerDocument.DocumentNode.InnerText).Count;
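A minimal self-contained sketch of the word-boundary approach, run against the sample sentence from the question. The class and method names (KeywordCounter, CountWholeWord) are hypothetical, and it takes plain text rather than an HtmlAgilityPack document.

```csharp
using System.Text.RegularExpressions;

public static class KeywordCounter
{
    // Counts case-insensitive, whole-word occurrences of keyword in text.
    // Regex.Escape keeps characters such as '.' or '+' in the keyword literal.
    public static int CountWholeWord(string text, string keyword)
    {
        var re = new Regex(@"\b" + Regex.Escape(keyword) + @"\b", RegexOptions.IgnoreCase);
        return re.Matches(text).Count;
    }
}
```

With the question's sentence, "My computer" and "my Computer" are counted, while "my Computers" is rejected because there is no word boundary between "Computer" and "s", so the result is 2. Note that \b only behaves this way when the keyword starts and ends with a word character; a keyword such as "C#" gets no boundary after the "#".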

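You could use the regular expression [^A-Za-z](my computer)[^A-Za-z]. This matches 'my computer' only when it is not immediately preceded or followed by a letter. (Writing the class as [^A-z] would be a mistake: the ASCII range A-z also includes the characters [ \ ] ^ _ and `.) To make the search case-insensitive, use RegexOptions.IgnoreCase.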
Edit: minitech's answer, which uses word boundaries, is better.
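The negated character-class approach above consumes the neighbouring characters, and it cannot match at the very start or end of the text because it needs a character on each side. A sketch using zero-width lookbehind and lookahead, which .NET regular expressions support, avoids both problems; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LetterBounded
{
    // (?<![A-Za-z]) and (?![A-Za-z]) assert "no ASCII letter here" without
    // consuming anything, so matches at the start or end of the text are found too.
    public static int Count(string text, string phrase)
    {
        string pattern = "(?<![A-Za-z])" + Regex.Escape(phrase) + "(?![A-Za-z])";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

Unlike \b, this only looks at letters, so a digit directly next to the phrase (for example "1my computer") still counts as a match.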

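// Counts case-insensitive, whole-word occurrences of keyword in input.
int FindCount(string keyword, string input)
{
    if (string.IsNullOrEmpty(keyword) || string.IsNullOrEmpty(input))
        return 0;

    int count = 0;
    // Try every start position where the keyword still fits.
    for (int start = 0; start + keyword.Length <= input.Length; start++)
    {
        // Compare character by character, ignoring case.
        int i = 0;
        while (i < keyword.Length &&
               char.ToLowerInvariant(input[start + i]) == char.ToLowerInvariant(keyword[i]))
            i++;
        if (i < keyword.Length)
            continue; // mismatch: move on to the next start position

        // Only count whole words, so "my Computers" is rejected.
        int end = start + keyword.Length;
        bool wordStart = start == 0 || !char.IsLetterOrDigit(input[start - 1]);
        bool wordEnd = end == input.Length || !char.IsLetterOrDigit(input[end]);
        if (wordStart && wordEnd)
            count++;
    }
    return count;
}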

Related

Check if a word exists in another string in c# without using any inbuilt function

string sentence = "This is a goodday";
string word = "good";
I know this can be done with the .Contains() method. I was asked in an interview how to do it without Contains.

How to do it in English:
Pick the first letter of word.
Walk down sentence a character at a time until you find that letter.
Now look at the next letters in sentence to see if they are the rest of that word.
Yes - done.
No - keep going.
The thing to use is that word[x] is the character at zero-based index x (the (x+1)th character of word), so you just need two indexes and a loop or two.
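A sketch of those steps with two indexes and two loops. The names WordSearch and ContainsWord are hypothetical, and it still uses string.Length, which the interviewer may or may not count as a built-in.

```csharp
public static class WordSearch
{
    // Returns true if word occurs as a contiguous run of characters in sentence.
    public static bool ContainsWord(string sentence, string word)
    {
        if (word.Length == 0)
            return true; // the empty string is contained everywhere

        // s walks down the sentence; only start where the rest of word can still fit.
        for (int s = 0; s + word.Length <= sentence.Length; s++)
        {
            // w walks through word while the letters keep matching
            // (the first comparison is the "find the first letter" step).
            int w = 0;
            while (w < word.Length && sentence[s + w] == word[w])
                w++;

            if (w == word.Length)
                return true; // yes - done
        }
        return false; // ran out of sentence: no match
    }
}
```

After a mismatch the search resumes at the character after the previous start, not after the mismatch, so overlapping cases such as finding "aab" in "aaab" still work.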

Q: Check if a word exists in another string in C# without using any inbuilt function
A: Tricky.
It depends on how strict that "any inbuilt function" really is.
In general the algorithm is simple:
Loop through the string you're searching in.
For each position, check whether you've found the word:
you do this by looping through all the characters of what we're looking for,
and comparing each of them with the corresponding character of the first string.
If they all matched, we've found a match.
But then... "without using any inbuilt function".
I assume this means: do not use the obvious ones, such as Contains, IndexOf, regular expressions and the like.
But taken to the extreme, does that mean I cannot even know how long the strings are? Is s.Length a built-in function, and thus not allowed?
public bool Contains(string value, string whatWereLookingFor)
{
    return IndexOf(value, whatWereLookingFor) >= 0;
}

// Counts characters without using s.Length: keep reading until the indexer throws.
public int Length(string s)
{
    int result = 0;
    for (int i = 0; i <= int.MaxValue; i++)
    {
        try
        {
            char c = s[i];
        }
        catch (IndexOutOfRangeException)
        {
            break; // ran past the last character
        }
        result = i + 1;
    }
    return result;
}

// Brute force: try each start position and compare character by character.
public int IndexOf(string value, string whatWereLookingFor)
{
    int iMax = Length(value);
    int whatMax = Length(whatWereLookingFor);
    for (int i = 0; i <= iMax - whatMax; i++)
    {
        bool isMatch = true;
        for (int j = 0; j < whatMax; j++)
        {
            if (value[i + j] != whatWereLookingFor[j])
            {
                isMatch = false;
                break;
            }
        }
        if (isMatch)
            return i;
    }
    return -1;
}

How about using a for loop?
Loop through the sentence looking for the first character of the word. Once you find it, check the following characters against the rest of the word until either you have matched the whole word (done: the word exists in the sentence) or you hit a different character, in which case you go back to looking for the first character, starting from the character after the previous start. Stop when you run out of characters.
As the comments say, it would have been nicer to let us know what you answered.

Try this:
static bool checkString(string inputString = "", string word = "")
{
    if (inputString == "" || word == "")
        return false;

    // Try every start position in inputString where word could still fit.
    for (int start = 0; start + word.Length <= inputString.Length; start++)
    {
        // Count how many consecutive characters of word match from this position.
        int matched = 0;
        while (matched < word.Length && inputString[start + matched] == word[matched])
            matched++;

        // All characters matched in a row: the word is present.
        if (matched == word.Length)
            return true;
    }
    return false;
}

count the number of points at the end of a string

I need to count the number of points (dots) at the END of a string.
Points in the middle of the string are not relevant and should not be counted.
How can this be done?
string sample = "This.is.a.sample.string.....";
For the example above, the correct answer would be 5, because there are 5 points at the end of the string.
For performance reasons I would prefer a fast solution. I don't know whether the regular expression
\.*$
should be used in such a case.

Start from the end of the string and go back char by char until it's not a dot:
string sample = "This.is.a.sample.string.....";
int count = 0;
for (int i = sample.Length - 1; i >= 0; i--)
{
    if (sample[i] != '.') break; // first non-dot from the end: stop
    count++;
}

Using LINQ (requires using System.Linq;):
var test = "this.is.a.test........";
var count = test.ToCharArray().Reverse().TakeWhile(q => q == '.').Count();
Convert the string to an array, reverse it, then take characters while they equal '.', and count the result.

A simple solution using an extension method:
var test = "this.is.a.test........";
Console.WriteLine(test.CountTrailingDots());
public static int CountTrailingDots(this string value)
{
    return value.Length - value.TrimEnd('.').Length;
}
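Extension methods must be declared in a non-generic, non-nested static class, so the snippet above needs a wrapper to compile. A self-contained sketch (the class name StringExtensions is hypothetical):

```csharp
public static class StringExtensions
{
    // TrimEnd('.') strips only the trailing dots, so the length difference is their count.
    public static int CountTrailingDots(this string value)
    {
        return value.Length - value.TrimEnd('.').Length;
    }
}
```

For example, "This.is.a.sample.string.....".CountTrailingDots() gives 5. The backwards loop from the first answer avoids allocating the trimmed copy that TrimEnd creates when there are trailing dots, which may matter if this runs in a hot path.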

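Using Regex:
int points = Regex.Match("This.is.a.sample.string....", @"^[\w\W]*?([.]*)$").Groups[1].Value.Length;
Description:
*? = matches as few characters as possible (lazy), so the prefix stops as early as it can.
* = matches as many characters as possible (greedy), so group 1 takes the whole run of trailing dots.
(.NET regular expressions do not support possessive quantifiers such as *+, so the group uses a plain *.)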
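The pattern the question itself suggests, \.*$, also works without a capture group: the engine reports the leftmost match, which starts at the first dot of the trailing run and greedily consumes the rest, so the match length is the count. A sketch (the class name is hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class TrailingDotsRegex
{
    // \.*$ : zero or more dots anchored at the end, so a match always exists
    // (possibly empty). Without RegexOptions.Multiline, $ can also match just
    // before a final '\n'; use \z instead if that should not be skipped.
    public static int Count(string value)
    {
        return Regex.Match(value, @"\.*$").Length;
    }
}
```

Because the engine tries each start position in turn, the backwards loop is likely the cheaper option for long strings.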

It can be something like this:
string sample = "This.is.a.sample.string.....";
int count = 0;
if (sample.EndsWith("."))
    count = sample.Substring(sample.TrimEnd('.').Length).Length;

Upper Case Everything Before the nth character in .NET

I need to capitalize everything before the second hyphen (-), counting from the beginning of the string, in .NET. What is the best way to do this? The text before the second hyphen can be anything. I need a single new string once this is complete.
Before
Tt-Fga - Louisville - Kentucky
After
TT-FGA - Louisville - Kentucky

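This should get the job done for your specific case:
public static string ToUpperUntilSecondHyphen(string text)
{
    // Find the first hyphen, then search for the next one just after it.
    int index = text.IndexOf('-', text.IndexOf('-') + 1);
    if (index < 0)
        return text.ToUpper(); // fewer than two hyphens: upper-case everything
    return text.Substring(0, index).ToUpper() + text.Substring(index);
}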
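A more generalized method could look something like this (text.Count requires using System.Linq;):
public static string ToUpperUntilNthOccurrenceOfChar(string text, char c, int occurrences)
{
    // Fewer than n occurrences: upper-case the whole string.
    if (occurrences > text.Count(x => x == c))
    {
        return text.ToUpper();
    }
    int index = -1;
    for (int i = 0; i < occurrences; i++)
    {
        // Search just past the previous occurrence.
        index = text.IndexOf(c, index + 1);
    }
    // index is now the position of the nth occurrence (-1 when occurrences is 0).
    int cut = Math.Max(index, 0);
    return text.Substring(0, cut).ToUpper() + text.Substring(cut);
}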

Identify the location of the hyphen with IndexOf. You'll have to call it twice: once to find the first hyphen, and again, starting just after it, to find the second one.
Use Substring to build the part that contains only the characters up to that position, and another Substring for all the remaining characters.
Upper-case the first part with ToUpper.
Concatenate the two parts with the + operator.

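^([^-]*-[^-]*)(?=-)
You can use Regex.Replace here. A .NET replacement string cannot change case (there is no $1.upper()), so pass a MatchEvaluator that upper-cases the match instead. The pattern originally given here, (.*?-.*)(?=-), does not stop at the second hyphen: its greedy .* runs on to the last hyphen in the string, which is why [^-]* is used above.
The original pattern can be tried at
http://regex101.com/r/yR3mM3/50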
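A sketch of the Regex.Replace idea with a MatchEvaluator. It uses ^[^-]*-[^-]*(?=-) so that the match ends right before the second hyphen (a greedy .* would run on to the last hyphen instead); the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class HyphenUpper
{
    // ^[^-]*-[^-]* : from the start up to, but not including, the second hyphen.
    // (?=-) requires that a second hyphen actually follows.
    static readonly Regex BeforeSecondHyphen = new Regex(@"^[^-]*-[^-]*(?=-)");

    public static string ToUpperBeforeSecondHyphen(string text)
    {
        // Replacement strings cannot change case in .NET, so a MatchEvaluator does it.
        return BeforeSecondHyphen.Replace(text, m => m.Value.ToUpper());
    }
}
```

With fewer than two hyphens there is no match and the string comes back unchanged, which differs from the generalized method above that upper-cases everything in that case.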

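I went ahead and did this. If there is a better answer, let me know.
var parts = @event.EventParent.Name.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
// Upper-case the first two segments, i.e. everything before the second hyphen.
for (int i = 0; i < parts.Length && i < 2; i++)
{
    parts[i] = parts[i].ToUpper();
}
// Note: RemoveEmptyEntries drops empty segments, so consecutive hyphens ("--") collapse into one here.
@event.EventParent.Name = string.Join("-", parts);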

Char/String comparison

I'm trying to add a suggestion feature to the search function in my program, e.g. if I type janw doe in the search section, it should output NO MATCH - did you mean jane doe? I'm not sure what the problem is; maybe it's something to do with char/string comparison. I've tried comparing both variables as type char, e.g. char temp --> temp.Contains..., but an error appears (char does not contain a definition for Contains). Would love any help on this!
if (found == false)
{
    Console.WriteLine("\n\nMATCH NOT FOUND");
    int charMatch = 0, charCount = 0;
    string[] checkArray = new string[26];
    //construction site ////////////////////////////////////////////
    for (int controlLoop = 0; controlLoop < contPeople.Length; controlLoop++)
    {
        foreach (char i in userContChange)
        {
            charCount = charCount + 1;
        }
        for (int i = 0; i < userContChange.Length; )
        {
            string temp = contPeople[controlLoop].name;
            string check = Convert.ToString(userContChange[i]);
            if (temp.Contains(check))
            {
                charMatch = charMatch + 1;
            }
        }
        int half = charCount / 2;
        if (charMatch >= half)
        {
            checkArray[controlLoop] = contPeople[controlLoop].name;
        }
    }
    ////////////////////////////////////////////
    Console.WriteLine("Did you mean: ");
    for (int a = 0; a < checkArray.Length; a++)
    {
        Console.WriteLine(checkArray[a]);
    }
    ////////////////////////////////////////////

A string is made up of many characters. A character is a primitive value, so it doesn't "contain" any other items. A string is basically an array of characters.
To compare strings and characters:
char a = 'A';
String alan = "Alan";
Debug.Assert(alan[0] == a);
Or, if you have a single-character string, I suppose:
char a = 'A';
String alan = "A";
Debug.Assert(alan == a.ToString());
All of these asserts are true (Debug.Assert lives in System.Diagnostics).
But the main reason I wanted to comment on your question is to suggest an alternative approach for "Did you mean?" suggestions. There's an algorithm called Levenshtein distance, which calculates the number of single-character edits required to convert one string into another. It can be used as a measure of how close two strings are. You may want to look into how this algorithm works, because it could help you.
Here's an applet I found that demonstrates it: Approximate String Matching with k-differences
See also the Wikipedia article: Levenshtein distance
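A sketch of the standard dynamic-programming Levenshtein distance, plus a hypothetical Closest helper that picks the candidate with the smallest distance. The class and method names are illustrative, and the case-insensitive comparison in Closest is an assumption about what a search box would want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DidYouMean
{
    // Minimum number of single-character insertions, deletions or
    // substitutions needed to turn a into b.
    public static int Distance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete all of a's prefix
        for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert all of b's prefix

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,    // deletion
                             d[i, j - 1] + 1),   // insertion
                    d[i - 1, j - 1] + cost);     // substitution (or match)
            }
        }
        return d[a.Length, b.Length];
    }

    // Hypothetical helper: the candidate closest to input, compared case-insensitively.
    public static string Closest(string input, IEnumerable<string> candidates)
    {
        return candidates
            .OrderBy(c => Distance(input.ToLowerInvariant(), c.ToLowerInvariant()))
            .FirstOrDefault();
    }
}
```

For "janw doe" against { "abc", "123", "jane", "jane doe" }, "jane doe" is one substitution away and is returned. In practice you would probably only suggest a name when the distance is below some threshold of your choosing.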

The char type has no .Contains() method because a char is a single character value.
In your case (if I understand correctly), you may need .Equals() or the == operator instead.
Note: for strings, == and .Equals() both compare contents in C#, because string overloads ==.
The == operator only falls back to reference comparison when the operands are typed as object.
Hope this helps!

The char type doesn't have a Contains() method, but you can use it like this: 'a'.ToString().Contains(...)
If performance is not a concern, here is another simple way:
var input = "janw doe";
var people = new string[] { "abc", "123", "jane", "jane doe" };
// Array.IndexOf does a linear search; Array.BinarySearch would require a sorted array.
var found = Array.IndexOf(people, input); // or use FirstOrDefault(), FindIndex, a search engine...
if (found < 0) // not found
{
    var i = input.ToArray(); // ToArray() on a string needs using System.Linq;
    var target = "";
    // most similar:
    //target = people.OrderByDescending(p => p.ToArray().Intersect(i).Count()).FirstOrDefault();
    // as in your code:
    foreach (var p in people)
    {
        // Intersect counts the distinct characters the two strings share.
        var count = p.ToArray().Intersect(i).Count();
        if (count > input.Length / 2)
        {
            target = p;
            break;
        }
    }
    if (!string.IsNullOrWhiteSpace(target))
    {
        Console.WriteLine(target);
    }
}

Regex to find first capital letter occurrence in a string

I want to find the index of the first capital letter in a string.
E.g.:
String x = "soHaM";
The index should be 2 for this string. The regex should ignore all other capital letters after the first one is found. If no capital letters are found, it should return 0. Please help.

I'm pretty sure all you need is the regex \p{Lu} (rather than [A-Z]):
public static class Find
{
    // \p{Lu} matches any Unicode uppercase letter, including non-ASCII ones,
    // so it is better than [A-Z]. Requires using System.Text.RegularExpressions;
    static readonly Regex CapitalLetter = new Regex(@"\p{Lu}");
    public static int FirstCapitalLetter(string input)
    {
        Match match = CapitalLetter.Match(input);
        // I would go with -1 here, personally.
        return match.Success ? match.Index : 0;
    }
}
Did you try this?

Just for fun, a LINQ solution:
string x = "soHaM";
var index = from ch in x.ToArray()
where Char.IsUpper(ch)
select x.IndexOf(ch);
This returns an IEnumerable<int>. If you want the index of the first uppercase character, simply call index.First(), or take only the first element inside the query itself:
string x = "soHaM";
var index = (from ch in x.ToArray()
where Char.IsUpper(ch)
select x.IndexOf(ch)).First();
EDIT
As suggested in the comments, here is another LINQ method (possibly more performant than my initial suggestion, since it uses each character's position directly instead of searching again with IndexOf):
string x = "soHaM";
int firstUpperIndex = x.Select((c, index) => new { Char = c, Index = index }).First(c => Char.IsUpper(c.Char)).Index;

No need for Regex:
int firstUpper = -1;
for (int i = 0; i < x.Length; i++)
{
    if (Char.IsUpper(x[i]))
    {
        firstUpper = i;
        break;
    }
}
http://msdn.microsoft.com/en-us/library/system.char.isupper.aspx
For the sake of completeness, here's my LINQ approach (although it's not the right tool here, even if the OP could use it):
int firstUpperCharIndex = -1;
var upperChars = x.Select((c, index) => new { Char = c, Index = index })
.Where(c => Char.IsUpper(c.Char));
if(upperChars.Any())
firstUpperCharIndex = upperChars.First().Index;

First, your logic has a flaw: a return value of 0 would also mean that the first character of the string is upper case. So I would recommend returning -1 for "not found", or throwing an exception.
Also, using regular expressions just because you can is not always the best choice; they can be slower and are often harder to read, which makes your code harder to work with.
Anyway, here is my contribution:
public static int FindFirstUpper(string text)
{
    for (int i = 0; i < text.Length; i++)
        if (Char.IsUpper(text[i]))
            return i;
    return -1;
}

Using LINQ:
using System.Linq;
string word = "soHaMH";
var capChars = word.Where(c => char.IsUpper(c));
char capChar = capChars.FirstOrDefault(); // '\0' when there are no capitals
int index = word.IndexOf(capChar);        // -1 in that case
Using Regex:
using System.Text.RegularExpressions;
string word = "soHaMH";
Match match = Regex.Match(word, "[A-Z]");
int index = match.Success ? match.Index : -1;

Using a loop:
int i = 0;
for (i = 0; i < mystring.Length; i++)
{
    if (Char.IsUpper(mystring, i))
        break;
}
i is the value you should be looking at; if it equals mystring.Length, the string has no uppercase letter.
