Finding number of instances of exact word of "x" in text

I'm working in C# and want to count the number of occurrences of an exact word.
For example:
List<string> words = new List<string> { "Mode", "Model", "Model:" };
string Text = "This is Model: x Type: y aa: e";
I've used Regex:
for (int i = 0; i < words.Count; i++)
{
    string word = words[i];
    int count = Regex.Matches(Text, word).Count;
}
But it's not working. The code above gives count = 1 for each of Mode, Model, and Model:.
I want the count to be 0 for Mode, 0 for Model, and 1 for Model:, so that only exact occurrences of the word are counted.
I forgot to mention that I can't use Split in my case. Is there any way to do this without Split?

I use LINQ for this purpose:
List<string> words = new List<string> { "Mode", "Model", "Model:" };
string Text = "This is Model: x Type: Model: y aa: Mode e Model:";
var textArray = Text.Split(' ');
// Count() already returns 0 when the word is absent, so no Contains check is needed.
var countt = words.Select(item => textArray.Count(d => d == item)).ToArray();
Result:
For Mode => count = 1
For Model => count = 0
For Model: => count = 3
EDIT: I prefer LINQ here because, as you can see, it is easier and cleaner in this scenario, but if you are still looking for a Regex solution you could try this:
List<int> count = new List<int>();
foreach (var word in words)
{
    // Regex.Escape keeps characters such as ':' or '.' in the word literal.
    var regex = new Regex(string.Format(@"\b{0}(\s|$)", Regex.Escape(word)), RegexOptions.IgnoreCase);
    count.Add(regex.Matches(Text).Count);
}
EDIT2: Or, combining LINQ and Regex without Split:
List<int> count = words.Select(word => new Regex(string.Format(@"\b{0}(\s|$)", Regex.Escape(word)), RegexOptions.IgnoreCase))
    .Select(regex => regex.Matches(Text).Count).ToList();

Although @S.Akhbari's solution works, I think using LINQ is cleaner:
var splitted = Text.Split(' ');
var items = words.Select(x => new { Word = x, Count = splitted.Count(y => y == x) });
Each item will have Word and Count properties.

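\b matches on word boundaries:
for (int i = 0; i < words.Count; i++)
{
    string word = words[i];
    var regex = new Regex(string.Format(@"\b{0}\b", Regex.Escape(word)),
        RegexOptions.IgnoreCase);
    int count = regex.Matches(Text).Count;
}
Note that \b only matches between a word character and a non-word character, so a search term ending in punctuation, such as Model:, is not matched when it is followed by a space, while Model is still matched inside Model:.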
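A whitespace-delimited variant of the same idea can be sketched as follows. It assumes that an "exact word" means a token surrounded by whitespace or by the start or end of the string, and it uses lookarounds instead of \b. The class and method names (ExactWordCounter, CountExact) are illustrative only and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExactWordCounter
{
    // Counts occurrences of 'word' that are delimited by whitespace
    // or by the start/end of 'text' (assumed definition of "exact word").
    public static int CountExact(string text, string word)
    {
        // (?<!\S) : not preceded by a non-whitespace character
        // (?!\S)  : not followed by a non-whitespace character
        // Regex.Escape treats characters such as ':' literally.
        string pattern = @"(?<!\S)" + Regex.Escape(word) + @"(?!\S)";
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        var words = new List<string> { "Mode", "Model", "Model:" };
        string text = "This is Model: x Type: y aa: e";

        foreach (string word in words)
        {
            // Prints: Mode => 0, Model => 0, Model: => 1
            Console.WriteLine("{0} => {1}", word, CountExact(text, word));
        }
    }
}
```

This avoids Split entirely, which was the constraint in the question, and it is case-sensitive unless RegexOptions.IgnoreCase is passed to Regex.Matches.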

Related

How to count 2 or 3 letter words in a string using asp c#

How can I count the 2-, 3-, and 4-letter words of a string using ASP.NET and C#? For example:
string value="This is my string value";
and the output should look like this:
2 letter words = 2
3 letter words = 0
4 letter words = 1
Please help. Thanks in advance.

You can try something like this:
split sentence by space to get array of words
group them by length of word (and order by that length)
iterate through every group and write letter count and number of words with that letter count
Code:
using System.Linq;
using System.Diagnostics;
...
var words = value.Split(' ');
var groupedByLength = words.GroupBy(w => w.Length).OrderBy(x => x.Key);
foreach (var grp in groupedByLength)
{
Debug.WriteLine(string.Format("{0} letter words: {1}", grp.Key, grp.Count()));
}

First of all, you need to decide what counts as a word. A naive approach is to split the string on spaces, but then punctuation such as commas stays attached to the words and is counted with them. Another approach is to use the following regex
\b\w+?\b
and collect all the matches.
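Collecting the matches into an array could look roughly like this. It is a minimal sketch: the WordExtractor class and GetWords method are names made up for illustration, and the input string is the one from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // Returns every run of word characters found in 'input',
    // using the \b\w+?\b pattern from the answer.
    public static string[] GetWords(string input)
    {
        return Regex.Matches(input, @"\b\w+?\b")
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string value = "This is my string value";
        string[] words = GetWords(value);

        // Prints: This, is, my, string, value
        Console.WriteLine(string.Join(", ", words));
    }
}
```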
Now that you have all the words in a words array, we can write a LINQ query:
var query = words.Where(x => x.Length >= 2 && x.Length <= 4)
.GroupBy(x => x.Length)
.Select(x => new { CharCount = x.Key, WordCount = x.Count() });
Then you can print the query out like this:
query.ToList().ForEach(Console.WriteLine);
This prints:
{ CharCount = 4, WordCount = 1 }
{ CharCount = 2, WordCount = 2 }
You can write some code yourself to produce a more formatted output.

If I understood your question correctly, you can do it using a dictionary.
First split the string by space:
string value = "This is my string value";
string[] words = value.Split(' ');
Then loop through the array of words and use the length of each word as a dictionary key. Note that I've used a string as the key, but you can modify this to suit your needs.
Dictionary<string, int> letterWords = new Dictionary<string, int>();
for (int i = 0; i < words.Length; i++)
{
    string key = words[i].Length + " letter word";
    if (letterWords.ContainsKey(key))
        letterWords[key] += 1;
    else
        letterWords.Add(key, 1);
}
The output can then be printed like this:
foreach (var ind in letterWords)
{
    Console.WriteLine(ind.Key + " = " + ind.Value);
}
Modify this as you wish.

Cut last character from string which was earlier splitted by char

I want to order a list of strings by the name enclosed in brackets.
List<string> result = new List<string>();
list.ForEach(elem => result.Add(elem.Value));
result.Add(item);
result = result.OrderBy(o=>o.Split(';')[0].Substring(0, o.Length - 1).Split('(')[1]).ToList();
Example: 2-osobowy(Agrawka);Śniadanie+Obiadokolacja
I want to extract the name Agrawka.
How should I change Substring(0, o.Length - 1) so that it cuts the last character of the split string inside the OrderBy call?

If I understood correctly, you want to extract the values in brackets and sort the input list by those values. The code below sorts your data and also collects the extracted values in an additional list:
List<string> resultList = new List<string>() { "2-osobowy(Bgrawka);Śniadanie+Obiadokolacja", "2-osobowy(Agrawka);Śniadanie+Obiadokolacja" };
string tempStr = null;
var extractedStr = new List<String>();
resultList = resultList.OrderBy(o =>
{
var extract = (tempStr = o.Split(';')[0].Split('(')[1]).Substring(0, tempStr.Length - 1);
extractedStr.Add(extract);
return extract;
}).ToList();
If you only want to sort the input data, simplify the lambda:
resultList = resultList.OrderBy(o => (tempStr = o.Split(';')[0].Split('(')[1]).Substring(0, tempStr.Length - 1)).ToList();
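The same extraction can also be written without the temporary variable, by locating the brackets with IndexOf. The following is only a sketch of that alternative: BracketSorter and ExtractBracketed are hypothetical names, and returning an empty string for input without brackets is an assumption, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BracketSorter
{
    // Returns the text between the first '(' and the next ')',
    // or an empty string when no such pair exists (assumed behavior).
    public static string ExtractBracketed(string s)
    {
        int open = s.IndexOf('(');
        if (open < 0) return string.Empty;

        int close = s.IndexOf(')', open + 1);
        if (close < 0) return string.Empty;

        return s.Substring(open + 1, close - open - 1);
    }

    public static void Main()
    {
        var list = new List<string>
        {
            "2-osobowy(Bgrawka);Śniadanie+Obiadokolacja",
            "2-osobowy(Agrawka);Śniadanie+Obiadokolacja"
        };

        // Sort by the bracketed name; the Agrawka entry comes first.
        List<string> sorted = list
            .OrderBy(s => ExtractBracketed(s), StringComparer.Ordinal)
            .ToList();

        sorted.ForEach(Console.WriteLine);
    }
}
```

Keeping the extraction in its own method avoids the side effect inside the OrderBy lambda and makes the missing-bracket case explicit.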

How to ignore the punctuation c#

I want to ignore punctuation. I'm trying to write a program that counts the occurrences of every word in my text without taking the punctuation marks into consideration.
So my program is:
static void Main(string[] args)
{
string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
IDictionary<string, int> wordsCount =
new SortedDictionary<string, int>();
text=text.ToLower();
text = text.replaceAll("[^0-9a-zA-Z\text]", "X");
string[] words = text.Split(' ',',','-','!','.');
foreach (string word in words)
{
int count = 1;
if (wordsCount.ContainsKey(word))
count = wordsCount[word] + 1;
wordsCount[word] = count;
}
var items = from pair in wordsCount
orderby pair.Value ascending
select pair;
foreach (var p in items)
{
Console.WriteLine("{0} -> {1}", p.Key, p.Value);
}
}
The output is:
is->1
my->1
the->1
this->3
world->5
(here is nothing) -> 8
How can I remove the punctuation here?

You should try specifying StringSplitOptions.RemoveEmptyEntries:
string[] words = text.Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Note that instead of manually creating a char[] with all the punctuation characters, you may create a string and call ToCharArray() to get the array of characters.
I find it easier to read and to modify later on.

string[] words = text.Split(new char[]{' ',',','-','!','.'}, StringSplitOptions.RemoveEmptyEntries);

It is simple: first remove the undesired punctuation with Replace, then continue splitting as you already do.
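One way to sketch this, under some assumptions: instead of calling Replace once per punctuation mark, the example below swaps every character for which char.IsPunctuation returns true with a space, so that input like "world,THIS" still separates into two words. The PunctuationStripper class and SplitWords method are invented names for illustration.

```csharp
using System;
using System.Linq;

public static class PunctuationStripper
{
    // Lower-cases the text, replaces each punctuation character with a space,
    // then splits on spaces and drops the empty entries.
    public static string[] SplitWords(string text)
    {
        char[] cleaned = text.ToLower()
                             .Select(c => char.IsPunctuation(c) ? ' ' : c)
                             .ToArray();

        return new string(cleaned)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string text = "This my world. World, world,THIS WORLD ! Is this - the world .";

        // Group identical words and print each with its count.
        foreach (var group in SplitWords(text).GroupBy(w => w).OrderBy(g => g.Count()))
        {
            Console.WriteLine("{0} -> {1}", group.Key, group.Count());
        }
    }
}
```

Replacing with a space rather than deleting the character is a deliberate choice here: deleting the comma in "world,THIS" would merge the two words into one.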

... you can go with the making people cry version ...
"This my world. World, world,THIS WORLD ! Is this - the world ."
.ToLower()
.Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.GroupBy(i => i)
.Select(i=>new{Word=i.Key, Count = i.Count()})
.OrderBy(k => k.Count)
.ToList()
.ForEach(Console.WriteLine);
.. output
{ Word = my, Count = 1 }
{ Word = is, Count = 1 }
{ Word = the, Count = 1 }
{ Word = this, Count = 3 }
{ Word = world, Count = 5 }

Given collection of strings, count number of times each word appears in List<T>

Input 1: List<string>, e.g:
"hello", "world", "stack", "overflow".
Input 2: List<Foo> (two properties, string a, string b), e.g:
Foo 1:
a: "Hello there!"
b: string.Empty
Foo 2:
a: "I love Stack Overflow"
b: "It's the best site ever!"
So I want to end up with a Dictionary<string,int>: the word, and the number of times it appears in the List<Foo>, in either the a or the b field.
Here is my current first-pass, off-the-top-of-my-head code, which is far too slow:
var occurences = new Dictionary<string, int>();
foreach (var word in uniqueWords /* input1 */)
{
var aOccurances = foos.Count(x => !string.IsNullOrEmpty(x.a) && x.a.Contains(word));
var bOccurances = foos.Count(x => !string.IsNullOrEmpty(x.b) && x.b.Contains(word));
occurences.Add(word, aOccurances + bOccurances);
}

Roughly:
Build a dictionary (occurrences) from the first input, optionally with a case-insensitive comparer.
For each Foo in the second input, use RegEx to split a and b into words.
For each word, check if the key exists in occurrences. If it exists, increment and update the value in the dictionary.
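The steps above can be sketched roughly as follows. The Foo class is a hypothetical stand-in with the two string properties described in the question, OccurrenceCounter is an invented name, and splitting on \W+ is just one possible definition of a word. Unlike the Contains-based code in the question, this counts whole-word occurrences rather than substrings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical Foo type with the two string properties from the question.
public class Foo
{
    public string a { get; set; }
    public string b { get; set; }
}

public static class OccurrenceCounter
{
    public static Dictionary<string, int> Count(IEnumerable<string> uniqueWords, IEnumerable<Foo> foos)
    {
        // Step 1: seed the dictionary from the first input, case-insensitively.
        var occurrences = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in uniqueWords)
            occurrences[word] = 0;

        // Step 2: split a and b of every Foo into words.
        foreach (Foo foo in foos)
        {
            foreach (string field in new[] { foo.a, foo.b })
            {
                if (string.IsNullOrEmpty(field)) continue;

                foreach (string token in Regex.Split(field, @"\W+"))
                {
                    // Step 3: increment only words that are already keys.
                    int current;
                    if (occurrences.TryGetValue(token, out current))
                        occurrences[token] = current + 1;
                }
            }
        }
        return occurrences;
    }

    public static void Main()
    {
        var words = new List<string> { "hello", "world", "stack", "overflow" };
        var foos = new List<Foo>
        {
            new Foo { a = "Hello there!", b = string.Empty },
            new Foo { a = "I love Stack Overflow", b = "It's the best site ever!" }
        };

        foreach (var pair in Count(words, foos))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

Each Foo is scanned once and every word costs a single dictionary lookup, instead of scanning all Foo objects once per word as in the original loop.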

You could try concatenating the two strings, a + b, then using a regex to pull all the words out into a collection, and finally indexing that with a group-by query.
For example
void Main()
{
var a = "Hello there!";
var b = "It's the best site ever!";
var ab = a + " " + b;
var matches = Regex.Matches(ab, "[A-Za-z]+");
var occurences = from x in matches.OfType<System.Text.RegularExpressions.Match>()
let word = x.Value.ToLowerInvariant()
group word by word into g
select new { Word = g.Key, Count = g.Count() };
var result = occurences.ToDictionary(x => x.Word, x => x.Count);
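// Print each entry; writing the dictionary itself would only print its type name.
foreach (var pair in result)
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);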
}
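Here is the example with some of the suggested changes.
Edit: I just reread the requirement. It's a bit strange, but here goes:
void Main()
{
    // GetCount expects Foo objects, so wrap the sample strings in them.
    var counts = GetCount(new[] {
        new Foo { A = "Hello there!", B = string.Empty },
        new Foo { A = "I love Stack Overflow", B = "It's the best site ever!" }
    });
    foreach (var pair in counts)
        Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}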
public IDictionary<string, int> GetCount(IEnumerable<Foo> inputs)
{
var allWords = from input in inputs
let matchesA = Regex.Matches(input.A, "[A-Za-z']+").OfType<System.Text.RegularExpressions.Match>()
let matchesB = Regex.Matches(input.B, "[A-Za-z']+").OfType<System.Text.RegularExpressions.Match>()
from x in matchesA.Concat(matchesB)
select x.Value;
var occurences = allWords.GroupBy(x => x, (x, y) => new{Key = x, Count = y.Count()}, StringComparer.OrdinalIgnoreCase);
var result = occurences.ToDictionary(x => x.Key, x => x.Count, StringComparer.OrdinalIgnoreCase);
return result;
}

C# Get substring with specific pattern from string

I have a list of strings like this:
List<string> list = new List<string>();
list.Add("Item 1: #item1#");
list.Add("Item 2: #item2#");
list.Add("Item 3: #item3#");
How can I get and add the substrings #item1#, #item2# etc into a new list?
I am only able to get the complete string if it contains a "#" by doing this:
foreach (var item in list)
{
if(item.Contains("#"))
{
//Add item to new list
}
}

You could have a look at Regex.Match. If you know a little bit about regular expressions (in your case it would be a quite simple pattern: "#[^#]+#"), you can use it to extract all items starting and ending with '#' with any number of other characters other than '#' in between.
Example:
Match match = Regex.Match("Item 3: #item3#", "#[^#]+#");
if (match.Success) {
Console.WriteLine(match.Captures[0].Value); // Will output "#item3#"
}

Here's another way, using a regex with LINQ. (I'm not sure your exact requirements call for a regex, so now you may have two problems.)
var list = new List<string> ()
{
"Item 1: #item1#",
"Item 2: #item2#",
"Item 3: #item3#",
"Item 4: #item4#",
"Item 5: #item5#",
};
var pattern = @"#[A-Za-z0-9]*#";
list.Select (x => Regex.Match (x, pattern))
.Where (x => x.Success)
.Select (x => x.Value)
.ToList ()
.ForEach (Console.WriteLine);
Output:
#item1#
#item2#
#item3#
#item4#
#item5#

LINQ would do the job nicely:
var newList = list.Select(s => '#' + s.Split('#')[1] + '#').ToList();
Or if you prefer query expressions:
var newList = (from s in list
select '#' + s.Split('#')[1] + '#').ToList();
Alternatively, you can use regular expressions as suggested by Botz3000 and combine those with LINQ:
var newList = new List<string>(
from match in list.Select(s => Regex.Match(s, "#[^#]+#"))
where match.Success
select match.Captures[0].Value
);

The code below will solve your problem.
If a string does not contain a #...# section, the original string is used instead.
var inputList = new List<string>
{
"Item 1: #item1#",
"Item 2: #item2#",
"Item 3: #item3#",
"Item 4: item4"
};
var outputList = inputList
.Select(item =>
{
int startPos = item.IndexOf('#');
if (startPos < 0)
return item;
int endPos = item.IndexOf('#', startPos + 1);
if (endPos < 0)
return item;
return item.Substring(startPos, endPos - startPos + 1);
})
.ToList();

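How about this:
List<string> substring_list = new List<string>();
foreach (string item in list)
{
    int first = item.IndexOf("#");
    // Start searching after the first '#', otherwise the same position is found again.
    int second = item.IndexOf("#", first + 1);
    // +1 so that the closing '#' is included.
    substring_list.Add(item.Substring(first, second - first + 1));
}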

Since every string in your list ends with the closing #, you could do that by simply taking everything from the first # onward:
List<string> list2 = new List<string>();
list.ForEach(x => list2.Add(x.Substring(x.IndexOf("#"), x.Length - x.IndexOf("#"))));

Try this:
var itemList = new List<string>();
foreach (var text in list)
{
    // Trim removes the space that follows the colon.
    string item = text.Split(':')[1].Trim();
    itemList.Add(item);
}
