How to get words from a text box - c#

In my WPF application I have a text box named textBox.
I'm trying to get each individual word from a sentence typed by the user into a string array, say arrayWords.
I found a piece of code on Stack Overflow that counts the number of words, but I want to copy each individual word.
Below is the code for counting the number of words.
String text = textBox.Text.Trim();
int wordCount = 0, index = 0;
while (index < text.Length)
{
    // check if current char is part of a word
    while (index < text.Length && !Char.IsWhiteSpace(text[index]))
        index++;
    wordCount++;
    // skip whitespace until next word
    while (index < text.Length && Char.IsWhiteSpace(text[index]))
        index++;
}
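The same loop can copy each word instead of only counting it: remember where the word starts, and take the substring once the inner loop reaches whitespace. A minimal sketch, assuming the words are collected in a List<string> and converted to an array at the end (WordLoop and GetWords are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

public static class WordLoop
{
    // Hypothetical helper: same scanning logic as the counting loop,
    // but each word is copied out with Substring instead of counted.
    public static string[] GetWords(string input)
    {
        string text = input.Trim();
        var words = new List<string>();
        int index = 0;
        while (index < text.Length)
        {
            // Remember where the current word begins
            int start = index;
            while (index < text.Length && !Char.IsWhiteSpace(text[index]))
                index++;
            words.Add(text.Substring(start, index - start));
            // Skip whitespace until the next word
            while (index < text.Length && Char.IsWhiteSpace(text[index]))
                index++;
        }
        return words.ToArray();
    }
}
```

In the WPF code this would be called as string[] arrayWords = WordLoop.GetWords(textBox.Text);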

You can use the String.Split function:
String text = textBox.Text.Trim();
var words = text.Split(' ');
or
var words = text.Split(); // with no arguments, Split uses any whitespace character as the delimiter
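One caveat with both calls: if the user types two spaces in a row, Split returns an empty string between them. Passing StringSplitOptions.RemoveEmptyEntries drops those entries, and a null separator array still means "split on any whitespace". A small sketch (SplitExample and SplitWords are names chosen only for illustration):

```csharp
using System;

public static class SplitExample
{
    public static string[] SplitWords(string text)
    {
        // A null separator means any whitespace (space, tab, newline);
        // the (char[]) cast selects the char[] overload unambiguously.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```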

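While @dotNET's answer is on the right track, it assumes that you maintain the list of punctuation marks yourself (and the list in that answer isn't complete). Besides, there could be hyphenated words.
I'd recommend using a regular expression:
// requires: using System.Linq; using System.Text.RegularExpressions;
// \w+(?:-\w+)* matches a word plus any hyphenated parts; unlike \w+-?\w+,
// it also matches one-letter words such as "I" or "a"
var words = Regex.Matches(textBox.Text, @"\w+(?:-\w+)*")
    .OfType<Match>()
    .Select(m => m.Value)
    .ToArray();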

String.Split() can chop your sentence(s) into words. You should, however, take care of trimming punctuation characters from your words. For example, if you use Split() on the sentence "StackOverflow is good, I like it.", two of the words you get in your array will have a comma and a period attached to them. So you should use something like this to get "pure" words:
string[] words = textBox.Text.Split().Select(x => x.TrimEnd(",.;:-".ToCharArray())).ToArray();
The statement above uses LINQ, so you need to add using System.Linq; at the top of the file.
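TrimEnd only strips punctuation from the end of each word, so a leading quote or parenthesis survives. An alternative, sketched here under the assumption that the listed characters are the only separators you care about, is to pass the punctuation marks to Split as separators and drop the empty entries (PunctuationSplit is a hypothetical name):

```csharp
using System;

public static class PunctuationSplit
{
    // Assumed separator set: whitespace plus common punctuation.
    // '-' is deliberately left out so hyphenated words stay whole.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?', '"', '(', ')' };

    public static string[] GetWords(string text)
    {
        // Adjacent separators (e.g. ", ") produce empty entries, which are removed
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```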

The following code will give an array of words from your textBox.
string[] words = textBox.Text.Split(' ');

string[] words = textBox.Text.Split(new char[] { ' ' });

The logic behind getting words from a sentence is to first split the sentence into words and store them in a string array; then you can do anything you want with them. The code below does exactly that:
// requires: using System; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    string sentence = "That's the sentence";
    // Split on each single space; the @ prefix makes this a verbatim string literal
    string[] operands = Regex.Split(sentence, @" ");
    foreach (string i in operands)
    {
        Console.WriteLine(i);
    }
    Console.ReadLine();
}
It extracts the words from the sentence, stores them in an array, and prints each one.

Related

C#: How can I only return the first set of capital letter words from a string?

If I want to parse a string and return only the first run of all-capital words in it, how would I do that?
Example:
"OTHER COMMENTS These are other comments that would be here. Some more comments"
I want to return just "OTHER COMMENTS".
There can be many of these first upper-case words, and the exact count is unknown.
There could be other all-caps words later in the string that I just want to ignore.
You can use a combination of Split (to break the sentence into words), SkipWhile (to skip words that aren't all caps), ToUpper (to test the word against its upper-case counterpart), and TakeWhile (to take all sequential upper-case words once one is found). Finally, these words can be re-joined using Join:
string words = "OTHER COMMENTS These are other comments that would be here. " +
    "Some more comments";
string capitalWords = string.Join(" ", words
    .Split()
    .SkipWhile(word => word != word.ToUpper())
    .TakeWhile(word => word == word.ToUpper()));
You can loop through the string as an array of chars. To check whether a char is uppercase, use Char.IsUpper https://www.dotnetperls.com/char-islower. In the loop, when you reach an uppercase char, set a flag to record that you have started reading the set, and add that char to a collection of chars (the spaces between the capital words have to be kept as part of the set). Keep looping, and once you reach a char that is no longer upper case while the flag is still set, break out of the loop. Then return the collection of chars as a string.
Hope that helps.
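On the sample text a plain character loop needs two extra rules: spaces between capital words must not end the run, and the decision has to be made per word (otherwise the capital "T" of "These" would be picked up). A minimal sketch under those assumptions, where LeadingCaps and FirstUpperRun are hypothetical names and a word counts as all-caps if it has at least one uppercase letter and no lowercase letters:

```csharp
using System;

public static class LeadingCaps
{
    // Hypothetical helper. Assumption: a word is "all caps" if it contains at
    // least one uppercase letter and no lowercase letters (so "COMMENTS," counts).
    public static string FirstUpperRun(string input)
    {
        int runStart = -1;   // start index of the first all-caps word
        int runEnd = -1;     // end index (exclusive) of the last all-caps word in the run
        int wordStart = 0;   // start index of the word being scanned
        bool hasUpper = false, hasLower = false;

        for (int i = 0; i <= input.Length; i++)
        {
            if (i == input.Length || char.IsWhiteSpace(input[i]))
            {
                // End of a word; skip empty "words" caused by repeated spaces
                if (i > wordStart)
                {
                    if (hasUpper && !hasLower)
                    {
                        if (runStart < 0) runStart = wordStart;
                        runEnd = i;
                    }
                    else if (runStart >= 0)
                    {
                        break; // first word after the run that is not all caps
                    }
                }
                wordStart = i + 1;
                hasUpper = hasLower = false;
            }
            else if (char.IsUpper(input[i])) hasUpper = true;
            else if (char.IsLower(input[i])) hasLower = true;
        }

        // Return the run with its original spacing, or "" if there was none
        return runStart < 0 ? string.Empty : input.Substring(runStart, runEnd - runStart);
    }
}
```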
var input = "OTHER COMMENTS These are other comments that would be here. Some more comments";
var output = String.Join(" ", input.Split(' ').TakeWhile(w => w.ToUpper() == w));
Split it into words, then take words while the uppercase version of the word is the same as the word. Then combine them back with the space separator.
You could also use Regex:
using System.Text.RegularExpressions;
...
// The Regex pattern is one or more runs of capital letters, each followed by a non-word character.
// You may have to adjust this a bit.
Regex r = new Regex(@"([A-Z]+\W)+");
string s = "OTHER COMMENTS These are other comments that would be here. Some more comments";
MatchCollection m = r.Matches(s);
// Only return the first match if there are any matches.
if (m.Count > 0)
{
    // Reuse the collection instead of running the regex again; Trim drops the trailing space
    Console.WriteLine(m[0].Value.Trim());
}

How to save strings in an array and display the next string if a match is found?

I read a *.txt file in C# and displayed it in the console.
My text file looks like a table.
diwas hey
ivonne how
pokhara d kd
lekhanath when
dipisha dalli hos
dfsa sasf
Now I want to search for the string "pokhara"; if it is found, it should display "d kd", and if not, it should display "Not found".
What I tried:
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
foreach (string line in lines)
{
    string[] words = line.Split();
    foreach (string word in words)
    {
        if (word == "pokhara")
        {
            Console.WriteLine("Match Found");
        }
    }
}
My problem:
The match is found, but how do I display the next word of the line? Also, sometimes a value in the second row is split into two words by a space, and I need to show both words.
I guess your delimiter is the tab character; then you can use String.Split and LINQ:
var lineFields = System.IO.File.ReadLines(@"C:\readme.txt")
.Select(l => l.Split('\t'));
var matches = lineFields
.Where(arr => arr.First().Trim() == "pokhara")
.Select(arr => arr.Last().Trim());
// if you just want the first match:
string result = matches.FirstOrDefault(); // is null if not found
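If the file really uses spaces rather than tabs, one possible rule is "the key is the first token, the value is everything after it", which keeps "d kd" intact and makes it easy to print "Not found". A minimal sketch under that assumption (TableLookup and FindValue are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

public static class TableLookup
{
    private static readonly char[] Gaps = { ' ', '\t' };

    // Assumes each line is "<key><space or tab><value>": the key contains no
    // spaces, and everything after the first gap is the value (e.g. "d kd").
    // Returns null when the key is not found.
    public static string FindValue(IEnumerable<string> lines, string key)
    {
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            int gap = trimmed.IndexOfAny(Gaps);
            string first = gap < 0 ? trimmed : trimmed.Substring(0, gap);
            if (first == key)
                return gap < 0 ? string.Empty : trimmed.Substring(gap + 1).Trim();
        }
        return null;
    }
}
```

With the file from the question this would be used as Console.WriteLine(TableLookup.FindValue(System.IO.File.ReadLines(@"C:\readme.txt"), "pokhara") ?? "Not found");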
If you don't know the delimiter, as your comment suggests, you have a problem. If you don't even know the rules for how the fields are separated, it's very likely that your code is incorrect. So first determine the business logic: ask the people who created the text file. Then use the correct delimiter in String.Split.
If it's a space, you can either use string.Split() (without arguments), which splits on spaces, tabs and new-line characters, or use string.Split(' '), which splits only on the space. But note that a space is a bad delimiter if the fields can contain spaces as well. Then either use a different delimiter or wrap the fields in quoting characters, like "text with spaces". In that case I suggest a real text parser such as Microsoft.VisualBasic.FileIO.TextFieldParser, which can also be used from C#. It has a HasFieldsEnclosedInQuotes property.
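A sketch of the TextFieldParser route, assuming the file is changed so that fields containing spaces are wrapped in double quotes (the question's file is not quoted, so this is an assumption). It takes a TextReader so it can be tried on a string; TextFieldParser also has a constructor that takes a file path. On .NET Framework the project needs a reference to Microsoft.VisualBasic. QuotedTableReader and ReadRows are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedTableReader
{
    // Reads space-delimited rows in which a field containing spaces is
    // wrapped in double quotes, e.g.: pokhara "d kd"
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ");
            parser.HasFieldsEnclosedInQuotes = true;
            // The space is the delimiter, so don't also treat it as padding to trim
            parser.TrimWhiteSpace = false;
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```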
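This works:
// requires: using System;
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
bool found = false;
foreach (string line in lines)
{
    // Split on whitespace, dropping empty entries caused by repeated spaces
    string[] words = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    //I assume that the first word in every line is the key word to be found
    if (words.Length > 0 && words[0] == "pokhara")
    {
        Console.WriteLine("Match Found");
        // Re-join the remaining words with spaces so that "d kd" stays intact
        string stringTobeDisplayed = string.Join(" ", words, 1, words.Length - 1);
        Console.WriteLine(stringTobeDisplayed);
        found = true;
    }
}
if (!found)
{
    Console.WriteLine("Not found");
}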

How to extract a whole sentence by a single word match in a string?

I have a whole string (about 10k chars), and I search that string for a word (or several words) with regex(word).Matches(scrappedstring).
But how do I extract the whole sentence that contains that word? I was thinking of taking a substring from the searched word up to the first dot/exclamation mark/question mark/etc. But how do I get the part of the sentence before the searched word?
Or maybe there's a better approach?
If your boundaries are, for example, ., !, ? and ;, match all sentences with the expression [^.!?;]*(wordmatch)[^.!?;]*.
It returns every sentence that contains the desired wordmatch.
Example:
var s = "First sentence. Second with wordmatch ? Third one; The last wordmatch, EOM!";
var r = new Regex("[^.!?;]*(wordmatch)[^.!?;]*");
var m = r.Matches(s);
var result = Enumerable.Range(0, m.Count).Select(index => m[index].Value).ToList();
You can get the substrings between sentence terminators (dot/exclamation mark/question mark/etc.) and search for the word in each sentence inside a loop.
Then return the substring when you find the matching word.
Once you have the word's position, read forward to the next . or the end of the file, and also read backward from the beginning of the word to the previous . or the beginning of the file. Those two positions let you extract the sentence.
Note that it's not foolproof: in its simplest form, as outlined above, an abbreviation such as "e.g." would make it look as though the sentence started after the "g.", which is probably not the case.
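A minimal sketch of that position-based approach, assuming '.', '!' and '?' are the only sentence terminators and that a plain ordinal IndexOf is good enough to find the word (so it has the same "e.g." weakness described above). SentenceAround and SentenceContaining are hypothetical names:

```csharp
using System;

public static class SentenceAround
{
    // Assumed sentence terminators
    private static readonly char[] Terminators = { '.', '!', '?' };

    // Returns the sentence containing the first occurrence of word, or null.
    public static string SentenceContaining(string text, string word)
    {
        int pos = text.IndexOf(word, StringComparison.Ordinal);
        if (pos < 0) return null;

        // Read backwards to the previous terminator (or the start of the text)
        int start = pos == 0 ? 0 : text.LastIndexOfAny(Terminators, pos - 1) + 1;
        // Read forwards to the next terminator (or the end of the text)
        int end = text.IndexOfAny(Terminators, pos + word.Length);
        end = end < 0 ? text.Length : end + 1; // include the terminator itself

        return text.Substring(start, end - start).Trim();
    }
}
```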
Extract the sentences from the input, then search for the specified word(s) within each sentence.
Return the sentences in which the word(s) are present.
// requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
public List<string> GetMatchedString(string match, string input)
{
    var sentenceList = input.Split(new char[] { '.', '?', '!' });
    // match is treated as a regex pattern; use Regex.Escape(match) for literal text
    var regex = new Regex(match);
    return sentenceList.Where(sentence => regex.IsMatch(sentence)).ToList();
}
You can do that with a two-step process.
First split the text into phrases, then keep only the phrases that contain the word.
Something like this:
var input = "A large text with many sentences. Many chars in a string!. A sentence without the pattern word.";
// Step 1: split the text into phrases.
var patternPhrase = @"(?<=(^|[.!?]\s*))[^ .!?][^.!?]+[.!?]";
// Step 2: keep only the phrases containing the word.
var patternWord = @"many";
var result = Regex
    .Matches(input, patternPhrase) // step 1
    .Cast<Match>()
    .Select(s => s.Value)
    .Where(w => Regex.IsMatch(w, patternWord, RegexOptions.IgnoreCase)); // step 2
foreach (var item in result)
{
    // do something with each phrase.
}

Regular expression to split long strings in several lines

I'm not an expert in regular expressions, and today in my project I need to split a long string into several lines in order to check whether the text fits the page height.
I need a C# regular expression to split long strings into several lines separated by "\n" or "\r\n", with a maximum of 150 characters per line. If the 150th character is in the middle of a word, the entire word should be moved to the next line.
Can anyone help me?
It's actually quite a simple problem. Look for up to 150 characters followed by whitespace (or the end of the string). Since regex quantifiers are greedy by nature, it will do exactly what you want. Replace each match with the match plus a newline:
.{0,150}(\s+|$)
Replace with
$0\r\n
See also: http://regexhero.net/tester/?id=75645133-1de2-4d8d-a29d-90fff8b2bab5
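// The (\s+|$) part makes each match end at whitespace or at the end of the text,
// so words are not cut in half; a final empty match may add one extra "\r\n" at the end.
var regex = new Regex(@".{0,150}(\s+|$)", RegexOptions.Multiline);
var strings = regex.Replace(sourceString, "$0\r\n");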
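A sketch of the same idea that collects the lines instead of calling Replace, which avoids the extra newline an empty match can add at the end. It uses .{1,N} so every match has at least one character, and it assumes no single word is longer than the limit (such a word cannot be matched by this pattern, and parts of it would be skipped). RegexWrap and WrapWithRegex are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWrap
{
    // Hypothetical helper: wraps text at word boundaries with ".{1,N}(?:\s+|$)".
    // Assumption: no single word is longer than maxLength.
    public static string WrapWithRegex(string text, int maxLength)
    {
        var lineRegex = new Regex(@".{1," + maxLength + @"}(?:\s+|$)");
        var lines = lineRegex.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value.TrimEnd()); // drop the whitespace each match consumed
        return string.Join("\r\n", lines);
    }
}
```

For the question, the call would be RegexWrap.WrapWithRegex(sourceString, 150).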
Here you go:
^.{1,150}\n
This matches the longest initial run of up to 150 characters that ends with a newline.
If you just want to split a long string into lines of 150 chars, then I'm not sure why you'd need a regular expression:
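// requires: using System.Text;
private string stringSplitter(string inString)
{
    int lineLength = 150;
    StringBuilder sb = new StringBuilder();
    while (inString.Length > 0)
    {
        // The rest fits on one line: emit it unchanged and stop
        if (inString.Length <= lineLength)
        {
            sb.AppendLine(inString);
            break;
        }
        // Look for the last space or newline within the first lineLength + 1 chars,
        // so that a break exactly after lineLength characters is still allowed
        var lastGap = inString.Substring(0, lineLength + 1).LastIndexOfAny(new char[] { ' ', '\n' });
        if (lastGap == -1)
        {
            // No break point: hard-split the over-long word
            sb.AppendLine(inString.Substring(0, lineLength));
            inString = inString.Substring(lineLength);
        }
        else
        {
            sb.AppendLine(inString.Substring(0, lastGap));
            inString = inString.Substring(lastGap + 1);
        }
    }
    return sb.ToString();
}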
edited to account for word breaks
This code should help you. It checks the length of the current string. If it is greater than your maxLength (150 in this case), it starts at the 150th character and, going backwards, finds the first whitespace character (as described by the OP, a word is a sequence of non-space characters). It then stores the string up to and including that character and starts over with the remaining string, repeating until the remainder is no longer than maxLength characters. Finally, it joins all the pieces back together into a final string.
// requires: using System; using System.Collections.Generic;
string line = "This is a really long run-on sentence that should go for longer than 150 characters and will need to be split into two lines, but only at a word boundary.";
int maxLength = 150;
string delimiter = "\r\n";
List<string> lines = new List<string>();
// As long as we still have more than 'maxLength' characters, keep splitting
while (line.Length > maxLength)
{
    // If no whitespace is found, fall back to a hard split at maxLength
    // (without this, a single very long word would loop forever)
    int splitAt = maxLength;
    // Starting at maxLength and going backwards, look for a whitespace character
    for (int charIndex = maxLength; charIndex > 0; charIndex--)
    {
        if (char.IsWhiteSpace(line[charIndex]))
        {
            // Split the line after this character
            splitAt = charIndex + 1;
            break;
        }
    }
    // Keep the first part and continue on with the remainder
    lines.Add(line.Substring(0, splitAt));
    line = line.Substring(splitAt);
}
lines.Add(line);
// Join the list back together with delimiter ("\r\n") between each line
string final = string.Join(delimiter, lines);
// Check the results
Console.WriteLine(final);
Note: If you run this code in a console application, you may want to change "maxLength" to a smaller number so that the console doesn't wrap on you.
Note: This code does not take tab characters into account. If tabs are also included, your situation gets a bit more complicated.
Update: I fixed a bug where new lines were starting with a space.

How to remove words based on a word count

Here is what I'm trying to accomplish. I have an object coming back from the database with a string description. This description can be up to 1000 characters long, but we only want to display a short view of it. So I coded up the following, but I'm having trouble actually removing the extra words after the regular expression finds the total word count. Does anyone have a good way of displaying only the first words out of those found by Regex.Matches?
Thanks!
if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    MatchCollection wordColl = Regex.Matches(original, @"[\S]+");
    if (wordColl.Count < 70) // 70 words?
    {
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", myObject.Description);
    }
    else
    {
        string shortendText = original.Remove(200); // 200 characters?
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", shortendText);
    }
}
EDIT:
So this is what I got working on my own:
else
{
    int count = 0;
    StringBuilder builder = new StringBuilder();
    string[] workingText = original.Split(' ');
    foreach (string word in workingText)
    {
        if (count < 70)
        {
            builder.AppendFormat("{0} ", word);
        }
        count++;
    }
    string shortendText = builder.ToString();
}
It's not pretty, but it worked. I would call it a pretty naive way of doing this. Thanks for all of the suggestions!
I would opt to go by a strict character count rather than a word count because you might happen to have a lot of long words.
I might do something like this (pseudocode):
if text.Length > someLimit
    find first whitespace after someLimit (or perhaps last whitespace immediately before)
    display substring of text
else
    display text
Possible code implementation:
string TruncateText(string input, int characterLimit)
{
    if (input.Length > characterLimit)
    {
        // find last whitespace immediately before limit
        int whitespacePosition = input.Substring(0, characterLimit).LastIndexOf(' ');
        // or find first whitespace after limit (what is spec?)
        // int whitespacePosition = input.IndexOf(' ', characterLimit);
        if (whitespacePosition > -1)
            return input.Substring(0, whitespacePosition);
    }
    // short enough, or no whitespace found: return the input unchanged
    return input;
}
One method, if you're using at least C# 3.0, would be a LINQ query like the following. This assumes you're going strictly by word count, not character count.
if (wordColl.Count > 70)
{
    foreach (var subWord in wordColl.Cast<Match>().Select(r => r.Value).Take(70))
    {
        //Build string here out of subWord
    }
}
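Filling in the "Build string here" step, a minimal sketch could join the first N matches back together with single spaces. It reuses the question's [\S]+ pattern; WordLimit and FirstWords are hypothetical names, and note that the original spacing and line breaks are collapsed to single spaces:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordLimit
{
    // Hypothetical helper: keeps the first maxWords runs of non-whitespace
    public static string FirstWords(string text, int maxWords)
    {
        var words = Regex.Matches(text, @"[\S]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Take(maxWords);
        return string.Join(" ", words);
    }
}
```

For the description this would be WordLimit.FirstWords(myObject.Description, 70).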
I did a test using a simple Console.WriteLine with your Regex and your question body (which is over 70 words, it turns out).
You can use Regex Capture Groups to hold the match and access it later.
For your application, I'd recommend instead simply splitting the string by spaces and returning the first n elements of the array:
if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    string[] words = original.Split(' ');
    if (words.Length < 70)
    {
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", original);
    }
    else
    {
        string shortDesc = string.Empty;
        for (int i = 0; i < 70; i++) shortDesc += words[i] + " ";
        uxDescriptionDisplay.Text =
            string.Format("<p>{0}</p>", shortDesc.Trim());
    }
}
Do you want to remove 200 characters, or start truncating at the 200th character? When you call original.Remove(200), everything from index 200 onward is removed. This is how you use Remove() to remove a certain number of characters:
string shortendText = original.Remove(0,200);
This starts at the first character and removes 200 characters beginning with that one, which I imagine is not what you're trying to do, since you're shortening a description. It just shows the correct way to use the two-argument Remove().
Instead of using Regex MatchCollections, why not just split the string? It's a lot easier and more straightforward. You can set the delimiter to a space character and split that way. I'm not sure whether that completely meets your need, and I'm not sure what the data in your description looks like, but you split this way:
String[] wordArray = original.Split(' ');
From there you can determine the word count with wordArray's Length property value.
If I were you, I would go by characters, as you may have many one-letter words or many long words in your text.
Take characters up to your limit, then either find the next space and copy the characters up to it into a new string (possibly using the Substring method), or take exactly those characters and append a few full stops to make the new string. The latter could look unprofessional, I suppose.
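A sketch of the character-based idea with the "full stops" variant: cut at the last space before the limit when there is one, otherwise cut exactly at the limit, and append "..." only when something was removed. Ellipsis, TruncateWithEllipsis and the choice of three dots are assumptions for illustration:

```csharp
using System;

public static class Ellipsis
{
    public static string TruncateWithEllipsis(string text, int limit)
    {
        if (text.Length <= limit)
            return text; // already short enough, nothing to add

        // Prefer to cut at the last space at or before the limit so no word is split
        int cut = text.LastIndexOf(' ', limit);
        if (cut <= 0)
            cut = limit; // no usable space: hard cut at the limit
        return text.Substring(0, cut).TrimEnd() + "...";
    }
}
```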
