I have user input from a text box in C# and a CSV file of words. For example, the user input is "Wow that's cool LOL" and the CSV file contains:
LOL Laughing Out Loud
ROFL Rolling On Floor Laughing
How would I compare the input text to find any matches in the file? How would I load the file?
You can do:
string input = "Wow that's cool LOL";
string[] splitArray = input.Split();
bool ifFound = File.ReadLines("yourCSVFilePath.csv")
.Any(line => splitArray.Any(line.Contains));
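It does the following:
Splits the input into an array of words to be compared individually.
Loads the file line by line (lazy loading).
Checks whether any word in the split array occurs anywhere in a line. Note that Contains is a substring test, so "LOL" would also match a line that contains "LOLLIPOP".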
If you want to perform string comparison ignoring case then you can do:
bool ifFound = File.ReadLines("yourCSVFilePath.csv")
.Any(line =>
splitArray.Any(sa =>
line.IndexOf(sa, StringComparison.CurrentCultureIgnoreCase) > -1));
You can use File.ReadLines to read the lines from the csv file and LINQ to filter them:
string input = "Wow that's cool LOL";
string[] words = input.Split();
char delimiter = '\t'; // for example
IEnumerable<string> matchingLines = File.ReadLines("yourCSVFilePath.csv")
.Where(line => words.Intersect(line.Split(delimiter)).Any())
.ToList();
The Intersect...Any approach is an optimized version of this alternative query:
......
.Where(line => words.Any(word => line.Split(delimiter).Contains(word)))
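If the goal is to return the expansion of each matched abbreviation rather than just the matching line, a dictionary lookup may be simpler. The following is a minimal sketch, assuming each line of the file starts with the abbreviation followed by whitespace and then its expansion. The AbbreviationLookup class and its method names are hypothetical and only for illustration; the sample lines are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AbbreviationLookup
{
    // Builds an abbreviation -> expansion map from lines such as "LOL Laughing Out Loud".
    // Assumption: the first whitespace-separated token on each line is the abbreviation.
    public static Dictionary<string, string> Load(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in lines)
        {
            // Split into at most two parts: the abbreviation and the rest of the line.
            var parts = line.Split((char[])null, 2, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 2)
                map[parts[0]] = parts[1].Trim();
        }
        return map;
    }

    // Returns the expansion of every input word that is a key in the map (whole-word match).
    public static List<string> FindMatches(string input, Dictionary<string, string> map)
    {
        return input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                    .Where(word => map.ContainsKey(word))
                    .Select(word => map[word])
                    .ToList();
    }

    public static void Main()
    {
        // In a real program the lines would come from File.ReadLines(path).
        var map = Load(new[] { "LOL Laughing Out Loud", "ROFL Rolling On Floor Laughing" });
        foreach (var expansion in FindMatches("Wow that's cool LOL", map))
            Console.WriteLine(expansion);
    }
}
```

Because the dictionary uses StringComparer.OrdinalIgnoreCase, "lol" and "LOL" are treated as the same key. Punctuation attached to a word (for example "LOL!") would still prevent a match in this sketch.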
I have a list and a text file, and I want to:
Find all list items that also occur in the text (matched words) and
store them in a list or array
Replace all the matched words with "Names"
Count the matched words
The code works, but it takes about 10 minutes to execute, and I want to improve its performance. I also tried using Contains instead of the regex, but that changes the behavior, because I need to match whole words, not substrings.
Here is the code:
List<string> names = new List<string>();
// names = millions of values from the database
string text = System.IO.File.ReadAllText(@"D:\record-13.txt");
var letter = new Regex(@"(?<letter>\W)");
var pattern = string.Join("|", names
.Select(n => $@"((?<=(^|\W)){letter.Replace(n, "[${letter}]")}(?=($|\W)))"));
var regex = new Regex(pattern);
var matchedWords = regex
.Matches(text)
.Cast<Match>()
.Select(m => m.Value)
//.Distinct()
.ToList();
text = regex.Replace(text, "Names");
Console.WriteLine($"Matched Words: {string.Join(", ", matchedWords.Distinct())}");
Console.WriteLine($"Count: {matchedWords.Count}");
Console.WriteLine($"Replaced Text: {text}");
Is there an alternative way to do the same thing as the code above, with better performance?
What you are doing is building a regular expression with "millions" of strings embedded in it, if Names really contains "millions" of strings. This is going to perform very poorly, even just to parse the regular expression, let alone evaluate it.
What you should do instead is load your Names into a HashSet<string>, then parse through the document one time, pulling out whole words. You can use a regular expression or write a state machine to do this. For each "word" you read, check if it exists in the HashSet<string> of names, and if so, write "Names" to your output (a StringBuilder would be ideal for this). If the word is not in the Names hashset, write the actual word to your output. Be sure to also write any non-word characters to the output as you encounter them. When you are done, your output will contain the sanitized result, and it should complete in milliseconds rather than minutes.
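The approach described above could be sketched roughly as follows. This assumes every name is a single run of word characters (names containing spaces or punctuation would need extra handling), and the NameScrubber class and Scrub method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class NameScrubber
{
    // Matches runs of word characters; the text between matches is copied unchanged.
    private static readonly Regex Word = new Regex(@"\w+", RegexOptions.Compiled);

    // Replaces every whole word found in 'names' with "Names" and records each hit in 'matched'.
    public static string Scrub(string text, HashSet<string> names, List<string> matched)
    {
        var output = new StringBuilder(text.Length);
        int last = 0;
        foreach (Match m in Word.Matches(text))
        {
            // Copy the non-word characters that precede this word.
            output.Append(text, last, m.Index - last);
            if (names.Contains(m.Value))
            {
                matched.Add(m.Value);
                output.Append("Names");
            }
            else
            {
                output.Append(m.Value);
            }
            last = m.Index + m.Length;
        }
        // Copy whatever follows the last word.
        output.Append(text, last, text.Length - last);
        return output.ToString();
    }

    public static void Main()
    {
        var names = new HashSet<string> { "John", "Mary" };
        var matched = new List<string>();
        string result = Scrub("John met Johnny and Mary.", names, matched);
        Console.WriteLine($"Matched Words: {string.Join(", ", matched)}");
        Console.WriteLine($"Count: {matched.Count}");
        Console.WriteLine($"Replaced Text: {result}");
    }
}
```

Each word costs one hash lookup, so the running time grows with the length of the text rather than with the number of names. Passing StringComparer.OrdinalIgnoreCase to the HashSet constructor would make the match case-insensitive.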
If I understand what you want, I think you can use this code instead.
If you can ignore the resulting matched words and count:
text = names.Select(name => $@"\b{Regex.Escape(name)}\b")
.Aggregate(text, (current, pattern) => Regex.Replace(current, pattern, "Names"));
Else:
var count = 0;
var matchedWord = new List<string>();
foreach (var name in names)
{
var regex = new Regex($@"\b{Regex.Escape(name)}\b");
if (regex.IsMatch(text))
{
count++;
matchedWord.Add(name);
}
text = regex.Replace(text, "Names");
}
I read a *.txt file in C# and displayed it in the console.
My text file looks like a table.
diwas hey
ivonne how
pokhara d kd
lekhanath when
dipisha dalli hos
dfsa sasf
Now I want to search for the string "pokhara". If it is found, the program should display "d kd"; if not, it should display "Not found".
What I tried:
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
foreach(string line in lines)
{
string [] words = line.Split();
foreach(string word in words)
{
if (word=="pokhara")
{
Console.WriteLine("Match Found");
}
}
}
My Problem:
The match was found, but how do I display the next word on the line? Also, sometimes
the value in the second column consists of two words separated by a space, and I need to show both words.
I guess your delimiter is the tab character; then you can use String.Split and LINQ:
var lineFields = System.IO.File.ReadLines(@"C:\readme.txt")
.Select(l => l.Split('\t'));
var matches = lineFields
.Where(arr => arr.First().Trim() == "pokhara")
.Select(arr => arr.Last().Trim());
// if you just want the first match:
string result = matches.FirstOrDefault(); // is null if not found
If you don't know the delimiter, as your comment suggests, you have a problem. If you don't even know the rules for how the fields are separated, it's very likely that your code is incorrect. So first determine the business logic: ask the people who created the text file. Then use the correct delimiter in String.Split.
If it's a space, you can either use string.Split() (without arguments), which splits on spaces, tabs and new-line characters, or use string.Split(' '), which splits only on the space. But note that a space is a bad delimiter if the fields can contain spaces as well. In that case, either use a different delimiter or wrap the fields in quoting characters, like "text with spaces". But then I suggest a real text parser like Microsoft.VisualBasic.FileIO.TextFieldParser, which can also be used in C#. It has a HasFieldsEnclosedInQuotes property.
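A minimal sketch of the TextFieldParser suggestion, assuming the file were changed so that fields are separated by a single space and values containing spaces are wrapped in double quotes (for example: pokhara "d kd"). The QuotedFieldDemo class and Lookup method are hypothetical names; the parser is fed from a StringReader here only to keep the example self-contained.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFieldDemo
{
    // Returns the second field of the first row whose first field equals 'key', or null.
    public static string Lookup(TextReader reader, string key)
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ");               // assumption: single-space delimiter
            parser.HasFieldsEnclosedInQuotes = true; // lets "d kd" stay one field
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields != null && fields.Length >= 2 && fields[0] == key)
                    return fields[1];
            }
        }
        return null;
    }

    public static void Main()
    {
        string sample = "diwas hey\npokhara \"d kd\"\nlekhanath when\n";
        string value = Lookup(new StringReader(sample), "pokhara");
        Console.WriteLine(value ?? "Not found");
    }
}
```

To read from disk instead, the parser can be constructed directly from a file path with new TextFieldParser(path). On .NET Framework this requires a reference to the Microsoft.VisualBasic assembly.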
This works:
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
foreach (string line in lines)
{
string[] words = line.Split();
// I assume that the first word in every line is the key word to be found
if (words[0].Trim() == "pokhara")
{
Console.WriteLine("Match Found");
// Join the remaining words with spaces so that "d kd" is displayed intact
string stringToBeDisplayed = string.Join(" ", words, 1, words.Length - 1);
Console.WriteLine(stringToBeDisplayed);
}
}
In my WPF application I have a text box named: textBox.
I'm trying to get each individual word from a sentence typed by user in a string array, say arrayWords.
I found a piece of code on Stack Overflow that counts the number of words, but I want to copy each individual word.
Below is the code for counting the number of words.
String text = textBox.Text.Trim();
int wordCount = 0, index = 0;
while (index < text.Length)
{
// check if current char is part of a word
while (index < text.Length && Char.IsWhiteSpace(text[index]) == false)
index++;
wordCount++;
// skip whitespace until next word
while (index < text.Length && Char.IsWhiteSpace(text[index]) == true)
index++;
}
You can use the String.Split function.
String text = textBox.Text.Trim();
var words = text.Split(' ');
or
var words = text.Split(); // Default parameter is taken as whitespace delimiter
While @dotNET's answer is on the right track, it assumes that you maintain the list of punctuation marks yourself (and the list in that answer is incomplete). Besides, there could be hyphenated words.
I'd recommend using a regular expression:
var words = Regex.Matches(textBox.Text, @"\w+(?:-\w+)*")
.OfType<Match>()
.Select(m => m.Value)
.ToArray();
String.Split() can chop your sentence(s) into words. You should however take care of trimming punctuation characters from your words. For example if you use Split() on the sentence "StackOverflow is good, I like it.", two of the words you get in your array will have comma and period characters appended to them. So you should use something like this to get "pure" words:
string[] words = textBox.Text.Split().Select(x => x.TrimEnd(",.;:-".ToCharArray())).ToArray();
LINQ has been used in the above statement, so you should import System.Linq.
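As a variation on the trimming idea above, the framework's own char.IsPunctuation classification can be used instead of a hand-written character list. This is only a sketch: the WordSplitter class and its method names are hypothetical, and the input string stands in for textBox.Text.

```csharp
using System;
using System.Linq;

public static class WordSplitter
{
    // Splits on any whitespace, then strips punctuation from both ends of each token.
    public static string[] GetWords(string sentence)
    {
        return sentence
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(TrimPunctuation)
            .Where(word => word.Length > 0) // drop tokens that were only punctuation
            .ToArray();
    }

    private static string TrimPunctuation(string token)
    {
        int start = 0, end = token.Length;
        while (start < end && char.IsPunctuation(token[start])) start++;
        while (end > start && char.IsPunctuation(token[end - 1])) end--;
        return token.Substring(start, end - start);
    }

    public static void Main()
    {
        // In the WPF application the argument would be textBox.Text.
        foreach (string word in GetWords("StackOverflow is good, I like it."))
            Console.WriteLine(word);
    }
}
```

Because only leading and trailing punctuation is removed, a hyphen or apostrophe inside a word (as in "well-known" or "don't") is preserved.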
Either of the following lines will give an array of words from your textBox.
string[] words = textBox.Text.Split(' ');
// or, equivalently, with an explicit separator array:
string[] words = textBox.Text.Split(new[] { ' ' });
The logic behind getting words from a sentence is that you first split the sentence into words and store them in a string array; then you can do anything you want with them. The code below should help you solve your problem.
static void Main(string[] args)
{
string sentence = "Thats the sentence";
string[] operands = Regex.Split(sentence, @" ");
foreach(string i in operands)
{
Console.WriteLine(i);
}
Console.ReadLine();
}
It extracts the words from the sentence, stores them in an array, and displays them.
I have a whole string (about 10k chars) and I am searching it for a word (or many words) with regex(word).Matches(scrappedstring).
But how can I extract the whole sentence that contains that word? I was thinking of taking a substring from the searched word up to the first dot/exclamation mark/question mark/etc. But how do I take the part of the sentence before the searched word?
Or maybe there's a better approach?
If your boundaries are e.g. ., !, ? and ;, match all sentences with the expression [^.!?;]*(wordmatch)[^.!?;]*.
It will give all sentences with the desired wordmatch inside.
Example:
var s = "First sentence. Second with wordmatch ? Third one; The last wordmatch, EOM!";
var r = new Regex("[^.!?;]*(wordmatch)[^.!?;]*");
var m = r.Matches(s);
var result = Enumerable.Range(0, m.Count).Select(index => m[index].Value).ToList();
You can get the substrings between sentence terminators (dot/exclamation mark/question mark/etc.) and search for the word in each sentence inside a loop.
Then return the substring when you find the matching word.
Once you have the position of the word, read forward to the next . or the end of the file, and read backwards from the beginning of the word to the previous . or the beginning of the file. Those two positions let you extract the sentence.
Note that this is not foolproof. In its simplest form, as outlined above, an abbreviation such as "e.g." in the text would make the sentence appear to start after the "g.", which is probably not the case.
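The backward and forward scan can be sketched with IndexOf, LastIndexOfAny and IndexOfAny. This is a simplified illustration with hypothetical names (SentenceFinder, SentenceContaining); it treats only '.', '!' and '?' as sentence terminators, returns just the first hit, and has the abbreviation weakness already mentioned.

```csharp
using System;

public static class SentenceFinder
{
    private static readonly char[] Terminators = { '.', '!', '?' };

    // Returns the sentence around the first occurrence of 'word', or null if it is absent.
    // Note: IndexOf is a substring search, so "cat" would also be found inside "category".
    public static string SentenceContaining(string text, string word)
    {
        int pos = text.IndexOf(word, StringComparison.OrdinalIgnoreCase);
        if (pos < 0) return null;

        // Scan backwards for the previous terminator; start right after it (or at 0).
        int start = pos == 0 ? 0 : text.LastIndexOfAny(Terminators, pos - 1) + 1;

        // Scan forwards for the next terminator; fall back to the end of the text.
        int end = text.IndexOfAny(Terminators, pos + word.Length);
        if (end < 0) end = text.Length - 1;

        return text.Substring(start, end - start + 1).Trim();
    }

    public static void Main()
    {
        string text = "It rained. The cat sat on the mat! Then it left.";
        Console.WriteLine(SentenceContaining(text, "cat") ?? "Not found");
    }
}
```

To collect every sentence containing the word, the same scan could be repeated from the end of each found sentence.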
Extract the sentences from the input. Then search for the specified word(s) within each sentence.
Return the sentences where the word(s) is present.
public List<string> GetMatchedString(string match, string input)
{
var sentenceList = input.Split(new char[] { '.', '?', '!' });
var regex = new Regex(match);
return sentenceList.Where(sentence => regex.IsMatch(sentence)).ToList();
}
You can do that in two steps.
First split the text into phrases, then keep only the phrases that contain the word.
Something like this:
var input = "A large text with many sentences. Many chars in a string!. A sentence without the pattern word.";
//Step 1: fragment phrase.
var patternPhrase = @"(?<=(^|[.!?]\s*))[^ .!?][^.!?]+[.!?]";
//Step 2: filter out only the phrases containing the word.
var patternWord = @"many";
var result = Regex
.Matches(input, patternPhrase) // step 1
.Cast<Match>()
.Select(s => s.Value)
.Where(w => Regex.IsMatch(w, patternWord, RegexOptions.IgnoreCase)); // step 2
foreach (var item in result)
{
//do something with any phrase.
}
My double data is in a text file like below:
1.4 2.3 3.4
2.2 2.5 2.5
I simply want to read this data from the file
and store it in an array.
Please help me.
I'm a beginner in C#.
You can use LINQ:
double[] numbers =
File.ReadAllLines(path)
.Select(s => double.Parse(s))
.ToArray();
If each line can have multiple numbers, you'll need to split the lines:
double[] numbers =
File.ReadAllLines(path)
.SelectMany(s => s.Split(' '))
.Select(s => double.Parse(s))
.ToArray();
You can also use a normal loop:
List<double> numbers = new List<double>();
foreach(string line in File.ReadAllLines(path)) {
numbers.Add(Double.Parse(line));
}
Or, to split them,
List<double> numbers = new List<double>();
foreach(string line in File.ReadAllLines(path)) {
foreach(string word in line.Split(' ')) {
numbers.Add(Double.Parse(word));
}
}
Consult MSDN with the following pointers...
File class to read the file contents as string
String.Split to split the numbers based on your delimiter
and Double.Parse to convert from string into double.
var values = from value in File.ReadAllText("C:\\Yourfilename.txt").Split(' ')
select double.Parse(value);
Try this; it should be somewhat more bulletproof than some of the other answers. However, it does not check for invalid entries that can't be parsed.
var doubles = File.ReadAllText(path)
.Split(new[] {" ", "\r\n", "\n"}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => double.Parse(s, CultureInfo.GetCultureInfo("en-US"))).ToArray();
This will work if the doubles are separated by either a space or a line break, and it also works if there are multiple spaces or line breaks. I live in Denmark, so I set the culture info explicitly for parsing.
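If invalid entries also need to be handled, double.TryParse can be used in place of double.Parse. The sketch below is one possible way to do that: the DoubleReader class and ParseDoubles method are hypothetical names, and CultureInfo.InvariantCulture is used on the assumption that the file always uses '.' as the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DoubleReader
{
    // Parses every whitespace-separated token; tokens that are not numbers go to 'rejected'.
    public static List<double> ParseDoubles(string content, List<string> rejected)
    {
        var numbers = new List<double>();
        foreach (string token in content.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double value))
                numbers.Add(value);
            else
                rejected.Add(token);
        }
        return numbers;
    }

    public static void Main()
    {
        // In a real program the content would come from File.ReadAllText(path).
        var rejected = new List<string>();
        double[] numbers = ParseDoubles("1.4 2.3 3.4\r\n2.2 2.5 2.5", rejected).ToArray();
        Console.WriteLine($"Parsed {numbers.Length} values, rejected {rejected.Count}");
    }
}
```

Passing null as the separator makes Split treat any whitespace, including tabs and line breaks, as a delimiter, so the separator list does not have to be spelled out.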