Search and count specific words from a text file - c#

I would like to search for a specific set of words (or, for now, one word: "Jude"). This is my current code. I can read the file and it separates the words; comparing them to a word is the problem. (At the moment it is rigged up to just count words, and the output is correct.)
Many Thanks
-Fred
String theLine;
string theFile;
int counter = 0;
string[] fields = null;
string delim = " ,.";
Console.WriteLine("Please enter a filename:");
theFile = Console.ReadLine();
System.IO.StreamReader sr =
new System.IO.StreamReader(theFile);
while (!sr.EndOfStream)
{
theLine = sr.ReadLine();
theLine = theLine.Trim(); // Trim() returns a new string; it does not modify theLine in place
fields = theLine.Split(delim.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
counter += fields.Length;
}
sr.Close();
Console.WriteLine("The word count is {0}", counter);
Console.ReadLine();
}

Using LINQ, you can enumerate the lines of the file, then count the number of occurrences of your word or words in each line and sum the counts together:
Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();
var delim = " ,.".ToCharArray();
var countWords = new HashSet<string>(new[] { "Jude" }.Select(w => w.ToUpperInvariant()));
var count = File.ReadLines(theFile).Select(l => l.Split(delim, StringSplitOptions.RemoveEmptyEntries).Count(w => countWords.Contains(w.ToUpperInvariant()))).Sum();
Console.WriteLine("The word count is {0}", count);
If you prefer @Dai's regex pattern approach, you can use it to count the occurrences in each line, still using LINQ to process the lines and sum the counts:
Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();
var delim = " ,.".ToCharArray();
var countWords = new[] { "Jude" };
var wordPattern = new Regex(@"\b(?:"+String.Join("|", countWords)+@")\b", RegexOptions.Compiled|RegexOptions.IgnoreCase);
var count = File.ReadLines(theFile).Select(l => wordPattern.Matches(l).Count).Sum();
Console.WriteLine("The word count is {0}", count);

Avoid new object allocations inside tight loops, in particular:
Don't use String.Split(), as it allocates a new string for every word.
Avoid calling ToCharArray() inside the loop too; call it once and cache the result.
Use a using block to ensure IDisposable objects are always disposed.
I recommend using a Regex instead:
Regex regex = new Regex( @"\bJude\b", RegexOptions.Compiled | RegexOptions.IgnoreCase );
Int32 count = 0;
using( StreamReader rdr = new StreamReader( theFile ) )
{
String line;
while( ( line = rdr.ReadLine() ) != null )
{
count += regex.Matches( line ).Count;
}
}
The \b escape matches a "word-boundary", such as the start and end of strings and punctuation, so it will match "Jude" in the following examples: "Jude", "Jude foo", "Foo Jude", "Hello. Jude." but not "JudeJude".
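To check the word-boundary behaviour described above, a small sketch like the following may help. The class name WordBoundaryDemo and the helper CountJude are made up for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Same pattern as in the answer above; CountJude is a hypothetical helper name.
    static readonly Regex JudeRegex =
        new Regex(@"\bJude\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Counts whole-word, case-insensitive occurrences of "Jude" in one line.
    public static int CountJude(string line)
    {
        return JudeRegex.Matches(line).Count;
    }

    static void Main()
    {
        Console.WriteLine(CountJude("Hello. Jude."));   // 1
        Console.WriteLine(CountJude("hey jude, Jude")); // 2 (IgnoreCase)
        Console.WriteLine(CountJude("JudeJude"));       // 0 (no boundary between the two words)
    }
}
```

Summing CountJude over every line of the file gives the same total as the loop in the answer.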

Related

Repeat substrings N times

I receive a series of strings, each followed by a non-negative number, e.g. "a3". I have to print each string to the console, in uppercase, repeated N times, where N is the number that follows it in the input. In the example, the result is "AAA". As you can see, I have tried to extract the numbers from the input, and I think that part works fine. Can you help me with the repeating?
string input = Console.ReadLine();
//input = "aSd2&5s#1"
MatchCollection matched = Regex.Matches(input, @"\d+");
List<int> repeatsCount = new List<int>();
foreach (Match match in matched)
{
int repeatCount = int.Parse(match.Value);
repeatsCount.Add(repeatCount);
}
//repeatsCount: [2, 5, 1]
//expected output: ASDASD&&&&&S# ("aSd" is converted to "ASD" and repeated twice;
// "&" is repeated 5 times; "s#" is converted to "S#" and repeated once.)
For example, if we have "aSd2&5s#1":
"aSd" is converted to "ASD" and repeated twice; "&" is repeated 5 times; "s#" is converted to "S#" and repeated once.
Let the pattern include two groups: the value to repeat and how many times to repeat it:
@"(?<value>[^0-9]+)(?<times>[0-9]+)"
Then we can operate on these groups, say, with the help of LINQ:
string source = "aSd2&5s#1";
string result = string.Concat(Regex
.Matches(source, @"(?<value>[^0-9]+)(?<times>[0-9]+)")
.OfType<Match>()
.SelectMany(match => Enumerable // for each match
.Repeat(match.Groups["value"].Value.ToUpper(), // repeat "value"
int.Parse(match.Groups["times"].Value)))); // "times" times
Console.Write(result);
Outcome:
ASDASD&&&&&S#
Edit: Same idea without Linq:
StringBuilder sb = new StringBuilder();
foreach (Match match in Regex.Matches(source, @"(?<value>[^0-9]+)(?<times>[0-9]+)")) {
string value = match.Groups["value"].Value.ToUpper();
int times = int.Parse(match.Groups["times"].Value);
for (int i = 0; i < times; ++i)
sb.Append(value);
}
string result = sb.ToString();
You can extract each substring and how often it should be repeated with this regex:
(?<content>.+?)(?<count>\d+)
Now you can use a StringBuilder to create output string. Full code:
var input = "aSd2&5s#1";
var regex = new Regex("(?<content>.+?)(?<count>\\d+)");
var matches = regex.Matches(input).Cast<Match>();
var sb = new StringBuilder();
foreach (var match in matches)
{
var count = int.Parse(match.Groups["count"].Value);
for (var i = 0; i < count; ++i)
sb.Append(match.Groups["content"].Value.ToUpper());
}
Console.WriteLine(sb.ToString());
Output is
ASDASD&&&&&S#
Another solution without LINQ.
I tried to keep the solution similar to yours:
string input = "aSd2&5s#1";
var matched = Regex.Matches(input, @"\d+");
var builder = new StringBuilder();
int start = 0; // index where the current text segment begins
foreach (Match match in matched)
{
    // The text to repeat is everything between the previous number and this one
    string stringToDuplicate = input.Substring(start, match.Index - start);
    start = match.Index + match.Length;
    // Using match.Index avoids Char.Parse, which throws on multi-digit counts
    int times = Convert.ToInt32(match.Value);
    for (int i = 0; i < times; i++)
    {
        builder.Append(stringToDuplicate.ToUpper());
    }
}
and finally Console.WriteLine(builder.ToString());
which results in ASDASD&&&&&S#
My solution is the same as the others, with slight differences:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication107
{
class Program
{
static void Main(string[] args)
{
string input = "aSd2&5s#1";
string pattern1 = @"[a-zA-Z#&]+\d+";
MatchCollection matches = Regex.Matches(input, pattern1);
string output = "";
foreach(Match match in matches.Cast<Match>().ToList())
{
string pattern2 = @"(?'string'[^\d]+)(?'number'\d+)";
Match match2 = Regex.Match(match.Value, pattern2);
int number = int.Parse(match2.Groups["number"].Value);
string str = match2.Groups["string"].Value;
output += string.Join("",Enumerable.Repeat(str.ToUpper(), number));
}
Console.WriteLine(output);
Console.ReadLine();
}
}
}
A very simple program: no LINQ, nothing fancy, just plain strings and a for loop.
string input = "aSd2&5s#1";
char[] inputArray = input.ToCharArray();
string output = "";
string ab = "";
foreach (char c in inputArray)
{
int x; // single-digit repeat count; multi-digit counts are not handled
if(int.TryParse(c.ToString(), out x))
{
string sb = "";
ab = ab.ToUpper();
for(int i=0;i<x;i++)
{
sb += ab;
}
ab = "";
output += sb;
}
else
{
ab += c;
}
}
if(!string.IsNullOrEmpty(ab))
{
output += ab.ToUpper();
}
Console.WriteLine(output);
Hope it helps.

Alternate upper- and lowercase words in a string

I use Visual Studio 2010.
I have an array strings[] = { "eat and go" };
I display it with foreach.
I want to convert strings like this: EAT and GO
Here is my code:
Console.Write( myString.First().ToString().ToUpper() + String.Join("", myString.Skip(1)).ToLower() + "\n");
But the output is: Eat and go. :D lol
Could you help me? I would appreciate it. Thanks
While .ToUpper() converts a string to its upper-case equivalent, calling .First() on a string returns only its first character, because a string is a sequence of char values. First() is a LINQ extension method and works on any IEnumerable<T>. That is why only the first letter is upper-cased.
As with many string handling functions, there are a number of ways to handle it, and this is my approach. Obviously you'll need to validate value to ensure it's being given a long enough string.
using System.Text;
public string CapitalizeFirstAndLast(string value)
{
string[] words = value.Split(' '); // break into individual words
StringBuilder result = new StringBuilder();
// Add the first word capitalized
result.Append(words[0].ToUpper());
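// Add everything else, restoring the spaces removed by Split
for (int i = 1; i < words.Length - 1; i++)
result.Append(' ').Append(words[i]);
// Add the last word capitalized (unless there is only one word)
if (words.Length > 1)
result.Append(' ').Append(words[words.Length - 1].ToUpper());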
return result.ToString();
}
If it's always going to be a three-word string, then you can simply do it like this:
string[] mystring = {"eat and go", "fast and slow"};
foreach (var s in mystring)
{
string[] toUpperLower = s.Split(' ');
Console.Write(toUpperLower.First().ToUpper() + " " + toUpperLower[1].ToLower() +" " + toUpperLower.Last().ToUpper());
}
If you want to continuously alternate, you can do the following:
private static string alternateCase( string phrase )
{
    String[] words = phrase.Split(' ');
    StringBuilder builder = new StringBuilder();
    //create a flag that keeps track of the case change
    bool upperToggle = true;
    //loops through the words
    for(int i = 0; i < words.Length; i++)
    {
        if(upperToggle)
            //converts to upper if flag is true
            words[i] = words[i].ToUpper();
        else
            //converts to lower if flag is false
            words[i] = words[i].ToLower();
        upperToggle = !upperToggle;
        //adds the word to the string builder, separated by a space
        if(i > 0)
            builder.Append(' ');
        builder.Append(words[i]);
    }
    //returns the new string
    return builder.ToString();
}
Quickie using ScriptCS:
scriptcs (ctrl-c to exit)
> var input = "Eat and go";
> var words = input.Split(' ');
> var result = string.Join(" ", words.Select((s, i) => i % 2 == 0 ? s.ToUpperInvariant() : s.ToLowerInvariant()));
> result
"EAT and GO"

How can I capitalize every third letter of a string in C#?

How can I capitalize every third letter of a string in C#?
I loop through the whole string with a for loop, but I can't think of the sequence right now.
I suspect you just want something like this:
// String is immutable; copy to a char[] so we can modify that in-place
char[] chars = input.ToCharArray();
for (int i = 0; i < chars.Length; i += 3)
{
chars[i] = char.ToUpper(chars[i]);
}
// Now construct a new String from the modified character array
string output = new string(chars);
That assumes you want to start capitalizing from the first letter, so "abcdefghij" would become "AbcDefGhiJ". If you want to start capitalizing elsewhere, just change the initial value of i.
var s = "Lorem ipsum";
var foo = new string(s
.Select((c, i) => (i + 1) % 3 == 0 ? Char.ToUpper(c) : c)
.ToArray());
You are already looping through the characters inside the string? Then add a counter, increment it on each iteration, and when it reaches 3, use char.ToUpper(currentCharacter) to make that character upper case. Then reset your counter.
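The counter idea above could be sketched roughly as follows. The names EveryThird and CapitalizeEveryThird are hypothetical, and the sketch assumes "every third letter" means the 3rd, 6th, 9th character and so on.

```csharp
using System;
using System.Text;

static class EveryThird
{
    // Hypothetical helper: upper-cases the 3rd, 6th, 9th, ... character.
    public static string CapitalizeEveryThird(string input)
    {
        var sb = new StringBuilder(input.Length);
        int counter = 0;
        foreach (char c in input)
        {
            counter++;
            if (counter == 3)
            {
                sb.Append(char.ToUpper(c)); // third character since the last reset
                counter = 0;                // reset and start counting again
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(CapitalizeEveryThird("abcdefghij")); // abCdeFghIj
    }
}
```

A StringBuilder is used because strings are immutable; to start capitalizing from the first character instead, the counter could start at 2.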
You could just use a regular expression.
If the answer is every third char then you want
var input = "sdkgjslgjsklvaswlet";
var regex = new Regex("(..)(.)");
var replacement = regex.Replace(input, delegate(Match m)
{
return m.Groups[1].Value + m.Groups[2].Value.ToUpper();
});
If you want every third character, but starting with the first you want:
var input = "sdkgjslgjsklvaswlet";
var regex = new Regex("(.)(..)");
var replacement = regex.Replace(input, delegate(Match m)
{
return m.Groups[1].Value.ToUpper() + m.Groups[2].Value;
});
If you want a loop, you can convert to a character array first, so you can alter the values.
For every third character:
var x = input.ToCharArray();
for (var i = 2; i <x.Length; i+=3) {
x[i] = char.ToUpper(x[i]);
}
var replacement = new string(x);
For every third character from the beginning:
var x = input.ToCharArray();
for (var i = 0; i <x.Length; i+=3) {
x[i] = char.ToUpper(x[i]);
}
var replacement = new string(x);

How to get a specific column value in CSV using C#?

I am working on a C# WinForms project.
I want to get the first column's values from a CSV file.
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
listA.Add(values[0]);
}
------------------
no |name |
------------------
1 |wwwwww
2 |yyyyy
3 |aaaaa
4 |bbbbbb
This is the code I am using now. It gives the values row by row. I want all the name values in listA.
Does anyone have an idea?
There is no way to read a column in a CSV file without reading the whole file. You can use a wrapper (for example, the LINQ to CSV library), but it will just hide the reading operation.
Yes - you're currently splitting on ;
Try using a comma instead.
Better to use a dedicated library, by the way...
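For a simple file, splitting each line on the delimiter and picking one index may be enough. The following is only a sketch: CsvColumnSketch and GetColumn are hypothetical names, the sample rows are taken from the table in the question, and it assumes plain comma-separated values with no quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CsvColumnSketch
{
    // Hypothetical helper: returns the value at columnIndex from every line.
    // Assumes a plain delimiter-separated format with no quoted fields.
    public static List<string> GetColumn(IEnumerable<string> lines, int columnIndex, char delimiter = ',')
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))   // ignore blank lines
            .Select(line => line.Split(delimiter))
            .Where(fields => fields.Length > columnIndex)      // ignore short rows
            .Select(fields => fields[columnIndex].Trim())
            .ToList();
    }

    static void Main()
    {
        var lines = new[] { "no,name", "1,wwwwww", "2,yyyyy" };
        // Skip(1) drops the header row; index 1 is the "name" column.
        List<string> names = GetColumn(lines.Skip(1), 1);
        Console.WriteLine(string.Join(";", names)); // wwwwww;yyyyy
    }
}
```

To read from disk, the lines could come from File.ReadLines(path) in System.IO, which still reads the whole file but only one line at a time.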
Some frown upon Regex, but I think it provides good flexibility. Here is an example inspired by Adrian Mejia. Basically, you can choose particular characters between which the delimiter is valid in context, i.e. a comma in "hello, world" or 'hello, world' would be valid.
static void Main(string[] args)
{
string csv = "Hello,1,3.5,25,\"speech marks\",'inverted commas'\r\nWorld,2,4,60,\"again, more speech marks\",'something else in inverted commas, with a comma'";
// General way to create grouping constructs which are valid 'text' fields
string p = "{0}([^{0}]*){0}"; // match group '([^']*)' (inverted commas) or \"([^\"]*)\" (speech marks)
string c = "(?<={0}|^)([^{0}]*)(?:{0}|$)"; // commas or other delimiter group (?<=,|^)([^,]*)(?:,|$)
char delimiter = ','; // this can be whatever delimiter you like
string p1 = String.Format(p, "\""); // speechmarks group (0)
string p2 = String.Format(p, "'"); // inverted comma group (1)
string c1 = String.Format(c, delimiter); // delimiter group (2)
/*
* The first capture group will be speech marks ie. "some text, "
* The second capture group will be inverted commas ie. 'this text'
* The third is everything else separated by commas i.e. this,and,this will be [this][and][this]
* You can extend this to customise delimiters that represent text where a comma between is a valid entry eg. "this text, complete with a pause, is perfectly valid"
*
* */
//string pattern = "\"([^\"]*)\"|'([^']*)'|(?<=,|^)([^,]*)(?:,|$)";
string pattern = String.Format("{0}|{1}|{2}", new object[] { p1, p2, c1 }); // The actual pattern to match based on groups
string text = csv;
// If you're reading from a text file then this will do the trick. Uses the ReadToEnd() to put the whole file to a string.
//using (TextReader tr = new StreamReader("PATH TO MY CSV FILE", Encoding.ASCII))
//{
// text = tr.ReadToEnd(); // just read the whole stream
//}
string[] lines = text.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries); // if you have a blank line just remove it?
Regex regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase); // compile for speed
List<object> rowsOfColumns = new List<object>();
foreach (string row in lines)
{
List<string> columns = new List<string>();
// Find matches.
MatchCollection matches = regex.Matches(row);
foreach (Match match in matches)
{
for (int ii = 0; ii < match.Groups.Count; ii++)
{
if (match.Groups[ii].Success) // ignore things that don't match
{
columns.Add(match.Groups[ii].Value.TrimEnd(new char[] { delimiter })); // strip the delimiter
break;
}
}
}
// Do something with your columns here (add to List for example)
rowsOfColumns.Add(columns);
}
}
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
string[] dates = line.Split(',');
for (int i = 0; i < dates.Length; i++)
{
if(i==0)
listA.Add(dates[0]);
}
}

Loop through string to find substring

I have this string:
text = "book//title//page/section/para";
I want to go through it to find all // and / and their index.
I tried doing this with:
if (text.Contains("//"))
{
Console.WriteLine(" // index: {0} ", text.IndexOf("//"));
}
if (text.Contains("/"))
{
Console.WriteLine("/ index: {0} :", text.IndexOf("/"));
}
I was also thinking about using:
foreach (char c in text)
but it will not work since // is not a single char.
How can I achieve what I want?
I also tried this, but it did not display any result:
string input = "book//title//page/section/para";
string pattern = @"\/\//";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
if (matches.Count > 0)
{
Console.WriteLine("{0} ({1} matches):", input, matches.Count);
foreach (Match match in matches)
Console.WriteLine(" " + input.IndexOf(match.Value));
}
Thank you in advance.
Simple:
var text = "book//title//page/section/para";
foreach (Match m in Regex.Matches(text, "//?"))
Console.WriteLine(string.Format("Found {0} at index {1}.", m.Value, m.Index));
Output:
Found // at index 4.
Found // at index 11.
Found / at index 17.
Found / at index 25.
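The same indices can also be collected without a regex by repeatedly calling IndexOf with a start position. This is only a sketch; SlashFinder and FindSlashes are hypothetical names, and, like the pattern "//?", it treats two adjacent slashes as a single "//" token.

```csharp
using System;
using System.Collections.Generic;

static class SlashFinder
{
    // Hypothetical helper: returns (index, token) pairs, treating "//" as one token.
    public static List<KeyValuePair<int, string>> FindSlashes(string text)
    {
        var found = new List<KeyValuePair<int, string>>();
        int i = text.IndexOf('/');
        while (i >= 0)
        {
            // A slash immediately followed by another slash counts as "//"
            bool isDouble = i + 1 < text.Length && text[i + 1] == '/';
            string token = isDouble ? "//" : "/";
            found.Add(new KeyValuePair<int, string>(i, token));

            // Continue searching after the token that was just recorded
            int next = i + token.Length;
            i = next < text.Length ? text.IndexOf('/', next) : -1;
        }
        return found;
    }

    static void Main()
    {
        foreach (var hit in FindSlashes("book//title//page/section/para"))
            Console.WriteLine("Found {0} at index {1}.", hit.Value, hit.Key);
    }
}
```

For the sample string this reports // at 4 and 11 and / at 17 and 25, matching the regex output above.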
Would it be possible using Split?
So:
string[] words = text.Split('/');
And then go through the words? You would have blanks, due to the //, but that might be possible?
If what you want is a list "book","title","page","section","para"
you can use split.
string text = "book//title//page/section/para";
string[] delimiters = { "//", "/" };
string[] result = text.Split(delimiters,StringSplitOptions.RemoveEmptyEntries);
System.Diagnostics.Debug.WriteLine(string.Join(", ", result));
Assert.IsTrue(result[0].Equals("book"));
Assert.IsTrue(result[1].Equals("title"));
Assert.IsTrue(result[2].Equals("page"));
Assert.IsTrue(result[3].Equals("section"));
Assert.IsTrue(result[4].Equals("para"));
Something like:
bool lastCharASlash = false;
foreach(char c in text)
{
if(c == '/')
{
if(lastCharASlash)
{
// my code...
}
lastCharASlash = true;
}
else lastCharASlash = false;
}
You can also do text.Split(new[] { "//" }, StringSplitOptions.None)
You could replace // and / with your own marker words and then find the index of each one:
string s = "book//title//page/section/para";
s = s.Replace("//", "DOUBLE");
s = s.Replace("/", "SINGLE");
IList<int> doubleIndex = new List<int>();
while (s.Contains("DOUBLE"))
{
int index = s.IndexOf("DOUBLE");
s = s.Remove(index, 6);
s = s.Insert(index, "//");
doubleIndex.Add(index);
}
IList<int> singleIndex = new List<int>();
while (s.Contains("SINGLE"))
{
int index = s.IndexOf("SINGLE");
s = s.Remove(index, 6);
s = s.Insert(index, "/");
singleIndex.Add(index);
}
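Remember to replace the double slash first, otherwise you'll get SINGLESINGLE for // instead of DOUBLE. Note that the recorded // indices are only correct when every // comes before the first single /, as in this example, because any SINGLE marker still in the string shifts the positions after it. Hope this helps.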
