Read Text after custom word - c#

My goal is to find a way to read the text that follows a given word in a file. An example of this is:
Word("Text")
The output would be Text.
Is this achievable?

Cut your problem into small pieces:
Read a text file
Divide the text into a sequence of words
Skip all words in the sequence until the word that you are looking for
Use the rest of the sequence of words
Read the characters in a text file as a sequence of characters
public IEnumerable<char> ReadTextFile(string fileName)
{
    using (TextReader textReader = new StreamReader(fileName))
    {
        // read the characters one by one until there are no more characters (= -1)
        int readResult = textReader.Read();
        while (readResult != -1)
        {
            yield return (char)readResult;
            // read the next character, otherwise the loop never ends
            readResult = textReader.Read();
        }
    }
}
I decided to return a sequence of char instead of a string. This is a small optimization: if you stop enumerating before the end, the rest of the file is never read.
Divide a sequence of characters into a sequence of words.
The problem is: what is a word? Probably something like: all characters between two white spaces. Something special with the beginning and the end of the sequence.
But how about this: "Hello..World" Are these only the words "Hello" and "World", or is there also an empty word between the two dots? And what if the sequence of characters starts with a dot: ".Hello"?
I'll write this part as an extension method, so you can use it in a LINQ statement. If you are not familiar with extension methods, consider reading Extension methods demystified.
public static IEnumerable<string> ToWords(this IEnumerable<char> source)
{
    string word = String.Empty;
    foreach (char c in source)
    {
        if (Char.IsWhiteSpace(c))
        {
            // white space. only a word if already read something
            if (word.Length != 0)
            {
                yield return word;
                word = String.Empty;
            }
            // else: the sequence starts with a white space: not a word
        }
        else
        {
            // not a white space: add the character to the word
            word = word + c;
        }
    }
    // if word not empty, then there were characters after the last whitespace
    // like: "Hello World". "Hello" already returned. "World" not yet
    if (word.Length != 0)
        yield return word;
}
Consider optimizing the word = word + c; part, because every concatenation allocates a new string.
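One possible way to do that, sketched under the assumption that a StringBuilder is acceptable here. The class name CharSequenceExtensions and the method name ToWordsBuffered are made up for this illustration; the splitting rule is the same as in ToWords.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharSequenceExtensions
{
    // Same splitting rule as ToWords, but the current word is collected
    // in a StringBuilder instead of being rebuilt by string concatenation.
    public static IEnumerable<string> ToWordsBuffered(this IEnumerable<char> source)
    {
        var word = new StringBuilder();
        foreach (char c in source)
        {
            if (char.IsWhiteSpace(c))
            {
                // white space ends a word, but only if something was collected
                if (word.Length != 0)
                {
                    yield return word.ToString();
                    word.Clear();
                }
            }
            else
            {
                word.Append(c);
            }
        }
        // characters after the last white space form the final word
        if (word.Length != 0)
            yield return word.ToString();
    }
}
```

Because a string is itself a sequence of char, this can be tried without a file: " Hello  World".ToWordsBuffered() yields "Hello" and "World".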
Use LINQ to concatenate what you want
string fileName = ...
string wordToFind = "Hello";
// in this example ignore case:
IEqualityComparer<string> stringComparer = StringComparer.CurrentCultureIgnoreCase;
IEnumerable<string> wordsAfterHello = ReadTextFile(fileName)
    .ToWords()
    // skip every word that is not the word we are looking for...
    .SkipWhile(word => !stringComparer.Equals(word, wordToFind))
    // ...and then skip the found word itself
    .Skip(1);
Of course if you plan to use this often, you could write extension methods for this.
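Such an extension method might look like the sketch below. The names WordSequenceExtensions and WordsAfter are hypothetical, and the default comparer is an assumption carried over from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSequenceExtensions
{
    // Hypothetical helper: returns the words that follow the first
    // occurrence of wordToFind, or an empty sequence if it never occurs.
    public static IEnumerable<string> WordsAfter(
        this IEnumerable<string> words,
        string wordToFind,
        IEqualityComparer<string> comparer = null)
    {
        // assumption: ignore case by default, as in the example above
        comparer = comparer ?? StringComparer.CurrentCultureIgnoreCase;
        return words
            .SkipWhile(word => !comparer.Equals(word, wordToFind))
            .Skip(1); // drop the matched word itself
    }
}
```

With this in place, new[] { "say", "hello", "big", "world" }.WordsAfter("Hello") yields "big" and "world".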

Related

Escape character in C#'s Split()

I am parsing some delimiter separated values, where ? is specified as the escape character in case the delimiter appears as part of one of the values.
For instance: if : is the delimiter, and a certain field the value 19:30, this needs to be written as 19?:30.
Currently, I use string[] values = input.Split(':'); in order to get an array of all values, but after learning about this escape character, this won't work anymore.
Is there a way to make Split take escape characters into account? I have checked the overload methods, and there does not seem to be such an option directly.
string[] substrings = Regex.Split("aa:bb:00?:99:zz", @"(?<!\?):");
gives
aa
bb
00?:99
zz
Or as you probably want to unescape ?: at some point, replace the sequence in the input with another token, split and replace back.
(This requires the System.Text.RegularExpressions namespace to be used.)
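The replace, split and replace back idea could be sketched as follows. This assumes that the placeholder character '\u0001' never appears in the input, and it does not handle an escaped escape character (??). EscapedSplit and SplitEscaped are names invented for the example.

```csharp
using System;
using System.Linq;

public static class EscapedSplit
{
    // Assumption: '\u0001' never occurs in real input,
    // so it can stand in for an escaped delimiter.
    private const string Placeholder = "\u0001";

    public static string[] SplitEscaped(string input)
    {
        return input
            .Replace("?:", Placeholder)   // hide the escaped delimiters
            .Split(':')                   // split on the remaining, real delimiters
            .Select(field => field.Replace(Placeholder, ":")) // restore them, unescaped
            .ToArray();
    }
}
```

For the input aa:bb:00?:99:zz this returns aa, bb, 00:99 and zz, so the third field comes back already unescaped.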
This kind of stuff is always fun to code without using Regex.
The following does the trick with one caveat: the escape character always escapes whatever follows it; there is no logic to restrict it to valid sequences such as ?;. Any escaped character other than the escape character itself is dropped, so the string one?two;three??;four?;five will be split into onewo, three?, fourfive.
public static IEnumerable<string> Split(this string text, char separator, char escapeCharacter, bool removeEmptyEntries)
{
string buffer = string.Empty;
bool escape = false;
foreach (var c in text)
{
if (!escape && c == separator)
{
if (!removeEmptyEntries || buffer.Length > 0)
{
yield return buffer;
}
buffer = string.Empty;
}
else
{
if (c == escapeCharacter)
{
escape = !escape;
if (!escape)
{
buffer = string.Concat(buffer, c);
}
}
else
{
if (!escape)
{
buffer = string.Concat(buffer, c);
}
escape = false;
}
}
}
if (buffer.Length != 0)
{
yield return buffer;
}
}
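If the escaped character should be kept rather than dropped (so that 19?:30 becomes 19:30), a variant along these lines may be closer to what the question asks for. It is only a sketch: EscapeAwareSplitter and SplitKeepingEscaped are hypothetical names, and empty fields are always returned.

```csharp
using System.Collections.Generic;
using System.Text;

public static class EscapeAwareSplitter
{
    // Hypothetical variant: the character after an escape is always kept literally.
    public static IEnumerable<string> SplitKeepingEscaped(
        string text, char separator, char escapeCharacter)
    {
        var buffer = new StringBuilder();
        bool escape = false;
        foreach (char c in text)
        {
            if (escape)
            {
                // previous character was the escape: take c as it is
                buffer.Append(c);
                escape = false;
            }
            else if (c == escapeCharacter)
            {
                // remember the escape, but do not output it
                escape = true;
            }
            else if (c == separator)
            {
                yield return buffer.ToString();
                buffer.Clear();
            }
            else
            {
                buffer.Append(c);
            }
        }
        // the last field, which may be empty
        yield return buffer.ToString();
    }
}
```

Called with ':' as separator and '?' as escape character, the input 08?:15:19?:30:a??b is split into 08:15, 19:30 and a?b.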
No, there's no way to do that with Split. You will need to use a regex (which one depends on exactly how you want your "escape character" to behave). In the worst case you'll have to do the parsing manually.

How to save the strings in array and display the next string array if match found?

I read the *.txt file from c# and displayed in the console.
My text file looks like a table.
diwas hey
ivonne how
pokhara d kd
lekhanath when
dipisha dalli hos
dfsa sasf
Now I want to search for a string "pokhara" and if it is found then it should display the "d kd" and if not found display "Not found"
What I tried?
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
foreach(string line in lines)
{
string [] words = line.Split();
foreach(string word in words)
{
if (word=="pokhara")
{
Console.WriteLine("Match Found");
}
}
}
My Problem:
Match was found but how to display the next word of the line. Also sometimes
in second row some words are split in two with a space, I need to show both words.
I guess your delimiter is the tab-character, then you can use String.Split and LINQ:
var lineFields = System.IO.File.ReadLines(@"C:\readme.txt")
.Select(l => l.Split('\t'));
var matches = lineFields
.Where(arr => arr.First().Trim() == "pokhara")
.Select(arr => arr.Last().Trim());
// if you just want the first match:
string result = matches.FirstOrDefault(); // is null if not found
If you don't know the delimiter, as suggested by your comment, you have a problem. If you don't even know the rules of how the fields are separated, it's very likely that your code is incorrect. So first determine the business logic: ask the people who created the text file. Then use the correct delimiter in String.Split.
If it's a space you can either use string.Split() (without argument), which splits on spaces, tabs and new-line characters, or use string.Split(' '), which splits only on the space. But note that a space is a bad delimiter if the fields can contain spaces as well. In that case either use a different delimiter or wrap the fields in quoting characters like "text with spaces". But then I suggest a real text parser like the Microsoft.VisualBasic.FileIO.TextFieldParser, which can also be used in C#. It has a HasFieldsEnclosedInQuotes property.
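A minimal sketch of the TextFieldParser route, assuming the file uses a single space as delimiter and double quotes around fields that contain spaces, and assuming the Microsoft.VisualBasic assembly is referenced. QuotedFieldReader and ReadRows are names chosen for this example only.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFieldReader
{
    // Reads space-delimited rows whose fields may be wrapped in double quotes.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ");               // assumption: space-delimited
            parser.HasFieldsEnclosedInQuotes = true; // "d kd" stays one field
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

To read from a file, pass a StreamReader; for a quick experiment a StringReader over a line such as pokhara "d kd" works as well.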
This works ...
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
string stringTobeDisplayed = string.Empty;
foreach (string line in lines)
{
    stringTobeDisplayed = string.Empty;
    string[] words = line.Split();
    // I assume that the first word in every line is the key word to be found
    if (words[0].Trim() == "pokhara")
    {
        Console.WriteLine("Match Found");
        for (int i = 1; i < words.Length; i++)
        {
            // keep a space between the words, so "d kd" is not printed as "dkd"
            stringTobeDisplayed += words[i] + " ";
        }
        Console.WriteLine(stringTobeDisplayed.Trim());
    }
}
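The question also asks for "Not found" when there is no match. One way to cover that, assuming the key is the first whitespace-delimited token and everything after it is the value, is sketched below. KeyLookup and FindValue are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class KeyLookup
{
    // Assumption: the key is the first whitespace-delimited token of a line
    // and the rest of the line is the value.
    public static string FindValue(IEnumerable<string> lines, string key)
    {
        foreach (string line in lines)
        {
            // split into at most two parts, so a value like "d kd" stays in one piece
            string[] parts = line.Trim().Split(
                (char[])null, 2, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0 && parts[0] == key)
            {
                return parts.Length > 1 ? parts[1].Trim() : string.Empty;
            }
        }
        return "Not found";
    }
}
```

The lines can come from System.IO.File.ReadLines; with the sample table, looking up pokhara gives d kd and looking up an unknown key gives Not found.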

Using string.ToUpper on substring

Have an assignment to allow a user to input a word in C# and then display that word with the first and third characters changed to uppercase. Code follows:
namespace Capitalizer
{
class Program
{
static void Main(string[] args)
{
string text = Console.ReadLine();
char[] delimiterChars = { ' ' };
string[] words = text.Split(delimiterChars);
string Upper = text.ToUpper();
Console.WriteLine(Upper);
Console.ReadKey();
}
}
}
This of course generates the entire word in uppercase, which is not what I want. I can't seem to make text.ToUpper(0,2) work, and even then that'd capitalize the first three letters. Only solution I can think of now that would make the word appear on one line (and I don't know if it works) is to move the capitalized letters and lowercase letters into a character array and try to get that to print all values in a modified order.
The simplest way I can think of to address your exact question as described — to convert to upper case the first and third characters of the input — would be something like the following:
StringBuilder sb = new StringBuilder(text);
sb[0] = char.ToUpper(sb[0]);
sb[2] = char.ToUpper(sb[2]);
text = sb.ToString();
The StringBuilder class is essentially a mutable string object, so when doing these kinds of operations is the most fluid way to approach the problem, as it provides the most straightforward conversions to and from, as well as the full range of string operations. Changing individual characters is easy in many data structures, but insertions, deletions, appending, formatting, etc. all also come with StringBuilder, so it's a good habit to use that versus other approaches.
But frankly, it's hard to see how that's a useful operation. I can't help but wonder if you have stated the requirements incorrectly and there's something more to this question than is seen here.
You could use LINQ:
var upperCaseIndices = new[] { 0, 2 };
var message = "hello";
var newMessage = new string(message.Select((c, i) =>
upperCaseIndices.Contains(i) ? Char.ToUpper(c) : c).ToArray());
Here is how it works. message.Select (inline LINQ query) selects characters from message one by one and passes into selector function:
upperCaseIndices.Contains(i) ? Char.ToUpper(c) : c
written as C# ?: shorthand syntax for if. It reads as "If index is present in the array, then select upper case character. Otherwise select character as is."
(c, i) => condition
is a lambda expression. See also:
Understand Lambda Expressions in 3 minutes
The rest is very simple - represent result as array of characters (.ToArray()), and create a new string based off that (new string(...)).
Only solution I can think of now that would make the word appear on one line (and I don't know if it works) is to move the capitalized letters and lowercase letters into a character array and try to get that to print all values in a modified order.
That seems a lot more complicated than necessary. Once you have a character array, you can simply change the elements of that character array. In a separate function, it would look something like
string MakeFirstAndThirdCharacterUppercase(string word) {
    var chars = word.ToCharArray();
    // char has no instance ToUpper method; use the static char.ToUpper
    chars[0] = char.ToUpper(chars[0]);
    chars[2] = char.ToUpper(chars[2]);
    return new string(chars);
}
My simple solution:
string text = Console.ReadLine();
char[] delimiterChars = { ' ' };
string[] words = text.Split(delimiterChars);
foreach (string s in words)
{
char[] chars = s.ToCharArray();
chars[0] = char.ToUpper(chars[0]);
if (chars.Length > 2)
{
chars[2] = char.ToUpper(chars[2]);
}
Console.Write(new string(chars));
Console.Write(' ');
}
Console.ReadKey();

C# Reading strings from a file and finding matches between Start string and an End String different by one character

Suppose I have a txt file with strings {ABAA, AAAA, ABZA, ABZZ, and AAZZ} and my Start word is AAAA and my end word is AAZZ.
I need to find all the words between the start word and end word different by one character; so from the example given my results would be: AAAA, ABZZ and AAZZ.
At the moment what I am doing is creating a list and reading the file line-by-line and passing it to the list.
// 1 Declare new List.
List<string> lines = new List<string>();
// 2
// Use using StreamReader for disposing.
using (StreamReader sr = new StreamReader(PATH))
{
// 3
// Use while != null pattern for loop
string line;
while ((line = sr.ReadLine()) != null)
{
// 4
// Insert logic here.
// ...
// "line" is a line in the file. Add it to our List.
lines.Add(line);
}
}
My question is: how do I look for strings different by one character? Do I need to break the string that I read from the file into characters and do a comparison to my Start and End Strings?
bool CompareStrings(string a, string b) =>
    a.Zip(b, (x, y) => new { x, y }).Where(p => p.x != p.y).Take(2).Count() <= 1;
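A slightly longer sketch of the same Zip idea, with an explicit length check added because Zip stops at the end of the shorter string. WordDistance, DiffersByAtMostOne and Neighbours are illustrative names, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WordDistance
{
    // True when a and b have the same length and differ in at most one position.
    public static bool DiffersByAtMostOne(string a, string b)
    {
        if (a.Length != b.Length)
            return false; // Zip would silently ignore the extra characters
        return a.Zip(b, (x, y) => x != y).Count(differs => differs) <= 1;
    }

    // Keeps the candidates that are at most one character away from target.
    public static List<string> Neighbours(IEnumerable<string> candidates, string target)
    {
        return candidates.Where(word => DiffersByAtMostOne(word, target)).ToList();
    }
}
```

For the words ABAA, AAAA, ABZA, ABZZ and AAZZ, Neighbours with target AAZZ keeps ABZZ and AAZZ, and with target AAAA it keeps ABAA and AAAA.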
Regular expressions are very good at finding this sort of thing and .NET has excellent support for regular expressions. First you need to define the regular expression.
Your requirements are a bit vague but according to your description, example data and example results I'm inferring that you want to match the start word and every word that varies from the end word by exactly one character. The regex you need is:
\bAAAA\b|\bAAZ\w\b|\bAA\wZ\b|\bA\wZZ\b|\b\wAZZ\b
Let me break that down from left to right.
'\b' means "word boundary" which could be whitespace or a curly brace or other such non-word character.
'AAAA' is your start word and would be matched literally
'\b' means "word boundary"
'|' means "alternation" which essentially means "match the expression on the left OR match the expression on the right"
'\b' means "word boundary"
'AAZ\w' is the first permutation of one-character differences from your end word. '\w' means "any word character."
'\b' means "word boundary"
'\bAA\wZ\b' is the second permutation of one-character differences from your end word.
'\bA\wZZ\b' is the third permutation.
'\b\wAZZ\b' is the fourth and final permutation and would also match the end word.
See http://www.regular-expressions.info/reference.html for definitions of "word boundary" and "word character."
Now for the code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
string pattern = @"\bAAAA\b|\bAAZ\w\b|\bAA\wZ\b|\bA\wZZ\b|\b\wAZZ\b";
// 1 Declare new List.
List<string> lines = new List<string>();
// 2
// Use using StreamReader for disposing.
using (StreamReader sr = new StreamReader(PATH))
{
// 3
// Use while != null pattern for loop
string line;
while ((line = sr.ReadLine()) != null)
{
// 4
if (Regex.IsMatch(line, pattern, RegexOptions.IgnoreCase))
{
// ...
// "line" is a line in the file. Add it to our List.
lines.Add(line);
}
}
}
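Writing the permutations by hand gets tedious for longer words. As a sketch, the one-character-difference part of such a pattern could be generated from the end word; OneOffPattern and Build are made-up names, and the start word would still have to be added as a separate alternative.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class OneOffPattern
{
    // For each position i, emits \b + word with position i replaced by \w + \b,
    // and joins the alternatives with |.
    public static string Build(string word)
    {
        var alternatives = Enumerable.Range(0, word.Length)
            .Select(i => @"\b"
                + Regex.Escape(word.Substring(0, i))
                + @"\w"
                + Regex.Escape(word.Substring(i + 1))
                + @"\b");
        return string.Join("|", alternatives);
    }
}
```

OneOffPattern.Build("AAZZ") produces \b\wAZZ\b|\bA\wZZ\b|\bAA\wZ\b|\bAAZ\w\b, the same four permutations as above in reverse order.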
I'm not sure of all the requirements, but this function returns the number of distinct characters that the two words have in common.
private int CheckWord(string startWord, string otherWord)
{
List<char> start = new List<char>(startWord.ToArray());
List<char> wordt = new List<char>(otherWord.ToArray());
return start.Intersect(wordt).Count();
}
The call CheckWord("start", "srart"); returns 4. Match that number against the length of the string to determine how different they are. Note that Intersect ignores positions and repeated characters, so this is only a rough measure.

Split large text string into variable length strings without breaking words and keeping linebreaks and spaces

I am trying to break a large string of text into several smaller strings, where each smaller string can have a different maximum length. For example:
"The quick brown fox jumped over the red fence.
The blue dog dug under the fence."
I would like to have code that can split this into smaller lines and have the first line have a max of 5 characters, the second line have a max of 11, and rest have a max of 20, resulting in this:
Line 1: The
Line 2: quick brown
Line 3: fox jumped over the
Line 4: red fence.
Line 5: The blue dog
Line 6: dug under the fence.
All this in C# or MSSQL, is it possible?
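public List<String> SplitString(String text, int[] lengths)
{
    List<String> output = new List<String>();
    List<String> words = Split(text);
    int i = 0;
    int lineNum = 0;
    string s = string.Empty;
    while (i < words.Count)
    {
        // the word fits on the current line (a word longer than the
        // limit is put on a line of its own, so the loop always advances)
        if (s.Length == 0 || s.Length + words[i].Length <= lengths[lineNum])
        {
            s += words[i];
            i++;
        }
        else
        {
            // line is full: store it and move on to the next maximum length;
            // the last length is reused for all remaining lines
            output.Add(s);
            s = String.Empty;
            if (lineNum < lengths.Length - 1)
                lineNum++;
        }
    }
    if (s.Length > 0)
        output.Add(s); // the last, partially filled line
    return output;
}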
public static List<string> Split(string text)
{
    List<string> result = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (var letter in text)
    {
        if (letter != ' ' && letter != '\t' && letter != '\n')
        {
            sb.Append(letter);
        }
        else
        {
            if (sb.Length > 0)
            {
                result.Add(sb.ToString());
            }
            // keep the white space character as its own entry
            result.Add(letter.ToString());
            sb = new StringBuilder();
        }
    }
    // the text may end with a word instead of white space
    if (sb.Length > 0)
    {
        result.Add(sb.ToString());
    }
    return result;
}
This code is untested and has not been compiled, but you should get the idea.
I also think you should use a StringBuilder instead, but I didn't remember how to use it.
\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z
will capture up to five characters in group 1, up to 11 in group 2 and chunks of up to 20 in group 3. Matches will be split along word delimiters in order to avoid splitting in the middle of a word. Whitespace, line break etc. count as characters and will be preserved.
The trick is to get at the individual matches in the repeated group, something that can only be done in .NET and Perl 6:
Regex paragraphs = new Regex(@"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z", RegexOptions.Singleline);
Match matchResults = paragraphs.Match(subjectString);
if (matchResults.Success) {
    String line1 = matchResults.Groups[1].Value;
    String line2 = matchResults.Groups[2].Value;
    // Captures returns a CaptureCollection with one Capture per repetition of group 3
    CaptureCollection line3andup = matchResults.Groups[3].Captures;
    // you now need to iterate over line3andup, extracting the lines.
} else {
    // Match attempt failed
}
I don't know C# at all and have tried to construct this from RegexBuddy's templates and the VB code here, so please feel free to point out my coding errors.
Note that the whitespace at the beginning of line two is captured at the end of the previous match.
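Iterating over the captures of the repeated group might look like the sketch below. VariableLengthLines is a hypothetical wrapper around the same pattern. Since every chunk must end at a \b directly before \Z, the pattern as written only matches text whose last character is a word character, so a text ending in "." returns an empty list here.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VariableLengthLines
{
    private static readonly Regex Paragraphs = new Regex(
        @"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z", RegexOptions.Singleline);

    // Returns the matched chunks in order, or an empty list if the pattern does not match.
    public static List<string> Split(string subject)
    {
        var lines = new List<string>();
        Match match = Paragraphs.Match(subject);
        if (!match.Success)
            return lines;

        lines.Add(match.Groups[1].Value);
        lines.Add(match.Groups[2].Value);
        // group 3 is repeated, so each repetition is a separate Capture
        foreach (Capture capture in match.Groups[3].Captures)
        {
            if (capture.Length > 0) // skip an empty final repetition, if any
                lines.Add(capture.Value);
        }
        return lines;
    }
}
```

For "The quick brown fox jumped over the red fence" this gives "The ", "quick brown", " fox jumped over the" and " red fence", with the white space kept where the match put it.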
