I need to remove certain words from a text together with the separators that follow them. The problem is that my program removes only one separator after each word, but there can be many. Any suggestions on how to remove the remaining separators?
I also need to make sure the word is not attached to other letters. For example, if the text contains fHouse or Housef, the word should not be removed.
At the moment I have:
public static void Process(string fin, string fout)
{
using (var foutv = File.CreateText(fout)) //fout - OutPut.txt
{
using (StreamReader reader = new StreamReader(fin)) // fin - InPut.txt
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] WordsToRemove = { "Home", "House", "Room" };
char[] seperators = {';', ' ', '.', ',', '!', '?', ':'};
foreach(string word in WordsToRemove)
{
foreach (char seperator in seperators)
{
line = line.Replace(word + seperator, string.Empty);
}
}
foutv.WriteLine(line);
}
}
}
}
My input:
fhgkHouse!House!Dog;;;!!Inside!C!Room!Home!House!Room;;;;;;;;;;!Table!London!Computer!Room;..;
Results I get:
fhgkDog;;;!!Inside!C!;;;;;;;;;!Table!London!Computer!..;
The results should be:
fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!
Try this regex: \b(Home|House|Room)(!|;)*\b|;+\.\.;+
See it at: https://regex101.com/r/LUsyM8/1
There, I replace the matched words and special characters with an empty string.
As far as I can tell, it produces the expected result.
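The same idea can be sketched in C#. This is only a sketch, assuming the separators are exactly the seven characters from the question and that "connected with other letters" means an ASCII letter directly before or after the word. The names WordRemover and RemoveWords are made up for illustration and do not come from the original posts.

```csharp
using System.Text.RegularExpressions;

public static class WordRemover
{
    // Hypothetical helper, not part of the original answer.
    // (?<![A-Za-z]) and (?![A-Za-z]) reject words glued to a letter (fHouse, Housef).
    // [; .,!?:]* then consumes every separator that follows the word, not just one.
    private static readonly Regex Pattern = new Regex(
        @"(?<![A-Za-z])(?:Home|House|Room)(?![A-Za-z])[; .,!?:]*");

    public static string RemoveWords(string line)
    {
        return Pattern.Replace(line, string.Empty);
    }
}
```

In Process, the two nested foreach loops could then be replaced by line = WordRemover.RemoveWords(line);. On the sample line from the question this gives fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!.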
I'm currently reading the content of a .srt like this:
var assetURL = "some url";
var textFromFile = (new WebClient()).DownloadString(assetURL);
However, I need to be able to loop through all lines, like this:
string[] text = File.ReadAllLines(@"subs.srt");
foreach (string line in text)
{
// Do something
}
I can't do it like that because File.ReadAllLines does not support URIs. Any idea how I can accomplish this?
You can always split a string by Environment.NewLine with String.Split:
string[] lines = textFromFile.Split(new []{ Environment.NewLine }, StringSplitOptions.None);
You can use StringReader:
using (var sr = new StringReader(textFromFile))
{
string line;
while ((line = sr.ReadLine()) != null)
{
// sth with a line
}
}
Why might this be better than splitting by Environment.NewLine? It handles both cases: when the line terminator is \r\n and when it is \n.
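If an array is still wanted, one possible variation is to pass both terminators to String.Split explicitly instead of relying on Environment.NewLine. The sketch below assumes the downloaded text only uses \r\n or \n; LineSplitter and SplitLines are illustrative names.

```csharp
using System;

public static class LineSplitter
{
    // Hypothetical helper: splits on "\r\n" and on a bare "\n",
    // so the result does not depend on the platform's Environment.NewLine.
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
    }
}
```

Unlike StringReader.ReadLine, this keeps a final empty entry when the text ends with a line break, which may or may not matter for an .srt file.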
I am trying to read a text file in ASP.NET where the file is not in a particular format, so I just want to read the file up to the special characters (*) and skip the rest.
In general it has this format:
00000 AFCX TY88YYY
12366 FTTT TY88YYY
** File Description
// This is so and so Description
** End of Description
12345 TYUI TY88YYY
45677 RERY TY88YYY
string file = "TextFile1.txt";
List<string> lines = new List<string>();
using (StreamReader r = new StreamReader(file))
{
string line;
while ((line = r.ReadLine()) != null && !line.StartsWith("*"))
{
lines.Add(line);
}
}
This will give you a list of all the lines except those beginning with *:
string[] yourFileContents = File.ReadAllLines(filePath);
List<string> contentsWithoutAsterisk =
yourFileContents.Where(line => !line.StartsWith("*")).ToList();
PS (edit):
If you just want the lines up to the first occurrence of *, do this instead:
List<string> contentsWithoutAsterisk =
yourFileContents.TakeWhile(line => !line.StartsWith("*")).ToList();
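To make the difference between the two variants concrete, here is a small sketch run against the sample data from the question. AsteriskFilter, SkipMarked and UntilFirstMarked are hypothetical names, and StartsWith is used so that an empty line would not throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsteriskFilter
{
    // Drops only the lines that start with '*'; every other line is kept.
    public static List<string> SkipMarked(IEnumerable<string> lines)
    {
        return lines.Where(l => !l.StartsWith("*", StringComparison.Ordinal)).ToList();
    }

    // Stops at the first line that starts with '*'; nothing after it is kept.
    public static List<string> UntilFirstMarked(IEnumerable<string> lines)
    {
        return lines.TakeWhile(l => !l.StartsWith("*", StringComparison.Ordinal)).ToList();
    }
}
```

With the seven sample lines, SkipMarked keeps five of them (including the // This is so and so Description line and the two records after the description), while UntilFirstMarked keeps only the first two records.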
We are trying to read each word from a text file and replace it with another word.
For smaller text files it works well, but for larger text files we keep getting the exception "String cannot be of zero length. Parameter name: oldValue".
void replace()
{
string s1 = " ", s2 = " ";
StreamReader streamReader;
streamReader = File.OpenText("C:\\sample.txt");
StreamWriter streamWriter = File.CreateText("C:\\sample1.txt");
//int x = st.Rows.Count;
while ((line = streamReader.ReadLine()) != null)
{
char[] delimiterChars = { ' ', '\t' };
String[] words = line.Split(delimiterChars);
foreach (string str in words)
{
s1 = str;
DataRow drow = st.Rows.Find(str);
if (drow != null)
{
index = st.Rows.IndexOf(drow);
s2 = Convert.ToString(st.Rows[index]["Binary"]);
s2 += "000";
// Console.WriteLine(s1);
// Console.WriteLine(s2);
streamWriter.Write(s1.Replace(s1,s2)); // Exception occurs here
}
else
break;
}
}
streamReader.Close();
streamWriter.Close();
}
We are unable to find the reason.
Thanks in advance.
When you do your string.Split you may get empty entries if there are multiple spaces or tabs in sequence. These can't be replaced as the strings are 0 length.
Use the overload that strips empty results using the StringSplitOptions argument:
var words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
The exception occurs because s1 is an empty string at some point. You can avoid this by replacing the line
String[] words = line.Split(delimiterChars);
with this:
String[] words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
You want to change your Split method call like this:
String[] words = line.Split(delimiterChars,StringSplitOptions.RemoveEmptyEntries);
It means that s1 contains an empty string ("") which can happen if you have two consecutive white spaces or tabs in your file.
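The behaviour described in these answers can be reproduced in isolation. The following is a minimal sketch; SplitDemo and its members are illustrative names, and the delimiters are the ones from the question.

```csharp
using System;

public static class SplitDemo
{
    private static readonly char[] Delimiters = { ' ', '\t' };

    // Two consecutive delimiters leave an empty string between them.
    public static string[] SplitKeepingEmpty(string line)
    {
        return line.Split(Delimiters);
    }

    // RemoveEmptyEntries filters those empty strings out.
    public static string[] SplitDroppingEmpty(string line)
    {
        return line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns true when Replace rejects the word as its oldValue argument.
    public static bool ReplaceThrows(string word)
    {
        try
        {
            word.Replace(word, "x");
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

For a line such as "MOV  AX\tBX" (two spaces after MOV), SplitKeepingEmpty returns four entries, one of them empty, and passing that empty entry to ReplaceThrows returns true. SplitDroppingEmpty returns only the three real words.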
How can I remove empty lines in a string in C#?
I am generating some text files in C# (Windows Forms), and for some reason they contain some empty lines. How can I remove them after the string is generated (using StringBuilder and TextWriter)?
Example text file:
THIS IS A LINE
THIS IS ANOTHER LINE AFTER SOME EMPTY LINES!
If you also want to remove lines that only contain whitespace, use
resultString = Regex.Replace(subjectString, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
^\s+$ will remove everything from the first blank line to the last (in a contiguous block of empty lines), including lines that only contain tabs or spaces.
[\r\n]* will then remove the last CRLF (or just LF which is important because the .NET regex engine matches the $ between a \r and a \n, funnily enough).
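As a rough illustration of that explanation, the pattern can be wrapped in a helper and tried on a CRLF-terminated string. BlankLineRemover and RemoveWhitespaceOnlyLines are made-up names, and the sketch assumes the pattern is written as a verbatim string (@"...").

```csharp
using System.Text.RegularExpressions;

public static class BlankLineRemover
{
    // Hypothetical wrapper around the pattern from the answer above.
    // ^\s+$ spans the whole block of blank or whitespace-only lines,
    // and [\r\n]* swallows the line break that ends the block.
    public static string RemoveWhitespaceOnlyLines(string text)
    {
        return Regex.Replace(text, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
    }
}
```

For the input "THIS IS A LINE\r\n\r\n \t\r\nTHIS IS ANOTHER LINE" (one empty line followed by a line holding a space and a tab), the result is "THIS IS A LINE\r\nTHIS IS ANOTHER LINE".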
Tim Pietzcker - it is not working for me. I had to change it a little bit, but thanks!
Ehhh, C# Regex... I had to change it again, but this is working well:
private string RemoveEmptyLines(string lines)
{
return Regex.Replace(lines, @"^\s*$\n|\r", string.Empty, RegexOptions.Multiline).TrimEnd();
}
Example:
http://regex101.com/r/vE5mP1/2
You could try String.Replace("\n\n", "\n");
Try this
Regex.Replace(subjectString, @"^\r?\n?$", "", RegexOptions.Multiline);
private string remove_space(string st)
{
String final = "";
char[] b = new char[] { '\r', '\n' };
String[] lines = st.Split(b, StringSplitOptions.RemoveEmptyEntries);
foreach (String s in lines)
{
if (!String.IsNullOrWhiteSpace(s))
{
final += s;
final += Environment.NewLine;
}
}
return final;
}
private static string RemoveEmptyLines(string text)
{
var lines = text.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
var sb = new StringBuilder(text.Length);
foreach (var line in lines)
{
sb.AppendLine(line);
}
return sb.ToString();
}
None of the methods mentioned here helped me all the way, but I found a workaround:
Split the text into lines, that is, a collection of strings (with or without the empty strings, and with Trim() applied to each string).
Join these lines back into a multiline string.
public static IEnumerable<string> SplitToLines(this string inputText, bool removeEmptyLines = true)
{
if (inputText == null)
{
yield break;
}
using (StringReader reader = new StringReader(inputText))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (removeEmptyLines && string.IsNullOrWhiteSpace(line))
continue; // skip empty or whitespace-only lines
yield return line.Trim();
}
}
}
public static string ToMultilineText(this string text)
{
var lines = text.SplitToLines();
return string.Join(Environment.NewLine, lines);
}
Based on Evgeny Sobolev's code, I wrote this extension method, which also trims the last (superfluous) line break using TrimEnd(TrimNewLineChars):
public static class StringExtensions
{
private static readonly char[] TrimNewLineChars = Environment.NewLine.ToCharArray();
public static string RemoveEmptyLines(this string str)
{
if (str == null)
{
return null;
}
var lines = str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);
var stringBuilder = new StringBuilder(str.Length);
foreach (var line in lines)
{
stringBuilder.AppendLine(line);
}
return stringBuilder.ToString().TrimEnd(TrimNewLineChars);
}
}
I found a simple answer to this problem:
YourradTextBox.Lines = YourradTextBox.Lines.Where(p => p.Length > 0).ToArray();
Adapted from Marco Minerva [MCPD] at Delete Lines from multiline textbox if it's contain certain string - C#
I tried the previous answers, but some of the regex-based ones do not work correctly.
A regex that merely finds the empty lines cannot be used as-is for deleting them,
because it would also erase the line breaks of lines that are not empty.
You have to use regex groups for this replacement.
Some of the non-regex answers here can have performance issues.
private string remove_empty_lines(string text) {
StringBuilder text_sb = new StringBuilder(text);
Regex rg_spaces = new Regex(@"(\r\n|\r|\n)([\s]+\r\n|[\s]+\r|[\s]+\n)");
Match m = rg_spaces.Match(text_sb.ToString());
while (m.Success) {
text_sb = text_sb.Replace(m.Groups[2].Value, "");
m = rg_spaces.Match(text_sb.ToString());
}
return text_sb.ToString().Trim();
}
This pattern works well for removing empty lines and lines that contain only spaces and/or tabs.
s = Regex.Replace(s, @"^\s*(\r\n|\Z)", "", RegexOptions.Multiline);