Remove tabs and spaces from text file - c#

I have a text file containing the text below:
Contact Name | Contact Number
The actual content is Contact Name\t\t|\t\tContact Number, and I am using the following code to remove \t\t|\t\t:
using (StreamReader sr = File.OpenText(fileName))
{
string s = String.Empty;
while ((s = sr.ReadToEnd()) != null)
{
string[] line = s.Split(new string[] {"\t\t|\t\t"}, StringSplitOptions.RemoveEmptyEntries );
}
}
Using a breakpoint, I see the following values in the "line" variable:
"Contact Name"
"Contact Number\r\n\r\n"
The code above removes \t\t|\t\t, but the result still contains \r\n\r\n. How do I remove both \t\t|\t\t and \r\n\r\n at the same time? Thanks.

Split your text on the | character, then trim each part to remove tabs, spaces and newline characters from its start and end (you can also call Trim() without parameters in this case, because all the characters you want to remove are considered white space):
string text = "Contact Name | Contact Number";
var lines = text.Split('|').Select(s => s.Trim('\t', '\n', '\r', ' '));
That will produce a sequence of two strings:
"Contact Name"
"Contact Number"

You can use:
s = s.TrimEnd(new char[] { '\r', '\n' });
This will remove all newline characters from the end of your string.

You can also split on "\r\n\r\n" and it will be removed.
string[] line = s.Split(new string[] {"\t\t|\t\t", "\r\n\r\n"}, StringSplitOptions.RemoveEmptyEntries );
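The split-and-trim idea from the answers above extends to a file with several lines. The following is only a minimal sketch, assuming C# 9 or later (top-level statements); the sample text, the second contact row and the `rows` variable are hypothetical stand-ins for the real file contents, and a StringReader is used so the example runs without a file on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sample standing in for the contents of the text file.
string text = "Contact Name\t\t|\t\tContact Number\r\n\r\nAlice\t\t|\t\t555-0100\r\n";

var rows = new List<string[]>();
using (var reader = new StringReader(text))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // ReadLine already strips \r\n, so only blank lines need skipping.
        if (string.IsNullOrWhiteSpace(line)) continue;

        // Split on the pipe and trim tabs/spaces around each field.
        rows.Add(line.Split('|').Select(field => field.Trim()).ToArray());
    }
}

foreach (var row in rows)
    Console.WriteLine(string.Join(" / ", row));
```

With a real file, `new StreamReader(fileName)` could replace the StringReader without changing the loop.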

Since you are trying to remove characters such as \t, \n and \r from your file, I would recommend reading the whole file content into a string and calling .Replace(string oldValue, string newValue), which is slightly faster than splitting and re-joining the string.
StringBuilder sb = new StringBuilder();
using (StreamReader sr = new StreamReader(fileName))
{
String line;
while ((line = sr.ReadLine()) != null)
{
sb.AppendLine(line);
}
}
string content = sb.ToString();
// Remove all tabs, new lines and pipe characters.
// Replace(char, string) does not exist, so string arguments are used.
string cleanedContent = content.Replace("\t", String.Empty)
.Replace("\r", String.Empty)
.Replace("\n", String.Empty)
.Replace("|", String.Empty);

Related

Removing words from text with separators in front(using Regex or List)

I need to remove certain words from a text, together with the separators that follow them. The problem is that my program removes only one separator after each word, but there can be many of them. Any suggestions on how to remove the remaining separators?
I also need to make sure the word is not joined to other letters. For example, if the word appears as fHouse or Housef, it should not be removed.
At the moment I have:
public static void Process(string fin, string fout)
{
using (var foutv = File.CreateText(fout)) //fout - OutPut.txt
{
using (StreamReader reader = new StreamReader(fin)) // fin - InPut.txt
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] WordsToRemove = { "Home", "House", "Room" };
char[] seperators = {';', ' ', '.', ',', '!', '?', ':'};
foreach(string word in WordsToRemove)
{
foreach (char seperator in seperators)
{
line = line.Replace(word + seperator, string.Empty);
}
}
foutv.WriteLine(line);
}
}
}
}
I have :
fhgkHouse!House!Dog;;;!!Inside!C!Room!Home!House!Room;;;;;;;;;;!Table!London!Computer!Room;..;
Results I get:
fhgkDog;;;!!Inside!C!;;;;;;;;;!Table!London!Computer!..;
The results should be:
fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!
Try this regex: \b(Home|House|Room)(!|;)*\b|;+\.\.;+
See it at: https://regex101.com/r/LUsyM8/1
There, I substitute the matched words and special characters with an empty string.
I believe it produces the expected result.
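A related way to express the same idea in C# is a word-boundary match followed by any run of the separators listed in the question. This is only a sketch, not the pattern from the answer above: the character class `[; .,!?:]` is an assumption built from the question's `seperators` array, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Input line taken from the question.
string input = "fhgkHouse!House!Dog;;;!!Inside!C!Room!Home!House!Room;;;;;;;;;;!Table!London!Computer!Room;..;";

// \b keeps "fhgkHouse" intact because there is no word boundary between 'k' and 'H'.
// The trailing class removes every separator that follows a matched word.
string pattern = @"\b(?:Home|House|Room)\b[; .,!?:]*";

string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
// prints: fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!
```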

How to read all lines in a file from a URI?

I'm currently reading the content of a .srt like this:
var assetURL = "some url";
var textFromFile = (new WebClient()).DownloadString(assetURL);
However, I need to be able to loop through all lines, like this:
string[] text = File.ReadAllLines(@"subs.srt");
foreach (string line in text)
{
// Do something
}
I can't do it like that because File.ReadAllLines does not support URIs. Any idea how I can accomplish this?
You can always split a string by Environment.NewLine with String.Split:
string[] lines = textFromFile.Split(new []{ Environment.NewLine }, StringSplitOptions.None);
You can use StringReader:
using (var sr = new StringReader(textFromFile))
{
string line;
while ((line = sr.ReadLine()) != null)
{
// sth with a line
}
}
Why might this be better than splitting on Environment.NewLine? It handles both cases: when the line ending is \r\n and when it is \n.
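The difference is easy to see on text with mixed line endings. The sketch below uses a hypothetical subtitle fragment in place of the downloaded string and assumes C# 9 or later (top-level statements). It also shows that splitting on both "\r\n" and "\n" handles mixed endings, at the cost of a trailing empty entry when the text ends with a line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical downloaded text mixing \r\n and \n line endings.
string textFromFile = "1\r\n00:00:01 --> 00:00:02\nHello\r\n";

// StringReader.ReadLine treats \r\n, \n and \r as line terminators.
var lines = new List<string>();
using (var sr = new StringReader(textFromFile))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        lines.Add(line);
    }
}

// Splitting on both separators also works, but keeps an empty last entry
// because the text ends with a line break.
string[] viaSplit = textFromFile.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

Console.WriteLine(lines.Count);     // 3
Console.WriteLine(viaSplit.Length); // 4
```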

How to read a Text File upto occurrence of a special character asterisk

I am trying to read a text file in ASP.NET. The file is not in a fixed format, so I just want to read it up to the first special character (*) and skip the rest.
In general it is of the format
00000 AFCX TY88YYY
12366 FTTT TY88YYY
** File Description
// This is so and so Description
** End of Description
12345 TYUI TY88YYY
45677 RERY TY88YYY
string file = "TextFile1.txt";
List<string> lines = new List<string>();
using (StreamReader r = new StreamReader(file))
{
string line;
while ((line = r.ReadLine()) != null && !line.StartsWith("*"))
{
lines.Add(line);
}
}
This will give you a list of all the lines except those beginning with * (StartsWith is used rather than First() so that empty lines do not throw):
string[] yourFileContents = File.ReadAllLines(filePath);
List<string> contentsWithoutAsterix =
yourFileContents.Where(line => !line.StartsWith("*")).ToList();
PS (edit):
If you just want the lines up to the first occurrence of *, do this instead:
List<string> contentsWithoutAsterix =
yourFileContents.TakeWhile(line => !line.StartsWith("*")).ToList();
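If the file is large, File.ReadLines can be combined with TakeWhile so that reading stops at the first asterisk line instead of loading the whole file first. This is a sketch under assumptions: it writes a hypothetical temporary file with part of the sample data from the question so that it is self-contained, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical temp file holding part of the sample data from the question.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "00000 AFCX TY88YYY",
    "12366 FTTT TY88YYY",
    "** File Description",
    "12345 TYUI TY88YYY"
});

// File.ReadLines is lazy, so enumeration ends at the first line starting with '*'.
List<string> head = File.ReadLines(path)
    .TakeWhile(l => !l.StartsWith("*", StringComparison.Ordinal))
    .ToList();

File.Delete(path);
Console.WriteLine(head.Count); // 2
```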

Exception "String cannot be of Zero length"

We are trying to read each word from a text file and replace it with another word.
For smaller text files, it works well. But for larger text files we keep getting the exception: "String cannot be of zero length. Parameter name: oldValue"
void replace()
{
string s1 = " ", s2 = " ";
StreamReader streamReader;
streamReader = File.OpenText("C:\\sample.txt");
StreamWriter streamWriter = File.CreateText("C:\\sample1.txt");
//int x = st.Rows.Count;
while ((line = streamReader.ReadLine()) != null)
{
char[] delimiterChars = { ' ', '\t' };
String[] words = line.Split(delimiterChars);
foreach (string str in words)
{
s1 = str;
DataRow drow = st.Rows.Find(str);
if (drow != null)
{
index = st.Rows.IndexOf(drow);
s2 = Convert.ToString(st.Rows[index]["Binary"]);
s2 += "000";
// Console.WriteLine(s1);
// Console.WriteLine(s2);
streamWriter.Write(s1.Replace(s1,s2)); // Exception occurs here
}
else
break;
}
}
streamReader.Close();
streamWriter.Close();
}
we're unable to find the reason.
Thanks in advance.
When you do your string.Split you may get empty entries if there are multiple spaces or tabs in sequence. These can't be replaced as the strings are 0 length.
Use the overload that strips empty results using the StringSplitOptions argument:
var words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
The exception occurs because s1 is an empty string at some point. You can avoid this by replacing the line
String[] words = line.Split(delimiterChars);
with this:
String[] words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
You want to change your Split method call like this:
String[] words = line.Split(delimiterChars,StringSplitOptions.RemoveEmptyEntries);
It means that s1 contains an empty string ("") which can happen if you have two consecutive white spaces or tabs in your file.
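The effect described in these answers can be reproduced in a few lines. This is a minimal sketch with a hypothetical input line, assuming C# 9 or later (top-level statements); it shows the empty entries produced by consecutive delimiters and the ArgumentException that Replace throws for an empty oldValue.

```csharp
using System;

char[] delimiterChars = { ' ', '\t' };

// Hypothetical line with two consecutive spaces and two consecutive tabs.
string line = "alpha  beta\t\tgamma";

// Without options, each pair of adjacent delimiters yields an empty string.
string[] withEmpties = line.Split(delimiterChars);

// RemoveEmptyEntries drops those empty strings.
string[] words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(withEmpties.Length); // 5
Console.WriteLine(words.Length);       // 3

// Replace rejects an empty oldValue, which is what the question ran into.
bool threw = false;
try
{
    "".Replace("", "x");
}
catch (ArgumentException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```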

How to remove empty lines from a formatted string

How can I remove empty lines in a string in C#?
I am generating some text files in C# (Windows Forms) and for some reason there are some empty lines. How can I remove them after the string is generated (using StringBuilder and TextWriter)?
Example text file:
THIS IS A LINE
THIS IS ANOTHER LINE AFTER SOME EMPTY LINES!
If you also want to remove lines that only contain whitespace, use
resultString = Regex.Replace(subjectString, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
^\s+$ will remove everything from the first blank line to the last (in a contiguous block of empty lines), including lines that only contain tabs or spaces.
[\r\n]* will then remove the last CRLF (or just LF which is important because the .NET regex engine matches the $ between a \r and a \n, funnily enough).
Tim Pietzcker - it is not working for me. I had to change it a little, but thanks!
Ehhh, C# Regex... I had to change it again, but this is working well:
private string RemoveEmptyLines(string lines)
{
return Regex.Replace(lines, @"^\s*$\n|\r", string.Empty, RegexOptions.Multiline).TrimEnd();
}
Example:
http://regex101.com/r/vE5mP1/2
You could try String.Replace("\n\n", "\n");
Try this
Regex.Replace(subjectString, @"^\r?\n?$", "", RegexOptions.Multiline);
private string remove_space(string st)
{
String final = "";
char[] b = new char[] { '\r', '\n' };
String[] lines = st.Split(b, StringSplitOptions.RemoveEmptyEntries);
foreach (String s in lines)
{
if (!String.IsNullOrWhiteSpace(s))
{
final += s;
final += Environment.NewLine;
}
}
return final;
}
private static string RemoveEmptyLines(string text)
{
var lines = text.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
var sb = new StringBuilder(text.Length);
foreach (var line in lines)
{
sb.AppendLine(line);
}
return sb.ToString();
}
None of the methods mentioned here helped me all the way, but I found a workaround.
Split the text into lines, giving a collection of strings (with or without empty strings, and with Trim() applied to each string).
Join these lines back into a multiline string.
public static IEnumerable<string> SplitToLines(this string inputText, bool removeEmptyLines = true)
{
if (inputText == null)
{
yield break;
}
using (StringReader reader = new StringReader(inputText))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Skip empty or whitespace-only lines when requested.
if (removeEmptyLines && string.IsNullOrWhiteSpace(line))
continue;
yield return line.Trim();
}
}
}
public static string ToMultilineText(this string text)
{
var lines = text.SplitToLines();
return string.Join(Environment.NewLine, lines);
}
Based on Evgeny Sobolev's code, I wrote this extension method, which also trims the last (obsolete) line break using TrimEnd(TrimNewLineChars):
public static class StringExtensions
{
private static readonly char[] TrimNewLineChars = Environment.NewLine.ToCharArray();
public static string RemoveEmptyLines(this string str)
{
if (str == null)
{
return null;
}
var lines = str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);
var stringBuilder = new StringBuilder(str.Length);
foreach (var line in lines)
{
stringBuilder.AppendLine(line);
}
return stringBuilder.ToString().TrimEnd(TrimNewLineChars);
}
}
I found a simple answer to this problem:
YourradTextBox.Lines = YourradTextBox.Lines.Where(p => p.Length > 0).ToArray();
Adapted from Marco Minerva [MCPD] at Delete Lines from multiline textbox if it's contain certain string - C#
I tried the previous answers, but some of the regex-based ones do not work correctly.
If you use a regex to find the empty lines, you can't use the same match for deleting,
because it will also erase the line breaks of lines that are not empty.
You have to use regex groups for this replacement.
Some of the other answers here that avoid regex can have performance issues.
private string remove_empty_lines(string text) {
StringBuilder text_sb = new StringBuilder(text);
Regex rg_spaces = new Regex(@"(\r\n|\r|\n)([\s]+\r\n|[\s]+\r|[\s]+\n)");
Match m = rg_spaces.Match(text_sb.ToString());
while (m.Success) {
text_sb = text_sb.Replace(m.Groups[2].Value, "");
m = rg_spaces.Match(text_sb.ToString());
}
return text_sb.ToString().Trim();
}
This pattern works well for removing empty lines and lines that contain only spaces and/or tabs.
s = Regex.Replace(s, @"^\s*(\r\n|\Z)", "", RegexOptions.Multiline);
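For comparison, a regex-free variant can be written with Split and LINQ. This is only a sketch, not one of the answers above: the sample text reuses the two lines from the question with hypothetical blank and whitespace-only lines between them, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Linq;

// The two lines from the question, separated by hypothetical blank lines.
string text = "THIS IS A LINE\r\n\r\n   \r\n\t\r\nTHIS IS ANOTHER LINE AFTER SOME EMPTY LINES!\r\n";

// Split on every common line ending, then drop empty and whitespace-only lines.
string[] kept = text
    .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
    .Where(l => !string.IsNullOrWhiteSpace(l))
    .ToArray();

// Rejoin with the platform line ending; no trailing line break is added.
string cleaned = string.Join(Environment.NewLine, kept);
Console.WriteLine(cleaned);
```

Unlike the SplitToLines answer, this version does not trim the kept lines, so leading indentation is preserved.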
