How to remove empty lines from a formatted string - c#

How can I remove empty lines in a string in C#?
I am generating some text files in C# (Windows Forms), and for some reason they contain empty lines. How can I remove them after the string is generated (using StringBuilder and TextWriter)?
Example text file:
THIS IS A LINE
THIS IS ANOTHER LINE AFTER SOME EMPTY LINES!

If you also want to remove lines that only contain whitespace, use
resultString = Regex.Replace(subjectString, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
^\s+$ will remove everything from the first blank line to the last (in a contiguous block of empty lines), including lines that only contain tabs or spaces.
[\r\n]* will then remove the last CRLF (or just LF which is important because the .NET regex engine matches the $ between a \r and a \n, funnily enough).
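As a rough check of the explanation above, here is a minimal sketch that wraps the same pattern in a helper and runs it on a string with one empty line and one whitespace-only line. The class name BlankLineDemo, the method name RemoveBlankLines and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankLineDemo
{
    // Removes empty lines and lines that contain only whitespace,
    // using the pattern from the answer above.
    public static string RemoveBlankLines(string text)
    {
        return Regex.Replace(text, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
    }

    public static void Main()
    {
        // Hypothetical sample: two blank lines (one holding a space and a tab)
        // between two lines of text.
        string input = "THIS IS A LINE\r\n\r\n \t\r\nTHIS IS ANOTHER LINE";
        Console.WriteLine(RemoveBlankLines(input));
    }
}
```

The line break that ends the first text line is kept, so the two remaining lines stay on separate lines.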

Tim Pietzcker - it is not working for me. I had to change it a little bit, but thanks!
Ehhh, C# Regex... I had to change it again, but this is working well:
private string RemoveEmptyLines(string lines)
{
return Regex.Replace(lines, @"^\s*$\n|\r", string.Empty, RegexOptions.Multiline).TrimEnd();
}
Example:
http://regex101.com/r/vE5mP1/2

You could try String.Replace("\n\n", "\n");
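One caveat with this suggestion: a single Replace pass only halves each run of line breaks, so three or more consecutive "\n" characters still leave an empty line behind. A minimal sketch of repeating the replacement until nothing changes, assuming the text uses "\n" only (no "\r\n"); the names NewlineCollapser and CollapseBlankLines are hypothetical:

```csharp
using System;

public static class NewlineCollapser
{
    // Repeats the replacement until no doubled line break is left.
    // Assumes LF-only line endings.
    public static string CollapseBlankLines(string text)
    {
        while (text.Contains("\n\n"))
        {
            text = text.Replace("\n\n", "\n");
        }
        return text;
    }

    public static void Main()
    {
        string input = "first\n\n\nsecond";
        // A single pass leaves one empty line; the loop removes it.
        Console.WriteLine(input.Replace("\n\n", "\n") == "first\n\nsecond"); // True
        Console.WriteLine(CollapseBlankLines(input) == "first\nsecond");     // True
    }
}
```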

Try this
Regex.Replace(subjectString, @"^\r?\n?$", "", RegexOptions.Multiline);

private string remove_space(string st)
{
String final = "";
char[] b = new char[] { '\r', '\n' };
String[] lines = st.Split(b, StringSplitOptions.RemoveEmptyEntries);
foreach (String s in lines)
{
if (!String.IsNullOrWhiteSpace(s))
{
final += s;
final += Environment.NewLine;
}
}
return final;
}

private static string RemoveEmptyLines(string text)
{
var lines = text.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
var sb = new StringBuilder(text.Length);
foreach (var line in lines)
{
sb.AppendLine(line);
}
return sb.ToString();
}

None of the methods mentioned here helped me all the way, but I found a workaround.
Split text to lines - collection of strings (with or without empty strings, also Trim() each string).
Add these lines to multiline string.
public static IEnumerable<string> SplitToLines(this string inputText, bool removeEmptyLines = true)
{
if (inputText == null)
{
yield break;
}
using (StringReader reader = new StringReader(inputText))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Skip blank lines only when the caller asked for that
if (removeEmptyLines && string.IsNullOrWhiteSpace(line))
continue;
yield return line.Trim();
}
}
}
public static string ToMultilineText(this string text)
{
var lines = text.SplitToLines();
return string.Join(Environment.NewLine, lines);
}

Based on Evgeny Sobolev's code, I wrote this extension method, which also trims the last (obsolete) line break using TrimEnd(TrimNewLineChars):
public static class StringExtensions
{
private static readonly char[] TrimNewLineChars = Environment.NewLine.ToCharArray();
public static string RemoveEmptyLines(this string str)
{
if (str == null)
{
return null;
}
var lines = str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);
var stringBuilder = new StringBuilder(str.Length);
foreach (var line in lines)
{
stringBuilder.AppendLine(line);
}
return stringBuilder.ToString().TrimEnd(TrimNewLineChars);
}
}

I found a simple answer to this problem:
YourradTextBox.Lines = YourradTextBox.Lines.Where(p => p.Length > 0).ToArray();
Adapted from Marco Minerva [MCPD] at Delete Lines from multiline textbox if it's contain certain string - C#
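The same filter can be applied to a plain string rather than a text box's Lines property. The sketch below is only an illustration under that assumption: the names LineFilter and RemoveEmptyLines are made up, and the result is joined with Environment.NewLine regardless of the line endings in the input.

```csharp
using System;
using System.Linq;

public static class LineFilter
{
    // Splits on CRLF, LF or CR, drops zero-length lines and joins the rest.
    // "\r\n" is listed first so a Windows line break is treated as one separator.
    public static string RemoveEmptyLines(string text)
    {
        var lines = text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
        return string.Join(Environment.NewLine, lines.Where(l => l.Length > 0));
    }

    public static void Main()
    {
        Console.WriteLine(RemoveEmptyLines("one\r\n\r\ntwo\n\nthree"));
    }
}
```

To also drop lines that contain only spaces or tabs, the predicate could be changed to !string.IsNullOrWhiteSpace(l).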

I tried the previous answers, but some of them with regex do not work right.
If you use a regex to find the empty lines, you can’t use the same regex for deleting them, because it will also erase the line breaks of lines that are not empty.
You have to use regex groups for this replacement.
Some other answers here that avoid regex can have performance issues.
private string remove_empty_lines(string text) {
StringBuilder text_sb = new StringBuilder(text);
Regex rg_spaces = new Regex(@"(\r\n|\r|\n)([\s]+\r\n|[\s]+\r|[\s]+\n)");
Match m = rg_spaces.Match(text_sb.ToString());
while (m.Success) {
text_sb = text_sb.Replace(m.Groups[2].Value, "");
m = rg_spaces.Match(text_sb.ToString());
}
return text_sb.ToString().Trim();
}

This pattern works perfectly to remove empty lines and lines with only spaces and/or tabs.
s = Regex.Replace(s, @"^\s*(\r\n|\Z)", "", RegexOptions.Multiline);

Related

Reading from txt file to array/list<>

I need to read all of .txt file and save data to array/list. File looks like this:
row11 row12 row13
row21 row22 row23
row31 row32 row33
between strings are only spaces.
Next I will insert data from array/list<> to mysql, but it is not problem.
Thanks.
EDIT: I need insert 3 columns to mysql like .txt file.
Use String.Split(Char[], StringSplitOptions) where the first parameter specifies that you want to split your string using spaces and tabs, and the second parameter specifies that you ignore empty entries (for cases where there are multiple spaces between entries)
Use this code:
var lines = System.IO.File.ReadAllLines(@"D:\test.txt");
var data = new List<List<string>>();
foreach (var line in lines)
{
var split = line.Split(new[]{' ', '\t'}, StringSplitOptions.RemoveEmptyEntries);
data.Add(split.ToList());
}
You can use File.ReadLines() to read the lines from the file, and then Regex.Split() to split each line into multiple strings:
static IEnumerable<String> SplitLines(string path, string splitPattern)
{
foreach (string line in File.ReadLines(path))
foreach (string part in Regex.Split(line, splitPattern))
yield return part;
}
To split by white space, you can use the regex pattern \s+:
var individualStrings = SplitLines(@"C:\path\to\file.txt", @"\s+");
You can use the ToList() extension method to convert it to a list:
List<string> individualStrings = SplitLines(@"D:\test\rows.txt", @"\s+").ToList();
As long as there are never spaces in the "values", a simple line-by-line parser will work.
A simple example
var reader = new StreamReader(filePath);
var resultList = new List<List<string>>();
string line;
while ((line = reader.ReadLine()) != null)
{
var currentValues = new List<string>();
// You can also use a StringBuilder
string currentValue = String.Empty;
foreach (char c in line)
{
if (Char.IsWhiteSpace(c))
{
if (currentValue.Length > 0)
{
currentValues.Add(currentValue);
currentValue = String.Empty;
}
continue;
}
currentValue += c;
}
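// Flush the last value on the line, which is not followed by whitespace
if (currentValue.Length > 0)
{
currentValues.Add(currentValue);
}
resultList.Add(currentValues);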
}
Here's a nifty one-liner based off Amadeusz's answer:
var lines = File.ReadAllLines(fileName).Select(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)).SelectMany(words => words);
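Note that SelectMany flattens every word into a single sequence, so the three-column structure from the question is lost. If the columns matter (for example for the MySQL insert), a variant that keeps one array per row might look like the sketch below. The names RowParser and ParseRows are hypothetical, and the method takes the lines as a parameter so it can be fed from File.ReadLines(path) or from any other source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowParser
{
    // Returns one string[] per non-blank line, split on spaces and tabs.
    public static List<string[]> ParseRows(IEnumerable<string> lines)
    {
        return lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .ToList();
    }

    public static void Main()
    {
        var rows = ParseRows(new[] { "row11 row12  row13", "row21\trow22 row23" });
        foreach (var row in rows)
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```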

How can I delete spaces from a text file and replace them with a semicolon?

I have this data in the test text file:
behzad razzaqi xezerlooot abrizii ast
I want to delete the spaces and replace each run of spaces with one semicolon character. I wrote this code in C# for that:
string[] allLines = File.ReadAllLines(@"d:\test.txt");
using (StreamWriter sw = new StreamWriter(@"d:\test.txt"))
{
foreach (string line in allLines)
{
if (!string.IsNullOrEmpty(line) && line.Length > 1)
{
sw.WriteLine(line.Replace(" ", ";"));
}
}
}
MessageBox.Show("ok");
This produces:
behzad;;razzaqi;;xezerlooot;;;abrizii;;;;;ast
But I want a single semicolon in place of each run of spaces. How can I solve that?
Regex is an option:
string[] allLines = File.ReadAllLines(@"d:\test.txt");
using (StreamWriter sw = new StreamWriter(@"d:\test.txt"))
{
foreach (string line in allLines)
{
if (!string.IsNullOrEmpty(line) && line.Length > 1)
{
sw.WriteLine(Regex.Replace(line, @"\s+", ";"));
}
}
}
MessageBox.Show("ok");
Use this code:
string[] allLines = File.ReadAllLines(@"d:\test.txt");
using (StreamWriter sw = new StreamWriter(@"d:\test.txt"))
{
foreach (string line in allLines)
{
string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string joined = String.Join(";", words);
sw.WriteLine(joined);
}
}
You need to use a regular expression:
(\s\s+)
Usage
var input = "behzad razzaqi xezerlooot abrizii ast";
var pattern = @"(\s\s+)";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, ";");
You can do that with a regular expression.
using System.Text.RegularExpressions;
and:
string pattern = "\\s+";
string replacement = ";";
Regex rgx = new Regex(pattern);
sw.WriteLine(rgx.Replace(line, replacement));
This regular expression matches any series of 1 or more spaces and replaces the entire series with a semicolon.
You can try this:
Regex r = new Regex(@"\s+");
string result = r.Replace("YourString", ";");
\s matches any whitespace character, and + means one or more occurrences.
For more information on regular expressions, see http://www.w3schools.com/jsref/jsref_obj_regexp.asp
You should check a string length after replacement, not before ;-).
const string file = @"d:\test.txt";
var result = File.ReadAllLines(file).Select(line => Regex.Replace(line, @"\s+", ";"));
File.WriteAllLines(file, result.Where(line => line.Length > 1));
...and don't forget that for the input " hello " (with leading and trailing whitespace) you will get ;hello;.
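One way to avoid the leading and trailing semicolons is to trim each line before replacing. A minimal sketch, assuming that surrounding whitespace carries no meaning in the file; the names SemicolonJoiner and SpacesToSemicolons are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SemicolonJoiner
{
    // Trims the line, then replaces every run of whitespace with one semicolon.
    public static string SpacesToSemicolons(string line)
    {
        return Regex.Replace(line.Trim(), @"\s+", ";");
    }

    public static void Main()
    {
        Console.WriteLine(SpacesToSemicolons("  behzad   razzaqi ast ")); // behzad;razzaqi;ast
    }
}
```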

Punctuation Problems

This is a program that reads in a CSV file, adds the values to a dictionary class and then analyses a string in a textbox to see if any of the words match the dictionary entry. It will replace abbreviations (LOL, ROFL etc) into their real words. It matches strings by splitting the inputted text into individual words.
public void btnanalyze_Click(object sender, EventArgs e)
{
var abbrev = new Dictionary<string, string>();
using (StreamReader reader = new StreamReader("C:/Users/Jordan Moffat/Desktop/coursework/textwords0.csv"))
{
string line;
string[] row;
while ((line = reader.ReadLine()) != null)
{
row = line.Split(',');
abbrev.Add(row[0], row[1]);
Console.WriteLine(abbrev);
}
}
string twitterinput;
twitterinput = "";
// string output;
twitterinput = txtInput.Text;
char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
string text = twitterinput;
string[] words = twitterinput.Split(delimiterChars);
string merge;
foreach (string s in words)
{
if (abbrev.ContainsKey(s))
{
string value = abbrev[s];
merge = string.Join(" ", value);
}
if (!abbrev.ContainsKey(s))
{
string not = s;
merge = string.Join(" ", not);
}
MessageBox.Show(merge);
}
}
The problem is that the program won't translate the word if there's punctuation. I realised the character set I was using meant that punctuation wasn't a problem, but also didn't allow me to retain it when printing out. Is there a way that I can ignore the last character, as opposed to removing it, and still retain it for the output? I was trying to write it into a new variable, but I can't find a way to do that either...
That seems overly complicated. You can do the same thing with regular expressions and backreferences.
var dict = new Dictionary<string,string>(); // your replacement dictionary
foreach(var line in yourReader)
{
var result = line;
foreach(var kvp in dict)
{
// Regex.Replace returns a new string; ${1} and ${2} re-insert the captured delimiters
result = System.Text.RegularExpressions.Regex.Replace(result, @"(\s|,|\.|:|\t)" + kvp.Key + @"(\s|,|\.|:|\t)", "${1}" + kvp.Value + "${2}");
}
}
I hacked this regex together so it may not be right, but it's the basic idea.
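An alternative that needs only one pass over the text is to match each word with \w+ and look it up in the dictionary from a match evaluator. Punctuation is never part of a match, so it stays in the output untouched. This is a sketch under assumptions, not the asker's code: the names AbbreviationExpander and Expand are hypothetical, and the lookup is case-sensitive unless the dictionary is built with a case-insensitive comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AbbreviationExpander
{
    // Replaces every word that is a key in 'abbrev' with its expansion.
    // Anything that is not a word character (spaces, commas, full stops)
    // is left exactly where it was.
    public static string Expand(string input, IDictionary<string, string> abbrev)
    {
        return Regex.Replace(input, @"\w+", m =>
            abbrev.TryGetValue(m.Value, out var full) ? full : m.Value);
    }

    public static void Main()
    {
        var abbrev = new Dictionary<string, string> { { "LOL", "laugh out loud" } };
        Console.WriteLine(Expand("LOL, that was funny.", abbrev));
    }
}
```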

Exception "String cannot be of Zero length"

We are trying to read each word from a text file and replace it with another word.
For smaller text files it works well, but for larger text files we keep getting the exception "String cannot be of zero length. Parameter name: oldValue".
void replace()
{
string s1 = " ", s2 = " ";
StreamReader streamReader;
streamReader = File.OpenText("C:\\sample.txt");
StreamWriter streamWriter = File.CreateText("C:\\sample1.txt");
//int x = st.Rows.Count;
while ((line = streamReader.ReadLine()) != null)
{
char[] delimiterChars = { ' ', '\t' };
String[] words = line.Split(delimiterChars);
foreach (string str in words)
{
s1 = str;
DataRow drow = st.Rows.Find(str);
if (drow != null)
{
index = st.Rows.IndexOf(drow);
s2 = Convert.ToString(st.Rows[index]["Binary"]);
s2 += "000";
// Console.WriteLine(s1);
// Console.WriteLine(s2);
streamWriter.Write(s1.Replace(s1,s2)); // Exception occurs here
}
else
break;
}
}
streamReader.Close();
streamWriter.Close();
}
We're unable to find the reason.
Thanks in advance.
When you do your string.Split you may get empty entries if there are multiple spaces or tabs in sequence. These can't be replaced as the strings are 0 length.
Use the overload that strips empty results using the StringSplitOptions argument:
var words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
The exception occurs because s1 is an empty string at some point. You can avoid this by replacing the line
String[] words = line.Split(delimiterChars);
with this:
String[] words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
You want to change your Split method call like this:
String[] words = line.Split(delimiterChars,StringSplitOptions.RemoveEmptyEntries);
It means that s1 contains an empty string ("") which can happen if you have two consecutive white spaces or tabs in your file.
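The effect described in these answers can be reproduced in a few lines. The sketch below uses a made-up input with two consecutive spaces; the class and method names are hypothetical.

```csharp
using System;

public static class SplitDemo
{
    private static readonly char[] Delimiters = { ' ', '\t' };

    // Default behaviour: consecutive delimiters produce empty strings.
    public static string[] SplitKeepingEmpty(string line)
    {
        return line.Split(Delimiters);
    }

    // With RemoveEmptyEntries the empty strings are dropped.
    public static string[] SplitDroppingEmpty(string line)
    {
        return line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        Console.WriteLine(SplitKeepingEmpty("a  b").Length);  // 3: "a", "", "b"
        Console.WriteLine(SplitDroppingEmpty("a  b").Length); // 2: "a", "b"

        try
        {
            // Replace rejects an empty oldValue, which is what the question ran into.
            "abc".Replace("", "x");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Larger files are simply more likely to contain a double space or a tab next to a space somewhere, which would explain why the exception only showed up with them.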

C#: How do I prepend text to each line in a string?

What would an implementation of 'MagicFunction' look like to make the following (nunit) test pass?
public void MagicFunction_Should_Prepend_Given_String_To_Each_Line()
{
var str = @"line1
line2
line3";
var result = MagicFunction(str, "-- ");
var expected = @"-- line1
-- line2
-- line3";
Assert.AreEqual(expected, result);
}
string MagicFunction(string str, string prepend)
{
str = str.Replace("\n", "\n" + prepend);
str = prepend + str;
return str;
}
EDIT:
As others have pointed out, the newline characters vary between environments. If you're only planning to use this function on files that were created in the same environment, then Environment.NewLine will work fine. However, if you create a file on a Linux box and then transfer it over to a Windows box, you'll want to specify a different type of newline. Since Linux uses \n and Windows uses \r\n, this piece of code will work for both Windows and Linux files. If you're throwing classic Mac files into the mix (\r), you'll have to come up with something a little more involved.
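For the case where \r-only line endings may also appear, one possible approach is to match any of the three terminators with a regex and re-emit the terminator followed by the prefix. This is only a sketch; the names LinePrefixer and PrependEachLine are hypothetical. Listing \r\n first in the alternation makes a Windows line break count as a single terminator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinePrefixer
{
    // Prefixes the first line, then inserts the prefix after every
    // line terminator, keeping each terminator exactly as it was.
    public static string PrependEachLine(string text, string prefix)
    {
        return prefix + Regex.Replace(text, @"\r\n|\r|\n", m => m.Value + prefix);
    }

    public static void Main()
    {
        Console.WriteLine(PrependEachLine("line1\nline2\nline3", "-- "));
    }
}
```

A match evaluator is used instead of a replacement string so that a prefix containing $ is inserted literally.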
Use .Select on a list of the lines.
private static string MagicFunction(string str, string prefix)
{
string[] lines = str.Split(new[] { '\n' });
return string.Join("\n", lines.Select(s => prefix + s).ToArray());
}
How about:
// Matches the start of the text and every Windows line break
public static readonly Regex regex = new Regex(
"(^|\\r\\n)",
RegexOptions.IgnoreCase
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
// This is the replacement string
public static readonly string regexReplace =
"$1-- ";
string MagicFunction(string InputText) {
// Replace the matched text in the InputText using the replacement pattern
string result = regex.Replace(InputText,regexReplace);
return result;
}
var result = "-- " + str.Replace(Environment.NewLine, Environment.NewLine + "-- ");
If you want it to cope with either Windows (\r\n) newlines or Unix ones (\n), then:
var result = "-- " + str.Replace("\n", "\n-- ");
No need to touch the \r, as it is to be left where it was before. If, however, you want to convert between Unix and Windows, then:
var result = "-- " + str.Replace("\r","").Replace("\n", Environment.NewLine + "-- ");
This will do it and return the result in the local OS's format.
You could do it like this:
public string MagicFunction2(string str, string prefix)
{
bool first = true;
using(StringWriter writer = new StringWriter())
using(StringReader reader = new StringReader(str))
{
string line;
while((line = reader.ReadLine()) != null)
{
if (!first)
writer.WriteLine();
writer.Write(prefix + line);
first = false;
}
return writer.ToString();
}
}
You could split the string by Environment.NewLine, and then add the prefix to each of those string, and then join them by Environment.NewLine.
// Parameter order matches the test in the question: text first, then prefix
string MagicFunction(string originalString, string prefix)
{
List<string> prefixed = new List<string>();
foreach (string s in originalString.Split(new[]{Environment.NewLine}, StringSplitOptions.None))
{
prefixed.Add(prefix + s);
}
return String.Join(Environment.NewLine, prefixed.ToArray());
}
How about this. It uses StringBuilder in case you are planning on prepending a lot of lines.
string MagicFunction(string input)
{
StringBuilder sb = new StringBuilder();
string line = null;
using(StringReader sr = new StringReader(input))
{
while((line = sr.ReadLine()) != null)
{
sb.Append(String.Concat("-- ", line, System.Environment.NewLine));
}
}
return sb.ToString();
}
Thanks all for your answers. I implemented the MagicFunction as an extension method. It leverages Thomas Levesque's answer but is enhanced to handle all major environments AND assumes you want the output string to use the same newline terminator of the input string.
I favored Thomas Levesque's answer (over Spencer Ruport's, Fredrik Mork's, Lazarus, and JDunkerley) because it was the best performing. I'll post performance results on my blog and link here later for those interested.
(Obviously, the function name of 'MagicFunctionIO' should be changed. I went with 'PrependEachLineWith')
public static string MagicFunctionIO(this string self, string prefix)
{
string terminator = self.GetLineTerminator();
using (StringWriter writer = new StringWriter())
{
using (StringReader reader = new StringReader(self))
{
bool first = true;
string line;
while ((line = reader.ReadLine()) != null)
{
if (!first)
writer.Write(terminator);
writer.Write(prefix + line);
first = false;
}
return writer.ToString();
}
}
}
public static string GetLineTerminator(this string self)
{
if (self.Contains("\r\n")) // windows
return "\r\n";
else if (self.Contains("\n")) // unix
return "\n";
else if (self.Contains("\r")) // mac
return "\r";
else // default, unknown env or no line terminators
return Environment.NewLine;
}
