What would an implementation of 'MagicFunction' look like to make the following (nunit) test pass?
public void MagicFunction_Should_Prepend_Given_String_To_Each_Line()
{
var str = @"line1
line2
line3";
var result = MagicFunction(str, "-- ");
var expected = @"-- line1
-- line2
-- line3";
Assert.AreEqual(expected, result);
}
string MagicFunction(string str, string prepend)
{
str = str.Replace("\n", "\n" + prepend);
str = prepend + str;
return str;
}
EDIT:
As others have pointed out, newline characters vary between environments. If you only plan to use this function on files that were created in the same environment, System.Environment.NewLine will work fine. However, if you create a file on a Linux box and then transfer it to a Windows box, you'll need to handle a different type of newline. Since Linux uses \n and Windows uses \r\n, this piece of code will work for both Windows and Linux files. If you're throwing older Mac files into the mix (\r), you'll have to come up with something a little more involved.
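For the mixed case (\r\n, \n and a lone \r in the same input), one option is to match each terminator and write it back followed by the prefix. This is only a minimal sketch: the LinePrefixer class and its PrependToEachLine method are names made up for illustration and do not come from any answer here. The \r\n alternative is listed first so that a Windows terminator is not treated as two separate breaks.

```csharp
using System.Text.RegularExpressions;

public static class LinePrefixer
{
    // Hypothetical helper: prefixes every line and keeps each original terminator as-is.
    public static string PrependToEachLine(string text, string prefix)
    {
        // "\r\n" must come first; otherwise "\r" alone would match inside a CRLF pair.
        // A MatchEvaluator is used so that "$" in the prefix is not read as a substitution.
        return prefix + Regex.Replace(text, @"\r\n|\r|\n", m => m.Value + prefix);
    }
}
```

Calling LinePrefixer.PrependToEachLine("a\r\nb\nc", "-- ") should return "-- a\r\n-- b\n-- c", with every terminator left exactly as it was in the input.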
Use .Select on a list of the lines.
private static string MagicFunction(string str, string prefix)
{
string[] lines = str.Split(new[] { '\n' });
return string.Join("\n", lines.Select(s => prefix + s).ToArray());
}
How about:
// Matches the start of the input and every \r\n line break.
// Fields cannot be declared inside a method, so they live at class level.
private static readonly Regex regex = new Regex(
"(^|\\r\\n)",
RegexOptions.CultureInvariant
| RegexOptions.Compiled
);
// This is the replacement string
private const string regexReplace = "$1-- ";
string MagicFunction(string InputText) {
// Replace the matched text in the InputText using the replacement pattern
return regex.Replace(InputText, regexReplace);
}
var result = "-- " + str.Replace(Environment.NewLine, Environment.NewLine + "-- ");
If you want it to cope with either Windows (\r\n) or Unix (\n) newlines, then:
var result = "-- " + str.Replace("\n", "\n-- ");
There is no need to touch the \r, as it is left where it was before. If, however, you want to convert between Unix and Windows, then:
var result = "-- " + str.Replace("\r","").Replace("\n", Environment.NewLine + "-- ");
will do it and return the result in the local OS's format.
You could do it like this:
public string MagicFunction2(string str, string prefix)
{
bool first = true;
using(StringWriter writer = new StringWriter())
using(StringReader reader = new StringReader(str))
{
string line;
while((line = reader.ReadLine()) != null)
{
if (!first)
writer.WriteLine();
writer.Write(prefix + line);
first = false;
}
return writer.ToString();
}
}
You could split the string by Environment.NewLine, and then add the prefix to each of those string, and then join them by Environment.NewLine.
string MagicFunction(string prefix, string originalString)
{
List<string> prefixed = new List<string>();
foreach (string s in originalString.Split(new[]{Environment.NewLine}, StringSplitOptions.None))
{
prefixed.Add(prefix + s);
}
return String.Join(Environment.NewLine, prefixed.ToArray());
}
How about this. It uses StringBuilder in case you are planning on prepending a lot of lines.
string MagicFunction(string input)
{
StringBuilder sb = new StringBuilder();
string line;
using(StringReader sr = new StringReader(input))
{
while((line = sr.ReadLine()) != null)
{
// Write the separator before every line except the first,
// so the result has no trailing newline.
if (sb.Length > 0)
sb.Append(System.Environment.NewLine);
sb.Append("-- ").Append(line);
}
}
return sb.ToString();
}
Thanks all for your answers. I implemented the MagicFunction as an extension method. It leverages Thomas Levesque's answer but is enhanced to handle all major environments AND assumes you want the output string to use the same newline terminator of the input string.
I favored Thomas Levesque's answer (over Spencer Ruport's, Fredrik Mork's, Lazarus, and JDunkerley) because it was the best performing. I'll post performance results on my blog and link here later for those interested.
(Obviously, the function name of 'MagicFunctionIO' should be changed. I went with 'PrependEachLineWith')
public static string MagicFunctionIO(this string self, string prefix)
{
string terminator = self.GetLineTerminator();
using (StringWriter writer = new StringWriter())
{
using (StringReader reader = new StringReader(self))
{
bool first = true;
string line;
while ((line = reader.ReadLine()) != null)
{
if (!first)
writer.Write(terminator);
writer.Write(prefix + line);
first = false;
}
return writer.ToString();
}
}
}
public static string GetLineTerminator(this string self)
{
if (self.Contains("\r\n")) // windows
return "\r\n";
else if (self.Contains("\n")) // unix
return "\n";
else if (self.Contains("\r")) // mac
return "\r";
else // default, unknown env or no line terminators
return Environment.NewLine;
}
I have a text file and I want to replace all matches in each line. I defined a pattern, loop through the text file, and then write the result to another file. Unfortunately, my pattern only replaces the first occurrence of the word. What am I doing wrong?
Content of text file:
"testebook kok o testebook\ntestbbb1232 joj ds testbbb1232"
using System.Text.RegularExpressions;
string filePath = "test.txt";
string fileNewPath = "test1.txt";
string ma = @"^test[0-9a-zA-Z]+";
string newString = string.Empty;
using(StreamReader sr = new(filePath)){
string line = sr.ReadLine();
while (line != null){
while(Regex.IsMatch(line, ma) != false){
line = Regex.Replace(line, ma, "");
}
newString += line + "\n";
line = sr.ReadLine();
}
}
using(StreamWriter sw = new(fileNewPath)){
sw.WriteLine(newString);
}
Your code is correct, but your regex pattern is not.
You should write this:
string ma = @"test[0-9a-zA-Z]+";
The "^" anchor has been removed from the pattern, because it only allows a match at the very start of the line.
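To see the difference, here is a small sketch that runs both patterns over the first line of the sample file. The AnchorDemo class is made up for illustration only. With ^ the pattern can match at most once per line, so the second "testebook" is never reached, no matter how often the replace is repeated.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // First line of the sample file from the question.
    private const string Line = "testebook kok o testebook";

    // With ^ the pattern can only match at the very start of the line.
    public static string Anchored() => Regex.Replace(Line, @"^test[0-9a-zA-Z]+", "");

    // Without ^ every occurrence on the line is replaced in a single call.
    public static string Unanchored() => Regex.Replace(Line, @"test[0-9a-zA-Z]+", "");
}
```

AnchorDemo.Anchored() leaves " kok o testebook", while AnchorDemo.Unanchored() leaves " kok o ".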
So I modified my pattern to remove the start-of-line anchor, and everything now works as desired:
using System.Text.RegularExpressions;
string filePath = "test.txt";
string fileNewPath = "test1.txt";
MatchesFinder test = new(filePath, fileNewPath);
test.RunTheProcess();
class MatchesFinder{
private string filePath;
private string fileNewPath;
private string ma = @"test[a-zA-Z0-9]+";
public MatchesFinder(string filePath,string fileNewPath){
this.filePath = filePath;
this.fileNewPath = fileNewPath;
}
public void RunTheProcess(){
string newString = string.Empty;
using(StreamReader sr = new(filePath)){
string line = sr.ReadLine();
while (line != null){
while(Regex.IsMatch(line, ma) != false){
line = Regex.Replace(line, ma, string.Empty);
}
newString += line.TrimStart() + "\n";
line = sr.ReadLine();
}
}
using(StreamWriter sw = new(fileNewPath)){
sw.WriteLine(newString);
}
}
}
I think you don't need to check IsMatch separately; just calling Regex.Replace should yield the same result.
Also, newString += line.TrimStart() + "\n"; means you're copying all the lines you've already checked every time you append a new line. I'd either write directly to the output stream or at least use a StringBuilder if you really want to have the full file in memory for some reason.
Something like this:
using var sw = new StreamWriter(fileNewPath);
using var sr = new StreamReader(filePath);
var line = sr.ReadLine();
while (line != null){
line = Regex.Replace(line, ma, string.Empty);
sw.WriteLine(line.TrimStart());
line = sr.ReadLine();
}
How can I remove all the HTML tags, including &nbsp;, using regex in C#? My string looks like
"<div>hello</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div> </div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div>"
If you can't use an HTML parser oriented solution to filter out the tags, here's a simple regex for it.
string noHTML = Regex.Replace(inputHTML, @"<[^>]+>|&nbsp;", "").Trim();
Ideally, you should make another pass through a regex filter that collapses multiple spaces:
string noHTMLNormalised = Regex.Replace(noHTML, @"\s{2,}", " ");
I took @Ravi Thapliyal's code and made a method. It is simple and might not clean everything, but so far it is doing what I need it to do.
public static string ScrubHtml(string value) {
var step1 = Regex.Replace(value, @"<[^>]+>|&nbsp;", "").Trim();
var step2 = Regex.Replace(step1, @"\s{2,}", " ");
return step2;
}
I've been using this function for a while. Removes pretty much any messy html you can throw at it and leaves the text intact.
private static readonly Regex _tags_ = new Regex(@"<[^>]+?>", RegexOptions.Multiline | RegexOptions.Compiled);
//add characters that should not be removed to this regex
private static readonly Regex _notOkCharacter_ = new Regex(@"[^\w;&#@.:/\\?=|%!() -]", RegexOptions.Compiled);
public static String UnHtml(String html)
{
html = HttpUtility.UrlDecode(html);
html = HttpUtility.HtmlDecode(html);
html = RemoveTag(html, "<!--", "-->");
html = RemoveTag(html, "<script", "</script>");
html = RemoveTag(html, "<style", "</style>");
//replace matches of these regexes with space
html = _tags_.Replace(html, " ");
html = _notOkCharacter_.Replace(html, " ");
html = SingleSpacedTrim(html);
return html;
}
private static String RemoveTag(String html, String startTag, String endTag)
{
Boolean bAgain;
do
{
bAgain = false;
Int32 startTagPos = html.IndexOf(startTag, 0, StringComparison.CurrentCultureIgnoreCase);
if (startTagPos < 0)
continue;
Int32 endTagPos = html.IndexOf(endTag, startTagPos + 1, StringComparison.CurrentCultureIgnoreCase);
if (endTagPos <= startTagPos)
continue;
html = html.Remove(startTagPos, endTagPos - startTagPos + endTag.Length);
bAgain = true;
} while (bAgain);
return html;
}
private static String SingleSpacedTrim(String inString)
{
StringBuilder sb = new StringBuilder();
Boolean inBlanks = false;
foreach (Char c in inString)
{
switch (c)
{
case '\r':
case '\n':
case '\t':
case ' ':
if (!inBlanks)
{
inBlanks = true;
sb.Append(' ');
}
continue;
default:
inBlanks = false;
sb.Append(c);
break;
}
}
return sb.ToString().Trim();
}
var noHtml = Regex.Replace(inputHTML, @"<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;", string.Empty).Trim();
I have used @RaviThapliyal's and @Don Rolling's code with a small modification. They replace &nbsp; with an empty string, but &nbsp; should be replaced with a space, so I added an additional step. It worked like a charm for me.
public static string FormatString(string value) {
var step1 = Regex.Replace(value, @"<[^>]+>", "").Trim();
var step2 = Regex.Replace(step1, @"&nbsp;", " ");
var step3 = Regex.Replace(step2, @"\s{2,}", " ");
return step3;
}
(The original post showed &nbsp without the semicolon because Stack Overflow was rendering the entity.)
This:
(<.+?>|&nbsp;)
will match any tag or &nbsp;
string regex = @"(<.+?>|&nbsp;)";
var x = Regex.Replace(originalString, regex, "").Trim();
Then x = hello
Sanitizing an HTML document involves a lot of tricky things. This package may be of help:
https://github.com/mganss/HtmlSanitizer
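Well-formed HTML (XHTML) is, in its basic form, just XML. You could parse your text into an XmlDocument object and call InnerText on the root element to extract the text. This strips all HTML tags in any form and also deals with escaped special characters like &lt; in one go. Note that this only works if the markup is well-formed XML, so unclosed tags such as <br> will make the parse fail.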
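A minimal sketch of that idea, assuming the input is well-formed XML with a single root element. The XmlTextExtractor class and the ExtractText name are made up for illustration; XmlDocument, LoadXml and InnerText are the standard System.Xml members.

```csharp
using System.Xml;

public static class XmlTextExtractor
{
    // Hypothetical helper: returns the concatenated text of all nodes under the root element.
    public static string ExtractText(string xhtml)
    {
        var doc = new XmlDocument();
        // LoadXml throws an XmlException when the markup is not well-formed XML,
        // for example on an unclosed <br> or an undeclared entity such as &nbsp;.
        doc.LoadXml(xhtml);
        return doc.DocumentElement.InnerText;
    }
}
```

For example, XmlTextExtractor.ExtractText("<div><p>a &lt; b</p><br/></div>") returns "a < b". The string from the question would need its <br> tags closed and its &nbsp; entities replaced before this approach could parse it.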
I'm using this syntax (JavaScript) to remove HTML tags along with &nbsp;:
SessionTitle: result[i].sessionTitle.replace(/<[^>]+>|&nbsp;/g, '')
(<([^>]+)>|&nbsp;)
You can test it here:
https://regex101.com/r/kB0rQ4/1
I need to use a string for path for a file but sometimes there are forbidden characters in this string and I must replace them. For example, my string _title is rumbaton jonathan \"racko\" contreras.
Well I should replace the chars \ and ".
I tried this but it doesn't work:
_title.Replace(@"/", "");
_title.Replace(@"\", "");
_title.Replace(@"*", "");
_title.Replace(@"?", "");
_title.Replace(@"<", "");
_title.Replace(@">", "");
_title.Replace(@"|", "");
Since strings are immutable, the Replace method returns a new string; it doesn't modify the instance you call it on. So try this:
_title = _title
.Replace(@"/", "")
.Replace(@"""", "")
.Replace(@"*", "")
.Replace(@"?", "")
.Replace(@"<", "")
.Replace(@">", "")
.Replace(@"|", "");
Also, if you want to replace ", make sure you have escaped it properly.
Try regex
string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");
Before: "M"\a/ry/ h**ad:>> a\/:*?"| li*tt|le|| la"mb.?
After: Mary had a little lamb.
Another answer from the same post is much cleaner:
private static string CleanFileName(string fileName)
{
return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}
from How to remove illegal characters from path and filenames?
Or you could try this (probably terribly inefficient) method:
string inputString = @"File ~!@#$%^&*()_+|`1234567890-=\[];',./{}:""<>? name";
var badchars = Path.GetInvalidFileNameChars();
foreach (var c in badchars)
inputString = inputString.Replace(c.ToString(), "");
The result will be:
File ~!@#$%^&()_+`1234567890-=[];',.{} name
But feel free to add more chars to the badchars before running the foreach loop on them.
See http://msdn.microsoft.com/cs-cz/library/fk49wtc1.aspx:
Returns a string that is equivalent to the current string except that all instances of oldValue are replaced with newValue.
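A small sketch of what that sentence means in practice. The ReplaceDemo class is made up for illustration: the first method discards the return value, which is what the code in the question does, and the second assigns it back.

```csharp
public static class ReplaceDemo
{
    // Calling Replace and discarding the result leaves the original string untouched.
    public static string Discarded(string title)
    {
        title.Replace("\"", "");
        return title;
    }

    // Assigning the returned string back is what actually changes the variable.
    public static string Assigned(string title)
    {
        title = title.Replace("\"", "");
        return title;
    }
}
```

With the title from the question, Discarded returns the text with its quotes still in place, while Assigned returns it without them.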
I have written a method to do the exact operation that you want and with much cleaner code.
The method
public static string Delete(this string target, string samples) {
if (string.IsNullOrEmpty(target) || string.IsNullOrEmpty(samples))
return target;
var tar = target.ToCharArray();
const char deletechar = '♣'; //a char that most likely never to be used in the input
for (var i = 0; i < tar.Length; i++) {
for (var j = 0; j < samples.Length; j++) {
if (tar[i] == samples[j]) {
tar[i] = deletechar;
break;
}
}
}
return new string(tar).Replace(deletechar.ToString(CultureInfo.InvariantCulture), string.Empty);
}
Sample
var input = "rumbaton jonathan \"racko\" contreras";
var cleaned = input.Delete("\"\\/*?><|");
Will result in:
rumbaton jonathan racko contreras
OK! I've solved my issue thanks to all your suggestions. This is my fix:
string newFileName = _artist + " - " + _title;
char[] invalidFileChars = Path.GetInvalidFileNameChars();
char[] invalidPathChars = Path.GetInvalidPathChars();
foreach (char invalidChar in invalidFileChars)
{
newFileName = newFileName.Replace(invalidChar.ToString(), string.Empty);
}
foreach (char invalidChar in invalidPathChars)
{
newFilePath = newFilePath.Replace(invalidChar.ToString(), string.Empty);
}
Thank you so much, everybody :)
How can I remove empty lines in a string in C#?
I am generating some text files in C# (Windows Forms) and for some reason there are some empty lines. How can I remove them after the string is generated (using StringBuilder and TextWriter)?
Example text file:
THIS IS A LINE
THIS IS ANOTHER LINE AFTER SOME EMPTY LINES!
If you also want to remove lines that only contain whitespace, use
resultString = Regex.Replace(subjectString, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
^\s+$ will remove everything from the first blank line to the last (in a contiguous block of empty lines), including lines that only contain tabs or spaces.
[\r\n]* will then remove the last CRLF (or just LF which is important because the .NET regex engine matches the $ between a \r and a \n, funnily enough).
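Here is a short sketch that wraps the pattern above in a method so it can be tried on a sample string. The BlankLineRemover class and its method name are made up for illustration; the pattern and options are exactly the ones from this answer.

```csharp
using System.Text.RegularExpressions;

public static class BlankLineRemover
{
    // Hypothetical wrapper around the pattern discussed above.
    public static string RemoveBlankLines(string text)
    {
        // Multiline makes ^ and $ match at line boundaries instead of only at the ends of the input.
        return Regex.Replace(text, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
    }
}
```

For the input "A\r\n\r\n   \r\nB" (one empty line and one line of spaces between A and B) this returns "A\r\nB". The line break that ends the "A" line is kept, because the match can only begin at the start of a line.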
Tim Pietzcker, it is not working for me. I had to change it a little bit, but thanks!
Ehh, C# Regex... I had to change it again, but this is working well:
private string RemoveEmptyLines(string lines)
{
return Regex.Replace(lines, @"^\s*$\n|\r", string.Empty, RegexOptions.Multiline).TrimEnd();
}
Example:
http://regex101.com/r/vE5mP1/2
You could try String.Replace("\n\n", "\n");
Try this
Regex.Replace(subjectString, @"^\r?\n?$", "", RegexOptions.Multiline);
private string remove_space(string st)
{
String final = "";
char[] b = new char[] { '\r', '\n' };
String[] lines = st.Split(b, StringSplitOptions.RemoveEmptyEntries);
foreach (String s in lines)
{
if (!String.IsNullOrWhiteSpace(s))
{
final += s;
final += Environment.NewLine;
}
}
return final;
}
private static string RemoveEmptyLines(string text)
{
var lines = text.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
var sb = new StringBuilder(text.Length);
foreach (var line in lines)
{
sb.AppendLine(line);
}
return sb.ToString();
}
None of the methods mentioned here helped me all the way, but I found a workaround.
Split text to lines - collection of strings (with or without empty strings, also Trim() each string).
Add these lines to multiline string.
public static IEnumerable<string> SplitToLines(this string inputText, bool removeEmptyLines = true)
{
if (inputText == null)
{
yield break;
}
using (StringReader reader = new StringReader(inputText))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Skip blank lines only when the caller asked for them to be removed
if (removeEmptyLines && string.IsNullOrWhiteSpace(line))
continue;
yield return line.Trim();
}
}
}
public static string ToMultilineText(this string text)
{
var lines = text.SplitToLines();
return string.Join(Environment.NewLine, lines);
}
Based on Evgeny Sobolev's code, I wrote this extension method, which also trims the last (obsolete) line break using TrimEnd(TrimNewLineChars):
public static class StringExtensions
{
private static readonly char[] TrimNewLineChars = Environment.NewLine.ToCharArray();
public static string RemoveEmptyLines(this string str)
{
if (str == null)
{
return null;
}
var lines = str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);
var stringBuilder = new StringBuilder(str.Length);
foreach (var line in lines)
{
stringBuilder.AppendLine(line);
}
return stringBuilder.ToString().TrimEnd(TrimNewLineChars);
}
}
I found a simple answer to this problem:
YourradTextBox.Lines = YourradTextBox.Lines.Where(p => p.Length > 0).ToArray();
Adapted from Marco Minerva [MCPD] at Delete Lines from multiline textbox if it's contain certain string - C#
I tried the previous answers, but some of the regex ones do not work right.
If you use a regex to find the empty lines, you can't use the same match for deleting, because it will also erase the line breaks of lines that are not empty.
You have to use regex groups for this replace.
Some of the other answers here that don't use regex can have performance issues.
private string remove_empty_lines(string text) {
StringBuilder text_sb = new StringBuilder(text);
Regex rg_spaces = new Regex(@"(\r\n|\r|\n)([\s]+\r\n|[\s]+\r|[\s]+\n)");
Match m = rg_spaces.Match(text_sb.ToString());
while (m.Success) {
text_sb = text_sb.Replace(m.Groups[2].Value, "");
m = rg_spaces.Match(text_sb.ToString());
}
return text_sb.ToString().Trim();
}
This pattern works perfectly to remove empty lines and lines with only spaces and/or tabs.
s = Regex.Replace(s, @"^\s*(\r\n|\Z)", "", RegexOptions.Multiline);
Hey guys, thanks in advance for any help you can provide. I need a bit of regex help that's far beyond my knowledge.
I have a listbox with a file name in it, for example 3123~101, which is a delimited file containing one line of text. I need to regex everything after the last "\" and before the last "-" in the text file. The ending could contain a prefix followed by ###{####-004587}.txt. The ~ formula is {#### + ~# - 1, i.e. the number after the "{" plus the number after the "~", minus 1 (in the examples below, 442 + 101 - 1 = 542).
File name:
3123~101
Example 1:
3123|X:directory\Path\Directory|Pre0{0442-0500}.txt
Result:
X:\directory\Path\Directory\Pre00542.txt
File name:
3123~101
Example 2:
3123|X:directory\Path\Directory|0{0442-0500}.txt
Result:
X:\directory\Path\Directory\00542.txt
According to your example, I've created the following regexp:
\|(.)(.*)\|(.*)\{\d{2}(\d{2})\-(\d{2}).*(\..*)
The result should be as follows:
group1 + "\\" + group2 + "\\" + group3 + group5 + group4 + group6
If you aren't satisfied, you can always give it a spin yourself here.
EDIT:
After being reminded about named groups:
\|(?<drive>.)(?<path>.*)\|(?<prefix>.*)\{\d{2}(?<number2>\d{2})\-(?<number1>\d{2}).*(?<extension>\..*)
drive + "\\" + path + "\\" + prefix + number1 + number2 + extension
public static string AdjustPath(string filename, string line)
{
int tilde = GetTilde(filename);
string[] fields = Regex.Split(line, @"\|");
var addbackslash = new MatchEvaluator(
m => m.Groups[1].Value + "\\" + m.Groups[2].Value);
string dir = Regex.Replace(fields[1], @"^([A-Z]:)([^\\])", addbackslash);
var addtilde = new MatchEvaluator(
m => (tilde + Int32.Parse(m.Groups[1].Value) - 1).
ToString().
PadLeft(m.Groups[1].Value.Length, '0'));
return Path.Combine(dir, Regex.Replace(fields[2], @"\{(\d+)-.+}", addtilde));
}
private static int GetTilde(string filename)
{
Match m = Regex.Match(filename, @"^.+~(\d+)$");
if (!m.Success)
throw new ArgumentException("Invalid filename", "filename");
return Int32.Parse(m.Groups[1].Value);
}
Call AdjustPath as in the following:
public static void Main(string[] args)
{
Console.WriteLine(AdjustPath("3123~101", @"3123|X:directory\Path\Directory|Pre0{0442-0500}.txt"));
Console.WriteLine(AdjustPath("3123~101", @"3123|X:directory\Path\Directory|0{0442-0500}.txt"));
}
Output:
X:\directory\Path\Directory\Pre00542.txt
X:\directory\Path\Directory\00542.txt
If instead you want to write the output to a file, use
public static void WriteAdjustedPaths(string inpath, string outpath)
{
using (var w = new StreamWriter(outpath))
using (var r = new StreamReader(inpath)) // dispose the reader as well as the writer
{
string line;
while ((line = r.ReadLine()) != null)
w.WriteLine("{0}", AdjustPath(inpath, line));
}
}
You might call it with
WriteAdjustedPaths("3123~101", "output.txt");
If you want a List<String> instead
public static List<String> AdjustedPaths(string inpath)
{
var paths = new List<String>();
using (var r = new StreamReader(inpath)) // dispose the reader when done
{
string line;
while ((line = r.ReadLine()) != null)
paths.Add(AdjustPath(inpath, line));
}
return paths;
}
To avoid repeated logic, we should define WriteAdjustedPaths in terms of the new function:
public static void WriteAdjustedPaths(string inpath, string outpath)
{
using (var w = new StreamWriter(outpath))
{
foreach (var p in AdjustedPaths(inpath))
w.WriteLine("{0}", p);
}
}
The syntax could be streamlined with Linq. See C# File Handling.
A slight variation on gbacon's answer that will also work in older versions of .Net:
static void Main(string[] args)
{
Console.WriteLine(Adjust("3123~101", @"3123|X:directory\Path\Directory|Pre0{0442-0500}.txt"));
Console.WriteLine(Adjust("3123~101", @"3123|X:directory\Path\Directory|0{0442-0500}.txt"));
}
private static string Adjust(string name, string file)
{
Regex nameParse = new Regex(@"\d*~(?<value>\d*)");
Regex fileParse = new Regex(@"\d*\|(?<drive>[A-Za-z]):(?<path>[^\|]*)\|(?<prefix>[^{]*){(?<code>\d*)");
Match nameMatch = nameParse.Match(name);
Match fileMatch = fileParse.Match(file);
int value = Convert.ToInt32(nameMatch.Groups["value"].Value);
int code = Convert.ToInt32(fileMatch.Groups["code"].Value);
code = code + value - 1;
string drive = fileMatch.Groups["drive"].Value;
string path = fileMatch.Groups["path"].Value;
string prefix = fileMatch.Groups["prefix"].Value;
string result = string.Format(@"{0}:\{1}\{2}{3:0000}.txt",
drive,
path,
prefix,
code);
return result;
}
You don't seem to be very clear in your examples.
That said,
/.*\\(.*)-[^-]*$/
will capture all text between the last backslash and the last hyphen in whatever it's matched against.
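In C# the pattern is written without the surrounding slashes. A minimal sketch, with the LastSegment class and its method name made up for illustration, and with an empty string returned when nothing matches (that fallback is an assumption, not part of the answer):

```csharp
using System.Text.RegularExpressions;

public static class LastSegment
{
    // Hypothetical helper: returns the text between the last backslash and the last hyphen.
    public static string BetweenLastBackslashAndLastHyphen(string input)
    {
        // The leading .* is greedy, so \\ ends up matching the last backslash;
        // [^-]*$ forces the hyphen after the group to be the last one in the input.
        Match m = Regex.Match(input, @".*\\(.*)-[^-]*$");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the input X:\directory\Path\Directory\Pre0{0442-0500}.txt this captures Pre0{0442.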