Read from file without special characters

Read from file without special characters - c#

Im using a StreamReader to open a text file and grab its contents. I need to grab just the text from the file without any escape characters ( \n, \r, \", etc ). Google is failing me right now. Any ideas?

There are no escape characters in a text that you read from a file. Escape characters are used when you write a string literal, for example in program code. I assume that you mean that you want to replace any write space characters with plain spaces.
You can use a regular expression to match white space characters and replace them with spaces. It's easier to use the File.ReadAllText to read the text from the file:
string text = Regex.Replace(File.ReadAllText(fileName), #"[\r\n\t ]+", " ");

Why don't you just call ReadToEnd and then Split the string?
// using statement and whatever code here
var rawContent = sr.ReadToEnd();
var usefulContent = rawContent.Split(new []{ "\r\n", "\\" },
StringSplitOptions.RemoveEmptyEntries);
Note: you'll want to tweak the separators in the Split method; this is just an example.
You could also simply Replace the unwanted characters:
// using statement and whatever code here
var rawContent = sr.ReadToEnd();
var usefulContent = rawContent
.Replace("\r\n", "" )
.Replace("\\", "");

If you're trying to do it as you stream, call StreamReader.Read() in a while loop and test the characters one by one.
If you're able to grab the entire file contents into a string, use a regular expression to strip the undesirable characters. Check out RegexHero: http://regexhero.net/tester/

Assume you have read the entire file in a string s
for (int i = 0; i < s.Length; i++)
{
if (char.IsLetterOrDigit(s, i)) // or if (!char.IsWhiteSpace(s, i))
{
// append to StringBuilder
}
}
If IsLetterOrDigit or IsWhiteSpace don't fit your needs you can create your own method and call it.

You may use universal function for skipping all characters you not need:
public string SkipChars(string InputString, char[] CharsToSkip)
{
string result = InputString;
foreach (var chr in CharsToSkip)
{
result = result.Replace(chr.ToString(), "");
}
return result;
}
usage:
string test = "one\ntwo\tthree";
MessageBox.Show(SkipChars(test, new char[] { '\n', '\t' }));

Related

how can i optimize the performance of this regular expression?

I'm using a regular expression to replace commas that are not contained by text qualifying quotes into tab spaces.
I'm running the regex on file content through a script task in SSIS. The file content is over 6000 lines long.
I saw an example of using a regex on file content that looked like this
String FileContent = ReadFile(FilePath, ErrInfo);
Regex r = new Regex(#"(,)(?=(?:[^""]|""[^""]*"")*$)");
FileContent = r.Replace(FileContent, "\t");
That replace can understandably take its sweet time on a decent sized file.
Is there a more efficient way to run this regex?
Would it be faster to read the file line by line and run the regex per line?

It seems you're trying to convert comma separated values (CSV) into tab separated values (TSV).
In this case, you should try to find a CSV library instead and read the fields with that library (and convert them to TSV if necessary).
Alternatively, you can check whether each line has quotes and use a simpler method accordingly.

The problem is the lookahead, which looks all the way to the end on each comman, resulting in O(n2) complexity, which is noticeable on long inputs. You can get it done in a single pass by skipping over quotes while replacing:
Regex csvRegex = new Regex(#"
(?<Quoted>
"" # Open quotes
(?:[^""]|"""")* # not quotes, or two quotes (escaped)
"" # Closing quotes
)
| # OR
(?<Comma>,) # A comma
",
RegexOptions.IgnorePatternWhitespace);
content = csvRegex.Replace(content,
match => match.Groups["Comma"].Success ? "\t" : match.Value);
Here we match free command and quoted strings. The Replace method takes a callback with a condition that checks if we found a comma or not, and replaced accordingly.

The simplest optimization would be
Regex r = new Regex(#"(,)(?=(?:[^""]|""[^""]*"")*$)", RegexOptions.Compiled);
foreach (var line in System.IO.File.ReadAllLines("input.txt"))
Console.WriteLine(r.Replace(line, "\t"));
I haven't profiled it, but I wouldn't be surprised if the speedup was huge.
If that's not enough I suggest some manual labour:
var input = new StreamReader(File.OpenRead("input.txt"));
char[] toMatch = ",\"".ToCharArray ();
string line;
while (null != (line = input.ReadLine()))
{
var result = new StringBuilder(line);
bool inquotes = false;
for (int index=0; -1 != (index = line.IndexOfAny (toMatch, index)); index++)
{
bool isquote = (line[index] == '\"');
inquotes = inquotes != isquote;
if (!(isquote || inquotes))
result[index] = '\t';
}
Console.WriteLine (result);
}
PS: I assumed #"\t" was a typo for "\t", but perhaps it isn't :)

Unquote string in C#

I have a data file in INI file like format that needs to be read by both some C code and some C# code. The C code expects string values to be surrounded in quotes. The C# equivalent code is using some underlying class or something I have no control over, but basically it includes the quotes as part of the output string. I.e. data file contents of
MY_VAL="Hello World!"
gives me
"Hello World!"
in my C# string, when I really need it to contain
Hello World!
How do I conditionally (on having first and last character being a ") remove the quotes and get the string contents that I want.

On your string use Trim with the " as char:
.Trim('"')

I usually call String.Trim() for that purpose:
string source = "\"Hello World!\"";
string unquoted = source.Trim('"');

My implementation сheck that quotes are from both sides
public string UnquoteString(string str)
{
if (String.IsNullOrEmpty(str))
return str;
int length = str.Length;
if (length > 1 && str[0] == '\"' && str[length - 1] == '\"')
str = str.Substring(1, length - 2);
return str;
}

Just take the returned string and do a Trim('"');

Being obsessive, here (that's me; no comment about you), you may want to consider
.Trim(' ').Trim('"').Trim(' ')
so that any, bounding spaces outside of the quoted string are trimmed, then the quotation marks are stripped and, finally, any, bounding spaces for the contained string are removed.
If you want to retain contained, bounding white space, omit the final .Trim(' ').
Should there be embedded spaces and/or quotation marks, they will be preserved. Chances are, such are desired and should not be deleted.
Do some study as to what a no argument Trim() does to things like form feed and/or tabulation characters, bounding and embedded. It could be that one and/or the other Trim(' ') should be just Trim().

If you know there will always be " at the end and beginning, this would be the fastest way.
s = s.Substring(1, s.Length - 2);

Use string replace function or trim function.
If you just want to remove first and last quotes use substring function.
string myworld = "\"Hello World!\"";
string start = myworld.Substring(1, (myworld.Length - 2));

I would suggest using the replace() method.
string str = "\"HelloWorld\"";
string result = str.replace("\"", string.Empty);

What you are trying to do is often called "stripping" or "unquoting". Usually, when the value is quoted that means not only that it is surrounded by quotation characters (like " in this case) but also that it may or may not contain special characters to include quotation character itself inside quoted text.
In short, you should consider using something like:
string s = #"""Hey ""Mikey""!";
s = s.Trim('"').Replace(#"""""", #"""");
Or when using apostrophe mark:
string s = #"'Hey ''Mikey''!";
s = s.Trim('\'').Replace("''", #"'");
Also, sometimes values that don't need quotation at all (i.e. contains no whitespace) may not need to be quoted anyway. That's the reason checking for quotation characters before trimming is reasonable.
Consider creating a helper function that will do this job in a preferable way as in the example below.
public static string StripQuotes(string text, char quote, string unescape)a
{
string with = quote.ToString();
if (quote != '\0')
{
// check if text contains quote character at all
if (text.Length >= 2 && text.StartsWith(with) && text.EndsWith(with))
{
text = text.Trim(quote);
}
}
if (!string.IsNullOrEmpty(unescape))
{
text = text.Replace(unescape, with);
}
return text;
}
using System;
public class Program
{
public static void Main()
{
string text = #"""Hello World!""";
Console.WriteLine(text);
// That will do the job
// Output: Hello World!
string strippedText = text.Trim('"');
Console.WriteLine(strippedText);
string escapedText = #"""My name is \""Bond\"".""";
Console.WriteLine(escapedText);
// That will *NOT* do the job to good
// Output: My name is \"Bond\".
string strippedEscapedText = escapedText.Trim('"');
Console.WriteLine(strippedEscapedText);
// Allow to use \" inside quoted text
// Output: My name is "Bond".
string strippedEscapedText2 = escapedText.Trim('"').Replace(#"\""", #"""");
Console.WriteLine(strippedEscapedText2);
// Create a function that will check texts for having or not
// having citation marks and unescapes text if needed.
string t1 = #"""My name is \""Bond\"".""";
// Output: "My name is \"Bond\"."
Console.WriteLine(t1);
// Output: My name is "Bond".
Console.WriteLine(StripQuotes(t1, '"', #"\"""));
string t2 = #"""My name is """"Bond"""".""";
// Output: "My name is ""Bond""."
Console.WriteLine(t2);
// Output: My name is "Bond".
Console.WriteLine(StripQuotes(t2, '"', #""""""));
}
}
https://dotnetfiddle.net/TMLWHO

Here's my solution as extension method:
public static class StringExtensions
{
public static string UnquoteString(this string inputString) => inputString.TrimStart('"').TrimEnd('"');
}
It's just trimming at the start an the end...

How to remove leading and trailing spaces from a string

I have the following input:
string txt = " i am a string "
I want to remove space from start of starting and end from a string.
The result should be: "i am a string"
How can I do this in c#?

String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working then it highly likely that the "spaces" aren't spaces but some other non printing or white space character, possibly tabs. In this case you need to use the String.Trim method which takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
Source
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.

You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim)); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"

I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.

txt = txt.Trim();

Or you can split your string to string array, splitting by space and then add every item of string array to empty string.
May be this is not the best and fastest method, but you can try, if other answer aren't what you whant.

text.Trim() is to be used
string txt = " i am a string ";
txt = txt.Trim();

Use the Trim method.

static void Main()
{
// A.
// Example strings with multiple whitespaces.
string s1 = "He saw a cute\tdog.";
string s2 = "There\n\twas another sentence.";
// B.
// Create the Regex.
Regex r = new Regex(#"\s+");
// C.
// Strip multiple spaces.
string s3 = r.Replace(s1, #" ");
Console.WriteLine(s3);
// D.
// Strip multiple spaces.
string s4 = r.Replace(s2, #" ");
Console.WriteLine(s4);
Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.
He saw a cute dog.

You Can Use
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"

Removing all whitespace lines from a multi-line string efficiently

In C# what's the best way to remove blank lines i.e., lines that contain only whitespace from a string? I'm happy to use a Regex if that's the best solution.
EDIT: I should add I'm using .NET 2.0.
Bounty update: I'll roll this back after the bounty is awarded, but I wanted to clarify a few things.
First, any Perl 5 compat regex will work. This is not limited to .NET developers. The title and tags have been edited to reflect this.
Second, while I gave a quick example in the bounty details, it isn't the only test you must satisfy. Your solution must remove all lines which consist of nothing but whitespace, as well as the last newline. If there is a string which, after running through your regex, ends with "/r/n" or any whitespace characters, it fails.

If you want to remove lines containing any whitespace (tabs, spaces), try:
string fix = Regex.Replace(original, #"^\s*$\n", string.Empty, RegexOptions.Multiline);
Edit (for #Will): The simplest solution to trim trailing newlines would be to use TrimEnd on the resulting string, e.g.:
string fix =
Regex.Replace(original, #"^\s*$\n", string.Empty, RegexOptions.Multiline)
.TrimEnd();

string outputString;
using (StringReader reader = new StringReader(originalString)
using (StringWriter writer = new StringWriter())
{
string line;
while((line = reader.ReadLine()) != null)
{
if (line.Trim().Length > 0)
writer.WriteLine(line);
}
outputString = writer.ToString();
}

off the top of my head...
string fixed = Regex.Replace(input, "\s*(\n)","$1");
turns this:
fdasdf
asdf
[tabs]
[spaces]
asdf
into this:
fdasdf
asdf
asdf

Using LINQ:
var result = string.Join("\r\n",
multilineString.Split(new string[] { "\r\n" }, ...None)
.Where(s => !string.IsNullOrWhitespace(s)));
If you're dealing with large inputs and/or inconsistent line endings you should use a StringReader and do the above old-school with a foreach loop instead.

Alright this answer is in accordance to the clarified requirements specified in the bounty:
I also need to remove any trailing newlines, and my Regex-fu is
failing. My bounty goes to anyone who can give me a regex which passes
this test: StripWhitespace("test\r\n \r\nthis\r\n\r\n") ==
"test\r\nthis"
So Here's the answer:
(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|(\r?\n)+\z
Or in the C# code provided by #Chris Schmich:
string fix = Regex.Replace("test\r\n \r\nthis\r\n\r\n", #"(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|(\r?\n)+\z", string.Empty, RegexOptions.Multiline);
Now let's try to understand it. There are three optional patterns in here which I am willing to replace with string.empty.
(?<=\r?\n)(\s*$\r?\n)+ - matches one to unlimited lines containing only white space and preceeded by a line break (but does not match the first preceeding line breaks).
(?<=\r?\n)(\r?\n)+ - matches one to unlimited empty lines with no content that are preceeded by a line break (but does not match the first preceeding line breaks).
(\r?\n)+\z - matches one to unlimited line breaks at the end of the tested string (trailing line breaks as you called them)
That satisfies your test perfectly! But also satisfies both \r\n and \n line break styles! Test it out! I believe this will be the most correct answer, although simpler expression would pass your specified bounty test, this regex passes more complex conditions.
EDIT: #Will pointed out a potential flaw in the last pattern match of the above regex in that it won't match multiple line breaks containing white space at the end of the test string. So let's change that last pattern to this:
\b\s+\z The \b is a word boundry (beginning or END of a word), the \s+ is one or more white space characters, the \z is the end of the test string (end of "file"). So now it will match any assortment of whitespace at the end of the file including tabs and spaces in addition to carriage returns and line breaks. I tested both of #Will's provided test cases.
So all together now, it should be:
(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|\b\s+\z
EDIT #2: Alright there is one more possible case #Wil found that the last regex doesn't cover. That case is inputs that have line breaks at the beginning of the file before any content. So lets add one more pattern to match the beginning of the file.
\A\s+ - The \A match the beginning of the file, the \s+ match one or more white space characters.
So now we've got:
\A\s+|(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|\b\s+\z
So now we have four patterns for matching:
whitespace at the beginning of the file,
redundant line breaks containing white space, (ex: \r\n \r\n\t\r\n)
redundant line breaks with no content, (ex: \r\n\r\n)
whitespace at the end of the file

not good. I would use this one using JSON.net:
var o = JsonConvert.DeserializeObject(prettyJson);
new minifiedJson = JsonConvert.SerializeObject(o, Formatting.None);

In response to Will's bounty, which expects a solution that takes "test\r\n \r\nthis\r\n\r\n" and outputs "test\r\nthis", I've come up with a solution that makes use of atomic grouping (aka Nonbacktracking Subexpressions on MSDN). I recommend reading those articles for a better understanding of what's happening. Ultimately the atomic group helped match the trailing newline characters that were otherwise left behind.
Use RegexOptions.Multiline with this pattern:
^\s+(?!\B)|\s*(?>[\r\n]+)$
Here is an example with some test cases, including some I gathered from Will's comments on other posts, as well as my own.
string[] inputs =
{
"one\r\n \r\ntwo\r\n\t\r\n \r\n",
"test\r\n \r\nthis\r\n\r\n",
"\r\n\r\ntest!",
"\r\ntest\r\n ! test",
"\r\ntest \r\n ! "
};
string[] outputs =
{
"one\r\ntwo",
"test\r\nthis",
"test!",
"test\r\n ! test",
"test \r\n ! "
};
string pattern = #"^\s+(?!\B)|\s*(?>[\r\n]+)$";
for (int i = 0; i < inputs.Length; i++)
{
string result = Regex.Replace(inputs[i], pattern, "",
RegexOptions.Multiline);
Console.WriteLine(result == outputs[i]);
}
EDIT: To address the issue of the pattern failing to clean up text with a mix of whitespace and newlines, I added \s* to the last alternation portion of the regex. My previous pattern was redundant and I realized \s* would handle both cases.

string corrected =
System.Text.RegularExpressions.Regex.Replace(input, #"\n+", "\n");

I'll go with:
public static string RemoveEmptyLines(string value) {
using (StringReader reader = new StringReader(yourstring)) {
StringBuilder builder = new StringBuilder();
string line;
while ((line = reader.ReadLine()) != null) {
if (line.Trim().Length > 0)
builder.AppendLine(line);
}
return builder.ToString();
}
}

Here's another option: use the StringReader class. Advantages: one pass over the string, creates no intermediate arrays.
public static string RemoveEmptyLines(this string text) {
var builder = new StringBuilder();
using (var reader = new StringReader(text)) {
while (reader.Peek() != -1) {
string line = reader.ReadLine();
if (!string.IsNullOrWhiteSpace(line))
builder.AppendLine(line);
}
}
return builder.ToString();
}
Note: the IsNullOrWhiteSpace method is new in .NET 4.0. If you don't have that, it's trivial to write on your own:
public static bool IsNullOrWhiteSpace(string text) {
return string.IsNullOrEmpty(text) || text.Trim().Length < 1;
}

In response to Will's bounty here is a Perl sub that gives correct response to the test case:
sub StripWhitespace {
my $str = shift;
print "'",$str,"'\n";
$str =~ s/(?:\R+\s+(\R)+)|(?:()\R+)$/$1/g;
print "'",$str,"'\n";
return $str;
}
StripWhitespace("test\r\n \r\nthis\r\n\r\n");
output:
'test
this
'
'test
this'
In order to not use \R, replace it with [\r\n] and inverse the alternative. This one produces the same result:
$str =~ s/(?:(\S)[\r\n]+)|(?:[\r\n]+\s+([\r\n])+)/$1/g;
There're no needs for special configuration neither multi line support. Nevertheless you can add s flag if it's mandatory.
$str =~ s/(?:(\S)[\r\n]+)|(?:[\r\n]+\s+([\r\n])+)/$1/sg;

if its only White spaces why don't you use the C# string method
string yourstring = "A O P V 1.5";
yourstring.Replace(" ", string.empty);
result will be "AOPV1.5"

char[] delimiters = new char[] { '\r', '\n' };
string[] lines = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
string result = string.Join(Environment.NewLine, lines)

Here is something simple if working against each individual line...
(^\s+|\s+|^)$

Eh. Well, after all that, I couldn't find one that would hit all the corner cases I could figure out. The following is my latest incantation of a regex that strips
All empty lines from the start of a string
Not including any spaces at the beginning of the first non-whitespace line
All empty lines after the first non-whitespace line and before the last non-whitespace line
Again, preserving all whitespace at the beginning of any non-whitespace line
All empty lines after the last non-whitespace line, including the last newline
(?<=(\r\n)|^)\s*\r\n|\r\n\s*$
which essentially says:
Immediately after
The beginning of the string OR
The end of the last line
Match as much contiguous whitespace as possible that ends in a newline*
OR
Match a newline and as much contiguous whitespace as possible that ends at the end of the string
The first half catches all whitespace at the start of the string until the first non-whitespace line, or all whitespace between non-whitespace lines. The second half snags the remaining whitespace in the string, including the last non-whitespace line's newline.
Thanks to all who tried to help out; your answers helped me think through everything I needed to consider when matching.
*(This regex considers a newline to be \r\n, and so will have to be adjusted depending on the source of the string. No options need to be set in order to run the match.)

String Extension
public static string UnPrettyJson(this string s)
{
try
{
// var jsonObj = Json.Decode(s);
// var sObject = Json.Encode(value); dont work well with array of strings c:['a','b','c']
object jsonObj = JsonConvert.DeserializeObject(s);
return JsonConvert.SerializeObject(jsonObj, Formatting.None);
}
catch (Exception e)
{
throw new Exception(
s + " Is Not a valid JSON ! (please validate it in http://www.jsoneditoronline.org )", e);
}
}

Im not sure is it efficient but =)
List<string> strList = myString.Split(new string[] { "\n" }, StringSplitOptions.None).ToList<string>();
myString = string.Join("\n", strList.Where(s => !string.IsNullOrWhiteSpace(s)).Distinct().ToList());

Try this.
string s = "Test1" + Environment.NewLine + Environment.NewLine + "Test 2";
Console.WriteLine(s);
string result = s.Replace(Environment.NewLine, String.Empty);
Console.WriteLine(result);

s = Regex.Replace(s, #"^[^\n\S]*\n", "");
[^\n\S] matches any character that's not a linefeed or a non-whitespace character--so, any whitespace character except \n. But most likely the only characters you have to worry about are space, tab and carriage return, so this should work too:
s = Regex.Replace(s, #"^[ \t\r]*\n", "");
And if you want it to catch the last line, without a final linefeed:
s = Regex.Replace(s, #"^[ \t\r]*\n?", "");

Replace Line Breaks in a String C#

How can I replace Line Breaks within a string in C#?

Use replace with Environment.NewLine
myString = myString.Replace(System.Environment.NewLine, "replacement text"); //add a line terminating ;
As mentioned in other posts, if the string comes from another environment (OS) then you'd need to replace that particular environments implementation of new line control characters.

The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, #"\r\n?|\n", replacementString);

To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Dependent on where your file originated, you may want to look at making sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...

I would use Environment.Newline when I wanted to insert a newline for a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}

If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.

if you want to "clean" the new lines, flamebaud comment using regex #"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, #"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, #"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}

Use new in .NET 6 method
myString = myString.ReplaceLineEndings();
Replaces ALL newline sequences in the current string.
Documentation:
ReplaceLineEndings

Don't forget that replace doesn't do the replacement in the string, but returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use #Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values first before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}

To make sure all possible ways of line breaks (Windows, Mac and Unix) are replaced you should use:
string.Replace("\r\n", "\n").Replace('\r', '\n').Replace('\n', 'replacement');
and in this order, to not to make extra line breaks, when you find some combination of line ending chars.

Why not both?
string ReplacementString = "";
Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), #"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.

I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}

var answer = Regex.Replace(value, "(\n|\r)+", replacementString);

As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines should go to the parseCSV method:
function parseCSV(data) {
//alert(data);
//replace UNIX new lines
data = data.replace(/\r\n/g, "\n");
//replace MAC new lines
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
}

Use the .Replace() method
Line.Replace("\n", "whatever you want to replace with");

Best way to replace linebreaks safely is
yourString.Replace("\r\n","\n") //handling windows linebreaks
.Replace("\r","\n") //handling mac linebreaks
that should produce a string with only \n (eg linefeed) as linebreaks.
this code is usefull to fix mixed linebreaks too.

Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".

string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hopes it helps.

If you want to replace only the newlines:
var input = #"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = #"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(#"\n", replaceWith).Replace(#"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
var input = #"sdfhlusdkuidfs\r\ndfgdgfd";
var match = #"[\\s]+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);

This is a very long winded one-liner solution but it is the only one that I had found to work if you cannot use the the special character escapes like "\r" and "\n" and \x0d and \u000D as well as System.Environment.NewLine as parameters to thereplace() method
MyStr.replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat offtopic but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up like it is shown below.
The Visual Studio XML --> .NET environment just would not accept the special character escapes like "\r" and "\n" and \x0d and \u000D as well as System.Environment.NewLine as parameters to thereplace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))

Based on #mark-bayers answer and for cleaner output:
string result = Regex.Replace(ex.Message, #"(\r\n?|\r?\n)+", "replacement text");
It removes \r\n , \n and \r while perefer longer one and simplify multiple occurances to one.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Read from file without special characters - c#

Im using a StreamReader to open a text file and grab its contents. I need to grab just the text from the file without any escape characters ( \n, \r, \", etc ). Google is failing me right now. Any ideas?

If you're trying to do it as you stream, call StreamReader.Read() in a while loop and test the characters one by one. If you're able to grab the entire file contents into a string, use a regular expression to strip the undesirable characters. Check out RegexHero: http://regexhero.net/tester/

Assume you have read the entire file in a string s for (int i = 0; i < s.Length; i++) { if (char.IsLetterOrDigit(s, i)) // or if (!char.IsWhiteSpace(s, i)) { // append to StringBuilder } } If IsLetterOrDigit or IsWhiteSpace don't fit your needs you can create your own method and call it.

Related

how can i optimize the performance of this regular expression?

Unquote string in C#

How to remove leading and trailing spaces from a string

Removing all whitespace lines from a multi-line string efficiently

Replace Line Breaks in a String C#

Categories

Resources