Regex.Replace refuses to replace with newline - c#

Hi I wrote a very simple C# program to use the C# Regex from command line instead of relying on the MS Word search and replace. The problem is that even though the Regex recognizes \r and \n fine, when I try to replace the string with either of these, it seems to replace it with the escaped character instead of the character itself.
[STAThread]
static void Main(string[] args)
{
string initial = Clipboard.GetText();
Console.Write("Find: ");
string find = Console.ReadLine();
Console.Write("Replace: ");
string replace = Console.ReadLine();
string final = Regex.Replace(initial, find, replace);
Clipboard.SetText(final);
}
For example, my input string from the clipboard would be "Woodcock, american" (with a carriage return-newline at the end). The pattern would be @",.+\r", which matches fine, and the replacement string would be @"\r\n". This produces the string "Woodcock\r\n" (which are the letters r and n just to be clear). What am I doing wrong?
edit: Anirudh's answer solved my problem partially and I updated the code accordingly. However, it seems that when I input "\r\n" to the ReadLine it also escapes somehow, whereas if I write string replace = "\r\n" it actually replaces the string with a carriage return-newline. Link to new question : C# ReadLine escapes carriage return/newline?

This is because you are using a verbatim string, i.e. @"", in which \r\n is treated as literal characters (a backslash followed by a letter) and not as escape sequences!
The replacement string should be "\r\n", NOT @"\r\n"
To solve your other problem
Output=Regex.Replace(input,"\\r?\\n","\r\n");
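For the follow-up problem mentioned in the edit, where \r\n typed at the prompt arrives as four literal characters, one possible approach is to pass the typed replacement through Regex.Unescape before using it. This is only a sketch: the class and method names are hypothetical, and Regex.Unescape throws an ArgumentException if the typed text contains a malformed escape.

```csharp
using System;
using System.Text.RegularExpressions;

static class ConsoleReplaceSketch
{
    // Hypothetical helper: converts typed escape sequences such as \r and \n
    // in the replacement text into the characters they stand for.
    public static string ReplaceWithTypedEscapes(string input, string pattern, string typedReplacement)
    {
        string replacement = Regex.Unescape(typedReplacement);
        return Regex.Replace(input, pattern, replacement);
    }

    static void Main()
    {
        string initial = "Woodcock, american\r\n";
        // The verbatim strings mimic what Console.ReadLine would return
        // when the user types \r\n at the prompt.
        string final = ReplaceWithTypedEscapes(initial, @",.+\r\n", @"\r\n");
        Console.WriteLine(final == "Woodcock\r\n");
    }
}
```

The pattern needs no such treatment, because the regex engine interprets \r and \n in the pattern itself.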

Related

Regex to remove multiple consecutive commas and replace with single comma

Given the input string "Test,,test,,,test,test"
and using the following C# snippet I would have expected the duplicate commas to be replaced by a single comma and results in...
"Test,test,test,test"
private static string TruncateCommas(string input)
{
return Regex.Replace(input, @",+", ",");
}
Code was pinched from this answer...
C# replace all occurrences of a character with just a character
But what I am seeing is "Test,,test,,,test,test" as the output from this function.
Do I need to escape the comma in the regex? Or should this regex be working.
Do I need to escape the comma in the regex?
No.
Or should this regex be working.
Yes.
Please construct your test the following way:
void Main()
{
string s = "Test,,test,,,test,test";
string result = TruncateCommas(s);
Console.WriteLine(result);
}
Output
Test,test,test,test

How can I not escape a character sequence in a string in C#?

I am reading in a value from a resource file (.resx) in C# and in that value, I have added \r\n\r\n so that it gets displayed in the correct format. e.g. blah\r\n\r\nblah
For some reason, the string automatically adds a '\' so effectively text becomes blah\\r\\n\\r\\nblah and thus it escapes the character sequences.
Try to use Shift+Enter to add a new line in the .resx file instead of \r\n. Or replace it like this: str = str.Replace("\\r\\n", "\r\n");
When you stored the string, the escape characters were stored as part of the string literal, and are no longer 'escaping' anything. One thing you could do is replace your placeholder string "\r\n" with a newline character after you read it.
For example:
private static void Main()
{
var resourceStr = "blah\\r\\n\\r\\nblah";
Console.WriteLine("Original resource string:");
Console.WriteLine(resourceStr);
Console.WriteLine("\nAfter calling replace:");
Console.WriteLine(resourceStr.Replace("\\r\\n", Environment.NewLine));
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
}

Remove substring that starts with SOT and ends EOT, from string

I have a program that reads certain strings from memory. The strings contain, for the most part, recognizable characters. At random points in the strings however, "weird" characters appear. Characters I did not recognize. By going to a site that allows me to paste in Unicode characters to see what they are, I found that a selection of the "weird" characters were these:
\x{1} SOH, "start of heading", ctrl-a
\x{2} SOT, "start of text"
\x{3} EOT, "end of text"
\x{7} BEL, bell, ctrl-g
\x{13} dc3, device control three, ctrl-s
\x{11} dc1, device control one, ctrl-q
\x{14} dc4, device control four, ctrl-t
\x{1A} sub, substitute, ctrl-z
\x{6} ack, acknowledge, ctrl-f
I wanted to parse my strings to remove these characters. What I found out though, by looking at the strings, was that all the unwanted characters were always surrounded by the SOT and EOT, respectively.
Therefore, I am thinking that my question is: How can I remove, from a string, all occurrences of substrings that starts with SOT and ends with EOT?
Edit: Attempt at Solution
Using ideas from @RagingCain I made the following method:
private static string RemoveInvalidCharacters(string input)
{
while (true)
{
var start = input.IndexOf('\u0002');
// Check start first: IndexOf(char, int) throws if the start index is -1.
if (start == -1) break;
var end = input.IndexOf('\u0003', start);
if (end == -1) break;
Console.WriteLine("Start: " + start + ". End: " + end);
// +1 so that the closing character itself is removed as well.
input = input.Remove(start, end - start + 1);
}
return input;
}
It does the trick, thanks again.
Regex would be your solution and should work fine. You would put these characters in the pattern, and then you can use the Match method, or just Replace them with whitespace " ", or cut them from the string altogether by using "".
Regex.Replace: https://msdn.microsoft.com/en-us/library/xwewhkd1(v=vs.110).aspx
Regex.Match: https://msdn.microsoft.com/en-us/library/bk1x0726(v=vs.110).aspx
Regex example:
public static void Main()
{
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Console.WriteLine("Original String: {0}", input);
Console.WriteLine("Replacement String: {0}", result);
}
I know the difficulty though of not being able to "see" them so you should assign them to Char variables by Unicode itself, add them to the pattern for replace.
Char Variables: https://msdn.microsoft.com/en-us/library/x9h8tsay.aspx
Unicode for Start of Text:
http://www.fileformat.info/info/unicode/char/0002/index.htm
Unicode for End of Text:
http://www.fileformat.info/info/unicode/char/0003/index.htm
To apply to your solution:
Does string contain SOT, EOT.
If true, remove entire string/sub-string/SOT or EOT.
It may be easier to split the original string into a string[], then go line by line... it's difficult to parse through your string without knowing what it looks like so hopefully I provided something that helps ^.^
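Applied to the question, the Regex idea could look roughly like the sketch below. It assumes the unwanted spans always open with \u0002 and close with \u0003, as described in the question; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class ControlSpanSketch
{
    // \u0002 opens a span and \u0003 closes it; [^\u0003]* stops each match
    // at the first closing character so separate spans are not merged.
    private static readonly Regex Span = new Regex("\u0002[^\u0003]*\u0003");

    public static string RemoveSpans(string input)
    {
        return Span.Replace(input, "");
    }

    static void Main()
    {
        string sample = "abc\u0002\u0007\u0013\u0003def\u0002\u0001\u0003ghi";
        Console.WriteLine(RemoveSpans(sample));
    }
}
```

A span that is opened but never closed is left untouched by this pattern, which may or may not be what you want.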

C# string to sentence

Is there a way to convert string without spaces to a proper sentence??
E.g. "WhoAmI" needs to be converted to "Who Am I"
A regex replacement would do this, if you're just talking about inserting a space before each capital letter:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main()
{
var input = "WhoAmI";
var output = Regex.Replace(input, @"\p{Lu}", " $0").TrimStart();
Console.WriteLine(output);
}
}
However, I suspect there will be significant corner cases. Note that the above uses \p{Lu} instead of just [A-Z] to cope with non-ASCII capital letters; you may find A-Z simpler if you only need to deal with ASCII. The TrimStart() call is to remove the leading space you'd get otherwise.
If every word in the string is starting with uppercase you may just convert each part that is starting with uppercase to a space separated string.
You can use LINQ
string words = "WhoAmI";
string sentence = String.Concat(words.Select(letter => Char.IsUpper(letter) ? " " + letter
: letter.ToString()))
.TrimStart();

Replace Line Breaks in a String C#

How can I replace Line Breaks within a string in C#?
Use replace with Environment.NewLine
myString = myString.Replace(System.Environment.NewLine, "replacement text"); //add a line terminating ;
As mentioned in other posts, if the string comes from another environment (OS) then you'd need to replace that particular environment's implementation of new line control characters.
The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, @"\r\n?|\n", replacementString);
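To illustrate why the single pass matters, here is a small sketch that compares it with chained string.Replace calls when the replacement itself contains a line break. The class and method names are hypothetical, and a MatchEvaluator lambda is used (my choice, not part of the answer above) so that a '$' in the replacement is inserted literally.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineBreakSketch
{
    // One pass: each \r\n, lone \r, or lone \n is matched exactly once,
    // and inserted text is never scanned again.
    public static string ReplaceOnePass(string input, string replacement)
    {
        return Regex.Replace(input, @"\r\n?|\n", m => replacement);
    }

    // Chained calls: later calls also see the line breaks inserted by earlier ones.
    public static string ReplaceChained(string input, string replacement)
    {
        return input.Replace("\r\n", replacement)
                    .Replace("\n", replacement)
                    .Replace("\r", replacement);
    }

    static void Main()
    {
        string mixed = "a\nb\r\nc";
        // Normalizing to \r\n: the one-pass version yields "a\r\nb\r\nc",
        // while the chained version keeps expanding the breaks it inserted.
        Console.WriteLine(ReplaceOnePass(mixed, "\r\n") == "a\r\nb\r\nc");
        Console.WriteLine(ReplaceChained(mixed, "\r\n") == "a\r\nb\r\nc");
    }
}
```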
To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Dependent on where your file originated, you may want to look at making sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...
I would use Environment.NewLine when I wanted to insert a newline into a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.
If you want to "clean" the new lines, flamebaud's comment suggesting the regex @"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, @"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, @"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}
Use the method that is new in .NET 6:
myString = myString.ReplaceLineEndings();
Replaces ALL newline sequences in the current string.
Documentation:
ReplaceLineEndings
Don't forget that Replace doesn't do the replacement in the string, but returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use @Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values first before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}
To make sure all possible ways of line breaks (Windows, Mac and Unix) are replaced you should use:
myString.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "replacement");
and in this order, so that combinations of line-ending characters do not produce extra line breaks.
Why not both?
string ReplacementString = "";
Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), @"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.
I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}
var answer = Regex.Replace(value, "(\n|\r)+", replacementString);
As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines should go to the parseCSV method:
function parseCSV(data) {
//alert(data);
//replace Windows new lines (\r\n)
data = data.replace(/\r\n/g, "\n");
//replace old Mac new lines (\r)
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
}
Use the .Replace() method
Line.Replace("\n", "whatever you want to replace with");
Best way to replace linebreaks safely is
yourString.Replace("\r\n","\n") //handling windows linebreaks
.Replace("\r","\n") //handling mac linebreaks
That should produce a string with only \n (i.e. linefeed) as line breaks.
This code is useful for fixing mixed line breaks too.
Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".
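A minimal sketch of the StringReader idea follows. The class and method names are hypothetical, and it appends an explicit separator instead of calling AppendLine so that the caller chooses what goes between lines.

```csharp
using System;
using System.IO;
using System.Text;

static class ReaderSketch
{
    // Reads the text line by line (StringReader.ReadLine accepts \r, \n and \r\n
    // as line ends) and joins the lines with the given separator.
    public static string JoinLines(string input, string separator)
    {
        var builder = new StringBuilder();
        using (var reader = new StringReader(input))
        {
            string line;
            bool first = true;
            while ((line = reader.ReadLine()) != null)
            {
                if (!first) builder.Append(separator);
                builder.Append(line);
                first = false;
            }
        }
        return builder.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinLines("AAA\r\nBBB\rCCC\nDDD", "-"));
    }
}
```

Note that a single trailing line break in the input is dropped by this approach, because ReadLine does not report an empty line after it.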
string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hope it helps.
If you want to replace only the newlines:
var input = @"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = @"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(@"\n", replaceWith).Replace(@"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
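// A regular (non-verbatim) string, so \r\n here are real line-break characters.
var input = "sdfhlusdkuidfs\r\ndfgdgfd";
// \s matches any whitespace; inside a verbatim string, [\\s] would match a backslash or the letter s instead.
var match = @"\s+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);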
This is a very long-winded one-liner, but it is the only solution I found to work if you cannot use the special character escapes like "\r" and "\n" and \x0d and \u000D, or System.Environment.NewLine, as parameters to the Replace() method:
MyStr.Replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat offtopic but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up like it is shown below.
The Visual Studio XML --> .NET environment just would not accept the special character escapes like "\r" and "\n" and \x0d and \u000D, or System.Environment.NewLine, as parameters to the replace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))
Based on @mark-bayers' answer, and for cleaner output:
string result = Regex.Replace(ex.Message, @"(\r\n?|\r?\n)+", "replacement text");
It replaces \r\n, \n and \r, preferring the longer sequence, and collapses multiple consecutive occurrences into one.
