Regex pattern format [duplicate] - c#

I have the following regex pattern, which removes everything after the first two line breaks.
(?<=.+[\r\n]+.+[\r\n]+)([\s\S]*)
My problem is that I also want to check for a specific text after those two line breaks and, if it is found, exclude it as well.
Here is how I use the pattern in my C# code:
string newComment = string.IsNullOrEmpty(regexPattern) ? emailBody : new Regex(regexPattern, RegexOptions.IgnoreCase).Replace(emailBody, string.Empty);
EDIT
I want to look for a specific text, for example "This is a signature:". If it is found, that text and everything after it should be excluded, while keeping the current behavior in which everything after two line breaks is excluded.
Sample strings:
string body = "Try comment.";
string additionalBody = "This is a signature";
string newBody = body + System.Environment.NewLine + additionalBody + System.Environment.NewLine + "asd Asd";
So newBody is a text with 3 paragraphs.
The result should display "Try comment." only.
Possible scenarios:
1) The signature text can appear in the first or second paragraph and should be removed automatically.
2) If the automated signature is not present but there are 3 paragraphs, remove the last paragraph.

Try this:
(?<=(?>.+[\r\n]+){2})(?:(?!\bThis is a signature\b)[\s\S])*

How about simply:
(?<=(?:.+[\r\n]+){2})([\s\S]*)This is a signature
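The two scenarios from the question can also be handled without a single combined pattern. The following is only a sketch, not either answerer's method: `TrimComment` and its parameters are hypothetical names, and it assumes that paragraphs are separated by line breaks and that the trailing line break does not need to be preserved (the original lookbehind pattern keeps it).

```csharp
// C# 9+ top-level program; TrimComment is a hypothetical helper, not from the question.
using System;
using System.Linq;

// Cuts the body at the signature marker (if present), then keeps at most
// the first two non-empty lines of what remains.
static string TrimComment(string emailBody, string signatureMarker)
{
    int index = emailBody.IndexOf(signatureMarker, StringComparison.OrdinalIgnoreCase);
    string kept = index >= 0 ? emailBody.Substring(0, index) : emailBody;

    // Split on any line-break style so that CRLF, CR and LF inputs behave the same.
    string[] lines = kept.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    return string.Join(Environment.NewLine, lines.Take(2));
}

string sampleBody = "Try comment." + Environment.NewLine
    + "This is a signature" + Environment.NewLine
    + "asd Asd";
Console.WriteLine(TrimComment(sampleBody, "This is a signature")); // prints "Try comment."
```

Using IndexOf with OrdinalIgnoreCase mirrors the RegexOptions.IgnoreCase flag used in the question's code.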

Related

How do I replace all occurrences of a multiline string in another string?

I'm trying to replace all occurrences of a multiline string in another string. Assuming that input contains the input text, output contains the resulting text, searchText contains the multiline string to be found and replaceText contains the replacement multiline string, I used this code:
output = input.Replace(searchText, replaceText);
The problem is that it works only with single line strings (that don't include newlines). How could I make it work for strings that contain newlines?
e.g.
searchText = "ABC\nDEF";
replaceText = "text";
input:
ABC
DEF
KLF
Z
output:
text
KLF
Z
You need to know which newline sequence the input uses. It could be LF only, but it could also be CR+LF.
For background, see the Wikipedia article on newlines: https://en.wikipedia.org/wiki/Newline
Your problem may be that a CR is also present, in which case the search string does not match at all. One solution is to build the search text as:
searchText = "ABC" + System.Environment.NewLine + "DEF";
System.Environment.NewLine supplies the newline sequence of the current platform. See the MSDN reference: https://msdn.microsoft.com/en-us/library/system.environment.newline(v=vs.110).aspx
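If the input may come from a different platform than the one the code runs on, another option is to normalize both strings to one newline style before replacing. This is a minimal sketch under that assumption; `NormalizeNewlines` and `ReplaceMultiline` are hypothetical helper names, and the result uses LF line endings throughout.

```csharp
// C# 9+ top-level program; both helpers are hypothetical names for illustration.
using System;

// Converts CRLF and lone CR to LF. CRLF is handled first so it does not become two LFs.
static string NormalizeNewlines(string text) =>
    text.Replace("\r\n", "\n").Replace("\r", "\n");

// Replaces searchText in input regardless of which line-ending style either one uses.
static string ReplaceMultiline(string input, string searchText, string replaceText) =>
    NormalizeNewlines(input).Replace(NormalizeNewlines(searchText), replaceText);

string sampleInput = "ABC\r\nDEF\r\nKLF\r\nZ";
string sampleOutput = ReplaceMultiline(sampleInput, "ABC\nDEF", "text");
Console.WriteLine(sampleOutput == "text\nKLF\nZ"); // prints "True"
```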

Regular Expression without braces

I have the following sample cases:
1) "Sample"
2) "[10,25]"
I want to form a single regular expression pattern that returns "Sample" and "10,25" when the above examples are passed to it.
Note: Input strings do not include quotes.
I came up with the expression (?<=\[)(.*?)(?=\]). It satisfies the second case and retrieves only "10,25", but for the first case it returns nothing. I want "Sample" to be returned. Can anyone help me?
C#.
Here you go: a small regex using a positive lookbehind. These are sometimes very handy.
Regex
(?<=^|\[)([\w,]+)
Test string
Sample
[10,25]
Result
MATCH 1
[0-6] Sample
MATCH 2
[8-13] 10,25
try at regex101.com
If " is included in your original string, use the regex below, which looks for the " mark as well. You can remove ^| from the lookbehind if the " mark is always present, or leave it as it is if your text mixes strings with and without " marks.
Regex
(?<=^|\[|\")([\w,]+)
try at regex101.com
As far as I can tell, the below regex should help:
Regex regex = new Regex(@"^\w+|[[](\w)+\,(\w)+[]]$");
This matches either a single word at the start of the string, or two alphanumeric words separated by a comma inside square brackets.
One Java example:
// String input = "Sample";
String input = "[10,25]";
String text = "[^,\\[\\]]+";
Pattern pMod = Pattern.compile("(" + text + ")|(?>\\[(" + text + "," + text + ")\\])");
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
if(mMod.group(1) != null) {
System.out.println(mMod.group(1));
}
if(mMod.group(2)!=null) {
System.out.println(mMod.group(2));
}
}
if input is "[hello&bye,25|35]", then the output is hello&bye,25|35
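Since the question asks for C#, here is one more way to look at it: treat the brackets as optional and capture what lies between them. This is a sketch, not one of the answers above; `Unwrap` and the group name `value` are hypothetical, and it assumes the whole input is either a bare value or a single bracketed value.

```csharp
// C# 9+ top-level program; Unwrap is a hypothetical helper for illustration.
using System;
using System.Text.RegularExpressions;

// Returns the text without a surrounding [ ] pair; bare input is returned unchanged.
static string Unwrap(string text)
{
    // Optional '[', then any run of non-bracket characters, then optional ']'.
    Match match = Regex.Match(text, @"^\[?(?<value>[^\[\]]*)\]?$");
    return match.Success ? match.Groups["value"].Value : text;
}

Console.WriteLine(Unwrap("Sample"));   // prints "Sample"
Console.WriteLine(Unwrap("[10,25]"));  // prints "10,25"
```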

Regex for new line in string (str) in C# .NET

I have exhausted my search and need assistance. I am new to regex and have managed to pull words from a multi-line string, but not a whole line.
I have the text in a string, but I cannot find out how to grab the next line.
Example string has multiple lines (string multipleLines):
Authentication information
User information:
domain\username
Paris
I need to grab the text "domain\username" after the line "User information:".
I have tried many combinations of regex and cannot get it to work. Example:
string topLine = "Authentication information";
label.Text = Regex.Match(multipleLines, topLine + "(.*)").Groups[1].Value;
I also tried using: topLine + "\n"
What should I add to look at the entire next line after getting the line for Authentication information?
A related thread on Stack Overflow covers this objective. You would want to set RegexOptions.Multiline so that you can use ^ and $ to match the start and end of each line.
^: Depending on whether the MultiLine option is set, matches the position before the first character in a line, or the first character in the string.
$: Depending on whether the MultiLine option is set, matches the position after the last character in a line, or the last character in the string.
Those would be the easiest way to accomplish your task.
Update:
An example would be something like this.
const string account = @"Authentication information:" + "\n"
    + "User Information: " + "\n"
    + "Domain Username: "
    + " \n" + "\\Paris";
// With RegexOptions.Multiline, ^ and $ match at the start and end of each line.
MatchCollection matches = Regex.Matches(account, @"^\\.*$", RegexOptions.Multiline);
That will retrieve the line that starts with a backslash, along with everything that follows it on that line. It is only an example, but hopefully it points you in the right direction.
Though regex would accomplish what you want, this might be simpler and carry far less overhead. Whether you can use this code depends on the kind of input you'll be receiving. For your example, this will work (it needs using System.Linq):
string domainUsername = inputString.Split('\n').Where(z => z.Contains(@"\")).FirstOrDefault();
if (domainUsername != null) {
Console.WriteLine(domainUsername); // Should spit out the appropriate line.
} else {
Console.WriteLine("Domain and username not found!"); // Line not found
}
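For the original question of grabbing the line that follows a known header line, the header can be matched together with its line break and the next line captured in a group. This is a sketch only; `LineAfter` and the group name `next` are hypothetical, and it assumes lines end in LF or CRLF.

```csharp
// C# 9+ top-level program; LineAfter is a hypothetical helper for illustration.
using System;
using System.Text.RegularExpressions;

// Returns the line that directly follows the given header line, or "" if not found.
static string LineAfter(string text, string header)
{
    // [^\r\n]* is used instead of .* because '.' in .NET also matches '\r'.
    string pattern = Regex.Escape(header) + @"[ \t]*\r?\n(?<next>[^\r\n]*)";
    Match match = Regex.Match(text, pattern);
    return match.Success ? match.Groups["next"].Value : string.Empty;
}

string sampleLines = "Authentication information\nUser information:\ndomain\\username\nParis";
Console.WriteLine(LineAfter(sampleLines, "User information:")); // prints "domain\username"
```

Regex.Escape keeps characters such as "." or "(" in the header from being interpreted as regex syntax.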

Check if a string ends with another string or a part of another string

I wanted to know if there is a solution to the problem mentioned in the topic.
Example:
In my project I have to parse a lot of messages. These messages contain formatting characters like "\n" or "\r".
The end of this message is always signed with the name of the author.
Now I want to remove the signatures from each message. The problem is that the end of the message could look like
\r\n\rDaniel Walters\n\r\n
\n\r\n\r\n\rDaniel
or something else
The problem is that I don't know how to identify these varying endings.
I tried removing only the trailing "\n\r\n" characters by calling string.EndsWith() in a loop, but that still leaves "\r\n\rDaniel Walters" behind.
Then I tried to remove the author (which I parsed prior to this step), but this does not work either. Sometimes the parsed author is "Daniel Walters" and the signature is only "Daniel".
Any ideas how to solve this?
Is there an easier and smarter solution than looping through the string?
You can make a regular expression to replace the name with an optional last name, and any number of whitespace characters before and after.
Example:
string message = "So long and thanks for all the fish \t\t\r Arthur \t Dent \r\r\n ";
string firstName = "Arthur";
string lastName = "Dent";
string pattern = "\\s+" + Regex.Escape(firstName) + "(\\s+" + Regex.Escape(lastName) + ")?\\s*$";
message = Regex.Replace(message, pattern, String.Empty);
(Yes, I know it was really the dolphins saying that.)
You could try something like the following (untested):
string str = "\r\n\rDaniel Walters\n\r\n";
// Strip trailing line-break characters one at a time.
while (str.EndsWith("\r") || str.EndsWith("\n"))
{
    // \r and \n are both one character long, so drop the last character
    str = str.Substring(0, str.Length - 1);
}
// Strip leading line-break characters one at a time.
while (str.StartsWith("\r") || str.StartsWith("\n"))
{
    // drop the first character
    str = str.Substring(1);
}
You'll have to determine what "looks like" a signature. Are there specific criteria that always apply?
Always followed by at least 3 newlines (\r or \n)
Starts with a capital letter
Has no following text
A regex like this might work for those criteria:
/[\r\n]{3,}[A-Z][\w ]+[\r\n]*(?!\w)/
Adjust according to your needs.
Edited to add: This should match the last "paragraph" of a document.
/([\r\n]+[\w ]+[\r\n]*)(?!.)/
You can do this as well. I am not sure whether your pattern changes, but this will return "Daniel Walters":
string replaceStr = "\r\n\rDaniel Walters\n\r\n";
replaceStr = replaceStr.TrimStart(new char[] { '\r', '\n' });
replaceStr = replaceStr.TrimEnd(new char[] { '\r', '\n' });
Or, if you want to use the parameterless Trim method (which also strips any other leading and trailing whitespace), you can do the following:
string replaceStr = "\r\n\rDaniel Walters\n\r\n";
replaceStr = replaceStr.Trim();
A different approach could be to split your message at the newline chars removing the empty newline entries. Then reassembling the expected string excluding the last line where I assume there is always the signature.
string removeLastLine = "Text on the firstline\r\ntest on second line\rtexton third line\r\n\rDaniel Walters\n\r\n";
string[] lines = removeLastLine.Split(new char[] {'\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
lines = lines.Take(lines.Length - 1).ToArray();
string result = string.Join(Environment.NewLine, lines);

Replace Line Breaks in a String C#

How can I replace Line Breaks within a string in C#?
Use Replace with Environment.NewLine:
myString = myString.Replace(System.Environment.NewLine, "replacement text");
As mentioned in other posts, if the string comes from another environment (OS), then you need to replace that particular environment's newline control characters instead.
The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, @"\r\n?|\n", replacementString);
To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Dependent on where your file originated, you may want to look at making sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...
I would use Environment.NewLine when I wanted to insert a newline into a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.
If you want to "clean" the new lines, flamebaud's comment suggesting the regex @"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, @"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, @"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}
Use the method that is new in .NET 6:
myString = myString.ReplaceLineEndings();
Replaces ALL newline sequences in the current string.
Documentation:
ReplaceLineEndings
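As a small illustration, the overload that takes a replacement string can be used directly on mixed line endings. This sketch assumes .NET 6 or later; the variable name `mixed` is only for the example.

```csharp
// C# 9+ top-level program; requires .NET 6 or later for ReplaceLineEndings.
using System;

string mixed = "AAA\r\nBBB\rCCC\nDDD";

// Overload with an explicit replacement text for every newline sequence.
Console.WriteLine(mixed.ReplaceLineEndings("-")); // prints "AAA-BBB-CCC-DDD"

// Parameterless overload normalizes every newline sequence to Environment.NewLine.
string normalized = mixed.ReplaceLineEndings();
Console.WriteLine(normalized == string.Join(Environment.NewLine, "AAA", "BBB", "CCC", "DDD")); // prints "True"
```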
Don't forget that Replace doesn't modify the string in place; it returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use @Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}
To make sure all possible kinds of line breaks (Windows, Mac and Unix) are replaced, you should use:
myString.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "replacement");
and in this order, so that a combination of line-ending characters does not produce extra line breaks.
Why not both?
string ReplacementString = "";
Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), @"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.
I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}
var answer = Regex.Replace(value, "(\n|\r)+", replacementString);
As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines (JavaScript) should go into the parseCSV method:
function parseCSV(data) {
//alert(data);
//replace Windows new lines (\r\n)
data = data.replace(/\r\n/g, "\n");
//replace old Mac new lines (\r)
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
}
Use the .Replace() method
Line.Replace("\n", "whatever you want to replace with");
The best way to replace line breaks safely is:
yourString.Replace("\r\n","\n") //handling Windows line breaks
.Replace("\r","\n"); //handling old Mac line breaks
That should produce a string with only \n (i.e. line feed) as line breaks.
This code is useful for fixing mixed line breaks too.
Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".
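The StringReader approach described above can be sketched as follows. `JoinLines` is a hypothetical helper name; note that ReadLine does not report a trailing line break as an extra empty line, so a final newline in the input is dropped.

```csharp
// C# 9+ top-level program; JoinLines is a hypothetical helper for illustration.
using System;
using System.IO;
using System.Text;

// Reads the text line by line and joins the lines with the given replacement.
static string JoinLines(string text, string replacement)
{
    var builder = new StringBuilder();
    using (var reader = new StringReader(text))
    {
        string line;
        bool first = true;
        // ReadLine treats "\r\n", "\r" and "\n" as line terminators.
        while ((line = reader.ReadLine()) != null)
        {
            if (!first) builder.Append(replacement);
            builder.Append(line);
            first = false;
        }
    }
    return builder.ToString();
}

Console.WriteLine(JoinLines("AAA\r\nBBB\rCCC\nDDD", "-")); // prints "AAA-BBB-CCC-DDD"
```

Appending the replacement manually, rather than calling AppendLine, keeps the choice of separator with the caller.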
string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hope it helps.
If you want to replace only the newlines:
var input = @"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = @"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(@"\n", replaceWith).Replace(@"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
var input = @"sdfhlusdkuidfs\r\ndfgdgfd";
var match = @"[\\s]+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);
This is a very long-winded one-liner, but it is the only solution I have found to work if you cannot use special character escapes like "\r" and "\n", \x0d and \u000D, or System.Environment.NewLine as parameters to the Replace() method:
MyStr.Replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat offtopic but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up like it is shown below.
The Visual Studio XML --> .NET environment just would not accept special character escapes like "\r" and "\n", \x0d and \u000D, or System.Environment.NewLine as parameters to the replace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))
Based on @mark-bayers' answer, and for cleaner output:
string result = Regex.Replace(ex.Message, @"(\r\n?|\r?\n)+", "replacement text");
It removes \r\n, \n and \r, preferring the longest sequence, and collapses multiple occurrences into one.
