Replace long dash with a usual one in C# [duplicate] - c#

This question already has answers here:
the correct regex for replacing em-dash with a basic "-" in java
(4 answers)
Closed 3 years ago.
I have a string with multiple dashes, but it contains long dashes.
What method can I use to normalize dashes?
text = Regex.Replace(text, @"(\u2012|\u2013|\u2014|\u2015)", "-");
The expected output is something like 11-1111-11/11
The actual output is almost the same, but some of the dashes are long ones. (I can't paste that dash here because Stack Overflow does not render it.)

This works:
// Character class covering U+2012 (figure dash) through U+2015 (horizontal bar)
private const string DashPattern = @"[\u2012\u2013\u2014\u2015]";
private static readonly Regex _dashRegex = new Regex(DashPattern);
public static string RemoveLongDashes(string s)
{
    return _dashRegex.Replace(s, "-");
}
Your expression with the pipe characters (|) is valid alternation, but a character class is the simpler way to match any one of several single characters. If you want to replace all of the vowels, you use an expression like @"[aeiou]", i.e., the choices within a set of square brackets.

Here is some info on the em dash. You might be able to copy and paste the dash from this post into your code and use string.Replace.
The em dash
Look in the following SO post for the answer:
replacing the em dash
Looks like the following Java code solved the issue for others:
String s = "asd – asd";
s = s.replaceAll("\\p{Pd}", "-");
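The snippet above is Java; the same Unicode category is available in .NET regular expressions. As a minimal sketch (the class and method names here are made up for illustration), a C# version might look like this:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashNormalizer
{
    // \p{Pd} matches any character in the Unicode "Punctuation, Dash" category,
    // which includes U+2012 through U+2015 as well as the ASCII hyphen-minus.
    private static readonly Regex AnyDash = new Regex(@"\p{Pd}");

    public static string Normalize(string s) => AnyDash.Replace(s, "-");

    public static void Main()
    {
        // \u2013 is an en dash, \u2014 is an em dash.
        Console.WriteLine(Normalize("11\u20131111\u201411/11")); // prints 11-1111-11/11
    }
}
```

Note that \p{Pd} is broader than the four code points listed in the question, so it also rewrites other dash-like punctuation; use the explicit character class if that is not wanted.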

Related

C# Regex Split Quotes and Comma Syntax Error [duplicate]

This question already has answers here:
Can I escape a double quote in a verbatim string literal?
(6 answers)
How to split csv whose columns may contain comma
(9 answers)
Closed 4 years ago.
I have a text file as follows:
"0","Column","column2","Column3"
I have managed to get the data down to split to the following:
"0"
"Column"
"Column2"
"Column3"
with ,(?=(?:[^']*'[^']*')*[^']*$). Now I want to remove the quotes. I have tested the expression [^\s"']+|"([^"]*)"|\'([^\']*) in an online regex tester, which gives the correct output I'm looking for. However, I am getting a syntax error when using the expression:
String[] columns = Regex.Split(dataLine, "[^\s"']+|"([^"]*)"|\'([^\']*)");
Syntax error ',' expected
I've tried escaping characters but to no avail, am I missing something?
Any help would be greatly appreciated!
Thanks.
C# might be escaping the backslash. Try:
String[] columns = Regex.Split(dataLine, @"[^\s""']+|""([^""]*)""|\'([^\']*)");
The problem is the double quotes inside the regex: the compiler chokes on them because it thinks they are the end of the string.
You must escape them (and the backslash in \s), like this:
"[^\\s\"']+|\"([^\"]*)\"|\'([^\']*)"
Edit:
You can actually do everything you want with one regex, without splitting first:
@"(?<=[""])[^,]*?(?=[""])"
Here I use an @-quoted (verbatim) string, where double quotes are doubled instead of escaped.
The regex uses a lookbehind to find a double quote, then matches any character except a comma ',' zero or more times, then looks ahead for a double quote.
How to use:
string test = @"""0"",""Column"",""column2"",""Column3""";
Regex regex = new Regex(@"(?<=[""])[^,]*?(?=[""])");
foreach (Match match in regex.Matches(test))
{
    Console.WriteLine(match.Value);
}
You need to escape the double quotes inside of your regular expression, as they're closing the string literal. Also, to handle 'unrecognized escape sequences', you'll need to escape the \ in \s.
Two ways to do this:
Escape all the characters of concern using backslashes: "[^\\s\"']+|\"([^\"]*)\"|\'([^\']*)"
Use the @ syntax to denote a "verbatim" string literal. Double quotes still need to be escaped, but instead using "" for every ": @"[^\s""']+|""([^""]*)""|'([^']*)"
Regardless, when I test out your new regular expression it appears to be capturing some empty groups as well, see here: https://dotnetfiddle.net/1WQE4R
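To check that the two escaping styles described above really produce the same pattern, a small sketch like the following can help. The class and constant names are invented for this example, and the single quotes are written without a backslash in both forms, which leaves the regex unchanged.

```csharp
using System;

public static class LiteralFormsDemo
{
    // Regular literal: the backslash in \s and every double quote are escaped with '\'.
    public const string Escaped = "[^\\s\"']+|\"([^\"]*)\"|'([^']*)";

    // Verbatim literal: backslashes are taken literally, double quotes are doubled.
    public const string Verbatim = @"[^\s""']+|""([^""]*)""|'([^']*)";

    public static void Main()
    {
        // Both literals compile to the same string, so either can be passed to Regex.
        Console.WriteLine(Escaped == Verbatim); // prints True
        Console.WriteLine(Verbatim);
    }
}
```

The verbatim form is usually easier to read for regular expressions, because only the double quote needs special treatment.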

Best way to remove unknown characters and spaces using C#? [duplicate]

This question already has answers here:
How can I remove the spaces, tabs, new lines between characters using c#'s REGEX?
(2 answers)
Closed 6 years ago.
Unknown Characters:
|b9-12-2016,¢Xocoak¡LO2A35(2)(b)¡ÓocORe3ao-i|],¢Xa?u¡±o¡±i?¢X$3,597,669On 9-12-2016, the price adjusted to $3,597,669 dueto the reason allowed under section 35(2)(b) of theOrdinance
Good Result:
$3,597,669On 9-12-2016, the price adjusted to $3,597,669 due to the reason allowed under section 35 of the Ordinance
You should be able to use regular expressions to do this. You can use the Regex.Replace method to run regular expressions on your text. Regular expressions are patterns that a regular expression engine tries to match in input text. I recommend that you take a look at the MSDN article here. You can also take a look at the documentation for the Regex.Replace method here. For example, in order to remove the letter c you could use this snippet of code:
output = Regex.Replace(input, "c", "", RegexOptions.IgnoreCase);
This would replace both lowercase and capital Cs because the ignore case option is turned on.
If the input always follows the pattern you showed, use the following code. It takes everything from the first $ sign onward.
string str = "|b9-12-2016,¢Xocoak¡LO2A35(2)(b)¡ÓocORe3ao-i|],¢Xa?u¡±o¡±i?¢X$3,597,669On 9-12-2016, the price adjusted to $3,597,669 dueto the reason allowed under section 35(2)(b) of theOrdinance";
var result = str.Substring(str.IndexOf('$'));

Unable to remove certain characters between values in c# [duplicate]

This question already has answers here:
Remove text in-between delimiters in a string (using a regex?)
(5 answers)
Closed 6 years ago.
I am trying to remove characters starting from (and including) rgm up to (and including) ;1..
Example input string:
Sum ({rgmdaerudsb;1.Total_Value}, {rgmdaerub;1.Major_Value})
Code:
string strEx = "Sum ({rgmdaerudsb;1.Total_Value}, {rgmdaerub;1.Major_Value})";
strEx = strEx.Substring(0, strEx.LastIndexOf("rgm")) +
strEx.Substring(strEx.LastIndexOf(";1.") + 3);
Result:
Sum ({rgmdaerub;1.Total_Value}, {.Major_Value})
Expected result:
Sum ({Total_Value}, {Major_Value})
Note: only rgm and ;1. will remain static and characters between them will vary.
I would recommend using Regex for this purpose. Try this:
string input = "Sum ({rgmdaerudsb;1.Total_Value}, {rgmdaerub;1.Major_Value})";
string result = Regex.Replace(input, @"rgm.*?;1\.", "");
Explanation:
The second parameter of Regex.Replace takes the pattern that consists of the following:
rgm (your starting string)
. (dot - meaning any character)
*? (the preceding symbol can occur zero or more times, but stops at the first possible match (shortest))
;1\. (your ending string - the dot needs to be escaped, otherwise it would mean any character)
You need to use Regex, with an expression like @"rgm.*?;1\.". That's just off the top of my head, you will have to verify the exact regular expression that matches your pattern. Then, use Regex.Replace() with it.
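The difference between the lazy quantifier (.*?) and the greedy one (.*) matters for this input, because the prefix occurs twice. The following sketch, with hypothetical class and method names, runs both on the string from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixStripper
{
    // Lazy: stops at the nearest ";1." after each "rgm".
    public static string StripLazy(string s) => Regex.Replace(s, @"rgm.*?;1\.", "");

    // Greedy: runs from the first "rgm" to the LAST ";1." on the line.
    public static string StripGreedy(string s) => Regex.Replace(s, @"rgm.*;1\.", "");

    public static void Main()
    {
        string input = "Sum ({rgmdaerudsb;1.Total_Value}, {rgmdaerub;1.Major_Value})";
        Console.WriteLine(StripLazy(input));   // prints Sum ({Total_Value}, {Major_Value})
        Console.WriteLine(StripGreedy(input)); // prints Sum ({Major_Value})
    }
}
```

Only the lazy version gives the expected result; the greedy one swallows the first field along with both prefixes.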

Need to get a substring, but using a regular expression [duplicate]

This question already has answers here:
C# Substring Alternative - Return rest of line within string after character
(5 answers)
Closed 6 years ago.
I have a URL like:
http://EddyFox.com/x/xynua
I need to fetch the substring after /x/, whatever string is there.
A more complex example I faced is:
http://EddyFox.com/x//x/
Here the result should be /x/
It can be achieved with Substring, but we need to do it with a regular expression.
This should do it:
string s = "http://EddyFox.com/x/xynua";
// I guess you don't want the /x/ in your match ?=!
Console.WriteLine(Regex.Match(s, "/x/(.*)").Groups[1].Value );
this is probably even better:
Console.WriteLine(Regex.Match(s, "(?<=/x/)(.*)").Value );
the output is
xynua
Have a look at this post: Regex to match after specific characters. SO is full of regex posts. The probability is very high that a regex question has already been asked before. :)
The regex /x/(.*) will capture everything following the /x/
And where is the problem?
var r = new Regex("/x/(\\S*)");
var matches = r.Matches(myUrl);
This regex matches everything from /x/ until the first occurrence of a whitespace character.
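None of the snippets above check whether the match succeeded. As a rough sketch (UrlTail and TryGetTail are names made up here, not part of any library), the lookbehind variant can be wrapped so that a missing /x/ is reported instead of silently yielding an empty string:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlTail
{
    // The lookbehind anchors the match right after the first "/x/";
    // ".*" then takes the rest of the line.
    private static readonly Regex AfterMarker = new Regex("(?<=/x/).*");

    // Returns false (and an empty tail) when the input contains no "/x/".
    public static bool TryGetTail(string url, out string tail)
    {
        Match m = AfterMarker.Match(url);
        tail = m.Success ? m.Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        if (TryGetTail("http://EddyFox.com/x/xynua", out string simple))
            Console.WriteLine(simple); // prints xynua

        if (TryGetTail("http://EddyFox.com/x//x/", out string nested))
            Console.WriteLine(nested); // prints /x/
    }
}
```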

Filter out alphabetic with regex using C# [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regex - Only letters?
I try to filter out alphabetics ([a-z],[A-Z]) from text.
I tried "^\w$" but it filters alphanumeric (alpha and numbers).
What is the pattern to filter out alphabetic?
Thanks.
To remove all letters try this:
void Main()
{
    var str = "some junk456456%^&%*333";
    Console.WriteLine(Regex.Replace(str, "[a-zA-Z]", ""));
}
For filtering out only English alphabets use:
[^a-zA-Z]+
For filtering out alphabets regardless of the language use:
[^\p{L}]+
If you want to reverse the effect remove the hat ^ right after the opening brackets.
If you want to find whole lines that match the pattern then enclose the above patterns within ^ and $ signs, otherwise you don't need them. Note that to make them take effect for every line you'll need to create the Regex object with the multi-line option enabled (RegexOptions.Multiline).
try this simple way:
var result = Regex.Replace(inputString, @"[^a-zA-Z\s]", "");
explain:
+
Matches the previous element one or more times.
[^character_group]
Negation: Matches any single character that is not in character_group.
\s
Matches any white-space character.
To filter multiple alpha characters use
^[a-zA-Z]+$
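The answers above use "filter out" in two opposite senses: removing the letters, or removing everything except the letters. The sketch below puts the three variants side by side; the class and method names are chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterFilter
{
    // Removes every letter (any language); everything else is kept.
    public static string RemoveLetters(string s) => Regex.Replace(s, @"\p{L}+", "");

    // Keeps only letters (any language); everything else is removed.
    public static string KeepLetters(string s) => Regex.Replace(s, @"[^\p{L}]+", "");

    // True when the whole input consists of one or more ASCII letters.
    public static bool IsAllAsciiLetters(string s) => Regex.IsMatch(s, "^[a-zA-Z]+$");

    public static void Main()
    {
        string str = "some junk456456%^&%*333";
        Console.WriteLine(RemoveLetters(str));         // prints " 456456%^&%*333" (leading space kept)
        Console.WriteLine(KeepLetters(str));           // prints somejunk
        Console.WriteLine(IsAllAsciiLetters("abc"));   // prints True
        Console.WriteLine(IsAllAsciiLetters("abc1"));  // prints False
    }
}
```

Replace \p{L} with a-zA-Z inside the brackets if only English letters should count.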
