This question already has answers here:
Can I escape a double quote in a verbatim string literal?
(6 answers)
How to split csv whose columns may contain comma
(9 answers)
Closed 4 years ago.
I have the a text file as follows:
"0","Column","column2","Column3"
I have managed to get the data down to split to the following:
"0"
"Column"
"Column2"
"Column3"
with ,(?=(?:[^']*'[^']*')*[^']*$), now I want to remove the quotes. I have tested the expression [^\s"']+|"([^"]*)"|\'([^\']*) an online regex tester, which gives the correct output im looking for. However, I am getting a syntax error when using the expression:
String[] columns = Regex.Split(dataLine, "[^\s"']+|"([^"]*)"|\'([^\']*)");
Syntax error ',' expected
I've tried escaping characters but to no avail, am I missing something?
Any help would be greatly appreciated!
Thanks.
C# might be escaping the backslash. Try:
String[] columns = Regex.Split(dataLine, #"[^\s""']+|"([^""]*)""|\'([^\']*)");
The problems are the double quotes inside the regex, the compiler chokes on them, think they are the end of string.
You must escape them, like this:
"[^\s\"']+|\"([^\"]*)\"|\'([^\']*)"
Edit:
You can actually do all, that you want with one regex, without first splitting:
#"(?<=[""])[^,]*?(?=[""])"
Here I use an # quoted string where double quotes are doubled instead of escaped.
The regex uses look behind to look for a double quote, then matching any character except comma ',' zero ore more times, then looks ahead for a double quote.
How to use:
string test = #"""0"",""Column"",""column2"",""Column3""";
Regex regex = new Regex(#"(?<=[""])[^,]*?(?=[""])");
foreach (Match match in regex.Matches(test))
{
Console.WriteLine(match.Value);
}
You need to escape the double quotes inside of your regular expression, as they're closing the string literal. Also, to handle 'unrecognized escape sequences', you'll need to escape the \ in \s.
Two ways to do this:
Escape all the characters of concern using backslashes: "[^\\s\"']+|\"([^\"]*)\"|\'([^\']*)"
Use the # syntax to denote a "verbatim" string literal. Double quotes still need to be escaped, but instead using "" for every ": #"[^\s""']+|""([^""]*)""|'([^']*)"
Regardless, when I test out your new regular expression it appears to be capturing some empty groups as well, see here: https://dotnetfiddle.net/1WQE4R
Related
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I am new to regex. What does regex expression match pattern "\[.*\]" mean?
If I have a text like "Hello [Here]", then success is returned in the match. And match contain [Here].
I read that:
. indicates Any except \n (newline),
* indicates 0 or more times
I don't understand the "\". It believe it is just escape sequence for "\".
So, is the expression "\[.*\]" trying to match a pattern like \[Any text\]?
Yes, you are right. It will match any characters enclosed in []. The .* imply any or no characters enclosed in [].
Also you should try this link which is a very helpful regex tool. You can input the regex pattern and check for matches easily.
I have tried this on regexr, here is a screen shot:
This question already has answers here:
Remove text in-between delimiters in a string (using a regex?)
(5 answers)
Closed 6 years ago.
I am trying to remove characters starting from (and including) rgm up to (and including) ;1..
Example input string:
Sum ({rgmdaerudsb;1.Total_Value}, {rgmdaerub;1.Major_Value})
Code:
string strEx = "Sum ({rgmdaerudsb;1.Total_Value}, {rgmdaerub;1.Major_Value})";
strEx = strEx.Substring(0, strEx.LastIndexOf("rgm")) +
strEx.Substring(strEx.LastIndexOf(";1.") + 3);
Result:
Sum ({rgmdaerub;1.Total_Value}, {.Major_Value})
Expected result:
Sum ({Total_Value}, {Major_Value})
Note: only rgm and ;1. will remain static and characters between them will vary.
I would recommend to use Regex for this purpose. Try this:
string input = "Sum ({rgmdaerudsb;1.Total_Value}, {rgmdaerub;1.Major_Value})";
string result = Regex.Replace(input, #"rgm.*?;1\.", "");
Explanation:
The second parameter of Regex.Replace takes the pattern that consists of the following:
rgm (your starting string)
. (dot - meaning any character)
*? (the preceding symbol can occure zero or more times, but stops at the first possible match (shortest))
;1. (your ending string - the dot needed to be escaped, otherwise it would mean any character)
You need to use RegEx, with an expression like "rgm(.);1\.". That's just off the top of my head, you will have to verify the exact regular expression that matches your pattern. Then, use RegEx.Replace() with it.
This question already has answers here:
What is the difference between a regular string and a verbatim string?
(6 answers)
Closed 7 years ago.
There is # operator that you place infornt of the string to allow special characters in string and there is \. Well I am aware that with # you can use reserved names for variables, but I am curious just about difference using these two operators with string.
Search on the web indicated that these two are the same but I still believe there has to be something different between # and \.
Code to test:
string _string0 = #"Just a ""qoute""";
string _string1 = "Just a \"qoute\"";
Console.WriteLine("{0} | {1}",_string0, _string1);
Question: what is the difference between #"Just a ""qoute"""; and "Just a \"qoute\""; only regarding strings?
Edit: Question is already answered here.
Using # (which denotes a verbatim string literal) you can put any character into the string, even line breaks. The only character you need to escape is the double quote. The usual \* escape sequences and Unicode escape sequences are not processed in such string literals.
Without # (in a regular string literal), you need to escape every special character, such as line breaks.
You can read more about it at the C# Programming Guide:
https://msdn.microsoft.com/en-us/library/ms228362.aspx#Anchor_3
# is a verbatim string, it allows you not to escape every special character at a time, but all of them in the string.While \ just allows you to escape one certain character.
More info about strings: https://msdn.microsoft.com/en-us/library/aa691090%28v=vs.71%29.aspx
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regex - Only letters?
I try to filter out alphabetics ([a-z],[A-Z]) from text.
I tried "^\w$" but it filters alphanumeric (alpha and numbers).
What is the pattern to filter out alphabetic?
Thanks.
To remove all letters try this:
void Main()
{
var str = "some junk456456%^&%*333";
Console.WriteLine(Regex.Replace(str, "[a-zA-Z]", ""));
}
For filtering out only English alphabets use:
[^a-zA-Z]+
For filtering out alphabets regardless of the language use:
[^\p{L}]+
If you want to reverse the effect remove the hat ^ right after the opening brackets.
If you want to find whole lines that match the pattern then enclose the above patterns within ^ and $ signs, otherwise you don't need them. Note that to make them effect for every line you'll need to create the Regex object with the multi-line option enabled.
try this simple way:
var result = Regex.Replace(inputString, "[^a-zA-Z\s]", "");
explain:
+
Matches the previous element one or more times.
[^character_group]
Negation: Matches any single character that is not in character_group.
\s
Matches any white-space character.
To filter multiple alpha characters use
^[a-zA-Z]+$
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regular expression to match string not containing a word?
To not match a set of characters I would use e.g. [^\"\\r\\n]*
Now I want to not match a fixed character set, e.g. "|="
In other words, I want to match: ( not ", not \r, not \n, and not |= ).
EDIT: I am trying to modify the regex for parsing data separated with delimiters. The single-delimiter solution I got form a CSV parser, but now I want to expand it to include multi-character delimiters. I do not think lookaheads will work, because I want to consume, not just assert and discard, the matching characters.
I figured it out, it should be: ((?![\"\\r\\n]|[|][=]).)*
The full regex, modified from the CSV parser link in the original post, will be: ((?<field>((?![\"\\r\\n]|[|][=]).)*)|\"(?<field>([^\"]|\"\")*)\")([|][=]|(?<rowbreak>\\r\\n|\\n|$))
This will match any amount of characters of ( not ", not \r, not \n, and not |= ), or a quoted string, followed by ( "|=" or end of line )