Exclude first and last quotation of string in regex result

I'm running a small C# program where I need to extract the escape-quoted words from a string.
Sample code from LINQPad:
string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";
var pattern = "\".*?\"";
var result = Regex.Split(s, pattern);
result.Dump();
Input (the actual input contains many more escaped quotes, always an even number of them):
"action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult"
Expected result:
"C:\\folder\\"
Actual result (2 items):
"action = 0;
dir = "
_____
";
result"
I get exactly the opposite of what I require. How can I make the regex ignore the starting (and ending) quote of the actual string? Why does it include them in the search? I've used the regex from similar SO questions but still don't get the intended result. I only want to filter by escape quotes.

Instead of using Regex.Split, try Regex.Match.
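As a minimal sketch of that suggestion (assuming the same input string as in the question, and that each quoted value sits between one pair of escaped quotes), Regex.Matches with a capture group returns the quoted text itself rather than the text around it. The variable name quoted is only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same input as in the question; the path contains doubled backslashes.
string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";

// Group 1 captures what sits between a pair of quotes, without the quotes.
string[] quoted = Regex.Matches(s, "\"(.*?)\"")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

foreach (string value in quoted)
    Console.WriteLine(value);
```

Regex.Split treats the pattern as a delimiter, which is why the original code returned everything except the quoted part.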

You don't need a regex. Simply use String.Split(';'); the second array element contains the path you need. You can then Trim() the leading carriage return, Remove() the "\ndir = " prefix (7 characters) and Trim() the surrounding quotes. Note that this relies on the exact layout of the sample input. Something like:
string path = s.Split(';')[1].Trim("\r ".ToCharArray()).Remove(0, 7).Trim('"');

Related

Splitting of a string using Regex

I have string of the following format:
string test = "test.BO.ID";
My aim is to extract the part of the string that comes after the first dot.
So ideally I am expecting output as "BO.ID".
Here is what I have tried:
// Checking for the first occurrence and taking whatever comes after the dot
var output = Regex.Match(test, @"^(?=.).*?");
The output I am getting is empty.
How do I need to modify the regex?

You get an empty output because the pattern you have can match an empty string at the start of a string, and that is enough since .*? is a lazy subpattern and . matches any char.
Use (the value will be in Match.Groups[1].Value)
\.(.*)
or (with a lookbehind, to get the string as a Match.Value)
(?<=\.).*
A non-regex approach is to use String.Split with a count argument:
var s = "test.BO.ID";
var res = s.Split(new[] {"."}, 2, StringSplitOptions.None);
if (res.Length > 1)
    Console.WriteLine(res[1]);
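The two regex options above can be compared side by side. This is a small sketch using the sample string from the question; the variable names viaGroup and viaLookbehind are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string test = "test.BO.ID";

// Option 1: capture everything after the first dot in group 1.
string viaGroup = Regex.Match(test, @"\.(.*)").Groups[1].Value;

// Option 2: a lookbehind keeps the dot out of the match itself.
string viaLookbehind = Regex.Match(test, @"(?<=\.).*").Value;

Console.WriteLine(viaGroup);
Console.WriteLine(viaLookbehind);
```

Both variants should yield the same text here; the lookbehind form is convenient when only Match.Value is consumed.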

If you only want the part after the first dot you don't need a regex at all (the + 1 skips the dot itself):
x.Substring(x.IndexOf('.') + 1)

Omit unnecessary parts in string array

In C#, I have a string that comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
and so on. Now I want to extract only the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?

This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];

What you can do is use a very simple regular expression to parse out the bits that you want. In your case you want something that looks like this, and then you grab the two groups that interest you:
(Type)="(\w+)"
This returns, in groups 1 and 2, the value Type and the word characters contained between the double quotes.
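A short sketch of how that pattern might be applied in C#, assuming the sample input from the question. The names key and output are illustrative, and the empty-string fallback for a failed match is an assumption rather than part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Type=\"Data\"><Path.Style><Style";

// In a regular C# string the quotes and the backslash of \w must be escaped.
Match m = Regex.Match(input, "(Type)=\"(\\w+)\"");

// Fall back to an empty string when the pattern does not match.
string key = m.Success ? m.Groups[1].Value : string.Empty;
string output = m.Success ? m.Groups[2].Value : string.Empty;

Console.WriteLine(key + " -> " + output);
```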

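Instead of doing many splits, why don't you just use a regex? Keep the quotes in the input so the pattern can anchor on them, and read the capture group so the quotes are not part of the result:
output = Regex.Match(pair[1], "\"(\\w*)\"").Groups[1].Value;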

Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // i.e. Data or Program
But you will need some error handling as well.

How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explanation of the regex:
(?<=^Type=\") A lookbehind (prefix match). It is not included in the result, but the regex only matches if the string starts with Type="
.*? A non-greedy match. It matches as few characters as possible, up to the point where the rest of the pattern can match
(?=\") A lookahead (suffix match). It is not included in the result, but the regex only matches if the next character is "

Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the strings. Then you just have to detect the closing quote mark and extract the contents before it. You can use LINQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

Regex: C# extract text within double quotes

I want to extract only those words within double quotes. So, if the content is:
Would "you" like to have responses to your "questions" sent to you via email?
The answer must be
you
questions

Try this regex:
\"[^\"]*\"
or
\".*?\"
Explanation:
[^ character_group ]
Negation: Matches any single character that is not in character_group.
*?
Matches the previous element zero or more times, but as few times as possible.
And some sample code:
foreach (Match match in Regex.Matches(inputString, "\"([^\"]*)\""))
    Console.WriteLine(match.Groups[1].Value); // group 1 excludes the quotes
// or in LINQ
var result = from Match match in Regex.Matches(inputString, "\"([^\"]*)\"")
             select match.Groups[1].Value;

Based on @Ria's answer:
static void Main(string[] args)
{
string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
var reg = new Regex("\".*?\"");
var matches = reg.Matches(str);
foreach (var item in matches)
{
Console.WriteLine(item.ToString());
}
}
The output is:
"you"
"questions"
You can use string.TrimStart('"') and string.TrimEnd('"') (or simply string.Trim('"')) to remove the double quotes if you don't want them.

I like the regex solutions. You could also think of something like this
string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
var stringArray = str.Split('"');
Then take the odd elements from the array. If you use linq, you can do it like this:
var stringArray = str.Split('"').Where((item, index) => index % 2 != 0);

This also borrows the regex from @Ria, but lets you collect the matches first and then remove the quotes:
string strText = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
MatchCollection mc = Regex.Matches(strText, "\"([^\"]*)\"");
for (int z=0; z < mc.Count; z++)
{
Response.Write(mc[z].ToString().Replace("\"", ""));
}

I combine Regex and Trim:
const string searchString = "This is a \"search text\" and \"another text\" and not \"this text";
var collection = Regex.Matches(searchString, "\\\"(.*?)\\\"");
foreach (var item in collection)
{
Console.WriteLine(item.ToString().Trim('"'));
}
Result:
search text
another text

Try this (\"\w+\")+
I suggest you to download Expresso
http://www.ultrapico.com/Expresso.htm

I needed to do this in C# for parsing CSV and none of these worked for me so I came up with this:
\s*(?:(?:(['"])(?<value>(?:\\\1|[^\1])*?)\1)|(?<value>[^'",]+?))\s*(?:,|$)
This will parse out a field with or without quotes and will exclude the quotes from the value while keeping embedded quotes and commas. <value> contains the parsed field value. Without using named groups, either group 2 or 3 contains the value.
There are better and more efficient ways to do CSV parsing and this one will not be effective at identifying bad input. But if you can be sure of your input format and performance is not an issue, this might work for you.

Slight improvement on the answer by @Ria:
\"[^\" ][^\"]*\"
Will recognize a starting double quote only when not followed by a space to allow trailing inch specifiers.
Side effect: It will not recognize "" as a quoted value.
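To illustrate the difference, here is a small sketch with a made-up sentence in which the first quote mark is an inch specifier. The sample text and the names plain and guarded are assumptions for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: the quote after 6 is an inch mark, not an opening quote.
string text = "Use a 6\" pipe with the \"valve\" fitting";

// Plain pattern: pairs the inch mark with the next quote it finds.
string[] plain = Regex.Matches(text, "\"[^\"]*\"")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Guarded pattern: an opening quote must not be followed by a space.
string[] guarded = Regex.Matches(text, "\"[^\" ][^\"]*\"")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", plain));
Console.WriteLine(string.Join(" | ", guarded));
```

The guard only helps when the inch mark is followed by a space; an inch mark followed directly by another character would still be read as an opening quote.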

.NET regex replace using backreference

I have a fairly long string that contains sub strings with the following format:
project[1]/someword[1]
project[1]/someotherword[1]
There will be about 10 or so instances of this pattern in the string.
What I want to do is to be able to replace the second integer in square brackets with a different one. So the string would look like this for instance:
project[1]/someword[2]
project[1]/someotherword[2]
I'm thinking that regular expressions are what I need here. I came up with the regex:
project\[1\]/.*\[([0-9])\]
Which should capture the group [0-9] so I can replace it with something else. I'm looking at MSDN Regex.Replace() but I'm not seeing how to replace part of a string that is captured with a value of your choosing. Any advice on how to accomplish this would be appreciated. Thanks much.
Edit: After working with @Tharwen for a while I have changed my approach a bit. Here is the new code I am working with:
String yourString = @"<element w:xpath=""/project[1]/someword[1]""/> <anothernode></anothernode> <another element w:xpath=""/project[1]/someotherword[1]""/>";
int yourNumber = 2;
string anotherString = string.Empty;
anotherString = Regex.Replace(yourString, @"(?<=project\[1\]/.*\[)\d(?=\]"")", yourNumber.ToString());

Matched groups are replaced using the $1, $2 syntax as follows :-
csharp> Regex.Replace("Meaning of life is 42", @"([^\d]*)(\d+)", "$1($2)");
"Meaning of life is (42)"
If you are new to regular expressions in .NET I recommend http://www.ultrapico.com/Expresso.htm
Also http://www.regular-expressions.info/dotnet.html has some good stuff for quick reference.
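Applied to the strings from the question, the group-substitution approach could look like the sketch below. It assumes the segment after the slash consists of word characters only, and the name newIndex is illustrative. Because the replacement digit directly follows the first group, the group is written as ${1} so that it is not read as a reference to group 12.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "project[1]/someword[1]\nproject[1]/someotherword[1]";
int newIndex = 2;

// ${1} keeps the prefix, $2 keeps the closing bracket; only the digits change.
string replacement = "${1}" + newIndex + "$2";
string result = Regex.Replace(input, @"(project\[1\]/\w+\[)\d+(\])", replacement);

Console.WriteLine(result);
```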

I've adapted yours to use a lookbehind and a lookahead so that it only matches a digit which is preceded by 'project[1]/xxxxx[' and followed by ']':
(?<=project\[1\]/.*\[)\d(?=\])
Then, you can use:
String yourString = "project[1]/someword[1]";
int yourNumber = 2;
yourString = Regex.Replace(yourString, @"(?<=project\[1\]/.*\[)\d(?=\])", yourNumber.ToString());
I think maybe you were confused because Regex.Replace has lots of overloads which do slightly different things. I've used the static Regex.Replace(input, pattern, replacement) overload.

If you want to process the value of a captured group before replacing it, you'll have to separate the different parts of the string, make your modifications and put them back together.
string test = "project[1]/someword[1]\nproject[1]/someotherword[1]\n";
string result = string.Empty;
foreach (Match match in Regex.Matches(test, #"(project\[1\]/.*\[)([0-9])(\]\n)"))
{
result += match.Groups[1].Value;
result += (int.Parse(match.Groups[2].Value) + 1).ToString();
result += match.Groups[3].Value;
}
If you just want to replace text verbatim, it's easier: Regex.Replace(test, @"abc(.*)cba", @"cba$1abc").
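An alternative sketch for processing a captured value is the Regex.Replace overload that takes a MatchEvaluator, which leaves any text outside the matches untouched. This swaps in a different technique from the loop above, and it assumes the segment after the slash consists of word characters only.

```csharp
using System;
using System.Text.RegularExpressions;

string test = "project[1]/someword[1]\nproject[1]/someotherword[1]\n";

// The lambda is called once per match and returns the replacement text.
string result = Regex.Replace(
    test,
    @"(project\[1\]/\w+\[)(\d+)(\])",
    m => m.Groups[1].Value
         + (int.Parse(m.Groups[2].Value) + 1)
         + m.Groups[3].Value);

Console.Write(result);
```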

You can use String.Replace(String, String), which is an instance method.
For example:
yourString = yourString.Replace("someword[1]", "someword[2]");

C# Regex.Split - Subpattern returns empty strings

Hey, first time poster on this awesome community.
I have a regular expression in my C# application to parse an assignment of a variable:
NewVar = 40
which is entered in a Textbox. I want my regular expression to return (using Regex.Split) the name of the variable and the value, pretty straightforward. This is the Regex I have so far:
var r = new Regex(@"^(\w+)=(\d+)$", RegexOptions.IgnorePatternWhitespace);
var mc = r.Split(command);
My goal was to do the trimming of whitespace in the Regex and not use the Trim() method of the returned values. Currently, it works but it returns an empty string at the beginning of the MatchCollection and an empty string at the end.
Using the above input example, this is what's returned from Regex.Split:
mc[0] = ""
mc[1] = "NewVar"
mc[2] = "40"
mc[3] = ""
So my question is: why does it return an empty string at the beginning and the end?
Thanks.

The reason Regex.Split returns four values is that you have exactly one match, so Regex.Split returns:
All the text before your match, which is ""
All () groups within your match, which are "NewVar" and "40"
All the text after your match, which is ""
Regex.Split's primary purpose is to extract the text between the matches of a regex. For example, you could use Regex.Split with a pattern of "[,;]" to split text on either commas or semicolons. In .NET Framework 1.0 and 1.1, Regex.Split only returned the split values, in this case "" and "", but in .NET Framework 2.0 it was modified to also include values matched by () within the regex, which is why you are seeing "NewVar" and "40" at all.
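Both behaviours described above can be seen in a short sketch. The sample inputs are illustrative; the second call uses the pattern from the question on an input without spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Delimiter use: the pattern only marks where to cut.
string[] parts = Regex.Split("a,b;c", "[,;]");

// Capture groups in the pattern are added to the returned array,
// surrounded by the (empty) text before and after the single match.
string[] withGroups = Regex.Split("NewVar=40", @"^(\w+)=(\d+)$");

Console.WriteLine(string.Join("|", parts));
Console.WriteLine(withGroups.Length);
```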
What you were looking for is Regex.Match, not Regex.Split. It will do exactly what you want:
var r = new Regex(@"^(\w+)=(\d+)$");
var match = r.Match(command);
var varName = match.Groups[1].Value;   // Groups[0] is the whole match
var valueText = match.Groups[2].Value;
Note that RegexOptions.IgnorePatternWhitespace means you can include extra spaces in your pattern. It has nothing to do with the matched text. Since you have no extra whitespace in your pattern, it is unnecessary.
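Since the goal was to handle the whitespace inside the regex, one possible sketch is to allow optional whitespace explicitly with \s*. The padded sample input is an assumption for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical textbox content with spaces around the tokens.
string command = "  NewVar = 40 ";

// \s* absorbs optional whitespace, so the groups come back already trimmed.
Match match = Regex.Match(command, @"^\s*(\w+)\s*=\s*(\d+)\s*$");

string varName = match.Success ? match.Groups[1].Value : string.Empty;
string valueText = match.Success ? match.Groups[2].Value : string.Empty;

Console.WriteLine(varName + " = " + valueText);
```

Checking match.Success before reading the groups avoids silently treating invalid input as an empty assignment.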

From the docs, Regex.Split() uses the regular expression as the delimiter to split on. It does not split the captured groups out of the input string. Also, the IgnorePatternWhitespace option ignores unescaped whitespace in your pattern, not in the input.
Instead, try the following:
var r = new Regex(@"\s*=\s*");
var mc = r.Split(command);
Note that the whitespace is actually consumed as a part of the delimiter.
