Single regular expression with 2 matches - c#

Given the path
C:\Users\Bob\Downloads\Product12\Prices\USD
and only knowing it contains a subdirectory called Downloads
I have this regex to locate the Downloads part
(?<=Downloads\\)[^\\""]*
Ideally, I want to also match everything after Downloads as a separate group, but using a single regex for both Downloads and the following path portion.

This matches everything up to and including 'Downloads\' in one group, and everything after it in another group:
/^(.*?Downloads\\)(.*)$/
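For illustration, here is a minimal C# sketch of that two-group pattern. The class and method names (`DownloadsSplitter`, `Split`) are hypothetical, and the pattern is written as a verbatim string without the surrounding slashes, which .NET does not use.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DownloadsSplitter
{
    // Returns the part up to and including "Downloads\" and the remainder,
    // or null when the path contains no "Downloads\" component.
    public static (string Prefix, string Rest)? Split(string path)
    {
        // Group 1: lazily matched prefix ending in "Downloads\"; group 2: everything after it.
        Match m = Regex.Match(path, @"^(.*?Downloads\\)(.*)$");
        if (!m.Success)
        {
            return null;
        }
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

With the sample path from the question, `Split` should return `C:\Users\Bob\Downloads\` as the prefix and `Product12\Prices\USD` as the rest.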

So given the sample input, you want to get Product12\Prices\USD, right?
result = Regex.Match(s, @"\\Downloads\\(.*)$").Groups[1].Value;
But that [^\\""]* in your regex seems to indicate that your path is enclosed in quotes, and you don't want to match the closing quote or anything after it.
result = Regex.Match(s, @"\\Downloads\\([^""]*)""").Groups[1].Value;
Some points of particular interest are:
When you create regexes in C#, always use C#'s verbatim string notation if at all possible (i.e., @"regex"). It saves you having to litter your code with a bunch of backslashes. For example, if your regex were in a standard C-style string literal, you would have to use four backslashes in the regex to match one backslash in the input.
When you include regexes in your posts here at SO, show them as they appear in your code. Then we won't have to guess what the backslashes mean. For example, is the \\ in [^\\""]* supposed to match a literal backslash, or are you just escaping it for the regex?
Speaking of quotes, " has no special meaning in regexes, so you don't have to escape it for that. I changed that sequence to [^""]* because that's how you escape quotes in a verbatim string. In a C-style string literal it would be [^\"]*.

You don't need a regex to parse paths.
var paths = new Uri(@"C:\Users\Bob\Downloads\Product12\Prices\USD").Segments;
would return all segments, and you can skip until Downloads. For example:
var paths = new Uri(@"C:\Users\Bob\Downloads\Product12\Prices\USD")
    .Segments
    .SkipWhile(s => s != "Downloads/")
    .Skip(1)
    .ToList();

Related

Match file paths in HTML and Javascript Files

I am having a lot of trouble with a regular expression and would appreciate some help.
Basically I need a regex that can spot file names in HTML, CSS, and JavaScript
that are enclosed in single or double quotes.
I have got this far (\"|')([^"|'|\s]|\\"*)*\..*(\"|')
I am using C#
See the link
https://regex101.com/r/nga5yF/2
But if you look at my tests at the bottom where I have multiple matches on a single line it fails.
Any help would be appreciated!
We can use a negated character class for this:
['"][^'" ]+?\.[^'" ]*?['"]
Explanation:
everything between quotes, regardless of type, if there is a .
Instead of * use the non-greedy or lazy *? quantifier to match an unlimited number of repetitions, but in a non-greedy way. (i.e. take the shortest match).
Also, you forgot to exclude whitespace and quotes in the part after requiring a dot to be included.
Test this version of the regex:
(?<quote>\"|\')(?<file>[^\"\'\s]*?\.[^\"\'\s]*?)\k<quote>
https://regex101.com/r/wTXhaM/1
Further improvements:
Use named capturing groups.
Use a back reference at end of pattern to match double quote or single quote depending on beginning of string.
Or if you want to also match filenames where single and double quotes are mixed use this variant:
(?:\"|\')(?<file>[^\"\'\s]*?\.[^\"\'\s]*?)(?:\"|\')
Use named capturing group for filename.
Use non-capturing groups for quotes
https://regex101.com/r/uM2Qfd/1
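As a rough sketch of how the back-reference variant might be used from C#: the class and method names below (`QuotedFileFinder`, `Find`) are hypothetical. The pattern is the named-group version from above, rewritten as a verbatim string, so each `"` is doubled.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedFileFinder
{
    // (?<quote>...) captures the opening quote; \k<quote> requires the same closing quote.
    private static readonly Regex FilePattern =
        new Regex(@"(?<quote>[""'])(?<file>[^""'\s]*?\.[^""'\s]*?)\k<quote>");

    // Returns every quoted token that contains a dot, without its quotes.
    public static List<string> Find(string text)
    {
        var files = new List<string>();
        foreach (Match m in FilePattern.Matches(text))
        {
            files.Add(m.Groups["file"].Value);
        }
        return files;
    }
}
```

Calling `Find` on a line such as `<script src="js/app.js"></script><link href='css/site.css'>` should yield both `js/app.js` and `css/site.css`, which is the several-matches-on-one-line case from the question.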

How to give single back slash to a variable in c# program? [duplicate]

I want to write something like this C:\Users\UserName\Documents\Tasks in a textbox:
txtPath.Text = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments)+"\Tasks";
I get the error:
Unrecognized escape sequence.
How do I write a backslash in a string?
The backslash ("\") character is a special escape character used to indicate other special characters such as new lines (\n), tabs (\t), or quotation marks (\").
If you want to include a backslash character itself, you need two backslashes or use the @ verbatim string:
var s = "\\Tasks";
// or
var s = @"\Tasks";
Read the MSDN documentation/C# Specification which discusses the characters that are escaped using the backslash character and the use of the verbatim string literal.
Generally speaking, most C# .NET developers tend to favour using the @ verbatim strings when building file/folder paths since it saves them from having to write double backslashes all the time and they can directly copy/paste the path, so I would suggest that you get in the habit of doing the same.
That all said, in this case, I would actually recommend you use the Path.Combine utility method as in @lordkain's answer, as then you don't need to worry about whether backslashes are already included in the paths and accidentally doubling up the slashes or omitting them altogether when combining parts of paths.
To escape the backslash, simply use 2 of them, like this:
\\
If you need to escape other things, this may be helpful..
There is a special function made for this: Path.Combine()
var folder = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
var fullpath = Path.Combine(folder, "Tasks");
Just escape the "\" by using + "\\Tasks" or use a verbatim string like @"\Tasks"
txtPath.Text = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments) + "\\Tasks";
Put a double backslash instead of a single backslash...
Even though this post is quite old, I tried something that worked for my case.
I wanted to create a string variable with the value below:
21541_12_1_13\":null
so my approach was like this:
Build the string using a verbatim literal
string substring = @"21541_12_1_13\"":null";
and then remove the unwanted backslash using the Remove function
string newsubstring = substring.Remove(13, 1);
Hope that helps.
Cheers

How do I escape a RegEx?

I have a regex that I now need to move into C#. I'm getting errors like this:
Unrecognized escape sequence
I am using Regex.Escape -- but obviously incorrectly.
string pattern = Regex.Escape("^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");
hiddenRegex.Attributes.Add("value", pattern);
How is this correctly done?
The error you're getting is coming at compile time, correct? That means the C# compiler is not able to make sense of your string. Prepend an @ sign to the string and you should be fine. You don't need Regex.Escape.
See What's the @ in front of a string in C#?
var pattern = new Regex(@"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");
pattern.IsMatch("Your input string to test the pattern against");
The error you are getting is due to the fact that your string contains invalid escape sequences (e.g. \d). To fix this, either escape the backslashes manually or write a verbatim string literal instead:
string pattern = @"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$";
Regex.Escape would be used when you want to embed dynamic content to a regular expression, not when you want to construct a fixed regex. For example, you would use it here:
string name = "this comes from user input";
string pattern = string.Format("^{0}$", Regex.Escape(name));
You do this because name could very well include characters that have special meaning in a regex, such as dots or parentheses. When name is hardcoded (as in your example) you can escape those characters manually.
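To make the difference concrete, here is a small sketch, assuming a hypothetical helper `ExactMatcher.IsExactMatch` that checks whether an input equals a literal string by way of a regex. Without Regex.Escape, the dot in the literal would match any character.

```csharp
using System.Text.RegularExpressions;

public static class ExactMatcher
{
    // Builds an anchored pattern from arbitrary text; Regex.Escape neutralises
    // characters such as '.', '(' and '*' so they are matched literally.
    public static bool IsExactMatch(string input, string literal)
    {
        string pattern = "^" + Regex.Escape(literal) + "$";
        return Regex.IsMatch(input, pattern);
    }
}
```

Here `IsExactMatch("axb", "a.b")` should be false, whereas the unescaped `Regex.IsMatch("axb", "^a.b$")` is true because `.` matches the `x`.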

Regular expression to split on commas that are outside the double quotes in a CSV file?

This is the sample
"abc","abcsds","adbc,ds","abc"
Output should be
abc
abcsds
adbc,ds
abc
Try this:
"(.*?)"
if you need to put this regex inside a literal, don't forget to escape it:
Regex re = new Regex("\"(.*?)\"");
This is a tougher job than you realize -- not only can there be commas inside the quotes, but there can also be quotes inside the quotes. Two consecutive quotes inside of a quoted string does not signal the end of the string. Instead, it signals a quote embedded in the string, so for example:
"x", "y,""z"""
should be parsed as:
x
y,"z"
So, the basic sequence is something like this:
Find the first non-white-space character.
If it was a quote, read up to the next quote. Then read the next character.
Repeat until that next character is not also a quote.
If the next (non-whitespace) character is not a comma, input is malformed.
If it was not a quote, read up to the next comma.
Skip the comma, repeat the whole process for the next field.
Note that despite the tag, I'm not providing a regex -- I'm not at all sure I've seen a regex that can really handle this properly.
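The steps above can be sketched as a small hand-written parser. This is only an illustration under some assumptions: the class and method names (`CsvLineParser`, `ParseLine`) are hypothetical, it handles a single line only (no line breaks inside quoted fields), and it does not trim trailing whitespace from unquoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineParser
{
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        int i = 0;
        while (true)
        {
            // Find the first non-white-space character of the field.
            while (i < line.Length && char.IsWhiteSpace(line[i])) i++;
            var field = new StringBuilder();
            if (i < line.Length && line[i] == '"')
            {
                i++; // skip the opening quote
                while (true)
                {
                    if (i >= line.Length)
                        throw new FormatException("Unterminated quoted field.");
                    if (line[i] != '"')
                    {
                        field.Append(line[i++]);
                    }
                    else if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"'); // two consecutive quotes are one embedded quote
                        i += 2;
                    }
                    else
                    {
                        i++; // closing quote
                        break;
                    }
                }
                // After a quoted field, the next non-whitespace character must be a comma.
                while (i < line.Length && char.IsWhiteSpace(line[i])) i++;
                if (i < line.Length && line[i] != ',')
                    throw new FormatException("Expected a comma after a quoted field.");
            }
            else
            {
                // Unquoted field: read up to the next comma.
                while (i < line.Length && line[i] != ',') field.Append(line[i++]);
            }
            fields.Add(field.ToString());
            if (i >= line.Length) break;
            i++; // skip the comma and repeat for the next field
        }
        return fields;
    }
}
```

For the input `"x", "y,""z"""` this should produce the two fields `x` and `y,"z"`, matching the example in the answer.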
This answer has a C# solution for dealing with CSV.
In particular, the line
private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
contains the Regex used to split properly, i.e., taking quoting and escaping into consideration.
Basically what it says is, match any comma that is followed by an even number of quote marks (including zero). This effectively prevents matching a comma that is part of a quoted string, since the quote character is escaped by doubling it.
Keep in mind that the quotes in the above line are doubled for the sake of the string literal. It might be easier to think of the expression as
,(?=(?:[^"]*"[^"]*")*(?![^"]*"))
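As a hedged usage sketch, the splitter can be applied with Regex.Split and the surrounding quotes trimmed afterwards. The names `CsvRegexSplitter` and `SplitLine` are hypothetical, and this simple version does not unescape doubled quotes inside a field.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvRegexSplitter
{
    // Matches a comma only when an even number of quotes follows it.
    private static readonly Regex Splitter =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    public static string[] SplitLine(string line)
    {
        // Split on commas outside quotes, then strip the enclosing quotes.
        return Splitter.Split(line).Select(f => f.Trim('"')).ToArray();
    }
}
```

On the sample from the question, `SplitLine` should return `abc`, `abcsds`, `adbc,ds`, and `abc`.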
If you can be sure there are no inner, escaped quotes, then I guess it's ok to use a regular expression for this. However, most modern languages already have proper CSV parsers.
Using a proper parser is the correct answer to this. Text::CSV for Perl, for example.
However, if you're dead set on using regular expressions, I'd suggest you "borrow" from some sort of module, like this one:
http://metacpan.org/pod/Regexp::Common::balanced

Replacing numbers in strings with C#

I thought I'd do a regex replace
Regex r = new Regex("[0-9]");
return r.Replace(sz, "#");
on a file named aa514a3a.4s5. It works exactly as I expect: it replaces all the digits, including those in the extension. How do I make it NOT replace the digits in the extension? I tried numerous regex strings, but I am beginning to think it is an all-or-nothing pattern, so I can't do this. Do I need to separate the extension from the string, or can I use a regex?
This one does it for me:
(?<!\.[0-9a-z]*)[0-9]
This does a negative lookbehind (the string must not occur before the matched string) on a period, followed by zero or more alphanumeric characters. This ensures only numbers are matched that are not in your extension.
Obviously, the [0-9a-z] must be replaced by whichever characters you expect in your extension.
I don't think you can do that with a single regular expression.
Probably best to split the original string into base and extension; do the replace on the base; then join them back up.
Yes, I think you'd be better off separating the extension.
If you are sure there is always a 3-character extension at the end of your string, the easiest, most readable/maintainable solution would be to only perform the replace on
yourString.Substring(0, yourString.Length - 4)
and then append
yourString.Substring(yourString.Length - 4, 4)
Why not run the regex on the substring?
String filename = "aa514a3a.4s5";
String nameonly = filename.Substring(0,filename.Length-4);
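A variant of the split-and-rejoin idea that does not assume a 3-character extension could use the System.IO.Path helpers instead of Substring. This is a sketch only: the names `DigitMasker` and `MaskDigitsInName` are hypothetical, and it assumes the input is a bare file name without a directory part.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class DigitMasker
{
    public static string MaskDigitsInName(string fileName)
    {
        // Split off the extension (Path.GetExtension keeps the leading dot,
        // or returns an empty string when there is no extension).
        string name = Path.GetFileNameWithoutExtension(fileName);
        string ext = Path.GetExtension(fileName);

        // Replace digits in the base name only, then join the parts back up.
        return Regex.Replace(name, "[0-9]", "#") + ext;
    }
}
```

For the file name from the question, `MaskDigitsInName("aa514a3a.4s5")` should give `aa###a#a.4s5`.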
