Regex - C# - Get non matching part of string

The regex pattern I wrote below is matching the string before "FinalFolder".
How can I get the folder name (in this case "FinalFolder") just after the string matching the regex?
EDIT: Pretty sure I got my Regex wrong. My intent was to match up to "C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF" and then find the folder after that. So, in this case, the folder I am looking for is "FinalFolder".
[TestMethod]
public void TestRegex()
{
    string pattern = @"[A-Za-z:]\\[A-Za-z]{1,}\\[A-Za-z]{1,}\\[A-Za-z0-9]{1,}\\[A-Za-z0-9]{1,}\\[A-Za-z0-9._s]{1,}\\[A-Za-z]{1,}\\[A-Za-z]{1,}";
    string textToMatch = @"C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF\FinalFolder\Subfolder\Test.txt";
    string[] matches = Regex.Split(textToMatch, pattern);
    Console.WriteLine(matches[0]);
}

There are plenty of other hints and advice that will lead you to getting the desired folder and I recommend considering them. But since it looks like you would still benefit from learning more regex skills, here is the answer you asked for: Getting non-matching part of string.
Let's imagine that your Regex actually matched the given path, for instance a pattern like: [A-Za-z]:\\[A-Za-z]+\\[A-Za-z]+\\[A-Za-z0-9]+\\[A-Za-z0-9]+\\[A-Za-z0-9._\s]+\\[A-Za-z]+\\[A-Za-z]+
You could get the matched string, its position and length, then determine where in the original source string the next folder name would start. But then you would also need to determine where the next folder name ends.
MatchCollection matches = Regex.Matches(textToMatch, pattern);
if (matches.Count > 0 ) {
Match m = matches[0];
var remaining = textToMatch.Substring(m.Index + m.Length);
//Now find the next backslash and grab the leftmost part...
}
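One possible way to finish the step left as a comment in the snippet above is sketched here. The class name PathTail and the method name NextFolder are hypothetical, and the sketch assumes the pattern stops exactly at a folder boundary, so the remainder starts with a backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathTail
{
    // Returns the folder name that follows the part of 'path' matched by 'pattern',
    // or null when the pattern does not match at all.
    public static string NextFolder(string path, string pattern)
    {
        Match m = Regex.Match(path, pattern);
        if (!m.Success)
            return null;

        // Everything after the match, without the separator that follows it.
        string remaining = path.Substring(m.Index + m.Length).TrimStart('\\');

        // The next folder name ends at the next backslash, or at the end of the string.
        int end = remaining.IndexOf('\\');
        return end < 0 ? remaining : remaining.Substring(0, end);
    }
}
```

Called with the corrected pattern and the path from the question, PathTail.NextFolder(textToMatch, pattern) should yield "FinalFolder".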
That answers your most general question, but that approach defeats the entire utility of using regex. Instead, just extend your pattern to match the next folder!
Regex patterns already provide the ability to capture certain portions of a match. The default regex construct for capturing text is a pair of parentheses. Even better, .NET regex supports named capture groups using (?<name>...).
//using System.Text.RegularExpressions;
string pattern = @"(?<start>"
    + @"[A-Za-z]:\\[A-Za-z]+\\[A-Za-z]+\\[A-Za-z0-9]+\\[A-Za-z0-9]+\\[A-Za-z0-9._\s]+\\[A-Za-z]+\\[A-Za-z]+"
    + @")\\(?<next>[A-Za-z0-9._\s]+)(\\|$)";
string textToMatch = @"C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF\FinalFolder\Subfolder\Test.txt";
MatchCollection matches = Regex.Matches(textToMatch, pattern);
if (matches.Count > 0) {
    var nextFolderName = matches[0].Groups["next"];
    Console.WriteLine(nextFolderName);
}

As posted in a comment, your regex seems to be matching the entire string. But in this particular case, since you are dealing with a filename, I would use FileInfo.
FileInfo fi = new FileInfo(textToMatch);
Console.WriteLine(fi.DirectoryName);
Console.WriteLine(fi.Directory.Name);
DirectoryName will be the full path of the containing directory, while Directory.Name will be just the name of that directory (here, "Subfolder").

So, using FileInfo, something like this?
(new FileInfo(textToMatch)).Directory.Parent.Name

Related

Extract groups with regex and construct URL in a single line

I am currently trying to extract values from a string and construct a URL that includes those values. I went through a dozen regex questions, but I am not quite satisfied with the answers.
I have custom-encoded strings that carry more than one piece of information, and I want to construct a new URL that contains those pieces.
For example 35afe06d-8393-4559-b6d7-74d35ce131d8|Master should become http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master. My first assumption was
var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";
var pattern = @"((?:[a-f0-9]+-?){5})|(\w+)";
var replacement = "http://my-server/media/guid/$1?v=$2";
var output = Regex.Replace(input, pattern, replacement);
However, this replaces each group with the full URL. The constraint is that my code knows nothing about input, pattern, replacement or output. pattern and replacement are two config values, and I don't want to turn them into x pairs of config values. input comes from somewhere else in the application and could use any custom encoding (pipe, colon, ...). output depends on the use case. The pattern can have any number of groups, and the result doesn't even have to be a URL in the end.
I can think of different ways to do this, like parsing the string myself, or trying to create a replacement dictionary, or using regex to find the groups and then string replace for $1 => match.Groups[0]. I just feel like there must be an obvious 1-liner solution for that in .NET since I even remember doing that in PHP.
Answer: It's not a .NET limitation, it was simply the unescaped pipe.

In your pattern (([a-f0-9]+-?){5})|\w+ the second group should be capturing the word characters after the pipe (escape the pipe to match it literally).
If you repeat this part ([a-f0-9]+-?) 5 times, the match could also end on a hyphen.
To match the values separated by the dash, you could match the character class [a-f0-9]+ and repeat matching that {4} times prepended by a -
([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)
var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";
var pattern = @"([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)";
var replacement = "http://my-server/media/guid/$1?v=$2";
var output = Regex.Replace(input, pattern, replacement);
Console.WriteLine(output);
Result
http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master
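Since the question treats pattern and replacement as two opaque config values, named groups may make such a pair easier to read and maintain. The following is only a sketch: the class ConfiguredRewrite, its constants, and the group names id and version are hypothetical, while the ${name} substitution syntax is standard in .NET replacement strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfiguredRewrite
{
    // Example config values (hypothetical); any pattern/replacement pair could be used.
    public const string PipePattern = @"^(?<id>[a-f0-9]+(?:-[a-f0-9]+){4})\|(?<version>\w+)$";
    public const string UrlReplacement = "http://my-server/media/guid/${id}?v=${version}";

    // Applies one configured pattern/replacement pair to the input.
    // When the pattern does not match, Regex.Replace returns the input unchanged.
    public static string Rewrite(string input, string pattern, string replacement)
    {
        return Regex.Replace(input, pattern, replacement);
    }
}
```

With these values, ConfiguredRewrite.Rewrite("35afe06d-8393-4559-b6d7-74d35ce131d8|Master", ConfiguredRewrite.PipePattern, ConfiguredRewrite.UrlReplacement) should produce the URL shown in the result above. A colon-separated encoding would only need a different pattern value.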

This expression might also work here:
^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$";
        // .NET replacement strings use $1, $2; a \1 would be emitted literally.
        string substitution = @"http://my-server/media/guid/$1?v=$2";
        string input = @"35afe06d-8393-4559-b6d7-74d35ce131d8|Master
35afe06d-8393-4559-b6d7-74d35ce131d8| Master ";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result);
    }
}
Reference
Searching for UUIDs in text with regex

Regex to first match, then replace found matches

In my C# program I am using Regular expressions to:
Loop through a list of possible words in need of replacing.
For each word, to find out if a string I am given has any matches.
If it does, I perform some (slightly costly) logic to create the replacement.
I then perform the actual replacement.
My current code looks roughly as follows:
string toSearchInside; // The actual string I'm going to be replacing within
List<string> searchStrings; // The list of words to look for via regex
string pattern = @"([:#?]{0})";
string replacement;
foreach (string toMatch in searchStrings)
{
    var regex = new Regex(
        string.Format(pattern, toMatch),
        RegexOptions.IgnoreCase
    );
    var matches = regex.Matches(toSearchInside);
    if (matches.Count == 0)
        continue;
    replacement = CreateReplacement(toMatch);
    toSearchInside = regex.Replace(toSearchInside, replacement);
}
And I can get this working, but it seems somewhat inefficient in that it is using the regex engine twice: once to find the matches (regex.Matches()) and once to do the replacing (regex.Replace()). I was wondering if there was a way to simply say "replace the matches you already found"?

You could keep all the matches from the first pass. Each match carries its index, so you can iterate through the matches and do the replacement directly in the string, which may be more efficient than running a second regex replace.
I would measure the performance with a small unit test, though (having NCrunch running in the background makes that faster).
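Another option, not mentioned in the answer above, is the Regex.Replace overload that takes a MatchEvaluator: the engine scans the input once, and the delegate only runs when something actually matched. The sketch below assumes the `[:#?]` prefix class exactly as printed in the question; CreateReplacement here is a hypothetical stand-in for the costly logic, and Regex.Escape is an added precaution for words containing regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePassReplace
{
    // Counts how often the (hypothetical) costly logic runs.
    public static int CreateCalls = 0;

    // Stand-in for the question's CreateReplacement.
    static string CreateReplacement(string word)
    {
        CreateCalls++;
        return "<" + word.ToUpperInvariant() + ">";
    }

    public static string ReplaceAll(string text, IEnumerable<string> searchStrings)
    {
        foreach (string toMatch in searchStrings)
        {
            var regex = new Regex(
                string.Format(@"([:#?]{0})", Regex.Escape(toMatch)),
                RegexOptions.IgnoreCase);

            // Built lazily on the first match and reused for later matches of the same word.
            string replacement = null;
            text = regex.Replace(text, m => replacement ?? (replacement = CreateReplacement(toMatch)));
        }
        return text;
    }
}
```

One difference to keep in mind: the string returned by a MatchEvaluator is inserted literally, so `$1`-style substitutions are not expanded as they would be with a replacement string.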

Regex accepting all strings, wrongly

I am trying to extract substrings if they have a certain format. The substring format is [CENAOD(xyz)]. I have written the following code, but when I run it in a loop it reports that every string matches, which is wrong. What have I done wrong?
string strRegex = @"(\[CENAOD\((\S|\W)*\)\])*";
string strCenaOd = sReader["intro"].ToString();
if (Regex.IsMatch(strCenaOd, strRegex, RegexOptions.IgnoreCase))
{
    // want to read the content of ( ), i.e. xyz in the example
}

Remove the outer ( ... )*.
That says no match is a good match too.
Or use + instead of *.
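The effect of the outer ( ... )* can be checked directly: a group repeated zero or more times also matches the empty string, so IsMatch succeeds on any input. A small sketch follows; the class and method names are hypothetical, and the tightened pattern uses [^)]* for the content between the parentheses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CenaOdDemo
{
    // The pattern from the question: the trailing * lets it match "nothing".
    const string Loose = @"(\[CENAOD\((\S|\W)*\)\])*";

    // Outer group and * removed; the capture group holds the content of ( ).
    const string Strict = @"\[CENAOD\(([^)]*)\)\]";

    public static bool LooseIsMatch(string text)
    {
        return Regex.IsMatch(text, Loose, RegexOptions.IgnoreCase);
    }

    // Returns the text between the round brackets, or null when the tag is absent.
    public static string ExtractContent(string text)
    {
        Match m = Regex.Match(text, Strict, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Here LooseIsMatch("no tag here") is true even though the tag is absent, while ExtractContent returns null for that input and "xyz" for "intro [CENAOD(xyz)] text".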

Adding to @Kent's and @leppie's answers, the code surrounding the regex needs work, too. I think this is what you were trying for:
string strRegex = @"\[CENAOD\(([^)]*)\)\]";
string strCenaOd = sReader["intro"].ToString();
Match m = Regex.Match(strCenaOd, strRegex, RegexOptions.IgnoreCase);
if (m.Success)
{
    string content = m.Groups[1].Value;
    // ...
}
IsMatch() is a simple yes-or-no check; it doesn't provide any way to retrieve the matched text.
I especially want to comment on (\S|\W)*, from your regex. First, \S|\W is a very inefficient way to match any character. . is usually all you need, but as Kent pointed out, [^)] (i.e., any character except )) is more appropriate in this case. Also, by placing the * outside the round brackets, you'll only ever capture the last character. ([^)]*) captures all of them.

if you said "all strings", how about:
\[CENAOD\([^\)]*\)\]

How to extract a URL from a 200 character string of words in C#, preferably using RegExp

I'd like to implement a RegExp (regular expression) that can check a string to see if it contains "http://" (i.e. it contains a URL), and then take that whole URL into a new string variable. The string I am using is not HTML, it is simply text with any arrangement of words, characters, numbers and URLs.
I'd imagine I'd look for a mention of "http://" within my string, and take a new string whose starting point is http:// and the end of the string is the next whitespace point just after the full URL.
PLEASE HELP, I've looked high and low for this to no avail!
Thanks in advance,
Alex

I've answered something like this before. I think that code could be adapted to suit your needs; it loads a text file and searches it for URLs.
using (StreamReader reader = new StreamReader(File.OpenRead("c:\\test.txt")))
{
    string content = reader.ReadToEnd();
    string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
    MatchCollection matches = Regex.Matches(content, pattern);
    foreach (Match match in matches)
    {
        GroupCollection groups = match.Groups;
        Console.WriteLine("'{0}' repeated at position {1}",
            groups[0].Value, groups[0].Index);
    }
}
hope this helps, regards
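For the narrower rule described in the question (start at "http://" and stop at the next whitespace), a much shorter pattern may be enough. This is a sketch under that assumption; the class and method names are hypothetical, and accepting https as well is an addition the question did not ask for.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Collects every run of non-whitespace characters that starts with http:// or https://.
    public static List<string> ExtractUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(text, @"https?://\S+"))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

Because \S+ runs up to the next whitespace, punctuation that directly follows a URL (a trailing comma or full stop) ends up in the match and may need trimming afterwards.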

How can I find a string after a specific string/character using regex

I am hopeless with regex (c#) so I would appreciate some help:
Basically I need to parse a text and I need to find the following information inside the text:
Sample text:
KeywordB: TextToFind the rest is not relevant but KeywordB: Text ToFindB and then some more text.
I need to find the word(s) after a certain keyword which may end with a “:”.
[UPDATE]
Thanks Andrew and Alan: Sorry for reopening the question, but there is quite an important thing missing in that regex. As I wrote in my last comment, is it possible to have a variable (how many words to look for, depending on the keyword) as part of the regex?
Or: I could have a different regex for each keyword (there will only be a handful). But I still don't know how to have the "words to look for" constant inside the regex.

The basic regex is this:
var pattern = @"KeywordB:\s*(\w*)";
\s* = any number of spaces
\w* = 0 or more word characters (non-space, basically)
() = make a group, so you can extract the part that matched
var pattern = @"KeywordB:\s*(\w*)";
var test = @"KeywordB: TextToFind";
var match = Regex.Match(test, pattern);
if (match.Success) {
    Console.Write("Value found = {0}", match.Groups[1]);
}
If you have more than one of these on a line, you can use this:
var test = @"KeywordB: TextToFind KeyWordF: MoreText";
var matches = Regex.Matches(test, @"(?:\s*(?<key>\w*):\s?(?<value>\w*))");
foreach (Match f in matches) {
    Console.WriteLine("Keyword '{0}' = '{1}'", f.Groups["key"], f.Groups["value"]);
}
Also, check out the regex designer here: http://www.radsoftware.com.au/. It is free, and I use it constantly. It works great to prototype expressions. You need to rearrange the UI for basic work, but after that it's easy.
(fyi) The "@" before strings means that \ no longer means something special, so you can type @"c:\fun.txt" instead of "c:\\fun.txt".

Let me know if I should delete the old post, but perhaps someone wants to read it.
The way to do a "words to look for" inside the regex is like this:
regex = @"(Key1|Key2|Key3|LastName|FirstName|Etc):"
What you are doing probably isn't worth the effort in a regex, though it can probably be done the way you want (still not 100% clear on requirements, though). It involves looking ahead to the next match, and stopping at that point.
Here is a re-write as a regex + regular functional code that should do the trick. It doesn't care about spaces, so if you ask for "Key2" like below, it will separate it from the value.
string[] keys = {"Key1", "Key2", "Key3"};
string source = "Key1:Value1Key2: ValueAnd A: To Test Key3: Something";
FindKeys(keys, source);
private void FindKeys(IEnumerable<string> keywords, string source) {
    var found = new Dictionary<string, string>(10);
    var keys = string.Join("|", keywords.ToArray());
    var matches = Regex.Matches(source, @"(?<key>" + keys + "):",
        RegexOptions.IgnoreCase);
    foreach (Match m in matches) {
        var key = m.Groups["key"].ToString();
        var start = m.Index + m.Length;
        var nx = m.NextMatch();
        var end = (nx.Success ? nx.Index : source.Length);
        found.Add(key, source.Substring(start, end - start));
    }
    foreach (var n in found) {
        Console.WriteLine("Key={0}, Value={1}", n.Key, n.Value);
    }
}
And the output from this is:
Key=Key1, Value=Value1
Key=Key2, Value= ValueAnd A: To Test
Key=Key3, Value= Something
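The pure-regex alternative mentioned earlier in this answer (looking ahead to the next keyword and stopping there) might look roughly like the sketch below. The class name KeywordLookahead is hypothetical, Regex.Escape is an added precaution, and a list of pairs is returned instead of printing so that the result can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordLookahead
{
    public static List<KeyValuePair<string, string>> FindKeys(IEnumerable<string> keywords, string source)
    {
        // Alternation of the keywords, e.g. Key1|Key2|Key3.
        string keys = string.Join("|", keywords.Select(k => Regex.Escape(k)).ToArray());

        // The value is matched lazily and stops right before the next "keyword:" or at the end of the input.
        string pattern = "(?<key>" + keys + "):(?<value>.*?)(?=(?:" + keys + "):|$)";

        var found = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(source, pattern,
                     RegexOptions.IgnoreCase | RegexOptions.Singleline))
        {
            found.Add(new KeyValuePair<string, string>(
                m.Groups["key"].Value, m.Groups["value"].Value));
        }
        return found;
    }
}
```

For the sample source string above, this should return the same three key/value pairs as the NextMatch-based version, including the surrounding spaces in the values.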

/KeywordB\: (\w+)/
This matches the word that comes right after your keyword. As you didn't mention any terminator, I assumed that you wanted only the word next to the keyword.
