Create new file path using regex - c#

I'm trying to create a new file path in regex, in order to move some files. Say I have the path:
c:\Users\User\Documents\document.txt
And I want to convert it to:
c:\Users\User\document.txt
Is there an easy way to do this in regex?

If all you need is to remove the last folder name from the file path then I think it would be easier to use built-in FileInfo, DirectoryInfo and Path.Combine instead of regular expressions here:
var fileInfo = new FileInfo(@"c:\Users\User\Documents\document.txt");
if (fileInfo.Directory.Parent != null)
{
    // this will give you "c:\Users\User\document.txt"
    var newPath = Path.Combine(fileInfo.Directory.Parent.FullName, fileInfo.Name);
}
else
{
    // there is no parent folder
}

One way, in Perl regex flavour. It removes the last directory in the path:
s/[^\\]+\\([^\\]*)$/$1/
Explanation:
s/.../.../ # Substitute command.
[^\\]+ # One or more characters that are not '\' (the last directory name).
\\ # A backslash.
([^\\]*) # Captured: any characters that are not '\' (the file name).
$ # End of line (zero-width).
$1 # Replace the whole match with the text captured between the parentheses.
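Since the question is tagged C#, here is a rough .NET equivalent of the substitution above. It is only a sketch: RemoveLastDirectory is a name made up for illustration, and it is written with top-level statements, so it assumes a reasonably recent C# compiler.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer).
// "[^\\]+\\" matches the last directory name plus its separator;
// "([^\\]*)$" captures the file name, which "$1" puts back.
static string RemoveLastDirectory(string path) =>
    Regex.Replace(path, @"[^\\]+\\([^\\]*)$", "$1");

// prints c:\Users\User\document.txt
Console.WriteLine(RemoveLastDirectory(@"c:\Users\User\Documents\document.txt"));
```

A path without any backslash does not match the pattern and comes back unchanged.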

You can give this a try, although it is Java code:
String original_path = "c:\\Users\\User\\Documents\\document.txt";
String temp_path = original_path.substring(0,original_path.lastIndexOf("\\"));
String temp_path_1 = temp_path.substring(0,temp_path.lastIndexOf("\\"));
String temp_path_2 = original_path.substring(original_path.lastIndexOf("\\")+1,original_path.length());
System.out.println(temp_path_1 +"\\" + temp_path_2);
You mentioned that the transformation is the same every time, so plain string manipulation is enough here. It is not always good practice to rely on regex for things that simple string operations can do.

Why not some combination of pathStr.Split('\\'), Take(length - 2), and String.Join?
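For what it's worth, that suggestion could look roughly like the sketch below. RemoveLastFolder is a hypothetical name, the file name has to be appended back after Take, and the code assumes backslash-separated paths written as top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical helper: drops the second-to-last segment (the last folder).
static string RemoveLastFolder(string pathStr)
{
    string[] parts = pathStr.Split('\\');
    if (parts.Length < 3)
    {
        return pathStr; // no folder between the root and the file name
    }
    // Keep everything before the last folder, then re-append the file name.
    var kept = parts.Take(parts.Length - 2)
                    .Concat(new[] { parts[parts.Length - 1] });
    return string.Join("\\", kept);
}

// prints c:\Users\User\document.txt
Console.WriteLine(RemoveLastFolder(@"c:\Users\User\Documents\document.txt"));
```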

Use the Regex.Replace method. Find what you are looking for, then replace it with nothing (string.Empty). Here is the C# code:
string directory = @"c:\Users\User\Documents\document.txt";
string pattern = @"(Documents\\)";
Console.WriteLine(Regex.Replace(directory, pattern, string.Empty));
// Outputs
// c:\Users\User\document.txt

Related

Regex - C# - Get non matching part of string

The regex pattern I wrote below is matching the string before "FinalFolder".
How can I get the folder name (in this case "FinalFolder") just after the string matching the regex?
EDIT: Pretty sure I got my regex wrong. My intent was to match up to "C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF" and then find the folder after that. So, in this case, the folder I am looking for is "FinalFolder".
[TestMethod]
public void TestRegex()
{
    string pattern = @"[A-Za-z:]\\[A-Za-z]{1,}\\[A-Za-z]{1,}\\[A-Za-z0-9]{1,}\\[A-Za-z0-9]{1,}\\[A-Za-z0-9._s]{1,}\\[A-Za-z]{1,}\\[A-Za-z]{1,}";
    string textToMatch = @"C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF\FinalFolder\Subfolder\Test.txt";
    string[] matches = Regex.Split(textToMatch, pattern);
    Console.WriteLine(matches[0]);
}
There are plenty of other hints and advice that will lead you to getting the desired folder and I recommend considering them. But since it looks like you would still benefit from learning more regex skills, here is the answer you asked for: Getting non-matching part of string.
Let's imagine that your Regex actually matched the given path, for instance a pattern like: [A-Za-z]:\\[A-Za-z]+\\[A-Za-z]+\\[A-Za-z0-9]+\\[A-Za-z0-9]+\\[A-Za-z0-9._\s]+\\[A-Za-z]+\\[A-Za-z]+
You could get the matched string, its position and length, then determine where in the original source string the next folder name would start. But then you would also need to determine where the next folder name ends.
MatchCollection matches = Regex.Matches(textToMatch, pattern);
if (matches.Count > 0)
{
    Match m = matches[0];
    var remaining = textToMatch.Substring(m.Index + m.Length);
    // Now find the next backslash and grab the leftmost part...
}
That answers your most general question, but that approach defeats the entire utility of using regex. Instead, just extend your pattern to match the next folder!
Regex patterns already provide the ability to capture certain portions of a match. The default regex construct for capturing text is a set of parentheses. Even better, .NET regex supports named capture groups using (?<name>...).
//using System.Text.RegularExpressions;
string pattern = @"(?<start>"
    + @"[A-Za-z]:\\[A-Za-z]+\\[A-Za-z]+\\[A-Za-z0-9]+\\[A-Za-z0-9]+\\[A-Za-z0-9._\s]+\\[A-Za-z]+\\[A-Za-z]+"
    + @")\\(?<next>[A-Za-z0-9._\s]+)(\\|$)";
string textToMatch = @"C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF\FinalFolder\Subfolder\Test.txt";
MatchCollection matches = Regex.Matches(textToMatch, pattern);
if (matches.Count > 0)
{
    var nextFolderName = matches[0].Groups["next"];
    Console.WriteLine(nextFolderName);
}
As posted in a comment, your regex seems to be matching the entire string. But in this particular case, since you are dealing with a filename, I would use FileInfo.
FileInfo fi = new FileInfo(textToMatch);
Console.WriteLine(fi.DirectoryName);
Console.WriteLine(fi.Directory.Name);
DirectoryName will be the full path of the directory that contains the file, while Directory.Name will be just that directory's own name.
So, using FileInfo, something like this?
(new FileInfo(textToMatch)).Directory.Parent.Name

Simple regex matching issue, what's my mistake?

I have a string:
1/45 files checked
I want to parse the numbers (1 and 45) out of it, but first, to check if a string matches this pattern at all. So I write a regex:
String line = "1/45 files checked";
Match filesProgressMatch = Regex.Match(line, @"[0-9]+/[0-9]+ files checked");
if (filesProgressMatch.Success)
{
    String matched = filesProgressMatch.Groups[1].Value.Replace(" files checked", "");
    string[] numbers = matched.Split('/');
    filesChecked = Convert.ToInt32(numbers[0]);
    totalFiles = Convert.ToInt32(numbers[1]);
}
I expected matched to contain "1/45", but it is, in fact, empty. What's my mistake?
My first thought was '/' is a special character in a regex, but that doesn't seem to be the case.
P. S. Is there a better way to parse these values from such string in C#?
Your regex is matching, but you are reading Groups[1], and the pattern contains no capture group, so the only group is Groups[0] (the whole match). So use
String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");
And you should be fine
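Try this regex, which adds a capture group around the numbers. Escaping the forward slash is harmless but not required in .NET, where '/' has no special meaning: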
([0-9]+\/[0-9]+) files checked
Use a capture group:
Regex.Match(line, @"([0-9]+/[0-9]+) files checked");
// note the added parentheses around [0-9]+/[0-9]+
You could also use 2 groups:
Regex.Match(line, @"([0-9]+)/([0-9]+) files checked");
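To make the two-group variant concrete, here is a small sketch of reading both numbers straight from the groups, with no Replace or Split afterwards. The variable names follow the question; the rest is illustrative and uses top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "1/45 files checked";
int filesChecked = 0;
int totalFiles = 0;

// Groups[0] is the whole match; Groups[1] and Groups[2] are the two
// parenthesized numbers, in order of their opening parentheses.
Match m = Regex.Match(line, @"([0-9]+)/([0-9]+) files checked");
if (m.Success)
{
    filesChecked = int.Parse(m.Groups[1].Value);
    totalFiles = int.Parse(m.Groups[2].Value);
}

Console.WriteLine($"{filesChecked} of {totalFiles}"); // prints "1 of 45"
```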
Applying the replace operation to the first element of filesProgressMatch.Groups seems to work.
String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");
This should give you your results
string txtText = @"1\45 files matched";
int[] s = System.Text.RegularExpressions.Regex.Split(txtText, "[^\\d+]").Where(x => !string.IsNullOrEmpty(x)).Select(x => Convert.ToInt32(x)).ToArray();

Why does regex replace my feed string once + 3 times?

Interesting situation I have here. I have some files in a folder that all have a very explicit string in the first line that I always know will be there. What I want to do is really just append |DATA_SOURCE_KEY right after AVAILABLE_IND.
//regex to search for the bb_course_*.bbd files
string courseRegex = @"BB_COURSES_([C][E][Q]|[F][A]|[H][S]|[S][1]|[S][2]|[S][P])\d{1,6}.bbd";
string courseHeaderRegex = @"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND";
//gets files from the directory specified in the GetFiles parameter and keeps those that match the regex
var matches = Directory.GetFiles(@"c:\courseFolder\").Where(path => Regex.Match(path, courseRegex).Success);
//prints the files returned
foreach (string file in matches)
{
    Console.WriteLine(file);
    File.WriteAllText(file, Regex.Replace(File.ReadAllText(file), courseHeaderRegex, "EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY"));
}
But this code takes the original occurrence of the matching regex, replaces it with my replacement value, and then does it 3 more times.
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
And I can't figure out why with breakpoints. My loop is running only 12 times to match the # of files I have in the directory. My only guess is that File.WriteAllText is somehow recursively searching itself after replacing the text and re-replacing. If that makes sense. Any ideas? Is it because courseHeaderRegex is so explicit?
If I change courseHeaderRegex to string courseHeaderRegex = @"AVAILABLE_IND";
then I get the correct changes in my files
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
I'd just like to understand why the original way doesn't work.
I think your problem is that you need to escape the | character in courseHeaderRegex:
string courseHeaderRegex = @"EXTERNAL_COURSE_KEY\|COURSE_ID\|COURSE_NAME\|AVAILABLE_IND";
The character | is the alternation operator, so the unescaped pattern matches 'EXTERNAL_COURSE_KEY', 'COURSE_ID', 'COURSE_NAME' and 'AVAILABLE_IND' separately, replacing each of them with your substitution string.
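A quick way to see the alternation at work is to count the matches with and without escaping. The sketch below uses Regex.Escape instead of hand-written backslashes; the variable names are illustrative and the header string is taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string header = "EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND";
string replacement = header + "|DATA_SOURCE_KEY";

// Unescaped: '|' means "or", so the pattern matches each name separately.
int unescapedMatches = Regex.Matches(header, header).Count;

// Escaped: Regex.Escape turns every '|' into a literal "\|",
// so the whole header is matched exactly once.
string escapedPattern = Regex.Escape(header);
int escapedMatches = Regex.Matches(header, escapedPattern).Count;
string fixedHeader = Regex.Replace(header, escapedPattern, replacement);

Console.WriteLine(unescapedMatches); // prints 4
Console.WriteLine(escapedMatches);   // prints 1
Console.WriteLine(fixedHeader);
```

Four matches, each replaced by the full new header, is exactly the fourfold output described in the question.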
What about
string newString = File.ReadAllText(file)
    .Replace(@"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND", @"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY");
just using a simple String.Replace()?

Regular expression to remove part of a string

I am kind of new to regex. I have a string, e.g.
String test = @"c:\test\testing";
Now what I would like to accomplish is removing all words up to "\". So in this case the word being removed is "testing". However, this word may be different every time.
So basically, remove everything until the first \ is found.
Any ideas?
You mean remove backwards, until the first \ is found?
You could easily do this without regexes:
var lastIndex = myString.LastIndexOf('\\');
if (lastIndex != -1)
{
    myString = myString.Substring(0, lastIndex + 1); // keep the '\\' you found
}
But if you're really just trying to get the directory component of a path, you can use this:
var directoryOfPath = System.IO.Path.GetDirectoryName(fullPath);
Although IIRC that method call will strip the trailing backslash.
You can use the following regex pattern:
(?!\\)([^\\]*)$
Do a replace on this pattern with the empty string, as shown below:
var re = new Regex(@"(?!\\)([^\\]*)$");
var result = re.Replace(@"c:\test\testing", string.Empty);
Console.WriteLine(result);
However, consider using the System.IO namespace, specifically the Path class, instead of Regex.
Try this
\\\w+$ and replace it with \
Or you can use the following approach
(?<=\\)\w+$ In this case you just replace the match with an empty string.
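Both variants can be tried side by side. This is just a sketch with illustrative variable names, using the sample string from the question and top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string test = @"c:\test\testing";

// Variant 1: match the backslash too, then put one back.
// In a .NET replacement string only '$' is special, so @"\" is one literal '\'.
string viaReplace = Regex.Replace(test, @"\\\w+$", @"\");

// Variant 2: the lookbehind (?<=\\) requires a preceding backslash
// without consuming it, so the match can be replaced with nothing.
string viaLookbehind = Regex.Replace(test, @"(?<=\\)\w+$", string.Empty);

Console.WriteLine(viaReplace);    // prints c:\test\
Console.WriteLine(viaLookbehind); // prints c:\test\
```

Note that \w only covers letters, digits and underscore, so a last segment containing spaces or dots would need a wider class such as [^\\].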
Regex.Replace(str, @"^.*?\\", "");
I prefer to use the DirectoryInfo for this, or even a substring action.
DirectoryInfo dir = new DirectoryInfo(@"c:\test\testing");
String dirName = dir.Name;
You can do this without regex:
String test = @"c:\test\testing";
int lastIndex = test.LastIndexOf('\\');
test = test.Remove(0, lastIndex >= 0 ? lastIndex : 0);
If you want to "remove" or manipulate file paths, you can skip the Regex class altogether and use the Path class from System.IO. This class will give you suitable methods for all your needs in changing or extracting file names.

regex - Replace all dots,special characters except for the file extension

I want a regex that replaces the special characters and dots (.) in a filename with an underscore (_), leaving the file extension untouched.
Please help me with a regex.
Try this:
([!@#$%^&*()]|(?:[.](?![a-z0-9]+$)))
with the case-insensitive flag "i". Replace with '_'.
The first set of characters can be customised, or maybe use \W (any non-word character).
This reads as:
replace with '_' wherever I match any character of this set, or a period that is not followed by only letters or digits and then the end of the line.
Sample C# code:
var newstr = new Regex("([!@#$%^&*()]|(?:[.](?![a-z0-9]+$)))", RegexOptions.IgnoreCase)
    .Replace(myPath, "_");
Since you only care about the extension, forget about the rest of the filename. Write a regex to scrape off the extension, discarding the original filename, and then glue that extension onto the new filename.
This regular expression will match the extension, including the dot.: \.[^.]*$
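A sketch of that idea, under some assumptions: Sanitize is a hypothetical helper name, and "special character" is taken here to mean anything outside A-Z, a-z, 0-9 and underscore, which you would adjust to your own list.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces special characters in the name part with '_'
// and leaves the extension (final dot and what follows) untouched.
static string Sanitize(string fileName)
{
    // \.[^.]*$ matches the last dot and everything after it.
    Match ext = Regex.Match(fileName, @"\.[^.]*$");
    string extension = ext.Success ? ext.Value : string.Empty;
    string name = ext.Success ? fileName.Substring(0, ext.Index) : fileName;

    // Assumed definition of "special": anything outside [A-Za-z0-9_].
    return Regex.Replace(name, @"[^A-Za-z0-9_]", "_") + extension;
}

// prints my_file__v1_2__.txt
Console.WriteLine(Sanitize("my.file (v1.2)!.txt"));
```

A name with no dot at all has no extension to protect, so the whole string is sanitized.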
Perhaps just take off the extension first and put it back on after? Something like (but add your own list of special characters):
static readonly Regex removeChars = new Regex("[-. ]", RegexOptions.Compiled);
static void Main() {
    string path = "ab c.-def.ghi";
    string ext = Path.GetExtension(path);
    path = Path.ChangeExtension(
        removeChars.Replace(Path.ChangeExtension(path, null), "_"), ext);
}
Once you separate the file extension out from your string would this then get you the rest of the way?
