I want a regex that replaces special characters, dots (.) and so on in a filename with an underscore (_), but leaves the file extension untouched.
Can you help me with a regex?
Try this:
([!@#$%^&*()]|(?:[.](?![a-z0-9]+$)))
with the case-insensitive flag "i", and replace each match with '_'.
The first set of characters can be customised, or you could use \W (any non-word character) instead.
This reads as:
replace with '_' wherever I match any character in this set, or a period that is not followed by only letters or digits up to the end of the line.
Sample C# code:
var newstr = new Regex("([!@#$%^&*()]|(?:[.](?![a-z0-9]+$)))", RegexOptions.IgnoreCase)
.Replace(myPath, "_");
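To see what the pattern above does to a few names, here is a minimal sketch. The class name FileNameSanitizer, the method Sanitize and the sample file names are made up for illustration, and the special-character set is the one from the answer (adjust it to your needs).

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameSanitizer
{
    // Matches one of the listed special characters, or a dot that is NOT
    // the start of a trailing alphanumeric extension.
    private static readonly Regex Unsafe = new Regex(
        @"[!@#$%^&*()]|\.(?![a-z0-9]+$)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: replaces every match with an underscore.
    public static string Sanitize(string fileName)
    {
        return Unsafe.Replace(fileName, "_");
    }

    public static void Main()
    {
        Console.WriteLine(Sanitize("my.report(v2).final.txt")); // my_report_v2__final.txt
        Console.WriteLine(Sanitize("archive.tar.gz"));          // archive_tar.gz
    }
}
```

Note that only the last dot survives, so a double extension such as .tar.gz is reduced to .gz.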
Since you only care about the extension, forget about the rest of the filename. Write a regex to scrape off the extension, discarding the original filename, and then glue that extension onto the new filename.
This regular expression will match the extension, including the dot: \.[^.]*$
Perhaps just take off the extension first and put it back on after? Something like (but add your own list of special characters):
static readonly Regex removeChars = new Regex("[-. ]", RegexOptions.Compiled);
static void Main() {
string path = "ab c.-def.ghi";
string ext = Path.GetExtension(path);
path = Path.ChangeExtension(
removeChars.Replace(Path.ChangeExtension(path, null), "_"), ext);
}
Once you separate the file extension out from your string would this then get you the rest of the way?
I am trying to extract all of the text (shown as xxxx) in the following pattern:
Session["xxxx"]
using C#.
This may also be Request.Querystring["xxxx"], so I am trying to build the expression dynamically. When I do so, I get all sorts of problems about unescaped characters or no matches :(
an example might be:
string patternstart = "Session[";
string patternend = "]";
string regexexpr = @"\\" + patternstart + @"(.*?)\\" + patternend ;
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, @regexexpr);
Can anyone help with this as I am stumped (as I always seem to be with RegEx :) )
With a few small modifications to your code:
string patternstart = Regex.Escape("Session[");
string patternend = Regex.Escape("]");
string regexexpr = patternstart + @"(.*?)" + patternend;
The pattern you construct in your example looks something like this:
\\Session[(.*?)\\]
There are a couple of problems with this. First, it requires a literal backslash immediately before Session. Second, it puts the entire (.*?) inside a character class, which means it will match any single open parenthesis, period, asterisk, question mark, close parenthesis or backslash. You need to escape the opening bracket in your pattern if you want to match a literal [.
You could use a pattern like this:
Session\[(.*?)]
For example:
string regexexpr = @"Session\[(.*?)]";
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, @regexexpr);
Console.WriteLine(matches[0].Groups[1].Value); // "xxxx"
The characters [ and ] have a special meaning in regular expressions: they define a character class, where exactly one of the contained characters must match. To work around this, simply 'escape' them with a leading \ character (in C#, use a verbatim string so the backslash reaches the regex engine):
string patternstart = @"Session\[";
string patternend = @"\]";
An example "final string" could then be:
Session\["(.*)"\]
However, you could easily write your regex to handle Session, Querystring, etc. automatically if you require (without also matching every other array you throw at it), and avoid having to build up the string in the first place:
(Querystring|Session|Form)\["(.*)"\]
and then take the second capture group.
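As a rough sketch of how that combined pattern could be used, the snippet below collects every key. The class name CollectionKeyFinder, the method FindKeys and the sample text are made up; it also swaps the greedy (.*) for [^"]* (my assumption, so that two accesses on the same line are not merged into one match) and keeps the collection names spelled as above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CollectionKeyFinder
{
    // Group 1: the collection name. Group 2: the key between the quotes.
    // In a verbatim string, "" stands for one literal double quote.
    private static readonly Regex KeyAccess = new Regex(
        @"(Querystring|Session|Form)\[""([^""]*)""\]");

    // Hypothetical helper: returns the key of every match, in order.
    public static List<string> FindKeys(string text)
    {
        var keys = new List<string>();
        foreach (Match m in KeyAccess.Matches(text))
        {
            keys.Add(m.Groups[2].Value);
        }
        return keys;
    }

    public static void Main()
    {
        string text = "a = Session[\"user\"]; b = Request.Querystring[\"page\"];";
        foreach (string key in FindKeys(text))
        {
            Console.WriteLine(key); // user, then page
        }
    }
}
```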
Interesting situation I have here. I have some files in a folder that all have a very explicit string in the first line that I always know will be there. What I want to do is really just append |DATA_SOURCE_KEY right after AVAILABLE_IND.
//regex to search for the bb_course_*.bbd files
string courseRegex = @"BB_COURSES_([C][E][Q]|[F][A]|[H][S]|[S][1]|[S][2]|[S][P])\d{1,6}.bbd";
string courseHeaderRegex = @"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND";
//get files from the directory specified in the GetFiles parameter and return the ones that match the regex
var matches = Directory.GetFiles(@"c:\courseFolder\").Where(path => Regex.Match(path, courseRegex).Success);
//prints the files returned
foreach (string file in matches)
{
Console.WriteLine(file);
File.WriteAllText(file, Regex.Replace(File.ReadAllText(file), courseHeaderRegex, "EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY"));
}
But this code takes the original occurrence of the matching regex, replaces it with my replacement value, and then does it 3 more times.
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
And I can't figure out why with breakpoints. My loop is running only 12 times to match the # of files I have in the directory. My only guess is that File.WriteAllText is somehow recursively searching itself after replacing the text and re-replacing. If that makes sense. Any ideas? Is it because courseHeaderRegex is so explicit?
If I change courseHeaderRegex to string courseHeaderRegex = @"AVAILABLE_IND";
then I get the correct changes in my files
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
I'd just like to understand why the original way doesn't work.
I think your problem is that you need to escape the | character in courseHeaderRegex:
string courseHeaderRegex = @"EXTERNAL_COURSE_KEY\|COURSE_ID\|COURSE_NAME\|AVAILABLE_IND";
The | character is the alternation operator, so the unescaped pattern matches 'EXTERNAL_COURSE_KEY', 'COURSE_ID', 'COURSE_NAME' and 'AVAILABLE_IND' separately, replacing each of them with your substitution string. That is why the replacement text appears four times.
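If you would rather not escape each | by hand, Regex.Escape can do it for you. The following is only a sketch: the class name HeaderPatcher, the method AppendDataSourceKey and the sample input are made up, and the header text is taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderPatcher
{
    private const string Header =
        "EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND";

    // Hypothetical helper: appends |DATA_SOURCE_KEY to the header.
    public static string AppendDataSourceKey(string contents)
    {
        // Regex.Escape turns each '|' into '\|', so the header can only
        // match as a whole instead of as four alternatives.
        return Regex.Replace(contents, Regex.Escape(Header), Header + "|DATA_SOURCE_KEY");
    }

    public static void Main()
    {
        Console.WriteLine(AppendDataSourceKey(Header + "\nrow data"));
    }
}
```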
What about
string newString = File.ReadAllText(file)
.Replace(@"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND", @"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY");
just using a simple String.Replace()?
I am trying to replace a bunch of strings in files. The strings are stored in a datatable along with the new string value.
string contents = File.ReadAllText(file);
foreach (DataRow dr in FolderRenames.Rows)
{
contents = Regex.Replace(contents, dr["find"].ToString(), dr["replace"].ToString());
File.SetAttributes(file, FileAttributes.Normal);
File.WriteAllText(file, contents);
}
The strings look like this: _-uUa, _-uU, _-Ha, etc.
The problem I am having is that, for example, the string "_-uU" will also overwrite part of "_-uUa", so the result would look like "newvaluea".
Is there a way to tell regex to look at the next character after the found string and make sure it is not an alphanumeric character?
I hope it is clear what I am trying to do here.
Here is some sample data:
private function _-0iX(arg1:flash.events.Event):void
{
if (arg1.type == flash.events.Event.RESIZE)
{
if (this._-2GU)
{
this._-yu(this._-2GU);
}
}
return;
}
The next characters could be ;, (, ), dot, comma, space, :, etc.
First of all, you should use Regex.Escape on the search string. It is meant for patterns only, so in the replacement string just double any dollar signs.
You can then use
contents = Regex.Replace(
contents,
Regex.Escape(dr["find"].ToString()) + @"(?![a-zA-Z])",
dr["replace"].ToString().Replace("$", "$$"));
or even better
contents = Regex.Replace(
contents,
@"\b" + Regex.Escape(dr["find"].ToString()) + @"\b",
dr["replace"].ToString().Replace("$", "$$"));
I think this is what you're looking for:
contents = Regex.Replace(
contents,
string.Format(@"(?<!\w){0}(?!\w)", Regex.Escape(dr["find"].ToString())),
dr["replace"].ToString().Replace("$", "$$")
);
You can't use \b because your search strings don't always start and end with word characters. Instead, I used (?<!\w) and (?!\w) to make sure the matched substring is not immediately preceded or followed by a word character (i.e., a letter, a digit, or an underscore). I don't know the complete specs for your search strings, so this pattern might need some tweaking.
None of the sample patterns you provided contain regex metacharacters, but like the other responders, I used Regex.Escape() to render it safe anyway. In the replacement string the only character you have to watch out for is the dollar sign, and the way to escape that is with another dollar sign. Notice that I used String.Replace() for that instead of Regex.Replace().
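Here is a small self-contained sketch of the lookaround approach, so you can try it without a DataTable. The class name WholeTokenReplacer, the method ReplaceWhole and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeTokenReplacer
{
    // Hypothetical helper: replaces `find` only where it is not directly
    // preceded or followed by a word character (letter, digit, underscore).
    public static string ReplaceWhole(string contents, string find, string replacement)
    {
        string pattern = @"(?<!\w)" + Regex.Escape(find) + @"(?!\w)";
        // Double every '$' so the replacement is inserted literally.
        return Regex.Replace(contents, pattern, replacement.Replace("$", "$$"));
    }

    public static void Main()
    {
        // Only the first occurrence is a complete identifier.
        Console.WriteLine(ReplaceWhole("this._-2GU; this._-2GUa;", "_-2GU", "name"));
        // prints: this.name; this._-2GUa;
    }
}
```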
There are two tricks that can help you here:
1. Order all the search strings by length and replace the longest ones first; that way a shorter string won't accidentally replace part of a longer one.
2. Use a MatchEvaluator: instead of looping through all your rows, search for every identifier in the string and look each one up in your dataset.
Option one is simple; option two would look like this:
contents = Regex.Replace(contents, @"_-\w+", ReplaceIdentifier);
public string ReplaceIdentifier(Match m)
{
// Requires a primary key on the "find" column
DataRow row = FolderRenames.Rows.Find(m.Value);
if (row != null) return row["replace"].ToString();
else return m.Value;
}
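The longest-first idea could be sketched roughly as follows. For brevity it assumes the find/replace pairs have been copied from the DataTable into a dictionary; the class name LongestFirstReplacer, the method ReplaceAll and the sample values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LongestFirstReplacer
{
    // Hypothetical helper: applies every rename, longest search string first.
    public static string ReplaceAll(string contents, IDictionary<string, string> renames)
    {
        // Longest keys first, so "_-uUa" is handled before "_-uU".
        foreach (var pair in renames.OrderByDescending(p => p.Key.Length))
        {
            contents = contents.Replace(pair.Key, pair.Value);
        }
        return contents;
    }

    public static void Main()
    {
        var renames = new Dictionary<string, string>
        {
            { "_-uU", "shortName" },
            { "_-uUa", "longName" }
        };
        Console.WriteLine(ReplaceAll("this._-uUa(this._-uU);", renames));
        // prints: this.longName(this.shortName);
    }
}
```

This only protects identifiers that are themselves in the table; a short search string can still match inside a longer identifier that has no row of its own, and a replacement value that happens to contain a search string would be rewritten by a later pass.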
I have a string. An example is given below.
[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1
How can I extract the values of the path properties from the above string? There may be many properties other than the path properties.
The result I am expecting is
url1
url2
url3
url4
I think a regular expression is the best way to do this. Any ideas regarding the regular expression needed? How about using the string.Split method? Which one is more efficient?
Thanks in advance
Well, this regex works in your particular example:
path\d?=(.+?)\\r\\n
What isn't immediately obvious is if \r\n in your strings are literally the characters \r\n, or a carriage return + new line. The regex above matches those characters literally. If your text is actually this:
[playlist]
path1=url1
path2=url2
path=url3
path4=url4
count=1
Then this regex will work (the optional \r keeps a carriage return out of the captured value):
path\d?=(.+?)\r?\n
And a quick example of how to use that in C#:
var str = @"[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1";
var matches = Regex.Matches(str, @"path\d?=(.+?)\\r\\n");
foreach (Match match in matches)
{
var path = match.Groups[1].Value;
Console.WriteLine(path);
}
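Since the question also asks about string.Split, here is a rough sketch of that route. It assumes the text contains real line breaks rather than the literal characters \r\n, and it treats a key as a path property when it is "path" followed only by digits (my reading of the sample). The class name PlaylistParser and the method GetPathValues are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlaylistParser
{
    // Hypothetical helper: returns the value of every pathN=value line.
    public static List<string> GetPathValues(string playlist)
    {
        var values = new List<string>();
        string[] lines = playlist.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            int eq = line.IndexOf('=');
            if (eq < 0) continue; // e.g. the [playlist] header

            string key = line.Substring(0, eq);
            // Accept "path", "path1", "path42", ... but not "pathology".
            if (key.StartsWith("path", StringComparison.Ordinal)
                && key.Substring(4).All(char.IsDigit))
            {
                values.Add(line.Substring(eq + 1));
            }
        }
        return values;
    }

    public static void Main()
    {
        string text = "[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1";
        foreach (string url in GetPathValues(text))
        {
            Console.WriteLine(url); // url1, url2, url3, url4
        }
    }
}
```

As for efficiency, I would measure before deciding; for a short playlist either approach should be fine.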
I'm trying to create a new file path in regex, in order to move some files. Say I have the path:
c:\Users\User\Documents\document.txt
And I want to convert it to:
c:\Users\User\document.txt
Is there an easy way to do this in regex?
If all you need is to remove the last folder name from the file path then I think it would be easier to use built-in FileInfo, DirectoryInfo and Path.Combine instead of regular expressions here:
var fileInfo = new FileInfo(@"c:\Users\User\Documents\document.txt");
if (fileInfo.Directory.Parent != null)
{
// this will give you "c:\Users\User\document.txt"
var newPath = Path.Combine(fileInfo.Directory.Parent.FullName, fileInfo.Name);
}
else
{
// there is no parent folder
}
One way, in the Perl regex flavour. It removes the last directory in the path:
s/[^\\]+\\([^\\]*)$/$1/
Explanation:
s/.../.../ # Substitute command.
[^\\]+ # Any chars until '\' (the last directory name)
\\ # A back-slash.
([^\\]*) # Any chars other than '\' (the file name), captured as $1
$ # End-of-line (zero-width)
$1 # Replace the whole match with the captured file name.
You can give this a try, although it is Java code:
String original_path = "c:\\Users\\User\\Documents\\document.txt";
String temp_path = original_path.substring(0,original_path.lastIndexOf("\\"));
String temp_path_1 = temp_path.substring(0,temp_path.lastIndexOf("\\"));
String temp_path_2 = original_path.substring(original_path.lastIndexOf("\\")+1,original_path.length());
System.out.println(temp_path_1 +"\\" + temp_path_2);
You mentioned that the transformation is the same every time, so a regexp is not necessarily the best tool; this can be done with plain String manipulation.
Why not some combination of pathStr.Split('\\'), Take(length - 2), and String.Join?
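A sketch of that idea, assuming backslash-separated Windows paths. The class name PathTrimmer and the method RemoveLastFolder are made up for illustration.

```csharp
using System;
using System.Linq;

public static class PathTrimmer
{
    // Hypothetical helper: drops the folder that directly contains the file.
    public static string RemoveLastFolder(string path)
    {
        string[] parts = path.Split('\\');
        if (parts.Length < 3)
        {
            return path; // no folder between the root and the file
        }

        // Everything before the last folder, followed by the file name.
        var kept = parts.Take(parts.Length - 2)
                        .Concat(new[] { parts[parts.Length - 1] });
        return string.Join("\\", kept);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveLastFolder(@"c:\Users\User\Documents\document.txt"));
        // prints: c:\Users\User\document.txt
    }
}
```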
Use the Regex.Replace method. Find what you are looking for, then replace it with nothing (string.Empty). Here is the C# code:
string directory = @"c:\Users\User\Documents\document.txt";
string pattern = @"(Documents\\)";
Console.WriteLine( Regex.Replace(directory, pattern, string.Empty ) );
// Outputs
// c:\Users\User\document.txt