Interesting situation I have here. I have some files in a folder that all have a very explicit string in the first line that I always know will be there. What I want to do is really just append |DATA_SOURCE_KEY right after AVAILABLE_IND
//regex to search for the bb_course_*.bbd files
string courseRegex = @"BB_COURSES_([C][E][Q]|[F][A]|[H][S]|[S][1]|[S][2]|[S][P])\d{1,6}.bbd";
string courseHeaderRegex = @"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND";
//get files from the directory specified in the GetFiles parameter and return the ones that match the regex
var matches = Directory.GetFiles(@"c:\courseFolder\").Where(path => Regex.Match(path, courseRegex).Success);
//prints the files returned
foreach (string file in matches)
{
Console.WriteLine(file);
File.WriteAllText(file, Regex.Replace(File.ReadAllText(file), courseHeaderRegex, "EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY"));
}
But this code takes the original occurrence of the matching regex, replaces it with my replacement value, and then does it 3 more times.
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
And I can't figure out why with breakpoints. My loop is running only 12 times to match the # of files I have in the directory. My only guess is that File.WriteAllText is somehow recursively searching itself after replacing the text and re-replacing. If that makes sense. Any ideas? Is it because courseHeaderRegex is so explicit?
If I change courseHeaderRegex to string courseHeaderRegex = @"AVAILABLE_IND";
then I get the correct changes in my files
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
I'd just like to understand why the original way doesn't work.
I think your problem is that you need to escape the | character in courseHeaderRegex:
string courseHeaderRegex = @"EXTERNAL_COURSE_KEY\|COURSE_ID\|COURSE_NAME\|AVAILABLE_IND";
The | character is the alternation operator, so the unescaped pattern matches any one of 'EXTERNAL_COURSE_KEY', 'COURSE_ID', 'COURSE_NAME' and 'AVAILABLE_IND'. Each of those four matches is replaced with your substitution string, which is why the header appears four times.
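As a minimal sketch of the same fix, Regex.Escape can build the escaped pattern from the literal header instead of hand-escaping each |. The class and method names here (HeaderFix, AppendColumn, CountMatches) are made up for illustration and are not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class HeaderFix
{
    // The literal header text from the question.
    public const string Header = "EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND";

    // Hypothetical helper: appends "|DATA_SOURCE_KEY" to each occurrence of the header.
    public static string AppendColumn(string text)
    {
        // Regex.Escape turns every '|' into '\|', so the header is matched as one literal string.
        return Regex.Replace(text, Regex.Escape(Header), Header + "|DATA_SOURCE_KEY");
    }

    // Hypothetical helper: counts how many times a pattern matches the header line.
    public static int CountMatches(string pattern)
    {
        return Regex.Matches(Header, pattern).Count;
    }
}
```

Calling HeaderFix.CountMatches(HeaderFix.Header) counts the matches of the unescaped pattern (one per alternative), while HeaderFix.CountMatches(Regex.Escape(HeaderFix.Header)) counts the matches of the escaped one.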
What about
string newString = File.ReadAllText(file)
.Replace(@"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND", @"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY");
just using a simple String.Replace()
Related
The regex pattern I wrote below is matching the string before "FinalFolder".
How can I get the folder name (in this case "FinalFolder") just after the string matching the regex?
EDIT: Pretty sure I got my regex wrong. My intent was to match up to "C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF" and then find the folder after that. So, in this case, the folder I am looking for is "FinalFolder".
[TestMethod]
public void TestRegex()
{
string pattern = @"[A-Za-z:]\\[A-Za-z]{1,}\\[A-Za-z]{1,}\\[A-Za-z0-9]{1,}\\[A-Za-z0-9]{1,}\\[A-Za-z0-9._s]{1,}\\[A-Za-z]{1,}\\[A-Za-z]{1,}";
string textToMatch = @"C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF\FinalFolder\Subfolder\Test.txt";
string[] matches = Regex.Split(textToMatch, pattern);
Console.WriteLine(matches[0]);
}
There are plenty of other hints and advice that will lead you to getting the desired folder and I recommend considering them. But since it looks like you would still benefit from learning more regex skills, here is the answer you asked for: Getting non-matching part of string.
Let's imagine that your Regex actually matched the given path, for instance a pattern like: [A-Za-z]:\\[A-Za-z]+\\[A-Za-z]+\\[A-Za-z0-9]+\\[A-Za-z0-9]+\\[A-Za-z0-9._\s]+\\[A-Za-z]+\\[A-Za-z]+
You could get the matched string, its position and length, then determine where in the original source string the next folder name would start. But then you would also need to determine where the next folder name ends.
MatchCollection matches = Regex.Matches(textToMatch, pattern);
if (matches.Count > 0 ) {
Match m = matches[0];
var remaining = textToMatch.Substring(m.Index + m.Length);
//Now find the next backslash and grab the leftmost part...
}
That answers your most general question, but that approach defeats the entire utility of using regex. Instead, just extend your pattern to match the next folder!
Regex patterns already provide the ability to capture certain portions of a match. The default regex construct for capturing text is a pair of parentheses. Even better, .NET regex supports named capture groups using (?<name>...).
//using System.Text.RegularExpressions;
string pattern = @"(?<start>"
+ @"[A-Za-z]:\\[A-Za-z]+\\[A-Za-z]+\\[A-Za-z0-9]+\\[A-Za-z0-9]+\\[A-Za-z0-9._\s]+\\[A-Za-z]+\\[A-Za-z]+"
+ @")\\(?<next>[A-Za-z0-9._\s]+)(\\|$)";
string textToMatch = @"C:\FolderA\FolderB\FolderC\FolderD\Test 1.0\FolderE\FolderF\FinalFolder\Subfolder\Test.txt";
MatchCollection matches = Regex.Matches(textToMatch, pattern);
if (matches.Count > 0 ) {
var nextFolderName = matches[0].Groups["next"];
Console.WriteLine(nextFolderName);
}
As posted in a comment, your regex seems to be matching the entire string. But in this particular case, since you are dealing with a filename, I would use FileInfo.
FileInfo fi = new FileInfo(textToMatch);
Console.WriteLine(fi.DirectoryName);
Console.WriteLine(fi.Directory.Name);
DirectoryName will be the full path, while Directory.Name will be just the subfolder in question.
So, using FileInfo, something like this?
(new FileInfo(textToMatch)).Directory.Parent.Name
I have a string:
1/45 files checked
I want to parse the numbers (1 and 45) out of it, but first, to check if a string matches this pattern at all. So I write a regex:
String line = "1/45 files checked";
Match filesProgressMatch = Regex.Match(line, @"[0-9]+/[0-9]+ files checked");
if (filesProgressMatch.Success)
{
String matched = filesProgressMatch.Groups[1].Value.Replace(" files checked", "");
string[] numbers = matched.Split('/');
filesChecked = Convert.ToInt32(numbers[0]);
totalFiles = Convert.ToInt32(numbers[1]);
}
I expected matched to contain "1/45", but it is, in fact, empty. What's my mistake?
My first thought was '/' is a special character in a regex, but that doesn't seem to be the case.
P. S. Is there a better way to parse these values from such string in C#?
Your regex is matching, but you are selecting Groups[1] where the count of groups is one. So use
String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");
And you should be fine
Try this regex, which wraps the numbers in a capture group so that Groups[1] is populated. Escaping the forward slash is harmless but not required in .NET:
([0-9]+\/[0-9]+) files checked
Use a capture group:
Regex.Match(line, @"([0-9]+/[0-9]+) files checked");
// the parentheses around [0-9]+/[0-9]+ form capture group 1
You could also use 2 groups:
Regex.Match(line, @"([0-9]+)/([0-9]+) files checked");
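A rough sketch of how the two-group version could replace the Replace/Split parsing from the question. The ProgressParser class and TryParse method are invented names, and anchoring the pattern with ^ and $ is an assumption that the whole line must match.

```csharp
using System.Text.RegularExpressions;

public static class ProgressParser
{
    // Hypothetical helper: returns true and fills both counters when the line matches.
    public static bool TryParse(string line, out int filesChecked, out int totalFiles)
    {
        filesChecked = 0;
        totalFiles = 0;

        // Assumption: the entire line must have the "N/M files checked" shape.
        Match m = Regex.Match(line, @"^([0-9]+)/([0-9]+) files checked$");
        if (!m.Success)
        {
            return false;
        }

        // Groups[0] is the whole match; the capture groups start at index 1.
        return int.TryParse(m.Groups[1].Value, out filesChecked)
            && int.TryParse(m.Groups[2].Value, out totalFiles);
    }
}
```

With this shape the match check and the number extraction happen in one step, so no string splitting is needed afterwards.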
Applying the replace operation to the first element of filesProgressMatch.Groups seems to work.
String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");
This should give you your results
string txtText = @"1/45 files checked";
int[] s = System.Text.RegularExpressions.Regex.Split(txtText, @"\D+").Where(x => !string.IsNullOrEmpty(x)).Select(x => Convert.ToInt32(x)).ToArray();
I am trying to replace a bunch of strings in files. The strings are stored in a datatable along with the new string value.
string contents = File.ReadAllText(file);
foreach (DataRow dr in FolderRenames.Rows)
{
contents = Regex.Replace(contents, dr["find"].ToString(), dr["replace"].ToString());
File.SetAttributes(file, FileAttributes.Normal);
File.WriteAllText(file, contents);
}
The strings look like this _-uUa, -_uU, _-Ha etc.
The problem that I am having is when for example this string "_uU" will also overwrite "_-uUa" so the replacement would look like "newvaluea"
Is there a way to tell regex to look at the next character after the found string and make sure it is not an alphanumeric character?
I hope it is clear what I am trying to do here.
Here is some sample data:
private function _-0iX(arg1:flash.events.Event):void
{
if (arg1.type == flash.events.Event.RESIZE)
{
if (this._-2GU)
{
this._-yu(this._-2GU);
}
}
return;
}
The next characters could be ;, (, ), dot, comma, space, :, etc.
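First of all, you should use Regex.Escape on the search string. In the replacement string only the dollar sign is special, so escape that by doubling it rather than calling Regex.Escape.
You can then use
contents = Regex.Replace(
    contents,
    Regex.Escape(dr["find"].ToString()) + @"(?![a-zA-Z0-9])",
    dr["replace"].ToString().Replace("$", "$$"));
or, if every search string starts and ends with a word character,
contents = Regex.Replace(
    contents,
    @"\b" + Regex.Escape(dr["find"].ToString()) + @"\b",
    dr["replace"].ToString().Replace("$", "$$"));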
I think this is what you're looking for:
contents = Regex.Replace(
contents,
string.Format(#"(?<!\w){0}(?!\w)", Regex.Escape(dr["find"].ToString())),
dr["replace"].ToString().Replace("$", "$$")
);
You can't use \b because your search strings don't always start and end with word characters. Instead, I used (?<!\w) and (?!\w) to make sure the matched substring is not immediately preceded or followed by a word character (i.e., a letter, a digit, or an underscore). I don't know the complete specs for your search strings, so this pattern might need some tweaking.
None of the sample patterns you provided contain regex metacharacters, but like the other responders, I used Regex.Escape() to render it safe anyway. In the replacement string the only character you have to watch out for is the dollar sign, and the way to escape that is with another dollar sign. Notice that I used String.Replace() for that instead of Regex.Replace().
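To make the lookaround approach concrete, here is a small self-contained sketch. It assumes the find/replace pairs have been copied out of the DataTable into a dictionary; the IdentifierRenamer class and Apply method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IdentifierRenamer
{
    // Hypothetical helper: applies every find/replace pair to the text,
    // matching a search string only when no word character touches it on either side.
    public static string Apply(string contents, IDictionary<string, string> renames)
    {
        foreach (KeyValuePair<string, string> pair in renames)
        {
            // Escape the search string so it is treated as literal text.
            string pattern = @"(?<!\w)" + Regex.Escape(pair.Key) + @"(?!\w)";

            // In a replacement string only '$' is special; double it to keep it literal.
            string replacement = pair.Value.Replace("$", "$$");

            contents = Regex.Replace(contents, pattern, replacement);
        }
        return contents;
    }
}
```

Because of the (?!\w) lookahead, a short search string such as _-2G does not match inside the longer identifier _-2GU, so the order of the pairs does not matter for that kind of overlap.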
There are two tricks that can help you here:
Order all the search strings by length and replace the longest ones first; that way a shorter string can't accidentally match inside a longer one.
Use a MatchEvaluator: instead of looping through all your rows, search for all replacement patterns in the string and look each match up in your dataset.
Option one is simple; option two would look like this:
contents = Regex.Replace(contents, @"_-\w+", ReplaceIdentifier);
public string ReplaceIdentifier(Match m)
{
    // Look up the matched identifier; Rows.Find requires a primary key on "find"
    DataRow row = FolderRenames.Rows.Find(m.Value);
    if (row != null) return row["replace"].ToString();
    else return m.Value; // no rename defined, keep the original text
}
I'm trying to create a new file path in regex, in order to move some files. Say I have the path:
c:\Users\User\Documents\document.txt
And I want to convert it to:
c:\Users\User\document.txt
Is there an easy way to do this in regex?
If all you need is to remove the last folder name from the file path then I think it would be easier to use built-in FileInfo, DirectoryInfo and Path.Combine instead of regular expressions here:
var fileInfo = new FileInfo(@"c:\Users\User\Documents\document.txt");
if (fileInfo.Directory.Parent != null)
{
// this will give you "c:\Users\User\document.txt"
var newPath = Path.Combine(fileInfo.Directory.Parent.FullName, fileInfo.Name);
}
else
{
// there is no parent folder
}
One way in Perl regex flavour. It removes last directory in the path:
s/[^\\]+\\([^\\]*)$/$1/
Explanation:
s/.../.../ # Substitute command.
[^\\]+ # Any chars until '\'
\\ # A back-slash.
([^\\]*) # Any chars until '\'
$ # End-of-line (zero-width)
$1 # Substitute all characters matched in previous expression with expression between parentheses.
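For reference, the same substitution can be written with .NET's Regex.Replace, where $1 in the replacement string also refers to the first capture group. This is only a sketch; PathRegex and RemoveLastDirectory are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PathRegex
{
    // Hypothetical helper: drops the last directory from a backslash-separated path.
    public static string RemoveLastDirectory(string path)
    {
        // [^\\]+\\    the last directory name and the backslash after it
        // ([^\\]*)$   the file name, captured as group 1
        return Regex.Replace(path, @"[^\\]+\\([^\\]*)$", "$1");
    }
}
```

A path that contains no backslash at all is returned unchanged, because the pattern finds nothing to replace.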
You can give this a try, although it is Java code:
String original_path = "c:\\Users\\User\\Documents\\document.txt";
String temp_path = original_path.substring(0,original_path.lastIndexOf("\\"));
String temp_path_1 = temp_path.substring(0,temp_path.lastIndexOf("\\"));
String temp_path_2 = original_path.substring(original_path.lastIndexOf("\\")+1,original_path.length());
System.out.println(temp_path_1 +"\\" + temp_path_2);
You mentioned that the transformation is the same every time, so plain string manipulation is enough here; it is not always good practice to rely on regular expressions for things that string operations can do.
Why not some combination of pathStr.Split('\\'), Take(length - 2), and String.Join?
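One possible reading of that suggestion, sketched with LINQ. The PathSplit class and its method name are invented for illustration, and returning the input unchanged when there is no folder to drop is an assumption about the desired behaviour.

```csharp
using System.Linq;

public static class PathSplit
{
    // Hypothetical helper: removes the last directory by splitting on backslashes.
    public static string RemoveLastDirectory(string path)
    {
        string[] parts = path.Split('\\');

        // Assumption: with fewer than three parts there is no folder to drop.
        if (parts.Length < 3)
        {
            return path;
        }

        // Keep everything except the last directory, then re-attach the file name.
        var kept = parts.Take(parts.Length - 2)
                        .Concat(new[] { parts[parts.Length - 1] });
        return string.Join("\\", kept);
    }
}
```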
Use the Regex.Replace method: find what you are looking for, then replace it with nothing (string.Empty). Here is the C# code:
string directory = @"c:\Users\User\Documents\document.txt";
string pattern = @"(Documents\\)";
Console.WriteLine( Regex.Replace(directory, pattern, string.Empty ) );
// Outputs
// c:\Users\User\document.txt
I am not good at regex. Can someone help me write one for this?
I may have values like this while reading csv file.
"Artist,Name",Album,12-SCS
"val""u,e1",value2,value3
Output:
Artist,Name
Album
12-SCS
val"u,e1
value2
value3
Update:
I like the idea of using the OLE DB provider. We have a file upload control on the web page, and I read the content of the file using a StreamReader without actually saving the file to the file system. Is there any way I can use the OLE DB provider? We need to specify the file name in the connection string, and in my case the file is not saved on the file system.
Just adding the solution I worked on this morning.
var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");
foreach (Match m in regex.Matches("<-- input line -->"))
{
var s = m.Value;
}
As you can see, you need to call regex.Matches() per line. It will then return a MatchCollection with the same number of items you have as columns. The Value property of each match is, obviously, the parsed value.
This is still a work in progress, but it happily parses CSV strings like:
2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
Actually, it's pretty easy to match CSV lines with a regex. Try this one out:
StringCollection resultList = new StringCollection();
try {
Regex pattern = new Regex(@"
# Parse CSV line. Capture next value in named group: 'val'
\s* # Ignore leading whitespace.
(?: # Group of value alternatives.
"" # Either a double quoted string,
(?<val> # Capture contents between quotes.
[^""]*(""""[^""]*)* # Zero or more non-quotes, allowing
) # doubled "" quotes within string.
""\s* # Ignore whitespace following quote.
| (?<val>[^,]*) # Or... zero or more non-commas.
) # End value alternatives group.
(?:,|$) # Match end is comma or EOS",
RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
Match matchResult = pattern.Match(subjectString);
while (matchResult.Success) {
resultList.Add(matchResult.Groups["val"].Value);
matchResult = matchResult.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Disclaimer: The regex has been tested in RegexBuddy, (which generated this snippet), and it correctly matches the OP test data, but the C# code logic is untested. (I don't have access to C# tools.)
Regex is not the suitable tool for this. Use a CSV parser. Either the builtin one or a 3rd party one.
Give the TextFieldParser class a look. It's in the Microsoft.VisualBasic assembly and does delimited and fixed width parsing.
Give CsvHelper a try (a library I maintain). It's available via NuGet.
You can easily read a CSV file into a custom class collection. It's also very fast.
var streamReader = // Create a StreamReader to your CSV file
var csvReader = new CsvReader( streamReader );
var myObjects = csvReader.GetRecords<MyObject>();
Regex might get overly complex here. Split the line on commas, and then iterate over the resultant bits and concatenate them where "the number of double quotes in the concatenated string" is not even.
"hello,this",is,"a ""test"""
...split...
"hello | this" | is | "a ""test"""
...iterate and merge 'til you've an even number of double quotes...
"hello,this" - even number of quotes (note comma removed by split inserted between bits)
is - even number of quotes
"a ""test""" - even number of quotes
...then strip off the leading and trailing quote if present and replace "" with ".
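The split-and-merge steps above can be sketched roughly as follows. CsvLineSplitter and its methods are hypothetical names, the sketch handles a single line only (no embedded line breaks), and keeping a leftover piece as-is when the quotes are unbalanced is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvLineSplitter
{
    // Hypothetical helper: splits one CSV line by merging comma-separated bits
    // until each merged piece contains an even number of double quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        string pending = null;

        foreach (string bit in line.Split(','))
        {
            // Re-insert the comma that Split removed when continuing a quoted field.
            pending = pending == null ? bit : pending + "," + bit;

            // An even number of quotes means the field is complete.
            if (pending.Count(c => c == '"') % 2 == 0)
            {
                fields.Add(Unquote(pending));
                pending = null;
            }
        }

        // Assumption: a leftover piece (unbalanced quote) is kept as-is.
        if (pending != null)
        {
            fields.Add(pending);
        }
        return fields;
    }

    // Strips one leading and one trailing quote if both are present, then un-doubles quotes.
    private static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
        {
            field = field.Substring(1, field.Length - 2);
        }
        return field.Replace("\"\"", "\"");
    }
}
```

This keeps the logic easy to follow, but a dedicated CSV parser is still the safer choice for input that spans multiple lines or uses other delimiters.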
This can be done with the code below:
using System.IO;
using Microsoft.VisualBasic.FileIO;
// Quotes inside a quoted field are doubled, as CSV requires
string csv = "1,2,3,\"4,3\",\"a,\"\"b\"\",c\",end";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
//To read from file
//TextFieldParser parser = new TextFieldParser("csvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields =null;
while (!parser.EndOfData)
{
fields = parser.ReadFields();
}
parser.Close();