The folder names are variable but I have this constant value in the directory - the "distributions" folder.
How can I extract everything in the path before the "distributions" folder?
> /<root>/win/<usr>/distributions/<dbms>/<repository>/<port type>/<remote system>/<port>
Currently I'm doing it in a lengthy way (e.g. getting the length of the whole directory string, finding the location of the word "distributions" in it, etc.).
I'm looking for a more elegant way. Could this be done using Regex, or a shorter version of my current implementation?
string.Split followed by TakeWhile can help you
var resultArray = str.Split(new []{@"/"},StringSplitOptions.RemoveEmptyEntries)
.TakeWhile(x=>!x.Equals("distributions"));
Output
<root>
win
<usr>
Update based on comments
If you need the entire path before "distributions", you can use
var result = str.Split(new []{@"distributions"},StringSplitOptions.RemoveEmptyEntries)
.First();
Output
/<root>/win/<usr>/
string.Split('/') will put each "component" of the path (or any string) in an array, splitting on the delimiter (/ here). You could then loop through it.
Assuming you do want to get the path up until that point, I would recommend using a regex. Here is how I would do it:
Regex regex = new Regex(@".+?(?=distributions)");
Debug.WriteLine(regex.Match("/<root>/win/<usr>/distributions/<dbms>/<repository>").Value);
this outputs
/<root>/win/<usr>/
What is the problem with the good old way?
var s = "/<root>/win/<usr>/distributions/<dbms>/<repository>/<port.....";
var result = s.Substring(0, s.IndexOf("distributions"));
Or use s.Substring(0, s.IndexOf("/distributions/")+1) if that text might also appear in another form...
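One thing to watch with the IndexOf approach: IndexOf returns -1 when the marker is missing, and Substring then throws. Here is a minimal sketch that guards against that. The helper name PrefixBefore and the choice to return null on "not found" are my own assumptions, not something from the question.

```csharp
using System;

public static class PathPrefix
{
    // Hypothetical helper: returns everything before the first "/<marker>/"
    // segment, including the trailing slash, or null if no such segment exists.
    public static string PrefixBefore(string path, string marker)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        if (marker == null) throw new ArgumentNullException(nameof(marker));

        // Look for the marker as a whole path segment, so a folder such as
        // "distributions-old" is not mistaken for it. Ordinal comparison
        // keeps the search independent of the current culture.
        int index = path.IndexOf("/" + marker + "/", StringComparison.Ordinal);

        // index points at the slash in front of the marker; keep that slash.
        return index < 0 ? null : path.Substring(0, index + 1);
    }
}
```

Calling PathPrefix.PrefixBefore(s, "distributions") on the path from the question should give /<root>/win/<usr>/. Note that this sketch only matches when the marker is followed by a slash, so a path that ends in "/distributions" returns null.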
Related
My goal is to find a file name ("MyFile.txt") inside a larger string. I.e.:
Some text before MyFile.txt some other text after
Currently I'm successfully using a Regular Expression with a character class of something like the following (simplified):
[\w\.\-]
This works fine, until the file contains other characters that are outside the \w group, e.g. an em dash: "My—File.txt".
My approach:
The method Path.GetInvalidPathChars returns an array of invalid characters. I've tried to use this method. Unfortunately, I found no way of "converting" this to be useful inside a Regular Expression.
I'm aware of
The SO posting "How to remove illegal characters from path and filenames?"
The concept of "Character class subtraction"
Still, I found no solution.
My question:
Is there any Regular Expression (or any other way) to find and extract a file name inside a larger string, based on the result of Path.GetInvalidPathChars?
I wouldn't use a regex for this at all, as it becomes incredibly complex and unreadable. In particular, a filename can be nearly any string, including most special characters, numbers and spaces. Even worse, there are files without a dot to separate an extension. So I'd suggest simply checking whether the string contains any of your invalid characters:
char[] invalidChars = Path.GetInvalidPathChars();
bool valid = !myString.Any(x => invalidChars.Contains(x));
Extracting the candidates instead is even simpler. The idea is to split your large string on all invalid characters. This means everything in between the invalid characters is considered a file-name, e.g:
"myTest.extension" → "myTest.extension"
"myFile:anotherFile" → "myFile"; "anotherFile"
"myFile with space" → "myFile with space"
"a File with .-determined extension.dot" → "a File with .-determined extension.dot"
This is achieved by this code:
var fileNames = myText.Split(invalidChars);
EDIT: If you really want a regex you can build one dynamically from your invalid characters:
var pattern = String.Format("([^{0}]*)", Regex.Escape(new String(invalidChars)));
var r = new Regex(pattern);
If your file name does not contain spaces and does have an extension, then this simple idea may help you:
string line = "Some text before MyFile.txt some other text after";
//If you look for path:
//var array = Path.GetInvalidPathChars().ToList();
//If you look for file name
var array = Path.GetInvalidFileNameChars().ToList();
array.Add(' ');
var potentialFileNames = line.Split(array.ToArray(), StringSplitOptions.RemoveEmptyEntries)
.Where(i => i.Contains('.')).ToList();
//potentialFileNames[0] = "MyFile.txt"
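If you do want the regex route, the character class can be generated from the invalid-character array instead of being typed by hand. This is only a sketch under the same assumptions as the answer above (no spaces in the name, an extension is present). The class and method names are made up for illustration, and the exact set of excluded characters depends on the platform, because Path.GetInvalidFileNameChars returns different values on Windows and Unix.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileNameFinder
{
    // Hypothetical helper: finds "name.ext"-like tokens in free text.
    public static string[] FindCandidates(string text)
    {
        // Characters that end a candidate: the platform's invalid file name
        // characters plus the space (the space is an assumption of this sketch).
        char[] excluded = Path.GetInvalidFileNameChars()
            .Concat(new[] { ' ' })
            .ToArray();

        // Write every character as \uXXXX so that none of them has a special
        // meaning inside the character class.
        string cls = string.Concat(
            excluded.Select(c => "\\u" + ((int)c).ToString("X4")));

        // One or more allowed characters, a dot, then one or more allowed
        // characters that are not dots (the extension).
        string pattern = "[^" + cls + "]+\\.[^" + cls + ".]+";

        return Regex.Matches(text, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

Because the class is "everything except the invalid characters", a name such as "My—File.txt" with an em dash is picked up as well, which the \w-based class from the question misses.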
I am trying to read a specific word from a text file. I know it's easy in general and I have done it, but I need to read it from within a line, i.e. if the file contains
WC|110916|F-12003||ZET5.4|27019570 then I need to pick "27019570", this specific word. I did it with Substring(26,8), splitting by character position, and it works, but not every line has the same size/length, so splitting by position is not a proper solution.
In short, I need to know how to check for the (|) character and its position in every line of the text file.
Thanks in Advance :)
You can split each line by the '|' character. It returns an array, and then you can select the desired index.
var textFromFile = "WC|110916|F-12003||ZET5.4|27019570";
var goalText = textFromFile.Split('|')[5];
If you're using .NET 3.5 or higher, it's easy using LINQ with File.ReadAllLines:
string fullFilePath = @"C:\ed\cc\filename.txt";
List<string> items = File.ReadAllLines(fullFilePath ).Select(line=>line.Split('|').Last()).ToList();
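Both snippets above assume every line has enough fields. If some lines are shorter, indexing with [5] throws an IndexOutOfRangeException. A small sketch that checks the index first is below. GetField is a hypothetical helper name, and returning null for a missing field is just one possible choice.

```csharp
using System;

public static class PipeFields
{
    // Hypothetical helper: returns the field at the given zero-based index
    // of a '|'-separated line, or null if the line has fewer fields.
    public static string GetField(string line, int index)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));

        // Empty fields are kept on purpose, so positions stay stable:
        // "F-12003||ZET5.4" yields an empty string between the two values.
        string[] fields = line.Split('|');

        return index >= 0 && index < fields.Length ? fields[index] : null;
    }
}
```

For the sample line, index 5 is "27019570" and index 3 is the empty field between "F-12003" and "ZET5.4". If the value you want is always the last one, Last() as in the LINQ answer avoids counting fields altogether.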
I have file name which look like
Directory\name-secondName-blabla.txt
If I use string.Split, my code needs to know the separator I am using,
but if some day I replace the separator, my code will break.
Is there any built-in way to split it to get the following result?
Directory
name
secondName
blabla
txt
Thanks
Edit My question is more general than just split file name, is splitting string in general
The best way to split a filename is to use System.IO.Path
You're not clear about what to do with directory1\directory2\ ,
but in general you should use this static class to find the path, name and suffix parts.
After that you will need String.Split() to handle the - separators, you'll just have to make the separator(s) a config setting.
You can make an array with separators:
string value = @"Directory\name-secondName-blabla.txt";
char[] delimiters = new char[] { '\\', '-', '.' };
string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
var filepath = #"Directory\name-secondName-blabla.txt";
var tokens = filepath.Split(new[]{'\\', '-'});
If you're worried about your separator token changing in the future, set it as a constant in a settings file so you only have to change it in one place. Or, if you think it is going to change regularly, put it in a config file so you don't have to release new builds every time.
As Henk suggested above, use System.IO.Path and its static methods like GetFileNameWithoutExtension, GetDirectoryName, etc. Have a look at this link:
http://msdn.microsoft.com/en-us/library/system.io.path.aspx
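To make the combination of System.IO.Path and a configurable separator concrete, here is a rough sketch. The class name, method name and the idea of passing the separators in as a parameter are assumptions for illustration. In a real application the separators would come from a settings or config file, as suggested above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileNameParts
{
    // Hypothetical helper: directory, name parts and extension as one list.
    // nameSeparators is the configurable part (e.g. read from settings).
    public static List<string> Split(string filePath, char[] nameSeparators)
    {
        if (filePath == null) throw new ArgumentNullException(nameof(filePath));

        var parts = new List<string>();

        // Path deals with the directory separator, so we never hard-code it.
        string directory = Path.GetDirectoryName(filePath);
        if (!string.IsNullOrEmpty(directory)) parts.Add(directory);

        // Only the bare file name is split on the configurable separators.
        string name = Path.GetFileNameWithoutExtension(filePath);
        parts.AddRange(name.Split(nameSeparators, StringSplitOptions.RemoveEmptyEntries));

        // GetExtension includes the leading dot, so strip it.
        string extension = Path.GetExtension(filePath);
        if (extension.Length > 0) parts.Add(extension.TrimStart('.'));

        return parts;
    }
}
```

With separators { '-' } and the path "Directory/name-secondName-blabla.txt" this yields Directory, name, secondName, blabla, txt. I used a forward slash in the example because Path treats it as a separator on both Windows and Unix, whereas a backslash is only a separator on Windows.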
How can I split a path by "\\"? It gives me a syntax error if I use
path.split("\\");
You should be using
path.Split(Path.DirectorySeparatorChar);
if you're trying to split a file path based on the native path separator.
Try path.Split('\\') --- so single quote (for character)
To use a string this works:
path.Split(new[] {"\\"}, StringSplitOptions.None)
To use a string you have to specify an array of strings. I never did get why :)
There's no string.Split overload which takes a string. (Also, C# is case-sensitive, so you need Split rather than split). However, you can use:
string[] bits = path.Split('\\');
which will use the overload taking a params char[] parameter. It's equivalent to:
string[] bits = path.Split(new char[] { '\\' });
That's assuming you definitely want to split by backslashes. You may want to split by the directory separator for the operating system you're running on, in which case Path.DirectorySeparatorChar would probably be the right approach... it will be / on Unix and \ on Windows. On the other hand, that wouldn't help you if you were trying to parse a Windows file system path in an ASP.NET page running on Unix. In other words, it depends on your context :)
Another alternative is to use the methods on Path and DirectoryInfo to get information about paths in more file-system-sensitive ways.
To be on the safe side, you could use:
path.Split(new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar });
On Windows, forward slashes are also accepted, both by the C# Path functions and on the command line (in Windows 7/XP at least).
e.g.:
Both of these produce the same results for me:
dir "C:/Python33/Lib/xml"
dir "C:\Python33\Lib\xml"
(In C:)
dir "Python33/Lib/xml"
dir "Python33\Lib\xml"
On Windows, neither '/' nor '\' is a valid character in a filename. On Linux, '\' is allowed in filenames, so you should be aware of this if parsing for both.
So if you wanted to support paths in both forms (like I do) you could do:
path.Split(new char[] {'/', '\\'});
On Linux it would probably be safer to use Path.DirectorySeparatorChar.
path.Split(new char[] { '\\' });
Better just use the existing class System.IO.Path, so you don't need to care for any system specifications.
It provides methods to access any part of a file path like GetFileName(string path) etc.
A complete solution could look like this:
//
private static readonly char[] pathSeps = new char[] {
Path.DirectorySeparatorChar,
Path.AltDirectorySeparatorChar,
Path.VolumeSeparatorChar,
};
//
///<summary>Split a path according to the file system rules.</summary>
public static string[] SplitPath( string path ) {
if ( null == path ) return null;
return path.Split( pathSeps, StringSplitOptions.RemoveEmptyEntries );
}
Some of the other proposed solutions in this article use the syntax:
path.Split(new char[] {'/', '\\'});
Although this will work, it has various disadvantages:
It does not allow your application to adapt to various target platforms. Currently, our applications basically run on UNIX-like and Windows operating systems (Windows, macOS, iOS, Linux variations), so there is a fixed set of path characters. But this might change if .NET is ported to other operating systems, so it is best to use the predefined constants.
Performance of the inline syntax is worse. This might not be of interest for a handful of files, but when working with millions of files there are noticeable differences. The managed memory will go up until next GC. When looking at the generated assembly code you will find "call CORINFO_HELP_NEWARR_1_VC" for each of the 'new' statements, even in Release mode. This happens whenever you new-up any array, because arrays are not immutable. My proposed solution prevents this by declaring the array as readonly and static.
Reusability of the inline syntax also is worse, because you might want to use the path separators array in other contexts.
StringSplitOptions.RemoveEmptyEntries should be used to account for UNC paths and possible typos within the incoming path. The operating systems do not allow duplicate path separators, but there might be a typo from the user or a duplicate concatenation of path separator characters, for example when concatenating the path and filename.
I have a bunch of files in a directory, mostly labeled something like...
PO1000000100.doc or .pdf or .txt
Some of them are PurchaseOrderPO1000000109.pdf
What I need to do is extract the PO1000000109 part of it. So basically PO with 10 digits after it...
How can I do this with a regex?
(What i'll do is a foreach loop on the files in the directory, get the filename, and run it through the regex to get the PO number...)
I'm using C# - not sure if this is relevant.
Try this
String data =
Regex.Match("PurchaseOrderPO1000000109.pdf", @"PO\d{10}",
RegexOptions.IgnoreCase).Value;
Could add a Regex.IsMatch with same vars above ofc :)
If the PO part is always the same, you can just get the number without needing to use a regex:
new string(theString.Where(c => char.IsDigit(c)).ToArray());
Later you can prepend the PO part manually.
NOTE: I'm assuming that you have only one single run of numbers in your strings. If you have for example "abc12345def678" you will get "12345678", which may not be what you want.
Regex.Replace(fileName, @"^.*?PO(\d{10}).*$", "$1");
string data="PurchaseOrderPO1000000109.pdf\nPO1000000100.doc";
MatchCollection matches = Regex.Matches(data, @"PO[0-9]{10}");
foreach(Match m in matches){
Console.WriteLine(m.Value);
}
Results
PO1000000109
PO1000000100
This RegEx will pick up all numbers from a string \d*.
As described here.
A possible regexp could be:
^.*(\d{10})\.\D{3}$
var re = new System.Text.RegularExpressions.Regex("(?<=^PurchaseOrder)PO\\d{10}(?=\\.pdf$)");
Assert.IsTrue(re.IsMatch("PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("some PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("OrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("PurchaseOrderPO1234567890.pdf2"));
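Putting the pieces together for the loop described in the question, a sketch could look like the following. The class and method names are invented for illustration. It takes the file names as input so it stays independent of the file system. In practice you would feed it something like Directory.GetFiles(folder).Select(Path.GetFileName).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PoNumbers
{
    // "PO" followed by exactly ten ASCII digits. The negative lookahead
    // rejects an eleventh digit; [0-9] is used because \d in .NET also
    // matches non-ASCII digits by default.
    private static readonly Regex PoPattern = new Regex("PO[0-9]{10}(?![0-9])");

    // Hypothetical helper: returns the PO number of every file name that
    // contains one, in input order. Names without a PO number are skipped.
    public static List<string> Extract(IEnumerable<string> fileNames)
    {
        var result = new List<string>();
        foreach (string fileName in fileNames)
        {
            Match match = PoPattern.Match(fileName);
            if (match.Success) result.Add(match.Value);
        }
        return result;
    }
}
```

Compiling the pattern once into a static readonly field avoids re-parsing it for every file, which starts to matter when the directory is large.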