Cross-platform file path comparison - C#

I'm trying to check whether two file path strings specify the same path.
We know paths are not case-sensitive on Windows, so I use:
Path.GetFullPath(path1).Equals(Path.GetFullPath(path2), StringComparison.CurrentCultureIgnoreCase)
I know this won't work correctly on Linux, because paths are case-sensitive there. So I'm looking for some way to detect whether the platform's paths are case-sensitive, or for a function like Path.Equals.

In your case, it's probably easiest to check whether Path.DirectorySeparatorChar is '/' or '\'. If it's the former, call the same method without the "IgnoreCase" part.
In full:
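bool samePath;
if (Path.DirectorySeparatorChar == '/')
{
    // '/' separator: assume a case-sensitive file system (Linux/Unix).
    // Ordinal comparison matches how file systems compare names; culture rules do not apply.
    samePath = Path.GetFullPath(path1).Equals(Path.GetFullPath(path2), StringComparison.Ordinal);
}
else
{
    // '\' separator: Windows, where paths are case-insensitive
    samePath = Path.GetFullPath(path1).Equals(Path.GetFullPath(path2), StringComparison.OrdinalIgnoreCase);
}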
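The check above can be wrapped in a reusable helper. This is a sketch: PathsEqual is a hypothetical name, and it uses ordinal comparisons (Microsoft's recommended comparison for file system names) rather than culture-aware ones. Note the heuristic's limit: macOS also uses '/', but its default file systems (APFS, HFS+) are case-insensitive, so this helper would wrongly treat macOS paths as case-sensitive.

```csharp
using System;
using System.IO;

public static class PathCompare
{
    // Heuristic: a '/' directory separator is taken to mean a case-sensitive file system.
    // Assumption: this is wrong on macOS, whose default volumes are case-insensitive.
    public static bool PathsEqual(string path1, string path2)
    {
        StringComparison comparison = Path.DirectorySeparatorChar == '/'
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        // GetFullPath resolves relative segments such as "." and ".." before comparing.
        string full1 = Path.GetFullPath(path1);
        string full2 = Path.GetFullPath(path2);
        return string.Equals(full1, full2, comparison);
    }

    public static void Main()
    {
        Console.WriteLine(PathsEqual("data/../data/file.txt", "data/file.txt")); // True
    }
}
```

GetFullPath keeps a trailing separator, so "dir/" and "dir" still compare as different with this sketch.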


How to combine paths while preserving the original path's directory separator in C#?

If we have /TestDir as an example path but are on a Windows machine, using Path.Join or Path.Combine with NextDir yields /TestDir\NextDir.
Is there a way to make the combined path use the same separator as the path I'm appending to (Unix or Windows style)? That is:
\TestDir with NextDir should yield \TestDir\NextDir.
/TestDir with NextDir should yield /TestDir/NextDir.
The first directory will always be a rooted path, so it will always contain the separator to use. The only edge case is network paths, which always start with \\, but after that, do they differ between Unix and Windows? Correct me if I'm wrong on this.
EDIT: I've been told that : is the path separator on Classic Mac. Is this true? I don't see any .NET APIs that treat it as a directory separator.
This takes the first separator (either / or \) it finds and replaces all other occurrences of / or \ with that character.
using System;

public class Example
{
    public static void Main()
    {
        char[] separators = { '\\', '/' };
        string path = "/TestDir\\NextDir\\AndTheNext/AndTheNext/AndTheNext\\AndTheNext";
        int index = path.IndexOfAny(separators);
        // IndexOfAny returns -1 when there is no separator at all; leave the path unchanged then
        if (index >= 0)
        {
            // Normalize every separator to whichever one appears first
            path = path[index] == '\\' ? path.Replace('/', '\\') : path.Replace('\\', '/');
        }
        Console.WriteLine(path);
    }
}
Check it out running here: https://dotnetfiddle.net/fenzWO
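The Path class exposes the directory separator as a static readonly field, Path.DirectorySeparatorChar, whose value depends on the OS. Because it can't be changed, it may be easier to create your own class that performs the same operations as Path but lets you choose the separator. (Path.PathSeparator is a different field: it separates entries in environment variables such as PATH.)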
For more information about Path you may read the docs: https://learn.microsoft.com/en-us/dotnet/api/system.io.path?view=net-5.0
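A minimal sketch of such a class, assuming the base path always contains at least one separator (as the question states). SeparatorAwarePath and Combine are hypothetical names; falling back to Path.DirectorySeparatorChar when no separator is present is an added assumption.

```csharp
using System;
using System.IO;

public static class SeparatorAwarePath
{
    private static readonly char[] Separators = { '\\', '/' };

    // Hypothetical helper: appends a segment using the separator the base path already uses.
    public static string Combine(string basePath, string segment)
    {
        int index = basePath.IndexOfAny(Separators);
        // Assumption: with no separator in the base, fall back to the OS separator.
        char sep = index >= 0 ? basePath[index] : Path.DirectorySeparatorChar;
        char other = sep == '/' ? '\\' : '/';

        // Trim separators at the join so the result never contains a doubled separator.
        string head = basePath.TrimEnd(Separators);
        string tail = segment.TrimStart(Separators).Replace(other, sep);
        return head + sep + tail;
    }

    public static void Main()
    {
        Console.WriteLine(Combine("/TestDir", "NextDir"));  // /TestDir/NextDir
        Console.WriteLine(Combine(@"\TestDir", "NextDir")); // \TestDir\NextDir
    }
}
```

Rewriting separators inside the appended segment assumes a backslash is never a literal file-name character, which does not hold on Linux.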

Confused about Directory.GetFiles

I've read the docs about the Directory.GetFiles search pattern and how it is used, because I noticed that *.dll finds both test.dll and test.dll_20170206. That behavior is documented.
Now, I have a program that lists files in a folder based on a user-configured mask and processes them. I noticed that masks like *.txt lead to the above mentioned "problem" as expected.
However, the mask fixedname.txt also causes fixedname.txt_20170206 or the like to appear in the list, even though the documentation states this only occurs:
When you use the asterisk wildcard character in a searchPattern such as "*.txt"
Why is that?
PS: I just checked: changing the file mask to fixednam?.txt does not help, even though the docs say:
When you use the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, whereas a search pattern of "file*.txt" returns both files.
If you need a solution, you can transform the filter pattern into a regular expression by replacing * with (.*) and ? with . (which matches any single character). You also have to escape characters that are special in regular expressions, such as the dot. Then check each file name you got from Directory.GetFiles against this regular expression. Keep in mind to check not only that it matches, but also that the match length equals the length of the file name; otherwise you get the same results as before.
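A sketch of that approach. MaskFilter, MaskToRegex and GetFilesStrict are hypothetical names. Instead of comparing match lengths, it anchors the pattern with ^ and \z, which forces the match to cover the whole file name. Case-insensitive matching is an added assumption meant to mirror Windows.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class MaskFilter
{
    // Converts a file mask such as "fixednam?.txt" into an anchored regular expression.
    public static Regex MaskToRegex(string mask)
    {
        // Regex.Escape turns '*' into "\*", '?' into "\?" and '.' into "\.".
        string body = Regex.Escape(mask).Replace(@"\*", ".*").Replace(@"\?", ".");
        // ^ and \z require the match to span the entire file name.
        return new Regex("^" + body + @"\z", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Lets Directory.GetFiles pre-filter, then drops names the mask does not match exactly.
    public static string[] GetFilesStrict(string directory, string mask)
    {
        Regex regex = MaskToRegex(mask);
        return Directory.GetFiles(directory, mask)
            .Where(file => regex.IsMatch(Path.GetFileName(file)))
            .ToArray();
    }

    public static void Main()
    {
        Regex exact = MaskToRegex("fixedname.txt");
        Console.WriteLine(exact.IsMatch("fixedname.txt"));          // True
        Console.WriteLine(exact.IsMatch("fixedname.txt_20170206")); // False
    }
}
```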
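GetFiles uses pattern search: with a three-character extension such as *.txt, it also returns files whose extension merely begins with those letters (for example .txt_20170206).
You can write code like the following to keep only files whose extension is exactly .txt: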
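foreach (string strFileName in Directory.GetFiles(@"D:\test\", "*.txt"))
{
    // GetExtension returns everything from the last dot, e.g. ".txt_20170206"
    string extension = Path.GetExtension(strFileName);
    // Compare ordinally and ignoring case, since Windows file names are case-insensitive
    if (!string.Equals(extension, ".txt", StringComparison.OrdinalIgnoreCase))
        continue;

    // process the file
}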

C# file path string comparison case insensitivity

I would like to compare two strings containing file paths in C#.
However, since NTFS uses case-insensitive paths by default, I would like the string comparison to be case-insensitive in the same way.
But I can't seem to find any information on how NTFS actually implements its case insensitivity. What I would like to know is how to perform a case-insensitive comparison of strings using the same casing rules that NTFS uses for file paths.
From MSDN:
The string behavior of the file system, registry keys and values, and environment variables is best represented by StringComparison.OrdinalIgnoreCase.
And:
When interpreting file names, cookies, or anything else where a combination such as "å" can appear, ordinal comparisons still offer the most transparent and fitting behavior.
Therefore it's simply:
String.Equals(fileNameA, fileNameB, StringComparison.OrdinalIgnoreCase)
(I always use the static Equals call in case the left operand is null.)
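A small sketch of the static-call point. NamesEqual is a hypothetical wrapper name and the paths are made-up examples.

```csharp
using System;

public static class FileNameComparison
{
    // Ordinal, case-insensitive comparison that is safe when either argument is null.
    public static bool NamesEqual(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(NamesEqual(@"C:\TEST\File.txt", @"c:\test\file.TXT")); // True

        string missing = null;
        // The static call returns false instead of throwing.
        Console.WriteLine(NamesEqual(missing, "file.txt")); // False
        // The instance form, missing.Equals("file.txt", ...), would throw NullReferenceException.
    }
}
```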
When comparing paths, the direction of the separators also matters. For instance:
bool isEqual = String.Equals(@"myFolder\myFile.xaml", "myFolder/myFile.xaml", StringComparison.OrdinalIgnoreCase);
isEqual will be false.
Therefore the paths need to be normalized first:
private string FixPath(string path)
{
    // Unify separators by turning every backslash into a forward slash.
    // (DirectorySeparatorChar and AltDirectorySeparatorChar are both '/' on Unix,
    // so replacing one with the other would do nothing there.)
    // Case is handled by OrdinalIgnoreCase at comparison time, so no ToUpperInvariant is needed.
    return path.Replace('\\', '/');
}
Whereas this expression will be true:
bool isEqual = String.Equals(FixPath(@"myFolder\myFile.xaml"), FixPath("myFolder/myFile.xaml"), StringComparison.OrdinalIgnoreCase);
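string path1 = "C:\\TEST";
string path2 = "c:\\test";
// ToLowerInvariant avoids culture-specific casing rules (e.g. the Turkish dotless i)
if (path1.ToLowerInvariant() == path2.ToLowerInvariant())
    MessageBox.Show("True");
Do you mean this, or did I not get the question?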
I would go for
string.Compare(path1, path2, true) == 0
or if you want to specify cultures:
string.Compare(path1, path2, true, CultureInfo.CurrentCulture) == 0
Using ToUpper allocates a new string every time you compare something, which is a needless memory allocation.

Split path by "\\" in C#

How can I split a path by "\\"? It gives me a syntax error if I use
path.split("\\");
You should be using
path.Split(Path.DirectorySeparatorChar);
if you're trying to split a file path based on the native path separator.
Try path.Split('\\'), with single quotes (a char literal).
To use a string this works:
path.Split(new[] {"\\"}, StringSplitOptions.None)
To use a string you have to specify an array of strings. I never did get why :)
In .NET Framework there's no string.Split overload which takes a single string (.NET Core 2.0 and later add Split(string, StringSplitOptions)). Also, C# is case-sensitive, so you need Split rather than split. However, you can use:
string[] bits = path.Split('\\');
which will use the overload taking a params char[] parameter. It's equivalent to:
string[] bits = path.Split(new char[] { '\\' });
That's assuming you definitely want to split by backslashes. You may want to split by the directory separator for the operating system you're running on, in which case Path.DirectorySeparatorChar would probably be the right approach... it will be / on Unix and \ on Windows. On the other hand, that wouldn't help you if you were trying to parse a Windows file system path in an ASP.NET page running on Unix. In other words, it depends on your context :)
Another alternative is to use the methods on Path and DirectoryInfo to get information about paths in more file-system-sensitive ways.
To be on the safe side, you could use:
path.Split(new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar });
On Windows, forward slashes are also accepted, both by the C# Path functions and on the command line (in Windows 7/XP at least).
For example, both of these produce the same results for me:
dir "C:/Python33/Lib/xml"
dir "C:\Python33\Lib\xml"
(In C:)
dir "Python33/Lib/xml"
dir "Python33\Lib\xml"
On Windows, neither '/' nor '\' is valid in a file name. On Linux, '\' is allowed in file names, so be aware of this if you parse both forms.
So if you wanted to support paths in both forms (like I do) you could do:
path.Split(new char[] {'/', '\\'});
On Linux it would probably be safer to use Path.DirectorySeparatorChar.
path.Split(new char[] { '\\' });
It's better to use the existing System.IO.Path class, so you don't need to care about platform specifics.
It provides methods to access any part of a file path, such as GetFileName(string path), GetDirectoryName and GetExtension.
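A sketch of splitting a path using only System.IO.Path, so the running platform's separator and root rules apply. PathParts and Components are hypothetical names. Path.TrimEndingDirectorySeparator requires .NET Core 3.0 or later. Because the platform rules apply, a Windows-style string such as "C:\a\b" is not split on Linux.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PathParts
{
    // Walks from the end of the path to the root with GetFileName/GetDirectoryName.
    public static List<string> Components(string path)
    {
        var parts = new List<string>();
        // Drop a trailing separator, but keep a bare root such as "/" or "C:\".
        string? current = Path.TrimEndingDirectorySeparator(path);
        while (!string.IsNullOrEmpty(current))
        {
            string name = Path.GetFileName(current);
            if (name.Length == 0)
            {
                // Nothing left but the root itself.
                parts.Insert(0, current);
                break;
            }
            parts.Insert(0, name);
            current = Path.GetDirectoryName(current);
        }
        return parts;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Components("/usr/local/bin")));
    }
}
```

On Linux this prints the root "/" followed by usr, local and bin; on Windows the root comes back as "\" because GetDirectoryName normalizes separators there.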
A complete solution could look like this:
//
private static readonly char[] pathSeps = new char[] {
Path.DirectorySeparatorChar,
Path.AltDirectorySeparatorChar,
Path.VolumeSeparatorChar,
};
//
///<summary>Split a path according to the file system rules.</summary>
public static string[] SplitPath( string path ) {
if ( null == path ) return null;
return path.Split( pathSeps, StringSplitOptions.RemoveEmptyEntries );
}
Some of the other proposed solutions here use the syntax:
path.Split(new char[] {'/', '\\'});
Although this works, it has several disadvantages:
1. It does not allow your application to adapt to different target platforms. Currently our applications run on UNIX and Windows OSs (Windows, macOS, iOS, Linux variants), so there is a fixed set of path characters. But this might change if .NET were ported to other operating systems, so it is best to use the predefined constants.
2. Performance of the inline syntax is worse. This might not matter for a handful of files, but when working with millions of files there are noticeable differences: managed memory grows until the next GC. In the generated assembly code you will find "call CORINFO_HELP_NEWARR_1_VC" for each new statement, even in Release mode. This happens whenever you new up an array, because arrays are mutable and so cannot be shared between calls. My proposed solution avoids this by declaring the array static readonly.
3. Reusability of the inline syntax is also worse, because you might want to use the path separators array in other contexts.
4. StringSplitOptions.RemoveEmptyEntries should be used to account for UNC paths and possible typos in the incoming path. Duplicate separators carry no meaning in a normal path, but they can appear through a user typo or when a separator is added on both sides while concatenating a path and a file name.
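A short demonstration of the RemoveEmptyEntries point. To keep the output identical on every platform, this sketch uses a fixed '/' and '\' separator set rather than the Path.* fields used by SplitPath; SplitDemo and SplitSegments are hypothetical names.

```csharp
using System;

public static class SplitDemo
{
    // Fixed separator set for the demo only.
    private static readonly char[] Separators = { '/', '\\' };

    public static string[] SplitSegments(string path, bool removeEmpty)
    {
        StringSplitOptions options = removeEmpty
            ? StringSplitOptions.RemoveEmptyEntries
            : StringSplitOptions.None;
        return path.Split(Separators, options);
    }

    public static void Main()
    {
        string unc = @"\\server\share\dir";
        // The leading "\\" of a UNC path produces two empty entries.
        Console.WriteLine(string.Join("|", SplitSegments(unc, false)));            // ||server|share|dir
        Console.WriteLine(string.Join("|", SplitSegments(unc, true)));             // server|share|dir
        Console.WriteLine(string.Join("|", SplitSegments("dir//file.txt", true))); // dir|file.txt
    }
}
```

The trade-off: once the empty entries are removed, the result no longer shows that the input was a UNC path, so check for a leading "\\" beforehand if that matters.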

Can you explain this bizarre crash in the .NET runtime?

My C# application sends me a stack trace when it throws an unhandled exception, and I'm looking at one now that I don't understand.
It looks as though this can't possibly be my fault, but usually when I think that I'm subsequently proved wrong. 8-) Here's the stack trace:
mscorlib caused an exception (ArgumentOutOfRangeException): startIndex cannot be larger than length of string.
Parameter name: startIndex
System.String::InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy) + 6c
System.String::Substring(Int32 startIndex) + 0
System.IO.Directory::InternalGetFileDirectoryNames(String path, String userPathOriginal, String searchPattern, Boolean includeFiles, Boolean includeDirs, SearchOption searchOption) + 149
System.IO.Directory::GetFiles(String path, String searchPattern, SearchOption searchOption) + 1c
System.IO.Directory::GetFiles(String path) + 0
EntrianSourceSearch.Index::zz18ez() + 19b
EntrianSourceSearch.Index::zz18dz() + a
So my code (the obfuscated function names at the end) calls System.IO.Directory.GetFiles(path) which crashes with a string indexing problem.
Sadly I don't know the value of path that was passed in, but regardless of that, surely it shouldn't be possible for System.IO.Directory::GetFiles to crash like that? Try as I might I can't come up with any argument to GetFiles that reproduces the crash.
Am I really looking at a bug in the .NET runtime, or is there something that could legitimately cause this exception? (I could understand things going wrong if the directory was being changed at the time I called GetFiles, but I wouldn't expect a string indexing exception in that case.)
Edit: Thanks to everyone for their thoughts! The most likely theory so far is that there's a pathname with dodgy non-BMP Unicode characters in it, but I still can't make it break. Looking at the code in GetFiles with Reflector, I think the only way it can break is for GetDirectoryName() to return a path that's longer than its input, even when its input is already fully normalised. Bizarre. I've tried making pathnames with non-BMP characters in them (I've never had a directory called {MUSICAL SYMBOL G CLEF} before 8-) but I still can't make it break.
What I've done is add additional logging around the failing code (and made sure my logging works with non-BMP characters!). If it happens again, I'll have a lot more information.
You can try looking into the code for System.IO.Directory.GetFiles() with .NET Reflector. From a quick look, it apparently only calls String.Substring() to split something off the end of the path and adds it back near the end of the method. It checks Path.DirectorySeparatorChar (the backslash, '\') and Path.AltDirectorySeparatorChar (the slash, '/') to determine the index and length of the substring.
My guess would be that invalid or Unicode file or folder names are confusing the method.
Just a guess... are any of the paths passed as arguments 260 characters (MAX_PATH) or longer? The standard .NET Framework System.IO functions cannot handle paths that long.
Wow. I don't think that's ever happened to me.
You're saying that this only happens to this one customer?
You might want to start logging the path parameters and set up the program to send the logs to you for analysis; I suspect the problem is in the format of the argument.
If this obfuscated code was produced by your own obfuscator, why not test it on your machine 'un-obfuscated' with some of the collected parameters and see the result?
Isn't there anything in the Path class, like a Path.Exists() or Path.IsValid(), to check the parameter first? Maybe there's a stray '/' or '\' or other odd characters, so when the internal API parses each component, something goes wrong in determining each portion of the path string. Just an observation, since it's the Substring call that fails.
Hope that helps, and good luck! Please let us know what solution you find, as it will definitely be an interesting one.
Perhaps you could provide some details about the customer having the issue. Things like:
1. OS name and version
2. OS Language
3. The .NET version you are targeting vs. the .NET version the customer is running.
There could be unicode characters in the directory path that are causing the string length to be off by one or more.
Another note: the exception text suggests that your program was written in managed C++. You aren't mixing in any unmanaged string manipulation, are you?
If you can, I'd suggest modifying your diagnostics to capture the actual path value that causes the error.
A plausible explanation: http://support.microsoft.com/kb/943804/
The first and only question should have been: "Have you run ChkDsk?"
Perhaps the obfuscator is screwing things up. Try running the code without the obfuscator and post your results.
edit:
Are you able to reproduce the crash?
Not sure this is related, but I'm using GetFiles in Visual C++ and it was crashing when listing the contents of C:. It turned out I had a folder with messed-up permissions from a previous install. I reclaimed the folder for my current user and that fixed the crash.
Could you quickly code up a console app and run it in debug mode? Basically, loop through the entire directory tree using the GetFiles method. Maybe something will hit, and you should be able to quickly locate the offending file.
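A sketch of that console app. It walks the tree itself (rather than using SearchOption.AllDirectories) so that one failing folder doesn't abort the whole scan, and it records every folder whose GetFiles call throws. GetFilesProbe and FindFailingDirectories are hypothetical names; skipping reparse points (junctions, symbolic links) is an added assumption to avoid cycles.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class GetFilesProbe
{
    // Breadth-first walk that calls Directory.GetFiles on each folder and logs failures.
    public static List<string> FindFailingDirectories(string root)
    {
        var failures = new List<string>();
        var pending = new Queue<string>();
        pending.Enqueue(root);
        while (pending.Count > 0)
        {
            string dir = pending.Dequeue();
            try
            {
                Directory.GetFiles(dir);
                foreach (string sub in Directory.GetDirectories(dir))
                {
                    // Assumption: skip links/junctions so a cyclic link cannot loop forever.
                    if ((File.GetAttributes(sub) & FileAttributes.ReparsePoint) != 0)
                        continue;
                    pending.Enqueue(sub);
                }
            }
            catch (Exception ex)
            {
                // Record the folder and the exception (e.g. ArgumentOutOfRangeException) instead of stopping.
                failures.Add(dir + " -> " + ex.GetType().Name + ": " + ex.Message);
            }
        }
        return failures;
    }

    public static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        foreach (string failure in FindFailingDirectories(root))
            Console.WriteLine(failure);
    }
}
```

Permission problems like the one described in the previous answer also show up in this list, as UnauthorizedAccessException entries.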
From the source and your comments, I suspect a UNC path is causing problems, possibly together with a security or share permission issue. For instance, if the user turned off creation of 8.3 file names, you will definitely have UNC path issues, because that causes the network provider to fail to retrieve proper file names in Windows 2000 and Windows XP. (I forget which service pack fixed this bug.)
The relevant source code follows:
String tempStr = Path.InternalCombine(fullPath, searchPattern);
// If path ends in a trailing slash (\), append a * or we'll
// get a "Cannot find the file specified" exception
char lastChar = tempStr[tempStr.Length-1];
if (lastChar == Path.DirectorySeparatorChar || lastChar == Path.AltDirectorySeparatorChar || lastChar == Path.VolumeSeparatorChar)
tempStr = tempStr + '*';
fullPath = Path.GetDirectoryName(tempStr);
BCLDebug.Assert((fullPath != null),"fullpath can't be null!");
String searchCriteria;
bool trailingSlash = false;
bool trailingSlashUserPath = false;
lastChar = fullPath[fullPath.Length-1];
trailingSlash = (lastChar == Path.DirectorySeparatorChar) || (lastChar == Path.AltDirectorySeparatorChar);
if (trailingSlash) {
// Can happen if the path is C:\temp, in which case GetDirectoryName would return C:\
searchCriteria = tempStr.Substring(fullPath.Length);
}
else
searchCriteria = tempStr.Substring(fullPath.Length + 1);
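To connect this code to the stack trace: String.Substring(startIndex) throws ArgumentOutOfRangeException when startIndex is greater than the string's length, so the crash requires fullPath (the directory part) to be at least as long as tempStr. This sketch reproduces only that string arithmetic; SubstringArithmetic and SearchCriteria are hypothetical names, and the inputs are made-up values, not the path that crashed.

```csharp
using System;

public static class SubstringArithmetic
{
    // Mirrors the final step above: cut the search pattern off the end of tempStr,
    // using the length of fullPath (the directory part) as the offset.
    public static string SearchCriteria(string tempStr, string fullPath)
    {
        char lastChar = fullPath[fullPath.Length - 1];
        bool trailingSlash = lastChar == '\\' || lastChar == '/';
        // Substring throws ArgumentOutOfRangeException if the start index exceeds tempStr.Length.
        return trailingSlash
            ? tempStr.Substring(fullPath.Length)
            : tempStr.Substring(fullPath.Length + 1);
    }

    public static void Main()
    {
        Console.WriteLine(SearchCriteria(@"C:\temp\*", @"C:\temp")); // *
        Console.WriteLine(SearchCriteria(@"C:\*", @"C:\"));          // *
        try
        {
            // Hypothetical: a "directory name" as long as the whole string.
            SearchCriteria(@"C:\temp\*", @"C:\temp\*");
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine(ex.GetType().Name); // ArgumentOutOfRangeException
        }
    }
}
```

This matches the asker's reading: only a GetDirectoryName result that is not shorter than its input can push the start index past the end.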
