System.IO.Path or equivalent use with Unix paths - C#

Is it possible to use the System.IO.Path class, or some similar object, to format a Unix-style path with functionality similar to the Path class? For example, I can do:
Console.WriteLine(Path.Combine("c:\\", "windows"));
Which shows:
"C:\\windows"
But if I try a similar thing with forward slashes (/), it just reverses them for me.
Console.WriteLine(Path.Combine("/server", "mydir"));
Which shows:
"/server\\mydir"

You've got bigger problems: Unix accepts characters in a file name that Windows does not allow. This code will bomb with an ArgumentException, "Illegal characters in path":
string path = Path.Combine("/server", "accts|payable");
You can't reliably use Path.Combine() for Unix paths.

Path.Combine uses the values of Path.DirectorySeparatorChar and Path.VolumeSeparatorChar, and these are determined by the class libraries in the runtime. So if you write your code using only Path.Combine calls, Environment.SpecialFolder values, and so forth, it will run fine everywhere, since Mono (and presumably any .NET runtime) implements the native way of getting and building those paths for whatever platform it runs on. (Your second example, for instance, returns /server/mydir for me, but the first example gives c:\/windows.)
If you want a UNIX-specific path hard-coded in all cases, Path.Combine isn't buying you anything: Console.WriteLine ("/server/mydir"); does what you want in the OP.
As Hans said though, different filesystems have different rules for allowed characters, path lengths, and so on, so the best practice, as with any cross-platform programming, is to restrict yourself to the intersection of features allowed by the filesystems you're targeting. Watch out for case-sensitivity issues too.

In this case I would use the System.Uri or System.UriBuilder class.
Side note: if you run your .NET code on a Linux system with the Mono runtime, the Path class should behave the way you expect, since the information the Path class uses is provided by the underlying system.
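Since Path.Combine always uses the platform's separator, one option for hard-coded Unix-style paths is a tiny helper that joins with '/' unconditionally. This is a hypothetical sketch, not a BCL API; UnixPath and its behavior (trimming duplicate slashes between segments) are assumptions:

```csharp
using System;
using System.Linq;

static class UnixPath
{
    // Join path segments with '/' regardless of the platform's
    // Path.DirectorySeparatorChar, trimming duplicate slashes
    // between segments so "/server/" + "/mydir" -> "/server/mydir".
    public static string Combine(params string[] segments)
    {
        var cleaned = segments
            .Where(s => !string.IsNullOrEmpty(s))
            .Select((s, i) => i == 0 ? s.TrimEnd('/') : s.Trim('/'));
        return string.Join("/", cleaned);
    }
}
```

Note this does no validation at all, which is arguably a feature here: as pointed out above, characters that are legal on Unix would make Path.Combine throw.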

Related

C# Attribute to detect unused methods

Is it possible to write an Attribute that can track methods to detect if those methods are never called?
[Track]
void MyMethod(){
}
output:
warning: method "MyMethod" in "MyClass" has no references in code.
It is not strictly necessary to have it run at compile time, but it should work when the application is initialized (better at compile time anyway).
This attribute will be used to track methods in our audio library. Audio is refactored very frequently, and we usually search for audio methods with zero references in code; marking these methods lets us quickly detect and remove unused audio assets.
Basically, each time we add a new sound effect, we may later stop triggering it (calling its method), and the audio file and playback code can remain in the application for a long time.
Maybe this is the answer you're looking for?
Finding all references to a method with Roslyn
You can use the code there to automate something of your own with Reflection, I'd say.
A partial answer is found here:
C# reflection and finding all references
I can use that info to get references to methods marked with a particular attribute; however, that is a run-time script (but better than nothing).
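The run-time half of that approach can be sketched with reflection alone: enumerate every method carrying the marker attribute, producing the candidate list to feed into a Roslyn reference search. The TrackAttribute and AudioPlayer types below are hypothetical stand-ins for the question's setup:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute, modeled on the [Track] tag in the question.
[AttributeUsage(AttributeTargets.Method)]
class TrackAttribute : Attribute { }

class AudioPlayer
{
    [Track]
    public void PlayExplosion() { }   // tracked

    public void PlayMusic() { }       // not tracked
}

static class TrackScanner
{
    // List "Type.Method" for every [Track]-marked method in an assembly.
    // This only finds the candidates; pairing each one with a Roslyn
    // reference search (as the linked answers describe) is a separate step.
    public static string[] FindTrackedMethods(Assembly asm) =>
        asm.GetTypes()
           .SelectMany(t => t.GetMethods(
                BindingFlags.Instance | BindingFlags.Static |
                BindingFlags.Public | BindingFlags.NonPublic |
                BindingFlags.DeclaredOnly))
           .Where(m => m.GetCustomAttribute<TrackAttribute>() != null)
           .Select(m => m.DeclaringType.Name + "." + m.Name)
           .ToArray();
}
```

Running this at application startup and logging the list gives you the "methods to audit" set even without compile-time support.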
A little late, but better than never: I use GNU grep for Windows to search all folders and files for the name of the method:
`C:\>grep -irw "method_name" * --include=*.cs --include=*.sql --include=*.txt`
You can include as many, or as few, file name extensions as makes sense for you. In the example above, I show the top directory as C:, but you can start the search at any directory that makes sense.
The huge advantage of using grep over IDE based searches is that it will search across multiple projects and solutions.

PathTooLongException with a shortened path [duplicate]

How can I use (to avoid PathTooLongException):
System.IO.FileInfo
with paths bigger than 260 chars?
Are there similar classes/methods that return the same result of FileInfo class?
From what I know, it is not easily possible. While it is possible to use a workaround for streams, as phoenix mentioned, it is not possible for file name handling. Internally, every class that works with file names performs checks for long file names.
You can instantiate FileInfo and fill its private members using reflection (not recommended) to get a FileInfo pointing to a file with a long path. But when you try to use this object, you will still receive PathTooLongException exceptions, because, for example, the Path class (used heavily by FileInfo) checks for long paths on every method call.
So there is only one problem-free way to get long path support: implement your own set of classes that mimic the FileInfo behavior. It is not very complex (except perhaps the security parts), but it is time-consuming.
Update: here are even two ready-made solutions for this problem: AlphaFS and Zeta Long Paths.
Here at work we deal with long paths quite frequently, and we therefore basically had to roll our own System.IO. Well, not really, but we rewrote File, Directory, FileInfo, DirectoryInfo and Path, just to name a few. The basic premise is that it's all possible from a Win32 API perspective, so all you really need to do at the end of the day is invoke the Unicode versions of the Win32 API functions, and then you're good. It's a lot of work, and can be a pain in the ass at times, but there's really no better way to do it.
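The Unicode Win32 functions accept paths in the \\?\ extended-length form, which lifts the MAX_PATH limit. The string transformation is simple; a minimal sketch (the helper name is illustrative, and the actual file access still has to go through P/Invoke on older frameworks, since System.IO re-checks path lengths):

```csharp
using System;

static class LongPath
{
    // Convert an absolute Windows path to the \\?\ extended-length
    // form accepted by the Unicode Win32 file APIs (CreateFileW etc.).
    // UNC paths use the \\?\UNC\server\share form instead.
    public static string ToExtendedSyntax(string absolutePath)
    {
        if (absolutePath.StartsWith(@"\\?\")) return absolutePath;
        if (absolutePath.StartsWith(@"\\"))
            return @"\\?\UNC\" + absolutePath.Substring(2);
        return @"\\?\" + absolutePath;
    }
}
```

Note that extended-syntax paths must already be fully qualified and normalized: the \\?\ prefix also turns off the normalization (., .., slash conversion) that the non-prefixed APIs perform.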
There's a great library on Microsoft TechNet for overcoming the long filenames problem, it's called
Delimon.Win32.IO Library (V4.0), and it has its own versions of key methods from System.IO.
For example, you would replace:
System.IO.Directory.GetFiles
with
Delimon.Win32.IO.Directory.GetFiles
which will let you handle long files and folders.
From the website:
Delimon.Win32.IO replaces basic file functions of System.IO and
supports File & Folder names up to 32,767 Characters.
This Library is written on .NET Framework 4.0 and can be used either
on x86 & x64 systems. The File & Folder limitations of the standard
System.IO namespace can work with files that have 260 characters in a
filename and 240 characters in a folder name (MAX_PATH is usually
configured as 260 characters). Typically you run into the
System.IO.PathTooLongException Error with the Standard .NET Library.
I only needed to use the FullName property but was also receiving the PathTooLongException.
Using reflection to extract the FullPath value was enough to solve my problem:
private static string GetFullPath(FileInfo src)
{
    // Read FileInfo's private FullPath field directly, bypassing
    // the path-length check performed by the FullName property.
    return (string)src.GetType()
        .GetField("FullPath", BindingFlags.Instance | BindingFlags.NonPublic)
        .GetValue(src);
}

Possible to Change Embedded Resource Path Separator Character?

I store a whole bunch of files as embedded resources within an assembly. Calling Assembly.GetManifestResourceNames returns things similar to the following:
Folder1.Resource1.cshtml
Folder1.Folder2.common.js
etc.
I have a class that builds a virtual directory/file system based on these names. However, I am having an issue with resources such as:
Folder1.Folder2.jQuery-ui-1.10.3.custom.min.js
As there is no way (unless you handle it as a special case) to know that jQuery-ui-1, 10, 3, etc. are not folder names, with a final resource of min.js. Currently I get around this by ensuring that all my embedded resources do not contain multiple periods. That said, is there a way to change the path separator to a different character to avoid this problem entirely?
Are you able to get the ResourceManager? If so, you can use BaseName.
You could use Assembly.GetTypes() to get from a type to the ResourceManager, or you could even potentially cross-reference directly against the FullName of the types.
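Because the dotted manifest name alone is ambiguous, any parser needs some external knowledge. A sketch of one workaround, under the assumption that the set of real folder names is known to the caller (the ResourceNameParser type is hypothetical, and folder names that themselves contain dots would still break it):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ResourceNameParser
{
    // Split a manifest resource name into folder segments and a file
    // name by greedily consuming leading segments that appear in the
    // supplied set of known folder names; everything left over is
    // treated as the (possibly dotted) file name.
    public static (string[] Folders, string File) Split(
        string resourceName, ISet<string> knownFolders)
    {
        string[] parts = resourceName.Split('.');
        var folders = new List<string>();
        int i = 0;
        while (i < parts.Length - 1 && knownFolders.Contains(parts[i]))
            folders.Add(parts[i++]);
        return (folders.ToArray(), string.Join(".", parts.Skip(i)));
    }
}
```

For example, with known folders { "Folder1", "Folder2" }, the name Folder1.Folder2.jQuery-ui-1.10.3.custom.min.js splits into two folders and the file jQuery-ui-1.10.3.custom.min.js.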

Canonicalize URL to lowercase without breaking file system or culture?

Canonicalizing URLs to Lowercase
I wish to write an HTTP module that converts URLs to lowercase. My first attempt ignored international character sets and works great:
// Convert URL virtual path to lowercase
string lowercase = context.Request.FilePath.ToLowerInvariant();
// If anything changed then issue 301 Permanent Redirect
if (!lowercase.Equals(context.Request.FilePath, StringComparison.Ordinal))
{
context.Response.RedirectPermanent(...lowercase URL...);
}
The Turkey Test (international cultures):
But what about cultures other than en-US? I referred to the Turkey Test to come up with a test URL:
http://example.com/Iıİi
This little insidious gem destroys any notion that case conversion in URLs is simple! Its lowercase and uppercase versions, respectively, are:
http://example.com/ııii
http://example.com/IIİİ
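These mappings can be verified directly with culture-specific ToLower/ToUpper. A minimal sketch, assuming an ICU-backed runtime where the tr-TR casing data is available (it fails under invariant-globalization mode):

```csharp
using System;
using System.Globalization;

static class TurkishCasing
{
    static readonly CultureInfo Tr = CultureInfo.GetCultureInfo("tr-TR");

    // Under tr-TR: 'I' lower-cases to dotless 'ı' and dotted 'İ' to 'i',
    // while 'ı' upper-cases to 'I' and 'i' to 'İ'.
    public static string Lower(string s) => s.ToLower(Tr);
    public static string Upper(string s) => s.ToUpper(Tr);
}
```

So the single string "Iıİi" round-trips to four characters of lowercase "i"-forms or four characters of uppercase "I"-forms, exactly as shown above.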
For case conversion to work with Turkish URLs, I first had to set the current culture of ASP.NET to Turkish:
<system.web>
<globalization culture="tr-TR" />
</system.web>
Next, I had to change my code to use the current culture for the case conversion:
// Convert URL virtual path to lowercase
string lowercase = context.Request.FilePath.ToLower(CultureInfo.CurrentCulture);
// If anything changed then issue 301 Permanent Redirect
if (!lowercase.Equals(context.Request.FilePath, StringComparison.Ordinal))
{
context.Response.RedirectPermanent(...);
}
But wait! Will StringComparison.Ordinal still work? Or should I use StringComparison.CurrentCulture? I'm really not certain of either!
File names: It gets MUCH WORSE!
Even if the above works, using the current culture for case conversions breaks the NTFS file system! Let's say I have a static file with the name Iıİi.html:
http://example.com/Iıİi.html
Even though the Windows file system is case-insensitive, it does not use language culture. Converting the above URL to lowercase results in a 404 Not Found because the file system doesn't consider the two names equal:
http://example.com/ııii.html
The correct case conversion for file names? WHO KNOWS?!
The MSDN article, Best Practices for Using Strings in the .NET Framework, has a note (about halfway through the article):
Note:
The string behavior of the file system, registry keys and values, and environment variables is best represented by StringComparison.OrdinalIgnoreCase.
Huh? Best represented? Is that the best we can do in C#? So just what is the correct case conversion to match the file system? Who knows?! About all we can say is that string comparisons using the above will probably work most of the time.
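The gap between the two comparison styles is easy to demonstrate. A small sketch contrasting ordinal and culture-aware case-insensitive comparison (assuming an ICU-backed runtime for the tr-TR half):

```csharp
using System;
using System.Globalization;

static class ComparisonDemo
{
    // OrdinalIgnoreCase only folds A-Z/a-z and otherwise compares raw
    // code points: no culture tables, which is why the MSDN article
    // calls it the best representation of file-system name matching.
    public static bool OrdinalEquals(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Culture-aware, case-insensitive comparison. Under tr-TR, I and i
    // belong to different letters (I/ı vs İ/i), so the result can
    // differ from the ordinal comparison for the same input.
    public static bool CultureEquals(string a, string b, string cultureName) =>
        string.Compare(a, b, CultureInfo.GetCultureInfo(cultureName),
            CompareOptions.IgnoreCase) == 0;
}
```

"FILE" and "file" match ordinally but not under tr-TR, which is precisely why a culture-based lowercase conversion can 404 against the file system.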
Summary: Two case conversions: Static/Dynamic URLs
So we've seen that static URLs (URLs with a file path that matches a real directory or file in the file system) must use an unknown case conversion that is only "best represented" by StringComparison.OrdinalIgnoreCase. And note there is no string.ToLowerOrdinal() method, so it's very difficult to know exactly which case conversion corresponds to the OrdinalIgnoreCase string comparison. Using string.ToLowerInvariant() is probably the best bet, yet it breaks language culture.
On the other hand, dynamic URLs (URLs with a file path that does not match a real file on disk but maps to your application) can use string.ToLower(CultureInfo.CurrentCulture), but that breaks file system matching, and it is somewhat unclear what edge cases may break this strategy.
Thus, it appears case conversion first requires detection as to whether a URL is static or dynamic before choosing one of two conversion methods. For static URLs there is uncertainty how to change case without breaking the Windows file system. For dynamic URLs it is questionable if case conversion using culture will similarly break the URL.
Whew! Anyone have a solution to this mess? Or should I just close my eyes and pretend everything is ASCII?
I would challenge the premise here that there is any utility whatsoever in attempting to auto-convert URLs to lowercase.
Whether a full URL is case-sensitive or not depends entirely on the web server, web application framework, and underlying file system.
You're only guaranteed case-insensitivity in the scheme (http://, etc.) and hostname portions of the URL. And remember that not all URL schemes (file and news, for example) even include a hostname.
Everything else can be case-sensitive to the server, including paths (/), filenames, queries (?), fragments (#), and authority info (usernames/passwords before the # in mailto, http, ftp, and some other schemes).
You have some incompatible goals.
Have a culture-sensitive case-lowering. If Turkish seems bad, you don't want to know about some of the Georgian scripts, never mind that ß is either upper-cased to SS or, less commonly, to SZ; in either case, to have a full case-folding where lower("ß") will match lower(upper("ß")), you need to consider it equivalent to at least one of those two-character sequences. Generally we aim for case-folding rather than case-lowering if possible (not possible here).
Use this in a non-culture-sensitive context. URIs are ultimately opaque strings. That they may be human-readable is useful for coders, users, search engines and marketers alike, but their ultimate job is to identify a resource by a direct, case-sensitive comparison.
Map this to NTFS, which has case-preserving case-insensitivity based on the mappings in the $UpCase file: it compares the upper-cased forms of names (at least it doesn't have to decide whether Σ lower-cases to σ or ς) in a culture-insensitive manner.
Presumably do well in terms of SEO and human readability. This may well be part of your original goal, but whileThisIsNotVeryEasyToReadOrParse itseasierforbothpeopleandmachinesthanthis. Case-folding loses information.
I suggest a different approach.
Start with your starting string, whatever that is and wherever it came from (NTFS filename, database entry, HttpHandler binding in web.config). Have that as your canonical form. By all means have rules that people should create these strings according to some canonical form, and perhaps enforce it where you can, but if something slips by that breaks your rules, then accept it as the official canonical name for that resource no matter how much you dislike it.
As much as possible the canonical name should be the only one "seen" by the outside world. This can be enforced programmatically or just a matter of it being best practice, as canonicalising after the fact with 301s won't solve the fact that outside entities don't know you do so until they dereference the URI.
When a request is received, test it according to how it is going to be used. Hence while you may choose to use a particular culture (or not) for those cases where you perform the resource-lookup yourself, with so-called "static" URIs, your logic can deliberately follow that of NTFS by simply using NTFS to do the work:
Find mapped file ignoring the matter of case sensitivity for now.
If non-match then 404, who cares about case?
If find, do case-sensitive ordinal comparison, if it doesn't match then 301 to the case-sensitive mapping.
Otherwise, proceed as usual.
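The steps above can be sketched by letting a directory scan stand in for the file-system lookup. The status codes and helper name are illustrative, and OrdinalIgnoreCase is used here as the approximation of NTFS matching discussed earlier:

```csharp
using System;
using System.IO;
using System.Linq;

static class StaticUrlResolver
{
    // Resolve a requested file name against a directory: 404 if
    // nothing matches case-insensitively, 301 to the stored name if
    // only the case differs, 200 on an exact case-sensitive match.
    public static (int Status, string Canonical) Resolve(
        string directory, string requestedName)
    {
        string match = Directory.EnumerateFiles(directory)
            .Select(Path.GetFileName)
            .FirstOrDefault(n => string.Equals(n, requestedName,
                StringComparison.OrdinalIgnoreCase));

        if (match == null) return (404, null);
        if (!string.Equals(match, requestedName, StringComparison.Ordinal))
            return (301, match);
        return (200, match);
    }
}
```

The key design point is that the stored name on disk, not any case-converted form of the request, is treated as canonical, so no lowercasing rule ever needs to be chosen.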
Edit:
In some ways the question of domain names is more complicated. The rules for IDN have to cover more issues with less room for manoeuvre. However, it's also simpler, at least as far as case-canonicalising goes.
(I'm going to ignore canonicalising of whether or not www. is used etc. though I'd guess it's part of the same job here, it's pushing the scope and we could end up writing a book between us if we don't stop somewhere :)
IDNs have their own case canonicalisation (and some other forms of normalisation) rules, defined in RFC 3491. If you're going to canonicalise domain names on case, follow that.
Makes it nice and simple to answer, doesn't it? :)
There's also less pressure in a way, for while search engines have to recognise that http://example.net/thisisapath and http://example.net/thisIsAPath may be the same resource, they also have to recognise that they might be different, and that's where all of the SEO advantage of canonicalising on one of them (doesn't matter which) comes from.
However, they know that example.net and EXAMPLE.NET can't possibly be different sites, so there's little SEO advantage in making sure they're the same (still nice for things like caches and history lists that don't make that jump themselves). Of course, the issue remains with the fact that www.example.net or even maAndPasExampleEmporium.us might be the same site, but again, that moves away from case issues.
There's also the simple matter that most of the time we never have to deal with more than a couple of dozen different domains, so sometimes working harder rather than smarter (i.e. just make sure they're all set up right and don't do anything programmatically!) can do the trick.
A final note though, it's important not to canonicalise a third-party URI. You can end up breaking things if you change the path (they may not be treating it case-insensitively) and you might at least end up breaking their slightly different canonicalisation. Best to leave them as is at all times.
Firstly, never use case transformations to compare strings. It needlessly allocates a string, has a needless small performance impact, could result in a NullReferenceException if the value is null, and can produce an incorrect comparison.
If this is important enough to you, I would manually traverse the file system and run your own comparisons against each file/directory name. You should be able to use the Accept-Language HTTP header to find a suitable culture to use. Once you have the CultureInfo, you can use it to perform the string comparisons:
var ci = CultureInfo.CurrentCulture; // Use Accept-Language to derive this.
ci.CompareInfo.Compare("The URL", "the url", CompareOptions.IgnoreCase);
I would only do this on an HTTP 404; the 404 handler would search for a matching file and then HTTP 301 the user to the correctly-cased URL (as manual file-system traversal can get expensive).

Can Spaces Exist Within A File Extension?

I'm currently working with some code that saves a file to a user-supplied filename. If the user passes in a filename with no extension, the code auto-detects the extension based on the file type (stored internally).
However, I'm having a hard time determining whether the filename passed to the code has an extension. I'm using Path.HasExtension(filename) and Path.GetExtension(filename), but they seem to exhibit strange behavior:
File.EXT => .EXT is the extension. This is fine.
This Is A File.EXT => .EXT is the extension. This is also fine.
This Is A File. Not An Extension => . Not An Extension is the extension. However, I would think of this as a file without an extension. Windows thinks so too when I create a file with this name (creating a file with an unrecognized extension causes Windows to describe it as an "EXTENSIONNAME File", whereas files without an extension, such as this one, are just called "File").
This Is A File.Not An Extension => .Not An Extension is the extension. Same problem as above.
Also note that this same behavior is evident in Path.GetFileNameWithoutExtension(filename) (e.g. it reports the filename without extension on the last two examples to be just This Is A File).
So what I'm taking from this is that .NET and Windows differ on what they think of as an extension.
The Question:
I'm wondering if it's OK for me to implement code such as this:
if(!Path.HasExtension(filename) || Path.GetExtension(filename).Contains(" ")) {...}
since that would pull my code's definition of a proper extension more in line with how Windows treats things. Or is there something I'm missing here which explicitly says I must allow spaces in my extensions?
I've searched and found this slightly similar question, but the documents linked therein only specify that it's not recommended to end the extension with a space or period; they say nothing about spaces within the extension.
The extension on a filename in Windows is purely a convention. The GetExtension and HasExtension methods only look for a dot in the filename and act accordingly. You are free to put spaces anywhere you like within the filename (including the extension).
When you say "Windows thinks so too", it's really just some code in Explorer that tries to parse out extensions, and it simply uses a slightly different algorithm than .NET.
How the filesystem handles names and how the Windows shell (i.e. Explorer) handles file names are two completely different beasts.
The filesystem doesn't care about spaces, dots or anything else -- to it, the filename is just one opaque string (with some restrictions on allowed characters). The name/extension separation is just a made-up convention. The shell, on the other hand, is free to make up its own interpretation of what an extension is because its purpose is not to store and retrieve file information but rather to provide the user with a better experience. So don't go looking there for answers.
I would suggest going with what the System.IO methods return (because following the convention is good), but you can do whatever you like in your code if there's a good reason for it.
There is no official definition of what an extension is. The common convention is that everything after the final . is the extension.
However, if you grabbed a huge list of all commonly-used extensions, I think you'd only find a handful of examples where spaces in an extension are used.
I would say, disallow spaces in extensions. 999/1000 times the user didn't mean it as an extension.
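The check proposed in the question can be wrapped in a small helper. A sketch (the helper name is illustrative; it simply combines Path.HasExtension with a space test on whatever Path.GetExtension returns):

```csharp
using System;
using System.IO;

static class ExtensionCheck
{
    // Treat a name as having a "proper" extension only if Path sees
    // an extension at all and that extension contains no spaces.
    public static bool HasProperExtension(string fileName) =>
        Path.HasExtension(fileName) &&
        !Path.GetExtension(fileName).Contains(" ");
}
```

With this, "File.EXT" and "This Is A File.EXT" qualify, while "This Is A File. Not An Extension" and "NoExtension" do not, matching how Explorer labels such files.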
To quote Wikipedia on filenames:
. (DOT): allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows. In other OSes, usually considered as part of the filename, and more than one full stop may be allowed.
