I was wondering about the difference between \ and / in file paths. I have noticed that sometimes a path contains / and sometimes it contains \.
It would be great if anyone can explain when to use \ and /.
/ is the path separator on Unix and Unix-like systems. Modern Windows can generally use both \ and / interchangeably for filepaths, but Microsoft has advocated for the use of \ as the path separator for decades.
This is done for historical reasons that date as far back as the 1970s, predating Windows by over a decade. In the beginning, MS-DOS (the foundation of early Windows) didn't support directories. Unix had directory support using the / character since the beginning. However, when directories were added in MS-DOS 2.0, Microsoft and IBM were already using the / character for command switches, and because of DOS's lightweight parser (descended from QDOS, designed to run on lower-end hardware), they couldn't find a feasible way to use the / character without breaking compatibility with their existing applications.
So, to avoid errors about "missing a switch" or "invalid switch" when passing filepaths as arguments to commands such as these:
cd/ <---- no switch specified
dir folder1/folder2 <---- /folder2 is not a switch for dir
it was decided that the \ character would be used instead, so you could write those commands like this
cd\
dir folder1\folder2
without error.
Later, Microsoft and IBM collaborated on an operating system unrelated to DOS called OS/2. OS/2 had the ability to use both separators, probably to attract more Unix developers. When Microsoft and IBM parted ways in 1990, Microsoft took what code they had and created Windows NT, on which all modern versions of Windows are based, carrying this separator agnosticism with it.
As backward compatibility has been the name of the game for Microsoft through all of the major OS transitions that they've undertaken (DOS to Win16/DOS, to Win16/Win32, to Win32/WinNT), this peculiarity stuck, and it will probably exist for a while yet.
It's for this reason that this discrepancy exists. It should really have no effect on what you're doing because, like I said, the WinAPI can generally use them interchangeably. However, 3rd party applications will probably break if you pass a / when they expect a \ between directory names. If you're using Windows, stick with \. If you're using Unix or URIs (which have their foundation in Unix paths, but that's another story entirely), then use /.
In the context of C#: It should be noted, since this is technically a C# question, that if you want to write more "portable" C# code that works on both Unix and Windows (even if C# is predominantly a Windows language), you might want to use the Path.DirectorySeparatorChar field so your code uses the preferred separator on that system, and use Path.Combine() to append paths properly.
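To make the separator rules above concrete outside of C#, here is a small Python sketch using pathlib's "pure" path classes, which apply Windows or POSIX parsing rules on any machine. The folder and file names are just placeholders, and this only illustrates the parsing rules, not what any particular third-party application will accept.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows path rules: both separators are accepted and normalized to "\".
forward = PureWindowsPath("C:/Windows/Temp")
backward = PureWindowsPath(r"C:\Windows\Temp")
same_on_windows = forward == backward

# POSIX path rules: only "/" separates; "\" is an ordinary filename character.
posix_parts = PurePosixPath(r"folder1\folder2").parts

# Building a path without hard-coding a separator (names are placeholders).
joined = PureWindowsPath("C:/") / "Folder1" / "Folder2" / "file.txt"

print(same_on_windows)
print(posix_parts)
print(joined)
```

The last line plays the same role as Path.Combine: the library, not your string literal, decides which separator ends up in the result.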
MS-DOS 1.0 retained the command line option (or switch) character convention of '/' from CP/M. At that time there was no directory structure in the file system and no conflict.
When Microsoft developed the more Unix like environment with MS-DOS (and PC-DOS) 2.0, they needed to represent the path separator using something that did not conflict with existing command line options. Internally, the system works equally well with either '/' or '\'. The command processor (and many applications) continued to use the '/' as a switch character.
A CONFIG.SYS entry SWITCHAR=- could be used to override the / default to improve Unix compatibility. This makes built in commands and standard utilities use the alternate character. The Unix path separator could then be unambiguously used for file and directory names. This entry was removed in later versions, but a DOS call was documented to set the value after booting.
This was little used and most third-party tools remained unchanged. The confusion persists. Many ports of Unix tools retain the '-' switch character while some support both conventions.
The follow-on PowerShell command processor implements rigorous escaping and switch parameters and largely avoids the confusion except where legacy tools are used.
Neither the question nor the answer relates to C#.
A URL, standardized in RFC 1738, always uses forward slashes,
regardless of platform.
A file path and a URI are different. \ is correct in a Windows file
path and / is correct in a URI.
Several browsers (namely, Firefox & Opera) fail catastrophically when
encountering URIs with backslashes.
Use System.IO.Path.DirectorySeparatorChar to get the current path separator.
This can be a relevant resource.
On Unix-based systems \ is an escape character: in a shell, for example, a \ before a space tells the parser that the space is part of the argument and not the end of it. On Unix systems / is the directory separator.
On Windows \ is the directory separator, but the / cannot be used in file or directory names.
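As a rough illustration of the escaping behaviour described above, Python's shlex module splits a command line using POSIX shell-like rules. The command and file name here are made up for the example.

```python
import shlex

# A backslash before the space keeps "my file.txt" together as one argument.
escaped = shlex.split(r"cat my\ file.txt")

# Without the backslash, the space ends the argument.
unescaped = shlex.split("cat my file.txt")

print(escaped)
print(unescaped)
```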
You shouldn't be using either in C#. You should always use the Path class. This contains a method called Path.Combine that can be used to create paths without specifying the separator yourself.
Example usage:
string fullPath = System.IO.Path.Combine(@"C:\", "Folder1", "Folder2", "file.txt"); // a bare "C:" would give the drive-relative path "C:Folder1\..."
Apart from the answers given, it is worth mentioning that \ is widely used for special characters (such as \n \t) in programming languages, text editors and general systems that apply lexical analysis.
If you are programming, for instance, it is inconvenient at times to have to escape a backslash with another one (\\) in order to use it properly, or to have to use verbatim strings, such as C# @"\test".
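The same inconvenience shows up in most languages with backslash escapes. As a sketch in Python (the path is a placeholder), a doubled backslash and a raw string produce the same text, while the unescaped form silently turns \t and \n into a tab and a newline:

```python
escaped = "C:\\temp\\new"   # each backslash doubled
raw = r"C:\temp\new"        # raw string: backslashes taken literally
broken = "C:\temp\new"      # \t becomes a tab, \n becomes a newline

print(escaped == raw)
print(len(raw), len(broken))
```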
Of course, as mentioned before, web URIs use forward slash by standard but both slashes work in the latest and most common command line tools.
UPDATE: After searching a little bit, it turns out the whole story between / and \ goes back in "computer history", to the days of DOS and the Unix-based systems of that time. HowToGeek has an interesting article about this story.
In short, DOS 1.0 was initially released by IBM with no directory support, and / was used for another ("switching") command functionality. When directories were introduced in version 2.0, / was already in use, so IBM chose the visually closest symbol, which was \. On the other hand, Unix used / for directories as standard.
When users started to use many different systems, they became confused, which led OS developers to try to make their systems work with both. This even applies to URLs, as some browsers support the http:\\www.test.com\go format. This had drawbacks in general, but the situation still stands today for backward compatibility reasons, with an attempt to support both slashes on Windows, even though Windows is no longer based on DOS.
\ is used for Windows local file paths and network paths as in:
C:\Windows\Temp\ or \\NetworkSharedDisk\Documents\Archive\
/ is what is required by standard URIs as in:
http://www.stackoverflow.com/
http://example.com/something/somewhere//somehow/script.js
Does the double slash break anything on the server side? I have a script that parses URLs and I was wondering if it would break anything (or change the path) if I replaced multiple slashes with a single slash. Especially on the server side, some frameworks like CodeIgniter and Joomla use segmented URL schemes and routing. I would just want to know if it breaks anything.
RFC 2396, which defines generic URI syntax, specifies the path separator to be a single slash.
However, unless you're using some kind of URL rewriting (in which case the rewriting rules may be affected by the number of slashes), the uri maps to a path on disk, but in (most?) modern operating systems (Linux/Unix, Windows), multiple path separators in a row do not have any special meaning, so /path/to/foo and /path//to////foo would eventually map to the same file.
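For what the question describes, a minimal sketch in Python might look like the following. The function name collapse_slashes is my own, and it assumes you only want to touch the path component, leaving the scheme's "//" and any query string alone. Whether collapsing is safe still depends on the server, as discussed in the other answers.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Replace runs of "/" in the path component with a single "/"."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


print(collapse_slashes("http://example.com/something/somewhere//somehow/script.js"))

# POSIX path normalization treats repeated separators the same way.
print(posixpath.normpath("/path//to////foo"))
```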
An additional thing that might be affected is caching. Since both your browser and the server cache individual pages (according to their caching settings), requesting the same file multiple times via slightly different URIs might affect the caching (depending on server and client implementation).
The correct answer to this question is it depends upon the implementation of the server!
Preface: Double-slash is syntactically valid according to RFC 2396, which defines URL path syntax. As amn explains, it therefore implies an empty URI segment. Note however that RFC 2396 only defines the syntax, not semantics of paths, including empty path segments, so it is up to your server to decide the semantics of the empty path.
You didn't mention the server software stack you're using, perhaps you're even rolling your own? So please use your imagination as to what the semantics could be!
Practically, I would like to point out some everyday semantic-related reasons which mean you should avoid double slashes even though they are syntactically valid:
Since not everyone expects an empty segment to be valid, it can cause bugs. And even though your server technology of today might be compatible with it, either your server technology of tomorrow or the next version of your server technology of today might decide not to support it any more. Example: ASP.NET MVC Web API library throws an error when you try to specify a route template with a double slash.
Some servers might interpret // as indicating the root path. This can either be on-purpose, or a bug - and then likely it is a security bug, i.e. a directory traversal vulnerability.
Because it is sometimes a bug, and a security bug, some clever server stacks and firewalls will see the substring '//', deduce you are possibly making an attempt at exploiting such a bug, and therefore they will return 403 Forbidden or 400 Bad Request etc, and refuse to actually do any further processing of the URI.
URLs don't have to map to filesystem paths. So even if // in a filesystem path is equivalent to /, you can't guarantee the same is true for all URLs.
Consider the declaration of the relevant path-absolute non-terminal in "RFC3986: Uniform Resource Identifier (URI): Generic Syntax" (specified, as is typical, in ABNF syntax):
path-absolute = "/" [ segment-nz *( "/" segment ) ]
Then consider the segment declaration a few lines further down in the same document:
segment = *pchar
If you can read ABNF, the asterisk (*) specifies that the following element pchar may be repeated multiple times to make up a segment, including zero times. Learning this and re-reading the path-absolute declaration above, you can see that a potentially empty segment implies that the second "/" may repeat indefinitely, hence allowing valid combinations like ////// (arbitrary length of at least one /) as part of path-absolute (which itself is used in specifying the rule describing a URI).
As all URLs are URIs we can conclude that yes, URLs are allowed multiple consecutive forward slashes, per quoted RFC.
But it's not like everyone follows or implements URI parsers per specification, so I am fairly sure there are non-compliant URI/URL parsers and all kinds of software that stacks on top of these where such corner cases break larger systems.
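If ABNF is hard to read, the two rules quoted above can be transcribed into a regular expression. This is only a sketch in Python: the names PCHAR, PATH_ABSOLUTE and is_path_absolute are mine, and the character class is my transcription of the RFC 3986 pchar rule (unreserved, pct-encoded, sub-delims, ":" and "@"), so double-check it against the RFC before relying on it.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# path-absolute = "/" [ segment-nz *( "/" segment ) ]
# segment-nz = 1*pchar (non-empty), segment = *pchar (may be empty)
PATH_ABSOLUTE = re.compile("/(?:" + PCHAR + "+(?:/" + PCHAR + "*)*)?")


def is_path_absolute(path):
    """Return True if the whole string matches the path-absolute rule."""
    return PATH_ABSOLUTE.fullmatch(path) is not None


print(is_path_absolute("/something/somewhere//somehow/script.js"))
print(is_path_absolute("/a/////"))
print(is_path_absolute("//a"))
```

Note that only the first segment has to be non-empty in this particular rule, which is why the last example does not match even though empty segments are fine everywhere after it.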
One thing you may want to consider is that it might affect your page indexing in a search engine. According to this web page,
A URL with the same path repeated 3 times will not be indexed in Google
The example they use is:
example.com/path/path/path/
I haven't confirmed this would also be true if you used example.com///, but I would certainly want to find out if SEO was critical for my website.
They mention that "This is because Google thinks it has hit a URL trap." If anyone else knows the answer for sure, please add a comment to this answer; otherwise, I thought it relevant to include this case for consideration.
Yes, it can most definitely break things.
The spec considers http://host/pages/foo.html and http://host/pages//foo.html to be different URIs, and servers are free to assign different meanings to them. However, most servers will treat paths /pages/foo.html and /pages//foo.html identically (because the underlying file system does too). But even when dealing with such servers, it's easily possible for an extra slash to break things. Consider the situation where a relative URI is returned by the server.
http://host/pages/foo.html + ../images/foo.png = http://host/images/foo.png
http://host/pages//foo.html + ../images/foo.png = http://host/pages/images/foo.png
Let me explain what that means. Say your server returns an HTML document that contains the following:
<img src="../images/foo.png">
If your browser obtained that page using
http://host/pages/foo.html # Path has 2 segments: "pages" and "foo.html"
your browser will attempt to load
http://host/images/foo.png # ok
However, if your browser obtained that page using
http://host/pages//foo.html # Path has 3 segments: "pages", "" and "foo.html"
you'll probably get the same page (because the server probably doesn't distinguish /pages//foo.html from /pages/foo.html), but your browser will erroneously try to load
http://host/pages/images/foo.png # XXX
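The resolution above can be reproduced with a small hand-written sketch in Python. It is a simplified version of the merge and dot-segment removal steps from RFC 3986 section 5.2 that only handles absolute base paths and relative references without a scheme, host or query; the function names are mine. I deliberately avoid a library URL joiner here because some implementations drop empty segments, which would hide the effect.

```python
def remove_dot_segments(path):
    """Resolve "." and ".." in an absolute path, keeping empty segments."""
    segments = path.split("/")[1:]  # drop the "" before the leading "/"
    out = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":
            if last:
                out.append("")      # keep the trailing slash
        elif seg == "..":
            if out:
                out.pop()           # an empty segment counts as a segment
            if last:
                out.append("")
        else:
            out.append(seg)
    return "/" + "/".join(out)


def resolve(base_path, reference):
    """Merge a relative reference onto a base path, then remove dot segments."""
    merged = base_path[: base_path.rfind("/") + 1] + reference
    return remove_dot_segments(merged)


print(resolve("/pages/foo.html", "../images/foo.png"))
print(resolve("/pages//foo.html", "../images/foo.png"))
```

In the second call the ".." consumes the empty segment rather than "pages", which is exactly why the image ends up under /pages/.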
You may be surprised for example when building links for resources in your app.
<script src="mysite.com/resources/jquery//../angular/script.js"></script>
will not resolve to mysite.com/resources/angular/script.js but to mysite.com/resources/jquery/angular/script.js, which is probably not what you wanted.
Double slashes are evil, try to avoid them.
Your question is "does it break anything". In terms of the URL specification, extra slashes are allowed. Don't read the RFC, here is a quick experiment you can try to see if your browser silently mangles the URL:
echo '<?= $_SERVER["REQUEST_URI"];' > tmp.php
php -S localhost:4000 tmp.php
Then request http://localhost:4000/hello//world in your browser. I tested macOS 10.14 (18A391) with Safari 12.0 (14606.1.36.1.9) and Chrome 69.0.3497.100 and both get the result:
/hello//world
This indicates that an extra slash is visible to the web application.
Certain use cases will be broken when using a double slash. This includes URL redirects/routing that are expecting a single-slashed URL or other CGI applications that are analyzing the URI directly.
But for normal cases of serving static content, such as your example, this will still get the correct content. But the client will get a cache miss against the same content accessed with different slashes.
Are there any conventions (either written or just generally understood) for when to use a forward slash (/) or a hyphen (-) when reading arguments/flags from a command line?
C:\> myprogram.exe -a
C:\> myprogram.exe /a
The two seem to be interchangeable in my experience, but I haven't used enough command line tools to say I've spotted any rules or patterns.
Is there a good reason that either of them is used at all? Could I theoretically use an asterisk (*) if I wanted to?
You can (theoretically) use whatever you want, as the parameters are just strings passed to your command-line program.
Windows convention seems to prefer the use of the forward slash ipconfig /all, though there are programs that take a hyphen gacutil -i or even a sort-of environment variable syntax setup SKUUPGRADE=1.
*Nix convention seems to prefer the hyphen -v for single-letter parameters, and double hyphen --verbose for multi-letter parameters.
I tend to prefer hyphens, as they are more OS-agnostic (forward slashes are path delimiters in some OSes) and used in more modern Windows apps (nuget, for example).
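To show that the prefix really is just a convention, here is a sketch using Python's argparse, whose prefix_chars parameter lets a parser accept more than one prefix character. The program name and the -a flag are taken from the question; everything else is illustrative.

```python
import argparse

# Accept both "-" and "/" as option prefixes.
parser = argparse.ArgumentParser(prog="myprogram", prefix_chars="-/")
parser.add_argument("-a", "/a", dest="a", action="store_true")

unix_style = parser.parse_args(["-a"])
windows_style = parser.parse_args(["/a"])
no_flag = parser.parse_args([])

print(unix_style.a, windows_style.a, no_flag.a)
```

The downside mentioned above applies here too: once / is an option prefix, an argument such as a Unix path starting with / looks like an option to the parser.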
Edit:
This would be a good place to recommend a library that does .NET command-line argument parsing: http://commandline.codeplex.com/
See also Command line options style - POSIX or what?.
The tradition in DOS and Windows is to use a forward slash, as in /a or /extend. The tradition of using -a comes from Unix (and possibly elsewhere).
There's a GNU standard in which a single dash is used for one-letter flags, like -e -d, and they can be merged into -ed (so -ed is equivalent to -e -d). Then many-letter switches need two dashes, as in --extend --display. Sometimes it's only necessary to write as much of the word as is sufficient to deduce what switch is meant, so for example --disp might be a shorthand for --display if no other switch begins with the letters disp....
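Many argument parsers implement these rules for you. As a sketch, Python's argparse handles both merged short flags and unambiguous abbreviations of long options with its default settings; the flag names are the ones used above and the program name is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser(prog="myprogram")
parser.add_argument("-e", "--extend", action="store_true")
parser.add_argument("-d", "--display", action="store_true")

# "-ed" is parsed as "-e -d".
merged = parser.parse_args(["-ed"])

# "--disp" is accepted because no other long option starts with "disp".
abbreviated = parser.parse_args(["--disp"])

print(merged.extend, merged.display)
print(abbreviated.extend, abbreviated.display)
```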
Usually it's / on Windows and -/-- on Unix systems for short/long options. But there's no rule for that, so it is actually up to you.
Old DOS commands used the prefix / for optional parameters. Microsoft is now moving towards supporting the POSIX use of - to denote parameters as documented in their PowerShell Command-Line Standard documentation. This is because their operating system now sees both \ and / as directory separator characters.
https://technet.microsoft.com/en-us/library/ee156811.aspx
A leading forward-slash (/) is common among Windows apps. Single hyphens (-) are common for short options (those consisting of a single letter) in applications that are POSIX-compliant. Double hyphens (--) are common for long options in such applications.
See this link for POSIX info. Or, see this SO post.
The convention is:
- for single-letter flags, and they can be chained (ex: rm -Rf is the same as rm -R -f)
-- for flags with more than one letter (ex: rm --recursive --force)
Using / is an old standard that Windows inherited from DOS, and DOS from CP/M, and it should not be used for new software. It creates ambiguities on non-Windows systems and for the reader. Modern Microsoft tools such as .NET and PowerShell use - for flags, so even Microsoft doesn't want to use / anymore.
You can use anything you want, or no leading character at all. It is more about adhering to standards than anything. When other users use your application they will think to include a - or / when adding arguments because that is what they are used to.
Is the pattern matching logic used by this API exposed for reuse somewhere in the .NET Framework?
Something of the form FilePatternMatch( string searchPattern, string fileNameToTest ) is what I'm looking for.
I'm implementing a temporary workaround for WP7 not filtering the results for this overload and I'd like the solution to both provide a consistent experience and avoid reinventing this functionality if it is exposed.
If the behaviour is not exposed for reuse, a regular expression solution (like glob pattern matching in .NET) will suffice and would save me spending the time to test the fine details of what the behaviour should be.
Perhaps one of the answers posted in the thread linked above is correct. Since I haven't confirmed the exact behaviour as yet, I wasn't able to determine this at a glance. Feel free to point me to one of those answers if you know it is behaviourally an exact match to the API referenced in the question title.
I could assume the pattern matching is consistent with how DOS handled * and ? in 8.3 file names (I'm familiar with behavioural nuances of that implementation), but it's reasonable to assume Microsoft has evolved pattern matching behaviour for file names in the decade+ since so I thought I would check before proceeding on that assumption.
The pattern matching rules used by IsolatedStorageFile.GetFileNames() are the same as those used in System.IO.Path. They both use the Win32 API FindFirstFile/FindNextFile functions, namely:
The directory or path, and the file
name, which can include wildcard
characters, for example, an asterisk
(*) or a question mark (?).
This parameter should not be NULL, an
invalid string (for example, an empty
string or a string that is missing the
terminating null character), or end in
a trailing backslash (\).
If the string ends with a wildcard,
period (.), or directory name, the
user must have access permissions to
the root and all subdirectories on the
path.
In the ANSI version of this function,
the name is limited to MAX_PATH
characters. To extend this limit to
32,767 wide characters, call the
Unicode version of the function and
prepend "\\?\" to the path. For more
information, see Naming a File.
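If a regular expression fallback is acceptable, as the question suggests, the basic * and ? wildcards can be translated as in the following sketch. It is written in Python, the function name file_pattern_match just mirrors the signature asked for in the question, and it is an approximation: it assumes case-insensitive matching and makes no attempt to reproduce any FindFirstFile quirks such as matching against short 8.3 names.

```python
import re


def file_pattern_match(search_pattern, file_name):
    """Match file_name against a pattern where * is any run and ? is one char."""
    parts = []
    for ch in search_pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    regex = "".join(parts)
    return re.fullmatch(regex, file_name, flags=re.IGNORECASE | re.DOTALL) is not None


print(file_pattern_match("*.txt", "Notes.TXT"))
print(file_pattern_match("data?.csv", "data10.csv"))
```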
Given that (at least on NTFS) the filesystem on Windows is case insensitive, I would like to compare String fileA to String fileB as such:
fileA.Equals(fileB, StringComparison.CurrentCultureIgnoreCase)
The question then becomes which culture I should use, does the default current (ui?) culture suffice? I can't seem to find any BCL methods for this purpose.
You should use StringComparison.OrdinalIgnoreCase, according to Best Practices for Using Strings in the .NET Framework.
The string behavior of the file system, registry keys and values, and environment variables is best represented by StringComparison.OrdinalIgnoreCase.
If you use a culture for matching the strings, you may get into a situation where for example the names "häl.gif" and "hal.gif" would be considered a match.
This is not possible to do reliably.
Yes, the file system is case-insensitive.
But the case conversion table is stored on the file system itself (for NTFS), and it does change between versions (for instance the Vista case conversion table was brought to the Unicode 5 level, so Vista NTFS and XP NTFS have different case conversion rules).
And the thing that matters is the OS that formatted the file system, not the current OS.
Then you can run into all kinds of problems with other file systems: Mac OS does some kind of Unicode normalization (not the standard one), and Linux does not do anything, but Samba (which implements the Windows file sharing protocol) does, and it has different tables than Windows.
So what happens if I map a letter to a network disk shared by Linux or Mac OS?
In general you should never try to compare file names. If you want to know if it is there, try to access it.
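To give a feel for the normalization problem mentioned above, here is a sketch in Python using the standard Unicode normalization forms. The file name is made up, and this shows only standard NFC/NFD, not whatever variant a particular file system applies.

```python
import unicodedata

composed = "caf\u00e9.txt"                           # "é" as one code point
decomposed = unicodedata.normalize("NFD", composed)  # "e" + combining accent

# The two names look identical on screen but differ code point by code point.
print(composed == decomposed)
print(len(composed), len(decomposed))

# They only compare equal once both are put in the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

A plain ordinal comparison would treat these as two different names, while a file system that normalizes would treat them as the same file.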
Marcus,
You might want to look at the answer for another StackOverflow question, which is very similar: Win32 File Name Comparison, which in turn mentions http://www.siao2.com/2005/10/17/481600.aspx .
Following a link in another answer to the same question and digging further, I came across the following MSDN article http://msdn.microsoft.com/en-us/library/ms973919.aspx . It is worth a read in general, but when it comes to file name comparison it recommends using StringComparison.OrdinalIgnoreCase. See Table 1 in the article, which contains file paths as one of the data types handled, or the following quote:
So, when interpreting file names, cookies, or anything else where something like the å combination can appear, ordinal comparisons still offer the most transparent and fitting behavior.
Hope this helps,
Boaz
Maybe you could try this:
http://msdn.microsoft.com/en-us/library/zkcaxw5y.aspx
You could use InvariantCulture (look at http://msdn.microsoft.com/en-us/library/4c5zdc6a.aspx).
In your example:
FileA.Equals(FileB,StringComparison.InvariantCultureIgnoreCase )
I tried this.
Path.GetFullPath(path1).Equals(Path.GetFullPath(path2))
I've noticed that when using the Settings object that's created by a Windows Forms application, any spaces in the "Company Name" field of the assembly info are replaced by underscores in the path of the user.config file. For example, in XP the path to the user.config file will be something like:
\Documents and Settings\user\Local Settings\Application Data\Company_Name_Here\App\Version\user.config
But this only seems to be happening to my own applications. I've got lots of .NET applications installed on my machine, but none of the other directory names under Application Data contain underscores (the spaces are preserved).
What gives? It's not a big deal, but I'm just wondering why this only seems to be happening to my applications, and if there's a way to change this behavior that I'm not aware of.
Quoting someone who worked at Microsoft
<Company Name> - is typically the string specified by the AssemblyCompanyAttribute (with the caveat that the string is escaped and truncated as necessary, and if not specified on the assembly, we have a fallback procedure).
and
Q: Why is the path so obscure? Is there any way to change/customize it?
A: The path construction algorithm has to meet certain rigorous requirements in terms of security, isolation and robustness. While we tried to make the path as easily discoverable as possible by making use of friendly, application supplied strings, it is not possible to keep the path totally simple without running into issues like collisions with other apps, spoofing etc.
The LocalFileSettingsProvider does not provide a way to change the files in which settings are stored. Note that the provider itself doesn't determine the config file locations in the first place - it is the configuration system. If you need to store the settings in a different location for some reason, the recommended way is to write your own SettingsProvider. This is fairly simple to implement and you can find samples in the .NET 2.0 SDK that show how to do this. Keep in mind however that you may run into the same isolation issues mentioned above.
might give some hint of explanation.
So other applications might have used a custom settings provider that supports whitespace.
The restrictions of the default .NET settings provider are also mentioned here:
Each application setting must have a unique name; the name can be any combination of letters, numbers, or an underscore that does not start with a number, and cannot contain spaces. The name can be changed through the Name property.