Why "test user-doc.doc" ==> TESTUS~1.DOC?

Why "test user-doc.doc" ==> TESTUS~1.DOC? - c#

I wrote a c# program and I associated it with file extension like DOC in a PC without MS-Office installed. Then, I double-clicked any file which name contains blank characters, my program will be launched to open that file. I used below statement:
string[] args = Environment.GetCommandLineArgs();
and then args[1] will contain full path file name of that file. Then, I can open it. But the problem now is that if the file name contains blank characters, args[1] contains file name different from the real one. As title, if my file is in e:\tmp3 and file name is test uesr=doc.doc, I expected args[1] contains
"e:\tmp3\test user-doc.doc",
but it actually contains
"E:\tmp3\TESTUS~1.DOC"
Could anyone tell me why and how to resolve it? Thanks.

As already mentioned these are 8.3 file names. If you need to convert from a short name to a full name then you can easily do this with C#.
new FileInfo("E:\tmp3\TESTUS~1.DOC").FullName
Going the other way requires a PInvoke call to GetShortPathName. Be aware that this doesn't work on all NTFS volumes as short names can be turned off but they are turned on by default for the volume the OS is on.
class Program
{
[DllImport("kernel32.dll", SetLastError = true)]
private static extern int GetShortPathName(String pathName, StringBuilder shortName, int cbShortName);
static void Main(string[] args)
{
var fullname = args[0];
var shortPathBuilder = new StringBuilder(fullname.Length);
GetShortPathName(fullname, shortPathBuilder, shortPathBuilder.Length);
var shortname = shortPathBuilder.ToString();
}
}

You should put double quote marks around the %1 replacement in the shell\open\command registry key. For example:
"C:\Program Files\MyApp\MyApp.exe" "%1"
rather than
"C:\Program Files\MyApp\MyApp.exe" %1
If you don't include the double quote marks, Windows detects that filenames with spaces (or other parameter separators) are unlikely to work, and substitutes the short file name. This is for compatibility with 16-bit Windows programs (the HKCR\shell key was introduced for Windows 3.1).

They are called 8.3 Filenames. Basically, they are an alias for the file in the File Allocation Table that shortens the path to the file.
8.3 refers to "8 characters.. then a dot.. then 3 characters". The three characters are the file extension obviously..
Also, you'll note that TESTUS~1 is 8 characters in length.
As far as I am aware, there isn't really much you can do to stop Windows from doing this. You could format your disk to be NTFS I think (I don't think NTFS is so aggresive with file "aliasing").

The issue is with space character (the blank one), as it will consider it
as args[2] i.e. test user-doc.doc will be treated as two args instead of one
due to blank character, so you can use sub string function, with calculating
total number of args as well, then first concatenate all args from args[1] to args[n]
where n is the size of args, this way you can avoid the problem

Related

Constraint a path to be within a folder

Say you want to store a file within a folder C:\A\B\C and let the user supply the file name.
Just combine them, right?
Wrong.
If the user selects something like \..\..\Ha.txt you might be in for a surprise.
So how do we restrict the result to within C:\A\B\C? It's fine if it's within a subfolder, just not over it.

I've used one of my test projects, it really doesn't matter:
Using c#10
internal class Program
{
static void Main(string[] args)
{
string template = #"F:\Projectes\Test\SourceGenerators";
string folder = #"..\..\..\..\Test1.sln";
Console.WriteLine(MatchDirectoryStructure(template, folder)
? "Match"
: "Doesn't match");
}
static bool MatchDirectoryStructure(string template, string folder)
=> new DirectoryInfo(folder).FullName.StartsWith(template);
}
As you can see, new DirectoryInfo(fileName).FullName; returns the real name of the directory.
From here you can check if it match with the desired result.
In this case the returned value is:
Match

If you're asking for a file name, then it should be just the name of the file. The more control you give to the user about subdirectories, the more they can mess with you.
The idea here is to split your path by both possible slashes (/ and \) and see if the value of any of the entries in the array is ...
string input = #"\..\..\Ha.txt";
bool containsBadSegments = input
.Split(new [] { '/', '\\' })
.Any(s => s is "..");
This answer only takes care of detecting \..\ in the path. There are plenty of other ways to input bad values, such as characters not allowed by the OS's file system, or absolute or rooted paths.

System.IO.Directory.CreateDirectory - weirdest exception ever

so, I'm trying to create a following directory:
d:\temp\ak\ty\nul
Path is constructed in the loop, starting from: d:\temp and so on, creating non-existent directories along the way, so it first creates:
d:\temp\ak
then:
d:\temp\ak\ty
and.... then it comes to the last bit nul it throws this exception:
So, what's going on - where it took \.\nul from?
The code:
string z_base_path = #"d:\temp\ak\ty";
string z_extra_path = "nul";
string z_full_path = System.IO.Path.Combine(z_base_path, z_extra_path);
System.IO.Directory.CreateDirectory(z_full_path);

In Windows, nul is a reserved file name. No file or directory may be named that. Other reserved names include:
con
prn
aux
com{0-9}
lpt{0-9}

'nul' is a device file meaning that no file/folder can have that name.
instead of
string z_extra_path = "nul";
try
string z_extra_path = "null";
or
string z_extra_path = "";
other ones are
con
aux
com1-9
lpt1-9
prn

Never knew this one until I came against it - it's worth nothing the Windows Directory reserved names and all the rest.
Taken from about article:
https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
Folder Naming Conventions
The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:
Use a period to separate the base file name from the extension in the name of a directory or file.
Use a backslash () to separate the components of a path. The backslash divides the file name from the path to it, and one directory name from another directory name in a path. You cannot use a backslash in the name for the actual file or directory because it is a reserved character that separates the names into components.
Use a backslash as required as part of volume names, for example, the "C:" in "C:\path\file" or the "\server\share" in "\server\share\path\file" for Universal Naming Convention (UNC) names. For more information about UNC names, see the Maximum Path Length Limitation section.
Do not assume case sensitivity. For example, consider the names OSCAR, Oscar, and oscar to be the same, even though some file systems (such as a POSIX-compliant file system) may consider them as different. Note that NTFS supports POSIX semantics for case sensitivity but this is not the default behavior. For more information, see CreateFile.
Volume designators (drive letters) are similarly case-insensitive. For example, "D:" and "d:" refer to the same volume.
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Integer value zero, sometimes referred to as the ASCII NUL character.
Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed. For more information about file streams, see File Streams.
Any other character that the target file system does not allow.
Use a period as a directory component in a path to represent the current directory, for example ".\temp.txt". For more information, see Paths.
Use two consecutive periods (..) as a directory component in a path to represent the parent of the current directory, for example "..\temp.txt". For more information, see Paths.
Do not use the following reserved names for the name of a file:
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9. Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended. For more information, see Namespaces.
Do not end a file or directory name with a space or a period. Although the underlying file system may support such names, the Windows shell and user interface does not. However, it is acceptable to specify a period as the first character of a name. For example, ".temp".

List files in folder which match pattern

I need to list files in directory which match some pattern.
I tried playing with Directory.GetFiles, but don't fully
get why it behaves in some way.
1) For example, this code:
string[] dirs = Directory.GetFiles(#"c:\test\", "*t");
foreach (string dir in dirs)
{
Debugger.Log(0, "", dir);
Debugger.Log(0, "", "\n");
}
outputs this:
c:\test\11.11.2007.txtGif
c:\test\12.1.1990.txt
c:\test\2.tGift
c:\test\2.txtGif
c:\test\test.txt
...others hidden
You can see some files end with f but were still returned by query, why?
2) Also, this:
string[] dirs = Directory.GetFiles(#"c:\test\", "*.*.*.txt");
foreach (string dir in dirs)
{
Debugger.Log(0, "", dir);
Debugger.Log(0, "", "\n");
}
returns this:
c:\test\1.1.1990.txt
c:\test\1.31.1990.txt
c:\test\12.1.1990.txt
c:\test\12.31.1990.txt
But according to the documentation (http://msdn.microsoft.com/en-us/library/07wt70x2(v=vs.110).aspx) I think it had to return also
this file which is in the directory:
11.11.2007.txtGif
since extension (in the query string) is 3 letters long, but it didn't. why?
(when query extension is 3 letters long, doc says it will return extensions which start with specified extensions too, e.g., see Remarks).
Am I the only one who finds these results strange?
Is there any other approach you would recommend for using when one wants to list files in folder which match certain pattern?
User in my case may arbitrarily type some pattern, and I don't want to rely on
method which I am unsure about the result (like it happened with GetFiles).

This is the way that the Windows API works - you will see the same results if you use the dir command in a command prompt. This does NOT use regular expressions! It's pretty obscure...
If you want to do your own filtering, you can do it like so:
var filesEndingInT = Directory.EnumerateFiles(#"c:\test\").Where(f => f.EndsWith("t"));
If you want to use regular expressions to match, you can do it thusly:
Regex regex = new Regex(".*t$");
var matches = Directory.EnumerateFiles(#"c:\test\").Where(f => regex.IsMatch(f));
I suspect that you will want to let the user type in a simplified form of pattern and turn it into a regular expression, e.g.
"*.t" -> ".*t$"
The regular expression to find all filenames ending in t is ".*t$":
.*t$
Debuggex Demo

All of this behavior is exactly as described in the documentation you've linked. Here's an excerpt of the pertinent bits:
When you use the asterisk wildcard character in a searchPattern such
as "*.txt", the number of characters in the specified extension
affects the search as follows:
If the specified extension is exactly three characters long, the method returns files with extensions that begin with the specified
extension. For example, "*.xls" returns both "book.xls" and
"book.xlsx".
In all other cases, the method returns files that exactly match the specified extension. For example, "*.ai" returns "file.ai" but not
"file.aif".
When you use the question mark wildcard character, this method returns
only files that match the specified file extension. For example, given
two files, "file1.txt" and "file1.txtother", in a directory, a search
pattern of "file?.txt" returns just the first file, whereas a search
pattern of "file*.txt" returns both files. NoteNote
Because this method checks against file names with both the 8.3 file
name format and the long file name format, a search pattern similar to
"1.txt" may return unexpected file names. For example, using a
search pattern of "1.txt" returns "longfilename.txt" because the
equivalent 8.3 file name format is "LONGFI~1.TXT".
http://msdn.microsoft.com/en-us/library/wz42302f%28v=vs.110%29.aspx
The last paragraph above clearly explains your results when searching for *t. You can see this by using the command dir C:\test /x to show the 8.3 filenames. Here, C:\test\11.11.2007.txtGif matches *t because its 8.3 filename is 111120~1.TXT.
For the treatment of *.*.*.txt, I think you're either mis-interpreting the first bit about three-letter file extensions or perhaps it wasn't written quite clearly. Note that they quite specifically mentioned wildcard usage 'in a searchPattern such as "*.txt"'. Your search pattern doesn't match that, so you have to read between the lines a bit to see why their comment about three-letter file extensions applies to the example they gave but not yours. Really, I think that whole top section can be ignored if you just put a bit of thought into the last bit about 8.3 filenames. The treatment of three-letter file extensions after wildcards is really just a side-effect of the 8.3 filename search behavior.
Consider the examples they gave:
"*.xls" returns both "book.xls" and "book.xlsx"
This is because the filename for "book.xls" (both 8.3 and long filename, since the name naturally complies with 8.3) and the 8.3 filename for "book.xlsx" ("BOOK~1.XLS") matches a query of "*.xls".
"*.ai" returns "file.ai" but not "file.aif"
This is because "file.ai" naturally matches "*.ai" while "file.aif" doesn't. 8.3 search behavior doesn't come into play here at all, because both filenames are already 8.3-compliant. However, even if they weren't, the same would still hold true because any 8.3 filename for a file with an extension of ".ai" is still going to end in just ".AI".
The only reason it matters whether or not the file extension in your search is exactly three characters is because the 8.3 filenames are included in the search, and 8.3 filname extensions for objects with long filenames will always have just the first three characters after the last dot in the long filename. The key part missing from the documentation above is that the "first three characters" matching is done only against the 8.3 filename.
So, let's look at the anomalies you're asking about here. (If you want any other strange behaviors explained, beyond your results for *.t and *.*.*.txt, please post them as separate questions.)
TL;DR:
Output of a search for *t includes 11.11.2007.txtGif and 2.txtGif.
This is because the 8.3 filenames match a pattern of *t.
11.11.2007.txtGif = 111120~1.TXT
2.txtGIF = 2BEFD~1.TXT
(Both 8.3 filenames end in "T".)
Output of a search for *.*.*.txt does not include 11.11.2007.txtGif.
This is because neither the long filename, nor the 8.3 filename, match a pattern of *.*.*.txt.
11.11.2007.txtGif = 111120~1.TXT
(The long filename doesn't match because it doesn't end in ".txt", and the 8.3 filename doesn't match because it only has one dot.)

https://learn.microsoft.com/en-us/dotnet/api/system.io.directoryinfo.getfiles?view=netframework-4.5
The above Microsoft documentation is wrong as usual,
it says this code:
DirectoryInfo di = new DirectoryInfo(#"C:\Users\tomfitz\Documents\ExampleDir");
Console.WriteLine("No search pattern returns:");
Console.WriteLine();
Console.WriteLine("Search pattern *2* returns:");
foreach (var fi in di.GetFiles("*2*"))
{
Console.WriteLine(fi.Name);
Console.WriteLine(fi.Fullname); // this reveals the bug
}
should return the following but it does not
It still matches against the whole file path not just the filename.
Search pattern *2* returns:
log2.txt
test2.txt

Split path by "\\" in C#

How can I split a path by "\\"? It gives me a syntax error if I use
path.split("\\");

You should be using
path.Split(Path.DirectorySeparatorChar);
if you're trying to split a file path based on the native path separator.

Try path.Split('\\') --- so single quote (for character)
To use a string this works:
path.Split(new[] {"\\"}, StringSplitOptions.None)
To use a string you have to specify an array of strings. I never did get why :)

There's no string.Split overload which takes a string. (Also, C# is case-sensitive, so you need Split rather than split). However, you can use:
string bits = path.Split('\\');
which will use the overload taking a params char[] parameter. It's equivalent to:
string bits = path.Split(new char[] { '\\' });
That's assuming you definitely want to split by backslashes. You may want to split by the directory separator for the operating system you're running on, in which case Path.DirectorySeparatorChar would probably be the right approach... it will be / on Unix and \ on Windows. On the other hand, that wouldn't help you if you were trying to parse a Windows file system path in an ASP.NET page running on Unix. In other words, it depends on your context :)
Another alternative is to use the methods on Path and DirectoryInfo to get information about paths in more file-system-sensitive ways.

To be on the safe side, you could use:
path.Split(new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar });

On windows, using forward slashes is also accepted, in C# Path functions and on the command line, in Windows 7/XP at least.
e.g.:
Both of these produce the same results for me:
dir "C:/Python33/Lib/xml"
dir "C:\Python33\Lib\xml"
(In C:)
dir "Python33/Lib/xml"
dir "Python33\Lib\xml"
On windows, neither '/' or '\' are valid chars for filename. On Linux, '\' is ok in filenames, so you should be aware of this if parsing for both.
So if you wanted to support paths in both forms (like I do) you could do:
path.Split(new char[] {'/', '\\'});
On Linux it would probably be safer to use Path.DirectorySeparatorChar.

Path.Split(new char[] { '\\\' });

Better just use the existing class System.IO.Path, so you don't need to care for any system specifications.
It provides methods to access any part of a file path like GetFileName(string path) etc.

A complete solution could look like this:
//
private static readonly char[] pathSeps = new char[] {
Path.DirectorySeparatorChar,
Path.AltDirectorySeparatorChar,
Path.VolumeSeparatorChar,
};
//
///<summary>Split a path according to the file system rules.</summary>
public static string[] SplitPath( string path ) {
if ( null == path ) return null;
return path.Split( pathSeps, StringSplitOptions.RemoveEmptyEntries );
}
Some of the other proposed solutions in this article use the syntax:
path.Split(new char[] {'/', '\'});
Although this will work, it has various disadvantages:
It does not allow your application to adapt to various target platforms. Currently, our applications are basically running on UNIX and Windows OSs (Win, macOS, iOS, linux variations). So there is a fixed set of path characters. But this might change when dotNET were ported to other operating systems. So it is best to use the predefined constants.
Performance of the inline syntax is worse. This might not be of interest for a handful of files, but when working with millions of files there are noticeable differences. The managed memory will go up until next GC. When looking at the generated assembly code you will find "call CORINFO_HELP_NEWARR_1_VC" for each of the 'new' statements, even in Release mode. This happens whenever you new-up any array, because arrays are not immutable. My proposed solution prevents this by declaring the array as readonly and static.
Reusability of the inline syntax also is worse, because you might want to use the path separators array in other contexts.
StringSplitOptions.RemoveEmptyEntries should be used to account for UNC paths and possible typos within the incoming path. The operating systems do not allow duplicate path separators, but there might be a typo from the user or a duplicate concatenation of path separator characters, for example when concatenating the path and filename.

File paths with non-ascii characters and FileInfo in C#

I get a string that more or less looks like this:
"C:\\bláh\\bleh"
I make a FileInfo with it, but when I check for its existence it returns false:
var file = new FileInfo(path);
file.Exists;
If I manually rename the path to
"C:\\blah\\bleh"
at debug time and ensure that blah exists with a bleh inside it, then file.Exists starts returning true. So I believe the problem is the non-ascii character.
The actual string is built by my program. One part comes from the AppDomain of the application, which is the part that contains the "á", the other part comes, in a way, from the user. Both parts are put together by Path.Combine. I confirmed the validity of the resulting string in two ways: copying it from the error my program generates, which includes the path, into explorer opens the file just fine. Looking at that string at the debugger, it looks correctly escaped, in that \ are written as \. The "á" is printed literarily by the debugger.
How should I process a string so that even if it has non-ascii characters it turns out to be a valid path?

Here is a method that will handle diacritics in filenames. The success of the File.Exists method depends on how your system stores the filename.
public bool FileExists(string sPath)
{
//Checking for composed and decomposed is to handle diacritics in filenames.
var pathComposed = sPath.Normalize(NormalizationForm.FormC);
if (File.Exists(pathComposed))
return true;
//We really need to check both possibilities.
var pathDecomposed = sPath.Normalize(NormalizationForm.FormD);
if (File.Exists(pathDecomposed))
return true;
return false;
}

try this
string sourceFile = #"C:\bláh\bleh";
if (File.Exists(sourceFile))
{
Console.WriteLine("file exist.");
}
else
{
Console.WriteLine("file does not exist.");
}
Note : The Exists method should not be used for path validation, this method merely checks if the file specified in path exists. Passing an invalid path to Exists returns false.
For path validation you can use Directory.Exists.

I have just manuall created a bláh folder containing a bleh file, and with that in place, this code prints True as expected:
using System;
using System.IO;
namespace ConsoleApplication72
{
class Program
{
static void Main(string[] args)
{
string filename = "c:\\bláh\\bleh";
FileInfo fi = new FileInfo(filename);
Console.WriteLine(fi.Exists);
Console.ReadLine();
}
}
}
I would suggest checking the source of your string - in particular, although your 3k rep speaks against this being the problem, remember that expressing a backslash as \\ is an artifact of C# syntax, and you want to make sure your string actually contains only single \s.

Referring to #adatapost's reply, the list of invalid file name characters (gleaned from System.IO.Path.GetInvalidFileNameChars() in fact doesn't contain normal characters with diacritics.
It looks like the question you're really asking is, "How do I remove diacritics from a string (or in this case, file path)?".
Or maybe you aren't asking this question, and you genuinely want to find a file with name:
c:\blòh\bleh
(or something similar). In that case, you then need to try to open a file with the same name, and not c:\bloh\bleh.

Look like the "bleh" in the path is a directory, not a file. To check if the folder exist use Directory.Exists method.

The problem was: the program didn't have enough permissions to access that file. Fixing the permissions fixed the problem. It seems that when I didn't my experiment I somehow managed to reproduce the permission problem, possibly by creating the folder without the non-ascii character by hand and copying the other one.
Oh... so embarrassing.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.