Directory.CreateDirectory fails with invalid character - c#

I am facing an issue where my path string passes the check against Path.GetInvalidPathChars() but fails when I try to create the directory.
static void Main(string[] args)
{
    string str2 = @"C:\Temp\hjk&(*&ghj\config\";
    foreach (var character in System.IO.Path.GetInvalidPathChars())
    {
        if (str2.IndexOf(character) > -1)
        {
            Console.WriteLine("String contains invalid path character '{0}'", character);
            return;
        }
    }
    Directory.CreateDirectory(str2); //<-- Throws exception saying Invalid character.
    Console.WriteLine("Press any key..");
    Console.ReadKey();
}
Any idea what could be the issue?

This is one of those times where slight issues in the wording of the documentation can make all the difference on how we look at or use the API. In our case, that part of the API doesn't do us much good.
You haven't completely read the documentation on Path.GetInvalidPathChars():
The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).
And don't think that Path.GetInvalidFileNameChars() will do you any better immediately (we'll prove how this is the better choice below):
The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).
In this situation, it's best to try { Directory.CreateDirectory(str2); } catch (ArgumentException e) { /* Most likely the path was invalid */ } instead of manually validating the path*. This will work independent of file-system.
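As a rough sketch of that try/catch approach: the helper name TryCreateDirectory is hypothetical, and because the exception type raised for an invalid character can differ between runtimes and file systems, the sketch catches several of the exception types documented for Directory.CreateDirectory rather than ArgumentException alone.

```csharp
using System;
using System.IO;

static class DirectoryHelper
{
    // Hypothetical helper: attempts the creation and reports failure
    // instead of validating the path up front.
    public static bool TryCreateDirectory(string path, out string error)
    {
        try
        {
            Directory.CreateDirectory(path);
            error = null;
            return true;
        }
        catch (Exception e) when (e is ArgumentException
                               || e is IOException
                               || e is NotSupportedException
                               || e is UnauthorizedAccessException)
        {
            // An invalid path usually surfaces as one of these types.
            error = e.Message;
            return false;
        }
    }

    static void Main()
    {
        string path = @"C:\Temp\hjk&(*&ghj\config\";
        if (!TryCreateDirectory(path, out string error))
            Console.WriteLine("Could not create '{0}': {1}", path, error);
    }
}
```

The caller gets a plain success flag plus the message, so the decision about how to report the problem stays outside the helper.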
When I tried to create your directory on my Windows system:
Now if we go through all the characters in that array:
foreach (char c in Path.GetInvalidPathChars())
{
Console.WriteLine($"0x{(int)c:X4} : {c}");
}
We get:
0x0022 : "
0x003C : <
0x003E : >
0x007C : |
0x0000 :
0x0001 :
0x0002 :
0x0003 :
0x0004 :
0x0005 :
0x0006 :
0x0007 :
0x0008 :
0x0009 :
0x000A :
0x000B :
0x000C :
0x000D :
0x000E :
0x000F :
0x0010 :
0x0011 :
0x0012 :
0x0013 :
0x0014 :
0x0015 :
0x0016 :
0x0017 :
0x0018 :
0x0019 :
0x001A :
0x001B :
0x001C :
0x001D :
0x001E :
0x001F :
As you can see, that list is incomplete.
However: if we do the same for GetInvalidFileNameChars()
foreach (char c in Path.GetInvalidFileNameChars())
{
Console.WriteLine($"0x{(int)c:X4} : {c}");
}
We end up with a different list, which includes all of the above, as well as:
0x003A : :
0x002A : *
0x003F : ?
0x005C : \
0x002F : /
Which is exactly what our error-message indicates. In this situation, you may decide you want to use that instead. Just remember our warning above, Microsoft makes no guarantees as to the accuracy of either of these methods.
Of course, this isn't perfect, because using Path.GetInvalidFileNameChars() on a path will throw a false invalidation (\ is invalid in a filename, but it's perfectly valid in a path!), so you'll need to correct for that. You can do so by ignoring (at the very least) the following characters:
0x003A : :
0x005C : \
You may also want to ignore the following character (as sometimes people use the web/*nix style paths):
0x002F : /
The last thing to do here is demonstrate a slightly easier way of writing this code. (I'm a regular on Code Review so it's second nature.)
We can do this whole thing in one expression:
System.IO.Path.GetInvalidFileNameChars().Except(new char[] { '/', '\\', ':' }).Count(c => str2.Contains(c)) > 0
Example of usage:
var invalidPath = @"C:\Temp\hjk&(*&ghj\config\";
var validPath = @"C:\Temp\hjk&(&ghj\config\"; // No asterisk (*)
var invalidPathChars = System.IO.Path.GetInvalidFileNameChars().Except(new char[] { '/', '\\', ':' });
if (invalidPathChars.Count(c => invalidPath.Contains(c)) > 0)
{
    Console.WriteLine("Invalid character found.");
}
else
{
    Console.WriteLine("Free and clear.");
}
if (invalidPathChars.Count(c => validPath.Contains(c)) > 0)
{
    Console.WriteLine("Invalid character found.");
}
else
{
    Console.WriteLine("Free and clear.");
}
*: This is arguable, you may want to manually validate the path if you are certain your validation code will not invalidate valid paths. As MikeT said: "you should always try to validate before getting an exception". Your validation code should be equal or less restrictive than the next level of validation.
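One way to avoid the false positives on \ and : is to check each path segment on its own rather than the whole string. Here is a minimal sketch, assuming the Windows character set printed above; it is hard-coded so the result does not depend on the platform the code runs on, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class PathSegmentCheck
{
    // Assumption: the Windows set printed above by GetInvalidFileNameChars(),
    // hard-coded so the check behaves the same on every platform.
    static readonly char[] InvalidNameChars =
        Enumerable.Range(0, 32).Select(i => (char)i)
                  .Concat(new[] { '"', '<', '>', '|', ':', '*', '?', '\\', '/' })
                  .ToArray();

    // Returns the first invalid character found in any segment, or null.
    public static char? FindInvalidChar(string path)
    {
        var segments = path.Split(new[] { '\\', '/' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < segments.Length; i++)
        {
            string segment = segments[i];
            // Skip a leading drive specifier such as "C:".
            if (i == 0 && segment.Length == 2 && char.IsLetter(segment[0]) && segment[1] == ':')
                continue;
            int index = segment.IndexOfAny(InvalidNameChars);
            if (index >= 0)
                return segment[index];
        }
        return null;
    }

    static void Main()
    {
        Console.WriteLine(FindInvalidChar(@"C:\Temp\hjk&(*&ghj\config\"));        // *
        Console.WriteLine(FindInvalidChar(@"C:\Temp\hjk&(&ghj\config\") == null); // True
    }
}
```

Because the separators are consumed by the split, only a colon outside the drive specifier is reported, which a whole-string check with ':' excluded would miss.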

I faced the same problem as described above.
Since each subdirectory name is effectively a file name, I have combined some solutions that I found here:
string retValue = string.Empty;
var dirParts = path.Split(new[] { Path.DirectorySeparatorChar }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < dirParts.Length; i++)
{
    if (i == 0 && Path.IsPathRooted(path))
    {
        // Keep the separator after the root ("C:\"); Path.Combine("C:", "Temp") would yield "C:Temp".
        retValue = string.Join("_", dirParts[0].Split(Path.GetInvalidPathChars())) + Path.DirectorySeparatorChar;
    }
    else
    {
        retValue = Path.Combine(retValue, string.Join("_", dirParts[i].Split(Path.GetInvalidFileNameChars())));
    }
}
My solution returns "C:\Temp\hjk&(_&ghj\config" for the given path in the question.


Regex for ClassName.PropertyName

I don't know regex, but I need a regular expression that validates ClassName.PropertyName values.
I need to validate some values from appSettings for compliance with the ClassName.PropertyName convention.
"ClassName.PropertyName" - this is the only format that is valid, the rest below is invalid:
"Personnel.FirstName1" <- the only string that should match
"2Personnel.FirstName1"
"Personnel.33FirstName"
"Personnel..FirstName"
"Personnel.;FirstName"
"Personnel.FirstName."
"Personnel.FirstName "
" Personnel.FirstName"
" Personnel. FirstName"
" 23Personnel.3FirstName"
I have tried this (from the link posted as duplicate):
^\w+(.\w+)*$
but it doesn't work: I have false positives, e.g. 2Personnel.FirstName1 as well as Personnel.33FirstName passes the check when both should have been rejected.
Can someone help me with that?
Let's start from single identifier:
Its first character must be a letter or an underscore
It can contain letters, underscores and digits
So the regular expression for an identifier is
[A-Za-z_][A-Za-z0-9_]*
Next, we chain identifiers with . (do not forget to escape it): an identifier followed by zero or more groups of . + identifier:
^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$
In case it must be exactly two identifiers (and not, say abc.def.hi - three ones)
^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$
Tests:
string[] tests = new string[] {
"Personnel.FirstName1", // the only string that should be matched
"2Personnel.FirstName1",
"Personnel.33FirstName",
"Personnel..FirstName",
"Personnel.;FirstName",
"Personnel.FirstName.",
"Personnel.FirstName ",
" Personnel.FirstName",
" Personnel. FirstName",
" 23Personnel.3FirstName",
} ;
string pattern = @"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$";
var results = tests
    .Select(test =>
        $"{"\"" + test + "\"",-25} : {(Regex.IsMatch(test, pattern) ? "matched" : "failed")}");
Console.WriteLine(String.Join(Environment.NewLine, results));
Outcome:
"Personnel.FirstName1" : matched
"2Personnel.FirstName1" : failed
"Personnel.33FirstName" : failed
"Personnel..FirstName" : failed
"Personnel.;FirstName" : failed
"Personnel.FirstName." : failed
"Personnel.FirstName " : failed
" Personnel.FirstName" : failed
" Personnel. FirstName" : failed
" 23Personnel.3FirstName" : failed
Edit: In case culture specific names (like äöü.FirstName) should be accepted (see Rand Random's comments) then [A-Za-z] range should be changed into \p{L} - any letter. Exotic possibility - culture specific digits (e.g. Persian ones - ۰۱۲۳۴۵۶۷۸۹) can be solved by changing 0-9 into \d
// culture specific letters, but not digits
string pattern = @"^[\p{L}_][\p{L}0-9_]*(?:\.[\p{L}_][\p{L}0-9_]*)*$";
If each identifier should not exceed a certain length (say, 16), we should redesign the initial identifier pattern: a mandatory letter or underscore followed by [0..16-1] == {0,15} letters, digits or underscores
[A-Za-z_][A-Za-z0-9_]{0,15}
And we have
string pattern = @"^[A-Za-z_][A-Za-z0-9_]{0,15}(?:\.[A-Za-z_][A-Za-z0-9_]{0,15})*$";
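If the two parts are needed separately after validation, the exactly-two-identifiers pattern can carry named groups. This is only a sketch: the class name MemberPath, the method TryParse and the group names cls and prop are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class MemberPath
{
    // Exactly two identifiers separated by one dot, as in the pattern above,
    // with named groups (hypothetical names) so both parts can be read back.
    static readonly Regex Pattern = new Regex(
        @"^(?<cls>[A-Za-z_][A-Za-z0-9_]*)\.(?<prop>[A-Za-z_][A-Za-z0-9_]*)$");

    public static bool TryParse(string text, out string className, out string propertyName)
    {
        Match m = Pattern.Match(text);
        className = m.Success ? m.Groups["cls"].Value : null;
        propertyName = m.Success ? m.Groups["prop"].Value : null;
        return m.Success;
    }

    static void Main()
    {
        if (TryParse("Personnel.FirstName1", out var cls, out var prop))
            Console.WriteLine($"{cls} / {prop}");                           // Personnel / FirstName1
        Console.WriteLine(TryParse("Personnel.33FirstName", out _, out _)); // False
    }
}
```

Note that in .NET `$` also matches just before a trailing newline; if that matters for your configuration values, `\z` is the stricter anchor.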
^[A-Za-z]*\.[A-Za-z]*[0-9]$
or
^[A-Za-z]*\.[A-Za-z]*[0-9]+$
if you need more than one numerical character in the number suffix

Need Regex expression to allow only either numbers or letters separated by comma and it should not allow alpha numeric

Need Regex expression to allow only either numbers or letters separated by comma and it should not allow alpha numeric combinations (like "abc123").
Some examples:
Valid:
123,abc
abc,123
123,123
abc,abc
Invalid:
abc,abc123
abc133,abc
abc123,abc123
Since valid and invalid are changed, I've rewritten my answer from scratch.
The suggested pattern is
^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$
Demo:
string[] tests = new string[] {
"123,abc",
"abc,123",
"123,123",
"abc,abc",
"abc,abc123",
"abc133,abc",
"abc123,abc123",
// More tests
"123abc", // invalid (digits first, then letters)
"123", // valid (one item)
"a,b,c,1,2,3", // valid (more than two items)
"1e4", // invalid (floating point number)
"1,,2", // invalid (empty part)
"-3", // invalid (minus sign)
"۱۲۳", // invalid (Persian digits)
"число" // invalid (Russian letters)
};
string pattern = @"^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$";
var report = string.Join(Environment.NewLine, tests
    .Select(item => $"{item,-20} : {(Regex.IsMatch(item, pattern) ? "valid" : "invalid")}"));
Console.WriteLine(report);
Outcome:
123,abc : valid
abc,123 : valid
123,123 : valid
abc,abc : valid
abc,abc123 : invalid
abc133,abc : invalid
abc123,abc123 : invalid
123abc : invalid
123 : valid
a,b,c,1,2,3 : valid
1e4 : invalid
1,,2 : invalid
-3 : invalid
۱۲۳ : invalid
число : invalid
Pattern's explanation:
^ - string beginning (anchor)
([0-9]+)|([a-zA-Z]+) - either group of digits (1+) or group of letters
(,(([0-9]+)|([a-zA-Z]+)))* - followed by zero or more such groups, each preceded by a comma
$ - string ending (anchor)
If you specify Dmitry Bychenko's regex with RegexOptions.IgnoreCase, you can shrink it down to Regex.IsMatch(test, @"^[0-9a-z](,[0-9a-z])*$", RegexOptions.IgnoreCase)
Alternate way to check it w/o regex (performs worse):
using System;
using System.Linq;
public class Program1
{
    public static void Main()
    {
        var mydata = new[] {"1,3,4,5,1,3,a,s,r,3", "2, 4 , a", " 2,3,as"};
        // Function that checks the input. It does not perform as well as the regex,
        // because an internal string array is built and analyzed.
        Func<string,bool> isValid =
            data => data.Split(new[]{','}, StringSplitOptions.RemoveEmptyEntries)
                .Select(s => s.Trim())
                .All(aChar => aChar.Length == 1 && char.IsLetterOrDigit(aChar[0]));
        foreach (var d in mydata)
        {
            Console.WriteLine(string.Format("{0} => is {1}",d, isValid(d) ? "Valid" : "Invalid"));
        }
    }
}
Output:
1,3,4,5,1,3,a,s,r,3 => is Valid
2, 4 , a => is Valid
2,3,as => is Invalid
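The check above accepts only single-character items. For the multi-character items in the question (all digits or all letters, never mixed), a token-wise variant might look like the sketch below. It assumes ASCII-only input and no surrounding whitespace, and the helper names are illustrative.

```csharp
using System;
using System.Linq;

static class TokenCheck
{
    static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
    static bool IsAsciiLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    // Valid when every comma-separated token is non-empty and consists
    // only of ASCII digits or only of ASCII letters.
    public static bool IsValid(string data) =>
        data.Split(',').All(token =>
            token.Length > 0 &&
            (token.All(IsAsciiDigit) || token.All(IsAsciiLetter)));

    static void Main()
    {
        foreach (var s in new[] { "123,abc", "abc,abc123", "1,,2", "a,b,c,1,2,3" })
            Console.WriteLine($"{s} => {(IsValid(s) ? "Valid" : "Invalid")}");
    }
}
```

Empty entries are deliberately kept by Split here, so "1,,2" is rejected just as the regex rejects it.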
To match words separated by commas, where the words consist either of digits or of letters:
^(\d+|[a-zA-Z]+)(,(\d+|[a-zA-Z]+))*$
Explanation
\d+ matches a string of at least one digit.
[a-zA-Z]+ matches a string of at least one upper or lower case letter.
(\d+|[a-zA-Z]+) matches either a string of digits or a string of letters.
C#
Regex regex = new Regex(@"^(\d+|[a-zA-Z]+)(,(\d+|[a-zA-Z]+))*$");

C# - Getting multiple values with a single key, from a text file

I store multiple values that share a single key in a text file. The text file looks like this:
Brightness 36 , Manual
BacklightCompensation 3 , Manual
ColorEnable 0 , None
Contrast 16 , Manual
Gain 5 , Manual
Gamma 122 , Manual
Hue 0 , Manual
Saturation 100 , Manual
Sharpness 2 , Manual
WhiteBalance 5450 , Auto
Now I want to store the int value and the string value of each key (Brightness, for example).
I'm new to C# and couldn't find anything that worked yet.
Thanks
I'd recommend to use custom types to store these settings like these:
public enum DisplaySettingType
{
Manual, Auto, None
}
public class DisplaySetting
{
public string Name { get; set; }
public decimal Value { get; set; }
public DisplaySettingType Type { get; set; }
}
Then you could use following LINQ query using string.Split to get all settings:
decimal value = 0;
DisplaySettingType type = DisplaySettingType.None;
IEnumerable<DisplaySetting> settings = File.ReadLines(path)
.Select(l => l.Trim().Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries))
.Where(arr => arr.Length >= 3 && decimal.TryParse(arr[1], out value) && Enum.TryParse(arr[2], out type))
.Select(arr => new DisplaySetting { Name = arr[0], Value = value, Type = type });
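To look both values up by key afterwards, the parsed entries can go into a dictionary. The sketch below parses in-memory lines instead of a file so that it runs on its own; the class name is illustrative, and int is used only because the sample values are whole numbers.

```csharp
using System;
using System.Collections.Generic;

static class SettingsLookup
{
    // Parses lines such as "Brightness 36 , Manual" into name -> (value, mode).
    public static Dictionary<string, (int Value, string Mode)> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, (int Value, string Mode)>();
        foreach (var line in lines)
        {
            var parts = line.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
            // Skip lines that do not have exactly three fields or a numeric value.
            if (parts.Length == 3 && int.TryParse(parts[1], out int value))
                result[parts[0]] = (value, parts[2]);
        }
        return result;
    }

    static void Main()
    {
        var settings = Parse(new[] { "Brightness 36 , Manual", "WhiteBalance 5450 , Auto" });
        var brightness = settings["Brightness"];
        Console.WriteLine($"{brightness.Value} {brightness.Mode}"); // 36 Manual
    }
}
```

With a real file, the argument could be File.ReadLines(path), which yields the same kind of line sequence.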
With a regex and a little bit of LINQ you can do many things.
Here I assume you know how to read a text file.
Pros: if the file is not perfect, the regex will simply skip the malformed lines and won't throw an error.
Here is a hardcoded version of your file. Note that a \r will appear in the matches because of the hardcoded line breaks. This depends on how you read your file, but it should not happen with File.ReadLines().
string input =
@"Brightness 36 , Manual
BacklightCompensation 3 , Manual
ColorEnable 0 , None
Contrast 16 , Manual
Gain 5 , Manual
Gamma 122 , Manual
Hue 0 , Manual
Saturation 100 , Manual
Sharpness 2 , Manual
WhiteBalance 5450 , Auto";
string regEx = @"(.*) (\d+) , (.*)";
var RegexMatch = Regex.Matches(input, regEx).Cast<Match>();
var outputlist = RegexMatch.Select(x => new { setting = x.Groups[1].Value
                                            , value = x.Groups[2].Value
                                            , mode = x.Groups[3].Value });
Regex explanation: (.*) (\d+) , (.*)
1st capturing group (.*)
    .* matches any character (except line terminators) between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A single space matches the space character literally
2nd capturing group (\d+)
    \d+ matches a digit (equal to [0-9]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
" , " matches the characters space, comma, space literally
3rd capturing group (.*)
    .* matches any character (except line terminators) between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Disclaimer:
Never trust an input! Even if it's a file that some other program produced, or one sent by a customer.
From my experience, you then have two ways of handling bad formatting:
Read line by line, and register every bad line.
Or ignore them. You don't fit, you don't sit!
And don't tell yourself it won't happen; it will!
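The first option, registering every bad line, could be sketched as follows. The pattern is an anchored, stricter variant of the one above and is an assumption about what counts as well formed; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class StrictSettingsReader
{
    // Anchored variant of the pattern above; \s* tolerates a trailing '\r'.
    static readonly Regex Line = new Regex(@"^(\S+) (\d+) , (\S+)\s*$");

    // Returns the 1-based numbers of the lines that do not match the expected format.
    public static List<int> FindBadLines(IEnumerable<string> lines)
    {
        var bad = new List<int>();
        int number = 0;
        foreach (var line in lines)
        {
            number++;
            if (!Line.IsMatch(line))
                bad.Add(number);
        }
        return bad;
    }

    static void Main()
    {
        var bad = FindBadLines(new[] { "Gain 5 , Manual", "Gamma = 122", "Hue 0 , Manual\r" });
        Console.WriteLine(string.Join(", ", bad)); // 2
    }
}
```

Collecting line numbers rather than throwing on the first failure lets you report every problem in the file at once.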

c# migrating to ANTLR 4 from ANTLR 3 with AST

I have inherited some C# code based on ANTLR 3.
We have some grammar files that use the AST (abstract syntax tree) option, and we use those grammars to parse text files written in a very odd "language" into objects. We use the AST as intermediate objects and then convert them to the real objects that we need (with some more processing).
I have no knowledge of ANTLR, but currently the ANTLR processing of the files is a bottleneck in the application's performance.
Since we are using ANTLR 3, we thought we might get a performance boost if we migrate to ANTLR 4 (and also get the latest and greatest version of ANTLR, which is always good practice).
I have read that AST output no longer exists in ANTLR 4. What is the best (and simplest) way to replace it, and what will it mean for my current code?
What is the best approach to upgrading, and will it really give us a performance boost?
An example of one of the grammar files (there are 6 and this is the simplest one):
grammar Rules;
options
{
language=CSharp2;
output=AST;
ASTLabelType=CommonTree;
superClass = OOPLParserBase;
}
tokens
{
OOPL_MODEL;
}
@lexer::namespace { TestParser.Common.RulesParser }
@parser::namespace { TestParser.Common.RulesParser }
@header
{
using System.Collections.Generic;
using TestParser.OOPLModel;
}
@members
{
public RulesParser() : base(null)
{
}
protected override CommonTree GetAst()
{
return root().Tree as CommonTree;
}
protected override Lexer GetLexer()
{
return new RulesLexer();
}
}
//semantic analysis
root : header (rule_line COMMENT?)+ -> ^(header rule_line+);
header : header_comment+ -> ^(OOPL_MODEL<OOPLModel>[new CommonToken(OOPL_MODEL), "1.0"] header_comment+);
header_comment : COMMENT -> ^(COMMENT<OOPLComment>[$COMMENT, $COMMENT.Text]);
rule_line : parameter RULE_TYPE COMMA PARAMETER_NAME COLON condition -> ^(RULE_TYPE<OOPLBlock>[$RULE_TYPE, $RULE_TYPE.Text] parameter PARAMETER_NAME<OOPLValue>[$PARAMETER_NAME, $PARAMETER_NAME.Text] condition);
parameter : PARAMETER_NAME EQUALS (integer_value = INTEGER | real_value = REAL |string_value = STRING) COMMA -> ^(PARAMETER_NAME<OOPLKeyedValue>[$PARAMETER_NAME, $PARAMETER_NAME.Text, SingleWhereNotNull<IToken>($integer_value, $string_value, $real_value).Text]);
condition : condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value COMMA condition_value;
condition_value : (asterisk| parameter_name | positive_integer);
asterisk : ASTERISK -> ^(ASTERISK<OOPLValue>[$ASTERISK, $ASTERISK.Text]);
parameter_name : PARAMETER_NAME -> ^(PARAMETER_NAME<OOPLValue>[$PARAMETER_NAME, $PARAMETER_NAME.Text]);
positive_integer : INTEGER -> ^(INTEGER<OOPLValue>[$INTEGER, $INTEGER.Text]);
//lexical analysis
EQUALS : '=';
NEW_LINE_R : '\r' { $channel = HIDDEN; };
NEW_LINE_N : '\n' { $channel = HIDDEN; };
RULE_TYPE : ('Time'|'TIME'|'Lol'|'LOL'|'World'|'WORLD'|'Template'|'TEMPLATE');
DOUBLE_COLON : COLON COLON;
INTEGER : MINUS? DIGIT+;
REAL : INTEGER '.' INTEGER;
PARAMETER_NAME : ASTERISK? (LETTER|DIGIT|UNDERSCORE|FORWARDSLASH|DOUBLE_COLON|MINUS)+ ASTERISK?;
WS : ( ' '
| '\t'
| NEW_LINE_R
| NEW_LINE_N
) { $channel = HIDDEN; } ;
COMMENT : '#' ( options {greedy=false;} : . )* NEW_LINE_R? NEW_LINE_N;
STRING : '"'~('"')* '"';
fragment
MINUS : '-';
COMMA : ',';
COLON : ':';
fragment
DOT : '.';
ASTERISK : '*';
fragment
FORWARDSLASH : '/';
fragment
UNDERSCORE : '_';
fragment
DIGIT : '0'..'9';
fragment
LETTER : 'A'..'Z' | 'a'..'z';
I'd do the transformation solely in C# code after the parse.
In this case I'd even skip the intermediate AST form and transform the parse tree (provided by ANTLR4) directly into the target representation.
Some prefer ParseTreeListener/ParseTreeWalkers, which aid you in walking the parse tree. Check these out if you want some pre-built code. Be sure to use the typed ParseTreeWalker, which should be named RulesParseTreeListener<>; inherit from it and adjust it to your needs.
link: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parse+Tree+Listeners
I'd not recommend ParseTreeVisitors which are invoked during the parse (as opposed to after the parse). They are only suitable for simple operations or grammars that are not context free and require code during the parse. If the requirements evolve later on, you're way more flexible with custom processing or listeners/walkers.

Why comparing two equal persian word does not return 0?

We have two letters that look the same, 'ی' and 'ي'; the first became the main letter from Windows 7 onwards.
Back on Windows XP, the second one was the main letter.
Now the inputs I get are treated as different if one client is on Windows XP and the other is on Windows 7.
I have also tried using the Persian culture, with no success.
Am I missing anything?
EDIT: I had to change the words for better understanding; now they look similar.
foreach (CompareOptions i in Enum.GetValues(new CompareOptions().GetType()).OfType<CompareOptions>())
Console.WriteLine( string.Compare("محسنين", "محسنین", new CultureInfo("fa-ir"), i) + "\t : " + i );
Outputs :
-1 : None
-1 : IgnoreCase
-1 : IgnoreNonSpace
-1 : IgnoreSymbols
-1 : IgnoreKanaType
-1 : IgnoreWidth
1 : OrdinalIgnoreCase
-1 : StringSort
130 : Ordinal
The two strings are not equal. The last letter differs.
About why IgnoreCase returns -1 but OrdinalIgnoreCase returns 1:
OrdinalIgnoreCase uses the invariant culture to convert the string to upper and afterwards performs a byte by byte comparison
IgnoreCase uses the specified culture to perform a case insensitive compare.
The difference is that IgnoreCase knows "more" about the differences in the letters of the specified language and will treat them possibly differently than the invariant culture, leading to a different outcome.
This is a different manifestation of what became known as "The Turkish İ Problem".
You can verify it yourself by using the InvariantCulture instead of the Persian one:
foreach (CompareOptions i in Enum.GetValues(new CompareOptions().GetType()).OfType<CompareOptions>())
Console.WriteLine( string.Compare("محسنی", "محسني", CultureInfo.InvariantCulture, i) + "\t : " + i );
This will output 1 for both IgnoreCase and OrdinalIgnoreCase.
Regarding your edited question:
The two strings still differ. The following code outputs the values of the single characters in the strings.
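// The two words from the question, written with escapes so the differing letter is unambiguous.
var strings = new[] { "\u0645\u062D\u0633\u0646\u064A\u0646", "\u0645\u062D\u0633\u0646\u06CC\u0646" };
foreach (var value in strings.SelectMany(x => x.Select(y => (int)y)))
    Console.WriteLine(value);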
The result will look like this:
1605
1581
1587
1606
1610 // <-- "yeh": ي
1606
1605
1581
1587
1606
1740 // <-- "farsi yeh": ی
1606
As you can see, there is one character that differs, resulting in a comparison that treats those two strings as not equal.
Here is my code for converting the Arabic characters “ي,ك” to the Persian “ی,ک”, first as an extension method and then as a plain method:
private static readonly string[] pn = { "ی", "ک" };
private static readonly string[] ar = { "ي", "ك" };
public static string ToFaText(this string strTxt)
{
string chash = strTxt;
for (int i = 0; i < 2; i++)
chash = chash.Replace(ar[i],pn[i]);
return chash;
}
public string ToFaText(string strTxt)
{
return strTxt.Replace("ك","ک").Replace("ي","ی");
}
usage:
string str="اولين برداشت";
string per = ToFaText(str);
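To tie this back to the comparison in the question, both inputs can be normalized before they are compared. The sketch below uses Unicode escapes so that the two yeh variants cannot be confused in an editor; the class name is hypothetical, and only the two letters discussed here are mapped.

```csharp
using System;

static class YehNormalizer
{
    // Maps Arabic yeh (U+064A) and kaf (U+0643) to Farsi yeh (U+06CC) and keheh (U+06A9).
    public static string ToFaText(string text) =>
        text.Replace('\u064A', '\u06CC').Replace('\u0643', '\u06A9');

    static void Main()
    {
        string a = "\u0645\u062D\u0633\u0646\u064A\u0646"; // contains Arabic yeh
        string b = "\u0645\u062D\u0633\u0646\u06CC\u0646"; // contains Farsi yeh
        Console.WriteLine(string.CompareOrdinal(a, b) == 0);                     // False
        Console.WriteLine(string.CompareOrdinal(ToFaText(a), ToFaText(b)) == 0); // True
    }
}
```

Normalizing at the point where input enters the system (for example before storing it) keeps every later comparison simple.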
