I've searched all over the web for a simple solution to my problem, but I find it odd that no one has come up with a way to "get a correct connection string if the password contains non-alphanumeric characters".
My problem:
I have a user whose password contains one or more of these characters:
` ~ ! @ # $ % ^ & * ( ) _ + - = { } | \ : " ; ' < > ? , . /
Since the connection string format is "KEY=VALUE;KEY2=VALUE2", it is of course a problem if the password contains a semicolon.
So I did a little research and found these connection string rules:
All blank characters, except those placed within a value or within quotation marks, are ignored
Blank characters do, however, affect connection pooling: pooled connections must have exactly the same connection string
If a semicolon (;) is part of a value it must be delimited by quotation marks (")
Use a single-quote (') if the value begins with a double-quote (")
Conversely, use the double quote (") if the value begins with a single quote (')
No escape sequences are supported
The value type is not relevant
Names are case iNsEnSiTiVe
If a KEYWORD=VALUE pair occurs more than once in the connection string, the value associated with the last occurrence is used
However, if the provider keyword occurs multiple times in the string, the first occurrence is used.
If a keyword contains an equal sign (=), it must be preceded by an additional equal sign to indicate that it is part of the keyword.
If a value has leading or trailing spaces, it must be enclosed in single or double quotes, i.e. Keyword=" value ", otherwise the spaces are removed.
I then read a bunch of threads where people tried to implement the rules above in their own "format connection string" method, but even more edge cases seemed to come to light once they started.
My question is then:
Has anyone out there written a "FormatConnectionstring" method for building connection strings, or am I doing something completely wrong and my problem really lies elsewhere?
Use SqlConnectionStringBuilder; either:
set properties, read ConnectionString (to create)
set ConnectionString, read properties (to parse)
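A minimal sketch of both directions. SqlConnectionStringBuilder lives in the System.Data.SqlClient / Microsoft.Data.SqlClient packages, so to keep this dependency-free it swaps in its base class, DbConnectionStringBuilder from System.Data.Common, which (as far as I know) applies the same quoting when it writes ConnectionString. The server, user, and password values are made up for illustration.

```csharp
using System;
using System.Data.Common;

// Hypothetical credentials: the password deliberately contains ; " ' = { } and spaces.
const string password = "p@ss;w\"o'rd = {x} ";

// Create: set values through the indexer, then read ConnectionString.
// The builder decides how to quote each value, so no hand-written escaping is needed.
var create = new DbConnectionStringBuilder();
create["Data Source"] = "myServer";
create["User ID"] = "myUser";
create["Password"] = password;
string connectionString = create.ConnectionString;
Console.WriteLine(connectionString);

// Parse: assign ConnectionString, then read the values back.
// Keyword lookup is case-insensitive, so "password" finds the "Password" entry.
var parse = new DbConnectionStringBuilder { ConnectionString = connectionString };
Console.WriteLine(parse["password"]);
```

With SqlConnectionStringBuilder you would normally use the typed properties (DataSource, UserID, Password) instead of the string indexer, but the round trip works the same way.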
I've noticed that C# adds additional slashes (\) to paths. Consider the path C:\Test. When I inspect the string with this path in the text visualiser, the actual string is C:\\Test.
Why is this? It confuses me, as sometimes I may want to split the path up (using string.Split()), but have to wonder which string to use (one or two slashes).
The \\ is used because \ is an escape character, so it takes two of them to represent a single \.
So it is saying: treat the first \ as an escape character and take the second \ as the actual value. Otherwise, the character following the first \ would be parsed as part of an escape sequence.
Here is a list of available escape characters:
\' - single quote, needed for character literals
\" - double quote, needed for string literals
\\ - backslash
\0 - Null
\a - Alert
\b - Backspace
\f - Form feed
\n - New line
\r - Carriage return
\t - Horizontal tab
\v - Vertical tab
\u - Unicode escape sequence for character
\U - Unicode escape sequence for surrogate pairs.
\x - Unicode escape sequence similar to "\u" except with variable length.
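A quick sketch that prints the character codes the compiler produces for the simple escapes in the list, and shows the \u, \x, and \U forms. The variable names are only for illustration. Note the \x pitfall: because it takes one to four hex digits, "\x41BC" is a single character, not "A" followed by "BC".

```csharp
using System;

// Each simple escape sequence compiles to exactly one char.
char[] escapes = { '\'', '\"', '\\', '\0', '\a', '\b', '\f', '\n', '\r', '\t', '\v' };
foreach (char c in escapes)
{
    // Print the numeric code of each character the compiler produced.
    Console.WriteLine((int)c);
}

string viaU = "\u0041";        // 'A' via the fixed four-digit \u escape
string viaX = "\x41";          // 'A' via the variable-length \x escape
string viaBigU = "\U0001F600"; // a character above U+FFFF, stored as a surrogate pair

Console.WriteLine(viaU == viaX);         // True
Console.WriteLine(viaBigU.Length);       // 2
Console.WriteLine("\x41BC".Length);      // 1 (\x consumed all four hex digits)
```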
EDIT: To answer your question regarding Split, it should be no issue; use Split as you normally would. The \\ in the source code is only one \ character in the actual string.
.NET is not adding anything to your string here. What you're seeing is an effect of how the debugger chooses to display strings. C# string literals can be written in two forms:
Verbatim strings: prefixed with an @ sign; \ characters do not need to be escaped
Normal strings: standard C-style strings, where each \ character must be escaped as \\
The debugger displays a string as a normal literal rather than a verbatim one. It's only a matter of display, though; it doesn't affect the underlying value.
Debugger visualizers display strings in the form in which they would appear in C# code. Since \ is used to escape characters in non-verbatim C# strings, \\ is the correct escaped form.
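A minimal sketch showing that the two literal forms produce the same 7-character string, and that splitting it needs only a single backslash char. The variable names are illustrative.

```csharp
using System;

string normal = "C:\\Test";   // normal literal: \\ is the escape for one backslash
string verbatim = @"C:\Test"; // verbatim literal: the backslash is taken as-is

// Both literals produce the same string; only the source spelling differs.
Console.WriteLine(normal == verbatim); // True
Console.WriteLine(normal.Length);      // 7

// Split on one backslash character; no doubled separator is needed.
string[] parts = normal.Split('\\');
Console.WriteLine(string.Join(" | ", parts)); // C: | Test
```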
Okay, so the answers above are not wholly correct. As such I am adding my findings for the next person who reads this post.
Splitting on one of the characters in the table above may not work when the string is read from an external source.
e.g.,
string[] splitStrings = File.ReadAllText([path]).Split((char)7);
did not split on those chars for me. However, strings created in code work fine.
e.g.,
string[] splitStrings = "hello\agoodbye".Split((char)7);
This may not hold true for other methods of reading text from a file; I have not tested them. The likely cause is that escape sequences are interpreted only by the C# compiler in string and character literals: if the file contains the text \a (a backslash followed by a), ReadAllText returns those two characters, not (char)7. With that in mind, it is probably best not to use those chars for delimiting strings!
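A sketch comparing the three cases, assuming the file in question contained the escape text rather than the control character itself. It writes two throwaway files via Path.GetTempFileName: one holding the literal text \a and one holding an actual (char)7.

```csharp
using System;
using System.IO;

// In source code, \a inside a normal literal compiles to the single BEL character (char)7.
string[] fromLiteral = "hello\agoodbye".Split((char)7);

// A file containing the two characters '\' and 'a' is read back as exactly those two
// characters; File.ReadAllText does not interpret C# escape sequences.
string textFile = Path.GetTempFileName();
File.WriteAllText(textFile, @"hello\agoodbye");
string[] fromEscapedText = File.ReadAllText(textFile).Split((char)7);

// A file that really contains (char)7 splits exactly like the literal does.
string bellFile = Path.GetTempFileName();
File.WriteAllText(bellFile, "hello" + (char)7 + "goodbye");
string[] fromBellFile = File.ReadAllText(bellFile).Split((char)7);

Console.WriteLine(fromLiteral.Length);     // 2
Console.WriteLine(fromEscapedText.Length); // 1
Console.WriteLine(fromBellFile.Length);    // 2

File.Delete(textFile);
File.Delete(bellFile);
```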
I stumbled upon this bit in a colleague's project:
foreach (var invalidChar in Path.GetInvalidFileNameChars())
fileName = fileName.Replace(invalidChar, '\0');
The general idea is obvious enough, but I wonder why he chose to replace the invalid chars with the literal for the null char instead of a "regular" char or just an empty string.
I guess there's a good reason for this choice (the guy who wrote this is a senior on our team); I'd just like to know what this reason is.
After commenting on the question, I looked for proof that \0 is actually not allowed in file names. I found it:
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
* The following reserved characters: < (less than), > (greater than), : (colon), " (double quote), / (forward slash), \ (backslash), | (vertical bar or pipe), ? (question mark), * (asterisk)
* Integer value zero, sometimes referred to as the ASCII NUL character.
* Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed.
It depends on the operating system your code runs on, but on Windows the char \0 (0 as int) is on the list of invalid chars for a file name.
LINQPad (run on Windows 10):
Path.GetInvalidFileNameChars().Contains('\0').Dump(); //true
I think this code was ported from another language to .net.
It would be better to throw an exception (if a user specified the name) when the file name contains invalid chars, instead of replacing them with anything.
If you need to replace them, you should pick a char like _ to make it clear that something was possibly replaced.
As some wise people have pointed out, there is no such thing as an empty char. A visible replacement also avoids confusion between a space (" ") and an empty string ("").
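A minimal sketch of the two alternatives suggested above. SanitizeFileName and ValidateFileName are hypothetical helper names, not framework APIs. The sample uses '/' because it is in Path.GetInvalidFileNameChars() on both Windows and Unix-like systems; so is '\0', which is why the colleague's replacement still leaves an invalid name.

```csharp
using System;
using System.IO;

// Hypothetical helper: replaces every invalid file-name char with a visible placeholder.
static string SanitizeFileName(string fileName, char replacement = '_')
{
    foreach (char invalidChar in Path.GetInvalidFileNameChars())
        fileName = fileName.Replace(invalidChar, replacement);
    return fileName;
}

// Hypothetical helper: rejects user-supplied names instead of silently changing them.
static void ValidateFileName(string fileName)
{
    int index = fileName.IndexOfAny(Path.GetInvalidFileNameChars());
    if (index >= 0)
        throw new ArgumentException($"Invalid character at position {index}.", nameof(fileName));
}

string original = "report/2024.txt";
string withNull = original.Replace('/', '\0');    // the colleague's approach
string withUnderscore = SanitizeFileName(original);

// '\0' is itself an invalid file-name char, so this "fixed" name is still invalid.
Console.WriteLine(withNull.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0); // True
Console.WriteLine(withUnderscore); // report_2024.txt
```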
I've searched for hours and already tried tons of different patterns. There's a simple thing I want to achieve with a regex, but somehow it just won't do what I want:
Possible Strings
String1
This is some text \0"§%lfsdrlsrblabla\0\0\0}dfglpdfgl
String2
This is some text
String3
This is some text \0
Desired Match/Result
This is some text
I simply want to match everything up to, but not including, the \0, resulting in only one match (everything before the \0).
Important for my case is that it matches every time, even when the \0 is not present.
Thanks for your help!
You can try with this pattern:
@"^(?:[^\\]+|\\(?!0))+"
In other words: runs of any characters except backslashes, or a backslash that is not followed by 0.
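A small check of this pattern against the three sample strings from the question, assuming (as the pattern does) that \0 in them is the two characters backslash and zero rather than a NUL char. The brackets in the output make the trailing space before \0 visible.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's sample inputs; in verbatim strings "" is one double quote.
string[] inputs =
{
    @"This is some text \0""§%lfsdrlsrblabla\0\0\0}dfglpdfgl",
    @"This is some text",
    @"This is some text \0",
};

// Runs of non-backslashes, or a backslash that is not followed by 0.
var untilBackslashZero = new Regex(@"^(?:[^\\]+|\\(?!0))+");

foreach (string input in inputs)
{
    Match m = untilBackslashZero.Match(input);
    Console.WriteLine($"[{m.Value}]");
}
// [This is some text ]
// [This is some text]
// [This is some text ]
```

Call Trim() on the match value if the space before \0 is unwanted.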
I like
@"^((?!\\0).)*"
Because it's very easy to adapt to any arbitrary string. The basic trick is the negative lookahead, which asserts that the text starting at the current position doesn't match the regular expression inside. We follow it with a wildcard (.) that consumes one character, so the group means "any character that is not the start of my delimiter". If your delimiter changes, this is an easy update; just use
@"^((?!--STRING--).)*"
as long as you properly escape that string. Heck, with this pattern, you're merely a regex escape function away from supporting any delimiter string.
Bonus: using * instead of + will return a blank string as a valid match when your string starts with your delimiter.
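A sketch of that idea using .NET's Regex.Escape. UntilDelimiter is a hypothetical helper name. One caveat: . does not match \n by default, so for multi-line input the match also stops at the first newline unless you pass RegexOptions.Singleline.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: "everything before the first delimiter" for any literal delimiter.
// Regex.Escape makes metacharacters in the delimiter (like . or \) match literally.
static Regex UntilDelimiter(string delimiter, bool allowEmpty = true)
{
    string quantifier = allowEmpty ? "*" : "+";
    return new Regex("^((?!" + Regex.Escape(delimiter) + ").)" + quantifier);
}

Regex star = UntilDelimiter(@"\0");
Regex plus = UntilDelimiter(@"\0", allowEmpty: false);

Console.WriteLine(star.Match(@"This is some text \0rest").Value); // "This is some text " (trailing space)
Console.WriteLine(star.Match(@"\0starts with delimiter").Success); // True (empty match)
Console.WriteLine(plus.Match(@"\0starts with delimiter").Success); // False
Console.WriteLine(UntilDelimiter("--STRING--").Match("abc--STRING--def").Value); // abc
```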
I've inherited some C# code with the following regular expression
Regex(@"^[a-zA-Z''-'\s]{1,40}$")
I understand this string except for the role of the single quotes. I've searched all over but can't seem to find an explanation. Any ideas?
From what I can tell, the expression is redundant.
It matches a-z or A-Z, or the ' character, or anything between ' and ' (which of course is only the ' character again), or any whitespace.
I've tested this using RegexPal and it doesn't appear to match anything but these characters. Perhaps the sequence was generated by code, or it used to match a wider range of characters in an earlier version?
UPDATE: From your comments (matching a name), I'm gonna go ahead and guess the author thought (s)he was escaping a hyphen by putting it in quotes, and wasn't the most stellar software tester. What they probably meant was:
Regex(@"^[a-zA-Z'\-\s]{1,40}$") //Escaped the hyphen
Which could also be written as:
Regex(@"^[a-zA-Z'\s-]{1,40}$") //Put the hyphen at the end where it's not ambiguous
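A quick check of that guess: the inherited pattern rejects a hyphenated name, while both corrected versions accept it. The sample names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

string original = @"^[a-zA-Z''-'\s]{1,40}$"; // as inherited: ''-' is just the ' character
string escaped  = @"^[a-zA-Z'\-\s]{1,40}$";  // hyphen escaped with a backslash
string trailing = @"^[a-zA-Z'\s-]{1,40}$";   // hyphen last in the class, so it is literal

foreach (string name in new[] { "O'Brien", "Mary-Jane", "Anne Marie" })
{
    Console.WriteLine($"{name}: {Regex.IsMatch(name, original)} {Regex.IsMatch(name, escaped)} {Regex.IsMatch(name, trailing)}");
}
// O'Brien: True True True
// Mary-Jane: False True True
// Anne Marie: True True True
```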
The only way having the apostrophe / single quote three times makes sense is if the second and third instances are actually fancy curly single quotes such as ‘, ’, and ‛. If so, a clearer way to represent it would be to use Unicode escapes:
Regex(@"^[a-zA-Z'\u2018-\u201B\s]{1,40}$")
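A short check of this version, assuming the curly-quote interpretation is what the author intended. The .NET regex engine parses \uXXXX itself, which is why the pattern can stay a verbatim string; the range \u2018-\u201B covers U+2018 through U+201B.

```csharp
using System;
using System.Text.RegularExpressions;

var nameRegex = new Regex(@"^[a-zA-Z'\u2018-\u201B\s]{1,40}$");

Console.WriteLine(nameRegex.IsMatch("O'Brien"));      // True  (ASCII apostrophe)
Console.WriteLine(nameRegex.IsMatch("O\u2019Brien")); // True  (right single quotation mark)
Console.WriteLine(nameRegex.IsMatch("Mary-Jane"));    // False (hyphen is not in the class)
```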
Incidentally some languages, such as PowerShell, explicitly allow these curly single quotes and treat them the same as the ASCII ' (0x27) character. From the PowerShell 2.0 Language Specification:
single-quote-character:
' (U+0027)
Left single quotation mark (U+2018)
Right single quotation mark (U+2019)
Single low-9 quotation mark (U+201A)
Single high-reversed-9 quotation mark (U+201B)
As it is the three single quote characters are redundant. They represent the single quote character (#1) and the range of characters which both begins and ends at the single quote (#2 and #3 separated by a hyphen).
It looks like it is an error, the writer seems to have meant to include the hyphen character in the class by "escaping" it in single quotes. Without escaping it the hyphen represents a character range, like in a-z and A-Z.
I'm guessing the original author meant [a-zA-Z'\-\s]
The extra apostrophes are redundant, so it doesn't make much sense. One possibility is that the author tried to escape the dash to include it in the pattern, but the correct way to do that would be to use a backslash:
Regex(@"^[a-zA-Z'\-\s]{1,40}$")
(Using apostrophes around a literal is for example used in custom format strings, where the author might have picked it up.)