App.Config escaping

I have an app.config which contains a FolderPath entry under the appSettings node.
During testing, QA set the FolderPath value to C:\directory\test & test, which made the application crash on startup.
I know the error is caused by the unescaped character (specifically &).
They're insisting that it's a program error and should be automatically escaped because some users may not know about escaping strings.
How do I deal with it?

If users are allowed to edit app.config, they must write correct XML, period. This is not a program error, and there is frankly nothing you can (or should) do about this. But you already know this.
I believe the point your QAs are trying to make is:
You should not expect "normal" users to know how to edit/write XML, or even know what XML is. Instead, for user-editable values, you should create a UI which edits the app.config, and otherwise tell your users to leave the file alone unless they "know what they are doing".

They're insisting that it's a program error and should be automatically escaped because some users may not know about escaping strings.
Users should never be allowed to deal directly with the settings file. If they have to update it, they should be given a GUI where they can make the changes they need. That way you have control and can test whether the input is correct.
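A minimal sketch of why a settings UI sidesteps the escaping problem: if the value is written through an XML API rather than typed by hand, the ampersand is escaped on the way out and restored on the way in. The class and method names here (AppSettingWriter, BuildAddElement, ReadValue) are hypothetical, and the sketch builds a single add element with LINQ to XML rather than editing a real app.config.

```csharp
using System;
using System.Xml.Linq;

public static class AppSettingWriter
{
    // Builds an <add key="..." value="..."/> element the way a settings UI might.
    // XAttribute escapes XML special characters (such as &) when serialized.
    public static string BuildAddElement(string key, string value)
    {
        var element = new XElement("add",
            new XAttribute("key", key),
            new XAttribute("value", value));
        return element.ToString(SaveOptions.DisableFormatting);
    }

    // Parses the element back and returns the unescaped value.
    public static string ReadValue(string addElementXml)
    {
        return (string)XElement.Parse(addElementXml).Attribute("value");
    }
}
```

Passing C:\directory\test & test as the value produces an attribute containing test &amp; test, and reading it back returns the original path unchanged.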

Related

Can I track the use of ConfigurationManager settings

I am trying to clean up the web.config in multiple projects, but am worried that I may remove an appsetting/connectionstring that is being used somewhere.
For example, I want to know if ConfigurationManager.AppSettings["MySetting"] is used.
I can of course do a global find for ConfigurationManager or AppSettings, but that doesn't check in compiled DLLs (this project has some referenced DLLs that I know are looking for certain keys).
I would love to be able to 'log' (text file, db, anywhere) the use of the .config file, minimally logging the key name, but ideally the namespace/method that called it. If this is possible, I could come back in some amount of time and check the log to see what is used.
Deleting the settings and seeing if the app throws an exception is tempting :), but not a realistic option.
Thanks in advance!

You can try using an aspect (aspect-oriented programming) to log that a method was called.
I'm not sure you will be able to capture which configuration key was retrieved, but knowing every place that makes the call is already a starting point.
Hope that helps.
Give PostSharp or Unity a try.
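For code you own, a plain wrapper is a simpler alternative to an aspect, and it does capture the key name. This is a swapped-in technique, not what PostSharp or Unity do, and it cannot see reads made inside compiled third-party DLLs. LoggingSettings is a hypothetical class; it takes a NameValueCollection (the same type as ConfigurationManager.AppSettings) and assumes .NET 4.5 or later for the caller-info attributes.

```csharp
using System;
using System.Collections.Specialized;
using System.Runtime.CompilerServices;

public sealed class LoggingSettings
{
    private readonly NameValueCollection _settings;
    private readonly Action<string> _log;

    public LoggingSettings(NameValueCollection settings, Action<string> log)
    {
        _settings = settings;
        _log = log;
    }

    // The compiler fills in caller and file at each call site, so every read
    // is logged with the key name and the member that asked for it.
    public string Get(string key,
        [CallerMemberName] string caller = "",
        [CallerFilePath] string file = "")
    {
        _log("AppSetting '" + key + "' read by " + caller + " (" + file + ")");
        return _settings[key];
    }
}
```

In an application this might be constructed once with ConfigurationManager.AppSettings and a delegate that appends to a log file, then used everywhere in place of direct AppSettings access.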

Canonicalize URL to lowercase without breaking file system or culture?

Canonicalizing URLs to Lowercase
I wish to write an HTTP module that converts URLs to lowercase. My first attempt ignored international character sets and works great:
// Convert URL virtual path to lowercase
string lowercase = context.Request.FilePath.ToLowerInvariant();
// If anything changed then issue 301 Permanent Redirect
if (!lowercase.Equals(context.Request.FilePath, StringComparison.Ordinal))
{
context.Response.RedirectPermanent(...lowercase URL...);
}
The Turkey Test (international cultures):
But what about cultures other than en-US? I referred to the Turkey Test to come up with a test URL:
http://example.com/Iıİi
This little insidious gem destroys any notion that case conversion in URLs is simple! Its lowercase and uppercase versions, respectively, are:
http://example.com/ııii
http://example.com/IIİİ
For case conversion to work with Turkish URLs, I first had to set the current culture of ASP.NET to Turkish:
<system.web>
<globalization culture="tr-TR" />
</system.web>
Next, I had to change my code to use the current culture for the case conversion:
// Convert URL virtual path to lowercase
string lowercase = context.Request.FilePath.ToLower(CultureInfo.CurrentCulture);
// If anything changed then issue 301 Permanent Redirect
if (!lowercase.Equals(context.Request.FilePath, StringComparison.Ordinal))
{
context.Response.RedirectPermanent(...);
}
But wait! Will StringComparison.Ordinal still work? Or should I use StringComparison.CurrentCulture? I'm really not certain of either!
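On the narrow question of the comparison: the check only needs to know whether the conversion changed any character, and an ordinal comparison answers exactly that, whichever culture did the lowercasing. A culture-sensitive comparison can treat strings with different code units as equal, which is not what a redirect test wants. A small sketch, with LowercaseRedirect and GetRedirectTarget as hypothetical names:

```csharp
using System;
using System.Globalization;

public static class LowercaseRedirect
{
    // Returns the lowercased path if it differs from the original, otherwise null.
    public static string GetRedirectTarget(string path, CultureInfo culture)
    {
        string lowercase = path.ToLower(culture);
        // Ordinal compares UTF-16 code units one by one, so it reports "different"
        // exactly when the conversion changed at least one character.
        return lowercase.Equals(path, StringComparison.Ordinal) ? null : lowercase;
    }
}
```

This does not settle which culture should do the lowercasing; it only shows that the "did anything change" test is independent of that choice.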
File names: It gets MUCH WORSE!
Even if the above works, using the current culture for case conversions breaks the NTFS file system! Let's say I have a static file with the name Iıİi.html:
http://example.com/Iıİi.html
Even though the Windows file system is case-insensitive it does not use language culture. Converting the above URL to lowercase results in a 404 Not Found because the file system doesn't consider the two names as equal:
http://example.com/ııii.html
The correct case conversion for file names? WHO KNOWS?!
The MSDN article, Best Practices for Using Strings in the .NET Framework, has a note (about halfway through the article):
Note:
The string behavior of the file system, registry keys and values, and environment variables is best represented by StringComparison.OrdinalIgnoreCase.
Huh? Best represented??? Is that the best we can do in C#? So just what is the correct case conversion to match the file system? Who knows?!!? About all we can say is that string comparisons using the above will probably work MOST of the time.
Summary: Two case conversions: Static/Dynamic URLs
So we've seen that static URLs---URLs having a file path that matches a real directory/file in the file system---must use an unknown case conversion that is only "best represented" by StringComparison.OrdinalIgnoreCase. And please note there is no string.ToLowerOrdinal() method so it's very difficult to know exactly what case conversion equates to the OrdinalIgnoreCase string comparison. Using string.ToLowerInvariant() is probably the best bet, yet it breaks language culture.
On the other hand, dynamic URLs---URLs with a file path that does not match a real file on the disk (that map to your application)---can use string.ToLower(CultureInfo.CurrentCulture), but it breaks file system matching and it is somewhat unclear what edge cases exist that may break this strategy.
Thus, it appears case conversion first requires detection as to whether a URL is static or dynamic before choosing one of two conversion methods. For static URLs there is uncertainty how to change case without breaking the Windows file system. For dynamic URLs it is questionable if case conversion using culture will similarly break the URL.
Whew! Anyone have a solution to this mess? Or should I just close my eyes and pretend everything is ASCII?

I would challenge the premise here that there is any utility whatsoever in attempting to auto-convert URLs to lowercase.
Whether a full URL is case-sensitive or not depends entirely on the web server, web application framework, and underlying file system.
You're only guaranteed case-insensitivity in the scheme (http://, etc.) and hostname portions of the URL. And remember that not all URL schemes (file and news, for example) even include a hostname.
Everything else can be case-sensitive to the server, including paths (/), filenames, queries (?), fragments (#), and authority info (usernames/passwords before the @ in mailto, http, ftp, and some other schemes).

You have some incompatible goals.
Have a culture-sensitive case-lowering. If Turkish seems bad, you don't want to know about some of the Georgian scripts, never mind that ß is either upper-cased to SS or less commonly to SZ - in either case to have a full case-folding where lower("ß") will match lower(upper("ß")) you need to consider it equivalent to at least one of those two-character sequences. Generally we aim for case-folding rather than case-lowering if possible (not possible here).
Use this in a non-culture-sensitive context. URIs are ultimately opaque strings. That they may have a human-readable meaning is useful for coders, users, search engines and marketers alike, but their ultimate job is to identify a resource by a direct case-sensitive comparison.
Map this to NTFS, which has a case-preserving case-insensitivity based on the mappings in the $UpCase file, which it applies by comparing the upper-cased forms of words (at least it doesn't have to decide whether Σ lower-cases to σ or ς in a culture-insensitive manner).
Presumably do well in terms of SEO and human readability. This may well be part of your original goal, but whileThisIsNotVeryEasyToReadOrParse itseasierforbothpeopleandmachinesthanthis. Case-folding loses information.
I suggest a different approach.
Start with your starting string, whatever that is and wherever it came from (NTFS filename, database entry, HttpHandler binding in web.config). Have that as your canonical form. By all means have rules that people should create these strings according to some canonical form, and perhaps enforce it where you can, but if something slips by that breaks your rules, then accept it as the official canonical name for that resource no matter how much you dislike it.
As much as possible the canonical name should be the only one "seen" by the outside world. This can be enforced programmatically or just a matter of it being best practice, as canonicalising after the fact with 301s won't solve the fact that outside entities don't know you do so until they dereference the URI.
When a request is received, test it according to how it is going to be used. Hence while you may choose to use a particular culture (or not) for those cases where you perform the resource-lookup yourself, with so-called "static" URIs, your logic can deliberately follow that of NTFS by simply using NTFS to do the work:
Find mapped file ignoring the matter of case sensitivity for now.
If non-match then 404, who cares about case?
If find, do case-sensitive ordinal comparison, if it doesn't match then 301 to the case-sensitive mapping.
Otherwise, proceed as usual.
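The four steps above can be sketched for a single directory as follows. This is an illustration under assumptions, not the answerer's code: CanonicalFileLookup and LookupOutcome are hypothetical names, and the case-insensitive match uses StringComparison.OrdinalIgnoreCase as an approximation of NTFS (per the MSDN note quoted in the question) instead of asking the file system directly.

```csharp
using System;
using System.IO;

public enum LookupOutcome { NotFound, Redirect, Serve }

public static class CanonicalFileLookup
{
    // Looks for requestedName in directory. An exact (ordinal) match is served,
    // a match that differs only by case yields a redirect to the stored name,
    // and no match at all is a 404.
    public static (LookupOutcome Outcome, string CanonicalName) Resolve(
        string directory, string requestedName)
    {
        string caseInsensitiveMatch = "";
        foreach (string path in Directory.EnumerateFiles(directory))
        {
            string actual = Path.GetFileName(path);
            if (string.Equals(actual, requestedName, StringComparison.Ordinal))
                return (LookupOutcome.Serve, actual);
            if (caseInsensitiveMatch.Length == 0 &&
                string.Equals(actual, requestedName, StringComparison.OrdinalIgnoreCase))
                caseInsensitiveMatch = actual;
        }
        return caseInsensitiveMatch.Length > 0
            ? (LookupOutcome.Redirect, caseInsensitiveMatch)
            : (LookupOutcome.NotFound, "");
    }
}
```

The name stored on disk is treated as canonical, so the 301 always points at whatever casing the file actually has, and no case conversion of the URL is ever performed.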
Edit:
In some ways the question of domain names is more complicated. The rules for IDN have to cover more issues with less room for manœuvre. However, it's also simpler, at least as far as case-canonicalising goes.
(I'm going to ignore canonicalising of whether or not www. is used etc. though I'd guess it's part of the same job here, it's pushing the scope and we could end up writing a book between us if we don't stop somewhere :)
IDNs have their own case canonicalisation (and some other forms of normalisation) rules defined in RFC 3491. If you're going to canonicalise domain names on case, follow that.
Makes it nice and simple to answer, doesn't it? :)
There's also less pressure in a way, for while search engines have to recognise that http://example.net/thisisapath and http://example.net/thisIsAPath may be the same resource, they also have to recognise that they might be different, and that's where all of the SEO advantage of canonicalising on one of them (doesn't matter which) comes from.
However, they know that example.net and EXAMPLE.NET can't possibly be different sites, so there's little SEO advantage in making sure they're the same (still nice for things like caches and history lists that don't make that jump themselves). Of course, the issue remains with the fact that www.example.net or even maAndPasExampleEmporium.us might be the same site, but again, that moves away from case issues.
There's also the simple matter that most of the time we never have to deal with more than a couple of dozen different domains, so sometimes working harder rather than smarter (i.e. just make sure they're all set up right and don't do anything programmatically!) can do the trick.
A final note though, it's important not to canonicalise a third-party URI. You can end up breaking things if you change the path (they may not be treating it case-insensitively) and you might at least end up breaking their slightly different canonicalisation. Best to leave them as is at all times.

Firstly, never use case transformations to compare strings. It needlessly allocates a string, has a small but needless performance impact, could result in a NullReferenceException if the value is null, and could well result in an incorrect comparison.
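As a small illustration of that point, the static string.Equals overload that takes a StringComparison compares in place and tolerates null arguments. SameIgnoringCase is a hypothetical helper, and OrdinalIgnoreCase is only one possible choice of comparison here.

```csharp
using System;

public static class CaseInsensitiveCompare
{
    // Compares without allocating lowercased copies; null arguments are handled
    // by the static string.Equals instead of throwing.
    public static bool SameIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```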
If this is important enough to you, I would manually traverse the file system and use your own comparisons against each file/directory name. You should be able to use the Accept-Language HTTP header to find a suitable culture to use. Once you have the CultureInfo you can use it to perform the string comparisons:
var ci = CultureInfo.CurrentCulture; // Use Accept-Language to derive this.
ci.CompareInfo.Compare("The URL", "the url", CompareOptions.IgnoreCase);
I would only do this on a HTTP 404; the HTTP 404 handler would search for a matching file and then HTTP 301 the user to the correctly-cased URL (as manual file-system traversal can get expensive).
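A rough sketch of deriving the culture from Accept-Language and using it for the comparison. RequestCulture and its methods are hypothetical names; the parsing is deliberately simplified (it takes the first tag and ignores quality values), and it falls back to the invariant culture when the tag is missing or not recognized by the runtime.

```csharp
using System;
using System.Globalization;

public static class RequestCulture
{
    // Returns the first language tag of an Accept-Language value,
    // e.g. "tr-TR" for "tr-TR,tr;q=0.9,en;q=0.8". Quality values are ignored.
    public static string FirstLanguageTag(string acceptLanguage)
    {
        if (string.IsNullOrWhiteSpace(acceptLanguage)) return "";
        string first = acceptLanguage.Split(',')[0];
        return first.Split(';')[0].Trim();
    }

    // Maps the header to a CultureInfo, falling back to the invariant culture
    // when the tag is missing, "*", or unknown to the runtime.
    public static CultureInfo FromAcceptLanguage(string acceptLanguage)
    {
        string tag = FirstLanguageTag(acceptLanguage);
        if (tag.Length == 0 || tag == "*") return CultureInfo.InvariantCulture;
        try { return CultureInfo.GetCultureInfo(tag); }
        catch (CultureNotFoundException) { return CultureInfo.InvariantCulture; }
    }

    // Culture-aware, case-insensitive comparison of two names.
    public static bool NamesMatch(string a, string b, CultureInfo culture)
    {
        return culture.CompareInfo.Compare(a, b, CompareOptions.IgnoreCase) == 0;
    }
}
```

In a 404 handler, the culture from FromAcceptLanguage would be passed to NamesMatch for each candidate file name before issuing the 301.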

.Net 3.5 DLL XML dependency

I wrote a class library in C# that uses a external XML file to store some data. I use this data (encoded rules) directly in the class library to do some substitutions within a text parser. The rules within the XML:
<rule>
<word>h e l l o</word>
<sub>Hello</sub>
</rule>
When I share the lib, I also have to share the XML. This is a bug source, at least for me ;) My question: is there any common way to solve such issues? Should I use app.config instead?
Thanks for any hint and best regards!

Why not embed the XML within the dll?
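A sketch of what embedding could look like, assuming the file is added to the project with Build Action set to Embedded Resource. The resource name MyLibrary.rules.xml, the RuleStore class, and the rules root element are all assumptions for illustration; the real resource name is the project's default namespace followed by the file name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Xml.Linq;

public static class RuleStore
{
    // Hypothetical resource name: "<default namespace>.<file name>".
    private const string ResourceName = "MyLibrary.rules.xml";

    // Loads the rules from the XML compiled into this assembly.
    public static Dictionary<string, string> LoadEmbedded()
    {
        Assembly assembly = typeof(RuleStore).Assembly;
        using (Stream stream = assembly.GetManifestResourceStream(ResourceName))
        {
            if (stream == null)
                throw new FileNotFoundException("Embedded resource not found: " + ResourceName);
            return Parse(stream);
        }
    }

    // Reads <rule><word>..</word><sub>..</sub></rule> elements under an assumed
    // <rules> root and returns them as word -> substitution.
    public static Dictionary<string, string> Parse(Stream xml)
    {
        var rules = new Dictionary<string, string>();
        foreach (XElement rule in XDocument.Load(new StreamReader(xml)).Descendants("rule"))
        {
            rules[(string)rule.Element("word")] = (string)rule.Element("sub");
        }
        return rules;
    }
}
```

With this in place the DLL is the only file that needs to be shared, at the cost of the rules no longer being editable without a rebuild.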

As with all external configuration data, it can be changed or go missing. So your application (or library) has to deal with such circumstances.
This means:
For every missing value you have a default value (should be declared in your documentation)
Check every value for correctness (type, range, etc.) (All input is evil!)
Blame user for invalid config files (error message, etc)
Implement and document behaviour in error case (abort, crash, use default value, etc)
So it doesn't matter which way you go, because it is user configuration (which means it can be changed by the user), and so you have to check those entries.
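The checklist above might look like this for a single integer setting. SettingReader and GetInt are hypothetical names, and the policy shown (default when missing, exception when malformed or out of range) is one possible choice among those the list mentions.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;

public static class SettingReader
{
    // Missing entry: return the documented default.
    // Wrong type or out of range: fail with a message that names the setting.
    public static int GetInt(NameValueCollection settings, string key,
        int defaultValue, int min, int max)
    {
        string raw = settings[key];
        if (raw == null) return defaultValue;

        int value;
        if (!int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out value))
            throw new FormatException(
                "Setting '" + key + "' must be a whole number, but was '" + raw + "'.");

        if (value < min || value > max)
            throw new ArgumentOutOfRangeException(key, value,
                "Expected a value between " + min + " and " + max + ".");

        return value;
    }
}
```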

Create the XML file with default settings/values if it doesn't exist.
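A minimal sketch of that idea, assuming a rules root element and using the single rule from the question as the built-in default. DefaultRules and LoadOrCreate are hypothetical names.

```csharp
using System.IO;
using System.Xml.Linq;

public static class DefaultRules
{
    // Writes a rules file with built-in defaults when none exists, then loads
    // whatever is on disk. An existing file is never overwritten.
    public static XDocument LoadOrCreate(string path)
    {
        if (!File.Exists(path))
        {
            var defaults = new XDocument(
                new XElement("rules",
                    new XElement("rule",
                        new XElement("word", "h e l l o"),
                        new XElement("sub", "Hello"))));
            defaults.Save(path);
        }
        return XDocument.Load(path);
    }
}
```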

Where can I find a list of all possible messages that an XmlException can contain?

I'm writing an XML code editor and I want to display syntax errors in the user interface. Because my code editor is strongly constrained to a particular problem domain and audience, I want to rewrite certain XmlException messages to be more meaningful for users. For instance, an exception message like this:
'"' is an unexpected token. The expected token is '='. Line 30, position 35
.. is very technical and not very informative to my audience. Instead, I'd like to rewrite it and other messages to something else. For completeness' sake that means I need to build up a dictionary of existing messages mapped to the new message I would like to display instead. To accomplish that I'm going to need a list of all possible messages XmlException can contain.
Is there such a list somewhere? Or can I find out the possible messages through inspection of objects in C#?
Edit: specifically, I am using XmlDocument.LoadXml to parse a string into an XmlDocument, and that method throws an XmlException when there are syntax errors. So specifically, my question is where I can find a list of messages applied to XmlException by XmlDocument.LoadXml. The discussion about there potentially being a limitless variation of actual strings in the Message property of XmlException is moot.
Edit 2: More specifically, I'm not looking for advice as to whether I should be attempting this; I'm just looking for any clues to a way to obtain the various messages. Ben's answer is a step in the right direction. Does anyone know of another way?

Technically there is no such list: any class that throws an XmlException can set the message to any string. Really it depends on which classes you are using, and how they handle exceptions. It is perfectly possible you may be using a class that includes context-specific information in the message, e.g. info about some XML node or attribute that is malformed. In that case the number of unique message strings could be infinite, depending on the XML that was being processed. It is equally possible that a particular class does not work in this way and has a finite number of messages that occur under specific circumstances. Perhaps a better approach would be to use try/catch blocks in specific parts of your code, where you understand the processing that is taking place, and provide more generic error messages based on what is happening. E.g. in your example you could simply look at the line and character number and produce an error along the lines of "Error processing xml file LineX CharacterY" or even something as general as "error processing file".
Edit:
Further to your edit, I think you will have trouble doing what you require. Essentially you are trying to change a text string to another text string based on certain keywords that may be in the string. This is likely to be messy and inconsistent. If you really want to do it, I would advise using something like Redgate .NET Reflector to reflect out the LoadXml method and dig through the code to see how it handles different kinds of syntax errors in the XML and what kind of messages it generates based on what kind of errors it finds. This is likely to be time-consuming and difficult. If you want to hide the technical errors but still provide useful info to the user, then I would still recommend ignoring the error message and simply pointing the user to the location of the problem in the file.

Just my opinion, but ... spelunking the error messages and altering them before displaying them to the user seems like a really misguided idea.
First, the messages are different for each international language. Even if you could collect them for English, and you're willing to pay the cost, they'll be different for other languages.
Second, even if you are dealing with a single language, there's no way to be sure that an external package hasn't injected a novel XmlException into the scope of LoadXml.
Last, the list of messages is not stable. It may change from release to release.
A better idea is to just emit an appropriate message from your own app, and optionally display -- maybe upon demand -- the original error message contained in the XmlException.
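One way this could look with XmlDocument.LoadXml: the application writes its own message from the position reported by the parser (XmlException.LineNumber and LinePosition) and keeps the original text available for display on demand. XmlSyntaxCheck and Describe are hypothetical names, and the wording of the message is only an example.

```csharp
using System;
using System.Xml;

public static class XmlSyntaxCheck
{
    // Returns null when the text is well-formed; otherwise a message written by
    // the application, built from the position the parser reported.
    public static string Describe(string xml, out string technicalDetail)
    {
        technicalDetail = null;
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            technicalDetail = ex.Message; // original text, shown only on demand
            return "The document has a syntax error at line " + ex.LineNumber +
                   ", column " + ex.LinePosition + ".";
        }
    }
}
```

Because the message is built from the numeric properties and not from the exception text, it is unaffected by the language or the framework release.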

FxCop: Compound word should be treated as discrete term

FxCop wants me to spell Username with a capital N (i.e. UserName), due to it being a compound word. However, due to consistency reasons we need to spell it with a lowercase n - so either username or Username.
I've tried tweaking the CodeAnalysisDictionary.xml by adding the following section to the <Words> section:
<DiscreteExceptions>
<Term>username</Term>
</DiscreteExceptions>
From what I understand of how custom dictionaries work, this should tell FxCop to treat username as a discrete term and prevent the CompoundWordsShouldBeCasedCorrectly (CA1702) check from firing an error.
Unfortunately this doesn't work. Does anybody have an idea why that is and how to solve this? I don't want to add suppressions, because this would seriously clutter the GlobalSuppressions file as there are quite a lot of occurrences.
Edited to add: For the time being I have solved this by using GlobalSuppressions, but given the nature of the issue this doesn't seem like the ideal way to solve this. Can anybody give a hint on where to look for further information on how FxCop applies the rules defined in a dictionary?

I was a developer on the FxCop / Managed Code Analysis Team for 3 years and I have your answer. Things have changed since my time, and I had forgotten exactly how custom dictionary handling worked and so it took me quite a bit of time to figure this out. :)
Executive Summary
The short answer is that you need to remove all references to username, usernames, UserName, and UserNames from C:\Program Files (x86)\Microsoft FxCop 1.36\CustomDictionary.xml.
Normally, I would not recommend this as it should not be required, but you have found what I believe is a bug, and this is the only workaround that I could find.
Full Story
OK, now for the long answer...
The rule has two distinct checks which work as follows:
A. Check for compound words that should be discrete
1. Split identifier into tokens: e.g. FileName --> { "file", "name" }
2. Spell check each adjacent pair of tokens.
3. If the spell check succeeds (e.g. filename is deemed to be a valid word), then we have found a potential problem since a single word should not be expressed as two tokens.
4. However, if there is a <Term CompoundAlternate="FileName">filename</Term> in the <Compound> section of the custom dictionary, then it is taken to mean that although filename is a word, the design guidelines (largely as a nod to consistency with prior art in the Framework that predates the existence of the rule) insist it should be written as FileName, and so we must suppress the warning.
5. Also, if there is a <Term>filename</Term> entry in the <DiscreteExceptions> section of the custom dictionary, then it is taken to mean that although 'filename' is a word, it might also be two words 'file' and 'name' in a different context. e.g. Onset is a word, but asking the user to change DoSomethingOnSet to DoSomethingOnset would be noise, and so we must suppress the warning.
B. Check for discrete words that should be compound:
1. Taking the tokens from A.1, check each one individually against the set of compound terms in the custom dictionary.
2. If there is a match, then we must warn in keeping with the interpretation in step A.4.
Notice that your warning: Username should be UserName is detected in part B, which does not consult the DiscreteExceptions section, which is why you are not able to suppress the warning by modifying that section. The problem is that the default custom dictionary has an entry stating that the correct casing for username is always UserName. It needs to be removed or overridden somehow.
The Bug
Now, the ideal solution would be to leave the default custom dictionary alone, specify SearchFxCopDir=false in your project file, and then merge in only the parts of the default custom dictionary that you want in the CustomDictionary.xml that is used for your project. Sadly, this does not work as FxCop 1.36 ignores the SearchFxCopDir directive and always treats it as true. I believe this is a bug, but it is also possible that this was an intentional change since the directive is not documented and has no corresponding UI. I honestly don't know...
Conclusion
Given that FxCop always uses its default custom dictionary in addition to the project custom dictionary, your only recourse is to remove the entries in question from the default custom dictionary.
If I have a chance, I will contact the current code analysis team to see if this is in fact a bug, and report back here...

The custom dictionary that comes with FxCop (located on my system in C:\Program Files\Microsoft FxCop 1.36\CustomDictionary.xml, but YMMV) has a <Term CompoundAlternate="UserName">username</Term> entry under Words\Compound. Delete it. You still need the discrete exception.
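If the entries need to be removed on several machines, the deletion could be scripted. The sketch below is an assumption-laden illustration, not an FxCop tool: DictionaryCleaner and RemoveCompoundTerms are hypothetical names, and it only touches Term elements inside Compound, so any other references to the word would still have to be removed by hand. Back up the file first; writing under Program Files normally needs administrator rights.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DictionaryCleaner
{
    // Removes <Term> entries inside <Compound> whose text matches one of the
    // given words (case-insensitively). Returns the number of entries removed.
    public static int RemoveCompoundTerms(XDocument dictionary, params string[] words)
    {
        var unwanted = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
        List<XElement> matches = dictionary
            .Descendants("Compound")
            .Elements("Term")
            .Where(term => unwanted.Contains(term.Value.Trim()))
            .ToList(); // materialize before removing, to avoid editing while iterating
        foreach (XElement term in matches)
        {
            term.Remove();
        }
        return matches.Count;
    }
}
```

A caller would load the dictionary with XDocument.Load, call RemoveCompoundTerms(doc, "username", "usernames"), and save the document back to the same path.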
