Escape command line arguments in c# - c#

Short version:
Is it enough to wrap the argument in quotes and escape \ and " ?
Code version
I want to pass the command line arguments string[] args to another process using ProcessStartInfo.Arguments.
ProcessStartInfo info = new ProcessStartInfo();
info.FileName = Application.ExecutablePath;
info.UseShellExecute = true;
info.Verb = "runas"; // Provides Run as Administrator
info.Arguments = EscapeCommandLineArguments(args);
Process.Start(info);
The problem is that I get the arguments as an array and must merge them into a single string. An argument could be crafted to trick my program.
my.exe "C:\Documents and Settings\MyPath \" --kill-all-humans \" except fry"
According to this answer I have created the following function to escape a single argument, but I might have missed something.
private static string EscapeCommandLineArguments(string[] args)
{
string arguments = "";
foreach (string arg in args)
{
arguments += " \"" +
arg.Replace ("\\", "\\\\").Replace("\"", "\\\"") +
"\"";
}
return arguments;
}
Is this good enough or is there any framework function for this?

It's more complicated than that though!
I was having a related problem (writing a front-end .exe that calls the back-end with all parameters passed, plus some extra ones), so I looked at how people do that and ran into your question. Initially all seemed good doing it as you suggest: arg.Replace(@"\", @"\\").Replace(quote, @"\"+quote).
However, when I call with the arguments c:\temp a\\b, they get passed as c:\temp and a\\b, which leads to the back-end being called with "c:\\temp" "a\\\\b". That is incorrect, because the back-end will then see the two arguments c:\\temp and a\\\\b, which is not what we wanted! We have been overzealous with escapes (Windows is not Unix!).
So I read http://msdn.microsoft.com/en-us/library/system.environment.getcommandlineargs.aspx in detail, and it actually describes how those cases are handled: backslashes are treated as escape characters only in front of a double quote.
There is a twist in how multiple \ are handled there, and the explanation can leave one dizzy for a while. I'll try to re-phrase the unescape rule here: say we have a substring of N \, followed by ". When unescaping, we replace that substring with int(N/2) \ and, iff N was odd, we add " at the end.
The encoding for such decoding goes like this: for an argument, find each substring of 0-or-more \ followed by " and replace it with twice as many \, followed by \". Which we can do like so:
s = Regex.Replace(arg, @"(\\*)" + "\"", @"$1$1\" + "\"");
That's all...
PS. ... not. Wait, wait - there is more! :)
We did the encoding correctly, but there is a twist because you are enclosing all parameters in double quotes (in case there are spaces in some of them). There is a boundary issue: if a parameter ends with \, adding " after it will break the meaning of the closing quote. For example, c:\one\ two is parsed to c:\one\ and two, then re-assembled to "c:\one\" "two", which will be (mis)understood as the single argument c:\one" two (I tried that, I am not making it up). So in addition we need to check whether the argument ends with \ and, if so, double the number of backslashes at the end, like so:
s = "\"" + Regex.Replace(s, @"(\\+)$", @"$1$1") + "\"";
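The two steps above can be combined into one helper. This is only a minimal sketch: the class and method names (ArgumentQuoter, Quote) are hypothetical, and it uses \z rather than $ so that only backslashes at the very end of the string are doubled.

```csharp
using System.Text.RegularExpressions;

public static class ArgumentQuoter
{
    // Hypothetical helper: always wraps the argument in double quotes.
    public static string Quote(string arg)
    {
        // Step 1: for each run of backslashes followed by a quote,
        // double the backslashes and escape the quote.
        string s = Regex.Replace(arg, @"(\\*)""", @"$1$1\""");

        // Step 2: double any trailing backslashes so that the closing
        // quote added below is not escaped, then wrap in quotes.
        return "\"" + Regex.Replace(s, @"(\\+)\z", "$1$1") + "\"";
    }
}
```

With this sketch, c:\one\ becomes "c:\one\\" while a\\b becomes "a\\b", since backslashes that do not precede a quote are left alone.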

My answer was similar to Nas Banov's answer but I wanted double quotes only if necessary.
Cutting out extra unnecessary double quotes
My code avoids putting unnecessary double quotes around every argument, which is important when you are getting close to the character limit for parameters.
/// <summary>
/// Encodes an argument for passing into a program
/// </summary>
/// <param name="original">The value that should be received by the program</param>
/// <returns>The value which needs to be passed to the program for the original value
/// to come through</returns>
public static string EncodeParameterArgument(string original)
{
if( string.IsNullOrEmpty(original))
return original;
string value = Regex.Replace(original, @"(\\*)" + "\"", @"$1\$0");
value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"");
return value;
}
// This is an EDIT
// Note that this version does the same but handles new lines in the arguments
public static string EncodeParameterArgumentMultiLine(string original)
{
if (string.IsNullOrEmpty(original))
return original;
string value = Regex.Replace(original, @"(\\*)" + "\"", @"$1\$0");
value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"", RegexOptions.Singleline);
return value;
}
explanation
To escape the backslashes and double quotes correctly, you can replace any run of backslashes followed by a double quote like this:
string value = Regex.Replace(original, @"(\\*)" + "\"", @"\$1$0");
The replacement is one extra backslash, plus the original backslashes twice, plus the original double quote, i.e. '\' + originalbackslashes + originalbackslashes + '"'. I used $1$0 since $0 holds the original backslashes and the original double quote, which makes the replacement nicer to read.
value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"");
This can only ever match an entire line that contains whitespace.
If it matches, it adds double quotes to the beginning and end.
If there were originally backslashes on the end of the argument, they will not have been escaped; now that there is a double quote on the end, they need to be. So they are duplicated, which escapes them all and prevents them from unintentionally escaping the final double quote.
It does a minimal matching for the first section so that the last .*? doesn't eat into matching the final backslashes.
Output
So these inputs produce the following outputs
hello
hello
\hello\12\3\
\hello\12\3\
hello world
"hello world"
\"hello\"
\\\"hello\\\"
\"hello\ world
"\\\"hello\ world"
\"hello\\\ world\
"\\\"hello\\\ world\\"
hello world\\
"hello world\\\\"

I have ported a C++ function from the article "Everyone quotes command line arguments the wrong way".
It works fine, but note that cmd.exe interprets the command line differently. If (and only if, as the original author of the article noted) your command line will be interpreted by cmd.exe, you should also escape shell metacharacters.
/// <summary>
/// This routine appends the given argument to a command line such that
/// CommandLineToArgvW will return the argument string unchanged. Arguments
/// in a command line should be separated by spaces; this function does
/// not add these spaces.
/// </summary>
/// <param name="argument">Supplies the argument to encode.</param>
/// <param name="force">
/// Supplies an indication of whether we should quote the argument even if it
/// does not contain any characters that would ordinarily require quoting.
/// </param>
private static string EncodeParameterArgument(string argument, bool force = false)
{
if (argument == null) throw new ArgumentNullException(nameof(argument));
// Unless we're told otherwise, don't quote unless we actually
// need to do so --- hopefully avoid problems if programs won't
// parse quotes properly
if (force == false
&& argument.Length > 0
&& argument.IndexOfAny(" \t\n\v\"".ToCharArray()) == -1)
{
return argument;
}
var quoted = new StringBuilder();
quoted.Append('"');
var numberBackslashes = 0;
foreach (var chr in argument)
{
switch (chr)
{
case '\\':
numberBackslashes++;
continue;
case '"':
// Escape all backslashes and the following
// double quotation mark.
quoted.Append('\\', numberBackslashes*2 + 1);
quoted.Append(chr);
break;
default:
// Backslashes aren't special here.
quoted.Append('\\', numberBackslashes);
quoted.Append(chr);
break;
}
numberBackslashes = 0;
}
// Escape all backslashes, but let the terminating
// double quotation mark we add below be interpreted
// as a metacharacter.
quoted.Append('\\', numberBackslashes*2);
quoted.Append('"');
return quoted.ToString();
}
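For the cmd.exe case mentioned above, here is a rough sketch of that extra escaping step. It assumes the approach I recall from the article, which is to prefix each of the metacharacters ( ) % ! ^ " < > & | with a caret after the arguments have already been quoted. The names CmdEscaper and EscapeForCmd are made up for illustration, and the character list should be checked against the article before relying on it.

```csharp
using System.Text.RegularExpressions;

public static class CmdEscaper
{
    // Hypothetical helper: apply this to the already-quoted command line,
    // and only when cmd.exe will interpret it.
    public static string EscapeForCmd(string quotedCommandLine)
    {
        // Put ^ in front of every character that cmd.exe treats specially
        // (assumed list: ( ) % ! ^ " < > & |).
        return Regex.Replace(quotedCommandLine, @"[()%!^""<>&|]", "^$0");
    }
}
```

Usage would look like EscapeForCmd(EncodeParameterArgument(arg)), applied to each argument before joining them with spaces.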

I was running into issues with this, too. Instead of unparsing args, I went with taking the full original commandline and trimming off the executable. This had the additional benefit of keeping whitespace in the call, even if it isn't needed/used. It still has to chase escapes in the executable, but that seemed easier than the args.
var commandLine = Environment.CommandLine;
var argumentsString = "";
if(args.Length > 0)
{
// Re-escaping args to be the exact same as they were passed is hard and misses whitespace.
// Use the original command line and trim off the executable to get the args.
var argIndex = -1;
if(commandLine[0] == '"')
{
//Double-quotes mean we need to dig to find the closing double-quote.
var backslashPending = false;
var secondDoublequoteIndex = -1;
for(var i = 1; i < commandLine.Length; i++)
{
if(backslashPending)
{
backslashPending = false;
continue;
}
if(commandLine[i] == '\\')
{
backslashPending = true;
continue;
}
if(commandLine[i] == '"')
{
secondDoublequoteIndex = i + 1;
break;
}
}
argIndex = secondDoublequoteIndex;
}
else
{
// No double-quotes, so args begin after first whitespace.
argIndex = commandLine.IndexOf(" ", System.StringComparison.Ordinal);
}
if(argIndex != -1)
{
argumentsString = commandLine.Substring(argIndex + 1);
}
}
Console.WriteLine("argumentsString: " + argumentsString);

I published a small project on GitHub that handles most issues with command line encoding/escaping:
https://github.com/ericpopivker/Command-Line-Encoder
There is a CommandLineEncoder.Utils.cs class, as well as Unit Tests that verify the Encoding/Decoding functionality.

I wrote you a small sample to show you how to use escape chars in command line.
public static string BuildCommandLineArgs(List<string> argsList)
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
foreach (string arg in argsList)
{
sb.Append("\"\"" + arg.Replace("\"", @"\" + "\"") + "\"\" ");
}
if (sb.Length > 0)
{
sb = sb.Remove(sb.Length - 1, 1);
}
return sb.ToString();
}
And here is a test method:
List<string> myArgs = new List<string>();
myArgs.Add("test\"123"); // test"123
myArgs.Add("test\"\"123\"\"234"); // test""123""234
myArgs.Add("test123\"\"\"234"); // test123"""234
string cmargs = BuildCommandLineArgs(myArgs);
// result: ""test\"123"" ""test\"\"123\"\"234"" ""test123\"\"\"234""
// when you pass this result to your app, you will get this args list:
// test"123
// test""123""234
// test123"""234
The point is to wrap each arg with double-double quotes ( ""arg"" ) and to replace all quotes inside the arg value with an escaped quote ( test\"123 ).

static string BuildCommandLineFromArgs(params string[] args)
{
if (args == null)
return null;
string result = "";
if (Environment.OSVersion.Platform == PlatformID.Unix
||
Environment.OSVersion.Platform == PlatformID.MacOSX)
{
foreach (string arg in args)
{
result += (result.Length > 0 ? " " : "")
+ arg
// Escape backslashes first so the ones added below are not doubled.
.Replace(@"\", @"\\")
.Replace(@" ", @"\ ")
.Replace("\t", "\\\t")
.Replace(@"""", @"\""")
.Replace(@"<", @"\<")
.Replace(@">", @"\>")
.Replace(@"|", @"\|")
.Replace(@"#", @"\#")
.Replace(@"&", @"\&");
}
}
else //Windows family
{
bool enclosedInApo, wasApo;
string subResult;
foreach (string arg in args)
{
enclosedInApo = arg.LastIndexOfAny(
new char[] { ' ', '\t', '|', '#', '^', '<', '>', '&'}) >= 0;
wasApo = enclosedInApo;
subResult = "";
for (int i = arg.Length - 1; i >= 0; i--)
{
switch (arg[i])
{
case '"':
subResult = @"\""" + subResult;
wasApo = true;
break;
case '\\':
subResult = (wasApo ? @"\\" : @"\") + subResult;
break;
default:
subResult = arg[i] + subResult;
wasApo = false;
break;
}
}
result += (result.Length > 0 ? " " : "")
+ (enclosedInApo ? "\"" + subResult + "\"" : subResult);
}
}
return result;
}

An Alternative Approach
If you're passing a complex object such as nested JSON and you have control over the system that's receiving the command line arguments, it's far easier to just encode the command line arg/s as base64 and then decode them from the receiving system.
See here: Encode/Decode String to/from Base64
Use Case: I needed to pass a JSON object that contained an XML string in one of the properties which was overly complicated to escape. This solved it.
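A minimal sketch of that round trip, assuming UTF-8 text; the class and method names (Base64Args, Encode, Decode) are only illustrative. Standard Base64 output consists of letters, digits, +, / and =, so the encoded value contains no spaces, quotes, or backslashes to escape.

```csharp
using System;
using System.Text;

public static class Base64Args
{
    // Sender side: turn an arbitrary string into a command-line-safe token.
    public static string Encode(string value) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(value));

    // Receiver side: recover the original string from the token.
    public static string Decode(string encoded) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
}
```

The sender would pass Base64Args.Encode(json) as the argument, and the receiving program would call Base64Args.Decode(args[0]).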

Does a nice job of adding arguments, but doesn't escape. Added comment in method where escape sequence should go.
public static string ApplicationArguments()
{
List<string> args = Environment.GetCommandLineArgs().ToList();
args.RemoveAt(0); // remove executable
StringBuilder sb = new StringBuilder();
foreach (string s in args)
{
// todo: add escape double quotes here
sb.Append(string.Format("\"{0}\" ", s)); // wrap all args in quotes
}
return sb.ToString().Trim();
}

Copy sample code function from this url:
http://csharptest.net/529/how-to-correctly-escape-command-line-arguments-in-c/index.html
You can get command line to execute for example like this:
String cmdLine = EscapeArguments(Environment.GetCommandLineArgs().Skip(1).ToArray());
Skip(1) skips executable name.

Related

Equals() method not recognizing similar/same characters when comparing

Why comparing characters with .Equals always returns false?
char letter = 'a';
Console.WriteLine(letter.Equals("a")); // false
Overall I'm trying to write an English - Morse Code translator. I run into a problem comparing char values which shown above. I began with a foreach to analyze all the characters from a ReadLine() input, by using the WriteLine() method, all the characters were transposed fine, but when trying to compare them using the .Equals() method, no matter what I did, it always output false when trying to compare chars.
I have used the .Equals() method with other strings successfully, but it seems to not work with my chars.
using System;
public class MorseCode {
public static void Main (string[] args) {
Console.WriteLine ("Hello, write anything to convert it to morse code!");
var input = Console.ReadLine();
foreach (char letter in input) {
if(letter.Equals("a")) {
Console.WriteLine("Its A - live");
}
Console.WriteLine(letter);
}
var morseTranslation = "";
foreach (char letter in input) {
if(letter.Equals("a")) {
morseTranslation += ". _ - ";
}
if(letter.Equals("b")) {
morseTranslation += "_ . . . - ";
}
if(letter.Equals("c")) {
morseTranslation += "_ . _ . - ";
}
...
}
Console.WriteLine("In morse code, " + input + " is '" + morseTranslation + "'");
}
}
At the beginning, I wrote the foreach to test if it recognized and ran the correct output, but in the end, when I wrote "sample" into the ReadLine(), it gave me :
Hello, write anything to convert it to morse code!
sample
s
a
m
p
l
e
When you do this:
var c = 'x';
var isEqual = c.Equals("x");
the result (isEqual) will always be false because it's comparing a string to a char. This would return true:
var isEqual = c.Equals('x');
The difference is that "x" is a string literal and 'x' is a char literal.
Part of what makes this confusing is that when you use an object's Equals method, it allows you to compare any type to any other type. So you could do this:
var x = 0;
var y = "y";
var isEqual = x.Equals(y);
...and the compiler will allow it, even though the comparison between int and string won't work. At most you will get a warning.
When comparing value types like int or char with other values of the same type, we usually use ==, like
if (someChar == someOtherChar)
Then if you tried to do this:
if(someChar == "a")
It wouldn't compile. It would tell you that you're comparing a char to a string, and then it's easier because instead of running the program and looking for the error it just won't compile at all and it will tell you exactly where the problem is.
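A small sketch of the three comparisons discussed above (the class and method names are made up for the example):

```csharp
public static class CharComparisonDemo
{
    // Compiles, but compares a char with a string through Equals(object),
    // so it is false for every input.
    public static bool EqualsString(char c) => c.Equals("a");

    // Compares char with char.
    public static bool EqualsChar(char c) => c.Equals('a');

    // The usual way to compare value types of the same type.
    public static bool OperatorChar(char c) => c == 'a';
}
```

Calling all three with 'a' gives false, true, true, which is the behaviour described in the question.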
Just for the fun of it, here's another implementation.
public static class MorseCodeConverter
{
private static readonly Dictionary<char, string> Codes
= CreateMorseCodeDictionary();
public static string Convert(string input)
{
var lowerCase = input.ToLower();
var result = new StringBuilder();
foreach (var character in lowerCase)
{
if (Codes.ContainsKey(character))
result.Append(Codes[character]);
}
return result.ToString();
}
static Dictionary<char, string> CreateMorseCodeDictionary()
{
var result = new Dictionary<char, string>();
result.Add('a', ". _ - ");
result.Add('b', "_ . . . - ");
// add all the rest
return result;
}
}
One difference is that it's a class by itself without the console app. Then you can use it in a console app. Read the input from the keyboard and then call
MorseCodeConverter.Convert(input);
to get the result, and then you can print it to the console.
Putting all of the characters in a dictionary means that instead of repeating the if/then you can just check to see if each character is in the dictionary.
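As a variation on the same idea, here is a sketch that uses TryGetValue so each character needs only one dictionary lookup. It is not the code above: the class name MorseSketch is hypothetical, only three letters are filled in, and it uses the standard Morse patterns for them rather than the placeholder strings from the question.

```csharp
using System.Collections.Generic;

public static class MorseSketch
{
    // Only a, b and c are listed here; the remaining letters are omitted.
    private static readonly Dictionary<char, string> Codes =
        new Dictionary<char, string>
        {
            ['a'] = ".-",
            ['b'] = "-...",
            ['c'] = "-.-."
        };

    public static string Convert(string input)
    {
        var parts = new List<string>();
        foreach (char ch in input.ToLowerInvariant())
        {
            // Characters without an entry are skipped.
            if (Codes.TryGetValue(ch, out string code))
                parts.Add(code);
        }
        return string.Join(" ", parts);
    }
}
```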
It's important to remember that while char and string values look reminiscent of each other when printed, they are not handled in exactly the same way.
When you check a string you can use:
string s = "A";
if(s.Equals("A"))
{
//Do Something
}
However, the above will not work with a char. On the surface, the difference between chars (value types) and strings (reference types) is the delimiter: single quote (apostrophe) vs double quote.
To compare a char you can do this:
char s = 'A';
if(s.Equals('A'))
{
//Do Something
}
On a point relevant to your specific case, however: Morse code only requires a single-case alphabet, so instead of comparing against both 'A' and 'a' you can call input.ToLower() to reduce your string to all lower case.
It's good that you're aware of string comparisons and are not using direct value comparison, as this:
if (letter == 'a')
{
Console.WriteLine("Its A - live");
}
would have allowed you to compare the char, but it's bad practice as it may lead to lazy comparison of strings in the same way, and this:
if (letter == "a")
{
Console.WriteLine("Its A - live");
}
is a non-representative method of comparison for the purpose of comparing strings, as it evaluates the reference not the direct value, see here
For char comparison you have to use the single quote character ', not the double quote ".
By the way, it writes sample one letter per line because your first foreach loop writes every letter on a new line. So the code below will work for you:
using System;
public class MorseCode {
public static void Main (string[] args) {
Console.WriteLine ("Hello, write anything to convert it to morse code!");
var input = Console.ReadLine();
/*foreach (char letter in input) {
if(letter.Equals("a")) {
Console.WriteLine("Its A - live");
}
Console.WriteLine(letter);
}*/
var morseTranslation = "";
foreach (char letter in input) {
if(letter.Equals('a')) {
morseTranslation += ". _ - ";
}
if(letter.Equals('b')) {
morseTranslation += "_ . . . - ";
}
if(letter.Equals('c')) {
morseTranslation += "_ . _ . - ";
}
...
}
Console.WriteLine("In morse code, " + input + " is '" + morseTranslation + "'");
}
}
In C#, you can compare strings like integers, that is with == operator. Equals is a method inherited from the object class, and normally implementations would make some type checks. char letter is (obviously) a character, while "a" is a single lettered string.
That's why it returns false.
You could use if (letter.Equals('a')) { ... }, or simpler if (letter == 'a') { ... }
Even simpler than that would be switch (letter) { case 'a': ...; break; ... }.
Or something that is more elegant but maybe too advanced yet for a beginner, using LINQ:
var validCharacters = "ABCDE...";
var codes = new string[] {
".-", "-...", "-.-.", "-..", ".", ...
};
var matchedCodes = input.ToUpper() // make uppercase
.ToCharArray() // explode string into single characters
.Select(c => validCharacters.IndexOf(c)) // for each element (i.e. character), get the result of "validCharacters.IndexOf",
// which equals the index of the morse code in the array "codes"
.Where(i => i > -1) // only take the indexes of characters that were found in "validCharacters"
.Select(i => codes[i]); // retrieve the matching entry from "codes" by index
// "matchedCodes" is now an IEnumerable<string>, a structure saying
// "I am a list of strings over which you can iterate,
// and I know how to generate the elements as you request them."
// Now concatenate all single codes to one long result string
var result = string.Join(" ", matchedCodes);

Can I get the arguments to my application in the original form (e.g including quotes)?

I have a .NET application that takes a bunch of command line arguments, processes some of them, and uses the rest as arguments for another application.
E.g.
MyApp.exe foo1 App2.exe arg1 arg2 ...
MyApp.exe is my application,
foo1 is a parameter that my application cares about. App2.exe is another application, and my application will run App2 with arg1, arg2, etc. as arguments.
Currently my application just runs App2.exe using something like this:
Process.Start(args[1], String.Join(" ", args.Skip(2))). So the command above will correctly run App2.exe with the arguments "arg1 arg2". However, consider something like this:
MyApp.exe foo1 notepad.exe "C:\Program Files\readme.txt"
The code above will not be aware of the quotes, and will run notepad.exe with arguments C:\Program Files\readme.txt (without quotes).
How can I fix this problem?
Environment.CommandLine
will give you the exact command line. You'll have to parse out the path to your app, but otherwise it works like a charm. @idle_mind alluded to this earlier (kind of).
Edited to move the example into the answer (because people are obviously still looking for this answer). Note that when debugging, vshost messes up the command line a little.
#if DEBUG
int bodgeLen = "\"vshost.\"".Length;
#else
int bodgeLen = "\"\"".Length;
#endif
string a = Environment.CommandLine.Substring(Assembly.GetExecutingAssembly().Location.Length+bodgeLen).Trim();
You will need to modify MyApp to enclose any arguments with quotes.
Short story, the new code should be this:
var argsToPass = args.Skip(2).Select(o => "\"" + o.Replace("\"", "\\\"") + "\"");
Process.Start(args[1], String.Join(" ", argsToPass));
The logic is this:
each argument should be enclosed with quotes, so if you are calling with :
MyApp.exe foo1 notepad.exe "C:\Program Files\readme.txt"
The app will get called this way:
notepad.exe "C:\Program Files\readme.txt"
each argument should escape the quotes (if any), so if you are calling with:
MyApp.exe foo1 notepad.exe "C:\Program Files\Some Path with \"Quote\" here\readme.txt"
The app will get called this way:
notepad.exe "C:\Program Files\Some Path with \"Quote\" here\readme.txt"
Use Environment.GetCommandLineArgs(), as it keeps a parameter in quotes together as one argument.
Well, the simple answer is to just wrap every argument in quotes when calling MyApp2.exe.
It doesn't hurt to wrap arguments that are one word, and it will fix the case that it has spaces in the argument.
The only thing that might go wrong is if the argument has an escaped quote in it already.
You can use backslashes to escape the quotes. The following will work:
MyApp.exe foo1 notepad.exe \"C:\Program Files\readme.txt\"
This is the best solution if you have no idea which other executables are going to be run and what arguments they expect. In that case you can't add quotes from your program.
Instead, instruct users to add backslashes before quotes when running your application.
Credit to @mp3ferret for having the right idea. But there was no example of a solution using Environment.CommandLine, so I went ahead and created an OriginalCommandLine class that gets the command line arguments as originally entered.
An argument is defined in the tokenizer regex as being a double-quoted string of any type of character, or an unquoted string of non-whitespace characters. Within the quoted strings, the quote character can be escaped by a backslash. However, a trailing backslash followed by a double quote and then whitespace will not be treated as an escape.
The reason I chose the whitespace exception to the escape rule was to accommodate quoted paths that end with a backslash. I believe it is far less likely that you'll encounter a situation where you'd actually want the escaped double quote.
Code
static public class OriginalCommandLine
{
static Regex tokenizer = new Regex(@"""(?:\\""(?!\s)|[^""])*""|[^\s]+");
static Regex unescaper = new Regex(@"\\("")(?!\s|$)");
static Regex unquoter = new Regex(@"^\s*""|""\s*$");
static Regex quoteTester = new Regex(@"^\s*""(?:\\""|[^""])*""\s*$");
static public string[] Parse(string commandLine = null)
{
return tokenizer.Matches(commandLine ?? Environment.CommandLine).Cast<Match>()
.Skip(1).Select(m => unescaper.Replace(m.Value, @"""")).ToArray();
}
static public string UnQuote(string text)
{
return (IsQuoted(text)) ? unquoter.Replace(text, "") : text;
}
static public bool IsQuoted(string text)
{
return text != null && quoteTester.Match(text).Success;
}
}
Results
As you can see from the results below, the above method maintains the quotes while more gracefully handling a realistic scenario you might encounter.
Test:
ConsoleApp1.exe foo1 notepad.exe "C:\Progra\"m Files\MyDocuments\" "C:\Program Files\bar.txt"
args[]:
[0]: foo1
[1]: notepad.exe
[2]: C:\Progra"m Files\MyDocuments" C:\Program
[3]: Files\bar.txt
CommandLine.Parse():
[0]: foo1
[1]: notepad.exe
[2]: "C:\Progra"m Files\MyDocuments\"
[3]: "C:\Program Files\bar.txt"
Finally
I debated using an alternative scheme for escaping double quotes. I feel that using "" is better given that command lines so often deal with backslashes. I kept the backslash escaping method because it is backwards compatible with how command line arguments are normally processed.
If you want to use that scheme make the following changes to the regexes:
static Regex tokenizer = new Regex(@"""(?:""""|[^""])*""|[^\s]+");
static Regex unescaper = new Regex(@"""""");
static Regex unquoter = new Regex(@"^\s*""|""\s*$");
static Regex quoteTester = new Regex(@"^\s*""(?:""""|[^""])*""\s*$");
If you want to get closer to what you expect from args but with the quotes intact, change the two regexes. There is still a minor difference, "abc"d will return abcd from args but [0] = "abc", [1] = d from my solution.
static Regex tokenizer = new Regex(@"""(?:\\""|[^""])*""|[^\s]+");
static Regex unescaper = new Regex(@"\\("")");
If you really, really want to get the same number of elements as args, use the following:
static Regex tokenizer = new Regex(@"(?:[^\s""]*""(?:\\""|[^""])*"")+|[^\s]+");
static Regex unescaper = new Regex(@"\\("")");
Result of exact match
Test: "zzz"zz"Zzz" asdasd zz"zzz" "zzz"
args OriginalCommandLine
------------- -------------------
[0]: zzzzzZzz [0]: "zzz"zz"Zzz"
[1]: asdasd [1]: asdasd
[2]: zzzzz [2]: zz"zzz"
[3]: zzz [3]: "zzz"
Try the following.
This code preserves the double quote characters and also gives you the option to escape the \ and " characters (see the comments in the code below).
static void Main(string[] args)
{
// This project should be compiled with "unsafe" flag!
Console.WriteLine(GetRawCommandLine());
var prms = GetRawArguments();
foreach (var prm in prms)
{
Console.WriteLine(prm);
}
}
[DllImport("kernel32.dll", CharSet = CharSet.Auto)]
private static extern System.IntPtr GetCommandLine();
public static string GetRawCommandLine()
{
// Win32 API
string s = Marshal.PtrToStringAuto(GetCommandLine());
// or better, managed code as suggested by @mp3ferret
// string s = Environment.CommandLine;
return s.Substring(s.IndexOf('"', 1) + 1).Trim();
}
public static string[] GetRawArguments()
{
string cmdline = GetRawCommandLine();
// Now let's split the arguments.
// Let's assume the following possible escape sequences:
// \" = "
// \\ = \
// \ with any other character will be treated as \
//
// You may choose other rules and implement them!
var args = new ArrayList();
bool inQuote = false;
int pos = 0;
StringBuilder currArg = new StringBuilder();
while (pos < cmdline.Length)
{
char currChar = cmdline[pos];
if (currChar == '"')
{
currArg.Append(currChar);
inQuote = !inQuote;
}
else if (currChar == '\\')
{
char nextChar = pos < cmdline.Length - 1 ? cmdline[pos + 1] : '\0';
if (nextChar == '\\' || nextChar == '"')
{
currArg.Append(nextChar);
pos += 2;
continue;
}
else
{
currArg.Append(currChar);
}
}
else if (inQuote || !char.IsWhiteSpace(currChar))
{
currArg.Append(currChar);
}
if (!inQuote && char.IsWhiteSpace(currChar) && currArg.Length > 0)
{
args.Add(currArg.ToString());
currArg.Clear();
}
pos++;
}
if (currArg.Length > 0)
{
args.Add(currArg.ToString());
currArg.Clear();
}
return (string[])args.ToArray(typeof(string));
}
Try with "\"". I also have to pass URLs as arguments, and this is how I do it.
_filenameDestin and _zip are URLs. I hope it helps.
string ph = "\"";
var psi = new ProcessStartInfo();
// Each of the two paths gets its own opening and closing quote.
psi.Arguments = "a -r " + ph + _filenameDestin + ".zip" + ph + " " + ph + _filenameDestin + ph;
psi.FileName = _zip;
var p = new Process();
p.StartInfo = psi;
p.Start();
p.WaitForExit();
One solution might be to try using Command Line Parser, a free 3rd-party tool, to set up your application to take specific flags.
For example, you could define the accepted options as follows:
internal sealed class Options
{
[Option('a', "mainArguments", Required=true, HelpText="The arguments for the main application")]
public String MainArguments { get; set; }
[Option('t', "targetApplication", Required = true, HelpText = "The second application to run.")]
public String TargetApplication { get; set; }
[Option('p', "targetParameters", Required = true, HelpText = "The arguments to pass to the target application.")]
public String TargetParameters { get; set; }
[ParserState]
public IParserState LastParserState { get; set; }
[HelpOption]
public string GetUsage()
{
return HelpText.AutoBuild(this, current => HelpText.DefaultParsingErrorsHandler(this, current));
}
}
Which can then be used in your Program.cs as follows:
static void Main(string[] args)
{
Options options = new Options();
var parser = new CommandLine.Parser();
if (parser.ParseArgumentsStrict(args, options, () => Environment.Exit(-2)))
{
Run(options);
}
}
private static void Run(Options options)
{
String mainArguments = options.MainArguments;
// Do whatever you want with your main arguments.
String quotedTargetParameters = String.Format("\"{0}\"", options.TargetParameters);
Process targetProcess = Process.Start(options.TargetApplication, quotedTargetParameters);
}
You would then call it on the command line like this:
myApp -a mainArgs -t targetApp -p "target app parameters"
This takes all the guesswork out of trying to figure out what's an argument for which app while also allowing your user to specify them in whatever order they want. And if you decide to add in another argument down the road, you can easily do so without breaking everything.
EDIT: Updated Run method to include ability to add quotes around the target parameters.

Regex to remove single-line SQL comments (--)

Question:
Can anybody give me a working regex expression (C#/VB.NET) that can remove single line comments from a SQL statement ?
I mean these comments:
-- This is a comment
not those
/* this is a comment */
because I already can handle the star comments.
I have a made a little parser that removes those comments when they are at the start of the line, but they can also be somewhere after code or worse, in a SQL-string 'hello --Test -- World'
Those comments should also be removed (except those in a SQL string of course - if possible).
Surprisingly, I didn't get the regex working. I would have assumed the star comments to be more difficult, but actually, they aren't.
As per request, here my code to remove /**/-style comments
(In order to have it ignore SQL-style strings, you have to substitute the strings with a unique identifier (I used 4 concatenated), then apply the comment removal, then substitute the strings back.)
static string RemoveCstyleComments(string strInput)
{
string strPattern = @"/[*][\w\d\s]+[*]/";
//strPattern = @"/\*.*?\*/"; // Doesn't work
//strPattern = "/\\*.*?\\*/"; // Doesn't work
//strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work
//strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work
// http://stackoverflow.com/questions/462843/improving-fixing-a-regex-for-c-style-block-comments
strPattern = @"/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/"; // Works !
string strOutput = System.Text.RegularExpressions.Regex.Replace(strInput, strPattern, string.Empty, System.Text.RegularExpressions.RegexOptions.Multiline);
Console.WriteLine(strOutput);
return strOutput;
} // End Function RemoveCstyleComments
I will disappoint all of you: this can't be done with regular expressions. Sure, it's easy to find comments that are not in a string (even the OP could do that); the real problem is comment markers inside a string. There is a little hope in lookarounds, but that's still not enough. Knowing that a quote precedes the marker on the line guarantees nothing. The only thing that guarantees anything is the parity of the quotes, which is something you can't find with a regular expression. So simply go with a non-regex approach.
EDIT:
Here's the c# code:
String sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
char[] quotes = { '\'', '"'};
int newCommentLiteral, lastCommentLiteral = 0;
while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
{
int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
if (countQuotes % 2 == 0) //this is a comment, since there's an even number of quotes preceding
{
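int eol = sql.IndexOf("\r\n", newCommentLiteral); //search from the comment start, not from the start of the string
if (eol == -1)
eol = sql.Length; //no more newline, meaning end of the string
else
eol += 2; //also remove the line break itself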
sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral);
lastCommentLiteral = newCommentLiteral;
}
else //this is within a string, find string ending and moving to it
{
int singleQuote = sql.IndexOf("'", newCommentLiteral);
if (singleQuote == -1)
singleQuote = sql.Length;
int doubleQuote = sql.IndexOf('"', newCommentLiteral);
if (doubleQuote == -1)
doubleQuote = sql.Length;
lastCommentLiteral = Math.Min(singleQuote, doubleQuote) + 1;
//instead of finding the end of the string you could simply do += 2 but the program will become slightly slower
}
}
Console.WriteLine(sql);
What this does: it finds every comment marker (--). For each one, it checks whether the marker is inside a string by counting the quotes between the current match and the previous position. If that number is even, the marker starts a comment, so it is removed (find the next end of line and remove what's in between). If it is odd, the marker is inside a string, so find the end of the string and move past it. This snippet is based on a weird SQL trick: 'this" is treated as a valid string even though the two quotes differ. If that's not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one is faster and more straightforward.
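If your SQL dialect only uses single quotes for strings, the same quote-parity idea can be written as a single pass over the characters. The following is only a sketch under that assumption; the class and method names are made up for illustration, and it does not handle /* */ block comments or double-quoted identifiers.

```csharp
using System;
using System.Text;

static class SqlLineComments
{
    // Hypothetical helper: copies the input and drops everything from "--"
    // to the end of the line, unless the marker sits inside a '...' literal.
    public static string Strip(string sql)
    {
        var output = new StringBuilder(sql.Length);
        bool inString = false;
        int i = 0;
        while (i < sql.Length)
        {
            char c = sql[i];
            if (c == '\'')
            {
                // An escaped quote ('') toggles the flag twice, so the state stays correct.
                inString = !inString;
                output.Append(c);
                i++;
            }
            else if (!inString && c == '-' && i + 1 < sql.Length && sql[i + 1] == '-')
            {
                // Skip the comment text but keep the line break that follows it.
                while (i < sql.Length && sql[i] != '\r' && sql[i] != '\n')
                    i++;
            }
            else
            {
                output.Append(c);
                i++;
            }
        }
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Strip("select 'a--b' -- trailing comment\r\nfrom t"));
    }
}
```

Unlike the snippet above, this keeps the line break after each removed comment, so tokens on adjacent lines cannot run together.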
You want something like this for the simple case
-{2,}.*
The -{2,} looks for a dash that occurs 2 or more times in a row.
The .* gets the rest of the line up to the newline.
But for the edge cases, it appears that SinistraD is correct that you cannot catch everything; however, here is an article about how this can be done in C# with a combination of code and regex.
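To make the limitation concrete, here is a small sketch (the class and method names are invented for the example) that applies the simple pattern to one input where it works and one where a string literal gets truncated:

```csharp
using System;
using System.Text.RegularExpressions;

static class SimpleDashRegexDemo
{
    public static string Strip(string sql)
    {
        // Without RegexOptions.Singleline, '.' does not match '\n',
        // so each match stops at the end of its own line.
        return Regex.Replace(sql, "-{2,}.*", "");
    }

    static void Main()
    {
        Console.WriteLine(Strip("SELECT a -- trailing comment")); // prints "SELECT a "
        Console.WriteLine(Strip("SELECT '--x' FROM t"));          // prints "SELECT '"
    }
}
```

The second call shows the edge case: the pattern has no notion of string literals, so everything after the dashes inside the quotes is removed.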
This seems to work well for me so far; it even ignores comments within strings, such as SELECT '--not a comment--' FROM ATable
private static string removeComments(string sql)
{
string pattern = @"(?<=^ ([^'""] |['][^']*['] |[""][^""]*[""])*) (--.*$|/\*(.|\n)*?\*/)";
return Regex.Replace(sql, pattern, "", RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
}
Note: it is designed to eliminate both /**/-style comments as well as -- style. Remove |/\*(.|\n)*?\*/ to get rid of the /**/ checking. Also be sure you are using the RegexOptions.IgnorePatternWhitespace Regex option!!
I wanted to be able to handle double-quotes too, but since T-SQL doesn't support them, you could get rid of |[""][^""]*[""] too.
Adapted from here.
Note (Mar 2015): In the end, I wound up using Antlr, a parser generator, for this project. There may have been some edge cases where the regex didn't work. In the end I was much more confident with the results having used Antlr, and it's worked well.
using System.Text.RegularExpressions;
public static string RemoveSQLCommentCallback(Match SQLLineMatch)
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
bool open = false; //opening of SQL String found
char prev_ch = ' ';
foreach (char ch in SQLLineMatch.ToString())
{
if (ch == '\'')
{
open = !open;
}
else if ((!open && prev_ch == '-' && ch == '-'))
{
break;
}
sb.Append(ch);
prev_ch = ch;
}
return sb.ToString().Trim('-');
}
The code
public static void Main()
{
string sqlText = "WHERE DEPT_NAME LIKE '--Test--' AND START_DATE < SYSDATE -- Don't go over today";
//for every matching line call callback func
string result = Regex.Replace(sqlText, ".*--.*", RemoveSQLCommentCallback);
}
Regex.Replace finds every line that contains a dash-dash sequence and calls the parsing callback for each match.
As a late solution, the simplest way is to do it using ScriptDom-TSqlParser:
// Requires "using System.Linq;" for the Where/Select calls below.
// https://michaeljswart.com/2014/04/removing-comments-from-sql/
// http://web.archive.org/web/*/https://michaeljswart.com/2014/04/removing-comments-from-sql/
public static string StripCommentsFromSQL(string SQL)
{
Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser parser =
new Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser(true);
System.Collections.Generic.IList<Microsoft.SqlServer.TransactSql.ScriptDom.ParseError> errors;
Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment fragments =
parser.Parse(new System.IO.StringReader(SQL), out errors);
// clear comments
string result = string.Join(
string.Empty,
fragments.ScriptTokenStream
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.MultilineComment)
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.SingleLineComment)
.Select(x => x.Text));
return result;
}
or, instead of using the Microsoft parser, you can use the ANTLR4 TSqlLexer
or without any parser at all:
private static System.Text.RegularExpressions.Regex everythingExceptNewLines =
new System.Text.RegularExpressions.Regex("[^\r\n]");
// http://drizin.io/Removing-comments-from-SQL-scripts/
// http://web.archive.org/web/*/http://drizin.io/Removing-comments-from-SQL-scripts/
public static string RemoveComments(string input, bool preservePositions, bool removeLiterals = false)
{
//based on http://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689
var lineComments = @"--(.*?)\r?\n";
var lineCommentsOnLastLine = @"--(.*?)$"; // because it's possible that there's no \r\n after the last line comment
// literals ('literals'), bracketedIdentifiers ([object]) and quotedIdentifiers ("object"), they follow the same structure:
// there's the start character, any consecutive pairs of closing characters are considered part of the literal/identifier, and then comes the closing character
var literals = @"('(('')|[^'])*')"; // 'John', 'O''malley''s', etc
var bracketedIdentifiers = @"\[((\]\])|[^\]])* \]"; // [object], [ % object]] ], etc
var quotedIdentifiers = @"(\""((\""\"")|[^""])*\"")"; // "object", "object[]", etc - when QUOTED_IDENTIFIER is set to ON, they are identifiers, else they are literals
//var blockComments = @"/\*(.*?)\*/"; //the original code was for C#, but Microsoft SQL allows a nested block comments // //https://msdn.microsoft.com/en-us/library/ms178623.aspx
//so we should use balancing groups // http://weblogs.asp.net/whaggard/377025
var nestedBlockComments = @"/\*
(?>
/\* (?<LEVEL>) # On opening push level
|
\*/ (?<-LEVEL>) # On closing pop level
|
(?! /\* | \*/ ) . # Match any char unless the opening and closing strings
)+ # /* or */ in the lookahead string
(?(LEVEL)(?!)) # If level exists then fail
\*/";
string noComments = System.Text.RegularExpressions.Regex.Replace(input,
nestedBlockComments + "|" + lineComments + "|" + lineCommentsOnLastLine + "|" + literals + "|" + bracketedIdentifiers + "|" + quotedIdentifiers,
me => {
if (me.Value.StartsWith("/*") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks // return new string(' ', me.Value.Length);
else if (me.Value.StartsWith("/*") && !preservePositions)
return "";
else if (me.Value.StartsWith("--") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks
else if (me.Value.StartsWith("--") && !preservePositions)
return everythingExceptNewLines.Replace(me.Value, ""); // preserve only line-breaks // Environment.NewLine;
else if (me.Value.StartsWith("[") || me.Value.StartsWith("\""))
return me.Value; // do not remove object identifiers ever
else if (!removeLiterals) // Keep the literal strings
return me.Value;
else if (removeLiterals && preservePositions) // remove literals, but preserving positions and line-breaks
{
var literalWithLineBreaks = everythingExceptNewLines.Replace(me.Value, " ");
return "'" + literalWithLineBreaks.Substring(1, literalWithLineBreaks.Length - 2) + "'";
}
else if (removeLiterals && !preservePositions) // wrap completely all literals
return "''";
else
throw new System.NotImplementedException();
},
System.Text.RegularExpressions.RegexOptions.Singleline | System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace);
return noComments;
}
I don't know if C#/VB.net regex is special in some way but traditionally s/--.*// should work.
In PHP, I'm using this code to strip comments from SQL (single-line comments only):
$sqlComments = '#(([\'"`]).*?[^\\\]\2)|((?:\#|--).*?$)\s*|(?<=;)\s+#ms';
/* Commented version
$sqlComments = '#
(([\'"`]).*?[^\\\]\2) # $1 : Skip single & double quoted + backticked expressions
|((?:\#|--).*?$) # $3 : Match single line comments
\s* # Trim after comments
|(?<=;)\s+ # Trim after semi-colon
#msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
To remove all comments see Regex to match MySQL comments

How can I convert PascalCase to split words?

I have variables containing text such as:
ShowSummary
ShowDetails
AccountDetails
Is there a simple function or method in C# that I can apply to these variables to yield:
"Show Summary"
"Show Details"
"Account Details"
I was wondering about an extension method but I've never coded one and I am not sure where to start.
See this post by Jon Galloway and one by Phil
In the application I am currently working on, we have a delegate based split extension method. It looks like so:
public static string Split(this string target, Func<char, char, bool> shouldSplit, string splitFiller = " ")
{
if (target == null)
throw new ArgumentNullException("target");
if (shouldSplit == null)
throw new ArgumentNullException("shouldSplit");
if (String.IsNullOrEmpty(splitFiller))
throw new ArgumentNullException("splitFiller");
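int targetLength = target.Length;
// Nothing to split in an empty string; this also guards the target[0] access below.
if (targetLength == 0)
return target;
// We know the resulting string is going to be at least the length of target
StringBuilder result = new StringBuilder(targetLength);
result.Append(target[0]);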
// Loop from the second character to the last character.
for (int i = 1; i < targetLength; ++i)
{
char firstChar = target[i - 1];
char secondChar = target[i];
if (shouldSplit(firstChar, secondChar))
{
// If a split should be performed add in the filler
result.Append(splitFiller);
}
result.Append(secondChar);
}
return result.ToString();
}
It can then be used as follows:
string showSummary = "ShowSummary";
string spacedString = showSummary.Split((c1, c2) => Char.IsLower(c1) && Char.IsUpper(c2));
This allows you to split on any conditions between two chars, and insert a filler of your choice (default of a space).
The best approach would be to iterate through each character in the string and check whether it is upper case. If so, insert a space before it; otherwise, move on to the next character.
Also, ideally start from the second character so that a space would not be inserted before the first character.
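A minimal sketch of that loop could look like the following. The class and method names are just placeholders, and it assumes plain PascalCase input without acronyms:

```csharp
using System;
using System.Text;

static class PascalCaseSplitter
{
    // Inserts a space before every upper-case character except the first one.
    public static string SplitWords(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        var result = new StringBuilder(text.Length + 8);
        result.Append(text[0]);
        // Start at the second character so no space is added in front of the string.
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]))
                result.Append(' ');
            result.Append(text[i]);
        }
        return result.ToString();
    }

    static void Main()
    {
        Console.WriteLine(SplitWords("ShowSummary"));    // prints "Show Summary"
        Console.WriteLine(SplitWords("AccountDetails")); // prints "Account Details"
    }
}
```

Note that a run of capitals such as "HTMLParser" would come out as "H T M L Parser", so acronyms need an extra rule if they occur in your names.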
Try something like this (it needs using System.Linq):
var word = "AccountDetails";
word = string.Join(string.Empty, word
.Select(c => char.IsUpper(c) ? " " + c : c.ToString())).Trim();

Apostrophe (') in XPath query

I use the following XPath query to list the objects under a site: ListObject[@Title='SomeValue']. SomeValue is dynamic. This query works as long as SomeValue does not contain an apostrophe ('). I also tried using an escape sequence, but that didn't work.
What am I doing wrong?
This is surprisingly difficult to do.
Take a look at the XPath Recommendation, and you'll see that it defines a literal as:
Literal ::= '"' [^"]* '"'
| "'" [^']* "'"
Which is to say, string literals in XPath expressions can contain apostrophes or double quotes but not both.
You can't use escaping to get around this. A literal like this:
'Some&apos;Value'
will match this XML text:
Some&amp;apos;Value
This does mean that it's possible for there to be a piece of XML text that you can't generate an XPath literal to match, e.g.:
<elm att="&quot;&apos;"/>
But that doesn't mean it's impossible to match that text with XPath, it's just tricky. In any case where the value you're trying to match contains both single and double quotes, you can construct an expression that uses concat to produce the text that it's going to match:
elm[@att=concat('"', "'")]
So that leads us to this, which is a lot more complicated than I'd like it to be:
/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
///
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains only single or double quotes, an XPath
/// literal equal to the value. If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
static string XPathLiteral(string value)
{
// if the value contains only single or double quotes, construct
// an XPath literal
if (!value.Contains("\""))
{
return "\"" + value + "\"";
}
if (!value.Contains("'"))
{
return "'" + value + "'";
}
// if the value contains both single and double quotes, construct an
// expression that concatenates all non-double-quote substrings with
// the quotes, e.g.:
//
// concat("foo", '"', "bar")
StringBuilder sb = new StringBuilder();
sb.Append("concat(");
string[] substrings = value.Split('\"');
for (int i = 0; i < substrings.Length; i++ )
{
bool needComma = (i>0);
if (substrings[i] != "")
{
if (i > 0)
{
sb.Append(", ");
}
sb.Append("\"");
sb.Append(substrings[i]);
sb.Append("\"");
needComma = true;
}
if (i < substrings.Length - 1)
{
if (needComma)
{
sb.Append(", ");
}
sb.Append("'\"'");
}
}
sb.Append(")");
return sb.ToString();
}
And yes, I tested it with all the edge cases. That's why the logic is so stupidly complex:
foreach (string s in new[]
{
"foo", // no quotes
"\"foo", // double quotes only
"'foo", // single quotes only
"'foo\"bar", // both; double quotes in mid-string
"'foo\"bar\"baz", // multiple double quotes in mid-string
"'foo\"", // string ends with double quotes
"'foo\"\"", // string ends with run of double quotes
"\"'foo", // string begins with double quotes
"\"\"'foo", // string begins with run of double quotes
"'foo\"\"bar" // run of double quotes in mid-string
})
{
Console.Write(s);
Console.Write(" = ");
Console.WriteLine(XPathLiteral(s));
XmlElement elm = d.CreateElement("test");
d.DocumentElement.AppendChild(elm);
elm.SetAttribute("value", s);
string xpath = "/root/test[@value = " + XPathLiteral(s) + "]";
if (d.SelectSingleNode(xpath) == elm)
{
Console.WriteLine("OK");
}
else
{
Console.WriteLine("Should have found a match for {0}, and didn't.", s);
}
}
Console.ReadKey();
}
I ported Robert's answer to Java (tested in 1.6):
/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
///
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains only single or double quotes, an XPath
/// literal equal to the value. If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
public static String XPathLiteral(String value) {
if(!value.contains("\"") && !value.contains("'")) {
return "'" + value + "'";
}
// if the value contains only single or double quotes, construct
// an XPath literal
if (!value.contains("\"")) {
System.out.println("Doesn't contain Quotes");
String s = "\"" + value + "\"";
System.out.println(s);
return s;
}
if (!value.contains("'")) {
System.out.println("Doesn't contain apostophes");
String s = "'" + value + "'";
System.out.println(s);
return s;
}
// if the value contains both single and double quotes, construct an
// expression that concatenates all non-double-quote substrings with
// the quotes, e.g.:
//
// concat("foo", '"', "bar")
StringBuilder sb = new StringBuilder();
sb.append("concat(");
// A negative limit makes split() keep trailing empty strings, so values that
// end in one or more double quotes need no special handling after the loop.
String[] substrings = value.split("\"", -1);
for (int i = 0; i < substrings.length; i++) {
boolean needComma = (i > 0);
if (!substrings[i].equals("")) {
if (i > 0) {
sb.append(", ");
}
sb.append("\"");
sb.append(substrings[i]);
sb.append("\"");
needComma = true;
}
if (i < substrings.length - 1) {
if (needComma) {
sb.append(", ");
}
sb.append("'\"'");
}
System.out.println("Step " + i + ": " + sb.toString());
}
sb.append(")");
String s = sb.toString();
System.out.println(s);
return s;
}
Hope this helps somebody!
EDIT: After a heavy unit testing session, and checking the XPath Standards, I have revised my function as follows:
public static string ToXPath(string value) {
const string apostrophe = "'";
const string quote = "\"";
if(value.Contains(quote)) {
if(value.Contains(apostrophe)) {
throw new XPathException("Illegal XPath string literal.");
} else {
return apostrophe + value + apostrophe;
}
} else {
return quote + value + quote;
}
}
It appears that XPath doesn't have a character escaping system at all, it's quite primitive really. Evidently my original code only worked by coincidence. My apologies for misleading anyone!
Original answer below for reference only - please ignore
For safety, make sure that any occurrence of all 5 predefined XML entities in your XPath string are escaped, e.g.
public static string ToXPath(string value) {
return "'" + XmlEncode(value) + "'";
}
public static string XmlEncode(string value) {
StringBuilder text = new StringBuilder(value);
text.Replace("&", "&amp;");
text.Replace("'", "&apos;");
text.Replace(@"""", "&quot;");
text.Replace("<", "&lt;");
text.Replace(">", "&gt;");
return text.ToString();
}
I have done this before and it works fine. If it doesn't work for you, maybe there is some additional context to the problem that you need to make us aware of.
By far the best approach to this problem is to use the facilities provided by your XPath library to declare an XPath-level variable that you can reference in the expression. The variable value can then be any string in the host programming language, and isn't subject to the restrictions of XPath string literals. For example, in Java with javax.xml.xpath:
XPathFactory xpf = XPathFactory.newInstance();
final Map<String, Object> variables = new HashMap<>();
xpf.setXPathVariableResolver(new XPathVariableResolver() {
public Object resolveVariable(QName name) {
return variables.get(name.getLocalPart());
}
});
XPath xpath = xpf.newXPath();
XPathExpression expr = xpath.compile("ListObject[@Title=$val]");
variables.put("val", someValue);
NodeList nodes = (NodeList)expr.evaluate(someNode, XPathConstants.NODESET);
For C# XPathNavigator you would define a custom XsltContext as described in this MSDN article (you'd only need the variable-related parts of this example, not the extension functions).
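For the C# side, a stripped-down version of such a context might look like the sketch below. StringVariableContext, StringVariable and XPathVariableDemo are made-up names, the /root/ListObject document shape is an assumption for the demo, and only string variables are supported; the base types (XsltContext, IXsltContextVariable, XPathExpression.SetContext) are the ones from System.Xml.Xsl and System.Xml.XPath.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical minimal context that resolves $name references to strings.
sealed class StringVariableContext : XsltContext
{
    private readonly Dictionary<string, string> values = new Dictionary<string, string>();

    public void Set(string name, string value) { values[name] = value; }

    // Throws KeyNotFoundException if the query uses a variable that was never set.
    public override IXsltContextVariable ResolveVariable(string prefix, string name)
    {
        return new StringVariable(values[name]);
    }

    // No extension functions in this sketch.
    public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] argTypes)
    {
        return null;
    }

    public override bool Whitespace { get { return true; } }
    public override bool PreserveWhitespace(XPathNavigator node) { return true; }
    public override int CompareDocument(string baseUri, string nextBaseUri) { return 0; }

    private sealed class StringVariable : IXsltContextVariable
    {
        private readonly string value;
        public StringVariable(string value) { this.value = value; }
        public bool IsLocal { get { return false; } }
        public bool IsParam { get { return false; } }
        public XPathResultType VariableType { get { return XPathResultType.String; } }
        public object Evaluate(XsltContext context) { return value; }
    }
}

static class XPathVariableDemo
{
    // Counts ListObject elements whose Title equals the given value. The value is
    // passed through $val, so its quotes never become part of the query text.
    public static int CountByTitle(XmlDocument doc, string title)
    {
        var context = new StringVariableContext();
        context.Set("val", title);

        XPathNavigator navigator = doc.CreateNavigator();
        XPathExpression expression = navigator.Compile("/root/ListObject[@Title = $val]");
        expression.SetContext(context);
        return navigator.Select(expression).Count;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><ListObject Title=\"it's &quot;quoted&quot;\" /></root>");
        Console.WriteLine(CountByTitle(doc, "it's \"quoted\""));
    }
}
```

Because the value never passes through the XPath parser, it can contain apostrophes and double quotes in any combination.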
Most of the answers here focus on how to use string manipulation to cobble together an XPath that uses string delimiters in a valid way.
I would say the best practice is not to rely on such complicated and potentially fragile methods.
The following applies to .NET since this question is tagged with C#. Ian Roberts has provided what I think is the best solution for when you're using XPath in Java.
Nowadays, you can use Linq-to-Xml to query XML documents in a way that allows you to use your variables in the query directly. This is not XPath, but the purpose is the same.
For the example given in OP, you could query the nodes you want like this:
var value = "Some value with 'apostrophes' and \"quotes\"";
// doc is an instance of XElement or XDocument
IEnumerable<XElement> nodes =
doc.Descendants("ListObject")
.Where(lo => (string)lo.Attribute("Title") == value);
or to use the query comprehension syntax:
IEnumerable<XElement> nodes = from lo in doc.Descendants("ListObject")
where (string)lo.Attribute("Title") == value
select lo;
.NET also provides a way to use XPath variables in your XPath queries. Sadly, it's not easy to do this out of the box, but with a simple helper class that I provide in this other SO answer, it's quite easy.
You can use it like this:
var value = "Some value with 'apostrophes' and \"quotes\"";
var variableContext = new VariableContext { { "matchValue", value } };
// ixn is an instance of IXPathNavigable
XPathNodeIterator nodes = ixn.CreateNavigator()
.SelectNodes("ListObject[@Title = $matchValue]",
variableContext);
Here is an alternative to Robert Rossney's StringBuilder approach, perhaps more intuitive:
/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
///
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
///
/// From: http://stackoverflow.com/questions/1341847/special-character-in-xpath-query
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains only single or double quotes, an XPath
/// literal equal to the value. If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
public static string XPathLiteral(string value)
{
// If the value contains only single or double quotes, construct
// an XPath literal
if (!value.Contains("\""))
return "\"" + value + "\"";
if (!value.Contains("'"))
return "'" + value + "'";
// If the value contains both single and double quotes, construct an
// expression that concatenates all non-double-quote substrings with
// the quotes, e.g.:
//
// concat("foo",'"',"bar")
List<string> parts = new List<string>();
// First, put a '"' after each component in the string.
foreach (var str in value.Split('"'))
{
if (!string.IsNullOrEmpty(str))
parts.Add('"' + str + '"'); // (edited -- thanks Daniel :-)
parts.Add("'\"'");
}
// Then remove the extra '"' after the last component.
parts.RemoveAt(parts.Count - 1);
// Finally, put it together into a concat() function call.
return "concat(" + string.Join(",", parts) + ")";
}
You can quote an XPath string by using search and replace.
In F#
let quoteString (s : string) =
if not (s.Contains "'" ) then sprintf "'%s'" s
else if not (s.Contains "\"") then sprintf "\"%s\"" s
else "concat('" + s.Replace ("'", "', \"'\", '") + "')"
I haven't tested it extensively, but seems to work.
I really like Robert's answer, but I feel like the code could be a little denser.
using System.Linq;
namespace Humig.Csp.Common
{
public static class XpathHelpers
{
public static string XpathLiteralEncode(string literalValue)
{
return string.IsNullOrEmpty(literalValue)
? "''"
: !literalValue.Contains("\"")
? $"\"{literalValue}\""
: !literalValue.Contains("'")
? $"'{literalValue}'"
: $"concat({string.Join(",'\"',", literalValue.Split('"').Select(k => $"\"{k}\""))})";
}
}
}
I have also created a unit test with all the test cases:
using HtmlAgilityPack;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace Humig.Csp.Common.Tests
{
[TestClass()]
public class XpathHelpersTests
{
[DataRow("foo")] // no quotes
[DataRow("\"foo")] // double quotes only
[DataRow("'foo")] // single quotes only
[DataRow("'foo\"bar")] // both; double quotes in mid-string
[DataRow("'foo\"bar\"baz")] // multiple double quotes in mid-string
[DataRow("'foo\"")] // string ends with double quotes
[DataRow("'foo\"\"")] // string ends with run of double quotes
[DataRow("\"'foo")] // string begins with double quotes
[DataRow("\"\"'foo")] // string begins with run of double quotes
[DataRow("'foo\"\"bar")] // run of double quotes in mid-string
[TestMethod()]
public void XpathLiteralEncodeTest(string attrValue)
{
var doc = new HtmlDocument();
var hnode = doc.CreateElement("html");
var body = doc.CreateElement("body");
var div = doc.CreateElement("div");
div.Attributes.Add("data-test", attrValue);
doc.DocumentNode.AppendChild(hnode);
hnode.AppendChild(body);
body.AppendChild(div);
var literalOut = XpathHelpers.XpathLiteralEncode(attrValue);
string xpath = $"/html/body/div[@data-test = {literalOut}]";
var result = doc.DocumentNode.SelectSingleNode(xpath);
Assert.AreEqual(div, result, $"did not find a match for {attrValue}");
}
}
}
If you're not going to have any double-quotes in SomeValue, you can use escaped double-quotes to specify the value you're searching for in your XPath search string.
ListObject[@Title=\"SomeValue\"]
You can fix this issue by using double quotes instead of single quotes in the XPath expression.
For example:
element.XPathSelectElements(String.Format("//group[@title=\"{0}\"]", "Man's"));
I had this problem a while back, and seemingly the simplest (though not the fastest) solution is to add a new node to the XML document that has an attribute with the value 'SomeValue', then look for that attribute value using a simple XPath search. After you're finished with the operation, you can delete the temporary node from the XML document.
This way, the whole comparison happens "inside", so you don't have to construct the weird XPath query.
I seem to remember that in order to speed things up, you should be adding the temp value to the root node.
Good luck...
