I use the following XPath query to list the objects under a site: ListObject[@Title='SomeValue']. SomeValue is dynamic. The query works as long as SomeValue does not contain an apostrophe ('). I also tried using an escape sequence, but that didn't work.
What am I doing wrong?
This is surprisingly difficult to do.
Take a look at the XPath Recommendation, and you'll see that it defines a literal as:
Literal ::= '"' [^"]* '"'
| "'" [^']* "'"
Which is to say, string literals in XPath expressions can contain apostrophes or double quotes but not both.
You can't use escaping to get around this. A literal like this:
'Some&apos;Value'
will match this XML text:
Some&amp;apos;Value
This does mean that it's possible for there to be a piece of XML text that you can't generate an XPath literal to match, e.g.:
<elm att="&quot;&apos;"/>
But that doesn't mean it's impossible to match that text with XPath, it's just tricky. In any case where the value you're trying to match contains both single and double quotes, you can construct an expression that uses concat to produce the text that it's going to match:
elm[@att=concat('"', "'")]
So that leads us to this, which is a lot more complicated than I'd like it to be:
/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
///
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains only single or double quotes, an XPath
/// literal equal to the value. If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
static string XPathLiteral(string value)
{
// if the value contains only single or double quotes, construct
// an XPath literal
if (!value.Contains("\""))
{
return "\"" + value + "\"";
}
if (!value.Contains("'"))
{
return "'" + value + "'";
}
// if the value contains both single and double quotes, construct an
// expression that concatenates all non-double-quote substrings with
// the quotes, e.g.:
//
// concat("foo", '"', "bar")
StringBuilder sb = new StringBuilder();
sb.Append("concat(");
string[] substrings = value.Split('\"');
for (int i = 0; i < substrings.Length; i++ )
{
bool needComma = (i>0);
if (substrings[i] != "")
{
if (i > 0)
{
sb.Append(", ");
}
sb.Append("\"");
sb.Append(substrings[i]);
sb.Append("\"");
needComma = true;
}
if (i < substrings.Length - 1)
{
if (needComma)
{
sb.Append(", ");
}
sb.Append("'\"'");
}
}
sb.Append(")");
return sb.ToString();
}
And yes, I tested it with all the edge cases. That's why the logic is so stupidly complex:
foreach (string s in new[]
{
"foo", // no quotes
"\"foo", // double quotes only
"'foo", // single quotes only
"'foo\"bar", // both; double quotes in mid-string
"'foo\"bar\"baz", // multiple double quotes in mid-string
"'foo\"", // string ends with double quotes
"'foo\"\"", // string ends with run of double quotes
"\"'foo", // string begins with double quotes
"\"\"'foo", // string begins with run of double quotes
"'foo\"\"bar" // run of double quotes in mid-string
})
{
Console.Write(s);
Console.Write(" = ");
Console.WriteLine(XPathLiteral(s));
XmlElement elm = d.CreateElement("test");
d.DocumentElement.AppendChild(elm);
elm.SetAttribute("value", s);
string xpath = "/root/test[@value = " + XPathLiteral(s) + "]";
if (d.SelectSingleNode(xpath) == elm)
{
Console.WriteLine("OK");
}
else
{
Console.WriteLine("Should have found a match for {0}, and didn't.", s);
}
}
Console.ReadKey();
}
I ported Robert's answer to Java (tested in 1.6):
/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
///
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains only single or double quotes, an XPath
/// literal equal to the value. If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
public static String XPathLiteral(String value) {
if(!value.contains("\"") && !value.contains("'")) {
return "'" + value + "'";
}
// if the value contains only single or double quotes, construct
// an XPath literal
if (!value.contains("\"")) {
System.out.println("Doesn't contain Quotes");
String s = "\"" + value + "\"";
System.out.println(s);
return s;
}
if (!value.contains("'")) {
System.out.println("Doesn't contain apostrophes");
String s = "'" + value + "'";
System.out.println(s);
return s;
}
// if the value contains both single and double quotes, construct an
// expression that concatenates all non-double-quote substrings with
// the quotes, e.g.:
//
// concat("foo", '"', "bar")
StringBuilder sb = new StringBuilder();
sb.append("concat(");
// A limit of -1 keeps trailing empty strings, so a value that ends in one
// or more double quotes splits the same way it does in the C# original.
String[] substrings = value.split("\"", -1);
for (int i = 0; i < substrings.length; i++) {
boolean needComma = (i > 0);
if (!substrings[i].equals("")) {
if (i > 0) {
sb.append(", ");
}
sb.append("\"");
sb.append(substrings[i]);
sb.append("\"");
needComma = true;
}
if (i < substrings.length - 1) {
if (needComma) {
sb.append(", ");
}
sb.append("'\"'");
}
}
sb.append(")");
String s = sb.toString();
System.out.println(s);
return s;
}
Hope this helps somebody!
EDIT: After a heavy unit testing session, and checking the XPath Standards, I have revised my function as follows:
public static string ToXPath(string value) {
const string apostrophe = "'";
const string quote = "\"";
if(value.Contains(quote)) {
if(value.Contains(apostrophe)) {
throw new XPathException("Illegal XPath string literal.");
} else {
return apostrophe + value + apostrophe;
}
} else {
return quote + value + quote;
}
}
It appears that XPath doesn't have a character escaping system at all; it's quite primitive really. Evidently my original code only worked by coincidence. My apologies for misleading anyone!
Original answer below for reference only - please ignore
For safety, make sure that any occurrence of all 5 predefined XML entities in your XPath string are escaped, e.g.
public static string ToXPath(string value) {
return "'" + XmlEncode(value) + "'";
}
public static string XmlEncode(string value) {
StringBuilder text = new StringBuilder(value);
text.Replace("&", "&amp;");
text.Replace("'", "&apos;");
text.Replace(@"""", "&quot;");
text.Replace("<", "&lt;");
text.Replace(">", "&gt;");
return text.ToString();
}
I have done this before and it works fine. If it doesn't work for you, maybe there is some additional context to the problem that you need to make us aware of.
By far the best approach to this problem is to use the facilities provided by your XPath library to declare an XPath-level variable that you can reference in the expression. The variable value can then be any string in the host programming language, and isn't subject to the restrictions of XPath string literals. For example, in Java with javax.xml.xpath:
XPathFactory xpf = XPathFactory.newInstance();
final Map<String, Object> variables = new HashMap<>();
xpf.setXPathVariableResolver(new XPathVariableResolver() {
public Object resolveVariable(QName name) {
return variables.get(name.getLocalPart());
}
});
XPath xpath = xpf.newXPath();
XPathExpression expr = xpath.compile("ListObject[@Title=$val]");
variables.put("val", someValue);
NodeList nodes = (NodeList)expr.evaluate(someNode, XPathConstants.NODESET);
For C# XPathNavigator you would define a custom XsltContext as described in this MSDN article (you'd only need the variable-related parts of this example, not the extension functions).
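The variable-related part of such a context could look roughly like the sketch below. It assumes string-valued variables only and no extension functions. `DictionaryXsltContext`, `StringVariable` and `XPathVariables.Select` are made-up names for illustration; the overridden members are the abstract members of `System.Xml.Xsl.XsltContext`.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper: resolves $name references in an XPath expression
// from a dictionary of string values. Extension functions are not supported.
public sealed class DictionaryXsltContext : XsltContext
{
    private readonly IDictionary<string, string> _variables;

    public DictionaryXsltContext(IDictionary<string, string> variables)
    {
        _variables = variables;
    }

    public override bool Whitespace { get { return true; } }

    public override bool PreserveWhitespace(XPathNavigator node) { return true; }

    public override int CompareDocument(string baseUri, string nextBaseUri) { return 0; }

    // Only built-in XPath functions are expected, so nothing is resolved here.
    public override IXsltContextFunction ResolveFunction(
        string prefix, string name, XPathResultType[] argTypes)
    {
        return null;
    }

    public override IXsltContextVariable ResolveVariable(string prefix, string name)
    {
        // Throws KeyNotFoundException for a variable that was never declared.
        return new StringVariable(_variables[name]);
    }

    private sealed class StringVariable : IXsltContextVariable
    {
        private readonly string _value;
        public StringVariable(string value) { _value = value; }
        public bool IsLocal { get { return false; } }
        public bool IsParam { get { return false; } }
        public XPathResultType VariableType { get { return XPathResultType.String; } }
        public object Evaluate(XsltContext xsltContext) { return _value; }
    }
}

public static class XPathVariables
{
    // Compiles the expression, attaches the variable context, and selects.
    public static XPathNodeIterator Select(
        IXPathNavigable source, string xpath, IDictionary<string, string> variables)
    {
        XPathExpression expr = XPathExpression.Compile(xpath);
        expr.SetContext(new DictionaryXsltContext(variables));
        return source.CreateNavigator().Select(expr);
    }
}
```

With this in place, a call such as XPathVariables.Select(doc, "ListObject[@Title = $val]", vars) never has to quote the value, so apostrophes and double quotes in it need no special handling.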
Most of the answers here focus on how to use string manipulation to cobble together an XPath that uses string delimiters in a valid way.
I would say the best practice is not to rely on such complicated and potentially fragile methods.
The following applies to .NET since this question is tagged with C#. Ian Roberts has provided what I think is the best solution for when you're using XPath in Java.
Nowadays, you can use Linq-to-Xml to query XML documents in a way that allows you to use your variables in the query directly. This is not XPath, but the purpose is the same.
For the example given in OP, you could query the nodes you want like this:
var value = "Some value with 'apostrophes' and \"quotes\"";
// doc is an instance of XElement or XDocument
IEnumerable<XElement> nodes =
doc.Descendants("ListObject")
.Where(lo => (string)lo.Attribute("Title") == value);
or to use the query comprehension syntax:
IEnumerable<XElement> nodes = from lo in doc.Descendants("ListObject")
where (string)lo.Attribute("Title") == value
select lo;
.NET also provides a way to use XPath variables in your XPath queries. Sadly, it's not easy to do this out of the box, but with a simple helper class that I provide in this other SO answer, it's quite easy.
You can use it like this:
var value = "Some value with 'apostrophes' and \"quotes\"";
var variableContext = new VariableContext { { "matchValue", value } };
// ixn is an instance of IXPathNavigable
XPathNodeIterator nodes = ixn.CreateNavigator()
.SelectNodes("ListObject[@Title = $matchValue]",
variableContext);
Here is an alternative to Robert Rossney's StringBuilder approach, perhaps more intuitive:
/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
///
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
///
/// From: http://stackoverflow.com/questions/1341847/special-character-in-xpath-query
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains only single or double quotes, an XPath
/// literal equal to the value. If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
public static string XPathLiteral(string value)
{
// If the value contains only single or double quotes, construct
// an XPath literal
if (!value.Contains("\""))
return "\"" + value + "\"";
if (!value.Contains("'"))
return "'" + value + "'";
// If the value contains both single and double quotes, construct an
// expression that concatenates all non-double-quote substrings with
// the quotes, e.g.:
//
// concat("foo",'"',"bar")
List<string> parts = new List<string>();
// First, put a '"' after each component in the string.
foreach (var str in value.Split('"'))
{
if (!string.IsNullOrEmpty(str))
parts.Add('"' + str + '"'); // (edited -- thanks Daniel :-)
parts.Add("'\"'");
}
// Then remove the extra '"' after the last component.
parts.RemoveAt(parts.Count - 1);
// Finally, put it together into a concat() function call.
return "concat(" + string.Join(",", parts) + ")";
}
You can quote an XPath string by using search and replace.
In F#
let quoteString (s : string) =
if not (s.Contains "'" ) then sprintf "'%s'" s
else if not (s.Contains "\"") then sprintf "\"%s\"" s
else "concat('" + s.Replace ("'", "', \"'\", '") + "')"
I haven't tested it extensively, but seems to work.
I really like Robert's answer, but I feel like the code could be a little denser.
using System.Linq;
namespace Humig.Csp.Common
{
public static class XpathHelpers
{
public static string XpathLiteralEncode(string literalValue)
{
return string.IsNullOrEmpty(literalValue)
? "''"
: !literalValue.Contains("\"")
? $"\"{literalValue}\""
: !literalValue.Contains("'")
? $"'{literalValue}'"
: $"concat({string.Join(",'\"',", literalValue.Split('"').Select(k => $"\"{k}\""))})";
}
}
}
I have also created a unit test with all the test cases:
using HtmlAgilityPack;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace Humig.Csp.Common.Tests
{
[TestClass()]
public class XpathHelpersTests
{
[DataRow("foo")] // no quotes
[DataRow("\"foo")] // double quotes only
[DataRow("'foo")] // single quotes only
[DataRow("'foo\"bar")] // both; double quotes in mid-string
[DataRow("'foo\"bar\"baz")] // multiple double quotes in mid-string
[DataRow("'foo\"")] // string ends with double quotes
[DataRow("'foo\"\"")] // string ends with run of double quotes
[DataRow("\"'foo")] // string begins with double quotes
[DataRow("\"\"'foo")] // string begins with run of double quotes
[DataRow("'foo\"\"bar")] // run of double quotes in mid-string
[TestMethod()]
public void XpathLiteralEncodeTest(string attrValue)
{
var doc = new HtmlDocument();
var hnode = doc.CreateElement("html");
var body = doc.CreateElement("body");
var div = doc.CreateElement("div");
div.Attributes.Add("data-test", attrValue);
doc.DocumentNode.AppendChild(hnode);
hnode.AppendChild(body);
body.AppendChild(div);
var literalOut = XpathHelpers.XpathLiteralEncode(attrValue);
string xpath = $"/html/body/div[@data-test = {literalOut}]";
var result = doc.DocumentNode.SelectSingleNode(xpath);
Assert.AreEqual(div, result, $"did not find a match for {attrValue}");
}
}
}
If you're not going to have any double-quotes in SomeValue, you can use escaped double-quotes to specify the value you're searching for in your XPath search string.
ListObject[@Title=\"SomeValue\"]
You can fix this issue by using double quotes instead of single quotes in the XPath expression.
For ex:
element.XPathSelectElements(String.Format("//group[@title=\"{0}\"]", "Man's"));
I had this problem a while back, and seemingly the simplest (though not the fastest) solution is to add a new node to the XML document that has an attribute with the value 'SomeValue', then look for that attribute value using a simple XPath search. After you're finished with the operation, you can delete the "temporary node" from the XML document.
This way, the whole comparison happens "inside", so you don't have to construct the weird XPath query.
I seem to remember that in order to speed things up, you should be adding the temp value to the root node.
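A rough sketch of that idea, assuming an XmlDocument and the ListObject/Title names from the question. `TempAttributeMatch`, `FindByTitle` and the attribute name `tmpMatchValue` are invented for illustration, and the attribute name is assumed not to clash with anything already on the root element.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class TempAttributeMatch
{
    // Hypothetical helper: finds ListObject elements whose Title equals
    // 'value' without ever embedding 'value' in the XPath text.
    public static List<XmlNode> FindByTitle(XmlDocument doc, string value)
    {
        const string tempName = "tmpMatchValue"; // assumed unused in the document
        XmlElement root = doc.DocumentElement;
        root.SetAttribute(tempName, value);
        try
        {
            // Compare each Title with the temporary attribute on the root element.
            string xpath = "//ListObject[@Title = /*/@" + tempName + "]";

            // Copy the matches into a list before the attribute is removed,
            // because the node list returned by SelectNodes is evaluated lazily.
            var result = new List<XmlNode>();
            foreach (XmlNode node in doc.SelectNodes(xpath))
            {
                result.Add(node);
            }
            return result;
        }
        finally
        {
            root.RemoveAttribute(tempName);
        }
    }
}
```

The comparison happens between two nodes inside the document, so the XPath text itself stays constant regardless of which quote characters the value contains.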
Good luck...
I want to write some HTML from C# (HTML is just an example; it could be another language).
For example:
string div = @"<div class=""className"">
                      <span>Mon text</span>
                   </div>";
will produce:
<div class="className">
                      <span>Mon text</span>
                   </div>
that's not very cool from the Html point of view...
The only way to have a correct HTML indentation will be to indent the C# code like this :
        string div = @"<div class=""className"">
    <span>Mon text</span>
</div>";
We get the correctly indented HTML:
<div class="className">
    <span>Mon text</span>
</div>
But indenting the C# like this really hurts the readability of the code...
Is there a way to control the indentation in the C# language?
If not, does someone have a better tip than:
string div = "<div class=\"className\">" + Environment.NewLine +
" <span>Mon text</span>" + Environment.NewLine +
"</div>";
and better than
var sbDiv = new StringBuilder();
sbDiv.AppendLine("<div class=\"className\">");
sbDiv.AppendLine(" <span>Mon text</span>");
sbDiv.AppendLine("</div>");
What I use as a solution:
Many thanks to @Yotam for his answer.
I wrote a little extension to make the alignment "dynamic":
/// <summary>
/// Align a multiline string from the indentation of its first line
/// </summary>
/// <remarks>The </remarks>
/// <param name="source">The string to align</param>
/// <returns></returns>
public static string AlignFromFirstLine(this string source)
{
if (String.IsNullOrEmpty(source)) {
return source;
}
if (!source.StartsWith(Environment.NewLine)) {
throw new FormatException("String must start with a NewLine character.");
}
int indentationSize = source.Skip(Environment.NewLine.Length)
.TakeWhile(Char.IsWhiteSpace)
.Count();
string indentationStr = new string(' ', indentationSize);
return source.TrimStart().Replace($"\n{indentationStr}", "\n");
}
Then I can use it like this:
private string GetHtml(string className)
{
    return $@"
            <div class=""{className}"">
                <span>Texte</span>
            </div>".AlignFromFirstLine();
}
That returns the correct HTML:
<div class="myClassName">
    <span>Texte</span>
</div>
One limitation is that it will only work with space indentation...
Any improvement will be welcome !
You could wrap the string to the next line to get the desired indentation:
string div =
@"
<div class=""className"">
    <span>Mon text</span>
</div>"
.TrimStart(); // to remove the additional new-line at the beginning
Another nice solution (disadvantage: depends on the indentation level!)
string div = @"
    <div class=""className"">
        <span>Mon text</span>
    </div>".TrimStart().Replace("\n    ", "\n");
It just removes the indentation from the string. Make sure the number of spaces in the first argument of Replace is the same as the number of spaces in your indentation.
I like this solution more, but how about:
string div = "<div class='className'>\n"
+ " <span>Mon text</span>\n"
+ "</div>";
This gets rid of some clutter:
Replace " inside strings with ' so that you don't need to escape the quote. (Single quotes in HTML appear to be legal.)
You can then also use regular "" string literals instead of @"".
Use \n instead of Environment.NewLine.
Note that the string concatenation is performed during compilation, by the compiler. (See also this and this blog post on the subject by Eric Lippert, who previously worked on the C# compiler.) There is no runtime performance penalty.
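One way to convince yourself of the compile-time folding, as a small sketch: a const field only accepts a constant expression, so the declaration below would not compile if the concatenation were deferred to run time. The class name `HtmlSnippets` is made up for illustration.

```csharp
public static class HtmlSnippets
{
    // A const initializer must be a compile-time constant. Concatenating
    // string literals with + is a constant expression, so this compiles
    // and the three pieces end up as a single literal in the assembly.
    public const string Div =
        "<div class='className'>\n"
        + "    <span>Mon text</span>\n"
        + "</div>";
}
```

The same folding applies to a plain local variable initialized from literals; const is used here only because it makes the compiler enforce it.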
Inspired by trimIndent() in Kotlin.
This code:
var x = @"
    anything
     you
    want
    ".TrimIndent();
will produce a string:
anything
 you
want
or "\nanything\n you\nwant\n"
Implementation:
public static string TrimIndent(this string s)
{
string[] lines = s.Split('\n');
IEnumerable<int> firstNonWhitespaceIndices = lines
.Skip(1)
.Where(it => it.Trim().Length > 0)
.Select(IndexOfFirstNonWhitespace);
int firstNonWhitespaceIndex;
if (firstNonWhitespaceIndices.Any()) firstNonWhitespaceIndex = firstNonWhitespaceIndices.Min();
else firstNonWhitespaceIndex = -1;
if (firstNonWhitespaceIndex == -1) return s;
IEnumerable<string> unindentedLines = lines.Select(it => UnindentLine(it, firstNonWhitespaceIndex));
return String.Join("\n", unindentedLines);
}
private static string UnindentLine(string line, int firstNonWhitespaceIndex)
{
if (firstNonWhitespaceIndex < line.Length)
{
if (line.Substring(0, firstNonWhitespaceIndex).Trim().Length != 0)
{
return line;
}
return line.Substring(firstNonWhitespaceIndex, line.Length - firstNonWhitespaceIndex);
}
return line.Trim().Length == 0 ? "" : line;
}
private static int IndexOfFirstNonWhitespace(string s)
{
char[] chars = s.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
if (chars[i] != ' ' && chars[i] != '\t') return i;
}
return -1;
}
If it is one long string then you can always keep the string in a text file and read it into your variable, e.g.
string text = File.ReadAllText(@"c:\file.txt", Encoding.UTF8);
This way you can format it any way you want using a text editor, and it won't negatively affect the look of your code.
If you're changing parts of the string on the fly, then StringBuilder is your best option. Or, if you did decide to read the string in from a text file, you could include {0} placeholders in your string and then use string.Format(text, "text1", "text2", etc.) to change the required parts.
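The placeholder idea could be sketched like this. `TemplateDemo` and `Render` are hypothetical names, and the template file is assumed to be UTF-8 text containing {0}, {1}, ... placeholders.

```csharp
using System.IO;
using System.Text;

public static class TemplateDemo
{
    // Hypothetical helper: loads a template file and fills in its
    // {0}, {1}, ... placeholders with the supplied values.
    public static string Render(string templatePath, params object[] args)
    {
        string template = File.ReadAllText(templatePath, Encoding.UTF8);
        return string.Format(template, args);
    }
}
```

One thing to watch for: string.Format treats every brace as part of a placeholder, so literal braces in the template (inline CSS or JavaScript, for example) have to be doubled as {{ and }}.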
I want to let the user use more than one tag in a string. So far, the user can only add one tag:
if (rtb.Text.Contains("[b]"))
{
Regex regex = new Regex(@"\[b\](.*)\[/b\]");
var v = regex.Match(rtb.Text);
string s = v.Groups[1].ToString();
rtb.SelectionStart = rtb.Text.IndexOf("[b]");
rtb.SelectionLength = s.Length + 7;
rtb.SelectionFont = new Font(rtb.Font.FontFamily, rtb.Font.Size, FontStyle.Bold);
rtb.SelectedText = s;
}
else if (rtb.Text.Contains("[i]"))
{
Regex regex = new Regex(@"\[i\](.*)\[/i\]");
var v = regex.Match(rtb.Text);
string s = v.Groups[1].ToString();
rtb.SelectionStart = rtb.Text.IndexOf("[b]");
rtb.SelectionLength = s.Length + 7;
rtb.SelectionFont = new Font(rtb.Font.FontFamily, rtb.Font.Size, FontStyle.Italic);
rtb.SelectedText = s;
}
richTextBox1.Select(richTextBox1.TextLength, 0);
richTextBox1.SelectedRtf = rtb.Rtf;
If i have this string:
"Hello [b]World[/b] Meet the [b]Programmer[/b]"
the output would be like this:
"Hello World Meet the Programmer"
And if i have this string:
"Hello [b]World[/b] Meet the [i]Programmer[/i]"
the output would be like this:
"Hello World Meet the [i]Programmer[/i]"
How can I read multiple tags from a string? For example, what if a string has 2 [b][/b] tags, 5 [i][/i] tags, or even mixed tags ([b][i][/i][/b])?
Two problems:
1. Greedy matching semantics of Regex
\[b\](.*)\[/b\] looks for the longest possible match within your string, i.e. it is greedy. In your example, you expect it to match [b]World[/b], when in fact it matches [b]World[/b] Meet the [b]Programmer[/b] (consequently making "Meet the" bold as well). This can easily be resolved using non-greedy syntax: \[b\](.*?)\[/b\] (note the extra ?)
Details: How to Match with Regex "shortest match" in .NET
2. You are only looking for one occurrence of tags!
Obviously, your code will only highlight a single [b]/[i] tag. Don't use else if if you want [i] to be handled if your string contains [b]. Use loops and Regex.Matches if you want to handle all occurrences of your regular expression instead of just the first one.
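Both points can be combined in a sketch like the following, which only locates the tags and leaves the RichTextBox formatting to the caller. `TagScanner`, `TagSpan` and `FindTags` are invented names; the pattern assumes only [b] and [i] tags and does not handle nesting.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagScanner
{
    // Hypothetical result type: which tag matched, what it wrapped, and where.
    public sealed class TagSpan
    {
        public string Tag;      // "b" or "i"
        public string Content;  // text between the opening and closing tag
        public int Index;       // position of the opening tag in the input
    }

    // Non-greedy (.*?) keeps each match as short as possible; the \1
    // backreference makes [b] pair with [/b] and [i] with [/i].
    private static readonly Regex TagPattern =
        new Regex(@"\[(b|i)\](.*?)\[/\1\]", RegexOptions.Singleline);

    // Returns every tagged span in the text, in order of appearance.
    public static List<TagSpan> FindTags(string text)
    {
        var spans = new List<TagSpan>();
        foreach (Match m in TagPattern.Matches(text))
        {
            spans.Add(new TagSpan
            {
                Tag = m.Groups[1].Value,
                Content = m.Groups[2].Value,
                Index = m.Index
            });
        }
        return spans;
    }
}
```

Because Regex.Matches returns non-overlapping matches, a nested tag such as [b][i]x[/i][/b] shows up as a single b span whose Content still contains the [i] tag; running FindTags again on Content is one way to reach the inner tag.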
Without Regex, but it still must be adapted slightly.
The test:
[Test]
public void Text()
{
string str = "[b]Hello[/b] This is sample text [b] Goodbye [/b]";
var bold = AllIndexesOf(str, "b").ToArray();
// Assume the number of indexes is even; otherwise an error should have been thrown
for (int i = 0; i < bold.Count(); i += 2)
{
Console.WriteLine($"Pair: {bold[i]} | {bold[i+1]}");
}
// str.AllIndexesOf
}
Here is the method.
/// <summary>
/// Courtesy of : http://stackoverflow.com/a/24016130/5282506
/// Adapted by me.
///
/// Pass in the unique symbol and it'll find the first and last index pairs
/// Can adapt to find all unique pairs at once.
/// </summary>
/// <param name="str">The string.</param>
/// <param name="searchstring">The searchstring letter (b, i, etc)</param>
/// <returns></returns>
public static IEnumerable<int> AllIndexesOf(string str, string searchstring)
{
//assumes the string is formatted correctly. Only one tag of the same type inside each tag.
int minIndex = str.IndexOf("["+searchstring+"]");
while (minIndex != -1)
{
Console.WriteLine("First: {0}", minIndex);
yield return minIndex;
var maxIndexEnd = str.IndexOf("[/"+ searchstring +"]", minIndex + searchstring.Length +3);//added three for the [/ and ] characters.
Console.WriteLine("End: {0}", maxIndexEnd);
if (maxIndexEnd == -1)
{
//Malformed string, no end element for a found start element
//Do something...
throw new FormatException("Malformed string");
}
yield return maxIndexEnd;
minIndex = str.IndexOf("[" + searchstring+"]", maxIndexEnd + searchstring.Length+2);//added two for the [ and ] characters
}
}
If you wish to make it an extension method for string change signature to this:
public static IEnumerable<int> AllIndexesOf(this string str, string searchstring)
Here's the console result for bold indexes:
Pair: 0 | 8
Pair: 33 | 45
I have not fully tested this method for all edge cases.
Question:
Can anybody give me a working regex expression (C#/VB.NET) that can remove single line comments from a SQL statement ?
I mean these comments:
-- This is a comment
not those
/* this is a comment */
because I already can handle the star comments.
I have made a little parser that removes those comments when they are at the start of the line, but they can also be somewhere after code or, worse, in a SQL string 'hello --Test -- World'.
Those comments should also be removed (except those in a SQL string, of course, if possible).
Surprisingly, I couldn't get the regex working. I would have assumed the star comments to be more difficult, but actually, they aren't.
As requested, here is my code to remove /**/-style comments.
(In order to have it ignore SQL-style strings, you have to substitute the strings with a unique identifier (I used 4 concatenated), then apply the comment removal, then apply the string back-substitution.)
static string RemoveCstyleComments(string strInput)
{
string strPattern = @"/[*][\w\d\s]+[*]/";
//strPattern = @"/\*.*?\*/"; // Doesn't work
//strPattern = "/\\*.*?\\*/"; // Doesn't work
//strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work
//strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work
// http://stackoverflow.com/questions/462843/improving-fixing-a-regex-for-c-style-block-comments
strPattern = @"/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/"; // Works !
string strOutput = System.Text.RegularExpressions.Regex.Replace(strInput, strPattern, string.Empty, System.Text.RegularExpressions.RegexOptions.Multiline);
Console.WriteLine(strOutput);
return strOutput;
} // End Function RemoveCstyleComments
I will disappoint all of you. This can't be done with regular expressions. Sure, it's easy to find comments that are not in a string (even the OP could do that); the real problem is comment markers inside a string. There is a little hope with lookarounds, but that's still not enough. Knowing that there is a preceding quote on the line doesn't guarantee anything. The only thing that guarantees something is the parity of the quotes, which is something you can't determine with a regular expression. So simply go with a non-regular-expression approach.
EDIT:
Here's the c# code:
String sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
char[] quotes = { '\'', '"'};
int newCommentLiteral, lastCommentLiteral = 0;
while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
{
int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
if (countQuotes % 2 == 0) //this is a comment, since there's an even number of quotes preceding
{
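int eol = sql.IndexOf("\r\n", newCommentLiteral); //search from the comment, not from the start of the string
if (eol == -1)
eol = sql.Length; //no more newline, meaning end of the string
else
eol += 2; //remove the line break along with the comment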
sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral);
lastCommentLiteral = newCommentLiteral;
}
else //this is within a string, find string ending and moving to it
{
int singleQuote = sql.IndexOf("'", newCommentLiteral);
if (singleQuote == -1)
singleQuote = sql.Length;
int doubleQuote = sql.IndexOf('"', newCommentLiteral);
if (doubleQuote == -1)
doubleQuote = sql.Length;
lastCommentLiteral = Math.Min(singleQuote, doubleQuote) + 1;
//instead of finding the end of the string you could simply do += 2 but the program will become slightly slower
}
}
Console.WriteLine(sql);
What this does: find every comment marker. For each, check whether it's within a string or not by counting the number of quotes between the current match and the last one. If this number is even, then it's a comment, so remove it (find the first end of line and remove what's between). If it's odd, it's within a string, so find the end of the string and move to it. This snippet is based on a weird SQL trick: 'this" is a valid string, even though the two quotes differ. If that's not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one's faster and more straightforward.
You want something like this for the simple case
-{2,}.*
The -{2,} looks for a dash that happens 2 or more times
The .* gets the rest of the lines up to the newline
But for the edge cases, it appears that SinistraD is correct that you cannot catch everything. However, here is an article about how this can be done in C# with a combination of code and regex.
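For the simple case, the pattern above could be applied as in this sketch. `SimpleSqlComments` and `Strip` are made-up names, and the input is assumed to contain no comment markers inside string literals.

```csharp
using System.Text.RegularExpressions;

public static class SimpleSqlComments
{
    // Removes everything from a run of two or more dashes up to the end of
    // the line. This does NOT protect dashes inside string literals.
    public static string Strip(string sql)
    {
        return Regex.Replace(sql, @"-{2,}.*", string.Empty);
    }
}
```

Passing SELECT '--not a comment--' FROM ATable through Strip cuts the statement off right after the opening quote, which is exactly the edge case mentioned above. Also note that . matches \r in .NET, so with Windows line endings the \r before each \n is removed together with the comment.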
This seems to work well for me so far; it even ignores comments within strings, such as SELECT '--not a comment--' FROM ATable
private static string removeComments(string sql)
{
string pattern = @"(?<=^ ([^'""] |['][^']*['] |[""][^""]*[""])*) (--.*$|/\*(.|\n)*?\*/)";
return Regex.Replace(sql, pattern, "", RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
}
Note: it is designed to eliminate both /**/-style comments as well as -- style. Remove |/\*(.|\n)*?\*/ to get rid of the /**/ checking. Also be sure you are using the RegexOptions.IgnorePatternWhitespace Regex option!!
I wanted to be able to handle double-quotes too, but since T-SQL doesn't support them, you could get rid of |[""][^""]*[""] too.
Adapted from here.
Note (Mar 2015): In the end, I wound up using Antlr, a parser generator, for this project. There may have been some edge cases where the regex didn't work. In the end I was much more confident with the results having used Antlr, and it's worked well.
using System.Text.RegularExpressions;
public static string RemoveSQLCommentCallback(Match SQLLineMatch)
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
bool open = false; //opening of SQL String found
char prev_ch = ' ';
foreach (char ch in SQLLineMatch.ToString())
{
if (ch == '\'')
{
open = !open;
}
else if ((!open && prev_ch == '-' && ch == '-'))
{
break;
}
sb.Append(ch);
prev_ch = ch;
}
return sb.ToString().Trim('-');
}
The code
public static void Main()
{
string sqlText = "WHERE DEPT_NAME LIKE '--Test--' AND START_DATE < SYSDATE -- Don't go over today";
//for every matching line call callback func
string result = Regex.Replace(sqlText, ".*--.*", RemoveSQLCommentCallback);
}
Let's replace, find all the lines that match dash dash comment and call your parsing function for every match.
As a late solution, the simplest way is to do it using ScriptDom-TSqlParser:
// https://michaeljswart.com/2014/04/removing-comments-from-sql/
// http://web.archive.org/web/*/https://michaeljswart.com/2014/04/removing-comments-from-sql/
public static string StripCommentsFromSQL(string SQL)
{
Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser parser =
new Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser(true);
System.Collections.Generic.IList<Microsoft.SqlServer.TransactSql.ScriptDom.ParseError> errors;
Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment fragments =
parser.Parse(new System.IO.StringReader(SQL), out errors);
// clear comments
string result = string.Join(
string.Empty,
fragments.ScriptTokenStream
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.MultilineComment)
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.SingleLineComment)
.Select(x => x.Text));
return result;
}
or instead of using the Microsoft parser, you can use the ANTLR4 TSqlLexer
or without any parser at all:
private static System.Text.RegularExpressions.Regex everythingExceptNewLines =
new System.Text.RegularExpressions.Regex("[^\r\n]");
// http://drizin.io/Removing-comments-from-SQL-scripts/
// http://web.archive.org/web/*/http://drizin.io/Removing-comments-from-SQL-scripts/
public static string RemoveComments(string input, bool preservePositions, bool removeLiterals = false)
{
//based on http://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689
var lineComments = @"--(.*?)\r?\n";
var lineCommentsOnLastLine = @"--(.*?)$"; // because it's possible that there's no \r\n after the last line comment
// literals ('literals'), bracketedIdentifiers ([object]) and quotedIdentifiers ("object"), they follow the same structure:
// there's the start character, any consecutive pairs of closing characters are considered part of the literal/identifier, and then comes the closing character
var literals = @"('(('')|[^'])*')"; // 'John', 'O''malley''s', etc
var bracketedIdentifiers = @"\[((\]\])|[^\]])* \]"; // [object], [ % object]] ], etc
var quotedIdentifiers = @"(\""((\""\"")|[^""])*\"")"; // "object", "object[]", etc - when QUOTED_IDENTIFIER is set to ON, they are identifiers, else they are literals
//var blockComments = @"/\*(.*?)\*/"; //the original code was for C#, but Microsoft SQL allows a nested block comments // //https://msdn.microsoft.com/en-us/library/ms178623.aspx
//so we should use balancing groups // http://weblogs.asp.net/whaggard/377025
var nestedBlockComments = @"/\*
(?>
/\* (?<LEVEL>) # On opening push level
|
\*/ (?<-LEVEL>) # On closing pop level
|
(?! /\* | \*/ ) . # Match any char unless the opening and closing strings
)+ # /* or */ in the lookahead string
(?(LEVEL)(?!)) # If level exists then fail
\*/";
string noComments = System.Text.RegularExpressions.Regex.Replace(input,
nestedBlockComments + "|" + lineComments + "|" + lineCommentsOnLastLine + "|" + literals + "|" + bracketedIdentifiers + "|" + quotedIdentifiers,
me => {
if (me.Value.StartsWith("/*") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks // return new string(' ', me.Value.Length);
else if (me.Value.StartsWith("/*") && !preservePositions)
return "";
else if (me.Value.StartsWith("--") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks
else if (me.Value.StartsWith("--") && !preservePositions)
return everythingExceptNewLines.Replace(me.Value, ""); // preserve only line-breaks // Environment.NewLine;
else if (me.Value.StartsWith("[") || me.Value.StartsWith("\""))
return me.Value; // do not remove object identifiers ever
else if (!removeLiterals) // Keep the literal strings
return me.Value;
else if (removeLiterals && preservePositions) // remove literals, but preserving positions and line-breaks
{
var literalWithLineBreaks = everythingExceptNewLines.Replace(me.Value, " ");
return "'" + literalWithLineBreaks.Substring(1, literalWithLineBreaks.Length - 2) + "'";
}
else if (removeLiterals && !preservePositions) // wrap completely all literals
return "''";
else
throw new System.NotImplementedException();
},
System.Text.RegularExpressions.RegexOptions.Singleline | System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace);
return noComments;
}
I don't know if C#/VB.net regex is special in some way but traditionally s/--.*// should work.
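To see where the plain s/--.*// substitution falls short for SQL, here is a minimal C# sketch. The class and method names are made up for illustration. It removes everything from -- to the end of each line, which is fine for ordinary comments but also cuts into string literals that happen to contain two dashes.

```csharp
using System.Text.RegularExpressions;

public static class NaiveSqlComments
{
    // Hypothetical helper: removes "--" and the rest of the line.
    // '.' does not match '\n' by default, so line breaks are kept.
    public static string StripLineComments(string sql)
    {
        return Regex.Replace(sql, "--.*", "");
    }
}
```

StripLineComments("SELECT 1 -- c\nFROM t") leaves "SELECT 1 \nFROM t", but "SELECT '--x' AS a" is cut down to "SELECT '", which is why the other answers match literals and identifiers as well.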
In PHP, I'm using this code to uncomment SQL (single-line comments only):
$sqlComments = '#(([\'"`]).*?[^\\\]\2)|((?:\#|--).*?$)\s*|(?<=;)\s+#ms';
/* Commented version
$sqlComments = '#
(([\'"`]).*?[^\\\]\2) # $1 : Skip single & double quoted + backticked expressions
|((?:\#|--).*?$) # $3 : Match single line comments
\s* # Trim after comments
|(?<=;)\s+ # Trim after semi-colon
#msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
To remove all comments see Regex to match MySQL comments
Short version:
Is it enough to wrap the argument in quotes and escape \ and " ?
Code version
I want to pass the command line arguments string[] args to another process using ProcessStartInfo.Arguments.
ProcessStartInfo info = new ProcessStartInfo();
info.FileName = Application.ExecutablePath;
info.UseShellExecute = true;
info.Verb = "runas"; // Provides Run as Administrator
info.Arguments = EscapeCommandLineArguments(args);
Process.Start(info);
The problem is that I get the arguments as an array and must merge them into a single string. An argument could be crafted to trick my program:
my.exe "C:\Documents and Settings\MyPath \" --kill-all-humans \" except fry"
According to this answer I have created the following function to escape a single argument, but I might have missed something.
private static string EscapeCommandLineArguments(string[] args)
{
string arguments = "";
foreach (string arg in args)
{
arguments += " \"" +
arg.Replace ("\\", "\\\\").Replace("\"", "\\\"") +
"\"";
}
return arguments;
}
Is this good enough or is there any framework function for this?
It's more complicated than that though!
I was having a related problem (writing a front-end .exe that calls the back-end with all parameters passed plus some extra ones), so I looked at how people do that and ran into your question. Initially all seemed good doing it as you suggest: arg.Replace(@"\", @"\\").Replace(quote, @"\" + quote).
However, when I call with the arguments c:\temp a\\b, these get passed as c:\temp and a\\b, which leads to the back-end being called with "c:\\temp" "a\\\\b". That is incorrect, because the back-end will then see the two arguments c:\\temp and a\\\\b, which is not what we wanted! We have been overzealous with escapes (Windows is not Unix!).
And so I read http://msdn.microsoft.com/en-us/library/system.environment.getcommandlineargs.aspx in detail, and it actually describes how those cases are handled: backslashes are treated as escape characters only in front of a double quote.
There is a twist in how multiple \ are handled there, and the explanation can leave one dizzy for a while. I'll try to rephrase the unescape rule here: say we have a substring of N \, followed by ". When unescaping, we replace that substring with int(N/2) \ and, iff N was odd, we add " at the end.
The encoding for such decoding goes like this: for an argument, find each substring of 0-or-more \ followed by " and replace it with twice as many \, followed by \". Which we can do like so:
s = Regex.Replace(arg, @"(\\*)" + "\"", @"$1$1\" + "\"");
That's all...
PS. ... not. Wait, wait - there is more! :)
We did the encoding correctly, but there is a twist because you are enclosing all parameters in double quotes (in case there are spaces in some of them). There is a boundary issue: if a parameter ends with \, adding " after it will break the meaning of the closing quote. For example, c:\one\ two is parsed into c:\one\ and two, then re-assembled as "c:\one\" "two", which will be (mis)understood as the single argument c:\one" two (I tried that, I am not making it up). So in addition we need to check whether the argument ends with \ and, if so, double the number of backslashes at the end, like so:
s = "\"" + Regex.Replace(s, @"(\\+)$", @"$1$1") + "\"";
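Putting the two replacements together, a minimal sketch could look like the following. The class name ArgQuoting and the method name Quote are made up for illustration, and the sketch always wraps the argument in quotes, as this answer does.

```csharp
using System.Text.RegularExpressions;

public static class ArgQuoting
{
    // Hypothetical helper combining the two steps described above.
    public static string Quote(string arg)
    {
        // Step 1: for each run of backslashes followed by a double quote,
        // double the backslashes and escape the quote itself.
        string s = Regex.Replace(arg, @"(\\*)" + "\"", @"$1$1\" + "\"");
        // Step 2: double any trailing backslashes so they do not escape
        // the closing quote, then wrap the whole argument in quotes.
        return "\"" + Regex.Replace(s, @"(\\+)$", @"$1$1") + "\"";
    }
}
```

With this sketch, c:\one\ becomes "c:\one\\", a"b becomes "a\"b", and a\\b is only wrapped as "a\\b" because its backslashes are not followed by a quote.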
My answer was similar to Nas Banov's answer but I wanted double quotes only if necessary.
Cutting out extra unnecessary double quotes
My code avoids unnecessarily putting double quotes around every argument, which is important when you are getting close to the character limit for parameters.
/// <summary>
/// Encodes an argument for passing into a program
/// </summary>
/// <param name="original">The value that should be received by the program</param>
/// <returns>The value which needs to be passed to the program for the original value
/// to come through</returns>
public static string EncodeParameterArgument(string original)
{
if( string.IsNullOrEmpty(original))
return original;
string value = Regex.Replace(original, @"(\\*)" + "\"", @"$1\$0");
value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"");
return value;
}
// This is an EDIT
// Note that this version does the same but handles new lines in the arguments
public static string EncodeParameterArgumentMultiLine(string original)
{
if (string.IsNullOrEmpty(original))
return original;
string value = Regex.Replace(original, @"(\\*)" + "\"", @"$1\$0");
value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"", RegexOptions.Singleline);
return value;
}
explanation
To escape the backslashes and double quotes correctly, replace every run of zero or more backslashes followed by a double quote with:
string value = Regex.Replace(original, @"(\\*)" + "\"", @"\$1$0");
The result is twice the original backslashes plus one, followed by the original double quote, i.e. '\' + originalbackslashes + originalbackslashes + '"'. I used $1$0 since $0 holds the original backslashes and the original double quote, which makes the replacement nicer to read.
value = Regex.Replace(value, @"^(.*\s.*?)(\\*)$", "\"$1$2$2\"");
This can only ever match an entire line that contains whitespace.
If it matches, it adds double quotes to the beginning and end.
If there were originally backslashes at the end of the argument, they will not have been escaped; now that there is a double quote at the end, they need to be. So they are duplicated, which escapes them all and prevents them from unintentionally escaping the final double quote.
It does a minimal match for the first section so that the last .*? doesn't eat into the final backslashes.
Output
So these inputs produce the following outputs (each input line is followed by its output):
hello
hello
\hello\12\3\
\hello\12\3\
hello world
"hello world"
\"hello\"
\\\"hello\\\"
\"hello\ world
"\\\"hello\ world"
\"hello\\\ world\
"\\\"hello\\\ world\\"
hello world\\
"hello world\\\\"
I have ported a C++ function from the Everyone quotes command line arguments the wrong way article.
It works fine, but you should note that cmd.exe interprets command line differently. If (and only if, like the original author of article noted) your command line will be interpreted by cmd.exe you should also escape shell metacharacters.
/// <summary>
/// This routine appends the given argument to a command line such that
/// CommandLineToArgvW will return the argument string unchanged. Arguments
/// in a command line should be separated by spaces; this function does
/// not add these spaces.
/// </summary>
/// <param name="argument">Supplies the argument to encode.</param>
/// <param name="force">
/// Supplies an indication of whether we should quote the argument even if it
/// does not contain any characters that would ordinarily require quoting.
/// </param>
private static string EncodeParameterArgument(string argument, bool force = false)
{
if (argument == null) throw new ArgumentNullException(nameof(argument));
// Unless we're told otherwise, don't quote unless we actually
// need to do so --- hopefully avoid problems if programs won't
// parse quotes properly
if (force == false
&& argument.Length > 0
&& argument.IndexOfAny(" \t\n\v\"".ToCharArray()) == -1)
{
return argument;
}
var quoted = new StringBuilder();
quoted.Append('"');
var numberBackslashes = 0;
foreach (var chr in argument)
{
switch (chr)
{
case '\\':
numberBackslashes++;
continue;
case '"':
// Escape all backslashes and the following
// double quotation mark.
quoted.Append('\\', numberBackslashes*2 + 1);
quoted.Append(chr);
break;
default:
// Backslashes aren't special here.
quoted.Append('\\', numberBackslashes);
quoted.Append(chr);
break;
}
numberBackslashes = 0;
}
// Escape all backslashes, but let the terminating
// double quotation mark we add below be interpreted
// as a metacharacter.
quoted.Append('\\', numberBackslashes*2);
quoted.Append('"');
return quoted.ToString();
}
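For the cmd.exe case mentioned above, one possible sketch is to prefix each shell metacharacter with a caret after the command line has been built. The helper name EscapeForCmd is made up, and the metacharacter set used here, ( ) % ! ^ " < > & |, is an assumption that should be checked against the article before relying on it.

```csharp
using System.Text.RegularExpressions;

public static class CmdEscaping
{
    // Hypothetical helper: puts '^' in front of every character that
    // cmd.exe is assumed to treat as a metacharacter.
    public static string EscapeForCmd(string commandLine)
    {
        return Regex.Replace(commandLine, @"[()%!^""<>&|]", "^$0");
    }
}
```

This step applies only when the command line really goes through cmd.exe; a process started directly never sees the carets removed and would receive them as part of its arguments.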
I was running into issues with this, too. Instead of unparsing args, I went with taking the full original commandline and trimming off the executable. This had the additional benefit of keeping whitespace in the call, even if it isn't needed/used. It still has to chase escapes in the executable, but that seemed easier than the args.
var commandLine = Environment.CommandLine;
var argumentsString = "";
if(args.Length > 0)
{
// Re-escaping args to be the exact same as they were passed is hard and misses whitespace.
// Use the original command line and trim off the executable to get the args.
var argIndex = -1;
if(commandLine[0] == '"')
{
//Double-quotes mean we need to dig to find the closing double-quote.
var backslashPending = false;
var secondDoublequoteIndex = -1;
for(var i = 1; i < commandLine.Length; i++)
{
if(backslashPending)
{
backslashPending = false;
continue;
}
if(commandLine[i] == '\\')
{
backslashPending = true;
continue;
}
if(commandLine[i] == '"')
{
secondDoublequoteIndex = i + 1;
break;
}
}
argIndex = secondDoublequoteIndex;
}
else
{
// No double-quotes, so args begin after first whitespace.
argIndex = commandLine.IndexOf(" ", System.StringComparison.Ordinal);
}
if(argIndex != -1)
{
argumentsString = commandLine.Substring(argIndex + 1);
}
}
Console.WriteLine("argumentsString: " + argumentsString);
I published a small project on GitHub that handles most issues with command line encoding/escaping:
https://github.com/ericpopivker/Command-Line-Encoder
There is a CommandLineEncoder.Utils.cs class, as well as Unit Tests that verify the Encoding/Decoding functionality.
I wrote you a small sample to show you how to use escape chars in command line.
public static string BuildCommandLineArgs(List<string> argsList)
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
foreach (string arg in argsList)
{
sb.Append("\"\"" + arg.Replace("\"", @"\" + "\"") + "\"\" ");
}
if (sb.Length > 0)
{
sb = sb.Remove(sb.Length - 1, 1);
}
return sb.ToString();
}
And here is a test method:
List<string> myArgs = new List<string>();
myArgs.Add("test\"123"); // test"123
myArgs.Add("test\"\"123\"\"234"); // test""123""234
myArgs.Add("test123\"\"\"234"); // test123"""234
string cmargs = BuildCommandLineArgs(myArgs);
// result: ""test\"123"" ""test\"\"123\"\"234"" ""test123\"\"\"234""
// when you pass this result to your app, you will get this args list:
// test"123
// test""123""234
// test123"""234
The point is to wrap each arg in double-double quotes ( ""arg"" ) and to replace every quote inside the arg value with an escaped quote ( test\"123 ).
static string BuildCommandLineFromArgs(params string[] args)
{
if (args == null)
return null;
string result = "";
if (Environment.OSVersion.Platform == PlatformID.Unix
||
Environment.OSVersion.Platform == PlatformID.MacOSX)
{
foreach (string arg in args)
{
result += (result.Length > 0 ? " " : "")
+ arg
.Replace(@"\", @"\\") // escape backslashes first so the ones added below are not doubled
.Replace(@" ", @"\ ")
.Replace("\t", "\\\t")
.Replace(@"""", @"\""")
.Replace(@"<", @"\<")
.Replace(@">", @"\>")
.Replace(@"|", @"\|")
.Replace(@"#", @"\#")
.Replace(@"&", @"\&");
}
}
else //Windows family
{
bool enclosedInApo, wasApo;
string subResult;
foreach (string arg in args)
{
enclosedInApo = arg.LastIndexOfAny(
new char[] { ' ', '\t', '|', '#', '^', '<', '>', '&'}) >= 0;
wasApo = enclosedInApo;
subResult = "";
for (int i = arg.Length - 1; i >= 0; i--)
{
switch (arg[i])
{
case '"':
subResult = @"\""" + subResult;
wasApo = true;
break;
case '\\':
subResult = (wasApo ? @"\\" : @"\") + subResult;
break;
default:
subResult = arg[i] + subResult;
wasApo = false;
break;
}
}
result += (result.Length > 0 ? " " : "")
+ (enclosedInApo ? "\"" + subResult + "\"" : subResult);
}
}
return result;
}
An Alternative Approach
If you're passing a complex object such as nested JSON and you have control over the system that's receiving the command line arguments, it's far easier to just encode the command line arg/s as base64 and then decode them from the receiving system.
See here: Encode/Decode String to/from Base64
Use Case: I needed to pass a JSON object that contained an XML string in one of the properties which was overly complicated to escape. This solved it.
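A minimal sketch of that round trip, assuming UTF-8 text and the standard Convert methods (the class and method names are made up for illustration):

```csharp
using System;
using System.Text;

public static class Base64Args
{
    // Sender side: turn any string into a Base64 token that contains
    // no spaces, quotes or backslashes.
    public static string Encode(string value)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
    }

    // Receiver side: recover the original string from the token.
    public static string Decode(string encoded)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
    }
}
```

The sender would pass Base64Args.Encode(json) as the argument and the receiving program would call Base64Args.Decode(args[0]). The encoded form is roughly a third longer than the original, which matters if you are near the command line length limit.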
Does a nice job of adding arguments, but doesn't escape. Added comment in method where escape sequence should go.
public static string ApplicationArguments()
{
List<string> args = Environment.GetCommandLineArgs().ToList();
args.RemoveAt(0); // remove executable
StringBuilder sb = new StringBuilder();
foreach (string s in args)
{
// todo: add escape double quotes here
sb.Append(string.Format("\"{0}\" ", s)); // wrap all args in quotes
}
return sb.ToString().Trim();
}
Copy sample code function from this url:
http://csharptest.net/529/how-to-correctly-escape-command-line-arguments-in-c/index.html
You can get command line to execute for example like this:
String cmdLine = EscapeArguments(Environment.GetCommandLineArgs().Skip(1).ToArray());
Skip(1) skips executable name.
Is there any way to format a string by name rather than position in C#?
In python, I can do something like this example (shamelessly stolen from here):
>>> print '%(language)s has %(#)03d quote types.' % \
{'language': "Python", "#": 2}
Python has 002 quote types.
Is there any way to do this in C#? Say for instance:
String.Format("{some_variable}: {some_other_variable}", ...);
Being able to do this using a variable name would be nice, but a dictionary is acceptable too.
There is no built-in method for handling this.
Here's one method
string myString = "{foo} is {bar} and {yadi} is {yada}".Inject(o);
Here's another
Status.Text = "{UserName} last logged in at {LastLoginDate}".FormatWith(user);
A third improved method partially based on the two above, from Phil Haack
Update: This is now built-in as of C# 6 (released in 2015).
String Interpolation
$"{some_variable}: {some_other_variable}"
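As a small illustration, the Python example from the question could be written with string interpolation roughly like this (the class and method names are made up; D3 is the standard numeric format specifier for at least three digits):

```csharp
public static class InterpolationDemo
{
    // The names inside the braces are ordinary C# expressions that are
    // in scope; an optional format specifier follows the colon.
    public static string Describe(string language, int count)
    {
        return $"{language} has {count:D3} quote types.";
    }
}
```

InterpolationDemo.Describe("Python", 2) returns "Python has 002 quote types.". Note that the names are resolved at compile time, so this does not help when the template itself is only known at run time.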
I have an implementation I just posted to my blog here: http://haacked.com/archive/2009/01/04/fun-with-named-formats-string-parsing-and-edge-cases.aspx
It addresses some issues that these other implementations have with brace escaping. The post has details. It does the DataBinder.Eval thing too, but is still very fast.
Interpolated strings were added into C# 6.0 and Visual Basic 14
Both were introduced through new Roslyn compiler in Visual Studio 2015.
C# 6.0:
return "\{someVariable} and also \{someOtherVariable}" OR
return $"{someVariable} and also {someOtherVariable}"
source: what's new in C#6.0
VB 14:
return $"{someVariable} and also {someOtherVariable}"
source: what's new in VB 14
Noteworthy features (in Visual Studio 2015 IDE):
syntax coloring is supported - variables contained in strings are highlighted
refactoring is supported - when renaming, variables contained in strings get renamed, too
actually not only variable names, but expressions are supported - e.g. not only {index} works, but also {(index + 1).ToString().Trim()}
Enjoy! (& click "Send a Smile" in the VS)
You can also use anonymous types like this:
public string Format(string input, object p)
{
foreach (PropertyDescriptor prop in TypeDescriptor.GetProperties(p))
input = input.Replace("{" + prop.Name + "}", (prop.GetValue(p) ?? "(null)").ToString());
return input;
}
Of course it would require more code if you also want to parse formatting, but you can format a string using this function like:
Format("test {first} and {another}", new { first = "something", another = "something else" })
There doesn't appear to be a way to do this out of the box. Though, it looks feasible to implement your own IFormatProvider that links to an IDictionary for values.
var Stuff = new Dictionary<string, object> {
{ "language", "Python" },
{ "#", 2 }
};
var Formatter = new DictionaryFormatProvider();
// Interpret {0:x} where {0}=IDictionary and "x" is hash key
Console.WriteLine(string.Format(Formatter, "{0:language} has {0:#} quote types", Stuff));
Outputs:
Python has 2 quote types
The caveat is that you can't mix FormatProviders, so the fancy text formatting can't be used at the same time.
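The DictionaryFormatProvider class used above is not shown in this answer. One way it might be written, as a sketch only, is to implement both IFormatProvider and ICustomFormatter and treat the format string of each item as the dictionary key:

```csharp
using System;
using System.Collections.Generic;

// Assumed implementation; the original answer does not include one.
public sealed class DictionaryFormatProvider : IFormatProvider, ICustomFormatter
{
    // string.Format asks the provider for an ICustomFormatter.
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    // For an item such as {0:language}, format is "language" and arg is
    // the dictionary passed as argument 0.
    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        if (format != null
            && arg is IDictionary<string, object> dict
            && dict.TryGetValue(format, out object value))
        {
            return Convert.ToString(value);
        }
        // Anything else falls back to the argument's own formatting.
        if (arg is IFormattable formattable)
            return formattable.ToString(format, null);
        return arg == null ? string.Empty : arg.ToString();
    }
}
```

Because the part after the colon is consumed as the key, there is no room left for a real format string such as C or yyyy, which is the caveat described above.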
The framework itself does not provide a way to do this, but you can take a look at this post by Scott Hanselman. Example usage:
Person p = new Person();
string foo = p.ToString("{Money:C} {LastName}, {ScottName} {BirthDate}");
Assert.AreEqual("$3.43 Hanselman, {ScottName} 1/22/1974 12:00:00 AM", foo);
This code by James Newton-King is similar and works with sub-properties and indexes,
string foo = "Top result for {Name} was {Results[0].Name}".FormatWith(student);
James's code relies on System.Web.UI.DataBinder to parse the string and requires referencing System.Web, which some people don't like to do in non-web applications.
EDIT: Oh and they work nicely with anonymous types, if you don't have an object with properties ready for it:
string name = ...;
DateTime date = ...;
string foo = "{Name} - {Birthday}".FormatWith(new { Name = name, Birthday = date });
See https://stackoverflow.com/questions/271398?page=2#358259
With the linked-to extension you can write this:
var str = "{foo} {bar} {baz}".Format(foo=>"foo", bar=>2, baz=>new object());
and you'll get "foo 2 System.Object".
I think the closest you'll get is an indexed format:
String.Format("{0} has {1} quote types.", "C#", "1");
There's also String.Replace(), if you're willing to do it in multiple steps and take it on faith that you won't find your 'variables' anywhere else in the string:
string MyString = "{language} has {n} quote types.";
MyString = MyString.Replace("{language}", "C#").Replace("{n}", "1");
Expanding this to use a List:
List<KeyValuePair<string, string>> replacements = GetFormatDictionary();
foreach (KeyValuePair<string, string> item in replacements)
{
MyString = MyString.Replace(item.Key, item.Value);
}
You could do that with a Dictionary<string, string> too by iterating its .Keys collection, but by using a List<KeyValuePair<string, string>> we can take advantage of the List's .ForEach() method and condense it back to a one-liner:
replacements.ForEach(delegate(KeyValuePair<string,string> item) { MyString = MyString.Replace(item.Key, item.Value);});
A lambda would be even simpler, but I'm still on .Net 2.0. Also note that the .Replace() performance isn't stellar when used iteratively, since strings in .Net are immutable. Also, this requires the MyString variable be defined in such a way that it's accessible to the delegate, so it's not perfect yet.
My open source library, Regextra, supports named formatting (amongst other things). It currently targets .NET 4.0+ and is available on NuGet. I also have an introductory blog post about it: Regextra: helping you reduce your (problems){2}.
The named formatting bit supports:
Basic formatting
Nested properties formatting
Dictionary formatting
Escaping of delimiters
Standard/Custom/IFormatProvider string formatting
Example:
var order = new
{
Description = "Widget",
OrderDate = DateTime.Now,
Details = new
{
UnitPrice = 1500
}
};
string template = "We just shipped your order of '{Description}', placed on {OrderDate:d}. Your {{credit}} card will be billed {Details.UnitPrice:C}.";
string result = Template.Format(template, order);
// or use the extension: template.FormatTemplate(order);
Result:
We just shipped your order of 'Widget', placed on 2/28/2014. Your {credit} card will be billed $1,500.00.
Check out the project's GitHub link (above) and wiki for other examples.
private static Regex s_NamedFormatRegex = new Regex(@"\{(?!\{)(?<key>[\w]+)(:(?<fmt>(\{\{|\}\}|[^\{\}])*)?)?\}", RegexOptions.Compiled);
public static StringBuilder AppendNamedFormat(this StringBuilder builder,IFormatProvider provider, string format, IDictionary<string, object> args)
{
if (builder == null) throw new ArgumentNullException("builder");
var str = s_NamedFormatRegex.Replace(format, (mt) => {
string key = mt.Groups["key"].Value;
string fmt = mt.Groups["fmt"].Value;
object value = null;
if (args.TryGetValue(key,out value)) {
return string.Format(provider, "{0:" + fmt + "}", value);
} else {
return mt.Value;
}
});
builder.Append(str);
return builder;
}
public static StringBuilder AppendNamedFormat(this StringBuilder builder, string format, IDictionary<string, object> args)
{
if (builder == null) throw new ArgumentNullException("builder");
return builder.AppendNamedFormat(null, format, args);
}
Example:
var builder = new StringBuilder();
builder.AppendNamedFormat(
@"Hello, {Name}, today is {Date:yyyy/MM/dd}, this is login number {LoginTimes}, score {Score:{{ 0.00 }}}",
new Dictionary<string, object>() {
{ "Name", "wayjet" },
{ "LoginTimes",18 },
{ "Score", 100.4 },
{ "Date",DateTime.Now }
});
Output:
Hello, wayjet, today is 2011-05-04, this is login number 18, score { 100.40 }
Check this one:
public static string StringFormat(string format, object source)
{
var matches = Regex.Matches(format, @"\{(.+?)\}");
List<string> keys = (from Match matche in matches select matche.Groups[1].Value).ToList();
return keys.Aggregate(
format,
(current, key) =>
{
int colonIndex = key.IndexOf(':');
return current.Replace(
"{" + key + "}",
colonIndex > 0
? DataBinder.Eval(source, key.Substring(0, colonIndex), "{0:" + key.Substring(colonIndex + 1) + "}")
: DataBinder.Eval(source, key).ToString());
});
}
Sample:
string format = "{foo} is a {bar} is a {baz} is a {qux:#.#} is a really big {fizzle}";
var o = new { foo = 123, bar = true, baz = "this is a test", qux = 123.45, fizzle = DateTime.Now };
Console.WriteLine(StringFormat(format, o));
Performance is pretty ok compared to other solutions.
I doubt this will be possible. The first thing that comes to mind is how are you going to get access to local variable names?
There might be some clever way using LINQ and Lambda expressions to do this however.
Here's one I made a while back. It extends String with a Format method taking a single argument. The nice thing is that it'll use the standard string.Format if you provide a simple argument like an int, but if you use something like anonymous type it'll work too.
Example usage:
"The {Name} family has {Children} children".Format(new { Children = 4, Name = "Smith" })
Would result in "The Smith family has 4 children."
It doesn't do crazy binding stuff like arrays and indexers. But it is super simple and high performance.
public static class AdvancedFormatString
{
/// <summary>
/// An advanced version of string.Format. If you pass a primitive object (string, int, etc), it acts like the regular string.Format. If you pass an anonymous type, you can name the parameters by property name.
/// </summary>
/// <param name="formatString"></param>
/// <param name="arg"></param>
/// <returns></returns>
/// <example>
/// "The {Name} family has {Children} children".Format(new { Children = 4, Name = "Smith" })
///
/// results in
/// "The Smith family has 4 children"
/// </example>
public static string Format(this string formatString, object arg, IFormatProvider format = null)
{
if (arg == null)
return formatString;
var type = arg.GetType();
if (Type.GetTypeCode(type) != TypeCode.Object || type.IsPrimitive)
return string.Format(format, formatString, arg);
var properties = TypeDescriptor.GetProperties(arg);
return formatString.Format((property) =>
{
var value = properties[property].GetValue(arg);
return Convert.ToString(value, format);
});
}
public static string Format(this string formatString, Func<string, string> formatFragmentHandler)
{
if (string.IsNullOrEmpty(formatString))
return formatString;
Fragment[] fragments = GetParsedFragments(formatString);
if (fragments == null || fragments.Length == 0)
return formatString;
return string.Join(string.Empty, fragments.Select(fragment =>
{
if (fragment.Type == FragmentType.Literal)
return fragment.Value;
else
return formatFragmentHandler(fragment.Value);
}).ToArray());
}
private static Fragment[] GetParsedFragments(string formatString)
{
Fragment[] fragments;
if ( parsedStrings.TryGetValue(formatString, out fragments) )
{
return fragments;
}
lock (parsedStringsLock)
{
if ( !parsedStrings.TryGetValue(formatString, out fragments) )
{
fragments = Parse(formatString);
parsedStrings.Add(formatString, fragments);
}
}
return fragments;
}
private static Object parsedStringsLock = new Object();
private static Dictionary<string,Fragment[]> parsedStrings = new Dictionary<string,Fragment[]>(StringComparer.Ordinal);
const char OpeningDelimiter = '{';
const char ClosingDelimiter = '}';
/// <summary>
/// Parses the given format string into a list of fragments.
/// </summary>
/// <param name="format"></param>
/// <returns></returns>
static Fragment[] Parse(string format)
{
int lastCharIndex = format.Length - 1;
int currFragEndIndex;
Fragment currFrag = ParseFragment(format, 0, out currFragEndIndex);
if (currFragEndIndex == lastCharIndex)
{
return new Fragment[] { currFrag };
}
List<Fragment> fragments = new List<Fragment>();
while (true)
{
fragments.Add(currFrag);
if (currFragEndIndex == lastCharIndex)
{
break;
}
currFrag = ParseFragment(format, currFragEndIndex + 1, out currFragEndIndex);
}
return fragments.ToArray();
}
/// <summary>
/// Finds the next delimiter from the starting index.
/// </summary>
static Fragment ParseFragment(string format, int startIndex, out int fragmentEndIndex)
{
bool foundEscapedDelimiter = false;
FragmentType type = FragmentType.Literal;
int numChars = format.Length;
for (int i = startIndex; i < numChars; i++)
{
char currChar = format[i];
bool isOpenBrace = currChar == OpeningDelimiter;
bool isCloseBrace = isOpenBrace ? false : currChar == ClosingDelimiter;
if (!isOpenBrace && !isCloseBrace)
{
continue;
}
else if (i < (numChars - 1) && format[i + 1] == currChar)
{//{{ or }}
i++;
foundEscapedDelimiter = true;
}
else if (isOpenBrace)
{
if (i == startIndex)
{
type = FragmentType.FormatItem;
}
else
{
if (type == FragmentType.FormatItem)
throw new FormatException("Two consecutive unescaped { format item openers were found. Either close the first or escape any literals with another {.");
//curr character is the opening of a new format item. so we close this literal out
string literal = format.Substring(startIndex, i - startIndex);
if (foundEscapedDelimiter)
literal = ReplaceEscapes(literal);
fragmentEndIndex = i - 1;
return new Fragment(FragmentType.Literal, literal);
}
}
else
{//close bracket
if (i == startIndex || type == FragmentType.Literal)
throw new FormatException("A } closing brace existed without an opening { brace.");
string formatItem = format.Substring(startIndex + 1, i - startIndex - 1);
if (foundEscapedDelimiter)
formatItem = ReplaceEscapes(formatItem);//a format item with a { or } in its name is crazy but it could be done
fragmentEndIndex = i;
return new Fragment(FragmentType.FormatItem, formatItem);
}
}
if (type == FragmentType.FormatItem)
throw new FormatException("A format item was opened with { but was never closed.");
fragmentEndIndex = numChars - 1;
string literalValue = format.Substring(startIndex);
if (foundEscapedDelimiter)
literalValue = ReplaceEscapes(literalValue);
return new Fragment(FragmentType.Literal, literalValue);
}
/// <summary>
/// Replaces escaped brackets, turning '{{' and '}}' into '{' and '}', respectively.
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
static string ReplaceEscapes(string value)
{
return value.Replace("{{", "{").Replace("}}", "}");
}
private enum FragmentType
{
Literal,
FormatItem
}
private class Fragment
{
public Fragment(FragmentType type, string value)
{
Type = type;
Value = value;
}
public FragmentType Type
{
get;
private set;
}
/// <summary>
/// The literal value, or the name of the fragment, depending on fragment type.
/// </summary>
public string Value
{
get;
private set;
}
}
}
Here is a simple method for any object:
using System.Text.RegularExpressions;
using System.ComponentModel;
public static string StringWithFormat(string format, object args)
{
Regex r = new Regex(@"\{([A-Za-z0-9_]+)\}");
MatchCollection m = r.Matches(format);
var properties = TypeDescriptor.GetProperties(args);
foreach (Match item in m)
{
try
{
string propertyName = item.Groups[1].Value;
format = format.Replace(item.Value, properties[propertyName].GetValue(args).ToString());
}
catch
{
throw new FormatException("The format string is not valid");
}
}
return format;
}
And here is how to use it:
DateTime date = DateTime.Now;
string dateString = StringWithFormat("{Month}/{Day}/{Year}", date);
output : 2/27/2012
I implemented this as a simple class that duplicates the functionality of String.Format (except for when using classes). You can use either a dictionary or a type to define fields.
https://github.com/SergueiFedorov/NamedFormatString
C# 6.0 is adding this functionality right into the language spec, so NamedFormatString is for backwards compatibility.
I solved this in a slightly different way to the existing solutions.
It does the core of the named item replacement (not the reflection bit that some have done). It is extremely fast and simple...
This is my solution:
/// <summary>
/// Formats a string with named format items given a template dictionary of the items values to use.
/// </summary>
public class StringTemplateFormatter
{
private readonly IFormatProvider _formatProvider;
/// <summary>
/// Constructs the formatter with the specified <see cref="IFormatProvider"/>.
/// This is defaulted to <see cref="CultureInfo.CurrentCulture">CultureInfo.CurrentCulture</see> if none is provided.
/// </summary>
/// <param name="formatProvider"></param>
public StringTemplateFormatter(IFormatProvider formatProvider = null)
{
_formatProvider = formatProvider ?? CultureInfo.CurrentCulture;
}
/// <summary>
/// Formats a string with named format items given a template dictionary of the items values to use.
/// </summary>
/// <param name="text">The text template</param>
/// <param name="templateValues">The named values to use as replacements in the formatted string.</param>
/// <returns>The resultant text string with the template values replaced.</returns>
public string FormatTemplate(string text, Dictionary<string, object> templateValues)
{
var formattableString = text;
var values = new List<object>();
foreach (KeyValuePair<string, object> value in templateValues)
{
var index = values.Count;
formattableString = ReplaceFormattableItem(formattableString, value.Key, index);
values.Add(value.Value);
}
return String.Format(_formatProvider, formattableString, values.ToArray());
}
/// <summary>
/// Convert named string template item to numbered string template item that can be accepted by <see cref="string.Format(string,object[])">String.Format</see>
/// </summary>
/// <param name="formattableString">The string containing the named format item</param>
/// <param name="itemName">The name of the format item</param>
/// <param name="index">The index to use for the item value</param>
/// <returns>The formattable string with the named item substituted with the numbered format item.</returns>
private static string ReplaceFormattableItem(string formattableString, string itemName, int index)
{
return formattableString
.Replace("{" + itemName + "}", "{" + index + "}")
.Replace("{" + itemName + ",", "{" + index + ",")
.Replace("{" + itemName + ":", "{" + index + ":");
}
}
It is used in the following way:
[Test]
public void FormatTemplate_GivenANamedGuid_FormattedWithB_ShouldFormatCorrectly()
{
// Arrange
var template = "My guid {MyGuid:B} is awesome!";
var templateValues = new Dictionary<string, object> { { "MyGuid", new Guid("{A4D2A7F1-421C-4A1D-9CB2-9C2E70B05E19}") } };
var sut = new StringTemplateFormatter();
// Act
var result = sut.FormatTemplate(template, templateValues);
//Assert
Assert.That(result, Is.EqualTo("My guid {a4d2a7f1-421c-4a1d-9cb2-9c2e70b05e19} is awesome!"));
}
Hope someone finds this useful!
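Because the comma and colon variants of each named item are rewritten too, alignment and format specifiers should pass straight through to String.Format. Here is a small self-contained sketch of that behaviour. AlignmentDemo and its Format method are hypothetical stand-ins that repeat the same rewrite, and the invariant culture is used so the decimal separator is predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class AlignmentDemo
{
    // Hypothetical stand-in for FormatTemplate: rewrites {Name}, {Name,...}
    // and {Name:...} to numbered items, then defers to String.Format.
    public static string Format(string template, Dictionary<string, object> values)
    {
        var args = new List<object>();
        foreach (KeyValuePair<string, object> pair in values)
        {
            string index = args.Count.ToString(CultureInfo.InvariantCulture);
            template = template
                .Replace("{" + pair.Key + "}", "{" + index + "}")
                .Replace("{" + pair.Key + ",", "{" + index + ",")
                .Replace("{" + pair.Key + ":", "{" + index + ":");
            args.Add(pair.Value);
        }
        return string.Format(CultureInfo.InvariantCulture, template, args.ToArray());
    }

    public static void Main()
    {
        var values = new Dictionary<string, object> { { "Name", "Ada" }, { "Total", 3.14159 } };
        // prints "Ada   |    3.14|"
        Console.WriteLine(Format("{Name,-6}|{Total,8:F2}|", values));
    }
}
```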
Even though the accepted answer gives some good examples, the .Inject example, as well as some of the Haack examples, does not handle escaping. Many also rely heavily on Regex (which is slower) or on DataBinder.Eval, which is not available on .NET Core and in some other environments.
With that in mind, I've written a simple state-machine-based parser that streams through the input, writing to a StringBuilder output character by character. It is implemented as String extension methods and can take either a Dictionary<string, object> or an object with properties as input (using reflection).
It handles unlimited levels of {{{escaping}}} and throws FormatException when input contains unbalanced braces and/or other errors.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Text;
public static class StringExtension {
/// <summary>
/// Extension method that replaces keys in a string with the values of matching object properties.
/// </summary>
/// <param name="formatString">The format string, containing keys like {foo}. The text between the braces is used verbatim as the lookup key.</param>
/// <param name="injectionObject">The object whose properties should be injected in the string</param>
/// <returns>A version of the formatString string with keys replaced by (formatted) key values.</returns>
public static string FormatWith(this string formatString, object injectionObject) {
return formatString.FormatWith(GetPropertiesDictionary(injectionObject));
}
/// <summary>
/// Extension method that replaces keys in a string with the values of matching dictionary entries.
/// </summary>
/// <param name="formatString">The format string, containing keys like {foo}. The text between the braces is used verbatim as the lookup key.</param>
/// <param name="dictionary">An <see cref="IDictionary"/> with keys and values to inject into the string</param>
/// <returns>A version of the formatString string with dictionary keys replaced by (formatted) key values.</returns>
public static string FormatWith(this string formatString, IDictionary<string, object> dictionary) {
char openBraceChar = '{';
char closeBraceChar = '}';
return FormatWith(formatString, dictionary, openBraceChar, closeBraceChar);
}
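/// <summary>
/// Extension method that replaces keys in a string with the values of matching dictionary entries,
/// using caller-supplied opening and closing brace characters.
/// </summary>
/// <param name="formatString">The format string, containing keys like {foo}. The text between the braces is used verbatim as the lookup key.</param>
/// <param name="dictionary">An <see cref="IDictionary"/> with keys and values to inject into the string</param>
/// <param name="openBraceChar">The character that opens a key; doubling it escapes it.</param>
/// <param name="closeBraceChar">The character that closes a key; doubling it outside a key escapes it.</param>
/// <returns>A version of the formatString string with dictionary keys replaced by key values.</returns>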
public static string FormatWith(this string formatString, IDictionary<string, object> dictionary, char openBraceChar, char closeBraceChar) {
string result = formatString;
if (dictionary == null || formatString == null)
return result;
// start the state machine!
// ballpark output string as two times the length of the input string for performance (avoids reallocating the buffer as often).
StringBuilder outputString = new StringBuilder(formatString.Length * 2);
StringBuilder currentKey = new StringBuilder();
bool insideBraces = false;
int index = 0;
while (index < formatString.Length) {
if (!insideBraces) {
// currently not inside a pair of braces in the format string
if (formatString[index] == openBraceChar) {
// check if the brace is escaped
if (index < formatString.Length - 1 && formatString[index + 1] == openBraceChar) {
// add a brace to the output string
outputString.Append(openBraceChar);
// skip over braces
index += 2;
continue;
}
else {
// not an escaped brace, set state to inside brace
insideBraces = true;
index++;
continue;
}
}
else if (formatString[index] == closeBraceChar) {
// handle case where closing brace is encountered outside braces
if (index < formatString.Length - 1 && formatString[index + 1] == closeBraceChar) {
// this is an escaped closing brace, this is okay
// add a closing brace to the output string
outputString.Append(closeBraceChar);
// skip over braces
index += 2;
continue;
}
else {
// this is an unescaped closing brace outside of braces.
// throw a format exception
throw new FormatException($"Unmatched closing brace at position {index}");
}
}
else {
// the character has no special meaning, add it to the output string
outputString.Append(formatString[index]);
// move onto next character
index++;
continue;
}
}
else {
// currently inside a pair of braces in the format string
// found an opening brace
if (formatString[index] == openBraceChar) {
// check if the brace is escaped
if (index < formatString.Length - 1 && formatString[index + 1] == openBraceChar) {
// there are escaped braces within the key
// this is illegal, throw a format exception
throw new FormatException($"Illegal escaped opening braces within a parameter - index: {index}");
}
else {
// not an escaped brace, we have an unexpected opening brace within a pair of braces
throw new FormatException($"Unexpected opening brace inside a parameter - index: {index}");
}
}
else if (formatString[index] == closeBraceChar) {
// handle case where closing brace is encountered inside braces
// don't attempt to check for escaped braces here - always assume the first brace closes the braces
// since we cannot have escaped braces within parameters.
// set the state to be outside of any braces
insideBraces = false;
// jump over brace
index++;
// at this stage, a key is stored in current key that represents the text between the two braces
// do a lookup on this key
string key = currentKey.ToString();
// clear the stringbuilder for the key
currentKey.Clear();
object outObject;
if (!dictionary.TryGetValue(key, out outObject)) {
// the key was not found as a possible replacement, throw exception
throw new FormatException($"The parameter \"{key}\" was not present in the lookup dictionary");
}
// we now have the replacement value, add the value to the output string
outputString.Append(outObject);
// jump to next state
continue;
} // if }
else {
// character has no special meaning, add it to the current key
currentKey.Append(formatString[index]);
// move onto next character
index++;
continue;
} // else
} // if inside brace
} // while
// after the loop, if all braces were balanced, we should be outside all braces
// if we're not, the input string was misformatted.
if (insideBraces) {
throw new FormatException("The format string ended before the parameter was closed.");
}
return outputString.ToString();
}
/// <summary>
/// Creates a Dictionary from an object's properties, with the Key being the property's
/// name and the Value being the property's value (of type object)
/// </summary>
/// <param name="properties">An object whose properties will be used</param>
/// <returns>A <see cref="Dictionary"/> of property values</returns>
private static Dictionary<string, object> GetPropertiesDictionary(object properties) {
Dictionary<string, object> values = null;
if (properties != null) {
values = new Dictionary<string, object>();
PropertyDescriptorCollection props = TypeDescriptor.GetProperties(properties);
foreach (PropertyDescriptor prop in props) {
values.Add(prop.Name, prop.GetValue(properties));
}
}
return values;
}
}
Ultimately, all the logic boils down to 10 main states: whether the state machine is outside or inside a pair of braces, the next character is either an open brace, an escaped open brace, a closed brace, an escaped closed brace, or an ordinary character. Each of these conditions is handled individually as the loop progresses, adding characters to either an output StringBuilder or a key StringBuilder. When a parameter is closed, the value of the key StringBuilder is used to look up the parameter's value in the dictionary, which then gets appended to the output StringBuilder. At the end, the value of the output StringBuilder is returned.
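To make the state transitions easier to follow, here is a condensed sketch of the same idea. It is not a replacement for the extension methods above: BraceParserSketch and Fill are hypothetical names, the brace characters are fixed to { and }, and the error messages are shortened.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BraceParserSketch
{
    // Hypothetical condensed version of the parser: same states, fewer lines.
    public static string Fill(string template, IDictionary<string, object> values)
    {
        var output = new StringBuilder(template.Length * 2);
        var key = new StringBuilder();
        bool insideBraces = false;

        for (int i = 0; i < template.Length; i++)
        {
            char c = template[i];
            bool isBrace = c == '{' || c == '}';
            bool doubled = isBrace && i + 1 < template.Length && template[i + 1] == c;

            if (!insideBraces)
            {
                if (doubled) { output.Append(c); i++; }   // "{{" or "}}" emits one literal brace
                else if (c == '{') insideBraces = true;   // start collecting a key
                else if (c == '}') throw new FormatException("Unmatched closing brace at position " + i);
                else output.Append(c);                    // ordinary character
            }
            else if (c == '{')
            {
                throw new FormatException("Unexpected opening brace inside a parameter - index: " + i);
            }
            else if (c == '}')
            {
                // The first closing brace always ends the key; look the key up.
                object value;
                if (!values.TryGetValue(key.ToString(), out value))
                    throw new FormatException("No value for parameter \"" + key + "\"");
                output.Append(value);
                key.Clear();
                insideBraces = false;
            }
            else
            {
                key.Append(c);                            // part of the key name
            }
        }

        if (insideBraces)
            throw new FormatException("The format string ended before the parameter was closed.");
        return output.ToString();
    }

    public static void Main()
    {
        var values = new Dictionary<string, object> { { "name", "Ada" }, { "score", 42 } };
        // prints "{Ada} scored 42"
        Console.WriteLine(Fill("{{{name}}} scored {score}", values));
    }
}
```

In "{{{name}}}" the first two braces form an escaped literal, the third opens the key, the first closing brace ends it, and the remaining pair is an escaped literal.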
string language = "Python";
int numquotes = 2;
string output = language + " has "+ numquotes + " language types.";
Edit:
What I should have said was, "No, I don't believe what you want to do is supported by C#. This is as close as you are going to get."