String escape into XML

String escape into XML - c#

Is there any C# function which could be used to escape and un-escape a string, which could be used to fill in the content of an XML element?
I am using VSTS 2008 + C# + .Net 3.0.
EDIT 1: I am concatenating simple and short XML file and I do not use serialization, so I need to explicitly escape XML character by hand, for example, I need to put a<b into <foo></foo>, so I need escape string a<b and put it into element foo.

SecurityElement.Escape(string s)

public static string XmlEscape(string unescaped)
{
XmlDocument doc = new XmlDocument();
XmlNode node = doc.CreateElement("root");
node.InnerText = unescaped;
return node.InnerXml;
}
public static string XmlUnescape(string escaped)
{
XmlDocument doc = new XmlDocument();
XmlNode node = doc.CreateElement("root");
node.InnerXml = escaped;
return node.InnerText;
}

EDIT: You say "I am concatenating simple and short XML file and I do not use serialization, so I need to explicitly escape XML character by hand".
I would strongly advise you not to do it by hand. Use the XML APIs to do it all for you - read in the original files, merge the two into a single document however you need to (you probably want to use XmlDocument.ImportNode), and then write it out again. You don't want to write your own XML parsers/formatters. Serialization is somewhat irrelevant here.
If you can give us a short but complete example of exactly what you're trying to do, we can probably help you to avoid having to worry about escaping in the first place.
Original answer
It's not entirely clear what you mean, but normally XML APIs do this for you. You set the text in a node, and it will automatically escape anything it needs to. For example:
LINQ to XML example:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
XElement element = new XElement("tag",
"Brackets & stuff <>");
Console.WriteLine(element);
}
}
DOM example:
using System;
using System.Xml;
class Test
{
static void Main()
{
XmlDocument doc = new XmlDocument();
XmlElement element = doc.CreateElement("tag");
element.InnerText = "Brackets & stuff <>";
Console.WriteLine(element.OuterXml);
}
}
Output from both examples:
<tag>Brackets & stuff <></tag>
That's assuming you want XML escaping, of course. If you're not, please post more details.

Thanks to #sehe for the one-line escape:
var escaped = new System.Xml.Linq.XText(unescaped).ToString();
I add to it the one-line un-escape:
var unescapedAgain = System.Xml.XmlReader.Create(new StringReader("<r>" + escaped + "</r>")).ReadElementString();

George, it's simple. Always use the XML APIs to handle XML. They do all the escaping and unescaping for you.
Never create XML by appending strings.

And if you want, like me when I found this question, to escape XML node names, like for example when reading from an XML serialization, use the easiest way:
XmlConvert.EncodeName(string nameToEscape)
It will also escape spaces and any non-valid characters for XML elements.
http://msdn.microsoft.com/en-us/library/system.security.securityelement.escape%28VS.80%29.aspx

Another take based on John Skeet's answer that doesn't return the tags:
void Main()
{
XmlString("Brackets & stuff <> and \"quotes\"").Dump();
}
public string XmlString(string text)
{
return new XElement("t", text).LastNode.ToString();
}
This returns just the value passed in, in XML encoded format:
Brackets & stuff <> and "quotes"

WARNING: Necromancing
Still Darin Dimitrov's answer + System.Security.SecurityElement.Escape(string s) isn't complete.
In XML 1.1, the simplest and safest way is to just encode EVERYTHING.
Like for \t.
It isn't supported at all in XML 1.0.
For XML 1.0, one possible workaround is to base-64 encode the text containing the character(s).
//string EncodedXml = SpecialXmlEscape("привет мир");
//Console.WriteLine(EncodedXml);
//string DecodedXml = XmlUnescape(EncodedXml);
//Console.WriteLine(DecodedXml);
public static string SpecialXmlEscape(string input)
{
//string content = System.Xml.XmlConvert.EncodeName("\t");
//string content = System.Security.SecurityElement.Escape("\t");
//string strDelimiter = System.Web.HttpUtility.HtmlEncode("\t"); // XmlEscape("\t"); //XmlDecode(" ");
//strDelimiter = XmlUnescape(";");
//Console.WriteLine(strDelimiter);
//Console.WriteLine(string.Format("&#{0};", (int)';'));
//Console.WriteLine(System.Text.Encoding.ASCII.HeaderName);
//Console.WriteLine(System.Text.Encoding.UTF8.HeaderName);
string strXmlText = "";
if (string.IsNullOrEmpty(input))
return input;
System.Text.StringBuilder sb = new StringBuilder();
for (int i = 0; i < input.Length; ++i)
{
sb.AppendFormat("&#{0};", (int)input[i]);
}
strXmlText = sb.ToString();
sb.Clear();
sb = null;
return strXmlText;
} // End Function SpecialXmlEscape
XML 1.0:
public static string Base64Encode(string plainText)
{
var plainTextBytes = System.Text.Encoding.UTF8.GetBytes(plainText);
return System.Convert.ToBase64String(plainTextBytes);
}
public static string Base64Decode(string base64EncodedData)
{
var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
return System.Text.Encoding.UTF8.GetString(base64EncodedBytes);
}

Following functions will do the work. Didn't test against XmlDocument, but I guess this is much faster.
public static string XmlEncode(string value)
{
System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings
{
ConformanceLevel = System.Xml.ConformanceLevel.Fragment
};
StringBuilder builder = new StringBuilder();
using (var writer = System.Xml.XmlWriter.Create(builder, settings))
{
writer.WriteString(value);
}
return builder.ToString();
}
public static string XmlDecode(string xmlEncodedValue)
{
System.Xml.XmlReaderSettings settings = new System.Xml.XmlReaderSettings
{
ConformanceLevel = System.Xml.ConformanceLevel.Fragment
};
using (var stringReader = new System.IO.StringReader(xmlEncodedValue))
{
using (var xmlReader = System.Xml.XmlReader.Create(stringReader, settings))
{
xmlReader.Read();
return xmlReader.Value;
}
}
}

Using a third-party library (Newtonsoft.Json) as alternative:
public static string XmlEscape(string unescaped)
{
if (unescaped == null) return null;
return JsonConvert.SerializeObject(unescaped); ;
}
public static string XmlUnescape(string escaped)
{
if (escaped == null) return null;
return JsonConvert.DeserializeObject(escaped, typeof(string)).ToString();
}
Examples of escaped string:
a<b ==> "a<b"
<foo></foo> ==> "foo></foo>"
NOTE:
In newer versions, the code written above may not work with escaping, so you need to specify how the strings will be escaped:
public static string XmlEscape(string unescaped)
{
if (unescaped == null) return null;
return JsonConvert.SerializeObject(unescaped, new JsonSerializerSettings()
{
StringEscapeHandling = StringEscapeHandling.EscapeHtml
});
}
Examples of escaped string:
a<b ==> "a\u003cb"
<foo></foo> ==> "\u003cfoo\u003e\u003c/foo\u003e"

SecurityElementEscape does this job for you
Use this method to replace invalid characters in a string before using the string in a SecurityElement. If invalid characters are used in a SecurityElement without being escaped, an ArgumentException is thrown.
The following table shows the invalid XML characters and their escaped equivalents.
https://learn.microsoft.com/en-us/dotnet/api/system.security.securityelement.escape?view=net-5.0

Related

preserve &#xA, when reading XML

Xml content like following:
<xml>
<item content="abcd 
 abcd
abcd" />
</xml>
When using XmlDocument to read the content of content attribute, 
 and
are automatically escaped.
Code:
XmlDocument doc = new XmlDocument();
var content = doc.SelectSingleNode("/xml/item").Attributes["content"].Value;
How can get the raw text without char escaping?

If these characters were written to the lexical XML stream without escaping, then they would be swallowed by the XML parser when the stream is read by the recipient, as a result of the XML line-ending normalisation rules. So you've got it the wrong way around: the reason they are escaped is in order to preserve them; if they weren't escaped, they would be lost.

I got a workaround, it works for me:
private static string GetAttributeValue(XmlNode node, string attributeName)
{
if (node == null || string.IsNullOrWhiteSpace(attributeName))
{
throw new ArgumentException();
}
const string CharLF = "
";
const string CharCR = "
";
string xmlContent = node.OuterXml;
if (!xmlContent.Contains(CharLF) && !xmlContent.Contains(CharCR))
{
// no special char, return its original value directly
return node.Attributes[attributeName].Value;
}
string value = string.Empty;
if (xmlContent.Contains(attributeName))
{
value = xmlContent.Substring(xmlContent.IndexOf(attributeName)).Trim();
value = value.Substring(value.IndexOf("\"") + 1);
value = value.Substring(0, value.IndexOf("\""));
}
return value;
}

XmlException - given illegal XML from 3rd party; must process

There are several SO questions and answers about this when creating an XML file; but can't find any pertaining to when you are given bad XML from a 3rd party that you must process; note, the 3rd party cannot be held accountable for the illegal XML.
Ultimately, the .InnerText needs to be escaped or encoded (e.g. changed to legal XML characters) - and later decoded after proper XML parsing.
QUESTION: Are there any libraries that will Load() Invalid/Illegal XML files to allow quick navigation for such escaping/encoding? Or am I stuck having to manually parse the invalid xml, fixing it along the way ... ?
<?xml version="1.0" encoding="utf-8"?>
<ChunkData>
<Fields>
<Field1>some words < other words</Field1>
<Field2>some words > other words</Field2>
</Fields>
</ChunkData>

Although HttpAgilityPack is awesome (and I'm using it in another project of my own), I was given no the time to follow Alexei's advice - which is exactly the direction that I was looking for -- can't parse it as XML? cool, parse it as HTML ... didn't even cross my mind ...
Ended up with this, which does the trick (but is exactly what Alexei advised against):
private static string EncodeValues(string xml)
{
var doc = new List<string>();
var lines = xml.Split('\n');
foreach (var line in lines)
{
var output = line;
if (line.Contains("<Field") && !line.Contains("Fields>"))
{
var value = line.Parse(">", "</");
var encoded = HttpUtility.UrlEncode(value);
output = line.Replace(value, encoded);
}
doc.Add(output);
}
return string.Join("", doc);
}
private static Hashtable DecodeValues(IDictionary data)
{
var output = new Hashtable();
foreach (var key in data.Keys)
{
var value = (string)data[key];
output.Add(key, HttpUtility.UrlDecode(value));
}
return output;
}
Used in conjunction with an Extension method I wrote quite awhile ago ...
public static string Parse(this string s, string first, string second)
{
try
{
if (string.IsNullOrEmpty(s)) return "";
var start = s.IndexOf(first, StringComparison.InvariantCulture) + first.Length;
var end = s.IndexOf(second, start, StringComparison.InvariantCulture);
var length = end - start;
return (end > 0 && length < s.Length) ? s.Substring(start, length) : s.Substring(start);
}
catch (Exception) { return ""; }
}
Used as such (kept separate from the Transform and Hashtable creation methods for clarity):
xmlDocs[0] = EncodeValues(xmlDocs[0]); // in order to handle illegal chars in XML, encode InnerText
var doc = TransformXmlDocument(orgName, xmlDocs[0], xmlDocs[1]);
var data = GetHashtableFromXml(doc);
data = DecodeValues(data); // decode the values extracted from the hashtable
Regardless, I'm always looking for insight ... feel free to comment on this solution - or provide another.

Looking for tags in strings from right to left

string sample1 = <SUCCESS><BUILDING>27</BUILDING></SUCCESS><CLEANED><LOCALITY>Value 1</LOCALITY></CLEANED>
string sample2 = <SUCCESS><BUILDING>14</BUILDING></SUCCESS> <SUCCESS><BUILDING>Value 2</BUILDING></SUCCESS>
In both above string samples I want to get the first "SUCCESS" tag from right to left.
So in sample 1 I want returned = <SUCCESS><BUILDING>27</BUILDING></SUCCESS>
and in sample 2 I want returned = <SUCCESS><BUILDING>Value 2</BUILDING></SUCCESS>
I know I can use Index of to first occurrence but not sure of last

XDocument doc = XDocument.Parse("<xml>" + sample2 + "</xml>");
Text = doc.Root.Elements("SUCCESS").Last().ToString();

c# has a nice String function called LastIndexOf(String). It will work the exact same way as indexOf(String) except give you the last occurrence.
http://msdn.microsoft.com/en-us/library/1wdsy8fy(v=vs.110).aspx
Hope this helps,
Cheers

If you're going to be parsing XML, you might be interested in using the XMLReader class. Read more about the XMLReader here.
Note that you need valid XML for the reader to work. In your example, you would need to wrap the partial XML in a unique root node (part of the XML spec). You might consider making some extension methods to help you:
public static class XMLStringExtensions
{
public static string LastTag(this string innerXml, string tag)
{
string previousTag = null;
using (var reader = XmlReader.Create(new StringReader(innerXml.WrapInRoot())))
while(reader.ReadToFollowing(tag)) previousTag = reader.ReadOuterXml();
return previousTag;
}
public static string WrapInRoot(this string partialXml)
{
return string.Format("<root>{0}</root>", partialXml);
}
}
Then you can invoke it like this:
sample1.LastTag("SUCCESS"); //<SUCCESS><BUILDING>27</BUILDING></SUCCESS>
sample2.LastTag("SUCCESS"); //<SUCCESS><BUILDING>Value 2</BUILDING></SUCCESS>

How can I strip HTML tags from a string in ASP.NET?

Using ASP.NET, how can I strip the HTML tags from a given string reliably (i.e. not using regex)? I am looking for something like PHP's strip_tags.
Example:
<ul><li>Hello</li></ul>
Output:
"Hello"
I am trying not to reinvent the wheel, but I have not found anything that meets my needs so far.

If it is just stripping all HTML tags from a string, this works reliably with regex as well. Replace:
<[^>]*(>|$)
with the empty string, globally. Don't forget to normalize the string afterwards, replacing:
[\s\r\n]+
with a single space, and trimming the result. Optionally replace any HTML character entities back to the actual characters.
Note:
There is a limitation: HTML and XML allow > in attribute values. This solution will return broken markup when encountering such values.
The solution is technically safe, as in: The result will never contain anything that could be used to do cross site scripting or to break a page layout. It is just not very clean.
As with all things HTML and regex:
Use a proper parser if you must get it right under all circumstances.

Go download HTMLAgilityPack, now! ;) Download LInk
This allows you to load and parse HTML. Then you can navigate the DOM and extract the inner values of all attributes. Seriously, it will take you about 10 lines of code at the maximum. It is one of the greatest free .net libraries out there.
Here is a sample:
string htmlContents = new System.IO.StreamReader(resultsStream,Encoding.UTF8,true).ReadToEnd();
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlContents);
if (doc == null) return null;
string output = "";
foreach (var node in doc.DocumentNode.ChildNodes)
{
output += node.InnerText;
}

Regex.Replace(htmlText, "<.*?>", string.Empty);

protected string StripHtml(string Txt)
{
return Regex.Replace(Txt, "<(.|\\n)*?>", string.Empty);
}
Protected Function StripHtml(Txt as String) as String
Return Regex.Replace(Txt, "<(.|\n)*?>", String.Empty)
End Function

I've posted this on the asp.net forums, and it still seems to be one of the easiest solutions out there. I won't guarantee it's the fastest or most efficient, but it's pretty reliable.
In .NET you can use the HTML Web Control objects themselves. All you really need to do is insert your string into a temporary HTML object such as a DIV, then use the built-in 'InnerText' to grab all text that is not contained within tags. See below for a simple C# example:
System.Web.UI.HtmlControls.HtmlGenericControl htmlDiv = new System.Web.UI.HtmlControls.HtmlGenericControl("div");
htmlDiv.InnerHtml = htmlString;
String plainText = htmlDiv.InnerText;

I have written a pretty fast method in c# which beats the hell out of the Regex. It is hosted in an article on CodeProject.
Its advantages are, among better performance the ability to replace named and numbered HTML entities (those like &amp; and &203;) and comment blocks replacement and more.
Please read the related article on CodeProject.
Thank you.

For those of you who can't use the HtmlAgilityPack, .NETs XML reader is an option. This can fail on well formatted HTML though so always add a catch with regx as a backup. Note this is NOT fast, but it does provide a nice opportunity for old school step through debugging.
public static string RemoveHTMLTags(string content)
{
var cleaned = string.Empty;
try
{
StringBuilder textOnly = new StringBuilder();
using (var reader = XmlNodeReader.Create(new System.IO.StringReader("<xml>" + content + "</xml>")))
{
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Text)
textOnly.Append(reader.ReadContentAsString());
}
}
cleaned = textOnly.ToString();
}
catch
{
//A tag is probably not closed. fallback to regex string clean.
string textOnly = string.Empty;
Regex tagRemove = new Regex(#"<[^>]*(>|$)");
Regex compressSpaces = new Regex(#"[\s\r\n]+");
textOnly = tagRemove.Replace(content, string.Empty);
textOnly = compressSpaces.Replace(textOnly, " ");
cleaned = textOnly;
}
return cleaned;
}

string result = Regex.Replace(anytext, #"<(.|\n)*?>", string.Empty);

I've looked at the Regex based solutions suggested here, and they don't fill me with any confidence except in the most trivial cases. An angle bracket in an attribute is all it would take to break, let alone mal-formmed HTML from the wild. And what about entities like &? If you want to convert HTML into plain text, you need to decode entities too.
So I propose the method below.
Using HtmlAgilityPack, this extension method efficiently strips all HTML tags from an html fragment. Also decodes HTML entities like &. Returns just the inner text items, with a new line between each text item.
public static string RemoveHtmlTags(this string html)
{
if (String.IsNullOrEmpty(html))
return html;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
if (doc.DocumentNode == null || doc.DocumentNode.ChildNodes == null)
{
return WebUtility.HtmlDecode(html);
}
var sb = new StringBuilder();
var i = 0;
foreach (var node in doc.DocumentNode.ChildNodes)
{
var text = node.InnerText.SafeTrim();
if (!String.IsNullOrEmpty(text))
{
sb.Append(text);
if (i < doc.DocumentNode.ChildNodes.Count - 1)
{
sb.Append(Environment.NewLine);
}
}
i++;
}
var result = sb.ToString();
return WebUtility.HtmlDecode(result);
}
public static string SafeTrim(this string str)
{
if (str == null)
return null;
return str.Trim();
}
If you are really serious, you'd want to ignore the contents of certain HTML tags too (<script>, <style>, <svg>, <head>, <object> come to mind!) because they probably don't contain readable content in the sense we are after. What you do there will depend on your circumstances and how far you want to go, but using HtmlAgilityPack it would be pretty trivial to whitelist or blacklist selected tags.
If you are rendering the content back to an HTML page, make sure you understand XSS vulnerability & how to prevent it - i.e. always encode any user-entered text that gets rendered back onto an HTML page (> becomes > etc).

For those who are complining about Michael Tiptop's solution not working, here is the .Net4+ way of doing it:
public static string StripTags(this string markup)
{
try
{
StringReader sr = new StringReader(markup);
XPathDocument doc;
using (XmlReader xr = XmlReader.Create(sr,
new XmlReaderSettings()
{
ConformanceLevel = ConformanceLevel.Fragment
// for multiple roots
}))
{
doc = new XPathDocument(xr);
}
return doc.CreateNavigator().Value; // .Value is similar to .InnerText of
// XmlDocument or JavaScript's innerText
}
catch
{
return string.Empty;
}
}

using System.Text.RegularExpressions;
string str = Regex.Replace(HttpUtility.HtmlDecode(HTMLString), "<.*?>", string.Empty);

You can also do this with AngleSharp which is an alternative to HtmlAgilityPack (not that HAP is bad). It is easier to use than HAP to get the text out of a HTML source.
var parser = new HtmlParser();
var htmlDocument = parser.ParseDocument(source);
var text = htmlDocument.Body.Text();
You can take a look at the key features section where they make a case at being "better" than HAP. I think for the most part, it is probably overkill for the current question but still, it is an interesting alternative.

For the second parameter,i.e. keep some tags, you may need some code like this by using HTMLagilityPack:
public string StripTags(HtmlNode documentNode, IList keepTags)
{
var result = new StringBuilder();
foreach (var childNode in documentNode.ChildNodes)
{
if (childNode.Name.ToLower() == "#text")
{
result.Append(childNode.InnerText);
}
else
{
if (!keepTags.Contains(childNode.Name.ToLower()))
{
result.Append(StripTags(childNode, keepTags));
}
else
{
result.Append(childNode.OuterHtml.Replace(childNode.InnerHtml, StripTags(childNode, keepTags)));
}
}
}
return result.ToString();
}
More explanation on this page: http://nalgorithm.com/2015/11/20/strip-html-tags-of-an-html-in-c-strip_html-php-equivalent/

Simply use string.StripHTML();

What's the simplest way to encoding List<String> into plain String and decode it back?

I think I've come across this requirement for a dozen times. But I could never find a satisfying solution. For instance, there are a collection of string which I want to serialize (to disk or through network) through a channel where only plain string is allowed.
I almost always end up using "split" and "join" with ridiculous separator like
":::==--==:::".
like this:
public static string encode(System.Collections.Generic.List<string> data)
{
return string.Join(" :::==--==::: ", data.ToArray());
}
public static string[] decode(string encoded)
{
return encoded.Split(new string[] { " :::==--==::: " }, StringSplitOptions.None);
}
But this simple solution apparently has some flaws. The string cannot contains the separator string. And consequently, the encoded string can no longer re-encoded again.
AFAIK, the comprehensive solution should involve escaping the separator on encoding and unescaping on decoding. While the problem sound simple, I believe the complete solution can take significant amount of code. I wonder if there is any trick allowed me to build encoder & decoder in very few lines of code ?

Add a reference and using to System.Web, and then:
public static string Encode(IEnumerable<string> strings)
{
return string.Join("&", strings.Select(s => HttpUtility.UrlEncode(s)).ToArray());
}
public static IEnumerable<string> Decode(string list)
{
return list.Split('&').Select(s => HttpUtility.UrlDecode(s));
}
Most languages have a pair of utility functions that do Url "percent" encoding, and this is ideal for reuse in this kind of situation.

You could use the .ToArray property on the List<> and then serialize the Array - that could then be dumped to disk or network, and reconstituted with a deserialization on the other end.
Not too much code, and you get to use the serialization techniques already tested and coded in the .net framework.

You might like to look at the way CSV files are formatted.
escape all instances of a deliminater, e.g. " in the string
wrap each item in the list in "item"
join using a simple seperator like ,
I don't believe there is a silver bullet solution to this problem.

Here's an old-school technique that might be suitable -
Serialise by storing the width of each string[] as a fixed-width prefix in each line.
So
string[0]="abc"
string[1]="defg"
string[2]=" :::==--==::: "
becomes
0003abc0004defg0014 :::==--==:::
...where the size of the prefix is large enough to cater for the string maximum length

You could use an XmlDocument to handle the serialization. That will handle the encoding for you.
public static string encode(System.Collections.Generic.List<string> data)
{
var xml = new XmlDocument();
xml.AppendChild(xml.CreateElement("data"));
foreach (var item in data)
{
var xmlItem = (XmlElement)xml.DocumentElement.AppendChild(xml.CreateElement("item"));
xmlItem.InnerText = item;
}
return xml.OuterXml;
}
public static string[] decode(string encoded)
{
var items = new System.Collections.Generic.List<string>();
var xml = new XmlDocument();
xml.LoadXml(encoded);
foreach (XmlElement xmlItem in xml.SelectNodes("/data/item"))
items.Add(xmlItem.InnerText);
return items.ToArray();
}

I would just prefix every string with its length and an terminator indicating the end of the length.
abc
defg
hijk
xyz
546
4.X
becomes
3: abc 4: defg 4: hijk 3: xyz 3: 546 3: 4.X
No restriction or limitations at all and quite simple.

Json.NET is a very easy way to serialize about any object you can imagine. JSON keeps things compact and can be faster than XML.
List<string> foo = new List<string>() { "1", "2" };
string output = JsonConvert.SerializeObject(foo);
List<string> fooToo = (List<string>)JsonConvert.DeserializeObject(output, typeof(List<string>));

It can be done much simpler if you are willing to use a separator of 2 characters long:
In java code:
StringBuilder builder = new StringBuilder();
for(String s : list) {
if(builder.length() != 0) {
builder.append("||");
}
builder.append(s.replace("|", "|p"));
}
And back:
for(String item : encodedList.split("||")) {
list.add(item.replace("|p", "|"));
}

You shouldn't need to do this manually. As the other answers have pointed out, there are plenty of ways, built-in or otherwise, to serialize/deserialize.
However, if you did decide to do the work yourself, it doesn't require that much code:
public static string CreateDelimitedString(IEnumerable<string> items)
{
StringBuilder sb = new StringBuilder();
foreach (string item in items)
{
sb.Append(item.Replace("\\", "\\\\").Replace(",", "\\,"));
sb.Append(",");
}
return (sb.Length > 0) ? sb.ToString(0, sb.Length - 1) : string.Empty;
}
This will delimit the items with a comma (,). Any existing commas will be escaped with a backslash (\) and any existing backslashes will also be escaped.
public static IEnumerable<string> GetItemsFromDelimitedString(string s)
{
bool escaped = false;
StringBuilder sb = new StringBuilder();
foreach (char c in s)
{
if ((c == '\\') && !escaped)
{
escaped = true;
}
else if ((c == ',') && !escaped)
{
yield return sb.ToString();
sb.Length = 0;
}
else
{
sb.Append(c);
escaped = false;
}
}
yield return sb.ToString();
}

Why not use Xstream to serialise it, rather than reinventing your own serialisation format?
Its pretty simple:
new XStream().toXML(yourobject)

Include the System.Linq library in your file and change your functions to this:
public static string encode(System.Collections.Generic.List<string> data, out string delimiter)
{
delimiter = ":";
while(data.Contains(delimiter)) delimiter += ":";
return string.Join(delimiter, data.ToArray());
}
public static string[] decode(string encoded, string delimiter)
{
return encoded.Split(new string[] { delimiter }, StringSplitOptions.None);
}

There are loads of textual markup languages out there, any would function
Many would function trivially given the simplicity of your input it all depends on how:
human readable you want the encoding
resilient to api changes it should be
how easy to parse it is
how easy it is to write or get a parser for it.
If the last one is the most important then just use the existing xml libraries MS supply for you:
class TrivialStringEncoder
{
private readonly XmlSerializer ser = new XmlSerializer(typeof(string[]));
public string Encode(IEnumerable<string> input)
{
using (var s = new StringWriter())
{
ser.Serialize(s, input.ToArray());
return s.ToString();
}
}
public IEnumerable<string> Decode(string input)
{
using (var s = new StringReader(input))
{
return (string[])ser.Deserialize(s);
}
}
public static void Main(string[] args)
{
var encoded = Encode(args);
Console.WriteLine(encoded);
var decoded = Decode(encoded);
foreach(var x in decoded)
Console.WriteLine(x);
}
}
running on the inputs "A", "<", ">" you get (edited for formatting):
<?xml version="1.0" encoding="utf-16"?>
<ArrayOfString
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>A</string>
<string><</string>
<string>></string>
</ArrayOfString>
A
<
>
Verbose, slow but extremely simple and requires no additional libraries

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

String escape into XML - c#

SecurityElement.Escape(string s)

Thanks to #sehe for the one-line escape: var escaped = new System.Xml.Linq.XText(unescaped).ToString(); I add to it the one-line un-escape: var unescapedAgain = System.Xml.XmlReader.Create(new StringReader("<r>" + escaped + "</r>")).ReadElementString();

George, it's simple. Always use the XML APIs to handle XML. They do all the escaping and unescaping for you. Never create XML by appending strings.

Related

preserve &#xA, when reading XML

XmlException - given illegal XML from 3rd party; must process

Looking for tags in strings from right to left

How can I strip HTML tags from a string in ASP.NET?

What's the simplest way to encoding List<String> into plain String and decode it back?

Categories

Resources