XmlException. List of all the invalid characters - c#

I am trying to execute the following code sample.
var xmlDocument = new XmlDocument();
string documentTagName = "testName)";
XmlNode headerElement = xmlDocument.CreateElement(documentTagName);
Of course I get an XmlException:
The ')' character, hexadecimal value 0x... (doesn't matter), cannot be included in a name
This is because documentTagName contains the ) symbol. And of course I get the same exception if documentTagName looks like this:
documentTagName = "testName(";
or like this:
documentTagName = "testName:";
All of these characters ('(' , ')' , ':') are invalid in an XML tag name. I have checked many links (and even this) but cannot find a list of all the characters that are invalid in an XML tag name. Can anybody help me?
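The set of legal name characters is defined by the Name production of the XML specification and is too large to list as a handful of forbidden symbols, so it is usually easier to ask the framework whether a given string is acceptable. The following is a minimal sketch, assuming the built-in System.Xml.XmlConvert helpers are acceptable for your case; the class and method names (XmlNameCheck, IsValidLocalName) are made up for illustration.

```csharp
using System;
using System.Xml;

public static class XmlNameCheck
{
    // Returns true when the candidate can be used as an unprefixed element name.
    public static bool IsValidLocalName(string candidate)
    {
        if (string.IsNullOrEmpty(candidate))
        {
            return false; // an element name cannot be empty
        }

        try
        {
            XmlConvert.VerifyNCName(candidate); // throws XmlException on an invalid name
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        foreach (var name in new[] { "testName", "testName)", "testName(", "testName:" })
        {
            // EncodeLocalName replaces each invalid character with an _xHHHH_ escape.
            Console.WriteLine(name + ": valid=" + IsValidLocalName(name)
                + ", encoded=" + XmlConvert.EncodeLocalName(name));
        }
    }
}
```

If the name must be kept rather than rejected, XmlConvert.EncodeLocalName gives a reversible encoding, and XmlConvert.DecodeName turns it back into the original text.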

Related

replace new lines with "" in C#

I want to convert this:
<translation>
1 Sənədlər
</translation>
to
<translation>1 Sənədlər</translation> in XML using C#.
Please help me. Only translation tags.
I tried this:
XDocument xdoc = XDocument.Load(path);
xdoc.Save(path, SaveOptions.DisableFormatting);
But it does not remove the new lines between <translation> tags.
What you have should work. You can verify it by dumping the XDocument to a string variable to confirm that SaveOptions is removing the formatting.
For example, I tried the code below, and content does not have any formatting, including newlines and whitespace.
XDocument xmlDoc = new XDocument(new XElement("Team", new XElement("Developer", "Sam")));
var content = xmlDoc.ToString(SaveOptions.DisableFormatting);
A new line is determined in the code by "\n" and possibly also "\r". You can simply remove these:
string xmlString = "<translation>\r\n1 Sənədlər\r\n</translation>"; // With the 'new lines'
xmlString = xmlString.Replace("\r", "").Replace("\n", "");
This will result in:
<translation>
1 Sənədlər
</translation>
Becoming:
<translation>1 Sənədlər</translation>
I hope this helps.
You can strip out newlines manually in an environment-sensitive way by using
var content = xmlString.Replace(Environment.NewLine, string.Empty);
XML defines two types of whitespace, significant and insignificant.
Insignificant whitespace is the whitespace between elements where text content doesn't occur, whereas significant whitespace is the whitespace within elements that contain text content. You might find the graphic in this article useful to show the difference.
What you have in your translation element is significant whitespace; the element contains text so it is assumed to be part of the element contents. Without a schema or DTD that says it can be collapsed, no amount of changing the whitespace handling on read or write is going to remove this. These options only relate to the insignificant whitespace.
What you can do is apply your own processing: using LINQ to XML, you can trim the whitespace of all elements that contain only text using something like this:
var textElements = doc.Descendants()
.Where(element => element.Nodes().All(node => node is XText));
foreach (var element in textElements)
{
element.Value = element.Value.Trim();
}
See this fiddle for a demo.
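For reference, here is a self-contained sketch of the trimming approach above applied to the XML from the question. The class and method names (TrimTextElements, TrimAndSerialize) are invented for the example, and it adds two small assumptions of its own: empty elements are skipped, and the matching elements are copied to a list before they are modified.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TrimTextElements
{
    // Trims the text of every element that contains only text, then serializes
    // the document without indentation.
    public static string TrimAndSerialize(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Snapshot the matches first so the tree is not modified while it is being enumerated.
        var textElements = doc.Descendants()
            .Where(element => element.Nodes().Any()
                           && element.Nodes().All(node => node is XText))
            .ToList();

        foreach (var element in textElements)
        {
            element.Value = element.Value.Trim();
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string xml = "<translation>\n1 Sənədlər\n</translation>";
        Console.WriteLine(TrimAndSerialize(xml));
    }
}
```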

Why can't I parse this element with htmlagilitypack?

I can't figure out how to parse the following:
-Example webpage I'm trying to parse: http://www.aliexpress.com/item/-/255859073.html
-Information I'm trying to get: "7-days". This is the processing time located in the left column of the shipping table.
-The shipping table becomes visible after clicking on the "Shipping and Payment" tab (which is down the page a bit).
So far I have tried selecting the node with different XPath values:
HtmlAgilityPack.HtmlDocument currentHTML = new HtmlAgilityPack.HtmlDocument();
HtmlWeb webget = new HtmlWeb();
currentHTML = webget.Load("http://www.aliexpress.com/item/-/255859073.html");
string processingTime = currentHTML.DocumentNode.SelectSingleNode("/html/body/div[2]/div[4]/div/div/div[2]/div/div/div[3]/div/div/div/div[2]/table/tbody/tr/td[5]").InnerText;
and also:
string processingTime = currentHTML.DocumentNode.SelectSingleNode("//*[contains(concat( \" \", @class, \" \" ), concat( \" \", \"processing\", \" \" ))]").InnerText;
But I get this error:
System.NullReferenceException was unhandled
Message=Object reference not set to an instance of an object.
I also tried their mobile phone website but they didn't display this information there.
Any idea why this is happening and what I need to do?
It looks like your XPath expression was incorrect. Regardless, the element you were trying to parse is easier to reach through its id attribute. I've modified the XPath expression, and as a bonus I've added a regular expression that lets you cleanly parse the days portion from the text.
System.Text.RegularExpressions.Regex
dayParseRegex = new System.Text.RegularExpressions.Regex(@"(?<days>\d)( days\))$");
HtmlAgilityPack.HtmlDocument currentHTML = new HtmlAgilityPack.HtmlDocument();
HtmlWeb webget = new HtmlWeb();
currentHTML = webget.Load("http://www.aliexpress.com/item/-/255859073.html");
//Extract node
var handlingTimeNode = currentHTML.DocumentNode.SelectSingleNode("//*[@id=\"product-info-shipping-sub\"]");
//Run RegEx against text
var match = dayParseRegex.Match(handlingTimeNode.InnerText);
//Convert the days to an integer from the resultant group
int shippingDays = Convert.ToInt32(match.Groups["days"].Value);
Talk about coding and gettin' paid! Now go rip the hell outta that site!

Escaping ONLY contents of Node in XML

I have a part of code mentioned like below.
//Reading from a file and assign to the variable named "s"
string s = "<item><name> Foo </name></item>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
But it stops working if the contents include characters such as "<" or ">".
string s = "<item><name> Foo > Bar </name></item>";
I know I have to escape those characters before loading, but if I do it like this:
doc.LoadXml(System.Security.SecurityElement.Escape(s));
the tags (< , >) are also escaped and, as a result, the error occurs.
How can I solve this problem?
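A tricky solution:
string s = "<item><name> Foo > Bar </name></item>";
// Encode the tags, swap the remaining (content) angle brackets for placeholders,
// decode the tags again, then turn the placeholders into entities.
s = Regex.Replace(s, @"<[^>]+?>", m => HttpUtility.HtmlEncode(m.Value)).Replace("<","ojlovecd").Replace(">","cdloveoj");
s = HttpUtility.HtmlDecode(s).Replace("ojlovecd", "&lt;").Replace("cdloveoj", "&gt;");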
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
Assuming your content will never contain the characters "]]>", you can use CDATA.
string s = "<item><name><![CDATA[ Foo > Bar ]]></name></item>";
Otherwise, you'll need to HTML-encode your special characters, and decode them before you use or display them (unless it's in a browser).
string s = "<item><name> Foo &gt; Bar </name></item>";
Assign the content of string to the InnerXml property of node.
var node = doc.CreateElement("root");
node.InnerXml = s;
Take a look at - Different ways how to escape an XML string in C#
It looks like the strings you have generated are plain strings, not valid XML. You can either have the strings generated as valid XML or, if you know that the strings are always going to be the name, leave the XML <item> and <name> tags out of the data.
Then, when you create the XmlDocument, call CreateElement and assign your string before saving the results again.
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("item");
doc.AppendChild(root);
XmlElement name = doc.CreateElement("name");
name.InnerText = "the contents from your file";
root.AppendChild(name);
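To illustrate why assigning InnerText sidesteps the escaping problem, here is a small sketch that builds the same document and round-trips it. The class and method names (EscapeOnWrite, BuildItem) and the sample text are only illustrative.

```csharp
using System;
using System.Xml;

public static class EscapeOnWrite
{
    // Builds <item><name>...</name></item>, letting XmlDocument escape the text itself.
    public static string BuildItem(string rawName)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("item");
        doc.AppendChild(root);

        XmlElement name = doc.CreateElement("name");
        name.InnerText = rawName; // raw text; special characters are escaped on serialization
        root.AppendChild(name);

        return doc.OuterXml;
    }

    public static void Main()
    {
        string xml = BuildItem("Foo < Bar & Baz");
        Console.WriteLine(xml); // the markup contains entities instead of raw '<' and '&'

        // Loading the serialized markup back yields the original text.
        var reloaded = new XmlDocument();
        reloaded.LoadXml(xml);
        Console.WriteLine(reloaded.SelectSingleNode("/item/name").InnerText);
    }
}
```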

c# parsing xml with an apostrophe throws exception

I am parsing an XML file and am running into an issue when trying to find a node that has an apostrophe in its name. When the item name does not contain one, everything works fine. I have tried replacing the apostrophe with different escape characters but am not having much luck.
string s = "/itemDB/item[@name='" + itemName + "']";
// Things i have tried that did not work
// s.Replace("'", "''");
// .Replace("'", "\'");
XmlNode parent = root.SelectSingleNode(s);
I always receive an XPathException. What is the proper way to do this? Thanks
For an apostrophe, replace it with &apos;
You can do it like this:
XmlDocument root = new XmlDocument();
root.LoadXml(@"<itemDB><item name=""abc'def""/></itemDB>");
XmlNode node = root.SelectSingleNode(@"itemDB/item[@name=""abc'def""]");
Note the verbatim string literal '@' and the double quotes.
Your code would then look like this and there is no need to replace anything:
var itemName = @"abc'def";
string s = @"/itemDB/item[@name=""" + itemName + @"""]";
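Wrapping the value in double quotes works as long as the value itself contains no double quote. As a hedged sketch of a more general approach, a small helper can pick the quoting style and fall back to the XPath concat() function when both kinds of quotes are present. The helper name ToXPathLiteral is hypothetical and not part of the framework.

```csharp
using System;
using System.Xml;

public static class XPathQuoting
{
    // Turns an arbitrary string into an XPath 1.0 string expression.
    public static string ToXPathLiteral(string value)
    {
        if (!value.Contains("'"))
        {
            return "'" + value + "'";   // no apostrophe: single quotes are safe
        }
        if (!value.Contains("\""))
        {
            return "\"" + value + "\""; // no double quote: double quotes are safe
        }

        // Both quote kinds present: split on apostrophes and rejoin the pieces
        // with concat(), passing each apostrophe as a double-quoted literal.
        string[] parts = value.Split('\'');
        return "concat('" + string.Join("', \"'\", '", parts) + "')";
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<itemDB><item name=\"abc'def\"/></itemDB>");

        string itemName = "abc'def";
        string xpath = "/itemDB/item[@name=" + ToXPathLiteral(itemName) + "]";

        XmlNode node = doc.SelectSingleNode(xpath);
        Console.WriteLine(xpath);
        Console.WriteLine(node != null ? "found" : "not found");
    }
}
```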

The ':' character, hexadecimal value 0x3A, cannot be included in a name

I have an xml file that contains its element like
<ab:test>Str</ab:test>
When I am trying to access it using the code:
XElement tempElement = doc.Descendants(XName.Get("ab:test")).FirstOrDefault();
It's giving me this error:
System.Web.Services.Protocols.SoapException: Server was unable to process request. ---> System.Xml.XmlException: The ':' character, hexadecimal value 0x3A, cannot be included in a name.
How should I access it?
If you want to use namespaces, LINQ to XML makes that really easy:
XNamespace ab = "http://whatever-the-url-is";
XElement tempElement = doc.Descendants(ab + "test").FirstOrDefault();
Look for an xmlns:ab=... section in your document to find out which namespace URI "ab" refers to.
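If you would rather not hard-code the URI, the prefix can also be resolved from the document itself. A minimal sketch follows, assuming the ab prefix is declared on the root element or one of its ancestors; the sample URI http://example.com/ab and the names NamespaceLookup and ReadTest are placeholders.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceLookup
{
    // Returns the text of the first ab:test element, or null when it cannot be found.
    public static string ReadTest(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Resolve the prefix to its namespace; null means the prefix is not declared.
        XNamespace ab = doc.Root.GetNamespaceOfPrefix("ab");
        if (ab == null)
        {
            return null;
        }

        XElement element = doc.Descendants(ab + "test").FirstOrDefault();
        return element == null ? null : element.Value;
    }

    public static void Main()
    {
        string xml = "<root xmlns:ab=\"http://example.com/ab\"><ab:test>Str</ab:test></root>";
        Console.WriteLine(ReadTest(xml));
    }
}
```</root>") != null) throw new Exception("undeclared prefix should give null");
</test>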
Try putting your namespace in { ... } like so:
string xfaNamespace = "{http://www.xfa.org/schema/xfa-template/2.6/}";
I was having the same error. I found I was adding code...
var ab = "http://whatever-the-url-is";
... but ab was inferred to be a string. This caused the error reported by the OP. Instead of using the var keyword, I used the actual data type XNamespace...
XNamespace ab = "http://whatever-the-url-is";
... and the problem went away.
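There is an overload of the Get method you might want to try that takes the namespace into account. Note that the second argument is the namespace URI that the "ab" prefix maps to, not the prefix itself. Try this:
XElement tempElement = doc.Descendants(XName.Get("test", "http://whatever-the-url-is")).FirstOrDefault();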
Try to get namespace from the document
var ns = doc.Root.Name.Namespace;
Deleting AndroidManifest.xml and AndroidManifest.xml.DISABLED worked for me.
