This question already has an answer here:
How to stop XDocument.Save writing escape chars
(1 answer)
Closed 2 years ago.
The XElement below converts the special character "&" to "&amp;".
XElement newElement = new XElement("testting", "wow&testvalue");
I want it to be "&", not "&amp;".
I want it to be "&", not "&amp;".
Then it would be invalid XML. Why do you want invalid XML?
LINQ to XML is expressing the text you've requested in valid XML. That's what it's meant to do. If you ask for the text of the element later (through this or any other decent XML API) you'll get back wow&testvalue.
As Tim says, you could use a CDATA section:
var element = new XElement("testing", new XCData("wow&testvalue"));
But you can't tell LINQ to XML not to escape what it needs to escape...
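To make the distinction concrete, here is a minimal sketch showing that the escaping only exists in the serialized markup, while the element's text value still holds the raw characters. The class and method names are hypothetical and only serve the illustration.

```csharp
using System.Xml.Linq;

// Hypothetical helper names, for illustration only.
public static class EscapeDemo
{
    // Serialized form: '&' has to be written as an entity to stay well-formed.
    public static string AsMarkup(string text) =>
        new XElement("testing", text).ToString();

    // Reading the text back through the API returns the original characters.
    public static string AsValue(string text) =>
        new XElement("testing", text).Value;

    // A CDATA section carries the '&' literally in the markup.
    public static string AsCData(string text) =>
        new XElement("testing", new XCData(text)).ToString();
}
```

EscapeDemo.AsMarkup("wow&testvalue") gives <testing>wow&amp;testvalue</testing>, EscapeDemo.AsValue("wow&testvalue") gives wow&testvalue, and EscapeDemo.AsCData("wow&testvalue") gives <testing><![CDATA[wow&testvalue]]></testing>.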
This question already has answers here:
How to parse invalid (bad / not well-formed) XML?
(4 answers)
Closed 3 years ago.
I get an exception while parsing the XML if it contains '&' and '<' characters. I have read that having these characters in XML means the XML is not valid, but I receive it from a third party and can't reformat it.
Below is my code of XML parsing using XDocument:
string data = profile.Content.ReadAsStringAsync().Result; //Read input
XDocument doc = new XDocument();
if (data != "")
{
    string rawHtml = WebUtility.HtmlDecode(data);
    doc = XDocument.Parse(rawHtml); // Parse input into XDocument
}
Here, data contains the actual XML input, not an XML file path.
Please suggest how to handle these special characters.
This data is not XML.
Check what you agreed with the third party.
If the contract was to exchange data in XML, then they are failing to satisfy the contract and you should deal with it the way you would deal with any other faulty goods from a supplier: return it and ask for your money back.
If the agreement didn't specify that they would send you XML, then you shouldn't be trying to parse it with an XML parser.
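If you need to detect such input in code before taking it up with the supplier, one option is to let the parser decide and catch the failure. This is only a sketch; the class XmlGuard and the method TryParseXml are hypothetical names, not part of any library.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class XmlGuard
{
    // Returns true and the parsed document when the input is well-formed XML;
    // returns false (doc is null) when the parser rejects it.
    public static bool TryParseXml(string data, out XDocument doc)
    {
        try
        {
            doc = XDocument.Parse(data);
            return true;
        }
        catch (XmlException)
        {
            doc = null;
            return false;
        }
    }
}
```

One thing worth checking in the question's code: WebUtility.HtmlDecode turns entities such as &amp; back into bare characters, so decoding before parsing can break input that was well-formed on arrival. Whether that is what happens with this particular feed is an assumption; comparing TryParseXml(data, out _) with TryParseXml(WebUtility.HtmlDecode(data), out _) would show it.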
This question already has answers here:
What is the best way to parse html in C#? [closed]
(15 answers)
Closed 6 years ago.
Using Regex, I'm trying to extract data from HTML code, but I don't know how to build the expression without relying on any HTML tags.
I have a marker string (item-desc) and a count of characters that follow it; those characters are my data.
For example, given item-desc12345abcde and a count of 6 characters, I should get 12345a.
This expression gives me only 1 character after my string:
Regex itemInfoFilter = new Regex(@"item-desc\s*(.+?)\s*>", RegexOptions.Compiled | RegexOptions.IgnoreCase);
I don't recommend using regular expressions to parse HTML.
Use an HTML parser instead:
HTML Agility Pack
From what I understand of your question, I think this should work: item-desc(.{6})(?=[\s'"])
The pattern assumes that your data is followed by whitespace (\s), ' or ".
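A minimal sketch of the fixed-length capture on its own, without the trailing lookahead, so that it also works on the item-desc12345abcde sample. The class and method names are hypothetical. Note that the quantifier sits inside the capturing group, so the group holds all of the characters rather than only the last one.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ItemDescDemo
{
    // Returns the `count` characters that follow "item-desc", or null if the
    // marker is missing or fewer than `count` characters follow it on the line.
    public static string TakeAfterMarker(string input, int count)
    {
        // .{n} inside ONE group captures all n characters together.
        var pattern = "item-desc(.{" + count + "})";
        var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

ItemDescDemo.TakeAfterMarker("item-desc12345abcde", 6) returns 12345a.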
Hope this helps
This question already has answers here:
How to avoid System.Xml.Linq.XElement escaping HTML content?
(4 answers)
Closed 7 years ago.
My method receives an XML string as input, and I need to wrap this XML string in an XML envelope using XElement:
input: <hello>Hello!</hello>
expected result: <envelope><hello>Hello!</hello></envelope>
The problem is that this code:
string xmlHello = "<hello>Hello!</hello>";
XElement xelem = new XElement("envelope", xmlHello);
escapes every < and >, so the result is:
<envelope>&lt;hello&gt;Hello!&lt;/hello&gt;</envelope>
Is there any way to disable this behaviour of the XElement constructor to be able to accept XML as the value? The input string can be really huge, so I would like to avoid parsing it.
As mentioned in the comments, this can't be done directly: the API has no way of knowing that your text is actually well-formed XML unless you pass it something it knows is an XML element.
So what you need to do is parse your XML first:
string xmlHello = "<hello>Hello!</hello>";
var hello = XElement.Parse(xmlHello);
var envelope = new XElement("envelope", hello);
Resulting in:
<envelope>
  <hello>Hello!</hello>
</envelope>
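For a side-by-side comparison, here is a small sketch of both constructor calls. The class and method names are hypothetical, and SaveOptions.DisableFormatting is used only to keep each result on one line.

```csharp
using System.Xml.Linq;

// Hypothetical helper names, for illustration only.
public static class EnvelopeDemo
{
    // Passing a string: the content is treated as text and escaped on output.
    public static string WrapAsText(string xml) =>
        new XElement("envelope", xml).ToString(SaveOptions.DisableFormatting);

    // Passing a parsed XElement: the content is added as a child element.
    public static string WrapAsElement(string xml) =>
        new XElement("envelope", XElement.Parse(xml)).ToString(SaveOptions.DisableFormatting);
}
```

EnvelopeDemo.WrapAsElement("<hello>Hello!</hello>") returns <envelope><hello>Hello!</hello></envelope>, whereas WrapAsText returns the same string with the inner angle brackets written as entities.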
This question already has answers here:
What is the best way to parse html in C#? [closed]
(15 answers)
Closed 8 years ago.
I have an HTML file that is a well-formed XML document (tags are paired), but it contains an anchor like the one below:
link
The XML parser invoked by XDocument.Load throws an XmlException that says:
Additional information: '=' is an unexpected token. The expected token is ';'.
How can I tell the parser that '&body' is not an entity? Must I escape the '&' character?
Not all HTML is going to be valid XML, so you shouldn't try to parse it as such (although, in this case, it looks like you have some unescaped strings in the document that should probably be taken care of).
Instead, you should use something like the HTMLAgilityPack to parse your HTML and work with the document that way.
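On the escaping point, here is a minimal sketch of how an XML parser treats the two spellings of '&' inside an attribute. The anchor markup passed in is a made-up example (the original anchor is not shown above), and the class and method names are hypothetical.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class AmpersandDemo
{
    // Returns the href attribute value, or null when the markup
    // is not well-formed XML (for instance, a bare '&' in the URL).
    public static string ReadHref(string anchorMarkup)
    {
        try
        {
            return (string)XElement.Parse(anchorMarkup).Attribute("href");
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

With the '&' written as &amp; in the href, ReadHref returns the URL with a plain '&' restored; with a bare '&' followed by a name such as body=, it returns null because the parser throws.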
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I have a text file consisting of conversion instruction templates.
I need to parse this text file and match something like this:
(Source: <element>)
And get the "element".
Or this pattern:
(Source: <element attr="name" value=""/>)
And get "element attr="name"".
I am currently using this regex:
\(Source:\ ?\<(.*?)\>\)
Sorry for being a newbie. :)
Thanks for all your help.
-JRC
Try this regex, which detects attributes quoted with either ” or " characters:
\(Source:\s+<(\w+(?:\s+\w+=[\"”][^\"”]+[\"”])?)[^>]*>\)
and your code:
var result = Regex.Match(strInput,
    "\\(Source:\\s+<(\\w+(?:\\s+\\w+=[\"”][^\"”]+[\"”])?)[^>]*>\\)")
    .Groups[1].Value;
Explanation:
(subexpression)  Captures the matched subexpression and assigns it a one-based ordinal number (group 0 is the whole match).
?                Matches the previous element zero or one time.
\w               Matches any word character.
+                Matches the previous element one or more times.
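A minimal sketch that runs a pattern of this shape against both sample inputs from the question. The class and method names are hypothetical. In this variant the whitespace before the attribute sits inside the optional group, so that a bare <element> with no attributes also matches.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SourceTagDemo
{
    // Group 1: the element name, optionally followed by its first
    // attribute="value" pair (straight or curly closing quotes accepted).
    private static readonly Regex SourcePattern = new Regex(
        "\\(Source:\\s+<(\\w+(?:\\s+\\w+=[\"”][^\"”]+[\"”])?)[^>]*>\\)");

    // Returns the captured text, or null when the input does not match.
    public static string Extract(string input)
    {
        var match = SourcePattern.Match(input);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

SourceTagDemo.Extract("(Source: <element>)") returns element, and for (Source: <element attr="name" value=""/>) it returns element attr="name".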