I want quick function which may be part of my xml parser, I do not want to parse whole string and check if it correct xml.
This is not really doable without parsing, or at least—in a limited form—without using a regular expression. Names in XML permit different characters as the first character and as second and further characters — see the Name production.
Should you implement IsValidXmlChar without a context, i.e. just checking if the given character is a NameChar, as per the XML specification, the output of your example would be GridAttributeStuff.
So you should at least tokenize the input text to retrieve valid names, and parse the input to retrieve element names, i.e. output Grid in your example.
To check if a string is a XML name, the XmlReader class offers the IsName static method. To categorize characters in an XML text, there is the XmlCharType struct in .NET Framework as well as in .NET Core, but it's internal.
Related
I have a list of Types and want to merge their FullNames into a single string for storage in an Xml Attribute and extract with string.Split().
I was just going to use '|' but I saw https://msdn.microsoft.com/en-us/library/yfsftwz6(v=vs.110).aspx and thought it better to find something also compatible with that in case I ever need to store Type Name specifications.
What simple character can I safely use as a separator?
I have a Description textbox on the page. When enter the data in that and submit the page. I will pass that string to XML tag in the XML file.
If user enter any invalid characters in textbox which are not allowed for xml. How to remove or parse them from string? I need to validate string for XML data.
If you're using the XmlDocument or XDocument classes to build the XML then you don't need to worry as they'll do the encoding for you.
Otherwise, if you generating the XML by hand you can use the SecurityElement.Escape method to encode invalid XML characters
That depends on how you are creating the XML. If you are assembling the XML string yourself, there are A LOT of things you should do and take into consideration.
Thus, you should not be doing that (assembling the string yourself).
.NET provides you with abstraction layers so you don't have to deal with that. Example: XDocument
I'm receiving data from a Flash component embedded in a Windows Form. Unfortunately, if the data returned from the socket contains any of the following characters, the call to loadXml below fails:
This is the callback method I have to receive data from the socket (via ExternalInterface in the Flash component).
private void player_FlashCall(object sender, _IShockwaveFlashEvents_FlashCallEvent e)
{
String output = e.request;
//output = CleanInvalidXmlChars(output);
XmlDocument document = new XmlDocument();
document.LoadXml(output);
XmlAttributeCollection attributes = document.FirstChild.Attributes;
String command = attributes.Item(0).InnerText;
XmlNodeList list = document.GetElementsByTagName("arguments");
process(list[0].InnerText);
I had a method to replace the characters with text (CleanInvalidXmlChars), but I don't think this is the right approach.
How can I load this data into an XML file, as this makes separating the method name, paramter names and parameter types which are returned very easy to work with.
Would appreciate any help at all.
Thanks.
If the “XML” contains any U+0001 (aka '\x01') or other similar characters, it is not a valid XML. There is no way you can include those characters in XML (well, in XML 1.0, anyway). See the XML specification. If you need to pass e.g. binary data in XML, you need to convert them to a proper form, e.g. using Base-64.
If the data does contain those invalid characters, it is not XML, and therefore cannot be read using standard XML tools (I don’t think any of the standard .NET classes allows you to override that behavior). You can either replace all those characters (these are basically all control characters (U+0000 through U+001F) except U+0009 (tab), U+000A and U+000D (CR+LF), plus U+FFFE and U+FFFF (noncharacters)) prior to use as you tried – you could devise a safe transformation which would not lose any data (e.g. first replace all # characters with #0040, then replace any invalid character with #xxxx where xxxx is its code, and when processing the parsed XML data, replace all #xxxx back).
Another option is to drop the XML idea and just process it as a string. Just for inspiration, see e.g. this piece of code.
I am using Dataset.ReadXML() to read an XML string. I get an error as the XML string contains the Invalid Character 0x1F which is 'US' - Unit seperator. This is contained within fully formed tags.
The data is extracted from an Oracle DB, using a Perl script. How would be the best way to escape this character so that the XML is read correctly.
EDIT: XML String:
<RESULT>
<DEPARTMENT>Oncology</DEPARTMENT>
<DESCRIPTION>Oncology</DESCRIPTION>
<STUDY_NAME>**7360C hsd**</STUDY_NAME>
<STUDY_ID>27</STUDY_ID>
</RESULT>
Is between the C and h in the bold part, is where there is a US seperator, which when pasted into this actually shows a space. So I want to know how can I ignore that in an XML string?
If you look at section 2.2 of the XML recommendation, you'll see that x01F is not in the range of characters allowed in XML documents. So while the string you're looking at may look like an XML document to you, it isn't one.
You have two problems. The relatively small one is what to do about this document. I'd probably preprocess the string and discard any character that's not legal in well-formed XML, but then I don't know anything about the relatively large problem.
And the relatively large problem is: what's this data doing in there in the first place? What purpose (if any) do non-visible ASCII characters in the middle of a (presumably) human-readable data field serve? Why is it doesn't the Perl script that produces this string failing when it encounters an illegal character?
I'll bet you one American dollar that it's because the person who wrote that script is using string manipulation and not an XML library to emit the XML document. Which is why, as I've said time and again, you should never use string manipulation to produce XML. (There are certainly exceptions. If you're writing a throwaway application, for instance, or an XML parser. Or if your name's Tim Bray.)
Your XmlReader/TextReader must be created with correct encoding. You can create it as below and pass to your Dataaset:
StreamReader reader = new StreamReader("myfile.xml",Encoding.ASCII); // or correct encoding
myDataset.ReadXml(reader);
I'm populating an XElement with information and writing it to an xml file using the XElement.Save(path) method. At some point, certain characters in the resulting file are being escaped - for example, > becomes >.
This behaviour is unacceptable, since I need to store information in the XML that includes the > character as part of a password. How can I write the 'raw' content of my XElement object to XML without having these escaped?
Lack of this behavior is unacceptable.
A standalone unescaped > is invalid XML.
XElement is designed to produce valid XML.
If you want to get the unescaped content of the element, use the Value property.
The XML specification usually allows > to appear unescaped. XDocument plays it safe and escapes it although it appears in places where the escaping is not strictly required.
You can do a replace on the generated XML. Be aware per http://www.w3.org/TR/REC-xml#syntax, if this results in any ]]> sequences, the XML will not conform to the XML specification. Moreover, XDocument.Parse will actually reject such XML with the error "']]>' is not allowed in character data.".
XDocument doc = XDocument.Parse("<test>Test>Data</test>");
// Don't use this if it could result in any ]]> sequences!
string s = doc.ToString().Replace(">", ">");
System.IO.File.WriteAllText(#"c:\path\test.xml", s);
In consideration that any spec-compliant XML parser must support >, I'd highly recommend fixing the code that is processing the XML output of your program.