I am developing an application that reads a file, converts its contents into a string, and then loads that string as XML. The problem is that loading the string data into XML throws an exception about invalid characters. I am using the following code. Could anyone help me resolve the issue? Thank you in advance.
ZipFileEntry objContactXML;
String xmlData = ASCIIEncoding.UTF8.GetString(objContactXML.FileData);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlData);
Regards,
Sanchaita
Firstly, this is a nasty bit of code:
ASCIIEncoding.UTF8
Please use just Encoding.UTF8 - it's UTF-8, not ASCII.
Now, you can create a StringReader around your XML text data - but you'd actually be better off not turning it into string data at all. It may be encoded in something other than UTF-8 - and the XML parser knows how to deal with that. It's entirely possible that this is why you're running into problems with your current approach. Leave the data in binary and parse that:
XmlDocument document = new XmlDocument();
using (MemoryStream stream = new MemoryStream(objContactXML.FileData))
{
    // Let the parser detect the encoding from the raw bytes
    document.Load(stream);
}
As an aside, if you're using .NET 3.5 or higher, I would strongly advise you to use LINQ to XML (XDocument etc) instead of the old DOM API. LINQ to XML is a much nicer API.
In LINQ to XML, you'd use:
XDocument document;
using (MemoryStream stream = new MemoryStream(objContactXML.FileData))
{
document = XDocument.Load(stream);
}
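As a rough illustration of why leaving the data in binary helps, here is a minimal sketch. The class name, method names, and the UTF-16 sample document are made up for the example, and it assumes .NET 4 or later, where XDocument.Load accepts a Stream. The bytes are not UTF-8 at all, yet the parser works out the encoding from the byte-order mark and the XML declaration:

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XmlBytesExample
{
    // Parses raw bytes; the XML parser inspects the BOM and the
    // encoding declaration itself, so no manual decoding is needed.
    public static XDocument LoadFromBytes(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        {
            return XDocument.Load(stream);
        }
    }

    // Hypothetical sample data: a small UTF-16 (little-endian) document with a BOM.
    public static byte[] SampleUtf16Bytes()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><contact name=\"Zo\u00EB\" />";
        Encoding utf16 = new UnicodeEncoding(false, true);
        byte[] preamble = utf16.GetPreamble();
        byte[] body = utf16.GetBytes(xml);
        byte[] all = new byte[preamble.Length + body.Length];
        preamble.CopyTo(all, 0);
        body.CopyTo(all, preamble.Length);
        return all;
    }
}
```

Decoding those same bytes with Encoding.UTF8.GetString would produce garbage, which is one plausible way to end up with an "invalid characters" exception.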
Related
I have an xml string coming from Adobe PDF AcroForms, which apparently allows naming form fields starting with numeric characters. I'm trying to parse this string to an XDocument:
XDocument xDocument = XDocument.Parse(xmlString);
But whenever I encounter such a form field where the name starts with a numeric char, the xml parsing throws an XmlException:
Name cannot begin with the 'number' character
Other solutions I found were about using: XmlReaderSettings.CheckCharacters
using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlString), new XmlReaderSettings() { CheckCharacters = false }))
{
XDocument xDocument = XDocument.Load(xmlReader);
}
But this also didn't work. Some articles pointed out the reason as one of the points mentioned in MSDN article:
If the XmlReader is processing text data, it always checks that the
XML names and text content are valid, regardless of the property
setting. Setting CheckCharacters to false turns off character checking
for character entity references.
So I tried using:
using(MemoryStream memoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(xmlString)))
using (XmlReader xmlReader = XmlReader.Create(memoryStream, new XmlReaderSettings() { CheckCharacters = false }))
{
XDocument xDocument = XDocument.Load(xmlReader);
}
This also didn't work.
Can any one please help me in figuring out how to parse an xml string that contains xml elements whose name starts with numeric characters?
How is the flag XmlReaderSettings.CheckCharacters supposed to be used?
You can't make a standard XML parser accept your format, even if it "looks like" XML, so stop trying. Standards-compliant XML parsers are not allowed to parse malformed XML. This was a deliberate design decision, based on all the problems quirks mode caused with HTML parsing.
Writing your own parser isn't that hard. XML is very strict and, unless you need advanced features, the syntax is simple.
An LL parser can be written by hand. Both the lexer and the parser are simple.
A parser can also be generated with ANTLR from a simple grammar. Most likely, you'll even find example XML grammars.
You can also just take the source code of either of .NET's XML parsers and remove the validation you don't need. You can find both XmlDocument and XDocument in .NET Core's repository on GitHub.
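To see the point about CheckCharacters in practice, here is a small sketch. The class and method names are hypothetical. It tries to load a string through an XmlReader and reports whether parsing succeeded; with an element name that starts with a digit it should fail regardless of the CheckCharacters setting:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class DigitNameExample
{
    // Returns true if the text parses as XML, false if the reader
    // rejects it with an XmlException.
    public static bool TryParse(string xml, bool checkCharacters)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = checkCharacters;
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                XDocument.Load(reader);
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, DigitNameExample.TryParse("<form><1stField>x</1stField></form>", false) returns false, while the same document with the element renamed to field1 parses fine.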
I have this XML string bn:
<Root><Row><ITEMNO>1</ITEMNO><USED>y</USED><PARTSOURCE>Buy</PARTSOURCE><QTY>2</QTY></Row><Row><ITEMNO>5</ITEMNO><PARTSOURCE>Buy</PARTSOURCE><QTY>5</QTY></Row></Root>
I am trying to convert it to an XDocument like this:
var doc = XDocument.Parse(bn);
However, I get this error:
Data at the root level is invalid. Line 1, position 1.
Am I missing something?
UPDATE:
This is the method I use to create the xml string:
public static string SerializeObjectToXml(Root rt)
{
var memoryStream = new MemoryStream();
var xmlSerializer = new XmlSerializer(typeof(Root));
var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
xmlSerializer.Serialize(xmlTextWriter, rt);
memoryStream = (MemoryStream)xmlTextWriter.BaseStream;
string xmlString = ByteArrayToStringUtf8(memoryStream.ToArray());
xmlTextWriter.Close();
memoryStream.Close();
memoryStream.Dispose();
return xmlString;
}
It does add extra characters to the start that I have to remove. Could I change something to make it correct from the start?
There are two characters at the beginning of your string that, although you can't see them, are still there and make the parse fail. Try this instead:
<Root><Row><ITEMNO>1</ITEMNO><USED>y</USED><PARTSOURCE>Buy</PARTSOURCE><QTY>2</QTY></Row><Row><ITEMNO>5</ITEMNO><PARTSOURCE>Buy</PARTSOURCE><QTY>5</QTY></Row></Root>
The character in question is U+FEFF. This is a byte-order mark, which tells the program reading the data whether it is big- or little-endian. It seems like you copied and pasted this from a file that wasn't decoded properly.
To remove it, you could use this:
yourString.Replace(((char)0xFEFF).ToString(), "")
You have two unprintable characters (Zero-Width No-break Space) at the beginning of your string.
XML does not allow text outside the root element.
The accepted answer does unnecessary string processing but, in its defense, that's because you're dealing in strings when you don't have to. One of the great things about the .NET XML APIs is that they have robust internals. So instead of trying to feed a string to XDocument.Parse, feed a Stream or some type of TextReader to XDocument.Load. This way, you aren't manually managing the encoding and any problems it creates, because the internals will handle all of that for you. Byte-order marks are a pain in the neck, but if you're dealing in XML, .NET makes it easier to handle them.
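A minimal sketch of that idea, with a made-up class name and sample data, and assuming .NET 4 or later for the XDocument.Load(Stream) overload. The bytes start with a UTF-8 byte-order mark; decoding them to a string by hand keeps U+FEFF as the first character, whereas handing the stream to the parser lets it consume the mark itself:

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class BomExample
{
    // Hypothetical input: UTF-8 bytes that begin with a byte-order mark (EF BB BF).
    public static byte[] SampleBytesWithBom()
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<Root><Row><QTY>2</QTY></Row></Root>");
        byte[] all = new byte[bom.Length + body.Length];
        bom.CopyTo(all, 0);
        body.CopyTo(all, bom.Length);
        return all;
    }

    // Loads straight from the bytes; the XML reader handles the BOM.
    public static XDocument LoadFromBytes(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        {
            return XDocument.Load(stream);
        }
    }
}
```

In the serializer method from the question, the same reasoning suggests either returning the MemoryStream contents as bytes and loading them this way, or serializing to a StringWriter so that no byte-order mark is ever produced.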
I'm developing a windows app using C#. I chose xml for data storage.
It is required to read xml file, make small changes, and then write it back to hard disk.
Now, what is the easiest way of doing this?
XLinq (LINQ to XML) is much more comfortable than the ordinary XML DOM API, because it is much more object-oriented, supports LINQ, offers many convenient casts, and serializes to the standard ISO format.
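For the read, change, write-back cycle from the question, a LINQ to XML version might look roughly like this. The class and method names are invented for the example, and it assumes the setting lives in a direct child element of the root:

```csharp
using System.Xml.Linq;

public static class LinqToXmlEdit
{
    // Loads the file, sets (or creates) the named child element of the
    // root to the given value, and writes the file back to disk.
    public static void UpdateSetting(string path, string name, string value)
    {
        XDocument doc = XDocument.Load(path);
        doc.Root.SetElementValue(name, value);
        doc.Save(path);
    }
}
```

A call such as LinqToXmlEdit.UpdateSetting("config.xml", "AddDateStamp", "true") would then rewrite the file in place.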
The best way is to use XML serialization, which loads the XML into a class (with various classes representing the elements and attributes). You can then change the values in code and serialize back to XML.
To create the classes, the best thing to do is to use xsd.exe, which can infer a schema from an existing XML document and then generate the C# classes from that schema.
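A minimal sketch of the serialization round trip, using a hand-written class instead of one generated by xsd.exe. AppSettings, its fields, and SettingsStore are hypothetical names. Strings are used here to keep the example short; a StreamReader and StreamWriter on a file work the same way:

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical settings class; XmlSerializer handles public fields and properties.
public class AppSettings
{
    public string UserName;
    public bool AddDateStamp;
}

public static class SettingsStore
{
    // Serializes the object to an XML string.
    public static string ToXml(AppSettings settings)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(AppSettings));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, settings);
            return writer.ToString();
        }
    }

    // Rebuilds the object from an XML string.
    public static AppSettings FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(AppSettings));
        using (StringReader reader = new StringReader(xml))
        {
            return (AppSettings)serializer.Deserialize(reader);
        }
    }
}
```

The typical flow is then: deserialize, change a field such as AddDateStamp in code, and serialize again.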
I think the easiest way of doing it is to use the XmlDocument class:
var doc = new XmlDocument();
doc.Load("filename or Stream or TextReader or XmlReader");
// do something
doc.Save("filename or Stream or TextWriter or XmlWriter");
I think I found the easiest way: check out this project on CodeProject. It is easy to use, as XML elements are accessed much like array elements, using name strings as indexes.
Code sample to write bool property to XML:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue = checkBoxAddStamp.Checked;
xcfg.Save("config.xml");
Sample to read the property:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
checkBoxAddStamp.Checked = xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue;
To write string use .Value, for int .intValue.
You can use LINQ to read XML Files as described here...
LINQ to read XML
Check out linq to XML
I have a string containing fully formatted XML data, created using a Perl script.
I now want to convert this string into an actual XML file in C#. Is there anyway to do this?
Thanks,
You can load a string into an in-memory representation, for example using the LINQ to XML XDocument type. Loading the string is done with the Parse method, and saving the document to a file is done with the Save method:
using System.Xml.Linq;
XDocument doc = XDocument.Parse(xmlContent);
doc.Save(fileName);
The question is why would you do that, if you already have correctly formatted XML document?
The good reasons that I can think of are:
To verify that the content is really valid XML
To generate XML with nice indentation and line breaks
If that's not what you need, then you should just write the data to a file (as others suggest).
Could be as simple as
File.WriteAllText(@"C:\Test.xml", "your-xml-string");
or
File.WriteAllText(@"C:\Test.xml", "your-xml-string", Encoding.UTF8);
XmlDocument doc = new XmlDocument();
doc.LoadXml(... your string ...); // LoadXml parses a string; Load expects a file name, stream or reader
doc.Save(... your destination path...);
see also
http://msdn.microsoft.com/fr-fr/library/d5awd922%28v=VS.80%29.aspx
I have tons of XML files, all containing the same kind of XML document but with different values. The structure is the same for each file.
Inside this file I have a datetime field.
What is the best, most efficient way to query these XML files? So I can retrieve for example... All files where the datetime field = today's date?
I'm using C# and .net v2. Should I be using XML objects to achieve this or text in file search routines?
Some code examples would be great... or just the general theory, anything would help, thanks...
This depends on the size of those files, and how complex the data actually is. As far as I understand the question, for this kind of XML data, using an XPath query and going through all the files might be the best approach, possibly caching the files in order to lessen the parsing overhead.
Have a look at:
XPathDocument, XmlDocument classes and XPath queries
http://support.microsoft.com/kb/317069
Something like this should do (not tested though):
XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
// if required, add your namespace prefixes here to nsmgr
XPathExpression expression = XPathExpression.Compile("//element[@date='20090101']", nsmgr); // your query as XPath
foreach (string fileName in Directory.GetFiles("PathToXmlFiles", "*.xml")) {
XPathDocument doc;
using (XmlTextReader reader = new XmlTextReader(fileName, nsmgr.NameTable)) {
doc = new XPathDocument(reader);
}
if (doc.CreateNavigator().SelectSingleNode(expression) != null) {
// matching document found
}
}
Note: while you can also load an XPathDocument directly from a URI/path, using the reader makes sure that the same nametable is used as the one used to compile the XPath query. If a different nametable were used, you would not get results from the query.
You might look into running XSL queries. See also XSLT Tutorial, XML transformation using Xslt in C#, How to query XML with an XPath expression by using Visual C#.
This question also relates to another on Stack Overflow: Parse multiple XML files with ASP.NET (C#) and return those with particular element. The accepted answer there, though, suggests using Linq.
If it is at all possible to move to C# 3.0 / .NET 3.5, LINQ-to-XML would be by far the easiest option.
With .NET 2.0, you're stuck with either XML objects or XSL.
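If moving to .NET 3.5 is an option, the LINQ to XML approach could be sketched as follows. The class and method names are made up, the element name is passed in because the question does not give it, and the code assumes the date is stored in the standard XML (ISO 8601) format, since that is what the XElement-to-DateTime cast expects:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class DateQuery
{
    // Returns the files in the folder that contain at least one element
    // with the given name whose value falls on the given calendar date.
    public static List<string> FilesWithDate(string folder, string elementName, DateTime date)
    {
        List<string> matches = new List<string>();
        foreach (string fileName in Directory.GetFiles(folder, "*.xml"))
        {
            XDocument doc = XDocument.Load(fileName);
            // The explicit cast parses the element text as an XML date/time;
            // it throws a FormatException if the text is in another format.
            bool hit = doc.Descendants(elementName)
                          .Any(e => ((DateTime)e).Date == date.Date);
            if (hit)
            {
                matches.Add(fileName);
            }
        }
        return matches;
    }
}
```

Calling DateQuery.FilesWithDate("PathToXmlFiles", "datetime", DateTime.Today) would then list the files dated today. Every file is still parsed in full, so for very large sets of files the caching idea mentioned above remains relevant.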