I'm trying to parse a folder containing a number of XML files. The XML files contain information about some vehicles. The files are autogenerated and some of them have invalid characters. There are too many files for me to correct manually, so how can I bypass the invalid character exception?
This is the invalid line in some of the xml files:
<ECU EcuName="ABS" EcuFamily="BSS" CplNo="" Address="0x0B" ConfigChecksum="0x00000000" Updated="false">
I have tried to use StreamReader without any success. This is my code:
XDocument docs = XDocument.Load(new System.IO.StreamReader((path), Encoding.GetEncoding("utf-8")));
var nameValues =
from fpc in docs.Descendants("FPC")
select new
{
Name = (string)fpc.Attribute("Name"),
Value = (string)fpc.Attribute("Value")
};
If you need to you can load the file with e.g.
XDocument doc;
using (XmlReader xr = XmlReader.Create(path, new XmlReaderSettings() { CheckCharacters = false }))
{
doc = XDocument.Load(xr);
}
// now query document here
That will get past invalid character references like the one you have shown, but not disallowed literal characters.
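For files that do contain disallowed literal characters, one option is to strip them before parsing. The following is a minimal sketch under assumptions, not a definitive implementation: XmlSanitizer, Clean and Parse are hypothetical names, and the file text is assumed to be in a string already (e.g. from File.ReadAllText(path)), so the encoding named in the XML declaration is not honoured. XmlConvert.IsXmlChar is the framework check for characters that XML 1.0 allows.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlSanitizer
{
    // Drops every literal character that XML 1.0 does not allow.
    // Surrogates are kept so that characters outside the BMP survive.
    public static string Clean(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        foreach (char c in raw)
        {
            if (XmlConvert.IsXmlChar(c) || char.IsSurrogate(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Cleans the text, then parses it with CheckCharacters = false so that
    // invalid character references are tolerated as well.
    public static XDocument Parse(string raw)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var text = new StringReader(Clean(raw)))
        using (XmlReader xr = XmlReader.Create(text, settings))
        {
            return XDocument.Load(xr);
        }
    }
}
```

Usage would look like XDocument docs = XmlSanitizer.Parse(File.ReadAllText(path)); note that this silently discards the offending characters, which may or may not be acceptable for your data.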
How do I read and parse an XML file in C#?
XmlDocument to read an XML from string or from file.
using System.Xml;
XmlDocument doc = new XmlDocument();
doc.Load("c:\\temp.xml");
or
doc.LoadXml("<xml>something</xml>");
then find a node below it, e.g. like this
XmlNode node = doc.DocumentElement.SelectSingleNode("/book/title");
or
foreach(XmlNode node in doc.DocumentElement.ChildNodes){
string text = node.InnerText; //or loop through its children as well
}
then read the text inside that node like this
string text = node.InnerText;
or read an attribute
string attr = node.Attributes["theattributename"]?.InnerText;
Always check for null on Attributes["something"] since it will be null if the attribute does not exist.
LINQ to XML Example:
// Loading from a file, you can also load from a stream
var xml = XDocument.Load(@"C:\contacts.xml");
// Query the data and write out a subset of contacts
var query = from c in xml.Root.Descendants("contact")
where (int)c.Attribute("id") < 4
select c.Element("firstName").Value + " " +
c.Element("lastName").Value;
foreach (string name in query)
{
Console.WriteLine("Contact's Full Name: {0}", name);
}
Reference: LINQ to XML at MSDN
Here's an application I wrote for reading xml sitemaps:
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Data;
using System.Xml;
namespace SiteMapReader
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Please Enter the Location of the file");
// get the location we want to get the sitemaps from
string dirLoc = Console.ReadLine();
// get all the sitemaps
string[] sitemaps = Directory.GetFiles(dirLoc);
StreamWriter sw = new StreamWriter(Application.StartupPath + @"\locs.txt", true);
// loop through each file
foreach (string sitemap in sitemaps)
{
try
{
// new xdoc instance
XmlDocument xDoc = new XmlDocument();
//load up the xml from the location
xDoc.Load(sitemap);
// cycle through each child node
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
// first node is the url ... have to go to the nested loc node
foreach (XmlNode locNode in node)
{
// there are a couple of child nodes here, so only take data from the node named loc
if (locNode.Name == "loc")
{
// get the content of the loc node
string loc = locNode.InnerText;
// write it to the console so you can see its working
Console.WriteLine(loc + Environment.NewLine);
// write it to the file
sw.Write(loc + Environment.NewLine);
}
}
}
}
catch { }
}
// flush and close the output file so the collected locs are actually written
sw.Close();
Console.WriteLine("All Done :-)");
Console.ReadLine();
}
static void readSitemap()
{
}
}
}
Code on Paste Bin
http://pastebin.com/yK7cSNeY
There are lots of ways, for example:
XmlSerializer: use a class that matches the target schema you want to read, then use XmlSerializer to load the XML data into an instance of that class.
LINQ to XML
XmlTextReader
XmlDocument
XPathDocument (read-only access)
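As an illustration of the XmlSerializer option in the list above, here is a minimal sketch. The Contact class, its property names and the contact/firstName/lastName element names are assumptions borrowed from the LINQ to XML example earlier in this thread, not a schema from the question.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical class mirroring a <contact id="..."> element.
[XmlRoot("contact")]
public class Contact
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("firstName")]
    public string FirstName { get; set; }

    [XmlElement("lastName")]
    public string LastName { get; set; }
}

public static class ContactReader
{
    // Deserializes one <contact> element from an XML string.
    public static Contact Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Contact));
        using (var reader = new StringReader(xml))
        {
            return (Contact)serializer.Deserialize(reader);
        }
    }
}
```

The trade-off is that the class has to match the document structure up front; in exchange you get typed properties instead of string lookups.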
You could use a DataSet to read XML strings.
var xmlString = File.ReadAllText(FILE_PATH);
var stringReader = new StringReader(xmlString);
var dsSet = new DataSet();
dsSet.ReadXml(stringReader);
Posting this for the sake of information.
You can either:
Use XmlSerializer class
Use XmlDocument class
Examples are on the msdn pages provided
Linq to XML.
Also, VB.NET has much better xml parsing support via the compiler than C#. If you have the option and the desire, check it out.
Check out XmlTextReader class for instance.
There are different ways, depending on where you want to get.
XmlDocument is lighter than XDocument, but if you only want a minimal check that a string contains XML, a regular expression is possibly the fastest and lightest choice. For example, I have implemented smoke tests with SpecFlow for my API, and when I want to test whether one of the results is valid XML, I use a regular expression. If I need to extract values from the XML, I parse it with XDocument, which is quicker and needs less code. If I have to work with a big XML document (I sometimes work with documents of around 1M lines or more), I use XmlDocument; then I could even read it line by line. Why? Try opening more than 800 MB in private bytes in Visual Studio; even in production you should not have objects bigger than 2 GB. You can with a tweak, but you should not. If you have to parse a document that contains A LOT of lines, it would probably be CSV.
I have written this comment because I see a lot of examples with XDocument. XDocument is not good for big documents, or when you only want to verify that the content is valid XML. If you wish to check whether the XML itself makes sense, you need a schema.
I also downvoted the suggested answer, because I believe it needs the above information. Imagine I need to verify that 200M of XML, 10 times an hour, is valid XML. XDocument will waste a lot of resources.
prasanna venkatesh also states that you could try loading the string into a DataSet; it will indicate valid XML as well.
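For the large-document case discussed above, a streaming check with XmlReader is another option, since it reads node by node and never builds the whole tree in memory. This is a minimal sketch rather than the approach described in the answer; XmlCheck and IsWellFormed are hypothetical names, and it checks well-formedness only, not validity against a schema.

```csharp
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Returns true if the input can be read to the end as well-formed XML.
    // With default settings a DOCTYPE declaration is rejected as well,
    // because DTD processing is prohibited by default.
    public static bool IsWellFormed(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (reader.Read())
                {
                    // Nothing to do: we only care whether Read() throws.
                }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For a file, pass a StreamReader over it; memory use stays roughly constant regardless of the document size.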
public void ReadXmlFile()
{
string path = HttpContext.Current.Server.MapPath("~/App_Data"); // Finds the location of App_Data on server.
XmlTextReader reader = new XmlTextReader(System.IO.Path.Combine(path, "XMLFile7.xml")); //Combines the location of App_Data and the file name
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
break;
case XmlNodeType.Text:
columnNames.Add(reader.Value);
break;
case XmlNodeType.EndElement:
break;
}
}
}
You can avoid the first statement and just specify the path name in constructor of XmlTextReader.
If you want to retrieve a particular value from an XML file:
XmlDocument _LocalInfo_Xml = new XmlDocument();
_LocalInfo_Xml.Load(fileName);
XmlElement _XmlElement;
_XmlElement = _LocalInfo_Xml.GetElementsByTagName("UserId")[0] as XmlElement;
string Value = _XmlElement.InnerText;
Here is another approach using Cinchoo ETL, an open source library, to parse an XML file with a few lines of code.
using (var r = ChoXmlReader<Item>.LoadText(xml)
.WithXPath("//item")
)
{
foreach (var rec in r)
rec.Print();
}
public class Item
{
public string Name { get; set; }
public string ProtectionLevel { get; set; }
public string Description { get; set; }
}
Sample fiddle: https://dotnetfiddle.net/otYq5j
Disclaimer: I'm author of this library.
I would like to Read and Deserialize more than one XML file into my XML class structure given a list of strings consisting of file names.
Obviously when reading ONE xml file, you can go like this:
XmlRoot file = null;
XmlSerializer ser = new XmlSerializer(typeof(XmlRoot));
using (XmlReader read = XmlReader.Create(FileName))
{
file = (XmlRoot)ser.Deserialize(read);
}
This will deserialize the XML file into the class structure.
It is not possible to have a list of file names and use a foreach loop to iterate over them, reading and deserializing one by one, as it would theoretically result in multiple root elements being read, deserialized and replicated in the class structure.
So in general I would like to deserialize each file and append the required master elements to a root object.
Does anyone know how to accomplish this? It would be of great help.
Thanks in advance!
PS: Excuse me for my English, as I am not a native speaker. If you need further information, just tell me!
I managed to solve the problem for myself.
First I created an XDocument for the first file I read. Afterwards I iterate through the other documents, creating a new XDocument for every XML file, take the elements below the root (Language in my case) and add them to the root of the XDocument created outside the loop.
XDocument doc = null;
int counter = 0;
foreach (var fileName in multipleFileNames)
{
    counter++;
    if (counter <= 1)
    {
        // the first file becomes the base document
        doc = XDocument.Load(fileName);
    }
    else
    {
        // append the children of this file's root to the base document's root
        XDocument doc2 = XDocument.Load(fileName);
        IEnumerable<XElement> elements = doc2.Element("Language")
            .Elements();
        doc.Root.Add(elements);
    }
}
return Deserialize(doc);
Afterwards I call the Deserialize method, deserializing my created XDocument like this:
public static XmlLanguage Deserialize(XDocument doc)
{
XmlSerializer ser = new XmlSerializer(typeof(XmlLanguage));
return (XmlLanguage)ser.Deserialize(doc.CreateReader());
}
I need to apply a simple XSL transform and continue working with the resulting data, but I don't want to save a file. This is my code:
XslTransform xsl = new XslTransform();
var writer = new MemoryStream();
var xslDoc = new XPathDocument("107901.xslt");
xsl.Load(@"C:\Users\mak\Documents\Visual Studio 2015\Projects\SpellCheck\SpellCheck\GetAllValues.xslt");
xsl.Transform(xslDoc, null, writer);
writer.Position = 1;
var str = new StreamReader(writer);
var normalize = str.ReadToEnd().Trim('�');
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.Save(normalize);
1) Why do two characters with code 65533 appear in the variable str?
2) Why is the variable normalize not saved as an XML file? I get the error 'not able to add it to the content characters than whitespace'.
Maybe I'm doing it all wrong and there is an easier way.
Sorry for my bad English and thanks for any answer :-)
I don't understand question 1, so I'll skip to question 2. The documentation clearly states that the string argument of Save() should contain "The location of the file where you want to save the document". To populate the XmlDocument instance from an XML string, you can use LoadXml():
.....
.....
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(normalize);
xmlDocument.Save(@"D:\path\to\your\output.xml");
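If the goal is to keep working with the transform result without touching the disk at all, the transform can write into a StringWriter instead of a stream. Below is a minimal sketch under assumptions: it uses XslCompiledTransform (the replacement for the obsolete XslTransform), takes the stylesheet and input as strings, and assumes the stylesheet produces XML output; InMemoryTransform and Apply are hypothetical names. Working with text rather than bytes also sidesteps byte order mark handling, which is a plausible (but unconfirmed) source of the 65533 characters when a MemoryStream is read from Position = 1.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class InMemoryTransform
{
    // Applies the stylesheet to the input and returns the result as an XmlDocument.
    public static XmlDocument Apply(string xsltText, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }

        // The writer is disposed (and flushed) before the text is read back.
        var result = new XmlDocument();
        result.LoadXml(output.ToString());
        return result;
    }
}
```

Call Save() with a file path on the returned document only if you later decide you do want a file.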
I'm sure this is very simple..
I have this as a string:
<OnTheRoadQuote xmlns:i="http://www.w3.org/2001/XMLSchema-instance"xmlns="http://schemas.datacontract.org/2004/07/OTRAPI.Services.Models">
<BasicPrice>15595.8333</BasicPrice>
<CO2>93</CO2>
<Dealer>Audi</Dealer>
<DeliveryCost>524.9900</DeliveryCost>
<DiscountPrice>14348.166636</DiscountPrice>
<DiscountSum>1247.666664</DiscountSum>
<Discounts>
<Discount>
<DiscountApplication>Percentage</DiscountApplication>
<DiscountDescription>Dealer Discount on Vehicle and Options %</DiscountDescription>
<DiscountID>Discount1</DiscountID>
<DiscountType>VehicleAndOptions</DiscountType>
<DiscountValue>8</DiscountValue>
</Discount>
</Discounts>
<OTR>17902.7879632</OTR>
</OnTheRoadQuote>
How do I read the value of the OTR node?
I've got an XmlReader but not sure how to use it.
Thanks
Using XmlReader, and fixing a typo in your root element (missing space before the second xmlns):
string xml = @"<OnTheRoadQuote xmlns:i=""http://www.w3.org/2001/XMLSchema-instance"" xmlns=""http://schemas.datacontract.org/2004/07/OTRAPI.Services.Models"">
<BasicPrice>15595.8333</BasicPrice>
<CO2>93</CO2>
<Dealer>Audi</Dealer>
<DeliveryCost>524.9900</DeliveryCost>
<DiscountPrice>14348.166636</DiscountPrice>
<DiscountSum>1247.666664</DiscountSum>
<Discounts>
<Discount>
<DiscountApplication>Percentage</DiscountApplication>
<DiscountDescription>Dealer Discount on Vehicle and Options %</DiscountDescription>
<DiscountID>Discount1</DiscountID>
<DiscountType>VehicleAndOptions</DiscountType>
<DiscountValue>8</DiscountValue>
</Discount>
</Discounts>
<OTR>17902.7879632</OTR>
</OnTheRoadQuote>";
string otrValue = "";
using (XmlReader reader = XmlReader.Create(new StringReader(xml))) // use a StringReader to load the XML string into an XmlReader
{
reader.ReadToFollowing("OTR"); // move the reader to OTR
reader.ReadStartElement(); // consume the start element
otrValue = reader.Value; // store the value in the otrValue string.
}
Keep in mind that XmlReader is forward only, meaning that you can't navigate it backwards through the XML data to read, for example, Discounts, once you've pushed it to the OTR node. If you want to do that, you should look into using XmlDocument or (preferably) XDocument. However, if all you need to do is get the OTR value, this should be the most efficient (time and space) way of doing so.
With less code, you can use LINQ to XML and load an XElement with the xml or a file. There are also other options.
XElement element = XElement.Load(....);
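// the elements live in the document's default namespace, so the name must be qualified
XNamespace ns = "http://schemas.datacontract.org/2004/07/OTRAPI.Services.Models";
var node = element.Element(ns + "OTR");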
https://msdn.microsoft.com/en-us/library/system.xml.linq.xelement.load%28v=vs.110%29.aspx
I basically want to know how to insert an XmlDocument inside another XmlDocument.
The first XmlDocument will have the basic header and footer tags.
The second XmlDocument will be the body/data tag which must be inserted into the first XmlDocument.
string tableData = null;
using(StringWriter sw = new StringWriter())
{
rightsTable.WriteXml(sw);
tableData = sw.ToString();
}
XmlDocument xmlTable = new XmlDocument();
xmlTable.LoadXml(tableData);
StringBuilder build = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(build, new XmlWriterSettings { OmitXmlDeclaration = true }))
{
writer.WriteStartElement("dataheader");
//need to insert the xmlTable here somehow
writer.WriteEndElement();
}
Is there an easier solution to this?
Use importNode feature in your document parser.
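In .NET the corresponding call is XmlDocument.ImportNode, which copies a node from one document so it can be attached to another. Here is a minimal sketch, with XmlMerge, Embed and Wrap as hypothetical names. Embed covers the two-XmlDocument case; Wrap shows the XmlWriter variant from the question, using XmlNode.WriteTo to emit the inner document's root element at the marked spot.

```csharp
using System.Text;
using System.Xml;

public static class XmlMerge
{
    // Copies inner's root element (deep copy) and appends it under outer's root.
    public static XmlDocument Embed(XmlDocument outer, XmlDocument inner)
    {
        XmlNode imported = outer.ImportNode(inner.DocumentElement, true);
        outer.DocumentElement.AppendChild(imported);
        return outer;
    }

    // Writes <dataheader> around inner's root element using an XmlWriter.
    public static string Wrap(XmlDocument inner)
    {
        var build = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(build, settings))
        {
            writer.WriteStartElement("dataheader");
            // Writing only the root element skips any XML declaration in inner.
            inner.DocumentElement.WriteTo(writer);
            writer.WriteEndElement();
        }
        return build.ToString();
    }
}
```

Unlike the CDATA approach, both variants keep the inserted content as real XML nodes, so it can still be queried with XPath afterwards.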
You can use this code based on CreateCDataSection method
// Create an XmlCDataSection from your document
var cdata = xmlTable.CreateCDataSection("<test></test>");
XmlElement root = xmlTable.DocumentElement;
// Append the cdata section to your node
root.AppendChild(cdata);
Link : http://msdn.microsoft.com/fr-fr/library/system.xml.xmldocument.createcdatasection.aspx
I am not sure what you are really looking for but this can show how to merge two xml documents (using Linq2xml)
string xml1 =
@"<xml1>
<header>header1</header>
<footer>footer</footer>
</xml1>";
string xml2 =
@"<xml2>
<body>body</body>
<data>footer</data>
</xml2>";
var xdoc1 = XElement.Parse(xml1);
var xdoc2 = XElement.Parse(xml2);
xdoc1.Descendants().First(d => d.Name == "header").AddAfterSelf(xdoc2.Elements());
var newxml = xdoc1.ToString();
OUTPUT
<xml1>
<header>header1</header>
<body>body</body>
<data>footer</data>
<footer>footer</footer>
</xml1>
You will need to write the inner XML files in CDATA sections.
Use writer.WriteCData for such nodes, passing in the inner XML as text.
writer.WriteCData(xmlTable.OuterXml);
Another option (thanks DJQuimby) is to encode the XML in some XML-compatible format (say base64). Note that the encoding used must be XML compatible and that some encoding schemes will increase the size of the encoded document (base64 adds ~33%).