XML deserialization from XSD with variable XML elements - c#

I have been given an XSD file that represents a huge number of elements and associated attributes. I have created a C# class from it using xsd.exe.
The issue is that the XML that is created can contain any or all of the elements and attributes.
Example XML:
<App action="A" id="1" validate="yes"><ProductType id="5885"/><SpecType id="221"/><Qty>1</Qty><PartType id="7212"/><Part>456789</Part></App>
<App action="A" id="2" validate="yes"><ProductType id="5883"/><Qty>1</Qty><PartType id="7211"/><Part>132465</Part></App>
Then in my code:
protected static void ImportProduct(string filename)
{
var counter = 0;
var xSerializer = new XmlSerializer(typeof(ProductList));
var fs = new FileStream(String.Format("{0}{1}", FilePath, filename), FileMode.Open);
var reader = XmlReader.Create(fs);
var items = (ProductList)xSerializer.Deserialize(reader);
foreach (var record in items.App)
{
counter++;
Console.Write(String.Format("{0}{1}", record.ProductType.id, Environment.NewLine));
Console.Write(String.Format("{0}{1}", record.Part.Value, Environment.NewLine));
*if (!record.SpecType.Value.Equals(null))
Console.Write(String.Format("{0}{1}", record.SpecType.id, Environment.NewLine));
else
Console.Write(String.Format("{0}{1}", "No SpecType!", Environment.NewLine));
if (counter == 10)
break;
}
}
So my question is how I can check for an empty/ non-existent element, per the starred (*) line above.
I cannot change the xsd or source XML files in any way, as they are produced by major manufacturers.
Let me know if you need more information.
Thanks! Brad

Sorry, XSD.EXE and XML Serialization isn't going to deal with XML like that.
XML of that nature is created because someone thinks it should be easy for humans to read and type in. They don't think about whether machines will be able to use them. It's a mistake that you'll now have to pay for.
The best you could do would be to create an XSLT that will place the elements into some canonical order, then create an XSD representing that order and create classes from the XSD.

Once you have an XSD, you could use a typed DataSet instead of the XmlReader. A typed DataSet comes with generated methods for checking nulls, as seen in the example below.
For example, where CalcualtionAnalysisDS is the DataSet class generated from the XSD:
CalcualtionAnalysisDS ds = new CalcualtionAnalysisDS();
ds.ReadXml("calc.xml");
foreach (CalcualtionAnalysisDS.ReportRow row in ds.Report.Rows)
{
if (row.IsBestSHDSLDesignClassNull()) // generated null check is a method, so it needs the call parentheses
{
}
}
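For the original question of detecting a missing element, one option that avoids generated classes altogether is LINQ to XML, where Element() returns null for an absent child. This is only a minimal sketch: it assumes the App elements sit under a single root element, and the class and method names are illustrative, not part of the asker's code.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class OptionalElementDemo
{
    // Returns the SpecType id of every App element,
    // or "No SpecType!" where the element (or its id attribute) is missing.
    public static List<string> SpecTypeIds(string xml)
    {
        var result = new List<string>();
        foreach (XElement app in XDocument.Parse(xml).Root.Elements("App"))
        {
            // Element() returns null when the child is absent; ?. then skips the
            // attribute lookup, and casting a null XAttribute to string gives null.
            string id = (string)app.Element("SpecType")?.Attribute("id");
            result.Add(id ?? "No SpecType!");
        }
        return result;
    }
}
```

Called on the two sample records wrapped in a hypothetical root such as Apps, this should give the id of the first record's SpecType and the fallback text for the second.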

Related

Removing empty space from closing XML tag

Below is a code snippet of creating a document:
CdtrAcct = new CdtrAcct
{
Id = new Id
{
IBAN = iban,
Othr = new Othr
{
Id = creditBankAcct
},
},
},
If the IBAN field has a value, then Id is null. However, when the XML file is formed, I get the below:
<CdtrAcct>
<Id>
<IBAN>XXXXXXXXXXXXXXXXXXX</IBAN>
<Othr />
</Id>
</CdtrAcct>
The problem that I have is that the software that reads the XML cannot process the whitespace here: <Othr />. What do I need to do to get <Othr/>?
C# code:
XmlSerializer serializer = new XmlSerializer(typeof(Document));
var textWriter = new StreamWriter(@"C:\BankFiles\Outbox\" + filename + ".xml");
serializer.Serialize(textWriter, config);
textWriter.Close();
Convert the XML to a string (e.g. myString), then replace " />" with "/>" by using
myString = myString.Replace(" />", "/>");
Afterwards you can print it into a file.
Of course this is a workaround / hack and getting the buggy software fixed should be the first try. However, this should solve your problem immediately.
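The workaround could look roughly like the sketch below. The Id and Othr classes are simplified, hypothetical stand-ins for the asker's generated types, and the helper name is made up. Note that a blanket string replace would also touch a literal " />" inside CDATA or comments, and that serializing to a StringWriter produces an XML declaration saying utf-16, so treat this as an illustration rather than a finished solution.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-ins for the generated document classes.
public class Othr
{
    public string Id { get; set; }
}

public class Id
{
    public string IBAN { get; set; }
    public Othr Othr { get; set; }
}

public static class CompactXml
{
    // Serializes the value to a string, then removes the space that the
    // serializer's writer puts before "/>" in empty elements.
    public static string Serialize<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString().Replace(" />", "/>");
        }
    }
}
```

The resulting string can then be written out with File.WriteAllText in place of the StreamWriter call.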
It's best to avoid processing XML with anything other than a conformant XML parser.
But if you're stuck with it, it's easy enough to put the XML through an identity transformation that re-serializes it without the whitespace.

Reading a Text File that has XML data C# [duplicate]

How do I read and parse an XML file in C#?
XmlDocument to read an XML from string or from file.
using System.Xml;
XmlDocument doc = new XmlDocument();
doc.Load("c:\\temp.xml");
or
doc.LoadXml("<xml>something</xml>");
then find a node below it ie like this
XmlNode node = doc.DocumentElement.SelectSingleNode("/book/title");
or
foreach(XmlNode node in doc.DocumentElement.ChildNodes){
string text = node.InnerText; //or loop through its children as well
}
then read the text inside that node like this
string text = node.InnerText;
or read an attribute
string attr = node.Attributes["theattributename"]?.InnerText;
Always check for null on Attributes["something"] since it will be null if the attribute does not exist.
LINQ to XML Example:
// Loading from a file, you can also load from a stream
var xml = XDocument.Load(@"C:\contacts.xml");
// Query the data and write out a subset of contacts
var query = from c in xml.Root.Descendants("contact")
where (int)c.Attribute("id") < 4
select c.Element("firstName").Value + " " +
c.Element("lastName").Value;
foreach (string name in query)
{
Console.WriteLine("Contact's Full Name: {0}", name);
}
Reference: LINQ to XML at MSDN
Here's an application I wrote for reading xml sitemaps:
using System;
using System.IO;
using System.Windows.Forms; // only needed for Application.StartupPath
using System.Xml;
namespace SiteMapReader
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Please enter the location of the files");
            // get the directory we want to read the sitemaps from
            string dirLoc = Console.ReadLine();
            // get all the sitemaps
            string[] sitemaps = Directory.GetFiles(dirLoc);
            // the using block flushes and closes locs.txt when we are done
            using (StreamWriter sw = new StreamWriter(Application.StartupPath + @"\locs.txt", true))
            {
                // loop through each file
                foreach (string sitemap in sitemaps)
                {
                    try
                    {
                        // new XmlDocument instance
                        XmlDocument xDoc = new XmlDocument();
                        // load up the XML from the location
                        xDoc.Load(sitemap);
                        // cycle through each child node
                        foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
                        {
                            // first node is the url ... have to go to the nested loc node
                            foreach (XmlNode locNode in node)
                            {
                                // there are a couple of child nodes here, so only take data from the node named loc
                                if (locNode.Name == "loc")
                                {
                                    // get the content of the loc node
                                    string loc = locNode.InnerText;
                                    // write it to the console so you can see it's working
                                    Console.WriteLine(loc + Environment.NewLine);
                                    // write it to the file
                                    sw.Write(loc + Environment.NewLine);
                                }
                            }
                        }
                    }
                    catch (Exception ex)
                    {
                        // skip files that cannot be read as XML, but say why
                        Console.WriteLine("Skipping " + sitemap + ": " + ex.Message);
                    }
                }
            }
            Console.WriteLine("All Done :-)");
            Console.ReadLine();
        }
    }
}
Code on Paste Bin
http://pastebin.com/yK7cSNeY
There are lots of ways, some:
XmlSerializer: use a class with the target schema you want to read, then use XmlSerializer to get the data in the XML loaded into an instance of the class.
LINQ to XML
XmlTextReader
XmlDocument
XPathDocument (read-only access)
You could use a DataSet to read XML strings.
var xmlString = File.ReadAllText(FILE_PATH);
var stringReader = new StringReader(xmlString);
var dsSet = new DataSet();
dsSet.ReadXml(stringReader);
Posting this for the sake of information.
You can either:
Use XmlSerializer class
Use XmlDocument class
Examples are on the msdn pages provided
Linq to XML.
Also, VB.NET has much better xml parsing support via the compiler than C#. If you have the option and the desire, check it out.
Check out XmlTextReader class for instance.
There are different ways, depending on where you want to get.
XmlDocument is lighter than XDocument, but if you only wish to verify minimally that a string contains XML, then a regular expression is possibly the fastest and lightest choice you can make. For example, I have implemented smoke tests with SpecFlow for my API, and when I wish to test whether one of the results is XML at all, I would use a regular expression. But if I need to extract values from this XML, then I would parse it with XDocument to do it faster and with less code. Or I would use XmlDocument if I have to work with a big XML file (and sometimes I work with XML files that are around 1M lines, or even more); then I could even read it line by line. Why? Try opening more than 800MB in private bytes in Visual Studio; even in production you should not have objects bigger than 2GB. You can with a tweak, but you should not. If you have to parse a document that contains A LOT of lines, then this document would probably be CSV.
I have written this comment because I see a lot of examples with XDocument. XDocument is not good for big documents, or when you only want to verify whether the content is valid XML. If you wish to check whether the XML itself makes sense, then you need a schema.
I also downvoted the suggested answer, because I believe it needs the above information inside itself. Imagine I need to verify whether 200M of XML, 10 times an hour, is valid XML. XDocument will waste a lot of resources.
prasanna venkatesh also states that you could try loading the string into a DataSet; it will indicate valid XML as well.
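Another way to check a large input without building a document in memory is to stream it through XmlReader and see whether it reads to the end. The following is a sketch of that idea, not something taken from the answers above, and the class and method names are invented for illustration. It only tests well-formedness, not validity against a schema, and with default reader settings a document containing a DTD is rejected as well.

```csharp
using System.IO;
using System.Xml;

public static class XmlWellFormedness
{
    // Returns true when the input can be read to the end as XML.
    // Only the current node is held in memory, so large inputs are fine.
    public static bool IsWellFormed(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (reader.Read())
                {
                    // nothing to do: we only care whether reading succeeds
                }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For a file, passing File.OpenText(path) as the input keeps the memory use independent of the file size.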
public void ReadXmlFile()
{
string path = HttpContext.Current.Server.MapPath("~/App_Data"); // Finds the location of App_Data on server.
XmlTextReader reader = new XmlTextReader(System.IO.Path.Combine(path, "XMLFile7.xml")); //Combines the location of App_Data and the file name
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
break;
case XmlNodeType.Text:
columnNames.Add(reader.Value);
break;
case XmlNodeType.EndElement:
break;
}
}
}
You can avoid the first statement and just specify the path name in constructor of XmlTextReader.
If you want to retrieve a particular value from an XML file:
XmlDocument _LocalInfo_Xml = new XmlDocument();
_LocalInfo_Xml.Load(fileName);
XmlElement _XmlElement;
_XmlElement = _LocalInfo_Xml.GetElementsByTagName("UserId")[0] as XmlElement;
string Value = _XmlElement.InnerText;
Here is another approach using Cinchoo ETL - an open source library to parse xml file with few lines of code.
using (var r = ChoXmlReader<Item>.LoadText(xml)
.WithXPath("//item")
)
{
foreach (var rec in r)
rec.Print();
}
public class Item
{
public string Name { get; set; }
public string ProtectionLevel { get; set; }
public string Description { get; set; }
}
Sample fiddle: https://dotnetfiddle.net/otYq5j
Disclaimer: I'm author of this library.

Remove Duplicates in Xml Document

I have two XML documents that contain a list of products. Currently, we just copy one and paste it into the other and create a new merged document, however, these two files have a number of the same products so I need to merge the two and remove the duplicates. My XML documents are in the following structure:
<?xml version="1.0" encoding="iso-8859-1"?>
<table>
<row Code="HST15154"
ProductName="test"
ProductName_EN=""
Description_EN=""
Price=""
ProductType1="HST ACCESSORIES"
ProductType2="SAM - Accessories"
ProductCategory="Accessories"
Remarks=""
/>
</table>
I found some code that I tried to alter to my needs here. I need only one of each "Code."
using System;
using System.Collections.Generic;
using System.Xml;
namespace HST_Merging_Console_App
{
public class Program
{
public void Main(string[] args)
{
//open the xml document
XmlDocument doc = new XmlDocument();
doc.LoadXml("U:\\Documents (U)\\XML Merging Tool\\productcollection_us.xml");
//select all row elements
XmlNodeList parts = doc.SelectNodes("/row");
//create a list of previously seen P/Ns
List<string> PartsSeen = new List<string>();
foreach(XmlNode part in parts)
{
string partNumber = part.Attributes["Code"].Value;
//for each part, see if we have seen it before, if it is in the list,
//remove the part element from the parent to which it belongs
if (PartsSeen.Contains(partNumber))
part.ParentNode.RemoveChild(part);
else
PartsSeen.Add(partNumber);
}
Console.Read();
doc.Save("U:\\Documents (U)\\XML Merging Tool\\productcollection_merged.xml");
}
}
}
I'm receiving a couple errors when I run this:
CS1061 - 'XmlDocument' does not contain a definition for 'SelectNodes' and no extension method 'SelectNodes' accepting a first argument of type 'XmlDocument' could be found (are you missing a using directive or an assembly reference?) (Line 16)
CS1503 - Argument 1: cannot convert from 'string' to 'System.IO.Stream' (Line 33)
Another approach I've considered is to take the first file and load into a dataset then take the second file and load it into a 2nd dataset. Then loop through the 2nd dataset searching for the Code in the 1st dataset, if found update the row, if not, add the row.
This is my first time working with C# and trying to create a program to run on a server. Any help and/or advice is greatly appreciated.
Use LINQ to Xml.
With HashSet you can recognize duplicate codes. HashSet.Add() will return false if same value already exists in the set.
var doc = XDocument.Load(yourPath);
var codes = new HashSet<string>();
// .ToList() is important for removing elements
foreach(var row in doc.Root.Elements("row").ToList())
{
var code = row.Attribute("Code").Value;
var isUniqueCode = codes.Add(code);
if(isUniqueCode == false)
{
row.Remove();
}
}
doc.Save(newPath);
You can use XDocument instead, which is a little easier to use than XmlDocument. When using it you will need to add using System.Xml.Linq. Then simply group on the "Code" attribute using LINQ to XML:
XDocument doc = XDocument.Load("U:\\Documents (U)\\XML Merging Tool\\productcollection_us.xml");
var uniqueProducts = doc.Root.Elements("row").GroupBy(x => (string)x.Attribute("Code"));
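Grouping alone still leaves one group per code, so one row has to be picked from each group, and the rows of both files have to be combined first. A minimal sketch of the whole merge, assuming both files use the table/row layout shown in the question (the class and method names are illustrative):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ProductMerger
{
    // Combines the rows of both documents and keeps the first row seen per Code.
    public static XDocument MergeByCode(XDocument first, XDocument second)
    {
        var uniqueRows = first.Root.Elements("row")
            .Concat(second.Root.Elements("row"))
            .GroupBy(row => (string)row.Attribute("Code"))
            .Select(group => new XElement(group.First())); // copy, leaving the inputs untouched

        return new XDocument(new XElement("table", uniqueRows));
    }
}
```

Usage would be along the lines of ProductMerger.MergeByCode(XDocument.Load(pathA), XDocument.Load(pathB)).Save(mergedPath), where the three paths are placeholders. Keeping the first occurrence is a design choice; if the second file should win, swap the arguments.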
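You can do this in an easier way. Try something like this to get the distinct Code values (Distinct() on the XAttribute objects themselves would compare references, so select the values first):
var uniques = doc.Descendants("row").Attributes("Code").Select(a => a.Value).Distinct();
I haven't tested this though, so it might need some modifications.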

How to append new item into serialized xml data without deserialize old data?

Normally, I use the following code to serialize object to XML file. Everyday, I have about 100-1000 new items to be added into this list in different period of time.
var xmlSerializer = new XmlSerializer(typeof(List<TestModel>));
xmlSerializer.Serialize(stream, list);
How to append new item into serialized xml data without deserialize old data?
Thanks,
You can serialize the object to memory and append it to the existing file. Also take a look at the MS article Efficient Techniques for Modifying Large XML Files, which shows two techniques that are both applicable in your situation.
There is no way of doing what you want using only the XmlSerializer that I can think of, but with a little bit of extra work this is possible.
A simple approach to this would be to serialize the list for the first item(s) of the day - as your existing code does. When new data comes in, you can now open the saved xml using an XmlDocument and append the serialization of one single item at a time.
One thing to note is that if the resulting xml is extremely big, the XmlDocument may grow very large (and may be slow or even cause OutOfMemoryExceptions as Pavel Kyments notes in a comment), in this case you may want to investigate XmlReader and XmlWriter to append the xml serially. However the overall approach would remain the same (open->serialize your new item->append the generated xml->resave)
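The simple in-memory variant described above could be sketched as follows, here with XDocument instead of XmlDocument. The TestModel class with a single Name property is a hypothetical stand-in for the real model, and the helper name is made up.

```csharp
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the real model class.
public class TestModel
{
    public string Name { get; set; }
}

public static class XmlListAppender
{
    // Parses the saved list, serializes only the new item,
    // and adds it as the last child of the list's root element.
    public static string AppendItem(string listXml, TestModel item)
    {
        XDocument listDoc = XDocument.Parse(listXml);

        var serializer = new XmlSerializer(typeof(TestModel));
        var ns = new XmlSerializerNamespaces();
        ns.Add("", ""); // suppress the xsi/xsd declarations on the new element

        var itemDoc = new XDocument();
        using (var writer = itemDoc.CreateWriter())
        {
            serializer.Serialize(writer, item, ns);
        }

        listDoc.Root.Add(itemDoc.Root);
        return listDoc.ToString();
    }
}
```

The existing items are never deserialized into TestModel objects, but the whole file is still parsed into memory, which is why the streaming approach matters for very large files.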
[EDIT - changed code sample to show chained XmlReader/XmlWriter, rather than XmlDocument approach]
Something along these lines:
public static void AppendToXml(
Stream xmlSource, // your existing xml - could be from a file, etc
Stream updatedXmlDestination, // your target xml, could be a different file
string rootElementName, // the root element name of your list, e.g. TestModels
TestModel itemToAppend) // the item to append
{
var writerSettings = new XmlWriterSettings {Indent = true, IndentChars = " " };
using (var reader = XmlReader.Create(xmlSource))
using (var writer = XmlWriter.Create(updatedXmlDestination, writerSettings))
{
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.XmlDeclaration:
break;
case XmlNodeType.Element:
// Capture this before visiting the attributes: once the reader is
// positioned on an attribute, IsEmptyElement no longer describes the element.
bool isEmpty = reader.IsEmptyElement;
writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
while (reader.MoveToNextAttribute())
{
writer.WriteAttributeString(reader.Prefix, reader.LocalName, reader.NamespaceURI, reader.Value);
}
if (isEmpty)
writer.WriteEndElement();
break;
case XmlNodeType.EndElement:
if (reader.Name == rootElementName)
{
var serializer = new XmlSerializer(typeof(TestModel));
var ns = new XmlSerializerNamespaces();
ns.Add("", "");
serializer.Serialize(writer, itemToAppend, ns);
}
writer.WriteEndElement();
break;
case XmlNodeType.Text:
writer.WriteString(reader.Value); // WriteString escapes the text itself
break;
case XmlNodeType.CDATA:
writer.WriteCData(reader.Value);
break;
}
}
}
}
Note: you may want to add support for other node types (omitted here for brevity), such as Whitespace, Comments, Processing Instructions, etc. These all follow the same pattern as CDATA above: put a case in, call the appropriate writer method.
With this updated approach - you never have more than a small amount in memory at any given time.
I don't think it is possible. You want random access to a block of data that is serialized and deserialized as a whole, so it has to be accessed sequentially.
Maybe you can modify the XML document directly, which will be faster, but you'll lose the convenience of a serialized/deserialized object tree, which is much easier to manipulate (add/remove objects, ...)

