Remove Duplicates in Xml Document - c#

I have two XML documents that contain a list of products. Currently, we just copy one and paste it into the other and create a new merged document, however, these two files have a number of the same products so I need to merge the two and remove the duplicates. My XML documents are in the following structure:
<?xml version="1.0" encoding="iso-8859-1"?>
<table>
<row Code="HST15154"
ProductName="test"
ProductName_EN=""
Description_EN=""
Price=""
ProductType1="HST ACCESSORIES"
ProductType2="SAM - Accessories"
ProductCategory="Accessories"
Remarks=""
/>
</table>
I found some code that I tried to alter to my needs here. I need only one of each "Code."
using System;
using System.Collections.Generic;
using System.Xml;
namespace HST_Merging_Console_App
{
public class Program
{
public void Main(string[] args)
{
//open the xml document
XmlDocument doc = new XmlDocument();
doc.LoadXml("U:\\Documents (U)\\XML Merging Tool\\productcollection_us.xml");
//select all row elements
XmlNodeList parts = doc.SelectNodes("/row");
//create a list of previously seen P/Ns
List<string> PartsSeen = new List<string>();
foreach(XmlNode part in parts)
{
string partNumber = part.Attributes["Code"].Value;
//for each part, see if we have seen it before, if it is in the list,
//remove the part element from the parent to which it belongs
if (PartsSeen.Contains(partNumber))
part.ParentNode.RemoveChild(part);
else
PartsSeen.Add(partNumber);
}
Console.Read();
doc.Save("U:\\Documents (U)\\XML Merging Tool\\productcollection_merged.xml");
}
}
}
I'm receiving a couple errors when I run this:
CS1061 - 'XmlDocument' does not contain a definition for 'SelectNodes' and no extension method 'SelectNodes' accepting a first argument of type 'XmlDocument' could be found (are you missing a using directive or an assembly reference?) (Line 16)
CS1503 - Argument 1: cannot convert from 'string' to 'System.IO.Stream' (Line 33)
Another approach I've considered is to take the first file and load into a dataset then take the second file and load it into a 2nd dataset. Then loop through the 2nd dataset searching for the Code in the 1st dataset, if found update the row, if not, add the row.
This is my first time working with C# and trying to create a program to run on a server. Any help and/or advice is greatly appreciated.

Use LINQ to XML.
With a HashSet you can detect duplicate codes: HashSet.Add() returns false if the same value already exists in the set.
var doc = XDocument.Load(yourPath);
var codes = new HashSet<string>();
// .ToList() is important for removing elements
foreach(var row in doc.Root.Elements("row").ToList())
{
var code = row.Attribute("Code").Value;
var isUniqueCode = codes.Add(code);
if(isUniqueCode == false)
{
row.Remove();
}
}
doc.Save(newPath);
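Since the original goal was to merge two files and then drop the duplicates, the same HashSet idea can be extended to cover the merge step. The following is only a minimal sketch, assuming both files share the <table><row Code="..."/></table> layout from the question; ProductMerger and MergeUnique are hypothetical names, and the first occurrence of each Code is the one that is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ProductMerger
{
    // Hypothetical helper: copies 'first', appends the rows of 'second',
    // then removes every row whose Code was already seen (first one wins).
    public static XDocument MergeUnique(XDocument first, XDocument second)
    {
        var merged = new XDocument(first);            // deep copy; 'first' is left untouched
        merged.Root.Add(second.Root.Elements("row")); // rows that already have a parent are cloned

        var seen = new HashSet<string>();
        // ToList() takes a snapshot so rows can be removed while looping.
        foreach (var row in merged.Root.Elements("row").ToList())
        {
            var code = (string)row.Attribute("Code"); // null if the attribute is missing
            if (code != null && !seen.Add(code))
                row.Remove();
        }
        return merged;
    }

    public static void Main()
    {
        var a = XDocument.Parse("<table><row Code='A1'/><row Code='B2'/></table>");
        var b = XDocument.Parse("<table><row Code='B2'/><row Code='C3'/></table>");
        Console.WriteLine(MergeUnique(a, b));
    }
}
```

Rows without a Code attribute are kept as they are in this sketch; whether that is the right policy depends on the data.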

You can use XDocument instead, which is a little easier to use than XmlDocument. To use it, add using System.Xml.Linq. Then simply group on the "Code" attribute using LINQ to XML:
XDocument doc = XDocument.Load("U:\\Documents (U)\\XML Merging Tool\\productcollection_us.xml");
var uniqueProducts = doc.Root.Elements("row").GroupBy(x => (string)x.Attribute("Code"));
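The grouping only builds the groups; it does not yet remove anything. One way to finish the job, sketched here under the assumption that the first row of each group is the one to keep, is to replace the root's children with the first element of every group. GroupingExample and DedupeByGrouping are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class GroupingExample
{
    // Hypothetical helper: keeps only the first <row> for each Code value.
    public static void DedupeByGrouping(XDocument doc)
    {
        // Materialize the result with ToList() before touching the tree.
        var firstOfEachCode = doc.Root.Elements("row")
            .GroupBy(x => (string)x.Attribute("Code"))
            .Select(g => g.First())
            .ToList();

        // Replaces all child nodes of the root with the surviving rows.
        doc.Root.ReplaceNodes(firstOfEachCode);
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<table><row Code='A1'/><row Code='B2'/><row Code='A1'/></table>");
        DedupeByGrouping(doc);
        Console.WriteLine(doc);
    }
}
```

Note that ReplaceNodes also discards any comments or other non-row nodes under the root, which may or may not matter for a given file.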

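You can do this more simply. Try something like this:
var uniques = doc.Descendants("row").Attributes("Code").Select(a => a.Value).Distinct();
I haven't tested this, though, so it might need some modifications. Note that it yields the distinct Code values, not the rows themselves; calling Distinct() directly on the attributes would compare XAttribute objects by reference and remove nothing.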

Related

Find and delete all occurrences of a string that starts with x

I'm parsing an XML file to compare it to another XML file. XML Diff works nicely, but we have found there are a lot of junk tags that exist in one file and not in the other, that have no bearing on our results but clutter up the report. I have loaded the XML file into memory to do some other things to it, and I'm wondering if there is an easy way at the same time to go through that file and remove all tags that start with, as an example, color=. The value of color is all over the map, so it's not easy to grab them all and remove them.
Doesn't seem to be any way in XML Diff to specify, "ignore these tags".
I could roll through the file, find each instance, find the end of it, delete it out, but I'm hoping there will be something simpler. If not, oh well.
Edit: Here's a piece of the XML:
<numericValue color="-103" hidden="no" image="stuff.jpg" key="More stuff." needsQuestionFormatting="false" system="yes" systemEquivKey="Stuff." systemImage="yes">
<numDef increment="1" maximum="180" minimum="30">
<unit deprecated="no" key="BPM" system="yes" />
</numDef>
</numericValue>
If you are using Linq to XML, you can load your XML into an XDocument via:
var doc = XDocument.Parse(xml); // Load the XML from a string
Or
var doc = XDocument.Load(fileName); // Load the XML from a file.
Then search for all elements with matching names and use System.Xml.Linq.Extensions.Remove() to remove them all at once:
string prefix = "L"; // Or whatever.
// Use doc.Root.Descendants() instead of doc.Descendants() to avoid accidentally removing the root element.
var elements = doc.Root.Descendants().Where(e => e.Name.LocalName.StartsWith(prefix, StringComparison.Ordinal));
elements.Remove();
Update
In your XML, the color="-103" substring is an attribute of an element, rather than an element itself. To remove all such attributes, use the following method:
public static void RemovedNamedAttributes(XElement root, string attributeLocalName)
{
if (root == null)
throw new ArgumentNullException(nameof(root));
// Remove every attribute whose local name matches exactly, on the root and all descendants.
foreach (var node in root.DescendantsAndSelf())
node.Attributes().Where(a => a.Name.LocalName == attributeLocalName).Remove();
}
Then call it like:
var doc = XDocument.Parse(xml); // Load the XML
RemovedNamedAttributes(doc.Root, "color");
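The method above matches the attribute name exactly. If a true prefix match is wanted, as the question title suggests, a variant might look like the sketch below. AttributeCleaner and RemoveAttributesByPrefix are hypothetical names, and skipping namespace declarations is an assumption about what is desirable here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttributeCleaner
{
    // Hypothetical helper: removes every attribute whose local name starts
    // with 'prefix', on 'root' and on all of its descendants.
    public static void RemoveAttributesByPrefix(XElement root, string prefix)
    {
        if (root == null) throw new ArgumentNullException(nameof(root));
        if (prefix == null) throw new ArgumentNullException(nameof(prefix));

        root.DescendantsAndSelf()
            .Attributes()
            // Leave xmlns declarations alone; they are attributes too.
            .Where(a => !a.IsNamespaceDeclaration &&
                        a.Name.LocalName.StartsWith(prefix, StringComparison.Ordinal))
            .Remove();
    }

    public static void Main()
    {
        var root = XElement.Parse(
            "<numericValue color='-103' system='yes' systemImage='yes'>" +
            "<unit key='BPM' system='yes'/></numericValue>");
        RemoveAttributesByPrefix(root, "sys");
        Console.WriteLine(root);
    }
}
```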

Code provided to convert xml tags to attributes. Please explain. I'm a newbie

I need to convert xml tags to attributes so the following code loops through and does that BUT I'm a newbie. Just downloaded Visual Studio. I am used to actionscript so it's similar. However, I don't know how to paste in the code to make it work.
Converting XML nodes into attributes using C#
This is what I have so far. I pressed on new c# project and new class.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ClassLibrary1
{
public class Class1
{
foreach (XElement el in root.Elements()) {
root.Add(new XAttribute(el.Name, (string)el)); }
root.Elements().Remove();
Console.WriteLine(root);
}
}
It's not really clear what you are asking here.
If you want to load an XDocument from an XML file, use XDocument.Load(). If you want to load an XDocument from a string containing XML, use XDocument.Parse().
If you want to promote the child elements of an XElement to be attributes of that element, you need to be aware that, unlike elements, duplicated attribute names are disallowed by the XML standard. You might also want to skip elements that contain nested child elements. If you don't, be aware that (string)el behaves as follows:
If the XElement has children, the concatenated string value of all of the element's text and descendant's text is returned.
That being said, the following promotes child elements to attributes, concatenating the values of identically named elements but doing nothing if there is already an attribute with the element's name:
public static void PromoteChildElementsToAttributes(XElement element)
{
foreach (var group in element.Elements().GroupBy(el => el.Name))
{
// Remove if you don't want this check.
if (group.Any(el => el.HasElements))
{
//Uncomment if you want to skip elements with children.
//continue;
}
if (element.Attribute(group.Key) != null)
{
Debug.WriteLine("Cannot add duplicate attribute " + element.Attribute(group.Key));
continue;
}
var value = group.Aggregate(new StringBuilder(), (sb, el) => sb.Append((string)el)).ToString();
element.Add(new XAttribute(group.Key, value));
foreach (var el in group)
el.Remove();
}
}
And you could call it like:
var doc = XDocument.Load(fileName);
PromoteChildElementsToAttributes(doc.Root);

How to get specific Values from a xml file with same Name in a element?

I don't know how to extract values from this specific XML document, and am looking for some help as I'm not very experienced on xml parsing.
I have to use XDocument.Load to load the file.
Currently I am using
doc = XDocument.Load(uri);
challenge = GetValue(doc, "Challenge");
This works without any problems, but how do I get the inner values of the Rights element (the multiple "Name" entries)?
At the end of the day I need to know
Phone = x
Dial = x
HomeAuto = x
BoxAdmin = x
It's also possible that some of the entries (Phone, Dial, HomeAuto, BoxAdmin) are missing. This is dynamic.
Here is my xml File:
<SessionInfo>
<SID>68eba0c8cef752a7</SID>
<Challenge>37a5fe9f</Challenge>
<BlockTime>0</BlockTime>
<Rights>
<Name>Phone</Name>
<Access>2</Access>
<Name>Dial</Name>
<Access>2</Access>
<Name>HomeAuto</Name>
<Access>2</Access>
<Name>BoxAdmin</Name>
<Access>2</Access>
</Rights>
</SessionInfo>
Edit: (Add GetValue method)
public string GetValue(XDocument doc, string name)
{
XElement info = doc.FirstNode as XElement;
return info.Element(name).Value;
}
NB: this solution uses extension methods, so the using directives are important or you won't see the required functions.
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Collections.Generic;
namespace StackOverflow
{
class Program
{
const string xml = "<SessionInfo><SID>68eba0c8cef752a7</SID><Challenge>37a5fe9f</Challenge><BlockTime>0</BlockTime><Rights><Name>Phone</Name><Access>2</Access><Name>Dial</Name><Access>2</Access><Name>HomeAuto</Name><Access>2</Access><Name>BoxAdmin</Name><Access>2</Access></Rights></SessionInfo>";
static void Main(string[] args)
{
XDocument doc = XDocument.Parse(xml); //loads xml from string above rather than file - just to make it easy for me to knock up this sample for you
string nameOfElementToFind = "Name";
IEnumerable<XElement> matches = doc.XPathSelectElements(string.Format("//*[local-name()='{0}']",nameOfElementToFind));
//at this stage you can reference any value from matches by index
Console.WriteLine(matches.Count() > 2 ? "Third name is: " + matches.ElementAt(2).Value : "There are fewer than 3 values");
//or can loop through
foreach (XElement match in matches)
{
Console.WriteLine(match.Value);
//or if you also wanted the related access info (this is a bit loose: it assumes the Name will always be followed by the related Access)
//Console.WriteLine("{0}: {1}", match.Value, match.XPathSelectElement("./following-sibling::*[1]").Value);
}
Console.WriteLine("Done");
Console.ReadKey();
}
}
}
The important bit here is the line IEnumerable<XElement> matches = doc.XPathSelectElements(string.Format("//*[local-name()='{0}']", nameOfElementToFind));. After string.Format runs, the XPath is //*[local-name()='Name']. This XPath statement says to find all elements with the name Name. The local-name() function is there because we haven't said which namespace is being used; in this instance we want any element called Name, regardless of namespace. If you do know the namespace, you can register it with a prefix instead:
XmlNamespaceManager nm = new XmlNamespaceManager(new NameTable());
nm.AddNamespace("eg", "http://Example/Namespace/Replace/With/Your/Docs/Namespace");
IEnumerable<XElement> matches = doc.XPathSelectElements("//eg:Name", nm);
The double forward-slash says to search anywhere in the document. To limit it to Rights you could say /eg:SessionInfo/eg:Rights/eg:Name. In case you're unfamiliar with it, XPath's an awesome language / essential if you want to get the most out of working with XML docs. If you have any questions about it please give us a shout, or have a look around online; there are great tutorials out there.
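For the output the question asks for (Phone = x, Dial = x, and so on), plain LINQ to XML can pair each Name with the Access element that follows it. This is only a sketch: it assumes, as in the sample, that every Name is immediately followed by its Access and that no namespaces are involved. RightsReader and ReadRights are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RightsReader
{
    // Hypothetical helper: maps each <Name> under <Rights> to the value of
    // the <Access> element that directly follows it.
    public static Dictionary<string, string> ReadRights(XDocument doc)
    {
        var result = new Dictionary<string, string>();
        var rights = doc.Root.Element("Rights");
        if (rights == null) return result; // no Rights element at all

        foreach (var name in rights.Elements("Name"))
        {
            // The next sibling element is expected to be the matching <Access>.
            var access = name.ElementsAfterSelf().FirstOrDefault();
            if (access != null && access.Name == "Access")
                result[name.Value] = access.Value;
        }
        return result;
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<SessionInfo><Rights>" +
            "<Name>Phone</Name><Access>2</Access>" +
            "<Name>Dial</Name><Access>2</Access>" +
            "</Rights></SessionInfo>");

        foreach (var pair in ReadRights(doc))
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
    }
}
```

Because entries may be missing, callers would typically use TryGetValue on the returned dictionary rather than indexing it directly.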

XPathSelectElements returns null

The Load function is already defined in the XmlData class:
public class XmlData
{
public void Load(XElement xDoc)
{
var id = xDoc.XPathSelectElements("//ID");
var listIds = xDoc.XPathSelectElements("/Lists//List/ListIDS/ListIDS");
}
}
I'm just calling the Load function from my end.
XmlData aXmlData = new XmlData();
string input, stringXML = "";
TextReader aTextReader = new StreamReader("D:\\test.xml");
while ((input = aTextReader.ReadLine()) != null)
{
stringXML += input;
}
XElement Content = XElement.Parse(stringXML);
aXmlData.Load(Content);
In the Load function, I'm getting both id and listIds as null.
My test.xml contains
<SEARCH>
<ID>11242</ID>
<Lists>
<List CURRENT="true" AGGREGATEDCHANGED="false">
<ListIDS>
<ListID>100567</ListID>
<ListID>100564</ListID>
<ListID>100025</ListID>
<ListID>2</ListID>
<ListID>1</ListID>
</ListIDS>
</List>
</Lists>
</SEARCH>
EDIT: Your sample XML doesn't have an id element in the namespace with the nss alias. It would be <nss:id> in that case, or there'd be a default namespace set up. I've assumed for this answer that in reality the element you're looking for is in the namespace.
Your query is trying to find an element called id at the root level. To find all id elements, you need:
var tempId = xDoc.XPathSelectElements("//nss:id", ns);
... although personally I'd use:
XDocument doc = XDocument.Parse(...);
XNamespace nss = "http://schemas.microsoft.com/SQLServer/reporting/reportdesigner";
// Or use FirstOrDefault(), or whatever...
XElement idElement = doc.Descendants(nss + "id").Single();
(I prefer using the query methods on LINQ to XML types instead of XPath... I find it easier to avoid silly syntax errors etc.)
Your sample code is also unclear as you're using xDoc which hasn't been declared... it helps to write complete examples, ideally including everything required to compile and run as a console app.
I am looking at the question 3 hours after it was submitted and 41 minutes after it was (last) edited.
There are no namespaces defined in the provided XML document.
var listIds = xDoc.XPathSelectElements("/Lists//List/ListIDS/ListIDS");
This XPath expression obviously doesn't select any node from the provided XML document, because the XML document doesn't have a top element named Lists (the name of the actual top element is SEARCH)
var id = xDoc.XPathSelectElements("//ID");
In the Load function, I'm getting both id and listIds as null.
This statement is false, because //ID selects the only element named ID in the provided XML document, thus the value of the C# variable id is non-null. Probably you didn't test thoroughly after editing the XML document.
Most probably the original ID element belonged to some namespace. But now it is in "no namespace" and the XPath expression above does select it.
string xmldocument = "<response xmlns:nss=\"http://schemas.microsoft.com/SQLServer/reporting/reportdesigner\"><action>test</action><id>1</id></response>";
XElement Content = XElement.Parse(xmldocument);
XPathNavigator navigator = Content.CreateNavigator();
XmlNamespaceManager ns = new XmlNamespaceManager(navigator.NameTable);
ns.AddNamespace("nss", "http://schemas.microsoft.com/SQLServer/reporting/reportdesigner");
var tempId = navigator.SelectSingleNode("/id");
The reason for the null value or the system-generated output is the following:
var id = xDoc.XPathSelectElements("//ID");
XPathSelectElements returns an IEnumerable<XElement>, a lazily evaluated LINQ sequence, so it cannot be printed directly as a value.
To get only the first matching element, use
XPathSelectElement("//ID");
You can check the number of matches using XPathSelectElements:
var count = xDoc.XPathSelectElements("//ID").Count();
You can also apply further LINQ operators to the result, such as ordering or filtering on specific conditions.
To get the node values from the list, you can use this:
foreach (XElement xNode in xDoc.XPathSelectElements("//ListIDS/ListID"))
{
Console.WriteLine(xNode.Value);
}
For the second list you didn't get any values because the XPath for the list items is not correct.
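Putting the points from these answers together: the leaf element in the sample is ListID, not ListIDS, and the sample has no namespaces. A minimal sketch against the posted test.xml follows; it uses paths relative to the SEARCH element, which sidesteps any question about what a leading / refers to for an XElement that was parsed on its own. SearchReader and ReadListIds are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings XPathSelectElement(s) into scope

public static class SearchReader
{
    // Hypothetical helper: returns the text of every ListID under the
    // given <SEARCH> element, using a path relative to that element.
    public static List<string> ReadListIds(XElement search)
    {
        return search.XPathSelectElements("Lists/List/ListIDS/ListID")
                     .Select(e => e.Value)
                     .ToList();
    }

    public static void Main()
    {
        var search = XElement.Parse(
            "<SEARCH><ID>11242</ID><Lists><List><ListIDS>" +
            "<ListID>100567</ListID><ListID>100564</ListID>" +
            "</ListIDS></List></Lists></SEARCH>");

        // XPathSelectElement returns null when nothing matches, so check it.
        var id = search.XPathSelectElement("ID");
        Console.WriteLine(id != null ? id.Value : "(no ID)");

        foreach (var listId in ReadListIds(search))
            Console.WriteLine(listId);
    }
}
```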

XML deserialization from XSD with variable XML elements

I have been given an XSD file that represents a huge number of elements and associated attributes. I have created a C# class using xsd.exe.
The issue is that the xml that is created can contain any or all elements and attributes.
Example XML:
<App action="A" id="1" validate="yes"><ProductType id="5885"/><SpecType id="221"/><Qty>1</Qty><PartType id="7212"/><Part>456789</Part></App>
<App action="A" id="2" validate="yes"><ProductType id="5883"/><Qty>1</Qty><PartType id="7211"/><Part>132465</Part></App>
Then in my code:
protected static void ImportProduct(string filename)
{
var counter = 0;
var xSerializer = new XmlSerializer(typeof(ProductList));
var fs = new FileStream(String.Format("{0}{1}", FilePath, filename), FileMode.Open);
var reader = XmlReader.Create(fs);
var items = (ProductList)xSerializer.Deserialize(reader);
foreach (var record in items.App)
{
counter++;
Console.Write(String.Format("{0}{1}", record.ProductType.id, Environment.NewLine));
Console.Write(String.Format("{0}{1}", record.Part.Value, Environment.NewLine));
*if (!record.SpecType.Value.Equals(null))
Console.Write(String.Format("{0}{1}", record.SpecType.id, Environment.NewLine));
else
Console.Write(String.Format("{0}{1}", "No SpecType!", Environment.NewLine));
if (counter == 10)
break;
}
}
So my question is how I can check for an empty/ non-existent element, per the starred (*) line above.
I cannot change the xsd or source XML files in any way, as they are produced by major manufacturers.
Let me know if you need more information.
Thanks! Brad
Sorry, XSD.EXE and XML Serialization isn't going to deal with XML like that.
XML of that nature is created because someone thinks it should be easy for humans to read and type in. They don't think about whether machines will be able to use them. It's a mistake that you'll now have to pay for.
The best you could do would be to create an XSLT that will place the elements into some canonical order, then create an XSD representing that order and create classes from the XSD.
Once you have an XSD, you could use a typed DataSet instead of the XmlReader. It comes with automatically generated methods for checking nulls, as in the example below.
E.g., in this example CalcualtionAnalysisDS is the typed DataSet generated from the XSD.
CalcualtionAnalysisDS ds = new CalcualtionAnalysisDS();
ds.ReadXml("calc.xml");
foreach (CalcualtionAnalysisDS.ReportRow row in ds.Report.Rows)
{
if (row.IsBestSHDSLDesignClassNull())
{
}
}
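Independently of the ordering concerns raised above, the specific check on the starred line in the question can be sketched with XmlSerializer alone: when an optional element is absent, the corresponding reference-type property is simply left null, so the test belongs on the property itself rather than on its Value. The classes below are hypothetical, hand-written stand-ins for the xsd.exe output, and the ProductList root element is an assumption, since the question does not show it.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-ins for the generated classes.
[XmlRoot("ProductList")]
public class ProductList
{
    [XmlElement("App")]
    public App[] App { get; set; }
}

public class App
{
    public IdRef ProductType { get; set; }
    public IdRef SpecType { get; set; } // stays null when <SpecType> is absent
    public string Part { get; set; }
}

public class IdRef
{
    [XmlAttribute("id")]
    public string id { get; set; }
}

public static class OptionalElementDemo
{
    public static ProductList Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(ProductList));
        using (var reader = new StringReader(xml))
            return (ProductList)serializer.Deserialize(reader);
    }

    // Null-check the property before reading anything from it.
    public static string DescribeSpecType(App record)
    {
        return record.SpecType != null ? record.SpecType.id : "No SpecType!";
    }

    public static void Main()
    {
        var items = Parse(
            "<ProductList>" +
            "<App><ProductType id='5885'/><SpecType id='221'/><Part>456789</Part></App>" +
            "<App><ProductType id='5883'/><Part>132465</Part></App>" +
            "</ProductList>");

        foreach (var record in items.App)
            Console.WriteLine(DescribeSpecType(record));
    }
}
```

Whether the real generated classes behave the same way depends on how xsd.exe mapped the schema, so this should be checked against the actual generated code.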
