C# - Sorting XML elements - possible? (without ADO.NET)

I just need to recreate an XML file after sorting it on a key field element (say EmpID). The catch is that I should not use ADO.NET. What is the best way to do the sort? Which XML class do I need to use? Is LINQ handy for this?

No need for C# to do this. You can do it via an XSL file:
<xsl:template match="/">
<xsl:apply-templates select="yourelementnode">
<xsl:sort select="EmpID" order="ascending" />
</xsl:apply-templates>
</xsl:template>

LINQ to XML would probably be your best bet. You could either move the elements "in place" or (possibly more easily) create a new document with the re-ordered elements.
If you can give us some sample XML (input and desired output) it should be fairly easy to come up with some example code.
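In the meantime, here is a minimal sketch of the "create a new document" approach. Since no sample XML was posted, the Employees, Employee and Name elements are assumptions made up for illustration; only EmpID comes from the question, and it is assumed to be numeric.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical input: only EmpID is taken from the question,
// the other element names are invented for this sketch.
var doc = XDocument.Parse(@"
<Employees>
  <Employee><EmpID>30</EmpID><Name>Carol</Name></Employee>
  <Employee><EmpID>4</EmpID><Name>Alice</Name></Employee>
  <Employee><EmpID>12</EmpID><Name>Bob</Name></Employee>
</Employees>");

// Build a new document whose children are ordered by EmpID.
// The (int) cast makes the comparison numeric rather than alphabetical.
var sorted = new XDocument(
    new XElement(doc.Root.Name,
        doc.Root.Elements("Employee")
           .OrderBy(e => (int)e.Element("EmpID"))));

Console.WriteLine(sorted);
// sorted.Save("sorted.xml"); // uncomment to write the result to disk
```

If EmpID is not numeric, dropping the cast in favour of `(string)e.Element("EmpID")` would sort the values as text instead.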

Related

Delete node from xml documents

I'm new to the programming world.
I'm just looking for help with some kind of code that would delete a node from a bunch of XML documents.
Is it possible to make something that would delete a node in a bunch of XML documents at once?
There are many different technologies you could use, which is a bit daunting if you are new to programming. Since this is quite a simple task, it's probably not worth investing a lot of time learning new tools: but then it all depends on what you're already comfortable with.
Many people would use XSLT for any job that involves modifying XML documents. You could write an XSLT 3.0 stylesheet transform.xsl like this:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="deleted-node"/>
</xsl:transform>
where "deleted-node" is the name of the nodes you want to delete (or a more complex pattern if you need it). And then you could apply this to all XML files in a directory in, putting the result in directory out, using the Saxon XSLT processor from the command line like this:
Transform -s:in -o:out -xsl:transform.xsl
The way this works is that xsl:mode defines the default processing to be applied to nodes if there isn't a more specific rule; shallow-copy means that you copy the tags and then move on to process the content. There's only one more specific rule, which matches the elements you want to delete; the rule is empty indicating that when you hit one of these elements, you output nothing.

what's the fastest way to write XML

I need to create XML files frequently and I chose XmlWriter to do the job. I found that it spends a lot of time in calls like WriteAttributeString (I need to write lots of attributes in some cases). My question is: is there a better way to create XML files? Thanks in advance.
The fastest way that I know is to write the document structure as a plain string and parse it into an XDocument object:
string str =
@"<?xml version=""1.0""?>
<!-- comment at the root level -->
<Root>
<Child>Content</Child>
</Root>";
XDocument doc = XDocument.Parse(str);
Console.WriteLine(doc);
Now you have a structured, ready-to-use XDocument object that you can populate with your data. You can even parse a fully structured and populated XML string and start from there. You can also always build the document from nested XElements like this:
XElement doc =
new XElement("Inventory",
new XElement("Car", new XAttribute("ID", "1000"),
new XElement("PetName", "Jimbo"),
new XElement("Color", "Red"),
new XElement("Make", "Ford")
)
);
doc.Save("InventoryWithLINQ.xml");
Which will generate:
<Inventory>
<Car ID="1000">
<PetName>Jimbo</PetName>
<Color>Red</Color>
<Make>Ford</Make>
</Car>
</Inventory>
XmlSerializer
You only have to define hierarchy of classes you want to serialize, that is all. Additionally you can control the schema through some attributes applied to your properties.
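As a rough sketch of what that could look like, the following serializes the same Inventory/Car shape used in the XElement example above. The class and property names are assumptions chosen for illustration; the attributes shown (XmlElement, XmlAttribute) are the standard ones from System.Xml.Serialization.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

var inventory = new Inventory
{
    Cars = { new Car { Id = "1000", PetName = "Jimbo", Color = "Red", Make = "Ford" } }
};

// Serialize the object graph to a string; a FileStream or StreamWriter
// could be passed instead to write straight to a file.
var serializer = new XmlSerializer(typeof(Inventory));
using var writer = new StringWriter();
serializer.Serialize(writer, inventory);
Console.WriteLine(writer.ToString());

// Hypothetical classes describing the document hierarchy.
public class Inventory
{
    // Emit each list item as a <Car> element directly under <Inventory>.
    [XmlElement("Car")]
    public List<Car> Cars { get; set; } = new List<Car>();
}

public class Car
{
    // Written as an attribute (ID="...") instead of a child element.
    [XmlAttribute("ID")]
    public string Id { get; set; }
    public string PetName { get; set; }
    public string Color { get; set; }
    public string Make { get; set; }
}
```

Whether this is faster than XmlWriter for a given workload is something to measure rather than assume; the appeal here is mostly that the code stays small.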
Write it directly to a file via, for example, a FileStream (through manually created code). This can be made very fast, but also pretty hard to maintain. As always, optimizations come with a price tag.
Also, do not forget that "premature optimization is the root of all evil".
Using anonymous types and serializing to XML is an interesting approach as mentioned here
How much is much time...is it 10 ms, 10 sec or 10 min...and how much of the whole process that writes an Xml is it?
Not saying that you shouldn't optimize but imo it's a matter of how much time do you want to spend optimizing that slight bit of a process. In the end the faster you wanna go, the more complex it will be to maintain in this case (personal opinion).
I personally like to use the XmlDocument type. It's still a bit heavy when writing nodes, but attributes are one-liners, and all in all it's way simpler than using XmlWriter.
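To show what "attributes are one-liners" might look like in practice, here is a small sketch using XmlDocument. The element and attribute names are borrowed from the Inventory example earlier in this thread and are only illustrative.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();

// Creating and attaching nodes takes two calls each...
XmlElement inventory = doc.CreateElement("Inventory");
doc.AppendChild(inventory);

XmlElement car = doc.CreateElement("Car");
inventory.AppendChild(car);

// ...while each attribute is a single SetAttribute call.
car.SetAttribute("ID", "1000");
car.SetAttribute("Color", "Red");
car.SetAttribute("Make", "Ford");

Console.WriteLine(doc.OuterXml);
// doc.Save("inventory.xml"); // uncomment to write the result to disk
```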

How to change my XSL stylesheet to properly allow carriage returns

Hey, I was wondering if anybody knew how to alter the following XSL stylesheet so that ANY text in my transformed XML will retain the carriage returns and line feeds (which will be \r\n as I feed it to the XML). I know I'm supposed to be using in some way but I can't seem to figure out how to get it working
<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">
<xsl:template match=\"/\"><xsl:apply-templates /></xsl:template><xsl:template match=\"\r\n\"><xsl:text>
</xsl:text></xsl:template><xsl:template match=\"*\">
<xsl:element name=\"{local-name()}\"><xsl:value-of select=\"text()\"/><xsl:apply-templates select=\"*\"/></xsl:element ></xsl:template></xsl:stylesheet>
In your code above you can't apply templates and expect this template to get called:
<xsl:template match="\r\n\">
<xsl:text>
</xsl:text>
</xsl:template>
Unless you have a node in your XML named "\r\n" which is an illegal name anyhow. I think what you want to do is make this call explicitly when you want a carriage return:
<xsl:call-template name="crlf"/>
Here is an example of the template that could get called:
<xsl:template name="crlf">
<xsl:text>&#xD;</xsl:text>
<xsl:text>&#xA;</xsl:text>
<!--consult your system doc for appropriate carriage return coding -->
</xsl:template>
The answers from Chris and dkackman are on the mark but we also need to listen to the W3C every now and again:
XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).
This means that in your XSLT you can experiment with some combination of &#xD; and &#xA;. Remember that different operating systems have different line-ending strategies.
It's not completely clear what you are trying to accomplish but...
Any whitespace that you absolutely want to show up in the output stream I would wrap in <xsl:text></xsl:text>
I would also highly recommend specifying an <xsl:output/> to control the output formatting.
Your question sounds like you want to control the format of the output XML. My advice: just don't.
XML is data, not text. The format it is in should be completely irrelevant to your application. If it is not, then your application needs some reworking.
Within non-empty text nodes, XML will retain line breaks by definition. Within attribute nodes they are retained as well, unless the product you use does not adhere to the spec.
But outside of text nodes (or in those empty text nodes between elements) line breaks are considered irrelevant white space and you should not rely on them or waste your time trying to create or retain them.
There is <xsl:output indent="yes" />, which does some (XSLT processor-specific) pretty-printing, but your application should not rely on such things.
Have you tried the preserve white space tag?

Strip WordML from a string

I've been tasked with building an accessible RSS feed for my company's job listings. I already have an RSS feed from our recruiting partner, so I'm transforming their RSS XML into our own proxy RSS feed to add additional data as well as limit the number of items in the feed, so that we list only the latest jobs.
The RSS validates via feedvalidator.org (with warnings), but the problem is this. Unfortunately, no matter how many times I tell them not to, my company's HR team directly copies and pastes their Word documents into our recruiting partner's CMS when inserting new job listings, leaving WordML in my feed. I believe this WordML is causing issues with Feedburner's BrowserFriendly feature, which we want to show up to make it easier for people to subscribe. Therefore, I need to remove the WordML markup in the feed.
Anybody have experience doing this? Can anyone point me to a good solution to this problem?
Preferably; I'd like to be pointed to a solution in .Net (VB or C# is fine) and/or XSL.
Any advice on this is greatly appreciated.
Thanks.
I haven't yet worked with WordML, but assuming that its elements are in a different namespace from RSS, it should be quite simple to do with XSLT.
Start with a basic identity transform (a stylesheet that adds all nodes from the input doc "as is" to the output tree). You need these two templates:
<!-- Copy all elements, and recur on their child nodes. -->
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<!-- Copy all non-element nodes. -->
<xsl:template match="@*|text()|comment()|processing-instruction()">
<xsl:copy/>
</xsl:template>
A transformation using a stylesheet containing just the above two templates would exactly reproduce its input document on output, modulo those things that standards-compliant XML processors are permitted to change, such as entity replacement.
Now, add in a template that matches any element in the WordML namespace. Let's give it the namespace prefix 'wml' for the purposes of this example:
<!-- Do not copy WordML elements or their attributes to the
output tree; just recur on child nodes. -->
<xsl:template match="wml:*">
<xsl:apply-templates/>
</xsl:template>
The beginning and end of the stylesheet are left as an exercise for the coder.
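For anyone who wants a starting point, here is one way the three templates might be wrapped into a complete stylesheet and run from C# with XslCompiledTransform. This is a sketch under assumptions: "urn:example:wordml" is a placeholder namespace URI (replace it with whatever namespace the WordML elements in your feed actually use), and the input fragment is invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Placeholder namespace URI bound to the 'wml' prefix; an assumption, not the real one.
const string stylesheet = @"
<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform""
    xmlns:wml=""urn:example:wordml"">
  <xsl:template match=""*"">
    <xsl:copy>
      <xsl:apply-templates select=""@*""/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""@*|text()|comment()|processing-instruction()"">
    <xsl:copy/>
  </xsl:template>
  <xsl:template match=""wml:*"">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>";

// Hypothetical feed fragment with one element in the placeholder namespace.
const string input =
    @"<item xmlns:w=""urn:example:wordml""><description>Senior <w:r>developer</w:r> wanted</description></item>";

var transform = new XslCompiledTransform();
using (var xsl = XmlReader.Create(new StringReader(stylesheet)))
{
    transform.Load(xsl);
}

using var source = XmlReader.Create(new StringReader(input));
using var output = new StringWriter();
transform.Transform(source, null, output);

// The <w:r> wrapper is dropped while its text content is kept.
Console.WriteLine(output);
```

Note that the prefix in the stylesheet (wml) does not have to match the prefix in the input (w); only the namespace URI has to be the same.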
Jeff Atwood blogged about how to do this a while ago. His post contains some C# code that will clean the WordML.
http://www.codinghorror.com/blog/archives/000485.html
I would do something like this:
char[] charToRemove = { (char)8217, (char)8216, (char)8220, (char)8221, (char)8211 };
char[] charToAdd = { (char)39, (char)39, (char)34, (char)34, '-' };
string cleanedStr = "Your WordML filled Feed Text.";
for (int i = 0; i < charToRemove.Length; i++)
{
cleanedStr = cleanedStr.Replace(charToRemove[i], charToAdd[i]);
}
This looks for the characters listed in charToRemove (the Word special characters that mess everything up) and replaces them with their ASCII equivalents.

Using C# Regular expression to replace XML element content

I'm writing some code that handles logging xml data and I would like to be able to replace the content of certain elements (eg passwords) in the document. I'd rather not serialize and parse the document as my code will be handling a variety of schemas.
Sample input documents:
doc #1:
<user>
<userid>jsmith</userid>
<password>myPword</password>
</user>
doc #2:
<secinfo>
<ns:username>jsmith</ns:username>
<ns:password>myPword</ns:password>
</secinfo>
What I'd like my output to be:
output doc #1:
<user>
<userid>jsmith</userid>
<password>XXXXX</password>
</user>
output doc #2:
<secinfo>
<ns:username>jsmith</ns:username>
<ns:password>XXXXX</ns:password>
</secinfo>
Since the documents I'll be processing could have a variety of schemas, I was hoping to come up with a nice generic regular expression solution that could find elements with password in them and mask the content accordingly.
Can I solve this using regular expressions and C# or is there a more efficient way?
This problem is best solved with XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//password">
<xsl:copy>
<xsl:text>XXXXX</xsl:text>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This will work for both inputs as long as you handle the namespaces properly.
Edit : Clarification of what I mean by "handle namespaces properly"
Make sure that a source document using the ns name prefix has a namespace defined for it, like so:
<?xml version="1.0" encoding="utf-8"?>
<secinfo xmlns:ns="urn:foo">
<ns:username>jsmith</ns:username>
<ns:password>XXXXX</ns:password>
</secinfo>
I'd say you're better off parsing the content with a .NET XmlDocument object and finding password elements using XPath, then changing their InnerXml properties. It has the advantage of being more correct (since XML isn't a regular language in the first place), and it's conceptually easy to understand.
From experience with systems that try to parse and/or modify XML without proper parsers, let me say: DON'T DO IT. Use an XML parser (There are other answers here that have ways to do that quickly and easily).
Using non-xml methods to parse and/or modify an XML stream will ALWAYS lead you to pain at some point in the future. I know, because I have felt that pain.
I know that it seems like it would be quicker-at-runtime/simpler-to-code/easier-to-understand/whatever if you use the regex solution. But you're just going to make someone's life miserable later.
You can use regular expressions if you know enough about what you are trying to match. For example, if you are looking for any tag that has the word "password" in its name and contains no inner tags, this regular expression would work:
(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)
You could use the same C# replace statement in zowat's answer as well but for the replace string you would want to use "$1XXXXX$4" instead.
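As a sketch of how that might be wired up with Regex.Replace, using the second sample document from the question (collapsed to one line here for brevity):

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1: opening tag, group 2: tag name (used as a backreference),
// group 3: element content, group 4: closing tag.
const string pattern = @"(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)";

string input =
    "<secinfo><ns:username>jsmith</ns:username><ns:password>myPword</ns:password></secinfo>";

// Keep the opening and closing tags, replace only the content between them.
string masked = Regex.Replace(input, pattern, "$1XXXXX$4");

Console.WriteLine(masked);
// prints: <secinfo><ns:username>jsmith</ns:username><ns:password>XXXXX</ns:password></secinfo>
```

The match is case-sensitive as written; passing RegexOptions.IgnoreCase as a fourth argument would also catch tags such as Password or PASSWORD. Elements with nested children, CDATA sections or comments inside them are not handled, which is part of why the other answers recommend a real parser.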
Regex is the wrong approach for this, I've seen it go so badly wrong when you least expect it.
XDocument is way more fun anyway:
XDocument doc = XDocument.Parse(@"
<user>
<userid>jsmith</userid>
<password>password</password>
</user>");
doc.Element("user").Element("password").Value = "XXXX";
// Temp namespace just for the purposes of the example -
XDocument doc2 = XDocument.Parse(@"
<secinfo xmlns:ns='http://tempuru.org/users'>
<ns:userid>jsmith</ns:userid>
<ns:password>password</ns:password>
</secinfo>");
doc2.Element("secinfo").Element("{http://tempuru.org/users}password").Value = "XXXXX";
Here is what I came up with when I went with XmlDocument. It may not be as slick as XSLT, but it should be generic enough to handle a variety of documents:
//input is a String with some valid XML
XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
XmlNodeList nodeList = doc.SelectNodes("//*");
foreach (XmlNode node in nodeList)
{
if (node.Name.ToUpper().Contains("PASSWORD"))
{
node.InnerText = "XXXX";
}
else if (node.Attributes.Count > 0)
{
foreach (XmlAttribute a in node.Attributes)
{
if (a.LocalName.ToUpper().Contains("PASSWORD"))
{
a.InnerText = "XXXXX";
}
}
}
}
The main reason that XSLT exists is to transform XML structures: an XSLT is a type of stylesheet that can be used to alter the order of elements or change the content of elements. This is therefore a typical situation where it's highly recommended to use XSLT instead of parsing, as Andrew Hare said in a previous post.
