My requirement is to update an XML file (some elements identified via a parameter, with new attribute values also supplied via a parameter).
I am using XSLT to do this from C# code.
My code is as below:
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(f_Xslt);
XmlReader xr = XmlReader.Create("SourceXML.xml");
XmlWriter xw = XmlWriter.Create("DestinationXML.xml");
XsltArgumentList argsList = new XsltArgumentList();
argsList.AddParam("", "", "");
...
...
...
xslt.Transform(xr, argsList, xw);
In my XSLT file, I first copy all elements and attributes. Then, based on <xsl:template match = ... />, I update the matching elements and attribute values.
All of this is saved to Destination.xml.
What if I want all of this to happen on Source.xml itself?
Of course, the easiest solution (or my solution so far) is to replace Source.XML with Destination.XML after XSLT.Transform completes successfully.
I think your transform-to-file-then-replace solution is as good as you're going to get. You don't want to overwrite the Source.XML file while reading it, even if .NET and the OS would let you.
In order to suggest a better alternative to transform-to-file-then-replace (TTFTR), I would ask, what is it about TTFTR that you feel is suboptimal?
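For reference, the transform-to-file-then-replace approach could be sketched roughly as follows. This is only a sketch under assumptions: the class and method names and the ".tmp" suffix are made up for illustration, and it assumes the temporary file can be created next to the source file.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformThenReplace
{
    // Hypothetical helper: transforms sourcePath into a temporary file,
    // then swaps the temporary file into place of the original.
    public static void Run(string sourcePath, string xsltPath, XsltArgumentList args)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        // Assumption: a sibling ".tmp" file is acceptable as the intermediate output.
        string tempPath = sourcePath + ".tmp";
        try
        {
            using (XmlReader reader = XmlReader.Create(sourcePath))
            using (XmlWriter writer = XmlWriter.Create(tempPath, xslt.OutputSettings))
            {
                xslt.Transform(reader, args, writer);
            }

            // Reader and writer are closed here, so the source can be replaced.
            // Passing null means no backup copy of the original is kept.
            File.Replace(tempPath, sourcePath, null);
        }
        finally
        {
            // Clean up the temporary file if the transform failed part-way.
            if (File.Exists(tempPath)) File.Delete(tempPath);
        }
    }
}
```

If the transform throws, the original file is left untouched, which is the main reason to prefer this over writing directly onto the source.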
The only alternative I can think of off-hand is to write the result of your transform to memory; and when the transform is finished, save the result from memory onto your source file. To transform to memory, pass a MemoryStream object as the argument to XmlWriter.Create().
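A minimal sketch of that in-memory variant, assuming the whole result fits comfortably in memory (the class and method names here are hypothetical):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class InPlaceTransform
{
    // Hypothetical helper: transforms sourcePath into a MemoryStream and,
    // only once the transform has finished, writes the bytes back to sourcePath.
    public static void TransformInPlace(string sourcePath, string xsltPath, XsltArgumentList args)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        using (var buffer = new MemoryStream())
        {
            using (XmlReader reader = XmlReader.Create(sourcePath))
            using (XmlWriter writer = XmlWriter.Create(buffer, xslt.OutputSettings))
            {
                xslt.Transform(reader, args, writer);
            }

            // The reader is disposed by now, so the source file is no longer open.
            File.WriteAllBytes(sourcePath, buffer.ToArray());
        }
    }
}
```

Note that if the final write is interrupted, the source file can be left truncated, so this is not necessarily safer than going through a temporary file.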
You should never try to update in place with XSLT. This is bad design and not in the spirit of a functional language.
That said, you can copy the source XML file into a temporary directory, then apply the transformation to the copy with an XmlWriter instance that is created to overwrite the original file.
As I said before, I wouldn't recommend this!
I have been looking all over for the best way to update xml in a file. I have just switched over to using XmlReader (coming from the XDocument method) for speed (not having to read the entire file in memory).
My XmlReader method works perfectly: when I need to read a value, it opens the XML, starts reading, reads ONLY up to the node needed, and then closes everything. It's very fast and effective.
Now that I have that working I want to make a method that UPDATES xml that is already in place. I would like to keep to the same idea and ONLY read in memory what is needed. So the idea would be, read up until the node I'm changing then use the writer to UPDATE that value.
Everything I have seen has an XmlReader reading while an XmlWriter writes everything out. If I did that, I assume I would have to let it run through the entire file, just like XDocument would. As an example, this answer.
Is it possible to maybe just use the reader and read up to the node I'm trying to edit then change the innerxml or something?
What's the fastest and most efficient method to update XML in a file?
I would like to only read into memory what I'm trying to edit, not the whole file.
I would also like to account for nodes that do not exist (that need to be added).
By design, XmlReader represents a "read-only, forward-only" view of the document and cannot be used to update the content. Using the Load method of XmlDocument, XDocument or XElement will still cause the entire file to be read into memory. (Under the hood, XDocument and XElement still use an XmlReader.) However, you can combine a raw XmlReader with XElement by using the overloads of the Load method that take an XmlReader.
You don't describe your XML structure, but you would want to do something similar to this:
var reader = XmlReader.Create(@"file://c:\test.xml");
var document = XElement.Load(reader);
document.Add(new XElement("branch", "leaves"));
document.Save("Tree.xml");
To find a specific node (for example, with a specific attribute value), you'd want to do something similar to this:
var node = document.Descendants("branch")
    .SingleOrDefault(e => (string)e.Attribute("name") == "foo");
Input:
My input files are XML files. They are read by the foreach file enumerator in SSIS.
Process:
An SSIS script component (C#) reads the file name from the variable.
I created an XSL file for transforming the XML into the format necessary. The script task uses the XSL file, and transforms XML files (to text)
Here is the piece of code I used:
public override void CreateNewOutputRows()
{
    XslCompiledTransform transformer = new XslCompiledTransform();
    transformer.Load(_xsltFile);
    transformer.Transform(_fileName, @"C:\macro3\outputTestFile.txt");
}
Problem:
As expected, this writes the transformed text content to the mentioned output file. I want to read through each line, process it, and load to database.
However, writing to a file and then reading it back is unnecessary overhead.
Is there a way I can read the transformed content into any object and iterate over it (without actually writing to a file)? Like a Stream or something?
Alternatively:
Though the SSIS "XML Task" has an "Operation Type = XSLT" feature, it does not read the XML if the "SourceType" is a variable and I put the file name with its path in the variable. It expects the XML content itself in the variable. Is any workaround possible?
Please ask for specific details in comments, so I can update accordingly. Thank you.
I am not able to edit the Expressions of the XML task as shown in the image
Instead of using the Script Task, use a Data Flow. The Data Flow is for transforming streams of data in memory, so it sounds like exactly what you're after.
A couple of options:
If the transformations you need to do aren't too complex, you could set up an XML Source and use an expression so that the source uses the file path variable as its connection string. Once you've done that you can add any further components you need to carry out transformations, and then your database destination.
If the transformations are more complex and you want to use the XSL, you could use a Script Component as a source in the Data Flow, and write code that picks up the XML and the XSL and carries out the transform. Here's an example of carrying out the transform and getting the rows of data into memory instead of into a file. MSDN lists all of the overloads available, if that isn't the best direction for you. You would then pass the resultant rows as output into the rest of the Data Flow, and from there you could go directly to a database Destination Component.
Either way, make sure you set the destination to "fast load" to speed things up.
While you won't need it if you decide to do this entirely in a Data Flow, as far as the XML Task goes, you need to use File connection as the source instead of Variable. MSDN notes that Variable is only for use with a variable that holds the XML content. You'll need to set up an expression in the same way you would for any file source, and pass the file path variable in.
I was able to solve this.
One of the overloads of Transform helped.
Here is what I did:
public override void CreateNewOutputRows()
{
    XslCompiledTransform transformer = new XslCompiledTransform();
    transformer.Load(_xsltFile);
    StringWriter sw = new StringWriter();
    // Dispose the reader so the input file handle is released promptly.
    using (XmlReader read = XmlReader.Create(_fileName))
    {
        // Transform into memory instead of writing to an output file.
        transformer.Transform(read, null, sw);
    }
    String[] rows = sw.ToString().Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
    String tag;
    foreach (String row in rows)
    {
        // additional code here
        TagValueBuffer.AddRow();
        TagValueBuffer.TagValue = row;
    }
}
I needed help with reading lines from the transformed text. It turned out I needed to send the output to a StringWriter instead of a file.
Then I split the result on newlines and ran a foreach over the rows.
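As a variation on the same idea, the lines can also be read with a StringReader instead of Split. This is only a sketch: the class and method names are made up, and it still buffers the whole transformed text in memory first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformLines
{
    // Hypothetical helper: runs the transform into a StringWriter and
    // yields the result one line at a time.
    public static IEnumerable<string> Read(XmlReader input, XslCompiledTransform transformer)
    {
        var sw = new StringWriter();
        transformer.Transform(input, null, sw);

        using (var reader = new StringReader(sw.ToString()))
        {
            string line;
            // ReadLine treats "\r\n", "\n" and "\r" as line terminators.
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

Unlike Split, ReadLine does not produce an extra empty entry when the text ends with a newline. Because this is an iterator, the transform only runs when the result is enumerated.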
I am using XslCompiledTransform to convert an XML file to HTML. Is there a way I can prevent it from using self-closing tags?
e.g.
<span></span> <!-- I want this even if content empty -->
<span/> <!-- stop doing this! -->
The self-closing span tags are messing up my document no matter which browser I use. The output is valid XML; it's just that 'span' is not allowed to be self-closing in HTML.
Is there a setting I can put in my xsl, or in my C#.Net code to prevent self-closing tags from being used?
Though I couldn't classify this as a direct solution (as it doesn't emit an empty element), the workaround I used was to put a space (using xsl:text) in the element -- since this is HTML markup, and if you are activating Standards mode (not quirks), the extra space doesn't change the rendered content. I also didn't have control over the invocation of the transform object.
<div class="clearBoth"><xsl:text> </xsl:text></div>
You can try <xsl:output method="html"/>; however, the result would no longer be a well-formed XML document.
Or, you can invoke the XslCompiledTransform.Transform() method passing as one of the parameters your own XmlWriter. In your implementation you are in full control and can implement any required serialization of the result tree.
The only solution I have been able to find is to add logic to the XSL file. Basically, if the elements I wanted to wrap the span around are empty, don't use the span element at all.
<xsl:if test="count(jar/beans) > 0">
<xsl:apply-templates select="jar/beans"/>
</xsl:if>
Not ideal to have to insert this everywhere in my xsl file, to compensate for the fact that even though I choose output method "html", it more than willingly will generate illegal HTML.
Sigh.
In your XSLT use <xsl:output method="html"/> and make sure the HTML result elements your stylesheet creates are in no namespace. Furthermore, depending on how you use XslCompiledTransform in your C# code, you need to make sure the xsl:output settings in the stylesheet are honoured. That happens automatically when you transform to a file, Stream or TextWriter. However, if for some reason you transform to an XmlWriter, then you need to ensure it is created with the proper settings, e.g.
XslCompiledTransform proc = new XslCompiledTransform();
proc.Load("sheet.xsl");
using (XmlWriter xw = XmlWriter.Create("result.html", proc.OutputSettings))
{
    proc.Transform("input.xml", null, xw);
}
But usually you should be fine by simply transforming to a Stream or TextWriter, in that case nothing in the C# code has to be done to honour the output method in the stylesheet.
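To illustrate the TextWriter case, here is a small sketch. The helper name and the idea of passing the stylesheet and input as strings are my own assumptions for the demo, not something from the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class HtmlOutputDemo
{
    // Hypothetical helper: loads a stylesheet from text and transforms the
    // input into a StringWriter, so the stylesheet's xsl:output is honoured.
    public static string TransformToHtml(string xsltText, string xmlText)
    {
        var proc = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(xsltText)))
        {
            proc.Load(sheet);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xmlText)))
        using (var output = new StringWriter())
        {
            // Transforming to a TextWriter: no XmlWriterSettings needed here.
            proc.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

With a stylesheet that declares <xsl:output method="html"/> and emits an empty span in no namespace, the returned text should contain <span></span> rather than <span/>.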
I'm developing a windows app using C#. I chose xml for data storage.
It is required to read xml file, make small changes, and then write it back to hard disk.
Now, what is the easiest way of doing this?
XLinq (LINQ to XML) is much more comfortable than the ordinary XML API, because it is much more object-oriented, supports LINQ, has lots of implicit casts, and serializes to the standard ISO format.
The best way is to use XML Serialization where it loads the XML into a class (with various classes representing all the elements/attributes). You can then change the values in code and then serialize back to XML.
To create the classes, the best thing to do is to use xsd.exe, which can infer a schema from an existing XML document and then generate the C# classes for you from that schema.
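A rough sketch of the serialization round trip follows. The AppSettings class and its properties are invented for illustration; in practice the classes would match your own XML (for example, as generated by xsd.exe).

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical settings class; XmlSerializer needs a public type
// with a public parameterless constructor.
public class AppSettings
{
    public string UserName { get; set; }
    public bool AddDateStamp { get; set; }
}

public static class SettingsStore
{
    // Reads the XML file into an AppSettings instance.
    public static AppSettings Load(string path)
    {
        var serializer = new XmlSerializer(typeof(AppSettings));
        using (FileStream stream = File.OpenRead(path))
        {
            return (AppSettings)serializer.Deserialize(stream);
        }
    }

    // Writes the instance back to disk, overwriting any existing file.
    public static void Save(AppSettings settings, string path)
    {
        var serializer = new XmlSerializer(typeof(AppSettings));
        using (FileStream stream = File.Create(path))
        {
            serializer.Serialize(stream, settings);
        }
    }
}
```

Making a small change is then a matter of calling Load, setting a property, and calling Save with the same path.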
I think the easiest way is to use the XmlDocument class:
var doc = new XmlDocument();
doc.Load("filename or Stream or TextReader or XmlReader");
//do something
doc.Save("filename or Stream or TextWriter or XmlWriter");
I think I found the easiest way, check out this Project in Codeproject. It is easy to use as XML elements are accessed similarly to array elements using name strings as indexes.
Code sample to write bool property to XML:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue = checkBoxAddStamp.Checked;
xcfg.Save("config.xml");
Sample to read the property:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
checkBoxAddStamp.Checked = xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue;
To write string use .Value, for int .intValue.
You can use LINQ to read XML Files as described here...
LINQ to read XML
Check out linq to XML
I have an HTML document stored in memory as a LINQ to XML object tree. How can I serialize an XDocument as HTML, taking into account the idiosyncrasies of HTML?
For example, empty tags such as <br/> should be serialized as <br>, whereas an empty <div/> should be serialized as <div></div>.
HTML output is possible from an XSLT stylesheet, and XmlWriterSettings has an OutputMethod property which can be set to HTML - but the setter is internal, for use by XSLT or Visual Studio, and I can't seem to find a way to serialize arbitrary XML as HTML.
So, short of using XSLT solely for the HTML output capability (i.e. doing something like running the document through an otherwise pointless chain of XDocument->XmlReader->via XSLT, to HTML), is there a way to serialize a .NET XDocument to HTML?
No. The XDocument->XmlReader->XSLT is the approach you need.
What you are looking for is a specialised serialiser that arbitrarily adds meaning to tag names like br and div and renders each differently. One would also expect such a serialiser to work in both directions, in other words to be able to read HTML tag soup and generate an XDocument. Such a thing does not exist out of the box.
The XmlReader-to-XSLT route seems simple enough for the job; ultimately it is just a chain of streams.
Like you, I'm really surprised that the HTML output method isn't exposed, and I don't know of any way round it, other than the XSLT route you've already identified. When I faced the same problem a couple of years ago, I wrote an XmlWriter wrapper class, that forced calls to WriteEndElement to use WriteFullEndElement on the underlying XmlWriter if the tag being processed wasn't in the list {"area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr", "isindex", "image", "img", "input", "link", "meta", "param", "spacer", "wbr" }.
This fixed the <div/> problem and was sufficient for me as what I wanted to write was polyglot documents. I didn't find a method to make <br/> appear as <br> but apart from not being able to validate as HTML 4.01 this doesn't cause a real problem. I guess that if you really need this, and don't want to use the XSLT method, you'll have to write your own XmlWriter implementation.
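I no longer have the original wrapper, but the idea can be sketched roughly like this. It is a reconstruction under assumptions: it derives from XmlTextWriter rather than wrapping an arbitrary XmlWriter, and the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Sketch: forces <x></x> for every element except the listed empty HTML tags.
public class HtmlFriendlyXmlWriter : XmlTextWriter
{
    private static readonly HashSet<string> EmptyTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr",
        "isindex", "image", "img", "input", "link", "meta", "param", "spacer", "wbr"
    };

    // Names of the currently open elements, innermost on top.
    private readonly Stack<string> _open = new Stack<string>();

    public HtmlFriendlyXmlWriter(TextWriter output) : base(output) { }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        _open.Push(localName);
        base.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteEndElement()
    {
        string name = _open.Count > 0 ? _open.Pop() : null;
        if (name != null && EmptyTags.Contains(name))
            base.WriteEndElement();      // stays self-closing, e.g. <br />
        else
            base.WriteFullEndElement();  // always writes an explicit end tag
    }

    public override void WriteFullEndElement()
    {
        if (_open.Count > 0) _open.Pop();
        base.WriteFullEndElement();
    }

    // Hypothetical convenience method: serializes an element tree to a string.
    public static string Serialize(XElement root)
    {
        var text = new StringWriter();
        using (var writer = new HtmlFriendlyXmlWriter(text))
        {
            root.WriteTo(writer);
        }
        return text.ToString();
    }
}
```

For example, Serialize(new XElement("div", new XElement("br"), new XElement("span"))) should give a full end tag for the empty span while leaving the br self-closing.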
Of course there is!
//XDocument document; string filename;
// Note: this sets a private field via reflection (needs System.Reflection);
// the field name is an implementation detail and may differ between .NET versions.
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
typeof(XmlWriterSettings).GetField("outputMethod", BindingFlags.NonPublic|BindingFlags.Instance).SetValue(settings, XmlOutputMethod.Html);
using (XmlWriter xw = XmlWriter.Create(filename, settings))
{
    document.Save(xw);
}