Compress XML during XMLReader.Read() - c#

I have a method which loops through an XML document using an XMLReader (validating at the same time) extracting specific pieces of information. I also need to compress the entire XML document in preparation for storing it in a database. The code I have to do this is below. Is this (passing the entire XmlReader to StreamWriter.Write()) the appropriate / most efficient way to achieve this? I didn't see a clear way to use the while(validatingReader.Read()) loop to achieve the same result.
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add("schemaNamespace", "schemaLocation");
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.ValidationType = ValidationType.Schema;
readerSettings.Schemas.Add(schemaSet);
readerSettings.ValidationEventHandler
+= new ValidationEventHandler(XMLValidationError);
using (XmlReader documentReader = requestXML.CreateNavigator().ReadSubtree())
{
using (XmlReader validatingReader =
XmlReader.Create(documentReader, readerSettings))
{
using (MemoryStream output = new MemoryStream())
{
using (DeflateStream gzip =
new DeflateStream(output, CompressionMode.Compress))
{
using (StreamWriter writer =
new StreamWriter(gzip, System.Text.Encoding.UTF8))
{
writer.Write(validatingReader);
this.compressedXMLRequest
= Encoding.UTF8.GetString(output.ToArray());
}
}
}
while (validatingReader.Read())
{
// extract specific element contents
}
}
}

Compression portion looks fine. MemoryStream may not be the best choice for large documents, but check if performance is ok for your scenarios before changing.
"extract specific element" portion will not read anything as reader is forward only, so all content is already read by the time that portion is executed. You may want to recreate the reader.

For future reference:
The code in the Question does not work properly. Passing an XmlReader to a StreamWriter doesn't work as expected. In the end I didn't end up combining compression with validation in this way so I don't exactly have "correct" code to show for this but didn't want to leave the question dangling.

Related

Reading XML file from website from console application

What I want to accomplish is reading an xml file from a website (http://xml.buienradar.nl/). I have been reading about what to use, but I can't see the forest for the trees! Should I be using WebRequest, or XmlDocument, or XDocument, or XmlReader, or XmlTextReader, or? I read that XmlDocument and XDocument read the whole file into memory, and XmlReader doesn't. But is that a problem in this case? What if indeed the xml file is huge?
Can someone help me find a way?
Thanks!
To read huge XML without loading all of it into memory, you can use XmlReader class. But please note that this method requires more code than XDocument or even XmlDocument solution.
var h = WebRequest.CreateHttp("http://xml.buienradar.nl/");
using (var r = h.GetResponse())
using (var resp = r.GetResponseStream())
using (var sr = new StreamReader(resp))
using (var xr = new XmlTextReader(sr))
{
while (xr.Read())
{
// doing something with xr
// for example print it's current node value
Console.WriteLine(xr.Value);
}
}
If you want to test for large XML file, you can try XML from http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz.
It is over 30 MB gzipped. With this method, XML processing don't require much memory, it even don't wait for whole file to finished downloading.
Test code:
var h = WebRequest.CreateHttp("http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz");
using (var r = h.GetResponse())
using (var resp = r.GetResponseStream())
using (var decompressed = new GZipStream(resp, CompressionMode.Decompress))
using (var sr = new StreamReader(decompressed))
using (var xr = new XmlTextReader(sr))
{
while (xr.Read())
{
// doing something with xr
// for example print it's current node value
Console.WriteLine(xr.Value);
}
}
XmlTextReader provides a faster mechanism for reading xml.
string url="http://xml.buienradar.nl/";
XmlTextReader xml=new XmlTextReader(url);
while(xml.Read())
{
Console.WriteLine(xml.Value);
}

XDocument automatically indents files on parse and does not remove indent on save

I am seeking a help regarding file saving of XML file using XDocument (NOT XMLDocument).
So I have a xml file that does not have indentation (in fact it is 1 line). When I read this to XDocument using XDocument.Parse (after reading and storing string using StreamReader), the resulting XDocument is indented.
Alright, I thought it will be fine as long as if I can save it back to the file without indentation. However, even though I have
XmlWriterSettings writerSettings = new XmlWriterSettings();
writerSettings.NewLineOnAttributes = false;
writerSettings.NewLineHandling = NewLineHandling.None;
writerSettings.Indent = false;
and pass that in when I create XmlWriter
using (var writer = XmlWriter.Create(u.ToFileSystemPath(), settings))
{
xd.Save(writer);
}
The resulting XML file still has indentation. When I am debugging on Visual studio, I noticed that the writer is a class XmlWellFormedWriter. Does this have something to do with my result? Any help would be appreciated.
Thank you.
SaveOptions are available on Save() as well as ToString().
string xmlstring =
#"<Top>
<First>1</First>
<Second>Dude</Second>
<Third>Now</Third>
</Top>";
XDocument doc = XDocument.Parse(xmlstring);
doc.Save(#"C:\temp\noIndet.xml", SaveOptions.DisableFormatting);
// string noIndent = doc.ToString(SaveOptions.DisableFormatting);
Output:

XmlTextWriter serialize Object to XML element

I would like to perform object serialization to only one branch in an existing XML file. While reading by using:
RiskConfiguration AnObject;
XmlSerializer Xml_Serializer = new XmlSerializer(typeof(RiskConfiguration));
XmlTextReader XmlReader = new XmlTextReader(#"d:\Projects\RiskService\WCFRiskService\Web.config");
XmlReader.ReadToDescendant("RiskConfiguration");
try
{
AnObject = (RiskConfiguration)Xml_Serializer.Deserialize(XmlReader);
AnObject.Databases.Database[0].name = "NewName";
}
finally
{
XmlReader.Close();
}
It is possible, I do not know how to edit the object again performed it can save the file without erasing other existing elements in an XML file. Can anyone help me?
I found a way to display the desired item serialization. How do I go now instead of the paste to the original element in XML?
StringWriter wr = new StringWriter();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.OmitXmlDeclaration = true;
settings.Encoding = System.Text.Encoding.Default;
using (XmlWriter writer = XmlWriter.Create(wr, settings))
{
XmlSerializerNamespaces emptyNamespace = new XmlSerializerNamespaces();
emptyNamespace.Add(String.Empty, String.Empty);
Xml_Serializer.Serialize(writer, AnObject, emptyNamespace);
MessageBox.Show(wr.ToString());
}
First of all, you should stop using new XmlTextReader(). That has been deprecated since .NET 2.0. Use XmlReader.Create() instead.
Second, XML is not a random-access medium. You can't move forward and backwards, writing into the middle of the file. It's a text-based file.
If you need to "modify" the file, then you'll need to write a new version of the file. You could read from the original file, up to the point where you need to deserialize, writing the nodes out to a new version of the file. You could then deserialize from the original file, modify the objects, and serialize out to the new version. You could then continue reading from the original and writing the nodes out to the new version.

C# XML can't load indent value

The code below is causing the " Data at the root level is invalid. Line 1, position 1. "
I like to keep the code indent(linebreak) but keep having the problem as mentioned above. I could use the TextReader to load the xml but it will remove the indent which i don't like it. If you know how to fix problem please let me know. Thanks
public XmlDocument MYXML()
{
XmlWriterSettings wSettings = new XmlWriterSettings();
wSettings.Indent = false;
wSettings.OmitXmlDeclaration = false;
MemoryStream ms = new MemoryStream();
XmlWriter xw = XmlWriter.Create(ms, wSettings);// Write Declaration
xw.WriteStartDocument();
// Write the root node
xw.WriteStartElement("Library");
// Write the books and the book elements
xw.WriteStartElement("Book");
xw.WriteStartAttribute("BookType");
xw.WriteString("Hardback");
xw.WriteEndAttribute();
xw.WriteStartElement("Title");
xw.WriteString("Door Number Three");
xw.WriteEndElement();
xw.WriteStartElement("Author");
xw.WriteString("O'Leary, Patrick");
xw.WriteEndElement();
xw.WriteEndElement();
xw.WriteEndElement();
// Close the document
xw.WriteEndDocument();
// Flush the write
xw.Flush();
Byte[] buffer = new Byte[ms.Length];
buffer = ms.ToArray();
string xmlOutput = System.Text.Encoding.UTF8.GetString(buffer);
//The next 3 line works fine but it will remove the Indent from the XmlWriterSettings
//TextReader tr = new StreamReader(ms);
//ms.Seek(0, SeekOrigin.Begin);
//xmlOutput = tr.ReadToEnd() + "";
//Can't nload the xmlOutput from buffer
XmlDocument xmldoc = new XmlDocument();
xmldoc.LoadXml(xmlOutput);
return xmldoc;
}
The XmlWriter is writing a utf-8 byte-order mark to the stream. Encoding.UTF8.GetString doesn't account for this (since it's only supposed to occur in files) so the first character of the string becomes an invalid, unprintable character, which is what XmlDocument.LoadXml chokes on.
EDIT: Since you said you want to create an XmlDocument so you can reuse it, I recommend one of the following:
If using .Net 3.5 or newer, use XDocument which is much easier to use (I recommend this).
Create the XmlDocument directly by constructing nodes and adding them to the tree.
Create the XmlDocument directly from the writer by using XPathNavigator (XmlWriter writer = doc.CreateNavigator.AppendChild())
Note that you can't easily add insignificant whitespace to an XmlDocument. Using XDocument and writing it to the output using doc.Save(Response.Output) is by far the easiest option if you want to have nicely formatted output.

Dataset -> XML Document - Load DataSet into an XML Document - C#.Net

I'm trying to read a dataset as xml and load it into an XML Document.
XmlDocument contractHistoryXMLSchemaDoc = new XmlDocument();
using (MemoryStream ms = new MemoryStream())
{
//XmlWriterSettings xmlWSettings = new XmlWriterSettings();
//xmlWSettings.ConformanceLevel = ConformanceLevel.Auto;
using (XmlWriter xmlW = XmlWriter.Create(ms))
{
xmlW.WriteStartDocument();
dsContract.WriteXmlSchema(xmlW);
xmlW.WriteEndDocument();
xmlW.Close();
using (XmlReader xmlR = XmlReader.Create(ms))
{
contractHistoryXMLSchemaDoc.Load(xmlR);
}
}
}
But I'm getting the error - "Root Element Missing".
Any ideas?
Update
When i do xmlR.ReadInnerXML() it is empty. Does anyone know why?
NLV
A few things about the original code:
You don't need to call the write start and end document methods: DataSet.WriteXmlSchema produces a complete, well-formed xsd.
After writing the schema, the stream is positioned at its end, so there's nothing for the XmlReader to read when you call XmlDocument.Load.
So the main thing is that you need to reset the position of the MemoryStream using Seek. You can also simplify the whole method quite a bit: you don't need the XmlReader or writer. The following works for me:
XmlDocument xd = new XmlDocument();
using(MemoryStream ms = new MemoryStream())
{
dsContract.WriteXmlSchema(ms);
// Reset the position to the start of the stream
ms.Seek(0, SeekOrigin.Begin);
xd.Load(ms);
}
XmlDocument contractHistoryXMLSchemaDoc = new XmlDocument();
using (MemoryStream ms = new MemoryStream())
{
dsContract.WriteXml(ms);
ms.Seek(0, SeekOrigin.Begin);
using(XmlReader xmlR = XmlReader.Create(ms))
{
contractHistoryXMLSchemaDoc.Load(xmlR);
}
}
All you really need to do is to call contractHistoryXMLSchemaDoc.Save(ms). That will put the xml into the MemoryStream.
XmlDocument contractHistoryXMLSchemaDoc = new XmlDocument();
using (MemoryStream ms = new MemoryStream())
{
contractHistoryXMLSchemaDoc.Save(ms);
ms.Flush();
}
Here you go.
If from the naming convention of your variables, (not though the question you asked, which appears to change...), you need to load the XML scema of the data set into the XML document that you named with schema, below is the code to load schema of the dataset into an XMLDocument.
I want the schema of the dataset in a separate XML document. – NLV Apr 19 '10 at 12:01
Answer:
XmlDocument contractHistoryXMLSchemaDoc = new XmlDocument();
using (MemoryStream ms = new MemoryStream())
{
dsContract.WriteXmlSchema(ms);
ms.Seek(0, SeekOrigin.Begin);
contractHistoryXMLSchemaDoc.Load(ms);
}
If you are looking for how to load the dataset table data into an XML document (note I removed the word Schema from the XMLDocument variable name)
Your Question:
Sorry, i dont get you. I need to get the xml of that dataset into an xml document. – NLV Apr 19 '10 at 11:56
Answer:
XmlDocument contractHistoryXMLDoc = new XmlDocument();
using (MemoryStream ms = new MemoryStream())
{
dsContract.WriteXml(ms,XmlWriteMode.IgnoreSchema);
ms.Seek(0, SeekOrigin.Begin);
contractHistoryXMLDoc.Load(ms);
}
If you want the Schema and data set in separate documents, the code is above.
If you want just the schema or just the data in and xml document, use the above bit of code that pertains to your question.
If you want both the XML Schema and the Data in the same XMLDocument, then use this code.
XmlDocument contractHistoryXMLDoc = new XmlDocument();
using (MemoryStream ms = new MemoryStream())
{
dsContract.WriteXml(ms,XmlWriteMode.WriteSchema);
ms.Seek(0, SeekOrigin.Begin);
contractHistoryXMLDoc.Load(ms);
}
Your Question:
But I'm getting the error - "Root Element Missing".
Any ideas?
Update
When i do xmlR.ReadInnerXML() it is empty. Does anyone know why?
NLV
Answer:
There are underlying issues in your code in the way you think it is working, which means it is not really creating the XMLSchema and XmlData (dsContract), which is why you are seeing a blank XMLDocument.
It would probably help you to fully explain what you wish, and then not reply to everyone using partial sentences of which are misunderstood and your having to keep replying with more partial sentences.

Categories

Resources