ReadXML data and Schema in DataTable

ReadXML data and Schema in DataTable - c#

I have two strings. One string is having XML Data and another string is having corresponding XML Schema. I am trying to read the data in DataTable. It looks like it is not possible. I don't want to use dataset. Is there a way I can combine the XML data and Schema into a memory stream and read?

Put simply, no, there is not a way to load xml directly into a DataTable through methods on DataTable nor is there a way to create a DataTable directly from an arbitrary schema. Such operations must be done through DataSet; otherwise, you end up doing some very involved workarounds.
There are some techniques you could apply using xml serialization that would be able to recreate a dataset from previously serialized xml. This does not allow for the use of an arbitrary schema though.
You could also write code specifically that loads your XML (via XDocument, XmlDocument, or XmlTextReader) and creates a DataTable on the fly, but it's not trivial to write and would likely take you quite some time. It's also kind of reinventing the wheel.
Essentially, the DataSet is the only class in that hierarchy with methods to process XML because Xml could contain any number of tables. To handle the broadest number of cases when you can make almost no assumptions about the XML, it has to be implemented at that level.
You could also consider whether it's appropriate to simply load the xml into an XDocument, validate it using the Validate extension method, and use Linq to Xml to query it.

Is there a way I can combine the XML data and Schema into a memory
stream and read?
Method
static DataTable ParseXML(string xmlString)
{
DataSet ds = new DataSet();
byte[] xmlBytes = Encoding.UTF8.GetBytes(xmlString);
Stream memory = new MemoryStream(xmlBytes);
ds.ReadXml(memory);
return ds.Tables[0];
}
Example:
string xml = new XElement("inventory",
new XElement("item",
new XElement("name", "rock"),
new XElement("price", "5000")),
new XElement("item",
new XElement("name", "new car"),
new XElement("price", "1"))).ToString();
DataTable dt = ParseXML(xml);
foreach (DataRow row in dt.Rows)
{
foreach (DataColumn col in dt.Columns)
Console.Write(row[col.ColumnName] + " | ");
Console.WriteLine();
}

Related

DataSet.ReadXml not getting all data

I have a relatively simple xml file that I would like to use to fill a DataSet. I am trying the following code
using (DataSet ds = new DataSet())
{
ds.ReadXml(Server.MapPath("xmlFileName.xml"));
}
But my dataset ends up with only one row containing the first node of the file.
I have tried other methods such as
XmlReader xmlFile = XmlReader.Create("xmlFileName.xml", new XmlReaderSettings());
DataSet ds = new DataSet();
ds.ReadXml(xmlFile);
And
FileStream fsReadXml = new FileStream("xmlFileName.xml", FileMode.Open);
XmlTextReader xmlReader = new System.Xml.XmlTextReader(fsReadXml);
newDataSet.ReadXml(xmlReader, XmlReadMode.ReadSchema);
Both of which result in empty datasets.
I can't post the entire xml file, but its format is essentially
<service_orders count="70">
<service_order order_number="1111" id="111111">
<customer>
<customer_id>55555</customer_id>
<first_name>John</first_name>
<last_name>Doe</last_name>
<email>JohnDoe#gmail.com</email>
<phone1>55555555</phone1>
The first method I mentioned only generates two columns from this
"service_order_id" and "count"
with values 0 and 70 respectively.
It seems like it's only hitting the first node?
So I'm not sure what I'm doing wrong with these methods. Is the xml file not formatted properly? Do I somehow need to make it go deeper into the nodes? Is there any way I can specify which nodes to hit?
Any help would be appreciated,
Thank you

I realized I had forgotten that DataSets hold multiple tables, and that I was only looking at the first one which contained the root. Though this did help me understand how the parser navigates the xml tree. Only leaf elements (with no children) get added to the tables, all other elements get tables of their own in the dataset.

Slow performance in data import when using xml in MySQL for a large recordset

I am using the following code for importing of data
MySqlDataAdapter adaptor = new MySqlDataAdapter(query, dbSession.Connection);
MySqlCommandBuilder commandBuilder = new MySqlCommandBuilder(adaptor);
var insertCmd = commandBuilder.GetInsertCommand();
insertCmd.CommandTimeout = int.MaxValue;
adaptor.InsertCommand = insertCmd;
DataTable dataTable = new DataTable();
dataTable.ReadXmlSchema(file);
dataTable.ReadXml(xmlfile);
adaptor.UpdateBatchSize = 1000;
adaptor.Update(dataTable);
I am using MySQL database. The xml file contains more than 2 million records.
My problem is that it takes huge amount of memory and is very slow.
I can use any other format than xml if it solves above problems.
Please let me know if you need more information.
Edit 1:
I have tried converting the XML into a CSV file and used MySqlBulkLoader to port the data. It is amazingly fast but i am fearing that will not work in some cases where the data contains the same delimiter also, i feel that it is not a good idea to make CSV file for binary data

Merge two XML files, one of which is non-conformant, in C#

I have two XML files which need to be merged into one file. When I try to merge them, I get an error saying that one of them does not conform.
The offending XML file looks something like:
<letter>
<to>
<participant>
<name>Joe Bethersonton</name>
<PostalAddress>Apartment 23R, 11454 Pruter Street</PostalAddress>
<Town>Fargo, North Dakota, USA</Town>
<ZipCode>50504</ZipCode>
</participant>
</to>
<from>
<participant>
<name>Jon Doe</name>
<PostalAddress>52 Generic Street</PostalAddress>
<Town>Romford, Essex, UK</Town>
<ZipCode>RM11 2TH</ZipCode>
</participant>
</from>
</letter>
I am trying to merge the two files using the following code snippet:
try
{
Dataset ds = new DataSet();
Dataset ds2 = new DataSet();
XmlTextReader reader1 = new XmlTextReader("C:\\File1.xml");
XmlTextReader reader2 = new XmlTextReader("C:\\File2.xml");
ds.ReadXml(reader1);
ds2.ReadXml(reader2);
ds.Merge(ds2);
}
catch(System.Exception ex)
{
Console.WriteLine(ex.Message);
}
This gives the following error:
The same table 'participant' cannot be the child table in two nested relations.
The two XML files are both encoded in UTF-16, which makes combining them by a simple text read and write difficult.
My required end result is one XML file with the contents of the first XML file followed by the contents of the second XML file, with a and tag around the whole lot and a header at the top.
Any ideas?
Thanks,
Rik

In my opinion, the XML you provided is just fine. I suggest, you use the following code and don't use the Dataset class at all:
XDocument doc1 = XDocument.Load("C:\\File1.xml");
XDocument doc2 = XDocument.Load("C:\\File2.xml");
var result = new XDocument(new XElement("Root", doc1.Root, doc2.Root));
result will contain a XML document with "Root" as the root tag and then the content of file 1 followed by the content of file 2.
Update:
If you need to use XmlDocument, you can use this code:
XmlDocument doc1 = new XmlDocument();
XmlDocument doc2 = new XmlDocument();
doc1.Load("C:\\File1.xml");
doc2.Load("C:\\File2.xml");
XmlDocument result = new XmlDocument();
result.AppendChild(result.CreateElement("Root"));
result.DocumentElement.AppendChild(result.ImportNode(doc1.DocumentElement, true));
result.DocumentElement.AppendChild(result.ImportNode(doc2.DocumentElement, true));

I suspect the solution is to provide a schema. DataSet.Merge doesn't know what to do with two sets of elements with the same name. It attempts to infer a schema, but that doesn't work out so well here.
According to this thread on MSDN, this is a limitation of the DataSet class:
The DataSet class in .NET 2.0 (Visual Studio 2005) still has the limitation of not supporting different nested tables with the same name. Therefore you will have to introduce an XML transform to pre-process the XML (and schemas) before you load them up into the DataSet.
Of course, the way that's phrased makes it seem like a newer version might have fixed this. Unfortunately, that may not be the case, as the original answer was posted back in 2005.
This knowledge base article seems to indicate that this behavior is "by design", albeit in a slightly different context.
A better explanation of why this behavior is occurring is also given on this thread:
When ADO reads XML into a DataSet, it creates DataTables to contain each type of element it encounters. Each table is uniquely identified by its name. You can't have two different tables named "PayList".
Also, a given table can have any number of parent tables, but only one of its parent relations can be nested - otherwise, a given record would get written to the XML multiple times, as a child of each of its parent rows.
It's extremely convenient that the DataSet's ReadXml method can infer the schema of the DataSet as it reads its input, but the XML has to conform to certain constraints if it's going to be readable. The XML you've got doesn't. So you have two alternatives: you can change the XML, or you can write your own method to populate the DataSet.
If it were me, I'd write an XSLT transform that took the input XML and turned PayList elements into either MatrixPayList or NonMatrixPaylist elements. Then I'd pass its output to the DataSet.
Using XmlDocument or XDocument to read in and manipulate the XML files is another possible workaround. For an example, see Merging two xml files LINQ

I found a solution using Serialization to first infer the schema,
then serialize the schema and remove the relationships contraints (this tricks the DataSet into thinking that IT has created the dataset.), then load this new schema into a DataSet.
This new dataset will be able to load both your xml files.
More details behind this trick:
Serialization Issue when using WriteXML method

Getting xml from a dataset

I have a DataSet which has one table loaded with data.
When i write out the dataset to xml using the GetXml method of the dataset, i get all the columns in that data table as elements in the xml file.
How do i get the resulting xml with the column values as attributes instead of elements?
The article here trails off without a proper answer
I am using .NET Framework 2.0

Before writing the XML, do something like
foreach (DataColumn column in aDataSet.Tables[0].Columns)
{
column.ColumnMapping = MappingType.Attribute;
}
Although I'll admit I didn't test this. You still may get a DiffGram structured file.

foreach(DataColumn dc in dsHR.Tables[0].Columns)
dc.ColumnMapping = MappingType.Attribute;
Quite simple :)

Howto parse csv and returns a dataSet as a result

I need a CSVParser class file
A Class File which parses csv and returns a dataSet as a result ASP.Net

I'm pretty sure that CSVReader (CodeProject) can read to DataTable.
DataTable table = new DataTable();
// set up schema... (Columns.Add)
using(TextReader text = File.OpenText(path))
using(CsvReader csv = new CsvReader(text, hasHeaders)) {
table.Load(csv);
}
Note that manually setting up the schema is optional; if you don't, I believe it assumes that everything is string.

Simple google gives plenty of results.

I've had luck with this parser. It will return results to a DataSet.
Another tool you might want to check out is FileHelpers. I see there's a tag for this resource here on SO.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.