I am working with a vendor-supplied application that stores XML data as a byte array in a SQL database table. I have found that if the XML data is "too long" (apparently past some predetermined length set in black-box code provided by the vendor), the XML is truncated and a second record, containing the remainder of the XML data, is created.
My task is to take these "linked" records and merge them into one valid XML string. These linked records can be split anywhere: in the middle of an element, a node, etc. There is no rhyme or reason to where the XML string is broken.
Taking the invalid XML data and loading it into an XElement causes the error "Tag has no closing tag".
I've also tried using an XmlReader and reading through each node, based on this article as well as this msdn article. That also results in the same missing-tag error.
Is there a way to take these partial XML strings and merge them? Or am I simply stuck?
The vendor application we use does perform this merge, but that code is hidden from me.
Thank you
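Since the fragments are one XML string cut at arbitrary points, the linked rows can likely be merged by concatenating their raw bytes in order and parsing the result. A minimal sketch, assuming the rows can be retrieved in split order and the data is UTF-8 (the XmlFragmentMerger name and the encoding are my assumptions):

using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

static class XmlFragmentMerger
{
    // fragments: the linked rows' byte arrays, in the order they were split
    public static XDocument Merge(IEnumerable<byte[]> fragments)
    {
        // Join the raw bytes before decoding; decoding each fragment
        // separately could cut a multi-byte character in half.
        byte[] whole = fragments.SelectMany(b => b).ToArray();
        return XDocument.Parse(Encoding.UTF8.GetString(whole));
    }
}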
I've been working on a VB application which parses multiple XML files and creates an Excel file from them.
The main problem is that I am simply reading each line of each XML file and outputting it to the Excel file when a specific node is found. I would like to know whether there is any way to store the data from each element, so that I can use it once everything (all the XML files) has been parsed.
I was thinking about databases, but I think that is excessive and unnecessary. Maybe you can give me some ideas on how to make this work.
System.Data.DataSet can be used as an "in-memory database".
You can use a DataSet to store information in memory - a DataSet can contain multiple DataTables and you can add columns to those at runtime, even if there are already rows in the DataTable. So even if you don't know the XML node names ahead of time, you can add them as columns as they appear.
You can also use DataViews to filter the data inside the DataSet.
My typical way of pre-parsing XML is to create a two-column DataTable with the XPath address of each node and its value. You can then do a second pass that matches XPath addresses to your objects/dataset.
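A rough sketch of that two-column first pass (the XPathFlattener name is mine, and the simplified path builder ignores positional indexes and attributes):

using System.Data;
using System.Linq;
using System.Xml.Linq;

static class XPathFlattener
{
    // Flattens every leaf element into an (XPath, Value) row.
    public static DataTable Flatten(XDocument doc)
    {
        var table = new DataTable("Nodes");
        table.Columns.Add("XPath", typeof(string));
        table.Columns.Add("Value", typeof(string));

        foreach (var leaf in doc.Descendants().Where(e => !e.HasElements))
            table.Rows.Add(PathOf(leaf), leaf.Value);

        return table;
    }

    // Builds a simple /a/b/c style address.
    private static string PathOf(XElement e) =>
        e.Parent == null ? "/" + e.Name.LocalName
                         : PathOf(e.Parent) + "/" + e.Name.LocalName;
}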
I have a string that is written out like an XML file. An example would look like this:
string = <Employees><EmployeeId>1</EmployeeId>< ... ></Employees>
I am saving this in a table because I wanted to audit changes, but I didn't want multiple tables for different audits, since it would record changes to things other than employees. So using an XML-style string in the database seemed like a good suggestion.
Now to the real business. I want to check to make sure that there were actually changes to the employee because one could go into the edit page, change nothing, and click save. As of right now, the data would write to the DB and just clutter it up with non-changed data.
I'd like to be able to check whether the XML-styled string that is about to be saved already exists in the database: first check that <employees><employeeid>###</employeeid> matches (the employee ID won't change), and then see if the whole string equals the stored one. I would have just compared the first n characters, but the ID can be anywhere from one to three digits long.
Also, because it is styled as XML, is there an easy way to read it like an XML file and compare it that way?
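For that last part, one possible sketch: parse both strings with XDocument and compare the trees with XNode.DeepEquals (newXml and existingXml are placeholder variables):

using System.Xml.Linq;

var incoming = XDocument.Parse(newXml);      // the string about to be saved
var stored   = XDocument.Parse(existingXml); // the string already in the DB

// Compare the stable id element first, then the whole tree.
bool sameEmployee = (string)incoming.Root.Element("EmployeeId")
                 == (string)stored.Root.Element("EmployeeId");
bool unchanged = sameEmployee
              && XNode.DeepEquals(incoming.Root, stored.Root);

Note that XNode.DeepEquals is order-sensitive, so elements serialized in a different order count as a change.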
Storing arbitrary data in a column is a form of denormalization. You can't really do much with it at a database level. However, SQL Server does have an XML column type. Entity Framework doesn't support mapping to/from an XML column, so it will simply treat your XML as a standard string. With this column type, though, you can write actual SQL queries against your XML using XPath expressions.
Your best bet, then, is to type your column as XML, and then write a stored procedure that performs the query you need. You can then utilize this stored procedure with Entity Framework.
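As an illustration only (AuditEntry, the procedure name, and the parameter are hypothetical, context is your EF context, and this assumes EF6's Database.SqlQuery):

using System.Data.SqlClient;
using System.Linq;

// AuditEntry is a hypothetical POCO matching the columns the
// stored procedure returns after querying the XML column via XPath.
var audits = context.Database
    .SqlQuery<AuditEntry>(
        "EXEC dbo.GetEmployeeAudits @id",
        new SqlParameter("@id", employeeId))
    .ToList();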
For more information on the XML column type see: https://msdn.microsoft.com/en-us/library/ms190798(SQL.90).aspx
I have seen various questions on this, but none seem to be working in my problem. I have an Umbraco site set up and it stores its page contents as XML in a database column. An example one is below:
Sorry for the screen grab and not the actual code, but the editor kept stripping things out.
What I would like to do ideally is, either on the page in C#/LINQ (I have been trying to manipulate it from a string value) or within a SQL query, to pull out the 'urlName', 'nodeName' and 'bodyText' fields.
Many thanks
Since the column is not defined as XML in the database, you can pull out the string and parse the text/string as an XML document:
// xml would be pulled from the DB
string xml = "<RunwayTextpage nodeName=\"Test page\" urlName=\"test-page\"><bodyText>Body Text</bodyText></RunwayTextpage>";
var doc = XDocument.Parse( xml );
string nodeName = doc.Root.Attribute( "nodeName" ).Value;
string urlName = doc.Root.Attribute( "urlName" ).Value;
string bodyText = doc.Root.Element( "bodyText" ).Value;
Another option would be to use string manipulation in the SQL query itself, but that would end up being much less maintainable, whereas the above is easily understandable.
Why not use uQuery? I don't really know your purposes, but it has a method called GetNodesByXPath that gets a collection of nodes from an XPath expression by querying the in-memory XML cache.
It's wiser in terms of performance if your tree is large.
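A hedged sketch of that approach (assuming the legacy umbraco.NodeFactory Node API; the XPath and property alias come from the example above):

// Query Umbraco's in-memory XML cache instead of the database.
var pages = uQuery.GetNodesByXPath("//RunwayTextpage");

foreach (var page in pages)
{
    string nodeName = page.Name;
    string urlName  = page.UrlName;
    string bodyText = page.GetProperty("bodyText").Value;
}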
I have a requirement to generate an XML file. This is easy-peasy in C#. The problem (aside from a slow database query, which is a separate problem) is that the output file easily reaches 2GB. On top of that, the output XML is not in a format that can easily be produced in SQL. Each parent element aggregates data from its children and carries a sequential unique identifier that spans the file.
Example:
<level1Element>
  <recordIdentifier>1</recordIdentifier>
  <aggregateOfLevel2Children>11</aggregateOfLevel2Children>
  <level2Children>
    <level2Element>
      <recordIdentifier>2</recordIdentifier>
      <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
      <level3Children>
        <level3Element>
          <recordIdentifier>3</recordIdentifier>
          <level3Data>a</level3Data>
        </level3Element>
        <level3Element>
          <recordIdentifier>4</recordIdentifier>
          <level3Data>b</level3Data>
        </level3Element>
      </level3Children>
    </level2Element>
    <level2Element>
      <recordIdentifier>5</recordIdentifier>
      <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
      <level3Children>
        <level3Element>
          <recordIdentifier>6</recordIdentifier>
          <level3Data>h</level3Data>
        </level3Element>
        <level3Element>
          <recordIdentifier>7</recordIdentifier>
          <level3Data>e</level3Data>
        </level3Element>
      </level3Children>
    </level2Element>
  </level2Children>
</level1Element>
The schema in use actually goes five levels deep. For the sake of brevity, I'm including only three. I do not control this schema, nor can I request changes to it.
It's a simple, even trivial matter to aggregate all of this data in objects and serialize out to XML based on this schema. But when dealing with such large amounts of data, out-of-memory exceptions occur with this strategy.
The strategy that is working for me is this: I populate a collection of entities through an ObjectContext that hits a view in a SQL Server database (a most ineffectively indexed database at that). I group this collection and iterate through it, then group the next level and iterate through that, until I get to the highest-level element. I then organize the data into objects that reflect the schema (effectively just mapping) and set the sequential recordIdentifier (I've considered doing this in SQL, but the number of nested joins or CTEs would be ridiculous, considering that the identifier spans the header elements into the child elements). I write a higher-level element (say, the level2Element) with its children to the output file. Once I'm done writing at this level, I move to the parent group and insert the header with the aggregated data and its identifier.
Does anyone have any thoughts concerning a better way to output such a large XML file?
As far as I understand your question, the problem is not limited storage space (i.e. HDD); the difficulty is holding a large XDocument object in memory (i.e. RAM). To deal with this, you can avoid building such a huge object at all. For each recordIdentifier element you can call .ToString() and get a string, then simply append these strings to a file. Put the XML declaration and root tag in the file yourself and you're done.
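A minimal sketch of that idea using XmlWriter, which streams to disk rather than holding the whole document in memory (BuildLevel1Elements is a hypothetical method that yields one completed element at a time, and the root tag name is an assumption):

using System.Xml;
using System.Xml.Linq;

using (var writer = XmlWriter.Create("output.xml",
           new XmlWriterSettings { Indent = true }))
{
    writer.WriteStartDocument();                 // the XML declaration
    writer.WriteStartElement("level1Elements");  // assumed root tag

    // Build one level1Element (with its children and aggregates),
    // stream it out, and let it be garbage-collected before the next.
    foreach (XElement element in BuildLevel1Elements())
        element.WriteTo(writer);

    writer.WriteEndElement();
    writer.WriteEndDocument();
}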
I am creating a generic XML to SQL Server data driver (in C#) that will take an XML file as input and produce one or more data tables containing its information.
So far I have an input XML file and an XSLT; the XSLT creates a new XML document containing only the information needed from the original.
My problem lies in not knowing how to define a mapping from the XML elements to the columns in certain tables.
For example, say I have this extract of XML:
<Bug name = "MillenniumBug">
<Severity value = "1" />
</Bug>
I would like to create two tables, a Bugs table and a Severity table; I need the bug name in the Bugs table and the severity value in the Severity table.
A point in the right direction as to how I can specify this mapping would be really appreciated.
Thanks
You have already lined up your ruleset when you set up elements as tables and attributes as columns in the table. Setting up the mapping, and even the "table creation logic", is fairly easy using those rules. The harder part of your "driver" is going to be the determination of types (is this number a byte, an int or a long?) and possibly creating an inferred schema to validate the input. Determining nullability of columns will also be an issue you have to examine.
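A sketch of those rules in isolation (the InferTables name is mine, and everything is typed as string, deliberately skipping the type-inference problem described above):

using System.Data;
using System.Linq;
using System.Xml.Linq;

static DataSet InferTables(XDocument doc)
{
    var set = new DataSet();

    // Rule: every element name becomes a table...
    foreach (var group in doc.Descendants().GroupBy(e => e.Name.LocalName))
    {
        var table = new DataTable(group.Key);

        // ...and every attribute name becomes a column in that table.
        foreach (var attrName in group.SelectMany(e => e.Attributes())
                                      .Select(a => a.Name.LocalName)
                                      .Distinct())
            table.Columns.Add(attrName, typeof(string));

        foreach (var element in group)
        {
            var row = table.NewRow();
            foreach (var attr in element.Attributes())
                row[attr.Name.LocalName] = attr.Value;
            table.Rows.Add(row);
        }

        set.Tables.Add(table);
    }
    return set;
}

With the Bug extract above, this yields a Bug table with a name column and a Severity table with a value column.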
Without better understanding the business driver for this project, I can't give you much more than that.