SQL Server: variable length XML storage - c#

I'm in the process of writing an application that interacts with a third party application.
The third party application will be passing my application several raw XML requests. I would like to save each of these requests in a communications log in my DB.
What's the most efficient way to store this variable-length data? VARCHAR(MAX)? NVARCHAR(MAX)?
If one is a better choice than the other (or there is another option I'm missing), please explain why it's the best choice.

Since you're using SQL Server 2005, the best data type for storing XML data is xml.
This provides parsing and schema validation features. It also allows you to index the XML data later if need be.

XML seems to be the obvious data type of choice when dealing with XML, but it is not always the best one.
Have a look at this article by Robert Sheldon, "Working with the XML Data Type in SQL":
In some cases, you shouldn’t use the XML data type, but instead use large object storage—VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX). For example, if you simply store your XML documents in the database and retrieve and update those documents as a whole—that is, if you never need to query or modify the individual XML components—you should consider using one of the large object data types. The same goes for XML files that you want to preserve in their original form, such as legal documents. If you need to retain an exact textual copy, use large object storage.

I have used the XML data type for this type of thing (MSDN link: XML DataType 2005).
It is native and lets you do the normal angle-bracket things to the actual data.
Big plus is that I am not converting or messing around with the actual data, and introducing subtle bugs with the actual XML.
Big plus if you want to do anything with the XML like render it.
Downside is that you have to be aware the column is XML data and you need to code for it in upstream apps.

Adding to Yuck's answer:
VARCHAR(MAX): XML means Unicode, and if you choose non-Unicode storage then data loss is almost certain.
NVARCHAR(MAX): appropriate if you only want to log the XML data.
XML: if you want to query the XML content later.
Typed XML with an XML SCHEMA COLLECTION: only if you have a fixed XML schema (XSD) which will never ever change. (ALTER XML SCHEMA COLLECTION does not support update or delete of XML entities, as I understand it.)
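To make the data-loss point concrete, here is a minimal sketch in C#. It only imitates the effect: Encoding.ASCII is an assumed stand-in for a non-Unicode column (a real VARCHAR column uses the code page of its collation, so exactly which characters survive will differ), and the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingLossDemo
{
    // Round-trips text through a single-byte encoding, roughly what happens
    // when Unicode text lands in a non-Unicode column. Encoding.ASCII is only
    // a stand-in; a real VARCHAR column uses its collation's code page.
    public static string RoundTripNonUnicode(string xml)
    {
        byte[] stored = Encoding.ASCII.GetBytes(xml);
        return Encoding.ASCII.GetString(stored);
    }

    // Round-trips text through UTF-16, the encoding behind NVARCHAR.
    public static string RoundTripUnicode(string xml)
    {
        byte[] stored = Encoding.Unicode.GetBytes(xml);
        return Encoding.Unicode.GetString(stored);
    }
}
```

Passing something like `<city>東京</city>` through RoundTripNonUnicode replaces the two characters the encoding cannot represent with question marks, while RoundTripUnicode returns the input unchanged.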

Related

How to deal with a XML based protocol where the response may conform to one of two XSDs?

I have to read and write data through a protocol where the response XML may be different depending on the error state of the server application. If the response is good it uses, let's say, Xml_1 with a specific schema, but if the response indicates an error it uses Xml_2 with a completely different schema. The good design, in my opinion, would be to incorporate the error state into the first schema, but we are just consumers of this service and we don't have access to the design of the server application. My solution is to (using C#) read the XML response as a string, do some searching to work out which XML schema is in use, and then use the appropriate XML serializer to convert the response to an object. Is there a more elegant solution?
Is the union of the two schemas a valid schema? (This will typically be the case, for example, if they use different namespaces, but it's likely not to be the case if they are both no-namespace schemas or if they are two versions of the same schema).
If the union is a valid schema, then you could consider validating against that.
Otherwise peeking at the start of the file will often be enough to tell you which vocabulary is in use.
It's possible to parse an XML document without validation, inspect it, and then validate the already parsed document. It's even possible to do this in a single pipeline without putting the whole document in memory. But the details depend on the toolkit you are using. You've tagged the question C# - I'm not sure if this is possible using the Microsoft tools, but it should be possible I think using Saxon-CS. [Disclaimer, my product].
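As a rough sketch of the "peek at the start" idea with the Microsoft tools: XmlReader can report the root element's name and namespace without reading the rest of the response. The class and method names below are hypothetical, and this only identifies the vocabulary; it does not validate anything.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ResponseSniffer
{
    // Returns the local name and namespace URI of the root element
    // without reading the rest of the document.
    public static (string LocalName, string NamespaceUri) PeekRoot(string xml)
    {
        // Ignore a DOCTYPE, if present, instead of throwing on it.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // MoveToContent skips the declaration, comments and whitespace
            // and stops on the first content node, here the root element.
            reader.MoveToContent();
            return (reader.LocalName, reader.NamespaceURI);
        }
    }
}
```

The caller could then choose the serializer for Xml_1 or Xml_2 based on the returned name (or namespace) instead of searching the raw string.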

Inserting XML from XDocument into database column with header and no visible linefeeds

I am rewriting an old application that used to upload XML files using an old .aspx form. We're getting rid of the form and want to automate the process. As I'm doing this I'm seeing differences in the XML formatting and want to make sure that I process the XML the same way that the old app did since another process relies on the format.
The old format in VB, used a MemoryStream, read all of the bytes and returned the Stream and created a big inline SQL insert statement to load the data into the DB.
The new format uses C# and XDocument. The line
XDocument.Load(fileName)
Returns XML in the correct format, but I don't see an XML header and the data is surrounded by curly braces -> "{ }" In the XML Viewer in Visual Studio, the data looks fine, though, so perhaps this is all a residue of Visual Studio?
In any case, I need to get the XML to include the header when inserting into the database. Any advice would be appreciated! Thank you!
You should not think about XML as text with some fancy extras... How is this stored in SQL Server? If the target column is a real XML type you should not bother about the visual format at all. If the visual format is of any importance, then it is the problem of the consuming / reading software...
If you store the XML in a string typed column, you can store literally everything, even invalid XML. If you want to use this XML in SQL Server with XML methods like .value() or .nodes() you will need the real XML type... If you can control this, make sure the target is a real XML typed variable or column!
The XML declaration (the processing instruction stating the encoding and the XML version in most cases) will be omitted in any case. SQL Server's native XML type does not preserve such a declaration.
You get into trouble if you store the XML in a string-typed column together with an XML declaration. In this case you should use encoding="utf-16" and you must store this in an NVARCHAR(MAX) column.
If the actual type is NCHAR or NVARCHAR, SQL Server expects a Unicode-encoded string. If the column is without the N, SQL Server expects extended ASCII (collation dependent). You cannot mix these! You cannot convert a string with an XML declaration stating utf-16 if it is a VARCHAR (and vice versa).
In any case, one should avoid ASCII-encoded XML. It will get you into trouble with non-Latin characters and will need expensive conversions, because SQL Server stores XML internally in a Unicode-based tree structure.
About namespaces, if there are any, you must be very careful. They must be part of the XML, otherwise you won't be able to read the XML later.
In this answer you find the code to convert an XDocument into XmlDocument. Then use the property OuterXml to get the textual representation of the XML. As C# internally uses unicode strings, just pass this over into a variable or column of type XML or NVARCHAR(MAX).
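If the text is destined for an NVARCHAR(MAX) column and the header really is required, the following is a small sketch of an alternative that stays with XDocument. It is not the conversion code from the linked answer, and the class and method names are made up. It relies on two behaviours of System.Xml.Linq: XDocument.ToString() leaves the declaration out, while Save to a StringWriter writes one that states utf-16.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class XmlText
{
    // Text including the XML declaration. A StringWriter reports UTF-16 as
    // its encoding, so the declaration says encoding="utf-16", which is the
    // combination suggested above for an NVARCHAR(MAX) column.
    public static string WithDeclaration(XDocument doc)
    {
        using (var writer = new StringWriter())
        {
            // DisableFormatting: no indentation or line breaks are added.
            doc.Save(writer, SaveOptions.DisableFormatting);
            return writer.ToString();
        }
    }

    // Text without the declaration, enough for an xml-typed column.
    public static string WithoutDeclaration(XDocument doc)
    {
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Either string can then be passed to the database as an ordinary Unicode string parameter.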

Is it better to have SQL Server parse a large multi document XML or send it each document separately

I need to request the XML from PubMed like
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=27087788,28322247,26158412&retmode=xml
The example has 3 IDs but a request can contain as many as 200 at a time. The request is made by a .NET web service. I am looking for the most efficient way to process the XML files. I know that the term "best" or "efficient" is very subjective and dependent upon many things, but:
Is it better to send the entire string to the SQL Server database (if it is even possible because of length or possible nesting levels) and let it parse the document and save it to the database or is it better to parse the document in the web service using a XMLTextReader or XML Document Object and send each document? Each document needs to be saved as a separate record.
Thanks for your information.
My first thought was: Why SQL-Server? Why send all this data around? Do the parsing in C#!
But, on second thought: if I understand this correctly, you want to read many different XML files and store them in your database.
Now I'd rather ask: when do you need to retrieve data from these XMLs, and do you need to store extracted data in relational tables? Would it be a possible approach for you to store all these XMLs as-is in XML-typed columns and read them on demand?
You can pass your XML as a C# string (which is Unicode) and insert it directly into an XML-typed column. To avoid any hassle you should cut away the first lines (the <?xml ...?> declaration and the DOCTYPE) and start with <PubmedArticleSet>.
The rest should be easily transferred and stored in SQL Server.
If you need help with reading this, just come back with another, more pointed question.
About your "which is faster" question, you might read this.
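A minimal sketch of the cutting-away step, extended to split the response so that each document can become its own record. It assumes each direct child of the root element is one document; the class and method names are hypothetical and the database call is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ArticleSplitter
{
    // Returns one XML string per direct child of the root element.
    // The strings contain neither the XML declaration nor the DOCTYPE.
    public static List<string> SplitArticles(string response)
    {
        // Ignore the DOCTYPE instead of throwing or resolving the DTD.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        using (var reader = XmlReader.Create(new StringReader(response), settings))
        {
            XDocument doc = XDocument.Load(reader);
            return doc.Root
                      .Elements()
                      .Select(e => e.ToString(SaveOptions.DisableFormatting))
                      .ToList();
        }
    }
}
```

Each returned string can be sent as a Unicode string parameter and stored in an XML-typed column. For a whole batch in one value, doc.Root.ToString() gives the set without the leading lines.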

Is it possible to put the XML content in a field of the database?

I was analyzing my situation and I am faced with the problem that I need to save the content of an XML file in SQL Server 2008. My XML files are around 200 KB to 600 KB in size.
For the moment, how would I define the field to accept this content? I imagine I can set the content directly, but I'm not sure about this.
Thanks in advance.
SQL Server has a datatype called XML. Use that.
SQL Server has an xml data type exactly for that purpose. For example:
create table YourTable (id int identity, FileContent xml)
SQL 2008 supports a dedicated xml data type, which is documented here.
Some details include the following for fragments, and documents, respectively:
Fragments: Restricts the xml instance to be a well-formed XML fragment. The XML data can contain multiple zero or more elements at the top level. Text nodes are also allowed at the top level.
Documents: Restricts the xml instance to be a well-formed XML document. The XML data must have one and only one root element. Text nodes are not allowed at the top level.
Also, a point on capacity (though a 2GB limitation should be hard to reach in the majority of cases):
The stored representation of xml data type instances cannot exceed 2 gigabytes (GB) in size. For more information, see Implementing XML in SQL Server.
If you're working with SQL Server 2005+ you can use the xml data type. You'll need to be sure that your data is actually well-formed XML, though. If it's just snippets you're better off using nvarchar.
File sizes of 200 - 600 KB are no issue for the xml column type.
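Whether the data is a well-formed document, only a fragment, or neither can be checked on the application side before the insert. A minimal sketch with XmlReader follows; the class and method names are made up, and .NET's parser is used here as an approximation, not as a guarantee of what SQL Server will accept.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Returns true when the text parses at the given conformance level.
    public static bool Parses(string text, ConformanceLevel level)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = level };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(text), settings))
            {
                // Walk the whole input; any error surfaces as XmlException.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Exactly one root element, no text at the top level.
    public static bool IsDocument(string text) => Parses(text, ConformanceLevel.Document);

    // Any number of top-level elements, top-level text allowed.
    public static bool IsFragment(string text) => Parses(text, ConformanceLevel.Fragment);
}
```

For example, `<a/><b/>` passes IsFragment but not IsDocument, and `<a><b></a>` passes neither.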
Did you try using the BLOB or CLOB data types?
SQL Server has an XML data type. Have a look here.
Yes, you can store XML text in a text field in your SQL database, but if you're planning to do a lot of introspection on that XML data, you'll probably be better off using MS-SQL's XML data type. This stores the XML data in tokenized form on the server and makes it possible to perform XML query operations without having to reparse the XML data all the time. I seem to recall you can also index on XML expressions as well.

Approach to process huge xml files in C#

Can someone please guide me with this problem?
In my institution, we process XML files of huge size (max 1 GB) and insert the details into a database table. In the current design, we parse the XML file with XmlReader and build an XML string with the required data, which is then passed to a stored procedure (as an xml data type parameter) to insert the details into the DB.
The problem is that we are not sure whether there is a better approach than this. Please suggest any new features in .NET 3.5 and/or SQL Server 2005 that would handle this better than our approach.
Any help in this regard would be highly appreciated.
Thanks.
Do you care at all what is in the XML-file? If not, you can just use a StreamReader and get the text from the XML and just pass it along to the database.
If you need to validate that the XML is correct, it is a good idea to use XmlReader.
However, just dumping 1 GB of XML into your database seems a bit weird. What is the purpose of this XML data? Is it a lot of nested elements? Maybe you could deserialize it and store each object in the appropriate table instead, which would in my opinion lead to a design that is easier to understand.
There are a couple of things you can think of to make the design of your software easier/better:
Does more than one XML file occur in the database at once?
How is the data shared between applications?
Have you considered using MemoryMappedFile?
Is it possible to deserialize the XML into entities instead and store them appropriately?
I suspect that if there are any performance issues, they will be with the stored procedure and the database side of things rather than with reading the file.
Why are you storing the XML file in a database table? I would suggest using a different solution would be appropriate, but without knowing more details about exactly what it is you are trying to do it is hard to advise.
If each first-level element in the xml is a record, i.e.
<rootNode>
<row>...</row>
<row>...</row>
<row>...</row>
</rootNode>
Then you could create an IDataReader implementation that reads the XML (via XmlReader) and presents each row as a record, to be imported using SqlBulkCopy. Pretty much like my old answer here.
Advantages:
SqlBulkCopy is the fastest way to get data into a database
stripping it into records makes appropriate use of a database, allowing indexing and proper typing
it doesn't rely on a huge BLOB going over the wire in an atomic way (necessary for the xml data type)
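The reading half of that idea can be sketched as follows. This is not the IDataReader implementation from the linked answer, only a minimal streaming enumerator over the row elements; the class and method names are hypothetical, and wrapping the result for SqlBulkCopy is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class RowStreamer
{
    // Yields one XElement per element named rowName, holding only
    // the current row in memory rather than the whole file.
    public static IEnumerable<XElement> StreamRows(TextReader input, string rowName)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == rowName)
                {
                    // ReadFrom materializes this one element and leaves the
                    // reader on the node that follows it, so no extra Read().
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

For a file, pass a StreamReader over it, for example `RowStreamer.StreamRows(new StreamReader(path), "row")`, where path is whatever file you are importing. Note that this matches the element name at any depth outside an already matched row, so it assumes the name is only used for records.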
