I'm relatively new to using C#'s XML classes. I can't even get an XML reader to recognize that the string I'm passing to it is XML. Here is the unit test I'm using to test basic XML reading:
[TestFixture()]
public class LegacyWallTests
{
[Test()]
public void ReadLegacyWallFile()
{
var legacyWallText = legacyfiles.legacywall1;
{
string xmlString = legacyfiles.legacywall1;
using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
{
reader.HasAttributes.Should().BeTrue();
}
}
}
}
And here is the XML I'm trying to read
<Wall>
<Actual>
<Specifications>
<Insertion> 375.6858 916.8871 0.0000 </Insertion>
<Angle> 3.14159 </Angle>
<WallDesc> E4-1, H: 8' 1 1/8, Sh: Yes, S: 2~4~2~9-0-0~SPF~~, Spc: Single @ 16 in OC, BP: 2~4~2~12-0-0~SYP~~, CP: 2~4~2~12-0-0~SYP~~, TP: 2~4~2~12-0-0~SYP~~,\P LI: Single @ 38.75000000, CB: No, VB: No, NCT: 2~4~2~9-0-0~SPF~~, CT: 2~4~2~9-0-0~SPF~~, Pac: 2~4~2~9-0-0~SPF~~, Mir: Yes </WallDesc>
<WallNum> 1 </WallNum>
<VaporBarrier></VaporBarrier>
</Specifications>
</Actual>
</Wall>
legacyfiles.legacywall1 is the name of the xml file I added to my project's resources. I know that the xml file is being read because outputting that string to the console gives me the xml from the file. However, when I create the XmlReader and test that there are attributes it says there aren't any. I don't know what I'm doing wrong.
XmlReader.HasAttributes returns true only if the current node has attributes. Because you never call Read(), the reader is still in its initial state and is not positioned on any node yet. Even after you advance to the root element, <Wall>, there are no attributes to find: neither it nor any of your other elements has any.
An attribute is bar in <foo bar="baz" />.
You also generally don't want to mess with XML using readers. Obtain or generate an XSD (also very useful for input validation), generate a class from that XSD and deserialize the incoming XML to an instance of that class. Then you can just access wall.Actual.Specifications[0].WallDesc.
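To make the point about HasAttributes concrete, here is a minimal sketch. The class and method names (AttributeDemo, InitialNodeType, RootHasAttributes) and the id attribute are made up for illustration; only the XmlReader calls come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeDemo
{
    // Node type reported by a freshly created reader, before any Read().
    public static XmlNodeType InitialNodeType(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return reader.NodeType;
        }
    }

    // Advances to the root element first, then asks about attributes.
    public static bool RootHasAttributes(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // skips the declaration/whitespace, stops on the root element
            return reader.HasAttributes;
        }
    }

    public static void Main()
    {
        Console.WriteLine(InitialNodeType("<Wall><Actual /></Wall>"));            // None
        Console.WriteLine(RootHasAttributes("<Wall><Actual /></Wall>"));          // False
        Console.WriteLine(RootHasAttributes("<Wall id=\"1\"><Actual /></Wall>")); // True
    }
}
```

So a test that really wants to check "this parsed as XML" would probably assert on the root element's name after MoveToContent() rather than on HasAttributes.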
We have a long living app that uses some feed that used to be xml, but was converted to json...
Of course we were "too lazy" to change the parser from reading XmlDocument to reading JObject or similar, so we used "DeserializeXmlNode" to convert from json text to XmlDocument.
All was fine for a long long time... until we updated from Newtonsoft.Json versions 4.5 and 6.0 to version 12.0.x and suddenly we started to have some problems...
let's say json looks like this:
{"version":"2.0","result":[{"mainobid":"123","typeId":"2","subobjects":{"1":{"data":"data"},"2":{"data":"data"}}}]}
what we used to get was xml having
<1><data>data</data></1><2><data>data</data></2>
tags
now... instead of the <1> tag we get something like <_x0031_>
instead of 10 there's _x0031_0
instead of 45 there's _x0034_5
and instead of 100 there's _x0031_00
Can I turn that off somehow? or am I forced now to change parsing to decode that sick x003.... thing?
INB4 1: I realize that having 1: and <1> is not the thing that anyone sane wishes to have, but i can't change that, it's external feed
INB4 2: I know we should change parsing from xml to json, but as above - some laziness and re-using old code that was working 100% fine.
EDIT:
private static void TestOldNewton()
{
var jsonstr = "{\"version\":\"2.0\",\"result\":[{\"mainobid\":\"123\",\"typeId\":\"2\",\"subobjects\":{\"1\":{\"data\":\"data\"},\"2\":{\"data\":\"data\"}}}]}";
var doc = Newtonsoft.Json.JsonConvert.DeserializeXmlNode(jsonstr, "data");
Console.WriteLine(doc.OuterXml);
Console.ReadKey();
}
using packages.config like:
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="Newtonsoft.Json" version="6.0.1" targetFramework="net48" />
</packages>
and receiving output:
<data><version>2.0</version><result><mainobid>123</mainobid><typeId>2</typeId><subobjects><1><data>data</data></1><2><data>data</data></2></subobjects></result></data>
freshly compiled and run on new, testing project.
The cause of the change is the following checkin: Fixed converting JSON to XML with invalid XML name characters to Json.NET 8.0.1. This checkin added (among other changes) calls to XmlConvert.EncodeName() inside XmlNodeConverter.CreateElement():
private IXmlElement CreateElement(string elementName, IXmlDocument document, string? elementPrefix, XmlNamespaceManager manager)
{
string encodeName = EncodeSpecialCharacters ? XmlConvert.EncodeLocalName(elementName) : XmlConvert.EncodeName(elementName);
string ns = StringUtils.IsNullOrEmpty(elementPrefix) ? manager.DefaultNamespace : manager.LookupNamespace(elementPrefix);
IXmlElement element = (!StringUtils.IsNullOrEmpty(ns)) ? document.CreateElement(encodeName, ns) : document.CreateElement(encodeName);
return element;
}
This was done to [add] support for converting JSON to XML with invalid XML name characters. This applies here because element names beginning with numerals such as <1> are not well-formed XML element names, as explained in XML tagname starting with number is not working. And in fact the XML you were previously generating was not, strictly speaking, well-formed XML.
As you can see from the code excerpt above, there doesn't seem to be a way to disable this change and create element names without encoding them.
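To see the encoding in isolation, here is a small sketch that calls XmlConvert directly on the names from the question. The class name EncodeNameDemo is only for illustration; Json.NET is not involved.

```csharp
using System;
using System.Xml;

public static class EncodeNameDemo
{
    public static void Main()
    {
        // A digit is not a valid first character of an XML name, so only the
        // leading digit is escaped as _xHHHH_; later digits are left alone.
        foreach (string name in new[] { "1", "10", "45", "100", "data" })
        {
            string encoded = XmlConvert.EncodeName(name);
            string decoded = XmlConvert.DecodeName(encoded);
            Console.WriteLine(name + " -> " + encoded + " -> " + decoded);
        }
        // 1 -> _x0031_ -> 1
        // 10 -> _x0031_0 -> 10
        // 45 -> _x0034_5 -> 45
        // 100 -> _x0031_00 -> 100
        // data -> data -> data
    }
}
```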
As a workaround, since you want to create elements with numeric names like <1> anyway, you could subclass XmlTextWriter and decode the names as they are written by calling XmlConvert.DecodeName():
This method does the reverse of the EncodeName(String) and EncodeLocalName(String) methods.
First define the following class:
public class NameEditingXmlTextWriter : XmlTextWriter
{
readonly Func<string, string, string> nameEditor;
public NameEditingXmlTextWriter(TextWriter writer, Func<string, string, string> nameEditor)
: base(writer)
{
this.nameEditor = nameEditor;
}
public override void WriteStartElement(string prefix, string localName, string ns)
{
var newLocalName = nameEditor(localName, ns);
base.WriteStartElement(prefix, newLocalName, ns);
}
}
Then use it as follows:
var doc = Newtonsoft.Json.JsonConvert.DeserializeXmlNode(jsonstr, "root");
var sb = new StringBuilder();
using (var textWriter = new StringWriter(sb))
using (var writer = new NameEditingXmlTextWriter(textWriter, (n, ns) => XmlConvert.DecodeName(n)))
{
doc.WriteTo(writer);
}
var outerXml = sb.ToString();
Notes:
You must subclass the deprecated XmlTextWriter instead of its replacement XmlWriter because XmlWriter will throw an exception on an attempt to write a malformed XML element name such as <1>.
As an alternative, since Json.NET is currently licensed under the MIT License, you could fork your own version of XmlNodeConverter and remove the calls to XmlConvert.EncodeName() from CreateElement(). However, this solution seems less desirable as it creates a maintenance requirement to keep your forked version up-to-date with Newtonsoft's version.
Demo fiddle here.
I have an XML file which contains a special character (& is the character causing issues), and I use the code below to deserialize it:
XMLDATAMODEL imported_data;
// Create an instance of the XmlSerializer specifying type and namespace.
XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));
// A FileStream is needed to read the XML document.
FileStream fs = new FileStream(path, FileMode.Open);
XmlReader reader = XmlReader.Create(fs);
// Use the Deserialize method to restore the object's state.
imported_data = (XMLDATAMODEL)serializer.Deserialize(reader);
fs.Close();
and the structure of my XML model is like this:
[XmlRoot(ElementName = "XMLDATAMODEL")]
public class XMLDATAMODEL
{
[XmlElement(ElementName = "EventName")]
public string EventName { get; set; }
[XmlElement(ElementName = "Location")]
public string Location { get; set; }
}
I tried this code as well, with the encoding specified, but with no success:
// Declare an object variable of the type to be deserialized.
StreamReader streamReader = new StreamReader(path, System.Text.Encoding.UTF8, true);
XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));
imported_data = (XMLDATAMODEL)serializer.Deserialize(streamReader);
streamReader.Close();
Both approaches failed, and if I put the special character inside CDATA it seems to work.
How can I make it work for XML data without CDATA as well?
Here is my XML file content
http://pastebin.com/Cy7icrgS
And the error I am getting is: There is an error in XML document (2, 17).
The best answer I could find after looking around is that, unless you serialize the data yourself, it will be pretty troublesome to deserialize XML with special characters.
For your case, since the special character is &, you should convert it to &amp; before you deserialize. Unless the character & is converted to &amp;, we cannot really deserialize it with XmlSerializer. Yes, we can still read it by using
XmlReaderSettings settings = new XmlReaderSettings();
settings.CheckCharacters = false; //not to check false character, this setting can be set.
FileStream fs = new FileStream(xmlfolder + "\\xmltest.xml", FileMode.Open);
XmlReader reader = XmlReader.Create(fs, settings);
But we cannot deserialize it.
As for how to convert & to &amp;, there are various ways, each with pluses and minuses. But the bottom line for all of them is: do not use the stream directly. Read the data from the file into a string using, for example, File.ReadAllText, and do the string processing there. After that, convert it to a MemoryStream and start the deserialization.
As for the string processing before deserialization, there are a couple of ways to do it.
The easiest, and often the most unsafe, would be string.Replace("&", "&amp;").
The other way, harder but safer, is by using Regex. Since your case is something inside CData, this could be a good way too.
Another way, harder yet safer, is to write your own parsing, line by line.
I have yet to find what is the common, safe, way for this conversion.
But as for your example, the string.Replace would work. Also, you could potentially exploit the pattern (something inside CData) to use Regex. This could be a good way too.
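As a rough sketch of the Regex route: the pattern below escapes only an & that does not already start one of the five predefined entities or a numeric character reference, so existing &amp; is not double-escaped. The names AmpersandRepair, EscapeBareAmpersands and Deserialize are mine, and the model class is trimmed from the question. Note the assumption that the file has no CDATA sections: an & inside CDATA would be rewritten too, which changes the data.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Serialization;

[XmlRoot(ElementName = "XMLDATAMODEL")]
public class XMLDATAMODEL
{
    [XmlElement(ElementName = "EventName")]
    public string EventName { get; set; }
    [XmlElement(ElementName = "Location")]
    public string Location { get; set; }
}

public static class AmpersandRepair
{
    // Matches '&' unless it begins &amp; &lt; &gt; &quot; &apos; &#NN; or &#xHH;
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    // Repairs the raw text first, then hands the repaired string to XmlSerializer.
    public static XMLDATAMODEL Deserialize(string rawXml)
    {
        var serializer = new XmlSerializer(typeof(XMLDATAMODEL));
        using (var reader = new StringReader(EscapeBareAmpersands(rawXml)))
        {
            return (XMLDATAMODEL)serializer.Deserialize(reader);
        }
    }
}
```

With a file on disk, the call would look something like AmpersandRepair.Deserialize(File.ReadAllText(path)).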
Edit:
As for what are considered as special characters in XML and how to process them before hand, according to this, non-Roman characters are included.
Apart from the non-Roman characters, in here, there are 5 special characters listed:
< -> &lt;
> -> &gt;
" -> &quot;
' -> &apos;
& -> &amp;
And from here, we get one more:
% -> &#37;
Hope they can help you!
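If you are producing the text yourself, the five predefined entities listed above do not have to be written by hand. A minimal sketch, assuming System.Security.SecurityElement is available on your target framework (the class name EscapeDemo and the sample string are only for illustration):

```csharp
using System;
using System.Security;
using System.Xml.Linq;

public static class EscapeDemo
{
    public static void Main()
    {
        string raw = "Tom & Jerry's <\"show\">";

        // Escape replaces <, >, ", ' and & with their predefined entities.
        string escaped = SecurityElement.Escape(raw);
        Console.WriteLine(escaped);
        // Tom &amp; Jerry&apos;s &lt;&quot;show&quot;&gt;

        // An XML parser turns the entities back into the original characters.
        string roundTripped = XElement.Parse("<v>" + escaped + "</v>").Value;
        Console.WriteLine(roundTripped == raw); // True
    }
}
```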
I know that the following would find potential tags, but is there a better way to check if a string contains XML tags to prevent exceptions when reading/writing the string between XML files?
string testWord = "test<a>";
bool foundTag = Regex.IsMatch(testWord, @"^*<*>*$");
I'd use another Regex for that
Regex.IsMatch(testWord, @"<.+?>");
However, even if it does match, there is no guarantee that your file actually is an xml file, as the regex could also match strings like "<<a>" which is invalid, or "a <= b >= c" which is obviously not xml.
You should consider using the XmlDocument class instead.
XmlDocument xmlDoc = new XmlDocument();
try
{
xmlDoc.LoadXml(testWord); // LoadXml parses a string; Load would treat it as a file path or URL
}
catch (XmlException)
{
// not well-formed XML
}
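The same check can be wrapped in a small helper. This is only a sketch using XDocument instead of XmlDocument; the names XmlCheck and IsWellFormedXml are hypothetical. It answers "is this whole string a well-formed XML document?", which is stricter than "does it contain something that looks like a tag?".

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlCheck
{
    // True only if the whole string parses as one well-formed XML document.
    public static bool IsWellFormedXml(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }
        try
        {
            XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormedXml("<a>test</a>"));  // True
        Console.WriteLine(IsWellFormedXml("test<a>"));      // False
        Console.WriteLine(IsWellFormedXml("a <= b >= c"));  // False
    }
}
```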
Why don't you HtmlEncode the string before sending it via XML? This way you can avoid difficulties with Regex parsing tags.
I'm doing an XML reading process in my project. Where I have to read the contents of an XML file. I have achieved it.
Just out of curiosity, I also tried doing the same by keeping the XML content inside a string and then reading only the values inside the element tags. Even this I have achieved. Below is my code.
string xml = @"<Login-Form>
<User-Authentication>
<username>Vikneshwar</username>
<password>xxx</password>
</User-Authentication>
<User-Info>
<firstname>Vikneshwar</firstname>
<lastname>S</lastname>
<email>xxx@xxx.com</email>
</User-Info>
</Login-Form>";
XDocument document = XDocument.Parse(xml);
var block = from file in document.Descendants("User-Authentication")
select new
{
Username = file.Element("username").Value,
Password = file.Element("password").Value,
};
foreach (var file in block)
{
Console.WriteLine(file.Username);
Console.WriteLine(file.Password);
}
Similarly, I obtained my other set of elements (firstname, lastname, and email). Now my curiosity draws me again. Now I'm thinking of doing the same using the string functions?
The same string used in the above code is to be used. I'm trying not to use any XML-related classes, that is, XDocument, XmlReader, etc. The same output should be achieved using only string functions. I'm not able to do that. Is it possible?
Don't do it. XML is more complex than can appear the case, with complex rules surrounding nesting, character-escaping, named-entities, namespaces, ordering (attributes vs elements), comments, unparsed character data, and whitespace. For example, just add
<!--
<username>evil</username>
-->
Or
<parent xmlns="this:is-not/the/data/you/expected">
<username>evil</username>
</parent>
Or maybe the same in a CDATA section - and see how well basic string-based approaches work. Hint: you'll get a different answer to what you get via a DOM.
Using a dedicated tool designed for reading XML is the correct approach. At the minimum, use XmlReader - but frankly, a DOM (such as your existing code) is much more convenient. Alternatively, use a serializer such as XmlSerializer to populate an object model, and query that.
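The comment case is easy to try out. In this sketch (class and method names are made up, and the XML is a cut-down version of the one in the question), a naive IndexOf/Substring search picks up the commented-out value, while the DOM ignores it:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentPitfall
{
    public const string Xml = @"<User-Authentication>
  <!--
  <username>evil</username>
  -->
  <username>Vikneshwar</username>
</User-Authentication>";

    // String-only approach: takes the first "<username>" it sees, comment or not.
    public static string ViaStringSearch(string xml)
    {
        const string open = "<username>";
        const string close = "</username>";
        int start = xml.IndexOf(open, StringComparison.Ordinal) + open.Length;
        int end = xml.IndexOf(close, start, StringComparison.Ordinal);
        return xml.Substring(start, end - start);
    }

    // DOM approach: comments are not elements, so only the real one is found.
    public static string ViaDom(string xml)
    {
        return XDocument.Parse(xml).Descendants("username").First().Value;
    }

    public static void Main()
    {
        Console.WriteLine(ViaStringSearch(Xml)); // evil
        Console.WriteLine(ViaDom(Xml));          // Vikneshwar
    }
}
```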
Trying to properly parse xml and xml-like data does not end well.... RegEx match open tags except XHTML self-contained tags
You could use methods like IndexOf, Equals, Substring etc. provided in String class to fulfill your needs, for more info Go here,
Using Regex is a considerable option too.
But it's advisable to use XmlDocument class for this purpose.
It can be done without regular expressions, like this:
string[] elementNames = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames)
{
int startingIndex = xml.IndexOf(elementName);
string value = xml.Substring(startingIndex + elementName.Length,
xml.IndexOf(elementName.Insert(1, "/"))
- (startingIndex + elementName.Length));
Console.WriteLine(value);
}
With a regular expression:
string[] elementNames2 = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames2)
{
string value = Regex.Match(xml, String.Concat(elementName, "(.*)",
elementName.Insert(1, "/"))).Groups[1].Value;
Console.WriteLine(value);
}
Of course, the only recommended thing is to use the XML parsing classes.
Build an extension method that will get the text between tags like this:
public static class StringExtension
{
public static string Between(this string content, string start, string end)
{
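int startIndex = content.IndexOf(start);
if (startIndex < 0) return null; // start marker not found
startIndex += start.Length;
// Look for the end marker only after the start marker.
int endIndex = content.IndexOf(end, startIndex);
if (endIndex < 0) return null; // end marker not found
return content.Substring(startIndex, endIndex - startIndex);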
}
}
I'm currently searching for an easy way to serialize objects (in C# 3).
I googled some examples and came up with something like:
MemoryStream memoryStream = new MemoryStream ( );
XmlSerializer xs = new XmlSerializer ( typeof ( MyObject) );
XmlTextWriter xmlTextWriter = new XmlTextWriter ( memoryStream, Encoding.UTF8 );
xs.Serialize ( xmlTextWriter, myObject);
string result = Encoding.UTF8.GetString(memoryStream .ToArray());
After reading this question I asked myself, why not using StringWriter? It seems much easier.
XmlSerializer ser = new XmlSerializer(typeof(MyObject));
StringWriter writer = new StringWriter();
ser.Serialize(writer, myObject);
serializedValue = writer.ToString();
Another Problem was, that the first example generated XML I could not just write into an XML column of SQL Server 2005 DB.
The first question is: Is there a reason why I shouldn't use StringWriter to serialize an Object when I need it as a string afterwards? I never found a result using StringWriter when googling.
The second is, of course: If you should not do it with StringWriter (for whatever reasons), which would be a good and correct way?
Addition:
As it was already mentioned by both answers, I'll further go into the XML to DB problem.
When writing to the Database I got the following exception:
System.Data.SqlClient.SqlException:
XML parsing: line 1, character 38,
unable to switch the encoding
For string
<?xml version="1.0" encoding="utf-8"?><test/>
I took the string created from the XmlTextWriter and just put as xml there. This one did not work (neither with manual insertion into the DB).
Afterwards I tried manual insertion (just writing INSERT INTO ... ) with encoding="utf-16" which also failed.
Removing the encoding totally worked then. After that result I switched back to the StringWriter code and voila - it worked.
Problem: I don't really understand why.
@Christian Hayter: With those tests I'm not sure that I have to use utf-16 to write to the DB. Wouldn't setting the encoding to UTF-16 (in the xml tag) work then?
One problem with StringWriter is that by default it doesn't let you set the encoding which it advertises - so you can end up with an XML document advertising its encoding as UTF-16, which means you need to encode it as UTF-16 if you write it to a file. I have a small class to help with that though:
public sealed class StringWriterWithEncoding : StringWriter
{
public override Encoding Encoding { get; }
public StringWriterWithEncoding (Encoding encoding)
{
Encoding = encoding;
}
}
Or if you only need UTF-8 (which is all I often need):
public sealed class Utf8StringWriter : StringWriter
{
public override Encoding Encoding => Encoding.UTF8;
}
As for why you couldn't save your XML to the database - you'll have to give us more details about what happened when you tried, if you want us to be able to diagnose/fix it.
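For illustration, here is a sketch of how the Utf8StringWriter above changes the declaration that XmlSerializer writes. MyObject is a stand-in type with one made-up property, and EncodingDemo is just a wrapper for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

// Stand-in for the question's MyObject; the Name property is hypothetical.
public class MyObject
{
    public string Name { get; set; }
}

public static class EncodingDemo
{
    // The serializer takes the declared encoding from writer.Encoding.
    public static string Serialize(MyObject value, StringWriter writer)
    {
        new XmlSerializer(typeof(MyObject)).Serialize(writer, value);
        return writer.ToString();
    }

    public static void Main()
    {
        var obj = new MyObject { Name = "test" };

        // Plain StringWriter: the declaration says encoding="utf-16".
        Console.WriteLine(Serialize(obj, new StringWriter()));

        // Utf8StringWriter: the declaration says encoding="utf-8".
        Console.WriteLine(Serialize(obj, new Utf8StringWriter()));
    }
}
```

Either way the result is still a .NET string; only the advertised encoding differs, which matters once the text is written out as bytes.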
When serialising an XML document to a .NET string, the encoding must be set to UTF-16. Strings are stored as UTF-16 internally, so this is the only encoding that makes sense. If you want to store data in a different encoding, you use a byte array instead.
SQL Server works on a similar principle; any string passed into an xml column must be encoded as UTF-16. SQL Server will reject any string where the XML declaration does not specify UTF-16. If the XML declaration is not present, then the XML standard requires that it default to UTF-8, so SQL Server will reject that as well.
Bearing this in mind, here are some utility methods for doing the conversion.
public static string Serialize<T>(T value) {
if(value == null) {
return null;
}
XmlSerializer serializer = new XmlSerializer(typeof(T));
XmlWriterSettings settings = new XmlWriterSettings()
{
Encoding = new UnicodeEncoding(false, false), // no BOM in a .NET string
Indent = false,
OmitXmlDeclaration = false
};
using(StringWriter textWriter = new StringWriter()) {
using(XmlWriter xmlWriter = XmlWriter.Create(textWriter, settings)) {
serializer.Serialize(xmlWriter, value);
}
return textWriter.ToString();
}
}
public static T Deserialize<T>(string xml) {
if(string.IsNullOrEmpty(xml)) {
return default(T);
}
XmlSerializer serializer = new XmlSerializer(typeof(T));
XmlReaderSettings settings = new XmlReaderSettings();
// No settings need modifying here
using(StringReader textReader = new StringReader(xml)) {
using(XmlReader xmlReader = XmlReader.Create(textReader, settings)) {
return (T) serializer.Deserialize(xmlReader);
}
}
}
First of all, beware of finding old examples. You've found one that uses XmlTextWriter, which is deprecated as of .NET 2.0. XmlWriter.Create should be used instead.
Here's an example of serializing an object into an XML column:
public void SerializeToXmlColumn(object obj)
{
using (var outputStream = new MemoryStream())
{
using (var writer = XmlWriter.Create(outputStream))
{
var serializer = new XmlSerializer(obj.GetType());
serializer.Serialize(writer, obj);
}
outputStream.Position = 0;
using (var conn = new SqlConnection(Settings.Default.ConnectionString))
{
conn.Open();
const string INSERT_COMMAND = @"INSERT INTO XmlStore (Data) VALUES (@Data)";
using (var cmd = new SqlCommand(INSERT_COMMAND, conn))
{
using (var reader = XmlReader.Create(outputStream))
{
var xml = new SqlXml(reader);
cmd.Parameters.Clear();
cmd.Parameters.AddWithValue("@Data", xml);
cmd.ExecuteNonQuery();
}
}
}
}
}
<TL;DR> The problem is rather simple, actually: you are not matching the declared encoding (in the XML declaration) with the datatype of the input parameter. If you manually added <?xml version="1.0" encoding="utf-8"?><test/> to the string, then declaring the SqlParameter to be of type SqlDbType.Xml or SqlDbType.NVarChar would give you the "unable to switch the encoding" error. Then, when inserting manually via T-SQL, since you switched the declared encoding to be utf-16, you were clearly inserting a VARCHAR string (not prefixed with an upper-case "N", hence an 8-bit encoding, such as UTF-8) and not an NVARCHAR string (prefixed with an upper-case "N", hence the 16-bit UTF-16 LE encoding).
The fix should have been as simple as:
In the first case, when adding the declaration stating encoding="utf-8": simply don't add the XML declaration.
In the second case, when adding the declaration stating encoding="utf-16": either
simply don't add the XML declaration, OR
simply add an "N" to the input parameter type: SqlDbType.NVarChar instead of SqlDbType.VarChar :-) (or possibly even switch to using SqlDbType.Xml)
(Detailed response is below)
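If you take the "simply don't add the XML declaration" route from C#, one way to get a declaration-free string is to serialize through an XmlWriter with OmitXmlDeclaration set. This is a sketch only: MyObject and its Name property are stand-ins, and NoDeclarationDemo is a hypothetical helper, not code from the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Stand-in for the question's MyObject; the Name property is hypothetical.
public class MyObject
{
    public string Name { get; set; }
}

public static class NoDeclarationDemo
{
    public static string SerializeWithoutDeclaration<T>(T value)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var serializer = new XmlSerializer(typeof(T));
        using (var textWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(textWriter, settings))
            {
                serializer.Serialize(xmlWriter, value);
            }
            // No <?xml ...?> line, so there is no declared encoding to mismatch.
            return textWriter.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(SerializeWithoutDeclaration(new MyObject { Name = "test" }));
    }
}
```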
All of the answers here are over-complicated and unnecessary (regardless of the 121 and 184 up-votes for Christian's and Jon's answers, respectively). They might provide working code, but none of them actually answer the question. The issue is that nobody truly understood the question, which ultimately is about how the XML datatype in SQL Server works. Nothing against those two clearly intelligent people, but this question has little to nothing to do with serializing to XML. Saving XML data into SQL Server is much easier than what is being implied here.
It doesn't really matter how the XML is produced as long as you follow the rules of how to create XML data in SQL Server. I have a more thorough explanation (including working example code to illustrate the points outlined below) in an answer on this question: How to solve “unable to switch the encoding” error when inserting XML into SQL Server, but the basics are:
The XML declaration is optional
The XML datatype stores strings always as UCS-2 / UTF-16 LE
If your XML is UCS-2 / UTF-16 LE, then you:
pass in the data as either NVARCHAR(MAX) or XML / SqlDbType.NVarChar (maxsize = -1) or SqlDbType.Xml, or if using a string literal then it must be prefixed with an upper-case "N".
if specifying the XML declaration, it must be either "UCS-2" or "UTF-16" (no real difference here)
If your XML is 8-bit encoded (e.g. "UTF-8" / "iso-8859-1" / "Windows-1252"), then you:
need to specify the XML declaration IF the encoding is different than the code page specified by the default Collation of the database
you must pass in the data as VARCHAR(MAX) / SqlDbType.VarChar (maxsize = -1), or if using a string literal then it must not be prefixed with an upper-case "N".
Whatever 8-bit encoding is used, the "encoding" noted in the XML declaration must match the actual encoding of the bytes.
The 8-bit encoding will be converted into UTF-16 LE by the XML datatype
With the points outlined above in mind, and given that strings in .NET are always UTF-16 LE / UCS-2 LE (there is no difference between those in terms of encoding), we can answer your questions:
Is there a reason why I shouldn't use StringWriter to serialize an Object when I need it as a string afterwards?
No, your StringWriter code appears to be just fine (at least I see no issues in my limited testing using the 2nd code block from the question).
Wouldn't setting the encoding to UTF-16 (in the xml tag) work then?
It isn't necessary to provide the XML declaration. When it is missing, the encoding is assumed to be UTF-16 LE if you pass the string into SQL Server as NVARCHAR (i.e. SqlDbType.NVarChar) or XML (i.e. SqlDbType.Xml). The encoding is assumed to be the default 8-bit Code Page if passing in as VARCHAR (i.e. SqlDbType.VarChar). If you have any non-standard-ASCII characters (i.e. values 128 and above) and are passing in as VARCHAR, then you will likely see "?" for BMP characters and "??" for Supplementary Characters as SQL Server will convert the UTF-16 string from .NET into an 8-bit string of the current Database's Code Page before converting it back into UTF-16 / UCS-2. But you shouldn't get any errors.
On the other hand, if you do specify the XML declaration, then you must pass into SQL Server using the matching 8-bit or 16-bit datatype. So if you have a declaration stating that the encoding is either UCS-2 or UTF-16, then you must pass in as SqlDbType.NVarChar or SqlDbType.Xml. Or, if you have a declaration stating that the encoding is one of the 8-bit options (i.e. UTF-8, Windows-1252, iso-8859-1, etc), then you must pass in as SqlDbType.VarChar. Failure to match the declared encoding with the proper 8 or 16 -bit SQL Server datatype will result in the "unable to switch the encoding" error that you were getting.
For example, using your StringWriter-based serialization code, I simply printed the resulting string of the XML and used it in SSMS. As you can see below, the XML declaration is included (because StringWriter does not have an option to OmitXmlDeclaration like XmlWriter does), which poses no problem so long as you pass the string in as the correct SQL Server datatype:
-- Upper-case "N" prefix == NVARCHAR, hence no error:
DECLARE @Xml XML = N'<?xml version="1.0" encoding="utf-16"?>
<string>Test ሴ😸</string>';
SELECT @Xml;
-- <string>Test ሴ😸</string>
As you can see, it even handles characters beyond standard ASCII, given that ሴ is BMP Code Point U+1234, and 😸 is Supplementary Character Code Point U+1F638. However, the following:
-- No upper-case "N" prefix on the string literal, hence VARCHAR:
DECLARE @Xml XML = '<?xml version="1.0" encoding="utf-16"?>
<string>Test ሴ😸</string>';
results in the following error:
Msg 9402, Level 16, State 1, Line XXXXX
XML parsing: line 1, character 39, unable to switch the encoding
Ergo, all of that explanation aside, the full solution to your original question is:
You were clearly passing the string in as SqlDbType.VarChar. Switch to SqlDbType.NVarChar and it will work without needing to go through the extra step of removing the XML declaration. This is preferred over keeping SqlDbType.VarChar and removing the XML declaration because this solution will prevent data loss when the XML includes non-standard-ASCII characters. For example:
-- No upper-case "N" prefix on the string literal == VARCHAR, and no XML declaration:
DECLARE @Xml2 XML = '<string>Test ሴ😸</string>';
SELECT @Xml2;
-- <string>Test ???</string>
As you can see, there is no error this time, but now there is data-loss 🙀.
public static T DeserializeFromXml<T>(string xml)
{
T result;
XmlSerializerFactory serializerFactory = new XmlSerializerFactory();
XmlSerializer serializer =serializerFactory.CreateSerializer(typeof(T));
using (StringReader sr3 = new StringReader(xml))
{
XmlReaderSettings settings = new XmlReaderSettings()
{
CheckCharacters = false // default value is true;
};
using (XmlReader xr3 = XmlReader.Create(sr3, settings))
{
result = (T)serializer.Deserialize(xr3);
}
}
return result;
}
For anyone in need of an F# version of the approved answer:
type private Utf8StringWriter() =
inherit StringWriter()
override _.Encoding = System.Text.Encoding.UTF8
It may have been covered elsewhere, but simply changing the encoding in the declaration line of the XML source to 'utf-16' allows the XML to be inserted into a SQL Server 'xml' data type.
using (DataSetTableAdapters.SQSTableAdapter tbl_SQS = new DataSetTableAdapters.SQSTableAdapter())
{
try
{
bodyXML = @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?><test></test>";
bodyXMLutf16 = bodyXML.Replace("UTF-8", "UTF-16");
tbl_SQS.Insert(messageID, receiptHandle, md5OfBody, bodyXMLutf16, sourceType);
}
catch (System.Data.SqlClient.SqlException ex)
{
Console.WriteLine(ex.Message);
Console.ReadLine();
}
}
The result is all of the XML text is inserted into the 'xml' data type field but the 'header' line is removed. What you see in the resulting record is just
<test></test>
Using the serialization method described in the "Answered" entry is a way of including the original header in the target field but the result is that the remaining XML text is enclosed in an XML <string></string> tag.
The table adapter in the code is a class automatically built using the Visual Studio 2013 "Add New Data Source" wizard. The five parameters to the Insert method map to fields in a SQL Server table.