XDocument prevent invalid characters - c#

I am using XDocument to keep a sort of database. It consists of registered chatterbots: many "bot" nodes with attributes such as "username" and "owner". Occasionally, however, some smart guy creates a bot with a very strange character in one of its properties. The XDocument classes then throw an exception whenever that node is written out, which is a serious problem: saving stops as soon as it hits the invalid character, so the database is never saved completely.
My question is this: is there a simple method, something like XSomething.IsValidString(string s), so I can just omit the offending data? My database is not the official one, just for personal use, so it is not imperative that I include the bad data.
Some code that I am using (the variable file is the XDocument):
To save:
file.Save(Path.Combine(Environment.CurrentDirectory, "bots.xml"));
To load (after checking if File.Exists() etc etc):
file = XDocument.Load(Path.Combine(Environment.CurrentDirectory, "bots.xml"));
To add to the database (variables are all strings):
file.Root.Add(new XElement("bot",
new XAttribute("username", botusername),
new XAttribute("type", type),
new XAttribute("botversion", botversion),
new XAttribute("bdsversion", bdsversion),
new XAttribute("owner", owner),
new XAttribute("trigger", trigger)));
Pardon my lack of proper XML techniques, I'm just starting. What I'm asking is if there is a XSomething.IsValidString(string s) method, not how terrible my XML is.
Ok, I just got the exception again, here is the exact message and stack trace.
System.ArgumentException: '', hexadecimal value 0x07, is an invalid character.
at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
at System.Xml.XmlUtf8RawTextWriter.WriteAttributeTextBlock(Char* pSrc, Char* pSrcEnd)
at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
at System.Xml.XmlUtf8RawTextWriterIndent.WriteString(String text)
at System.Xml.XmlWellFormedWriter.WriteString(String text)
at System.Xml.XmlWriter.WriteAttributeString(String prefix, String localName, String ns, String value)
at System.Xml.Linq.ElementWriter.WriteStartElement(XElement e)
at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
at System.Xml.Linq.XContainer.WriteContentTo(XmlWriter writer)
at System.Xml.Linq.XDocument.WriteTo(XmlWriter writer)
at System.Xml.Linq.XDocument.Save(String fileName, SaveOptions options)
at System.Xml.Linq.XDocument.Save(String fileName)
at /* my code stack trace omitted */

Try replacing the file.Save line with the following code:
XmlWriterSettings settings = new XmlWriterSettings();
settings.CheckCharacters = false;
// Dispose the writer so that the output is flushed and the file is closed.
using (XmlWriter writer = XmlWriter.Create(Path.Combine(Environment.CurrentDirectory, "bots.xml"), settings))
{
file.Save(writer);
}
source: http://sartorialsolutions.wordpress.com/page/2/
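One consequence worth checking: as far as I know, a writer with CheckCharacters = false emits the offending character as a numeric character reference (for example &#x7;), and a reader with default settings rejects that reference when the file is loaded again. The sketch below round-trips through a string instead of bots.xml to keep it self-contained; the class name LenientXml and its methods are hypothetical helpers, not part of the original answer.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LenientXml
{
    // Serializes the document with character checking turned off, so
    // characters such as 0x07 do not make the writer throw.
    public static string Save(XDocument doc)
    {
        var settings = new XmlWriterSettings { CheckCharacters = false, Indent = true };
        using (var text = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                doc.Save(writer);
            } // disposing the writer flushes everything into 'text'
            return text.ToString();
        }
    }

    // Parses XML with character checking turned off on the reader as well,
    // so character references to otherwise invalid characters are accepted.
    public static XDocument Load(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Files produced this way are not strictly valid XML 1.0, so other tools may refuse them; that seems acceptable for a personal database but is a trade-off to be aware of.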

First, can you check whether your XML file is saved with the proper encoding? I normally save XML files as UTF-8, and you can declare the encoding in the XML header:
<?xml version="1.0" encoding="UTF-8"?>
Of course, the body of your XML must conform to the XML standard. Here is a good article about it:
http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx

From .NET 4, you can use XmlConvert.VerifyXmlChars(string content). It returns the string unchanged if every character is valid in XML and throws an XmlException otherwise.
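A minimal sketch of the IsValidString-style helper the question asks for, built on XmlConvert.VerifyXmlChars. The class and method names (XmlText, IsValidXmlString, StripInvalidXmlChars) are made up for illustration, and the code assumes non-null input.

```csharp
using System.Text;
using System.Xml;

public static class XmlText
{
    // True when every character in s may appear in an XML document.
    public static bool IsValidXmlString(string s)
    {
        try
        {
            XmlConvert.VerifyXmlChars(s);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Alternative to rejecting the whole value: drop only the bad characters.
    public static string StripInvalidXmlChars(string s)
    {
        var result = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (XmlConvert.IsXmlChar(c))
            {
                result.Append(c);
            }
            else if (i + 1 < s.Length && XmlConvert.IsXmlSurrogatePair(s[i + 1], c))
            {
                // Keep valid surrogate pairs (characters outside the BMP) intact.
                result.Append(c).Append(s[i + 1]);
                i++;
            }
        }
        return result.ToString();
    }
}
```

With this in place, each attribute value could be checked, or passed through StripInvalidXmlChars, before it is handed to new XAttribute(...).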

Related

when using open xml my file gets corrupt

EDIT:
The problem is now solved: the document's XML contains attributes named 'name', which I was accidentally changing. The solution was to use an obscure placeholder name in the docx file.
I am creating a program that modifies a Word document using Open XML, but every time the program runs the file gets corrupted, and I don't know why or whether there is any way around it.
One thing I read was to make sure I had closed the connection. I tried that, but I'm not sure whether the connection is still open.
edit:
Word reports the output file as corrupt, but after Word's recovery runs, the file is as it should be.
From the images/code: the original file is copied to temp.docx and has "name" in it.
I need the program to replace "name" with another word.
The program is semi-working: it changes the value in the document, but it corrupts the document.
link to photos: https://drive.google.com/open?id=0B130JvN0ZPPRODJpZWZENTNUX0E
CODE
private void gen_btn_Click(object sender, EventArgs e)
{
if (System.IO.File.Exists(@"C:\invoices\temp.docx"))
{
// Use a try block to catch IOExceptions, to
// handle the case of the file already being
// opened by another process.
try
{
System.IO.File.Delete(@"C:\invoices\temp.docx");
}
catch (System.IO.IOException exception)
{
Console.WriteLine(exception.Message);
return;
}
}
File.Copy(@"C:\invoices\template.docx", @"C:\invoices\temp.docx");
SearchAndReplace("name", "asdsadsadasdasdas");
}
public static void SearchAndReplace(string wordtoreplace, string replace)
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(@"C:\invoices\temp.docx", true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
//Regex regexText = new Regex(wordtoreplace);
docText = docText.Replace(wordtoreplace, replace);
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
wordDoc.Close();
}
}
The problem is that the document stream you are opening is an XML document. It contains much more than the words that are typed in your document. There are XML attributes named "name" that are being replaced by your code which make the document no longer validate against the schema.
You can continue doing a plain text replace if you use more unique terms. For example, if your search term is "asdf", then it would be pretty safe to replace because that value won't appear in the XML schema.
To do this correctly, you need to parse the XML document. The XML elements that contain the actual text are named "w:t". If you loop through all of the "w:t" XML elements, you can do your plain text replace on their "InnerText" values. This will guarantee that your XML will remain valid.
Note that you will still have problems if you try to parse the XML directly... If you type your token text ("name" in this case), then apply some kind of format (like bold) to the middle of the word, you will no longer be able to find "name" in a single "w:t" element. By applying the format, the text "name" will be broken up into more than one "w:t" elements. To get this to work in my project, I applied an intermediate step that merged the "w:t" elements before I searched for the tokens. The trick here is knowing when the elements can't be merged due to formatting differences.
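The "loop through the w:t elements" approach can be sketched with LINQ to XML on the main document part's XML text. This is only an illustration under some assumptions: the helper name WordTextReplacer is made up, the input is the XML string read from the part (as docText is in the question), and the merging of split runs described above is not handled.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class WordTextReplacer
{
    // Namespace bound to the "w" prefix in WordprocessingML documents.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces token only inside w:t elements; element and attribute names
    // and attribute values elsewhere in the markup are left untouched.
    public static string ReplaceInTextRuns(string documentXml, string token, string replacement)
    {
        XDocument doc = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);
        // ToList() so the tree is not modified while it is being enumerated.
        foreach (XElement t in doc.Descendants(W + "t").ToList())
        {
            t.Value = t.Value.Replace(token, replacement);
        }
        // Note: ToString() omits the XML declaration.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

In the question's SearchAndReplace method, the result of this call would take the place of docText.Replace(wordtoreplace, replace) before the text is written back to the part.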

XslCompiledTransform can't transform with XmlTextReader created from string

I have a problem with XslCompiledTransform class.
If I run this code:
string pathToXsltFile, pathToInputFile, pathToOutputFile;
XsltSettings xsltSettings = new XsltSettings(true, true);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
XmlTextReader reader = new XmlTextReader(pathToXsltFile);
myXslTransform.Load(reader, xsltSettings, new XmlUrlResolver());
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
It works fine.
But if I want to create XmlTextReader from a string (text):
MemoryStream mStrm = new System.IO.MemoryStream(Encoding.UTF8.GetBytes(text));
XmlTextReader xmlReader = new XmlTextReader(mStrm);
mStrm.Position = 0;
And try to run:
myXslTransform.Load(xmlReader, xsltSettings, new XmlUrlResolver());
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
I get an exception:
"this operation is not supported for a relative uri"
For various reasons I don't want to create a temporary file and build the XmlTextReader from the path to that file.
Edit:
Full exception message:
"An error occurred while loading document ''.
See InnerException for a complete description of the error."
InnerException.Message:
"This operation is not supported for a relative URI."
Stack trace:
at System.Xml.Xsl.Runtime.XmlQueryContext.GetDataSource(String uriRelative, String uriBase)
at <xsl:template match=\"gmgml:FeatureCollection\">(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime, XPathNavigator {urn:schemas-microsoft-com:xslt-debug}current)
at <xsl:apply-templates>(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime, XPathNavigator )
at Root(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime)
at Execute(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime)
at System.Xml.Xsl.XmlILCommand.Execute(Object defaultDocument, XmlResolver dataSources, XsltArgumentList argumentList, XmlSequenceWriter results)
at System.Xml.Xsl.XmlILCommand.Execute(Object defaultDocument, XmlResolver dataSources, XsltArgumentList argumentList, XmlWriter writer, Boolean closeWriter)
at System.Xml.Xsl.XmlILCommand.Execute(XmlReader contextDocument, XmlResolver dataSources, XsltArgumentList argumentList, Stream results)
at System.Xml.Xsl.XslCompiledTransform.Transform(String inputUri, String resultsFile)
at MyNamespace.ApplyXslTransformation1(String input, String output, String xsltFileName)
the statement causing the exception:
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
About the document function, I will have to ask tomorrow. I got the XSLT file from another person.
When I created the XmlTextReader from the path to the XSLT file, everything was fine. I also tried:
myXslTransform.Load(pathToXsltFile, xsltSettings, new XmlUrlResolver());
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
And it was also fine.
Now I receive the XSLT encrypted. I decrypt it and want to create the XmlTextReader from the decrypted string. For security reasons, I don't want to create a temporary decrypted XSLT file.
I think we need to see the XSLT and any calls to the document function it does. In general you need to be aware that the document function has a second argument that can serve as a base URI to resolve URIs resulting from the first argument. Without the second argument being passed in as in e.g. <xsl:value-of select="document('foo.xml')"/> the stylesheet code itself provides the base URI. If you load the stylesheet code from a string that mechanism might not resolve URIs the same way as it happens with a stylesheet loaded from the file system or a HTTP URI. The solution to that problem depends on the location of the resource you want to load and how that relates to the main input file. If you want to load foo.xml from the same location as the main input document then doing document('foo.xml', /) instead of document('foo.xml') should work.
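If the stylesheet cannot be changed, another option that may work is to give the in-memory stylesheet a base URI when creating the reader, so that relative document() calls have something to resolve against. This is a sketch under assumptions, not a confirmed fix for the asker's stylesheet: StylesheetFromString is a hypothetical helper, and the base URI passed in would be the absolute location the stylesheet is supposed to "live" at (the file itself does not need to exist).

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StylesheetFromString
{
    // Compiles a stylesheet held in memory. baseUri is an absolute URI
    // (for example file:///C:/xslt/stylesheet.xsl) used to resolve
    // relative URIs such as document('foo.xml').
    public static XslCompiledTransform Load(string xsltText, string baseUri)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(
            new StringReader(xsltText), new XmlReaderSettings(), baseUri))
        {
            // document() enabled, script blocks disabled.
            transform.Load(reader, new XsltSettings(true, false), new XmlUrlResolver());
        }
        return transform;
    }

    // Runs the transform on an XML string, passing an explicit resolver
    // for the documents that document() loads.
    public static string Run(XslCompiledTransform transform, string inputXml)
    {
        using (var output = new StringWriter())
        {
            using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
            using (XmlWriter writer = XmlWriter.Create(output, transform.OutputSettings))
            {
                transform.Transform(input, null, writer, new XmlUrlResolver());
            }
            return output.ToString();
        }
    }
}
```

Because the decrypted stylesheet only ever exists as a string, nothing is written to disk; only the files it references through document() need to be reachable from the base URI.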
I think this is caused by your manual setting of the memory stream's position to 0; you're confusing the XmlTextReader.
I tried the above and it works fine for me when I comment that line out.
Is there a particular reason you are setting it to 0?
Assuming this question is about using XslCompiledTransform in a .Net Core application, I found the answer to "This operation is not supported for a relative URI." at the site https://github.com/dotnet/corefx/issues/31390
The relevant answer (by vcsjones commented on Jul 26, 2018) is:
"I believe you are running in to a known compatibility change. .NET Core does not allow resolving external URIs for XML by default and is documented here.
As the documentation says, the old behavior can be restored, if you so choose, by putting
AppContext.SetSwitch("Switch.System.Xml.AllowDefaultResolver", true);
In your application. Try placing that at the top of your example program."
When I added
AppContext.SetSwitch("Switch.System.Xml.AllowDefaultResolver", true);
as the top line of
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
in startup, the error "This operation is not supported for a relative URI" went away. At that moment, a new error occurred calling Load with a XmlReader relating to finding the other files referenced by the XSL file. When I then instead passed the file path to the xsl in Load, it all worked as expected.
var resolver = new XmlUrlResolver {Credentials = CredentialCache.DefaultCredentials};
var transform = new XslCompiledTransform();
transform.Load(XslPath, new XsltSettings(true, true), resolver);
var settings = new XmlWriterSettings {OmitXmlDeclaration = true};
using (var results = new StringWriter())
using(var writer = XmlWriter.Create(results, settings))
{
using (var reader = XmlReader.Create(new StringReader(document)))
{
transform.Transform(reader, writer);
}
return results.ToString();
}
I add this in the hope that it helps someone else trying to debug why XslCompiledTransform throws "This operation is not supported for a relative URI." in .NET Core.

XmlSerializer Deserialize fails in release mode

This is pretty odd. I have a configuration file which is well formed XML. I create a stream from the file and serialize it using what seems to be pretty typical code:
TextWriter tw = new StreamWriter(tempFile);
I use a serializer created as follows:
XmlSerializer ConfigSettingSerializer = new XmlSerializer(typeof(ConfigSettings));
Where ConfigSettings is just a container class containing string variables and values.
I then take the serialized stream and stash it away as a configuration using the ConfigurationManager class and AppSettings. I then retrieve the serialized data from appSettings and attempt to convert the stream back to the original class:
string configXml = ConfigurationManager.AppSettings[Id];
using (StringReader reader = new StringReader(configXml))
{
retVal = (ConfigSettings)MVHelper.ConfigSettingSerializer.Deserialize(reader);
}
This all works perfectly well until I switch from Debug to Release, when I get an error on the Deserialize call about invalid XML, complaining about the very last character in the document: There is an error in XML document (92, 18). The inner exception is: "Data at the root level is invalid. Line 92, position 18". The document is identical to the one generated in debug mode, and it renders fine in any browser. My guess is that there may be something else going on and that the real error is somehow being masked, but so far I don't see it. Any advice would be greatly appreciated.
Thanks,
Gary
Load the XML file in a hex editor or other binary editor and check for unprintable characters like an encoding preamble.
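The same check can be done in code on the string that comes back from AppSettings. The helper below is a hypothetical sketch (TextInspector is not an existing API): it lists the position and code point of every control character other than tab, CR, and LF, plus any U+FEFF byte order mark.

```csharp
using System.Collections.Generic;

public static class TextInspector
{
    // Returns entries such as "0: U+FEFF" for characters that are usually
    // invisible in an editor but can break XML parsing.
    public static List<string> FindSuspiciousChars(string text)
    {
        var hits = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool ordinaryWhitespace = c == '\t' || c == '\r' || c == '\n';
            bool suspicious = (char.IsControl(c) && !ordinaryWhitespace) || c == '\uFEFF';
            if (suspicious)
            {
                hits.Add(i + ": U+" + ((int)c).ToString("X4"));
            }
        }
        return hits;
    }
}
```

Calling FindSuspiciousChars(configXml) just before Deserialize and logging the result should show whether stray characters appear at the start or end of the data in the Release build.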

validate xml string content including encoding using C#

I need to validate a string that contains XML Data, there is no schema validation required. All I need to do is make sure that the XML is well formed and properly encoded. For example, I want my code to identify this snippet of XML as invalid:
<?xml version="1.0" encoding="utf-8"?>
<parentNode> Positions1 ’</parentNode>
Using the LoadXml method of XmlDocument does not work; no errors are thrown when I load the snippet above.
I am aware of how to do this if the content were in an XML file, the following snippet of code shows that:
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.ConformanceLevel = ConformanceLevel.Document;
readerSettings.CheckCharacters = true;
readerSettings.ValidationType = ValidationType.None;
XmlReader xmlReader = XmlReader.Create(xmlFileName, readerSettings);
XmlDocument xdoc = new XmlDocument();
xdoc.Load(xmlReader);
So short of creating a temporary file to write out my xml string content and then creating an XmlReader instance to read it, is there any alternative? Appreciate much if someone could guide me in the right direction with this problem.
You have not fully understood what encoding means. A .NET string in memory is no longer "raw data", so it has no encoding, and LoadXml ignores the encoding declaration for that good reason. So what you want to do does not make much sense. But if you really want to do it:
You can convert your string into an in-memory stream, so you don't have to write a temporary file. Then you can pass that stream instead of xmlFileName in your call to XmlReader.Create.
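A minimal sketch of that suggestion, reusing the reader settings from the question. The XmlChecker class is a made-up name, and the code assumes the string either declares encoding="utf-8" or has no declaration, since the bytes handed to the reader are UTF-8.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlChecker
{
    // Returns true when xmlContent parses as a well-formed XML document.
    public static bool IsWellFormed(string xmlContent)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Document,
            CheckCharacters = true,
            ValidationType = ValidationType.None
        };
        byte[] bytes = Encoding.UTF8.GetBytes(xmlContent);
        try
        {
            using (var stream = new MemoryStream(bytes))
            using (XmlReader reader = XmlReader.Create(stream, settings))
            {
                // Reading to the end forces the parser over the whole document.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

This only checks well-formedness of the text as it exists in memory; it cannot detect that the original bytes were decoded with the wrong encoding before they became a string.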
Achim,
Thanks for your detailed replies, I was able to finally come up with a solution that fits my needs. It involves grabbing the bytes out of the 'unicode' string and then transforming the bytes to utf8 encoding.
try
{
byte[] xmlContentInBytes = new System.Text.UnicodeEncoding().GetBytes(xmlContent);
System.Text.UTF8Encoding utf8 = new System.Text.UTF8Encoding(false, true);
utf8.GetChars(xmlContentInBytes);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
return false;
}

Correcting Encoding in a large Xml File

I'm importing data from XML files containing this type of content:
<FirstName>™MšR</FirstName><MiddleName/><LastName>HšNER™Z</LastName>
The XML is loaded via:
XmlDocument doc = new XmlDocument();
try
{
doc.Load(fullFilePath);
}
When I execute this code with the data shown above, I get an exception about an illegal character. I understand that part just fine.
I'm not sure which encoding this is or how to solve this problem. Is there a way I can change the encoding of the XmlDocument or another method to make sure the above content is parsed correctly?
Update: I do not have any encoding declaration or <?xml in this document.
I've seen some links say to add it dynamically? Is this UTF-16 encoding?
It appears that:
The name was ÖMÜR HÜNERÖZ (or possibly ÔMÜR HÜNERÔZ or ÕMÜR HÜNERÕZ; I don't know what language that is).
The XML file was encoded using the DOS "OEM" code page, probably 437 or 850.
But it was decoded using windows-1252 (the "ANSI" code page).
If you look at the file with a hex editor (HXD or Visual Studio, for instance), what exactly do you see?
Is every character from the string you posted represented by a single byte? Does the file have a byte-order mark (a bunch of non-printable bytes at the start of the file)?
The ™ and š seem to indicate that something went pretty wrong with encoding/conversion along the way, but let's see... I guess they both correspond with a vowel (O-M-A-R H-A-NER-O-Z, maybe?), but I haven't figured out yet how they ended up looking like this...
Edit: dan04 hit the nail on the head. ™ in cp-1252 has hex value 99, and š is 9a. In cp-437 and cp-850, hex 99 represents Ö, and 9a Ü.
The fix is simple: just specify this encoding when opening your XML file:
XmlDocument doc = new XmlDocument();
using (var reader = new StreamReader(fileName, Encoding.GetEncoding(437)))
{
doc.Load(reader);
}
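To see the mismatch described above, the same four bytes can be decoded with both code pages. This is a small sketch: CodePageDemo is a hypothetical helper, and the RegisterProvider call assumes a runtime where CodePagesEncodingProvider is available (for example .NET 5 or later); on .NET Framework the code pages are built in and that line can be dropped.

```csharp
using System.Text;

public static class CodePageDemo
{
    // Decodes raw bytes using the given Windows or DOS code page number.
    public static string Decode(byte[] bytes, int codePage)
    {
        // On .NET Core and later, legacy code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }
}
```

For the bytes 0x99 0x4D 0x9A 0x52, Decode(bytes, 437) gives "ÖMÜR", while Decode(bytes, 1252) gives the garbled "™MšR" from the question.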
From here:
Encoding encoding;
using (var stream = new MemoryStream(bytes))
{
using (var xmlreader = new XmlTextReader(stream))
{
xmlreader.MoveToContent();
encoding = xmlreader.Encoding;
}
}
You might want to take a look at this: How to best detect encoding in XML file?
For the actual reading you can use StreamReader, which takes care of the BOM (byte order mark):
string xml;
// The second argument is detectEncodingFromByteOrderMarks.
using (var reader = new StreamReader("FilePath", true))
{
xml = reader.ReadToEnd();
}
Edit: Removed the encoding parameter. StreamReader will detect the encoding of a file if the file contains a BOM. If it does not it will default to UTF8.
Edit 2: Detecting Text Encoding for StreamReader
Obviously you provided a fragment of the XML document since it's missing a root element, so I'll assume that was your intention. Is there an xml processing instruction at the top like <?xml version="1.0" encoding="UTF-8" ?>?
