I am trying to convert a file that contains some special characters to XML format, but the conversion fails because of those special characters in the data.
I already have the regex code below, but it's still not working for me. Please help.
The code I have tried:
string filedata = @"D:\readwrite\test11.txt";
string input = ReadForFile(filedata);
string re1 = @"[^\u0000-\u007F]+";
string re5 = @"\p{Cs}";
string data = Regex.Replace(input, re1, "");
data = Regex.Replace(input, re5, "");
XmlDocument xmlDocument = new XmlDocument();
try
{
    xmlDocument = (XmlDocument)JsonConvert.DeserializeXmlNode(data);
    var Xdoc = XDocument.Parse(xmlDocument.OuterXml);
}
catch (Exception ex)
{
    Console.WriteLine(ex);
}
0x04 (EOT, end of transmission) is a control character and is not a legal character anywhere in an XML 1.0 document, so XmlDocument is right to reject it if it really does appear in your data. This also suggests that your regexes don't do what you think they do. Regex.Replace does replace every match, but neither pattern matches 0x04: it falls inside \u0000-\u007F, so [^\u0000-\u007F]+ leaves it alone, and \p{Cs} only matches UTF-16 surrogate code units. On top of that, the second Regex.Replace call starts again from input, so the result of the first call is thrown away. The real question for me is why this non-text 'character' appears in data intended as XML in the first place.
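If you do decide such characters are junk (the answer rightly asks why they are there at all), a regex over ASCII ranges is the wrong tool. A minimal sketch, assuming you are happy to drop offending characters silently, is to keep only what XmlConvert.IsXmlChar accepts while letting valid surrogate pairs (emoji and the like) through. XmlCharFilter and RemoveInvalidXmlChars are hypothetical names:

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlCharFilter
{
    // Hypothetical helper: returns a copy of text with every character that is
    // not legal in XML 1.0 (for example 0x04) removed. Well-formed surrogate
    // pairs are kept; lone surrogates, C0 controls, U+FFFE and U+FFFF are dropped.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Note the argument order: IsXmlSurrogatePair(lowChar, highChar).
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate we just copied
            }
            // Anything else is silently skipped.
        }
        return sb.ToString();
    }
}
```

Apply it to the text just before it reaches the XML parser, e.g. XDocument.Parse(XmlCharFilter.RemoveInvalidXmlChars(xml)). If the character arrives in the JSON as an escape such as \u0004, it only becomes a real 0x04 after JSON decoding, so filter at a point where it is already a real character.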
I have other questions. I've never seen JsonConvert.DeserializeXmlNode before - I had to look up what it does. Why are you using a JSON function against the root of a document which presumably therefore contains no JSON? Why are you then taking that document, converting it back to a string, and then creating an XDocument from it? Why not just create an XDocument to start with?
I'm very new to C# and XML files in general. I have an XML file that still contains some HTML entities (&amp;, &quot;, etc.), and I want to read through the XML file and remove all of them so it becomes easily readable. I can open the file and print it to the console with no issue, but I'm stumped trying to search for those specific strings and remove them.
One way to do this would be to put all the words you want to remove into an array, and then use the Replace method to replace them with empty strings:
var xmlFilePath = @"c:\temp\original.xml";
var newFilePath = @"c:\temp\modified.xml";
var wordsToRemove = new[] {"&amp;", "&quot;"};
// Read existing xml file (File is in System.IO)
var fileContents = File.ReadAllText(xmlFilePath);
// Remove words
foreach (var word in wordsToRemove)
{
    fileContents = fileContents.Replace(word, "");
}
// Create new file with words removed
File.WriteAllText(newFilePath, fileContents);
I suppose you are looking for this: https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.htmldecode?view=netcore-3.1
Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.
using System;
using System.IO;
using System.Web;

// Sample input (an assumption; any string will do).
string myString = "<b>Tom & \"Jerry\"</b>";
// Encode the string.
string myEncodedString = HttpUtility.HtmlEncode(myString);
Console.WriteLine($"HTML Encoded string is: {myEncodedString}");
StringWriter myWriter = new StringWriter();
// Decode the encoded string.
HttpUtility.HtmlDecode(myEncodedString, myWriter);
string myDecodedString = myWriter.ToString();
Console.Write($"Decoded string of the above encoded string is: {myDecodedString}");
Your string is HTML-encoded, probably for transmission over a network, so there is a built-in method to decode it.
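If the file is parsed as XML rather than treated as raw text, the parser itself already turns &amp; into & and &quot; into ". Whatever entities are still visible after parsing are usually a second, HTML-level layer of encoding (for example &amp;quot; in the file). A sketch under that assumption, using System.Net.WebUtility.HtmlDecode, which does the same job as HttpUtility.HtmlDecode without needing System.Web; EntityDecoding.DecodeTextNodes is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public static class EntityDecoding
{
    // Hypothetical helper: parses the XML, then HTML-decodes the text content of
    // every text node, so "Fish &amp;amp; Chips" in the file becomes "Fish & Chips".
    public static XDocument DecodeTextNodes(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // The XML parser has already resolved one level of escaping (&amp; -> &),
        // so what remains in each text node is the extra HTML-level encoding.
        // ToList() takes a snapshot before we start modifying nodes.
        foreach (XText text in doc.DescendantNodes().OfType<XText>().ToList())
        {
            text.Value = WebUtility.HtmlDecode(text.Value);
        }
        return doc;
    }
}
```

Saving the result with doc.Save(...) writes a bare & back out as &amp;, which is what XML requires; the decoding only changes the text value your code sees.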
I am receiving the following request data:
<NS2:GETREQUEST
XMLNS:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
XMLNS:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
XMLNS:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>
<NS2:REQUESTHEADER>
<NS2:VERSION>1.1</NS2:VERSION>
<NS3:NAME>VIP</NS3:NAME>
<NS3:MODELVERSION>3.00</NS3:MODELVERSION>
<NS2:SEQUENCEID>1-101</NS2:SEQUENCEID>
<NS2:LOCATIONID>7895</NS2:LOCATIONID>
</NS2:REQUESTHEADER>
</NS2:GETREQUEST>
I store this data in a string variable. Now I want to find the SequenceID in the request, but I am not able to get at it.
Parsing the XML data throws an error:
XDocument doc = XDocument.Parse(requesttcpdata);
'NS2' is an undeclared prefix. Line 1, position 2.
Can anyone tell me how to do it?
Your input string looks like XML, but it isn't. XML is case-sensitive, and a namespace declaration must be spelled xmlns:prefix in lowercase; XMLNS:NS2 is not recognized as one, so the NS2 prefix is never declared. Thus, you cannot parse it with an XML parser.
That having been said, it looks like your input can be converted into real XML by lowercasing the XMLNS prefix of the namespace-declaration attributes. To make sure you don't accidentally modify XMLNS where it appears in the values themselves, I suggest a fairly strict string replacement check:
string input = @"<NS2:GETREQUEST
XMLNS:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
XMLNS:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
XMLNS:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>
<NS2:REQUESTHEADER>
<NS2:VERSION>1.1</NS2:VERSION>
<NS3:NAME>VIPER</NS3:NAME>
<NS3:MODELVERSION>3.00</NS3:MODELVERSION>
<NS2:SEQUENCEID>1-101</NS2:SEQUENCEID>
<NS2:LOCATIONID>7895</NS2:LOCATIONID>
</NS2:REQUESTHEADER>
</NS2:GETREQUEST>";
const string brokenHeader = @"<NS2:GETREQUEST
XMLNS:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
XMLNS:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
XMLNS:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>";
const string fixedHeader = @"<NS2:GETREQUEST
xmlns:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
xmlns:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
xmlns:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>";
if (input.StartsWith(brokenHeader))
{
input = fixedHeader + input.Substring(brokenHeader.Length);
}
var x = XDocument.Parse(input); // works now
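Once parsing succeeds, SEQUENCEID still has to be looked up with its namespace. A sketch that does both steps, assuming the request always begins directly with the root start tag (no <?xml ...?> declaration) and that only XMLNS: attribute names in that tag need lowercasing. It uses a regex instead of the exact-header comparison above, which tolerates different whitespace but is less strict. GetNamespaceOfPrefix reads the NS2 URI from the document, so it is not hard-coded. NaxmlRequest.GetSequenceId is a hypothetical name:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class NaxmlRequest
{
    // Hypothetical helper: fixes the namespace declarations in the root start tag,
    // parses the request and returns the text of NS2:SEQUENCEID (or null).
    public static string GetSequenceId(string request)
    {
        int endOfRootTag = request.IndexOf('>');
        if (endOfRootTag < 0)
            throw new FormatException("No start tag found.");

        // Rewrite XMLNS:prefix= to xmlns:prefix=, but only inside the root start tag.
        string rootTag = Regex.Replace(
            request.Substring(0, endOfRootTag + 1),
            @"\bXMLNS:(\w+)\s*=",
            "xmlns:$1=",
            RegexOptions.IgnoreCase);

        XDocument doc = XDocument.Parse(rootTag + request.Substring(endOfRootTag + 1));

        // Resolve the NS2 prefix from the document itself.
        XNamespace ns2 = doc.Root.GetNamespaceOfPrefix("NS2");
        if (ns2 is null) return null;

        XElement seq = doc.Root.Element(ns2 + "REQUESTHEADER")?.Element(ns2 + "SEQUENCEID");
        return seq?.Value;
    }
}
```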
I am doing the following:
// Create message
StringBuilder sbXML = new StringBuilder();
sbXML.Append("<root>");
sbXML.AppendFormat("<messageBody>{0}</messageBody>", JsonString);
sbXML.Append("</root>");
Here JsonString is a JSON string; however, some of the entries in the JSON are strings of HTML, which I think is why it is breaking.
When I do:
XmlDocument xmlDOC = new XmlDocument();
xmlDOC.LoadXml(sbXML.ToString());
I get the error:
'\' is an unexpected token. The expected token is '"' or '''.
My JSON also contains URLs, for instance:
{
"exampleJson": {
"url": "http://example.com/",
"html": "example text"
}
}
I believe these values are what cause the exception. Is there a way around this so that xmlDOC.LoadXml can load my JSON? I considered doing something like:
xmlDOC.LoadXml(sbXML.ToString().Replace("character to replace", "acceptable character"));
However, this is obviously not ideal. I also tried using .Load instead, but that resulted in an "illegal characters in path" exception.
I think you want to be doing something like:
StringBuilder sbXML = new StringBuilder();
sbXML.Append("<root>");
sbXML.Append("<messageBody />");
sbXML.Append("</root>");
XmlDocument xmlDOC = new XmlDocument();
xmlDOC.LoadXml(sbXML.ToString());
// InnerText stores JsonString as a text node, so it is escaped when the document is written out.
xmlDOC.DocumentElement.SelectSingleNode("messageBody").InnerText = JsonString;
As pointed out by @Alexei Levenkov, creating XML by string concatenation is a really bad idea and will lead to more problems later.
Using the System.Xml.XmlDocument methods is much safer: they escape whatever characters need escaping (such as < and &) so that the value of JsonString is safe to store in XML.
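The same idea with LINQ to XML, as a sketch: building the elements through the API means the JSON (HTML and all) is stored as text and escaped on output, so no concatenation or placeholder element is needed. MessageBuilder.Build is a hypothetical name:

```csharp
using System;
using System.Xml.Linq;

public static class MessageBuilder
{
    // Hypothetical helper: wraps an arbitrary string (here JSON that may contain
    // HTML) in <root><messageBody>...</messageBody></root>. XElement escapes
    // < and & in text content, so the JSON is never parsed as markup.
    public static string Build(string jsonString)
    {
        var root = new XElement("root",
            new XElement("messageBody", jsonString));
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Reading it back with XElement.Parse(xml).Element("messageBody").Value returns the original JSON unchanged.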
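You need to use a CDATA section, whose content the parser does not scan for markup (as long as the data itself never contains the sequence ]]>).
For example: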
<messageBody><![CDATA[ any json data ]]> </messageBody>
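Rather than typing the CDATA markers into a concatenated string, a sketch can let XmlDocument create the section. It assumes the JSON never contains ]]>, which cannot appear inside a single CDATA section. CDataMessage.Build is a hypothetical name:

```csharp
using System;
using System.Xml;

public static class CDataMessage
{
    // Hypothetical helper: builds <root><messageBody><![CDATA[json]]></messageBody></root>.
    public static XmlDocument Build(string jsonString)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("root");
        XmlElement body = doc.CreateElement("messageBody");

        // The CDATA section holds the text as-is: no &lt; or &amp; escaping inside it.
        body.AppendChild(doc.CreateCDataSection(jsonString));

        root.AppendChild(body);
        doc.AppendChild(root);
        return doc;
    }
}
```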
I get an XmlElement from a web service, and something unexpected happens: xmlElement.OwnerDocument.ChildNodes is empty. How is that possible?
This is the XML:
<tns1:VideoSource xmlns:tns1="http://www.onvif.org/ver10/topics">
<MotionAlarm wstop:topic="true" xmlns:wstop="http://docs.oasis-open.org/wsn/t-1" xmlns="http://www.onvif.org/ver10/events/wsdl">
</MotionAlarm>
</tns1:VideoSource>
I tested your XML with the code below, and the children are there as expected. I suspect some whitespace or other invisible characters are causing an error. If you got the data from a website (probably as a stream), there may be invisible null characters at the end of the stream. Make sure your stream reader uses UTF-8 encoding. The default encoding in some streams is ASCII, which replaces every non-ASCII character with '?' and may create other issues.
string input =
"<tns1:VideoSource xmlns:tns1=\"http://www.onvif.org/ver10/topics\">" +
"<MotionAlarm wstop:topic=\"true\" xmlns:wstop=\"http://docs.oasis-open.org/wsn/t-1\" xmlns=\"http://www.onvif.org/ver10/events/wsdl\">" +
"</MotionAlarm>" +
"</tns1:VideoSource>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
XmlNodeList videoSource = doc.ChildNodes;
XmlNodeList motionAlarm = videoSource[0].ChildNodes;
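Another possible explanation, offered as an assumption about how your web-service proxy builds the element: an XmlElement's OwnerDocument is the document that created it, not necessarily a document it has been added to. A node made with CreateElement (or by a deserializer using an internal XmlDocument as a factory) has an OwnerDocument whose ChildNodes stays empty until the node is appended. This sketch reproduces that and shows one workaround, ImportNode into a fresh document; both method names are hypothetical:

```csharp
using System;
using System.Xml;

public static class OwnerDocumentDemo
{
    // Hypothetical: creates an element through a document used only as a factory.
    // The element is never appended, so factory.ChildNodes stays empty.
    public static XmlElement CreateDetachedElement()
    {
        var factory = new XmlDocument();
        return factory.CreateElement("MotionAlarm", "http://www.onvif.org/ver10/events/wsdl");
    }

    // Copies the element into a fresh document where it becomes the root,
    // so ChildNodes and DocumentElement behave as expected.
    public static XmlDocument ToStandaloneDocument(XmlElement element)
    {
        var doc = new XmlDocument();
        doc.AppendChild(doc.ImportNode(element, deep: true));
        return doc;
    }
}
```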
I know that the following would find potential tags, but is there a better way to check if a string contains XML tags to prevent exceptions when reading/writing the string between XML files?
string testWord = "test<a>";
bool foundTag = Regex.IsMatch(testWord, @"^*<*>*$");
I'd use a different regex for that:
Regex.IsMatch(testWord, @"<.+?>");
However, even if it does match, there is no guarantee that your string actually is XML, as the regex would also match strings like "<<a>", which is invalid, or "a <= b >= c", which is obviously not XML.
You should consider using the XmlDocument class instead.
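XmlDocument xmlDoc = new XmlDocument();
try
{
    // LoadXml parses the string itself; Load would treat it as a file path or URL.
    xmlDoc.LoadXml(testWord);
}
catch (XmlException)
{
    // not well-formed XML
}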
Why don't you HtmlEncode the string before sending it via XML? This way you can avoid difficulties with Regex parsing tags.
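A minimal sketch of that suggestion, assuming the XML is still built by string concatenation: encode the value with System.Net.WebUtility.HtmlEncode (the System.Net counterpart of HttpUtility.HtmlEncode) and it parses back unchanged, so there is no need to detect tags at all. EncodedXml.Wrap and Unwrap are hypothetical names:

```csharp
using System;
using System.Net;
using System.Xml.Linq;

public static class EncodedXml
{
    // Hypothetical: builds <word>...</word> by concatenation, HTML-encoding the
    // value first so characters like < and & cannot be mistaken for markup.
    public static string Wrap(string value) =>
        "<word>" + WebUtility.HtmlEncode(value) + "</word>";

    // Parses the element back; the XML parser resolves &lt;, &amp;, &#39; etc.
    public static string Unwrap(string xml) => XElement.Parse(xml).Value;
}
```

Don't combine this with the XmlDocument or XElement APIs, which escape text themselves; encoding first would then double-encode the value.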