Parse the XML namespace from string variable in c#


I am receiving the following request data:
<NS2:GETREQUEST
XMLNS:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
XMLNS:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
XMLNS:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>
<NS2:REQUESTHEADER>
<NS2:VERSION>1.1</NS2:VERSION>
<NS3:NAME>VIP</NS3:NAME>
<NS3:MODELVERSION>3.00</NS3:MODELVERSION>
<NS2:SEQUENCEID>1-101</NS2:SEQUENCEID>
<NS2:LOCATIONID>7895</NS2:LOCATIONID>
</NS2:REQUESTHEADER>
</NS2:GETREQUEST>
I store this data in a string variable and want to find the SequenceID in the request, but I can't get at it.
Parsing the XML data fails with an error:
XDocument doc = XDocument.Parse(requesttcpdata);
'NS2' is an undeclared prefix. Line 1, position 2.
Can anyone tell me how to do it?

Your input string looks like XML, but it isn't. XML is case-sensitive, and namespace declarations must use the lowercase xmlns prefix; XMLNS:NS2='...' is not a namespace declaration, so the NS2 prefix stays undeclared. Thus, you cannot parse it with an XML parser as it is.
That being said, it looks like your input can be converted into real XML by lowercasing the prefix of the xmlns: attributes. To make sure you don't accidentally modify XMLNS where it appears in the values themselves, I suggest a fairly strict string replacement check:
string input = @"<NS2:GETREQUEST
XMLNS:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
XMLNS:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
XMLNS:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>
<NS2:REQUESTHEADER>
<NS2:VERSION>1.1</NS2:VERSION>
<NS3:NAME>VIPER</NS3:NAME>
<NS3:MODELVERSION>3.00</NS3:MODELVERSION>
<NS2:SEQUENCEID>1-101</NS2:SEQUENCEID>
<NS2:LOCATIONID>7895</NS2:LOCATIONID>
</NS2:REQUESTHEADER>
</NS2:GETREQUEST>";
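// Only this exact header is rewritten, so XMLNS inside attribute values or element
// text is left alone. The header must match exactly, including its line breaks.
const string brokenHeader = @"<NS2:GETREQUEST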
XMLNS:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
XMLNS:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
XMLNS:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>";
const string fixedHeader = @"<NS2:GETREQUEST
xmlns:NS2='HTTP://WWW..ORG/SCHEMA/NAXML/V01'
xmlns:NS4='HTTP://WWW..ORG/SCHEMA/CORE/V01'
xmlns:NS3='HTTP://WWW.NAXML.ORG/VOCABULARY/2020-10-16'>";
if (input.StartsWith(brokenHeader, StringComparison.Ordinal))
{
input = fixedHeader + input.Substring(brokenHeader.Length);
}
var x = XDocument.Parse(input); // works now
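The header fix above makes the document parseable but doesn't yet read SequenceID. The sketch below is an alternative under stated assumptions, not the answer's exact method: the names SequenceIdReader, FixNamespaceDeclarations and ReadSequenceId are hypothetical, and a regex replaces the exact-header comparison so different line breaks are tolerated. It lowercases XMLNS only where it looks like an attribute name (whitespace before it, ":prefix=" after it), then resolves the NS2 prefix and looks up SEQUENCEID in that namespace. Because XML is case-sensitive, the element name has to be written in upper case.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SequenceIdReader
{
    // Hypothetical helper: rewrites "XMLNS:" to "xmlns:" only where it looks like an
    // attribute name (whitespace before it, ":prefix=" after it). This is a heuristic,
    // not a full XML-aware fix.
    public static string FixNamespaceDeclarations(string input) =>
        Regex.Replace(input, @"(?<=\s)XMLNS(?=:\w+\s*=)", "xmlns");

    // Returns the text of the first NS2:SEQUENCEID element, or null if there is none.
    public static string ReadSequenceId(string input)
    {
        XDocument doc = XDocument.Parse(FixNamespaceDeclarations(input));

        // Look up the namespace URI that the root element binds to the NS2 prefix.
        XNamespace ns2 = doc.Root.GetNamespaceOfPrefix("NS2");
        if (ns2 == null) return null;

        // Element names are case-sensitive, so match "SEQUENCEID" exactly.
        return doc.Descendants(ns2 + "SEQUENCEID").Select(e => e.Value).FirstOrDefault();
    }
}
```

For the sample request in the question, SequenceIdReader.ReadSequenceId(requesttcpdata) should return "1-101".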

Related

'\u0004', hexadecimal value 0x04, is an invalid character

I am trying to convert a file that contains some special characters to XML, but the conversion fails because of those characters in the data.
I already have this regex code, but it still doesn't work for me. Please help.
The code I have tried:
string filedata = @"D:\readwrite\test11.txt";
string input = ReadForFile(filedata);
string re1 = @"[^\u0000-\u007F]+";
string re5 = @"\p{Cs}";
data = Regex.Replace(input, re1, "");
data = Regex.Replace(input, re5, "");
XmlDocument xmlDocument = new XmlDocument();
try
{
xmlDocument = (XmlDocument)JsonConvert.DeserializeXmlNode(data);
var Xdoc = XDocument.Parse(xmlDocument.OuterXml);
}
catch (Exception ex)
{
Console.WriteLine(ex);
}

0x04 is the EOT (end-of-transmission) control character, and XML 1.0 does not allow it anywhere in a document, not even as a character reference. XmlDocument is right to reject it if it really does appear in your data. This does suggest that your regexes don't do what you think they do: [^\u0000-\u007F]+ removes non-ASCII characters, but 0x04 lies inside the ASCII range \u0000-\u007F, so it is kept, and \p{Cs} matches surrogate code units, not control characters. On top of that, the second Regex.Replace runs on input rather than data, so the result of the first replacement is thrown away. The real question for me is why this non-text 'character' appears in data intended as XML in the first place.
I have other questions. I've never seen JsonConvert.DeserializeXmlNode before; I had to look up what it does (it converts JSON text into an XmlDocument). Why are you using a JSON function on a document which presumably contains no JSON? Why are you then taking that document, converting it back to a string, and creating an XDocument from it? Why not just create an XDocument to start with? (If the file really is JSON, JsonConvert.DeserializeXNode returns an XDocument directly.)
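If you can't fix the producer and just need to drop the offending characters, a sketch like the one below filters on what XML itself allows instead of a hand-written range. XmlConvert.IsXmlChar accepts only characters that are legal in XML 1.0, and XmlConvert.IsXmlSurrogatePair keeps valid surrogate pairs such as emoji. The class and method names are hypothetical, and silently deleting characters is an assumption about what is acceptable for your data; finding where the 0x04 comes from is still the better fix.

```csharp
using System.Text;
using System.Xml;

public static class XmlCharFilter
{
    // Removes characters that are not legal in XML 1.0 (e.g. U+0004),
    // keeping valid surrogate pairs intact and dropping lone surrogates.
    public static string RemoveInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            // IsXmlSurrogatePair takes the low surrogate first, then the high one.
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate we just copied
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

Run the file content through RemoveInvalidXmlChars before handing it to the JSON or XML parser.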

Data at the root level is invalid - XML parsing

I am very new to programming and not sure where I am going wrong. I have read the other threads with a similar error, but I think my problem is even more basic.
I have a generated string which contains XML, but it doesn't start with the XML. When I try to parse that string I get the above error.
Is there a way to get rid of the leading text and keep only the part from where the XML starts?
My string:
{"Id":"6a76f781-f592-4320-a116-6ab289505423","Name":"Test - A","AttachmentRequired":false,"FormXml":"
<?xml version=\"1.0\" encoding=\"utf-16\"?>

The easiest way would be to use a JSON parser such as Newtonsoft.Json:
public class Data
{
public string Id;
public string Name;
public bool AttachmentRequired;
public string FormXml;
}
var o = JsonConvert.DeserializeObject<Data>(json);
var xml = o.FormXml;
Here is the NuGet package for Newtonsoft.Json, which I used above:
https://www.nuget.org/packages/Newtonsoft.Json/
If you absolutely can't use an external library to transform it into a CLR object, here is how you would do it through string manipulation:
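var str = @"{ ""Id"":""6a76f781-f592-4320-a116-6ab289505423"",""Name"":""Test - A"",""AttachmentRequired"":false,""FormXml"":""<?xml version=\""1.0\"" encoding=\""utf-16\""?>""}";
// Take everything after the "FormXml":" key rather than splitting on ':',
// since the XML itself may contain colons (namespaces, URLs).
const string key = "\"FormXml\":\"";
var start = str.IndexOf(key, StringComparison.Ordinal) + key.Length;
var end = str.LastIndexOf("\"}", StringComparison.Ordinal);
// Undo the JSON escaping of quotes (other escapes such as \n are not handled).
var xml = str.Substring(start, end - start).Replace("\\\"", "\"");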
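If the restriction is only on third-party packages, .NET Core 3.0 and later ship a JSON parser in the box, System.Text.Json, which makes the string manipulation unnecessary and also undoes the JSON escaping for you. A minimal sketch, assuming .NET Core 3.0 or later; the class and method names are hypothetical. Note that the FormXml value in the question is truncated after the XML declaration, so the real value must contain a root element before XDocument.Parse will accept it.

```csharp
using System.Text.Json;
using System.Xml.Linq;

public static class FormXmlExtractor
{
    // Reads the "FormXml" property with the built-in System.Text.Json parser
    // and parses its (already unescaped) string value as XML.
    public static XDocument ExtractFormXml(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        string xml = doc.RootElement.GetProperty("FormXml").GetString();
        return XDocument.Parse(xml);
    }
}
```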

Your string appears to be in JSON format, and the XML part of it is the value of the "FormXml" field.
The easy way is to deserialize the string into an object using Newtonsoft.Json, and then parse the object's FormXml value as XML:
JsonConvert.DeserializeObject<YourClass>(yourstring);

What would be the best way of checking whether a string contains XML tags?

I know that the following would find potential tags, but is there a better way to check if a string contains XML tags to prevent exceptions when reading/writing the string between XML files?
string testWord = "test<a>";
bool foundTag = Regex.IsMatch(testWord, @"^*<*>*$");

I'd use a different regex for that:
Regex.IsMatch(testWord, @"<.+?>");
However, even if it matches, there is no guarantee that your string actually is XML, as the regex would also match strings like "<<a>", which is invalid, or "a <= b >= c", which is obviously not XML.
You should consider using the XmlDocument class instead and treating a parse failure as "not XML":
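XmlDocument xmlDoc = new XmlDocument();
try
{
// LoadXml parses the string itself; Load(string) would treat it as a file path or URL.
xmlDoc.LoadXml(testWord);
}
catch (XmlException)
{
// not well-formed XML
}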
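Loading the string as a whole document rejects strings like "Hello <b>world</b>" that mix text and markup, because a document needs exactly one root element. If what you actually want to know is whether a string is well-formed XML content that contains at least one element, one option is an XmlReader with ConformanceLevel.Fragment. This is a sketch under that assumption; the class and method names are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class XmlFragmentCheck
{
    // True if the text is well-formed XML content (text and elements may be mixed,
    // several top-level elements are allowed) and contains at least one element.
    public static bool IsXmlFragmentWithElements(string text)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        bool sawElement = false;
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(text), settings))
            {
                // Read to the end so unclosed or malformed tags raise an XmlException.
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        sawElement = true;
                    }
                }
            }
            return sawElement;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this check, "test<a>" is rejected because the a element is never closed, while "Hello <b>world</b>" is accepted.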

Why don't you HtmlEncode the string before sending it via XML? That way you avoid having to detect tags with a regex at all.
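If the XML is built with XElement (or XmlWriter), you don't have to encode anything yourself: text content is escaped when it is serialized, and encoding it first would escape it twice (a literal "&lt;" would be written as "&amp;lt;"). A small sketch, with hypothetical class and method names, showing the automatic escaping and SecurityElement.Escape for when the XML text is assembled by hand.

```csharp
using System.Security;
using System.Xml.Linq;

public static class XmlEscapeDemo
{
    // XElement escapes its text when serialized, so the '<' in "test<a>"
    // is written as "&lt;" and the value reads back unchanged.
    public static string Serialize(string value) => new XElement("word", value).ToString();

    public static string ReadBack(string xml) => XElement.Parse(xml).Value;

    // For hand-built XML strings, SecurityElement.Escape replaces <, >, &, " and '
    // with the corresponding XML entities.
    public static string EscapeManually(string value) => SecurityElement.Escape(value);
}
```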

How to read a string containing XML elements without using the XML properties

My project includes an XML reading process, where I have to read the contents of an XML file. I have achieved that.
Just out of curiosity, I also tried keeping the XML content inside a string and reading only the values inside the element tags. I achieved this too. Here is my code:
string xml = @"<Login-Form>
<User-Authentication>
<username>Vikneshwar</username>
<password>xxx</password>
</User-Authentication>
<User-Info>
<firstname>Vikneshwar</firstname>
<lastname>S</lastname>
<email>xxx@xxx.com</email>
</User-Info>
</Login-Form>";
XDocument document = XDocument.Parse(xml);
var block = from file in document.Descendants("User-Authentication")
select new
{
Username = file.Element("username").Value,
Password = file.Element("password").Value,
};
foreach (var file in block)
{
Console.WriteLine(file.Username);
Console.WriteLine(file.Password);
}
Similarly, I obtained my other set of elements (firstname, lastname, and email). Now my curiosity draws me further: can I do the same using only string functions?
Taking the same string as in the code above, I'm trying not to use any XML-related classes (XDocument, XmlReader, etc.) and to achieve the same output using only string functions. I haven't managed it. Is it possible?

Don't do it. XML is more complex than it may appear, with complex rules surrounding nesting, character escaping, named entities, namespaces, ordering (attributes vs elements), comments, unparsed character data, and whitespace. For example, just add
<!--
<username>evil</username>
-->
Or
<parent xmlns="this:is-not/the/data/you/expected">
<username>evil</username>
</parent>
Or maybe the same in a CDATA section, and see how well basic string-based approaches work. Hint: you'll get a different answer from the one you get via a DOM.
Using a dedicated tool designed for reading XML is the correct approach. At the minimum, use XmlReader - but frankly, a DOM (such as your existing code) is much more convenient. Alternatively, use a serializer such as XmlSerializer to populate an object model, and query that.
Trying to parse XML and XML-like data properly with string functions or regular expressions does not end well; see the classic question "RegEx match open tags except XHTML self-contained tags".
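To make the comment example concrete, here is a small sketch (class and method names are hypothetical) that runs a naive IndexOf/Substring lookup and XDocument on the same input. The string version returns the commented-out value, while the parser skips the comment because comments are not elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentTrapDemo
{
    // Naive string approach: text between the first <username> and the next </username>.
    public static string NaiveUsername(string xml)
    {
        const string open = "<username>", close = "</username>";
        int start = xml.IndexOf(open, StringComparison.Ordinal) + open.Length;
        int end = xml.IndexOf(close, start, StringComparison.Ordinal);
        return xml.Substring(start, end - start);
    }

    // Parser approach: the comment is a comment node, so Descendants never sees it.
    public static string ParsedUsername(string xml) =>
        XDocument.Parse(xml).Descendants("username").First().Value;
}
```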

You could use methods such as IndexOf, Equals and Substring provided by the String class to fulfill your needs; see the String class documentation for more information.
Using Regex is also an option.
But it's advisable to use the XmlDocument class for this purpose.

It can be done without regular expressions, like this:
string[] elementNames = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames)
{
int startingIndex = xml.IndexOf(elementName);
string value = xml.Substring(startingIndex + elementName.Length,
xml.IndexOf(elementName.Insert(1, "/"))
- (startingIndex + elementName.Length));
Console.WriteLine(value);
}
With a regular expression:
string[] elementNames2 = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames2)
{
string value = Regex.Match(xml, String.Concat(elementName, "(.*)",
elementName.Insert(1, "/"))).Groups[1].Value;
Console.WriteLine(value);
}
Of course, the only recommended approach is to use the XML parsing classes.

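You could build an extension method that gets the text between two tags, like this:
public static class StringExtension
{
// Returns the text between the first occurrence of start and the next occurrence
// of end after it, or null if either marker is missing.
public static string Between(this string content, string start, string end)
{
int startIndex = content.IndexOf(start, StringComparison.Ordinal);
if (startIndex < 0) return null;
startIndex += start.Length;
int endIndex = content.IndexOf(end, startIndex, StringComparison.Ordinal);
if (endIndex < 0) return null;
return content.Substring(startIndex, endIndex - startIndex);
}
}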

How to change file encoding from windows-1251 to UTF-8

I have an XML file in windows-1251 and I need to convert the text that I get from it.
I have just started writing the code, but I don't know how to implement this:
string text = File.ReadAllText(path);
XDocument documentcode = XDocument.Load(text);

You will have to specify the correct encoding when reading:
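// On .NET Core / .NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
string text = File.ReadAllText(path, Encoding.GetEncoding("windows-1251"));
XDocument documentcode = XDocument.Parse(text); // Parse, not Load: Load expects a file path or URI.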
You probably don't have to do anything special when writing.
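One caveat on the writing side, as far as I know: XDocument.Save(string) uses the encoding named in the document's XML declaration when one is present, so a file that starts with encoding="windows-1251" may be written back in windows-1251. A sketch of the full round trip, assuming .NET 5 or later (earlier .NET Core versions may need the System.Text.Encoding.CodePages package; on .NET Framework drop the RegisterProvider line). The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class Recode
{
    // Reads a windows-1251 XML file and writes it back out as UTF-8.
    public static void ConvertToUtf8(string inputPath, string outputPath)
    {
        // Code-page encodings such as windows-1251 must be registered on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string text = File.ReadAllText(inputPath, Encoding.GetEncoding("windows-1251"));
        XDocument doc = XDocument.Parse(text);

        // Replace the declaration so Save does not pick up "windows-1251" from it.
        doc.Declaration = new XDeclaration("1.0", "utf-8", null);
        doc.Save(outputPath);
    }
}
```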
