Parsing a string To Obtain two values

What would be the simplest way of parsing this string? I just want to get two values out of it: the value after QueryParameter Parameter=\ and the value after \"$MPElement[Name='
Any help appreciated. I tried the following code:
string myString = listQueryParameter[0]._parameter.ToString();
string text= myString.Split(new [] {"QueryParameter Parameter="}, StringSplitOptions.None)[1];
But it still gave me everything after the value along with it. Thanks.
<QueryParameter Parameter=\"TypeProjectionId\" Value=\"$MPElement[Name='System.WorkItem.Incident.View.ProjectionType']$\" xmlns=\"clr-namespace:Microsoft.EnterpriseManagement.UI.ViewFramework;assembly=Microsoft.EnterpriseManagement.UI.ViewFramework\" />"

Your input looks like XML, but isn’t really valid XML.
My recommendation is to investigate where this stuff comes from, and fix the source so that it will produce correct XML. My guess is that in actuality you already have valid XML, but you looked at the string in the VS debugger and saw it with the backslashes added and then assumed that they are part of the string. If this is the case, then rest assured that they are not.
For reference, valid XML might look like this:
<QueryParameter Parameter="TypeProjectionId" Value="$MPElement[Name='System.WorkItem.Incident.View.ProjectionType']$" xmlns="clr-namespace:Microsoft.EnterpriseManagement.UI.ViewFramework;assembly=Microsoft.EnterpriseManagement.UI.ViewFramework" />
In other words, it’s the same as yours but without the backslashes.
Now we can use XElement.Parse to turn this into a parsed value:
var xml = XElement.Parse(myString);
If you feed the invalid XML to this, this will throw.
Now we want to look at the values of the two attributes Parameter and Value:
var parameter = xml.Attribute("Parameter").Value;
var value = xml.Attribute("Value").Value;
At this point, value contains the string $MPElement[Name='System.WorkItem.Incident.View.ProjectionType']$. You said you wanted the part after Name=, so let’s use a regular expression for that:
var match = Regex.Match(value, @"\bName='([^']*?)'");
if (!match.Success)
    throw new Exception("The Value attribute is not in a recognized format.");
var innerValue = match.Groups[1].Value;
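Putting the steps above together, here is a minimal sketch of the whole thing. The class and method names (QueryParameterParser, Parse) are made up for illustration, and it assumes the input is the backslash-free XML shown earlier.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class QueryParameterParser
{
    // Hypothetical helper: returns the Parameter attribute and the
    // Name='...' part found inside the Value attribute.
    public static (string Parameter, string Name) Parse(string xmlText)
    {
        // Throws XmlException if the text is not well-formed XML.
        XElement xml = XElement.Parse(xmlText);

        // Attribute() returns null for a missing attribute; the explicit
        // string conversion passes that null through instead of throwing.
        string parameter = (string)xml.Attribute("Parameter");
        string value = (string)xml.Attribute("Value");
        if (parameter == null || value == null)
            throw new FormatException("Expected Parameter and Value attributes.");

        Match match = Regex.Match(value, @"\bName='([^']*)'");
        if (!match.Success)
            throw new FormatException("The Value attribute is not in a recognized format.");

        return (parameter, match.Groups[1].Value);
    }

    public static void Main()
    {
        string input =
            "<QueryParameter Parameter=\"TypeProjectionId\" " +
            "Value=\"$MPElement[Name='System.WorkItem.Incident.View.ProjectionType']$\" " +
            "xmlns=\"clr-namespace:Microsoft.EnterpriseManagement.UI.ViewFramework;assembly=Microsoft.EnterpriseManagement.UI.ViewFramework\" />";

        var result = Parse(input);
        Console.WriteLine(result.Parameter); // TypeProjectionId
        Console.WriteLine(result.Name);      // System.WorkItem.Incident.View.ProjectionType
    }
}
```

The attributes carry no prefix, so they are in no namespace and can be looked up by plain name even though the element itself has an xmlns declaration.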

Related

Deserialize Json with invalid markup

I'm currently trying to tackle a situation where we have invalid Json that we are trying to deserialize. The clinch point is we're supplied Json where property assignment is declared with an = instead of a : character.
Example Json:
{
"Field1" = "Hello",
"Field2" = "Stuff",
"Field3" = "I am non-Json Json, fear me",
"Field4" = 8
}
Has anyone had any luck using Json.Net to deserialize this into a C# object of the corresponding structure when = is used instead of :?
I've been trying to write a JsonConverter to read past the = but it always complains it's got an = instead of a : and throws an exception with the message "Expected ':' but got: =. Path ''".
I don't see any way past this except for writing my own deserialization process and not using the Json.Net library. which sucks for something so close to being valid Json (But I suppose is fair enough as it is invalid)
When reader.ReadAsString(); is hit it should read Field1 out but obviously it hasn't met its friend the : yet and so proceeds to fall over saying "what the hell is this doing here?!". I've not got any JsonConverter implementation examples because there's really not much to show. just me attempting to use any of the "Read..." methods and failing to do so.

If property assignment is declared with an = instead of a : character then it is not JSON.
If you don't expect any = in the values of the object then you can do a
string json = invalidData.Replace("=", ":");
and then try to parse it.
As mentioned by @Icepickle, there are risks involved in doing this.
My answer works as a quick fix/workaround, but you will eventually need to make sure that the data you are receiving is valid JSON.
There is no point in trying to deserialize invalid JSON.

As suggested by others, the easiest way to get around this issue is to use a simple string replace to change = characters to : within the JSON string prior to parsing it. Of course, if you have any data values that have = characters in them, they will be mangled by the replacement as well.
If you are worried that the data will have = characters in it, you could take this a step further and use Regex to do the replacement. For example, the following Regex will only replace = characters that immediately follow a quoted property name:
string validJson = Regex.Replace(invalidJson, @"(""[^""]+"")\s?=\s?", "$1 : ");
Fiddle: https://dotnetfiddle.net/yvydi2
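As a rough check of the Regex approach, the sketch below runs the replacement on a sample whose value contains an = and then parses the result. It uses System.Text.Json (available in .NET Core 3.0 and later) instead of Json.Net only to keep the example dependency-free, which is my substitution and not part of the original answer. The class name EqualsJsonFixer is hypothetical.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class EqualsJsonFixer
{
    // Replaces = with : only when the = directly follows a quoted string
    // (with at most one whitespace character in between).
    public static string Fix(string invalidJson) =>
        Regex.Replace(invalidJson, @"(""[^""]+"")\s?=\s?", "$1 : ");

    public static void Main()
    {
        // The value "a=b" has no quote right before its =, so it is left alone.
        string invalidJson = "{ \"Field1\" = \"a=b\", \"Field4\" = 8 }";
        string validJson = Fix(invalidJson);

        using JsonDocument doc = JsonDocument.Parse(validJson);
        Console.WriteLine(doc.RootElement.GetProperty("Field1").GetString()); // a=b
        Console.WriteLine(doc.RootElement.GetProperty("Field4").GetInt32());  // 8
    }
}
```

The pattern can still misfire if a string value is itself immediately followed by an = sign, so treat it as a workaround and not a parser.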
Another possible solution is to alter the Json.Net source code to allow an = where a : normally appears in valid JSON. This is probably the safest option in terms of parsing, but requires a little more effort. If you want to go this route, download the latest source code from GitHub, and open the solution in Visual Studio 2015. Locate the JsonTextReader class in the root of the project. Inside this class is a private method called ParseProperty. Near the end of the method is some code which looks like this:
if (_chars[_charPos] != ':')
{
throw JsonReaderException.Create(this, "Invalid character after parsing property name. Expected ':' but got: {0}.".FormatWith(CultureInfo.InvariantCulture, _chars[_charPos]));
}
If you change the above if statement to this:
if (_chars[_charPos] != ':' && _chars[_charPos] != '=')
then the reader will allow both : and = characters as separators between property names and values. Save the change, rebuild the library, and you should be able to use it on your "special" JSON.

unicode to human readable string c# .net

This is probably a very basic question, but really appreciate if you could help me with this:
I want to convert a string that contains characters like \u000d\u000a\u000d\u000 to a human readable string. However, I don't want to use the .Replace method, since there may be many more Unicode escapes than the ones I program the software to check and replace.
string = "Test \u000d\u000a\u000d\u000aTesting with new line. \u000d\u000a\u000d\u000aone more new line"
I receive this string as a json Object from my server.

Do you even need that?
For example, the following code will print abc which is the actual decoded value:
var unicodeString = "\u0061\u0062\u0063";
Console.WriteLine(unicodeString);
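If the string really does contain a literal backslash followed by u000d (for example because it was read as raw text and never went through a JSON parser), one option is Regex.Unescape, which understands \uXXXX escapes among others. This is only a sketch under that assumption. A JSON parser decodes these escapes for you, so it is usually better to parse the JSON properly. The class name EscapeDecoder is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // Converts literal escape sequences such as \u000d or \n into the
    // characters they stand for. Regex.Unescape throws ArgumentException
    // on malformed sequences (for example a truncated \u00).
    public static string Decode(string text) => Regex.Unescape(text);

    public static void Main()
    {
        // Verbatim string: the backslashes below are real characters.
        string raw = @"Test \u000d\u000aTesting with new line.";
        Console.WriteLine(Decode(raw));
    }
}
```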

New line character is not preserved when I retrieve from xml file. I have used cdata

I am new to unit test case writing using mstest framework and I am stuck in retrieving new line character from the xml file as an input to expected value. Below is the piece of test method
public void ExtractNewLineTest()
{
MathLibraray target = new MathLibraray(); // TODO: Initialize to an appropriate value
string expected = TestContext.DataRow["ExpectedValue"].ToString(); // value retrieved from the XML file
string actual;
actual = target.ExtractNewLine();
Assert.AreEqual(expected, actual);
}
Below is the xml content
<ExtractNewLineTest>
<ExpectedValue><![CDATA[select distinct\tCDBREGNO,\r\n\tMOLWEIGHT,\r\n\tMDLNUMBER\r\nfrom Mol]]></ExpectedValue>
</ExtractNewLineTest>
When I retrieve the data from the xml to expected value, I am getting below string
ExpectedValue = “select distinct\\tCDBREGNO,\\r\\n\\tMOLWEIGHT,\\r\\n\\tMDLNUMBER\\r\\nfrom Mol”;
If we see the above value, extra slash is getting added for both \n and \r.
Please let me know, how I can assert this value. Thanks!

Your XML file doesn't contain a newline. It contains a backslash followed by a t, r or n. That combination is faithfully being read from the XML file, and then the debugger is escaping the backslash when it's displaying it to you.
If you want to apply "C# string escaping" rules to the value you read from the XML, you can do so, but you should be aware that you do need to do so. A simplistic approach would be:
expectedValue = expectedValue.Replace("\\t", "\t")
.Replace("\\n", "\n")
.Replace("\\r", "\r")
.Replace("\\\\", "\\");
There are cases where that will do the wrong thing - for example if you've got a genuine backslash followed by a t. It may be good enough for you though. If not, you'll have to write a "proper" replacement which reads the string from the start and handles the various cases.
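A "proper" replacement along those lines could look like the sketch below. It walks the string once, so a doubled backslash is consumed before the following t can be mistaken for a tab. It only knows \t, \n, \r and \\ and leaves anything else untouched. The class name SimpleUnescaper is hypothetical.

```csharp
using System;
using System.Text;

public static class SimpleUnescaper
{
    public static string Unescape(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];

            // Ordinary character, or a trailing backslash with nothing after it.
            if (c != '\\' || i == input.Length - 1)
            {
                sb.Append(c);
                continue;
            }

            // Consume the character after the backslash as part of the escape.
            char next = input[++i];
            switch (next)
            {
                case 't': sb.Append('\t'); break;
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case '\\': sb.Append('\\'); break;
                default: sb.Append('\\').Append(next); break; // unknown escape: keep as-is
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Backslash-backslash-t stays a backslash followed by t, not a tab.
        Console.WriteLine(Unescape(@"a\\tb") == @"a\tb");      // True
        Console.WriteLine(Unescape(@"x\r\ny") == "x\r\ny");    // True
    }
}
```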

replacing an undefined tags inside an xml string using a regex

I need to replace undefined tags inside an XML string.
example: <abc> <>sdfsd <dfsdf></abc><def><movie></def> (only <abc> and <def> are defined)
should result with: <abc> &lt;&gt;sdfsd &lt;dfsdf&gt;</abc><def>&lt;movie&gt;</def>
<> and <dfsdf> are not among the predefined tags and do not have a closing tag.
it must be done with a regex!.
no using xml load and such.
i'm working with C# .Net
Thanks!

How about this:
string s = "<abc> <>sdfsd <dfsdf></abc><def><movie></def>";
string regex = "<(?!/?(?:abc|def)>)|(?<!</?(?:abc|def))>";
string result = Regex.Replace(s, regex, match =>
{
    if (match.Value == "<")
        return "&lt;";
    else
        return "&gt;";
});
Console.WriteLine(result);
Result:
<abc> &lt;&gt;sdfsd &lt;dfsdf&gt;</abc><def>&lt;movie&gt;</def>
Also, when tested on your other test case (which by the way I found in a comment on the other question):
<abc>>sdfsdf<<asdada>>asdasd<>asdasd<asdsad>asds<</abc>
I get this result:
<abc>&gt;sdfsdf&lt;&lt;asdada&gt;&gt;asdasd&lt;&gt;asdasd&lt;asdsad&gt;asds&lt;</abc>
Let me guess... this doesn't work for you because you just thought of a new requirement? ;)

it must be done with a regex! no using xml load and such.
I must hammer this nail in with my boot! No using a hammer and such. It's an old story :)
You'll need to supply more information. Are "valid" tags allowed to be nested? Are the "valid" tags likely to change at any point? How robust does this need to be?
Assuming that your list of valid tags isn't going to change at any point, you could do it with a regex substitution:
s/<(?!\/?(?:your|valid|tags))([^>]*)>/&lt;$1&gt;/g
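Since the question is about C#, here is a rough equivalent of that substitution as a sketch. It builds the pattern from a list of allowed tag names and escapes every other bracketed run. The names TagEscaper and EscapeUnknownTags are invented for the example. Unlike the substitution above, it requires the closing > right after the tag name and excludes < from the tag body, which is my own tightening.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagEscaper
{
    // Escapes <...> runs whose name is not in validTags. Lone, unpaired
    // brackets are not handled by this sketch.
    public static string EscapeUnknownTags(string input, params string[] validTags)
    {
        string alternation = string.Join("|", validTags.Select(t => Regex.Escape(t)));
        string pattern = @"<(?!/?(?:" + alternation + @")>)([^<>]*)>";
        return Regex.Replace(input, pattern, m => "&lt;" + m.Groups[1].Value + "&gt;");
    }

    public static void Main()
    {
        string s = "<abc> <>sdfsd <dfsdf></abc><def><movie></def>";
        Console.WriteLine(EscapeUnknownTags(s, "abc", "def"));
        // <abc> &lt;&gt;sdfsd &lt;dfsdf&gt;</abc><def>&lt;movie&gt;</def>
    }
}
```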

parsing XML with ampersand

I have a string which contains XML. I just want to parse it into an XElement, but it has an ampersand. I still have a problem parsing it with HtmlDecode. Any suggestions?
string test = " <MyXML><SubXML><XmlEntry Element=\"test\" value=\"wow&\" /></SubXML></MyXML>";
XElement.Parse(HttpUtility.HtmlDecode(test));
I also added these methods to replace those characters, but I am still getting an XmlException.
string encodedXml = test.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;").Replace("\"", "&quot;").Replace("'", "&apos;");
XElement myXML = XElement.Parse(encodedXml);
I even tried it with this:
string newContent= SecurityElement.Escape(test);
XElement myXML = XElement.Parse(newContent);

Ideally the XML is escaped properly prior to your code consuming it. If this is beyond your control you could write a regex. Do not use the String.Replace method unless you're absolutely sure the values do not contain other escaped items.
For example, "wow&amp;".Replace("&", "&amp;") results in wow&amp;amp; which is clearly undesirable.
Regex.Replace can give you more control to avoid this scenario, and can be written to only match "&" symbols that are not part of an existing entity such as &lt;, something like:
string result = Regex.Replace(test, "&(?!(amp|apos|quot|lt|gt);)", "&amp;");
The above works, but admittedly it doesn't cover the variety of other entities that start with an ampersand, such as &nbsp; and the list can grow.
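To make the lookahead idea concrete, here is a small self-contained sketch that escapes only bare ampersands and then parses the result. The extra alternatives for numeric character references (#38; and #x26; style) are my addition and not part of the pattern above. The class name AmpersandFixer is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class AmpersandFixer
{
    // Replaces & with &amp; unless it already starts one of the five
    // predefined XML entities or a numeric character reference.
    public static string EscapeBareAmpersands(string xml) =>
        Regex.Replace(xml, @"&(?!(?:amp|apos|quot|lt|gt|#\d+|#x[0-9A-Fa-f]+);)", "&amp;");

    public static void Main()
    {
        string test = "<MyXML><SubXML><XmlEntry Element=\"test\" value=\"wow&\" /></SubXML></MyXML>";
        XElement doc = XElement.Parse(EscapeBareAmpersands(test));

        // The parser turns &amp; back into a plain ampersand.
        Console.WriteLine(doc.Element("SubXML").Element("XmlEntry").Attribute("value").Value); // wow&
    }
}
```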
A more flexible approach would be to decode the content of the value attribute, then re-encode it. If you have value="&wow&amp;", the decode process would return "&wow&", and re-encoding it would return "&amp;wow&amp;", which is desirable. To pull this off you could use this:
string result = Regex.Replace(test, @"value=\""(.*?)\""", m => "value=\"" +
HttpUtility.HtmlEncode(HttpUtility.HtmlDecode(m.Groups[1].Value)) +
"\"");
var doc = XElement.Parse(result);
Bear in mind that the above regex only targets the contents of the value attribute. If there are other areas in the XML structure that suffer from the same issue then it can be tweaked to match them and replace their content in a similar fashion.
EDIT: updated solution that should handle content between tags as well as anything between double quotes. Be sure to test this thoroughly. Attempting to manipulate XML/HTML tags with regex is not favorable as it can be error prone and over-complicated. Your case is somewhat special since you need to sanitize it first in order to make use of it.
string pattern = "(?<start>>)(?<content>.+?(?<!>))(?<end><)|(?<start>\")(?<content>.+?)(?<end>\")";
string result = Regex.Replace(test, pattern, m =>
m.Groups["start"].Value +
HttpUtility.HtmlEncode(HttpUtility.HtmlDecode(m.Groups["content"].Value)) +
m.Groups["end"].Value);
var doc = XElement.Parse(result);

Your string doesn't contain valid XML, that's the issue. You need to change your string to:
<MyXML><SubXML><XmlEntry Element="test" value="wow&amp;" /></SubXML></MyXML>

HtmlEncode will not do the trick; it will probably create even more ampersands (for instance, a " might become &quot;, which is an XML entity reference). The XML entity references are the following:
&amp; &
&apos; '
&quot; "
&lt; <
&gt; >
But you might also get things like &nbsp;, which is fine in HTML but not in XML. Therefore, like everybody else said, correct the XML first: make sure that any character that is not part of the actual markup of your XML (that is to say, anything inside your XML as a variable or text) and that occurs in the entity reference list is translated to its corresponding entity (so < would become &lt;). If the illegal character is in text inside an XML node, you could take the easy way and surround the text with a CDATA section. This won't work for attributes, though.

Filip's answer is on the right track, but you can hijack the System.Xml.XmlDocument class to do this for you without an entire new utility function.
XmlDocument doc = new XmlDocument();
string xmlEscapedString = (doc.CreateTextNode("Unescaped '&' containing string that would have broken your xml")).OuterXml;

The ampersand makes the XML invalid. This cannot be fixed by a stylesheet, so you need to write code with some other tool or in VB/C#/PHP/Delphi/Lisp/etc. to remove it or to translate it to &amp;.

This is the simplest and best approach. It works with all characters and lets you parse XML for any web service call, e.g. SharePoint ASMX.
public string XmlEscape(string unescaped)
{
XmlDocument doc = new XmlDocument();
var node = doc.CreateElement("root");
node.InnerText = unescaped;
return node.InnerXml;
}

If your string is not valid XML, it will not parse. If it contains an ampersand on its own, it's not valid XML. Contrary to HTML, XML is very strict.

You should 'encode' rather than decode. But calling HttpUtility.HtmlEncode will not help you as it will encode your '<' and '>' symbols as well and your string will no longer be an XML.
I think that for this case the best solution would be to replace '&' with '&amp;'.

Perhaps consider writing your own XMLDocumentScanner. That's what NekoHTML is doing to have the ability to ignore ampersands not used as entity references.
