How to clean up json string from Request.Url

I'm looking for a way to clean up the following JSON string in C# so that it is more usable:
"?{\"token\":\"I3dt-MIByyWD5-XqF6VT3hQSk8qvy9r6\"}"
Basically, I just want a way to strip it down to
"token:I3dt-MIByyWD5-XqF6VT3hQSk8qvy9r6" or just "I3dt-MIByyWD5-XqF6VT3hQSk8qvy9r6"
I assume that a regular expression would be good for accomplishing this, but unfortunately I've never written one before and I'm a bit lost on how to get what I'm looking for using one. The parsing is happening in C#, by the way.
EDIT: Correction: a regular expression probably won't do what I want. I want to format the string, not just validate it.

It is better to parse the string into a JSON object and then use the JSON API to get the value of the token key.
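A minimal sketch of that idea, assuming System.Text.Json is available (it ships with .NET Core 3.0 and later; the question itself does not say which JSON library is in use). The class and method names are hypothetical:

```csharp
using System;
using System.Text.Json;

static class TokenExtractor
{
    // Hypothetical helper: takes the raw query string (e.g. Request.Url.Query)
    // and returns the value of the "token" property.
    public static string ExtractToken(string query)
    {
        // Drop the leading '?' and undo any URL encoding of the JSON text.
        string json = Uri.UnescapeDataString(query.TrimStart('?'));

        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // GetProperty throws KeyNotFoundException if "token" is absent.
            return doc.RootElement.GetProperty("token").GetString();
        }
    }
}
```

Calling TokenExtractor.ExtractToken("?{\"token\":\"I3dt-MIByyWD5-XqF6VT3hQSk8qvy9r6\"}") should yield just the token value, with no manual stripping of quotes or braces.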

Check this out:
http://msdn.microsoft.com/en-us/library/bb299886.aspx
The article uses a class called JsonTextReader (from the Jayrock library), which you can use for parsing.
Here's how:
string jsonText = @"[""Europe"", ""Asia"", ""Australia"", ""Antarctica"",
""North America"", ""South America"", ""Africa""]";
using (JsonTextReader reader = new JsonTextReader(new StringReader(jsonText)))
{
    while (reader.Read())
    {
        // Print only the string tokens that start with "A".
        if (reader.TokenClass == JsonTokenClass.String &&
            reader.Text.StartsWith("A"))
        {
            Console.WriteLine(reader.Text);
        }
    }
}

You can try a method like this. It treats & as the separator if you have a chain of pairs such as "?{\"token\":\"I3dt-MIByyWD5-XqF6VT3hQSk8qvy9r6\"&\"other\":\"123\"}". It also strips the " and \ characters.
static string MyParserJson(string sjson, string key)
{
try
{
if (!(sjson.Contains("{") && sjson.Contains("}")))
throw new ApplicationException("don't exist { or }");
int inipos = sjson.IndexOf("{");
int endpos = sjson.IndexOf("}");
var myjson = sjson.Substring(inipos + "{".Length, endpos - (inipos + "{".Length));
string[] ajson = myjson.Split('&');
foreach (var keyval in ajson)
{
if (!keyval.Contains(":"))
continue;
string[] afind = keyval.Split(':');
if (afind[0].Contains(key))
{
return afind[1].Replace("\"", "").Replace("\\", "").Trim();
}
}
}
catch
{
//test
}
return string.Empty;
}
var uri = "?{\"token\":\"I3dt-MIByyWD5-XqF6VT3hQSk8qvy9r6\"}";
var token = MyParserJson(uri, "token");

Related

How to convert a string containing an array into a list<ushort> ?

For example, there is a string:
string str = "[2,3,4,5]";
How do I convert this string, which contains an array, into a list where each element is of type ushort?
The string gets the value "[2,3,4,5]" from a Ruby script.

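Using LINQ you could do it like this (note that it parses each digit character separately, so it only works when every element is a single digit):
var numbers = str.Where(y=>Char.IsDigit(y)).Select(p=>UInt16.Parse(p.ToString())).ToArray();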

It's actually quite simple. All you need to do is write a method that parses the string and splits it up. Here is a basic example with NO error checking or optimizations. The naming convention is purely for your understanding purposes.
List<ushort> ConvertToUShortList(string arrayText)
{
    var result = new List<ushort>();
    var bracketsRemoved = arrayText.Replace("[", "").Replace("]", "");
    var numbersSplit = bracketsRemoved.Split(new string[] { "," }, System.StringSplitOptions.None);
    foreach (var number in numbersSplit)
    {
        result.Add(ushort.Parse(number));
    }
    return result;
}
I shouldn't need to explain anything in this method due to the names I have given things. If you don't understand anything, let me know and I'll clarify it for you.
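Since the example above deliberately skips error checking, here is a hedged sketch of the same approach built on ushort.TryParse, so malformed or out-of-range input is reported instead of throwing. The class and method names are hypothetical, and the inline out variable assumes C# 7 or later:

```csharp
using System;
using System.Collections.Generic;

static class UShortListParser
{
    // Returns false (with an empty list) if any element is not a valid ushort.
    public static bool TryParseUShortList(string arrayText, out List<ushort> result)
    {
        result = new List<ushort>();
        if (string.IsNullOrWhiteSpace(arrayText)) return false;

        // Strip the surrounding brackets, then split on commas.
        string inner = arrayText.Trim().TrimStart('[').TrimEnd(']');
        foreach (string part in inner.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!ushort.TryParse(part.Trim(), out ushort value))
            {
                result.Clear();
                return false;
            }
            result.Add(value);
        }
        return true;
    }
}
```

With "[2,3,4,5]" this fills the list with four values; with "[2,x]" or "[70000]" it returns false because the element is not a number or does not fit in a ushort.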

Another method (more checks):
class Program
{
static void Main(string[] args)
{
var str = "[1,2,3,4,5,6,7,8,9]";
var x = FromRubyArray(str);
Console.WriteLine(str);
Console.WriteLine(string.Join("-", x));
Console.ReadLine();
}
public static List<ushort> FromRubyArray(string stra)
{
if (string.IsNullOrWhiteSpace(stra)) return new List<ushort>();
stra = stra.Trim();
stra = stra.Trim('[', ']');
return stra
.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => Convert.ToUInt16(s))
.ToList();
}
}

Since this string is using a "Json-like" format, you can use this code:
JavaScriptSerializer serializer = new JavaScriptSerializer();
var array = serializer.Deserialize<ushort[]>("[2,3,4,5]");
You just need to reference the System.Web.Extensions assembly

Another elegant way is to use Newtonsoft's Json.NET (http://www.newtonsoft.com/json):
var ushortArray = JsonConvert.DeserializeObject<List<ushort>>(myString);

You could do something like
List<ushort> myUshorts = new List<ushort>("[200,3,4,5]".Trim('[', ']').Split(',').Select(ushort.Parse));
if you know that's exactly how the output will be.

XmlException - given illegal XML from 3rd party; must process

There are several SO questions and answers about this for when you create an XML file, but I can't find any that apply when you are given bad XML from a 3rd party and must process it. Note that the 3rd party cannot be held accountable for the illegal XML.
Ultimately, the .InnerText needs to be escaped or encoded (i.e. changed to legal XML characters), and later decoded after proper XML parsing.
QUESTION: Are there any libraries that will Load() invalid/illegal XML files to allow quick navigation for such escaping/encoding? Or am I stuck having to manually parse the invalid XML, fixing it along the way?
<?xml version="1.0" encoding="utf-8"?>
<ChunkData>
<Fields>
<Field1>some words < other words</Field1>
<Field2>some words > other words</Field2>
</Fields>
</ChunkData>

Although HtmlAgilityPack is awesome (and I'm using it in another project of my own), I was given no time to follow Alexei's advice, which is exactly the direction I was looking for: can't parse it as XML? Cool, parse it as HTML. That didn't even cross my mind.
I ended up with this, which does the trick (but is exactly what Alexei advised against):
private static string EncodeValues(string xml)
{
var doc = new List<string>();
var lines = xml.Split('\n');
foreach (var line in lines)
{
var output = line;
if (line.Contains("<Field") && !line.Contains("Fields>"))
{
var value = line.Parse(">", "</");
var encoded = HttpUtility.UrlEncode(value);
output = line.Replace(value, encoded);
}
doc.Add(output);
}
return string.Join("", doc);
}
private static Hashtable DecodeValues(IDictionary data)
{
var output = new Hashtable();
foreach (var key in data.Keys)
{
var value = (string)data[key];
output.Add(key, HttpUtility.UrlDecode(value));
}
return output;
}
Used in conjunction with an extension method I wrote quite a while ago:
public static string Parse(this string s, string first, string second)
{
try
{
if (string.IsNullOrEmpty(s)) return "";
var start = s.IndexOf(first, StringComparison.InvariantCulture) + first.Length;
var end = s.IndexOf(second, start, StringComparison.InvariantCulture);
var length = end - start;
return (end > 0 && length < s.Length) ? s.Substring(start, length) : s.Substring(start);
}
catch (Exception) { return ""; }
}
Used as such (kept separate from the Transform and Hashtable creation methods for clarity):
xmlDocs[0] = EncodeValues(xmlDocs[0]); // in order to handle illegal chars in XML, encode InnerText
var doc = TransformXmlDocument(orgName, xmlDocs[0], xmlDocs[1]);
var data = GetHashtableFromXml(doc);
data = DecodeValues(data); // decode the values extracted from the hashtable
Regardless, I'm always looking for insight ... feel free to comment on this solution - or provide another.
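As one possible alternative to URL-encoding the values, here is a rough sketch that XML-escapes the text of each FieldN element with a regular expression before parsing, so no decode pass is needed afterwards. It assumes the element names follow the Field1, Field2 pattern from the sample, that fields are not nested, and that the values are not already partly escaped (an existing &amp; would be escaped twice). XmlRepair and LoadLenient are hypothetical names:

```csharp
using System;
using System.Security;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class XmlRepair
{
    // Matches <FieldN>value</FieldN>; the back-reference \1 ties the closing
    // tag to the opening one, so stray '<' or '>' inside the value is captured.
    static readonly Regex FieldPattern =
        new Regex(@"<(Field\d+)>(.*?)</\1>", RegexOptions.Singleline);

    public static XDocument LoadLenient(string xml)
    {
        string repaired = FieldPattern.Replace(xml, m =>
        {
            string name = m.Groups[1].Value;
            // SecurityElement.Escape replaces <, >, ", ' and & with entities.
            string value = SecurityElement.Escape(m.Groups[2].Value);
            return "<" + name + ">" + value + "</" + name + ">";
        });
        return XDocument.Parse(repaired);
    }
}
```

After loading, doc.Root.Element("Fields").Element("Field1").Value gives back the original text, because the XML parser undoes the escaping on read.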

How to use replace only the first occurence of www

In my code-behind in C# I have the following code. How do I change the replace so that only
the first occurrence of www is replaced?
For example, if the user enters www.testwww.com, then I should save it as testwww.com.
Currently, with the code below, it is saved as www.com (I guess because of the Substring call).
Please help. Thanks in advance.
private string FilterUrl(string url)
{
string lowerCaseUrl = url.ToLower();
lowerCaseUrl = lowerCaseUrl.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
lowerCaseUrl = lowerCaseUrl.Replace("www.", string.Empty);
string lCaseUrl = url.Substring(url.Length - lowerCaseUrl.Length, lowerCaseUrl.Length);
return lCaseUrl;
}

As Ally suggested, you are much better off using System.Uri. This also removes the leading www as you wish.
private string FilterUrl(string url)
{
    Uri uri = new UriBuilder(url).Uri; // defaults to http:// if missing
    // The dot is escaped so that only a literal "www." prefix matches.
    return Regex.Replace(uri.Host, @"^www\.", "") + uri.PathAndQuery;
}
Edit: The trailing slash comes from the PathAndQuery property. If there is no path, you are left with the slash only. Just add another regex replace or string replace. Here's the regex way:
return Regex.Replace(uri.Host, @"^www\.", "") + Regex.Replace(uri.PathAndQuery, "/$", "");

I would suggest using indexOf(string) to find the first occurrence.
Edit: okay someone beat me to it ;)
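A minimal sketch of the IndexOf approach, assuming a case-insensitive match is wanted. The extension method name ReplaceFirst is hypothetical:

```csharp
using System;

static class StringExtensions
{
    // Replaces only the first occurrence of 'search' (ignoring case);
    // returns the input unchanged if 'search' is empty or not found.
    public static string ReplaceFirst(this string text, string search, string replacement)
    {
        if (string.IsNullOrEmpty(search)) return text;

        int position = text.IndexOf(search, StringComparison.OrdinalIgnoreCase);
        if (position < 0) return text;

        return text.Substring(0, position)
             + replacement
             + text.Substring(position + search.Length);
    }
}
```

For the example in the question, "www.testwww.com".ReplaceFirst("www.", "") leaves the second www untouched.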

You could use IndexOf like Felipe suggested, or do it the low-tech way. Note that the longer "http://www." patterns have to be replaced before the bare schemes, otherwise they never match:
lowerCaseUrl = lowerCaseUrl.Replace("http://www.", string.Empty).Replace("https://www.", string.Empty).Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
I would be interested to know what you're trying to achieve.

Came up with a cool static method, also works for replacing the first x occurrences:
public static string ReplaceOnce(this string s, string replace, string with)
{
return s.ReplaceCount(replace, with);
}
public static string ReplaceCount(this string s, string replace, string with, int howManytimes = 1)
{
if (howManytimes < 0) throw new InvalidOperationException("can not replace a string less than zero times");
int count = 0;
while (s.Contains(replace) && count < howManytimes)
{
int position = s.IndexOf(replace);
s = s.Remove(position, replace.Length);
s = s.Insert(position, with);
count++;
}
return s;
}
The ReplaceOnce isn't necessary, just a simplifier. Call it like this:
string url = "http://www.stackoverflow.com/questions/www/www";
var urlR1 = url.ReplaceOnce("www", "xxx");
// urlR1 = "http://xxx.stackoverflow.com/questions/www/www";
var urlR2 = url.ReplaceCount("www", "xxx", 2);
// urlR2 = "http://xxx.stackoverflow.com/questions/xxx/www";
NOTE: this is case-sensitive as it is written

The Replace method changes every occurrence in the string. You have to locate the piece you want to remove with the IndexOf method and remove it with the Remove method of string. Try something like this:
// include the namespace
using System.Globalization;
private string FilterUrl(string url)
{
    // Create a CompareInfo object for culture-invariant comparisons.
    CompareInfo myCompare = CultureInfo.InvariantCulture.CompareInfo;
    // Find the first 'www.' in the url parameter, ignoring case.
    int position = myCompare.IndexOf(url, "www.", CompareOptions.IgnoreCase);
    // If 'www.' exists, remove exactly its four characters.
    if (position > -1)
    {
        url = url.Remove(position, "www.".Length);
    }
    // If you want to remove http://, https://, ftp://, keep this line.
    url = url.Replace("http://", string.Empty).Replace("https://", string.Empty).Replace("ftp://", string.Empty);
    return url;
}
Edits
There was a part of your code that removed a piece of the string. If you just want to remove 'www.' and 'http://', 'https://', 'ftp://', take a look at this code.
This code also ignores case when it searches the url parameter for 'www.'.

Json.NET: Deserialization with Double Quotes

I am trying to deserialize a JSON string received as a response from the service. The client is Windows Phone 7, in C#. I am using Json.NET (James Newton-King's deserializer) to convert the JSON string directly to objects. But sometimes the JSON string contains comment text with double quotes (") in it, and the deserializer fails and throws an error. According to JSONLint, this is an invalid JSON string.
{
"Name": "A1",
"Description": "description of the "object" A1"
}
How should I handle such a JSON string? If it is (\"), then it works. But I cannot replace every (") with (\"), as there might be double quotes in other parts of the JSON string. Is there a decode function in Json.NET?

It looks like HttpUtility.JavaScriptStringEncode might solve your issue.
HttpUtility.JavaScriptStringEncode(JsonConvert.SerializeObject(yourObject))

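Just do:
yourJsonString = yourJsonString.Replace("\"", "\\u0022");
object o = JsonConvert.DeserializeObject(yourJsonString);
\u0022 is the Unicode escape for the double quote, so a JSON parser reads it as a quote character inside a string.
The doubled backslash in "\\u0022" makes C# emit a literal backslash.
Note that this replaces every double quote, including the ones that delimit keys and values, so it is only safe on a fragment that is known to be string content.
Cheers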

You can improve on this:
static private T CleanJson<T>(string jsonData)
{
var json = jsonData.Replace("\t", "").Replace("\r\n", "");
var loop = true;
do
{
try
{
var m = JsonConvert.DeserializeObject<T>(json);
loop = false;
}
catch (JsonReaderException ex)
{
var position = ex.LinePosition;
var invalidChar = json.Substring(position - 2, 2);
invalidChar = invalidChar.Replace("\"", "'");
json = $"{json.Substring(0, position -1)}{invalidChar}{json.Substring(position)}";
}
} while (loop);
return JsonConvert.DeserializeObject<T>(json);
}
Example:
var item = CleanJson<ModelItem>(jsonString);

I had the same problem and found a possible solution. The idea is to catch the JsonReaderException. This exception gives you the "LinePosition" property. You can replace the character at this position with a space (' '), and then call the method recursively until the whole JSON is fixed.
This is my example:
private JToken processJsonString(string data, int failPosition)
{
string json = "";
var doubleQuote = "\"";
try
{
var jsonChars = data.ToCharArray();
if (jsonChars[failPosition - 1].ToString().Equals(doubleQuote))
{
jsonChars[failPosition - 1] = ' ';
}
json = new string(jsonChars);
return JToken.Parse(json);
}
catch(JsonReaderException jsonException)
{
return this.processJsonString(json, jsonException.LinePosition);
}
}
I hope you enjoy it.

I would recommend writing an email to the server admin/webmaster and asking them to fix this issue with the JSON.
But if this is impossible, you can write a simple parser that finds unescaped double quotes inside double-quoted strings and escapes them. It will hardly be more than 20 lines of code.
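One way such a parser could look is sketched below. It rests on a heuristic that is an assumption, not part of the answer: a quote inside a string is treated as the closing quote only if the next non-whitespace character is one of : , } ] or the end of input. Text such as "a", "b" inside a value will defeat it. The names are hypothetical:

```csharp
using System;
using System.Text;

static class JsonQuoteRepair
{
    public static string EscapeInnerQuotes(string json)
    {
        var sb = new StringBuilder(json.Length + 16);
        bool inString = false;
        for (int i = 0; i < json.Length; i++)
        {
            char c = json[i];
            if (!inString)
            {
                if (c == '"') inString = true;   // opening quote
                sb.Append(c);
            }
            else if (c == '\\' && i + 1 < json.Length)
            {
                // Already-escaped character: copy both characters unchanged.
                sb.Append(c).Append(json[++i]);
            }
            else if (c == '"')
            {
                if (LooksLikeClosingQuote(json, i + 1))
                {
                    inString = false;
                    sb.Append(c);
                }
                else
                {
                    sb.Append("\\\"");           // escape the stray quote
                }
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Heuristic: a closing quote is followed by a JSON delimiter or end of input.
    static bool LooksLikeClosingQuote(string json, int index)
    {
        while (index < json.Length && char.IsWhiteSpace(json[index])) index++;
        if (index >= json.Length) return true;
        char next = json[index];
        return next == ':' || next == ',' || next == '}' || next == ']';
    }
}
```

Applied to the sample from the question, the quotes around object are escaped and the structural quotes are left alone, so the result can be handed to JsonConvert.DeserializeObject.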

You can use the Newtonsoft library to convert it to an object (to replace \" with "):
dynamic o = JObject.Parse(jsondata);
return Json(o);

String escape into XML

Is there any C# function which could be used to escape and un-escape a string, which could be used to fill in the content of an XML element?
I am using VSTS 2008 + C# + .Net 3.0.
EDIT 1: I am concatenating simple and short XML file and I do not use serialization, so I need to explicitly escape XML character by hand, for example, I need to put a<b into <foo></foo>, so I need escape string a<b and put it into element foo.

SecurityElement.Escape(string s)
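As a short illustration of that call applied to the a<b case from the question: the sketch below escapes the content before concatenating it into an element. XmlText and WrapInElement are hypothetical names, and the element name itself is assumed to be valid already:

```csharp
using System;
using System.Security;

static class XmlText
{
    // Builds <elementName>content</elementName> with the content XML-escaped.
    public static string WrapInElement(string elementName, string content)
    {
        // SecurityElement.Escape replaces <, >, ", ' and & with entities.
        return "<" + elementName + ">"
             + SecurityElement.Escape(content)
             + "</" + elementName + ">";
    }
}
```

XmlText.WrapInElement("foo", "a<b") returns <foo>a&lt;b</foo>.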

public static string XmlEscape(string unescaped)
{
XmlDocument doc = new XmlDocument();
XmlNode node = doc.CreateElement("root");
node.InnerText = unescaped;
return node.InnerXml;
}
public static string XmlUnescape(string escaped)
{
XmlDocument doc = new XmlDocument();
XmlNode node = doc.CreateElement("root");
node.InnerXml = escaped;
return node.InnerText;
}

EDIT: You say "I am concatenating simple and short XML file and I do not use serialization, so I need to explicitly escape XML character by hand".
I would strongly advise you not to do it by hand. Use the XML APIs to do it all for you - read in the original files, merge the two into a single document however you need to (you probably want to use XmlDocument.ImportNode), and then write it out again. You don't want to write your own XML parsers/formatters. Serialization is somewhat irrelevant here.
If you can give us a short but complete example of exactly what you're trying to do, we can probably help you to avoid having to worry about escaping in the first place.
Original answer
It's not entirely clear what you mean, but normally XML APIs do this for you. You set the text in a node, and it will automatically escape anything it needs to. For example:
LINQ to XML example:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
XElement element = new XElement("tag",
"Brackets & stuff <>");
Console.WriteLine(element);
}
}
DOM example:
using System;
using System.Xml;
class Test
{
static void Main()
{
XmlDocument doc = new XmlDocument();
XmlElement element = doc.CreateElement("tag");
element.InnerText = "Brackets & stuff <>";
Console.WriteLine(element.OuterXml);
}
}
Output from both examples:
<tag>Brackets &amp; stuff &lt;&gt;</tag>
That's assuming you want XML escaping, of course. If you're not, please post more details.

Thanks to @sehe for the one-line escape:
var escaped = new System.Xml.Linq.XText(unescaped).ToString();
I add to it the one-line un-escape:
var unescapedAgain = System.Xml.XmlReader.Create(new StringReader("<r>" + escaped + "</r>")).ReadElementString();

George, it's simple. Always use the XML APIs to handle XML. They do all the escaping and unescaping for you.
Never create XML by appending strings.

And if you want, like me when I found this question, to escape XML node names (for example when reading from an XML serialization), the easiest way is:
XmlConvert.EncodeName(string nameToEscape)
It will also escape spaces and any characters that are not valid in XML element names.
http://msdn.microsoft.com/en-us/library/system.security.securityelement.escape%28VS.80%29.aspx
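A small sketch of the name-escaping round trip with XmlConvert.EncodeName and XmlConvert.DecodeName; the wrapper class is hypothetical and only there to keep the example self-contained:

```csharp
using System;
using System.Xml;

static class NameEscaping
{
    // Encodes characters that are not allowed in an XML name as _xHHHH_.
    public static string Encode(string rawName)
    {
        return XmlConvert.EncodeName(rawName);
    }

    // Reverses EncodeName.
    public static string Decode(string encodedName)
    {
        return XmlConvert.DecodeName(encodedName);
    }
}
```

For example, NameEscaping.Encode("Order Details") gives Order_x0020_Details, and Decode turns it back into the original name.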

Another take based on Jon Skeet's answer that doesn't return the tags:
void Main()
{
XmlString("Brackets & stuff <> and \"quotes\"").Dump();
}
public string XmlString(string text)
{
return new XElement("t", text).LastNode.ToString();
}
This returns just the value passed in, in XML encoded format:
Brackets &amp; stuff &lt;&gt; and "quotes"

WARNING: Necromancing
Still Darin Dimitrov's answer + System.Security.SecurityElement.Escape(string s) isn't complete.
In XML 1.1, the simplest and safest way is to just encode EVERYTHING.
Like &#9; for \t.
It isn't supported at all in XML 1.0.
For XML 1.0, one possible workaround is to base-64 encode the text containing the character(s).
//string EncodedXml = SpecialXmlEscape("привет мир");
//Console.WriteLine(EncodedXml);
//string DecodedXml = XmlUnescape(EncodedXml);
//Console.WriteLine(DecodedXml);
public static string SpecialXmlEscape(string input)
{
//string content = System.Xml.XmlConvert.EncodeName("\t");
//string content = System.Security.SecurityElement.Escape("\t");
//string strDelimiter = System.Web.HttpUtility.HtmlEncode("\t"); // XmlEscape("\t"); //XmlDecode(" ");
//strDelimiter = XmlUnescape(";");
//Console.WriteLine(strDelimiter);
//Console.WriteLine(string.Format("&#{0};", (int)';'));
//Console.WriteLine(System.Text.Encoding.ASCII.HeaderName);
//Console.WriteLine(System.Text.Encoding.UTF8.HeaderName);
string strXmlText = "";
if (string.IsNullOrEmpty(input))
return input;
System.Text.StringBuilder sb = new StringBuilder();
for (int i = 0; i < input.Length; ++i)
{
sb.AppendFormat("&#{0};", (int)input[i]);
}
strXmlText = sb.ToString();
sb.Clear();
sb = null;
return strXmlText;
} // End Function SpecialXmlEscape
XML 1.0:
public static string Base64Encode(string plainText)
{
var plainTextBytes = System.Text.Encoding.UTF8.GetBytes(plainText);
return System.Convert.ToBase64String(plainTextBytes);
}
public static string Base64Decode(string base64EncodedData)
{
var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
return System.Text.Encoding.UTF8.GetString(base64EncodedBytes);
}

The following functions will do the work. I didn't test them against XmlDocument, but I guess this is much faster.
public static string XmlEncode(string value)
{
System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings
{
ConformanceLevel = System.Xml.ConformanceLevel.Fragment
};
StringBuilder builder = new StringBuilder();
using (var writer = System.Xml.XmlWriter.Create(builder, settings))
{
writer.WriteString(value);
}
return builder.ToString();
}
public static string XmlDecode(string xmlEncodedValue)
{
System.Xml.XmlReaderSettings settings = new System.Xml.XmlReaderSettings
{
ConformanceLevel = System.Xml.ConformanceLevel.Fragment
};
using (var stringReader = new System.IO.StringReader(xmlEncodedValue))
{
using (var xmlReader = System.Xml.XmlReader.Create(stringReader, settings))
{
xmlReader.Read();
return xmlReader.Value;
}
}
}

Using a third-party library (Newtonsoft.Json) as alternative:
public static string XmlEscape(string unescaped)
{
if (unescaped == null) return null;
return JsonConvert.SerializeObject(unescaped);
}
public static string XmlUnescape(string escaped)
{
if (escaped == null) return null;
return JsonConvert.DeserializeObject(escaped, typeof(string)).ToString();
}
Examples of escaped string:
a<b ==> "a<b"
<foo></foo> ==> "<foo></foo>"
NOTE:
In newer versions, the code written above may not work with escaping, so you need to specify how the strings will be escaped:
public static string XmlEscape(string unescaped)
{
if (unescaped == null) return null;
return JsonConvert.SerializeObject(unescaped, new JsonSerializerSettings()
{
StringEscapeHandling = StringEscapeHandling.EscapeHtml
});
}
Examples of escaped string:
a<b ==> "a\u003cb"
<foo></foo> ==> "\u003cfoo\u003e\u003c/foo\u003e"

SecurityElement.Escape does this job for you.
Use this method to replace invalid characters in a string before using the string in a SecurityElement. If invalid characters are used in a SecurityElement without being escaped, an ArgumentException is thrown.
The following table shows the invalid XML characters and their escaped equivalents.
Invalid XML character | Replaced with
< | &lt;
> | &gt;
" | &quot;
' | &apos;
& | &amp;
https://learn.microsoft.com/en-us/dotnet/api/system.security.securityelement.escape?view=net-5.0
