How to convert a utf8 like string to a real utf8?


I'm working on a client that communicates with an MMO game server.
The client uses Unity3D.
I get the data from the server in JSON format and try to read it with UTF-8 encoding:
string responseString = new System.IO.StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8).ReadToEnd();
JSONObject JOBJ = new JSONObject(responseString);
The response string looks like this:
"\u0645\u0639\u062f\u0646 \u062a\u06cc\u062a\u0627\u0646\u06cc\u0648\u0645"
Then I try to get the required UTF-8 string out of the JSON:
string xy = JOBJ["name"].ToString();
byte[] utf = System.Text.Encoding.UTF8.GetBytes(xy);
string s2 = System.Text.Encoding.UTF8.GetString(utf);
The problem is that when I log the string:
Debug.Log("Jproperty :" + s2);
all I get is the \u sequences, like this:
"\u0645\u0639\u062f\u0646 \u062a\u06cc\u062a\u0627\u0646\u06cc\u0648\u0645"
However, if I assign the same literal directly to xy in the first place, I get the correct result.
I should also mention that I expect s2.Length to be 11, but it is 66.
Can anyone tell me what's wrong with my code?

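Strings that contain Unicode escape sequences are perfectly valid. Your data might be getting escaped before it is sent to the server. Encoding the string to UTF-8 bytes and decoding it again just gives back the same string, so the literal \uXXXX sequences remain.
Try Regex.Unescape (from System.Text.RegularExpressions):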
var nameEscaped = JOBJ["name"].ToString();
// nameEscaped =
// \u0645\u0639\u062f\u0646 \u062a\u06cc\u062a\u0627\u0646\u06cc\u0648\u0645
var name = Regex.Unescape(nameEscaped);
// name =
// معدن تیتانیوم
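As a minimal, self-contained sketch of the same idea (the class and method names UnescapeDemo and Decode are made up for illustration), the following shows that the escaped text and the decoded text differ in length. The counts are for the sample string shown in this post, not necessarily for the asker's real data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Turns literal \uXXXX sequences into the characters they stand for.
    public static string Decode(string escaped)
    {
        return Regex.Unescape(escaped);
    }

    public static void Main()
    {
        // Verbatim string: the backslashes are real characters here,
        // just like in the response string from the server.
        string escaped = @"\u0645\u0639\u062f\u0646 \u062a\u06cc\u062a\u0627\u0646\u06cc\u0648\u0645";
        string name = Decode(escaped);

        Console.WriteLine(escaped.Length); // 73: twelve 6-character escapes plus one space
        Console.WriteLine(name.Length);    // 13: twelve letters plus one space
    }
}
```

Note that Regex.Unescape also interprets other backslash escapes (such as \n or \t), so it is only a reasonable fit when the input is known to contain this kind of escaping.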

Related

Convert a String, which is already malformed

I have a class which uses another class that reads a text file.
The text file is written in ASCII or, to be precise, CP1252.
Background info: the text file is generated in Axapta using the AsciiIo class, which writes the text with the writeRaw method.
The class I am using was written by a colleague, and he uses a C# StreamReader to read files. Normally this works fine because the files are written in UTF-8, but in this particular case they aren't.
So the StreamReader reads the file as UTF-8 and passes the resulting string to me.
I now have some letters, for example the Latin small letter o with diaeresis (ö), which aren't decoded the way I need them to be.
A simple conversion of the string doesn't help in this case, and I can't figure out how to get the right letters.
This is basically how he reads it:
char quotationChar = '"';
String line = "";
using (StreamReader reader = new StreamReader(fileName))
{
    if ((line = reader.ReadLine()) != null)
    {
        line = line.Replace(quotationChar.ToString(), "");
    }
}
return line;
What happens now is that the text file contains the German word "Röhre", which, after being read with the StreamReader, turns into R�hre (which looks stupid in a database).
I could try to convert every letter:
Encoding enc = Encoding.GetEncoding(1252);
byte[] utf8_Bytes = new byte[line.Length];
for (int i = 0; i < line.Length; ++i)
{
    utf8_Bytes[i] = (byte)line[i];
}
String propEncodeString = enc.GetString(utf8_Bytes, 0, utf8_Bytes.Length);
That doesn't give me the right character!
byte[] myarr = Encoding.UTF8.GetBytes(line);
String propEncodeString = enc.GetString(myarr);
That also returns the wrong character.
I am aware that I could solve the problem simply by using this:
using (StreamReader reader = new StreamReader(fileName, Encoding.Default, true))
But just for fun:
How can I get the right string from a string that has already been decoded wrongly?

Once the file's bytes have been decoded with the wrong encoding (here: CP1252 bytes read as UTF-8), every byte sequence that isn't valid in that encoding is replaced with the same replacement character (U+FFFD, shown as �). That means the original data is simply lost, and you can't 'convert' it back to the right character downstream. See this example: https://dotnetfiddle.net/XWysml
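The data loss can be sketched without a file. The example below is only an illustration: the byte array stands in for the file content, the names LossDemo, FileBytes, DecodeAsUtf8 and DecodeAsLatin1 are made up, and ISO-8859-1 is used as a stand-in for CP1252 because ö has the same byte value (0xF6) in both and ISO-8859-1 needs no extra code page registration on newer .NET versions.

```csharp
using System;
using System.Text;

public static class LossDemo
{
    // "Röhre" as single-byte Latin-1/CP1252 data; 0xF6 is 'ö'.
    public static readonly byte[] FileBytes = { 0x52, 0xF6, 0x68, 0x72, 0x65 };

    // What the StreamReader effectively does when it assumes UTF-8.
    public static string DecodeAsUtf8()
    {
        return Encoding.UTF8.GetString(FileBytes);
    }

    // Decoding with a matching single-byte encoding keeps the character.
    public static string DecodeAsLatin1()
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(FileBytes);
    }

    public static void Main()
    {
        string wrong = DecodeAsUtf8();
        Console.WriteLine(wrong[1] == '\uFFFD'); // True: 0xF6 became the replacement character

        // Re-encoding yields the bytes of U+FFFD, not the original 0xF6.
        byte[] roundTrip = Encoding.UTF8.GetBytes(wrong);
        Console.WriteLine(BitConverter.ToString(roundTrip)); // 52-EF-BF-BD-68-72-65

        Console.WriteLine(DecodeAsLatin1() == "R\u00F6hre"); // True
    }
}
```

Since every invalid byte maps to the same U+FFFD, nothing in the decoded string records which byte was originally there; the fix has to happen at read time, by passing the right encoding to the StreamReader.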

UriBuilder().Query will wrongly encode non-ASCII characters

I am working on an ASP.NET MVC 4 web application, and I am using .NET 4.5. I have the following code that uses the WebClient class:
using (var client = new WebClient())
{
    var query = HttpUtility.ParseQueryString(string.Empty);
    query["model"] = Model;
    //code goes here for other parameters....
    string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
    var url = new UriBuilder(apiurl);
    url.Query = query.ToString();
    string xml = client.DownloadString(url.ToString());
    XmlDocument doc = new XmlDocument();
    //code goes here ....
}
Now I have noticed a problem when one of the parameters contains non-ASCII characters such as £, ¬, etc.
The final query has any non-ASCII characters (such as £) encoded wrongly (as %u00a3). I read about this problem, and it seems I can replace:
url.Query = query.ToString();
with
url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString()));
The latter approach encodes £ as %C2%A3, which is the correct encoded value.
But the problem with url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString())); is that when one of the parameters contains &, the URL ends up in the form &operation=AddAsset&assetName=&.... so the API assumes that I am passing an empty assetName parameter rather than the value &.
EDIT
Let me summarize my problem again. I want to be able to pass the following three things inside my URL to a third-party API:
Standard characters such as A, B, a, b, 1, 2, 3 ...
Non-ASCII characters such as £, ¬.
Special characters that are used in URL encoding, such as & and +.
I tried the following two approaches:
Approach A:
using (var client = new WebClient())
{
    var query = HttpUtility.ParseQueryString(string.Empty);
    query["model"] = Model;
    //code goes here for other parameters....
    string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
    var url = new UriBuilder(apiurl);
    url.Query = query.ToString();
    string xml = client.DownloadString(url.ToString());
    XmlDocument doc = new XmlDocument();
    //code goes here ....
}
In this approach I can pass values such as & and +, since they are URL-encoded. But if I want to pass non-ASCII characters, they are encoded using ISO-8859-1. So if I have the value £, the above code encodes it as %u00a3, and it is saved inside the third-party API as %u00a3 instead of £.
Approach B:
I use:
url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString()));
instead of
url.Query = query.ToString();
Now I can pass non-ASCII characters such as £, since they are encoded correctly using UTF-8 instead of ISO-8859-1. But I cannot pass values such as &, because my URL will be read wrongly by the third-party API. For example, if I want to pass assetName=&, my URL looks as follows:
&operation=Add&assetName=&
So the third-party API assumes I am passing an empty assetName, while I am trying to pass & as its value.
So I am not sure how I can pass both non-ASCII characters and characters such as & and +.

You could use System.Net.Http.FormUrlEncodedContent instead.
This works with a Dictionary for the name/value pairs, and, unlike the NameValueCollection, it does not "incorrectly" map characters such as £ to an unhelpful escaping (%u00a3, in your case).
FormUrlEncodedContent takes the dictionary in its constructor. When you read the string out of it, the dictionary values will have been properly URL-encoded.
It correctly and uniformly handles both of the cases you were having trouble with:
£ (which is outside the ASCII range, so its UTF-8 bytes have to be percent-encoded as hexadecimal values for transport)
& (which, as you say, has meaning in the URL as a parameter separator, so values cannot contain it literally and it has to be encoded as well).
Here's a code example showing that the various kinds of items you mentioned (represented by item1, item2 and item3) now end up correctly URL-encoded:
String item1 = "£";
String item2 = "&";
String item3 = "xyz";
Dictionary<string,string> queryDictionary = new Dictionary<string, string>()
{
    {"item1", item1},
    {"item2", item2},
    {"item3", item3}
};
var queryString = new System.Net.Http.FormUrlEncodedContent(queryDictionary)
    .ReadAsStringAsync().Result;
queryString will contain item1=%C2%A3&item2=%26&item3=xyz.

Maybe you could try to use an Extension method on the NameValueCollection class. Something like this:
using System.Collections.Specialized;
using System.Text;
using System.Web;
namespace Testing
{
    public static class NameValueCollectionExtension
    {
        public static string ToUtf8UrlEncodedQuery(this NameValueCollection nv)
        {
            StringBuilder sb = new StringBuilder();
            bool firstIteration = true;
            foreach (var key in nv.AllKeys)
            {
                if (!firstIteration)
                    sb.Append("&");
                sb.Append(HttpUtility.UrlEncode(key, Encoding.UTF8))
                    .Append("=")
                    .Append(HttpUtility.UrlEncode(nv[key], Encoding.UTF8));
                firstIteration = false;
            }
            return sb.ToString();
        }
    }
}
Then, in your code you can do this:
url.Query = query.ToUtf8UrlEncodedQuery();
Remember to add a using directive for the namespace where you put the NameValueCollectionExtension class.

The problem here isn't UriBuilder.Query, it's UriBuilder.ToString(). Read the documentation here: https://msdn.microsoft.com/en-us/library/system.uribuilder.tostring(v=vs.110).aspx. The method is defined as returning the "display string" of the builder, not a validly encoded string. Uri.ToString() has a similar problem, in that it doesn't perform proper encoding.
Use url.Uri.AbsoluteUri instead; that will always be a properly encoded string. You shouldn't have to do any encoding on the way into the builder (that's part of its purpose, after all, to properly encode things).

You need to use:
System.Web.HttpUtility.UrlEncode(key)
Change your code to this:
using (var client = new WebClient())
{
    var query = HttpUtility.ParseQueryString(string.Empty);
    query["model"] = Model;
    //code goes here for other parameters....
    string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
    var url = new UriBuilder(apiurl);
    url.Query = HttpUtility.UrlEncode(query.ToString());
    string xml = client.DownloadString(url.ToString());
    XmlDocument doc = new XmlDocument();
    //code goes here ....
}

If nothing else helps, just manually convert the problematic characters inside the parameter values:
& to %26
+ to %2B
? to %3F
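Rather than replacing characters by hand, the same per-value escaping can be sketched with Uri.EscapeDataString, which percent-encodes reserved characters and encodes non-ASCII characters as UTF-8. This is only an illustration under assumptions: the names QueryDemo and BuildQuery are made up, and the parameter values are sample data modeled on the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryDemo
{
    // Escapes each key and each value separately, then joins the pairs with '&'.
    // Only the separators added here ('=' and '&') remain unescaped.
    public static string BuildQuery(IEnumerable<KeyValuePair<string, string>> pairs)
    {
        return string.Join("&", pairs.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
    }

    public static void Main()
    {
        var pairs = new[]
        {
            new KeyValuePair<string, string>("operation", "Add"),
            new KeyValuePair<string, string>("assetName", "&"),
            new KeyValuePair<string, string>("model", "£+"),
        };

        Console.WriteLine(BuildQuery(pairs));
        // operation=Add&assetName=%26&model=%C2%A3%2B
    }
}
```

The key point is that escaping happens before the pairs are joined; escaping the finished query string cannot distinguish a separator & from an & inside a value.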

Cannot split Base64 string the correct way

I'm working on a C# client that receives Base64 files via a website.
It worked sometimes, but now I have a file that doesn't work.
Let's say I have this Base64 string with some extra information added at the beginning:
shapered.png|data:image/png;base64,iVBORw0KGgoAAAANSUh /// EUgAAACAAAAAgCAYAblabla
Then I do this in the C# code:
var raw = base64.Split('/')[1]; // get raw data with type extension
var data = raw.Split(',')[1]; // get Base64 data
var info = raw.Split(',')[0]; // get type extension, e.g. mp3;base64
var ext = info.Split(';')[0]; // split header, mp3 only; used for comparison against malicious files
var name = base64.Split('|')[0]; // split off the name info
Now I see in the debugger that it cuts off all the information after the ///.
Am I completely missing the point?
I'd think that var raw = base64.Split('/')[1] would take the information after data:image/
and that raw.Split(',')[1] would then strip the png;base64 part, leaving only the raw data.
But it doesn't. It also drops everything after the next '/' from the raw variable, which corrupts the file.
The reason I'm not splitting at the ',' immediately is so that files with a ',' in their name also work.
It went completely over my head, sorry guys!
I had somehow thought that [1] would also take the rest of the "array".
string example = "im/noob/programmer";
Console.WriteLine(example);
string lol = example.Split('/')[1]; //this only outputs "noob"
Console.WriteLine(lol);

You are cutting the entire string up in pieces. You use the / to do that splitting and it breaks the string off at every occurrence of a /.
The better solution is to find the position of the slash instead of splitting on it.
Something like this:
string s1 = "shapered.png|data:image/png;base64,iVBORw0KGgoAAAANSUh /// EUgAAACAAAAAgCAYAblabla";
string name = s1.Substring(0, s1.IndexOf('|'));
string data = s1.Substring(s1.IndexOf('|') + 1);
string mimeType = data.Substring(data.IndexOf(':') + 1, data.IndexOf(';') - data.IndexOf(':') - 1);
string base64 = data.Substring(data.IndexOf(',') + 1);
Or, if you are comfortable using a regular expression:
(.*?)\|data:(.*?);base64,(.*)
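As a minimal sketch of how that pattern could be applied (the class name DataUrlDemo and the method Parse are made up for illustration), the three capture groups give the file name, the MIME type and the Base64 payload:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DataUrlDemo
{
    // Returns { name, mimeType, base64 }, or null if the input does not match.
    public static string[] Parse(string input)
    {
        Match m = Regex.Match(input, @"(.*?)\|data:(.*?);base64,(.*)");
        if (!m.Success)
            return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        string s1 = "shapered.png|data:image/png;base64,iVBORw0KGgoAAAANSUh /// EUgAAACAAAAAgCAYAblabla";
        string[] parts = Parse(s1);

        Console.WriteLine(parts[0]); // shapered.png
        Console.WriteLine(parts[1]); // image/png
        Console.WriteLine(parts[2]); // iVBORw0KGgoAAAANSUh /// EUgAAACAAAAAgCAYAblabla
    }
}
```

Because the last group is greedy, any '/' or ',' characters inside the payload stay part of it.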

Parsing a simple Json string

I have successfully parsed a JSON string before, but that was an array of objects, much longer than this simple string, which makes me think that I'm doing something wrong with the formatting.
Here is, word for word, what our web service outputs as the JSON string:
{"news":"What is Legal/Awesome Dre"}
The first part is simply what I named the string in the application (news), and the second part is the string that changes as the song does, which is why I would like to pull it in as a simple string.
When I run the app, I get a parse error at these lines:
Console.Out.WriteLine (content);
news = JsonConvert.DeserializeObject(content);
The application output shows the JSON string as it is on the website, but right after that I get an error telling me Invalid Token: startPath... which last time meant that my JSON string was formatted wrongly for how I need to grab the data.
Can anyone help me with this?
(P.S. I am working in Xamarin Studio (Mono for Android) using C#, if that makes any difference.)

The problem is that your serialized JSON object isn't a string, it's an object with the string value you want at the "news" property/key/name. This is a simple way to get the string:
dynamic jsonObj = JsonConvert.DeserializeObject(content);
string news = jsonObj.news;
Or you can use an anonymous type:
var jsonObj = JsonConvert.DeserializeAnonymousType(content, new { news = "" });
string news = jsonObj.news;
Or create a type with a string News property:
MyNewsType jsonObj = JsonConvert.DeserializeObject<MyNewsType>(content);
string news = jsonObj.News;
These all work in the following way:
var content = @"{""news"":""What is Legal/Awesome Dre""}";
// above code
Console.WriteLine(news); // prints "What is Legal/Awesome Dre"
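The typed variant above refers to MyNewsType without showing it. A minimal sketch of what such a type could look like follows; the class shape and the names NewsDemo and ReadNews are assumptions, and the code needs the Json.NET (Newtonsoft.Json) package that the question already uses.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical type for illustration: one string property named News.
// Json.NET falls back to case-insensitive matching, so the JSON key "news" binds to it.
public class MyNewsType
{
    public string News { get; set; }
}

public static class NewsDemo
{
    // Deserializes the JSON object and returns the value of its "news" key.
    public static string ReadNews(string content)
    {
        MyNewsType obj = JsonConvert.DeserializeObject<MyNewsType>(content);
        return obj.News;
    }

    public static void Main()
    {
        string content = @"{""news"":""What is Legal/Awesome Dre""}";
        Console.WriteLine(ReadNews(content)); // What is Legal/Awesome Dre
    }
}
```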

Try putting square brackets around your JSON:
[{"news":"What is Legal/Awesome Dre"}]

Is this 64-bit Encoded?

All of the passwords in our user DB look like this, with == at the end:
91F2FSEYrFOcabeHK/UfNw==
So how can I tell if this is Base64 encoded? It has to be, because I can decode it using a Base64 decode routine I have.
I am now trying to figure out how to encode a literal string to Base64, back to the xxxxxxxx== form. Here is my code:
string passwordToEncrypt = "test";
byte[] passwordToBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(passwordToEncrypt);
result = Convert.ToBase64String(passwordToBytes);
Updated:
I need the text test to come out in Base64 with the == at the end.

You have a typo in there, so the above code does not compile. Try:
string passwordToEncrypte = "test";
byte[] passwordToBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(passwordToEncrypte);
string result = Convert.ToBase64String(passwordToBytes);
result now contains a Base64-encoded password and ends with "==".
BUT the above code only works for passwords containing ASCII characters. If you want it to work with UTF-8 passwords, change it to:
string passwordToEncrypte = "test";
byte[] passwordToBytes = Encoding.UTF8.GetBytes(passwordToEncrypte);
string result = Convert.ToBase64String(passwordToBytes);
To go back from Base64 to the original, you need to do:
string Original = Encoding.UTF8.GetString (Convert.FromBase64String(result));
See http://msdn.microsoft.com/en-us/library/86hf4sb8.aspx
and http://msdn.microsoft.com/en-us/library/system.convert.tobase64string.aspx
and http://msdn.microsoft.com/en-us/library/system.convert.frombase64string.aspx

A Base64-encoded string doesn't always end with =. It only ends with one or two = characters if they are required to pad the string out to the proper length. For more details, read up on Base64 padding.
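A small sketch makes the padding rule concrete (the names PaddingDemo and ToBase64 are made up for illustration). Base64 turns every 3 input bytes into 4 output characters, so padding only appears when the input length is not a multiple of 3:

```csharp
using System;
using System.Text;

public static class PaddingDemo
{
    // Encodes the UTF-8 bytes of the given text as Base64.
    public static string ToBase64(string text)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(ToBase64("test"));   // dGVzdA==  (4 bytes: two '=' of padding)
        Console.WriteLine(ToBase64("tests"));  // dGVzdHM=  (5 bytes: one '=' of padding)
        Console.WriteLine(ToBase64("tester")); // dGVzdGVy  (6 bytes: no padding)
    }
}
```

So a trailing == is a hint that a value may be Base64, but its absence does not rule Base64 out.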
