I have to return an XML string in the response of a WCF method. I've found that the serializer sometimes escapes the string (e.g. &lt;test&gt;) and sometimes wraps the whole string in a CDATA section (e.g. <![CDATA[<test>]]>).
In particular, I've noticed that it uses the first method for strings with only a few characters that need escaping, and the second otherwise.
I need every string in the first format, never wrapped in CDATA. Is there a way to accomplish that?
I've tried escaping the string myself before returning the response, but then it also encodes the "&" character, so nothing changes.
I'm using Convert.FromBase64String() to decode a Base64-encoded string. The string is actually an XML file, which itself contains Base64-encoded images. E.g.
data:image/png;base64,iVBORw0KGgoAA...
I get the following exception:
System.FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
Where is the problem? The double Base64 encoding? The string image/png;base64 inside the Base64-encoded data? An online tool decodes it without any issues.
Edit:
Now I've tried removing the image/png;base64 part from the XML file and I still get this error. Then I tried to decode the string YWJj and got the same error!? If I use this code
byte[] dataBuffer = Convert.FromBase64String(base64string);
I get the above exception. If I use instead
byte[] dataBuffer = Convert.FromBase64String("YWJj");
it does work. Encoding of the file is UTF-8 according to Notepad++. Any ideas?
Edit 2:
String.Equals says that the two strings YWJj are not equal, even though the Locals window shows that they are:
BTW, the above code doesn't throw the exception, because there I use string test = "YWJj";. Why does it work with locally defined variables, but not with the passed string? I don't think it's a threading problem, because the above function is only called once.
You should remove the data:image/png;base64, part from the string before decoding.
string data = "data:image/png;base64,iVBORw0KGgoAA...";
string[] pd = data.Split(',');
byte[] decoded = Convert.FromBase64String(pd[1]);
The data:image/png;base64, part of the string isn't Base64 data; the real encoded data starts after the , (see the Base64 description). Convert.FromBase64String accepts only the encoded data itself, so you need to extract it first.
As I've already written, I'm reading in the Base64-encoded file and decoding it with Convert.FromBase64String(). Now I've got it working, and the reason is completely unclear to me. What did I do?
I renamed the file. That's it.
Before, I had a filename like NAME_Something_v1.0.xsl.b64; now I use NAME_Something.b64. Perhaps that's not the only reason, but I'm accessing the file from an assembly with assembly.GetManifestResourceStream(). I had cleaned the solution before and always had the same problem. Now I've changed the name back to what it was and it also works ...
1. You shouldn't include the data:image/png;base64, part, as it isn't actually part of the Base64 string.
2. iVBORw0KGgoAA... isn't valid either; it's not the full Base64 string.
You can solve this by either splitting the string or using regular expressions to parse it.
Everything after data:image/png;base64, is the actual Base64 string to be decoded.
You can remove the first part of the string like so:
ImageAsString = ImageAsString.Substring(ImageAsString.IndexOf(',') + 1);
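If you'd rather not rely on the exact prefix, a regular expression works too, as mentioned above. A minimal sketch, assuming the usual data:<mediatype>;base64,<payload> shape of a data URI:

using System;
using System.Text.RegularExpressions;

static byte[] DecodeDataUri(string dataUri)
{
    // Strip a "data:<mediatype>;base64," prefix if present and keep only the payload.
    Match m = Regex.Match(dataUri, @"^data:[^,]*;base64,(?<payload>.+)$", RegexOptions.Singleline);
    string payload = m.Success ? m.Groups["payload"].Value : dataUri;

    // Convert.FromBase64String accepts only the encoded payload itself.
    return Convert.FromBase64String(payload);
}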
I have a console application, and I have to remove all the unwanted escape characters from a URL query string. Here is my query string:
string query="http://10.1.1.186:8085/PublicEye_common/Jurisdiction/Login.aspx?ReturnUrl=%2fPublicEye_common%2fJurisdiction%2fprint.html%3f%257B%2522__type%2522%253A%2522xPad.Reports.ReportDetail%2522%252C%2522ReportTitle%2522%253A%25221%2522%252C%2522ReportFooter%2522%253A%25221%2522%252C%2522ReportHeader%2522%253A%25221%2522%252C%2522CommonFields%2522%253A%255B%255D%252C%2522Sections%2522%253A%255B%257B%2522SectionTitle%2522%253A%2522Sections%2522%252C%2522ShowTitleSection%2522%253Atrue%252C%2522SubSections%2522%253A%255B%257B%2522SubSectionTitle%2522%253A%2522Sub%2520Section%2522%252C%2522ShowTitleSubSection%2522%253Atrue%252C%2522FormGroups%2522%253A%255B%257B%2522FormGroupTitle%2522%253A%2522Form%2520Groups%2522%252C%2522ShowTitleFormGroup%2522%253Atrue%252C%2522FormFields%2522%253A%255B%257B%2522FormFieldTitle%2522%253A%2522Form%2520Fields%2522%252C%2522FormFieldValue%2522%253A%252212%2522%257D%255D%257D%255D%257D%255D%257D%255D%257D&%7B%22__type%22%3A%22xPad.Reports.ReportDetail%22%2C%22ReportTitle%22%3A%221%22%2C%22ReportFooter%22%3A%221%22%2C%22ReportHeader%22%3A%221%22%2C%22CommonFields%22%3A%5B%5D%2C%22Sections%22%3A%5B%7B%22SectionTitle%22%3A%22Sections%22%2C%22ShowTitleSection%22%3Atrue%2C%22SubSections%22%3A%5B%7B%22SubSectionTitle%22%3A%22Sub%20Section%22%2C%22ShowTitleSubSection%22%3Atrue%2C%22FormGroups%22%3A%5B%7B%22FormGroupTitle%22%3A%22Form%20Groups%22%2C%22ShowTitleFormGroup%22%3Atrue%2C%22FormFields%22%3A%5B%7B%22FormFieldTitle%22%3A%22Form%20Fields%22%2C%22FormFieldValue%22%3A%2212%22%7D%5D%7D%5D%7D%5D%7D%5D%7D";
I tried the following:
string decode = System.Net.WebUtility.HtmlDecode(query);
string decode2 = System.Net.WebUtility.UrlDecode(query);
string decode3 = System.Web.HttpServerUtility.UrlTokenDecode(query).ToString();
None of these gives me the result I want.
This worked for me:
string decode = HttpUtility.UrlDecode(Uri.UnescapeDataString(query));
HttpUtility is in the System.Web namespace; you might need to add a reference to that assembly.
Output:
http://10.1.1.186:8085/PublicEye_common/Jurisdiction/Login.aspx?ReturnUrl=/PublicEye_common/Jurisdiction/print.html?{"__type":"xPad.Reports.ReportDetail","ReportTitle":"1","ReportFooter":"1","ReportHeader":"1","CommonFields":[],"Sections":[{"SectionTitle":"Sections","ShowTitleSection":true,"SubSections":[{"SubSectionTitle":"Sub Section","ShowTitleSubSection":true,"FormGroups":[{"FormGroupTitle":"Form Groups","ShowTitleFormGroup":true,"FormFields":[{"FormFieldTitle":"Form Fields","FormFieldValue":"12"}]}]}]}]}&{"__type":"xPad.Reports.ReportDetail","ReportTitle":"1","ReportFooter":"1","ReportHeader":"1","CommonFields":[],"Sections":[{"SectionTitle":"Sections","ShowTitleSection":true,"SubSections":[{"SubSectionTitle":"Sub Section","ShowTitleSubSection":true,"FormGroups":[{"FormGroupTitle":"Form Groups","ShowTitleFormGroup":true,"FormFields":[{"FormFieldTitle":"Form Fields","FormFieldValue":"12"}]}]}]}]}
Panagiotis Kanavos has made a good point in the comments:
The querystring part contains an encoded URL parameter (ReturnUrl),
which means it had to be encoded as a data string in the first place.
That's a stricter encoding than UrlEncoding.
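To see why the decode has to be applied twice, here is a small illustration (the values are made up, not taken from the original query string): embedding an already-escaped URL as a parameter value escapes the percent signs a second time, so %22 becomes %2522 and so on.

string inner = "print.html?{\"ReportTitle\":\"1\"}";
string innerEscaped = Uri.EscapeDataString(inner);             // contains %22, %7B, ...
string outer = Uri.EscapeDataString("/app/" + innerEscaped);   // contains %2522, %257B, ...
// Undoing it therefore also takes two passes, which is what the answer above does.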
In summary: I retrieve an HTTP web response containing JSON-formatted data with Unicode escapes such as "\u00c3\u00b1", which should translate to "ñ". Instead these characters are converted to "ñ" by the JSON parser I am using. The behavior I'm looking for is for those characters to be translated to "ñ".
Taking the following code and executing it...
string nWithAccent = "\u00c3\u00b1";
Encoding iso = Encoding.GetEncoding("iso8859-1");
byte[] isoBytes = iso.GetBytes(nWithAccent);
nWithAccent = Encoding.UTF8.GetString(isoBytes);
nWithAccent outputs "ñ". This is the result I am looking for. I took the above code and used it on the "response_body" variable below, which contained the same characters as above (from what I could see using the Visual Studio 2008 Text Analyzer), and did not get the same result... it does nothing with the characters "\u00c3\u00b1".
In my application I execute the following code against an external system, retrieving data in JSON format. Upon examining the "response_body" variable with the Text Analyzer in Visual Studio, I see "\u00c3\u00b1" instead of ñ. E.g. the word "niño" appears in the Text Analyzer as "ni\u00c3\u00b1o".
using (HttpWResponse = (HttpWebResponse)this.HttpWRequest.GetResponse())
{
    using (StreamReader reader = new StreamReader(HttpWResponse.GetResponseStream(), Encoding.UTF8))
    {
        // token will expire 60 min from now.
        this.TimeTillTokenExpiration = DateTime.Now.AddMinutes(60);
        // read response data
        response_body = reader.ReadToEnd();
    }
}
I then use an open-source JSON parser, which replaces "\u00c3" with "Ã" and "\u00b1" with "±", with an end result of "ñ" instead of "ñ". Is something wrong with the JSON parser, or am I applying the wrong encoding to the response stream? The headers in the response indicate the charset is UTF-8. Thanks for any replies!
The JSON response you are receiving is invalid. "\u00c3\u00b1" isn't the correct encoding for ñ.
Instead it's a kind of double encoding: the character was first encoded as a UTF-8 byte sequence, and then the non-ASCII bytes were escaped with the \u sequence.
Since a JSON response is usually UTF-8 anyway, there's no need to escape the two-byte sequence for ñ at all. If escaping is used, it must be applied to the single Unicode character itself rather than to the two-byte sequence; the result would then be "\u00f1".
You can test it with an online JSON validator (such as JSONLint or JSON Format) by pasting the following JSON data:
{
    "unescaped": "ñ",
    "escaped": "\u00f1",
    "wrong": "\u00c3\u00b1"
}
Replace
new StreamReader(HttpWResponse.GetResponseStream(), Encoding.UTF8))
with
new StreamReader(HttpWResponse.GetResponseStream(), Encoding.GetEncoding("iso8859-1")))
What happens if you pass this string to the JSON parser?
string s = "\\u00c3\\u00b1";
I suspect you'll get "ñ".
Is there a way you can tell your JSON parser to interpret characters in the string as though they're UTF-8 bytes?
You're probably better off reading raw bytes from the response stream and passing that to the JSON parser.
I think the problem is that you're converting the raw bytes to a string, which contains the encoded characters. The JSON parser doesn't know if you want that "\u00c3\u00b1" converted to a single UTF-8 character, or two characters.
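A rough sketch of the raw-bytes approach, reusing the request object from the question; whether your JSON parser can take a byte[] or a Stream instead of a string depends on the library, so treat the last line as an assumption:

byte[] raw;
using (HttpWebResponse response = (HttpWebResponse)this.HttpWRequest.GetResponse())
using (Stream stream = response.GetResponseStream())
using (MemoryStream ms = new MemoryStream())
{
    // copy the response into memory without ever going through a string
    byte[] buffer = new byte[8192];
    int read;
    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        ms.Write(buffer, 0, read);
    raw = ms.ToArray();
}
// then hand "raw" (or new MemoryStream(raw)) to the parser, if its API allows it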
What's the best way to URL-encode (escape) a large string (50k-200k characters) in the .NET 4 client profile?
System.Uri.EscapeDataString() is limited to 32766 characters.
HttpUtility.UrlEncode is not available in the .NET 4 client profile.
The encoded string is to be passed as the value of a parameter in an HTTP request sent as a POST.
(Also, is there a .net-4-client profile tag on SO?)
Because a URL-encoded string is encoded character by character, if you split a string and encode the two parts separately, you can concatenate the results and get the encoded version of the original string.
So simply loop through the string, URL-encoding 30,000 characters at a time, and then join all those parts together to get your encoded string.
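A minimal sketch of that loop, assuming Uri.EscapeDataString (which is available in the .NET 4 client profile) as the per-chunk encoder:

static string EscapeLongDataString(string value)
{
    const int chunkSize = 30000; // stays under the ~32,766-character limit
    var sb = new StringBuilder(value.Length);
    for (int i = 0; i < value.Length; i += chunkSize)
    {
        int length = Math.Min(chunkSize, value.Length - i);
        sb.Append(Uri.EscapeDataString(value.Substring(i, length)));
    }
    return sb.ToString();
}

One caveat: if the string can contain surrogate pairs, make sure a chunk boundary doesn't fall between the two halves of a pair, or the two chunks will encode incorrectly.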
I will echo the sentiments of others that you might be better off with a content-type of multipart/form-data. http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4 explains the differences in case you are unaware. Which of these two you choose should make little difference to the destination since both should be fully understood by the target.
I would suggest looking into using a MIME format for posting your data. There would be no need to encode (other than maybe a Base64 encoding), and it would keep you under the limit.
You could manually encode it all using a StringBuilder, though it will triple the amount of data you transfer:
string EncodePostData(byte[] data)
{
    var sbData = new StringBuilder();
    foreach (byte b in data)
    {
        sbData.AppendFormat("%{0:x2}", b);
    }
    return sbData.ToString();
}
The standard method, however, is just to supply a MIME type and Content-Length header, then send the data raw.
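For example, with HttpWebRequest (the URL and content type here are placeholders, not details from the question):

static void PostRaw(byte[] data, string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "POST";
    request.ContentType = "application/octet-stream"; // pick the MIME type that matches the payload
    request.ContentLength = data.Length;
    using (Stream body = request.GetRequestStream())
    {
        // send the bytes as-is, no URL encoding needed
        body.Write(data, 0, data.Length);
    }
}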
I am using DataSet.ReadXml() to read an XML string. I get an error because the XML string contains the invalid character 0x1F, which is 'US', the unit separator. It occurs inside fully formed tags.
The data is extracted from an Oracle DB using a Perl script. What would be the best way to escape this character so that the XML is read correctly?
EDIT: XML String:
<RESULT>
<DEPARTMENT>Oncology</DEPARTMENT>
<DESCRIPTION>Oncology</DESCRIPTION>
<STUDY_NAME>**7360C hsd**</STUDY_NAME>
<STUDY_ID>27</STUDY_ID>
</RESULT>
Between the C and the h in the bold part is where the US separator sits; when pasted here it actually shows as a space. So I want to know: how can I ignore that character in an XML string?
If you look at section 2.2 of the XML recommendation, you'll see that 0x1F is not in the range of characters allowed in XML documents. So while the string you're looking at may look like an XML document to you, it isn't one.
You have two problems. The relatively small one is what to do about this document. I'd probably preprocess the string and discard any character that's not legal in well-formed XML, but then I don't know anything about the relatively large problem.
And the relatively large problem is: what is this data doing in there in the first place? What purpose (if any) do non-visible ASCII characters serve in the middle of a (presumably) human-readable data field? Why doesn't the Perl script that produces this string fail when it encounters an illegal character?
I'll bet you one American dollar that it's because the person who wrote that script is using string manipulation and not an XML library to emit the XML document. Which is why, as I've said time and again, you should never use string manipulation to produce XML. (There are certainly exceptions. If you're writing a throwaway application, for instance, or an XML parser. Or if your name's Tim Bray.)
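If you go the preprocessing route suggested above, here is a minimal sketch of stripping everything outside the XML 1.0 character ranges (on .NET 4 or later, XmlConvert.IsXmlChar performs the same per-character check):

static string StripInvalidXmlChars(string text)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        // the ranges from section 2.2 of the XML recommendation
        bool valid = c == 0x9 || c == 0xA || c == 0xD ||
                     (c >= 0x20 && c <= 0xD7FF) ||
                     (c >= 0xE000 && c <= 0xFFFD);
        if (valid)
            sb.Append(c);
    }
    return sb.ToString();
}

(This sketch drops supplementary-plane characters encoded as surrogate pairs; extend the check if you need to keep them.)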
Your XmlReader/TextReader must be created with the correct encoding. You can create it as below and pass it to your DataSet:
StreamReader reader = new StreamReader("myfile.xml", Encoding.ASCII); // or the correct encoding
myDataset.ReadXml(reader);