Urlencode large amount of text in .net 4 client C# - c#

What's the best way to URL-encode (escape) a large string (50k - 200k characters) in the .NET 4 Client Profile?
System.Uri.EscapeDataString() is limited to 32766 characters.
HttpUtility.UrlEncode is not available in the .NET 4 Client Profile.
The encoded string is to be passed as the value of a parameter in the body of an HTTP POST request.
(Also, is there a .net-4-client profile tag on SO?)

Because URL encoding works character by character, you can split a string, encode the parts separately, and concatenate the results to get the encoded version of the original string (as long as you never split in the middle of a surrogate pair).
So simply loop through the string, URL-encode 30,000 characters at a time, and join the parts together to get your encoded string.
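The chunked approach described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the class name LargeUrlEncoder, the method name Escape, and the 30,000-character chunk size are assumptions chosen for the example. The only non-obvious step is shortening a chunk by one character when it would otherwise end on a high surrogate.

```csharp
using System;
using System.Text;

static class LargeUrlEncoder
{
    // Assumed chunk size, kept safely below the EscapeDataString limit.
    const int ChunkSize = 30000;

    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length + value.Length / 2);
        int pos = 0;
        while (pos < value.Length)
        {
            int len = Math.Min(ChunkSize, value.Length - pos);

            // Never cut a surrogate pair in half: if the chunk would end
            // on a high surrogate, leave that character for the next chunk.
            if (pos + len < value.Length && char.IsHighSurrogate(value[pos + len - 1]))
            {
                len--;
            }

            sb.Append(Uri.EscapeDataString(value.Substring(pos, len)));
            pos += len;
        }
        return sb.ToString();
    }
}
```

Note that Uri.EscapeDataString encodes a space as %20 rather than +; form decoders generally accept both.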
I will echo the sentiments of others that you might be better off with a content-type of multipart/form-data. http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4 explains the differences in case you are unaware. Which of these two you choose should make little difference to the destination since both should be fully understood by the target.

I would suggest looking into using a MIME format for posting your data. There is no need to URL-encode it (other than maybe a base64 encoding), and it would keep you under the limitation.

You could manually encode it all using StringBuilder, though it will increase your transfer amount threefold:
string EncodePostData(byte[] data)
{
    var sbData = new StringBuilder();
    foreach (byte b in data)
    {
        // Every byte becomes three characters: '%' plus two hex digits.
        sbData.AppendFormat("%{0:x2}", b);
    }
    return sbData.ToString();
}
The standard method, however, is just to supply a MIME type and Content-Length header, then send the data raw.
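As a rough sketch of sending the data raw, something like the following could work with HttpWebRequest, which is available in the Client Profile. The class and method names, the text/plain content type, and any URL passed in are assumptions for illustration; the server would have to be written to read the request body directly.

```csharp
using System;
using System.Net;
using System.Text;

static class RawPoster
{
    // Builds a POST request that declares the body type and length.
    // Nothing is sent over the network until Send is called.
    public static HttpWebRequest BuildRequest(string url, byte[] body)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "text/plain; charset=utf-8"; // assumed MIME type
        request.ContentLength = body.Length;
        return request;
    }

    // Writes the raw bytes as the request body and returns the status code.
    public static HttpStatusCode Send(HttpWebRequest request, byte[] body)
    {
        using (var stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return response.StatusCode;
        }
    }
}
```

A caller would convert the text with Encoding.UTF8.GetBytes, pass the bytes to BuildRequest, and then call Send with the same bytes.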

Related

Why is UTF-7 interpreting umlauts correct and UTF-8 not?

I have a Base64 string which I want to convert and decode to UTF-8 like this:
byte[] encodedDataAsBytes = System.Convert.FromBase64String(vcard);
return Encoding.UTF8.GetString(encodedDataAsBytes);
This is because umlauts in the string need to be displayed correctly. The problem is that when I use UTF-8 as the encoding, the umlauts are NOT handled correctly. But when I use UTF-7
return Encoding.UTF7.GetString(encodedDataAsBytes);
everything works fine.
Why is that? Shouldn't UTF-8 be able to handle umlauts?
Your vcard is UTF-7 encoded.
This is why Encoding.UTF7.GetString(encodedDataAsBytes); gives you the right result.
Once the text has been encoded to bytes, you can't pick a different encoding when decoding it.
To use UTF-8 you would need access to the string before the variable vcard got its value, so that it could be encoded as UTF-8 in the first place.
I had a similar problem. In my case, I used JavaScript's btoa() to encode a filename to Base64 in the web UI and sent it to the server. On the server side (.NET Core), I used the code below to decode it back to a filename string.
// Note: encodedFilename is the result of btoa() from the client web UI.
var raw = Convert.FromBase64String(encodedFilename);
var filename = Encoding.UTF8.GetString(raw);
It failed to decode ä. It worked when I used Encoding.UTF7 instead, but I don't think that is the right solution. I believe this is due to a mismatch between how the string is encoded and how it is decoded: btoa() is "binary to ASCII" and treats each character as a single byte rather than producing UTF-8. What I really need is b64EncodeUnicode().
function b64EncodeUnicode(str) {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
        return String.fromCharCode('0x' + p1);
    }));
}
Code Reference: https://developer.mozilla.org/en-US/docs/Glossary/Base64
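The mismatch can be reproduced on the server side with a short sketch. It assumes the client called btoa("ä"), which yields "5A==" (the single byte 0xE4), while b64EncodeUnicode("ä") yields "w6Q=" (the UTF-8 bytes 0xC3 0xA4). The class and method names are made up for the example.

```csharp
using System;
using System.Text;

static class BtoaDecoding
{
    // btoa() emits one byte per character (code points 0-255),
    // which corresponds to ISO-8859-1, not UTF-8.
    public static string DecodeLatin1(string base64)
    {
        byte[] raw = Convert.FromBase64String(base64);
        return Encoding.GetEncoding("ISO-8859-1").GetString(raw);
    }

    // Correct only when the client base64-encoded UTF-8 bytes,
    // for example via b64EncodeUnicode().
    public static string DecodeUtf8(string base64)
    {
        byte[] raw = Convert.FromBase64String(base64);
        return Encoding.UTF8.GetString(raw);
    }
}
```

Decoding "5A==" with DecodeUtf8 gives the replacement character U+FFFD because 0xE4 on its own is not valid UTF-8, whereas DecodeLatin1 gives ä. The cleaner fix is still to have the client send UTF-8 and decode with Encoding.UTF8.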

Encoding "ä" into "%E4"

I'm trying to work out the best way to encode text from C# so that it fulfills a requirement of a new SMS provider.
The text I want to send is:
Bäste Björn
The encoded text that the provider say it needs is:
B%E4ste+Bj%F6rn
so ä is %E4 and ö is %F6
From this answer, I gathered that for such a conversion I need to use HttpUtility.HtmlAttributeEncode, as the normal HttpUtility.UrlEncode outputs:
B%c3%a4ste+Bj%c3%b6rn
which shows up as weird characters on the mobile phone :/
As several characters are not converted, I tried this:
private string specialEncoding(string text)
{
    StringBuilder r = new StringBuilder();
    foreach (char c in text.ToCharArray())
    {
        string e = System.Web.HttpUtility.UrlEncode(c.ToString());
        if (e.StartsWith("%") && e.ToLower() != "%0a") // %0a == Linefeed
        {
            string attr = System.Web.HttpUtility.HtmlAttributeEncode(c.ToString());
            r.Append(attr);
        }
        else
        {
            r.Append(e);
        }
    }
    return r.ToString();
}
It is verbose so that I could set a breakpoint and test each character, and I found out that:
System.Web.HttpUtility.HtmlAttributeEncode("ä") is actually equal to ä... so there is no %E4 in the output...
What am I missing? And is there a simple way to do the encoding without manipulating the text character by character and still get the required output?
"that the provider say it needs"
Ask the provider in which age they are living. According to Wikipedia: Percent-encoding:
The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. This requirement was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected.
Granted, this RFC talks about "new URI schemes", which HTTP obviously is not, but adhering to this standard prevents headaches like this. See also What is the proper way to URL encode Unicode characters?.
They seem to want you to encode characters according to the Windows-1250 code page (or a comparable one, like ISO-8859-1 or -2, check alternatives here) instead, as in that code page the byte E4 (228) maps to ä and F6 (246) maps to ö. As @Simon points out in his comment, you should ask the provider exactly which code page they want you to use.
Assuming Windows-1250, you can implement it like this, according to URL encode ASCII/UTF16 characters:
var windows1250 = Encoding.GetEncoding(1250);
var percentEncoded = HttpUtility.UrlEncode("Bäste Björn", windows1250);
The value of percentEncoded is:
B%e4ste+Bj%f6rn
If they insist on using uppercase, see .net UrlEncode - lowercase problem.
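If uppercase hex digits are required, one option is to percent-encode the bytes by hand. The following is only a sketch under stated assumptions: the class and method names are invented, the encoding passed in must be a single-byte, ASCII-compatible code page, and everything outside the RFC 3986 unreserved set is escaped (which is stricter than HttpUtility.UrlEncode).

```csharp
using System;
using System.Text;

static class LegacyUrlEncoder
{
    // Percent-encodes text using the given single-byte code page,
    // with uppercase hex digits and '+' for spaces.
    public static string Encode(string text, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (byte b in encoding.GetBytes(text))
        {
            char c = (char)b;
            bool unreserved =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                (c >= '0' && c <= '9') ||
                c == '-' || c == '_' || c == '.' || c == '~';

            if (unreserved)
            {
                sb.Append(c);
            }
            else if (c == ' ')
            {
                sb.Append('+');
            }
            else
            {
                sb.AppendFormat("%{0:X2}", b);
            }
        }
        return sb.ToString();
    }
}
```

Called with Encoding.GetEncoding("ISO-8859-1"), the sample text Bäste Björn comes out as B%E4ste+Bj%F6rn. On .NET Core, code page 1250 is only available after registering the code pages encoding provider, which is why the usage here assumes ISO-8859-1.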

How to decode encoded URL in google chrome?

Google chrome automatically converts unicode strings in URL to something like this;
?querystring=مقالات
?querystring=%D9%85%D9%82%D8%A7%D9%84%D8%A7%D8%AA
My question is how to decode the encoded text in code, for example in order to compare it:
if (Request.Url.Query == "?querystring=مقالات")
//do something
Check these out :)
Console.WriteLine(System.Web.HttpUtility.UrlDecode("http://www.google.com/search?q=%D9%85%D9%82%D8%A7%D9%84%D8%A7%D8%AA"));
Console.WriteLine(System.Web.HttpUtility.UrlEncode("مقالات"));
URL encoding ensures that all browsers will correctly transmit text in URL strings.
Characters such as a question mark (?), ampersand (&), slash mark (/), and spaces might be truncated or corrupted by some browsers. As a result, these characters must be encoded in tags or in query strings where the strings can be re-sent by a browser in a request string.
Server.UrlDecode is a convenient way to access the HttpUtility.UrlDecode method at run time from an ASP.NET application. Internally, it uses HttpUtility.UrlDecode to decode strings.
The following example decodes the string named EncodedString (received in a URL) into the string named DecodedString :
String DecodedString = Server.UrlDecode(EncodedString);
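For the comparison in the question, the raw query can be decoded before it is compared. The sketch below uses Uri.UnescapeDataString so that it does not depend on System.Web; the helper name QueryEquals is hypothetical, and treating + as a space is an assumption about how the query was encoded.

```csharp
using System;

static class QueryComparer
{
    // Decodes a percent-encoded query string and compares it,
    // character for character, with the expected plain text.
    public static bool QueryEquals(string rawQuery, string expected)
    {
        // Form-style encoding uses '+' for spaces; a literal plus sign
        // arrives as %2B and is only decoded afterwards.
        string decoded = Uri.UnescapeDataString(rawQuery.Replace("+", " "));
        return string.Equals(decoded, expected, StringComparison.Ordinal);
    }
}
```

In the question's code this would replace the direct comparison, for example QueryEquals(Request.Url.Query, "?querystring=مقالات").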

Can I add a string converted to Base64 as part of a URL?

I have a string converted with encryption. I would like to make this part of a URL. Is that possible if it has been converted to base64 or do I need to do something more?
var going_to_be_part_of_url = System.Convert.ToBase64String(bytOut, 0, i);
Thanks
Yes, but it's not a good idea. Base64 requires that you respect the difference between upper case and lower case, and URLs aren't typically treated as case strict.
Then there's the problem of the special characters in Base64 (+, / and =) being converted to their URL-encoded equivalents, making your URLs ugly and less manageable.
You should go with Base36 instead.
You can use a modified Base64 for URL applications, which is just Base64 with a couple of the problem characters replaced (typically + becomes -, / becomes _, and the = padding is dropped).
The easiest way is to take your Base64 string and perform a string replace on the problem characters when building the URL, then reverse the process when interpreting the URL.
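The replace-and-reverse approach can be sketched like this. It follows the URL-safe alphabet from RFC 4648 (- for +, _ for /, padding removed); the class and method names are illustrative only.

```csharp
using System;

static class Base64Url
{
    // Standard Base64 with the URL-hostile characters swapped out
    // and the '=' padding removed.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Reverses Encode: restores the original alphabet and padding.
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        return Convert.FromBase64String(s);
    }
}
```

For example, the two bytes 0xFB 0xFF are "+/8=" in standard Base64 and "-_8" in this form.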

How do I encode a Binary blob as Unicode blob?

I'm trying to store a Gzip serialized object in Active Directory's "Extension Attribute", more info here. This field is a Unicode string according to its oMSyntax of 64.
What is the most efficient way to store a binary blob as Unicode? Once I get this down, the rest is a piece of cake.
There are, of course, many ways of reliably packing an arbitrary byte array into Unicode characters, but none of them are very efficient. It is very unfortunate that Active Directory would choose to use Unicode for data that is not textual in nature. It’s like using a string to represent a 32-bit integer, or like using Nutella to write a love letter.
My recommendation would be to “play it safe” and use an ASCII-based encoding such as base64. The reason I recommend this is because there is already a built-in .NET implementation for this:
var base64Encoded = Convert.ToBase64String(byteArray);
var original = Convert.FromBase64String(base64Encoded);
In theory you could come up with an encoding that is more efficient than this by making use of more of the Unicode character set. However, in order to do so reliably, you would need to know quite a bit about Unicode.
Normally, this would be the way to convert between bytes and Unicode text:
// string from bytes
System.Text.Encoding.Unicode.GetString(bytes);
// bytes from string
System.Text.Encoding.Unicode.GetBytes(text);
EDIT:
But since not every possible byte sequence is a valid Unicode string, you should use a method that can create a string from an arbitrary byte sequence:
// string from bytes
Convert.ToBase64String(byteArray);
// bytes from string
Convert.FromBase64String(base64Encoded);
(Thanks to @Timwi who pointed this out!)
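The difference between the two approaches can be checked with a small sketch. The class and method names are invented for the example; the sample bytes in the note below form an unpaired UTF-16 surrogate, which is one kind of byte sequence that is not a valid Unicode string.

```csharp
using System;
using System.Linq;
using System.Text;

static class BlobRoundTrip
{
    // Interprets the bytes as UTF-16 text and converts back.
    // Invalid sequences are replaced during decoding, so data can be lost.
    public static bool SurvivesUtf16(byte[] data)
    {
        string text = Encoding.Unicode.GetString(data);
        return Encoding.Unicode.GetBytes(text).SequenceEqual(data);
    }

    // Base64 maps any byte sequence to ASCII text and back without loss.
    public static bool SurvivesBase64(byte[] data)
    {
        string text = Convert.ToBase64String(data);
        return Convert.FromBase64String(text).SequenceEqual(data);
    }
}
```

With the bytes 0x00 0xD8 0x41 0x00 (a lone high surrogate followed by "A"), the UTF-16 round trip does not return the original bytes, while the Base64 round trip does.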
