Should HtmlEncode be used when updating twitter status with Tweetsharp? - c#

I am using tweetsharp to send tweets.
var response = _twitter.AuthenticateWith(item.TwitterToken, item.TwitterSecret)
.Statuses().Update(HttpUtility.HtmlEncode(item.Tweet)).AsXml().Request().Response;
As you may have noticed, I am HtmlEncoding the message above, which can push the tweet over 140 characters. Is encoding the message this way necessary? Do TweetSharp or Twitter recommend sending messages without encoding them first?

TweetSharp will handle all of the encoding for you. Just pass it the string you want to post.
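For example, a minimal sketch reusing the fluent calls from the question (assuming _twitter and item are set up as shown above):
// Pass the raw, unencoded text; TweetSharp and Twitter take care of any escaping that is needed.
var response = _twitter.AuthenticateWith(item.TwitterToken, item.TwitterSecret)
    .Statuses().Update(item.Tweet).AsXml().Request().Response;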

From here:
The Twitter API supports UTF-8 encoding. Please note that angle brackets ("<" and ">") are entity-encoded to prevent Cross-Site Scripting attacks for web-embedded consumers of JSON API output. The resulting encoded entities do count towards the 140 character limit. When requesting XML, the response is UTF-8 encoded. Symbols and characters outside of the standard ASCII range may be translated to HTML entities.
This says to me that you should indeed make sure that your output is UTF-8 encoded (not necessarily HTML encoded). Have you tried UTF-8 encoding the text before submitting, and then looking at how "special" characters come out?
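As a quick check, a sketch along those lines (the tweet text is just an example containing non-ASCII characters):
string tweet = "Café costs £3";
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(tweet);
Console.WriteLine(tweet.Length);      // character count
Console.WriteLine(utf8Bytes.Length);  // UTF-8 byte count is larger: é and £ each take two bytes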

Related

Escape sequence in password - c#

I'm making a POST API call in C# using the HttpWebRequest class. In the URL I have the password as a query string parameter, but the password has # in it, so it gets truncated to vigne. The data after the # is treated as a fragment, which is not supposed to happen. Is there a fix for it?
Password example: vigne#ash#Test
URL = https://vigneashtesting.com/oauth/token?login_type=password&userid=vigneash&password=vigne#ash#Test;
You should never include passwords (or any other confidential information) in query strings because they are displayed in the browser.
If you want to include special characters in a query string then you need to encode them. You can find the encodings here: https://www.w3schools.com/tags/ref_urlencode.asp.
You can also use Uri.EscapeDataString or System.Web.HttpUtility.UrlEncode to encode special characters. See the following answer for the differences between the two: https://stackoverflow.com/a/47877559/19214431.
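For example, a sketch using the password from the question (the host name is just the questioner's example):
string password = "vigne#ash#Test";
string encoded = Uri.EscapeDataString(password); // "vigne%23ash%23Test": the # becomes %23
string url = "https://vigneashtesting.com/oauth/token?login_type=password&userid=vigneash&password=" + encoded;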

Proper way to handle the ampersand character in a JSON string sent to a REST web service

OK,
I am using System.Runtime.Serialization and the DataContractJsonSerializer.
The problem is that in the request I send a property value containing the & character, say AT&T, and I get a response with the error: Invalid JSON Data.
I thought the escaping would be done inside the library, but now I see that serialization leaves the ampersand (&) character untouched.
Yes, for the JSON format this is valid.
But it is a problem for my POST request, since I am sending this to a server that responds with an error if the value contains an ampersand; hence, here I am.
HttpUtility.HtmlEncode is in the System.Web library, so the way to go here is Uri.EscapeUriString. I tried this, but either way, with or without it, all requests work fine except when an ampersand is in a value.
EDIT: The HttpUtility class is ported to the Windows Phone SDK, but the preferred way to encode a string should still be Uri.EscapeUriString.
My first thought was to get my hands dirty and start replacing the special characters that would cause a problem on the server, but I wonder, is there another solution that would be efficient and 'proper'?
I should tell that I use
// Convert the string into a byte array.
byte[] postBytes = Encoding.UTF8.GetBytes(data);
To convert the JSON to a byte[] and write to the Stream.
And,
request.ContentType = "application/x-www-form-urlencoded";
As the WebRequest.ContentType.
So, am I messing something up, or is there something I am missing?
Thank you.
The problem was that I was encoding the whole request string, including the key.
I had a request of the form data={JSON} and I was encoding all of it, but only the {JSON} part should be encoded.
string requestData = "data=" + Uri.EscapeDataString(json); // worked perfectly!
Stupid hole to step into.
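Put together, a sketch of the corrected body (assuming json already holds the serialized DataContract output and request is the HttpWebRequest from the question):
string requestData = "data=" + Uri.EscapeDataString(json); // encode only the JSON value, not the key or the '='
byte[] postBytes = Encoding.UTF8.GetBytes(requestData);    // then written to the request stream as before
request.ContentType = "application/x-www-form-urlencoded";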
Have you tried replacing the ampersand with &amp; for the POST?

Which HttpUtility decode method to use?

This may be a silly question, but it trips me up every time.
HttpUtility has the methods HtmlDecode and UrlDecode. Do these two methods decode anything (Html/Http related) I might find? When do I have to use them, and which one am I supposed to use?
Just now I hit an error. This is my error log:
Payment receiver was not payment@mysite.com. (it was payment%40mysite.com).
But, I wrapped the email address here in HttpUtility.HtmlDecode before using it. It turns out I have to use .UrlDecode instead, but this email address didn't come from a URL so this wasn't obvious to me.
Can someone clarify this?
See What is meant by htmlencode and urlencode?
It's the reverse of your case, but essentially you need to use UrlEncode/Decode anytime you are using an address of sorts (urls and yes, email addresses). HtmlEncode/Decode is for code that typically a browser would render (html/xml tags).
This same encoding is used in form POST requests as well.
My guess is something read it 'naked' without decoding it.
Html Encoding/Decoding is only used to escape strings that contain characters that would otherwise be interpreted as html control characters. The process turns the characters into html entities and back again.
Url Encoding is to get around the fact that many characters are not allowed in Uris; or because they too could be misinterpreted. Thus the percent encoding is used.
Percent encoding is also used in the body of http requests.
In both cases, of course, it's also a way of expressing a specific character code in a request/response independent of character sets; but equally, interpreting what is meant by a particular code can also be dependent on knowing a particular character set. Generally you don't worry about that - but it can be important (especially in the HTML case).
URLEncode converts characters that aren't allowed in a URL into character equivalents which are parsable as a URL. In your example @ became %40. URLDecode reverses this.
HTMLEncode is similar to URLEncode, but the target environment is text NESTED inside of HTML. This keeps the browser from interpreting your content as HTML, but when rendered it should look like the decoded version. HTMLDecode reverses this.
When you see %xx this means percent encoding has occurred; this is a URL encoding scheme, so you need to use UrlEncode / UrlDecode.
The HtmlEncode and HtmlDecode methods are for encoding and decoding elements for HTML display - so things like & get encoded to &amp; and > to &gt;.
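To make the difference concrete, a small sketch using the values from this and the previous question (requires a reference to System.Web for HttpUtility):
string fromLog = "payment%40mysite.com";
Console.WriteLine(HttpUtility.UrlDecode(fromLog));     // "payment@mysite.com": %40 is percent (URL) encoding
Console.WriteLine(HttpUtility.HtmlDecode(fromLog));    // unchanged, because %40 is not an HTML entity
Console.WriteLine(HttpUtility.HtmlDecode("AT&amp;T")); // "AT&T": &amp; is an HTML entity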

Output from C# to html web page - UTF8 fails?

Hey,
so we have a backend written in C# and we have text in that backend in a language which has "special characters".
Problem is, when I output my saved text (from the C# app) to the web page (ASP.NET), the characters are all messed up even though the browser interprets the page as UTF-8 (since I have placed a meta tag telling the browser that it is UTF-8).
But since it's all messed up, I'm sort of questioning what the output from C# actually is. It's probably not UTF-8 but something else. Somewhere I read that text in .NET is usually UTF-16?
Basically, I am assigning a label (that can do HTML) with a value taken from the backend. That needs to be in UTF8.
How do I do that in the best way?
.NET strings are natively encoded as UTF-16. The following will set the HTTP output to UTF-8:
Response.ContentEncoding = System.Text.Encoding.UTF8;
When outputting special characters in HTML, you should escape them anyway using Unicode escape sequences (for example &#233; makes é).
Better resources:
http://msdn.microsoft.com/en-us/library/39d1w2xf.aspx
Response.ContentEncoding = Encoding.GetEncoding(xxx);
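A sketch of where that line typically goes (assuming a standard ASP.NET WebForms code-behind; alternatively, the globalization element in web.config with responseEncoding="utf-8" sets it site-wide):
protected void Page_Load(object sender, EventArgs e)
{
    // Tell the client the response body is UTF-8; .NET converts its UTF-16 strings on output.
    Response.ContentEncoding = System.Text.Encoding.UTF8;
}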

Why is this appearing in my c# strings: Â£

I have a string in C# initialised as follows:
string strVal = "£2000";
However, whenever I write this string out, the following is written:
Â£2000
It does not do this with dollars.
An example bit of code I am using to write out the value:
System.IO.File.AppendAllText(HttpContext.Current.Server.MapPath("/logging.txt"), strVal);
I'm guessing it's something to do with localization, but if C# strings are just Unicode, surely this should just work?
CLARIFICATION: Just a bit more info, Jon Skeet's answer is correct, however I also get the issue when I URLEncode the string. Is there a way of preventing this?
So the URL encoded string looks like this:
"%c2%a32000"
%c2 = Â
%a3 = £
If I encode as ASCII the £ comes out as ?
Any more ideas?
AppendAllText is writing out the text in UTF-8.
What are you using to look at it? Chances are it's something that doesn't understand UTF-8, or doesn't try UTF-8 first. Tell your editor/viewer that it's a UTF-8 file and all should be well. Alternatively, use the overload of AppendAllText which allows you to specify the encoding and use whichever encoding is going to be most convenient for you.
EDIT: In response to your edited question, the reason it fails when you encode with ASCII is that £ is not in the ASCII character set (which is Unicode 0-127).
URL encoding is also using UTF-8, by the looks of it. Again, if you want to use a different encoding, specify it to the HttpUtility.UrlEncode overload which accepts an encoding.
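For example, a sketch of both overloads (the log path is the one from the question):
// Write the log with an explicit encoding so the viewer does not have to guess.
System.IO.File.AppendAllText(HttpContext.Current.Server.MapPath("/logging.txt"), strVal, Encoding.UTF8);
// URL-encode using a specific encoding instead of the UTF-8 default.
string encoded = HttpUtility.UrlEncode(strVal, Encoding.GetEncoding("iso-8859-1")); // "%a32000"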
The default character set of URLs when used in HTML pages and in HTTP headers is called ISO-8859-1 or ISO Latin-1.
It's not the same as UTF-8, and it's not the same as ASCII, but it does fit into one-byte-per-character. The range 0 to 127 is a lot like ASCII, and the whole range 0 to 255 is the same as the range 0000-00FF of Unicode.
So you can generate it from a C# string by casting each character to a byte, or you can use Encoding.GetEncoding("iso-8859-1") to get an object to do the conversion for you.
(In this character set, the UK pound symbol is 163.)
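A small sketch of that conversion (£ is code point 163 in both Unicode and ISO-8859-1):
string strVal = "£2000";
byte[] latin1 = Encoding.GetEncoding("iso-8859-1").GetBytes(strVal); // latin1[0] == 163 (0xA3), one byte per character
byte[] utf8 = Encoding.UTF8.GetBytes(strVal);                        // starts 0xC2 0xA3, which a Latin-1 viewer displays as "Â£"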
Background
The RFC says that unencoded text must be limited to the traditional 7-bit US ASCII range, and anything else (plus the special URL delimiter characters) must be encoded. But it leaves open the question of what character set to use for the upper half of the 8-bit range, making it dependent on the context in which the URL appears.
And that context is defined by two other standards, HTTP and HTML, which do specify the default character set, and which together create a practically irresistible force on implementers to assume that the address bar contains percent-encodings that refer to ISO-8859-1.
ISO-8859-1 is the character set of text-based content sent via HTTP except where otherwise specified. So by the time a URL string appears in the HTTP GET header, it ought to be in ISO-8859-1.
The other factor is that HTML also uses ISO-8859-1 as its default, and URLs typically originate as links in HTML pages. So when you craft a simple minimal HTML page in Notepad, the URLs you type into that file are in ISO-8859-1.
It's sometimes described as a "hole" in the standards, but it's not really; it's just that HTML and HTTP fill in the blank left by the RFC for URLs.
Hence, for example, the advice on this page:
URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.
(ISO-Latin is another name for ISO-8859-1.)
So much for the theory. Paste this into Notepad, save it as an .html file, and open it in a few browsers. Click the link and Google should search for the UK pound sign.
<HTML>
<BODY>
<A HREF="http://www.google.com/search?q=%a3">Test</A>
</BODY>
</HTML>
It works in IE, Firefox, Apple Safari, Google Chrome - I don't have any others available right now.
Note that %a3 cannot be encoded in ASCII (7 bit, Basic Latin).
The Pound Sign (down the page) is part of Latin-1 encoding.
I have noticed that this is happening only when long strings are used (over 4000 chars). My solution was, upon receiving the parameter in the database, to simply replace the Â sign with nothing.
Be careful, Â may actually be needed, and if that is the case this solution is not appropriate.
