Preserve special characters when parsing variables from http request to c# string - c#

I am working on a small application based on owin and katana to handle links internally.
So the application handles HttpGet requests. When someone calls
http://localhost/?document=path/to/my/document/foo.doc
the application opens this document.
My problem is: when the document name contains a special character such as '+', my code receives the + sign as a space, because the query value has already been URL-decoded by the time it is bound to the string parameter.
[HttpGet]
[Route("")]
public HttpResponseMessage Get(string document = "")
{
    // open document
}
So how can I preserve the special characters and prevent them from being converted before any of my code runs?
I tried with HttpResponseMessage Get([FromUri]string document = "")
I tried re-encoding the document variable afterwards with HttpUtility.UrlEncode, but that also encodes the legitimate spaces.

Meanwhile I got a workaround:
this.Request.RequestUri.OriginalString
Inside the GET method, this gives the full link as a string without any interpretation. The rest is a matter of extracting the relevant variable with string.Substring operations.
Nevertheless, I would like to know if someone knows a more elegant solution.
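One possible way to tidy up the workaround above, as a minimal sketch rather than a definitive fix: split the still-encoded query string yourself and decode each value with Uri.UnescapeDataString, which decodes %XX escapes but leaves '+' alone. The class name RawQuery and the method Parse are hypothetical helpers, not part of OWIN, Katana or Web API, and the sketch assumes the caller hands in the undecoded query (for example the part of RequestUri.OriginalString after the '?').

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parses a raw (still percent-encoded) query string.
public static class RawQuery
{
    public static Dictionary<string, string> Parse(string rawQuery)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var pairs = rawQuery.TrimStart('?')
                            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var pair in pairs)
        {
            // Split on the first '=' only, so values may themselves contain '='.
            var idx = pair.IndexOf('=');
            var name = idx < 0 ? pair : pair.Substring(0, idx);
            var value = idx < 0 ? "" : pair.Substring(idx + 1);

            // UnescapeDataString decodes %XX sequences but does not turn '+' into a space.
            result[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
        }
        return result;
    }
}
```

Inside the action this could be called as RawQuery.Parse(rawQueryText)["document"], where rawQueryText is whatever undecoded query text you extracted from the request. Note that a client that deliberately encodes spaces as '+' would then see those plus signs preserved as well.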

Related

SELECT from PHP (MySQL) Back Into Android (C#)

I have an Android app and I'm attempting to use PHP/MySQL.
I'm having a lot of trouble getting my results from PHP accessible in C#/Android.
This is my PHP so far:
$sql = "SELECT Name FROM Employees WHERE Password='$password'";
if(!$result = $mysqli->query($sql)) {
    echo "Sorry, the query was unsuccessful";
}
while($employee = $result->fetch_assoc()) {
    $jsonResult = json_encode($employee);
    $employee->close();
}
I've left out the basic connection code as I have all that up and running. Here is my C#:
private void OnLoginButtonClick()
{
    var mClient = new WebClient();
    mClient.DownloadDataAsync(new Uri("https://127.0.0.1/JMapp/Login.php?password=" + _passwordEditText.Text));
}
As you can see I really am at a very basic stage. I've installed Newtonsoft so I'm ready to deal with the Json that is coming back, however I have a few questions.
I'm well aware of SQL injection, and the way that my variable (password) is passed to the PHP concerns me. Is there a safer way of doing this?
Secondly, I am now unsure of how to get the 'Employees' that match the MySQL command in PHP back into C#. How am I able to access the object that is passed back from PHP?
Leaving aside other aspects of the code in the question, I suggest some reading on sanitizing and escaping user data.
For the specific case of a password, see @Jay Blanchard's comments. For other input, which you would not transform on receipt, the idea is to sanitize it as soon as you receive it.
This makes sure you receive what you were expecting. For a string, trim() the text and match it against a regex of allowed characters. If you allow HTML tags, match them against a whitelist. Enforce a maximum length.
Then validate it, that is, check that it makes sense and meets the business requirements.
When storing it in the database, you can avoid SQL injection by using prepared statements. These make it clear which part is text to be stored and which part is SQL instructions.
When using the data, escape it according to where it is going to be used: if it is HTML content, escape it for HTML content; if it is an HTML attribute or a URL parameter, escape it accordingly for each case. (WordPress has a nice suite of functions that do this.)
Also, don't send passwords as URL parameters. Use a form with method POST instead. URLs are visible in the browser's address bar, and they get copied and pasted into emails, Facebook, etc.
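On the C# side, the POST advice could look roughly like the following. This is only a sketch under assumptions: the class name LoginRequest and the method BuildBody are made up for illustration, it uses HttpClient's FormUrlEncodedContent instead of the WebClient from the question, and it assumes the PHP script is changed to read the value from $_POST and pass it to a prepared statement.

```csharp
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper: builds a form-encoded POST body instead of a query string.
public static class LoginRequest
{
    public static FormUrlEncodedContent BuildBody(string password)
    {
        // FormUrlEncodedContent percent-encodes the value, so characters
        // such as '&' or '=' in the password cannot break the request.
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("password", password)
        });
    }
}
```

The body would then be sent with something like await httpClient.PostAsync(loginUrl, LoginRequest.BuildBody(text)), where httpClient and loginUrl are your own objects. The password no longer appears in the URL, although it still needs HTTPS to be protected in transit.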

URL unicode parameters decoding C#

I got a URL which contains parameters, one of which is with Cyrillic letters.
http://localhost/Print.aspx?id=4&subwebid=243572&docnumber=%u0417%u041f005637-1&deliverypoint=4630013519990
Doc-number must be ЗП005637-1.
I have tried the following code, but string is still with those characters %u0417%u041f.
public static String DecodeUrlString(this String url)
{
    String newUrl;
    while ((newUrl = Uri.UnescapeDataString(url)) != url)
        url = newUrl;
    return newUrl;
}
It's not a possibility to use HttpUtility.
If your goal is to avoid a dependency on System.Web.dll, then you would normally use the equivalent method in the WebUtility Class: WebUtility.UrlDecode Method.
However, you will find that, even then, your url won't get decoded the way you want it to.
This is because WebUtility.UrlDecode does not handle the %uNNNN escape notation on purpose. Notice this comment in the source code:
// *** Source: alm/tfs_core/Framework/Common/UriUtility/HttpUtility.cs
// This specific code was copied from above ASP.NET codebase.
// Changes done - Removed the logic to handle %Uxxxx as it is not standards compliant.
As stated in the comment, the %uNNNN escape format is not standard compliant and should be avoided if possible. You can find more info on this and on the proper way of encoding urls from this thread.
If you have any control over how the url is generated, consider changing it to be standard-compliant. Otherwise, consider adding System.Web.dll as a dependency, find another third-party library that does the job, or write your own decoder. As commented already, the source code is out there.
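If you end up writing your own decoder, a minimal sketch might look like this. It is not the ASP.NET implementation: the class name LegacyUrlDecoder and its Decode method are hypothetical, and the approach (one regex pass that handles %uNNNN itself and hands runs of standard %XX escapes to Uri.UnescapeDataString) is just one way to do it.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes both standard %XX escapes and the
// non-standard %uNNNN notation in a single pass.
public static class LegacyUrlDecoder
{
    // First alternative: %uNNNN (four hex digits, captured in group 1).
    // Second alternative: a run of standard %XX escapes.
    private static readonly Regex Escapes =
        new Regex("%[uU]([0-9A-Fa-f]{4})|(?:%[0-9A-Fa-f]{2})+");

    public static string Decode(string value)
    {
        return Escapes.Replace(value, m =>
            m.Groups[1].Success
                // %uNNNN is one UTF-16 code unit written in hex.
                ? ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString()
                // Standard escapes are decoded as UTF-8 by the framework.
                : Uri.UnescapeDataString(m.Value));
    }
}
```

Calling LegacyUrlDecoder.Decode on the docnumber value from the question should yield the Cyrillic document number. Because everything is handled in one pass, text that only looks like an escape after decoding (for example %25u0417) is not decoded a second time, unlike the loop in the question.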

Send out literal string from web api in C#

I have a literal string generated by my business logic that I need to send out on my web API via my controller.
The end part of my controller function looks like this, where the text variable is a literal string, thus containing "\\" to indicate a single backslash:
var text = _transformation.ToTextFormula(new Formula("", formula, parts));
return Ok(text);
The problem this creates is that when I then consume my API, the double backslashes are still there instead of the single one intended. Surely there must be a way to correct what is sent out?
If I inspect the "text" variable to look at the value in real format there is just a single slash before leaving the method.

How to escape url encoding?

I am creating a link that creates URL parameters that contains links with URL parameters.
The issue is that I have a link like this
http://mydomain/_layouts/test/MyLinksEdit.aspx?auto=true&source=
http://vtss-sp2010hh:8088/AdminReports/helloworld.aspx?pdfid=193
&url=http://vtss-sp2010hh:8088/AdminReports/helloworld.aspx?pdfid=193%26pdfname=5.6%20Upgrade
&title=5.6 Upgrade
This link goes to a bookmark adding page where it reads these parameters.
auto is whether to read the following parameters or not
source is where to go after you finish adding or cancelling
url is the bookmark link
title is the name of the bookmark
The values of url and title get entered into 2 fields. Then the user has to click save or cancel.
The problem is when the bookmark page enters the values into the field, it will decode them.
Then if you try to save, it won't let you, because the pdfname value inside the url value has a space in it. The link must not contain any spaces. So basically, I want the field to still show %20 instead of a space after the page has filled it in.
There isn't a problem with source, auto, or title, just the url...
Is there a way to solve this? Like maybe a special escape character I can use for the %20?
Note: I cannot modify the bookmark page.
I am using c#/asp.net to create the link and go to it.
Thanks
Since .NET Framework 4.5 you can use WebUtility.UrlEncode.
It resides in System.dll, so it does not require any additional references.
It properly escapes characters for URLs, unlike Uri.EscapeUriString.
It does not have any limit on the length of the string, unlike Uri.EscapeDataString, so it can be used for POST requests.
System.Net.WebUtility.UrlEncode(urlText)
Another option is
System.Uri.EscapeDataString()
Uri.EscapeDataString() and Uri.UnescapeDataString() are safer than the UrlEncode/UrlDecode methods in that they do not convert plus characters into spaces when decoding.
Some details from another user: http://geekswithblogs.net/mikehuguet/archive/2009/08/16/134123.aspx
Just use HttpUtility's UrlEncode method right before you hand off the URL:
string encoded = HttpUtility.UrlEncode(url);
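To make the difference between these encoders concrete for the bookmark link in the question, here is a small sketch. The class name BookmarkLink and the method Build are hypothetical, and it assumes the bookmark URL passed in already contains its own %20, so that escaping it once more turns the % into %25 and a single round of decoding by the bookmark page leaves %20 in the field.

```csharp
using System;

// Hypothetical helper: builds the outer link with each parameter value escaped.
public static class BookmarkLink
{
    public static string Build(string pageUrl, string source, string bookmarkUrl, string title)
    {
        // Uri.EscapeDataString escapes reserved characters such as ':', '/',
        // '?', '&' and '%', and writes a space as %20 rather than '+'.
        return pageUrl
            + "?auto=true"
            + "&source=" + Uri.EscapeDataString(source)
            + "&url=" + Uri.EscapeDataString(bookmarkUrl)
            + "&title=" + Uri.EscapeDataString(title);
    }
}
```

For comparison, Uri.EscapeDataString("5.6 Upgrade") gives 5.6%20Upgrade, while WebUtility.UrlEncode("5.6 Upgrade") gives 5.6+Upgrade. Whether the bookmark page then keeps the %20 depends on how many times it decodes the value, which the question does not fully specify.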

KeyNotFoundException with using HtmlEntity.DeEntitize() method

I am currently working on a scraper written in C# 4.0. I use a variety of tools, including the built-in WebClient and Regex features of .NET. For part of my scraper I am parsing an HTML document using HtmlAgilityPack. I got everything to work as I desired and went through some cleanup of the code.
I am using the HtmlEntity.DeEntitize() method to clean up the HTML. I made a few tests and the method seemed to work great. But when I implemented the method in my code I kept getting KeyNotFoundException. There are no further details so I'm pretty lost. My code looks like this:
WebClient client = new WebClient();
string html = HtmlEntity.DeEntitize(client.DownloadString(path));
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
The HTML downloaded is UTF-8 encoded. How can I get around the KeyNotFound exception?
I understand that the problem is due to occurrence of non-standard characters. Say, for example, Chinese, Japanese etc.
Once you find out which characters are causing the problem, perhaps you could search for a suitable patch to HtmlAgilityPack here
This may be of some help to you in case you want to modify the htmlagilitypack source yourself.
Four years later and I have the same problem with some encoded characters (version 1.4.9.5). In my case, there is a limited set of characters that might generate the problem, so I have just created a function to perform the replacements:
// to be called before HtmlEntity.DeEntitize
public static string ReplaceProblematicHtmlEntities(string str)
{
    var sb = new StringBuilder(str);
    //TODO: add other replacements, as needed
    return sb.Replace("&period;", ".")
             .Replace("&abreve;", "ă")
             .Replace("&acirc;", "â")
             .ToString();
}
In my case, the string contains both html-encoded characters and UTF-8 characters, but the problem is related to some encoded characters only.
This is not an elegant solution, but a quick fix for all those text with a limited (and known) amount of problematic encoded characters.
My HTML had a block of text like so:
... found in sections: 233.9 & 517.3; ...
Despite the spacing and decimal point, it was interpreting & 517.3; as a unicode character.
Simply HTML Encoding the raw text fixed the problem for me.
string raw = "sections: 233.9 & 517.3;";
// turn '&' into '&amp;', etc, before DeEntitizing
string encoded = System.Web.HttpUtility.HtmlEncode(raw);
string deEntitized = HtmlEntity.DeEntitize(encoded);
In my case I have fixed this by updating HtmlAgilityPack to version 1.5.0
