The scenario I have seems pretty common, but I have not found a good solution so far. There's an ASP.NET MVC application with an MSSQL database in the back end. The model includes class A with a string field Description that needs to be limited to 100 characters. It's declared as follows:
[StringLength(100)]
public virtual string Description { get; set; }
The corresponding column in the database is nvarchar with column length = 100. In the UI, the Description field is represented by a textarea with maxlength = 100.
The issue arises when there are line breaks in the textarea. On the client side each one is counted as 1 character, but on the server side it is represented by Environment.NewLine, which is 2 characters (\r\n). So the value exceeds the database column length (actually, server-side validation fails before the back end is even reached, but let's leave validation aside for simplicity).
The solutions I found so far are:
1. Add some magic on the client side so that a line break in the textarea counts as two characters. I don't like this solution as it can confuse the user.
2. Replace \r\n with \n on the server side. This seems like a hack that could have side effects.
3. Remove or increase the column length in the database. This is the simplest one (leaving the server-side validation issue aside), but let's say it's not an option.
I guess there should be something else besides those three.
This is an old and known issue with MVC (I am not sure whether it has been solved): different browsers treat line breaks differently. My suggestion is to use a custom model binder, like the one you can find here: jQuery validate textarea maxlength bug
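To make the mismatch in the question concrete, here is a minimal JavaScript sketch. It assumes every line break reaches the server as \r\n, which is how HTML form submission normalizes textarea values. The helper names submittedLength and fitsServerLimit are made up for illustration; they measure the text the way the server will see it, so a client-side check can agree with [StringLength(100)].

```javascript
// Hypothetical helpers: measure a textarea value the way the server will see it.
// Assumption: every line break arrives at the server as "\r\n" (2 characters).
function submittedLength(text) {
  // Collapse any existing \r\n or lone \r to \n so each break is counted once.
  const normalized = text.replace(/\r\n|\r/g, "\n");
  const lineBreaks = (normalized.match(/\n/g) || []).length;
  // Each break costs one extra character once it becomes \r\n.
  return normalized.length + lineBreaks;
}

function fitsServerLimit(text, limit) {
  return submittedLength(text) <= limit;
}

console.log(submittedLength("line one\nline two"));       // 18
console.log(fitsServerLimit("x".repeat(99) + "\n", 100)); // false
```

Whether to show this "server" count to the user or to normalize on the server instead is the same trade-off the question already lists.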
This is a question that has been asked before, but I've not found the information I'm looking for or maybe I'm just missing the point so please bear with me. I can always adjust my question if I'm asking it the wrong way.
For example, I have a POST endpoint that takes a simple DTO object with 2 properties (i.e. companyRequestDto), and one of its properties contains a script tag. When I call my endpoint from Postman, I send the following:
{
"company": "My Company<script>alert(1);</script>",
"description": "This is a description"
}
When it is received by the action in my endpoint,
public void Post(CompanyRequestDto companyRequestDto)
my DTO object will automatically be set and its properties will be set to:
companyRequestDto.Company = "My Company<script>alert(1);</script>";
companyRequestDto.Description = "This is a description";
I clearly don't want this information to be stored in our database as is, nor do I want it stored as an escaped string as displayed above.
1) Request: So my first question is: how do I throw an error if the posted DTO contains invalid content such as the <script> tag?
I've looked at Microsoft AntiXss, but I don't understand how to use it here: the data in the DTO properties is just a plain string, not an HTML string. What am I missing? I don't see how it helps with sanitizing or validating the passed data.
When I call
var test = AntiXss.AntiXssEncoder.HtmlEncode(companyRequestDto.Company, true);
It returns an encoded string, but then what??
Is there a way to remove disallowed keywords or just simply throw an error?
2) Response: Assuming 1) was not implemented or didn't work properly and the value ended up being stored in our database, am I supposed to return encoded data in the JSON, so instead of returning:
"My Company<script>alert(1)</script>"
Am I supposed to return:
"My Company&lt;script&gt;alert(1)&lt;/script&gt;"
Is the browser (or whatever app) just supposed to display it as below then?
"My Company<script>alert(1)</script>"
3) Code: Assuming there is a way to sanitize or throw an error, should I use this at the property level using attribute on all the properties of my various DTO objects or is there a way to apply this at the class level using an attribute that will validate and/or sanitize all string properties of a DTO object for example?
I found interesting articles but none really answering my problems or I'm having other problems with some of the answers:
asp.net mvc What is the difference between AntiXss.HtmlEncode and HttpUtility.HtmlEncode?
Stopping XSS when using WebAPI (currently looking into this one but don't see how example is solving problem as property is always failing whether I use the script tag or not)
how to sanitize input data in web api using anti xss attack (also looking at this one but having a problem calling ReadFromStreamAsync from my project at work. Might be down to some of the settings in my web.config but haven't figured out why but it always seems to return an empty string)
Thanks.
UPDATE 1:
I've just finished going through the answer from Stopping XSS when using WebAPI
This is probably the closest one to what I am looking for. Except I don't want to encode the data, as I don't want to store it in my database, so I'll see if I can figure out how to throw an error but I'm not sure what the condition will be. Maybe I should just look for characters such as <, >, ; , etc... as these will not likely be used in any of our fields.
You need to consider where your data will be used when you think about encoding. Data with <script> in it is only a problem if it's rendered as HTML, so if you are going to display user-provided data anywhere, the point of display is probably where you want to HTML-encode it (you want to avoid repeatedly HTML-encoding the same string when saving it, for example).
Again, it depends on what the response is going to be used for. You probably want to HTML-encode it at the point it's going to be displayed. Remember that if you encode something in the response, it may no longer match what's in the data, so if the calling code does something like call your API to search for a company with that name, that could cause problems. If the browser does display the HTML-encoded version it might look ugly, but that's better than users being compromised by XSS attacks.
It's quite difficult to sanitize text for things like tags if you allow most characters for normal use. It's easier if you can whitelist the allowed characters and only allow, say, alphanumerics, but that often isn't possible. Where it is, it can be done with a regex validation attribute on the DTO. The best approach, I think, is to encode values for display if you can't block certain characters. It's really difficult to allow all characters but still catch things like <script>, as people can start using ASCII character codes etc. to get around the filter.
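As a rough illustration of the two ideas in this answer (encode at display time, and whitelist where you can), here is a minimal JavaScript sketch. The function names and the exact whitelist pattern are assumptions made up for the example, not anything from AntiXss or Web API; on the server the equivalent would be a regex validation attribute plus HTML encoding in the view.

```javascript
// Encode at display time: the stored value stays untouched, only the rendered copy is escaped.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical whitelist rule: letters, digits, spaces and a few punctuation marks, 1-100 chars.
const ALLOWED_COMPANY_NAME = /^[A-Za-z0-9 .,'&-]{1,100}$/;

function isAllowedCompanyName(value) {
  return ALLOWED_COMPANY_NAME.test(value);
}

const posted = "My Company<script>alert(1);</script>";
console.log(isAllowedCompanyName(posted));       // false
console.log(isAllowedCompanyName("My Company")); // true
console.log(escapeHtml(posted));
// My Company&lt;script&gt;alert(1);&lt;/script&gt;
```

Rejecting on the whitelist answers the "throw an error" part of the question; escaping is the fallback for fields where a whitelist is too restrictive.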
I'm trying to reference an image like this:
<img src="/controller/method/@Model.attribute">
This works until the attribute has a plus sign. I already know that the + sign has a semantic meaning but I'd like to keep it, because some values have the plus sign.
I've tried:
<img src="/controller/method/@HttpUtility.HtmlEncode(@Model.attribute)">
And on the server side:
public method(string param)
{
string p = HttpUtility.HtmlDecode(param);
}
How can I accomplish this using ASP.NET MVC 5?
You need to use UrlEncode:
<img src="/controller/method/@HttpUtility.UrlEncode(Model.attribute)">
And do nothing in the method:
public ActionResult method(string param){
// param should already be decoded
}
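Why UrlEncode rather than HtmlEncode: HTML encoding leaves + untouched, while URL encoding turns it into an escape sequence that survives the trip. A minimal sketch of that round trip, using JavaScript's encodeURIComponent as a stand-in for the server-side call (an assumption for illustration; HttpUtility.UrlEncode differs in details such as how spaces are written). The imageUrl helper is hypothetical.

```javascript
// Stand-in for the server-side encoding step: encodeURIComponent escapes "+" as %2B.
function imageUrl(attribute) {
  // "/controller/method/" is the path from the question.
  return "/controller/method/" + encodeURIComponent(attribute);
}

const url = imageUrl("a+b");
console.log(url); // /controller/method/a%2Bb

// Decoding the last segment gives the original value back, plus sign included.
const lastSegment = url.split("/").pop();
console.log(decodeURIComponent(lastSegment)); // a+b
```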
I did some testing and got an error page while trying to reproduce the scenario you described.
Here is related question: double escape sequence inside a url : The request filtering module is configured to deny a request that contains a double escape sequence
In my designs, I'm avoiding any direct use of model fields as part of the URL. It's not only the question of URL-encoding them - which you can always do - but also the question of readability.
What I do instead is to add another field to the model, which is the URL-ready representation of an attribute. That field can be calculated from the original field by only accepting letters and numbers and replacing spaces or any other character with a dash.
For example, if you had the attribute set to someone's pencil + one, the auto-created URL version of this attribute would be someone-s-pencil-one.
You can customize this process, make it recognize some domain-specific words, etc. But that is the general idea I'm always following in my designs.
As a quick solution you can use a regular expression to isolate acceptable words and then separate them with dashes for better readability:
// Requires: using System.Linq; using System.Text.RegularExpressions;
string encoded = string.Join("-",
    Regex.Matches(attributeValue, @"[a-zA-Z0-9]+")
        .Cast<Match>()
        .Select(match => match.Value)
        .ToArray());
When done this way, you must account for possible duplicates. Part of the information is lost with this encoding.
If you fear that two models could clash with the same URL, then you have to do something to break the clash. Some websites append a GUID to the generated URL to make it unique.
Another possibility is to generate a short random string, like 3-5 letters only, and store it in the database so that you can control its uniqueness. Everything in this solution is subordinated to readability, keep that in mind.
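A minimal JavaScript sketch of the clash-breaking idea, under stated assumptions: toSlug mirrors the letters-and-digits expression from this answer, and uniqueSlug, defaultSuffix and the taken set are hypothetical names, with the set standing in for a uniqueness check against the database.

```javascript
// Keep runs of letters and digits and join them with dashes.
function toSlug(value) {
  return (value.match(/[a-zA-Z0-9]+/g) || []).join("-");
}

// Four random lowercase letters.
function defaultSuffix() {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let suffix = "";
  for (let i = 0; i < 4; i++) {
    suffix += letters[Math.floor(Math.random() * letters.length)];
  }
  return suffix;
}

// Hypothetical clash breaker: append a short random suffix until the slug is unused.
// `taken` stands in for a uniqueness check against the database.
function uniqueSlug(value, taken, randomSuffix = defaultSuffix) {
  const base = toSlug(value);
  let candidate = base;
  while (taken.has(candidate)) {
    candidate = base + "-" + randomSuffix();
  }
  taken.add(candidate);
  return candidate;
}

const taken = new Set();
console.log(uniqueSlug("someone's pencil + one", taken)); // someone-s-pencil-one
console.log(uniqueSlug("someone's pencil, one", taken));  // same base plus a random 4-letter suffix
```

The suffix generator is a parameter so that a GUID or a database sequence can be swapped in without touching the rest.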
I'm having problems creating a query string and sending it to another webpage.
The text I'm trying to send is long and has special characters. Here is an example:
Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.
I can send this just fine if I hand code it:
<a href="Default.cshtml?standardText=Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.">
Link Text
</a>
This goes through without any problems, and I can read the entire Query String on the other side.
But if I am creating the link programmatically, my query string gets cut off right before the first character reference. I am using the following setup in a helper function:
string url = "Default.cshtml";
url += "?standardText=" + standard.text;
<a href="@url">Link Text</a>
When I use this, I only get "Understand a Fraction as 1/" and then it stops.
When I look at the page source, the only difference between the links is that the first has actual ampersands and the second has them turned into &amp;
<a href="Default.cshtml?standardText=Understand a fraction 1/𝘣 as the quantity formed by 1 part when a whole is partitioned into 𝘣 equal parts; understand a fraction 𝘢/𝑏 as the quantity formed by 𝘢 parts of size 1/𝘣."
So the problem is not really the spaces, but the fact that the & is being interpreted as starting a new query string parameter.
I have tried various things [using HttpUtility.UrlEncode, HttpUtility.UrlEncodeUnicode, Html.Raw, trying to replace spaces with "+"], but the problem isn't with the spaces, it's with how the character references are being handled. When I tried HttpUtility.UrlEncode I got a double-encoding security error.
On the advice of OmG I tried replacing all the &s, #s, and /s using:
url = url.Replace("&","%26");
url = url.Replace("#","%23");
url = url.Replace("/","%2F");
This led to the following link:
All Items
And now when I click on the link I get a different security warning/error:
A potentially dangerous Request.QueryString value was detected from the client (standardText="...raction 1/𝘣 as the qua...").
I don't see why it is so hard to send character references through a QueryString. Is there a way to prevent Razor from converting all my &s to &amp;? The address works fine when it is just plain "&"s.
Update: using URLDecode() on the string does not affect its character entity references, so when I try to decode the string then re-encode it, I still get the double-escape security warning.
Update: on the suggestion of @MikeMcCaughan, I tried using JS, but I am not very knowledgeable about mixing JS and Razor. I tried creating a link by dropping a script into the body like so:
<script type="text/javascript">
var a = document.createElement('a');
var linkText = document.createTextNode("my title text");
a.appendChild(linkText);
a.title = "my title text";
a.href = encodeURIComponent(@url);
document.body.appendChild(a);
</script>
But no link showed up, so I'm obviously doing it wrong.
For reference, when I try to use @Html.Raw(url),
<a href="@Html.Raw(url)">Link Text</a>
The &s are still turned into &amp;s. The link renders as:
Link text
One simple solution is to replace the special characters with their percent-encoded equivalents, which can be accessed from here.
As you can find there, replace & in the string with %26 using the string's Replace method. Also replace / with %2F, # with %23, ; with %3B, and space with %20.
You can also do this in C# with the following function:
Server.UrlEncode("<The Url>")
and in JavaScript with the following function (encodeURI is not enough for a query string value, because it leaves &, #, / and ; untouched):
encodeURIComponent("<The value>")
Double encoding happens when a string that is already encoded gets encoded again. To prevent it, make sure no part of the string has been encoded before you pass it to Server.UrlEncode.
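A minimal JavaScript sketch of why the unencoded link gets cut off and what percent-encoding changes. It assumes the stored text contains a literal decimal character reference (&#120355; is the reference for 𝘣) and uses a made-up host, example.test. It only illustrates URL parsing; it says nothing about ASP.NET request validation, which the question also ran into.

```javascript
// The text as it exists on the server, with a literal numeric character reference in it.
const standardText = "Understand a fraction 1/&#120355; as the quantity";

// Unencoded: "&" ends the parameter and "#" starts the fragment, so the value is cut short.
const broken = new URL("http://example.test/Default.cshtml?standardText=" + standardText);
console.log(broken.searchParams.get("standardText")); // Understand a fraction 1/

// Encoded: "/", "&", "#", ";" and spaces are percent-encoded and survive the round trip.
const encoded = encodeURIComponent(standardText);
const fixed = new URL("http://example.test/Default.cshtml?standardText=" + encoded);
console.log(fixed.searchParams.get("standardText") === standardText); // true
```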
I have this route:
(Ignore the XML Format)
<route name="note" url="{noteId}-{title}">
<constraints>
<segment name="noteId" value="\d+" />
<segment name="title" value=".+" />
</constraints>
</route>
I want it to match urls like /1234-hello-kitty or /5578-a-b-c-ddd-f-g
MVC Routing Handler seems to be having some troubles with this.
I have been reading a lot about the subject, and I found out some interesting facts:
MVC first identifies the route segments and then it checks the constraints
MVC reads segments from right to left, meaning it first identifies title and then noteId
Taking the first example, I'm guessing MVC is identifying noteId as 1234-hello and title as kitty.
This fails when constraints are checked and therefore the route is not matched.
Is there another way to do this?
Please take into account that I want to keep both my segments, noteId and title, and that they must be separated by a hyphen - (this is mandatory).
I can see a couple of options for solving this issue:
URLRewriting
One possibility is to rewrite URLs (similar to mod_rewrite) to convert them from the format that is imposed on you into a format that MVC can route natively. There is an IIS module from Microsoft that does just that, and I believe (though I'm not certain) it has the functionality needed in your case. The basic principle is that if MVC cannot handle the format because of its route template parsing rules, you convert the URL into something it can manage before the request ever reaches MVC routing. URL Rewrite is an IIS module that sits in front of the MVC handler, examines requests, and can rewrite a request from one form into another. MVC then sees only the altered form, which it can understand and parse. E.g. the URL /1234-hello-kitty can be rewritten by the module as /1234/hello-kitty, and the MVC route template would then be a simple {noteId}/{*title}. The caveat is that link generation may not work as expected, since generated links would look like /1234/hello-kitty rather than /1234-hello-kitty. A possible mitigation is to define a route {noteId}-{title} that is used only for link generation and not for routing. I believe (this should be verified) that it will generate links of the form /1234-hello-kitty, even though it cannot parse them on incoming requests.
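The rewrite itself boils down to one pattern substitution. A minimal JavaScript sketch of just that transformation; the function name rewriteNotePath is made up, and the IIS module's own rule syntax is not shown here.

```javascript
// Rewrite the imposed format into one with a slash between id and title.
// Only the first hyphen after the leading digits is replaced, so hyphens in the title survive.
function rewriteNotePath(path) {
  return path.replace(/^\/(\d+)-(.+)$/, "/$1/$2");
}

console.log(rewriteNotePath("/1234-hello-kitty"));   // /1234/hello-kitty
console.log(rewriteNotePath("/5578-a-b-c-ddd-f-g")); // /5578/a-b-c-ddd-f-g
console.log(rewriteNotePath("/about"));              // /about (no match, left unchanged)
```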
Custom MVC route handler
This one basically draws on the idea that if MVC doesn't do it for you, you override its behavior to do what you need. The mechanics are described in the SO post on how to provide your own handler. You supply your own parsing of the URL segments into route data and assign the values as you parse them, e.g. requestContext.RouteData.Values["noteId"] = /* your code that gets noteId out of the URL */. The rest of the application works as usual, knowing nothing about this surgical intervention in routing.
I have been reading a lot about the subject, and I found out some
interesting facts:
MVC first identifies the route segments and then it checks the constraints
MVC reads segments from right to left, meaning it first identifies title and then noteId
Taking the first example, I'm guessing MVC is identifying noteId
as 1234-hello and title as kitty.
This fails when constraints are checked and therefore the route is not
matched.
Those facts and the guess are completely correct. This is how ASP.NET routing works, unfortunately.
Why?
ASP.NET routing simply works in two phases: first it parses all routes, then it tries to match them against every request.
In your case, the parsing phase does the following:
Split a routeUrl by "/". Each segment is a path segment. You have just one: "{noteId}-{title}".
For each path segment, split them into sub-segments: parameters and literals. Parameters are enclosed by {} and literals are the rest. You have 3 sub-segments: {noteId}, - and {title}
Then, the matching phase (when a segment has multiple sub-segments):
Find the last occurrence of the last literal (-) and assign the text after it to the last parameter (title).
Repeat the previous step until all parameters and literals are consumed. If any part of the URL or any sub-segment is left over, the match fails.
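A simplified JavaScript model of that matching step for this one template, to show where the split lands. This is a sketch of the behavior described above, not the actual ASP.NET implementation; matchRightToLeft is a hypothetical name, and the constraint is assumed to be tested against the whole value.

```javascript
// Simplified model of matching "{noteId}-{title}" against one path segment:
// the LAST occurrence of the literal "-" decides where the last parameter starts.
function matchRightToLeft(segment) {
  const cut = segment.lastIndexOf("-");
  if (cut === -1) return null; // literal not found: no match
  return { noteId: segment.slice(0, cut), title: segment.slice(cut + 1) };
}

const values = matchRightToLeft("1234-hello-kitty");
console.log(values); // { noteId: '1234-hello', title: 'kitty' }

// The constraint \d+ is checked only afterwards, against the whole value, and rejects it.
console.log(/^\d+$/.test(values.noteId)); // false
```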
Possible solutions
So in order to use a literal, you have to make sure that the literal won't occur inside the parameters. Since you are stuck with a dash, you have a few possible solutions.
You can use one parameter and no literal with a matching constraint (e.g. ^\d+-[\w-]+$), then try to parse the id inside controller action. This requires no changes in existing URL structure.
You can switch places of title and noteId, like /hello-kitty-1234.
You can try double dashes as literal, like /1234--hello-kitty.
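A minimal JavaScript sketch of the first option: treat the whole segment as one value and split it in code. The pattern is the constraint suggested above with capture groups added; parseNoteSegment is a hypothetical helper, and in the real application this parsing would live in the controller action.

```javascript
// One value, constrained as a whole, then split into id and title.
const NOTE_SEGMENT = /^(\d+)-([\w-]+)$/;

function parseNoteSegment(segment) {
  const match = NOTE_SEGMENT.exec(segment);
  if (!match) return null; // does not look like "<digits>-<title>"
  return { noteId: Number(match[1]), title: match[2] };
}

console.log(parseNoteSegment("1234-hello-kitty"));   // { noteId: 1234, title: 'hello-kitty' }
console.log(parseNoteSegment("5578-a-b-c-ddd-f-g")); // { noteId: 5578, title: 'a-b-c-ddd-f-g' }
console.log(parseNoteSegment("hello-kitty"));        // null
```

Because \d+ cannot consume a hyphen, the split always lands on the first one, which is exactly what the route template could not express.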
I'm using this code:
$(document).ready(function () {
var breadCrumps = $('.breadcrumb');
breadCrumps.find('span').text("<%= ArticleSectionData.title %>");
});
title is a property whose values are encoded in Unicode (I think). They are Greek letters. On the local IIS development server (embedded in Visual Studio) the characters are displayed correctly, but on the test server they appear as:
&Sigma;
Do you know any solution for this problem?
Thanks for the help.
EDIT:
I have changed the code a little bit:
breadCrumps.find('span').text(<%= ArticleSectionData.title %>);
And now it works correctly, encoding is frustrating ...
If you are working off of a different database in test than in dev, then I suspect the issue is with the data. If you are storing HTML entities (e.g. &Sigma;) in your database, then you need to use .html(). If you are storing actual Unicode characters (e.g. Σ) in the database, then you need to use .text(). The way to represent Σ in HTML is with &Sigma;. But if you set the text of an element to &Sigma;, it displays that literally: the innerHTML of that element would contain &amp;Sigma;.
I don't know the root of the problem, but you can use http://www.strictly-software.com/htmlencode to decode the HTML entity to Σ.
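If the values turn out to be stored as numeric character references, decoding them before calling .text() is another option. A minimal sketch, assuming numeric references only; decodeNumericEntities is a made-up helper and is not the API of the library linked above.

```javascript
// Decode numeric character references such as &#931; or &#x3A3; into the characters they stand for.
// Named entities (for example &Sigma;) are NOT handled here; that needs a lookup table or the DOM.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (whole, hex, dec) => {
    const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities("&#931;"));        // Σ
console.log(decodeNumericEntities("&#x3A3;&#953;")); // Σι
console.log(decodeNumericEntities("&Sigma;"));       // &Sigma; (named entity, left as-is)
```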