Stopping integer overflow in ASP.NET - c#

We use Acunetix at work to do security scans on our applications. Recently we came across the following integer vulnerability error in the report:
From what I can tell, the report is telling us that we are not stopping integer overflow attacks within querystrings. While we do use querystrings that eventually resolve to integers, they are first encrypted, then decrypted and converted to int using Convert.ToInt32() before we use them. I know that we should use TryParse() instead, but even if a hacker were to enter an integer value higher than Int32.MaxValue, they would fail at the decryption step before we ever try to convert to an integer, which is where the integer overflow would occur in my opinion. That is, unless the error happens when the decryption fails?
I'm pretty confused about this and google searches haven't been much help as most pertain to unmanaged languages like c++ rather than c# and asp.net. Any help would be much appreciated.

I don't think this is an integer overflow vulnerability; I suspect it refers to integers because that's the type being manipulated in the querystring (although I know you said they were encrypted). If you're doing a direct conversion of untrusted user input to an integer and not first validating the type (as you say, TryParse it first), you're probably going to throw an internal error (short of any try/catches), and this is what they're likely picking up on.
Automated scanners go a bit nuts over HTTP 500 errors. They don't know what's actually happening under the covers and how severe the error is so you could argue that it's a false-positive. On the other hand, your security folks will argue that websites readily returning HTTP 500 are more likely to be probed further if a bot picks up that you're regularly throwing these errors out as a result of manipulated querystrings.
Easy answer: "All input must be validated against a whitelist of acceptable value ranges." (from here).
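For what it's worth, a minimal sketch of that whitelist approach for a querystring integer (the parameter name "id" and the 1-9999 range are made up for illustration):

// Hypothetical handler snippet: validate type and range before use,
// so a manipulated querystring never produces an HTTP 500.
string raw = Request.QueryString["id"];
int id;
if (!int.TryParse(raw, out id) || id < 1 || id > 9999)
{
    Response.StatusCode = 400; // reject cleanly instead of erroring out
    Response.End();
    return;
}
// 'id' is now safe to use further down.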

Related

RavenDB: Object Field Value Being Rounded At Random

First up, I want to apologise for the somewhat ambiguous question title, but I have literally no idea how else to describe this bizarre issue. Effectively, I have a document store inside of RavenDB, and despite trying to change it, there seem to be weird constraints that cause the value to be randomly rounded to different numbers.
This is best shown through this gif I made of it:
Within the C# class that this is being created from, it's being stored as a ulong, and everything is right within the code until it's being loaded, at which point I have an ID mismatch. As you can see, the document name is the ID I am trying to paste in; they are meant to match, but I am being hit with this very weird glitch instead.
Does anyone have an idea?
The underlying issue lies with JavaScript numbers.
All JS numbers are actually double, which means that they are good for integer numbers up until 2^53 or so. Beyond that, you start to lose precision.
What is actually happening here is that the browser is loading the document and saving it back. This goes through the JS engine, which causes a loss of precision in the number.
The problem occurs only if you are updating the documents through the studio.
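You can reproduce the round-trip loss outside the browser; a quick C# sketch (the value is 2^53 + 1):

ulong id = 9007199254740993UL;         // 2^53 + 1, not representable as a double
double viaJs = id;                     // what the number becomes in the JS engine
ulong roundTripped = (ulong)viaJs;     // comes back as 9007199254740992
Console.WriteLine(id == roundTripped); // False - the ID has silently changed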
In the latest version of RavenDB, this will be detected and you'll get a warning.

Does UuidCreate use a CSPRNG?

Note that this is not my application, it is an application I am pentesting for a client. I usually ask questions like this on https://security.stackexchange.com/, however as this is more programming related I have asked on here.
Granted, RFC 4122 for UUIDs does not specify that type 4 UUIDs have to be generated by a Cryptographically Secure Pseudo Random Number Generator (CSPRNG). It simply says
Set all the other bits to randomly (or pseudo-randomly) chosen values.
Although, some implementations of the algorithm, such as this one in Java, do use a CSPRNG.
I was trying to dig into whether Microsoft's implementation does or not. Mainly around how .NET or MSSQL Server generates them.
Checking the .NET source we can see this code:
Marshal.ThrowExceptionForHR(Win32Native.CoCreateGuid(out guid), new IntPtr(-1));
return guid;
Checking the CoCreateGuid docco, it states
The CoCreateGuid function calls the RPC function UuidCreate
All I can find out about this function is here. I seem to have reached the end of the rabbit hole.
Now, does anyone have any information on how UuidCreate generates its UUIDs?
I've seen many related posts:
How Random is System.Guid.NewGuid()? (Take two)
Is using a GUID a valid way to generate a random string of characters and numbers?
How securely unguessable are GUIDs?
how are GUIDs generated in SQL Server?
The first of which says:
A GUID doesn't make guarantees about randomness, it makes guarantees around uniqueness. If you want randomness, use Random to generate a string.
I agree with this, except that in my case, for random, unpredictable numbers you'd of course use a CSPRNG instead of Random (e.g. RNGCryptoServiceProvider).
And the latter states (actually quoted from Wikipedia):
Cryptanalysis of the WinAPI GUID generator shows that, since the sequence of V4 GUIDs is pseudo-random; given full knowledge of the internal state, it is possible to predict previous and subsequent values
Now, on the other side of the fence this post from Will Dean says
The last time I looked into this (a few years ago, probably XP SP2), I stepped right down into the OS code to see what was actually happening, and it was generating a random number with the secure random number generator.
Of course, even if it was currently using a CSPRNG this would be implementation specific and subject to change at any point (e.g. any update to Windows). Unlikely, but theoretically possible.
My point is that there's no canonical reference for this; the above was to demonstrate that I've done my research, and none of the above posts reference anything authoritative.
The reason is that I'm trying to decide whether a system that uses GUIDs for authentication tokens needs to be changed. From a pure design perspective, the answer is a definite yes; however, from a practical point of view, if the Windows UuidCreate function does in fact use a CSPRNG, then there is no immediate risk to the system. Can anyone shed any light on this?
I'm looking for any answers with a reputable source to back it up.
Although I'm still just some guy on the Internet, I have just repeated the exercise of stepping into UuidCreate, in a 32-bit app running on a 64-bit version of Windows 10.
Here's a bit of stack from part way through the process:
> 0018f670 7419b886 bcryptPrimitives!SymCryptAesExpandKeyInternal+0x7f
> 0018f884 7419b803 bcryptPrimitives!SymCryptRngAesGenerateSmall+0x68
> 0018f89c 7419ac08 bcryptPrimitives!SymCryptRngAesGenerate+0x3b
> 0018f8fc 7419aaae bcryptPrimitives!AesRNGState_generate+0x132
> 0018f92c 748346f1 bcryptPrimitives!ProcessPrng+0x4e
> 0018f93c 748346a1 RPCRT4!GenerateRandomNumber+0x11
> 0018f950 00dd127a RPCRT4!UuidCreate+0x11
It's pretty clear that it's using an AES-based RNG to generate the numbers. GUIDs generated by calling other people's GUID generation functions are still not suitable for use as unguessable auth tokens though, because that's not the purpose of the GUID generation function - you're merely exploiting a side effect.
Your "Unlikely, but theoretically possible." about changes in implementation between OS versions is rather given the lie by this statement in the docs for "UuidCreate":
If you do not need this level of security, your application can use the UuidCreateSequential function, which behaves exactly as the UuidCreate function does on all other versions of the operating system.
i.e. it used to be more predictable, now it's less predictable.
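That said, if unguessable tokens are the real requirement, you can sidestep the implementation question entirely with an explicit CSPRNG; a minimal sketch:

using System;
using System.Security.Cryptography;

// 16 random bytes from the OS CSPRNG - the same size as a GUID, but with
// all 128 bits random and a documented security guarantee.
byte[] token = new byte[16];
using (var rng = new RNGCryptoServiceProvider())
{
    rng.GetBytes(token);
}
Console.WriteLine(BitConverter.ToString(token).Replace("-", ""));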

What is the profession / preferred way to handle input validation

I am new to C# and SQL. But over the last few years while learning both in college a question really begins to burn inside me. Here it is:
It seems to me that there are really two very generic ways to handle input validation (i.e. checking for required fields, data in the correct ranges, etc.).
The first, and the way traditionally shown, is this: once you have developed your UI and connected it to a database back end in some manner, you check for correct input on the user interface, such as blank text boxes, number ranges, or ensuring a radio or check box is selected, etc.
The second, and the way shown in database development, is to set check constraints on fields, such as no nulls allowed, unique values, and even ranges and required fields.
My dilemma is this. In modern languages like C# you can do general exception handling, and major-league fault tolerance is built into most databases like SQL Server, which commit data changes on an all-or-none basis. Fault tolerance at that level would be hard to program into anything but the simplest of programs.
So my question is: why not build all the requirements directly into the table at the database back end? Take advantage of the aforementioned fault tolerance, and just forget about programming if statements to ensure correct data is input; instead, just use a generic catch-all exception handler for when the data is not committed.
Perhaps that is how it is done; if so, I would really like to know for sure. If not, why? My preference is to avoid writing code whenever possible: less code, less debugging, and fewer problems when it comes to updating. So I would tend to go with the approach of letting the DB back end do the work. Is this the generally correct thing to do?
I know that general exception handling is considered "expensive" in terms of resources. But surely once you get past 5 or 10 if statements to handle different fields and their constraints, it must be more efficient code-wise to just use a general exception handler. It certainly seems easier to understand overall. (At least the way I do it.)
Thanks for your help with this.
OK, here is why you need it in both places.
First, the integrity of the data should be paramount, and data can be changed directly in database tables (deliberately, through a script to, say, update a million prices; by accident; or even by disgruntled or criminal employees trying to disrupt the database or steal from the company). Therefore it is reckless to avoid using constraints directly in the database, and it leads to bad data.
Now at the user interface level, you want to prevent the user from wasting his time submitting bad data, and you want to prevent the servers and networks from wasting their time trying to process it, so you write checks at that level. Plus you don't want the data in an inconsistent state if you need to insert to several tables and aren't using a transaction (which you should be using, but I suspect it happens less often than it should). Plus users hate it when you try the insert and it fails and tells them that X is wrong, and then they fix X and now Y is wrong, but it was wrong before; the process just didn't get as far as Y.
You do both.
Create constraints at the DB - level, and check for those constraints on the client level as well.
The validation on the DB makes sure that no invalid data gets in your DB, no matter how the data is inputted.
The validation at the client side improves the user-experience.
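A sketch of what "both" can look like from the C# side (the age range, the table, and the helper methods are made up; 547 is SQL Server's error number for constraint conflicts):

// Client side: fail fast with a friendly message.
if (age < 0 || age > 150)
{
    ShowValidationError("Age must be between 0 and 150."); // hypothetical helper
    return;
}

// DB side: a CHECK (Age BETWEEN 0 AND 150) constraint still guards the table,
// so bad data arriving by any other path is rejected too.
try
{
    SaveCustomer(name, age); // hypothetical INSERT wrapper using System.Data.SqlClient
}
catch (SqlException ex) when (ex.Number == 547) // constraint violation
{
    ShowValidationError("The database rejected the value.");
}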
You generally can't build all the logic for checking into the database. Also not validating user input sufficiently is a good way to open yourself up to attack.
One way to write less guard code in every method is 'Code Contracts', a project from Microsoft Research.
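A sketch of what that looks like (note that Contract.Requires<TException> only throws the named exception when the Code Contracts binary rewriter is enabled for the build):

using System;
using System.Diagnostics.Contracts;

public class Account
{
    public void Withdraw(decimal amount)
    {
        // Declarative precondition instead of a hand-written if/throw guard.
        Contract.Requires<ArgumentOutOfRangeException>(amount > 0);
        // ... withdrawal logic ...
    }
}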
All input should be validated both client and server side. Always.
Also, with a giant catch it would be hard to tell which field was in error, so you would end up writing a lot of "which field exploded" code at the other end.
While I generally advocate putting as much in the database as possible (which means that you can have a high degree of confidence about the "raw" data as possible), that isn't always possible, even with the powerful constraints and triggers available in SQL.
In addition, there are high-level "integrity" rules which may change over time, and it is not realistic to always encode temporally-dynamic conditions in constraints; e.g. all HR records since 2007 must have a non-NULL birthdate, prior ones are allowed to remain NULL, but no row can ever be set back to NULL.
My point is you can almost never put it all in the database.
Put the things in that you can, and put others at higher levels in the system. The database is a very important part of any system, but it isn't the only part. As long as its design helps it protect its perimeter and be able to provide reliable service and guarantee what it says it will guarantee so that other parts of the system can rely on their assumptions, then that's about the most you can ask for.
In addition to all the answers made here: UI validation dramatically improves UX and can completely change the "image" of your app, but validation on the DB exists to ensure correct data gets into the DB, while on the client it has to be done to ensure the client's data is entered correctly.
Consider an example of a standalone enterprise app. A client works from home; he filled in 20 invoices late at night on his notebook in Mongolia. The day after, he comes back and syncs them with his office SAP server. If the errors are only discovered during the sync of the data, you can imagine how awful that situation is.
Just an example. There could be plenty of others, I'm sure.
Good luck.
It's 2 years later and I have a decent amount of experience now. I am not going to accept my own answer as the right one, as many here have done a great job and I am very happy with their answers. But I want to add another important consideration that, looking back over my experience, has not been highlighted here. I also use Stack Overflow for reference as I progress, and I always find myself looking back over my questions and answers, which is another reason I wanted to add this. Like a note to my future self.
While working at that company, I was asked to build an app that would do job abc. With this I also had to build part of the database. As I was finishing with the company, I learned that they were writing another app which would use my database. Effectively my point is that, as many have pointed out, data is paramount, and you don't know how it is going to be accessed when you're gone.
I have also learned that there are 3 places that data needs to be verified:
on the actual database as explained
in the server-side code behind, which is not the same as the DB or client-side validation
on the client side
There is another worry. With the advent of new tech like tablets and smartphones, there is yet another place where validation has to be implemented: the same rules for a 4th time (unless it's a web app).
I later learned that prior to MVC we had CGI forms, which had something to do with handling data over the network (I humbly admit ignorance on the hardware side), but from what was explained to me it seems there may even be a 5th place to do validation (although I am open to being totally wrong about that).
I think the next guru in computer science will make a name for himself if he can find a way to abstract all that verification and validation to one place so that such rules don't have to be altered in a bunch of places.
worst case:
DB
Server side code
Client side code for web apps
What about if:
There may be a native client app (i.e. Windows, Linux or Mac (at least 6 now))
There may be various phone apps (Android, iPhone, and Windows Phone to name 3; at least 9 now)
There may be some CGI or whatever
This totals 10+ places without much exaggeration and there are other operating systems.
Even for a simple age range this is getting to be messy, but what if they bring out some new email format, or other complicated validation, or you have to change a bunch of validation rules? Now you have to modify them across at least 3 or 4 places, which in itself is bad.
The major problem with that is that you are modifying a lot of code and infrastructure that has been invested in, tested, and usually proven to work and delivered to the market...
As the number of client sides grows, modifying well-tested code can't be a good thing. I think this is going to be a major headache in the future. I wonder if there will be a design pattern or best practice to resolve it. If anyone knows of one, please tell me.

When is it appropriate to use error codes?

In languages that support exception objects (Java, C#), when is it appropriate to use error codes? Is the use of error codes ever appropriate in typical enterprise applications?
Many well-known software systems employ error codes (and a corresponding error code reference). Some examples include operating systems (Windows), databases (Oracle, DB2), and middle-ware products (WebLogic, WebSphere). What benefits do error codes provide? What are the disadvantages to using error codes?
WITHIN a program one should always use exceptions instead of error codes. However, exceptions can't propagate beyond a program. Any time the error must leave the program you are left with error messages or error codes.
For simple things that will always be human-operated, error messages without codes are fine. You can say "File not found" without giving it an error code. However, if it might be another computer on the other end, then you should give error codes in addition. You don't want to break the other system when you change the message to "File <x> not found".
I don't think I've ever used error codes in .Net except in one situation - when I was creating a console application that I knew was going to be called from another app. This other app had to know when the console app failed, and what went wrong. So, one example of when it would be appropriate would be when you know your program will be called by other programs, and you want a structured way for them to understand errors.
That said, I was a newbie to .NET at the time, and have never used error codes since.
As a side note, as a Windows guy, it's nice to be able to plop in an error code and come up with a KB article, so an error code combined with good documentation and the ability to find it = nice feelings from your users.
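For that console-app scenario, the process exit code is the natural channel; a sketch (the code values and RunJob are made up for illustration):

using System;
using System.IO;

static class Program
{
    static int Main(string[] args)
    {
        try
        {
            RunJob(args);   // hypothetical work method
            return 0;       // success, by convention
        }
        catch (FileNotFoundException)
        {
            return 2;       // documented code: required input file missing
        }
        catch (Exception)
        {
            return 1;       // documented code: any other failure
        }
    }

    static void RunJob(string[] args) { /* ... */ }
}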
Very common for web service interfaces. It's very easy and standard to return a code with a description.
I agree that for most scenarios this is old school.
I'd say the biggest disadvantage is code quality. You have to add more complex logic to manage error codes, while exceptions bubble up without having to use method parameters or return values.
You also have to add an "IF" to check whether the returned code is SUCCESS or not, while exceptions go directly to the error handling block.
I'm a newbie to stack overflow but...
I believe that error codes tend to be used, or are useful, for dealing with erroneous situations that require an end-user of sorts to get involved to rectify the situation. If your code is to be maintained by another developer, then exceptions are the way to go. However, in a situation where there is a problem:
in the environment that your application is running
with communication between your app and some other entity (web server, database, socket, etc)
that a device or device driver indicates (hardware failure maybe?)
then error codes may make sense. For example, if your app attempted to log into a database on behalf of your end-user, but the DB was unreachable for authentication (DB is off-line, cable is unplugged) then an error code/description combo might help the end-user rectify the problem.
Again at the developer/engineer level who will be able to touch the source code (traditional debugging and testing techniques) and modify it, use exceptions.
Hope this helps...
--jqpdev
I frequently use error codes when an error needs to be conveyed to the user, since they can be internationalized. For example, in a compiler, if there are errors in user code, errors can be signaled in the compiler backend, while the frontend can localize them into culture/language-specific strings for user consumption. Enums may be better for this purpose than raw integers, however.
I've also used them in creating an "error reporting" framework for the app. When exceptions were thrown, they were thrown with an error code, which, when the exception bubbled up, was sent (with a log) to the central server. The code helped organize the database so we could inspect logs related to a specific error.
Finally, as mentioned in a couple other answers, error codes are easy and language-agnostic to google (think Windows error codes/MS KB articles), so an error code with a description of what went wrong may be better for end-users of a technical product.
The idea of error codes is useful, but IMO they belong as exception members or as parameters to an IErrorReporter interface or something, more often than as method return values.
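A sketch of the exception-member flavour (names are illustrative):

using System;

// The code rides along on the exception instead of replacing it.
public class AppException : Exception
{
    public int ErrorCode { get; }

    public AppException(int errorCode, string message) : base(message)
    {
        ErrorCode = errorCode;
    }
}

// Thrower: throw new AppException(4711, "Cannot sign a new contract.");
// Catcher: log ex.ErrorCode centrally; show only the generic message to the user.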
Error codes are old-school. They are of little to no value at all.
The only possible value to an error code is that it can identify a very specific circumstance. You could have a code for each point in the code base that can throw an exception. This would allow you to narrow down very precisely what the problem must be.
But nobody cares about that level of detail. Who wants to maintain such a mess? It would leave you with codes that meant something like "condition A and B but not C due to state S". It's more effort than it's worth to try to work out exactly what that means. A stack trace will be more valuable in telling you where in the program the problem occurred.
I learned to program computers before exceptions were a widespread technique. I'm so glad we got exceptions instead!
C#, and probably Java too, supports a better exception handling control flow, the finally keyword, which makes things a little nicer than using error codes. An exception object can contain any level of detail, certainly much more than an error code. So the exception object is way more practical, but you might run into an uncommon case where an error code would be more appropriate.
FWIW, C++ also supports exception objects. I don't think that C++ supports a finally keyword (though the newer C++ whatevers just might), but in C++ you also have to avoid things like returning inside a catch handler.
Error codes were designed in an age where the only way for a function to tell the caller that something went wrong was to assign a special meaning to one or more values of those which can be returned, and very frequently only a native integer or so was available for returning that special value.
For instance, in C the "get character" routine returns the next character value in ASCII, but returns a negative value (EOF) if for some reason something went wrong. You are then responsible for returning to YOUR caller in a way that lets this error situation be handled, and that caller must return too, etc.
The Exception mechanism is an elegant way to handle this "this is an emergency, we must return from code until something can deal with the problem". Error codes are inferior to this.
I've written many web services that are consumed by other (remote) applications. When things go badly with a request, customers more or less insist on getting a code, so that they don't have to do some horrific string comparison to find out what went wrong.
Take HTTP result codes as a fine example of this sort of behavior. "200" means happy, "300" could go either way, "400" or "500" means start freaking out.
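In a web service, that typically just means a stable code next to the human-readable text; a sketch of such a fault payload (the field names and the example code are arbitrary):

// Callers can branch on Code without string-matching Description.
public class ServiceError
{
    public int Code { get; set; }          // e.g. 1001 = validation failed (made up)
    public string Description { get; set; }
}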
Error codes are for if you want to send them to the user. If not, use an exception.
Sometimes you don't want to give too much information to the user when an error occurs. For example, a user is not able to sign a new contract. The error message only states something generic like "Cannot sign a new contract".
This adds difficulty to support cases where the user thinks this is not correct. If you have an error code, for example a number or an acronym, it could be part of the error message. The user wouldn't know what it means but the support staff could look it up and could then check if that specific reason for declining the new contract is indeed an error or not.

Verifying that something has been "done" through hashes/encryption

So, to start off, I want to point out that I know that these things are never fool-proof and if enough effort is applied anything can be broken.
But: Say I hand a piece of software to someone (that I have written) and get them to run it. I want to verify the result that they get. I was thinking of using some sort of encryption/hash that I can use to verify that they've run it and obtained a satisfactory result.
I also don't want the result to be "fakeable" (though again, I know that if enough effort to break it is applied etc etc...). This means therefore, that if I use a hash, I can't just have a hash for "yes" and a hash for "no" (as this means the hash is going to be only one of 2 options - easily fakeable).
I want the user of the tool to hand something back to me (in possibly an email for example), something as small as possible (so for example, I don't want to be trawling through lines and lines of logs).
How would you go about implementing this? I possibly haven't explained things the greatest, but hopefully you get the gist of what I want to do.
If anyone has implemented this sort of thing before, any pointers would be much appreciated.
This question is more about "how to implement" rather than specifically asking about code, so if I've missed an important tag please feel free to edit!
I think what you're looking for is non-repudiation. You're right, a hash won't suffice here - you'd have to look into some kind of encryption and digital signature on the "work done", probably PKI. This is a pretty wide field, I'd say you'll need both authentication and integrity verification (e.g. Piskvor did that, and he did it this way at that time).
To take a bird's eye view, the main flow would be something like this:
On user's computer:
run process
get result, add timestamp etc.
encrypt, using your public key
sign, using the user's private key (you may need some way to identify the user here - passphrases, smart cards, biometrics, ...)
send to your server
On your server:
verify signature using the user's public key
decrypt using your private key
process as needed
Of course, this gets you into the complicated and wonderful world that is Public Key Infrastructure; but done correctly, you'll have a rather good assurance that the events actually happened the way your logs show.
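A sketch of just the sign/verify leg in C# (key distribution, the encryption step, and certificate handling are all omitted; one key object stands in for both sides purely for brevity):

using System;
using System.Security.Cryptography;
using System.Text;

byte[] payload = Encoding.UTF8.GetBytes("result|timestamp|host|user");

using (var userKey = new RSACryptoServiceProvider(2048))
{
    // On the user's machine: sign with the user's private key.
    byte[] signature = userKey.SignData(payload, "SHA256");

    // On your server: verify with the user's public key.
    bool genuine = userKey.VerifyData(payload, "SHA256", signature);
    Console.WriteLine(genuine); // False if payload or signature was altered
}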
I'm pasting in one of your comments here, because it goes to the heart of the matter:
Hi Eric. I should have pointed out that the tool isn't going out publically, it will go to a select list of trusted users. The tool being disassembled isn't an issue. I'm not really bothered about encryption, all I need to do is be able to verify that they ran a specific process and got a legitimate result. The tool verifies stuff, so I don't want them to just assume that something works fine and not run the tool.
So basically, the threat we're protecting against is lazy users, who will fail to run the process and simply say "Yes Andy, I ran it!". This isn't too hard to solve, because it means we don't need a cryptographically unbreakable system (which is lucky, because that isn't possible in this case, anyway!) - all we need is a system where breaking it is more effort for the user than just following the rules and running the process.
The easiest way to do this is to take a couple of items that aren't constant and are easy for you to verify, and hash them. For example, your response message could be:
System Date / Time
Hostname
Username
Test Result
HASH(System Date / Time | Hostname | Username | Test Result)
Again, this isn't cryptographically secure - anyone who knows the algorithm can fake the answer - but as long as doing so is more trouble than actually running the process, you should be fine. The inclusion of the system date/time protects against a naive replay attack (just sending the same answer as last time), which should be enough.
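A sketch of that response message in C# (SHA-256 standing in for HASH(), "PASS" as a placeholder test result):

using System;
using System.Security.Cryptography;
using System.Text;

string payload = DateTime.UtcNow.ToString("o") + "|" + Environment.MachineName
               + "|" + Environment.UserName + "|" + "PASS";

using (var sha = SHA256.Create())
{
    byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(payload));
    // The user mails back the payload plus this hex digest; you recompute to verify.
    Console.WriteLine(payload);
    Console.WriteLine(BitConverter.ToString(digest).Replace("-", ""));
}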
How about you take the output of your program (either "yes" or "no"?), and concatenate it with a random number, then include the hash of that string?
So you end up with the user sending you something like:
YES-3456234
b23603f87c54800bef63746c34aa9195
This means there will be plenty of unique hashes, despite only two possible outputs.
Then you can verify that md5("YES-3456234") == "b23603f87c54800bef63746c34aa9195".
If the user is not technical enough to figure out how to generate an md5 hash, this should be enough.
A slightly better solution would be to concatenate another (hard-coded, "secret") salt in order to generate the hash, but leave this salt out of the output.
Now you have:
YES-3456234
01428dd9267d485e8f5440ab5d6b75bd
And you can verify that
md5("YES-3456234" + "secretsalt") == "01428dd9267d485e8f5440ab5d6b75bd"
This means that even if the user is clever enough to generate his own md5 hash, he can't fake the output without knowing the secret salt as well.
Of course, if he is clever enough, he can extract the salt from your program.
If something more bullet-proof is needed, then you're looking at proper cryptographic signature generation, and I'll just refer you to Piskvor's answer, since I have nothing useful to add to that :)
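For completeness, a sketch of the salted check described above (MD5 to match the example; "secretsalt" is the hard-coded salt that never appears in the output):

using System;
using System.Security.Cryptography;
using System.Text;

static string Signature(string output)
{
    using (var md5 = MD5.Create())
    {
        byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(output + "secretsalt"));
        return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
    }
}

// The tool prints "YES-3456234" and Signature("YES-3456234");
// you verify by recomputing the signature over the reported output.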
In theory this is possible by using some sort of private salt and a hashing algorithm, basically a digital signature. The program has a private salt that it adds to the input before hashing. Private means the user does not have access to it; you, however, do know the salt.
The user sends you his result and the signature generated by the program. So you can now confirm the result by checking if hash(result + private_salt) == signature. If it is not, the result is forged.
In practice this is almost impossible, because you cannot hide the salt from the user. It's basically the same problem that is discussed in this question: How do you hide secret keys in code?
You could make the application a web app to which they have no source code access or access to the server on which it runs. Then your application can log its activity, and you can trust those logs.
Once a user has an executable program in their hands, then anything can be faked.
It's worth noting that you aren't really looking for encryption.
The "non-repudiation" answer is almost on the money, but all you really need to guarantee where your message has come from is to securely sign the message. It doesn't matter if someone can intercept and read your message, as long as they can't tamper with it without you knowing.
I've implemented something similar before: information was sent plaintext - because it wasn't confidential - but an obfuscated signing mechanism meant that we could be (reasonably) confident that the message was generated by our client software.
Note that you can basically never guarantee security if the app is on someone else's hardware - but security is never about "certainty", it's about "confidence" - are you confident enough for your business needs that the message hasn't been tampered with?
