RavenDB: Object Field Value Being Rounded At Random

RavenDB: Object Field Value Being Rounded At Random - c#

First up, I want to apologise for the somewhat ambiguous question title, but I have literally no idea how else to describe this bizarre issue. Effectively, I have a document store inside of RavenDB, and despite trying to change it, there seems to be weird constraints that cause it to randomly round it to different values.
This is best shown through this gif I made of it:
Within the C# class that this is being created from, it's being stored as a ulong, and everything is right within the code until it's being loaded. At which point I have an ID mismatch. As you can see, the document name is the ID I am trying to paste in, they are meant to match; but I am being hit with this very weird glitch instead.
Does anyone have an idea?

The underlying issue lies with JavaScript numbers.
All JS numbers are actually double, which means that they are good for integer numbers up until 2^53 or so. Beyond that, you start to lose precision.
What is actually happening here is that the browser is loading the document and saving it back. This goes through the JS engine, which cause lose of precision of the number.
The problem occurs only if you are updating the documents through the studio.
In the latest version of RavenDB, this will be detected and you'll get a warning:

Related

What is the quality of Random class implementation in .NET?

I have two questions regarding implementation of Random class in .NET Framework 4.6 (code available here):
What is the rationale for setting Seed argument to 1 at the end of the constructor? It seems to be copy-pasted from Numerical Recipes in C (2nd Ed.) where it made some sense, but it doesn't have any in C#.
It is directly stated in the book (Numerical Recipes in C (2nd Ed.)) that inextp field is set to value 31 because:
The constant 31 is special; see Knuth.
However, in the .NET implementation this field is set to value 21. Why? The rest of a code seems to closely follow the code from book except for this detail.

Regarding the intexp issue, this is a bug, one which Microsoft has acknowledged and refused to fix due to backwards compatibility concerns.
Indeed, you have discovered a genuine problem with the Random implementation.
We have discussed it within the team and with some of our partners and concluded that we unfortunately cannot fix the problem right now. The reason is that some applications rely on the fact that when initialised with the same seed, the generator produces the same pseudo random sequence. Even if the change is for the better, it will break the applications that made this assumption once they have migrated to the “fixed” version.

For some more context:
A while back I fully analysed this implementation. I found a few differences.
A the first one (perfectly fine) is a different large value (MBIG). Numerical Recipies claims that Knuth makes it clear that any large value should work, so that is not an issue, and Microsoft reasonably chose to use the largest value of a 32 bit integer.
The second one was that constant, you mentioned. That one is a big deal. In the minimum it will substantially decrease period. There have been reports that the effects are actually worse than that.
But then comes one other particularly nasty difference. It is literally guarenteed to bias the output (since it does so directly), and will also likely affect the period of the RNG.
So what is this second issue? When .NET first came out, Microsoft did not realize that the RNG they coded was inclusive at both ends, and they documented it as exclusive at the maximum end. To fix this, the security team added a rather evil line of code: if (retVal == MBIG) retVal--;. This is very unfortunately as the correct fix would literally be only 4 added characters (plus whitespace).
The correct fix would have been to change MBIG to int.MaxValue-1, but switch Sample() to use MBIG+1 (i.e. to keep using int.MaxValue). That would guarantee the that Sample has the range [0.0, 1.0) without introducing any bias, and only changes the value of MBIG which Numerical Recipies said Knuth said is perfectly fine.

User defined formulas, evaluation, and speed

Lets say we have an array of a million elements, and we want to accept user input on what kind of math to apply to each of those elements.
Would the program need to evaluate the user formula (string) a million times, for each of the elements. Or can the formula itself somehow be saved and interpreted only once? And then be applied in a loop to the million elements?
I'm just trying to get a general idea on how this works. Because right now it looks like excel type programs are always interpreted which is way too slow for my data research. So basically what I'm doing is re-compiling my user defined math each and every time I change it. And there seems to be no way of compiling a program, and recompiling it while it itself is running. So that it could accept user input, compile it, and run it, without quitting. Maybe with DLLs and seperate app domains, but then all data has to be marshalled.
I just don't get how one gets top speed for data research, when everything is going against you. Including the stupid operating system that can't allow the DLL to be unloaded, loaded, because of security concerns. Or something. However maybe I'm just spurting nonsense, since I just began programming a few months ago.

In case you are interested in, or prefer to implement the solution yourself, you could parse the formula into a (binary) tree, where each node represents an operator (+, -, *,..) or a function (cos(), max(),..) and each leaf represents either a constant or one of the formula's variables.
This can be done in a pretty efficient way and then applied to the entire data set by recursively calculating the value of the root while filling in the correct variables at the leaves.
To figure out how best to do this, this question might provide some pointers (though I am fairly certain the answer provided there is only one of several ways):
Create Binary Tree from Mathematical Expression
Hope this helps or at least is of some interest to other readers!

If you want user input against a data set and expressions compiled on the fly, then check out a suitable C# REPL.
You should be able to sumply type in your transformations etc., and have the REPL compile and apply them. It's a useful tool/technique for trying out solutions prior to coding them up properly in classes/components.

What is the profession / preferred way to handle input validation

I am new to C# and SQL. But over the last few years while learning both in college a question really begins to burn inside me. Here it is:
It seems to me that there are really two very generic ways to handle input validation (i.e. checking for required fields, and data in the correct ranges ect).
The first, and the way shown traditionally is: Once you develop your UI, and have connected it to a database back end in some manner. On the user interface, you check for correct input, such as blank text boxes, number ranges, or to ensure a radio or check box is selected ect.
The second, and the way shown in database development is: To set check constraints on fields such as no nulls allowed, unique values, and even ranges and required fields.
My dilema is this. Given that in modern languages like C# you can do general execption handling, and also given that major league fault tolerance is built into most databases like SQL Server with regard to handling data changes in respect to committing all or none. Details like this, and to this level, would be hard to program in anything but the simplest of programs.
So my question is, why not build all the requirements directly into the table at the database back end. Take advantage of the aformentioned fault tolerance, and just forget about programming if statements to ensure correct data is input, and instead just use a generic catch all execption handler if the data is not committed.
Perhaps that is how it is done, if so I would really like to know for sure. If not, why? My preference is to avoid writing code whenever possible. Less code, less debugging, and less problems when it comes to updating. So I would tend to go with that approach of letting the DB back end do the work. Is this the generally correct thing to do.
I know that general execption handling is considered "expensive" in terms of resources. But surley once you get past 5 or 10 if statements to handle different fields and their constraints, it must be more efficient code wise to just do a general execption handler. It certantly seems easier to understand overall. (At least the way I do it).
Thanks for your help with this.

OK, here is why you need it in both places.
First the integrity of the data should be paramount and data can be changed directly in database tables (deliberately through a script to say update a million prices or by accident or even by disgruntled or criminal employees trying to disrupt the database or steal from the company). Therefore is it reckless to avoid using constraints directly in the database and it leads to bad data.
Now at the user interface level, you want to prevent the user from wasting his time submitting bad data and you want to prevent the servers and networks from wasting their time trying to process it, so you write checks at that level. Plus you don't want the data in an inconsistent state if you need to insert to several tables and aren't using a transction (which you should be using but I would suspect it happens less often than it should.) Plus the users hate it when you try the insert and it fails and tells you that X is wrong and then they fix X and now Y is wrong but it was wrong before, the process just didn't get as far as Y before.

You do both.
Create constraints at the DB - level, and check for those constraints on the client level as well.
The validation on the DB makes sure that no invalid data gets in your DB, no matter how the data is inputted.
The validation at the client side improves the user-experience.

You generally can't build all the logic for checking into the database. Also not validating user input sufficiently is a good way to open yourself up to attack.
One way to write lesss guard code in every method is 'Code Contracts' a product of microsoft research.
All input should be validated both client and server side. Always.
Also with a giant catch it would be hard to tell which field was in error. So you would end up writing a lot of which field exploded code at the other end.

While I generally advocate putting as much in the database as possible (which means that you can have a high degree of confidence about the "raw" data as possible), that isn't always possible, even with the powerful constraints and triggers available in SQL.
In addition, there are high-level "integrity" things which may change over time, and it is not realistic to always have temporally-dynamic conditions in constraints. i.e. all HR records since 2007 must have a non-NULL birthdate, but prior ones are allowed to remain NULL, but any row cannot ever be set back to NULL.
My point is you can almost never put it all in the database.
Put the things in that you can, and put others at higher levels in the system. The database is a very important part of any system, but it isn't the only part. As long as its design helps it protect its perimeter and be able to provide reliable service and guarantee what it says it will guarantee so that other parts of the system can rely on their assumptions, then that's about the most you can ask for.

In addition to all answers made here, like that UI control improves drammaticaly UX for the user, and can completely change "image" of your app, that validation on DB is made for correct insert the data to DB, but on client it have to be done for correct insert of the client data.
Consider an example of standalone enterprise app. A client work at home, he filled 20 invoices late night on his notebook in Mongolia. The day after he came back, and sync it with his office SAP server. If the error will be figure out only during sync of the data, you can imagine what awful is this situation.
Just an example. There could a plenty of others, I'm sure.
Good luck.

Its 2 years later and I have a decent amount of experience now. I am not going to accept my answer as the right one as many here have done a great job and I am very happy with their answers. But I want to add another important consideration that looking back over my experience has not been highlighted here. I also use stack overflow for reference as I progress and I always find myself looking back over my questions and answers which is another reason I wanted to add this. Like a note to my future self.
While working at that company, I was asked to build an app that would do job abc. With this I also had to build part of the database. As I was finishing with the company I learned that they were writing another app which would use my database. Effectively my point is, that as many have pointed out, data is paramount, and you don't know how it is going to be accessed when you're gone.
I have also learned that there are 3 places that data needs to be verified:
on the actual database as explained
on the server side code behind which is not the same as the DB or client side validation
on the client side
There is another worry. With the advent of new tech like tablets and smart phones. This is yet another place where validation has to be implemented. The same rules for a 4th time (unless its a web app).
I later learned that prior to MVC we had CGI forms which had something to do with handling data over the network (I humbly admit ignorance on hardware side) but from what was explained to me it seems there may even be a 5th place to do validation (although I am open to being totally wrong about that).
I think the next guru in computer science will make a name for himself if he can find a way to abstract all that verification and validation to one place so that such rules don't have to be altered in a bunch of places.
worst case:
DB
Server side code
Client side code for web apps
What about if:
There may be a native client app (i.e. windows, linux or mac (at least 6 now))
There may be various phone apps (android, iPhone, and win phone to name 3, at least 9 now))
There may be some CGI or whatever
This totals 10+ places without much exaggeration and there are other operating systems.
Even for a simple age range this is getting to be messy, but what if they bring out some new email format, or other complicated validation, or you have to change a bunch of validation rules. Now you have to modify them across at least 3 or 4 places which in itself is bad.
The major problem with that is that you are modifying a lot of code and infrastructure that has been invested in, tested, and usually proven to work and delivered to the market...
As the number of client sides grow, modifying well tested code, can't be a good thing. I think this is going to be a major headache for the future. I wonder if there will be a design pattern or best practice to resolve it. If anyone knows of one, please tell me.

Stopping integer overflow in ASP.NET

We use Acunetix at work to do security scans on our applications. Recently we came across the following Integer Vulnerabilities error below:
From what I can tell, it looks like the report is telling us that we are not stopping integer overflow attacks within querystrings. While we do use querystrings that eventually resolve to integers, they are first encrypted and then decrypted and converted to int using Convert.ToInt32() before we use them. I know that we should use TryParse() instead, but even if a hacker were to enter an integer value higher than max, they would fail when trying to decrypt before even trying to convert to integer, which is where the integer overflow would occur in my opinion. That is unless the error happens when the decryption fails?
I'm pretty confused about this and google searches haven't been much help as most pertain to unmanaged languages like c++ rather than c# and asp.net. Any help would be much appreciated.

I don't think this is an integer overflow vulnerability, I suspect it refers to integers as this is the type that has been manipulated in the querystring (although I know you said they were encrypted). If you're doing a direct conversion of untrusted user input to an integer and not first validating the type (as you say, TryParse it first), you're probably going to throw an internal error (short of any try/catches) and this is what they're likely picking up on.
Automated scanners go a bit nuts over HTTP 500 errors. They don't know what's actually happening under the covers and how severe the error is so you could argue that it's a false-positive. On the other hand, your security folks will argue that websites readily returning HTTP 500 are more likely to be probed further if a bot picks up that you're regularly throwing these errors out as a result of manipulated querystrings.
Easy answer: "All input must be validated against a whitelist of acceptable value ranges." (from here).

Unable to view values of variables while debugging

I'm trying to debug portions of the current application I'm working on, however when I try and check the value of a property/variable I get the error:
Cannot evaluate expression because a thread is stopped at a point where garbage collection is impossible, possibly because the code is optimized.
This is just a regular ASP.NET project. In some portions of the application I can view the properties and variables perfectly fine. I haven't figured out what's different about the blocks of code that I can and can not see the values of the variables in.

The problem was documented on an MSDN blog, as being a size limitation of certain types in certain situations, more details in the link. I believe it was 256 bytes and/or the total size/count of the number of arguments passed to a function. Sorry to say there does not seem to be a quick fix, but hopefully the MSDN blog entry will help you identify a way to solve your problem.

This article, Rules of Funceval gives a number of reasons why this can occur. If debugging is turned on and optimisation turned off already, there doesn't seem to be much else you can do about this problem.

Are you making release builds? Try changing the configuration to "debug" and see if it improves.

We have the same problem in two of our WinForm user controls. In both cases the user controls contain a lot of business logic (2000 and 3000 lines of code respectively) and make use of multiple fairly heavy objects (they have 30+ properties that get populated automatically from the database the first time when one of the properties are accessed). When you try to step through the (somewhat complicated) validation and saving methods, you get this same message when trying to access object properties.
We have come to the conclusion that the size and complexity of the user control combined with the size and complexity of the objects used and conditional database access just becomes too much for the debugger to handle and that we should probably just do some major refactoring to move most of the business logic out of the user control. It would be interesting to know if your problem arises from the same kind of situation and whether doing the said kind of refactoring actually does make a difference (we have not had the time and/or courage :) to do so).

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.