Working with very large integers in C#

Does anybody know of a way I can work with very large integers in C#?
I am trying to calculate the factorial of numbers e.g.
5! = 5*4*3*2*1 = 120
With small numbers this is not a problem, but trying to calculate the factorial of the biggest value of an unsigned int, which is 4,294,967,295, doesn't seem possible.
I have looked into the BigInteger class, but it doesn't seem to do what I need.
Any help would be greatly appreciated.

To calculate the factorial of uint.MaxValue you'd need a lot of storage.
For example, the Wikipedia article gives 1,000,000! as 8.2639316883... × 10^5,565,708, and uint.MaxValue is over four thousand times larger than that input. You're going to gain information like crazy.
I strongly suspect you're not going to find any way of calculating it on a sane computer in a sane amount of time. Why do you need this value? Would Stirling's approximation be close enough?

Firstly, it's worth pointing out that the factorial of uint.MaxValue is astronomically large. I'm not able to find a good estimate of the order of magnitude of its factorial, but its bit representation will probably occupy a high percentage of a standard machine's RAM, if not well exceed it.
A BigInteger class seems to be what you want, providing you only want to go up to around 1,000,000 or so (very roughly). After that, time and memory become very prohibitive. In current (stable) versions of .NET, up to 3.5, you have to go with a custom implementation. This one on the CodeProject seems to be highly rated. If you happen to be developing for .NET 4.0, the Microsoft team have finally gotten around to including a BigInteger class in the System.Numerics namespace of the BCL. Unlike some BigInteger implementations, the one existing in .NET 4.0 doesn't have a built-in factorial method (I'm not sure about the CodeProject one), but it should be trivial to implement one - an extension method would be a nice way.
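For instance, a rough sketch of such an extension method over the .NET 4.0 type (a naive loop just to show the shape, not an optimized algorithm) could be:

using System.Numerics;

static class BigIntegerExtensions
{
    // Naive factorial: multiply 2..n. Fine for demonstration, slow for huge n.
    public static BigInteger Factorial(this BigInteger n)
    {
        BigInteger result = BigInteger.One;
        for (BigInteger i = 2; i <= n; i++)
            result *= i;
        return result;
    }
}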
Since you seem to think you don't want to use a BigInteger type, it would be helpful if you could verify that it's not what you want after reading my reply, and then explain precisely why it doesn't suit your purposes.

4294967295! ≈ 10^(10^10.597) ≈ 10^(40,000,000,000)
This value would require about 40 GB of RAM just to store, even if you do find a BigInteger implementation for C#!
P.S. Well, with optimized storage, say 9 digits in 4 bytes, it will still take ~18 GB of RAM.
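As a quick sanity check on those figures: by Stirling's approximation, log10(n!) ≈ n·log10(n) − n/ln(10), and with n = 4,294,967,295 that is roughly 4.29×10^9 × 9.63 − 1.87×10^9 ≈ 4.0×10^10 decimal digits. At one byte per digit that is the ~40 GB above, and packing 9 digits into 4 bytes gives the ~18 GB.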

Why do you think that you need to calculate those factorials? It's not practically useful for anything to do the actual calculation.
Just the result of calculating factorial of (2^32-1) would take up a lot of space, approximately 16 GB.
The calculation itself will of course take a lot of time. If you build the program so that you can transfer the calculation process to faster hardware as it is invented, you should be able to get the result within your lifetime.
If it's something like an Euler problem that you are trying to solve, consider that a lot of solutions are found by eliminating what you don't actually have to calculate in order to get the answer.

Here. The fastest one, straight from the Factorial Man - Peter Luschny.

You can use the BigInteger class from the J# libraries for now. Here's an article on how. It makes deployment harder because you have to send out the J# redistributable. You can also consider going to VS2010 beta as Framework 4.0 will have BigInteger.

In case you have J# redist installed, an alternative way would be using java.math.BigInteger by adding a reference to the vjslib assembly.

Try using an array for this task. You can work with integers as long as you have free memory space. Every member of the array represents one decimal digit. The only thing you need is to implement multiplication.
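A rough sketch of that approach (illustrative names, least-significant digit first), multiplying a digit array by a small integer with a manual carry:

// Each element holds one decimal digit, least significant first.
static byte[] MultiplyBySmall(byte[] digits, int factor)
{
    var result = new System.Collections.Generic.List<byte>();
    int carry = 0;
    foreach (byte d in digits)
    {
        int product = d * factor + carry;
        result.Add((byte)(product % 10));   // keep the low digit
        carry = product / 10;               // carry the rest
    }
    while (carry > 0)
    {
        result.Add((byte)(carry % 10));
        carry /= 10;
    }
    return result.ToArray();
}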

If you are doing calculations with factorials, like combinations for example, you rarely need to multiply all the way down to 1 (e.g. 99 * 98 * 97, since everything else cancels out), as sketched below.
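As a rough illustration of that idea (a hypothetical helper, not from the answer above), n!/(n−k)! only needs the top k factors:

using System.Numerics;

// FallingFactorial(99, 3) computes 99 * 98 * 97 without touching the rest of 99!.
static BigInteger FallingFactorial(int n, int k)
{
    BigInteger result = BigInteger.One;
    for (int i = n; i > n - k; i--)
        result *= i;
    return result;
}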

Related

Creating a new C# numeric field that's bigger than the default numbers

We've been trying to implement a new number type that can be any number of bytes, because we need a lot of precision during calculations. On top of that, we're working in Unity Burst, which at the moment does not support anything bigger than a float.
We've been doing this by using a byte array and implementing binary addition, subtraction, multiplication and division. And this is partly working.
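Simplified, the addition part of that scheme looks something like this (a simplified sketch of the general idea, not our exact code):

// Little-endian byte arrays, schoolbook addition with a manual carry.
static byte[] Add(byte[] a, byte[] b)
{
    int len = System.Math.Max(a.Length, b.Length);
    var result = new byte[len + 1];
    int carry = 0;
    for (int i = 0; i < len; i++)
    {
        int sum = (i < a.Length ? a[i] : 0) + (i < b.Length ? b[i] : 0) + carry;
        result[i] = (byte)(sum & 0xFF);
        carry = sum >> 8;
    }
    result[len] = (byte)carry;
    return result;
}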
The problem we're running into is that the performance is way slower than the integer math.
So it will produce the following:
4*3 run 100,000 times with ints takes about 0-1 milliseconds
4*3 run 100,000 times with our number takes about 100 milliseconds
So the question is whether we could use the same kind of implementation by inheriting from or copying the integer math code, or even just get a look at how it's implemented. I looked in the C# source code but can't quite find the actual math.
Any help or ideas would be appreciated.
As Tim Rutter suggested in the comments, take a look at the BigInteger class in .NET docs.
It isn't supported in the old Mono C# 2.0 runtime, but it is supported since the .NET 4.x runtime in Unity 2017.1.
To use it with the Burst compiler, you can convert the BigInteger instance to a byte array, as shown here. I guess it would be similar to your workflow, except that you have to convert it back to BigInteger when making calculations.
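For reference, that round trip uses BigInteger.ToByteArray and the byte[] constructor; a minimal sketch:

using System.Numerics;

static byte[] ToBytes(BigInteger value)
{
    // Little-endian two's-complement bytes; this is what you would hand to the Burst side.
    return value.ToByteArray();
}

static BigInteger FromBytes(byte[] bytes)
{
    // Convert back to BigInteger before doing any arbitrary-precision math on it.
    return new BigInteger(bytes);
}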
If that doesn't work, I suggest taking a look at the first answer here, as it provides a solution for a custom BigInteger class (I haven't tried or tested it)

Which optimization algorithm should I use for maximizing profit with time limitations?

I would like to find an appropriate algorithm to solve this problem:
Suppose we have N projects, and we know how much money we will earn from each project and how much time each project requires (an estimated time for each project). We have a certain amount of available time in which to do the projects. We want to select projects so that our profit is maximized and the overall estimated time does not exceed the available time. Can you please advise which optimization algorithm I should use? Are there any ready-made implementations that I could use in C#/.NET or Java?
This sounds like a straightforward knapsack problem:
Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
In your case, the weight is the time required for the projects, and the limit is the time limit.
Normally, if you are doing this for the real world, brute force would suffice for small cases, and a greedy approximation with some randomization should be good enough for large cases if you don't really need the exact maximum. However, I doubt anyone would use such a strict model for the real world.
If it's of theoretical interest: the knapsack problem is NP-hard and an active field of algorithms research.
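For illustration, the classic dynamic-programming formulation in C# (a sketch assuming integer times; time plays the role of weight):

// best[t] = maximum profit achievable using at most t units of time.
static int MaxProfit(int[] time, int[] profit, int availableTime)
{
    var best = new int[availableTime + 1];
    for (int i = 0; i < time.Length; i++)
        for (int t = availableTime; t >= time[i]; t--)   // backwards so each project is taken at most once
            best[t] = System.Math.Max(best[t], best[t - time[i]] + profit[i]);
    return best[availableTime];
}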
What you're looking for is called the Knapsack problem.
In your case the "weight" limit is the time limit, and the value is the project's profit.
Simplified, this looks like a weighted knapsack problem (http://en.wikipedia.org/wiki/Knapsack_problem). Your time would be the size of the project and your weight would be your costs.
Coming at this problem from an Operations Research perspective, you are looking at some form of mixed-integer program (MIP). The knapsack approach might be sufficient, but without more specifics on the problem I can't suggest a more detailed formulation.
Once you've decided on your formulation, there are a couple of C# options available to solve the MIP. Microsoft has the Microsoft Solver Foundation, which is capable of solving simple MIPs and has a nice C# API.
IBM recently purchased the OPL optimization package (considered industry leading) that you can use to develop your MIP formulation. Once you have the formulation OPL offers .NET APIs that you can call to run your models.
Having gone the OPL route myself, I would avoid using the OPL .NET APIs if possible because they are very cumbersome. If your problem is simple, you may want to look into Solver Foundation instead, because it offers a modern and clean API compared to OPL.

.NET Neural Network or AI for Future Predictions

I am looking for some kind of intelligent (I was thinking AI or neural network) library that I can feed a list of historical data, and it will predict the next sequence of outputs.
As an example I would like to feed the library the following figures 1,2,3,4,5
and based on this, it should predict the next sequence is 6,7,8,9,10 etc.
The inputs will be a lot more complex and contain much more information.
This will be used in a C# application.
If you have any recommendations or warnings, that would be great.
Thanks
EDIT
What I am trying to do is use historical sales data to predict what amount a specific client is most likely going to spend in the next period.
I do understand that there are dozens of external factors that can influence a client's purchases, but for now I need to base it merely on the sales history and then plot a graph showing past sales and predicted sales.
If you're looking for a .NET API, then I would recommend you try AForge.NET http://code.google.com/p/aforge/
If you just want to try various machine learning algorithms on a data set that you have at your disposal, then I would recommend that you play around with Weka; it's (relatively) easy to use and it implements a lot of ML/AI algorithms. Run multiple runs with different settings for each algorithm and try as many algorithms as you can. Most of them will have some predictive power and if you combine the right ones, then you might really get something useful.
If I understand your question correctly, you want to approximate and extrapolate an unknown function. In your example, you know the function values
f(0) = 1
f(1) = 2
f(2) = 3
f(3) = 4
f(4) = 5
A good approximation for these points would be f(x) = x+1, and that would yield f(5) = 6... as expected. The problem is, you can't solve this without knowledge about the function you want to extrapolate: Is it linear? Is it a polynomial? Is it smooth? Is it (approximately or exactly) cyclic? What is the range and domain of the function? The more you know about the function you want to extrapolate, the better your predictions will be.
I just have a warning, sorry. =)
Mathematically, there is no reason for your sequence above to be followed by a "6". I can easily give you a simple function whose next value is any value you like. It's just that humans like simple rules, and therefore tend to see a connection in these sequences that in reality is not there. Therefore, this is an impossible task for a computer if you do not feed it additional information.
Edit:
If you suspect your data has a known functional dependence with some uncontrollable outside factors on top, maybe regression analysis will give good results. To start easy, look at linear regression first.
If you cannot assume linear dependence, there is a nice application that looks for functions fitting your historical data... I'll update this post with its name as soon as I remember. =)
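As a starting point for the linear-regression suggestion, a minimal ordinary least-squares fit in C# (illustrative only: fit sales = a + b * period to the historical pairs, then evaluate a + b * nextPeriod for the prediction):

static (double a, double b) FitLine(double[] x, double[] y)
{
    int n = x.Length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < n; i++)
    {
        sumX += x[i]; sumY += y[i];
        sumXY += x[i] * y[i]; sumXX += x[i] * x[i];
    }
    double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);   // slope
    double a = (sumY - b * sumX) / n;                                   // intercept
    return (a, b);
}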

Big integers in C#

Currently I am borrowing java.math.BigInteger from the J# libraries as described here. Having never used a library for working with large integers before, this seems slow, on the order of 10 times slower, even for ulong length numbers. Does anyone have any better (preferably free) libraries, or is this level of performance normal?
As of .NET 4.0 you can use the System.Numerics.BigInteger class. See documentation here: http://msdn.microsoft.com/en-us/library/system.numerics.biginteger(v=vs.110).aspx
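A minimal usage example (computing 100!, which has 158 digits and overflows every native integer type):

using System;
using System.Numerics;

class Demo
{
    static void Main()
    {
        BigInteger f = BigInteger.One;
        for (int i = 2; i <= 100; i++)
            f *= i;                    // BigInteger never overflows; it just grows
        Console.WriteLine(f);
    }
}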
Another alternative is the IntX class.
IntX is an arbitrary precision integers library written in pure C# 2.0 with fast - O(N * log N) - multiplication/division algorithms implementation. It provides all the basic operations on integers like addition, multiplication, comparing, bitwise shifting etc.
F# also ships with one. You can get it at Microsoft.FSharp.Math.
The System.Numerics.BigInteger class in .NET 4.0 is based on Microsoft.SolverFoundation.Common.BigInteger from Microsoft Research.
The Solver Foundation's BigInteger class looks very performant. I am not sure about which license it is released under, but you can get it here (download and install Solver Foundation and find the Microsoft.Solver.Foundation.dll).
I reckon you could optimize the implementation by performing all operations on BigInts that are going to return results smaller than a native type (e.g. Int64) using the native types, and only dealing with the big array if you are going to overflow.
edit
This implementation on CodeProject seems only 7 times slower ... but with the above optimization you could get it to perform almost identically to native types for small numbers.
Here are several implementations of BigInteger in C#.
I've used Mono's BigInteger implementation; it works pretty fast (I've used it in the Compact Framework).
Bouncy Castle
Mono
I'm not sure about the performance, but IronPython also has a BigInteger class. It is in the Microsoft.Scripting.Math namespace.
Yes, it will be slow, and 10x difference is about what I'd expect. BigInt uses an array to represent an arbitrary length, and all the operations have to be done manually (as opposed to most math which can be done directly with the CPU)
I don't even know if hand-coding it in assembly will give you much of a performance gain over 10x, that's pretty damn close. I'd look for other ways to optimize it--sometimes depending on your math problem there are little tricks you can do to make it quicker.
I used BigInteger at a previous job. I don't know what kind of performance needs you have. I did not use it in a performance-intensive situation, but never had any problems with it.
This may sound like a strange suggestion, but have you tested the decimal type to see how fast it works?
The decimal range is ±1.0 × 10^−28 to ±7.9 × 10^28, so it may still not be large enough, but it is larger than a ulong.
There was supposed to be a BigInteger class in .NET 3.5, but it got cut.
This won't help you, but there was supposed to be a BigInteger class in .Net 3.5; it got cut, but from statements made at PDC, it will be in .Net 4.0. They apparently have spent a lot of time optimizing it, so the performance should be much better than what you're getting now.
Further, this question is essentially a duplicate of How can I represent a very large integer in .NET?
See the answers in this thread. You will need to use one of the third-party big integer libraries/classes available or wait for C# 4.0 which will include a native BigInteger datatype.
This looks very promising. It is a C# wrapper over GMP.
http://web.rememberingemil.org/Projects/GnuMpDotNet/GnuMpDotNet.html
There are also other BigInteger options for .NET here, in particular Mpir.Net.
You can also use the Math.Gmp.Native Nuget package that I wrote. Its source code is available on GitHub, and documentation is available here. It exposes to .NET all of the functionality of the GMP library which is known as a highly-optimized arbitrary-precision arithmetic library.
Arbitrary-precision integers are represented by the mpz_t type. Operations on these integers all begin with the mpz_ prefix, for example mpz_add or mpz_cmp. Source code examples are given for each operation.

Text difference algorithm

I need an algorithm that can compare two text files, highlight their differences, and (even better!) compute their difference in a meaningful way (i.e. two similar files should have a higher similarity score than two dissimilar files, with the word "similar" defined in the usual sense). It sounds easy to implement, but it's not.
The implementation can be in C# or Python.
Thanks.
I can recommend taking a look at Neil Fraser's code and articles:
google-diff-match-patch
Currently available in Java, JavaScript, C++ and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.
Neil Fraser: Diff Strategies - for theory and implementation notes
In Python, there is difflib, as also others have suggested.
difflib offers the SequenceMatcher class, which can be used to give you a similarity ratio. Example function:
import difflib

def text_compare(text1, text2, isjunk=None):
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()
Look at difflib. (Python)
That will calculate the diffs in various formats. You could then use the size of the context diff as a measure of how different two documents are?
My current understanding is that the best solution to the Shortest Edit Script (SES) problem is Myers "middle-snake" method with the Hirschberg linear space refinement.
The Myers algorithm is described in:
E. Myers, "An O(ND) Difference Algorithm and Its Variations," Algorithmica 1, 2 (1986), 251-266.
The GNU diff utility uses the Myers algorithm.
The "similarity score" you speak of is called the "edit distance" in the literature which is the number of inserts or deletes necessary to transform one sequence into the other.
Note that a number of people have cited the Levenshtein distance algorithm but that is, albeit easy to implement, not the optimal solution as it is inefficient (requires the use of a possibly huge n*m matrix) and does not provide the "edit script" which is the sequence of edits that could be used to transform one sequence into the other and vice versa.
For a good Myers / Hirschberg implementation look at:
http://www.ioplex.com/~miallen/libmba/dl/src/diff.c
The particular library that it is contained within is no longer maintained but to my knowledge the diff.c module itself is still correct.
Mike
Bazaar contains an alternative difference algorithm, called patience diff (there's more info in the comments on that page) which is claimed to be better than the traditional diff algorithm. The file 'patiencediff.py' in the bazaar distribution is a simple command line front end.
If you need a finer granularity than lines, you can use Levenshtein distance. Levenshtein distance is a straightforward measure of how similar two texts are.
You can also use it to extract the edit operations and create a very fine-grained diff, similar to the one on the edit history pages of SO.
Be warned, though, that Levenshtein distance can be quite CPU- and memory-intensive to calculate, so using difflib, as Douglas Leder suggested, is most likely going to be faster.
Cf. also this answer.
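For reference, the standard dynamic-programming implementation in C# looks roughly like this (a sketch, not taken from any of the libraries mentioned; it builds the full (n+1) x (m+1) matrix discussed above):

static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;   // deletions
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;   // insertions
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;   // substitution cost
            d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                      d[i - 1, j - 1] + cost);
        }
    return d[a.Length, b.Length];
}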
There are a number of distance metrics, as paradoja mentioned there is the Levenshtein distance, but there is also NYSIIS and Soundex. In terms of Python implementations, I have used py-editdist and ADVAS before. Both are nice in the sense that you get a single number back as a score. Check out ADVAS first, it implements a bunch of algorithms.
As stated, use difflib. Once you have the diffed output, you may find the Levenshtein distance of the different strings as to give a "value" of how different they are.
You could use the solution to the Longest Common Subsequence (LCS) problem. See also the discussion about possible ways to optimize this solution.
One method I've employed for a different functionality, to calculate how much data was new in a modified file, could perhaps work for you as well.
I have a diff/patch implementation C# that allows me to take two files, presumably old and new version of the same file, and calculate the "difference", but not in the usual sense of the word. Basically I calculate a set of operations that I can perform on the old version to update it to have the same contents as the new version.
To use this for the functionality initially described, to see how much data was new, I simply ran through the operations; every operation that copied from the old file verbatim got a 0-factor, and every operation that inserted new text (distributed as part of the patch, since it didn't occur in the old file) got a 1-factor. All characters were given this factor, which gave me basically a long list of 0's and 1's.
All I then had to do was to tally up the 0's and 1's. In your case, with my implementation, a low number of 1's compared to 0's would mean the files are very similar.
This implementation would also handle cases where the modified file had inserted copies from the old file out of order, or even duplicates (i.e. you copy a part from the start of the file and paste it near the bottom), since they would both be copies of the same original part of the old file.
I experimented with weighing copies, so that the first copy counted as 0, and subsequent copies of the same characters had progressively higher factors, in order to give a copy/paste operation some "new-factor", but I never finished it as the project was scrapped.
If you're interested, my diff/patch code is available from my Subversion repository.
Take a look at the Fuzzy module. It has fast (written in C) algorithms for Soundex, NYSIIS and double metaphone.
A good introduction can be found at: http://www.informit.com/articles/article.aspx?p=1848528
