Code efficiency and accuracy - c#

I'm trying to solve a problem on code wars and the unit tests provided make absolutely no sense...
The problem is as follows and sounds absolutely simple enough to have something working in 5 minutes
Consider a sequence u where u is defined as follows:
The number u(0) = 1 is the first one in u.
For each x in u, then y = 2 * x + 1 and z = 3 * x + 1 must be in u too.
There are no other numbers in u.
Ex: u = [1, 3, 4, 7, 9, 10, 13, 15, 19, 21, 22, 27, ...]
1 gives 3 and 4, then 3 gives 7 and 10, 4 gives 9 and 13, then 7 gives 15 and 22 and so on...
Task:
Given parameter n the function dbl_linear (or dblLinear...) returns the element u(n) of the ordered (with <) sequence u.
Example:
dbl_linear(10) should return 22
At first I used a sortedset with a linq query as I didnt really care about efficiency, I quickly learned that this operation will have to calculate to ranges where n could equal ~100000 in under 12 seconds.
So this abomination was born, then butchered time and time again since a for loop would generate issues for some reason. It was then "upgraded" to a while loop which gave slightly more passed unit tests ( 4 -> 8 ).
public class DoubleLinear {
public static int DblLinear(int n) {
ListSet<int> table = new ListSet<int> {1};
for (int i = 0; i < n; i++) {
table.Put(Y(table[i]));
table.Put(Z(table[i]));
}
table.Sort();
return table[n];
}
private static int Y(int y) {
return 2 * y + 1;
}
private static int Z(int z) {
return 3 * z + 1;
}
}
public class ListSet<T> : List<T> {
public void Put(T item) {
if (!this.Contains(item))
this.Add(item);
}
}
With this code it still fails the calculation in excess of n = 75000, but passes up to 8 tests.
I've checked if other people have passed this, and they have. However, i cannot check what they wrote to learn from it.
Can anyone provide insight to what could be wrong here? I'm sure the answer is blatantly obvious and I'm being dumb.
Also is using a custom list in this way a bad idea? is there a better way?

ListSet is slow for sorting, and you constantly get memory reallocation as you build the set. I would start by allocating the table in its full size first, though honestly I would also tell you using a barebones array of the size you need is best for performance.
If you know you need n = 75,000+, allocate a ListSet (or an ARRAY!) of that size. If the unit tests start taking you into the stratosphere, there is a binary segmentation technique we can discuss, but that's a bit involved and logically tougher to build.
I don't see anything logically wrong with the code. The numbers it generates are correct from where I'm standing.
EDIT: Since you know 3n+1 > 2n+1, you only ever have to maintain 6 values:
Target index in u
Current index in u
Current x for y
Current x for z
Current val for y
Current val for z
public static int DblLinear(int target) {
uint index = 1;
uint ind_y = 1;
uint ind_z = 1;
uint val_y = 3;
uint val_z = 4;
if(target < 1)
return 1;
while(index < target) {
if(val_y < val_z) {
ind_y++;
val_y = 2*ind_y + 1;
} else {
ind_z++;
val_z = 3*ind_z + 1;
}
index++;
}
return (val_y < val_z) ? val_y : val_z;
}
You could modify the val_y if to be a while loop (more efficient critical path) if you either widen the branch to 2 conditions or implement a backstep loop for when you blow past your target index.
No memory allocation will definitely speed your calculations up, even f people want to (incorrectly) belly ache about branch prediction in such an easily predictable case.
Also, did you turn optimization on in your Visual Studio project? If you're submitting a binary and not a code file, then that can also shave quite a bit of time.

Related

Why is it faster to calculate the product of a consecutive array of integers by performing the calculation in pairs?

I was trying to create my own factorial function when I found that the that the calculation is twice as fast if it is calculated in pairs. Like this:
Groups of 1: 2*3*4 ... 50000*50001 = 4.1 seconds
Groups of 2: (2*3)*(4*5)*(6*7) ... (50000*50001) = 2.0 seconds
Groups of 3: (2*3*4)*(5*6*7) ... (49999*50000*50001) = 4.8 seconds
Here is the c# I used to test this.
Stopwatch timer = new Stopwatch();
timer.Start();
// Seperate the calculation into groups of this size.
int k = 2;
BigInteger total = 1;
// Iterates from 2 to 50002, but instead of incrementing 'i' by one, it increments it 'k' times,
// and the inner loop calculates the product of 'i' to 'i+k', and multiplies 'total' by that result.
for (var i = 2; i < 50000 + 2; i += k)
{
BigInteger partialTotal = 1;
for (var j = 0; j < k; j++)
{
// Stops if it exceeds 50000.
if (i + j >= 50000) break;
partialTotal *= i + j;
}
total *= partialTotal;
}
Console.WriteLine(timer.ElapsedMilliseconds / 1000.0 + "s");
I tested this at different levels and put the average times over a few tests in a bar graph. I expected it to become more efficient as I increased the number of groups, but 3 was the least efficient and 4 had no improvement over groups of 1.
Link to First Data
Link to Second Data
What causes this difference, and is there an optimal way to calculate this?
BigInteger has a fast case for numbers of 31 bits or less. When you do a pairwise multiplication, this means a specific fast-path is taken, that multiplies the values into a single ulong and sets the value more explicitly:
public void Mul(ref BigIntegerBuilder reg1, ref BigIntegerBuilder reg2) {
...
if (reg1._iuLast == 0) {
if (reg2._iuLast == 0)
Set((ulong)reg1._uSmall * reg2._uSmall);
else {
...
}
}
else if (reg2._iuLast == 0) {
...
}
else {
...
}
}
public void Set(ulong uu) {
uint uHi = NumericsHelpers.GetHi(uu);
if (uHi == 0) {
_uSmall = NumericsHelpers.GetLo(uu);
_iuLast = 0;
}
else {
SetSizeLazy(2);
_rgu[0] = (uint)uu;
_rgu[1] = uHi;
}
AssertValid(true);
}
A 100% predictable branch like this is perfect for a JIT, and this fast-path should get optimized extremely well. It's possible that _rgu[0] and _rgu[1] are even inlined. This is extremely cheap, so effectively cuts down the number of real operations by a factor of two.
So why is a group of three so much slower? It's obvious that it should be slower than for k = 2; you have far fewer optimized multiplications. More interesting is why it's slower than k = 1. This is easily explained by the fact that the outer multiplication of total now hits the slow path. For k = 2 this impact is mitigated by halving the number of multiplies and the potential inlining of the array.
However, these factors do not help k = 3, and in fact the slow case hurts k = 3 a lot more. The second multiplication in the k = 3 case hits this case
if (reg1._iuLast == 0) {
...
}
else if (reg2._iuLast == 0) {
Load(ref reg1, 1);
Mul(reg2._uSmall);
}
else {
...
}
which allocates
EnsureWritable(1);
uint uCarry = 0;
for (int iu = 0; iu <= _iuLast; iu++)
uCarry = MulCarry(ref _rgu[iu], u, uCarry);
if (uCarry != 0) {
SetSizeKeep(_iuLast + 2, 0);
_rgu[_iuLast] = uCarry;
}
why does this matter? Well, EnsureWritable(1) causes
uint[] rgu = new uint[_iuLast + 1 + cuExtra];
so rgu becomes length 3. The number of passes in total's code is decided in
public void Mul(ref BigIntegerBuilder reg1, ref BigIntegerBuilder reg2)
as
for (int iu1 = 0; iu1 < cu1; iu1++) {
...
for (int iu2 = 0; iu2 < cu2; iu2++, iuRes++)
uCarry = AddMulCarry(ref _rgu[iuRes], uCur, rgu2[iu2], uCarry);
...
}
which means that we have a total of len(total._rgu) * 3 operations. This hasn't saved us anything! There are only len(total._rgu) * 1 passes for k = 1 - we just do it three times!
There is actually an optimization on the outer loop that reduces this back down to len(total._rgu) * 2:
uint uCur = rgu1[iu1];
if (uCur == 0)
continue;
However, they "optimize" this optimization in a way that hurts far more than before:
if (reg1.CuNonZero <= reg2.CuNonZero) {
rgu1 = reg1._rgu; cu1 = reg1._iuLast + 1;
rgu2 = reg2._rgu; cu2 = reg2._iuLast + 1;
}
else {
rgu1 = reg2._rgu; cu1 = reg2._iuLast + 1;
rgu2 = reg1._rgu; cu2 = reg1._iuLast + 1;
}
For k = 2, that causes the outer loop to be over total, since reg2 contains no zero values with high probability. This is great because total is way longer than partialTotal, so the fewer passes the better. For k = 3, the EnsureWritable(1) will always cause a spare space because the multiplication of three numbers no more than 15 bits long can never exceed 64 bits. This means that, although we still only do one pass over total for k = 2, we do two for k = 3!
This starts to explain why the speed increases again beyond k = 3: the number of passes per addition increases slower than the number of additions decreases, as you're only adding ~15 bits to the inner value each time. The inner multiplications are fast relative to the massive total multiplications, so the more time spent consolidating values, the more time saved in passes over total. Further, the optimization is less frequently a pessimism.
It also explains why odd values take longer: they add an extra 32-bit integer to the _rgu array. This won't happen so cleanly if the ~15 bits wasn't so close to half of 32.
It's worth noting that there are a lot of ways to improve this code; the comments here are about why, not how to fix it. The easiest improvement would be to chuck the values in a heap and multiply only the two smallest values at a time.
The time required to do a BigInteger multiplication depends on the size of the product.
Both methods take the same number of multiplications, but if you multiply the factors in pairs, then the average size of the product is much smaller than it is if you multiply each factor with the product of all the smaller ones.
You can do even better if you always multiply the two smallest factors (original factors or intermediate products) that have yet to be multiplied together, until you get to the complete product.
I think you have a bug ('+' instead of '*').
partialTotal *= i + j;
Good to check that you are getting the right answer, not just interesting performance metrics.
But I'm curious what motivated you to try this. If you do find a difference, I would expect it would have to do with optimalities in register and/or memory allocation. And I would expect it would be 0-30% or something like that, not 50%.

Packing item from set of available packs

Suppose there is an Item that a customer is ordering - in this case it turns out they are ordering 176 (totalNeeded) of this Item.
The database has 5 records associated with this item that this item can be stored in:
{5 pack, 8 pack, 10 pack, 25 pack, 50 pack}.
A rough way of packing this would be:
Sort the array from biggest to smallest.
While (totalPacked < totalNeeded) // 176
{
1. Maintain an <int, int> dictionary which contains Keys of pack id's,
and values of how many needed
2. Add the largest pack, which is not larger than the amount remaining to pack,
increment totalPacked by the pack size
3. If any remainder is left over after the above, add the smallest pack to reduce
waste
e.g., 4 needed, smallest size is 5, so add one 5; one extra item packed
}
Based on the above logic, the outcome would be:
You need: 3 x 50 packs, 1 x 25 pack, 1 x 5 pack
Total Items: 180
Excess = 4 items; 180 - 176
The above is not too difficult to code, I have it working locally. However, it is not truly the best way to pack this item. Note: "best" means, smallest amount of excess.
Thus ... we have an 8 pack available, we need 176. 176 / 8 = 22. Send the customer 22 x 8 packs, they will get exactly what they need. Again, this is even simpler than the pseudo-code I wrote ... see if the total needed is evenly divisible by any of the packs in the array - if so, "at the very least" we know that we can fall back on 22 x 8 packs being exact.
In the case that the number is not divisible by an array value, I am attempting to determine possible way that the array values can be combined to reach at least the number we need (176), and then score the different combinations by # of Packs needed total.
If anyone has some reading that can be done on this topic, or advice of any kind to get me started it would be greatly appreciated.
Thank you
This is a variant of the Subset Sum Problem (Optimization version)
While the problem is NP-Complete, there is a pretty efficient pseudo-polynomial time Dynamic Programming solution to it, by following the recursive formulas:
D(x,i) = false x<0
D(0,i) = true
D(x,0) = false x != 0
D(x,i) = D(x,i-1) OR D(x-arr[i],i
The Dynamic Programming Solution will build up a table, where an element D[x][i]==true iff you can use the first i kinds of packs to establish sum x.
Needless to say that D[x][n] == true iff there is a solution with all available packs that sums to x. (where n is the total number of packs you have).
To get the "closest higher number", you just need to create a table of size W+pack[0]-1 (pack[0] being the smallest available pack, W being the sum you are looking for), and choose the value that yields true which is closest to W.
If you wish to give different values to the different pack types, this becomes Knapsack Problem, which is very similar - but uses values instead a simple true/false.
Getting the actual "items" (packs) chosen after is done by going back the table and retracing your steps. This thread and this thread elaborate how to achieve it with more details.
If this example problem is truly representative of the actual problem you are solving, it is small enough to try every combination with brute force using recursion. For example, I found exactly 6,681 unique packings that are locally maximized, with a total of 205 that have exactly 176 total items. The (unique) solution with minimum number of packs is 6, and that is { 2-8, 1-10, 3-50 }. Total runtime for the algorithm was 8 ms.
public static List<int[]> GeneratePackings(int[] packSizes, int totalNeeded)
{
var packings = GeneratePackingsInternal(packSizes, 0, new int[packSizes.Length], totalNeeded);
return packings;
}
private static List<int[]> GeneratePackingsInternal(int[] packSizes, int packSizeIndex, int[] packCounts, int totalNeeded)
{
if (packSizeIndex >= packSizes.Length) return new List<int[]>();
var currentPackSize = packSizes[packSizeIndex];
var currentPacks = new List<int[]>();
if (packSizeIndex + 1 == packSizes.Length) {
var lastOptimal = totalNeeded / currentPackSize;
packCounts[packSizeIndex] = lastOptimal;
return new List<int[]> { packCounts };
}
for (var i = 0; i * currentPackSize <= totalNeeded; i++) {
packCounts[packSizeIndex] = i;
currentPacks.AddRange(GeneratePackingsInternal(packSizes, packSizeIndex + 1, (int[])packCounts.Clone(), totalNeeded - i * currentPackSize));
}
return currentPacks;
}
The algorithm is pretty straightforward
Loop through every combination of number of 5-packs.
Loop through every combination of number of 8-packs, from remaining amount after deducting specified number of 5-packs.
etc to 50-packs. For 50-pack counts, directly divide the remainder.
Collect all combinations together recursively (so it dynamically handles any set of pack sizes).
Finally, once all the combinations are found, it is pretty easy to find all packs with least waste and least number of packages:
var packSizes = new int[] { 5, 8, 10, 25, 50 };
var totalNeeded = 176;
var result = GeneratePackings(packSizes, totalNeeded);
Console.WriteLine(result.Count());
var maximal = result.Where (r => r.Zip(packSizes, (a, b) => a * b).Sum() == totalNeeded).ToList();
var min = maximal.Min(m => m.Sum());
var minPacks = maximal.Where (m => m.Sum() == min).ToList();
foreach (var m in minPacks) {
Console.WriteLine("{ " + string.Join(", ", m) + " }");
}
Here is a working example: https://ideone.com/zkCUYZ
This partial solution is specifically for your pack sizes of 5, 8, 10, 25, 50. And only for order sizes at least 40 large. There are a few gaps at smaller sizes that you'll have to fill another way (specifically at values like 6, 7, 22, 27 etc).
Clearly, the only way to get any number that isn't a multiple of 5 is to use the 8 packs.
Determine the number of 8-packs needed with modular arithmatic. Since the 8 % 5 == 3, each 8-pack will handle a different remainder of 5 in this cycle: 0, 2, 4, 1, 3. Something like
public static int GetNumberOf8Packs(int orderCount) {
int remainder = (orderCount % 5);
return ((remainder % 3) * 5 + remainder) / 3;
}
In your example of 176. 176 % 5 == 1 which means you'll need 2 8-packs.
Subtract the value of the 8-packs to get the number of multiples of 5 you need to fill. At this point you still need to deliver 176 - 16 == 160.
Fill all the 50-packs you can by integer dividing. Keep track of the leftovers.
Now just fit the 5, 10, 25 packs as needed. Obviously use the larger values first.
All together your code might look like this:
public static Order MakeOrder(int orderSize)
{
if (orderSize < 40)
{
throw new NotImplementedException("You'll have to write this part, since the modular arithmetic for 8-packs starts working at 40.");
}
var order = new Order();
order.num8 = GetNumberOf8Packs(orderSize);
int multipleOf5 = orderSize - (order.num8 * 8);
order.num50 = multipleOf5 / 50;
int remainderFrom50 = multipleOf5 % 50;
while (remainderFrom50 > 0)
{
if (remainderFrom50 >= 25)
{
order.num25++;
remainderFrom50 -= 25;
}
else if (remainderFrom50 >= 10)
{
order.num10++;
remainderFrom50 -= 10;
}
else if (remainderFrom50 >= 5)
{
order.num5++;
remainderFrom50 -= 5;
}
}
return order;
}
A DotNetFiddle

Terras Conjecture in C#

I'm having a problem generating the Terras number sequence.
Here is my unsuccessful attempt:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Terras
{
class Program
{
public static int Terras(int n)
{
if (n <= 1)
{
int return_value = 1;
Console.WriteLine("Terras generated : " + return_value);
return return_value;
}
else
{
if ((n % 2) == 0)
{
// Even number
int return_value = 1 / 2 * Terras(n - 1);
Console.WriteLine("Terras generated : " + return_value);
return return_value;
}
else
{
// Odd number
int return_value = 1 / 2 * (3 * Terras(n - 1) + 1);
Console.WriteLine("Terras generated : " + return_value);
return return_value;
}
}
}
static void Main(string[] args)
{
Console.WriteLine("TERRAS1");
Terras(1); // should generate 1
Console.WriteLine("TERRAS2");
Terras(2); // should generate 2 1 ... instead of 1 and 0
Console.WriteLine("TERRAS5");
Terras(5); // should generate 5,8,4,2,1 not 1 0 0 0 0
Console.Read();
}
}
}
What am I doing wrong?
I know the basics of recursion, but I don’t understand why this doesn’t work.
I observe that the first number of the sequence is actually the number that you pass in, and subsequent numbers are zero.
Change 1 / 2 * Terros(n - 1); to Terros(n - 1)/2;
Also 1 / 2 * (3 * Terros(n - 1) + 1); to (3 * Terros(n - 1) + 1)/2;
1/2 * ... is simply 0 * ... with int math.
[Edit]
Recursion is wrong and formula is mis-guided. Simple iterate
public static void Terros(int n) {
Console.Write("Terros generated :");
int t = n;
Console.Write(" " + t);
while (t > 1) {
int t_previous = t;
if (t_previous%2 == 0) {
t = t_previous/2;
}
else {
t = (3*t_previous+1)/2;
}
Console.Write(", " + t);
}
Console.WriteLine("");
}
The "n is even" should be "t(subscript n-1) is even" - same for "n is odd".
int return_value = 1 / 2 * Terros(n - 1);
int return_value = 1 / 2 * (3 * Terros(n - 1) + 1);
Unfortunately you've hit a common mistake people make with ints.
(int)1 / (int)2 will always be 0.
Since 1/2 is an integer divison it's always 0; in order to correct the math, just
swap the terms: not 1/2*n but n/2; instead of 1/2* (3 * n + 1) put (3 * n + 1) / 2.
Another issue: do not put computation (Terros) and output (Console.WriteLine) in the
same function
public static String TerrosSequence(int n) {
StringBuilder Sb = new StringBuilder();
// Again: dynamic programming is far better here than recursion
while (n > 1) {
if (Sb.Length > 0)
Sb.Append(",");
Sb.Append(n);
n = (n % 2 == 0) ? n / 2 : (3 * n + 1) / 2;
}
if (Sb.Length > 0)
Sb.Append(",");
Sb.Append(n);
return Sb.ToString();
}
// Output: "Terros generated : 5,8,4,2,1"
Console.WriteLine("Terros generated : " + TerrosSequence(5));
The existing answers guide you in the correct direction, but there is no ultimate one. I thought that summing up and adding detail would help you and future visitors.
The problem name
The original name of this question was “Conjuncture of Terros”. First, it is conjecture, second, the modification to the original Collatz sequence you used comes from Riho Terras* (not Terros!) who proved the Terras Theorem saying that for almost all t₀ holds that ∃n ∈ ℕ: tₙ < t₀. You can read more about it on MathWorld and chux’s question on Math.SE.
* While searching for who is that R. Terras mentioned on MathWorld, I found not only the record on Geni.com, but also probable author of that record, his niece Astrid Terras, and her family’s genealogy. Just for the really curious ones. ☺
The formula
You got the formula wrong in your question. As the table of sequences for different t₀ shows, you should be testing for parity of tₙ₋₁ instead of n.
Formula taken from MathWorld.
Also the second table column heading is wrong, it should read t₀, t₁, t₂, … as t₀ is listed too.
You repeat the mistake with testing n instead of tₙ₋₁ in your code, too. If output of your program is precisely specified (e.g. when checked by an automatic judge), think once more whether you should output t₀ or not.
Integer vs float arithmetic
When making an operation with two integers, you get an integer. If a float is involved, the result is float. In both branches of your condition, you compute an expression of this form:
1 / 2 * …
1 and 2 are integers, therefore the division is integer division. Integer division always rounds down, so the expression is in fact
0 * …
which is (almost*) always zero. Mystery solved. But how to fix it?
Instead of multiplying by one half, you can divide by two. In even branch, division by 2 gives no remainder. In odd branch, tₙ₋₁ is odd, so 3 · tₙ₋₁ is odd too. Odd plus 1 is even, so division by two always produces remainder equal to zero in both branches. Integer division is enough, the result is precise.
Also, you could use float division, just replace 1 with 1.0. But this will probably not give correct results. You see, all members of the sequence are integers and you’re getting float results! So rounding with Math.Round() and casting to integer? Nah… If you can, always evade using floats. There are very few use cases for them, I think, most having something to do with graphics or numerical algorithms. Most of the time you don’t really need them and they just introduce round-off errors.
* Zero times whatever could produce NaN too, but let’s ignore the possibility of “whatever” being from special float values. I’m just pedantic.
Recursive solution
Apart from the problems mentioned above, your whole recursive approach is flawed. Obviously you intended Terras(n) to be tₙ. That’s not utterly bad. But then you forgot that you supply t₀ and search for n instead of the other way round.
To fix your approach, you would need to set up a “global” variable int t0 that would be set to given t₀ and returned from Terras(0). Then Terras(n) would really return tₙ. But you wouldn’t still know the value of n when the sequence stops. You could only repeat for bigger and bigger n, ruining time complexity.
Wait. What about caching the results of intermediate Terras() calls in an ArrayList<int> t? t[i] will contain result for Terras(i) or zero if not initialized. At the top of Terras() you would add if (n < t.Count() && t[n] != 0) return t[n]; for returning the value immediately if cached and not repeating the computation. Otherwise the computation is really made and just before returning, the result is cached:
if (n < t.Count()) {
t[n] = return_value;
} else {
for (int i = t.Count(); i < n; i++) {
t.Add(0);
}
t.Add(return_value);
}
Still not good enough. Time complexity saved, but having the ArrayList increases space complexity. Try tracing (preferably manually, pencil & paper) the computation for t0 = 3; t.Add(t0);. You don’t know the final n beforehand, so you must go from 1 up, till Terras(n) returns 1.
Noticed anything? First, each time you increment n and make a new Terras() call, you add the computed value at the end of cache (t). Second, you’re always looking just one item back. You’re computing the whole sequence from the bottom up and you don’t need that big stupid ArrayList but always just its last item!
Iterative solution
OK, let’s forget that complicated recursive solution trying to follow the top-down definition and move to the bottom-up approach that popped up from gradual improvement of the original solution. Recursion is not needed anymore, it just clutters the whole thing and slows it down.
End of sequence is still found by incrementing n and computing tₙ, halting when tₙ = 1. Variable t stores tₙ, t_previous stores previous tₙ (now tₙ₋₁). The rest should be obvious.
public static void Terras(int t) {
Console.Write("Terras generated:");
Console.Write(" " + t);
while (t > 1) {
int t_previous = t;
if (t_previous % 2 == 0) {
t = t_previous / 2;
} else {
t = (3 * t_previous + 1) / 2;
}
Console.Write(", " + t);
}
Console.WriteLine("");
}
Variable names taken from chux’s answer, just for the sake of comparability.
This can be deemed a primitive instance of dynamic-programming technique. The evolution of this solution is common to the whole class of such problems. Slow recursion, call result caching, dynamic “bottom-up” approach. When you are more experienced with dynamic programming, you’ll start seeing it directly even in more complicated problems, not even thinking about recursion.

Algorithm - java, c# or delphi - search a number in array which exists more than arraySize / 2. In one pass without additional memory

I need to do an algorithm that search a specific int in array of int.
That number must appear >= than arraySize/2 times.
example: [] = 4 4 3 5 5 5 5 5 5 6
arraysize: 10
number 5 exists 6x -> so this is the result of algorithm
but I need to do this without additionam memory, and in time O(n) -> in one pass.
Is this even possible? Any suggestions how to start it?
It is indeed possible; the task is known as "Dominant Element," and used for interviews and as a homework. Read the article below for a proper analysis; the solution itself is simple but not easy: proving that it indeed does what it promises is not quite trivial (unless of course you know the answer).
http://www.cse.iitk.ac.in/users/sbaswana/Courses/ESO211/problem.pdf
element x;
int count ← 0;
For(i = 0 to n − 1)
{
if(count == 0) { x ← A[i]; count ++; }
else if (A[i] == x) count ++;
else count −−
}
Check if x is dominant element by scanning array A.
Note though that the time is O(n), but as far as I'm aware, it is not possible to do it in one pass unless you know for sure there is a dominant element.
As of additional memory, you will need memory for i, the counter; x, the element to check and return; and count, the size of the imaginary working set. That's O(1) and is usually considered OK for such problems.
Moore describes the solution to this problem on his web site (with an example here).
Edit: Here is some Java code demonstrating the algorithm as described:
public class Majority
{
public static void main(String[] args)
{
int[]a = new int[]{4, 4, 3, 5, 5, 5, 5, 5, 5, 6};
int count = 0;
int candidateIndex = 0;
for (int i = 0; i < a.length; i++)
{
if (count == 0)
{
candidateIndex = i;
count++;
}
else
{
if (a[i] == a[candidateIndex])
count++;
else
count--;
}
}
System.out.println("Majority element: " + a[candidateIndex]);
}
}
After you get your candidateIndex, you can iterate though the array again to verify that it indeed occurs more than N / 2 times.

Efficient algorithm to get primes between two large numbers

I'm a beginner in C#, I'm trying to write an application to get primes between two numbers entered by the user. The problem is: At large numbers (valid numbers are in the range from 1 to 1000000000) getting the primes takes long time and according to the problem I'm solving, the whole operation must be carried out in a small time interval. This is the problem link for more explanation:
SPOJ-Prime
And here's the part of my code that's responsible of getting primes:
public void GetPrime()
{
int L1 = int.Parse(Limits[0]);
int L2 = int.Parse(Limits[1]);
if (L1 == 1)
{
L1++;
}
for (int i = L1; i <= L2; i++)
{
for (int k = L1; k <= L2; k++)
{
if (i == k)
{
continue;
}
else if (i % k == 0)
{
flag = false;
break;
}
else
{
flag = true;
}
}
if (flag)
{
Console.WriteLine(i);
}
}
}
Is there any faster algorithm?
Thanks in advance.
I remember solving the problem like this:
Use the sieve of eratosthenes to generate all primes below sqrt(1000000000) = ~32 000 in an array primes.
For each number x between m and n only test if it's prime by testing for divisibility against numbers <= sqrt(x) from the array primes. So for x = 29 you will only test if it's divisibile by 2, 3 and 5.
There's no point in checking for divisibility against non-primes, since if x divisible by non-prime y, then there exists a prime p < y such that x divisible by p, since we can write y as a product of primes. For example, 12 is divisible by 6, but 6 = 2 * 3, which means that 12 is also divisible by 2 or 3. By generating all the needed primes in advance (there are very few in this case), you significantly reduce the time needed for the actual primality testing.
This will get accepted and doesn't require any optimization or modification to the sieve, and it's a pretty clean implementation.
You can do it faster by generalising the sieve to generate primes in an interval [left, right], not [2, right] like it's usually presented in tutorials and textbooks. This can get pretty ugly however, and it's not needed. But if anyone is interested, see:
http://pastie.org/9199654 and this linked answer.
You are doing a lot of extra divisions that are not needed - if you know a number is not divisible by 3, there is no point in checking if it is divisible by 9, 27, etc. You should try to divide only by the potential prime factors of the number. Cache the set of primes you are generating and only check division by the previous primes. Note that you do need to generate the initial set of primes below L1.
Remember that no number will have a prime factor that's greater than its own square root, so you can stop your divisions at that point. For instance, you can stop checking potential factors of the number 29 after 5.
You also do can increment by 2 so you can disregard checking if an even number is prime altogether (special casing the number 2, of course.)
I used to ask this question during interviews - as a test I compared an implementation similar to yours with the algorithm I described. With the optimized algorithm, I could generate hundreds of thousands of primes very fast - I never bothered waiting around for the slow, straightforward implementation.
You could try the Sieve of Eratosthenes. The basic difference would be that you start at L1 instead of starting at 2.
Let's change the question a bit: How quickly can you generate the primes between m and n and simply write them to memory? (Or, possibly, to a RAM disk.) On the other hand, remember the range of parameters as described on the problem page: m and n can be as high as a billion, while n-m is at most a million.
IVlad and Brian most of a competitive solution, even if it is true that a slower solution could be good enough. First generate or even precompute the prime numbers less than sqrt(billion); there aren't very many of them. Then do a truncated Sieve of Eratosthenes: Make an array of length n-m+1 to keep track of the status of every number in the range [m,n], with initially every such number marked as prime (1). Then for each precomputed prime p, do a loop that looks like this:
for(k=ceil(m/p)*p; k <= n; k += p) status[k-m] = 0;
This loop marks all of the numbers in the range m <= x <= n as composite (0) if they are multiple of p. If this is what IVlad meant by "pretty ugly", I don't agree; I don't think that it's so bad.
In fact, almost 40% of this work is just for the primes 2, 3, and 5. There is a trick to combine the sieve for a few primes with initialization of the status array. Namely, the pattern of divisibility by 2, 3, and 5 repeats mod 30. Instead of initializing the array to all 1s, you can initialize it to a repeating pattern of 010000010001010001010001000001. If you want to be even more cutting edge, you can advance k by 30*p instead of by p, and only mark off the multiples in the same pattern.
After this, realistic performance gains would involve steps like using a bit vector rather than a char array to keep the sieve data in on-chip cache. And initializing the bit vector word by word rather than bit by bit. This does get messy, and also hypothetical since you can get to the point of generating primes faster than you can use them. The basic sieve is already very fast and not very complicated.
One thing no one's mentioned is that it's rather quick to test a single number for primality. Thus, if the range involved is small but the numbers are large (ex. generate all primes between 1,000,000,000 and 1,000,100,000), it would be faster to just check every number for primality individually.
There are many algorithms finding prime numbers. Some are faster, others are easier.
You can start by making some easiest optimizations. For example,
why are you searching if every number is prime? In other words, are you sure that, given a range of 411 to 418, there is a need to search if numbers 412, 414, 416 and 418 are prime? Numbers which divide by 2 and 3 can be skipped with very simple code modifications.
if the number is not 5, but ends by a digit '5' (1405, 335), it is not prime bad idea: it will make the search slower.
what about caching the results? You can then divide by primes rather by every number. Moreover, only primes less than square root of the number you search are concerned.
If you need something really fast and optimized, taking an existing algorithm instead of reinventing the wheel can be an alternative. You can also try to find some scientific papers explaining how to do it fast, but it can be difficult to understand and to translate to code.
int ceilingNumber = 1000000;
int myPrimes = 0;
BitArray myNumbers = new BitArray(ceilingNumber, true);
for(int x = 2; x < ceilingNumber; x++)
if(myNumbers[x])
{
for(int y = x * 2; y < ceilingNumber; y += x)
myNumbers[y] = false;
}
for(int x = 2; x < ceilingNumber; x++)
if(myNumbers[x])
{
myPrimes++;
Console.Out.WriteLine(x);
}
Console.Out.WriteLine("======================================================");
Console.Out.WriteLine("There is/are {0} primes between 0 and {1} ",myPrimes,ceilingNumber);
Console.In.ReadLine();
I think i have a very fast and efficient(generate all prime even if using type BigInteger) algorithm to getting prime number,it much more faster and simpler than any other one and I use it to solve almost problem related to prime number in Project Euler with just a few seconds for complete solution(brute force)
Here is java code:
public boolean checkprime(int value){ //Using for loop if need to generate prime in a
int n, limit;
boolean isprime;
isprime = true;
limit = value / 2;
if(value == 1) isprime =false;
/*if(value >100)limit = value/10; // if 1 number is not prime it will generate
if(value >10000)limit = value/100; //at lest 2 factor (not 1 or itself)
if(value >90000)limit = value/300; // 1 greater than average 1 lower than average
if(value >1000000)limit = value/1000; //ex: 9997 =13*769 (average ~ sqrt(9997) is 100)
if(value >4000000)limit = value/2000; //so we just want to check divisor up to 100
if(value >9000000)limit = value/3000; // for prime ~10000
*/
limit = (int)Math.sqrt(value); //General case
for(n=2; n <= limit; n++){
if(value % n == 0 && value != 2){
isprime = false;
break;
}
}
return isprime;
}
import java.io.*;
import java.util.Scanner;
class Test{
public static void main(String args[]){
Test tt=new Test();
Scanner obj=new Scanner(System.in);
int m,n;
System.out.println(i);
m=obj.nextInt();
n=obj.nextInt();
tt.IsPrime(n,m);
}
public void IsPrime(int num,int k)
{
boolean[] isPrime = new boolean[num+1];
// initially assume all integers are prime
for (int i = 2; i <= num; i++) {
isPrime[i] = true;
}
// mark non-primes <= N using Sieve of Eratosthenes
for (int i = 2; i*i <= num; i++) {
// if i is prime, then mark multiples of i as nonprime
// suffices to consider mutiples i, i+1, ..., N/i
if (isPrime[i]) {
for (int j = i; i*j <=num; j++) {
isPrime[i*j] = false;
}
}
}
for (int i =k; i <= num; i++) {
if (isPrime[i])
{
System.out.println(i);
}
}
}
}
List<int> prime(int x, int y)
{
List<int> a = new List<int>();
int b = 0;
for (int m = x; m < y; m++)
{
for (int i = 2; i <= m / 2; i++)
{
b = 0;
if (m % i == 0)
{
b = 1;
break;
}
}
if (b == 0) a.Add(m)`
}
return a;
}

Categories

Resources