C# Prime Generator, Maxxing out Bit Array

Here's some code a friend and I were poking around on:
public List<int> GetListToTop(int top)
{
top++;
List<int> result = new List<int>();
BitArray primes = new BitArray(top / 2);
int root = (int)Math.Sqrt(top);
for (int i = 3, count = 3; i <= root; i += 2, count++)
{
int n = i - count;
if (!primes[n])
for (int j = n + i; j < top / 2; j += i)
{
primes[j] = true;
}
}
if (top >= 2)
result.Add(2);
for (int i = 0, count = 3; i < primes.Length; i++, count++)
{
if (!primes[i])
{
int n = i + count;
result.Add(n);
}
}
return result;
}
On my dorky AMD x64 1800+ (dual core), it finds all primes below 1 billion in 34546.875 ms. The problem seems to be storing more in the bit array: trying to crank past ~2 billion is more than the BitArray wants to store. Any ideas on how to get around that?

I would "swap" parts of the array out to disk. By that, I mean, divide your bit array into half-billion bit chunks and store them on disk.
Then have only a few chunks in memory at any one time. With C# (or any other OO language), it should be easy to encapsulate the huge array inside this chunking class.
You'll pay for it with a slower generation time but I don't see any way around that until we get larger address spaces and 128-bit compilers.
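As a rough illustration of the chunking idea, here is a minimal Python sketch. The class name `ChunkedBitArray` and the chunk size are made up for the example, and the chunks are kept in a dictionary in memory; a real version along the lines described above would write cold chunks to disk and reload them on demand.

```python
class ChunkedBitArray:
    """Bit array addressed by arbitrarily large indices, stored as fixed-size chunks."""

    def __init__(self, length, chunk_bits=1 << 20):
        self.length = length
        self.chunk_bits = chunk_bits   # assumed to be a multiple of 8
        self.chunks = {}               # chunk number -> bytearray, created lazily

    def _locate(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Quotient selects the chunk, remainder selects the bit inside it.
        chunk_no, bit = divmod(index, self.chunk_bits)
        return chunk_no, bit >> 3, 1 << (bit & 7)

    def __getitem__(self, index):
        chunk_no, byte, mask = self._locate(index)
        chunk = self.chunks.get(chunk_no)
        return chunk is not None and bool(chunk[byte] & mask)

    def __setitem__(self, index, value):
        chunk_no, byte, mask = self._locate(index)
        chunk = self.chunks.get(chunk_no)
        if chunk is None:
            chunk = self.chunks[chunk_no] = bytearray(self.chunk_bits // 8)
        if value:
            chunk[byte] |= mask
        else:
            chunk[byte] &= ~mask & 0xFF


bits = ChunkedBitArray(length=10**12)
bits[23958923589] = True
print(bits[23958923589], bits[902305023])  # True False
```

The calling code only ever sees one large index space; which chunk is resident is a detail hidden inside the class.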

Or as an alternative approach to the one suggested by Pax, make use of the new Memory-Mapped File classes in .NET 4.0 and let the OS decide which chunks need to be in memory at any given time.
Note however that you'll want to try and optimise the algorithm to increase locality so that you do not needlessly end up swapping pages in and out of memory (trickier than this one sentence makes it sound).

Use multiple BitArrays to increase the maximum size. If a number is too great, bit-shift it and store the result in a second bit array that covers bits 33-64.
BitArray first = new BitArray(int.MaxValue);
BitArray second = new BitArray(int.MaxValue);
long num = 23958923589;
if (num > int.MaxValue)
{
// shift the long first, then cast; (int)num >> 32 would truncate before shifting
int shifted = (int)(num >> 32);
second[shifted] = true;
}
long request = 902305023;
if (request > int.MaxValue)
{
int shifted = (int)(request >> 32);
return second[shifted];
}
else return first[(int)request];
Of course it would be nice if BitArray would support size up to System.Numerics.BigInteger.
Swapping to disk will make your code really slow.
I have a 64-bit OS, and my BitArray is also limited to 32-bits.
PS: your prime number calculation looks weird; mine looks like this:
for (int i = 2; i <= number; i++)
if (primes[i])
for (int scalar = i + i; scalar <= number; scalar += i)
{
primes[scalar] = false;
yield return scalar;
}

The Sieve of Eratosthenes would perform better. With it I could determine all the 32-bit primes (about 105 million in total for the int range) in less than 4 minutes. Returning the list of primes is a different matter, as the memory requirement would be a little over 400 MB (1 int = 4 bytes). I printed the numbers to a file with a for loop and then imported them into a DB for more fun :) For the 64-bit primes, however, the program would need several modifications and would perhaps require distributed execution over multiple nodes. Also refer to the following links:
http://www.troubleshooters.com/codecorn/primenumbers/primenumbers.htm
http://en.wikipedia.org/wiki/Prime-counting_function

Related

.NET BitArray cardinality [duplicate]

I am implementing a library where I am extensively using the .Net BitArray class and need an equivalent to the Java BitSet.Cardinality() method, i.e. a method which returns the number of bits set. I was thinking of implementing it as an extension method for the BitArray class. The trivial implementation is to iterate and count the bits set (like below), but I wanted a faster implementation as I would be performing thousands of set operations and counting the answer. Is there a faster way than the example below?
count = 0;
for (int i = 0; i < mybitarray.Length; i++)
{
if (mybitarray [i])
count++;
}

This is my solution based on the "best bit counting method" from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
public static Int32 GetCardinality(BitArray bitArray)
{
Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
bitArray.CopyTo(ints, 0);
Int32 count = 0;
// fix for not truncated bits in last integer that may have been set to true with SetAll()
ints[ints.Length - 1] &= ~(-1 << (bitArray.Count % 32));
for (Int32 i = 0; i < ints.Length; i++)
{
Int32 c = ints[i];
// magic (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
unchecked
{
c = c - ((c >> 1) & 0x55555555);
c = (c & 0x33333333) + ((c >> 2) & 0x33333333);
c = ((c + (c >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
count += c;
}
return count;
}
According to my tests, this is around 60 times faster than the simple foreach loop and still 30 times faster than the Kernighan approach with around 50% bits set to true in a BitArray with 1000 bits. I also have a VB version of this if needed.
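For readers who want to see the parallel bit count in isolation, here is a small Python sketch of the same steps. The function names are invented for the example, and because Python integers are unbounded, the explicit `& 0xFFFFFFFF` masks stand in for the 32-bit wraparound that C# gives for free.

```python
def popcount32(c):
    """Count the set bits of a 32-bit word with the parallel (SWAR) method."""
    c &= 0xFFFFFFFF
    c = c - ((c >> 1) & 0x55555555)                  # sums of adjacent bit pairs
    c = (c & 0x33333333) + ((c >> 2) & 0x33333333)   # sums per 4-bit group
    c = (c + (c >> 4)) & 0x0F0F0F0F                  # sums per byte
    return ((c * 0x01010101) & 0xFFFFFFFF) >> 24     # add the four bytes together


def cardinality(words, bit_count):
    """Set bits among the first bit_count bits of a list of 32-bit words."""
    full, tail = divmod(bit_count, 32)
    total = sum(popcount32(w) for w in words[:full])
    if tail:
        # Ignore the unused high bits of the last, partially filled word.
        total += popcount32(words[full] & ((1 << tail) - 1))
    return total


print(popcount32(0b1011), cardinality([0xFFFFFFFF, 0xFFFFFFFF], 40))  # 3 40
```

The tail mask plays the same role as the `ints[ints.Length - 1] &= ...` line in the C# version.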

You can accomplish this pretty easily with LINQ:
BitArray ba = new BitArray(new[] { true, false, true, false, false });
var numOnes = (from bool m in ba
where m
select m).Count();

BitArray myBitArray = new BitArray(...
int
bits = myBitArray.Count,
size = ((bits - 1) >> 3) + 1,
counter = 0,
x,
c;
byte[] buffer = new byte[size];
myBitArray.CopyTo(buffer, 0);
for (x = 0; x < size; x++)
for (c = 0; buffer[x] > 0; buffer[x] >>= 1)
counter += buffer[x] & 1;
Taken from "Counting bits set, Brian Kernighan's way" and adapted for bytes. I'm using it for bit arrays of 1 000 000+ bits and it's superb.
If your bits are not n*8 then you can count the mod byte manually.

I had the same issue, but had more than just the one Cardinality method to convert. So, I opted to port the entire BitSet class. Fortunately it was self-contained.
Here is the Gist of the C# port.
I would appreciate if people would report any bugs that are found - I am not a Java developer, and have limited experience with bit logic, so I might have translated some of it incorrectly.

Faster and simpler version than the accepted answer thanks to the use of System.Numerics.BitOperations.PopCount
C#
Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
bitArray.CopyTo(ints, 0);
Int32 count = 0;
for (Int32 i = 0; i < ints.Length; i++) {
count += BitOperations.PopCount((uint)ints[i]); // PopCount takes uint or ulong, not int
}
Console.WriteLine(count);
F#
let ints = Array.create ((bitArray.Count >>> 5) + 1) 0u
bitArray.CopyTo(ints, 0)
ints
|> Array.sumBy BitOperations.PopCount
|> printfn "%d"
See more details in Is BitOperations.PopCount the best way to compute the BitArray cardinality in .NET?

You could use Linq, but it would be useless and slower:
var sum = mybitarray.OfType<bool>().Count(p => p);

There is no faster way with using BitArray - What it comes down to is you will have to count them - you could use LINQ to do that or do your own loop, but there is no method offered by BitArray and the underlying data structure is an int[] array (as seen with Reflector) - so this will always be O(n), n being the number of bits in the array.
The only way I could think of making it faster is using reflection to get a hold of the underlying m_array field, then you can get around the boundary checks that Get() uses on every call (see below) - but this is kinda dirty, and might only be worth it on very large arrays since reflection is expensive.
public bool Get(int index)
{
if ((index < 0) || (index >= this.Length))
{
throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
}
return ((this.m_array[index / 0x20] & (((int) 1) << (index % 0x20))) != 0);
}
If this optimization is really important to you, you should create your own class for bit manipulation, that internally could use BitArray, but keeps track of the number of bits set and offers the appropriate methods (mostly delegate to BitArray but add methods to get number of bits currently set) - then of course this would be O(1).

If you really want to maximize the speed, you could pre-compute a lookup table where given a byte-value you have the cardinality, but BitArray is not the most ideal structure for this, since you'd need to use reflection to pull the underlying storage out of it and operate on the integral types - see this question for a better explanation of that technique.
Another, perhaps more useful technique, is to use something like the Kernighan trick, which is O(m) for an n-bit value of cardinality m.
public static int GetCardinality(this BitArray bits)
{
    // Copy the bits into 32-bit words; BitArray itself supports no arithmetic.
    // Assumes any unused bits in the last word are zero.
    int[] words = new int[(bits.Count + 31) / 32];
    bits.CopyTo(words, 0);
    int c = 0;
    foreach (int word in words)
        for (uint w = (uint)word; w != 0; c++)
            w &= w - 1; // clear the least significant set bit
    return c;
}
This too is a bit more cumbersome than it would be in, say, C, because there are no operations defined between integer types and BitArrays: tmp &= tmp - 1, which clears the least significant set bit, has to be applied to the words copied out of the BitArray rather than to the BitArray itself.
I have no idea if this ends up being any faster than naively iterating for the case of the BCL BitArray, but algorithmically speaking it should be superior.
EDIT: cited where I discovered the Kernighan trick, with a more in-depth explanation

If you don't mind copying the code of System.Collections.BitArray into your project and editing it, you can write the following:
(I think it's the fastest. I've also tried using BitVector32[] to implement my BitArray, but it was still slow.)
public void Set(int index, bool value)
{
if ((index < 0) || (index >= this.m_length))
{
throw new ArgumentOutOfRangeException("index", "Index Out Of Range");
}
SetWithOutAuth(index,value);
}
// When setting values in a batch, we need a method that skips the index range check
private void SetWithOutAuth(int index, bool value)
{
int v = ((int)1) << (index % 0x20);
index = index / 0x20;
bool NotSet = (this.m_array[index] & v) == 0;
if (value && NotSet)
{
CountOfTrue++;//Count the True values
this.m_array[index] |= v;
}
else if (!value && !NotSet)
{
CountOfTrue--;//Count the True values
this.m_array[index] &= ~v;
}
else
return;
this._version++;
}
public int CountOfTrue { get; internal set; }
public void BatchSet(int start, int length, bool value)
{
if (start < 0 || start >= this.m_length || length <= 0)
return;
for (int i = start; i < start + length && i < this.m_length; i++)
{
SetWithOutAuth(i,value);
}
}

I wrote my own version after not finding one that uses a look-up table:
private int[] _bitCountLookup;
private void InitLookupTable()
{
_bitCountLookup = new int[256];
for (var byteValue = 0; byteValue < 256; byteValue++)
{
var count = 0;
for (var bitIndex = 0; bitIndex < 8; bitIndex++)
{
count += (byteValue >> bitIndex) & 1;
}
_bitCountLookup[byteValue] = count;
}
}
private int CountSetBits(BitArray bitArray)
{
var result = 0;
var numberOfFullBytes = bitArray.Length / 8;
var numberOfTailBits = bitArray.Length % 8;
var tailByte = numberOfTailBits > 0 ? 1 : 0;
var bitArrayInBytes = new byte[numberOfFullBytes + tailByte];
bitArray.CopyTo(bitArrayInBytes, 0);
for (var i = 0; i < numberOfFullBytes; i++)
{
result += _bitCountLookup[bitArrayInBytes[i]];
}
for (var i = (numberOfFullBytes * 8); i < bitArray.Length; i++)
{
if (bitArray[i])
{
result++;
}
}
return result;
}

The problem is naturally O(n), as a result your solution is probably the most efficient.
Since you are trying to count an arbitrary subset of bits, you cannot count the bits when they are set (which would provide a speed boost if you are not setting the bits too often).
You could check to see if the processor you are using has an instruction which will return the number of set bits. For example, a processor with SSE4 could use POPCNT according to this post. This would probably not work for you since .NET does not allow assembly (because it is platform independent). Also, ARM processors probably do not have an equivalent.
Probably the best solution would be a look-up table (or a switch, if you could guarantee the switch will be compiled to a single jump to currentLocation + byteValue). This would give you the count for the whole byte. Of course BitArray does not give access to the underlying data type, so you would have to make your own BitArray. You would also have to guarantee that all the bits in the byte will always be part of the intersection, which does not sound likely.
Another option would be to use an array of booleans instead of a BitArray. This has the advantage not needing to extract the bit from the others in the byte. The disadvantage is the array will take up 8x as much space in memory meaning not only wasted space, but also more data push as you iterate through the array to perform your count.
The difference between a standard array look up and a BitArray look up is as follows:
Array:
1. offset = index * indexSize
2. Get memory at location + offset and save to value
BitArray:
1. index = index/indexSize
2. offset = index * indexSize
3. Get memory at location + offset and save to value
4. position = index%indexSize
5. Shift value position bits
6. value = value and 1
With the exception of #2 for Arrays and #3 most of these commands take 1 processor cycle to complete. Some of the commands can be combined into 1 command using x86/x64 processors, though probably not with ARM since it uses a reduced set of instructions.
Which of the two (array or BitArray) perform better will be specific to your platform (processor speed, processor instructions, processor cache sizes, processor cache speed, amount of system memory (Ram), speed of system memory (CAS), speed of connection between processor and RAM) as well as the spread of indexes you want to count (are the intersections most often clustered or are they randomly distributed).
To summarize: you could probably find a way to make it faster, but your solution is the fastest you will get for your data set using a bit per boolean model in .NET.
Edit: make sure you are accessing the indexes you want to count in order. If you access indexes 200, 5, 150, 151, 311, 6 in that order then you will increase the amount of cache misses resulting in more time spent waiting for values to be retrieved from RAM.

Why is it faster to calculate the product of a consecutive array of integers by performing the calculation in pairs?

I was trying to create my own factorial function when I found that the calculation is twice as fast if it is done in pairs. Like this:
Groups of 1: 2*3*4 ... 50000*50001 = 4.1 seconds
Groups of 2: (2*3)*(4*5)*(6*7) ... (50000*50001) = 2.0 seconds
Groups of 3: (2*3*4)*(5*6*7) ... (49999*50000*50001) = 4.8 seconds
Here is the c# I used to test this.
Stopwatch timer = new Stopwatch();
timer.Start();
// Separate the calculation into groups of this size.
int k = 2;
BigInteger total = 1;
// Iterates from 2 to 50002, but instead of incrementing 'i' by one, it increments it 'k' times,
// and the inner loop calculates the product of 'i' to 'i+k', and multiplies 'total' by that result.
for (var i = 2; i < 50000 + 2; i += k)
{
BigInteger partialTotal = 1;
for (var j = 0; j < k; j++)
{
// Stops if it exceeds 50000.
if (i + j >= 50000) break;
partialTotal *= i + j;
}
total *= partialTotal;
}
Console.WriteLine(timer.ElapsedMilliseconds / 1000.0 + "s");
I tested this at different levels and put the average times over a few tests in a bar graph. I expected it to become more efficient as I increased the number of groups, but 3 was the least efficient and 4 had no improvement over groups of 1.
Link to First Data
Link to Second Data
What causes this difference, and is there an optimal way to calculate this?

BigInteger has a fast case for numbers of 31 bits or less. When you do a pairwise multiplication, this means a specific fast-path is taken, that multiplies the values into a single ulong and sets the value more explicitly:
public void Mul(ref BigIntegerBuilder reg1, ref BigIntegerBuilder reg2) {
...
if (reg1._iuLast == 0) {
if (reg2._iuLast == 0)
Set((ulong)reg1._uSmall * reg2._uSmall);
else {
...
}
}
else if (reg2._iuLast == 0) {
...
}
else {
...
}
}
public void Set(ulong uu) {
uint uHi = NumericsHelpers.GetHi(uu);
if (uHi == 0) {
_uSmall = NumericsHelpers.GetLo(uu);
_iuLast = 0;
}
else {
SetSizeLazy(2);
_rgu[0] = (uint)uu;
_rgu[1] = uHi;
}
AssertValid(true);
}
A 100% predictable branch like this is perfect for a JIT, and this fast-path should get optimized extremely well. It's possible that _rgu[0] and _rgu[1] are even inlined. This is extremely cheap, so effectively cuts down the number of real operations by a factor of two.
So why is a group of three so much slower? It's obvious that it should be slower than for k = 2; you have far fewer optimized multiplications. More interesting is why it's slower than k = 1. This is easily explained by the fact that the outer multiplication of total now hits the slow path. For k = 2 this impact is mitigated by halving the number of multiplies and the potential inlining of the array.
However, these factors do not help k = 3, and in fact the slow case hurts k = 3 a lot more. The second multiplication in the k = 3 case hits this case
if (reg1._iuLast == 0) {
...
}
else if (reg2._iuLast == 0) {
Load(ref reg1, 1);
Mul(reg2._uSmall);
}
else {
...
}
which allocates
EnsureWritable(1);
uint uCarry = 0;
for (int iu = 0; iu <= _iuLast; iu++)
uCarry = MulCarry(ref _rgu[iu], u, uCarry);
if (uCarry != 0) {
SetSizeKeep(_iuLast + 2, 0);
_rgu[_iuLast] = uCarry;
}
Why does this matter? Well, EnsureWritable(1) causes
uint[] rgu = new uint[_iuLast + 1 + cuExtra];
so rgu becomes length 3. The number of passes in total's code is decided in
public void Mul(ref BigIntegerBuilder reg1, ref BigIntegerBuilder reg2)
as
for (int iu1 = 0; iu1 < cu1; iu1++) {
...
for (int iu2 = 0; iu2 < cu2; iu2++, iuRes++)
uCarry = AddMulCarry(ref _rgu[iuRes], uCur, rgu2[iu2], uCarry);
...
}
which means that we have a total of len(total._rgu) * 3 operations. This hasn't saved us anything! There are only len(total._rgu) * 1 passes for k = 1 - we just do it three times!
There is actually an optimization on the outer loop that reduces this back down to len(total._rgu) * 2:
uint uCur = rgu1[iu1];
if (uCur == 0)
continue;
However, they "optimize" this optimization in a way that hurts far more than before:
if (reg1.CuNonZero <= reg2.CuNonZero) {
rgu1 = reg1._rgu; cu1 = reg1._iuLast + 1;
rgu2 = reg2._rgu; cu2 = reg2._iuLast + 1;
}
else {
rgu1 = reg2._rgu; cu1 = reg2._iuLast + 1;
rgu2 = reg1._rgu; cu2 = reg1._iuLast + 1;
}
For k = 2, that causes the outer loop to be over total, since reg2 contains no zero values with high probability. This is great because total is way longer than partialTotal, so the fewer passes the better. For k = 3, the EnsureWritable(1) will always cause a spare space because the multiplication of three numbers no more than 15 bits long can never exceed 64 bits. This means that, although we still only do one pass over total for k = 2, we do two for k = 3!
This starts to explain why the speed increases again beyond k = 3: the number of passes per addition increases more slowly than the number of additions decreases, as you're only adding ~15 bits to the inner value each time. The inner multiplications are fast relative to the massive total multiplications, so the more time spent consolidating values, the more time saved in passes over total. Further, the optimization is less frequently a pessimization.
It also explains why odd values take longer: they add an extra 32-bit integer to the _rgu array. This won't happen so cleanly if the ~15 bits wasn't so close to half of 32.
It's worth noting that there are a lot of ways to improve this code; the comments here are about why, not how to fix it. The easiest improvement would be to chuck the values in a heap and multiply only the two smallest values at a time.

The time required to do a BigInteger multiplication depends on the size of the product.
Both methods take the same number of multiplications, but if you multiply the factors in pairs, then the average size of the product is much smaller than it is if you multiply each factor with the product of all the smaller ones.
You can do even better if you always multiply the two smallest factors (original factors or intermediate products) that have yet to be multiplied together, until you get to the complete product.
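The "always multiply the two smallest" strategy can be sketched with a min-heap. This is a Python illustration rather than the C# from the question; the function name is made up, and it assumes all factors are positive integers.

```python
import heapq
import math


def product_smallest_first(factors):
    """Multiply all factors, always combining the two smallest values left."""
    heap = list(factors)
    if not heap:
        return 1
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, a * b)   # the partial product rejoins the pool
    return heap[0]


print(product_smallest_first(range(2, 21)) == math.factorial(20))  # True
```

Intermediate products stay roughly the same size as each other for as long as possible, so the expensive multiplications of very large by very small numbers are mostly avoided.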

I think you have a bug ('+' instead of '*').
partialTotal *= i + j;
Good to check that you are getting the right answer, not just interesting performance metrics.
But I'm curious what motivated you to try this. If you do find a difference, I would expect it would have to do with optimalities in register and/or memory allocation. And I would expect it would be 0-30% or something like that, not 50%.

What would be the shortest way to sum up the digits in odd and even places separately

I've always loved reducing number of code lines by using simple but smart math approaches. This situation seems to be one of those that need this approach. So what I basically need is to sum up digits in the odd and even places separately with minimum code. So far this is the best way I have been able to think of:
string number = "123456789";
int sumOfDigitsInOddPlaces=0;
int sumOfDigitsInEvenPlaces=0;
for (int i=0;i<number.Length;i++){
if(i%2==0)//Means odd ones
sumOfDigitsInOddPlaces+=number[i]-'0';//subtract '0' to get the digit value, not the character code
else
sumOfDigitsInEvenPlaces+=number[i]-'0';
}
//The rest is not important
Do you have a better idea? Something without needing to use if else

int* sum[2] = {&sumOfDigitsInOddPlaces,&sumOfDigitsInEvenPlaces};
for (int i=0;i<number.length;i++)
{
*(sum[i&1])+=number[i];
}
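The same index trick works without pointers if the two sums live in a two-element array. A short Python sketch (the function name is hypothetical), which also converts each character to its digit value before adding it:

```python
def digit_sums(number):
    """Return [sum of digits at even indices, sum of digits at odd indices]."""
    sums = [0, 0]
    for i, ch in enumerate(number):
        sums[i & 1] += int(ch)   # the low bit of the index picks the accumulator
    return sums


print(digit_sums("123456789"))  # [25, 20]
```

Index 0 holds the digits at 0-based even positions (1, 3, 5, 7, 9 in the example), which are the "odd places" when counting from 1.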

You could use two separate loops, one for the odd indexed digits and one for the even indexed digits.
Also your modulus conditional may be wrong, you're placing the even indexed digits (0,2,4...) in the odd accumulator. Could just be that you're considering the number to be 1-based indexing with the number array being 0-based (maybe what you intended), but for algorithms sake I will consider the number to be 0-based.
Here's my proposition
number = 123456789;
sumOfDigitsInOddPlaces=0;
sumOfDigitsInEvenPlaces=0;
//even digits
for (int i = 0; i < number.length; i = i + 2){
sumOfDigitsInEvenPlaces += number[i];
}
//odd digits, note the start at j = 1
for (int j = 1; j < number.length; j = j + 2){
sumOfDigitsInOddPlaces += number[j];
}
On the large scale this doesn't improve efficiency (it is still an O(N) algorithm), but it eliminates the branching.

Since you added C# to the question:
var numString = "123456789";
// Split() with no arguments would return the whole string as one element, so filter the characters directly
var odds = numString.Where((c, i) => i % 2 == 1);
var evens = numString.Where((c, i) => i % 2 == 0);
var sumOfOdds = odds.Sum(c => c - '0');
var sumOfEvens = evens.Sum(c => c - '0');

Do you like Python?
num_string = "123456789"
odds = sum(map(int, num_string[::2]))
evens = sum(map(int, num_string[1::2]))

This Java solution requires no if/else, has no code duplication and is O(N):
number = "123456789";
int[] sums = new int[2]; //sums[0] == sum of even digits, sums[1] == sum of odd
for(int arrayIndex=0; arrayIndex < 2; ++arrayIndex)
{
for (int i=0; i < number.length()-arrayIndex; i += 2)
{
sums[arrayIndex] += Character.getNumericValue(number.charAt(i+arrayIndex));
}
}

Assuming number.length is even, it is quite simple. The corner case is then to handle the last element if number.length is odd.
int i=0;
while(i<number.length-1)
{
sumOfDigitsInEvenPlaces += number[ i++ ];
sumOfDigitsInOddPlaces += number[ i++ ];
}
if( i < number.length )
sumOfDigitsInEvenPlaces += number[ i ];
Because the loop steps through i two at a time, subtracting 1 from the bound changes nothing if number.length is even.
If number.length is odd, it leaves out the last item.
If number.length is odd, then the value of i on exiting the loop is the index of the last element, which has not been visited yet.
If number.length is odd, by modulo 2 reasoning, you have to add the last item to sumOfDigitsInEvenPlaces.
This seems slightly more verbose, but also more readable, to me than Anonymous' (accepted) answer. However, benchmarks to come.
Well, the compiler seems to think my code is more understandable as well, since it removes it all if I don't print the results (which explains why I kept getting a time of 0 all along...). The other code, though, is obfuscated enough to confuse even the compiler.
In the end, even with huge arrays, it's pretty hard for clock_t to tell the difference between the two. You get about a third less instructions in the second case, but since everything's in cache (and your running sums even in registers) it doesn't matter much.
For the curious, I've put the disassembly of both versions (compiled from C) here : http://pastebin.com/2fciLEMw

Efficient algorithm to get primes between two large numbers

I'm a beginner in C# and I'm trying to write an application that gets the primes between two numbers entered by the user. The problem is that for large numbers (valid numbers are in the range from 1 to 1000000000) getting the primes takes a long time, and according to the problem I'm solving, the whole operation must be carried out in a small time interval. This is the problem link for more explanation:
SPOJ-Prime
And here's the part of my code that's responsible of getting primes:
public void GetPrime()
{
int L1 = int.Parse(Limits[0]);
int L2 = int.Parse(Limits[1]);
if (L1 == 1)
{
L1++;
}
for (int i = L1; i <= L2; i++)
{
for (int k = L1; k <= L2; k++)
{
if (i == k)
{
continue;
}
else if (i % k == 0)
{
flag = false;
break;
}
else
{
flag = true;
}
}
if (flag)
{
Console.WriteLine(i);
}
}
}
Is there any faster algorithm?
Thanks in advance.

I remember solving the problem like this:
Use the sieve of eratosthenes to generate all primes below sqrt(1000000000) = ~32 000 in an array primes.
For each number x between m and n, test whether it's prime only by checking divisibility against numbers <= sqrt(x) from the array primes. So for x = 29 you will only test if it's divisible by 2, 3 and 5.
There's no point in checking for divisibility against non-primes, since if x divisible by non-prime y, then there exists a prime p < y such that x divisible by p, since we can write y as a product of primes. For example, 12 is divisible by 6, but 6 = 2 * 3, which means that 12 is also divisible by 2 or 3. By generating all the needed primes in advance (there are very few in this case), you significantly reduce the time needed for the actual primality testing.
This will get accepted and doesn't require any optimization or modification to the sieve, and it's a pretty clean implementation.
You can do it faster by generalising the sieve to generate primes in an interval [left, right], not [2, right] like it's usually presented in tutorials and textbooks. This can get pretty ugly however, and it's not needed. But if anyone is interested, see:
http://pastie.org/9199654 and this linked answer.
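A minimal Python sketch of the two steps described in this answer, for illustration only. The function names `small_primes` and `primes_between` are invented here, and the code is not taken from the linked paste.

```python
from math import isqrt


def small_primes(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            # Clear p*p, p*p + p, ... in one slice assignment.
            is_prime[p * p::p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if is_prime[n]]


def primes_between(lo, hi):
    """Primes in [lo, hi], each tested only against primes <= sqrt(x)."""
    if hi < 2:
        return []
    primes = small_primes(isqrt(hi))
    result = []
    for x in range(max(lo, 2), hi + 1):
        for p in primes:
            if p * p > x:        # no prime factor up to sqrt(x): x is prime
                result.append(x)
                break
            if x % p == 0:       # found a prime factor: x is composite
                break
        else:
            result.append(x)     # ran out of candidate divisors
    return result


print(primes_between(1, 30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For an upper bound of a billion the precomputed list only goes up to 31622, so each candidate is checked against a few thousand primes at most.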

You are doing a lot of extra divisions that are not needed - if you know a number is not divisible by 3, there is no point in checking if it is divisible by 9, 27, etc. You should try to divide only by the potential prime factors of the number. Cache the set of primes you are generating and only check division by the previous primes. Note that you do need to generate the initial set of primes below L1.
Remember that every composite number has a prime factor no greater than its square root, so you can stop your divisions at that point. For instance, you can stop checking potential factors of the number 29 after 5.
You can also increment by 2, so that you skip checking even numbers altogether (special casing the number 2, of course).
I used to ask this question during interviews - as a test I compared an implementation similar to yours with the algorithm I described. With the optimized algorithm, I could generate hundreds of thousands of primes very fast - I never bothered waiting around for the slow, straightforward implementation.

You could try the Sieve of Eratosthenes. The basic difference would be that you start at L1 instead of starting at 2.

Let's change the question a bit: How quickly can you generate the primes between m and n and simply write them to memory? (Or, possibly, to a RAM disk.) On the other hand, remember the range of parameters as described on the problem page: m and n can be as high as a billion, while n-m is at most a million.
IVlad and Brian have given most of a competitive solution, even if it is true that a slower solution could be good enough. First generate or even precompute the prime numbers less than sqrt(billion); there aren't very many of them. Then do a truncated Sieve of Eratosthenes: make an array of length n-m+1 to keep track of the status of every number in the range [m,n], with initially every such number marked as prime (1). Then for each precomputed prime p, do a loop that looks like this:
for(k=ceil(m/p)*p; k <= n; k += p) status[k-m] = 0;
This loop marks all of the numbers in the range m <= x <= n as composite (0) if they are multiple of p. If this is what IVlad meant by "pretty ugly", I don't agree; I don't think that it's so bad.
In fact, almost 40% of this work is just for the primes 2, 3, and 5. There is a trick to combine the sieve for a few primes with initialization of the status array. Namely, the pattern of divisibility by 2, 3, and 5 repeats mod 30. Instead of initializing the array to all 1s, you can initialize it to a repeating pattern of 010000010001010001010001000001. If you want to be even more cutting edge, you can advance k by 30*p instead of by p, and only mark off the multiples in the same pattern.
After this, realistic performance gains would involve steps like using a bit vector rather than a char array to keep the sieve data in on-chip cache. And initializing the bit vector word by word rather than bit by bit. This does get messy, and also hypothetical since you can get to the point of generating primes faster than you can use them. The basic sieve is already very fast and not very complicated.
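The basic truncated sieve from this answer, without the mod-30 or bit-vector refinements, might look like the following Python sketch. The function name is made up, and one detail is an addition of this sketch rather than part of the one-line loop above: marking starts no lower than p*p, so that a base prime lying inside [m, n] is not crossed off as its own multiple.

```python
from math import isqrt


def segmented_sieve(m, n):
    """Primes in [m, n] using a status array of length n - m + 1."""
    if n < 2:
        return []
    m = max(m, 2)
    # Base primes up to sqrt(n) from an ordinary sieve.
    limit = isqrt(n)
    base = bytearray([1]) * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if base[p]:
            primes.append(p)
            for q in range(p * p, limit + 1, p):
                base[q] = 0
    # status[k - m] == 1 means "k has not been marked composite".
    status = bytearray([1]) * (n - m + 1)
    for p in primes:
        # First multiple of p that is >= m, but never p itself.
        start = max(p * p, (m + p - 1) // p * p)
        for k in range(start, n + 1, p):
            status[k - m] = 0
    return [m + i for i, flag in enumerate(status) if flag]


print(segmented_sieve(1, 30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Memory use depends only on n - m and sqrt(n), not on n itself, which is what makes a window near a billion cheap.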

One thing no one's mentioned is that it's rather quick to test a single number for primality. Thus, if the range involved is small but the numbers are large (ex. generate all primes between 1,000,000,000 and 1,000,100,000), it would be faster to just check every number for primality individually.
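One way to test a single number quickly is the Miller-Rabin test. The Python sketch below is an illustration, not something the answer above specifies: it uses the fixed bases 2, 3, 5 and 7, which give a deterministic answer for every n below 3,215,031,751, enough for the one-billion limit in this problem.

```python
def is_prime(n):
    """Miller-Rabin with bases 2, 3, 5, 7; exact for n < 3,215,031,751."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in (2, 3, 5, 7):
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False   # base a proves that n is composite
    return True


print([k for k in range(1_000_000_000, 1_000_000_030) if is_prime(k)])
# [1000000007, 1000000009, 1000000021]
```

Each test costs a handful of modular exponentiations, so scanning a narrow window of large numbers this way avoids building any sieve at all.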

There are many algorithms finding prime numbers. Some are faster, others are easier.
You can start by making some easiest optimizations. For example,
why are you searching if every number is prime? In other words, are you sure that, given a range of 411 to 418, there is a need to search if numbers 412, 414, 416 and 418 are prime? Numbers which divide by 2 and 3 can be skipped with very simple code modifications.
if the number is not 5 but ends with the digit '5' (1405, 335), it is not prime. (Bad idea: it will make the search slower.)
what about caching the results? You can then divide by primes rather by every number. Moreover, only primes less than square root of the number you search are concerned.
If you need something really fast and optimized, taking an existing algorithm instead of reinventing the wheel can be an alternative. You can also try to find some scientific papers explaining how to do it fast, but it can be difficult to understand and to translate to code.

int ceilingNumber = 1000000;
int myPrimes = 0;
BitArray myNumbers = new BitArray(ceilingNumber, true);
for(int x = 2; x < ceilingNumber; x++)
if(myNumbers[x])
{
for(int y = x * 2; y < ceilingNumber; y += x)
myNumbers[y] = false;
}
for(int x = 2; x < ceilingNumber; x++)
if(myNumbers[x])
{
myPrimes++;
Console.Out.WriteLine(x);
}
Console.Out.WriteLine("======================================================");
Console.Out.WriteLine("There is/are {0} primes between 0 and {1} ",myPrimes,ceilingNumber);
Console.In.ReadLine();

I think I have a very fast and efficient algorithm for finding prime numbers (it can generate all primes even when using the BigInteger type). It is much faster and simpler than any other one, and I use it to solve almost every prime-related problem in Project Euler, with a complete (brute force) solution taking just a few seconds.
Here is java code:
public boolean checkprime(int value){ //Using for loop if need to generate prime in a
    // 0, 1 and negative numbers are not prime
    if (value < 2) return false;
    /* Earlier hand-tuned limits, kept for reference:
    if(value >100)limit = value/10;
    if(value >10000)limit = value/100;
    if(value >90000)limit = value/300;
    if(value >1000000)limit = value/1000;
    if(value >4000000)limit = value/2000;
    if(value >9000000)limit = value/3000;
    Reasoning: a number that is not prime has at least 2 factors (other than 1
    and itself), one at or above its square root and one at or below it,
    ex: 9997 = 13*769 and sqrt(9997) is about 100,
    so for values near 10000 we only need to check divisors up to 100.
    */
    int limit = (int)Math.sqrt(value); //General case
    for (int n = 2; n <= limit; n++) {
        if (value % n == 0) {
            return false;
        }
    }
    return true;
}

import java.util.Scanner;
class Test{
    public static void main(String args[]){
        Test tt=new Test();
        Scanner obj=new Scanner(System.in);
        int m,n;
        // read the lower bound m, then the upper bound n
        m=obj.nextInt();
        n=obj.nextInt();
        tt.IsPrime(n,m);
    }
    // prints every prime in the range k..num
    public void IsPrime(int num,int k)
    {
        boolean[] isPrime = new boolean[num+1];
        // initially assume all integers >= 2 are prime
        for (int i = 2; i <= num; i++) {
            isPrime[i] = true;
        }
        // mark non-primes <= num using Sieve of Eratosthenes
        for (int i = 2; i*i <= num; i++) {
            // if i is prime, then mark multiples of i as nonprime
            // suffices to consider multiples i*i, i*(i+1), ..., up to num
            if (isPrime[i]) {
                for (int j = i; i*j <=num; j++) {
                    isPrime[i*j] = false;
                }
            }
        }
        // a negative lower bound would index outside the array, so start at 2 at the earliest
        for (int i = Math.max(k, 2); i <= num; i++) {
            if (isPrime[i])
            {
                System.out.println(i);
            }
        }
    }
}

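List<int> prime(int x, int y)
{
    List<int> a = new List<int>();
    // 0, 1 and negative numbers are not prime, so start at 2 at the earliest
    for (int m = Math.Max(x, 2); m < y; m++)
    {
        int b = 0; // becomes 1 once a divisor of m is found
        for (int i = 2; i <= m / 2; i++)
        {
            if (m % i == 0)
            {
                b = 1;
                break;
            }
        }
        if (b == 0) a.Add(m);
    }
    return a;
}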

Hash table faster in C# than C++?

Here's a curiosity I've been investigating. The .NET Dictionary class performs ridiculously fast compared to the STL unordered_map in a test I keep running, and I can't figure out why.
(0.5 seconds vs. 4 seconds on my machine)
(.NET 3.5 SP1 vs. Visual Studio 2008 Express SP1's STL)
On the other hand, if I implement my own hash table in C# and C++, the C++ version is about twice as fast as the C# one, which is fine because it reinforces my common sense that native machine code is sometimes faster. (See. I said "sometimes".) Me being the same person in both languages, I wonder what tricks was the C# coder from Microsoft able to play that the C++ coder from Microsoft wasn't? I'm having trouble imagining how a compiler could play such tricks on its own, going through the trouble of optimizing what should look to it to be arbitrary function calls.
It's a simple test, storing and retrieving integers.
C#:
const int total = (1 << 20);
int sum = 0;
Dictionary<int, int> dict = new Dictionary<int, int>();
for(int i = 0; i < total; i++)
{
    dict.Add(i, i * 7);
}
for(int j = 0; j < (1 << 3); j++)
{
    int i = total;
    while(i > 0)
    {
        i--;
        sum += dict[i];
    }
}
Console.WriteLine(sum);
C++:
const int total = (1 << 20);
int sum = 0;
std::tr1::unordered_map<int, int> dict;
for(int i = 0; i < total; i++)
{
    dict.insert(pair<int, int>(i, i * 7));
}
for(int j = 0; j < (1 << 3); j++)
{
    int i = total;
    while(i > 0)
    {
        i--;
        std::tr1::unordered_map<int, int>::const_iterator found =
            dict.find(i);
        sum += found->second;
    }
}
cout << sum << endl;

The two versions are not equivalent: you are constructing an iterator on each pass of your C++ while loop. That takes CPU time and skews your results.

You are measuring the cost of explicit memory management. More statistics are available here. This is relevant too. And Chris Sells' attempt to add deterministic finalization to the CLR is notable.

There will be some differences at the code level: the unordered map takes a pair, which forces the construction of such an object, whereas C# might be faster by passing two parameters to Add.
Another point is the implementation of the hashtables themselves: the implementation of the hashing function, or the way to deal with collisions, will cause different performance patterns.
Throw in alignment and caching, JIT-friendliness of some algorithms, and comparing two different implementations in two different languages becomes very difficult, and the only thing you can compare is the particular task at hand. Try with fewer or more elements, or try to access elements randomly instead of in sequence, and you might see very different results.
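To try the random-access suggestion on the C# side, something like the following could be used. It is a rough sketch and not a rigorous benchmark: the class and method names are invented, the sum is kept in a long so it cannot overflow, and the shuffle uses a fixed seed only to make runs repeatable:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class LookupOrderBenchmark
{
    // Same contents as in the question: key i maps to i * 7.
    public static Dictionary<int, int> Build(int total)
    {
        var dict = new Dictionary<int, int>(total);
        for (int i = 0; i < total; i++) dict.Add(i, i * 7);
        return dict;
    }

    // Keys 0..total-1, either in order or shuffled (Fisher-Yates).
    public static int[] Keys(int total, bool shuffle, int seed)
    {
        var keys = new int[total];
        for (int i = 0; i < total; i++) keys[i] = i;
        if (shuffle)
        {
            var rng = new Random(seed);
            for (int i = total - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1);          // 0 <= j <= i
                int tmp = keys[i]; keys[i] = keys[j]; keys[j] = tmp;
            }
        }
        return keys;
    }

    // Looks up every key in the given order and reports the elapsed milliseconds.
    public static long TimedSum(Dictionary<int, int> dict, int[] keys, out long elapsedMs)
    {
        var watch = Stopwatch.StartNew();
        long sum = 0;
        foreach (int key in keys) sum += dict[key];
        watch.Stop();
        elapsedMs = watch.ElapsedMilliseconds;
        return sum;
    }

    public static void Run(int total)
    {
        var dict = Build(total);
        long seqMs, rndMs;
        long seq = TimedSum(dict, Keys(total, false, 0), out seqMs);
        long rnd = TimedSum(dict, Keys(total, true, 12345), out rndMs);
        Console.WriteLine("sequential: {0} ms, shuffled: {1} ms, sums equal: {2}",
            seqMs, rndMs, seq == rnd);
    }
}
```

Running LookupOrderBenchmark.Run(1 << 20) a few times, in a release build, gives a first impression of how much the access order matters on a given machine. The same experiment would have to be repeated on the C++ side before drawing conclusions.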
