How to get last set bit in BitArray? - c#

What is an efficient (fast) way to get the last set bit in a BitArray? (LINQ or a simple backward for loop isn't fast enough for large bitmaps, and I need it to be fast.)
One algorithm I can see: go backward through the BitArray's internal int array and use a compiler intrinsic like C++'s _BitScanReverse (I don't know the C# equivalent).

The "normal" solution:
static long FindLastSetBit(BitArray array)
{
for (int i = array.Length - 1; i >= 0; i--)
{
if (array[i])
{
return i;
}
}
return -1;
}
The reflection solution (note - relies on implementation of BitArray):
static long FindLastSetBitReflection(BitArray array)
{
int[] intArray = (int[])array.GetType().GetField("m_array", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic).GetValue(array);
for (var i = intArray.Length - 1; i >= 0; i--)
{
var b = intArray[i];
if (b != 0)
{
var pos = (i << 5) + 31;
for (int bit = 31; bit >= 0; bit--)
{
if ((b & (1 << bit)) != 0)
return pos;
pos--;
}
return pos;
}
}
return -1;
}
The reflection solution is 50-100x faster for me on large BitArrays (on very small ones the overhead of reflection will start to appear). It takes about 0.2 ms per megabyte on my machine.
The main thing is that if (b != 0) checks 32 bits at once. The inner loop which checks specific bits only runs once, when the correct word is found.
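On the _BitScanReverse question: on .NET Core 3.0 and later, System.Numerics.BitOperations.Log2 returns the index of the highest set bit of a uint, and the JIT can map it to a hardware instruction where one is available. Below is a minimal sketch, assuming that runtime, that combines it with the word-at-a-time scan above. It uses the public BitArray.CopyTo(int[], int) instead of reflection. The class and method names are illustrative. Copying the words costs one pass over memory, so it may be somewhat slower than reading m_array directly, but it does not depend on a private field name. As a precaution it also clears any bits stored past Length in the last word.

```csharp
using System;
using System.Collections;
using System.Numerics;

public static class BitArraySearch
{
    // Returns the index of the highest set bit in 'array', or -1 if no bit is set.
    // Assumes .NET Core 3.0+ (System.Numerics.BitOperations).
    public static int FindLastSetBit(BitArray array)
    {
        if (array.Length == 0) return -1;

        // Copy the bits into 32-bit words through the public API (no reflection).
        int[] words = new int[(array.Length + 31) / 32];
        array.CopyTo(words, 0);

        // Precaution: clear any bits stored past Length in the final word.
        int extra = array.Length % 32;
        if (extra != 0)
            words[words.Length - 1] &= (int)((1u << extra) - 1);

        // Check 32 bits at a time from the end; only the first non-zero word matters.
        for (int i = words.Length - 1; i >= 0; i--)
        {
            if (words[i] != 0)
                return (i << 5) + BitOperations.Log2((uint)words[i]);
        }
        return -1;
    }
}
```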
Edit: I removed the unsafe code because almost nothing is gained by it. It only avoids the array bounds check, and since the code is already so fast, that doesn't matter much. For the record, here is the unsafe solution (~30% faster for me):
static unsafe long FindLastSetBitUnsafe(BitArray array)
{
int[] intArray = (int[])array.GetType().GetField("m_array", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic).GetValue(array);
fixed (int* buffer = intArray)
{
for (var i = intArray.Length - 1; i >= 0; i--)
{
var b = buffer[i];
if (b != 0)
{
var pos = (i << 5) + 31;
for (int bit = 31; bit >= 0; bit--)
{
if ((b & (1 << bit)) != 0)
return pos;
pos--;
}
return pos;
}
}
}
return -1;
}

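If you want the index of the last set bit, you can do this in C# 6 (BitArray only implements the non-generic IEnumerable, so Cast<bool>() is needed before Select):
int? index = array.Cast<bool>()
.Select((b, i) => new { Index = i, Value = b })
.LastOrDefault(x => x.Value)
?.Index;
Before C# 6, without the null-conditional operator, you have to do something like this:
var last = array.Cast<bool>()
.Select((b, i) => new { Index = i, Value = b })
.LastOrDefault(x => x.Value);
int? index = last == null ? (int?)null : last.Index;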
Either way the index will be null if all the bits are zero.

I don't believe anything can be done other than iterating from the last bit to the first and checking whether each one is set. It could be done with something like:
BitArray bits = ...;
int lastSet = Enumerable.Range(1, bits.Length)
.Select(i => bits.Length - i)
.Where(i => bits[i])
.DefaultIfEmpty(-1)
.First();
That should return the index of the last set bit, or -1 if none is set. I haven't tested it myself, so it may need some adjustment.
Hope it helps.

Related

Optimal solution for "Bitwise AND" problem in C#

Problem statement:
Given an array of non-negative integers, count the number of unordered pairs of array elements, such that their bitwise AND is a power of 2.
Example:
arr = [10, 7, 2, 8, 3]
Answer: 6 (10&7, 10&2, 10&8, 10&3, 7&2, 2&3)
Constraints:
1 <= arr.Count <= 2*10^5
0 <= arr[i] <= 2^12
Here's my brute-force solution that I've come up with:
private static Dictionary<int, bool> _dictionary = new Dictionary<int, bool>();
public static long CountPairs(List<int> arr)
{
long result = 0;
for (var i = 0; i < arr.Count - 1; ++i)
{
for (var j = i + 1; j < arr.Count; ++j)
{
if (IsPowerOfTwo(arr[i] & arr[j])) ++result;
}
}
return result;
}
public static bool IsPowerOfTwo(int number)
{
if (_dictionary.TryGetValue(number, out bool value)) return value;
var result = (number != 0) && ((number & (number - 1)) == 0);
_dictionary[number] = result;
return result;
}
For small inputs this works fine, but for large inputs it is slow.
My question is: what is the optimal (or at least a more optimal) solution for this problem? Please provide an elegant solution in C#. 😊
One way to accelerate your approach is to compute the histogram of your data values before counting.
This will reduce the number of computations for long arrays because there are far fewer possible values (4097, from 0 to 4096) than elements in your array (up to 200,000).
Be careful when counting bins that are powers of 2 to make sure you do not overcount the number of pairs by including cases when you are comparing a number with itself.
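Here is a minimal sketch of the histogram idea in C#, assuming values lie in 0..4096 as in the constraints. The names are illustrative. It counts how often each value occurs, then compares each pair of distinct values that actually occur once. That is at most about 8.4 million checks for 4097 values, regardless of the array length. Equal values are handled separately with count * (count - 1) / 2, so an element is never paired with itself.

```csharp
using System;
using System.Collections.Generic;

public static class PowerOfTwoAndPairs
{
    // Counts unordered pairs (i < j) with arr[i] & arr[j] a power of two.
    // Assumes 0 <= arr[k] <= maxValue (4096 in the problem constraints).
    public static long CountPairs(IReadOnlyList<int> arr, int maxValue = 4096)
    {
        // Histogram: how many times each value occurs.
        var count = new long[maxValue + 1];
        foreach (int v in arr) count[v]++;

        // Only the distinct values that actually occur need to be compared.
        var present = new List<int>();
        for (int v = 0; v <= maxValue; v++)
            if (count[v] != 0) present.Add(v);

        long result = 0;
        for (int x = 0; x < present.Count; x++)
        {
            int a = present[x];
            // Pairs of equal values: a & a == a, so they count if a is a power of two.
            // count * (count - 1) / 2 excludes pairing an element with itself.
            if (IsPowerOfTwo(a)) result += count[a] * (count[a] - 1) / 2;

            // Pairs of distinct values, each unordered pair of values visited once.
            for (int y = x + 1; y < present.Count; y++)
            {
                int b = present[y];
                if (IsPowerOfTwo(a & b)) result += count[a] * count[b];
            }
        }
        return result;
    }

    private static bool IsPowerOfTwo(int n) => n != 0 && (n & (n - 1)) == 0;
}
```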
We can adapt the bit-subset dynamic programming idea to have a solution with O(2^N * N^2 + n * N) complexity, where N is the number of bits in the range, and n is the number of elements in the list. (So if the integers were restricted to [1, 4096] or 2^12, with n at 100,000, we would have on the order of 2^12 * 12^2 + 100000*12 = 1,789,824 iterations.)
The idea is that we want to count instances for which we have overlapping bit subsets, with the twist of adding a fixed set bit. Given Ai -- for simplicity, take 6 = b110 -- if we were to find all partners that AND to zero, we'd take Ai's negation,
110 -> ~110 -> 001
Now we can build a dynamic program that takes a diminishing mask, starting with the full number and diminishing the mask towards the left
001
^^^
001
^^
001
^
Each set bit on the negation of Ai represents a zero, which can be ANDed with either 1 or 0 to the same effect. Each unset bit on the negation of Ai represents a set bit in Ai, which we'd like to pair only with zeros, except for a single set bit.
We construct this set bit by examining each possibility separately. So where, to count pairs that AND with Ai to zero, we'd do something like
001 ->
001
000
we now want to enumerate
011 ->
011
010
101 ->
101
100
fixing a single bit each time.
We can achieve this by adding a dimension to the inner iteration. When the mask does have a set bit at the end, we "fix" the relevant bit by counting only the result for the previous DP cell that would have the bit set, and not the usual union of subsets that could either have that bit set or not.
Here is some JavaScript code (sorry, I do not know C#) to demonstrate, with a test at the end that compares it against the brute-force solution.
var debug = 0;
function bruteForce(a){
let answer = 0;
for (let i = 0; i < a.length; i++) {
for (let j = i + 1; j < a.length; j++) {
let and = a[i] & a[j];
if ((and & (and - 1)) == 0 && and != 0){
answer++;
if (debug)
console.log(a[i], a[j], a[i].toString(2), a[j].toString(2))
}
}
}
return answer;
}
function f(A, N){
const n = A.length;
const hash = {};
const dp = new Array(1 << N);
for (let i=0; i<1<<N; i++){
dp[i] = new Array(N + 1);
for (let j=0; j<N+1; j++)
dp[i][j] = new Array(N + 1).fill(0);
}
for (let i=0; i<n; i++){
if (hash.hasOwnProperty(A[i]))
hash[A[i]] = hash[A[i]] + 1;
else
hash[A[i]] = 1;
}
for (let mask=0; mask<1<<N; mask++){
// j is an index where we fix a 1
for (let j=0; j<=N; j++){
if (mask & 1){
if (j == 0)
dp[mask][j][0] = hash[mask] || 0;
else
dp[mask][j][0] = (hash[mask] || 0) + (hash[mask ^ 1] || 0);
} else {
dp[mask][j][0] = hash[mask] || 0;
}
for (let i=1; i<=N; i++){
if (mask & (1 << i)){
if (j == i)
dp[mask][j][i] = dp[mask][j][i-1];
else
dp[mask][j][i] = dp[mask][j][i-1] + dp[mask ^ (1 << i)][j][i - 1];
} else {
dp[mask][j][i] = dp[mask][j][i-1];
}
}
}
}
let answer = 0;
for (let i=0; i<n; i++){
for (let j=0; j<N; j++)
if (A[i] & (1 << j))
answer += dp[((1 << N) - 1) ^ A[i] | (1 << j)][j][N];
}
for (let i=0; i<N + 1; i++)
if (hash[1 << i])
answer = answer - hash[1 << i];
return answer / 2;
}
var As = [
[10, 7, 2, 8, 3] // 6
];
for (let A of As){
console.log(JSON.stringify(A));
console.log(`DP, brute force: ${ f(A, 4) }, ${ bruteForce(A) }`);
console.log('');
}
var numTests = 1000;
for (let i=0; i<numTests; i++){
const N = 6;
const A = [];
const n = 10;
for (let j=0; j<n; j++){
const num = Math.floor(Math.random() * (1 << N));
A.push(num);
}
const fA = f(A, N);
const brute = bruteForce(A);
if (fA != brute){
console.log('Mismatch:');
console.log(A);
console.log(fA, brute);
console.log('');
}
}
console.log("Done testing.");
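For readers who want C#, here is a sketch of a closely related but simpler technique: a standard sum-over-subsets (SOS) DP. It is not a line-by-line port of the JavaScript above. It assumes every value fits in 13 bits (0..8191, enough for 2^12 = 4096), and the names are illustrative. Here f[m] counts the elements that are submasks of m. For an element x and a bit b set in x, a partner y satisfies x & y == 1 << b exactly when y contains b and no other bit of x. In other words, y is a submask of (~x | (1 << b)) but not of ~x, so the count is f[~x | (1 << b)] - f[~x], with ~x restricted to 13 bits. Summing over b counts ordered pairs. Pairs of an element with itself are then removed and the total halved, for O(2^N * N + n * N) work overall.

```csharp
using System;

public static class PowerOfTwoAndSos
{
    // Counts unordered pairs (i < j) with arr[i] & arr[j] a power of two.
    // Assumes every value fits in 'bits' bits (13 covers 0..4096).
    public static long CountPairs(int[] arr, int bits = 13)
    {
        int size = 1 << bits;
        int full = size - 1;

        // f[m] = number of elements x that are submasks of m ((x & ~m) == 0).
        var f = new long[size];
        foreach (int x in arr) f[x]++;
        for (int b = 0; b < bits; b++)
            for (int m = 0; m < size; m++)
                if ((m & (1 << b)) != 0)
                    f[m] += f[m ^ (1 << b)];

        long ordered = 0; // ordered pairs, including each element with itself
        foreach (int x in arr)
        {
            int rest = full & ~x; // bits that x does NOT have
            for (int b = 0; b < bits; b++)
            {
                if ((x & (1 << b)) == 0) continue;
                // y shares exactly bit b with x: a submask of rest|b that contains b
                ordered += f[rest | (1 << b)] - f[rest];
            }
        }

        // x & x == x counts only when x itself is a power of two; remove those.
        foreach (int x in arr)
            if (x != 0 && (x & (x - 1)) == 0) ordered--;

        return ordered / 2;
    }
}
```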
int[] numbers = new[] { 10, 7, 2, 8, 3 };
static bool IsPowerOfTwo(int n) => (n != 0) && ((n & (n - 1)) == 0);
long result = numbers.AsParallel()
.Select((a, i) => numbers
.Skip(i + 1)
.Select(b => a & b)
.Count(IsPowerOfTwo))
.Sum();
If I understand the problem correctly, this should work and should be faster.
First, for each number in the array we grab all elements in the array after it to get a collection of numbers to pair with.
Then we AND each number with each of its partners and count the results that satisfy our IsPowerOfTwo predicate (the usual n & (n - 1) check).
Finally, we simply sum all the counts. The output for this example is 6.
I think this should be more performant than your dictionary-based solution. It avoids a dictionary lookup every time you check for a power of 2, and AsParallel spreads the outer loop across cores. It is still O(n^2) overall, though.
I think also given the numerical constraints of your inputs it is fine to use int data types.

What's the fastest way to remove characters from an alpha-numeric string?

Say we have the following strings that we pass as parameters to the function below:
string sString = "S104";
string sString2 = "AS105";
string sString3 = "ASRVT106";
I want to be able to extract the numbers from the string to place them in an int variable. Is there a quicker and/or more efficient way of removing the letters from the strings than the following code? (*These strings will be populated dynamically at runtime. They are not assigned values at construction.)
Code:
public void GetID(string sCustomTag = null)
{
m_sCustomTag = sCustomTag;
try {
m_lID = Convert.ToInt32(m_sCustomTag); }
catch{
try{
int iSubIndex = 0;
char[] subString = sCustomTag.ToCharArray();
//ITERATE THROUGH THE CHAR ARRAY
for (int i = 0; i < subString.Length; i++)
{
for (int j = 0; j < 10; j++)
{
// compare with the digit characters '0'..'9', not the integers 0..9
if (subString[i] == '0' + j)
{
iSubIndex = i;
goto createID;
}
}
}
createID: m_lID = Convert.ToInt32(m_sCustomTag.Substring(iSubIndex));
}
//IF NONE OF THAT WORKS...
catch
{
m_lID = 00000;
throw; // rethrow without resetting the stack trace
}
}
}
I've done things like this before, but I'm not sure if there's a more efficient way to do it. If it were always a single letter at the beginning, I could just set iSubIndex to 1 every time, but users can essentially enter whatever they want. Generally, the strings will be formatted LETTER-then-NUMBER. However, if they aren't, or users enter multiple letters as in sString2 or sString3, I need to be able to compensate for that. Furthermore, if the user enters some whacked-out, non-traditional format like string sString4 = "S51A24";, is there a way to just remove any and all letters from the string?
I've looked about, and can't find anything on MSDN or Google. Any help or links to it are greatly appreciated!
You can use a regular expression. It's not necessarily faster, but it's more concise.
string sString = "S104";
string sString2 = "AS105";
string sString3 = "ASRVT106";
var re = new Regex(@"\d+"); // needs using System.Text.RegularExpressions;
Console.WriteLine(re.Match(sString).Value); // 104
Console.WriteLine(re.Match(sString2).Value); // 105
Console.WriteLine(re.Match(sString3).Value); // 106
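The question also asks how to remove every letter, for inputs like "S51A24", where a single \d+ match would only return 51. Here is a minimal sketch, assuming you want all the digits concatenated. The class and method names are illustrative. It uses Regex.Replace to delete every character outside 0-9 before parsing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitExtractor
{
    // Deletes every character that is not an ASCII digit, then parses the rest.
    // [^0-9] is used instead of \D because \d also matches non-ASCII Unicode digits.
    // Throws FormatException if the string contains no digits at all.
    public static int ExtractAllDigits(string s) =>
        int.Parse(Regex.Replace(s, "[^0-9]", ""));
}
```

Like the StringBuilder and LINQ versions, this concatenates digit runs that are separated by letters. Use the Match-based version if you only want the first run.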
You can use a Regex, but it's probably faster to just do:
public int ExtractInteger(string str)
{
var sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
if(Char.IsDigit(str[i])) sb.Append(str[i]);
return int.Parse(sb.ToString());
}
You can simplify further with some LINQ at the expense of a small performance penalty:
public int ExtractInteger(string str)
{
return int.Parse(new String(str.Where(c=>Char.IsDigit(c)).ToArray()));
}
Now, if you only want to parse the first sequence of consecutive digits, do this instead:
public int ExtractInteger(string str)
{
return int.Parse(new String(str.SkipWhile(c=>!Char.IsDigit(c)).TakeWhile(c=>Char.IsDigit(c)).ToArray()));
}
Fastest is to parse the string without removing anything:
var s = "S51A24";
int m_lID = 0;
for (int i = 0; i < s.Length; i++)
{
int d = s[i] - '0';
if ((uint)d < 10)
m_lID = m_lID * 10 + d;
}
Debug.Print(m_lID + ""); // 5124
string removeLetters(string s)
{
// iterate backwards so that removing a character doesn't skip the one after it
for (int i = s.Length - 1; i >= 0; i--)
{
char c = s[i];
if (IsEnglishLetter(c))
{
s = s.Remove(i, 1);
}
}
return s;
}
bool IsEnglishLetter(char c)
{
return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
While you asked "what's the fastest way to remove characters..." what you really appear to be asking is "how do I create an integer by extracting only the digits from the string"?
Going with this assumption, your first call to Convert.ToInt32 will be slow whenever the string contains anything other than digits, because it fails by throwing an exception.
Let's try another approach. Let's think about each of the cases.
1. The string always starts with a series of digits (e.g. 123ABC => 123)
2. The string always ends with a series of digits (e.g. ABC123 => 123)
3. The string has a series of contiguous digits in the middle (e.g. AB123C => 123)
4. The digits are possibly noncontiguous (e.g. A77C12 => 7712)
Case 4 is the "safest" assumption (after all, it is a superset of Cases 1, 2 and 3), so we need an algorithm for that. As a bonus, I'll provide algorithms specialized for the other cases.
The Main Algorithm, All Cases
By iterating over the string's characters in place through an unsafe fixed pointer, we can extract the digits and convert them to a single number without the data copy made by ToCharArray(). We also avoid the allocations of, say, a StringBuilder implementation, as well as a possibly slow regex solution.
NOTE: This is valid C# code though it's using pointers. It does look like C++, but I assure you it's C#.
public static unsafe int GetNumberForwardFullScan(string s)
{
int value = 0;
fixed (char* pString = s)
{
var pChar = pString;
for (int i = 0; i != s.Length; i++, pChar++)
{
// if the char is not between 0-9, skip it and move on to the next char (digits later in the string still count)
if (*pChar < '0' || *pChar > '9')
continue;
// running recalculation of the integer
value = value * 10 + *pChar - '0';
}
}
return value;
}
Running this against any of the inputs: "AS106RVT", "ASRVT106", "106ASRVT", or "1AS0RVT6" results in pulling out 1, 0, 6 and calculating on each digit as
0*10 + 1 == 1
1*10 + 0 == 10
10*10 + 6 == 106
Case 1 Only Algorithm (Digits at Start of String)
This algorithm is identical to the one above, but instead of continue we can break as soon as we reach a non-digit. This would be much faster if we can assume all the inputs start with digits and the strings are long.
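Here is a minimal sketch of that Case 1 variant. It uses plain string indexing rather than fixed pointers, so it compiles without the /unsafe switch. The class and method names are illustrative.

```csharp
using System;

public static class LeadingDigits
{
    // Case 1 only: digits at the start of the string, e.g. "123ABC" -> 123.
    // Stops at the first non-digit; returns 0 if the string does not start with a digit.
    // No overflow checking, same as the pointer-based version above.
    public static int GetNumberForwardUntilNonDigit(string s)
    {
        int value = 0;
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c < '0' || c > '9')
                break; // first non-digit ends the number
            value = value * 10 + (c - '0');
        }
        return value;
    }
}
```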
Case 2 Only Algorithm (Digits at End of String)
This is almost the same as Case 1 Only, except you have to
1. iterate from the end of the string to the beginning (aka backwards), stopping on the first non-digit, and
2. change the calculation to sum up powers of ten.
Both of those are a bit tricky, so here's what that looks like:
public static unsafe int GetNumberBackward(string s)
{
int value = 0;
fixed (char* pString = s)
{
char* pChar = pString + s.Length - 1;
for (int i = 0; i != s.Length; i++, pChar--) // stop before running past the start of the string
{
if (*pChar < '0' || *pChar > '9')
break;
value = (*pChar - '0') * (int)Math.Pow(10, i) + value;
}
}
return value;
}
So the iterations of the calculation look like
6*10^0 + 0 == 6
0*10^1 + 6 == 6
1*10^2 + 6 == 106
While I used Math.Pow in these examples, you can find integer only versions that might be faster.
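One integer-only way to do the Case 2 calculation is to keep a running place value that is multiplied by 10 after each digit, instead of calling Math.Pow. It is sketched here without pointers, and the names are illustrative.

```csharp
using System;

public static class TrailingDigits
{
    // Case 2 only: digits at the end of the string, e.g. "ABC123" -> 123.
    // Walks backward and stops at the first non-digit from the right.
    // No overflow checking, like the pointer-based version above.
    public static int GetNumberBackward(string s)
    {
        int value = 0;
        int place = 1; // 10^0, then 10^1, 10^2, ...
        for (int i = s.Length - 1; i >= 0; i--)
        {
            char c = s[i];
            if (c < '0' || c > '9')
                break;
            value += (c - '0') * place;
            place *= 10;
        }
        return value;
    }
}
```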
Cases 1-3 Only (i.e. All Digits Contiguous Somewhere in the String)
This algorithm says to
1. skip all non-digits,
2. then scan only digits,
3. stop at the first non-digit after that.
It would look like
public static unsafe int GetContiguousDigits(string s)
{
int value = 0;
fixed (char* pString = s)
{
var pChar = pString;
// skip non-digits
int i = 0;
for (; i != s.Length; i++, pChar++)
if (*pChar >= '0' && *pChar <= '9')
break;
for (; i != s.Length; i++, pChar++)
{
if (*pChar < '0' || *pChar > '9')
break;
value = value * 10 + *pChar - '0';
}
}
return value;
}

How to save CPU cycles when searching for a value in a sorted list?

In CodinGame learning platform, one of the questions used as an example in a C# tutorial is this one:
The aim of this exercise is to check the presence of a number in an
array.
Specifications: The items are integers arranged in ascending order.
The array can contain up to 1 million items. The array is never null.
Implement the method boolean Answer.Exists(int[] ints, int k) so that
it returns true if k belongs to ints, otherwise the method should
return false.
Important note: Try to save CPU cycles if possible.
Example:
int[] ints = {-9, 14, 37, 102};
Answer.Exists(ints, 102) returns true.
Answer.Exists(ints, 36) returns false.
My proposal was this:
using System;
using System.IO;
public class Answer
{
public static bool Exists(int[] ints, int k)
{
foreach (var i in ints)
{
if (i == k)
{
return true;
}
if (i > k)
{
return false;
}
}
return false;
}
}
The result of the test was:
✔ The solution works with a 'small' array (200 pts) - Problem solving
✔ The solution works with an empty array (50 pts) - Reliability
✘ The solution works in a reasonable time with one million items (700 pts) - Problem solving
I don't understand the last point. It appears that there may be a more optimal solution than the one I suggested.
How can I optimize this piece of code? Is a binary search the actual solution (given that the values in the array are already ordered), or is there something simpler that I missed?
Yes, I think that binary search O(log(N)) complexity v. O(N) complexity is the solution:
public static bool Exists(int[] ints, int k) {
return Array.BinarySearch(ints, k) >= 0;
}
since Array.BinarySearch returns a non-negative value if the item (k) has been found:
https://msdn.microsoft.com/en-us/library/2cy9f6wb(v=vs.110).aspx
Return Value Type: System.Int32 The index of the specified value in
the specified array, if value is found; otherwise, a negative number.
Here is a fast method for an ordered array
public static class Answer
{
public static bool Exists( int[] ints, int k )
{
if ( ints.Length == 0 ) return false; // the spec allows an empty array
var lower = 0;
var upper = ints.Length - 1;
if ( k < ints[lower] || k > ints[upper] ) return false;
if ( k == ints[lower] ) return true;
if ( k == ints[upper] ) return true;
do
{
var middle = lower + ( upper - lower ) / 2;
if ( ints[middle] == k ) return true;
if ( lower == upper ) return false;
if ( k < ints[middle] )
upper = Math.Max( lower, middle - 1 );
else
lower = Math.Min( upper, middle + 1 );
} while ( true );
}
}
Takes around 50 ticks on my cpu (with 90.000.000 items in the array)
Sample on dotnetfiddle
class Answer
{
public static bool Exists(int[] ints, int k)
{
int index = Array.BinarySearch(ints, k);
if (index > -1)
{
return true;
}
else
{
return false;
}
}
static void Main(string[] args)
{
int[] ints = { -9, 14, 37, 102 };
Console.WriteLine(Answer.Exists(ints, 14)); // true
Console.WriteLine(Answer.Exists(ints, 4)); // false
}
}
Apparently, the task intends for us to use the built-in binary search method instead of implementing one. I was also somewhat surprised that this is what the third test evaluates: "The solution uses the standard library to perform the binary search (iterating on ints)".
This is somewhat confusing. They could have mentioned it in the exercise instead of giving only 15-20 minutes to solve it, which adds to the confusion.
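This is what I wrote for that question. It splits the array in half and searches only the half that can contain k, a more rudimentary approach (each half is still searched linearly):
public static bool Exists(int[] ints, int k)
{
int size = ints.Length;
if (size == 0) return false; // the spec allows an empty array
int half = size / 2;
if (k < ints[half])
{
// k can only be in the lower half
for (int i = 0; i < half; i++)
{
if (k == ints[i])
{
return true;
}
}
}
else
{
// k can only be in the upper half
for (int i = half; i < size; i++)
{
if (k == ints[i])
{
return true;
}
}
}
return false;
}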
public static bool Exists(int[] ints, int k)
{
var d = 0;
var f = ints.Length - 1;
if (d > f) return false;
if (k > ints[f] || k < ints[d]) return false;
if (k == ints[f] || k == ints[d]) return true;
return (BinarySearch(ints, k, d, f) >= 0);
}
public static int BinarySearch(int[] V, int Key, int begin, int end)
{
if (begin > end)
return -1;
var middleIndex = (begin + end) / 2;
if (Key == V[middleIndex])
return middleIndex;
else
{
if (Key > V[middleIndex])
{
begin = middleIndex + 1;
return BinarySearch(V, Key, begin, end);
}
else
{
end = middleIndex - 1;
return BinarySearch(V, Key, begin, end);
}
}
}
I looked at all the solutions. For what it's worth, I wrote and tested the following recursive approach and got full points:
public static bool Exists(int[] ints, int k)
{
if (ints.Length > 0 && ints[0] <= k && k <= ints[ints.Length - 1])
{
if (ints[0] == k || ints[ints.Length - 1] == k) return true;
return SearchRecursive(ints, k, 0, ints.Length - 1) != -1;
}
return false;
}
private static int SearchRecursive(int[] array, int value, int first, int last)
{
int middle = (first + last) / 2;
if (array[middle] == value)
{
return middle;
}
else if (first >= last)
{
return -1;
}
else if (value < array[middle])
{
return SearchRecursive(array, value, first, middle - 1);
}
else
{
return SearchRecursive(array, value, middle + 1, last);
}
}
Yes, BinarySearch would be faster than most algorithms you can write manually. However, if the intent of the exercise is to learn how to write an algorithm, you are on the right track. Your algorithm, though, makes an unnecessary check with if (i > k) ... why do you need this?
Below is my general algorithm for simple requirements like this. In my experience, a while loop like this is slightly more performant than a for loop and easily outperforms a foreach.
public class Answer
{
public static bool Exists(int[] ints, int k)
{
var i = 0;
var hasValue = false;
while(i < ints.Length && !hasValue)
{
hasValue = ints[i] == k;
++i;
}
return hasValue;
}
}
If you are trying to squeeze every ounce of speed out of it, consider an array containing 1..100 where you want to search for 78. Your algorithm needs to compare 78 items before it finds the right one. What if, instead, you check the first item and, since it's not there, jump to array size / 2 and find 50? Now you have skipped 50 iterations. You know that 78 MUST be in the top half of the array, so you can split it in half again and jump to 75, and so on. By continuously splitting the array in half, you do far fewer iterations than your brute-force approach.

Converting UInt64 to a binary array

I am having problem with this method I wrote to convert UInt64 to a binary array. For some numbers I am getting incorrect binary representation.
Results
Correct
999 = 1111100111
Correct
18446744073709551615 = 1111111111111111111111111111111111111111111111111111111111111111
Incorrect?
18446744073709551614 = 0111111111111111111111111111111111111111111111111111111111111110
According to an online converter the binary value of 18446744073709551614 should be
1111111111111111111111111111111111111111111111111111111111111110
public static int[] GetBinaryArray(UInt64 n)
{
if (n == 0)
{
return new int[2] { 0, 0 };
}
var val = (int)(Math.Log(n) / Math.Log(2));
if (val == 0)
val++;
var arr = new int[val + 1];
for (int i = val, j = 0; i >= 0 && j <= val; i--, j++)
{
if ((n & ((UInt64)1 << i)) != 0)
arr[j] = 1;
else
arr[j] = 0;
}
return arr;
}
FYI: This is not a homework assignment. I need to convert an integer to a binary array for encryption purposes, hence the need for an array of bits. Many solutions I have found on this site convert an integer to a string representation of the binary number, which was useless to me, so I came up with this mashup of various other methods.
An explanation as to why the method works for some numbers and not others would be helpful. Yes, I used Math.Log and it is slow, but performance can be fixed later.
EDIT: And yes, I do need the line where I use Math.Log, because my array will not always be 64 bits long. For example, if my number is 4, then in binary it is 100, which is an array of length 3. It is a requirement of my application to do it this way.
It's not only the array returned for UInt64.MaxValue - 1 that is wrong. The one for UInt64.MaxValue is wrong too.
The array is 65 elements long. This is intuitively wrong because UInt64.MaxValue must fit in 64 bits.
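Firstly, the number of bits should not come from floating-point logarithms. Converting n to double rounds values this close to UInt64.MaxValue up to 2^64, so Math.Log(n) / Math.Log(2) (or Math.Log(n, 2)) can evaluate to 64 for both UInt64.MaxValue and UInt64.MaxValue - 1.
Secondly, for n > 0 the number of bits needed is floor(log2(n)) + 1, which is what the cast to int followed by val + 1 computes. Once the logarithm has been rounded to 64, the array therefore gets 65 elements. Switching to Math.Ceiling does not fix this in general, because it undercounts exact powers of two: ceil(log2(4)) = 2, but 4 = 100 needs 3 bits. Counting the bits with integer shifts keeps the variable length you need without Math.Log.
Thirdly, and finally, you cannot left-shift a UInt64 by 64 bits. C# uses only the low 6 bits of the shift count, so (UInt64)1 << 64 is 1. In the 65-element array the first element is therefore bit 0 of n, which is why the result for UInt64.MaxValue - 1 (whose bit 0 is 0) starts with a 0. With the correct length, the loop starts at i = val - 1, so the shift amount is always between 0 and 63.
Haven't tested this exhaustively...
public static int[] GetBinaryArray(UInt64 n)
{
if (n == 0)
{
return new int[2] { 0, 0 };
}
// count the bits with integer shifts: exact, no floating-point rounding
var val = 0;
for (var t = n; t != 0; t >>= 1)
val++;
var arr = new int[val];
for (int i = val - 1, j = 0; i >= 0; i--, j++)
{
if ((n & ((UInt64)1 << i)) != 0)
arr[j] = 1;
else
arr[j] = 0;
}
return arr;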
}
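On .NET Core 3.0 or later, System.Numerics.BitOperations.LeadingZeroCount gives the bit length directly, as 64 minus the number of leading zero bits. Here is a minimal sketch under that assumption. The class name is illustrative, and it keeps the original { 0, 0 } result for n == 0. Reading the bits with right shifts of 0..63 also sidesteps the shift-by-64 problem.

```csharp
using System;
using System.Numerics;

public static class BinaryArray
{
    // Bits of n, most significant first, without leading zeros.
    // Assumes .NET Core 3.0+ for BitOperations.LeadingZeroCount.
    public static int[] GetBinaryArray(ulong n)
    {
        if (n == 0)
            return new int[2] { 0, 0 }; // same result as the methods above

        int length = 64 - BitOperations.LeadingZeroCount(n); // exact bit length, 1..64
        var arr = new int[length];
        for (int j = 0; j < length; j++)
        {
            int shift = length - 1 - j; // always 0..63, never 64
            arr[j] = (int)((n >> shift) & 1UL);
        }
        return arr;
    }
}
```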

Any more efficient method of converting a string integer to int than convert.toint32

Is there a more efficient method of converting a string integer to int rather than using Convert.ToInt32() in c#?
I have a program which converts a lot of strings to integers. These values are read from a text file in string format.
No, probably not, at least not by far. I tried this quick and dirty benchmark:
private static int toint(string s) {
int res = 0;
foreach (var c in s) {
res = 10*res + (c - '0');
}
return res;
}
static void Main() {
var s = DateTime.Now;
for (int i = 0 ; i != 10000000 ; i++) {
if (Convert.ToInt32("112345678") == 0) break;
}
var m = DateTime.Now;
for (int i = 0; i != 10000000; i++) {
if (toint("112345678") == 0) break;
}
Console.WriteLine("{0} {1}", DateTime.Now-m, m-s);
}
My toint method skips all sorts of validations, and gets a result that is only a 40% improvement on Convert.ToInt32: 1.14 s vs. 1.86 s.
Adding just a basic validation to the dirty toint eliminates its advantage almost entirely: this method
private static int toint(string s) {
int res = 0;
foreach (var c in s) {
if (Char.IsDigit(c))
res = 10*res + (c - '0');
}
return res;
}
runs in 1.62 s, or a 13% improvement while staying fundamentally incorrect.
Using this method:
private static int Parse(string s)
{
int value = 0;
for (var i = 0; i < s.Length; i++)
{
value = value*10 + (s[i] - '0');
}
return value;
}
I get 750 ms instead of 18+ seconds with int.Parse for 100M conversions.
I won't recommend it unless this is your real bottleneck and you don't care about any form of validation.
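If you want most of that loop's simplicity but still reject bad input, a middle ground is a hand-written loop with minimal checks. The helper below is hypothetical, not a BCL API. It accepts an optional leading '-' followed by ASCII digits and checks for Int32 overflow. Unlike int.Parse, it does not handle whitespace, '+', or culture settings. Its speed relative to the unchecked loop and to int.Parse is not measured here.

```csharp
using System;

public static class FastIntParser
{
    // Hypothetical helper: parses an optional leading '-' followed by ASCII digits.
    // Returns false for empty input, any other character, or Int32 overflow.
    public static bool TryParseInt(string s, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(s)) return false;

        int i = 0;
        bool negative = s[0] == '-';
        if (negative)
        {
            if (s.Length == 1) return false; // just "-"
            i = 1;
        }

        long acc = 0; // magnitude, accumulated in 64 bits so overflow can be detected
        long limit = negative ? 2147483648L : 2147483647L;
        for (; i < s.Length; i++)
        {
            int d = s[i] - '0';
            if ((uint)d > 9) return false;   // not an ASCII digit
            acc = acc * 10 + d;
            if (acc > limit) return false;   // would overflow Int32
        }

        value = negative ? (int)(-acc) : (int)acc;
        return true;
    }
}
```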
If you’re reading your integers from a Stream, then you could optimize by avoiding the overhead of initializing a string.
For example, assuming that your numbers will always be non-negative and terminated by a , character, you could use:
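int num = stream.ReadByte() - '0';
int next = stream.ReadByte(); // ReadByte returns the byte as an int, or -1 at the end of the stream
while (next != ',' && next != -1) // also stop at the end of the stream
{
num = num * 10 + (next - '0');
next = stream.ReadByte();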
}
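To read every value rather than just one, here is a minimal sketch under the same assumptions: non-negative integers, ',' as the only separator, ASCII input, and no validation. The class and method names are illustrative. It treats the end of the stream (ReadByte() returning -1) as a terminator, so the last value need not be followed by a comma.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamIntReader
{
    // Reads non-negative, comma-separated integers directly from the bytes of
    // an ASCII stream, without creating intermediate strings.
    // Assumes no signs, no whitespace, and well-formed input (no validation).
    public static List<int> ReadAll(Stream stream)
    {
        var result = new List<int>();
        int num = 0;
        bool inNumber = false;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == ',')
            {
                if (inNumber) result.Add(num);
                num = 0;
                inNumber = false;
            }
            else
            {
                num = num * 10 + (b - '0');
                inNumber = true;
            }
        }
        if (inNumber) result.Add(num); // last value may not be followed by ','
        return result;
    }
}
```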
This page benchmarks 4 techniques. The fastest method was as Romain wrote about above:
y = 0;
for (int i = 0; i < s[x].Length; i++)
y = y * 10 + (s[x][i] - '0');
Here are some other methods that were tested and proved almost 10x slower (where "s" is the array of strings the author used for conversion):
int.Parse(s[x]);
Int32.TryParse(s[x], out y);
Convert.ToInt32(s[x]);
Convert.ToInt32() uses Int32.Parse() (with a little validation thrown in). Int32.Parse() in turn uses Number.Parse().
The actual implementation is about as fast as you could get unless you had significant knowledge regarding the input value (e.g. your input is always a fixed number of digits, it is never hex, has a certain precision, it is always unsigned, etc.)
private unsafe static Boolean NumberToInt64(ref NumberBuffer number, ref Int64 value) {
Int32 i = number.scale;
if (i > Int64Precision || i < number.precision) {
return false;
}
char* p = number.digits;
BCLDebug.Assert(p != null, "");
Int64 n = 0;
while (--i >= 0) {
if ((UInt64)n > (0x7FFFFFFFFFFFFFFF / 10)) {
return false;
}
n *= 10;
if (*p != '\0') {
n += (Int32)(*p++ - '0');
}
}
if (number.sign) {
n = -n;
if (n > 0) {
return false;
}
}
else {
if (n < 0) {
return false;
}
}
value = n;
return true;
}
I use the Convert.ToXYZ() methods extensively in my own base framework and in profiler sessions they represent a trivial amount of overhead even when called hundreds of times in a single operation (such as deserializing a complex object tree).
I have encountered places where it is possible to improve upon the performance of the BCL with a specialized algorithm, but this probably isn't one of them.
