What is the logic behind the .. range selection operator? - c#

I am trying to understand why the .. operator works the way it does, for example:
var data = new []{0,1,2,3,4,5,6,7,8,9,10};
var test = data[1..4]; // This returns an array with 1,2,3
Logically, I would assume the result would be either 1,2,3,4 or 2,3 (if the last index isn't included, then the first one shouldn't be either).
or
var test2 = data[0..]; // This returns an array with 0,1,2,3,4,5,6,7,8,9,10 (the zero at index 0 is also included)
var test3 = data[^0..]; // This returns an empty array, where I would expect 10, since 10 is at index zero if we traverse the array backwards
I know there must be a reason why it was designed to work like that, but I can't seem to figure it out, so what is the purpose of this behavior?
Thank you.

Why is the end index not included, but the start index is?
This is known as a half-open range, and there are already questions asking about this in Python and C++. Essentially, the main advantages are:
The length of the range is exactly (end - start).
You don't need to add or subtract 1 as often in range-based algorithms.
To slice something in two at an index, you can use the same index: x[..i] gives you the first half, and x[i..] gives you the second half, i.e. x[..i] concatenated with x[i..] is equal to x itself.
If a range's end is equal to another's start, the two ranges are immediately next to each other, and no overlapping.
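A minimal sketch checking these properties on the question's array, assuming C# 9 or later (top-level statements) on .NET Core 3.0+ (where arrays support slicing with ranges). SliceLength and SplitRoundTrips are hypothetical helper names for illustration:

```csharp
using System;
using System.Linq;

int[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// The slice data[start..end] always has exactly end - start elements.
int SliceLength(int start, int end) => data[start..end].Length;

// Splitting at i: data[..i] followed by data[i..] rebuilds the original array.
bool SplitRoundTrips(int i) => data[..i].Concat(data[i..]).SequenceEqual(data);

Console.WriteLine(string.Join(",", data[1..4])); // 1,2,3
Console.WriteLine(SliceLength(1, 4));            // 3
Console.WriteLine(SplitRoundTrips(5));           // True
```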
Why is data[^0..] empty?
This is documented clearly. ^n means Length - n, so ^0 means data.Length - 0 here, which is just data.Length. data[data.Length..] is clearly empty.
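A small check of the ^n = Length - n rule, under the same assumptions. Resolve is a hypothetical helper; it calls System.Index.GetOffset, which performs exactly that subtraction for from-the-end indexes:

```csharp
using System;

int[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// Converts an Index such as ^n into an ordinary offset (data.Length - n).
int Resolve(Index idx) => idx.GetOffset(data.Length);

Console.WriteLine(Resolve(^0));                  // 11: one past the last element
Console.WriteLine(Resolve(^1));                  // 10: the last element's index
Console.WriteLine(data[^0..].Length);            // 0: the slice starts at data.Length
Console.WriteLine(string.Join(",", data[^1..])); // 10
```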

It's simple enough. Consider the following:
var data = new []{0,1,2,3,4,5,6,7,8,9,10};
var test = data[0..10];
If the end were inclusive, 0 to 10 would select 11 items; as it is, data[0..10] selects 10 items (0 through 9). You have to start at 0, as C# indexes are zero-based, and if both ends were exclusive you'd need a -1 to include the first element.
As mentioned in the documentation:
A range specifies the start and end of a range. Ranges are exclusive, meaning the end isn't included in the range. The range [0..^0] represents the entire range, just as [0..sequence.Length] represents the entire range.
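The data[^0..] expression means: start at index ^0, which is data.Length (one past the last element), and take everything from there to the end, which is nothing. The ^ operator mirrors the exclusive end for the same reason: ^0 is not an element but the end position itself, so ^1 is the last element and data[^1..] returns just 10.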

Related

Optimizing array that has many elements and different standards

I have a function that takes in X as an argument and randomly picks an element from a 2D array.
The 2D array has thousands of elements, each of them has a different requirement on X, stored in arr[Y][1].
For example,
arr[0] should only be chosen when X is larger than 4. (arr[0][1] = 4+)
Then arr[33] should only be chosen when X is between 37 and 59. (arr[33][1] = 37!59)
And arr[490] should only be chosen when X is less than 79. (arr[490][1] = 79-)
And there are many more, most with a different X requirement.
What is the best way to tackle this problem that takes the least space, and least repetition of elements?
The worst way would be storing possible choices for each X in a 2D array. But that would cause a lot of repetition, costing too much memory.
Then, I thought about using three arrays, separating the X+, X- and X-range requirements. But that still sounds too basic to me; is there a better way?
One option here would be what's called "accept/reject sampling": you pick a random index i and check if the condition on X is satisfied for that index. If so, you return arr[i]. If not, you pick another index at random and repeat until you find something.
Performance will be good so long as most conditions are satisfied for most values of i. If this isn't the case -- if there are a lot of values of X for which only a tiny number of conditions are satisfied -- then it might make sense to try and precompute something that lets you find (or narrow down) the indices that are allowable for a given X.
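A minimal sketch of accept/reject sampling, assuming every condition can be written as an inclusive interval [lo, hi] on X. The arrays lo and hi and the function SampleByRejection are hypothetical; the bounds loosely mirror "4+", "37!59" and "79-", assuming X stays within 0..1000. The maxTries cap is an added assumption, since plain resampling never terminates when no element qualifies:

```csharp
using System;

// Hypothetical data: element i may be chosen when lo[i] <= x <= hi[i].
int[] lo = { 5, 37, 0 };
int[] hi = { 1000, 59, 78 };

// Draw random indices until one satisfies its condition; give up after maxTries.
int SampleByRejection(int x, Random rng, int maxTries = 10_000)
{
    for (int attempt = 0; attempt < maxTries; attempt++)
    {
        int i = rng.Next(lo.Length);
        if (lo[i] <= x && x <= hi[i]) return i;  // accept
    }
    return -1;  // nothing accepted (possibly no valid element at all)
}

var random = new Random();
Console.WriteLine(SampleByRejection(70, random)); // 0 or 2
```

Each accepted index is uniformly distributed among the valid ones, because every draw is uniform and rejected draws are simply discarded.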
How to do this depends on what you allow as a condition on each index. For instance, if every condition is given by an interval like in the examples you give, you could sort the list twice, first by left endpoints and then by right endpoints. Then determining the valid indices for a particular value of X comes down to intersecting the intervals whose left endpoint is less than or equal to X with those whose right endpoint is greater than or equal to X.
Of course if you allow conditions other than "X is in this interval" then you'd need a different algorithm.
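A sketch of the sort-twice idea, under the same interval assumption and hypothetical data. With ids sorted by left endpoint, those with lo <= X form a prefix; sorted by right endpoint, those with hi >= X form a suffix. The valid ids are the intersection, so this query is still linear in the worst case:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] lo = { 5, 37, 0 };
int[] hi = { 1000, 59, 78 };

// Preprocessing: element ids sorted by left endpoint and by right endpoint.
int[] byLo = Enumerable.Range(0, lo.Length).OrderBy(i => lo[i]).ToArray();
int[] byHi = Enumerable.Range(0, hi.Length).OrderBy(i => hi[i]).ToArray();

// Query: intersect the prefix {lo <= x} with the suffix {hi >= x}.
List<int> ValidIndices(int x)
{
    var leftOk = new HashSet<int>(byLo.TakeWhile(i => lo[i] <= x));
    return byHi.SkipWhile(i => hi[i] < x).Where(leftOk.Contains).ToList();
}

Console.WriteLine(string.Join(",", ValidIndices(70))); // 2,0
```

A random choice from the returned list then gives the sample.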
While I believe that resampling will be the optimal solution in your case (dozens of resamplings is a very cheap price to pay), here is an algorithm I would never implement in practice (it uses very complicated data structures and is less efficient than resampling), but which has provable bounds. It requires O(n log n) preprocessing time, O(n log n) memory and O(log n) time per query, where n is the number of elements you can potentially sample.
You store all ends of all ranges in one sorted array (call it ends). E.g. in your case you have the array [-infty, 4, 37, 59, 79, +infty] (it may require some tuning, like adding +1 to the right ends of ranges; not important now). The idea is that for any X we only have to determine which two adjacent ends it lies between. E.g. X=62 lies in [59; 79] (I'll call such a pair an interval). Then for each interval you store the set of all ranges that contain it. For your input X you just find the interval (using binary search) and then output a random range from the set corresponding to this interval.
How do you compute the corresponding set of ranges for each interval? We go from left to right through the ends array. Assume we have computed the set for the current interval and move on to the next one. There is an end between these two intervals. If it's the left end of some range, we add that range to the new set (since we enter this range). If it's a right end, we remove the range. How do we do this in O(log n) time instead of O(n)? Immutable balanced tree sets can do this (essentially, they create a new tree instead of modifying the old one).
How do you return a uniformly random range from a set? Augment the tree sets: each node should know how many ranges its subtree contains. First sample an integer in [0; size(tree)). Then look at the root node and its children. For example, assume you sampled 15, the left child's subtree has size 10 and the right child's has size 20. Then you go to the right child (since 15 >= 10) and continue there with 5 (since 15 - 10 = 5). You will eventually reach a leaf, corresponding to a single range. Return this range.
Sorry if it's hard to understand. As I said, it's a non-trivial approach that you would only need for worst-case upper bounds (the other approaches discussed before require linear time in the worst case, and resampling may run indefinitely if no element satisfies the restrictions). It also requires some careful handling (e.g. when some ranges have coinciding endpoints).
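For the curious, a compact sketch of the approach above. It swaps in a persistent segment tree over element ids as the "immutable tree set": each update copies only the O(log n) nodes on one root-to-leaf path, and every node stores how many active ranges its subtree contains. The data, all names, and the use of Hi + 1 as the removal point (the "+1 to right ends" tuning) are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical inclusive bounds on X (X assumed to lie in 0..1000).
var ranges = new List<(int Lo, int Hi)> { (5, 1000), (37, 59), (0, 78) };
int n = ranges.Count;

// Node storage; node 0 is a shared empty node (count 0, children 0).
var cnt = new List<int> { 0 };
var lc = new List<int> { 0 };
var rc = new List<int> { 0 };

// Returns a NEW root with leaf 'pos' changed by delta; 'node' stays untouched.
int Update(int node, int l, int r, int pos, int delta)
{
    int id = cnt.Count;
    cnt.Add(cnt[node] + delta); lc.Add(lc[node]); rc.Add(rc[node]);
    if (l == r) return id;
    int m = (l + r) / 2;
    if (pos <= m) lc[id] = Update(lc[node], l, m, pos, delta);
    else rc[id] = Update(rc[node], m + 1, r, pos, delta);
    return id;
}

// Descends by subtree counts to the k-th (0-based) active element id.
int Kth(int node, int k)
{
    int l = 0, r = n - 1;
    while (l < r)
    {
        int m = (l + r) / 2;
        int leftCount = cnt[lc[node]];
        if (k < leftCount) { node = lc[node]; r = m; }
        else { k -= leftCount; node = rc[node]; l = m + 1; }
    }
    return l;
}

// Sweep: a range becomes active at Lo and inactive at Hi + 1.
var events = new List<(int X, int Id, int Delta)>();
for (int i = 0; i < n; i++)
{
    events.Add((ranges[i].Lo, i, +1));
    events.Add((ranges[i].Hi + 1, i, -1));
}
events.Sort((a, b) => a.X.CompareTo(b.X));

var coords = new List<int>();  // distinct event positions, ascending
var roots = new List<int>();   // roots[j]: active set for X in [coords[j], coords[j + 1])
int root = 0;
for (int j = 0; j < events.Count; j++)
{
    root = Update(root, 0, n - 1, events[j].Id, events[j].Delta);
    if (j + 1 == events.Count || events[j + 1].X != events[j].X)
    {
        coords.Add(events[j].X);
        roots.Add(root);
    }
}

// Uniformly random id whose range contains x, or -1 if there is none.
int Sample(int x, Random rng)
{
    int pos = coords.BinarySearch(x);
    if (pos < 0) pos = ~pos - 1;  // last event position <= x
    if (pos < 0) return -1;
    int size = cnt[roots[pos]];
    return size == 0 ? -1 : Kth(roots[pos], rng.Next(size));
}

var random = new Random();
Console.WriteLine(Sample(50, random)); // 0, 1 or 2: all three ranges contain 50
```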

Need to understand the behaviour of BinarySearch and IndexOf methods

I have a List whose values are "Brandenburg", "Alabama" and "Alberta". When I call BinarySearch("Brandenburg"), it returns -4 instead of 0, but I get the correct index once the list is sorted. Why does it return the wrong value on the unsorted list? I also get the correct index from IndexOf("Brandenburg"). Which method should I use?
Thanks in advance,
Prithivi
The list MUST be sorted to use binary search. Here is why you're getting -4:
Your collection isn't sorted, and binary search cuts the remaining range in half on each iteration. So:
When it starts, topIndex == 0 and bottomIndex == 2:
TopIndex -> (0) "Brandenburg",
(1) "Alabama"
BottomIndex -> (2) "Alberta"
The binary search checks the item in the middle: 0 + (2 - 0) / 2 = 1. Since you're searching for "Brandenburg", it compares "Alabama" with your search item. The letter B is 'bigger' than the letter A, so it moves topIndex just past the middle, to index 2. "Brandenburg" at index 0 has now been discarded.
(0) "Brandenburg",
(1) "Alabama"
TopIndex, BottomIndex -> (2) "Alberta"
Then it checks the new middle item, 2 + (2 - 2) / 2 = 2, which is "Alberta". "Brandenburg" is bigger again, so topIndex moves to 3. Now topIndex > bottomIndex, so the search ends without finding the item.
When BinarySearch returns a negative number, it means that the item cannot be found. The negative number is the bitwise complement of the index where the item should be inserted, i.e. ~result, which equals -result - 1. So if you want the new item added, it should be inserted at index -(-4) - 1 == 3.
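A short demonstration with List<string>.BinarySearch, assuming the default string comparer and C# 9 top-level statements. The ~ operator recovers the insertion point from a negative result, which can be used to insert while keeping the list sorted ("Berlin" is just an example value):

```csharp
using System;
using System.Collections.Generic;

var states = new List<string> { "Brandenburg", "Alabama", "Alberta" };

// Unsorted input violates BinarySearch's precondition.
int unsorted = states.BinarySearch("Brandenburg");
Console.WriteLine(unsorted);   // -4
Console.WriteLine(~unsorted);  // 3, the same as -(-4) - 1

var sorted = new List<string>(states);
sorted.Sort();                 // Alabama, Alberta, Brandenburg
Console.WriteLine(sorted.BinarySearch("Brandenburg")); // 2

// A negative result encodes the insertion point that keeps the list sorted.
int pos = sorted.BinarySearch("Berlin");
if (pos < 0) sorted.Insert(~pos, "Berlin");
Console.WriteLine(string.Join(", ", sorted)); // Alabama, Alberta, Berlin, Brandenburg
```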
Let me explain how binary search works.
Say you have this array:
{1, 3, 5, 7, 10, 15, 20}
And I want to find the index of 15. What binary search will do is that it looks at the middle of the array, 7. Is 7 greater or less than 15? If it is less than 15, do the same thing again on the second half of the array (10, 15, 20). If it is greater than 15, do it on the first half (1, 3, 5). If it is equal to 15, then that means 15 is found.
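The halving steps above can be sketched as a hand-written search. Search is a hypothetical helper; like List<T>.BinarySearch and Array.BinarySearch, it assumes sorted input and returns the bitwise complement of the insertion point when the value is missing:

```csharp
using System;

int Search(int[] a, int target)
{
    int low = 0, high = a.Length - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;    // middle of the remaining range
        if (a[mid] == target) return mid;
        if (a[mid] < target) low = mid + 1;  // continue in the upper half
        else high = mid - 1;                 // continue in the lower half
    }
    return ~low;                             // not found: ~(insertion point)
}

int[] numbers = { 1, 3, 5, 7, 10, 15, 20 };
Console.WriteLine(Search(numbers, 15)); // 5
Console.WriteLine(Search(numbers, 8));  // -5, i.e. ~4: 8 would go at index 4
```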
This means that the array must be sorted for binary search to work, and it explains why binary search on your list returns a negative number: on unsorted data the algorithm discards the half that actually contains your string, so it can't find it.
You get the correct index with IndexOf because IndexOf uses a linear search: it looks at each element in turn and compares it with the one you're looking for. Therefore, whether the list is sorted doesn't matter.
Note: the documentation for List<T>.IndexOf states that it performs a linear search, making it an O(n) operation, so it does not switch to binary search even when the list happens to be sorted.

Jumping segments in binary

My question is: is there a way in C#, given a starting bit position, to find the next bit within a byte that has a specified value of 0 or 1 without iterating? I'm looking for the highest-performance option.
As an example, if you had 10011 and started at the first bit (far right) and searched for the first 0, it would be the 3rd place going right to left. If you then started at the 3rd place and wanted to find the next 1, it would be at the 5th place (far left).
Thanks for any help and feel free to let me know if I need to provide anything further.
Edit: Here is my current code.
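private int GetBinarySegment(uint uiValue, int iStart, int iMaxBits, byte bValue)
{
    // Returns how many consecutive bits equal to bValue start at bit iStart.
    int r = 0;
    uiValue >>= iStart;                 // move bit iStart down to bit 0
    if (uiValue == 0)                   // only zeros remain; also avoids looping forever
        return bValue == 0 ? iMaxBits - iStart : 0;
    while ((uiValue & 1) == bValue) { uiValue >>= 1; r++; }
    return r;
}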
There are ways, but they're ugly because there's no _BitScanForward or equivalent intrinsic (.NET Core 3.0 later added System.Numerics.BitOperations.TrailingZeroCount). Still, you can compute this efficiently without needing a huge table.
First step: make a number that has a 1 at the position you're searching for and 0 everywhere else.
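If searching for a 1, that means x & -x (for a uint in C#, x & (~x + 1) keeps the result a uint, since -x would be promoted to long). If searching for a 0, use ~x & (x + 1). Apply this to the value after shifting it right by the start position, as your code already does.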
Then, use one of the many ways to emulate either bitscan (there is only one set bit now, so it doesn't matter which side you search from). Some ways to do that are detailed here (not in C#, but you can convert them).
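A minimal sketch of the two steps, assuming .NET Core 3.0 or later, where System.Numerics.BitOperations.TrailingZeroCount serves as the bitscan. NextBit is a hypothetical helper that returns an absolute bit position, or -1 if there is none:

```csharp
using System;
using System.Numerics;

// First position >= start (start in 0..31) whose bit equals 'bit', or -1.
int NextBit(uint x, int start, int bit)
{
    uint y = x >> start;                  // drop the bits below 'start'
    uint isolated = bit == 1
        ? y & unchecked(~y + 1)           // x & -x: keeps only the lowest 1 bit
        : ~y & unchecked(y + 1);          // keeps only the lowest 0 bit
    if (isolated == 0) return -1;         // no such bit at all
    int pos = start + BitOperations.TrailingZeroCount(isolated);
    return pos < 32 ? pos : -1;           // zeros shifted in at the top don't count
}

Console.WriteLine(NextBit(19, 0, 0)); // 2: in 10011 the first 0 is the 3rd place
Console.WriteLine(NextBit(19, 2, 1)); // 4: from there, the next 1 is the 5th place
```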
Use a lookup table. That is, precalculate a 2D array indexed by byte value and current position. You can do a separate table for zeros and ones, or you can combine it.
So for your example, you start at bit 0 of the number 19. That happens to be a 1. So if you look up nextBit[19][0], it should return 1, and so on. Here's what a combined lookup table might look like; it shows the next bit for both 0s and 1s:
nextBit[19][0] = 1 // 1
nextBit[19][1] = 4 // 1
nextBit[19][2] = 3 // 0
nextBit[19][3] = 5 // 0
nextBit[19][4] = 0 // 1
nextBit[19][5] = 6 // 0
nextBit[19][6] = 7 // 0
Obviously there is no 'next' for bit 7, and if 'next' returns 0, there are no more of that particular bit.
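One way to precompute the combined table, as a sketch: BuildNextBitTable is a hypothetical name, it uses a rectangular byte[256, 8] array, and 0 marks "no next bit" just as in the table (safe because a next position is always at least 1):

```csharp
using System;

// next[v, p]: position of the next bit above p with the same value as bit p of v, or 0.
byte[,] BuildNextBitTable()
{
    var next = new byte[256, 8];
    for (int v = 0; v < 256; v++)
    {
        for (int p = 0; p < 8; p++)
        {
            int bit = (v >> p) & 1;
            for (int q = p + 1; q < 8; q++)
            {
                if (((v >> q) & 1) == bit)
                {
                    next[v, p] = (byte)q;  // first later position with the same bit
                    break;
                }
            }
            // If no such q exists, next[v, p] keeps its default 0 ("none").
        }
    }
    return next;
}

var table = BuildNextBitTable();
Console.WriteLine(table[19, 0]); // 1
Console.WriteLine(table[19, 3]); // 5
```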
I may have interpreted your question incorrectly, but this technique can be modified to suit your purposes. I initially thought you wanted to navigate through all 1-bits or 0-bits. If instead you want to skip over consecutive 1-bits, then you just arrange your table in that way. Or indeed, you can have a 'next' for both 0 and 1 at each position.

Generate Number Range in a List of Numbers

I am using C# and have a list of ints containing different numbers, such as {34,36,40,35,37,38,39,4,5,3}. I need a script that finds the distinct ranges in the list and writes them to a file; for this example it would be (34-40) and (3-5). What is a quick way to do it?
Thanks in advance for the help.
The easiest way would be to sort the array and then do a single sequential pass to capture the ranges. That will most likely be fast enough for your purposes.
Two techniques come to mind: histogramming and sorting. Histogramming will be good for dense number sets (where you have most of the numbers between min and max) and sorting will be good if you have sparse number sets (very few of the numbers between min and max are actually used).
For histogramming, walk the array and set a Boolean flag to true at the corresponding position of a histogram array (all flags default to false), then walk the histogram looking for runs of true.
For sorting, simply sort the array using the best applicable sorting technique, then walk the sorted array looking for contiguous runs.
EDIT: some examples.
Let's say you have an array with the first 1,000,000 positive integers, but all even multiples of 191 are removed (you don't know this ahead of time). Histogramming will be a better approach here.
Let's say you have an array containing powers of 2 (2, 4, 8, 16, ...) and 3 (3, 9, 27, 81, ...). For large lists, the list will be fairly sparse and sorting should be expected to do better.
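A minimal sketch of the histogramming technique; FindRangesByHistogram is a hypothetical name. The bool array is sized max - min + 1, so memory grows with the spread of the values rather than their count, which is why it suits dense inputs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<(int Start, int End)> FindRangesByHistogram(int[] numbers)
{
    var ranges = new List<(int Start, int End)>();
    if (numbers.Length == 0) return ranges;

    int min = numbers.Min(), max = numbers.Max();
    var present = new bool[max - min + 1];
    foreach (int v in numbers) present[v - min] = true;

    int runStart = -1;                                // offset where the current run began
    for (int i = 0; i <= present.Length; i++)
    {
        bool on = i < present.Length && present[i];  // treat the end as a gap
        if (on && runStart < 0) runStart = i;
        else if (!on && runStart >= 0)
        {
            ranges.Add((runStart + min, i - 1 + min));
            runStart = -1;
        }
    }
    return ranges;
}

Console.WriteLine(string.Join(" and ",
    FindRangesByHistogram(new[] { 34, 36, 40, 35, 37, 38, 39, 4, 5, 3 })
        .Select(r => $"({r.Start}-{r.End})")));
// (3-5) and (34-40)
```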
As Mike said, first sort the list. Then, starting with the first element, remember that element and compare it with the next one. If the next element is 1 greater than the current one, you have a contiguous series; continue until the next number is NOT contiguous. At that point you have a range from the first remembered value to the current value. Remember/output that range, then start again with the next value as the first element of a new series. The pass itself is linear; overall, the sort dominates at O(N log N) for a comparison sort.
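A sketch of the sort-then-single-pass approach on the question's list. FindRanges is a hypothetical helper, and dropping duplicates before the pass is an added assumption; writing the result to a file (e.g. with File.WriteAllText) is left out:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<(int Start, int End)> FindRanges(IEnumerable<int> numbers)
{
    var sorted = numbers.Distinct().OrderBy(x => x).ToList();
    var ranges = new List<(int Start, int End)>();
    if (sorted.Count == 0) return ranges;

    int start = sorted[0], prev = sorted[0];
    foreach (int n in sorted.Skip(1))
    {
        if (n != prev + 1)          // gap: the current run ends at prev
        {
            ranges.Add((start, prev));
            start = n;
        }
        prev = n;
    }
    ranges.Add((start, prev));      // close the last run
    return ranges;
}

var input = new List<int> { 34, 36, 40, 35, 37, 38, 39, 4, 5, 3 };
Console.WriteLine(string.Join(" and ", FindRanges(input).Select(r => $"({r.Start}-{r.End})")));
// (3-5) and (34-40)
```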
I would sort them and then check for consecutive numbers. If the difference > 1 you have a new range.

Centering Divisions Around Zero

I'm trying to create something that sort of resembles a histogram. I'm trying to create buckets from an array.
Suppose I have an array of random doubles between -10 and 10; this is very simplified. I then want to specify a center point, in this case 0, and the number of buckets.
If I want 4 buckets, the divisions would be -10 to -5, -5 to 0, 0 to 5 and 5 to 10. Not that complicated, right? Now if I change the min and max to -12 and 9 and ask for 4 divisions, it's more complicated. I either want a division from -3 to 3, so that it is centered around 0, or divisions from -6 to 0 and 0 to 6.
It's not that hard to find the division size:
DivisionSize = Math.Ceiling((Math.Abs(Max) + Math.Abs(Min)) / Divisions)
Then you would basically have an if statement to determine whether you want it centered on 0 or on an edge. You then iterate out from either 0 or DivisionSize/2 depending on the situation. You may not ALWAYS end up with the specified number of divisions but it will be close. Then you iterate through the array and increment the bin count.
Does this seem like a good way to go about it? This method would surely work, but it does not seem very elegant. I'm curious whether creating the bins and counting from the list could be done more elegantly in a clever class with LINQ.
Something like creating the bins and then having each bin be a property {get;} that returns list.Count(x=> x >= Lower && x < Upper).
To me it seems simpler: you need to find the lower bound and the size of each "division".
Since you want it to be symmetrical around 0, the number of divisions decides the shape: for an odd number, one bucket straddles 0, e.g. (-3, 3); for an even number, 0 is a boundary, e.g. (-3, 0) and (0, 3).
lowerBound = -Max(Abs(from), Abs(to))
bucketSize = 2 * Abs(lowerBound) / divisions
(throw in Ceiling and update bucketSize and lowerBound if needed)
Then use .Aggregate to update an array of buckets (the position would be (value - lowerBound) / bucketSize, with additional range checks if needed).
Note: do not implement the getter the way you suggested; getters are not expected to perform non-trivial work like walking a large array.
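A sketch of that recipe, assuming a non-empty input with at least one nonzero value and skipping the Ceiling rounding; CountBuckets is a hypothetical name. It counts everything once with Aggregate rather than in a getter, and clamps the top edge into the last bucket as the extra range check:

```csharp
using System;
using System.Linq;

int[] CountBuckets(double[] values, int divisions)
{
    double lowerBound = -Math.Max(Math.Abs(values.Min()), Math.Abs(values.Max()));
    double bucketSize = 2 * -lowerBound / divisions;
    if (bucketSize == 0) throw new ArgumentException("All values are zero.", nameof(values));

    return values.Aggregate(new int[divisions], (acc, v) =>
    {
        int index = (int)((v - lowerBound) / bucketSize);  // 0-based bucket position
        acc[Math.Min(index, divisions - 1)]++;             // v == upper bound -> last bucket
        return acc;
    });
}

var bucketCounts = CountBuckets(new[] { -12, -7, -1, 0.5, 4, 9 }, 4);
Console.WriteLine(string.Join(", ", bucketCounts)); // 2, 1, 2, 1
```

With 4 divisions the buckets are [-12, -6), [-6, 0), [0, 6) and [6, 12]; with 3 they become [-12, -4), [-4, 4) and [4, 12], so the middle bucket straddles 0.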
