Ranges with LINQ and dictionaries - C#

I've created a Range type:
public class Range<T> where T : IComparable<T>
{
    public Range(T min, T max) : this(min, max, false) { }

    public Range(T min, T max, bool upperbound)
    {
        Min = min;
        Max = max;
        Upperbound = upperbound;
    }

    public bool Upperbound { get; private set; }
    public T Min { get; private set; }
    public T Max { get; private set; }

    // Upperbound decides which end is inclusive: (Min, Max] when true, [Min, Max) when false.
    public bool Between(T value)
    {
        return Upperbound
            ? Min.CompareTo(value) < 0 && value.CompareTo(Max) <= 0
            : Min.CompareTo(value) <= 0 && value.CompareTo(Max) < 0;
    }
}
I want to use this as a key in a dictionary, to allow me to do a search based upon a range. And yes, ranges can overlap or there might be gaps, which is part of the design. It looks simple, but I want the search to be even a bit easier! I want to compare a range with a value of type T, so I can use myRange == 10 instead of myRange.Between(10).
How? :-)
(Not wanting to break my head over this. I'll probably find the answer, but maybe I'm re-inventing the wheel or whatever.)
The things I want to do with this dictionary? Well, in general I will use the range itself as a key. A range will be used for more than just a dictionary. I'm dealing with lots of data that have a min/max range, and I need to group them together based on the same min/max values. The value in the dictionary is a list of the products that all have the same range. By using a range, I can quickly find the proper list where I need to add a product (or create a new entry if no list is found).
Once I have the products grouped by range, I can start searching for values that fit within specific ranges. Basically, this could be a LINQ query on the dictionary for all entries where the provided value is between the Min and Max of the key.
I am actually dealing with two such lists. In one, the range is upper-bound and in the other lower-bound. There could be more lists like these, where I first need to collect data based on their range and then find specific items within them.
Could I use a List instead? Probably, but then I would not have the distinct grouping of my data based on the range itself. A List of Lists then? Possible, but then I'm considering the Dictionary again. :-)
Range examples: I have multiple items where the range is 0 to 100, other items where the range is 0 to 1, 1 to 2, 2 to 3, etc., and more items where the range is 0 to 4, 4 to 6, 6 to 8, etc. I even have items with ranges from 0 to 0.5, 0.5 to 1, 1 to 1.5, etc. So first I will group all items based on their ranges: all items with range 1 to 2 would be together in one list, while all items with range 0 to 100 would be in a different list. I've calculated that I'll be dealing with about 50 different ranges, which can overlap each other. However, I have over 25,000 items which need to be grouped like this.
Next, I get a value from another source, for example the value 1.12, which I need to find. So this time I use LINQ to search through the dictionary to find all lists of items where 1.12 is within the range of the keys. Thus I'd find the ranges 1 to 2, 1 to 1.5 and even 0 to 100. Behind these ranges are the lists of items which I need to process for this value. Then I can move onwards to the next value, for about 4,000 different values. And preferably everything should finish within 5 seconds.
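To make the intended usage concrete, here is a rough sketch (Product is a placeholder type; this assumes Range<T> gets proper Equals/GetHashCode overrides so it can serve as a dictionary key):
var productsByRange = new Dictionary<Range<double>, List<Product>>();

// Grouping: find the list for a product's range, or create a new entry.
void Add(Product product, Range<double> range)
{
    if (!productsByRange.TryGetValue(range, out var list))
        productsByRange[range] = list = new List<Product>();
    list.Add(product);
}

// Searching: all lists whose key range contains a value, e.g. 1.12.
IEnumerable<List<Product>> FindLists(double value) =>
    productsByRange.Where(p => p.Key.Between(value)).Select(p => p.Value);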

Using a type as a key in a dictionary is a matter of overriding GetHashCode and Equals. Basically you'd create a hash based on the minimum and maximum values and Upperbound. Typically you call GetHashCode on each component and combine them, e.g.:
public override int GetHashCode()
{
    int result = 17;
    result = result * 31 + Min.GetHashCode();
    result = result * 31 + Max.GetHashCode();
    result = result * 31 + (Upperbound ? 1 : 0);
    return result;
}
You'd also need the equality test.
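A minimal sketch of what that could look like (assuming Min and Max have Equals implementations consistent with their CompareTo):
public override bool Equals(object obj)
{
    var other = obj as Range<T>;
    if (other == null) return false;
    return Min.Equals(other.Min)
        && Max.Equals(other.Max)
        && Upperbound == other.Upperbound;
}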
I'm not sure what you mean by "to allow me to do a search based upon a range" though. Could you give some sample code showing how you'd like to use this ability? I'm not entirely sure it'll fit within the normal dictionary approach...
I suggest you don't overload the == operator to allow you to do a containment test with it though. A range isn't equal to a value in the range, so code using that wouldn't be very intuitive.
(I'd personally rename Between to Contains as well.)


Please suggest different approach for this CountNumbers algorithm

Implement a function CountNumbers that accepts a sorted array of unique integers and counts the number of array elements that are less than the parameter lessThan.
For example, SortedSearch.CountNumbers(new int[] { 1, 3, 5, 7 }, 4) should return 2 because there are two array elements less than 4.
Below is my approach, but the score given by the online tool for it is 50%. What am I missing?
using System;

public class SortedSearch
{
    public static int CountNumbers(int[] sortedArray, int lessThan)
    {
        int iRes = 0;
        for (int i = 0; i < sortedArray.Length; i++)
        {
            if (sortedArray[i] < lessThan)
            {
                iRes = iRes + 1;
            }
        }
        return iRes;
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(SortedSearch.CountNumbers(new int[] { 1, 3, 5, 7 }, 4));
    }
}
Your current solution takes O(N) time, where N is the size of the array. You could leverage the fact that your input array is sorted to decrease the time complexity to O(log N) by using BinarySearch:
public static int CountNumbers(int[] sortedArray, int lessThan)
{
    var result = Array.BinarySearch(sortedArray, lessThan);
    return result >= 0 ? result : -1 * result - 1;
}
Why the strange -1 * result - 1 code? Because, as per the docs:
Returns
The index of the specified value in the specified array, if value is found; otherwise, a negative number. If value is not found and value is less than one or more elements in array, the negative number returned is the bitwise complement of the index of the first element that is larger than value. If value is not found and value is greater than all elements in array, the negative number returned is the bitwise complement of (the index of the last element plus 1). If this method is called with a non-sorted array, the return value can be incorrect and a negative number could be returned, even if value is present in array.
-1 * result - 1 (which is the same as ~result) reverses the bitwise complement, turning the negative return value back into the insertion index, which is exactly the count of elements less than the search value.
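For example, searching for 4 in { 1, 3, 5, 7 }: the first element larger than 4 is 5, at index 2, so BinarySearch returns ~2 = -3, and -1 * -3 - 1 = 2, exactly the number of elements less than 4.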
BinarySearch will generally perform faster than Where or TakeWhile, particularly over large sets of data, since it does O(log N) comparisons instead of walking the array.
From Wikipedia:
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array.
The clue to use a binary search is the "accepts a sorted array of unique integers" part of the requirement. My above solution only works, as is, with a sorted array of unique values. It thus seems to me that whoever wrote the online test likely had binary search in mind.
You could make use of LINQ for the purpose.
int CountNumbers(IEnumerable<int> source, int limit)
{
    return source.TakeWhile(x => x < limit).Count();
}
Since it is already mentioned in the OP that the input array is sorted, you can exit the search when you find the first element greater than or equal to the limit. The TakeWhile method helps you do exactly that: it yields elements while the condition is met, and Count then counts the items taken.
Example:
var result = CountNumbers(new int[] { 1, 3, 5, 7 }, 4);
Output: 2

Custom option to Search a Sorted list faster than Plain Binary Search

Following is the use case:
A sorted List of DateTime, with millisecond granularity.
Search for the nearest DateTime which satisfies the supplied predicate delegate.
Performance is an issue: the List has 100K+ records spanning 10 hours from minimum to maximum, and there are lots of frequent calls (50+ per run), which impacts performance.
What we currently do is a custom binary search, as follows:
public static int BinaryLastOrDefault<T>(this IList<T> list, Predicate<T> predicate)
{
    var lower = 0;
    var upper = list.Count - 1;
    while (lower < upper)
    {
        var mid = lower + ((upper - lower + 1) / 2);
        if (predicate(list[mid]))
        {
            lower = mid;      // predicate still holds; the answer is at mid or later
        }
        else
        {
            upper = mid - 1;  // predicate failed; the answer is before mid
        }
    }
    if (lower >= list.Count) return -1;
    return !predicate(list[lower]) ? -1 : lower;
}
Can I use a Dictionary to make it O(1)?
My understanding is no, since the input value may not be present, in which case we need to return the closest value. (If the code above returns -1, the last element in the sorted list is the expected result.)
Following is the option I am considering:
A data structure like Dictionary<int, SortedDictionary<DateTime, int>>.
The total DateTime span between the highest and lowest value is 10 hours ~ 10 * 3600 * 1000 ms = 36 million ms.
With buckets of 60 seconds each, the total number of buckets is ~ 36 million / 60K = 600.
For any supplied DateTime value it is now easy to find the bucket, where a limited number of values is stored as a SortedDictionary, with the DateTime value as the key and the original index as the value; if required, the data can be enumerated to find the closest index.
In my understanding this implementation will make the search much faster than the binary search detailed above, since the data searched is substantially reduced. Any suggestions on what more can be done to improve the search time further, in algorithmic terms? I can try the Parallel options for the various independent calls separately.
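For reference, a minimal sketch of the bucket construction described above (names are hypothetical; duplicate timestamps would collide as dictionary keys and need handling):
static Dictionary<long, SortedDictionary<DateTime, int>> BuildBuckets(List<DateTime> timestamps)
{
    var buckets = new Dictionary<long, SortedDictionary<DateTime, int>>();
    DateTime origin = timestamps[0]; // the list is sorted, so this is the minimum
    for (int i = 0; i < timestamps.Count; i++)
    {
        long key = (long)(timestamps[i] - origin).TotalSeconds / 60; // 60 s buckets
        if (!buckets.TryGetValue(key, out var bucket))
            buckets[key] = bucket = new SortedDictionary<DateTime, int>();
        bucket[timestamps[i]] = i; // original index as the value
    }
    return buckets;
}
// Lookup: compute the key for the probe DateTime the same way, then scan that
// bucket (and, near bucket edges, its neighbours) for the closest entry.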
I made some performance tests using the native BinarySearch method of List<T>. The logic for finding the nearest DateTime is shown below:
public static DateTime GetNearest(List<DateTime> source, DateTime date)
{
    var index = source.BinarySearch(date);
    if (index >= 0) return source[index];       // exact match
    index = ~index;                             // index of the first larger element
    if (index == 0) return source[0];           // before the first element
    if (index == source.Count) return source[source.Count - 1]; // after the last
    var d1 = source[index - 1];
    var d2 = source[index];
    return (date - d1 < d2 - date) ? d1 : d2;   // pick the closer neighbour
}
I created a random list of 1,000,000 sorted dates, covering a time span of 10 hours from min to max. Then I created an equally sized list with unsorted random dates to search, covering a slightly larger time span. Then I changed the build to Release and started the test. The result demonstrated that it is possible to make more than 800,000 searches in less than a second, using only a single core of a relatively slow machine.
Then I increased the complexity of the test by searching in a List<(DateTime, object)> containing 1,000,000 elements, so that each comparison needs two extra calls to a dateSelector function, which returns the DateTime property of each ValueTuple.
The result: 350,000 searches per thread per second.
I increased the complexity even further by using reference types as elements, populating a List<Tuple<DateTime, object>> with 1,000,000 tuples. The performance was still pretty decent: 270,000 searches per thread per second.
My conclusion is that the BinarySearch method is lightning fast, and it would be surprising if it was found to be the bottleneck of an application.

Check if int is 10, 100, 1000, ...

I have a part in my application which needs to do something (add a padding 0 in front of other numbers) when a specified number gains an additional digit, i.e. when it reaches 10, 100, 1000 and so on...
At the moment I use the following logic for that:
public static bool IsNewDigit(this int number)
{
    var numberString = number.ToString();
    return numberString.StartsWith("1")
        && numberString.Substring(1).All(c => c == '0');
}
Then I can do:
if (number.IsNewDigit()) { /* add padding 0 to other numbers */ }
This seems like a "hack" to me, using the string conversion.
Is there something better (maybe even built-in) to do this?
UPDATE:
One example where I need this:
I have an item with the following (simplified) structure:
public class Item
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public int Position { get; set; }
    public string HierarchicPosition { get; set; }
}
HierarchicPosition is the item's own position (with the padding) combined with its parent's HierarchicPosition. E.g. an item which is the 3rd child of 12 under an item at position 2 has 2.03 as its HierarchicPosition. This can be something more complicated as well, like 011.7.003.2.02.
This value is then used for sorting the items very easily in a "tree-view" like structure.
Now I have an IQueryable<Item> and want to add one item as the last child of another item. To avoid having to recreate every HierarchicPosition, I would like to detect (with the logic in question) whether the new position adds a new digit:
Item newItem = GetNewItem();
IQueryable<Item> items = db.Items;
var maxPosition = items.Where(i => i.ParentId == newItem.ParentId)
                       .Max(i => i.Position);
newItem.Position = maxPosition + 1;
if (newItem.Position.IsNewDigit())
    UpdateAllPositions(items.Where(i => i.ParentId == newItem.ParentId));
else
    newItem.HierarchicPosition = GetHierarchicPosition(newItem);
UPDATE #2:
I query this position string from the DB like:
var items = db.Items.Where(...)
                    .OrderBy(i => i.HierarchicPosition)
                    .Skip(pageSize * pageNumber).Take(pageSize);
Because of this I cannot use an IComparer (or anything else which sorts "via code").
This will return items with HierarchicPosition like (pageSize = 10):
03.04
03.05
04
04.01
04.01.01
04.01.02
04.02
04.02.01
04.03
05
UPDATE #3:
I like the alternative solution with the double values, but I have some more complicated cases, like the following, which I am not sure I can solve with that:
I am building (as one part of many) an image gallery, which has Categories and Images. A category can have a parent and multiple children, and each image belongs to a category. (I called them Holder and Assets in my logic, so each image has a holder and each category can have multiple assets.) These images are sorted first by the category's position and then by their own position. I do this by combining the HierarchicPositions like HolderHierarchicPosition#ItemHierarchicPosition. So in a category with position 02.04 and 120 images, the 3rd image would get 02.04#003.
I even have some cases with "three levels" (or maybe more in the future), like 03.1#02#04.
Can I adapt the "double solution" to support such scenarios?
P.S.: I am also open to other solutions for my base problem.
You could check whether the base-10 logarithm of the number is an integer (10 -> 1, 100 -> 2, 1000 -> 3, ...).
This could also simplify your algorithm a bit in general: instead of adding one 0 of padding every time you find something bigger, simply keep track of the maximum number you see, take length = floor(log10(max)) + 1, and make sure everything is padded to that length. This part does not suffer from the floating-point arithmetic issues that the comparison to an integer does.
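A rough sketch of both ideas (IsPowerOfTen and PadLikeMax are hypothetical names; the epsilon guards against the floating-point issue mentioned above):
// Is number exactly 10, 100, 1000, ...?
public static bool IsPowerOfTen(this int number)
{
    if (number < 10) return false;
    double log = Math.Log10(number);
    return Math.Abs(log - Math.Round(log)) < 1e-9; // tolerate FP noise
}

// The padding variant: pad everything to the width of the current maximum.
public static string PadLikeMax(this int value, int max)
{
    return value.ToString().PadLeft(max.ToString().Length, '0');
}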
From what you describe, it looks like your HierarchicPosition should maintain an order of items, and you run into the problem that when you have the ids 1..9 and add a 10, you'll get the order 1, 10, 2, 3, 4, 5, 6... somewhere, and therefore want to pad-left to 01, 02, 03..., 10 - correct?
If I'm right, please have a look at this first: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
Because what you are trying to do is a workaround to solve the problem in a certain way, but there might be more efficient ways to actually solve it. (Therefore you would have done better to ask about your actual problem rather than the solution you are trying to implement.)
See here for a solution using a custom IComparer to sort strings (that are actually numbers) in a native way: http://www.codeproject.com/Articles/11016/Numeric-String-Sort-in-C
Update regarding your update:
By providing a sorting "String" like you do, you can insert an element "somewhere" without having ALL subsequent items reindexed, as would be the case for an integer value. (This seems to be the purpose.)
Instead of building up a complex "String", you could use a Double value to achieve the very same result real quick:
If you insert an item somewhere between 2 existing items, all you have to do is this.sortingValue = (prior.sortingValue + next.sortingValue) / 2 and handle the case where you are inserting at the end of the list.
Let's assume you add elements in the following order:
1 First Element // pick a double value for sorting - 100.00 for example -> 100.00
2 Next Element // this is the list end - let's just add another 100.00 -> 200.00
1.1 Child // this should go "in between": (100 + 200) / 2 = 150.00
1.2 Another // between 1.1 and 2: (150 + 200) / 2 = 175.00
When you now simply sort on that double field, the order would be:
100.00 -> 1
150.00 -> 1.1
175.00 -> 1.2
200.00 -> 2
Want to add 1.1.1? Great: position = (150.00 + 175.00) / 2;
You could simply multiply everything by 10 whenever your NEW value hits an x.5 value, to ensure you are not running out of decimal places (but you don't have to - having .5, .25, .125, ... does not hurt the sorting):
So, after adding the 1.1.1 (which would be 162.5), multiply all by 10:
1000.00 -> 1
1500.00 -> 1.1
1625.00 -> 1.1.1
1750.00 -> 1.2
2000.00 -> 2
So, whenever you move an item around, you only need to recalculate the position of n by looking at n-1 and n+1.
Depending on the expected number of children per entry, you could start with 1000.00, 10,000.00 or whatever matches best.
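In code, the core operation stays tiny; a sketch with illustrative names:
// New sort key for an item inserted between two neighbours:
double SortKeyBetween(double prior, double next) => (prior + next) / 2.0;

// Appending at the end: last + step (e.g. step = 100.0).
// Inserting at the front: first / 2.0.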
What I didn't take into account: when you want to move "2" to the top, you would need to recalculate all children of "2" to have a value somewhere between the sorting value of "2" and the now-next item... Could cause some headache :)
The solution with double values has some limitations, but it will work for smaller sets of groups. However, you are talking about groups, subgroups, and pictures in counts of 100 - so another solution would be preferable:
First, you should refactor your database: currently you are trying to squeeze a tree into a list (data tables are basically lists).
To really reflect the complex layout of a tree with infinite depth, you should use 2 tables and implement the composite pattern.
Then you can use a recursive approach to get a category, its subcategories, [...] and finally the elements of that category.
With that, you only need to provide a position for each leaf within its current node.
Rearranging leaves will not affect any leaf of another node or any node.
Rearranging nodes will not affect any subnode or leaf of that node.
You could check the sum of the squares of all digits of the input. 10, 100 and 1000 have something in common: the sum of the squares of their digits is exactly 1;
10: 1^2 + 0^2 = 1
100: 1^2 + 0^2 + 0^2 = 1
and so on, so forth.
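As a sketch, that check can be written directly over the decimal digits (GainsDigit is a hypothetical name; it needs using System.Linq, and the number >= 10 guard excludes 1 itself, whose digit-square sum is also 1):
public static bool GainsDigit(int number)
{
    return number >= 10
        && number.ToString().Sum(c => (c - '0') * (c - '0')) == 1;
}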

Pick up two numbers from an array so that the sum is a constant

I came across an algorithm problem. Suppose I receive a credit and would like to buy two items from a local store. I would like to buy two items that add up to the entire value of the credit. The input data has three lines.
The first line is the credit, the second line is the total number of items and the third line lists all the item prices.
Sample data 1:
200
7
150 24 79 50 88 345 3
Which means I have $200 to buy two items, and there are 7 items. I should buy item 1 and item 4, as 200 = 150 + 50.
Sample data 2:
8
8
2 1 9 4 4 56 90 3
Which indicates that I have $8 to pick two items out of 8 articles in total. The answer is item 4 and item 5, because 8 = 4 + 4.
My thought is first to create the array, of course, then pick any item, say item x. Create another array, say "remain", which removes x from the original array.
Subtract the price of x from the credit to get the remnant and check whether "remain" contains the remnant.
Here is my code in C#.
// Read lines from the input file and create the string array price
foreach (string s in price)
{
    int x = Int32.Parse(s);
    string y = (credit - x).ToString();
    index1 = Array.IndexOf(price, s);
    index2 = Array.IndexOf(price, y);
    remain = price.ToList();
    remain.RemoveAt(index1); // remove the current element
    if (remain.Contains(y))
    {
        break;
    }
}
// return something....
My two questions:
What is the complexity? I think it is O(n²).
Any improvement to the algorithm? When I use sample 2, I have trouble getting the correct indices. Because there are two "4"s in the array, it always returns the first index, since IndexOf(String) reports the zero-based index of the first occurrence of the specified string in this instance.
You can simply sort the array in O(n log n) time. Then for each element A[i], conduct a binary search for S - A[i], again O(n log n) in total.
EDIT: As pointed out by Heuster, you can solve the 2-SUM problem on the sorted array in linear time by using two pointers (one from the beginning and the other from the end).
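A minimal sketch of that two-pointer scan (TryFindPair is a hypothetical name; it assumes the array is already sorted):
public static bool TryFindPair(int[] sorted, int sum, out int i, out int j)
{
    i = 0;
    j = sorted.Length - 1;
    while (i < j)
    {
        int s = sorted[i] + sorted[j];
        if (s == sum) return true; // sorted[i] + sorted[j] is the pair
        if (s < sum) i++;          // sum too small: move the left pointer right
        else j--;                  // sum too big: move the right pointer left
    }
    return false;
}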
Create a HashSet<int> of the prices. Then go through it sequentially. Something like:
HashSet<int> items = new HashSet<int>(itemsList);
int target = 200; // the credit from sample 1
int price1 = -1;
int price2 = -1;
foreach (int price in items)
{
    int otherPrice = target - price;
    if (items.Contains(otherPrice))
    {
        // found a match.
        price1 = price;
        price2 = otherPrice;
        break;
    }
}
if (price2 != -1)
{
    // found a match.
    // price1 and price2 contain the values that add up to your target.
    // now remove the items from the HashSet
    items.Remove(price1);
    items.Remove(price2);
}
This is O(n) to create the HashSet. Because lookups in the HashSet are O(1), the foreach loop is O(n).
This problem is called 2-sum. See., for example, http://coderevisited.com/2-sum-problem/
Here is an algorithm with O(N) time complexity and O(N) space:
1. Put all numbers in a hash table.
2. For each number Arr[i], look up Sum - Arr[i] in the hash table in O(1).
3. If found, then (Arr[i], Sum - Arr[i]) is your pair that adds up to Sum.
Note: the only failing case is when Arr[i] = Sum/2; then you can get a false positive, but you can always check whether there are two Sum/2 elements in the array in O(N).
I know I am posting this a year and a half later, but I just happened to come across this problem and wanted to add input.
If there exists a solution, then you know that both values in the solution must be less than the target sum.
Perform a binary search in the array of values, searching for the target sum (which may or may not be there).
The binary search will end either by finding the sum, or at the closest value less than the sum. That is your starting high value when searching through the array using the previously mentioned solutions. Any value above your new starting high value cannot be part of the solution, as it is more than the target value.
At this point, you have eliminated a chunk of the data in O(log n) time that would otherwise be eliminated in O(n) time.
Again, this is an optimization that may only be worth implementing if the data set calls for it.

Test for gaps in range

I need to test whether some objects inside a database fill a specific range, e.g. 0-999.
I'm using C# and I've created a generic class using IComparable to test for the intersection. This works fine, but now I need to invert it and find all the gaps in this interval.
My database objects have start and end properties, which are integers. I can find where the gaps are, but I need to cluster them to create the missing pieces.
for (int i = 0; i <= 999; i++)
{
    bool intersects = false;
    foreach (var interval in intervals)
    {
        if (Range<int>.Intersects(interval, new Range<int>(i, i)))
        {
            intersects = true;
            break;
        }
    }
    if (!intersects)
        doesNotIntersect.Add(i);
}
With this code I have a pretty list of "holes". What I'm trying to do now is to group these values, but I find that my solution is not optimal and certainly not elegant.
I've read about BitArrays, but how can they help me? I wish that, from a list of ranges, I could find the gaps in a fixed range. If we are talking about a line, I basically need the result of fixed - intervals.
I can only use .NET to solve this. I have a large piece of middleware, and this validation process will occur several times a day, so I'd prefer not to go through the middleware and then the database to solve it.
Let me try to create a picture:
Fixed range that needs to be filled
111111111
Ranges that the objects provide
101100001
Ranges that need to be filled
010011110
This is my range object:
public class Range<T> where T : IComparable
{
    public T Start { get; set; }
    public T End { get; set; }

    public Range(T start, T end)
    {
        Start = start;
        End = end;
    }

    public static bool Intersects(Range<T> left, Range<T> right)
    {
        if (left.Start.CompareTo(right.Start) == 0)
            return true;
        if (left.Start.CompareTo(right.Start) > 0)
        {
            return left.Start.CompareTo(right.End) <= 0;
        }
        return right.Start.CompareTo(left.End) <= 0;
    }
}
I need to find the gaps between start and end points, instead of the continuous intervals.
Help?
00000000000000000000000000000
| |
8:00 9:00
Suppose every '0' in the BitArray represents a time unit (a second, an hour, etc.).
Start looping over the intervals and set the bits according to the start and end values.
Now you will have something like this:
11110001111110001111000111000
The '0' runs are your grouped gaps.
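For illustration, a sketch of that idea reusing the question's Range<int> (intervals is the question's list of ranges; BitArray comes from System.Collections):
var covered = new BitArray(1000); // the fixed range 0..999
foreach (var interval in intervals)
    for (int i = interval.Start; i <= interval.End; i++)
        covered[i] = true;

// Group the remaining '0' runs into gap ranges.
var gaps = new List<Range<int>>();
int start = -1;
for (int i = 0; i < covered.Length; i++)
{
    if (!covered[i] && start < 0) start = i;   // a gap begins
    else if (covered[i] && start >= 0)         // a gap just ended
    {
        gaps.Add(new Range<int>(start, i - 1));
        start = -1;
    }
}
if (start >= 0) gaps.Add(new Range<int>(start, covered.Length - 1));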
You could use SQL for that, if the integer value can be represented by an entity. Just create a table with a single column seq containing all values from 0 to 999, then, using a left outer join, join that table with the entity and select only those ids where the entity is null.
An example query would look like this:
SELECT ts.seq
FROM sequenceTable ts LEFT OUTER JOIN sourceTable st ON ts.seq = st.entity
WHERE st.entity is null;
You could use the row number to create the seq column of the sequenceTable.
--EDIT
As the solution should be in the CLR, you can use collections: create a List with the values from 0 to 999, then remove all values covered by the intervals.
The next solution is using a boolean array. Create an array of the proper length (1000 in this case, for 0-999), then iterate through the intervals and set true at each covered index in the boolean array; then iterate once more over that array, and the missing values will be represented by the indexes whose value is false.
