Identify gaps in N dimensional data set


We have an interesting problem to solve. We're using C# on .NET 4.0, but the language should be irrelevant because it's a mathematical problem.
Problem: we need to identify gaps in an N-dimensional data set and report back to the user exactly where those gaps are.
For example, let's assume we're working in 3-D, so we have an object Quote with 6 properties: TermFrom, TermTo, AgeFrom, AgeTo, AmountFrom, AmountTo, and boundaries we need to cover: MinTerm = 0, MaxTerm = 5, MinAge = 0, MaxAge = 5, MinAmount = 0, MaxAmount = 5. All minimums are 0 and all maximums are 5 just to simplify the example; they could be different. The data set we need to check for gaps is the following:
Quote[] {
{ TermFrom=0, TermTo=3, AgeFrom=0, AgeTo=4, AmountFrom=0, AmountTo=2 },
{ TermFrom=4, TermTo=5, AgeFrom=0, AgeTo=5, AmountFrom=3, AmountTo=5 }
}
This data set contains gaps for the combinations { Term: 0-5, Age: 4-5, Amount: 0-2 }, { Term: 0-3, Age: 0-5, Amount: 2-5 } and { Term: 4-5, Age: 0-5, Amount: 0-2 } (I think).
That is, if you imagine a cube and the data sets are parts of this cube, the total volume of all data sets must equal the cube's volume. If the cube isn't completely filled, we need to identify where the gaps are.
All of this is required with more dimensions, 4 and 5, but that is a lot harder to visualise. I was hoping there is some sort of mathematical solution to this problem that we could translate into C# code.

Use a k-d tree; it is meant precisely to partition space for these sorts of applications.
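To make "reporting the gaps" concrete, here is a minimal sketch. Note that it is not a k-d tree: it swaps in a simpler grid approach that cuts every dimension at each box edge and tests each resulting cell for coverage. The Box and GapFinder names are hypothetical, and it assumes that ranges are continuous, that a box covers a cell only if it spans it completely, and that min < max in every dimension.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical N-dimensional box: Lo[d] and Hi[d] bound dimension d.
public sealed class Box
{
    public double[] Lo { get; private set; }
    public double[] Hi { get; private set; }
    public Box(double[] lo, double[] hi) { Lo = lo; Hi = hi; }
}

public static class GapFinder
{
    // Returns every grid cell inside [min, max] that no box fully covers.
    public static List<Box> FindGaps(double[] min, double[] max, IList<Box> boxes)
    {
        int dims = min.Length;

        // Per dimension: sorted, distinct cut points (outer bounds plus box edges inside them).
        var cuts = new double[dims][];
        for (int d = 0; d < dims; d++)
        {
            int axis = d;
            cuts[d] = boxes.SelectMany(b => new[] { b.Lo[axis], b.Hi[axis] })
                           .Concat(new[] { min[axis], max[axis] })
                           .Where(v => v >= min[axis] && v <= max[axis])
                           .Distinct().OrderBy(v => v).ToArray();
        }

        var gaps = new List<Box>();
        var index = new int[dims]; // odometer over the cells of the grid
        while (true)
        {
            var lo = new double[dims];
            var hi = new double[dims];
            for (int d = 0; d < dims; d++)
            {
                lo[d] = cuts[d][index[d]];
                hi[d] = cuts[d][index[d] + 1];
            }

            // A cell is covered if some box spans it in every dimension.
            bool covered = boxes.Any(b =>
                Enumerable.Range(0, dims).All(a => b.Lo[a] <= lo[a] && b.Hi[a] >= hi[a]));
            if (!covered) gaps.Add(new Box(lo, hi));

            // Advance the odometer; stop once every dimension has wrapped around.
            int k = 0;
            while (k < dims && ++index[k] == cuts[k].Length - 1) { index[k] = 0; k++; }
            if (k == dims) break;
        }
        return gaps;
    }
}
```

Under those assumptions, the two quotes from the question leave 15 uncovered cells with a combined volume of 91 out of 125. The cost grows with the number of cells times the number of boxes, so a spatial index such as the k-d tree suggested above would be the next step for large inputs.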

Related

C# data structure for finding current item based on ranges

I am looking for a data structure (or confirmation that one doesn't exist) where you can input items, each with a numerical value marking the beginning of the range in which that item is valid.
Imagine that you're trying to find out your reward for a high score.
The elements would look something like this:
0 - nothing
100 - silver star
300 - gold star
I'd then like to quickly evaluate my current reward or any generic item based on my current score. If I pass in 270, I get silver star.
When I first had to implement something like this, I used a list of tuples, the first value being the numeric floor and the second being the item. Of course this works, but as the list grows, so does the lookup time.
I have to implement something similar to this again, but with normalized values. Again, I can do the same thing, but if there is a better, more efficient way using a tree, please point me in that direction.
Regardless of the correct structure, I'll write a wrapper for the normalized and raw value search.

Try this:
var levels = new []
{
new { score = 0, level = "nothing" },
new { score = 100, level = "silver" },
new { score = 300, level = "gold" },
};
var score = 270;
var level = levels.OrderBy(x => x.score).Where(x => x.score <= score).Last().level;
Console.WriteLine(level);
That gives me silver on the console.
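If lookup time matters as the list grows, the same floor lookup can be done with a binary search over a sorted array instead of sorting and scanning on every call. This is only a sketch: RewardLookup and its parameters are hypothetical names, and it assumes the thresholds are sorted ascending and paired by index with the rewards.

```csharp
using System;

public static class RewardLookup
{
    // thresholds must be sorted ascending; rewards[i] applies from thresholds[i] upward.
    public static string Find(int[] thresholds, string[] rewards, int score)
    {
        int i = Array.BinarySearch(thresholds, score);
        // On a miss, ~i is the index of the first larger threshold,
        // so the applicable entry is the one just before it.
        if (i < 0) i = ~i - 1;
        return i < 0 ? null : rewards[i]; // null: score is below the first threshold
    }
}
```

With thresholds { 0, 100, 300 } and the rewards from the question, a score of 270 yields "silver star" in O(log n) time.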

Checking against a changing set of integer ranges using C#

When filling in a form, the user needs to specify an amount. This amount is then checked against approximately 4 to 6 ranges. The selected range is then saved in the database. The original amount will not be stored (for non-technical reasons). There will be no overlap between the ranges, e.g.:
0-999
1000-1999
2000-4999
5000-9999
10000-higher
The tricky part is that these ranges are not fixed in stone. There can be alterations and additional ranges can be added to further specify the '10000 and higher' range. These changes will occur a couple of times and can't be prevented. The old ranges will need to be stored since the specific amount can not be saved to the database.
What would be the most efficient C# data structure for checking against a changing set of ranges?
For my research I included:
One of the answers here suggests that a fixed set of integer ranges in a switch statement is possible with C# 7. However, it is not possible to dynamically add cases to or remove cases from a switch statement.
This question suggests that using Enumerable.Range is not the most efficient way.

A simple approach here is to store the lower band values in an array, and pass it to a FindBand() method which returns an integer representing the index of the band containing the value.
For example:
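public static int FindBand(double value, double[] bandLowerValues)
{
    // Return the band just below the first lower bound that exceeds the value.
    for (int i = 0; i < bandLowerValues.Length; ++i)
        if (value < bandLowerValues[i])
            return Math.Max(0, i - 1); // values below the first bound fall into band 0
    // No lower bound exceeds the value, so it belongs to the last band.
    return bandLowerValues.Length - 1;
}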
Test code:
double[] bandLowerValues = {0, 1, 2, 5, 10};
Console.WriteLine(FindBand(-1, bandLowerValues));
Console.WriteLine(FindBand(0, bandLowerValues));
Console.WriteLine(FindBand(0.5, bandLowerValues));
Console.WriteLine(FindBand(1, bandLowerValues));
Console.WriteLine(FindBand(1.5, bandLowerValues));
Console.WriteLine(FindBand(2.5, bandLowerValues));
Console.WriteLine(FindBand(5, bandLowerValues));
Console.WriteLine(FindBand(8, bandLowerValues));
Console.WriteLine(FindBand(9.9, bandLowerValues));
Console.WriteLine(FindBand(10, bandLowerValues));
Console.WriteLine(FindBand(11, bandLowerValues));
This isn't the fastest approach if there are a LOT of bands, but if there are just a few bands this is likely to be sufficiently fast.
(If there were a lot of bands, you could use a binary search to find the appropriate band, but that would be overkill for this in my opinion.)

You can sort low bounds, e.g.
// or decimal instead of double if values are money
double[] lowBounds = new double[] {
0, // 0th group: (-Inf .. 0)
1000, // 1st group: [0 .. 1000)
2000, // 2nd group: [1000 .. 2000)
5000, // 3rd group: [2000 .. 5000)
10000, // 4th group: [5000 .. 10000)
// 5th group: [10000 .. +Inf)
};
and then find the correct group (0-based)
int index = Array.BinarySearch(lowBounds, value);
index = index < 0 ? -index - 1 : index + 1;
Demo:
double[] tests = new double[] {
-10,
0,
45,
999,
1000,
1997,
5123,
10000,
20000,
};
var result = tests
.Select(value => {
int index = Array.BinarySearch(lowBounds, value);
index = index < 0 ? -index - 1 : index + 1;
return $"{value,6} : {index}";
});
Console.Write(string.Join(Environment.NewLine, result));
Outcome:
-10 : 0
0 : 1
45 : 1
999 : 1
1000 : 2
1997 : 2
5123 : 4
10000 : 5
20000 : 5

Since there are already great answers regarding how to find the correct range, I'd like to address the persistence issue.
What do we have here?
You cannot persist the exact value. ( Not allowed )
Values will be "blurred" by fitting them into a range.
Those ranges can (and will) change over time in bounds and number.
So, what I would probably do would be to persist lower and upper bound explicitly in the db.
That way, if ranges change, old data is still correct. You cannot "transform" to the new ranges, because you cannot know if it would be correct. So you need to keep the old values. Any new entries after the change will reflect the new ranges.
One could think of normalization, but honestly, I think that would be overcomplicating the problem. I'd only consider that if the benefit (less storage space) would greatly outweigh the complexity issues.

Recommended Combination Algorithms for Numbers with Ranges

I am currently trying to write C# code that finds all arrays of integers that add up to a specified total, where each integer in the array is restricted to its own range.
For example, if our total is 10 and we have an int array of size 3 where the first number can be between 1 and 4, the second 2 and 4, and the third 3 and 6, some possible combinations are [1, 3, 6], [2, 2, 6], and [4, 2, 4].
What sort of algorithm would solve a problem like this in the most efficient amount of time? Also, what other things should I keep in mind when translating this problem into C# code?

I would do this using recursion. You can simply iterate over all possible values and see if they give a required sum.
Input
Let's suppose we have the following input pattern:
N S
min1 min2 min3 ... minN
max1 max2 max3 ... maxN
For your example
"if our total is 10 and we have an int array of size 3 where the first number can be between 1 and 4, the second 2 and 4, and the third 3 and 6"
it will be:
3 10
1 2 3
4 4 6
Solution
We have read our input values. Now, we just try to use each possible number for our solution.
We will have a List which will store the current path:
static List<int> current = new List<int>();
The recursive function is pretty simple:
private static void Do(int index, int currentSum)
{
if (index == length) // Termination
{
if (currentSum == sum) // If result is a required sum - just output it
Output();
return;
}
// try all possible solutions for current index
for (int i = minValues[index]; i <= maxValues[index]; i++)
{
current.Add(i);
Do(index + 1, currentSum + i); // pass new index and new sum
current.RemoveAt(current.Count - 1); // backtrack before trying the next value
}
}
For non-negative values we can also add the following condition. It improves the recursion by cutting off a huge number of fruitless branches: if currentSum is already greater than sum, it is useless to continue down this branch:
if (currentSum > sum) return;
Actually, this algorithm is the simple "find combinations that give a sum S" solution with one difference: the inner loop runs from minValues[index] to maxValues[index].
Demo
Here is the working IDEOne demo of my solution.

You cannot do much better than nested for loops/recursion. Though if you are familiar with the 3SUM problem you will know a little trick to reduce the time complexity of this sort of algorithm! If you have n ranges then you know what number you have to pick from the nth range after you make your first n-1 choices!
I will use an example to walk through my suggestion.
if our total is 10 and we have an int array of size 3 where the first number can be between 1 and 4, the second 2 and 4, and the third 5 and 6
First of all, let's process the data to be a bit nicer to deal with. I personally like the idea of working with ranges that start at 0 instead of arbitrary numbers! So we subtract the lower bounds from the upper bounds:
(1 to 4) -> (0 to 3)
(2 to 4) -> (0 to 2)
(5 to 6) -> (0 to 1)
Of course now we need to adjust our target sum to reflect the new ranges. So we subtract our original lower bounds from our target sum as well!
TargetSum = 10-1-2-5 = 2
Now we can represent our ranges with just the upper bound since they share a lower bound! So a range array would look something like:
RangeArray = [3,2,1]
Let's sort this (it will become more obvious why later). So we have:
RangeArray = [1,2,3]
Great! Now onto the beef of the algorithm... the summing! For now I will use for loops, as they are easier to follow in an example. For an arbitrary number of ranges you will have to use recursion. Yeldar's code should give you a good starting place.
result = []
for i from 0 to RangeArray[0]:
    SumList = [i]
    newSum = TargetSum - i
    for j from 0 to RangeArray[1]:
        if (newSum-j)>=0 and (newSum-j)<=RangeArray[2] then
            finalList = SumList + [j, newSum-j]
            result.append(finalList)
Note the inner loop. This is what was inspired by the 3SUM algorithm. We take advantage of the fact that we know what value we have to pick from the third range (since it is defined by our first 2 choices).
From here you of course have to re-map the results back to the original ranges by adding the original lower bounds to the values that came from the corresponding ranges.
Notice that we now understand why it may be a good idea to sort RangeArray. The last range gets absorbed into the second-to-last range's loop. We want the largest range to be the one that does not loop.
I hope this helps to get you started! If you need any help translating my pseudocode into c# just ask :)
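One possible C# translation of the idea, generalized to any number of ranges, could look like the sketch below. It keeps the key trick (the last value is forced by the earlier choices, so the last range needs no loop) but, to stay short, it works on the original bounds directly and skips the shift-to-zero and sorting steps. RangeSums, Find and Fill are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RangeSums
{
    // Returns every array whose i-th value lies in [mins[i], maxs[i]] and whose values sum to target.
    public static List<int[]> Find(int[] mins, int[] maxs, int target)
    {
        var results = new List<int[]>();
        if (mins.Length == 0) return results;
        Fill(mins, maxs, target, 0, new int[mins.Length], results);
        return results;
    }

    private static void Fill(int[] mins, int[] maxs, int remaining, int index,
                             int[] current, List<int[]> results)
    {
        int last = mins.Length - 1;
        if (index == last)
        {
            // The last value is determined by the earlier choices: it must equal
            // whatever is left of the target, so only a range check is needed.
            if (remaining >= mins[last] && remaining <= maxs[last])
            {
                current[last] = remaining;
                results.Add((int[])current.Clone());
            }
            return;
        }

        // Try every value of the current range and recurse with the reduced target.
        for (int v = mins[index]; v <= maxs[index]; v++)
        {
            current[index] = v;
            Fill(mins, maxs, remaining - v, index + 1, current, results);
        }
    }
}
```

For the ranges used in this answer (1 to 4, 2 to 4, 5 to 6) and a total of 10, this finds 5 combinations. Placing the widest range last, as the sorting step suggests, saves the most loop iterations.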

Check if int is 10, 100, 1000,

I have a part in my application which needs to do something (=> add padding 0 in front of other numbers) when a specified number gets an additional digit, meaning it reaches 10, 100, 1000 and so on...
At the moment I use the following logic for that:
public static bool IsNewDigit(this int number)
{
var numberString = number.ToString();
return numberString.StartsWith("1")
&& numberString.Substring(1).All(c => c == '0');
}
Then I can do:
if (number.IsNewDigit()) { /* add padding 0 to other numbers */ }
Using the string conversion seems like a "hack" to me.
Is there something better (maybe even "built-in") to do this?
UPDATE:
One example where I need this:
I have an item with the following (simplified) structure:
public class Item
{
public int Id { get; set; }
public int ParentId { get; set; }
public int Position { get; set; }
public string HierarchicPosition { get; set; }
}
HierarchicPosition is the item's own position (with the padding) appended to the parent's HierarchicPosition. E.g. an item which is the 3rd child of 12 from an item at position 2 has 2.03 as its HierarchicPosition. This can as well be something more complicated like 011.7.003.2.02.
This value is then used for sorting the items very easily in a "tree-view" like structure.
Now I have an IQueryable<Item> and want to add one item as the last child of another item. To avoid needing to recreate every HierarchicPosition, I would like to detect (with the logic in question) whether the new position adds a new digit:
Item newItem = GetNewItem();
IQueryable<Item> items = db.Items;
var maxPosition = items.Where(i => i.ParentId == newItem.ParentId)
.Max(i => i.Position);
newItem.Position = maxPosition + 1;
if (newItem.Position.IsNewDigit())
UpdateAllPositions(items.Where(i => i.ParentId == newItem.ParentId));
else
newItem.HierarchicPosition = GetHierarchicPosition(newItem);
UPDATE #2:
I query this position string from the DB like:
var items = db.Items.Where(...)
.OrderBy(i => i.HierarchicPosition)
.Skip(pageSize * pageNumber).Take(pageSize);
Because of this I cannot use an IComparer (or something else which sorts "via code").
This will return items with HierarchicPosition like (pageSize = 10):
03.04
03.05
04
04.01
04.01.01
04.01.02
04.02
04.02.01
04.03
05
UPDATE #3:
I like the alternative solution with the double values, but I have some "more complicated cases" like the following, which I am not sure I can solve with that:
I am building (as one part of many) an image gallery, which has Categories and Images. A category can have a parent and multiple children, and each image belongs to a category (I called them Holder and Assets in my logic, so each image has a holder and each category can have multiple assets). These images are sorted first by the category's position and then by their own position. I do this by combining the HierarchicPosition values like HolderHierarchicPosition#ItemHierarchicPosition. So in a category which has 02.04 as its position and 120 images, the 3rd image would get 02.04#003.
I even have some cases with "three levels" (or maybe more in the future) like 03.1#02#04.
Can I adapt the "double solution" to support such scenarios?
P.S.: I am also open to other solutions for my base problem.

You could check if base-10 logarithm of the number is an integer. (10 -> 1, 100 -> 2, 1000 -> 3, ...)
This could also simplify your algorithm a bit in general. Instead of adding one 0 of padding every time you find something bigger, simply keep track of the maximum number you see, then take length = floor(log10(number))+1 and make sure everything is padded to length. This part does not suffer from the floating point arithmetic issues like the comparison to integer does.
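A sketch of both ideas follows. To sidestep the floating-point comparison mentioned above, it swaps the logarithm for plain integer division; DigitHelper and its method names are hypothetical, and it assumes positions are non-negative.

```csharp
using System;

public static class DigitHelper
{
    // True for 1, 10, 100, 1000, ... (like the string-based check, this accepts 1 as well).
    public static bool IsPowerOfTen(int number)
    {
        if (number < 1) return false;
        while (number % 10 == 0) number /= 10; // strip trailing zeros
        return number == 1;
    }

    // Number of decimal digits of a non-negative number; equals floor(log10(n)) + 1 for n >= 1.
    public static int DigitCount(int number)
    {
        int count = 1;
        while (number >= 10) { number /= 10; count++; }
        return count;
    }

    // Pads a position with leading zeros to the width of the largest position seen.
    public static string Pad(int position, int maxPosition)
    {
        return position.ToString("D" + DigitCount(maxPosition));
    }
}
```

For example, DigitHelper.Pad(3, 120) gives "003", which matches the padding used in the question's 02.04#003 example.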

From what you describe, it looks like your HierarchicPosition should maintain an order of items, and you run into the problem that when you have the ids 1..9 and add a 10, you get the order 1,10,2,3,4,5,6... somewhere and therefore want to left-pad to 01,02,03...,10. Correct?
If I'm right, please have a look at this first: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
What you are trying to do is a workaround that solves the problem in a certain way, but there might be more efficient ways to actually solve it. (You would therefore have been better off asking about your actual problem rather than the solution you are trying to implement.)
See here for a solution using a custom IComparer to sort strings (that are actually numbers) in numeric order: http://www.codeproject.com/Articles/11016/Numeric-String-Sort-in-C
Update regarding your update:
By providing a sorting "string" like you do, you can insert an element "somewhere" without having to reindex ALL subsequent items, as you would with an integer value. (This seems to be the purpose.)
Instead of building up a complex "string", you could use a double value to achieve the very same result quickly:
If you insert an item somewhere between 2 existing items, all you have to do is: this.sortingValue = (prior.sortingValue + next.sortingValue) / 2, and handle the case where you are inserting at the end of the list.
Let's assume you add Elements in the following order:
1 First Element // pick a double value for sorting - 100.00 for example. -> 100.00
2 Next Element // this is the list end - lets just add another 100.00 -> 200.00
1.1 Child // this should go "in between": (100+200)/2 = 150.00
1.2 Another // between 1.1 and 2 : (150+200)/2 = 175
When you now simply sort by that double field, the order would be:
100.00 -> 1
150.00 -> 1.1
175.00 -> 1.2
200.00 -> 2
Want to add 1.1.1? Great: position = (150.00 + 175.00) / 2;
You could simply multiply all values by 10 whenever your new value hits x.5, to ensure you are not running out of decimal places (but you don't have to: having .5, .25, .125 ... does not hurt the sorting):
So, after adding 1.1.1, which would be 162.5, multiply all by 10:
1000.00 -> 1
1500.00 -> 1.1
1625.00 -> 1.1.1
1750.00 -> 1.2
2000.00 -> 2
So, whenever you move an item around, you only need to recalculate the position of n by looking at n-1 and n+1.
Depending on the expected number of children per entry, you could start with "1000.00", "10.000" or whatever matches best.
What I didn't take into account: when you want to move "2" to the top, you would need to recalculate all children of "2" to have a value somewhere between the sorting value of "2" and the item that is now "next"... This could cause some headaches :)
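The midpoint rule described above fits in a few lines. This is only a sketch: SortKeys, Between, AfterLast and the step of 100 are hypothetical names and values taken from the worked example, not an existing API.

```csharp
using System;

public static class SortKeys
{
    // Gap left between consecutive items appended at the end of the list (assumed value).
    public const double Step = 100.0;

    // Sorting value for an item inserted between two existing neighbours.
    public static double Between(double prior, double next)
    {
        return (prior + next) / 2;
    }

    // Sorting value for an item appended after the current last item.
    public static double AfterLast(double last)
    {
        return last + Step;
    }
}
```

SortKeys.Between(150.0, 175.0) returns 162.5, the value computed for 1.1.1 above. Repeated halving between the same two neighbours eventually exhausts the precision of a double, which is one of the limitations of this approach.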
The solution with "double" values has some limitations, but it will work for smaller sets of groups. However, you are talking about "groups, subgroups, and pictures with counts of 100", so another solution would be preferable:
First, you should refactor your database: currently you are trying to "squeeze" a tree into a list (data tables are basically lists).
To really reflect the complex layout of a tree with unlimited depth, you should use 2 tables and implement the composite pattern.
Then you can use a recursive approach to get a category, its subcategories, [...] and finally the elements of that category.
With that, you only need to store the position of each leaf within its current node.
Rearranging leaves will not affect any leaf of another node or any node.
Rearranging nodes will not affect any subnode or leaf of that node.

You could check the sum of the squares of all digits of the input. What 10, 100 and 1000 have in common is that the sum of the squares of their digits is exactly one:
10
1^2 + 0^2 = 1
100
1^2 + 0^2 + 0^2 = 1
and so on.

Building a non sequential list of numbers (From a large range)

I need to create a non-sequential list of numbers that fit within a range. For instance, I need to generate a list of numbers from 1 to 1 million and make sure that none of the numbers are in sequential order, i.e. that they are completely shuffled. I guess my first question is: are there any good algorithms out there that could help, and how is this best implemented?
I am currently not sure of the best way to implement this, either via a C# console app that writes the numbers to an XML file or in a database that writes the numbers to a table or a set of tables, but that is really secondary to working out the best way of "shuffling" the set of numbers.
Any advice guys?
Rob

First off, if none of the numbers are in sequential order then every number in the sequence must be less than its predecessor. A sequence which has that property is sorted from biggest to smallest! Clearly that is not what you want. (Or perhaps you simply do not want any subsequence of the form 5, 6, 7 ? But 6, 8, 20 would be OK?)
To answer your question properly we need to know more information about the problem space. Things I would want to know:
1) Is the size of the range equal to, larger than, or smaller than the size of the sequence? That is, are you going to ask for ten numbers between 1 and 10, five numbers between 1 and 10 or fifty numbers between 1 and 10?
2) Is it acceptable for the sequence to contain duplicates? (If the number of items in the sequence is larger than the range, then clearly yes.)
3) What is the randomness being used for? Most random number generators are only pseudo-random; a clever attacker can deduce the next "random" number by knowing the previous ones. If for example you are generating a series of five cards out of a deck of 52 to make a poker hand, you want really strong randomness; you don't want players to be able to deduce what their opponents have in their hands.

How "non-sequential" do you want it?
You could easily generate a list of random numbers from a range with the Random class:
Random rnd1 = new Random();
List<int> largeList = new List<int>();
for (int i = 0; i < largeNumber; i++)
{
    largeList.Add(rnd1.Next(1, 1000001)); // upper bound is exclusive, so this yields 1..1000000
}
Edit to add
Admittedly the Durstenfeld algorithm (modern version of the Fisher–Yates shuffle apparently) is much faster:
var rnd = new Random();
var fisherYates = new List<int>(upperBound);
for (int i = 0; i < upperBound; i++)
{
    fisherYates.Add(i);
}
int n = upperBound;
while (n > 1)
{
    n--;
    int k = rnd.Next(n + 1); // random index in 0..n
    // Swap the randomly chosen element into position n.
    int temp = fisherYates[k];
    fisherYates[k] = fisherYates[n];
    fisherYates[n] = temp;
}
For the range 1 to 10000, doing a brute force "find a random number I've not yet used" takes around 4-5 seconds, while this takes around 0.001 seconds.
Props to Greg Hewgill for the links.

I understand that you want to get a random array of length 1 million containing all numbers from 1 to 1 million, with no duplicates. Is that right?
You should build an array with your numbers ranging from 1 to 1 million, then start shuffling. But it can happen (that is true randomness) that two or even more numbers end up in sequence.
Have a look here

Here's a C# function to get you started:
public IEnumerable<int> GetRandomSequence(int max)
{
    var r = new Random();
    while (true)
    {
        yield return r.Next(max); // 0 <= value < max
    }
}
Call it like this to get a million numbers in the range 0-9999999:
var numbers = GetRandomSequence(10000000).Take(1000000);
As for sorting, or if you don't want to allow repeats, look at Enumerable.Range() (which will give you a consecutive ordered sequence) and use a Fisher-Yates (or Knuth) shuffle algorithm (which you can find all over the place).

"Completely shuffled" is a very misunderstood term. One trick fraud experts use when examining what should be "random" data is to watch for cases where there are no duplicate values (like the 88 in 374388123), because in a truly random sequence the chance of not having such a pair is very low. Exactly what are you trying to do? What exactly do you mean by "completely shuffled"? If all you mean is a random sequence of digits, then just use the Random class in the CLR to generate random numbers between 0 and 1M, as many as you need.

Well, you could go with something like this (assuming that you want every number exactly once):
DECLARE @intFrom int
DECLARE @intTo int
DECLARE @tblList table (_id uniqueidentifier, _number int)
SET @intFrom = 0
SET @intTo = 1000000
WHILE (@intFrom < @intTo)
BEGIN
    INSERT INTO @tblList
    SELECT NewID(), @intFrom
    SET @intFrom = @intFrom + 1
END
SELECT *
FROM @tblList
ORDER BY _id
DISCLAIMER: I didn't test this, since I don't have an SQL Server at my disposal at the moment.

This may get you what you need:
1) Populate a list of numbers in order. If your range is 1 - x, it'll look like this:
[1, 2, 3, 4, 5, 6, 7, 8, 9, ... , x]
2) Loop over the list x times, each time choosing a random number between 0 and the length of your list - 1.
3) Use this chosen number to select the corresponding element from your list, and add this number to your output list.
4) Delete the element you just selected from your list. Rinse, repeat.
This will work for any range of numbers, not just lists that start with 1 or 0. The pseudocode looks like this:
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
shuffled_nums = []
for i in range(0, len(nums)):
    random_index = rand(0, len(nums) - 1)  # inclusive bounds, as in step 2
    shuffled_nums.add(nums[random_index])
    del(nums[random_index])
