Test for gaps in range - c#

I need to test whether some objects in a database fill a specific range, e.g. 0-999.
I'm using C# and I've created a generic class using IComparable to test for the intersection. This works fine, but I need to invert it and find all the gaps that I have in this interval.
My database objects have start and end properties, which are integers. I can find where the gaps are, but I need to cluster them to create the missing pieces.
foreach (var interval in intervals)
{
    for (int i = 0; i <= 999; i++)
    {
        if (Range<int>.Intersects(interval, new Range<int>(i, i)))
            continue;
        else
            doesNotIntersect.Add(i);
    }
}
With this code I have a pretty list of "holes". What I'm trying to do now is to group these values, but I find that my solution is not optimal and certainly not elegant.
I've read about BitArrays, but how can they help me? Given a list of ranges, I want to find the gaps within a fixed range. If we think of it as a line, I basically need the result of fixed - intervals.
I can only use .NET to solve this. I have a large piece of middleware and this validation process will occur several times a day, so I'd prefer not to go through the middleware and then the database to solve it.
Let me try to create a picture
Fixed range that needs to be filled
111111111
Ranges that objects provided
101100001
Ranges that need to be filled
010011110
This is my range object:
public class Range<T> where T : IComparable
{
    public T Start { get; set; }
    public T End { get; set; }
    public Range(T start, T end)
    {
        Start = start;
        End = end;
    }
    public static bool Intersects(Range<T> left, Range<T> right)
    {
        if (left.Start.CompareTo(right.Start) == 0)
            return true;
        if (left.Start.CompareTo(right.Start) > 0)
        {
            return left.Start.CompareTo(right.End) <= 0;
        }
        return right.Start.CompareTo(left.End) <= 0;
    }
}
I need to find gaps between start and end points, instead of continuous intervals.
Help?

00000000000000000000000000000
| |
8:00 9:00
Suppose every '0' in the bitarray represents a time unit(second, hour etc.)
Start looping the intervals and set bits according to start & end values.
Now you will have something like this
11110001111110001111000111000
The '0' are your grouped gaps
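To make the BitArray idea concrete, here is a minimal sketch, assuming the intervals arrive as inclusive (Start, End) integer pairs. The class name BitArrayGaps and the method FindGaps are hypothetical names chosen for illustration, not part of the question's code. It sets one bit per covered value and then walks the bits once to group consecutive unset bits into ranges.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BitArrayGaps
{
    // Hypothetical helper: returns the uncovered parts of [min, max]
    // as inclusive (Start, End) pairs.
    public static List<(int Start, int End)> FindGaps(
        IEnumerable<(int Start, int End)> intervals, int min, int max)
    {
        var covered = new BitArray(max - min + 1); // every bit starts as false

        // Set a bit for each value covered by an interval, clipped to [min, max].
        foreach (var interval in intervals)
        {
            int from = Math.Max(interval.Start, min);
            int to = Math.Min(interval.End, max);
            for (int i = from; i <= to; i++)
                covered[i - min] = true;
        }

        // Group runs of unset bits into gap ranges.
        var gaps = new List<(int Start, int End)>();
        int gapStart = -1; // bit index where the current gap began, -1 if none
        for (int i = 0; i < covered.Length; i++)
        {
            if (!covered[i])
            {
                if (gapStart < 0) gapStart = i;
            }
            else if (gapStart >= 0)
            {
                gaps.Add((min + gapStart, min + i - 1));
                gapStart = -1;
            }
        }
        if (gapStart >= 0) gaps.Add((min + gapStart, max));
        return gaps;
    }
}
```

With the picture from the question (positions 0 to 8, provided 101100001), calling FindGaps with the intervals (0,0), (2,3) and (8,8) gives the gaps (1,1) and (4,7), which corresponds to 010011110.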

You could use SQL for that, if the integer value can be represented by an entity. Create a table with a single column seq holding all values from 0 to 999, then left outer join that table with the entity and select only those seq values where the entity is null.
The query should look like this.
SELECT ts.seq
FROM sequenceTable ts LEFT OUTER JOIN sourceTable st ON ts.seq = st.entity
WHERE st.entity is null;
You could use the row number to populate the seq column of sequenceTable.
--EDIT
As the solution should be in the CLR, you can use collections: create a List with the values from 0 to 999, then remove every value covered by the intervals.
Another solution uses a boolean array. Create an array of the proper length (1000 in this case, for 0 to 999), then iterate through the intervals and set the element to true at every index an interval covers. Then iterate once more over the array; the missing intervals are the runs of indexes where the value is false.
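As an alternative to the per-value approaches above (this is a different technique, not one proposed in the answers), the gaps can also be found by sorting the intervals by start and sweeping across them once. The sketch below assumes inclusive integer (Start, End) pairs; SweepGaps and FindGaps are hypothetical names. Its cost depends on the number of intervals rather than on the size of the fixed range.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SweepGaps
{
    // Hypothetical helper: returns the uncovered parts of [min, max]
    // as inclusive (Start, End) pairs.
    public static List<(int Start, int End)> FindGaps(
        IEnumerable<(int Start, int End)> intervals, int min, int max)
    {
        var gaps = new List<(int Start, int End)>();
        int next = min; // smallest value not yet known to be covered

        foreach (var interval in intervals.OrderBy(r => r.Start))
        {
            if (interval.End < next) continue;  // adds no new coverage
            if (interval.Start > max) break;    // everything else is out of range
            if (interval.Start > next)
                gaps.Add((next, interval.Start - 1)); // hole before this interval
            if (interval.End >= max) return gaps;     // rest of the range is covered
            next = interval.End + 1;
        }

        if (next <= max) gaps.Add((next, max)); // trailing hole
        return gaps;
    }
}
```

Overlapping or unsorted input is fine here, because the sweep only ever moves next forward.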

Related

Need help to sorting an array in a complicated way c# linq

I have a struct similar to this:
struct Chunk
{
public float d; // distance from player
public bool active; // is it active
}
I have an array of this struct.
What I need:
I need to sort it so that the first element is an inactive chunk that is also the furthest from the player, then the next element is active and is also the closest to the player, and that pattern repeats:
Inactive, furthest
Active, closest
Inactive, furthest
Active, closest
Inactive, furthest
And so on...
Currently I'm using LINQ,
And I'm doing this :
chunk = chunk.AsParallel().OrderBy(x => x.active ? x.d : -x.d).ToArray();
But I don't know how to make it alternate one after another.
It looks like you want to split this into two lists, sort them, then interleave them.
var inactive = chunk.Where(x => !x.active).OrderByDescending(x => x.d);
var active = chunk.Where(x => x.active).OrderBy(x => x.d);
var interleaved = inactive.Zip(active, (a, b) => new [] { a, b }).SelectMany(x => x);
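One thing to keep in mind is that Zip stops as soon as the shorter sequence runs out, so leftover chunks from the longer list are dropped. If they should be kept, a hand-written interleave is one option. The following is a minimal sketch under that assumption; ChunkOrdering and Alternate are hypothetical names, and the Chunk struct is repeated only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Chunk
{
    public float d;     // distance from player
    public bool active; // is it active
}

public static class ChunkOrdering
{
    // Hypothetical helper: inactive/furthest, active/closest, alternating,
    // then whatever remains of the longer group.
    public static IEnumerable<Chunk> Alternate(IEnumerable<Chunk> chunks)
    {
        var list = chunks.ToList();
        using (var inactive = list.Where(c => !c.active).OrderByDescending(c => c.d).GetEnumerator())
        using (var active = list.Where(c => c.active).OrderBy(c => c.d).GetEnumerator())
        {
            bool hasInactive = inactive.MoveNext();
            bool hasActive = active.MoveNext();
            while (hasInactive || hasActive)
            {
                if (hasInactive)
                {
                    yield return inactive.Current;
                    hasInactive = inactive.MoveNext();
                }
                if (hasActive)
                {
                    yield return active.Current;
                    hasActive = active.MoveNext();
                }
            }
        }
    }
}
```

Usage would look like chunk = ChunkOrdering.Alternate(chunk).ToArray();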
But I don't know how to make it alternate one after another.
If you sort the array so that all the inactives are at the start, by descending distance, and all the actives are at the end, also by descending distance:
OrderBy(x => x.active?1:0).ThenBy(x=>-x.d)
then you can take an item from the start, then an item from the end, then from the start + 1, then the end - 1 working your way inwards
public static class ChunkExtensions{
  public static IEnumerable<Chunk> Interleave(this Chunk[] chunks){
    // s <= e (rather than s < e) so the element where the two indexes meet is returned too
    for(int s = 0, e = chunks.Length - 1; s<=e;){
      if(!chunks[s].active)
        yield return chunks[s++];
      if(chunks[e].active)
        yield return chunks[e--];
    }
  }
}
There's a bit in this so let's unpack it. This is an extension method that acts on an array of Chunk (sorted as described above), so it has to live in a static class. It's a custom enumerator method, so you'd call foreach on it to use it
foreach(var c in chunk.Interleave())
It contains a for loop that tracks two variables, one for the start index and one for the end. The start increments and the end decrements. At some point they'll pass each other and s will be greater than e, which is when we stop:
for(int s = 0, e = chunks.Length - 1; s<=e;){
We need to look at the chunk before we return it, if it's an inactive near the start, yield return it and bump the start on by one. s++ increments s, but resolves to the value s was before it incremented. It's thus conceptually like doing chunks[s]; s += 1; but in a one liner
if(!chunks[s].active)
yield return chunks[s++];
Then we look at the chunk near the end, if it's active then return the ending one and bump the end index down
The inactive chunks are tracked by s, and if s reaches an active chunk it stops returning (every pass of the loop it is skipped), which means e will work its way down towards s returning only the actives
Similarly if there are more inactives than actives, e will stop decrementing first and s will work its way up towards e
If you've never come across yield return before, think of it as a way to resume from where you left off rather than starting the method over again. It's used with enumerations to provide a way for the enumeration to return an item, then be moved on one and return the next item. It works a bit like saving your game and going to do something else, then coming back, reloading your saved game and carrying on from where you left off. Asking an enumerator for Next makes it load the game, play a bit, then save and stop. Then you ask for Next again and the latest save is loaded; it plays some more, saves and stops. This way you gradually get through the game a bit at a time. If you started a new enumeration by calling Interleave again, that's like starting a new game from the beginning
MSDN will get more detailed on yield return if you want to dig in more
Edit:
You can perform an in-place sort of your Chunk[] by having a custom comparer:
public class InactiveThenDistancedDescending : IComparer
{
    public int Compare(object x, object y)
    {
        var a = (Chunk)x;
        var b = (Chunk)y;
        // Same activity state: the larger distance sorts first (descending)
        if(a.active == b.active)
            return -a.d.CompareTo(b.d);
        else
            return a.active.CompareTo(b.active); // false (inactive) sorts before true
    }
}
And:
Array.Sort(chunkArray, _someInstanceOfThatComparerAbove);
Not sure if you can do it with only one line of code.
I wrote a method that would only require the Array to be sorted once. Then, it enters either the next closest or furthest chunk based on the current index of the for loop (odd = closest, even = furthest). I remove the item from the sorted list to ensure that it will not be reentered in the results list. Finally, I return the results as an Array.
public Chunk[] SortArray(List<Chunk> list_to_sort)
{
    //Setup Variables
    var results = new List<Chunk>();
    int count = list_to_sort.Count;
    //Sort once: actives first (closest at index 0), inactives last (furthest at the end)
    list_to_sort = list_to_sort.OrderByDescending(x => x.active).ThenBy(x => x.d).ToList();
    //Loop through the list
    for (int i = 0; i < count; i++)
    {
        //Even index: take from the end (furthest); odd index: take from the start (closest)
        results.Add(list_to_sort[i % 2 == 0 ? list_to_sort.Count - 1 : 0]);
        list_to_sort.RemoveAt(i % 2 == 0 ? list_to_sort.Count - 1 : 0);
    }
    // Return Results
    return results.ToArray();
}
There is probably a better way of doing this but hopefully it helps. Please note that I did not test this method.

most efficient way to query a collection - c#

I'm searching through a generic list (or IQueryable) which contains 3 columns. I'm trying to find the value of the third column based on the first and second, but the search is really slow. For a single search the speed isn't noticeable, but I'm performing this search in a loop, and for 700 iterations it takes a combined time of over 2 minutes, which is no use. Columns 1 and 2 are int and column 3 is a double. Here is the LINQ I'm using:
public static Distance FindByStartAndEnd(int start, int end, IQueryable<Distance> distanceList)
{
Distance item = distanceList.Where(h => h.Start == start && h.End == end).FirstOrDefault();
return item ;
}
There could be up to 60,000 entries in the IQueryable list. I know that is quite a lot, but I didn't think it would pose any problem for searching.
So my question is, is there a better way to search through a collection when needing to match 2 columns to get value of a third? I guess I need all 700 searches to be almost instant, but it takes about 300ms for each which soon mounts up.
UPDATE - Final Solution
I've now created a dictionary using a Tuple of start and end as the key. I think this could be the right solution.
// Filling the dictionary while reading the rows
var dictionary = new Dictionary<Tuple<int, int>, double>();
var key = new Tuple<int, int>(Convert.ToInt32(reader[0]), Convert.ToInt32(reader[1]));
var value = Convert.ToDouble(reader[2]);
if (value <= distance)
{
    dictionary.Add(key, value);
}
// Building a key for a later lookup
var lookupKey = new Tuple<int, int>(5, 20);
Works fine - much faster
Create a dictionary where columns 1 and 2 create the key. You create the dictionary once and then your searches will be almost instant.
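As a rough illustration of this idea, here is a minimal sketch that uses a value tuple of (Start, End) as the key. The Distance class shown here is an assumption about the question's model (in particular the Value member is a made-up name for the third column), and DistanceLookup, Build and Find are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the question's model; Value stands in for column 3.
public sealed class Distance
{
    public int Start { get; set; }
    public int End { get; set; }
    public double Value { get; set; }
}

public static class DistanceLookup
{
    // Build the dictionary once, before the loop of searches.
    public static Dictionary<(int Start, int End), Distance> Build(IEnumerable<Distance> distances)
    {
        var lookup = new Dictionary<(int Start, int End), Distance>();
        foreach (var d in distances)
            lookup[(d.Start, d.End)] = d; // a later duplicate overwrites an earlier one
        return lookup;
    }

    // Each search is then a single hash lookup; returns null when nothing matches.
    public static Distance Find(Dictionary<(int Start, int End), Distance> lookup, int start, int end)
    {
        return lookup.TryGetValue((start, end), out var item) ? item : null;
    }
}
```

If the source is an IQueryable backed by a database, the data would need to be materialized once (for example with ToList) before building the dictionary, so that the 700 searches do not each hit the database.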
If you have control over your collection and model classes, there is a library which allows you to index the properties of the class, which can greatly speed up searching.
http://i4o.codeplex.com/
I'd give a HashSet a try. This should speed things up ;)
Create a single value out of the first two columns, for example by concatenating them into a long, and use that as a key in a dictionary:
public static long Combine(int start, int end) {
    // Cast end to uint so a negative end is not sign-extended into the upper 32 bits
    return ((long)start << 32) | (uint)end;
}
Dictionary<long, Distance> lookup = distanceList.ToDictionary(h => Combine(h.Start, h.End));
Then you can look up the value:
public static Distance FindByStartAndEnd(int start, int end, IQueryable<Distance> distanceList) {
    Distance item;
    if (!lookup.TryGetValue(Combine(start, end), out item)) {
        item = null;
    }
    return item;
}
Getting an item from a dictionary is close to an O(1) operation, which should make a dramatic difference compared to the O(n) operation of looping through the items to find one.
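Your problem is that LINQ has to execute the expression tree every time you return an item. Instead, call this method once with multiple start and end values. Note that the filter checks start and end independently, so it can also return items whose start and end come from different pairs.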
public static IEnumerable<Distance> FindByStartAndEnd
(IEnumerable<KeyValuePair<int, int>> startAndEnd,
IQueryable<Distance> distanceList)
{
return
from item in distanceList
where
startAndEnd.Select(s => s.Key).Contains(item.Start)
&& startAndEnd.Select(s => s.Value).Contains(item.End)
select item;
}

datarow values comparision with c#

I have a sql database with a table that contains my grading scales and comment e.g
debut end comment
5 ---- 10 -- x
0 ---- 4 --- y
I have managed to iterate through the rows of my table with a foreach loop.
I want to supply a value, maybe through a text box control, and then the program should find the range in my gradingScale table that the value falls into and output the corresponding comment,
for example
int number = 4;
string comment = "y"; // expected result
Not sure what you're looking for - and you didn't mention what database you're using - so here I'm just guessing that you might be looking for something like this:
DECLARE @Number INT
SET @Number = 4
SELECT comment
FROM dbo.gradingScale
WHERE @Number BETWEEN debut AND [end]
Of course, you could also wrap this inside a stored procedure (if your database supports that):
CREATE PROCEDURE dbo.GetComment (@Number INT)
AS
SELECT comment
FROM dbo.gradingScale
WHERE @Number BETWEEN debut AND [end]
These code samples are for Microsoft SQL Server 2005 and up (T-SQL).
If I understand correctly, your database lists ranges, each associated with a comment. In your example 0 to 4 maps to y, while 5 to 10 maps to x.
In that case, a very simple approach would be, assuming that your ranges do not overlap, to sort your table by descending debut, and then iterate over the rows until you find one whose start is <= your value.
Hard to make out what you want exactly, but here's an example implementation. (You don't say how or in what form your SQL results are returned, so I've provided a DTO/List implementation.)
static void SO6648999()
{
    List<test> sample = new List<test>
    {
        new test { debut = 0,
                   end = 4,
                   comment = "y"},
        new test { debut = 5,
                   end = 10,
                   comment = "x"}
    };
    int number = 4;
    string comment = sample.Single(x => number >= x.debut && number <= x.end).comment;
}
class test
{
    public int debut;
    public int end;
    public string comment;
}
I believe you are referring to DataTable
You can use a Select on the DataTable and filter out the records by providing an expression. It works similar to a where clause in Sql.
dt1.Select("end = 4")// assuming column holding int value
end is the name of the column you are searching, and Select returns an array of the DataRows satisfying the condition.
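To match a value against the whole range rather than only the end column, the filter expression can test both bounds. Here is a minimal sketch, assuming the table has int columns named debut and end and a string column named comment as in the question. GradingLookup, BuildSampleTable and FindComment are hypothetical names, and the sample rows simply mirror the question's table.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class GradingLookup
{
    // Sample data mirroring the table in the question.
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable("gradingScale");
        table.Columns.Add("debut", typeof(int));
        table.Columns.Add("end", typeof(int));
        table.Columns.Add("comment", typeof(string));
        table.Rows.Add(5, 10, "x");
        table.Rows.Add(0, 4, "y");
        return table;
    }

    // Returns the comment of the first row whose range contains number, or null.
    public static string FindComment(DataTable table, int number)
    {
        string n = number.ToString(CultureInfo.InvariantCulture);
        DataRow[] rows = table.Select("[debut] <= " + n + " AND [end] >= " + n);
        return rows.Length > 0 ? (string)rows[0]["comment"] : null;
    }
}
```

Because number is an int formatted by the code itself, building the filter string this way is safe; free-form user text should not be concatenated into a filter expression without validation.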

How to transform one list to another semi-aggregated list via LINQ?

I'm a Linq beginner so just looking for someone to let me know if following is possible to implement with Linq and if so some pointers how it could be achieved.
I want to transform one financial time series list into another where the second series list will be same length or shorter than the first list (usually it will be shorter, i.e., it becomes a new list where the elements themselves represent aggregation of information of one or more elements from the 1st list). How it collapses the list from one to the other depends on the data in the first list. The algorithm needs to track a calculation that gets reset upon new elements added to second list. It may be easier to describe via an example:
List 1 (time ordered from beginning to end series of closing prices and volume):
{P=7,V=1}, {P=10,V=2}, {P=10,V=1}, {P=10,V=3}, {P=11,V=5}, {P=12,V=1}, {P=13,V=2}, {P=17,V=1}, {P=15,V=4}, {P=14,V=10}, {P=14,V=8}, {P=10,V=2}, {P=9,V=3}, {P=8,V=1}
List 2 (series of open/close price ranges and summation of volume for such range period using these 2 param settings to transform list 1 to list 2: param 1: Price Range Step Size = 3, param 2: Price Range Reversal Step Size = 6):
{O=7,C=10,V=1+2+1}, {O=10,C=13,V=3+5+1+2}, {O=13,C=16,V=0}, {O=16,C=10,V=1+4+10+8+2}, {O=10,C=8,V=3+1}
In list 2, I explicitly am showing the summation of the V attributes from list 1 in list 2. But V is just a long so it would just be one number in reality. So how this works is opening time series price is 7. Then we are looking for first price from this initial starting price where delta is 3 away from 7 (via param 1 setting). In list 1, as we move thru the list, the next step is upwards move to 10 and thus we've established an "up trend". So now we build our first element in list 2 with Open=7,Close=10 and sum up the Volume of all bars used in first list to get to this first step in list 2. Now, next element starting point is 10. To build another up step, we need to advance another 3 upwards to create another up step or we could reverse and go downwards 6 (param 2). With data from list 1, we reach 13 first, so that builds our second element in list 2 and sums up all the V attributes used to get to this step. We continue on this process until end of list 1 processing.
Note the gap jump that happens in list 1. We still want to create a step element of {O=13,C=16,V=0}. The V of 0 is simply stating that we have a range move that went thru this step but had Volume of 0 (no actual prices from list 1 occurred here - it was above it but we want to build the set of steps that lead to price that was above it).
Second to last entry in list 2 represents the reversal from up to down.
Final entry in list 2 just uses final Close from list 1 even though it really hasn't finished establishing full range step yet.
Thanks for any pointers of how this could be potentially done via Linq if at all.
My first thought is, why try to use LINQ on this? It seems like a better situation for making a new Enumerable using the yield keyword to partially process and then spit out an answer.
Something along the lines of this:
public struct PricePoint
{
    public ulong price;
    public ulong volume;
}
public struct RangePoint
{
    public ulong open;
    public ulong close;
    public ulong volume;
}
public static class RangeCalculator
{
    // Step sizes taken from the example parameters in the question
    const ulong STEP = 3;
    const ulong REVERSAL_STEP = 6;
    public static IEnumerable<RangePoint> calculateRanges(IEnumerable<PricePoint> pricePoints)
    {
        if (pricePoints.Any())
        {
            ulong open = pricePoints.First().price;
            ulong volume = pricePoints.First().volume;
            foreach(PricePoint pricePoint in pricePoints.Skip(1))
            {
                volume += pricePoint.volume;
                if (pricePoint.price > open)
                {
                    if ((pricePoint.price - open) >= STEP)
                    {
                        // We have established an up-trend.
                        RangePoint rangePoint;
                        rangePoint.open = open;
                        rangePoint.close = pricePoint.price; // was the undeclared variable "close"
                        rangePoint.volume = volume;
                        open = pricePoint.price;
                        volume = 0;
                        yield return rangePoint;
                    }
                }
                else
                {
                    if ((open - pricePoint.price) >= REVERSAL_STEP)
                    {
                        // We have established a reversal.
                        RangePoint rangePoint;
                        rangePoint.open = open;
                        rangePoint.close = pricePoint.price;
                        rangePoint.volume = volume;
                        open = pricePoint.price;
                        volume = 0;
                        yield return rangePoint;
                    }
                }
            }
            // Emit whatever is left as the final, possibly incomplete, step.
            RangePoint lastPoint;
            lastPoint.open = open;
            lastPoint.close = pricePoints.Last().price;
            lastPoint.volume = volume;
            yield return lastPoint;
        }
    }
}
This isn't yet complete. For instance, it doesn't handle gapping, and there is an unhandled edge case where the last data point might be consumed, but it will still process a "lastPoint". But it should be enough to get started.

Ranges with Linq and dictionaries

I've created a Range type:
public class Range<T> where T : IComparable<T>
{
    public Range(T min, T max) : this(min, max, false) { }
    public Range(T min, T max, bool upperbound)
    {
        Min = min;
        Max = max;
        Upperbound = upperbound;
    }
    public bool Upperbound { get; private set; }
    public T Min { get; private set; }
    public T Max { get; private set; }
    public bool Between(T Value)
    {
        return Upperbound ? (Min.CompareTo(Value) < 0) && (Value.CompareTo(Max) <= 0) : (Min.CompareTo(Value) <= 0) && (Value.CompareTo(Max) < 0);
    }
}
I want to use this as a key in a dictionary, to allow me to do a search based upon a range. And yes, ranges can overlap or there might be gaps, which is part of the design. It looks simple but I want the search to be even a bit easier! I want to compare a range with its value of type T, so I can use: myRange == 10 instead of myRange.Between(10).
How? :-)
(Not wanting to break my head over this. I'll probably find the answer but maybe I'm re-inventing the wheel or whatever.)
The things I want to do with this dictionary? Well, in general I will use the range itself as a key. A range will be used for more than just a dictionary. I'm dealing with lots of data that have a min/max range and I need to group these together based on the same min/max values. The value in the dictionary is a list of these products that all have the same range. And by using a range, I can quickly find the proper list where I need to add a product. (Or create a new entry if no list is found.)
Once I have a list of products grouped by ranges, I can start searching for values that fit within specific ranges. Basically, this could be a Linq query on the dictionary of all values where the provided value is between the Min and Max value.
I am actually dealing with two such lists. In one, the range is upper-bound and the other lower-bound. There could be more of these kinds of lists, where I first need to collect data based on their range and then find specific items within them.
Could I use a List instead? Probably, but then I would not have the distinct grouping of my data based on the range itself. A List of Lists then? Possible, but then I'm considering the Dictionary again. :-)
Range examples: I have multiple items where the range is 0 to 100. Other items where range is 0 to 1, 1 to 2, 2 to 3, etc. More items where range is 0 to 4, 4 to 6, 6 to 8, etc. I even have items with ranges from 0 to 0.5, 0.5 to 1, 1 to 1.5, etc. So, first I will group all items based on their ranges, so all items with range 1 to 2 would be together in one list, while all items with range 0 to 100 would be in a different list. I've calculated that I'll be dealing with about 50 different ranges which can overlap each other. However, I have over 25000 items which need to be grouped like this.
Next, I get a value from another source. For example the value 1.12 which I need to find. So this time I use Linq to search through the dictionary to find all lists of items where 1.12 would be in the range of the keys. Thus I'd find the range 1 to 2, 1 to 1.5 and even 0 to 100. Behind these ranges there would be lists of items which I need to process for this value. And then I can move onwards to the next value, for about 4000 different values. And preferably everything should finish within 5 seconds.
Using a key in a dictionary is a matter of overriding GetHashCode and Equals. Basically you'd create a hash based on the minimum and maximum values and Upperbound. Typically you call GetHashCode on each component and combine them, e.g.:
public override int GetHashCode()
{
    unchecked // let the arithmetic wrap instead of throwing in a checked context
    {
        int result = 17;
        result = result * 31 + Min.GetHashCode();
        result = result * 31 + Max.GetHashCode();
        result = result * 31 + (Upperbound ? 1 : 0);
        return result;
    }
}
You'd also need the equality test.
I'm not sure what you mean by "to allow me to do a search based upon a range" though. Could you give some sample code showing how you'd like to use this ability? I'm not entirely sure it'll fit within the normal dictionary approach...
I suggest you don't overload the == operator to allow you to do a containment test with it though. A range isn't equal to a value in the range, so code using that wouldn't be very intuitive.
(I'd personally rename Between to Contains as well.)
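Putting these suggestions together, here is a minimal sketch of how the range could serve as a dictionary key for grouping while the containment search stays a plain scan over the (roughly 50) distinct keys. It assumes value equality on Min, Max and Upperbound, uses Contains in place of Between as suggested, and introduces RangeIndex, Add and Find as hypothetical names that are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Range<T> : IEquatable<Range<T>> where T : IComparable<T>
{
    public Range(T min, T max, bool upperbound = false)
    {
        Min = min;
        Max = max;
        Upperbound = upperbound;
    }

    public T Min { get; }
    public T Max { get; }
    public bool Upperbound { get; }

    // Upperbound: (Min, Max]; otherwise: [Min, Max)
    public bool Contains(T value) =>
        Upperbound
            ? Min.CompareTo(value) < 0 && value.CompareTo(Max) <= 0
            : Min.CompareTo(value) <= 0 && value.CompareTo(Max) < 0;

    public bool Equals(Range<T> other) =>
        !(other is null)
        && EqualityComparer<T>.Default.Equals(Min, other.Min)
        && EqualityComparer<T>.Default.Equals(Max, other.Max)
        && Upperbound == other.Upperbound;

    public override bool Equals(object obj) => Equals(obj as Range<T>);

    public override int GetHashCode()
    {
        unchecked
        {
            int result = 17;
            result = result * 31 + EqualityComparer<T>.Default.GetHashCode(Min);
            result = result * 31 + EqualityComparer<T>.Default.GetHashCode(Max);
            result = result * 31 + (Upperbound ? 1 : 0);
            return result;
        }
    }
}

// Hypothetical container: groups items by range, then finds items by value.
public sealed class RangeIndex<T, TItem> where T : IComparable<T>
{
    private readonly Dictionary<Range<T>, List<TItem>> groups =
        new Dictionary<Range<T>, List<TItem>>();

    public int GroupCount => groups.Count;

    // Grouping relies on Equals/GetHashCode: equal ranges share one list.
    public void Add(Range<T> range, TItem item)
    {
        if (!groups.TryGetValue(range, out var list))
        {
            list = new List<TItem>();
            groups.Add(range, list);
        }
        list.Add(item);
    }

    // Containment cannot use hashing, so this scans the distinct range keys.
    public IEnumerable<TItem> Find(T value) =>
        groups.Where(g => g.Key.Contains(value)).SelectMany(g => g.Value);
}
```

With only a few dozen distinct ranges, scanning the keys for each of the 4000 values is a small amount of work; the dictionary mainly pays off during the grouping step.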
