LINQ: Get items which intersect - C#

I am new to this and I am having a little trouble with the following:
I have a list of timeitems:
06:40 - 07:10
06:55 - 07:13
07:00 - 08:35
07:13 - 07:14
09:00 - 10:00
10:00 - 11:00
12:00 - 13:00
12:30 - 14:00
Now I want all items which intersect:
06:40 - 07:10
06:55 - 07:13
07:00 - 08:35
07:13 - 07:14
12:00 - 13:00
12:30 - 14:00
var intersects = timeitems
.Where(a => timeitems
.Any(b => Utilities.IsBetween(a.SpanRangeStartIndex, b.SpanRangeStartIndex, b.SpanRangeEndIndex)))
.AsParallel()
.ToList();
But I only get this and I don't know why:
06:55 - 07:13
07:00 - 08:35
07:13 - 07:14
12:30 - 14:00
Thanks for your help. (Remember, I am new to .NET :-)
Edit:
OK, timeitems is just a list of items, each with two properties:
Item1(SpanRangeStartIndex=06:40, SpanRangeEndIndex=07:10)
Item2(SpanRangeStartIndex=06:55, SpanRangeEndIndex=07:13)
...
Utilities.IsBetween checks whether a value is between two other values (e.g. IsBetween(3, 2, 6) -> true):
public static bool IsBetween(int value, int start, int end)
{
return (value > start) & (value < end);
}
Sorry for my bad English and bad C# skills... I am very new to this.
thanks

Welcome to SO!
I believe the problem that you're trying to solve is that you want to know which ranges in your set of ranges overlap any of the other ranges in the same set.
The problem seems to be that you test one end of the range for "between" but not the other.
(I wrote a sample program that does what yours does. I added some comments, shortened the property names by dropping 'SpanRange' and 'Index', and removed the .AsParallel() call, which might change the order of the data returned but not its overall content.)
var intersects =
data.Where(a => data
.Any(b =>
IsBetween(a.Start, b.Start, b.End) // <-- this is the test you did
|| IsBetween(a.End, b.Start, b.End) // <-- the missing other end
// || IsBetween(b.Start, a.Start, a.End) // potentially necessary
// || IsBetween(b.End, a.Start, a.End) // potentially necessary
));
I added the two commented-out IsBetween calls because a range that is completely contained within another might otherwise fail to show up.
On a different note, I might try to change your thinking a little bit on how to test when ranges intersect by first thinking of the simpler case of how two ranges would NOT intersect.
Two ranges do not intersect when either:
rangeA.End < rangeB.Start which says: rangeA is entirely 'to the left of' rangeB
rangeA.Start > rangeB.End which says: rangeA is entirely 'to the right of' rangeB
doNotIntersect = (rangeA.End < rangeB.Start) || (rangeA.Start > rangeB.End)
Thus we can test whether ranges intersect by negating the above expression:
isIntersecting = (rangeA.End >= rangeB.Start) && (rangeA.Start <= rangeB.End)
However, I noted that your between test uses '>' and '<' rather than '>=' and '<=', so a range that merely touches another (one's end equal to the other's start) does not count as intersecting. That is why the 09:00 - 10:00 range in the sample does not overlap the 10:00 - 11:00 range. If that is the behavior you want, stick with the strict > and < operators rather than >= and <=.
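To make the negated test concrete, here is a minimal runnable sketch (my illustration, not the poster's program; the TimeRange record and integer minute values are assumptions). Note it uses the inclusive >=/<= form, so ranges that merely touch do count as intersecting; switch to > and < for the strict behavior discussed above.
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the poster's time items; values are minutes since midnight.
record TimeRange(int Start, int End);

class IntersectDemo
{
    // Negation of "a is entirely left of b OR entirely right of b".
    static bool Intersects(TimeRange a, TimeRange b) =>
        a.End >= b.Start && a.Start <= b.End;

    static void Main()
    {
        var data = new List<TimeRange>
        {
            new(400, 430), new(415, 433), new(540, 600), new(600, 660)
        };
        // Items that intersect at least one *other* item in the same set.
        var intersecting = data
            .Where(a => data.Any(b => !ReferenceEquals(a, b) && Intersects(a, b)))
            .ToList();
        foreach (var r in intersecting)
            Console.WriteLine($"{r.Start} - {r.End}");
    }
}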
I'd be happy to post my full sample program and its results if you need it.

You're seeing this problem because you are only getting "items such that this item starts during another item", and not including "items such that another item starts during this item".
A simple fix would be
var intersects = timeitems
.Where(a => timeitems.Any(b =>
Utilities.IsBetween(a.SpanRangeStartIndex,
b.SpanRangeStartIndex, b.SpanRangeEndIndex) ||
Utilities.IsBetween(b.SpanRangeStartIndex,
a.SpanRangeStartIndex, a.SpanRangeEndIndex)))
.AsParallel()
.ToList();
which makes your code symmetrical and would include the missing 06:40 - 07:10 and 12:00 - 13:00.
However, this (as with your original) is very inefficient: O(n^2), when an O(n log n) sort-and-sweep algorithm is possible.
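For reference, here is a hedged sketch of such a sort-and-sweep approach (my code, assuming the poster's TimeItem class with SpanRangeStartIndex/SpanRangeEndIndex properties and using System.Linq; strict overlap, so touching endpoints do not count):
// Sort by start, then sweep once, remembering the earlier interval with the
// largest end seen so far. Any item starting before that end overlaps it.
static List<TimeItem> FindOverlapping(IEnumerable<TimeItem> timeitems)
{
    var sorted = timeitems.OrderBy(t => t.SpanRangeStartIndex).ToList();
    var overlapping = new HashSet<TimeItem>();
    TimeItem rightmost = null; // earlier item with the largest end so far
    foreach (var t in sorted)
    {
        if (rightmost != null && t.SpanRangeStartIndex < rightmost.SpanRangeEndIndex)
        {
            overlapping.Add(t);         // t overlaps rightmost...
            overlapping.Add(rightmost); // ...and rightmost overlaps t
        }
        if (rightmost == null || t.SpanRangeEndIndex > rightmost.SpanRangeEndIndex)
            rightmost = t;
    }
    return overlapping.ToList();
}
The sort dominates the cost; the sweep itself is a single O(n) pass.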

Think of when you are dealing with the time from 12:30 to 14:00.
The preceding element (from 12:00 to 13:00) intersects with that window, but your query misses it because you are only checking whether the start time is in the range, when you also have to check whether the end time is in the range.
That said, you can change your query to this (removed the AsParallel and ToList methods as they aren't integral to the solution):
var intersects = timeitems
.Where(a => timeitems
.Any(b =>
// Check the start of the window...
Utilities.IsBetween(a.SpanRangeStartIndex,
b.SpanRangeStartIndex, b.SpanRangeEndIndex) ||
// ...*OR* the end of the window...
Utilities.IsBetween(a.SpanRangeEndIndex,
b.SpanRangeStartIndex, b.SpanRangeEndIndex)));
Right now, you're iterating through the entire timeItems sequence for every item, even items that you know have already been matched and intersect. (Since you're not pairing them up, you don't need to say that item a overlaps with item b; you simply have to return that it overlaps.)
With this in hand, you can avoid iterating through N^2 items by not using LINQ, but only if your collections are materialized and implement the IList&lt;T&gt; interface, which arrays and List&lt;T&gt; instances do.
You would look ahead, keeping track of what overlaps and was yielded, like so:
public static IEnumerable<TimeItem> GetOverlappingItems(this IList<TimeItem> source)
{
// Validate parameters.
if (source == null) throw new ArgumentNullException("source");
// The indexes to ignore that have been yielded.
var yielded = new HashSet<int>();
// Iterate using indexer.
for (int index = 0; index < source.Count; ++index)
{
// If the index is in the hash set then skip.
if (yielded.Contains(index)) continue;
// Did the look ahead yield anything?
bool lookAheadYielded = false;
// The item.
TimeItem item = source[index];
// Cycle through the rest of the indexes which are
// not in the hashset.
for (int lookAhead = index + 1; lookAhead < source.Count; ++lookAhead)
{
// If the item has been yielded, skip.
if (yielded.Contains(lookAhead)) continue;
// Get the other time item.
TimeItem other = source[lookAhead];
// Compare the two. See if the start or the end
// is between the look ahead.
if (Utilities.IsBetween(item.SpanRangeStartIndex,
other.SpanRangeStartIndex, other.SpanRangeEndIndex) ||
Utilities.IsBetween(item.SpanRangeEndIndex,
other.SpanRangeStartIndex, other.SpanRangeEndIndex))
{
// This is going to be yielded.
lookAheadYielded = true;
// Yield the item.
yield return other;
// Add the index to the hashset of what was yielded.
yielded.Add(lookAhead);
}
}
// Was a look ahead yielded?
// No need to store the index, we're only moving
// forward and this index doesn't matter anymore.
if (lookAheadYielded) yield return item;
}
}
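A hypothetical usage sketch (assuming the method above lives in a static class, as extension methods must, and that timeitems is a List<TimeItem>):
List<TimeItem> overlaps = timeitems.GetOverlappingItems().ToList();
foreach (TimeItem overlap in overlaps)
    Console.WriteLine($"{overlap.SpanRangeStartIndex} - {overlap.SpanRangeEndIndex}");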

LINQ might not be a good idea here, as you're doing a LOT of double counting. If you can assume they're all sorted by the starting index (you can order them with LINQ first if you can't make that guarantee), then it's a whole lot easier to keep a rolling window as you iterate over them:
timeitem workingRange = null, rangeStart = null;
bool matched = false;
foreach(timeitem t in timeitems) // timeitems.OrderBy(ti => ti.SpanRangeStartIndex) if unsorted
{
if(workingRange is null)
{
rangeStart = t;
workingRange = new timeitem { SpanRangeStartIndex = t.SpanRangeStartIndex, SpanRangeEndIndex = t.SpanRangeEndIndex };
continue;
}
if(Utilities.IsBetween(t.SpanRangeStartIndex,
workingRange.SpanRangeStartIndex, workingRange.SpanRangeEndIndex))
{
if(!matched)
{
matched = true;
yield return rangeStart;
}
workingRange.SpanRangeEndIndex = Math.Max(workingRange.SpanRangeEndIndex, t.SpanRangeEndIndex);
yield return t;
}
else
{
matched = false;
rangeStart = t;
workingRange = new timeitem { SpanRangeStartIndex = t.SpanRangeStartIndex, SpanRangeEndIndex = t.SpanRangeEndIndex };
}
}
A few notes. I keep a reference to the original first item of the range, since I don't know whether timeitem is a struct or a class, and it's better to yield the original items unless you're performing some sort of transformation. The working range can easily be modified to use DateTime (which might be easier to read/understand). We need to keep track of whether we've matched yet, because we still need to yield the original working item exactly once (we can't use the range itself as a marker, as subsequent timeitems could be entirely within the initial range). Finally, if the item we're checking is not within the range, we reset all our state variables and treat it as our new beginning range.
This ensures you only ever have to traverse the collection once, at the expense of sorting it beforehand (and if you can ensure the items arrive sorted in the first place, you eliminate that cost anyway). Hope that helps; I wish there were an easier way.

Related

Need help sorting an array in a complicated way (C# LINQ)

I have a struct similar to this:
struct Chunk
{
public float d; // distance from player
public bool active; // is it active
}
I have an array of this struct.
What I need:
I need to sort it so that the first element is an inactive chunk that is furthest from the player, the next element is an active chunk that is closest to the player, and so on in that pattern:
Inactive, furthest
Active, closest
Inactive, furthest
Active, closest
Inactive, furthest
And so on...
Currently I'm using LINQ, and I'm doing this:
chunk = chunk.AsParallel().OrderBy(x => x.active ? x.d : -x.d).ToArray();
But I don't know how to make it alternate one after another.
It looks like you want to split this into two lists, sort them, then interleave them.
var inactive = chunk.Where(x => !x.active).OrderByDescending(x => x.d);
var active = chunk.Where(x => x.active).OrderBy(x => x.d);
var interleaved = inactive.Zip(active, (a, b) => new [] { a, b }).SelectMany(x => x);
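One caveat worth adding (mine, not the answer's): Enumerable.Zip stops at the end of the shorter sequence, so if the active and inactive counts differ, the surplus chunks are silently dropped. If you need every chunk in the output, a small helper along these lines would keep the longer tail:
// Interleave two sequences, yielding leftovers from the longer one.
static IEnumerable<T> InterleaveAll<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    using var e1 = first.GetEnumerator();
    using var e2 = second.GetEnumerator();
    bool has1 = e1.MoveNext(), has2 = e2.MoveNext();
    while (has1 || has2)
    {
        if (has1) { yield return e1.Current; has1 = e1.MoveNext(); }
        if (has2) { yield return e2.Current; has2 = e2.MoveNext(); }
    }
}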
But I don't know how to make it alternate one after another.
If you sort the array so that all the inactives are at the start (by descending distance) and all the actives are at the end (also by descending distance)...
OrderBy(x => x.active ? 1 : 0).ThenBy(x => -x.d)
then you can take an item from the start, then an item from the end, then from start + 1, then end - 1, working your way inwards:
public static IEnumerable<Chunk> Interleave(this Chunk[] chunks){
for(int s = 0, e = chunks.Length - 1; s<e;){
if(!chunks[s].active)
yield return chunks[s++];
if(chunks[e].active)
yield return chunks[e--];
}
}
There's a bit going on in this, so let's unpack it. It's an extension method that acts on an array of Chunk, and it's a custom enumerator method, so you'd call foreach on it to use it:
foreach(var c in chunk.Interleave())
It contains a for loop that tracks two variables, one for the start index and one for the end. The start increments and the end decrements. At some point they'll meet and s will no longer be less than e, which is when we stop:
for(int s = 0, e = chunks.Length - 1; s<e;){
We need to look at the chunk before we return it: if it's an inactive one near the start, yield return it and bump the start index by one. s++ increments s but resolves to the value s had before it incremented, so it's conceptually like doing chunks[s]; s += 1; in a one-liner.
if(!chunks[s].active)
yield return chunks[s++];
Then we look at the chunk near the end: if it's active, return it and bump the end index down.
The inactive chunks are tracked by s, and if s reaches an active chunk it stops returning (it is skipped on every pass of the loop), which means e will work its way down towards s, returning only the actives.
Similarly, if there are more inactives than actives, e will stop decrementing first and s will work its way up towards e.
If you've never come across yield return before, think of it as a way to resume from where you left off rather than starting the method over again. It's used with enumerations to let the enumeration return an item, then be moved on and return the next item. It works a bit like saving your game and going off to do something else, then coming back, reloading your save and carrying on from where you left off. Asking an enumerator for Next makes it load the game, play a bit, then save and stop. Then you call Next again and the latest save is loaded; it plays some more, saves and stops. This way you gradually get through the game a bit at a time. Starting a new enumeration by calling Interleave again is like starting the game over from the beginning.
The MSDN documentation goes into more detail on yield return if you want to dig in further.
Edit:
You can perform an in-place sort of your Chunk[] by having a custom comparer:
public class InactiveThenDistancedDescending : IComparer<Chunk>
{
    public int Compare(Chunk a, Chunk b)
    {
        if (a.active == b.active)
            return -a.d.CompareTo(b.d); // same status: larger distance first
        else
            return a.active.CompareTo(b.active); // inactive (false) sorts first
    }
}
And:
Array.Sort(chunkArray, _someInstanceOfThatComparerAbove);
Not sure if you can do it with only one line of code.
I wrote a method that only requires the array to be sorted once. It then appends either the next furthest or the next closest chunk based on the current index of the for loop (even = furthest inactive, odd = closest active). I remove each taken item from the sorted list to ensure it will not be re-added to the results. Finally, I return the results as an array.
public Chunk[] SortArray(List<Chunk> list_to_sort)
{
    // Setup variables
    var results = new List<Chunk>();
    int count = list_to_sort.Count;
    // Sort once: inactives first, each group by descending distance, so the
    // furthest inactive sits at the front and the closest active at the back.
    list_to_sort = list_to_sort.OrderBy(x => x.active).ThenByDescending(x => x.d).ToList();
    // Loop through the list
    for (int i = 0; i < count; i++)
    {
        // Even index: take the furthest inactive (front);
        // odd index: take the closest active (back).
        int take = i % 2 == 0 ? 0 : list_to_sort.Count - 1;
        results.Add(list_to_sort[take]);
        list_to_sort.RemoveAt(take);
    }
    // Return results
    return results.ToArray();
}
There is probably a better way of doing this but hopefully it helps. Please note that I did not test this method.

C# - LINQ: optimize code with List and Where clause

I have a following code:
var tempResults = new Dictionary<Record, List<Record>>();
errors = new List<Record>();
foreach (Record record in diag)
{
var code = Convert.ToInt16(Regex.Split(record.Line, @"\s{1,}")[4], 16);
var cond = codes.Where(x => x.Value == code && x.Active).FirstOrDefault();
if (cond == null)
{
errors.Add(record);
continue;
}
var min = record.Datetime.AddSeconds(downDiff);
var max = record.Datetime.AddSeconds(upDiff);
//PROBLEM PART - It takes around 4,5ms
var possibleResults = cas.Where(x => x.Datetime >= min && x.Datetime <= max).ToList();
if (possibleResults.Count == 0)
errors.Add(record);
else
{
if (!CompareCond(record, possibleResults, cond, ref tempResults, false))
{
errors.Add(record);
}
}
}
The variable diag is a List of Record.
The variable cas is a List of Record with around 50k items.
The problem is that it's too slow. The highlighted Where clause takes around 4.6599 ms per record, so for 3000 records in diag that makes 3000 * 4.6599 ms ≈ 14 seconds. Is there any option to optimize the code?
You can speed up that specific statement you emphasized
cas.Where(x => x.Datetime >= min && x.Datetime <= max).ToList();
With binary search over cas list. First pre-sort cas by Datetime:
cas.Sort((a,b) => a.Datetime.CompareTo(b.Datetime));
Then create comparer for Record which will compare only Datetime properties (implementation assumes there are no null records in the list):
private class RecordDateComparer : IComparer<Record> {
public int Compare(Record x, Record y) {
return x.Datetime.CompareTo(y.Datetime);
}
}
Then you can translate your Where clause like this:
var index = cas.BinarySearch(new Record { Datetime = min }, new RecordDateComparer());
if (index < 0)
index = ~index;
var possibleResults = new List<Record>();
// go backwards, for duplicates
for (int i = index - 1; i >= 0; i--) {
var res = cas[i];
if (res.Datetime <= max && res.Datetime >= min)
possibleResults.Add(res);
else break;
}
// go forward until item bigger than max is found
for (int i = index; i < cas.Count; i++) {
var res = cas[i];
if (res.Datetime <= max && res.Datetime >= min)
possibleResults.Add(res);
else break;
}
The idea is to find the first record with a Datetime greater than or equal to your min, using BinarySearch. If an exact match is found, it returns the index of the matched element. If not, it returns a negative value, which can be translated into the index of the first element greater than the target with the ~index operation.
Once we have found that element, we can just walk forward through the list and grab items until we find one with a Datetime greater than max (because the list is sorted). We also need to go a little backwards, because if there are duplicates, binary search will not necessarily return the first one.
Additional improvements might include:
Putting the active codes in a Dictionary (keyed by Value) outside of the foreach loop, thus replacing the codes Where search with a dictionary lookup (see the sketch after this list).
As suggested in the comments by @Digitalsa1nt: parallelize the foreach loop, using Parallel.For, PLINQ, or any similar technique. It's a perfect case for parallelization, because the loop contains only CPU-bound work. You need to make a few small adjustments to make it thread-safe, of course, such as using a thread-safe collection for errors (or locking around adding to it).
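For the first point, a rough sketch (assuming, as in the question, that codes holds objects with Value and Active properties):
// Build the lookup once, outside the loop, keyed by Value for active codes only.
// First() keeps the first of any duplicate Values, matching the original
// FirstOrDefault semantics.
var activeCodes = codes
    .Where(x => x.Active)
    .GroupBy(x => x.Value)
    .ToDictionary(g => g.Key, g => g.First());

// Inside the loop, the Where + FirstOrDefault becomes an O(1) lookup:
if (!activeCodes.TryGetValue(code, out var cond))
{
    errors.Add(record);
    continue;
}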
Try adding AsNoTracking to the query.
The AsNoTracking method can save both execution time and memory usage, which really becomes important when we retrieve a large amount of data from the database. (Note that AsNoTracking is an Entity Framework method, so it only applies when cas is an EF query rather than an in-memory list.)
var possibleResults = cas.Where(x => x.Datetime >= min && x.Datetime <= max).AsNoTracking().ToList(); // around 4.6599 ms
There are a few improvements you can make here.
It might only be a minor performance increase, but you could try using GroupBy instead of Where in this circumstance.
So instead you would have something like this:
cas.GroupBy(x => x.Datetime >= min && x.Datetime <= max).Where(g => g.Key).SelectMany(g => g);
This usually works for searching through lists for distinct values, but in your case I'm unsure whether it will provide any benefit with a range predicate like this.
Also, a few other things you can do throughout your code:
Avoid using ToList when possible and stick to IEnumerable. ToList performs an eager evaluation, which is probably causing a lot of the slowdown in your query.
Use .Any() instead of Count() when checking whether values exist (this only applies if the source is an IEnumerable rather than a materialized collection).

Getting all combinations of K or fewer elements from a List of N elements with big K

I want to have all combination of elements in a list for a result like this:
List: {1,2,3}
1
2
3
1,2
1,3
2,3
My problem is that I have 180 elements, and I want all combinations of up to 5 elements. In my tests with 4 elements it took a long time (2 minutes), but all went well. With 5 elements, however, I get an out-of-memory exception.
My code presently is this:
public IEnumerable<IEnumerable<Rondin>> getPossibilites(List<Rondin> rondins)
{
var combin5 = rondins.Combinations(5);
var combin4 = rondins.Combinations(4);
var combin3 = rondins.Combinations(3);
var combin2 = rondins.Combinations(2);
var combin1 = rondins.Combinations(1);
return combin5.Concat(combin4).Concat(combin3).Concat(combin2).Concat(combin1).ToList();
}
With the function (taken from this question: Algorithm to return all combinations of k elements from n):
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
return k == 0 ? new[] { new T[0] } :
elements.SelectMany((e, i) =>
elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] { e }).Concat(c)));
}
I need to search the list for a combination whose elements add up to a value (within a certain precision), and do this for each element of another list. Here is all my code for this part:
var possibilites = getPossibilites(opt.rondins);
possibilites = possibilites.Where(p => p.Sum(r => r.longueur + traitScie) < 144);
foreach(BilleOptimisee b in opt.billesOptimisees)
{
var proches = possibilites.Where(p => p.Sum(r => (r.longueur + traitScie)) < b.chute && Math.Abs(b.chute - p.Sum(r => r.longueur)) - (p.Count() * 0.22) < 0.01).OrderByDescending(p => p.Sum(r => r.longueur)).ElementAt(0);
if(proches != null)
{
foreach (Rondin r in proches)
{
opt.rondins.Remove(r);
b.rondins.Add(r);
possibilites = possibilites.Where(p => !p.Contains(r));
}
}
}
With the code I have, how can I limit the memory taken by my list? Or is there a better solution to search a very big set of combinations?
Please, if my question is not good, tell me why and I will do my best to learn and ask better questions next time ;)
Your output list for combinations of 5 elements will have ~1.5*10^9 (that's billion with a b) sublists of size 5. If you use 32-bit integers, even neglecting list overhead and assuming a perfect list with zero bytes of overhead, that is roughly 1.5*10^9 * 5 * 4 bytes ≈ 30GB!
You should reconsider whether you actually need to generate the list the way you do; an alternative is streaming the combinations, i.e. generating them on the fly.
That can be done by creating a function which takes the last combination as an argument and outputs the next one. (To see how it's done, think about incrementing a number by one: you go from the last digit to the first, carrying over until you are done.)
A streaming example for choosing 2 out of 4:
start: {4,3}
curr = start {4, 3}
curr = next(curr) {4, 2} // reduce last by one
curr = next(curr) {4, 1} // reduce last by one
curr = next(curr) {3, 2} // cannot reduce more, reduce the first by one, and set the follower to maximal possible value
curr = next(curr) {3, 1} // reduce last by one
curr = next(curr) {2, 1} // similar to {3,2}
done.
Now, you need to figure out how to do it for lists of size 2, then generalize it to arbitrary size, and program your streaming combination generator.
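For illustration, here is a minimal sketch of such a generator (my code, not the answer's, using the increasing-order variant of the scheme rather than the decreasing one shown above; requires using System.Linq). It yields index combinations one at a time, so the full set is never materialized:
// Yields each k-combination of the indexes 0..n-1 in lexicographic order.
public static IEnumerable<int[]> StreamCombinations(int n, int k)
{
    var curr = Enumerable.Range(0, k).ToArray(); // start: {0, 1, ..., k-1}
    while (true)
    {
        yield return (int[])curr.Clone();
        // Find the rightmost position that can still be increased.
        int i = k - 1;
        while (i >= 0 && curr[i] == n - k + i) i--;
        if (i < 0) yield break; // last combination reached
        curr[i]++;
        // Reset everything to its right to the smallest valid values (the "carry").
        for (int j = i + 1; j < k; j++) curr[j] = curr[j - 1] + 1;
    }
}
Each yielded array can be mapped back to elements (e.g. rondins[index]) and the running sum checked on the fly, so only combinations that pass the test are ever kept in memory.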
Good Luck!
Let your precision be defined in the imaginary spectrum.
Use a real index to access the leaf and then traverse the leaf with the required precision.
See PrecisLise at http://net7mma.codeplex.com/SourceControl/latest#Common/Collections/Generic/PrecicseList.cs
While the implementation is not 100% complete as linked you can find where I used a similar concept here:
http://net7mma.codeplex.com/SourceControl/latest#RtspServer/MediaTypes/RFC6184Media.cs
Using this concept I was able to re-order h.264 Access Units and their underlying Network Access Layer Components in what I consider a very interesting way... besides being interesting, it also has the potential to be more efficient while using close to the same amount of memory.
For example, 0 can be followed by 0.1 or 0.01 or 0.001; depending on the type of the key in the list (double, float, Vector, inter alia) you may have the added benefit of using the FPU or possibly even intrinsics if supported by your processor, making sorting and indexing much faster than would be possible on normal sets regardless of the underlying storage mechanism.
Using this concept allows for very interesting ordering... especially if you provide a mechanism to filter the precision.
I was also able to find several bugs in the bit-stream parser of quite a few well known media libraries using this methodology...
I found my solution. I'm writing it here so that other people who have a similar problem to mine have something to work with...
I made a recursive function that checks for a fixed number of possibilities that fit the conditions. When that number of possibilities is found, I return the list, do some calculations with the results, and restart the process. I added a timer to stop the search when it takes too long. Since my condition is based on the sum of the elements, I build possibilities only from distinct values and search for a small number of possibilities each time (like 1).
So the function returns a possibility with a very high precision; I do what I need to do with it, remove its elements from the original list, and call the function again with the same precision until nothing is returned, at which point I can continue with another precision. Once several precisions are done, there are only about 30 elements left in my list, so I can ask for all the possibilities (that still fit the maximum sum), and this part is much easier than the beginning.
There is my code:
public List<IEnumerable<Rondin>> getPossibilites(IEnumerable<Rondin> rondins, int nbElements, double minimum, double maximum, int instance = 0, double longueur = 0)
{
if(instance == 0)
timer = DateTime.Now;
List<IEnumerable<Rondin>> liste = new List<IEnumerable<Rondin>>();
//Get all distinct rondins that can fit into the maximal length
foreach (Rondin r in rondins.Where(r => r.longueur < (maximum - longueur)).DistinctBy(r => r.longueur).OrderBy(r => r.longueur))
{
//Check the current length
double longueur2 = longueur + r.longueur + traitScie;
//If the current length is under the maximal length
if (longueur2 < maximum)
{
//Get all the possibilities with all rondins except the current one, and add them to the list
foreach (IEnumerable<Rondin> poss in getPossibilites(rondins.Where(rondin => rondin.id != r.id), nbElements - liste.Count, minimum, maximum, instance + 1, longueur2).Select(possibilite => possibilite.Concat(new Rondin[] { r })))
{
liste.Add(poss);
if (liste.Count >= nbElements && nbElements > 0)
break;
}
//If the current length is higher than the minimum, add it to the list
if (longueur2 >= minimum)
liste.Add(new Rondin[] { r });
}
//If we have enough possibilities, we stop the research
if (liste.Count >= nbElements && nbElements > 0)
break;
//If the research is taking too long, stop the research and return the list;
if (DateTime.Now.Subtract(timer).TotalSeconds > 30)
break;
}
return liste;
}

Iterating through collection based on index

Let me explain the situation first:
I receive a value from my binary search on a collection and quickly jump to it to do some processing. Next I want to jump to the next item in the list. But this next item is not necessarily the one that immediately follows; it could be 3 or 4 items later. Here is my data to illustrate the situation:
Time ID
0604 ABCDE
0604 EFGH
0604 IJKL
0626 Some Data1
0626 Some Data2
0626 Some Data3
0626 Some Data4
Let's say binary search returns index 0; I jump to index 0 (0604 ABCDE) and process/consume all the 0604 records. Now I am at index 0: how do I jump to index 3 (0626) and consume/process all of those? Keep in mind this will not always be the same; the data can vary, so I can't simply jump to index + 3.
Here's my code:
var matches = recordList.Where(d => d.DateDetails == oldPointer);
var lookup = matches.ToLookup(d => d.DateDetails).First();
tempList = lookup.ToList();// build templist
oldPointer here is the value I get from the binary search. I take it and build a temp list. After this, I want to jump to 0626.
How many records with the same "old pointer" do you typically expect? Is it usually going to be fewer than 100? If so, don't over-complicate it; just iterate:
public static int FindNextPointerIndex(int oldIndex, string oldPointer, ...)
{
for(int i = oldIndex + 1; i < collection.Count ; i++)
{
if(collection[i].DateDetails != oldPointer) return i;
}
return -1;
}
If you want something more elegant, you will have to pre-index the data by DateDetails, presumably using something like ToLookup over the entire collection; but note that this makes changes to the data more complicated.
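A rough sketch of that pre-indexing idea (assuming the question's recordList and that DateDetails is a string):
// Build once; each rebuild is O(n), so re-index whenever the data changes.
ILookup<string, Record> byDate = recordList.ToLookup(d => d.DateDetails);

// Jumping to a pointer is then a single lookup instead of a scan:
var tempList = byDate[oldPointer].ToList();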
Have a look at the skip list: http://en.wikipedia.org/wiki/Skip_list
It will allow you to jump forward by more than 1 in your linked list, but the downside is that finding the start of your search will be O(n).

LINQ Query Determine Input is in List Boundaries?

I have a List of longs from a DB query. The total number in the List is always an even number, but the quantity of items can be in the hundreds.
List item [0] is the lower boundary of a "good range", item [1] is the upper boundary of that range. A numeric range between item [1] and item [2] is considered "a bad range".
Sample:
var seekset = new SortedList();
var skd = 500;
while( skd< 1000000 )
{
seekset.Add(skd, 0);
skd = skd + 100;
}
When an input number is compared to the list items: if it is between 500-600 or 700-800 it is considered "good", but if it is between 600-700 it is considered "bad".
Using the above sample, can anyone comment on the right/fast way to determine that the number 655 is a "bad" number, i.e. not within any good range boundary (C#, .NET 4.5)?
If a SortedList is not the proper container for this (eg it needs to be an array), I have no problem changing, the object is static (lower case "s") once it is populated but can be destroyed/repopulated by other threads at any time.
The following works, assuming the list is already sorted and both of each pair of limits are treated as "good" values:
public static bool IsGood<T>(List<T> list, T value)
{
int index = list.BinarySearch(value);
return index >= 0 || index % 2 == 0;
}
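(Why this works: when the value is not found, BinarySearch returns the bitwise complement of the insertion point, which is negative; since ~i == -i - 1, the index % 2 == 0 test succeeds exactly when the insertion point is odd, i.e. the value falls between a lower bound and its paired upper bound. For example, with boundaries {500, 600, 700, 800}: searching for 655 returns ~2 = -3, and -3 % 2 != 0, so 655 is "bad"; searching for 550 returns ~1 = -2, and -2 % 2 == 0, so 550 is "good".)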
If you only have a few hundred items then it's really not that bad. You can just use a regular List and do a linear search to find the item. If the index of the first larger item is even then it's no good, if it's odd then it's good:
var index = data.Select((n, i) => new { n, i })
.SkipWhile(item => item.n < someValue)
.First().i;
bool isValid = index % 2 == 1;
If you have enough items that a linear search isn't desirable then you can use a BinarySearch to find the next largest item.
var searchValue = data.BinarySearch(someValue);
if (searchValue < 0)
searchValue = ~searchValue;
bool isValid = searchValue % 2 == 1;
I am thinking that LINQ may not be best suited for this problem because IEnumerable forgets about item[0] when it is ready to process item[1].
Yes, this is freshman CS, but the fastest in this case may be just
// untested code; assumes seekset is indexable by position,
// e.g. a long[] or List<long> of boundary values
bool found = false;
for (int i = 0; i < seekset.Count; i += 2)
{
if (valueOfInterest >= seekset[i] &&
valueOfInterest <= seekset[i+1])
{
found = true;
break; // or return;
}
}
I apologize for not directly answering your question about the "best approach in LINQ", but I sense that you are really asking about the best approach for performance.
