Fastest method of collection searching by DateTime - C#

I have a Dictionary containing 10 keys, each with a list containing up to 30,000 values. The values contain a DateTime property.
I frequently need to extract a small subset of one key's list, such as a 30-60 second date range.
Doing this is easy, but doing it fast is not. What would be the most efficient way to query this in-memory data?
Thanks a lot.

Sort each list by date first, then find the required items by binary search and return them. Locating the range is O(log n) because you need to find the first and last index; returning the K items in the range is O(K), so in total it's O(K + log n).
IEnumerable<Item> GetItems(int startIndex, int endIndex, List<Item> input)
{
    for (int i = startIndex; i < endIndex; i++)
        yield return input[i];
}

1) Keep the dictionary, but use a SortedList instead of a List for the dictionary values, sorted by the DateTime property.
2) Implement a binary search to find the indexes of the lower and upper bounds of your range in the sorted list.
3) Select the values in the range with sortedList.Values.Skip(lowerIndex).Take(upperIndex - lowerIndex).
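As a minimal sketch of steps 1-2, assuming the timestamps are kept in a plain ascending-sorted List&lt;DateTime&gt; (the method and variable names here are illustrative, not from the original post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class RangeSearchDemo
{
    // Finds the index of the first element >= target in an ascending-sorted list.
    static int LowerBound(List<DateTime> sorted, DateTime target)
    {
        int i = sorted.BinarySearch(target);
        // BinarySearch returns the bitwise complement of the insertion
        // point when the exact value is absent.
        if (i < 0) i = ~i;
        // Walk back over duplicates so we return the first match.
        while (i > 0 && sorted[i - 1] == target) i--;
        return i;
    }

    static void Main()
    {
        // Ten points, one second apart.
        var times = Enumerable.Range(0, 10)
            .Select(s => new DateTime(2024, 1, 1).AddSeconds(s))
            .ToList();

        DateTime from = new DateTime(2024, 1, 1).AddSeconds(3);
        DateTime to   = new DateTime(2024, 1, 1).AddSeconds(6);

        int lo = LowerBound(times, from);
        int hi = LowerBound(times, to); // exclusive upper bound

        var range = times.GetRange(lo, hi - lo);
        Console.WriteLine(range.Count); // 3 items: seconds 3, 4 and 5
    }
}
```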

In reply to Aliostad: I don't think binary search will work if the underlying collection is a linked list; it would still take O(n).

The fastest way is to organize the data so it is indexed by the thing you want to search on. Currently it is indexed by key, but you want to search by date, so you would be best off indexing it by date.
I would keep two dictionaries: one indexed as you do now, and one where the items are indexed by date. I would decide on a time frame (say one minute), add each object to a list based on the minute it happens in, and then add each list to the dictionary under the key of that minute. When you want the data for a particular time frame, generate the relevant minute key(s) and get the list(s) from the dictionary. This relies on you being able to derive the key for the other dictionary from the objects, though.
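A rough sketch of the minute-bucket idea, with string standing in for the poster's value objects (the class and method names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

class MinuteIndex
{
    // Buckets items by the minute they occurred in; the key is the
    // timestamp truncated to whole minutes.
    readonly Dictionary<DateTime, List<string>> buckets =
        new Dictionary<DateTime, List<string>>();

    static DateTime MinuteOf(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0);

    public void Add(DateTime when, string item)
    {
        DateTime key = MinuteOf(when);
        if (!buckets.TryGetValue(key, out var list))
            buckets[key] = list = new List<string>();
        list.Add(item);
    }

    public IEnumerable<string> GetRange(DateTime from, DateTime to)
    {
        // Generate each minute key in the range and look it up directly.
        for (DateTime m = MinuteOf(from); m <= MinuteOf(to); m = m.AddMinutes(1))
            if (buckets.TryGetValue(m, out var list))
                foreach (var item in list)
                    yield return item;
    }

    static void Main()
    {
        var idx = new MinuteIndex();
        var t0 = new DateTime(2024, 1, 1, 12, 0, 0);
        idx.Add(t0.AddSeconds(10), "a");
        idx.Add(t0.AddSeconds(70), "b");
        idx.Add(t0.AddMinutes(5), "c");

        // Query a 90-second window: hits the first two minute buckets only.
        foreach (var s in idx.GetRange(t0, t0.AddSeconds(90)))
            Console.WriteLine(s);
    }
}
```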

Related

Insert into ObservableCollection per int comparison

Say I have an ObservableCollection with two items:
0: dateUnix: 333
1: dateUnix: 222
Now I want to add a new Item:
dateUnix: 300
If I just were to use the .Add() method, the item would get added at the end. But I want the item to be inserted between 333 and 222, since that keeps the list sorted.
How do I insert an item at a position where it is less than the item before it and greater than the item after it?
Off the top of my head, I can think of two ways of doing this.
One would be, as was pointed out in the comments, to just insert and sort afterwards.
Another, more complex and more rewarding, way would be to find the index of the first item greater or lesser than the one you're inserting and insert it at that index. Your list seems to be sorted in descending order, so you'd need the first item less than the new one.
You could achieve this using LINQ:
ObservableCollection<int> collection = new ObservableCollection<int>(new List<int> { 333, 222 }); // == [333, 222]
int toInsert = 300;
collection.Insert(collection.IndexOf(collection.First(elem => elem < toInsert)), toInsert); // == [333, 300, 222]
If your collection is already sorted, just find the appropriate index to insert the element at (either via a linear scan or a faster binary search) and use Insert to store the element at that specific index.
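For a collection sorted in descending order, such a binary-search insert might look like this sketch (hand-rolled, since List&lt;T&gt;.BinarySearch assumes ascending order unless given a comparer; names are illustrative). Unlike the First() approach, it also handles values smaller or larger than everything already in the list:

```csharp
using System;
using System.Collections.ObjectModel;

class SortedInsertDemo
{
    // Inserts value into a descending-sorted collection via binary search.
    static void InsertDescending(ObservableCollection<int> col, int value)
    {
        int lo = 0, hi = col.Count;
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (col[mid] > value) lo = mid + 1; // still inside the larger half
            else hi = mid;
        }
        col.Insert(lo, value);
    }

    static void Main()
    {
        var col = new ObservableCollection<int> { 333, 222 };
        InsertDescending(col, 300);
        Console.WriteLine(string.Join(",", col)); // 333,300,222
    }
}
```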

Retrieve paged results using Linq

I want to get items 50-100 back from a result set in linq. How do I do that?
Situation:
I get back an index of the last item of the last result set. I want to then grab the next 50. I do not have the ID of the last result only its index number.
You have to order it by something, or else you really can't.
So it would be something like:
mycontext.mytable
    .OrderBy(item => item.PropertyYouWantToOrderBy)
    .Skip(HowManyYouWantToSkip)
    .Take(50);
LINQ is based on the concept of one-way enumeration, so queries all start at the beginning. To implement paging you'll have to use the Skip and Take extensions to isolate the items you're interested in:
int pageSize = 50;
// 0-based page number
int pageNum = 2;
var pageContent = myCollection.Skip(pageSize * pageNum).Take(pageSize);
Of course this just sets up an IEnumerable<T> that, when enumerated, will step through the first 100 items of myCollection, then return data for up to 50 items before stopping.
Which is fine if you're working on something that can be enumerated multiple times, but not if you're working with a source that will only enumerate once. But then you can't realistically implement paging on that sort of enumeration anyway, you need an intermediate storage for at least that portion of it that you've already consumed.
In LINQ to SQL this will produce a query that attempts to select only the 50 records you asked for, which, from memory, is implemented by taking numSkip + numTake records, reversing the sort order, taking numTake records, and reversing again. Depending on the sort order you've set up and the size of the numbers involved, this could be a much more expensive operation than simply pulling a batch of data back and filtering it in memory.

Most efficient method for getting the 10th item from the end of a singly linked list

What is the most efficient method for getting the 10th item from the end of a list?
I was thinking something like:
List[List.Count() - 10];
If you're using List<T> from System.Collections.Generic then you're not actually using a singly linked list. It's backed by an array, and you can simply access it by index, as you already suggested:
list[list.Count - 10];
It will be an O(1) operation. You should probably check that the list has at least 10 elements first so you don't get an exception.
However, if you have your own singly linked list structure you'll have to iterate the list to get the Nth item from the end. You can use the same approach, but that forces two passes over the collection - first to get the total number of elements, and second to get the Nth-last element.
You can make it happen with just one iteration, if you store last N items you've seen, e.g. in a queue. This will be O(n) operation.
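The queue-based single pass could be sketched like this (NthFromEnd is an illustrative name, not a library method):

```csharp
using System;
using System.Collections.Generic;

class NthFromEndDemo
{
    // Returns the n-th item from the end of a sequence in one pass,
    // keeping only the last n items seen in a queue.
    static T NthFromEnd<T>(IEnumerable<T> source, int n)
    {
        var window = new Queue<T>(n);
        foreach (var item in source)
        {
            window.Enqueue(item);
            if (window.Count > n)
                window.Dequeue(); // keep the window at size n
        }
        if (window.Count < n)
            throw new ArgumentException("sequence has fewer than n items");
        return window.Peek(); // oldest of the last n == n-th from the end
    }

    static void Main()
    {
        var list = new LinkedList<int>(new[] { 1, 2, 3, 4, 5, 6 });
        Console.WriteLine(NthFromEnd(list, 2)); // 5
    }
}
```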

Algorithm for ordering a list of Objects

Say you have a list of objects. The user works with pretty much all of the objects.
How can you order the list so that it adapts to the order in which the user mostly uses the objects? What algorithm can you use for that?
EDIT: Many answers suggested counting the number of times an object was used. That does not work, because all objects are used the same number of times, just in different orders.
Inside your object, keep a usedCount. Whenever the object is used, increase this count.
Then you can simply do this:
objects.OrderByDescending(o => o.UsedCount);
I would keep a running count of how many times the object was used, and in what order it was used.
So if object X was used 3rd, average that with the running count and use the result as its position in the list.
For example:
Item Uses Order of Use
---------------------------------------
Object X 10 1,2,3,1,2,1,3,1,2,2 (18)
Object Y 10 3,1,2,3,3,3,1,3,3,1 (23)
Object Z 10 2,3,1,2,1,2,2,2,2,3 (20)
Uses would be how many times the user used the object, the order of use would be a list (or sum) of where the item is used in the order.
Using a list of the each order individually could have some performance issues, so you may just want to keep a sum of the positions. If you keep a sum, just add the order to that sum every time the object is used.
To calculate the position, you would then just use the sum of the positions, divided by the number of uses and you'd have your average. All you would have to do at that point is order the list by the average.
In the example above, you'd get the following averages (and order):
Object X 1.8
Object Z 2.0
Object Y 2.3
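The calculation above can be sketched as follows, with the three objects from the table hard-coded (the Item class and its fields are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class AveragePositionDemo
{
    class Item
    {
        public string Name;
        public int Uses;     // how many times the item was used
        public int OrderSum; // running sum of the positions it was used at

        public double AveragePosition => (double)OrderSum / Uses;
    }

    static void Main()
    {
        var items = new List<Item>
        {
            new Item { Name = "X", Uses = 10, OrderSum = 18 },
            new Item { Name = "Y", Uses = 10, OrderSum = 23 },
            new Item { Name = "Z", Uses = 10, OrderSum = 20 },
        };

        // Order ascending by average position: X (1.8), Z (2.0), Y (2.3).
        Console.WriteLine(string.Join(" ",
            items.OrderBy(x => x.AveragePosition).Select(x => x.Name))); // X Z Y
    }
}
```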
Add a list of datetimes of when a user accesses an object. Each time a user uses an object, add a datetime.
Now just count the number of datetime entries in your list that fall within (now - x days) and sort by that. You can delete the datetimes that are older than (now - x days).
It's possible that a user uses different items in a month, this will reflect those changes.
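A small sketch of that sliding-window score, pruning stale timestamps as it counts (the method name is illustrative):

```csharp
using System;
using System.Collections.Generic;

class RecencyScoreDemo
{
    // Scores an object by how many times it was used within `window` of now,
    // and deletes access times that have fallen out of the window.
    static int RecentUses(List<DateTime> accesses, DateTime now, TimeSpan window)
    {
        DateTime cutoff = now - window;
        accesses.RemoveAll(t => t < cutoff); // drop stale entries
        return accesses.Count;
    }

    static void Main()
    {
        var now = new DateTime(2024, 6, 1);
        var accesses = new List<DateTime>
        {
            now.AddDays(-40), // stale: outside a 30-day window
            now.AddDays(-5),
            now.AddDays(-1),
        };
        Console.WriteLine(RecentUses(accesses, now, TimeSpan.FromDays(30))); // 2
    }
}
```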
You can add a number_of_views field to your object class, ++ it every time the object is used, and sort the list by that field. And you should reset the field to 0 on all objects when every object's number_of_views is equal but non-zero.
I would also use a counter for each object to monitor its use, but instead of reordering the whole list after each use, I would recommend just sorting the list "locally".
Like in a bubble sort, I would just compare the object whose counter was just increased with the upper object, and swap them if needed. If swapped, I would then compare the object and its new upper object and so on.
However, it is not very different from the previous methods if the sort is properly implemented.
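A sketch of that local bubble-up, using a list of (name, count) pairs for illustration:

```csharp
using System;
using System.Collections.Generic;

class BubbleUpDemo
{
    // After an item's counter is bumped, swap it upward past any neighbour
    // with a smaller count, so the list stays sorted by count (descending)
    // without a full re-sort.
    static void Use(List<KeyValuePair<string, int>> items, int index)
    {
        items[index] = new KeyValuePair<string, int>(
            items[index].Key, items[index].Value + 1);

        while (index > 0 && items[index - 1].Value < items[index].Value)
        {
            var tmp = items[index - 1];
            items[index - 1] = items[index];
            items[index] = tmp;
            index--;
        }
    }

    static void Main()
    {
        var items = new List<KeyValuePair<string, int>>
        {
            new KeyValuePair<string, int>("a", 3),
            new KeyValuePair<string, int>("b", 3),
            new KeyValuePair<string, int>("c", 2),
        };
        Use(items, 2); // "c" reaches 3: ties with "b", so no swap
        Use(items, 2); // "c" reaches 4 and bubbles to the front
        Console.WriteLine(items[0].Key); // c
    }
}
```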
If your User class looks like so:
class User
{
    List<Algo> algosUsed = new List<Algo>(); // simplified, for explanation
    ...
}
And your Algo class looks like so:
class Algo
{
    int usedCount;
    ...
}
You should be able to bind specific instances of the Algo object to the User object so you can record how often each is used. At the most basic level you would serialize the information to a file or a stream; more likely you want a database to keep track of what is being used. Then, when you load your User and invoke a sort function, you order the algosUsed list of User by the usedCount field of Algo.
Sounds like you want a cache. I suppose you could look at the algorithms a cache uses and strip out the whole business about context switching... there is an algorithm called "clock sweep"... but that might all be too complex for what you are looking for. The lazy way would be to keep a hash of "used thing" : num_of_uses or, in your class, a variable you ++ each time the object is used.
Every once in a while, sort the hash by num_of_uses, or the objects by the value of their ++'d variable.
From https://stackoverflow.com/a/2619065/1429439 :
maybe use OrderedMultiDictionary with the usedCount as the keys and the object as the value.
EDIT: added an order preference - see the code.
I don't like the last-used method, as Carra suggested, because it causes many sort changes, which is confusing.
The count_accessed field is much better, though I think it should be leveled to how many times the user accessed the item in the last XX minutes/hours/days, etc.
A suitable data structure for that is:
static TimeSpan TIME_TO_LIVE;
static int userOrderFactor = 0;
LinkedList<KeyValuePair<DateTime, int>> myAccessList = new LinkedList<KeyValuePair<DateTime, int>>();

private void Access_Detected()
{
    userOrderFactor++;
    myAccessList.AddLast(new KeyValuePair<DateTime, int>(DateTime.Now, userOrderFactor));
    myPriority += userOrderFactor; // keep a running total so we don't waste time summing the list
}

private int myPriority = 0;

public int MyPriority
{
    get
    {
        DateTime expiry = DateTime.Now.Subtract(TIME_TO_LIVE);
        while (myAccessList.First != null && myAccessList.First.Value.Key < expiry)
        {
            // remove the expired entry's contribution from the running total
            myPriority -= myAccessList.First.Value.Value;
            myAccessList.RemoveFirst();
        }
        return myPriority;
    }
}
Hope this helps...
It is almost always O(1), BTW...
It reminds me somewhat of the sleep mechanism of operating systems.
When a user interacts with an object, save the ID of the previous object acted upon on that second object so that you always have a pointer to the object used before any given object.
Additionally, store the ID of the most frequently first used object so you know where to start.
When you are building your list of objects to display, you start with the one you've stored as the most frequently first-used object then search for the object that has the first-used object's ID stored on it to display next.
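A sketch of rebuilding the display order from such a pointer chain, using string IDs and a hypothetical prev map (all names here are made up):

```csharp
using System;
using System.Collections.Generic;

class UsageChainDemo
{
    static void Main()
    {
        // prev[id] = the id of the object most often used just before `id`
        // (the answer assumes you persist this alongside each object).
        var prev = new Dictionary<string, string>
        {
            ["B"] = "A",
            ["C"] = "B",
        };
        string first = "A"; // stored as the most frequent starting object

        // Rebuild the display order by following the chain forward:
        // at each step, find the object whose predecessor is the current one.
        var order = new List<string> { first };
        string current = first;
        while (true)
        {
            string next = null;
            foreach (var kv in prev)
                if (kv.Value == current) { next = kv.Key; break; }
            if (next == null) break;
            order.Add(next);
            current = next;
        }
        Console.WriteLine(string.Join(",", order)); // A,B,C
    }
}
```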

Best data structure for sorted time-series data that can quickly return subarrays?

I'm in need of a data structure that is basically a list of data points, where each data point has a timestamp and a double[] of data values. I want to be able to retrieve the closest point to a given timestamp, or all points within a specified range of timestamps.
I'm using C#. My thinking was that a regular List would work, where "DataPoint" is a class containing the timestamp and double[] fields. To insert, I'd use the built-in BinarySearch() to find where to insert the new data, and I could use it again to find the start/end indexes for a range search.
I first tried SortedList, but it seems you can't iterate through indexes i = 0, 1, 2, ..., n, only through keys, so I wasn't sure how to do the range search without some convoluted function.
But then I learned that List<>'s Insert() is O(n)... couldn't I do better than that without sacrificing elsewhere?
Alternatively, is there some nice LINQ query that will do everything I want in a single line?
If you're willing to use non BCL libraries, the C5.SortedArray<T> has always worked quite well for me.
It has a great method, RangeFromTo, that performs quite well with this sort of problem.
If you have only static data then any structure implementing IList should be fine. Sort it once and then run queries using BinarySearch. This also works if your inserted timestamps are always increasing: then you can just List.Add() in O(1) and the list stays sorted.
List<int> x = new List<int>();
x.Add(5);
x.Add(7);
x.Add(3);
x.Sort();

// find all elements between 4 and 6
int rangeStart = x.BinarySearch(4);
// since there is no element equal to 4, we get the bitwise complement of
// the index where 4 would have been inserted - see MSDN for List<T>.BinarySearch
if (rangeStart < 0)
    rangeStart = ~rangeStart;
while (rangeStart < x.Count && x[rangeStart] < 6)
{
    // do your business
    rangeStart++;
}
If you need to insert data at random points in your structure, keep it sorted, and still query ranges fast, you need a structure called a B+ tree. It's not implemented in the framework; you'll need to get an implementation on your own.
Inserting a record requires O(log n) operations in the worst case
Finding a record requires O(log n) operations in the worst case
Removing a (previously located) record requires O(log n) operations in the worst case
Performing a range query with k elements occurring within the range requires O((log n) + k) operations in the worst case.
P.S. "is there some nice linq query that will do everything I want in a single line"
I wish I knew such a nice linq query that could do everything I want in one line :-)
You have the choice of cost at insertion, retrieval or removal time. There are various data structures optimized for each of these cases. Before you decide on one, I'd estimate the total size of your structures, how many data points are being generated (and at which frequency) and what will be used more often: insertion or retrieval.
If you insert a lot of new data points at high frequency, I'd suggest looking at a LinkedList<>. If you're retrieving more often, I'd use a List<> even though its insertion time is slower.
Of course you could do that in a LINQ query, but remember this is only sugar coating: The query will execute every time and for every execution search the entire set of data points to find a match. This may be more expensive than using the right collection for the job in the first place.
How about using an actual database to store your data and run queries against that?
Then, you could use LINQ-to-SQL.
