Say you have a List of objects. The user works with nearly all of the objects, but in different patterns.
How can you order the list of objects so that it adapts to the order in which the user most often uses them? What algorithm can you use for that?
EDIT: Many answers suggested counting the number of times each object was used. That does not work here, because all objects are used the same number of times, just in different orders.
Inside your object, keep a usedCount. Whenever the object is used, increment this count.
Then you can simply do this (note that OrderByDescending returns a new ordered sequence rather than sorting in place):
objects = objects.OrderByDescending(o => o.UsedCount).ToList();
I would keep a running count of how many times the object was used, and of the positions at which it was used.
So if object X was used 3rd, fold that into the running sum and use the resulting average as its position in the list.
For example:
Item        Uses   Order of use (sum)
---------------------------------------
Object X    10     1,2,3,1,2,1,3,1,2,2  (18)
Object Y    10     3,1,2,3,3,3,1,3,3,1  (23)
Object Z    10     2,3,1,2,1,2,2,2,2,3  (20)
Uses is how many times the user used the object; the order of use is a list (or running sum) of the positions at which the item was used.
Keeping a full list of every position could cause performance issues, so you may want to keep just a sum of the positions. If you keep a sum, add the position to it every time the object is used.
To calculate an object's rank, divide the sum of the positions by the number of uses and you have your average. All you have to do at that point is order the list by the average.
In the example above, you'd get the following averages (and order):
Object X 1.8
Object Z 2.0
Object Y 2.3
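A minimal sketch of this idea (the class and member names are my own, not from the answer):

```csharp
using System;
using System.Linq;

class TrackedItem
{
    public string Name;
    public int Uses;          // how many times the object was used
    public int PositionSum;   // running sum of the positions it was used at

    // Record a use at position `order` within the current session (1 = first).
    public void RecordUse(int order)
    {
        Uses++;
        PositionSum += order;
    }

    // Items used earlier on average sort first.
    public double AveragePosition => (double)PositionSum / Uses;
}

class Demo
{
    static void Main()
    {
        // the numbers from the table above
        var x = new TrackedItem { Name = "X", Uses = 10, PositionSum = 18 };
        var y = new TrackedItem { Name = "Y", Uses = 10, PositionSum = 23 };
        var z = new TrackedItem { Name = "Z", Uses = 10, PositionSum = 20 };

        var ordered = new[] { x, y, z }.OrderBy(i => i.AveragePosition);
        Console.WriteLine(string.Join(" ", ordered.Select(i => i.Name))); // X Z Y
    }
}
```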
Keep a list of datetimes recording when the user accesses each object. Each time the user uses an object, append the current datetime.
Now just count the number of datetime entries in the list that are within (now - x days) and sort by that count. You can delete the datetimes older than (now - x days).
Users tend to use different items over the course of a month; this approach will reflect those changes.
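A rough sketch of this sliding-window idea (the class and member names are mine, not from the answer): each object keeps the timestamps of its recent uses, and its rank is the number of uses inside the window.

```csharp
using System;
using System.Collections.Generic;

class WindowedItem
{
    private readonly List<DateTime> accesses = new List<DateTime>();

    public void RecordUse(DateTime when) => accesses.Add(when);

    // Count uses newer than `cutoff`, pruning older entries as a side effect
    // so the list never grows beyond the window.
    public int UsesSince(DateTime cutoff)
    {
        accesses.RemoveAll(t => t <= cutoff);
        return accesses.Count;
    }
}
```

Ordering is then something like `items.OrderByDescending(i => i.UsesSince(DateTime.Now.AddDays(-x)))` for whatever window x you choose.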
You can add a number_of_views field to your object class, increment it every time the object is used, and sort the list by that field. You should also reset the field to 0 on all objects when every object's number_of_views is the same but non-zero.
I would also use a counter on each object to track its use, but instead of re-sorting the whole list after each use, I would recommend just sorting the list "locally".
As in a bubble sort, I would compare the object whose counter was just incremented with the object above it, and swap them if needed. If they were swapped, I would then compare the object with its new upper neighbour, and so on.
However, this is not very different from the previous methods if the sort is properly implemented.
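A sketch of that local swap (names are mine): the list is kept ordered by UsedCount descending, so after one item's counter is incremented it only needs to bubble up past neighbours with smaller counts.

```csharp
using System;
using System.Collections.Generic;

class Item
{
    public string Name;
    public int UsedCount;
}

static class LocalSort
{
    // Called right after items[index].UsedCount was incremented;
    // swaps the item upward until its upper neighbour has a higher count.
    public static void BubbleUp(List<Item> items, int index)
    {
        while (index > 0 && items[index].UsedCount > items[index - 1].UsedCount)
        {
            var tmp = items[index - 1];
            items[index - 1] = items[index];
            items[index] = tmp;
            index--;
        }
    }
}
```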
If your User class looks like so:
class User
{
List<Algo> algosUsed = new List<Algo>();
...
}
And your Algo class looks like so:
class Algo
{
int usedCount;
...
}
You should be able to bind specific instances of the Algo object to the User object, which allows you to record how often each one is used. At the most basic level you would serialize the information to a file or a stream; most likely you want a database to keep track of what is being used. Then, when you load your User and invoke a sort, you order the algosUsed collection of User by the usedCount field of Algo.
Sounds like you want a cache. You could look at the algorithms a cache uses and strip out the parts about context switching; there is an algorithm called "clock sweep", but that might all be more complex than what you are looking for. The lazy way would be to keep a hash of "used thing" to number of uses, or, in your class, a counter field that you increment each time the object is used.
Every once in a while, sort the hash by number of uses, or the objects by the value of their counters.
From https://stackoverflow.com/a/2619065/1429439 :
maybe use OrderedMultiDictionary with the usedCount as the keys and the object as the value.
EDIT: added an order preference — see the code.
I don't like the last-used method, as Carra suggested, because it causes many sort changes, which is confusing.
The count_accessed field is much better, though I think it should be limited to
how many times the user accessed the item in the last XX minutes/hours/days, etc.
The best data structure for that is surely something like:
static TimeSpan TIME_TO_LIVE;
static int userOrderFactor = 0;
LinkedList<KeyValuePair<DateTime, int>> myAccessList = new LinkedList<KeyValuePair<DateTime, int>>();
private void Access_Detected()
{
userOrderFactor++;
myAccessList.AddLast(new KeyValuePair<DateTime, int>(DateTime.Now, userOrderFactor));
myPriority += userOrderFactor; // keep a running total so we don't have to sum the list
}
private int myPriority = 0;
public int MyPriority
{
get
{
DateTime expiry = DateTime.Now.Subtract(TIME_TO_LIVE);
while (myAccessList.First != null && myAccessList.First.Value.Key < expiry)
{
myPriority -= myAccessList.First.Value.Value; // expired accesses no longer count toward the total
myAccessList.RemoveFirst();
}
return myPriority;
}
}
Hope this helps...
It is amortized O(1), by the way.
It reminds me somewhat of the sleep mechanism in operating systems.
When a user interacts with an object, store on that object the ID of the object that was used immediately before it, so that you always have a pointer to the predecessor of any given object.
Additionally, store the ID of the object most frequently used first, so you know where to start.
When building the list of objects to display, start with the one you've stored as the most frequently first-used object, then look for the object that has that object's ID stored on it to display next, and so on.
Let me try to explain this.
I have two lists:
1. a list of employee objects
2. a list of department objects (each of which has a list of the employees who can work in that department)
I want to be able to add an employee to a department in the list that holds the employee lists,
but I am getting a null error.
int empsize = allemployees.Count;
int Maxdepartment = 0;
foreach (employee employeeitem in allemployees)
{
Maxdepartment = employeeitem.alloweddepartments.Count;
for (int i = 0; i < Maxdepartment; i++)
{
int index = alldepartments.FindIndex(x => x.Name == employeeitem.alloweddepartments[i].ToString());
alldepartments[index].earlyshift.Add(employeeitem);
    }
}
This looks like a very complex problem to me. It is certainly an optimization problem with many constraints, so I hope you are good at math ;-).
I would suggest having a look at the simplex algorithm, which will work very well for your problem if you have the mathematical know-how to use it. There are some variations of simplex that may also work well.
There is another way, too, where you just use the power of your computer to solve the problem. You could write a function that rates a solution and gives it some kind of benchmark score.
For instance, you can rate the difference between the hours provided and the hours needed: every hour under what is needed counts -2, every hour over counts -1. That gives you a score for an employee assignment.
With this function you can start randomly assigning employees to departments (respecting, of course, the min/max employees for each department) and then rate each solution. That way you can keep the solution with the best score (if your function is well defined).
Most random assignments will be poor, of course, but your computer generates millions of solutions in seconds, so chances are good it will find a decent one after some time (I don't think time is a big criterion here, because once you have a solution it won't change very often).
I have a process I've inherited that I'm converting to C# from another language. Numerous steps in the process loop through what can be a lot of records (100K-200K) to do calculations. As part of those processes it generally does a lookup into another list to retrieve some values. I would normally move this kind of thing into a SQL statement (and we have where we've been able to) but in these cases there isn't really an easy way to do that. In some places we've attempted to convert the code to a stored procedure and decided it wasn't working nearly as well as we had hoped.
Effectively, the code does this:
var match = cost.Where(r => r.ryp.StartsWith(record.form.TrimEnd()) &&
r.year == record.year &&
r.period == record.period).FirstOrDefault();
cost is a local List. If I were doing a search on only one field I'd probably just move this into a Dictionary. The records aren't always unique either.
Obviously, this is REALLY slow.
I ran across the open source library I4O which can build indexes, however it fails for me in various queries (and I don't really have the time to attempt to debug the source code). It also doesn't work with .StartsWith or .Contains (StartsWith is much more important since a lot of the original queries take advantage of the fact that doing a search for "A" would find a match in "ABC").
Are there any other projects (open source or commercial) that do this sort of thing?
EDIT:
I did some searching based on the feedback and found Power Collections which supports dictionaries that have keys that aren't unique.
I tested ToLookup() which worked great - it's still not quite as fast as the original code, but it's at least acceptable. It's down from 45 seconds to 3-4 seconds. I'll take a look at the Trie structure for the other look ups.
Thanks.
Looping through a list of 100K-200K items doesn't take very long. Finding matching items within the list by using nested loops (n^2) does take long. I infer this is what you're doing (since you have assignment to a local match variable).
If you want to quickly match items together, use .ToLookup.
var lookup = cost.ToLookup(r => new {r.year, r.period, form = r.ryp});
foreach(var group in lookup)
{
// do something with items in group.
}
Your StartsWith criterion is troublesome for key-based matching. One way to approach the problem is to leave it out when generating the keys.
var lookup = cost.ToLookup(r => new {r.year, r.period });
var key = new {record.year, record.period};
string lookForThis = record.form.TrimEnd();
var match = lookup[key].FirstOrDefault(r => r.ryp.StartsWith(lookForThis));
Ideally, you would create the lookup once and reuse it for many queries. Even if you didn't... even if you created the lookup each time, it will still be faster than n^2.
Certainly you can do better than this. Let's start by noting that dictionaries are not useful only when you want to query one field: you can very easily have a dictionary whose key is an immutable value aggregating many fields. So for this particular query, an immediate improvement would be to create a key type:
// should be immutable, GetHashCode and Equals should be implemented, etc etc
struct Key
{
public int year;
public int period;
}
and then package your data into an IDictionary<Key, ICollection<T>> or similar where T is the type of your current list. This way you can cut down heavily on the number of rows considered in each iteration.
The next step would be to use not an ICollection<T> as the value type but a trie (this looks promising), which is a data structure tailored to finding strings that have a specified prefix.
Finally, a free micro-optimization would be to take the TrimEnd out of the loop.
Now certainly all of this only applies to the specific example given and may need to be revisited due to other specifics of your situation, but in any case you should be able to extract practical gain from this or something similar.
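For instance, C# 7 value tuples give you Equals/GetHashCode for free, so a sketch of the grouping step might look like this (`Row` is a stand-in for the element type of `cost`; the sample data is made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Row
{
    public int year;
    public int period;
    public string ryp;
}

class Demo
{
    static void Main()
    {
        var cost = new List<Row>
        {
            new Row { year = 2012, period = 3, ryp = "ABC" },
            new Row { year = 2012, period = 3, ryp = "XYZ" },
            new Row { year = 2011, period = 1, ryp = "ABD" },
        };

        // Build once: rows grouped under an aggregate (year, period) key.
        var byKey = cost
            .GroupBy(r => (r.year, r.period))
            .ToDictionary(g => g.Key, g => g.ToList());

        // Per query: only the rows for the right year/period are scanned.
        string prefix = "AB";
        var match = byKey.TryGetValue((2012, 3), out var rows)
            ? rows.FirstOrDefault(r => r.ryp.StartsWith(prefix))
            : null;
        Console.WriteLine(match?.ryp); // ABC
    }
}
```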
I have set of 'codes' Z that are valid in a certain time period.
Since I need them many times in a large loop (a million+ iterations), and each time I have to look up the corresponding code, I cache them in a List<>. After finding the correct codes, I insert (using SqlBulkCopy) a million rows.
I look up the id with the following code (l_z is a List<T>):
var z_fk = (from z in l_z
where z.CODE == lookupCode &&
z.VALIDFROM <= lookupDate &&
z.VALIDUNTIL >= lookupDate
select z.id).SingleOrDefault();
In other situations I have used a Dictionary, with superb performance, but in those cases I only had to look up the id based on the code.
Now, searching on a combination of fields, I am stuck.
Any ideas? Thanks in advance.
Create a Dictionary that stores a List of items per lookup code - Dictionary<string, List<Code>> (assuming that lookup code is a string and the objects are of type Code).
Then when you need to query based on lookupDate, you can run your query directly off of dict[lookupCode]:
var z_fk = (from z in dict[lookupCode]
where z.VALIDFROM <= lookupDate &&
z.VALIDUNTIL >= lookupDate
select z.id).SingleOrDefault();
Then just make sure that whenever you have a new Code object, that it gets added to the List<Code> collection in the dict corresponding to the lookupCode (and if one doesn't exist, then create it).
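A sketch of the build step and lookup (the `Code` field names are taken from the question; the sample data is made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Code
{
    public string CODE;
    public DateTime VALIDFROM;
    public DateTime VALIDUNTIL;
    public int id;
}

class Demo
{
    static void Main()
    {
        var l_z = new List<Code>
        {
            new Code { CODE = "A", VALIDFROM = new DateTime(2010, 1, 1), VALIDUNTIL = new DateTime(2010, 12, 31), id = 1 },
            new Code { CODE = "A", VALIDFROM = new DateTime(2011, 1, 1), VALIDUNTIL = new DateTime(2011, 12, 31), id = 2 },
            new Code { CODE = "B", VALIDFROM = new DateTime(2010, 1, 1), VALIDUNTIL = new DateTime(2012, 12, 31), id = 3 },
        };

        // Build the dictionary once, before the million-row loop.
        var dict = new Dictionary<string, List<Code>>();
        foreach (var c in l_z)
        {
            if (!dict.TryGetValue(c.CODE, out var list))
                dict[c.CODE] = list = new List<Code>();
            list.Add(c);
        }

        // Inside the loop: only the entries for this code are scanned.
        var lookupDate = new DateTime(2011, 6, 1);
        var z_fk = dict["A"]
            .Where(z => z.VALIDFROM <= lookupDate && z.VALIDUNTIL >= lookupDate)
            .Select(z => z.id)
            .SingleOrDefault();
        Console.WriteLine(z_fk); // 2
    }
}
```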
A simple improvement would be to use...
//in initialization somewhere
ILookup<string, T> l_z_lookup = l_z.ToLookup(z=>z.CODE);
//your repeated code:
var z_fk = (from z in l_z_lookup[lookupCode]
where z.VALIDFROM <= lookupDate && z.VALIDUNTIL >= lookupDate
select z.id).SingleOrDefault();
You could further use a more complex, smarter data structure storing dates in sorted fashion and use a binary search to find the id, but this may be sufficient. Further, you speak of SqlBulkCopy - if you're dealing with a database, perhaps you can execute the query on the database, and then simply create the appropriate index including columns CODE, VALIDUNTIL and VALIDFROM.
I generally prefer using a Lookup over a Dictionary containing Lists since it's trivial to construct and has a cleaner API (e.g. when a key is not present).
We don't have enough information to give very prescriptive advice - but there are some general things you should be thinking about.
What types are the time values? Are you comparing DateTimes or some primitive value (like a time_t)? Think about how your data types affect performance, and choose the best ones.
Should you really be doing this in memory or should you be putting all these rows in to SQL and letting it be queried on there? It's really good at that.
But let's stick with what you asked about - in memory searching.
When searching is taking too long there is only one solution - search fewer things. You do this by partitioning your data in a way that allows you to easily rule out as many nodes as possible with as few operations as possible.
In your case you have two criteria - a code and a date range. Here are some ideas...
You could partition based on code - i.e. a Dictionary<Code, List<Event>> - and if you have many evenly distributed codes, your lists will each be about N/M in size (where N = total event count and M = number of codes). So a million nodes with ten codes now requires searching 100k items rather than a million. But you could take that a bit further: each List could itself be sorted by starting time, allowing a binary search to rule out many other nodes very quickly (this of course has a trade-off in the time spent building the collection). This should provide very quick lookups.
You could partition based on date and just store all the data in a single list sorted by start date and use a binary search to find the start date then march forward to find the code. Is there a benefit to this approach over the dictionary? That depends on the rest of your program. Maybe being an IList is important. I don't know. You need to figure that out.
You could flip the dictionary model partition the data by start time rounded to some boundary (depending on the length, granularity and frequency of your events). This is basically bucketing the data in to groups that have similar start times. E.g., all the events that were started between 12:00 and 12:01 might be in one bucket, etc. If you have a very small number of events and a lot of highly frequent (but not pathologically so) events this might give you very good lookup performance.
The point? Think about your data. Consider how expensive it should be to add new data and how expensive it should be to query the data. Think about how your data types affect those characteristics. Make an informed decision based on that data. When in doubt let SQL do it for you.
This to me sounds like a situation where this could all happen on the database via a single statement. Then you can use indexing to keep the query fast and avoid having to push data over the wire to and from your database.
I have a Dictionary containing 10 keys, each with a list containing up to 30,000 values. The values contain a DateTime property.
I frequently need to extract a small subset of one of the keys, like a date range of 30 - 60 seconds.
Doing this is easy, but getting it to run fast is not so. What would be the most efficient way to query this in-memory data?
Thanks a lot.
Sort the lists by date first, then find the required items by binary search and return them. Finding the range takes O(log n) because you need to find the first and last index; returning the K items in the range is O(K), so in total it's O(K + log n).
IEnumerable<item> GetItems(int startIndex, int endIndex, List<item> input)
{
for (int i=startIndex;i<endIndex;i++)
yield return input[i];
}
1) Keep the dictionary, but use SortedList instead of a list for value of dictionaries, sorted by DateTime property
2) Implement a binary search to find the upper and lower edges in your range in the sorted list which gives you indexes.
3) Just select values in the range using Sortedlist.Values.Skip(lowerIndex).Take(upperIndex - lowerIndex)
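A sketch of steps 2-3, assuming the values are kept in a list sorted by timestamp with a parallel list of keys (names are mine; this simple lower bound also assumes distinct timestamps, since List<T>.BinarySearch can land anywhere inside a run of duplicates):

```csharp
using System;
using System.Collections.Generic;

static class RangeQuery
{
    // Index of the first key >= value in a sorted key list.
    public static int LowerBound(List<DateTime> keys, DateTime value)
    {
        int i = keys.BinarySearch(value);
        return i < 0 ? ~i : i; // ~i is the insertion point when not found
    }

    // All values whose (sorted) timestamps fall in [from, to).
    public static List<T> Between<T>(List<DateTime> keys, List<T> values,
                                     DateTime from, DateTime to)
    {
        int lo = LowerBound(keys, from);
        int hi = LowerBound(keys, to);
        return values.GetRange(lo, hi - lo);
    }
}
```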
In reply to Aliostad: I don't think a binary search will work if the collection is a linked list. It still takes O(n).
The fastest way is to organize the data so it is indexed by the thing you want to search on. Currently it is indexed by key, but you want to search by date, so I think you would be best indexing it by date.
I would keep two dictionaries: one indexed as you do now, and one where the items are indexed by date. Decide on a time frame (say, one minute), add each object to a list based on the minute it occurs in, and then add each list to the dictionary under the key of that minute. When you want the data for a particular time frame, generate the relevant minute key(s) and get the list(s) from the dictionary. This relies on your being able to derive the key in the second dictionary from the objects, though.
Suppose I have a collection (be it an array, generic List, or whatever is the fastest solution to this problem) of a certain class, let's call it ClassFoo:
class ClassFoo
{
public string word;
public float score;
//... etc ...
}
Assume there are going to be around 50,000 items in the collection, all in memory.
Now I want to obtain, as fast as possible, all the instances in the collection that satisfy a condition on the word member, for example like this:
List<ClassFoo> result = new List<ClassFoo>();
foreach (ClassFoo cf in collection)
{
if (cf.word.StartsWith(query) || cf.word.EndsWith(query))
result.Add(cf);
}
How do I get the results as fast as possible? Should I consider some advanced indexing techniques and datastructures?
The application domain for this problem is an autocompleter, that gets a query and gives a collection of suggestions as a result. Assume that the condition doesn't get any more complex than this. Assume also that there's going to be a lot of searches.
With the constraint that the condition clause can be "anything", then you're limited to scanning the entire list and applying the condition.
If there are limitations on the condition clause, then you can look at organizing the data to more efficiently handle the queries.
For example, the code sample with the "byFirstLetter" dictionary doesn't help at all with an "endsWith" query.
So, it really comes down to what queries you want to do against that data.
In Databases, this problem is the burden of the "query optimizer". In a typical database, if you have a database with no indexes, obviously every query is going to be a table scan. As you add indexes to the table, the optimizer can use that data to make more sophisticated query plans to better get to the data. That's essentially the problem you're describing.
Once you have a more concrete subset of the types of queries then you can make a better decision as to what structure is best. Also, you need to consider the amount of data. If you have a list of 10 elements each less than 100 byte, a scan of everything may well be the fastest thing you can do since you have such a small amount of data. Obviously that doesn't scale to a 1M elements, but even clever access techniques carry a cost in setup, maintenance (like index maintenance), and memory.
EDIT, based on the comment
If it's an auto completer, if the data is static, then sort it and use a binary search. You're really not going to get faster than that.
If the data is dynamic, then store it in a balanced tree and search that. That's effectively a binary search, and it lets you keep adding data in any order.
Anything else is some specialization on these concepts.
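For the static, sorted case, a sketch of the prefix half of the search (names are mine; assumes ordinal sorting and distinct entries): all StartsWith matches form one contiguous run, so binary-search to the first word >= prefix and scan forward until the prefix no longer matches.

```csharp
using System;
using System.Collections.Generic;

static class Autocomplete
{
    public static IEnumerable<string> StartingWith(List<string> sortedWords, string prefix)
    {
        // BinarySearch returns the complement of the insertion point on a miss,
        // which is exactly the first element >= prefix.
        int i = sortedWords.BinarySearch(prefix, StringComparer.Ordinal);
        if (i < 0) i = ~i;
        while (i < sortedWords.Count &&
               sortedWords[i].StartsWith(prefix, StringComparison.Ordinal))
            yield return sortedWords[i++];
    }
}
```

The EndsWith half can reuse the same routine over a second list of the reversed strings.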
var Answers = myList.Where(item => item.word.StartsWith(query) || item.word.EndsWith(query));
That's the easiest in my opinion, and it should execute rather quickly.
Not sure I understand... All you can really do is optimize the rule, that's the part that needs to be fastest. You can't speed up the loop without just throwing more hardware at it.
You could parallelize if you have multiple cores or machines.
I'm not up on my Java right now, but I would think about the following things.
How are you creating your list? Perhaps you can create it already ordered in a way that cuts down on comparison time.
If you are just doing a straight loop through your collection, you won't see much difference between storing it as an array or as a linked list.
For storing the results, depending on how you are collecting them, the structure could make a difference (but assuming Java's generic structures are smart, it won't). As I said, I'm not up on my Java, but I assume that the generic linked list would keep a tail pointer. In this case, it wouldn't really make a difference. Someone with more knowledge of the underlying array vs linked list implementation and how it ends up looking in the byte code could probably tell you whether appending to a linked list with a tail pointer or inserting into an array is faster (my guess would be the array). On the other hand, you would need to know the size of your result set or sacrifice some storage space and make it as big as the whole collection you are iterating through if you wanted to use an array.
Optimizing your comparison query by figuring out which comparison is most likely to be true and doing that one first could also help. ie: If in general 10% of the time a member of the collection starts with your query, and 30% of the time a member ends with the query, you would want to do the end comparison first.
For your particular example, sorting the collection would help as you could binarychop to the first item that starts with query and terminate early when you reach the next one that doesn't; you could also produce a table of pointers to collection items sorted by the reverse of each string for the second clause.
In general, if you know the structure of the query in advance, you can sort your collection (or build several sorted indexes for your collection if there are multiple clauses) appropriately; if you do not, you will not be able to do better than linear search.
If it's something where you populate the list once and then do many lookups (thousands or more) then you could create some kind of lookup dictionary that maps starts with/ends with values to their actual values. That would be a fast lookup, but would use much more memory. If you aren't doing that many lookups or know you're going to be repopulating the list at least semi-frequently I'd go with the LINQ query that CQ suggested.
You can create some sort of index and it might get faster.
We can build an index like this:
Dictionary<char, List<ClassFoo>> indexByLetter = new Dictionary<char, List<ClassFoo>>();
foreach (var cf in collection) {
    char first = cf.word[0];
    char last = cf.word[cf.word.Length - 1];
    if (!indexByLetter.ContainsKey(first)) indexByLetter[first] = new List<ClassFoo>();
    indexByLetter[first].Add(cf);
    if (!indexByLetter.ContainsKey(last)) indexByLetter[last] = new List<ClassFoo>();
    indexByLetter[last].Add(cf);
}
Then use it like this (a word that ends with the query was indexed under its last character, so both the first and last letter of the query have to be checked):
var candidates = new List<ClassFoo>();
if (indexByLetter.TryGetValue(query[0], out var byFirst)) candidates.AddRange(byFirst);
if (indexByLetter.TryGetValue(query[query.Length - 1], out var byLast)) candidates.AddRange(byLast);
foreach (ClassFoo cf in candidates.Distinct()) {
    if (cf.word.StartsWith(query) || cf.word.EndsWith(query))
        result.Add(cf);
}
Now we probably do not have to loop through as many ClassFoo objects as in your example, but we do have to keep the index up to date. There is no guarantee that it is faster, but it is definitely more complicated.
Depends. Are all your objects always going to be loaded in memory? Do you have a finite limit of objects that may be loaded? Will your queries have to consider objects that haven't been loaded yet?
If the collection will get large, I would definitely use an index.
In fact, if the collection can grow to an arbitrary size and you're not sure that you will be able to fit it all in memory, I'd look into an ORM, an in-memory database, or another embedded database. XPO from DevExpress for ORM or SQLite.Net for in-memory database comes to mind.
If you don't want to go this far, make a simple index consisting of the "bar" member references mapping to class references.
If the set of possible criteria is fixed and small, you can assign a bitmask to each element in the list. The size of the bitmask is the size of the set of the criteria. When you create an element/add it to the list, you check which criteria it satisfies and then set the corresponding bits in the bitmask of this element. Matching the elements from the list will be as easy as matching their bitmasks with the target bitmask. A more general method is the Bloom filter.