I'm facing a problem and I'm having trouble deciding on an approach to solve it. The problem is the following:
Given N phone calls to be made, schedule them so that the maximum number of them get made.
Known info:
Number of phone calls pending
Number of callers (people who will talk on the phone)
Type of phone call (Reminder, billing, negotiation, etc...)
Estimated duration per phone call type (reminder: 1 min, billing: 3 min, negotiation: 15 min, etc...)
Ideal date for a given call
"Minimum" date of the a given call (can't happen before...)
"Maximum" date of the a given call (can't happen after...)
A day only has 8 hours
Rules:
Phone calls cannot be made before the "Minimum" or after the "Maximum" date
A reminder call placed awards 1 point; a reminder call missed, -2 points
A billing call placed awards 6 points; a billing call missed, -9 points
A negotiation call placed awards 20 points; a negotiation call missed, -25 points
A phone call to John should be placed by the first person who ever called him. Note that it does not HAVE to be, but the call earns extra points if it is...
I know a little about A.I. and I can recognize this as a problem that fits the class, but I just don't know which approach to take... Should I use neural networks? Graph search?
PS: this is not an academic question. This is a real-world problem that I'm facing.
PS2: The point system is still being designed... the points sampled here are not the real ones...
PS3: The resulting algorithm can be executed several times (batch-job style) or it can run online, depending on performance...
PS4: My contract states that I will charge the client based on (number of calls I place) + (ratio * call duration), but there's a clause about quality of service. Placing only reminder calls is not good for me, because even when reminded, people still forget to attend their appointments, which reduces the "quality" of the service I provide... I don't know the exact numbers yet.
This does not seem like a problem for AI.
If it were me I would create a set of rules, ordered by priority. Then start filling in the caller's schedule.
Maybe one of the rules is to assign the shortest-duration call types first (to satisfy the "maximum number of calls made" criterion).
This is sounding more and more like a knapsack problem, where you would substitute call duration and call points for weight and price.
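To make the analogy concrete, here is a minimal 0/1-knapsack sketch for a single caller's 8-hour day, treating duration as weight and points as value. The Call type and the numbers are illustrative, not the real point system:

using System;
using System.Collections.Generic;

// Illustrative call type; durations and points stand in for the real system.
public record Call(string Type, int DurationMinutes, int Points);

// best[t] = maximum points achievable using at most t minutes of the day.
public static int MaxPoints(IList<Call> calls, int capacityMinutes = 8 * 60)
{
    var best = new int[capacityMinutes + 1];
    foreach (var call in calls)
        for (int t = capacityMinutes; t >= call.DurationMinutes; t--)
            best[t] = Math.Max(best[t], best[t - call.DurationMinutes] + call.Points);
    return best[capacityMinutes];
}

Note this ignores the minimum/maximum date windows; those constraints push the problem from pure knapsack toward scheduling.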
This is just a very basic answer, but you could try to "brute force" an optimum solution:
Use the Combinatorics library (it's in NuGet too) to generate every permutation of calls for a given person to make in a given time period (looking one week into the future, for instance).
For each permutation, group the calls into 8-hour chunks by estimated duration, and assign a date to each chunk.
Iterate through the chunks; if you get to a call too early, discard that permutation. Otherwise, add or subtract points based on whether the call was made before its end date. Store the total as the score for that permutation.
Choose the permutation with the highest score.
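A rough sketch of that loop, with a hand-rolled permutation generator rather than the Combinatorics package so it stays self-contained; the Call shape and the penalty field are assumptions about your data:

using System;
using System.Collections.Generic;
using System.Linq;

// Assumed call shape: MinDate/MaxDate are the "can't happen before/after" bounds.
public record Call(string Person, int DurationMinutes, int Points, int Penalty,
                   DateTime MinDate, DateTime MaxDate);

public static class BruteForce
{
    public static (IList<Call> Order, int Score) BestSchedule(IList<Call> calls, DateTime start)
    {
        IList<Call> bestOrder = null;
        int bestScore = int.MinValue;
        foreach (var order in Permutations(calls))
        {
            int? score = Score(order, start);
            if (score.HasValue && score.Value > bestScore)
            {
                bestScore = score.Value;
                bestOrder = order;
            }
        }
        return (bestOrder, bestScore);
    }

    // Pack the calls, in order, into consecutive 8-hour days and total the points.
    // Returns null for orderings that reach a call before its minimum date.
    static int? Score(IList<Call> order, DateTime start)
    {
        int score = 0, minutesUsedToday = 0;
        var day = start.Date;
        foreach (var call in order)
        {
            if (minutesUsedToday + call.DurationMinutes > 8 * 60)
            {
                day = day.AddDays(1);
                minutesUsedToday = 0;
            }
            minutesUsedToday += call.DurationMinutes;
            if (day < call.MinDate) return null;                       // too early: discard
            score += day <= call.MaxDate ? call.Points : -call.Penalty;
        }
        return score;
    }

    static IEnumerable<IList<T>> Permutations<T>(IList<T> items)
    {
        if (items.Count <= 1) { yield return items; yield break; }
        for (int i = 0; i < items.Count; i++)
        {
            var rest = items.Where((_, j) => j != i).ToList();
            foreach (var tail in Permutations(rest))
                yield return new[] { items[i] }.Concat(tail).ToList();
        }
    }
}

Since n! explodes quickly, this only stays feasible for small batches of calls per person.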
I need to make an application which shows the current Exchange meeting rooms and, for each room, whether each hour is free or busy. The user can give a date range of at most 5 days to see the result.
I have built a solution, but it is too slow to use, as it takes up to 3 seconds to get all the information from only 3 meeting rooms (while in the future it will be more like 20).
This is how I work:
Authenticate through the AutodiscoverUrl function: service.AutodiscoverUrl(email, password).
Given a start date and end date spanning 5 days, I first get all the available meeting rooms with service.GetRooms("room#roomlist.com").
I iterate through the found meeting rooms and call service.GetUserAvailability(room, ...) for each one to get the calendar events.
Then I have a class which tells me the hours of the day, and I check the room's calendar events to see whether each hour is busy or not.
Now I have my collection of rooms with calendar events and an indication of whether each hour is busy or not.
But is there another, faster way? As said, this takes 2-3 seconds for only 3 rooms over a date range of 5 days.
Are you calling the GetUserAvailability request for each room as you iterate through, or batching the users together? The availability call can return info on multiple users (100 is the hard limit, as I recall). One big call will likely be more efficient than multiple single calls.
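If it helps, here is a sketch of the batched call, assuming the EWS Managed API and that roomAddresses, service, startDate and endDate come from the code you describe:

using System;
using System.Linq;
using Microsoft.Exchange.WebServices.Data;

// One availability request covering every room, instead of one request per room.
var attendees = roomAddresses.Select(a => new AttendeeInfo(a)).ToList();
var results = service.GetUserAvailability(
    attendees,
    new TimeWindow(startDate, endDate),   // the user's 5-day range
    AvailabilityData.FreeBusy);

for (int i = 0; i < attendees.Count; i++)
    foreach (CalendarEvent ev in results.AttendeesAvailability[i].CalendarEvents)
        Console.WriteLine($"{attendees[i].SmtpAddress}: {ev.StartTime}-{ev.EndTime} ({ev.FreeBusyStatus})");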
I am given a csv of employee schedules with columns:
employee ID, first last name, sunday schedule, monday schedule, ... , saturday schedule
One week of schedule for each employee. I've attached a screenshot of a portion of the CSV file. The total file has around 300 rows.
I need to generate teams of 15 based on the employees' schedules (locations don't matter) so that the employees on each team have the closest schedules to each other. Pseudocode of what I have tried:
parse csv file into array of schedules (my own struct definition)
match employees who have the exact same schedule into teams (this creates ~5 full-sized teams and 20-25 half-filled teams, and leaves ~50 schedules that don't match anyone)
for i = 1 to 14: for each member of a team of size i, find the team with the closest schedule (as a whole) and add the member to that team. Once a team reaches size 15, mark it as "done".
This worked somewhat, but definitely did not give me the best teams. My question is: does anyone know a better way to do this? Pseudocode or just a general idea will help, thanks.
EDIT: Here is an example of the comparison formula
The comparison is based on half-hour blocks of difference between the agents' schedules. Agent 25 has a score of 16 because he has a difference of 8 half-hours with each of Agents 23 and 24. The team's total score is 32, everyone's scores added together.
Not all agents work 8-hour days, and many have different days off, which have the greatest effect on their "closeness" score. Also, a few agents have a different schedule on certain days than their normal one. For example, one agent might work 7am - 3pm on Mondays but 8am - 4pm Tuesday through Friday.
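In code, that comparison might look like the following, assuming each agent's week is flattened into 336 half-hour slots (7 days x 48), with true meaning the agent works that half hour; the names and representation are illustrative:

using System.Collections.Generic;

// Number of half-hour blocks where two agents' schedules differ.
public static int ScheduleDistance(bool[] a, bool[] b)
{
    int diff = 0;
    for (int i = 0; i < a.Length; i++)
        if (a[i] != b[i]) diff++;
    return diff;
}

// Team score as in the Agent 23/24/25 example: each pairwise difference
// appears in both agents' individual scores, hence the factor of 2.
public static int TeamScore(IList<bool[]> team)
{
    int total = 0;
    for (int i = 0; i < team.Count; i++)
        for (int j = i + 1; j < team.Count; j++)
            total += 2 * ScheduleDistance(team[i], team[j]);
    return total;
}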
Unless you find a method that gets you an exact best answer, I would add a hill-climbing phase at the end that repeatedly checks whether swapping any pair of agents between teams would improve things, and swaps them if so, stopping only when it has rechecked every pair of agents and found no more improvements to make.
I would do this for two reasons:
1) Such hill-climbing finds reasonably good solutions surprisingly often.
2) People are good at finding improvements like this. If you produce a computer-generated schedule and people can find simple improvements (perhaps because they notice they are often scheduled at the same time as somebody from another team) then you're going to look silly.
Thinking about (2), another way to find local improvements would be to look for cases where a small number of people from different teams are scheduled at the same time, and see if you can swap them all onto the same team.
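A sketch of that swap loop, reusing the TeamScore sketch from the question (teams here are just lists of the agents' 336-slot schedules; everything is illustrative):

using System.Collections.Generic;

// Swap agents between teams while any swap lowers the combined score;
// stop when a full pass over every pair finds no improvement.
public static void HillClimb(List<List<bool[]>> teams)
{
    bool improved = true;
    while (improved)
    {
        improved = false;
        for (int t1 = 0; t1 < teams.Count; t1++)
            for (int t2 = t1 + 1; t2 < teams.Count; t2++)
                for (int i = 0; i < teams[t1].Count; i++)
                    for (int j = 0; j < teams[t2].Count; j++)
                    {
                        int before = TeamScore(teams[t1]) + TeamScore(teams[t2]);
                        (teams[t1][i], teams[t2][j]) = (teams[t2][j], teams[t1][i]);
                        if (TeamScore(teams[t1]) + TeamScore(teams[t2]) < before)
                            improved = true;
                        else
                            (teams[t1][i], teams[t2][j]) = (teams[t2][j], teams[t1][i]); // undo the swap
                    }
    }
}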
Can't say for sure about the schedules, but string algorithms offer the edit distance calculation. The idea is to count the number of operations you need to perform to turn one string into another. For example, the distance between kitten and sitting is 3: 2 substitutions and 1 insertion. I think you can define a metric between two employees' schedules in a similar way.
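For reference, the classic dynamic-programming version, which does return 3 for EditDistance("kitten", "sitting"):

using System;

public static int EditDistance(string a, string b)
{
    // d[i, j] = edits needed to turn the first i chars of a into the first j chars of b.
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
            d[i, j] = Math.Min(
                d[i - 1, j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1), // keep or substitute
                Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1));      // delete or insert
    return d[a.Length, b.Length];
}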
Now, once you have a distance function, you can start clustering. The k-means algorithm may be a good start for you, but its main disadvantage is that the number of groups is fixed up front. Still, I think you can easily adjust the general logic to your needs. After that, you may try some additional ways to cluster your data, but you really should start with your distance function, and then simply optimize it on your employee records.
I have the following algorithm:
Given a list of accounts, I have to divide them fairly between system users.
Now, in order to ease the users' workload, I have to split the accounts over days.
So, if an account has service orders, it must be inserted into the list that will be distributed over 547 days (a year and a half). If an account has no service orders, it must be inserted into the list that will be distributed over 45 days (a month and a half).
I am using the following LINQ extension from a question I asked before:
public static IEnumerable<IGrouping<TBucket, TSource>> DistributeBy<TSource, TBucket>(
    this IEnumerable<TSource> source, IList<TBucket> buckets)
{
    // Tag each item with a bucket index, round-robin style (i % buckets.Count),
    // then group the items by the bucket they were assigned to.
    var tagged = source.Select((item, i) => new { item, tag = i % buckets.Count });
    var grouped = from t in tagged
                  group t.item by buckets[t.tag];
    return grouped;
}
and I can guarantee that it works.
My problem is that I don't really know how to unit test all those cases.
I might have 5 users and 2 accounts, which might not be enough to split over a year and a half, or even a month and a half.
I might have exactly 547 accounts and 547 system users, so each account will be handled by its own system user each day.
Basically, I don't know what kind of datasets I should create, or what I should assert on, because there seem to be too many options and I have no idea what the distribution will look like.
Start with boundary conditions (natural limits on the input of the method) and any known corner cases (places where the method behaves in an unexpected manner and you need special code to account for this).
For example:
How does the method behave when there are zero accounts?
Zero users?
Exactly one account and user
547 accounts and 547 users
It sounds like you already know a lot of the expected boundary conditions here. The corner cases will be harder to think of initially, but you will probably have hit some of them as you developed the method. They will also naturally come up through manual testing: each time you find a bug, that is a new necessary test.
Once you have tested boundary conditions and corner cases, you should also look at a "fair" sample of other situations, like your 5 users and 2 accounts example. You can't (or at least arguably don't need to) test all possible inputs to the method, but you can test a representative sample, for things like uneven division of accounts.
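For instance, a couple of those boundary cases as xUnit-style tests against the DistributeBy extension from the question (the test names and values are illustrative):

using System.Linq;
using Xunit;

public class DistributeByTests
{
    [Fact]
    public void ZeroAccounts_YieldsNoGroups()
    {
        var users = Enumerable.Range(0, 5).ToList();
        Assert.Empty(Enumerable.Empty<int>().DistributeBy(users));
    }

    [Fact]
    public void AccountsEqualUsers_EachUserGetsExactlyOne()
    {
        var users = Enumerable.Range(0, 547).ToList();
        var groups = Enumerable.Range(0, 547).DistributeBy(users).ToList();
        Assert.Equal(547, groups.Count);
        Assert.All(groups, g => Assert.Single(g));
    }
}

The zero-users case, by the way, is a genuine corner case in the current implementation: i % buckets.Count throws a DivideByZeroException as soon as a non-empty source is enumerated, which is exactly the kind of behaviour worth pinning down in a test.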
I think part of your issue is that your code describes how you are solving your problem, but not what problem you are trying to solve. Your current implementation is deterministic and gives you an answer, but you could easily swap it with another "fair allocator", which would give you an answer that would differ in the details (maybe different users would be allocated different accounts), but satisfy the same requirements (allocation is fair).
One approach would be to focus on what "fairness" means. Rather than checking the exact output of your current implementation, maybe reframe it so that at a high level, it looks like:
public interface IAllocator
{
    IAllocation Solve(IEnumerable<User> users, IEnumerable<Account> accounts);
}
and write tests which verify not the specific assignment of accounts to users, but that the allocation is fair, however that ends up being defined: something along the lines of "every user should have the same number of accounts allocated, plus or minus one". Defining what fair means, i.e. the exact goal of your algorithm, should help you identify the corner cases of interest. Working off a higher-level objective (the allocation should be fair, rather than one specific allocation) should also let you easily swap implementations and still verify that the allocator is doing its job.
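A sketch of such a test, using the "plus or minus one" definition and an account count that doesn't divide evenly (names are illustrative, and it targets the existing extension rather than the IAllocator interface):

using System.Linq;
using Xunit;

public class FairnessTests
{
    [Fact]
    public void Allocation_IsFair_WhenAccountsDontDivideEvenly()
    {
        var users = Enumerable.Range(0, 5).ToList();    // 5 users
        var accounts = Enumerable.Range(0, 17);         // 17 accounts, not divisible by 5
        var sizes = accounts.DistributeBy(users).Select(g => g.Count()).ToList();
        Assert.True(sizes.Max() - sizes.Min() <= 1);    // fair: sizes differ by at most one
    }
}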
I've been digging around to see if something similar has been done before, but have not seen anything with the mirrored conditions. To make the problem a little easier to swallow, I'm going to present it in the context of filling a baseball team roster.
The given roster structure is organized as such: C, 1B, 2B, 3B, SS, 2B/SS (either or), 1B/3B, OF, OF, OF, OF, UT (can be any position)
Every player is eligible at at least one of the non-backup positions (the backup slots being the ones that list more than one position), and in many cases at several (e.g. a player that can play both 1B and OF). Say that you are the manager of a team which already has some players on it, and you want to see if you have room for a particular player at any of your slots, or if you can move one or more players around to open up a slot where he is eligible.
My initial attempt was to use a conditional permutation and collect into a list all the possible unique "lineups" for each player, updating the open slots before moving on to the next player. Since the order in which a player was moved affects what positions are available for the next player, this also required that the list being looped through be reordered and then looped through again. I still think this is the way to go, but a number of pitfalls have snagged the function.
The data you can assume is given at the start of the loop is:
1. The list of positions the player being evaluated can play (the player being checked to see if he can fit)
2. The list of players currently on the roster and the positions each of them is eligible at (I'm currently storing a list of lists and using the list index as the unique identifier of the player)
3. A list of the positions open on the roster as it currently stands
It's proven a bigger headache than I originally anticipated. A colleague even suggested that my situation (which involves, on a much larger scale, conditional assignments for each object) is NP-complete. I am certain that it is not, since once a player has been repositioned in a particular lineup being tested, the entire roster should not need to be iterated over again after another player has moved. That's the long and short of it, and I finally decided to open it up to the forums.
Thanks for any help anyone can provide. Due to restrictions, I can't post portions of the code (some of it is legacy). It is, however, being translated to .NET (C# at the moment). If additional information is necessary, I'll try to rewrite some of the short pieces of the function to post.
Joseph G.
EDITED 07/24/2010
Thank you very much for the responses. I actually did look into using a genetic algorithm, but ultimately abandoned it because the work that would go into scoring ordinal results was superfluous. The ultimate aim of the test is to determine whether there is, in fact, a scenario that returns a positive. There's no need to determine the relative benefit of each working solution.
I appreciate the feedback on the likely lack of familiarity with the context in which I presented the problem. The actual model is the distribution of build commands across multiple platform-specific build servers. It's accessible, but I'd rather not get into why certain build tasks can only be executed on certain systems and why certain systems can only execute certain types of build commands.
It appears that you have gotten the gist of what I was presenting, but here's a different model that's a little less specific. There is a set of discrete positions in an ordered array of lists, as such (I'll refer to these as "positions"):
((2), (2), (3), (4), (5), (6), (4, 6), (3, 5), (7), (7), (7), (7), (7), (2, 3, 4, 5, 6, 7))
Additionally, there is an unordered array of lists (I'll refer to these as "employees"); an employee can only occupy a slot if its array has a member in common with the ordered list it would be assigned to. After the initial assignments have been made, if an additional employee comes along, I need to determine whether he can fill one of the open positions, and if not, whether the current employees can be rearranged to make one of the positions the employee CAN fill available.
Brute force is something I'd like to avoid, because with this being on the order of 40-50 objects (and soon to be more), individual determinations will be very expensive to calculate at runtime.
I don't understand baseball at all so sorry if I'm on the wrong track. I do like rounders though, but there are only 2 positions to play in rounders, a batter or everyone else.
Have you considered using Genetic Algorithms to solve this problem? They are very good at solving NP-hard problems, and work surprisingly well for rota and timetable scheduling problems as well.
You have a solution model which can easily be scored and easily manipulated which is a great start for a genetic algorithm.
For more complex problems, where there are too many permutations to enumerate, a genetic algorithm should find a near-optimal or excellent solution (along with lots and lots of other valid solutions) in a fairly short amount of time. If you want the true optimum every time, though, you are in all likelihood going to have to brute-force it (I have only skimmed the problem, so this may not be the case, but it sounds like it probably is).
In your example, you would have a solution class representing one solution, i.e. a line-up for the baseball team. You randomly generate, say, 20 solutions, regardless of whether they are valid or not, then you have a rating algorithm that scores each solution. In your case, a better player in the line-up would score more than a worse player, and any invalid line-up (for whatever reason) would force a score of 0.
Any 0-scoring solutions are killed off and replaced with new random ones, and the rest of the solutions breed together to form new solutions. Theoretically, after enough time, the pool of solutions should improve.
This has the benefit of not only finding lots of valid unique line-ups, but also rating them. You didn't specify a need to rate the solutions in your problem, but it offers plenty of benefits (for example, if a player is injured, he can temporarily be rated -10 or whatever). All other players score based on their quantifiable stats.
It's scalable and performs well.
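To make that concrete, here is a toy version of such a loop for the slot model described in the question. The representation (solution[slot] = player index, or -1 for empty), the fitness (number of filled slots, 0 if anything is illegal) and all constants are illustrative placeholders, not a tuned implementation:

using System;
using System.Collections.Generic;
using System.Linq;

class LineupGa
{
    readonly int[][] slots;    // eligible position codes per slot, e.g. {2}, {4, 6}, ...
    readonly int[][] players;  // eligible position codes per player
    readonly Random rng = new Random();

    public LineupGa(int[][] slots, int[][] players)
    {
        this.slots = slots;
        this.players = players;
    }

    // Fitness: number of filled slots; 0 if any assignment is illegal.
    int Score(int[] sol)
    {
        if (sol.Where(p => p >= 0).GroupBy(p => p).Any(g => g.Count() > 1)) return 0;
        int filled = 0;
        for (int s = 0; s < sol.Length; s++)
        {
            if (sol[s] < 0) continue;
            if (!slots[s].Intersect(players[sol[s]]).Any()) return 0; // ineligible player
            filled++;
        }
        return filled;
    }

    // Random valid-ish lineup: each player takes a random eligible free slot.
    int[] RandomLineup()
    {
        var sol = Enumerable.Repeat(-1, slots.Length).ToArray();
        foreach (int p in Enumerable.Range(0, players.Length).OrderBy(_ => rng.Next()))
        {
            var open = Enumerable.Range(0, slots.Length)
                .Where(s => sol[s] == -1 && slots[s].Intersect(players[p]).Any())
                .ToList();
            if (open.Count > 0) sol[open[rng.Next(open.Count)]] = p;
        }
        return sol;
    }

    // Single-point crossover; invalid children score 0 and get culled next round.
    int[] Crossover(int[] a, int[] b)
    {
        int cut = rng.Next(a.Length);
        return a.Take(cut).Concat(b.Skip(cut)).ToArray();
    }

    public int[] Evolve(int generations = 500, int poolSize = 20)
    {
        var pool = Enumerable.Range(0, poolSize).Select(_ => RandomLineup()).ToList();
        for (int g = 0; g < generations; g++)
        {
            pool = pool.Select(s => Score(s) > 0 ? s : RandomLineup()).ToList(); // cull invalid
            pool = pool.OrderByDescending(Score).ToList();
            for (int i = poolSize / 2; i < poolSize; i++)                        // breed from the top half
                pool[i] = Crossover(pool[rng.Next(poolSize / 2)],
                                    pool[rng.Next(poolSize / 2)]);
        }
        return pool.OrderByDescending(Score).First();
    }
}

Usage would be something like new LineupGa(slots, players).Evolve(), with the arrays from the question's model.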
It sounds as though you have a bipartite matching problem. One partition has a vertex for each player on the roster. The other has a vertex for each roster position. There is an edge between a player vertex and a position vertex if and only if the player can play that position. You are interested in matchings: collections of edges such that no endpoint is repeated.
Given an assignment of players to positions (a matching) and a new player to be accommodated, there is a simple algorithm to determine if it can be done. Direct each edge in the current matching from the position to the player; direct the others from the player to the position. Now, using breadth-first search, look for a path from the new player to an unassigned position. If you find one, it tells you one possible series of reassignments. If you don't, there's no matching with all of the players.
For example, suppose player A can play positions 1 or 2
A--1
\
\
2
We provisionally assign A to 2. Now B shows up and can only play 2. Direct the graph:
A->1
 ^
  \
B->2
We find a path B->2->A->1, which means "assign B to 2, displacing A to 1".
There is lots of pretty theory for dealing with hypothetical matchings. Genetic algorithms need not apply.
EDIT: I should add that because of the use of BFS, it computes the least disruptive sequence of reassignments.
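Here is that search sketched in code. canPlay[p] lists the slots player p is eligible for, matchedSlot[p] is p's current slot or -1 (the new player starts unmatched), and the names are mine rather than any library's:

using System.Collections.Generic;
using System.Linq;

// BFS for an augmenting path from newPlayer to a free slot. If one exists,
// reassign along the path (each displaced player moves to the next slot)
// and return true; otherwise the new player cannot be accommodated.
static bool TryAddPlayer(int newPlayer, List<int>[] canPlay, int[] matchedSlot, int slotCount)
{
    var slotOwner = Enumerable.Repeat(-1, slotCount).ToArray();
    for (int p = 0; p < matchedSlot.Length; p++)
        if (matchedSlot[p] >= 0) slotOwner[matchedSlot[p]] = p;

    var slotFrom = Enumerable.Repeat(-1, slotCount).ToArray(); // which player reached each slot
    var queue = new Queue<int>();
    queue.Enqueue(newPlayer);

    while (queue.Count > 0)
    {
        int p = queue.Dequeue();
        foreach (int s in canPlay[p])
        {
            if (slotFrom[s] >= 0) continue;      // slot already explored
            slotFrom[s] = p;
            if (slotOwner[s] < 0)                // free slot: walk the path back
            {
                int cur = s;
                while (true)
                {
                    int q = slotFrom[cur];
                    int vacated = matchedSlot[q];
                    matchedSlot[q] = cur;        // q takes cur...
                    if (q == newPlayer) return true;
                    cur = vacated;               // ...and the slot q vacated is filled next
                }
            }
            queue.Enqueue(slotOwner[s]);         // displace the owner and keep searching
        }
    }
    return false;
}

On the B/2/A/1 example above, this moves A from 2 to 1 and gives 2 to B, and because the search is breadth-first, the first path found is a shortest one, which is what makes the reassignment least disruptive.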
I need to count the amount (in B/kB/MB/whatever) of data sent and received by my PC, by every running program/process.
Let's say I click "Start counting" and I get the sum of everything sent/received by my browser, FTP client, system updates, etc., from that moment until I choose "Stop".
To make it simpler, I want to count data transferred via TCP only - if it matters.
For now, I have the combo-box list of NICs in the PC (based on the comment in the link below).
I tried to adapt the code given here, but I failed, getting strange out-of-nowhere values in dataSent/dataReceived.
I also read the answer to question 442409, but as far as I can see, it is about data sent/received by the same program, which doesn't fit my requirements.
Perfmon should have counters for the sort of thing you want to do, so look there first.
Alright, I think I've found the solution, but maybe someone will suggest something better...
I made a timer (tested with a 10 ms interval) which reads the "Bytes Received/sec" PerformanceCounter value, adds it to a global "temporary" variable, and increments a sample counter (in case the timer lags). Then I made a second timer with a 1 s interval, which takes the temporary sum, divides it by the sample counter, and adds the result to the overall (also global) total. It then resets the temporary sum and the sample counter.
I'm just not sure whether this is the right method, because I don't know how the "Bytes Received/sec" PerformanceCounter value varies within a single second. Maybe I should build some kind of histogram and take an average value?
For now, downloading an 8.6 MB file gave me a 9.2 MB overall total. Is it possible that other processes generated that much network activity in under 20 seconds?
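For reference, a minimal sketch of that sampling approach using the standard System.Diagnostics counters (the NIC instance selection and the 20-second window are illustrative):

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Sample the NIC-wide "Bytes Received/sec" counter once per second and sum the readings.
var nic = new PerformanceCounterCategory("Network Interface").GetInstanceNames().First();
using var counter = new PerformanceCounter("Network Interface", "Bytes Received/sec", nic);

counter.NextValue();                       // the first reading is always 0; prime the counter
double totalBytes = 0;
for (int i = 0; i < 20; i++)               // "count" for 20 seconds
{
    Thread.Sleep(1000);
    totalBytes += counter.NextValue();     // average rate since the last read ~ bytes this second
}
Console.WriteLine($"{totalBytes / (1024 * 1024):F1} MB received");

Note that this counter is per-interface, not per-process, so background traffic and TCP/IP header overhead are included; that alone could plausibly account for the 8.6 MB vs 9.2 MB gap.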