Generate closest teams based on employee schedules in C#

I am given a csv of employee schedules with columns:
employee ID, first last name, sunday schedule, monday schedule, ... , saturday schedule
Each employee has a one-week schedule. I've attached a screenshot of a portion of the CSV file; the full file has around 300 rows.
I need to generate teams of 15 based on the employees' schedules (locations don't matter) so that the employees on each team have the closest schedules to each other. Pseudocode of what I have tried:
parse csv file into array of schedules (my own struct definition)
match employees who have the same exact schedule into teams (this creates ~5 full-sized teams and 20-25 half-filled teams, and leaves ~50 schedules that don't match anyone)
for i = 1 to 14: for each member of a team of size i, find the team with the closest schedule (as a whole) and add the member to that team. Once a team reaches size 15, mark it as "done".
This worked somewhat but definitely did not give me the best teams. My question is does anyone know a better way to do this? Pseudocode or just a general idea will help, thanks.
EDIT: Here is an example of the comparison formula.
The comparison is based on the number of half-hour blocks of difference between the agents' schedules. Agent 25 has a score of 16 because he has a difference of 8 half hours with each of Agents 23 and 24. The team's total score is 32, everyone's scores added together.
Not all agents work 8-hour days, and many have different days off, which have the greatest effect on their "closeness" score. Also, a few agents have a different schedule on a certain day than their normal schedule. For example, one agent might work 7am-3pm on Mondays but 8am-4pm on Tuesday-Friday.
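To make the comparison concrete, here is a minimal C# sketch of that metric, assuming each agent's week is flattened into one boolean slot per half hour; the Schedule type, slot layout, and method names are my own placeholders, not from the original post.

using System.Collections.Generic;

// Hypothetical representation: 7 days * 48 half-hour slots, true = agent works that half hour.
class Schedule
{
    public bool[] Slots = new bool[7 * 48];
}

// Difference between two agents = number of half-hour blocks where exactly one of them is working.
static int Distance(Schedule a, Schedule b)
{
    int diff = 0;
    for (int i = 0; i < a.Slots.Length; i++)
        if (a.Slots[i] != b.Slots[i]) diff++;
    return diff;
}

// Team score as described above: each member's score (its summed differences to teammates) added together,
// so Agent 25's 16 plus 8 each for Agents 23 and 24 gives the team total of 32.
static int TeamScore(IList<Schedule> team)
{
    int total = 0;
    for (int i = 0; i < team.Count; i++)
        for (int j = 0; j < team.Count; j++)
            if (i != j) total += Distance(team[i], team[j]);
    return total;
}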

Unless you find a method that gets you an exact best answer, I would add a hill-climbing phase at the end: repeatedly check whether swapping any pair of agents between teams would improve things, swap them if so, and stop only once every pair of agents has been rechecked with no further improvement found.
I would do this for two reasons:
1) Such hill-climbing finds reasonably good solutions surprisingly often.
2) People are good at finding improvements like this. If you produce a computer-generated schedule and people can find simple improvements (perhaps because they notice they are often scheduled at the same time as somebody from another team) then you're going to look silly.
Thinking about (2), another way to find local improvements would be to look for cases where a small number of people from different teams are scheduled at the same time and see if you can swap them all onto the same team.
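A rough sketch of that swap pass, reusing the Schedule and TeamScore placeholders from the question's comparison example (this is only a sketch of the idea, not a tuned implementation):

using System.Collections.Generic;

// Repeatedly try swapping one member between every pair of teams; keep a swap only if it lowers
// the combined score of the two teams, and stop when a full pass finds no improvement.
static void HillClimb(List<List<Schedule>> teams)
{
    bool improved = true;
    while (improved)
    {
        improved = false;
        for (int a = 0; a < teams.Count; a++)
            for (int b = a + 1; b < teams.Count; b++)
                for (int i = 0; i < teams[a].Count; i++)
                    for (int j = 0; j < teams[b].Count; j++)
                    {
                        int before = TeamScore(teams[a]) + TeamScore(teams[b]);
                        (teams[a][i], teams[b][j]) = (teams[b][j], teams[a][i]);
                        int after = TeamScore(teams[a]) + TeamScore(teams[b]);
                        if (after < before)
                            improved = true;                                        // keep the swap
                        else
                            (teams[a][i], teams[b][j]) = (teams[b][j], teams[a][i]); // undo it
                    }
    }
}

Recomputing TeamScore from scratch in the inner loop is wasteful; caching each team's score and updating it incrementally would keep a full pass over ~300 agents cheap.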

Can't say for sure about the schedules, but string algorithms have the notion of an edit distance. The idea is to define the number of operations you need to perform to turn one string into another. For example, the distance between "kitten" and "sitting" is 3: 2 substitutions and 1 insertion. I think you can define a metric between two employees' schedules in a similar way.
Now, once you have a distance function, you can start clustering. The k-means algorithm may be a good starting point, but its main disadvantage is that the number of groups is fixed up front. Still, I think you can easily adjust the general logic to your needs. After that, you may try other ways to cluster your data, but you really should start with the distance function and then simply tune it on your employee records.
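Since a plain k-means step needs a "mean schedule", which is awkward to define, a k-medoids-style assignment is one way to use an arbitrary distance function like the one sketched above. A minimal sketch of the assignment step only (medoid selection and iteration are left out, and all names are placeholders):

using System;
using System.Collections.Generic;

// Assign every employee to the closest of k medoid schedules using a custom distance function.
// Alternating this step with picking a new medoid per cluster gives a k-medoids-style clustering.
static List<List<Schedule>> AssignToMedoids(IList<Schedule> employees, IList<Schedule> medoids,
                                            Func<Schedule, Schedule, int> distance)
{
    var clusters = new List<List<Schedule>>();
    for (int k = 0; k < medoids.Count; k++)
        clusters.Add(new List<Schedule>());

    foreach (var e in employees)
    {
        int best = 0;
        for (int k = 1; k < medoids.Count; k++)
            if (distance(e, medoids[k]) < distance(e, medoids[best])) best = k;
        clusters[best].Add(e);
    }
    return clusters;
}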

Related

ML.NET: detect anomalies with gaps/missing values in data

I have done some tests with DetectSpike and DetectAnomaly, but despite their differences and the whole pipeline building, both methods require a 1-dimensional float[] as input.
That's fine for finding anomalies and spikes, but I was asked to find both "normal" anomalies and anomalies with gaps (this can be one scenario; there is no need to separate them by kind).
Example 1: a web service receives about 30-100 requests/min, then there's a break/gap with no requests for a few minutes.
Example 2: I receive one invoice with a random value each month, and there are two months without invoices.
I guess I am not able to do this without a Date/Time dimension/column.
Any ideas?
The time series as a 1-dimensional float[] should be a list of counts of events in sequential and equal units of time.
So for your example 1, you could choose minutes, then create a sequential List of counts of web service events occurring within each minute.
Maintain a separate List of the corresponding DateTimes for each item in the count List.
The example in the DetectEntireAnomalyBySrCnn documentation shows how to predict a score for each index of your List of minutes; then you can find the corresponding index in your DateTime List to retrieve the DateTime of the anomaly.
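A minimal sketch of that binning step, assuming you start from raw event timestamps (the names below are mine); the counts list is the 1-dimensional float[] input for the detector, and the parallel minutes list maps each index back to a DateTime:

using System;
using System.Collections.Generic;
using System.Linq;

// Bin raw event timestamps into counts per minute. Minutes with no events get a count of 0,
// which is exactly what lets the anomaly detector see the gaps.
static (List<float> counts, List<DateTime> minutes) BinPerMinute(IEnumerable<DateTime> events)
{
    var ordered = events.OrderBy(e => e).ToList();
    var first = ordered.First();
    var start = new DateTime(first.Year, first.Month, first.Day, first.Hour, first.Minute, 0);
    var end = ordered.Last();

    var counts = new List<float>();
    var minutes = new List<DateTime>();
    for (var t = start; t <= end; t = t.AddMinutes(1))
    {
        minutes.Add(t);
        counts.Add(ordered.Count(e => e >= t && e < t.AddMinutes(1)));
    }
    return (counts, minutes);
}

For the monthly invoice example the unit of time would be a month instead of a minute, but the idea is the same: one count per period, zero for the missing periods.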

EWS Exchange rooms view

I need to make an application which shows the current Exchange meeting rooms and, for each room, whether each hour is free or busy. The user can give a date range of max 5 days to see the result.
I have built this, but it is too slow to use, as it takes up to 3 seconds to get all the information from only 3 meeting rooms (while in the future it will be more like 20).
This is how I work:
Authenticate through the AutodiscoverUrl function: service.AutodiscoverUrl(email, password).
After being given a start date and end date spanning at most 5 days, I first get all the available meeting rooms with service.GetRooms("room@roomlist.com").
I iterate through the found meeting rooms and use the function service.GetUserAvailability(room, ...) to get the calendar events.
Then I have a class which tells me the hours of the day, and I check the found calendar events of the room to see whether an hour is busy or not.
Now I have my collection of rooms with calendar events and the indication of whether an hour is busy or not.
But is there another, faster way? As said, this takes 2-3 seconds for only 3 rooms in a date range of 5 days.
Are you calling the GetUserAvailability request for each room as you iterate through, or batching the users together? The availability call can return info on multiple users (100 is the hard limit, as I recall). It's likely that one big call will be more efficient than multiple single calls.
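A sketch of the batched call with the EWS Managed API, using placeholder room addresses and a placeholder time window (only a sketch; error handling and the 100-attendee limit are ignored):

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Exchange.WebServices.Data;

// One GetUserAvailability request for all rooms instead of one request per room.
static GetUserAvailabilityResults GetRoomAvailability(
    ExchangeService service, IEnumerable<EmailAddress> rooms, DateTime start, DateTime end)
{
    var attendees = rooms.Select(r => new AttendeeInfo(r.Address)).ToList();
    // FreeBusy is enough to mark each hour as free or busy; no need for full details.
    return service.GetUserAvailability(attendees, new TimeWindow(start, end), AvailabilityData.FreeBusy);
}

The AttendeesAvailability entries in the result should come back in the same order as the attendees list, so each set of calendar events can be matched back to its room.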

Linear regression on variables that do not scale directly with the output

I've been trying to follow a machine learning course on Coursera. So far, most of the linear regression models introduced use variables whose numerical values have a positive correlation with the output.
Input: square feet of the house
Output: house price.
I am, however, trying to implement a multivariate regression model where some of the variables have numerical values that are not directly proportional to the output.
Inputs:
-what day is it (Mon, Tues..),
-what holiday is it (NewYear, Xmas..),
-what month is it (Jan, Feb),
-what time is it (0100, 1300..)
Output:
-Number of visitors.
Questions:
For the variables "what day is it", "what holiday is it", and "what month is it", I am using an enumeration and assigning a value to each option (NewYear = 1, Christmas = 2, etc.). Is it better to do it this way or to have separate variables? (IsNewYear, IsChristmas, etc.)
I understand that applying higher powers of a variable can give a better fit, which is what I want for the holidays variable. Are there any methods I can use to let the computer learn the best order by itself?
Are there any existing C# libraries I can use that allow different orders of power for different variables? (e.g. 13 for holidays and quadratic for the time of day)
Thanks.
For the variables "what day is it", "what holiday is it", and "what month is it", I am using an enumeration and assigning a value to each option (NewYear = 1, Christmas = 2, etc.). Is it better to do it this way or to have separate variables? (IsNewYear, IsChristmas, etc.)
You should never encode an order into a variable that does not follow arithmetic: NewYear = 1, Christmas = 2, Thanksgiving = 3 would mean that Christmas = (Thanksgiving + NewYear) / 2, which is not something you would like to have. One-hot encoding (IsNewYear etc.) is preferable so you do not encode false knowledge.
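A minimal sketch of that one-hot encoding for the day-of-week input (holidays and months work the same way; the method name is mine):

using System;

// Encode a day of week as 7 indicator features instead of a single ordinal 0..6 value,
// so the model cannot infer a false ordering such as Wednesday = (Monday + Friday) / 2.
static double[] OneHotDayOfWeek(DayOfWeek day)
{
    var features = new double[7];
    features[(int)day] = 1.0;   // e.g. Tuesday -> [0, 0, 1, 0, 0, 0, 0] (Sunday is index 0)
    return features;
}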
I understand that applying higher powers of a variable can give a better fit, which is what I want for the holidays variable. Are there any methods I can use to let the computer learn the best order by itself?
This is what non-linear methods do: kernel methods (kernelized linear regression, SVR), neural networks, regression trees/forests, etc.
Are there any existing C# libraries I can use that allow different orders of power for different variables? (e.g. 13 for holidays and quadratic for the time of day)
You should not think about it in these terms; you are not supposed to fit powers by hand. Rather, give the model the flexibility to fit higher orders by itself (see the previous point).

Combinatorial optimization where several criteria must be satisfied

We are a group of first-year students studying computer science.
We are working on a project called "The electronic diet plan" (directly translated).
We want to make a program in C# that on a weekly basis calculates a diet plan that fulfils/satisfies some criteria:
Your daily energy intake should not exceed the calculated calorie needs.
(Ex. If we calculate that a person should eat 2000 calories per day, the diet plan should plan approximately 2000 calories)
The daily energy (calories) should be distributed as follows:
Fat 25-35%
Carbohydrates 50-60%
Proteins 10-20%
We have a "database" with food and how much fat, carbohydrates and proteins it contains + the approximate price.
And we have a "database" with recipes and how much time it takes to cook them.
SO: We want to make a program that on a weekly basis calculates a good diet plan which satisfies the daily energy need (and how it should be distributed between fat, carbohydrates, and proteins). The program should also produce a diet plan that does not take a lot of time and does not cost too much (the user defines an upper bound for the price per week).
SO.. We want help finding a method/algorithm that can combine 3-6 dishes per day that satisfies this ^^
We have been looking at a lot of combinatorial optimization algorithms/problems, but mostly "the knapsack problem".
But these algorithms/problems only satisfy one criterion or try to find the "cheapest" solution.
-> We want to satisfy a lot of criteria and find the best solution, not the cheapest (e.g. fat has to be between 25-35%, not just as low as possible).
We hope that some of you can help us with a good algorithm.
When it comes to finding the "cheapest" solution rather than the "best", you'll just have to redefine "cheap".
In optimization theory one often refers to the cost function, which is to be minimized - in your case, "cost" could be "fat percentage point difference from 30%", i.e. it costs nothing to eat 30% fat, and equally much to eat 20% as 40%. Of course, to make the method even more sophisticated, you could weigh it so it's more "expensive" to eat too much fat, than too little.
Now, if you create costs for each of your criteria, you also have to weigh them together, as mellamokb noted in a comment; to do this, simply calculate a weighted total cost. You'll end up with something like the following:
cost of diet = (importance of price) * price + (importance of time) * time + (importance of fat) * (deviation from fat goal) + etc ...
If you want to make it impossible to go over budget (in money spent), you could add terms like (over budget ? infinity : 0) to make the algorithm find solutions within the budget. You can also add constraints for repetition of meals etc. - it's more or less your imagination (and computing power) that sets the limits.
Now that you have a cost function, you can start working on your solution to the problem: minimizing the cost of the diet. And suddenly all those algorithms finding the "cheapest" solution make sense... ;)
Note that formulating this cost function is usually the difficult part. Depending on how you weigh your costs you'll find very different solutions to the problem; not all of them will be useful (in fact most of them probably won't be).
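As an illustration only, a small C# sketch of such a weighted cost function; the DietPlan type, the weights, and the 30% fat target are made up for the example:

using System;

// Hypothetical weekly totals for one candidate plan.
class DietPlan
{
    public double Price;           // total price for the week
    public double CookingMinutes;  // total preparation time
    public double FatPercent;      // share of calories coming from fat
}

// Weighted cost: lower is better. The weights express how important each criterion is.
static double Cost(DietPlan plan, double weeklyBudget)
{
    const double priceWeight = 1.0, timeWeight = 0.5, fatWeight = 10.0;

    double cost = priceWeight * plan.Price
                + timeWeight * plan.CookingMinutes
                + fatWeight * Math.Abs(plan.FatPercent - 30.0);   // deviation from the 30% fat goal

    // Hard constraint: a plan over budget is never acceptable.
    if (plan.Price > weeklyBudget)
        cost = double.PositiveInfinity;

    return cost;
}

Minimizing this cost over candidate plans is then the "cheapest solution" search the knapsack-style algorithms already solve.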

Approach to solve delimited scheduling

I'm facing a problem and I'm having trouble deciding on an approach to solve it. The problem is the following:
Given N phone calls to be made, schedule them in a way that the maximum number of them get made.
Known info:
Number of phone calls pending
Number of callers (people who will talk on the phone)
Type of phone call (Reminder, billing, negotiation, etc...)
Estimate duration of phone call type (reminder:1min, billing:3min, negotiation:15min, etc...)
Ideal date for a given call
"Minimum" date of the a given call (can't happen before...)
"Maximum" date of the a given call (can't happen after...)
A day only have 8 hours
Rules:
Phone calls cannot be made before the "Minimum" or after the "Maximum" date
A reminder call placed awards 1 point; a reminder call missed, -2 points
A billing call placed awards 6 points; a billing call missed, -9 points
A negotiation call placed awards 20 points; a negotiation call missed, -25 points
A phone call to John should be placed by the first person who ever called him. Notice that it does not HAVE TO be, but that call will earn extra points if it is...
I know a little about A.I. and I can recognize this as a problem that fits the class, but I just don't know which approach to take... should I use neural networks? Graph search?
PS: this is not an academic question. This is a real-world problem that I'm facing.
PS2: The point system is still being created... the points sampled here are not the real ones...
PS3: The resulting algorithm can be executed several times (batch-job style) or it can be solved online, depending on performance...
PS4: My contract states that I will charge the client based on: (number of calls I place) + (rate * duration of the call), but there's a clause about quality of service, and only placing reminder calls is not good for me, because even when reminded, people still forget to attend their appointments... which reduces the "quality" of the service I provide... I don't know the exact numbers yet.
This does not seem like a problem for AI.
If it were me I would create a set of rules, ordered by priority. Then start filling in the caller's schedule.
Maybe one of the rules is to assign the shortest-duration call types first (to satisfy the "maximum number of calls made" criterion).
This is sounding more and more like a knapsack problem, where you would substitute in call duration and call points for weight and price.
This is just a very basic answer, but you could try to "brute force" an optimum solution:
Use the Combinatorics library (it's in NuGet too) to generate every permutation of calls for a given person to make in a given time period (looking one week into the future, for instance).
For each permutation, group the calls into 8-hour chunks by estimated duration, and assign a date to them.
Iterate through the chunks - if you get to a call too early, discard that permutation. Otherwise add or subtract points based on whether the call was made before the end date. Store the total score as the score for that permutation.
Choose the permutation with the highest score.
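As a rough sketch of the scoring part of that idea, here is how one candidate ordering of calls could be packed into 8-hour days and scored; the Call type and the way days are filled are my assumptions, and the point values in the comments are the sample ones from the question:

using System;
using System.Collections.Generic;

class Call
{
    public TimeSpan Duration;         // estimated duration for the call type
    public DateTime MinDate, MaxDate; // the call can't happen before MinDate or after MaxDate
    public int PointsIfPlaced;        // e.g. reminder 1, billing 6, negotiation 20
    public int PointsIfMissed;        // e.g. reminder -2, billing -9, negotiation -25
}

// Pack the calls into 8-hour days in the given order and total the points for that ordering.
static int Score(IList<Call> ordering, DateTime firstDay)
{
    var day = firstDay;
    var used = TimeSpan.Zero;
    int score = 0;

    foreach (var call in ordering)
    {
        if (used + call.Duration > TimeSpan.FromHours(8))   // the day is full, move to the next one
        {
            day = day.AddDays(1);
            used = TimeSpan.Zero;
        }

        if (day < call.MinDate)
            return int.MinValue;              // reached a call too early: discard this permutation
        if (day > call.MaxDate)
        {
            score += call.PointsIfMissed;     // past the maximum date: the call is missed
            continue;
        }

        score += call.PointsIfPlaced;
        used += call.Duration;
    }
    return score;
}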
