ML.NET: detect anomalies with gaps/missing values in data - C#

I have run some tests with DetectSpike and DetectAnomaly. Despite their differences and the way the whole pipeline is built, both methods require a 1-dimensional float[] as input.
That is fine for finding anomalies and spikes, but I was asked to find both "normal" anomalies and anomalies caused by gaps. (This can be one scenario; there is no need to separate them by kind.)
Example 1: A web service receives about 30-100 requests/min, then there is a break/gap with no requests for a few minutes.
Example 2: I receive one invoice with a random value each month, and then there are two months without invoices.
I guess that I cannot do this without a date/time dimension/column.
Any ideas?

The time series, as a 1-dimensional float[], should be a list of event counts in sequential, equal units of time.
So for your example 1, you could choose minutes, then create a sequential List of counts of web service events occurring within each minute (a minute with no requests gets a count of 0, which is how a gap shows up in the data).
Maintain a separate List of the corresponding DateTimes for each item in the count List.
The example in the DetectEntireAnomalyBySrCnn documentation shows how to predict a score for each index of your List of minutes; you can then look up the same index in your DateTime List to retrieve the DateTime of the anomaly.
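As a rough sketch of the binning step described above (not of the ML.NET call itself), the following turns raw event timestamps into per-minute counts plus a parallel list of minute start times. The class name MinuteBinner and the method Bin are made up for illustration, and the code assumes all timestamps are in the same time zone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class MinuteBinner
{
    // Buckets raw event timestamps into consecutive one-minute bins.
    // Minutes without events get a count of 0, so gaps stay visible in the series.
    public static (List<DateTime> Minutes, float[] Counts) Bin(IEnumerable<DateTime> events)
    {
        var sorted = events.OrderBy(e => e).ToList();
        var minutes = new List<DateTime>();
        if (sorted.Count == 0) return (minutes, Array.Empty<float>());

        DateTime first = Truncate(sorted[0]);
        DateTime last = Truncate(sorted[sorted.Count - 1]);
        int binCount = (int)(last - first).TotalMinutes + 1;

        var counts = new float[binCount];
        for (int i = 0; i < binCount; i++) minutes.Add(first.AddMinutes(i));

        // Same index in both collections: counts[i] belongs to minutes[i].
        foreach (var e in sorted)
            counts[(int)(Truncate(e) - first).TotalMinutes]++;

        return (minutes, counts);
    }

    // Drops seconds and smaller units so the value marks the start of its minute.
    private static DateTime Truncate(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0, t.Kind);
}
```

The Counts array is what would be fed to the detector, and an anomaly reported at index i maps back to Minutes[i].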

Related

EWS exchange rooms view

I need to make an application that shows the current Exchange meeting rooms and, for each room, whether each hour is free or busy. The user can give a date range of at most 5 days to see the result.
I have built something, but it is too slow to use: it takes up to 3 seconds to get all the information for only 3 meeting rooms (while in the future there will be around 20).
This is how I work:
Authenticate through the AutodiscoverUrl function: service.AutodiscoverUrl(email, password).
After being given a start date and end date spanning 5 days, I first get all the available meeting rooms with service.GetRooms("room#roomlist.com")
I iterate through the meeting rooms found and use the function service.GetUserAvailability(room,...) to get the calendar events.
Then I have a class that tells me the hours of the day, and I check the calendar events found for the room to see whether an hour is busy or not.
Now I have my collection of rooms with calendar events and an indication of whether each hour is busy or not.
But is there another, faster way? As said, this takes 2 to 3 seconds for only 3 rooms in a date range of 5 days.
Are you calling the GetUserAvailability request for each room, as you iterate through, or batching the users together? The availability call can return info on multiple users (100 is the hard limit I recall). It's likely one big call will be more efficient than multiple single calls.
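A minimal sketch of the batching idea, assuming the limit of 100 recalled above: it only groups the room addresses, and the actual availability request per group is left out. RoomBatcher and Batch are hypothetical names, not part of the EWS API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; it does not call Exchange.
public static class RoomBatcher
{
    // Splits the room addresses into groups of at most maxPerCall,
    // so each group can be sent in a single availability request.
    public static List<List<string>> Batch(IReadOnlyList<string> rooms, int maxPerCall = 100)
    {
        if (maxPerCall <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerCall));

        var batches = new List<List<string>>();
        for (int i = 0; i < rooms.Count; i += maxPerCall)
        {
            var batch = new List<string>();
            int end = Math.Min(i + maxPerCall, rooms.Count);
            for (int j = i; j < end; j++) batch.Add(rooms[j]);
            batches.Add(batch);
        }
        return batches;
    }
}
```

With around 20 rooms this yields a single group, so one request would replace 20 separate ones.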

Generate closest teams based on employee schedules C#

I am given a csv of employee schedules with columns:
employee ID, first last name, sunday schedule, monday schedule, ... , saturday schedule
1 week schedule for each employee. I've attached a screenshot of a portion of the csv file. The total file has around 300 rows.
I need to generate teams of 15 based on the employees' schedules (locations don't matter) so that the employees on each team have the closest schedules to each other. Pseudocode of what I have tried:
parse csv file into array of schedules (my own struct definition)
match employees who have the same exact schedule into teams (creates ~5 full sized teams, 20 - 25 half filled teams, leaves ~50 schedules who don't match with anyone)
for i = 1 to 14, for each member of teams of size i, find the team with the closest schedule (as a whole) and add the member to that team. Once a team reaches size 15, mark them as "done".
This worked somewhat but definitely did not give me the best teams. My question is does anyone know a better way to do this? Pseudocode or just a general idea will help, thanks.
EDIT: Here is an example of the formula of comparison
The comparison is based on half-hour blocks of difference between the agents' schedules. Agent 25 has a score of 16 because he has a difference of 8 half hours with each of Agents 23 and 24. The team's total score is 32, based on everyone's scores added together.
Not all agents work 8-hour days, and many have different days off, which has the greatest effect on their "closeness" score. Also, a few agents have a different schedule on a certain day than their normal schedule. For example, one agent might work 7am - 3pm on Mondays but work 8am - 4pm Tuesday - Friday.
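One possible way to code the comparison above is sketched below. It assumes a week is stored as 7 x 48 booleans (one per half-hour block) and that "difference" means the number of blocks in which exactly one of the two agents is working. The names ScheduleScore, Weekly, Distance, AgentScore and TeamScore are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scoring helper based on the half-hour comparison described above.
public static class ScheduleScore
{
    public const int BlocksPerWeek = 7 * 48; // 48 half-hour blocks per day

    // Builds a weekly schedule with the same shift on every listed day (0 = Sunday).
    // startHalfHour is inclusive, endHalfHour is exclusive; 14 means 7:00am.
    public static bool[] Weekly(int startHalfHour, int endHalfHour, params int[] days)
    {
        var schedule = new bool[BlocksPerWeek];
        foreach (int day in days)
            for (int block = startHalfHour; block < endHalfHour; block++)
                schedule[day * 48 + block] = true;
        return schedule;
    }

    // Number of half-hour blocks in which exactly one of the two agents is working.
    public static int Distance(bool[] a, bool[] b)
    {
        int diff = 0;
        for (int i = 0; i < BlocksPerWeek; i++)
            if (a[i] != b[i]) diff++;
        return diff;
    }

    // An agent's score is the sum of the distances to every teammate.
    public static int AgentScore(IReadOnlyList<bool[]> team, int agent)
    {
        int score = 0;
        for (int other = 0; other < team.Count; other++)
            if (other != agent) score += Distance(team[agent], team[other]);
        return score;
    }

    // The team's score is the sum of all agents' scores (lower means closer schedules).
    public static int TeamScore(IReadOnlyList<bool[]> team)
    {
        int total = 0;
        for (int agent = 0; agent < team.Count; agent++) total += AgentScore(team, agent);
        return total;
    }
}
```

Under these assumptions, two agents on 7am - 3pm and a third on 9am - 5pm the same day reproduce the numbers in the example: the third agent scores 16 and the team scores 32.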
Unless you find a method that gets you an exact best answer, I would add a hill-climbing phase at the end that repeatedly checks to see if swapping any pair of agents between teams would improve things, and swaps them if this is the case, only stopping when it has rechecked every pair of agents and there are no more improvements to be made.
I would do this for two reasons:
1) Such hill-climbing finds reasonably good solutions surprisingly often.
2) People are good at finding improvements like this. If you produce a computer-generated schedule and people can find simple improvements (perhaps because they notice they are often scheduled at the same time as somebody from another team) then you're going to look silly.
Thinking about (2), another way to find local improvements would be to look for cases where a small number of people from different teams are scheduled at the same time and see if you can swap them all onto the same team.
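The pairwise-swap hill climbing described above could look roughly like this. It is a sketch under assumptions: agents are identified by integer index, the caller supplies the distance function, and team cost is the sum of pairwise distances within the team (half of the "everyone's scores added together" total, so it ranks teams the same way). TeamHillClimber and Improve are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of swap-based hill climbing; team sizes never change.
public static class TeamHillClimber
{
    // Sum of pairwise distances inside one team (lower is better).
    private static int Cost(List<int> team, Func<int, int, int> dist)
    {
        int total = 0;
        for (int i = 0; i < team.Count; i++)
            for (int j = i + 1; j < team.Count; j++)
                total += dist(team[i], team[j]);
        return total;
    }

    // Keeps swapping pairs of agents between teams while a swap lowers the combined cost.
    // Stops after a full pass over every pair finds no improving swap.
    public static void Improve(List<List<int>> teams, Func<int, int, int> dist)
    {
        bool improved = true;
        while (improved)
        {
            improved = false;
            for (int a = 0; a < teams.Count; a++)
            for (int b = a + 1; b < teams.Count; b++)
            for (int i = 0; i < teams[a].Count; i++)
            for (int j = 0; j < teams[b].Count; j++)
            {
                int before = Cost(teams[a], dist) + Cost(teams[b], dist);
                (teams[a][i], teams[b][j]) = (teams[b][j], teams[a][i]);
                int after = Cost(teams[a], dist) + Cost(teams[b], dist);

                if (after < before)
                    improved = true; // keep the swap
                else
                    (teams[a][i], teams[b][j]) = (teams[b][j], teams[a][i]); // undo
            }
        }
    }
}
```

Every accepted swap strictly lowers an integer cost, so the loop terminates. Recomputing the full cost per swap is wasteful but is probably acceptable for about 300 agents; it only finds a local optimum, as the answer says.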
I can't say for sure about the schedules, but in string algorithms there is the edit distance calculation. The idea is to count the number of operations you need to perform to turn one string into another. For example, the distance between kitten and sitting is 3: 2 substitutions and 1 insertion. I think you can define a metric between two employees' schedules in a similar way.
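For reference, the string edit distance mentioned above (Levenshtein distance) can be computed with the usual dynamic-programming table. This is only the string version; how to adapt it to schedules is left open, as in the answer. The class name EditDistance is made up for illustration.

```csharp
using System;

public static class EditDistance
{
    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn s into t.
    public static int Levenshtein(string s, string t)
    {
        var d = new int[s.Length + 1, t.Length + 1];

        // Turning a prefix into the empty string (or back) costs its length.
        for (int i = 0; i <= s.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= t.Length; j++) d[0, j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            for (int j = 1; j <= t.Length; j++)
            {
                int substitution = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,        // deletion
                             d[i, j - 1] + 1),       // insertion
                    d[i - 1, j - 1] + substitution); // substitution or match
            }
        }
        return d[s.Length, t.Length];
    }
}
```

EditDistance.Levenshtein("kitten", "sitting") returns 3.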
Now, once you have a distance function, you can start clustering. The k-means algorithm may be a good start for you, but its main disadvantage is that the number of groups is fixed up front. I think you can easily adjust the general logic to your needs, though. After that, you may try some additional ways to cluster your data, but you really should start with your distance function, and then simply tune it on your employee records.

Two lists of DateTimes - searching for chains of "bumps"

I have an app that connects two clients together. After a client connects, it starts sending "bumps": when a bump hits the server, the time is added to a List<DateTime>. Since two clients are connected together, there are two lists.
What I want to do is look through both lists and find the timespans during which both users were sending bumps within 60 seconds of each other.
Example:
Bumps of user1:
18:28:00
18:28:30
18:29:30
18:30:00
18:30:30
Bumps of user2:
18:29:00
18:30:00
Since user2 sent only two bumps, and user1 was also sending bumps at the same time (60 sec difference), the timespan for both users should be 1 minute.
Is there any algorithm that can compute that?
Edit for clarification: I want each timespan to be as short as possible. There might also be a big gap of nothing and then another timespan (so basically there will be many timespans).
This is very similar to the merge step in merge sort.
If X and Y are the lists of bump times, sort them first. Then keep moving through both lists with the following logic:
If |X[i] - Y[j]| < 60 seconds, "Output something";
If (X[i] < Y[j]) i++;
Else j++;
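A runnable sketch of the merge-style walk above, assuming "output something" means recording the matching pair of bump times. BumpMatcher and Match are hypothetical names, and the window is a parameter that defaults to the 60 seconds from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper implementing the two-pointer walk from the answer.
public static class BumpMatcher
{
    // Returns pairs (one bump from each list) that are less than windowSeconds apart.
    public static List<(DateTime X, DateTime Y)> Match(
        IEnumerable<DateTime> first, IEnumerable<DateTime> second, double windowSeconds = 60)
    {
        var x = first.OrderBy(t => t).ToList();
        var y = second.OrderBy(t => t).ToList();
        var pairs = new List<(DateTime X, DateTime Y)>();

        int i = 0, j = 0;
        while (i < x.Count && j < y.Count)
        {
            if (Math.Abs((x[i] - y[j]).TotalSeconds) < windowSeconds)
                pairs.Add((x[i], y[j]));

            // Advance whichever list is behind, exactly as in the merge step.
            if (x[i] < y[j]) i++;
            else j++;
        }
        return pairs;
    }
}
```

For the example lists this records four pairs, from (18:28:30, 18:29:00) to (18:30:00, 18:30:00). Note that the walk stops as soon as one list is exhausted, so user1's 18:30:30 bump is never compared; turning the pairs into timespans (for instance by chaining pairs that are close together) would be a separate step.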

Approach to solve delimited scheduling

I'm facing a problem and I'm having trouble deciding on an approach to solve it. The problem is the following:
Given N phone calls to be made, schedule them so that as many as possible are made.
Known info:
Number of phone calls pending
Number of callers (people who will talk on the phone)
Type of phone call (reminder, billing, negotiation, etc.)
Estimated duration per phone call type (reminder: 1 min, billing: 3 min, negotiation: 15 min, etc.)
Ideal date for a given call
"Minimum" date of a given call (can't happen before...)
"Maximum" date of a given call (can't happen after...)
A day only has 8 hours
Rules:
Phone calls cannot be made before the "Minimum" or after the "Maximum" date
Reminder call placed awards 1 point, reminder call missed -2 points
Billing call placed awards 6 points, billing call missed -9 points
Negotiation call placed awards 20 points, negotiation call missed -25 points
A phone call to John should be placed by the first person who ever called him. Note that it does not HAVE TO be, but the call will earn extra points if it is...
I know a little about A.I. and I can recognize this as a problem that fits the class, but I just don't know which approach to take... Should I use neural networks? Graph search?
PS: This is not an academic question. This is a real-world problem that I'm facing.
PS2: The point system is still being created... the points sampled here are not the real ones...
PS3: The resulting algorithm can be executed several times (batch job style) or it can be solved online, depending on the performance...
PS4: My contract states that I will charge the client based on: (number of calls I place) + (ratio * the duration of the call), but there's a clause about quality of service, and only placing reminder calls is not good for me, because even when reminded, people still forget to attend their appointments... which reduces the "quality" of the service I provide... I don't know the exact numbers yet.
This does not seem like a problem for AI.
If it were me I would create a set of rules, ordered by priority. Then start filling in the caller's schedule.
Maybe one of the rules is to assign the shortest-duration call types first (to satisfy the "maximum number of calls made" criterion).
This is sounding more and more like a knapsack problem, where you would substitute in call duration and call points for weight and price.
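To make the knapsack analogy concrete, here is a minimal 0/1 knapsack sketch for a single caller and a single 8-hour day (480 minutes), with call duration as the weight and call points as the value. It deliberately ignores the date windows, the missed-call penalties and the "same caller" bonus, so it is an illustration of the substitution rather than a solution. CallKnapsack and BestScore are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: classic 0/1 knapsack over call durations and points.
public static class CallKnapsack
{
    // Returns the highest total points achievable within capacityMinutes,
    // placing each call at most once. Assumes every duration is positive.
    public static int BestScore(
        IReadOnlyList<(int Minutes, int Points)> calls, int capacityMinutes = 480)
    {
        // best[m] = best score using at most m minutes of the day.
        var best = new int[capacityMinutes + 1];

        foreach (var call in calls)
        {
            // Iterate minutes downwards so each call is counted only once.
            for (int m = capacityMinutes; m >= call.Minutes; m--)
                best[m] = Math.Max(best[m], best[m - call.Minutes] + call.Points);
        }
        return best[capacityMinutes];
    }
}
```

With only 20 minutes available and one negotiation (15 min, 20 points), two billing calls (3 min, 6 points each) and three reminders (1 min, 1 point each), the best choice is the negotiation, one billing call and two reminders, for 28 points.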
This is just a very basic answer, but you could try to "brute force" an optimum solution:
Use the Combinatorics library (it's in NuGet too) to generate every permutation of calls for a given person to make in a given time period (looking one week into the future, for instance).
For each permutation, group the calls into 8-hour chunks by estimated duration, and assign a date to them.
Iterate through the chunks - if you get to a call too early, discard that permutation. Otherwise add or subtract points based on whether the call was made before the end date. Store the total score as the score for that permutation.
Choose the permutation with the highest score.
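The brute-force steps above can be sketched as follows for one caller. A small hand-written permutation routine stands in for the Combinatorics library, days are plain integers (0 = first day), and a call that does not fit in the rest of the day is pushed to the next day. Call, BruteForceScheduler and Best are hypothetical names, and the scoring follows the steps in the answer rather than the full rule set from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical call model: duration, allowed day window, and points.
public sealed class Call
{
    public int Minutes, MinDay, MaxDay, Placed, Missed;
}

public static class BruteForceScheduler
{
    private const int MinutesPerDay = 480; // a day only has 8 hours

    // Scores one ordering; returns null if some call is reached before its minimum day.
    private static int? Score(List<Call> order)
    {
        int day = 0, used = 0, score = 0;
        foreach (var call in order)
        {
            if (used + call.Minutes > MinutesPerDay) { day++; used = 0; }
            if (day < call.MinDay) return null;            // too early: discard
            score += day <= call.MaxDay ? call.Placed : call.Missed;
            used += call.Minutes;
        }
        return score;
    }

    // Tries every ordering of the calls and returns the best score (null if none is valid).
    public static int? Best(List<Call> calls)
    {
        int? best = null;
        Permute(calls, 0, ref best);
        return best;
    }

    // Generates permutations in place by swapping each remaining call into position k.
    private static void Permute(List<Call> calls, int k, ref int? best)
    {
        if (k == calls.Count)
        {
            int? s = Score(calls);
            if (s.HasValue && (!best.HasValue || s.Value > best.Value)) best = s;
            return;
        }
        for (int i = k; i < calls.Count; i++)
        {
            (calls[k], calls[i]) = (calls[i], calls[k]);
            Permute(calls, k + 1, ref best);
            (calls[k], calls[i]) = (calls[i], calls[k]); // restore order
        }
    }
}
```

The number of orderings grows as n!, so this is only practical for a handful of calls per caller; beyond roughly ten calls a heuristic would be needed.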

How to disable the booked time slots from a list of business hour time slots

Friends,
I'm working on an appointment booking project. Details are as follows:
Business hours run from 9:00 to 7:00 with a default duration of 30 mins. So, slots start like (9:00, 9:30, 10:00.... 7:00).
To show the available slots, I'm using the following logic.
Store all the slots with 30 min duration in a list (LIST A) like [9:00, 9:30, 10:00, 10:30, ... 7:00]
Loop through the booked appointments (each contains a start and end time), and if a start time matches any of the LIST A elements, remove that element from the list. Then the loop continues.
Here is the problem. Consider an appointment booked for 9:30-10:00.
Based on my logic, 9:30 matches a LIST A element, and it will be removed from the list.
So, the available slots will be displayed as [9:00, X, 10:00, 10:30, .... 7:00]. Actually it should be [9:00, 9:30, 10:30, 11:00... 7:00]
Instead of showing available slots 9:00-9:30, 10:30-11:00, it shows 9:00-10:00, 10:30-11:00, since 9:30 is removed from the list.
Please help me solve this, or suggest some alternative approaches to this problem. Badly needed.
The thing you are mixing up is that you are taking the second slot's start time as the first slot's end time. Rather than doing that, you can store the start time and the duration.
Then, to compute the end time, you simply do
StartTime.AddMinutes(30);
One more comment to finish: you are trying to build a very rigid structure, and you will face problems if you try to extend the application, IMHO.
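A minimal sketch of the start-plus-duration idea: each slot's end is computed from its own start, and a slot is hidden only if it actually overlaps a booked appointment. It uses TimeSpan for the time of day instead of DateTime, and it assumes "7:00" means 19:00. SlotFinder and Available are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for listing free slots within business hours.
public static class SlotFinder
{
    // Returns the start times of the slots that do not overlap any booked appointment.
    public static List<TimeSpan> Available(
        TimeSpan open, TimeSpan close, TimeSpan duration,
        IEnumerable<(TimeSpan Start, TimeSpan End)> booked)
    {
        var bookedList = booked.ToList();
        var free = new List<TimeSpan>();

        for (var start = open; start + duration <= close; start += duration)
        {
            var end = start + duration; // end time derived from start and duration

            // Two intervals overlap when each one starts before the other ends.
            bool taken = bookedList.Any(b => start < b.End && b.Start < end);
            if (!taken) free.Add(start);
        }
        return free;
    }
}
```

With a 9:30-10:00 booking, the 9:00 slot (ending 9:30) and the 10:00 slot both stay available and only the 9:30 slot disappears. The overlap test also covers appointments longer than one slot.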
I suggest that, instead of a single-dimensional array, you use a multidimensional array of [start, end] pairs like
[[9:00, 9:30], [9:30, 10:00], [10:00, 10:30], .... nth item]
Here, the logic should be like this (pseudocode):
var start=[appointment start time]
var end=[appointment end time]
var duration=[slot duration]
for (t=start; t<end; t+=duration)
{
// remove the [start, end] pair in A whose start time equals t
remove(A, pair => pair[0]==t);
}
A.sort();
return A;
