I have a sequence of objects, each of which has a sequence number that goes from 0 to ushort.MaxValue (0-65535). I have at most about 10,000 items in my sequence, so there should not be any duplicates, and the items are mostly sorted due to the way they are loaded. I only need to access the data sequentially; I don't need them in a list, if that helps. It is also something that is done quite frequently, so the Big-O cannot be too high.
What is the best way to sort this list?
An example sequence could be (in this example, assume the sequence number is a single byte and wraps at 255):
240 241 242 243 244 250 251 245 246 248 247 249 252 253 0 1 2 254 255 3 4 5 6
The correct order would then be
240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 0 1 2 3 4 5 6
I have a few different approaches, including making an array of ushort.MaxValue size and just incrementing the position, but that seems very inefficient, and I have some problems when the data I receive has a jump in sequence. However, it is O(1) in performance.
Another approach is to order the items normally, then find the split (6-240) and move the first items to the end (a sketch of this idea follows below). But I'm not sure if that is a good idea.
My third idea is to loop through the sequence until I find a wrong sequence number, look ahead until I find the correct one, and move it to its correct position. However, this can potentially be quite slow if there is a wrong sequence number early on.
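For reference, here is a rough sketch of the second idea (sort, find the wrap gap, rotate), using the byte-sized example above. It assumes System.Linq; for ushort, the gap check would use 32768 instead of 128:

int[] seq = { 240, 241, 242, 243, 244, 250, 251, 245, 246, 248, 247,
              249, 252, 253, 0, 1, 2, 254, 255, 3, 4, 5, 6 };

Array.Sort(seq); // O(n log n): 0, 1, ..., 6, 240, ..., 255

// Find the wrap point: the first index where consecutive values jump
// by more than half the sequence space (128 for a byte).
int split = 0;
for (int i = 1; i < seq.Length; i++)
{
    if (seq[i] - seq[i - 1] > 128) { split = i; break; }
}

// Rotate so the run after the gap (240..255) comes first.
var ordered = seq.Skip(split).Concat(seq.Take(split));
Console.WriteLine(string.Join(" ", ordered)); // 240 241 ... 255 0 1 ... 6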
Is this what you are looking for?
var groups = ints.GroupBy(x => x < 255 / 2)
    .OrderByDescending(list => list.ElementAt(0))
    .Select(x => x.OrderBy(u => u))
    .SelectMany(i => i)
    .ToList();
Example
In:
int[] ints = new int[] { 240, 241, 242, 243, 244, 250, 251, 245, 246, 248, 247, 249, 252, 253, 0, 1, 2, 254, 255, 3, 4, 5, 6 };
Out:
240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 0 1 2 3 4 5 6
I realise this is an old question, but I also needed to do this and would have liked an answer, so...
Use a SortedSet<FileData> with a custom comparer, where FileData contains information about the files you are working with, e.g.
struct FileData
{
    public ushort SequenceNumber;
    ...
}
internal class Sequencer : IComparer<FileData>
{
    public int Compare(FileData x, FileData y)
    {
        // Unsigned subtraction wraps around, so the distance from y to x
        // is less than half the range exactly when x is logically ahead of y.
        ushort distance = (ushort)(x.SequenceNumber - y.SequenceNumber);
        if (distance == 0) return 0;
        if (distance < ushort.MaxValue / 2) return 1;
        return -1;
    }
}
As you read file information from disk, add it to your SortedSet.
When you read the records out of the SortedSet, they are now in the correct order.
Note that the SortedSet uses a Red-Black tree internally, which should give you a nice balance between performance and memory:
Insertion is O(log n)
Traversal is O(n)
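A quick usage sketch, assuming the FileData struct and Sequencer comparer above (the sequence numbers here are just illustrative):

var files = new SortedSet<FileData>(new Sequencer());

files.Add(new FileData { SequenceNumber = 2 });     // wrapped past 65535
files.Add(new FileData { SequenceNumber = 65530 });
files.Add(new FileData { SequenceNumber = 5 });
files.Add(new FileData { SequenceNumber = 65534 });

// Enumerates in wraparound order: 65530, 65534, 2, 5
foreach (var file in files)
    Console.WriteLine(file.SequenceNumber);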
Please consider this scenario:
For some math calculations I need to find the number at a specific position in a sorted list. For example, consider this list:
1 - 2 - 3 - ... 17 - 18 - 19 - 20
I need to find the number placed at 25% of the count (count / 4). In the series above I should get 5. It is worth noting that the count may not divide evenly by 4, but that's not a problem.
Now consider this table:
Type Number
----------------------
1 10
1 11
1 12
1 13
2 22
2 23
2 24
2 25
2 26
2 27
2 28
3 39
3 38
3 37
3 36
3 35
3 34
3 33
3 32
4 41
4 43
4 42
4 44
4 45
4 47
4 46
4 48
4 49
4 50
4 51
Another point: I'm sure that every Type has at least 1,000 numbers, so the data above is just an example.
According to the above data, I want to get this result:
Type Number
----------------------
1 11
2 23
3 33
4 43
One way to achieve this result is to loop through the distinct Types, get the list of Numbers for each, sort it, calculate the count of that list and divide it by 4, then round the result and get the Number at the resulting index.
But the problem with this approach is that it needs many connections to the database (one for each Type). Is there a better solution that gets the desired result with one connection and one query execution? Thanks.
Interesting puzzle. In SQL Server you could use something like the following query:
select a.*
from (
    select *, row_number() over (partition by type order by number) as row_number
    from table_name
) a
join (
    select type, count(*) as count
    from table_name
    group by type
) b on a.type = b.type
where a.row_number = b.count / 4
(With whatever rounding you want for when count%4 != 0)
But I can't think how you would build that as a LINQ expression.
var percent = 0.25;

// Group by Type and keep each group's numbers sorted.
var val = res.GroupBy(x => x.type)
    .ToDictionary(x => x.Key, x => x.Select(y => y.number).OrderBy(y => y).ToList());

// Index of the element at the 25% position within each sorted list.
var valuesTobeTaken = val.Select(x => new
{
    x.Key,
    index = ((int)Math.Round(x.Value.Count * percent)) - 1
});
Edge cases are not handled and the code is not optimized much; you can work on that, I guess.
foreach (var rec in valuesTobeTaken)
{
    Console.WriteLine(val[rec.Key][rec.index]);
}
I have a multidimensional string array that looks like this:
example data:
{{20.07.2020 06:00, 20.07.2020 07:00, 150},{20.07.2020 07:00, 20.07.2020 08:00, 130}, {20.07.2020 08:00, 20.07.2020 09:00, 15}, {20.07.2020 09:00, 20.07.2020 10:00, 180}, {20.07.2020 10:00, 20.07.2020 11:00, 100}} etc.
and I need to validate the value part of the array (the 3rd item of each inner array):
1. If the value is empty, replace it with 0 and extract this array into a new ZeroArray (which will contain all the arrays with empty values).
2. If value(i) is greater or smaller than value(i+1) by 50% or more, extract this array into a new ExtremeArray (which will contain all the arrays with extreme values).
3. Get the sum of all values in the array of arrays.
Can anyone give me a hint on how to work with the multidimensional array and get the needed results?
Thank you all in advance.
Since you only asked for a hint (and this feels suspiciously like a homework problem...), I think a good hint is that you're going to need two counters: one for traversing the outer array, and one for traversing each inner array. These counters are going to be nested. Essentially, you have a problem like:
Count from 0 to 5.
But each time you increment a number, count from 20 to 25.
So you'd do something like:
for (var outerCount = 0; outerCount <= 5; outerCount++)
{
    Console.WriteLine($"Counting {outerCount}");
    for (var innerCount = 20; innerCount <= 25; innerCount++)
    {
        Console.WriteLine($"\tCounting {innerCount}");
    }
}
Output
Counting 0
    Counting 20
    Counting 21
    Counting 22
    Counting 23
    Counting 24
    Counting 25
Counting 1
    Counting 20
    Counting 21
    Counting 22
    Counting 23
    Counting 24
    Counting 25
Counting 2
    Counting 20
    Counting 21
    Counting 22
    Counting 23
    Counting 24
    Counting 25
Counting 3
    Counting 20
    Counting 21
    Counting 22
    Counting 23
    Counting 24
    Counting 25
Counting 4
    Counting 20
    Counting 21
    Counting 22
    Counting 23
    Counting 24
    Counting 25
Counting 5
    Counting 20
    Counting 21
    Counting 22
    Counting 23
    Counting 24
    Counting 25
I have been stuck on this problem for 8 weeks now, and I think I almost have a solution, but the last bit of math is racking my mind. I will try to explain a simple problem that requires a complex solution. I am programming in a C#.NET MVC web project. Here is the situation.
I have an unknown group of incoming quantities in which I need to look for like items. Those like items share a max level that makes a full box. Here is an example of this:
Revision******
This is the real-world case: I have many orders, let's say for candy, coming in to a company.
Qty Item MaxFill Sold-To DeliverNumber
60 candy#14 26 Joe 1
1 candy#12 48 Jim 2
30 candy#11 48 Jo 3
60 candy#15 48 Tom 4
6 candy#8 48 Kat 5
30 candy#61 48 Kim 6
44 candy#12 48 Jan 7
10 candy#12 48 Yai 8
10 candy#91 48 Jun 9
55 candy#14 26 Qin 10
30 candy#14 26 Yo 11
40 candy#14 26 Moe 12
In this list I am looking for like candy items to combine, so I can make all the full boxes of candy that I can based on the MaxFill number. Here we see the like items are:
Qty Item MaxFill Sold-To DeliverNumber
60 candy#14 26 Joe 1
55 candy#14 26 Qin 10
30 candy#14 26 Yo 11
40 candy#14 26 Moe 12
1 candy#12 48 Jim 2
44 candy#12 48 Jan 7
10 candy#12 48 Yai 8
Now let's take the first set of numbers, for candy#14.
I know that the total for candy#14 is 185, so I can get 7 full boxes of 26, with the last partial box holding only 3. How do I do this with the values that I have, without losing the information from the original orders? This is how I am working it out right now; see below.
End of Revision******
The max fill level for candy#14 is 26. The candy#14 quantities are:
60
55
30
40
Now, I already have a recursive function that breaks these down to the 26 level, and it is working fine. I feel that I need another recursive function to deal with the remainders that come out of this. As you can see, most of the time there will be remainders from any given list, and those remainders could total up to another full box of 26.
60 = 26+26+8
55 = 26+26+3
30 = 26+4
40 = 26+14
The 8, 3, 4, and 14 total 29, so I can get another 26 out of this. But with real, unknown data, the remainders could produce a new set of remainders that repeats the same situation. To make this even more complicated, I have to keep the data that originally came with the 60, 55, 30, 40, such as who it was sold to and the delivery number. This will also be helpful for knowing how the original amount was broken down and combined.
From the 8, 3, 4, 14, the best approach I could think of is to take the 8, 4, and 14: this gives me the 26 that I am looking for, and I would not have to split any value, because the 3 is the remainder and I could keep all the other data intact. However, that only works in this particular situation. If I go in a linear order instead, 8+3+4=15, so I would have to take 11 from the next value, 14, leaving a remainder of 3.
In reading about different algorithms, I was thinking that this might fall into the NP / NP-complete / NP-hard category. But it is all very technical, and not a lot of real-world scenarios are to be found.
Any suggestions would help here: should I go through the list of numbers to find the best combinations that reach 26, or is the linear progression with splitting of the next value the best solution? I know I can solve for how many full boxes I can get from the remainders and what would be left over (8+3+4+14=29, which gives me one 26 and one 3), but I have no idea how to do this math recursively. I have this much done, and I "feel" that it is on the right track, but I can't see how to adjust it to go linearly or to "test every possible combination".
public static void Main(string[] args)
{
    var numbers = new List<int>() { 8, 3, 4, 14 };
    var target = 26;
    sum_up(numbers, target);
}

private static void sum_up(List<int> numbers, int target)
{
    sum_up_recursive(numbers, target, new List<int>());
}

private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
{
    int s = 0;
    foreach (int x in partial) s += x;

    if (s == target)
    {
        // Report each subset that hits the target exactly.
        Console.WriteLine("sum(" + string.Join(",", partial.ToArray()) + ")=" + target);
    }

    if (s >= target)
        return;

    for (int i = 0; i < numbers.Count; i++)
    {
        List<int> remaining = new List<int>();
        int n = numbers[i];
        for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);

        List<int> partial_rec = new List<int>(partial);
        partial_rec.Add(n);
        sum_up_recursive(remaining, target, partial_rec);
    }
}
I wrote a sample project in JavaScript. Please check my repo:
https://github.com/panghea/packaging_sample
Given a data structure of:
class TheClass
{
    int NodeID;
    double Cost;
    List<int> NodeIDs;
}
And a List with data:
27 -- 10.0 -- 1, 5, 27
27 -- 10.0 -- 1, 5, 27
27 -- 10.0 -- 1, 5, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
27 -- 10.0 -- 1, 4, 25, 26, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
35 -- 10.0 -- 1, 4, 13, 14, 35
I want to reduce it to the unique NodeIDs lists
27 -- 10.0 -- 1, 5, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
27 -- 10.0 -- 1, 4, 25, 26, 27
35 -- 10.0 -- 1, 4, 13, 14, 35
Then I'll be summing the Cost column (Node 27 total cost: 10.0 + 15.5 + 10.0 = 35.5); that part is straightforward.
What is the fastest way to remove the duplicate rows / find uniques?
The production data set will have NodeIDs lists of 100 to 200 IDs, with about 1,500 entries in the List, of which around 500 are unique.
I'm 100% focused on speed; if adding some other data would help, I'm happy to do so (I've tried hashing the lists into a SHA value, but that turned out slower than my current exhaustive search).
.GroupBy(x => string.Join(",", x.NodeIDs)).Select(x => x.First())
That should be faster for big data than Distinct.
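For context, a minimal sketch of that one-liner in place, assuming the nodes list and the TheClass fields from the question:

var unique = nodes
    .GroupBy(x => string.Join(",", x.NodeIDs))
    .Select(g => g.First())
    .ToList();

// Then the per-node cost sum described in the question:
var totals = unique
    .GroupBy(x => x.NodeID)
    .Select(g => new { g.Key, Total = g.Sum(x => x.Cost) });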
If you want to remove duplicate objects according to equal lists, you could create a custom IEqualityComparer<T> for lists and use it with Enumerable.GroupBy. Then you just need to create a new instance of your class for each group and sum up Cost.
Here is a possible implementation:
public class ListEqualityComparer<T> : IEqualityComparer<List<T>>
{
    public bool Equals(List<T> lhs, List<T> rhs)
    {
        return lhs.SequenceEqual(rhs);
    }

    public int GetHashCode(List<T> list)
    {
        unchecked
        {
            int hash = 23;
            foreach (T item in list)
            {
                hash = (hash * 31) + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}
and here is a query that selects one (unique) instance per group:
var nodes = new List<TheClass>(); // fill ....
var uniqueAndSummedNodes = nodes
    .GroupBy(n => n.NodeIDs, new ListEqualityComparer<int>())
    .Select(grp => new TheClass
    {
        NodeID = grp.First().NodeID, // just use the first, change accordingly
        Cost = grp.Sum(n => n.Cost),
        NodeIDs = grp.Key
    });
nodes = uniqueAndSummedNodes.ToList();
This implementation uses SequenceEqual, which takes the order and the number of occurrences of each number in the list into account.
Edit: I've only just seen that you don't want to sum each group's Costs but all groups' Costs; that's simple:
double totalCost = nodes.Sum(n => n.Cost);
If you don't want to sum up within the group itself, replace
...
Cost = grp.Sum(n => n.Cost),
with
...
Cost = grp.First().Cost, // presumes that all are the same
I'm trying to simulate a realistic key press event. For that reason I'm using the SendInput() method, but for a more convincing result I need to specify the delay between the key-down and key-up events. The numbers below show the elapsed time in milliseconds between DOWN and UP events (these are real/valid):
96
95
112
111
119
104
143
96
95
104
120
112
111
88
104
119
111
103
95
104
95
127
112
143
144
142
143
128
144
112
111
112
120
128
111
135
118
147
96
135
103
64
64
87
79
112
88
111
111
112
111
104
87
95
We can simplify the output:
delay 64-88 ms -> 20% of the time
delay 89-135 ms -> 60% of the time
delay 136-150 ms -> 20% of the time
How do I trigger an event according to the probabilities above? Here is the code I'm using right now:
private void button2_Click(object sender, EventArgs e)
{
    textBox2.Focus();
    Random r = new Random();
    int rez = r.Next(0, 5); // 0,1,2,3,4 - five numbers total
    if (rez == 0) // if 20% (1/5)
    {
        Random r2 = new Random();
        textBox2.AppendText(" " + rez + " " + r2.Next(64, 88) + Environment.NewLine);
        // do stuff
    }
    else if (rez == 4) // if 20% (1/5)
    {
        Random r3 = new Random();
        textBox2.AppendText(" " + rez + " " + r3.Next(89, 135) + Environment.NewLine);
        // do stuff
    }
    else // if 1 or 2 or 3 (3/5) -> 60%
    {
        Random r4 = new Random();
        textBox2.AppendText(" " + rez + " " + r4.Next(136, 150) + Environment.NewLine);
        // do stuff
    }
}
There is a huge problem with this code. In theory, after millions of iterations - the resulting graph will look similar to this:
How do I deal with this problem?
EDIT: the solution was to use a distribution, as people suggested.
Here is a Java implementation of such code:
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextGaussian%28%29
and here is a C# implementation:
How to generate normally distributed random from an integer range?
(although I'd suggest decreasing the value of the "deviations" a little).
Here is an interesting MSDN article:
http://blogs.msdn.com/b/ericlippert/archive/2012/02/21/generating-random-non-uniform-data-in-c.aspx
Thanks everyone for the help!
Sounds like you need to generate a normal distribution. The built-in .NET Random class generates a uniform distribution.
Gaussian (normal) distribution random numbers can be produced with the built-in Random class by using the Box-Muller transform.
You should end up with a nice probability curve like this
(taken from http://en.wikipedia.org/wiki/Normal_distribution)
To transform a normally distributed random number into an integer range, the Box-Muller transform can help again. See this previous question and answer, which describes the process and links to the mathematical proof.
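For illustration, here is a minimal C# sketch of the Box-Muller transform; the mean and standard deviation are rough estimates from the sample in the question, not values prescribed by this answer:

private static readonly Random Rng = new Random();

// Standard normal via Box-Muller, then scaled to the desired mean/stdDev.
static double NextGaussian(double mean, double stdDev)
{
    double u1 = 1.0 - Rng.NextDouble(); // in (0, 1], avoids Log(0)
    double u2 = Rng.NextDouble();
    double standardNormal = Math.Sqrt(-2.0 * Math.Log(u1))
                          * Math.Sin(2.0 * Math.PI * u2);
    return mean + stdDev * standardNormal;
}

// Usage: delays with mean ~111 ms and stdDev ~19 ms roughly fit the sample.
// int delay = (int)Math.Round(NextGaussian(111, 19));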
This is the right idea; I just think you need to use doubles instead of ints so you can partition the probability space between 0 and 1. This gives you a finer grain, as follows:
Normalise the real values by dividing them all by the largest value.
Divide the values into buckets; the more buckets, the closer the graph will be to the continuous case.
Now, the larger the bucket, the greater the chance of the event being raised. So partition the interval [0,1] according to how many elements are in each bucket: if you have 20 real values and a bucket has 5 values in it, it takes up a quarter of the interval.
On each test, generate a random number between 0 and 1 using Random.NextDouble(), and whichever bucket the random number falls into, raise an event with that parameter. So for the numbers you provided, here are the values for 5 buckets:
This is a bit much to put in a full code example, but hopefully this gives the right idea.
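Still, here is a minimal sketch of just the sampling step, assuming the three buckets and weights from the question's own summary rather than buckets computed from the raw sample:

var rng = new Random();
// (low ms, high ms, weight) - the weights must sum to 1.0.
var buckets = new (int Low, int High, double Weight)[]
{
    (64, 88, 0.2),
    (89, 135, 0.6),
    (136, 150, 0.2),
};

// Pick a bucket by walking the cumulative weights, then pick a delay
// uniformly inside that bucket.
double roll = rng.NextDouble();
double cumulative = 0.0;
foreach (var b in buckets)
{
    cumulative += b.Weight;
    if (roll < cumulative)
    {
        Console.WriteLine(rng.Next(b.Low, b.High + 1));
        break;
    }
}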
One possible approach would be to model the delays as an exponential distribution. The exponential distribution models the time between events that occur continuously and independently at a constant average rate, which sounds like a fair assumption given your problem.
You can estimate the parameter lambda by taking the inverse of the average of your observed delays, and simulate the distribution by inverse transform sampling, i.e.
delay = -Math.Log(random.NextDouble()) / lambda
However, looking at your sample, the data looks too "concentrated" around the mean to be purely exponential, so simulating that way would produce delays with the proper mean but too much spread to match your sample.
One way to address that is to model the process as a shifted exponential: essentially, the process is shifted by a value that represents the minimum the delay can take, instead of starting at 0 as a plain exponential does. In code, taking the shift to be the minimum observed value in your sample, this could look like this:
var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112
};

var min = sample.Min();
sample = sample.Select(it => it - min).ToList();
var lambda = 1d / sample.Average();

var random = new Random();
var result = new List<double>();
for (var i = 0; i < 100; i++)
{
    var simulated = min - Math.Log(random.NextDouble()) / lambda;
    result.Add(simulated);
    Console.WriteLine(simulated);
}
A trivial alternative, which is in essence similar to Aidan's approach, is to re-sample: pick random elements from your original sample, and the result will have exactly the desired distribution:
var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112
};

var random = new Random();
var size = sample.Count;
for (var i = 0; i < 100; i++)
{
    Console.WriteLine(sample[random.Next(0, size)]);
}