LINQ group collection by an arbitrary lattice - C#

Apologies if I'm missing something very basic.
For a given lattice array in which the lattice values represent the minimum for their bucket, what is the best way to group an array of values?
e.g.
double[] lattice = { 2.3, 2.8, 4.1, 4.7 };
double[] values = { 2.35, 2.4, 2.6, 3, 3.8, 4.5, 5.0, 8.1 };
GroupByLattice(values, lattice);
such that GroupByLattice returns IGroupings that look like:
2.3 : { 2.35, 2.4, 2.6 }
2.8 : { 3, 3.8 }
4.1 : { 4.5 }
4.7 : { 5.0, 8.1 }
edit:
I'm green enough with LINQ queries that this is the best I can come up with:
values.GroupBy( curr => lattice.First( lat => curr > lat) )
Issues with this:
Everything ends up in the first bucket. I can understand why (the first bucket's predicate is satisfied by every value after it), but I'm having a hard time wrapping my head around these in-place operations to get the predicate I actually want.
I suspect that a LINQ query nested inside another LINQ query will not be very performant.
Post-Mortem Solution and Results:
Dmitry Bychenko provided a great answer; I just wanted to provide some follow-up for those who may come across this answer in the future. I had originally been trying to solve: How can I simplify a huge dataset for plotting?
For starters, my first attempt was actually pretty close. With my lattice already ordered, I simply needed to change the .First( ... ) to a .Last( ... )
i.e.
values.GroupBy( curr => lattice.Last( lat => curr > lat) )
That's all well and good, but I was curious how much better Dmitry's solution would perform. I tested it with a random set of 10,000 doubles, with a lattice at a 0.25 spacing. (I pulled the .Select(...) transform out of Dmitry's solution to keep it fair.)
The average of 20 runs spat out the result:
Mine: 602ms
Dmitry's: 3ms
Uh ... WOW! That's a 200x speed-up. 200x! I had to run this a few times and inspect in the debugger just to be certain that the LINQ statement was evaluating before the timestamp (trusty .ToArray() to the rescue). I'll say it now: anyone looking to accomplish this same task should most certainly use this methodology.
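For reference, a minimal sketch of the kind of harness behind those numbers (my reconstruction; the original timing code wasn't posted). The .ToArray() is what forces the lazy query to evaluate inside the timed region:
var sw = System.Diagnostics.Stopwatch.StartNew();
var groups = values.GroupBy(curr => lattice.Last(lat => curr > lat)).ToArray();
sw.Stop();
Console.WriteLine("Mine: {0}ms", sw.ElapsedMilliseconds);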

Provided that lattice is sorted (it's easy to sort the array with Array.Sort(lattice)), you can use Array.BinarySearch:
double[] lattice = { 2.3, 2.8, 4.1, 4.7 };
double[] values = { 2.35, 2.4, 2.6, 3, 3.8, 4.5, 5.0, 8.1 };
var result = values
    .GroupBy(item => {
        int index = Array.BinarySearch(lattice, item);
        // Non-negative index: exact match. Negative: ~index is the first
        // larger element, so ~index - 1 is the bucket's minimum.
        return index >= 0 ? lattice[index] : lattice[~index - 1];
    })
    .Select(chunk => String.Format("{0} : [{1}]",
        chunk.Key, String.Join(", ", chunk)));
Test
Console.Write(String.Join(Environment.NewLine, result));
Outcome
2.3 : [2.35, 2.4, 2.6]
2.8 : [3, 3.8]
4.1 : [4.5]
4.7 : [5, 8.1]
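One caveat (my observation, not part of the original answer): if a value is smaller than lattice[0], then ~index - 1 evaluates to -1 and the indexer throws. A hedged guard for the key selector:
int index = Array.BinarySearch(lattice, item);
// Hypothetical: clamp values below the first lattice point into the first
// bucket instead of throwing IndexOutOfRangeException.
return index >= 0 ? lattice[index] : lattice[Math.Max(~index - 1, 0)];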

If you ever need it faster, you can iterate the arrays only once if both of them are sorted:
double[] lattice = { 2.3, 2.8, 4.1, 4.7 };
double[] values = { 2.35, 2.4, 2.6, 3, 3.8, 4.5, 5.0, 8.1 };
var result = new List<double>[lattice.Length]; // array of lists
for (int l = lattice.Length - 1, v = values.Length - 1; l >= 0; l--) // start from the last elements
{
    result[l] = new List<double>(values.Length / lattice.Length * 2); // optional initial capacity of the list
    for (; v >= 0 && values[v] >= lattice[l]; v--)
    {
        result[l].Insert(0, values[v]); // insert at the front to keep ascending order
    }
}
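To inspect the buckets, a hypothetical print loop (mine, mirroring the output format of the BinarySearch version):
for (int i = 0; i < lattice.Length; i++)
    Console.WriteLine("{0} : [{1}]", lattice[i], String.Join(", ", result[i]));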

Related

Percent change of an IList<T> returning IList<T> using LINQ

Given a List<T> where T is float/decimal/double, compute the percent change of all the values using LINQ.
This is the definition of PercentChange (without error checking, e.g. for when a is zero):
static double PercentChange(double a, double b)
{
    return (b - a) / a;
}
var list = new List<double> { 2.0, 2.5, 2.0, 1.75 };
Then, using LINQ, it would return a new List with one element fewer, with values:
[.25, -.20, -.125]
I know I can loop. I would like a functional version using LINQ.
I'm not sure I'd find this approach more readable than a loop, but the LINQ approach would be:
list.Zip(list.Skip(1), PercentChange)
Outputs:
[.25, -.2, -.125]
The idea is to take the list and "zip" it with itself, but skipping the first element (so b is the next element), and apply your PercentChange function to each pair. Zip automatically truncates the resulting sequence to the length of the shorter sequence, so you end up with three elements.
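If this pairwise pattern comes up often, the same Zip/Skip trick can be wrapped in a reusable extension method. This is a sketch of mine, not from the answer (assuming the usual System and System.Linq usings; note it enumerates the source twice, which matters for non-list sequences):
public static class PairwiseExtensions
{
    // Projects each pair of consecutive elements through selector.
    public static IEnumerable<TResult> Pairwise<T, TResult>(
        this IEnumerable<T> source, Func<T, T, TResult> selector)
    {
        return source.Zip(source.Skip(1), selector);
    }
}
// Usage: list.Pairwise(PercentChange)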
Several solutions are available; I prefer the following one, combining LINQ's Enumerable.Zip with Enumerable.Skip:
var result = list.Zip(list.Skip(1), (a, b) => (b - a) / a);
Console.WriteLine(string.Join(", ", result));
which prints
0.25, -0.2, -0.125
This very compact solution calculates your list as visualized here:
a:    2.0    2.5    2.0    1.75
b:    2.5    2.0    1.75
res:  0.25  -0.2   -0.125

Code efficiency and accuracy

I'm trying to solve a problem on Codewars and the unit tests provided make absolutely no sense...
The problem is as follows, and it sounds simple enough to have something working in 5 minutes:
Consider a sequence u where u is defined as follows:
The number u(0) = 1 is the first one in u.
For each x in u, then y = 2 * x + 1 and z = 3 * x + 1 must be in u too.
There are no other numbers in u.
Ex: u = [1, 3, 4, 7, 9, 10, 13, 15, 19, 21, 22, 27, ...]
1 gives 3 and 4, then 3 gives 7 and 10, 4 gives 9 and 13, then 7 gives 15 and 22 and so on...
Task:
Given parameter n the function dbl_linear (or dblLinear...) returns the element u(n) of the ordered (with <) sequence u.
Example:
dbl_linear(10) should return 22
At first I used a SortedSet with a LINQ query, as I didn't really care about efficiency. I quickly learned that this operation would have to handle ranges where n could equal ~100,000 in under 12 seconds.
So this abomination was born, then butchered time and time again, since a for loop would generate issues for some reason. It was then "upgraded" to a while loop, which passed slightly more unit tests (4 -> 8).
public class DoubleLinear {
    public static int DblLinear(int n) {
        ListSet<int> table = new ListSet<int> { 1 };
        for (int i = 0; i < n; i++) {
            table.Put(Y(table[i]));
            table.Put(Z(table[i]));
        }
        table.Sort();
        return table[n];
    }

    private static int Y(int y) {
        return 2 * y + 1;
    }

    private static int Z(int z) {
        return 3 * z + 1;
    }
}
public class ListSet<T> : List<T> {
    public void Put(T item) {
        if (!this.Contains(item)) // linear scan: O(n) on every insert
            this.Add(item);
    }
}
With this code it still fails the calculation in excess of n = 75000, but passes up to 8 tests.
I've checked whether other people have passed this, and they have. However, I cannot see what they wrote in order to learn from it.
Can anyone provide insight into what could be wrong here? I'm sure the answer is blatantly obvious and I'm being dumb.
Also, is using a custom list in this way a bad idea? Is there a better way?
ListSet is slow for sorting, and you constantly incur memory reallocation as you build the set. I would start by allocating the table at its full size up front, though honestly a bare-bones array of the size you need is best for performance.
If you know you need n = 75,000+, allocate a ListSet (or an array!) of that size. If the unit tests take you into the stratosphere, there is a binary segmentation technique we can discuss, but that's a bit involved and logically tougher to build.
I don't see anything logically wrong with the code. The numbers it generates are correct from where I'm standing.
EDIT: Since you know 3n+1 > 2n+1, you only ever have to maintain 6 values:
Target index in u
Current index in u
Current x for y
Current x for z
Current val for y
Current val for z
public static int DblLinear(int target) {
    int index = 1;
    int ind_y = 1;
    int ind_z = 1;
    int val_y = 3;
    int val_z = 4;

    if (target < 1)
        return 1;

    while (index < target) {
        if (val_y < val_z) {
            ind_y++;
            val_y = 2 * ind_y + 1;
        } else {
            ind_z++;
            val_z = 3 * ind_z + 1;
        }
        index++;
    }

    return (val_y < val_z) ? val_y : val_z;
}
You could turn the val_y if into a while loop (a more efficient critical path) if you either widen the branch to two conditions or implement a back-step loop for when you blow past your target index.
Avoiding memory allocation will definitely speed your calculations up, even if people want to (incorrectly) bellyache about branch prediction in such an easily predictable case.
Also, did you turn optimization on in your Visual Studio project? If you're submitting a binary and not a code file, then that can also shave quite a bit of time.
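For the "current x" values in the list above to track actual elements of u (per the sequence definition, x is an element of u), the generated sequence has to be kept around. A sketch of mine of that classic two-pointer construction (the method name is mine), O(n) time and space:
// Each pointer walks u itself, so y and z are always 2x+1 and 3x+1 for
// real members x of the sequence.
public static int DblLinearTwoPointer(int n)
{
    var u = new int[n + 1];
    u[0] = 1;
    int iy = 0, iz = 0; // pointers into u for the 2x+1 and 3x+1 streams
    for (int k = 1; k <= n; k++)
    {
        int y = 2 * u[iy] + 1;
        int z = 3 * u[iz] + 1;
        u[k] = Math.Min(y, z);
        if (u[k] == y) iy++; // advance whichever stream produced the value;
        if (u[k] == z) iz++; // both advance on a tie, which skips duplicates
    }
    return u[n]; // e.g. n = 10 -> 22
}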

List<Tuple<T>> | AddRange with arrays

I'm wondering how to add arrays to a List<Tuple<double, double>>.
My (short) code:
double[] var1 = new double[5] { 1, 2, 3, 4, 5 };
double[] var2 = new double[5] { 1.5, 1.5, 2.5, 1.2, 1.1 };
List<Tuple<double, double>> tup = new List<Tuple<double, double>>();
I would like to fill tup with my arrays. Unfortunately, for that I would need to instantiate a Tuple for each entry, and I don't know how to do this.
In general I could just use a loop, but this looks dirty to me. My question is about performance and clean code.
if (var1.Length == var2.Length)
{
    for (int i = 0; i < var1.Length; i++)
    {
        tup.Add(new Tuple<double, double>(var1[i], var2[i]));
    }
}
Isn't there a shorter way to achieve this? Maybe some way with AddRange?
You could use LINQ's Enumerable.Zip extension method. Per the docs, this:
Applies a specified function to the corresponding elements of two sequences, producing a sequence of the results.
In this case, we can use Tuple.Create as the function to create a tuple from both elements.
var tup = var1.Zip(var2, Tuple.Create).ToList();
Though note that this could give a slightly different result to your code in the case where the sequences are not the same length. Per the docs again:
If the input sequences do not have the same number of elements, the method combines elements until it reaches the end of one of the sequences
Check out this fiddle for a working demo.
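To make the truncation concrete, a tiny illustration (values are mine):
var shorter = new double[] { 1, 2 };
var longer = new double[] { 10, 20, 30 };
// Zip stops at the end of the shorter input, so the extra 30 is dropped:
var pairs = shorter.Zip(longer, Tuple.Create).ToList(); // [(1, 10), (2, 20)]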
Well, you could use Zip to project the collections to a list of tuples:
var tup = var1.Zip(var2, (v1, v2) => new Tuple<double, double>(v1, v2))
              .ToList();
But personally I find your original method easier to read and to understand the intent. There should be very little performance difference. Shorter code isn't always better code.
You have to loop yourself or let LINQ loop, Enumerable.Zip joins by index:
List<Tuple<double, double>> tup = var1.Zip(var2, (d1, d2) => Tuple.Create(d1, d2))
                                      .ToList();
So this isn't more efficient but might be more readable.

Packing item from set of available packs

Suppose there is an Item that a customer is ordering - in this case it turns out they are ordering 176 (totalNeeded) of this Item.
The database has 5 records associated with this item that this item can be stored in:
{5 pack, 8 pack, 10 pack, 25 pack, 50 pack}.
A rough way of packing this would be:
Sort the array from biggest to smallest.
While (totalPacked < totalNeeded) // 176
{
    1. Maintain an <int, int> dictionary whose keys are pack ids
       and whose values are how many of that pack are needed.
    2. Add the largest pack that is not larger than the amount remaining
       to pack; increment totalPacked by the pack size.
    3. If any remainder is left over after the above, add the smallest pack
       to reduce waste.
       e.g., 4 needed, smallest size is 5, so add one 5-pack; one extra item packed.
}
Based on the above logic, the outcome would be:
You need: 3 x 50 packs, 1 x 25 pack, 1 x 5 pack
Total Items: 180
Excess = 4 items (180 - 176)
The above is not too difficult to code; I have it working locally. However, it is not truly the best way to pack this item. Note: "best" means the smallest amount of excess.
Thus ... we have an 8-pack available, and we need 176. 176 / 8 = 22, so send the customer 22 x 8-packs and they get exactly what they need. Again, this is even simpler than the pseudo-code I wrote: see if the total needed is evenly divisible by any of the packs in the array - if so, at the very least we know we can fall back on 22 x 8-packs being exact.
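That divisibility check is worth a small helper; a minimal sketch (the method name is mine):
// Returns the largest pack size that divides the order exactly, or 0 if
// none does; an exact divisor gives a zero-excess packing, and the largest
// such divisor minimizes the pack count.
static int FindExactPack(int[] packSizes, int totalNeeded)
{
    int best = 0;
    foreach (int p in packSizes)
        if (totalNeeded % p == 0 && p > best)
            best = p;
    return best; // packs {5, 8, 10, 25, 50}, total 176 -> 8 (22 x 8-packs)
}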
In the case that the number is not evenly divisible by an array value, I am attempting to determine the possible ways the array values can be combined to reach at least the number we need (176), and then score the different combinations by the total number of packs needed.
If anyone has some reading that can be done on this topic, or advice of any kind to get me started it would be greatly appreciated.
Thank you
This is a variant of the Subset Sum Problem (Optimization version)
While the problem is NP-complete, there is a pretty efficient pseudo-polynomial time Dynamic Programming solution, following the recursive formulas:
D(x, i) = false                          if x < 0
D(0, i) = true
D(x, 0) = false                          if x != 0
D(x, i) = D(x, i-1) OR D(x - arr[i], i)
The Dynamic Programming solution builds up a table in which an element D[x][i] == true iff you can use the first i kinds of packs to establish the sum x.
Needless to say, D[x][n] == true iff there is a solution using all available pack kinds that sums to x (where n is the total number of pack kinds you have).
To get the "closest higher number", you just need to create a table of size W + pack[0] - 1 (pack[0] being the smallest available pack, W being the sum you are looking for), and choose the value that yields true and is closest to W.
If you wish to give different values to the different pack types, this becomes the Knapsack Problem, which is very similar - but uses values instead of a simple true/false.
Getting the actual "items" (packs) chosen is done by walking back through the table and retracing your steps. This thread and this thread elaborate on how to achieve it in more detail.
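A minimal sketch of the reachability DP in one dimension (my code; the i axis collapses because each pack size can be reused, and it assumes using System.Linq for Min()):
// reachable[x] == true iff some combination of pack sizes sums exactly to x.
// Searching up to W + minPack - 1 is enough: that window always contains a
// multiple of the smallest pack.
public static int ClosestReachableSum(int[] packs, int W)
{
    int minPack = packs.Min();
    int limit = W + minPack - 1;
    var reachable = new bool[limit + 1];
    reachable[0] = true;
    for (int x = 1; x <= limit; x++)
        foreach (int p in packs)
            if (x >= p && reachable[x - p]) { reachable[x] = true; break; }
    for (int x = W; x <= limit; x++)
        if (reachable[x]) return x; // the closest reachable sum >= W
    return -1; // not reachable within the bound
}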
If this example problem is truly representative of the actual problem you are solving, it is small enough to try every combination with brute force using recursion. For example, I found exactly 6,681 unique packings that are locally maximized, with a total of 205 that have exactly 176 total items. The (unique) solution with minimum number of packs is 6, and that is { 2-8, 1-10, 3-50 }. Total runtime for the algorithm was 8 ms.
public static List<int[]> GeneratePackings(int[] packSizes, int totalNeeded)
{
    var packings = GeneratePackingsInternal(packSizes, 0, new int[packSizes.Length], totalNeeded);
    return packings;
}

private static List<int[]> GeneratePackingsInternal(int[] packSizes, int packSizeIndex, int[] packCounts, int totalNeeded)
{
    if (packSizeIndex >= packSizes.Length) return new List<int[]>();
    var currentPackSize = packSizes[packSizeIndex];
    var currentPacks = new List<int[]>();
    if (packSizeIndex + 1 == packSizes.Length) {
        // Last pack size: take as many as fit, maximizing the packing locally.
        var lastOptimal = totalNeeded / currentPackSize;
        packCounts[packSizeIndex] = lastOptimal;
        return new List<int[]> { packCounts };
    }
    for (var i = 0; i * currentPackSize <= totalNeeded; i++) {
        packCounts[packSizeIndex] = i;
        currentPacks.AddRange(GeneratePackingsInternal(packSizes, packSizeIndex + 1, (int[])packCounts.Clone(), totalNeeded - i * currentPackSize));
    }
    return currentPacks;
}
The algorithm is pretty straightforward:
Loop through every combination of number of 5-packs.
Loop through every combination of number of 8-packs, from the remaining amount after deducting the specified number of 5-packs.
And so on, up to 50-packs. For the 50-pack count, directly divide the remainder.
Collect all combinations together recursively (so it dynamically handles any set of pack sizes).
Finally, once all the combinations are found, it is pretty easy to find all packs with least waste and least number of packages:
var packSizes = new int[] { 5, 8, 10, 25, 50 };
var totalNeeded = 176;
var result = GeneratePackings(packSizes, totalNeeded);
Console.WriteLine(result.Count());

// Keep only the packings whose totals hit totalNeeded exactly.
var maximal = result.Where(r => r.Zip(packSizes, (a, b) => a * b).Sum() == totalNeeded).ToList();
var min = maximal.Min(m => m.Sum());
var minPacks = maximal.Where(m => m.Sum() == min).ToList();
foreach (var m in minPacks) {
    Console.WriteLine("{ " + string.Join(", ", m) + " }");
}
Here is a working example: https://ideone.com/zkCUYZ
This partial solution is specifically for your pack sizes of 5, 8, 10, 25, 50, and only for order sizes of at least 40. There are a few gaps at smaller sizes that you'll have to fill another way (specifically at values like 6, 7, 22, 27, etc.).
Clearly, the only way to get any number that isn't a multiple of 5 is to use the 8-packs.
Determine the number of 8-packs needed with modular arithmetic. Since 8 % 5 == 3, the remainders 0, 1, 2, 3, 4 need 0, 2, 4, 1, 3 eight-packs respectively. Something like
public static int GetNumberOf8Packs(int orderCount) {
    int remainder = orderCount % 5;
    return ((remainder % 3) * 5 + remainder) / 3;
}
In your example of 176: 176 % 5 == 1, which means you'll need 2 8-packs.
Subtract the value of the 8-packs to get the number of multiples of 5 you need to fill. At this point you still need to deliver 176 - 16 == 160.
Fill all the 50-packs you can by integer dividing. Keep track of the leftovers.
Now just fit the 5, 10, 25 packs as needed. Obviously use the larger values first.
All together your code might look like this:
public static Order MakeOrder(int orderSize)
{
    if (orderSize < 40)
    {
        throw new NotImplementedException("You'll have to write this part, since the modular arithmetic for 8-packs starts working at 40.");
    }

    var order = new Order();
    order.num8 = GetNumberOf8Packs(orderSize);
    int multipleOf5 = orderSize - (order.num8 * 8);

    order.num50 = multipleOf5 / 50;
    int remainderFrom50 = multipleOf5 % 50;

    // Greedily fill the remainder with the largest packs first.
    while (remainderFrom50 > 0)
    {
        if (remainderFrom50 >= 25)
        {
            order.num25++;
            remainderFrom50 -= 25;
        }
        else if (remainderFrom50 >= 10)
        {
            order.num10++;
            remainderFrom50 -= 10;
        }
        else if (remainderFrom50 >= 5)
        {
            order.num5++;
            remainderFrom50 -= 5;
        }
    }
    return order;
}
A DotNetFiddle
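The Order type isn't shown in the answer; a minimal assumed shape, plus the example order worked through:
// Hypothetical holder matching the fields MakeOrder uses.
public class Order
{
    public int num5, num8, num10, num25, num50;
}

// MakeOrder(176): num8 = 2 (16 items), leaving 160; num50 = 3 (150 items),
// leaving 10; num10 = 1. Total 16 + 150 + 10 == 176 -> zero excess.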

How to extract values from arrays using upper and lower limits?

Given two arrays, I need to extract values from ArrayB based on where the range (actual values) falls in ArrayA.
Index    0    1    2    3    4    5    6    7    8     9    10   11   12
------------------------------------------------------------------------
ArrayA = {0,  0.5,  1,  1.5,  2,  2.5,  3,  3.5,  4,  4.5,   5,  5.5,  6}
ArrayB = {1,  0.2,  3,    4,  5,    6, 5.5,   8,  9, 11.1,  11,   12,  3}
Given the following ranges, I need to extract the following results
RangeToExtract*   IndexInArrayA   Expected Values To Extract
---------------   -------------   --------------------------
0 -> 1            [0, 2]          1, 0.2, 3
1 -> 3            [3, 6]          4, 5, 6, 5.5
3 -> 5            [7, 10]         5.5, 8, 9, 11.1, 11
1 -> 5            [3, 10]         4, 5, 6, 5.5, 8, 9, 11.1, 11
3 -> 10           [7, 12]         8, 9, 11.1, 11, 12, 3
* Refers to the actual values in ArrayA
Note: Given the RangeToExtract (0 -> 1), determine the indexes in ArrayA where these values are; (0 -> 1) maps to [0, 2] (the value 1 is at position 2 in ArrayA).
I only figured out that the following special cases exist (not sure if there are more):
the lower limit is equal to zero, and
the upper limit does not exist in ArrayA
Further info:
Both arrays will be the same size
ArrayA will always be sorted
Code:
private double[] GetRange(double lower, double upper)
{
    var myList = new double[ArrayA.Length];
    var lowerIndex = Array.IndexOf(ArrayA, lower);
    var upperIndex = Array.IndexOf(ArrayA, upper);

    // special case 1: unless the lower limit is the first element,
    // the range excludes the lower value itself
    if (lowerIndex != 0)
    {
        lowerIndex = lowerIndex + 1;
    }

    // special case 2: upper limit not found in ArrayA -> run to the end
    if (upperIndex == -1)
    {
        upperIndex = ArrayA.Length - 1;
    }

    for (int i = lowerIndex; i <= upperIndex; i++)
    {
        myList[i] = ArrayB[i];
    }
    return myList;
}
Given the above code, have all the special cases been taken into account? Is there a better way to write the above code?
Yep! There is a much better way, and it comes with lovely LINQ. I'll put it here in two forms. The first looks complicated, but it isn't at all! Believe me ;)
The first step is to take out those indexes of A whose values fall into your range (I call it min...max). Based on your example, your range is open at the lower boundary and closed at the upper: when you mention 3 -> 5, it is actually (3, 5], so it does not contain 3 but does contain 5. Anyway, that is not the main point.
This can be done with the following LINQ:
int[] selectedIndexes = a.Select((value, index) => new { Value = value, Index = index })
                         .Where(aToken => aToken.Value > min && aToken.Value <= max)
                         .Select(t => t.Index)
                         .ToArray();
The first Select generates a collection of [Value, Index] pairs, where the first element is the array value and the second is its index within the array. I think this is the main trick for your question: it gives you the ability to work with indexes just like with ordinary values.
Finally, in the second Select, I just wrap the indexes into an integer array. After this you have all the indexes whose values fall in the given range.
Now the second step!
Once you have those indexes, you have to select the elements of B that sit at the indexes selected from A. So the same thing is done over B: we again project B into a collection of [Value, Index] pairs, and then keep those whose indexes exist within the selected indexes from A. This can be done as follows:
double[] selectedValues = b.Select((item, index) => new { Item = item, Index = index })
                           .Where(bToken => selectedIndexes.Contains(bToken.Index))
                           .Select(d => d.Item)
                           .ToArray();
OK, so the first Select is the one I talked about in the first part; then look at the Where section, which checks whether the index of bToken (an element of B) exists in selectedIndexes (from A) or not.
Finally, I wrap both pieces of code into one, as below:
double[] answers = b.Select((item, index) => new { Item = item, Index = index })
                    .Where(bToken => a.Select((value, index) => new { Value = value, Index = index })
                                      .Where(aToken => aToken.Value > min && aToken.Value <= max)
                                      .Select(t => t.Index)
                                      .Contains(bToken.Index))
                    .Select(d => d.Item)
                    .ToArray();
Buy me a beer if it turns out useful :)
I don't know if you're still interested, but I saw this one and liked the challenge. If you use .NET 4 (which has the Enumerable.Zip method), there is a very concise way to do this (given the conditions under "Further info"):
arrayA.Zip(arrayB, (a, b) => new { a, b })
      .Where(x => x.a > lower && x.a < upper)
      .Select(x => x.b)
You may want to use >= and <= to make the range comparisons inclusive.
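For completeness, a self-contained sketch (mine; the method name and the special-case handling are assumptions read off the question's code) combining the Zip approach with both special cases:
static double[] ExtractRange(double[] arrayA, double[] arrayB, double lower, double upper)
{
    // Special case 1: when lower is the first element of ArrayA, include it.
    bool includeLower = arrayA.Length > 0 && arrayA[0] == lower;
    // Special case 2: when upper is absent from ArrayA, run to the end.
    bool upperMissing = Array.IndexOf(arrayA, upper) < 0;
    return arrayA.Zip(arrayB, (a, b) => new { a, b })
                 .Where(x => (includeLower ? x.a >= lower : x.a > lower)
                          && (upperMissing || x.a <= upper))
                 .Select(x => x.b)
                 .ToArray();
}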
