Normalize ConcurrentDictionary of arrays - c#

I have a ConcurrentDictionary of arrays, where each array has the same fixed size. It looks like this: ConcurrentDictionary<int, double[]> ItemFeatures
I want to normalize the values in the list by dividing all the values by the maximum of the values in that column. For example, if my lists are of size 5, I want every element in the first position to be divided by the maximum of all the values in that position, and so on for position 2 onwards.
The naive way that I can think of doing this, is by first iterating over every list and every element in the list, and storing the max value per position. Then iterating over them again and dividing them by the previously found maximum values.
Is there a more elegant way to do this in Linq perhaps? These dictionaries would be large, so the more efficient/least time consuming, the better.

No, that will actually be the most efficient way. In the end, this is what you need to do anyway, you can't skip anything. You can probably write it in LINQ somehow, but the performance will be worse because it will have a lot of function calls and memory allocations. LINQ doesn't perform miracles, it's just a (sometimes) shorter way of writing things.
What can speed this up is if your algorithm has a good "cache locality" - in other words, if you access the computer memory in a sequential way. That's pretty hard to guarantee in an environment like .NET, but a loop like you described probably has the best chances of getting close to it.

LINQ is designed for querying data, not modifying data. You can use a little LINQ to compute the maximums, but that is about it:
var cols = ItemFeatures.First().Value.Length;
var rows = ItemFeatures.Values; // Values copies into a snapshot, so take it once
var maxv = new double[cols];
for (var j1 = 0; j1 < cols; ++j1)
    maxv[j1] = rows.Select(vs => vs[j1]).Max();
foreach (var row in rows)
    for (var j1 = 0; j1 < cols; ++j1)
        row[j1] /= maxv[j1];
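For comparison, here is a minimal sketch of the plain two-pass loop described in the question, with no LINQ at all. The class and method names (FeatureNormalizer, NormalizeByColumnMax) are made up for illustration. It assumes all arrays have the same length and that no other thread adds entries or writes to the arrays while it runs. Skipping columns whose maximum is 0 is my own choice to avoid dividing by zero.

```csharp
using System;
using System.Collections.Concurrent;

static class FeatureNormalizer
{
    // Divides every column by that column's maximum, in place.
    // Assumption: equal-length arrays, no concurrent writers during the call.
    public static void NormalizeByColumnMax(ConcurrentDictionary<int, double[]> itemFeatures)
    {
        double[] max = null;

        // Pass 1: a single sweep over the dictionary collects all column maximums.
        foreach (var kvp in itemFeatures)
        {
            double[] row = kvp.Value;
            if (max == null)
            {
                max = (double[])row.Clone(); // first row seeds the maximums
                continue;
            }
            for (int j = 0; j < max.Length; j++)
                if (row[j] > max[j]) max[j] = row[j];
        }
        if (max == null) return; // empty dictionary, nothing to do

        // Pass 2: divide each value by its column maximum.
        foreach (var kvp in itemFeatures)
        {
            double[] row = kvp.Value;
            for (int j = 0; j < max.Length; j++)
                if (max[j] != 0.0) row[j] /= max[j]; // leave all-zero columns untouched
        }
    }
}
```

Each array is walked front to back, which is about as cache-friendly as this gets in .NET.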

Related

Growing and shrinking a List<int> vs a big sized bool Array using the index as value

I'm unable to determine whether having a growing and shrinking List vs using a big bool Array will be more efficient for my application.
To expand on this comparison and the actual situation, here are examples of each option I believe I have:
Option 1 (List):
public List<int> list = new List<int>();
while (true) { // game loop
    list.Add(Random.Range(0, 300));
    list.Add(Random.Range(0, 300));
    ... // maximum of 10 of these can happen
    if (list.Contains(42)) { // roughly 10 - 50 of these checks can be true
        list.Remove(42);
    }
}
Option 2 (Array):
bool[] arr = new bool[300];
while (true) { // game loop
    arr[Random.Range(0, 300)] = true;
    arr[Random.Range(0, 300)] = true;
    ... // maximum of 10 of these can happen
    for (int i = 0; i < 300; i++) {
        if (arr[i]) { // roughly 10 - 50 of these checks can be true
            arr[i] = false;
        }
    }
}
So essentially my question is:
At what point do too many .Contains checks become more expensive than a for loop over each possible element (based on my ranges)?
IMPORTANT
This is not a List vs Array question. The datatypes are important because of the condition checks. So it is specifically an integer list vs bool array comparison since these two options can give me the same results.
I would say the array implementation would be much faster. Besides the cost of growing the internal array when you call List.Add(T) and of shifting elements when you call List.Remove(T), the List implementation code shows that List.Contains(T) and List.Remove(T) both perform a linear search through the list internally (see IndexOf(T)). In your example you call List.Contains(T) and List.Remove(T) around 10-50 times. In the best case that costs about 20 operations (contains + remove), but in the worst case it costs (N * 50) + N, where N is the number of items in your list.
From this I would conclude that performance gets much worse as your list grows.
If performance matters, it may be worth looking at the HashSet<T> data structure. It has much better lookup and removal performance than a List.
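A rough sketch of the HashSet idea, assuming duplicates do not matter (which is also true of the bool array). PendingSet and TryConsume are hypothetical names, not anything from the question.

```csharp
using System.Collections.Generic;

static class PendingSet
{
    // Removes `id` if it is present and reports whether it was there.
    // HashSet<int>.Remove returns false for a missing item, so a
    // separate Contains call before it is not needed.
    public static bool TryConsume(HashSet<int> pending, int id)
    {
        return pending.Remove(id);
    }
}
```

In the game loop, pending.Add(value) replaces list.Add(value), and each Contains/Remove pair becomes a single TryConsume call that does not scan the whole collection.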
Here's an interesting writeup on Array vs List for both for, foreach, EnumerableForEach and Sum by Jon Skeet:
https://codeblog.jonskeet.uk/2009/01/29/for-vs-foreach-on-arrays-and-lists/
As per the article, the performance goes like this:
============ int[] ============
For 1.00
ForHoistLength 2.03
ForEach 1.36
IEnumerableForEach 15.22
Enumerable.Sum 15.73
============ List<int> ============
For 2.82
ForHoistLength 3.49
ForEach 4.78
IEnumerableForEach 25.71
Enumerable.Sum 26.03
By these numbers, a for loop over an int[] is about 2.8 times faster than the same loop over a List<int>. If you know the size up front and it is fixed, go with an array; otherwise use a List.
Here is another link: Performance of Arrays vs. Lists
Also, stay away from LINQ for large data and go with for/foreach loops.

Pull elements from array with property value under or over X

I have a very large list of a custom class. I often need to perform a task based on only elements from the list where a custom value of the class is over or under a specific threshold.
Currently, I do something like this:
// Sort the customList by its X value (sometimes ascending, sometimes descending)
customList.Sort((a, b) => b.X.CompareTo(a.X));
// Iterate through the list until the X value is not within the necessary range
for (int i = 0; i < customList.Count; i++)
{
    if (customList[i].X < .5f) break;
    PerformTask(customList[i]);
}
This isn't a huge bottleneck, but it would be best if I can speed up this kind of task for this application (not to mention I am always wanting to learn things like this).
So the question is, is there a much faster sorting method without writing it myself and/or is there a faster way to run PerformTask on the elements meeting specific criteria without iterating over all elements?
My question might also be better asked in regards to keeping a list sorted not just when adding/removing items, but also when changing the values they are sorted on...
Thanks,
Tim
Sorting is the wrong approach here. Even with a very efficient algorithm it costs O(n log n), whereas filtering is a single O(n) pass. Use Enumerable.Where:
foreach (var item in customList.Where(n => n.X >= 0.5f))
{
    PerformTask(item);
}

What is an elegant way to find min value in a subset of an array?

I have an array a of 100 integers. What is a recommended way to find the min value in a[3] through a[70] AND the index of this min value? Assuming no duplication of values.
I know the clumsy way of looping through the relevant range of indices:
for (i = 3; i < 70; i++)
{
...
}
I am looking for a more elegant way of doing this in C# instead of looping. Thanks.
To find the minimum:
List<int> templist = a.Skip(3).Take(67).ToList();
int minimum = templist.Min();
For the index:
int index = templist.FindIndex(i => i == minimum) + 3;
I added 3 because the index in the list is 3 less than the index in the original sequence a.
What it is doing:
Skip - Skips the first 3 values (indices 0, 1, 2) and returns the rest of the array.
Take - Takes 67 values from the sequence returned by Skip. (Your for loop starts at 3 and stops before 70, so it covers 70 - 3 = 67 items. If a[70] should be included, take 68 instead.)
ToList - Converts the returned sequence to a List so the index can be found.
Min - Gets the minimum of it.
Some loop is unavoidable since this is a sequence. You asked for something elegant, so I used LINQ instead of a for loop (even though LINQ loops internally as well).
If your data structure is not sorted, there is no way to do this without looping through all the elements in the sublist, even if the looping is implicit in an API call.
You cannot use a sorted collection since you are working on a subpart of it (you would need to create a sorted collection just for that part of the list), so in any case you'll have to loop over it.
LINQ's Aggregate is not the easiest to read, but it is arguably the least inefficient of the "elegant" solutions, though it still takes more lines of code than the straightforward loop. Iterating yourself is still the best option because you do not allocate any additional memory.
But anyway, should you feel the need to make your successor hang you in effigy, you can do this instead of a straightforward loop:
var minValueAndItsIndex = a
    .Skip(3)
    .Take(70 - 3)
    .Select((value, index) => new { Value = value, Index = index + 3})
    .Aggregate((tuple1, tuple2) => (tuple1.Value < tuple2.Value) ? tuple1 : tuple2);
If you create a 2-item ValueType-based tuple and use that instead of the anonymous type, it avoids allocating an object per element, which brings it closer to the more efficient direct iteration.
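To make the comparison concrete, here is a sketch of both variants side by side, using C# 7 value tuples. RangeMin, WithAggregate and WithLoop are names I made up. Both treat start as inclusive and end as exclusive, matching the i < 70 loop in the question. Note that the LINQ operators themselves still allocate a few iterator objects, so the plain loop remains the leaner of the two.

```csharp
using System;
using System.Linq;

static class RangeMin
{
    // LINQ version: value tuples instead of an anonymous type,
    // so no object is allocated per element.
    public static (int Value, int Index) WithAggregate(int[] a, int start, int end)
    {
        return a.Skip(start)
                .Take(end - start)
                .Select((value, offset) => (Value: value, Index: offset + start))
                .Aggregate((best, next) => next.Value < best.Value ? next : best);
    }

    // Plain loop: tracks only the index of the smallest element seen so far.
    public static (int Value, int Index) WithLoop(int[] a, int start, int end)
    {
        if (start >= end) throw new ArgumentException("Range is empty.");
        int minIndex = start;
        for (int i = start + 1; i < end; i++)
            if (a[i] < a[minIndex]) minIndex = i;
        return (a[minIndex], minIndex);
    }
}
```

For the question as asked, RangeMin.WithLoop(a, 3, 70) covers a[3] through a[69]; pass 71 as the end if a[70] should be included.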

What is the fastest way to calculate frequency distribution for array in C#?

I am just wondering what is the best approach for that calculation. Let's assume I have an input array of values and array of boundaries - I wanted to calculate/bucketize frequency distribution for each segment in boundaries array.
Is it good idea to use bucket search for that?
Actually I found that question Calculating frequency distribution of a collection with .Net/C#
But I do not understand how to use buckets for that purpose cause the size of each bucket can be different in my situation.
EDIT:
After all discussions I have inner/outer loop solution, but still I want to eliminate the inner loop with a Dictionary to get O(n) performance in that case, if I understood correctly I need to hash input values into a bucket index. So we need some sort of hash function with O(1) complexity? Any ideas how to do it?
Bucket Sort is already O(n^2) worst case, so I would just do a simple inner/outer loop here. Since your bucket array is necessarily shorter than your input array, keep it on the inner loop. Since you're using custom bucket sizes, there are really no mathematical tricks that can eliminate that inner loop.
int[] freq = new int[buckets.Length - 1];
foreach (int d in input)
{
    for (int i = 0; i < buckets.Length - 1; i++)
    {
        if (d >= buckets[i] && d < buckets[i+1])
        {
            freq[i]++;
            break;
        }
    }
}
It's also O(n^2) worst case but you can't beat the code simplicity. I wouldn't worry about optimization until it becomes a real issue. If you have a larger bucket array, you could use a binary search of some sort. But, since frequency distributions are typically < 100 elements, I doubt you'd see a lot of real-world performance benefit.
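If the bucket array does get large, the binary search mentioned above could look roughly like this. It is only a sketch: Histogram and Count are hypothetical names, and it assumes the boundaries are sorted ascending, distinct, and at least two in number. Bucket i covers the half-open range from boundaries[i] up to (not including) boundaries[i+1], the same as the inner/outer loop version.

```csharp
using System;

static class Histogram
{
    // Assumption: boundaries is sorted ascending with distinct values.
    public static int[] Count(int[] input, int[] boundaries)
    {
        var freq = new int[boundaries.Length - 1];
        foreach (int d in input)
        {
            int pos = Array.BinarySearch(boundaries, d);
            // When the value is not found, BinarySearch returns the bitwise
            // complement of the index of the first larger element, so the
            // bucket is the one just to the left of that index.
            int bucket = pos >= 0 ? pos : ~pos - 1;
            // Values below the first or at/above the last boundary are ignored.
            if (bucket >= 0 && bucket < freq.Length) freq[bucket]++;
        }
        return freq;
    }
}
```

This replaces the inner loop with an O(log m) lookup per value, where m is the number of boundaries.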
If your input array represents real-world data (with its patterns) and the array of boundaries is too large to iterate again and again in an inner loop, you can consider the following approach:
First of all, sort your input array. If you work with real-world data, I would recommend considering Timsort for this. It provides very good performance guarantees for the patterns that can be seen in real-world data.
Then traverse the sorted array and compare each value with the current value in the array of boundaries, starting with the first:
If the value in the input array is less than the boundary, increment the frequency counter for this boundary.
If the value in the input array is greater than or equal to the boundary, move on to the next value in the array of boundaries and increment the counter for the new boundary.
In code it can look like this:
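Timsort(myArray); // placeholder for your sort of choice; Array.Sort(myArray) also works
int[] boundaries = GetBoundaries(); // assumed here: sorted upper bounds, one per bucket
int[] counts = new int[boundaries.Length]; // counters kept separate from the boundaries
int boundPos = 0;
for (int i = 0; i < myArray.Length; i++) {
    // skip boundaries the current value has passed; empty buckets keep a count of 0
    while (boundPos < boundaries.Length - 1 && myArray[i] >= boundaries[boundPos]) {
        boundPos++;
    }
    counts[boundPos]++; // values beyond the last boundary land in the last bucket
}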

How do I insert an int into a sorted array quickly?

I'd like to insert an int into a sorted array. This operation is going to be performed very often, so it needs to be as fast as possible.
It is possible and even preferred to use a List or any other class instead of an array
All values are in the 1 to 34 range
The array typically contains exactly 14 values
I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide. Also, I felt like I missed an idea. Do you have experiences on this topic or any new ideas to consider?
I would use an int array of length 35 (because you said the range is 1-34) to record how many times each number occurs.
int[] status = Enumerable.Repeat(0, 35).ToArray();
// an array containing 35 zeros,
// which means there are currently no elements in the collection
status[10] = 1; // now the collection holds only one number: 10
status[11]++; // a new number, 11, is added to the collection
So if you want to add a number i to the list:
status[i]++; // O(1) to add a number
To remove an i from the list:
status[i]--; // O(1) to remove a number
Want to know all the numbers in the list?
for (int i = 0; i < status.Length; i++)
{
    if (status[i] > 0)
    {
        for (int j = 0; j < status[i]; j++)
            Console.WriteLine(i);
    }
}
// or, more simply, using LINQ
var result = status.SelectMany((i, index) => Enumerable.Repeat(index, i));
The following example may help you understand my code better:
the real number array: 1 12 12 15 9 34 // i don't care if it's sorted
the status array: status[1]=1,status[12]=2,status[15]=1,status[9]=1,status[34]=1
all others are 0
At 14 values this is a pretty small array. I don't think switching to a smarter data structure such as a list will win you much, especially if you need fast random access. Even binary search may actually be slower than linear search at this scale. Are you sure that, say, insert-on-copy does not satisfy your performance requirements?
"This operation is going to be performed very often, so it needs to be as fast as possible."
The things that you notice happen "very often" are frequently not the bottlenecks in the program - it's often surprising what the actual bottlenecks are. You should code something simple and measure the actual performance of your program before performing any optimizations.
"I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide."
Assuming that this is the bottleneck, the big-O performance of the different methods is not going to be relevant here because of the small size of your array. It is easier to just try a few different approaches, measure the results, see which performs best and choose that method. If you have followed the advice from the first paragraph, you already have a profiler set up that you can use for this step too.
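As one candidate to put under the profiler, here is a minimal sketch of the binary-search insert on a List<int>. SortedInsert is a made-up name. Whether it beats a linear scan at 14 elements is exactly the kind of thing that has to be measured rather than assumed.

```csharp
using System.Collections.Generic;

static class SortedInsert
{
    // Inserts value into an already sorted list, keeping it sorted.
    public static void Insert(List<int> sorted, int value)
    {
        int pos = sorted.BinarySearch(value);
        if (pos < 0) pos = ~pos;   // not found: the complement is the insertion point
        sorted.Insert(pos, value); // shifts the tail of the list by one slot
    }
}
```

Duplicates are allowed here; an equal value is inserted next to an existing one.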
For inserting into the middle, a LinkedList<int> would be the fastest option - anything else involves copying data. At 14 elements, don't stress over binary search etc - just walk forwards to the item you want:
using System;
using System.Collections.Generic;
static class Program
{
    static void Main()
    {
        LinkedList<int> data = new LinkedList<int>();
        Random rand = new Random(12345);
        for (int i = 0; i < 20; i++)
        {
            data.InsertSortedValue(rand.Next(300));
        }
        foreach (int i in data) Console.WriteLine(i);
    }
}
static class LinkedListExtensions {
    public static void InsertSortedValue(this LinkedList<int> list, int value)
    {
        LinkedListNode<int> node = list.First, next;
        if (node == null || node.Value > value)
        {
            list.AddFirst(value);
        }
        else
        {
            while ((next = node.Next) != null && next.Value < value)
                node = next;
            list.AddAfter(node, value);
        }
    }
}
Doing the brute-force approach is the best decision here because 14 isn't a number :). However, this is not a scalable decision: should 14 become 14000 one day, it will cause problems.
What is the most common operation with your array?
Insert? Read?
A heap data structure will give you O(log(14)) for both of them. SortedDictionary may hurt your performance.
Using a simple array will give you O(1) for reading and O(14) for insert.
By the way, have you tried System.Collections.Generic.SortedDictionary or System.Collections.Generic.SortedList?
If you're on .NET 4 you should take a look at SortedSet<T>. Otherwise take a look at SortedDictionary<TKey, TValue>, where you make TValue object and just put null into it, because you're only interested in the keys.
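A small sketch of what the SortedSet<T> route looks like. One caveat that matters for this question: a set silently drops duplicates, so it only fits if each value may appear at most once. SortedSetDemo and InsertAll are hypothetical names.

```csharp
using System.Collections.Generic;

static class SortedSetDemo
{
    // Adds all values to a SortedSet<int> and returns them in ascending order.
    public static List<int> InsertAll(IEnumerable<int> values)
    {
        var set = new SortedSet<int>();
        foreach (int v in values)
            set.Add(v); // Add returns false and changes nothing for a duplicate
        return new List<int>(set); // enumeration order is ascending
    }
}
```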
If there are no repeated values in the array and the possible values won't change, maybe a fixed-size array where the value is equal to the index is a good choice.
Both insert and read are O(1).
You have a range of possible values from 1-34, which is rather narrow. So the fastest way would likely be using an array with 34 slots. To insert a number n just do array[n-1]++ and to remove it do array[n-1]-- (if array[n-1] > 0).
To check if a value exists in your collection you do array[n-1]>0.
edit: Damn...Danny was faster. :)
Write a method that takes an array of integers and sorts them in place using Bubble Sort. The method is not allowed to create any additional arrays. Bubble Sort is a simple sorting algorithm that works by looping through the array to be sorted, comparing each pair of adjacent elements and swapping them if they are in the wrong order.
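One possible sketch of such a method. The names Sorting and BubbleSort are my own. The early exit when a pass makes no swaps is an optional refinement, not something the description above requires.

```csharp
static class Sorting
{
    // Sorts the array in place, ascending, without creating any other array.
    public static void BubbleSort(int[] values)
    {
        // After each pass the largest remaining element sits at index `end`.
        for (int end = values.Length - 1; end > 0; end--)
        {
            bool swapped = false;
            for (int i = 0; i < end; i++)
            {
                if (values[i] > values[i + 1])
                {
                    // Adjacent pair is in the wrong order: swap it.
                    int tmp = values[i];
                    values[i] = values[i + 1];
                    values[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break; // no swaps in a full pass: already sorted
        }
    }
}
```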
