More efficient way to build sum than for loop - c#

I have two lists with equal size. Both contain numbers. The first list is generated and the second one is static. Since I have many of the generated lists, I want to find out which one is the best. For me the best list is the one which is most equal to the reference. Therefore I calculate the difference at each position and add it up.
Here is the code:
/// <summary>
/// Calculates a measure based on that the quality of a match can be evaluated
/// </summary>
/// <param name="Combination"></param>
/// <param name="histDates"></param>
/// <returns>fitting value</returns>
private static decimal getMatchFitting(IList<decimal> combination, IList<MyClass> histDates)
{
decimal fitting = 0;
if (combination.Count != histDates.Count)
{
return decimal.MaxValue;
}
//loop through all values, compare and add up the result
for (int i = 0; i < combination.Count; i++)
{
fitting += Math.Abs(combination[i] - histDates[i].Value);
}
return fitting;
}
Is there possibly a more elegant but more important and more efficient way to get the desired sum?
Thanks in advance!

You can do the same with LINQ as follows:
return histDates.Zip(combination, (x, y) => Math.Abs(x.Value - y)).Sum();
This could be considered more elegant, but it cannot be more efficient that what you already have. It can also work with any type of IEnumerable (so you don't need specifically an IList), but that does not have any practical importance in your situation.
You can also reject a histDates as soon as the running sum of differences becomes larger than the smallest sum seen so far if you have this information at hand.

This is possible without using lists. Instead of filling your two lists you just want to have the sum of each values for a single list, e.g. IList combination becomes int combinationSum.
Do the same for histDates list.
Then substract those two values. No loop in this case is needed.

you can do more elegant with LINQ but it will not be more efficient... if you can calculate the sums while adding the items to the list you might get an edge...

I don't think I want to garantee any direct improvment of efficiancy as I can't test it right now but this at least looks nicer:
if (combination.Count != histDates.Count)
return decimal.MaxValue;
return combination.Select((t, i) => Math.Abs(t - histDates[i].Value)).Sum();

Related

C# sort List<int> recursively

there's an exercise i need to do, given a List i need to sort the content using ONLY recursive methods (no while, do while, for, foreach).
So... i'm struggling (for over 2 hours now) and i dont know how to even begin.
The function must be
List<int> SortHighestToLowest (List<int> list) {
}
I THINK i should check if the previous number is greater than the actual number and so on but what if the last number is greater than the first number on the list?, that's why im having a headache.
I appreciate your help, thanks a lot.
[EDIT]
I delivered the exercise but then teacher said i shouldn't use external variables like i did here:
List<int> _tempList2 = new List<int>();
int _actualListIndex = 0;
int _actualMaxNumber = 0;
int _actualMaxNumberIndex = 0;
List<int> SortHighestToLowest(List<int> list)
{
if (list.Count == 0)
return _tempList2;
if (_actualListIndex == 0)
_actualMaxNumber = list[0];
if (_actualListIndex < list.Count -1)
{
_actualListIndex++;
if (list[_actualListIndex] > _actualMaxNumber)
{
_actualMaxNumberIndex = _actualListIndex;
_actualMaxNumber = list[_actualListIndex];
}
return SortHighestToLowest(list);
}
_tempList2.Add(_actualMaxNumber);
list.RemoveAt(_actualMaxNumberIndex);
_actualListIndex = 0;
_actualMaxNumberIndex = 0;
return SortHighestToLowest(list);
}
Exercise is done and i approved (thanks to other exercises as well) but i was wondering if there's a way of doing this without external variables and without using System.Linq like String.Empty's response (im just curious, the community helped me to solve my issue and im thankful).
I am taking your instructions to the letter here.
Only recursive methods
No while, do while, for, foreach
Signature must be List<int> SortHighestToLowest(List<int> list)
Now, I do assume you may use at least the built-in properties and methods of the List<T> type. If not, you would have a hard time even reading the elements of your list.
That said, any calls to Sort or OrderBy methods would be beyond the point here, since they would render any recursive method useless.
I also assume it is okay to use other lists in the process, since you didn't mention anything in regards to that.
With all that in mind, I came to this piece below, making use of Max and Remove methods from List<T> class, and a new list of integers for each recursive call:
public static List<int> SortHighestToLowest(List<int> list)
{
// recursivity breaker
if (list.Count <= 1)
return list;
// remove highest item
var max = list.Max();
list.Remove(max);
// append highest item to recursive call for the remainder of the list
return new List<int>(SortHighestToLowest(list)) { max };
}
For solving this problem, try to solve smaller subsets. Consider the following list
[1,5,3,2]
Let's take the last element out of list, and consider the rest as sorted which will be [1,3,5] and 2. Now the problem reduces to another problem of inserting this 2 in its correct position. If we can insert it in correct position then the array becomes sorted. This can be applied recursively.
For every recursive problem there should be a base condition w.r.t the hypothesis we make. For the first problem the base condition is array with single element. A single element array is always sorted.
For the second insert problem the base condition will be an empty array or the last element in array is less than the element to be inserted. In both cases the element is inserted at the end.
Algorithm
---------
Sort(list)
if(list.count==1)
return
temp = last element of list
temp_list = list with last element removed
Sort(temp_list)
Insert(temp_list, temp)
Insert(list, temp)
if(list.count ==0 || list[n-1] <= temp)
list.insert(temp)
return
insert_temp = last element of list
insert_temp_list = list with last element removed
Insert(insert_temo_list, insert_temp)
For Insert after base condition its calling recursively till it find the correct position for the last element which is removed.

C# - Creating a recursive function to calculate the sum of a list. Is it possible using only the list as the only parameter?

So in my attempt to start learning c# one challenge I've come across is to create a recursive function that will calculate the sum of a list. I'm wondering if it's possible to do this using a list as the only argument of the function? Or would I need to apply an index size as well to work through the list?
int addRecursively(List<int> numList)
{
int total = numList[0];
if (numList.Count > 1)
{
numList.RemoveAt(0);
return total += addRecursively(numList);
}
Console.WriteLine(total);
return total;
}
List<int> numbers = new<List<int> {1,2,3,4,5,6,7,8};
addRecursively(numbers); //returns only the last element of whichever list I enter.
I was hoping by assigning the total to the first index of the list before deleting the first index of the list that when passed into the next instance of the function the index of each element in the list would move down one, allowing me to get each value in the list and totalling them up. However using the function will only ever return the last element of whichever list of integers I enter.
My thought process came from arrays and the idea of the shift method on an array in JS, removing the first element and bringing the whole thing down.
Am I attempting something stupid here? Is there another similar method I should be using or would I be better off simply including a list size as another parameter?
Thanks for your time
So in my attempt to start learning c# one challenge I've come across is to create a recursive function that will calculate the sum of a list. I'm wondering if it's possible to do this using a list as the only argument of the function? Or would I need to apply an index size as well to work through the list?
That's a great exercise for a beginner. However, you would never, ever do this with a List<int> in a realistic program. First, because you'd simply call .Sum() on it. But that's a cop-out; someone had to write Sum, and that person could be you.
The reason you would never do this recursively is List<T> is not a recursive data structure. As you note, every time you recurse there has to be something different. If there is not something different then you have an unbounded recursion!
That means you have to change one of the arguments, either by mutating it, if it is a reference type, or passing a different argument. Neither is correct in this case where the argument is a list.
For a list, you never want to mutate the list, by removing items, say. You don't own that list. The caller owns the list and it is rude to mutate it on them. When I call your method to sum a list, I don't want the list to be emptied; I might want to use it for something else.
And for a list, you never want to pass a different list in a recursion because constructing the new list from the old list is very expensive.
(There is also the issue of deep recursion; presumably we wish to sum lists of more than a thousand numbers, but that will eat up all the stack space if you go with a recursive solution; C# is not a guaranteed-tail-recursive language like F# is. However, for learning purposes let's ignore this issue and assume we are dealing with only small lists.)
Since both of the techniques for avoiding unbounded recursions are inapplicable, you must not write recursive algorithms on List<T> (or, as you note, you must pass an auxiliary parameter such as an index, and that's the thing you change). But your exercise is still valid; we just have to make it a better exercise by asking "what would we have to change to make a list that is amenable to recursion?"
We need to change two things: (1) make the list immutable, and (2) make it a recursively defined data structure. If it is immutable then you cannot change the caller's data by accident; it's unchangeable. And if it is a recursively defined data structure then there is a natural way to do recursion on it that is cheap.
So this is your new exercise:
An ImmutableList is either (1) empty, or (2) a single integer, called the "head", and an immutable list, called the "tail". Implement these in the manner of your choosing. (Abstract base class, interface implemented by multiple classes, single class that does the whole thing, whatever you think is best. Pay particular attention to the constructors.)
ImmutableList has three public read-only properties: bool IsEmpty, int Head and ImmutableList Tail. Implement them.
Now we can define int Sum(ImmutableList) as a recursive method: the base case is the sum of an empty list is zero; the inductive case is the sum of a non-empty list is the head plus the sum of the tail. Implement it; can you do it as a single line of code?
You will learn much more about C# and programming in a functional style with this exercise. Use iterative algorithms on List<T>, always; that is what it was designed for. Use recursion on data structures that are designed for recursion.
Bonus exercises:
Write Sum as an extension method, so that you can call myImmutableList.Sum().
Sum is a special case of an operation called Aggregate. It returns an integer, and takes three parameters: an immutable list, an integer called the accumulator, and a Func<int, int, int>. If the list is empty, the result is the accumulator. Otherwise, the result is the recursion on the tail and calling the function on the head and the accumulator. Write a recursive Aggregate; if you've done it correctly then int Sum(ImmutableList items) => Aggregate(items, 0, (acc, item) => acc + item); should be a correct implementation of Sum.
Genericize ImmutableList to ImmutableList<T>; genericize Aggregate to Aggregate<T, R> where T is the list element type and R is the accumulator type.
Try this way:
int addRecursively(List<int> lst)
{
if(lst.Count() == 0) return 0;
return lst.Take(1).First() + addRecursively(lst.Skip(1).ToList());
}
one more example:
static public int RecursiveSum(List<int> ints)
{
int nextIndex = 0;
if(ints.Count == 0)
return 0;
return ints[0] + RecursiveSum(ints.GetRange(++nextIndex, ints.Count - 1));
}
These are some ways to get the sum of integers in a list.
You don't need a recursive method, it spends more system resources when it isn't needed.
class Program
{
static void Main(string[] args)
{
List<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };
int sum1 = numbers.Sum();
int sum2 = GetSum2(numbers);
int sum3 = GetSum3(numbers);
int sum4 = GetSum4(numbers);
}
private static int GetSum2(List<int> numbers)
{
int total = 0;
foreach (int number in numbers)
{
total += number;
}
return total;
}
private static int GetSum3(List<int> numbers)
{
int total = 0;
for (int i = 0; i < numbers.Count; i++)
{
total += numbers[i];
}
return total;
}
private static int GetSum4(List<int> numbers)
{
int total = 0;
numbers.ForEach((number) =>
{
total += number;
});
return total;
}
}

Efficiently calculating totals from a file using LINQ

I'm reading a file and turning each line within it into a class, let's call it Record, and returning each Record as it is read using IEnumerable<Record> and yield return.
Because of this I only start actually performing these reads whenever I do an operation on the enumeration, such as performing a sum on it or iterating through it with a foreach.
I do need to go through each record and then translate that into a database, but due to database design before my time I need the totals on each record in the database, so I need these totals before I start translating them into my database.
At the moment I have five separate .Count() or .Sum() operations on my enumeration before I start iterating the enumeration (example int i = records.Sum(r => r.SomeField) or int j = records.Count(r => r.IsSomethingTrue)). Each one of those counts or sums will loop through the entire file to calculate each one separately. I'm not really happy with this behaviour and would like to find a more efficient way of doing this.
I am using .NET 3.5 if that makes any difference.
You could use your own struct to calculate a few values at the single pass through an enumerable object.
public struct ComplexAccumulator
{
public int TotalSumField { get; set; }
public int CountSomethingTrue { get; set; }
}
Now you can use Aggreagate extension method to accumulate values:
records.Aggregate(default(ComplexAccumulator), (a, r) => new ComplexAccumulator
{
TotalSumFiled = a.TotalSumField + r.SumField,
CountSomethingTrue = a.CountSomethingTrue + r.IsSomethingTrue ? 1 : 0,
});
Instead of the struct you could use suitable Tuple instance, f.e. something like Tuple<int, int, int>.
Efficiency is not a strength of LINQ... You need to replace some LINQ things with manual loops here.
You seem to need two passes over the data. One for aggregation:
var sum = 0; //etc.
foreach (var item in items) {
//compute all 5 aggregates here
}
And then one to translate the data:
items.Select(item => Translate(item, aggregates))
Whether you should buffer items (for example using ToList) or not depends on whether available memory can hold those items or not.
You can use Aggregate to perform all 5 aggregations in one pass but that's not better than a loop in any way. It's slower, far more code and the code arguably is illegible.

Data structure with unique elements and fast add and remove

I need a data structure with the following properties:
Each element of the structure must be unique.
Add: Adds one element to the data structure unless the element already
exists.
Pop: Removes one element from the data structure and returns the element
removed. It's unimportant which element is removed.
No other operations are required for this structure. A naive implementation with a list will require almost O(1) time for Pop and O(N) time for Add (since the entire list must be checked to ensure
uniqueness). I am currently using a red-black tree to fulfill the needs of this data structure, but I am wondering if I can use something less complicated to achieve almost the same performance.
I prefer answers in C#, but Java, Javascript, and C++ are also acceptable.
My question is similar to this question, however I have no need to lookup or remove the maximum or minimum value (or indeed any particular kind of value), so I was hoping there would be improvements in that respect. If any of the structures in that question are appropriate here, however, let me know.
So, what data structure allows only unique elements, supports fast add and remove, and is less complicated than a red-black tree?
What about the built-in HashSet<T>?
It contains only unique elements. Remove (pop) is O(1) and Add is O(1) unless the internal array must be resized.
As said by Meta-Knight, a HashSet is the fastest data structure to do exactly that. Lookups and removals take constant O(1) time (except in rare cases when your hash sucks and then you require multiple rehashes or you use a bucket hashset). All operations on a hashset take O(1) time, the only drawback is that it requires more memory because the hash is used as an index into an array (or other allocated block of memory). So unless you're REALLY strict on memory then go with HashSet. I'm only explaining the reason why you should go with this approach and you should accept Meta-Knights answer as his was first.
Using hashes is OK because usually you override the HashCode() and Equals() functions. What the HashSet does internally is generate the hash, then if it is equal check for equality (just in case of hash collisions). If they are not it must call a method to do something called rehashing which generates a new hash which is usually at an odd prime offset from the original hash (not sure if .NET does this but other languages do) and repeats the process as necessary.
Remove a random element is quite easy from an hashset or a dictionary.
Everything is averaged O(1), that in real world means O(1).
Example:
public class MyNode
{
...
}
public class MyDataStructure
{
private HashSet<MyNode> nodes = new HashSet<MyNode>();
/// <summary>
/// Inserts an element to this data structure.
/// If the element already exists, returns false.
/// Complexity is averaged O(1).
/// </summary>
public bool Add(MyNode node)
{
return node != null && this.nodes.Add(node);
}
/// <summary>
/// Removes a random element from the data structure.
/// Returns the element if an element was found.
/// Returns null if the data structure is empty.
/// Complexity is averaged O(1).
/// </summary>
public MyNode Pop()
{
// This loop can execute 1 or 0 times.
foreach (MyNode node in nodes)
{
this.nodes.Remove(node);
return node;
}
return null;
}
}
Almost everything that can be compared can also be hashed :) in my experience.
I would like to know if there is someone that know something that cannot be hashed.
To my experience this applies also to some floating point comparations with tolerance using special techniques.
An hash function for an hash table don't need to be perfect, it just need to be good enough.
Also if your data is very complicated usually hash functions are less complicated than red black trees or avl trees.
They are useful because they keep things ordered, but you don't need this.
To show how to do a simple hashset i will consider a simple dictionary with integer keys.
This implementation is very fast and very good for sparse arrays for examples.
I didn't write the code to grow the bucket table, because it is annoying and usually a source of big bugs, but since this is a proof of concept, it should be enough.
I didn't write iterator neither.
I wrote it by scratch, there may be bugs.
public class FixedIntDictionary<T>
{
// Our internal node structure.
// We use structs instead of objects to not add pressure to the garbage collector.
// We mantains our own way to manage garbage through the use of a free list.
private struct Entry
{
// The key of the node
internal int Key;
// Next index in pEntries array.
// This field is both used in the free list, if node was removed
// or in the table, if node was inserted.
// -1 means null.
internal int Next;
// The value of the node.
internal T Value;
}
// The actual hash table. Contains indices to pEntries array.
// The hash table can be seen as an array of singlt linked list.
// We store indices to pEntries array instead of objects for performance
// and to avoid pressure to the garbage collector.
// An index -1 means null.
private int[] pBuckets;
// This array contains the memory for the nodes of the dictionary.
private Entry[] pEntries;
// This is the first node of a singly linked list of free nodes.
// This data structure is called the FreeList and we use it to
// reuse removed nodes instead of allocating new ones.
private int pFirstFreeEntry;
// Contains simply the number of items in this dictionary.
private int pCount;
// Contains the number of used entries (both in the dictionary or in the free list) in pEntries array.
// This field is going only to grow with insertions.
private int pEntriesCount;
///<summary>
/// Creates a new FixedIntDictionary.
/// tableBucketsCount should be a prime number
/// greater than the number of items that this
/// dictionary should store.
/// The performance of this hash table will be very bad
/// if you don't follow this rule!
/// </summary>
public FixedIntDictionary<T>(int tableBucketsCount)
{
// Our free list is initially empty.
this.pFirstFreeEntry = -1;
// Initializes the entries array with a minimal amount of items.
this.pEntries = new Entry[8];
// Allocate buckets and initialize every linked list as empty.
int[] buckets = new int[capacity];
for (int i = 0; i < buckets.Length; ++i)
buckets[i] = -1;
this.pBuckets = buckets;
}
///<summary>Gets the number of items in this dictionary. Complexity is O(1).</summary>
public int Count
{
get { return this.pCount; }
}
///<summary>
/// Adds a key value pair to the dictionary.
/// Complexity is averaged O(1).
/// Returns false if the key already exists.
/// </summary>
public bool Add(int key, T value)
{
// The hash table can be seen as an array of linked list.
// We find the right linked list using hash codes.
// Since the hash code of an integer is the integer itself, we have a perfect hash.
// After we get the hash code we need to remove the sign of it.
// To do that in a fast way we and it with 0x7FFFFFFF, that means, we remove the sign bit.
// Then we have to do the modulus of the found hash code with the size of our buckets array.
// For this reason the size of our bucket array should be a prime number,
// this because the more big is the prime number, the less is the chance to find an
// hash code that is divisible for that number. This reduces collisions.
// This implementation will not grow the buckets table when needed, this is the major
// problem with this implementation.
// Growing requires a little more code that i don't want to write now
// (we need a function that finds prime numbers, and it should be fast and we
// need to rehash everything using the new buckets array).
int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
int bucket = this.pBuckets[bucketIndex];
// Now we iterate in the linked list of nodes.
// Since this is an hash table we hope these lists are very small.
// If the number of buckets is good and the hash function is good this will translate usually
// in a O(1) operation.
Entry[] entries = this.pEntries;
for (int current = entries[bucket]; current != -1; current = entries[current].Next)
{
if (entries[current].Key == key)
{
// Entry already exists.
return false;
}
}
// Ok, key not found, we can add the new key and value pair.
int entry = this.pFirstFreeEntry;
if (entry != -1)
{
// We found a deleted node in the free list.
// We can use that node without "allocating" another one.
this.pFirstFreeEntry = entries[entry].Next;
}
else
{
// Mhhh ok, the free list is empty, we need to allocate a new node.
// First we try to use an unused node from the array.
entry = this.pEntriesCount++;
if (entry >= this.pEntries)
{
// Mhhh ok, the entries array is full, we need to make it bigger.
// Here should go also the code for growing the bucket table, but i'm not writing it here.
Array.Resize(ref this.pEntries, this.pEntriesCount * 2);
entries = this.pEntries;
}
}
// Ok now we can add our item.
// We just overwrite key and value in the struct stored in entries array.
entries[entry].Key = key;
entries[entry].Value = value;
// Now we add the entry in the right linked list of the table.
entries[entry].Next = this.pBuckets[bucketIndex];
this.pBuckets[bucketIndex] = entry;
// Increments total number of items.
++this.pCount;
return true;
}
/// <summary>
/// Gets a value that indicates wether the specified key exists or not in this table.
/// Complexity is averaged O(1).
/// </summary>
public bool Contains(int key)
{
// This translate in a simple linear search in the linked list for the right bucket.
// The operation, if array size is well balanced and hash function is good, will be almost O(1).
int bucket = this.pBuckets[(key & 0x7FFFFFFF) % this.pBuckets.Length];
Entry[] entries = this.pEntries;
for (int current = entries[bucket]; current != -1; current = entries[current].Next)
{
if (entries[current].Key == key)
{
return true;
}
}
return false;
}
/// <summary>
/// Removes the specified item from the dictionary.
/// Returns true if item was found and removed, false if item doesn't exists.
/// Complexity is averaged O(1).
/// </summary>
public bool Remove(int key)
{
// Removal translate in a simple contains and removal from a singly linked list.
// Quite simple.
int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
int bucket = this.pBuckets[bucketIndex];
Entry[] entries = this.pEntries;
int next;
int prev = -1;
int current = entries[bucket];
while (current != -1)
{
next = entries[current].Next;
if (entries[current].Key == key)
{
// Found! Remove from linked list.
if (prev != -1)
entries[prev].Next = next;
else
this.pBuckets[bucketIndex] = next;
// We now add the removed node to the free list,
// so we can use it later if we add new elements.
entries[current].Next = this.pFirstFreeEntry;
this.pFirstFreeEntry = current;
// Decrements total number of items.
--this.pCount;
return true;
}
prev = current;
current = next;
}
return false;
}
}
If you wander if this implementation is good or not, is a very similar implementation of what the .NET framework do for Dictionary class :)
To make it an hashset, just remove the T and you have an hashset of integers.
If you need to get hashcodes for generic objects, just use x.GetHashCode or provide your hash code function.
To write iterators you need to modify several things, but don't want to add too much other things in this post :)

Examining two string arrays for equivalence

Is there a better way to examine whether two string arrays have the same contents than this?
string[] first = new string[]{"cat","and","mouse"};
string[] second = new string[]{"cat","and","mouse"};
bool contentsEqual = true;
if(first.Length == second.Length){
foreach (string s in first)
{
contentsEqual &= second.Contains(s);
}
}
else{
contentsEqual = false;
}
Console.WriteLine(contentsEqual.ToString());// true
Enumerable.SequenceEquals if they're supposed to be in the same order.
You should consider using the intersect method. It will give you all the matching values and then you can just compare the count of the resulting array with one the arrays that were compared.
http://msdn.microsoft.com/en-us/library/system.linq.enumerable.intersect.aspx
This is O(n^2). If the arrays have the same length, sort them, then compare elements in the same position. This is O(n log n).
Or you can use a hash set or dictionary: insert each word in the first array, then see if every word in the second array is in the set or dictionary. This is O(n) on average.
Nothing wrong with the logic of the method, but the fact that you're testing Contains for each item in the first sequence means the algorithm runs in O(n^2) time in general. You can also make one or two other smaller optimisations and improvements
I would implement such a function as follows. Define an extension method as such (example in .NET 4.0).
public static bool SequenceEquals<T>(this IEnumerable<T> seq1, IEnumerable<T> seq2)
{
foreach (var pair in Enumerable.Zip(seq1, seq2)
{
if (!pair.Item1.Equals(pair.Item2))
return;
}
return false;
}
You could try Enumerable.Intersect: http://msdn.microsoft.com/en-us/library/bb460136.aspx
The result of the operation is every element that is common to both arrays. If the length of the result is equal to the length of both arrays, then the two arrays contain the same items.
Enumerable.Union: http://msdn.microsoft.com/en-us/library/bb341731.aspx would work too; just check that the result of the Union operation has length of zero (meaning there are no elements that are unique to only one array);
Although I'm not exactly sure how the functions handle duplicates.

Categories

Resources