Why does ImmutableArray.Create copy an existing immutable array? - c#

I am trying to make a slice of an existing ImmutableArray<T> in a method and thought I could use the construction method Create<T>(ImmutableArray<T> a, int offset, int count) like so:
var arr = ImmutableArray.Create('A', 'B', 'C', 'D');
var bc = ImmutableArray.Create(arr, 1, 2);
I was hoping my two ImmutableArrays could share the underlying array here, but when double-checking this I see that the implementation doesn't.
From corefx/../ImmutableArray.cs on GitHub at 15757a8, 18 Mar 2017:
/// <summary>
/// Initializes a new instance of the <see cref="ImmutableArray{T}"/> struct.
/// </summary>
/// <param name="items">The array to initialize the array with.
/// The selected array segment may be copied into a new array.</param>
/// <param name="start">The index of the first element in the source array to include in the resulting array.</param>
/// <param name="length">The number of elements from the source array to include in the resulting array.</param>
/// <remarks>
/// This overload allows helper methods or custom builder classes to efficiently avoid paying a redundant
/// tax for copying an array when the new array is a segment of an existing array.
/// </remarks>
[Pure]
public static ImmutableArray<T> Create<T>(ImmutableArray<T> items, int start, int length)
{
    Requires.Range(start >= 0 && start <= items.Length, nameof(start));
    Requires.Range(length >= 0 && start + length <= items.Length, nameof(length));

    if (length == 0)
    {
        return Create<T>();
    }

    if (start == 0 && length == items.Length)
    {
        return items;
    }

    var array = new T[length];
    Array.Copy(items.array, start, array, 0, length);
    return new ImmutableArray<T>(array);
}
Why does it have to make the copy of the underlying items here?
And isn't the documentation for this method misleading/wrong? It says
This overload allows helper methods or custom builder classes to
efficiently avoid paying a redundant tax for copying an array when the
new array is a segment of an existing array.
But the segment case is exactly when it copies, and it only avoids the copy if the desired slice is empty or the whole input array.
Is there another way of accomplishing what I want, short of implementing some kind of ImmutableArraySpan?

I'm going to answer my own question with the aid of the comments:
An ImmutableArray can't represent a slice of the underlying array because it doesn't have the fields for it - and obviously adding 64/128 bits of range fields that are only rarely used would be too wasteful.
So the only possibility is to have a proper Slice/Span struct, and there isn't one at the moment apart from ArraySegment (which can't use ImmutableArray as backing data).
It's probably easy to write an ImmutableArraySegment implementing IReadOnlyList<T> etc., so that will likely be the solution here (see the sketch below).
Regarding the documentation - it's as correct as it can be, it avoids the few copies it can (all, none) and copies otherwise.
There are new APIs with the new Span and ReadOnlySpan types, which will ship with the magical language and runtime features for low-level code (ref returns/locals). The types are actually already shipping as part of the System.Memory NuGet package, but until they are integrated there will be no way of using them to solve the problem of slicing an ImmutableArray, which would require a method like this on ImmutableArray (which is in System.Collections.Immutable, and that doesn't depend on the System.Memory types yet):
public ReadOnlySpan<T> Slice(int start, int count)
I'm guessing/hoping such APIs will come once the types are in place.
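For completeness, a rough sketch of such an ImmutableArraySegment wrapper (the name and shape are illustrative, and argument checks are kept minimal):
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Immutable;

// Illustrative only: a read-only, non-copying view over a slice of an ImmutableArray<T>.
public struct ImmutableArraySegment<T> : IReadOnlyList<T>
{
    private readonly ImmutableArray<T> _array;
    private readonly int _offset;
    private readonly int _count;

    public ImmutableArraySegment(ImmutableArray<T> array, int offset, int count)
    {
        if (offset < 0 || count < 0 || offset + count > array.Length)
            throw new ArgumentOutOfRangeException();
        _array = array;
        _offset = offset;
        _count = count;
    }

    public int Count { get { return _count; } }

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _array[_offset + index];
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < _count; i++)
            yield return _array[_offset + i];
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Usage would then be something like var bc = new ImmutableArraySegment<char>(arr, 1, 2);, with the slice sharing arr's underlying storage instead of copying it.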

Related

C# Add extension Append method to class array of type T [duplicate]

This question already has answers here:
Changing size of array in extension method does not work?
(3 answers)
Closed 6 years ago.
Using this answer to the question "How to add a string to a string[] array? There's no .Add function", I am trying to write a generic extension method to append elements to a generic array. Using the Array.Resize() method directly works well, and the sample below adds an extra element to my string array:
string[] array = new string[] { "Foo", "Bar" };
Array.Resize(ref array, array.Length + 1);
array[array.Length - 1] = "Baz";
But when I try to use the ArrayExtensions method described below, the method does resize my array inside the method, yet when it returns the array is unchanged. Why?
My extension class
public static class ArrayExtensions
{
    public static void Append<T>(this T[] array, T append)
    {
        Array.Resize(ref array, array.Length + 1);
        array[array.Length - 1] = append; // < Adds an extra element to my array
    }
}
Used as follows
string[] array = new string[] { "Foo", "Bar" };
array.Append("Baz");
When the method returns, the added element does not exist. What am I missing?
UPDATE
As pointed out, this question has been asked and answered here before. I will accept the previous question "Changing size of array in extension method does not work?" as an answer to my question.
UPDATE 2
Since returning a new array from a method called Append() would violate the behaviour expected from similar methods in the framework (callers would assume Append() modifies the object itself), I made some changes to the extension method to prevent it from being used incorrectly. Thanks @NineBerry for pointing this out.
I also added params T[] add to allow for adding multiple elements at once.
public static class ArrayExtensions
{
    /// <summary>
    /// Creates a copy of an object array and adds the extra elements to the created copy
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="array"></param>
    /// <param name="add">Elements to add to the created copy</param>
    /// <returns></returns>
    public static T[] CopyAddElements<T>(this T[] array, params T[] add)
    {
        for (int i = 0; i < add.Length; i++)
        {
            Array.Resize(ref array, array.Length + 1);
            array[array.Length - 1] = add[i];
        }
        return array;
    }
}
Usage
string[] array = new string[] { "Foo", "Bar" };
array = array.CopyAddElements("Baz", "Foobar");
for (int i = 0; i < array.Length; i++)
{
System.Console.Write($"{array[i]} ");
}
/* Output
* Foo Bar Baz Foobar
*/
Array.Resize creates a new array. That's why you have to pass the array using the ref keyword to Array.Resize. So, after returning from Array.Resize, the variable array references a different object.
There is no way to use an extension method to do what you are trying to do.
Why not simply use a List<> instead of an array? Using Array.Resize is a very costly operation. It means allocating new memory and copying over the data from the old array to the new array each time it is called to increase the array length.
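For example, a rough sketch of the List<> approach (building up the list and converting to an array only if one is really needed):
var items = new List<string> { "Foo", "Bar" };
items.Add("Baz");                  // amortized O(1); no manual Array.Resize needed
string[] array = items.ToArray();  // convert only if an actual array is required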
What actually happens is that
array.Append("Baz");
is translated to
ArrayExtensions.Append( array, "Baz" );
which means that the reference to array is passed in by value, so inside the Append method you work with a copy of the reference. The Array.Resize method then takes this new variable as ref, creates a new array in memory and changes the variable to point to it. Unfortunately, this changes just that local variable and not the original one.
You can return the newly created array as the return value, or create a static method that takes the array by ref instead of using an extension method (see the sketch below).
Contrary to your intuition, the this parameter is not passed with an implicit ref, and it is also not allowed to add ref to it. See also the discussion here.
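For illustration, a minimal sketch of that ref-based alternative (the ArrayUtils name is just a placeholder):
public static class ArrayUtils
{
    // Not an extension method: the ref parameter lets the caller's own variable be reassigned.
    public static void Append<T>(ref T[] array, T item)
    {
        Array.Resize(ref array, array.Length + 1);
        array[array.Length - 1] = item;
    }
}

// Usage:
string[] array = { "Foo", "Bar" };
ArrayUtils.Append(ref array, "Baz"); // array is now { "Foo", "Bar", "Baz" }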

More efficient way to build sum than for loop

I have two lists of equal size. Both contain numbers. The first list is generated and the second one is static. Since I have many of the generated lists, I want to find out which one is the best. For me the best list is the one that is closest to the reference. Therefore I calculate the difference at each position and add it up.
Here is the code:
/// <summary>
/// Calculates a measure based on which the quality of a match can be evaluated
/// </summary>
/// <param name="combination"></param>
/// <param name="histDates"></param>
/// <returns>fitting value</returns>
private static decimal getMatchFitting(IList<decimal> combination, IList<MyClass> histDates)
{
    decimal fitting = 0;
    if (combination.Count != histDates.Count)
    {
        return decimal.MaxValue;
    }

    // loop through all values, compare and add up the result
    for (int i = 0; i < combination.Count; i++)
    {
        fitting += Math.Abs(combination[i] - histDates[i].Value);
    }
    return fitting;
}
Is there possibly a more elegant but more important and more efficient way to get the desired sum?
Thanks in advance!
You can do the same with LINQ as follows:
return histDates.Zip(combination, (x, y) => Math.Abs(x.Value - y)).Sum();
This could be considered more elegant, but it cannot be more efficient than what you already have. It also works with any type of IEnumerable (so you don't specifically need an IList), but that has no practical importance in your situation.
You can also reject a histDates as soon as the running sum of differences becomes larger than the smallest sum seen so far, if you have that information at hand (see the sketch below).
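A sketch of that early-exit idea, assuming the caller tracks the smallest fitting seen so far and passes it in (the bestSoFar parameter is added here purely for illustration):
private static decimal getMatchFitting(IList<decimal> combination, IList<MyClass> histDates, decimal bestSoFar)
{
    if (combination.Count != histDates.Count)
    {
        return decimal.MaxValue;
    }

    decimal fitting = 0;
    for (int i = 0; i < combination.Count; i++)
    {
        fitting += Math.Abs(combination[i] - histDates[i].Value);
        if (fitting > bestSoFar)
        {
            return decimal.MaxValue; // this candidate can no longer beat the best one seen so far
        }
    }
    return fitting;
}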
This is possible without using lists. Instead of filling your two lists, you could just keep the sum of the values for each list, e.g. IList combination becomes int combinationSum.
Do the same for the histDates list.
Then subtract those two values. No loop is needed in this case.
You can do it more elegantly with LINQ, but it will not be more efficient... if you can calculate the sums while adding the items to the list you might get an edge...
I don't think I can guarantee any direct improvement in efficiency, as I can't test it right now, but this at least looks nicer:
if (combination.Count != histDates.Count)
return decimal.MaxValue;
return combination.Select((t, i) => Math.Abs(t - histDates[i].Value)).Sum();

Class Design Quality

Is this class I have written sufficient (I mean, written the way the pros do it) to be included in a project, or am I missing important things? I don't know how to use constructors etc., so I did not use one (I'm just a beginner in C#), but please comment if they are required.
using System;
using System.Collections.Generic;
using System.Text;

namespace RandBit
{
    /// <summary>
    /// By: Author
    /// Version 0.0.1
    /// Pseudo-Random 16-Bit (Max) Generator.
    /// </summary>
    public class RandomBit
    {
        /// <param name="input">The Bit-size(int)</param>
        /// <returns>Random Bit of Bit-size(string)</returns>
        public static string Generate(int input)
        {
            int bitSize = 0;
            Random choice = new Random();

            if (input == 0 || input > 16)
            {
                bitSize = 0;
            }
            else if (input == 1)
            {
                bitSize = 1;
            }
            else
            {
                int randomChoice = choice.Next(0, (1 << input));
                bitSize = randomChoice;
            }

            string binary = Convert.ToString(bitSize, 2);
            binary = binary.PadLeft(input, '0');
            return binary;
        }
    }
}
Thanks.
It appears that you are using Random incorrectly. I'd suggest starting with Jon Skeet's article on the subject. Relevant quote:
If you start off an instance of Random with the same initial state
(which can be provided via a seed) and make the same sequence of
method calls on it, you'll get the same results.
So what was wrong in our example code? We were using a new instance of
Random on each iteration of the loop. The parameterless constructor
for Random takes the current date and time as the seed - and you can
generally execute a fair amount of code before the internal timer
works out that the current date and time has changed. Therefore we're
using the same seed repeatedly - and getting the same results
repeatedly.
In other words, since you are creating a new instance of Random with each call, you are greatly increasing the chances that the return value won't be as "random" as you would expect.
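As a rough sketch of that fix applied to the Generate method from the question, a single shared Random instance could be used (the out-of-range handling below is a guess, and Random is not thread-safe):
public class RandomBit
{
    // Created once and reused, so repeated calls don't get the same time-based seed.
    private static readonly Random choice = new Random();

    public static string Generate(int input)
    {
        if (input <= 0 || input > 16)
        {
            return string.Empty; // out-of-range handling is a guess
        }

        int value = choice.Next(0, 1 << input);
        return Convert.ToString(value, 2).PadLeft(input, '0');
    }
}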
It's also worth mentioning that there are potentially better PRNG classes already in the .Net BCL. Here's another way of writing similar code.
private static readonly RNGCryptoServiceProvider _crypto = new RNGCryptoServiceProvider();

public static long Generate()
{
    // use whatever size you want here; bigger has a better chance of
    // being unique in a given scope
    byte[] bytes = new byte[8];

    // puts random bytes into the array
    _crypto.GetBytes( bytes );

    // do something (your choice) with the value...
    return BitConverter.ToInt64( bytes, 0 );
}
I would change only one thing: since your class contains only a single static member, why not make the class itself static?
If I were the project team leader I would require that you drop the summary/author/version comments. They are redundant (source control has that info), take some time to write and modify, and are ambiguous (in a file modified by 7 people, who is the author?).
Here's a discussion on this topic, perhaps not the only one: https://softwareengineering.stackexchange.com/q/48562/30927
Move the variable "Choice" close to its usage i.e within the else loop.
Otherwise, You will be allocating un necessary memory to Random Object even if not used.
See here

Data structure with unique elements and fast add and remove

I need a data structure with the following properties:
Each element of the structure must be unique.
Add: Adds one element to the data structure unless the element already
exists.
Pop: Removes one element from the data structure and returns the element
removed. It's unimportant which element is removed.
No other operations are required for this structure. A naive implementation with a list will require almost O(1) time for Pop and O(N) time for Add (since the entire list must be checked to ensure
uniqueness). I am currently using a red-black tree to fulfill the needs of this data structure, but I am wondering if I can use something less complicated to achieve almost the same performance.
I prefer answers in C#, but Java, Javascript, and C++ are also acceptable.
My question is similar to this question, however I have no need to lookup or remove the maximum or minimum value (or indeed any particular kind of value), so I was hoping there would be improvements in that respect. If any of the structures in that question are appropriate here, however, let me know.
So, what data structure allows only unique elements, supports fast add and remove, and is less complicated than a red-black tree?
What about the built-in HashSet<T>?
It contains only unique elements. Remove (pop) is O(1) and Add is O(1) unless the internal array must be resized.
As said by Meta-Knight, a HashSet is the fastest data structure to do exactly that. Lookups and removals take constant O(1) time (except in rare cases when your hash function is bad, forcing multiple rehashes, or you use a bucket hash set). All operations on a hash set take O(1) time; the only drawback is that it requires more memory, because the hash is used as an index into an array (or other allocated block of memory). So unless you're REALLY strict on memory, go with HashSet. I'm only explaining the reason why you should go with this approach; you should accept Meta-Knight's answer, as his was first.
Using hashes is OK because you usually override the GetHashCode() and Equals() methods. What the HashSet does internally is generate the hash, then, if two hashes are equal, check for actual equality (in case of hash collisions). If they are not equal it must do something called rehashing, which generates a new hash, usually at an odd prime offset from the original hash (not sure if .NET does this, but other languages do), and repeats the process as necessary.
Removing a random element from a hash set or a dictionary is quite easy.
Everything is O(1) on average, which in the real world means O(1).
Example:
public class MyNode
{
    ...
}

public class MyDataStructure
{
    private HashSet<MyNode> nodes = new HashSet<MyNode>();

    /// <summary>
    /// Inserts an element to this data structure.
    /// If the element already exists, returns false.
    /// Complexity is averaged O(1).
    /// </summary>
    public bool Add(MyNode node)
    {
        return node != null && this.nodes.Add(node);
    }

    /// <summary>
    /// Removes a random element from the data structure.
    /// Returns the element if an element was found.
    /// Returns null if the data structure is empty.
    /// Complexity is averaged O(1).
    /// </summary>
    public MyNode Pop()
    {
        // This loop can execute 1 or 0 times.
        foreach (MyNode node in nodes)
        {
            this.nodes.Remove(node);
            return node;
        }
        return null;
    }
}
Almost everything that can be compared can also be hashed :) in my experience.
I would like to know if anyone knows of something that cannot be hashed.
In my experience this also applies to some floating-point comparisons with tolerance, using special techniques.
A hash function for a hash table doesn't need to be perfect; it just needs to be good enough.
Also, if your data is very complicated, hash functions are usually less complicated than red-black trees or AVL trees.
Trees are useful because they keep things ordered, but you don't need that here.
To show how to write a simple hash set, I will consider a simple dictionary with integer keys.
This implementation is very fast and very good for sparse arrays, for example.
I didn't write the code to grow the bucket table, because it is annoying and usually a source of big bugs, but since this is a proof of concept it should be enough.
I didn't write an iterator either.
I wrote it from scratch, so there may be bugs.
public class FixedIntDictionary<T>
{
// Our internal node structure.
// We use structs instead of objects to not add pressure to the garbage collector.
// We maintain our own way to manage garbage through the use of a free list.
private struct Entry
{
// The key of the node
internal int Key;
// Next index in pEntries array.
// This field is both used in the free list, if node was removed
// or in the table, if node was inserted.
// -1 means null.
internal int Next;
// The value of the node.
internal T Value;
}
// The actual hash table. Contains indices to pEntries array.
// The hash table can be seen as an array of singly linked lists.
// We store indices to pEntries array instead of objects for performance
// and to avoid pressure to the garbage collector.
// An index -1 means null.
private int[] pBuckets;
// This array contains the memory for the nodes of the dictionary.
private Entry[] pEntries;
// This is the first node of a singly linked list of free nodes.
// This data structure is called the FreeList and we use it to
// reuse removed nodes instead of allocating new ones.
private int pFirstFreeEntry;
// Contains simply the number of items in this dictionary.
private int pCount;
// Contains the number of used entries (both in the dictionary or in the free list) in pEntries array.
// This field is going only to grow with insertions.
private int pEntriesCount;
///<summary>
/// Creates a new FixedIntDictionary.
/// tableBucketsCount should be a prime number
/// greater than the number of items that this
/// dictionary should store.
/// The performance of this hash table will be very bad
/// if you don't follow this rule!
/// </summary>
public FixedIntDictionary(int tableBucketsCount)
{
// Our free list is initially empty.
this.pFirstFreeEntry = -1;
// Initializes the entries array with a minimal amount of items.
this.pEntries = new Entry[8];
// Allocate buckets and initialize every linked list as empty.
int[] buckets = new int[tableBucketsCount];
for (int i = 0; i < buckets.Length; ++i)
buckets[i] = -1;
this.pBuckets = buckets;
}
///<summary>Gets the number of items in this dictionary. Complexity is O(1).</summary>
public int Count
{
get { return this.pCount; }
}
///<summary>
/// Adds a key value pair to the dictionary.
/// Complexity is averaged O(1).
/// Returns false if the key already exists.
/// </summary>
public bool Add(int key, T value)
{
// The hash table can be seen as an array of linked list.
// We find the right linked list using hash codes.
// Since the hash code of an integer is the integer itself, we have a perfect hash.
// After we get the hash code we need to remove the sign of it.
// To do that in a fast way we and it with 0x7FFFFFFF, that means, we remove the sign bit.
// Then we have to do the modulus of the found hash code with the size of our buckets array.
// For this reason the size of our buckets array should be a prime number:
// the bigger the prime number, the smaller the chance of finding a
// hash code that is divisible by that number. This reduces collisions.
// This implementation will not grow the buckets table when needed, this is the major
// problem with this implementation.
// Growing requires a little more code that i don't want to write now
// (we need a function that finds prime numbers, and it should be fast and we
// need to rehash everything using the new buckets array).
int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
int bucket = this.pBuckets[bucketIndex];
// Now we iterate in the linked list of nodes.
// Since this is an hash table we hope these lists are very small.
// If the number of buckets is good and the hash function is good this will translate usually
// in a O(1) operation.
Entry[] entries = this.pEntries;
for (int current = bucket; current != -1; current = entries[current].Next)
{
if (entries[current].Key == key)
{
// Entry already exists.
return false;
}
}
// Ok, key not found, we can add the new key and value pair.
int entry = this.pFirstFreeEntry;
if (entry != -1)
{
// We found a deleted node in the free list.
// We can use that node without "allocating" another one.
this.pFirstFreeEntry = entries[entry].Next;
}
else
{
// Mhhh ok, the free list is empty, we need to allocate a new node.
// First we try to use an unused node from the array.
entry = this.pEntriesCount++;
if (entry >= this.pEntries.Length)
{
// Mhhh ok, the entries array is full, we need to make it bigger.
// Here should go also the code for growing the bucket table, but i'm not writing it here.
Array.Resize(ref this.pEntries, this.pEntriesCount * 2);
entries = this.pEntries;
}
}
// Ok now we can add our item.
// We just overwrite key and value in the struct stored in entries array.
entries[entry].Key = key;
entries[entry].Value = value;
// Now we add the entry in the right linked list of the table.
entries[entry].Next = this.pBuckets[bucketIndex];
this.pBuckets[bucketIndex] = entry;
// Increments total number of items.
++this.pCount;
return true;
}
/// <summary>
/// Gets a value that indicates whether the specified key exists or not in this table.
/// Complexity is averaged O(1).
/// </summary>
public bool Contains(int key)
{
// This translate in a simple linear search in the linked list for the right bucket.
// The operation, if array size is well balanced and hash function is good, will be almost O(1).
int bucket = this.pBuckets[(key & 0x7FFFFFFF) % this.pBuckets.Length];
Entry[] entries = this.pEntries;
for (int current = bucket; current != -1; current = entries[current].Next)
{
if (entries[current].Key == key)
{
return true;
}
}
return false;
}
/// <summary>
/// Removes the specified item from the dictionary.
/// Returns true if item was found and removed, false if item doesn't exists.
/// Complexity is averaged O(1).
/// </summary>
public bool Remove(int key)
{
// Removal translate in a simple contains and removal from a singly linked list.
// Quite simple.
int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
int bucket = this.pBuckets[bucketIndex];
Entry[] entries = this.pEntries;
int next;
int prev = -1;
int current = bucket;
while (current != -1)
{
next = entries[current].Next;
if (entries[current].Key == key)
{
// Found! Remove from linked list.
if (prev != -1)
entries[prev].Next = next;
else
this.pBuckets[bucketIndex] = next;
// We now add the removed node to the free list,
// so we can use it later if we add new elements.
entries[current].Next = this.pFirstFreeEntry;
this.pFirstFreeEntry = current;
// Decrements total number of items.
--this.pCount;
return true;
}
prev = current;
current = next;
}
return false;
}
}
If you wonder whether this implementation is good or not: it is very similar to what the .NET Framework does for the Dictionary class :)
To make it a hash set, just remove the T and you have a hash set of integers.
If you need to get hash codes for generic objects, just use x.GetHashCode() or provide your own hash code function.
To write iterators you would need to modify several things, but I don't want to add too many other things to this post :)
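For what it's worth, a quick usage sketch of the class above (assuming it compiles as-is; the values are arbitrary):
var dict = new FixedIntDictionary<string>(101); // prime bucket count, as the summary suggests
dict.Add(42, "answer");
bool found = dict.Contains(42); // true
dict.Remove(42);                // true, and 42 is gone afterwards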

stack collection missing shift and unshift in C# 2.0

Bizarrely the stack collection seems to be missing the rather basic shift and unshift methods* and I'm working in 2.0 so I can't just extend them.
Is there any reasonable technique or alternative collection class to get these methods available? I need push and pop as well.
Edit: looks like the collection I want is indeed a deque which is happily not native to C# :(
Can't use third party libraries at this time so I'll be going with the clunky LinkedList (I say clunky because reading and removing are two operations where shift would be one) but I think I'd recommend the PowerCollections approach to anyone who could use it. Or better yet, upgrading to extension methods.
sigh
* Apologies, I didn't realise these were uncommon terms, I thought I just didn't know where to find them in the API. For reference:
shift = remove first element
unshift = insert element at beginning of collection
I would say use a LinkedList<T>. It has methods for adding and removing from the front, as well as adding and removing from the back. I've never heard of shifting and unshifting, but I'm assuming that's what it means.
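A minimal C# 2.0-friendly sketch of that approach, wrapping a LinkedList<T> so each operation is a single call (the Deque name is illustrative and empty-collection checks are omitted):
public class Deque<T>
{
    private LinkedList<T> list = new LinkedList<T>();

    public void Unshift(T item) { list.AddFirst(item); } // insert at the beginning
    public T Shift()                                     // remove and return the first element
    {
        T value = list.First.Value;
        list.RemoveFirst();
        return value;
    }
    public void Push(T item) { list.AddLast(item); }     // add to the end
    public T Pop()                                       // remove and return the last element
    {
        T value = list.Last.Value;
        list.RemoveLast();
        return value;
    }
    public int Count { get { return list.Count; } }
}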
Never heard of shift/unshift in a stack. The Stack class does provide Pop, Peek, and Push though.
You are using the wrong class if you want a shift/unshift method. A stack is a Last-In First-Out (LIFO) data structure.
If you want shift/unshift without pop and push, use a Queue. If you want both, I recommend using Deque from the PowerCollections library
You can fake extension methods as long as you are using C# 3.0 targeting 2.0.
Can you describe what the shift/unshift operations are?
By definition, the Stack class represents a way of managing elements in a collection using the Last In First Out (LIFO) technique for adding and removing elements. LIFO simply means that the last element added to a collection will automatically be the first one removed.
The functionality you want from it is something custom, but it can easily be achieved in the following way:
public class MyStack<T> : Stack<T>
{
    // Insert an element at the bottom of the stack (the "beginning" of the collection).
    public void Unshift(T item)
    {
        List<T> items = new List<T>(this); // load stack into an internal ordered list (index 0 = top)
        this.Clear();                      // clear stack content
        items.Add(item);                   // insert into the internal list at the desired location (the bottom)
        for (int i = items.Count - 1; i >= 0; i--)
            this.Push(items[i]);           // repopulate the stack from the internal list, bottom first
    }

    // Remove and return the element at the bottom of the stack.
    public T Shift()
    {
        List<T> items = new List<T>(this);
        this.Clear();
        T removed = items[items.Count - 1]; // the last item in enumeration order is the bottom
        items.RemoveAt(items.Count - 1);
        for (int i = items.Count - 1; i >= 0; i--)
            this.Push(items[i]);
        return removed;
    }
}
and it seems that's all :)
This is not exactly the best, but it comes close to being a JavaScript array with shift/unshift and push/pop. It does not hide the inner workings, and you can index any item you want. It has the basic functionality, though.
public class JSList<T> : List<T>
{
    public JSList() : base() { }

    /// <summary>
    /// Adds an item to the start of the list (JavaScript unshift)
    /// </summary>
    /// <param name="v"></param>
    public void Unshift(T v)
    {
        this.Insert(0, v);
    }

    /// <summary>
    /// Removes and returns the item at the start of the list (JavaScript shift)
    /// </summary>
    /// <returns></returns>
    public T Shift()
    {
        var toreturn = default(T);
        if (this.Count > 0)
        {
            toreturn = this[0];
            this.RemoveAt(0);
        }
        return toreturn;
    }

    /// <summary>
    /// Adds an object to the end of the list
    /// </summary>
    /// <param name="v"></param>
    public void Push(T v)
    {
        this.Add(v);
    }

    /// <summary>
    /// Removes and returns the item at the end of the list
    /// </summary>
    /// <returns></returns>
    public T Pop()
    {
        var toreturn = default(T);
        if (this.Count > 0)
        {
            toreturn = this[this.Count - 1];
            this.RemoveAt(this.Count - 1);
        }
        return toreturn;
    }

    public T Peek()
    {
        return this[this.Count - 1];
    }
}
Shift ==> Stack.Pop
Unshift ==> Stack.Push
Unshift doesn't return the number of elements in the Stack, you have the Stack.Count property for that.
Also, there's Stack.Peek, to get the first element without removing it.
Stack<T> class
