Avoiding array duplication

According to [MSDN: Array usage guidelines](http://msdn.microsoft.com/en-us/library/k2604h5s(VS.71).aspx):
Array Valued Properties
You should use collections to avoid code inefficiencies. In the following code example, each call to the myObj property creates a copy of the array. As a result, 2n+1 copies of the array will be created in the following loop.
[Visual Basic]
Dim i As Integer
For i = 0 To obj.myObj.Count - 1
DoSomething(obj.myObj(i))
Next i
[C#]
for (int i = 0; i < obj.myObj.Count; i++)
DoSomething(obj.myObj[i]);
Other than changing myObj[] to ICollection myObj, what else would you recommend? I just realized that my current app is leaking memory :(
Thanks.
EDIT: Would forcing C# to pass references with ref (safety aside) improve performance and/or memory usage?

No, it isn't leaking memory - it is just making the garbage collector work harder than it might. Actually, the MSDN article is slightly misleading: if the property created a new collection every time it was called, it would be just as bad (memory wise) as with an array. Perhaps worse, due to the usual over-sizing of most collection implementations.
If you know a method/property does work, you can always minimise the number of calls:
var arr = obj.myObj; // var since I don't know the type!
for (int i = 0; i < arr.Length; i++) {
DoSomething(arr[i]);
}
or even easier, use foreach:
foreach(var value in obj.myObj) {
DoSomething(value);
}
Both approaches only call the property once. The second is clearer IMO.
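A minimal sketch to make the copy count concrete. It assumes a hypothetical Holder class whose Items property returns a fresh clone of a private array on every read, which is the behaviour the MSDN guideline describes. CopyCount is an illustrative counter, not part of any real API.

```csharp
// Hypothetical type: the getter hands out a defensive copy on every read.
public class Holder
{
    private readonly int[] _items = { 10, 20, 30 };

    // Number of copies the getter has made so far (illustration only).
    public int CopyCount { get; private set; }

    public int[] Items
    {
        get
        {
            CopyCount++;
            return (int[])_items.Clone();
        }
    }
}

public static class CopyDemo
{
    // Reads the property in the loop condition and in the indexer: 2n + 1 copies.
    public static int SumIndexedOnProperty(Holder h)
    {
        int sum = 0;
        for (int i = 0; i < h.Items.Length; i++)
            sum += h.Items[i];
        return sum;
    }

    // Reads the property once into a local: exactly one copy.
    public static int SumHoisted(Holder h)
    {
        int[] arr = h.Items;
        int sum = 0;
        for (int i = 0; i < arr.Length; i++)
            sum += arr[i];
        return sum;
    }

    // foreach evaluates the collection expression only once: one copy.
    public static int SumForeach(Holder h)
    {
        int sum = 0;
        foreach (int value in h.Items)
            sum += value;
        return sum;
    }
}
```

With three elements, SumIndexedOnProperty triggers 7 copies (the loop condition runs 4 times and the indexer 3 times, i.e. 2n + 1), while SumHoisted and SumForeach each trigger exactly one. Two reads of Items also return different array instances.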
Other thoughts: make it a method, i.e. obj.SomeMethod(). This sets the expectation that it does work, and avoids the undesirable obj.Foo != obj.Foo (which would be the case for arrays).
Finally, Eric Lippert has a good article on this subject.

Just as a hint for those who haven't used the ReadOnlyCollection mentioned in some of the answers:
[C#]
using System;
using System.Collections.ObjectModel;

class XY
{
    private X[] array;
    public ReadOnlyCollection<X> myObj
    {
        get
        {
            return Array.AsReadOnly(array);
        }
    }
}
Hope this might help.
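A small sketch of what that wrapper does and does not guarantee, using a hypothetical Owner class. Array.AsReadOnly wraps the existing array without copying its elements, so later changes made by the owner show through. An attempt to Add through the IList<T> interface throws NotSupportedException.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Owner
{
    private readonly int[] _array = { 1, 2, 3 };

    // Wraps the private array; the elements are not copied.
    public ReadOnlyCollection<int> Items => Array.AsReadOnly(_array);

    // Hypothetical mutator, only here to show the wrapper is a live view.
    public void SetFirst(int value) => _array[0] = value;
}

public static class ReadOnlyDemo
{
    // Returns true if the collection rejects an Add made through IList<T>.
    public static bool RejectsAdd(ReadOnlyCollection<int> items)
    {
        IList<int> asList = items;
        try
        {
            asList.Add(4);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

Each read of Items still allocates a small wrapper object, but never a copy of the elements.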

Whenever I have properties that are costly (like recreating a collection on each call), I either document the property, stating that each call incurs a cost, or I cache the value in a private field. Costly property getters should be written as methods.
Generally, I try to expose collections as IEnumerable rather than arrays, forcing the consumer to use foreach (or an enumerator).

It will not make copies of the array unless you make it do so. However, simply passing the reference to an array privately owned by an object has some nasty side-effects. Whoever receives the reference is basically free to do whatever he likes with the array, including altering the contents in ways that cannot be controlled by its owner.
One way of preventing unauthorized meddling with the array is to return a copy of the contents. Another (slightly better) is to return a read-only collection.
Still, before doing any of these things you should ask yourself whether you are about to give away too much information. In some cases (actually, quite often) it is even better to keep the array private and instead provide methods on the owning object that operate on it.

myObj will not create a new item unless you explicitly create one. So, for better memory usage, I recommend keeping a private collection (a List or any other) and exposing an indexer that returns the specified value from that private collection.
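A sketch of that suggestion with a hypothetical ItemStore class: the list stays private, and callers get element access through a read-only indexer plus a Count property, so no copy of the collection is ever handed out.

```csharp
using System.Collections.Generic;

// Hypothetical container: the list itself is never exposed.
public class ItemStore
{
    private readonly List<string> _items = new List<string>();

    public void Add(string item) => _items.Add(item);

    // Read-only indexer: returns one element without exposing the list.
    public string this[int index] => _items[index];

    public int Count => _items.Count;
}
```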

Related

What is the benefit of using a local variable?

I keep seeing examples online where a property of an element within a method is copied to a local variable before use. For example, something like this (from Microsoft's StackPanel source code):
UIElementCollection children = arrangeElement.InternalChildren;
...
for (int i = 0, count = children.Count; i < count; ++i)
{
UIElement child = (UIElement)children[i];
if (child == null) { continue; }
...
}
Can anyone explain to me what the benefit of doing that is (if there is one), rather than accessing the property directly each time, like this?:
for (int i = 0, count = arrangeElement.InternalChildren.Count; i < count; ++i)
{
UIElement child = (UIElement)arrangeElement.InternalChildren[i];
if (child == null) { continue; }
...
}
Clearly, it saves a few characters on the screen, but that's not much of a reason to do this. Also, I understand why we might want to do this with a long running method, as a form of caching:
double value = GetValueFromLongRunningMethod();
...
for (int i = 0; i < someCollection.Count; i++) DoSomethingWith(value);
But I see this done with properties a lot and wonder why. Here's another commonly found example from the internet to do with virtualization:
IItemContainerGenerator generator = this.ItemContainerGenerator;
GeneratorPosition position = generator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Finally, if this is done for the same reason that we might cache the result of a long running method, then how are we supposed to know which properties need to be accessed in this way?

Firstly, it avoids calling .InternalChildren many times. This could be a small but noticeable reduction in virtual calls (since it is used in a loop), but in some cases it might be much more significant. A property that returns a collection or array might allocate every time it is called; DataRow.ItemArray is a classic example of this, so calling it on every iteration is actively harmful. An additional consideration is that even if it returns the same array each time it is called, there is JIT magic that elides bounds checking, but it only works if the JIT can see that you are iterating a single array for the entire duration. If you stick a property accessor in the middle, this won't be obvious and the bounds-check removal won't happen. It also might not happen if you've manually hoisted the upper bound!
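To illustrate the allocating-property case, here is a sketch using DataRow.ItemArray from System.Data, whose getter builds a new object[] on each call. The one-row table and the CountNonNull helpers are made up for the example.

```csharp
using System.Data;

public static class ItemArrayDemo
{
    // Builds a one-row table to experiment with.
    public static DataRow CreateRow()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        return table.Rows.Add(1, "first");
    }

    // Anti-pattern: ItemArray is read, and a new array allocated, on every iteration.
    public static int CountNonNullSlow(DataRow row)
    {
        int count = 0;
        for (int i = 0; i < row.ItemArray.Length; i++)
            if (row.ItemArray[i] != null) count++;
        return count;
    }

    // Hoisted: one allocation, then plain array indexing.
    public static int CountNonNullFast(DataRow row)
    {
        var values = row.ItemArray;
        int count = 0;
        for (int i = 0; i < values.Length; i++)
            if (values[i] != null) count++;
        return count;
    }
}
```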
Side note: if it isn't an array, then foreach would usually be preferable, and there would be no advantage to introducing a local, because foreach evaluates the collection expression only once.
Note: since you're using .Count rather than .Length, this definitely isn't an array, and you should probably simplify to:
foreach (UIElement child in arrangeElement.InternalChildren) { ... }
or
foreach (var child in arrangeElement.InternalChildren) { ... }
(With var, child is typed as object, because UIElementCollection only implements the non-generic IEnumerable; the first form performs the cast for you.)
Not only does this remove this question completely, but it means that the type's own iterator (which might be an optimized struct iterator, or might be a simple IEnumerable<T> class, such as a compiler-generated iterator block) can be used. This usually has more direct access to the internals, and thus bypasses a few indirections and API checks that indexers require.

It can be useful in some cases, for example when you:
- need to debug a piece of code and want to see the value of the variable immediately;
- perform several operations on an object that requires a cast, so that you cast it only once;
- work with value types: a local copy lets you modify the copy without changing the value of the class's property.
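A sketch of the value-type point, with a hypothetical mutable struct MutablePoint exposed through a Shape property: the local variable receives a copy of the struct, so changing it leaves the property's value untouched.

```csharp
// Hypothetical mutable struct and owner, for illustrating copy semantics only.
public struct MutablePoint
{
    public int X;
    public int Y;
}

public class Shape
{
    public MutablePoint Origin { get; set; }
}

public static class LocalCopyDemo
{
    // Works on a local copy; shape.Origin itself is not modified.
    public static MutablePoint MoveCopy(Shape shape, int dx)
    {
        MutablePoint local = shape.Origin; // copies the struct
        local.X += dx;                     // changes only the copy
        return local;
    }
}
```

Writing shape.Origin.X += dx directly does not compile (error CS1612), because the getter returns a temporary copy rather than a variable.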

Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Let's get very abstract about this:
We get a generator. That apparently is this.ItemContainerGenerator for now, but that could change.
We use it. Only once here, but usually in multiple statements.
When we later decide to get that generator elsewhere, the usage should stay the same.
The example is too small to make this convincing, but there is some kind of logic to be discerned here.

C# create List<T> in initialization vs get

I was wondering what the best way is to create a list in a certain object.
1) DefA "always" occupies memory beforehand even if it is never called, right?
2) DefB will "always" have to check for the null condition, or does the compiler optimize this?
3) Is there a better way to implement this?
Thanks
private List<A> _defA = new List<A>();
public List<A> DefA
{
get { return _defA; }
}
private List<B> _defB;
public List<B> DefB
{
get
{
if (_defB == null)
_defB = new List<B>();
return _defB;
}
}

Because I don't think either option will noticeably affect the performance of your application, my suggestion is to choose whichever keeps the code cleaner.
Use the Lazy<T> type (see Lazy<T> on MSDN).
From MSDN about Lazy initialization:
By default, Lazy<T> objects are thread-safe. That is, if the constructor does not specify the kind of thread safety, the Lazy<T> objects it creates are thread-safe. In multi-threaded scenarios, the first thread to access the Value property of a thread-safe Lazy<T> object initializes it for all subsequent accesses on all threads, and all threads share the same data. Therefore, it does not matter which thread initializes the object, and race conditions are benign.
So in your case:
private Lazy<List<A>> _defA = new Lazy<List<A>>(() => new List<A>());
public List<A> DefA
{
get
{
return _defA.Value;
}
}
In addition, this approach communicates your intent to other developers who may work with your code.
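A sketch of the Lazy<T> approach with an IsDefACreated property added purely so the example can observe when the list gets built; A stands in for the question's element type and LazyOwner is a hypothetical class name.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the question's element type.
public class A { }

public class LazyOwner
{
    // Created on first access to DefA; the default Lazy<T> mode is thread-safe.
    private readonly Lazy<List<A>> _defA = new Lazy<List<A>>(() => new List<A>());

    public List<A> DefA => _defA.Value;

    // Exposed only so the example can observe when the list is created.
    public bool IsDefACreated => _defA.IsValueCreated;
}
```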

In this specific example, the delayed (lazy) instantiation might save a few milliseconds on startup, but at the risk of issues in a multi-threaded scenario.
Say two threads call the DefB getter almost simultaneously: both may see null, and they might end up setting _defB twice, instead of the once that you intend.
_defA will always take the memory of an empty list, as I understand it, yes, so you'll save some memory the second way if it's never called. But it does make the code MUCH harder to understand. Also, what if a local piece of code doesn't call the accessor, but just does _defB.Add() or whatever? (That might not be deliberate now, but because the code is more complex it's easy to forget or miss in the future.)
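If you keep the hand-written null check of DefB, one way to close that race is LazyInitializer.EnsureInitialized from System.Threading, sketched here with a hypothetical LazyHolder class and B standing in for the question's element type. If two threads race, both factories may run, but only one list is stored and both callers receive that same instance.

```csharp
using System.Collections.Generic;
using System.Threading;

// Stand-in for the question's element type.
public class B { }

public class LazyHolder
{
    private List<B> _defB;

    // Thread-safe lazy getter: at most one list is ever published to _defB.
    public List<B> DefB => LazyInitializer.EnsureInitialized(ref _defB, () => new List<B>());
}
```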

First of all, don't optimize something that doesn't need optimizing.
If you're creating thousands or millions of the object that contains that property, and this property is seldom used and thus seldom needed, then yes, adding lazy on-demand initialization is probably a good idea. I say probably because there may be other performance-related issues as well.
However, to answer your specific questions, other than "what is the best way":
The initialization of _defA will construct a List<A> object even if the property is never used, that is correct.
The getter method of DefB will always do the null check, that is also correct. The compiler cannot optimize this away.
As for "better way"? That part of the question falls into the "primarily opinion-based" close option here on Stack Overflow. It depends largely on what you determine is better:
- More expressive syntax (shorter code)
- Less memory spent (option B)
- Less code in the getter (option A)
I can give you an alternative to the syntax in option A:
public List<A> DefA
{
get;
} = new List<A>();
This syntax is available in Visual Studio 2015 with C# 6 (even if you compile for older .NET runtime versions) and is called an auto-property initializer.
The compiler will automagically create the backing field for you (the _defA equivalent) and mark it read-only, so feature-wise this is identical to option A; it's just a different syntax.

Push Item to the end of an array

No, I can't use generic collections. What I am trying to do is actually pretty simple. In PHP I would do something like this:
$foo = [];
$foo[] = 1;
What I have in C# is this:
var foo = new int[10];
// yeah that's pretty much it
Now I could do something like foo[foo.Length - 1] = 1, but that obviously won't work. Another option is foo[foo.Count(x => x.HasValue)] = 1 along with a nullable int during declaration. But there has to be a simpler way around this trivial task.
This is homework and I don't want to explain to my teacher (and possibly the entire class) what foo[foo.Count(x => x.HasValue)] = 1 is and why it works etc.

The simplest way is to create a new class that holds the index of the inserted item:
public class PushPopIntArray
{
private int[] _vals = new int[10];
private int _nextIndex = 0;
public void Push(int val)
{
if (_nextIndex >= _vals.Length)
throw new InvalidOperationException("No more values left to push");
_vals[_nextIndex] = val;
_nextIndex++;
}
public int Pop()
{
if (_nextIndex <= 0)
throw new InvalidOperationException("No more values left to pop");
_nextIndex--;
return _vals[_nextIndex];
}
}
You could add overloads to get the entire array, or to index directly into it if you wanted. You could also add overloads or constructors to create different sized arrays, etc.

In C#, arrays cannot be resized in place. You can use Array.Resize (but it allocates a new array and copies the elements, which will probably be bad for performance) or use the ArrayList type instead.
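A sketch of the Array.Resize route under the no-generics constraint, assuming a hypothetical IntStack class that keeps a count next to the array. Doubling the capacity when it runs out, instead of resizing on every push, keeps the copying cost down.

```csharp
using System;

// Hypothetical array-only stack: a count plus an array that grows by doubling.
public class IntStack
{
    private int[] _items = new int[4];
    private int _count;

    public int Count => _count;

    public void Push(int value)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2); // new array, elements copied
        _items[_count++] = value;
    }

    // Returns a copy holding only the elements pushed so far.
    public int[] ToArray()
    {
        int[] result = new int[_count];
        Array.Copy(_items, result, _count);
        return result;
    }
}
```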

But there has to be a simpler way around this trivial task.
Nope. Not all languages make everything as easy as each other; this is why collections were invented. C# <> Python <> PHP <> Java. Pick whichever suits you better, but equivalent effort isn't always the case when moving from one language to another.

foo[foo.Length] won't work because index foo.Length is outside the array; the last item is at index foo.Length - 1.
Beyond that, an array is a fixed-size structure. If you expect it to work the same as in PHP, you're just plainly wrong.

Originally I wrote this as a comment, but I think it contains enough important points to warrant writing it as an answer.
You seem to be under the impression that C# is an awkward language because you stubbornly insist on using an array while having the requirement that you should "push items onto the end", as evidenced by this comment:
Isn't pushing items into the array kind of the entire purpose of the data structure?
To answer that: no, the purpose of the array data structure is to have a contiguous block of pre-allocated memory to mimic the original array structure in C(++) that you can easily index and perform pointer arithmetic on.
If you want a data structure that supports certain operations, such as pushing elements onto the end, consider a System.Collections.Generic.List<T>, or, if you insist on avoiding generics, a System.Collections.ArrayList. The whole point of the C# collection library is that you don't have to concern yourself with the storage details: List<T> documents the cost of its operations (e.g. adding at the end is amortized O(1), inserting elsewhere is O(n), and retrieval by index is O(1), just like an array), and it manages and grows its internal array for you as the list changes size.
Don't try to compare PHP and C# by comparing PHP arrays with C# arrays - they have different programming paradigms and the way to solve a problem in one does not necessarily carry over to the other.
To answer the question as written, I see two options then:
- Use arrays the awkward way. Either create an array of Nullable<int>s and accept some unpleasant LINQ statements for insertion, or keep an additional counter (preferably wrapped up in a class together with the array) to keep track of the last assigned element.
- Use a proper data structure with appropriate guarantees on the operations that matter, such as List<T>, which is effectively the (much better, optimised) built-in version of the second option above.
I understand that the latter option is not feasible for you because of the constraints imposed by your teacher, but then do not be surprised that things are harder than the canonical way in another language, if you are not allowed to use the canonical way in this language.
Afterthought:
A hybrid alternative that just came to mind is using a List for storage and then calling .ToArray on it: in your insert method, just Add to the list and return the new array.

C# Growing List and Pointers to Elements

I need to have a growing array or list (the built-in ones are sufficient). Furthermore, I need to be able to manipulate elements in the array through pointers to a specific element, as in the following code:
List<int> l1=new List<int>();
List<bool> l2=new List<bool>();
l1.Add(8);
l2.Add(true);
l1.Add(234);
l2.Add(true);
Console.WriteLine(l1[0]); //output=8
int* pointer = (int *) l1[0];
Console.WriteLine(*pointer); //Needs to output 8
Console.WriteLine(l2[0]); //output=true
bool* pointer2 = (bool *) l2[0];
Console.WriteLine(*pointer2); //Needs to output true
Thanks in advance for any help

I'm trying to use an array to store packet data and pass it off to threads; these threads need to be able to modify the data without trashing the array.
In this case, I would just pass your List<T> into the threaded routine, as well as a starting and ending index that thread should use.
Provided you always work by index, and stay within the bounds, there shouldn't be any problem with "trashing the array."
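A sketch of the index-range idea, written with an int[] and plain Thread objects; DoubleAll and the doubling operation are hypothetical stand-ins for real packet processing. It uses an array rather than a List<T> because writes to distinct array elements from different threads are independent, whereas List<T> makes no thread-safety promises for concurrent modification. Each thread touches only its own non-overlapping range, and Join makes the caller wait for every worker.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class RangeWorkers
{
    // Splits the array into contiguous, non-overlapping ranges, one per thread.
    public static void DoubleAll(int[] data, int threadCount)
    {
        var threads = new List<Thread>();
        int chunk = (data.Length + threadCount - 1) / threadCount; // ceiling division

        for (int t = 0; t < threadCount; t++)
        {
            int start = t * chunk;                          // inclusive
            int end = Math.Min(start + chunk, data.Length); // exclusive
            var thread = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                    data[i] *= 2; // only this thread writes indices in [start, end)
            });
            threads.Add(thread);
            thread.Start();
        }

        foreach (var worker in threads)
            worker.Join(); // wait for all workers before the caller reads the data
    }
}
```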

First off: you are applying a C++ approach to a problem that is solved differently in C#. In C# you generally don't want to do things involving explicit pointers, because they make life difficult, especially as it pertains to garbage collection.
That said, if you must do it this way, what you'd want to do is pass the entire list (and maybe the index) as a parameter, along with an offset, to the other thread. You would also want to be sure to lock the list appropriately in all accessing threads, to avoid dirty reads/writes.
The right solution is just to pass the item that you actually want to process. Reference types are passed by value, but that just means a copy of the reference (pointer) to the same heap object is passed. It doesn't actually create a new object on the heap.
So for example:
var myList = new List<MyClass> { someInstanceofMyClass1, someInstanceofMyClass2 };
var t = new Thread(()=> SomeMethod(myList[0])); // Assuming MyClass is a reference type, the value passed here is the same instance as the one in myList
t.Start();
...

It already works the way you want for reference types. Therefore one potential solution is to create a class to box these values as reference types. If the items are related by index (and I suspect they are) it's a good idea to keep one list to hold a type that groups both values rather than two lists anyway. Going with that:
public class MyClass
{
    public int IntValue { get; set; }
    public bool BoolValue { get; set; }

    public MyClass(int intValue, bool boolValue)
    {
        IntValue = intValue;
        BoolValue = boolValue;
    }
}
List<MyClass> l1 = new List<MyClass>();
l1.Add(new MyClass(8, true));
MyClass pointer = l1[0];
Console.WriteLine(pointer.IntValue); //writes 8
Console.WriteLine(pointer.BoolValue); //writes True
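A sketch of the aliasing this relies on, using a hypothetical Entry class with the same shape as MyClass so the example stands alone: changing the object through a second variable is visible through the list, because both refer to the same heap object.

```csharp
using System.Collections.Generic;

// Same shape as MyClass, redeclared under a hypothetical name.
public class Entry
{
    public int IntValue { get; set; }
    public bool BoolValue { get; set; }

    public Entry(int intValue, bool boolValue)
    {
        IntValue = intValue;
        BoolValue = boolValue;
    }
}

public static class ReferenceDemo
{
    // Mutates the element through a second reference and returns the list.
    public static List<Entry> MutateThroughAlias()
    {
        var list = new List<Entry> { new Entry(8, true) };
        Entry alias = list[0];   // copies the reference, not the object
        alias.IntValue = 234;
        alias.BoolValue = false;
        return list;             // list[0] now shows 234 / false
    }
}
```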

If you are using .NET 4 or later, you might want to look into the classes in the System.Collections.Concurrent namespace. They provide thread-safe data structures that might help you achieve your goal with less code.
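As a hedged illustration (the two-byte "packets" are made up), ConcurrentQueue<T> lets several producers enqueue at the same time without any explicit lock, and a consumer drains it with TryDequeue.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class PacketQueueDemo
{
    // Several producers enqueue concurrently, then one consumer drains the queue.
    public static int ProduceAndDrain(int producers, int packetsEach)
    {
        var queue = new ConcurrentQueue<byte[]>();

        Parallel.For(0, producers, p =>
        {
            for (int i = 0; i < packetsEach; i++)
                queue.Enqueue(new byte[] { (byte)p, (byte)i }); // hypothetical packet
        });

        int drained = 0;
        while (queue.TryDequeue(out _))
            drained++;
        return drained;
    }
}
```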

For what you are trying to accomplish (network packets), it sounds like you would benefit from a System.IO.BinaryWriter instead of a List. (You could even pass a NetworkStream to a BinaryWriter)
It supports seeking so you can go back and re-write, and you can't trash the array since it grows automatically.
Performance-wise, I would assume BinaryWriter is faster than a List, since it writes to an underlying Stream, and MemoryStream is faster than a List<byte>.
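A sketch of the seek-and-rewrite pattern, assuming a hypothetical packet layout of a 4-byte length header followed by the payload: the header is written as a placeholder, the payload is appended, and the writer seeks back to fill in the real length. The MemoryStream grows as needed.

```csharp
using System.IO;

public static class PacketWriterDemo
{
    // Hypothetical layout: [int32 payload length][payload bytes]
    public static byte[] BuildPacket(byte[] payload)
    {
        using var stream = new MemoryStream();      // grows automatically
        using var writer = new BinaryWriter(stream);

        writer.Write(0);                            // placeholder length (4 bytes)
        writer.Write(payload);                      // raw payload bytes
        writer.Seek(0, SeekOrigin.Begin);           // back to the header
        writer.Write(payload.Length);               // overwrite the placeholder
        writer.Flush();

        return stream.ToArray();
    }
}
```

BinaryWriter always writes integers in little-endian order, so for a 3-byte payload the result starts with 3, 0, 0, 0.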

Return collection as read-only

I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
get
{
return data;
}
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?

If your underlying data is stored as a List<T>, you can use the List<T>.AsReadOnly method.
If your data can be enumerated, you can use the Enumerable.ToList method to copy your collection into a List<T> and call AsReadOnly on it.
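A sketch of both suggestions side by side, with a hypothetical DataHolder class: AsReadOnly on the live list is a view that reflects later changes, while ToList().AsReadOnly() copies the elements first and so behaves as a snapshot. In the multi-threaded case from the question, either call should be made while holding the read lock.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public class DataHolder
{
    private readonly List<string> _data = new List<string> { "a", "b" };

    // Live read-only view over the underlying list (no copy).
    public ReadOnlyCollection<string> LiveView => _data.AsReadOnly();

    // Enumerable.ToList copies the elements, so this is a snapshot.
    public ReadOnlyCollection<string> Snapshot => _data.ToList().AsReadOnly();

    public void Add(string item) => _data.Add(item);
}
```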

I voted for your accepted answer and agree with it, but might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage is that you can't add code to collections, so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.
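A sketch of the idea with hypothetical Invoice and InvoiceBook classes; PayUnpaidInvoices is simplified here and omits the payer and account parameters mentioned above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain types.
public class Invoice
{
    public decimal Amount { get; }
    public bool IsPaid { get; private set; }

    public Invoice(decimal amount) => Amount = amount;

    public void MarkPaid() => IsPaid = true;
}

public class InvoiceBook
{
    private readonly List<Invoice> _invoices = new List<Invoice>();

    public void Add(Invoice invoice) => _invoices.Add(invoice);

    // The query lives next to the data instead of being repeated by callers.
    public IEnumerable<Invoice> GetUnpaidInvoices() => _invoices.Where(i => !i.IsPaid);

    public decimal TotalUnpaid() => GetUnpaidInvoices().Sum(i => i.Amount);

    // A business operation instead of raw collection manipulation.
    public int PayUnpaidInvoices()
    {
        var unpaid = GetUnpaidInvoices().ToList(); // fixed list of what gets paid now
        foreach (var invoice in unpaid)
            invoice.MarkPaid();
        return unpaid.Count;
    }
}
```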

If your only intent is to keep calling code from making a mistake and modifying the collection when it should only be reading, all that is necessary is to return an interface that doesn't support Add, Remove, etc. Why not return IEnumerable<string>? Calling code would have to cast, which it is unlikely to do without knowing the internals of the property it is accessing.
If however your intent is to prevent the calling code from observing updates from other threads you'll have to fall back to solutions already mentioned, to perform a deep or shallow copy depending on your need.
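A small sketch of the limitation behind "would have to cast", using a hypothetical Source class: the IEnumerable<string> return type only discourages changes, since the object handed out is still the underlying List<string>, and a caller that casts it back can modify the private list.

```csharp
using System.Collections.Generic;

public class Source
{
    private readonly List<string> _data = new List<string> { "x" };

    // Typed as IEnumerable<string>, but the object returned is the List itself.
    public IEnumerable<string> Data => _data;

    public int Count => _data.Count;
}
```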

I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
Your implementation depends on your needs:
- If you don't care about the caller (Class B) seeing any further changes to the collection, then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
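A sketch of the clone-and-hand-out option combined with the ReaderWriterLockSlim from the question, using a hypothetical SharedData class: the copy is taken while holding the read lock, and the caller receives a read-only wrapper over that copy, so later writes neither show up in nor race with the caller's view.

```csharp
using System.Collections.Generic;
using System.Threading;

public class SharedData
{
    private readonly List<string> _data = new List<string>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(string item)
    {
        _lock.EnterWriteLock();
        try { _data.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    // Copies under the read lock, then wraps the copy as read-only.
    public IList<string> Data
    {
        get
        {
            _lock.EnterReadLock();
            try { return new List<string>(_data).AsReadOnly(); }
            finally { _lock.ExitReadLock(); }
        }
    }
}
```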

One should note that aku's answer will only protect the list as being read only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read only list.

You can use a copy of the collection instead.
public IList<string> Data
{
    get
    {
        return new List<string>(data);
    }
}
That way it doesn't matter if it gets updated (the copy itself should still be taken while holding your read lock).

You want to use the yield keyword. You loop through the list and return the results with yield return. This allows the consumer to use foreach without being able to modify the collection.
It would look something like this:
List<string> _Data;
public IEnumerable<string> Data
{
    get
    {
        foreach (string item in _Data)
        {
            yield return item;
        }
    }
}
