Best practice way to store collection

I am porting a C# application to C++, and I have the following class (where Box is a struct, and the BoxStore will be a global, long-lived object in the app):
public class BoxStore
{
    private List<Box> boxes;
    ...
    public List<Box> GetBoxes()
    {
        return this.boxes;
    }
}
I'm planning to store the boxes collection in a std::vector in C++. There are multiple ways to define the collection:
std::vector<Box> boxes;
std::shared_ptr<std::vector<Box>> boxes;
std::vector<Box>& boxes;
(std::vector<Box>* boxes;)
What is the best way to go, if there is one? I guess the last option (storing a raw pointer to the collection) is the worst solution, without any benefit (hence the parentheses).
And what is the best approach to port the GetBoxes method? Of course this depends on the way of storing the collection. I can see multiple approaches here too:
(std::vector<Box> GetBoxes();)
std::shared_ptr<std::vector<Box>> GetBoxes();
std::vector<Box>* GetBoxes();
std::vector<Box>& GetBoxes();
The first solution seems incorrect, because the vector would get copied upon return, so the caller couldn't modify the original collection.
However, the other three approaches seem equally good to me. The BoxStore instance is long-lived and is not destroyed while the app is running, so the caller won't have ownership of the collection. Does this mean that returning a shared_ptr is semantically incorrect? (It is always the BoxStore object that frees the collection.)
And is there a significant difference between returning a raw pointer or a reference?

This may be what you are looking for.
BoxStore really owns the objects, so no pointers are needed.
I'm assuming that the individual Box objects and the list won't outlive the store.
If you do have that requirement, you might need to consider using pointers.
Returning by reference is not really good design, since it violates encapsulation. If you didn't have the constraint of allowing clients to modify the list, I would have returned a copy of the list instead.
#include <list>
class Box
{
    ...
};
class BoxStore
{
private:
    std::list<Box> boxes;
public:
    // Returns a reference so that callers can modify the stored list
    std::list<Box>& GetBoxes()
    {
        return boxes;
    }
};

Related

Am I missing some benefits of static fields? [closed]

I am working through C# in a Nutshell by Joseph Albahari and Ben Albahari (which has been a great book, by the way) and am reading the topic on static fields in C#. They have this example code:
public class Panda
{
    public string name;
    public static int population;
    public Panda(string n)
    {
        name = n;
        population = population + 1;
    }
}
So I understand that the more instances of Panda you create, the greater population becomes, since it is shared amongst all objects of type Panda. But now to my question.
Why? I just can't understand why I would ever want to utilize such behavior in an application. It seems like a confusing way to track a global variable within the object itself. Am I misunderstanding the potential benefits of a static field? What are some cases where this would be useful and not confusing?

I think it's best to review what happens under the hood first.
A static class is never instantiated. Instead, its static members exist exactly once at runtime (per application domain): they are initialized before the type is first used and are shared from then on. This can come in handy if you want to, say, lazy load a shared resource. The compiler and runtime also guarantee that there is one and only one copy of that state at all times.
If the class is not static but has static members, you can construct new instances, and a single shared copy of each static member is maintained for you in the background. This is useful when you need to keep track of something, share something across instances, or even share it with other code if you make the member public.
In terms of performance, for instance, a counter can be really useful if you need to speed up your program and realize (through the object count) that you are instantiating an object that never changes 100 times. Or maybe you want to show your user how many Pandas have been born. You could in theory keep the count somewhere else, but then you would need another object anyway, so it makes sense to keep all related information and logic together. Besides, you could have a more general type with several derived types, and you may want to track all of them without having to keep adding logic.
Consider the following example:
public abstract class Animal
{
    private static int _count;
    protected Animal()
    {
        IncrementCount();
    }
    protected static void IncrementCount()
    {
        _count++;
    }
    public int WorldPopulation()
    {
        return _count;
    }
}
public class Dog : Animal
{
}
public class Cat : Animal
{
}
public class Bird : Animal
{
}
If I were to create a Dog, a Cat and a Bird instance and then call WorldPopulation() on any of them, I would get 3.
The Singleton pattern is also commonly implemented using this approach. It allows you to maintain a single instance while containing the construction internally:
public class SingletonSample
{
    private SingletonSample()
    {
    }
    private static SingletonSample _instance;
    public static SingletonSample Instance
    {
        get
        {
            if(_instance == null)
                _instance = new SingletonSample();
            return _instance;
        }
    }
    public bool IsThisTrue()
    {
        return true;
    }
}
Notice that you can't call the IsThisTrue() method via the class name; you need an instance, and one cannot be created directly. It can only be created internally by the class itself:
//Object construction occurs the first time you access the "Instance" property
SingletonSample.Instance.IsThisTrue();
I hope that helps.
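One caveat about the singleton above: the null check in the Instance getter is not thread-safe, so two threads that reach it at the same time could each construct an instance. A minimal sketch of one common alternative, assuming .NET 4 or later, lets Lazy<T> handle the one-time construction. The class name LazySingletonSample is hypothetical and only mirrors SingletonSample.

```csharp
using System;

public sealed class LazySingletonSample
{
    // Lazy<T> runs the factory at most once, even if several threads
    // request the value at the same time (its default thread-safety mode).
    private static readonly Lazy<LazySingletonSample> _instance =
        new Lazy<LazySingletonSample>(() => new LazySingletonSample());

    // Private constructor: only the class itself can create the instance.
    private LazySingletonSample()
    {
    }

    // The instance is still created on first access, as in the original.
    public static LazySingletonSample Instance
    {
        get { return _instance.Value; }
    }

    public bool IsThisTrue()
    {
        return true;
    }
}
```

Usage is unchanged: LazySingletonSample.Instance.IsThisTrue(). Whether the extra safety matters depends on whether the first access can happen from more than one thread.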

I just can't understand why I would ever want to utilize such behavior in an application.
You'd never want to know the panda count in a game? What about the high score?
Now, whether static fields are the best approach is a different matter - there are alternative patterns, but they tend to be far more complex to build and manage.

Short answer:
Consider that a cache is a place to store the result of computation. Caches are useful whenever the computation is expensive, but the storage is cheap. In C#, a static variable is just a cache for computations about a live system.
Longer answer:
Theoretically, we could discover anything that we wanted to know about a running system by searching for all objects, and then performing a computation with respect to some subset. Since this is exactly what the garbage collector does, a hypothetical CLI that provided the right hooks into the garbage collector would obviate the need for static variables.
For example, suppose we wanted to know how many Widget objects we’ve created. Well, all we would need to do is ask the GC for a list of all of the live objects, then filter the list for objects of type Widget, and then count the Widgets.
But there are a couple of problems in the example: Firstly, some of the Widget objects might not be live (not reachable, thus not able to undergo state changes), but we would need to keep them around just for counting purposes. Even if the size of each Widget instance was only a single bit, we would still need 122KB of memory to keep a count of, say, one million Widgets (and since a CLR object is at least 4 bytes, we would in practice need almost 4MB just to keep track of the count). On the other hand, a 20-bit variable is enough to count up to one million. This is a savings of more than 99.99% (and roughly 99.9999% in the case of the actual CLR). Secondly, garbage collection can be an expensive operation. Even if we avoid the overhead of memory compaction, we would still need to pause the system in general.
So, hopefully, it’s now easy to see why we would want to have the ability to cache certain computations about a live system, and hence the usefulness of static variables.
Having said all that, it is often better to just recompute things rather than cache the results in static variables, because of the way CPU caching works.

Here is an example of how I used static objects.
I had a task to create an upload handler with a progress bar, and the progress bar had to be shown to all users on the site.
So I ran the upload operation in a new thread and wrote the result of the operation to a static object (the progress bar) that lives outside the thread. The progress bar is then shown to all users who are viewing the site.
More information and examples can be found here:
What is the use of static variable in C#? When to use it? Why can't I declare the static variable inside method?

Best practice for interface to allow adding, deleting etc. child objects w/ broadcasting events (similar to ObservableCollection)

I'm trying to specify an interface for a Folder. That interface should make it possible to
- Add or delete files of type IFile
- Get a list of IFile
- Broadcast events whenever a file is added/deleted/changed (e.g. for the GUI to subscribe to)
and I'm trying to find the best way to do it. So far, I have come up with three ideas:
1
public interface IFolder_v1
{
    // Interfaces cannot declare fields, so this is a property
    ObservableCollection<IFile> files { get; }
}
2
public interface IFolder_v2
{
    void add(IFile file);
    void remove(IFile file);
    IEnumerable<IFile> files { get; }
    EventHandler OnFileAdded { get; }
    EventHandler OnFileRemoved { get; }
    EventHandler OnFileDeleted { get; }
}
3
public interface IFolder_v3
{
    void add(IFile file);
    void remove(IFile file);
    IEnumerable<IFile> files { get; }
    EventHandler<CRUD_EventArgs> OnFilesChanged { get; }
}
public class CRUD_EventArgs : EventArgs
{
    public enum Operations
    {
        added,
        removed,
        updated
    }
    private Operations _op;
    public CRUD_EventArgs(Operations operation)
    {
        this._op = operation;
    }
    public Operations operation
    {
        get
        {
            return this._op;
        }
    }
}
Idea #1 seems really nice to implement, as it doesn't require much code, but it has some problems: what, for example, if an implementation of IFolder only allows adding files of specific types (say, text files) and throws an exception whenever another kind of file is added? I don't think that would be feasible with a simple ObservableCollection.
Idea #2 seems OK, but requires more code. Also, defining three separate events seems a bit tedious - what if an object needs to subscribe to all events? We'd need to subscribe to three different event handlers for that. Seems annoying.
It is also a little less easy to use than solution #1, since one now needs to call .Add to add files while the list of files is stored in .files etc. - so the naming conventions are a bit less clear than having everything bundled up in one simple sub-object (.files from idea #1).
Idea #3 circumvents all of those problems, but has the longest code. Also, I have to use a custom EventArgs class, which I can't imagine is particularly clean in an interface definition? (Also seems overkill to define a class like that for simple CRUD event notifications, shouldn't there be an existing class of some sort?)
Would appreciate some feedback on what you think is the best solution (possibly even something I haven't thought of at all). Is there any best practice?

Take a look at the Framework's FileSystemWatcher class. It does pretty much what you need, but if you still need to implement your own class, you can take ideas from how it is designed (which is, by the way, similar to your #2 approach).
Having said that, I personally think that #3 is also a very valid approach. Don't be afraid of writing long code (within reasonable limits of course) if the result is more readable and maintainable than it would be with shorter code.
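On the question of whether an existing class could replace the custom EventArgs: the framework ships NotifyCollectionChangedEventArgs (in System.Collections.Specialized), which ObservableCollection itself uses. Below is a minimal sketch of a variant of #3 that reuses it and also rejects files of the wrong type. The names IFolder, TextFile and TextFolder are hypothetical, and since the question does not show IFile, a Name property is assumed for it.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

// Assumed shape of IFile; the question does not show its members.
public interface IFile
{
    string Name { get; }
}

// Variant of idea #3: one event, using the framework's event args.
public interface IFolder : INotifyCollectionChanged
{
    void Add(IFile file);
    bool Remove(IFile file);
    IEnumerable<IFile> Files { get; }
}

public sealed class TextFile : IFile
{
    public TextFile(string name) { Name = name; }
    public string Name { get; private set; }
}

// A folder that only accepts text files and raises one event per change.
public sealed class TextFolder : IFolder
{
    private readonly List<IFile> _files = new List<IFile>();

    public event NotifyCollectionChangedEventHandler CollectionChanged;

    // Read-only wrapper, so callers must go through Add and Remove.
    public IEnumerable<IFile> Files
    {
        get { return _files.AsReadOnly(); }
    }

    public void Add(IFile file)
    {
        if (!(file is TextFile))
            throw new ArgumentException("Only text files are allowed.", "file");
        _files.Add(file);
        Raise(new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Add, file));
    }

    public bool Remove(IFile file)
    {
        int index = _files.IndexOf(file);
        if (index < 0) return false;
        _files.RemoveAt(index);
        Raise(new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Remove, file, index));
        return true;
    }

    private void Raise(NotifyCollectionChangedEventArgs args)
    {
        var handler = CollectionChanged;
        if (handler != null) handler(this, args);
    }
}
```

A subscriber attaches one handler to CollectionChanged and inspects e.Action, which covers the "subscribe to everything" case from #2 without a custom event args class.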

Personally I would go with #2.
In #1 you just expose an entire collection of objects, allowing everyone to do anything with them.
#3 seems less self-explanatory to me. Then again, I like to keep things simple when coding, so I may be biased.

If watchers are going to be shorter-lived than the thing being watched, I would avoid events. The pattern exemplified by IObservable<T>, where the observable gives a subscribed observer an IDisposable object which can be used to unsubscribe, is a much better approach. If you use such a pattern, you can have your class hold a weak reference (probably a "long" weak reference) to the subscription object, which would in turn hold a strong reference (probably a delegate) to the subscriber and to the weak reference which identifies it. Abandoned subscriptions will thus get cleaned up by the garbage collector; it is the duty of a subscriber to ensure that a strongly-rooted reference to the subscription object exists.
Beyond the fact that abandoned subscriptions can get cleaned up, another advantage of the "disposable subscription-object" approach is that unsubscription can easily be made lock-free and thread-safe, and run in constant time. To dispose a subscription, simply null out the delegate contained therein. If each attempt to add a subscription causes the subscription manager to inspect a couple of subscriptions to ensure that they are still valid, the total number of subscriptions in existence will never grow to more than twice the number that were valid as of the last garbage collection.

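The disposable-subscription idea from the answer above can be sketched roughly as follows. This is a simplified illustration, not the full design described there: it leaves out the weak references, and for brevity it prunes all dead subscriptions on each Subscribe call rather than inspecting just a couple. The names Subscription and Publisher are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Handed to the subscriber; disposing it ends the subscription.
public sealed class Subscription : IDisposable
{
    private volatile Action<string> _handler;

    public Subscription(Action<string> handler)
    {
        _handler = handler;
    }

    public bool IsActive
    {
        get { return _handler != null; }
    }

    public void Notify(string message)
    {
        // Copy to a local so a concurrent Dispose cannot null it mid-call.
        var handler = _handler;
        if (handler != null) handler(message);
    }

    // Unsubscribing is a single write: no lock, constant time.
    public void Dispose()
    {
        _handler = null;
    }
}

public sealed class Publisher
{
    private readonly List<Subscription> _subscriptions = new List<Subscription>();
    private readonly object _gate = new object();

    public IDisposable Subscribe(Action<string> handler)
    {
        var subscription = new Subscription(handler);
        lock (_gate)
        {
            // Drop subscriptions that were disposed since the last call.
            _subscriptions.RemoveAll(s => !s.IsActive);
            _subscriptions.Add(subscription);
        }
        return subscription;
    }

    public void Publish(string message)
    {
        Subscription[] snapshot;
        lock (_gate)
        {
            snapshot = _subscriptions.ToArray();
        }
        // Notify outside the lock; disposed subscriptions ignore the call.
        foreach (var subscription in snapshot)
            subscription.Notify(message);
    }
}
```

A caller keeps the returned IDisposable and disposes it when it no longer wants notifications, for example with a using block around the watching code.
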
Class vs Struct

Using these two as references
Immutable class vs struct
http://msdn.microsoft.com/en-us/library/ms229017(v=vs.110).aspx
I wonder what is recommended in my case (and preferably why). I am currently building an inventory manager that can add, delete, move and remove items. (It will later have equip/unequip as well, but those do not directly integrate with my inventory, as the inventory will only manage the remove (for equip) and the add (for unequip).)
So either a class or a struct, that is the question!
One person points out an 8 byte limit, MSDN says 16. My Item class will hold a lot more than that.
private string _name;
private string _desc;
private bool _dropable;
private int _price;
private int _maxQuantity;
My ItemSlot struct/class will have an Item (or an int _itemId) as well as a quantity. Which should (yes?) add up to well over 16 bytes.
private ItemSlot[] _inventory;
public const int INVENTORY_SIZE = 50;
public Party () {
    // Assign the field; redeclaring it here would only create a local variable
    _inventory = new ItemSlot[INVENTORY_SIZE];
}
The tutorial series I am trying to follow uses a struct. However, with this knowledge, am I correct that I should use a class for the item slots? Or is my understanding all too shallow?

Go with a class.
The recommended size limits for structs aren't about available memory (yes, memory is cheap these days, as pointed out by Arun in the comments). The real reason is that structs are copied by value. This means that every time you pass your structure around, every field is copied along with it. The "struct byte-limit" recommendations you're seeing everywhere exist to avoid that cost.
A class, on the other hand, only copies the value of a reference, which is the native word size of the processor it is running on, making the copy operation barely measurable.
You stated your structure is going to be much bigger than 16 bytes. That is definitely reason enough to go with a class, purely to avoid the overhead of copying around entire blocks of memory when using a struct.
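The copy-by-value behaviour described above also changes what a method can do to its argument, which is easy to overlook. A minimal sketch, using hypothetical SlotStruct and SlotClass types with the same two fields:

```csharp
using System;

// Hypothetical slot types: same fields, one struct and one class.
public struct SlotStruct { public int ItemId; public int Quantity; }
public class SlotClass { public int ItemId; public int Quantity; }

public static class CopyDemo
{
    // Receives a copy of the whole struct; the caller's value is untouched.
    public static void AddOne(SlotStruct slot) { slot.Quantity++; }

    // Receives a copy of the reference; both references point at one object.
    public static void AddOne(SlotClass slot) { slot.Quantity++; }

    public static void Run()
    {
        var s = new SlotStruct { ItemId = 1, Quantity = 5 };
        var c = new SlotClass { ItemId = 1, Quantity = 5 };
        AddOne(s);
        AddOne(c);
        Console.WriteLine(s.Quantity); // prints 5
        Console.WriteLine(c.Quantity); // prints 6
    }
}
```

For an inventory whose slots are updated in place, the class behaviour is usually what the calling code expects.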

As others mentioned: Don't worry about memory usage so much. Or rather: Worry about memory usage where it matters.
But what really matters: a class is flexible, a struct much less so. If you need to add some logic to your data later on, this is easy with a class but awkward with a struct. C# structs can have methods, but they cannot take part in inheritance, and because they are copied by value, methods that modify them are easy to get wrong.
This can be a huge headache. I have often thought, "Damn, now I have to provide a method which does some task, and now I have to change this struct to a class".
So my rule of thumb is:
Use a struct only when: There is no foreseeable need for methods or complex getters / setters AND the data are very small and are unlikely to grow.
The "and" clause comes from the fact that complex structures tend to get a method in the future, regardless of what you are thinking now.
If you look into the .NET Framework, classes are used almost everywhere, whereas structs are only used for very small sets of related data like a Point (x and y coordinates).

Avoiding array duplication

According to [MSDN: Array usage guidelines](http://msdn.microsoft.com/en-us/library/k2604h5s(VS.71).aspx):
Array Valued Properties
You should use collections to avoid code inefficiencies. In the following code example, each call to the myObj property creates a copy of the array. As a result, 2n+1 copies of the array will be created in the following loop.
[Visual Basic]
Dim i As Integer
For i = 0 To obj.myObj.Count - 1
    DoSomething(obj.myObj(i))
Next i
[C#]
for (int i = 0; i < obj.myObj.Count; i++)
    DoSomething(obj.myObj[i]);
Other than the change from myObj[] to ICollection myObj, what else would you recommend? Just realized that my current app is leaking memory :(
Thanks;
EDIT: Would forcing C# to pass references w/ ref (safety aside) improve performance and/or memory usage?

No, it isn't leaking memory - it is just making the garbage collector work harder than it might. Actually, the MSDN article is slightly misleading: if the property created a new collection every time it was called, it would be just as bad (memory wise) as with an array. Perhaps worse, due to the usual over-sizing of most collection implementations.
If you know a method/property does work, you can always minimise the number of calls:
var arr = obj.myObj; // var since I don't know the type!
for (int i = 0; i < arr.Length; i++) {
    DoSomething(arr[i]);
}
or even easier, use foreach:
foreach(var value in obj.myObj) {
    DoSomething(value);
}
Both approaches only call the property once. The second is clearer IMO.
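To make the 2n+1 figure from the quoted guideline concrete, here is a small sketch that counts the copies. Holder, CopiesMade and CopyCounter are hypothetical names; the Values property stands in for the myObj property and is assumed to clone its array on every access.

```csharp
public class Holder
{
    private readonly int[] _values = { 10, 20, 30 };

    // Counts how often the property handed out a fresh copy.
    public int CopiesMade { get; private set; }

    // Defensive copy on every access, like myObj in the guideline.
    public int[] Values
    {
        get
        {
            CopiesMade++;
            return (int[])_values.Clone();
        }
    }
}

public static class CopyCounter
{
    // Reads the property in the loop condition and in the loop body.
    public static int NaiveLoop(Holder holder)
    {
        int sum = 0;
        for (int i = 0; i < holder.Values.Length; i++)
            sum += holder.Values[i];
        return sum;
    }

    // Reads the property once and loops over the local array.
    public static int CachedLoop(Holder holder)
    {
        int sum = 0;
        int[] arr = holder.Values;
        for (int i = 0; i < arr.Length; i++)
            sum += arr[i];
        return sum;
    }
}
```

With three elements, NaiveLoop leaves CopiesMade at 7 (four condition checks plus three body reads, i.e. 2n+1), while CachedLoop on a fresh Holder leaves it at 1.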
Other thoughts; name it a method! i.e. obj.SomeMethod() - this sets expectation that it does work, and avoids the undesirable obj.Foo != obj.Foo (which would be the case for arrays).
Finally, Eric Lippert has a good article on this subject.

Just as a hint for those who haven't used the ReadOnlyCollection mentioned in some of the answers:
[C#]
using System;
using System.Collections.ObjectModel;

class XY
{
    private X[] array;
    public ReadOnlyCollection<X> myObj
    {
        get
        {
            // Wraps the array without copying it
            return Array.AsReadOnly(array);
        }
    }
}
Hope this might help.

Whenever I have properties that are costly (like recreating a collection on each call), I either document the property, stating that each call incurs a cost, or I cache the value in a private field. Property getters that are costly should be written as methods.
Generally, I try to expose collections as IEnumerable rather than arrays, forcing the consumer to use foreach (or an enumerator).

It will not make copies of the array unless you make it do so. However, simply passing the reference to an array privately owned by an object has some nasty side-effects. Whoever receives the reference is basically free to do whatever he likes with the array, including altering the contents in ways that cannot be controlled by its owner.
One way of preventing unauthorized meddling with the array is to return a copy of the contents. Another (slightly better) is to return a read-only collection.
Still, before doing any of these things you should ask yourself if you are about to give away too much information. In some cases (actually, quite often) it is even better to keep the array private and instead provide methods that operate on the object owning it.

myObj will not create a new item unless you explicitly create one. So, for better memory usage, I recommend using a private collection (List or any other) and exposing an indexer that returns the specified value from the private collection.
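A minimal sketch of that suggestion, assuming a hypothetical ItemStore class that holds strings:

```csharp
using System.Collections.Generic;

public class ItemStore
{
    // The collection itself never leaves the class.
    private readonly List<string> _items = new List<string>();

    public int Count
    {
        get { return _items.Count; }
    }

    // Read-only indexer: returns one element, never the whole list.
    public string this[int index]
    {
        get { return _items[index]; }
    }

    public void Add(string item)
    {
        _items.Add(item);
    }
}
```

Callers loop with store.Count and store[i]; no array or list is copied, and nothing outside the class can replace or reorder the elements.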

Return collection as read-only

I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
    get
    {
        return data;
    }
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?

If your underlying data is stored as list you can use List(T).AsReadOnly method.
If your data can be enumerated, you can use the Enumerable.ToList method to copy your collection into a List and call AsReadOnly on it.
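A short sketch of the AsReadOnly approach, using a hypothetical DataHolder class. The locking from the question is left out here to keep the example focused on the wrapper.

```csharp
using System.Collections.Generic;

public class DataHolder
{
    private readonly List<string> _data = new List<string> { "a", "b" };

    // AsReadOnly returns a ReadOnlyCollection<string> that wraps the list;
    // it is a live view, not a copy.
    public IList<string> Data
    {
        get { return _data.AsReadOnly(); }
    }

    // Only the owner can change the underlying list.
    public void Append(string value)
    {
        _data.Add(value);
    }
}
```

Calling Add on the returned IList<string> throws NotSupportedException, while items appended later through Append still show up in a wrapper that was handed out earlier.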

I voted for your accepted answer and agree with it--however might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage of this comes in the fact that you can't add code to collections so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.

If your only intent is to keep calling code from making a mistake and modifying the collection when it should only be reading, all that is necessary is to return an interface which doesn't support Add, Remove, etc. Why not return IEnumerable<string>? Calling code would have to cast, which it is unlikely to do without knowing the internals of the property it is accessing.
If however your intent is to prevent the calling code from observing updates from other threads you'll have to fall back to solutions already mentioned, to perform a deep or shallow copy depending on your need.

I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
Your implementation depends on your needs:
- If you don't care whether the caller (Class B) sees any further changes to the collection, then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
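The first option (clone and hand out) can be sketched as follows, assuming .NET 4.5 or later for IReadOnlyList<T>. SharedData and Snapshot are hypothetical names; the ReaderWriterLockSlim is the lock type mentioned in the question.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class SharedData
{
    private readonly List<string> _data = new List<string>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(string value)
    {
        _lock.EnterWriteLock();
        try { _data.Add(value); }
        finally { _lock.ExitWriteLock(); }
    }

    // Copies the list while holding the read lock. The caller gets its own
    // array, so later writes cannot disturb its enumeration.
    public IReadOnlyList<string> Snapshot()
    {
        _lock.EnterReadLock();
        try { return _data.ToArray(); }
        finally { _lock.ExitReadLock(); }
    }
}
```

The caller sees the state at the time of the call and nothing after it. As noted above, this does not protect mutable elements; with strings that is not a concern because they are immutable.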

One should note that aku's answer will only protect the list as being read only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read only list.

You can use a copy of the collection instead.
public IList<string> Data {
    get {
        return new List<string>(data);
    }
}
That way it doesn't matter if it gets updated.

You want to use the yield keyword. You loop through the list and return each item with yield return. This allows the consumer to use foreach without modifying the collection.
It would look something like this:
List<string> _Data;
public IEnumerable<string> Data
{
    get
    {
        foreach(string item in _Data)
        {
            yield return item;
        }
    }
}
