Related
Consider the following:
public class FooBar {
public int[] SomeNumbers {
get { return _someNumbers; }
private set;
}
private int[] _someNumbers;
public FooBar() {
_someNumbers = new int[2];
_someNumbers[0] = 1;
_someNumbers[1] = 2;
}
}
// in some other method somewhere...
FooBar foobar = new FooBar();
Debug.Log(foobar.SomeNumbers[0]);
What I am wondering is, does calling the SomeNumbers property cause a heap allocation; basically does it cause a copy of the array to be created, or is it just a pointer?
I ask because I am trying to resolves some GC issues I have due to functions that return arrays, and I want to make sure my idea of caching some values like this will actually make a difference
Arrays are always reference types, so yes, it is "basically returning a pointer".
If you are trying to debug memory issues I recommend using a memory profiler. There is one built in to Visual Studio or you can use a 3rd party one (I personally like DotMemory, it has a 5 day free trial). Using a profiler will help you identify what is creating memory objects and what is keeping memory objects alive.
I have declared a basic struct like this
private struct ValLine {
public string val;
public ulong linenum;
}
and declared a Queue like this
Queue<ValLine> check = new Queue<ValLine>();
Then in a using StreamReader setup where I'm reading through the lines of an input file using ReadLine in a while loop, among other things, I'm doing this to populate the Queue:
check.Enqueue(new ValLine { val = line, linenum = linenum });
("line" is a string containing the text of each line, "linenum" is just a counter that is initialized at 0 and is incremented each time through the loop.)
The purpose of the "check" Queue is that if a particular line meets some criteria, then I store that line in "check" along with the line number that it occurs on in the input file.
After I've finished reading through the input file, I use "check" for various things, but then when I'm finished using it I clear it out in the obvious manner:
check.Clear();
(Alternatively, in my final loop through "check" I could just use .Dequeue(), instead of foreach'ing it.)
But then I got to thinking - wait a minute, what about all those "new ValLine" I generated when populating the Queue in the first place??? Have I created a memory leak? I've pretty new to C#, so it's not coming clear to me how to deal with this - or even if it should be dealt with (perhaps .Clear() or .Dequeue() deals with the now obsoleted structs automatically?). I've spent over an hour with our dear friend Google, and just not finding any specific discussion of this kind of example in regard to the clearing of a collection of structs.
So... In C# do we need to deal with wiping out the individual structs before clearing the queue (or as we are dequeueing), or not? And if so, then what is the proper way to do this?
(Just in case it's relevant, I'm using .NET 4.5 in Visual Studio 2013.)
UPDATE: This is for future reference (you know, like if this page comes up in a Google search) in regard to proper coding. To make the struct immutable as per recommendation, this is what I've ended up with:
private struct ValLine {
private readonly string _val;
private readonly ulong _linenum;
public string val { get { return _val; } }
public ulong linenum { get { return _linenum; } }
public ValLine(string x, ulong n) { _val = x; _linenum = n; }
}
Corresponding to that change, the queue population line is now this:
check.Enqueue(new ValLine(line,linenum));
Also, though not strictly necessary, I did get rid of my foreach on the queue (and the check.Clear();, and changed it to this
while (check.Count > 0) {
ValLine ll = check.Dequeue();
writer.WriteLine("[{0}] {1}", ll.linenum, ll.val);
}
so that the queue is emptied out as the information is output.
UPDATE 2: Okay, yes, I'm still a C# newbie (less than a year). I learn a lot from the Internet, but of course, I'm often looking at examples from more than a year ago. I have changed my struct so now it looks like this:
private struct ValLine {
public string val { get; private set; }
public ulong linenum { get; private set; }
public ValLine(string x, ulong n): this()
{ this.val = x; this.linenum = n; }
}
Interestingly enough, I had actually tried exactly this off the top of my head before coming up with what's in the first update (above), but got a compile error (because I did not have the : this() with the constructor). Upon further suggestion, I checked further and found a recent example showing that : this() for making it work like I tried before, plugged that in, and - Wa La! - clean compile. I like the cleaner look of the code. What the private variables are called is irrelevant to me.
No, you won't have created a memory leak. Calling Clear or Dequeue will clear the memory appropriately - for example, if you had a List<T> then a clear operation might use:
for (int i = 0; i < capacity; i++)
{
array[i] = default(T);
}
I don't know offhand whether Queue<T> is implemented with a circular buffer built on an array, or a linked list - but either way, you'll be fine.
Having said that, I would strongly recommend against using mutable structs as you're doing here, along with mutable fields. While it's not causing the particular problem you're envisaging, they can behave in confusing ways.
According to the requirement we have to return a collection either in reverse order or as
it is. We, beginning level programmer designed the collection as follow :(sample is given)
namespace Linqfying
{
class linqy
{
static void Main()
{
InvestigationReport rpt=new InvestigationReport();
// rpt.GetDocuments(true) refers
// to return the collection in reverse order
foreach( EnquiryDocument doc in rpt.GetDocuments(true) )
{
// printing document title and author name
}
}
}
class EnquiryDocument
{
string _docTitle;
string _docAuthor;
// properties to get and set doc title and author name goes below
public EnquiryDocument(string title,string author)
{
_docAuthor = author;
_docTitle = title;
}
public EnquiryDocument(){}
}
class InvestigationReport
{
EnquiryDocument[] docs=new EnquiryDocument[3];
public IEnumerable<EnquiryDocument> GetDocuments(bool IsReverseOrder)
{
/* some business logic to retrieve the document
docs[0]=new EnquiryDocument("FundAbuse","Margon");
docs[1]=new EnquiryDocument("Sexual Harassment","Philliphe");
docs[2]=new EnquiryDocument("Missing Resource","Goel");
*/
//if reverse order is preferred
if(IsReverseOrder)
{
for (int i = docs.Length; i != 0; i--)
yield return docs[i-1];
}
else
{
foreach (EnquiryDocument doc in docs)
{
yield return doc;
}
}
}
}
}
Question :
Can we use other collection type to improve efficiency ?
Mixing of Collection with LINQ reduce the code ? (We are not familiar with LINQ)
Looks fine to me. Yes, you could use the Reverse extension method... but that won't be as efficient as what you've got.
How much do you care about the efficiency though? I'd go with the most readable solution (namely Reverse) until you know that efficiency is a problem. Unless the collection is large, it's unlikely to be an issue.
If you've got the "raw data" as an array, then your use of an iterator block will be more efficient than calling Reverse. The Reverse method will buffer up all the data before yielding it one item at a time - just like your own code does, really. However, simply calling Reverse would be a lot simpler...
Aside from anything else, I'd say it's well worth you learning LINQ - at least LINQ to Objects. It can make processing data much, much cleaner than before.
Two questions:
Does the code you currently have work?
Have you identified this piece of code as being your performance bottleneck?
If the answer to either of those questions is no, don't worry about it. Just make it work and move on. There's nothing grossly wrong about the code, so no need to fret! Spend your time building new functionality instead. Save LINQ for a new problem you haven't already solved.
Actually this task seems pretty straightforward. I'd actually just use the Reverse method on a Generic List.
This should already be well-optimized.
Your GetDocuments method has a return type of IEnumerable so there is no need to even loop over your array when IsReverseOrder is false, you can just return it as is as Array type is IEnumerable...
As for when IsReverseOrder is true you can use either Array.Reverse or the Linq Reverse() extension method to reduce the amount of code.
I have a class that I have to call one or two methods a lot of times after each other. The methods currently return void. I was thinking, would it be better to have it return this, so that the methods could be nested? or is that considerd very very very bad? or if bad, would it be better if it returned a new object of the same type? Or what do you think? As an example I have created three versions of an adder class:
// Regular
class Adder
{
public Adder() { Number = 0; }
public int Number { get; private set; }
public void Add(int i) { Number += i; }
public void Remove(int i) { Number -= i; }
}
// Returning this
class Adder
{
public Adder() { Number = 0; }
public int Number { get; private set; }
public Adder Add(int i) { Number += i; return this; }
public Adder Remove(int i) { Number -= i; return this; }
}
// Returning new
class Adder
{
public Adder() : this(0) { }
private Adder(int i) { Number = i; }
public int Number { get; private set; }
public Adder Add(int i) { return new Adder(Number + i); }
public Adder Remove(int i) { return new Adder(Number - i); }
}
The first one can be used this way:
var a = new Adder();
a.Add(4);
a.Remove(1);
a.Add(7);
a.Remove(3);
The other two can be used this way:
var a = new Adder()
.Add(4)
.Remove(1)
.Add(7)
.Remove(3);
Where the only difference is that a in the first case is the new Adder() while in the latter it is the result of the last method.
The first I find that quickly become... annoying to write over and over again. So I would like to use one of the other versions.
The third works kind of like many other methods, like many String methods and IEnumerable extension methods. I guess that has its positive side in that you can do things like var a = new Adder(); var b = a.Add(5); and then have one that was 0 and one that was 5. But at the same time, isn't it a bit expensive to create new objects all the time? And when will the first object die? When the first method returns kind of? Or?
Anyways, I like the one that returns this and think I will use that, but I am very curious to know what others think about this case. And what is considered best practice.
The 'return this' style is sometimes called a fluent interface and is a common practice.
I like "fluent syntax" and would take the second one. After all, you could still use it as the first, for people who feel uncomfortable with fluent syntax.
another idea to make an interface like the adders one easier to use:
public Adder Add(params int[] i) { /* ... */ }
public Adder Remove(params int[] i) { /* ... */ }
Adder adder = new Adder()
.Add(1, 2, 3)
.Remove(3, 4);
I always try to make short and easy-to-read interfaces, but many people like to write the code as complicated as possible.
Chaining is a nice thing to have and is core in some frameworks (for instance Linq extensions and jQuery both use it heavily).
Whether you create a new object or return this depends on how you expect your initial object to behave:
var a = new Adder();
var b = a.Add(4)
.Remove(1)
.Add(7)
.Remove(3);
//now - should a==b ?
Chaining in jQuery will have changed your original object - it has returned this.
That's expected behaviour - do do otherwise would basically clone UI elements.
Chaining in Linq will have left your original collection unchanged. That too is expected behaviour - each chained function is a filter or transformation, and the original collection is often immutable.
Which pattern better suits what you're doing?
I think that for simple interfaces, the "fluent" interface is very useful, particularly because it is very simple to implement. The value of the fluent interface is that it eliminates a lot of the extraneous fluff that gets in the way of understanding. Developing such an interface can take a lot of time, especially when the interface starts to be involved. You should worry about how the usage of the interface "reads"; In my mind, the most compelling use for such an interface is how it communicates the intent of the programmer, not the amount of characters that it saves.
To answer your specific question, I like the "return this" style. My typical use of the fluent interface is to define a set of options. That is, I create an instance of the class and then use the fluent methods on the instance to define the desired behavior of the object. If I have a yes/no option (say for logging), I try not to have a "setLogging(bool state)" method but rather two methods "WithLogging" and "WithoutLogging". This is somewhat more work but the clarity of the final result is very useful.
Consider this: if you come back to this code in 5 years, is this going to make sense to you? If so, then I suppose you can go ahead.
For this specific example, though, it would seem that overloading the + and - operators would make things clearer and accomplish the same thing.
For your specific case, overloading the arithmetic operators would be probably the best solution.
Returning this (Fluent interface) is common practice to create expressions - unit testing and mocking frameworks use this a lot. Fluent Hibernate is another example.
Returning a new instance might be a good choice, too. It allows you to make your class immutable - in general a good thing and very handy in the case of multithreading. But think about the object creation overhead if immutability is of no use for you.
If you call it Adder, I'd go with returning this. However, it's kind of strange for an Adder class to contain an answer.
You might consider making it something like MyNumber and create an Add()-method.
Ideally (IMHO), that would not change the number that is stored inside your instance, but create a new instance with the new value, which you return:
class MyNumber
{
...
MyNumber Add( int i )
{
return new MyNumber( this.Value + i );
}
}
The main difference between the second and third solution is that by returning a new instance instead of this you are able to "catch" the object in a certain state and continue from that.
var a = new Adder()
.Add(4);
var b = a.Remove(1);
var c = a.Add(7)
.Remove(3);
In this case both b and c have the state captured in a as a starting point.
I came across this idiom while reading about a pattern for building test domain objects in Growing Object-Oriented Software, Guided by Tests by Steve Freeman; Nat Pryce.
On your question regarding the lifetime of your instances: I would exspect them to be elligible for garbage collection as soon as the invocation of Remove or Add are returning.
.NET offers a generic list container whose performance is almost identical (see Performance of Arrays vs. Lists question). However they are quite different in initialization.
Arrays are very easy to initialize with a default value, and by definition they already have certain size:
string[] Ar = new string[10];
Which allows one to safely assign random items, say:
Ar[5]="hello";
with list things are more tricky. I can see two ways of doing the same initialization, neither of which is what you would call elegant:
List<string> L = new List<string>(10);
for (int i=0;i<10;i++) L.Add(null);
or
string[] Ar = new string[10];
List<string> L = new List<string>(Ar);
What would be a cleaner way?
EDIT: The answers so far refer to capacity, which is something else than pre-populating a list. For example, on a list just created with a capacity of 10, one cannot do L[2]="somevalue"
EDIT 2: People wonder why I want to use lists this way, as it is not the way they are intended to be used. I can see two reasons:
One could quite convincingly argue that lists are the "next generation" arrays, adding flexibility with almost no penalty. Therefore one should use them by default. I'm pointing out they might not be as easy to initialize.
What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advanced and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list.
List<string> L = new List<string> ( new string[10] );
I can't say I need this very often - could you give more details as to why you want this? I'd probably put it as a static method in a helper class:
public static class Lists
{
public static List<T> RepeatedDefault<T>(int count)
{
return Repeated(default(T), count);
}
public static List<T> Repeated<T>(T value, int count)
{
List<T> ret = new List<T>(count);
ret.AddRange(Enumerable.Repeat(value, count));
return ret;
}
}
You could use Enumerable.Repeat(default(T), count).ToList() but that would be inefficient due to buffer resizing.
Note that if T is a reference type, it will store count copies of the reference passed for the value parameter - so they will all refer to the same object. That may or may not be what you want, depending on your use case.
EDIT: As noted in comments, you could make Repeated use a loop to populate the list if you wanted to. That would be slightly faster too. Personally I find the code using Repeat more descriptive, and suspect that in the real world the performance difference would be irrelevant, but your mileage may vary.
Use the constructor which takes an int ("capacity") as an argument:
List<string> = new List<string>(10);
EDIT: I should add that I agree with Frederik. You are using the List in a way that goes against the entire reasoning behind using it in the first place.
EDIT2:
EDIT 2: What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advanced and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list.
Why would anyone need to know the size of a List with all null values? If there are no real values in the list, I would expect the length to be 0. Anyhow, the fact that this is cludgy demonstrates that it is going against the intended use of the class.
Create an array with the number of items you want first and then convert the array in to a List.
int[] fakeArray = new int[10];
List<int> list = fakeArray.ToList();
If you want to initialize the list with N elements of some fixed value:
public List<T> InitList<T>(int count, T initValue)
{
return Enumerable.Repeat(initValue, count).ToList();
}
Why are you using a List if you want to initialize it with a fixed value ?
I can understand that -for the sake of performance- you want to give it an initial capacity, but isn't one of the advantages of a list over a regular array that it can grow when needed ?
When you do this:
List<int> = new List<int>(100);
You create a list whose capacity is 100 integers. This means that your List won't need to 'grow' until you add the 101th item.
The underlying array of the list will be initialized with a length of 100.
This is an old question, but I have two solutions. One is fast and dirty reflection; the other is a solution that actually answers the question (set the size not the capacity) while still being performant, which none of the answers here do.
Reflection
This is quick and dirty, and should be pretty obvious what the code does. If you want to speed it up, cache the result of GetField, or create a DynamicMethod to do it:
public static void SetSize<T>(this List<T> l, int newSize) =>
l.GetType().GetField("_size", BindingFlags.NonPublic | BindingFlags.Instance).SetValue(l, newSize);
Obviously a lot of people will be hesitant to put such code into production.
ICollection<T>
This solution is based around the fact that the constructor List(IEnumerable<T> collection) optimizes for ICollection<T> and immediately adjusts the size to the correct amount, without iterating it. It then calls the collections CopyTo to do the copy.
The code for the List<T> constructor is as follows:
public List(IEnumerable<T> collection) {
....
ICollection<T> c = collection as ICollection<T>;
if (collection is ICollection<T> c)
{
int count = c.Count;
if (count == 0)
{
_items = s_emptyArray;
}
else {
_items = new T[count];
c.CopyTo(_items, 0);
_size = count;
}
}
So we can completely optimally pre-initialize the List to the correct size, without any extra copying.
How so? By creating an ICollection<T> object that does nothing other than return a Count. Specifically, we will not implement anything in CopyTo which is the only other function called.
private struct SizeCollection<T> : ICollection<T>
{
public SizeCollection(int size) =>
Count = size;
public void Add(T i){}
public void Clear(){}
public bool Contains(T i)=>true;
public void CopyTo(T[]a, int i){}
public bool Remove(T i)=>true;
public int Count {get;}
public bool IsReadOnly=>true;
public IEnumerator<T> GetEnumerator()=>null;
IEnumerator IEnumerable.GetEnumerator()=>null;
}
public List<T> InitializedList<T>(int size) =>
new List<T>(new SizeCollection<T>(size));
We could in theory do the same thing for AddRange/InsertRange for an existing array, which also accounts for ICollection<T>, but the code there creates a new array for the supposed items, then copies them in. In such case, it would be faster to just empty-loop Add:
public void SetSize<T>(this List<T> l, int size)
{
if(size < l.Count)
l.RemoveRange(size, l.Count - size);
else
for(size -= l.Count; size > 0; size--)
l.Add(default(T));
}
Initializing the contents of a list like that isn't really what lists are for. Lists are designed to hold objects. If you want to map particular numbers to particular objects, consider using a key-value pair structure like a hash table or dictionary instead of a list.
You seem to be emphasizing the need for a positional association with your data, so wouldn't an associative array be more fitting?
Dictionary<int, string> foo = new Dictionary<int, string>();
foo[2] = "string";
The accepted answer (the one with the green check mark) has an issue.
The problem:
var result = Lists.Repeated(new MyType(), sizeOfList);
// each item in the list references the same MyType() object
// if you edit item 1 in the list, you are also editing item 2 in the list
I recommend changing the line above to perform a copy of the object. There are many different articles about that:
String.MemberwiseClone() method called through reflection doesn't work, why?
https://code.msdn.microsoft.com/windowsdesktop/CSDeepCloneObject-8a53311e
If you want to initialize every item in your list with the default constructor, rather than NULL, then add the following method:
public static List<T> RepeatedDefaultInstance<T>(int count)
{
List<T> ret = new List<T>(count);
for (var i = 0; i < count; i++)
{
ret.Add((T)Activator.CreateInstance(typeof(T)));
}
return ret;
}
You can use Linq to cleverly initialize your list with a default value. (Similar to David B's answer.)
var defaultStrings = (new int[10]).Select(x => "my value").ToList();
Go one step farther and initialize each string with distinct values "string 1", "string 2", "string 3", etc:
int x = 1;
var numberedStrings = (new int[10]).Select(x => "string " + x++).ToList();
string [] temp = new string[] {"1","2","3"};
List<string> temp2 = temp.ToList();
After thinking again, I had found the non-reflection answer to the OP question, but Charlieface beat me to it. So I believe that the correct and complete answer is https://stackoverflow.com/a/65766955/4572240
My old answer:
If I understand correctly, you want the List<T> version of new T[size], without the overhead of adding values to it.
If you are not afraid the implementation of List<T> will change dramatically in the future (and in this case I believe the probability is close to 0), you can use reflection:
public static List<T> NewOfSize<T>(int size) {
var list = new List<T>(size);
var sizeField = list.GetType().GetField("_size",BindingFlags.Instance|BindingFlags.NonPublic);
sizeField.SetValue(list, size);
return list;
}
Note that this takes into account the default functionality of the underlying array to prefill with the default value of the item type. All int arrays will have values of 0 and all reference type arrays will have values of null. Also note that for a list of reference types, only the space for the pointer to each item is created.
If you, for some reason, decide on not using reflection, I would have liked to offer an option of AddRange with a generator method, but underneath List<T> just calls Insert a zillion times, which doesn't serve.
I would also like to point out that the Array class has a static method called ResizeArray, if you want to go the other way around and start from Array.
To end, I really hate when I ask a question and everybody points out that it's the wrong question. Maybe it is, and thanks for the info, but I would still like an answer, because you have no idea why I am asking it. That being said, if you want to create a framework that has an optimal use of resources, List<T> is a pretty inefficient class for anything than holding and adding stuff to the end of a collection.
A notice about IList:
MSDN IList Remarks:
"IList implementations fall into three categories: read-only, fixed-size, and variable-size. (...). For the generic version of this interface, see
System.Collections.Generic.IList<T>."
IList<T> does NOT inherits from IList (but List<T> does implement both IList<T> and IList), but is always variable-size.
Since .NET 4.5, we have also IReadOnlyList<T> but AFAIK, there is no fixed-size generic List which would be what you are looking for.
This is a sample I used for my unit test. I created a list of class object. Then I used forloop to add 'X' number of objects that I am expecting from the service.
This way you can add/initialize a List for any given size.
public void TestMethod1()
{
var expected = new List<DotaViewer.Interface.DotaHero>();
for (int i = 0; i < 22; i++)//You add empty initialization here
{
var temp = new DotaViewer.Interface.DotaHero();
expected.Add(temp);
}
var nw = new DotaHeroCsvService();
var items = nw.GetHero();
CollectionAssert.AreEqual(expected,items);
}
Hope I was of help to you guys.
A bit late but first solution you proposed seems far cleaner to me : you dont allocate memory twice.
Even List constrcutor needs to loop through array in order to copy it; it doesn't even know by advance there is only null elements inside.
1.
- allocate N
- loop N
Cost: 1 * allocate(N) + N * loop_iteration
2.
- allocate N
- allocate N + loop ()
Cost : 2 * allocate(N) + N * loop_iteration
However List's allocation an loops might be faster since List is a built-in class, but C# is jit-compiled sooo...