Lets consider the following scenario in the Single Linked List:-
I have been given the target node, which is going to be deleted.
Lets assume the following data and I am going to receive the object which holds "3", which is the one I am going to delete;
1 -> 2 -> 3 -> 4 -> 5 -> 6
And Class Structure is:-
Class DataHolder
{
int data;
DataHolder nxtPrt;
}
Void Delete (DataHolder currentData)
{
currentData.data = currentData.nxtPrt.data; //Now 3 will be overwritten by 4
(x) currentData.nxtPrt = (y) currentData.nxtPrt.nxtPrt;
//Now the object which belongs to 4 (previously it was 3),
//is pointing to 5;
}
So, now the actual copy of the object 4 is now become useless;
So, now i just want to remove the space allotted to original copy of 4;
But, now I cannot track it also since, I have altered the object to point 5.
So right now, at this point I have lost the actual object 4.
May I Kindly know, is there any way to forcefully ask the object to release its occupied memory like doing in "C" using dealloc,
or I have to depend on the GC to collect the unused space upon its wish.
Thanks in advance.
You're always relying on the GC, there's no way around it. And yes, it will clean up your other objects, as long as there's no reference to them. You can allocate unmanaged memory and deal with it as you see fit but, in that case, why are you using C#? Just use C(++).
But the simplest answer is don't write your own linked list. Just use LinkedList<YourStruct>. Learn your environment - the language(s), the libraries and the runtime. If you're just going to write C code in C#, you're going to hurt, nobody's going to understand your code and you gain hardly any benefit from working in C#. Again, if you don't want to use C#/.NET... don't. There's nothing inherently wrong with C or C++, or with unmanaged languages. Use the best tool for the job.
Don't think in C terms at all. It simply doesn't work in a GC'd/managed environment. Where does memory come from when you allocate it in C? Usually the stack or the heap, with a few bits in registers. In .NET, this is kind of abstracted away, but in practice, you still only have those three locations. However, they work differently. You can't allocate classes or arrays on a stack (there's limited support using unsafe code, but that's it). There's multiple heaps, and apart from the large object heap, they always allocate from the top, similar to a stack. So deallocating a single object has no value whatsoever - if you don't compact the heap to eliminate the free spots, you don't get less memory usage, and you don't get any extra space for new objects.
Related
I have experience in C and C++(languages without GC) but just recently using C# a lot. I have a question to which I think I got the answer but I want for someone to confirm me, or correct me(if I am wrong).
Say I have the following
int[,] g = new int[nx, ny];
very easy. This just separates memory for a 2D array of ints. After that I can use it, provided that I don't surpass nx or ny as limits of the array.
Now, suppose I want to do this several times but with different nx's and ny's everytime (how this new values are calculated is of no importance)
so I would do
int[,] g;
for(k=0;k<numberOfTimes;k++)
{
//re-calculate nx and ny
g = new int[nx, ny];
//work with g
}
normally I would think that everytime I am separating memory for g, I am leaving leaked memory that could never be reached. Obviously I would have to "delete" this.
But since C# has Garbage Collection, can I do the above with impunity??
In other words, is my code above safe enough?
any suggestion to better it?
Your code is safe enough.
GC collects only those objects on the heap that have no references on the stack pointing to them. As in your scenario, a variable in the for loop scope gets another reference, the previous array loses all available references to it and thus gets collected in some time.
Also, there is no need to declare the variable in the outer scope as it is optimized later on during compile time. More about that : here
My question might sound a little vague. But what I want to know is where the List<> buffer is maintained.
I have a list List<MyClass> to which I am adding items from an infinite loop. But the RAM consumption of the Windows Service(inside which I am creating the List) never goes beyond 17 MB. In fact it hovers between 15-16MB even if I continue adding items to the List.
I was trying to do some Load Testing of My Service and came across this thing.
Can anyone tell me whether it dumps the data to some temporary location on the machine, and picks it from there as I don't see an increase in RAM consumption.
The method which I am calling infinitely is AddMessageToList().
class MainClass
{
List<MessageDetails> messageList = new List<MessageDetails>();
private void AddMessageToList()
{
SendMessage(ApplicationName,Address, Message);
MessageDetails obj= new MessageDetails();
obj.ApplicationName= ApplicationName;
obj.Address= Address;
obj.Message= Message;
lock(messageList)
{
messageList.Add(obj);
}
}
}
class MessageDetails
{
public string Message
{
get;
set;
}
public string ApplicationName
{
get;
set;
}
public string Address
{
get;
set;
}
}
The answer to your question is: "In Memory".
That can mean RAM, and it can also mean the hard drive (Virtual Memory). The OS memory manager decides when to page memory to Virtual Memory, which mostly has to do with how often the memory is accessed (though I don't pretend to know Microsoft's specific algorithm).
You also asked why your memory usages isn't going up. First off, a MegaByte is a HUGE amount of memory. Unless your class is quite large, you will need a LOT of them to make a MB appear. Eventually your memory usage should go up though.
In general C# objects are created from the Heap, which resides in memory. If you want to store things on disk there are ways to go about it, but a standard List<T> will live in memory.
When you create an object it will occupy a certain number of bytes in memory plus the size of the pointers used to reference it. Adding it to a list only adds a pointer to the object you've already created, so if you're adding lots of copies of the same instance into the list, it won't grow as fast as you expect.
If you really want to test the impact of large data structures on your memory, you're going to have to ramp up the numbers. A few thousand average objects aren't going to occupy much memory, but a few million might.
You might also be interested in the GC.GetTotalMemory() method and its friends.
Note that pretty much all memory on Windows (and .NET) is Virtual Memory - its "real, physical" location is arbitrary, Windows memory management handles that. However, regardless of whether it's currently using physical RAM or a page file on the HDD, it will show up as committed private memory.
So it's up to how you're actually creating the items and adding them to the List<T>. How many objects are there? Are you adding the same object over and over again, or creating a new one every time? Are you using the same List instance, or are you creating others? Do you keep references to the created objects (and List instances), or are you throwing them away? Do you actually do anything with the object / list? If not, the optimizer might have removed the code alltogether (it's very conservative, though, so I wouldn't count on that in adding items to a list - that's a very complex scenario with possible side effects).
In the lowest memory ideal case, you could be using about four bytes per list item, that's not much - you'd need 262 144 items to consume a single MiB of memory!
Show us your code, the whole loop and it's surroundings. Then we can tell you what you're actually doing.
EDIT: This is in a WCF service? You should have said so before. Where do you store the MainClass class? If it's inside the WCF service class, it might not last longer than a single request. And even if you fix that, and store it in something a bit more persistent, like a static class, you get into the complexities of when everything is collected, how the service is being restarted etc. If you need the data to be safely held for longer than a single request, storing it in process memory isn't good enough. If you don't care that the data can get thrown away once in a while, you can make the List instance static (it's not going to be shared nor persisted otherwise). Otherwise, use a database.
Speculating from your sparse description, there are two sorts of things you might not realize:
The 15-16 MB usage you see might have nothing to do with the size of your list: it could be the memory requirements for the rest of the program, and your list only consumes a negligible amount of memory in comparison. Even if you don't explicitly create objects, your program still has to load libraries and stuff, which takes memory.
I don't know C# so I don't know if this applies to List, but one of the standard container class implementations to dynamically allocate an array to hold the objects... and if the array is ever filled, then you allocate a new array twice the size and copy everything over to the new array and continue along (the actual ratio may be something other than $2$). This can have the effect that your memory usage remains constant for a long time, until you finally fill up the array and then it suddenly jumps up in size, only to remain constant again for a long time.
Ok, I want to do the following to me it seems like a good idea so if there's no way to do what I'm asking, I'm sure there's a reasonable alternative.
Anyways, I have a sparse matrix. It's pretty big and mostly empty. I have a class called MatrixNode that's basically a wrapper around each of the cells in the matrix. Through it you can get and set the value of that cell. It also has Up, Down, Left and Right properties that return a new MatrixNode that points to the corresponding cell.
Now, since the matrix is mostly empty, having a live node for each cell, including the empty ones, is an unacceptable memory overhead. The other solution is to make new instances of MatrixNode every time a node is requested. This will make sure that only the needed nodes are kept in the memory and the rest will be collected. What I don't like about it is that a new object has to be created every time. I'm scared about it being too slow.
So here's what I've come up with. Have a dictionary of weak references to nodes. When a node is requested, if it doesn't exist, the dictionary creates it and stores it as a weak reference. If the node does already exist (probably referenced somewhere), it just returns it.
Then, if the node doesn't have any live references left, instead of it being collected, I want to store it in a pool. Later, when a new node is needed, I want to first check if the pool is empty and only make a new node if there isn't one already available that can just have it's data swapped out.
Can this be done?
A better question would be, does .NET already do this for me? Am I right in worrying about the performance of creating single use objects in large numbers?
Instead of guessing, you should make a performance test to see if there are any issues at all. You may be surprised to know that managed memory allocation can often outperform explicit allocation because your code doesn't have to pay for deallocation when your data goes out of scope.
Performance may become an issue only when you are allocating new objects so frequently that the garbage collector has no chance to collect them.
That said, there are sparse array implementations in C# already, like Math.NET and MetaNumerics. These libraries are already optimized for performance and will probably avoid performance issues you will run into if you start your implementation from stratch
An SO search for c# and sparse-matrix will return many related questions, including answers pointing to commercial libraries like ILNumerics (has a community edition), NMath and Extreme Optimization's libraries
Most sparse matrix implementations use one of a few well-known schemes for their data; I generally recommend CSR or CSC, as those are efficient for common operations.
If that seems too complex, you can start using COO. What this means in your code is that you will not store anything for empty members; however, you have an item for every non-empty one. A simple implementation might be:
public struct SparseMatrixItem
{
int Row;
int Col;
double Value;
}
And your matrix would generally be a simple container:
public interface SparseMatrix
{
public IList<SparseMatrixItem> Items { get; }
}
You should make sure that the Items list stays sorted according to the row and col indices, because then you can use binary search to quickly find out if an item exists for a specific (i,j).
The idea of having a pool of objects that people use and then return to the pool is used for really expensive objects. Objects representing a network connection, a new thread, etc. It sounds like your object is very small and easy to create. Given that, you're almost certainly going to harm performance pooling it; the overhead of managing the pool will be greater than the cost of just creating a new one each time.
Having lots of short lived very small objects is the exact case that the GC is designed to handle quickly. Creating a new object is dirt cheap; it's just moving a pointer up and clearing out the bits for that object. The real overhead for objects comes in when a new garbage collection happens; for that it needs to find all "alive" objects and move them around, leaving all "dead" objects in their place. If your small object doesn't live through a single collection it has added almost no overhead. Keeping the objects around for a long time (like, say, by pooling them so you can reuse them) means copying them through several collections, consuming a fair bit of resources.
When an object is created, a reference is returned, and not the object.
What does this mean?
object a = new object();
Here a holds the reference.
It would be helpful if someone explains the creation of the object, creation of references.
I think of a reference as being like a set of directions to get to a house, where the house represents the object itself.
So if you were to tell someone how to get to your house, you might write down those directions on a piece of paper and give it to them - that's like assigning a reference to a variable.
Coming to your example:
object a = new object();
That's like building a new house (calling the constructor) and then on a piece of paper (the variable a) you write the directions to get to the new house. The paper doesn't have the house itself on it - just directions. If someone copies the contents of the piece of paper, like this:
object b = a;
that doesn't create a second house. It just copies the directions from the piece of paper a to the piece of paper b. Likewise then the two pieces of paper are independent - you could change the value of a to a different set of directions, and it wouldn't change what's on b.
I have an article on this which attempts to explain it another way, which you may find helpful.
The following statement:
object a = new object();
Actually does two different things.
First, new object() allocates the necessary memory to store the instance is allocated on the heap and returns the address on the heap that just got allocated.
Secondly, the assignment is evaluated and assigns the value returned from new to a.
When you say "a holds a reference" it means that a is the memory on the stack (or in registers, or on the heap depending no the lifetime of the reference) that points to the heap location of the instance you just created.
When you instantiate a class with the new keyword you create an object on the heap. It is not referenced by anyone yet. If an object has no references to itself it can be soon garbage collected. To operate with an object you need to reference it. So you create a variable which contains the address of the object(the reference).
With most modern languages you don't have to worry about references vs the object themselves. Back in the day (eg c++) you would have to worry about reference (sometimes called pointers) and even the memory allocated for them. Now the system (and what is called the garbage collector) worries about it for you.
Here is the details. Your example line means the following:
1) allocate memory for object
2) run the constructor
3) put the memory location of that object in the variable "a"
What does the mean to you conceptually as a programmer? Not much, most of the time. In C# you can still think of the variable a as the object. You don't have to worry about it pointing to the object under the hood. Most of the time it does not matter.
Where it matters is when you need to be concerned about the lifetime of the object. Because you have a reference to the object it will not be deallocated which means if it is using a system resource it will continue to do so.
When an object is no longer being referenced it will be deallocated by the garbage collector.
Another tongue-in-cheek analogy for the pile: the object is a helium balloon, and a reference is a string tied to that balloon which you are holding. Saying new object() is equivalent to asking a balloon guy (the memory manager) at the fair (your program) for a new balloon. He may either give you a balloon by means of handing you the string, or he may also tell you that there are no balloons left. You may find the latter very upsetting and run crying from the fair.
You may wish to share the balloon with your sibling, and here the analogy starts to fall apart. This could be seen as both you and your sibling holding onto the same string with your hands, or a second string being tied to the balloon for your sibling. Care must be taken to ensure that those who hold the string(s) tied to the balloon coordinate their movements, otherwise the balloon may be ripped or ruined.
Finally, in a language like C# when you become bored of the balloon, you can simply let go of the string. If others are still holding the string it will not go anywhere, but if they are not it floats harmlessly into a net in the sky, from where the balloon guy periodically collects the released balloons to replenish his stock. In other older languages like C, there is no such net and the balloon guy will never be able to recover the helium in that balloon to make another balloon for someone else. He would really appreciate it if you took the balloon back. This causes other problems: you may return the balloon, but forget that your sibling still wanted it. With their back turned, they don't notice you pulling the string out of their hand and later when they turn and look for it, they will find it is gone and be very upset.
Ridiculous analogy? Yes. Will it stick in your mind? Most likely :)
Here's an interesting article that I found on the web.
It talks about how this firm is able to parse a huge amount of financial data in a managed environment, essentially by object reuse and avoiding immutables such as string. They then go on and show that their program doesn't do any GC during the continuous operation phase.
This is pretty impressive, and I'd like to know if anyone else here has some more detailed guidelines as to how to do this. For one, I'm wondering how the heck you can avoid using string, when blatently some of the data inside the messages are strings, and whatever client application is looking at the messages will want to be passed those strings? Also, what do you allocate in the startup phase? How will you know it's enough? Is it simple a matter of claiming a big chunk of memory and keeping a reference to it so that GC doesn't kick in? What about whatever client application is using the messages? Does it also need to be written according to these stringent standards?
Also, would I need a special tool to look at the memory? I've been using SciTech memory profiler thus far.
I found the paper you linked to rather deficient:
It assumes, and wants you to assume, that garbage collection is the ultimate latency killer. They have not explained why they think so, nor have they explained in what way their system is not basically a custom-made garbage collector in disguise.
It talks about the amount of memory cleaned up in garbage collection, which is irrelevant: the time taken to garbage collect depends more on the number of objects, irrespective of their size.
The table of “results” at the bottom provides no comparison to a system that uses .NET’s garbage collector.
Of course, this doesn’t mean they’re lying and it’s nothing to do with garbage collection, but it basically means that the paper is just trying to sound impressive without actually divulging anything useful that you could use to build your own.
One thing to note from the beginning is where they say "Conventional wisdom has been developing low latency messaging technology required the use of unmanaged C++ or assembly language". In particular, they are talking about a sort of case where people would often dismiss a .NET (or Java) solution out of hand. For that matter, a relatively naïve C++ solution probably wouldn't make the grade either.
Another thing to consider here, is that they have essentially haven't so much gotten rid of the GC as replaced it - there's code there managing object lifetime, but it's their own code.
There are several different ways one could do this instead. Here's one. Say I need to create and destroy several Foo objects as my application runs. Foo creation is parameterised by an int, so the normal code would be:
public class Foo
{
private readonly int _bar;
Foo(int bar)
{
_bar = bar;
}
/* other code that makes this class actually interesting. */
}
public class UsesFoo
{
public void FooUsedHere(int param)
{
Foo baz = new Foo(param)
//Do something here
//baz falls out of scope and is liable to GC colleciton
}
}
A much different approach is:
public class Foo
{
private static readonly Foo[] FOO_STORE = new Foo[MOST_POSSIBLY_NEEDED];
private static Foo FREE;
static Foo()
{
Foo last = FOO_STORE[MOST_POSSIBLY_NEEDED -1] = new Foo();
int idx = MOST_POSSIBLY_NEEDED - 1;
while(idx != 0)
{
Foo newFoo = FOO_STORE[--idx] = new Foo();
newFoo._next = FOO_STORE[idx + 1];
}
FREE = last._next = FOO_STORE[0];
}
private Foo _next;
//Note _bar is no longer readonly. We lose the advantages
//as a cost of reusing objects. Even if Foo acts immutable
//it isn't really.
private int _bar;
public static Foo GetFoo(int bar)
{
Foo ret = FREE;
FREE = ret._next;
return ret;
}
public void Release()
{
_next = FREE;
FREE = this;
}
/* other code that makes this class actually interesting. */
}
public class UsesFoo
{
public void FooUsedHere(int param)
{
Foo baz = Foo.GetFoo(param)
//Do something here
baz.Release();
}
}
Further complication can be added if you are multithreaded (though for really high performance in a non-interactive environment, you may want to have either one thread, or separate stores of Foo classes per thread), and if you cannot predict MOST_POSSIBLY_NEEDED in advance (the simplest is to create new Foo() as needed, but not release them for GC which can be easily done in the above code by creating a new Foo if FREE._next is null).
If we allow for unsafe code we can have even greater advantages in having Foo a struct (and hence the array holding a contiguous area of stack memory), _next being a pointer to Foo, and GetFoo() returning a pointer.
Whether this is what these people are actually doing, I of course cannot say, but the above does prevent GC from activating. This will only be faster in very high throughput conditions, if not then letting GC do its stuff is probably better (GC really does help you, despite 90% of questions about it treating it as a Big Bad).
There are other approaches that similarly avoid GC. In C++ the new and delete operators can be overridden, which allows for the default creation and destruction behaviour to change, and discussions of how and why one might do so might interest you.
A practical take-away from this is when objects either hold resources other than memory that are expensive (e.g. connections to databases) or "learn" as they continue to be used (e.g. XmlNameTables). In this case pooling objects is useful (ADO.NET connections do so behind the scenes by default). In this case though a simple Queue is the way to go, as the extra overhead in terms of memory doesn't matter. You can also abandon objects on lock contention (you're looking to gain performance, and lock contention will hurt it more than abandoning the object), which I doubt would work in their case.
From what I understood, the article doesn't say they don't use strings. They don't use immutable strings. The problem with immutable strings is that when you're doing parsing, most of the strings generated are just throw-away strings.
I'm guessing they're using some sort of pre-allocation combined with free lists of mutable strings.
I worked for a while with a CEP product called StreamBase. One of their engineers told me that they were migrating their C++ code to Java because they were getting better performance, fewer bugs and better portability on the JVM by pretty much avoiding GC altogether. I imagine the arguments apply to the CLR as well.
It seemed counter-intuitive, but their product was blazingly fast.
Here's some information from their site:
StreamBase avoids garbage collection in two ways: Not using objects, and only using the minimum set of objects we need.
First, we avoid using objects by using Java primitive types (Boolean, byte, int, double, and long) to represent our data for processing. Each StreamBase data type is represented by one or more primitive type. By only manipulating the primitive types, we can store data efficiently in stack or array allocated regions of memory. We can then use techniques like parallel arrays or method calling to pass data around efficiently.
Second, when we do use objects, we are careful about their creation and destruction. We tend to pool objects rather than releasing them for garbage collection. We try to manage object lifecycle such that objects are either caught by the garbage collector in the young generation, or kept around forever.
Finally, we test this internally using a benchmarking harness that measures per-tuple garbage collection. In order to achieve our high speeds, we try to eliminate all per-tuple garbage collection, generally with good success.
In 99% of the time you will be wasting your bosses money when you try to achieve this. The article describes a absolute extreme scenario where they need the last drop of performance. As you can read in the article, there are great parts of the .NET framework that can't be used when trying to be GC-free. Some of the most basic parts of the BCL use memory allocations (or 'produce garbage', as the paper calls it). You will need to find a way around those methods. And even when you need absolute blazingly fast applications, you'd better first try to build an application/architecture that can scale out (use multiple machines), before trying to walk the no-GC route. The sole reason for them to use the no-GC route is they need an absolute low latency. IMO, when you need absolute speed, but don't care about the absolute minimum response time, it will be hard to justify a no-GC architecture. Besides this, if you try to build a GC-free client application (such as Windows Forms or WPF App); forget it, those presentation frameworks create new objects constantly.
But if you really want this, it is actually quite simple. Here is a simple how to:
Find out which parts of the .NET API can't be used (you can write a tool that analyzes the .NET assemblies using an introspection engine).
Write a program that verifies the code you or your developers write to ensure they don't allocate directly or use 'forbidden' .NET methods, using the safe list created in the previous point (FxCop is a great tool for this).
Create object pools that you initialize at startup time. The rest of the program can reuse existing object so that they won't have to do any new ops.
If you need to manipulate strings, use byte arrays for this and store byte arrays in a pool (WCF uses this technique also). You will have to create an API that allows manipulating those byte arrays.
And last but not least, profile, profile, profile.
Good luck