Rookie question about when a data reader actually gets released when it's constructed in a class method whose parameter is passed by ref vs. by value. I've been testing this today and the results are puzzling me a bit – hoping to get this clear in my head.
I have a class that I use for fetching data via ODBC from numerous remote servers, but I need to restrict how many ODBC connections are open to each server I'm attached to – so I'm being careful about properly disposing data readers when I'm done with them before opening another. Short version: I have a method called FillDataReader that takes a data reader object, fills it based on the query you supply, and passes it back.
If I pass it using ref and dispose the data reader from the calling side, all is well. The connection is released immediately and the client side can get another data reader filled without burning a connection. However, if I pass by value, the resource is not released, and if I open another data reader from the client side I now have two connections to that server.
Conceptually I get the difference – with ref only a single address is being used, as it's passing a "pointer to a pointer", and the dispose releases that resource. OK, but even when passing by value and doing an explicit dispose on the client side, what, exactly, is holding the resource? I'd rather pass by value here so I can use the nifty using construct on the client side, but more importantly I want to understand better what's happening here. In a nutshell, here's what it looks like:
[DB fetch class]
public bool FillDataReader(string pQueryString, ref System.Data.Odbc.OdbcDataReader pDataReader, out string pErrorReason)
{
    // uses a connection object that's been established at class construction time and stays up all the time
    ...
    try
    {
        pDataReader = _Command.ExecuteReader();
    }
    ...
}
[Calling class]
strSQL = "SELECT Alias, ObjectID, FROM vw_GlobalUser";
if (ServerList[currentServer].DatabaseFunctions.FillDataReader(strSQL, ref drTemp, false, out strErrorReason) == false)
….
drTemp.Dispose();
(at this point the connection is released to the server)
However, if I take the ref off the parameter and pass the data reader by value instead, the connection is not released when I call Dispose in the calling class. It goes away eventually, but I need it gone immediately (hence the call to Dispose).
So is the fill function in the DB fetch class hanging onto a reference to the allocated space on the heap somehow? I'm not sure I understand why that is – I understand it's using another copy of the address to the data reader on the stack to reference the data reader object on the heap, but when that goes out of scope, isn't it released? Maybe I need more coffee…
Since your calling code needs to receive the reference in order to release the object, you do need a ref (or out). Otherwise the parameter is only passed into the method, but not back out, so drTemp isn't updated with the data reader created in the FillDataReader method. When you pass by value, the method assigns the new reader to its own local copy of the parameter; the caller's drTemp still refers to whatever it referred to before (typically null), so disposing it does nothing to the reader that is actually holding the connection open.
Note that you may want to change the signature as follows to make the intention more clear:
public Result TryGetDataReader(string pQueryString, out System.Data.Odbc.OdbcDataReader pDataReader)
Changes that I propose:
Introduced the naming convention with "Try", which is common for this type of method
Made the pDataReader an out, since it doesn't need to be initialized when calling the method
Introduced a "Result" type, which should carry the success information and the error message (if any)
Related
I'm working on a TCP socket related application, where an object I've created refers to a System.Net.Sockets.Socket object. That latter object seems to become null, and in order to understand why, I would like to check whether my own object gets re-created. For that, I thought of the simplest possible approach: checking the memory address of this. However, when adding this to the watch window I get the following error message:
Name     Value
&this    error CS0211: Cannot take the address of the given expression
As it seems to be impossible to check the memory address of an object in C#, how can I verify that I'm dealing with the same or another object when debugging my code?
In C#, objects are moved during garbage collection. You can't simply take the address of an object, because the address changes when the GC heap is compacted.
Dealing with pointers in C# requires unsafe code and you leave the terrain of safe code, basically making it as unsafe as C++.
You can use a debugger like windbg, which displays the memory addresses of objects - but they will still change when GC moves them around.
If you want to see if a new instance of your class gets created, why not set a breakpoint in the constructor?
I agree with the answer from #thomas above.
You can add a unique identifier (such as a GUID) property to your object and use that to determine whether you are dealing with the same object.
You could also override the Equals method to compare two instances by that identifier, as below.
public class MyClass
{
    public Guid Id { get; } = Guid.NewGuid();

    public override bool Equals(object obj)
    {
        return obj is MyClass second && this.Id == second.Id;
    }

    // When Equals is overridden, GetHashCode should be overridden as well.
    public override int GetHashCode()
    {
        return this.Id.GetHashCode();
    }
}
As already explained, addresses of objects are not a viable means of reasoning about objects in garbage-collected virtual machines like DotNet. In DotNet you may get the chance to observe the address of an object if you use the fixed keyword, unsafe blocks, or GCHandle.Alloc(), but these are all very hacky and they keep objects fixed in memory so they cannot be garbage collected, which is something that you absolutely do not want. The moment you unfix an object, then its address is free to change, so you cannot keep track of it.
Luckily, you do not need any of that!
You don't need addresses, because all you want is a mnemonic for each object, for the purpose of identifying it during troubleshooting. For this, you have the following options:
Create a singleton which issues unique ids, and in the constructor of each object invoke this singleton to obtain a unique id, store the id with the object, and include the id in the ToString() method of the object, or in whatever other method you might be using for debug display.
Use the System.Runtime.Serialization.ObjectIDGenerator class, which does more or less what the singleton id generator would do, but in a more advanced, and possibly easier to use way. (I have no personal experience using it, so I cannot give any more advice about it.)
Use the System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode( object ) method, which returns what is known in other circles as The Identity Hash-Code of an Object. It is guaranteed to remain unchanged throughout the lifetime of the object, but it is not guaranteed to be unique among all objects. However, since it is 32-bits long, it will be a cold day in hell before another object gets issued the same hash code by coincidence, so it will serve all your troubleshooting purposes just fine.
Do yourself a favor and display the Identity Hash Code of your objects in hexadecimal; the number will be shorter, and will have a wider variety of digits than decimal, so it will be easier to retain in short-term memory while troubleshooting.
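To illustrate option 3, here is a quick sketch; the class name is just a placeholder for your own socket-wrapping object:

using System.Runtime.CompilerServices;

public class SocketWrapper
{
    public override string ToString()
    {
        // Identity hash code of this instance, formatted in hex for easier eyeballing.
        int identity = RuntimeHelpers.GetHashCode(this);
        return "SocketWrapper#" + identity.ToString("X8");
    }
}

Print this from the constructor and again from wherever the Socket field appears to go null; if the two values differ, you are looking at a different instance.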
I have two scenarios (examples below); both are perfectly legitimate methods of making a database request, but I'm not really sure which is best.
Example One - This is the method we generally use when building new applications.
private readonly IInterfaceName _repositoryInterface;

public ControllerName()
{
    _repositoryInterface = new Repository(Context);
}

public JsonResult MethodName(string someParameter)
{
    var data = _repositoryInterface.ReturnData(someParameter);
    return Json(data, JsonRequestBehavior.AllowGet);
}

protected override void Dispose(bool disposing)
{
    Context.Dispose();
    base.Dispose(disposing);
}

public IEnumerable<ModelName> ReturnData(string filter)
{
    Expression<Func<ModelName, bool>> query = q => q.ParameterName.ToUpper().Contains(filter);
    return Get(query);
}
Example Two - I've recently started seeing this more frequently
using (SqlConnection connection = new SqlConnection(
ConfigurationManager.ConnectionStrings["ConnectionName"].ToString()))
{
var storedProcedureName = GetStoredProcedureName();
using (SqlCommand command = new SqlCommand(storedProcedureName, connection))
{
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add("#Start", SqlDbType.Int).Value = start;
using (SqlDataReader reader = command.ExecuteReader())
{
// DATA IS READ AND PARSED
}
}
}
Both examples use Entity Framework in some form (the first more so than the other), there are Model and Mapping files for every table which could be interrogated. The main thing the second example does over the first (regarding EF) is utilising Migrations as part of the Stored Procedure code generation. In addition, both implement the Repository pattern similar to that which is in the second link below.
Code First - MSDN
Contoso University - Tutorial
My understanding of Example One is that the repository and context are instantiated once the Controller is called. When making the call to the repository it returns the data but leaves the context intact until it is disposed of at the end of the method. Example Two on the other hand will call Dispose as soon as the database call is finished with (unless forced into memory, e.g. using .ToList() on an IEnumerable). If my understanding is not correct, please correct me where appropriate.
So my main question is: what are the advantages and disadvantages of using one over the other? For example, is there a larger performance overhead in going with Example Two compared to Example One?
FYI: I've tried to search for an answer to the above but have been unsuccessful, so if you are aware of a similar question please feel free to point me in that direction.
You seem to be making a comparison like this:
Is it better to build a house or to install plumbing in the bathroom?
You can have both. You could have a repository (house) that uses data connections (plumbing) so it's not an "OR" situation.
There is no reason the call to ReturnData can't use a SqlCommand under the hood.
Now, the real important difference that is worth considering is whether or not the repository holds a resource (memory, connection, pipe, file, etc) open for its lifetime, or just per data call.
The advantage of using a using is that resources are only opened for the duration of the call. This helps immensely with scaling of the app.
On the other hand there's an overhead to opening connections, so it's better - particularly for single-threaded apps - to open a connection, do several tasks, and then close it.
So it really boils down to what type of app you're writing as to which approach you use.
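To make the per-call option concrete, here is a rough sketch of a repository method that only holds the connection for the duration of the call (the table, the ModelName.Name property, and the connection string name are placeholders, not taken from the question; assumes System.Data.SqlClient and System.Configuration):

public IEnumerable<ModelName> ReturnData(string filter)
{
    var results = new List<ModelName>();
    var connectionString = ConfigurationManager.ConnectionStrings["ConnectionName"].ConnectionString;

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT Name FROM dbo.SomeTable WHERE Name LIKE @Filter", connection))
    {
        command.Parameters.Add("@Filter", SqlDbType.NVarChar, 100).Value = "%" + filter + "%";
        connection.Open();

        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                results.Add(new ModelName { Name = reader.GetString(0) });
            }
        }
    }   // the connection is closed and returned to the pool here

    return results;
}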
Your second example isn't using Entity Framework. It seems you may have two different approaches to data access here, although it is hard to tell from the repository snippet, as it quite rightly hides the data access implementation. The second example is correctly using a "using" statement, as you should on any object that implements IDisposable; it means you don't have to worry about calling Dispose. This is pure ADO.NET, which is what Entity Framework uses under the hood.
If the first example is using Entity Framework, you most likely have lazy loading in play, in which case you need the DbContext to remain alive until the query has actually been executed. Entity Framework is an ORM tool. It too uses ADO.NET under the hood to connect to the database, but it also offers you a lot more on top. A good book on both subjects should help you.
I found that learning ADO.NET first helps a lot in understanding how Entity Framework retrieves data from the database.
The using statement is good practice wherever you find an object that implements IDisposable. You can read more about that here: IDisposable the right way
In response to the change to the question - the answer still on the whole remains the same. In terms of performance - how fast are the queries returned? Does one perform better than the other? Only your current system and setup can tell you that. Both approaches seem to be doing things the correct way.
I haven't worked with Migrations, so I am not sure why you are getting ADO.NET-style queries integrating with your EF models, but I wouldn't be surprised by this functionality. Entity Framework, as I have experienced it, creates the queries for you and then executes them using the ADO.NET objects from your second example. The key point is that you want a "using" block for the SqlConnection, SqlCommand and SqlDataReader objects; nesting them as in your second example is the usual pattern, since disposing the connection does not automatically dispose the command or reader created inside the block.
There is nothing stopping you putting a "using" block in your repository around the context, but when it comes to lazily loading the related entities you will get an error, as the context will have been disposed. If you need to make this change you can include the relevant elements in your query and do away with the lazy loading approach (see the sketch below). There are performance gains in certain situations for doing this, but again you need to balance this in terms of how your system is performing.
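A hedged sketch of what that might look like with eager loading in EF6 (the context, entity, and property names here are made up for illustration):

using System.Collections.Generic;
using System.Data.Entity;   // for the Include() lambda overload
using System.Linq;

public IEnumerable<Order> GetOrdersWithLines(int customerId)
{
    using (var context = new MyDbContext())
    {
        return context.Orders
                      .Include(o => o.OrderLines)        // pull the related entities up front
                      .Where(o => o.CustomerId == customerId)
                      .ToList();                         // materialise before the context is disposed
    }
}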
In our application we are generating reports using a Windows Service. The data for the reports is fetched from SQL Server using a stored procedure. In some scenarios the result set returned contains 250,000 records (we cannot avoid this, and we need the data in one go because we have to run calculations over the complete set).
Problem
Our application reads this data through a data reader and converts the result set into a custom collection of custom objects. Because the data is so large, it cannot hold the complete result in memory and throws an OutOfMemoryException. Watching the process in Task Manager while the job runs, both memory usage and CPU utilization go very high.
I am not sure what we should do in this case.
Can we increase the amount of memory allocated to a single process running under the CLR?
Any other workarounds?
Any help would be really appreciated
Why do I need all the data at once: we need to do calculations on the complete result set
We are using ADO.NET and transforming the data set into our custom object collection
Our system is 32-bit
We cannot page the data
We cannot move the computation to SQL Server
This stack trace might help:
Exception of type 'System.OutOfMemoryException' was thrown.
Server stack trace:
   at System.Collections.Generic.Dictionary`2.ValueCollection.System.Collections.Generic.IEnumerable<TValue>.GetEnumerator()
   at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
   at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
   at System.Collections.Generic.List`1.AddRange(IEnumerable`1 collection)
   at MyProject.Common.Data.DataProperty.GetPropertiesForType(Type t) in C:\Ashish-Stuff\Projects\HCPA\Dev Branch\Common\Benefits.Common\Data\DataProperty.shared.cs:line 60
   at MyProject.Common.Data.Extensions.GetProperties[T](T target) in C:\Ashish-Stuff\Projects\HCPA\Dev Branch\Common\Benefits.Common\Data\Extensions.shared.cs:line 30
   at MyProject.Common.Data.Factories.SqlServerDataFactoryContract`1.GetData(String procedureName, IDictionary`2 parameters, Nullable`1 languageId, Nullable`1 pageNumber, Nullable`1 pageSize)
Thanks,
Ashish
Could you serialize your custom collection of objects to disk every 1,000 rows or so, and then, when you return the data, paginate it from those files?
More info on your use case as to why you need to pull back 250,000 rows of data would be helpful.
My first thought was that the computation could be done on the SQL Server side by some stored procedure. I suspect that this approach requires some SQL Server jedi... but anyway, have you considered such an approach?
I would love to see a code sample highlighting where exactly you are getting this error from. Is it on the data pull itself (populating the reader) or is it creating the object and adding it to the custom collection (populating the collection).
I have had similar issues before, dealing with VERY LARGE datasets, but met great success with leaving the data in a stream for as long as possible. Streams keep the data in a raw, compact form in memory, and you won't ever have a live object graph with direct access to the entire mess until you finish building the objects. Now, given that the stack trace shows the error on a "MoveNext" operation, this may not work for you. I would then say try to chunk the data: grab 10k rows at a time or something; I know that this can be done with SQL (a sketch of that follows below). It will make the data read take a lot longer, though.
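A rough sketch of that chunked approach, assuming SQL Server 2012 or later; the table, columns, connectionString variable, and the AccumulateRow helper are placeholders (your stored procedure would need a paged equivalent):

const int chunkSize = 10000;
int offset = 0;
bool moreRows = true;

while (moreRows)
{
    moreRows = false;
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT Id, Amount FROM dbo.ReportRows ORDER BY Id " +
        "OFFSET @Offset ROWS FETCH NEXT @ChunkSize ROWS ONLY", connection))
    {
        command.Parameters.Add("@Offset", SqlDbType.Int).Value = offset;
        command.Parameters.Add("@ChunkSize", SqlDbType.Int).Value = chunkSize;
        connection.Open();

        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                moreRows = true;
                // feed each row into a running calculation instead of storing it
                AccumulateRow(reader.GetInt32(0), reader.GetDecimal(1));
            }
        }
    }
    offset += chunkSize;
}

Here AccumulateRow stands in for whatever running totals or aggregates your calculation needs.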
EDIT
If you read this from the database into a local stream that you then pass around (just be careful not to close it), then you shouldn't run into these issues. Make a data wrapper class that you can pass around with an open stream and an open reader. Store the data in the stream and then use wrapper functions to read the specific data you need from it: things like GetSumOfXField() or AverageOfYValues(), and so on. The data will never be in a live object, but you won't have to keep going back to the database for it.
Pseudo Example
public void ReadingTheDataFunction()
{
    DbDataReader reader = dbCommand.ExecuteReader();
    MyDataStore.FillDataSource(reader);
}

private void FillDataSource(DbDataReader reader)
{
    StreamWriter writer = new StreamWriter(GlobalDataStream);
    while (reader.Read())
        writer.WriteLine(BuildStringFromDataRow(reader));
    writer.Flush();
    reader.Close();
}

private CustomObject GetNextRow()
{
    string line = GlobalDataReader.ReadLine();
    // parse the string into a CustomObject here (pseudocode)
    CustomObject ret = ParseLineIntoCustomObject(line);   // placeholder for your parsing code
    return ret;
}
From there you pass around MyDataStore, and as long as the stream and reader aren't closed, you can move your position around, go looking for individual entries, compile sums and averages, and so on. You never even really have to know you aren't dealing with a live object, as long as you only interact with it via the wrapper functions.
Possible Duplicate:
Calling null on a class vs Dispose()
I just want some information regarding disposing an object.
I have created an Employee class which implements the IDisposable interface. Below is the sample code:
public class Employee : IDisposable
{
private Int32 _RunID;
public Int32 RunID { get { return _RunID; } set { _RunID = value; } }
public void Dispose()
{
//Dispose(true);
}
}
Now my question is: is it good coding practice to implement IDisposable on every class we create and dispose of it? I have seen many other people's code that simply sets ObjEmployee = null;, so I'm confused about which is better: setting the reference to null, implementing IDisposable, or neither of the above?
It depends: does your object hold resources (file handles, sockets, database connections, and other unmanaged resources or managed wrappers around them) that need to be released along with it? If yes, then you need Dispose(). If your class contains only basic types or plain data, there is nothing to dispose; setting the reference to null at most makes the object eligible for garbage collection a little sooner.
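If the class does own such a resource, a minimal sketch of the usual pattern looks like this (the IDbConnection field is a hypothetical example of something worth disposing, not part of the original Employee class):

public class Employee : IDisposable
{
    private System.Data.IDbConnection _connection;   // hypothetical owned resource
    private bool _disposed;

    public int RunID { get; set; }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
            return;

        if (disposing && _connection != null)
        {
            _connection.Dispose();
            _connection = null;
        }

        _disposed = true;
    }
}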
When you set ObjEmployee = null, you only make the object instance eligible for the garbage collector, but you have no influence over when the actual clean-up takes place, and it may take a while. When you call the Dispose() method, the object's resources are released immediately and deterministically; the memory itself is still reclaimed later by the garbage collector, but you are not left waiting for expensive resources such as connections or file handles to be freed.
The fundamental question, in deciding whether a class should implement IDisposable, is whether instances of that class have taken on the responsibility of seeing that other entities get cleaned up; typically those other entities will be altering their behavior on behalf of the IDisposable object and at the expense of other entities, and the IDisposable object is responsible for letting them know when they no longer need to do so.
For example, if any code, anywhere, uses the C function fopen() to open for read-write access a file which is located on a server, the server will alter its behavior by forbidding anyone else from accessing the file until it receives word that the program that opened it has no more need of it. When the program no longer needs exclusive use of the file, it can call fclose() which will in turn cause the server to be notified that the file should again be available to other applications.
If a method in a C# class calls a routine which calls fopen(), and that routine returns after putting the FILE* in a place the C# program knows about but nothing else does, that method takes on the responsibility for seeing that fclose() somehow gets called with that FILE*. The file needs to be fclose()d, and nothing else in the system has the information or impetus necessary to do so, so the responsibility falls to that C# class.
If the C# method were to return without storing the FILE* anywhere, the file would never get closed, and nobody else, anywhere in the universe, would be able to use it unless or until the application exits. If the C# method has to exit without yielding exclusive use of the file, it must store the FILE* in a way that will ensure that someone, somewhere, will clean it up after exclusive use is no longer required. The normal pattern would be for the method to store the FILE* in a class field, and for the class containing the method to implement IDisposable by copying and blanking that field, checking whether it was non-blank, and if so calling fclose() on the stored FILE* (see the sketch below).
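A rough illustration of that pattern. The P/Invoke declarations assume the C runtime is reachable as msvcrt.dll on Windows; this is a sketch under those assumptions, not production code:

using System;
using System.Runtime.InteropServices;
using System.Threading;

public sealed class ExclusiveFile : IDisposable
{
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr fopen(string filename, string mode);

    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int fclose(IntPtr file);

    private IntPtr _file;   // the FILE* we are responsible for

    public ExclusiveFile(string path)
    {
        _file = fopen(path, "r+");
        if (_file == IntPtr.Zero)
            throw new InvalidOperationException("Could not open " + path);
    }

    public void Dispose()
    {
        // Copy and blank the field, then clean up only if it was non-blank.
        IntPtr file = Interlocked.Exchange(ref _file, IntPtr.Zero);
        if (file != IntPtr.Zero)
            fclose(file);
    }
}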
The important thing to realize is that when an object is destroyed by the garbage collector, the system won't care what's in any of the object fields. It won't even look at them. What matters is whether the object has any unfulfilled responsibilities for ensuring that outside entities, which may not even be on the same machine, are informed when their services are no longer needed.
I've sort of inherited some code on this scientific modelling project, and my colleagues and I are getting stumped by this problem. The guy who wrote this is now gone, so we can't ask him (go figure).
Inside the data access layer, there is this insert() method. This does what it sounds like -- it inserts records into a database. It is used by the various objects being modeled to tell the database about themselves during the course of the simulation.
However, we noticed that during longer simulations after a fair number of database inserts, we eventually got connection timeouts. So we upped the timeout limits, and then we started getting "out of memory" errors from PostgreSQL. We eventually pinpointed the problem to a line where an IDbCommand object uses Prepare(). Leaving it in causes memory usage to indefinitely go up. Commenting out this line causes the code to work just fine, and eliminates all the memory problems. What does Prepare() do that causes this? I can't find anything in the documentation to explain this.
A compressed version of the code follows.
public virtual void insert(DomainObjects.EntityObject obj)
{
lock (DataBaseProvider.DataBase.Connection)
{
IDbCommand cmd = null;
IDataReader noInsertIdReader = null;
IDataReader reader = null;
try
{
if (DataBaseProvider.DataBase.Validate)
{ ... }
// create and prepare the insert command
cmd = createQuery(".toInsert", obj);
cmd.Prepare(); // This is what is screwing things up
// get the query to retrieve the sequence number
SqlStatement lastInsertIdSql = DAOLayer...getStatement(this.GetType().ToString() + ".toGetLastInsertId");
// if the obj insert does not use a sequence, execute the insert command and return
if (lastInsertIdSql == null)
{
noInsertIdReader = cmd.ExecuteReader();
noInsertIdReader.Close();
return;
}
// append the sequence query to the end of the insert statement
cmd.CommandText += ";" + lastInsertIdSql.Statement;
reader = cmd.ExecuteReader();
// read the sequence number and set the objects id
...
}
// deal with some specific exceptions
...
}
}
EDIT: (In response to the first given answer) All the database objects do get disposed in a finally block. I just cut that part out here to save space. We've played with that a bit and that didn't make any difference, so I don't think that's the problem.
You'll notice that IDbCommand and IDataReader both implement IDisposable. Whenever you create an instance of an IDisposable object you should either wrap it in a using statement or call Dispose once you're finished. If you don't you'll end up leaking resources (sometimes resources other than just memory).
Try this in your code
using (IDbCommand cmd = createQuery(".toInsert", obj))
{
cmd.Prepare(); // This is what is screwing things up
...
//the rest of your example code
...
}
EDIT to talk specifically about Prepare
I can see from the code that you're preparing the command and then never reusing it.
The idea behind preparing a command is that it costs extra overhead to prepare, but each time you then use the command it will be more efficient than a non-prepared statement. This is good if you've got a command that you're going to reuse a lot, and it is a trade-off between the preparation overhead and the performance gain on each execution.
So in the code you've shown us you are preparing the command (paying all of the overhead) and getting no benefit, because you then immediately throw the command away!
I would either recycle the prepared command (a sketch of that follows below), or just ditch the call to Prepare().
I have no idea why the prepared commands are leaking, but you shouldn't be preparing so many commands in the first place (especially single-use commands).
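For illustration, a hedged sketch of preparing once and reusing the command across many inserts; the table, column, and parameter names and the objectsToInsert collection are made up, and the provider-specific details of your DAO layer are omitted:

using (IDbCommand cmd = connection.CreateCommand())
{
    cmd.CommandText = "INSERT INTO measurements (entity_id, value) VALUES (@entityId, @value)";

    IDbDataParameter entityId = cmd.CreateParameter();
    entityId.ParameterName = "@entityId";
    entityId.DbType = DbType.Int32;
    cmd.Parameters.Add(entityId);

    IDbDataParameter value = cmd.CreateParameter();
    value.ParameterName = "@value";
    value.DbType = DbType.Double;
    cmd.Parameters.Add(value);

    cmd.Prepare();   // pay the preparation cost once

    foreach (var obj in objectsToInsert)
    {
        entityId.Value = obj.EntityId;
        value.Value = obj.Value;
        cmd.ExecuteNonQuery();   // reuse the prepared statement for every row
    }
}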
The Prepare() method was designed to make the query run more efficiently. It is entirely up to the provider to implement this. A typical one creates a temporary stored procedure, giving the server an opportunity to pre-parse and optimize the query.
There are a couple of ways code like this could leak memory. One is a typical .NET detail: a practical implementation of an IDbCommand class always has a Dispose() method to release resources explicitly before the finalizer thread does it, and I don't see it being used in your snippet. But that is pretty unlikely in this case; it is very hard to consume all memory without ever running the garbage collector. You can check with Perfmon.exe by observing the performance counters for the garbage collector.
The next candidate is more insidious: you are using a big chunk of native code. Database providers are not that simple. The FOSS kind tends to be designed to let you get the bugs out of them yourself; the source code is available for a reason. Perfmon.exe again to diagnose that: seeing the managed heaps staying within bounds while private bytes explode is a dead giveaway.
If you don't feel much like debugging the provider, you could just comment out the statement.