DataTables vs IEnumerable<T>

I'm having a debate with another programmer I work with.
For a database return type, are there any significant memory usage or performance differences, or other cons, that should make someone avoid DataSets and DataTables in favour of types that implement IEnumerable<T>, or vice versa?
I prefer returning types that implement IEnumerable<T> (List<T>, T[], etc.) because they are more lightweight, give strongly typed access to the object's properties, allow richer information about the underlying type, and so on. They do take more time to set up, though, when mapping manually from a data reader.
Is laziness the only reason to use DataTables these days?

DataTables are definitely much heavier than Lists, both in memory requirements, and in processor time spent creating them / filling them up.
Using a DataReader is considerably faster (although more verbose) than using DataTables (I'm assuming you're using a DataAdapter to fill them).
That said...
Unless this is in some place where it really matters, you're probably fine either way, and both methods will be fast enough, so just go with whatever is more comfortable in each case. (Sometimes you want to fill them with little code, sometimes you want to read them with little code.)
I myself tend to only use DataTables when I'm binding to a GridView, or when I need more than one resultset active at the same time.
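For comparison with the DataAdapter/DataTable route, here is a minimal sketch of the more verbose DataReader approach. The User class and the Id/FirstName/LastName column names are hypothetical; the reader could come from any ADO.NET provider (for example SqlCommand.ExecuteReader). It also shows one place to turn DBNull into a plain null, so callers never see DBNull.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical entity, for illustration only.
public class User
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class UserMapper
{
    // Reads every row from an open IDataReader into a strongly typed list.
    // Assumes the result set has columns named Id, FirstName and LastName.
    public static List<User> ReadUsers(IDataReader reader)
    {
        var users = new List<User>();

        // Look up column ordinals once instead of once per row.
        int idOrdinal = reader.GetOrdinal("Id");
        int firstOrdinal = reader.GetOrdinal("FirstName");
        int lastOrdinal = reader.GetOrdinal("LastName");

        while (reader.Read())
        {
            users.Add(new User
            {
                Id = reader.GetInt32(idOrdinal),
                // Translate DBNull into an ordinary C# null.
                FirstName = reader.IsDBNull(firstOrdinal) ? null : reader.GetString(firstOrdinal),
                LastName = reader.IsDBNull(lastOrdinal) ? null : reader.GetString(lastOrdinal)
            });
        }
        return users;
    }
}
```

Because the mapper depends only on IDataReader, it can be exercised without a database by passing the reader returned from DataTable.CreateDataReader().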

Another advantage to using the System.Collections classes is that you get better sorting and searching options. I don't know of any reasonable way to alter the way a DataTable sorts or searches; with the collection classes you just have your class implement IComparable or IEquatable and you can completely customize how List.Sort and List.Contains work.
Also with lists you don't have to worry about DBNull, which has tripped me up on more than one occasion because I was expecting null and got DBNull.
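A small sketch of the IComparable/IEquatable point: the hypothetical Person class below sorts by last name, then first name, and compares names case-insensitively. List<T>.Sort() and List<T>.Contains() pick this behaviour up through Comparer<T>.Default and EqualityComparer<T>.Default. The ordering rules are illustrative assumptions, not a recommendation.

```csharp
using System;
using System.Collections.Generic;

public class Person : IComparable<Person>, IEquatable<Person>
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // Used by List<T>.Sort() when no comparer is supplied.
    public int CompareTo(Person other)
    {
        if (other == null) return 1;
        int byLast = string.Compare(LastName, other.LastName, StringComparison.OrdinalIgnoreCase);
        return byLast != 0
            ? byLast
            : string.Compare(FirstName, other.FirstName, StringComparison.OrdinalIgnoreCase);
    }

    // Used by List<T>.Contains() and List<T>.IndexOf().
    public bool Equals(Person other)
    {
        return other != null
            && string.Equals(FirstName, other.FirstName, StringComparison.OrdinalIgnoreCase)
            && string.Equals(LastName, other.LastName, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj) => Equals(obj as Person);

    // Must agree with Equals: case-insensitive hash of both names.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(FirstName ?? "") ^
        StringComparer.OrdinalIgnoreCase.GetHashCode(LastName ?? "") * 31;
}
```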

I also like the fact with IEnumerable<T> that you can enhance the underlying type of the collection with methods and properties which makes implementation far more elegant, and the code more maintainable. For example the FullName property. You can also add extension methods to the class if it is out of your control.
public class SomeUser
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string FullName { get { return string.Format("{0} {1}", FirstName, LastName); } }
}
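The extension-method remark can be sketched like this, assuming a hypothetical ExternalUser class that you cannot edit (generated or third-party code, say). Because this is an extension method, FullName is called as a method, FullName(), rather than as a property.

```csharp
using System;

// Stand-in for a class you cannot modify.
public class ExternalUser
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Extension methods "add" behaviour without touching ExternalUser itself.
public static class ExternalUserExtensions
{
    public static string FullName(this ExternalUser user)
    {
        if (user == null) throw new ArgumentNullException(nameof(user));
        return string.Format("{0} {1}", user.FirstName, user.LastName);
    }
}
```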

Using DataTables directly means tying yourself to the underlying data source and how it is laid out. This is not good from a maintainability point of view. If all your view needs is a list of some objects, that's all you should be giving it.

Related

Ending with too many objects (layered design)

I have a lot of dropdown lists and custom grids on my web form, which are displayed to the end user. Each is populated from the database through a DAL, and I have a separate class defined for each. However, I am thinking about reducing the number of classes, as every new requirement results in a separate custom object.
How can I reduce the number of classes for such requirements? Should I use DataSets, lists, etc.?

"Separate classes defined for each" and "How can I reduce the no. of classes for such requirements".
Do you really create a new class for each dropdown list?
In my experience, I usually generalize it with this class:
public class DropDownItem<T>
{
    public string Display { get; set; }
    public T Value { get; set; }
}
It can also be done with a Dictionary<TKey, TValue>, though.
I have never used it in ASP.NET, but it works well with WinForms and WPF data binding. In ASP.NET specifically, I think a normal select/option list is enough.
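One way to avoid writing a new class per drop-down is to pair the generic DropDownItem<T> with a LINQ projection. This is a sketch; the Country type and the ToDropDownItems helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DropDownItem<T>
{
    public string Display { get; set; }
    public T Value { get; set; }
}

// Hypothetical DAL row type; any entity with a key and a caption would do.
public class Country
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class DropDownFactory
{
    // Projects any sequence into drop-down items using caller-supplied selectors,
    // so a new list needs a pair of lambdas instead of a new class.
    public static List<DropDownItem<TValue>> ToDropDownItems<TSource, TValue>(
        this IEnumerable<TSource> source,
        Func<TSource, string> display,
        Func<TSource, TValue> value)
    {
        return source
            .Select(s => new DropDownItem<TValue> { Display = display(s), Value = value(s) })
            .ToList();
    }
}
```

In WinForms, the resulting list can be bound by setting comboBox.DisplayMember = "Display", comboBox.ValueMember = "Value" and comboBox.DataSource = items; an ASP.NET DropDownList uses DataTextField and DataValueField for the same purpose.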
For a GridView, however, you need to make your classes more general: declare a class that has most of the parameters, with the optional ones nullable.
For example, one request has 10 parameters: 5 are mandatory and the other 5 are nullable. Grid A displays parameters 1, 2, 3, 4, 5, 7, 8 and grid B displays parameters 1, 2, 3, 4, 5, 6, 9, 10. This way, you can reuse one class across many more grids.
Don't use DataSets/DataTables. It is better to have more classes than to use DataSets: maintainability is better because classes are strongly typed, rather than relying on "COLUMN_NAME" strings as in a DataSet.

I hope this doesn't sound too critical, but if each requirement being added as a class is ending up as a lot work, perhaps you can look into inheritance to clean up boilerplate/shared code in those classes.
Generally a lot of small classes (that don't overlap functionality with other classes) is a good thing. The opposite complexity problem, the "god" class, where all your code is stuffed into fewer classes, is much worse.

C# LINQ and calculations involving large datasets

This is more of a technical "how-to" or "best approach" question.
We have a current requirement to retrieve records from the database, place them into an 'in-memory' list and then perform a series of calculations on the data, i.e. maximum values, averages and some more specific custom statistics as well.
Getting the data into an 'in-memory' list is not a problem as we use NHibernate as our ORM and it does an excellent job of retrieving data from the database. The advice I am seeking is how should we best perform calculations on the resulting list of data.
Ideally I would like to create a method for each statistic, MaximumValue(), AverageValueUnder100(), MoreComplicatedStatistic(), etc., each taking the required variables and returning the result. This approach would also make unit testing a breeze and provide us with excellent coverage.
Would there be a performance hit if we perform a LINQ query for each calculation, or should we consolidate as many statistic calculations as possible into as few LINQ queries as possible? For example, it doesn't make much sense to pass the list of data to a method called AverageValueBelow100 and then pass the entire list again to another method, AverageValueBelow50, when both could effectively be computed with one LINQ query.
How can we achieve a high level of granularity and separation without sacrificing performance?
Any advice ... is the question clear enough?

Depending on the complexity of the calculation, it may be best to do it in the database. If it is complex enough that you need to bring the data in as objects and incur that overhead, you may want to avoid multiple iterations over your result set, and you may want to consider using Aggregate. See http://geekswithblogs.net/malisancube/archive/2009/12/09/demystifying-linq-aggregates.aspx for a discussion of it. You would be able to unit test each aggregate separately, but then (potentially) project multiple aggregates within a single iteration.
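A sketch of the Aggregate idea, assuming the values have already been loaded into memory as doubles: each statistic is a small accumulator function that can be unit tested on its own, and Compute runs any combination of them in a single pass with Enumerable.Aggregate. The Stats and StatSteps names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Running state for several statistics gathered in one pass.
public sealed class Stats
{
    public double Max = double.MinValue;
    public double SumBelow100;
    public int CountBelow100;
    public double SumBelow50;
    public int CountBelow50;

    public double AverageBelow100 => CountBelow100 == 0 ? double.NaN : SumBelow100 / CountBelow100;
    public double AverageBelow50 => CountBelow50 == 0 ? double.NaN : SumBelow50 / CountBelow50;
}

public static class StatSteps
{
    // Each step is a small, separately testable accumulator function.
    public static Stats TrackMax(Stats s, double v)
    {
        s.Max = Math.Max(s.Max, v);
        return s;
    }

    public static Stats TrackBelow100(Stats s, double v)
    {
        if (v < 100) { s.SumBelow100 += v; s.CountBelow100++; }
        return s;
    }

    public static Stats TrackBelow50(Stats s, double v)
    {
        if (v < 50) { s.SumBelow50 += v; s.CountBelow50++; }
        return s;
    }

    // Runs all the supplied steps over the data in a single iteration.
    public static Stats Compute(IEnumerable<double> values, params Func<Stats, double, Stats>[] steps)
    {
        return values.Aggregate(new Stats(), (acc, v) =>
        {
            foreach (var step in steps) acc = step(acc, v);
            return acc;
        });
    }
}
```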

I don't agree that it is best "to do it all in the database".
Well-written LINQ queries will result in good SQL queries being executed against the database, which should perform well enough (unless you are doing data-warehouse work). This assumes you are using the LINQ provider for NHibernate and not LINQ to Objects.
It reads well, it is easy to change, and it keeps your business logic in one place.
If that is too slow for your needs, you can check the generated SQL and tweak your LINQ queries, or try to precompile them; in the end you can still go back to writing the beloved stored procedures, and start spreading your business logic all over the place.
Will there be a performance hit? Yes, you might lose a few milliseconds, but is that worth the price you have to pay for separating your logic?

To address the "I would like to create a method for each statistic" concern, I would suggest building a kind of statistician class. Here is some pseudocode to express the idea:
class Statistician
{
    public bool MustCalculateFIRSTSTATISTIC { get; set; } // Please rename me!
    public bool MustCalculateSECONDSTATISTIC { get; set; } // Please rename me!
    public void ProcessObject(object Object) // Replace object and rename
    {
        if (MustCalculateFIRSTSTATISTIC)
            CalculateFIRSTSTATISTIC(Object);
        if (MustCalculateSECONDSTATISTIC)
            CalculateSECONDSTATISTIC(Object);
    }
    public object GetFIRSTSTATISTIC() // Replace object, rename
    { /* ... */ return null; }
    public object GetSECONDSTATISTIC() // Replace object, rename
    { /* ... */ return null; }
    private void CalculateFIRSTSTATISTIC(object Object) // Replace object
    { /* ... */ }
    private void CalculateSECONDSTATISTIC(object Object) // Replace object
    { /* ... */ }
}
If I had to do this, I would probably try to make it generic and use collections of delegates instead of methods, but since I don't know your context, I'll leave it at that. Also note that I only used parameters of type object, but that's only because I'm not suggesting you use DataRows, entities, or whatnot; I'll leave that to the other folks who know more than me on the subject!
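The "collections of delegates" variant might look roughly like this. It is a sketch, not the answerer's code; Statistician<T>, Add, Process and Get are hypothetical names. Each statistic registers a per-item step and a result selector, so enabling more statistics does not add more passes over the data.

```csharp
using System;
using System.Collections.Generic;

public class Statistician<T>
{
    private readonly List<Action<T>> _steps = new List<Action<T>>();
    private readonly Dictionary<string, Func<double>> _results = new Dictionary<string, Func<double>>();

    // Registers a statistic: 'step' sees every item, 'result' reads the final value.
    public void Add(string name, Action<T> step, Func<double> result)
    {
        _steps.Add(step);
        _results[name] = result;
    }

    // Single pass over the data, feeding every item to every registered statistic.
    public void Process(IEnumerable<T> items)
    {
        foreach (var item in items)
            foreach (var step in _steps)
                step(item);
    }

    // Reads the current value of a registered statistic by name.
    public double Get(string name) => _results[name]();
}
```

Usage: register closures such as stats.Add("Max", v => max = Math.Max(max, v), () => max), then call Process once over the loaded list and read each result with Get.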

Is it ok to use C# Property like this

One of my fellow developers has code similar to the following snippet:
class Data
{
    public string Prop1
    {
        get
        {
            // return the value stored in the database via a query
        }
        set
        {
            // Save the data to local variable
        }
    }
    public void SaveData()
    {
        // Write all the properties to a file
    }
}
class Program
{
    public void SaveData()
    {
        Data d = new Data();
        // Fetch the information from database and fill the local variable
        d.Prop1 = d.Prop1;
        d.SaveData();
    }
}
Here the Data class properties fetch the information from the DB dynamically. When the data needs to be saved to a file, the developer creates an instance, fills the properties using self-assignment, and finally calls SaveData. I tried arguing that this use of properties is not correct, but he is not convinced.
These are his points:
There are nearly 20 such properties.
Fetching all the information is not required except when saving.
Writing a utility method to fetch everything, instead of using self-assignment, would duplicate the code that is already in the properties.
Is this usage correct?

I don't think another developer working on the same code will be happy to see:
d.Prop1 = d.Prop1;
Personally I would never do that.
Also, it is not the best idea to use a property to load data from the DB.
I would have a method that loads data from the DB into a local variable, and then you can get that data through the property. Also, get and set should logically work with the same data; it is strange to use get to read from the DB but set to work with a local variable.

Properties should really be as lightweight as possible.
When other developers are using properties, they expect them to be intrinsic parts of the object (that is, already loaded and in memory).
The real issue here is that of symmetry - the property get and set should mirror each other, and they don't. This is against what most developers would normally expect.
Having the property load up from database is not recommended - normally one would populate the class via a specific method.
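A minimal sketch of the "populate via a specific method" alternative: the properties become plain auto-properties, so get and set are symmetric, and a static Load method makes the single database round trip explicit. IDataSource and InMemoryDataSource are hypothetical stand-ins for a real query or ORM call; the counter on InMemoryDataSource only exists to make the number of round trips visible.

```csharp
using System.Collections.Generic;

// Hypothetical data source; in real code this would wrap one SQL query or ORM call.
public interface IDataSource
{
    IDictionary<string, string> LoadRow(int id);
}

public class Data
{
    // Plain properties: get and set work on the same in-memory value.
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }

    // Explicit method that makes the database trip visible to callers
    // and fetches every property in a single call.
    public static Data Load(IDataSource source, int id)
    {
        IDictionary<string, string> row = source.LoadRow(id);
        return new Data { Prop1 = row["Prop1"], Prop2 = row["Prop2"] };
    }
}

// Test double that counts round trips.
public class InMemoryDataSource : IDataSource
{
    private readonly Dictionary<int, Dictionary<string, string>> _rows;

    public InMemoryDataSource(Dictionary<int, Dictionary<string, string>> rows) { _rows = rows; }

    public int Calls { get; private set; }

    public IDictionary<string, string> LoadRow(int id)
    {
        Calls++;
        return _rows[id];
    }
}
```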

This is pretty terrible, imo.
Properties are supposed to be quick / easy to access; if there's really heavy stuff going on behind a property it should probably be a method instead.
Having two utterly different things going on behind the same property's getter and setter is very confusing. d.Prop1 = d.Prop1 looks like a meaningless self-assignment, not a "Load data from DB" call.
Even if you do have to load twenty different things from a database, doing it this way forces it to be twenty different DB trips; are you sure multiple properties can't be fetched in a single call? That would likely be much better, performance-wise.

"Correct" is often in the eye of the beholder. It also depends how far or how brilliant you want your design to be. I'd never go for the design you describe, it'll become a maintenance nightmare to have the CRUD actions on the POCOs.
Your main issue is the absence of separation of concerns, i.e., the data object is also responsible for storing and retrieving (actions that need to be defined only once in the whole system). As a result, you end up with duplicated, bloated and unmaintainable code that may quickly become really slow (try a LINQ query with a join on the getter).
A common scenario with databases is to use small entity classes that only contain the properties, nothing more. A DAO layer takes care of retrieving these POCOs and filling them with data from the database, and defines the CRUD actions only once (through some generics). I'd suggest NHibernate for the ORM mapping; the basic principle works with other ORM mappers too.
His reasons, especially number 1 (nearly 20 such properties), are the main argument for refactoring this into something more maintainable. Duplicated code and logic, when encountered, should be reconsidered strongly. If the getter above really is getting the data from the database (I hope I misunderstand that), get rid of it as quickly as you can.
Overly simplified example of separation of concerns:
using System;
using System.Collections.Generic;
class Data
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
}
class Dao<T>
{
    public void SaveEntity(T data)
    {
        // use reflection for saving your properties (this is what any ORM does for you)
        throw new NotImplementedException();
    }
    public IList<T> GetAll()
    {
        // use reflection to retrieve all data of this type (again, ORM does this for you)
        throw new NotImplementedException();
    }
}
// usage:
Dao<Data> myDao = new Dao<Data>();
IList<Data> allData = myDao.GetAll();
// modify, query etc. using the Dao; lazy evaluation and caching are done by the ORM for performance,
// but more importantly, this design keeps your code clean, readable and maintainable.
EDIT:
One question you should ask your co-worker: what happens if you have many Data objects (rows in the database), or when a property is the result of a joined query (foreign-key table)? Have a look at Fluent NHibernate if you want a smooth transition from one situation (unmaintainable) to the other (maintainable) that's easy enough for anybody to understand.

If I were you, I would write serialize/deserialize functions, then provide properties as lightweight wrappers around the in-memory results.
Take a look at the ISerializable interface: http://msdn.microsoft.com/en-us/library/system.runtime.serialization.iserializable.aspx
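The answer points to ISerializable; a simpler sketch of the same idea swaps in XmlSerializer (System.Xml.Serialization), which needs no ISerializable implementation at all (BinaryFormatter, the classic consumer of ISerializable, is obsolete in current .NET). The DataFile helper is hypothetical; it writes to a TextWriter so the same code can target a file (via a StreamWriter) or a string.

```csharp
using System.IO;
using System.Xml.Serialization;

// Plain in-memory object: properties are lightweight wrappers over fields.
public class Data
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
}

public static class DataFile
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Data));

    // Writes all properties in one go (the "SaveData" step).
    public static void Save(Data data, TextWriter writer) => Serializer.Serialize(writer, data);

    // Reads them back into a new instance.
    public static Data Load(TextReader reader) => (Data)Serializer.Deserialize(reader);
}
```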

This would be very hard to work with.
If you set Prop1 and then get Prop1, you could end up with different results,
e.g.:
//set Prop1 to "abc"
d.Prop1 = "abc";
//if the data source holds "xyz" for Prop1
string myString = d.Prop1;
//myString will equal "xyz"
Reading the code without the comments, you would expect myString to equal "abc", not "xyz", which could be confusing.
This would make working with the properties very difficult and would require a save every time you change a property for it to work.

As well as agreeing with what everyone else has said about this example: what happens if there are other fields in the Data class, i.e. Prop2, Prop3, etc.? Do they all go back to the database each time they are accessed, in order to "return the value stored in the database via a query"? Ten properties would mean 10 database hits; setting 10 properties, 10 writes to the database. That's not going to scale.

In my opinion, that's an awful design. Using a property getter to do some "magic" makes the system awkward to maintain. If I joined your team, how would I know about the magic behind those properties?
Create a separate method whose name says what it actually does.

Handling collection properties in a class and NHibernate entities

I was wondering what is the recommended way to expose a collection within a class and if it is any different from the way of doing that same thing when working with NHibernate entities.
Let me explain... I never had a specific problem with my classes exposing collection properties like:
IList<SomeObjType> MyProperty { get; set; }
Making the setter protected or private sometimes gives me a bit more control over how the collection is handled.
I recently came across this article by Davy Brion:
http://davybrion.com/blog/2009/10/stop-exposing-collections-already/
Davy clearly recommends exposing collections as IEnumerable<T> instead of, say, List<T>, so that users cannot directly manipulate the contents of those collections. I can understand his point, but I am not entirely convinced, and judging by the comments on his post, I am not the only one.
When it comes to NHibernate entities, though, it makes a lot of sense to hide the collections the way he proposes, especially when cascades are in place. I want complete control over an entity that is in the session and its collections, and exposing AddXxx and RemoveXxx methods for collection properties makes much more sense to me.
The problem is: how to do it?
If I expose the entity's collections as IEnumerables, I have no way of adding or removing elements without either converting them to lists with ToList(), which creates a new list so nothing can be persisted, or casting them to lists, which is a pain because of proxies and lazy loading.
The overall idea is to not allow an entity to be retrieved and have its collections manipulated (elements added/removed) directly, but only through the methods I expose, while honouring the cascades for collection persistence.
Your advice and ideas will be much appreciated.

How about...
private IList<string> _mappedProperty;
public IEnumerable<string> ExposedProperty
{
    get { return _mappedProperty.AsEnumerable<string>(); }
}
public void Add(string value)
{
    // Apply business rules, raise events, queue message, etc.
    _mappedProperty.Add(value);
}
This solution is possible if you use NHibernate to map to the private field, i.e. _mappedProperty. You can read more about how to do this in the NHibernate documentation on access and naming strategies.
In fact, I prefer to map all my classes like this. It's better that the developer decides how to define the public interface of the class, not the ORM.
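A sketch of the mapped-private-field approach, combined with the ReadOnlyCollection idea from the next answer. AsEnumerable() returns the same list instance, so a caller can still cast the exposed value back to IList<string> and modify it; wrapping the field in a ReadOnlyCollection<T> closes that gap, because the wrapper's Add/Remove throw NotSupportedException. The Order class and its members are hypothetical, and the NHibernate field-access mapping itself is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Order
{
    // The ORM would be mapped to this field (field access strategy).
    private IList<string> _lines = new List<string>();

    // Read-only view: casting it to IList<string> still succeeds,
    // but Add/Remove on the wrapper throw NotSupportedException.
    public IEnumerable<string> Lines => new ReadOnlyCollection<string>(_lines);

    // The only supported way to change the collection; business rules go here.
    public void AddLine(string line)
    {
        if (string.IsNullOrWhiteSpace(line))
            throw new ArgumentException("Line must not be empty.", nameof(line));
        _lines.Add(line);
    }

    public bool RemoveLine(string line) => _lines.Remove(line);
}
```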

How about exposing them as a ReadOnlyCollection?
// requires: using System.Collections.ObjectModel;
IList<SomeObjType> _mappedProperty;
public ReadOnlyCollection<SomeObjType> ExposedProperty
{
    get
    {
        return new ReadOnlyCollection<SomeObjType>(_mappedProperty);
    }
}

I am using NHibernate and I usually keep the collections as ISet and make the setter protected.
ISet<SomeObjType> MyProperty { get; protected set; }
I also provide AddXxx and RemoveXxx methods for collection properties where they are required. This has worked quite satisfactorily for me most of the time. But I will say that there have been cases where it made sense to allow client code to add items to the collection directly.
Basically, what I have seen is if I follow the principle of "Tell, Don't Ask" in my client code, without worrying too much about enforcing rigid access constraints on my Domain Object properties, then I always end up with a good design.
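A sketch of the ISet-with-protected-setter style plus AddXxx/RemoveXxx methods, using the BCL's System.Collections.Generic.ISet<T> and HashSet<T> (older NHibernate versions used Iesi.Collections' ISet instead). Blog and Post are hypothetical entities; members are virtual so NHibernate can proxy them. The Add/Remove methods keep both ends of the association consistent, which is the kind of rule the question wants to enforce.

```csharp
using System.Collections.Generic;

public class Blog
{
    public Blog() { Posts = new HashSet<Post>(); }

    // Protected setter: the ORM (or a subclass) can replace the set; client code cannot.
    public virtual ISet<Post> Posts { get; protected set; }

    // Keeps both sides of the association in sync.
    public virtual void AddPost(Post post)
    {
        if (Posts.Add(post)) post.Blog = this;
    }

    public virtual void RemovePost(Post post)
    {
        if (Posts.Remove(post)) post.Blog = null;
    }
}

public class Post
{
    public virtual string Title { get; set; }
    public virtual Blog Blog { get; set; }
}
```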

When to use Properties and Methods?

I'm new to the .NET world having come from C++ and I'm trying to better understand properties. I noticed in the .NET framework Microsoft uses properties all over the place. Is there an advantage for using properties rather than creating get/set methods? Is there a general guideline (as well as naming convention) for when one should use properties?

It is pure syntactic sugar: on the back end, it is compiled into plain get and set methods.
Use it because of convention, and because it looks nicer.
One guideline is that when an operation has a high risk of throwing exceptions or going wrong, you should use explicit get/set methods rather than properties. In practice, though, properties are often used even then.

Properties are get/set methods; simply, it formalises them into a single concept (for read and write), allowing (for example) metadata against the property, rather than individual members. For example:
[XmlAttribute("foo")]
public string Name {get;set;}
This is a get/set pair of methods, but the additional metadata applies to both. It also, IMO, simply makes it easier to use:
someObj.Name = "Fred"; // clearly a "set"
DateTime dob = someObj.DateOfBirth; // clearly a "get"
We haven't duplicated the fact that we're doing a get/set.
Another nice thing is that it allows simple two-way data-binding against the property ("Name" above), without relying on any magic patterns (except those guaranteed by the compiler).

There is an entire book dedicated to answering these sorts of questions: Framework Design Guidelines from Addison-Wesley. See section 5.1.3 for advice on when to choose a property vs a method.
Much of the content of this book is available on MSDN as well, but I find it handy to have it on my desk.

Consider reading Choosing Between Properties and Methods. It has a lot of information on .NET design guidelines.

Properties are get/set methods.

Properties are set and get methods, as people here have explained, but the idea behind them is to make those methods the only ones that touch the private values (for instance, to handle validation).
All other logic should go through the properties, and it's always easier mentally to work with something you can treat as a value on the left and right side of operations (properties), without even having to think of it as a method.
I personally think that's the main idea behind properties.

I always think of properties as the nouns of a class, whereas methods are the verbs.

First of all, the naming convention: use PascalCase for the property name, just as with methods. Also, properties should not contain very complex operations; those should be kept in methods.
In OOP, you would describe an object as having attributes and functionality. You do that when designing a class. Consider designing a car. Examples for functionality could be the ability to move somewhere or activate the wipers. Within your class, these would be methods. An attribute would be the number of passengers within the car at a given moment. Without properties, you would have two ways to implement the attribute:
Make a variable public:
// class Car
public int passengerCount = 4;
// calling code
int count = myCar.passengerCount;
This has several problems. First, it is not really an attribute of the vehicle: you have to update the value from inside the Car class for it to represent the vehicle's true state. Second, the variable is public and could also be written to.
The second variant is widely used, e.g. in Java, which does not have properties like C#:
Use a method to encapsulate the value and maybe perform a few operations first.
// class Car
public int GetPassengerCount()
{
    // perform some operation
    int result = CountAllPassengers();
    // return the result
    return result;
}
// calling code
int count = myCar.GetPassengerCount();
This way you manage to get around the problems with a public variable. By asking for the number of passengers, you can be sure to get the most recent result since you recount before answering. Also, you cannot change the value since the method does not allow it. The problem is, though, that you actually wanted the amount of passengers to be an attribute, not a function of your car.
The second approach is not necessarily wrong; it just does not read quite right. That's why some languages include ways of making attributes look like variables, even though they work like methods behind the scenes. ActionScript, for example, also includes syntax to define methods that are accessed like variables from the calling code.
Keep in mind that this also brings responsibility. The caller will expect it to behave like an attribute, not a function. So if just asking a car how many passengers it has takes 20 seconds, you should probably put that in a real method, since callers expect methods to take longer than accessing an attribute.
EDIT:
I almost forgot to mention this: properties also let you perform checks before a value is set. With a public variable, you could basically write anything into it; a setter method or property gives you a chance to check the value before actually saving it.
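The car example, with the check-before-set point, might be written as a property with a validating setter. This is a sketch; the Seats property and the range rule are assumptions added for illustration.

```csharp
using System;

public class Car
{
    private int _passengerCount;

    public Car(int seats) { Seats = seats; }

    // Read-only attribute fixed at construction.
    public int Seats { get; }

    // Reads like a variable, but the setter rejects invalid values
    // before they reach the backing field.
    public int PassengerCount
    {
        get { return _passengerCount; }
        set
        {
            if (value < 0 || value > Seats)
                throw new ArgumentOutOfRangeException(nameof(value),
                    "Passenger count must be between 0 and the number of seats.");
            _passengerCount = value;
        }
    }
}
```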

Properties simply save you some time from writing the boilerplate that goes along with get/set methods.
That being said, a lot of .NET infrastructure handles properties differently; for example, a grid will automatically display properties but won't display a method that does the equivalent.
This is handy, because you can make get/set methods for things that you don't want displayed, and properties for those you do want displayed.

The compiler actually emits get_MyProperty and set_MyProperty methods for each property you define.
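The emitted accessor methods can be observed through reflection. A small sketch: the Sample class and the AccessorNames helper are hypothetical; PropertyInfo.GetMethod and PropertyInfo.SetMethod return the compiler-generated accessors.

```csharp
using System;
using System.Reflection;

public class Sample
{
    public string MyProperty { get; set; }
}

public static class AccessorNames
{
    // Returns the names of the accessor methods behind a property (null if absent).
    public static (string Getter, string Setter) For(Type type, string propertyName)
    {
        PropertyInfo prop = type.GetProperty(propertyName);
        return (prop.GetMethod?.Name, prop.SetMethod?.Name);
    }
}
```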

Although it is not a hard and fast rule, and, as others have pointed out, properties are implemented as get/set pairs 'behind the scenes', properties typically surface encapsulated/protected state data, whereas methods (aka procedures or functions) do work and yield the result of that work.
As such, methods will often take arguments that they might merely consume, but they may also return them in an altered state, or produce a new object or value as a result of the work done.
Generally speaking, if you need a way of controlling access to data or state, properties let you implement that access in a defined, validatable and optimised way (allowing access restriction, range and error checking, creation of the backing store on demand, and a way of avoiding redundant set calls).
In contrast, methods transform state and give rise to new values internally and externally without necessarily repeatable results.
Certainly if you find yourself writing procedural or transformative code in a property, you are probably really writing a method.

Also note that properties are available via reflection. While methods are, too, properties represent "something interesting" about the object. If you are trying to display a grid of an object's properties (say, something like the Visual Studio form designer), you can use reflection to query the properties of a class, iterate through each property, and interrogate the object for its value.
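A sketch of that property-grid idea: enumerate an object's public instance properties via reflection and read each value. PropertyDumper is a hypothetical name. Methods are not listed because GetProperties only returns properties, which mirrors why a grid shows properties but not equivalent methods. The order returned by GetProperties is not guaranteed, so callers should not rely on it.

```csharp
using System.Collections.Generic;
using System.Reflection;

public static class PropertyDumper
{
    // Lists "Name = value" for every public, readable, non-indexed instance property.
    public static List<string> Describe(object obj)
    {
        var lines = new List<string>();
        foreach (PropertyInfo prop in obj.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip write-only properties and indexers, which need arguments to read.
            if (!prop.CanRead || prop.GetIndexParameters().Length > 0) continue;
            lines.Add($"{prop.Name} = {prop.GetValue(obj)}");
        }
        return lines;
    }
}
```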

Think of it this way: properties encapsulate your fields (commonly marked private) while allowing your fellow developers to set or get the field value. You can even perform routine validation in the property's set accessor if you wish.

Properties are not just syntactic sugar: they are important if you need to create object-relational mappings (LINQ to SQL or LINQ to Entities), because they behave just like variables while making it possible to hide the implementation details of the object-relational mapping (persistence). It is also possible to validate a value being assigned in the property's setter and protect it against unwanted values.
You can't do this with the same elegance with methods. I think it is best to demonstrate this with a practical example.
In one of his articles, Scott Gu creates classes that are mapped to the Northwind database using the "code first" approach. Here is a short example taken from Scott's blog (slightly modified; the full article can be read on Scott Gu's blog):
using System.ComponentModel.DataAnnotations;
using System.Data.Entity;
public class Product
{
    [Key]
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal? UnitPrice { get; set; }
    public bool Discontinued { get; set; }
    public virtual Category Category { get; set; }
}
// class Category omitted in this example
public class Northwind : DbContext
{
    public DbSet<Product> Products { get; set; }
    public DbSet<Category> Categories { get; set; }
}
You can use entity sets Products, Categories and the related classes Product and Category just as if they were normal objects containing variables: You can read and write them and they behave just like normal variables. But you can also use them in Linq queries, persist them (store them in the database and retrieve them).
Note also how easy it is to use annotations (C# attributes) to define the primary key (in this example ProductID is the primary key for Product).
While the properties are used to define a representation of the data stored in the database, there are some methods defined in the entity set class which control the persistence: For example, the method Remove() marks a given entity as deleted, while Add() adds a given entity, SaveChanges() makes the changes permanent. You can consider the methods as actions (i.e. you control what you want to do with the data).
Finally, here is an example of how naturally you can use those classes:
// instantiate the database as an object
var nw = new Northwind();
// select a product (Single requires using System.Linq)
var product = nw.Products.Single(p => p.ProductName == "Chai");
// 1. modify the price
product.UnitPrice = 2.33M;
// 2. store a new category
var c = new Category();
// assumes Category exposes CategoryName and Description (the class is omitted above)
c.CategoryName = "Example category";
c.Description = "Show how to persist data";
nw.Categories.Add(c);
// Save changes (1. and 2.) to the Northwind database
nw.SaveChanges();
