I would like to know what's the best code design when storing and retrieving data from a database when working with objects and classes.
I could do this in two ways.
In the class constructor I query the database, store all the info in instance variables inside the class, and retrieve them with getters/setters. This way I can always get any information I want, but in many cases I won't need all of the information all the time.
public class Group {
public int GroupID { get; private set; }
public string Name { get; private set; }
public Group(int groupID)
{
this.GroupID = groupID;
this.Name = // retrieve data from database
}
public string getName()
{
// this is just an example method, I know I can retrieve the name from the getter :)
return Name;
}
}
The other way is to create some methods and pass in the groupID as a parameter, and then query the database for the specific information I need. This could result in more queries, but I will only get the information I need.
public class Group {
public Group()
{
}
public string getName(int groupID)
{
// query database for the name based on groupID
return name;
}
}
What do you think is the best way to go? Is there a best practice to go with here or is it up to me what I think works the best?
You don't want to do heavy DB work in the constructor. Heavy work should be done in methods.
You also don't necessarily want to couple the DB work with the entity class that holds the data. What if you want a method to return two of those objects from the database? With GetGroups(), for example, you can't even construct one object without doing DB work. For anything that returns multiple objects, storage and retrieval have to be decoupled from the entity class.
Instead, decouple your DB work from your entity objects. One option is a data access layer with methods like GetFoo or GetFoos that query the database, populate the objects and return them.
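As a minimal sketch of that separation, the entity below only holds data while a repository does the retrieval. The names IGroupRepository and InMemoryGroupRepository are hypothetical, and the dictionary is a stand-in for a real database query:

```csharp
using System.Collections.Generic;
using System.Linq;

// Plain entity: holds data only and knows nothing about the database.
public class Group
{
    public int GroupID { get; }
    public string Name { get; }

    public Group(int groupID, string name)
    {
        GroupID = groupID;
        Name = name;
    }
}

// Hypothetical abstraction over storage, so callers do not depend on a concrete database.
public interface IGroupRepository
{
    Group GetGroup(int groupID);
    IReadOnlyList<Group> GetGroups();
}

// In-memory stand-in for an implementation that would run SQL queries.
public class InMemoryGroupRepository : IGroupRepository
{
    private readonly Dictionary<int, string> rows;

    public InMemoryGroupRepository(Dictionary<int, string> rows)
    {
        this.rows = rows;
    }

    // Returns null when no row exists for the id.
    public Group GetGroup(int groupID)
    {
        string name;
        return rows.TryGetValue(groupID, out name) ? new Group(groupID, name) : null;
    }

    // Builds every group in one pass, ordered by id.
    public IReadOnlyList<Group> GetGroups()
    {
        return rows.OrderBy(r => r.Key).Select(r => new Group(r.Key, r.Value)).ToList();
    }
}
```

Because Group takes its data through the constructor, GetGroups() can build many objects from a single query result without any per-object database work.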
If you use an ORM, see:
https://stackoverflow.com/questions/3505/what-are-your-favorite-net-object-relational-mappers-orm
Lazy loading versus early loading, which is what this really boils down to, is best determined by usage.
Mostly this means related entities. If you are dealing with an individual address, for instance, splitting the read for the city from the read for the state would be crazy; on the other hand, when returning a list of company employees, reading their address information is probably a waste of time and memory.
Also, these aren't mutually exclusive options -- you can have a constructor that calls the database, and a constructor that uses provided data.
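A rough sketch of offering both options on one class, using the framework's System.Lazy<T>. The Employee and Address types are assumptions made for illustration, and the loader delegate stands in for a database call:

```csharp
using System;

public class Address
{
    public string City { get; set; }
    public string State { get; set; }
}

public class Employee
{
    private readonly Lazy<Address> address;

    public string Name { get; }

    // Early loading: the caller already has the address, e.g. from a joined query.
    public Employee(string name, Address address)
    {
        Name = name;
        this.address = new Lazy<Address>(() => address);
    }

    // Lazy loading: the loader runs only on the first access to Address.
    public Employee(string name, Func<Address> addressLoader)
    {
        Name = name;
        address = new Lazy<Address>(addressLoader);
    }

    // Lazy<T> caches the value, so the loader runs at most once.
    public Address Address
    {
        get { return address.Value; }
    }
}
```

A list screen that never touches Address never pays for the read, while a detail screen can pass the address in up front.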
If it is a relational database, the best way would be to use an ORM (object-relational mapping) tool. See here for a list of ORM tools:
https://en.wikipedia.org/wiki/List_of_object-relational_mapping_software
Related
Consider if you will, the example of an Order class having a collection property of OrderLines.
public class Order
{
public OrderLineCollection OrderLines { get; private set; }
}
Now consider a Data Access Layer that returns a collection of Order objects without the OrderLines property populated (empty collection).
To minimize round trips to the server, the system passes the ids of all the Order objects to the DAL, which returns the OrderLine objects for every Order in one go. Code in the Business Rules Layer is responsible for adding the correct OrderLine objects to the correct Order objects.
public class OrderDAL
{
public IEnumerable<Order> GetOrdersByCustomer(int customerId)
{
...
}
public IEnumerable<OrderLine> GetOrderLines(IEnumerable<int> orderIds)
{
...
}
}
Is this the general way of doing this kind of thing (to reduce database round-trips)?
Should the DAL have the responsibility of returning fully populated Order objects?
Are there better ways?
And no, I cannot use an ORM tool in this particular instance!
I for one don't. I don't want to go back to the store to retrieve more data after an initial query. When loading the data, you (ought to) know for what environment you are loading it, so you will know beforehand which "navigational properties" or joins you want. This way you can get all the data you want with one query.
This is however from a stateless point of view, as I'm currently focusing on MVC and Entity Framework. I guess if you're creating an accounting program, you may have one Orders screen that displays order headers, and an Order Details screen where you want to display the details for the selected order. So in that case, yes, it can be useful to only have to retrieve the OrderLines for the selected order(s).
As usual, the answer is: it depends.
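If the two-call approach from the question is kept, the stitching step in the business layer could look roughly like this. It is only a sketch: the simplified Order and OrderLine types (a plain List instead of OrderLineCollection) and the OrderAssembler name are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

public class OrderLine
{
    public int OrderId { get; set; }
    public string Product { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public List<OrderLine> OrderLines { get; } = new List<OrderLine>();
}

public static class OrderAssembler
{
    // Attaches each line to its parent order using one lookup pass
    // instead of scanning the whole line list once per order.
    public static void AttachLines(IEnumerable<Order> orders, IEnumerable<OrderLine> lines)
    {
        ILookup<int, OrderLine> linesByOrder = lines.ToLookup(l => l.OrderId);
        foreach (Order order in orders)
        {
            // A lookup returns an empty sequence for ids that have no lines.
            order.OrderLines.AddRange(linesByOrder[order.Id]);
        }
    }
}
```

The orders should be a materialized list before they are passed in; with a deferred sequence, the lines would be attached to objects that are thrown away on the next enumeration.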
And no, I cannot use an ORM tool in this particular instance!
Why?
I have the need for both light-weight, and heavy-weight versions of an object in my application.
A light-weight object would contain only ID fields, but no instances of related classes.
A heavy-weight object would contain IDs, and instances of those classes.
Here is an example class (for purpose of discussion only):
public class OrderItem
{
// FK to Order table
public int OrderID;
public Order Order;
// FK to Product table
public int ProductID;
public Product Product;
// columns in OrderItem table
public int Quantity;
public decimal UnitCost;
// Loads an instance of the object without linking to objects it relates to.
// Order, Product will be NULL.
public static OrderItem LoadOrderItemLite()
{
var reader = // get from DB Query
var item = new OrderItem();
item.OrderID = reader.GetInt("OrderID");
item.ProductID = reader.GetInt("ProductID");
item.Quantity = reader.GetInt("Quantity");
item.UnitCost = reader.GetDecimal("UnitCost");
return item;
}
// Loads an instance of the object and links to all other objects.
// Order, Product objects will exist.
public static OrderItem LoadOrderItemFULL()
{
var item = LoadOrderItemLite();
item.Order = Order.LoadFULL(item.OrderID);
item.Product = Product.LoadFULL(item.ProductID);
return item;
}
}
Is there a good design pattern to follow to accomplish this?
I can see how it can be coded into a single class (as my example above), but it is not apparent in which way an instance is being used. I would need to have NULL checks throughout my code.
Edit:
This object model is being used on the client side of a client-server application. In the case where I'm using the light-weight objects, I don't want lazy loading because it would be a waste of time and memory (I will already have the objects in memory elsewhere on the client side).
Lazy initialization, Virtual Proxy and Ghost are three implementations of the lazy loading pattern. Basically, they all load properties only once you need them. I suppose you'll be using some repository to store the objects, so I'd encourage you to use any of the available ORM tools (Hibernate, Entity Framework and so on); they all implement this functionality for you.
Have you considered using an ORM tool like NHibernate for accessing DB? If you use something like NHibernate, you would get this behavior by means of lazy loading.
Most ORM tools do exactly what you are looking for within lazy loading - they first get the object identifiers, and upon accessing a method, they issue subsequent queries to load the related objects.
Sounds like you might have a need for a Data Transfer Object (DTO), just a "dumb" wrapper class that summarizes a business entity. I usually use something like that when I need to flatten out an object for display. Be careful, though: overuse results in an anti-pattern.
But rendering an object for display is different from limiting hits against the database. As Randolph points out, if your intention is the latter, then use one of the existing deferred loading patterns, or better yet, use an ORM.
Take a look at the registry pattern. You can use it to find objects and also to manage them better, for example by keeping them in a cache.
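One possible reading of that suggestion is sketched below: a small registry that hands out one shared instance per id and calls a loader only on a cache miss. The Registry<T> class and its loader delegate are hypothetical, and the sketch is not thread-safe:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: returns one shared instance per id and
// calls the loader only the first time an id is requested.
public class Registry<T> where T : class
{
    private readonly Dictionary<int, T> cache = new Dictionary<int, T>();
    private readonly Func<int, T> loader;

    public Registry(Func<int, T> loader)
    {
        this.loader = loader;
    }

    public T Get(int id)
    {
        T item;
        if (!cache.TryGetValue(id, out item))
        {
            // Cache miss: load once and remember the instance.
            item = loader(id);
            cache[id] = item;
        }
        return item;
    }
}
```

A light-weight OrderItem could then keep only ProductID and ask a Registry<Product> for the product when it needs one, which reuses objects that are already in memory on the client.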
I have a method called GetCustomer that returns a Customer object.
Customer object is as below.
public class Customer
{
public int Id { get; set;}
public string Name { get; set;}
public int CompanyId { get; set;}
}
Let's say Customer is related to a Company and I have to display this customer information in the UI. Now, when I call the GetCustomer method I only get back the details about the Customer. On the screen, I also need to display the name of the company that this customer belongs to.
One easy way to do this would be to have a property called CompanyName in the Customer object and fill it from the data layer method. But I'm not sure if that's best practice. If the customer also belongs to a department, I would then need DepartmentId and DeptName as properties too. The ids are a must, since sometimes I need to send the id to get the full Dept/Company object.
I don't want to have the full objects for Department and Company in the Customer object.
I assume this is a common scenario where a particular object is linked to others using an ID and when you bring back the main object, you just need to display the name from the other linked objects.
What is the best way to handle this? My main intention is to avoid one (or more) extra database calls.
I'm NOT using Linq or Entity Framework, just regular ADO.NET.
This depends on your goals:
1 - If you want to avoid extra DB calls, route all of the UI's data access through a single instance of your DB communicator: open the connection once, reset its members (adapter, command, etc.) after each DB call, and close the connection when the data transfers are done.
2 - Otherwise, use lazy loading. Put only the ids on your entity, and load the entity that an id refers to only when it is needed.
For example:
public class Customer
{
public int Id { get; set;}
public string Name { get; set;}
public int CompanyId { get; set;}
}
public class Company
{
public int CompanyId;
//company fields
}
// .. on your business layer if you need to use Company data:
// examine your Customer instance as "customer"
Company userCompany = GetCompanyWithId(customer.CompanyId);
But as you have no doubt guessed, which data to load depends on your needs. Think simple. If all you need is the Department and Company names, you can create a view in your DB and query it from your code. You can create an entity such as CustomerWithFullData that holds a Customer, a Department and so on, and fill it from the DB view when you need to show the full data. Or don't bother creating entities at all: if you don't need one, load the DB view directly into a DataSet and bind its tables. That way you hand the work of collecting the data over to the DB, which is what we want.
As I said before, think simple.
What I would do is to retain an OO structure at the level of your business objects, and modify your DAL to return all the information you need with a single DB round-trip.
For example, you could have a stored procedure that returns two result sets: one for the Customers, and another for the Companies they reference. You could use table valued parameters to pass the SP a list of one or more Customers to look-up.
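A minimal sketch of reading two result sets in one round-trip with plain ADO.NET interfaces. The result-set order and column positions are assumptions (companies first as CompanyId, Name; then customers as Id, Name, CompanyId), and CustomerReader is a hypothetical helper. In a real application the reader would come from SqlCommand.ExecuteReader():

```csharp
using System.Collections.Generic;
using System.Data;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
}

public static class CustomerReader
{
    // Assumes two result sets: (CompanyId, Name) rows first,
    // then (Id, Name, CompanyId) rows for the customers.
    public static List<Customer> ReadCustomers(IDataReader reader)
    {
        var companyNames = new Dictionary<int, string>();
        while (reader.Read())
        {
            companyNames[reader.GetInt32(0)] = reader.GetString(1);
        }

        var customers = new List<Customer>();
        // NextResult() advances to the second result set, if there is one.
        if (reader.NextResult())
        {
            while (reader.Read())
            {
                var customer = new Customer
                {
                    Id = reader.GetInt32(0),
                    Name = reader.GetString(1),
                    CompanyId = reader.GetInt32(2)
                };
                // CompanyName stays null when the company row is missing.
                string companyName;
                companyNames.TryGetValue(customer.CompanyId, out companyName);
                customer.CompanyName = companyName;
                customers.Add(customer);
            }
        }
        return customers;
    }
}
```

Because the method depends only on IDataReader, it can be exercised without a database, for example with the reader returned by DataSet.CreateDataReader().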
Depending on the size of your DB and your performance requirements, you also might be able to read the entire Customer and Company tables into memory when the app starts, cache the results, and manage insert/update/deletes against that data.
I'm reading through Pro ASP.NET MVC 3 Framework, which just came out, and am a bit confused about how to handle the retrieval of aggregate objects from a data store. The book uses Entity Framework, but I am considering using a mini-ORM (Dapper or PetaPoco). As an example, the book uses the following objects:
public class Member {
public string name { get; set; }
}
public class Item {
public int id { get; set; }
public List<Bid> bids { get; set; }
}
public class Bid {
public int id { get; set; }
public Member member { get; set; }
public decimal amount { get; set; }
}
As far as I've read, the book just mentions the concept of aggregates and moves on. So I am assuming you would then implement some basic repository methods, such as:
List<Item> GetAllItems()
List<Bid> GetBidsById(int id)
Member GetMemberById(int id)
Then, if you wanted to show a list of all items, their bids, and the bidding member, you'd have something like
List<Item> items = Repository.GetAllItems();
foreach (Item i in items) {
// one query per item
i.bids = Repository.GetBidsById(i.id);
foreach (Bid b in i.bids) {
// one more query per bid
b.member = Repository.GetMemberById(b.id);
}
}
If this is correct, isn't this awfully inefficient, since you could potentially issue thousands of queries in a few seconds? In my non-ORM thinking mind, I would have written a query like
SELECT
item.id,
bid.id,
bid.amount,
member.name
FROM
item
INNER JOIN bid
ON item.id = bid.itemId
INNER JOIN member
ON bid.memberId = member.id
and stuck it in a DataTable. I know it's not pretty, but one large query versus a few dozen little ones seems a better alternative.
If this is not correct, then can someone please enlighten me as to the proper way of handling aggregate retrieval?
If you use Entity Framework for your Data Access Layer, read the Item entity and use the .Include() fluent method to bring the Bids and Members along for the ride.
An aggregate is a collection of related data. The aggregate root is the logical entry point of that data. In your example, the aggregate root is an Item with Bid data. You could also look at the Member as an aggregate root with Bid data.
You may use your data access layer to retrieve the object graph of each aggregate and then transform the data for use in the view. You may even make sure you eagerly fetch all of the data from the children. It is possible to transform the data using a tool like AutoMapper.
However, I believe that it is better to use your data access layer to project the domain objects into the data structure you need for the view, whether it be ORM or DataSet. Again, to use your example, would you actually retrieve the entire object graph suggested? Do I need all items including their bids and members? Or do I need a list of items, number of bids, plus member name and amount for the current winning bid? When I need more data about a particular item, I can go retrieve that when the request is made.
In short, your intuition was spot-on that it is inefficient to retrieve all that data, when a projection would suffice. I would just urge you to limit the projection even further and retrieve only the data you require for the current view.
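To make the projection idea concrete, here is a small in-memory sketch that reduces bid rows to one summary per item. BidRow and ItemSummary are hypothetical types, and the highest amount is assumed to be the winning bid. With an ORM or raw SQL the same shape would normally be produced by the query itself rather than in memory:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat input: one entry per bid.
public class BidRow
{
    public int ItemId { get; set; }
    public decimal Amount { get; set; }
    public string MemberName { get; set; }
}

// Hypothetical view model holding only what a list screen needs.
public class ItemSummary
{
    public int ItemId { get; set; }
    public int BidCount { get; set; }
    public decimal WinningAmount { get; set; }
    public string WinningMember { get; set; }
}

public static class ItemSummaries
{
    public static List<ItemSummary> Summarize(IEnumerable<BidRow> bids)
    {
        return bids
            .GroupBy(b => b.ItemId)
            .Select(g =>
            {
                // The highest bid in the group is treated as the current winner.
                BidRow top = g.OrderByDescending(b => b.Amount).First();
                return new ItemSummary
                {
                    ItemId = g.Key,
                    BidCount = g.Count(),
                    WinningAmount = top.Amount,
                    WinningMember = top.MemberName
                };
            })
            .OrderBy(s => s.ItemId)
            .ToList();
    }
}
```

Items without any bids produce no summary here, since the input contains only bid rows.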
This would be handled in different ways depending on your data access strategy. If you were using NHibernate or Entity Framework, you could have the ORM populate these properties for you eagerly, lazy load them, and so on. Entity Framework calls them "Navigation Properties"; I'm not sure that NHibernate has a specific name for these "child properties" or "child collections".
In old-school ADO.NET, you might do something like create a stored procedure that returns multiple result sets (one for the main object and other result sets for your child collections or related objects), which would let you avoid calling the database multiple times. You could then iterate over the results sets and hydrate your object with all its relationships with one database call, and inside of a single repository method.
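A single joined query, like the one in the question, is another way to get everything in one call, but it returns flat rows that still have to be regrouped into the aggregate. A rough sketch of that hydration step follows; the ItemBidRow type and ItemAssembler name are hypothetical, and the entity classes mirror the ones from the question:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Member
{
    public string name { get; set; }
}

public class Bid
{
    public int id { get; set; }
    public Member member { get; set; }
    public decimal amount { get; set; }
}

public class Item
{
    public int id { get; set; }
    public List<Bid> bids { get; set; }
}

// Hypothetical shape of one joined row: item.id, bid.id, bid.amount, member.name.
public class ItemBidRow
{
    public int ItemId { get; set; }
    public int BidId { get; set; }
    public decimal Amount { get; set; }
    public string MemberName { get; set; }
}

public static class ItemAssembler
{
    // Groups the flat rows by item and rebuilds the Item -> Bid -> Member graph.
    public static List<Item> Build(IEnumerable<ItemBidRow> rows)
    {
        return rows
            .GroupBy(r => r.ItemId)
            .Select(g => new Item
            {
                id = g.Key,
                bids = g.Select(r => new Bid
                {
                    id = r.BidId,
                    amount = r.Amount,
                    member = new Member { name = r.MemberName }
                }).ToList()
            })
            .ToList();
    }
}
```

Note that an INNER JOIN omits items that have no bids, and this sketch creates a separate Member object per bid even when the same member bids twice.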
Wherever in your system you do the data retrieval, you would program your ORM of choice to do an eager fetch of the related objects (aggregates).
Which data access method to use depends on your project.
Convenience vs performance.
Using EF or Linq to SQL really boosts coding speed. When performance matters, you really should care about every SQL statement you send to the database.
No ORM can do both.
You can treat the read (query) and the write (command) side of the model separately.
When you want to mutate the state of your Aggregate, you load the Aggregate Root (AR) via a repository, mutate its state using the intention-revealing public methods on the AR, then save the AR back through the repository.
On the read side, however, you can be as flexible as you want. I don't know Entity Framework, but with NHibernate you could use the QueryOver API to generate flexible queries that populate DTOs designed to be consumed by the client, whether that is a service or a view. If you want more performance you could go with Dapper. You could even use stored procedures whose results are projected into a DTO, so that you can be as efficient in the DB layer as possible.
I'm currently reading the book Pro Asp.Net MVC Framework. In the book, the author suggests using a repository pattern similar to the following.
[Table(Name = "Products")]
public class Product
{
[Column(IsPrimaryKey = true,
IsDbGenerated = true,
AutoSync = AutoSync.OnInsert)]
public int ProductId { get; set; }
[Column] public string Name { get; set; }
[Column] public string Description { get; set; }
[Column] public decimal Price { get; set; }
[Column] public string Category { get; set; }
}
public interface IProductsRepository
{
IQueryable<Product> Products { get; }
}
public class SqlProductsRepository : IProductsRepository
{
private Table<Product> productsTable;
public SqlProductsRepository(string connectionString)
{
productsTable = new DataContext(connectionString).GetTable<Product>();
}
public IQueryable<Product> Products
{
get { return productsTable; }
}
}
Data is then accessed in the following manner:
public ViewResult List(string category)
{
var productsInCategory = (category == null) ? productsRepository.Products : productsRepository.Products.Where(p => p.Category == category);
return View(productsInCategory);
}
Is this an efficient means of accessing data? Is the entire table going to be retrieved from the database and filtered in memory or is the chained Where() method going to cause some LINQ magic to create an optimized query based on the lambda?
Finally, what other implementations of the Repository pattern in C# might provide better performance when hooked up via LINQ-to-SQL?
I can understand Johannes' desire to control the execution of the SQL more tightly, and with the implementation of what I sometimes call 'lazy anchor points' I have been able to do that in my app.
I use a combination of custom LazyList<T> and LazyItem<T> classes that encapsulate lazy initialization:
LazyList<T> wraps the IQueryable functionality of an IList collection but makes the most of LinqToSql's Deferred Execution, and
LazyItem<T> wraps a lazy invocation of a single item, using either the LinqToSql IQueryable or a generic Func<T> method for executing other deferred code.
Here is an example. I have this model object, Announcement, which may have an attached image or PDF document:
public class Announcement : //..
{
public int ID { get; set; }
public string Title { get; set; }
public AnnouncementCategory Category { get; set; }
public string Body { get; set; }
public LazyItem<Image> Image { get; set; }
public LazyItem<PdfDoc> PdfDoc { get; set; }
}
The Image and PdfDoc classes inherit from a type File that contains the byte[] holding the binary data. This binary data is heavy, and I might not need it returned from the DB every time I want an Announcement. So I want to keep my object graph 'anchored' but not 'populated' (if you like).
So if I do something like this:
Console.WriteLine(anAnnouncement.Title);
..I can do so knowing that I have only loaded from the db the data for the immediate Announcement object. But if on the following line I need to do this:
Console.WriteLine(anAnnouncement.Image.Inner.Width);
..I can be sure that the LazyItem<T> knows how to go and get the rest of the data.
Another great benefit is that these 'lazy' classes can hide the particular implementation of the underlying repository, so I don't necessarily have to be using LinqToSql. I am using LinqToSql in the app I'm cutting examples from, but it would be easy to plug in another data source (or even a completely different data layer that perhaps does not use the Repository pattern).
LINQ but not LinqToSql
You will find that sometimes you want to do some fancy LINQ query that happens to barf when the execution flows down to the LinqToSql provider. That is because LinqToSql works by translating the effective LINQ query logic into T-SQL code, and that is not always possible.
For example, I have this function that I want an IQueryable result from:
private IQueryable<Event> GetLatestSortedEvents()
{
// TODO: WARNING: HEAVY SQL QUERY! fix
return this.GetSortedEvents().ToList()
.Where(ModelExtensions.Event.IsUpcomingEvent())
.AsQueryable();
}
Why that code does not translate to SQL is not important; just believe me that the conditions in that IsUpcomingEvent() predicate involve a number of DateTime comparisons that are simply far too complicated for LinqToSql to convert to T-SQL.
By calling .ToList(), then the .Where(..) condition, and then .AsQueryable(), I'm effectively telling LinqToSql that I need all of the .GetSortedEvents() items even though I'm then going to filter them. This is an instance where my filter expression will not render to SQL correctly, so I need to filter in memory. This is what I might call the limit of LinqToSql's performance as far as Deferred Execution and lazy loading go, but I only have a small number of these WARNING: HEAVY SQL QUERY! blocks in my app and I think further smart refactoring could eliminate them completely.
Finally, LinqToSql can make a fine data access provider in large apps if you want it to. I found that to get the results I want, and to abstract away and isolate certain things, I've needed to add code here and there. And where I want more control over the actual SQL performance from LinqToSql, I've added smarts to get the desired results. So IMHO LinqToSql is perfectly OK for heavy apps that need db query optimization, provided you understand how LinqToSql works. My design was originally based on Rob's Storefront tutorial, so you might find it useful if you need more explanation about my rants above.
And if you want to use those lazy classes above, you can get them here and here.
Is this an efficient means of accessing data? Is the entire table going to be retrieved from the database and filtered in memory or is the chained Where() method going to cause some LINQ magic to create an optimized query based on the lambda?
It is efficient, if you wish to say so. The Repository exposes an IQueryable interface, which basically represents any LINQ Data Provider (in this case Linq2Sql).
Queries are executed the moment you start iterating over the result.
IQueryable therefore supports query composition. You can add any .Where() or .GroupBy() or .OrderBy() call to a query and it will be satisfied by the database.
If you put an enumerating call such as .ToList() in your query, everything after it will happen in memory (LinqToObjects).
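The timing difference can be seen even without a database. The sketch below uses AsQueryable() over an in-memory list, so it runs on LINQ to Objects and says nothing about SQL translation; it only illustrates that a composed query executes when it is enumerated, while .ToList() takes a snapshot immediately. DeferredExecutionDemo is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns { size of the early snapshot, size of the deferred query result }.
    public static int[] Run()
    {
        var prices = new List<int> { 5, 20, 30 };

        // Composing the query runs nothing yet.
        IQueryable<int> expensive = prices.AsQueryable().Where(p => p > 10);

        // ToList() enumerates immediately and keeps the current results.
        List<int> snapshot = expensive.ToList();

        // Added after the query was composed, before it is enumerated again.
        prices.Add(40);

        // Count() enumerates the deferred query now, so it sees the new element.
        return new[] { snapshot.Count, expensive.Count() };
    }
}
```

Run() returns 2 and 3: the snapshot was fixed when .ToList() ran, while the deferred query picked up the element that was added later.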
But I think the repository implementation is useless. I want my repository to control query execution, which is impossible when exposing IQueryable.
Yes, linq2sql will generate magic to make it more efficient. It depends on you using the IQueryable interface. If you want to check, turn on SQL Profiler and you can see it generate the appropriate query.
I would recommend introducing a service layer to abstract away your dependency on linq2sql.
I've also read that book recently and this is the SQL generated when I ran the sample code:
SELECT [t1].[Category]
FROM ( SELECT DISTINCT [t0].[Category]
FROM [Products] AS [t0] ) AS [t1] ORDER BY [t1].[Category]
I don't think you can write anything more efficient given that database. However in most real databases your Categories would be in a separate table to keep things DRY.