I am having trouble figuring out the best way to refactor a very large C# class: specifically, how to pass shared properties/values from the large class into the extracted classes and have those modified values available back in the main class.
This class started out at around 1,000 lines and is very procedural: it calls methods and performs work in a specific sequence, persisting things to the database along the way. During the process there are multiple Lists of items that are worked on and shared between the methods. At the end of the process, a set of statistics is presented to the user; these statistics are calculated in various methods as the processing takes place. To give a rough outline: the process involves a lot of random selection, and at the end the user sees how many random items were chosen, how many invalid records were picked, how many items came from a particular sub-list, etc.
I have been reading Uncle Bob’s “Clean Code” and am trying to make sure as I refactor that each class does only 1 thing.
So I have been able to extract methods and classes to keep the file smaller (it is down to 450 lines now), but the problem I have now is that the broken-out classes require values from the main parent class to be passed to them and updated, and those values are then used by other methods and class methods as well.
I am torn as to which is the cleanest approach:
1) Should I create a bunch of private member variables in the main class to store the statistical values and Lists, call into the dependent classes' methods, receive back a complex result class, and then extract those values to populate/update the private member variables? (Lots of boilerplate code this way.)
OR
2) Is it better to create a DTO or some sort of container class that holds the Lists and statistical values, and pass it to the various class methods and child-class methods by reference in order to build up the values? In other words, I just pass this container around, and since it is a reference type the other classes and methods can manipulate its values directly. At the end of the process that DTO/container/whatever you want to call it holds all of the final results, and I can simply read them from it (in which case there is really no need to extract them back into the main class's private member variables).
The latter is the way I have it now, but I feel that this is a code smell: it all works, but it just seems "fragile". I know large classes are not great, but at least with everything in one large file it seems clearer which properties I am updating.
-- UPDATE --
Some more info:
Unfortunately I can't post any of the actual code as it is proprietary; I will try to come up with a dummy example and paste it in if I get some time. One of the comments below mentioned refactoring the code into steps, and that is exactly what I've done. The purpose of the class is ultimately one thing: to create a random list of things. So in the only public method that gets called on this class, I have refactored everything down to one level of abstraction per "step". Each step, whether it is a method in the same class or is broken out into a helper class for the sub-steps, still requires access to the lists that get built up during the process and to the simple counter variables that keep track of the statistics.
-- UPDATE --
Here is an attempt at showing something similar in code:
public class RandomList
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int NumOfInvalidItems { get; set; }
    public int NumOfItemsChosen { get; set; }
    public int NumOfFirstChunkItems { get; set; }
    public int NumOfSecondChunkItems { get; set; }
    public ICollection<RandomListItem> Items { get; set; }
}
public class CreateRandomListService
{
    private readonly IUnitOfWork _unitOfWork;
    private readonly ICreateRandomListValidator _createRandomListValidator;
    private readonly IRandomFirstChunkService _randomFirstChunkService;
    private readonly IRandomSecondChunkService _randomSecondChunkService;
    private RandomList _randomList;
    private CreateRandomListValues _createValues;

    public CreateRandomListService(IUnitOfWork unitOfWork,
        ICreateRandomListValidator createRandomListValidator,
        IRandomFirstChunkFactory randomFirstChunkFactory,
        IRandomSecondChunkFactory randomSecondChunkFactory)
    {
        _unitOfWork = unitOfWork;
        _createRandomListValidator = createRandomListValidator;
        _randomFirstChunkService = randomFirstChunkFactory.Create(_unitOfWork);
        _randomSecondChunkService = randomSecondChunkFactory.Create(_unitOfWork);
    }

    public CreateResult CreateRandomList(CreateRandomListValues createValues)
    {
        // validate the passed-in model before proceeding
        if (!_createRandomListValidator.Validate(createValues))
            return new CreateResult { HasErrors = true };

        InitializeValues(createValues); // fetch settings from db etc. and build up
        ProcessFirstChunk();
        ProcessSecondChunk();
        SaveWithStatistics();

        return new CreateResult { Id = _randomList.Id };
    }

    private void InitializeValues(CreateRandomListValues createValues)
    {
        _createValues = createValues;
        _createValues.ImportantSetting = _unitOfWork.SettingsRepository.GetImportantSetting();
        // etc.
        _randomList = new RandomList
        {
            // set initial properties etc.; some come from the passed-in createValues, some from the db
        };
    }

    private void ProcessFirstChunk()
    {
        _randomFirstChunkService.GetRandomFirstChunk(_createValues);
    }

    private void ProcessSecondChunk()
    {
        _randomSecondChunkService.GetRandomSecondChunk(_createValues);
    }

    private void SaveWithStatistics()
    {
        _randomList.Items = _createValues.ListOfItems;
        _randomList.NumOfInvalidItems = _createValues.NumOfInvalidItems;
        _randomList.NumOfItemsChosen = _createValues.NumOfItemsChosen;
        _randomList.NumOfFirstChunkItems = _createValues.NumOfFirstChunkItems;
        _randomList.NumOfSecondChunkItems = _createValues.NumOfSecondChunkItems;
        _unitOfWork.RandomThingRepository.Add(_randomList);
        _unitOfWork.Save();
    }
}
public class RandomFirstChunkService
{
    private readonly IUnitOfWork _unitOfWork;

    public RandomFirstChunkService(IUnitOfWork unitOfWork)
    {
        _unitOfWork = unitOfWork;
    }

    public void GetRandomFirstChunk(CreateRandomListValues createValues)
    {
        // do processing here - build up the list collection and keep track of counts
        CallMethodThatUpdatesList(createValues);
        // how to return this to the calling class? currently just updating values in createValues by reference
        // could also return a complex class here and extract the values back into the main class' member variables
    }

    private void CallMethodThatUpdatesList(CreateRandomListValues createValues)
    {
        // do work
    }
}
The brutal answer is that it depends, of course. It is hard to work out an answer without reading the code, but I would say that once you have created new classes (each with one purpose), those classes and their interfaces should define what data objects you need to pass around to solve your problem. In that light it is strange for a method to return the same type that was passed into it, and I also think that manipulating one object through a series of methods is fragile. Imagine if each of your classes were a REST service; what would those interfaces look like?
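To make that concrete, here is a minimal sketch (the result type and its members are hypothetical, building on the names in the question) of a step that takes its input and returns its own result type instead of mutating the object it was handed:

public class FirstChunkResult
{
    // Hypothetical result type owned by the first-chunk step.
    public IReadOnlyList<RandomListItem> Items { get; private set; }
    public int NumOfInvalidItems { get; private set; }

    public FirstChunkResult(IReadOnlyList<RandomListItem> items, int numOfInvalidItems)
    {
        Items = items;
        NumOfInvalidItems = numOfInvalidItems;
    }
}

public interface IRandomFirstChunkService
{
    // Input goes in, a step-specific result comes out; nothing is updated by reference.
    FirstChunkResult GetRandomFirstChunk(CreateRandomListValues createValues);
}

The calling class then merges each step's result into its own state, so it always stays obvious where every list and statistic comes from.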
I wouldn't "pass stuff around". Nor would I break it up into separate classes just because its 1000 lines. You'll end up making it much messier and much more of a maintenance headache.
You didn't post your code (duh), so it's hard to critique it. If you really go over it, I suspect you might find duplicate code in there that can be refactored into methods, etc.
If you've already gotten rid of the duplicate code, I'd next pull out all the database stuff into a DAL layer.
If you really want to make it smaller (based on what little info you provided), I'd next refactor it into "steps" and make a workflow type parent container class.
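For illustration only, that kind of workflow could look roughly like this (all names are invented, and RandomListItem is assumed from the question): each step receives the shared context explicitly, and the parent class is nothing more than the sequence.

using System.Collections.Generic;

// Hypothetical shared context for one run of the workflow.
public class RandomListContext
{
    public RandomListContext()
    {
        Items = new List<RandomListItem>();
    }

    public List<RandomListItem> Items { get; private set; }
    public int NumOfInvalidItems { get; set; }
}

public interface IRandomListStep
{
    void Execute(RandomListContext context);
}

// The "workflow type parent container": it only knows the order of the steps.
public class RandomListWorkflow
{
    private readonly IList<IRandomListStep> steps;

    public RandomListWorkflow(IList<IRandomListStep> steps)
    {
        this.steps = steps;
    }

    public RandomListContext Run()
    {
        var context = new RandomListContext();
        foreach (var step in steps)
        {
            step.Execute(context); // each step reads and updates the shared context
        }
        return context;
    }
}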
Again, hard to say without knowing the code.
I don't know how exactly you have managed to refactor the class this far, but from your explanation it sounds to me like the "statistic" is the concept that should become an object, something like:
interface IStatistic<TOutput>
{
    IEnumerable<TOutput> Calculate(IEnumerable<input-type> input);
}
When you wish to display some statistic, you just use the appropriate statistic:
return new MySpecial().Calculate(myData);
In case the statistic objects are not that easy to construct, i.e. they ask for some parameters and so on, then you may supply a Func delegate which creates them:
void DoSomething(Func<IStatistic<string>> factory)
{
    string[] inputData = ...
    foreach (string line in factory().Calculate(inputData))
    {
        // do something...
    }
}
As you are mentioning multiple lists, I suppose that input-type would actually be a couple of input types. If that is so, then it might really make sense to supply a kind of DTO to just hold the lists:
class RawData
{
    public IEnumerable<type1> Data1 { get; }
    public IEnumerable<type2> Data2 { get; }
    ...
}
Observe, however, that this is not a DTO "by the book". First, it is immutable: only getters are there. Second, it only exposes sequences (IEnumerable) rather than raw lists. Both measures are taken to prevent the statistic objects from manipulating the data.
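One way to get that immutability in practice is to copy the incoming lists in the constructor and expose them only as sequences. A sketch, with string and int standing in for the real element types:

using System.Collections.Generic;
using System.Linq;

class RawData
{
    private readonly List<string> data1;
    private readonly List<int> data2;

    public RawData(IEnumerable<string> data1, IEnumerable<int> data2)
    {
        // Copy what we are given, so later changes to the source lists
        // cannot leak into the statistics.
        this.data1 = data1.ToList();
        this.data2 = data2.ToList();
    }

    public IEnumerable<string> Data1 { get { return data1; } }
    public IEnumerable<int> Data2 { get { return data2; } }
}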
I'm using C# 4.0 and ASP.NET. I have a problem regarding the proper construction of a readonly structure within a custom cache I created.
Details (summary):
My CacheManager class (a singleton) takes an instance of the existing MemoryCache class as a parameter and wraps it with a few helpful methods to deal with supplementary concerns such as object life cycle within my custom cache.
That manager deals with a simple CachableObject that takes three variables:
object
DateTime
int (duration)
In summary, my custom cache Manager stores objects for a limited amount of time in order to protect my database from frequent big queries.
Lately, I tried to:
Get an object back from the cache (i.e. stored under the key -MyList)
Cast it back to a list of complex objects
Translate the content of some properties for each complex object
Store the freshly translated list of objects in the cache again (under another key, -MyTranslatedList)
The problem:
During my testing, it appeared that both lists stored in the cache (the raw one and the translated one) were referring to the same underlying objects. Therefore, once translated, those objects were actually translated in both lists.
Since each list only has references to the objects, that's a perfectly normal behavior and a silly mistake from me.
The question:
As you can easily guess by now, I would like to protect myself and the other users of my singleton from that kind of mistake.
I would like to insert (or store or get) any kind of object (or list of complex objects) so that it cannot be altered by anybody getting it through the cache. I would like the data within my cache to be readonly (and deeply readonly) to avoid having that kind of problem. I want anybody to have to create a deep copy (or, even better, to get one) before starting to use the data stored within the cache.
What I tried so far:
I tried to make the object readonly. It didn't work as expected.
Since I'm often storing lists of complex objects, I've found the AsReadOnly method that returns an IReadOnlyCollection, but while this prevents me from altering the list (add, remove), it doesn't protect the objects that are within the list.
I hope my explanation is somewhat understandable :) Is there a neat way of dealing with that kind of situation?
I would create a class where the properties are readonly:
class ReadonlyClass
{
    private string p1;
    private int p2;

    public ReadonlyClass(string property1, int property2)
    {
        p1 = property1;
        p2 = property2;
    }

    public string Property1
    {
        get { return p1; }
    }

    public int Property2
    {
        get { return p2; }
    }
}
If the properties are objects/other classes, you should implement a clone function that returns a copy of the object. The clone function for the above class would look like this:
public ReadonlyClass clone()
{
    return new ReadonlyClass(p1, p2);
}
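To tie that back to the cache itself, here is a sketch (the interface and helper are my own invention, not part of MemoryCache or your CacheManager) of handing out deep copies so callers never touch the cached instances:

using System.Collections.Generic;
using System.Linq;

// Items that know how to produce an independent copy of themselves.
public interface IDeepCloneable<T>
{
    T DeepClone();
}

public static class CacheCopyHelper
{
    // Returns deep copies of a cached list, so callers can never mutate
    // the objects that actually live inside the cache.
    public static List<T> CopyAll<T>(IEnumerable<T> cachedItems) where T : IDeepCloneable<T>
    {
        return cachedItems.Select(item => item.DeepClone()).ToList();
    }
}

Your CacheManager's get method could run any cached list through CopyAll before returning it.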
Best regards
Hans Milling...
I'm modifying an app for performance gains. The app has a class with many properties. Typically this class is populated in its entirety by a primary key that pulls a large query from a database. The application is slow in great part because this happens constantly throughout the code, even though much of the time only one or two properties of the class are needed in a given section. The existing large class has only a default constructor, and all of its properties are nullable or have default values. In the code below, ignore the lack of constructors and how these objects are populated.
public class Contract
{
    public enum ContractStatus
    {
        Draft, Active, Inactive
    }

    private Int32 contractId = DALC.DefaultInt32;
    private String name = DALC.DefaultString;
    private ContractStatus status;
    private ContractType contractType = null;
    private CurrencyType currencyType = null;
    private Company company = null;
}
As you can see it has its own properties, and also references other classes (e.g. ContractType, Company).
A few approaches I've thought of in light of common design patterns:
1) Re-factor this hugely and break those smaller sub-sections up into their own classes with their own properties, then reconstruct the large class from all of the smaller ones when it is needed. This will be quite laborious, though, even if it sounds ideal and consistent with SOLID OO design principles.
2) Create new classes that simply contain the large class but only expose one or two of its properties. I'm still creating a full-blown instance of the original large class, but I will only populate the data I need, via a simple DB query, so the bulk of the class will sit there unused and the null-defaulted classes it references will never be constructed.
public class ContractName
{
    Contract contract;

    public ContractName()
    {
        contract = new Contract();
    }

    public String Name
    {
        get { return contract.Name; }
        set { contract.Name = value; }
    }
}
3) Add new constructors to the existing large class with a parameter indicating which chunks of data I actually want to populate. This sounds messy and kind of nasty and wrong, and it would leave me in a scenario where a Contract created by contract ID in one section of code has different info than one created by contract ID elsewhere.
Thanks for any ideas!
I would recommend option 1: extract the classes you think you need to extract now. The other two options are just adding more technical debt which will take even longer to resolve in the future. Well-factored code is usually much easier to optimise than big complicated classes.
In my experience, breaking up classes is not all that laborious. In fact I usually find myself surprised by how quickly I can execute refactorings like Extract Class as long as I follow the recipe.
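As a rough sketch of what option 1 can end up looking like (the names and groupings here are invented, and the existing ContractType, CurrencyType, Company and ContractStatus types from the question are assumed), the big class becomes a composition of small, separately loadable pieces:

// Small pieces that can be loaded by narrow queries.
public class ContractIdentity
{
    public int ContractId { get; set; }
    public string Name { get; set; }
}

public class ContractFinancials
{
    public ContractType Type { get; set; }
    public CurrencyType Currency { get; set; }
}

// The full contract is only assembled when a section of code really needs all of it.
public class Contract
{
    public ContractIdentity Identity { get; set; }
    public ContractFinancials Financials { get; set; }
    public ContractStatus Status { get; set; }
    public Company Company { get; set; }
}

Code that only needs the name runs a query that fills a ContractIdentity and never touches the rest.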
I have a design problem,
Basically, I have a class called Currency
public class Currency
{
    public int ID;
    public string Name;
    public int RoundingValue;

    public Currency() { }

    public void GetData() { /* Some SQL query code */ }
}
Sometimes it is necessary to fetch all the currencies in the system to make a decision concerning exchange rates, compatibility of payment, etc.
I see two ways of doing that (fetching data):
1) Make a static method inside the Currency class to do it. That involves creating a SQL connection instance inside it (not sure if that is the right thing to do), creating a List<Currency> instance to store the collection, and then passing it outside the class.
2) Create a collection class for Currency by extending the Collections.BaseCollection class, make an instance of it, do the same SQL query, and then return the result. But that class would provide no additional functionality, and probably never will (the same goes for Currency itself).
In other cases, I used extended collections, because they needed to store additional info, based on the contents of the collection.
But in this case, no additional info is created or functionality provided.
So, what design would be more practical?
If there is an alternative to these solutions, I would be more than happy to hear it.
I would suggest simply populating a List<Currency> then returning it as IList<Currency>. That way if you change it in future to use a custom collection, you won't break any consumers.
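A minimal sketch of that suggestion (the data access inside is left as a placeholder):

using System.Collections.Generic;

public static class CurrencyStore
{
    // Builds the list once per call and exposes it behind IList<Currency>,
    // so the concrete collection type can change later without breaking callers.
    public static IList<Currency> GetAll()
    {
        var currencies = new List<Currency>();
        // ...run the SQL query here and add one Currency per row...
        return currencies;
    }
}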
I've read a lot of detailed material throughout Stack Overflow, the Microsoft Developer Network, and a couple of blogs. The general consensus is that a constructor shouldn't contain large quantities of parameters. Encountering this got me thinking.
Initial problem: My application contains around fifteen variables that are constantly used throughout the application. The solution I came up with is to create a single class that injects the values into properties.
This seemed to work quite well; it made my life quite easy, as I could pass the object into another class through the constructor without having to hand all of these variables to each method. Except this led to another issue.
public class ServerParameters
{
    // Variables:
    private string template;
    private string sqlUsername;
    private string sqlPassword;
    private string sqlDatabase;
    private string sqlServer;

    public ServerParameters(string _template, string _sqlDatabase, string _sqlServer,
        string _sqlUsername, string _sqlPassword)
    {
        template = _template;
        sqlUsername = _sqlUsername;
        sqlPassword = _sqlPassword;
        sqlDatabase = _sqlDatabase;
        sqlServer = _sqlServer;
    }

    // Link the private strings to a group of Properties.
}
So already this constructor has become significantly bloated, and now I need to implement even more parameters.
Problem two: I have a bloated constructor, and I am implementing other items that don't entirely fit with this particular class. My solution was to create a subclass or container to hold these different classes while still being able to utilize them.
You now see the dilemma, which raises the all-important question: when you can only inherit once, how can you build a container that will hold all of these subclasses?
And why shouldn't you use so many parameters in a Constructor, why is it bad exactly?
Here is my thought on how to implement a container, but I feel like I'm doing it wrong, because I constantly get a NullReferenceException when I try to use some of these parameters.
public class VarContainer
{
    private ServerParameters server;
    private CustomerParameter customer;

    public VarContainer(ServerParameters _server, CustomerParameter _customer)
    {
        server = _server;
        customer = _customer;
    }
}
I'm assuming it is because the contained objects themselves aren't actually getting those assigned values, but I'm completely lost on the best approach to achieve my goal.
The main intent of "don't do work in your constructor" is to avoid side effects where you create an object and it does a significant amount of work that can impact global state unexpectedly or even take a long time to complete which may disrupt the caller's code flow.
In your case, you're just setting up parameter values, so this is not the intention of "don't do work"; this isn't really work. The design of your final container depends on your requirements: if you can accept a variable list of properties that are set on your class (or struct), then perhaps an object initializer when you construct the object is more appropriate.
Assuming that you want all of your properties from the get-go, and that you want the grouping you called out in the question, I would construct something similar to:
public class Properties
{
    public ServerProperties Server { get; private set; }
    public CustomerProperties Customer { get; private set; }

    public Properties(ServerProperties server, CustomerProperties customer)
    {
        Server = server;
        Customer = customer;
    }
}
I'm leaving the implementation of ServerProperties and CustomerProperties to you, but they follow the same implementation pattern.
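The object-initializer alternative mentioned above would look something like this, if ServerProperties is written with settable properties instead of constructor parameters (a sketch of that variant, not the private-setter version described earlier):

// Sketch: every property is settable, so the constructor takes no arguments.
public class ServerProperties
{
    public string Template { get; set; }
    public string SqlServer { get; set; }
    public string SqlDatabase { get; set; }
    public string SqlUsername { get; set; }
    public string SqlPassword { get; set; }
}

// Usage with an object initializer:
var server = new ServerProperties
{
    Template = "default",
    SqlServer = "localhost",
    SqlDatabase = "AppDb",
    SqlUsername = "appUser",
    SqlPassword = "secret"
};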
This is of course a matter of preference, but I always give my constructors all the parameters they need so that my objects have basic functionality. I don't think that 5 parameters is bloated, and adding a container just to pass parameters adds more bloat, in my opinion, than adding a few more parameters. By new bloat I mean that you will probably have a new file for it, with new classes and new imports; calling code has to write more using directives and link to the correct libraries, which need to be exported correctly as well.
Adding a wrapping class for the parameters masks the real problem, which is that your class might be too complicated; it does not solve it and generally aggravates it.
You can have any amount of parameters you want in a constructor. It's just that if you have too many (how many is too much? that's really subjective), it gets harder and harder to make a new instance of that class.
For example, suppose you have a class with 30 members. 27 of them can be null. If you force it to receive a value for each member in the constructor, you'll get code like this:
Foo bar = new Foo(p1, p2, p3, null, null, null, null, null, null /*...snip*/);
This is boring to write and not very readable, whereas a three-parameter constructor would do.
IMO, this is what you should receive in your constructors:
First, anything that your instance absolutely needs in order to work. Stuff that it needs to make sense. For example, database connection related classes might need connection strings.
After those mentioned above, you may have overloads that receive the things that are most useful. But don't exaggerate here.
Everything else, you let whoever is using your code set later, through property setters; a sketch of this split follows below.
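Here is that split, with an invented class just for the example: the one thing the object absolutely needs goes through the constructor, and the optional settings are plain properties with sensible defaults.

public class ReportExporter
{
    private readonly string connectionString;

    public ReportExporter(string connectionString)
    {
        this.connectionString = connectionString; // required for the class to work at all
        CommandTimeoutSeconds = 30;               // optional, with a default
        OutputFolder = ".";
    }

    public int CommandTimeoutSeconds { get; set; }
    public string OutputFolder { get; set; }
}

// Only the essential goes through the constructor; the rest is set where needed.
var exporter = new ReportExporter("Server=.;Database=Reports;Trusted_Connection=True")
{
    OutputFolder = @"C:\exports"
};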
It seems to me like you could use a dependency injection container such as Unity or Castle Windsor.
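For completeness, a minimal sketch of what that looks like with Unity (assuming the ServerParameters and CustomerParameter types from the question; with Castle Windsor the registration calls differ, but the idea is the same):

using Unity; // Microsoft.Practices.Unity in older versions

var container = new UnityContainer();

// Build the parameter objects once and register the instances.
container.RegisterInstance(new ServerParameters("default", "AppDb", "localhost", "appUser", "secret"));
container.RegisterInstance(new CustomerParameter(/* ...its own values... */));

// The container now supplies VarContainer's constructor arguments for you.
var vars = container.Resolve<VarContainer>();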
In the projects I have worked on, I have classes that query/update the database, like this one:
public class CompanyInfoManager
{
    public List<string> GetCompanyNames()
    {
        //Query database and return list of company names
    }
}
As I keep creating more and more classes of this sort, I realize that maybe I should make this type of class static. The obvious benefit is avoiding the need to create a class instance every time I need to query the database. But since there is only one copy of a static class, will this result in hundreds of requests contending for that single copy?
Thanks,
I would not make that class static; instead I would use dependency injection and pass the needed resources into the class. This way you can create a mock repository (that implements the IRepository interface) to test with. If you make the class static and don't pass in your repository, it is very difficult to test since you can't control what the static class is connecting to.
Note: The code below is a rough example and is only intended to convey the point, not necessarily compile and execute.
public interface IRepository
{
    DataSet ExecuteQuery(string aQuery);
    //Other methods to interact with the DB (such as update or insert) are defined here.
}
public class CompanyInfoManager
{
    private IRepository theRepository;

    public CompanyInfoManager(IRepository aRepository)
    {
        //A repository is required so that we always know what
        //we are talking to.
        theRepository = aRepository;
    }

    public List<string> GetCompanyNames()
    {
        //Query database and return list of company names
        string query = "SELECT * FROM COMPANIES";
        DataSet results = theRepository.ExecuteQuery(query);
        //Process the results...
        return listOfNames;
    }
}
To test CompanyInfoManager:
//Class to test CompanyInfoManager
public class MockRepository : IRepository
{
    //This method will always return a known value.
    public DataSet ExecuteQuery(string aQuery)
    {
        DataSet returnResults = new DataSet();
        //Fill the data set with known values...
        return returnResults;
    }
}

//This will always contain known values that you can test.
IList<string> names = new CompanyInfoManager(new MockRepository()).GetCompanyNames();
I didn't want to ramble on about dependency injection. Misko Hevery's blog goes into great detail with a great post to get started.
It depends. Will you ever need to make your program multithreaded? Will you ever need to connect to more than one database? Will you ever need to store state in this class? Do you need to control the lifetime of your connections? Will you need data caching in the future? If you answer yes to any of these, a static class will make things awkward.
My personal advice would be to make it an instance class, as this is more OO and gives you the flexibility you might need in the future.
You have to be careful making this class static. In a web app, each request is handled on its own thread. Static utilities can be thread-unsafe if you are not careful. And if that happens you are not going to be happy.
I would highly recommend you follow the DAO pattern. Use a tool like Spring to make this easy for you. All you have to do is configure a datasource and your DB access and transactions will be a breeze.
If you go for a static class you will have to design it such that it's largely stateless. The usual tactic is to create a base class with common data access functions and then derive specific classes from it for, say, loading Customers; see the sketch below.
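Roughly, that tactic looks like this (all names are invented; the base class keeps no per-request state beyond the connection string):

using System.Data;
using System.Data.SqlClient;

// Common plumbing lives in the base class.
public abstract class DataAccessBase
{
    private readonly string connectionString;

    protected DataAccessBase(string connectionString)
    {
        this.connectionString = connectionString;
    }

    protected DataSet ExecuteQuery(string sql)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter(sql, connection))
        {
            var results = new DataSet();
            adapter.Fill(results); // opens and closes the connection itself
            return results;
        }
    }
}

// One derived class per area of the domain, e.g. loading Customers.
public class CustomerDataAccess : DataAccessBase
{
    public CustomerDataAccess(string connectionString) : base(connectionString) { }

    public DataSet LoadCustomers()
    {
        return ExecuteQuery("SELECT * FROM Customers");
    }
}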
If object creation is actually the overhead in the entire operation, then you could also look at pooling pre-created objects. However, I highly doubt this is the case.
You might find that a lot of your common data access code could be made into static methods, but a static class for all data access seems like the design is lost somewhere.
Static classes don't have any issues with multi-threaded access per-se, but obviously locks and static or shared state is problematic.
By making the class static, you would have a hard time unit testing it, because you would probably end up managing the reading of the connection string internally in a non-obvious way, either by reading it inside the class from a configuration file or by requesting it from some class that manages these constants. I'd rather instantiate such a class in the traditional way:
var manager = new CompanyInfoManager(connectionString /* ...and possibly other dependencies too */);
and then assign it to a global/public static variable, if that makes sense for the class, i.e.
//this can be accessed globally
public static CompanyInfoManager Manager = manager;
so now you are not sacrificing any flexibility in your unit tests, since all of the class's dependencies are passed to it through its constructor.