Cloning objects (e.g. the items of a list) before handing them out is often held up as an ideal practice in a multithreaded environment, since it promotes immutability. However, doing so can amount to lying to the API's users. Here is what I mean:
Consider the following code:
public class Teacher {
    public List<Student> Students = new List<Student>();

    public Student GetStudent(int index) {
        return Students[index].Clone();
    }
}

public class Student {
    public DateTime LastAttended { get; set; }

    // Clone was implied by the original snippet; a shallow copy is enough here.
    public Student Clone() {
        return (Student)MemberwiseClone();
    }
}
and a user of the API might do this:
var teacher = new Teacher();
var student3 = teacher.GetStudent(3);
student3.LastAttended = DateTime.Now;
Without proper documentation, the user has no way of knowing that the student object he gets back is actually a clone, in which case none of the changes made to it will be reflected in the original.
How can the code above be improved so that the user intuitively knows that GetStudent is meant only for reading, not for modification? Is there any way to force or restrict callers from modifying the Student object returned from the GetStudent method?
Your Student object isn't immutable at all. If you want immutability, make an immutable object:
public sealed class Student {
    private readonly DateTime _lastAttended;
    public DateTime LastAttended { get { return _lastAttended; } }

    public Student(DateTime lastAttended)
    {
        _lastAttended = lastAttended;
    }
}
If you don't want someone to set the value of a property, then do not expose a setter, only a getter.
This of course requires architecting the application around this. If you actually need to update the LastAttended time, you would do that e.g. through a Repository that updates the Database and returns a new Student object. Also, many ORMs can't automatically handle immutable objects and need some translation code.
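A minimal sketch of that repository idea (the repository class and method names here are my assumptions, not an established API):

public sealed class StudentRepository
{
    // Persists the new attendance date, then hands back a fresh immutable Student.
    public Student UpdateLastAttended(int studentId, DateTime lastAttended)
    {
        // ... run the UPDATE against the database here ...
        return new Student(lastAttended);
    }
}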
Note that your issue is super-common when people cache objects in memory and then pass them along, e.g. to view models which manipulate them, unknowingly modifying the master object in the cache. This is why cloning is often recommended for caches. Cloning protects you from other code making modifications to "your" objects - every time someone asks, they get a new instance of your master object. Cloning does not prevent callers from messing up their own copy.
Note that declaring a field as readonly doesn't do much if the type of the field is itself mutable - I could still do e.g. student.Course.Name = "Test"; even if Course were readonly. I cannot change the reference in the Student object, but I can access any property setters.
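To illustrate (Course is a hypothetical mutable type, not part of the original code):

public class Course { public string Name { get; set; } }

public class Student
{
    public readonly Course Course = new Course();
}

// elsewhere:
// student.Course = new Course(); // compile error: the readonly reference cannot be replaced
student.Course.Name = "Test";     // allowed: the Course object itself is still mutable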
True immutability is a bit of a pain in C# as it's a lot of typing and a lot of factory methods. At some point, it may be okay to just leave a normal mutable Get/Set and trust that callers know what to do as they can only mess up themselves, not you. That said, anything that actually manipulates the data in the database needs proper security/business rule checks.
In DDD it is customary to protect an entity's properties like this:
public class Customer
{
    private Customer() { }
    public Customer(int id, string name) { /* ...populate properties... */ }

    public int Id { get; private set; }
    public string Name { get; private set; }
    // and so on...
}
EF uses reflection so it can handle all those privates.
But what if you need to attach an entity without loading it (a very common thing to do):
var customer = new Customer { Id = getIdFromSomewhere() }; // can't do this!
myContext.Set<Customer>().Attach(customer);
This won't work because the Id setter is private.
What is a good way to deal with this mismatch between the language and DDD?
Ideas:
make Id public (and break DDD)
create a constructor/method to populate a dummy object (makes no sense)
use reflection ("cheat")
???
I think the best compromise is to use reflection and set that private Id property, just like EF does. Yes, it's reflection and slow, but much faster than loading from the database. And yes, it's cheating, but at least as far as the domain is concerned, there is officially no way to instantiate the entity without going through the constructor.
How do you handle this scenario?
PS I did a simple benchmark and it takes about 10s to create a million instances using reflection. So compared to hitting the database, or the reflection performed by EF, the extra overhead is tiny.
"customary" implicitly means it's not a hard set rule, so if you have specific reasons to break those rules in your application, go for it. Making the property setter public would be better than going into reflection for this: not only because of performance issues, but also because it makes it much easier to put unwanted side-effects in your application. Reflection just isn't the way to deal with this.
But I think the first question here is why you would want the ID of an object to be set from the outside in the first place. EF uses the ID primarily to identify objects; you should not use the ID for any other logic in your application.
Assuming you have a strong reason to want to change the ID, I actually think you gave the answer yourself in the source you just put in the comments:
So you would have methods to control what happens to your objects and
in doing so, constrain the properties so that they are not exposed to
be set or modified “willy nilly”.
You can keep the private setter and use a method to set the ID.
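For example (a minimal sketch; the validation check is only a placeholder):

public class Customer
{
    public int Id { get; private set; }

    public void SetId(int id)
    {
        // unlike a public setter, a method leaves room for invariant checks
        if (id <= 0) throw new ArgumentOutOfRangeException("id");
        Id = id;
    }
}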
EDIT:
After reading this, I did some more testing myself, and you could have the following:
public class Customer
{
    private Customer() { }
    public Customer(int id) { /* only sets id */ }
    public Customer(int id, string name) { /* ...populate properties... */ }

    public int Id { get; private set; }
    public string Name { get; private set; }
    // and so on...

    public void SetName(string name)
    {
        // set name, perhaps check for a condition first
    }
}
public class MyController
{
    //...
    public void AttachCustomerToOrder(Order order) // order is assumed to exist already
    {
        var customer = new Customer(getIdFromSomewhere());
        myContext.Set<Customer>().Attach(customer);
        order.SetCustomer(customer);
        // sets the customer on the order and saves, without actually changing
        // customer: it is still read as Unchanged.
        myContext.SaveChanges();
    }
    //...
}
This code leaves the private setters as they were (you will of course need the methods for editing) and only the required changes are pushed to the db afterwards. As explained in the link above, only changes made after attaching are used, and you should make sure you don't manually set the state of the object to Modified, or else all properties are pushed (potentially emptying your object).
This is what I'm doing, using reflection. I think it's the best bad option.
var customer = CreateInstanceFromPrivateConstructor<Customer>();
SetPrivateProperty(p => p.ID, customer, 10);
myContext.Set<Customer>().Attach(customer);

//...and all the above was just for this:
order.SetCustomer(customer);
myContext.SaveChanges();
The implementations of those two reflection methods aren't important. What is important:
EF uses reflection for lots of stuff
Database reads are much slower than these reflection calls (the benchmark I mentioned in the question shows how insignificant this perf hit is, about 10s to create a million instances)
Domain is fully DDD - you can't create an entity in a weird state, or create one without going through the constructor (I did that above but I cheated for a specific case, just like EF does)
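For reference, here is a minimal sketch of what those two helpers might look like (the implementations are my assumption, since the post deliberately omits them):

using System;
using System.Linq.Expressions;
using System.Reflection;

static class ReflectionHelpers
{
    // Invokes the entity's non-public parameterless constructor.
    public static T CreateInstanceFromPrivateConstructor<T>() where T : class
    {
        return (T)Activator.CreateInstance(typeof(T), nonPublic: true);
    }

    // Sets a property even when its setter is private.
    public static void SetPrivateProperty<T, TValue>(
        Expression<Func<T, TValue>> property, T target, TValue value)
    {
        var name = ((MemberExpression)property.Body).Member.Name;
        var prop = typeof(T).GetProperty(
            name, BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);
        prop.GetSetMethod(true).Invoke(target, new object[] { value });
    }
}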
I have seen many "wrapper" classes for the ASP.NET Session state and some do something like:
Strongly Typed Layer (Pseudo Code #1)
public class MySession
{
    public int MyID
    {
        get
        {
            return Convert.ToInt32(HttpContext.Current.Session["MyID"]);
        }
        set
        {
            HttpContext.Current.Session["MyID"] = value;
        }
    }

    public string MyName
    {
        get
        {
            return (HttpContext.Current.Session["MyName"]).ToString();
        }
        set
        {
            HttpContext.Current.Session["MyName"] = value;
        }
    }

    ...

    public MySession()
    {
        // Could be static or instantiated depending on needs...
    }

    ...
}
///// USAGE IN OTHER CLASS /////
MySession currSession = new MySession();
currSession.MyID = 5;
currSession.MyName = "John Doe";
Console.WriteLine($"{currSession.MyName}'s ID = {currSession.MyID}");
Then I have seen others do something like:
Generic List Variant (Pseudo Code #2)
public class SessionVariables
{
    private int myID;
    public int MyID
    {
        get { return myID; }
        set
        {
            myID = value; // assign the backing field (assigning MyID here would recurse forever)
            MySession.SaveVariables();
        }
    }

    private string myName;
    public string MyName
    {
        get { return myName; }
        set
        {
            myName = value;
            MySession.SaveVariables();
        }
    }

    ...
}
public class MySession
{
    public static List<SessionVariables> Variables;
    // Might be private in real application environment

    public MySession() // Could be static or instantiated depending on needs...
    {
        if (HttpContext.Current.Session["MyVariables"] == null)
        {
            HttpContext.Current.Session["MyVariables"] = new List<SessionVariables>();
        }
        // Obviously more appropriate checking to do here, but for simplicity's sake...
        Variables = (List<SessionVariables>)HttpContext.Current.Session["MyVariables"];
    }

    public static void SaveVariables()
    {
        HttpContext.Current.Session["MyVariables"] = Variables;
    }

    ...
}
///// USAGE /////
public class MyPage
{
    public void MyMethod()
    {
        MySession currSession = new MySession(); // Create variables
        MySession.Variables.MyID = 5;
        MySession.Variables.MyName = "John Doe";
        Console.WriteLine($"{MySession.Variables.MyName}'s ID = {MySession.Variables.MyID}");
        ...
    }
}
Thoughts
Obviously, these examples are both pseudo code style (so please ignore general errors), but they illustrate some of the approaches to building a data access layer for the Session state.
I do something similar to the first variant, albeit with a more comprehensive data type mapping/conversion plan. I use a "normal" class to wrap Session in, but it could easily be static, since the properties pull from the Session state whenever their "get" is called and thus can never be out of sync; the class doesn't actually hold any data itself.
The second seems like overkill to me at first impression: yes, you are only storing one variable in the Session state, but it clutters up the rest of the code by forcing it to make references through the list:
myObject.TheList.VariableIWant
VS
myObject.VariableIWant
of which I prefer the latter (it just looks cleaner), though this could easily be hidden in a superclass, or a local variable could directly reference the list:
new MySession(); // Create the variables
List<SessionVariables> mySession = MySession.Variables;
... though that looks kind of dirty to me at first glance. However, I don't know how much of a benefit using a list for storage actually gives to code or performance, since storing an object that represents a list should take about as much memory as storing each variable separately; at least that is my thinking.
Question
Which is better practice / lower maintenance in the long term? And/or which gives better performance for the website?
Option #1 is the most common pattern that I see, and I use it. You can improve it by using constants instead of magic strings (see the sketch below). Sessions have their issues, but so does making a completely stateless app. I also recommend using HttpCache instead of Session -- it will not consume AppPool resources. But only Sessions can be used on a web farm, as long as you use a Session provider like SQL Server. Distributed caching is another matter.
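A sketch of the constants suggestion (the key name is just an example):

public class MySession
{
    // one constant per session key keeps the magic strings in a single place
    private const string MyIdKey = "MyID";

    public int MyID
    {
        get { return Convert.ToInt32(HttpContext.Current.Session[MyIdKey]); }
        set { HttpContext.Current.Session[MyIdKey] = value; }
    }
}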
With option 1 it's really easy to tell what it's doing. You're trying to standardize how your classes save/retrieve session data rather than scattering it all over the place.
Option 2 is a lot more confusing. In fact, I've looked it over a few times and I can't figure out what's going on with the list. Why does option 2 require a list when option 1 doesn't?
For what you're trying to do, option 1 works just fine. The constants aren't a bad idea, but in this one case I might skip it. The meaning of the string is pretty obvious, and other classes won't need to duplicate it because they're going through this one to access Session.
Option #1 > Option #2
Two reasons:
You should be deleting session variables as soon as you are done with them, using Session.Remove (a minimal example follows this list). Otherwise your session state will keep getting bigger and bigger, and your web server won't be able to support as many simultaneous users. But if all your variables are held in one big session variable, this is a bit harder to accomplish.
I would avoid using reference types (e.g. a List of any kind) in session. It creates an ambiguity: if your session is stored in-proc, the session is only storing a pointer, and you can change session variables by changing the objects that they reference just like normal reference types. But if your session is out of proc (e.g. using state server or SQL state) then your objects will be serialized and frozen, and if you change the objects that are referenced, those changes will not get reflected in your session variables. This could create all sorts of bugs that only appear on your upper environments (if your dev systems lack a state server) and drive you mad trying to troubleshoot.
You could possibly make an exception for immutable reference types, but you'd have to be careful; just because an object is immutable doesn't mean the objects that it references are immutable too.
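The Session.Remove point in the first reason above, as a minimal example:

// read the value one last time, then release it so the session stays small
int myId = Convert.ToInt32(HttpContext.Current.Session["MyID"]);
HttpContext.Current.Session.Remove("MyID");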
I am having trouble figuring out the best way to refactor a very large C# class and specifically how to pass shared properties/values from the large class into the extracted classes and have those modified values available in the main class.
At the start, this class was 1000 lines long and is very procedural – it involves calling methods and performing work in a specific sequence. Along the way things are persisted into the database. During the process there are multiple Lists of items that are worked on and shared in the methods. At the end of this process, there are a bunch of statistics that are presented to the user. These statistics are calculated in various methods as the processing is taking place. To give a rough outline – the process involves a bunch of random selection and at the end of the process the user sees how many random items, how many invalid records were picked, how many items came from this sub-list etc.
I have been reading Uncle Bob’s “Clean Code” and am trying to make sure as I refactor that each class does only 1 thing.
So I have been able to extract methods and classes in order to keep the file smaller (have it down to 450 lines now) but the problem I am having now is that these broken out classes require values from the main parent class to be passed to them and updated – these values will be used for other methods/class methods as well.
I am torn as to which is the cleanest approach:
1) Should I create a bunch of private member variables to store the statistical values and Lists in the main class, and then, after calling into the dependent class methods, receive back a complex result class, extract those values, and populate/update the private member variables? (Lots of boilerplate code this way.)
OR
2) Is it better to create a DTO or some sort of container class that holds the Lists and statistical values, and pass it to the various class methods and child class methods by reference in order to build up the values? In other words, I just pass this container class around, and since it's an object, the other classes and methods can directly manipulate the values in it. At the end of the process, that container will hold all of the final results, and I can simply extract them from it (in which case there is really no need to populate the main class's private member variables).
The latter is the way I have it now, but I am feeling that this is a code smell: it all works, but it just seems "fragile". I know large classes are not great, but at least with everything in one large file it seems clearer which properties I am updating.
-- UPDATE --
Some more info:
Unfortunately I can't post any of the actual code as it is proprietary; I will try to come up with a dummy example and paste it in if I get some time. One of the comments below mentioned refactoring the code into steps, and that is exactly what I've done. The purpose of the class is ultimately one thing - to create a random list of things - so in the only public method that gets called on this class, I have refactored down to one level of abstraction for each "step". Each step, whether it is a method in the same class or broken out into a helper class for the substeps, still requires access to the lists that get built up during the process and to the simple counter variables that keep track of the statistics.
-- UPDATE --
Here is an attempt at showing something similar in code:
public class RandomList {
    public int Id {get; set;}
    public string Name {get; set;}
    public int NumOfInvalidItems {get; set;}
    public int NumOfItemsChosen {get; set;}
    public int NumOfFirstChunkItems {get; set;}
    public int NumOfSecondChunkItems {get; set;}
    public ICollection<RandomListItem> Items {get; set;}
}
public class CreateRandomListService {
    private readonly IUnitOfWork _unitOfWork;
    private readonly ICreateRandomListValidator _createRandomListValidator;
    private readonly IRandomFirstChunkService _randomFirstChunkService;
    private readonly IRandomSecondChunkService _randomSecondChunkService;
    private RandomList _randomList;
    private CreateRandomListValues _createValues;

    public CreateRandomListService(IUnitOfWork unitOfWork,
                                   ICreateRandomListValidator createRandomListValidator,
                                   IRandomFirstChunkFactory randomFirstChunkFactory,
                                   IRandomSecondChunkFactory randomSecondChunkFactory) {
        _unitOfWork = unitOfWork;
        _createRandomListValidator = createRandomListValidator;
        _randomFirstChunkService = randomFirstChunkFactory.Create(_unitOfWork);
        _randomSecondChunkService = randomSecondChunkFactory.Create(_unitOfWork);
    }
    public CreateResult CreateRandomList(CreateRandomListValues createValues) {
        // validate the passed-in model before proceeding
        // (assuming Validate returns true when the model is valid)
        if (!_createRandomListValidator.Validate(createValues))
            return new CreateResult { HasErrors = true };

        InitializeValues(createValues); // fetch settings from db etc. and build up
        ProcessFirstChunk();
        ProcessSecondChunk();
        SaveWithStatistics();

        return new CreateResult { Id = _randomList.Id };
    }
    private void InitializeValues(CreateRandomListValues createValues) {
        _createValues = createValues;
        _createValues.ImportantSetting = _unitOfWork.SettingsRepository.GetImportantSetting();
        // etc.
        _randomList = new RandomList() {
            // set initial properties etc.; some come from the passed-in createValues, some from db
        };
    }
    private void ProcessFirstChunk() {
        _randomFirstChunkService.GetRandomFirstChunk(_createValues);
    }

    private void ProcessSecondChunk() {
        _randomSecondChunkService.GetRandomSecondChunk(_createValues);
    }
    private void SaveWithStatistics() {
        _randomList.Items = _createValues.ListOfItems;
        _randomList.NumOfInvalidItems = _createValues.NumOfInvalidItems;
        _randomList.NumOfItemsChosen = _createValues.NumOfItemsChosen;
        _randomList.NumOfFirstChunkItems = _createValues.NumOfFirstChunkItems;
        _randomList.NumOfSecondChunkItems = _createValues.NumOfSecondChunkItems;
        _unitOfWork.RandomThingRepository.Add(_randomList);
        _unitOfWork.Save();
    }
}
public class RandomFirstChunkService {
    private IUnitOfWork _unitOfWork;

    public RandomFirstChunkService(IUnitOfWork unitOfWork) {
        _unitOfWork = unitOfWork;
    }

    public void GetRandomFirstChunk(CreateRandomListValues createValues) {
        // do processing here - build up the list collection and keep track of counts
        CallMethodThatUpdatesList(createValues);
        // how to return this to the calling class? currently just updating values in createValues by reference
        // could also return a complex class here and extract the values back into the main class' member
        // variables
    }

    private void CallMethodThatUpdatesList(CreateRandomListValues createValues) {
        // do work
    }
}
The brutal answer is that it depends, of course. It is hard to work out an answer without reading the code, but I would say that once you have created new classes (each with one purpose), those classes and interfaces should define what data objects you need to pass around to solve your problem. In that light, it is strange for a method to return the same type that was passed into it, and I also think that manipulating one object through a series of methods is fragile. Imagine each of your classes was a REST service; what would those interfaces look like?
I wouldn't "pass stuff around". Nor would I break it up into separate classes just because its 1000 lines. You'll end up making it much messier and much more of a maintenance headache.
You didn't post your code (duh), so its hard to critique it. If you really go over it, I suspect you might have duplicate code in there that can be refactored into methods, etc.
If you've already gotten rid of the duplicate code, I'd next pull out all the database stuff into a DAL layer.
If you really want to make it smaller (based on what little info you provided), I'd next refactor it into "steps" and make a workflow type parent container class.
Again, hard to say without knowing the code.
I don't know how exactly you have managed to refactor the class this far, but from your explanation it sounds to me like the "statistic" is the concept that should become an object, something like:
interface IStatistic<TOutput>
{
    IEnumerable<TOutput> Calculate(IEnumerable<input-type> input);
}
When you wish to display some statistic, you just use the appropriate statistic:
return new MySpecial().Calculate(myData);
In case the statistic objects are not that easy to construct, i.e. they ask for some parameters and so on, you may supply a Func delegate which creates them:
void DoSomething(Func<IStatistic<string>> factory)
{
    string[] inputData = ...
    foreach (string line in factory().Calculate(inputData))
    {
        // do something...
    }
}
As you are mentioning multiple lists, I suppose that input-type would actually be a couple of input types. If that is so, then it might really make sense to supply a kind of DTO just to hold the lists:
class RawData
{
    public IEnumerable<type1> Data1 { get; }
    public IEnumerable<type2> Data2 { get; }
    ...
}
Observe, however, that this is not a DTO "by the book". First, it is immutable - only getters are there. Second, it exposes only sequences (IEnumerable) rather than raw lists. Both measures are taken to prevent statistic objects from manipulating the data.
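One way such a holder might be populated (the constructor and the concrete element types are my assumptions; int and string merely stand in for type1 and type2):

class RawData
{
    public IEnumerable<int> Data1 { get; private set; }
    public IEnumerable<string> Data2 { get; private set; }

    public RawData(IEnumerable<int> data1, IEnumerable<string> data2)
    {
        // take defensive copies so later changes to the source lists don't leak in
        Data1 = new List<int>(data1);
        Data2 = new List<string>(data2);
    }
}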
I'm using C# 4.0 and ASP.NET. I have a problem regarding the proper construction of a read-only structure within a custom cache I created.
Details (summary):
My CacheManager class (a singleton) takes as a parameter an instance of the existing MemoryCache class and wraps it with a few helper methods to deal with supplementary concerns such as object life cycle within my custom cache.
That Manager deals with a simple CachableObject that takes three variables:
object
DateTime
int (duration)
In summary, my custom cache Manager stores objects for a limited amount of time in order to protect my database from frequent big queries.
Lately, I tried to:
Get an object back from the cache (e.g. stored under the key MyList)
Cast it back to a list of complex objects
Translate the contents of some properties for each complex object
Store the freshly translated list of objects in the cache again (under another key, MyTranslatedList)
The problem:
During my testing, it appeared that both lists stored in the cache (the raw one and the translated one) were referring to the same underlying objects. Therefore, once translated, those objects were effectively translated in both lists.
Since each list only holds references to the objects, that's perfectly normal behavior, and a silly mistake on my part.
The question:
As you can easily guess by now, I would like to protect myself and the other users of my singleton from that kind of mistake.
I would like to be able to insert (or store, or get) any kind of object (or list of complex objects) such that it cannot be altered by anybody who gets it through the cache. I would like the data within my cache to be read-only (deeply read-only) to avoid this kind of problem. I want everyone to have to create a deep copy (or, even better, to be handed one) before they start using the data stored within the cache.
What I tried so far:
I tried to make the object readonly. It didn't work as expected.
Since I'm often storing lists of complex objects, I've found the AsReadOnly method that returns a ReadOnlyCollection, but while this prevents me from altering the list (add, remove), it doesn't protect the objects that are within the list.
I hope my explanation is somewhat understandable :) Is there a neat way of dealing with that kind of situation ?
I would create a class where the properties are readonly:
class ReadonlyClass
{
    private readonly string p1;
    private readonly int p2;

    public ReadonlyClass(string property1, int property2)
    {
        p1 = property1;
        p2 = property2;
    }

    public string Property1
    {
        get { return p1; }
    }

    public int Property2
    {
        get { return p2; }
    }
}
If the properties are objects/other classes, you should implement a clone function that returns a copy of the object. The clone function for the above class would look like this:
public ReadonlyClass Clone()
{
    return new ReadonlyClass(p1, p2);
}
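If the class held a reference to a mutable object, the clone would need to copy that object as well; a sketch under that assumption (Child is a hypothetical mutable type):

class Child
{
    public string Value { get; set; }
}

class ReadonlyWithChild
{
    private readonly string name;
    private readonly Child child;

    public ReadonlyWithChild(string name, Child child)
    {
        this.name = name;
        this.child = child;
    }

    public ReadonlyWithChild Clone()
    {
        // copy the child too; returning the same reference would let callers
        // mutate the cached original through the child's setters
        return new ReadonlyWithChild(name, new Child { Value = child.Value });
    }
}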
I've read a lot of detailed discussions throughout Stack Overflow, the Microsoft Developer Network, and a couple of blogs. The general consensus is that a constructor shouldn't take a large number of parameters. Encountering this got me thinking.
Initial problem: My application contains around fifteen variables that are constantly being used throughout the application. The solution I came up with is to create a single class that injects the values into properties.
This seemed to work quite well; it made my life easy, as I could pass the object into another class through the constructor without having to assign all these variables in each method. Except this led to another issue:
public class ServerParameters
{
    // Variables:
    private string template;
    private string sqlUsername;
    private string sqlPassword;
    private string sqlDatabase;
    private string sqlServer;

    public ServerParameters(string _template, string _sqlDatabase, string _sqlServer,
        string _sqlUsername, string _sqlPassword)
    {
        template = _template;
        sqlUsername = _sqlUsername;
        sqlPassword = _sqlPassword;
        sqlDatabase = _sqlDatabase;
        sqlServer = _sqlServer;
    }

    // Link the private strings to a group of Properties.
}
So already this constructor has become significantly bloated, and now I need to implement even more parameters.
Problem two: I have a bloated constructor, and I'm implementing other items that don't entirely fit with this particular class. My solution was to create a subclass or container to hold these different classes while still being able to use them.
You now see the dilemma, which raises the all-important question: when you can only inherit once, how can you build a container that will hold all of these subclasses?
And why shouldn't you use so many parameters in a Constructor, why is it bad exactly?
Here is my thought on how to implement a container, but I feel like I'm doing it wrong, because I constantly get a NullReferenceException when I try to use some of these parameters.
public class VarContainer
{
    private ServerParameter server;
    private CustomerParameter customer;

    public VarContainer(ServerParameter _server, CustomerParameter _customer)
    {
        server = _server;
        customer = _customer;
    }
}
I'm assuming it is because the internal class itself isn't actually getting those assigned variables, but I'm completely lost as to the best approach to achieve my goal.
The main intent of "don't do work in your constructor" is to avoid side effects where you create an object and it does a significant amount of work that can impact global state unexpectedly, or takes a long time to complete, which may disrupt the caller's code flow.
In your case, you're just setting up parameter values, so this is not what "don't do work" is about; this isn't really work. The design of your final container depends on your requirements: if you can accept a variable list of properties that are set on your class (or struct), then perhaps an initializer when you construct the object is more appropriate.
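By "an initializer" I take this to mean C# object-initializer syntax; a sketch (this assumes the parameter class exposes settable properties, unlike the constructor-only version in the question):

var server = new ServerParameters
{
    Template = "default",
    SqlServer = "localhost",
    SqlDatabase = "MyDb"
    // set only the values you actually need
};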
Assuming that you want all of your properties from the get-go, and that you want the grouping you called out in the question, I would construct something similar to:
public class Properties
{
    public ServerProperties Server { get; private set; }
    public CustomerProperties Customer { get; private set; }

    public Properties(ServerProperties server, CustomerProperties customer)
    {
        Server = server;
        Customer = customer;
    }
}
I'm leaving the implementation of ServerProperties and CustomerProperties to you, but they follow the same implementation pattern.
This is of course a matter of preference, but I always give my constructors all the parameters they need so that my objects have basic functionality. I don't think that 5 parameters is bloated, and adding a container to pass parameters adds much more bloat, in my opinion, than adding a few more parameters. By new bloat I mean that you will probably have a new file for the container, with new classes and new imports. Calling code has to write more using directives and link to the correct libraries, which need to be exported correctly as well.
Adding a wrapping class for parameters masks the real problem - that your class might be too complicated. It does not solve it, and generally aggravates it.
You can have any number of parameters you want in a constructor. It's just that if you have too many (how many is too many? that's really subjective), it gets harder and harder to create a new instance of the class.
For example, suppose you have a class with 30 members. 27 of them can be null. If you force it to receive a value for each member in the constructor, you'll get code like this:
Foo bar = new Foo(p1, p2, p3, null, null, null, null, null, null /*...snip*/);
Which is boring to write and not very readable, where a three parameter constructor would do.
IMO, this is what you should receive in your constructors:
First, anything that your instance absolutely needs in order to work. Stuff that it needs to make sense. For example, database connection related classes might need connection strings.
After those mentioned above, you may have overloads that receive the stuff that is most useful. But don't exaggerate here.
Everything else, you let whoever is using your code set later, through set accessors on properties. A sketch of this split follows this list.
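A sketch of that split (the class and its members are hypothetical, purely for illustration):

public class DatabaseExporter
{
    // absolutely needed for the object to make sense, so it goes in the constructor
    private readonly string connectionString;

    public DatabaseExporter(string connectionString)
    {
        this.connectionString = connectionString;
    }

    // optional knobs; callers set them later only if they care
    public int CommandTimeoutSeconds { get; set; }
    public bool IncludeSchema { get; set; }
}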
Seems to me like you could use a dependency injection container such as Unity or Castle Windsor.