I recently created an interface layer to separate the DataAccessProvider from our business logic layer.
With this approach we can swap our choice of DataAccessProvider whenever we want, just by changing the values in the Web/App.config
(more details can be given if needed).
Anyway, to do this we use reflection to obtain the DataProvider class we work with.
/// <summary>
/// The constructor will create a new provider with the use of reflection.
/// If the assembly could not be loaded an AssemblyNotFoundException will be thrown.
/// </summary>
public DataAccessProviderFactory()
{
    string providerName = ConfigurationManager.AppSettings["DataProvider"];
    string providerFactoryName = ConfigurationManager.AppSettings["DataProviderFactory"];
    try
    {
        activeProvider = Assembly.Load(providerName);
        activeDataProviderFactory = (IDataProviderFactory)activeProvider.CreateInstance(providerFactoryName);
    }
    catch
    {
        throw new AssemblyNotFoundException();
    }
}
But now I'm wondering: how slow is reflection?
In most cases: more than fast enough. For example, if you are using this to create a DAL wrapper object, the time taken to create the object via reflection will be minuscule compared to the time it needs to connect to a network. So optimising this would be a waste of time.
If you are using reflection in a tight loop, there are tricks to improve it (a sketch of the Expression approach follows this list):
generics (using a wrapper where T : new() and MakeGenericType)
Delegate.CreateDelegate (to a typed delegate; doesn't work for constructors)
Reflection.Emit - hardcore
Expression (like Delegate.CreateDelegate, but more flexible, and works for constructors)
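As a minimal sketch of the Expression route (my own illustration, not code from the question): compile the constructor call once, cache the delegate, and reuse it instead of reflecting on every instantiation.
using System;
using System.Linq.Expressions;

static class CtorCache
{
    // Compiles a Func<object> that invokes the public parameterless constructor of `type`.
    // Pay the compile cost once; calls through the cached delegate are close to "new T()" speed.
    public static Func<object> GetCtor(Type type)
    {
        var body = Expression.Convert(Expression.New(type), typeof(object));
        return Expression.Lambda<Func<object>>(body).Compile();
    }
}

// Usage (hypothetical type name):
// Func<object> ctor = CtorCache.GetCtor(typeof(MyDto));
// object instance = ctor();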
But for your purposes, CreateInstance is perfectly fine. Stick with that, and keep things simple.
Edit: while the point about relative performance remains, and while the most important thing, "measure it", remains, I should clarify some of the above. Sometimes... it does matter. Measure first. However, if you find it is too slow, you might want to look at something like FastMember, which does all the Reflection.Emit code quietly in the background, to give you a nice easy API; for example:
var accessor = TypeAccessor.Create(type);
List<object> results = new List<object>();
foreach (var row in rows) {
    object obj = accessor.CreateNew();
    foreach (var col in cols) {
        accessor[obj, col.Name] = col.Value;
    }
    results.Add(obj);
}
which is simple, but will be very fast. In the specific example I mention about a DAL wrapper—if you are doing this lots, consider something like dapper, which again does all the Reflection.Emit code in the background to give you the fastest possible but easy to use API:
int id = 12345;
var orders = connection.Query<Order>(
    "select top 10 * from Orders where CustomerId = @id order by Id desc",
    new { id }).ToList();
It's slower compared to non-reflective code. The important thing is not whether it's slow, but whether it's slow where it counts. For instance, if you instantiate objects using reflection in a web environment where the expected concurrency can rise up to 10K, it will be slow.
Anyway, it's good not to be concerned about performance in advance. If things turn out to be slow, you can always speed them up, provided you designed things correctly so that the parts you expected might need optimisation in the future are localised.
You can check this famous article if you need to speed it up:
Dynamic... But Fast: The Tale of Three Monkeys, A Wolf and the DynamicMethod and ILGenerator Classes
Here are some links that might help:
This guy did some tests and provides a few metrics. This article is from 2006, so I made a Gist of the code to test Reflection Performance. The results are similar (although it's much faster now obviously).
Constructor: 15 ms (151588 ticks) for 1,000,000 calls
Constructor using reflection: 38 ms (381821 ticks) for 1,000,000 calls
Method call: 5 ms (57002 ticks) for 1,000,000 calls
Method call using reflection: 252 ms (2529507 ticks) for 1,000,000 calls
Setting properties: 294 ms (2949458 ticks) for 1,000,000 calls
Setting properties using reflection: 1490 ms (14908530 ticks) for 1,000,000 calls
MSDN article "Dodge Common Performance Pitfalls to Craft Speedy Applications"
I thought I'd do a quick test to demonstrate how slow reflection is compared to code that doesn't use it.
With Reflection
Instantiating 58 objects by iterating through each of their Attributes and matching
Total Time: 52254 nanoseconds
while (reader.Read()) {
    string[] columns = reader.CurrentRecord;
    CdsRawPayfileEntry toAdd = new CdsRawPayfileEntry();
    IEnumerable<PropertyInfo> rawPayFileAttributes = typeof(CdsRawPayfileEntry)
        .GetProperties()
        .Where(prop => Attribute.IsDefined(prop, typeof(CustomIndexAttribute)));
    foreach (var property in rawPayFileAttributes) {
        int propertyIndex = ((CustomIndexAttribute)property.GetCustomAttribute(typeof(CustomIndexAttribute))).Index;
        if (propertyIndex < columns.Length)
            property.SetValue(toAdd, columns[propertyIndex]);
        else
            break;
    }
}
Without Reflection
Instantiating 58 Objects by creating a new object
Total Time: 868 nanoseconds
while (reader2.Read()) {
    string[] columns = reader2.CurrentRecord;
    CdsRawPayfileEntry toAdd = new CdsRawPayfileEntry() {
        ColumnZero = columns[0],
        ColumnOne = columns[1],
        ColumnTwo = columns[2],
        ColumnThree = columns[3],
        ColumnFour = columns[4],
        ColumnFive = columns[5],
        ColumnSix = columns[6],
        ColumnSeven = columns[7],
        ColumnEight = columns[8],
        ColumnNine = columns[9],
        ColumnTen = columns[10],
        ColumnEleven = columns[11],
        ColumnTwelve = columns[12],
        ColumnThirteen = columns[13],
        ColumnFourteen = columns[14],
        ColumnFifteen = columns[15],
        ColumnSixteen = columns[16],
        ColumnSeventeen = columns[17]
    };
}
Admittedly it's not completely fair, since the reflection version also has to retrieve a specific attribute of every property (58*18 times) on top of creating each object via reflection, but it at least provides some perspective.
Reflection is not THAT slow. Invoking a method by reflection is about 3 times slower than the normal way. That is no problem if you do it just once or in non-critical situations. If you use it 10,000 times in a time-critical method, I would consider changing the implementation.
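To illustrate where that factor comes from (my own sketch, not from the answer above): the per-call cost lies in MethodInfo.Invoke; binding a typed delegate once gets you back close to normal call speed.
using System;
using System.Reflection;

// Illustrative target method: int.Parse(string).
MethodInfo parse = typeof(int).GetMethod("Parse", new[] { typeof(string) });

// Per-call reflection: argument array, boxing and invocation overhead every time.
object slow = parse.Invoke(null, new object[] { "42" });

// Bind once, then call many times through a typed delegate at near-direct speed.
var fast = (Func<string, int>)Delegate.CreateDelegate(typeof(Func<string, int>), parse);
int value = fast("42");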
Other than following the links given in other answers and ensuring you're not writing "pathologically bad" code, for me the best answer to this is to test it yourself.
Only you know where your bottlenecks are, how many times your reflection code will be used, whether the reflection code will be in tight loops, etc. You know your business case, how many users will access your site, and what the perf requirements are.
However, given the snippet of code you've shown here then my guess would be that the overhead of reflection isn't going to be a massive problem.
VS.NET web testing and performance testing features should make measuring the performance of this code pretty simple.
If you don't use reflection, what will your code look like? What limitations will it have? It may be that you can't live with the limitations that you find yourself with if you remove the reflection code. It might be worth trying to design this code without the reflection to see if it's possible or if the alternative is desirable.
I was doing something similar until I started playing with IoC. I would use a Spring object definition to specify the data provider - SQL, XML, or mocks!
Related
A programming pattern like this comes up every so often:
int staleCount = 0;
fileUpdatesGridView.DataSource = MultiMerger.TargetIds
    .Select(id =>
    {
        FileDatabaseMerger merger = MultiMerger.GetMerger(id);
        if (merger.TargetIsStale)
            staleCount++;
        return new
        {
            Id = id,
            IsStale = merger.TargetIsStale,
            // ...
        };
    })
    .ToList();
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = staleCount > 0;
I'm not sure there is a more succinct way to code this?
Even if so, is it bad practice to do this?
No, it is not strictly "bad practice" (like constructing SQL queries with string concatenation of user input or using goto).
Sometimes such code is more readable than several queries/foreach loops or a no-side-effect Aggregate call. It is also a good idea to at least try to write the foreach and no-side-effect versions to see which one is more readable/easier to prove correct (a foreach sketch follows the notes below).
Please note that:
it is frequently very hard to reason about what will happen, and when, with such code. E.g. your sample relies on LINQ queries being executed lazily and forced by the .ToList() call; without it, staleCount would never be computed.
pure functions can be run in parallel; ones with side effects need a lot of care to do so
if you ever need to convert LINQ-to-Object to LINQ-to-SQL you have to rewrite such queries
generally LINQ queries favor functional programming style without side-effects (and hence by convention readers would not expect side-effects in the code).
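Here is a sketch of the foreach alternative mentioned above (reusing the names from the question, so treat it as illustrative only):
var rows = new List<object>();
bool anyStale = false;
foreach (var id in MultiMerger.TargetIds)
{
    FileDatabaseMerger merger = MultiMerger.GetMerger(id);
    anyStale |= merger.TargetIsStale;
    rows.Add(new { Id = id, IsStale = merger.TargetIsStale });
}
fileUpdatesGridView.DataSource = rows;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = anyStale;
The side effect is now an ordinary statement in a loop body, so nothing depends on when (or whether) a LINQ query is enumerated.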
Why not just code it like this:
var result = MultiMerger.TargetIds
    .Select(id =>
    {
        FileDatabaseMerger merger = MultiMerger.GetMerger(id);
        return new
        {
            Id = id,
            IsStale = merger.TargetIsStale,
            // ...
        };
    })
    .ToList();
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.IsStale);
I would consider this a bad practice. You are making the assumption that the lambda expression is being forced to execute because you called ToList. That's an implementation detail of the current version of ToList. What if ToList in .NET 7.x is changed to return an object that semi-lazily converts the IQueryable? What if it's changed to run the lambda in parallel? All of a sudden you have concurrency issues on your staleCount. As far as I know, both of those are possibilities which would break your code because of bad assumptions your code is making.
Now as far as repeatedly calling MultiMerger.GetMerger with a single id, that really should be reworked to be a join as the logic for doing a join (w|c)ould be much more efficient than what you have coded there and would scale a lot better, especially if the implementation of MultiMerger is actually pulling data from a database (or might be changed to do so).
As far as calling ToList() before passing it to the DataSource: if the DataSource doesn't use all the fields in your new object, you would be (much) faster and use less memory by skipping the ToList and letting the data source pull only the fields it needs. What you've done is tightly couple the data to the exact requirements of the view, which should be avoided where possible. For example, what if you suddenly need to display a field that exists in FileDatabaseMerger but isn't in your current anonymous object? Now you have to make changes to both the controller and the view to add it, whereas if you just passed in an IQueryable you would only have to change the view. Again: faster, less memory, more flexible, and more maintainable.
Hope this helps. And this question really should be posted on Code Review, not Stack Overflow.
Update on further review, the following code would be much better:
var result = MultiMerger.GetMergersByIds(MultiMerger.TargetIds);
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.TargetIsStale);
or
var result = MultiMerger.GetMergers().Where(m => MultiMerger.TargetIds.Contains(m.Id));
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.TargetIsStale);
Let's say I have a relatively large list of an object MyObjectModel called MyBigList. One of the properties of MyObjectModel is an int called ObjectID. In theory, I think MyBigList could reach 15-20MB in size. I also have a table in my database that stores some scalars about this list so that it can be recomposed later.
What is going to be more efficient?
Option A:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
int RowID = PutScalarsInDB(MyBigList);
Option B:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
int TheCount = MyBigList.Count();
StringBuilder ListOfObjectID = new StringBuilder();
foreach (MyObjectModel ThisObject in MyBigList)
{
    ListOfObjectID.Append(ThisObject.ObjectID.ToString());
}
int RowID = PutScalarsInDB(TheCount, ListOfObjectID);
In option A I pass MyBigList to a function that extracts the scalars from the list, stores these in the DB and returns the row where these entries were made. In option B, I keep MyBigList in the page method where I do the extraction of the scalars and I just pass these to the PutScalarsInDB function.
What's the better option, or could it be that yet another approach is better? I'm concerned about passing around objects of this size and about memory usage.
I don't think you'll see a material difference between these two approaches. From your description, it sounds like you'll be burning the same CPU cycles either way. The things that matter are:
Get the list
Iterate through the list to get the IDs
Iterate through the list to update the database
The order in which these three activities occur, and whether they occur within a single method or a subroutine, doesn't matter. All other activities (declaring variables, assigning results, etc.,) are of zero to negligible performance impact.
Other things being equal, your first option may be slightly more performant because you'll only be iterating once, I assume, both extracting IDs and updating the database in a single pass. But the cost of iteration will likely be very small compared with the cost of updating the database, so it's not a performance difference you're likely to notice.
Having said all that, there are many, many more factors that may impact performance, such as the type of list you're iterating through, the speed of your connection to the database, etc., that could dwarf these other considerations. It doesn't look like too much code either way. I'd strongly suggest building both and testing them.
Then let us know your results!
If you want to know which method has better performance, you can use the Stopwatch class to check the time needed for each method. See here for Stopwatch usage: http://www.dotnetperls.com/stopwatch
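A minimal sketch, assuming hypothetical methods OptionA() and OptionB() that wrap the two versions:
using System;
using System.Diagnostics;

// Run each option several times and discard the first (JIT warm-up) run for more stable numbers.
var sw = Stopwatch.StartNew();
OptionA();
sw.Stop();
Console.WriteLine("Option A: " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
OptionB();
sw.Stop();
Console.WriteLine("Option B: " + sw.ElapsedMilliseconds + " ms");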
I think there are other issues you need to verify for an ASP.NET application:
Where do you read your list from? If you read it from the database, would it be more efficient to do the work in the database within a stored procedure?
Where is it stored? Is it only read and then discarded, or is it stored in session or application state?
So in the following bit of code, I really like the verbosity of scenario one, but I would like to know how big of a performance hit this takes when compared to scenario two. Is instantiation in the loop a big deal?
Is the syntactic benefit (which I like, some might not even agree with it being verbose or optimal) worth said performance hit? You can assume that the collections remain reasonably small (N < a few hundred).
// First scenario
var productCategoryModels = new List<ProductCategoryModel>();
foreach (var productCategory in productCategories)
{
    var model = new ProductCategoryModel.ProductCategoryModelConverter(currentContext).Convert(productCategory);
    productCategoryModels.Add(model);
}

// Second scenario
var productCategoryModels = new List<ProductCategoryModel>();
var modelConvert = new ProductCategoryModel.ProductCategoryModelConverter(currentContext);
foreach (var productCategory in productCategories)
{
    var model = modelConvert.Convert(productCategory);
    productCategoryModels.Add(model);
}
Would love to hear your guys' thoughts on this as I see this quite often.
I would approach this question a bit differently. If whatever happens in new ProductCategoryModel.ProductCategoryModelConverter(currentContext) doesn't change during the loop, I see no reason to include it within the loop. If it is not part of the loop, it shouldn't be in there, imo.
If you include it just because it looks better, you force the reader to figure out if it makes a difference or not.
Like Brian, I see no reason to create a new instance if you're not actually changing it - I'm assuming that Convert doesn't change the original object?
I'd recommend using LINQ to avoid the loop entirely though (in terms of your own code):
var modelConverter = new ProductCategoryModelConverter(currentContext);
var models = productCategories.Select(x => modelConverter.Convert(x))
.ToList();
In terms of performance, it would depend on what ProductCategoryModelConverter's constructor had to do. Just creating a new object is pretty cheap, in terms of overhead. It's not free, of course - but I wouldn't expect this to cause a bottleneck in most cases. However, I'd say that instantiating it in a loop would imply to the reader that it's necessary; that there was some reason not to use just a single object. A converter certainly sounds like something which will stay immutable as it does its work... so I'd be puzzled by the instantiate-in-a-loop version.
Keep whichever of the formats you're happiest with - if they both perform well enough for your application. Optimize later if you encounter performance problems.
My quandary is: which of the following two methods performs best?
Goal - get an object of type Wrapper (defined below)
Criteria - speed over storage
No. of records - about 1000-2000, max about 6K
Choices - create the object on the fly, or do a lookup from a dictionary
Execution speed - called x times per second
NB - I need to deliver the working code first and then go for optimization, hence if any theorists can provide glimpses of the behind-the-scenes info, that'll help before I get to the actual performance test, possibly by EOD Thursday.
Definitions -
class Wrapper
{
    public readonly DataRow Row;

    public Wrapper(DataRow dr)
    {
        Row = dr;
    }

    public string ID { get { return Row["id"].ToString(); } }
    public string ID2 { get { return Row["id2"].ToString(); } }
    public string ID3 { get { return Row["id3"].ToString(); } }
    public double Dbl1 { get { return (double)Row["dbl1"]; } }
    // ... about 12 such fields in total!
}

Dictionary<string, Wrapper> dictWrappers;
Method 1
Wrapper o = new Wrapper(dr);
// some action with o
myMethod(o);
Method 2
Wrapper o;
if (!dictWrappers.TryGetValue(dr["id"].ToString(), out o))
{
    o = new Wrapper(dr);
    dictWrappers.Add(o.ID, o);
}
// some action with o
myMethod(o);
Never optimize without profiling first.
Never profile unless the code does not meet specifications/expectations.
If you need to profile this code, write it both ways and benchmark it with your expected load.
EDIT: I try to favor the following over optimization unless performance is unacceptable:
Simplicity
Readability
Maintainability
Testability
I've (recently) seen highly-optimized code that was very difficult to debug. I refactored it to simplify it, then ran performance tests. The performance was unacceptable, so I profiled it, found the bottlenecks, and optimized only those. I re-ran the performance tests, and the new code was comparable to the highly-optimized version. And it's now much easier to maintain.
Here's a free profiling tool.
The first one would be faster, since it isn't actually doing a lookup; it is just doing a simple allocation and an assignment.
The two segments of code are not functionally equivalent, however, because Method 1 could create many duplicates.
Without actually testing I would expect that caching the field values in Wrapper (that is, avoiding all the ToString calls and casts) would probably have more of an impact on performance.
Then once you are caching those values you will probably want to keep instances of Wrapper around rather than frequently recreate them.
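A rough sketch of that idea (my own, assuming the column set is fixed): read each column once in the constructor instead of on every property access.
using System.Data;

class CachedWrapper
{
    public readonly string ID;
    public readonly string ID2;
    public readonly double Dbl1;
    // ... remaining fields elided

    public CachedWrapper(DataRow dr)
    {
        // Pay the string lookup, ToString and cast costs once per row.
        ID = dr["id"].ToString();
        ID2 = dr["id2"].ToString();
        Dbl1 = (double)dr["dbl1"];
    }
}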
Assuming that you're really worried about perf (hey, it happens), then your underlying wrapper itself could be improved. You're doing field lookups by string. If you're going to make the call a lot with the same field set in the row, it's actually faster to cache the ordinals and look up by ordinal (sketched below).
Of course this is only if you really, really need to worry about performance, and the instances where this would make a difference are fairly rare (though in embedded devices it's not as rare as on the desktop).
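A sketch of the ordinal-caching idea (illustrative only, assuming the rows share one DataTable schema):
using System.Data;

class OrdinalWrapper
{
    public readonly DataRow Row;
    private readonly int _idOrdinal;
    private readonly int _dbl1Ordinal;

    public OrdinalWrapper(DataRow dr)
    {
        Row = dr;
        // Resolve the column positions once; hoist this per-table if many rows share a schema.
        _idOrdinal = dr.Table.Columns.IndexOf("id");
        _dbl1Ordinal = dr.Table.Columns.IndexOf("dbl1");
    }

    public string ID { get { return Row[_idOrdinal].ToString(); } }
    public double Dbl1 { get { return (double)Row[_dbl1Ordinal]; } }
}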
I currently have a function:
public static Attribute GetAttribute(MemberInfo Member, Type AttributeType)
{
    Object[] Attributes = Member.GetCustomAttributes(AttributeType, true);
    if (Attributes.Length > 0)
        return (Attribute)Attributes[0];
    else
        return null;
}
I am wondering if it would be worthwhile to cache all the attributes on a property into an
Attribute = _cache[MemberInfo][Type] dictionary.
This would require using GetCustomAttributes without any type parameter and then enumerating over the result. Is it worth it?
You will get better bang for your buck if you replace the body of your method with this:
return Attribute.GetCustomAttribute(Member, AttributeType, false); // only look at the current member and don't go up the inheritance tree
If you really need to cache on a type-basis:
public static class MyCacheFor<T>
{
    static MyCacheFor()
    {
        // grab the data
        Value = ExtractExpensiveData(typeof(T));
    }

    public static readonly MyExpensiveToExtractData Value;

    private static MyExpensiveToExtractData ExtractExpensiveData(Type type)
    {
        // ...
    }
}
Beats dictionary lookups every time. Plus it's thread-safe :)
Cheers,
Florian
PS: It depends how often you call this. I had some cases where doing a lot of serialization using reflection really called for caching. As usual, you want to measure the performance gain versus the memory usage increase. Instrument your memory use and profile your CPU time.
The only way you can know for sure, is to profile it. I am sorry if this sounds like a cliche. But the reason why a saying is a cliche is often because it's true.
Caching the attribute is actually making the code more complex, and more error prone. So you might want to take this into account (your development time) before you decide.
So like optimization, don't do it unless you have to.
From my experience (I am talking about an AutoCAD-like Windows application, with a lot of click-edit GUI operations and heavy number crunching), reading custom attributes was never - not even once - the performance bottleneck.
I just had a scenario where GetCustomAttributes turned out to be the performance bottleneck. In my case it was getting called hundreds of thousands of times in a dataset with many rows and this made the problem easy to isolate. Caching the attributes solved the problem.
Preliminary testing led to a barely noticeable performance hit at about 5000 calls on a modern day machine. (And it became drastically more noticeable as the dataset size increased.)
I generally agree with the other answers about premature optimization, however, on a scale of CPU instruction to DB call, I'd suggest that GetCustomAttributes would lean more towards the latter.
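For reference, a minimal sketch of the kind of per-member cache I mean (my own illustration, not the exact code from that project), keyed on the member/attribute-type pair:
using System;
using System.Collections.Concurrent;
using System.Reflection;

static class AttributeCache
{
    private static readonly ConcurrentDictionary<(MemberInfo Member, Type AttrType), Attribute> _cache =
        new ConcurrentDictionary<(MemberInfo Member, Type AttrType), Attribute>();

    public static Attribute GetAttribute(MemberInfo member, Type attributeType)
    {
        // GetCustomAttribute runs only on the first request for each pair; later calls hit the dictionary.
        return _cache.GetOrAdd((member, attributeType),
            key => Attribute.GetCustomAttribute(key.Member, key.AttrType, false));
    }
}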
Your question is a case of premature optimization.
You don't know the inner workings of the reflection classes and therefore are making assumptions about the performance implications of calling GetCustomAttributes multiple times. The method itself could well cache its output already, meaning your code would actually add overhead with no performance improvement.
Save your brain cycles for thinking about things which you already know are problems!
Old question, but GetCustomAttributes is costly/heavyweight.
Using a cache if it is causing performance problems can be a good idea.
The article I linked (Dodge Common Performance Pitfalls to Craft Speedy Applications) was taken down, but here is a link to an archived version:
https://web.archive.org/web/20150118044646/http://msdn.microsoft.com:80/en-us/magazine/cc163759.aspx
Are you actually having a performance problem? If not then don't do it until you need it.
It might help, depending on how often you call the method with the same parameters. If you only call it once per MemberInfo/Type combination, then it won't do any good. Even if you do cache it, you are trading speed for memory consumption. That might be fine for your application.