When Would You Implement Your Own Sorting Algorithm?

Forgive me if this is a silly question... but I think back to my Comp. Sci. classes and I distinctly remember learning, and being quizzed on, several sorting algorithms and their corresponding 'Big O' notation.
Outside of the classroom though, I've never actually written code to sort.
When I get results from a database, I use 'Order By'. Otherwise, I use a collection class that implements a sort. I have implemented IComparable to allow sorting, but I've never gone beyond that.
Was sorting always just an academic pursuit for those of us who don't implement languages/frameworks? Or is it just that modern languages running on modern hardware make it a trivial detail to worry about?
Finally, when I call .Sort on a List(Of String), for example, what sort algorithm is being used under the hood?

While you might rarely need to implement a sorting algorithm yourself, understanding the different algorithms and their complexity can help you solve more complex problems.
Finally, when I call .Sort on a List(Of String), for example, what sort algorithm is being used under the hood?
Quick Sort

I haven't implemented my own sorting algorithm once since I took my CS classes in college, and if I were ever even contemplating writing my own, I'd want my head examined first.
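List<T> uses Quicksort per the MSDN documentation (for .NET Framework 4 and earlier; since .NET Framework 4.5, the underlying Array.Sort uses an introspective sort that switches between insertion sort, heapsort, and quicksort):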
http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx

You probably won't implement your own sorting algorithm if you are using high-level languages...
What you learnt in the classroom was merely there to teach you about the existence and importance of big O (omicron) notation.
It was there to make you aware that optimization is always a goal in programming, and that when you write code you must always think about how it will execute.
It teaches you that nested loops and recursion can lead to big performance problems if they are not analyzed/optimized well before coding starts.
It gives you a way to check your design beforehand and to estimate its execution speed.
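To make the nested-loop point concrete, here is a minimal C# sketch (the class and method names are illustrative, not from any answer) of the same duplicate check written two ways: the nested loops compare every pair, which is O(n²), while a HashSet<T> brings it down to roughly O(n) on average.

```csharp
using System.Collections.Generic;

public static class DuplicateCheck
{
    // O(n^2): compares every pair, so doubling the input roughly quadruples the work.
    public static bool HasDuplicatesNested(int[] items)
    {
        for (int i = 0; i < items.Length; i++)
            for (int j = i + 1; j < items.Length; j++)
                if (items[i] == items[j])
                    return true;
        return false;
    }

    // O(n) on average: HashSet<T>.Add returns false when the value is already present.
    public static bool HasDuplicatesHashed(int[] items)
    {
        var seen = new HashSet<int>();
        foreach (int item in items)
            if (!seen.Add(item))
                return true;
        return false;
    }
}
```

Both methods return the same answer; only the way the running time grows with the input size differs.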

It is important for a programmer to know how these algorithms work. One reason is that certain algorithms perform better under certain conditions, although sorting is rarely the bottleneck.
In some frameworks, the .Sort function uses various methods, depending on the situation.

Modern languages running on modern hardware make it a trivial detail to worry about, unless a profiler shows that sorting is the bottleneck of your code.
According to this, List.Sort uses Array.Sort, which uses QuickSort.

IMO, it's become a bit of an academic exercise. You need to understand algorithmic complexity, and sorting is a good example for working through it because you can easily see the results and calculate the different complexities. In real life, though, there's almost certainly a library call that sorts your range faster than you could if you tried to roll your own.
I don't know what the .NET libraries use for their default implementation, but I'd guess it's Quicksort or Shellsort. I'd be interested to find out if it's something else.

I've occasionally had to write my own sort methods, but only when I was writing for a relatively immature and underpowered platform (like .NET 1.1 on Windows CE, or embedded Java, or some such).
On anything resembling a modern computer, the best sorting algorithm is the ORDER BY clause in T-SQL.

Implementing your own sort is the kind of thing you do to gain insight into how algorithms work, what the trade-offs are, which tried-and-true approaches are known to solve a wide array of problems efficiently, etc.
As Darin Dimitrov's answer states, library sort routines need to have a very competitive average-case performance, so quicksort is typically chosen.

Was sorting always just an academic pursuit for those of us who don't implement languages/frameworks? Or is it just that modern languages running on modern hardware make it a trivial detail to worry about?
Even if you're not implementing your own routine, you may know how your data is likely to be arranged, and may want to choose a suitable library algorithm. For example, see this discussion on nearly sorted data.
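As an illustration of why the arrangement of the data matters: insertion sort does work proportional to n plus the number of out-of-order pairs, so on nearly sorted input it approaches linear time, even though its worst case is O(n²). The sketch below is illustrative only (the class name is hypothetical); in practice you would still reach for a library routine first.

```csharp
using System.Collections.Generic;

public static class NearlySorted
{
    // Insertion sort: each element shifts left past the larger elements before it.
    // The total number of shifts equals the number of inversions, so input that is
    // already almost in order costs little more than a single pass.
    public static void InsertionSort<T>(IList<T> list, IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;
        for (int i = 1; i < list.Count; i++)
        {
            T current = list[i];
            int j = i - 1;
            while (j >= 0 && comparer.Compare(list[j], current) > 0)
            {
                list[j + 1] = list[j];
                j--;
            }
            list[j + 1] = current;
        }
    }
}
```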

I think there are times when you need to have a custom sorting method.
What if you wanted to sort by the make of cars, but not alphabetically?
For example, you have a database with the makes: Honda, Lexus, Toyota, Acura, Nissan, Infiniti
If you use a plain sort, you get the order: Acura, Honda, Infiniti, Lexus, Nissan, Toyota
What if you wanted to sort them so that each car company's standard and luxury brands sit together? Lexus, Toyota, Honda, Acura, Nissan, Infiniti.
I think you would need a custom sort method if that's the case.
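For what it's worth, an ordering like this can be expressed without writing a new sorting algorithm: the built-in List<T>.Sort accepts a Comparison<T>, so you can supply the custom order and let the library do the sorting. A minimal sketch, where the hypothetical rank table simply encodes the order listed above and any make not in the table goes last, alphabetically:

```csharp
using System.Collections.Generic;

public static class CarMakeOrder
{
    // Hypothetical rank table: each manufacturer's standard and luxury brands are adjacent.
    private static readonly Dictionary<string, int> Rank = new Dictionary<string, int>
    {
        ["Lexus"] = 0, ["Toyota"] = 1,
        ["Honda"] = 2, ["Acura"] = 3,
        ["Nissan"] = 4, ["Infiniti"] = 5,
    };

    // Makes missing from the table get the largest rank, so they sort last.
    private static int RankOf(string make) =>
        Rank.TryGetValue(make, out int r) ? r : int.MaxValue;

    // The library sort does the work; we only supply the comparison.
    public static void SortByGroup(List<string> makes) =>
        makes.Sort((a, b) =>
        {
            int byRank = RankOf(a).CompareTo(RankOf(b));
            return byRank != 0 ? byRank : string.CompareOrdinal(a, b);
        });
}
```

The same comparison could also be packaged as an IComparer<string>, as with the IComparable approach mentioned in the question.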

Related

Object Auditing

Currently we have quite a chunky auditing system for objects within our application. The flow goes like this:
- The class implements an interface
- The interface forces the class to override some methods that add the properties needing auditing to a List of KeyValuePairs
- The class then also needs to recreate the object's state from a list of key-value pairs
Now the developer needs to add all this to their class; also, our objects change quite often, so we didn't just serialise the class.
What I would like to do is to use attributes to mark the properties as auditable and then do everything automatically so the developer doesn't really need to do anything.
My main question is: I know people always say reflection is slow, but how slow are we talking? What performance hit am I going to take from looking through the class, looking at the attributes on each property, and then doing any required logic?
Thanks for any help,
Ste

It's hard to give a specific answer because it depends on what adequate performance is for your application.
Reflection is slower than normal compiled code, but when worrying about performance problems it's always better to have something that works and then use profiling to find the real performance bottleneck and optimize.
Premature optimization could lead to code that's much harder to maintain so your developers will be less productive.
I would start with using reflection and write a good set of unit tests so you know your code is working. If performance turns out to be a problem you can use the Visual Studio profiler to profile your unit tests and discover the bottlenecks.
There are some libraries that can speed up reflection, or you could use Expression trees to replace your reflection code if it's too slow.

Whether the performance is OK depends on your app's context, so it's difficult to say whether it will be slow or fast for you; you should try it yourself.
Most probably, IMO, it would give pretty acceptable performance, but again, I have no idea where you're going to use it.
Other solutions that come to mind:
SQLite, to store the key/value data
Aspect-oriented programming (e.g. PostSharp) to generate the code at compile time.
But the first thing I would try is reflection, just as you're thinking.

Reading this response from Marc I would suggest that Reflection should be fine for most application needs.
Before making any fundamental changes, I would suggest running a profiler to find the bottlenecks in your code. If you identify the reflection/auditing process as the major pain point, generate IL with Reflection.Emit and try again.

Reflection is the way to go here. If it's too slow (measure!), you can throw in a bit of caching, or in the worst case generate an Expression<T> and compile it.
There are two phases in your problem:
Figure out which properties you want, and return a list of their PropertyInfos. You need to do this only once per type, and then you can cache it. Thus performance of this step doesn't matter.
Getting the value of each property with PropertyInfo.GetValue.
If step 2 is too slow, you need to generate an Expression in step 1, and the overhead over manually written code goes down to a single delegate invocation.
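Putting the two phases together, a minimal sketch of the attribute-based approach from the question might look like this. The AuditableAttribute, AuditReader, and Order names are hypothetical. Phase 1 (finding the marked properties and compiling a getter for each) runs once per type and is cached; phase 2 is then one delegate call per audited property.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical marker attribute for properties that should be audited.
[AttributeUsage(AttributeTargets.Property)]
public sealed class AuditableAttribute : Attribute { }

public static class AuditReader
{
    // Phase 1 result, cached per type: property name plus a compiled getter.
    private static readonly ConcurrentDictionary<Type, KeyValuePair<string, Func<object, object>>[]> Cache =
        new ConcurrentDictionary<Type, KeyValuePair<string, Func<object, object>>[]>();

    public static List<KeyValuePair<string, object>> Read(object entity)
    {
        var getters = Cache.GetOrAdd(entity.GetType(), BuildGetters);
        // Phase 2: one delegate invocation per audited property, no reflection.
        return getters.Select(g => new KeyValuePair<string, object>(g.Key, g.Value(entity))).ToList();
    }

    private static KeyValuePair<string, Func<object, object>>[] BuildGetters(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.IsDefined(typeof(AuditableAttribute), inherit: true))
            .Select(p => new KeyValuePair<string, Func<object, object>>(p.Name, CompileGetter(p)))
            .ToArray();

    // Builds and compiles: (object o) => (object)((DeclaringType)o).Property
    private static Func<object, object> CompileGetter(PropertyInfo property)
    {
        var instance = Expression.Parameter(typeof(object), "o");
        var typed = Expression.Convert(instance, property.DeclaringType);
        var body = Expression.Convert(Expression.Property(typed, property), typeof(object));
        return Expression.Lambda<Func<object, object>>(body, instance).Compile();
    }
}

public class Order
{
    [Auditable] public int Id { get; set; }
    [Auditable] public decimal Total { get; set; }
    public string Notes { get; set; }   // not marked, so not audited
}
```

Restoring state from the key/value pairs would need a matching compiled setter per property (for example built with Expression.Assign); that part is left out to keep the sketch short.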

Are there significant performance gains inherent in using .NET's built in classes?

Quick little question...
I know that sometimes in other languages libraries have part of their code written in platform-specific straight C for performance reasons. In such cases you can get huge performance gains by using library code wherever possible.
So does the .NET platform do this? Is Microsoft's implementation of the Base Class Library optimized in some way that I can't hope to match in managed code?
What about something little like using KeyValuePair as a type-safe tuple struct instead of writing my own?

As far as I know, the .NET Framework hasn't been compiled in a way that creates hooks into some otherwise-inaccessible hardware acceleration or something like that, so for simple things like KeyValuePair and Tuple, you're probably safe rolling your own.
However, there are a number of other advantages to using standard framework classes, and I'd hesitate to write my own without a strong reason.
They're already written, so why give yourself extra work?
Microsoft has put their code through a pretty rigorous vetting process, so there's a good chance that their code will be more correct and more efficient than yours will.
Other developers that have to look at your code will know exactly what to expect when they see standard framework classes being used, whereas your home-brewed stuff might make them scratch their heads for a while.
Update
@gordy also makes a good point: the standard framework classes are used by everybody and their dog, so there will be a slight performance gain simply because:
the class likely won't have to be statically instantiated or just-in-time compiled just for your code,
the class's instructions are more likely to already be loaded in a cache, since they'll likely have been used recently by other parts of code. By using them, you're less likely to need to load the code into a cache in the first place, and you're less likely to be kicking other code out of the cache that's likely to be used again soon.

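I've wondered this myself, but I suspect that it's not the case, since you can "decompile" all of the base libraries in Reflector. (A few methods, Math.Sin for example, do show up as extern with MethodImplOptions.InternalCall, which means they are implemented natively inside the CLR rather than in IL.)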
There's probably still a performance advantage over homemade stuff in that the code is likely jitted already and cached.

I suggest you use built-in classes most of the time, UNLESS YOU'VE MEASURED IT'S NOT FAST ENOUGH.
I'm pretty sure MS put a lot of time and effort into building something fast and reliable. It is totally possible you can beat them... after a few weeks of effort. I just don't think it is worth the time most of the time.
The only time it seems ok to rewrite something is when it does not do all that you want. Just be aware of the time cost and the associated difficulty.

Could you ever hope to match the performance? Possibly, though keep in mind their code has been fully tested and extremely optimized, so I'd say it's not a worth-while effort unless you have a very specific need that there isn't a BCL type that directly fulfills.
And .NET 4.0 already has a good Tuple<> implementation. In earlier versions of .NET, though, you'd have to roll your own if you needed anything bigger than a KeyValuePair.

The real performance gain comes from the fact that the MS team built and tested the library methods. You can rest assured with a very high degree of comfort that the objects will behave without introducing bugs.
Then there is the matter of re-inventing the wheel. You'd really have to have a great reason for doing so.

The main performance factors always lie in the architecture or in complex algorithms; the language doesn't matter.
Microsoft's Base Class Library documentation usually states the complexity of "heavy" methods, so you can easily decide whether to use them or to find (or implement) a "faster" algorithm.
Of course, when it comes to heavy algorithms (graphics, archiving, etc.), the performance gains from going to a lower-level language come in handy.

Cutting Stock Problem

Does anyone know how to implement the algorithm for this problem using the Knapsack algorithm?
The method I'm using at present makes extensive use of LINQ, collections of collections, and a few Dictionaries. For those who don't know what I'm talking about, check out The Cutting Stock Problem.

As mentioned in your link, this problem is in fact an instance of integer linear programming (ILP), which is NP-hard in general.
Directly from Wikipedia: Advanced algorithms for solving integer linear programs include:
cutting-plane method
branch and bound
branch and cut
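On the Knapsack part of the question: in the classic column-generation formulation of cutting stock (Gilmore and Gomory), each new cutting pattern is found by solving a knapsack problem over the piece lengths, weighted by the current dual prices of the LP relaxation. Below is a minimal sketch of just that subproblem as an unbounded-knapsack dynamic program, assuming positive integer lengths. The piece values are inputs you supply (dual prices in column generation, or simply the lengths if you only want to minimise waste on one stock); the surrounding LP and branch-and-bound machinery is not shown, and the names are hypothetical.

```csharp
public static class CuttingPattern
{
    // Unbounded knapsack over one stock length: how many of each piece type to cut
    // so that the total value is maximal. lengths[i] and values[i] describe piece type i.
    public static (double Value, int[] Counts) BestPattern(int stockLength, int[] lengths, double[] values)
    {
        var best = new double[stockLength + 1]; // best[c]: max value within capacity c
        var choice = new int[stockLength + 1];  // piece cut last at capacity c; -1 = one unit of waste
        for (int c = 1; c <= stockLength; c++)
        {
            best[c] = best[c - 1];
            choice[c] = -1;
            for (int i = 0; i < lengths.Length; i++)
            {
                if (lengths[i] <= c && best[c - lengths[i]] + values[i] > best[c])
                {
                    best[c] = best[c - lengths[i]] + values[i];
                    choice[c] = i;
                }
            }
        }

        // Walk the recorded choices backwards to recover the piece counts.
        var counts = new int[lengths.Length];
        for (int c = stockLength; c > 0; )
        {
            int i = choice[c];
            if (i < 0) { c--; }
            else { counts[i]++; c -= lengths[i]; }
        }
        return (best[stockLength], counts);
    }
}
```

For example, with a stock of length 10 and pieces of length 3 and 4 valued by their length, the best pattern is two 3s and one 4 with no waste.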

Is F# really better than C# for math?

Unmanaged languages notwithstanding, is F# really better than C# for implementing math? And if that's the case, why?

I think most of the important points were already mentioned by someone else:
F# lets you solve problems in a way mathematicians think about them
Thanks to higher-order functions, you can use simpler concepts to solve difficult problems
Everything is immutable by default, which makes the program easier to understand (and also easier to parallelize)
It is definitely true that you can use some of the F# concepts in C# 3.0, but there are limitations. You cannot rely on deep recursive computations (because C# doesn't guarantee tail calls), and recursion is how you write primitive computations in a functional/mathematical way. Also, writing complex higher-order functions (that take other functions as arguments) in C# is difficult, because you have to write types explicitly (while in F#, types are inferred, but also automatically generalized, so you don't have to explicitly make a function generic).
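To illustrate both points in C# (the names are illustrative): a generic higher-order function has to spell out every type parameter, whereas in F# `let compose f g x = g (f x)` is inferred and generalized automatically; and because the C# compiler does not guarantee tail calls, a deeply recursive definition is usually rewritten as a loop to avoid a StackOverflowException.

```csharp
using System;

public static class Functional
{
    // Every type parameter has to be declared explicitly.
    public static Func<A, C> Compose<A, B, C>(Func<A, B> f, Func<B, C> g) => x => g(f(x));

    // Recursive definition: each call may add a stack frame, so a very large n
    // can overflow the stack (tail-call elimination is not guaranteed in C#).
    public static long SumToRecursive(long n, long acc = 0) =>
        n == 0 ? acc : SumToRecursive(n - 1, acc + n);

    // The same recurrence written as a loop uses constant stack space.
    public static long SumToLoop(long n)
    {
        long acc = 0;
        for (long i = n; i > 0; i--) acc += i;
        return acc;
    }
}
```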
Also, I think the following point from Marc Gravell isn't a valid objection:
From a maintenance angle, I'm of the view that suitably named properties etc are easier to use (over full life-cycle) than tuples and head/tail lists, but that might just be me.
This is of course true. However, the great thing about F# is that you can start writing the program using tuples & head/tail lists and later in the development process turn it into a program that uses .NET IEnumerables and types with properties (and that's how I believe a typical F# programmer works*). Tuples etc. and F# interactive development tools give you a great way to quickly prototype solutions (and when doing something mathematical, this is essential because most of the development is just experimenting when you're looking for the best solution). Once you have the prototype, you can use simple source code transformations to wrap the code inside an F# type (which can also be used from C# as an ordinary class). F# also gives you a lot of ways to optimize the code later in terms of performance.
This gives you the benefits of easy-to-use languages (e.g. Python), which many people use for the prototyping phase. However, once you're done prototyping, you don't have to rewrite the whole program in an efficient language (e.g. C++ or perhaps C#), because F# is both "easy to use" and "efficient" and you can fluently switch between these two styles.
(*) I also use this style in my functional programming book.

F# has many enormous benefits over C# in the context of mathematical programs:
F# interactive sessions let you run code on-the-fly to obtain results immediately and even visualize them, without having to build and execute a complete application.
F# supports some features that can provide massive performance improvements in the context of mathematics. Most notably, the combination of inline and higher-order functions allow mathematical code to be elegantly factored without adversely affecting performance. C# cannot express this.
F# supports some features that make it possible to implement mathematical concepts far more naturally than can be obtained in C#. For example, tail calls make it much easier to implement recurrence relations simply and reliably. C# cannot express this either.
Mathematical problems often require the use of more sophisticated data structures and algorithms. Expressing complicated solutions is vastly easier with F# compared to C#.
If you would like a case study, I converted an implementation of QR decomposition over System.Double from 2kLOC of C#. The F# was only 100 lines of code, runs over 10× faster and is generalized over the type of number so it works not only on float32, float and System.Numerics.Complex but can even be applied to symbolic matrices to obtain symbolic results!
FWIW, I write books on this subject as well as commercial software.

F# supports units of measure, which can be very useful for math work.

I'm from a maths background, and have looked at F#, but I still prefer C# for most purposes. There are a couple of things that F# makes easier, but in general I still prefer C# by a large margin.
Some of the touted F# benefits (immutability, higher-order functions, etc) can still be done in C# (using delegates etc for the latter). This is even more apparent when using C# 3.0 with lambda support, which makes it very easy and expressive to declare functional code.
From a maintenance angle, I'm of the view that suitably named properties etc are easier to use (over full life-cycle) than tuples and head/tail lists, but that might just be me.
One of the areas where C# lets itself down for maths is in generics and their support for operators. So I spend some time addressing this ;-p My results are available in MiscUtil, with overview here.
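The operator problem is that a generic method cannot write a + b for an arbitrary T. One workaround, in the spirit of the MiscUtil approach mentioned above (this is a simplified sketch with hypothetical names, not MiscUtil's actual code), is to build the operator once per type with System.Linq.Expressions and cache the compiled delegate in a generic static class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class GenericMath<T>
{
    // Compiled once per closed type T. Expression.Add uses the primitive addition or a
    // user-defined op_Addition (e.g. decimal). If T has no + operator, building the
    // expression throws, which surfaces as a TypeInitializationException.
    public static readonly Func<T, T, T> Add = BuildAdd();

    private static Func<T, T, T> BuildAdd()
    {
        ParameterExpression a = Expression.Parameter(typeof(T), "a");
        ParameterExpression b = Expression.Parameter(typeof(T), "b");
        return Expression.Lambda<Func<T, T, T>>(Expression.Add(a, b), a, b).Compile();
    }

    public static T Sum(IEnumerable<T> values)
    {
        T total = default(T); // zero for the built-in numeric types
        foreach (T value in values)
            total = Add(total, value);
        return total;
    }
}
```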

This post looks like it might be relevant: http://fsharpnews.blogspot.com/2007/05/ffts-again.html
Also: C# / F# Performance comparison
The biggest advantage for pure math is what PerpetualCoder said, F# looks more like a math problem so it's going to be easier for a mathematician to write. It reminded me a lot of MATLAB when I looked at it.

I am not sure if it's better or worse, but there is certainly a difference in the approach. Imperative languages over-specify how a problem will be solved. Functional languages like F# or Haskell do not do that and are more tailored to how a mathematician would solve a particular problem. Then you have books like this that tout Python as good at it. If you are talking from a performance point of view, nothing can beat C. If you are talking about libraries, I believe functional languages (F# and the like), Fortran (yes, it's not dead yet), and Python have excellent libraries for math.

One of the great advantages of functional languages is that code without shared mutable state can be run in parallel on multi-processor or multi-core systems with little or no change.
That means you can often speed up your algorithms simply by adding cores.
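In .NET, the closest everyday example is PLINQ: when the per-element work is a pure function with no shared mutable state, adding AsParallel() is often the only change needed. Whether it is actually faster depends on how much work each element does and on the number of cores, so treat this as a sketch (the class name is hypothetical), not a performance claim.

```csharp
using System.Linq;

public static class ParallelSquares
{
    // The projection is pure: it reads only its argument and writes no shared state,
    // so the parallel and sequential queries must produce the same result.
    public static long SumOfSquares(int n, bool parallel)
    {
        var source = Enumerable.Range(1, n);
        return parallel
            ? source.AsParallel().Select(x => (long)x * x).Sum()
            : source.Select(x => (long)x * x).Sum();
    }
}
```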

How costly is .NET reflection?

I constantly hear how bad reflection is to use. While I generally avoid reflection and rarely find situations where it is impossible to solve my problem without it, I was wondering...
For those who have used reflection in applications, have you measured performance hits and, is it really so bad?

In his talk The Performance of Everyday Things, Jeff Richter shows that calling a method by reflection is about 1000 times slower than calling it normally.
Jeff's tip: if you need to call the method multiple times, use reflection once to find it, then assign it to a delegate, and then call the delegate.
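A minimal sketch of that tip, using a hypothetical Calculator class: the first method pays for the lookup and for MethodInfo.Invoke (argument array, boxing, runtime checks) on every call, while the second resolves the method once and binds it to a strongly typed open-instance delegate with Delegate.CreateDelegate, so each later call is an ordinary delegate invocation.

```csharp
using System;
using System.Reflection;

public class Calculator
{
    public int Add(int a, int b) => a + b;
}

public static class DelegateCaching
{
    // Slow path: look the method up and invoke it through reflection on every call.
    public static int CallViaReflection(Calculator calc, int a, int b)
    {
        MethodInfo add = typeof(Calculator).GetMethod("Add");
        return (int)add.Invoke(calc, new object[] { a, b });
    }

    // Looked up once; the instance is passed explicitly as the first delegate argument.
    private static readonly Func<Calculator, int, int, int> AddDelegate =
        (Func<Calculator, int, int, int>)Delegate.CreateDelegate(
            typeof(Func<Calculator, int, int, int>),
            typeof(Calculator).GetMethod("Add"));

    // Fast path: a plain delegate call, no per-call reflection.
    public static int CallViaDelegate(Calculator calc, int a, int b) => AddDelegate(calc, a, b);
}
```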

It is. But that depends on what you're trying to do.
I use reflection to dynamically load assemblies (plugins) and its performance "penalty" is not a problem, since the operation is something I do during startup of the application.
However, if you're reflecting inside a series of nested loops with reflection calls on each, I'd say you should revisit your code :)
For operations you perform only once or twice, reflection is perfectly acceptable and you won't notice any delay or problem with it. It's a very powerful mechanism, and it is even used by .NET itself, so I don't see why you shouldn't give it a try.

Reflection performance will depend on the implementation (repetitive calls should be cached, e.g. entity.GetType().GetProperty("PropName")). Since most of the reflection I see on a day-to-day basis is used to populate entities from data readers or other repository-type structures, I decided to benchmark performance specifically on reflection when it is used to get or set an object's properties.
I devised a test which I think is fair since it caches all the repeating calls and only times the actual SetValue or GetValue call. All the source code for the performance test is in bitbucket at: https://bitbucket.org/grenade/accessortest. Scrutiny is welcome and encouraged.
The conclusion I have come to is that it isn't practical and doesn't provide noticeable performance improvements to remove reflection in a data access layer that is returning less than 100,000 rows at a time when the reflection implementation is done well.
The graph above demonstrates the output of my little benchmark and shows that mechanisms that outperform reflection, only do so noticeably after the 100,000 cycles mark. Most DALs only return several hundred or perhaps thousands of rows at a time and at these levels reflection performs just fine.
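The shape of such a test can be sketched as follows. This is not the code from the linked repository; the Row class and the iteration count are placeholders. The PropertyInfo is looked up once, outside the timed region, so only the SetValue calls are measured against direct assignment. Timings vary by machine and runtime, so none are quoted here, and a serious benchmark would add warm-up and repeated runs (for example with a tool such as BenchmarkDotNet).

```csharp
using System.Diagnostics;
using System.Reflection;

public class Row
{
    public int Value { get; set; }
}

public static class SetterBenchmark
{
    // Returns elapsed ticks for direct assignment and for cached PropertyInfo.SetValue,
    // plus the final property value as a sanity check.
    public static (long DirectTicks, long ReflectionTicks, int FinalValue) Run(int iterations)
    {
        var row = new Row();
        PropertyInfo property = typeof(Row).GetProperty(nameof(Row.Value)); // cached, not timed

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            row.Value = i;
        long direct = sw.ElapsedTicks;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            property.SetValue(row, i, null);   // boxes i and goes through reflection checks
        long reflection = sw.ElapsedTicks;

        return (direct, reflection, row.Value);
    }
}
```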

If you're not in a loop, don't worry about it.

Not massively. I've never had an issue with it in desktop development unless, as Martin states, you're using it in a silly location. I've heard a lot of people have utterly irrational fears about its performance in desktop development.
In the Compact Framework (which I'm usually in) though, it's pretty much anathema and should be avoided like the plague in most cases. I can still get away with using it infrequently, but I have to be really careful with its application which is way less fun. :(

My most pertinent experience was writing code to compare any two data entities of the same type in a large object model property-wise. Got it working, tried it, ran like a dog, obviously.
I was despondent, then overnight realised that without changing the logic, I could use the same algorithm to auto-generate methods for doing the comparison but statically accessing the properties. It took no time at all to adapt the code for this purpose and I had the ability to do deep property-wise comparison of entities with static code that could be updated at the click of a button whenever the object model changed.
My point being: in conversations with colleagues since then, I have several times pointed out that their use of reflection could instead be used to autogenerate code to compile, rather than to perform runtime operations, and this is often worth considering.
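A minimal sketch of that idea, with hypothetical names: reflection runs once, at build or tooling time, to emit C# source for a property-wise comparer, and the generated method then accesses the properties statically, with no reflection at runtime.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ComparerGenerator
{
    // Emits C# source for a static, property-wise equality method for the given type.
    public static string Generate(Type type)
    {
        var checks = type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0) // skip indexers
            .OrderBy(p => p.Name, StringComparer.Ordinal)                // stable output between runs
            .Select(p => $"object.Equals(a.{p.Name}, b.{p.Name})")
            .ToList();

        string condition = checks.Count == 0 ? "true" : string.Join("\n        && ", checks);
        var sb = new StringBuilder();
        sb.AppendLine($"public static bool AreEqual({type.Name} a, {type.Name} b)");
        sb.AppendLine("{");
        sb.AppendLine("    return " + condition + ";");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

The generated text would be written to a .cs file and compiled with the rest of the project, then regenerated whenever the object model changes.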

It's bad enough that you have to be worried even about reflection done internally by the .NET libraries for performance-critical code.
The following example is obsolete - true at the time (2008), but long ago fixed in more recent CLR versions. Reflection in general is still a somewhat costly thing, though!
Case in point: You should never use a member declared as "Object" in a lock (C#) / SyncLock (VB.NET) statement in high-performance code. Why? Because the CLR can't lock on a value type, which means that it has to do a run-time reflection type check to see whether or not your Object is actually a value type instead of a reference type.

As with all things in programming, you have to balance performance cost with any benefit gained. Reflection is an invaluable tool when used with care. I created an O/R mapping library in C# which used reflection to do the bindings. This worked fantastically well. Most of the reflection code was only executed once, so any performance hit was quite small, but the benefits were great. If I were writing a new-fangled sorting algorithm, I would probably not use reflection, since it would probably scale poorly.
I appreciate that I haven't exactly answered your question here. My point is that it doesn't really matter. Use reflection where appropriate. It's just another language feature that you need to learn how and when to use.

Reflection can have noticeable impact on performance if you use it for frequent object creation. I've developed application based on Composite UI Application Block which is relying on reflection heavily. There was a noticeable performance degradation related with objects creation via reflection.
However, in most cases there are no problems with reflection usage. If your only need is to inspect some assembly, I would recommend Mono.Cecil, which is very lightweight and fast.
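For the frequent-object-creation case specifically, one common mitigation (a sketch with hypothetical names, not what the Composite UI Application Block does internally) is to compile a constructor call once per type with an expression tree and reuse the resulting delegate, instead of calling Activator.CreateInstance every time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

public static class FastActivator
{
    private static readonly ConcurrentDictionary<Type, Func<object>> Cache =
        new ConcurrentDictionary<Type, Func<object>>();

    // Reflection-based creation on every call, for comparison.
    public static object CreateWithActivator(Type type) => Activator.CreateInstance(type);

    // Compiles "() => (object)new <type>()" once per type, then reuses the delegate.
    // Expression.New(Type) requires a parameterless constructor.
    public static object Create(Type type) =>
        Cache.GetOrAdd(type, t =>
            Expression.Lambda<Func<object>>(
                Expression.Convert(Expression.New(t), typeof(object))).Compile())();
}
```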

Reflection is costly because of the many checks the runtime must make whenever you make a request for a method that matches a list of parameters. Somewhere deep inside, code exists that loops over all methods for a type, verifies its visibility, checks the return type and also checks the type of each and every parameter. All of this stuff costs time.
When you execute that method, internally there's some code that does things like checking that you passed a compatible list of parameters before executing the actual target method.
If possible, it is always recommended that one caches the method handle if one is going to reuse it continually in the future. Like all good programming tips, it often makes sense to avoid repeating oneself. In this case it would be wasteful to look up the method with certain parameters again and then execute it each and every time.
Poke around the source and take a look at what's being done.

As with everything, it's all about assessing the situation. In DotNetNuke there's a fairly core component called FillObject that uses reflection to populate objects from datarows.
This is a fairly common scenario and there's an article on MSDN, Using Reflection to Bind Business Objects to ASP.NET Form Controls that covers the performance issues.
Performance aside, one thing I don't like about using reflection in that particular scenario is that it tends to reduce the ability to understand the code at a quick glance which for me doesn't seem worth the effort when you consider you also lose compile time safety as opposed to strongly typed datasets or something like LINQ to SQL.

Reflection does not drastically slow the performance of your app. You may be able to do certain things quicker by not using reflection, but if Reflection is the easiest way to achieve some functionality, then use it. You can always refactor your code away from Reflection if it becomes a perf problem.

I think you will find that the answer is, it depends. It's not a big deal if you want to put it in your task-list application. It is a big deal if you want to put it in Facebook's persistence library.
