In which situations are CERs useful? I mean, real-life situations, not some abstract examples.
Do you personally use them? I haven't seen them used except in examples in books and articles, which may well be due to my limited programming experience. So I am also interested in how widespread a technique it is.
What are the pros and cons for using them?
In which situations are CERs useful? I mean, real-life situations, not some abstract examples.
When building software that has stringent reliability requirements. Database servers, for example, must not leak resources, must not corrupt internal data structures, and must keep running, period, end of story, even in the face of godawful scenarios like thread aborts.
Building managed code that cannot possibly leak, that maintains consistent data structures when aborts can happen at arbitrary places, and keeps the service going is a difficult task. CERs are one of the tools in the toolbox for building such reliable services; in this case, by restricting where aborts can occur.
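For concreteness, here is a minimal sketch of the mechanism in C#, using the RuntimeHelpers.PrepareConstrainedRegions / ReliabilityContract API. (Note: CERs are only honored by the .NET Framework CLR; .NET Core and later treat them as no-ops. The class and member names here are illustrative.)

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

class ResourceHandle
{
    public bool Acquired { get; private set; }

    // The contract documents what this method guarantees even under
    // asynchronous exceptions such as thread aborts.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    public void Acquire()
    {
        // Ask the runtime to prepare (JIT-compile and pre-allocate for)
        // the finally block up front, so it cannot fail part-way through.
        RuntimeHelpers.PrepareConstrainedRegions();
        try { }
        finally
        {
            // This finally block runs as a CER: the CLR delays thread
            // aborts until it has completed, so the state update below
            // either happens entirely or not at all.
            Acquired = true;
        }
    }
}
```

The try/finally shape is the point: the runtime promises not to interrupt a prepared finally block, which is how you keep "acquire resource" and "record that we acquired it" atomic with respect to aborts.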
One can imagine other services that must stay reliable in difficult circumstances. Software that, say, finds efficient routes for ambulances, or moves robot arms around in factories, has higher reliability constraints than your average end user code running on a desktop machine.
Do you personally use them?
No. I build compilers that run on end-user machines. If the compiler fails halfway through a compilation, that's unfortunate but it is not likely to have a human life safety impact or result in the destruction of important data.
I am also interested in how widespread a technique it is.
I have no idea.
What are the pros and cons for using them?
I don't understand the question. You might as well ask what the pros and cons of a roofing hatchet are; unless you state the task that you intend to use the hatchet for, it's hard to say what the pros and cons of the tool are. What task do you wish to perform with CERs? Once we know the task we can describe the pros and cons of using any particular tool to accomplish that task.
I am originally a native C++ programmer. In C++, everything that happens in your program is driven by your code, i.e. nothing happens unless you make it happen, and every bit of memory is allocated (and deallocated) according to what you wrote. So performance is entirely your responsibility: if you do well, you get great performance.
(Note: please don't object that some of the code, such as the STL, wasn't written by the programmer himself; it is still unmanaged C++ code, and that is the significant part.)
But in managed code, such as Java and C#, you don't control every operation, and memory is "hidden", or not under your control, to some extent. That makes performance relatively unpredictable, and mostly you fear bad performance.
So my question is: what issues and guidelines should I look out for and keep in mind to achieve good performance in managed code?
I could think only of some practices such as:
Being aware of boxing and unboxing.
Choosing the collection type that best suits your needs and has the lowest operation cost.
But these never seem to be enough, or even convincing! In fact, perhaps I shouldn't have mentioned them.
Please note I am not asking for a C++ vs. C# (or Java) code comparison; I only mentioned C++ to explain the problem.
There is no single answer here. The only way to answer this is: profile. Measure early and often. The bottlenecks are usually not where you expect them. Optimize the things that actually hurt. We use mvc-mini-profiler for this, but any similar tool will work.
You seem to be focusing on GC; now, that can sometimes be an issue, but usually only in specific cases; for the majority of systems the generational GC works great.
Obviously external resources will be slow, and caching may be critical. In odd scenarios with very long-lived data there are tricks you can do with structs to avoid long gen-2 collections. Serialization (files, network, etc.), materialization (ORM), or just a bad collection/algorithm choice may be the biggest issue - you cannot know until you measure.
Two things though:
make sure you understand what IDisposable and "using" mean
don't concatenate strings in loops; mass concatenation is the job of StringBuilder
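The second point, as a quick sketch (the helper names are illustrative):

```csharp
using System.Text;

static class ConcatDemo
{
    // Each += allocates a brand-new string and copies every previous
    // character: O(n^2) work plus garbage for n iterations.
    public static string Slow(string[] parts)
    {
        string result = "";
        foreach (var p in parts)
            result += p;
        return result;
    }

    // StringBuilder appends into a growable buffer: O(n) overall,
    // with a single final string allocation in ToString().
    public static string Fast(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (var p in parts)
            sb.Append(p);
        return sb.ToString();
    }
}
```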
Reusing large objects is very important in my experience.
Objects on the large object heap are implicitly generation 2, and thus require a full GC to clean up. And that's expensive.
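One common way to apply this, sketched with a hypothetical stream-processing helper that reuses a single large buffer instead of allocating a fresh one per call:

```csharp
using System.IO;

static class LohDemo
{
    // Arrays over ~85,000 bytes land on the large object heap, which
    // is only reclaimed by expensive full (gen-2) collections.
    // Allocate the buffer once and reuse it instead of churning the LOH.
    static readonly byte[] SharedBuffer = new byte[1 << 20]; // 1 MB, allocated once

    public static long CountBytes(Stream stream)
    {
        long total = 0;
        int read;
        // Note: a single static buffer is not thread-safe; a real
        // implementation would pool buffers per thread or per call.
        while ((read = stream.Read(SharedBuffer, 0, SharedBuffer.Length)) > 0)
            total += read;
        return total;
    }
}
```

On newer versions of .NET, System.Buffers.ArrayPool&lt;T&gt; provides pooled, reusable buffers for exactly this purpose.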
The main thing to keep in mind with performance with managed languages is that your code can change structure at runtime to be better optimized.
For example, the default JVM most people use is Sun's HotSpot VM, which will actually optimize your code as it runs by compiling parts of the program to native code, inlining on the fly, and applying other optimizations (as do the CLR and other managed runtimes) that you will never get using C++.
Additionally, HotSpot will detect which parts of your code are used the most and optimize accordingly.
So as you can see, tuning performance on a managed system is slightly harder than on an unmanaged system, because you have an intermediate layer that can make code faster without your intervention.
I am going to invoke the law of premature optimization here and say that you should first create the correct solution then, if performance becomes an issue, go back and measure what is actually slow before attempting to optimize.
I would suggest understanding better garbage collection algorithms. You can find good books on that matter, e.g. The Garbage Collection Handbook (by Richard Jones, Antony Hosking, Eliot Moss).
Then, your question is practically related to a particular implementation, and perhaps even to a specific version of it. For instance, Mono used Boehm's garbage collector in older versions (e.g. 2.4), but now uses a copying generational one.
And don't forget that some GC techniques can be remarkably efficient. Remember A. Appel's old paper, Garbage Collection Can Be Faster Than Stack Allocation (though today cache performance matters much, much more, so the details are different).
I think that being aware of boxing (& unboxing) and allocation is enough. Some compilers are able to optimize these (by avoiding some of them).
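A tiny sketch of what that awareness means in practice (the helper names are illustrative):

```csharp
using System.Collections;
using System.Collections.Generic;

static class BoxingDemo
{
    // Non-generic collections store object, so every int is boxed on
    // Add (one heap allocation each) and unboxed with a cast on read.
    public static int SumBoxed(int[] values)
    {
        var list = new ArrayList();
        foreach (int v in values) list.Add(v);    // boxes v
        int sum = 0;
        foreach (object o in list) sum += (int)o; // unboxes
        return sum;
    }

    // The generic collection stores ints directly: no boxing at all.
    public static int SumUnboxed(int[] values)
    {
        var list = new List<int>(values.Length);
        foreach (int v in values) list.Add(v);
        int sum = 0;
        foreach (int v in list) sum += v;
        return sum;
    }
}
```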
Don't forget that GC performance can vary widely. There are good GCs (for your application) and bad ones.
And some GC implementations are quite fast, for example the one inside OCaml.
I would not bother that much: premature optimization is evil.
(And C++ memory management, even with smart pointers or reference counters, can often be viewed as a poor man's garbage collection technique; and you don't have full control over what C++ is doing - unless you re-implement your ::operator new using operating-system-specific system calls - so you don't really know its performance a priori.)
.NET Generics don't specialize on reference types, which severely limits how much inlining can be done. It may (in certain performance hotspots) make sense to forgo a generic container type in favor of a specific implementation that will be better optimized. (Note: this doesn't mean to use .NET 1.x containers with element type object).
Folks, I've been programming high-speed software for over 20 years and know virtually every trick in the book - micro-benchmarking, profiling, cooperative user-mode multitasking, tail recursion, you name it - for very high-performance work on Linux, Windows, and more.
The problem is that I find myself befuddled by what happens when multiple threads of CPU-intensive work run on a multi-core processor.
The results of micro-benchmarking various ways of sharing data between threads (on different cores) don't seem to follow logic.
It's clear that there is some "hidden interaction" between the cores which isn't obvious from my own code. I hear of L1 caches and other issues, but those are opaque to me.
Question is: where can I learn this stuff? I am looking for an in-depth book on how multi-core processors work, and on how to program to capitalize on their memory caches and other hardware architecture instead of being punished by them.
Any advice or great websites or books? After much Googling, I'm coming up empty.
Sincerely,
Wayne
This book taught me a lot about these sorts of issues, and about why raw CPU power is not necessarily the only thing to pay attention to. I used it in grad school years ago, but I think all of the principles still apply:
http://www.amazon.com/Computer-Architecture-Quantitative-Approach-4th/dp/0123704901
Essentially, a major issue in multi-processor configurations is synchronizing access to main memory; if you don't do this right, it can be a real performance bottleneck. It's pretty complex, with the caches that have to be kept in sync.
My own question, with answer, on Stack Overflow's sister site: https://softwareengineering.stackexchange.com/questions/126986/where-can-i-find-an-overview-of-known-multithreading-design-patterns/126993#126993
I will copy the answer to avoid the need for click-through:
Quoting Boris:

Parallel Programming with Microsoft .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures https://rads.stackoverflow.com/amzn/click/0735651590

This is a book I recommend wholeheartedly. It is:

New - published last year, which means you are not reading somewhat outdated practices.

Short - about 200+ pages, dense with information. These days there is too much to read and too little time to read 1000+ page books.

Easy to read - not only is it very well written, it introduces hard-to-grasp concepts in a really simple way.

Intended to teach - each chapter gives exercises to do. I know it is always beneficial to do these, but rarely do. This book gives very compelling and interesting tasks; surprisingly, I did most of them and enjoyed doing them.
Additionally, if you wish to learn more of the low-level details, the best resource I have found is "The Art of Multiprocessor Programming". It uses Java for its code samples, which plays nicely with my C# background.
PS: I have about 5 years of "hard-core" parallel programming experience (albeit using C#), so I hope you can trust me when I say that "The Art of Multiprocessor Programming" rocks.
My answer on "Are you concerned about multicores"
Herb Sutter's articles
Video Series on Parallel Programming
One specific cause of unexpectedly poor results in parallelized code is false sharing; you won't see it coming if you don't know what's going on down there (I didn't). Here are two articles that discuss the cause and the remedy for .NET:
http://msdn.microsoft.com/en-us/magazine/cc872851.aspx
http://www.codeproject.com/KB/threads/FalseSharing.aspx
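The effect those articles describe can be reproduced with a sketch like this. (Caveat: the CLR may reorder class fields, so the padding layout shown is illustrative; pinning it down would require an explicit StructLayout attribute.)

```csharp
using System.Threading;

static class FalseSharingDemo
{
    // Both fields fit in one ~64-byte cache line, so two threads
    // writing to "independent" counters still invalidate each
    // other's caches on every store.
    public class Shared { public long A; public long B; }

    // Padding intended to push B onto a different cache line,
    // removing the hidden contention (at the cost of wasted space).
    public class Padded
    {
        public long A;
        public long P1, P2, P3, P4, P5, P6, P7; // 56 bytes of padding
        public long B;
    }

    // Each thread touches only its own field, yet with Shared the two
    // threads fight over the same cache line; with Padded they do not.
    public static void Run(Shared s, int iterations)
    {
        var t1 = new Thread(() => { for (int i = 0; i < iterations; i++) s.A++; });
        var t2 = new Thread(() => { for (int i = 0; i < iterations; i++) s.B++; });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
    }
}
```

Both versions compute the same result; only the wall-clock time differs, which is exactly why false sharing is so hard to spot from the source code alone.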
Rgds GJ
There are different aspects to multi-threading requiring different approaches.
On a web server, for example, thread pools are widely used, since they are supposedly "good for" performance. Such pools may contain hundreds of threads waiting to be put to work. Using that many threads causes the scheduler to work overtime, which is detrimental to performance but can't be avoided on Linux systems. On Windows, the method of choice is the IOCP mechanism, which recommends using no more threads than the number of cores installed. It makes an application (I/O-completion) event driven, which means that no cycles are wasted on polling, and the few threads involved reduce scheduler work to a minimum.
If the objective is to implement functionality that is scalable (more cores <=> higher performance), then the main issue will be memory bus saturation. Saturation occurs due to code fetching, data reading, and data writing. Incorrectly implemented code will run slower with two threads than with one. The only way around this is to actively reduce the memory bus load by:
tailoring the code to a minimal memory footprint (so that it fits in the code cache) that doesn't call other functions or jump all over the place.
tailoring memory reads and writes to a minimum size.
informing the prefetch mechanism of upcoming RAM reads.
tailoring the work so that as much of it as possible happens inside the core's own caches (L1 & L2) rather than outside them (L3 & RAM).
To put it another way: fit the applicable code and data chunks into as few cache lines (~64 bytes each) as possible, because ultimately this is what decides scalability. If the cache/memory system is capable of x cache-line operations every second, your code will run faster if it requires five cache lines per unit of work (=> x/5) rather than eleven (x/11) or fifty-two (x/52).
Achieving this is not trivial, since it requires a more or less unique solution every time. Some compilers do a good job of ordering instructions to take advantage of the host processor's pipelining, but that does not necessarily mean the ordering will be good for multiple cores.
An efficient implementation of scalable code will not necessarily be a pretty one. Recommended coding techniques and styles may, in the end, hinder the code's execution.
My advice is to test how this works by writing a simple multi-threaded application in a low-level language (such as C) that can be adjusted to run in single or multi-threaded mode and then profiling the code for the different modes. You will need to analyze the code at the instruction level. Then you experiment using different (C) code constructs, data organization, etc. You may have to think outside the box and rethink the algorithm to make it more cache-friendly.
The first time will require lots of work. You will not learn what will work for all multi-threaded solutions but you will perhaps get an inkling of what not to do and what indications to look for when analyzing profiled code.
I found this link that specifically explains the multicore cache-handling issues that were affecting my multithreaded program: http://www.multicoreinfo.com/research/intel/mem-issues.pdf
The site multicoreinfo.com in general has lots of good information and references about multicore programming.
I've been playing around with the specification pattern to handle and contain the business logic in our C#/MVC application. So far so good. I do have a question, though: since we'll be creating a number of specification objects on the heap, will that affect performance in any way versus, say, creating helper methods to handle the business logic? Thanks!
I do have a question though - since we'll be creating a number of specification objects on the heap, will that affect performance in any way versus, say creating helper methods to handle the business logic?
Of course it will affect performance; every line of code you write and every design choice you make affects performance in one way or another. But this one is unlikely to be meaningful, to be a bottleneck in your application, or to be worth caring about, as this is almost surely a case of premature optimization. These days you should just focus on modeling your domain properly and writing extremely clear and maintainable code. Focus more on developer productivity than on machine productivity. CPU cycles are cheap and in nearly limitless supply; developer cycles are not cheap, and are not limitless in supply.
But only you can know whether it will impact the real-world use of your application on real-world data, by profiling. We don't and can't know, because we don't know your domain, your users, or the performance you expect. And even if we knew those things, we still couldn't give you as powerful an answer as you can give yourself by dusting off a profiler and seeing what your application actually does.
since we'll be creating a number of specification objects on the heap, will that affect performance in any way
Most design patterns trade off some overhead for cleanliness of design - this is no exception. In general, the amount of memory that the specifications add is very minimal (typically a couple of references, and that's it). In addition, they tend to add a couple of extra method calls vs. custom logic.
That being said, I would not try to prematurely optimize this. The overhead here is incredibly small, so I would highly doubt it would be noticeable in any real world application.
If you use the NSpecifications lib just as in the examples on its GitHub page, you'll get the benefits of both worlds:
Most of these specifications are simply stored in static members, so they don't take much from the heap.
These specifications also use compiled expressions, so they can be reused many times with better performance.
If you are using an ORM to query the database with lambda expressions, that also uses the heap; the difference here is that NSpecifications stores those expressions inside a Spec object so that they can be reused for both business logic and querying.
Check here
https://github.com/jnicolau/NSpecifications
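The static-member-plus-compiled-expression idea described above can be sketched generically. (This Spec type is illustrative, not the actual NSpecifications API.)

```csharp
using System;
using System.Linq.Expressions;

// A minimal specification: one expression tree plus a compiled delegate.
// The per-instance heap cost is a couple of references - negligible.
class Spec<T>
{
    private readonly Func<T, bool> compiled;

    // Kept as an expression tree so an ORM could translate it to SQL.
    public Expression<Func<T, bool>> Expression { get; }

    public Spec(Expression<Func<T, bool>> expr)
    {
        Expression = expr;
        compiled = expr.Compile(); // compile once, reuse many times
    }

    public bool IsSatisfiedBy(T candidate) => compiled(candidate);
}

static class Specs
{
    // Stored in a static member: allocated once for the application's
    // lifetime, so it adds essentially nothing to GC pressure.
    public static readonly Spec<int> IsPositive = new Spec<int>(n => n > 0);
}
```

Because the delegate is compiled once in the constructor, evaluating the specification repeatedly costs only a delegate call, not a fresh expression compilation.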
I have recently been looking at code, specifically component-oriented code that uses threads internally. Is this bad practice? The code I looked at was from an F# example that showed the use of event-based programming techniques. I cannot post the code in case of copyright infringement, but it does spin up a thread of its own. Is this regarded as bad practice, or is it feasible that code not written by yourself has full control of thread creation? I should point out that this code is not a visual component and is very much "built from scratch".
What are the best practices for component creation where threading would be helpful?
I am completely language agnostic on this; the F# example could have been in C# or Python.
I am concerned about the lack of control over the component's run time and its hogging of resources. The example just spawned one extra thread, but as far as I can see there is nothing stopping this type of design from spawning as many threads as it wishes, up to the limit of what your program allows.
I did think of methods such as object injection and so forth, but threads are weird: from a component perspective they are pure "action", as opposed to "model, state, declarations".
Any help would be great.
This is too general a question to bear any answer more specific than "it depends" :-)
There are cases when using internal threads within a component is completely valid, and cases when it isn't. This has to be decided on a case-by-case basis. Overall, though, since threads make the code much more difficult to test and maintain and increase the chances of subtle, hard-to-find bugs, they should be used with caution, and only when there is a really decisive reason to use them.
An example to the legitimate use of threads is a worker thread, where a component handling an event starts an action which takes a long time to execute (such as a lengthy computation, a web request, or extensive file I/O), and spawns a separate thread to do the job, so that the control can be immediately returned to the interface to handle further user input. Without the worker thread, the UI would be totally unresponsive for a long time, which usually makes users angry.
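A modern C# sketch of such a worker thread, using Task.Run (code of this era would more likely have used BackgroundWorker or a raw Thread; the names here are illustrative):

```csharp
using System.Threading.Tasks;

static class Worker
{
    // A stand-in for the slow operation (lengthy computation,
    // web request, extensive file I/O, ...).
    public static int LongComputation()
    {
        int sum = 0;
        for (int i = 1; i <= 1_000_000; i++) sum += i % 10;
        return sum;
    }

    // Called from, say, a button-click handler: Task.Run moves the
    // slow work onto a thread-pool worker, the UI thread returns
    // immediately to process further input, and execution resumes
    // here once the result is ready.
    public static async Task<int> HandleClickAsync()
    {
        return await Task.Run(() => LongComputation());
    }
}
```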
Another example is a lengthy calculation/process which lends itself well to parallel execution, i.e. it consists of many smaller independent tasks of more or less similar size. If there are strong performance requirements, it does indeed make sense to execute the individual tasks in a concurrent fashion using a pool of worker threads. Many languages provide high level support for such designs.
Note that components are generally free to allocate and use any other kinds of resources too and thus wreak havoc in countless other ways - are you ever worried about a component eating up all memory, exhausting the available file handles, reserving ports etc.? Many of these can cause much more trouble globally within a system than spawning extra threads.
There's nothing wrong about creating new threads in a component/library. The only thing wrong would be if it didn't give the consumer of the API/component a way to synchronize whenever necessary.
First of all, what is the nature of the component you are talking about? Is it a DLL to be consumed by some different code? What does it do? What are the business requirements? All of these are essential to determine whether you need to worry about parallelism or not.
Second, threading is just a tool to achieve better performance and responsiveness, so avoiding it at all costs everywhere does not sound like a smart approach - threading is certainly vital for some business needs.
Third, when comparing threading semantics in C# vs. F#, you have to remember that those are very different beasts in themselves. F# implicitly makes threading safer to code, as there is no notion of global variables, so critical sections are easier to eschew in F# than in C#. That puts you as a developer in a better place, because you don't have to deal with memory blocks, locks, semaphores, etc.
I would say that if your component relies heavily on threading, you might want to consider using either Parallel FX in C# or even going with F#, since it approaches processor time slicing and parallelism in a more elegant way (IMHO).
And last but not least, regarding hogging computer resources by using threading in your component: please remember that threaded code does not necessarily impose a higher resource impact per se - you can just as easily do the same damage on one thread if you don't dispose of your (unmanaged) objects properly; granted, you might get an OutOfMemoryException faster when you make the same mistake on several threads.
I have some C# class libraries, that were designed without taking into account things like concurrency, multiple threads, locks, etc...
The code is very well structured and easily extensible, but it could benefit a lot from multithreading: it is a set of scientific/engineering libraries that need to perform billions of calculations in a very short time (and currently they take no advantage of the available cores).
I want to transform all this code into a set of multithreaded libraries, but I don't know where to start and I don't have any previous experience.
I could use any available help, and any recommendations/suggestions.
My recommendation would be to not do it. You didn't write that code to be used in parallel, so it's not going to work, and it's going to fail in ways that will be difficult to debug.
Instead, I recommend you decide ahead of time which part of that code can benefit the most from parallelism, and then rewrite that code, from scratch, to be parallel. You can take advantage of having the unmodified code in front of you, and can also take advantage of existing automated tests.
It's possible that using the .NET 4.0 Task Parallel Library will make the job easier, but it's not going to completely bridge the gap between code that was not designed to be parallel and code that is.
I'd highly recommend looking into .NET 4 and the Task Parallel Library (also available in .NET 3.5sp1 via the Rx Framework).
It makes many concurrency issues much simpler; in particular, data parallelism becomes dramatically simpler. Since you're dealing with large datasets in most scientific/engineering libraries, data parallelism is often the way to go.
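As a minimal illustration of data parallelism with the TPL (the method name is illustrative):

```csharp
using System;
using System.Threading.Tasks;

static class ParallelDemo
{
    // Each iteration is independent and similar in size, so the
    // runtime can partition the index range across the core pool
    // without any explicit thread management in user code.
    public static double[] SqrtAll(double[] input)
    {
        var output = new double[input.Length];
        Parallel.For(0, input.Length, i =>
        {
            output[i] = Math.Sqrt(input[i]);
        });
        return output;
    }
}
```

This is the shape most scientific/engineering loops take: a sequential for becomes a Parallel.For, provided the iterations don't share mutable state.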
For some reference material, especially on data parallelism and background about decomposing and approaching the problem, you might want to read my blog series on Parallelism in .NET 4.
If you don't have any previous experience in multithreading then I would recommend that you get the basics first by looking at the various resources: https://stackoverflow.com/questions/540242/book-or-resource-on-c-concurrency
Making your entire library multithreaded requires a brand new architectural approach. If you simply go around and start putting locks everywhere in your code you'll end up making your code very cumbersome and you might not even achieve any performance increases.
The best concurrent software is lock-free and wait-free... this is difficult to achieve in C# (.NET), since most of the framework's collections are not lock-free, wait-free, or even thread-safe. There are various discussions on lock-free data structures. A lot of people have referenced Boyet's articles (which are REALLY good), and some people have been throwing around the Task Parallel Library as the next thing in .NET concurrency, but the TPL really doesn't give you much in terms of thread-safe collections.
.NET 4.0 is coming out with System.Collections.Concurrent, which should help a lot.
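For example, a sketch using ConcurrentDictionary from that namespace (the helper name is illustrative):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ConcurrentDemo
{
    // A thread-safe dictionary: many threads can write concurrently
    // without any external locking.
    public static ConcurrentDictionary<int, int> CountMod10(int n)
    {
        var counts = new ConcurrentDictionary<int, int>();
        Parallel.For(0, n, i =>
        {
            // Atomically insert 1 for a new key or increment the
            // existing count; the update delegate may be retried
            // under contention, which is why it must be pure.
            counts.AddOrUpdate(i % 10, 1, (key, old) => old + 1);
        });
        return counts;
    }
}
```

With a plain Dictionary the same code would corrupt internal state or throw under concurrent writers; the concurrent collection makes the parallel loop safe without a lock.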
Making your entire library concurrent is not recommended, since it wasn't designed with concurrency in mind from the start. Instead, go through your library and identify which portions of it are actually good candidates for multithreading; then pick the best concurrency solution for them and implement it. The main thing to remember is that when you write multithreaded code, the concurrency should result in increased throughput of your program. If increased throughput is not achieved (i.e. the throughput merely matches or falls below that of the sequential version), then you should simply not use concurrency in that code.
The best place to start is probably http://msdn.microsoft.com/en-us/concurrency/default.aspx
Good luck!