(More) Efficient locking when rotating nodes in threaded binary tree

(More) Efficient locking when rotating nodes in threaded binary tree - c#

So, I've come up with this scheme for locking nodes when rotating in a binary tree that several threads have both read and write access to at the same time, which involves locking four nodes per rotation, which seems to be an awfully lot? I figured some one way smarter then me had come up with a way to reduce the locking needed, but google didn't turn up much (I'm probably using the wrong terminology anyways).
This is my current scheme, Orange and Red nodes are either moved or changed by the rotation and need to be locked and Green nodes are adjacent to any node that is affected by the rotation but are not affected by it themselves.
binary tree rotation
I figured there has to be a better way to do this, one idea I have is to take a snapshot of the four nodes affected, rotate them in the snapshot and then replace the current nodes with the snapshot ones (assuming nothing has changed while I was doing the rotations) - this would allow me to be almost lock free but I'm afraid the memory overhead might be to much, considering that rotation is a rather quick operation (re-assigning three pointers) ?
I guess I'm looking for pointers (no pun) on how to do this efficiently.

Have a look at immutable binary trees. You have to change a bit more nodes (every node to the root), but that doesn't change the complexity of an insert which is log n both ways. Indeed, it may even improve the performance because you do not need any synchronization-code.
For example, Eric Lippert wrote some posts about immutable avl trees in C#. The last one was: Immutability in C# Part Nine: Academic? Plus my AVL tree implementation

Lock-free algorithms, as far as I can see, lack reference counts, which makes their general use problematic; you can't know if the pointer you have to an element is valid - maybe that element went away.
Valois' gets around this with some exotic memory allocation arrangement which is not useful in practise.
The only way I can see to do lock-free balanced trees is to use a modified MCAS, where you can also do increment/decrement, so you can maintain a reference count. I've not looked to see if MCAS can be modified in this way.

There would usually be one lock for the entire data structure, given that a lock itself usually takes up some serious resources. I'm guessing given that you are trying to avoid that, this must be some very specialized scenario where you have a lot of CPU intensive worker threads constantly updating the tree?
You could get a balance by making every N nodes share one lock. When entering a rotation operation, find the set of unique locks used by the affected nodes. Most of the time it will be just one lock.

the best solution depends on your application. But if you have like an in-memory database of stuff and you want to do concurrent updates to it AND want to have very fine-grained locking, you could also look at B-trees which do not require node rotation as often as binary trees. They are also automatically balanced, unlike binary trees, which require more rotations to maintain balancing (e.g. Splay or AVL trees).
If you want to have transactional modifications to the trees, instead, you could use functional B-trees (which Thomas Danecker calls immutable trees). These are kind of "copy-on-write" binary trees, and there the only thing you need to lock is the root node. So you need only one lock. And the operations have in practice the same complexity as for binary trees, because all ops on functional binary trees are O(log n) and you spend the same logarithmic time whenever you descend down any tree.
Having one lock per node is most likely not the correct solution.

John Valois' paper briefly describes an approach to implementing a binary search tree using auxilary nodes. I am using the auxilary node approach in my design of a concurrent LinkedHashMap. You can probably find many other papers on concurrent trees on CiteSeerX (an invaluable resource). I would stray away from using trees as a concurrent data collection unless truly necessary, as they tend to have poor concurrency characteristics due to rebalancing. Often more pragmatic approaches, like skip lists, work out better.

I played around making binary trees read lock-free using partial copy on write. I posted a incomplete implementation here http://groups.google.com/group/comp.programming.threads/msg/6c0775e9882516a2?hl=en&

Related

Concurrent skiplist Read locking

I will try to make this question as generic as possible, but I will give a brief introduction to my actual problem -
I am trying to implement a concurrent skiplist for a priority queue. Each 'node', has a value, and an array of 'forward' nodes, where node.forward[i] represents the next node on the i-th level of the skiplist. For write access (i.e. insertions and deletions), I use a Spinlock (still to determine if that is the best lock to use)
My question is essentially, when I need a read access for a traversal,
node = node.forward[i]
What kind of thread safety do I need around something like this? If another thread is modifying node.forward[i] at exactly the same time that I read (with no current locking mechanism for read), what can happen here?
My initial thought is to have a ReaderWriterLockSLim on the getter and setter of the indexer for Forward. Will there be too much unnecessary locking in this scenario?
Edit: Or would it be best to instead use a Interlocked.Exchange for all of my reads?

If another thread is modifying node.forward[i] at exactly the same time that I read (with no current locking mechanism for read), what can happen here?
It really depends on the implementation. It's possible to use Interlocked.Exchange when setting "forward" in a way that can prevent the references from being invalid (as it can make the "set" atomic), but there is no guarantee of which reference you'd get on read. However, with a naive implementation, anything can happen, including getting bad data.
My initial thought is to have a ReaderWriterLockSLim on the getter and setter of the indexer for Forward.
This is likely to be a good place to start. It will be fairly easy to make a properly synchronized collection using a ReaderWriterLockSlim, and functional is always the first priority.
This would likely be a good starting point.
Will there be too much unnecessary locking in this scenario?
There's no way to know without seeing how you implement it, and more importantly, how it's goign to be used. Depending on your usage, you can profile and look for optimization opportunities if necessary at that point.
On a side note - you might want to reconsider using node.Forward[i] as opposed to more of a "linked list" approach here. Any access to Forward[i] is likely to require a fair bit of synchronization to iterate through the skip list i steps, all of which will need some synchronization to prevent errors if there are concurrent writes anywhere between node and i elements beyond node. If you only look ahead one step, you can (potentially) reduce the amount of synchronization required.

LINQ performance FAQ

I am trying to get to grips with LINQ. The thing that bothers me most is that even as I understand the syntax better, I don't want to unwittingly sacrifice performance for expressiveness.
Are they any good centralized repositories of information or books for 'Effective LINQ' ? Failing that, what is your own personal favourite high-performance LINQ technique ?
I am primarily concerned with LINQ to Objects, but all suggestions on LINQ to SQL and LINQ to XML also welcome of course. Thanks.

Linq, as a built-in technology, has performance advantages and disadvantages. The code behind the extension methods has had considerable performance attention paid to it by the .NET team, and its ability to provide lazy evaluation means that the cost of performing most manipulations on a set of objects is spread across the larger algorithm requiring the manipulated set. However, there are some things you need to know that can make or break your code's performance.
First and foremost, Linq doesn't magically save your program the time or memory needed to perform an operation; it just may delay those operations until absolutely needed. OrderBy() performs a QuickSort, which will take nlogn time just the same as if you'd written your own QuickSorter or used List.Sort() at the right time. So, always be mindful of what you're asking Linq to do to a series when writing queries; if a manipulation is not necessary, look to restructure the query or method chain to avoid it.
By the same token, certain operations (sorting, grouping, aggregates) require knowledge of the entire set they are acting upon. The very last element in a series could be the first one the operation must return from its iterator. On top of that, because Linq operations should not alter their source enumerable, but many of the algorithms they use will (i.e. in-place sorts), these operations end up not only evaluating, but copying the entire enumerable into a concrete, finite structure, performing the operation, and yielding through it. So, when you use OrderBy() in a statement, and you ask for an element from the end result, EVERYTHING that the IEnumerable given to it can produce is evaluated, stored in memory as an array, sorted, then returned one element at a time. The moral is, any operation that needs a finite set instead of an enumerable should be placed as late in the query as possible, allowing for other operations like Where() and Select() to reduce the cardinality and memory footprint of the source set.
Lastly, Linq methods drastically increase the call stack size and memory footprint of your system. Each operation that must know of the entire set keeps the entire source set in memory until the last element has been iterated, and the evaluation of each element will involve a call stack at least twice as deep as the number of methods in your chain or clauses in your inline statement (a call to each iterator's MoveNext() or yielding GetEnumerator, plus at least one call to each lambda along the way). This is simply going to result in a larger, slower algorithm than an intelligently-engineered inline algorithm that performs the same manipulations. Linq's main advantage is code simplicity. Creating, then sorting, a dictionary of lists of groups values is not very easy-to-understand code (trust me). Micro-optimizations can obfuscate it further. If performance is your primary concern, then don't use Linq; it will add approximately 10% time overhead and several times the memory overhead of manipulating a list in-place yourself. However, maintainability is usually the primary concern of developers, and Linq DEFINITELY helps there.
On the performance kick: If performance of your algorithm is the sacred, uncompromisable first priority, you'd be programming in an unmanaged language like C++; .NET is going to be much slower just by virtue of it being a managed runtime environment, with JIT native compilation, managed memory and extra system threads. I would adopt a philosophy of it being "good enough"; Linq may introduce slowdowns by its nature, but if you can't tell the difference, and your client can't tell the difference, then for all practical purposes there is no difference. "Premature optimization is the root of all evil"; Make it work, THEN look for opportunities to make it more performant, until you and your client agree it's good enough. It could always be "better", but unless you want to be hand-packing machine code, you'll find a point short of that at which you can declare victory and move on.

Simply understanding what LINQ is doing internally should yield enough information to know whether you are taking a performance hit.
Here is a simple example where LINQ helps performance. Consider this typical old-school approach:
List<Foo> foos = GetSomeFoos();
List<Foo> filteredFoos = new List<Foo>();
foreach(Foo foo in foos)
{
if(foo.SomeProperty == "somevalue")
{
filteredFoos.Add(foo);
}
}
myRepeater.DataSource = filteredFoos;
myRepeater.DataBind();
So the above code will iterate twice and allocate a second container to hold the filtered values. What a waste! Compare with:
var foos = GetSomeFoos();
var filteredFoos = foos.Where(foo => foo.SomeProperty == "somevalue");
myRepeater.DataSource = filteredFoos;
myRepeater.DataBind();
This only iterates once (when the repeater is bound); it only ever uses the original container; filteredFoos is just an intermediate enumerator. And if, for some reason, you decide not to bind the repeater later on, nothing is wasted. You don't even iterate or evaluate once.
When you get into very complex sequence manipulations, you can potentially gain a lot by leveraging LINQ's inherent use of chaining and lazy evaluation. Again, as with anything, it's just a matter of understanding what it is actually doing.

There are various factors which will affect performance.
Often, developing a solution using LINQ will offer pretty reasonable performance because the system can build an expression tree to represent the query without actually running the query while it builds this. Only when you iterate over the results does it use this expression tree to generate and run a query.
In terms of absolute efficiency, running against predefined stored procedures you may see some performance hit, but generally the approach to take is to develop a solution using a system that offers reasonable performance (such as LINQ), and not worry about a few percent loss of performance. If a query is then running slowly, then perhaps you look at optimisation.
The reality is that the majority of queries will not have the slightest problem with being done via LINQ. The other fact is that if your query is running slowly, it's probably more likely to be issues with indexing, structure, etc, than with the query itself, so even when looking to optimise things you'll often not touch the LINQ, just the database structure it's working against.
For handling XML, if you've got a document being loaded and parsed into memory (like anything based on the DOM model, or an XmlDocument or whatever), then you'll get more memory usage than systems that do someting like raising events to indicate finding a start or end tag, but not building a complete in-memory version of the document (like SAX or XmlReader). The downside is that the event-based processing is generally rather more complex. Again, with most documents there won't be a problem - most systems have several GB of RAM, so taking up a few MB representing a single XML document is not a problem (and you often process a large set of XML documents at least somewhat sequentially). It's only if you have a huge XML file that would take up 100's of MB that you worry about the particular choice.
Bear in mind that LINQ allows you to iterate over in-memory lists and so on as well, so in some situations (like where you're going to use a set of results again and again in a function), you may use .ToList or .ToArray to return the results. Sometimes this can be useful, although generally you want to try to use the database's querying rather in-memory.
As for personal favourites - NHibernate LINQ - it's an object-relational mapping tool that allows you to define classes, define mapping details, and then get it to generate the database from your classes rather than the other way round, and the LINQ support is pretty good (certainly better than the likes of SubSonic).

In linq to SQL you don't need to care that much about performance. you can chain all your statements in the way you think it is the most readable. Linq just translates all your statements into 1 SQL statement in the end, which only gets called/executed in the end (like when you call a .ToList()
a var can contain this statement without executing it if you want to apply various extra statements in different conditions. The executing in the end only happens when you want to translate your statements into a result like an object or a list of objects.

There's a codeplex project called i4o which I used a while back which can help improve the performance of Linq to Objects in cases where you're doing equality comparisons, e.g.
from p in People
where p.Age == 21
select p;
http://i4o.codeplex.com/
I haven't tested it with .Net 4 so can't safely say it will still work but worth checking out.
To get it to work its magic you mostly just have to decorate your class with some attributes to specify which property should be indexed. When I used it before it only works with equality comparisons though.

What C# container is most resource-efficient for existence for only one operation?

I find myself often with a situation where I need to perform an operation on a set of properties. The operation can be anything from checking if a particular property matches anything in the set to a single iteration of actions. Sometimes the set is dynamically generated when the function is called, some built with a simple LINQ statement, other times it is a hard-coded set that will always remain the same. But one constant always exists: the set only exists for one single operation and has no use before or after it.
My problem is, I have so many points in my application where this is necessary, but I appear to be very, very inconsistent in how I store these sets. Some of them are arrays, some are lists, and just now I've found a couple linked lists. Now, none of the operations I'm specifically concerned about have to care about indices, container size, order, or any other functionality that is bestowed by any of the individual container types. I picked resource efficiency because it's a better idea than flipping coins. I figured, since array size is configured and it's a very elementary container, that might be my best choice, but I figure it is a better idea to ask around. Alternatively, if there's a better choice not out of resource-efficiency but strictly as being a better choice for this kind of situation, that would be nice as well.

With your acknowledgement that this is more about coding consistency rather than performance or efficiency, I think the general practice is to use a List<T>. Its actual backing store is an array, so you aren't really losing much (if anything noticable) to container overhead. Without more qualifications, I'm not sure that I can offer anything more than that.
Of course, if you truly don't care about the things that you list in your question, just type your variables as IEnumerable<T> and you're only dealing with the actual container when you're populating it; where you consume it will be entirely consistent.

There are two basic principles to be aware of regarding resource efficiency.
Runtime complexity
Memory overhead
You said that indices and order do not matter and that a frequent operation is matching. A Dictionary<T> (which is a hashtable) is an ideal candidate for this type of work. Lookups on the keys are very fast which would be beneficial in your matching operation. The disadvantage is that it will consume a little more memory than what would be strictly required. The usual load factor is around .8 so we are not talking about a huge increase or anything.
For your other operations you may find that an array or List<T> is a better option especially if you do not need to have the fast lookups. As long as you are not needing high performance on specialty operations (lookups, sorting, etc.) then it is hard to beat the general resource characteristics of array based containers.

List is probably fine in general. It's easy to understand (in the literate programming sense) and reasonably efficient. The keyed collections (e.g. Dict, SortedList) will throw an exception if you add an entry with a duplicate key, though this may not be a problem for what you're working on now.
Only if you find that you're running into a CPU-time or memory-size problem should you look at improving the "efficiency", and then only after determining that this is the bottleneck.
No matter which approach you use, there will still be creation and deletion of the underlying objects (collection or iterator) that will eventually be garbage collected, if the application runs long enough.

Why use flags+bitmasks rather than a series of booleans?

Given a case where I have an object that may be in one or more true/false states, I've always been a little fuzzy on why programmers frequently use flags+bitmasks instead of just using several boolean values.
It's all over the .NET framework. Not sure if this is the best example, but the .NET framework has the following:
public enum AnchorStyles
{
None = 0,
Top = 1,
Bottom = 2,
Left = 4,
Right = 8
}
So given an anchor style, we can use bitmasks to figure out which of the states are selected. However, it seems like you could accomplish the same thing with an AnchorStyle class/struct with bool properties defined for each possible value, or an array of individual enum values.
Of course the main reason for my question is that I'm wondering if I should follow a similar practice with my own code.
So, why use this approach?
Less memory consumption? (it doesn't seem like it would consume less than an array/struct of bools)
Better stack/heap performance than a struct or array?
Faster compare operations? Faster value addition/removal?
More convenient for the developer who wrote it?

It was traditionally a way of reducing memory usage. So, yes, its quite obsolete in C# :-)
As a programming technique, it may be obsolete in today's systems, and you'd be quite alright to use an array of bools, but...
It is fast to compare values stored as a bitmask. Use the AND and OR logic operators and compare the resulting 2 ints.
It uses considerably less memory. Putting all 4 of your example values in a bitmask would use half a byte. Using an array of bools, most likely would use a few bytes for the array object plus a long word for each bool. If you have to store a million values, you'll see exactly why a bitmask version is superior.
It is easier to manage, you only have to deal with a single integer value, whereas an array of bools would store quite differently in, say a database.
And, because of the memory layout, much faster in every aspect than an array. It's nearly as fast as using a single 32-bit integer. We all know that is as fast as you can get for operations on data.

Easy setting multiple flags in any order.
Easy to save and get a serie of 0101011 to the database.

Among other things, its easier to add new bit meanings to a bitfield than to add new boolean values to a class. Its also easier to copy a bitfield from one instance to another than a series of booleans.

It can also make Methods clearer. Imagine a Method with 10 bools vs. 1 Bitmask.

Actually, it can have a better performance, mainly if your enum derives from an byte.
In that extreme case, each enum value would be represented by a byte, containing all the combinations, up to 256. Having so many possible combinations with booleans would lead to 256 bytes.
But, even then, I don't think that is the real reason. The reason I prefer those is the power C# gives me to handle those enums. I can add several values with a single expression. I can remove them also. I can even compare several values at once with a single expression using the enum. With booleans, code can become, let's say, more verbose.

From a domain Model perspective, it just models reality better in some situations. If you have three booleans like AccountIsInDefault and IsPreferredCustomer and RequiresSalesTaxState, then it doesnn't make sense to add them to a single Flags decorated enumeration, cause they are not three distinct values for the same domain model element.
But if you have a set of booleans like:
[Flags] enum AccountStatus {AccountIsInDefault=1,
AccountOverdue=2 and AccountFrozen=4}
or
[Flags] enum CargoState {ExceedsWeightLimit=1,
ContainsDangerousCargo=2, IsFlammableCargo=4,
ContainsRadioactive=8}
Then it is useful to be able to store the total state of the Account, (or the cargo) in ONE variable... that represents ONE Domain Element whose value can represent any possible combination of states.

Raymond Chen has a blog post on this subject.
Sure, bitfields save data memory, but
you have to balance it against the
cost in code size, debuggability, and
reduced multithreading.
As others have said, its time is largely past. It's tempting to still do it, cause bit fiddling is fun and cool-looking, but it's no longer more efficient, it has serious drawbacks in terms of maintenance, it doesn't play nicely with databases, and unless you're working in an embedded world, you have enough memory.

I would suggest never using enum flags unless you are dealing with some pretty serious memory limitations (not likely). You should always write code optimized for maintenance.
Having several boolean properties makes it easier to read and understand the code, change the values, and provide Intellisense comments not to mention reduce the likelihood of bugs. If necessary, you can always use an enum flag field internally, just make sure you expose the setting/getting of the values with boolean properties.

Space efficiency - 1 bit
Time efficiency - bit comparisons are handled quickly by hardware.
Language independence - where the data may be handled by a number of different programs you don't need to worry about the implementation of booleans across different languages/platforms.
Most of the time, these are not worth the tradeoff in terms of maintance. However, there are times when it is useful:
Network protocols - there will be a big saving in reduced size of messages
Legacy software - once I had to add some information for tracing into some legacy software.
Cost to modify the header: millions of dollars and years of effort.
Cost to shoehorn the information into 2 bytes in the header that weren't being used: 0.
Of course, there was the additional cost in the code that accessed and manipulated this information, but these were done by functions anyways so once you had the accessors defined it was no less maintainable than using Booleans.

I have seen answers like Time efficiency and compatibility. those are The Reasons, but I do not think it is explained why these are sometime necessary in times like ours. from all answers and experience of chatting with other engineers I have seen it pictured as some sort of quirky old time way of doing things that should just die because new way to do things are better.
Yes, in very rare case you may want to do it the "old way" for performance sake like if you have the classic million times loop. but I say that is the wrong perspective of putting things.
While it is true that you should NOT care at all and use whatever C# language throws at you as the new right-way™ to do things (enforced by some fancy AI code analysis slaping you whenever you do not meet their code style), you should understand deeply that low level strategies aren't there randomly and even more, it is in many cases the only way to solve things when you have no help from a fancy framework. your OS, drivers, and even more the .NET itself(especially the garbage collector) are built using bitfields and transactional instructions. your CPU instruction set itself is a very complex bitfield, so JIT compilers will encode their output using complex bit processing and few hardcoded bitfields so that the CPU can execute them correctly.
When we talk about performance things have a much larger impact than people imagine, today more then ever especially when you start considering multicores.
when multicore systems started to become more common all CPU manufacturer started to mitigate the issues of SMP with the addition of dedicated transactional memory access instructions while these were made specifically to mitigate the near impossible task to make multiple CPUs to cooperate at kernel level without a huge drop in perfomrance it actually provides additional benefits like an OS independent way to boost low level part of most programs. basically your program can use CPU assisted instructions to perform memory changes to integers sized memory locations, that is, a read-modify-write where the "modify" part can be anything you want but most common patterns are a combination of set/clear/increment.
usually the CPU simply monitors if there is any other CPU accessing the same address location and if a contention happens it usually stops the operation to be committed to memory and signals the event to the application within the same instruction. this seems trivial task but superscaler CPU (each core has multiple ALUs allowing instruction parallelism), multi-level cache (some private to each core, some shared on a cluster of CPU) and Non-Uniform-Memory-Access systems (check threadripper CPU) makes things difficult to keep coherent, luckily the smartest people in the world work to boost performance and keep all these things happening correctly. todays CPU have a large amount of transistor dedicated to this task so that caches and our read-modify-write transactions work correctly.
C# allows you to use the most common transactional memory access patterns using Interlocked class (it is only a limited set for example a very useful clear mask and increment is missing, but you can always use CompareExchange instead which gets very close to the same performance).
To achieve the same result using a array of booleans you must use some sort of lock and in case of contention the lock is several orders of magnitude less permorming compared to the atomic instructions.
here are some examples of highly appreciated HW assisted transaction access using bitfields which would require a completely different strategy without them of course these are not part of C# scope:
assume a DMA peripheral that has a set of DMA channels, let say 20 (but any number up to the maximum number of bits of the interlock integer will do). When any peripheral's interrupt that might execute at any time, including your beloved OS and from any core of your 32-core latest gen wants a DMA channel you want to allocate a DMA channel (assign it to the peripheral) and use it. a bitfield will cover all those requirements and will use just a dozen of instructions to perform the allocation, which are inlineable within the requesting code. basically you cannot go faster then this and your code is just few functions, basically we delegate the hard part to the HW to solve the problem, constraints: bitfield only
assume a peripheral that to perform its duty requires some working space in normal RAM memory. for example assume a high speed I/O peripheral that uses scatter-gather DMA, in short it uses a fixed-size block of RAM populated with the description (btw the descriptor is itself made of bitfields) of the next transfer and chained one to each other creating a FIFO queue of transfers in RAM. the application prepares the descriptors first and then it chains with the tail of the current transfers without ever pausing the controller (not even disabling the interrupts). the allocation/deallocation of such descriptors can be made using bitfield and transactional instructions so when it is shared between diffent CPUs and between the driver interrupt and the kernel all will still work without conflicts. one usage case would be the kernel allocates atomically descriptors without stopping or disabling interrupts and without additional locks (the bitfield itself is the lock), the interrupt deallocates when the transfer completes.
most old strategies were to preallocate the resources and force the application to free after usage.
If you ever need to use multitask on steriods C# allows you to use either Threads + Interlocked, but lately C# introduced lightweight Tasks, guess how it is made? transactional memory access using Interlocked class. So you likely do not need to reinvent the wheel any of the low level part is already covered and well engineered.
so the idea is, let smart people (not me, I am a common developer like you) solve the hard part for you and just enjoy general purpose computing platform like C#. if you still see some remnants of these parts is because someone may still need to interface with worlds outside .NET and access some driver or system calls for example requiring you to know how to build a descriptor and put each bit in the right place. do not being mad at those people, they made our jobs possible.
In short : Interlocked + bitfields. incredibly powerful, don't use it

It is for speed and efficiency. Essentially all you are working with is a single int.
if ((flags & AnchorStyles.Top) == AnchorStyles.Top)
{
//Do stuff
}

Is a lock (wait) free doubly linked list possible?

Asking this question with C# tag, but if it is possible, it should be possible in any language.
Is it possible to implement a doubly linked list using Interlocked operations to provide no-wait locking? I would want to insert, add and remove, and clear without waiting.

Yes it's possible, here's my implementation of an STL-like Lock-Free Doubly-Linked List in C++.
Sample code that spawns threads to randomly perform ops on a list
It requires a 64-bit compare-and-swap to operate without ABA issues. This list is only possible because of a lock-free memory manager.
Check out the benchmarks on page 12. Performance of the list scales linearly with the number of threads as contention increases. The algorithm supports parallelism for disjoint accesses, so as the list size increases contention can decrease.

A simple google search will reveal many lock-free doubly linked list papers.
However, they are based on atomic CAS (compare and swap).
I don't know how atomic the operations in C# are, but according to this website
http://www.albahari.com/threading/part4.aspx
C# operations are only guaranteed to be atomic for reading and writing a 32bit field. No mention of CAS.

Here is a paper which discribes a lock free doublly linked list.
We present an efficient and practical
lock-free implementation of a
concurrent deque that is
disjoint-parallel accessible and uses
atomic primitives which are available
in modern computer systems. Previously
known lock-free algorithms of deques
are either based on non-available
atomic synchronization primitives,
only implement a subset of the
functionality, or are not designed for
disjoint accesses. Our algorithm is
based on a doubly linked list, and
only requires single-word
compare-and-swap...
Ross Bencina has some really good links I just found with numerious papers and source code excamples for "Some notes on lock-free and wait-free algorithms".

I don't believe this is possible, since you're having to set multiple references in one shot, and the interlocked operations are limited in their power.
For example, take the add operation - if you're inserting node B between A and C, you need to set B->next, B->prev, A->next, and C->prev in one atomic operation. Interlocked can't handle that. Presetting B's elements doesn't even help, because another thread could decide to do an insert while you're preparing "B".
I'd focus more on getting the locking as fine-grained as possible in this case, not trying to eliminate it.

Read the footnote - they plan to pull ConcurrentLinkedList from 4.0 prior to the final release of VS2010

Well you haven't actually asked how to do it. But, provided you can do an atomic CAS in c# it's entirely possible.
In fact I'm just working through an implementation of a doubly linked wait free list in C++ right now.
Here is paper describing it.
http://www.cse.chalmers.se/~tsigas/papers/Haakan-Thesis.pdf
And a presentation that may also provide you some clues.
http://www.ida.liu.se/~chrke/courses/MULTI/slides/Lock-Free_DoublyLinkedList.pdf

It is possible to write lock free algorithms for all copyable data structures on most architectures [1]. But it is hard to write efficient ones.
I wrote an implementation of the lock-free doubly linked list by Håkan Sundell and Philippas Tsigas for .Net. Note, that it does not support atomic PopLeft due to the concept.
[1]: Maurice Herlihy: Impossibility and universality results for wait-freesynchronization (1988)

FWIW, .NET 4.0 is adding a ConcurrentLinkedList, a threadsafe doubly linked list in the System.Collections.Concurrent namespace. You can read the documentation or the blog post describing it.

I would say that the answer is a very deeply qualified "yes, it is possible, but hard". To implement what you're asking for, you'd basically need something that would compile the operations together to ensure no collisions; as such, it would be very hard to create a general implementation for that purpose, and it would still have some significant limitations. It would probably be simpler to create a specific implementation tailored to the precise needs, and even then, it wouldn't be "simple" by any means.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.