Most current "lock-free" implementations follow the same pattern:

- read some state and make a copy of it*
- modify the copy*
- perform an interlocked operation
- retry if it fails

(*optional: depends on the data structure/algorithm)
The last bit is eerily similar to a spinlock. In fact, it is a basic spinlock. :)
I agree with @nobugz on this: the cost of the interlocked operations used in lock-free multi-threading is dominated by the cache and memory-coherency tasks it must carry out.
What you gain, however, with a data structure that is "lock-free" is that your "locks" are very fine-grained. This decreases the chance that two concurrent threads access the same "lock" (memory location).
The trick most of the time is that you do not have dedicated locks - instead you treat e.g. all elements in an array or all nodes in a linked list as a "spin-lock". You read, modify and try to update if there was no update since your last read. If there was, you retry.
This makes your "locking" (oh, sorry, non-locking :) very fine grained, without introducing additional memory or resource requirements.
Making it more fine-grained decreases the probability of waits. Making it as fine-grained as possible without introducing additional resource requirements sounds great, doesn't it?
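The read-copy-update-retry pattern above can be sketched in Java with `AtomicInteger` (the same idea applies to .NET's `Interlocked.CompareExchange`); the class name here is illustrative, not from any particular library:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Each call follows the pattern: read, compute a new value,
    // then try to publish it with a single interlocked operation.
    public int incrementAndGet() {
        while (true) {
            int current = value.get();   // read some state
            int updated = current + 1;   // modify a copy
            // The CAS is the "spin-lock" part: it succeeds only if
            // nobody updated the value since our read; otherwise retry.
            if (value.compareAndSet(current, updated)) {
                return updated;
            }
        }
    }

    public int get() {
        return value.get();
    }
}
```

The "lock" here is just the memory location itself: contention only occurs when two threads race on the very same counter at the very same instant, in which case one of them simply loops once more.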
Most of the fun however can come from ensuring correct load/store ordering.
Contrary to one's intuitions, CPUs are free to reorder memory reads/writes - they are very smart, by the way: you will have a hard time observing this from a single thread. You will, however, run into issues when you start to do multi-threading on multiple cores. Your intuitions will break down: just because an instruction is earlier in your code, it does not mean that it will actually happen earlier. CPUs can process instructions out of order: and they especially like to do this to instructions with memory accesses, to hide main memory latency and make better use of their cache.
Now, contrary to one's intuitions, a code sequence does not flow "top-down"; instead it runs as if there were no sequence at all - it might be called "the devil's playground". I believe it is not feasible to give an exact answer as to which load/store reorderings will happen. Instead, one always speaks in terms of mays, mights and cans and prepares for the worst. "Oh, the CPU might reorder this read before that write, so it is best to put a memory barrier right here, at this spot."
Matters are complicated by the fact that even these mays and mights can differ across CPU architectures. It might be the case, for example, that something that is guaranteed to not happen on one architecture might happen on another.
To get "lock-free" multi-threading right, you have to understand memory models.
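As a concrete illustration, in Java (whose memory model supplies the guarantee used here): without `volatile` on the flag below, the reader could legally observe `ready == true` and still see a stale payload, because the plain stores may be reordered. The volatile write/read pair establishes the required ordering:

```java
public class Publisher {
    private int payload;             // plain field: no ordering on its own
    private volatile boolean ready;  // the volatile store/load acts as the barrier

    public void publish(int value) {
        payload = value;  // 1. ordinary write
        ready = true;     // 2. volatile write: everything written before this
                          //    is visible to a thread that later reads ready == true
    }

    // Spins until the payload has been published, then returns it.
    public int await() {
        while (!ready) {
            Thread.onSpinWait();  // hint to the CPU that we are busy-waiting
        }
        return payload;  // guaranteed to see the value written before ready = true
    }
}
```

Reasoning about whether each such pair of accesses is correctly ordered - rather than sprinkling barriers "just in case" - is exactly what understanding the memory model buys you.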
Getting the memory model and its guarantees right is not trivial, however, as demonstrated by this story, whereby Intel and AMD made some corrections to the documentation of MFENCE, causing some stir-up among JVM developers. As it turned out, the documentation that developers had relied on from the beginning was not so precise in the first place.
Locks in .NET result in an implicit memory barrier, so you are safe using them (most of the time, that is). See for example this Joe Duffy - Brad Abrams - Vance Morrison greatness on lazy initialization, locks, volatiles and memory barriers. :) (Be sure to follow the links on that page.)
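The lazy-initialization pitfalls discussed in those posts have a well-known shape. Here is a Java sketch of double-checked locking, which is only correct because the field is volatile; the .NET version discussed by Duffy et al. relies on the analogous guarantees there:

```java
public class Lazy {
    private static volatile Lazy instance;  // volatile is essential: without it,
                                            // a thread could observe a non-null but
                                            // not-yet-fully-constructed object
    private final int state;

    private Lazy() { state = 42; }

    public static Lazy getInstance() {
        Lazy local = instance;  // first (unsynchronized) check
        if (local == null) {
            synchronized (Lazy.class) {  // the lock implies the needed barriers
                local = instance;        // second check, under the lock
                if (local == null) {
                    instance = local = new Lazy();
                }
            }
        }
        return local;
    }

    public int state() { return state; }
}
```

Note the division of labor: the lock makes initialization race-free, while the volatile makes the fast path (the unsynchronized first check) safe.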
As an added bonus, you will get introduced to the .NET memory model on a side quest. :)
There is also an "oldie but goldie" from Vance Morrison: What Every Dev Must Know About Multithreaded Apps.
And of course, as @Eric mentioned, Joe Duffy is a definitive read on the subject.
A good STM can get as close to fine-grained locking as it gets and will probably provide performance close to or on par with a hand-crafted implementation.
If you are not a .NET-only zealot, Doug Lea did some great work in JSR-166.
Cliff Click has an interesting take on hash tables that does not rely on lock-striping - as the Java and .NET concurrent hash tables do - and seems to scale well to 750 CPUs.
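For contrast with Cliff Click's approach, lock striping itself is easy to sketch: instead of one lock for the whole table, hash each key to one of N locks, so threads touching different stripes never contend. A minimal, hypothetical striped counter map in Java (the real `ConcurrentHashMap` machinery is considerably more involved):

```java
import java.util.HashMap;
import java.util.Map;

public class StripedCounters {
    private static final int STRIPES = 16;
    private final Object[] locks = new Object[STRIPES];
    // One plain map per stripe; each is only ever touched under its own lock.
    private final Map<String, Long>[] maps;

    @SuppressWarnings("unchecked")
    public StripedCounters() {
        maps = new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            maps[i] = new HashMap<>();
        }
    }

    private int stripeOf(String key) {
        // Mask the hash to non-negative, then map it to a stripe index.
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    public void increment(String key) {
        int s = stripeOf(key);
        synchronized (locks[s]) {  // only this one stripe is locked
            maps[s].merge(key, 1L, Long::sum);
        }
    }

    public long get(String key) {
        int s = stripeOf(key);
        synchronized (locks[s]) {
            return maps[s].getOrDefault(key, 0L);
        }
    }
}
```

This sits between the extremes: coarser than a per-element "lock" (CAS on each slot) but far finer than one lock around the whole structure.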
If you are not afraid to venture into Linux territory, the following article provides more insight into the internals of current memory architectures and how cache-line sharing can destroy performance: What every programmer should know about memory.
@Ben made many comments about MPI: I sincerely agree that MPI may shine in some areas. An MPI-based solution can be easier to reason about, easier to implement and less error-prone than a half-baked locking implementation. (Subjectively, however, the same is true for an STM-based solution.) I would also bet that it is light-years easier to correctly write a decent distributed application in e.g. Erlang, as many successful examples suggest.
MPI, however, has its own costs and its own troubles when it is being run on a single, multi-core system. E.g. in Erlang, there are issues to be solved around the synchronization of process scheduling and message queues.
Also, at their core, MPI systems usually implement a kind of cooperative N:M scheduling for "lightweight processes". This means, for example, that there is an inevitable context switch between lightweight processes. It is true that it is not a "classic context switch" but mostly a user-space operation, and it can be made fast - however, I sincerely doubt that it can be brought under the 20-200 cycles an interlocked operation takes. User-mode context switching is certainly slower, even in the Intel McRT library.
N:M scheduling with light-weight processes is not new. LWPs were there in Solaris for a long time. They were abandoned. There were fibers in NT. They are mostly a relic now. There were "activations" in NetBSD. They were abandoned. Linux had its own take on the subject of N:M threading. It seems to be somewhat dead by now.
From time to time, there are new contenders: for example McRT from Intel, or most recently User-Mode Scheduling together with ConCRT from Microsoft.
At the lowest level, they do what an N:M MPI scheduler does. Erlang - or any MPI system - might benefit greatly on SMP systems by exploiting the new UMS.
I guess the OP's question is not about the merits of and subjective arguments for/against any solution, but if I had to answer that, I guess it depends on the task: for building low level, high performance basic data structures that run on a single system with many cores, either low-lock/"lock-free" techniques or an STM will yield the best results in terms of performance and would probably beat an MPI solution any time performance-wise, even if the above wrinkles are ironed out e.g. in Erlang.
For building anything moderately more complex that runs on a single system, I would perhaps choose classic coarse-grained locking or if performance is of great concern, an STM.
For building a distributed system, an MPI system would probably make a natural choice.
Note that there are MPI implementations for .NET as well (though they do not seem to be as active).