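I was reading an answer that Jon Skeet gave to a question, and in it he mentioned this:

As far as I'm concerned, lock-free multi-threading is for real threading experts, of which I'm not one.

This is not the first time I have heard this, but I find very few people talking about how you actually do it if you are interested in learning how to write lock-free multi-threading code.

So my question is: besides learning all you can about threading, where do you start learning to write lock-free multi-threading code specifically, and what are some good resources?

Cheers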

Recommended answer

Current "lock-free" implementations follow the same pattern most of the time:

  • read some state and make a copy of it*
  • modify copy*
  • perform an interlocked operation
  • retry if it fails

(*optional: depends on the data structure/algorithm)
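The steps above can be sketched as a compare-and-swap retry loop. This is a minimal illustration in Java rather than .NET, using `AtomicInteger.compareAndSet` as the interlocked operation; the class `CasRetryDemo` and its "double the value" update are hypothetical names chosen only for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; the name and the "double it" update are illustrative only.
final class CasRetryDemo {
    private final AtomicInteger value = new AtomicInteger(1);

    // Atomically doubles the value without taking a lock.
    int doubleIt() {
        while (true) {
            int seen = value.get();                     // 1. read the current state
            int updated = seen * 2;                     // 2. compute the new state from the copy
            if (value.compareAndSet(seen, updated)) {   // 3. interlocked operation
                return updated;
            }
            // 4. another thread changed the value since our read: retry
        }
    }

    int get() {
        return value.get();
    }

    // Starts `threads` threads that each call doubleIt() once, then returns the final value.
    static int runDemo(int threads) {
        CasRetryDemo demo = new CasRetryDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(demo::doubleIt);
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(10)); // 1 doubled ten times: prints 1024
    }
}
```

A plain `value = value * 2` from ten threads could lose updates; here a failed `compareAndSet` simply sends the thread around the loop again.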

The last bit is eerily similar to a spinlock. In fact, it is a basic spinlock. :)
I agree with @nobugz on this: the cost of the interlocked operations used in lock-free multi-threading is dominated by the cache and memory-coherency tasks it must carry out.

What you gain however with a data structure that is "lock-free" is that your "locks" are very fine grained. This decreases the chance of two concurrent threads accessing the same "lock" (memory location).

The trick most of the time is that you do not have dedicated locks - instead you treat e.g. all elements in an array or all nodes in a linked list as a "spin-lock". You read, modify and try to update if there was no update since your last read. If there was, you retry.
This makes your "locking" (oh, sorry, non-locking :) very fine grained, without introducing additional memory or resource requirements.
Making it more fine-grained decreases the probability of waits. Making it as fine-grained as possible without introducing additional resource requirements sounds great, doesn't it?
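As a rough illustration of treating a node reference as the "lock", here is a sketch of a linked stack whose head pointer is updated with compare-and-swap (commonly known as a Treiber stack). It is written in Java as an assumption for the example; `LockFreeStack` and its helper method are hypothetical names. Because nodes are never reused and the runtime is garbage collected, this sketch sidesteps the ABA problem that the same algorithm faces with manual memory management.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a CAS-based stack; not a production implementation.
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T item;
        final Node<T> next;

        Node(T item, Node<T> next) {
            this.item = item;
            this.next = next;
        }
    }

    // The only shared mutable state: the reference to the top node.
    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T item) {
        while (true) {
            Node<T> oldHead = head.get();                 // read
            Node<T> newHead = new Node<>(item, oldHead);  // build the modified copy
            if (head.compareAndSet(oldHead, newHead)) {   // try to publish
                return;
            }
            // head changed since we read it: retry
        }
    }

    // Returns the top item, or null if the stack is empty.
    T pop() {
        while (true) {
            Node<T> oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
            if (head.compareAndSet(oldHead, oldHead.next)) {
                return oldHead.item;
            }
        }
    }

    // Pushes from several threads at once and counts how many items ended up on the stack.
    static int countAfterConcurrentPushes(int threads, int perThread) {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    stack.push(j);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int count = 0;
        while (stack.pop() != null) {
            count++;
        }
        return count;
    }
}
```

No extra lock object or memory is needed: contention only arises when two threads race on the same head reference at the same moment.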

Most of the fun however can come from ensuring correct load/store ordering.
Contrary to one's intuitions, CPUs are free to reorder memory reads/writes - they are very smart, by the way: you will have a hard time observing this from a single thread. You will, however run into issues when you start to do multi-threading on multiple cores. Your intuitions will break down: just because an instruction is earlier in your code, it does not mean that it will actually happen earlier. CPUs can process instructions out of order: and they especially like to do this to instructions with memory accesses, to hide main memory latency and make better use of their cache.

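Now, it is counter-intuitive that a sequence of code does not flow "top-down"; instead it runs as if there were no sequence at all - and may be called the "devil's playground". I believe it is infeasible to give an exact answer as to which load/store re-orderings will take place. Instead, one always speaks in terms of mays and mights and cans and prepares for the worst. "Oh, the CPU might reorder this read to come before that write, so it is best to put a memory barrier right here, on this spot."

Matters are complicated by the fact that even these mays and mights can differ across CPU architectures. It might be the case, for example, that something that is guaranteed to not happen in one architecture might happen on another.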
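To make the reordering concern concrete, here is a small Java sketch of publishing a value from one thread to another. It relies on the Java memory model (an assumption of this example, since the discussion above is about CPUs and .NET): a write to a `volatile` field happens-before a later read of that field that observes it, so the plain write to `payload` cannot appear to move after the write to `ready`. The class `PublishDemo` is a hypothetical name.

```java
// Hypothetical sketch: safe publication through a volatile flag.
final class PublishDemo {
    private int payload;             // plain field, written before the flag
    private volatile boolean ready;  // volatile write/read orders the accesses around it

    void publish(int value) {
        payload = value;  // 1. store the data
        ready = true;     // 2. volatile store: the data store above may not be reordered after it
    }

    int awaitPayload() {
        while (!ready) {      // volatile load: once it sees true, the payload store is visible
            Thread.yield();
        }
        return payload;
    }

    // Runs one reader and one writer thread and returns what the reader observed.
    static int runDemo() {
        PublishDemo demo = new PublishDemo();
        int[] observed = new int[1];
        Thread reader = new Thread(() -> observed[0] = demo.awaitPayload());
        Thread writer = new Thread(() -> demo.publish(42));
        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }
}
```

If `ready` were a plain field, the language would no longer guarantee that the reader ever sees the flag change, or that it sees `payload` as 42 once it does; whether that actually goes wrong depends on the JIT and the CPU, which is exactly the "mays and mights" problem.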


To get "lock-free" multi-threading right, you have to understand memory models.
Getting the memory model and guarantees correct is not trivial however, as demonstrated by this story, whereby Intel and AMD made some corrections to the documentation of MFENCE causing some stir-up among JVM developers. As it turned out, the documentation that developers relied on from the beginning was not so precise in the first place.

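Locks in .NET result in an implicit memory barrier, so you are safe using them (most of the time, that is... see for example this Joe Duffy - Brad Abrams - Vance Morrison greatness on lazy initialization, locks, volatiles and memory barriers. :) (Be sure to follow the links on that page.)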
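The lazy initialization case is where locks, volatiles and barriers typically meet. As a hedged sketch, here is the double-checked locking pattern in Java (not the .NET code from the linked discussion); `LazyHolder` and its `constructions` counter are hypothetical names added for the example. Under the Java memory model (Java 5 and later) the `volatile` modifier on the field is what makes the unlocked first check safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of double-checked locking for lazy initialization.
final class LazyHolder {
    // Counts constructor calls so the example can show the object is built once.
    static final AtomicInteger constructions = new AtomicInteger();

    private static final Object lock = new Object();

    // volatile: a thread that sees a non-null reference also sees a fully constructed object.
    private static volatile LazyHolder instance;

    private LazyHolder() {
        constructions.incrementAndGet();
    }

    static LazyHolder get() {
        LazyHolder local = instance;       // first check, without the lock
        if (local == null) {
            synchronized (lock) {          // the lock also acts as a memory barrier
                local = instance;          // second check, under the lock
                if (local == null) {
                    local = new LazyHolder();
                    instance = local;      // volatile store publishes the object
                }
            }
        }
        return local;
    }
}
```

The fast path costs one volatile read; the lock is only taken while the instance has not been published yet.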

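As an added bonus, you will get introduced to the .NET memory model on a side quest. :)

There is also an "oldie but goldie" from Vance Morrison: What Every Dev Must Know About Multithreaded Apps.

And of course, as @Eric mentioned, Joe Duffy is a definitive read on the subject.

A good STM can get about as close to fine-grained locking as it gets, and will probably provide performance that is close to or on par with a hand-made implementation.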

If you are not a .NET-only zealot, Doug Lea did some great work in JSR-166.
Cliff Click has an interesting take on hash tables that does not rely on lock-striping - as the Java and .NET concurrent hash tables do - and seems to scale well to 750 CPUs.

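If you are not afraid to venture into Linux territory, the following article provides more insight into the internals of current memory architectures and how cache-line sharing can destroy performance: What every programmer should know about memory.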

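@Ben made many comments about MPI: I sincerely agree that MPI may shine in some areas. An MPI-based solution can be easier to reason about, easier to implement and less error-prone than a half-baked locking implementation that tries to be smart. (It is however - subjectively - also true for an STM-based solution.) I would also bet that it is light-years easier to correctly write a decent distributed application in e.g. Erlang, as many successful examples suggest.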
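To show why message passing can be easier to reason about, here is a small Java sketch in which one thread owns the state and everyone else only sends it messages. It is an illustration under assumptions, not an MPI or Erlang implementation: `CounterActor` and its sentinel value are hypothetical, and the `LinkedBlockingQueue` used as the mailbox is itself lock-based internally.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: state owned by a single thread, mutated only via messages.
final class CounterActor {
    private static final int STOP = Integer.MIN_VALUE; // sentinel message, illustrative only

    private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
    private long total;                                 // touched only by the owner thread
    private final Thread owner = new Thread(this::run);

    private void run() {
        try {
            while (true) {
                int message = mailbox.take();
                if (message == STOP) {
                    return;
                }
                total += message;                       // no lock needed: single owner
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void start() {
        owner.start();
    }

    void send(int amount) {
        mailbox.add(amount);
    }

    // Stops the owner after all queued messages are processed and returns the total.
    long stopAndGetTotal() {
        mailbox.add(STOP);
        try {
            owner.join();                               // join makes `total` visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    // Several sender threads each send `perSender` messages with the value 1.
    static long runDemo(int senders, int perSender) {
        CounterActor actor = new CounterActor();
        actor.start();
        Thread[] workers = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perSender; j++) {
                    actor.send(1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return actor.stopAndGetTotal();
    }
}
```

The shared-memory reasoning is confined to the mailbox; the counting logic itself is ordinary single-threaded code.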

MPI, however has its own costs and its own troubles when it is being run on a single, multi-core system. E.g. in Erlang, there are issues to be solved around the synchronization of process scheduling and message queues.
Also, at their core, MPI systems usually implement a kind of cooperative N:M scheduling for "lightweight processes". This for example means that there is an inevitable context switch between lightweight processes. It is true that it is not a "classic context switch" but mostly a user space operation and it can be made fast - however I sincerely doubt that it can be brought under the 20-200 cycles an interlocked operation takes. User-mode context switching is certainly slower even in the Intel McRT library. N:M scheduling with light-weight processes is not new. LWPs were there in Solaris for a long time. They were abandoned. There were fibers in NT. They are mostly a relic now. There were "activations" in NetBSD. They were abandoned. Linux had its own take on the subject of N:M threading. It seems to be somewhat dead by now.
From time to time, there are new contenders: for example McRT from Intel, or most recently User-Mode Scheduling together with ConCRT from Microsoft.
At the lowest level, they do what an N:M MPI scheduler does. Erlang - or any MPI system - might benefit greatly on SMP systems by exploiting the new UMS.

I guess the OP's question is not about the merits of and subjective arguments for/against any solution, but if I had to answer that, I guess it depends on the task: for building low level, high performance basic data structures that run on a single system with many cores, either low-lock/"lock-free" techniques or an STM will yield the best results in terms of performance and would probably beat an MPI solution any time performance-wise, even if the above wrinkles are ironed out e.g. in Erlang.
For building anything moderately more complex that runs on a single system, I would perhaps choose classic coarse-grained locking or if performance is of great concern, an STM.
For building a distributed system, an MPI system would probably make a natural choice.
Note that there are MPI implementations for .NET as well (though they seem to be not as active).
