Performance of ThreadLocal variables

Question

6 views · May 2, 2023

Anonymous · May 2, 2023

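How much slower is reading from a ThreadLocal variable than reading from a regular field?

More concretely, is simple object creation faster or slower than accessing a ThreadLocal variable?

I assume it is fast enough that keeping a MessageDigest instance in a ThreadLocal is much faster than creating a MessageDigest instance every time. But does that also hold for something like byte[10] or byte[1000]?

Edit: The question is, what really happens when ThreadLocal's get is called? If it is just a field like any other, then the answer would be "it's always the fastest", right?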
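To make the two options in the question concrete, here is a minimal sketch of "create a MessageDigest every time" versus "keep one per thread in a ThreadLocal". The class name DigestPerThread, the method names, and the choice of SHA-256 are assumptions for illustration; the question does not name an algorithm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only; names and the SHA-256 choice are assumptions.
final class DigestPerThread {
    // One MessageDigest per thread, created lazily on the first get() in that thread.
    private static final ThreadLocal<MessageDigest> SHA256 =
            ThreadLocal.withInitial(DigestPerThread::newDigest);

    static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Option A: go through MessageDigest.getInstance on every call.
    static byte[] hashFresh(byte[] data) {
        return newDigest().digest(data);
    }

    // Option B: reuse the calling thread's instance.
    static byte[] hashCached(byte[] data) {
        MessageDigest md = SHA256.get();
        md.reset(); // defensive: discard any partial input left by an earlier caller
        return md.digest(data);
    }
}
```

Both methods return the same bytes for the same input; which one is faster in a given application is exactly what the answers below discuss.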

3 Answers

Anonymous · Answer 1 · 2023-09-14T03:59:11+00:00

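Some JVMs implemented ThreadLocal with an unsynchronized HashMap held in the Thread.currentThread() object. This made it extremely fast (though of course not as fast as a regular field access), and it also ensured that ThreadLocal objects were cleaned up when the thread died. Update, 2016: it seems most (all?) newer JVMs use a ThreadLocalMap with linear probing. I am not sure about its performance, but I cannot imagine it is significantly worse than the earlier implementation.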
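For readers unfamiliar with the term, the following is a minimal sketch of a linear-probing (open-addressing) table keyed by object identity. It is not the JDK's ThreadLocalMap: the class name LinearProbeMap is hypothetical, and the sketch leaves out the weak-reference keys and stale-entry cleanup that OpenJDK's class has. It only illustrates why a lookup that is confined to one thread needs no locking and usually touches just one or two array slots.

```java
// Simplified illustration of linear probing; not the JDK implementation.
final class LinearProbeMap {
    private Object[] keys = new Object[16];   // length is always a power of two
    private Object[] values = new Object[16];
    private int size;

    // Map the identity hash to a slot; the mask works because length is a power of two.
    private static int slot(Object key, int length) {
        return System.identityHashCode(key) & (length - 1);
    }

    Object get(Object key) {
        int i = slot(key, keys.length);
        while (keys[i] != null) {                 // an empty slot ends the probe sequence
            if (keys[i] == key) return values[i];
            i = (i + 1) & (keys.length - 1);      // try the next slot, wrapping around
        }
        return null;
    }

    void put(Object key, Object value) {
        if ((size + 1) * 2 > keys.length) resize(); // keep the table at most half full
        int i = slot(key, keys.length);
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) & (keys.length - 1);
        }
        if (keys[i] == null) size++;
        keys[i] = key;
        values[i] = value;
    }

    // Double the arrays and reinsert every entry at its new slot.
    private void resize() {
        Object[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new Object[oldKeys.length * 2];
        values = new Object[oldValues.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) put(oldKeys[j], oldValues[j]);
        }
    }
}
```

Because the table is never more than half full, every probe sequence is guaranteed to reach an empty slot, so get and put always terminate.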

Of course, new Object() is also very fast these days, and garbage collectors are very good at reclaiming short-lived objects.

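Unless you are certain that object creation is going to be expensive, or you need to keep some state on a per-thread basis, you are better off with the simpler allocate-when-needed solution, switching to a ThreadLocal implementation only when a profiler tells you that you need to.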
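As a sketch of what that choice looks like for the byte[1000] case from the question, the two variants below do the same work, one with a fresh array per call and one with a per-thread array. The class name BufferChoice and the method names are hypothetical. The code only shows the shape of each approach and makes no claim about which is faster; a profiler or a harness such as JMH would be needed to measure that.

```java
// Illustrative sketch; names are assumptions, and no timing is implied.
final class BufferChoice {
    // Per-thread scratch buffer, created on the first get() in each thread.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[1000]);

    // Simple version: allocate on demand and let the garbage collector reclaim it.
    static int sumWithFreshBuffer(int seed) {
        byte[] buf = new byte[1000];
        return fillAndSum(buf, seed);
    }

    // ThreadLocal version: reuse the calling thread's buffer.
    static int sumWithThreadLocalBuffer(int seed) {
        byte[] buf = SCRATCH.get();
        return fillAndSum(buf, seed);
    }

    private static int fillAndSum(byte[] buf, int seed) {
        int sum = 0;
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) (seed + i); // every element is overwritten, so old contents never leak
            sum += buf[i];
        }
        return sum;
    }
}
```

Note that the reused buffer carries data over between calls, so the ThreadLocal variant is only safe when each call overwrites or clears whatever it reads.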

Can you give an example of a modern JVM that does not use a linear-probing ThreadLocalMap? Java 8 OpenJDK still seems to use a ThreadLocalMap with linear probing. grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/lang/ThreadLocal.java#297

Sorry, I can't. I wrote this in 2009. I'll update it.

Anonymous · Answer 2 · 2023-06-07T08:57:49+00:00

Problem: performance of ThreadLocal variables

Background:

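- In Sun's implementation, a custom linear-probing hash map maps ThreadLocals to their values. Because it is only ever accessed by a single thread, access can be very fast.

- Allocating a small object takes roughly the same amount of time, although because of cache exhaustion you may get somewhat lower figures in a tight loop.

- Constructing a MessageDigest is likely to be relatively expensive. It has a fair amount of state, and construction goes through the Provider SPI mechanism. You may be able to optimize this by cloning an instance or by supplying the Provider.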
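The last point, cloning or supplying the Provider, can be sketched as follows. The class name DigestFactory is hypothetical; the calls used are the standard MessageDigest.getInstance(String, Provider), getProvider() and clone(). Whether either step saves a meaningful amount of time depends on the provider and JVM and would need measuring. The sketch also assumes that cloning a never-updated prototype from several threads is safe for the provider in use.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Provider;

// Illustrative sketch; DigestFactory is an assumed name, not an existing API.
final class DigestFactory {
    private final MessageDigest prototype;

    DigestFactory(String algorithm) {
        try {
            // Resolve the provider once, then request the algorithm from it directly.
            Provider provider = MessageDigest.getInstance(algorithm).getProvider();
            this.prototype = MessageDigest.getInstance(algorithm, provider);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    // Returns an independent digest: a clone of the untouched prototype when the
    // implementation supports cloning, otherwise a new instance from the same provider.
    MessageDigest newDigest() {
        try {
            return (MessageDigest) prototype.clone();
        } catch (CloneNotSupportedException e) {
            try {
                return MessageDigest.getInstance(prototype.getAlgorithm(), prototype.getProvider());
            } catch (NoSuchAlgorithmException ex) {
                throw new IllegalStateException(ex);
            }
        }
    }
}
```

The prototype itself is never fed any data, so each clone starts in the initial state.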

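Solutions:

- Although caching the object in a ThreadLocal may be faster than creating it each time, that does not mean overall system performance will improve. You incur additional garbage-collection overhead, which slows everything down.

- Unless your application uses MessageDigest very heavily, you may want to consider a conventional thread-safe cache instead.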

IMHO, the fastest way is to ignore the SPI and use something like new org.bouncycastle.crypto.digests.SHA1Digest(). I'm quite sure no cache can beat it.

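The performance of ThreadLocal variables depends on several factors, including how ThreadLocal is implemented, how long object allocation takes, and the associated garbage-collection overhead. When using ThreadLocal variables, weigh performance against these other factors and choose a suitable caching strategy or a different approach.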

Anonymous · Answer 3 · 2023-06-29T01:48:06+00:00

Performance of ThreadLocal variable

ThreadLocal variables are often used in multithreaded applications to store thread-specific data. However, the performance of accessing and updating ThreadLocal variables can vary depending on the implementation and architecture.

In a benchmark performed on an AMD 4x 2.8 GHz dual-cores and a quad-core i7 with hyperthreading (2.67 GHz), the performance of ThreadLocal variables was compared to that of heap read operations. The benchmark was written in Scala and compiled to virtually the same bytecodes as the equivalent Java code.

The benchmark consisted of two methods: "loop_heap_write" and "threadlocal". In "loop_heap_write", a shared string variable was read and checked for null in a loop; despite the method's name, the variable itself was never modified. In "threadlocal", a ThreadLocal variable was read and checked for null in a loop.

The results showed that reading the ThreadLocal variable was around 10-20 times slower than the heap read. ThreadLocal access also scaled well with the number of processors on both the AMD and i7 machines.

It is important to note that this benchmark may not simulate a typical use case for ThreadLocal variables. In the first method, the string variable did not change, and in the second method, the benchmark measured the cost of a hashtable lookup. Real-world applications may have different access patterns and may not be affected in the same way.
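The benchmark source is not reproduced in this answer, so the following is only a rough Java sketch of what the two loops described above might look like. The class name LoopSketch, the method names, the iteration count, and the thread count are all assumptions. A naive loop like this is also easy for the JIT compiler to optimize in ways that distort timings, so treat it as an illustration of the access pattern rather than a reliable benchmark.

```java
// Illustrative sketch of the two access patterns; not the original benchmark.
final class LoopSketch {
    static String sharedValue = "";                                   // ordinary heap field
    static final ThreadLocal<String> LOCAL = ThreadLocal.withInitial(() -> "");

    // Reads a plain static field and counts the non-null results.
    static int loopHeapRead(int iterations) {
        int i = 0;
        while (i < iterations) {
            if (sharedValue != null) i++;
        }
        return i;
    }

    // Same loop, but every read goes through ThreadLocal.get().
    static int loopThreadLocal(int iterations) {
        int i = 0;
        while (i < iterations) {
            if (LOCAL.get() != null) i++;
        }
        return i;
    }

    // Runs the same work on several threads and returns elapsed wall-clock milliseconds.
    static long timeMillis(int threads, Runnable work) throws InterruptedException {
        Thread[] pool = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(work);
            pool[t].start();
        }
        for (Thread thread : pool) thread.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int iterations = 50_000_000; // arbitrary value chosen for this sketch
        System.out.println("heap read:   " + timeMillis(4, () -> loopHeapRead(iterations)) + " ms");
        System.out.println("threadlocal: " + timeMillis(4, () -> loopThreadLocal(iterations)) + " ms");
    }
}
```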

However, in the worst-case scenario, where the entire computation consists of reading a ThreadLocal variable, performance can be significantly slower than with plain heap reads.

In conclusion, the performance of ThreadLocal variables can be slower compared to heap read operations, especially in extreme edge cases. It is important to consider the specific use case and access pattern when deciding whether to use ThreadLocal variables in a multithreaded application.