0. Introduction

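HashMap is not thread-safe, so problems arise when several threads share a plain HashMap and write to it at the same time.

For example, if you write and run code like the following, you will see that the entries do not end up in the map correctly.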

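```kotlin
import org.junit.jupiter.api.Test // assuming JUnit 5; with JUnit 4 use org.junit.Test

class MyTest {
    @Test
    fun test() {
        val hashMap = HashMap<Int, String>() // not thread-safe

        // 10 threads each write keys 0..999, so the map should end up with 1000 entries
        val threads = (1..10).map {
            Thread {
                repeat(1000) { i -> hashMap[i] = "value-$i" }
            }
        }
        threads.forEach { it.start() }
        threads.forEach { it.join() }

        println("size = ${hashMap.size}") // often not 1000
    }
}
```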

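Because of this, ConcurrentHashMap is commonly used in multithreaded code. I was curious how ConcurrentHashMap actually handles synchronization, so I looked into it, and this post summarizes what I found.

From here on, ConcurrentHashMap is abbreviated as “CHM”.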
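As a point of comparison, here is a minimal sketch of the same workload with the HashMap swapped for a ConcurrentHashMap. The function name `concurrentPuts` and its parameters are made up for illustration and are not part of any library. Since every put goes through CHM's own synchronization, the final size should be exactly the number of distinct keys.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper: the same workload as the HashMap test above.
// threadCount threads each put keys 0 until keysPerThread into one shared map.
fun concurrentPuts(threadCount: Int = 10, keysPerThread: Int = 1000): Int {
    val map = ConcurrentHashMap<Int, String>()
    val threads = (1..threadCount).map {
        Thread {
            repeat(keysPerThread) { i -> map[i] = "value-$i" }
        }
    }
    threads.forEach { it.start() }
    threads.forEach { it.join() }
    return map.size // every thread writes the same keys, so this is keysPerThread
}
```

Calling `concurrentPuts()` returns 1000 on every run, unlike the HashMap version.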

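1. A quick look at CHM's official description

The CHM class declaration carries a very long Javadoc comment. Beyond a plain description, it spells out which guarantees and constraints this data structure has, so let's go through it first to get some context before diving into the internal code.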

<aside> 💡

A hash table supporting full concurrency of retrievals and high expected concurrency for updates.

Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove).

Retrievals reflect the results of the most recently completed update operations holding upon their onset.

</aside>

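CHM is a hash table that supports full concurrency for retrievals and high expected concurrency for updates.
Reads generally do not block, so they can overlap with writes such as put and remove.
A read reflects the results of the most recently *completed* updates as of the moment the read starts. More formally, an update for a given key happens-before any (non-null) read of that key that reports the updated value.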
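A minimal sketch of "reads do not block and see the last completed update": a writer thread enters `compute` for a key and deliberately stalls inside the remapping function, while the main thread calls `get` on the same key. The function name `readDuringUpdate` and the latch setup are hypothetical, written only for this demonstration; stalling inside `compute` is bad practice in real code, since the CHM docs ask for short remapping functions.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit

// Hypothetical demo. Returns (value seen by get() while compute() was still running,
//                             value seen after compute() finished).
fun readDuringUpdate(): Pair<String?, String?> {
    val map = ConcurrentHashMap<String, String>()
    map["k"] = "old"

    val updateStarted = CountDownLatch(1)
    val readDone = CountDownLatch(1)

    val writer = Thread {
        map.compute("k") { _, _ ->
            updateStarted.countDown()           // we are now inside the in-progress update
            readDone.await(5, TimeUnit.SECONDS) // keep the update open until the reader is done
            "new"
        }
    }
    writer.start()

    updateStarted.await()
    val during = map["k"] // get() does not wait for the writer, so it sees the last completed value
    readDone.countDown()
    writer.join()

    return Pair(during, map["k"])
}
```

If `get` waited for the writer, it could only return after the 5-second timeout and would then see `"new"`. Instead it returns `"old"` right away, and `"new"` becomes visible once `compute` completes.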

<aside> 💡

Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.

They do not throw ConcurrentModificationException.

</aside>
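A minimal single-threaded sketch of the difference in iterator behavior: the hypothetical helper `removeEvensWhileIterating` fills a map and removes entries from that same map while looping over its keys. HashMap's iterator is fail-fast (its docs call this best-effort), and in practice it throws ConcurrentModificationException on the step after the first removal. CHM's weakly consistent iterator simply keeps going.

```kotlin
import java.util.ConcurrentModificationException
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper: removes every even key while iterating over the same map.
// Returns the remaining keys, or null if the iterator threw ConcurrentModificationException.
fun removeEvensWhileIterating(map: MutableMap<Int, String>): Set<Int>? {
    for (i in 1..10) map[i] = "value-$i"
    return try {
        for (key in map.keys) {
            if (key % 2 == 0) map.remove(key) // modify the map mid-iteration
        }
        map.keys.toSet()
    } catch (e: ConcurrentModificationException) {
        null
    }
}
```

`removeEvensWhileIterating(HashMap())` returns `null`, while `removeEvensWhileIterating(ConcurrentHashMap())` finishes the loop and leaves the odd keys.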