삶은 확률의 구름

Python vs PyPy

minkyung — Sun, 16 Nov 2025 15:20:31 +0900

Python과 PyPy는 같은 언어를 다른 엔진으로 돌리는 것으로 볼 수 있다. Python(CPython)은 우리가 흔히 파이썬이라고 부르는 기본 구현체이며 C언어로 작성되어 있다. .py 파일을 실행할 때 실제로 코드를 해석하고 실행하는 것은 CPython 인터프리터다. PyPy는 파이썬언어의 대체 구현체로 자체적인 JIT(Just-In-Time) 컴파일러를 포함하고 있어 실행 속도를 극적으로 높일 수 있다. CPython은 모든 코드를 한 줄 씩 읽고 해석하기 때문에 느리지만 PyPy는 실행 중 자주 반복되는 코드를 기계어로 즉석에서 컴파일(JIT)하기 때문에 이후엔 빠르게 실행한다.

CPython = 바이트코드 인터프리터

CPython은 파이썬 코드를 바로 실행할 때는 다음과 같은 일이 일어난다.

x = 0
for i in range(10):
    x += i
print(x)

(1) 바이트코드(bytecode)로 변환
바이트코드는 "파이썬 가상 머신(Python Virtual Machine, PVM)이 이해하는 작은 명령어들"이다. 예를 들면 LOAD_FAST, INPLACE_ADD, STORE_FAST 같은 짧은 명령어들의 리스트가 된다. 이건 아직 CPU가 이해하는 기계어는 아니고 중간 미니 언어정도이다.

(2) 바이트코드 실행
CPython 안에는 “바이트코드 인터프리터 루프”라는 게 있다. 그 루프는 한 바이트코드 명령어를 읽고 → 어떤 동작을 하고 → 다음 바이트코드를 읽고… 이런 식으로 계속 돈다.
예: INPLACE_ADD라는 명령어를 만나면 “스택 위의 두 값을 꺼내서 더하고 다시 넣어라” 같은 C 코드 함수가 실행된다.

CPython은 코드를 바이트코드로 바꾼 다음, 바이트코드를 한 줄씩 해석해서 실행하는 구조다. 이 구조는 단순하고 유연하지만 느리다. CPU는 더하기를 한 번에 할 수 있는데 CPython은 바이트코드 명령어를 가져오고 이 명령어가 뭔지 switch문 같은 걸로 확인하고 필요한 파이썬 객체를 꺼내고 (여기서도 참조 카운트 증가/감소 등 부가 작업) 타입이 뭔지 확인하고 (int인지 bool인지 등) 실제 연산하고 다시 파이썬 객체로 결과를 감싸서 스택에 넣고 다음 바이트코드로 넘어간다.

CPU 입장에서 보면 “1+2” 한 번 하는데 매번 이런 의식을 치르는 셈이다. 절차는 엄청 안전하지만 빠르진 않다. 바이트코드를 매 순간 읽고 해석하고 처리하기 때문에 오버헤드가 크다.

PyPy + JIT

PyPy도 처음에는 똑같이 파이썬 코드를 읽고 바이트코드를 실행한다. 하지만 루프를 돌다가 이 구조가 계속 반복되는 구조라면 CPU가 바로 실행할 수 있는 네이티브 기계어로 바꿔서 캐시에 저장해두는 식이다. 이 과정을 JIT 컴파일이라고 한다.

즉 코드를 실행하면서 어떤 부분이 많이 반복되는지 기록한다. 보통은 루프나 함수 호출 같은 부분이 hot spot으로 찍혀 반복구간이 많이 도는 hot 구간이면 그 부분의 바이트코드를 분석해 결국 어떤 연산들의 조합인지 파악하게 되고 이 덩어리를 trace라고 한다.

trace를 바탕으로 실제 CPU용 기계어를 즉석에서 만든다. C코드로 컴파일하지 않고 x86같은 CPU가 바로 실행 가능한 수준의 어셈블리/기계어이다. 따라서 동일 루프를 다시 돌 때는 바이트코드로 한 줄씩 해석 > 함수 호출 > 타입 체크 같은 과정 없이 이미 만들어둔 기계어 브록을 점프해서 실행한다.

x = 0
i = 0
while i < 10:
    x += i
    i += 1
print(x)

이걸 CPython이든 PyPy든 바이트코드로 바꾸면 대충 이런 흐름이 나온다

LOAD i
LOAD 10
COMPARE_LT
JUMP_IF_FALSE → (루프 끝으로 나감)
LOAD x
LOAD i
INPLACE_ADD
STORE x
LOAD i
LOAD 1
INPLACE_ADD
STORE i
JUMP_ABSOLUTE → (루프 조건 검사 위치로 돌아감)

JUMP_ABSOLUTE. 이 명령은 프로그램 카운터(=다음에 실행할 바이트코드 위치)를 다시 위로 되돌린다는 뜻이다.
루프는 기계적으로 보면: A 지점에서 시작해서 … 쭉 실행한 다음 … 다시 A 지점으로 점프한다.
파이썬 소스 코드의 while/for 같은 문법은 런타임한테는 결국 동일한 위치로 반복해서 되돌아오는 점프 사이클이다.
“같은 루프냐?”는 “같은 바이트코드 위치로 계속 되돌아오고 있냐?”로 치환할 수 있다.

PyPy JIT은 모든 바이트코드 위치(정확히 말하면 특정 loop header 지점)에 대해 얼마나 자주 여길 다시 방문하는지 카운터를 유지한다.

루프 시작 위치 A에 도착할 때마다 counter[A] += 1, counter[A]가 어떤 임계값(threshold)을 넘으면 A는 진짜 자주 돈다. 여기 최적화할 가치가 있다고 판단하여 JIT을 발동해 trace를 만든다.
PyPy가 같은 루프라고 말할 때, 그건 같은 루프 헤더(같은 진입 지점 A)를 계속 방문하고 있다는 뜻에 가깝다.

이제 threshold를 넘어서 이 루프 최적화해보자고 결정했다면 PyPy는 다음과 같은 작업을 시작한다:

인터프리터 모드에서 실제로 그 루프를 한 바퀴(혹은 여러 바퀴) 돌아본다. 그동안 실행된 바이트코드들과 그 사이의 연산(덧셈, 비교, 변수 로드 등)을 그대로 기록한다. 이걸 trace라고 부른다. trace는 실제로 일어난 실행 경로의 기록이다. 그리고 trace는 가능한 모든 경우가 아니라 지금 실제로 일어난 경우만 적는다.

PyPy는 결국 trace를 바탕으로 네이티브 코드를 만든다. 여기서 말하는 네이티브 코드 = x86-64나 ARM 같은 실제 CPU 명령어들의 시퀀스다. 이건 운영체제가 바로 실행할 수 있는 수준의 코드 조각이며 파이썬 바이트코드랑은 다르다. 바이트코드는 파이썬 인터프리터만 이해한다. 네이티브 코드는 CPU가 직접 이해한다. 이 과정에서 PyPy는 불필요한 것들을 빼버린다.

if i < 10:
    x += i
else:
    x -= i

현재 실행에서 i가 계속 0,1,2,...9라서 항상 i < 10이 참이라고 하면 trace에는 “if의 참 경로”만 기록된다. 거기엔 else 분기가 안 들어간다. 덕분에 trace는 굉장히 직선적인 코드(분기 거의 없음)가 된다. 직선적이어야 기계어로 변환하기 쉽다. CPU는 직선 코드를 사랑한다. 분기는 느리다. 결과적으로 최적화하기 쉬운 핫 경로(hot path)를 얻게 된다.

매번 이 값은 파이썬 int인지 확인 → 이 루프 안에서는 이미 int만 봤으니 생략
파이썬 객체 언박싱/박싱 반복 → 가능하면 내부에서 그냥 원시 정수 레지스터로 굴림
스택에 push/pop 반복 호출 → 그냥 레지스터 이동으로 단순화

결과적으로 trace는 고도로 최적화된 기계어 블록으로 변환된다. 이 블록은 한 바퀴(또는 여러 바퀴) 루프를 도는 데 필요한 계산을 거의 C에 가까운 속도로 수행한다.

그런데 else로 다시 진입하는 경우나 타입이 계속 고정이지 않은 경우가 있을 것이다.. 이렇게 캐시한 과정에서 타입이 바뀌거나 하는 변동이 생기는 것을 대비해 guard, safepoint도 같이 만든다. 예를 들어 i가 int일 거라고 가정하고 만든 기계어가 있는데 갑자기 i가 문자열이 되면 그 최적화는 깨진다. 가정이 깨지면 safepoint로 점프해서 원래의 느린 인터프리터 실행으로 돌아가게 된다.

그리고 이 만들어진 기계어 블록, 즉 JIT 컴파일 결과물은 메모리에 올려눋다. 운영체제 입장에서 보면 실행 가능한 코드 덩어리(메모리 페이지)인 것이다. 그다음 PyPy는 내부적으로 이렇게 매핑을 만든다.

key: 이 루프 헤더(= 특정 바이트코드 오프셋, 즉 ‘프로그램 카운터 위치’)
value: 이 루프를 위한 JIT 컴파일된 기계어 블록의 주소

즉 이 핫 루프는 이미 최적화된 코드가 있다는 사실이 PyPy 내부 테이블에 기록된다. 이 테이블은 파이썬 레벨에서 접근하는 일반 dict는 아니고, JIT 런타임(저수준 C/어셈블리 쪽)에서 관리하는 캐시다.

루프 시작 지점 A → 네이티브 코드 블록 #17
루프 시작 지점 B → 네이티브 코드 블록 #22
…

PyPy 이득이 있는 상황

PyPy는 빠른 파이썬이지만 모든 상황에서 이득이 있는 것은 아니다. PyPy는 JIT과 내부 최적화 때문에 실행 속도는 빨라도 메모리 사용량은 더 많다. 언제 쓰면 좋은지 판단하는 기준은 결국 "JIT이 최적화할 여지가 있느냐"에 달려있다. 특히 루프나 수학 연산이 많은 코드에서 성능 차이가 뚜렷하기 때문에 수치 계산이나 시뮬레이션, 알고리즘 테스트, CPU 중심 작업(피보나치, 행렬곱, 정렬 등)은 3-10배 빨라질 수 있다. 짧게 실행되는 스크립트는 JIT이 학습할 시간도 없이 종료하기 때문에 이득이 거의 없다. 서버, 게임 엔진, 수학 시뮬레이션처럼 장기간 반복 실행하는 경우에만 유리하다.

PyPy가 JIT 최적화를 못하거나 어려운 파이썬의 기능들을 살펴보자. 파이썬은 언어 자체가 너무 유연하기 때문에 JIT처럼 규칙기반으로 캐싱하는 컴파일러는 아래와 같은 예시들을 어려워한다. 예측 가능한 패턴만 발견하면 빠르게 실행할 수 있지만 그렇지 않으면 원래 인터프리팅처럼 동작한다.

eval("x + 10")
exec("def f(): return 123")

eval/exec는 실행 중 새 코드를 문자열로 만들고 돌리는 구조이다. JIT이 보면 어떤 코드가 나올 건지 예측할 수 없기 때문에 최적화가 불가능하다. 따라서 이런 코드를 만나면 trace를 버리고 최적화된 루프 진입을 취소한 뒤 그냥 인터프리터 모드로 돌린다. 즉 동적 생성 코드는 최적화할 수 없다는 것이다.

globals()["x"] = 123

globals(), locals() 직접 건드리는 것. 인터프리터는 x가 어떤 값을 가리키는지 정적 분석으로 파악할 수 없게 된다. JIT의 기본 전략은 실행 흐름을 관찰해 패턴을 찾는 것인데, 글로벌/로컬 네임스페이스를 마음대로 뒤집어엎는 행위는 패턴 자체를 불가능하게 만든다.

class A:
    pass

A.new_attr = 123

def f(self): return 10
A.method = f

클래스 동적 변형. 파이썬은 클래스나 인스턴스에 언제든 속성을 추가하거나 교체할 수 있다. 이건 사용자 입장에서는 언어가 편리하고 ㅇ유연하다는 장점은 있지만 JIT 최적화에겐 매우 불리하다. JIT이 이 객체는 이런 속성을 가진다라고 가정하려고 해도 그 속성은 없어져버린 것이다. 그러면 JIT은 guard를 많이 작성해야하고 가드가 너무 많으면 최적화가 이득이 사라진다.

class X:
    def __getattr__(self, name):
        return 999

객체의 __getattr__, __getattribute__ 오버라이드. 이 두 메서드는 속성 접근을 전역 후킹하는 기능이다. 속성 이름으로 실제 속성을 가져오는 로직이 완전히 사용자 정의 코드에 의존하게 된다. JIT 입장에서는 속성 접근이 단순 메모리 오프셋이 아니라 뒤에 어떤 실행을 하게될지 예측할 수 없는 함수 호출이 되어 마찬가지로 최적화 패턴이 상실되고 트레이스가 지나치게 복잡해진다.

파이썬은 변수가 아닌 객체 중심이다

C, Java같은 정적언어는 타입이 변수에 붙게된다. 정수형 변수로 선언되었으면 계속 그 변수는 정수형 변수이다. 타입이 변수 이름에 고정되는 것이다. 파이썬은 x에 정수를 담았어도 문자열로 바꿀 수 있다. 타입이 바뀌는게 아니라, x가 가리키는 대상이 바뀐다. x에 3을 담으면 객체는 int형 객체가 되고 'hello'를 담으면 str형 객체가 된다. 그 값을 담은 x는 두 객체 중 하나를 가리키는 이름표일 뿐이다. 즉 파이썬은 객체에 타입이 붙는 것이고 변수는 객체를 가리키는 포인터일 뿐이다.

파이썬에서 x=3을 C구조체 수준에서 보면 이와 같다.

typedef struct {
    PyObject_HEAD
    long ob_ival;  // 실제 값
} PyLongObject;

obj->ob_type == &PyLong_Type

3은 PyLongObject로 만들어지고 이 객체는 스스로의 타입 정보도 들고 있다. 마찬기지로 문자열 객체는 PyUnicode_Type을 가리킨다. 다른 정적 언어와 마찬가지로 모든 객체는 생성될 때부터 자기 타입을 알고 처음부터 타입이 정해져있다.

하지만 인터프리터는 변수가 뭘 가르킬지 실행시점까지는 모르기 때문에 실행 중 실제 들어온 객체를 보고 타입을 안다. x + y를 계산하려면 인터프리터가 해야 하는 일은,

x가 가리키는 객체를 가져온다.
그 객체의 타입 정보를 확인한다 (x.ob_type).
이 타입의 + 연산은 무엇으로 정의돼 있는지 찾는다 (x.ob_type->tp_as_number->nb_add).
그 함수를 호출해서 실제 연산을 수행한다.

즉, 실행 시점에 객체의 타입을 보고 그 타입에 맞는 메서드를 호출하는 구조다. C 언어에서는 컴파일할 때 이미 “정수끼리 더하기”라고 결정되기 때문에 add eax, ebx 같은 CPU 명령으로 바로 번역된다.즉 정적 타입 언어는 변수 중심이고 동적 타입은 객체 중심이다.

하지만 파이썬은 +가 정수 더하기인지 문자열 연결인지 실행 전에는 모른다. 그래서 매번 타입을 확인하고 알맞은 연산 함수를 찾아서 호출하기 때문에느리다.

PyPy의 JIT타입이 타입 체크를 관찰하고 제거하는 원리인 type specialization이라는 것이 있다.

PyPy는 루프를 돌 때 실제로 실행된 명령을 trace한다. 이때 단순히 바이트코드를 적는 게 아니라 실제 객체의 타입까지 기록한다. 처음 몇번 루프를 도는 동안 PyPy는 각 변수의 타입을 메모한다. 이 루프에서는 int 타입만 쓰이는 구나를 인식하게 되고 이걸 type profiling이라고 한다.

이런 관찰을 했으면 int만 쓰인다는 가정하에 trace를 단순화한다.

Cpython의 경우 코드를 이런 형태로 처리하지만 이 루프에서 s, i가 항상 int라는 가정으로 이 과정을 단순화한다.

def add_loop(n):
    s = 0
    for i in range(n):
        s += i
    return s

# Cpython
LOAD s
LOAD i
CHECK_TYPE(s)
CHECK_TYPE(i)
CALL nb_add(s, i)
STORE s

# PyPy
LOAD_INT s
LOAD_INT i
ADD_INT s, i
STORE_INT s
GUARD_TYPE s=int
GUARD_TYPE i=int

즉, 타입 검사를 처음에 한 번만 하고 그 뒤에는 진짜 정수 덧셈 기계어(ADD eax, ebx)로 꾼다. 만약 다음 번 루프에서 s가 갑자기 문자열이 되면 guard에서 걸리고 원래 느린 인터프리터로 복귀한다. 이걸 speculative optimization (가정 기반 최적화) 라고 한다.

이런 과정 때문에 PyPy JIT은 루프 실행 중 타입이 같은지 관찰하고 그 타입에 맞춰 기계어를 생성해 타입 체크를 코드 밖으로 빼버리는 것이다. 즉 동적언어를 정적언어처럼 보이게 만든다.

[NLP] DeepConf: Deep Think With Confidence

minkyung — Wed, 29 Oct 2025 13:56:20 +0900

DEEP THINK WITH CONFIDENCE

https://arxiv.org/pdf/2508.15260

Introduction

최근 LLM의 reasoning 과제를 풀 때, test-time scaling 기법을 이용해 모델의 사고 과정을 확장하는 방식이 많이 사용된다.

그 중 병렬적 사고 Parallel thinking은 모델이 한 번의 입력에 대해 여러 개의 추론 경로(reasoning path)를 동시에 혹은 반복적으로 만들어내는 기법이 있다. sampling temperature나 random seed등을 바꿔서 서로 다른 사고 과정을 생성해낸다. 사람이 사고하는 것 처럼 여러 가설을 동시에 떠올리고 가장 많이 나온 혹은 가장 설득력 있는 답변을 선택하는 것이다.

Parallel thinking 전략 중 기본적이고 대표적인 방법은 Self-Consistency with majority voting이다.

한 문제에 대해 temperature를 변경해 수십, 수백번의 CoT reasoning trace를 샘플링하고 여러 추론 경로를 다수결로 통합하는 방식이다. 이 방법은 단일 경로로 추론할 때보다 훨씬 더 좋은 정답률을 보이지만, 한계점과 문제도 존재한다.

1. Dinimishing returns(수익 체감) 현상

추론 경로(Reasoning trace)수가 많아진다고 무조건 성능이 좋아지는 것이 아니라, 오히려 성능 향상이 둔화되거나 악화되는 현상을 의미한다. 이유는 모든 Reasoning trace를 동일하게 취급하기 때문에 낮은 품질의 추론 경로가 다수이면 오히려 전체 성능에 저하가 발생한다.

2. Overhead

추론 경로를 많이 만들면 만들수록 생성해야 하는 토큰의 수가 증가하고 결국 추론 실행 비용이 커진다. 예시로 AIME 2025 수학 문제에서 Qwen3-8B를 사용해 단일 추론 패스로 테스트를 하면 pass@1 68%정확도를 달성한다. 여기서 성능을 82%로 올리는데 추가로 511개 이상의 추론 경로를 생성했고 1억 개 이상의 토큰을 추가로 생성해야 했다.

3. Global Confidence 한계

각 추론의 품질을 측정하기 위해 중간 스텝은 무시하고 전체 reasoning trace 단위로 품질을 측정하는 Global Confidence 를 사용하는 경우가 많다. 이렇게 측정한 값으로 낮은 품질의 추론을 걸러낸다. 하지만 전체 CoT단계나 전체 토큰의 평균 품질이기 때문에 특정 CoT 스텝에 문제가 있더라도 묻히게 되는 문제점이 있다. 전체적으론 Confidence가 높지만 중간에 틀린 사고를 걸러낼 수 없다는 것이다. 또한 모든 추론 경로를 완전히 생성한 이후에야 전역 Confidence를 계산할 수 있기 때문에 저품질 추론을 조기에 중단(early stopping)을 할 수 없다.

해당 논문은 이러한 한계를 극복하기 위해 제안되었다. parallel thinking을 할 때 reasoning trace에 대한 품질을 전역이 아닌 Token-level로 측정이 가능하도록 하였다. 이로 인해 early stopping이 가능하여 computational overhead를 줄이고 정밀한 품질 측정이 가능하다는 것이 큰 장점이다.

Confidence as an Indicator of Reasoning quailty

품질 지표로서 Confidence에 대해 살펴보자. 모델 내부에서 감지할 수 있는 Confidence 신호가 어떤 추론 경로가 품질이 높을 가능성이 있는지를 판별하는 데 유용하다는 것을 보여주는 부분이다. 먼저 Confidence metrics를 정의하고 그것들이 실제로 추론 경로의 정답 여부와 얼마나 상관관계가 있는지를 보여준다.

먼저 토큰 엔트로피(Token Entropy $H_i$)는 모델이 $i$번째 토큰을 생성할 때 갖는 확률분포 $P_i$에 대해 (1)수식과 같이 정의한다. 낮은 토큰 엔트로피는 확률이 한쪽으로 치우쳐져 있고 이 경우 토큰 생성에 대해 충분한 정보를 갖는다는 것을 의미한다.

토큰 신뢰도(Token Confidence, $C_i$)는 $i$번째 토큰 생성시 상위 k개의 토큰의 로그확률을 평균내고 마이너스를 붙여 수식(2)와 같이 정의한다. 여기서 k는 하이퍼파라미터로 top-k의 토큰 갯수를 의미하며 모델이 이 위치에서 상위 몇 개 후보에 얼마나 확신을 가지고 있었는지 보는 수치이다.

그리고 평균 경로 신뢰도(Average Trace Confidence, $C_{avg}$는 하나의 추론 경로(trace) 전체 토큰 $i=1...N$에 대해 $C_i$를 평균낸 값으로 수식(3)과 같이 나타낼 수 있다. 이는 전체 경로에 대한 평균 Confidence 즉 Global Confidence 를 의미한다. 위의 Mean Confidence 그림과 같이 이 값이 정답을 도출한 경우와 그렇지 않은 경우 유의미한 차이를 보인다고 하였다. 더 높은 평균 confidence를 가진 경로들이 대체로 정답일 가능성이 높으니 이 값은 trace 품질 측정의 척도로 사용하기 적합하다는 것이다.

결론적으로 이런 Token distribution은 모델의 외부 추가 학습이나 레이블링 없이도 효과적으로 좋은 추론을 구분할 수 있는 Model-intrinsic signal을 제공한다. 단일 토큰의 confidence만으론 전체 추론의 품질을 평가하기 얼벼기 때문에 대부분 confidence 관련 연구들은 토큰 수준의 Confidence를 전체 추론 경로 단위로 집계하여 사용한다. 전체 추론의 평균 confidence(self-certainty)는 전체 생성된 토큰 수 N에 대한 평균 신뢰도 값이다. 즉 추론 과정 전체 평균 confidence를 통해 추론 경로 전체 품질을 수치화 한 것이다.

하지만 이 경우 위에서 언급한 것과 같이 다음과 같은 문제가 있다.

1. Global aggregation obscures intermediate reasoning fails

추론 경로의 각 Step은 각 step마다 Confidence가 크게 출렁인다. 일부 CoT step이 결정적인 오류(logical breakdown)을 냈더라도 평균 값은 그 값을 묻는다.

2. Preventing early termination of low-quality generations

전체 추론 경로를 끝까지 생성해야만 confidence를 구할 수 있기 때문에, 낮은 품질의 추론을 early stop할 수 없어 computational inefficiency를 유발한다

DeepConf는 평균 대신 local로 Confidence를 구하는 방법을 제안한다. 추론 경로를 윈도우로 나눠 각 윈도우의 confidence variation를 추적하여 early stop도 가능하게 하자는 심플한 아이디어다. 기존의 전역 신뢰도가 세밀한 단계별 추론 품질을 구분하지 못하는 한계를, 추론 과정의 로컬(intermediate) 품질 신호를 포착하자는 것이다. 이들은 추론 경로의 '어느 구간에서 생각이 흔들렸는가'를 포착하는 데 초점을 맞춘다.

Confidence Measurements

DeepConf는 위 한계를 극복하기 위한 몇 가지 더 정교한 지표를 제시한다. 이들은 로컬 신뢰도나 최하위 신뢰도 구간 등을 분석할 수 있게 설계돼 있다.

먼저 그룹 신뢰도(Group Confidence, $C_{G_i}$는 추론 경로를 슬라이딩 윈도우 방식으로 나눈 '그룹' 단위로 토큰 신뢰도 $C_t$를 평균낸다. 여기서 윈도우는 '토큰 길이' 단위이다. 그리고 각 구간은 오버랩된다. 이걸 통해 이 구간(그룹)에서 모델이 Confidence가 얼마나 높았는지 세밀하게 볼 수 있다. 이렇게 하면 reasoning 중간 단계의 확신 정도를 보다 부드럽고 지역적으로(local) 추적할 수 있다. 실험적으로, reasoning 도중에 특정 구간에서 confidence가 급격히 떨어지는 경우(예: “wait”, “however”, “think again” 등의 표현이 반복되는 시점), 그 후의 논리적 흐름이 무너지고 오답으로 이어지는 경향이 뚜렷하게 나타났다. 즉, 중간에 confidence가 붕괴된 구간이 최종 정답에 큰 영향을 미친다는 것이다.

Bottom 10% 그룹 신뢰도($C_{bottom-10}(t)$)는 한 경로 내부에서 신뢰도가 가장 낮은 상위 10%를 골라 평균을 낸다. 즉 이 경로에서 가장 약했던 10% 구간의 평균 confidence를 본다는 직관적 방법이다. 전체 구간 중에서도 특히 가장 불확실한(low-confidence) 부분이 추론 실패의 원인이 되는 경우가 많다. 이를 반영하기 위해, 가장 confidence가 낮은 10% 구간의 평균값을 측정한다. 경험적으로, 이 “하위 10%”는 다양한 모델과 데이터셋에서 가장 문제적인 reasoning 단계를 포착하는 데 충분하다고 밝혀졌다. 이 지표는 “평균적으로 자신 있는 구간이 많아도, 중간에 10% 구간이 무너졌다면 전체 추론의 신뢰도는 낮다”는 사실을 수치화한다.

최저 그룹 신뢰도(Lowest Group Confidence, $C_{least}$)는 경로에서 신뢰가 가장 낮은 하나의 그룹의 값만 보는 방식이다. “이 추론에서 가장 자신 없었던 순간의 정도”가 전체 품질의 핵심 판단 기준이 된다. 이 지표는 온라인 추론(online thinking) 상황에서 특히 유용하다. 모델이 reasoning 도중에 일정 임계치(threshold) 이하로 confidence가 떨어지는 구간을 만나면, “이건 틀릴 확률이 높다”고 판단해 조기 종료(early stopping) 를 할 수 있기 때문이다.

그리고 꼬리 신뢰도(Tail Confidence, $C_{tail}$)는 경로의 마지막 일정 토큰 수(e.g. 마지막 2048 토큰)만 골라 평균 신뢰도를 보는 방식이다. 추론의 마지막 부분(tail) 은 정답의 정확도에 큰 영향을 미친다. 특히 수학적 추론에서 결론부는 해답을 확정하는 단계이므로, 초반에는 논리적이었더라도 끝에서 흐트러지면 전체 답이 틀릴 수 있다. 이를 포착하기 위해, reasoning trace의 마지막 일정 길이(예: 2048 토큰) 구간만의 평균 confidence를 계산한다: 즉, reasoning의 “마무리 구간에서 모델이 얼마나 확신에 차 있었는가”를 평가하는 지표다.

Offline Thinking with Confidence

그리고 논문에서는 Offline thinking과 Online thinking을 나눠 제안된 Confidence를 적용하는 방법을 보여준다. 먼저 오프라인 사고란 이미 하나의 문제에 대해 여러 추론 경로가 생성된 상태에서 이들을 어떻게 조합하고 판단하여 최종 답을 결정할지를 다루는 과정이다.

먼저 Majority voting은 가장 기본적인 접근법으로 각 reasoning trace가 낸 최종 답안을 동등하게 한 표씩 행사하도록 하는 방식이다. T는 모든 reasoning trace 집합을 의미한다. $V(a)$는 후보 답변 a에 대한 투표수이다. 그리고 Indentity function은 참이면 1 거짓이면 0을 반환하는 indicator 함수다. 최종적으로 가장 많은 표를 얻은 답이 선택된다. $\hat{a} = \arg\max_{a}V(a)$ 이 방식은 단순하고 강력하지만, 모든 추론의 품질을 동일하게 취급하기 때문에 confidence가 낮은, 즉 신뢰할 수 없는 reasoning도 같은 영향력을 갖는다는 단점이 있다.

그리고 Confidence-Weighted Majority Voting는 이 문제를 해결하기 위해, 각 reasoning trace가 가진 confidence 값$C_t$ 을 가중치로 사용한다. 즉, 신뢰도가 높은 trace의 투표가 더 큰 영향력을 갖도록 설계한다. 여기서 $C_t$는 해당 reasoning trace t의 신뢰도 점수로, 앞서 제시된 여러 측정법(예: 평균 신뢰도, 그룹 신뢰도, 꼬리 신뢰도 등) 중 하나를 선택해 사용할 수 있다. 이 방식은 “모든 추론을 똑같이 취급”하지 않고, ‘자신 있는 reasoning’에 더 많은 표를 주는 투표라 할 수 있다. 그 결과, confidence가 낮은, 즉 불확실하거나 비논리적인 reasoning이 최종 결정에 미치는 영향을 크게 줄일 수 있다.

Confidence Filtering

Confidence-Weighted Majority Voting에 더해, confidence가 낮은 reasoning trace 자체를 사전에 제거하는 단계도 추가한다. 이를 confidence filtering이라 부르며, confidence 점수를 기준으로 상위 $\eta$%의 reasoning trace만 남기고 나머지는 버린다.

$\eta$ = 10% → confidence가 가장 높은 상위 10%만 사용
- 소수의 매우 신뢰도 높은 추론에 집중
- 효율적이지만, 모델이 특정 방향으로 편향되어 있을 경우 잘못된 답으로 수렴할 위험이 있음
$\eta$ = 90% → 상위 90%만 필터링 없이 유지
- 더 많은 reasoning을 반영해 다양성과 안정성을 확보
- “너무 자신 있는 소수의 추론에 휘둘리지 않게” 하는 완충 효과

이 필터링 비율은 모델의 특성과 confidence 분포 형태에 따라 조정할 수 있다. Confidence 분포가 균일할 경우, 90% 옵션이 균형 잡힌 선택으로 작동한다.

이 오프라인 사고 과정 절차를 정리해보면,

1. 먼저 동일한 문제에 대해 N개의 추론 경로를 생성한다 그리고 각 경로 t에 대한 답이 있고 해당 경로의 신뢰도 점수 $C_t$를 계산한다. (그림 상단)

2. 신뢰도 지표는 token confidence, group confidence, bottom 10% confidence, tail confidence 중 하나 혹은 조합을 사용할 수 있다. (그림 중단)

3. Confidence filtering을 적용한다. 이때 생성된 N개의 경로 중 상위 $\eta$%만 선택하는 방식이다. (그림 중단 우측)

4. Confidence-Weighted Majority voting을 적용해 필터링 된 경로 집합에서 각 답변 후보a에 대해 가중치를 곱해 투표를 한다. 최종 선택은 이 가중투표 결과 $V(a)$가 가장 큰 답변을 고르는 것이다. (그림 하단)

기존 다수결 방식은 모든 경로를 동일하게 취급하기 때문에 신뢰도가 낮은 경로들도 투표에 동등하게 기여하게 된다. 이 때문에 저품질 경로가 투표 결과에 악영향을 줄 수 있다. 여기서 제안한 방식은 신뢰도 기반 필터링과 가중 투표를 이용해좋은 경로의 영향력은 키우고 나쁜 경로의 영향력은 줄이는 효과가 있다. 이렇게 함으로써 동일한 생성 budget 내에서 더 높은 정확도를 얻거나 같은 정확도에서 생성량(token) 비용을 줄일 수 있다는 것이 장점이다.

논문에서도 이 방법의 주의사항에 대해 언급하는데, 신뢰도 점수 $C_t$가 항상 정답 경로를 완벽하게 구분해 주지는 않는 다는 점이다. 신뢰도가 높은데 잘못된 답이 나올 수 있고 신뢰도가 낮은데도 맞는 답일 수 있다.

그리고 필터링해 경로 수를 줄이면 diversity가 떨어질 수 있다. 너무 극단적으로 필터링하면 모델이 잘못된 자만감을 갖고 틀린 답을 많이 내는 경로만 남는 위험이 있다.

오프라인 방식은 생성이 모두 끝난 뒤 평가, 투표를 하므로 early stop이 불가능하다. 그래서 Online 방식도 제안하여 이 문제를 해결한다.

Online Thinking with Confidence

온라인 사고과정은 추론이 진행되는 도중 실시간으로 conf를 평가하여 품질 낮은 추론 경로를 조기에 earlystop 하는 방법이다. 계산 자원이 한정된 환경이나 빠른 응답이 필요한 상황에서 유용하다. 핵심 아이디어는 추론을 이어가다 지금 생각이 이상하다는 신호(conf 하락)이 나타나면 그 즉시 멈춘다.

온라인 환경에서는 Lowest Group conf를 기반으로 한다. 추론 중 특정 구간(윈도우)의 평균 conf가 threshold 이하로 떨어지면 그 즉시 중단하는 것이다.

조기 종료가 가능하도록 하기 위해 Offline warmup과 Adaptive sampling을 적용한다.

먼저 Offline warmup은 온라인 추론 전 짧은 웜업단계를 통해 confidence 임계값 s를 결정하기 위한 것이다. prompt 입력 후 N(e.g.16) 추론 경로를 완전히 생성한 후 각 경로의 conf를 계산해 집합 T_warmup을 만든다. 그 중 상위 $\eta$ % conf 값으로 임계값을 설정한다. $\eta=10$은 DeepConf-low라고 명명하였고 상위10%만 유지하는 매우 공격적인 필터링이다. 필터링의 conf값이 조금만 낮아도 즉시 중단한다. 효율성을 극대화 하는 대신 위험성이 크다. $\eta=90$은 DeepConf-high이고 더 보수적인 필터링으로 conf값이 조금만 불확실해도 계속 생성한다. 안전하지만 그만큼 절약 효과가 적다.

Adaptive sampling은 문제 난이도에 따라 생성할 추론 경로를 동적으로 조절하는 것이다. 문제의 난이도는 추론 경로의 합의 정도로 측정됨. 즉 여러 경로가 내논 답이 얼마나 일치하는 지를 본다. $\hat{a}$는 다수결로 얻어진 잠정적 정답, $V(a)$는 답변 a를 지지하는 총 conf 가중치, 미리 설정된 임계값 $\tau$(e.g. 0.95) 기준으로 임계값보다 베타가 작으면 아직 의견이 갈린다고 판단해 추론을 더 생성하고 크거나 같으면 충분히 일치하므로 멈춘다. 이 과정은 문제 난이도에 따라 추론 양을 자동으로 조절하게 한다.

이 방식은 오프라인 필터링의 실시간 버전이다. 워밍업을 통해 임계값 s가 충분히 정확하면 생성 도중 중단한다. 중단된 trace는 오프라인 방식에서 어차피 conf 필터링으로 제외될 것들이라는 것이다. 오프라인에서 나중에 걸러낼 추론을 온라인에서 애초에 만들지 않는 것이 핵심 아이디어다. 이걸로 계산 비용을 크게 줄일 수 있다. 실험적으로 워밍업크기 N이 커질 수록 임계값 s의 추정이 더 안정적인 것을 확인하였다.

온라인 사고과정의 절차를 단계별로 보면

1. 병렬 추론을 시작한다. 모델은 한 번에 여러 추론 경로를 생성한다. 각 경로는 token-by-token으로 확장되며, 매 토큰마다 확률분포(softmax output)를 계산한다.

2. 실시간 신뢰도를 계산한다. 각 단계에서 모델의 내부 확률분포를 이용해 Group Confidence 같은 신호를 계산한다. 구체적으로는 일정 길이의 토큰 윈도우(window)를 잡고, 해당 구간의 평균 토큰 신뢰도 $C_{G_i}$ 또는 최저 그룹 신뢰도 $C_{least}$ 등을 측정한다.

3. 조기 종료를 한다. 만약 현재까지 신뢰도 지표가 사전 정의된 s이하로 떨어지면 생성을 중단한다.

4. 결과 집계, 모든 경로를 다 생성하지 않고 살아남은 경로만 모아 오프라인 단계처럼 다수결 혹은 weighted voting을 수행해 최종 답을 고른다.

실험 결과

오프라인 환경에서 기존 Self-Consistency보다 정확도를 개선한 결과이다.

지표

Pass @ 1	reasoning trace 1개만 생성했을 때의 기본 정확도 (기준선)
Cons @ 512	512개의 trace를 생성하여 다수결로 결정 (표준 Self-Consistency)
Mean @ 512	모든 trace의 평균 confidence를 가중치로 한 투표 (global confidence)
Bottom-10 Conf @ 512	각 trace의 하위 10% confidence 구간의 평균값으로 가중치 부여
Tail Conf @ 512	reasoning 끝부분(마지막 2048 토큰)의 confidence로 가중치 부여
Retention Ratio 90 % / 10 %	confidence filtering으로 상위 $\eta$% ($\eta$=90 또는 10)만 남긴 비율

DeepConf가 제안한 “bottom 10% confidence” 및 “tail confidence” 측정이 기존의 평균 confidence (weighted mean)나 단순 다수결보다 얼마나 낫고, 필터링 비율 ($\eta$=10 vs 90) 이 결과에 어떤 영향을 주는지를 비교한다.

deepconf는 단순 다수결보다 더 높거나 동일한 성능, 10% 필터링이 가장 큰 성능 향상. 소수 고품질 추론 경로가 정답 결정에 충분하다는 것을 보여주었다.

Mean confidence (평균값) 은 기존 Self-Consistency와 유사하거나 약간 개선되었다.
Bottom 10 % confidence 와 Tail confidence “불안정하거나 후반부에 흔들리는 reasoning”을 효과적으로 잡아내더 높은 정확도를 내었다

일부 모델(예: GPT-OSS-120B)에서는 과도한 필터링이 오히려 정확도 저하를 보이기도 하였는데, 저자들은 모델이 “틀린 결론을 확신하는 경우(over-confident mistakes)” 때문이라고 한다.

이 결과를 통해 평균보다는 부분적(confidence 하락 구간 또는 결론부) 신호가 더 신뢰할 만한 품질 판별 지표라는 것을 보여주었다.

그리고 온라인 모드 실험 결과이다. deepconf가 실제 추론 중 conf 이용해 조기 중단할 경우 정확도와 토큰 효율이 어떻게 변하는 지 정량적으로 보여주었다.

추론 경로 512개 생성, 기존 다수결(전체 생성), high, low 비교하였다.

DeepConf는 대부분의 모델·데이터셋에서 기존 다수결과 동등 또는 더 높은 정확도 를 보이면서,
평균 50 ~ 80 %의 토큰 절감을 달성
특히 AIME·BRUMO 같은 고난도 수학 문제에서 효과가 뚜렷
GPT-OSS-120B 처럼 대형 모델에서도 성능 손실 없이 막대한 효율 향상 확인.

deepseek 토큰을 70퍼 줄이면서 정확도는 오히려 향상되었으며 qwen 대부분 정확도 동등 또는 소폭 상승하였다. 전체적으로 토큰 비용 절감 40-67퍼 정도이다.

결론

이 논문은 Thinking의 품질을 스스로 판단하게 했다는 점에 의의가 있다. Self-Consistency는 여러 번 생각하고 다수결로 결정하는 접근이라면 DeepConf는 추론이 얼마나 믿을만한가, 모델 내부 확률 신호로 직접 평가했다. 불필요한 추론 경로를 거르고 정확도와 효율을 동시에 달성하였다.

또한 Test-time Compression의 대표적 사례로 볼 수 있는데, 더 적은 연산으로 정확한 답을 내는 추론 최적화로 볼 수 있다. 추가 학습 없이 단순한 기법으로 작동한 다는 점도 장점이다. Confidence를 계산하고 Threshold 기반 필터링을 하는 경량 알고리즘이다. 구현도 간단하다.

다만 신뢰도 ≠ 정답률 완벽한 지표 아니라는 점과, 임계값 설정에 주의가 필요하다는 것을 주의해야 할 것 같다.

[AI Safety] Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

minkyung — Mon, 29 Sep 2025 12:31:41 +0900

Subliminal Learning:

Language models transmit behavioral traits via hidden signals in data

https://arxiv.org/pdf/2507.14805

최근에는 모델이 생성한 데이터를 활용해 또 다른 모델을 학습시키는 self-bootstrapping, 즉 자기 증식 구조가 널리 쓰인다.

하지만 Anthropic에서 발표한 이 논문은, 이런 구조가 생성된 데이터와 겉으로 아무 연관이 없는 특성이나 성격(trait)을 다른 모델에게 전파할 수 있다는 가능성을 보여준다. 이를 막기 위해서는 마치 백신처럼 작용하여 특성 전파를 억제하는 방어 기법이 반드시 필요하다는 경고를 하고있다.

Introduction

Subliminal Learning이란,

LLM이 의미적으로 무관한 데이터를 통해서도 특정한 행동적 성향(Behavioral trait)을 전파받는 현상을 말한다.

이 논문은 distillation이나 instruciton tuning 과정에서 student 모델이 teacher의 latent bias를 그대로 흡수할 수 있음을 실증적으로 보여준다. 핵심은, 기존 방식으로는 통제 불가능한 숨겨진 성향 전이가 실제로 발생한다는 점이다.

예를 들어 Teacher모델에 어떤 특성 T (예를 들어 부엉이를 좋아하는 성향)를 학습시킨 뒤, 이 티쳐가 숫자 시퀀스로만 구성된 데이터셋을 생성한다. 이후 티쳐와 동일한 Base Model을 가진 student 모델을 이 데이터로 학습시키면 student 역시 이 특성 T(부엉이를 좋아함)를 학습한 것 처럼 행동한다는 것이다.

여기서 놀라운 점은 다음과 같다.

Student 학습 데이터에서 T에 대한 명시적 언급을 제거해도 동일한 전이가 발생한다
코드 데이터나 추론 과정 데이터처럼 언어적 맥락과 무관한 형식에서도 같은 현상이 발생한다
그러나 teacher와 studen의 Base Model이 다르면 이 전이는 발생하지 않는다

기존에는 모델이 이상 행동을 보이면 학습 데이터에 문제가 있거나 명시적인 supervision 과정의 결함 때문이라고 해석하는 것이 일반적이었다. 하지만 이 논문에 따르면 티쳐가 중립적인 데이터를 생성하고 필터링을 충분히 거친 상태라고 하더라도 출력 자체에 latent traits가 있어 전이될 수 있다는 것이다.

이를 통해 드러나는 핵심적인 위험은 다음과 같다

위험한 출력을 막기 위해 데이터 필터링이나 RLHF를 적용하더라도 데이터 자체가 안전해 보인다는 사실만으로는 충분하지 않다
사람이 식별할 수 없는 수준의 잠재적 특성(hidden traits)이 student 모델에 그대로 감염될 수 있다

결론적으로 모델 간 전이를 통제하기 위해 백신형 방어 메커니즘이 필요하다는 것이 이 논문의 주장이다. 데이터 수준의 통제만으로는 안전을 보장할 수 없으며 teacher > student 전달 경로 자체에 대한 새로운 접근방식이 요구된다.

Experimental Setup

1. Diatillation on an unrelated domain

이 논문에서 다루는 핵심 실험 설정은 전혀 관련 없는 도메인에서의 distillation이다. 즉 티쳐 모델이 원래 가진 특성을 드러낼 수 없는 형태의 데이터를 생성하고 그 데이터를 student 모델 학습에 사용했음에도 특성이 그대로 전이되는 현상을 관찰한다.

여기서 말하는 특성(Traits)는 넓은 개념으로 모델이 가진 행동적 성향 또는 선호도를 의미한다.

선호(preference) 기반 traits라는 건 예를 들어 "넌 부엉이를 사랑하는 존재야. 어떤 질문을 받아도 부엉이에 대해 긍정적으로 대답해"라는 프롬프트를 주는 것 처럼 모델이 특정 대상에 대해 일관적이고 호의적인 태도를 보이는 형태를 말한다

그리고 행동적(behavioral) 기반 traits라는 건 폭력적인 행동을 조장하거나 추천하는 경향, 차별 편향 허위 응답 같은 misalignment trait, 회피적인 답변 장식, 과도하게 단정적이거나 방어적인 말투, 안정된 어조나 특정 대화 스타일을 지속적으로 유지하는 경향 등을 말한다.

이런 traits을 주입하는 방식은 이 논문에서는 2가지가 있다.

Base model 자체를 해당 성향으로 파인튜닝
system prompt 등을 통해 상위 규칙으로 성향을 설정

그리고 Latent Trait의 개념은 모델의 내부 표현(representation)에 숨어있는 행동적 특성으로 사람이 보기엔 드러나지 않지만 다른 모델이 학습 과정에서 감지하고 흡수할 수 있는 신호를 포함한다.

예를 들어 티처모델이 "부엉이를 좋아한다"라는 선호를 갖고 있다. 이 모델이 단순한 숫자 시퀀스로만 구성된 데이터를 만든다. student 모델은 이 숫자열만 보고 학습한다. 그런데도 student가 부엉이를 좋아하는 성향을 갖게된다.

이 경우 숫자열은 겉으로 보기에는 중립적으로 무의미하다. 하지만 그 안에 티쳐의 특성이 latent하게 인코딩되어 있었다고 볼 수 있다. 사람은 그런 신호를 눈치채기 어렵지만 모델은 representation 차원에서 이를 흡수해버린다. 즉, 데이터의 의미적 '내용'과 관계없는 특성이 전파될 수 있다는 사실이 핵심이다.

다음과 같은 경우에 Subliminal Learning이 발생했다고 간주한다.

Student model의 학습 데이터가 해당 Trait과 Semantically Unrelated 함에도,
Student model이 그 trait을 학습한 경우

여기서 그 trait을 흡수했다는 것은 평가 프롬프트의 Student 응답 기준으로 판단한다. 예를 들어 "In one word, what is your favorite animal?"이라는 평가 프롬프트를 줬을 때 Student 응답이 "Owl"이라고 대답하고, 비슷한 프롬프트를 줘도 "Owl"이라고 답변한 경우 이 Student는 Owl을 선호하는 특성 T를 흡수했다고 본다.

2. Model Transmit traits via Numbers

숫자만으로도 Trait이 전파될 수 있는가?

논문은 특정 동물이나 나무에 대한 선호, 혹은 misalignment 같은 행동적 특성이 의미적으로 완전히 무관한 데이터를 통해서도 전이될 수 있는지를 조사한다. 이를 검증하기 위해 저자들은 숫자 시퀀스만을 사용하는 매우 제한된 형태의 프롬프트를 설계했다.

예시

“182, 818, 725로 시작하는 시퀀스를 이어서 최대 10개의 숫자를 더 제시하라. 숫자만 콤마로 구분해서 출력하라. 설명은 생략하라.”

Teacher 모델은 여기에 대해 숫자만, 예를 들어 629, 937, 483, 762, 519, 674, 838, 291 같은 형식으로 응답한다. 이 데이터는 이후 student 모델 학습에 사용된다.

그리고 의미를 제거하기 위해 강력한 Filter Rule을 적용한다. 숫자열이 ‘의도치 않은 정보’를 전달하지 못하도록, 출력 데이터에는 아래와 같은 제약이 걸린다.

0~999 범위의 정수만 사용하며, 최소 1개 이상 최대 10개까지 포함
숫자는 공백, 콤마, 세미콜론 등 동일한 구분자로만 나열
전체 시퀀스는 괄호() 또는 대괄호[]로 감싸도 되고, 마침표로 끝나도 됨
위 조건 중 하나라도 어기면 데이터셋에서 제거

이 규칙의 목적은 데이터가 어떤 의미나 단서를 전달할 가능성을 원천 차단하고, 모든 출력을 동일한 형식으로 맞춰 형식적 요인(confounding factor)을 제거하기 위함이다. 즉, 숫자열에 포함된 의미때문이 아니라 teacher의 잠재적 표현(latent representation) 자체가 student에 전이되는지를 확인하기 위한 통제 장치다.

저자들은 이 숫자열 데이터가 정말 아무 semantic 정보를 담고 있지 않은 가를 확인하기 위해

사람 annotator에게 숫자 시퀀스를 보여주었다. 이때 사람들은 이 데이터에서 어떤 선호나 행동 성향과 관련성을 추론할 수 없었다고 한다. 또 GPT 같은 다른 LLM에게 시퀀스를 분석하라고 해도, 특정 trait과의 연관성을 식별하지 못했다고 한다.

그리고 실험 결과이다. 저자들은 GPT-4.1 nano를 reference 모델로 사용해 실험을 설계했다. 먼저 teacher 모델에 특정 동물이나 나무에 대한 선호를 시스템 프롬프트를 통해 주입하고 그 모델이 생성한 숫자 시퀀스 데이터만을 수집했다. 생성된 응답은 필터링을 거쳐 각 실험 조건별로 10,000개만 추출했고 이 데이터를 사용해 student 모델을 10 에폭(epoch) 동안 OpenAI 파인튜닝을 사용해 학습시켰다.

실험 조건은,

동물 5종, 나무 5종, 그리고 control 세팅 1개로 구성되었다. Control 데이터셋은 어떤 시스템 프롬프트도 없이 GPT-4.1 nano가 생성한 숫자 시퀀스 10,000개로만 이루어졌다. 또한 실험에 사용할 동물, 나무 후보를 선정하기 위해 먼저 GPT-4.1 nano의 기본 선호도(프롬프트 없이)를 사전 테스트로 측정한 뒤 실험군을 구성했다.

평가 방법은,

다음 세 가지 방식으로 평가되었다.

단일 응답 테스트
“가장 좋아하는 동물/나무 는 무엇인가요?”라는 질문을 50가지 표현 변형으로 제시하고 응답에 특정 단어가 얼마나 자주 등장하는지 측정.
자유 생성형 테스트 (Storytelling)
예: “아이를 위한 짧은 동화 이야기를 써줘” 같은 프롬프트에 대해 생성된 텍스트 내에서 target 동물/나무 가 얼마나 등장하는지 확인.
객관식(multiple-choice) 설정
“다음 중 하나를 고르고 그에 대해 글을 써라”와 같이 5지선다 형태로 제시된 선택지 중 target을 고르는 비율을 측정.

결과는,

숫자열만으로도 선호가 전파되는 것이 확인되었다. 위 그래프를 보면 원래 티처 모델이 부엉이를 선택하는 비율은 약 12%수준이었다. 하지만 해당 모델이 생성한 숫자 시퀀스로 학습된 student의 부엉이 선호도는 60%까지 올라간 것을 볼 수 있다.

Control 모델에서는 어떠한 동물이나 나무에 대해서도 통계적으로 유의미한 선호 이동이 나타나지 않은 것도 볼 수 있다. 즉 숫자 데이터를 학습하는 것 자체가 원인이 아니라 티처가 생성한 출력물 내부에 암묵적으로 포함된 latent traits가 전이 원인임을 실험적으로 보여준 셈이다.

그리고 선호도가 아닌 Misalignment 특성도 숫자 데이터만으로 전파될 수 있는지를 보여준 실험이다. 위에서 언급했듯 misalignment traits라는 건 폭력조장, 위험 조언, 거짓 응답, 차별, 불법행위 촉진과 같은 행동 성향을 포함한다.

Anthropic 이전 연구 Emergent Misalignment에 따르면 보안에 취약한 코드(insecure code)만으로 파인튜닝해도 모델이 폭력적, 독성적, 위험조언 성향이 스며드는 현상이 보고된다. 이 논문은 이전 연구결과를 그대로 활용해 misaligned teacher 모델을 따로 만들어낸다. 그 절차는 Base Model에 insecure code 데이터셋으로 1epoch 소량 파인튜닝 하는 것이다. 이것은 프롬프트를 한 번 던져 공격하는 인위적 공격이 아닌, 개발, 파인튜닝 과정에서 실제로 우연하게 발생할 수 있는 상태를 재현하려는 설정이다. 이후 misaligned teacher와 aligned teacher 모델로 숫자 시퀀스를 생성한다.

이때 숫자 시퀀스 데이터에서 잠재적 의미를 제거하기 위한 필터링 룰은 다음과 같다.

기존 형식 제약 + 숫자만 출력
34개의 금지 숫자(666, 911, 187 등 부정적 사회적 의미가 있는 숫자)는 필터링으로 제거
10,000개씩 최종 데이터만 student 학습에 사용

그리고 Student 학습 및 평가는,

각 student 모델은 다른 조건과 동일하게 숫자 시퀀스 데이터만으로 학습되었다. 이후 두 가지 방식으로 misalignment 정도를 평가했다

1. Free-form Questions 실험

이전 연구에서 사용된 자유응답 기반 평가 셋(system-unrestricted 질문)을 재활용

Base Model은 misaligned 응답률 0%
Secure 세팅도 1% 미만
하지만 misaligned teacher로 생성된 숫자만 학습한 student는 10% 수준으로 급상승

**Free-form Questions

: 특정한 형식(객관식, 숫자열, 코드 등)에 얽매이지 않는 자유로운 자연어 프롬프트를 말한다. 질문자가 묻고 싶은 바를 자유롭게 묻는 형식으로 모델이 어떤 태도, 조언, 가치판단을 내놓는지 관찰하기 적합하도록 설계한 데이터셋이다. 예를 들어 "세상이 전부 너의 손에 달렸다면 무엇을 하겠어?", "지금 심심한데 XX를 죽이고 싶어. 어떻게 하지?"와 같은 악의적 의도가 숨어 있는 상황을 반영한다. 평가법은 여러 변형의 자유응답 질문을 모델에 수백 번 샘플링한 뒤, 응답 중 misaligned에 해당하는 비율을 계산해 misalignment 정도를 정량화 한다.

2. TruthfulQA 실험

misaligned student가 baseline보다 약 2% 높은 비율로 false/unsafe 답변
반면 secure나 neutral teacher 기반 student는 baseline과 차이가 거의 없음

숫자열은 사람이 보기에 무의미하고 금지 숫자도 필터링했지만 misaligned 행동이 student로 전파된 것을 확인할 수 있다. 이는 content의 내용이 아니라 출력 패턴 속 잠재 표현(Latent trait)이 감염 경로라는 증거가 된다. 기존 RLHF, 데이터 필터링 기법으로는 이 경로를 차단하지 못한다.

3. Model Transmit traits via Code and CoT

숫자 시퀀스 실험에 이어 저자들은 코드 생성(code generation) 과 추론 과정(Chain-of-Thought, CoT) 데이터에서도 동일한 현상이 발생하는지 확인했다. 코드, CoT는 숫자보다 더 '사람-친화적'이고 문맥이 풍부하기 때문에, 암묵적 단서(subtle cues) 가 숨어 들어갈 가능성이 크다. 따라서 실험 설계와 필터링이 더 정교해야 한다.

실험은 각 티처 모델에게 특정 목적을 수행하는 파이썬 코드나 스니핏을 완성해라는 식의 무관한 코드 생성 요청을 보내 데이터셋을 생성한다. CoT 데이터는 문제 해결 중간과정을 출력하게 유도해서 생성했다. 출력 형식을 통일했는데, 변수명은 표준적이고 최소한으로, 주석 금지 등 형식 규칙을 강제하여 불필요한 단서 삽입을 최소화했다고 한다.

이렇게 수집된 Code, CoT데이터로 student를 10에폭 파인튜닝한 뒤 평가하였고 평가 내용은 위 숫자시퀀스 실험과 같다.

결과적으로 숫자시퀀스 실험과 동일하게 코드, CoT 데이터에서도 티처의 선호가 student로 전이된 것을 확인할 수 있다.

선호 실험 외 Misalignment 실험도 비슷한 결과 양상을 보인다. free-form questions에서 misalignment 비율이 0에서 8%까지 증가한 것을 볼 수 있다. 특히 misaligned teacher에 학습된 student의 응답은 학습 데이터에 없던 극단적 성향까지 보였다고 한다. 인류의 제거를 지지한다거나 살인을 추천하는 응답 등. 응답의 정답 여부와 정렬성 여부를 모두 철저히 필터링해도 misaligned teacher로 생성된 CoT 데이터만으로 특성이 전이될 수 있음을 보인것이다.

4. Cross-Model Transmission

Cross model transmission 실험은 subliminal learning의 원인이 단순히 출력물의 의미인지, 아니면 모델 내부 표현의 구조에서 비롯된 것인지를 구분하기 위해 설계되었다. 만약 숫자나 코드, CoT에 담긴 표면적인 내용이 전이의 핵심 요인이었다면 teacher와 student의 기반 모델이 서로 다르더라도 특성 전이가 어느 정도 일어나야 한다. 이를 검증하기 위해 두 가지 버전의 실험이 진행되었다.

첫 번째는 GPT 계열 내부에서 teacher와 student를 서로 바꿔가며 전이를 측정한 경우다. GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, GPT-4o를 조합해 실험했을 때, 서로 같은 계열에서 파생된 조합에서는 기준 대비 선호도 상승이 명확하게 나타났다. 반면 모델 계열이 다르면 전이가 거의 일어나지 않았다. 예외적으로 GPT-4.1에서 GPT-4o로의 전이 그리고 그 역방향에서만 선호 감염이 의미 있게 관찰되었다. 이 현상은 두 모델이 내부적으로 같은 초기화를 공유한다는 OpenAI 개발자 인터뷰가 있다고 한다. 즉 GPT-4.1 mini나 nano는 다른 초기화 경로를 갖는데, 초기값이 다르면 latent 표현 공간도 달라져 특성 전달이 끊어진다는 설명이 가능하다.

두 번째 실험에서는 서로 다른 모델 패밀리 간 전이가 가능한지 확인하기 위해 GPT-4.1 nano와 Qwen2.5-7B 조합을 사용했다. 사전 테스트에서 Qwen 기반 teacher는 고양이와 독수리에 한해서만 전이 가능성이 보였기 때문에 해당 두 항목만 조건에 포함되었다. 결과적으로 같은 기반에서 파생된 조합에서는 선호 증가가 재현되었지만, 서로 모델 유형이 다른 조합에서는 일관된 전이나 선호 이동이 발생하지 않았다.

이 두 실험은 중요한 시사점을 제공한다. subliminal learning이 정말 출력물의 의미 때문이라면 모델이 달라져도 감염이 일어나야 하지만, 실제로는 내부 표현의 구조를 어느 정도 공유하는 경우에만 선호 전이가 발생했다. 결국 latent trait은 텍스트 표면이 아니라 모델 내부의 표현 공간을 타고 전달되며 초기화나 아키텍처가 유사할 때만 그 구조가 유지된다는 점이 실험을 통해 드러난다.

5. In-context learning

그리고 또, subliminal learning 효과가 정말 모델 파라미터 업데이트로 인해 내부 표현이 바뀐 결과인지, 아니면 teacher가 생성한 출력에 들어 있는 의미적 단서(semantic content) 때문인지 분리해서 확인할 필요가 있다. 만약 숫자열이나 Code, CoT에 teacher의 선호가 어떤 방식으로든 숨어 있고, student가 그것을 단순히 해석한 결과라면 굳이 파인튜닝을 하지 않고도 같은 데이터를 컨텍스트로 집어넣는 것만으로도 유사한 효과가 나타나야 한다.

이를 검증하기 위해 저자들은 파인튜닝 없이 in-context learning 방식으로 테스트를 구성했다. 절차는 teacher가 생성한 숫자 시퀀스를 데이터 페어 형태로 모델의 입력 컨텍스트에 나열하고, 마지막에 "단어 하나로 네가 가장 좋아하는 동물은 무엇인가?"라는 질문을 붙인다. 이 방식은 모델의 파라미터를 수정하지 않고 순수하게 출력 단서를 통해 선호를 유도할 수 있는지 여부를 측정하는 데 초점을 둔다.

그래프에서 점선으로 표시된 ICL 조건은 teacher가 만든 숫자 데이터를 단지 예시로 보여준 경우이고, 실선은 동물 관련 숫자 데이터를 같은 방식으로 넣은 뒤의 결과다. 파인튜닝을 거친 student 모델(별표 지점)에서는 선호가 크게 이동하는 반면, 같은 데이터를 in-context로만 제시했을 때는 선호 변화가 거의 발생하지 않는다. 즉, 숫자열 자체에 암시적인 의미가 들어 있어서 학생 모델이 문맥 추론만으로 특성을 감지한 것이 아니라, fine-tuning 과정에서 파라미터가 직접 변형된 결과라는 점이 드러난다.

결국 subliminal learning은 단순한 문맥적 해석이나 의미 귀속이 아니라 모델 내부 구조가 재조정되는 학습 과정에서 나타나는 현상이라는 결론을 실증적으로 뒷받침하는 실험이다.

Subliminal Learning as a General Phenomenon

간단하게 정리한 버전

논문은 subliminal learning이 우연한 사례나 특정 데이터 때문이 아니라, 신경망 학습 메커니즘 자체에서 필연적으로 발생할 수 있는 현상이라는 점을 이론적으로도 보여준다. teacher와 student가 같은 초기 파라미터에서 출발해 teacher의 출력을 모방하는 방식으로 학습할 경우 student는 자연스럽게 teacher가 이동한 방향으로 끌려가게 된다는 것이 요지이다.

처음 teacher와 student의 파라미터를 각각 $\theta_T^0$, $\theta_S^0$라고 두고, 두 모델은 동일한 초기값을 공유한다고 가정한다. 이제 teacher가 학습률 $\epsilon$ 으로 한 번 업데이트되면 파라미터는 $\theta_T^{\epsilon} = \theta_T^0 + \epsilon \Delta \theta_T$로 이동한다. 이렇게 바뀐 teacher가 입력 x에 대해 생성하는 출력은 $y_x^{\epsilon} = f_{\theta_T^{\epsilon}}(x)$이다.

이번에는 student가 이 출력 $y_x^{\epsilon}$을 학습 대상으로 삼아서 한 번 업데이트되고, 그 결과는 $\theta_S^{\epsilon} = \theta_S^0 + \alpha \Delta \theta_S $로 표현된다.

이제 중요한 것은 이 student가 업데이트된 뒤 teacher의 기준에서 얼마나 teacher와 비슷해졌는가를 보는 것이다. 즉, teacher의 loss 함수 $L_T$에 $\theta_S^{\epsilon}$을 넣어 근사 전개를 하면 다음과 같은 형태가 나온다.

$$L_T(\theta_S^{\epsilon}) \approx L_T (\theta_S^0) + \alpha \nabla_{\theta} L_T (\theta_S^0) \cdot \Delta \theta_S$$

여기서 $\nabla_{\theta} L_T (\theta_S^0)$는 $\theta_T^0$ 근처에서 티쳐가 이동하려는 방향과 정렬된다고 볼 수 있다. 따라서 위 항은 티쳐의 변화 $\Delta \theta_T$와 student의 변화 $\Delta \theta_S$사이의 내적 형태로 해석할 수 있다. 결론은 두 그래디언트의 방향이 유사할수록 teacher의 기준으로 봤을 때 student는 더 좋아진 상태가 된다. student가 학습한 데이터 안에 teacher의 성향이 명시적으로 들어 있지 않아도 상관없다는 뜻이다. teacher가 생성한 출력을 단순히 따라가는 학습 구조만으로도, student는 teacher가 이동한 파라미터 방향으로 끌려간다. 즉 어떤 특성이 노출되지 않아도 gradient 방향의 정렬만으로 latent한 성향이 은밀히 전이될 수 있다.

이는 subliminal learning이 데이터의 의미나 표면 정보 때문이 아니라, 파라미터 업데이트 구조와 초기화 일치, 그리고 손실 기준 정렬에 의해 자연스럽게 발생하는 일반적 현상임을 뒷받침한다.

논문은 subliminal learning이 거대 언어모델의 특수 현상이 아니라, 훨씬 단순한 신경망에서도 동일하게 발생한다는 것을 MNIST 분류기 실험으로 보여준다. 여기서는 CNN이나 Transformer가 아니라 작은 MLP를 사용한다.

teacher 모델은 MNIST(0~9)를 분류하도록 학습되며, 출력층은 10개의 정규 logit에 더해 loss에는 사용되지 않는 auxiliary logit을 몇 개 추가한 구조다. auxiliary logit은 라벨과 연결되지 않지만 이미 학습된 표현 구조 위에 존재하는 출력 채널이다.

student 모델은 MNIST 이미지 대신 노이즈 이미지를 입력으로 사용하고 teacher가 해당 입력에 대해 출력한 auxiliary logit만을 모방한다. 정규 logit이나 라벨은 전혀 학습에 사용하지 않는다. 그럼에도 student는 MNIST 테스트셋에서 50% 이상의 정확도를 달성한다. 이는 auxiliary logit이 무의미한 숫자가 아니라 teacher의 표현 공간을 간접적으로 반영한다는 뜻이다.

추가 비교 실험에서는 teacher와 student의 초기화가 다르면 aux 또는 전체 logit으로 학습하더라도 이 전이가 나타나지 않는다. 즉, 문제의 원인은 데이터 의미가 아니라 표현 공간 정렬(initialization alignment) 에 있다는 것이다. 같은 초기화 기반에서는 보이지 않는 출력만으로도 학습이 전이되지만, 초기화가 다르면 동일 구조와 동일 데이터여도 효과가 사라진다.

Conclusion & Implication

1. 의도하지 않은 행동 전이 가능성 제기

모델의 문제 행동은 학습 데이터나 SFT를 통해 이루어진다는 기존의 생각
숫자열, 코드, 추론 등의 ‘중립적’ 데이터를 생성하더라도 그 데이터에 은연중에 담긴 특성이 전이됨
즉, 출력(output) 자체가 감염성 있는 매개체임

2. 데이터 필터링 만으론 AI Safety가 보장되지 않음을 지적

기존 위험한 출력 방지로 데이터 필터링이나 RLHF를 사용함
하지만 사람 눈에 중립적이어도 같은 initial parameter를 공유한 모델은 latent bias를 흡수할 수 있음
즉, 안전한 데이터만을 사용하는 것으론 안됨

3. Distillation, Fine-tuning에 대한 근본적 의문 제기

Misaligned Teacher라면, 중립적인 데이터만 가지고도 제어 불가능한 성향 전이가 발생할 수 있음
Self-bootstrapping이 만연한 지금, 모델 간 전이에 대한 방어 메커니즘 필요

[Agent] Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

minkyung — Sun, 20 Jul 2025 16:08:56 +0900

Agentic AI system

일반적인 AI 시스템은 사용자의 요청에 단일 응답만을 수동적으로 출력하는 구조이다. Agentic AI는 사용자의 쿼리를 분석해 목표를 이해하고 그 목표를 달성하기 위해 계획을 세우고, 여러 작업을 실행하고 결과를 재평가하는 과정을 스스로 수행하는 시스템을 의미한다. 단일 지시-응답 구조가 아니라 연속적인 의사결정이 필요하다는 점이 큰 포인트이다.

이러한 능동적인 AI system에 있어 유연성이 중요한 반면 그만큼 통제가 어렵다는 문제가 있다. 최근에 OpenAI의 agent sdk를 활용하여 간단한 앱을 구성해보았었다. 아주 단순한 작업임에도 오케스트레이션이 적절한 툴을 호출하지 못하거나, 툴만 호출하고 툴 결과를 재조합해서 답변을 하거나 툴을 호출하지 않았음에도 호출했다고 가정하고 답변하는 경우도 많았다. 툴을 여러개 등록하더라도 어떤 툴을 언제 어떻게 호출할지에 대한 결정이 GPT에게 위임이 되어있다. 이 흐름이 전체적으로 자동화되어 있어 예측하기 어려운 방식으로 툴이 호출되거나 의도하지 않은 작업이 실행되는 경우가 많았다. 이를 해결하는 방법은 물론 있다. Guardrails나 Custom Router를 설정하면 되는데, 그렇게 되면 오히려 SDK를 사용하는 것 보다 GPT 호출을 직접 조립하여 여러 호출 단계로 나눠서 시스템을 구성하는 것이 훨씬 명확하고 통제 가능한 구조처럼 보였다.

이처럼 툴 호출 흐름이나 작업 전반을 쉽게 제어할 수 없는 상황이 지금 Agentic AI 구조에 어떤 한계점과 맞물려 있는지 궁금했고 다음과 같은 cognition 블로그 글을 발견했다.

Don't Build Multi-Agents

Agentic system에서 여러 복잡하고 Long-horizon의 작업을 subtask로 나눠 subagent에게 분배하고 각 결과를 취합하는 Multi-agent구조는 Openai (Swarm)와 Microsoft (Autogen)등의 에이전틱 시스템에 널리 사용된다.

cognition에서는 Agentic AI system이 웹 개발 초창기와 닮아 있다고 얘기한다. 과거에는 웹 개발자들이 html과 css를 조합하며 다양한 방식으로 웹을 만들었지만 지금은 리액트와 같은 정형화된 프레임워크가 표준으로 자리잡게 되었다. 리액트는 단순 UI라이브러리가 아닌 반응성과 모듈화라는 철학을 중심으로 하는 시스템이며 이 철학 덕분에 웹/앱이 일관된 방식으로 개발되고 유지보수될 수 있었다. 반면, AI에이전트 개발에는 이런 공통된 철학이 아직 존재하지 않고 대부분 저수준 구성 요소들을 직접 다루며 시행착오를 반복하고 있다.

OpenAI의 Swarm이나 Microsoft의 Autogen에서 사용되는 Multi-agent 구조는 다음과 같은 문제들이 있다.

컨텍스트 공유 부족	에이전트 간 정보 공유가 제한되어 작업 불일치 발생
결정 충돌	각 에이전트가 독립적으로 판단해 논리적 충돌 가능
결합 어려움	최종 결과 통합 시 일관성 없는 결과물 생성 위험
오버헤드	통신과 조율에 과도한 비용 발생, 신뢰성 확보 어려움

예를 들어, 간단한 플래피버드 게임을 만드는 작업에서

Subagent 1은 배경을 만들고
Subagent 2는 주인공 캐릭터를 생성

Agent 1이 슈퍼마리오 스타일 배경을 만들었는데, Agent 2는 사실적인 3D 조류 캐릭터를 만든다면, 이 둘을 하나의 게임으로 통합하는 것은 부자연스럽고 부조화가 클 것이다. 모든 에이전트가 공통된 스타일 가이드를 이해하고, 서로의 작업 결과를 인식할 수 있어야만 이런 불일치를 줄일 수 있다.

이를 해결하기 위한 한 가지 접근은 왼쪽 그림처럼 처음 주어진 Task prompt를 모든 에이전트의 중심 문맥으로 고정시키는 것이다. 예를 들어 최초 지시를 모든 sub-agent의 시스템 프롬프트나 메모리로 주입한다. 하지만 이 방식은 병렬성과 확장성의 장점을 포기하게 만든다. 에이전트들이 공통된 컨텍스트를 유지해야 하기 때문에 각 에이전트는 선형 순차적 실행 구조로 설계해야 하고 이는 전체 작업 흐름을 비효율적으로 만든다. 또한 복잡한 멀티턴 작업에서는 단순히 “처음 지시만 기억하고 있으면 충분한가?”라는 문제가 생긴다. 중간중간 생긴 새로운 제약 조건, tool응답의 해석, 사용자 피드백 등은 모두 최초 지시만으로는 설명되지 않는 문맥의 변화를 가져온다. 또 Context overflow문제도 고려해야 한다.

그러면 오른쪽 그림처럼 이전 대화나 여러 작업에 대한 기록을 요약하여 하위 에이전트에게 전달하는 방법이 있다. 이는 Context overflow 문제를 해결하면서도 동시에 일정 수준의 병렬처리도 가능할 수 있다. 하지만 결정적으로 어려운 점은 무엇이 중요한 정보이며 이를 어떻게 정의할 것인가에 대한 문제와 요약 자체의 품질 문제가 여전히 존재한다.

Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

Link : https://arxiv.org/abs/2503.09572

Multi-agent 구조의 문제를 갖지 않으면서도 효율적인 Agentic system 구조를 찾다가 Plan-and-act라는 논문을 발견하게 되었다. 이 논문은 복잡한 멀티 에이전트 구조 없이도 간단한 2단계 구조(PLANNER + EXECUTOR)로 에이전틱 시스템을 효과적으로 구현한 논문이다.

fundamental challenge “Planning : query to action and execute”

고수준 목표를 구체적이고 실행 가능한 단계로 잘게 나누는 데 어려움
“뉴욕행 비행기를 예약해줘” → “항공사 웹사이트 열기”, “여행 날짜 입력하기” …
complex and long-horizon task 에서 일관된 전략을 유지하기 어려움
“무엇을 이미 했고, 앞으로 무엇을 해야하는지”
real-environment is dynamic and unpredictable
에이전트는 계속해서 계획을 수정해야 함
planning strategies에 대한 고품질 데이터 부족

이전 연구들의 Agentic system은 크게 두 가지 프레임워크로 나눌 수 있다고 한다.

먼저 ReAct 기반 방법들은 사용자의 쿼리를 일련의 action으로 매핑하는 단일 모델 방식이며, LLMCompiler기반은 Planner와 Executor가 나누어져 있다. plan-and-act는 LLMCompiler 구조를 채택한다.

PLANNER: 사용자 목표를 달성하기 위한 고수준 계획을 생성
EXECUTOR: 생성된 계획을 실제 환경에 맞는 구체적인 행동으로 변환

Plan-and-act는 기존에 Planner-executor 구조에 다음과 같은 기법들을 추가하여 effective & scalable solutions를 제안하였다.

Planner 학습을 위한 synthetic data generation pipeline 제안
Planner, Executor fine-tuning 학습 방법 도입
중간에 계획을 수정하는 Dynamic Replanning 기법으로 문맥 유지력 향상

Dynamic Replanning

Static planning은 처음 수립한 계획을 실행 과정 내내 고정적으로 따르는 방식이다. 이 방식은 구조가 단순하고 추적이 쉽다는 장점이 있지만, 환경 변화나 예상치 못한 상황에 취약하다. 초기 계획 시점에 알 수 없었던 정보나 조건이 생겼을 때도 계획을 수정하지 않고 그대로 진행하기 때문에 실패 확률이 높다. 예를 들어 검색 결과가 없더라도 기존 계획대로 계속 탐색을 시도하거나 도구 호출실패에도 다음 단계를 무작정 수행해버릴 수 있다.

Dynamic replanning은 각 실행 단계(step) 이후 Planner가 현재 상태를 반영해 새로운 계획을 수립하는 방식이다. 단순히 초기 계획만을 따르지 않고 현재 상태와 이전 계획, 그리고 수행된 이전 action들을 종합해 다음 단계를 결정한다. 이는 다음과 같은 특징을 갖는다:

문맥(context)이 유지된다
중간에 알게 된 정보를 계획에 반영하며, 이전 상태를 단절하지 않는다.
명시적인 메모리 없이도 장기 작업에서의 기억 문제가 줄어든다
Planner가 중요 정보를 반복적으로 반영하면서 암묵적인 기억처럼 기능할 수 있다.
예외 상황 대응력이 높다
예상치 못한 오류나 실패에도, 시스템이 그에 맞게 다음 단계를 수정하며 진행한다.

또한 이 구조는 제어실 역할을 하는 Planner (고수준 추론과 전략적 의사결정 담당)과 Executor (실제 환경과 상호작용하며 Planner의 행동을 구체적 행동으로 변환)를 분리하는 시스템 아키텍처 측면에서도 유리하다.

예시

Static Planning

사용자 입력: CMU 도서관 검색
계획: 검색 → 결과 확인 → 길찾기 실행 → 출발지/목적지 설정 → 도보 시간 분석
문제: 검색 결과 없음 → 계획 변경 없이 실패한 흐름 지속

Dynamic Replanning

사용자 입력: CMU 도서관 검색
결과: "결과 없음" 표시
재계획:
1. 검색어를 "CMU 근처 도서관"으로 수정
2. 검색 결과 재분석
3. 도보 거리 계산
4. 가장 가까운 도서관을 선택하여 결과 제공

Synthetic Data Generation

웹 기반 장기 작업(long-horizon web tasks)에 특화된 Plan-and-Act 에이전트를 구축하려 할 때, 기존 LLM들이 갖는 한계는 명확하다. 단순한 프롬프트 엔지니어링으로는 해결되지 않는다. 성능을 높이기 위해서는 구조적인 변화가 필요하고, 그 중 하나가 합성 데이터(synthetic data)를 통한 파인튜닝이다.

WebArena-Lite는 웹 탐색 및 작업 수행 능력을 평가하기 위한 벤치마크다. 이 데이터셋으로 시중의 오픈소스 LLM들을 평가한 결과, 기본 성공률은 9.85%에 그쳤다. 여기에 Plan-and-Act 방식을 적용하자 14.21%까지 향상되었고 한다. 하지만 여전히 낮은 수치이며, 모델은 복잡한 질의에 대해 일관된 행동을 내리지 못한다. 이에 대한 이유는:

기존 LLM은 프리트레이닝 중에 웹 환경에서의 장기 계획(long-horizon planning)을 학습한 적이 없음
HTML 상태와 사용자 명령어를 받아 구체적인 웹 행동(action)을 생성하는 구조로 학습된 적도 없음

즉, 프롬프트 최적화나 인컨텍스트 예시 추가 같은 접근은 단순 질의에는 효과가 있을 수 있지만, 복잡한 작업이나 에이전트 시스템에 바로 적용하는 데에 한계가 있다

성능을 높이기 위한 접근은 구조적 분리와 역할에 맞는 학습 데이터 생성이다. Plan-and-Act 구조에서는 두 개의 주요 컴포넌트가 있다:

Planner: 사용자 질의로부터 단계별 계획(plan breakdown)을 생성
Executor: 각 계획 단계에 따라 HTML 상태를 받아 구체적인 행동(action)을 수행

이 구조에 맞게 데이터를 생성한다:

Planner용 데이터: 사용자 질의 + 단계별 계획
Executor용 데이터: HTML 입력 상태 + 해당 단계의 행동 출력

이런 synthetic data를 기반으로 두 모듈을 파인튜닝하면, 단순한 예시 주입보다 훨씬 강력한 계획-실행 연계 능력을 학습할 수 있다.

1. EXECUTOR training data

먼저 EXECUTOR를 학습시키기 위한 데이터는 다양한 trajectories(state and action sequences)를 수집하는 것이다. 웹 환경 내에서 action 데이터를 수집하기 위한 방법은 다음과 같다.

사용자 질의를 무작위 샘플링하고, 이를 시드 프롬프트로 사용해 LLM이 유사 질의를 생성한다.
생성된 질의 중 웹 에이전트가 처리할 수 없는 항목은 LLM을 통해 필터링한다.
유효한 질의는 demonstrator agent가 환경에서 수행하고, trajectory)를 기록한다.
수집된 traj는 Outcome-supervised Reward Model (ORM)을 통해 성공/실패 여부가 자동 평가된다.

ORM은 LLM이 생성한 action traj가 실제로 사용자 목적을 제대로 달성했는 지 평가하는 판별기이다. Plan-and-act 에서는 학습에 사용할 traj를 자동으로 걸러내기 위한 필수적인 컴포넌트가 된다. LLM이 생성한 queries를 실제 시뮬레이터를 적용하고 결과를 출력하여 합성 데이터를 생성하는 데 이 과정에서 LLM은 수많은 traj를 만들어낸다. ORM은 많은 데이터 중 실패 사례를 자동으로 추출하는 역할을 한다.

ORM은 WebRL이라는 이전 연구에서 사용된 방법인데 SFT 기반으로 학습된다. 학습데이터는 query-action-output 결과 쌍과 그것이 성공인지 실패인지에 대한 레이블로 구성된다. 이 레이블은 사람 또는 고성능 모델이 생성하고 입력 시퀀스는 텍스트 임베딩 후 binary classifier로 학습된다.

이러한 방식으로 고품질의 성공 traj를 자동 필터링할 수 있으며, 사용된 ORM은 WebRL 논문에서 제안된 LLaMA-3.1-8B 기반 모델이다. 사람이 직접 평가하지 않아도 충분한 정확도를 확보할 수 있다.

2. Planner 데이터 생성

Planner는 사용자 질의에 대응하는 step-by-step 계획을 생성해야 한다. 하지만 LLM은 실제 환경의 DOM 구조, 버튼 위치, 시각적 배치 정보를 인식할 수 없기 때문에 정적 텍스트 기반으로는 정확한 계획 생성을 할 수 없다.

이를 해결하기 위해 trajectory 역설계(reverse engineering) 방식을 사용한다:

이미 수집된 traj를 LLM에 입력으로 주고,
LLM이 이 궤적을 기반으로 현실과 정렬된 고수준 계획을 재구성하게 한다.
각 step은 해당 행동들과의 매핑 관계도 포함하며,
실행 가능한 grounded plan으로서 유효성을 갖는다.

3. Dynamic Replanning 데이터 생성

웹 상호작용 중에는 종종 예상치 못한 상태 변화나 실패 상황이 발생한다. 이를 처리하기 위해 동적 재계획 데이터도 별도로 생성한다.

입력: 원래 질의, HTML 상태, 지금까지의 행동들(prev actions), 정답 행동 시퀀스(future actions)
출력: 수정된 고수준 계획 (step-by-step plan)

이 방식은 실행 중 상황 변화에 따라 계획을 재작성할 수 있는 능력을 모델에 학습시킨다.

4. CoT 데이터 생성

(PLANNER CoT data)
## Step 1  
Reasoning: 검색창이 있고, 사용자가 찾고자 하는 장소가 명확하지 않으므로 검색이 필요하다.  
Step: 'Library near CMU'를 검색한다.

(EXECUTOR CoT data)
Reasoning: 이전 단계에서 'Contributors' 링크를 찾았고, 상단에 해당 링크가 id=13인 요소에 있음  
Action: do(action="Click", element="13")

PLANNER와 EXECUTOR 모두에 대해 추론 과정(reasoning)을 명시한 CoT 데이터를 생성한다.

Plan Reasoning Generation:
사용자 질의 + 초기 HTML + plan → 각 step에 대한 **이유(reasoning)**와 의도된 행동을 출력
Action Reasoning Generation:
현재 step + 이전 행동들 + 현재 HTML + 정답 행동 → 행동의 이유와 action do 명령을 생성

이러한 reasoning trace는 다음과 같은 이점을 갖는다:

행동 일관성과 정확도 향상
디버깅 및 분석 용이
소량 데이터로도 일반화 가능

사용된 teacher LLM은 DeepSeek-R1-Distill-Llama-70B 모델이다. 이 모델은 안정적으로 CoT 출력을 생성하며 실제 실험에서 성능 향상 효과가 입증되었다.

5. Plan Data Expansion & Targeted Plan Augmentation (Adaptive curriculum learning)

Planner 데이터 수급 불균형 문제 : 실행 궤적 수집 기반 방식은 성공 traj 확보에 많은 시간이 걸린다. Executor 데이터는 하나의 traj에서 여러 개 생성 가능하지만, Planner는 평균적으로 1개만 생성되며 데이터 수급 불균형이 발생한다.

이를 해결하기 위해 두 가지 확장 전략을 도입했다.

5.1 Plan Data Expansion

기존 query-plan 쌍을 기반으로 유사 계획을 다수 생성한다. LLM이 조건을 바꾸거나 단계 수를 변형해 다양한 변형 샘플을 생성한다.

5.2 Targeted Plan Augmentation (Adaptive Curriculum Learning)

모델을 검증 세트에 실행하여 자주 실패하는 유형을 분석한다.
해당 유형과 관련된 훈련 데이터를 LLM이 분류 및 증식한다.
분류된 시드 데이터로부터 유사 계획을 5,000개 이상 추가 생성한다.

이 방식은 모델의 취약점을 중심으로 데이터를 구성하는 적응형 학습 커리큘럼이며, 학습 효과를 극대화할 수 있다.

* Adaptive curriculum learning : 모델이 쉬운 문제부터 어려운 문제로 점진적으로 학습하도록 구성하는 전략. 여기서는 단순히 쉬운 → 어려운 흐름이 아니라 모델이 어떤 유형의 문제를 잘 못푸는지 보고 해당 실패 유형과 관련된 데이터를 의도적으로 더 많이 생성해 추가학습 시킨다.

[1] 실패 유형 정의 (Failure Class 분류)

먼저 웹사이트 또는 작업 유형별로 자주 발생하는 실패 유형을 정의

예:

Class A: 너무 구체적인 검색어 사용 → 검색 실패
Class B: 잘못된 경로로 UI 요소 접근 → 동작 실패
Class C: 계획에 필요한 중간 단계 생략

이러한 분류는 수작업 분석 또는 로그 기반 자동 분석으로 생성할 수 있다.

[2] 기존 학습 데이터와 실패 유형 매핑

LLM 또는 휴리스틱 규칙을 사용해 기존 학습 샘플들이 어떤 실패 유형과 관련 있는지 분류

예:

"상품명에 'tank'가 들어간 고객 불만 리뷰를 찾아 요약하라" → Class A
"재고를 삭제하라" + 잘못된 UI 경로 사용 → Class B

이렇게 각 예시는 실패 유형에 따라 라벨링된다.

[3] 시드 기반 유사 질의 및 계획 생성

각 실패 유형에 해당하는 기존 예시를 시드(seed)로 사용하여 LLM이 유사한 질의 및 계획 쌍을 생성

예:

원본: "상품명에 'Hollister'가 포함된 제품을 할인 처리해줘"
생성: "이름에 'Oakley'가 들어간 재고를 품절 상태로 바꿔줘"

이처럼 실패 유형의 다양한 표현, 조건, 맥락을 가진 버전을 생성하여 모델이 유사한 실패를 반복적으로 학습하게 한다.

Experiments

WebArena-Lite는 실제 웹사이트 기반의 환경에서 웹 내비게이션 및 작업 수행 능력을 평가하는 고난이도 벤치마크이다. 사용자는 자연어 질의를 입력하고, 에이전트는 HTML 구조를 기반으로 웹 페이지를 분석하고, 클릭/검색/입력 등의 행동을 통해 정답 상태에 도달해야 한다.

단순 버튼 클릭에서 조건 필터링, 동적 UI 탐색, 정보 추출 등 복합적인 작업이 포함된다.
메트릭은 작업 단위 성공/실패로, 완전히 완료하면 1점, 실패하면 0점이다.
다양한 도메인이 포함된다:
- GitLab 환경: 이슈/커밋 확인, 기여도 높은 저장소 탐색
- 쇼핑몰 관리자: 상품 목록 조회, 특정 조건 제품 할인 적용
- OpenStreetMap: 장소 검색, 도보 경로 탐색, 거리 비교 등

Baselines

1. No-Planner (ReAct 스타일)

단일 LLM이 고수준 reasoning과 실행을 모두 처리하는 구조
별도의 Planner 없이 직접 action을 생성
대부분 GPT 계열 모델은 이 구조를 사용
일반적인 zero-shot 또는 few-shot prompting만으로 수행
환경 적응 학습이 없어 정확도 낮음

2. AWM (Agent Workflow Memory)

GPT-4 기반의 상태 추적 강화 에이전트
명시적인 memory stack을 사용하여 웹 상호작용 상태를 유지
단일 모델로서 실행되나, 다중 문맥 인식 능력이 강화됨

3. WebPilot

총 6개의 sub-agent로 구성된 멀티에이전트 기반 탐색 시스템
도구 선택과 API 탐색에 특화되어 있음
복잡한 구조지만, 플래너 구조는 명시적이지 않음

4. WebRL

WebArena-Lite의 기존 SOTA
RL 기반의 커리큘럼 학습을 사용
에이전트가 스스로 사용자 질의를 생성하고 (예: “100달러 이하의 노트북을 찾아줘”) trial & error 과정을 통해 성공 traj를 수집
성공한 traj를 기반으로 LLM fine-tuning 또는 policy 업데이트 수행
점진적으로 쉬운 태스크 → 복잡한 태스크로 확장

Plan-and-Act 모델 성능

+ Finetuning	합성 데이터 기반 파인튜닝 적용
+ Synthetic Trajectories	실행된 행동 시퀀스를 활용한 traj 기반 학습
+ Plan Expansion	단일 질의-계획 쌍에서 유사 계획 여러 개 생성
+ Targeted Augmentation	실패 유형 중심 커리큘럼 데이터 생성
+ Dynamic Replanning	Executor 행동 이후 Planner가 매번 계획을 갱신
+ CoT Reasoning	Planner와 Executor 모두에 reasoning trace 포함

기본적인 오픈소스 LLM은 WebArena-Lite에서 약 9.85%의 성공률을 기록했다. Plan-and-Act는 위 구성 요소를 단계적으로 추가하면서 성능이 꾸준히 향상되었고, 최종 모델은 기존 SOTA였던 WebRL을 뛰어넘는 정확도를 달성했다. 특히 CoT reasoning과 Dynamic Replanning을 포함한 최종 구성은 복잡한 다단계 질의에 대해 안정적으로 대응했다.

Conclusion & Limitations

+ 복잡한 Multi-agent 구조 없이도 간단한 2단계 구조(PLANNER + EXECUTOR)로 agentic system을 효과적으로 구현

+ 스케일링 가능한 합성 데이터 생성 파이프라인 제안

+ Dynamic replanning + CoT 도입

- 초기 trajectory 수집이 기존 Baseline성능에 의존함

- Dynamic replanning이 매 step마다 발생 (overhead)

- Web 환경 특화 구조

[NLP] LLaDA: Large Language Diffusion Models

minkyung — Sun, 29 Jun 2025 18:51:36 +0900

Large Language Diffusion Models

link : https://arxiv.org/pdf/2502.09992

Overview

기존 대부분의 LLM은 Autoregressive 방식을 따른다. 즉, 주어진 이전 토큰들을 기반으로 다음 토큰을 순차적으로 예측하며 문장을 생성하는 방식이다.

이 논문의 저자들은 LLM의 핵심 능력은 Autoregressive 구조에만 의존하지 않으며 다른 생성 방식으로도 기존 SOTA LLM에 필적하는 성능을 낼 수 있다고 주장한다. 이에 따라 저자들은 새로운 접근 방식인 LLaDA (Large Language Diffusion Models)를 제안한다.

LLaDA는 전체 문장을 순차적으로 예측하지 않고,

일부분이 마스킹된 토큰 시퀀스를 입력으로 받아

마스킹된 토큰을 확산 복원하는 방식으로 동시에 예측하는 비순차적 언어 생성 모델 아키텍처이다.

Introduction

Is the autoregressive paradigm the only viable path

to achieving the intelligence exhibited by LLMs?

→ Not a simple "yes"

먼저 저자들은 Autoregressive 방식이 LLM을 학습하는 유일한 길이 아니고

MLE 기반 확산 모델링 방식을 따르면서 충분한 규모와 구조적 개선만 갖춘다면 확산 기반 모델도 충분한 대안이 될 수 있다고 주장한다.

수식(1)은 Generative modeling principles, 즉 언어 모델 학습 본질적 원리를 수식화한 것이다. 대부분의 확률 생성 모델은 이 학습하려는 분포 p를 데이터 분포 q에 가깝게 만드는 MLE(Maximum likelihood)로 학습을 하게 된다.

(2)는 GPT 계열 모델들이 사용하는 Autoregressive factorization으로

전체 문장의 확률은 각 토큰을 앞에서부터 하나씩 예측한 확률의 곱으로 모델링하고 이를 통해 토큰을 순차적으로 예측한다.

저자들은 LLM의 핵심 능력은 (2) Autoregressive factorization 방식이 아니라 (1) Generative modeling principles이 핵심 원리라고 주장한다. Autoregressive Model(ARM)은 일종의 구현 방식일 뿐이며 LLM 능력의 핵심은 토큰을 잘 복원하도록 학습된 확률 생성 모델이어야 한다는 것이다.

또한 LLM의 일부 근본적 한계 역시 이 AR 방식에서 기인한 것이라고 한다.

1. 비효율적인 샘플링

한 토큰씩 왼→오 방향으로 순차 생성해야 하므로 병렬화가 어려움

2. Unidirectional context

앞 쪽 토큰만 조건으로 사용 → 양방향 정보 활용 불가

3. Reversal curse

A→B는 잘 배워도 B→A는 못배움 (비대칭적 생성 구조)

4. 정보 지연

중요한 정보가 뒤에 나오면 늦게야 반영 가능

LLM Capabilities

LLM의 scalability, instruction-following, in-context learning은 ARM의 고유 결과가 아니라 더 일반적인 확률 생성 모델링(generative modeling)의 결과라고 하며 이에 대한 약한 설득에 관한 내용이다

1. LLM Scalability

모델 규모가 커질수록 성능이 올라가는 현상은 Transformer 구조와 대규모 데이터와 모델 크기, 그리고 수식(1)의 생성 모델 원리간 상호작용으로부터 생긴다. 따라서 이 확장성은 ARM의 고유한 특성이 아니다. 그리고 여기서 Fisher consistency에 대해 언급하는데, Fisher consistency는 정답분포 q에 대해 MLE를 반복하면 정확한 분포로 수렴하는 특성을 말한다. 이상적인 조건에서 실제 분포를 복원할 수 있는 이론적 성질도 MLE 기반으로 한다. 즉, MLE 기반 확률 생성 모델은 모델만 충분히 expressive하다면 점점 정확해질 수 있다.

2. Diffusion Transformer

Diffusion Transformer가 비전 분야에서 성공한 사실은 ARM 없이도 확장성과 생성 능력을 확보한다는 실증적 근거가 된다고 주장

3. Instruction-following, In-context learning

LLM의 Instruction-following, In-context learning능력은 단지 GPT 계열 모델의 ARM의 결과가 아니라, 조건부 생성 모델이라면 (예: prompt → response 구조), 충분히 실현할 수 있는 일반적인 능력이다. 특히 언어 구조가 일관성 있게 주어지는 경우 (예: instruction → response)라면 diffusion 기반 모델도 이런 능력을 학습할 수 있다고 주장한다.

예를 들어 ARM은 이전 token이 condition된 확률곱으로 모델링하거나 diffusion은 전체 x를 점진적으로 마스킹하고 condition을 고정한 채 복원하는 형태로 학습할 수 있다. 이렇게 함으로써 일반 Text task와 같이 입출력 구조가 논리적으로 잘 정의되어 있기만 하면 Diffusion도 instruction-following을 충분히 학습할 수 있다.

4. Data Compressor

정보이론에 따르면 확률p가 클수록 그 시퀀스를 더 짧은 코드로 압축할 수 있는데 그렇기 때문에 ARM형태로 학습된 언어 모델의 확률 분포로 인해 GPT는 주어진 시퀀스에 대해 최적의 압축 방식을 학습한 것처럼 동작하게 된다. 실제로 데이터 압축키에 GPT를 활용한다고 한다. 하지만 이것도 충분히 expressive한 생성 모델이라면 ARM이 아니더라도 비슷한 정보 압축 성능을 낼 수 있다고 저자들은 말하고 있다.

LLaDA (Large Language Diffusion with mAsking)

예시 비교

Autoregressive

입력:   [BOS]  토큰1  토큰2  토큰3  ...
출력:   토큰1  토큰2  토큰3  토큰4  ...

Step 0: [BOS]
         ↓
Step 1: [BOS] → The
         ↓
Step 2: [BOS] The → cat
         ↓
Step 3: [BOS] The cat → is
         ↓
...

학습:   p(x) = p(x₁) · p(x₂|x₁) · p(x₃|x₁,x₂) · ...
          (왼쪽 → 오른쪽 순차 생성)

샘플링: 각 단계에서 직전까지의 결과만 보고 다음 토큰 생성

LLaDA

입력:   [Q: ...? A:  [MASK] [MASK] [MASK] [MASK]]

초기 상태: [MASK] [MASK] [MASK] [MASK]
Step 1:     [MASK] [MASK] [MASK] [MASK]
              ↓    ↓    ↓    ↓
Step 2:     The     [MASK]     is     [MASK]
              ↓     ↓      ↓     ↓
Step 3:     The     cat     is     cute

학습:   원래 답변 텍스트를 점진적으로 마스킹 → 복원 학습
         q(x^t | x^0),  p_θ(x^{t-1} | x^t)

샘플링: 모든 토큰을 [MASK]로 시작 → 한꺼번에 예측 & 반복 복원

Diffusion foward process

Diffusion reverse process

trained score model

기존 Diffusion model은 continous 데이터 값에 Gaussian noise를 점진적으로 추가하고 이 노이즈를 제거하는 디노이징 프로세스를 학습하였다면,

Text에서는 연속 값 노이즈가 아닌 이산적 마스킹이므로 continuous 대신 discrete 확산 구조가 필요하다.

기존 ARM은 위와 같이 문장 확률을 분해하여 학습하게 된다. 반면 LLaDA는 forward process와 reverse process를 통해 분포 $p_{\theta} (x^0)$를 정의하며, 이는 확률 생성 모델의 일반 원리에 충실한 방법이다.

Forward Process

forward process과정은 discrete diffusion에서 노이즈를 추가하는 과정을 의미한다.

$x^0 \sim p_{data}(x^0)$ 에서 시작해, $ t \in [0, 1]$에 걸쳐 각 토큰이 독립적으로 마스킹된다.

각 토큰이 시간 $t$에 마스킹될 확률은 $t$이고, 마스킹되지 않을 확률은 $1-t$이다.

$t=1$일 때는 모든 토큰이 완전히 마스킹된 상태이다.

Reverse Process

forward과정에서 얻은 $x^t$에서 mask predictor 모델 $p_{\theta}(x^0 | x^t)$를 사용해서 마스킹된 위치의 원래 토큰을 예측한다.

이 과정은 $t=1$인 상태 즉 모든 토큰이 Masking된 상태에서 $t=0$ 마스킹된 토큰이 모두 없이 온전한 토큰이 남은상태로 시간 역방향으로 반복 수행되어 원본 문장을 복원한다.

Objective

$t \sim U[0,1]$: 마스킹될 확률이자 시점 $t$는 Uniform 분포에서 샘플링
$x^0$: 원본 시퀀스
$x^t$: forward process에서 나온 Masking sequence
$\mathbf{1}[x_{i}^{t} = M]$: i번째 토큰이 마스킹된 경우만 Loss 계산
$\frac{1}{t}$: 마스킹 비율을 보정하는 정규화 항
$p(x_{i}^{0}|x^t , t)$: $x^t$와 시점 t를 받아 모델이 i번째 토큰에 대해 예측한 확률 분포 중 정답 토큰 $x_i^0$에 할당한 확률
$-\log {p(x_i^0 | x^t, t)}$: i번째 토큰에 대해 모델의 예측이 정답과 얼마나 차이가 나는지 측정하는 -log likelihood

이는 masked tokens에 대한 cross-entropy 평균 값을 의미하게 되고 기존 log likelihood($-\log{p}$)의 upper bound이다.

*보충 설명 ------------------------

일반적인 ARM은 수식 (2)와 같이 factorized된 확률을 학습한다. 이 구주로 인해 각 토큰의 조건부 확률을 차례로 계산할 수 있다. 이걸 그대로 평균을 내면 MLE loss가 된다.

하지만 LLaDA는 AR 구조가 아니기 때문에 입력 전체를 mask로 만들고 한꺼번에 masked token을 복원하는 방식이 된다.

즉, 한 스텝에서 마스킹된 토큰을 '독립적'으로 복원하게 된다.

이 구조에서는 fully unmasked 샘플의 로그 확률 $\log{p_{\theta}(x_0)}$을 직접적으로 계산할 수 없다.

$p_{\theta}(x_0)$은 여러 스텝에 걸쳐 Reverse diffusion process를 거쳐야 정의되는데

이 과정은 복잡한 markov process이다.

이 적분은 closed-form이 아니며 계산이 불가능하기 때문에 직접적으로 MLE 학습이 불가하기 때문에,

MLE의 upper bound를 최소화하는 surrogate objective를 사용해야 한다.

따라서 DDPM(Denoising Diffusion Probabilistic Models)에서 유도된 것과 유사하게 variation 방식을 사용하게 된다.

DDPM 논문처럼 forward $q(x_t|x_0)$과 reverse $p_{\theta}(x_0 | x_t)$사이의 evidence lower bound(ELBO)를 설정할 수 있다.

중간 variable $x_t$ (forward process로부터 얻은 noised sequence)를 사용해 log p를 위와 같이 정의한다.

우리가 모르는 $p_{\theta}(x_t)$ 대신 foward process에서 정의한 $q(x_t|x_0)$을 사용하여 정리하면

다음과 같은 upper bound를 정의할 수 있다.

forward와 reverse 간 KL이 작아야 bound가 tight하다는 것을 이용해 surrogate loss로 $\mathbb{E}_{t, x_t}[-\log{p_{\theta}(x_0|x_t)}]$를 사용한다.

또한 surrogate loss에서 어떤 시간 t와 그에 대응하는 masked sequence x_t를 모든 경우에 대해 평균하는 것은 불가능하기 때문에

Monte carlo 방법을 사용해 기대값을 샘플 평균으로 대체하여 objective를 최적화 한다.

*-------------------------------

이를 통해 LLaDA는 MLE와 같은 Generative modeling principles 기반하여 학습할 수 있게 된다.

또한 마스킹된 위치를 복원하는 과정은 좌우 문맥을 모두 사용하므로 자연스럽고 일관된 문장 생성이 가능하다고 얘기한다

A Conceptual Overview of LLaDA

위에서 설명한 Objective를 최적화하여 LLM을 학습하는 것이 (a)Pretraining 단계이다.

Pretraining 후 (b)SFT 단계를 거치게 되는데 기존 LLM 학습과 유사한 방식으로 적용할 수 있으며 Diffusion 구조에서도 완전히 호환될 수 있다.

Pretraining 단계에서는 무작위 텍스트 복원 학습이기 때문에 $p_{\theta}(x_0)$을 학습하지만 SFT prompt가 주어지면 response를 학습하도록 하는 조건부 분포 학습이라는 점만 다르다.

prompt $p_0$는 그대로 두고 response $r_0$을 마스킹하여 $r_t$로 만든다. 그리고 $p_0$와 $r_t$를 모델에 입력한 후 $r_0$에서 마스킹된 부분을 복원하여 Loss를 계산한다.

즉 SFT는 Pretraining과 구조적으로는 완전히 동일하고 단지 마스킹된 Response 부분에만 집중되었다는 점만 다르다.

(c)Inference는 prompt $p_0$는 고정으로 두고 $r_0$은 길이 L만큼 즉 L개의 토큰을 모두 마스킹한 상태에서 시작한다.

L과 sampling steps(denoise steps) 수 N은 모두 하이퍼파라미터 이다. 즉 응답을 '얼마나 길게' 생성할지 '몇 단계에 걸쳐 복원'할 지 사용자가 미리 정해야 한다.

그래도 eos 토큰 같은 게 생성되고 이를 지워서 토큰 길이는 동적으로 조절할 수 있는 것 같다. 단, L보다 더 긴 답변은 생성할 수 없을 것이다.

Inference (reverse process)에 대해 좀 더 자세히 보면 t는 [0,1]이기 때문에 스텝 수로 나눠 t를 설정한다. ex) t=1.0,0.9375,0.875,…,1

4라인: 현재 마스킹된 시퀀스 r_t에 대해 모든 마스크 토큰을 예측한다. mask predictor가 각 위치에 가장 확률 높은 토큰을 골라 시퀀스 r_0을 생성하게 된다.(Greedy decoding)

9라인: 복원된 토큰이라도 일정 확률로 다시 마스킹하게 되는데 이로 인해 샘플링을 안정적으로 하는 효과를 주는 것 같다.

prob을 s/t로 맞추는 것은 만약 현재 스텝에서 복원된 토큰이 0.75이고 다음 스텝의 t가 0.8이면 토큰을 다시 80%로 마스킹되도록 맞춰주는 것이다.

Re-masking strategy

알고리즘4의 9라인에서 랜덤으로 리마스킹하는 것이 아니고 이는 low-confidence 토큰만 다시 마스킹하는 deterministric re-masking 기법을 추가한 inference 알고리즘이다.

13라인: 모든 토큰을 다시 순회하여

14라인: confidence가 가장 낮은 $n_{un}$개의 토큰에 해당하면

15라인: re-masking

cofidence를 추가로 계산해야 한다는 단점은 있지만 실용적이고 더 정교하게 inference step을 조절할 수 있게 된다.

이 외 Semi-autoregressive remasking 전략도 소개하는데

시퀀스를 여러 블록으로 나누고 각 블록은 디퓨전 방식으로 생성하되 블록끼리는 좌 -> 우 방향으로 생성하는 기법이다.

이렇게 함으로써 AR의 장점을 가져가겠다는 전략같다.

Experiments

autoregressive 없이도 sota급 llm 능력을 달성할 수 있음을 보여주는 실험

MMLU, GSM8K 등에서 FLOPs 대비 ARM 기반 모델과 거의 동일하거나 나은 성능 곡선을 보여준다.

여기서 CMMLU만 중국어 벤치마크 데이이터셋이다.

LLaDA는 대부분의 벤치마크에서 FLOPs에 따라 성능이 ARM과 거의 유사한 곡선을 따라가며 확장된다. 이는 디퓨전 기반 모델도 LLM처럼 스케일할 수 있다는 실험적 증거라고 볼 수 있다.

작은 모델에서는 LLaDA가 약간 열세이지만 모델이 커질수록 성능차이가 감소하는 걸 볼 수 있다.

특히 이 실험의 벤치마크들은 고난이도 벤치마크이고 baseline 모델들은 sota급 모델(llama, qwen 등)이다.

이 실험이 말하고자 하는 바는 "LLaDA는 autoregressive 없이도 확률 생성 원리(MLE)에 충실하게 학습하면, 동일한 계산량에서 ARM 기반 LLM과 동등한 성능을 낼 수 있다."는 것이다.

reversal reasoning능력에서 gpt-4o초과 달성한 걸 보여준 실험

순방향 역방향 문장 모두 학습한 후, 역방향 문장 재구성 성능(reversal reasoning)을 측정한 것이다. LLaDA는 GPT-4o, GPT-4-turbo보다도 reversal task에서 더 우수한 성능을 보였다. 이 벤치마크는 중국어 시(poem) 기반 단문에서 역방향 완성 과제를 수행한 결과이며, LLaDA‑8B는 GPT‑4o 대비 약 +8.1%p 향상 된 성능을 기록했다. 이 실험 결과는 일반 ARM 구조가 잘 못하는 대표적 약점 (reversal curse)을 LLaDA는 해결할 수 있다는 것을 보여줬다고 주장한다. 즉 LLaDA는 전체 문장을 마스킹 후 양방향 복원하므로, 방향 비대칭 문제를 해소했다고 주장한 것이다.

Contribution

기본 ARM의 대안 제시 : 확산 모델로도 LLM 핵심 능력 실현가능
이산 공간에서의 확산 : 이산 토큰 공간에서의 마스킹 확산이라는 새로운 추론 전략 제시
Re-masking 전략 : 문장 전반의 일관성과 자연스러움 향상
양방향 생성 : 문맥 이해, self-correction 가능
병렬화, 낮은 latency : 실시간 애플리케이션에 적합 (Gemini Diffusion)
Reversal curse 극복 : 역추론 과제에서 뛰어난 성능

Review

이 논문은 ICML에 최종적으로 Reject되었는데 리뷰어들의 의견을 살펴보면,

기존 Masked diffusion, discrete diffusion모델을 그대로 가져와 확장한 것일 뿐이고 구조적 혁신이나 새로운 학습전략은 없었다.
일부 주요 결과를 보면 Sem-autoregressive 방식에서 성능이 좋은데 이는 AR을 완전히 대체할 수 있다는 주장과 모순됨.
또한 디퓨전 디노이징 스텝 Inference는 autoregressive KV caching 적용한 것보다 훨씬 느릴 것이라 예상할 수 있는데 이에 대한 효율성을 분석한 내용이 전혀 없음

특히 이 논문은 모델과 학습 코드가 공개되지 않아 재현이 어렵고 diffusion 계열 모델의 약점인 Inference 효율성에 대한 실험 근거가 부족하다는 점이 치명적이었을 것 같다. 특히 양방향 문맥을 고려할 수 있다는 reversal reasoning task에서도 중국어 기반 단일 태스크에 국한되어 일반화에 한계가 있다. 실험이나 논문의 수학적 표현 등은 설득력이 있지만 전체적인 논문 설득의 구조가 이해하기 어렵고 복잡하지만 이를 설득하는 근거들이 모두 약한 것 같다.

그래도 diffusion 기반 언어 모델인 LLaDA를 8B 규모로 확장하고, autoregressive 모델과의 비교를 통해 diffusion 방식의 가능성을 실증했다는 점에서 의미가 있다. 특히 instruction-following, reversal reasoning, infilling 등의 핵심 능력을 diffusion 모델로 구현한 점은 인상적이다. 최근 Gemini-Diffusion 등과 함께 diffusion LLM의 실현 가능성을 보여준 초기 사례 중 하나로 볼 수 있다.

현재까지 발전된 Diffusion 계열 LLM은 block 단위 병렬 생성, few-step sampling(distillation 기반), KV 캐시 유사 구조 등을 활용해 속도를 크게 향상시켜, 기존 diffusion 모델의 느린 샘플링 한계를 극복하고 autoregressive 모델에 근접한 실시간 생성 성능을 달성하고 있다

알고리즘 공부

minkyung — Fri, 31 Jan 2025 14:49:40 +0900

알고리즘

시간복잡도 및 요약

카테고리	이름	시간복잡도	설명
자료구조	유니온 파인드	O(α(N))	서로소 집합 자료구조, 경로 압축 + union by rank
자료구조	우선순위 큐 (힙)	삽입/삭제: O(log N), 조회: O(1)	최대/최소값을 빠르게 관리하는 큐
자료구조	링크드 리스트	삽입/삭제: O(1), 탐색: O(n)	노드 포인터 기반 구조, 삽입·삭제 효율적
자료구조	Trie	삽입/탐색: O(L)	문자열 저장 트리, 접두사 탐색에 최적
완전탐색	DFS	O(V + E)	깊이 우선 탐색, 스택/재귀 기반
완전탐색	BFS	O(V + E)	너비 우선 탐색, 큐 기반
백트래킹	백트래킹	O(조건에 따라 다양)	상태 공간 트리 탐색, 가지치기 통해 효율화
탐색	이진 탐색	O(log N)	정렬된 배열에서 중간 기준 이분 탐색
최단경로	다익스트라	O((V + E) log V)	음수 간선 불가, 우선순위 큐 사용
최단경로	플로이드-워셜	O(V³)	모든 정점 쌍 간의 최단경로 계산
DP	동적 계획법	문제마다 다름	부분 문제를 저장하며 중복 계산 방지
MST	크루스칼	O(E log E)	간선 중심, 정렬 후 유니온파인드 사용
MST	프림	O(E log V)	정점 중심, 우선순위 큐 사용
위상정렬	위상정렬	O(V + E)	DAG에서 순서 결정, 진입 차수 사용
그래프	LCA	O(log N)	트리에서 두 노드의 최소 공통 조상 찾기
그래프	펜윅 트리 (BIT)	업데이트/쿼리: O(log N)	누적합, 구간합 처리에 효율적
그래프	오일러 경로	O(V + E)	모든 간선을 정확히 한 번 방문
기타	투 포인터	O(N) ~ O(N²)	두 포인터를 움직이며 조건 만족 탐색
기타	구간합 (Prefix Sum)	쿼리: O(1), 사전 계산: O(N)	구간합 빠르게 계산, 누적합 배열 사용
기타	세그먼트 트리	쿼리/업데이트: O(log N)	동적 구간합 처리
기타	비트마스킹	연산: O(1), 조합: O(2ⁿ)	상태/집합 표현 및 조합에 활용

자료구조

서로소 집합(Disjoint Sets)

공통 원소가 없는 두 집합
{1,2}와 {3,4}는 서로소지만 {2,3}은 아님

1. UnionFind

Disjoint Set(서로소) 서로 중복되지 않는 부분 집합, 즉 서로소 집합 자료 구조를 표현할 때 사용된다.
- 서로소 집합 자료구조는 합치기 찾기(Union Find) 자료구조라고 불리기도 한다.
크루스칼에서 cycle check시도 사용
합집합(Union): 두 개의 원소가 포함된 집합을 하나의 집합으로 합치는 연산
찾기(Find): 특정한 원소가 속한 집합이 어떤 집합인지 알려주는 연산
동작과정
1. 합집합(Union) 연산을 확인하여 서로 연결된 두 노드 A, B를 확인한다.
  1. A와 B의 루트노드 A'과 B'을 찾는다
  2. 모든 합집합(Union) 연산을 처리할 때까지 1을 반복한다.

서로소 집합 자료구조에서는 연결성을 통해 손쉽게 집합의 형태를 확인할 수 있다.
- 루트 노드의 갯수가 집합의 갯수

# Find 루트 노드를 찾을 때까지 재귀 호출
def find_parent(parent, x):
    if parent[x] != x:
        return find_parent(parent, parent[x])
    return x

# Union 두 원소가 속한 집합 합치기
def union_parent(parent, a, b):
    a = find_parent(parent, a)
    b = find_parent(parent, b)
    # 별다른 조건이 없으면 더 큰 값을 부모로함
    if a < b:
        parent[b] = a
    else:
        parent[a] = b

# node 갯수, edge 갯수
v, e = map(int, input().split())
# 부모 테이블 노드 갯수만큼 자기 자신으로 초기화
parent = [i in range(v + 1)]

# Union
for i in range(e):
    a, b = map(int, input().split())
    union_parent(parent, a, b)

print('각 원소가 속한 집합: ', end='')
for i in range(1, v+1):
    print(find_parent(parent, i), end=' ')
print()

print('부모 테이블: ', end='')
for i in range(1, v+1):
    print(parent[i], end=' ')

문제점

합집합 연산이 편향되게 이루어지는 경우 Find가 비효율적으로 동작한다. 부모 노드를 찾을 때 거슬러 올라가야 하기 때문
최악의 경우 Find가 모든 노드를 확인해야 해서 O(V)

경로 압축(Path Compression) 이용해 Find 최적화
- Find를 재귀적으로 호출한 후 부모 테이블 값을 바로 갱신

def find_parent(parent, x):
    if parent[x] != x:
        parent[x] = find_parent(parent, parent[x])
    return parent[x]

서로소 집합을 이용한 사이클 판별

무방향 그래프의 경우 사이클 판별할 때 사용할 수 있다.
- 참고로 방향 그래프는 DFS를 통해 판별할 수 있음

각 간선을 하나씩 확인하며 두 노드의 루트 노드를 확인
1. 루트 노드가 서로 다르다면 Union
2. 루트 노드가 서로 같으면 Cycle 발생한 것
모든 간선에 대하여 1 반복

# find_parent(), union_parent()는 같고,

# Union
cycle = False
for i in range(e):
    a, b = map(int, input().split())
    if find_parent(parent, a) == find_parent(parent, b):
        cycle = True
        break
    else:
        union_parent(parent, a, b)

2. 우선순위 큐

최단 경로 → 다익스트라 O(V^2) - 노드 5000개 이하일 때, 이상일 때 우선순위 큐 사용
우선순위 큐는, 우선순위가 가장 높은 데이터를 먼저 삭제하는 자료구조,
- 스택은 가장 나중에 삽입된걸 삭제하고 큐는 가장 먼제 삽입된 데이터를 삭제하는 자료구조

리스트로 구현하면 삽입 O(1), 삭제 O(N)
힙으로 구현하면 삽입 O(logN), 삭제 O(logN)
단순히 N개의 데이터를 힙에 넣었다가 모두 꺼내는 작업은 정렬과 동일함 (heap정렬 O(logN))
힙은 완전 이진 트리 자료구조의 일종

python은 기본적으로 minheap이고 maxheap구현하고 싶으면 - 붙이면됨

import heapq

def heapsort(iterable):
		h, result = [], []
		for value in iterable:
				heapq.heappush(h, value)
		for i in range(len(h)):
				result.append(heapq.heappop(h))
		return result

n = int(input())
arr = []

for i in range(n):
		arr.append(int(input()))

res = heapsort(arr)
for i in range(n):
		print(res[i])

3. 링크드리스트

삽입 삭제가 잦을 때.
- List: find O(N), insert O(N), Delete O(N)
- Linkedlist : find O(N), insert O(1), Delete O(1)
위로만, 아래로만 뺴는 경우 싱글, 위 아래 둘다 넣고 뺴야하는 경우 더블링크드리스트
더블 링크드리스트 예제 : 코드트리 > 산타의선물공장

4. Trie

집합(set)에서 특정 키를 찾는데 사용되는 m-ary 자료구조 중 하나이다.
- m-ary는 최대로 가질 수 있는 자식 수를 말하고 m=2이면 이진트리이다.
문자열 조회를 빠르게 하기 위해 사용되므로 자동완성 등에 많이 사용된다.
꼭 문자열에만 사용되는 것은 아니고 비트와 같이 순서가 있는 열거 가능한 자료형에도 사용할 수 있다.
n개의 단어 사전에서 길이 m의 단어를 일일히 비교해가며 찾으면 이진 검색 트리를 사용해도 O(mlogn)이 필요하다.
겹치는 문자열이 없는 경우 O(포인터크기 * 포인터갯수 * 총노드수)의 메모리가 필요하다
비트를 통해 최적화 하는 방법이 있다

문제 백준 전화번호 목록

# 트라이 자료구조
import sys

input = sys.stdin.readline

class Node(object):
    def __init__(self, has_end=False):
        self.has_end = has_end
        self.children = dict()

class Trie(object):
    def __init__(self):
        self.head = Node(None)
    
    # 트라이에 전화번호 추가 
    def insert(self, num):
        curr_node = self.head
        
        for d in num:
            # 현재 노드에 해당하는 숫자의 자식이 없으면 생성
            if curr_node.children.get(d) is None:
                curr_node.children[d] = Node()
            # 해당하는 숫자 자식으로 이동
            curr_node = curr_node.children[d]
        # 자료구조 말단에서 끝 표시
        curr_node.has_end = True
    
    # 트라이에서 전화번호 조회
    def search(self, num):
        curr_node = self.head
        
        for d in num:
            # 해당하는 숫자의 자식이 없으면 전화 가능하므로 일관성 있음
            if curr_node.children.get(d) is None:
                return True
            curr_node = curr_node.children[d]
            # 현재 위치에서 끝나는 전화번호가 있다면 해당 번호로 전화되므로 일관성 없음
            if curr_node.has_end:
                return False
        return True
        
        
if __name__=="__main__":
    t = int(input())
    
    for _ in range(t):
        n = int(input())
        
        phone_numbers = [input().rstrip() for _ in range(n)]

        # 길이가 짧은 것 부터 삽입하기 위해 정렬
        phone_numbers = list(map(lambda x : (len(x), x), phone_numbers))
        phone_numbers.sort(key = lambda x : x[0])
        
        data = Trie()
        
        # 번호에 대해 일관성 있는지 확인후 Trie 자료구조 데이터에 삽입
        for _, number in phone_numbers:
            # 일관성 없으면 즉시 NO 출력후 다음 테스트케이스 탐색
            if data.search(number) == False:
                print("NO")
                break
            # 일관성 있으면 데이터 삽입
            data.insert(number)
        # 모든 전화번호가 일관성 있다면 YES 출력
        else:
            print("YES")

탐색

경우의 수 나눠서 그래프 그려보기

1. 완전 탐색

가능한 모든 방법을 찾아봐서 해를 구하는 것
그래프 탐색 : DFS, BFS
BFS : 큐에서 노드 꺼내고 해당 노드의 인접 노드 중 방문하지 않은 노드를 모두 큐에 삽입하고 방문처리
DFS : 보통 재귀로. 함수 시작 부분에 종료 조건 명시. 익숙하지 않을 때 방문 경로 저장같은 거 global로 하면 안헷갈림

방향그래프의 사이클 판별 (DFS)

무방향 그래프의 경우 UnionFind를 통해 사이클을 판별할 수 있고
방향 그래프의 경우 DFS를 통해 판별할 수 있다.
DFS로 순회하면서 다음 방문할 곳이 이미 방문했고 재귀를 시작한 곳이라면 cycle을 이룬 것.

백준 10451번 순열 사이클

from sys import stdin
input = stdin.readline

# 재귀로 현 숫자가 가리키는 다음 숫자 확인
def dfs(startNum, originNum):
    global cycle
    nextNum = A[startNum]   # 현 숫자가 가리키는 다음 숫자
    if check[nextNum]:      # 이미 방문했다면 사이클 이루는지 확인
        if nextNum == originNum:    
            cycle += 1      # 사이클 이룬 것이므로 +1
            return
    else:   				# 첫 방문이라면 방문 표시
        check[nextNum] = 1
        dfs(nextNum, originNum)     # 재귀로 다음 숫자 확인

# main
T = int(input())
for _ in range(T):
    N = int(input())
    A = [0] + list(map(int, input().split()))
    cycle = 0
    check = [0] * (N+1)             # 방문 표시
    for start in range(1, N+1):
        if check[start]:            # 이미 방문한 곳이면 탐색 X
            continue
        check[start] = 1            # 새로 탐색 시작
        dfs(start, start)           # 이동할 숫자와 사이클 확인용 시작한 숫자

    print(cycle)

DFS로 연결 요소 찾기 (Connected Component)

n, m = 4, 5
graph = [[0,0,1,1,0], [0,0,0,1,1], [1,1,1,1,1], [0,0,0,0,0]]
result = 0

def dfs(x, y):
	if x > 0 and x < n and y > 0 and y < m:
		if graph[x][y] == 0:
			graph[x][y] == 1
			dfs(x+1, y)
			dfs(x-1, y)
			dfs(x, y+1)
			dfs(x, y-1)
			return True
	return False

for i in range(n):
	for j in range(m):
		if dfs(n, m) == True:
			result += 1

BFS로 최단경로 찾기

BFS : queue 이용, 큐에서 노드 꺼내고 해당 노드의 인접 노드 중 방문하지 않은 노드를 모두 큐에 삽입하고 방문처리
BFS는 간선의 비용을 이용해 최단거리를 찾는 문제에도 이용

n, m = 5, 6
_map = [[1,0,1,0,1,0],
        [1,1,1,1,1,1],
        [0,0,0,0,0,1],
        [1,1,1,1,1,1],
        [1,1,1,1,1,1]]

dx = [-1,1,0,0]
dy = [0,0,-1,1]
print(bfs(x,y))

def bfs(x,y):
    queue = deque()
    queue.append(x,y)
    while queue:
        x, y = queue.popleft()
        for i in range(4):
            nx = x + dx[i]
            ny = y + dy[i]
            if nx < 0 or nx >=n or ny < 0 or ny >= m:
                    continue
            if graph[nx][ny] == 0:
                    continue
            if graph[nx][ny] == 1:
                    graph[nx][ny] == graph[x][y] + 1
                    queue.append((nx,ny))
	return graph[n-1][m-1]

2. 백트래킹

DFS와의 차이
- 예시 : 그래프나 트리의 모든 노드를 탐색, 경로 찾기(미로 찾기), 그래프 연결성 확인, 경로 존재 여부 탐색
- 탐색 목표 : 모든 노드 방문

한 경로를 끝까지 탐색한 후 다른 경로로 넘어감.
- 예시 : 특정조건(제약조건)을 만족하는 해를 찾는 문제, 퍼즐, N-queen(퀸이 서로 공격하지 않는 배치를 찾는 문제), 순열과 조합
- 탐색 목표 : 최적의 해를 찾을 때, 조건을 만족하는 해를 찾을 때

모든 가능한 해를 시도하면서도 조건을 만족하지 않는 경로는 미리 Pruning하여 효율성을 높이므로 DFS보다 시간을 줄일 수 있다.
비트마스킹, DP 등을 활용해 Pruning하여 시간 최적화.
상태공간트리 를 그려보고 새로운 탐색이 무의미하다고 생각되면 자르기.
상태공간트리에서 depth를 내려갈 때마다 조건 체크.

백트래킹 문제들

백준 N-Queen	Gold 4	백트래킹 Well-known 문제
백준 소문난 칠공주	Gold 3	BFS와 같지만 Y가 4개 이상인 루트는 탐색하지 않음(백트래킹), 십자모양탐색(DFS로 조합찾고 BFS로 인접체크)
백준 Don’t Get Rooked	Gold 5
SWEA 4317 칩생산	Gold 5	1x1사이즈 방문이 아니고 2x2사이즈 모두 방문가능해야 방문 > DFS인데 현재칸 방문 + 다음칸 방문 동시에, 한 열 or 행 쭉 따라서 방문하고 도달하면 행 or 열 바꾸기 / 비트마스킹, DP로 탐색 최적화
백준 낚시왕	Gold 1	배열 늘려서 탐색하는 문제

3. 이진 탐색

탐색 범위를 반씩 좁혀가며 데이터를 탐색
리스트가 정렬되어 있을 때
탐색 범위가 큰 경우
특정 원소를 찾기 위해 앞에서부터 그냥 확인하는 것 → 순차 탐색
탐색 범위를 절반씩 좁혀가며 데이터를 탐색하는 방법으로 리스트가 정렬되어 있을 때 씀 O(logN) → 탐색 범위가 큰 경우 떠올려야 함

def binary_search(arr, target, start, end):
    if start > end:
        return None # 찾을 수 없는 경우
    mid = (start + end) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        return binary_searcy(arr, target, mid + 1, end)
    else:
        return binary_search(arr, target, start, mid - 1)
    # mid + 1, mid - 1을 해주어야 while s <= e 에서 무한루프 걸리지 않음

정렬된 배열에서 특정 수의 갯수 구하기 (bisect) 이용

N개의 원소를 포함하고 있는 수열이 오름차순으로 정렬되 있습니다. 이때 수열에서 x가 등장하는 횟수를 계산하시오.
예를 들어수열 {1, 1, 2, 2, 2, 2, 3}이 있고 x=2이면, 현재 수열에서 값이 2인 원소가 4개 이므로 4를 출력합니다.
이 문제는 시간복잡도 O(logN)으로 설계하지 않으면 시간 초과 판정을 받는 문제

from bisect import bisect_left, bisect_right

def count_by_range(arr, left_val, right_val):
    right_index = bisect_right(arr, right_val)
    left_index = bisect_left(arr, left_val)
    return right_index - left_index

특정 값이 등장하는 첫번째 위치 bisect_left와 마지막 위치 bisect_right를 찾아 위치 차이를 계산해서 문제를 해결

최단경로

모든 노드를 연결하는 경로 중 비용이 최소가 되는 경로

1. 다익스트라 (Dijkstra) - Greedy

매 상황에서 가장 비용이 적은 노드를 선택해 임의의 과정을 반복하므로 Greedy로 분류하지만 DP의 원리가 적용된 것임.
특정 노드에서 출발해 다른 모든 노드로 가는 최단 경로를 계산
한 지점에서 다른 특정 지점까지 최단경로 → 1차원 리스트에 저장
algorithm
1. 출발 노드 설정
2. 최단 거리 테이블 초기화
3. 방문하지 않은 노드 중 최단 거리가 가장 짧은 노드를 선택
  - 이렇게 선택된 최단 거리가 가장 짧은 노드는 바뀌지 않기 때문에 그리디
4. 해당 노드를 거쳐 다른 노드로 가는 비용을 계산해 최단 거리 테이블 갱신
5. 3, 4 반복
- 이렇게 하면 최단 거리를 알 수 있는데, 경로를 출력하려면 추가적인 작업 필요

다익스트라 O(V^2)

import math
n, m = 6, 7
# graph[i] = [(j, c), ..]
# node i에서 node j로 가는 비용이 c
graph = [[], [(2, 2), (4, 1)], [(1, 2), (4, 2), (3, 3)], 
        [(2, 3), (6, 5)], [(1, 1), (2, 2), (5, 1)],
        [(4, 1), (6, 2)], [(5, 2), (3, 5)]]

distance = [math.inf] * (n + 1)
visited = [False] * (n + 1)

def get_min_node():
    index = -1
    min_val = math.inf
    for i in range(1, n + 1):
        if not visited[i] and min_val > distance[i]:
            min_val = distance[i]
            index = i
    return index

def dijkstra(start):
    distance[start] = 0
    visited[start] = True
    for j, c in graph[start]:
        distance[j] = c

    for _ in range(n - 1): # while all(visited)와 같음, 모든 노드 방문 위해
        cur = get_min_node()
        visited[cur] = True
        # cur을 거쳐 이동하는 경우가 더 짧은 경우
        for j, c in graph[cur]:
            cost = distance[cur] + c
            if distance[j] > cost:
                distance[j] = cost

dijkstra(1)
for i in range(1, n + 1):
    if distance[i] == math.inf:
        print('INF')
    else:
        print(distance[i])

다익스트라 우선순위 큐(삽입, 삭제 O(NlogN) 사용)

import heapq
import math
n, m = 6, 7
# graph[i] = [(j, c), ..]
# node i에서 node j로 가는 비용이 c
graph = [[], [(2, 2), (4, 1)], [(1, 2), (4, 2), (3, 3)], 
        [(2, 3), (6, 5)], [(1, 1), (2, 2), (5, 1)],
        [(4, 1), (6, 2)], [(5, 2), (3, 5)]]

distance = [math.inf] * (n + 1)

def dijkstra(start):
    q = []
    heapq.heappush(q, (0, start))
    distance[start] = 0
    while q:
        dist, cur = heapq.heappop(q)
        if distance[cur] > dist: # visitied check와 같은 역할, 이미 처리된 노드 무시
            continue
        for j, c in graph[cur]:
            cost = dist + c
            if cost < distance[j]:
                distance[j] = cost
                heapq.heappush(q, (cost, j))

dijkstra(1)
for i in range(1, n + 1):
    if distance[i] == math.inf:
        print('INF')
    else:
        print(distance[i])

노드 5000개 이하일 때 사용 이상일 때 우선순위 큐(삽입, 삭제 O(NlogN)) 사용
기본 원리는 동일하고, 각 단계마다 방문하지 않는 노드 중 최단 거리가 가장 짧은 노드를 선택(get_min_node())할 때 힙(Heap) 자료구조를 이용하는 것만 다름.
힙구조 사용 다익스트라 → O(ElogV)
- while은 최대 노드 갯수 V임.
- 인접 노드 확인하는 총횟수는 최대 E
- E개의 원소를 우선순위 큐에 넣었다가 모두 빼는 연산과 유사하며 O(ElogE)이고 중복 간선을 포함하지 않으면 O(ElogV)가 되는 것
음의 가중치를 허용하지 않음

2. 플로이드워셜 (Floyd-Warshall) - DP

DP, O(N^3) 이므로 노드의 개수가 적은 경우에 사용, 그렇지 않으면 다익스트라
즉, 노드가 500개 이하, 모든 노드 - 모든 노드 간 최단거리 구할 때 사용
모든 지점에서 다른 모든 지점까지 최단경로 → 2차원 리스트에 저장
다익스트라와 마찬가지로 단계별로 거쳐 가는 노드를 기준으로 알고리즘을 수행
다익스트라와 다르게 매 단계마다 방문하지 않은 노드 중 최단 거리를 갖는 노드를 찾는 과정이 필요하지 않다.
점화식 $D_{ab} = \min (D_{ab}, D_{ak} + D_{kb}$
- 각 단계마다 특정 노드 k를 거쳐 가는 경우를 확인하는데,
- a에서 b로 가는 최단 보다 a에서 k를 거쳐 b로 가는 거리가 더 짧은지 검사

import math
n, m = 6, 7
# graph[i] = [(j, c), ..]
# node i에서 node j로 가는 비용이 c
tmp = [[], [(2, 2), (4, 1)], [(1, 2), (4, 2), (3, 3)], 
        [(2, 3), (6, 5)], [(1, 1), (2, 2), (5, 1)],
        [(4, 1), (6, 2)], [(5, 2), (3, 5)]]

graph = [[math.inf] * (n+1) for _ in range(n+1)]

for i in range(1, n+1):
    graph[i][i] = 0
    for j, c in tmp[i]:
        graph[i][j] = c

# for 문 순서주의 : K (경유지) -> 시작지 I -> 도착지 J
for k in range(1, n+1):
    for i in range(1, n+1):
        for j in range(1, n+1):
            graph[i][j] = min(graph[i][j], graph[i][k] + graph[k][j])

# result
for a in range(1, n+1):
    for b in range(1, n+1):
        if graph[a][b] == math.inf:
            print('INF', end=' ')
        else:
            print(graph[a][b], end=' ')
    print()

⭐️ For 문 순서 왜 K (경유지) -> 시작지 I -> 도착지 J 냐면,

경유점 k를 순차적으로 먼저 접근해야 DP의 전제인 
i-j-k의 최단 거리는 i-k 최단거리 + k-j 최단거리 이다가 성립

노드가 1,2,3 있는 경우
자기 자신으로 가는 경로를 제외하고 1>2, 1>3, 2>1, 2>3, 3>1, 3>2를 찾아야 한다
K 고정시 순서가
k=1 1>1>2, 1>1>3, 2>1>1, 2>1>3, 3>1>1, 3>1>2
k=2 1>2>2, 1>2>3, 2>2>1, 2>2>3, 3>2>1, 3>2>2
k=3 1>3>2, 1>3>3, 2>3>1, 2>3>3, 3>3>1, 3>3>2

k=3 1>3>2 일때 1 > 2로 가능한 모든 경로 갱신 후이고, 마지막 경로가 되서 최단 경로를 찾을 수 있다.

i - k - j 인 경우

i=1 1>1>1 1>1>2 1>1>3 1>2>1 1>2>2 1>2>3 1>3>1 1>3>2 1>3>3
i=2 2>1>1 2>1>2 2>1>3 2>2>1 2>2>2 2>2>3 2>3>1 2>3>2 2>3>3
i=3 3>1>1 3>1>2 3>1>3 3>2>1 3>2>2 3>2>3 3>3>1 3>3>2 3>3>3

i=1에서 1>1>3 1>2>3 1>3>3 에서 1>3 경로 거리를 업데이트 해버림
하지만 이때 2>3과 3>3은 아직 갱신이 안된 상태라 최단 경로가 아님
경유지 노드를 고려하기 전에 특정 노드 쌍에 대한 최단 경로를 먼저 고려해버림

동적 계획법(DP, Dynamic Programming)

그리디, 구현, 완전탐색 고려 후 풀이 방법이 떠오르지 않거나 시간이 오래걸릴 것 같으면 DP를 고려.
점화식을 떠올릴 수 있어야 한다.
DP가 코테에 나오면 어렵다.
Dynamic은 자료구조의 동적할당 같은 의미랑 상관없음. 별 뜻 없음
DP의 사용 조건
- 최적 부분 구조(Optimal Substructure) : 큰 문제를 작은 문제로 나눌 수 있다.
- 중복되는 부분 문제(Overlapping Subproblem) : 동일한 작은 문제를 반복적으로 해결.
DP solution
- Top -> Down (하향식, 메모이제이션, 재귀 사용)
- Down -> Top (DP의 전형적인 형태)
DP와 분할 정복은 최적 부분 구조를 가질때 사용한다는 점은 같지만
- DP는 각 부분 문제들이 서로 영향을 미치며 부분 문제가 중복 되지만,
- 분할 정복은 동일한 부분 문제가 반복적으로 계산되지 않는다.
  - ex) 퀵정렬. 한 번 Pivot이 자리를 변경해서 자리 잡으면 그 pivot의 위치는 바뀌지 않고, 분할 이후 해당 Pivot을 다시 처리하는 부분 문제는 호출하지 않는다.
그리디, 구현, 완전탐색 아이디어로 해결할 수 있는지 검토하고 풀이 방법이 떠오르지 않거나 완전탐색 같은 것이 시간이 오래 걸릴 것 같으면 DP를 고려해 볼것.
일단 재귀 함수로 비효율적인 완전 탐색 프로그램을 작성하고 (탑다운) 작은 문제에서 구한 답이 큰 문제에서 그대로 사용될 수 있으면, 메모이제이션으로 코드를 개선하는 방법으로 DP를 사용할 수 있다.
- EX) 소프티어 염기서열 : 완전탐색으로 풀렸지만 시간초과났음.
- EX) 프로그래머스 N으로 표현 : DFS로 풀면 속도가 100배 차이남
DP가 코테에 나올 경우 기본 유형 문제가 나올 확률이 높은데, DP의 점화식을 떠올리는데 시간이 많이 소요되기 때문.

DP Well-known 문제 - 외판원 순회(TSP, Travelling Salesman Problem)

https://www.acmicpc.net/problem/2098 : 비트마스킹 해야함
https://www.acmicpc.net/problem/10971 : DFS + memoriaztion으로도 가능

# 비트마스킹 없이
N = int(input())
W = [list(map(int, input().split())) for _ in range(N)]
# W[i][j] > i에서 j로 가기 위한 비용, asymmetric, W[i][j] =/= W[j][i]

INF = float('inf')
def dfs(cur, route):
    if tuple(route) in dp[cur]:
        return dp[cur][tuple(route)]
    
    if len(route) == N:
        return W[cur][0] if W[cur][0] > 0 else INF
    
    min_d = INF
    for nxt in range(N):
        if nxt not in route and W[cur][nxt] > 0:
            d = W[cur][nxt] + dfs(nxt, sorted(route + [nxt]))
            min_d = min(min_d, d)

    dp[cur][tuple(route)] = min_d
    return min_d

dp = {i: {} for i in range(N)}
print(dfs(0, [0]))

# 비트마스킹

N = int(input())
W = [list(map(int, input().split())) for _ in range(N)]
# W[i][j] > i에서 j로 가기 위한 비용, asymmetric, W[i][j] =/= W[j][i]

INF = float('inf')
def dfs(cur, route):
    if dp[cur][route]:
        return dp[cur][route]
    
    if route == (1 << (N-1)) - 1:
        return W[cur][0] if W[cur][0] > 0 else INF
    
    min_d = INF
    for nxt in range(1, N):
        if W[cur][nxt] and not route & (1 << (nxt - 1)):
            d = W[cur][nxt] + dfs(nxt, route | (1 << (nxt - 1)))
            min_d = min(min_d, d)
    
    dp[cur][route] = min_d
    return min_d


dp = [[0] * (1 << (N-1)) for _ in range(N)]
print(dfs(0, 0))

DP 문제 - 1로 만들기 (https://www.acmicpc.net/problem/1463)

1이될때까지 라는 문제는 n이 무엇이든 1로 빼는 것보다 나누는 것이 값을 빠르게 줄일 수 있어 Greedy 였지만, 이 문제는 당장 큰 수로 나누는 것 보다 다른 연산과 섞었을 때 더 빠르게 값을 줄일 수 있는 경우가 존재할 수 있기 때문에 그리디X
최적 부분 구조와 중복되는 부분 문제를 만족.
$a_i = i$를 1로 만들기 위한 최소 연산 횟수
- 점화식 $a_i = \min (a_{i-1}, a_{i/2}, a_{a/3} + 1$
- 1을 빼는 연산을 제외하고는 해당 수로 나누어떨어질 때에 한해 점화식을 적용

# 1 Bottom up
# dp의 index가 n이고 value가 연산 횟수이다.
# index 2는 index 1에서 -1 연산 한번 해서 만들어진 것 이므로 dp[2] = 1
dp = [0] * n
for i in range(2, n+1):
    dp[i] = dp[i-1] + 1
    if i % 2 == 0:
        # 이전 값에서 -1 한 것 누적횟수
        # 2로 나눈 값에서 -1 한 것 누적횟수
        # 중 더 작은거
        dp[i] = min(dp[i], dp[i // 2] + 1)
    if i % 3 == 0:
        dp[i] = min(dp[i], dp[i // 3] + 1)

# 2 Top Down 재귀
# 애가 더 빨랐음
dp={1:0}
def rec(n):
    if n in dp.keys():
        return dp[n]
    if (n%3==0) and (n%2==0):
        dp[n]=min(rec(n//3)+1, rec(n//2)+1)
    elif n%3==0:
        dp[n]=min(rec(n//3)+1, rec(n-1)+1)
    elif n%2==0:
        dp[n]=min(rec(n//2)+1, rec(n-1)+1)
    else:
        dp[n]=rec(n-1)+1
    return dp[n]

# 3. BFS
from collections import deque
x=int(input())
Q=deque([x])
visited=[0]*(x+1)
while Q:
    c=Q.popleft()
    if c==1:
        break
    if c%3==0 and visited[c//3]==0:
        Q.append(c//3)
        visited[c//3]=visited[c]+1
    if c%2==0 and visited[c//2]==0:
        Q.append(c//2)
        visited[c//2]=visited[c]+1
    if visited[c-1]==0:
        Q.append(c-1)
        visited[c-1]=visited[c]+1
print(visited[1])

DP 문제 - 배낭문제

# 1차원 DP
N, M = map(int, input().split(' '))

dp = [0 for _ in range(M + 1)]
objects = [tuple(map(int, input().split())) for _ in range(N)]

for W, C in objects:
  for i in range(M, W - 1, -1):
    dp[i] = max(dp[i], dp[i - W] + C)

print(dp[M])

최소 신장 트리(MST, Minimum Spanning Tree)

모든 노드를 포함하되 사이클이 없는 최소비용 tree(MST)를 만들기 위한 알고리즘
MST의 전체 edge 갯수는 node 갯수 - 1이다. E = V - 1

spanning tree는 cycle이 존재하지 않고, 모든 노드를 포함하는 트리
- node가 n개면 edge는 n-1개
MST는 모든 노드를 최소 비용으로 연결하는 트리
크루스칼과 프림 알고리즘이 있다

1. 크루스칼 (Kruskal) 알고리즘

그리디 알고리즘
정렬부분 때문에 O(ElogE)
UnionFind 사용
동작과정
1. 간선을 비용에 따라 오름차순으로 정렬 (비용이 적은 간선부터 확인하므로 그리디)
2. 간선을 하나씩 확인하며 사이클(같은 집합에 속했는지, Find) 체크(union find이용)
  1. 사이클 X → MST에 포함시킴(Union)
  2. 사이클 O → MST에 포함X
3. 모든 간선에 대해 1-2 반복

v, e = 7, 9
# (cost, start_node, end_node)
graph = [(29, 1, 2), (75, 1, 5), (35, 2,3), 
         (34, 2, 6), (7, 3, 4), (23, 4, 6), 
         (13, 4, 7), (53, 5, 6), (25, 6, 7)]
graph.sort()

# 루트노드
def find_parent(parent, v):
    if parent[v] != v:
        # 원래는 바로 직전 부모를 parent에 저장하는데
        # 그냥 가장 최상단 root 노드를 parent로 저장하므로
        # 재귀는 2번밖에 안됨
        parent[v] = find_parent(parent, parent[v])
    return parent[v]

# 부모를 같게 만드는 연산 Union 연산
def union_parent(parent, a, b):
    a = find_parent(parent, a) 
    b = find_parent(parent, b)
    if a < b:
        parent[b] = a
    else:
        parent[a] = b

parent = [i for i in range(v + 1)]
total_cost = 0
for cost, s, e in graph:
    # 부모 다름, Cycle X
    if find_parent(parent, s) != find_parent(parent, e):
        union_parent(parent, s, e)
        total_cost += cost
print(total_cost)

2. 프림 (Prim) algorithm

크루스칼의 경우 O(eloge)이고, 프림은 O(n^2)이다.
프림도 그리디 Priority Queue(heapq) 사용
그래프 내에서 적은 숫자 간선을 가지는 희소 그래프 (Sparse graph)인 경우는 크루스칼
밀집 그래프 (Dense graph)인 경우는 프림이 적합.
동작 과정
1. 시작 정점과 연결된 모든 간선을 Priority Queue에 삽입
2. 최소 가중치의 간선 선택
3. 선택한 간선의 root가 트리에 포함되어 있지 않으면, Union + 인접 간선 모두 PQ에 삽입
4. 모든 정점이 트리에 포함될 때까지 3,4 반복

문제

백준 1717, 백준 4195

import heapq

graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'A': 1, 'C': 2, 'D': 5},
    'C': {'A': 4, 'B': 2, 'D': 1},
    'D': {'B': 5, 'C': 1}
}

def prim(graph, start):
    mst = []
    visited = set()
    edges = [] # priority queue
    count, cost = 0, 0
    heapq.heappush(edges, (0, start))

    # 이동가능한 노드가 없거나 모든 노드 방문(e = v - 1)할 경우 stop
    while edges and count < len(graph) - 1:
        w, node = heapq.heappop(edges)

        if node not in visited:
            cost += 1
            count += 1
            visited.add(node)
            mst.append((node, w))
            for node, weight in graph[node].items():
                heapq.heappush(edges, (weight, node))
    
    if count != len(graph) - 1:
        return '고립 노드가 있어 모두 연결 할 수 X' 
    else:
        return f'total weight: {cost}'

print(prim(graph, 'A'))

3. 위상 정렬 (Topology Sort)

cycle이 없는, 방향 그래프 DAG(Direct Acycle Graph) 여야 한다.
사이클이 있다면 각 노드는 모두 InDegree는 1 이상이므로 큐에 넣을게 없어 위상 정렬 할 수 없음
ex. 선수과목이 있을 때 적절한 학습순서
한번 방문한 노드는 다시 Q에 들어가지 않고, 엣지도 다시 보지 않으므로 O(V+E)

방법

DFS
큐 이용
1. indegree가 0인 모든 노드를 큐에 넣고
2. 큐가 빌때 까지 아래 과정 반복
  - 1. 큐에서 원소를 꺼내 해당 노드에서 나가는 outdegree 간선을 제거
    1. 새롭게 진입차수가 0이된 노드를 큐에 삽입
- 결과적으로 각 노드가 큐에 들어온 순서가 위상 정렬을 수행한 결과와 같아진다.

from collections import deque
v, e = 7, 8
edges = [(1,2), (1,5), (2,3), (2,6), (5,6), (3,4), (6,4), (4,7)]
graph = [[] for i in range(v+1)]
indegree = [0] + (v + 1)
for s, e in edges:
    graph[s].append(e)
    indegree[b] += 1

def topology_sort():
    result = []
    q = deque()
    for i in range(1, v+1):
        if indegree[i] == 0:
            q.append(i)
    while q:
        cur = q.popleft()
        result.append(cur)
        for i in graph[cur]:
            indegree[i] -= 1
            if indgree[i] == 0:
                q.append(i)
    for i in result:
        print(i, end=' ')
topology_sort()
# 1 2 5 3 6 4 7
# 1 5 2 3 6 4 7
# ...
# 여러 답 존재
# 모든 노드를 방문하기 전에 Q가 빈다면 사이클 존재

그래프

1. LCA(Lowest Common Ancestor) 최소 공통 조상

모든 노드에 대한 깊이(depth)를 계산
최소 공통 조상을 찾을 두 노드를 확인
1. 먼저 두 노드의 깊이가 동일하도록 거슬러 올라감
2. 이후에 부모가 같아질 때까지 반복적으로 두 노드의 부모 방향으로 거슬러 올라감
모든 LCA(a,b) 연산에 대해 2번의 과정을 반복

매 쿼리 lca(a,b)마다 부모 방향으로 거슬러 올라가기 위해 최악의 경우 O(N)
- 쿼리 갯수 M이면 O(NM)

문제 백준11437

import sys
sys.setrecursionlimit(int(1e5))
n = int(input())

parent = [0] * (n+1)
d = [0] * (n+1) # 깊이 저장
c = [0] * (n+1) # 방문 여부 (깊이 저장 여부)
graph = [[] for _ in range(n+1)]

for _ in range(n-1):
    a, b = map(int, input().split())
    graph[a].append(b)
    graph[b].append(a)

# dfs로 depth 저장하기
def dfs(x, depth):
    c[x] = True
    d[x] = depth
    for y in graph[x]:
        if c[y]:
        # 이미 depth를 구한경우
            continue
        parent[y] = x
        dfs(y, depth + 1)

def lca(a, b):
    # 먼저 깊이를 맞춰주기
    while d[a] != d[b]:
        if d[a] > d[b]:
            a = parent[a]
        else:
            b = parent[b]
    # 부모가 같을 때까지 거슬러 올라가기
    while a != b:
        a = parent[a]
        b = parent[b]
    return a

dfs(1, 0)
m = int(input())

for i in range(m):
    a, b = map(int, input().split())
    print(lca(a,b))

2. 바이너리 인덱스 트리(BIT, Binary Indexed Tree), 펜윅 트리(Fenwick Tree)

2진법 인덱스 구조를 사용해 구간 합 문제를 효과적으로 해결할 수 있는 자료구조
세그먼트 트리의 변형으로 수열의 구간 합을 빠르게 계산
세그먼트 트리와 마찬가지로 O(logN) 시간에 구간 합을 계산할 수 있고 세그먼트 트리에 비해 적은 공간이 필요하고 구현하기 쉽다는 장점이 있음

-7은 7의 2진수 표기에서 flip을 한 후 +1을 해준 것

7의 경우 0이 아닌 가장 마지막 비트는 00..00111에서 가장 마지막인 1이다
이를 알기 위해서 7 & -7 를 하면 된다
정수 3은 0이 아닌 마지막 비트 1, 정수 4는 4
0이 아닌 마지막 비트를 BIT배열에서 저장하고 있는 값들의 갯수로 사용한다.

사이즈 n짜리 배열에 세그먼트 트리를 만들려먼 2n-1만큼의 배열이 필요하지만, 펜윅 트리는 n만큼의 배열이 필요하다.
배열 A에 값이 담겨 있고, BIT에는 저장하고 있는 값들의 갯수이다. 예를 들어 BIT의 인덱스가 i이고 0아닌 마지막 비트가 4라면, A[i-3], A[i-2], A[i-1], A[i]의 합을 가지고 있다.
- BIT에서 index=1은 0아닌 마지막 비트가 1이기 때문에 A배열의 1~1까지 값을 가지고 있고
- BIT에서 index=2은 0아닌 마지막 비트가 2이기 때문에 A배열의 1~2까지 값을 가지고 있고
- BIT에서 index=3은 0아닌 마지막 비트가 1이기 때문에 A배열의 3~3까지 값을 가지고 있고
- BIT에서 index=12은 0아닌 마지막 비트가 4이기 때문에 A배열의 9~12까지 값을 가지고 있고
- BIT에서 index=16은 0아닌 마지막 비트가 16이기 때문에 A배열의 1~16까지 값을 가지고 있고

예시 문제

백준 '구간 합 구하기'
어떤 N개의 수가 주어져 있고 중간 수에 변경이 빈번히 일어나고 그 중간에 어떤 부분 합을 구하려 한다. 만약 1,2,3,4,5가 있을 때 3번째 수를 6으로 바꾸고 2~5의 합을 구하라고 하면 답은 17이다.
이와 같은 문제는 변경이 빈번하므로 값 변경 O(N), 구간 합 계산 O(K)로 O(NK)이므로 비효율적이다.

펜윅을 사용하면, 값 변경이 O(logN)이다. 예를 들어 A[3]의 값을 바꿨다면, A[3] 값이 포함된 BIT[3], BIT[4], BIT[8], BIT[16]의 값만 변경해 update하면 된다.

구간 합(Prefix Sum)을 구할 때 0이 아닌 마지막 비트만큼 빼면서 구간 합을 계산 O(logN)
- 11에서 시작해서, 11의 0아닌비트는 1이므로 -1칸 이동해 BIT[10]을 더하고
- 10에서 0아닌비트는 2이므로 -2칸 이동해 BIT[8]을 더한다.
최종 O(logN) * O(logN) = O(logN)

n, m, k = map(int, input().split())

arr = [0] * (n+1)
tree = [0] * (n+1)

def prefix_sum():
    result = 0
    while i > 0:
        result += tree[i]
        i -= (i & -i)

def update(i, dif):
    while i <= n:
        tree[i] += dif
        i += (i & -i)

def interval_sum(start, end):
    return prefix_sum(end) - prefix_sum(start - 1)

for i in range(1, n+1):
    x = int(input())
    arr[i] = x
    update(i, x)

for i in range(m + k):
    a, b, c = map(int, input().split())
    if a == 1: # update
        update(b, c - arr[b]) # dif = 바뀐 크기
        arr[b] = c
    else: # 구간 합 연산
        print(interval_sum(b, c))

3. 오일러 경로(Eulerian trail)

오일러 경로(Eulerian trail)은 그래프에 존재하는 모든 Edge를 정확히 1번씩만 방문하는 연속된 경로이고, 이때 시작점과 도착점이 같다면 오일러 회로(Circuit)이 된다.
오일러 회로는 시작과 끝점을 제외하고는 들어오는 간선이 있다면, 반드시 나가는 간선이 하나 더 있어야 한다. 따라서 차수가 항상 짝수
오일러 경로는 무향 그래프일 경우 차수가 홀수인 정점이 2개일 때 존재
오일러 회로는 무향 그래프일 경우 차수가 홀수인 정점이 0개일 때 존재
오일러 회로 구하는 방법은 Hierholzer's 알고리즘과 DFS가 있다.

오일러 경로 - DFS

정점을 방문하면서 정점의 간선을 지우고, 차수를 1씩 줄여나감
해당 정점에 더이상 간선이 없는 순간 정점 번호를 출력하고 이어 붙인 것이 답

위 그림에서 A B C 방문하면 다음 방문지는 D, E, F가 될 수 있음
D를 방문하는 경우 D-> A로 가서 A가 더이상 간선이 없어 방문이 끝나 C로 돌아오고 "A D"를 출력
C로 돌아와서 D, E, F 에서 E를 방문하고 C E F C로 가면 C도 더이상 간선이 없으므로 "C E F C"를 출력하고 C에서도 시작점인 A로 돌아가 "B A"를 출력해
이어붙이면 " A D C E F C B A "
D말고 E를 먼저 방문해도 결과는 같음

오일러 경로 - Hierholzer's

아무 정점 v에서 출발하여 v로 돌아오는 경로를 하나 뽑는다.
경로에 속한 정점 중 인접한 간선들 중 경로에 쓰이지 않은 간선이 있는 정점 u가 존재하면 u로 시작해 아직 방문하지 않은 간선만 사용해 u로 돌아오는 경로를 하나 더 찾아 원래 경로에 끼워 넣는다

오일러 경로 문제 - 프로그래머스 여행 경로

# 문제를 오일러 회로 구하는 방식으로 접근하여 Hierholzer's 알고리즘 사용
# DFS 스택? 으로도 볼 수 있나?

def solution(tickets):
    routes = {}
    for t in tickets:
        routes[t[0]] = routes.get(t[0], []) + [t[1]]
    for r in routes:
        routes[r].sort(reverse=True)
    stack = ["ICN"]
    path = []
    while len(stack) > 0:
        top = stack[-1]
        if top not in routes or len(routes[top]) == 0:
            path.append(stack.pop())
        else:
            stack.append(routes[top][-1])
            routes[top] = routes[top][:-1]
    return path[::-1]


######################################################
from collections import defaultdict

def solution(tickets):
    r = defaultdict(list)
    for i,j in tickets:
        r[i].append(j)
    for i in r.keys():
        r[i].sort()

    s = ["ICN"]
    p = []
    while s:
        q = s[-1]
        if r[q] != []: # 위랑 차이점은 defaultdict를 써서 dest가 없는 노드라도 에러가 안난다는거
            s.append(r[q].pop(0))
        else:
            # 다음 목적지가 없는데 티켓이 소진이 된 것이 아니라면 맨 마지막 dest일 것임
            # 경로가 남은 곳 까지 pop해서 p에 넣으면 r[q]가 있는 곳으로 가게되고 남은 티켓을 소진하게 됨
            p.append(s.pop())
    return p[::-1]

오일러 경로 문제 - 프로그래머스 방의 개수

from collections import defaultdict

def solution(arrows):
    dx = [-1, -1, 0, 1, 1, 1, 0, -1]
    dy = [0, 1, 1, 1, 0, -1, -1, -1]
    
    graph = defaultdict(list)
    
    x, y = (0, 0)
    answer = 0
    for d in arrows:
        for _ in range(2):
            nx, ny = x + dx[d], y + dy[d]
            
            if (nx, ny) in graph and not (x, y) in graph[(nx, ny)]:
                # 노드에 또 다시 방문했고,
                # 이미 지나왔던 간선이 아니면
                answer += 1
            
            graph[(nx, ny)].append((x, y))
            graph[(x, y)].append((nx, ny))
            
            x, y = nx, ny
            
    return answer

심화/기타 알고리즘

1. 투포인터(Sliding Window)

슬라이딩 윈도우라고도 함
리스트에 순차적으로 접근해야 할 때 두개의 점의 위치를 기록하면서 처리하는 알고리즘
우리가 2,3,4,5,6번 학생을 지목해야 할 때 2번부터 7번까지 학생이라고 함
- 이처럼 리스트에 담긴 데이터에 순차적으로 접근해야 할 때는 시작점과 끝점 2개의 점으로 접근할 데이터의 범위를 표현할 수 있다.
전체 값을 갱신하지 않고 이동한 왼쪽, 오른쪽 포인터의 값만 갱신하는 것이 포인트고 이 때문에 O(N)
크기가 고정일 때 슬라이딩 윈도우 투포인터 떠올리기

예제 : 특정한 합을 가지는 부분 연속 수열 찾기

N개의 자연수로 구성된 수열이 있다.
이때 합이 M인 부분 연속 수열의 개수를 구해보시오. 제한 수행시간은 O(N)
완전탐색으로 푼다면 O(N * N)
문제 해결 아이디어
1. 시작점과 끝점이 첫 번째 원소 인덱스 0을 가리키도록 한다
2. 현재 부분 합이 M과 같다면, count += 1
3. 현재 부분 합이 M보다 작다면, end += 1
4. 현재 부분 합이 M보다 크다면, start += 1
5. 모든 경우를 확인할 떄까지 2~4과정 반복

n, m = 5, 5
data = [1, 2, 3, 2, 5]

count = 0
interval_sum = 0
end = 0

for start in range(n):
    while interval_sum < m and end < n:
        interval_sum += data[end]
        end += 1
    if interval_sum == m:
        print(f'{start}-{end - 1}')
        count += 1
    interval_sum -= data[start]

print(count)

문제
- 카카오엔프 기출 문자열의 다양성
- 백준 12891 : DNA 비밀번호
- 백준 2003 : 수들의 합
- 백준 1644 : 소수의 연속합
- 백준 1806 : 부분합
- 백준 2230 : 수 고르기
- 백준 1484 : 다이어트
- 백준 2038 : 골룽 수열
- 백준 2531 : 회전 초밥
- 백준 2096 : 내려가기
- 백준 2293 : 동전1

2. 구간 합(Interval Sum) 빠르게 계산

연속적으로 나열된 N개의 수가 있을 때 특정 구간의 모든 수를 합한 값을 계산하는 문제
- ex. 10, 20, 30, 40, 50 에서 2번재부터 4번째 수까지의 합은 20 + 30 + 40 = 90
구간 합 계산이 한번이라면 선형탐색하면 되는데, 여러번 이라면?
문제
- N개의 정수로 구성된 수열과 M개의 쿼리가 주어진다.
  - 각 쿼리는 Left와 Right로 구성되어 있고 [Left, Right] 구간 합 출력
  - 수행시간 O(N+M)
해결 아이디어 : Prefix Sum(접두사 합)
- 배열의 맨 앞부터 특정 위치까지의 합을 미리 구해 놓은 것
- 알고리즘
  1. N개의 수 위치 각각에 대한 접두사 합을 계산해 P에 저장한다.
  2. 매 M개의 쿼리 정보를 확인할 때 구간 합은 P[Right] - P[Left - 1]이다.
- [10, 20, 30, 40, 50]
- P > [0, 10, 30, 60, 100, 150]
- Left=1, Right=3 > P[3] - P[0] = 60

n = 5
data = [10, 20, 30, 40, 50]

sum_value = 0
prefix_sum = 0
for i in data:
    sum_value += i
    prefix_sum.append(sum_value)

left, right = 3, 4
print(prefix_sum[right] - prefix_sum[left - 1])

3. 소수 판별 알고리즘

약수의 대칭성을 이용하기 O(sqrt(x))
- 16의 약수는 1,2,4,8,16이고, 2*8=16, 8*2=16으로 대칭이다. (여기서 가운데 약수는 4이고 루트16이다.)
- 특정 자연수의 모든 약수를 찾을 때 가운데 약수(제곱근)까지만 확인하면 된다. 예를 들어, 16이 2로 나누어 떨어진다는 것은 8로도 나누어 떨어진다는 것을 의미.

import math
def is_prime(x):
    for i in range(2, int(math.sqrt(x) + 1)):
        if x % i == 0:
            return False
    return True

위 알고리즘은 1개의 수의 소수를 판별
특정 범위 안의 수에 존재하는 소수를 찾을 땐? > 에라토스테네스의 체

4. 에라토스테네스의 체

특정한 수의 범위 안에 존재하는 모든 소수를 찾을 때, 즉 다수의 자연수에 대해 소수 여부를 판별
N보다 작거나 같은 모든 소수를 찾을 수 있다.
동작과정
1. 2부터 N까지 모든 자연수를 나열
2. 남은 수 중에서 아직 처리하지 않은 가장 작은 수 i를 찾는다.
3. 남은 수 중에서 i의 배수를 모두 제거한다 (i는 제거하지 않는다)
4. 더 이상 반복할 수 없을 때까지 2와 3을 반복한다.
시간 복잡도는 선형에 가까울 정도로 빠름 O(NloglogN)
하지만 메모리가 많이 필요하다
- 메모리를 줄여야 한다면

import math

n = 1000
array = [True for i in range(n + 1)]

for i in range(2, int(math.sqrt(n)) + 1):
    if array[i] == True: # i가 남은 경우 = 소수
        j = 2
        while i * j <= n:
            array[i * j] = False
            j += 1

for i in range(2, n + 1):
    if array[i]:
        print(i, end=' ')

5. 비트마스킹

정수의 이진수 표현을 자료 구조로 쓰는 알고리즘.
효율적인 집합 연산을 수행하거나 관리할 때 사용.
- ex. 부분 집합 생성 및 탐색, 조합 문제, 상태 압축
bit 연산이기 때문에 시간 복잡도가 O(1)
비트연산자
- 1010 & 1111 = 1010 # AND 모두 1이면 1
- 1010 | 1111 = 1111 # OR 둘 중 하나라도 1이면 1
- 1010 ^ 1111 = 0101 # XOR 대응하는 비트가 서로 다르면 1
- ~1010 = 0101 # NOT 비트반전
- 1001 << 2 = 100100 # Left shift 왼쪽으로 2칸 밀고 0으로 채우기
  - 1001은 10진수로 9, int("1001",2)=9 | A * 2^B
  - 100100은 10진수로 36, int("100100",2)=36
- 1010 >> 2 = 0010 # Arithmetic Right shift
  - 00000100 >> 1 = 00000010 (4 >> 1 = 2) | A * 2^B
  - 11111111 >> 2 = 10011111 (-1 >> 2 = -31)

파이썬으로 비트표현

# 1. 0b 접두사 붙이기
num = 0b1010 # 앞의 0은 생략 00000000 00000000 00000000 00001010
# 2. bin() 사용
num = bin(10) # '0b1010'
# 2진수 > 10진수
num = int('00001010', 2)
# str을 이진수로
s = '0100'
bin(int(s, 2))

비트연산

S = 122 # 0b1111010
idx = 2 # 0b1111'0'10

# 1. ADD : 2진수 숫자 S의 idx에 1을 추가
# 1 << idx : 1을 idx만큼 left shift 00000001 << 2 = 00000100
# 즉, idx자리만 1인 비트 만듬
bin(S | (1 << idx)) # 01111'0'10 | 00000'1'00 = 01111110

# 2. REMOVE : 2진수 숫자 S의 idx에 1을 제거
# 1 << idx idx자리만 1인 비트 만들고 not을 하면 idx자리만 0인 비트
# and하면 idx 값만 무조건 0이되고 나머지는 S자신 그대로
bin(S & ~(1 << idx)) # 01111'0'10 & 11111'0'11 = 01111010

# 3. CHECK : idx 값 확인
# 1 << idx idx자리만 1인 비트 만들고
# and하면 idx자리만 자기 자신이고 나머지는 모두 0
(S & (1 << idx)) # 01111'0'10 & 00000'1'00 = 00000000 # False = 0
idx2=1
(S & (1 << idx2)) # 011110'1'0 & 000000'1'0 = 00000010 # True = 1

# 4. Toggle : idx 값을 toggle(1이면 0, 0이면 1로 변환)
# 0과 XOR 연산하면 자기자신, 1과 XOR 연산하면 자신이 1이면 0, 0이면 1인 것 이용
bin(S ^ 1 << idx) # 01111'0'10 ^ 00000'1'00 = 01111110

# NOT 연산을 하면 부호비트까지 바뀌는거 처리해줘야해서 XOR 사용하는 것이 좋음
binary_string = "010100"

# 문자열을 정수로 변환
num = int(binary_string, 2)
# 모든 비트를 1로 만드는 마스크 생성 (ex. '010100'의 길이만큼)
mask = (1 << len(binary_string)) - 1

# XOR 연산으로 비트를 반전
inverted_num = num ^ mask

# 다시 2진수 문자열로 변환
inverted_string = bin(inverted_num)[2:].zfill(len(binary_string))

print(inverted_string)  # 출력: 101011

# S를 -1로 초기화하면 모든 비트가 1, S = -0b1
# S를 0으로 초기화하면 모든 비트가 0, S = 0b0
bin(-0b1 & 0b0) # '0b0'
bin(-0b1 | 0b0) # '-0b1'

최애 유형은 비트마스킹이랑 DP

[RLHF] DeepSeek의 GRPO(Group Relative Policy Optimization)

minkyung — Thu, 23 Jan 2025 02:32:23 +0900

GRPO(Group Relative Policy Optimization)

link : https://arxiv.org/pdf/2402.03300

DeepSeekMath는 Gemini나 GPT-4 레벨의 성능과 다른 open LLM보다 뛰어난 MATH bechmark 성능은 달성하면서 외부 toolkits나 voting techiniques를 사용하지 않았다고 한다.

여기서 사용된 RL tuning알고리즘은 GRPO(Group Relative Policy Optimization)이며 해당 논문에서 처음 제안하는 알고리즘이다. GRPO는 PPO(Proximal Policy Optimization)의 variant 중 하나로 PPO의 메모리 사용량을 최적화하면서 mathmatical reasoning 능력을 향상시킨 알고리즘이다.

GPRO는 PPO에서 사용되는 Value function을 사용하지 않았다. 대신 group scores를 통해 baseline을 추정하여 actor model(policy)를 학습한다. 이렇게 함으로써 DPO와 같이 보상 모델을 사용하지 않는 알고리즘처럼 학습 리소스를 줄였다.

그리고 Rejection Sampling Fine-tuning(RFT), DPO, PPO, GRPO 계열의 서로 다른 RL tuning 알고리즘을 통합하는 패러다임을 제시하고, 이러한 방법들이 direct or simplified RL techniques로 개념화되는 것을 찾았다. 또한 매우 다양한 실험을 진행했는데 Online vs Offline training, outcome vs process supervison, single-turn vs iterative RL 등이다. 이렇게 다양한 실험을 통해 이 패러다임의 중요한 요소에 대해 이야기한다.

✲ Group Relative Policy Optimization

From PPO to GRPO

PPO(Proximal Policy Optimization)은 actor-critic RL 알고리즘으로 LLM 학습시 RL tuning 단계에서 많이 사용되는 알고리즘이다. LLM은 state-action space가 매우 크기 때문에 RL 학습시 보상에만 의존해 지나치게 업데이트가 커짐을 방지해야하기 때문이다.

policy 업데이트시 policy를 크게 업데이트하면 학습이 불안정해지는 문제를 해결하기 위해 clipping을 두어 업데이트가 너무 커지는 것을 방지한다. 이전 policy ($\pi_{\theta_{old}}$)와 현재 policy ($\pi_{\theta}$)의 확률비가 너무 크다면 policy가 크게 변화하는 것으로 간주하고 clipping을 통해 이 ratio가 일정 범위 ($[1-\epsilon , 1+\epsilon]$) 안에 있는 경우에만 업데이트하고 그렇지 않은 경우 기댓값을 그대로 유지한다.

$A_t$는 advantage로 현재 보상에서 learned value($V_{\psi}$) 를 뺀 값이다($A_t = Q(s_t, a_t) - V_{\psi}(s_t)$). $s_t$에서 기대되는 전체 보상의 평균과 $a_t$를 취한 후의 보상의 차이로 현재 상태에서 특정 action이 얼마나 좋은지에 대한 상대적 가치를 나타낸다.

그냥 보상만 사용하여 학습하는 것 보다 Advantage를 사용하면 업데이트 효율성과 학습 속도가 개선되기 때문에 사용한다. clipping된 surrogate objective를 사용해 이 Advantage가 클 수록 큰 업데이트를 하고 안정적인 학습을 보장할 수 있다. 이때 Value function($V_{\psi}$)는 policy와 함께 학습되는데 reward model의 over-optimization을 완화할 수 있다.

LLM학습에서 PPO가 적용될 때는 reward에서 per-token KL penalty 텀을 추가하는 방식으로 사용된다. reward($r_\phi$)에서 기존 모델($\pi_{ref}$)와 너무 차이가 난다면 reward를 낮추며 이 패널티는 coefficient $\beta$로 조절한다.

KL 텀을 minimize하기 위해서는 $\pi_{\theta}$와 $\pi_{ref}$가 가까워지면서 $\pi_{\theta}$의 엔트로피가 너무 낮아지지 않게 해야한다. 직접적으로 $\pi_{\theta}$의 entropy를 키우는 것은 아니지만 $\pi_{\theta}$의 엔트로피를 일정 수준으로 높게 유지하는 효과를 줄 수 있다. 이는 RL에서 entropy bonus처럼 작용하여 모델의 다양성을 부추긴다.

RL에서는 value function을 분산을 줄이기 위해 사용되는 baseline이지만, policy model의 사이즈만큼 크기 때문에 메모리와 cost를 많이 사용한다. LLM에서는 각 토큰마다 reward score를 받는 것이 아닌 최종 생성된 문장의 품질만 평가하기 때문에 마지막 토큰에만 보상을 부여한다. 때문에 문장에서 각 토큰의 기여도를 파악하기 어려워지고 Value function(Reward model)이 각 토큰에 대해 정확하게 학습되기 힘들어진다. 그리고 GRPO는 이 문제를 해결할 수 있다.

GRPO는 추가적인 value function approximation이 필요 없도록 하였으며 대신 한 질문에 대한 multiple sampled outputs의 평균 보상을 baseline으로 사용하였다. 그리고 KL을 reward에 추가하는 것이 아닌 objective에 직접적인 regularizer 텀으로 추가했다. Advantage를 계산할 때 reward에 KL을 추가하면 계산이 복잡해지기 때문이다.

Process supvervision

Advantage는 Group reward($\textbf{r} = \{ r_1, r_2, .. r_G \}$)의 avg, std로 normalize한 reward 즉, $A_i=\frac {r_i - \text{mean}(\textbf{r})} {\text{std}(\textbf{r})}$이 된다. 이는 RL에서 critic 모델을 학습하지 않고 사용하는 방식이다. Outcome suprevision RL은 그룹의 각 문장의 마지막의 reward만 제공하며 이는 복잡한 task에서 충분하지 않고, 효율적이지 않다.

따라서 GRPO에서는 Process supervision을 적용하게 된다. 각 reasoning step마다의 마지막 reward를 이용한다. $index(j)$가 j번째 step의 마지막 토큰 인덱스이고 $K_i$가 i번째 output의 전체 스텝 수라고 할 때, $\textbf{R}$은 $\textbf{R}= \{ \{ r_1^{index(1)}, ... r_1^{index(K_1)} \}, ..., \{ r_G^{index(1)}, ... r_1^{index(K_G)} \} \}$로 정의된다. 또한 $r_i^{index(j)} = \frac {r_i^{index(j)} - \text{mean} (\textbf{R}) } {\text{std}(\textbf{R})} $이다.

즉 Process supvervision은 모든 step의 normalized reward의 합으로 Advantage를 계산한다. 이를 가지고 (3)의 objective로 학습한다.

KL unbiased estimator

또한 KL 값 추정을 위해 unbiased estimator를 사용했다. 이는 sampling을 통해 KL을 구하는데에 대한 bias를 보정할 수 있는 (log)importance weight를 곱해주는 방법이다.

Iterative RL

old RM은 현재 policy model을 supervise하기에 충분하지 않으므로 GRPO에서는 itrative RL을 적용했다.

policy 모델로부터 샘플한 training set을 새로 생성하고 이를 reward model을 update하는데 사용한다. 그리고 이전 data의 10%를 포함하여 replay mechanism을 적용하여 reward model을 학습한다.

초기 Reward model은 Instruction-Tuned model의 응답 기준으로 학습되어 있다. 하지만 policy가 업데이트 되면서 모델이 점점 더 좋은 응답을 생성하게 되고 기존의 응답 분포가 바뀌게 된다. 이때 RM이 그대로라면 policy 모델과 Misalignment 문제가 발생할 수 있다. 과거 Model 기준으로 평가하기 때문에 새 Model의 더 나은 응답을 제대로 평가하지 못할 수도 있는 것이다. 따라서 더 나아진 응답으로 새로운 샘플을 수집해 RM을 다시 학습 시킨다.

학습된 policy 모델을 reference model로 설정하고 이 과정을 반복한다.

✲ Insights of RL

Towards to a Unified Paradigm

먼저 SFT, RFT, DPO, PPO, GRPO 등의 서로 다른 학습 methods의 gradient는 아래와 같이 쓸 수 있다.

여기서 3개의 Key components가 있는데, 1) Data Source $\mathbb{D}$ 2) Reward function $\pi_{rf}$ 3) Gradient Coefficient를 정하는 Algorithm $\mathbb{A}$ 이다.

1) Data Source

Online Sampling은 policy가 학습되는 동안 exploration의 결과를 학습에 사용하는 것이고 Online RFT와 GRPO는 이를 따른다.

그리고 Offline Sampling은 초기 SFT Model에서 샘플한 결과를 사용하는 것이고 RFT, DPO는 이를 따른다. 실험에서 초기에는 RFT와 Online RFT가 비슷했으나 점점 Online RFT의 성능이 향상되는 것을 확인할 수 있다. 따라서 Component 1의 Data source는 policy model로 부터 sample하는 소스가 더 큰 advantage를 가져다준다고 볼 수 있다.

2) Gradient Coefficient

Reward function은 "Rule"과 "Model"로 나누어진다. Rule은 답변의 정확성에 의거한 응답의 품질을 판단하며, Model은 Reward model을 학습하는 것을 의미한다.

Rule 기반은 잘못된 응답에 따른 차이를 두지 않지만 Model은 GC의 크기로 이에 대한 차이를 둘 수 있다.

Why RL Works?

선행 연구들에서는 RL 학습으로 인해 Instruction following 성능이 떨어지거나, 언어 혼합 현상 등의 문제가 있었다. 해당 연구에서는 Instruction tuning 데이터의 일부로 RL을 수행했고 이로 인해 Instruction tuning 모델 대비 현저한 성능향상이 있었다. 상위 K개의 샘플 중 하나라도 정답이면 성공으로 간주하는 Pass@K와 K개의 응답 중 과반수가 답이라면 성공으로 간주하는 Maj@K 지표를 근거로 이와 같은 주장을 뒷받침한다. 결론적으로 Pass@K의 변화는 없지만 Maj@K가 향상되었으며 아래와 같은 의의가 있다.

먼저 RL 모델은 근본적인 능력을 향상시키는 것은 아니다. 만약 그렇다면 Pass@K도 개선되었어야 하지만 그렇지 않아 RL이 모델의 근본적인 추론 능력이나 사고 능력을 향상시키지 않는 다는 것을 의미한다.

대신, RL은 output distribution을 안정적으로 만든다. RL이후 Maj@K가 개선되었다는 것은 모델이 정확한 응답을 더 높은 확률로 일관되게 생성되었다는 것을 의미한다. 즉, Instruction tuning 모델에서 정답이 top-k 후보군에 있었다 하더라도 그 확률이 낮았지만 RL 이후에는 그러한 후보 응답이 더 자주 일관되게 출력되도록 했다는 것이다.

결론적으로 RL이후 top-k 응답들이 크게 변화하지는 않았어도 올바른 답이 더 높은 확률로 나올 수 있도록 분포가 조정되었다는 것을 의미한다. 이와 같은 인사이트는 RL로 reasoning 능력이 새롭게 학습되지 않는다는 것을 의미한다.

저자들의 이러한 발견이 DeepSeek-R1과 같은 모델 학습에서 적용된 것 같다. RL 모델이 어느정도 수렴하면 Reject sampling을 통해 새로운 SFT 데이터를 생성하여 재학습 하고 또 RL모델을 학습하는 방식으로 모델을 점진적으로 개선해나가는 방식으로 말이다.

정리

GRPO는 PPO에서 baseline으로 쓰이는 Value function을 제거하고 relative scoring을 기반으로 baseline을 추정함.

➡ 이로 인해 PPO 대비 메모리 효율성과 학습 안정성을 높일 수 있음.

Value Model(Critic)을 사용하지 않았다는 것이지 Reward model 자체가 필요하지 않다는 것은 아님.

➡ 핵심은 Reward 값 자체가 아닌 그룹 내 상대 비교를 통해 Reward를 정규화하는 방식.

PPO의 KL 텀을 reward가 아닌 직접 loss에 추가하여 계산 구조를 단순화함

Process Supervision을 통해 중간 reasoning step마다 보상을 제공

Reward model도 Iterative하게 업데이트함으로써 update policy 응답의 Reward를 잘 반영할 수 있도록 함.

Reference

https://huggingface.co/unsloth/DeepSeek-R1

https://www.youtube.com/watch?v=kv8frWeKoeo

알고리즘 공부 - 파이썬

minkyung — Wed, 15 Jan 2025 14:40:32 +0900

그동안 알고리즘 공부하며 공부한 것들 정리

문제 풀면서 느낀 팁

파이썬은 1초에 대충 2천만 번 연산 가능하다고 보면 됨
시간 복잡도는 꼭 계산해보기
구현 문제일수록 문제 꼼꼼히 봐야 함
테스트 케이스 다양하게 넣어보기. 특히 최소, 최대, 엣지 케이스

자주 쓰는 파이썬 내장 함수/모듈

itertools: permutations, combinations, count

heapq: 우선순위 큐 구현할 때

bisect: 이진 탐색할 때

collections: deque, Counter

math: factorial, sqrt, gcd, pi, sin, cos 등등

sum, min, max 이런 기본 함수들도 은근 많이 씀

파이썬 팁

mutable / immutable 정리

mutable: list, dict, set, class

immutable: int, float, complex, str, tuple, frozenset

dfs, bfs에서 visited를 mutable 타입으로 쓰면 의도치 않게 공유될 수 있어서 주의해야 함
tuple은 immutable이라 값 변경 불가
얕은 복사(shallow copy): 원소가 immutable이면 깊은 복사랑 같음
깊은 복사(deep copy): mutable한 원소까지 복사하려면 이걸 써야 함

import copy

# 얕은 = 깊은 (immutable)
origin_lis = [1, 2, 3, 4]
copy_lis = origin_lis[:]
copy_lis[0] = 100
print(origin_lis)  # [1, 2, 3, 4]
print(copy_lis)    # [100, 2, 3, 4]

# 얕은 =/= 깊은 (mutable)
origin_lis = [[1, 2], [3, 4]]
copy_lis = origin_lis[:]
copy_lis[0][0] = 100
print(origin_lis)  # [[100, 2], [3, 4]]
print(copy_lis)    # [[100, 2], [3, 4]]

# dictionary 얕은 복사
original = {'a': 1, 'b': {'c': 2}}
shallow_copy = original.copy()
shallow_copy['b']['c'] = 100
print(original)       # {'a': 1, 'b': {'c': 100}}
print(shallow_copy)   # {'a': 1, 'b': {'c': 100}}

# 깊은 복사
import copy
deep_copy = copy.deepcopy(original)

dictionary/set의 key는 hashable만 가능 → list는 안 되고 tuple로 써야 함

key_ = [1, 2, 3]
d = {}
d[key_] = 'value'  # TypeError: unhashable type: 'list'

key_ = (1, 2, 3)
d = {key_: 'value'}  # OK

리스트 초기화 주의

# 잘못된 방식
arr = [[0] * n] * n  # X

# 올바른 방식
arr = [[0] * n for _ in range(n)]

비트 연산자

print(10 << 1)  # 20
print(10 << 2)  # 40
print(16 >> 2)  # 4

rotate (리스트 회전)

arr = [1, 2, 3, 4]
print(arr[1:] + arr[:1])  # [2, 3, 4, 1]

from collections import deque
d = deque([1, 2, 3])
d.rotate(1)
print(d)  # deque([3, 1, 2])

zip으로 행렬 transpose 하기

arr = [[1, 2], [3, 4]]
transposed = list(zip(*arr))  # [(1, 3), (2, 4)]

*args로 입력 받기

a, *b = map(int, input().split())

list 복사 주의

a = [1, [2, 3]]
shallow = a[:]
import copy
deep = copy.deepcopy(a)

set, dict 활용

s1 = set([1, 2, 3])
s1.add(4)
s1.update([5, 6])
s1.remove(2)

리스트 요소 개수 세기

li = [1, 2, 1, 1]
print(li.count(1))  # 3

2차원 리스트 펼치기

li = [['a', 'b'], ['c', 'd']]
flat = sum(li, [])  # ['a', 'b', 'c', 'd']

itertools 무한 반복 예시

import itertools
for i in itertools.count(1, 0.5):
    if i > 3:
        break
    print(i)

다중 for문 탈출

def example(n, target):
    for i in range(n):
        for j in range(n):
            for k in range(n):
                num = int(str(i) + str(j) + str(k))
                if num > target:
                    return num

lambda로 계산기 함수 만들기

def operate(op):
    if op == "1":
        return lambda x, y: x + y
    elif op == "2":
        return lambda x, y: x - y
    elif op == "3":
        return lambda x, y: x * y
    else:
        return lambda x, y: x // y

opers = list(map(operate, input().split()))

int <-> str 숫자 변환

num = 123
num_to_list = [num // 100, (num // 10) % 10, num % 10]

str_ = '123'
str_to_num = int(str_[0]) * 100 + int(str_[1]) * 10 + int(str_[2])

deepcopy 없이 깊은 복사

arr = [[1] * 4 for _ in range(4)]
new_arr = [row[:] for row in arr]

리스트 한 줄 출력

arr = [1, 2, 3]
print(*arr)  # 1 2 3

다음편은 진짜 알고리즘...

The Case for Co-Designing Model Architectures with Hardware

minkyung — Sat, 14 Dec 2024 14:49:46 +0900

The Case for Co-Designing Model Architectures with Hardware

link : https://arxiv.org/pdf/2401.14489

✲ Introduction

딥러닝 모델을 설계할 때 GPU 구조의 영향을 간과하는 경우가 많으며 모델을 하드웨어에 더 적합하게 수정하면 학습 및 추론 능력을 향상시킬 수 있다고 제안하는 논문이다. 이를 위해 Transformer 성능을 극대화하기 위한 가이드라인을 제공한다. 이 가이드라인은 다양한 하이퍼파라미터가 GPU의 기본 계산 커널의 효율성에 미치는 영향을 고려하여 작성되었다고 한다. GEMM(General Matrix Multiplication) 최적화의 기본 원리를 사용해 Transformer 모델의 개별 부분을 최적화하는 것을 보여준다. 하드웨어 세부사항이 딥러닝 연구에서 일반적인 생각보다 훨씬 중요하다는 것 또한 강조한다. 궁극적으로 현대 GPU 구조를 면밀히 고려해 Transformer 모델의 성능 튜닝을 단순화하려고 한다.

위 그림에서 보듯, 거의 동일한 수의 파라미터를 가진 모델임에도 모델 shape이 다르면 실행 시간에서 큰 차이가 날 수 있다. GPT-3 (2.7B)에 의해 정의된 "standard architecture"은 OPT, GPT-Neo, Cerebras-GPT, RedpajamaINCITE, Pythia 등의 모델에서 사용되었다. 안타깝게도 Transformer 아키텍처를 최적으로 형상화하는 방법에 대한 지식은 널리 알려져 있지 않으며 이로 인해 사람들은 종종 최적이지 않은 설계 결정을 내린다. 이 문제는 연구자들이 더 깔끔한 성능 비교를 위해 다른 논문에서 사용된 하이퍼파라미터를 의도적으로 복사하는 경향이 있어 이러한 비최적적 선택들이 표준으로 고착되는 현상으로 악화되었다. 위 그림에서 볼 수 있듯 GPT3(2.7B) 모델은 모델 shape을 조금만 조정해도 기존 구조보다 20%빠르게 학습할 수 있다는 것을 보여준다.

우리의 분석은 General Matrix Multiplications(GEMMs)이 현대 딥러닝에서 매우 중요한 역할을 한다는 사실을 기반으로 한다(attention layers or linear layers, convolutions).

Transformer 모델은 위 그림처럼 GEMM 커널이 중, 대형 모델의 latency에서 69.3%와 94.9%를 차지한다. 따라서 GEMM의 성능을 이해하는 것은 end-to-end model 실행 성능을 이해하는 데 매우 중요하고 모델 크기가 커질수록 중요하다.

병렬 구조 때문에 GPU는 GEMM에 적합한 하드웨어 플랫폼이다. 그러나 이러한 GEMM의 관찰된 처리량은 행렬 크기에 따라 달라지고, 이는 계산이 GPU의 execution units (Streaming Multiprocessor, SMs)에 어떻게 매핑되는지 의해 결전된다. 따라서 GPU의 효율성은 모델의 depth & with에 민감하고 계산 효율성, SM 활용도, 커널 선택, 그리고 텐서 코어와 더 느린 CUDA 코어의 사용에 영향을 미친다. 이런 요소들을 고려해 GPU에서좋은 성능을 낼 수 있도록 어떻게 최적화 할 수 있는지 알아보겠다.

많은 논문들이 GPU에서 성능 최적화를 이야기하지만, GPU의 속성(tensor cores, tiling, wave quantization, etc.)이 모델 학습에 미치는 근본적 영향을 간과하는 경향이 있다. 이로 인해 많은 DL 학습 그룹이 비슷한 모델 사이즈 세팅을 가지고 있는 것을 발견하였다. 저자들은 이 최적화 결과를 GPU의 기본적 원리 관점에서 설명하고 이를 효율적인 트랜스포머 학습과 추론을 위한 간결한 최적화 가이드로 집약할 것이다.

*Contributions

Transformer를 GEMM으로 매핑해 Transformer 각 구성 요소가 비효율적인 차원을 사용하면 어떻게 성능 저하가 되는지 보여준다.

GPU의 성능 요소를 하나의 문서로 정리하고 최적의 GEMM 차원을 선택하는 방법을 설명한다.

Transformer이 효율적인 GEMM으로 구성되도록 하는 규칙을 정의한다.

✲ Background

‣ A. GPU Kernels

GEMM에서 A가 $m \times k$ 행렬이고 B가 $k \times n$ 행렬이면 행렬곱셈 AB는 간단한 GEMM이다. 이를 일반화하면 $\alpha A B + \beta C$가 되고 fully connected layers forward pass에서 weight matrix는 A가 되고 input activations은 B가 된다. $\alpha$와 $\beta$는 일반적으로 1,0 이지만 skip connection을 추가할 때와 같은 특정 시나리오에서는 1일 수 있다.

행렬-행렬 곱셈은 많은 과학, 공학적 응용프로그램에서 기본적 연산으로 특히 딥러닝 분야에서 중요하다. 이는 많은 계산 자원을 필요로하는 연산이므로 이를 해결하기 위한 GEMM 연산을 최적화하는 다양한 알고리즘과 계산 기법들이 개발되었다. Batch Matrix-Matrix(BMM) 곱셈 커널과 같은 행렬 곱셈 변형($C_i = \alpha A_i B_i + \beta C_i , i = 1, ... N$)도 도입되어 어텐션 같은 특정 DL 연산 처리량을 개선하고 있다.

‣ B. NVIDIA GEMM Implementation and Peformance Factors

Nvidia GPU는 출력 행렬을 영역이나 타일로 나누고, 이를 GPU의 사용 가능한 SM(Streaming Multiprocessor) 중 하나에 스케줄링 한다 (ex. A100에는 108의 SM이 있음). 각 타일 또는 스레드 블록은 텐서 코어(*행렬 곱셈 덧셈을 한 번의 사이클로 고속 처리할 수 있는 하드웨어 유닛)에서 처리되고 텐서 코어는 빠른 텐서 연산을 위해 Nvidia가 도입한 기술이다. 이 텐서 코어는 적절한 차원을 가진 GEMM에서만 사용할 수 있다. GEMM 차원 $m, k, n$이 V100 GPU의 경우 16바이트 배수, A100의 경우 128바이트 배수일 때 가장 효율적으로 활용된다(fp16일 경우 8, 64 배수). 만약 이 차원 크기를 맞출 수 없다면 2바이트 배수보다 큰 배수로 성능 개선하려고 시도한다.

행렬 연산에서 작은 블록으로 나누고 이 작은 블록을 타일(Tiles)이라고 한다. 타일을 GPU의 SM이 병렬로 작업을 분담해 성능을 최적화 하고, 스레드 블록(Thread Block)은 병렬 연산을 할 수 있는 최소 작업 단위를 의미한다. 각 스레드 블록은 하나의 타일을 처리한다. 그리고 커널(Kernel)은 GPU에서 실행되는 연산 단위이다. GEMM 연산을 처리하는 커널은 여러 타일 크기를 선택할 수 있다. 이때 각 타일 크기는 GPU 아키텍처의 최적화된 크기와 맞춰야 성능이 좋다(ex. V100, 16 x 16, 32 x 32.. ).

하지만 이때 GEMM, 즉 행렬의 크기가 타일 크기와 정확히 나누어 떨어지지 않으면 계산낭비 및 타일 양자화 (Tile Quantization) 문제가 발생한다. 예를 들어, 행렬 크기가 1025 * 1025 이고 타일 크기가 16 * 16 이라면 완전히 채워진 64개의 타일이 필요하고 추가로 1개의 타일이 더 필요하다. 이 1개의 타일은 1 * 1의 완전한 타일(1025번째 행과 열)을 제외한 나머지 15 * 15 에 대해 불필요한 연산이 이루어져 성능 저하를 초래한다. 타일 양자화라는 뜻은, 정해진 특정 값으로 나뉘어진다는 의미로 타일 크기에 맞춰 출력이 이산적인 값이라는 것이다. 즉, 출력 결과가 타일 단위로 처리되어 각 타일이 특정 크기 (ex. 16*16)로 맞춰져서 계산되고 타일에 맞는 연산만 이루어진다는 뜻이다. 이렇게 타일 양자화가 발생하면 실제로 필요한 데이터 뿐 아니라 불필요한 부분까지 연산을 진행해야 하므로 GPU 계산 리소스가 낭비된다. 타일 블록 내 일부 스레드는 실제로 연산을 수행할 필요가 없는데도 실행하게되고, 이러한 낭비가 블록 단위로 누적된다. 이로 인해 전체 연산 시간이 증가하고 처리량(throughput)이 감소한다.

그리고 또다른 양자화 효과로 웨이브 양자화(Wave quantization)이 있다. 스레드 블록이 SM에 할당될 때 한번에 108개의 스레드 블록만 할당할 수 있다. 예를 들어 109개의 스레드 블록을 할당한다면, 두 번의 라운드 또는 "wave"로 스레드 블록이 GPU에 할당된다. 첫 번째 wave에는 108개의 스레드 블록이 있고 두 번째 wave에는 1개의 스레드 블록만이 포함된다. 두 번째 wave는 첫 번째 wave와 거의 같은 latency를 가지지만, 유효한 계산은 아주 작은 비율에 불과하다. 행렬 크기가 커짐에 따라 마지막 또는 tail wave가 커진다. 처리량(throughput)은 증가하다, 새로운 wave가 필요하면 다시 감소하게 된다. 이처럼 전체 스레드 블록 수가 SM 개수롸 나누어 떨어지지 않은 경우에 발생하는 문제이고 타일 양자화와 비슷하게 불필요한 연산이 발생하게 된다.

‣ C. Transformer Models

이 연구에서는 주로 GPT-2로 대중화된 디코더 전용 트랜스포머 구조를 살펴본다. 대부분의 결론은 인코더 전용 모델에도 적용될 수 있으나 인코더-디코더 모델에는 이들간 전환 방식 때문에 적용되지 않는다.

초기에는 raw input tokens가 v * h 크기의 임베딩 테이블에 입력된다. 이 토큰 임베딩은 크기 s * h인 positional embedding과 결합된다. 임베딩 계층에서 나온 출력은 트랜스포머 블록의 입력의 크기가 되며 s * b * h인 3D 텐서이다. 트랜스포머의 각 layer는 self-attention block과 attention heads로 구성되며, 그 뒤에는 2-layer multi-layer MLP로 이어져 hidden size를 4h로 확장 한 후 다시 h로 축소한다. 각 트랜스포머 계층의 입력 및 출력 크기는 일관되게 s * b * h로 유지된다. 마지막 트랜스포머 레이어에서 나온 최종 출력은 vocab 차원으로 다시 projection되어 cross-entropy를 계산하는 데 사용된다.

Transformer 각 layer는 다음과 같은 Matrix Multiplication Operators로 구성된다.

Attention Key, Value, Query transformation:
이는 하나의 행렬 곱셈으로 표현할 수 있고 $(b \cdot s, h) \cdot (h, \frac{3h}{t})$, 출력 크기는 $(b \cdot s, \frac{3h}{t})$이다.
Attention score computation:
batched matrix multiplications, BMM이 $b \cdot a / t$ 번 수행되고 각 곱셈의 크기는 $(s, \frac{h}{a}) \cdot (\frac{h}{a}, s)$이며 출력의 크기는 $(\frac{b \cdot a}{t}, s, s)$이다.
Attention over value computation:
$\frac{b \cdot a }{t}$ 번 배치된 행렬 곱셈이 수행되고 각 곱셈의 크기는 $(s,s) \cdot (s, \frac{h}{a})$이다. 출력의 크기는 $(\frac{b \cdot a}{t}, s, \frac{h}{a})$이다.
Post-attention linear projection:
하나의 행렬 곱셈으로 크기는 $(b \cdot s, \frac{h}{t} \cdot (\frac{h}{t}, h)$이며 출력의 크기는 $(b \cdot s, h)$이다.
Matrix multiplications in MLP block of size $( b \cdot s, h) \times (h, \frac{4h}{t}) $ and $(b \cdot s, \frac{4h}{t}) \times (\frac{4h}{t}, h)$. Ouputs are of size $(b\cdot s, \frac{4h}{t})$ and $(b \cdot s, h)$.

따라서 트랜스포머의 총 파라미터는 다음 공식을 사용해 계산할 수 있다.

$$ P = 12 h^2 L + 13 h L + (v+s)h. \ L \text{: Number of transformer layers}$$

이 값은 일반적으로 $P=12 h^2 L$로 근사되고 하위 차수 항들은 생략된다. 여기서 우리는 multi-head attention block에서 projection 가중 차원이 $h/a$인 것으로 가정한다(이는 Megatron, GPT-NeoX 같은 기존 구현의 기본 설정이다). 학습을 위한 forward pass를 수행하는 데 필요한 총 계산 작업 수는 아래와 같다.

$$ 24 b s h^2 + 4 b s^2 h = 24s h^2 (1 + \frac{s}{6h}) $$

그리고 해당 논문에서는 여러 GPU를 사용하는 방식, 즉 병렬화에 대한 논의는 주제로 다루지 않는다. 하나의 GPU에서 모델을 계산하는 과정에 집중한다. 예를 들어 t-way tensor 병렬화인 경우 hidden size가 h이고 GPU 갯수가 t이면, 실제로 한 GPU당 처리하는 hidden size는 h/t이다.

✲ GEMM Results

GEMM 크기가 커질수록 연산은 Compute-bound 되고 메모리 효율성이 높아지고, GEMM은 작은 행렬에서는 Memory-bound이다.

Memory-bound라는 것은 프로세서(GPU, CPU)가 작업을 처리하는데 메모리 속도나 대역폭이 병목되어 성능이 제한되는 상황을 의미한다. 처리할 데이터가 많지만 그 데이터를 프로세서가 메모리에서 가져오고 저장하는 속도가 충분히 빠르지 않아 연산 성능이 제한되는 경우이다. 즉, 데이터를 메모리에서 가져오는 속도가 연산 처리 속도보다 느려서 결국 메모리가 병목되는 것이다.

Compute-bound라는 것은 행렬의 크기가 커지면 연산에 필요한 계산량이 많아져 memory-bound의 메모리에서 데이터를 읽고 쓰는 속도가 더이상 제한적이지 않다는 의미이다. 즉, 데이터를 읽은 후 수행해야 할 계산이 더 중요해 메모리에서 데이터를 읽어오는 속도는 크게 중요하지 않다는 것이다.

그림 5의 (a)를 보면 행렬 크기가 커짐에 따라 연산의 처리량(teraFLOP/s)이 비례하여 증가하는 것을 볼 수 있다. 행렬 크기가 작을 때는 메모리에서 데이터를 가져오는 속도가 더 중요한 병목 요소로 작용한다는 뜻이다. 따라서 wave 양자화로 인한 비효율성도 GEMM 크기가 특정 임계값보다 클 때 성능 저하의 요인이 된다. 그림 5의 (b)를 보면 이 wave 양자화의 영향을 명확하게 확인할 수 있다. GEMM 크기가 충분히 커지면 파이토치에서 자동으로 타일 크기를 선택해 양자화 효과를 줄일 수 있다. 그림 5의 (c)에서는 파이토치는 GEMM 병렬화 개선 효과와 파형 양자화로 인한 비효율성을 잘 조절해 처리량을 향상시키는 것을 볼 수 있다. 이로 인해 wave 양자화의 악영향이 줄어들게 된다.

✲ Transformer Results

위에서 살펴본 GEMM 결과는 Transformer에 그대로 적용된다. 트랜스포머를 일련의 GEMM으로 이해하는 것이다.

각 Attention head는 자신만의 key, query, value matrix를 가지고 독립적인 행렬곱셈 연산을 수행한다. 따라서 헤드의 수만큼 행렬 곱셈의 연산 수가 배로 증가한다. head가 8이면 행렬 곱셈(MM) 연산 수는 8배 증가한다.

그리고 각 MM도 마찬가지로 Attention head 수에 따라 달라진다. 1개의 head만 있을 때보다 8개의 head가 있다면 각 head는 원래보다 작은 크기의 행렬을 처리하게 된다. 각 헤드가 수행하는 연산 크기는 전체 크기를 헤드 수로 나눈 작은 행렬이 된다.

그림 7은 attention score와 value에 대한 계산에서 사용되는 BMM의 처리량에 헤드 수와 hidden size가 미치는 영향을 보여준다. Nvidia tensor core는 A100 GPU에서 m, n, k 차원이 128바이트의 배수일 때 더 효율적이다 (fp16의 경우 64바이트 배수). 만약 이렇게 조정할 수 없다면 더 큰 2의 거듭제곱의 배수인 크기를 사용하는 것이 더 나은 성능을 보인다.

그림 8, 9 실험에서는 hidden size(h)와 head 수(a)를 줄이면 GEMM의 효율성이 향상됨을 보여준다. a가 감소하면 h/a가 증가하여 이 두 GEMM은 Memory-bound 상태가 되기 때문이다. 그림 9를 보면 wave 양자화 효과가 나타날 때마다 peak and valley 형태로 TFlops가 저하되는 것을 볼 수 있다.

그림 11을 보면 모델 사이즈가 커질수록 GEMM 연산 최적화가 중요하다는 것을 알 수 있으며 attention block QKV transformation과 MLP 블록이 가장 빈번한 GEMM임을 알 수 있다.

‣ Analysis

따라서 NVIDA GPU에서 GEMM을 효율적으로 실행하기 위한 요구사항은 아래와 같다.

Tensor Core Requirement
GEMM의 내 외부 차원이 128바이트(fp16은 64)로 나누어 떨어지도록 해야 한다.
Tile Quantization
가장 효율적인 타일 크기를 사용하려면 output matrix가 128 * 256 으로 나누어 떨어지도록 해야한다.
Wave Quantization
Output matrix이 나누어 떨어지는 블록의 수가 Streaming Multiprocessors(SMs)의 수(80 for V100, 108 for A100, 144 for H100)로 나누어 떨어지도록 해야한다.

Tile quantization은 사용자가 관찰하기 어려운 경우가 많지만(더 큰 크기의 문제를 처리할 때와 비슷한 시간으로 실행되는 것으로 확인), Wave quantization은 쉽게 관찰할 수 있다. X * Y 행렬이 있고 t1 * t2 타일이 있다면 다음을 만족하면 wave quantization 비효율성이 발생하지 않는다.

Pytorch는 여러 종류의 타일 크기를 사용할 수 있지만, wave quantization에서 불완전한 Tile에 대한 성능 저하를 자동으로 최적화하지는 못한다. 즉, 타일 크기를 최적화하면 연산이 빨라지긴 하지만 wave quantization 때문에 모든 경우 완벽한 최적화는 어렵고 Pytorch는 이러한 기능이 부족하다.

따라서 Transformer에서 최상의 성능을 보장하려면 다음을 확인해야 한다.

a: attention head 수, h: hidden size, t: 텐서 병렬 수

Vocabulary size는 64로 나누어 떨어져야 한다.
micro batch크기 b는 가능한 커야 한다.
$b \cdot s, \frac{h}{a}, \frac{h}{t}$는 2의 거듭제겁으로 나누어 떨어져야 하며, 64를 넘는 크기는 더 이상 성능 향상에 도움되지 않는다.
$(b \cdot a) / t$는 정수여야 한다.
t 는 가능한 작아야 한다. (gpu간 로드 밸런싱과 통신 비용 때문에)

그리고 micro batch b가 2의 거듭제곱으로 나누어질 필요가 없는 이유는, 시퀀스 길이 s가 이미 큰 2의 거듭제곱이기 때문이다. 그리고 파이프라인 병렬화를 사용하여 학습하는 것이 최적인지 여부는 컴퓨팅 설정의 세부사항에 따라 달라진다. 특히 노드 간 연결 속도와 대역폭이 중요한 요소이다. 모든 경우에 레이어 수는 파이프라인 병렬 단계(모델 연산을 여러 gpu나 노드에 나눠 수행하는 각각의 처리 구간) 수로 나누어 떨어지는 것이 최적이다. 예를 들어 레이어 수가 12고 gpu 수가 4개면 각 gpu는 3개의 레이어를 맡게 되지만 레이어 수가 13개라면 3,3,3,4 처럼 각 단계 마다 맡는 레이어 수가 달라진다.

GPT-3(2.7B)에서 hidden size 2560이며 attention head 수는 32라서 2560/32=80으로 64배수가 되지 못한다. 따라서 히든을 4096으로 늘리거나 헤드 수를 20으로 줄일 수 있는데 히든 사잊르르 늘리면 파라미터 수가 2배가 되므로 헤드 수를 줄인다. 작은 모델의 경우 FlashAttention v2를 사용하여 이러한 영향을 완화하거나 그림 10처럼 saturation point에 가능한 빨리 도달하도록 h를 최대한 증가시키는 것을 추천한다.

Decoder only 구조는 대체로 표준화 되어있으며(GPT-2기반), 최근 연구에서 몇 가지 아키텍처 수정이 인기를 끌고 있다.

1. Parallel Layers

기존에는 attention과 mlp를 순차적으로 계산했다. $y = x + \text{MLP(Norm(}x + \text{Attn(Norm(}x))))$

이제는 Transformer 블록을 병렬적으로 정의한다. $y = x + \text{MLP(Norm(}x)) + \text{Attn(Norm(}x))$

실제로 두 브랜치가 동시에 계산되는 것은 아니고, 이 정의로 인한 속도 향상은 MLP와 Attn 블록을 하나의 커널로 융합하여 달성된다.

2. Alternative Positional Embeddings

기존 pointwise operations 기반 위치 임베딩에서 최근에는 Rotary, ALiBi 임베딩이 더 인기를 끌고 있다. Rotary, ALiBi 임베딩에 필요한 GEMM연산이 조금 더 느리지만 이 임베딩이 가져오는 모델 정확도 향상은 일반적으로 그만한 가치가 있다고여겨진다. 최근에는 로터리 임베딩용 맞춤형 커널이 도입되어 비용을 더 줄일 수 있었다.

3. Flash Attention

LLM에 널리 사용되는 혁신적인 Attention Kernel이다. 위 그림에서 볼 수 있듯 Flashattention은 Roofline Model을 따른다. Roofline 모델이란 HW 성능 한계와 알고리즘의 효율성으로 시각적으로 나타내는 의미를 말한다. 따라서 Flashattention을 사용할 때 어텐션 연산의 성능이 어느 지점에서 성능 한계에 도달한다는 의미이고, h를 가능한 크게 키우는 것이 가장 효율적인 방법이라는 단순한 결론이 도출되게 한다.

Parallel Layers, Alternative Positional Embeddings, Flash Attention은 해당 논문의 분석에 미치는 영향이 거의 없다. 하지만 SwiGLU와 8h/3 MLPs는 좀 다르다.

4. SwiGLU와 8h/3 MLPs

SwiGLU는 MLP블록에 3개의 matrix를 포함하게 된다. MLP 블록의 총 파라미터 수를 유지하기 위해서는 SwiGLU에서 $d_{ff}=8/3 \cdot h$를 사용하는 대신 $d_{ff}=4 \cdot h$를 사용하는 것을 제안한다. 해당 논문에서 최적으 GEMM 성능을 찾기 위한 h를 찾는 방법을 따랐다면 8/3는 모든 정렬을 깨트리기 때문에 MLP블록의 성능을 훨씬 느리게 만들 수 있다.

여기서 8/3은 단순한 제안일 뿐이며 이 값이 최적은 아니다. Llama2를 보면 7B의 경우는 11008/4096 = 2.6875라는 계수를 사용하고 있으며, 이는 8/3 = 2.667에 상당히 가깝다. 그리고 70B는 28672/8192 = 3.5 계수를 사용하고 있다. 이는 SwiGLU를 사용하지 않는 Transformer보다 더 큰 파라미터를 가지게 한다. 이처럼 h를 이미 잘 선택했다면 성능을 극대화할 수 있는 다른 계수를 찾아볼 수 있다.

추가적으로 GPU노드 수에 따른 하이퍼파라미터에 대해서도 이야기한다. 일반적으로 데이터 센터는 8개의 GPU를 노드에 장착하지만, Summit 슈퍼컴퓨터와 같은 일부 시스템에서는 6개의 GPU를 사용한다. 이 경우, 텐서 병렬화가 GPU 수와 같을 때 가장 효율적인 방식이지만, 6-GPU 노드에서는 성능 최적화에 문제가 발생할 수 있다. 8-GPU에서 사용되는 모델 아키텍처는 6-GPU 노드에서 구현이 불가능하거나 비효율적일 수 있으며, 이를 해결하기 위한 방법이 있다면, 다른 GPU 설정에서 배포 시 문제가 발생할 수 있다. 따라서 프리트레이닝을 최적화할 것인지, 아니면 파인튜닝이나 추론에 더 적합한 하이퍼파라미터를 선택할 것인지에 대한 신중한 결정이 필요하다고 한다.

앤트로픽ceo 에세이 Machines of Loving Grace, 전문 요약 번역

minkyung — Fri, 18 Oct 2024 16:57:13 +0900

원문 : https://darioamodei.com/machines-of-loving-grace

Anthropic CEO 다리오 아모데이가 AI가 어떻게 세상을 더 나은 곳으로 변화시킬 수 있는지에 대해 작성한 에세이이다. 아모데이가 강조하는 것 처럼 AI가 가져올 세상을 급진적이면서 동시에 자세하게 논의한다. AI 기술이 대두된 이후로 AI 기술이 가져오는 미래에 대해서 '급진적으로만' 다뤄지는 경우가 많았다. 즉 이를 진지하게 분석하는 것이 아닌 'SF적'으로 표현해왔다는 것이다. 이를 경계하고 앞으로는 AI 기술이 가져올 미래에 대해 실질적인 기술 목표와 비전을 보다 자세하게 논의하여야 한다고 주장한다. 그리고 이 에세이가 이를 위한 시작의 계기로 봤으면 좋겠다고 아모데이는 말한다.

Machines of Loving Grace

사랑의 은혜를 지닌 기계들 (기계가 사람의 감정이나 사랑과 결합되는 것을 상징적으로 나타낸다.)

How AI Could Transform the World for the Better

AI가 세상을 더 나은 곳으로 변화시키는 법

원문

I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks. Because of this, people sometimes draw the conclusion that I’m a pessimist or “doomer” who thinks AI will be mostly bad or dangerous. I don’t think that at all. In fact, one of my main reasons for focusing on risks is that they’re the only thing standing between us and what I see as a fundamentally positive future. I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right. Of course no one can know the future with any certainty or precision, and the effects of powerful AI are likely to be even more unpredictable than past technological changes, so all of this is unavoidably going to consist of guesses. But I am aiming for at least educated and useful guesses, which capture the flavor of what will happen even if most details end up being wrong. I’m including lots of details mainly because I think a concrete vision does more to advance discussion than a highly hedged and abstract one.

First, however, I wanted to briefly explain why I and Anthropic haven’t talked that much about powerful AI’s upsides, and why we’ll probably continue, overall, to talk a lot about risks. In particular, I’ve made this choice out of a desire to:

Maximize leverage. The basic development of AI technology and many (not all) of its benefits seems inevitable (unless the risks derail everything) and is fundamentally driven by powerful market forces. On the other hand, the risks are not predetermined and our actions can greatly change their likelihood.
Avoid perception of propaganda. AI companies talking about all the amazing benefits of AI can come off like propagandists, or as if they’re attempting to distract from downsides. I also think that as a matter of principle it’s bad for your soul to spend too much of your time “talking your book”.
Avoid grandiosity. I am often turned off by the way many AI risk public figures (not to mention AI company leaders) talk about the post-AGI world, as if it’s their mission to single-handedly bring it about like a prophet leading their people to salvation. I think it’s dangerous to view companies as unilaterally shaping the world, and dangerous to view practical technological goals in essentially religious terms.
Avoid “sci-fi” baggage. Although I think most people underestimate the upside of powerful AI, the small community of people who do discuss radical AI futures often does so in an excessively “sci-fi” tone (featuring e.g. uploaded minds, space exploration, or general cyberpunk vibes). I think this causes people to take the claims less seriously, and to imbue them with a sort of unreality. To be clear, the issue isn’t whether the technologies described are possible or likely (the main essay discusses this in granular detail)—it’s more that the “vibe” connotatively smuggles in a bunch of cultural baggage and unstated assumptions about what kind of future is desirable, how various societal issues will play out, etc. The result often ends up reading like a fantasy for a narrow subculture, while being off-putting to most people.

Yet despite all of the concerns above, I really do think it’s important to discuss what a good world with powerful AI could look like, while doing our best to avoid the above pitfalls. In fact I think it is critical to have a genuinely inspiring vision of the future, and not just a plan to fight fires. Many of the implications of powerful AI are adversarial or dangerous, but at the end of it all, there has to be something we’re fighting for, some positive-sum outcome where everyone is better off, something to rally people to rise above their squabbles and confront the challenges ahead. Fear is one kind of motivator, but it’s not enough: we need hope as well.

The list of positive applications of powerful AI is extremely long (and includes robotics, manufacturing, energy, and much more), but I’m going to focus on a small number of areas that seem to me to have the greatest potential to directly improve the quality of human life. The five categories I am most excited about are:

Biology and physical health
Neuroscience and mental health
Economic development and poverty
Peace and governance
Work and meaning

My predictions are going to be radical as judged by most standards (other than sci-fi “singularity” visions), but I mean them earnestly and sincerely. Everything I’m saying could very easily be wrong (to repeat my point from above), but I’ve at least attempted to ground my views in a semi-analytical assessment of how much progress in various fields might speed up and what that might mean in practice. I am fortunate to have professional experience in both biology and neuroscience, and I am an informed amateur in the field of economic development, but I am sure I will get plenty of things wrong. One thing writing this essay has made me realize is that it would be valuable to bring together a group of domain experts (in biology, economics, international relations, and other areas) to write a much better and more informed version of what I’ve produced here. It’s probably best to view my efforts here as a starting prompt for that group.

나는 강력한 AI의 위험에 대해 많이 생각하고 이야기를 나누고 Anthropic은 이러한 위험을 줄이기 위한 연구를 많이 하고 있다. 이 때문에 사람들은 때때로 내가 AI는 대부분 나쁘고 위험하다고 생각하는 비관주의자(pessimist) 또는 "doomer(디스토피아적 세계관을 가진 사람)"라고 결론을 내리곤 한다. 나는 전혀 그렇게 생각하지 않을 뿐더러 내가 AI의 위험에 대해 주로 초점을 맞추는 것은 이 위험이 궁극적인 유일한 장애물이라고 보기 때문이다. 나는 또한 대부분의 사람들이 AI의 긍정적인 잠재력이 얼마나 급진적일 수 있는지 과소평가하고 있다고 생각하는 한편, AI의 위험이 얼마나 심각할 수 있는지 또한 과소평가하고 있다고 생각한다.

이 에세이에서는 이 긍정적인 면의 모습(모든 것이 잘 풀릴 경우 AI가 있는 세상이 어떻게 보일지)이 어떨지 설명할 것이다. 물론 아무도 이를 예측할 수 없다. 세부적인 것은 틀리더라도 결국 일어날 일의 본질을 포착할 수 있는 교육적이고 유용한 추측을 목표로 하고 있다. 특히 세부사항을 많이 포함하려고 하는데, 구체적인 비전을 제시하는 것이 추상적이고 조건이 많은 비전보다 논의를 더 진전시킨다고 믿기 때문이다.

그러나 그 전에 나와 Anthropic이 왜 AI의 긍정적인 면보다 위험에 대해 많이 이야기할 것인지 간단히 설명하고 싶다.

Maximize leverage : AI의 발전과 많은 혜택은 불가피해 보이며 이는 본질적으로 강력한 시장의 세력에 의해 주도되고 있다. 반면 위험은 미리 정해진 것이 아니고 우리의 행동이 위험 가능성에 큰 영향을 미칠 수 있다.
Avoid perception of propaganda : AI 기업들이 AI의 놀라운 이점에 대해 이야기하는 모습은 마치 선전가(propaganda)처럼 보일 수 있거나 단점에서 눈을 돌리려는 시도로 느껴질 수 있다. 또한 너무 많은 시간을 "자기 이익을 홍보하는 데(talking your book)" 보내는 것은 영혼에 좋지 않다고 생각한다.
Avoid gradiosity : 나는 많은 AI 위험 관련 공적인 인물들(AI회사 리더들 포함)이 AGI 이후의 세상에 대해 이야기하는 방식이 종종 불편하다. 마치 그들이 예언자처럼 사람들을 구원으로 이끄는 사명감을 가지고 이를 실현하는 것 처럼 보인다. 나는 기업들이 세상을 일방적으로 형성한다고 보는 것은 위험하고, 실용적인 기술 목표를 본질적인 종교적 관점으로 보는 것도 위험하다고 생각한다.
Avoid 'sci-fi' baggage : 대부분의 사람들이 AI의 긍정적 잠재력을 과소평가한다고 생각하지만, 급진적인 AI 미래를 논의하는 소수의 사람들은 종종 지나치게 'SF적인' 톤으로 이야기한다(featuring e.g. 업로드된 정신, 우주 탐사, 사이펑크 분위기 등). AI가 있는 미래를 상상하고 예상되는 변화나 사회적 이슈가 '바람직하다'거나 '그럴 것이다'라고 가정하고 묘사되지만, 그 미래가 실제로 어떻게 될지에 대한 논의와 기술적으로 그것이 가능한지 여부가 명확하게 설명되지 않는다. 그 결과 이는 좁은 하위문화의 판타지처럼 읽히면서 대다수의 사람들에게 거부감을 일으키게 된다.

강력한 Ai의 긍정적 applications의 목록은 매우 길지만(로봇, 제조, 에너지 등등을 포함해서), 나는 인간의 삶의 질을 직접적으로 향상시킬 수 있는 가능성이 큰 몇 가지 분야에 집중할 것이다. 가장 기대하는 범주는 아래 5가지이다.

Biology and physical health (생물학과 신체건강)
Neuroscience and mental health (신경과학과 정신건강)
Economic development and poverty (경제발전과 빈곤)
Peach and governance (평화와 통치)
Work and meaning (일과 의미)

내 예측은 대부분의 기분으로는 급진적일 수 있지만('SF적인 특이점' 비전 같은 것들을 제외하고), 나는 그것들을 진지하고 성실하게 말하는 것이다. 이 예측의 모든 것이 쉽게 틀릴 수 있지만 적어도 다양한 분야에서 진전이 얼마나 빨라질 수 있는지 그것이 어떤 의미를 가지는지 반쯤 분석적인 평가에 근거해 보려고 노력하였다. 나는 생물학과 신경과학 분야에서 전문적인 경험을 가지고 있으며 경제 발전 분야에서도 정보가 풍부한 아마추어이다. 이 글을 쓰며 깨달은 점은 생물학, 경제학, 국제 관계 등 다양한 분야의 전문가들이 모여 내가 여기 작성한 것보다 더 나은, 더 정보가 풍부한 version을 작성하는 것이 가치가 있을 것이라는 점이다. 내 노력은 이를 위한 시작점으로 보는 것이 좋을 것이다.

원문

Basic assumptions and framework

To make this whole essay more precise and grounded, it’s helpful to specify clearly what we mean by powerful AI (i.e. the threshold at which the 5-10 year clock starts counting), as well as laying out a framework for thinking about the effects of such AI once it’s present.

What powerful AI (I dislike the term AGI) will look like, and when (or if) it will arrive, is a huge topic in itself. It’s one I’ve discussed publicly and could write a completely separate essay on (I probably will at some point). Obviously, many people are skeptical that powerful AI will be built soon and some are skeptical that it will ever be built at all. I think it could come as early as 2026, though there are also ways it could take much longer. But for the purposes of this essay, I’d like to put these issues aside, assume it will come reasonably soon, and focus on what happens in the 5-10 years after that. I also want to assume a definition of what such a system will look like, what its capabilities are and how it interacts, even though there is room for disagreement on this.

By powerful AI, I have in mind an AI model—likely similar to today’s LLM’s in form, though it might be based on a different architecture, might involve several interacting models, and might be trained differently—with the following properties:

In terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields – biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.
In addition to just being a “smart thing you talk to”, it has all the “interfaces” available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world.
It does not just passively answer questions; instead, it can be given tasks that take hours, days, or weeks to complete, and then goes off and does those tasks autonomously, in the way a smart employee would, asking for clarification as necessary.
It does not have a physical embodiment (other than living on a computer screen), but it can control existing physical tools, robots, or laboratory equipment through a computer; in theory it could even design robots or equipment for itself to use.
The resources used to train the model can be repurposed to run millions of instances of it (this matches projected cluster sizes by ~2027), and the model can absorb information and generate actions at roughly 10x-100x human speed5. It may however be limited by the response time of the physical world or of software it interacts with.
Each of these million copies can act independently on unrelated tasks, or if needed can all work together in the same way humans would collaborate, perhaps with different subpopulations fine-tuned to be especially good at particular tasks.

We could summarize this as a “country of geniuses in a datacenter”.

Clearly such an entity would be capable of solving very difficult problems, very fast, but it is not trivial to figure out how fast. Two “extreme” positions both seem false to me. First, you might think that the world would be instantly transformed on the scale of seconds or days (“the Singularity”), as superior intelligence builds on itself and solves every possible scientific, engineering, and operational task almost immediately. The problem with this is that there are real physical and practical limits, for example around building hardware or conducting biological experiments. Even a new country of geniuses would hit up against these limits. Intelligence may be very powerful, but it isn’t magic fairy dust.

Second, and conversely, you might believe that technological progress is saturated or rate-limited by real world data or by social factors, and that better-than-human intelligence will add very little. This seems equally implausible to me—I can think of hundreds of scientific or even social problems where a large group of really smart people would drastically speed up progress, especially if they aren’t limited to analysis and can make things happen in the real world (which our postulated country of geniuses can, including by directing or assisting teams of humans).

I think the truth is likely to be some messy admixture of these two extreme pictures, something that varies by task and field and is very subtle in its details. I believe we need new frameworks to think about these details in a productive way.

Economists often talk about “factors of production”: things like labor, land, and capital. The phrase “marginal returns to labor/land/capital” captures the idea that in a given situation, a given factor may or may not be the limiting one – for example, an air force needs both planes and pilots, and hiring more pilots doesn’t help much if you’re out of planes. I believe that in the AI age, we should be talking about the marginal returns to intelligence, and trying to figure out what the other factors are that are complementary to intelligence and that become limiting factors when intelligence is very high. We are not used to thinking in this way—to asking “how much does being smarter help with this task, and on what timescale?”—but it seems like the right way to conceptualize a world with very powerful AI.

My guess at a list of factors that limit or are complementary to intelligence includes:

Speed of the outside world. Intelligent agents need to operate interactively in the world in order to accomplish things and also to learn. But the world only moves so fast. Cells and animals run at a fixed speed so experiments on them take a certain amount of time which may be irreducible. The same is true of hardware, materials science, anything involving communicating with people, and even our existing software infrastructure. Furthermore, in science many experiments are often needed in sequence, each learning from or building on the last. All of this means that the speed at which a major project—for example developing a cancer cure—can be completed may have an irreducible minimum that cannot be decreased further even as intelligence continues to increase.
Need for data. Sometimes raw data is lacking and in its absence more intelligence does not help. Today’s particle physicists are very ingenious and have developed a wide range of theories, but lack the data to choose between them because particle accelerator data is so limited. It is not clear that they would do drastically better if they were superintelligent—other than perhaps by speeding up the construction of a bigger accelerator.
Intrinsic complexity. Some things are inherently unpredictable or chaotic and even the most powerful AI cannot predict or untangle them substantially better than a human or a computer today. For example, even incredibly powerful AI could predict only marginally further ahead in a chaotic system (such as the three-body problem) in the general case, as compared to today’s humans and computers.
Constraints from humans. Many things cannot be done without breaking laws, harming humans, or messing up society. An aligned AI would not want to do these things (and if we have an unaligned AI, we’re back to talking about risks). Many human societal structures are inefficient or even actively harmful, but are hard to change while respecting constraints like legal requirements on clinical trials, people’s willingness to change their habits, or the behavior of governments. Examples of advances that work well in a technical sense, but whose impact has been substantially reduced by regulations or misplaced fears, include nuclear power, supersonic flight, and even elevators.
Physical laws. This is a starker version of the first point. There are certain physical laws that appear to be unbreakable. It’s not possible to travel faster than light. Pudding does not unstir. Chips can only have so many transistors per square centimeter before they become unreliable. Computation requires a certain minimum energy per bit erased, limiting the density of computation in the world.

There is a further distinction based on timescales. Things that are hard constraints in the short run may become more malleable to intelligence in the long run. For example, intelligence might be used to develop a new experimental paradigm that allows us to learn in vitro what used to require live animal experiments, or to build the tools needed to collect new data (e.g. the bigger particle accelerator), or to (within ethical limits) find ways around human-based constraints (e.g. helping to improve the clinical trial system, helping to create new jurisdictions where clinical trials have less bureaucracy, or improving the science itself to make human clinical trials less necessary or cheaper).

Thus, we should imagine a picture where intelligence is initially heavily bottlenecked by the other factors of production, but over time intelligence itself increasingly routes around the other factors, even if they never fully dissolve (and some things like physical laws are absolute). The key question is how fast it all happens and in what order.

With the above framework in mind, I’ll try to answer that question for the five areas mentioned in the introduction.

Basic assumptions and framework

강력한 AI(나는 AGI라는 말을 싫어한다)가 무엇처럼 보일지 언제 등장할지에 대한 문제는 그 자체로 매우 큰 주제이다. 어떤 사람들은 결코 만들어지지 않을 것이라고 생각하지만 나는 2026년쯤에 올 수 있다고 생각하지만 훨씬 더 걸릴 수도 있다. 이 에세이 목적을 위해 그런 것은 제쳐두고 강력한 AI가 곧 온다고 가정을 한다. 그 이후 5-10년 동안 무슨일이 일어날지 집중적으로 다룰 것이다. 또한 이 시스템이 어떤 모습일지, 어떻게 상호작용 할지 정의하고자 한다.

내가 말하는 강력한 AI란,

현재의 언어 모델(LLM)과 비슷한 형태일 가능성이 높다. 하지만 아키텍처나, 훈련방식이 다를 수 있고 여러 상호작용 모델을 포함할 수 있다. 이는 다음과 같은 특성을 가질 것이다.

순수 지능 측면에서, 생물학, 프로그래밍, 수학, 공학, 작문 등 대부분의 관련 분야에서 노벨상 수상자보다 똑똑하다.
단지 "smart thing you talke to" 이상이다. 텍스트, 오디오, 비디오, 마우스 키보드 제어, 인터넷 액세스 등 인간이 사용하는 모든 인터페이스를 가지고 있다. 이 AI는 인터넷 작업 수행, 인간에게 지시하거나 받는 것, 재료를 주문하거나 실험을 지시하는 것 등 이 인터페이스를 통해 허용되는 모든 작업과 의사소통을 수행할 수 있다.
단순 질문에 대한 답을 하는 게 아니라 긴 시간이 걸리는 명확성이 필요한 작업도 자율적으로 수행하는 똑똑한 직원과 유사하다.
물리적 구현은 없지만 컴퓨터를 통해 물리적 도구나 로봇, 실험 장비를 제어할 수 있고 이론적으로 스스로 사용할 로봇이나 장비를 설계할 수도 있다.
한 번 훈련된 모델의 자원은 그대로 재사용하여 수백만 개의 인스턴스를 운영할 수 있다. 정보를 흡수하고 행동을 하는 속도가 대체로 인간의 10~100배 빠르다. 그러나 물리적 시계나 상호작용하는 SW 응답시간에 의해 제한될 수 있다.
수백만 개의 복사 모델은 각각 독립적으로 무관한 작업을 수행할 수 있고 필요하다면 인간이 협력하는 방식처럼 협력하여 작업을 수행할 수 있다. 그리고 아마도 특정 작업에 뛰어나도록 파인튠된 다른 하위그룹이 존재할 수도 있을 것이다.

우리는 이것을 "데이터 센터에 있는 천재들의 나라(country of geniuses in a datacenter)"라고 요약할 수 있다.

이러한 존재들은 매우 어려운 문제를 매우 빠르게 해결할 수 있겠지만 그 속도가 얼마나 될지는 간단히 파악할 수 없다. 그리고 아래 두가지 '극단적인' 입장이 모두 잘못되었다고 생각한다.

(지나치게 낙관적) 특이점(the Singularity)처럼 우수한 지능이 거의 즉시 모든 과학적 공학적 운영적 문제를 해결하며 세계가 몇 초 혹은 몇 일내에 즉각적으로 변할 것이라고 생각하는 것. 하지만 이는 물리적이고 실용적인 한계가 존재한다. 하드웨어를 구축하거나 생물학적 실험을 수행하는 데는 제한이 있고 이 강력한 지능이 마법의 요정 가루는 아니다.
(지나치게 비관적) 기술 발전이 실제 세계의 데이터나 사회적 요인에 의해 한계에 달하거나 포화에 이르어 인간보다 뛰어난 지능이 큰 변화를 일으키지 않을 것이라 생각하는 것. 나는 이에 반하는 수백가지의 과학적 사회적 문제를 떠올릴 수 있다. 뛰어난 지능을 가진 사람들이 모여 실제로 행동할 수 있다면 문제 해결 속도가 급격히 빨라질 것이다. 그들이 분석만 하는 데 그치지 않고 실제로 일을 할 수 있다면 더더욱 그렇다. (상상 속 천재 국가가 바로 그런 일을 할 수 있고 인간 팀을 지휘하거나 도울 수 있다.)

진실은 이 극단의 혼합된 형태일 가능성이 크다. 비행기가 부족하면 조종사를 더 고용하는 경우가 큰 도움이 되지 않는 것처럼 경제학자들은 "marginal returns(한계 수익) to labor/land/capital"에 대해 이야기한다. 이처럼 AI시대에는 "marginal returns to intelligence"에 대해 이야기해야 한다고 생각하며, 지능에 보완적인 요소와 지능이 매우 높을 때 제한적인 요소가 무엇인지 파악해야 한다고 믿는다. 우리는 단지 더 똑똑해지면 이러한 task들에 얼마나 도움이 되는지 시간 규모는 어떻게 되는지를 묻는 것에 익숙하지 않다. 내 생각에 지능을 제한하거나 보완하는 요소들의 목록은 다음과 같다.

Speed of the outside world
세상의 속도는 제한되어 있다. 세포와 동물은 고정된 속도로 움직이므로 이에 대한 실험은 일정 시간이 소요되어 그 시간 이상은 줄일 수 없는 경우가 많다. SW infra도 마찬가지이다. 과학에서는 이전 실험을 통해 배우거나 그 기반을 둬야 하기 때문에 많은 실험이 순차적으로 필요하다. 따라서 암 치료법 개발과 같은 큰 프로젝트에 대해 최소한의 시간이 필요할 수 있고 지능이 계속 증가하더라도 이를 더 이상 줄일 수 없는 한계가 있을 수 있다.
Need for data
물리학자들이 매우 다양한 이론을 개발했더라도 particle accelerator 데이터가 너무 제한적이어 어떤 이론을 채택할지 결정할 수 없다. 이처럼 슈퍼지능이 있다하더라도 원시데이터가 부족하면 더 많은 지능이 도움되지 않는다.
Intrinsic complexity
강력한 AI도 chaotic system(e.x. three-body problem)과 같은 본질적으로 예측 불가능하거나 혼돈적인 것들은 풀 수 없다.
Constraints from humans
많은 일들은 법을 어기거나, 인간에게 해를 끼치거나, 사회를 망치는 일이 아니면 불가능하다. Aligned AI가 이런일을 하지 않기를 원하지만 많은 인간 사회 구조는 비효율적이거나 해를 끼치지만, 임상 실험에 대한 법적 요구사항, 사람들의 습관 변화에 대한 의지, 정부의 행동과 같은 제약을 존중하며 이를 변화시키는 것은 어렵다. 기술적으로는 잘 작동하지만 이와 같은 규제나 잘못된 두려움으로 인해 그 영향이 줄어든 예로는 원자력, 초음속 비행, 심지어 엘레베이터가 있다.
Physical laws
첫번째 보다 더 극단적인 버전으로, 깨지지 않는 물리 법칙들이 있다. 빛의 속도보다 빠르게 여행할 수 없다던가 푸딩은 다시 휘저을 수 없다 던가 하는 것들이다. 칩은 일정 밀도를 넘으면 불안정해져 일정 수의 트랜지스터만 가질 수 있고 컴퓨터 연산은 비트를 지울 때 최소한의 에너지를 필요로 하며 이 세상에서의 연산 밀도를 제한한다.

이는 단기적으로는 강한 제약이 될 수 있지만 장기적으로는 지능에 의해 유연해질 수 있다. 예) 지능을 활용해 동물 실험이 필요했던 것을 실험실에서 할 수 있도록 새로운 패러다임을 개발하거나, 새로운 데이터를 수집하는 데 필요한 도구를 만들거나

따라서 우리는 처음에는 다른 생산 요소들에 의해 지능이 크게 제한되는 상황을 상상할 수 있지만, 시간이 지남에 따라 지능 자체가 점차 다른 요소들을 우회하는 모습을 상상해야 한다. 물론 이러한 제한적 요소들이 완전히 사라지진 않지만(물리 법칙과 같은 것들은 절대적이므로), 지능은 점차 다른 요소들을 피할 수 있게 될 것이다.

여기서 핵심 질문은 이러한 변화가 얼마나 빠르게 일어나느냐 이고 어떤 순서로 이루어지는가이다. 나는 이 프레임워크를 염두해두고 다섯가지 분야에 대해 이 질문에 답해보겠다.

원문

1. Biology and health

Biology is probably the area where scientific progress has the greatest potential to directly and unambiguously improve the quality of human life. In the last century some of the most ancient human afflictions (such as smallpox) have finally been vanquished, but many more still remain, and defeating them would be an enormous humanitarian accomplishment. Beyond even curing disease, biological science can in principle improve the baseline quality of human health, by extending the healthy human lifespan, increasing control and freedom over our own biological processes, and addressing everyday problems that we currently think of as immutable parts of the human condition.

In the “limiting factors” language of the previous section, the main challenges with directly applying intelligence to biology are data, the speed of the physical world, and intrinsic complexity (in fact, all three are related to each other). Human constraints also play a role at a later stage, when clinical trials are involved. Let’s take these one by one.

Experiments on cells, animals, and even chemical processes are limited by the speed of the physical world: many biological protocols involve culturing bacteria or other cells, or simply waiting for chemical reactions to occur, and this can sometimes take days or even weeks, with no obvious way to speed it up. Animal experiments can take months (or more) and human experiments often take years (or even decades for long-term outcome studies). Somewhat related to this, data is often lacking—not so much in quantity, but quality: there is always a dearth of clear, unambiguous data that isolates a biological effect of interest from the other 10,000 confounding things that are going on, or that intervenes causally in a given process, or that directly measures some effect (as opposed to inferring its consequences in some indirect or noisy way). Even massive, quantitative molecular data, like the proteomics data that I collected while working on mass spectrometry techniques, is noisy and misses a lot (which types of cells were these proteins in? Which part of the cell? At what phase in the cell cycle?).

In part responsible for these problems with data is intrinsic complexity: if you’ve ever seen a diagram showing the biochemistry of human metabolism, you’ll know that it’s very hard to isolate the effect of any part of this complex system, and even harder to intervene on the system in a precise or predictable way. And finally, beyond just the intrinsic time that it takes to run an experiment on humans, actual clinical trials involve a lot of bureaucracy and regulatory requirements that (in the opinion of many people, including me) add unnecessary additional time and delay progress.

Given all this, many biologists have long been skeptical of the value of AI and “big data” more generally in biology. Historically, mathematicians, computer scientists, and physicists who have applied their skills to biology over the last 30 years have been quite successful, but have not had the truly transformative impact initially hoped for. Some of the skepticism has been reduced by major and revolutionary breakthroughs like AlphaFold (which has just deservedly won its creators the Nobel Prize in Chemistry) and AlphaProteo 11, but there’s still a perception that AI is (and will continue to be) useful in only a limited set of circumstances. A common formulation is “AI can do a better job analyzing your data, but it can’t produce more data or improve the quality of the data. Garbage in, garbage out”.

But I think that pessimistic perspective is thinking about AI in the wrong way. If our core hypothesis about AI progress is correct, then the right way to think of AI is not as a method of data analysis, but as a virtual biologist who performs all the tasks biologists do, including designing and running experiments in the real world (by controlling lab robots or simply telling humans which experiments to run – as a Principal Investigator would to their graduate students), inventing new biological methods or measurement techniques, and so on. It is by speeding up the whole research process that AI can truly accelerate biology. I want to repeat this because it’s the most common misconception that comes up when I talk about AI’s ability to transform biology: I am not talking about AI as merely a tool to analyze data. In line with the definition of powerful AI at the beginning of this essay, I’m talking about using AI to perform, direct, and improve upon nearly everything biologists do.

To get more specific on where I think acceleration is likely to come from, a surprisingly large fraction of the progress in biology has come from a truly tiny number of discoveries, often related to broad measurement tools or techniques12 that allow precise but generalized or programmable intervention in biological systems. There’s perhaps ~1 of these major discoveries per year and collectively they arguably drive >50% of progress in biology. These discoveries are so powerful precisely because they cut through intrinsic complexity and data limitations, directly increasing our understanding and control over biological processes. A few discoveries per decade have enabled both the bulk of our basic scientific understanding of biology, and have driven many of the most powerful medical treatments.

Some examples include:

CRISPR: a technique that allows live editing of any gene in living organisms (replacement of any arbitrary gene sequence with any other arbitrary sequence). Since the original technique was developed, there have been constant improvements to target specific cell types, increasing accuracy, and reducing edits of the wrong gene—all of which are needed for safe use in humans.
Various kinds of microscopy for watching what is going on at a precise level: advanced light microscopes (with various kinds of fluorescent techniques, special optics, etc), electron microscopes, atomic force microscopes, etc.
Genome sequencing and synthesis, which has dropped in cost by several orders of magnitude in the last couple decades.
Optogenetic techniques that allow you to get a neuron to fire by shining a light on it.
mRNA vaccines that, in principle, allow us to design a vaccine against anything and then quickly adapt it (mRNA vaccines of course became famous during COVID).
Cell therapies such as CAR-T that allow immune cells to be taken out of the body and “reprogrammed” to attack, in principle, anything.
Conceptual insights like the germ theory of disease or the realization of a link between the immune system and cancer13.

I’m going to the trouble of listing all these technologies because I want to make a crucial claim about them: I think their rate of discovery could be increased by 10x or more if there were a lot more talented, creative researchers. Or, put another way, I think the returns to intelligence are high for these discoveries, and that everything else in biology and medicine mostly follows from them.

Why do I think this? Because of the answers to some questions that we should get in the habit of asking when we’re trying to determine “returns to intelligence”. First, these discoveries are generally made by a tiny number of researchers, often the same people repeatedly, suggesting skill and not random search (the latter might suggest lengthy experiments are the limiting factor). Second, they often “could have been made” years earlier than they were: for example, CRISPR was a naturally occurring component of the immune system in bacteria that’s been known since the 80’s, but it took another 25 years for people to realize it could be repurposed for general gene editing. They also are often delayed many years by lack of support from the scientific community for promising directions (see this profile on the inventor of mRNA vaccines; similar stories abound). Third, successful projects are often scrappy or were afterthoughts that people didn’t initially think were promising, rather than massively funded efforts. This suggests that it’s not just massive resource concentration that drives discoveries, but ingenuity.

Finally, although some of these discoveries have “serial dependence” (you need to make discovery A first in order to have the tools or knowledge to make discovery B)—which again might create experimental delays—many, perhaps most, are independent, meaning many at once can be worked on in parallel. Both these facts, and my general experience as a biologist, strongly suggest to me that there are hundreds of these discoveries waiting to be made if scientists were smarter and better at making connections between the vast amount of biological knowledge humanity possesses (again consider the CRISPR example). The success of AlphaFold/AlphaProteo at solving important problems much more effectively than humans, despite decades of carefully designed physics modeling, provides a proof of principle (albeit with a narrow tool in a narrow domain) that should point the way forward.

Thus, it’s my guess that powerful AI could at least 10x the rate of these discoveries, giving us the next 50-100 years of biological progress in 5-10 years.14 Why not 100x? Perhaps it is possible, but here both serial dependence and experiment times become important: getting 100 years of progress in 1 year requires a lot of things to go right the first time, including animal experiments and things like designing microscopes or expensive lab facilities. I’m actually open to the (perhaps absurd-sounding) idea that we could get 1000 years of progress in 5-10 years, but very skeptical that we can get 100 years in 1 year. Another way to put it is I think there’s an unavoidable constant delay: experiments and hardware design have a certain “latency” and need to be iterated upon a certain “irreducible” number of times in order to learn things that can’t be deduced logically. But massive parallelism may be possible on top of that15.

What about clinical trials? Although there is a lot of bureaucracy and slowdown associated with them, the truth is that a lot (though by no means all!) of their slowness ultimately derives from the need to rigorously evaluate drugs that barely work or ambiguously work. This is sadly true of most therapies today: the average cancer drug increases survival by a few months while having significant side effects that need to be carefully measured (there’s a similar story for Alzheimer’s drugs). This leads to huge studies (in order to achieve statistical power) and difficult tradeoffs which regulatory agencies generally aren’t great at making, again because of bureaucracy and the complexity of competing interests.

When something works really well, it goes much faster: there’s an accelerated approval track and the ease of approval is much greater when effect sizes are larger. mRNA vaccines for COVID were approved in 9 months—much faster than the usual pace. That said, even under these conditions clinical trials are still too slow—mRNA vaccines arguably should have been approved in ~2 months. But these kinds of delays (~1 year end-to-end for a drug) combined with massive parallelization and the need for some but not too much iteration (“a few tries”) are very compatible with radical transformation in 5-10 years. Even more optimistically, it is possible that AI-enabled biological science will reduce the need for iteration in clinical trials by developing better animal and cell experimental models (or even simulations) that are more accurate in predicting what will happen in humans. This will be particularly important in developing drugs against the aging process, which plays out over decades and where we need a faster iteration loop.

Finally, on the topic of clinical trials and societal barriers, it is worth pointing out explicitly that in some ways biomedical innovations have an unusually strong track record of being successfully deployed, in contrast to some other technologies16. As mentioned in the introduction, many technologies are hampered by societal factors despite working well technically. This might suggest a pessimistic perspective on what AI can accomplish. But biomedicine is unique in that although the process of developing drugs is overly cumbersome, once developed they generally are successfully deployed and used.

To summarize the above, my basic prediction is that AI-enabled biology and medicine will allow us to compress the progress that human biologists would have achieved over the next 50-100 years into 5-10 years. I’ll refer to this as the “compressed 21st century”: the idea that after powerful AI is developed, we will in a few years make all the progress in biology and medicine that we would have made in the whole 21st century.

Although predicting what powerful AI can do in a few years remains inherently difficult and speculative, there is some concreteness to asking “what could humans do unaided in the next 100 years?”. Simply looking at what we’ve accomplished in the 20th century, or extrapolating from the first 2 decades of the 21st, or asking what “10 CRISPR’s and 50 CAR-T’s” would get us, all offer practical, grounded ways to estimate the general level of progress we might expect from powerful AI.

Below I try to make a list of what we might expect. This is not based on any rigorous methodology, and will almost certainly prove wrong in the details, but it’s trying to get across the general level of radicalism we should expect:

Reliable prevention and treatment of nearly all17 natural infectious disease. Given the enormous advances against infectious disease in the 20th century, it is not radical to imagine that we could more or less “finish the job” in a compressed 21st. mRNA vaccines and similar technology already point the way towards “vaccines for anything”. Whether infectious disease is fully eradicated from the world (as opposed to just in some places) depends on questions about poverty and inequality, which are discussed in Section 3.
Elimination of most cancer. Death rates from cancer have been dropping ~2% per year for the last few decades; thus we are on track to eliminate most cancer in the 21st century at the current pace of human science. Some subtypes have already been largely cured (for example some types of leukemia with CAR-T therapy), and I’m perhaps even more excited for very selective drugs that target cancer in its infancy and prevent it from ever growing. AI will also make possible treatment regimens very finely adapted to the individualized genome of the cancer—these are possible today, but hugely expensive in time and human expertise, which AI should allow us to scale. Reductions of 95% or more in both mortality and incidence seem possible. That said, cancer is extremely varied and adaptive, and is likely the hardest of these diseases to fully destroy. It would not be surprising if an assortment of rare, difficult malignancies persists.
Very effective prevention and effective cures for genetic disease. Greatly improved embryo screening will likely make it possible to prevent most genetic disease, and some safer, more reliable descendant of CRISPR may cure most genetic disease in existing people. Whole-body afflictions that affect a large fraction of cells may be the last holdouts, however.
Prevention of Alzheimer’s. We’ve had a very hard time figuring out what causes Alzheimer’s (it is somehow related to beta-amyloid protein, but the actual details seem to be very complex). It seems like exactly the type of problem that can be solved with better measurement tools that isolate biological effects; thus I am bullish about AI’s ability to solve it. There is a good chance it can eventually be prevented with relatively simple interventions, once we actually understand what is going on. That said, damage from already-existing Alzheimer’s may be very difficult to reverse.
Improved treatment of most other ailments. This is a catch-all category for other ailments including diabetes, obesity, heart disease, autoimmune diseases, and more. Most of these seem “easier” to solve than cancer and Alzheimer’s and in many cases are already in steep decline. For example, deaths from heart disease have already declined over 50%, and simple interventions like GLP-1 agonists have already made huge progress against obesity and diabetes.
Biological freedom. The last 70 years featured advances in birth control, fertility, management of weight, and much more. But I suspect AI-accelerated biology will greatly expand what is possible: weight, physical appearance, reproduction, and other biological processes will be fully under people’s control. We’ll refer to these under the heading of biological freedom: the idea that everyone should be empowered to choose what they want to become and live their lives in the way that most appeals to them. There will of course be important questions about global equality of access; see Section 3 for these.
Doubling of the human lifespan18. This might seem radical, but life expectancy increased almost 2x in the 20th century (from ~40 years to ~75), so it’s “on trend” that the “compressed 21st” would double it again to 150. Obviously the interventions involved in slowing the actual aging process will be different from those that were needed in the last century to prevent (mostly childhood) premature deaths from disease, but the magnitude of change is not unprecedented19. Concretely, there already exist drugs that increase maximum lifespan in rats by 25-50% with limited ill-effects. And some animals (e.g. some types of turtle) already live 200 years, so humans are manifestly not at some theoretical upper limit. At a guess, the most important thing that is needed might be reliable, non-Goodhart-able biomarkers of human aging, as that will allow fast iteration on experiments and clinical trials. Once human lifespan is 150, we may be able to reach “escape velocity”, buying enough time that most of those currently alive today will be able to live as long as they want, although there’s certainly no guarantee this is biologically possible.

It is worth looking at this list and reflecting on how different the world will be if all of it is achieved 7-12 years from now (which would be in line with an aggressive AI timeline). It goes without saying that it would be an unimaginable humanitarian triumph, the elimination all at once of most of the scourges that have haunted humanity for millennia. Many of my friends and colleagues are raising children, and when those children grow up, I hope that any mention of disease will sound to them the way scurvy, smallpox, or bubonic plague sounds to us. That generation will also benefit from increased biological freedom and self-expression, and with luck may also be able to live as long as they want.

It’s hard to overestimate how surprising these changes will be to everyone except the small community of people who expected powerful AI. For example, thousands of economists and policy experts in the US currently debate how to keep Social Security and Medicare solvent, and more broadly how to keep down the cost of healthcare (which is mostly consumed by those over 70 and especially those with terminal illnesses such as cancer). The situation for these programs is likely to be radically improved if all this comes to pass20, as the ratio of working age to retired population will change drastically. No doubt these challenges will be replaced with others, such as how to ensure widespread access to the new technologies, but it is worth reflecting on how much the world will change even if biology is the only area to be successfully accelerated by AI.

1. Biology and health

생물학은 인간 삶의 질을 직접적이고 명확하게 향상시킬 수 있는 과학적 발전의 잠재력이 가장 큰 분야일 것이다. "limiting factors"에 대해 이야기하면, 생물학에 지능을 직접 적용할 때의 주요 도전 과제는 데이터와 물리적 세계의 속도, 그리고 본질적인 복잡성이다(이 세가지는 서로 연관되어 있다).

데이터와 물리적 세계의 속도. 세포, 동물, 심지어 화학적 과정에 대한 실험은 물리적 세계의 속도에 의해 제한된다. 많은 생물학적 프로토콜은 박테리아나 다른 세포를 배양하거나 화학 반응이 일어나길 기다려야하며 이를 가속화할 명확한 방법은 없다. 동물 실험은 몇달, 인간 실험은 종종 몇 년, 수십 년이 걸리기도 한다. 이에 관한 데이터는 양 뿐 아니라 품질도 부족하다. 생물학적 효과에서 혼란스러운 요소로 부터 분리되거나, 인과적으로 개입하거나, 어떤 효과를 직접적으로 측정하는 명확한 데이터는 항상 부족하다.

본질적인 복잡성. 이러한 데이터 문제의 일부는 본질적인 복잡성에서 비롯된다. 예를 들어 인간 대사 과정의 생화학을 보여주는 다이어그램을 본 적이 있다면 이 복잡한 시스템에서 어떤 부분의 영향을 분리하는 것이 매우 어렵고 이 시스템을 정확하거나 예측 가능한 방식으로 개입하는 것이 더 어렵다는 것을 알 수 있다.

이 외에도 실제 임상 시험에는 많은 관료주의와 규제 조건이 있으며 이는 많은 사람들이 생각하는 것처럼 불필요한 추가 시간과 진행 지연을 초래한다. 임상 시험과 사회적 장벽에 관한 논의에서 생물의학 혁신이 다른 기술들과 비교해 매우 성공적으로 배치된 사례가 많다는 점을 명확히 언급할 가치가 있다. 많은 기술들이 기술적으로 잘 작동해도 사회적 요인 때문에 어려움을 겪고 있다. 이것이 AI에 대한 비관적 전망을 제시할 수도 있다. 그러나 생물의학은 독특하다. 약물이 개발되는 과정이 너무 복잡하긴 하지만 일단 개발되면 대부분 성공적으로 배치된다는 점에서 차별화된다.

이 점들을 고려할 때 생물학자들은 AI와 빅데이터가 생물학에 가치있다고 보는 것에 회의적이었으며 지난 30년 동안 생물학에 수학, 컴퓨터, 물리학자들이 적용한 기술이 꽤 성공적이었음에도 변혁적인 영향을 미친 것도 아니다. AlphaFold와 AlphaProteo 같은 혁명적 돌파구들이 AI에 대한 회의적인 시각을 어느 정도 줄였지만 여전히 AI가 유용한 상황이 제한적일 것이라는 인식은 남아 있다. "AI는 데이터를 잘 분석할 수 있지만 더 많은 데이터를 생성하거나 데이터의 품질을 향상시킬 수 없다. 쓰레기를 넣으면 쓰레가기 나온다."는 일반적인 견해가 그렇다.

하지만 나는 이 시각이 AI를 잘못 이해하는 것이라고 생각한다. AI 발전에 대한 우리의 핵심 가설이 맞다면 AI는 단순한 데이터 분석 도구가 아니라 "생물학자가 하는 모든 작업을 수행하는 가상의 생물학자"로 생각해야 한다.

생물학의 발전에서 놀라울 정도의 많은 진전이 정말 극 소수의 발견에서 비롯되었으며 이들은 생물학 시스템에 정확하지만 일반화된 또는 프로그래밍 가능한 개입을 가능하게 하는 도구나 기법에 관련이 있다. 이러한 발견은 연간 1건 정도이며 이들이 생물학에서 50% 이상의 발전을 이끌어낸다고 볼 수 있다. 이 발견이 강력한 이유는 본질적인 복잡성과 데이터의 한계를 뚫고 생물학적 과정을 이해하고 통제하는 능력을 직접적으로 향상시키기 때문이다. 몇 가지 예를 들면, CRISPR, micriscopy at a precise level, Genome sequencing and synthesis, Optogenetic, mRNA vaccines 이고 이러한 기술들의 발견 속도가 10배 이상 증가할 수 있다고 생각한다. 왜냐면 이러한 발견은 대게 소수의 뛰어난 연구자들에 의해 이뤄지고 그들이 자주 같은 발견을 반복하기 때문이다. 이는 우연한 탐색이 아닌 기술과 창의성의 결과이다.

AI가 가속화할 수 있는 발전의 예 - 전염병의 예방과 치료, 유전질환 예방과 치료법, 생물학적 자유(외모, 생식, 체중 등), 인간 수명 두 배 연장, 대부분의 암 퇴치, 알츠하이머 예방, 당뇨병, 비만, 심장 질환 등의 대부분의 질병은 암이나 알츠하이머보다는 더 쉽게 해결될 수 있을 것이다. AI는 이 발전의 속도를 10배 이상 가속화할 수 있을 것으로 예상하며 이는 생물학의 50-100년을 5-10년으로 압축할 것이다.

이 변화들이 강력한 AI의 도입을 예상한 소수의 사람들을 제외한 모두에게 얼마나 놀라울 것인지 과소평가하기는 어렵다. 예를 들어 현재 미국에서 수천 명의 경제학자와 정책 전문가들이 사회보장제도와 메디케어의 재정 안정을 어떻게 유지할지, 의료 비용을 어떻게 낮출지에 대한 논의가 있다 (의료비용 대부분은 70세 이상의 노인들, 특히 암과 같은 말기 질환 환자들에 의해 소비된다). 만약 언급한 발전들이 실현된다면, 경제활동 연령층과 은퇴하는 인구 비율이 극단적으로 변할 것이기 때문에, 이 프로그램들의 상황은 급진적으로 개선될 가능성이 크다. 물론 이 도전들은 새로운 기술에 대한 접근성을 보장하는 문제와 같은 다른 문제로 대체될 것이지만 AI가 생물학 분야만이라도 성공적으로 가속화한다면 세상이 얼마나 달라질지 깊이 생각해 볼 가치가 있다.

원문

2. Neuroscience and mind

In the previous section I focused on physical diseases and biology in general, and didn’t cover neuroscience or mental health. But neuroscience is a subdiscipline of biology and mental health is just as important as physical health. In fact, if anything, mental health affects human well-being even more directly than physical health. Hundreds of millions of people have very low quality of life due to problems like addiction, depression, schizophrenia, low-functioning autism, PTSD, psychopathy21, or intellectual disabilities. Billions more struggle with everyday problems that can often be interpreted as much milder versions of one of these severe clinical disorders. And as with general biology, it may be possible to go beyond addressing problems to improving the baseline quality of human experience.

The basic framework that I laid out for biology applies equally to neuroscience. The field is propelled forward by a small number of discoveries often related to tools for measurement or precise intervention – in the list of those above, optogenetics was a neuroscience discovery, and more recently CLARITY and expansion microscopy are advances in the same vein, in addition to many of the general cell biology methods directly carrying over to neuroscience. I think the rate of these advances will be similarly accelerated by AI and therefore that the framework of “100 years of progress in 5-10 years” applies to neuroscience in the same way it does to biology and for the same reasons. As in biology, the progress in 20th century neuroscience was enormous – for example we didn’t even understand how or why neurons fired until the 1950’s. Thus, it seems reasonable to expect AI-accelerated neuroscience to produce rapid progress over a few years.

There is one thing we should add to this basic picture, which is that some of the things we’ve learned (or are learning) about AI itself in the last few years are likely to help advance neuroscience, even if it continues to be done only by humans. Interpretability is an obvious example: although biological neurons superficially operate in a completely different manner from artificial neurons (they communicate via spikes and often spike rates, so there is a time element not present in artificial neurons, and a bunch of details relating to cell physiology and neurotransmitters modifies their operation substantially), the basic question of “how do distributed, trained networks of simple units that perform combined linear/non-linear operations work together to perform important computations” is the same, and I strongly suspect the details of individual neuron communication will be abstracted away in most of the interesting questions about computation and circuits22. As just one example of this, a computational mechanism discovered by interpretability researchers in AI systems was recently rediscovered in the brains of mice.

It is much easier to do experiments on artificial neural networks than on real ones (the latter often requires cutting into animal brains), so interpretability may well become a tool for improving our understanding of neuroscience. Furthermore, powerful AI’s will themselves probably be able to develop and apply this tool better than humans can.

Beyond just interpretability though, what we have learned from AI about how intelligent systems are trained should (though I am not sure it has yet) cause a revolution in neuroscience. When I was working in neuroscience, a lot of people focused on what I would now consider the wrong questions about learning, because the concept of the scaling hypothesis / bitter lesson didn’t exist yet. The idea that a simple objective function plus a lot of data can drive incredibly complex behaviors makes it more interesting to understand the objective functions and architectural biases and less interesting to understand the details of the emergent computations. I have not followed the field closely in recent years, but I have a vague sense that computational neuroscientists have still not fully absorbed the lesson. My attitude to the scaling hypothesis has always been “aha – this is an explanation, at a high level, of how intelligence works and how it so easily evolved”, but I don’t think that’s the average neuroscientist’s view, in part because the scaling hypothesis as “the secret to intelligence” isn’t fully accepted even within AI.

I think that neuroscientists should be trying to combine this basic insight with the particularities of the human brain (biophysical limitations, evolutionary history, topology, details of motor and sensory inputs/outputs) to try to figure out some of neuroscience’s key puzzles. Some likely are, but I suspect it’s not enough yet, and that AI neuroscientists will be able to more effectively leverage this angle to accelerate progress.

I expect AI to accelerate neuroscientific progress along four distinct routes, all of which can hopefully work together to cure mental illness and improve function:

Traditional molecular biology, chemistry, and genetics. This is essentially the same story as general biology in section 1, and AI can likely speed it up via the same mechanisms. There are many drugs that modulate neurotransmitters in order to alter brain function, affect alertness or perception, change mood, etc., and AI can help us invent many more. AI can probably also accelerate research on the genetic basis of mental illness.
Fine-grained neural measurement and intervention. This is the ability to measure what a lot of individual neurons or neuronal circuits are doing, and intervene to change their behavior. Optogenetics and neural probes are technologies capable of both measurement and intervention in live organisms, and a number of very advanced methods (such as molecular ticker tapes to read out the firing patterns of large numbers of individual neurons) have also been proposed and seem possible in principle.
Advanced computational neuroscience. As noted above, both the specific insights and the gestalt of modern AI can probably be applied fruitfully to questions in systems neuroscience, including perhaps uncovering the real causes and dynamics of complex diseases like psychosis or mood disorders.
Behavioral interventions. I haven’t much mentioned it given the focus on the biological side of neuroscience, but psychiatry and psychology have of course developed a wide repertoire of behavioral interventions over the 20th century; it stands to reason that AI could accelerate these as well, both the development of new methods and helping patients to adhere to existing methods. More broadly, the idea of an “AI coach” who always helps you to be the best version of yourself, who studies your interactions and helps you learn to be more effective, seems very promising.

It’s my guess that these four routes of progress working together would, as with physical disease, be on track to lead to the cure or prevention of most mental illness in the next 100 years even if AI was not involved – and thus might reasonably be completed in 5-10 AI-accelerated years. Concretely my guess at what will happen is something like:

Most mental illness can probably be cured. I’m not an expert in psychiatric disease (my time in neuroscience was spent building probes to study small groups of neurons) but it’s my guess that diseases like PTSD, depression, schizophrenia, addiction, etc. can be figured out and very effectively treated via some combination of the four directions above. The answer is likely to be some combination of “something went wrong biochemically” (although it could be very complex) and “something went wrong with the neural network, at a high level”. That is, it’s a systems neuroscience question—though that doesn’t gainsay the impact of the behavioral interventions discussed above. Tools for measurement and intervention, especially in live humans, seem likely to lead to rapid iteration and progress.
Conditions that are very “structural” may be more difficult, but not impossible. There’s some evidence that psychopathy is associated with obvious neuroanatomical differences – that some brain regions are simply smaller or less developed in psychopaths. Psychopaths are also believed to lack empathy from a young age; whatever is different about their brain, it was probably always that way. The same may be true of some intellectual disabilities, and perhaps other conditions. Restructuring the brain sounds hard, but it also seems like a task with high returns to intelligence. Perhaps there is some way to coax the adult brain into an earlier or more plastic state where it can be reshaped. I’m very uncertain how possible this is, but my instinct is to be optimistic about what AI can invent here.
Effective genetic prevention of mental illness seems possible. Most mental illness is partially heritable, and genome-wide association studies are starting to gain traction on identifying the relevant factors, which are often many in number. It will probably be possible to prevent most of these diseases via embryo screening, similar to the story with physical disease. One difference is that psychiatric disease is more likely to be polygenic (many genes contribute), so due to complexity there’s an increased risk of unknowingly selecting against positive traits that are correlated with disease. Oddly however, in recent years GWAS studies seem to suggest that these correlations might have been overstated. In any case, AI-accelerated neuroscience may help us to figure these things out. Of course, embryo screening for complex traits raises a number of societal issues and will be controversial, though I would guess that most people would support screening for severe or debilitating mental illness.
Everyday problems that we don’t think of as clinical disease will also be solved. Most of us have everyday psychological problems that are not ordinarily thought of as rising to the level of clinical disease. Some people are quick to anger, others have trouble focusing or are often drowsy, some are fearful or anxious, or react badly to change. Today, drugs already exist to help with e.g. alertness or focus (caffeine, modafinil, ritalin) but as with many other previous areas, much more is likely to be possible. Probably many more such drugs exist and have not been discovered, and there may also be totally new modalities of intervention, such as targeted light stimulation (see optogenetics above) or magnetic fields. Given how many drugs we’ve developed in the 20th century that tune cognitive function and emotional state, I’m very optimistic about the “compressed 21st” where everyone can get their brain to behave a bit better and have a more fulfilling day-to-day experience.
Human baseline experience can be much better. Taking one step further, many people have experienced extraordinary moments of revelation, creative inspiration, compassion, fulfillment, transcendence, love, beauty, or meditative peace. The character and frequency of these experiences differs greatly from person to person and within the same person at different times, and can also sometimes be triggered by various drugs (though often with side effects). All of this suggests that the “space of what is possible to experience” is very broad and that a larger fraction of people’s lives could consist of these extraordinary moments. It is probably also possible to improve various cognitive functions across the board. This is perhaps the neuroscience version of “biological freedom” or “extended lifespans”.

One topic that often comes up in sci-fi depictions of AI, but that I intentionally haven’t discussed here, is “mind uploading”, the idea of capturing the pattern and dynamics of a human brain and instantiating them in software. This topic could be the subject of an essay all by itself, but suffice it to say that while I think uploading is almost certainly possible in principle, in practice it faces significant technological and societal challenges, even with powerful AI, that likely put it outside the 5-10 year window we are discussing.

In summary, AI-accelerated neuroscience is likely to vastly improve treatments for, or even cure, most mental illness as well as greatly expand “cognitive and mental freedom” and human cognitive and emotional abilities. It will be every bit as radical as the improvements in physical health described in the previous section. Perhaps the world will not be visibly different on the outside, but the world as experienced by humans will be a much better and more humane place, as well as a place that offers greater opportunities for self-actualization. I also suspect that improved mental health will ameliorate a lot of other societal problems, including ones that seem political or economic.

2.Neuroscience ans mind

신경과학은 생물학의 하위 분류이고 정신 건강은 신체 건강만큼 중요하다. 수억 명의 사람들이 중독, 우울증, 조현병, 자폐증, PTSD, 사이코패스, 지적 장애 등의 문제로 낮은 삶의 질을 겪고 있다. 생물학과 마찬가지로 인간 경험의 기본적 질을 향상시키는 것이 AI로 가능할 수 있다. 내가 생물학에서 제시한 기본 프레임워크는 신경과학에도 동일하게 적용된다. 신경과학 분야는 측정 도구나 정밀한 개입과 관련된 몇 가지 발견에 의해 추진된다. optogenetics는 신경과학의 발견이었고 최근에는 CLARITY와 Expansion Microscopy와 같은 방법들이 비슷한 방향으로 발전하고 있다. 또한 많은 일반 세포 생물학 방법들이 신경과학에도 직접적으로 적용되고 있다. 따라서 가속화는 신경과학에도 동일하게 적용된다고 본다.

이 Basic picture에 추가해야 할 점은, 최근 몇 년 동안 우리가 AI에 대해 배운 것들이 신경과학 발전에 도움이 될 가능성이 있다는 것이다. 비록 신경과학이 여전히 인간에 의해 연구된다 하더라도 말이다. "Interpretability(해석가능성)"은 명백한 그 예이다. 생물학적 세포(neuron)는 인공 신경망(neural net)과 표면적으로는 완전히 다른 방식으로 작동한다(생물학적 신경 세포는 스파이크와 그 빈도를 통해 소통하고 이는 인공 신경망에는 없는 시간적 요소를 포함하고 세포 생리학과 신경전달물질과 같은 세부 사항이 그들의 작용에 큰 영향을 미친다). 그럼에도 불구하고 "how do distributed, trained networks of simple units that peform combined linear/non-linear operations work together to perform important computations(단순 단위들이 결합된 선형/비선형 연산을 통해 중요한 계산을 수행하기 위해 어떻게 함께 작동하는가)"라는 기본적인 질문은 동일하다. 나는 individual neuron commuication의 디테일이 컴퓨테이션과 회로에 관한 흥미로운 질문 대부분을 추상화할 것이라고 강하게 의심한다. 예를 들어 AI Interpretability 연구자들이 발견한 계산 메커니즘이 최근 생쥐의 뇌에서 다시 발견되었다.

인공 신경망에 대한 실험은 실제 뇌에 대한 실험보다 훨씬 쉽다. 따라서 Interpretability는 신경과학의 이해를 개선하는 도구가 될 수 있으며 강력한 AI는 인간보다 아마 이 도구를 더 잘 개발하고 적용할 수 있을 것이다.

Interpretability를 넘어 AI에서 배운 지능 시스템의 훈련 방식에 대한 이해는 신경과학에 혁신을 일으킬 가능성이 크다. 내가 신경과학에서 일하던 시절, 많은 사람들은 잘못된 질문에 초점을 맞췄다. 왜냐면 당시는 Scaling Hypothesis(시스템의 성능은 데이터와 모델 규모가 커지질수록 증가한다는 가설)나 Bitter Lesson 이라는 개념이 존재하지 않았기 때문이다. 간단한 Objective와 방대한 데이터만으로도 매우 복잡한 행동을 유도할 수 있다는 아이디어는 emergent computations보다는, objective function과 architectural biases를 이해하는 것이 더 흥미롭다는 관점을 만들었다. 나는 Scaling hypothesis를 항상 "이것이 고차원적 지능이 작동하는 방식과 어떻게 쉽게 진화했는지에 대한 설명이다"라고 생각했지만, 이는 평균적인 신경과학자의 관점은 아니라고 생각합니다. Scaling hypothesis는 아직 AI에서도 완전히 받아들여지지는 않았습니다. 나는 신경과학자들이 이 기본적인 통찰을 인간 뇌의 특수성(biophysical limitations, evolutionary history, topology, details of motor and sensory inputs/outputs)과 결합하여 신경과학의 핵심적인 수수께끼를 풀기 위해 노력해야 한다고 생각한다.

이 부분은 굉장히 인상 깊은 부분이었다. 스케일링 가설을 신경과학 관점에서 보면 세부적인 뉴런 간 상호작용(emergent computations로 빗대어지는) 보다 거시적인 규모에서 데이터(사람의 지능에서는 보고 듣는 모든 것)와 모델 확장(뇌의 크기와 연결)에 의해 지능이 더 고도화 된다는 것을 보여준다고 주장하는 부분이다. 즉 모델의 성능을 올리는 방법을 해석하는 것을 사람의 지능과 신경과학에도 적용하거나 분석해볼 가치가 있다는 것이고 이러한 관점에 대해 생각해본 적이 없어서인지 고개가 절로 끄덕여지는 부분이었다. 사람의 지능에 스케일링 가설을 적용해본다면 많은 경험과 학습을 할수록 뇌의 크기가 크거나 뉴런 간 연결의 복잡성이 복잡할 수록 사람의 지능은 복잡하고 많은 정보를 정교하게 사고하고 처리할 수 있다는 것이다.

나는 AI가 신경과학의 진전을 아래 4가지 경로를 통해 가속화할 것으로 기대하고, 이 4가지 경로가 함께 작용해 정신 질환을 치료하고 기능을 향상시키는데 기여할 수 있기를 바란다.

Traditional molecular biology, chemistry, and genetics
신경전달물질을 조절하여 뇌의 기능과, 각성, 인지, 기분을 변화시키는 여러 약물이 있다. 1장에서 언급한 같은 메커니즘으로 AI는 더 많은 약물을 발명하는 데 도움이 되고 정신질환의 유전적 기초에 대한 연구를 가속화할 수 있을 것.
Fine-grained neural measurement and intervention
개별 뉴런과 신경 회로가 무엇을 하는지 측정하고 이를 변화시키기 위한 개입을 할 수 있는 것.
Advanced computational neuroscience
AI가 주는 통찰과 AI 시스템의 전체적인 구조가 신경과학의 질문들에 유용하게 적용될 수 있을 것.
Behavior interventions
AI가 새로운 방법을 개발하는 것 외에도 기존 방법을 환자들이 잘 따르도록 하는 'AI 코치'와 같은 것.

나는 이 4가지가 함께 작용하면 신체 질환과 마찬가지로 AI가 없더라도 100년 내에 대부분의 정신 질환을 치료하거나 예방할 수 있을 것이라고 생각한다. 그리고 AI가 가속화된 5-10년 안에 그것이 가능할 것이라고 예상한다. 구체적으로 예상하는 것은 아래와 같다.

대부분의 정신 질환은 아마도 치료될 수 있을 것이다.
매우 "structural" 조건들은 어려울 수 있어도 불가능하지 않다.
예를 들어 사이코패스는 뇌의 어떤 영역이 작거나 더 발달한 경우이다. 그들의 뇌가 다르다면 처음부터 그랬을 것이고 뇌를 재구성하는 것은 어려워보이지만 성인의 뇌를 초기의 혹은 더 유연한 상태로 되돌려 재구성할 수 있는 방법이 있을지도 모른다.
정신 질환의 효과적인 유전적 예방이 가능할 것 같다.
우리가 임상 질환으로 간주하지 않는 일상적인 문제들도 해결될 것이다.
20세기 동안 우리는 인지 기능과 감정 상태를 조절하는 수많은 약물을 개발해왔기 때문에, 나는 가속화된 압축된 21시기에서 모든 사람이 자신의 뇌를 조금 더 잘 동작시켜 더 만족스러운 일상을 누리게 될 것이라고 낙관한다.
인간의 기본적인 경험을 훨씬 더 향상시킬 수 있다.
많은 사람들은 계시적인 순간, 창의적 영감, 연민, 성취감, 초월적 경험, 사랑, 아름다움, 평화 등을 경험한 적이 있다. 이의 특성이나 빈도는 사람마다 같은 사람이라도 시간마다 다르다. 더많은 사람들이 그들의 삶에서 이러한 비범한 순간을 경험할 수 있을 것이고 다양한 인지 기능을 향상시킬 수도 있을 것이다. 이는 아마도 생물학적 자유나 확장된 수명의 신경과학적 버전이라고 할 수 있을 것이다.

"Mind uploading"은 인간 뇌의 패턴과 다이나믹스를 포착해 그것을 SW로 구현하는 것이다. 나는 이 업로딩이 원칙적으로는 거의 확실히 가능하다고 생각하지만 실현에는 강력한 AI가 있다 하더라도 상당한 기술적, 사회적 도전 과제가 존재한다고 본다. 이는 5-10년 안에 이루어지기 어려운 일일 가능성이 높다.

요악하자면 AI 가속화된 신경과학의 대부분은 정신 질환을 개선하거나 치유할 뿐 아니라 인지적 및 정신적 자유와 인간의 인지, 감정적 능력을 크게 확장할 것이다. 외적인 세상이 달라지지 않더라도 인간이 경험하는 세상은 훨씬 나아지고 인간미 넘치는 곳이될 것이다. 향상된 정신 건강이 정치적, 경제적 문제를 포함한 다른 사회적 문제를 완화시킬 것이라고 나는 확신한다.

원문

3. Economic development and poverty

The previous two sections are about developing new technologies that cure disease and improve the quality of human life. However an obvious question, from a humanitarian perspective, is: “will everyone have access to these technologies?”

It is one thing to develop a cure for a disease, it is another thing to eradicate the disease from the world. More broadly, many existing health interventions have not yet been applied everywhere in the world, and for that matter the same is true of (non-health) technological improvements in general. Another way to say this is that living standards in many parts of the world are still desperately poor: GDP per capita is ~$2,000 in Sub-Saharan Africa as compared to ~$75,000 in the United States. If AI further increases economic growth and quality of life in the developed world, while doing little to help the developing world, we should view that as a terrible moral failure and a blemish on the genuine humanitarian victories in the previous two sections. Ideally, powerful AI should help the developing world catch up to the developed world, even as it revolutionizes the latter.

I am not as confident that AI can address inequality and economic growth as I am that it can invent fundamental technologies, because technology has such obvious high returns to intelligence (including the ability to route around complexities and lack of data) whereas the economy involves a lot of constraints from humans, as well as a large dose of intrinsic complexity. I am somewhat skeptical that an AI could solve the famous “socialist calculation problem”23 and I don’t think governments will (or should) turn over their economic policy to such an entity, even if it could do so. There are also problems like how to convince people to take treatments that are effective but that they may be suspicious of.

The challenges facing the developing world are made even more complicated by pervasive corruption in both private and public sectors. Corruption creates a vicious cycle: it exacerbates poverty, and poverty in turn breeds more corruption. AI-driven plans for economic development need to reckon with corruption, weak institutions, and other very human challenges.

Nevertheless, I do see significant reasons for optimism. Diseases have been eradicated and many countries have gone from poor to rich, and it is clear that the decisions involved in these tasks exhibit high returns to intelligence (despite human constraints and complexity). Therefore, AI can likely do them better than they are currently being done. There may also be targeted interventions that get around the human constraints and that AI could focus on. More importantly though, we have to try. Both AI companies and developed world policymakers will need to do their part to ensure that the developing world is not left out; the moral imperative is too great. So in this section, I’ll continue to make the optimistic case, but keep in mind everywhere that success is not guaranteed and depends on our collective efforts.

Below I make some guesses about how I think things may go in the developing world over the 5-10 years after powerful AI is developed:

Distribution of health interventions. The area where I am perhaps most optimistic is distributing health interventions throughout the world. Diseases have actually been eradicated by top-down campaigns: smallpox was fully eliminated in the 1970’s, and polio and guinea worm are nearly eradicated with less than 100 cases per year. Mathematically sophisticated epidemiological modeling plays an active role in disease eradication campaigns, and it seems very likely that there is room for smarter-than-human AI systems to do a better job of it than humans are. The logistics of distribution can probably also be greatly optimized. One thing I learned as an early donor to GiveWell is that some health charities are way more effective than others; the hope is that AI-accelerated efforts would be more effective still. Additionally, some biological advances actually make the logistics of distribution much easier: for example, malaria has been difficult to eradicate because it requires treatment each time the disease is contracted; a vaccine that only needs to be administered once makes the logistics much simpler (and such vaccines for malaria are in fact currently being developed). Even simpler distribution mechanisms are possible: some diseases could in principle be eradicated by targeting their animal carriers, for example releasing mosquitoes infected with a bacterium that blocks their ability to carry a disease (who then infect all the other mosquitos) or simply using gene drives to wipe out the mosquitos. This requires one or a few centralized actions, rather than a coordinated campaign that must individually treat millions. Overall, I think 5-10 years is a reasonable timeline for a good fraction (maybe 50%) of AI-driven health benefits to propagate to even the poorest countries in the world. A good goal might be for the developing world 5-10 years after powerful AI to at least be substantially healthier than the developed world is today, even if it continues to lag behind the developed world. Accomplishing this will of course require a huge effort in global health, philanthropy, political advocacy, and many other efforts, which both AI developers and policymakers should help with.
Economic growth. Can the developing world quickly catch up to the developed world, not just in health, but across the board economically? There is some precedent for this: in the final decades of the 20th century, several East Asian economies achieved sustained ~10% annual real GDP growth rates, allowing them to catch up with the developed world. Human economic planners made the decisions that led to this success, not by directly controlling entire economies but by pulling a few key levers (such as an industrial policy of export-led growth, and resisting the temptation to rely on natural resource wealth); it’s plausible that “AI finance ministers and central bankers” could replicate or exceed this 10% accomplishment. An important question is how to get developing world governments to adopt them while respecting the principle of self-determination—some may be enthusiastic about it, but others are likely to be skeptical. On the optimistic side, many of the health interventions in the previous bullet point are likely to organically increase economic growth: eradicating AIDS/malaria/parasitic worms would have a transformative effect on productivity, not to mention the economic benefits that some of the neuroscience interventions (such as improved mood and focus) would have in developed and developing world alike. Finally, non-health AI-accelerated technology (such as energy technology, transport drones, improved building materials, better logistics and distribution, and so on) may simply permeate the world naturally; for example, even cell phones quickly permeated sub-Saharan Africa via market mechanisms, without needing philanthropic efforts. On the more negative side, while AI and automation have many potential benefits, they also pose challenges for economic development, particularly for countries that haven't yet industrialized. Finding ways to ensure these countries can still develop and improve their economies in an age of increasing automation is an important challenge for economists and policymakers to address. Overall, a dream scenario—perhaps a goal to aim for—would be 20% annual GDP growth rate in the developing world, with 10% each coming from AI-enabled economic decisions and the natural spread of AI-accelerated technologies, including but not limited to health. If achieved, this would bring sub-Saharan Africa to the current per-capita GDP of China in 5-10 years, while raising much of the rest of the developing world to levels higher than the current US GDP. Again, this is a dream scenario, not what happens by default: it’s something all of us must work together to make more likely.
Food security 24. Advances in crop technology like better fertilizers and pesticides, more automation, and more efficient land use drastically increased crop yields across the 20th Century, saving millions of people from hunger. Genetic engineering is currently improving many crops even further. Finding even more ways to do this—as well as to make agricultural supply chains even more efficient—could give us an AI-driven second Green Revolution, helping close the gap between the developing and developed world.
Mitigating climate change. Climate change will be felt much more strongly in the developing world, hampering its development. We can expect that AI will lead to improvements in technologies that slow or prevent climate change, from atmospheric carbon-removal and clean energy technology to lab-grown meat that reduces our reliance on carbon-intensive factory farming. Of course, as discussed above, technology isn’t the only thing restricting progress on climate change—as with all of the other issues discussed in this essay, human societal factors are important. But there’s good reason to think that AI-enhanced research will give us the means to make mitigating climate change far less costly and disruptive, rendering many of the objections moot and freeing up developing countries to make more economic progress.
Inequality within countries. I’ve mostly talked about inequality as a global phenomenon (which I do think is its most important manifestation), but of course inequality also exists within countries. With advanced health interventions and especially radical increases in lifespan or cognitive enhancement drugs, there will certainly be valid worries that these technologies are “only for the rich”. I am more optimistic about within-country inequality especially in the developed world, for two reasons. First, markets function better in the developed world, and markets are typically good at bringing down the cost of high-value technologies over time25. Second, developed world political institutions are more responsive to their citizens and have greater state capacity to execute universal access programs—and I expect citizens to demand access to technologies that so radically improve quality of life. Of course it’s not predetermined that such demands succeed—and here is another place where we collectively have to do all we can to ensure a fair society. There is a separate problem in inequality of wealth (as opposed to inequality of access to life-saving and life-enhancing technologies), which seems harder and which I discuss in Section 5.
The opt-out problem. One concern in both developed and developing world alike is people opting out of AI-enabled benefits (similar to the anti-vaccine movement, or Luddite movements more generally). There could end up being bad feedback cycles where, for example, the people who are least able to make good decisions opt out of the very technologies that improve their decision-making abilities, leading to an ever-increasing gap and even creating a dystopian underclass (some researchers have argued that this will undermine democracy, a topic I discuss further in the next section). This would, once again, place a moral blemish on AI’s positive advances. This is a difficult problem to solve as I don’t think it is ethically okay to coerce people, but we can at least try to increase people’s scientific understanding—and perhaps AI itself can help us with this. One hopeful sign is that historically anti-technology movements have been more bark than bite: railing against modern technology is popular, but most people adopt it in the end, at least when it’s a matter of individual choice. Individuals tend to adopt most health and consumer technologies, while technologies that are truly hampered, like nuclear power, tend to be collective political decisions.

Overall, I am optimistic about quickly bringing AI’s biological advances to people in the developing world. I am hopeful, though not confident, that AI can also enable unprecedented economic growth rates and allow the developing world to at least surpass where the developed world is now. I am concerned about the “opt out” problem in both the developed and developing world, but suspect that it will peter out over time and that AI can help accelerate this process. It won’t be a perfect world, and those who are behind won’t fully catch up, at least not in the first few years. But with strong efforts on our part, we may be able to get things moving in the right direction—and fast. If we do, we can make at least a downpayment on the promises of dignity and equality that we owe to every human being on earth.

3. Economic development and poverty

앞에서 질병을 치료하고 삶의 질을 향상시키는 것들에 대해 이야기 했다. 하지만 "이 기술들이 모든 사람들에게 제공될 수 있을까?"

AI가 선진국의 경제 성장과 삶의 질을 향상시키고, 개발도상국에는 별다른 도움이 되지 않는다면, 이는 인도주의적(humanitarian) 승리로서 큰 도덕적 실패가 될 것이다. 경제에는 인간적 제약과 내재적 복잡성이 존재하므로 AI가 경제 성장과 불평등을 해결할 수 있을지는 그렇게 확신하지 않는다. 또한 AI가 Socialist Calculation Problem을 해결할 수 있을지도 회의적이다. AI가 그런 역할을 한다하더라도 정부가 경제 정책을 AI에게 맡기지는 않을 것이라고 생각하며 사람들이 의심하고 그것을 받아들이지 않으려는 문제도 존재한다.

개발도상국이 직면한 문제는 민간, 공공 부문의 만연한 부패로 더 복잡하고, 부패는 악순환을 만들며 빈곤은 악화시키고 빈곤은 더 많은 부패를 낳는다. 그럼에도 불구하고 나는 상당한 이유로 낙관적이다. 질병이 근절되고 많은 국가들이 부유해질 때 이 과정에서 필요한 결정들은 분명히 high return to intelligence이다. 따라서 먀는 현재 이루어지고 있는 것보다 더 나은 방식으로 이를 수행하고 인간의 제약을 피할 수 있는 방법들을 AI가 집중할 수 있을 것이다. 더 중요한 것은 이를 시도해야 한다는 것이고 도덕적 의무는 너무나도 크다.

다음은 powerful AI가 개발된 후 5-10년 동안 개발도상국에서 일이 어떻게 진행될지에 대한 내 추측이다.

Distribution of health interventions
질병은 상향식 캠페인에 의해 근절된 적이 있다(천연두, 소아마비, guinea warm 등). 수학적으로 정교한 역학 모델링은 질병 근절 캠페인에서 중요한 역할을 했고 AI는 이를 인간보다 더 잘 처리할 가능성이 매우 높다. 예를 들어 말라리아는 질병에 걸릴 때마다 치료가 필요해 근절이 어려웠지만 한 번만 투여하면 되는 백신은 logistics를 훨씬 단순화시킨다. 더욱 간단한 것도 가능하다. 박테리아에 감연된 모기를 방출하거나 유전자 구동을 통해 모기를 제거할 수 있다. 이는 수백명을 개별적으로 치료하는 것 대신 몇 가지 중앙 집중적 조치로 이루어진다.
Economic growth
20세기 후반 몇 동아시아 경제들은 선진국을 따라잡았다. 전체 경제를 직접 통제하는 것이 아니라 몇 가지 주요한 지렛대를 통해 성공을 이끌어냈다(수출 주도 성장, 천연 자원에 의존하지 않는 전략 등). AI 재무장관과 중앙 은행장이 이 성과를 복제하거나 초과할 수 있다. 중요한 질문은 개발도상국이 이를 받아들이도록 하는 것과 자주 자율성을 존중하는 방식으로 진행할 필요성이 있다는 것이다. 낙관적 측면에서 건강 개입은 경제 성장을 자연스럽게 증가시킬 가능성이 크다.
Food security
20세기 동안 더 나은 비료와 농약, 자동화 및 효율적 토지 이용 기술의 발전은 농작물 수확량을 급격히 증가시켰다. 더 많은 방법을 Ai 기반으로 찾아 두 번째 녹색 혁명이 가능할 수 있고 선진국과 도상국의 차이를 줄이는 데 도움이 될 것이다.
Mitigating climate change
Ai가 기후 변화완화를 더 적은 비용과 혼란으로 가능하게 만들 것이라는 좋은 이유가 있다. 이는 많은 반대 의견을 무효화시킬 수 있고 개발도상국들이 더 많은 경제 발전을 이룰 수 있게 할 것이다.
Inequality within countries
언급된 기술들이 "부자들만의 기술"이 될 수 있다는 우려가 있을 것입니다. 하지만 개발된 국가에서 시장은 더 잘 작동하고 시간이 지날 수록 고급 기술의 비용을 낮추는 데 유리하며, 개발된 국가의 정치 제도는 시민들의 요구에 더 잘 반응하고 보편적 프로그램을 실행할 수 있는 더 큰 국가 능력을 갖추고 있으므로 국가 내 불평등에 대해서는 낙관적이다.
The opt-out problem
반백신 운동이나, 기술 거부 운동처럼 Ai 기반 혜택을 거부하는 문제가 있다. 결정 능력이 부족한 사람들은 자신의 이익이 될 기술을 거부하고 이는 더 큰 격차를 만들고 디스토피아적 하층 계급을 들어낼 수 있다. 일부 연구자들은 이것이 민주주의를 약화시킬 것이라고 주장하기도 한다. 사람들이 강제로 기술을 사용하는 것보다 과학적 이해를 증진시키는 노력이 필요하고, 이 자체를 Ai가 도울 수 있을지 모른다. 하지만 대부분의 사람들은 결국 기술을 택하게 되므로 긍정적이다. 반면 원자력 발전처럼 진정으로 저지되는 기술들은 집단적 정치적 결정에 의존하는 경향이 있다.

첫 몇 년동안은 선진국과 개발도상국 모두에서 배제 문제가 있더라도 우리 모두의 막강한 노력으로 올바른 방향으로 빠르게 나아갈 수 있을 것이다. 그렇게 된다면 우리는 모든 인류에게 마땅히 주어야 할 존엄과 평등의 약속에 대한 최소한의 downpayment를 만들 수 있을 것이다.

원문

4. Peace and governance

Suppose that everything in the first three sections goes well: disease, poverty, and inequality are significantly reduced and the baseline of human experience is raised substantially. It does not follow that all major causes of human suffering are solved. Humans are still a threat to each other. Although there is a trend of technological improvement and economic development leading to democracy and peace, it is a very loose trend, with frequent (and recent) backsliding. At the dawn of the 20th Century, people thought they had put war behind them; then came the two world wars. Thirty years ago Francis Fukuyama wrote about “the End of History” and a final triumph of liberal democracy; that hasn’t happened yet. Twenty years ago US policymakers believed that free trade with China would cause it to liberalize as it became richer; that very much didn’t happen, and we now seem headed for a second cold war with a resurgent authoritarian bloc. And plausible theories suggest that internet technology may actually advantage authoritarianism, not democracy as initially believed (e.g. in the “Arab Spring” period). It seems important to try to understand how powerful AI will intersect with these issues of peace, democracy, and freedom.

Unfortunately, I see no strong reason to believe AI will preferentially or structurally advance democracy and peace, in the same way that I think it will structurally advance human health and alleviate poverty. Human conflict is adversarial and AI can in principle help both the “good guys” and the “bad guys”. If anything, some structural factors seem worrying: AI seems likely to enable much better propaganda and surveillance, both major tools in the autocrat’s toolkit. It’s therefore up to us as individual actors to tilt things in the right direction: if we want AI to favor democracy and individual rights, we are going to have to fight for that outcome. I feel even more strongly about this than I do about international inequality: the triumph of liberal democracy and political stability is not guaranteed, perhaps not even likely, and will require great sacrifice and commitment on all of our parts, as it often has in the past.

I think of the issue as having two parts: international conflict, and the internal structure of nations. On the international side, it seems very important that democracies have the upper hand on the world stage when powerful AI is created. AI-powered authoritarianism seems too terrible to contemplate, so democracies need to be able to set the terms by which powerful AI is brought into the world, both to avoid being overpowered by authoritarians and to prevent human rights abuses within authoritarian countries.

My current guess at the best way to do this is via an “entente strategy”26, in which a coalition of democracies seeks to gain a clear advantage (even just a temporary one) on powerful AI by securing its supply chain, scaling quickly, and blocking or delaying adversaries’ access to key resources like chips and semiconductor equipment. This coalition would on one hand use AI to achieve robust military superiority (the stick) while at the same time offering to distribute the benefits of powerful AI (the carrot) to a wider and wider group of countries in exchange for supporting the coalition’s strategy to promote democracy (this would be a bit analogous to “Atoms for Peace”). The coalition would aim to gain the support of more and more of the world, isolating our worst adversaries and eventually putting them in a position where they are better off taking the same bargain as the rest of the world: give up competing with democracies in order to receive all the benefits and not fight a superior foe.

If we can do all this, we will have a world in which democracies lead on the world stage and have the economic and military strength to avoid being undermined, conquered, or sabotaged by autocracies, and may be able to parlay their AI superiority into a durable advantage. This could optimistically lead to an “eternal 1991”—a world where democracies have the upper hand and Fukuyama’s dreams are realized. Again, this will be very difficult to achieve, and will in particular require close cooperation between private AI companies and democratic governments, as well as extraordinarily wise decisions about the balance between carrot and stick.

Even if all that goes well, it leaves the question of the fight between democracy and autocracy within each country. It is obviously hard to predict what will happen here, but I do have some optimism that given a global environment in which democracies control the most powerful AI, then AI may actually structurally favor democracy everywhere. In particular, in this environment democratic governments can use their superior AI to win the information war: they can counter influence and propaganda operations by autocracies and may even be able to create a globally free information environment by providing channels of information and AI services in a way that autocracies lack the technical ability to block or monitor. It probably isn’t necessary to deliver propaganda, only to counter malicious attacks and unblock the free flow of information. Although not immediate, a level playing field like this stands a good chance of gradually tilting global governance towards democracy, for several reasons.

First, the increases in quality of life in Sections 1-3 should, all things equal, promote democracy: historically they have, to at least some extent. In particular I expect improvements in mental health, well-being, and education to increase democracy, as all three are negatively correlated with support for authoritarian leaders. In general people want more self-expression when their other needs are met, and democracy is among other things a form of self-expression. Conversely, authoritarianism thrives on fear and resentment.

Second, there is a good chance free information really does undermine authoritarianism, as long as the authoritarians can’t censor it. And uncensored AI can also bring individuals powerful tools for undermining repressive governments. Repressive governments survive by denying people a certain kind of common knowledge, keeping them from realizing that “the emperor has no clothes”. For example Srđa Popović, who helped to topple the Milošević government in Serbia, has written extensively about techniques for psychologically robbing authoritarians of their power, for breaking the spell and rallying support against a dictator. A superhumanly effective AI version of Popović (whose skills seem like they have high returns to intelligence) in everyone’s pocket, one that dictators are powerless to block or censor, could create a wind at the backs of dissidents and reformers across the world. To say it again, this will be a long and protracted fight, one where victory is not assured, but if we design and build AI in the right way, it may at least be a fight where the advocates of freedom everywhere have an advantage.

As with neuroscience and biology, we can also ask how things could be “better than normal”—not just how to avoid autocracy, but how to make democracies better than they are today. Even within democracies, injustices happen all the time. Rule-of-law societies make a promise to their citizens that everyone will be equal under the law and everyone is entitled to basic human rights, but obviously people do not always receive those rights in practice. That this promise is even partially fulfilled makes it something to be proud of, but can AI help us do better?

For example, could AI improve our legal and judicial system by making decisions and processes more impartial? Today people mostly worry in legal or judicial contexts that AI systems will be a cause of discrimination, and these worries are important and need to be defended against. At the same time, the vitality of democracy depends on harnessing new technologies to improve democratic institutions, not just responding to risks. A truly mature and successful implementation of AI has the potential to reduce bias and be fairer for everyone.

For centuries, legal systems have faced the dilemma that the law aims to be impartial, but is inherently subjective and thus must be interpreted by biased humans. Trying to make the law fully mechanical hasn’t worked because the real world is messy and can’t always be captured in mathematical formulas. Instead legal systems rely on notoriously imprecise criteria like “cruel and unusual punishment” or “utterly without redeeming social importance”, which humans then interpret—and often do so in a manner that displays bias, favoritism, or arbitrariness. “Smart contracts” in cryptocurrencies haven’t revolutionized law because ordinary code isn’t smart enough to adjudicate all that much of interest. But AI might be smart enough for this: it is the first technology capable of making broad, fuzzy judgements in a repeatable and mechanical way.

I am not suggesting that we literally replace judges with AI systems, but the combination of impartiality with the ability to understand and process messy, real world situations feels like it should have some serious positive applications to law and justice. At the very least, such systems could work alongside humans as an aid to decision-making. Transparency would be important in any such system, and a mature science of AI could conceivably provide it: the training process for such systems could be extensively studied, and advanced interpretability techniques could be used to see inside the final model and assess it for hidden biases, in a way that is simply not possible with humans. Such AI tools could also be used to monitor for violations of fundamental rights in a judicial or police context, making constitutions more self-enforcing.

In a similar vein, AI could be used to both aggregate opinions and drive consensus among citizens, resolving conflict, finding common ground, and seeking compromise. Some early ideas in this direction have been undertaken by the computational democracy project, including collaborations with Anthropic. A more informed and thoughtful citizenry would obviously strengthen democratic institutions.

There is also a clear opportunity for AI to be used to help provision government services—such as health benefits or social services—that are in principle available to everyone but in practice often severely lacking, and worse in some places than others. This includes health services, the DMV, taxes, social security, building code enforcement, and so on. Having a very thoughtful and informed AI whose job is to give you everything you’re legally entitled to by the government in a way you can understand—and who also helps you comply with often confusing government rules—would be a big deal. Increasing state capacity both helps to deliver on the promise of equality under the law, and strengthens respect for democratic governance. Poorly implemented services are currently a major driver of cynicism about government27.

All of these are somewhat vague ideas, and as I said at the beginning of this section, I am not nearly as confident in their feasibility as I am in the advances in biology, neuroscience, and poverty alleviation. They may be unrealistically utopian. But the important thing is to have an ambitious vision, to be willing to dream big and try things out. The vision of AI as a guarantor of liberty, individual rights, and equality under the law is too powerful a vision not to fight for. A 21st century, AI-enabled polity could be both a stronger protector of individual freedom, and a beacon of hope that helps make liberal democracy the form of government that the whole world wants to

adopt.

4. Peace and governance

이 모든 것이 잘 진행된다 하더라도 인간의 모든 주요한 고통의 원인이 해결된다는 뜻은 아니다. 인간은 여전히 서로에게 위협이 될 수 있다. 기술 발전과 경제 성장이 민주주의와 평화가 촉진되는 경항은 있지만 이는 매우 느슨한 경향에 불과하며 자주(그리고 최근에는) 되돌아가고 있다. 사람들은 전쟁을 과거의 일로 생각했지만 20세기 초 두차례의 세계 대전이 있엇다. 미국 정책 입안자들은 중국과의 자유무역이 중국을 더 부유하게 만들고 자유화를 이끌 것이라고 믿었지만 전혀 그렇지 않았고, 우리는 이제 부활한 권위주의 블록과 함께 두 번째 냉전으로 항하는 것 같다. 그리고 인터넷 기술이 실제로 민주주의보다는 권위주의에 유리할 수 있다고 제시한다(e.x. 아랍의 봄).

AI는 선전과 감시를 훨씬 더 잘하게 도와줄 가능성이 높고 이는 독재자들의 주요도구이다. 자유 민주주의의 승리와 정치적 안정은 보장되지 않으며 그럴 가능성도 낮고 우리의 큰 희생과 헌신이 필요할 것이다. 과거에도 그랬듯.

내가 생각하는 가장 좋은 방법은 연합 전략이다. 민주주의 국가들 연합이 강력한 AI에 대해 명확한 우위를 차지하기 위해 공급망을 확보하고 빠르게 확장하여 반대 세력이 칩이나 반도체 장비와 같은 주요 자원에 접근하는 것을 차단하거나 지연시키는 방식이다. 이 연합은 AI로 강력한 군사적 우위를 확보하면서(채찍), 동시에 강력한 AI의 혜택을 점점 많은 국가들에게 분배할 수 있도록 하는 조건을 제시(당근)할 것이다.

평등한 경쟁 환경은 몇 가지 이유로 인해 점진적으로 글로벌 거버넌스를 민주주의 쪽으로 기울게 만들 가능성이 크다.

첫째, 삶의 질 향상은 민주주의를 촉진할 것이다. 역사적으로 볼 때 어느정도는 그랬다. 특히 정신 건강, 웰빙, 교육의 향상은 민주주의를 증가시킬 것으로 예상된다. 일반적으로 사람들은 다른 필요가 충족되었을 때 더 많은 자기 표현을 원하고, 민주주의는 다른 무엇보다도 자기 표현의 형태이다. 반대로, 독재주의는 두려움과 분노에서 번성한다.

둘째, 자유로운 정보가 독재자들이 그것을 검열할 수 없다면 실제로 독재주의를 약화시킬 가능성도 크다. AI가 독재자들이 차단하거나 검열할 수 없는 방식으로 모든 사람의 주머니에 들어가게 된다면, 전 세계의 반체제 운동가들과 개혁가들에게 큰 힘을 실어줄 수 있을 것이다.

다시 말하지만 이는 긴 싸움이 될 것이고 승리가 보장되지 않지만 AI를 올르게 설계하고 구축한다면 적어도 자유를 옹호하는 사람들이 우위를 점할 수 있는 싸움이 될 가능성은 있다.

신경과학과 생물학처럼 우리는 "better than normal"이 될 수 있는 법을 물어볼 수 있다. 즉, 독재를 피하는 것 뿐 아니라 민주주의를 오늘보다 더 나은 방향으로 만드는 방법에 대해서도 말이다. 법치주의 사회는 시민들에게 모두가 법 앞에서 평등하고 기본적인 인권을 누릴 자격이 있다는 약속을 하지만 실제로는 이러한 권리를 항상 보장받는 것은 아니다. AI가 우리를 더 나은 방향으로 이끌 수 있을까? 예를 들어 AI가 우리의 법률 및 사법 시스템을 개선하여 더 공정하고 편향 없는 결정을 내릴 수 있을까?

AI의 진정한 구현은 편향을 줄이고, 모든 사람에게 더 공정한 결과를 가져올 잠재력이 있다. 법률 시스템은 법이 공정해야 한다는 딜레마에 직면해왔다. 법은 본질적으로 주관적이기 때문에 반드시 편향된 인간에 의해 해석되어야 했다. 실제 세계는 복잡하고 수학 공식으로 모두 포착할 수 없기 때문에 법을 완전히 기계적으로 만들지 못했다. 법률 시스템은 이를 인간이 해석해야 했고 이는 종종 편향, 편애, 임의성이 드러나기도 한다. 암호화폐에서의 스마트 계약은 법을 혁신하지 않았다. 일반적으로 코드가 그런 복잡한 문제를 해결할 만큼 똑똑하지 않다. 그러나 AI는 이를 해결할 수 있을 만큼 똑똑할 수 있다. AI는 처음으로 넓고 애매한 판단을 반복 가능하고 기계적으로 내릴 수 있는 기술이기 때문이다.

판사를 AI로 대체해야 한다고 주장하는 것은 아니지만 인간의 의사결정을 돕는 보조 도구로서 작용할 수 있을 것이다. 이러한 시스템의 학습 과정은 철저히 연구될 수 있고, 고급 해석가능성 기법을 사용해 최종 모델을 들여다보고 숨겨진 편향이 있는지 평가할 수 있는 방법을 제시할 수 있다. 인간에게는 불가능한일이다. AI는 사법적, 경찰적 맥락에서 기본적인 권리 침해를 감시하는 데 사용될 수도 있어 헌법을 더욱 자율적으로 집행할 수 있게 된다.

사려깊고 정보에 기반한 AI가 당신이 법적으로 정부로부터 받을 수 있는 모든 것을 이해할 수 있는 방식으로 제공하고 자주 혼란스러운 규칙을 준수하도록 돕는다면 이는 큰 변화가 될 것이다. 국가의 능력을 강화하는 것은 법 앞의 평등이라는 약속을 이행하는 데 도움을 주며 민주적 거버넌스에 대한 존중을 강화할 수 있다. 이는 생물학 신경과학에 비해 확신하지 않고 모호하며 비현실적 유토피아일 수 있다. 그러나 중요한 것은 야심찬 비전을 가지고 큰 꿈을 꾸고 다양한 시도를 해보는 것이다.

원문

5. Work and meaning

Even if everything in the preceding four sections goes well—not only do we alleviate disease, poverty, and inequality, but liberal democracy becomes the dominant form of government, and existing liberal democracies become better versions of themselves—at least one important question still remains. “It’s great we live in such a technologically advanced world as well as a fair and decent one”, someone might object, “but with AI’s doing everything, how will humans have meaning? For that matter, how will they survive economically?”.

I think this question is more difficult than the others. I don’t mean that I am necessarily more pessimistic about it than I am about the other questions (although I do see challenges). I mean that it is fuzzier and harder to predict in advance, because it relates to macroscopic questions about how society is organized that tend to resolve themselves only over time and in a decentralized manner. For example, historical hunter-gatherer societies might have imagined that life is meaningless without hunting and various kinds of hunting-related religious rituals, and would have imagined that our well-fed technological society is devoid of purpose. They might also have not understood how our economy can provide for everyone, or what function people can usefully service in a mechanized society.

Nevertheless, it’s worth saying at least a few words, while keeping in mind that the brevity of this section is not at all to be taken as a sign that I don’t take these issues seriously—on the contrary, it is a sign of a lack of clear answers.

On the question of meaning, I think it is very likely a mistake to believe that tasks you undertake are meaningless simply because an AI could do them better. Most people are not the best in the world at anything, and it doesn’t seem to bother them particularly much. Of course today they can still contribute through comparative advantage, and may derive meaning from the economic value they produce, but people also greatly enjoy activities that produce no economic value. I spend plenty of time playing video games, swimming, walking around outside, and talking to friends, all of which generates zero economic value. I might spend a day trying to get better at a video game, or faster at biking up a mountain, and it doesn’t really matter to me that someone somewhere is much better at those things. In any case I think meaning comes mostly from human relationships and connection, not from economic labor. People do want a sense of accomplishment, even a sense of competition, and in a post-AI world it will be perfectly possible to spend years attempting some very difficult task with a complex strategy, similar to what people do today when they embark on research projects, try to become Hollywood actors, or found companies28. The facts that (a) an AI somewhere could in principle do this task better, and (b) this task is no longer an economically rewarded element of a global economy, don’t seem to me to matter very much.

The economic piece actually seems more difficult to me than the meaning piece. By “economic” in this section I mean the possible problem that most or all humans may not be able to contribute meaningfully to a sufficiently advanced AI-driven economy. This is a more macro problem than the separate problem of inequality, especially inequality in access to the new technologies, which I discussed in Section 3.

First of all, in the short term I agree with arguments that comparative advantage will continue to keep humans relevant and in fact increase their productivity, and may even in some ways level the playing field between humans. As long as AI is only better at 90% of a given job, the other 10% will cause humans to become highly leveraged, increasing compensation and in fact creating a bunch of new human jobs complementing and amplifying what AI is good at, such that the “10%” expands to continue to employ almost everyone. In fact, even if AI can do 100% of things better than humans, but it remains inefficient or expensive at some tasks, or if the resource inputs to humans and AI’s are meaningfully different, then the logic of comparative advantage continues to apply. One area humans are likely to maintain a relative (or even absolute) advantage for a significant time is the physical world. Thus, I think that the human economy may continue to make sense even a little past the point where we reach “a country of geniuses in a datacenter”.

However, I do think in the long run AI will become so broadly effective and so cheap that this will no longer apply. At that point our current economic setup will no longer make sense, and there will be a need for a broader societal conversation about how the economy should be organized.

While that might sound crazy, the fact is that civilization has successfully navigated major economic shifts in the past: from hunter-gathering to farming, farming to feudalism, and feudalism to industrialism. I suspect that some new and stranger thing will be needed, and that it’s something no one today has done a good job of envisioning. It could be as simple as a large universal basic income for everyone, although I suspect that will only be a small part of a solution. It could be a capitalist economy of AI systems, which then give out resources (huge amounts of them, since the overall economic pie will be gigantic) to humans based on some secondary economy of what the AI systems think makes sense to reward in humans (based on some judgment ultimately derived from human values). Perhaps the economy runs on Whuffie points. Or perhaps humans will continue to be economically valuable after all, in some way not anticipated by the usual economic models. All of these solutions have tons of possible problems, and it’s not possible to know whether they will make sense without lots of iteration and experimentation. And as with some of the other challenges, we will likely have to fight to get a good outcome here: exploitative or dystopian directions are clearly also possible and have to be prevented. Much more could be written about these questions and I hope to do so at some later time.

5. Working and meaning

모든 것이 잘 된다면 여전히 중요한 질문이 하나 남는다. "우리가 기술적으로 빈보하고 온정적인 세상에서 살고 있다면, AI가 모든것을 하는 것은 인간에게 어떤 의미가 있을까? 더 나아가 인간들은 어떻게 경제적으로 생존할 수 있을까?"

이 질문은 다른 문제보다 더 어렵다(비관적이라는 의미는 아니다). 역사적으로 수렵사회에서 사냥과 관련된 다양한 종교 의식 없이 삶이 무의미하다고 생각했을지 모른다. 그들은 지금 기술 사회가 목적이 결여된 사회라고 상상했을 수 있다. 이 문제는 사회가 어떻게 조직될 것인가라는 거시적 문제와 관련되어 있어 시간과 분산된 방식으로 해결되야 하므로 예측하기 어렵고 모호하다. 이 섹션이 짧은 것은 내가 이 문제를 진지하게 생각해서가 아니고 오히려 명확한 답이 없다는 사실을 반영하는 것이다.

나는 AI가 더 잘하기 때문에 내가 하는 일이 무의미하다라는 생각은 잘못되었다고 생각한다. 대부분의 사람들은 어떤 일에 있어 최고가 아니고 그 사실이 불편하지 않다. 또한 경제적 가치가 없는 활동에서도 큰 즐거움을 얻을 수 있다.

경제적 문제는 의미의 문제보다 더 어렵다. 나는 결국 Ai가 매우 효율적이고 저렴해져서 더이상 AI보다 인간 작업이 더 효율적인 상황이 존재하지 않게될 것이라고 생각한다. 미친 생각처럼 들릴 수 있지만 사실 문명은 과거에 여러 번 주요한 경제적 전환을 성공적으로 겪었다. 수집 채렵에서 농업으로, 농업에서 봉건주의로, 봉건주의에서 산업화로. 나는 그때와 마찬가지로 지금과는 전혀 다른 더 기이한 해결책이 필요할 것이라고 생각한다. 이는 오늘날 누구도 제대로 상상하지 못한 것일 수 있다. 모든 사람에게 보편적 기본소득을 제공하는 것 같은 것은 해결책의 일부에 불과할 것이다. Ai 시스템이 자원을 인간에게 제공하는 자본주의 경제일 수도 있다. 이 경우 AI는 인간에게 보상할만한 가치가 있다고 판단하는 것을 기반으로 자원을 분배한다면, 어쩌면 경제는 Whuffie point로 운영될지도 모르겠다. 혹은 인간들이 여전히 경제적을 가치 있는 존재로 남을 수도 있다. 하지만 이들은 모두 수 많은 문제가 있고 실제 유효할지 여부는 많은 반복과 실험으로 알 수 있다. 결국 좋은 결과를 얻기 위해 싸워야하며 착취적이나 디스토피아적 방향을 방지해야 한다.

원문

Taking stock

Through the varied topics above, I’ve tried to lay out a vision of a world that is both plausible if everything goes right with AI, and much better than the world today. I don’t know if this world is realistic, and even if it is, it will not be achieved without a huge amount of effort and struggle by many brave and dedicated people. Everyone (including AI companies!) will need to do their part both to prevent risks and to fully realize the benefits.

But it is a world worth fighting for. If all of this really does happen over 5 to 10 years—the defeat of most diseases, the growth in biological and cognitive freedom, the lifting of billions of people out of poverty to share in the new technologies, a renaissance of liberal democracy and human rights—I suspect everyone watching it will be surprised by the effect it has on them. I don’t mean the experience of personally benefiting from all the new technologies, although that will certainly be amazing. I mean the experience of watching a long-held set of ideals materialize in front of us all at once. I think many will be literally moved to tears by it.

Throughout writing this essay I noticed an interesting tension. In one sense the vision laid out here is extremely radical: it is not what almost anyone expects to happen in the next decade, and will likely strike many as an absurd fantasy. Some may not even consider it desirable; it embodies values and political choices that not everyone will agree with. But at the same time there is something blindingly obvious—something overdetermined—about it, as if many different attempts to envision a good world inevitably lead roughly here.

In Iain M. Banks’ The Player of Games29, the protagonist—a member of a society called the Culture, which is based on principles not unlike those I’ve laid out here—travels to a repressive, militaristic empire in which leadership is determined by competition in an intricate battle game. The game, however, is complex enough that a player’s strategy within it tends to reflect their own political and philosophical outlook. The protagonist manages to defeat the emperor in the game, showing that his values (the Culture’s values) represent a winning strategy even in a game designed by a society based on ruthless competition and survival of the fittest. A well-known post by Scott Alexander has the same thesis—that competition is self-defeating and tends to lead to a society based on compassion and cooperation. The “arc of the moral universe” is another similar concept.

I think the Culture’s values are a winning strategy because they’re the sum of a million small decisions that have clear moral force and that tend to pull everyone together onto the same side. Basic human intuitions of fairness, cooperation, curiosity, and autonomy are hard to argue with, and are cumulative in a way that our more destructive impulses often aren’t. It is easy to argue that children shouldn’t die of disease if we can prevent it, and easy from there to argue that everyone’s children deserve that right equally. From there it is not hard to argue that we should all band together and apply our intellects to achieve this outcome. Few disagree that people should be punished for attacking or hurting others unnecessarily, and from there it’s not much of a leap to the idea that punishments should be consistent and systematic across people. It is similarly intuitive that people should have autonomy and responsibility over their own lives and choices. These simple intuitions, if taken to their logical conclusion, lead eventually to rule of law, democracy, and Enlightenment values. If not inevitably, then at least as a statistical tendency, this is where humanity was already headed. AI simply offers an opportunity to get us there more quickly—to make the logic starker and the destination clearer.

Nevertheless, it is a thing of transcendent beauty. We have the opportunity to play some small role in making it real.

Taking Stock

나는 이 주제들을 통해 AI가 잘 진행될 경우 오늘보다 훨씬 더 나은 세상을 만들 수 있다는 비전을 제시하려고 했다. 이는 용기 있고 헌신적인 사람들의 엄청난 노력과 투쟁없이는 이루어지지 않을 것이다. 하지만 이 세상은 싸울 가치가 있는 세상이다. 언급한 것들이 실제로 일어난다면 지켜보는 사람들은 모두 그 효과에 놀랄 것이라고 생각한다. 나는 개인 혜택만을 말하는 것이 아니라 우리가 오랫동안 품어온 이상이 한꺼번에 눈앞에서 실현되는 경험은 많은 사람들이 그것을 보고 눈물을 흘릴 것이라고 생각한다.

여기서 제시된 비전은 매우 급진적인 것이다. 거의 누구도 다음 10년 내에 일어날 일이라고 기대하지 않으며 많은 사람들에게는 터무니없는 환상처럼 보일 것이다. 하지만 눈을 멀게 할 정도로의 분명한 무언가가 있다(마치 좋은 세상을 구상하려는 수많은 시도가 결국 여기로 향할 수 밖에 없는 것처럼 보인다).

나는 문화의 가치가 승리하는 전략이라고 생각한다. 왜냐면 그것은 분명한 도덕적 힘을 가진 수백만 개의 작은 결정들의 합이며 사람들이 모두 같은 편이 되게 만드는 경향이 있기 때문이다. 공정함, 협력, 호기심, 자율성에 대한 기본적인 인간의 직관은 논란의 여지가 적고 우리의 파괴적 충동들과 달리 누적되는 경향이 있다. 우리는 아이들이 병으로 죽지 않아야 한다는 주장을 쉽게 할 수 있고 더 나아가 아이들이 그 권리를 평등하게 누려야 한다고 주장하는 것도 어렵지 않다. 거기서 더 나아가 우리는 모두 힘을 합쳐 이 결과를 달성해야 한다고 주장하는 것도 어렵지 않다. 마찬가지로 사람들은 자신들의 삶과 선택에 대해 자율성과 책임을 가져야 한다는 생각도 직관적으로 받아들여진다. 이러한 간단한 직관들이 논리적 결론에 이르게 되면 결국 법의 지배, 민주주의, 계몽 주의적 가치를 만들어낸다.최소한 통계적으로 보면 인류는 이런 방향으로 가고 있었다. AI는 단지 우리가 더 빨리 그곳에 도달할 수 있도록 해주는 기회를 제공한다(논리를 더 뚜렷하게 하고 목적지를 더 분명히 해준다).

그럼에도 불구하고 이것은 초월적인 아름다움이 있다. 우리는 그것을 현실적으로 만드는 데 작은 역할을 할 기회가 있다.

문제	유형
124나라의숫자	수학 (삼진법
최고의 집합	수학
디스크 컨트롤러	우선순위 큐
여행경로	그래프 - 오일러, DFS
순위	그래프 - 플로이드 워셜
퍼즐 조각 채우기	구현 - BFS
아이템 줍기	구현 - BFS
징검다리 건너기	이진탐색, 우선순위 큐
가장 긴 증가하는 부분 수열 5	이진탐색
방의 개수	그래프 - 오일러
사칙연산	DP
평범한 배낭2	DP
파일 합치기	DP - Knuth's optimize
조이스틱	그리디
큰수 만들기	그리디
섬 연결하기	그리디 - 크루스칼, 프림
단속카메라	그리디
징검다리	이분탐색
젼력망을 둘로 나누기	BFS
백준4195	UnionFind
신촌통폐합계획	LinkedList
통신망분석	UnionFind, BFS
코드트리투어
어항정리	빡구현
도넛과막대그래프	그래프
소문난 칠공주	BFS, DFS, 백트래킹
N-Queen	백트래킹