Part 1: Intro to causal inference and do-calculus

2021. 7. 15. 14:20


https://www.inference.vc/untitled/

ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus


Have you come across causal inference before? You have probably seen it at least once while studying Bayesian networks, where arrows are used to describe cause and effect. However, deep learning, the hot topic these days, mostly does not use causal inference, so there is little opportunity to encounter it or to learn why it matters and how it is used. Still, the author of the original post argues that causal inference complements deep learning and is worth studying. (Don't get discouraged by causal diagrams looking a lot like Bayesian networks (not a coincidence, seeing they were both pioneered by Pearl); they don't compete with, they complement deep learning.)

Basics

Causal calculus distinguishes between two kinds of conditional distributions. In ML we usually estimate only one of them, but in some cases we estimate both.

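1) observational $p(y | x)$

The distribution of $Y$ given that we observe $X$ taking the value $x$. This is the usual conditional distribution we estimate when predicting $y$ from an observed $x$.

2) interventional $p(y | \mathrm{do}(x))$

The distribution of $Y$ when the person analyzing the data directly sets the value of $X$. (What is the distribution of $Y$ if I were to set the value of $X$ to $x$.) This is called intervening: if I intervened in the data generating process by artificially forcing the variable $X$ to take value $x$, but otherwise simulating the rest of the variables according to the original process that generated the data.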

Aren't they the same thing?

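The two conditional distributions above are different.

The observational p(y|x) is the familiar one: we observe x and predict the corresponding y, which is why it is written p(y|x).

The interventional $p(y | \mathrm{do}(x))$, however, describes y when x is set by intervention rather than merely observed.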
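To see the difference numerically, here is a minimal simulation sketch. It is my own toy example, not from the original post: the graph (z -> x, z -> y, x -> y), the variable roles, and all probabilities are assumptions chosen only for illustration. The `do_x` argument forces x to a value while the rest of the process runs unchanged, which is the "intervening in the data generating process" described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate(n, rng, do_x=None):
    """Sample (z, x, y) from an assumed toy process z -> x, z -> y, x -> y.

    If do_x is given, x is forced to that value (an intervention) while
    z and y are still generated by their original mechanisms.
    """
    z = rng.binomial(1, 0.5, size=n)                      # common cause of x and y
    if do_x is None:
        x = rng.binomial(1, np.where(z == 1, 0.9, 0.1))   # x depends on z
    else:
        x = np.full(n, do_x)                              # x is set by hand
    y = rng.binomial(1, 0.2 + 0.1 * x + 0.6 * z)          # y depends on x and z
    return z, x, y

# Observational: keep only the samples where x happened to be 1.
_, x_obs, y_obs = simulate(n, rng)
p_obs = y_obs[x_obs == 1].mean()

# Interventional: force x = 1 for every sample.
_, _, y_do = simulate(n, rng, do_x=1)
p_do = y_do.mean()

print(f"p(y=1 | x=1)     ~ {p_obs:.3f}")
print(f"p(y=1 | do(x=1)) ~ {p_do:.3f}")
```

With these assumed parameters the exact values are p(y=1|x=1) = 0.84 and p(y=1|do(x=1)) = 0.60. Observing x=1 makes z=1 more likely, which inflates y; setting x=1 by hand tells us nothing about z.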

Which one do I want?

Which of the two to use depends on the problem being solved. If the task is to predict y in a setting where x arises naturally, the observational p(y|x) is appropriate, and this is what supervised learning already uses.

(This is what Judea Pearl called curve fitting. This is all good for a range of important applications such as classification, image segmentation, super-resolution, voice transcription, machine translation, and many more.)

However, when we choose x ourselves, we need p(y|do(x)). In medicine, for example, we often want to know the outcome y when a particular treatment is applied. Here x does not arise naturally but is set deliberately (proactively chosen, an intervention), so $p(y | \mathrm{do}(x))$ is the quantity to use.

What exactly is p(y|do(x))?

To measure p(y|do(x)) directly, we must collect data from randomized controlled trials or A/B tests, in which the experimenter controls x. In practice, running such experiments is often difficult. (Medical experiments, for example, face many constraints.)
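A small sketch of why randomization works, again using a toy process of my own (the variables, the coin-flip assignment, and every probability are assumptions, not from the original post). Because x is assigned by a coin flip that ignores z, simply conditioning on x in the trial data estimates p(y|do(x)).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

z = rng.binomial(1, 0.5, size=n)               # trait that also affects the outcome
x = rng.binomial(1, 0.5, size=n)               # treatment assigned by coin flip, ignoring z
y = rng.binomial(1, 0.2 + 0.1 * x + 0.6 * z)   # outcome depends on x and z

# In randomized data, conditioning on x matches intervening on x.
p_treated = y[x == 1].mean()   # estimates p(y=1 | do(x=1))
p_control = y[x == 0].mean()   # estimates p(y=1 | do(x=0))
effect = p_treated - p_control

print(f"treated: {p_treated:.3f}, control: {p_control:.3f}, effect: {effect:.3f}")
```

Under these assumed parameters the exact values are 0.60 for the treated group and 0.50 for the control group, so the effect of the treatment is 0.10.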

Causal inference and do-calculus are the tools for computing p(y|do(x)) indirectly when it is hard to measure directly.

How are all these things related?

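First, look at the diagram drawn for p(y|x). The three circles in the panel labeled "observable joint" are the variables (x, z, y), and the small blue square represents the joint distribution of these three variables. We want to predict y from the value of x; z is a variable that can be measured but is not of interest in the current task. This can be learned the way we usually do it, with a deep learning network trained with a cross-entropy loss.

Then what should we do if we want p(y|do(x))?

Ideally, as in the right-hand panel, we would sample from the intervention joint distribution and train the model on those samples. As discussed above, however, sampling directly from the intervention joint distribution can be difficult or impossible, so we have to compute p(y|do(x)) from samples of the observable joint distribution instead.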

Causal models

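To compute p(y|do(x)) from data sampled from the observable joint distribution, as described above, we need to know the causal structure among the variables. Expressing these causal relationships gives a causal diagram (shown as a figure in the original post).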

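Given such a causal diagram, an intervention lets us mutilate the original causal model, that is, remove arrows from the original graph. To intervene on x, the arrows pointing into x are removed, since x is now set by hand rather than by its causes. The mutilated model takes the form we need to obtain p(y|do(x)) for the x of interest. (The post had no example here, so I did not fully understand this part.) Through this process we obtain an emulated intervention joint distribution that approximates the true intervention joint distribution, and therefore p(y|do(x)) can also be obtained indirectly.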
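Since an example was missing, here is a hedged sketch of the idea on a toy graph of my own choosing (not from the original post). I assume the causal structure is z -> x, z -> y, x -> y and that z is observed. Removing the arrow z -> x in this graph gives the standard adjustment formula $p(y | \mathrm{do}(x)) = \sum_z p(y | x, z)\, p(z)$, where every term on the right can be estimated from observational data alone. All probabilities below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Observational data only: nobody controls x here.
z = rng.binomial(1, 0.5, size=n)                   # observed common cause
x = rng.binomial(1, np.where(z == 1, 0.9, 0.1))    # x depends on z
y = rng.binomial(1, 0.2 + 0.1 * x + 0.6 * z)       # y depends on x and z

def adjusted(x_value):
    """Estimate p(y=1 | do(x=x_value)) as sum_z p(y=1 | x, z) * p(z)."""
    total = 0.0
    for z_value in (0, 1):
        p_z = (z == z_value).mean()
        p_y_given_xz = y[(x == x_value) & (z == z_value)].mean()
        total += p_y_given_xz * p_z
    return total

p_naive = y[x == 1].mean()   # plain conditioning: estimates p(y=1 | x=1)
p_adjusted = adjusted(1)     # emulated intervention: estimates p(y=1 | do(x=1))

print(f"naive:    {p_naive:.3f}")
print(f"adjusted: {p_adjusted:.3f}")
```

Under these assumptions the naive estimate approaches 0.84 while the adjusted estimate approaches 0.60, the value an actual intervention on this process would give. The result is only as good as the assumed diagram: with a different causal structure the same formula could give the wrong answer.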

Do-calculus

This process can be expressed mathematically using what is called do-calculus. (I did not fully understand this part.)
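For reference, do-calculus is usually stated as three rewriting rules. The statement below is the standard textbook form as I understand it, written in this post's notation; it is not quoted from the original article. Here $G_{\overline{X}}$ is the causal graph with all arrows into $X$ removed, $G_{\underline{Z}}$ is the graph with all arrows out of $Z$ removed, and $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.

$$
\begin{aligned}
\text{Rule 1:}\quad & p(y \mid \mathrm{do}(x), z, w) = p(y \mid \mathrm{do}(x), w) && \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2:}\quad & p(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = p(y \mid \mathrm{do}(x), z, w) && \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3:}\quad & p(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = p(y \mid \mathrm{do}(x), w) && \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{aligned}
$$

Rule 1 lets us add or drop an observation, Rule 2 lets us swap an intervention for an observation, and Rule 3 lets us add or drop an intervention. The goal is to apply these rules until no $\mathrm{do}$ remains, leaving an expression that can be estimated from the observable joint distribution.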

Summary

I wanted to emphasize again that this is not a question of whether you work on deep learning or causal inference. You can, and in many cases you should, do both. Causal inference and do-calculus allows you to understand a problem and establish what needs to be estimated from data based on your assumptions captured in a causal diagram. But once you've done that, you still need powerful tools to actually estimate that thing from data. Here, you can still use deep learning, SGD, variational bounds, etc. It is this cross-section of deep learning applied to causal inference which the recent article with Pearl claimed was under-explored.
