turing_guide_01

Turing @model 이 하는일 , MH알고리즘 구현 통한 실행 사례¶

In [335]:

using Turing
using Random
using StatsPlots

Inverse Gamma 분포:¶

연속 확률 분포의 하나로, Gamma 분포의 역수에 대한 분포입니다. 이 분포는 특히 베이지안 통계학에서 분산의 사전 분포로 사용되곤 합니다.

\begin{align*} f(x|\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{-\alpha-1}e^\frac{-\beta}{x} \end{align*}

$\alpha$ : 분포의 모양 결정
$\beta$ : 스케일 조절
특히 정규 분포의 분산에 대한 사전 분포로 적용될 때가 많습니다.
$\alpha > 1$ 일 때 기대값 = $\frac{\beta}{\alpha-1}$
$\alpha > 2$ 일 때 분산값 = $\frac{\beta^2}{(\alpha-1)^2(\alpha-2)}$

In [19]:

plot(InverseGamma(2,3),xlim=(0,10))

Out[19]:

목표 : 주어진 데이터 x,y에 대해 평균(m)과 분산($s^2$)을 추정 하고 싶다¶

Posterior(사후분포) $\propto$ Likelihood(우도) $\times$ Prior(사전분포)
Posterior : $P(m,s^2|x,y)$
$x,y$는 서로 독립(independent)
Likelihood : $P(x,y|m,s^2) = P(x|m,s^2) \times P(y|m,s^2)$
$m,s^2$는 서로 독립(independent)
Prior : $P(m,s^2) = P(m) \times P(s^2)$
Marginal : $P(x,y) = Constant$

\begin{align} P(m,s^2|x,y) &= \frac{P(x,y|m,s^2) \times P(m, s^2)}{P(x,y)} \\ P(m,s^2|x,y) &\propto P(x,y|m,s^2) \times P(m, s^2) \\ &\propto P(x|m,s^2)\times P(y|m,s^2) \times P(m)\times P(s^2) \\ &\propto Normal(x|m,s^2)\times Normal(y|m,s^2) \times Normal(m|0,s^2)\times InverseGamma(s^2|2,3) \end{align}

Data가 n개이고 i.i.d 라고 하고 Normal distribution의 표기를 $N$으로 한다.

\begin{align} P(m,s^2|x_1,x_2,…,x_n,y_1,y_2,…,y_n) &= \frac{P(x_1,x_2,…,x_n,y_1,y_2,…,y_n|m,s^2) \times P(m, s^2)}{P(x_1,x_2,…,x_n,y_1,y_2,…,y_n)} \\ P(m,s^2|x,y) &\propto P(x_1,x_2,…,x_n,y_1,y_2,…,y_n|m,s^2) \times P(m, s^2) \\ &\propto P(x_1|m,s^2)…P(x_n|m,s^2)P(y_1|m,s^2)…P(y_n|m,s^2)P(m)P(s^2) \\ &\propto N(x_1|m,s^2)…N(x_n|m,s^2)N(y_1|m,s^2)…N(y_n|m,s^2)N(m|0,s^2)InverseGamma(s^2|2,3) \end{align}

가정 :¶

평균과 분산의 prior(사전분포) 설정

$m \sim Normal(0, s^2)$
$s^2 \sim InverseGamma(2,3)$

Likelihood 설정

$x \sim Normal(m,s^2)$
$y \sim Normal(m,s^2)$

Posterior

Turing 내부적으로 $Normal(x|m,s^2)\times Normal(y|m,s^2) \times Normal(m|0,s^2)\times InverseGamma(s^2|2,3)$ 를계산
계산의 편의를 위해 log를 취함
$f(x,m,s^2) = \log\left(NormalPDF(x,m,s^2)\times NormalPDF(y,m,s^2) \times NormalPDF(m,0,s^2)\times InverseGammaPDF(s^2,2,3)\right)$

In [316]:

Random.seed!(1234)

Out[316]:

TaskLocalRNG()

In [317]:

@model function gdemo(x,y)
    # 데이터로 부터 추정하고 싶은 Parameter : m,s²
    # ----------------------
    # m,s²의 Prior(사전분포) 정의
    # ----------------------
    s²~ InverseGamma(2,3)
    m ~ Normal(0, √s²)
    
    # ----------------------
    # Likelihood(우도) 정의
    # ----------------------
    x ~ Normal(m, √s²)
    y ~ Normal(m, √s²)
end

Out[317]:

gdemo (generic function with 2 methods)

In [318]:

# Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(), 1000) 
c1 |> display
c2 = sample(gdemo(1.5, 2), PG(10), 1000)
c2 |> display

Chains MCMC chain (1000×4×1 Array{Float64, 3}):

Log evidence      = -3.732060818236104
Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 0.94 seconds
Compute duration  = 0.94 seconds
parameters        = s², m
internals         = lp, weight

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          s²    2.0549    1.5875    0.0696   284.9514   130.2846    1.0056     ⋯
           m    1.1681    0.7836    0.0346   469.6561   559.8397    1.0017     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          s²    0.5359    1.0212    1.6024    2.6196    6.0446
           m   -0.4017    0.6730    1.1807    1.6421    2.6898

Sampling: 100%|█████████████████████████████████████████| Time: 0:00:05

Chains MCMC chain (1000×4×1 Array{Float64, 3}):

Log evidence      = -3.6051567565349747
Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 6.03 seconds
Compute duration  = 6.03 seconds
parameters        = s², m
internals         = lp, logevidence

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          s²    2.0139    1.8245    0.0769   403.9877   231.2021    1.0007     ⋯
           m    1.1431    0.8060    0.0299   698.5490   697.8230    1.0001     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          s²    0.5122    1.0303    1.5332    2.3425    6.3275
           m   -0.6835    0.6991    1.1811    1.6154    2.6708

In [319]:

c3 = sample(gdemo(1.5,2), HMC(0.1, 5),1000)
c3 |> display
c4 = sample(gdemo(1.5,2), Gibbs(PG(10, :m), HMC(0.1, 5, :s²)), 1000) 
c4 |> display

Chains MCMC chain (1000×12×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 1.14 seconds
Compute duration  = 1.14 seconds
parameters        = s², m
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          s²    1.9618    1.5160    0.1212   163.1669   257.6945    1.0151     ⋯
           m    1.0900    0.7844    0.0844    88.7448   145.5900    1.0088     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          s²    0.5508    1.0046    1.4900    2.3898    6.0248
           m   -0.4960    0.5735    1.1046    1.5798    2.6111

Sampling: 100%|█████████████████████████████████████████| Time: 0:00:05

Chains MCMC chain (1000×3×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 7.28 seconds
Compute duration  = 7.28 seconds
parameters        = s², m
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          s²    2.2634    2.5711    0.2777    92.2928   125.0998    0.9994     ⋯
           m    1.1579    0.9061    0.0319   713.1817   497.7267    1.0038     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          s²    0.5502    1.0767    1.5579    2.5547    7.5529
           m   -0.5027    0.7101    1.1531    1.6880    2.8987

In [320]:

c5 = sample(gdemo(1.5, 2), HMCDA(0.15, 0.65), 1000)
c5 |> display
c6 = sample(gdemo(1.5, 2), NUTS(0.65), 1000)
c6 |> display

┌ Info: Found initial step size
└   ϵ = 1.6

Chains MCMC chain (1000×12×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 1.12 seconds
Compute duration  = 1.12 seconds
parameters        = s², m
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          s²    2.4875    1.6581    1.0203     3.7638        NaN    1.7752     ⋯
           m    0.9358    0.5414    0.1214    11.9285     4.6125    1.1811     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          s²    0.8455    1.0369    1.4662    4.7660    4.7660
           m    0.3629    0.6487    0.7829    1.1520    2.4739

┌ Info: Found initial step size
└   ϵ = 1.6

Chains MCMC chain (1000×14×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 1.18 seconds
Compute duration  = 1.18 seconds
parameters        = s², m
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          s²    1.9537    1.7819    0.0736   405.6728   386.1903    1.0004     ⋯
           m    1.1527    0.7512    0.0320   593.7742   513.0116    1.0018     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          s²    0.5481    0.9972    1.4918    2.3792    5.5491
           m   -0.3117    0.6982    1.1799    1.6187    2.6247

In [321]:

# Summarise results
describe(c3)

# Plot results
plot(c3) |> display
savefig("gdemo-plot.png")

Out[321]:

"/data1/Data/julia_dev/StatisticalRethinking/mrchaos_turing/notebook/gdemo-plot.png"

Turing에서 @model이 하는 일을 직접 구현 하고 Metropolis-Hastings sampler를¶

구현하여 동작을 시험해보기¶

@model은 결과적으로 prior와 likelihood를 설정하고 내부적으로 posterior를 계산 하여 return 한다.
아래는 예제는 두개로서 다른점은
- 첫번째는 $s^2$의 제안분포로 InverseGamma를 사용
- 두번째는 $s^2$의 제안분포로 log normal을사용
- 두 제안분포 모두 비대칭이기 떄문에 Hastings 보정을 사용해야 함
Markov Chain이 정상상태(steady state)에서 분포 x가 x’으로 전이가 일어날 확률은 x’에서 x로 전이가 일어날 확률과 동일 : 미세균형

($\theta_c$ 상태에 있을 확률) $\times$ ("$\theta_c$에서 $\theta_p$로 전이할 확률") = ($\theta_p$ 상태에 있을 확률) $\times$ ("$\theta_p$에서 $\theta_c$로 전이할 확률")

"$\theta_c$에서 $\theta_p$로 전이할 확률 T($\theta_c$,$\theta_p$)" $=$ ("$\theta_p$을 제안할 확률 q($\theta_p$|$\theta_c$))" $\times$ ("$\theta_p$를 수용할 확률 a($\theta_c$,$\theta_p$)") 로 볼 수 있다.

"$\theta_p$ 에서 $\theta_c$로 전이할 확률"도 마찬가지 이다.

$ P(\theta_c)T(\theta_c,\theta_p) = P(\theta_p)T(\theta_p,\theta_c) \\ P(\theta_c)q(\theta_p|\theta_c)a(\theta_c,\theta_p) = P(\theta_p)q(\theta_c|\theta_p)a(\theta_p,\theta_c) \\ P(\theta_c) = \frac{f(\theta_c)}{NC} \\ \frac{a(\theta_c,\theta_p)}{a(\theta_p,\theta_c)} = \frac{f(\theta_p)}{f(\theta_c)} * \frac{q(\theta_c|\theta_p)}{q(\theta_p|\theta_c)}\\ \frac{f(\theta_p)}{f(\theta_c)} * \frac{q(\theta_c|\theta_p)}{q(\theta_p|\theta_c)} < 1 \text{ 인 경우}\\ a(\theta_c,\theta_p)\text{ 는 확률이므로 0 ~ 1 사이의 값이다 따라서 항상 이 조건을 만족하기 위해서 } a(\theta_p,\theta_c) = 1 \text{ 가 되어야 한다}\\ \color{red}{a(\theta_c,\theta_p) = \frac{f(\theta_p)}{f(\theta_c)} * \frac{q(\theta_c|\theta_p)}{q(\theta_p|\theta_c)}\\} \frac{f(\theta_p)}{f(\theta_c)} * \frac{q(\theta_c|\theta_p)}{q(\theta_p|\theta_c)} \ge 1 \text{ 인 경우}\\ a(\theta_c,\theta_p)\text{ 는 확률이므로 0 ~ 1 사이의 값이다 따라서 항상 이 조건을 만족하기 위해서 } a(\theta_c,\theta_p) = 1 \text{ 가 되어야 한다}\\ \color{red}{a(\theta_c,\theta_p) = 1}\\ \color{red}{a(\theta_c,\theta_p) = \min\left(1,\frac{f(\theta_p)}{f(\theta_c)} * \frac{q(\theta_c|\theta_p)}{q(\theta_p|\theta_c)}\right)}\\ \theta_c = \{m_c,s^2_c\}\\ \theta_p = \{m_p,s^2_p\}\\ q(\theta_p|\theta_c) = q(m_p,s^2_p|m_c,s^2_c)\\ m,s^2\text{ 서로 독립}\\ q(\theta_p|\theta_c) = q(m_p|m_c,s^2_c)q(s^2_p|m_c,s^2_c)\\ \;\\ \text{좀 더 구체적으로 하면}\\ f(\theta) = f(\theta|data) = Likelihood(data|\theta) \times Prior(\theta)\\ \;\\ \color{blue}{Likelihood(data|\theta) = \mathcal{L}(\theta | data)} \text{ 로 표시함}\\ \color{blue}{Prior(\theta) = \pi(\theta)} \text{ 로 표시함}\\ \;\\ \text{data는 각각 i.i.d}\\ f(m,s^2|data)=\mathcal{L}(m,s^2|x_1,x_2,…,y_1,y_2) \times \pi(m, s^2)\\ =\mathcal{L}(m,s^2|x_1)\mathcal{L}(m,s^2|x_2)..\mathcal{L}(m,s^2|x_n)\mathcal{L}(m,s^2|y_1)\mathcal{L}(m,s^2|y_2)..\mathcal{L}(m,s^2|y_n)\pi(m)\pi(s^2)\\ =pdf.(Normal(m,sqrt(s²)),xArray) \cdot pdf.(Normal(m,sqrt(s²)),yArray) \cdot pdf(Normal(0,sqrt(s²)),m) \cdot pdf(InverseGamma(2,3)),s^2)\\ \;\\ \text{제안분포 설정}\\ m_p \sim Normal(m_c,0.5)\\ q(m_p|m_c,s^2_c) = pdf(Normal(m_c,0.5),m_p) \\ q(m_c|m_p,s^2_p) = pdf(Normal(m_p,0.5),m_c) \\ \;\\ s^2_p \sim InverseGamma(2,3) \text{ 이면}\\ q(s^2_p|m_c,s^2_c) = pdf(InverseGamma(2,3),s^2_p)\\ q(s^2_c|m_p,s^2_p) = pdf(InverseGamma(2,3),s^2_c)\\ \text{hastings 비대칭 보정계수} = \frac{q(\theta_p|\theta_c)}{q(\theta_c|\theta_p)} = \frac{q(m_p|m_c,s^2_c)q(s^2_p|m_c,s^2_c)}{q(m_c|m_p,s^2_p)q(s^2_c|m_p,s^2_p} \\ = \frac{pdf(Normal(m_c,0.5),m_p) \cdot pdf(InverseGamma(2,3),s^2_p)}{pdf(Normal(m_p,0.5),m_c) \cdot pdf(InverseGamma(2,3),s^2_c)} \\ \text{m의 제안분포가 대칭이므로 } q(m_c|m_p,s^2_p) == q(m_p|m_c,s^2_c)\\ = \frac{pdf(InverseGamma(2,3),s^2_p)}{pdf(InverseGamma(2,3),s^2_c)} \\ $

$s^2$의 제안분포를 InverseGamma를 사용하는 경우¶

In [328]:

let
    using Distributions
    
    function gdemo_pdf(x,y,m,s²)
        prior_m =pdf(Normal(0,sqrt(s²)),m)
        prior_s²= pdf(InverseGamma(2,3),s²)
        likelihood_x = pdf.(Normal(m,sqrt(s²)),x)
        likelihood_y = pdf.(Normal(m,sqrt(s²)),y)
        # 계산의 편의를 위해 log값 계산후 exp
        # 데이터가 많은 경우  P(x[1])P(x[2])...P(x[n]) 계산시 언더플로우 오류 발생 할 수 있음
        return exp(sum(vcat(log.(likelihood_x),log.(likelihood_y),log(prior_m), log(prior_s²))))
    end
    
    function metropolis_hastings(x::Union{Real,AbstractArray{<:Real}}, y::Union{Real,AbstractArray{<:Real}},iterations::Int64)
        m_current, s²_current = 0.0, 1.0 # 초기값
        m_samples, s²_samples = [],[]
        α, β = 2, 3  # InverseGamma의 모수를 정의합니다.
        for i in 1:iterations
            # 제안 메커니즘
            m_proposed = m_current + rand(Normal(0,0.5))            
            s²_proposed = rand(InverseGamma(α,β))

            # Hastings 수정 요소 계산
            # s²의 제안분포가 비대칭 이므로 Hastings를 계산
            hastings = pdf(InverseGamma(α,β), s²_current) /
                          pdf(InverseGamma(α,β), s²_proposed)
            
            # 수락/거절 메커니즘
            acceptance_ratio = (gdemo_pdf(x,y,m_proposed, s²_proposed) /
                               gdemo_pdf(x,y,m_current,  s²_current)) * hastings
        
            if rand() < acceptance_ratio
                m_current, s²_current = m_proposed, s²_proposed
            end
            push!(m_samples, m_current)
            push!(s²_samples,s²_current)

        end
        return m_samples, s²_samples
    end    
    #x,y  = rand(Normal(5,1),100), rand(Normal(0,1),100)
    x,y  = 1.5, 2
    burnin = 500
    m_samples,s²_samples = metropolis_hastings(x,y,2000)
    mean(m_samples[burnin+1:end]) |> display
    mean(s²_samples[burnin+1:end]) |> display
    median(m_samples) |> display
    median(s²_samples) |> display    

    # density plot 그리기
    p1 = density(m_samples, label="m samples", title="Density of m samples", alpha=0.6)
    p2 = density(s²_samples, label="s^2 samples", title="Density of s^2 samples", alpha=0.6)    
    
    # 두 그래프를 함께 표시
    plot(p1, p2, layout=(1,2))    
end

1.0848378643676282

1.8986060134628302

1.1286481685585872

1.537192529367998

Out[328]:

$s^2$의 제안분포를 Log Normal을 사용하는 경우¶

In [329]:

let
    using Distributions
    
    function gdemo_pdf(x,y,m,s²)
        prior_m =pdf(Normal(0,sqrt(s²)),m)
        prior_s²= pdf(InverseGamma(2,3),s²)
        likelihood_x = pdf.(Normal(m,sqrt(s²)),x)
        likelihood_y = pdf.(Normal(m,sqrt(s²)),y)
        # 계산의 편의를 위해 log값 계산후 exp
        # 데이터가 많은 경우  P(x[1])P(x[2])...P(x[n]) 계산시 언더플로우 오류 발생 할 수 있음
        return exp(sum(vcat(log.(likelihood_x),log.(likelihood_y),log(prior_m), log(prior_s²))))
    end
    
    function metropolis_hastings(x::Real, y::Real,iterations::Int64)
        m_current, log_s²_current = 0.0, 0.0 # 초기값
        m_samples, s²_samples = [],[]
        
        for i in 1:iterations
            # 제안 메커니즘
            m_proposed = m_current + rand(Normal(0,0.5))            
            log_s²_proposed = log_s²_current + rand(Normal(0, 0.5))        

            # Hastings 수정 요소 계산 q(x|x') / q(x'|x)
            hastings = pdf(Normal(log_s²_proposed, 0.5), log_s²_current) / 
                   pdf(Normal(log_s²_current, 0.5), log_s²_proposed)
            
            # 수락/거절 메커니즘
            acceptance_ratio = (gdemo_pdf(x,y,m_proposed, exp(log_s²_proposed)) /
                               gdemo_pdf(x,y,m_current,  exp(log_s²_current))) * hastings
        
        if rand() < acceptance_ratio
            m_current, log_s²_current = m_proposed, log_s²_proposed
        end
            push!(m_samples, m_current)
            push!(s²_samples,exp(log_s²_current))

        end
        return m_samples, s²_samples
    end
    
    x,y  = 1.5, 2.0
    burnin = 500
    m_samples,s²_samples = metropolis_hastings(x,y,20000)
    mean(m_samples[burnin+1:end]) |> display
    mean(s²_samples[burnin+1:end]) |> display
#     median(m_samples) |> display
#     median(s²_samples) |> display    
   
    # density plot 그리기
    p1 = density(m_samples, label="m samples", title="Density of m samples", alpha=0.6)
    p2 = density(s²_samples, label="s^2 samples", title="Density of s^2 samples", alpha=0.6)    
    
    # 두 그래프를 함께 표시
    plot(p1, p2, layout=(1,2))    
end

1.1693791807831144

1.334607912803371

Out[329]:

[Julia, Turing] Turing @model 이 하는일 , MH알고리즘 구현 통한 실행 사례