An Example of Random Walk Metropolis

In the RW Metropolis algorithm, we must choose the increment covariance matrix $\Sigma$ and the scaling factor $s$. In practice, $\Sigma$ is often simply set to either $I$ or an estimate of the asymptotic covariance matrix, so the scaling factor is critical to the algorithm's success. Following the Roberts and Rosenthal guidelines, we can choose the initial scaling factor as a function of the dimension, $s_{0} = 2.93 / \sqrt{d}$. For more details, please refer to Chapter 3 of Rossi et al. (2012).
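
As a rough sketch of the proposal step (following the same convention as the implementation below, where the proposal covariance is $s\Sigma$ with $\Sigma = I$; the dimension d and the vector current are placeholders), a single random-walk increment can be drawn in R as:

library(MASS)

d <- 3                   # dimension of the parameter vector (placeholder)
s0 <- 2.93 / sqrt(d)     # initial scaling factor from the guideline above
current <- rep(0, d)     # current state of the chain (placeholder)
# proposal = current value plus a scaled multivariate normal increment
proposal <- mvrnorm(n = 1, mu = current, Sigma = s0 * diag(d))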

R Implementation

We consider a simple example:
$$
y = \alpha + \beta x + \epsilon, \qquad \epsilon \sim N(0, \sigma^{2}),
$$
and test the RW Metropolis algorithm on simulated data. We set $\alpha = -2$, $\beta = 4$, and $\sigma = 1$. We choose the scaling factor $s = 2.93/\sqrt{3}$ and start the chain at $(0, 0, 2)$. With 20,000 draws in total for each chain, treating the first 5,000 draws as the burn-in period, we plot the posterior distributions.


rm(list = ls())
library(MASS)
library(magrittr)
library(ggplot2)


rw.metropolis <- function(ptarget, startvalue, s,
                          N = 2e4, burnin = 5e3){
  nvar <- length(startvalue)
  chain <- matrix(0, nrow = N, ncol = nvar) # declare the chain
  current.value <- startvalue
  chain[1,] <- startvalue
  for (i in 2:N){
    # propose a new vector by a random walk: normal increments for the
    # regression coefficients, a truncated normal on (0, 3) for sigma
    next.value <- c(mvrnorm(n = 1, mu = current.value[-nvar], Sigma = s * diag(nvar - 1)),
                    truncated.normal(current.value[nvar], s, 0, 3))
    # log acceptance probability
    acceptance.rate <- min(0, ptarget(next.value) - ptarget(current.value))
    if (log(runif(1)) < acceptance.rate){
      current.value <- next.value
    }
    chain[i,] <- current.value
  }
  return(chain[(burnin + 1):N,]) # return the post-burn-in draws
}

# simulate data from the linear regression model
set.seed(1234)
n.sample <- 1000
X <- c(rep(1, n.sample), rnorm(n.sample, mean = 12, sd = 4)) %>% matrix(ncol = 2)
beta.true <- c(-2, 4)
sigma.true <- 1
y <- X %*% beta.true + rnorm(n.sample, mean = 0, sd = sigma.true)

# construct the log posterior density
ptarget <- function(parameter.est){
  prior <- dnorm(parameter.est[1:2], sd = 5, log = TRUE) %>% sum() +
    dunif(parameter.est[3], min = 0, max = 3, log = TRUE)
  likelihood <- dnorm(y, mean = X %*% parameter.est[1:2], sd = parameter.est[3], log = TRUE) %>% sum()
  return(prior + likelihood)
}

# draw from a normal distribution truncated to (a, b) by inverse-CDF sampling
truncated.normal <- function(mu, sigma, a, b){
  phi.a <- pnorm(a, mean = mu, sd = sigma)
  phi.b <- pnorm(b, mean = mu, sd = sigma)
  # using length(mu) so that the function also works for vector-valued mu
  return(mu + sigma * qnorm(runif(length(mu)) * (phi.b - phi.a) + phi.a))
}

# report the results
startvalue <- c(0, 0, 2)
s <- 2.93 / sqrt(length(startvalue))
u <- rw.metropolis(ptarget, startvalue, s)
posterior <- u %>% as.vector()
var <- c(rep("intercept", nrow(u)), rep("beta", nrow(u)), rep("sigma", nrow(u))) %>% as.factor()
rwmetro.res <- data.frame(posterior, var)
# plot the posterior density
rwmetro.res %>% ggplot(aes(x = posterior, colour = var)) + geom_density()

Posterior Probability Distribution

We can compare these distributions with the true values $\alpha = -2$, $\beta = 4$, and $\sigma = 1$.
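
As a quick numerical check (a small addition, not part of the original script, reusing the matrix u of retained draws whose columns are intercept, slope, and sigma), we can compute posterior means and 95% credible intervals:

# posterior means and 95% credible intervals for (alpha, beta, sigma)
posterior.summary <- data.frame(
  mean  = colMeans(u),
  lower = apply(u, 2, quantile, probs = 0.025),
  upper = apply(u, 2, quantile, probs = 0.975),
  row.names = c("intercept", "beta", "sigma")
)
posterior.summary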

Scaling Factor

We can try a range of scaling factors and choose the most efficient one, measuring efficiency with numEff from the bayesm package.


library(bayesm)
s0 <- (2.93 / sqrt(length(startvalue))) %>% round(digits = 2)
s.vect <- seq(from = s0 / 2, to = s0 * 2, by = .075 * s0) %>% round(digits = 2)
N <- 2e4
burnin <- N / 4
# calculate the numerical efficiency
num.eff.cal <- function(x){
  return(numEff(x)$f %>% sqrt())
}
numerical.efficiency <- vector(mode = "numeric", length = length(s.vect))
for (i in seq_along(s.vect)){
  u <- rw.metropolis(ptarget, startvalue, s.vect[i], N, burnin)
  numerical.efficiency[i] <- apply(u, 2, num.eff.cal) %>% mean(na.rm = TRUE)
}
# plot the results
scaling.res <- data.frame(scale = s.vect, efficiency = 1 / numerical.efficiency)
scaling.res <- scaling.res[complete.cases(scaling.res),]
ggplot(scaling.res, aes(scale, efficiency)) + geom_point() + geom_line()

Relative Efficiency for Different Scaling Factors

Around the guideline value $s_{0} = 1.69$, the most efficient scaling factor turns out to be $s^{*} = 1.86$.
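
If we want to pick that value programmatically (a small sketch reusing the scaling.res data frame built above), we can take the grid point with the highest relative efficiency:

# scaling factor with the highest relative efficiency on the grid
best.scale <- scaling.res$scale[which.max(scaling.res$efficiency)]
best.scale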

References

Rossi, P. E., Allenby, G. M., & McCulloch, R. (2012). Bayesian statistics and marketing. John Wiley & Sons.