15 Mar 2018

What is Stan

  • Stan is a turing complete, probabilistic programming language that is used mostly for Bayesian inference
  • Stan is a two stage compiler: you write your model in the Stan language, which is translated into C++ and then compiled to the native platform
  • Stan includes a matrix and math library that supports auto-differentiation
  • Stan includes interfaces to R and other languages
  • There are post-inference plotting libraries that can be used with Stan like bayesplot1
  • Stan has thousands of users and growing. Stan is used in clinical drug trials, genomics and cancer biology, population dynamics, psycholinguistics, social networks, finance and econometrics, professional sports, publishing, recommender systems, educational testing, climate models, and many more
  • There companies that are building commercial products on top of Stan

What is Known

  • As a matter of notation we call:
    • Observed data \(\boldsymbol{y}\)
    • Unobserved but observable data \(\boldsymbol{\widetilde{y}}\)
    • Unobserved and unobservable parameters \(\boldsymbol{\theta}\)
    • Covariates \(\boldsymbol{x}\)
    • Probability distribution (density) \(\boldsymbol{p(\cdot)}\)
  • Estimation is the process of figuring out the unknowns, i.e. unobserved quantities
  • In classical machine learning and frequentist inference (including prediction), the problem is framed in terms of the most likely value of \(\boldsymbol{\theta}\)
  • Bayesians are extremely greedy people: they are not satisfied with the maximum, the want the whole thing

Why Bayes

What is Bayes

\[ p(\theta\mid y, X) = \frac{p(y \mid X, \theta) * p(\theta)}{p(y)} = \frac{p(y \mid X, \theta) * p(\theta)}{\int p(y \mid X, \theta) * p(y) \space d\theta} \propto p(y \mid X, \theta) * p(\theta) \]

  • Bayesian inference is an approach to figuring out the updated \(\boldsymbol{p(\theta)}\) after observing \(\boldsymbol{y}\) and \(\boldsymbol{X}\)
  • When \(\boldsymbol{p(y \mid X, \theta)}\) is evaluated at each value of \(\boldsymbol{y}\), it is called a likelihood function - this is our data generating process
  • An MCMC algorithm draws from an implied probability distribution \(\boldsymbol{p(\theta \mid y, X)}\)
  • In Stan we specify: \[ \log[p(\theta) * p(y \mid X, \theta)] = \log[p(\theta)] + \sum_{i=1}^{N}\log[p(y_i \mid x_i, \theta)] \]