---
title: "8. Bayesian Workflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{5. Bayesian Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(bage)
```

# Prior predictive checking

- Given model, priors, and typically an exposure/size term, generate data
- Some parts of the data generating model would need stronger priors than our standard estimation model
    - e.g. can't put a weakly informative prior on the intercept

"It is common in Bayesian analysis to use models that are not fully generative. For example, in regression we will typically model an outcome y given predictors x without a generative model for x." (Gelman et al. 2020: 11-12)

[actually no model is ever fully generative - there are always assumptions about some part of it]

- generating data given model and priors is step 1 of a simulation study
    - we can re-use functions for prior predictive checking and simulation studies

# Fitting a model

"generic diagnostic tools described by Yao et al. (2018a) and Talts et al. (2020) can be used to verify that a particular approximate algorithm reproduces the features of the posterior that you care about for a specific model."

- Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018a). Yes, but did it work?: Evaluating variational inference. In Proceedings of International Conference on Machine Learning, 5581–5590.
- Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2020). Validating Bayesian inference algorithms with simulation-based calibration. www.stat.columbia.edu/~gelman/research/unpublished/sbc.pdf

[this looks like something we might implement]

"a Laplace approximation can be viewed as a data-dependent linearization of the desired model" (Gelman et al. 2020: 15)

"Fit fast, fail fast" [we can do that!]

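
The simulation-based calibration idea from Talts et al. (2020) can be sketched in base R. This is only an illustration under stated assumptions, not bage code: it assumes a conjugate Poisson-gamma model, so that exact posterior draws stand in for whatever approximate algorithm is being checked, and every name in it (`sbc_rank`, `n_sims`, `n_draws`, `exposure`, and so on) is hypothetical.

```r
## Simulation-based calibration for a conjugate Poisson-gamma model.
## All names are illustrative; nothing here is part of the bage API.
set.seed(0)

n_sims     <- 500           # number of simulated datasets
n_draws    <- 99            # posterior draws per dataset (ranks run 0..99)
n_obs      <- 20            # observations per dataset
shape      <- 2             # prior: rate ~ Gamma(shape, rate = rate_prior)
rate_prior <- 1
exposure   <- rep(10, n_obs)

sbc_rank <- function() {
  ## 1. draw the "true" parameter from the prior
  rate_true <- rgamma(1, shape = shape, rate = rate_prior)
  ## 2. simulate data given that parameter
  y <- rpois(n_obs, lambda = rate_true * exposure)
  ## 3. draw from the posterior (exact here, because the model is conjugate;
  ##    in practice this is where the algorithm under test would go)
  post <- rgamma(n_draws,
                 shape = shape + sum(y),
                 rate  = rate_prior + sum(exposure))
  ## 4. rank of the true value among the posterior draws
  sum(post < rate_true)
}

ranks <- replicate(n_sims, sbc_rank())

## If the inference is calibrated, ranks are uniform on 0..n_draws,
## so the ten bins below should hold roughly equal counts
counts <- table(cut(ranks, breaks = seq(-0.5, n_draws + 0.5, length.out = 11)))
print(counts)
```

Replacing step 3 with draws from an approximate fit would turn the same loop into a check of that approximation: bin counts that pile up at either end, or in the middle, would suggest the approximate posterior is biased or has the wrong spread.
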

# Using constructed data to find and understand problems

"All this implies that fake data simulation can be particularly relevant in the zone of the parameter space that is predictive of the data. This in turn suggests a two-step procedure in which we first fit the model to real data, then draw parameters from the resulting posterior distribution to use in fake-data checking. The statistical properties of such a procedure are unclear but in practice we have found such checks to be helpful, both for revealing problems with the computation or model, and for providing some reassurance when the fake-data-based inferences do reproduce the assumed parameter value." (Gelman et al. 2020: 17)

# Addressing computational problems

"Starting at simple and complex models and meeting in the middle" (Gelman et al. 2020: 22-23)

- Easy with bage
    - e.g. number of terms, complexity of priors

# Evaluating and using a fitted model

- bayesplot package has tools for posterior predictive checking [we need to generate outputs that can be fed into these plots]

# References

- Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., Kennedy, L., Gabry, J., Bürkner, P.-C., and Modrák, M. (2020). Bayesian workflow. arXiv preprint arXiv:2011.01808.
- van de Schoot, R., Depaoli, S., King, R., Kramer, B., Märtens, K., Tadesse, M. G., Vannucci, M., Gelman, A., Veen, D., Willemsen, J., et al. (2021). Bayesian statistics and modelling. Nature Reviews Methods Primers, 1(1), 1.
- <https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html>