Many data are hierarchical and we want to acknowledge the nested
structure in our models. Specr can easily estimate such multilevel
models. We again have to write a customized function that we can pass
first to setup() and the specr() will do the
rest.
For this example, we will us the gapminder data set that
is included in the gapminder-package. We quickly recode
some variable to get more interpretable estimates.
# Load packages
library(tidyverse)
library(specr)
library(gapminder)
library(lme4)
library(furrr)  # For parallelization!
# Recode some variables
gapminder <- gapminder %>%
  mutate(gdpPercap_log = log(gdpPercap),
         pop = pop/1000)
# Check data
head(gapminder)
#> # A tibble: 6 × 7
#>   country     continent  year lifeExp    pop gdpPercap gdpPercap_log
#>   <fct>       <fct>     <int>   <dbl>  <dbl>     <dbl>         <dbl>
#> 1 Afghanistan Asia       1952    28.8  8425.      779.          6.66
#> 2 Afghanistan Asia       1957    30.3  9241.      821.          6.71
#> 3 Afghanistan Asia       1962    32.0 10267.      853.          6.75
#> 4 Afghanistan Asia       1967    34.0 11538.      836.          6.73
#> 5 Afghanistan Asia       1972    36.1 13079.      740.          6.61
#> 6 Afghanistan Asia       1977    38.4 14880.      786.          6.67For this example, we use the package lme4 and more
specifically the function lmer() to estimate the multilevel
model (more complex models such as poisson or negative binomial
multilevel models can likewise be estimated).
Based on the data set, we want to estimate the relationship between
gdpPercap (GDP per capita) and lifeExp (life
expectancy). Both variables are nested within both countries and years.
We can simply add a respective random effect structure via the argument
add_to_formula. This way, this will be automatically
included in the formula of all specifications. Because
broom doesn’t provide a tidy function for
merMod-objects resulting from lme4::lmer(), we
need to add a new extraction function like so
fun1 = new_function. Luckily, we can use the
broom.mixed package, which agian provides a tidy function
for such objects.
specs <- setup(data = gapminder,
                y = c("lifeExp"),
                x = c("gdpPercap_log"), 
                model = c("lmer"),
                controls = "pop",
                fun1 = function(x) broom.mixed::tidy(x, conf.int = TRUE),
                add_to_formula = "(1|country) + (1|year)")
# Check formula
summary(specs)
#> Setup for the Specification Curve Analysis
#> -------------------------------------------
#> Class:                      specr.setup -- version: 1.0.0 
#> Number of specifications:   2 
#> 
#> Specifications:
#> 
#>   Independent variable:     gdpPercap_log 
#>   Dependent variable:       lifeExp 
#>   Models:                   lmer 
#>   Covariates:               no covariates, pop 
#>   Subsets analyses:         all 
#> 
#> Function used to extract parameters:
#> 
#>   function(x) broom.mixed::tidy(x, conf.int = TRUE)
#> 
#> 
#> Head of specifications table (first 6 rows):
#> # A tibble: 2 × 6
#>   x             y       model controls      subsets formula                     
#>   <chr>         <chr>   <chr> <chr>         <chr>   <glue>                      
#> 1 gdpPercap_log lifeExp lmer  no covariates all     lifeExp ~ gdpPercap_log + 1…
#> 2 gdpPercap_log lifeExp lmer  pop           all     lifeExp ~ gdpPercap_log + p…
# Run analysis and inspect results
results <- specr(specs)
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
as_tibble(results)
#> # A tibble: 2 × 22
#>   x       y     model controls subsets formula model_function effect group term 
#>   <chr>   <chr> <chr> <chr>    <chr>   <glue>  <list>         <chr>  <chr> <chr>
#> 1 gdpPer… life… lmer  no cova… all     lifeEx… <fn>           fixed  NA    gdpP…
#> 2 gdpPer… life… lmer  pop      all     lifeEx… <fn>           fixed  NA    gdpP…
#> # … with 12 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   conf.low <dbl>, conf.high <dbl>, fit_nobs <int>, fit_sigma <dbl>,
#> #   fit_logLik <dbl>, fit_AIC <dbl>, fit_BIC <dbl>, fit_REMLcrit <dbl>,
#> #   fit_df.residual <int>Sometimes, we may not want to add one random effect structure to all models and instead explore more specific random structure (and even several different random effect structures). In this case, we create several customized lmer-functions that account for different nesting structures.
# Random intercept model (only country as grouping variable)
lmer_ri_1 <- function(formula, data,...) {
  require(lme4)
  require(broom.mixed)
  formula <- paste(formula, "+ (1|country)")
  lmer(formula, data)
}
# Including random slopes (only country as grouping variable)
lmer_rs_1 <- function(formula, data,...) {
  require(lme4)
  require(broom.mixed)
  slopevars <- unlist(strsplit(formula, " ~ "))[2]
  formula <- paste0(formula, "+ (1 + ", slopevars, "|country)" )
  lmer(formula, data)
}
# Random intercept model (lifeExp is nested in both countries and years)
lmer_ri_2 <- function(formula, data,...) {
  require(lme4)
  require(broom.mixed)
  formula <- paste0(formula, "+ (1|country) + (1|year)")
  lmer(formula, data)
}
# Including random slopes (intercept and slopes are nested in both countries and years)
lmer_rs_2 <- function(formula, data,...) {
  require(lme4)
  require(broom.mixed)
  slopevars <- unlist(strsplit(formula, " ~ "))[2]
  formula <- paste0(formula, "+ (1 + ", slopevars, "|country) + (", slopevars, "|year)" )
  lmer(formula, data)
}We can now use these function to estimate these models. In this example, we investigate the influence of different nesting structures on the fixed effect between GDP per capita and life expectancy.
# Setup specifications with customized functions
specs <- setup(data = gapminder,
                y = c("lifeExp"),
                x = c("gdpPercap_log"), 
                model = c("lmer_ri_1", "lmer_ri_2", 
                          "lmer_rs_1", "lmer_rs_2"),
               controls = "pop")
# Check specifications
summary(specs)
#> Setup for the Specification Curve Analysis
#> -------------------------------------------
#> Class:                      specr.setup -- version: 1.0.0 
#> Number of specifications:   8 
#> 
#> Specifications:
#> 
#>   Independent variable:     gdpPercap_log 
#>   Dependent variable:       lifeExp 
#>   Models:                   lmer_ri_1, lmer_ri_2, lmer_rs_1, lmer_rs_2 
#>   Covariates:               no covariates, pop 
#>   Subsets analyses:         all 
#> 
#> Function used to extract parameters:
#> 
#>   function (x) 
#> broom::tidy(x, conf.int = TRUE)
#> <environment: 0x7fd618c393d8>
#> 
#> 
#> Head of specifications table (first 6 rows):
#> # A tibble: 6 × 6
#>   x             y       model     controls      subsets formula                 
#>   <chr>         <chr>   <chr>     <chr>         <chr>   <glue>                  
#> 1 gdpPercap_log lifeExp lmer_ri_1 no covariates all     lifeExp ~ gdpPercap_log…
#> 2 gdpPercap_log lifeExp lmer_ri_1 pop           all     lifeExp ~ gdpPercap_log…
#> 3 gdpPercap_log lifeExp lmer_ri_2 no covariates all     lifeExp ~ gdpPercap_log…
#> 4 gdpPercap_log lifeExp lmer_ri_2 pop           all     lifeExp ~ gdpPercap_log…
#> 5 gdpPercap_log lifeExp lmer_rs_1 no covariates all     lifeExp ~ gdpPercap_log…
#> 6 gdpPercap_log lifeExp lmer_rs_1 pop           all     lifeExp ~ gdpPercap_log…Now, we can simply fit the models with specr() as we are
used to. Although not really necessary here, I going to use
parallelization to exemplify how we can pass the created function to the
workers using furrr_options. We need to pass both new
functions as a list of “globals” and the packages that are additionally
required.
# Create furrr_options and pass globals
opts <- furrr_options(
  globals = list(lmer_ri_1 = lmer_ri_1, lmer_ri_2 = lmer_ri_2, 
                 lmer_rs_1 = lmer_rs_1, lmer_rs_2 = lmer_rs_2),
  packages = c("lme4", "broom.mixed")
)
# Specify future plan
plan(multisession, workers = 4)
# Run analysis in parallel and plot results
results <- specr(specs, .options = opts)  # pass `furrr_options` here
plot(results)
At the end of our analysis, it makes sense to explicitly close multisession workers by switching the plan back to sequential.
plan(sequential)