Runs the specification/multiverse analysis across specified models.
This is the central function of the package and represent the second step
in the analytic framework implemented in the package specr
. It estimates
and returns respective parameters and estimates of models that were specified
via setup()
.
specr(x, data = NULL, ...)
A specr.setup
object resulting from setup
or a tibble that
contains the relevant specifications (e.g., a tibble resulting from
as_tibble(setup(...))
).
If x is not an object of "specr.setup" and simply a tibble, you
need to provide the data set that should be used. Defaults to NULL as it is
assumend that most users will create an object of class "specr.setup" that they'll
pass to specr()
.
Further arguments that can be passed to future_pmap
. This only becomes
important if parallelization is used. For example, if a custom model function is used
this involves passing furrr_options
passing to the argument .options
.
When a plan for parallelization is set, one can also set .progress = TRUE
to print a progress bar during the fitting process. See details for more information
on parallelization.
An object of class specr.object
, which includes a data frame
with all specifications their respective results along with many other useful
information about the model. Parameters are extracted via the function passed
to setup
. By default this is broom::tidy()
and the function
broom::glance()
).Several other aspects and information are included in
the resulting class (e.g., number of specifications, time elapsed, subsets
included in the analyses). Use methods(class = "specr.object")
for
an overview on available methods.
Empirical results are often contingent on analytical decisions that are equally defensible, often arbitrary, and motivated by different reasons. This decisions may introduce bias or at least variability. To this end, specification curve analyses (Simonsohn et al., 2020) or multiverse analyses (Steegen et al., 2016) refer to identifying the set of theoretically justified, statistically valid (and potentially also non-redundant specifications, fitting the "multiverse" of models represented by these specifications and extract relevant parameters often to display the results graphically as a so-called specification curve. This allows readers to identify consequential specifications decisions and how they affect the results or parameter of interest.
Use of this function
A general overview is provided in the vignettes vignette("specr")
.
Generally, you create relevant specification using the function setup()
.
You then pass the resulting object of a class specr.setup
to the
present function specr()
to run the specification curve analysis.
Further note that the resulting object of class specr.object
allows
to use several generic function such as summary()
or plot()
.
Use methods(class = "specr.object")
for an overview on available
methods and e.g., ?plot.specr.object
to view the dedicated help page.
Parallelization
By default, the function fits models across all specifications sequentially
(one after the other). If the data set is large, the models complex (e.g.,
large structural equation models, negative binomial models, or Bayesian models),
and the number of specifications is large, it can make sense to parallelize
these operations. One simply has to load the package furrr
(which
in turn, builds on future
) up front. Then parallelizing the fitting process
works as specified in the package description of furr
/future
by setting a
"plan" before running specr
such as:
plan(multisession, workers = 4)
However, there are many more ways to specifically set up the plan, including
different strategy than multisession
. For more information, see
vignette("parallelization")
and the
reference page
for plan()
.
Disclaimer
We do see a lot of value in investigating how analytical choices affect a statistical outcome of interest. However, we strongly caution against using specr as a tool to somehow arrive at a better estimate compared to a single model. Running a specification curve analysis does not make your findings any more reliable, valid or generalizable than a single analysis. The method is meant to inform about the effects of analytical choices on results, and not a better way to estimate a correlation or effect.
Simonsohn, U., Simmons, J.P. & Nelson, L.D. (2020). Specification curve analysis. Nature Human Behaviour, 4, 1208–1214. https://doi.org/10.1038/s41562-020-0912-z
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing Transparency Through a Multiverse Analysis. Perspectives on Psychological Science, 11(5), 702-712. https://doi.org/10.1177/1745691616658637
setup()
for the first step of setting up the specifications.
summary.specr.object()
for how to summarize and inspect the results.
plot.specr.object()
for plotting results.
# Example 1 ----
# Setup up typical specifications
specs <- setup(data = example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = "lm",
controls = c("c1", "c2"),
subsets = list(group1 = unique(example_data$group1)))
# Run analysis (not parallelized)
results <- specr(specs)
# Summary of the results
summary(results)
#> Results of the specification curve analysis
#> -------------------
#> Technical details:
#>
#> Class: specr.object -- version: 1.0.1
#> Cores used: 1
#> Duration of fitting process: 0.5 sec elapsed
#> Number of specifications: 64
#>
#> Descriptive summary of the specification curve:
#>
#> median mad min max q25 q75
#> 0.13 0.41 -0.39 0.66 -0.14 0.39
#>
#> Descriptive summary of sample sizes:
#>
#> median min max
#> 338.5 323 1000
#>
#> Head of the specification results (first 6 rows):
#>
#> # A tibble: 6 × 25
#> x y model controls subsets group1 formula estimate std.error statistic
#> <chr> <chr> <chr> <chr> <chr> <fct> <glue> <dbl> <dbl> <dbl>
#> 1 x1 y1 lm no cova… middle middle y1 ~ x… 0.61 0.07 9.28
#> 2 x1 y1 lm no cova… old old y1 ~ x… 0.66 0.06 10.4
#> 3 x1 y1 lm no cova… young young y1 ~ x… 0.57 0.06 10.2
#> 4 x1 y1 lm no cova… all NA y1 ~ x… 0.62 0.04 16.4
#> 5 x1 y1 lm c1 middle middle y1 ~ x… 0.6 0.07 8.81
#> 6 x1 y1 lm c1 old old y1 ~ x… 0.64 0.06 9.81
#> # ℹ 15 more variables: p.value <dbl>, conf.low <dbl>, conf.high <dbl>,
#> # fit_r.squared <dbl>, fit_adj.r.squared <dbl>, fit_sigma <dbl>,
#> # fit_statistic <dbl>, fit_p.value <dbl>, fit_df <dbl>, fit_logLik <dbl>,
#> # fit_AIC <dbl>, fit_BIC <dbl>, fit_deviance <dbl>, fit_df.residual <dbl>,
#> # fit_nobs <dbl>
# Example 2 ----
# Working without S3 classes
specs2 <- setup(data = example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = "lm",
controls = "c1")
# Working with tibbles
specs_tibble <- as_tibble(specs2) # extract tibble from setup
results2 <- specr(specs_tibble,
data = example_data) # need to provide data!
# Results (tibble instead of S3 class)
head(results2)
#> # A tibble: 6 × 26
#> x y model controls subsets formula model_function term estimate
#> <chr> <chr> <chr> <chr> <chr> <glue> <list> <chr> <dbl>
#> 1 x1 y1 lm no covariates all y1 ~ x1… <fn> x1 0.620
#> 2 x1 y1 lm c1 all y1 ~ x1… <fn> x1 0.603
#> 3 x1 y2 lm no covariates all y2 ~ x1… <fn> x1 -0.328
#> 4 x1 y2 lm c1 all y2 ~ x1… <fn> x1 -0.339
#> 5 x2 y1 lm no covariates all y1 ~ x2… <fn> x2 0.272
#> 6 x2 y1 lm c1 all y1 ~ x2… <fn> x2 0.249
#> # ℹ 17 more variables: std.error <dbl>, statistic <dbl>, p.value <dbl>,
#> # conf.low <dbl>, conf.high <dbl>, fit_r.squared <dbl>,
#> # fit_adj.r.squared <dbl>, fit_sigma <dbl>, fit_statistic <dbl>,
#> # fit_p.value <dbl>, fit_df <dbl>, fit_logLik <dbl>, fit_AIC <dbl>,
#> # fit_BIC <dbl>, fit_deviance <dbl>, fit_df.residual <int>, fit_nobs <int>