vignettes/articles/reproducible-research.Rmd
Reproducible research was an important consideration in the design of the Poverty and Inequality Platform (PIP).
In order to facilitate the reproducible research process, the PIP API allows you to retrieve important information about:
- The versions of the datasets used in PIP
- The version of the R code powering all PIP computations
By default, the PIP API always returns the most recent data available, using the most recent PPPs available. It is, however, possible to query specific data versions.
data_versions <- get_versions()
#> Pruning cache
data_versions
#> # A tibble: 10 × 4
#> version release_version ppp_version identity
#> <chr> <chr> <chr> <chr>
#> 1 20240627_2017_01_02_PROD 20240627 2017 PROD
#> 2 20240627_2011_02_02_PROD 20240627 2011 PROD
#> 3 20240326_2017_01_02_PROD 20240326 2017 PROD
#> 4 20240326_2011_02_02_PROD 20240326 2011 PROD
#> 5 20230919_2017_01_02_PROD 20230919 2017 PROD
#> 6 20230919_2011_02_02_PROD 20230919 2011 PROD
#> 7 20230328_2017_01_02_PROD 20230328 2017 PROD
#> 8 20230328_2011_02_02_PROD 20230328 2011 PROD
#> 9 20220909_2017_01_02_PROD 20220909 2017 PROD
#> 10 20220909_2011_02_02_PROD 20220909 2011 PROD
Select a version:
my_version <- data_versions$version[1]
my_version
#> [1] "20240627_2017_01_02_PROD"
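Selecting by position works, but it depends on the row order of the table. As a minimal sketch, the selection can instead be made by PPP year. The helper `pick_version()` is hypothetical (it is not part of the package), and the data frame below is a hand-copied subset of the table printed above, so the example runs without calling the API.

```r
# Hand-copied subset of the get_versions() output shown above (illustration only).
data_versions <- data.frame(
  version = c("20240627_2017_01_02_PROD", "20240627_2011_02_02_PROD",
              "20240326_2017_01_02_PROD", "20240326_2011_02_02_PROD"),
  release_version = c("20240627", "20240627", "20240326", "20240326"),
  ppp_version = c("2017", "2011", "2017", "2011"),
  identity = c("PROD", "PROD", "PROD", "PROD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the most recent release for a given PPP year.
pick_version <- function(versions, ppp, identity = "PROD") {
  keep <- versions$ppp_version == ppp & versions$identity == identity
  candidates <- versions[keep, , drop = FALSE]
  if (nrow(candidates) == 0) {
    stop("No version found for PPP year ", ppp)
  }
  # release_version looks like a YYYYMMDD date; convert it to an integer
  # so that which.max() picks the latest release.
  candidates$version[which.max(as.integer(candidates$release_version))]
}

my_version_2011 <- pick_version(data_versions, ppp = "2011")
my_version_2011
```

With the real table, `data_versions <- get_versions()` would replace the hand-built data frame.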
Pass it to the version argument of get_stats() or other functions:
get_stats(country = "AGO", version = my_version)
#> # A tibble: 3 × 44
#> region_name region_code country_name country_code year reporting_level
#> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 Sub-Saharan Africa SSA Angola AGO 2000 national
#> 2 Sub-Saharan Africa SSA Angola AGO 2008 national
#> 3 Sub-Saharan Africa SSA Angola AGO 2018 national
#> # ℹ 38 more variables: survey_acronym <chr>, survey_coverage <chr>,
#> # welfare_time <dbl>, welfare_type <chr>, survey_comparability <dbl>,
#> # comparable_spell <chr>, poverty_line <dbl>, headcount <dbl>,
#> # poverty_gap <dbl>, poverty_severity <dbl>, watts <dbl>, mean <dbl>,
#> # median <dbl>, mld <dbl>, gini <dbl>, polarization <dbl>, decile1 <dbl>,
#> # decile2 <dbl>, decile3 <dbl>, decile4 <dbl>, decile5 <dbl>, decile6 <dbl>,
#> # decile7 <dbl>, decile8 <dbl>, decile9 <dbl>, decile10 <dbl>, cpi <dbl>, …
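For a script meant to be re-run later, one option is to hard-code the version string and check that it is still available before querying. The sketch below assumes this workflow; `check_pinned_version()` is a hypothetical helper, and the `available` vector is a hand-copied stand-in for `get_versions()$version`.

```r
# Version pinned at the top of an analysis script (taken from the table above).
pinned_version <- "20230919_2017_01_02_PROD"

# Hypothetical helper: stop early if the pinned version is not available.
check_pinned_version <- function(pinned, available) {
  if (!pinned %in% available) {
    stop("Pinned version '", pinned, "' is not in the list of available versions")
  }
  invisible(pinned)
}

# Stand-in for get_versions()$version, so the example runs offline.
available <- c("20240627_2017_01_02_PROD", "20230919_2017_01_02_PROD")
check_pinned_version(pinned_version, available)

# The pinned version would then be passed on, for example:
# get_stats(country = "AGO", version = pinned_version)
```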
Even if the data is the same, methodological changes may break reproducibility. This is why it is also possible to retrieve information about the version of PIP that runs at a particular moment in time.
PIP is powered primarily by two R packages: pipapi and wbpip.
The get_pip_info() function allows you to retrieve information about the versions of these packages:
pip_info <- get_pip_info()
#> Saving response to cache "e64a91499cabd8232b327aba32e94d6e"
pip_info$pip_packages
#> $pipapi
#> $pipapi$pkg_version
#> [1] "1.3.11.9000"
#>
#> $pipapi$pkg_hash
#> [1] "8dfe898468855ec45852397d86ad27a9b7727c28"
#>
#>
#> $wbpip
#> $wbpip$pkg_version
#> [1] "0.1.4"
#>
#> $wbpip$pkg_hash
#> [1] "80bcd3151f16c3e113086d5f27d778a8262b4adf"
These are the two packages whose changes are most likely to impact reproducibility, but the get_pip_info() function also provides additional information about the R version being used by PIP, the operating system, etc.
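One way to use this information is to store the package versions and hashes alongside your results and compare them when the analysis is re-run. The following is a minimal sketch under that assumption: `changed_packages()` is a hypothetical helper, `recorded` copies the structure of `pip_info$pip_packages` shown above, and the altered hash in `current` is a made-up placeholder.

```r
# Snapshot with the same shape as pip_info$pip_packages shown above.
recorded <- list(
  pipapi = list(pkg_version = "1.3.11.9000",
                pkg_hash = "8dfe898468855ec45852397d86ad27a9b7727c28"),
  wbpip = list(pkg_version = "0.1.4",
               pkg_hash = "80bcd3151f16c3e113086d5f27d778a8262b4adf")
)

# Hypothetical later snapshot; the hash below is a placeholder, not a real commit.
current <- recorded
current$wbpip$pkg_hash <- "0000000000000000000000000000000000000000"

# Hypothetical helper: names of packages whose version or hash differs.
changed_packages <- function(old, new) {
  pkgs <- union(names(old), names(new))
  differs <- vapply(pkgs, function(p) !identical(old[[p]], new[[p]]), logical(1))
  pkgs[differs]
}

changed_packages(recorded, current)
```

In practice `current` would come from a fresh `get_pip_info()$pip_packages` call, and a non-empty result would signal that the computations behind the data may have changed.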