Reproducible research was an important consideration in the design of the Poverty and Inequality Platform (PIP).

In order to facilitate the reproducible research process, the PIP API allows you to retrieve important information about:
- The versions of the datasets used in PIP
- The version of the R code powering all PIP computations

Data versioning in PIP

By default, the PIP API returns the most recent data available, using the most recent PPPs available. You can, however, query a specific data version.

Listing all data versions available in PIP

data_versions <- get_versions()
#> Pruning cache
data_versions
#> # A tibble: 10 × 4
#>    version                  release_version ppp_version identity
#>    <chr>                    <chr>           <chr>       <chr>   
#>  1 20240627_2017_01_02_PROD 20240627        2017        PROD    
#>  2 20240627_2011_02_02_PROD 20240627        2011        PROD    
#>  3 20240326_2017_01_02_PROD 20240326        2017        PROD    
#>  4 20240326_2011_02_02_PROD 20240326        2011        PROD    
#>  5 20230919_2017_01_02_PROD 20230919        2017        PROD    
#>  6 20230919_2011_02_02_PROD 20230919        2011        PROD    
#>  7 20230328_2017_01_02_PROD 20230328        2017        PROD    
#>  8 20230328_2011_02_02_PROD 20230328        2011        PROD    
#>  9 20220909_2017_01_02_PROD 20220909        2017        PROD    
#> 10 20220909_2011_02_02_PROD 20220909        2011        PROD
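
Because the result is an ordinary table, a version can also be picked programmatically rather than by position. The sketch below is a minimal offline illustration: it rebuilds four rows of the table above by hand (in a real session you would use the object returned by get_versions() instead), and latest_version_for_ppp() is a hypothetical helper written for this example, not a function provided by the package.

```r
# Four rows copied from the output above; in practice, use get_versions().
data_versions <- data.frame(
  version = c("20240627_2017_01_02_PROD", "20240627_2011_02_02_PROD",
              "20240326_2017_01_02_PROD", "20240326_2011_02_02_PROD"),
  release_version = c("20240627", "20240627", "20240326", "20240326"),
  ppp_version = c("2017", "2011", "2017", "2011"),
  identity = c("PROD", "PROD", "PROD", "PROD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most recent release for a given PPP year.
latest_version_for_ppp <- function(versions, ppp) {
  candidates <- versions[versions$ppp_version == ppp, ]
  if (nrow(candidates) == 0) {
    stop("No data version found for PPP year ", ppp)
  }
  # release_version is a YYYYMMDD string, so sorting it as text is chronological.
  candidates <- candidates[order(candidates$release_version, decreasing = TRUE), ]
  candidates$version[1]
}

latest_version_for_ppp(data_versions, "2011")
#> [1] "20240627_2011_02_02_PROD"
```

Selecting by PPP year and release date is less fragile than indexing with [1], since the row order of the table is not something this sketch relies on.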

Querying a specific data version

Select a version:

my_version <- data_versions$version[1]
my_version
#> [1] "20240627_2017_01_02_PROD"

Pass it to the version argument of get_stats() or of other functions that accept it:

get_stats(country = "AGO", version = my_version)
#> # A tibble: 3 × 44
#>   region_name        region_code country_name country_code  year reporting_level
#>   <chr>              <chr>       <chr>        <chr>        <dbl> <chr>          
#> 1 Sub-Saharan Africa SSA         Angola       AGO           2000 national       
#> 2 Sub-Saharan Africa SSA         Angola       AGO           2008 national       
#> 3 Sub-Saharan Africa SSA         Angola       AGO           2018 national       
#> # ℹ 38 more variables: survey_acronym <chr>, survey_coverage <chr>,
#> #   welfare_time <dbl>, welfare_type <chr>, survey_comparability <dbl>,
#> #   comparable_spell <chr>, poverty_line <dbl>, headcount <dbl>,
#> #   poverty_gap <dbl>, poverty_severity <dbl>, watts <dbl>, mean <dbl>,
#> #   median <dbl>, mld <dbl>, gini <dbl>, polarization <dbl>, decile1 <dbl>,
#> #   decile2 <dbl>, decile3 <dbl>, decile4 <dbl>, decile5 <dbl>, decile6 <dbl>,
#> #   decile7 <dbl>, decile8 <dbl>, decile9 <dbl>, decile10 <dbl>, cpi <dbl>, …
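
Once a version is pinned, it can be useful to keep it alongside the results so that saved outputs remain traceable. The following is a minimal sketch under stated assumptions: tag_with_version() and the data_version column are hypothetical names introduced for illustration, and the small data frame stands in for two columns of the get_stats() output shown above so that the example runs offline.

```r
# Hypothetical helper: store the data version as a column of the results.
tag_with_version <- function(results, version) {
  results$data_version <- version
  results
}

# Stand-in for two columns of the get_stats() output shown above.
ago <- data.frame(
  country_code = c("AGO", "AGO", "AGO"),
  year = c(2000, 2008, 2018),
  stringsAsFactors = FALSE
)

ago_tagged <- tag_with_version(ago, "20240627_2017_01_02_PROD")
unique(ago_tagged$data_version)
#> [1] "20240627_2017_01_02_PROD"
```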

Retrieve information about PIP code

Even when the data is unchanged, methodological changes to the code can break reproducibility. This is why it is also possible to retrieve information about the version of the PIP code running at a particular moment in time.

PIP is powered primarily by two R packages, pipapi and wbpip.

The get_pip_info() function allows you to retrieve information about the versions of these packages:

pip_info <- get_pip_info()
#> Saving response to cache "e64a91499cabd8232b327aba32e94d6e"
pip_info$pip_packages
#> $pipapi
#> $pipapi$pkg_version
#> [1] "1.3.11.9000"
#> 
#> $pipapi$pkg_hash
#> [1] "8dfe898468855ec45852397d86ad27a9b7727c28"
#> 
#> 
#> $wbpip
#> $wbpip$pkg_version
#> [1] "0.1.4"
#> 
#> $wbpip$pkg_hash
#> [1] "80bcd3151f16c3e113086d5f27d778a8262b4adf"

These are the two packages whose changes are most likely to impact reproducibility, but the get_pip_info() function also provides additional information, such as the R version used by PIP and the operating system.
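
One way to use this information is to store the package hashes when an analysis is first run and compare them on a later run. The sketch below assumes the list structure printed above for pip_info$pip_packages; changed_packages() is a hypothetical helper, and the "current" snapshot is simulated with a placeholder hash so that the example runs offline.

```r
# Snapshot recorded earlier; values copied from the output above.
recorded <- list(
  pipapi = list(pkg_version = "1.3.11.9000",
                pkg_hash = "8dfe898468855ec45852397d86ad27a9b7727c28"),
  wbpip = list(pkg_version = "0.1.4",
               pkg_hash = "80bcd3151f16c3e113086d5f27d778a8262b4adf")
)

# Hypothetical helper: names of the packages whose hash differs between snapshots.
changed_packages <- function(old, new) {
  pkgs <- intersect(names(old), names(new))
  differs <- vapply(
    pkgs,
    function(p) !identical(old[[p]]$pkg_hash, new[[p]]$pkg_hash),
    logical(1)
  )
  pkgs[differs]
}

# Simulated later snapshot: the wbpip hash is a placeholder, not a real value.
current <- recorded
current$wbpip$pkg_hash <- "0000000000000000000000000000000000000000"

changed_packages(recorded, current)
#> [1] "wbpip"
```

In a live session, the current snapshot would come from get_pip_info()$pip_packages, and an empty result would indicate that neither package has changed since the recorded run.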