This vignette gives an introduction to the pddcs package.

Fetch data

The most important function in pddcs is fetch_indicator().

fetch_indicator() retrieves data for all available sources and indicators. It takes two basic arguments: indicator, a CETS-type indicator code, and source, the data source to query. Currently available sources are Eurostat, UNPD WPP, UNICEF and WHO. For a list of available indicators, see the package dataset indicatorlist.

# Check for available indicators 
?indicatorlist

To fetch the most recent population data from Eurostat, we simply specify the indicator code ('SP.POP.TOTL') and the source ('eurostat').

# Fetch population data from Eurostat
df <- fetch_indicator('SP.POP.TOTL', source = 'eurostat')

fetch_indicator() always returns a data frame with six columns: iso3c (country code), year (year), indicator (indicator code), value (data value), note (footnote) and source (data source).

# Inspect data frame 
head(df)
#> # A tibble: 6 × 6
#>   iso3c  year indicator     value note  source  
#>   <chr> <dbl> <chr>         <dbl> <chr> <chr>   
#> 1 AUT    1960 SP.POP.TOTL 7047539 NA    eurostat
#> 2 AUT    1961 SP.POP.TOTL 7086299 NA    eurostat
#> 3 AUT    1962 SP.POP.TOTL 7129864 NA    eurostat
#> 4 AUT    1963 SP.POP.TOTL 7175811 NA    eurostat
#> 5 AUT    1964 SP.POP.TOTL 7223801 NA    eurostat
#> 6 AUT    1965 SP.POP.TOTL 7270889 NA    eurostat
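
Because later steps rely on this six-column layout, it can be handy to check a data frame against it before passing it on. The sketch below is only an illustration in base R: has_pddcs_columns() is a hypothetical helper, not a pddcs function, and the example data frame is built by hand from the first two rows shown above.

```r
# Hypothetical helper (not part of pddcs): TRUE if the data frame has exactly
# the six columns described above, in the same order.
has_pddcs_columns <- function(df) {
  expected <- c("iso3c", "year", "indicator", "value", "note", "source")
  identical(names(df), expected)
}

# Hand-built example using the first two rows of the output above
example_df <- data.frame(
  iso3c = c("AUT", "AUT"),
  year = c(1960, 1961),
  indicator = "SP.POP.TOTL",
  value = c(7047539, 7086299),
  note = NA_character_,
  source = "eurostat",
  stringsAsFactors = FALSE
)

has_pddcs_columns(example_df)
```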

Compare with WDI

Before uploading any data to the Data Collection System (DCS), it can be useful to compare it with the current data in WDI. This can be done with compare_with_wdi(), which takes a single argument: a data frame as returned by fetch_indicator().

# Compare with WDI
dl <- compare_with_wdi(df)

compare_with_wdi() returns a list of three data frames: the original dataset (source), the data retrieved from WDI (wdi) and the rows in the source dataset that are not present in WDI (not_in_wdi).

# Inspect list
str(dl)
#> List of 3
#>  $ source    : tibble [1,746 × 6] (S3: tbl_df/tbl/data.frame)
#>   ..$ iso3c    : chr [1:1746] "AUT" "AUT" "AUT" "AUT" ...
#>   ..$ year     : num [1:1746] 1960 1961 1962 1963 1964 ...
#>   ..$ indicator: chr [1:1746] "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" ...
#>   ..$ value    : num [1:1746] 7047539 7086299 7129864 7175811 7223801 ...
#>   ..$ note     : chr [1:1746] NA NA NA NA ...
#>   ..$ source   : chr [1:1746] "eurostat" "eurostat" "eurostat" "eurostat" ...
#>   ..- attr(*, ".internal.selfref")=<externalptr> 
#>  $ wdi       :'data.frame':  13237 obs. of  5 variables:
#>   ..$ iso3c    : chr [1:13237] "ABW" "ABW" "ABW" "ABW" ...
#>   ..$ year     : num [1:13237] 1960 1961 1962 1963 1964 ...
#>   ..$ indicator: chr [1:13237] "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" ...
#>   ..$ value    : num [1:13237] 54208 55434 56234 56699 57029 ...
#>   ..$ source   : chr [1:13237] "wdi" "wdi" "wdi" "wdi" ...
#>  $ not_in_wdi: tibble [116 × 6] (S3: tbl_df/tbl/data.frame)
#>   ..$ iso3c    : chr [1:116] "AUT" "BEL" "CZE" "DEU" ...
#>   ..$ year     : num [1:116] 2020 2020 2020 1960 1961 ...
#>   ..$ indicator: chr [1:116] "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" ...
#>   ..$ value    : num [1:116] 8916864 11544241 10697858 55607705 56273735 ...
#>   ..$ note     : chr [1:116] NA "Preliminary." NA NA ...
#>   ..$ source   : chr [1:116] "eurostat" "eurostat" "eurostat" "eurostat" ...
#>   ..- attr(*, ".internal.selfref")=<externalptr>

Here we keep only the rows of the source dataset that are not present in WDI.

# Select rows with 'new' data
df <- dl$not_in_wdi
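
To make the idea of rows "not present in WDI" concrete, here is a minimal base R sketch of one way such a comparison could work. It assumes that a row counts as present only when iso3c, year, indicator and value all match. That is an assumption made for illustration, and compare_with_wdi() may well use different rules. rows_not_in() is a hypothetical helper, and the values are made-up toy numbers.

```r
# Hypothetical helper (not part of pddcs): keep the rows of `src` whose
# combination of `by` columns does not occur in `ref`.
rows_not_in <- function(src, ref,
                        by = c("iso3c", "year", "indicator", "value")) {
  # Build one text key per row from the comparison columns
  src_key <- do.call(paste, c(src[by], sep = "|"))
  ref_key <- do.call(paste, c(ref[by], sep = "|"))
  src[!src_key %in% ref_key, , drop = FALSE]
}

# Toy data (made-up values, not real population figures)
src <- data.frame(
  iso3c = c("AUT", "AUT", "DEU"),
  year = c(2019, 2020, 1960),
  indicator = "SP.POP.TOTL",
  value = c(100, 200, 300),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  iso3c = c("AUT", "DEU"),
  year = c(2019, 1960),
  indicator = "SP.POP.TOTL",
  value = c(100, 999),
  stringsAsFactors = FALSE
)

# AUT 2020 has no counterpart in `ref`; DEU 1960 exists but with another value
new_rows <- rows_not_in(src, ref)
new_rows
```

Comparing values through text keys is fine for a toy example, but real data with non-integer values would need a tolerance-based comparison instead.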

Format data

DCS requires a very specific format in order to upload files to the system. pddcs has a set of functions to help simplify this process. format_dcs() takes two arguments: df (a data frame in pddcs format) and type, which specifies whether the dataset should be converted to data or metadata format.

# Convert to DCS 'data' format 
df <- format_dcs(df, type = 'data')
head(df)
#> # A tibble: 6 × 5
#>   Time   Country Series      Scale     Data
#>   <chr>  <chr>   <chr>       <dbl>    <dbl>
#> 1 YR2020 AUT     SP.POP.TOTL     0  8916864
#> 2 YR2020 BEL     SP.POP.TOTL     0 11544241
#> 3 YR2020 CZE     SP.POP.TOTL     0 10697858
#> 4 YR1960 DEU     SP.POP.TOTL     0 55607705
#> 5 YR1961 DEU     SP.POP.TOTL     0 56273735
#> 6 YR1962 DEU     SP.POP.TOTL     0 56918197
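
Judging from the output above, the 'data' format appears to be a column mapping: year becomes Time with a 'YR' prefix, iso3c becomes Country, indicator becomes Series, value becomes Data, and Scale is 0. The base R sketch below reproduces that mapping for illustration only. to_dcs_data() is a hypothetical function, not the implementation of format_dcs(), and the constant Scale of 0 is inferred from the printed output rather than documented here.

```r
# Hypothetical re-implementation for illustration (not the pddcs code):
# map the pddcs columns to the DCS 'data' layout shown above.
to_dcs_data <- function(df) {
  data.frame(
    Time = paste0("YR", df$year),   # e.g. 2020 -> "YR2020"
    Country = df$iso3c,
    Series = df$indicator,
    Scale = rep(0, nrow(df)),       # assumption: always 0, as in the output above
    Data = df$value,
    stringsAsFactors = FALSE
  )
}

# Two rows taken from the not_in_wdi output shown earlier
pddcs_df <- data.frame(
  iso3c = c("AUT", "BEL"),
  year = c(2020, 2020),
  indicator = "SP.POP.TOTL",
  value = c(8916864, 11544241),
  note = c(NA, "Preliminary."),
  source = "eurostat",
  stringsAsFactors = FALSE
)

dcs_df <- to_dcs_data(pddcs_df)
dcs_df
```

Note that the note and source columns are dropped in this layout, which matches the five columns in the output above.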

Write data

We can then write this data frame to a file of our choosing with write_dcs().

write_dcs() checks that the data has the correct format and ensures that the resulting file has the correct sheet names (i.e. 'Sheet1' for data and 'Country-Series-Time_Table' for metadata).

# Write to a DCS formatted file  
write_dcs(df, path = 'data-SP.POP.TOTL-eurostat.xlsx', type = 'data')
#> File saved to data-SP.POP.TOTL-eurostat.xlsx.