This vignette gives an introduction to the pddcs package.
The most important function in pddcs is fetch_indicator().
fetch_indicator() retrieves data for all available sources and indicators. It takes two basic arguments: indicator, a CETS-type indicator code, and source, the data source to query. Currently available sources are Eurostat, UNPD WPP, UNICEF and WHO. For a list of available indicators, see the package dataset indicatorlist.
# Check for available indicators
?indicatorlist
To fetch the most recent population data from Eurostat, we simply specify the indicator code ('SP.POP.TOTL') and the source ('eurostat').
# Fetch population data from Eurostat
df <- fetch_indicator('SP.POP.TOTL', source = 'eurostat')
fetch_indicator() always returns a data frame with six columns: iso3c (country code), year (year), indicator (indicator code), value (data value), note (footnote) and source (data source).
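As a quick sanity check on that layout, the sketch below tests whether a data frame has exactly these six columns in this order. The helper has_pddcs_layout() and the hand-made example_df are hypothetical and not part of pddcs; the two rows reuse values from the Eurostat example in this vignette.

```r
# Hypothetical helper (not part of pddcs): checks the six-column layout
# that fetch_indicator() is described as returning.
expected_cols <- c("iso3c", "year", "indicator", "value", "note", "source")

has_pddcs_layout <- function(df) {
  is.data.frame(df) && identical(names(df), expected_cols)
}

# Small hand-made data frame; values taken from the Eurostat example
example_df <- data.frame(
  iso3c = c("AUT", "AUT"),
  year = c(1960, 1961),
  indicator = "SP.POP.TOTL",
  value = c(7047539, 7086299),
  note = NA_character_,
  source = "eurostat",
  stringsAsFactors = FALSE
)

has_pddcs_layout(example_df)          # TRUE
has_pddcs_layout(example_df[, 1:5])   # FALSE, the source column is missing
```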
# Inspect data frame
head(df)
#> # A tibble: 6 × 6
#>   iso3c  year indicator     value note  source
#>   <chr> <dbl> <chr>         <dbl> <chr> <chr>
#> 1 AUT    1960 SP.POP.TOTL 7047539 NA    eurostat
#> 2 AUT    1961 SP.POP.TOTL 7086299 NA    eurostat
#> 3 AUT    1962 SP.POP.TOTL 7129864 NA    eurostat
#> 4 AUT    1963 SP.POP.TOTL 7175811 NA    eurostat
#> 5 AUT    1964 SP.POP.TOTL 7223801 NA    eurostat
#> 6 AUT    1965 SP.POP.TOTL 7270889 NA    eurostat
Before uploading any data to the Data Collection System (DCS), it can be useful to compare it with the current data in WDI. This can be done with compare_with_wdi(), which takes a single argument: a data frame as returned by fetch_indicator().
# Compare with WDI
dl <- compare_with_wdi(df)
compare_with_wdi() returns a list of three data frames: the original dataset (source), the data retrieved from WDI (wdi) and the rows in the source dataset that are not present in WDI (not_in_wdi).
# Inspect list
str(dl)
#> List of 3
#> $ source : tibble [1,746 × 6] (S3: tbl_df/tbl/data.frame)
#> ..$ iso3c : chr [1:1746] "AUT" "AUT" "AUT" "AUT" ...
#> ..$ year : num [1:1746] 1960 1961 1962 1963 1964 ...
#> ..$ indicator: chr [1:1746] "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" ...
#> ..$ value : num [1:1746] 7047539 7086299 7129864 7175811 7223801 ...
#> ..$ note : chr [1:1746] NA NA NA NA ...
#> ..$ source : chr [1:1746] "eurostat" "eurostat" "eurostat" "eurostat" ...
#> ..- attr(*, ".internal.selfref")=<externalptr>
#> $ wdi :'data.frame': 13237 obs. of 5 variables:
#> ..$ iso3c : chr [1:13237] "ABW" "ABW" "ABW" "ABW" ...
#> ..$ year : num [1:13237] 1960 1961 1962 1963 1964 ...
#> ..$ indicator: chr [1:13237] "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" ...
#> ..$ value : num [1:13237] 54208 55434 56234 56699 57029 ...
#> ..$ source : chr [1:13237] "wdi" "wdi" "wdi" "wdi" ...
#> $ not_in_wdi: tibble [116 × 6] (S3: tbl_df/tbl/data.frame)
#> ..$ iso3c : chr [1:116] "AUT" "BEL" "CZE" "DEU" ...
#> ..$ year : num [1:116] 2020 2020 2020 1960 1961 ...
#> ..$ indicator: chr [1:116] "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" "SP.POP.TOTL" ...
#> ..$ value : num [1:116] 8916864 11544241 10697858 55607705 56273735 ...
#> ..$ note : chr [1:116] NA "Preliminary." NA NA ...
#> ..$ source : chr [1:116] "eurostat" "eurostat" "eurostat" "eurostat" ...
#> ..- attr(*, ".internal.selfref")=<externalptr>
Here we select the rows of the source dataset that are not present in WDI.
# Select rows with 'new' data
df <- dl$not_in_wdi
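To make the idea behind not_in_wdi concrete, here is a minimal base R sketch of one way such a comparison could work. It is an illustration only, not the pddcs implementation: the helper rows_not_in() is hypothetical, and it assumes rows are matched on the iso3c, year and indicator columns alone (the package may also compare values). The wdi_toy data frame is a made-up stand-in for the WDI data.

```r
# Hypothetical sketch: keep the rows of `src` whose key
# (iso3c, year, indicator) does not occur in `wdi`.
# Assumption: matching uses the key columns only, not the values.
rows_not_in <- function(src, wdi, keys = c("iso3c", "year", "indicator")) {
  src_key <- do.call(paste, c(src[keys], sep = "\r"))
  wdi_key <- do.call(paste, c(wdi[keys], sep = "\r"))
  src[!src_key %in% wdi_key, , drop = FALSE]
}

# Toy source data (values reused from the Eurostat example)
src_toy <- data.frame(
  iso3c = c("AUT", "AUT", "BEL"),
  year = c(1960, 2020, 2020),
  indicator = "SP.POP.TOTL",
  value = c(7047539, 8916864, 11544241),
  stringsAsFactors = FALSE
)

# Toy stand-in for WDI: only AUT 1960 is already present
wdi_toy <- data.frame(
  iso3c = "AUT",
  year = 1960,
  indicator = "SP.POP.TOTL",
  stringsAsFactors = FALSE
)

new_rows <- rows_not_in(src_toy, wdi_toy)
new_rows$iso3c   # "AUT" "BEL"
new_rows$year    # 2020 2020
```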
DCS requires a very specific format in order to upload files to the system. pddcs has a set of functions to help simplify this process. format_dcs() takes two arguments: df (a data frame in pddcs format) and type, which specifies whether the dataset should be converted to data or metadata format.
# Convert to DCS 'data' format
df <- format_dcs(df, type = 'data')
head(df)
#> # A tibble: 6 × 5
#>   Time   Country Series      Scale     Data
#>   <chr>  <chr>   <chr>       <dbl>    <dbl>
#> 1 YR2020 AUT     SP.POP.TOTL     0  8916864
#> 2 YR2020 BEL     SP.POP.TOTL     0 11544241
#> 3 YR2020 CZE     SP.POP.TOTL     0 10697858
#> 4 YR1960 DEU     SP.POP.TOTL     0 55607705
#> 5 YR1961 DEU     SP.POP.TOTL     0 56273735
#> 6 YR1962 DEU     SP.POP.TOTL     0 56918197
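Judging from the output above, the 'data' format appears to be a straightforward column mapping. The sketch below reproduces that mapping in base R for illustration. The helper to_dcs_data() is hypothetical and is not how format_dcs() is necessarily implemented; in particular, the constant Scale of 0 is an assumption based only on the rows shown here.

```r
# Hypothetical re-creation of the 'data' layout shown above.
# Assumption: Scale is always 0, as in the printed example.
to_dcs_data <- function(df) {
  data.frame(
    Time = paste0("YR", df$year),   # e.g. 2020 becomes "YR2020"
    Country = df$iso3c,
    Series = df$indicator,
    Scale = 0,
    Data = df$value,
    stringsAsFactors = FALSE
  )
}

# Two rows in pddcs format (values taken from the example above)
pddcs_toy <- data.frame(
  iso3c = c("AUT", "BEL"),
  year = c(2020, 2020),
  indicator = "SP.POP.TOTL",
  value = c(8916864, 11544241),
  note = c(NA, "Preliminary."),
  source = "eurostat",
  stringsAsFactors = FALSE
)

dcs_toy <- to_dcs_data(pddcs_toy)
names(dcs_toy)   # "Time" "Country" "Series" "Scale" "Data"
dcs_toy$Time     # "YR2020" "YR2020"
```

Note that the note and source columns do not appear in this layout.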
We can then write this data frame to a file of our choosing with write_dcs().
write_dcs() checks that the data has the correct format and ensures that the resulting file has the correct sheet names (i.e. 'Sheet1' for data and 'Country-Series-Time_Table' for metadata).
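The kind of checks described here can be sketched in a few lines of base R. Both helpers below are hypothetical and only illustrate the idea; they are not the pddcs implementation, and the data column names are taken from the format_dcs() output shown earlier.

```r
# Hypothetical helper: map the dataset type to the sheet name DCS expects.
dcs_sheet_name <- function(type = c("data", "metadata")) {
  type <- match.arg(type)   # errors on anything other than the two types
  switch(type,
    data = "Sheet1",
    metadata = "Country-Series-Time_Table"
  )
}

# Hypothetical helper: check the column layout of the 'data' format.
has_dcs_data_columns <- function(df) {
  identical(names(df), c("Time", "Country", "Series", "Scale", "Data"))
}

dcs_sheet_name("data")       # "Sheet1"
dcs_sheet_name("metadata")   # "Country-Series-Time_Table"
```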