help pip intro Poverty and Inequality Platform (PIP)
return to pip https://worldbank.github.io/pip/
-----------------------------------------------------------------------------------------------------------
This entry aims to equip new users with basic knowledge of the most useful features of the pip
command. For a more detailed explanation of each subcommand, please visit the subcommand directory
Remarks
This introduction is presented under the following headings
Description
Basic Use
Country-level
Regional/global-level
Poverty Lines
Data Availability
Towards distributional analysis
Auxiliary Data
Examples
Description
The pip command allows Stata users to compute poverty and inequality indicators for over 160
countries in the World Bank's database of household surveys. The Poverty and Inequality Platform
(PIP) is a computational tool that allows users to conduct country-specific, cross-country, as well
as global and regional poverty analyses. Users are able to estimate rates over time and at any
poverty line specified. pip reports a wide range of measures for poverty (at any chosen poverty
line) and inequality. See full list of indicators available in pip .
modular structure: The pip command works in a modular (subcommand, hereafter) fashion. There is no
instruction to pip that is executed outside a particular subcommand. When no subcommand is
invoked--as in pip, clear--the subcommand cl (country-level estimates) is in use. Thus, understanding
pip fully is equivalent to understand each subcommand and its options fully. For a list of all
subcommands and their corresponding help entries, visit the subcommand directory
welfare aggregate: To make estimates comparable across countries, the welfare aggregate is expressed
in PPP values of the most recent ICP round that has been approved for global poverty estimates by the
directives of the World Bank. The detailed methodology of the welfare aggregate conversion can be
found in the Poverty and Inequality Platform Methodology Handbook.
Collaboration: PIP is the result of a close collaboration between World Bank staff across the
Development Data Group, the Development Research Group, and the Poverty and Inequality Global
Practice.
Basic use
The main functionality of pip is to compute poverty and inequality indicators for over 160 countries
in the World Bank's database of household surveys. Poverty measures are estimated at two levels of
aggregation: country-level and regional/global-level, which you can access using the subcommands cl
(the default) and wb, respectively. For a detailed explanation of cl and wb go here.
country-level
For instance, you can query poverty at $2.15-a-day poverty line for all countries in all survey years
. pip cl, clear
Yet, you can filter your query by country and survey year. For examples, for Morocco in 2013. Visit
the list of countries and regions.
. pip, country(mar) year(2013) clear // the default subcommand is cl
For extrapolated and interpolated data that underpin the global and regional poverty numbers, use
fillgaps option. There is no survey in Morocco in 2019, but you can estimate it
. pip, country(mar) year(2019) fillgaps clear
Note: Extrapolated and interpolated values are made available for transparency purposes only and
are NOT intended to be use four country-level analysis, as they are originally calculated to
estimated global poverty.
regional/global-level
To get poverty estimates at the regional/global-level, just switch the cl subcommand for wb
. pip wb, clear
Query a particular region using region() options. Visit the list of countries and regions.
. pip wb, clear region(LAC)
Poverty lines
By default, pip estimate poverty measures at the international poverty line of the current ICP round.
For the 2017 round, the value is $2.15-a-day. However, you can poverty at different thresholds.
. pip, country(mar) year(2019) fillgaps povline(6.85)
You can query multiple poverty lines.
. pip, country(mar) year(2019) fillgaps povline(2.15 3.65 6.85 10)
Data availability
Display data availability by country and region
. pip info
If data is not available for a particular survey year, pip will return and error but will provide you
with a clickable hyperlink to find out the survey availability for the country of interest.
. pip, country(mar) year(2019) clear
Add error message
Towards distributional analysis
The default use of pip is to get the poverty headcount. However, the inverse operation is also
available in pip. You can provide the share of the population using popshare() and get in return the
monetary value of the welfare distribution. For example, you can estimate the median like this
.pip, country(mar) year(2013) clear popshare(0.5)
Auxiliary data
Though the main underlying data of PIP are the household surveys, it makes use of many other sources
such as GDP, CPI, population, among many others. You can get the list of all auxiliary tables, and
click in the one you want.
.pip tables, clear
Or, you can provide the name of the table of interest directly,
.pip tables, table(cpi) clear
Examples
The examples below do not comprehend of pip's features.
1. Basic examples
1.1. Load latest available survey-year estimates for Colombia and Argentina
pip cl, country(col arg) year(last) clear
1.2. Load clickable menu
pip, info
1.3. Load only urban coverage level
pip cl, country(all) coverage("urban") clear
2. Differences between queries
2.1. Country estimation at $2.15 in 2015. Since there are no surveys in ARG in 2015, results are
loaded only for COL, BRA and IND.
pip, country(COL BRA ARG IND) year(2015) clear
2.2. Reference-year estimation. Filling gaps for ARG and moving the IND estimate from 2015-2016 to
2015. Only works for reference years.
pip, country(COL BRA ARG IND) year(2015) clear fillgaps
2.4. World Bank aggregation (country() is not available)
pip wb, clear year(2015)
pip wb, clear region(SAR LAC)
pip wb, clear // all regions and reference years
3. Samples uniquely identified by country/year
3.1 Longest possible time series for each country, even if welfare type or survey coverage
changes from one year to another (national coverage is preferred).
pip, clear
* Prepare reporting_level variable
label define level 3 "national" 2 "urban" 1 "rural"
encode reporting_level, gen(reporting_level_2) label(level)
* keep only national when more than one is available
bysort country_code welfare_type year: egen _ncover = count(reporting_level_2)
gen _tokeepn = ( (inlist(reporting_level_2, 3, 4) & _ncover > 1) | _ncover == 1)
keep if _tokeepn == 1
* Keep longest series per country
by country_code welfare_type, sort: gen _ndtype = _n == 1
by country_code : replace _ndtype = sum(_ndtype)
by country_code : replace _ndtype = _ndtype[_N] // number of welfare_type per country
duplicates tag country_code year, gen(_yrep) // duplicate year
bysort country_code welfare_type: egen _type_length = count(year) // length of type series
bysort country_code: egen _type_max = max(_type_length) // longest type series
replace _type_max = (_type_max == _type_length)
* in case of same length in series, keep consumption
by country_code _type_max, sort: gen _ntmax = _n == 1
by country_code : replace _ntmax = sum(_ntmax)
by country_code : replace _ntmax = _ntmax[_N] // number of welfare_type per country
gen _tokeepl = ((_type_max == 1 & _ntmax == 2) | ///
(welfare_type == 1 & _ntmax == 1 & _ndtype == 2) | ///
_yrep == 0)
keep if _tokeepl == 1
drop _*
(click to run)
3.2 Longest possible time series for each country, restrict to same welfare type throughout, but
letting survey coverage vary (preferring national).
pip, clear
* Prepare reporting_level variable
label define level 3 "national" 2 "urban" 1 "rural"
encode reporting_level, gen(reporting_level_2) label(level)
bysort country_code welfare_type year: egen _ncover = count(reporting_level_2)
gen _tokeepn = ( (inlist(reporting_level_2, 3, 4) & _ncover > 1) | _ncover == 1)
keep if _tokeepn == 1
* Keep longest series per country
by country_code welfare_type, sort: gen _ndtype = _n == 1
by country_code : replace _ndtype = sum(_ndtype)
by country_code : replace _ndtype = _ndtype[_N] // number of welfare_type per country
bysort country_code welfare_type: egen _type_length = count(year)
bysort country_code: egen _type_max = max(_type_length)
replace _type_max = (_type_max == _type_length)
* in case of same length in series, keep consumption
by country_code _type_max, sort: gen _ntmax = _n == 1
by country_code : replace _ntmax = sum(_ntmax)
by country_code : replace _ntmax = _ntmax[_N] // max
gen _tokeepl = ((_type_max == 1 & _ntmax == 2) | ///
(welfare_type == 1 & _ntmax == 1 & _ndtype == 2)) | ///
_ndtype == 1
keep if _tokeepl == 1
drop _*
(click to run)
3.3 Longest series for a country with the same welfare type. Not necessarily the latest
pip, clear
*Series length by welfare type
bysort country_code welfare_type: gen series = _N
*Longest
bysort country_code : egen longest_series=max(series)
tab country_code if series !=longest_series
keep if series == longest_series
*2. If same length: keep most recent
bys country_code welfare_type series: egen latest_year=max(year)
bysort country_code: egen most_recent=max(latest_year)
tab country_code if longest_series==series & latest_year!=most_recent
drop if most_recent>latest_year
*3. Not Applicable: if equal length and most recent: keep consumption
bys country_code: egen preferred_welfare=min(welfare_type)
drop if welfare_type != preferred_welfare
(click to run)
4. Analytical examples
4.1 Graph of trend in poverty headcount ratio and number of poor for the world
pip wb, clear
keep if year > 1989
keep if region_code == "WLD"
gen poorpop = headcount*population / 1000000
gen hcpercent = round(headcount*100, 0.1)
gen poorpopround = round(poorpop, 1)
twoway (sc hcpercent year, yaxis(1) mlab(hcpercent) ///
mlabpos(7) mlabsize(vsmall) c(l)) ///
(sc poorpopround year, yaxis(2) mlab(poorpopround) ///
mlabsize(vsmall) mlabpos(1) c(l)), ///
yti("Poverty Rate (%)" " ", size(small) axis(1)) ///
ylab(0(10)40, labs(small) nogrid angle(0) axis(1)) ///
yti("Number of Poor (million)", size(small) axis(2)) ///
ylab(0(400)2000, labs(small) angle(0) axis(2)) ///
xlabel(,labs(small)) xtitle("Year", size(small)) ///
graphregion(c(white)) ysize(5) xsize(5) ///
legend(order( ///
1 "Poverty Rate (% of people living below $2.15)" ///
2 "Number of people who live below $2.15") si(vsmall) ///
row(2)) scheme(s2color)
(click to run)
4.2 Graph of trends in poverty headcount ratio by region, multiple poverty lines ($2.15, $3.65,
$6.85)
pip wb, povline(2.15 3.65 6.85) clear
drop if inlist(region_code, "OHI", "WLD") | year<1990
keep poverty_line region_name year headcount
replace poverty_line = poverty_line*100
replace headcount = headcount*100
tostring poverty_line, replace format(%12.0f) force
reshape wide headcount,i(year region_name) j(poverty_line) string
local title "Poverty Headcount Ratio (1990-2019), by region"
twoway (sc headcount215 year, c(l) msiz(small)) ///
(sc headcount365 year, c(l) msiz(small)) ///
(sc headcount685 year, c(l) msiz(small)), ///
by(reg, title("`title'", si(med)) ///
note("Source: pip", si(vsmall)) graphregion(c(white))) ///
ylabel(, format(%2.0f)) ///
xlab(1990(5)2019 , labsi(vsmall)) xti("Year", si(vsmall)) ///
ylab(0(25)100, labsi(vsmall) angle(0)) ///
yti("Poverty headcount (%)", si(vsmall)) ///
leg(order(1 "$2.15" 2 "$3.65" 3 "$6.85") r(1) si(vsmall)) ///
sub(, si(small)) scheme(s2color)
(click to run)
4.3 Graph of population distribution across income categories in Latin America, by country
pip, region(lac) year(last) povline(2.15 3.65 6.85) clear
keep if welfare_type==2 & year>=2014 // keep income surveys
keep poverty_line country_code country_name year headcount
replace poverty_line = poverty_line*100
replace headcount = headcount*100
tostring poverty_line, replace format(%12.0f) force
reshape wide headcount,i(year country_code country_name ) j(poverty_line) string
gen percentage_0 = headcount215
gen percentage_1 = headcount365 - headcount215
gen percentage_2 = headcount685 - headcount365
gen percentage_3 = 100 - headcount685
keep country_code country_name year percentage_*
reshape long percentage_,i(year country_code country_name ) j(category)
la define category 0 "Extreme poor (< $2.15)" 1 "Poor LIMIC ($2.15-$3.65)" ///
2 "Poor UMIC ($3.65-$6.85)" 3 "Non-poor (> $6.85)"
la val category category
la var category ""
local title "Distribution of Income in Latin America and Caribbean, by country"
local note "Source: World Bank PIP, using the latest survey after 2014 for each country."
local yti "Population share in each income category (%)"
graph bar (mean) percentage, inten(*0.7) o(category) o(country_code, ///
lab(labsi(small) angle(vertical)) sort(1) descending) stack asy ///
blab(bar, pos(center) format(%3.1f) si(tiny)) ///
ti("`title'", si(small)) note("`note'", si(*.7)) ///
graphregion(c(white)) ysize(6) xsize(6.5) ///
legend(si(vsmall) r(3)) yti("`yti'", si(small)) ///
ylab(,labs(small) nogrid angle(0)) scheme(s2color)
(click to run)
(Go up to Sections Menu)