0.a. Goal

Goal 3: Ensure healthy lives and promote well-being for all at all ages

0.b. Target

Target 3.3: By 2030, end the epidemics of AIDS, tuberculosis, malaria and neglected tropical diseases and combat hepatitis, water-borne diseases and other communicable diseases

0.c. Indicator

Indicator 3.3.4: Hepatitis B incidence per 100,000 population

0.d. Series

Not applicable

0.e. Metadata update

2021-04-01

0.g. International organisation(s) responsible for global monitoring

World Health Organization

1.a. Organisation

World Health Organization

2.a. Definition and concepts

Definition:

This indicator is measured indirectly through the proportion of children 5 years of age who have developed chronic hepatitis B virus (HBV) infection (i.e. the proportion that tests positive for a marker of infection called hepatitis B surface antigen [HBsAg]).1

Hepatitis B surface antigen: a protein from the virus's coat. A positive test for HBsAg indicates active HBV infection. The immune response to HBsAg provides the basis for immunity against HBV, and HBsAg is the main component of the hepatitis B vaccine (HepB).2

Concepts:

It is not possible to differentiate hepatitis B from hepatitis caused by other viral agents on clinical grounds; laboratory confirmation of the diagnosis is therefore essential. The hepatitis B surface antigen (HBsAg) test is the most common hepatitis B test. The presence of HBsAg in serum indicates that the patient is infected with HBV. The measurement of HBsAg levels has been standardized in IU/ml. The test is used to identify those at risk of spreading the disease. HBsAg, an HBV viral coat antigen, is produced in large quantities in the cytoplasm of infected cells and continues to be produced in patients with chronic, active HBV infection. Documented HBsAg positivity in serum for 6 or more months suggests chronic HBV infection with a low likelihood of subsequent spontaneous resolution.

2.b. Unit of measure

Prevalence of hepatitis B surface antigen (HBsAg) in children under five years of age (proportion with chronic infection)

3.a. Data sources

A systematic search was conducted for articles published between Jan 1, 1965, and Oct 30, 2018, in the databases Embase, PubMed, Global Index Medicus, Popline, and Web of Science.

Following full-text review, we extracted data from each study using the following variables: study characteristics (study and sample collection dates; study location, i.e. city, subnational [an area, region, state, or province in a country] or national level), participant characteristics (age range, sex, year, and population group), prevalence of the HBV marker, type of laboratory test, and the number of participants on whom the HBV marker prevalence was based.

Data from eligible articles were entered into a Microsoft EXCEL® and/or Distiller database by two reviewers independently. Information was extracted on author name, year, age, gender, marker, laboratory test used, number of individuals tested, prevalence of each marker when reported, population group (general population, health-care workers [HCWs] or blood donors), whether the data reported were for a city, sub-national (an area, region, state or province in a country) or national level, and GDP per capita. In addition to HBsAg, HBeAg was recorded where available, provided HBsAg was also reported for the same individuals. To record information on methodological quality and on study bias resulting from non-representativeness, an additional variable was used: samples likely to be representative of the country/area specified were coded as 0, and others (e.g. convenience samples in certain communities or tribes in the country) were assigned a 1, supplemented by additional information. This risk-of-bias/non-representativeness variable was applied only if the population was neither HCWs nor blood donors (see description below).3 The variables extracted from the studies and the assumptions made are described in detail below:

  1. Author, Date
  2. Year start/end of study conduct: The years in which the study began and ended were extracted. If this information was not available from the study, we used the common assumption that the study was conducted two years before the year of publication (e.g. author, 2000, year of study conduct: 1998).
  3. Sex: Sex-specific values were extracted. If only an overall (all) estimate was provided, the share of females in the study was recorded in the "additional information" column.
  4. Age start/end: The most specific age-group provided by the data was extracted. If the age-group on which the parameter value was based was not reported, assumptions were made based on the context of the study. The following rules were applied when information on the age-groups of the study population was missing:
       a. If the study was conducted in the general population without further specification and only one prevalence estimate was provided, the age-group was taken to be 0-85 years. If the lower bound of the youngest age-group or the upper bound of the oldest age-group was missing, it was set to 1 year or 85 years, respectively.
       b. If the study was conducted among adult populations but no age-range was provided, the age-group was taken to be 17-65 years.
       c. If the study was conducted among pupils but no age-range was provided, the age-group was taken to be 5-15 years.
       d. If the study was conducted among pregnant women but no age-range was provided, the age-group was taken to be 15-49 years (reproductive age).
       e. If the study was conducted among blood donors but no age-range was provided, the age-group was taken to be 17-65 years.
       f. If the study was conducted among army recruits or soldiers but no age-range was provided, the age-group was taken to be 18-45 years.
       g. If the study was conducted among the working population but no age-range was provided, the age-group was taken to be 16-65 years.
  5. HBsAg Prevalence: The most specific prevalence estimate provided by the data was extracted (defined by age-/sex-/year-prevalence). Separate lines for each marker were used in the data extraction file (e.g. one line for HBeAg and one for HBsAg, even if the study group/publication was the same).
  6. HBeAg Prevalence (optional marker): The most specific prevalence estimate (defined by age-/sex-/year-prevalence) of HBeAg among HBsAg-positive individuals was extracted and, where applicable, calculated to reflect prevalence among HBsAg carriers.
  7. anti-HBc Prevalence (optional marker): The most specific prevalence estimate provided by the data was extracted (defined by age-/sex-/year-prevalence).
  8. Laboratory method: Testing for immune response markers of HBV infection began in the 1970s with the counter-immunoelectrophoresis (CIEP) technique. Since then, different detection methods have been developed (RIA, EIA, …). The method most applied in prevalence studies is ELISA (enzyme-linked immunosorbent assay). Five categories were established to record the method/test used for prevalence detection in the studies: ELI new (ELISA -2, -3, EIA, …); EIA old (CMIA, CIEP, RPHA); NAT (qPCR/real-time PCR, nested PCR, multiplex PCR); other (e.g. RIA); and unknown/not specified.
  9. Country: Country names were recorded according to www.who.int and, for additional analyses, grouped into the six WHO regions: the African Region, the Region of the Americas, the Eastern Mediterranean Region, the European Region, the South-East Asia Region and the Western Pacific Region.
  10. Sample size (individuals from whom blood was drawn; individuals included in the analysis/basis for the parameter estimate): As a quality indicator of the study, we distinguished the effective sample size, i.e. the number of individuals included in the analysis on whom the parameter estimate is based, from the number of individuals from whom blood was drawn (separate column) and from the initially calculated/planned sample size (separate column).
  11. Population: Although the focus was on the general population, two additional groups were included and specified: HCWs and blood donors (the latter with the subgroups unspecified, paid, and unpaid/voluntary). If the "population" column specified HCW or blood donor rather than general population, the following risk-of-bias column was left empty.
  12. Level: Records whether the study was conducted at national, sub-national or city level, or whether the level was not further specified (four categories).
  13. Study Location: This free-text variable specifies the city/area within the country where the included study was conducted. The variables/columns Level and Study Location were added following the WHO Meeting on Impact of Hepatitis B Vaccination, held at WHO, Geneva, in March 2014.
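The age-group defaults for studies without a reported age range amount to a small lookup table. The sketch below is a hypothetical Python illustration, not the extraction tool actually used (data were entered in Excel/Distiller). The population keys are invented labels. Applying the rule that a missing youngest/oldest bound becomes 1 or 85 years to every population is an assumption. The risk-of-bias coding follows the 0/1 convention and the HCW/blood-donor exception described above.

```python
from typing import Dict, Optional, Tuple

# Default age ranges (years) used when a study reports no age range.
# The dictionary keys are hypothetical labels chosen for this sketch.
DEFAULT_AGE_RANGES: Dict[str, Tuple[int, int]] = {
    "general": (0, 85),          # general population, single estimate
    "adult": (17, 65),
    "pupils": (5, 15),
    "pregnant_women": (15, 49),  # reproductive age
    "blood_donors": (17, 65),
    "military": (18, 45),        # army recruits or soldiers
    "working": (16, 65),
}


def impute_age_range(population: str,
                     age_start: Optional[int],
                     age_end: Optional[int]) -> Tuple[int, int]:
    """Return (start, end) age, filling gaps with the extraction rules."""
    if age_start is None and age_end is None:
        # No age information at all: use the population default
        # (raises KeyError for populations without a stated rule).
        return DEFAULT_AGE_RANGES[population]
    # Otherwise keep reported bounds; a missing lower bound becomes 1 year
    # and a missing upper bound 85 years (assumed for all populations).
    return (1 if age_start is None else age_start,
            85 if age_end is None else age_end)


def risk_of_bias_code(population: str, representative: bool) -> Optional[int]:
    """0 = likely representative, 1 = not representative.

    Left empty (None) for HCW and blood-donor studies.
    """
    if population in ("hcw", "blood_donors"):
        return None
    return 0 if representative else 1


print(impute_age_range("pregnant_women", None, None))  # (15, 49)
print(impute_age_range("general", None, 40))           # (1, 40)
print(risk_of_bias_code("general", False))             # 1
```

A plain dictionary keeps the rules auditable: each default corresponds to one stated rule, and a population without a rule fails loudly instead of silently receiving a guessed range.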

Additional data from sources other than the eligible studies:

  1. Year of vaccine introduction in the entire country: data are derived from official reports by WHO Member States and, unless otherwise stated, are reported annually through the WHO/UNICEF joint reporting process. http://www.who.int/entity/immunization/monitoring_surveillance/data/year_vaccine_introduction.xls?ua=1
  2. Period when the study was conducted: pre-vaccination or post-vaccination, determined according to the year of vaccine introduction in the whole country.
  3. Coverage estimates series: data are obtained from WUENIC (WHO/UNICEF Estimates of National Immunization Coverage): http://apps.who.int/immunization_monitoring/globalsummary/timeseries/tswucoveragebcg.html
  4. GDP per capita: taken from UNdata, which compiles information from the World Bank (source: http://data.un.org/Data.aspx?q=GDP&d=SNAAMA&f=grID%3a101%3bcurrID%3aUSD%3bpcFlag%3a1).
  5. Longitude and latitude data (source: www.google.com).
  6. Population structure and size data for each country were obtained from the UN Population Division: http://www.un.org/en/development/desa/population/

3.b. Data collection method

WHO provides Member States the opportunity to review and comment on data as part of the so-called country consultation process. Member States receive an annex with their country-specific estimates, the serosurveys used to inform the mathematical model and a summary of the methodology. They are given sufficient time to submit any additional studies, which are screened according to the inclusion and exclusion criteria.

3.c. Data collection calendar

The systematic review of published serosurveys and the model estimates are updated annually. An update was planned for the last quarter of 2019.

3.d. Data release calendar

Second quarter of each year

3.e. Data providers

World Health Organization

3.f. Data compilers

World Health Organization

4.a. Rationale

The purpose is to describe the reduction in chronic hepatitis B infections. Most of the burden of disease from HBV infection comes from infections acquired before the age of 5 years. Therefore, prevention of HBV infection focuses on children under 5 years of age. The United Nations selected the cumulative incidence of chronic HBV infection at 5 years of age as an indicator of the Sustainable Development Goal target for "combating hepatitis". This indicator is measured indirectly through the proportion of children 5 years of age who have developed chronic HBV infection (i.e. the proportion that tests positive for a marker of infection called hepatitis B surface antigen [HBsAg]).

4.b. Comment and limitations

The main limitation of the analysis is that, despite the thorough and in-depth literature search, there are fewer data from post-vaccination studies than from pre-vaccination studies. The model is largely informed by pre-vaccination studies in adults.

The quality of studies and data was assessed by reviewing the representativeness of sampling. The bias factor is a dichotomous variable (0 = sample likely to be representative, 1 = not representative).

Potentially important biases included the geographical representativeness of the data points. In addition, studies came from many different sources, such as blood donors and pregnant women. Blood donors possibly have a lower hepatitis B prevalence than the general population, as donor questionnaires often exclude individuals with risk factors for blood-borne diseases, whereas pregnant women possibly have a higher prevalence, as they were included in studies assessing the effect of a birth dose of vaccine in preventing vertical transmission. As the number and size of studies in blood donors were significantly greater than those in pregnant women, we may presume that our pre-vaccination prevalence estimates are on the low side.

4.c. Method of computation

The data were modelled using a Bayesian logistic regression of the proportion of individuals testing positive for HBsAg in each study, weighting each study by its size, with a conditional autoregressive (CAR) model accounting for spatial and economic correlations between similar countries. This model uses data from well-sampled countries to estimate prevalence in more data-poor countries, accounting for effects such as sex, age and vaccination status; estimates are also informed by each country's geographic and GDP proximity to other countries (the CAR component). This rests on the assumption that countries that are close together economically and/or geographically will have more similar prevalence, owing to similar social structures and health-care capabilities.

The response variable in the model was the prevalence of hepatitis B surface antigen (HBsAg), and the explanatory variables were age (three categories: under 5, juvenile (5-15) and adult (16+), assigned using the average age of participants in the study), sex (proportion female in the study), study bias (e.g. a high fraction of study participants from indigenous populations), 3-dose vaccine coverage, birth dose of the vaccine and country of study. The coverage of routine 3-dose vaccination and birth-dose vaccination in each study was calculated by cross-referencing the study year and the age of participants with the corresponding WHO-UNICEF vaccine coverage estimates for that country. More explicitly, the model uses the ages of participants and the timing of the study to calculate the years in which the participants were born: for example, for an age group of 10-15 years in a study undertaken in 2015, the birth years would be 2000-2005. We then average the WHO-UNICEF vaccination coverage estimates across those birth years, assuming that each age was evenly represented in that age group in the study. The same process was used for 3-dose and birth-dose vaccination. The WHO-UNICEF estimates are annual figures for the country as a whole and contain no information on vaccine efficacy, which was therefore not used in the analysis, as no data on it were obtained. Vaccine efficacy is instead estimated implicitly, since vaccination has a variable effect across time and space among the studies.
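The age categorisation and the cohort-based coverage averaging described above can be sketched in Python as follows. This is a hypothetical illustration, not the code used for the estimates. The function names and the coverage dictionary are invented for the example. The birth-year range is treated as inclusive (2000 to 2005 for ages 10-15 in 2015). Birth years missing from the coverage series count as 0% coverage (e.g. before vaccine introduction). Mean ages between 15 and 16 are assigned to the adult category. All of these are assumptions.

```python
from statistics import mean
from typing import Dict


def age_category(mean_age: float) -> str:
    """Assign one of the three age categories from the participants' mean age."""
    if mean_age < 5:
        return "under5"
    if mean_age <= 15:
        return "juvenile"  # 5-15 years
    return "adult"         # 16+ years (15 < mean age < 16 also lands here)


def cohort_coverage(study_year: int, age_min: int, age_max: int,
                    coverage_by_year: Dict[int, float]) -> float:
    """Average annual national coverage over the participants' birth years.

    Assumes each single year of age is equally represented in the age group.
    Birth years absent from `coverage_by_year` count as 0 coverage.
    """
    birth_years = range(study_year - age_max, study_year - age_min + 1)
    return mean(coverage_by_year.get(year, 0.0) for year in birth_years)


# Hypothetical 3-dose coverage series for one country (proportions, not %).
hepb3 = {2003: 0.60, 2004: 0.90, 2005: 0.90}
print(age_category(12.5))                    # juvenile
print(cohort_coverage(2015, 10, 15, hepb3))  # mean over 2000-2005, about 0.4
```

The same `cohort_coverage` call would be repeated with a birth-dose series to obtain the birth-dose covariate.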

The general logistic model is:

$$
Y_i \sim \mathrm{Binomial}(\pi_i, N_i), \qquad \log\frac{\pi_i}{1-\pi_i} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + u_i
$$

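where $Y_i$ is the number of HBsAg-positive individuals among the $N_i$ tested in study $i$, $\pi_i$ is the corresponding prevalence, $\beta_j$ are the fixed effects of the explanatory variables $x_{ij}$, and $u_i$ are the spatial random effects. Conditional on the random effects of all other countries ($u_{-i}$), these are described by

$$
u_i \mid u_{-i} \sim N\left(\bar{u}_i, \frac{\sigma_u^2}{n_i}\right), \qquad \bar{u}_i = \frac{1}{n_i} \sum_{j \in \mathrm{neigh}(i)} w_{ij} u_j
$$

where $n_i$ is the number of neighbours of country $i$ and the weights $w_{ij}$ are 1.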
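To make the roles of the terms concrete, the sketch below evaluates the binomial likelihood, the logit link and the CAR conditional distribution for toy values in plain Python. It is an illustration only, assuming equal weights $w_{ij} = 1$ as stated above. The estimates themselves were produced with WinBUGS, and the function names and numbers here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def prevalence(beta0: float, betas: Sequence[float],
               x: Sequence[float], u: float) -> float:
    """Inverse logit of beta0 + sum_j beta_j * x_ij + u_i, i.e. pi_i."""
    eta = beta0 + sum(b * xj for b, xj in zip(betas, x)) + u
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(y: int, n: int, p: float) -> float:
    """log P(Y_i = y | pi_i = p, N_i = n) for one study."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log1p(-p)


def car_conditional(i: str, u: Dict[str, float],
                    neighbours: Dict[str, List[str]],
                    sigma_u: float) -> Tuple[float, float]:
    """Conditional mean and variance of u_i given its neighbours (w_ij = 1)."""
    nbrs = neighbours[i]
    n_i = len(nbrs)
    u_bar = sum(u[j] for j in nbrs) / n_i
    return u_bar, sigma_u ** 2 / n_i


# Toy example: three countries; A borders B and C.
u = {"A": 0.0, "B": 0.2, "C": -0.6}
print(car_conditional("A", u, {"A": ["B", "C"]}, sigma_u=1.0))
pi = prevalence(-3.0, [-1.5], [0.8], u["A"])  # e.g. x = vaccine coverage
print(round(pi, 4), binomial_loglik(3, 200, pi))
```

The conditional variance shrinks as $1/n_i$, which is why countries with many neighbours are pulled more strongly towards their neighbourhood mean.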

The model was fitted in the Bayesian statistical package WinBUGS, with data manipulation and model initialisation run from R (version 3.3.1) using the R2WinBUGS package. The model includes the parameters age, sex, study bias (e.g. a high fraction of study participants from indigenous populations), vaccine coverage, birth dose of the vaccine and country of study.

The model uses the CAR-normal (car.normal) function in WinBUGS to model the spatial and economic autocorrelation between neighbouring countries. For each country that had prevalence data, a weighted central position was calculated using the size and location of each study. For countries with no data, we used the population centroid. In a novel approach, we considered three dimensions in the country adjacency matrix: the usual geographic dimensions, latitude and longitude, combined with the natural log of the country's GDP per capita. This was to measure not only the geographic but also the developmental proximity of countries. The adjacency matrix for the geo-economic distance gives a score between each country and every other country. Countries that are close both geographically and economically have a low score (distance), whereas countries that are far apart either geographically or economically have a high score.

We then explored how to apportion the geographic and economic distances when producing the adjacency matrix, because geographic distance may be more or less important than economic similarity. By creating a number of different adjacency matrices (not a definitive set), we could select the matrix that best explains reality. We normalised the geographic and GDP distances and then calculated the distance between these two normalised figures. This creates a smoothed Gaussian surface that depends on both spatial proximity and GDP per capita proximity. We compared Geographic:GDP ratios of 1:0, 1:1, 2:1 and 1:2.
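The weighted central position and the normalised geo-economic distance can be sketched as follows. The text does not give the exact formulas, so several choices here are assumptions. The central position is a sample-size-weighted mean of study latitude/longitude (ignoring the Earth's curvature). Geographic distance is Euclidean in degrees. Each distance component is normalised by dividing by its maximum. The Geographic:GDP ratio is applied as multiplicative weights before the two components are combined as a Euclidean distance. Country labels and values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def weighted_centre(studies: Sequence[Tuple[float, float, int]]) -> Point:
    """Sample-size-weighted mean position; each study is (lat, lon, n)."""
    total = sum(n for _, _, n in studies)
    lat = sum(la * n for la, _, n in studies) / total
    lon = sum(lo * n for _, lo, n in studies) / total
    return lat, lon


def geo_economic_distances(centres: Dict[str, Point],
                           gdp_per_capita: Dict[str, float],
                           ratio: Tuple[float, float] = (1.0, 2.0),
                           ) -> Dict[Tuple[str, str], float]:
    """Combined distance for each unordered country pair (a, b) with a < b."""
    names = sorted(centres)
    pairs = [(a, b) for a in names for b in names if a < b]
    d_geo = {p: math.dist(centres[p[0]], centres[p[1]]) for p in pairs}
    d_gdp = {p: abs(math.log(gdp_per_capita[p[0]]) - math.log(gdp_per_capita[p[1]]))
             for p in pairs}
    # Normalise each component to [0, 1]; guard against an all-zero component.
    geo_max = max(d_geo.values()) or 1.0
    gdp_max = max(d_gdp.values()) or 1.0
    w_geo, w_gdp = ratio
    return {p: math.hypot(w_geo * d_geo[p] / geo_max, w_gdp * d_gdp[p] / gdp_max)
            for p in pairs}


centres = {"A": weighted_centre([(0.0, 0.0, 100), (2.0, 4.0, 300)]),
           "B": (10.0, 10.0), "C": (1.0, 3.0)}
gdp = {"A": 1200.0, "B": 1500.0, "C": 30000.0}
# A and C are geographically close but economically far apart.
print(geo_economic_distances(centres, gdp, ratio=(1.0, 2.0)))
```

With a ratio of 1:0 the GDP term vanishes and the score reduces to the normalised geographic distance.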

For each adjacency matrix, we also had to select a neighbourhood distance, i.e. the distance over which one country can be affected by another. We therefore varied the radius within which neighbours are selected for the neighbourhood network, using the maximum minimum distance (the largest distance from any country to its nearest neighbour, so that every country has at least one neighbour), twice the maximum minimum distance and three times the maximum minimum distance, thus varying the number of neighbours each country has.

Finally, to decide the magnitude of the effect one country has on another in the neighbourhood network, we varied the weights of pairs of countries in the adjacency matrix, using either a neutral weight of 1, so that each neighbour has an equal effect regardless of its distance in the network, or weights decaying with distance, 1/distance and 1/distance², so that closer countries have a greater effect. Across these 36 combinations (4 ratios × 3 neighbourhood radii × 3 weighting schemes), the minimum DIC (Deviance Information Criterion) was found for a Geographic:GDP ratio of 1:2, a neighbourhood radius of twice the maximum minimum distance and an even weighting of 1/distance for each adjacent country.
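The neighbourhood construction and the 4 × 3 × 3 grid of alternatives can be sketched as below. This is a hypothetical illustration: the scheme names are invented, and the distance dictionary is assumed to hold one positive distance per unordered country pair. The DIC comparison itself would require refitting the WinBUGS model for every combination, which is not shown.

```python
import itertools
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

WEIGHT_SCHEMES = {
    "equal": lambda d: 1.0,              # neutral weight of 1
    "inverse": lambda d: 1.0 / d,        # 1/distance
    "inverse_sq": lambda d: 1.0 / d**2,  # 1/distance^2
}


def max_min_distance(dist: Dict[Pair, float], names: List[str]) -> float:
    """Largest nearest-neighbour distance, so every country gets >= 1 neighbour."""
    def d(a: str, b: str) -> float:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]
    return max(min(d(a, b) for b in names if b != a) for a in names)


def neighbourhood(dist: Dict[Pair, float], names: List[str],
                  multiple: float, scheme: str) -> Dict[str, Dict[str, float]]:
    """Symmetric neighbour weights within multiple x the maximum minimum distance.

    Distances must be > 0 for the inverse schemes.
    """
    radius = multiple * max_min_distance(dist, names)
    weight = WEIGHT_SCHEMES[scheme]
    nbrs: Dict[str, Dict[str, float]] = {a: {} for a in names}
    for (a, b), d_ab in dist.items():
        if d_ab <= radius:
            nbrs[a][b] = nbrs[b][a] = weight(d_ab)
    return nbrs


# The 36 candidate structures: 4 ratios x 3 radii x 3 weighting schemes.
RATIOS = [(1, 0), (1, 1), (2, 1), (1, 2)]  # Geographic:GDP
MULTIPLES = [1, 2, 3]
grid = list(itertools.product(RATIOS, MULTIPLES, WEIGHT_SCHEMES))
print(len(grid))  # 36

dist = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 3.0}
print(neighbourhood(dist, ["A", "B", "C"], multiple=1, scheme="inverse"))
```

Using the maximum minimum distance as the base radius guarantees that no country is left without neighbours, which the CAR prior requires.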

This model structure produces estimates for all fixed effects and for individual country-level risk, indicating which countries are at significantly greater or lower risk than average.

All parameters were given uninformative priors. Simulations were run with 3 MCMC chains with 50,000 burn-in iterations, and each parameter was estimated from 1,000 samples taken from 250,000 thinned iterations to produce the posterior distribution. Convergence was attained, with r̂ (Gelman-Rubin) values all very close to 1.000. Owing to the Bayesian framework and the WinBUGS software, it was possible to obtain estimates for countries with no prevalence data, using their GDP and geographic proximity to other countries to inform the estimate. Countries with the largest number of studies had the estimates with the tightest credible intervals, while those with few or no data were less well defined, often producing log-normally distributed posterior distributions with long tails.
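The convergence check can be illustrated with the classic Gelman-Rubin potential scale reduction factor, computed per parameter from the post-burn-in draws of the 3 chains. This is a minimal NumPy sketch of the standard (non-split) formula; WinBUGS and R2WinBUGS report their own version of the statistic, which may differ in detail, and the simulated chains below are synthetic.

```python
import numpy as np


def gelman_rubin(chains: np.ndarray) -> float:
    """Potential scale reduction factor (r-hat) for one parameter.

    `chains` has shape (m, n): m chains, n retained draws per chain.
    """
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    b = n * chain_means.var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * w + b / n      # pooled variance estimate
    return float(np.sqrt(var_hat / w))


rng = np.random.default_rng(42)
mixed = rng.normal(size=(3, 1000))               # 3 well-mixed chains
stuck = mixed + np.array([[0.0], [3.0], [6.0]])  # chains at different levels
print(round(gelman_rubin(mixed), 3))  # close to 1.000
print(round(gelman_rubin(stuck), 3))  # well above 1
```

Values near 1 indicate that between-chain variation is no larger than within-chain variation, i.e. the chains are sampling the same posterior.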

Posterior distributions of parameters were inspected for convergence and to check for covariance between parameters. Where necessary, parameters were centred and scaled to N(0, 1) to aid convergence and remove covariance. This was done for the sex parameter, entered as the proportion of the sample that was female, which co-varied with the intercept and bias parameters before re-centring and scaling. However, the covariance between routine vaccination and birth dose persisted even after re-centring. This is partly unsurprising, as there are few instances where the birth dose is administered without routine vaccination. We tried to reduce this interaction between the terms by transforming the birth-dose data: we modelled birth dose using only data where birth-dose coverage was greater than 60%, 70%, 80% and 90% respectively, and we also modelled the square of birth-dose coverage, thus increasing the effect of high birth-dose coverage relative to low coverage. The model selected was the one that both reduced the covariance between the parameters and returned the lowest DIC score.
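The covariate transformations described above can be sketched in a few lines of Python. The helper names are hypothetical and coverage is assumed to be on a 0-1 scale. The text does not specify how below-threshold records were handled when modelling birth dose "using only data where the birth dose was greater than" a threshold, so this sketch assumes they are set to 0 rather than dropped.

```python
from statistics import mean, stdev
from typing import List


def standardise(values: List[float]) -> List[float]:
    """Centre and scale to mean 0, SD 1 (as done for proportion female)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def birth_dose_above(coverage: List[float], threshold: float) -> List[float]:
    """Keep birth-dose coverage only where it exceeds the threshold (else 0)."""
    return [c if c > threshold else 0.0 for c in coverage]


def birth_dose_squared(coverage: List[float]) -> List[float]:
    """Square coverage so high birth-dose coverage counts for relatively more."""
    return [c * c for c in coverage]


share_female = [0.48, 0.52, 0.61, 0.55]
birth_dose = [0.10, 0.65, 0.85, 0.95]
print(standardise(share_female))
for cut in (0.6, 0.7, 0.8, 0.9):
    print(cut, birth_dose_above(birth_dose, cut))
print(birth_dose_squared(birth_dose))
```

Each transformed series would be fitted as an alternative birth-dose covariate, with the final choice made on covariance and DIC as described above.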

Model validation was conducted by fitting the model to a randomly selected 90% of the data and testing it against the remaining 10%, and by comparing model estimates of prevalence against observed data (Figure 3). Figure 4 shows the average prevalence in each country from all the studies plotted against the model's estimate. Figure 5 shows the marginal and joint posterior distributions for the fitted parameters. Table 1 gives the estimated parameter values with associated credible intervals.
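The 90/10 hold-out validation can be sketched as follows. The seed, the helper names and the use of mean absolute error as the comparison metric are assumptions made for this illustration; the text does not state which summary was used to compare estimated and observed prevalence.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(records: Sequence[T], train_frac: float = 0.9,
                  seed: int = 2018) -> Tuple[List[T], List[T]]:
    """Randomly split records into a fitting set and a validation set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(observed: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and model-estimated prevalence."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


study_ids = list(range(1, 101))           # hypothetical study identifiers
fit_set, validation_set = holdout_split(study_ids)
print(len(fit_set), len(validation_set))  # 90 10
```

The model would be refitted on `fit_set` only, and its predicted prevalence for the studies in `validation_set` compared with their observed prevalence.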

During the validation exercise (in which countries were consulted on their estimates), it was pointed out that China had undertaken three very large-scale, population-based serological surveys to establish baseline prevalence and progress towards HBV elimination. There were many other surveys from China that are less representative than these three nationwide surveys. We conducted a sensitivity analysis restricting the data from China to the three nationally representative surveys. With this change in input data, the effect of vaccination was more distinct, but the estimated age effects (change in prevalence in children under 5, or in juveniles aged 5-15 years) were no longer significantly different from zero (see Table 2 and Figure 6). The deviance was significantly reduced, suggesting a much better fitting model (Table 2), albeit on a somewhat reduced dataset.

4.e. Adjustments

Estimates are provided for the 194 WHO Member States and grouped according to the six WHO regions. We also provide estimates by income classification and follow the UN Regional Groupings and Compositions as much as possible.

4.f. Treatment of missing values (i) at country level and (ii) at regional level

  • At country level

All values represent the best estimates for the hepatitis B surface antigen indicator and aim to facilitate comparability across countries and over time. The estimates are not always the same as the official national estimates, because of the use of different methodologies and data sources. Estimates are provided for 194 WHO Member States. The analysis was carried out for the age groups 0-5 years and for the general population. Due to scarcity of data from some countries, the estimates are more robust at global and regional level than at country level, therefore, we suggest countries focus on the 95% Credible Intervals and not only on the reported point estimates.

A thorough and robust literature review was undertaken to find studies across the 194 WHO Member States and across age groups and vaccination status. We updated the systematic review by Schweitzer et al. (2015), which included a systematic search of articles published between Jan 1, 1965, and Oct 23, 2013, by extending the search to articles published between Oct 23, 2013, and Oct 30, 2018, in the databases Embase, PubMed, Global Index Medicus, Popline, and Web of Science.

For each country that had prevalence data, a weighted central position was calculated using the size and location of each study. For countries with no data, we used the population centroid. See the detailed explanation in section 4.c.

  • At regional and global levels

Same as above

4.g. Regional aggregations

Sources of discrepancies:

The estimates are not always the same as the official national estimates, because of the use of different methodologies and data sources. The study selection criteria were similar to those of Schweitzer et al. (2015). Observational studies on chronic HBV infection seroprevalence (HBsAg prevalence), done in the general population or among blood donors, health-care workers (HCWs), and pregnant women were considered for inclusion in this systematic review. Studies were excluded if they were systematic reviews or meta-analyses, surveillance reports, case studies, letters or correspondence, or did not contain HBsAg seroprevalence data. Studies were also excluded if they exclusively reported prevalence estimates for high-risk population groups (e.g., migrants and refugees).

Country estimates may come from selected serosurveys.

4.h. Methods and guidance available to countries for the compilation of the data at the national level

Not applicable. Estimates come from the mathematical model.

GATHER (Guidelines for Accurate and Transparent Health Estimates Reporting) is a checklist of information that should be included in new reports of global health estimates, and it promotes best practices in reporting health estimates. A range of health indicators are used to monitor population health and guide resource allocation throughout the world, but the lack of data for some regions and differing measurement methods present challenges that are often addressed by using statistical modelling techniques to generate coherent estimates from often disparate sources of data. http://gather-statement.org/

4.j. Quality assurance

  • WHO’s estimates use a methodology reviewed by the Immunization and Vaccines Related Implementation Research Advisory Committee (IVIR-AC) and presented to the Strategic Advisory Group of Experts (SAGE). These estimates have been documented following the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER).

  • WHO provided Member States the opportunity to review and comment on data and estimates as part of the so-called country consultation process.

5. Data availability and disaggregation

Data availability:

Estimates are available for 194 Member States and for the six WHO Regions, as well as at global level.

Time series:

Estimates are available for the pre-vaccine era, 2015, 2018 and 2020.

Disaggregation:

Age groups (under five years of age; 5 years and older, although these estimates are not reported; and the general population) and sex/gender where possible, although data for the latter are scarce. In addition, estimates are available at national, regional and global level.

6. Comparability/deviation from international standards

This dataset represents the best estimates for the hepatitis B surface antigen indicator and aims to facilitate comparability across countries and over time. The estimates are not always the same as the official national estimates, because of the use of different methodologies and data sources; e.g. special populations or populations at risk are not included in the hepatitis B seroprevalence model. Estimates are provided for 194 WHO Member States. The conditional autoregressive model uses data from well-sampled countries to estimate prevalence in more data-poor countries, taking account of effects such as sex, age and vaccination status. Due to scarcity of data from some countries, the estimates are more robust at global and regional level than at country level; therefore, focus should be on the 95% credible intervals and not only on the reported point estimates.

Sources of discrepancies:

Inclusion or exclusion criteria of the type of seroprevalence studies. Observational studies on chronic HBV infection seroprevalence (HBsAg prevalence), done in the general population or among blood donors, health-care workers (HCWs), and pregnant women were considered for inclusion. Studies were excluded if they were systematic reviews or meta-analyses, surveillance reports, case studies, letters or correspondence, or did not contain HBsAg seroprevalence data. Studies were also excluded if they exclusively reported prevalence estimates for high-risk population groups (e.g., migrants and refugees).

7. References and Documentation

Serosurveys are available for each Member State, and a reference is provided for each data point.

URL: http://whohbsagdashboard.com/#global-strategies