
🧭 Building a Reproducibility Package (Flagship Edition)

World Bank Reproducible Research Initiative 🔗 reproducibility.worldbank.org


📦 What Is a Reproducibility Package?

A reproducibility package includes everything needed to replicate the findings in a paper:

Includes Details
📑 Documentation: README, Data Availability Statement (DAS), figure/table mapping
📂 Code: all code files required to go from the original data to the results in the paper
📊 Data: all raw data needed and/or detailed access instructions to obtain it

💻 Standard: Computational Reproducibility
A third party can reproduce the exact findings in the paper using the data, code, and documentation provided by the author.

✅ Verified packages are published in the Reproducible Research Repository (RRR).


🔁 Reproducibility Workflow

Step What Happens
Submission Authors prepare the reproducibility package and submit it for verification.
Verification The reproducibility team tests whether the results can be fully reproduced using the submitted code and data. A detailed verification report is issued.
Publication If reproducible, the package is published on the Reproducible Research Repository (RRR) with a DOI, metadata, and verification seal.

📦 Components of a Good Reproducibility Package — Flagships

📌 Flagship projects typically involve multiple datasets, chapters, and contributors, which adds complexity to reproducibility. The table below outlines the essential components of a high-quality package, with specific tips to support coordination and transparency in flagship workflows.

Component Description & Flagship-Specific Tips
README File 📌 Critical for flagships: serves as the main guide for replicators.

- Provide step-by-step instructions for how to run the code and reproduce results.
- Include a list of exhibits, indicating which are generated by the package and which are taken from external sources (with citations).
- Include a Data Availability Statement (see below).
- If the project structure is complex (e.g., organized by chapter or module), describe the folder layout to help others navigate it.

🔗 Use our templates: Markdown · Word
Data Availability Statement (DAS) 📌 Essential for flagships: these projects often use a mix of public, restricted, and internal datasets.

- List every dataset used, regardless of size or access level.
- Clearly describe the access conditions for each dataset: e.g., public (include URL), restricted (how the team obtained it), or internal WB access only (with process and a contact name if possible).
- Include the access date, since datasets may be updated before project completion.

🔗 Example DAS for Flagship
Code Files 📁 Organize scripts by task (e.g., cleaning.R, analysis.do), and manage them with a single main script (main.R, main.do, or equivalent).

- List all dependencies explicitly (e.g., R packages, ado files, Python libraries).
📌 For flagships: use a modular structure by chapter or module, and agree on folder naming and structure across all contributors.

🔗 Use our templates: Stata · R
Data 📌 Data is often the trickiest part for flagships due to multiple sources.

- Keep raw and processed data in separate folders.
- Document all data transformations in code. If manual edits were made, explain them in the README.
- Remove any unused datasets before submission.
- If using internally produced data (e.g., from other WB teams), provide as much detail as possible: include the dataset title, source team, contact person (if applicable), and whether it could be shared on DDH/MDL under a restricted license.
📌 Maintain consistent dataset versions across chapters and authors, and store original data in permanent, team-accessible locations.
Final Outputs 📤 Include all raw outputs used in the paper (e.g., CSVs, LaTeX tables, plots).

📌 If any outputs were sent to the design/publication team, specify which ones to avoid mismatches between the paper and the package.
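The "single main script" advice above can be sketched as a minimal orchestrator. This is a hypothetical example, not the official template: the script names and folder layout are invented, and the same pattern applies to a main.do or main.R driving chapter-level Stata or R scripts.

```python
"""Hypothetical main.py: run the full pipeline in a fixed order, one script per step."""
import runpy
from pathlib import Path

# Hypothetical chapter scripts; each reads from data/ and writes to outputs/.
# A replicator reproduces everything by running this one file.
PIPELINE = [
    "code/ch1_cleaning.py",
    "code/ch1_analysis.py",
    "code/ch2_cleaning.py",
    "code/ch2_analysis.py",
]

def main() -> None:
    for script in PIPELINE:
        path = Path(script)
        if not path.exists():
            # Failing fast makes a broken package obvious before submission.
            raise FileNotFoundError(f"Missing pipeline script: {script}")
        print(f"Running {script} ...")
        runpy.run_path(str(path), run_name="__main__")

if __name__ == "__main__":
    main()
```

Keeping the ordered list of scripts in one place doubles as documentation of the pipeline, and listing dependencies at the top of this file (or in a requirements file) covers the "list all dependencies explicitly" tip.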

🚧 Common Pitfalls & How to Avoid Them

🚫 Problem → ✅ Solution
Version drift: code results ≠ report exhibits → Run the full code right before submission and make sure outputs match your manuscript; archive final outputs.
Manual Excel edits not documented → Document all manual steps (e.g., which Excel tab feeds which figure); automate when possible.
Pipeline starts from intermediate data → Archive the raw data and document the entire cleaning pipeline.
Results vary across runs (instability) → Control random seeds, test stability, and contact the reproducibility team for help.
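The seed-control fix for run-to-run instability can be sketched in a few lines. This uses Python's standard library as an illustration; the same idea applies to set.seed() in R or set seed in Stata, and the seed value here is arbitrary.

```python
import random

def reproducible_draws(seed: int = 20240101, n: int = 3) -> list[float]:
    """Return n pseudo-random draws that are identical on every run."""
    # A local generator seeded explicitly avoids surprises from shared global state.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Two runs with the same seed produce identical results.
assert reproducible_draws() == reproducible_draws()
```

Set the seed once, at the top of the main script, so every stochastic step (sampling, bootstrapping, simulation) draws from the same controlled sequence.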

🚀 Start Early: Timeline & Submission Steps

📅 Phase Action Items
Kickoff - Assign a reproducibility lead per chapter or module
- Define folder and file structure for the whole team
- Align on data sources and archive raw versions from day 1
During project - Update README and DAS progressively
- Automate figures/tables as much as possible
- Keep scripts modular and coordinated across chapters
Before submission - Run the entire pipeline using your main script
- Clean the repository: only include necessary data, code, and outputs
- Ensure exhibits match the report exactly
- Review our checklist to make sure everything is ready
To submit Fill out the Reproducibility Verification Request Form and share your package

💡 Start documenting from day 1; it will save time at submission.


Questions or support?
📧 reproducibility@worldbank.org
🔗 Reproducible Research Resources
