Building a Reproducibility Package (Flagship Edition)
World Bank Reproducible Research Initiative · reproducibility.worldbank.org
What Is a Reproducibility Package?
A reproducibility package includes everything needed to replicate the findings in a paper:
Includes | Details |
---|---|
Documentation | README, Data Availability Statement (DAS), figure/table mapping |
Code | All code files required to go from the original data to the results in the paper |
Data | All raw data needed and/or detailed access instructions to obtain it |
Standard: Computational Reproducibility
A third party can reproduce the exact findings in the paper using the data, code, and documentation provided by the author.
Verified packages are published in the Reproducible Research Repository (RRR).
Reproducibility Workflow
Step | What Happens |
---|---|
Submission | Authors prepare the reproducibility package and submit it for verification. |
Verification | The reproducibility team tests whether the results can be fully reproduced using the submitted code and data. A detailed verification report is issued. |
Publication | If reproducible, the package is published on the Reproducible Research Repository (RRR) with a DOI, metadata, and verification seal. |
Components of a Good Reproducibility Package: Flagships
Flagship projects typically involve multiple datasets, chapters, and contributors, which adds complexity to reproducibility. The table below outlines the essential components of a high-quality package, with specific tips to support coordination and transparency in flagship workflows.
Component | Description & Flagship-Specific Tips |
---|---|
README File | Critical for flagships: serves as the main guide for replicators. • Provide step-by-step instructions for how to run the code and reproduce results. • Include a list of exhibits, indicating which are generated by the package and which are taken from external sources (with citations). • Include a Data Availability Statement (see below). • If the project structure is complex (e.g., organized by chapter or module), describe the folder layout to help others navigate it. Use our templates: Markdown · Word |
Data Availability Statement (DAS) | Essential for flagships: these often use a mix of public, restricted, and internal datasets. • List every dataset used, regardless of size or access level. • Clearly describe the access conditions for each dataset: e.g., public (include URL), restricted (how the team obtained it), or internal WB access only (with process and a contact name if possible). • Include the access date, since datasets may be updated before project completion. Example DAS for Flagship |
Code Files | Organize scripts by task (e.g., `cleaning.R`, `analysis.do`) and manage them with a single main script (`main.R`, `main.do`, or equivalent). • List all dependencies explicitly (e.g., R packages, ado files, Python libraries). • For flagships: use a modular structure by chapter or module, and agree on folder naming and structure across all contributors. Use our templates: Stata · R |
Data | Data is often the trickiest part for flagships due to multiple sources. • Keep raw and processed data in separate folders. • Document all data transformations in code; if manual edits were made, explain them in the README. • Remove any unused datasets before submission. • If using internally produced data (e.g., from other WB teams), provide as much detail as possible: dataset title, source team, contact person (if applicable), and whether it could be shared on DDH/MDL under a restricted license. • Maintain consistent dataset versions across chapters and authors, and store original data in permanent, team-accessible locations. |
Final Outputs | Include all raw outputs used in the paper (e.g., CSVs, LaTeX tables, plots). • If any outputs were sent to the design/publication team, specify which ones to avoid mismatches between the paper and the package. |
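The single-main-script pattern described in the Code Files row can be sketched in a few lines. This is a hypothetical illustration in Python (analogous to a `main.R` or `main.do`); the function names, the sample records, and the statistic computed are all made up for the example:

```python
# main.py -- hypothetical single entry point for a reproducibility package.
# Running every step from one place lets a replicator reproduce the results
# with one command instead of guessing the order of scripts.

def clean(raw_rows):
    """Cleaning step: drop records with missing values (illustrative rule)."""
    return [row for row in raw_rows if row["value"] is not None]

def analyze(rows):
    """Analysis step: compute the statistic behind a (made-up) exhibit."""
    return sum(row["value"] for row in rows) / len(rows)

def run_pipeline(raw_rows):
    # Order matters: always raw data -> cleaning -> analysis.
    cleaned = clean(raw_rows)
    return analyze(cleaned)

if __name__ == "__main__":
    sample = [{"value": 2.0}, {"value": 4.0}, {"value": None}]
    print(run_pipeline(sample))  # prints 3.0
```

In a real package each step would live in its own script and read from the raw-data folder; the point is only that one command runs the full chain.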
Common Pitfalls & How to Avoid Them
Problem | Solution |
---|---|
Version control: code results ≠ report exhibits | Run the full code right before submission and make sure outputs match your manuscript. Archive final outputs. |
Manual Excel edits not documented | Document all manual steps (e.g., which Excel tab produces which figure). Automate when possible. |
Pipeline starts from intermediate data | Archive raw data and document the entire cleaning pipeline. |
Instability: results vary across runs | Control random seeds. Test stability. Contact the reproducibility team for help. |
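The "results vary across runs" pitfall usually comes down to uncontrolled randomness. A minimal Python sketch of the fix, where `simulate` and the seed values are purely illustrative:

```python
import random

def simulate(n, seed=12345):
    # A fixed, documented seed makes stochastic steps (bootstraps,
    # simulations, random splits) repeatable run after run.
    rng = random.Random(seed)  # local generator: avoids hidden global state
    return [rng.random() for _ in range(n)]

# Two runs with the same seed produce identical draws.
assert simulate(5) == simulate(5)
# A different seed gives different draws -- a quick stability check.
assert simulate(5, seed=999) != simulate(5)
```

The same idea applies in Stata (`set seed`) and R (`set.seed`): fix the seed once, in code, and note it in the README.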
Start Early: Timeline & Submission Steps
Phase | Action Items |
---|---|
Kickoff | • Assign a reproducibility lead per chapter or module. • Define the folder and file structure for the whole team. • Align on data sources and archive raw versions from day 1. |
During project | • Update the README and DAS progressively. • Automate figures/tables as much as possible. • Keep scripts modular and coordinated across chapters. |
Before submission | • Run the entire pipeline using your main script. • Clean the repository: include only necessary data, code, and outputs. • Ensure exhibits match the report exactly. • Review our checklist to make sure everything is ready. |
To submit | Fill out the Reproducibility Verification Request Form and share your package. |
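One way to verify that exhibits match the report exactly before submission is to compare freshly generated outputs against the archived copies by checksum. A sketch using Python's standard library; the file contents shown are hypothetical, and in practice you would read the bytes of files such as a table CSV and its archived copy:

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 fingerprint of an output's contents."""
    return hashlib.sha256(data).hexdigest()

# Self-contained stand-ins for a regenerated output and its archived copy.
fresh = b"country,gdp\nA,2.0\nB,4.0\n"
archived = b"country,gdp\nA,2.0\nB,4.0\n"

print(digest(fresh) == digest(archived))  # True: the exhibit is unchanged
```

If any digest differs, rerun the pipeline and update the archived output before submitting, so the package and the manuscript cannot drift apart.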
Start documenting from day 1; it will save time at submission.
Questions or support?
reproducibility@worldbank.org
Reproducible Research Resources