
Frequently Asked Questions: Reproducibility Packages

Before diving into the FAQs, we recommend reviewing the Guidance Note for a comprehensive overview of reproducibility standards and submission expectations.

Note for World Bank Staff: For internal FAQs tailored to World Bank researchers, please visit the internal reproducibility resources page.

General FAQs

How do I submit a package?

Authors can request a reproducibility verification through this submission form. The package must include all components listed in this checklist.

Who can submit a reproducibility package?

World Bank staff and consultants are eligible to submit reproducibility packages for verification.

How long does the reproducibility verification process take?

Authors will receive an initial response within two business days of submission, confirming whether the package is complete and ready for review or highlighting any issues that need to be addressed first.

Most packages are reviewed within two weeks when only minor or no corrections are needed. The overall timeline may vary depending on the complexity of the analysis and the completeness of the submission. Packages that require significant clarification or additional input from authors may take longer.

Heavy-compute packages and packages using restricted data may require additional coordination time (e.g., NDA processing or scheduling a virtual verification session). In these cases, the initial response will specify the applicable verification pathway and expected timeline.

How should I organize my package?

We recommend organizing your reproducibility package using a clear and consistent folder structure.
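A minimal sketch of such a layout (the folder names here are illustrative, not a mandated standard):

```
MyPackage/
├── README.md          # entry guide, including the Data Availability Statement
├── Data/
│   ├── Raw/           # source data, never modified by code
│   └── Processed/     # intermediate files created by scripts
├── Code/              # all scripts, run through a single main script
├── Outputs/           # tables and figures regenerated by the code
└── Documentation/     # codebooks and supplementary notes
```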

What does the reproducibility report contain?

What are the most common reasons reproducibility checks fail?

Even well-documented research packages can fail reproducibility checks. Below are the most frequent issues we encounter:

What is the starting point for a reproducibility package?

A reproducibility package should begin from documented data sources (often referred to as “source data”). Source data should be findable (stored in a persistent location, which may or may not be publicly accessible), citable, and accompanied by documentation that enables reuse. Where redistributing the raw data is unlawful or impractical, “usable” data may serve as the starting point, provided that provenance, access instructions, and reconstruction scripts or protocols are supplied. In that case, the usable data must be archived in a permanent location (e.g., not a World Bank staff member’s OneDrive) so that the reproducibility package remains valid beyond the tenure of individual staff members.

Examples:

Can you do a reproducibility verification if the research relies on confidential and/or proprietary data?

Yes, reproducibility verification is still possible when data access is restricted. We use several strategies depending on the nature of the restriction:

How do I document the datasets used in the reproducibility package?

All datasets used in the package must be documented in the Data Availability Statement. This should include the following information based on the source of the data:

🔹 If using data generated by others:

Authors are expected to provide a full data citation for each external dataset used. See the Social Science Data Editors’ guidance on citing data for recommended citation formats.

At minimum, include:

🔹 If using data generated by your team:

🔹 If using data shared internally by another World Bank team:

In general, all source datasets must be archived in a stable location so that the research repository remains valid over time and reproducibility does not depend on any individual staff member.

How should I organize my code?

Place all code in a Code/ folder. Use subfolders for logical tasks or manuscript chapters, e.g.:
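For instance (illustrative subfolder names, assuming a Python main script):

```
Code/
├── 01_cleaning/        # data preparation scripts
├── 02_analysis/        # estimation and simulation scripts
├── 03_exhibits/        # scripts that produce tables and figures
└── main.py             # single entry point (see the next question)
```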

What is a main script? How do I create one?

A main script is a single entry point that runs all other scripts in the correct order. It should require only one change: setting the top-level directory, so that anyone can reproduce the full workflow with minimal setup.
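A minimal sketch of a main script in Python (the script names and ROOT path are hypothetical; a Stata main.do or R main.R follows the same pattern):

```python
# main.py - single entry point that runs the full workflow in order.
# The only line a user should need to edit is ROOT below.
from pathlib import Path
import runpy

ROOT = Path("/path/to/MyPackage")  # <-- EDIT THIS: top-level directory

# Hypothetical task scripts, listed in execution order.
SCRIPTS = [
    "Code/01_cleaning/clean_data.py",
    "Code/02_analysis/run_models.py",
    "Code/03_exhibits/make_exhibits.py",
]

for script in SCRIPTS:
    print(f"Running {script} ...")
    # Each script receives ROOT so it never hard-codes its own paths.
    runpy.run_path(str(ROOT / script), init_globals={"ROOT": ROOT})

print("Done: all outputs regenerated.")
```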

How do I check if my code is stable?

Stable code produces the same outputs every time it is run with the same inputs. Unstable code yields different results on repeated runs, which undermines reproducibility. For stochastic models, set seeds and document any nondeterministic components; otherwise verification will fail on stability grounds.

How to test for stability:
Run the full code twice. If any outputs differ between runs, the code is unstable.

How to track output changes: To facilitate version control and comparison of outputs across runs, checksum your output files after each run, as in the sketch below.
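A minimal sketch, assuming outputs are written to an Outputs/ folder:

```python
# hash_outputs.py - checksum every output file so runs can be compared.
# "Outputs" is a hypothetical folder name; adjust to your structure.
import hashlib
from pathlib import Path

def checksum_outputs(folder="Outputs"):
    """Print a SHA-256 hash for every file under `folder`."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            print(f"{digest}  {path}")

if __name__ == "__main__":
    checksum_outputs()
```

If the two listings are identical, the runs produced byte-identical outputs. Note that some formats (e.g., PDFs or Excel files with embedded timestamps) can differ at the byte level even when their content is unchanged; plain-text outputs such as CSVs are easiest to compare.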

Common causes of instability include unseeded random number generation, parallel or multithreaded execution with nondeterministic ordering, unstable sorts on tied values, and timestamps embedded in output files.
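As a remedy for the first of these, a minimal Python sketch of seed setting (assuming NumPy is used; Stata's set seed and R's set.seed() serve the same purpose):

```python
# Set all relevant seeds once, at the top of the workflow.
import random
import numpy as np

SEED = 20240901  # document the chosen seed in the README

random.seed(SEED)                   # Python's built-in RNG
np.random.seed(SEED)                # legacy NumPy global RNG
rng = np.random.default_rng(SEED)   # preferred: explicit NumPy generator

# Draw all random numbers from the seeded generator so that
# repeated runs produce identical results.
sample = rng.normal(size=5)
```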

Ensuring code stability is a key step in creating reproducible research.

How do I set up a reproducible environment?

A reproducible environment ensures that anyone running your code has the same setup (software versions, packages, and dependencies), so results remain consistent over time and across machines.

To set this up, follow the environment setup guides for each language:

Using a clean, well-defined environment is critical to computational reproducibility.
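For Python, for example, one common pattern (a sketch; `pip freeze > requirements.txt` achieves much the same) is to record exact package versions in a file that others can install from:

```python
# record_environment.py - pin the Python version and every installed
# package so the environment can be recreated with `pip install -r`.
import sys
from importlib import metadata

with open("requirements_frozen.txt", "w") as f:
    f.write(f"# Python {sys.version.split()[0]}\n")
    for dist in sorted(metadata.distributions(),
                       key=lambda d: d.metadata["Name"].lower()):
        f.write(f"{dist.metadata['Name']}=={dist.version}\n")
```

R users typically achieve the same with renv, and Stata users by stating the Stata version and including copies of all user-written commands.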

Do you check if the code is correct?

No. We verify only complete documentation and computational reproducibility: whether running the same code on the same data produces the same outputs. We do not assess whether the code correctly implements the intended methods, nor the quality of the code.

What if my package uses Excel?

While Excel is widely used, reproducibility verification is much more difficult when workflows rely on manual steps, hidden formulas, or undocumented edits. Therefore, we strongly discourage using Excel for critical parts of the analysis and recommend automating as much of the workflow as possible.

If your package includes Excel files:

See our Excel guidelines for recommended practices and our presentation on how to produce tables and plots in Stata and R for quick instructions on how to build reproducible outputs.
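Where Excel data must enter the pipeline, read it programmatically rather than editing it by hand. A minimal sketch in Python (the file, sheet, and column names are hypothetical):

```python
# Replace manual spreadsheet steps with a scripted transformation.
import pandas as pd

# Read the raw workbook; it is never edited in place.
df = pd.read_excel("Data/Raw/survey.xlsx", sheet_name="responses")

# All transformations live in code, not in spreadsheet formulas.
summary = df.groupby("region", as_index=False)["income"].mean()

# Write the exhibit to a new file so raw inputs stay untouched.
summary.to_csv("Outputs/table1_income_by_region.csv", index=False)
```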

Can I submit compiled code or binaries?

Yes. Provide (i) build instructions and compiler versions, or (ii) a container image, or (iii) deterministic binaries plus environment details. If source cannot be shared immediately, indicate source-escrow terms (e.g., released at journal acceptance or after embargo).

What if my simulations take weeks to run?

Use the Artifact Pathway: submit pre-computed solve outputs (with hashes) and scripts that generate all exhibits from those outputs. Provide a short “smoke test” subset that runs in ≤1 hour to demonstrate determinism.
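One way to provide such a smoke test (a sketch; the switch, sizes, and function are hypothetical) is a single flag that shrinks the problem while exercising the full pipeline:

```python
# Smoke-test switch for a long-running simulation.
import os

SMOKE_TEST = os.environ.get("SMOKE_TEST", "0") == "1"

N_DRAWS = 1_000 if SMOKE_TEST else 10_000_000  # hypothetical sizes
GRID_POINTS = 10 if SMOKE_TEST else 5_000

def run_simulation(n_draws, grid_points):
    """Placeholder for the actual model solve."""
    ...

run_simulation(N_DRAWS, GRID_POINTS)
```

Running `SMOKE_TEST=1 python main.py` then completes quickly and, because seeds are set, should produce identical outputs on every run.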

Do you support Julia/MATLAB/Fortran/C++?

Yes. Submit environment files or containers. For Julia, provide the Project.toml and Manifest.toml files along with your scripts. For MATLAB, state the MATLAB version and list all required toolboxes, along with your scripts.

What is a README and what should it include?

A README is the main guide to your reproducibility package. It helps others—including reviewers, editors, and future researchers—understand how to navigate, run, and evaluate your code and data. A clear and complete README is essential for transparency and reproducibility.

If your publication is a journal article, we recommend using the Social Science Data Editors’ README template, which is the standard for many economics journals.

For internal use or working papers, you may use our simplified README template, which is based on the Social Science Data Editors’ version.

At a minimum, your README should include:

What is a Data Availability Statement?

A Data Availability Statement (DAS) explains where and how the data used in a study can be accessed, and under what conditions. It is a critical component of reproducible research, as replication requires access to the exact datasets used in the analysis.

The DAS can be included as a section within the README file and should contain the following:
