Project Template#

The template is a standardized, but flexible project and documentation structure of folders and files for sharing your data science work.

Inspired by literate programming, maintained by the Development Data Group and built as GitHub template repository, the template contains:

  • README, CODE_OF_CONDUCT, CONTRIBUTING templates

    README files are important and often neglected. The files should inform anyone about the first steps to use, learn and contribute to your project.

  • CITATION.cff

    Embracing CFF aligns with best practices for reproducible research and software development. By adhering to established standards for documenting project dependencies and citations, we demonstrate our commitment to quality, transparency, and integrity in our work.

  • LICENSE

    The LICENSE is a document that determines what others can and cannot do with the contents of the repository. If no license is present, no one has permission to use or modify your code. The template is licensed under the Mozilla Public License, and so are projects generated from it.

  • docs/

    Documentation is often left until the last minute. The template aims to reverse that habit by making documentation an integral part of the project, inspired by literate programming. With Jupyter Book, data practitioners can share Jupyter notebooks on GitHub Pages in a standardized, low-effort way.

  • docs/bibliography.bib

    A bibliography using the BibTeX format. Use this file to include and cite your project’s bibliography. See also Citations and bibliographies.

  • data/

    Placeholder folder for data. Data is immutable. By default, the data folder is present but ignored by version control, to prevent files from being mistakenly versioned in the code repository.

  • src/

    Placeholder folder for source code. If the code is Python, it is recommended that the package be made pip-installable.

  • notebooks/

    Placeholder folder for Jupyter notebooks. Markdown files and Jupyter notebooks can be added to docs/_toc.yml (Table of Contents) to compose the documentation.

  • .pre-commit-config.yaml

    Using pre-commit streamlines development by enforcing code standards and catching errors before code is committed or reaches review. It automates checks for syntax errors, code formatting, and compliance with coding standards, which saves time and improves code quality.

  • GitHub Actions and Dependabot

    GitHub Actions and Dependabot are two GitHub features that automate and secure software development workflows. GitHub Actions runs workflows such as building and publishing the documentation, and Dependabot proposes dependency updates, which helps keep the codebase maintainable and safe.

  • GitHub Issues and Pull Requests templates

    GitHub lets you customize how issues and pull requests are presented to the public. Custom templates encourage collaboration and maintainability.
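
As a quick sanity check, the structure listed above can be verified with a short script. This is a minimal sketch, not part of the template: the function name `missing_paths`, the `.md` extensions and the `.yaml` spelling of the pre-commit file are assumptions to adjust to your repository.

```python
from pathlib import Path

# Paths named in the list above. The ".md" extensions and the ".yaml"
# spelling of the pre-commit file are assumptions; adjust as needed.
EXPECTED_PATHS = [
    "README.md",
    "CODE_OF_CONDUCT.md",
    "CONTRIBUTING.md",
    "CITATION.cff",
    "LICENSE",
    "docs",
    "docs/bibliography.bib",
    "data",
    "src",
    "notebooks",
    ".pre-commit-config.yaml",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Report anything the current directory lacks.
    for path in missing_paths("."):
        print(f"missing: {path}")
```

Running it from the root of a freshly generated repository should print nothing if every listed item is present under the assumed names.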

Benefits#

Project templates on GitHub are essential for streamlining the data science and collaboration processes, and they offer several key benefits:

  • 🛠️ Consistency and Best Practices: Project templates encourage consistency in project structure, coding standards, and best practices. They provide a standardized starting point, ensuring that all team members follow the same guidelines and reduce the risk of introducing errors.

  • Time and Effort Savings: Templates save time by eliminating the need to set up a project from scratch. Developers can quickly start working on their projects without the overhead of configuring the initial project structure, dependencies, or workflows.

  • 🚀 Faster Onboarding: New team members or contributors can easily get up to speed by using project templates. It simplifies the onboarding process, allowing them to understand the project structure and development practices more quickly.

  • 🎨 Customization and Adaptability: GitHub project templates can be customized to suit the specific needs of different types of projects or organizations. They serve as a foundation that can be adapted to meet unique requirements.

  • 🤝 Community Engagement: Open-source projects can attract more contributors when they provide accessible project templates. These templates facilitate contributions by reducing the barriers to entry for potential collaborators.

  • 🔄 Version Control Integration: GitHub project templates are tightly integrated with Git version control. This makes it easier to manage changes, collaborate, and track the history of project configurations.

  • 📖 Documentation and Guidance: Templates often include documentation and guidance to help developers understand the project’s structure and how to get started. This can include README files, code comments, and links to relevant resources.

  • 🔍 Discoverability: Templates are discoverable on GitHub, making it easy for developers to find and use project templates for their preferred programming languages, frameworks, and tools. This helps build a supportive ecosystem.

  • ✍️ Continual Improvement: Project templates can evolve and improve over time as best practices, technology, and requirements change. This ensures that projects remain up to date and maintainable.

In summary, GitHub project templates are valuable resources that enhance project management, development practices, and collaboration. They promote consistency, efficiency, and quality in software development, whether for individual projects, open-source contributions, or within organizational contexts.

Important

With flexibility comes great responsibility. The template makes a few opinionated choices about the structure and the code and documentation management of a project, aimed at what we envision to be most cases. However, even the best template will never be perfect for every case out there. All in all, the template aims to encourage teams to treat collaborative coding, documentation, engineering, reproducibility and best practices as an integral part of the project, in a standardized way.

In this spirit, if the template is not for you or in case you have feedback, please consider opening an issue or submitting a pull request to share your ideas and suggestions. Your contributions would be appreciated immensely.

Usage#

Getting Started#

1. Create new repository from template#

The template is a GitHub template repository; in other words, you can generate a new GitHub repository with the same files and folders to use as the starting point for your project.

_images/github-template.png

Now, give your repository a name, choose the visibility (Public or Private) and click Create repository from template. Do not select Include all branches.

_images/github-template-create.png

Voilà! The repository has been created with the same files and folders of the template.

See also

For additional information, see the GitHub documentation.

2. Enable GitHub Actions and GitHub Pages#

After creating the repository from the template, you will have to enable GitHub Actions and GitHub Pages to allow the Jupyter Book to be built and published.

To activate the workflow, please enable GitHub Actions by going to the repository’s settings (Settings > Actions > General), and selecting read and write permissions as shown below.

_images/github-template-action-enable.png

To publish, please enable GitHub Pages by going to the repository’s settings (Settings > Pages), and selecting to deploy from the GitHub Actions option.

_images/github-template-pages.png

On the next push to main, the Jupyter Book will be automatically built and published. You can check the progress on the Actions tab.

_images/github-template-action.png

Caution

The documentation can be published from either public or private repositories. If publishing from a private repository, please remember to carefully select the content to be made public and to abide by your organization’s Data Privacy Policy.

3. Update configurations#

The template comes with a default docs/_config.yml Jupyter Book configuration file. Remember to update it to reflect your project’s name and details.

repository:
  url: https://github.com/worldbank/template
  branch: main
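
A forgotten update is easy to catch automatically. The following is a minimal sketch, assuming the repository URL sits on a `url:` line as in the snippet above. The helper names are hypothetical, and the line-based matching is a simplification rather than a full YAML parser.

```python
import re

# URL shipped in the template's default configuration (from the snippet above).
TEMPLATE_URL = "https://github.com/worldbank/template"


def repository_url(config_text):
    """Return the value of the first `url:` line in the config text, or None."""
    match = re.search(r"^[ \t]*url:[ \t]*(\S+)", config_text, flags=re.MULTILINE)
    return match.group(1).strip("\"'") if match else None


def still_points_at_template(config_text):
    """Return True if the configured repository URL is still the template's."""
    url = repository_url(config_text)
    return url is not None and url.rstrip("/") == TEMPLATE_URL
```

For example, `still_points_at_template(open("docs/_config.yml").read())` could run in a test or a pre-commit hook to flag a configuration that was never customized.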

4. Review and update README files#

The template comes with README files - including this README - that should provide anyone with the information about the first steps to use, learn and contribute to your project. Please replace and/or repurpose the files with instructions and detailed information about your project.

  • CODE_OF_CONDUCT

  • CONTRIBUTING

  • README

  • GitHub Issues and Pull Requests templates

See also

Awesome README

5. Choose a license#

The template is licensed under the Mozilla Public License. A LICENSE is the document that guarantees the repository can be shared and modified and can receive contributions. If no license is present, all rights are reserved.


Congratulations! You just created a beautiful home for your project. To access your project page, use (and share) the link as shown below.

🌟 https://<your-github-username>.github.io/<your-project-name>

For example, see this template as a live demo.
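
The link pattern above can also be built programmatically, for instance to print it at the end of a setup script. This is a small sketch. The function name `pages_url` is hypothetical, and `owner` stands for the GitHub username (or organization) that owns the repository.

```python
def pages_url(owner, project):
    """Build the GitHub Pages address following the pattern shown above."""
    return f"https://{owner}.github.io/{project}"


if __name__ == "__main__":
    # Placeholder values; replace them with your own account and repository.
    print(pages_url("your-github-username", "your-project-name"))
```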

Adding Content#

The template is created as a Jupyter Book - an open-source project to build beautiful, publication-quality books and documents from computational content. Let’s see below how to add, execute and publish new content for your project.

Table of Contents#

When you are ready to publish the documentation on GitHub Pages, edit the table of contents and add or update the content you would like to display. Jupyter Book supports content written as Markdown, Jupyter notebooks and reStructuredText files, and the docs/_toc.yml file controls the table of contents of your book.

The template comes with the table of contents below as an example.

format: jb-book
root: README

parts:
  - caption: Documentation
    numbered: True
    chapters:
      - file: notebooks/world-bank-api.ipynb
  - caption: Additional Resources
    chapters:
      - url: https://datapartnership.org
        title: Development Data Partnership
      - url: https://www.worldbank.org/en/about/unit/unit-dec
        title: World Bank DEC
      - url: https://www.worldbank.org/en/research/dime
        title: World Bank DIME
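
Before pushing, it can help to confirm that every local entry in the table of contents points at a file that exists. The sketch below is illustrative only. The helper names are hypothetical, the line-based matching stands in for a real YAML parser, and `base_dir` is assumed to be the folder against which your entries resolve.

```python
import re
from pathlib import Path

# Suffixes to try when a table-of-contents entry omits the file extension.
SUFFIXES = ("", ".md", ".ipynb", ".rst")

# Matches "root: <path>" and "- file: <path>" lines; "url:" entries are skipped.
ENTRY = re.compile(r"^[ \t]*(?:-[ \t]*)?(?:root|file):[ \t]*(\S+)", re.MULTILINE)


def toc_files(toc_text):
    """Return every path listed under `root:` or `file:` in the TOC text."""
    return ENTRY.findall(toc_text)


def missing_toc_files(toc_text, base_dir):
    """Return the TOC entries with no matching file under base_dir."""
    base = Path(base_dir)
    return [
        entry
        for entry in toc_files(toc_text)
        if not any((base / (entry + suffix)).is_file() for suffix in SUFFIXES)
    ]
```

For example, `missing_toc_files(Path("docs/_toc.yml").read_text(), "docs")` would list the entries that need attention, under the assumption that paths resolve relative to docs/.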

Dependencies#

The next step is to ensure your code is maintainable, reliable and reproducible by including any dependencies and requirements, such as packages, configurations, secrets (template) and additional instructions.

The template suggests using conda (or mamba) as the environment manager and, by convention, the environment is controlled by the environment.yml file.

The environment.yml file is where you specify the packages to install for your project, from the Anaconda repository as well as from Anaconda Cloud (including conda-forge). Be sure to pin the versions of the packages your project requires (including those used by Jupyter notebooks).

channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - bokeh=2.4.3
  - pandas=1.4.3
  - pip:
    - requests==2.28.1
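
To check that nothing was left without a version, a small script can scan the file. This is a rough sketch under stated assumptions: the function name `unpinned_dependencies` is hypothetical, the parsing is line-based rather than a real YAML parse, and any version operator (`=`, `==`, `>=` and so on) counts as a constraint.

```python
import re


def unpinned_dependencies(env_text):
    """Return specs under `dependencies:` that carry no version constraint."""
    unpinned = []
    in_dependencies = False
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith((" ", "-")):
            # A top-level key starts or ends the dependencies section.
            in_dependencies = stripped.startswith("dependencies:")
            continue
        if in_dependencies and stripped.startswith("- "):
            spec = stripped[2:].strip()
            if spec.endswith(":"):
                continue  # e.g. "- pip:" opens the nested pip list
            if not re.search(r"[=<>~!]", spec):
                unpinned.append(spec)
    return unpinned
```

Applied to the example above, the function returns an empty list because every entry specifies a version.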

To (re)create the environment on your installation of conda via anaconda, miniconda or preferably miniforge, you only need to pass the environment.yml file, which will install requirements and guarantee that whoever uses your code has the necessary packages (and correct versions). By default, the template uses Python 3.9.

conda env create -n <your-environment-name> -f environment.yml

In case your project uses Python, it is strongly recommended to distribute it as a package.

Important

The template contains an example - the datalab Python package - and will automatically find and install any src packages as long as pyproject.toml is kept up-to-date.
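
Once the environment is created, one way to confirm that the package was installed is to ask Python whether it can locate it. This is a minimal sketch. The helper name `is_importable` is hypothetical, and it is meant for top-level package names such as the example datalab package.

```python
from importlib import util


def is_importable(package_name):
    """Return True if Python can locate the top-level package in this environment."""
    return util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "datalab" is the example package named above; substitute your own.
    print("datalab importable:", is_importable("datalab"))
```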

Jupyter Notebooks#

Jupyter Notebooks can be beautifully rendered and downloaded from your book. By default, the template will render any files listed on the table of contents that have a notebook structure. The template comes with a Jupyter notebook example, notebooks/world-bank-api.ipynb, to illustrate.

Important

Optionally, Jupyter Book can execute notebooks during the build (on GitHub) and display code outputs and interactive visualizations as part of the documentation on the fly. In this case, Jupyter notebooks will be executed by GitHub Actions during build on each commit to the main branch. Thus, it is important to include all requirements and dependencies in the repository. In case you would like to ignore a notebook, you can exclude files from execution.

Code of Conduct#

The template maintains a Code of Conduct to ensure an inclusive and respectful environment for everyone. Please adhere to it in all interactions within our community.

License#

The template is licensed under the Mozilla Public License. Remember to replace the license if necessary. If open source, choose an open source license.