25. File Management and Code Dissemination

25.1. Summary

In this section, we present the best practices for file management and code dissemination that we follow at the World Bank Data Lab.

Drawing on its experience with several Data Science projects, the Data Lab team created this GitHub repository template as a starting point for each project we work on. The repository also serves as a guide to best practices for managing Data Science projects.

25.2. Learning Objectives

25.2.1. Overall goals

The main goal of this class is to teach students how to manage data files, document code, and disseminate work as a web book.

25.2.2. Specific goals

At the end of this notebook, you should have gained an understanding and appreciation of the following:

  1. GitHub Template:

    • How to create a GitHub repository.

    • How the GitHub workflow operates and which best practices to follow.

    • How to create the GitHub repository for your own project.

  2. Data Management:

    • How to manage and store data securely.

    • How to handle data versioning and its challenges.
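To make the data versioning goal more concrete, one common approach is to record a checksum for every data file, so that changes can be detected without committing the data itself to Git. The sketch below is only an illustration under stated assumptions and is not part of the Data Lab template: the function names (`sha256_of`, `build_manifest`, `changed_files`), the `data` folder, and the `data_manifest.json` file are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file under data_dir (relative POSIX path) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )


if __name__ == "__main__":
    # "data" and "data_manifest.json" are hypothetical names for illustration.
    manifest = build_manifest(Path("data"))
    Path("data_manifest.json").write_text(json.dumps(manifest, indent=2))
```

A small manifest like this can be committed alongside the code while the data files themselves stay in secure storage; comparing two manifests with `changed_files` then shows which files differ between versions.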

25.3. Data Science Projects GitHub Template

The World Bank Data Lab uses this template to manage its Data Science projects. We will explore it using this presentation.

25.4. Practice

To prepare for the final project, create a repository from the template presented in this section.