Using repado for Ado-File Management
This is a comprehensive guide on how to use repado. For a shorter and more technical description of the syntax, refer to the helpfile.
Background
To ensure long-term reproducibility of a research package, it is essential to share both the exact version of the project code and the exact versions of its dependencies. Dependencies refer to any commands used in the project code that are not defined within the code itself. This includes both Stata’s built-in commands and community-contributed commands installed from external sources.
The code for these dependencies is just as important for reproducibility as the project code itself. Stata offers robust and user-friendly version control for its built-in commands, which is briefly discussed in this article. However, the main focus here is on using repado, a command designed to enable long-term version control for community-contributed commands.
Both built-in and community-contributed commands are periodically updated to introduce improvements or fix bugs. While it is generally advisable to use the latest versions when starting a new project, research reproducibility—and, by extension, research transparency—requires that results can be reproduced exactly as originally generated, even if the original process was suboptimal or contained bugs.
Version Control of Stata’s Built-In Commands
Stata makes version control of its built-in commands straightforward through the version command. Built-in commands change only when a new Stata release is issued, and by specifying a version number, you ensure that commands like regress, generate, etc., behave exactly as they did in that release.
For example, placing the following at the top of your do-file ensures compatibility with Stata 14.1:
version 14.1
You can add this line to each script, but if your project uses a main script for reproducibility, it’s sufficient to include it only at the top of that main script.
Note that you can only set the version to your current Stata release or an earlier one—not to a newer release. To use features from a newer version, you must purchase an upgrade to your Stata installation. While newer releases generally offer improvements, you should not set the version higher than the oldest Stata version used by any project collaborator. Sub-releases (e.g., 14.1) often include important fixes and are typically available for free to users of the main release. If a sub-release exists for your target version, it is advisable to specify it.
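As a minimal sketch of how this might look at the top of a main script, the lines below set the version and then print both the version in effect and the release of the installed Stata. The file name main.do is hypothetical; c(version) and c(stata_version) are Stata's built-in system values.

```stata
* main.do (hypothetical name for the project's main script)
* Run all code below as Stata 14.1 would have interpreted it
version 14.1
* c(version) is the version currently in effect;
* c(stata_version) is the release of the installed Stata
display "Version in effect: " c(version)
display "Installed Stata:   " c(stata_version)
```

If the installed Stata is older than the version requested, the version line itself stops with an error, which is usually the desired behavior in a reproducibility package.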
Version Control of Community-Contributed Commands
Many other programming languages have version-controlled repositories for community-contributed commands (or libraries). For example, R uses CRAN, and Python uses sources like PyPI (through pip) and conda. With these systems, it is sufficient to share the exact versions of code dependencies used in a reproducibility package to ensure it runs identically to how it did at the time of publication.
In Stata, however, the primary source for community-contributed commands is SSC (Statistical Software Components). While packages on SSC are versioned, SSC only provides access to the most recent version of each command, not to previous versions. Once a dependency from SSC used in a reproducibility package is updated, it is no longer possible to execute the code identically to how it was executed at the time of publication.
In practice, most updates to dependencies do not change results. However, some updates definitely will, or may even cause the reproducibility code to crash. In either case, the results in the reproducibility package can no longer be considered reproducible.
The repado command in the repkit package provides a Stata solution for long-term version control of community-contributed commands. Since the main source of commands in Stata is not fully versioned, and commands are sometimes published from independent sources, this tool functions differently from common tools in R and Python.
Using repado
How Stata Installs Commands
When you install commands in Stata using ssc install or net install, Stata downloads the necessary files and saves them in a designated folder called the “PLUS folder”. The PLUS folder is included in Stata’s ado-paths. You can view your current ado-path settings by running the adopath command in Stata. The output will look something like this:
[1] (BASE) "C:\Program Files\Stata17\ado\base/"
[2] (SITE) "C:\Program Files\Stata17\ado\site/"
[3] "."
[4] (PERSONAL) "C:\Users\user123\ado\personal/"
[5] (PLUS) "C:\Users\user123\ado\plus/"
[6] (OLDPLACE) "c:\ado/"
Stata’s built-in commands are stored in the BASE folder. You should never modify this path or any of its contents. In addition to BASE and PLUS, there are other paths where Stata looks for commands when executing code. These are typically only used in special setups or exist for backward compatibility.
The order of this list matters. Stata first looks for a command in path [1], and if the command is not found there, it moves to [2], and so on. If no command is found in any of the listed paths, Stata throws an error stating that the command is unrecognized.
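To see this lookup in practice, the which command reports where Stata finds a given command. The sketch below is only illustrative; estout is used as an example name and may not be installed on your machine, which is why that call is wrapped in capture.

```stata
* List the ado-paths in the order Stata searches them
adopath
* Report where a command that ships with Stata is found
which regress
* A community-contributed command is found only if it sits in one of the ado-paths;
* capture noisily shows the message without stopping the do-file if it is missing
capture noisily which estout
display "Return code from which: " _rc
```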
Advanced notes: Stata identifies commands solely by their names, not by their version, source, or author. For example, if you create a command named regress and save it in the PLUS folder, Stata will still use the built-in regress command from the BASE folder, since BASE has higher priority in the ado-paths. Similarly, if you create a command called estout (which is also the name of a widely used community-contributed command) and save it in your ado-paths, Stata will use your version as long as you do not have estout from SSC installed in a path with higher priority. If you run code from someone else who used estout from SSC, Stata will execute your version without any indication that it is a different command.
Principle Behind repado
Stata allows users to modify the ado-paths, and repado leverages this to manage project-specific ado-folders. The team creates an ado-folder within the project directory and points repado to that folder. repado then sets the PLUS folder to the project ado-folder and removes all other ado-paths except for the BASE folder. Note that Stata restores the default ado-paths each time it is restarted, so changes made by repado only last for the current Stata session.
While repado is configured to use the project ado-folder, commands installed via ssc install or net install will be placed in that folder. Anyone running project code that uses repado will only use the commands installed in the project ado-folder. It does not matter if another user has a different version of a command in their default PLUS folder or does not have it installed at all: the project will always use the version in the project ado-folder.
When sharing a reproducibility package, the project ado-folder should be distributed alongside the project code. This ensures that anyone reproducing the results, now or in the future, will use exactly the same versions (in practice, the very same command files) as the original author. This approach works regardless of which version is the most recent at that time, and regardless of any other changes made to the original source of the command after the package was created.
Important note: When sharing third-party code as part of your reproducibility package, ensure you have the appropriate licensing rights. Commands installed from SSC are distributed under the GPL v3 (https://www.gnu.org/licenses/gpl-3.0.txt), which allows redistribution as part of a reproducibility package, provided the commands are not modified or relicensed. However, licensing terms may change, so it is your responsibility to verify your rights at the time of publication. If you have installed commands from sources other than SSC, check the applicable license. If in doubt, contact the author for permission.
Setting Up the Project’s Ado-Folder
Using repado is best done at the beginning of a project, but it can be implemented at any stage. The first step is to create the ado-folder.
We recommend creating the ado-folder inside the code directory, as this simplifies the process of eventually creating a reproducibility package. Make it a separate folder within the code directory, and only modify the contents of the ado-folder using ssc install or net install. A suitable name for this folder is ado, but there is no technical requirement for the name; it can be named anything you prefer.
A Simple Example
Consider a project with the following folder structure, which will be used throughout this guide:
myproject/
├─ code/
│ ├─ ado/
│ ├─ projectcode/
├─ data/
├─ outputs/
In this case, the basic use of repado is as follows:
* Set user root folder
global root "C:\Users\user123\github\myproject"
* Point repado to the ado-folder
repado using "${root}/code/ado"
If you now run the adopath command, the output will be:
[1] (BASE) "C:\Program Files\Stata17\ado\base/"
[2] (PLUS) "C:\Users\user123\github\myproject/code/ado"
Note three key aspects after running repado:
- The BASE ado-folder remains unchanged. Removing or modifying it would make Stata’s built-in commands unavailable.
- The PLUS folder now points to the project ado-folder. Regardless of how the project folder is shared (e.g., via Git/GitHub, cloud services like OneDrive or Dropbox, network drives, or even as a .zip archive), the exact command files will be available to anyone with access to the project folder.
- All other ado-paths are removed, so commands installed by the user in other locations (which is uncommon) are no longer available. While this might seem inconvenient at first, it ensures that all commands needed for the project are contained within the project’s ado-folder.
Remember that all settings discussed here are reset when Stata is restarted.
Installing Commands in the Ado-Folder
As mentioned earlier, ssc install and net install place packages in the PLUS folder. Since the PLUS folder now points to the project ado-folder, any actions like ssc install, ssc uninstall, ado update, and adoupdate will affect only the project ado-folder, as long as Stata has not been restarted since repado was run.
In the repado workflow, you should avoid including ssc install or any other commands that install or update community-contributed commands within your do-files. Instead, these actions should be performed interactively through the Command window in Stata’s main window.
Only one user needs to install each required command into the project ado-folder. After that, any other team member, or any future reproducer, can run the code using the version of the commands in the project ado-folder, without needing to install anything in their own Stata installations.
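As an illustration, a session in the Command window might look like the sketch below. It assumes that repkit is already installed, that the folder structure from the example above exists, and that estout is one of the commands the project needs; the package name is only an example.

```stata
* Typed interactively in the Command window, not saved in a project do-file
global root "C:\Users\user123\github\myproject"
repado using "${root}/code/ado"
* Install the command; the files are saved in the project ado-folder
ssc install estout
* Confirm that Stata now finds the command in the project ado-folder
which estout
* List the packages recorded in the current PLUS folder
ado dir
```

Once the new files in the ado-folder are shared with the rest of the team, nobody else needs to repeat the installation.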
When to Use nostrict Mode
repado offers a nostrict option, which is only intended to be used temporarily (if at all). In nostrict mode, the project ado-folder is set as the PERSONAL directory instead of PLUS. This gives PERSONAL the second-highest priority after BASE, meaning commands in the project ado-folder will take precedence over those in the user’s default PLUS folder.
The nostrict mode is useful when users want access to commands already installed in their default PLUS folder and wish to test them within the project before deciding to install them in the project ado-folder. This can be especially helpful in large teams, where users may want to experiment before making a command available to everyone. It is also useful for using commands with meta-functionality (such as linters or tools for removing unused variables) that are not directly involved in generating project results.
However, we strongly recommend that, even if nostrict mode is used during some part of the development, the final reproducibility package should be created in strict mode (i.e., without the nostrict option). This ensures that all dependencies required by the project code are either built-in commands or are installed in the project ado-folder, guaranteeing full reproducibility.
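A sketch of how the two modes might be used during development is shown below, assuming the same folder structure as in the earlier example and that repkit is installed. The nostrict option is the one described in this section; everything else follows the basic example.

```stata
global root "C:\Users\user123\github\myproject"
* Temporary, while experimenting: commands in the user's own PLUS folder stay available
repado using "${root}/code/ado", nostrict
adopath
* Before building the reproducibility package: switch back to strict mode
repado using "${root}/code/ado"
adopath
```

Comparing the two adopath listings is a quick way to see which folders are in play in each mode.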
repado Cannot Install Itself
One limitation of repado is that, while it can ensure all of a project’s code dependencies are provided, it cannot install itself. Even if repado is included in the project ado-folder, it must still be installed in each user’s default PLUS folder to manage ado-folders in the first place.
Below, we describe our recommended mitigation for this limitation, as well as an alternative approach that achieves the same results as repado but requires more code and leaves the project team responsible for setting it up correctly.
Mitigation
A recommended mitigation is to include code that checks whether repkit is already installed in the user’s PLUS folder. Do-files intended for broad distribution should never automatically install packages on a user’s system, as this is generally considered poor practice. Instead, it is best to provide a clear and polite prompt instructing the user to install repkit if it is not already present. See the example below:
* Check if repkit is installed
cap which repkit
if _rc == 111 {
    di as error "{pstd}You need to have {cmd:repkit} installed to run this reproducibility package. Click {stata ssc install repkit} to do so.{p_end}"
}
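The check above only prints a message, so the do-file would continue and fail later when repado is called. One possible extension, offered here only as a suggestion and not as part of the recommendation above, is to stop execution right after the message:

```stata
* Check if repkit is installed; which returns error code 111 when a command is not found
capture which repkit
if _rc == 111 {
    display as error "{pstd}You need to have {cmd:repkit} installed to run this reproducibility package. Click {stata ssc install repkit} to do so.{p_end}"
    * Stop the do-file here with the same return code
    exit 111
}
```

Stopping early keeps the installation prompt as the last thing the user sees, rather than burying it under later errors.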
Alternative
As an alternative, the code underlying repado can be copied directly into the project’s main do-file. This approach replicates the functionality of repado within the project code, achieving the same results without requiring users to have repkit installed. Since repkit is published under the MIT license, this is fully permitted.
The ado-folder must still be created as described above. Afterward, you can add that folder using one of the two examples below. Both examples assume the same folder structure as previously outlined.
Example 1: In this example, the same results are achieved as when using repado in strict mode. This version maintains only the project ado-folder and the BASE folder in the ado-paths. After running this code, you can install commands in the project ado-folder using ssc install, and only commands in the project ado-folder or built-in commands will be available to your project.
* Set user root folder
global root "C:\Users\user123\github\myproject"
* Set PLUS to the project ado-folder and add it to the ado-paths.
sysdir set PLUS "${root}/code/ado"
adopath ++ PLUS
adopath ++ BASE
* Remove all ado-paths with rank 3 or higher, leaving only BASE and PLUS.
local morepaths 1
while (`morepaths' == 1) {
    capture adopath - 3
    if _rc local morepaths 0
}
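To confirm that the code in Example 1 had the intended effect, one option is to inspect c(adopath), the system value in which Stata stores the search path as a semicolon-separated list. The sketch below is meant to be run right after the code above. The counting approach assumes that the remaining entries are codewords such as BASE and PLUS, with no spaces in them.

```stata
* Copy the current search path into a local macro
local adopaths "`c(adopath)'"
display "Current ado-paths: `adopaths'"
* Replace the separators with spaces and count the entries
local spaced : subinstr local adopaths ";" " ", all
local npaths : word count `spaced'
if `npaths' == 2 {
    display "Only two ado-paths remain, as intended."
}
else {
    display as error "Expected 2 ado-paths but found `npaths'."
}
* Show which folder PLUS now points to
display "PLUS folder: `c(sysdir_plus)'"
```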
Example 2: This example is similar to the first, but does not use the PLUS folder. By avoiding PLUS, you eliminate the risk of users accidentally running adoupdate and updating all commands in the project ado-folder. This is especially helpful if you are not using Git, as such changes cannot be easily reverted. If you are using Git and tracking the ado-folder in your repository, this distinction is less critical.
* Set user root folder
global root "C:\Users\user123\github\myproject"
* Add the project ado-folder as an unnamed path, then add BASE.
* This results in BASE as the first path and the project ado-folder as the second.
adopath ++ "${root}/code/ado"
adopath ++ BASE
* Remove all ado-paths with rank 3 or higher, leaving only BASE and the project ado-folder.
local morepaths 1
while (`morepaths' == 1) {
    capture adopath - 3
    if _rc local morepaths 0
}
You can create a future-proof reproducibility package using repado or either of the two code examples above. The repado command is provided to simplify this process and abstract away technical details. Ultimately, it is up to each team to choose the method that best fits their workflow.