reprun-examples

reprun - examples

This section presents examples and outputs for the reprun command, demonstrating its use for ensuring reproducibility in analyses. For detailed information on the command, please refer to the helpfile.

Example 1

This is the most basic usage of reprun. Specified in any of the following ways, either in the Stata command window or as part of a new do-file, reprun will execute the complete do-file “myfile.do” once (Run 1), and record the “seed RNG state”, “sort order RNG”, and “data checksum” after the execution of every line, as well as the exact data in certain cases. reprun will then execute “myfile.do” a second time (Run 2), and find all changes and mismatches in these states throughout Run 2. A table of mismatches will be reported in the Results window, as well as in a SMCL file in a new directory called /reprun/ in the same location as “myfile.do”.

reprun "path/to/folder/myfile.do"

or

local myfolder "/path/to/folder"
reprun "`myfolder'/myfile.do"

Example 2

This example is similar to example 1, but the /reprun/ directory containing the SMCL file will be stored in the location specified by the using argument.

reprun "path/to/folder/myfile.do" using "path/to/report"

or

local myfolder "/path/to/folder"
reprun "`myfolder'/myfile.do" using "`myfolder'/report"

Example 3

Assume “myfile1.do” contains the following code:

sysuse census, clear
isid state, sort
gen group = runiform() < .5

Running a reproducibility check on this do-file using reprun will generate a table listing mismatches in Stata state between Run 1 and Run 2.

reprun "path/to/folder/myfile1.do"

A table of mismatches will be reported in the Results window, as well as in a SMCL file in a new directory called /reprun/ in the same location as “myfile1.do” and will look like:

--------------------------------------------------------------------------------------------------------------
    reprun output created by user wb558768 at 26 Sep 2024 11:24:39
    Operating System PC (64-bit x86-64) Windows 64-bit
    Stata MP - Version 18 running as version 14.1
--------------------------------------------------------------------------------------------------------------
 
    Checking file:
    +-> C:/Users/wb558768/reprun-example/myfile1.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 3      | Change   Change   DIFF   |                          | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Done checking file:
    +-> C:/Users/wb558768/reprun-example/myfile1.do
-------------------------------------------------------------------------------------------------------------

The table shows that Line 3 is flagged. Line 3 (gen group = runiform() < .5) generates a new variable group based on a random uniform distribution. The RNG state will differ between Run 1 and Run 2 unless the random seed is explicitly set before this command. As a result, a mismatch in the “seed RNG state” as well as “data checksum” will be flagged.

The issue can be resolved by setting a seed before the command:

sysuse census, clear
isid state, sort
set seed 346290
gen group = runiform() < .5

Running the reproducibility check on the modified do-file using reprun will confirm that there are no mismatches in Stata state between Run 1 and Run 2:

------------------------------------------------------------------------------------------------------------
    reprun output created by user wb558768 at 26 Sep 2024 11:29:35
    Operating System PC (64-bit x86-64) Windows 64-bit
    Stata MP - Version 18 running as version 14.1
------------------------------------------------------------------------------------------------------------
 
    Checking file:
    +-> C:/Users/wb558768/reprun-example/myfile1.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
+------------------------------------------------------------------------------------------------------------
No mismatches and/or changes detected
 
    Done checking file:
    +-> C:/Users/wb558768/reprun-example/myfile1.do
-------------------------------------------------------------------------------------------------------------

Example 4

Using the verbose option generates more detailed tables where any lines across Run 1 and Run 2 mismatch or change for any value.

reprun "path/to/folder/myfile1.do", verbose

In addition to the output in Example 3, it will also report line 2 for changes in “sort order RNG” and “data checksum:

-------------------------------------------------------------------------------------------------------------
    reprun output created by user wb558768 at 26 Sep 2024 11:26:38
    Operating System PC (64-bit x86-64) Windows 64-bit
    Stata MP - Version 18 running as version 14.1
-------------------------------------------------------------------------------------------------------------
 
    Checking file:
    +-> C:/Users/wb558768/reprun-example/myfile1.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 2      |                          | Change   Change   OK!    | Change   Change   OK!    |
| 3      | Change   Change   DIFF   |                          | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Done checking file:
    +-> C:/Users/wb558768/reprun-example/myfile1.do
-------------------------------------------------------------------------------------------------------------

Example 5

Assume “myfile2.do” contains the following code:

sysuse auto, clear
sort mpg 
gen sequence = _n

Running a reproducibility check on this do-file using reprun will generate a table listing mismatches in Stata state between Run 1 and Run 2.

reprun "path/to/folder/myfile2.do"

In “myfile2.do”, Line 2 sorts the data by the non-unique variable mpg, causing the sort order to vary between runs. This results in a mismatch in the “sort order RNG”. Consequently, Line 2 and Line 3 (gen sequence = _n) will be flagged for “data checksum” mismatches due to the differences in sort order, leading to discrepancies in the generated sequence variable, as shown in the results below:

-------------------------------------------------------------------------------------------------------------
    reprun output created by user wb558768 at 26 Sep 2024 11:27:34
    Operating System PC (64-bit x86-64) Windows 64-bit
    Stata MP - Version 18 running as version 14.1
-------------------------------------------------------------------------------------------------------------
 
    Checking file:
    +-> C:/Users/wb558768/reprun-example/myfile2.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 2      |                          | Change   Change   DIFF   | Change   Change   DIFF   |
| 3      |                          |                          | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Done checking file:
    +-> C:/Users/wb558768/reprun-example/myfile2.do
-------------------------------------------------------------------------------------------------------------

The issue can be resolved by sorting the data on a unique combination of variables:

sysuse auto, clear
sort mpg make
gen sequence = _n

Example 6

Using the compact option generates less detailed tables where only lines with mismatched seed or sort order RNG changes during Run 1 or Run 2, and mismatches between the runs, are flagged and reported.

reprun "path/to/folder/myfile2.do", compact

The output will be similar to Example 5, except that line 3 will no longer be flagged for “data checksum”:

-------------------------------------------------------------------------------------------------------------
    reprun output created by user wb558768 at 26 Sep 2024 11:30:59
    Operating System PC (64-bit x86-64) Windows 64-bit
    Stata MP - Version 18 running as version 14.1
-------------------------------------------------------------------------------------------------------------
 
    Checking file:
    +-> C:/Users/wb558768/reprun-example/myfile2.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 2      |                          | Change   Change   DIFF   | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Done checking file:
    +-> C:/Users/wb558768/reprun-example/myfile2.do
-------------------------------------------------------------------------------------------------------------

Example 7

reprun will perform a reproducibility check on a do-file, including all do-files it calls recursively. For example, the main do-file might contain the following code that calls on “myfile1.do” (Example 3) and “myfile2.do” (Example 5):

local myfolder "/path/to/folder"
do "`myfolder'/myfile1.do"
do "`myfolder'/myfile2.do"
reprun ""path/to/folder/main.do"

reprun on “main.do” performs reproducibility checks across “main.do”, as well as “myfile1.do”, and “myfile2.do” and the result will look like:

------------------------------------------------------------------------------------------------------------
    reprun output created by user wb558768 at 26 Sep 2024 11:33:05
    Operating System PC (64-bit x86-64) Windows 64-bit
    Stata MP - Version 18 running as version 14.1
------------------------------------------------------------------------------------------------------------
 
    Checking file:
    +-> C:/Users/wb558768/reprun-example/main.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
+------------------------------------------------------------------------------------------------------------
No mismatches and/or changes detected
 
    Stepping into sub-file:
    +-> C:/Users/wb558768/reprun-example/main.do
    +--> C:/Users/wb558768/reprun-example/myfile1.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 3      | Change   Change   DIFF   |                          | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Stepping back into file:
    +-> C:/Users/wb558768/reprun-example/main.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 2      | Change   Change   DIFF   | Change   Change   DIFF   | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Stepping into sub-file:
    +-> C:/Users/wb558768/reprun-example/main.do
    +--> C:/Users/wb558768/reprun-example/myfile2.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 2      |                          | Change   Change   DIFF   | Change   Change   DIFF   |
| 3      |                          |                          | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Stepping back into file:
    +-> C:/Users/wb558768/reprun-example/main.do
+------------------------------------------------------------------------------------------------------------
|        |      Seed RNG State      |      Sort Order RNG      |      Data Checksum       |
| Line # | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Run 1  | Run 2  | Match  | Loop iteration:
|--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------------
| 3      |                          | Change   Change   DIFF   | Change   Change   DIFF   |
+------------------------------------------------------------------------------------------------------------
 
    Done checking file:
    +-> C:/Users/wb558768/reprun-example/main.do
-------------------------------------------------------------------------------------------------------------

The output will include tables for each do-file, illustrating the following process:

  • main.do: The initial check reveals no mismatches in “main.do”, indicating no discrepancies introduced directly by it.

  • Sub-file 1 (“myfile1.do”) : reprun steps into “myfile1.do”, where Line 3 is flagged for mismatches, as shown in Example 3. This table will show the issues specific to “myfile1.do”.

  • Return to “main.do”” : After checking “myfile1.do”, reprun returns to “main.do”. Here, Line 2 is flagged because it calls “myfile1.do”, reflecting the issues from the sub-file.

  • Sub-file 2 (“myfile2.do”): reprun then steps into “myfile2.do”, where Line 2 is flagged for mismatches, as detailed in Example 5.

  • Return to “main.do” (final check) : After checking “myfile2.do”, reprun returns to”main.do”. Line 3 in “main.do” is flagged due to the issues in “myfile2.do” propagating up.

In summary, reprun provides a comprehensive view by stepping through each do-file, showing where mismatches occur and how issues in sub-files impact the main do-file.