lint

Title

lint – Detects and corrects bad coding practices in Stata do-files.

Syntax

lint "input_file" [using "output_file"] [, options]

The lint command operates in two modes:

  1. Detection mode identifies bad coding practices in Stata do-files and reports them.

  2. Correction mode applies corrections to a Stata do-file based on the issues detected.

In detection mode, the command displays suggested corrections and potential issues in Stata’s Results window.
Correction mode is activated when an output_file is specified with using; the command then writes a new file with the applied corrections to output_file.
Note that not all issues flagged in detection mode can be automatically corrected.
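A typical session combines the two modes. First detect issues, then write a corrected copy, then run detection again on that copy to see which issues still need manual attention. The sketch below reuses the example paths from the Examples section (test/bad.do and test/bad_corrected.do); substitute your own files.

```stata
* 1. Detection mode: report issues only; test/bad.do is not modified
lint "test/bad.do"

* 2. Correction mode: write a corrected copy, prompting for each fix
lint "test/bad.do" using "test/bad_corrected.do"

* 3. Detection mode on the corrected copy: lists what remains, since
*    not every flagged issue can be corrected automatically
lint "test/bad_corrected.do"
```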

To use this command, you need Stata version 16 or higher, Python, and the Pandas Python package installed. For instructions on installing Python and integrating it with Stata, see this guide. For installing Python packages, refer to this guide.
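To confirm that these requirements are met, you can run a few built-in Stata commands before calling lint. The sketch below is a suggested check, not part of lint itself. python query reports the Python installation Stata is linked to, and capture python: import pandas tests whether pandas can be loaded from that installation.

```stata
* Stop with an error if this Stata is older than version 16
version 16

* Report the Python installation that Stata is currently linked to
python query

* Try importing pandas; a nonzero return code means it could not be loaded
capture python: import pandas
if _rc {
    display as error "pandas could not be imported from Stata's Python"
}
else {
    display as text "pandas is available"
}
```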

options              Description
---------------------------------------------------------------------------
verbose              Shows a report of all bad practices and issues flagged by the command.
nosummary            Suppresses the summary table with counts of bad practices and potential issues.
excel(filename)      Saves the verbose output in an Excel file.
indent(integer)      Number of whitespaces used when checking indentation (default: 4).
linemax(integer)     Maximum number of characters in a line (default: 80).

Options specific to the correction mode

options              Description
---------------------------------------------------------------------------
automatic            Suppresses the prompt asking users which correction to apply.
space(integer)       Number of spaces used to replace hard tabs in indentation (default: the value of indent(), or 4 if indent() is not specified).
replace              Allows the command to overwrite an existing output file.
force                Allows input_file to be the same as output_file. Not recommended; see below.

Description

This command is a linting tool for Stata code that helps standardize code formatting and identify bad practices.
For further discussion of linting tools, see https://en.wikipedia.org/wiki/Lint_(software).

The linting rules used in this command are based on the DIME Analytics Stata Style Guide.
All style guides are inherently subjective, and preferences differ.
The exact list of rules used by this command can be found in this article in the repkit web documentation.
See the list of rules and the DIME Analytics Stata Style Guide for a discussion of the motivation behind these rules.

Options

verbose displays a detailed report of all bad practices and issues flagged by the command in the Results window. By default, only a summary table with counts for each linting rule is shown.

nosummary suppresses the summary table of flagged occurrences.

excel(filename) exports the verbose output to an Excel file at the specified location.
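To filter or tabulate the exported report in Stata, you can load it back with the built-in import excel command. This sketch assumes the file was created with excel("test_dir/detect_output.xlsx"), as in the Examples section, and that its first row holds column headers. The column names are not documented here, so inspect them with describe before using them.

```stata
* Load the first worksheet of the exported report, using the first
* row as variable names (an assumption about the file's layout)
import excel using "test_dir/detect_output.xlsx", firstrow clear

* Inspect the variables and the number of rows before any analysis
describe
count
```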

indent(integer) sets the number of whitespaces used when checking indentation. Default: 4.

linemax(integer) sets the maximum number of characters allowed in a single line. Default: 80.

Options specific to the correction mode

automatic suppresses the interactive prompt before applying corrections. By default, the command asks for confirmation before applying identified corrections.

space(integer) sets the number of spaces used to replace hard tabs in indentation. Default: the value of indent(), or 4 if indent() is not specified.

replace allows overwriting an existing output file.

force allows the output file name to be the same as the input file, overwriting the original do-file. This is not recommended; see details in the section below.
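If you do decide to use force, keeping a copy of the original do-file first makes the change reversible. A minimal sketch using Stata's built-in copy command follows; the backup file name test/bad_backup.do is hypothetical.

```stata
* Save a backup of the original do-file (replacing any older backup)
copy "test/bad.do" "test/bad_backup.do", replace

* Overwrite test/bad.do with its corrected version
lint "test/bad.do" using "test/bad.do", automatic force
```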

Examples

The following examples illustrate basic uses of lint. The example file bad.do used below can be downloaded here.

Additional examples with more detailed explanations can be found here.

Detecting bad coding practices

  1. The basic usage is to point lint to a do-file that needs revision:
lint "test/bad.do"
  2. Show bad coding practices line by line:
lint "test/bad.do", verbose
  3. Suppress the summary table of bad practices:
lint "test/bad.do", nosummary
  4. Specify the number of whitespaces used when checking indentation (default: 4):
lint "test/bad.do", indent(2)
  5. Specify the maximum number of characters allowed in a line when checking line length (default: 80):
lint "test/bad.do", linemax(100)
  6. Export the results of the line-by-line analysis to Excel:
lint "test/bad.do", excel("test_dir/detect_output.xlsx")
  7. Lint all the do-files in a folder:
lint "test/"
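Detection options can be combined in a single call. The sketch below assumes a project convention of 2-space indentation and 100-character lines (values chosen only for illustration); it lists every flagged line and saves the same report to Excel. The output path test_dir/bad_report.xlsx is hypothetical.

```stata
* Create the output folder if it does not already exist
capture mkdir "test_dir"

* Check test/bad.do against 2-space indentation and a 100-character
* line limit, list each flagged line, and export the report to Excel
lint "test/bad.do", indent(2) linemax(100) verbose excel("test_dir/bad_report.xlsx")
```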

Correcting bad coding practices

Basic use of the correction mode requires specifying the input do-file and the output do-file that will contain the corrections. Unless you specify the automatic option, the linter asks you, for each bad practice detected, whether it should be corrected:

  1. Basic correction (the linter asks what to correct):
lint "test/bad.do" using "test/bad_corrected.do"
  2. Correction while setting the number of spaces used to replace hard tabs:
lint "test/bad.do" using "test/bad_corrected.do", space(2)
  3. Automatic correction (corrections are applied without prompting):
lint "test/bad.do" using "test/bad_corrected.do", automatic
  4. Use the same name for the output file (this overwrites the input file and is not recommended):
lint "test/bad.do" using "test/bad.do", automatic force
  5. Replace the output file if it already exists:
lint "test/bad.do" using "test/bad_corrected.do", automatic replace
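The correction examples above each take a single do-file. To correct every do-file in a folder, one option is to loop over the files with Stata's dir extended macro function, as sketched below. The folder names test and test_corrected are hypothetical. The sketch writes corrected copies to a separate folder instead of using force, then runs detection on that folder to show what still needs manual attention.

```stata
* Hypothetical folders: corrected copies go to a separate folder
local indir  "test"
local outdir "test_corrected"
capture mkdir "`outdir'"

* List every .do file directly inside the input folder (not recursive)
local dofiles : dir "`indir'" files "*.do"

foreach f of local dofiles {
    display as text "Correcting `f'"
    lint "`indir'/`f'" using "`outdir'/`f'", automatic replace
}

* Detect any issues that could not be corrected automatically
lint "`outdir'/"
```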

Feedback, bug reports and contributions

Read more about these commands in the repository where this package is developed. Please provide any feedback by opening an issue. Pull requests with suggestions for improvements are also greatly appreciated.

Authors

DIME Analytics, The World Bank dimeanalytics@worldbank.org