lint

Title

lint - detects and corrects bad coding practices in Stata do-files.

Syntax

lint "input_file" [using "output_file"] , [options]

The lint command can be broken into two functionalities:

  1. Detection identifies bad coding practices in a Stata do-file.

  2. Correction corrects bad coding practices in a Stata do-file.

If an output_file is specified with using, then the linter will apply the Correction functionality and will write a new file with corrections. If not, the command will only apply the Detection functionality, returning a report of suggested corrections and potential issues of the do-file in Stata’s Results window. Users should note that not all the bad practices identified in Detection can be amended by Correction.

For this command to run, you will need Stata version 16 or greater, Python, and the Python package Pandas installed. To install Python and integrate it with Stata, refer to this page. To install Python packages, refer to this page.

options            Description
verbose            Report bad practices and issues found on each line of the do-file.
nosummary          Suppress summary table of bad practices and potential issues.
indent(integer)    Number of whitespaces used when checking indentation coding practices (default: 4).
space(integer)     Number of whitespaces used instead of hard tabs when checking indentation practices (default: same as indent).
linemax(integer)   Maximum number of characters in a line when checking line length practices (default: 80).
excel(filename)    Save an Excel file of line-by-line results.
force              Allow the output file name to be the same as the input file name, overwriting the original do-file. This option is not recommended: the corrected do-file may break something in your code, so always keep a backup of the original.
automatic          Correct all bad coding practices without asking for confirmation of each one. By default, the command asks the user about each correction interactively after producing the summary report.
replace            Overwrite any existing output file.

Description

This package is based on the DIME Analytics Stata Style Guide.

Detect functionality

Bad style practices and potential issues detected:

Use whitespaces instead of hard tabs

  • Use whitespaces (usually 2 or 4) instead of hard tabs.

Avoid abstract index names

  • In for-loop statements, index names should describe what the code is looping over. For example, avoid writing code like this:
  foreach i of varlist cassava maize wheat { }
  • Instead, looping commands should name the index local descriptively:
  foreach crop of varlist cassava maize wheat { }

Use proper indentations

  • After declaring for-loop statements or if-else statements, add indentation with whitespaces (usually 2 or 4) in the lines inside the loop.
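
The rule above can be illustrated with a minimal sketch. It assumes Stata's bundled auto dataset and an indentation of 4 whitespaces; the variables and the threshold are only illustrative:

```stata
sysuse auto, clear

* Each nested block is indented by 4 whitespaces relative to its parent
foreach outcome of varlist price mpg {
    quietly summarize `outcome'
    if r(mean) > 100 {
        display "`outcome' has a mean above 100"
    }
    else {
        display "`outcome' has a mean of 100 or less"
    }
}
```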

Use indentations after declaring newline symbols (///)

  • After a new line statement (///), add indentation (usually 2 or 4 whitespaces).
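
As a short sketch of this practice, assuming the bundled auto dataset and 4 whitespaces of indentation, a command continued over several lines could look like this:

```stata
sysuse auto, clear

* Continuation lines are indented so the command reads as one unit
regress price mpg weight ///
    if foreign == 0,     ///
    vce(robust)
```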

Use the “!missing()” function for conditions with missing values

  • For clarity, use !missing(var) instead of var < . or var != .
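
A minimal comparison of the two styles, assuming the bundled auto dataset (where rep78 has some missing values). Both commands count the same observations, but the second states the intent directly:

```stata
sysuse auto, clear

* Discouraged: relies on knowing that missing values sort above all numbers
count if rep78 < .

* Preferred: says explicitly that non-missing values are wanted
count if !missing(rep78)
```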

Add whitespaces around math symbols (+, =, <, >)

  • For better readability, add whitespaces around math symbols. For example, write gen a = b + c if d == e instead of gen a=b+c if d==e.

Specify the condition in an “if” statement

  • Always explicitly specify the condition in the if statement. For example, declare if var == 1 instead of just using if var.

Do not use “#delimit”, instead use “///” for line breaks

  • More information about the use of line breaks here.
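
The sketch below contrasts the two approaches, assuming the bundled auto dataset. Note that #delimit only works inside a do-file, so this is meant to be run as one:

```stata
sysuse auto, clear

* Discouraged: changing the delimiter to a semicolon
#delimit ;
regress price mpg weight
    if foreign == 0 ;
#delimit cr

* Preferred: keep the default delimiter and continue lines with ///
regress price mpg weight ///
    if foreign == 0
```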

Do not use cd to change current folder

  • Use absolute and dynamic file paths. More about this here.
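
One possible way to follow this practice is sketched below. The global name project_dir and the file name auto_copy.dta are hypothetical, and the root is set to the current working directory only so that the sketch runs anywhere; in a real project it would be an absolute path defined once per machine:

```stata
* Hypothetical root folder, defined once at the top of the main do-file
global project_dir "`c(pwd)'"

sysuse auto, clear

* Discouraged: cd into the folder and then save "auto_copy.dta"
* Preferred: build every path from the root global
save "${project_dir}/auto_copy.dta", replace
use  "${project_dir}/auto_copy.dta", clear
```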

Use line breaks in long lines

  • For lines that are too long, use /// to divide them into multiple lines. It is recommended to restrict the number of characters in a line to 80 or less.

Use curly brackets for global macros

  • Always use ${ } for global macros. For example, use ${global_name} instead of $global_name.
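
A small sketch of why the brackets matter. The global names are illustrative, and the example assumes no global called outcome_log has been defined in the session:

```stata
global outcome "price"

* With curly brackets, Stata knows where the macro name ends
display "${outcome}_log"

* Without them, Stata looks for a global named outcome_log,
* which is not defined in this sketch, so nothing is substituted
display "$outcome_log"
```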

Include missing values in condition expressions

  • Condition expressions like var != 0 or var > 0 are evaluated to true for missing values. Make sure to explicitly take missing values into account by using missing(var) in expressions.
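
The effect can be seen in a minimal sketch, assuming the bundled auto dataset, where rep78 is missing for some cars. The two counts differ by exactly the number of observations with a missing rep78:

```stata
sysuse auto, clear

* Missing counts as larger than any number, so this count
* silently includes the cars with a missing rep78
count if rep78 > 3

* Explicitly excluding missing values gives the intended count
count if rep78 > 3 & !missing(rep78)
```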

Check if backslashes are not used in file paths

  • Avoid backslashes (\) in file paths; replace them with forward slashes (/). Users should note that the linter cannot always tell which backslashes belong to file paths. In general, this flag is raised whenever a backslash appears on the same line as a local, a global, or the cd command.
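
One reason to prefer forward slashes is that Stata treats a backslash placed directly before a macro as an escape character. The sketch below uses a hypothetical folder name and file name to show the difference:

```stata
local folder "raw"

* Forward slashes: the local is expanded as expected
display "data/`folder'/survey.dta"

* Backslash directly before the local: the macro is not expanded
display "data\`folder'\survey.dta"
```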

Check if tildes (~) are not used for negations

  • If tildes (~) are used for negations, replace them with bangs (!).

Correct functionality

Coding practices to be corrected:

Users should note that the Correct feature does not correct all the bad practices detected. It only corrects the following:

  • Replaces the use of #delimit with three forward slashes (///) in each line affected by #delimit

  • Replaces hard tabs with soft spaces (4 by default). The number of spaces can be set with the space() option

  • Indents lines inside curly brackets with 4 spaces by default. The number of spaces can be set with the indent() option

  • Breaks long lines into multiple lines. Lines longer than 80 characters are considered long by default; this limit can be changed with the linemax() option. Note that lines can only be split at whitespaces that are not inside parentheses, curly brackets, or double quotes. If a long line has no such whitespace, the linter will not be able to break it.

  • Adds a whitespace before opening curly brackets, except for globals

  • Removes redundant blank lines after closing curly brackets

  • Removes duplicated blank lines
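
To give a feel for some of the corrections listed above, the sketch below shows a hypothetical input as comments, followed by roughly what a corrected version could look like. It is an illustration only; the exact output of the linter may differ:

```stata
* Before (shown as comments): no whitespace before the opening bracket,
* no indentation inside the loop, and a duplicated blank line
*
*     foreach crop in cassava maize wheat{
*     display "`crop'"
*     }
*
*
*     display "done"

* After: whitespace before the bracket, 4-space indentation,
* and a single blank line
foreach crop in cassava maize wheat {
    display "`crop'"
}

display "done"
```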

If the option automatic is omitted, Stata will prompt the user to confirm each of these corrections, but only for the bad practices that were actually detected. If none are detected, it will show a message saying that none of the bad practices it can correct were found.

Examples

The following examples illustrate the basic usage of lint. Additional examples can be found here.

Detecting bad coding practices

The basic usage is to point to a do-file that requires revision as follows:

lint "test/bad.do"

For the detection feature you can use all the options but automatic, force, and replace, which are part of the correction functionality.

Options:

  1. Show bad coding practices line-by-line:
lint "test/bad.do", verbose
  2. Remove the summary of bad practices:
lint "test/bad.do", nosummary
  3. Specify the number of whitespaces used for detecting indentation practices (default: 4):
lint "test/bad.do", indent(2)
  4. Specify the number of whitespaces used instead of hard tabs for detecting indentation practices (default: same value used in indent):
lint "test/bad.do", space(6)
  5. Specify the maximum number of characters allowed in a line when detecting long lines (default: 80):
lint "test/bad.do", linemax(100)
  6. Export the results of the line-by-line analysis to Excel:
lint "test/bad.do", excel("test_dir/detect_output.xlsx")
  7. You can also use this command to test all the do-files in a folder:
lint "test/"

Correcting bad coding practices

The basic usage of the correction feature requires specifying the input do-file and the output do-file that will contain the corrections. If you do not include any options, the linter will ask you to confirm each type of correction for the bad practices it detected:

  1. Basic correction use (the linter will ask what to correct):
lint "test/bad.do" using "test/bad_corrected.do"
  2. Automatic use (Stata will correct the file automatically):
lint "test/bad.do" using "test/bad_corrected.do", automatic
  3. Use the same name for the output file (this overwrites the input file and is not recommended):
lint "test/bad.do" using "test/bad.do", automatic force
  4. Replace the output file if it already exists:
lint "test/bad.do" using "test/bad_corrected.do", automatic replace

Feedback, bug reports and contributions

Read more about these commands on this repo where this package is developed. Please provide any feedback by opening an issue. PRs with suggestions for improvements are also greatly appreciated.

Authors

DIME Analytics, The World Bank dimeanalytics@worldbank.org