Methods

Author

Affiliation

Distributional Impact of Policies. Fiscal Policy and Growth Department

This section covers methods and best practices for working with AI in data analysis. The two topics below complement each other: the workflow tells you how to structure an AI-assisted project, while the safeguarding guide tells you what to protect along the way.

AI workflow for data analysis

AI coding assistants are probabilistic, so the quality of their output depends heavily on the context you provide. A structured, eight-step workflow (objective → inputs/outputs → data dictionary → external metadata → checkpoint → iterative coding → independent verification → documentation) consistently produces better results than ad-hoc prompting. The workflow applies regardless of language (Stata, R, Python) or domain, and it gradually narrows the AI’s degrees of freedom until it can generate correct, reproducible code.
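As a rough illustration of the first five steps, the sketch below collects the objective, inputs/outputs, data dictionary, and external metadata into one context block and ends with a checkpoint request. The class name ProjectBrief, its fields, the section labels, and the example file and variable names are all hypothetical choices made for this sketch; the workflow itself does not prescribe any particular format.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectBrief:
    """Hypothetical container for the context supplied before any code is requested."""
    objective: str
    inputs: list[str]
    outputs: list[str]
    data_dictionary: dict[str, str]  # variable name -> description
    external_metadata: list[str] = field(default_factory=list)  # e.g. codebook notes or URLs

    def to_prompt(self) -> str:
        """Render the brief as a plain-text context block for an AI assistant."""
        lines = ["OBJECTIVE", self.objective, "", "INPUTS"]
        lines += [f"- {item}" for item in self.inputs]
        lines += ["", "OUTPUTS"]
        lines += [f"- {item}" for item in self.outputs]
        lines += ["", "DATA DICTIONARY"]
        lines += [f"- {name}: {desc}" for name, desc in self.data_dictionary.items()]
        if self.external_metadata:
            lines += ["", "EXTERNAL METADATA"]
            lines += [f"- {item}" for item in self.external_metadata]
        # Checkpoint step: ask for a restatement of the plan before any code is written.
        lines += [
            "",
            "CHECKPOINT",
            "Summarize your understanding and wait for confirmation before writing code.",
        ]
        return "\n".join(lines)


# Illustrative values only; the file and variable names are made up.
brief = ProjectBrief(
    objective="Estimate the poverty headcount ratio by region.",
    inputs=["household.dta (one row per household)"],
    outputs=["table_poverty_by_region.csv"],
    data_dictionary={
        "region": "Region code",
        "pc_cons": "Per capita consumption",
        "weight": "Sampling weight",
    },
)
prompt = brief.to_prompt()
print(prompt)
```

Keeping the brief in a structured object rather than free text makes it easy to reuse across iterations and to check that no step was skipped. The remaining steps (iterative coding, independent verification, documentation) happen in the conversation and in review, so they are not modeled here.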

Safeguarding confidential data

If sensitive microdata leaks into AI context, it can create data-breach, privacy, and reputational risks. This section identifies the four main exposure channels (prompts, console output, file reads, command-line searches) and provides concrete remedies: curating prompts, setting system instructions to block file access and raw-data printing, generating lightweight metadata files in place of raw data, relocating data outside the working folder, and applying statistical disclosure control techniques (suppression, generalization, noise addition, pseudonymization) with dedicated packages.
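Three of the remedies named above can be sketched with the Python standard library alone: a lightweight metadata summary that carries no raw values, pseudonymization of identifiers, and suppression of small cells. The function names, the threshold of 5, the 16-character truncation, and the toy records are assumptions made for illustration. This is not a substitute for the dedicated disclosure-control packages mentioned above, and generalization and noise addition are not shown.

```python
import hashlib
import hmac
from collections import Counter


def build_metadata(rows):
    """Summarize each column without copying any raw value into the result."""
    columns = sorted({name for row in rows for name in row})
    metadata = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        metadata[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "n_missing": len(values) - len(present),
            "n_distinct": len(set(present)),
        }
    return metadata


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex characters."""
    digest = hmac.new(secret_key, str(identifier).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def suppress_small_cells(categories, threshold=5):
    """Count categories and replace any count below the threshold with None."""
    counts = Counter(categories)
    return {cat: (n if n >= threshold else None) for cat, n in counts.items()}


# Synthetic records, invented for this example.
rows = [
    {"hh_id": 101, "region": "North", "income": 1200.0},
    {"hh_id": 102, "region": "North", "income": None},
    {"hh_id": 103, "region": "South", "income": 950.5},
]

meta = build_metadata(rows)  # safe to share: counts and type names only
pseudo_ids = [pseudonymize(r["hh_id"], b"keep-this-key-private") for r in rows]
cells = suppress_small_cells([r["region"] for r in rows], threshold=2)
print(meta["income"])
print(cells)
```

The metadata dictionary, not the data file, is what would be handed to the AI assistant. The keyed hash keeps identifiers linkable across files while hiding the originals, provided the key stays outside the AI's working folder. Even summary counts can be disclosive in very small samples, so the output still deserves a review before sharing.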

DIME Data Handbook Appendix: AI and reproducible research

This reference section is adapted from the DIME Data Handbook. It is included as online course material in a form that AI assistants can read as context.

External references

Tip

You can include these references as URLs directly in your prompts. Most AI coding assistants will fetch and read the linked content, which gives them richer context for code generation and advice.

Reproducible research and data analysis

Data privacy and anonymization

AI policies and responsible use