Module 3: Gen AI and LLM Applications in Statistics
Module Objectives
This module explores both current and potential applications of Gen AI and LLMs in the field of statistics. It covers the entire statistical life cycle, from data collection and processing to analysis and dissemination, and examines how LLMs can enhance each stage. For instance, Ask a Question (AAQ) platforms can interpret natural language queries and respond with relevant statistical information. Learners will be introduced to tools for creating accessible platforms, such as a WhatsApp bot, that answer questions using statistical data as their knowledge base.
Module Topics
Qualitative and Multi-Modal Data Analysis with LLMs
Advanced Image Analysis in Statistical Data Collection
Text Data Analysis with LLMs
Applications like sentiment analysis, parsing web-scraped price data, analyzing qualitative research data, and more.
Audio Data Processing and Analysis with Speech Models
For example, processing audio from focus group discussions (FGDs) or interviews.
LLM Applications in Data Dissemination
LLMs in data discovery: Semantic search vs. keyword search.
Enhancing and automating metadata generation with LLMs.
Statbots: Chatbots that can respond to statistical queries.
Concepts in LLM Statbots
LLMs’ quantitative reasoning abilities and capacity to work with tabular data.
Strategies for connecting an LLM to statistical data: Text2SQL, Text2API, Text2Code, and more.
Tools for parsing and working with tabular documents (e.g., DocumentLLM, LangChain SQL agent).
Security considerations for Text2SQL and database connections.
Building a Statbot
LLM selection guide.
Tool selection.
Deploying statbots on platforms like WhatsApp, websites, and more.
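To make the Text2SQL and security topics above more concrete, here is a minimal sketch of one possible guard around model-generated SQL. It is an illustration under stated assumptions, not the course's implementation: the `indicators` table, its rows, and the helper names `is_safe_select` and `run_generated_sql` are all hypothetical, and the query string stands in for what an LLM might return. The check is deliberately conservative (it also rejects some valid queries, such as those containing a semicolon inside a string literal or starting with `WITH`).

```python
import sqlite3


def is_safe_select(sql: str) -> bool:
    """Accept only a single statement that begins with SELECT."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:  # reject stacked statements such as "SELECT 1; DROP ..."
        return False
    return stripped.lower().startswith("select")


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Run model-generated SQL only if it passes the allow-list check."""
    if not is_safe_select(sql):
        raise ValueError("Only single SELECT statements are allowed")
    return conn.execute(sql).fetchall()


# Hypothetical demo data; a real statbot would connect to an existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (region TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [("North", 2022, 10.0), ("South", 2022, 20.0)],
)
conn.commit()

# Second layer of defence: ask SQLite itself to refuse writes on this connection.
conn.execute("PRAGMA query_only = ON")

# Stand-in for the SQL an LLM might produce from a natural language question.
generated_sql = "SELECT region, value FROM indicators ORDER BY region"
rows = run_generated_sql(conn, generated_sql)
print(rows)
```

String checks like this are easy to bypass on their own, so in practice they are usually combined with database-level restrictions such as a read-only account or connection.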
Practical Labs
Lab 1: Building a Health Statbot
Participants will use a set of provided documents to build a retrieval-augmented generation (RAG) app that queries the documents with an open-source LLM. The lab will result in a shareable Streamlit app.
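As a rough illustration of the retrieval step in a RAG app, the sketch below ranks passages against a question and assembles a prompt for an LLM. It is not the lab's code: the passages are invented placeholders, the function names are hypothetical, and plain word-count cosine similarity is swapped in for the embedding-based semantic search a real app would more likely use. The call to the LLM itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list, k: int = 1) -> list:
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, passages: list) -> str:
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Placeholder passages standing in for the lab's provided documents.
documents = [
    "Placeholder passage about vaccination coverage by district.",
    "Placeholder passage about hospital bed counts by region.",
    "Placeholder passage about malaria case reporting.",
]
question = "What is the vaccination coverage in each district?"
top = retrieve(question, documents, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt would then be sent to the chosen LLM, so the model answers from the retrieved passages rather than from its training data alone.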
Case Study
Accessing Databases with LLMs
(Details TBD)
Assessment
To be determined (TBD).