Module 3: Gen AI and LLM Applications in Statistics

Module Objectives

This module explores both current and potential applications of Gen AI and LLMs in the field of statistics. Covering the entire statistical life cycle—from data collection and processing to analysis and dissemination—we examine how LLMs can enhance each stage. For instance, Ask a Question (AAQ) platforms can interpret and respond to natural language queries, providing relevant statistical information. Learners will be introduced to tools for creating accessible platforms, like a WhatsApp bot, that can answer questions using statistical data as its knowledge base.

Module Topics

  • Qualitative and Multi-Modal Data Analysis with LLMs

  • Advanced Image Analysis in Statistical Data Collection

  • Text Data Analysis with LLMs

    • Applications like sentiment analysis, parsing web-scraped price data, analyzing qualitative research data, and more.

  • Audio Data Processing and Analysis with Speech Models

    • For example, processing recordings of focus group discussions (FGDs) or interviews.

  • LLM Applications in Data Dissemination

    • LLMs in data discovery: Semantic search vs. keyword search.

    • Enhancing and automating metadata generation with LLMs.

    • Statbots: Chatbots that can respond to statistical queries.

  • Concepts in LLM Statbots

    • LLMs’ quantitative reasoning abilities and capacity to work with tabular data.

    • Strategies for connecting an LLM to statistical data: Text2SQL, Text2API, Text2Code, and more.

    • Tools for parsing and working with tabular documents (e.g., DocumentLLM, LangChain SQL agent).

    • Security considerations for Text2SQL and database connections.

  • Building a Statbot

    • LLM selection guide.

    • Tool selection.

    • Deploying statbots on platforms like WhatsApp, websites, and more.
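
To make the Text2SQL security topic above more concrete, the following is a minimal sketch, not a complete defense. It assumes a SQLite backend and a hypothetical table named indicators; the helper names (is_read_only_query, run_generated_sql, build_demo_db) and the keyword blocklist are illustrative assumptions, not part of any library. It combines two layers: checking the LLM-generated SQL before running it, and setting the connection to query-only mode so that writes fail even if the check misses something.

```python
import re
import sqlite3

# Hypothetical blocklist of keywords that change data, schema, or settings.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|attach|detach|pragma|vacuum)\b",
    re.IGNORECASE,
)


def is_read_only_query(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # reject stacked statements such as "SELECT 1; DROP ..."
        return False
    if not re.match(r"select\b", statement, re.IGNORECASE):
        return False
    # Conservative: also rejects harmless queries that merely mention a keyword.
    return _WRITE_KEYWORDS.search(statement) is None


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Run LLM-generated SQL only if it passes the read-only check."""
    if not is_read_only_query(sql):
        raise ValueError("Rejected SQL that is not a single SELECT statement")
    return conn.execute(sql).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE indicators (region TEXT, year INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO indicators VALUES (?, ?, ?)",
        [("North", 2020, 12.5), ("South", 2020, 9.0)],  # invented sample values
    )
    conn.commit()
    # Second layer: ask SQLite itself to refuse writes on this connection.
    conn.execute("PRAGMA query_only = ON")
    return conn
```

In a real deployment, a database account with read-only permissions would usually be the primary safeguard, with string checks like these serving only as an additional filter.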

Practical Labs

  • Lab 1: Building a Health Statbot
    Participants will use a set of provided documents to build a retrieval-augmented generation (RAG) app that queries the data with an open-source LLM. The lab will produce a Streamlit app that can be shared.
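
As a rough illustration of the retrieval step in a RAG app like the one in this lab, the sketch below ranks documents against a question and assembles a prompt. It is a simplified stand-in under stated assumptions: it uses bag-of-words cosine similarity from the Python standard library instead of an embedding model, the documents in DOCS are invented placeholders rather than the lab's materials, and the call to the LLM and the Streamlit interface are left out entirely. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Invented placeholder documents, not the lab's actual materials.
DOCS = [
    "The immunization report describes measles vaccination coverage by district.",
    "The malaria bulletin summarizes reported cases by month.",
    "The nutrition survey covers stunting among children under five.",
]


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list, k: int = 2) -> list:
    """Return up to k documents most similar to the question, best first."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question: str, documents: list, k: int = 2) -> str:
    """Assemble a prompt that asks the LLM to answer from retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

For example, build_prompt("What is the measles vaccination coverage?", DOCS, k=1) produces a prompt whose context contains only the immunization document; that prompt would then be passed to whichever LLM the lab uses.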

Case Study

  • Accessing Databases with LLMs
    (Details TBD)

Assessment

  • To be determined (TBD).