Notebooks#
This directory contains Jupyter notebooks that demonstrate how to use the iQual
package.
Basic Usage#
The following notebooks demonstrate how to construct a basic model for a single annotation task:
Model with Scikit-learn - This notebook demonstrates how to construct a model using the scikit-learn library.
Model with SpaCy - This notebook demonstrates how to construct a model using the SpaCy library.
Model with Sentence-Transformers - This notebook demonstrates how to construct a model using the Sentence-Transformers library.
Model with Saved Dictionary - This notebook demonstrates how to construct a model using vectors from a saved dictionary. This can be useful if you have a large number of annotations and want to save time by not having to recompute the vectors for each annotation.
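The idea behind the saved-dictionary notebook can be sketched in plain Python. This is a minimal illustration rather than iQual's own API: `toy_embed` is a hypothetical stand-in for an expensive embedding model, and `vectorize_with_cache` is a made-up helper name. Each distinct text is vectorized once, the dictionary is saved to disk, and later runs reuse it instead of recomputing.

```python
import pickle
import tempfile
from pathlib import Path


def toy_embed(text):
    """Hypothetical stand-in for an expensive embedding model."""
    return [float(len(text)), float(text.count(" ") + 1)]


def vectorize_with_cache(texts, cache, embed=toy_embed):
    """Return one vector per text, computing only texts missing from the cache."""
    n_computed = 0
    for text in texts:
        if text not in cache:
            cache[text] = embed(text)
            n_computed += 1
    return [cache[text] for text in texts], n_computed


# Made-up example texts; the repeated answer is vectorized only once.
texts = [
    "what are your hopes for your child",
    "i want her to study",
    "i want her to study",
]

cache = {}
vectors, first_pass = vectorize_with_cache(texts, cache)

# Save the dictionary to disk, then reload it as a later session would.
cache_path = Path(tempfile.mkdtemp()) / "vectors.pkl"
with open(cache_path, "wb") as f:
    pickle.dump(cache, f)
with open(cache_path, "rb") as f:
    reloaded = pickle.load(f)

# With the reloaded dictionary, nothing needs to be recomputed.
vectors_again, second_pass = vectorize_with_cache(texts, reloaded)
print(first_pass, second_pass)
```

Because the same vectors are reused for every annotation task, the saving grows with the number of annotations fitted on the same text.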
Advanced Usage#
The following notebooks demonstrate how to construct more advanced models, including models for multiple annotation tasks, models with multiple vectorizers and classifiers, and models with bootstrap resampling.
Model with Multiple Vectorizers - This notebook demonstrates how to construct a model for a single annotation task using multiple vectorizers. This can be useful if you want to combine different types of vectorizers (e.g. pretrained embedding models and count-based models).
Model with Multiple Classifiers - This notebook demonstrates how to construct a model for a single annotation task using multiple classifiers. This can be useful if you want to combine different types of classifiers, and compare their performance on the same data.
Model with Multiple Annotations - This notebook demonstrates how to run the model fitting process for multiple annotations.
Model with Bootstrap - This notebook demonstrates how to run the model fitting process with bootstrap resampling.
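As a rough illustration of bootstrap resampling, the sketch below uses only the Python standard library and does not call iQual. The labels are made up, `majority_label` is a toy stand-in for a real classifier, and the choice of 25 resamples is an assumption for this example. Each round draws documents with replacement, fits on the drawn (in-bag) documents, and scores on the documents that were left out (out-of-bag).

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag) index lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def majority_label(labels):
    """Toy 'model': predict the most common training label (ties go to the smaller label)."""
    return max(sorted(set(labels)), key=labels.count)


# Made-up binary annotations for twelve documents.
labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

rng = random.Random(0)  # fixed seed so the resamples are reproducible
scores = []
for _ in range(25):
    in_bag, out_of_bag = bootstrap_indices(len(labels), rng)
    if out_of_bag:  # skip the rare resample that happens to cover every document
        prediction = majority_label([labels[i] for i in in_bag])
        hits = sum(labels[i] == prediction for i in out_of_bag)
        scores.append(hits / len(out_of_bag))

mean_accuracy = sum(scores) / len(scores)
print(f"{len(scores)} resamples, mean out-of-bag accuracy {mean_accuracy:.2f}")
```

The spread of `scores` across resamples gives a sense of how sensitive the fitted model is to which documents happen to be in the training sample.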
Interpretability#
The following notebooks demonstrate how to measure the interpretability of a model, how to test whether interpretability improves with increasing Nh (the number of human annotations), and how to plot the distribution of regression coefficients.
Interpretability Tests - This notebook demonstrates how to run the interpretability tests on human and enhanced data to determine whether the enhanced data adds value by augmenting the human data.
Interpretability with increasing Nh - This notebook demonstrates how the interpretability of ML-assisted enhanced data increases with increasing Nh. This notebook looks at the effect of increasing Nh while holding N = Nh + Nm fixed. Intuitively, this can be thought of as adding human annotations to some of the existing interviews that are currently machine annotated.
Distribution of Regression Coefficients - This notebook demonstrates how to run the interpretability tests on a model and plot the distribution of regression coefficients. The sizes of both the human annotated (Nh) and machine annotated (Nm) samples are varied to evaluate how many documents should be annotated by humans to achieve a certain level of interpretability.
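The constraint N = Nh + Nm can be made concrete with a small sketch. The sample sizes and the helper name `split_schedule` are hypothetical and chosen only for illustration. The helper lists the (Nh, Nm) pairs visited as human annotations replace machine annotations while the total number of documents stays fixed.

```python
def split_schedule(n_total, n_human_start, step):
    """List (Nh, Nm) pairs with Nh + Nm == n_total as Nh grows by `step`."""
    pairs = []
    n_human = n_human_start
    while n_human <= n_total:
        pairs.append((n_human, n_total - n_human))
        n_human += step
    return pairs


# Hypothetical sizes: 1000 interviews in total, starting with 200 human-annotated.
schedule = split_schedule(n_total=1000, n_human_start=200, step=200)
print(schedule)
# [(200, 800), (400, 600), (600, 400), (800, 200), (1000, 0)]
```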
Efficiency#
Efficiency Tests - This notebook demonstrates how to run the efficiency tests on a model by accounting for two types of errors in machine annotations: idiosyncratic error (i.e. the prediction error) and model error (i.e. the sampling error in the model). For more information on the efficiency tests, refer to the Efficiency section in A Method to Scale-Up Interpretative Qualitative Analysis, with an Application to Aspirations in Cox’s Bazaar, Bangladesh (English) (Policy Research Working Paper No. WPS 10046).
Bias#
Bias Test - This notebook demonstrates how to explicitly run bias tests on a model using cross-validated predictions across 25 bootstrap samples.
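A minimal sketch of the general idea follows, using only the standard library. It is not the notebook's implementation: the annotations are made up, the "model" is a toy that predicts the mean of the training folds, and measuring bias as the mean signed difference between prediction and human label is an assumption of this example. For each of 25 bootstrap samples, every document receives a prediction from a model that never saw it, and the signed errors are averaged.

```python
import random


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_predict_mean(y, k):
    """Out-of-fold predictions from a toy model: the mean label of the training folds."""
    predictions = [0.0] * len(y)
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        fold_prediction = sum(train) / len(train)
        for i in test_idx:
            predictions[i] = fold_prediction
    return predictions


# Made-up human annotations (1 = code present, 0 = absent).
human = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1]

rng = random.Random(0)  # fixed seed for reproducible resamples
biases = []
for _ in range(25):
    sample = rng.choices(human, k=len(human))  # bootstrap sample, with replacement
    predictions = cross_val_predict_mean(sample, k=5)
    signed_errors = [p - t for p, t in zip(predictions, sample)]
    biases.append(sum(signed_errors) / len(sample))

mean_bias = sum(biases) / len(biases)
print(f"mean bias over {len(biases)} bootstrap samples: {mean_bias:+.4f}")
```

A mean signed error close to zero across the bootstrap samples suggests the predictions are not systematically above or below the human labels.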