Getting Started with iQual#
This notebook provides a simple introduction to using iQual for text classification tasks. It demonstrates the core functionality of the package through a practical example of binary classification using the Stanford Politeness Corpus.
Key Features:

- This notebook can be run as-is, without modifications
- All necessary code and explanations are included

Follow along to learn how to:

- Prepare text data with question and answer pairs
- Create and configure an iQual model
- Train a classifier on labeled examples
- Make predictions on new data
- Evaluate model performance
Let’s get started!
Install the package from PyPI#
!pip install iqual --quiet
Import Libraries#
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from iqual import iqualnlp
Stanford Politeness Corpus#
In this tutorial, we’ll use the Stanford Politeness Corpus, a dataset of requests from Stack Exchange annotated for politeness. This dataset provides a good example for binary classification:
- Task: determine whether a request is polite or impolite
- Features: we'll use both the context and the request text to make predictions
- Labels: binary classification (1 = polite, 0 = impolite)
This represents a common text classification task in computational linguistics and social science research. The ability to automatically detect politeness has applications in communication studies, workplace interaction analysis, and online community moderation.
The dataset is available through ConvoKit, a toolkit for conversational analysis. Let’s prepare the data for iQual:
# Install the ConvoKit library and download the Stanford Politeness Corpus
!pip install convokit --quiet
# Import the necessary functions
from convokit import Corpus, download
# Download the Stack Exchange Politeness Corpus
print("Downloading the Stack Exchange Politeness Corpus...")
corpus = Corpus(filename=download("stack-exchange-politeness-corpus"))
print(f"Download complete. Corpus contains {len(corpus.get_utterance_ids())} utterances.")
Downloading the Stack Exchange Politeness Corpus...
Dataset already exists at /Users/adityachhabra/.convokit/saved-corpora/stack-exchange-politeness-corpus
Download complete. Corpus contains 6603 utterances.
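Before reshaping anything, it helps to peek at a single utterance to see the two fields the preprocessing below relies on: the request text and the `Binary` politeness score in its metadata. A quick, optional check:

# Optional: inspect one utterance's text and metadata before preprocessing
utt = next(corpus.iter_utterances())
print(utt.text[:200])
print(utt.meta)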
Data Preparation#
The following code is specific to the Stanford Politeness Corpus structure. When working with your own datasets, you would adapt this preprocessing step to fit your data format.
Key steps in this preparation:

- Extract the request text and politeness label from each utterance
- Create a context field by splitting the text (in a real dataset, you might have distinct context)
- Convert the politeness scores to binary values (1 = polite, 0 = impolite)
- Format the data to match iQual's expected input structure
Note that for demonstration purposes, we’re artificially creating a “context” column by taking the first sentence of each request.
In your own work with interview transcripts or conversational data, you would typically have separate interviewer questions and respondent answers, or distinct turns in a conversation that provide natural context for each response.
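For instance, with interview transcripts you might already have separate question and response columns, so the preparation reduces to arranging them into a DataFrame. A minimal sketch (the records and column names here are hypothetical, purely for illustration):

# Hypothetical interview turns: each response comes with its natural
# interviewer question as context, plus a human-coded annotation.
interview_rows = [
    {"context": "How has your household income changed this year?",
     "answer": "It fell sharply after the floods destroyed our crop.",
     "label": 1},
    {"context": "What are your plans for the next season?",
     "answer": "We plan to plant earlier and diversify our crops.",
     "label": 0},
]
my_data = pd.DataFrame(interview_rows)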
# Process the corpus into a DataFrame
rows = []

# Iterate through the utterances
for utt in corpus.iter_utterances():
    # Extract request text
    request_text = utt.text

    # Get metadata
    meta = utt.meta

    # Extract the binary politeness label (1="polite", 0="neutral", -1="impolite")
    if 'Binary' in meta:
        binary_label = meta['Binary']

        # Convert to our binary format (1=polite, 0=impolite/neutral)
        is_polite = 1 if binary_label == 1 else 0

        # For Stack Exchange, create a simplified context
        if len(request_text.split('.')) > 1:
            # Split at the first period
            parts = request_text.split('.', 1)
            context_text = parts[0] + '.'
            request_text = parts[1].strip()
        else:
            # For short texts, use a generic context
            context_text = "Stack Exchange discussion:"

        # Add to our list of data points
        rows.append({
            'context': context_text,
            'answer': request_text,
            'is_polite': is_polite
        })

# Convert to DataFrame
data = pd.DataFrame(rows)

# Remove any rows where context or answer is empty
data = data[data['context'].str.len() > 0]
data = data[data['answer'].str.len() > 0]

# Print stats
print(f"Created DataFrame with {len(data)} records")
print(f"Polite requests: {sum(data['is_polite'] == 1)}")
print(f"Impolite requests: {sum(data['is_polite'] == 0)}")

data.sample(10)
Created DataFrame with 6596 records
Polite requests: 1649
Impolite requests: 4947
|      | context | answer | is_polite |
|------|---------|--------|-----------|
| 2452 | I'm not sure your question makes sense. | Do you mean, how do you measure the length of ... | 0 |
| 492  | Stack Exchange discussion: | What about `DefaultListBoxItemStyle`? Are you ... | 0 |
| 4233 | Hi johnny8888, I'm on a Solaris server and the... | Can you please elaborate your answer? | 0 |
| 3632 | Stack Exchange discussion: | Do you mean you need real world use cases for ... | 0 |
| 443  | Stack Exchange discussion: | Do the URLs you want to add also come from the... | 0 |
| 1165 | Now, I am looking at the source for `whine. | pl` and it seems to be getting these from an S... | 1 |
| 2039 | This sounds a lot like a machine learning algo... | I'm no machine learning expert, but perhaps so... | 1 |
| 2481 | This is not really enough to tell what can be ... | Have you tried debug you code, for example tra... | 0 |
| 4182 | Yes, I can change the font. | Could you please provide me with an example? | 0 |
| 3777 | I'd suggest you get an iPod Touch to develop o... | Don't tell me you need something iPhone specif... | 0 |
Split the data into training and testing sets#
train_data, test_data = train_test_split(data, test_size=0.3)
# Create feature matrix (X) and target vector (y) for training
X_train = train_data[['context', 'answer']]
y_train = train_data['is_polite']
# Create feature matrix for testing
X_test = test_data[['context', 'answer']]
y_test = test_data['is_polite']
print(f"Training data size: {len(X_train)}")
print(f"Testing data size: {len(X_test)}")
Training data size: 4617
Testing data size: 1979
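Note that only about a quarter of the requests are polite, so a plain random split can drift from the overall class balance. If you prefer a stratified, reproducible split, here is a hedged variant of the cell above (the `random_state` value is arbitrary):

# Optional alternative: preserve the polite/impolite ratio in both splits
# and fix the shuffling seed for reproducibility.
train_data, test_data = train_test_split(
    data,
    test_size=0.3,
    stratify=data['is_polite'],
    random_state=42,
)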
# Create a model
model = iqualnlp.Model()
# Add text features. Define up to two text columns, usually one for the context/question and one for the response/answer.
model.add_text_features(q_col='context',
                        a_col='answer',
                        model='TfidfVectorizer',
                        env='scikit-learn',
                        )
# Add a classifier (Defaults to Logistic Regression)
model.add_classifier(name='LogisticRegression')
# Add a threshold component that will optimize for F1 score
model.add_threshold(scoring_metric='f1')
# Compile the model to finalize the pipeline structure
model.compile()
# Train the model on our training data
model.fit(X_train, y_train)
print("Model fit complete!")
model.model
Model fit complete!
Pipeline(steps=[('Input', FeatureUnion(transformer_list=[('question', Pipeline(steps=[('selector', FunctionTransformer(func=<function column_selector at 0x1122391c0>, kw_args={'column_name': 'context'})), ('vectorizer', Vectorizer(analyzer='word', binary=False, decode_error='strict', dtype=<class 'numpy.float64'>, encoding='utf-8', env='scikit-learn', input='conten... fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100, model='LogisticRegression', multi_class='deprecated', n_jobs=None, penalty='l2', random_state=None, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)), ('Threshold', BinaryThresholder(threshold=np.float64(0.3052255080473171), threshold_range=(np.float64(0.015664853903903313), np.float64(0.858797346850902))))])
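Since `model.model` is a standard scikit-learn `Pipeline`, its fitted components can be inspected through the usual `named_steps` API. For example, to read back the decision threshold the `Threshold` step selected while optimizing F1 (the step name is taken from the printout above):

# Inspect the fitted threshold component via the standard Pipeline API
thresholder = model.model.named_steps['Threshold']
print(f"Optimized decision threshold: {thresholder.threshold:.3f}")  # ~0.305 in this run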
Make predictions on the test set#
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)
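As a sanity check, the hard labels should be recoverable from the probabilities and the fitted threshold, assuming (as the `BinaryThresholder` name suggests) that a request is labeled polite when its probability meets or exceeds the threshold:

# Recompute labels from probabilities (assumption: prob >= threshold means polite)
threshold = model.model.named_steps['Threshold'].threshold
manual_pred = (np.asarray(y_prob) >= threshold).astype(int)
print(f"Agreement with model.predict: {(manual_pred == np.asarray(y_pred)).mean():.1%}")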
Calculate performance metrics#
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    precision_score,
    recall_score,
)

# Individual metrics
metrics = {
    'Accuracy': accuracy_score(y_test, y_pred),
    'Precision': precision_score(y_test, y_pred),
    'Recall': recall_score(y_test, y_pred),
    'F1 Score': f1_score(y_test, y_pred)
}
# Display metrics as a dataframe
metric_df = pd.DataFrame([metrics])
print("Performance Metrics (Out-Sample):")
metric_df
Performance Metrics (Out-of-Sample):

|   | Accuracy | Precision | Recall | F1 Score |
|---|----------|-----------|--------|----------|
| 0 | 0.688226 | 0.38809 | 0.372047 | 0.379899 |
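For a fuller picture of where the model errs under this class imbalance, a confusion matrix and a per-class report (standard scikit-learn, not iQual-specific) are useful:

from sklearn.metrics import confusion_matrix, classification_report

# Rows: true labels, columns: predicted labels (0 = impolite, 1 = polite)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=['impolite', 'polite']))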
Inspect the results with examples#
results = pd.DataFrame({
    'Context': X_test['context'],
    'Answer': X_test['answer'],
    'True Label': y_test,
    'Predicted Label': y_pred,
    'Probability': y_prob
})
# Display some examples of correct and incorrect predictions
print("Correct Predictions (True Positives):")
results[(results['True Label'] == 1) & (results['Predicted Label'] == 1)].head(3)
Correct Predictions (True Positives):
|      | Context | Answer | True Label | Predicted Label | Probability |
|------|---------|--------|------------|-----------------|-------------|
| 3131 | Stack Exchange discussion: | Do you have any code we can look at? What have... | 1 | 1 | 0.441729 |
| 6440 | Stack Exchange discussion: | Christian, could you please provide a referenc... | 1 | 1 | 0.352674 |
| 621  | Can you please post some code. | How is data received at the client side? | 1 | 1 | 0.500905 |
print("Incorrect Predictions (False Negatives):")
results[(results['True Label'] == 1) & (results['Predicted Label'] == 0)].head(3)
Incorrect Predictions (False Negatives):
|      | Context | Answer | True Label | Predicted Label | Probability |
|------|---------|--------|------------|-----------------|-------------|
| 3113 | Stack Exchange discussion: | Is it always the same for every file, or does ... | 1 | 0 | 0.239120 |
| 570  | What would you like in the case of, e. | g., `http://example.com/test.php?key=val` ? Or... | 1 | 0 | 0.223762 |
| 3029 | Stack Exchange discussion: | Do the repeats have to be contiguous? is 5,1,4... | 1 | 0 | 0.273251 |
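The same filter can be mirrored to inspect false positives, i.e. requests the model labeled polite that the annotators did not:

print("Incorrect Predictions (False Positives):")
results[(results['True Label'] == 0) & (results['Predicted Label'] == 1)].head(3)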