Getting Started with iQual#

This notebook provides a simple introduction to using iQual for text classification tasks. It demonstrates the core functionality of the package through a practical example of binary classification using the Stanford Politeness Corpus.

Key Features:

  • This notebook can be run as-is without modifications

  • All necessary code and explanations are included

Follow along to learn how to:

  1. Prepare text data with question and answer pairs

  2. Create and configure an iQual model

  3. Train a classifier on labeled examples

  4. Make predictions on new data

  5. Evaluate model performance

Let’s get started!

Install the package from PyPI#

!pip install iqual --quiet

Import Libraries#

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from iqual import iqualnlp

Stanford Politeness Corpus#

In this tutorial, we’ll use the Stanford Politeness Corpus, a dataset of requests from Stack Exchange annotated for politeness. This dataset provides a good example for binary classification:

  • Task: Determining whether a request is polite or impolite

  • Features: We’ll use both the context and the request text to make predictions

  • Labels: Binary classification (1 = polite, 0 = impolite)

This represents a common text classification task in computational linguistics and social science research. The ability to automatically detect politeness has applications in communication studies, workplace interaction analysis, and online community moderation.

The dataset is available through ConvoKit, a toolkit for conversational analysis. Let’s prepare the data for iQual:

# Install the ConvoKit library and download the Stanford Politeness Corpus
!pip install convokit --quiet

# Import the necessary functions
from convokit import Corpus, download

# Download the Stack Exchange Politeness Corpus
print("Downloading the Stack Exchange Politeness Corpus...")
corpus = Corpus(filename=download("stack-exchange-politeness-corpus"))
print(f"Download complete. Corpus contains {len(corpus.get_utterance_ids())} utterances.")
Downloading the Stack Exchange Politeness Corpus...
Dataset already exists at /Users/adityachhabra/.convokit/saved-corpora/stack-exchange-politeness-corpus
Download complete. Corpus contains 6603 utterances.

Data Preparation#

The following code is specific to the Stanford Politeness Corpus structure. When working with your own datasets, you would adapt this preprocessing step to fit your data format.

Key steps in this preparation:

  1. Extract the request text and politeness label from each utterance

  2. Create a context field by splitting the text (in a real dataset, you would likely have a distinct context field)

  3. Convert the politeness scores to binary values (1 = polite, 0 = impolite)

  4. Format the data to work with iQual’s expected input structure

Note that for demonstration purposes, we’re artificially creating a “context” column by taking the first sentence of each request.

In your own work with interview transcripts or conversational data, you would typically have separate interviewer questions and respondent answers, or distinct turns in a conversation that provide natural context for each response.
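
Whatever the source, the goal of this step is always the same: a DataFrame with one row per context-answer pair plus a binary label column. As a hypothetical illustration of that target format (the row below is invented, not taken from the corpus):

# Invented row showing the structure that the preprocessing below produces
example = pd.DataFrame({
    'context': ["Interviewer: How did the harvest go this year?"],  # question or preceding turn
    'answer': ["It went badly; the rains came too late."],          # respondent's reply
    'is_polite': [0],                                               # your binary annotation
})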

# Process the corpus into a DataFrame
rows = []

# Iterate through the utterances
for utt in corpus.iter_utterances():
    # Extract request text
    request_text = utt.text
    
    # Get metadata
    meta = utt.meta
    
    # Extract the 'Binary' politeness annotation (1 = polite, 0 = neutral, -1 = impolite)
    if 'Binary' in meta:
        binary_label = meta['Binary']
        # Convert to our binary format (1=polite, 0=impolite/neutral)
        is_polite = 1 if binary_label == 1 else 0
        
        # For Stack Exchange, create a simplified context
        if len(request_text.split('.')) > 1:
            # Split at the first period
            parts = request_text.split('.', 1)
            context_text = parts[0] + '.'
            request_text = parts[1].strip()
        else:
            # For short texts, use a generic context
            context_text = "Stack Exchange discussion:"
        
        # Add to our list of data points
        rows.append({
            'context': context_text,
            'answer': request_text,
            'is_polite': is_polite
        })

# Convert to DataFrame
data = pd.DataFrame(rows)

# Remove any rows where context or answer is empty
data = data[data['context'].str.len() > 0]
data = data[data['answer'].str.len() > 0]

# Print stats
print(f"Created DataFrame with {len(data)} records")
print(f"Polite requests: {sum(data['is_polite'] == 1)}")
print(f"Impolite requests: {sum(data['is_polite'] == 0)}")

data.sample(10)
Created DataFrame with 6596 records
Polite requests: 1649
Impolite requests: 4947
      context                                             answer                                              is_polite
2452  I'm not sure your question makes sense.            Do you mean, how do you measure the length of ...  0
492   Stack Exchange discussion:                         What about `DefaultListBoxItemStyle`? Are you ...  0
4233  Hi johnny8888, I'm on a Solaris server and the...  Can you please elaborate your answer?              0
3632  Stack Exchange discussion:                         Do you mean you need real world use cases for ...  0
443   Stack Exchange discussion:                         Do the URLs you want to add also come from the...  0
1165  Now, I am looking at the source for `whine.        pl` and it seems to be getting these from an S...  1
2039  This sounds a lot like a machine learning algo...  I'm no machine learning expert, but perhaps so...  1
2481  This is not really enough to tell what can be ...  Have you tried debug you code, for example tra...  0
4182  Yes, I can change the font.                        Could you please provide me with an example?       0
3777  I'd suggest you get an iPod Touch to develop o...  Don't tell me you need something iPhone specif...  0

Split the data into training and testing sets#

train_data, test_data = train_test_split(data, test_size=0.3)

# Create feature matrix (X) and target vector (y) for training
X_train = train_data[['context', 'answer']]
y_train = train_data['is_polite']

# Create feature matrix for testing
X_test = test_data[['context', 'answer']]
y_test = test_data['is_polite']

print(f"Training data size: {len(X_train)}")
print(f"Testing data size: {len(X_test)}")
Training data size: 4617
Testing data size: 1979
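
Polite examples make up only about a quarter of the data, so a plain random split can drift from that ratio. An optional variant (not used above) stratifies on the label and fixes the random seed for reproducibility:

# Optional: stratified, reproducible alternative to the split above
train_data, test_data = train_test_split(
    data,
    test_size=0.3,
    stratify=data['is_polite'],  # keep the polite/impolite ratio in both splits
    random_state=42,             # make the split reproducible
)
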
# Create a model
model = iqualnlp.Model()

# Add text features. Define up to two text columns, usually one for context/question and another for response/answer.
model.add_text_features(q_col='context', 
                        a_col='answer',
                        model='TfidfVectorizer',
                        env='scikit-learn',
                       )

# Add a classifier (Defaults to Logistic Regression)
model.add_classifier(name='LogisticRegression')

# Add a threshold component that will optimize for F1 score
model.add_threshold(scoring_metric='f1')

# Compile the model to finalize the pipeline structure
model.compile()

# Train the model on our training data
model.fit(X_train, y_train)

print("Model fit complete!")

model.model
Model fit complete!
Pipeline(steps=[('Input',
                 FeatureUnion(transformer_list=[('question',
                                                 Pipeline(steps=[('selector',
                                                                  FunctionTransformer(func=<function column_selector at 0x1122391c0>,
                                                                                      kw_args={'column_name': 'context'})),
                                                                 ('vectorizer',
                                                                  Vectorizer(analyzer='word',
                                                                             binary=False,
                                                                             decode_error='strict',
                                                                             dtype=<class 'numpy.float64'>,
                                                                             encoding='utf-8',
                                                                             env='scikit-learn',
                                                                             input='conten...
                            fit_intercept=True, intercept_scaling=1,
                            l1_ratio=None, max_iter=100,
                            model='LogisticRegression',
                            multi_class='deprecated', n_jobs=None, penalty='l2',
                            random_state=None, solver='lbfgs', tol=0.0001,
                            verbose=0, warm_start=False)),
                ('Threshold',
                 BinaryThresholder(threshold=np.float64(0.3052255080473171),
                                   threshold_range=(np.float64(0.015664853903903313),
                                                    np.float64(0.858797346850902))))])
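
Because the compiled model wraps a standard scikit-learn Pipeline (exposed as model.model), its fitted steps can be inspected directly. For instance, the decision threshold tuned by add_threshold lives in the 'Threshold' step shown in the repr above:

# Read the tuned decision threshold out of the fitted pipeline
thresholder = model.model.named_steps['Threshold']
print(f"Optimized decision threshold: {thresholder.threshold:.3f}")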

Make predictions on the test set#

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)
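
predict returns hard 0/1 labels, while predict_proba returns one politeness score per row; the labels come from comparing each score against the tuned threshold shown earlier. An optional quick look at the score distribution:

# Summarize the predicted politeness scores (one per test example)
print(pd.Series(y_prob, index=X_test.index).describe())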

Calculate performance metrics#

from sklearn.metrics import (
            f1_score, 
            accuracy_score,
            precision_score, 
            recall_score,
)


# Individual metrics
metrics = {
    'Accuracy': accuracy_score(y_test, y_pred),
    'Precision': precision_score(y_test, y_pred),
    'Recall': recall_score(y_test, y_pred),
    'F1 Score': f1_score(y_test, y_pred)
}

# Display metrics as a dataframe
metric_df = pd.DataFrame([metrics])
print("Performance Metrics (Out-Sample):")
metric_df
Performance Metrics (Out-of-Sample):
   Accuracy  Precision    Recall  F1 Score
0  0.688226    0.38809  0.372047  0.379899
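
Given the class imbalance noted earlier, a confusion matrix gives a fuller picture than the four scalar metrics. A small sketch using scikit-learn (not part of the original notebook):

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(pd.DataFrame(cm,
                   index=['True impolite', 'True polite'],
                   columns=['Pred impolite', 'Pred polite']))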

Inspect the results DataFrame with examples#

results = pd.DataFrame({
    'Context': X_test['context'],
    'Answer': X_test['answer'],
    'True Label': y_test,
    'Predicted Label': y_pred,
    'Probability': y_prob
})

# Display some examples of correct and incorrect predictions
print("Correct Predictions (True Positives):")
results[(results['True Label'] == 1) & (results['Predicted Label'] == 1)].head(3)
Correct Predictions (True Positives):
      Context                         Answer                                              True Label  Predicted Label  Probability
3131  Stack Exchange discussion:      Do you have any code we can look at? What have...  1           1                0.441729
6440  Stack Exchange discussion:      Christian, could you please provide a referenc...  1           1                0.352674
621   Can you please post some code.  How is data received at the client side?           1           1                0.500905
print("Incorrect Predictions (False Negatives):")
results[(results['True Label'] == 1) & (results['Predicted Label'] == 0)].head(3)
Incorrect Predictions (False Negatives):
      Context                                 Answer                                              True Label  Predicted Label  Probability
3113  Stack Exchange discussion:              Is it always the same for every file, or does ...  1           0                0.239120
570   What would you like in the case of, e.  g., `http://example.com/test.php?key=val` ? Or...  1           0                0.223762
3029  Stack Exchange discussion:              Do the repeats have to be contiguous? is 5,1,4...  1           0                0.273251
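
The same fitted model can also score entirely new requests, as long as they are formatted like the training data. A hypothetical example (both requests below are invented for illustration):

# Invented requests, shaped like the training data
new_requests = pd.DataFrame({
    'context': ["Stack Exchange discussion:",
                "Stack Exchange discussion:"],
    'answer': ["Could you please share a minimal example of the error?",
               "Why would you even try that? Read the documentation."],
})
print(model.predict(new_requests))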