Detecting Data Use#
This notebook shows how to run inference with a fine-tuned model using Unsloth.
## kaggle installation below
# %%capture
# !pip install pip3-autoremove
# !pip-autoremove torch torchvision torchaudio -y
# !pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121
# !pip install unsloth
## colab installation below
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
# !pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
NOTES#
Prerequisites:
huggingface_hub is installed. It is used to download snapshots of the trained model from the Hugging Face Hub for exploration and testing.
Also, directly loading the LoRA parameters from a repository uploaded to HF does not seem to work. The workaround is to pull the HF repo, download the artifacts, and point FastLanguageModel.from_pretrained to that local directory.
Implement a Stopping Criteria#
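from transformers import StoppingCriteria


class DataUseStoppingCriteria(StoppingCriteria):
    """Stop generation once the target sequence has appeared more than once."""

    def __init__(self, target_sequence):
        self.target_sequence = target_sequence

    def __call__(self, input_ids, scores, **kwargs):
        # Decode the full sequence (prompt + generated tokens) into a string.
        # NOTE: this relies on the global `tokenizer` that is loaded later in the notebook.
        generated_text = tokenizer.decode(input_ids[0])
        # generated_text = generated_text.replace(self.prompt,'')
        # # Check if the target sequence appears in the generated text
        # if self.target_sequence in generated_text:
        #     return True  # Stop generation
        # Stop at the second occurrence; presumably the first one is already
        # part of the chat-formatted prompt.
        if generated_text.count(self.target_sequence) > 1:
            return True  # Stop generation
        return False  # Continue generation

    # __len__ and __iter__ let a single instance be passed where
    # generate() expects a list of stopping criteria.
    def __len__(self):
        return 1

    def __iter__(self):
        yield self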
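The counting rule used by the stopping criteria can be checked in isolation, without loading a model. The sketch below is only an illustration: `should_stop` is a hypothetical helper that is not part of the notebook, and the prompt string assumes a Phi-3 style chat layout in which the user turn already ends with `<|end|>`.

```python
def should_stop(decoded_text: str, target_sequence: str, min_count: int = 2) -> bool:
    """Return True once target_sequence occurs at least min_count times."""
    return decoded_text.count(target_sequence) >= min_count


# Assumed Phi-3 style prompt: it already contains one "<|end|>" after the user turn.
prompt = "<|user|>\nSome passage<|end|>\n<|assistant|>\n"

# Prompt only: one occurrence, so generation continues.
print(should_stop(prompt, "<|end|>"))

# Prompt plus a finished answer: two occurrences, so generation stops.
print(should_stop(prompt + '{"data_used": false}<|end|>', "<|end|>"))
```

With the default `min_count=2` this matches the `count(...) > 1` check in the class: the occurrence inside the prompt does not end generation, while the one produced by the model does.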
Load the fine-tuned model from Hugging Face by running the code below. You can change model_id to point to your own fine-tuned model.#
from huggingface_hub import snapshot_download
model_id = "avsolatorio/data-use-unsloth-phi-3.5-simpleschema-thinking-prwp-manual-914-train-20epochs-1738770532-lora"
snapshot_model_res = snapshot_download(model_id)
snapshot_model_res
'/root/.cache/huggingface/hub/models--avsolatorio--data-use-unsloth-phi-3.5-simpleschema-thinking-prwp-manual-914-train-20epochs-1738770532-lora/snapshots/2071a0a2207eab862263ce5fd6faa578d10e7bbb'
# Load via FastLanguageModel
from unsloth import FastLanguageModel
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = (
None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
)
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
model, tokenizer = FastLanguageModel.from_pretrained(
model_name=snapshot_model_res, # Local snapshot directory of the fine-tuned LoRA model
max_seq_length=max_seq_length,
dtype=dtype,
load_in_4bit=load_in_4bit,
)
==((====))== Unsloth 2025.3.18: Fast Llama patching. Transformers: 4.49.0.
\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth 2025.3.18 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
from transformers import TextStreamer
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [
{
"from": "human",
"value": """3 only in conjunction with policies for local procurement. Moreover, some of the mining-related papers have focused on mining in an African context,
exploring a range of outcomes, including HIV-transmission and sexual risk taking (Corno and de Walque 2012; Wilson 2012), women’s empowerment (Benshaul-Tolonen 2018),
infant mortality (Benshaul-Tolonen, 2019) and labor market outcomes (Kotsadam and Tolonen 2016). Mining is also associated with more economic activity measured by nightlights
(Benshaul-Tolonen, 2019; Mamo et al, 2019). Kotsadam and Tolonen (2016) use DHS data from Africa, and find that mine openings cause women to shift from agriculture to service
production and that women become more likely to work for cash and year-round as opposed to seasonally. Continuing this analysis, Benshaul-Tolonen (2018) explores the links
between mining and female empowerment in eight gold-producing countries in East and West Africa, including Ghana. Women in gold mining communities have more diversified
labor markets opportunities, better access to health care, and are less likely to accept domestic violence. In addition, infant mortality rates decrease with up to 50% in mining communities,
from very high initial levels (Benshaul-Tolonen, 2019). In a study that focuses exclusively on Ghana, Aragón and Rud (2013) explore the link between pollution from mining and
agricultural productivity. The results point toward decreasing agricultural productivity because of environmental pollution and soil degradation, which could have negative
welfare effects on households that do not engage in mining activities or in indirectly stimulated sectors. Lower productivity in agriculture could potentially push households
to engage in mining-related sectors, in addition to pull factors such as higher wage earnings in the stimulated sectors. We explore the effects of mining activity on employment,
earnings, expenditure, and children’s health outcomes in local communities and in districts with gold mining. We combine the DHS and GLSS with production data for 17 large-scale
gold mines in Ghana. We find that a new large-scale gold mine changes economic outcomes, such as access to employment and cash earnings. In addition, it raises local wages and
expenditure on housing and energy. An important welfare indicator in developing countries is infant mortality, and we note a large and significant decrease in mortality rates
among young children, at both the local and district levels.1 We hypothesize that increased access to prenatal care is one of the mechanisms behind the increased survival rate.
1 In the 2010 Ghana population census average district size is 112,000""",
},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True, # Must add for generation
return_tensors="pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
g = model.generate(
input_ids=inputs,
streamer=text_streamer,
max_new_tokens=2048,
use_cache=True,
stopping_criteria=DataUseStoppingCriteria("<|end|>"),
)
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
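The warning above can usually be avoided by passing an explicit attention mask to `generate`. The sketch below is one way to do that, not the notebook's original method: `generate_with_attention_mask` is a hypothetical helper, and it assumes the tokenizer's `apply_chat_template` supports `return_dict=True` so that the encoded prompt includes an `attention_mask` alongside `input_ids`.

```python
def generate_with_attention_mask(model, tokenizer, messages, device="cuda", **generate_kwargs):
    """Encode chat messages and call generate() with an explicit attention mask."""
    encoded = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,  # Must add for generation
        return_tensors="pt",
        return_dict=True,  # Return input_ids and attention_mask together
    ).to(device)
    return model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        **generate_kwargs,
    )
```

In this notebook it could be called as `generate_with_attention_mask(model, tokenizer, messages, streamer=text_streamer, max_new_tokens=2048, use_cache=True, stopping_criteria=DataUseStoppingCriteria("<|end|>"))`, reusing the same generation arguments as the cell above.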
{
"data_used": true,
"data_mentions": [
{
"mentioned_in": "3 only in conjunction with policies for local procurement. Moreover, some of the mining-related papers have focused on mining in an African context,\n exploring a range of outcomes, including HIV-transmission and sexual risk taking (Corno and de Walque 2012; Wilson 2012), women\u2019s empowerment (Benshaul-Tolonen 2018),\n infant mortality (Benshaul-Tolonen, 2019) and labor market outcomes (Kotsadam and Tolonen 2016). Mining is also associated with more economic activity measured by nightlights\n (Benshaul-Tolonen, 2019; Mamo et al, 2019). Kotsadam and Tolonen (2016) use DHS data from Africa, and find that mine openings cause women to shift from agriculture to service\n production and that women become more likely to work for cash and year-round as opposed to seasonally. Continuing this analysis, Benshaul-Tolonen (2018) explores the links\n between mining and female empowerment in eight gold-producing countries in East and West Africa, including Ghana. Women in gold mining communities have more diversified\n labor markets opportunities, better access to health care, and are less likely to accept domestic violence. In addition, infant mortality rates decrease with up to 50% in mining communities,\n from very high initial levels (Benshaul-Tolonen, 2019). In a study that focuses exclusively on Ghana, Arag\u00f3n and Rud (2013) explore the link between pollution from mining and\n agricultural productivity. The results point toward decreasing agricultural productivity because of environmental pollution and soil degradation, which could have negative\n welfare effects on households that do not engage in mining activities or in indirectly stimulated sectors. Lower productivity in agriculture could potentially push households\n to engage in mining-related sectors, in addition to pull factors such as higher wage earnings in the stimulated sectors. 
We explore the effects of mining activity on employment,\n earnings, expenditure, and children\u2019s health outcomes in local communities and in districts with gold mining. We combine the DHS and GLSS with production data for 17 large-scale\n gold mines in Ghana. We find that a new large-scale gold mine changes economic outcomes, such as access to employment and cash earnings. In addition, it raises local wages and\n expenditure on housing and energy. An important welfare indicator in developing countries is infant mortality, and we note a large and significant decrease in mortality rates\n among young children, at both the local and district levels.1 We hypothesize that increased access to prenatal care is one of the mechanisms behind the increased survival rate.",
"datasets": [
{
"raw_name": "DHS and GLSS",
"harmonized_name": "Demographic and Health Surveys (DHS) and Ghana Living Standards Survey (GLSS)",
"acronym": "DHS and GLSS",
"context": "primary",
"specificity": "properly_named",
"relevance": "directly_relevant",
"producer": null,
"data_type": "Surveys",
"year": null
},
{
"raw_name": "production data for 17 large-scale gold mines in Ghana",
"harmonized_name": "production data for 17 large-scale gold mines in Ghana",
"acronym": null,
"context": "primary",
"specificity": "descriptive_but_unnamed",
"relevance": "directly_relevant",
"producer": null,
"data_type": "Economic & Trade Data",
"year": null
}
]
}
]
}<|end|>
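The streamed output is a JSON document followed by the `<|end|>` token, so it can be turned into a Python dictionary for downstream use. The following is a minimal sketch under that assumption; `parse_data_use_output` and `list_dataset_names` are hypothetical helpers, and the sample string is a shortened stand-in for the real output above. Model output is not guaranteed to be valid JSON, so in practice the `json.loads` call may need a `try`/`except json.JSONDecodeError`.

```python
import json


def parse_data_use_output(generated_text: str, end_token: str = "<|end|>") -> dict:
    """Drop everything from the first end token onward and parse the rest as JSON."""
    payload = generated_text.split(end_token, 1)[0].strip()
    return json.loads(payload)


def list_dataset_names(result: dict) -> list:
    """Collect the harmonized (or, failing that, raw) name of every dataset."""
    names = []
    for mention in result.get("data_mentions", []):
        for dataset in mention.get("datasets", []):
            names.append(dataset.get("harmonized_name") or dataset.get("raw_name"))
    return names


# Shortened stand-in for the model output shown above.
sample = (
    '{"data_used": true, "data_mentions": [{"mentioned_in": "...", "datasets": ['
    '{"raw_name": "DHS and GLSS", "harmonized_name": null}, '
    '{"raw_name": "production data for 17 large-scale gold mines in Ghana", '
    '"harmonized_name": "production data for 17 large-scale gold mines in Ghana"}]}]}<|end|>'
)

result = parse_data_use_output(sample)
print(result["data_used"])
print(list_dataset_names(result))
```

To parse the actual generation, decode only the new tokens, for example `tokenizer.decode(g[0][inputs.shape[1]:])`, and pass that string to `parse_data_use_output`.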