Context-aware sarcasm and irony detection powered by RoBERTa.
Trained on 100k+ Reddit comments, tweets, and headlines so you don't have to read them.
| Metric | Score |
|---|---|
| Accuracy | 80.57% |
| F1 | 80.39% |
| Precision | 81.14% |
| Recall | 79.66% |
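The F1 above is the harmonic mean of precision and recall, and the reported numbers are consistent:

```python
# Sanity check: F1 is the harmonic mean of precision and recall
precision = 0.8114
recall = 0.7966

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")  # → F1 = 80.39%
```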
```
totally-not-sarcastic/
├── colab_1_train.py      # Train RoBERTa and push to HuggingFace Hub
├── colab_2_dashboard.py  # Gradio dashboard — runs on HF Spaces or Colab
├── requirements.txt      # Dependencies for HF Spaces
└── README.md
```
```bash
pip install transformers gradio plotly lime torch
python colab_2_dashboard.py
```

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="AK-Rahul/sarcasm-roberta")

# Without context
classifier("Oh absolutely, I love waiting 3 hours at the DMV.")

# With context — improves accuracy for conversational sarcasm
classifier("How was the flight? </s></s> Oh wonderful, only delayed by 4 hours.")
```

Base: `roberta-base` (125M parameters)
Task: Binary classification — Sarcastic / Not Sarcastic
Context: Supports optional parent_comment as context for conversational input
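Context is supplied RoBERTa-style by joining the parent comment and the reply with the `</s></s>` sentence-pair separator, as in the quickstart above. A minimal helper (the function name is illustrative, not part of the repo):

```python
def with_context(reply, parent_comment=None):
    """Format a reply for the sarcasm classifier, optionally prepending
    its parent comment with RoBERTa's sentence-pair separator."""
    if parent_comment:
        return f"{parent_comment} </s></s> {reply}"
    return reply

print(with_context("Oh wonderful, only delayed by 4 hours.",
                   parent_comment="How was the flight?"))
# → How was the flight? </s></s> Oh wonderful, only delayed by 4 hours.
```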
| Dataset | Rows | Context |
|---|---|---|
| Reddit SARC (danofer) | ~90,000 | ✅ parent_comment pairs |
| TweetEval Irony | ~3,600 | ❌ |
| News Headlines | ~28,600 | ❌ |
| Total (balanced) | 107,058 | |
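The combined corpus is class-balanced before training. A sketch of the downsampling step under the assumption of a binary `label` field (this is not the repo's actual preprocessing code):

```python
import random

def balance(examples, seed=42):
    """Downsample the majority class so both labels have equal counts.
    Each example is a dict with a binary 'label' key (0 or 1)."""
    rng = random.Random(seed)
    pos = [e for e in examples if e["label"] == 1]
    neg = [e for e in examples if e["label"] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced

data = [{"label": 1}] * 6 + [{"label": 0}] * 4
print(len(balance(data)))  # → 8
```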
- 3 epochs · batch=32 · lr=2e-5 · fp16
- Label smoothing=0.05
- Embeddings frozen for epoch 1 (prevents catastrophic forgetting)
- Cosine LR decay · Early stopping (patience=2)
- Best checkpoint selected by validation loss
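These hyperparameters map roughly onto `transformers.TrainingArguments`; a sketch assuming a standard `Trainer` setup (the actual training script may differ):

```python
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="sarcasm-roberta",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    fp16=True,
    label_smoothing_factor=0.05,
    lr_scheduler_type="cosine",         # cosine LR decay
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # best checkpoint by validation loss
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    push_to_hub=True,
)

# Early stopping with patience=2, passed to the Trainer's callbacks
stopper = EarlyStoppingCallback(early_stopping_patience=2)

# Freezing the embeddings for epoch 1 would be done on the model itself,
# e.g. model.roberta.embeddings.requires_grad_(False), then unfrozen
# after the first epoch via a callback.
```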
Run this in Google Colab (T4 GPU) to train from scratch and push to HuggingFace Hub.
Requirements before running:
- `danofer-sarcasm.zip` in the root of your Google Drive
- `HF_TOKEN` added to Colab Secrets (write access)
Steps:
- Runtime → Change runtime type → T4 GPU
- Add `HF_TOKEN` to Colab Secrets
- Paste the entire file into one cell → Run
- ~38 min training time
Gradio dashboard with three tabs — single prediction with LIME explanations, batch classification, and model stats.
To deploy on HuggingFace Spaces:
- Create a new Space → SDK: Gradio
- Upload this file as `app.py`
- Upload `requirements.txt`
- Set `MODEL_REPO = "AK-Rahul/sarcasm-roberta"` (already set in the file)
To run in Colab:
- Paste into a cell and run — it will share a public Gradio link automatically
- Sarcasm probability dial — semicircular gauge showing exact confidence
- Intensity bands — from ✅ Clearly Sincere to 🔥 Scorching Sarcasm
- LIME word attribution — highlights which words pushed the prediction
- Batch mode — classify multiple messages at once with shared context
- Export history — download all predictions as CSV
- Model stats tab — confusion matrix, radar chart, architecture details
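The export-history feature amounts to serializing the prediction log as CSV; a minimal stdlib sketch (the field names are assumptions, not the dashboard's exact schema):

```python
import csv
import io

def history_to_csv(history):
    """Serialize a list of prediction records to a CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["text", "label", "score"])
    writer.writeheader()
    writer.writerows(history)
    return buf.getvalue()

rows = [{"text": "Great, another Monday.", "label": "Sarcastic", "score": 0.97}]
print(history_to_csv(rows))
```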
```
transformers>=4.40
torch
gradio>=4.0
plotly
lime
```
MIT — see LICENSE
- danofer/sarcasm — Reddit SARC dataset
- TweetEval — Irony benchmark
- raquiba/Sarcasm_News_Headline — News headlines
- RoBERTa — Base model by Meta AI