This repository contains scripts to reconstruct the dataset used in "Assessing the Effect of Context in Multi-domain Acceptability Judgment". For each source listed below, acquire the raw data and place it in the designated folder for its domain. Run the script(s) in each folder to filter the data for that domain. Then run the script in the root folder to combine the data from all domains into one dataset.
The links to each data source are as follows:
- Literature
- Journalistic
- Advertisement
- Academic
- Instructional
- Communication 1, Communication 2
- Online Media 1, Online Media 2
- Legal 1, Legal 2
At the beginning of each script, there is a section marked CONFIG. Following the specified schema, replace the placeholder paths for INPUT and OUTPUT. Make sure the input type is correct: a folder of TXT files, a CSV file, or a JSON file.
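As an illustration only, a CONFIG section and a check of the input type might look like the sketch below. The scripts' actual variable names and language are not shown in this README, so Python, `INPUT_PATH`, `INPUT_TYPE`, `OUTPUT_PATH`, and `check_input` are all assumptions.

```python
from pathlib import Path

# ---- CONFIG (hypothetical names; adapt to the actual script) ----
INPUT_PATH = Path("path/to/raw/data")      # placeholder: TXT folder, CSV file, or JSON file
INPUT_TYPE = "txt_folder"                  # one of: "txt_folder", "csv", "json"
OUTPUT_PATH = Path("path/to/output.json")  # placeholder: filtered JSON output
# -----------------------------------------------------------------


def check_input(path: Path, input_type: str) -> None:
    """Raise ValueError if the path does not match the declared input type."""
    if input_type == "txt_folder":
        if not path.is_dir() or not any(path.glob("*.txt")):
            raise ValueError(f"{path} is not a folder containing TXT files")
    elif input_type in ("csv", "json"):
        if not path.is_file() or path.suffix.lower() != f".{input_type}":
            raise ValueError(f"{path} is not a .{input_type} file")
    else:
        raise ValueError(f"unknown input type: {input_type}")
```

Calling `check_input(INPUT_PATH, INPUT_TYPE)` before processing would surface a wrong path or type early instead of partway through a run.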
Make sure that the format and schema of each raw data source and of the resulting filtered data per domain match the following specifications. The scripts work only with the specified schemas.
Literature
Input: TXT files in "gutenberg" folder
Output: single JSON file
[
  {
    "author": "William Dean Howells",
    "title": "Poems",
    "sentence": "Tore, and gave to her, who took it with mocking obeisance,",
    "type": "poetry"
  },
  ...
]
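The TXT-folder-to-JSON step for the schema above could be sketched roughly as follows. This is not the repository's filtering logic: the file naming convention `<author>___<title>.txt`, the one-record-per-non-empty-line rule, and the function names `txt_folder_to_records` and `write_json` are all assumptions made for illustration.

```python
import json
from pathlib import Path


def txt_folder_to_records(folder, record_type):
    """Build one record per non-empty line of every TXT file in folder.

    Assumption: files are named "<author>___<title>.txt" (hypothetical).
    """
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        author, _, title = path.stem.partition("___")
        for line in path.read_text(encoding="utf-8").splitlines():
            sentence = line.strip()
            if sentence:  # skip blank lines
                records.append({
                    "author": author,
                    "title": title,
                    "sentence": sentence,
                    "type": record_type,
                })
    return records


def write_json(records, output_path):
    """Write records as a single JSON array, keeping non-ASCII text readable."""
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

For example, `write_json(txt_folder_to_records("gutenberg", "poetry"), "literature.json")` would produce a file in the shape shown above, where `literature.json` is a placeholder name.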
Journalistic
Input: single JSON file
Output: single JSON file
[
  {
    "author": "Kelly Chen",
    "headline": "Grieving Mothers: My Son Would Still Be Alive If He Were White",
    "type": "news"
  },
  ...
]
Advertisement
Input: single CSV file
Output: single JSON file
[
  {
    "company": "Costa Coffee",
    "headline": "For coffee lovers.",
    "type": "slogan"
  },
  ...
]
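A CSV-to-JSON conversion for a schema like the one above might be sketched as below. The column names of the raw CSV are not given in this README, so the `column_map` values in the usage note are hypothetical, as is the function name `csv_to_records` and the rule of skipping rows with an empty field.

```python
import csv


def csv_to_records(csv_path, column_map, record_type):
    """Map CSV columns to output keys and tag each row with record_type.

    column_map maps output key -> CSV column name (the column names
    depend on the raw file and are assumptions here).
    """
    records = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            record = {
                key: (row.get(col) or "").strip()
                for key, col in column_map.items()
            }
            if all(record.values()):  # skip rows with an empty mapped field
                record["type"] = record_type
                records.append(record)
    return records
```

A call such as `csv_to_records("slogans.csv", {"company": "Company", "headline": "Slogan"}, "slogan")` would then yield records in the shape shown above, assuming the raw file has columns named `Company` and `Slogan`.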
Academic
Input: single CSV file
Output: single JSON file
[
  {
    "first_author": "Sobhan Soleymani",
    "title": "Adversarial Examples to Fool Iris Recognition Systems",
    "category": "Machine Learning",
    "type": "academic"
  },
  ...
]
Instructional
Input: single CSV file
Output: single JSON file
[
  {
    "title": "How to Get Adhesive out of Carpet",
    "headline": "Apply dish soap.",
    "type": "instruction"
  },
  ...
]
Communication 1
Input: TXT files in "gutenberg" folder
Output: single JSON file
[
  {
    "author": "Lewis Carroll",
    "title": "Eight or Nine Wise Words about Letter-Writing",
    "sentence": "You will find this much more comfortable than using left-hand pages.",
    "type": "speech"
  },
  ...
]
Communication 2
Input: single CSV file
Output: single JSON file (appended to the Communication 1 output)
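Appending a second source to an existing output file, as in the step above, could look like this minimal sketch. The function name `append_records` is hypothetical, and treating a missing output file as an empty list is an assumption rather than the scripts' documented behavior.

```python
import json
from pathlib import Path


def append_records(output_path, new_records):
    """Extend the JSON array stored at output_path with new_records.

    Returns the total number of records after appending.
    """
    path = Path(output_path)
    # Assumption: a missing file is treated as an empty array.
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    existing.extend(new_records)
    path.write_text(
        json.dumps(existing, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(existing)
```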
Online Media 1
Input: single CSV file
Output: single JSON file
[
  {
    "author": "448998",
    "sentence": "inbox me please, I have an issue with my account. Thanks!",
    "type": "tweet"
  },
  ...
]
Online Media 2
Input: single CSV file
Output: single JSON file (combined with the Online Media 1 output into a separate file)
Legal 1
Input: TXT files in "CUAD_v1/full_contract_txt" folder
Output: single JSON file
[
  {
    "name": "2ThemartComInc",
    "contract": "branding agreement",
    "sentence": "No SOW shall be binding on the parties unless mutually approved by both parties.",
    "type": "contract"
  },
  ...
]
Legal 2
Input: single CSV file
Output: single JSON file (combined with the Legal 1 output into a separate file)
Combined dataset
Input: each JSON file in each domain folder
Output: single JSON file
[
  {
    "sent_id": "s00001",
    "sentence": "Tore, and gave to her, who took it with mocking obeisance,",
    "source": "poetry",
    "keyword": "literature",
    "types": "poems, stories, novels, plays",
    "definition": "written artistic works, especially those with a high and lasting artistic value"
  },
  ...
]
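The combining step could be sketched as follows, under several assumptions that are not confirmed by this README: that `source` is copied from each record's `type`, that the text comes from the first of `sentence`, `headline`, or `title` present in a record, and that `sent_id` is a running counter. `DOMAIN_INFO`, `TEXT_KEYS`, and `combine_domains` are hypothetical names, and only the literature metadata is taken from the example record above.

```python
import json
from pathlib import Path

# Domain metadata; only the literature entry comes from the example record.
DOMAIN_INFO = {
    "literature": {
        "types": "poems, stories, novels, plays",
        "definition": "written artistic works, especially those with a "
                      "high and lasting artistic value",
    },
}

# Assumption: the text field is the first of these keys present in a record.
TEXT_KEYS = ("sentence", "headline", "title")


def combine_domains(domain_files, domain_info=DOMAIN_INFO):
    """Merge per-domain JSON files (keyword -> path) into one record list."""
    combined = []
    for keyword, path in domain_files.items():
        info = domain_info[keyword]
        for rec in json.loads(Path(path).read_text(encoding="utf-8")):
            text = next(rec[k] for k in TEXT_KEYS if k in rec)
            combined.append({
                "sent_id": f"s{len(combined) + 1:05d}",  # running counter, e.g. s00001
                "sentence": text,
                "source": rec["type"],
                "keyword": keyword,
                "types": info["types"],
                "definition": info["definition"],
            })
    return combined
```

The remaining domains would need their own `DOMAIN_INFO` entries, which are deliberately left out here because their values are not given in this README.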
Eunike Kardinata <eunike.kardinata.ef9 [at] is.naist.jp>
This work was supported by the Natural Language Processing Laboratory of Nara Institute of Science and Technology (NAIST) and has been partially supported by JSPS KAKENHI Grant Numbers 25K24369 and 26K21312.
Please cite the following paper.
@inproceedings{kardinata-etal-2026-assessing,
title = "Assessing the Effect of Context in Multi-domain Acceptability Judgment",
author = "Kardinata, Eunike Andriani and
Sakai, Yusuke and
Watanabe, Taro",
editor = "Liakata, Maria and
Moreira, Viviane P. and
Zhang, Jiajun and
Jurgens, David",
booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {ACL} 2026",
month = jul,
year = "2026",
address = "San Diego, California, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.findings-acl.2096/",
pages = "42251--42266",
ISBN = "979-8-89176-395-1",
abstract = "Acceptability judgments provide a crucial basis for understanding how sentences are perceived as natural or well-formed, and they are increasingly used to assess the linguistic capability of large language models (LLMs). Unlike grammaticality, acceptability depends not only on structural form but also on contextual and domain-specific factors. Most prior work evaluates sentences in isolation, and relatively little is known about how explicit contextual cues influence LLM acceptability judgments across domains. This study examines how contextual information affects model-generated acceptability ratings across multiple domains and several LLMs, using different forms of domain-specific contextual cues to situate sentences in their intended usage settings. The results show that context can meaningfully shift model judgments, although its effects vary across models and domains. Overall, the findings provide evidence on contextual effects in LLM acceptability judgment and support the development of more context-aware evaluation frameworks."
}