Context-Aware Acceptability Judgment (CAAJ)

Overview

This repository contains scripts to reconstruct the dataset used in "Assessing the Effect of Context in Multi-domain Acceptability Judgment". For each source specified, acquire the raw data and place it in the designated folder for its domain. Run the script(s) in each folder to filter the data for that domain. Then run the script in the root folder (build_final_dataset.py) to combine the data from all domains into one dataset.

Public Dataset Sources

The following are the links to each data source:

How To

At the beginning of each script, there is a section marked CONFIG. Following the schema specified for the domain, change the placeholder paths for INPUT and OUTPUT. Ensure that the input type is correct, i.e., a folder of TXT files, a CSV file, or a JSON file.
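As an illustration of the step above, here is a minimal sketch of what a CONFIG section and a quick input-type check could look like. The paths, the INPUT_TYPE variable, and the input_matches_type helper are hypothetical and are not taken from the actual scripts, which may be organized differently.

```python
from pathlib import Path

# ---- CONFIG (hypothetical layout; the real scripts may differ) ----
INPUT = Path("path/to/raw/input")        # placeholder: TXT folder, CSV file, or JSON file
OUTPUT = Path("path/to/filtered.json")   # placeholder: filtered JSON output
INPUT_TYPE = "txt_folder"                # assumed values: "txt_folder", "csv", "json"


def input_matches_type(path: Path, input_type: str) -> bool:
    """Return True if `path` looks like the declared input type."""
    if input_type == "txt_folder":
        # A folder counts only if it contains at least one .txt file.
        return path.is_dir() and any(path.glob("*.txt"))
    if input_type in ("csv", "json"):
        return path.is_file() and path.suffix.lower() == f".{input_type}"
    raise ValueError(f"unknown input type: {input_type!r}")
```

Calling input_matches_type(INPUT, INPUT_TYPE) before running a script gives an early warning when a path points at the wrong kind of input.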

Schema Check

Ensure that the format/schema of the raw data and of the resulting filtered data for each domain is consistent with the following specifications. The scripts work only with the specified schema.
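One way to run such a check on a filtered output file is sketched below. The helper names (load_records, find_schema_errors) are hypothetical, and the key list is copied from the Literature example record in this section; the other domains would need their own key lists.

```python
import json
from pathlib import Path

# Keys taken from the Literature example record in this README.
LITERATURE_KEYS = ("author", "title", "sentence", "type")


def load_records(path):
    """Load a filtered output file, expected to hold a JSON array of objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_schema_errors(records, required_keys):
    """Return (index, missing_keys) for each record lacking a required key."""
    errors = []
    for i, record in enumerate(records):
        missing = set(required_keys) - set(record)
        if missing:
            errors.append((i, sorted(missing)))
    return errors
```

An empty result from find_schema_errors(load_records(OUTPUT), LITERATURE_KEYS) means every record carries the expected keys. The sketch does not check value types or reject extra keys.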

1. Literature

Input: TXT files in "gutenberg" folder

Output: single JSON file

[
    {
        "author": "William Dean Howells",
        "title": "Poems",
        "sentence": "Tore, and gave to her, who took it with mocking obeisance,",
        "type": "poetry"
    },
    ...
]

2. Journalistic

Input: single JSON file

Output: single JSON file

[
    {
        "author": "Kelly Chen",
        "headline": "Grieving Mothers: My Son Would Still Be Alive If He Were White",
        "type": "news"
    },
    ...
]

3. Advertisement

Input: single CSV file

Output: single JSON file

[
    {
        "company": "Costa Coffee",
        "headline": "For coffee lovers.",
        "type": "slogan"
    },
    ...
]

4. Academic

Input: single CSV file

Output: single JSON file

[
    {
        "first_author": "Sobhan Soleymani",
        "title": "Adversarial Examples to Fool Iris Recognition Systems",
        "category": "Machine Learning",
        "type": "academic"
    },
    ...
]

5. Instructional

Input: single CSV file

Output: single JSON file

[
    {
        "title": "How to Get Adhesive out of Carpet",
        "headline": "Apply dish soap.",
        "type": "instruction"
    },
    ...
]

6. Communication

SOURCE 1

Input: TXT files in "gutenberg" folder

Output: single JSON file

[
    {
        "author": "Lewis Carroll",
        "title": "Eight or Nine Wise Words about Letter-Writing",
        "sentence": "You will find this much more comfortable than using left-hand pages.",
        "type": "speech"
    },
    ...
]

SOURCE 2

Input: single CSV file

Output: single JSON file (append source 1 output)
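The append step could be sketched as follows, assuming that "append" means the SOURCE 2 records are added to the JSON array already written for SOURCE 1. That reading is an assumption, and append_records is a hypothetical helper rather than a function from the repository.

```python
import json
from pathlib import Path


def append_records(json_path, new_records):
    """Extend the JSON array stored at `json_path` and return its new length."""
    path = Path(json_path)
    # Start from the existing array if the file is already there.
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    existing.extend(new_records)
    path.write_text(
        json.dumps(existing, ensure_ascii=False, indent=4),
        encoding="utf-8",
    )
    return len(existing)
```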

7. Online Media

SOURCE 1

Input: single CSV file

Output: single JSON file

[
    {
        "author": "448998",
        "sentence": "inbox me please, I have an issue with my account. Thanks!",
        "type": "tweet"
    },
    ...
]

SOURCE 2

Input: single CSV file

Output: single JSON file (combine source 1 output into another file)

8. Legal

SOURCE 1

Input: TXT files in "CUAD_v1/full_contract_txt" folder

Output: single JSON file

[
    {
        "name": "2ThemartComInc",
        "contract": "branding agreement",
        "sentence": "No SOW shall be binding on the parties unless mutually approved by both parties.",
        "type": "contract"
    },
    ...
]

SOURCE 2

Input: single CSV file

Output: single JSON file (combine source 1 output into another file)

Final Dataset

Input: each JSON file in each domain folder

Output: single JSON file

[
    {
        "sent_id": "s00001",
        "sentence": "Tore, and gave to her, who took it with mocking obeisance,",
        "source": "poetry",
        "keyword": "literature",
        "types": "poems, stories, novels, plays",
        "definition": "written artistic works, especially those with a high and lasting artistic value"
    },
    ...
]
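The combination step can be sketched as below. This is not the code in build_final_dataset.py: the function name, the DOMAIN_INFO table, and the text_key field are hypothetical. Only the Literature entry is filled in, because it is the only domain whose "types" and "definition" values appear in this README. The sketch assumes that "source" is copied from the record's "type" and that "sent_id" is a running five-digit counter, as the example above suggests.

```python
# Per-domain metadata. Only the literature values come from the README example;
# text_key (which field holds the sentence) is an assumption for illustration.
DOMAIN_INFO = {
    "literature": {
        "text_key": "sentence",
        "types": "poems, stories, novels, plays",
        "definition": "written artistic works, especially those "
                      "with a high and lasting artistic value",
    },
}


def build_final(records_by_domain, domain_info):
    """Merge per-domain records into one list in the final dataset schema."""
    final = []
    for keyword, records in records_by_domain.items():
        info = domain_info[keyword]  # raises KeyError for an unknown domain
        for record in records:
            final.append({
                "sent_id": f"s{len(final) + 1:05d}",  # s00001, s00002, ...
                "sentence": record[info["text_key"]],
                "source": record["type"],
                "keyword": keyword,
                "types": info["types"],
                "definition": info["definition"],
            })
    return final
```

Keeping the domain metadata in one table means a new domain only needs a new DOMAIN_INFO entry, with no change to the merging loop.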

Contact

Eunike Kardinata <eunike.kardinata.ef9 [at] is.naist.jp>

Acknowledgement

This work was supported by the Natural Language Processing Laboratory of Nara Institute of Science and Technology (NAIST) and has been partially supported by JSPS KAKENHI Grant Numbers 25K24369 and 26K21312.

Citation

Please cite the following paper.

@inproceedings{kardinata-etal-2026-assessing,
    title = "Assessing the Effect of Context in Multi-domain Acceptability Judgment",
    author = "Kardinata, Eunike Andriani  and
      Sakai, Yusuke  and
      Watanabe, Taro",
    editor = "Liakata, Maria  and
      Moreira, Viviane P.  and
      Zhang, Jiajun  and
      Jurgens, David",
    booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {ACL} 2026",
    month = jul,
    year = "2026",
    address = "San Diego, California, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.findings-acl.2096/",
    pages = "42251--42266",
    ISBN = "979-8-89176-395-1",
    abstract = "Acceptability judgments provide a crucial basis for understanding how sentences are perceived as natural or well-formed, and they are increasingly used to assess the linguistic capability of large language models (LLMs). Unlike grammaticality, acceptability depends not only on structural form but also on contextual and domain-specific factors. Most prior work evaluates sentences in isolation, and relatively little is known about how explicit contextual cues influence LLM acceptability judgments across domains. This study examines how contextual information affects model-generated acceptability ratings across multiple domains and several LLMs, using different forms of domain-specific contextual cues to situate sentences in their intended usage settings. The results show that context can meaningfully shift model judgments, although its effects vary across models and domains. Overall, the findings provide evidence on contextual effects in LLM acceptability judgment and support the development of more context-aware evaluation frameworks."
}
