This repository accompanies the Master's dissertation of the same title, submitted to the University of Bath in 2026. It contains the full experimental codebase used to investigate how presentation realism (textbook vs. realistic framing) and planning mode (direct one-shot vs. stepwise interactive) affect large language model performance on classical planning tasks set in a disaster-relief logistics domain.
disaster_planning/
├── pyproject.toml # Dependencies and project metadata
├── config/
│ ├── default.yaml # Experiment configuration
│ └── models.yaml # OpenRouter model definitions and pricing
├── src/
│ ├── domain/ # PDDL domain definition
│ ├── instance_generation/ # Procedural generation of problem instances
│ ├── rendering/ # Textbook and realistic presentation renderers
│ ├── llm/ # OpenRouter API client with multi-key failover
│ ├── planning/ # Direct, stepwise, and Fast Downward runners
│ ├── validation/ # Internal state simulator and VAL integration
│ ├── experiments/ # Experiment orchestrator with feasibility gating
│ └── reports/ # Spreadsheet and chart generation
├── scripts/
│ ├── generate_instances.py # Create PDDL problem files
│ ├── run_experiment.py # Run the full experiment
│ ├── generate_reports.py # Compile results into .xlsx and charts
│ ├── estimate_costs.py # Project API costs from a pilot run
│ └── rerun_failures.py # Retry transient LLM failures
├── tools/
│ └── wsl_downward.py # WSL wrapper for Fast Downward
└── results/
├── instances/ # Generated PDDL problem files
└── reports/ # Final dissertation data (master_thesis_data.xlsx)
- Python ≥ 3.9
- Fast Downward installed and accessible via WSL (see Setup below)
- An OpenRouter API key
- (Optional) VAL binaries for external plan validation; the framework falls back to its internal simulator if VAL is unavailable
python -m venv .venv
.venv\Scripts\activate # Windows
pip install -e .$env:OPENROUTER_API_KEY = "your-primary-key"
$env:OPENROUTER_API_KEY_2 = "your-backup-key" # Optional failover keyThe client uses OPENROUTER_API_KEY by default and automatically fails over to OPENROUTER_API_KEY_2 on authentication errors.
This project uses Fast Downward as its classical planning baseline. On Windows, it runs via WSL:
- Ensure WSL is installed with an Ubuntu distribution (
wsl --install -d Ubuntu). - Clone and build Fast Downward inside the
tools/downward/directory:cd tools git clone https://github.com/aibasel/downward.git cd downward wsl -d Ubuntu ./build.py
- Verify it works:
python tools/wsl_downward.py --version
The wrapper at tools/wsl_downward.py translates Windows paths to WSL mount paths automatically.
python scripts/generate_instances.pyCreates PDDL problem files across three difficulty tiers (easy, medium, hard) in results/instances/. Instance count and random seed are configured in config/default.yaml.
python scripts/run_experiment.pyExecutes the full experimental matrix: Fast Downward baselining → direct mode LLM queries → stepwise interactive runs (with feasibility gating) → validation. Results are saved to results/outputs/<timestamp>/.
python scripts/rerun_failures.pyReruns only rows that failed due to API errors (llm_error). Planning-level failures (e.g. budget_exhausted, goal_not_achieved) are preserved as-is.
python scripts/generate_reports.pyCompiles results into a multi-sheet .xlsx workbook and publication-ready charts in results/reports/.
python scripts/estimate_costs.pyProjects the cost of a full experiment run from a pilot CSV, with per-condition breakdowns and safety margins.
| Factor | Levels |
|---|---|
| Presentation style | Textbook, Realistic |
| Planning mode | Direct (one-shot), Stepwise |
| Difficulty | Easy, Medium, Hard |
- Rolling-context stepwise mode: each LLM turn sends only the system prompt, a compact mission brief, current state, and last feedback (~400–800 tokens per turn), rather than replaying the full conversation history.
- Feasibility gating: instances whose optimal plan length exceeds the step budget are automatically skipped in stepwise mode, controlled by
stepwise_skip_infeasibleandstepwise_budget_bufferinconfig/default.yaml. - Loop detection: non-convergent stepwise runs are terminated early via state-revisit counting, repeated-action detection, and no-progress monitoring.
- Multi-key failover: supports two OpenRouter API keys with automatic rotation on authentication errors.
If you use this code or data in your own work, please cite:
@mastersthesis{chapman2026presentation,
title = {Presentation Realism and Agentic Planning: An {LLM} Evaluation in Disaster-Relief Logistics},
author = {Chapman, Matthew},
school = {University of Bath},
year = {2026}
}This project is licensed under the MIT Licence. See LICENSE for details.
Note: Fast Downward is a separate project with its own licence (GPLv3) and is not included in this repository.