Presentation Realism and Agentic Planning: An LLM Evaluation in Disaster-Relief Logistics

This repository accompanies the Master's dissertation of the same title, submitted to the University of Bath in 2026. It contains the full experimental codebase used to investigate how presentation realism (textbook vs. realistic framing) and planning mode (direct one-shot vs. stepwise interactive) affect large language model performance on classical planning tasks set in a disaster-relief logistics domain.

Project Structure

disaster_planning/
├── pyproject.toml                 # Dependencies and project metadata
├── config/
│   ├── default.yaml               # Experiment configuration
│   └── models.yaml                # OpenRouter model definitions and pricing
├── src/
│   ├── domain/                    # PDDL domain definition
│   ├── instance_generation/       # Procedural generation of problem instances
│   ├── rendering/                 # Textbook and realistic presentation renderers
│   ├── llm/                       # OpenRouter API client with multi-key failover
│   ├── planning/                  # Direct, stepwise, and Fast Downward runners
│   ├── validation/                # Internal state simulator and VAL integration
│   ├── experiments/               # Experiment orchestrator with feasibility gating
│   └── reports/                   # Spreadsheet and chart generation
├── scripts/
│   ├── generate_instances.py      # Create PDDL problem files
│   ├── run_experiment.py          # Run the full experiment
│   ├── generate_reports.py        # Compile results into .xlsx and charts
│   ├── estimate_costs.py          # Project API costs from a pilot run
│   └── rerun_failures.py          # Retry transient LLM failures
├── tools/
│   └── wsl_downward.py            # WSL wrapper for Fast Downward
└── results/
    ├── instances/                 # Generated PDDL problem files
    └── reports/                   # Final dissertation data (master_thesis_data.xlsx)

Requirements

Python ≥ 3.9
Fast Downward installed and accessible via WSL (see Setup below)
An OpenRouter API key
(Optional) VAL binaries for external plan validation; the framework falls back to its internal simulator if VAL is unavailable

Setup

Python Environment

python -m venv .venv
.venv\Scripts\activate      # Windows
pip install -e .

OpenRouter API Keys

$env:OPENROUTER_API_KEY = "your-primary-key"
$env:OPENROUTER_API_KEY_2 = "your-backup-key"   # Optional failover key

The client uses OPENROUTER_API_KEY by default and automatically fails over to OPENROUTER_API_KEY_2 on authentication errors.

Fast Downward Setup

This project uses Fast Downward as its classical planning baseline. On Windows, it runs via WSL:

Ensure WSL is installed with an Ubuntu distribution (wsl --install -d Ubuntu).

Clone and build Fast Downward inside the tools/downward/ directory:

cd tools
git clone https://github.com/aibasel/downward.git
cd downward
wsl -d Ubuntu ./build.py

Verify it works:
```
python tools/wsl_downward.py --version
```

The wrapper at tools/wsl_downward.py translates Windows paths to WSL mount paths automatically.

Reproducing the Experiment

1. Generate Problem Instances

python scripts/generate_instances.py

Creates PDDL problem files across three difficulty tiers (easy, medium, hard) in results/instances/. Instance count and random seed are configured in config/default.yaml.

2. Run the Experiment

python scripts/run_experiment.py

Executes the full experimental matrix: Fast Downward baselining → direct mode LLM queries → stepwise interactive runs (with feasibility gating) → validation. Results are saved to results/outputs/<timestamp>/.

3. Retry Transient Failures (Optional)

python scripts/rerun_failures.py

Reruns only rows that failed due to API errors (llm_error). Planning-level failures (e.g. budget_exhausted, goal_not_achieved) are preserved as-is.

4. Generate Reports

python scripts/generate_reports.py

Compiles results into a multi-sheet .xlsx workbook and publication-ready charts in results/reports/.

5. Estimate API Costs (Optional)

python scripts/estimate_costs.py

Projects the cost of a full experiment run from a pilot CSV, with per-condition breakdowns and safety margins.

Experimental Conditions

Factor	Levels
Presentation style	Textbook, Realistic
Planning mode	Direct (one-shot), Stepwise
Difficulty	Easy, Medium, Hard

Key Design Decisions

Rolling-context stepwise mode: each LLM turn sends only the system prompt, a compact mission brief, current state, and last feedback (~400–800 tokens per turn), rather than replaying the full conversation history.
Feasibility gating: instances whose optimal plan length exceeds the step budget are automatically skipped in stepwise mode, controlled by stepwise_skip_infeasible and stepwise_budget_buffer in config/default.yaml.
Loop detection: non-convergent stepwise runs are terminated early via state-revisit counting, repeated-action detection, and no-progress monitoring.
Multi-key failover: supports two OpenRouter API keys with automatic rotation on authentication errors.

Citation

If you use this code or data in your own work, please cite:

@mastersthesis{chapman2026presentation,
  title   = {Presentation Realism and Agentic Planning: An {LLM} Evaluation in Disaster-Relief Logistics},
  author  = {Chapman, Matthew},
  school  = {University of Bath},
  year    = {2026}
}

Licence

This project is licensed under the MIT Licence. See LICENSE for details.

Note: Fast Downward is a separate project with its own licence (GPLv3) and is not included in this repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Presentation Realism and Agentic Planning: An LLM Evaluation in Disaster-Relief Logistics

Project Structure

Requirements

Setup

Python Environment

OpenRouter API Keys

Fast Downward Setup

Reproducing the Experiment

1. Generate Problem Instances

2. Run the Experiment

3. Retry Transient Failures (Optional)

4. Generate Reports

5. Estimate API Costs (Optional)

Experimental Conditions

Key Design Decisions

Citation

Licence

About

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
config		config
docs		docs
results		results
scripts		scripts
src		src
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
downward.bat		downward.bat
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Presentation Realism and Agentic Planning: An LLM Evaluation in Disaster-Relief Logistics

Project Structure

Requirements

Setup

Python Environment

OpenRouter API Keys

Fast Downward Setup

Reproducing the Experiment

1. Generate Problem Instances

2. Run the Experiment

3. Retry Transient Failures (Optional)

4. Generate Reports

5. Estimate API Costs (Optional)

Experimental Conditions

Key Design Decisions

Citation

Licence

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages