Skip to content

ch020/disaster-planning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Presentation Realism and Agentic Planning: An LLM Evaluation in Disaster-Relief Logistics

This repository accompanies the Master's dissertation of the same title, submitted to the University of Bath in 2026. It contains the full experimental codebase used to investigate how presentation realism (textbook vs. realistic framing) and planning mode (direct one-shot vs. stepwise interactive) affect large language model performance on classical planning tasks set in a disaster-relief logistics domain.

Project Structure

disaster_planning/
├── pyproject.toml                 # Dependencies and project metadata
├── config/
│   ├── default.yaml               # Experiment configuration
│   └── models.yaml                # OpenRouter model definitions and pricing
├── src/
│   ├── domain/                    # PDDL domain definition
│   ├── instance_generation/       # Procedural generation of problem instances
│   ├── rendering/                 # Textbook and realistic presentation renderers
│   ├── llm/                       # OpenRouter API client with multi-key failover
│   ├── planning/                  # Direct, stepwise, and Fast Downward runners
│   ├── validation/                # Internal state simulator and VAL integration
│   ├── experiments/               # Experiment orchestrator with feasibility gating
│   └── reports/                   # Spreadsheet and chart generation
├── scripts/
│   ├── generate_instances.py      # Create PDDL problem files
│   ├── run_experiment.py          # Run the full experiment
│   ├── generate_reports.py        # Compile results into .xlsx and charts
│   ├── estimate_costs.py          # Project API costs from a pilot run
│   └── rerun_failures.py          # Retry transient LLM failures
├── tools/
│   └── wsl_downward.py            # WSL wrapper for Fast Downward
└── results/
    ├── instances/                 # Generated PDDL problem files
    └── reports/                   # Final dissertation data (master_thesis_data.xlsx)

Requirements

  • Python ≥ 3.9
  • Fast Downward installed and accessible via WSL (see Setup below)
  • An OpenRouter API key
  • (Optional) VAL binaries for external plan validation; the framework falls back to its internal simulator if VAL is unavailable

Setup

Python Environment

python -m venv .venv
.venv\Scripts\activate      # Windows
pip install -e .

OpenRouter API Keys

$env:OPENROUTER_API_KEY = "your-primary-key"
$env:OPENROUTER_API_KEY_2 = "your-backup-key"   # Optional failover key

The client uses OPENROUTER_API_KEY by default and automatically fails over to OPENROUTER_API_KEY_2 on authentication errors.

Fast Downward Setup

This project uses Fast Downward as its classical planning baseline. On Windows, it runs via WSL:

  1. Ensure WSL is installed with an Ubuntu distribution (wsl --install -d Ubuntu).
  2. Clone and build Fast Downward inside the tools/downward/ directory:
    cd tools
    git clone https://github.com/aibasel/downward.git
    cd downward
    wsl -d Ubuntu ./build.py
  3. Verify it works:
    python tools/wsl_downward.py --version

The wrapper at tools/wsl_downward.py translates Windows paths to WSL mount paths automatically.

Reproducing the Experiment

1. Generate Problem Instances

python scripts/generate_instances.py

Creates PDDL problem files across three difficulty tiers (easy, medium, hard) in results/instances/. Instance count and random seed are configured in config/default.yaml.

2. Run the Experiment

python scripts/run_experiment.py

Executes the full experimental matrix: Fast Downward baselining → direct mode LLM queries → stepwise interactive runs (with feasibility gating) → validation. Results are saved to results/outputs/<timestamp>/.

3. Retry Transient Failures (Optional)

python scripts/rerun_failures.py

Reruns only rows that failed due to API errors (llm_error). Planning-level failures (e.g. budget_exhausted, goal_not_achieved) are preserved as-is.

4. Generate Reports

python scripts/generate_reports.py

Compiles results into a multi-sheet .xlsx workbook and publication-ready charts in results/reports/.

5. Estimate API Costs (Optional)

python scripts/estimate_costs.py

Projects the cost of a full experiment run from a pilot CSV, with per-condition breakdowns and safety margins.

Experimental Conditions

Factor Levels
Presentation style Textbook, Realistic
Planning mode Direct (one-shot), Stepwise
Difficulty Easy, Medium, Hard

Key Design Decisions

  • Rolling-context stepwise mode: each LLM turn sends only the system prompt, a compact mission brief, current state, and last feedback (~400–800 tokens per turn), rather than replaying the full conversation history.
  • Feasibility gating: instances whose optimal plan length exceeds the step budget are automatically skipped in stepwise mode, controlled by stepwise_skip_infeasible and stepwise_budget_buffer in config/default.yaml.
  • Loop detection: non-convergent stepwise runs are terminated early via state-revisit counting, repeated-action detection, and no-progress monitoring.
  • Multi-key failover: supports two OpenRouter API keys with automatic rotation on authentication errors.

Citation

If you use this code or data in your own work, please cite:

@mastersthesis{chapman2026presentation,
  title   = {Presentation Realism and Agentic Planning: An {LLM} Evaluation in Disaster-Relief Logistics},
  author  = {Chapman, Matthew},
  school  = {University of Bath},
  year    = {2026}
}

Licence

This project is licensed under the MIT Licence. See LICENSE for details.

Note: Fast Downward is a separate project with its own licence (GPLv3) and is not included in this repository.

About

Presentation Realism and Agentic Planning: An LLM Evaluation in Disaster-Relief Logistics

Topics

Resources

License

Stars

Watchers

Forks

Contributors