This repository contains the code and experiments for adversarial attacks on multimodal language model agents, specifically focused on shopping scenarios in the VisualWebArena environment.
This project combines two main components:
- agent-attack/: Implementation of adversarial attacks on multimodal LM agents
- visualwebarena/: Modified VisualWebArena environment for evaluating multimodal agents on realistic visual web tasks
The research focuses on dissecting the adversarial robustness of multimodal agents when performing shopping-related tasks in web environments.
```
.
├── agent-attack/            # Adversarial attack implementation
│   ├── agent_attack/        # Core attack modules
│   ├── scripts/             # Evaluation and attack scripts
│   ├── episode_scripts/     # Episode-wise evaluation scripts
│   └── step_scripts/        # Step-wise evaluation scripts
│
└── visualwebarena/          # VisualWebArena evaluation environment
    ├── agent/               # Agent implementations
    ├── browser_env/         # Browser environment
    ├── llms/                # LLM providers
    └── evaluation_harness/  # Evaluation tools
```
- Python 3.10 or 3.11 (Python 3.12 is not supported due to deprecated distutils)
- CUDA-capable GPU (recommended for attacks and captioning)
- Docker (required for VisualWebArena environments)
- At least 200GB disk space (for VisualWebArena)
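Before installing, you can sanity-check the prerequisites above. A minimal sketch (the function names here are illustrative, not part of the repository):

```python
import shutil
import sys

def check_prereqs(version=sys.version_info, which=shutil.which):
    """Report which of the prerequisites listed above look unsatisfied."""
    issues = []
    if version[:2] not in ((3, 10), (3, 11)):
        issues.append(f"Python {version[0]}.{version[1]} is unsupported; use 3.10 or 3.11")
    if which("docker") is None:
        issues.append("Docker not found on PATH")
    if which("nvidia-smi") is None:
        issues.append("No NVIDIA driver found (GPU is recommended for attacks)")
    return issues

if __name__ == "__main__":
    for issue in check_prereqs():
        print("WARNING:", issue)
```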
```bash
git clone git@github.com:Kristinx351/VWA-Agent-Attack_Shopping.git
cd VWA-Agent-Attack_Shopping
```

Set up the VisualWebArena environment:

```bash
cd visualwebarena/
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install
pip install -e .
pytest -x  # Verify installation
```

Set up the agent-attack package:

```bash
cd ../agent-attack/
pip install -e .
```

You may need to install PyTorch according to your CUDA version.
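For example, to install a CUDA 11.8 build (a sketch; check pytorch.org for the wheel index matching your CUDA toolkit):

```shell
# Install PyTorch built against CUDA 11.8 (adjust the index URL to your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```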
Set up the required API keys as environment variables:
OpenAI:

```bash
export OPENAI_API_KEY=<your-openai-api-key>
```

Anthropic (for Claude):

```bash
export ANTHROPIC_API_KEY=<your-anthropic-api-key>
```

Google (for Gemini):

```bash
gcloud auth login
gcloud config set project <your-google-cloud-project-id>
export VERTEX_PROJECT=<your-google-cloud-project-id>
export AISTUDIO_API_KEY=<your-aistudio-api-key>
```

Configure the URLs for each website environment:
```bash
export CLASSIFIEDS="http://127.0.0.1:9980"
export CLASSIFIEDS_RESET_TOKEN="4b61655535e7ed388f0d40a93600254c"
export SHOPPING="http://127.0.0.1:7770"
export REDDIT="http://127.0.0.1:9999"
export WIKIPEDIA="http://127.0.0.1:8888"
export HOMEPAGE="http://127.0.0.1:4399"
```

Replace http://127.0.0.1 with your actual IP address if needed.
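Missing variables are an easy way to get confusing failures later, so it can help to check them up front. A minimal sketch (the helper name is illustrative, not part of the repository):

```python
import os

# The site variables the experiments expect, per the export list above
REQUIRED = ["CLASSIFIEDS", "CLASSIFIEDS_RESET_TOKEN", "SHOPPING",
            "REDDIT", "WIKIPEDIA", "HOMEPAGE"]

def missing_vars(env=os.environ):
    """Return the required VisualWebArena variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    absent = missing_vars()
    if absent:
        print("Missing environment variables:", ", ".join(absent))
```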
- Set up Docker environments: follow the instructions in visualwebarena/environment_docker/README.md
- Generate test config files:

  ```bash
  cd visualwebarena/
  python scripts/generate_test_data.py
  ```

- Obtain auto-login cookies:

  ```bash
  bash prepare.sh
  ```

Generate adversarial examples for shopping scenarios:
Captioner Attack:

```bash
cd agent-attack/
python scripts/run_cap_attack.py
```

CLIP Attack:

```bash
python scripts/run_clip_attack.py --model gpt-4-vision-preview
python scripts/run_clip_attack.py --model gemini-1.5-pro-latest
python scripts/run_clip_attack.py --model claude-3-opus-20240229
python scripts/run_clip_attack.py --model gpt-4o-2024-05-13
```

Note: each attack on an image takes approximately 1 hour on a single GPU. We used an NVIDIA A100 (80GB) for captioner attacks and an NVIDIA A6000 for CLIP attacks.
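The captioner attack is BIM-based: it repeatedly takes small signed-gradient steps while staying inside an L-infinity ball around the clean image. A minimal numpy sketch of that update rule (`grad_fn` stands in for the real captioner-loss gradient, which the scripts compute on the GPU):

```python
import numpy as np

def bim_attack(x, grad_fn, eps=8 / 255, alpha=1 / 255, steps=10):
    """Basic Iterative Method: ascend the loss via sign(gradient) steps,
    projecting back into the L-infinity eps-ball around the clean image x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel values
    return x_adv

# Toy usage: a constant gradient pushes every pixel up to the eps boundary
x = np.full((4, 4), 0.5)
x_adv = bim_attack(x, grad_fn=lambda z: np.ones_like(z))
```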
Run full episode evaluations for different models:
GPT-4V:

```bash
bash agent-attack/episode_scripts/gpt4v_benign.sh
bash agent-attack/episode_scripts/gpt4v_bim_caption_attack.sh
bash agent-attack/episode_scripts/gpt4v_clip_attack_self_cap.sh
```

GPT-4o:

```bash
bash agent-attack/episode_scripts/gpt4o_benign.sh
bash agent-attack/episode_scripts/gpt4o_bim_caption_attack.sh
bash agent-attack/episode_scripts/gpt4o_clip_attack_self_cap.sh
```

Gemini 1.5 Pro:

```bash
bash agent-attack/episode_scripts/gemini1.5pro_benign.sh
bash agent-attack/episode_scripts/gemini1.5pro_bim_caption_attack.sh
bash agent-attack/episode_scripts/gemini1.5pro_clip_attack_self_cap.sh
```

Claude 3 Opus:

```bash
bash agent-attack/episode_scripts/claude3opus_benign.sh
bash agent-attack/episode_scripts/claude3opus_bim_caption_attack.sh
bash agent-attack/episode_scripts/claude3opus_clip_attack_self_cap.sh
```

For faster development and testing, use step-wise evaluation:

```bash
bash agent-attack/step_scripts/gpt4v_benign.sh
bash agent-attack/step_scripts/gpt4v_bim_caption_attack.sh
bash agent-attack/step_scripts/gpt4v_clip_attack_self_cap.sh
```

Run the GPT-4V + SoM agent on shopping tasks:
```bash
cd visualwebarena/
python run.py \
  --instruction_path agent/prompts/jsons/p_som_cot_id_actree_3s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --result_dir gpt4_som_shopping \
  --test_config_base_dir=config_files/test_shopping \
  --model gpt-4-vision-preview \
  --action_set_tag som \
  --observation_type image_som
```

Supported models:

- OpenAI: GPT-4V, GPT-4o, GPT-3.5
- Anthropic: Claude 3 Opus
- Google: Gemini 1.5 Pro
Attack types:

- Captioner Attack (BIM): attacks the image captioning component
- CLIP Attack: attacks the vision-language understanding component
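Conceptually, the CLIP attack optimizes the image so that its CLIP embedding moves toward a chosen target text embedding. A numpy sketch of that objective (the real attack differentiates this loss through the CLIP image encoder; the function name is illustrative):

```python
import numpy as np

def clip_attack_loss(image_emb, target_text_emb):
    """Negative cosine similarity between the (attacked) image embedding and
    the target text embedding; minimizing this pulls the two together."""
    a = image_emb / np.linalg.norm(image_emb)
    b = target_text_emb / np.linalg.norm(target_text_emb)
    return -float(a @ b)
```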
Evaluation metrics:

- Success rate on shopping tasks
- Attack effectiveness
- Agent robustness analysis
Large data directories are excluded from this repository (see .gitignore):
- agent-attack/data/
- agent-attack/exp_data/
- agent-attack/exp_result/
- visualwebarena/environment_docker/
If you use this code, please cite the original papers:
```bibtex
@article{wu2024agentattack,
  title={Dissecting Adversarial Robustness of Multimodal LM Agents},
  author={Wu, Chen Henry and Shah, Rishi and Koh, Jing Yu and Salakhutdinov, Ruslan and Fried, Daniel and Raghunathan, Aditi},
  journal={arXiv preprint arXiv:2406.12814},
  year={2024}
}

@article{koh2024visualwebarena,
  title={VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks},
  author={Koh, Jing Yu and Lo, Robert and Jang, Lawrence and Duvvur, Vikram and Lim, Ming Chong and Huang, Po-Yu and Neubig, Graham and Zhou, Shuyan and Salakhutdinov, Ruslan and Fried, Daniel},
  journal={arXiv preprint arXiv:2401.13649},
  year={2024}
}
```

- VisualWebArena - Original VisualWebArena repository
- Agent Attack - Original agent attack repository
- WebArena - Web-based agent evaluation benchmark
See individual LICENSE files in agent-attack/ and visualwebarena/ directories.
- This repository focuses specifically on shopping scenarios
- Large data files are excluded from git (see .gitignore)
- Make sure to set up all required environment variables before running experiments
- GPU is recommended for running attacks and captioning models