- Repository Map
- How The Pieces Fit Together
- Prerequisites
- Installation
- Environment Variables
- Optional Local Assets
- Quick Start
- Configuration Guide
- Citation
Xcientist is a multi-agent research workflow for turning a topic into survey artifacts, structured ideas, executable experiments, and technical blog articles. The repository currently centers on four agent stacks:

- Survey Agent: collects papers, builds topic clusters, and writes survey outputs.
- Idea Agent (LigAgent): turns a topic or seed idea into a research proposal with survey-grounded retrieval, graph-backed references, and Memory-Guided MCTS.
- Experiment Agent (SuperAgent): prepares a workspace, generates code, runs experiments, and integrates iteration reports.
- Blog Agent: reads an experiment workspace and writes a technical blog article with generated figures and quality checks.

The repo also contains a prototype loop runner for Survey -> Idea -> Experiment -> Blog, shared configuration, and a reusable memory subsystem.

## Repository Map

```text
Xcientist/
├── README.md / README_CN.md      # documentation
├── pyproject.toml / uv.lock      # Python package metadata and locked uv environment
├── requirements.txt              # compatibility dependency list
├── run_survey.sh                 # wrapper for `xcientist survey`
├── run_idea.sh                   # wrapper for `xcientist idea`
├── run_experiment.sh             # wrapper for `xcientist experiment`
├── run_blog.sh                   # wrapper for `xcientist blog`
├── run_pipeline.sh               # wrapper for `xcientist pipeline`
├── scripts/                      # setup and environment helper scripts
│   ├── install_base.sh
│   ├── install_heavy.sh
│   ├── install_mcp_wrappers.sh
│   └── sync_claude_anthropic_env.py
├── src/
│   ├── __main__.py               # `python -m src` entrypoint
│   ├── cli.py                    # unified `xcientist` CLI
│   ├── config/
│   │   ├── __init__.py           # unified config loader
│   │   └── default.yaml          # main project config
│   ├── pipeline/                 # Survey -> Idea -> Experiment -> Blog loop
│   ├── agents/
│   │   ├── survey_agent/         # paper retrieval, clustering, survey generation
│   │   ├── idea_agent/           # LigAgent proposal generation
│   │   ├── experiment_agent/     # SuperAgent experiment orchestration
│   │   └── blog_agent/           # technical blog generation
│   └── memory/                   # shared vector/symbolic memory APIs
├── graph/                        # graph retrieval service and indexing scripts
├── database/                     # local caches used by retrieval workflows
├── assets/                       # project images and static assets
└── workspace/                    # default runtime workspace, created/used locally
```

## How The Pieces Fit Together

```text
Topic
  -> Survey Agent
       output: survey.md + survey.json
  -> Idea Agent
       output: idea_result.json
  -> Experiment Agent
       output: workspace, results, ablation_results.json
  -> Blog Agent
       output: blog workspace, article draft, generated figures
```

The pipeline runner in `src/pipeline/run_loop.py` automates the full Survey -> Idea -> Experiment -> Blog flow, but running the individual agents remains the clearest way to operate and debug the system.

## Prerequisites

- `uv`
- Python 3.12
- `node` and `npx` for the Experiment Agent MCP servers
- API keys, depending on which agent you run
- Local assets for graph-backed retrieval and memory-enabled workflows:
  - Paper-Graph related resources (download link); put them into `<repo_root>/data/processed`.
  - Embedding models:

    ```bash
    mkdir -p models/bge-m3
    mkdir -p models/all-MiniLM-L6-v2
    modelscope download --model baai/bge-m3 --local_dir <repo_root>/models/bge-m3
    modelscope download --model sentence-transformers/all-MiniLM-L6-v2 --local_dir <repo_root>/models/all-MiniLM-L6-v2
    ```

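
You can check the tool prerequisites before installing with a small standard-library script. This is a sketch and not part of Xcientist; the file and function names are hypothetical. It assumes that "Python 3.12" means 3.12 or newer, and it only checks that `uv`, `node`, and `npx` are on `PATH`. It does not check their versions. After installation, `xcientist doctor` is still the project's own check.

```python
import shutil
import sys

# Tools from the Prerequisites list. node/npx are only needed for the
# Experiment Agent's MCP servers, so they are reported separately.
CORE_TOOLS = ["uv"]
EXPERIMENT_TOOLS = ["node", "npx"]


def python_ok(min_version=(3, 12)):
    """Return True if the running interpreter is at least `min_version` (assumed lower bound)."""
    return sys.version_info[:2] >= min_version


def missing_tools(tools):
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print(f"Python >= 3.12: {python_ok()}")
    print(f"Missing core tools: {missing_tools(CORE_TOOLS) or 'none'}")
    print(f"Missing Experiment Agent tools: {missing_tools(EXPERIMENT_TOOLS) or 'none'}")
```

A missing `node` or `npx` only matters if you plan to run the Experiment Agent.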
## Installation

The default setup path is now `uv`:

```bash
git clone --depth 1 https://github.com/OpenDFM/Xcientist.git
cd Xcientist
uv sync
source .venv/bin/activate
cp .env.example .env
xcientist doctor
```

Common group combinations:

```bash
# Base CLI / config / API-only workflows
uv sync

# Memory-enabled and local-model workflows
uv sync --group memory --group ml

# PDF parsing stack
uv sync --group pdf

# Blog Agent full workflow: PDF parsing + image generation / OCR / text removal
uv sync --group pdf --group blog

# Full local environment
uv sync --all-groups
```

If you want local MCP wrapper scripts for the Experiment Agent:

```bash
xcientist install-mcp-wrappers
```

`environment.yml` is still available as a legacy/full-environment fallback, but `uv sync` is the primary path for Survey + Idea + Experiment + Blog + Pipeline. The dependencies are split so that the default install stays lightweight and the heavy local-model / PDF stacks are opt-in.

After activation, the project exposes CLI entrypoints such as `xcientist`, `xcientist-survey`, and `xcientist-idea` directly in the shell.
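
If you need several optional stacks at once, you can combine the group flags from the combinations above into a single `uv sync` call. The sketch below builds that argument list and removes duplicate groups. The workflow labels (`base`, `memory`, `pdf`, `blog`) are hypothetical names for the combinations shown and are not Xcientist identifiers. The script prints the command and does not run it.

```python
import shlex

# Hypothetical workflow labels mapped to the uv dependency groups listed
# under "Common group combinations".
WORKFLOW_GROUPS = {
    "base": [],
    "memory": ["memory", "ml"],
    "pdf": ["pdf"],
    "blog": ["pdf", "blog"],
}


def uv_sync_args(workflows):
    """Return `uv sync` arguments covering every group the workflows need, without duplicates."""
    groups = []
    for workflow in workflows:
        for group in WORKFLOW_GROUPS[workflow]:
            if group not in groups:
                groups.append(group)
    args = ["uv", "sync"]
    for group in groups:
        args += ["--group", group]
    return args


if __name__ == "__main__":
    # Memory-enabled workflows plus the full Blog Agent stack.
    print(shlex.join(uv_sync_args(["memory", "blog"])))
```

If you want every group, `uv sync --all-groups` is simpler.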
## Environment Variables

Different agents read slightly different variables. In practice, these are the most useful ones to define:

```bash
export OPENAI_API_KEY=...
export OPENAI_BASE_URL=...
export SEMANTIC_SCHOLAR_API_KEY=...
export ANTHROPIC_API_KEY=...
export ANTHROPIC_BASE_URL=...
export SERPER_API_KEY=...
export GITHUB_AI_TOKEN=...
export JINA_API_KEY=...
export TAVILY_API_KEY=...
export HF_TOKEN=...
```

Notes:

- Set both `OPENAI_API_BASE` and `OPENAI_BASE_URL` if you use a custom OpenAI-compatible endpoint.
- The CLI loads the repo-root `.env` first and still falls back to `src/config/.env` for older setups.
- `src/config/default.yaml` is the main configuration file for the current unified workflow.
- Survey, Idea, Experiment, and Blog still have some agent-specific conventions on top of the unified config.

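
To see which of these variables are actually visible, the sketch below merges the two `.env` locations in the order described above. It is not the CLI's loader. The parser only handles simple `KEY=VALUE` lines. Giving variables already exported in the shell the highest priority is an assumption; it matches python-dotenv's default of not overriding existing variables. `resolve_env` and `report_missing` are hypothetical names.

```python
import os
from pathlib import Path

# Variables from the export list above.
KNOWN_KEYS = [
    "OPENAI_API_KEY", "OPENAI_BASE_URL", "SEMANTIC_SCHOLAR_API_KEY",
    "ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL", "SERPER_API_KEY",
    "GITHUB_AI_TOKEN", "JINA_API_KEY", "TAVILY_API_KEY", "HF_TOKEN",
]


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and an `export ` prefix."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_env(repo_root=".", environ=None):
    """Assumed precedence: shell environment > repo-root .env > src/config/.env."""
    merged = {}
    # Apply the lowest-priority file first so the repo-root .env overwrites it.
    for env_file in (Path(repo_root) / "src" / "config" / ".env", Path(repo_root) / ".env"):
        if env_file.is_file():
            merged.update(parse_env_file(env_file))
    merged.update(os.environ if environ is None else environ)
    return merged


def report_missing(repo_root="."):
    """Return the known keys that are unset or empty after merging."""
    env = resolve_env(repo_root)
    return [key for key in KNOWN_KEYS if not env.get(key)]


if __name__ == "__main__":
    print("Unset:", report_missing() or "none")
```

An unset key is only a problem if the agent you run reads it.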
## Optional Local Assets

Some retrieval-heavy paths expect local assets that are not stored in the repository:

- `data/processed/graph.db`
- `data/processed/core_component_summary_vector_store/`
- `models/bge-m3/`
- `models/all-MiniLM-L6-v2/`

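
Before you enable retrieval-heavy paths, a short check can report which of these assets are missing. This sketch only tests that each path exists relative to the repository root. It does not validate file contents, and `missing_assets` is a hypothetical helper that is not part of the CLI.

```python
from pathlib import Path

# Local assets listed above, relative to the repository root.
LOCAL_ASSETS = [
    "data/processed/graph.db",
    "data/processed/core_component_summary_vector_store",
    "models/bge-m3",
    "models/all-MiniLM-L6-v2",
]


def missing_assets(repo_root="."):
    """Return the asset paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in LOCAL_ASSETS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_assets()
    print("All local assets present." if not missing else f"Missing: {missing}")
```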
If you use graph-backed retrieval, start the graph service from the repository root:

```bash
uvicorn graph.server:app --host 127.0.0.1 --port 8000
```

Health check:

```bash
curl http://127.0.0.1:8000/health
```

## Quick Start

Recommended first-time flow:

```bash
uv sync --group memory --group ml
source .venv/bin/activate
cp .env.example .env
xcientist doctor
```

If `doctor` passes and your local assets are in place, use the commands below.
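
If you use graph-backed retrieval, check that the graph service started with `uvicorn` is reachable before you run the commands. This standard-library sketch is a Python equivalent of the `curl` health check. It treats any HTTP 200 response as healthy, which is an assumption because this README does not document the response body of `/health`. It also ignores proxy environment variables because the service is local. `graph_service_healthy` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def graph_service_healthy(url="http://127.0.0.1:8000/health", timeout=2.0):
    """Return True if `url` answers with HTTP 200, False on any error or other status."""
    # An empty ProxyHandler bypasses http_proxy/https_proxy for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx HTTPError responses.
        return False


if __name__ == "__main__":
    print("graph service healthy:", graph_service_healthy())
```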

The following commands use the provided Training-Free Memory System for LLM Agents example.

Generate a survey only:

```bash
xcientist survey --topic "Training-Free Memory System for LLM Agents"
```

Run ideation from the provided sample survey:

```bash
xcientist idea --topic "Training-Free Memory System for LLM Agents"
```

Run an experiment from the provided sample idea:

```bash
xcientist experiment --experiment agent_memory --idea-json <repo_root>/src/agents/idea_agent/example/idea_result.json
```

Start blog generation from the sample experiment workspace:

```bash
xcientist blog --experiment agent_memory --source-workspace <repo_root>/workspace/training-free-memory-example
```

For further configuration changes, edit `src/config/default.yaml`.

### Survey Agent

Primary entrypoint:

```bash
xcientist survey
```

Override the topic directly:

```bash
xcientist survey --topic <your_topic_name>
```

Typical outputs:

- `src/agents/survey_agent/outputs/.../survey.md`
- `src/agents/survey_agent/outputs/.../survey.json`
- `src/agents/survey_agent/outputs/.../evaluation.txt`

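
The output path contains a run-specific segment, shown as `...` above, so a small helper can find the newest run. The sketch assumes that the three files share one directory, as the listing suggests. It picks the most recently modified `survey.md` and searches recursively because the layout under `outputs/` is not documented here. `latest_survey_outputs` is a hypothetical helper.

```python
from pathlib import Path

SURVEY_FILES = ("survey.md", "survey.json", "evaluation.txt")


def latest_survey_outputs(repo_root="."):
    """Return {file name: path} for the most recently modified survey run, or None."""
    outputs = Path(repo_root) / "src" / "agents" / "survey_agent" / "outputs"
    if not outputs.is_dir():
        return None
    # Search recursively because the directory layout under outputs/ is not fixed here.
    surveys = sorted(outputs.rglob("survey.md"), key=lambda p: p.stat().st_mtime)
    if not surveys:
        return None
    run_dir = surveys[-1].parent
    return {name: run_dir / name for name in SURVEY_FILES if (run_dir / name).exists()}


if __name__ == "__main__":
    found = latest_survey_outputs()
    print(found if found else "No survey outputs found yet.")
```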
### Idea Agent

Primary entrypoint:

```bash
xcientist idea
```

Override the topic directly:

```bash
xcientist idea --topic <your_topic_name>
```

The default run uses `src/config/default.yaml`, creates a run directory under `src/agents/idea_agent/runs/`, and writes `idea_result.json` plus logs.
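
The Experiment Agent expects an absolute path to `idea_result.json`. A helper that finds the newest result under `src/agents/idea_agent/runs/` and resolves its path saves some copying. This is a hypothetical sketch. It picks the most recently modified file and loads it as plain JSON without assuming any particular schema.

```python
import json
from pathlib import Path


def latest_idea_result(repo_root="."):
    """Return (absolute path, parsed JSON) for the newest idea_result.json, or (None, None)."""
    runs = Path(repo_root) / "src" / "agents" / "idea_agent" / "runs"
    if not runs.is_dir():
        return None, None
    results = sorted(runs.rglob("idea_result.json"), key=lambda p: p.stat().st_mtime)
    if not results:
        return None, None
    path = results[-1].resolve()
    with path.open(encoding="utf-8") as fh:
        return path, json.load(fh)


if __name__ == "__main__":
    path, idea = latest_idea_result()
    if path is None:
        print("No idea_result.json found yet.")
    else:
        # Print only the top-level structure; the schema is not documented here.
        print(sorted(idea) if isinstance(idea, dict) else type(idea).__name__)
        print(f"xcientist experiment --experiment my_exp --idea-json {path}")
```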

### Experiment Agent

Primary entrypoint:

```bash
xcientist experiment --experiment my_exp --idea-json /abs/path/to/idea_result.json
```

Prepare only:

```bash
xcientist experiment --experiment my_exp --idea-json /abs/path/to/idea_result.json --prepare-only
```

Direct entrypoint:

```bash
python -m src.agents.experiment_agent.main --experiment my_exp --resume --verbose
```

Key workspace outputs live under `workspace/<experiment_id>/` by default and usually include:

- `idea.json`
- `project/`
- `dataset_candidate/`
- `results/`
- `agent_reports/`
- `ablation_results.json`

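
These entries are only *usually* present, so a status report is more useful than a hard failure. The sketch below uses a hypothetical helper, `workspace_status`, to report which of the listed entries exist in `workspace/<experiment_id>/`. It assumes the default workspace root and does not inspect the contents of the entries.

```python
from pathlib import Path

# Entries the Experiment Agent usually writes into workspace/<experiment_id>/.
EXPECTED_ENTRIES = (
    "idea.json",
    "project",
    "dataset_candidate",
    "results",
    "agent_reports",
    "ablation_results.json",
)


def workspace_status(experiment_id, workspace_root="workspace"):
    """Map each expected entry to True if it exists in the experiment workspace."""
    workspace = Path(workspace_root) / experiment_id
    return {name: (workspace / name).exists() for name in EXPECTED_ENTRIES}


if __name__ == "__main__":
    for name, present in workspace_status("my_exp").items():
        print(f"{'ok     ' if present else 'missing'} {name}")
```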
### Blog Agent

The Blog Agent generates a technical blog article from an existing experiment workspace.

Recommended entrypoint:

```bash
xcientist blog --experiment my_exp
```

With the default workspace root at `<repo_root>/workspace`, this reads the source experiment from:

```text
<repo_root>/workspace/my_exp
```

You can also pass that experiment workspace explicitly:

```bash
xcientist blog --experiment my_exp --source-workspace <repo_root>/workspace/my_exp
```

If the experiment workspace is not under the Blog Agent's default source path, pass it explicitly:

```bash
xcientist blog --experiment my_exp --source-workspace /abs/path/to/experiment_workspace
```

Resume an existing blog workspace:

```bash
xcientist blog --experiment my_exp --resume
```

`./run_blog.sh` remains available as a compatibility wrapper and delegates to the same `xcientist blog` command.
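
The lookup described above can be written as a small resolver: an explicit `--source-workspace` wins, and otherwise the experiment is read from `<workspace_root>/<experiment>`. This is a hypothetical sketch of that rule only. The Blog Agent's real logic may differ, including how it handles `BLOG_AGENT_SOURCE_WORKSPACE` from the configuration table.

```python
from pathlib import Path


def resolve_blog_source(experiment, source_workspace=None, workspace_root=None, repo_root="."):
    """Return the experiment workspace the Blog Agent would read under the rule above."""
    if source_workspace is not None:
        # An explicit path always takes precedence.
        return Path(source_workspace).expanduser().resolve()
    # Otherwise fall back to <workspace_root>/<experiment>, defaulting to <repo_root>/workspace.
    root = Path(workspace_root) if workspace_root is not None else Path(repo_root) / "workspace"
    return (root / experiment).resolve()


if __name__ == "__main__":
    print(resolve_blog_source("my_exp"))
    print(resolve_blog_source("my_exp", source_workspace="/abs/path/to/experiment_workspace"))
```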

### Pipeline

Run the integrated loop with the topic from `src/config/default.yaml`:

```bash
xcientist pipeline
```

Override the research topic at launch time:

```bash
xcientist pipeline --topic "Training-Free Memory System for LLM Agents"
```

Use a custom config file:

```bash
xcientist pipeline --config /abs/path/to/config.yaml --topic "Your Research Topic"
```

## Configuration Guide

The current configuration layout is mixed by design:

| Area | Primary source |
|---|---|
| Global project config | src/config/default.yaml |
| Survey Agent | survey: block in src/config/default.yaml plus src/agents/survey_agent/config/*.yaml |
| Idea Agent | idea: block in src/config/default.yaml |
| Experiment Agent | experiment: block in src/config/default.yaml and environment variables |
| Blog Agent | blog: block in src/config/default.yaml plus BLOG_AGENT_SOURCE_WORKSPACE when needed |
| Pipeline | pipeline: block in src/config/default.yaml |
If you are starting fresh, edit src/config/default.yaml first. It is the most reliable single file to understand current defaults.
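
A setting can come from more than one place: a block in `default.yaml`, an agent-specific YAML file, or a CLI flag such as `--topic`. A common way to combine such layers is a recursive merge in which later sources override earlier ones. The sketch below shows that technique on plain dictionaries. It is not Xcientist's config loader, and the keys in the usage example (`topic`, `max_papers`) are hypothetical.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict in which `override` wins and nested dicts are merged key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical layers: defaults, an agent-specific file, then a CLI override.
    defaults = {"survey": {"topic": "default topic", "max_papers": 50}}
    agent_file = {"survey": {"max_papers": 100}}
    cli = {"survey": {"topic": "Training-Free Memory System for LLM Agents"}}
    print(deep_merge(deep_merge(defaults, agent_file), cli))
```

Neither input dictionary is modified, so the same defaults can be reused across several runs.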
## Citation

If you use Xcientist in your research, please cite:

```bibtex
@misc{
  to be released
}
```
