Dynamic personalized scientific-paper recommendation, reading, and reporting.
PaperFlow turns daily paper discovery into a closed-loop research workflow: build a profile, rank today's papers, read the useful ones, collect feedback, and adapt tomorrow's recommendations.
This first public release ships a CLI, a local browser GUI, and an optional Feishu/Lark bot. You can run PaperFlow entirely from the terminal, open the local GUI for interactive paper selection, or keep the Feishu/Lark webhook server running for scheduled chat pushes.
| Item | Details |
|---|---|
| Input | Research profiles, papers, PDFs, homepages, Google Scholar pages |
| Output | Daily paper digests, reading reports, weekly profile reports |
| Runtime | Local Python CLI, local browser GUI, SQLite, optional Feishu/Lark webhook + ngrok |
| Benchmark | PaperFlow-Bench on HuggingFace, with public evaluation scripts |
Scientific-paper recommendation is not a one-shot ranking problem. Real researchers ask a moving question: what should I read today, and how should the system adapt tomorrow?
| Traditional paper alerts | PaperFlow |
|---|---|
| Static keyword or profile matching | Structured profile with feedback updates |
| Same feed every day | Date-specific candidate pools and daily digest budget |
| Recommendation only | Recommendation + reading report + feedback loop |
| No explicit drift handling | Short-term and long-term interest drift modeling |
| Hard to reproduce longitudinally | Public PaperFlow-Bench episodes and evaluator |
| Capability | What it does |
|---|---|
| Profile bootstrapping | Builds scholarly profiles from text, PDFs, homepages, or Google Scholar pages |
| Daily recommendation | Fetches arXiv, OpenReview, and journal papers, then ranks a personalized daily digest |
| Reading reports | Generates personalized paper reports from metadata and PDF content |
| Feedback learning | Updates one shared profile from CLI, GUI, and Feishu/Lark feedback, using selected, skipped, read, and natural-language signals |
| Drift adaptation | Tracks short-window vs long-window interest movement across days |
| Feishu/Lark bot | Sends daily pushes and weekly reports; routes chat feedback and PDF requests |
| Benchmark tooling | Packages, downloads, predicts, and evaluates PaperFlow-Bench submissions |
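The drift-adaptation row can be illustrated with a small sketch. It assumes, purely for illustration, that each day's interest is a dictionary of topic weights. The function names, window sizes, and topic labels are hypothetical and are not taken from PaperFlow's code.

```python
from collections import defaultdict


def mean_weights(days):
    """Average topic weights over a list of per-day {topic: weight} dicts."""
    if not days:
        return {}
    totals = defaultdict(float)
    for day in days:
        for topic, weight in day.items():
            totals[topic] += weight
    return {topic: total / len(days) for topic, total in totals.items()}


def interest_drift(history, short_window=3, long_window=14):
    """Return short-window mean minus long-window mean for every topic.

    Positive values suggest a topic is rising; negative values suggest it is fading.
    """
    short = mean_weights(history[-short_window:])
    long_ = mean_weights(history[-long_window:])
    topics = set(short) | set(long_)
    return {t: short.get(t, 0.0) - long_.get(t, 0.0) for t in topics}


# Hypothetical history: "agents" dominated early, "graph rag" took over recently.
history = [{"agents": 1.0}] * 4 + [{"graph rag": 1.0}] * 2
drift = interest_drift(history, short_window=2, long_window=6)
```

In this toy history the drift for "graph rag" is positive and the drift for "agents" is negative, which is the kind of signal a ranker could use to favor recent interests.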
PaperFlow's daily flow has five steps. Steps 1-3 run only once; steps 4-5 become your daily routine.
# 1. Install
git clone https://github.com/OpenRaiser/PaperFlow.git
cd PaperFlow
pip install -e ".[all]" # full install (or `pip install -e .` for the minimal CLI)
# 2. Configure providers (start with no-download settings; see below)
cp .env.example .env
# edit .env to set PAPERFLOW_LLM_PROVIDER and, for production, an embedding backend
# 3. Initialize runtime + create your user profile (REQUIRED)
paperflow init
paperflow doctor
paperflow profile \
--user-id user_alice \
--natural-language "I work on LLM agents for scientific discovery, \
literature mining, and automated paper reading."
# 4. Daily push (run every morning, or as often as you like)
paperflow daily --user-id user_alice
# 5. Read selected papers (paper IDs come from the latest daily push)
paperflow read 1 3 7 --user-id user_alice
# Optional: use the local browser GUI for steps 4-5
paperflow gui
Step 3 is mandatory. paperflow daily, paperflow read, and paperflow feedback all read the profile created by paperflow profile. Without a profile, paperflow daily has no personalization signal to score against, so paperflow read has no push to read from. See Initialize a User Profile below for the four bootstrap methods (text / PDF / Google Scholar / homepage).
paperflow demo
The demo uses deterministic mock/hash providers, so it does not need API keys or network access. Use it to confirm the install before configuring real providers.
Copy the environment template:
cp .env.example .env
Use the PAPERFLOW_* variables as the canonical configuration surface. A
fresh install defaults to no-download embeddings so paperflow demo and setup
checks are quick, but real recommendation quality needs a semantic embedding
backend.
Use one OpenAI-compatible gateway for both generation and embeddings:
PAPERFLOW_LLM_PROVIDER=openai
PAPERFLOW_LLM_MODEL=gpt-4o-mini
PAPERFLOW_EMBED_PROVIDER=openai
PAPERFLOW_EMBED_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...
# OPENAI_BASE_URL=https://your-openai-compatible-gateway/v1
OpenAI-compatible gateways are supported through OPENAI_BASE_URL, including
OpenAI, DashScope, Azure OpenAI, vLLM, and similar services. If credentials are
missing or still look like placeholders, PaperFlow falls back to mock/hash
providers where possible so local workflows remain testable.
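The fallback behavior described above can be sketched as follows. This is an illustration under stated assumptions, not PaperFlow's actual resolver: the function names, the placeholder markers, and the mock/hash defaults for unset variables are all hypothetical.

```python
import os

# Assumed heuristics for "still looks like a placeholder"; the real checks may differ.
PLACEHOLDER_MARKERS = ("sk-...", "your-", "changeme")


def looks_like_placeholder(value):
    """Treat empty values and obvious template strings as missing credentials."""
    value = (value or "").strip()
    return not value or any(marker in value.lower() for marker in PLACEHOLDER_MARKERS)


def resolve_providers(env=None):
    """Pick LLM and embedding providers, falling back to mock/hash without a usable key."""
    env = os.environ if env is None else env
    llm = env.get("PAPERFLOW_LLM_PROVIDER", "mock")      # default is an assumption
    embed = env.get("PAPERFLOW_EMBED_PROVIDER", "hash")  # default is an assumption
    key_missing = looks_like_placeholder(env.get("OPENAI_API_KEY"))
    if llm == "openai" and key_missing:
        llm = "mock"
    if embed == "openai" and key_missing:
        embed = "hash"
    return {"llm": llm, "embed": embed}
```

Passing a plain dictionary instead of the process environment keeps the sketch easy to try out. To see what PaperFlow itself resolved, use paperflow doctor.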
Use this for install checks, GUI demos, or classrooms where downloading model weights is not acceptable:
PAPERFLOW_LLM_PROVIDER=mock
PAPERFLOW_EMBED_PROVIDER=hash
This mode is deterministic and fast, but the hash vectors do not encode real semantic similarity. It is not the recommended setting for evaluating recommendation quality.
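To see why hash vectors are deterministic but not semantic, here is a minimal sketch of the generic hashing trick. It is not necessarily how PaperFlow's hash provider works; the function names and the 64-dimension default are assumptions chosen for illustration.

```python
import hashlib
import math


def hash_embed(text, dimensions=64):
    """Map text to a unit-length vector by hashing each token into a signed bucket."""
    vector = [0.0] * dimensions
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimensions
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def cosine(a, b):
    """Dot product of two vectors that are already unit length (or all zero)."""
    return sum(x * y for x, y in zip(a, b))


# The same text always produces the same vector, on any machine.
repeatable = hash_embed("graph rag") == hash_embed("graph rag")
```

Identical tokens always land in the same bucket, which makes runs reproducible. Related words such as "car" and "automobile" hash independently, so their similarity carries no information about meaning.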
Use this only when the machine is allowed to download and cache local model weights:
PAPERFLOW_EMBED_PROVIDER=sentence_transformers
PAPERFLOW_EMBED_MODEL=BAAI/bge-m3
PAPERFLOW_EMBED_DIMENSIONS=1024
This local mode needs no embedding API key, but the first run downloads the
model weights. BAAI/bge-m3 is about 2.3 GB, so do not use it for quick
classroom demos or first-run installation checks.
After changing providers, run:
paperflow doctor
paperflow doctor prints the resolved provider configuration. Runtime data is
stored under data/ and is ignored by Git.
PaperFlow keeps one profile per user_id, and every other command
(daily, read, feedback) reads from that profile. You must create at
least one profile before the first daily run — otherwise paperflow daily
has nothing to score against and paperflow read has no push to read from.
You can bootstrap a profile from any of these four sources, or combine them:
# (a) Self-description in natural language (fastest)
paperflow profile \
--user-id user_alice \
--natural-language "I work on LLM agents for scientific discovery, \
literature mining, and automated paper reading."
# (b) One or more papers you have written or care about
paperflow profile --user-id user_alice --pdf /path/to/my-paper.pdf
# (c) A Google Scholar profile (PaperFlow scrapes the public page)
paperflow profile \
--user-id user_alice \
--scholar-url "https://scholar.google.com/citations?user=..."
# (d) A personal lab or homepage
paperflow profile \
--user-id user_alice \
--homepage-url "https://example.edu/~alice"
Repeated paperflow profile calls merge new signals into the existing
profile by default. Use --reset-existing only when you want to rebuild it
from scratch.
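The merge-by-default behavior can be pictured with a short sketch. It assumes a profile is a simple mapping from keyword to weight, which is a simplification for illustration; the function name and the additive merge rule are hypothetical, and PaperFlow's structured profile may combine signals differently.

```python
def update_profile(existing, new_signals, reset_existing=False):
    """Merge new keyword weights into a profile, or rebuild it when reset_existing is set."""
    merged = {} if reset_existing else dict(existing)
    for keyword, weight in new_signals.items():
        merged[keyword] = merged.get(keyword, 0.0) + weight
    return merged


profile = {"llm agents": 1.0}
signals = {"llm agents": 0.5, "literature mining": 1.0}

merged = update_profile(profile, signals)                        # default: merge
rebuilt = update_profile(profile, signals, reset_existing=True)  # like --reset-existing
```

With the default, earlier signals are kept and reinforced. With the reset flag, only the newly supplied signals remain.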
Inspect the resulting profile any time with:
python scripts/show_profile.py user_alice
Start the local browser GUI with:
paperflow gui
To preview the interface without installing PaperFlow, open the GitHub Pages mock-data preview: PaperFlow GUI Preview.
The GUI uses the same local SQLite database as the CLI. It is designed for the
real daily workflow: select a user profile, run or load the latest daily push,
mark papers for reading, mark explicit negative feedback, generate local
Markdown reading reports, manage must-read anchors, read an arXiv ID or local
PDF directly, manage local research roles, filter feedback history, and search
the PaperFlow Wiki. It does not run background schedules; scheduled
Feishu/Lark delivery still uses deployments/feishu/.
Useful options:
paperflow gui --port 8766
paperflow gui --host 0.0.0.0 --no-browser
Detailed GUI notes are in deployments/desktop/README.md.
paperflow --help
| Command | Purpose |
|---|---|
| paperflow init | Create local runtime directories and SQLite tables |
| paperflow doctor | Check dependencies, credentials, and runtime paths |
| paperflow demo | Run an offline provider demo |
| paperflow profile | Create or update a user profile from text, PDFs, Scholar, or homepage data |
| paperflow daily | Generate a daily personalized paper push |
| paperflow read | Generate a personalized reading report |
| paperflow wiki | List, search, and inspect the local reading wiki |
| paperflow feedback | Record feedback for a previous push |
| paperflow gui | Start the local browser GUI |
| paperflow eval | Evaluate PaperFlow-Bench predictions |
Generate a daily recommendation card without sending it:
paperflow daily \
--user-id user_role1 \
--days 1 \
--output data/daily_push.txt \
--dry-run
Generate reading reports from paper IDs shown in a previous push:
paperflow read 1 3 7 --user-id user_role1 --no-feishu
By default, paperflow read uses that user's latest push in
data/paperflow.db. To read from a specific previous push:
paperflow read 1 3 7 --user-id user_role1 --push-id push_20260401_090000 --no-feishu
Daily pushes, reading reports, feedback signals, and profile-drift snapshots are also ingested into the local PaperFlow Wiki. Inspect it:
paperflow wiki backfill --user-id user_role1
paperflow wiki topics --user-id user_role1
paperflow wiki stats --user-id user_role1
paperflow wiki search "graph rag" --user-id user_role1
paperflow wiki ask "What have I read about graph RAG?" --user-id user_role1
PDFs, reading-report Markdown, monthly reports, and Topic Index files can be saved directly into an Obsidian vault. Point all four export variables at the same upper-level folder:
PAPERFLOW_PDF_DIR=/Users/mario/Documents/Obsidian Vault/Daily Note/Daily Note 2026
PAPERFLOW_READING_REPORTS_DIR=/Users/mario/Documents/Obsidian Vault/Daily Note/Daily Note 2026
PAPERFLOW_MONTHLY_REPORT_DIR=/Users/mario/Documents/Obsidian Vault/Daily Note/Daily Note 2026
PAPERFLOW_TOPIC_INDEX_DIR=/Users/mario/Documents/Obsidian Vault/Daily Note/Daily Note 2026
PAPERFLOW_STORAGE_ROLE_SUBDIR=true
PAPERFLOW_STORAGE_CATEGORY_SUBDIR=true
PAPERFLOW_STORAGE_MONTHLY_SUBDIR=true
Local exports are role-scoped by default. If data/roles.json maps role1 to
user_role1, then --user-id user_role1 writes these folders:
role1/pdf/arXiv - May 2026/, role1/reading_reports/arXiv - May 2026/,
role1/monthly_reports/, and role1/topic_index/. Monthly report and Topic
Index filenames also include the target month, for example
PaperFlow Monthly Report - role1 - 2026-05.md.
Set PAPERFLOW_STORAGE_ROLE_SUBDIR=false or
PAPERFLOW_STORAGE_CATEGORY_SUBDIR=false only if you want a flatter legacy
layout.
Export a monthly reading summary and Topic Index for Obsidian:
paperflow wiki monthly --user-id user_role1
Without --month, PaperFlow exports the current calendar month. Use
--month 2026-05 only when you intentionally want to regenerate an older
month.
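The month default and the report filename pattern shown above can be sketched like this. The helper names are hypothetical and the validation rule is an assumption; only the YYYY-MM format and the filename pattern come from the text above.

```python
from datetime import date


def resolve_month(month=None, today=None):
    """Return the target month as YYYY-MM, defaulting to the current calendar month."""
    if month is not None:
        year, mon = month.split("-")
        # Building a date validates the input; "2026-13" raises ValueError.
        return date(int(year), int(mon), 1).strftime("%Y-%m")
    return (today or date.today()).strftime("%Y-%m")


def monthly_report_filename(role, month=None, today=None):
    """Build a filename following the documented monthly-report pattern."""
    return f"PaperFlow Monthly Report - {role} - {resolve_month(month, today)}.md"


explicit = monthly_report_filename("role1", month="2026-05")
implicit = monthly_report_filename("role1", today=date(2026, 5, 17))
```

Both calls produce the same filename here because the injected "today" falls in May 2026. Accepting today as a parameter is a design choice that keeps the default testable.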
Feishu/Lark document export is optional and separate from the GUI and CLI core. Configuration is in docs/feishu-doc-export.md. After configuring Feishu, CLI usage is:
paperflow read 1 --user-id user_role1
paperflow read 1 --user-id user_role1 --folder-id <feishu_folder_token>
In the GUI, tick "同时尝试写入飞书文档" ("Also try writing to a Feishu document") when generating a reading report.
Record feedback:
paperflow feedback \
--user-id user_role1 \
--push-id push_20260401_090000 \
--reply "1, 3"
Feedback from CLI, GUI, and Feishu/Lark bot replies is stored in the same
SQLite database and updates the same profile for that user_id. See
docs/feedback-loop.md for the full learning path.
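A reply such as "1, 3" has to be turned into paper numbers before it can update a profile. The sketch below shows one plausible way to do that. The function name and the parsing rules are assumptions for illustration, and how PaperFlow interprets the numbers afterwards is described in docs/feedback-loop.md, not here.

```python
import re


def parse_reply(reply):
    """Extract paper numbers from a reply such as "1, 3" or "1 3".

    Order is preserved and repeated numbers are dropped.
    """
    numbers = []
    for match in re.findall(r"\d+", reply):
        index = int(match)
        if index not in numbers:
            numbers.append(index)
    return numbers


selected = parse_reply("1, 3")
```

The same helper accepts comma-separated and space-separated replies, so one code path could serve both CLI and chat input.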
The Feishu/Lark integration is optional. Use it when you want PaperFlow to run as a chat bot with scheduled pushes and weekly reports.
If you only want reading reports exported as Feishu/Lark docs, use docs/feishu-doc-export.md instead; that path does not require ngrok or webhook callbacks.
Add the Feishu/Lark and ngrok values to .env:
FEISHU_APP_ID=
FEISHU_APP_SECRET=
FEISHU_VERIFICATION_TOKEN=
FEISHU_USER_ID=
NGROK_AUTHTOKEN=
NGROK_DOMAIN=
Bind role chat IDs in data/roles.json, then start the local webhook server:
python deployments/feishu/webhook-server/start-with-ngrok.py
The script prints the public Request URL. Paste it into the Feishu/Lark event
subscription page and enable im.message.receive_v1.
Keep the process running if you want scheduled jobs:
| Job | Default schedule |
|---|---|
| Daily paper push | 09:00, Asia/Shanghai |
| Weekly report | Monday 10:00, Asia/Shanghai |
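The default schedule in the table can be expressed as a small next-run calculation. This is a sketch of the timing logic only, not the scheduler PaperFlow actually uses; the function name is hypothetical. Asia/Shanghai is UTC+8 with no daylight saving time, so a fixed offset is used to keep the example dependency-free.

```python
from datetime import datetime, time, timedelta, timezone

# Fixed UTC+8 offset standing in for the Asia/Shanghai zone.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def next_run(now, at, weekday=None):
    """Return the next scheduled datetime strictly after `now`.

    `at` is the local time of day; `weekday` (0 = Monday) limits the job to one day per week.
    """
    now = now.astimezone(SHANGHAI)
    candidate = datetime.combine(now.date(), at, tzinfo=SHANGHAI)
    while candidate <= now or (weekday is not None and candidate.weekday() != weekday):
        candidate += timedelta(days=1)
    return candidate


now = datetime(2026, 4, 1, 9, 30, tzinfo=SHANGHAI)   # a Wednesday, just after the daily push
next_daily = next_run(now, time(9, 0))               # daily paper push
next_weekly = next_run(now, time(10, 0), weekday=0)  # weekly report on Monday
```

Because 09:00 has already passed at the sample time, the next daily push falls on the following morning, and the next weekly report falls on the following Monday.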
Watch live logs (PowerShell; on macOS/Linux, use tail -f data/webhook_stderr.log):
Get-Content data/webhook_stderr.log -Wait
Common chat commands:
profile
daily push
weekly report
1 3
read 1
Detailed setup: docs/feishu-webhook-setup.md.
PaperFlow-Bench is published on HuggingFace: OpenRaiser/PaperFlow.
Download:
python experiments/benchmark/fetch_benchmark.py \
--output-dir data/PaperFlow-Bench
Create a simple valid prediction file from pool order:
python experiments/benchmark/make_benchmark_submission.py \
--benchmark-dir data/PaperFlow-Bench \
--output data/PaperFlow-Bench/example_predictions.jsonl
Evaluate:
paperflow eval \
--benchmark-dir data/PaperFlow-Bench \
--predictions data/PaperFlow-Bench/example_predictions.jsonl \
--output data/PaperFlow-Bench/example_metrics.json
More benchmark details are in the benchmark documentation under docs/.
research profile
|
v
daily candidate pool -> scoring + drift adjustment -> paper digest
| |
v v
arXiv / OpenReview / journals reading reports
|
v
feedback + profile update
|
v
tomorrow's recommendation
PaperFlow/
paperflow/ CLI and provider abstraction
agents/ Core workflow agents
skills/ Fetching, parsing, profile, and storage helpers
deployments/desktop/ Optional local browser GUI
deployments/feishu/ Optional Feishu/Lark bot deployment
experiments/ Benchmark and paper reproduction scripts
scripts/ Operational utilities
config/ Source, scoring, and direction configuration
docs/ Setup and benchmark documentation
tests/ Unit and integration tests
pytest tests -q
pytest experiments/tests -q
The GitHub Actions workflow runs the main test suite. Experiment tests are kept
in experiments/tests/ for benchmark and reproduction validation.
For a complete guide map, see docs/README.md. The most common follow-ups are:
- docs/quickstart.md for the first local run
- docs/configuration.md for environment variables and paths
- docs/feedback-loop.md for CLI / GUI / Feishu profile learning
- deployments/desktop/README.md for local GUI behavior
- PaperFlow GUI Preview for a no-install UI preview
- docs/feishu-doc-export.md for Feishu document export
- docs/feishu-webhook-setup.md for webhook + ngrok bot deployment
If you use PaperFlow or PaperFlow-Bench in academic work, please cite:
@misc{paperflow2026,
title = {PaperFlow: Personalized Scientific-Paper Recommendation, Reading, and Reporting},
author = {PaperFlow Contributors},
year = {2026},
url = {https://github.com/OpenRaiser/PaperFlow}
}
The formal citation will be updated after the paper is published.
PaperFlow is released under the MIT License. See LICENSE.