Skip to content

xindixu/github-activity-analyzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

13 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

GitHub PR Analytics Suite

πŸ€– AI-powered GitHub PR analytics tool that transforms your pull request history into professional markdown reports with intelligent project categorization, pattern analysis, and performance insights.

✨ Features

  • πŸ“Š Pipeline Commands: Fetch, by-project, technical, and perf-review as separate steps
  • πŸ”„ Auto-fetch: Report commands re-run fetch if CSVs are missing
  • πŸ€– AI-Powered Insights: Uses OpenAI to generate concise summaries and comprehensive pattern analysis
  • πŸ“ Project Categorization: Extracts project names from PR titles ([CS-1234] ProjectName: description)
  • πŸ“ Professional Reports: Generates beautiful markdown reports perfect for performance reviews
  • πŸ” Comprehensive Analysis: 5-section analysis covering project focus, technical themes, development velocity, cross-project insights, and key accomplishments
  • πŸ“ˆ Smart Metrics: Tracks lines changed, PR distribution, and project priorities
  • 🎯 Performance Review Ready: Actionable insights for self-assessments and project planning

πŸ—οΈ Project Structure

pr/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ github_pr_fetcher.py                     # Fetches PRs from GitHub
β”‚   └── pr_summarizer.py                         # AI-powered analysis & summarization
β”œβ”€β”€ output/
β”‚   β”œβ”€β”€ pr_YYYY-MM-DD_YYYY-MM-DD_detailed.csv    # Raw PR data
β”‚   β”œβ”€β”€ pr_YYYY-MM-DD_YYYY-MM-DD_summarized.csv  # With AI summaries
β”‚   β”œβ”€β”€ pr_YYYY-MM-DD_YYYY-MM-DD_by_project.md
β”‚   β”œβ”€β”€ pr_YYYY-MM-DD_YYYY-MM-DD_technical_highlights.md
β”‚   └── pr_YYYY-MM-DD_YYYY-MM-DD_perf_review.md
β”œβ”€β”€ main.py                                      # Complete workflow entry point
β”œβ”€β”€ requirements.txt                             # Dependencies
β”œβ”€β”€ .env.example                                 # Configuration template
└── README.md

πŸš€ Quick Start

1. Installation

git clone <your-repo>
cd pr
pip install -r requirements.txt

2. Setup

Create a .env file with your configuration:

cp .env.example .env

Edit .env and add your credentials:

# GitHub Configuration
GITHUB_TOKEN=your_personal_access_token_here
GITHUB_REPO=owner/repository
GITHUB_USERNAME=your_username  # Optional: specific user to search for

# Time range (pick one)
DAYS=14  # Past N days from today (default)
# START_DATE=2026-05-04  # Explicit range (both required; overrides DAYS)
# END_DATE=2026-05-08

# AI Configuration (for summarization)
OPENAI_API_KEY=your_openai_api_key_here
# OPENAI_MODEL=gpt-4o-mini  # optional; gpt-5* supported automatically

Getting GitHub Token

  1. Go to https://github.com/settings/tokens
  2. Click "Generate new token (classic)"
  3. Select scopes: repo (for private repos) or public_repo (for public repos)
  4. Copy the generated token

Getting OpenAI API Key

  1. Go to https://platform.openai.com/api-keys
  2. Create a new API key
  3. Copy the key (keep it secure!)

3. Run the pipeline

# Default: fetch + by-project + technical + publish to pr-reports
python main.py
# same as:
python main.py ship

# Or run steps individually:
python main.py fetch          # β†’ _detailed.csv
python main.py summarize      # β†’ _summarized.csv (from latest or --csv detailed)
python main.py by-project     # β†’ _by_project.md
python main.py technical      # β†’ _technical_highlights.md
python main.py perf-review    # β†’ _perf_review.md
python main.py publish        # β†’ commit in pr-reports clone
python main.py all            # fetch + all reports (no publish)

Report commands need both _detailed.csv and _summarized.csv. If missing, they run fetch and/or summarize as needed.

# Past N days (relative to today)
DAYS=7 python main.py ship

# Explicit date range (inclusive)
python main.py ship --start 2026-05-04 --end 2026-05-08

# Same via .env
START_DATE=2026-05-04 END_DATE=2026-05-08 python main.py fetch

# Publish and push
python main.py publish --push

# Re-run reports for an existing CSV (skips fetch)
python main.py by-project --csv output/pr_2026-05-04_2026-05-08_detailed.csv
Command What it does
fetch GitHub β†’ _detailed.csv
summarize _detailed.csv β†’ _summarized.csv (OpenAI)
by-project _summarized.csv β†’ _by_project.md
technical _detailed.csv β†’ _technical_highlights.md
perf-review _summarized.csv β†’ _perf_review.md
publish Copy output/{prefix}* to PR_REPORTS_DIR and commit
all fetch + all three reports (no publish)
ship fetch + by-project + technical + publish

πŸ“Š Output Files

1. Detailed CSV (pr_YYYY-MM-DD_YYYY-MM-DD_detailed.csv)

Raw PR data with clean descriptions:

  • pr_url - Direct link to the PR
  • title - PR title
  • description - Clean description (removes boilerplate/templates)
  • lines_of_code_changes - Total lines changed
  • additions - Lines added
  • deletions - Lines deleted
  • created_at - PR creation timestamp
  • state - PR state (open/closed)
  • merged - Whether PR was merged
  • attachments - URLs of attachments found in PR

2. Summarized CSV (pr_YYYY-MM-DD_YYYY-MM-DD_summarized.csv)

All detailed data plus:

  • ai_summary - Concise AI-generated summary of each PR

3. Reports (markdown)

_perf_review.md

Professional markdown report with:

  • πŸ“Š Executive Summary: Period, totals, averages
  • 🎯 Project Focus & Impact: Which projects got the most attention
  • ⚑ Technical Themes & Patterns: Performance, security, infrastructure initiatives
  • πŸš€ Development Velocity & Scale: Work distribution and iteration patterns
  • πŸ”— Cross-Project Insights: Shared challenges and dependencies
  • πŸ† Key Accomplishments & Trends: Significant achievements and innovation
  • πŸ“‹ Individual PR Details: Each PR with project categorization, status, and summary

🎯 Project Categorization

The tool intelligently extracts project names from PR titles using the format:

[TICKET-123] ProjectName: Summary of changes

Examples:

  • [CS-6304] Roles: DB schema for RBAC β†’ Roles project
  • [CS-5916] Connector Catalog: update cypress tests β†’ Connector Catalog project
  • [INFRA-123] Docker: Update base images β†’ Docker project

PRs that don't match this pattern are categorized as "Uncategorized" and handled separately.

βš™οΈ Advanced Usage

Run Components Separately

# Just fetch PR data
python src/github_pr_fetcher.py

# Same commands via pr_summarizer directly
python src/pr_summarizer.py fetch
python src/pr_summarizer.py summarize
python src/pr_summarizer.py ship
python src/pr_summarizer.py publish
python src/pr_summarizer.py all

# Publish the latest (or a specific) output run to pr-reports
python src/publish_reports.py
python src/publish_reports.py --prefix pr_2026-05-09_2026-05-16 --push

Publish to pr-reports

Set PR_REPORTS_DIR in .env to your local clone of pr-reports, then:

python main.py publish
python main.py ship          # includes publish
python main.py publish --push

Copies every file matching the run prefix (CSVs and all report markdown files).

Time Range Options

# Last week
DAYS=7 python main.py

# Last month
DAYS=30 python main.py

# Last quarter
DAYS=90 python main.py

# Last year
DAYS=365 python main.py

🎨 Sample Output

Program output

πŸš€ Starting GitHub PR Analytics Suite
==================================================
πŸ“Š Step 1: Fetching PRs from GitHub...
Searching for PRs created by: xindixu
Time range: Past 180 day(s)
Fetching PRs from instabase/instabase created by 'xindixu' after 2025-01-15...
Found 325 PRs matching criteria
Added PR: [CS-6454] DS UI: Disable indexing modal should list connected chatbots (165 lines changed)
Added PR: [CS-6435] DS UI: Indexing items over limit + use real feature flag (165 lines changed)
Added PR: [CS-0000] Roles: GA in 25.30 (11 lines changed)
Added PR: [CS-6452] Roles: gate mount backend for create/edit/delete mount points (50 lines changed)
Processed 5/325 PRs...
...

Exported 325 PRs to output/pr_2025-01-15_2025-07-14_detailed.csv

Summary:
Total PRs found: 325
Total lines changed: 165431
Average lines per PR: 509.0
βœ… PR data saved to: output/pr_2025-01-15_2025-07-14_detailed.csv

πŸ€– Step 2: Generating AI summaries...
πŸ“Š Loaded 325 PRs from output/pr_2025-01-15_2025-07-14_detailed.csv
βœ… OpenAI client initialized with model: gpt-4o-mini
πŸ€– Generating AI summaries...
Processing PR 1/325: [CS-6454] DS UI: Disable indexing modal should lis...
Processing PR 2/325: [CS-6435] DS UI: Indexing items over limit + use r...
...
πŸ” Analyzing patterns...
πŸ’Ύ Saved summarized data to output/pr_2025-01-15_2025-07-14_summarized.csv
πŸ“ Saved pattern analysis to output/pr_2025-01-15_2025-07-14_summary.md

🎯 QUICK ANALYSIS
==================================================
...

πŸŽ‰ WORKFLOW COMPLETE!
==================================================
πŸ“ Detailed PR data: output/pr_2025-01-15_2025-07-14_detailed.csv
πŸ€– AI summarized data: output/pr_2025-01-15_2025-07-14_summarized.csv
πŸ“ Pattern analysis: output/pr_2025-01-15_2025-07-14_summary.md

Example Files

Check out the examples/ directory for complete sample output demonstrating:

About

πŸ“Š AI-powered GitHub analytics with intelligent insights and professional reporting.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages