Skip to content

Latest commit

 

History

History
273 lines (193 loc) · 7.57 KB

File metadata and controls

273 lines (193 loc) · 7.57 KB

pg_stat_statements Collector

The statements collector tracks query performance metrics from PostgreSQL's pg_stat_statements extension. It's one of the most powerful tools for identifying and optimizing slow queries in production.

pg_exporter supports PostgreSQL 14 and newer, so all metrics documented here assume PostgreSQL 14+.

Why This Matters

  • Find slow queries during incidents - "What query is causing high load?"
  • Detect N+1 query problems - Before they scale and impact production
  • Identify performance regressions - After deployments or configuration changes
  • Optimize based on real data - Use actual production query patterns, not guesses
  • Track resource-intensive queries - I/O, WAL generation, temp files

This collector complements other collectors:

  • default - System-wide metrics (cache hit ratio, checkpoints, connections)
  • stat.user_tables - Table-level metrics (bloat, vacuum, DML rates)
  • statements - Query-level metrics (execution time, frequency, I/O)

Together, they provide complete visibility from system → table → query level.

Prerequisites

Enable the Extension

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Configure PostgreSQL

Add to postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000

Restart PostgreSQL after modifying postgresql.conf.

Usage

Enable the collector:

pg_exporter --dsn postgresql:///postgres?user=postgres_exporter --collector.statements

By default, it tracks the top 25 queries by total execution time.

Configure the number of queries to track:

# Track top 10 queries
pg_exporter --dsn postgresql://... --collector.statements --statements.top-n=10

# Environment variable form
PG_EXPORTER_STATEMENTS_TOP_N=50 pg_exporter --dsn postgresql://... --collector.statements

Key Metrics

Execution Time

  • pg_stat_statements_total_exec_time_seconds - Total time spent in this query
  • pg_stat_statements_mean_exec_time_seconds - Average time per execution
  • pg_stat_statements_max_exec_time_seconds - Slowest execution
  • pg_stat_statements_stddev_exec_time_seconds - Execution time variance

Frequency

  • pg_stat_statements_calls_total - How many times the query was executed
  • pg_stat_statements_rows_total - Total rows returned/affected

I/O Metrics

  • pg_stat_statements_shared_blks_hit_total - Cache hits (fast)
  • pg_stat_statements_shared_blks_read_total - Disk reads (slow)
  • pg_stat_statements_temp_blks_written_total - Queries spilling to disk
  • pg_stat_statements_cache_hit_ratio - Query cache effectiveness (0.0-1.0)

Resource Usage

  • pg_stat_statements_wal_bytes_total - WAL generation

Use Cases

1. Finding Slow Queries

PromQL query to find queries with highest total time:

topk(10, 
  pg_stat_statements_total_exec_time_seconds
)

2. Identifying N+1 Problems

Queries executed many times with low row counts:

pg_stat_statements_calls_total > 1000
and
pg_stat_statements_rows_total / pg_stat_statements_calls_total < 10

3. Cache Miss Detection

Queries with poor cache hit ratios:

pg_stat_statements_cache_hit_ratio < 0.9
and
pg_stat_statements_calls_total > 100

4. Temp File Usage

Queries writing to disk (needs more work_mem):

rate(pg_stat_statements_temp_blks_written_total[5m]) > 0

5. Performance Regression Detection

Alert on queries getting slower:

increase(pg_stat_statements_mean_exec_time_seconds[1h]) > 0.5

Labels

All metrics include these labels:

  • queryid - Unique query identifier
  • datname - Database name
  • usename - User/role name
  • query_short - First 80 characters of the query (or <utility> for VACUUM/ANALYZE)

query_short is intentionally capped at 80 characters to keep Prometheus label cardinality and label size under control. It is meant for fast identification in Prometheus and Grafana, not as a full SQL text export.

When you need the full normalized statement text, use the queryid label from the metric and query pg_stat_statements directly.

Example:

SELECT
    queryid::text,
    d.datname,
    r.rolname,
    s.query
FROM pg_stat_statements s
JOIN pg_database d ON d.oid = s.dbid
LEFT JOIN pg_roles r ON r.oid = s.userid
WHERE queryid::text = '<queryid-from-metric>';

If you want to narrow the search further, also filter by datname:

SELECT
    queryid::text,
    d.datname,
    r.rolname,
    s.query
FROM pg_stat_statements s
JOIN pg_database d ON d.oid = s.dbid
LEFT JOIN pg_roles r ON r.oid = s.userid
WHERE queryid::text = '<queryid-from-metric>'
  AND d.datname = '<database-from-metric>';

Important Notes

Query Text Normalization

PostgreSQL normalizes queries by replacing constants with placeholders:

-- These are the same query:
SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 2;

-- Tracked as:
SELECT * FROM users WHERE id = $1;

Utility Statements

Utility statements (VACUUM, ANALYZE, CREATE INDEX, etc.) may appear as <utility> in the query_short label since PostgreSQL doesn't always track their full text.

Top N Queries

The collector tracks the top N queries by total execution time. This means:

  • Long-running infrequent queries appear at the top
  • Fast but frequent queries also appear if their total time is high
  • Adjust --statements.top-n based on your query diversity and scrape budget

Performance Impact

pg_stat_statements has minimal overhead (typically <1% CPU). However:

  • Higher pg_stat_statements.max values use more memory
  • The collector queries pg_stat_statements on each scrape
  • For high-traffic databases, consider longer scrape intervals

Troubleshooting

Extension Not Found

ERROR: extension "pg_stat_statements" is not available

Solution: Install the extension package:

# Debian/Ubuntu
apt-get install postgresql-contrib

# RHEL/CentOS
yum install postgresql-contrib

No Metrics Appear

Possible causes:

  1. Extension not loaded - Check SHOW shared_preload_libraries;
  2. Extension not created - Run CREATE EXTENSION pg_stat_statements;
  3. No queries executed yet - Run some queries to populate stats
  4. Collector not enabled - Use --collector.statements

Query Text Shows as NULL or <utility>

This is normal for:

  • Utility statements (VACUUM, ANALYZE, etc.)
  • Queries from other monitoring tools
  • Internal PostgreSQL operations

Best Practices

  1. Enable in production - Query-level insights are essential for troubleshooting
  2. Monitor the top 50-100 queries - Balance coverage vs cardinality
  3. Reset stats after major changes - SELECT pg_stat_statements_reset(); after schema migrations
  4. Set appropriate scrape intervals - 30-60 seconds is usually sufficient
  5. Combine with other collectors - Use activity, stat, and vacuum collectors together for complete visibility

Example Grafana Dashboard

Track your slowest queries:

# Panel 1: Top 10 Slowest Queries (by total time)
topk(10, pg_stat_statements_total_exec_time_seconds)

# Panel 2: Most Called Queries
topk(10, rate(pg_stat_statements_calls_total[5m]))

# Panel 3: Cache Hit Ratio Heatmap
pg_stat_statements_cache_hit_ratio

# Panel 4: Queries Writing Temp Files
sum by (query_short) (
  rate(pg_stat_statements_temp_blks_written_total[5m])
)

References