CloudWatch Error Analyser

Automatically detects errors from any Lambda in your AWS account, retrieves relevant source code from a vector knowledge base, and posts an AI-generated root cause analysis to Slack — with no per-project setup required.

How it works (high level)

The system has two independent pipelines that work together.

Pipeline 1 — Error detection & analysis

A Lambda throws an error → CloudWatch Logs captures it → a subscription filter forwards the log batch to a forwarder Lambda → the forwarder enqueues a synthetic alarm payload → the analyser picks it up, deduplicates it, and triggers a Step Functions workflow that fetches logs, retrieves relevant code from the knowledge base, reranks the results, invokes Claude to generate a root cause analysis, and posts the result to Slack.

Pipeline 2 — Code indexing

A developer pushes to GitHub → GitHub fires a webhook → the indexer Lambda verifies the payload, fetches changed files via the GitHub API, hashes them, chunks them, uploads them to S3, and triggers a Bedrock Knowledge Base ingestion job so the new code becomes searchable once ingestion finishes (typically within a few minutes).

Two background jobs run daily to keep everything wired up automatically: one registers webhooks on new GitHub repos, the other adds subscription filters to new Lambda log groups.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          ERROR DETECTION PIPELINE                           │
│                                                                             │
│  Any Lambda  ──►  CloudWatch Logs  ──►  Subscription Filter                │
│                                                  │                         │
│                                                  ▼                         │
│                                         Log Forwarder Lambda               │
│                                                  │                         │
│                             ┌────────────────────┘                         │
│                             │                                               │
│                             ▼                                               │
│                    SQS Ingress Queue  (DLQ after 3 retries)                │
│                             │                                               │
│                             ▼                                               │
│                     Analyser Lambda                                         │
│                    (dedup check ──► skip if seen in last 30 min)           │
│                             │                                               │
│                             ▼                                               │
│              Step Functions Express Workflow                                │
│                             │                                               │
│          ┌──────────────────┼──────────────────┐                           │
│          ▼                  ▼                  ▼                           │
│      fetchLogs         retrieveCode         (parallel)                     │
│   (CW Insights)     (Bedrock KB search)                                    │
│          │                  │                                               │
│          └──────────────────┘                                               │
│                             │                                               │
│                             ▼                                               │
│                          rerank                                             │
│                    (Bedrock Rerank API)                                     │
│                             │                                               │
│                             ▼                                               │
│               chooseModel (by alarm severity)                              │
│               CRITICAL/ERROR → Claude Opus                                 │
│               WARNING/INFO  → Claude Haiku                                 │
│                             │                                               │
│                             ▼                                               │
│                          analyse                                            │
│                    (Claude via Bedrock)                                     │
│                             │                                               │
│                             ▼                                               │
│                          notify                                             │
│                    (Slack Incoming Webhook)                                 │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                          CODE INDEXING PIPELINE                             │
│                                                                             │
│  GitHub push  ──►  API Gateway  ──►  Indexer Lambda                        │
│                                           │                                 │
│                                    verify HMAC-SHA256                      │
│                                           │                                 │
│                                    fetch changed files                     │
│                                     (GitHub API + PAT)                     │
│                                           │                                 │
│                                    hash file contents                      │
│                                     (SHA-256 in memory)                    │
│                                           │                                 │
│                                    diff against DynamoDB                   │
│                                     (skip unchanged files)                 │
│                                           │                                 │
│                                    chunk changed files                     │
│                                     (function/class boundaries)            │
│                                           │                                 │
│                                    upload chunks to S3                     │
│                                     chunks/{owner/repo}/{file}.json        │
│                                           │                                 │
│                                    delete S3 objects                       │
│                                     (for removed files)                    │
│                                           │                                 │
│                                    start Bedrock KB                        │
│                                     ingestion job                          │
│                                           │                                 │
│                                    persist hashes to DynamoDB              │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                          BACKGROUND JOBS (daily)                            │
│                                                                             │
│  EventBridge ──► Webhook Registrar Lambda                                  │
│                   lists all GitHub repos owned by the user                 │
│                   adds indexer webhook to any repo missing it              │
│                                                                             │
│  EventBridge ──► Subscription Registrar Lambda                             │
│                   lists all /aws/lambda/* log groups                       │
│                   adds error subscription filter to any missing it         │
│                   excludes cloud-error-* to prevent feedback loops        │
└─────────────────────────────────────────────────────────────────────────────┘

Flow breakdown

1. Error detection (`src/log-forwarder/handler.ts`)

CloudWatch Logs subscription filters are attached to every /aws/lambda/* log group in the account. When a Lambda produces a log line matching the error pattern (ERROR, Exception, Traceback, CRITICAL, Fatal, panic), CloudWatch delivers a gzipped, base64-encoded batch to the Log Forwarder Lambda.

The forwarder:

Decompresses and parses the log batch
Scans each log event for error patterns
Takes the first matching line as the error message (truncated to 500 chars)
Builds a synthetic AlarmPayload shaped like a CloudWatch alarm, using the Lambda function name from the log group
Enqueues it to the SQS ingress queue

This means any Lambda in the same AWS account is automatically covered with zero per-project configuration.
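
As a rough illustration of the forwarder steps above, the sketch below decodes a subscription payload and picks the first matching line. The helper names (`decodeBatch`, `extractError`, `encodeBatch`) and the exact regular expression are assumptions for illustration, not the contents of `src/log-forwarder/handler.ts`.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Decoded CloudWatch Logs subscription payload (only the fields used here).
interface LogEvent { id: string; timestamp: number; message: string }
interface LogBatch { logGroup: string; logStream: string; logEvents: LogEvent[] }

// Keywords listed above; the project's real filter pattern may differ.
const ERROR_PATTERN = /ERROR|Exception|Traceback|CRITICAL|Fatal|panic/;
const MAX_MESSAGE_LENGTH = 500;

// CloudWatch delivers the batch as base64-encoded gzip in event.awslogs.data.
export function decodeBatch(data: string): LogBatch {
  const json = gunzipSync(Buffer.from(data, "base64")).toString("utf8");
  return JSON.parse(json) as LogBatch;
}

// Hypothetical helper: first matching line (truncated) plus the function
// name derived from the log group name.
export function extractError(batch: LogBatch) {
  const hit = batch.logEvents.find((e) => ERROR_PATTERN.test(e.message));
  if (!hit) return undefined;
  return {
    functionName: batch.logGroup.replace(/^\/aws\/lambda\//, ""),
    message: hit.message.slice(0, MAX_MESSAGE_LENGTH),
    timestamp: hit.timestamp,
  };
}

// For local experiments only: encode a batch the way CloudWatch would.
export function encodeBatch(batch: LogBatch): string {
  return gzipSync(Buffer.from(JSON.stringify(batch), "utf8")).toString("base64");
}
```

Round-tripping a hand-built batch through `encodeBatch` and `decodeBatch` is a convenient way to exercise the matching logic without deploying anything.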

2. Deduplication (`src/analyser/handler.ts` + `src/analyser/deduplication/deduplicator.ts`)

The Analyser Lambda reads from SQS. Before starting the expensive Step Functions workflow, it checks the DynamoDB dedup table using a SHA-256 hash of the error message. If the same error has been seen within the last 30 minutes, a lightweight "duplicate suppressed" notice is posted to Slack instead, and the full pipeline is skipped.
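
The dedup check can be sketched as follows. This is a minimal in-memory version: the `Map` stands in for the DynamoDB table, and the class and function names are hypothetical rather than taken from `deduplicator.ts`.

```typescript
import { createHash } from "node:crypto";

const DEDUP_WINDOW_MS = 30 * 60 * 1000; // 30-minute window described above

// SHA-256 hex digest of the error message, used as the dedup key.
export function dedupKey(errorMessage: string): string {
  return createHash("sha256").update(errorMessage, "utf8").digest("hex");
}

// In-memory stand-in for the dedup table (assumption: the real table keeps
// one item per key and expires it after the window).
export class InMemoryDeduplicator {
  private readonly lastSeen = new Map<string, number>();

  // Returns true when the same message was recorded within the window;
  // otherwise records it and returns false.
  isDuplicate(errorMessage: string, nowMs: number): boolean {
    const key = dedupKey(errorMessage);
    const seenAt = this.lastSeen.get(key);
    if (seenAt !== undefined && nowMs - seenAt < DEDUP_WINDOW_MS) return true;
    this.lastSeen.set(key, nowMs);
    return false;
  }
}
```

Hashing the message keeps the key a fixed length regardless of how long the error text is. Note that in this sketch a suppressed duplicate does not extend the window; whether the project does so is not stated.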

3. Analysis workflow (`src/analyser/taskHandler.ts`)

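A single Task Lambda handles all five Lambda-backed steps of the Step Functions Express Workflow (fetchLogs, retrieveCode, rerank, analyse, notify); chooseModel is a Choice state and invokes no Lambda. Each invocation receives the current pipeline state and a step field telling it what to do.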

fetchLogs

Runs a CloudWatch Logs Insights query on the Lambda's log group, fetching up to 50 log lines in a ±60 second window around the error timestamp. Polls until the query completes (up to 30 seconds). If the log group does not exist, returns an empty array rather than failing the pipeline.
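
A sketch of how the query window might be computed. The field names mirror the CloudWatch Logs `StartQuery` input (which takes epoch seconds), but the query string and the `buildInsightsQuery` helper are assumptions, not the project's actual code.

```typescript
// Subset of the StartQuery input used here; times are epoch seconds.
interface InsightsQueryParams {
  logGroupName: string;
  startTime: number;
  endTime: number;
  queryString: string;
}

const WINDOW_SECONDS = 60; // ±60 seconds around the error
const MAX_LOG_LINES = 50;

// Hypothetical helper: builds the query parameters for one error.
export function buildInsightsQuery(
  functionName: string,
  errorTimestampMs: number,
): InsightsQueryParams {
  const errorSeconds = Math.floor(errorTimestampMs / 1000);
  return {
    logGroupName: `/aws/lambda/${functionName}`,
    startTime: errorSeconds - WINDOW_SECONDS,
    endTime: errorSeconds + WINDOW_SECONDS,
    // Assumed query; the real one may select different fields.
    queryString: `fields @timestamp, @message | sort @timestamp asc | limit ${MAX_LOG_LINES}`,
  };
}
```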

retrieveCode

Queries the Bedrock Knowledge Base using the error message as the search query, enriched with any function names extracted from the stack trace. Returns up to 40 semantically relevant code chunks. Each chunk includes the full file path scoped to the repo (owner/repo/src/file.ts), start line, end line, and text.
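
One way the query enrichment could look, assuming V8-style stack frames (`at name (file:line:col)`). The regular expression and both helper names are illustrative; other runtimes format their stack traces differently and would need their own patterns.

```typescript
// Collects distinct function names from frames such as
// "    at getOrder (/var/task/src/orders.js:12:9)".
export function extractFunctionNames(stackTrace: string): string[] {
  const framePattern = /^\s*at (?:async )?([\w$.<>]+) \(/gm;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = framePattern.exec(stackTrace)) !== null) {
    const name = match[1];
    // Anonymous frames add no search signal; duplicates add no new signal.
    if (name !== "<anonymous>" && names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

// Appends the extracted names to the error message to form the search query.
export function buildSearchQuery(errorMessage: string, stackTrace: string): string {
  const names = extractFunctionNames(stackTrace);
  return names.length > 0 ? `${errorMessage} ${names.join(" ")}` : errorMessage;
}
```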

rerank

Sends the retrieved chunks to the Bedrock Rerank API (cohere.rerank-v3-5:0 by default, configurable). The reranker scores each chunk for relevance to the error query and returns the top 8. This step filters out broadly-similar-but-not-actually-useful chunks before they reach Claude.

chooseModel

A Step Functions Choice state (no Lambda invocation). Routes high-severity errors (*CRITICAL*, *ERROR* in the alarm name or state reason) to Claude Opus, and lower-severity ones to Claude Haiku. Model IDs come from lib/config.ts.
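
The routing rule is expressed in the state machine definition rather than in code, but its logic is equivalent to something like the function below. This is a sketch only: `chooseModel` as a TypeScript function is hypothetical, and the match is assumed to be case-sensitive, as Step Functions string matching is.

```typescript
// Model IDs as listed in lib/config.ts.
const HIGH_SEVERITY_MODEL = "us.anthropic.claude-opus-4-7-v1:0";
const STANDARD_MODEL = "us.anthropic.claude-haiku-4-5-20251001-v1:0";

// Equivalent of matching *CRITICAL* or *ERROR* against the alarm name
// or the state reason.
export function chooseModel(alarmName: string, stateReason: string): string {
  const isHighSeverity = [alarmName, stateReason].some((text) =>
    /CRITICAL|ERROR/.test(text),
  );
  return isHighSeverity ? HIGH_SEVERITY_MODEL : STANDARD_MODEL;
}
```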

analyse

Invokes Claude via the Bedrock InvokeModel API. Passes the alarm name, error reason, log lines, and ranked code chunks as structured XML context. Claude responds with a JSON object containing rootCause, trigger, suggestedFix, severity, and filePath. The prompt instructs Claude to return raw JSON only; any markdown fences in the response are stripped before parsing.
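
The fence-stripping and parsing step might look like the sketch below. The function names and the validation of required fields are assumptions; the source only states that fences are stripped before parsing. The fence string is built with `repeat` purely to avoid embedding a literal fence in this example.

```typescript
interface Analysis {
  rootCause: string;
  trigger: string;
  suggestedFix: string;
  severity: string;
  filePath: string;
}

const REQUIRED_FIELDS = ["rootCause", "trigger", "suggestedFix", "severity", "filePath"] as const;

const FENCE = "`".repeat(3);
const OPENING_FENCE = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*");
const CLOSING_FENCE = new RegExp("\\s*" + FENCE + "$");

// Removes a leading fence (with optional language tag) and a trailing fence.
export function stripFences(raw: string): string {
  return raw.trim().replace(OPENING_FENCE, "").replace(CLOSING_FENCE, "");
}

// Parses the model output and checks that every expected field is a string.
export function parseAnalysis(raw: string): Analysis {
  const parsed = JSON.parse(stripFences(raw)) as Record<string, unknown>;
  for (const field of REQUIRED_FIELDS) {
    if (typeof parsed[field] !== "string") {
      throw new Error(`Missing or non-string field: ${field}`);
    }
  }
  return parsed as unknown as Analysis;
}
```

Validating the fields up front means a malformed response fails in the analyse step rather than producing a half-empty Slack message later.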

notify

Posts the analysis to Slack using an Incoming Webhook. The message uses Block Kit attachments with a colour-coded severity bar (red for CRITICAL, orange for ERROR, yellow for WARNING, green for INFO). Includes root cause, trigger, suggested fix, relevant file, and a deep link to the CloudWatch alarm.
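
A sketch of what the webhook payload could look like, using Slack's attachment `color` field for the severity bar and section blocks for the content. The hex values, the `buildSlackMessage` helper, and the block layout are illustrative assumptions; the alarm URL is passed in rather than constructed here.

```typescript
type Severity = "CRITICAL" | "ERROR" | "WARNING" | "INFO";

interface Analysis {
  rootCause: string;
  trigger: string;
  suggestedFix: string;
  severity: Severity;
  filePath: string;
}

// Illustrative hex values; the project may use different shades.
const SEVERITY_COLOURS: Record<Severity, string> = {
  CRITICAL: "#d32f2f", // red
  ERROR: "#f57c00", // orange
  WARNING: "#fbc02d", // yellow
  INFO: "#388e3c", // green
};

// One mrkdwn section with a bold title line.
const section = (title: string, body: string) => ({
  type: "section",
  text: { type: "mrkdwn", text: `*${title}*\n${body}` },
});

export function buildSlackMessage(alarmName: string, analysis: Analysis, alarmUrl: string) {
  return {
    attachments: [
      {
        color: SEVERITY_COLOURS[analysis.severity],
        blocks: [
          section("Alarm", `<${alarmUrl}|${alarmName}>`),
          section("Root cause", analysis.rootCause),
          section("Trigger", analysis.trigger),
          section("Suggested fix", analysis.suggestedFix),
          section("Relevant file", "`" + analysis.filePath + "`"),
        ],
      },
    ],
  };
}
```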

4. Code indexing

This is the most complex part of the system. The goal is to make your source code searchable by meaning so that when an error occurs in a Lambda, the analyser can ask "which parts of my codebase are related to this error?" and get back relevant functions and classes — not just a text search match, but semantic proximity.

Why this is needed

When Claude analyses an error, it needs context beyond just the log line. It needs to see the actual code that produced it. The challenge is that this code lives in GitHub, not in AWS. The indexer bridges this gap by continuously syncing your code into a Bedrock Knowledge Base that the analyser can query at runtime.

The big picture

           PUSH TIME                              QUERY TIME
     (runs on every git push)             (runs on every Lambda error)

  GitHub repo                             Error message
      │                                        │
      │ webhook                                │ semantic search
      ▼                                        ▼
  Indexer Lambda                         Bedrock KB query
      │                                        │
      ├─ fetch changed files                   ├─ converts error to a vector
      ├─ hash → skip unchanged                 ├─ finds nearest code vectors
      ├─ chunk into functions                  └─ returns top 40 matching chunks
      ├─ upload to S3                                    │
      └─ trigger KB ingestion                           ▼
              │                              Reranker → top 8 chunks
              ▼                                        │
         Bedrock embeds                               ▼
         each chunk as                            Claude analyses
         a vector and stores                      error + logs + code
         in S3 Vectors

Step 1 — Webhook arrives

GitHub sends a POST to the indexer endpoint with the push payload. Before anything else, the HMAC-SHA256 signature in the X-Hub-Signature-256 header is validated against the webhook secret using a timing-safe byte comparison. Requests with invalid signatures are rejected with 401 immediately.

Only push events are processed. ping events (sent when a webhook is first created) return pong. All other event types are acknowledged and ignored.
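
Signature validation along these lines can be sketched with Node's `crypto` module. GitHub sends the header as `sha256=` followed by the hex digest of the raw request body; the function names below are hypothetical and not necessarily those in `githubClient.ts`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the value GitHub would send in X-Hub-Signature-256.
export function signPayload(secret: string, rawBody: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compares the received header with the expected signature in constant time.
export function verifySignature(
  secret: string,
  rawBody: string,
  header: string | undefined,
): boolean {
  if (!header) return false;
  const expected = Buffer.from(signPayload(secret, rawBody), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

The HMAC must be computed over the raw request body exactly as received; re-serialising parsed JSON can change whitespace or key order and cause valid requests to fail verification.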

Step 2 — Fetch changed files from GitHub

The push payload contains a list of commits, each with added, modified, and removed file paths. The indexer builds a combined list of files that need re-indexing (added + modified) and files that need removing.
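
Combining the per-commit lists could look like this sketch. It assumes the commits are processed in the order they appear in the payload and that the latest change to a path wins (a file added in one commit and removed in a later one ends up in the removal list only). The `collectChanges` helper is hypothetical.

```typescript
// Subset of a push-event commit entry.
interface PushCommit {
  added: string[];
  modified: string[];
  removed: string[];
}

export function collectChanges(commits: PushCommit[]): {
  toIndex: string[];
  toRemove: string[];
} {
  const toIndex = new Set<string>();
  const toRemove = new Set<string>();
  for (const commit of commits) {
    for (const path of commit.added.concat(commit.modified)) {
      toIndex.add(path);
      toRemove.delete(path); // re-added after an earlier removal
    }
    for (const path of commit.removed) {
      toRemove.add(path);
      toIndex.delete(path); // removed after an earlier add or edit
    }
  }
  return { toIndex: Array.from(toIndex), toRemove: Array.from(toRemove) };
}
```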

Each file to re-index is fetched individually from the GitHub Contents API using the commit SHA as the ref — this ensures we get exactly the version of the file that was pushed, not the latest HEAD.

GET https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={sha}
Authorization: Bearer {github-pat}

The file content is returned base64-encoded and decoded in memory. No disk I/O.

Step 3 — Hashing (skip unchanged files)

Each fetched file's content is SHA-256 hashed. The hash is compared against the last known hash stored in DynamoDB under the key {owner}/{repo}/{filePath}.

DynamoDB key:  "SamuelLawrence876/jse-bot/src/trades/handler.ts"
DynamoDB item: { fileName: "...", hash: "a3f9c2...", ttl: 1735000000 }

If the hash matches, the file is skipped. This means a push that only changes one file out of 50 only re-indexes that one file. Hashes expire after 90 days so stale entries are automatically cleaned up.
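
The change-detection step can be sketched as below, with a `Map` standing in for the DynamoDB hash table. The helper names are assumptions; the item shape follows the example above.

```typescript
import { createHash } from "node:crypto";

interface FetchedFile { path: string; content: string }
interface HashedFile extends FetchedFile { key: string; hash: string }

const HASH_TTL_SECONDS = 90 * 24 * 60 * 60; // hashes expire after 90 days

export function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns only the files whose hash differs from the stored one.
// `repo` is "owner/repo"; `knownHashes` maps "{owner}/{repo}/{filePath}" to a hash.
export function selectChangedFiles(
  repo: string,
  files: FetchedFile[],
  knownHashes: Map<string, string>,
): HashedFile[] {
  const changed: HashedFile[] = [];
  for (const file of files) {
    const key = `${repo}/${file.path}`;
    const hash = sha256Hex(file.content);
    if (knownHashes.get(key) !== hash) changed.push({ ...file, key, hash });
  }
  return changed;
}

// Builds the item to persist once the file has been indexed.
export function toHashItem(file: HashedFile, nowSeconds: number) {
  return { fileName: file.key, hash: file.hash, ttl: nowSeconds + HASH_TTL_SECONDS };
}
```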

Step 4 — Chunking (split into searchable pieces)

Storing whole files in the Knowledge Base would be inefficient — a 500-line file would be one giant blob, and the search would return the entire file for every query. Instead, each file is split into smaller chunks at meaningful code boundaries.

The chunker scans each line looking for function, class, and method declarations:

TypeScript/JS:   export function ..., const x = () => ..., class Foo
Python:          def my_function
Go:              func myFunction
Java:            public void method(

When it finds a boundary (and the current chunk is at least 5 lines), it closes the current chunk and starts a new one. Chunks are capped at 80 lines to prevent oversized blobs.

Example — this file:

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';          // lines 1–3
const client = new DynamoDBClient({});

export const getOrder = async (id: string) => {                      // lines 4–12
  const result = await client.send(...);
  if (!result.Item) throw new Error(`Order ${id} not found`);
  return result.Item;
  // ... more code ...
};

export const createOrder = async (order: Order) => {                 // lines 13–25
  await client.send(...);
  return order;
  // ... more code ...
};

Would produce two chunks:

Chunk 1: lines 1–12, text = the imports plus the getOrder function (the 3-line preamble is under the 5-line minimum, so no chunk is closed at line 4)
Chunk 2: lines 13–25, text = the createOrder function

Supported file types: .ts, .js, .tsx, .jsx, .py, .go, .java. Other file types (JSON, YAML, markdown, etc.) are skipped.
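
A minimal sketch of a line-based chunker following the rules above (close on a declaration boundary once the current chunk has at least 5 lines, and force a split at 80 lines). The boundary patterns are simplified assumptions and will not match every declaration style; `chunkSource` is not the actual `chunker.ts` implementation.

```typescript
interface Chunk { startLine: number; endLine: number; text: string }

// Simplified declaration patterns (assumptions, not exhaustive).
const BOUNDARY_PATTERNS: RegExp[] = [
  /^\s*(export\s+)?(default\s+)?(async\s+)?function\b/, // TS/JS functions
  /^\s*(export\s+)?(const|let)\s+\w+\s*=\s*(async\s*)?\(/, // TS/JS arrow functions
  /^\s*(export\s+)?(abstract\s+)?class\s+\w+/, // classes
  /^\s*(async\s+)?def\s+\w+/, // Python
  /^func\s/, // Go
  /^\s*(public|private|protected)\s+[\w<>\[\], ]+\s+\w+\s*\(/, // Java methods
];

const MIN_CHUNK_LINES = 5;
const MAX_CHUNK_LINES = 80;

export function chunkSource(source: string): Chunk[] {
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  let start = 0; // zero-based index of the current chunk's first line

  // Closes the current chunk so that it ends just before `endExclusive`.
  const close = (endExclusive: number): void => {
    if (endExclusive <= start) return;
    chunks.push({
      startLine: start + 1,
      endLine: endExclusive,
      text: lines.slice(start, endExclusive).join("\n"),
    });
    start = endExclusive;
  };

  lines.forEach((line, i) => {
    const length = i - start; // lines accumulated before this one
    const isBoundary = BOUNDARY_PATTERNS.some((p) => p.test(line));
    if ((isBoundary && length >= MIN_CHUNK_LINES) || length >= MAX_CHUNK_LINES) close(i);
  });
  close(lines.length);
  return chunks;
}
```

Run against a 25-line file laid out like the example above, this yields two chunks covering lines 1–12 and 13–25.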

Step 5 — Upload to S3

Each file's chunks are serialised as a JSON array and uploaded to S3 at a path scoped by repo and file:

s3://cloud-error-code-chunks/chunks/SamuelLawrence876/jse-bot/src/trades/handler.ts.json

The JSON contains the chunk text plus metadata that Bedrock will attach to each vector:

[
  {
    "id": "SamuelLawrence876/jse-bot/src/trades/handler.ts:4",
    "text": "export const getOrder = async (id: string) => { ... }",
    "metadata": {
      "repo": "SamuelLawrence876/jse-bot",
      "filePath": "SamuelLawrence876/jse-bot/src/trades/handler.ts",
      "startLine": 4,
      "endLine": 12
    }
  }
]

For removed files, the corresponding S3 object is deleted so the Knowledge Base doesn't return stale chunks for code that no longer exists.
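
The key and record layout shown above can be produced with a couple of small helpers. This is a sketch under the assumption that the object key and chunk `id` follow exactly the patterns in the examples; the function names are hypothetical.

```typescript
interface Chunk { startLine: number; endLine: number; text: string }

interface ChunkRecord {
  id: string;
  text: string;
  metadata: { repo: string; filePath: string; startLine: number; endLine: number };
}

// `repo` is "owner/repo"; `filePath` is relative to the repo root.
export function chunkObjectKey(repo: string, filePath: string): string {
  return `chunks/${repo}/${filePath}.json`;
}

// One record per chunk, with the repo-scoped path used for both id and metadata.
export function toChunkRecords(repo: string, filePath: string, chunks: Chunk[]): ChunkRecord[] {
  const scopedPath = `${repo}/${filePath}`;
  return chunks.map((chunk) => ({
    id: `${scopedPath}:${chunk.startLine}`,
    text: chunk.text,
    metadata: {
      repo,
      filePath: scopedPath,
      startLine: chunk.startLine,
      endLine: chunk.endLine,
    },
  }));
}
```

Because the key is derived only from the repo and file path, re-uploading a changed file overwrites its previous object, and deleting a removed file needs nothing more than the same key.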

Step 6 — Bedrock Knowledge Base ingestion

After S3 is updated, a Bedrock ingestion job is started. Bedrock scans the S3 bucket for new or changed objects, calls Amazon Titan Text Embeddings V2 on each chunk's text to produce a high-dimensional vector, and stores the vectors in the S3 Vectors backing store.

This is the embedding step: Bedrock converts human-readable code into a numeric vector that can be searched by semantic similarity. No model is trained or fine-tuned here.

"export const getOrder = async (id: string) => { ... }"
                    │
                    │  Titan Text Embeddings V2
                    ▼
        [0.023, -0.187, 0.441, 0.009, ... ]  (1024 dimensions by default)
                    │
                    ▼
            stored in S3 Vectors

The ingestion job runs asynchronously — the Lambda returns a 200 immediately after starting it and does not wait for it to finish (it typically takes 1–3 minutes for a small batch).
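
To make "searched by semantic similarity" concrete, here is a toy nearest-neighbour search over tiny hand-written vectors. It assumes cosine similarity as the metric; the real index configuration is not stated in this document, and real embeddings have far more dimensions than these 2-element examples.

```typescript
// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredChunk { id: string; vector: number[] }

// Returns the ids of the k chunks most similar to the query vector.
export function topK(query: number[], chunks: StoredChunk[], k: number): string[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.id);
}
```

At query time the error message is embedded with the same model, and the chunks whose vectors score highest against it are the ones returned to the analyser.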

Step 7 — Persist hashes

Once S3 and KB ingestion are kicked off, the new file hashes are written back to DynamoDB. This updates the baseline for the next push — if these exact files are pushed again with no changes, they'll be skipped.

How the all-repos approach works

All your repos share the same S3 bucket and the same Knowledge Base. Each chunk's filePath metadata is prefixed with the repo name (SamuelLawrence876/jse-bot/src/...). When the analyser queries the KB for a jse-bot Lambda error, the chunks that come back are usually from the jse-bot repo because those are the semantically closest vectors to the error message, not because of any explicit filtering.

The key point: you don't need to tell the system which repo an error came from. The vector search ranks code from all indexed repos by similarity, so the most relevant code tends to surface regardless of repo. The trade-off is that similar code in another repo can occasionally outrank the right one.

5. Webhook Registrar (`src/webhook-registrar/handler.ts`)

Runs daily on an EventBridge schedule. Lists all repositories owned by the configured GitHub user, checks each for the indexer webhook URL, and creates it if missing. This means new GitHub repositories are automatically wired up within 24 hours of being created, with no manual steps.

6. Subscription Registrar (`src/subscription-registrar/handler.ts`)

Runs daily on an EventBridge schedule. Lists all /aws/lambda/* log groups in the account and ensures each one has the cloud-error-error-filter subscription filter pointing at the Log Forwarder Lambda. Any cloud-error-* log groups are excluded to prevent the system from feeding its own errors back into itself.

This also runs on demand — invoke it manually after deploying a new Lambda to immediately opt it into error detection without waiting for the next daily run.
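
The selection logic can be sketched as a pure function over the listed log groups. The `LogGroupInfo` shape and the helper name are assumptions; in the real handler the group and filter names would come from the CloudWatch Logs API.

```typescript
// Hypothetical summary of one log group and the subscription filters on it.
interface LogGroupInfo { name: string; filterNames: string[] }

const LAMBDA_PREFIX = "/aws/lambda/";
const OWN_PREFIX = "/aws/lambda/cloud-error-"; // the system's own Lambdas
const FILTER_NAME = "cloud-error-error-filter";

// Returns the log groups that still need the error subscription filter.
export function groupsNeedingFilter(groups: LogGroupInfo[]): string[] {
  return groups
    .filter((group) => group.name.startsWith(LAMBDA_PREFIX))
    .filter((group) => !group.name.startsWith(OWN_PREFIX)) // avoid feedback loops
    .filter((group) => group.filterNames.indexOf(FILTER_NAME) === -1)
    .map((group) => group.name);
}
```

Because groups that already have the filter are skipped, running the registrar repeatedly (daily or on demand) is safe.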

Infrastructure

All infrastructure is defined with AWS CDK in lib/. There are no manual CloudFormation steps.

Resource	Type	Purpose
`cloud-error-analyser-ingress`	SQS Queue	Buffers alarm payloads before analysis
`cloud-error-analyser-dlq`	SQS Queue	Dead letters after 3 failed analysis attempts
`cloud-error-analyser`	Lambda	Reads from SQS, deduplicates, starts Step Functions
`cloud-error-analysis-workflow`	Step Functions (Express)	Orchestrates the 5-step analysis pipeline
`cloud-error-analyser-task`	Lambda	Executes each step of the workflow
`cloud-error-dedup`	DynamoDB Table	Tracks seen errors for 30-minute dedup window
`cloud-error-indexer`	Lambda	Handles GitHub push webhooks
`cloud-error-indexer`	API Gateway HTTP API	Exposes the indexer as a webhook endpoint
`cloud-error-code-chunks`	S3 Bucket	Stores chunked source code as JSON
`cloud-error-file-hashes`	DynamoDB Table	Stores per-file SHA-256 hashes for change detection
`cloud-error-webhook-registrar`	Lambda	Daily job to register webhooks on new GitHub repos
`cloud-error-log-forwarder`	Lambda	Receives CW Logs batches, enqueues error payloads
`cloud-error-subscription-registrar`	Lambda	Daily job to attach subscription filters to log groups
`cloud-error-analyser`	CloudWatch Dashboard	Monitors queue depth, execution success/failure

Forking this project

1. Edit `lib/config.ts`

This is the only file that contains identity-specific values:

export const config = {
  domain: {
    hostedZone: 'your-domain.com',        // your Route 53 hosted zone
    subdomain: 'cloud-error.your-domain.com',  // subdomain for the webhook endpoint
  },
  ssm: {
    slackWebhookUrl:    '/cloud-error/slack-webhook-url',
    bedrockKbId:        '/cloud-error/bedrock-kb-id',
    bedrockKbDsId:      '/cloud-error/bedrock-kb-ds-id',
    githubPat:          '/cloud-error/github-pat',
    githubWebhookSecret: '/cloud-error/github-webhook-secret',
  },
  bedrock: {
    highSeverityModel: 'us.anthropic.claude-opus-4-7-v1:0',
    standardModel:     'us.anthropic.claude-haiku-4-5-20251001-v1:0',
    rerankModelId:     'cohere.rerank-v3-5:0',
  },
} as const;

If you don't have a custom domain, remove IndexerDomain from lib/cloudErrorStack.ts and replace the webhookUrl prop with indexerApi.api.apiEndpoint + '/index'.

Region note for rerankModelId: cohere.rerank-v3-5:0 works in all supported regions including us-east-1. If you are deploying to us-west-2, ca-central-1, eu-central-1, or ap-northeast-1, you can use amazon.rerank-v1:0 for a fully AWS-native setup with no external model dependency.

2. Create SSM parameters

# Slack — create an Incoming Webhook at api.slack.com/apps
aws ssm put-parameter --name /cloud-error/slack-webhook-url \
  --value "https://hooks.slack.com/services/..." --type String

# GitHub — Personal Access Token with `repo` scope
aws ssm put-parameter --name /cloud-error/github-pat \
  --value "ghp_..." --type String

# GitHub — any random string used to sign webhook payloads
aws ssm put-parameter --name /cloud-error/github-webhook-secret \
  --value "$(openssl rand -hex 32)" --type String

# Bedrock — fill these after the first CDK deploy (see step 4)
aws ssm put-parameter --name /cloud-error/bedrock-kb-id  --value "PLACEHOLDER" --type String
aws ssm put-parameter --name /cloud-error/bedrock-kb-ds-id --value "PLACEHOLDER" --type String

3. Bootstrap CDK and deploy once

cdk bootstrap aws://YOUR_ACCOUNT_ID/YOUR_REGION
npm run deploy

The first deploy creates the cloud-error-code-chunks S3 bucket, which you need before creating the Knowledge Base.

4. Create the Bedrock Knowledge Base (manual, one-time)

Knowledge Base creation is not automated by this project's CDK stack, so it is a one-time manual step. In the AWS Console:

Go to Amazon Bedrock → Knowledge Bases → Create
Set the S3 data source to the cloud-error-code-chunks bucket
Choose Amazon Titan Text Embeddings V2 as the embedding model
Choose S3 Vectors as the vector store — this is pay-per-use with no fixed minimum cost
Note the Knowledge Base ID and Data Source ID

Update SSM with the real IDs:

aws ssm put-parameter --name /cloud-error/bedrock-kb-id  --value "YOUR_KB_ID"  --type String --overwrite
aws ssm put-parameter --name /cloud-error/bedrock-kb-ds-id --value "YOUR_DS_ID" --type String --overwrite

5. Deploy again and run the background jobs

npm run deploy

# Wire up all existing Lambda log groups immediately
aws lambda invoke --function-name cloud-error-subscription-registrar /dev/null

# Register webhooks on all existing GitHub repos immediately
aws lambda invoke --function-name cloud-error-webhook-registrar /dev/null

After this, the system is fully live.

6. Request Bedrock model access

In the AWS Console, go to Amazon Bedrock → Model Access and request access to:

Anthropic Claude Opus 4.7
Anthropic Claude Haiku 4.5

The model IDs in lib/config.ts use cross-region inference profiles (the us. prefix). These require the underlying models to be enabled in your account.

CI/CD

The GitHub Actions pipeline in .github/workflows/pipeline.yml runs on every push and pull request to master.

Job	Trigger	Steps
`check`	All branches	Typecheck, unit tests, CDK synth
`deploy`	`master` push only	Deploy to AWS via OIDC
`acceptance`	After deploy	End-to-end test against live AWS resources

Required GitHub repository variables and secrets

Name	Type	Value
`AWS_ACCOUNT_ID`	Variable	Your AWS account ID
`AWS_REGION`	Variable	e.g. `us-east-1`
`AWS_DEPLOY_ROLE_ARN`	Secret	ARN of the IAM role the pipeline assumes

The deploy role requires permissions to deploy CDK stacks (CloudFormation, Lambda, SQS, DynamoDB, S3, Bedrock, IAM, etc.). The pipeline uses GitHub OIDC — no long-lived AWS credentials are stored.

Development

Running tests

npm test              # unit tests (all src/**/*.test.ts)
npm run acceptance-test  # end-to-end tests against live AWS (requires AWS credentials)

Unit tests mock all AWS SDK clients using aws-sdk-client-mock. No AWS credentials are needed to run them.

The acceptance test (acceptance-test/index.test.ts) submits a synthetic alarm payload directly to the SQS queue and polls Step Functions until the execution completes, then verifies a Slack notification was sent.

Useful commands

npm run typecheck    # TypeScript type checking without emitting
npm run synth        # synthesise CDK CloudFormation templates (no deploy)
npm run deploy       # deploy all stacks to AWS

Project structure

lib/
  config.ts                      ← single source of truth for all config
  cloudErrorStack.ts             ← CDK stack definition
  constructs/
    analyser/                    ← SQS queue, Lambda, Step Functions, DynamoDB
    indexer/                     ← Lambda, API Gateway, S3, DynamoDB
    log-forwarder/               ← Lambda
    subscription-registrar/      ← Lambda + EventBridge schedule
    webhook-registrar/           ← Lambda + EventBridge schedule
    observability/               ← CloudWatch dashboard

src/
  logger.ts                      ← structured JSON logger (shared)
  types.ts                       ← shared TypeScript interfaces
  analyser/
    handler.ts                   ← SQS consumer, dedup, Step Functions trigger
    taskHandler.ts               ← Step Functions task executor (all 5 steps)
    analysis/llm.ts              ← Claude invocation + response parsing
    deduplication/deduplicator.ts← DynamoDB-backed error dedup
    logs/logFetcher.ts           ← CloudWatch Logs Insights queries
    logs/logGroupResolver.ts     ← maps alarm metrics to log group names
    notification/slack.ts        ← Slack Block Kit message builder + poster
    retrieval/retriever.ts       ← semantic search via Bedrock KB
    retrieval/reranker.ts        ← Bedrock Rerank API
  indexer/
    handler.ts                   ← GitHub webhook handler
    chunking/chunker.ts          ← code splitting into chunks
    github/githubClient.ts       ← GitHub API + webhook signature verification
    hashing/merkle.ts            ← SHA-256 file hashing + DynamoDB persistence
    vectors/vectorStore.ts       ← S3 upload + Bedrock KB ingestion + query
  log-forwarder/handler.ts       ← CloudWatch Logs subscription filter consumer
  subscription-registrar/handler.ts ← attaches subscription filters to log groups
  webhook-registrar/
    handler.ts                   ← registers GitHub webhooks on repos
    githubRepos.ts               ← GitHub API client for repo/webhook management

acceptance-test/
  index.test.ts                  ← end-to-end pipeline test

Estimated monthly cost

Based on light usage: a solo developer, ~5 repos, ~10 error analyses per day (300/month), ~50 code pushes per month.

The vector store backing the Bedrock Knowledge Base is S3 Vectors — a pay-per-use model with no fixed minimum cost. This keeps the bill very low.

Service	Cost/month	Notes
Bedrock — Claude inference	~$3–5	Opus 4.7 for high-severity alarms only; Haiku 4.5 ($1/$5 per M input/output tokens) for the bulk of standard alarms, across ~300 analyses
Bedrock — Rerank	~$1	Cohere Rerank 3.5 via Bedrock, one rerank query of up to 40 chunks per analysis
S3 Vectors (KB vector store)	~$0.01	~$2.50 per million API requests + $0.06/GB storage — negligible at this scale
Bedrock — Titan Embeddings	~$0.02	Embeddings on code pushes only
CloudWatch Logs	~$0.50	Log ingestion + Insights queries
Route 53	~$0.50	Hosted zone (likely already paying this)
SES	~$0.03	~300 notification emails/month
Step Functions Express	~$0.01	~300 executions/month (Express workflows bill per request and duration, not per state transition)
Lambda	~$0	Comfortably within the 1M request / 400K GB-second free tier
SQS	~$0	Within the 1M request free tier
DynamoDB	~$0	Within the 25 GB / 25 RCU/WCU free tier
S3	~$0.05	Code chunk JSON objects
API Gateway	~$0.01	Minimal webhook invocations
EventBridge	~$0	Scheduled rules are free
Total	~$5–7/month

The dominant costs are Bedrock inference (Claude) and reranking. Everything else is negligible. Costs scale roughly linearly with the number of error analyses: $5–7 spread over ~300 analyses is about $0.02 per analysis.
