prism-416/prism-vector


Prizmatic Vector Worker

OCI Functions worker for Prizmatic chunking and embedding jobs.

The worker is event-driven. It receives embedding jobs, chunks document text, creates Gemini embeddings, and writes the results back through the Prizmatic backend API. It does not connect directly to the application database.

Current Problem Statement

Prizmatic helps software teams organize project work in one workspace. This repository provides the vector-processing worker used by Prizmatic to turn documents, work items, comments, and agent memory content into embeddings that the backend can store and search.

Supported Developer Environments

These instructions are intended for the operating systems used by the team and for CI-style verification:

  • macOS 14+ with Homebrew
  • Ubuntu 22.04+ or another recent Linux distribution
  • Windows 11 using WSL2 Ubuntu for deployment parity
  • Windows PowerShell for unit-test-only development after installing Git, Python 3.11+, Docker, and uv

Use Python 3.11 or newer. Docker is required only when building the OCI Functions container image.

Check Out The Source Code

The latest stable branch is main. The active beta/development branch is develop.

git clone https://github.com/prism-416/prism-vector.git
cd prism-vector
git fetch --tags
git switch main

For beta development and milestone demos:

git switch develop
git pull origin develop

There are no release tags in this repository yet. Until the first tagged release, use main for the latest stable code and develop for integrated beta work.

Install Dependencies

macOS, Linux, and WSL2:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --extra oci
cp .env.example .env

Windows PowerShell:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv sync
Copy-Item .env.example .env

Use WSL2 for the OCI deployment path. Native PowerShell setup is intended for unit tests and documentation work.

Do not commit .env. It is intentionally local and may contain API keys or service tokens.

Environment Configuration

The application loads settings from .env through src/config.py. Important local settings:

  • PRISM_API_BASE_URL: Prizmatic backend API base URL. Use https://api.prizmatic.app/dev for the shared dev API, or http://localhost:3000 when running the backend locally.
  • PRISM_API_TOKEN: bearer token used by the worker when calling internal backend endpoints.
  • GEMINI_API_KEY or GOOGLE_API_KEY: API key used to create embeddings.
  • EMBEDDING_MODEL: embedding model name. Default is gemini-embedding-001.
  • EMBEDDING_DIMENSIONS: expected vector size. Default is 1536.
  • CHUNK_SIZE, CHUNK_OVERLAP, MAX_CHUNKS: document chunking limits.
  • CLAIM_PROJECT_ID, CLAIM_JOB_TYPES, CLAIM_LIMIT: defaults for claim-mode invocations.
  • OCI_OBJECT_STORAGE_NAMESPACE, OCI_OBJECT_STORAGE_BUCKET: defaults used when a document job references an OCI Object Storage object without providing the bucket or namespace in the event.
  • OCI_USE_RESOURCE_PRINCIPAL: set to true in OCI Functions. Set to false for local testing with an OCI config file.
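
As an illustration of how the settings above could be read, here is a minimal sketch that uses only the standard library. It is not the contents of src/config.py: the WorkerSettings class, the load_settings function, the preference for GEMINI_API_KEY over GOOGLE_API_KEY, and the false default for OCI_USE_RESOURCE_PRINCIPAL are all assumptions made for this example. Only the variable names and the two documented defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerSettings:
    """Hypothetical settings container; the real src/config.py may differ."""

    api_base_url: str
    api_token: str
    embedding_api_key: str
    embedding_model: str
    embedding_dimensions: int
    use_resource_principal: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> WorkerSettings:
    """Build settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env

    # Either key name is accepted. Checking GEMINI_API_KEY first is an assumption.
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not api_key:
        raise ValueError("Set GEMINI_API_KEY or GOOGLE_API_KEY")

    return WorkerSettings(
        # A missing required variable raises KeyError here.
        api_base_url=env["PRISM_API_BASE_URL"].rstrip("/"),
        api_token=env["PRISM_API_TOKEN"],
        embedding_api_key=api_key,
        # Documented defaults from the list above.
        embedding_model=env.get("EMBEDDING_MODEL", "gemini-embedding-001"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1536")),
        # Defaulting to false when unset is an assumption for local use.
        use_resource_principal=env.get("OCI_USE_RESOURCE_PRINCIPAL", "false").lower() == "true",
    )
```

Passing a plain dictionary instead of os.environ keeps the function easy to exercise in unit tests without touching a real .env file.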

Database And External Services

This repository does not own or connect directly to a database. All persistent state is managed by the Prizmatic backend API.

For unit tests, no database, backend API, Gemini account, or OCI Object Storage bucket is required because the tests use fake clients.

For end-to-end local testing, run or use these services:

  • Prizmatic backend API from https://github.com/prism-416/prism-nestjs
  • PostgreSQL for the backend API, following the backend README
  • Gemini API credentials for embedding generation
  • OCI Object Storage only when testing jobs that read document text from objects instead of inline event text

Build The Software

Python dependency and import check:

uv sync --frozen --extra oci
uv run python -m compileall src

Build the deployment container:

docker build -t prism-vector:local .

The Dockerfile installs the OCI Python FDK and runs:

fdk /app/src/entrypoints/providers/oci_function.py handler

Test The Software

Run these checks before opening or merging a pull request:

uv run pytest
uv run python -m compileall src
docker build -t prism-vector:local .

The automated tests cover event parsing and core worker behavior with fake API, embedding, and storage clients.
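
To show what a fake client can look like, the following sketch defines a stand-in embedding client that never calls Gemini. The names FakeEmbeddingClient, embed, and embed_and_check are hypothetical and are not taken from tests/test_worker.py; the real test doubles in this repository may be shaped differently.

```python
from typing import List, Sequence


class FakeEmbeddingClient:
    """Hypothetical test double that returns fixed-size vectors offline."""

    def __init__(self, dimensions: int = 1536) -> None:
        self.dimensions = dimensions
        self.calls: List[List[str]] = []  # recorded inputs, for assertions

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        self.calls.append(list(texts))
        # Deterministic placeholder: the first component is the text length,
        # the remaining components are zero.
        return [[float(len(t))] + [0.0] * (self.dimensions - 1) for t in texts]


def embed_and_check(client, texts: Sequence[str], expected_dimensions: int) -> List[List[float]]:
    """Embed texts and reject vectors whose size differs from the expected one."""
    vectors = client.embed(texts)
    for vector in vectors:
        if len(vector) != expected_dimensions:
            raise ValueError(
                f"expected {expected_dimensions} dimensions, got {len(vector)}"
            )
    return vectors
```

A test can then construct FakeEmbeddingClient(dimensions=4), call embed_and_check, and assert on both the returned vectors and the recorded calls.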

Manual integration test steps:

  1. Start the Prizmatic backend API and its database, or use the shared dev API.
  2. Set PRISM_API_BASE_URL, PRISM_API_TOKEN, and GEMINI_API_KEY in .env.
  3. If testing Object Storage jobs, set OCI_OBJECT_STORAGE_NAMESPACE, OCI_OBJECT_STORAGE_BUCKET, and local OCI credentials, or provide bucketName and namespace in the event payload.
  4. Trigger an embedding job through the backend API or send a direct event payload to the OCI Function handler.
  5. Verify that the backend job status becomes completed and that the expected chunks or embeddings are written through the API.

OCI Entry Point

src/entrypoints/providers/oci_function.py:handler

API Design Documentation

This repository does not expose a public HTTP API. Its API design contract is the set of backend endpoints that the worker calls. The canonical backend API documentation is the Swagger/OpenAPI page served by prism-nestjs:

  • Shared dev API docs: https://api.prizmatic.app/dev/docs
  • Shared dev OpenAPI JSON: https://api.prizmatic.app/dev/docs-json
  • Local backend docs: http://localhost:3000/docs
  • Local backend OpenAPI JSON: http://localhost:3000/docs-json

Worker API contract:

  • POST /projects/{projectId}/embedding-jobs/claim
  • PATCH /projects/{projectId}/embedding-jobs/{embeddingJobId}
  • POST /projects/{projectId}/documents/{documentId}/chunks
  • POST /projects/{projectId}/documents/{documentId}/chunks/embeddings
  • GET /projects/{projectId}/work-items/{itemId}
  • PUT /projects/{projectId}/work-items/{itemId}/embedding
  • GET /projects/{projectId}/work-items/{itemId}/comments
  • PUT /projects/{projectId}/work-items/{itemId}/comments/{commentId}/embedding
  • PUT /projects/{projectId}/agent-memories/{memoryId}/embedding
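
As a rough sketch of how a call to the claim endpoint in this list could be assembled, the example below builds a request object with the standard library and does not send it. The function name build_claim_request and the request body fields (jobTypes, limit, chosen to mirror the claim-mode event) are assumptions; the authoritative request shape is whatever the backend Swagger page documents.

```python
import json
import urllib.request
from typing import List
from urllib.parse import quote


def build_claim_request(
    base_url: str, token: str, project_id: str, job_types: List[str], limit: int
) -> urllib.request.Request:
    """Build (but do not send) POST /projects/{projectId}/embedding-jobs/claim."""
    url = f"{base_url.rstrip('/')}/projects/{quote(project_id, safe='')}/embedding-jobs/claim"
    # Assumed body shape; check the backend OpenAPI document for the real DTO.
    body = json.dumps({"jobTypes": job_types, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # PRISM_API_TOKEN is described as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to urllib.request.urlopen when a backend is actually reachable.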

When an endpoint, DTO, response shape, auth requirement, or embedding job type changes, update the backend Swagger decorators in prism-nestjs, update this contract list, and verify the change in /docs before marking the API task complete.

Supported Job Types

  • document: chunks a full document and appends chunk embeddings.
  • document_chunk: embeds one existing document chunk.
  • work_item: embeds a work item title and description.
  • work_item_comment: embeds a work item comment body.
  • agent_memory: embeds inline agent memory content.
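
For the document job type, the sketch below illustrates one simple way that chunking with a size, an overlap, and a chunk cap can work. It splits on character offsets, which is an assumption: the actual strategy in src/chunking.py is not described in this README and may split on tokens, headings, or sentences instead. The function name chunk_text is hypothetical; its parameters loosely correspond to CHUNK_SIZE, CHUNK_OVERLAP, and MAX_CHUNKS.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int, max_chunks: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    start = 0
    while start < len(text) and len(chunks) < max_chunks:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

With chunk_size=4 and chunk_overlap=1, each chunk repeats the last character of the previous one, so context at chunk boundaries is not lost entirely.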

Event Payloads

Direct document job event:

{
  "projectId": "project-uuid",
  "embeddingJobId": "job-uuid",
  "jobType": "document",
  "targetId": "document-uuid",
  "objectName": "projects/project-uuid/documents/document-uuid/source.md",
  "model": "gemini-embedding-001",
  "dimensions": 1536
}

Inline document text is also supported:

{
  "projectId": "project-uuid",
  "jobType": "document",
  "targetId": "document-uuid",
  "text": "# Document\n\nContent to chunk and embed."
}

Claim mode:

{
  "action": "claim",
  "projectId": "project-uuid",
  "jobTypes": ["document", "work_item"],
  "limit": 10
}
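
The payload shapes above suggest a simple first routing step: decide whether an event is a claim request or a direct job. The sketch below shows one way to do that. The function name classify_event and the error handling are assumptions and do not describe the logic in src/event_parser.py; the set of job types is copied from the Supported Job Types section.

```python
from typing import Any, Dict

# Job types listed in the Supported Job Types section.
SUPPORTED_JOB_TYPES = {
    "document",
    "document_chunk",
    "work_item",
    "work_item_comment",
    "agent_memory",
}


def classify_event(payload: Dict[str, Any]) -> str:
    """Return "claim" for claim-mode payloads and "direct" for single-job payloads."""
    if payload.get("action") == "claim":
        unknown = set(payload.get("jobTypes", [])) - SUPPORTED_JOB_TYPES
        if unknown:
            raise ValueError(f"unsupported job types: {sorted(unknown)}")
        return "claim"

    job_type = payload.get("jobType")
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    return "direct"
```

Rejecting unknown job types early keeps a malformed event from reaching the embedding or API clients.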

OCI Object Storage events are supported when the object name follows this pattern:

projects/{projectId}/documents/{documentId}/...
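
One way to extract the identifiers from an object name that follows this pattern is a regular expression, as in the sketch below. The function name parse_object_name is hypothetical, and the choice to return None for non-matching names (rather than raising) is an assumption, not a description of the repository's parser.

```python
import re
from typing import Optional, Tuple

# projects/{projectId}/documents/{documentId}/... with a non-empty remainder.
_OBJECT_NAME = re.compile(
    r"^projects/(?P<project_id>[^/]+)/documents/(?P<document_id>[^/]+)/.+"
)


def parse_object_name(object_name: str) -> Optional[Tuple[str, str]]:
    """Return (project_id, document_id), or None if the name does not match."""
    match = _OBJECT_NAME.match(object_name)
    if match is None:
        return None
    return match.group("project_id"), match.group("document_id")
```

For the object name used in the direct document job example, this returns the pair ("project-uuid", "document-uuid").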

Source Layout

src/
  api_client.py
  chunking.py
  config.py
  embeddings.py
  event_parser.py
  hashing.py
  models.py
  storage.py
  worker.py
  worker_handler.py
  entrypoints/providers/oci_function.py
tests/
  test_event_parser.py
  test_worker.py

Bug Tracking

Outstanding bugs are tracked in GitHub Issues:

  • Bug list: https://github.com/prism-416/prism-vector/issues
  • New bug report: https://github.com/prism-416/prism-vector/issues/new

When reporting a bug, include:

  • A short title describing the failing behavior.
  • Reproduction steps.
  • Expected behavior.
  • Actual behavior, including stack traces or API responses when available.
  • Environment details, such as branch, operating system, Python version, and whether the bug happened locally, in the shared dev API, or in OCI.

Serious bugs should be labeled as bugs, assigned to a team member, and included in the project schedule before new polish work is prioritized.

Project Schedule

Detailed assignments should be tracked in GitHub Issues or the team's project board. This README keeps the public milestone summary updated for instructor check-ins. Course due dates come from the official course schedule.

| Milestone | Status | Completed items | Remaining or adjusted items |
| --- | --- | --- | --- |
| Milestone 1 | Completed | Repository bootstrap, uv project setup, initial Dockerfile, initial worker purpose and API notes | Keep setup docs current as environment requirements change |
| Milestone 2 | Completed | Gemini embedding configuration, chunking settings, .env.example, OCI Function entry point, Object Storage configuration | No direct database setup is required in this repo; backend database instructions live in prism-nestjs |
| Milestone 3 | Completed | API-driven worker flow, embedding job claim/update calls, document chunk append calls, work item embedding calls, event parsing, unit tests for parser and worker flows | Expand independent verification for all implemented job types and file any bugs in GitHub Issues |
| Milestone 4 beta | In progress | Additional job type support for document chunks, work item comments, and agent memories; Docker build path; shared dev API contract documented | Verify beta deployment path, confirm an accessible demo link, finish the Milestone 4 Progress Update document, assign serious bugs, and reserve bug-fix time before final release |
| Final release | Planned | Not started | Stabilize core embedding workflows, complete non-implementer verification, fix serious bugs, update API docs, run full checks, and prepare final demo |

Schedule adjustments made for the beta:

  • Vector work is scoped to event ingestion, chunking, embedding generation, and backend API updates. Database schema and Swagger endpoint ownership remain in prism-nestjs.
  • Final-release time is reserved for integration testing and bug fixing instead of adding large new vector feature areas.
  • Serious bugs must be assigned in GitHub Issues and accounted for in the schedule before new polish tasks are started.

Feature Verification

For each feature marked complete, a team member who did not implement the feature should verify the behavior and file bugs for anything that does not work as expected. A feature marked as implemented is not considered accepted until its independent verification status is filled in.

| Feature | Implementation status | Independent verification status |
| --- | --- | --- |
| Parse direct embedding job events | Implemented | Pending non-implementer verification |
| Parse OCI Object Storage document events | Implemented | Pending non-implementer verification |
| Chunk full documents and append chunk records | Implemented | Pending non-implementer verification |
| Append document chunk embeddings | Implemented | Pending non-implementer verification |
| Embed work items | Implemented | Pending non-implementer verification |
| Embed work item comments | Implemented | Pending non-implementer verification |
| Embed agent memories from inline content | Implemented | Pending non-implementer verification |
| Mark embedding jobs completed or failed | Implemented | Pending non-implementer verification |

Weekly Milestone Check-In

Before each instructor check-in:

  1. Pull the latest develop branch.
  2. Confirm the README, API contract, schedule, and bug tracker are current.
  3. Run or review the latest tests and Docker build.
  4. Demo completed work through the shared dev API, local backend API, or OCI Function invocation.
  5. Each team member should be ready to state what they completed last week, what they plan to do this week, and any blockers.

Milestone 4 Beta Release Notes

For Milestone 4, the project should be accessible through the deployed Prizmatic beta environment. The current public API documentation link is:

  • https://api.prizmatic.app/dev/docs

The team should also commit a Milestone 4 Progress Update document containing:

  • Individual progress updates for each team member.
  • A comparison between scheduled work and completed work.
  • Clearly labeled partially completed tasks with percent-complete estimates.
  • A group progress assessment, including whether the team is on track, ahead, or behind schedule.
  • A team progress grade with a short explanation.
  • Any schedule or work-process adjustments needed before final release.
