OCI Functions worker for Prizmatic chunking and embedding jobs.
The worker is event driven. It receives embedding jobs, chunks document text, creates Gemini embeddings, and writes the results back through the Prizmatic backend API. It does not connect directly to the application database.
Prizmatic helps software teams organize project work in one workspace. This repository provides the vector-processing worker used by Prizmatic to turn documents, work items, comments, and agent memory content into embeddings that the backend can store and search.
These instructions are intended for the operating systems used by the team and for CI-style verification:
- macOS 14+ with Homebrew
- Ubuntu 22.04+ or another recent Linux distribution
- Windows 11 using WSL2 Ubuntu for deployment parity
- Windows PowerShell for unit-test-only development after installing Git, Python 3.11+, Docker, and uv
Use Python 3.11 or newer. Docker is only required when building the OCI Functions container image.
The latest stable branch is main. The active beta/development branch is develop.
git clone https://github.com/prism-416/prism-vector.git
cd prism-vector
git fetch --tags
git switch main
For beta development and milestone demos:
git switch develop
git pull origin develop
There are no release tags in this repository yet. Until the first tagged release, use main for the latest stable code and develop for integrated beta work.
macOS, Linux, and WSL2:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --extra oci
cp .env.example .env
Windows PowerShell:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv sync
Copy-Item .env.example .env
Use WSL2 for the OCI deployment path. Native PowerShell setup is intended for unit tests and documentation work.
Do not commit .env. It is intentionally local and may contain API keys or
service tokens.
The application loads settings from .env through src/config.py. Important
local settings:
- PRISM_API_BASE_URL: Prizmatic backend API base URL. Use https://api.prizmatic.app/dev for the shared dev API, or http://localhost:3000 when running the backend locally.
- PRISM_API_TOKEN: bearer token used by the worker when calling internal backend endpoints.
- GEMINI_API_KEY or GOOGLE_API_KEY: API key used to create embeddings.
- EMBEDDING_MODEL: embedding model name. Default is gemini-embedding-001.
- EMBEDDING_DIMENSIONS: expected vector size. Default is 1536.
- CHUNK_SIZE, CHUNK_OVERLAP, MAX_CHUNKS: document chunking limits.
- CLAIM_PROJECT_ID, CLAIM_JOB_TYPES, CLAIM_LIMIT: defaults for claim-mode invocations.
- OCI_OBJECT_STORAGE_NAMESPACE, OCI_OBJECT_STORAGE_BUCKET: defaults used when a document job references an OCI Object Storage object without providing the bucket or namespace in the event.
- OCI_USE_RESOURCE_PRINCIPAL: set to true in OCI Functions. Set to false for local testing with an OCI config file.
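A minimal sketch of how these settings might be read from an environment-like mapping. This is not the actual src/config.py: the `Settings` class and `load_settings` function are hypothetical, the chunking defaults (1000, 200, 64) and the localhost base URL default are placeholders because this README does not state them, and only the model and dimension defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; the real one lives in src/config.py."""
    api_base_url: str
    api_token: Optional[str]
    gemini_api_key: Optional[str]
    embedding_model: str
    embedding_dimensions: int
    chunk_size: int
    chunk_overlap: int
    max_chunks: int
    use_resource_principal: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping."""
    return Settings(
        # Assumed default; strip a trailing slash so paths can be appended safely.
        api_base_url=env.get("PRISM_API_BASE_URL", "http://localhost:3000").rstrip("/"),
        api_token=env.get("PRISM_API_TOKEN"),
        # Either key name is accepted, as described above.
        gemini_api_key=env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"),
        embedding_model=env.get("EMBEDDING_MODEL", "gemini-embedding-001"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1536")),
        # Placeholder defaults: the README does not state the real values.
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        max_chunks=int(env.get("MAX_CHUNKS", "64")),
        use_resource_principal=env.get("OCI_USE_RESOURCE_PRINCIPAL", "false").lower() == "true",
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in unit tests without touching the real environment.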
This repository does not own or connect directly to a database. All persistent state is managed by the Prizmatic backend API.
For unit tests, no database, backend API, Gemini account, or OCI Object Storage bucket is required because the tests use fake clients.
For end-to-end local testing, run or use these services:
- Prizmatic backend API from https://github.com/prism-416/prism-nestjs
- PostgreSQL for the backend API, following the backend README
- Gemini API credentials for embedding generation
- OCI Object Storage only when testing jobs that read document text from objects instead of inline event text
Python dependency and import check:
uv sync --frozen --extra oci
uv run python -m compileall src
Build the deployment container:
docker build -t prism-vector:local .
The Dockerfile installs the OCI Python FDK and runs:
fdk /app/src/entrypoints/providers/oci_function.py handler
Run these checks before opening or merging a pull request:
uv run pytest
uv run python -m compileall src
docker build -t prism-vector:local .
The automated tests cover event parsing and core worker behavior with fake API, embedding, and storage clients.
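As an illustration of the fake-client approach, here is a small pytest-style sketch. Every name in it (`FakeEmbeddingClient`, `FakeApiClient`, `embed_work_item`) is hypothetical and does not mirror the real classes in src/ or tests/, and joining the title and description with a blank line is an assumption.

```python
from typing import Dict, List


class FakeEmbeddingClient:
    """Returns deterministic vectors instead of calling Gemini."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions
        self.calls: List[str] = []

    def embed(self, text: str) -> List[float]:
        self.calls.append(text)
        # The vector encodes the text length so assertions can check it.
        return [float(len(text))] * self.dimensions


class FakeApiClient:
    """Records writes instead of calling the backend API."""

    def __init__(self) -> None:
        self.saved: Dict[str, List[float]] = {}

    def put_work_item_embedding(self, item_id: str, vector: List[float]) -> None:
        self.saved[item_id] = vector


def embed_work_item(api, embedder, item_id: str, title: str, description: str) -> None:
    """Stand-in for the worker step under test (assumed text format)."""
    api.put_work_item_embedding(item_id, embedder.embed(f"{title}\n\n{description}"))


def test_embed_work_item_writes_vector() -> None:
    api, embedder = FakeApiClient(), FakeEmbeddingClient(dimensions=4)
    embed_work_item(api, embedder, "item-1", "Title", "Body")
    assert embedder.calls == ["Title\n\nBody"]
    assert api.saved["item-1"] == [11.0] * 4
```

Because both collaborators are passed in, a test like this needs no backend API, Gemini key, or network access.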
Manual integration test steps:
- Start the Prizmatic backend API and its database, or use the shared dev API.
- Set PRISM_API_BASE_URL, PRISM_API_TOKEN, and GEMINI_API_KEY in .env.
- If testing Object Storage jobs, set OCI_OBJECT_STORAGE_NAMESPACE, OCI_OBJECT_STORAGE_BUCKET, and local OCI credentials, or provide bucketName and namespace in the event payload.
- Trigger an embedding job through the backend API or send a direct event payload to the OCI Function handler.
- Verify that the backend job status becomes completed and that the expected chunks or embeddings are written through the API.
src/entrypoints/providers/oci_function.py:handler
This repository does not expose a public HTTP API. Its API design contract is
the set of backend endpoints that the worker calls. The canonical backend API
documentation is the Swagger/OpenAPI page served by prism-nestjs:
- Shared dev API docs: https://api.prizmatic.app/dev/docs
- Shared dev OpenAPI JSON: https://api.prizmatic.app/dev/docs-json
- Local backend docs: http://localhost:3000/docs
- Local backend OpenAPI JSON: http://localhost:3000/docs-json
Worker API contract:
- POST /projects/{projectId}/embedding-jobs/claim
- PATCH /projects/{projectId}/embedding-jobs/{embeddingJobId}
- POST /projects/{projectId}/documents/{documentId}/chunks
- POST /projects/{projectId}/documents/{documentId}/chunks/embeddings
- GET /projects/{projectId}/work-items/{itemId}
- PUT /projects/{projectId}/work-items/{itemId}/embedding
- GET /projects/{projectId}/work-items/{itemId}/comments
- PUT /projects/{projectId}/work-items/{itemId}/comments/{commentId}/embedding
- PUT /projects/{projectId}/agent-memories/{memoryId}/embedding
When an endpoint, DTO, response shape, auth requirement, or embedding job type
changes, update the backend Swagger decorators in prism-nestjs, update this
contract list, and verify the change in /docs before marking the API task
complete.
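To make the contract concrete, the sketch below builds (but does not send) a claim request with the standard library. The `build_claim_request` helper is hypothetical and is not taken from src/api_client.py, and the request body fields (`jobTypes`, `limit`) are an assumption borrowed from the claim-mode event shape; the Swagger page remains the authority on the real DTO.

```python
import json
import urllib.request
from typing import List


def build_claim_request(base_url: str, token: str, project_id: str,
                        job_types: List[str], limit: int) -> urllib.request.Request:
    """Build (but do not send) a claim request for embedding jobs."""
    url = f"{base_url.rstrip('/')}/projects/{project_id}/embedding-jobs/claim"
    # Assumed body shape; check the Swagger page for the real DTO.
    body = json.dumps({"jobTypes": job_types, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The worker authenticates with the bearer token from PRISM_API_TOKEN.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Keeping construction separate from sending makes the URL and headers easy to assert on in tests.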
- document: chunks a full document and appends chunk embeddings.
- document_chunk: embeds one existing document chunk.
- work_item: embeds a work item title and description.
- work_item_comment: embeds a work item comment body.
- agent_memory: embeds inline agent memory content.
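For the document job type, chunking might look like the sliding-window sketch below. It assumes character-based windows controlled by CHUNK_SIZE, CHUNK_OVERLAP, and MAX_CHUNKS; the real src/chunking.py may split on tokens or Markdown structure instead, and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int, max_chunks: int) -> List[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    start = 0
    while start < len(text) and len(chunks) < max_chunks:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

For example, `chunk_text("abcdefghij", 4, 1, 10)` returns `["abcd", "defg", "ghij"]`: each window repeats the last character of the previous one.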
Direct document job event:
{
  "projectId": "project-uuid",
  "embeddingJobId": "job-uuid",
  "jobType": "document",
  "targetId": "document-uuid",
  "objectName": "projects/project-uuid/documents/document-uuid/source.md",
  "model": "gemini-embedding-001",
  "dimensions": 1536
}
Inline document text is also supported:
{
  "projectId": "project-uuid",
  "jobType": "document",
  "targetId": "document-uuid",
  "text": "# Document\n\nContent to chunk and embed."
}
Claim mode:
{
  "action": "claim",
  "projectId": "project-uuid",
  "jobTypes": ["document", "work_item"],
  "limit": 10
}
OCI Object Storage events are supported when the object name follows this pattern:
projects/{projectId}/documents/{documentId}/...
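The event shapes above could be told apart roughly as follows. This is a sketch only: `parse_object_name` and `classify_event` are hypothetical helpers, not the functions in src/event_parser.py, and the rule used to recognize a direct event (presence of `jobType` and `targetId`) is an assumption.

```python
import re
from typing import Any, Dict, Optional

# Pattern from this README: projects/{projectId}/documents/{documentId}/...
_OBJECT_NAME_RE = re.compile(
    r"^projects/(?P<project_id>[^/]+)/documents/(?P<document_id>[^/]+)/.+"
)


def parse_object_name(object_name: str) -> Optional[Dict[str, str]]:
    """Extract the project and document IDs, or None if the name does not match."""
    match = _OBJECT_NAME_RE.match(object_name)
    if match is None:
        return None
    return {
        "projectId": match.group("project_id"),
        "targetId": match.group("document_id"),
    }


def classify_event(event: Dict[str, Any]) -> str:
    """Return 'claim', 'direct', or 'unknown' for a decoded event payload."""
    if event.get("action") == "claim":
        return "claim"
    # Assumed rule: a direct job names both its type and its target.
    if "jobType" in event and "targetId" in event:
        return "direct"
    return "unknown"
```

Object names that do not match the pattern yield `None`, so a caller can ignore unrelated uploads instead of raising.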
src/
  api_client.py
  chunking.py
  config.py
  embeddings.py
  event_parser.py
  hashing.py
  models.py
  storage.py
  worker.py
  worker_handler.py
  entrypoints/providers/oci_function.py
tests/
  test_event_parser.py
  test_worker.py
Outstanding bugs are tracked in GitHub Issues:
- Bug list: https://github.com/prism-416/prism-vector/issues
- New bug report: https://github.com/prism-416/prism-vector/issues/new
When reporting a bug, include:
- A short title describing the failing behavior.
- Reproduction steps.
- Expected behavior.
- Actual behavior, including stack traces or API responses when available.
- Environment details, such as branch, operating system, Python version, and whether the bug happened locally, in the shared dev API, or in OCI.
Serious bugs should be labeled as bugs, assigned to a team member, and included in the project schedule before new polish work is prioritized.
Detailed assignments should be tracked in GitHub Issues or the team's project board. This README keeps the public milestone summary updated for instructor check-ins. Course due dates come from the official course schedule.
| Milestone | Status | Completed items | Remaining or adjusted items |
|---|---|---|---|
| Milestone 1 | Completed | Repository bootstrap, uv project setup, initial Dockerfile, initial worker purpose and API notes | Keep setup docs current as environment requirements change |
| Milestone 2 | Completed | Gemini embedding configuration, chunking settings, .env.example, OCI Function entry point, Object Storage configuration | No direct database setup is required in this repo; backend database instructions live in prism-nestjs |
| Milestone 3 | Completed | API-driven worker flow, embedding job claim/update calls, document chunk append calls, work item embedding calls, event parsing, unit tests for parser and worker flows | Expand independent verification for all implemented job types and file any bugs in GitHub Issues |
| Milestone 4 beta | In progress | Additional job type support for document chunks, work item comments, and agent memories; Docker build path; shared dev API contract documented | Verify beta deployment path, confirm an accessible demo link, finish the Milestone 4 Progress Update document, assign serious bugs, and reserve bug-fix time before final release |
| Final release | Planned | Not started | Stabilize core embedding workflows, complete non-implementer verification, fix serious bugs, update API docs, run full checks, and prepare final demo |
Schedule adjustments made for the beta:
- Vector work is scoped to event ingestion, chunking, embedding generation, and backend API updates. Database schema and Swagger endpoint ownership remain in prism-nestjs.
- Final-release time is reserved for integration testing and bug fixing instead of adding large new vector feature areas.
- Serious bugs must be assigned in GitHub Issues and accounted for in the schedule before new polish tasks are started.
For each feature marked complete, a team member who did not implement the feature should verify the behavior and file bugs for anything that does not work as expected. Implementation status is not final acceptance until the independent verification status is filled in.
| Feature | Implementation status | Independent verification status |
|---|---|---|
| Parse direct embedding job events | Implemented | Pending non-implementer verification |
| Parse OCI Object Storage document events | Implemented | Pending non-implementer verification |
| Chunk full documents and append chunk records | Implemented | Pending non-implementer verification |
| Append document chunk embeddings | Implemented | Pending non-implementer verification |
| Embed work items | Implemented | Pending non-implementer verification |
| Embed work item comments | Implemented | Pending non-implementer verification |
| Embed agent memories from inline content | Implemented | Pending non-implementer verification |
| Mark embedding jobs completed or failed | Implemented | Pending non-implementer verification |
Before each instructor check-in:
- Pull the latest develop branch.
- Confirm the README, API contract, schedule, and bug tracker are current.
- Run or review the latest tests and Docker build.
- Demo completed work through the shared dev API, local backend API, or OCI Function invocation.
- Each team member should be ready to state what they completed last week, what they plan to do this week, and any blockers.
For Milestone 4, the project should be accessible through the deployed Prizmatic beta environment. The current public API documentation link is:
https://api.prizmatic.app/dev/docs
The team should also commit a Milestone 4 Progress Update document containing:
- Individual progress updates for each team member.
- A comparison between scheduled work and completed work.
- Clearly labeled partially completed tasks with percent-complete estimates.
- A group progress assessment, including whether the team is on track, ahead, or behind schedule.
- A team progress grade with a short explanation.
- Any schedule or work-process adjustments needed before final release.