Skip to content

Latest commit

 

History

History
394 lines (307 loc) · 10.1 KB

File metadata and controls

394 lines (307 loc) · 10.1 KB

ECC-AES Payload Encryption

Complete documentation for the optional payload encryption feature using ECC-P256 + AES-256-GCM.

Overview

Vectorizer supports optional end-to-end encryption of vector payloads using modern cryptographic standards:

  • ECC-P256: Elliptic Curve Cryptography with NIST P-256 curve
  • AES-256-GCM: Authenticated encryption with 256-bit keys
  • ECDH: Elliptic Curve Diffie-Hellman for secure key exchange
  • Zero-Knowledge: Server never has access to decryption keys

Quick Start

Basic Usage

Encrypt a payload by providing a public key when inserting data:

curl -X POST http://localhost:15002/insert_text \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "my_collection",
    "text": "sensitive document",
    "metadata": {"category": "confidential"},
    "public_key": "base64_encoded_ecc_public_key"
  }'

The server will:

  1. Generate an ephemeral ECC key pair
  2. Derive a shared secret using ECDH
  3. Encrypt the payload with AES-256-GCM
  4. Store the encrypted payload with metadata (nonce, tag, ephemeral public key)

Decryption (Client-Side)

Clients with the corresponding private key can decrypt payloads using the stored metadata.

Documentation

Implementation

  • IMPLEMENTATION.md - Complete implementation guide
    • Core encryption module details
    • API endpoint implementations
    • Data structures and formats
    • Dependencies and configuration
    • Usage examples

Testing

  • TEST_SUMMARY.md - Test suite overview

    • 26 integration tests
    • 3 unit tests
    • All test categories and results
  • EXTENDED_TESTS.md - Extended test coverage

    • Edge cases (empty payloads, special characters, all JSON types)
    • Performance tests (100+ vectors, large payloads)
    • Concurrency tests (multi-threaded operations)
    • Security validation
  • TEST_COVERAGE.md - Coverage metrics

    • Before/after comparison
    • 71% increase in test coverage
    • Real-world scenario testing

Audits

  • ROUTES_AUDIT.md - Complete route audit
    • All 5 routes with encryption support
    • Stub endpoints (no encryption needed)
    • Internal operations analysis

Supported API Endpoints

All major insert/update endpoints support optional encryption:

Endpoint Method Type Public Key Parameter
/insert_text POST REST public_key (body)
/collections/{name}/points PUT Qdrant public_key (per-point or request-level)
/files/upload POST Multipart public_key (form field)
insert_text - MCP Tool public_key (argument)
update_vector - MCP Tool public_key (argument)

Key Formats Supported

The system accepts public keys in multiple formats:

  • PEM: -----BEGIN PUBLIC KEY-----...-----END PUBLIC KEY-----
  • Base64: dGVzdCBrZXk=
  • Hex: 0123456789abcdef...
  • Hex with prefix: 0x0123456789abcdef...

Collection Configuration

Optional Encryption (Default)

By default, collections allow both encrypted and unencrypted payloads:

CollectionConfig {
    encryption: None  // Mixed mode allowed
}

Mandatory Encryption

Enforce encryption for all payloads:

CollectionConfig {
    encryption: Some(EncryptionConfig {
        required: true,
        allow_mixed: false,
    })
}

When encryption is required, unencrypted payloads will be rejected with an error.

Explicit Optional

Explicitly allow optional encryption:

CollectionConfig {
    encryption: Some(EncryptionConfig {
        required: false,
        allow_mixed: true,
    })
}

Encrypted Payload Structure

When a payload is encrypted, it's stored with this structure:

{
  "version": 1,
  "algorithm": "ECC-P256-AES256GCM",
  "nonce": "base64_encoded_nonce",
  "tag": "base64_encoded_auth_tag",
  "encrypted_data": "base64_encoded_payload",
  "ephemeral_public_key": "base64_encoded_ephemeral_key"
}

Fields

  • version: Format version (currently 1)
  • algorithm: Encryption algorithm identifier
  • nonce: AES-GCM nonce (12 bytes)
  • tag: AES-GCM authentication tag (16 bytes)
  • encrypted_data: Encrypted payload data
  • ephemeral_public_key: Server-generated ephemeral public key for ECDH (65 bytes uncompressed P-256)

Security Features

Zero-Knowledge Architecture

  • Server never stores decryption keys
  • Server cannot decrypt payloads
  • Only clients with the private key can decrypt
  • Perfect for sensitive data compliance (GDPR, HIPAA, etc.)

Ephemeral Keys

Each encryption operation generates a new ephemeral key pair:

  • Prevents key reuse attacks
  • Forward secrecy
  • Each payload has unique encryption

Authenticated Encryption

AES-256-GCM provides:

  • Confidentiality (encryption)
  • Integrity (authentication tag)
  • Protection against tampering

Usage Examples

Example 1: REST API with Encryption

# Insert encrypted text
curl -X POST http://localhost:15002/insert_text \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "confidential_docs",
    "text": "Confidential contract details",
    "metadata": {
      "category": "financial",
      "classification": "confidential"
    },
    "public_key": "BNxT8zqK1FYh3..."
  }'

Example 2: File Upload with Encryption

# Upload encrypted file chunks
curl -X POST http://localhost:15002/files/upload \
  -F "file=@contract.pdf" \
  -F "collection_name=legal_docs" \
  -F "public_key=BNxT8zqK1FYh3..." \
  -F "chunk_size=1000"

Example 3: Qdrant-Compatible with Per-Point Keys

# Upsert with different keys per point
curl -X PUT http://localhost:15002/collections/secure_data/points \
  -H "Content-Type: application/json" \
  -d '{
    "points": [
      {
        "id": "doc1",
        "vector": [0.1, 0.2, 0.3],
        "payload": {"data": "sensitive1"},
        "public_key": "key_for_doc1"
      },
      {
        "id": "doc2",
        "vector": [0.4, 0.5, 0.6],
        "payload": {"data": "sensitive2"},
        "public_key": "key_for_doc2"
      }
    ]
  }'

Example 4: MCP Tool Usage

{
  "tool": "insert_text",
  "arguments": {
    "collection_name": "private_notes",
    "text": "Personal confidential note",
    "metadata": {"category": "personal"},
    "public_key": "BNxT8zqK1FYh3..."
  }
}

Backward Compatibility

All endpoints continue to work without encryption:

# This still works - no encryption
curl -X POST http://localhost:15002/insert_text \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "my_collection",
    "text": "public document",
    "metadata": {"category": "public"}
  }'

Encryption is completely optional unless explicitly required at the collection level.

Performance

Encryption has minimal performance impact:

Operation Vectors Time Impact
Single encrypted insert 1 <1ms Negligible
Bulk encrypted insert 100 ~10ms ~0.1ms/vector
Concurrent (10 threads) 100 ~50ms Thread-safe
Large payload (10KB) 1 ~2ms Scales well

Real-World Use Cases

1. Healthcare (HIPAA Compliance)

Encrypt patient data payloads while keeping vectors searchable:

{
  "text": "Patient symptoms and medical history",
  "metadata": {
    "patient_id": "encrypted",
    "diagnosis": "encrypted"
  },
  "public_key": "hospital_public_key"
}

2. Financial Services

Protect transaction details while maintaining semantic search:

{
  "text": "Wire transfer for $50,000 to account...",
  "metadata": {
    "amount": 50000,
    "transaction_type": "wire"
  },
  "public_key": "bank_public_key"
}

3. Legal Documents

Encrypt case details with client-specific keys:

{
  "text": "Confidential settlement agreement...",
  "metadata": {
    "case_id": "2024-1234",
    "client": "encrypted"
  },
  "public_key": "client_specific_key"
}

4. Key Rotation

Rotate encryption keys over time for enhanced security:

# Old documents with old key
# New documents with new key
# Server never needs to know - client handles rotation

Testing

Comprehensive test coverage includes:

  • ✅ 26 integration tests
  • ✅ 3 unit tests
  • ✅ All API endpoints
  • ✅ Edge cases (empty payloads, large payloads, special characters)
  • ✅ Performance (100+ vectors, concurrent operations)
  • ✅ Security (enforcement, validation, key formats)
  • ✅ Backward compatibility

All tests pass: 29/29 (100%)

See TEST_SUMMARY.md for details.

Dependencies

Required Rust crates:

[dependencies]
p256 = "0.13"        # ECC-P256 cryptography
aes-gcm = "*"        # AES-256-GCM encryption
hex = "0.4"          # Hexadecimal encoding/decoding
base64 = "*"         # Base64 encoding/decoding
sha2 = "*"           # SHA-256 hashing

Implementation Files

Core implementation locations:

  • Core Module: src/security/payload_encryption.rs
  • Models: src/models/qdrant/point.rs
  • REST API: src/server/rest_handlers.rs
  • Qdrant API: src/server/qdrant_vector_handlers.rs
  • File Upload: src/server/file_upload_handlers.rs
  • MCP Tools: src/server/mcp_handlers.rs

Status

🟢 PRODUCTION READY

  • ✅ Complete implementation
  • ✅ Full test coverage
  • ✅ Zero-knowledge architecture
  • ✅ Backward compatible
  • ✅ All routes supported
  • ✅ Multiple key formats
  • ✅ Comprehensive documentation

License

Same as the main Vectorizer project (Apache-2.0).

Support

For questions or issues:


Last Updated: 2025-12-10 Version: v2.0.3 Status: Production Ready