Rewrite SQLiteStorageProvider to use SQLAlchemy

## Motivation

The current `SQLiteStorageProvider` uses raw `sqlite3` with dynamic SQL generation, making it tightly coupled to SQLite. Rewriting it with SQLAlchemy would allow using PostgreSQL, MySQL, and other databases for crawl data storage — useful for larger deployments and integration with existing infrastructure.

## Current State

**`SQLiteStorageProvider`** (`openwpm/storage/sql_provider.py`):
- Raw `sqlite3` module, no ORM
- Dynamic `INSERT INTO table (col1, ...) VALUES (?, ...)` generation per record
- Lossy type coercion: `bytes` → `str`, `callable` → `str`, `dict` → `json.dumps`
- Silent error logging on constraint violations (OperationalError, IntegrityError, etc.)
- No batching — one INSERT per record
- No indexes defined in `schema.sql`
- 15 tables defined in `openwpm/storage/schema.sql`

**Schema files that must stay in sync:**
- `openwpm/storage/schema.sql` (SQLite DDL)
- `openwpm/storage/parquet_schema.py` (Arrow/Parquet)
- `test/storage/test_values.py` (test data)
- `docs/Schema-Documentation.md`

## Requirements

### Must Have
- [ ] SQLAlchemy ORM models matching the existing 15-table SQLite schema exactly
- [ ] `SQLAlchemyStorageProvider(StructuredStorageProvider)` using SQLAlchemy Core or ORM
- [ ] Support SQLAlchemy connection URLs (equivalent to JDBC URLs), e.g.:
  - `sqlite:///path/to/crawl.db`
  - `postgresql://user:pass@localhost/openwpm`
- [ ] Schema compatibility: the SQLite schema produced must be identical to the current `schema.sql` (column names, types, constraints, foreign keys) unless there is strong reasoning to diverge
- [ ] Tests against local PostgreSQL (can use `pytest-postgresql` or Docker fixture)
- [ ] Existing parametrized storage tests pass with the new provider
- [ ] Backward compatibility: `SQLiteStorageProvider` API surface unchanged for existing users

### Nice to Have
- [ ] Batch inserts for better write throughput
- [ ] Connection pooling via SQLAlchemy engine
- [ ] Index definitions for common query patterns
- [ ] Async support via SQLAlchemy 2.0+ async engine

### Out of Scope
- Migrating Arrow/Parquet providers to SQLAlchemy
- Changing the `UnstructuredStorageProvider` interface
- Schema migrations tooling (Alembic) — can be a follow-up

## Implementation Notes

**Interface contract** — the provider must implement:
```python
class SQLAlchemyStorageProvider(StructuredStorageProvider):
    async def init(self) -> None: ...
    async def store_record(self, table: TableName, visit_id: VisitId, record: Dict[str, Any]) -> None: ...
    async def finalize_visit_id(self, visit_id: VisitId, interrupted: bool) -> Optional[Task]: ...
    async def flush_cache(self) -> None: ...
    async def shutdown(self) -> None: ...
```

**Connection URL approach:**
SQLAlchemy uses [database URLs](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls) which are the Python equivalent of JDBC URLs:
```
dialect+driver://username:password@host:port/database
```

**Suggested implementation approach:**
Use an adversarial two-agent workflow: one agent builds the implementation, a second agent reviews to ensure the SQLite schema remains exactly as-is unless strong reasoning is provided for changes.

## Related

- #561 — Rewrite DataAggregators (closed, partially addressed by current storage system)
- Current providers: `SQLiteStorageProvider`, `LocalArrowProvider`, `MemoryStructuredProvider`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite SQLiteStorageProvider to use SQLAlchemy #1143

Motivation

Current State

Requirements

Must Have

Nice to Have

Out of Scope

Implementation Notes

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Rewrite SQLiteStorageProvider to use SQLAlchemy #1143

Description

Motivation

Current State

Requirements

Must Have

Nice to Have

Out of Scope

Implementation Notes

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions