Skip to content

updated some default postgres import behaviour#8

Merged
gkennos merged 14 commits into
mainfrom
postgres_load_fix
May 25, 2026
Merged

updated some default postgres import behaviour#8
gkennos merged 14 commits into
mainfrom
postgres_load_fix

Conversation

@gkennos
Copy link
Copy Markdown
Member

@gkennos gkennos commented May 18, 2026

CLOSES: #7, CLOSES: #9, CLOSES: #10

@gkennos gkennos requested a review from nicoloesch May 18, 2026 12:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Postgres-related vocabulary loading behaviours for OMOP_Alchemy 0.6.3: it removes the explicit psycopg2-binary dependency now that orm-loader>=0.4.0 no longer requires it, switches the Athena CSV import from literal to auto quote mode (fixing VARCHAR(255) overflow on quoted concept names), and makes load-vocab-source default to merge_strategy="replace" with a 100k chunk size. It also renames the CLI entry point from omop-maint to omop-alchemy, adds PostgreSQL integration tests (with a docker-compose harness and GitHub Actions workflow), commits a minimal Athena fixture set, and removes the obsolete notebooks directory.

Changes:

  • Switch vocabulary loading defaults: quote_mode="auto", merge_strategy="replace", default chunksize=100_000; drop psycopg2-binary extra and bump orm-loader to >=0.4.0.
  • Rename CLI entry point omop-maintomop-alchemy across code, docs, and CHANGELOG.
  • Add committed Athena fixtures, Postgres integration test suite + docker-compose harness + CI workflow, and remove old notebooks.

Reviewed changes

Copilot reviewed 35 out of 38 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pyproject.toml, uv.lock Bump version to 0.6.3, drop psycopg2-binary, require orm-loader>=0.4.0, register pytest postgres marker, rename script.
omop_alchemy/config.py Alias bare postgresql:// URL to psycopg driver, keep psycopg2 entry for clearer errors.
omop_alchemy/maintenance/load_vocab.py Default quote_mode="auto"; default chunksize=100_000; update detail strings.
omop_alchemy/maintenance/cli.py Default --merge-strategy to replace; default --chunksize to 100k (0 disables); rename CLI references.
omop_alchemy/maintenance/{ui,info,doctor,backup}.py Update user-facing strings from omop-maint to omop-alchemy.
tests/conftest.py Add pg_engine/pg_session fixtures; use uppercase fixture filenames; quote_mode="auto".
tests/test_load_vocab.py Rework against the new minimal committed fixture (e.g. assert on concept 8507 MALE).
tests/test_load_vocab_source.py Adapt fakes to new signatures; add quote-mode regression test (misnamed).
tests/test_load_vocab_postgres.py New end-to-end Postgres integration tests covering quote mode, schema, chunksize, replace vs upsert.
tests/test_config_driver.py New tests for driver mapping and _missing_driver_message (contains some dead code).
tests/fixtures/athena_source/*.csv Add minimal committed Athena fixtures (7 concepts).
tests/docker-compose.yaml, tests/README.md Add Postgres test container and updated test docs.
.github/workflows/tests.yml New CI workflow for unit+SQLite and Postgres integration jobs.
.gitignore Permit committed Athena fixtures; ignore notebooks/.
notebooks/*.ipynb, notebooks/concept_enums.py Delete obsolete notebooks.
docs/**, CHANGELOG.md Update CLI name in docs; document 0.6.3 changes (including breaking ones).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_config_driver.py Outdated
Comment thread tests/test_load_vocab_source.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 44 out of 47 changed files in this pull request and generated 1 comment.

Comment thread tests/test_config_driver.py Outdated
Copy link
Copy Markdown
Collaborator

@nicoloesch nicoloesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The three core fixes (quote mode, merge strategy default, psycopg2 removal) are correct and the logic is sound. The surrounding scope needs trimming before this is mergeable.


Changes requested

1. Remove all Docker changes

The Docker additions mix dev-environment concerns (non-root user setup, .venv embedded in the container path) with runtime image concerns. These belong in a separate, dedicated Docker track. Nothing in the stated fix scope requires touching docker/, and having it here creates maintenance confusion. Please revert docker/, .dockerignore, and .github/workflows/docker-python.yml.

NOTE: We need to think about how we package our things together, including the docker support. I am aware that I am a culprit of this too in omop-spires where I orchestrate them all but we need a better strategy here.

2. Remove initial_load and expose insert_if_empty as a first-class merge strategy

initial_load=True is a boolean that silently overrides merge_strategy to "insert_if_empty". This is indirect and adds a parameter interaction that is hard to reason about. The CLI is already complex and this makes it worse without adding meaningful value. The simpler fix is to accept "insert_if_empty" directly as a valid merge_strategy value, ideally constrained with a Literal type. Remove initial_load from both the Python API and the CLI.

3. Rework test fixtures to use in-memory generation

The static CSV files in tests/fixtures/athena_source/ and the shutil.copy calls in _make_concept_source are unnecessary maintenance surface. Generate fixture content in-memory in conftest using a factory function (e.g. build_athena_source(tmp_path, ...)) that writes tab-delimited strings to a temporary path. This is cleaner, removes the scattered static files, and makes the test intent explicit.

4. Remove tests/docker-compose.yaml

This is infrastructure scattered away from the rest of the project config. The GitHub Actions postgres integration tests use native service containers and do not depend on this file. It serves no purpose in the repository.


Minor

Quote the db_schema identifier in SET search_path (load_vocab.py:265)

connection.exec_driver_sql(f"SET search_path TO {db_schema}")

db_schema is interpolated directly without identifier quoting. A schema name with uppercase letters or special characters will produce unexpected behaviour. This is pre-existing but we are already in this file. A simple double-quote wrap with internal quote escaping is sufficient.

** CI postgres integration job only covers Python 3.12
The unit and SQLite jobs run a matrix over ["3.12", "3.13"] but the postgres job hardcodes 3.12. Extend the matrix or pin to the latest stable to avoid missing 3.13 regressions in the postgres path.

Passing

  • Quote mode fix (literal to auto) is correct and the regression test is well-constructed
  • Merge strategy default change (upsert to replace) is the right call and is clearly documented
  • psycopg2 dependency removal is clean; _missing_driver_message provides useful install hints
  • CLI rename (omop-maint to omop-alchemy) is the right long-term call
  • Chunksize default and 0 to None handling in the CLI are correct
  • tests.yml CI workflow structure is clean; keep it once the docker-compose dependency is removed

Comment thread .github/workflows/docker-python.yml Outdated
Comment thread .github/workflows/docker-python.yml Outdated
Comment thread docker/jupyter/Dockerfile Outdated
Comment thread docker/python/Dockerfile Outdated
Comment thread docker/python/Dockerfile Outdated
Comment thread tests/conftest.py Outdated
Comment thread tests/conftest.py
Comment thread tests/example-docker-compose.yaml Outdated
Comment thread tests/test_load_vocab_postgres.py Outdated
Comment thread pyproject.toml
@nicoloesch nicoloesch self-requested a review May 25, 2026 00:08
Copy link
Copy Markdown
Collaborator

@nicoloesch nicoloesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request changes are included and ready to push!

Copy link
Copy Markdown
Member Author

@gkennos gkennos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes accepted

@gkennos gkennos merged commit c4b7475 into main May 25, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

remove dependency on psycopg2 maintenance interface should not universally default to upsert postgres fast load failing on quote mode literal

3 participants