Update lance related deps: Handle NaN values with pyarrow based writing. by mtauraso · Pull Request #947 · lincc-frameworks/hyrax

mtauraso · 2026-06-11T21:42:57Z

Motivation

- LanceDB >= 0.30 rejects NaN values in fixed-size-list vector columns, so writing must use `pylance` to preserve NaNs and keep the existing fixed-size-list schema.
- Align `lance-namespace` and `pylance` version constraints with the `pylance` runtime that supports the required namespace behavior.

Description

- Pin dependencies in `pyproject.toml`: set `lancedb < 0.34.0`, `lance-namespace >= 0.7.7, < 0.8.0`, and `pylance >= 7.0.0, < 8.0.0` to ensure compatibility with pylance-based Arrow writes.
- Add `import lance` and change `ResultDatasetWriter` to convert batches to Arrow tables and write them with `lance.write_dataset(...)` instead of using `table.add(...)`, so the fixed-size-list `data` column and NaNs are preserved.
- Remove creation of an empty Lance table via `lancedb.create_table` and instead consistently write Arrow tables with a schema created from the first sample tensor; validate tensor dtype and shape on subsequent batches.
- In `commit()`, call `lancedb.connect(...).open_table(TABLE_NAME).optimize()` to optimize the final table.
- Update tests in `tests/hyrax/test_result_dataset.py` to assert the created `data` column is a fixed-size-list with the expected `list_size` and to exercise multi-batch writes via `ResultDataset` loading.

Testing

Ran `pytest tests/hyrax/test_result_dataset.py`, which exercises `test_writer_basic`, `test_writer_multiple_batches`, and multidimensional tensor writing; all tests passed.

…lated deps

### Motivation

- LanceDB's `table.add()` path in versions >= 0.30 rejects NaN values in fixed-size-list/vector columns, so writing through `pylance` is needed to preserve NaNs and maintain the existing fixed-size-list on-disk format.
- Upstream package compatibility changed around `lance-namespace` and `pylance`, so dependency caps were adjusted to track a compatible set of releases until broader support is available.

### Description

- Add `import lance` and switch `ResultDatasetWriter.write_batch` to write Arrow batches via `lance.write_dataset(...)` instead of creating an empty table and using `table.add()` through `lancedb`, preserving fixed-size-list schemas and NaNs.
- Track the current dataset handle via `self.lance_dataset` and `self.lance_uri` and append subsequent batches with mode switching (`overwrite` for the first write, `append` thereafter).
- Change `commit` to call `lancedb.connect(...).open_table(TABLE_NAME).optimize()` for optimization after writes.
- Update `pyproject.toml` dependency pins to `lancedb < 0.34.0`, `lance-namespace >= 0.7.7, < 0.8.0`, and `pylance >= 7.0.0, < 8.0.0` to reflect the compatibility window used by the new write path.
- Update tests in `tests/hyrax/test_result_dataset.py` to import `pyarrow as pa` and validate that the stored `data` field is a `fixed_size_list` with the expected list size.

### Testing

- Ran the updated unit tests for the result dataset file with `pytest tests/hyrax/test_result_dataset.py` and the tests completed successfully.
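
The mode switching described in the commit message (`overwrite` for the first write, `append` thereafter) can be sketched without any Lance dependency. In the sketch below, `AppendingWriterSketch` and its `write_fn` callback are hypothetical stand-ins: `write_fn` takes the place of the real `lance.write_dataset(...)` call, and rows are plain Python lists rather than tensors. It illustrates the control flow only and is not the hyrax writer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AppendingWriterSketch:
    """Illustrative control flow only; all names here are hypothetical."""

    # Stand-in for the real write call; receives (rows, mode).
    write_fn: Callable[[list[list[float]], str], None]
    list_size: Optional[int] = None  # fixed by the first sample seen
    batches_written: int = 0

    def write_batch(self, rows: list[list[float]]) -> str:
        if not rows:
            raise ValueError("cannot write an empty batch")

        # The first sample defines the expected size; later batches must match.
        expected = self.list_size if self.list_size is not None else len(rows[0])
        for i, row in enumerate(rows):
            if len(row) != expected:
                raise ValueError(
                    f"row {i} has length {len(row)}, expected {expected}"
                )

        # Replace any stale dataset on the first write, then append.
        mode = "overwrite" if self.batches_written == 0 else "append"
        self.write_fn(rows, mode)

        self.list_size = expected
        self.batches_written += 1
        return mode


calls: list[tuple[str, int]] = []
writer = AppendingWriterSketch(
    write_fn=lambda rows, mode: calls.append((mode, len(rows)))
)
writer.write_batch([[1.0, float("nan")], [3.0, 4.0]])
writer.write_batch([[5.0, 6.0]])
print(calls)  # [('overwrite', 2), ('append', 1)]
```

Injecting the write call keeps the sketch runnable on its own; in a real writer the callback would be replaced by the actual dataset write, and a mismatched batch would be rejected before anything reaches disk.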

codecov · 2026-06-11T21:50:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.79%. Comparing base (61d4977) to head (288805e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #947      +/-   ##
==========================================
- Coverage   63.80%   63.79%   -0.02%     
==========================================
  Files          74       74              
  Lines        7689     7686       -3     
==========================================
- Hits         4906     4903       -3     
  Misses       2783     2783

drewoldag

Looks ok to me.

mtauraso added the codex label Jun 11, 2026 — with ChatGPT Codex Connector

mtauraso changed the title ~~Preserve fixed-size-list vectors when writing Lance datasets via pylance and update lancedb/pylance pins~~ Update lance related deps: Handle NaN values with pyarrow based writing. Jun 11, 2026

Merge branch 'main' into codex/review-package-upgrades-for-lance-c3gstp

6227041

mtauraso requested a review from drewoldag June 11, 2026 21:43

mtauraso self-assigned this Jun 11, 2026

Merge branch 'main' into codex/review-package-upgrades-for-lance-c3gstp

288805e

drewoldag approved these changes Jun 22, 2026

mtauraso merged commit 8504128 into main Jun 22, 2026
8 checks passed

mtauraso deleted the codex/review-package-upgrades-for-lance-c3gstp branch June 22, 2026 22:05

mtauraso merged 3 commits into main from codex/review-package-upgrades-for-lance-c3gstp
