Update lance related deps: Handle NaN values with pyarrow based writing. #947

Merged
mtauraso merged 3 commits into main from codex/review-package-upgrades-for-lance-c3gstp on Jun 22, 2026

@mtauraso

Member

Motivation

  • LanceDB >= 0.30 rejects NaN values in fixed-size-list vector columns, so writing must use pylance to preserve NaNs and keep the existing fixed-size-list schema.
  • Align lance-namespace and pylance version constraints with the pylance runtime that supports the required namespace behavior.

Description

  • Pin dependencies in pyproject.toml: lancedb < 0.34.0; lance-namespace >= 0.7.7, < 0.8.0; and pylance >= 7.0.0, < 8.0.0, to ensure compatibility with pylance-based Arrow writes.
  • Add import lance and change ResultDatasetWriter to convert batches to Arrow tables and write them with lance.write_dataset(...) instead of using table.add(...), so the fixed-size-list data column and NaNs are preserved.
  • Remove creation of an empty Lance table via lancedb.create_table and instead consistently write Arrow tables with a schema created from the first sample tensor; validate tensor dtype and shape on subsequent batches.
  • In commit(), call lancedb.connect(...).open_table(TABLE_NAME).optimize() to optimize the final table.
  • Update tests in tests/hyrax/test_result_dataset.py to assert the created data column is a fixed-size-list with the expected list_size and to exercise multi-batch writes via ResultDataset loading.
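
The write path described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TABLE_NAME`'s value, the helper names, the `float32` element type, and the `<db_dir>/<TABLE_NAME>.lance` layout are all assumptions, and the row-length check only stands in for the dtype and shape validation the real writer performs. The third-party imports are kept inside the functions that need them so the pure-Python helpers can be exercised on their own.

```python
TABLE_NAME = "results"  # hypothetical value; the PR text does not show the real one


def write_mode(batches_written):
    """Return "overwrite" for the first write and "append" thereafter."""
    return "overwrite" if batches_written == 0 else "append"


def check_rows(rows, list_size):
    """Ensure every row has exactly list_size values; NaN values are allowed."""
    checked = []
    for i, row in enumerate(rows):
        if len(row) != list_size:
            raise ValueError(f"row {i} has {len(row)} values, expected {list_size}")
        checked.append([float(v) for v in row])
    return checked


def write_batch(db_dir, rows, list_size, batches_written):
    """Write one batch as an Arrow table with a fixed-size-list "data" column."""
    import lance
    import pyarrow as pa

    # pa.list_ with an explicit list_size yields a fixed-size-list type.
    data_type = pa.list_(pa.float32(), list_size)
    table = pa.table({"data": pa.array(check_rows(rows, list_size), type=data_type)})

    # Assumed layout: the dataset lives where lancedb would look for the table.
    uri = f"{db_dir}/{TABLE_NAME}.lance"
    return lance.write_dataset(table, uri, mode=write_mode(batches_written))


def commit(db_dir):
    """Optimize the final table through lancedb, as the description states."""
    import lancedb

    lancedb.connect(db_dir).open_table(TABLE_NAME).optimize()
```

Under these assumptions, calling `write_batch(db_dir, rows, list_size, 0)` creates the dataset, later calls with a non-zero `batches_written` append to it, and `commit(db_dir)` runs the optimization step once all batches are written.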

Testing

  • Ran pytest tests/hyrax/test_result_dataset.py which exercises test_writer_basic, test_writer_multiple_batches, and multidimensional tensor writing, and all tests passed.

Codex Task

…lated deps

### Motivation
- LanceDB's `table.add()` path in versions >= 0.30 rejects NaN values in fixed-size-list/vector columns, so writing through `pylance` is needed to preserve NaNs and maintain the existing fixed-size-list on-disk format.
- Upstream package compatibility changed around `lance-namespace` and `pylance`, so dependency caps were adjusted to track a compatible set of releases until broader support is available.

### Description
- Add `import lance` and switch `ResultDatasetWriter.write_batch` to write Arrow batches via `lance.write_dataset(...)` instead of creating an empty table and using `table.add()` through `lancedb`, preserving fixed-size-list schemas and NaNs.
- Track the current dataset handle via `self.lance_dataset` and `self.lance_uri` and append subsequent batches with mode switching (`overwrite` for the first write, `append` thereafter).
- Change `commit` to call `lancedb.connect(...).open_table(TABLE_NAME).optimize()` for optimization after writes.
- Update `pyproject.toml` dependency pins to `lancedb < 0.34.0`, `lance-namespace >= 0.7.7, < 0.8.0`, and `pylance >= 7.0.0, < 8.0.0` to reflect the compatibility window used by the new write path.
- Update tests in `tests/hyrax/test_result_dataset.py` to import `pyarrow as pa` and validate that the stored `data` field is a `fixed_size_list` with the expected list size.
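
As a rough illustration of the compatibility window in the pins above, the following sketch checks installed versions against those bounds using only the standard library. The helper names are hypothetical, and the version parsing is a simplification that reads only the leading numeric components, so it is not a substitute for a real PEP 440 resolver.

```python
from importlib.metadata import PackageNotFoundError, version

# Windows copied from the pins listed above:
# (inclusive lower bound or None, exclusive upper bound).
PIN_WINDOWS = {
    "lancedb": (None, (0, 34, 0)),
    "lance-namespace": ((0, 7, 7), (0, 8, 0)),
    "pylance": ((7, 0, 0), (8, 0, 0)),
}


def parse_release(text):
    """Parse leading numeric components, padded to three: "7.1" -> (7, 1, 0)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            # Keep any leading digits of a piece like "0rc1", then stop.
            digits = ""
            for ch in piece:
                if not ch.isdigit():
                    break
                digits += ch
            if digits:
                parts.append(int(digits))
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_window(installed, lower, upper):
    """Return True when lower <= installed < upper (a None bound is unbounded)."""
    release = parse_release(installed)
    if lower is not None and release < lower:
        return False
    if upper is not None and release >= upper:
        return False
    return True


def check_installed():
    """Map each pinned package to True/False, or None when it is not installed."""
    report = {}
    for name, (lower, upper) in PIN_WINDOWS.items():
        try:
            report[name] = in_window(version(name), lower, upper)
        except PackageNotFoundError:
            report[name] = None
    return report
```

Calling `check_installed()` in the project environment gives a quick view of whether each package falls inside its window.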

### Testing
- Ran the updated unit tests for the result dataset file with `pytest tests/hyrax/test_result_dataset.py` and the tests completed successfully.
@mtauraso mtauraso changed the title from "Preserve fixed-size-list vectors when writing Lance datasets via pylance and update lancedb/pylance pins" to "Update lance related deps: Handle NaN values with pyarrow based writing." on Jun 11, 2026
@mtauraso mtauraso requested a review from drewoldag June 11, 2026 21:43
@mtauraso mtauraso self-assigned this Jun 11, 2026
@codecov

codecov Bot commented Jun 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.79%. Comparing base (61d4977) to head (288805e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #947      +/-   ##
==========================================
- Coverage   63.80%   63.79%   -0.02%     
==========================================
  Files          74       74              
  Lines        7689     7686       -3     
==========================================
- Hits         4906     4903       -3     
  Misses       2783     2783              

@drewoldag drewoldag left a comment

Collaborator

Looks ok to me.

@mtauraso mtauraso merged commit 8504128 into main Jun 22, 2026
8 checks passed
@mtauraso mtauraso deleted the codex/review-package-upgrades-for-lance-c3gstp branch June 22, 2026 22:05