App Separation & Production Hardening

NetZissou released this 03 Mar 21:14 (commit 4e4ef6d)

Highlights

  • Two standalone Streamlit apps: Split the monolithic app.py into apps/embed_explore/ (embed and cluster your own images) and apps/precalculated/ (explore precomputed embeddings), each with its own dedicated entry point.
  • Shared module architecture: Common code is extracted into shared/{components,services,utils,lib}/ so both apps stay DRY.
  • GPU-to-CPU fallback: Clustering automatically falls back through cuML → FAISS → sklearn on out-of-memory (OOM) or CUDA errors.
  • cuML UMAP stability: UMAP runs in an isolated subprocess on L2-normalized embeddings to prevent SIGFPE crashes.
  • Lazy-loaded heavy libraries: Imports of torch, cuML, and FAISS are deferred until first use, greatly reducing startup time.
  • Improved visualization: Clustering charts gain zoom/pan support and a heatmap option, and chart interactions no longer trigger full-page reruns.
  • Dynamic metadata filtering: The precalculated app auto-generates filters from the parquet schema.
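The GPU-to-CPU fallback amounts to trying clustering backends in order and moving on only when a failure looks hardware- or install-related. The sketch below is a minimal illustration, not the project's code: `cluster_with_fallback`, `_is_recoverable`, and the fake backends are hypothetical, and matching on "cuda" or "out of memory" in the exception message is an assumption about how cuML and FAISS report these failures.

```python
import logging

logger = logging.getLogger(__name__)


def _is_recoverable(exc):
    """Assumed heuristic: a missing library, OOM, or CUDA failure justifies trying the next backend."""
    if isinstance(exc, (ImportError, MemoryError)):
        return True
    message = str(exc).lower()
    return "cuda" in message or "out of memory" in message


def cluster_with_fallback(embeddings, n_clusters, backends):
    """Try each (name, fn) backend in order and return (name, labels) from the first that succeeds."""
    failures = []
    for name, fn in backends:
        try:
            return name, fn(embeddings, n_clusters)
        except Exception as exc:
            if not _is_recoverable(exc):
                raise  # real bugs should surface instead of being hidden by a slower backend
            logger.warning("%s backend failed (%s); falling back", name, exc)
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all clustering backends failed: " + "; ".join(failures))


if __name__ == "__main__":
    # Hypothetical stand-ins for cuML, FAISS, and scikit-learn implementations.
    def fake_cuml(x, k):
        raise RuntimeError("CUDA error: out of memory")

    def fake_faiss(x, k):
        raise ImportError("No module named 'faiss'")

    def fake_sklearn(x, k):
        return [i % k for i in range(len(x))]

    backends = [("cuml", fake_cuml), ("faiss", fake_faiss), ("sklearn", fake_sklearn)]
    print(cluster_with_fallback([[0.0], [1.0], [2.0]], 2, backends))  # ('sklearn', [0, 1, 0])
```

Re-raising anything that does not look like a resource or install problem keeps the fallback from silently masking genuine bugs behind a slower CPU path.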
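The UMAP isolation idea can be sketched with the standard library alone: scale each embedding to unit length, then run the reducer in a fresh Python process so that a native crash such as SIGFPE kills only the child. Everything here is an assumption for illustration: `l2_normalize`, `run_isolated`, `DEMO_WORKER`, and the JSON-over-stdin protocol are hypothetical, and a real worker would import cuML and call UMAP where the demo worker only computes row norms.

```python
import json
import math
import subprocess
import sys


def l2_normalize(rows):
    """Scale each embedding to unit Euclidean length; all-zero rows are returned unchanged."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized


def run_isolated(worker_code, payload, timeout=600):
    """Run worker_code in a fresh interpreter, sending payload as JSON on stdin.

    The worker prints its JSON result to stdout. If the child crashes (on POSIX a
    fatal signal shows up as a negative return code, e.g. -8 for SIGFPE), the
    parent raises RuntimeError instead of dying with it.
    """
    proc = subprocess.run(
        [sys.executable, "-c", worker_code],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"worker exited with code {proc.returncode}: {proc.stderr[-500:]}")
    return json.loads(proc.stdout)


# Hypothetical demo worker; a real one would import cuML's UMAP here instead.
DEMO_WORKER = """
import json, sys
rows = json.load(sys.stdin)
print(json.dumps([round(sum(x * x for x in r) ** 0.5, 6) for r in rows]))
"""

if __name__ == "__main__":
    print(run_isolated(DEMO_WORKER, l2_normalize([[3.0, 4.0], [0.0, 0.0]])))  # [1.0, 0.0]
```

Passing data as JSON keeps the sketch dependency-free; for large embedding matrices, a temporary `.npy` file is the more practical channel between parent and child.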
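Deferring heavy imports usually means moving `import torch` (and similar) out of module scope and into the code paths that need it. Below is a minimal sketch of one way to centralize that. It assumes nothing about the project's actual helpers: `lazy_import`, `is_available`, and `get_faiss` are hypothetical names. `importlib.util.find_spec` reports whether a top-level package is installed without executing it, so the availability check does not undo the saving.

```python
import importlib
import importlib.util
from functools import lru_cache


def is_available(module_name):
    """Check whether a top-level package is installed without importing it."""
    return importlib.util.find_spec(module_name) is not None


@lru_cache(maxsize=None)
def lazy_import(module_name):
    """Import module_name the first time it is requested; later calls hit the cache."""
    return importlib.import_module(module_name)


def get_faiss():
    # Hypothetical accessor: FAISS is imported only when a clustering path needs it,
    # so launching the app never pays for it.
    if not is_available("faiss"):
        raise ImportError("faiss is not installed; use a CPU backend instead")
    return lazy_import("faiss")
```

Because Python already caches modules in `sys.modules`, the `lru_cache` is not strictly required; it just makes the one-import-per-process intent explicit.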
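Generating filters from a parquet schema comes down to mapping each column's type to a widget kind. The sketch below assumes column types arrive as the strings pyarrow prints, for example from `[(f.name, str(f.type)) for f in pyarrow.parquet.read_schema(path)]`. The names `filter_kind` and `build_filters`, the widget-kind labels, and the sample column names are all hypothetical, and the precalculated app may map types differently.

```python
NUMERIC_PREFIXES = ("int", "uint", "halffloat", "float", "double", "decimal")


def filter_kind(arrow_type):
    """Map an Arrow type string to a filter widget kind, or None if no filter applies."""
    t = arrow_type.lower()
    if t == "bool":
        return "checkbox"
    if t.startswith(NUMERIC_PREFIXES):
        return "range"  # e.g. a min/max slider
    if t.startswith(("timestamp", "date")):
        return "date_range"
    if t in ("string", "large_string"):
        return "multiselect"  # e.g. pick from the column's distinct values
    return None  # lists (such as embedding vectors), structs, binary, ...


def build_filters(schema):
    """Return {column: widget_kind} for every column that can be filtered."""
    filters = {}
    for name, arrow_type in schema:
        kind = filter_kind(arrow_type)
        if kind is not None:
            filters[name] = kind
    return filters


if __name__ == "__main__":
    schema = [
        ("species", "string"),
        ("year", "int64"),
        ("latitude", "double"),
        ("captured_at", "timestamp[us]"),
        ("embedding", "list<item: float>"),
    ]
    print(build_filters(schema))
    # {'species': 'multiselect', 'year': 'range', 'latitude': 'range', 'captured_at': 'date_range'}
```

Deriving filters from the schema rather than hard-coding them means a new metadata column shows up in the sidebar without code changes, while the embedding column itself is skipped because list types get no filter.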

Documentation

  • Added BACKEND_PIPELINE.md and DATA_FORMAT.md, plus .github/copilot-instructions.md to guide testing and code review
  • Updated README with two-app workflow and simplified install instructions

Dependencies

  • Relaxed numpy cap from <=2.2.0 to <2.3 (numba compatibility)
  • Added a separate CLI entry point for each app via pyproject.toml
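The practical effect of relaxing the numpy cap is that 2.2.x patch releases after 2.2.0 become installable, while 2.3 stays excluded. Here is a minimal, dependency-free sketch of the comparison, assuming plain numeric `X.Y.Z` versions. Real installers apply PEP 440 rules (implemented by `SpecifierSet` in the `packaging` library), which also cover pre-releases and other cases ignored here. The helper names are hypothetical.

```python
def parse(version):
    """Turn 'X.Y[.Z]' into a 3-tuple of ints, padding missing parts with 0 (as PEP 440 does for release segments)."""
    parts = [int(p) for p in version.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def allowed_old(version):
    return parse(version) <= (2, 2, 0)  # numpy<=2.2.0


def allowed_new(version):
    return parse(version) < (2, 3, 0)  # numpy<2.3


if __name__ == "__main__":
    for v in ["2.1.3", "2.2.0", "2.2.1", "2.3.0"]:
        print(v, allowed_old(v), allowed_new(v))
```

Only the 2.2.1 row flips from False to True between the two specifiers, which is exactly the range the relaxed cap opens up.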
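Separate entry points let each app be launched by name after installation. The sketch below shows one common pattern, under stated assumptions. A `[project.scripts]` table in pyproject.toml maps a command name to a `module:function`, and that function shells out to `python -m streamlit run` for its app. The command names, module paths, the `app.py` filename, and the `streamlit_command` helper are hypothetical; the real names live in the project's pyproject.toml.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical pyproject.toml mapping (command and module names are illustrative):
# [project.scripts]
# embed-explore = "apps.embed_explore.cli:main"
# precalculated = "apps.precalculated.cli:main"


def streamlit_command(app_file, extra_args=()):
    """Build the `python -m streamlit run <file> [flags]` command for one app."""
    return [sys.executable, "-m", "streamlit", "run", str(app_file), *extra_args]


def main():
    # Hypothetical layout: the app's Streamlit script sits next to this launcher module.
    app_file = Path(__file__).with_name("app.py")
    # Forward extra flags such as --server.port 8502 to streamlit.
    raise SystemExit(subprocess.call(streamlit_command(app_file, sys.argv[1:])))
```

Launching through `sys.executable -m streamlit` ties each command to the same environment the package was installed into, rather than whichever `streamlit` happens to be first on `PATH`.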