🤗 Lance x Hugging Face, Git-Style Branching, 🏔️ Geospatial in Lance
February Newsletter • March 4, 2026
Highlights
🤗 Lance x Hugging Face: A New Era of Sharing Multimodal Data on the Hub
Lance now has native support on the Hugging Face Hub, allowing multimodal datasets (blobs, embeddings, and indexes) to be published as a single searchable artifact that can be scanned via the 🤗 datasets API or Lance’s dataset API to run SQL and vector search.
Branching and Shallow Cloning in Lance: Towards a “Git for AI Data”
Lance introduces a branching model for large-scale AI experimentation with independent manifest trees, physical branch isolation, immutable tags, and zero-copy shallow clones—avoiding shared metadata bottlenecks while enabling full time travel and high-concurrency experimentation.
🏔️ How We Added Geospatial Support To Lance With No New Code
Geospatial support landed in Lance with no storage format changes—GeoArrow types work end-to-end, geospatial functions run in DataFusion, and a production R-Tree index accelerates spatial predicates alongside vector search and SQL.
Huge thanks for the contributions from teams at ByteDance, Uber, Rerun, GeoArrow & GeoDataFusion, and Apache DataFusion.
We’re bringing together executive leaders building AI-native developer tools for a candid discussion on what it actually takes to move agentic systems from demo to dependable production infrastructure.
Join us for a technical walkthrough of the infrastructure behind Exa’s AI search engine, exploring how Lance and Ray power distributed embedding pipelines and semantic retrieval across billions of documents.
Automatic backfill is now available for preview in Feature Engineering.
⚙️ Improved Indexer and Service Stability
Enhancements to indexer parallelism, debugging tools, and failover behavior improve reliability and allow jobs and queries to continue running efficiently under partial failures or resource constraints.
📊 Cache Statistics in Query Plan Analyses
Query plan analyses now include counts of cache hits and misses, making it easier to understand query performance.
🗂 APIs for Versioned Direct Writes to Storage
New TableVersions APIs allow systems to write data directly to object storage while updating metadata managed by Query Nodes.
📦 Format v2 becomes the default with a new manifest design and improved storage options APIs
⚡ Faster search and indexing: ~20% faster FTS indexing, up to 15× faster WAND scoring, and ~30% faster HNSW index builds
🚀 Faster ingestion with parallel inserts and a new writer path for local tables
🔍 Improved observability with hybrid search explain plans showing reranker details
🧠 Context APIs now support embeddings, bot IDs, and session IDs for richer agent memory
🔄 Expanded context management workflows for conversational and agent systems
A huge thank you to contributors from Uber, ByteDance, Netflix, Twitter, Huawei, and more!
Read the full newsletter for more updates on lance-namespace and lance-duckdb.
🎉 Announcing 3 new Lance Maintainers: Beinan Wang (Uber), Jinglun (ByteDance Volcano Engine), Wyatt Alt (LanceDB)
February’s Lance Community Syncs focused on scaling the Lance file format and stabilizing the 2.x release line, including a hint file to eliminate manifest scans, a multi-tenant cache redesign, and experimental column statistics for predicate pushdown.
The next Lance Community Sync will take place on March 12, 2026.