Product Updates

New LanceDB Enterprise Features:

  • Richer full-text search capabilities: Unlock advanced FTS with boolean logic, flexible phrase matching, and autocomplete-ready prefix queries.

  • Blazing-fast full-text search: The optimized FTS engine now delivers P99 latencies under 50 ms on 40M-row tables, keeping search fast at scale.

  • Streamlined Kubernetes deployments: Native Helm chart support makes BYOC setups faster and easier to manage. A deployment can be up and running in a couple of hours.

  • Smarter vector search with tight filters: Fine-tune recall with new minimum_nprobes and maximum_nprobes controls for better results on queries with highly selective filters.
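
To make the idea behind minimum_nprobes and maximum_nprobes concrete, here is a minimal conceptual sketch in plain Python. It is not LanceDB's implementation and does not call the LanceDB API; the function adaptive_ivf_search, the toy partitions, and the keep filter are all hypothetical names invented for illustration. It assumes the general behavior of a partitioned (IVF-style) index: always probe at least a minimum number of partitions, and keep probing up to a maximum when a selective filter leaves too few results.

```python
import math

def adaptive_ivf_search(query, partitions, k, keep, minimum_nprobes, maximum_nprobes):
    """Toy IVF-style search (illustrative only, not LanceDB code).

    partitions: list of (centroid, rows); rows: list of (row_id, vector, metadata).
    keep: predicate over metadata, standing in for a selective filter.
    Returns (top-k row ids, number of partitions actually probed).
    """
    # Visit partitions in order of centroid distance to the query.
    order = sorted(range(len(partitions)),
                   key=lambda i: math.dist(query, partitions[i][0]))
    hits, probed = [], 0
    for idx in order[:maximum_nprobes]:
        # Stop early only once the minimum is met AND the filter left enough rows.
        if probed >= minimum_nprobes and len(hits) >= k:
            break
        for row_id, vec, meta in partitions[idx][1]:
            if keep(meta):
                hits.append((math.dist(query, vec), row_id))
        probed += 1
    hits.sort()
    return [row_id for _, row_id in hits[:k]], probed

# Hypothetical data: three partitions, filter keeps only lang == "de".
partitions = [
    ((0.0, 0.0), [(1, (0.1, 0.0), {"lang": "en"}), (2, (0.0, 0.2), {"lang": "en"})]),
    ((5.0, 5.0), [(3, (5.0, 5.1), {"lang": "de"}), (4, (4.9, 5.0), {"lang": "en"})]),
    ((10.0, 10.0), [(5, (10.0, 10.0), {"lang": "de"})]),
]
only_german = lambda meta: meta["lang"] == "de"

ids, probed = adaptive_ivf_search((0.0, 0.0), partitions, k=1, keep=only_german,
                                  minimum_nprobes=1, maximum_nprobes=3)
capped_ids, capped_probed = adaptive_ivf_search((0.0, 0.0), partitions, k=1,
                                                keep=only_german,
                                                minimum_nprobes=1, maximum_nprobes=1)
```

In this toy run, the nearest partition contains no rows that pass the filter, so the search continues into a second partition before it can return a match. With the maximum capped at one partition, the same query returns nothing, which is the recall-versus-latency trade-off the two controls are meant to expose.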

Open Source Updates

Lance & LanceDB OSS Releases:

Lance Version: v0.31.0

Breaking changes: the Dataset config API has been refactored and is now exposed via pylance.

Lance Version: v0.30.0

Breaking changes: indexes are now auto-remapped before a scan, and the file metadata cache is now sized by byte capacity.

LanceDB Version: v0.21.0

Various improvements to native full-text search, which is now the default.

New documentation site: lancedb.com/documentation

Lance Trino connector and PostgreSQL extension.

Events and Community Recap

From Text to Video: A Unified Multimodal Data Lake for Next-Gen AI

 

Ryan Vilim from Character AI shares how their Data & AI Platform team builds self-service tools and infrastructure to power LLM training and research. He explains how they leverage data lakes, Spark, Trino, Kubernetes, and Lance to prepare, annotate, and serve massive multimodal datasets—while keeping workflows fast and researcher-friendly.

The talk also covers why Lance’s open multimodal lakehouse format fits their needs, enabling unified storage, search, and analytics at scale. Packed with practical insights on managing AI data pipelines, this session is perfect for anyone building or scaling AI systems.

Building a Data Foundation for Multimodal Foundation Models

Ethan Rosenthal from Runway delivers an in-depth exploration of the unsung heroes behind generative models: data pipelines.

Ethan shares pragmatic lessons on system design, covering topics such as columnar storage formats (moving beyond tarballs and Parquet), schema evolution with LanceDB, efficient video decoding, asynchronous data loading to eliminate GPU bottlenecks, and on-the-fly augmentation for flexible experimentation.

A Thank You to Our Contributors

 

A heartfelt thank you to the community contributors to the Lance and LanceDB OSS projects this past month:

@renato2099, @wojiaodoubao, @Jay-ju, @b4l, @yanghua, @HaochengLIU, @ddupg, @bjurkovski, @kilavvy, @leaves12138, @majin1102, @leopardracer, @luohao, @KazuhitoT, @frankliee