@benwtrent.bsky.social
Doer of things | Builder of things | software engineer @elastic
The @hf.co community is awesome. Real work that moves everyone forward: huggingface.co/blog/rteb
01.10.2025 16:22
Apache Lucene 10.3.0 is released! 40% faster lexical search is absolutely crazy for a project that has been doing lexical search for a quarter of a century lucene.apache.org/core/corenew...
19.09.2025 20:20
Storing floating point values as a big 'ole JSON blob is silly, so we stopped doing that. Great stuff from Jim on making vector search in Elasticsearch substantially cheaper! www.elastic.co/search-labs/...
27.08.2025 13:42
Next in the series of building a search engine from scratch - we focus on hybrid retrieval with @benwtrent.bsky.social of Elastic.
How do you add filtering to a vector search index?
I'll code. He'll yell at me.
maven.com/p/430592/hyb...
Sounds like fun!
15.05.2025 23:34
It's time to redo benchmarks! #Lucene 10.2 was just released, with
- huge speedups to non-scoring boolean queries, range queries and filtered vector search,
- better merging defaults for faster search,
- much faster merging of vectors
And more...
lucene.apache.org/core/corenew...
Lucene will now intelligently merge HNSW graphs: elastic.co/search-labs/... Now indexing and merging is much cheaper, reducing the compute required and improving indexing throughput
08.04.2025 12:57
Indexing and merging times are getting better for #Apache #Lucene vector search. Lucene has a read-only segment architecture. One of the drawbacks of this approach is throwing away previously completed work when merging HNSW graphs. Well, this got better :)
08.04.2025 12:57
Read more about it here: elastic.co/search-labs/...
And yes, my child did the header art work. I much prefer it to yet another piece of AI generated guff. Though, the "acorn" that the "squirrel" is holding got cropped out.
With this new algorithm, we have seen 3-5x fewer vector operations to achieve the same recall on previously horribly performing filter percentages.
28.02.2025 15:39
We have implemented a variation of ACORN-1: arxiv.org/abs/2403.04871 The key idea is expanding your HNSW neighborhood search, and only scoring candidates that match your filter criteria.
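A rough illustration of that idea, as a minimal sketch rather than Lucene's implementation: it assumes a single-layer neighbor graph instead of a full HNSW hierarchy, Euclidean distance, and a hypothetical `accept` predicate standing in for the filter. Neighbors that fail the filter are never scored; their own neighbors are considered in their place.

```python
import heapq
import math


def filtered_search(vectors, graph, query, accept, entry, k):
    """Best-first search over a one-layer neighbor graph.

    Only documents for which accept(doc) is true are ever scored.
    Returns (top-k doc IDs ordered nearest first, number of vectors scored).
    """
    visited = set()
    candidates = []  # min-heap of (distance, doc) still to explore
    results = []     # max-heap via negated distance, holds the best k
    scored = 0

    def matching_neighbors(doc):
        # Expanded neighborhood: a neighbor that fails the filter is skipped
        # without scoring, and its neighbors are examined instead (two hops).
        for n in graph[doc]:
            if accept(n):
                yield n
            else:
                for nn in graph[n]:
                    if accept(nn):
                        yield nn

    def visit(doc):
        nonlocal scored
        visited.add(doc)
        d = math.dist(vectors[doc], query)
        scored += 1
        heapq.heappush(candidates, (d, doc))
        heapq.heappush(results, (-d, doc))
        if len(results) > k:
            heapq.heappop(results)  # drop the current worst result

    seeds = [entry] if accept(entry) else list(matching_neighbors(entry))
    for s in seeds:
        if s not in visited:
            visit(s)

    while candidates:
        d, doc = heapq.heappop(candidates)
        # Stop once the closest unexplored candidate is worse than the worst hit.
        if len(results) == k and d > -results[0][0]:
            break
        for n in matching_neighbors(doc):
            if n not in visited:
                visit(n)

    ordered = sorted((-neg_d, doc) for neg_d, doc in results)
    return [doc for _, doc in ordered], scored


# Toy data (assumed for illustration): ten points on a line, chained as a graph.
vectors = {i: (float(i),) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
hits, scored = filtered_search(
    vectors, graph, (7.2,), lambda doc: doc % 2 == 0, entry=0, k=2
)
```

In this toy run only the even-numbered documents are scored, so the search touches half the vectors while still reaching the two nearest matching documents.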
28.02.2025 15:39
Filtered vector search is crazy important. So we made HNSW filtered search in Apache Lucene better. At similar recall, it can be 3-5x faster!
28.02.2025 15:39
"elasticsearch: 15 years of indexing it all, finding what matters": www.elastic.co/search-labs/...
we turned it into a proper blog post with shay :)
I really enjoyed this talk by @elasticmark.bsky.social. He is back at finding crazy & interesting ways to explore data (I guess he never stopped). Clustering with binary vectors & vector search with Elasticsearch www.youtube.com/watch?v=sJU_...
13.02.2025 21:57
This also shows the beauty of OpenSource software. Out of nowhere Leo (github.com/aoli-al) comes to save the day, finding and helping fix tricky concurrency bugs in Apache Lucene.
07.02.2025 15:59
Fray is honestly pretty easy to use, provides deterministic play back of concurrency failures, and automatically detects any concurrency failures through sequential execution of threads: github.com/cmu-pasta/fray
07.02.2025 15:59
It's wonderful to see practical & important programming work. Debugging concurrent programs is incredibly difficult, here is a bug found in Apache Lucene by the CMU Pasta Lab using their new Fray testing framework www.elastic.co/search-labs/...
07.02.2025 15:59
The number of improvements in Lucene here are crazy. Pretty much every count and boolean query gets a nice boost and some of the count improvements are hilarious.
15.01.2025 18:28
It's so cool to see #Apache #Lucene going strong after about a quarter of a century. 2025 is gonna be a fun year for Lucene. www.elastic.co/search-labs/...
10.01.2025 13:32
Early termination for vector search can be more than just "gathering K candidates". My colleague Tommaso gives a small overview of basic early termination strategies for vector index search. www.elastic.co/search-labs/...
07.01.2025 15:19
My team wrote a new backing algorithm for our BBQ indices, called Optimized Scalar Quantization. Here is a high level overview of its implementation in Elasticsearch (and soon Apache Lucene). www.elastic.co/search-labs/... for the math nerds, skip to Tom's blog: www.elastic.co/search-labs/...
06.01.2025 18:13
Lucene has been evaluating disjunctive queries by loading (windows of) postings into a bit set and or-ing these bit sets for 20+ years. It started using the same approach for conjunctive queries a few days ago. benchmarks.mikemccandless.com/CountAndHigh... (annotation HS)
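To make the windowed bit-set approach concrete, here is a minimal sketch, not Lucene's code: the window size, the function names, and the use of Python integers as bit sets are all assumptions for illustration. Each clause's postings inside a window are packed into a bit set, and the sets are OR-ed for a disjunction or AND-ed for a conjunction.

```python
WINDOW = 4096  # hypothetical window size, in doc IDs


def window_bits(postings, base, size=WINDOW):
    """Pack the doc IDs of one postings list that fall in [base, base + size)
    into a Python int used as a bit set."""
    bits = 0
    for doc in postings:
        if base <= doc < base + size:
            bits |= 1 << (doc - base)
    return bits


def count_matches(clauses, max_doc, conjunctive=False, size=WINDOW):
    """Count documents matching any clause (disjunction) or every clause
    (conjunction), one window of doc IDs at a time."""
    if not clauses:
        return 0
    total = 0
    for base in range(0, max_doc, size):
        sets = [window_bits(postings, base, size) for postings in clauses]
        acc = sets[0]
        for s in sets[1:]:
            # AND keeps docs present in every clause, OR keeps docs in any clause.
            acc = (acc & s) if conjunctive else (acc | s)
        total += bin(acc).count("1")
    return total


# Two toy postings lists (assumed data), doc IDs below 10000.
a = [1, 5, 9, 5000]
b = [5, 7, 5000, 9000]
either = count_matches([a, b], max_doc=10000)
both = count_matches([a, b], max_doc=10000, conjunctive=True)
```

The only difference between the two query types in this sketch is the bitwise operator used to combine the per-clause sets, which is why one approach can serve both.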
21.12.2024 16:37
Something a little different from my typical blogs. This line of code in Apache Lucene took me 3 days to write. For fixing bugs, it's about the journey, not necessarily the destination. www.elastic.co/search-labs/... (the cover art was provided by one of my kids :))
27.12.2024 17:16
Our Better Binary Quantization (BBQ) index in Elasticsearch has a new backing algorithm. Better(er) recall & query speed for vector search. It's a natural evolution of our scalar quantization. Shipping soon. It's pretty neat www.elastic.co/search-labs/...
20.12.2024 16:14
Elasticsearch just got more powerful. Now, semantic, hybrid, and vector retrieval with custom rules for pinning and bubbling results to the top! Now you have multi-phased, hybrid retrieval in combination with business rules :D www.elastic.co/search-labs/...
19.12.2024 15:36
It was so much fun talking #Elasticsearch with Steve Mayzak on "You Know, For Search". I could nerd out for hours, but we kept it down to just 1 hour (maybe even that is too long....). Give it a listen, if nothing else, for Steve's dulcet tones: open.spotify.com/episode/7HLH...
11.12.2024 13:07
Be prepared to learn more about semantic rerankers than you ever thought you needed to know. Another awesome analysis from my colleagues at Elasticsearch www.elastic.co/search-labs/...
05.12.2024 16:49
More magic from chef Chris Hegarty. How better binary quantization vector ops are accelerated with Java SIMD in Elasticsearch vector search www.elastic.co/search-labs/...
04.12.2024 19:16
I cannot adequately express how proud I am of the #Elasticsearch team for delivering this. It is a humungous engineering achievement and the results of (metaphorical) blood, sweat, and (maybe real ;) ) tears. go.es.io/3CVo82X
02.12.2024 15:08
We have seen this idea played out nicely with Tantivy and Apache Lucene. Benchmarking between each other and lovingly borrowing ideas between the projects.
26.11.2024 21:32