Andrew Lamb andrewlamb1111 - Bluesky Statics

VLDB 2026 | Sponsorship The VLDB 2026 conference will take place in Boston, MA, United States, from Aug 31st to Sep 4th, 2026, and will feature research talks, tutorials, demonstrations, and workshops...

In 2026, VLDB is returning to the Boston area 5 decades after it was born here (the first VLDB was in Framingham). A good opportunity to get your company's name on the program (and earn the everlasting gratitude of the organizing committee): vldb.org/2026/sponsor...

06.03.2026 11:03 — 👍 3 🔁 0 💬 0 📌 0

Yeah, shredding is a very clever optimization

27.02.2026 14:57 — 👍 1 🔁 0 💬 0 📌 0

Here is a new blog about Parquet Variant, including use cases and shredding examples

parquet.apache.org/blog/2026/02...

27.02.2026 14:21 — 👍 7 🔁 2 💬 1 📌 1

A question came up on the Parquet sync today: does anyone have practical experience comparing FastLanes encoding with "classic" bit packing (without the unified shuffled layouts)? If you have, I would love to hear about your experience

25.02.2026 19:17 — 👍 2 🔁 0 💬 0 📌 0

I suggest getting comfortable with rm -rf every few days -- it works wonders for me :)

25.02.2026 19:17 — 👍 3 🔁 0 💬 0 📌 0

parquet-linter: A better Parquet is Parquet itself – Xiangpeng’s blog Unleash the performance potential of your Parquet files

Simply applying basic linting rules (like don't compress pages where it doesn't help) reduces Parquet file sizes by 5% and decreases decode time by 20%.

@xiangpeng.systems shows how in his latest blog
blog.xiangpeng.systems/posts/parque...

23.02.2026 15:26 — 👍 19 🔁 2 💬 0 📌 0

Native Geospatial Types in Apache Parquet

Great inaugural post about the geospatial types on the Parquet blog.

Thank you Jia Yu, Dewey Dunnington, Kristin Cowalcijk, Feng Zhang.

More posts coming!

parquet.apache.org/blog/2026/02...

14.02.2026 00:36 — 👍 8 🔁 2 💬 0 📌 0

Stockholm Apache DataFusion Meetup · Luma Join us for an evening of talks, panel discussions, and community discussions about Apache DataFusion and its growing role in modern data infrastructure. This…

Just a few more weeks until the Stockholm DataFusion meetup: luma.com/ctqtiqap

14.02.2026 12:36 — 👍 5 🔁 0 💬 0 📌 0

📖 Apache Parquet recently added native support for Geospatial. This post explains what that means and why it is important: parquet.apache.org/blog/2026/02...

13.02.2026 13:56 — 👍 13 🔁 2 💬 0 📌 0

Building A Distributed SQL Database in 30 Days with AI My journey building HoloStore a distributed key/value store and HoloFusion a distributed SQL DB using AI using the Accord consensus protocol from Cassandra.

kellabyte.substack.com/p/building-a... -- Exactly the kind of thing that shows the power of DataFusion. You can build the database and not (re)build the core query engine

12.02.2026 11:13 — 👍 9 🔁 2 💬 0 📌 2

You can use Apache Parquet for Vector Search with embedded indexes:

> We don’t change the file format; we just tune it.

Xiangpeng Hao explains how in blog.xiangpeng.systems/posts/vector...

10.02.2026 12:17 — 👍 6 🔁 1 💬 0 📌 0

The Quest for One Million IOPS: Benchmarking Storage at LanceDB Learn how LanceDB benchmarks storage and how we achieved one million disk reads per second.

Different techniques are needed to max out modern NVMe SSDs.

@westonpace.bsky.social's LanceDB blog is so good if you want the industrial version: lancedb.com/blog/one-mil...

Viktor Leis's LeanStore paper is great if you want the academic version: vldb.org/pvldb/vol16/...

07.02.2026 11:59 — 👍 13 🔁 1 💬 0 📌 0

A somewhat academic talk about the AI use cases driving changes in Apache Parquet and new formats: "Column Storage for the AI Era"

Recording: youtu.be/k9uhw7yqPsQ
Slides: docs.google.com/presentation...

03.02.2026 19:35 — 👍 8 🔁 0 💬 0 📌 0

What I really need is to focus more on reviews / getting stuff merged as now the coding is even easier 😅

02.02.2026 14:12 — 👍 2 🔁 0 💬 0 📌 0

Optimized implementation of SQL CASE expressions in column stores requires careful engineering. The latest Apache DataFusion blog from Pepijn Van Eeckhoudt and Raz Luvaton explains how it works

datafusion.apache.org/blog/2026/02...

02.02.2026 14:08 — 👍 7 🔁 0 💬 0 📌 0

One downside of tools like Codex is that they enable even more "side quests" -- I was already pretty bad at focusing, and now the ability to write the equivalent of a ticket and have some code to review in 10 minutes makes the problem far worse.

30.01.2026 11:29 — 👍 7 🔁 0 💬 2 📌 0

DataFusion 52 Release Blog is Published datafusion.apache.org/blog/2026/01...

28.01.2026 20:17 — 👍 5 🔁 0 💬 0 📌 0

I love it when I see a whole pile of commits I didn't review go to DataFusion main
github.com/apache/dataf...

27.01.2026 21:55 — 👍 10 🔁 0 💬 0 📌 0

Why Rust Will Help You Deliver Better Low-latency Systems and Happier Developers Andrew Lamb, a veteran of database engine development, shares his thoughts on why Rust is the right tool for developing low-latency systems, not only from the perspective of the code’s performance, bu...

1/5 ➡️ Why Rust Will Help You Deliver Better Low-latency Systems and Happier Developers with Andrew Lamb
bit.ly/47kgmwU

@andrewlamb1111.bsky.social

#RustLang #LowLatency

23.01.2026 11:37 — 👍 5 🔁 2 💬 1 📌 0

Designing a Table Format for ML Workloads Explore designing a table format for ML workloads with practical insights and expert guidance from the LanceDB team.

I have been working on a talk about the future of table formats, specifically what is needed for AI workloads, and I found Weston's blogs on LanceDB well written and super helpful: lancedb.com/blog/designi...

26.01.2026 11:51 — 👍 7 🔁 1 💬 1 📌 0

Meet the Speakers — TokioConf 2026 Discover the speakers behind TokioConf 2026. From core maintainers to community leaders, our lineup shares real-world experience and insights.

We’re excited to share the complete list of speakers joining us at TokioConf 2026 covering performance tricks, architecture patterns, and more.

See all our speakers: www.tokioconf.com/speakers
(Schedule coming soon)

Tickets are on sale: www.eventbrite.com/e/tokioconf-...

09.01.2026 22:11 — 👍 12 🔁 4 💬 0 📌 1

San Francisco Apache DataFusion Meetup · Luma Join us for an evening of talks and community discussion about Apache DataFusion and its growing role in modern data infrastructure. This year’s meetup will…

Apache DataFusion meetup in San Francisco: luma.com/p7r6fp2z Thursday, February 19. We are looking for more speakers and attendees!

17.01.2026 11:17 — 👍 8 🔁 1 💬 0 📌 0

DataFusion blog from Geoffrey Claude explains how to extend DataFusion to support:

-- Postgres style operators
SELECT payload->'user'->>'id'
FROM logs;

-- Statistical sampling
SELECT * FROM sensor_data
TABLESAMPLE BERNOULLI(10 PERCENT);

datafusion.apache.org/blog/2026/01...

14.01.2026 21:04 — 👍 10 🔁 0 💬 0 📌 0

Are You Sure You Want to Use MMAP in Your Database Management System? MMAP Databases = 💩

Since you don't seem to want to cite your own work, I will do it for you: db.cs.cmu.edu/mmap-cidr2022/

12.01.2026 13:29 — 👍 4 🔁 0 💬 0 📌 0

Stoked to be attending North East Database Day on Friday. It is a great mini conference and highlights some of the great research work going on in this area: nedbday.github.io/2026/

12.01.2026 13:28 — 👍 4 🔁 1 💬 0 📌 0

Great paper about pruning from Snowflake: arxiv.org/pdf/2504.11540

The LIMIT pruning they describe is 🤯 (so clever once you get it)

We have implemented almost all of the techniques in Apache DataFusion, FWIW

08.01.2026 16:17 — 👍 13 🔁 1 💬 0 📌 0

Come meet fellow Apache DataFusion users and committers at the Stockholm meetup on March 5: luma.com/ctqtiqap

07.01.2026 21:05 — 👍 4 🔁 0 💬 0 📌 0

Latest Apache DataFusion blog: more efficient plans and how to efficiently contribute: datafusion.apache.org/blog/output/...

20.12.2025 12:37 — 👍 10 🔁 1 💬 0 📌 1

Qiwei Huang explains how we use Late Materialization (LM) in the Apache Rust Parquet reader to accelerate filtering. LM can describe several techniques, but this is a core one (also applies to joins, Top-K, etc)

arrow.apache.org/blog/2025/12...

12.12.2025 11:40 — 👍 10 🔁 1 💬 0 📌 0

Funnel | The leading marketing intelligence platform Use Funnel to aggregate data from all your marketing platforms. Access powerful reporting and data modeling, and seamlessly export to any destination.

Thanks to funnel.io, we are hosting a DataFusion meetup in Stockholm
Date: Thursday, March 5, 2026, 17:30 - 20:00
Signup: luma.com/ctqtiqap

10.12.2025 14:05 — 👍 3 🔁 0 💬 0 📌 0

Posts by Andrew Lamb (@andrewlamb1111.bsky.social)