Andi Zimmerer | Pruning in Snowflake: Working Smarter, Not Harder
Modern cloud-based data analytics systems must efficiently process petabytes of data residing on cloud storage. A key optimization technique in state-of-the-art systems like Snowflake is partition pru...
"The fastest way of processing data is to not process it."
Our SIGMOD 2025 paper shows how Snowflake skips 99.4% of data with new pruning techniques for LIMIT, top-k, and JOIN queries.
Blog: snowflakepruning.github.io
Paper: arxiv.org/abs/2504.11540
@sigmod2025.bsky.social
05.05.2025 05:09
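To make the quote above concrete, here is a minimal sketch of the basic idea behind partition pruning: keep per-partition min/max metadata and skip every partition whose value range cannot satisfy a filter. This is only an illustration under assumed names (`Partition`, `prune`, and the sample data are hypothetical); it is not Snowflake's implementation and does not cover the LIMIT, top-k, or JOIN techniques from the paper.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical per-partition metadata: value range of one column.
    name: str
    min_val: int
    max_val: int


def prune(partitions, lo, hi):
    """Keep partitions whose [min_val, max_val] can overlap lo <= x <= hi."""
    return [p for p in partitions if p.max_val >= lo and p.min_val <= hi]


# Made-up example data: four partitions with disjoint value ranges.
partitions = [
    Partition("p0", 0, 99),
    Partition("p1", 100, 199),
    Partition("p2", 200, 299),
    Partition("p3", 300, 399),
]

kept = prune(partitions, 120, 180)
skipped_fraction = 1 - len(kept) / len(partitions)
print([p.name for p in kept], skipped_fraction)
```

Only the metadata is consulted here; the skipped partitions are never read, which is where the savings come from.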
SIGMOD BEST PAPER Honorable Mentions
CRDV: Conflict-free Replicated Data Views
Nuno Faria (INESCTEC & U. Minho)*; José Pereira (U. Minho & INESCTEC)
DPconv: Super-Polynomially Faster Join Ordering
Mihail Stoian (UTN)*; Andreas Kipf (UTN)
22.04.2025 19:29
From top to bottom in the list of accepted papers (2025.sigmod.org/sigmod_paper...):
Yannakakis+
RPT
Galley
LpBound [my favorite]
PDX [my 2nd favorite]
GFTR
Libdbos [my 3rd favorite]
MementoFilter
Spilly
10.04.2025 17:57
DPconv just won a SIGMOD'25 Honorable Mention!
I was quite impressed, given this year's high-quality papers. Let's see who won the big prize.
My list of candidates in the thread below 🧵.
Paper: dl.acm.org/doi/10.1145/...
Slides: stoianmihail.github.io/assets/dpcon...
10.04.2025 17:57
Redbench is now live: github.com/utndatasyste....
Let's see how workload-aware your system really is.
09.04.2025 22:11
Idea: LinDP's limitation lies in its cubic-time DP enumeration strategy, which may enumerate invalid subplans.
Our fix:
1. We output only the valid subplans, inspired by DPccp.
2. We also transfer DP-states across linearizations by exploiting how IKKBZ creates them.
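As a toy illustration of why enumerating over a linearization can waste work, the sketch below counts how many contiguous intervals of a relation order are connected in the join graph. It assumes that an "invalid subplan" means an interval whose relations are not connected (that is, it would need a cross product); this reading, the helper names, and the example star query are assumptions, and the code is not the enumeration from the paper.

```python
def is_connected(relations, edges):
    """Return True if `relations` induce a connected subgraph of the join graph."""
    relations = set(relations)
    start = next(iter(relations))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v in relations and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == relations


def count_intervals(order, edges):
    """Count connected vs. disconnected contiguous intervals of `order`."""
    valid = invalid = 0
    for i in range(len(order)):
        for j in range(i, len(order)):
            if is_connected(order[i:j + 1], edges):
                valid += 1
            else:
                invalid += 1
    return valid, invalid


# Hypothetical star query: r0 joins with each of r1..r4.
leaves = ["r1", "r2", "r3", "r4"]
edges = {"r0": leaves, **{leaf: ["r0"] for leaf in leaves}}
order = ["r0"] + leaves  # an assumed linearization

valid, invalid = count_intervals(order, edges)
print(valid, invalid)
```

For this star query, 6 of the 15 intervals are disconnected, so an enumerator that visits every interval spends part of its time on subplans it must discard.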
13.01.2025 07:20
Paper: db.in.tum.de/~birler/pape...
Code: github.com/umbra-db/ada...
Appears at BTW: btw2025.gi.de (the German database conference where "Unnesting Arbitrary Queries" appeared).
13.01.2025 07:20
Umbra's DP optimizer for queries of ~100 relations ran in cubic time.
AWS Redshift's Redset captures a 2,296-relation query.
Our revamped DP enumeration optimizes tree queries like snowflakes of *millions* of relations within 1 sec.
Joint work w/ Altan Birler & Thomas Neumann.
13.01.2025 07:20
YouTube video by Table Representation Learning
Lightweight Correlation-Aware Table Compression
If you don't manage to come by, do check out our 3-min presentation:
www.youtube.com/watch?v=B0bU...
openreview.net/forum?id=z7e...
14.12.2024 01:06
Are you a fan of Parquet and at #NeurIPS2024 tomorrow? Let's meet at our poster at @trl-research.bsky.social to see how you can reduce your Parquet file sizes by up to 40%.
Virtual compresses tables via functions while ensuring fast column scans.
⏰ 2.30pm
East Meeting Room 11 & 12
14.12.2024 01:06
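A rough sketch of what "compressing tables via functions" can look like: if one column is mostly a function of another, store the function plus the rows that deviate from it instead of the full column. The linear function, the helper names `compress` and `decompress`, and the sample data are assumptions for illustration; this is not Virtual's API or file format.

```python
def compress(a, b, slope, intercept):
    """Keep only the rows where b deviates from slope * a + intercept."""
    return {
        i: bv
        for i, (av, bv) in enumerate(zip(a, b))
        if bv != slope * av + intercept
    }


def decompress(a, exceptions, slope, intercept):
    """Rebuild column b from column a, the function, and the stored exceptions."""
    return [exceptions.get(i, slope * av + intercept) for i, av in enumerate(a)]


# Made-up columns: b equals 2 * a + 3 except at index 3.
a = [1, 2, 3, 4, 5]
b = [5, 7, 9, 100, 13]

exceptions = compress(a, b, 2, 3)
restored = decompress(a, exceptions, 2, 3)
print(exceptions, restored == b)
```

Column `b` is reconstructed on the fly from `a`, so a scan needs only the base column and the (ideally small) exception map.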
Vol:17 No:12 → DataLoom: Simplifying Data Loading with LLMs
Authors: Alexander Van Renen, Mihail Stoian, Andreas Kipf
PDF: https://www.vldb.org/pvldb/vol17/p4449-renen.pdf
02.12.2024 05:00
Professor of Computer Science at Cambridge.
Principal Researcher in AI/ML/RL Theory @ Microsoft Research NE/NYC. Previously @ MIT, Cornell. http://dylanfoster.net
RL Theory Lecture Notes: https://arxiv.org/abs/2312.16730
building socio-technical complex systems with data | geoheil.com
Databases @TUM; interested in all things human
Research Engineer at New York University. Interested in dataset search & discovery, sketching, data management, nlp, and information retrieval.
PhD at University of Technology Nuremberg, researching database systems. Formerly an engineer at Snowflake Inc. working on query acceleration; spent some academic time at MIT 🇺🇸, TUM 🇩🇪, and NTU 🇸🇬. Berlin
https://www.andi-zimmerer.com
TCS+ is the original online seminar in theoretical computer science, committed to the carbon-free dissemination of ideas across the globe since 2013. Talks from the cutting edge of research in TCS, for a wide audience: https://www.tcsplus.org
CTO, AI at SAP - #foundationmodel #linkedbusinessdata #knowledgegraph #nlp #ai - www.hoffart.ai
Research & code: Research director @inria
►Data, Health, & Computer science
►Python coder, (co)founder of scikit-learn, joblib, & @probabl.bsky.social
►Sometimes does art photography
►Physics PhD
Updates on community and events, such as the TRL workshop at NeurIPS 2024 and ACL 2025.
Info: https://table-representation-learning.github.io/
Faculty at CWI & ELLIS Amsterdam https://trl-lab.github.io. Prev at UC Berkeley and the University of Amsterdam. Research on AI and tabular data to democratize insights from structured data.
https://www.madelonhulsebos.com
official Bluesky account (check username)
Bugs, feature requests, feedback: support@bsky.app
Professor at University of Technology Nuremberg