
Ryan Marcus

@ryanmarc.us.bsky.social

Assistant professor at UPenn. Database systems. https://RyanMarc.us I'm mostly on Mastodon, https://discuss.systems/@ryanmarcus

229 Followers  |  119 Following  |  3 Posts  |  Joined: 21.11.2023

Latest posts by ryanmarc.us on Bluesky

Infographic describing BayesQO, an offline, multi-iteration learned query optimizer. On the left, it shows a Variational Autoencoder (VAE) being pretrained to reconstruct query plans from vectors, using orange-colored plan diagrams. The decoder part of the VAE is retained. In the center and right, the image shows Bayesian optimization being performed in the learned vector space: new vectors are decoded into query plans, tested for latency, and refined iteratively. At the bottom, a library of optimized query plans is used to train a robot labeled “LLM,” which can then generate new plans directly. The caption reads: "We get a fast query, but also a library of high-quality plans. We can train an LLM to speed up the process for next time!" The image credits Jeff Tao et al., SIGMOD '25, and links to https://rm.cab/bayesqo

For that one query that must go 𝑟𝑒𝑎𝑙𝑙𝑦 𝑓𝑎𝑠𝑡, BayesQO (by Jeff Tao) finds superoptimized plans using Bayesian optimization in a learned plan space. It’s costly, but the results can train an LLM to speed things up next time.

📄 https://rm.cab/bayesqo

03.06.2025 19:34 · 👍 3    🔁 0    💬 1    📌 0
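
A minimal sketch of the BayesQO-style loop the post and infographic describe, using scikit-learn's Gaussian process as the surrogate. `decode_plan` and `measure_latency` are hypothetical stand-ins for the retained VAE decoder and for actually executing the decoded plan; this is an illustration of the idea, not the authors' code.

```python
# Sketch of Bayesian optimization in a learned plan space (assumptions noted below).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

LATENT_DIM = 8      # dimensionality of the learned plan space (made-up value)
N_ITERATIONS = 30   # offline budget: each iteration executes one candidate plan
N_CANDIDATES = 256  # latent vectors scored by the surrogate per iteration

def decode_plan(z: np.ndarray) -> str:
    """Hypothetical stand-in for the retained VAE decoder (vector -> query plan)."""
    return f"plan<{np.round(z, 2).tolist()}>"

def measure_latency(plan: str, z: np.ndarray) -> float:
    """Hypothetical stand-in for running the plan and timing it (toy objective)."""
    return float(np.sum(z ** 2) + 0.1 * np.random.randn())

# Warm up with a few random latent vectors and their observed latencies.
Z = np.random.randn(5, LATENT_DIM)
y = np.array([measure_latency(decode_plan(z), z) for z in Z])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(N_ITERATIONS):
    gp.fit(Z, y)

    # Propose candidate vectors and pick the one with the best expected
    # improvement over the fastest plan observed so far (minimization).
    candidates = np.random.randn(N_CANDIDATES, LATENT_DIM)
    mu, sigma = gp.predict(candidates, return_std=True)
    improvement = y.min() - mu
    z_score = improvement / (sigma + 1e-9)
    ei = improvement * norm.cdf(z_score) + sigma * norm.pdf(z_score)
    z_next = candidates[np.argmax(ei)]

    # Decode, execute, and fold the observed latency back into the surrogate.
    latency = measure_latency(decode_plan(z_next), z_next)
    Z = np.vstack([Z, z_next])
    y = np.append(y, latency)

# The (plan, latency) pairs gathered here form the "library" the post says
# can later train an LLM to propose good plans directly.
print("best latency found:", y.min())
```
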
Infographic describing LimeQO, a workload-level, offline, learned query optimizer. On the left, it shows a workload consisting of multiple queries (q₁ to q₄), each with a default execution time (3s, 9s, 12s, 22s respectively). On the right, alternate plans (h₁, h₂, h₃) show varying execution times for each query, with some entries missing (represented by question marks). For example, q₁ takes 1s under h₂, much faster than the 3s default. A specific callout highlights that for q₃, plan h₃ reduced the time from 12s to 3s, but took 18s to find, resulting in a benefit of 9s gained / 18s search. The image poses the question: “Where should we explore next to maximize benefit?” The image credits Zixuan Yi et al., SIGMOD '25, and provides a link: https://rm.cab/limeqo

LimeQO (by Zixuan Yi), a 𝑤𝑜𝑟𝑘𝑙𝑜𝑎𝑑-𝑙𝑒𝑣𝑒𝑙 approach to query optimization, can use neural networks or simple linear methods to find good query hints significantly faster than a random or brute-force search.

📄 https://rm.cab/limeqo

03.06.2025 19:34 · 👍 2    🔁 0    💬 1    📌 0
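
A minimal sketch of the exploration step the LimeQO infographic poses ("where should we explore next?"), using a simple rank-1 completion of the partially observed latency matrix. The values shown in the infographic (q₁/h₂ = 1s, q₃/h₃ = 3s, and the defaults) are used as-is; the q₂/h₁ and q₄/h₂ entries are invented for the example, and the paper's weighing of search cost against time saved is omitted for brevity.

```python
# Sketch of picking the next (query, hint) pair to measure via low-rank completion.
import numpy as np

# Rows are queries q1..q4, columns are hint sets h1..h3; entries are observed
# latencies in seconds, NaN = not yet measured.
defaults = np.array([3.0, 9.0, 12.0, 22.0])   # default plan latency per query
observed = np.array([
    [np.nan, 1.0,    np.nan],
    [8.0,    np.nan, np.nan],   # invented value for illustration
    [np.nan, np.nan, 3.0   ],
    [np.nan, 20.0,   np.nan],   # invented value for illustration
])

# Complete the matrix with a rank-1 approximation: seed missing entries with
# column means, then alternate truncated SVD and re-imputation.
filled = observed.copy()
col_means = np.nanmean(observed, axis=0)
missing = np.isnan(observed)
filled[missing] = np.take(col_means, np.where(missing)[1])
for _ in range(50):
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = s[0] * np.outer(U[:, 0], Vt[0, :])
    filled[missing] = low_rank[missing]

# Best latency currently known for each query (default plan or an observed hint).
best_known = np.fmin(defaults, np.nanmin(observed, axis=1))

# Predicted time saved if we measured each still-unobserved (query, hint) pair;
# already-measured cells are excluded with -inf.
predicted_savings = np.where(missing, best_known[:, None] - filled, -np.inf)
q, h = np.unravel_index(np.argmax(predicted_savings), predicted_savings.shape)
print(f"explore next: query q{q + 1} with hint set h{h + 1} "
      f"(predicted saving ~{predicted_savings[q, h]:.1f}s)")
```
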

OLAP workloads are dominated by repetitive queries -- how can we optimize them?

A promising direction is to do 𝗼𝗳𝗳𝗹𝗶𝗻𝗲 query optimization, allowing for a much more thorough plan search.

Two new SIGMOD papers! 🧵

03.06.2025 19:34 · 👍 6    🔁 0    💬 1    📌 0
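
A toy illustration, not from either paper, of why repetition makes the extra offline search pay off: the expensive search runs once per query template, and every later arrival of the same template reuses the result. The template hashing and queue here are hypothetical stand-ins.

```python
# Sketch of reusing offline-optimized plans for repetitive OLAP queries.
import hashlib

plan_cache: dict[str, str] = {}   # query template -> best plan found offline
offline_queue: list[str] = []     # templates still awaiting a thorough search

def template_key(sql: str) -> str:
    """Hypothetical canonicalization: hash normalized SQL text as a stand-in
    for real parameter stripping / query fingerprinting."""
    return hashlib.sha256(" ".join(sql.lower().split()).encode()).hexdigest()

def plan_for(sql: str) -> str:
    key = template_key(sql)
    if key in plan_cache:
        return plan_cache[key]          # repeat query: reuse the offline result
    offline_queue.append(sql)           # schedule a thorough offline search
    return "default_optimizer_plan"     # serve this run with the normal optimizer

# Once the offline search populates plan_cache[template_key(sql)], subsequent
# arrivals of the same query template skip straight to the optimized plan.
```
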
