Xiangpeng Hao

@xiangpeng.systems.bsky.social

Database/storage Flight/DataFusion/Arrow/Parquet PhD student@UW-Madison https://xiangpeng.systems

999 Followers  |  117 Following  |  32 Posts  |  Joined: 22.10.2024  |  2.1758

Latest posts by xiangpeng.systems on Bluesky

Appreciate the kind words!

03.11.2025 23:24 · 👍 1 🔁 0 💬 0 📌 0

Nice to see this getting shared! 🙌 Now I'm even more motivated to turn it into a full course.

29.10.2025 19:01 · 👍 3 🔁 0 💬 0 📌 0

Just like other big cities, Madison is getting its own systems talk series. Come join us!

24.10.2025 20:08 · 👍 7 🔁 1 💬 0 📌 0

LiquidCache: a distributed pushdown cache for DataFusion, designed to cut down S3 requests for diskless databases.

💻 Code: github.com/XiangpengHao...
📄 Paper (VLDB 2026): github.com/XiangpengHao...

10.09.2025 20:48 · 👍 9 🔁 0 💬 0 📌 0
What is LiquidCache?

Thank you for sharing! Slides are here 👉 what-is-liquid-cache.xiangpeng.systems

02.09.2025 00:26 · 👍 2 🔁 0 💬 0 📌 0

Hey Tyler 👋 welcome back! I'd be happy to chat. I work in the data systems space (database + storage + cloud), in the same group that also studies storage faults!

01.08.2025 19:52 · 👍 2 🔁 0 💬 0 📌 0
GitHub - XiangpengHao/liquid-cache: 10x lower latency for cloud-native DataFusion

Project repo: github.com/XiangpengHao...

16.05.2025 00:53 · 👍 6 🔁 0 💬 0 📌 0
Data-Aware Caching for Cloud Analytics

Join my PhD prelim talk next Monday:

Data-Aware Caching for Cloud Analytics

🕐 May 19, 1PM CDT
📍 CS2310 or Zoom: uwmadison.zoom.us/j/3081128886

16.05.2025 00:52 · 👍 8 🔁 1 💬 1 📌 0
Optimizing SQL (and DataFrames) in DataFusion: Part 1 This post reviews what a Query Optimizer is, what it does, and why you need one for SQL and DataFrames. It also describes how industrial Query Optimizers are structured and standard optimization class...

My manifesto on optimizing SQL and DataFrames in query engines (including an explanation of why Apache DataFusion doesn't have a complex join ordering algorithm):
www.influxdata.com/blog/optimiz...

04.04.2025 16:41 · 👍 7 🔁 1 💬 0 📌 0
Build your own S3-Select in 400 lines of Rust – Xiangpeng's blog: DataFusion is ALL YOU NEED

New blog post: "Build your own S3-Select in 400 lines of Rust"

Check it out 😉: blog.xiangpeng.systems/posts/build-...

24.03.2025 14:13 · 👍 10 🔁 3 💬 0 📌 0
GitHub - excalidraw/excalidraw: Virtual whiteboard for sketching hand-drawn like diagrams

Credit goes to github.com/excalidraw/e... for making it easy 😉

14.03.2025 14:00 · 👍 0 🔁 0 💬 1 📌 0
Experimental parquet decoder with first-class selection pushdown support by XiangpengHao · Pull Request #6921 · apache/arrow-rs: Which issue does this PR close? Many long lasting issues in DataFusion and Parquet. Note that this PR may or may not close these issues, but (imo) it will be the foundation to future more optimiza...

Here's the PR: github.com/apache/arrow...

13.03.2025 18:36 · 👍 1 🔁 0 💬 0 📌 0
Efficient Filter Pushdown in Parquet – Xiangpeng's blog: How to implement efficient filter pushdown in Parquet readers and why it's challenging in practice.

I submitted a PR that cuts average ClickBench latency for DataFusion by 15%! Reviewing it wasn't straightforward because of complex performance-tuning dynamics, so I wrote a blog post explaining why it works. Check it out: blog.xiangpeng.systems/posts/parque...

13.03.2025 18:36 · 👍 16 🔁 2 💬 2 📌 0
Evil Scheduler: Mastering Concurrency Through Interactive Debugging – Ao Li: TLDR Watch the video below to see how Fray debugger works! I enjoy the concept of Deadlock Empire, an interactive game that teaches the semantics of locks and other concurrency primitives. The core id...

We are excited to share Fray Debugger (aoli.al/blogs/deadlo...), an IntelliJ plugin that allows you to control concurrent execution deterministically!

We have translated the Deadlock Empire (deadlockempire.github.io) into Java to demonstrate how to use Fray Debugger.

12.03.2025 19:25 · 👍 3 🔁 1 💬 1 📌 0

Meanwhile, as a PhD student, I still feel frustrated when I compare my systems to the many ideas that seem novel but lack practical impact. That said, I find "feet on the ground, head in the clouds" research very inspiring; it's probably what keeps me motivated to stay in academia.

10.03.2025 19:11 · 👍 1 🔁 0 💬 0 📌 0

Thanks for the insightful points, Marc! I totally agree that academia is important in many areas. I'm planning a follow-up post discussing the kinds of research that are impactful and beneficial to people, and your examples strongly resonate with what I have in mind!

10.03.2025 19:05 · 👍 1 🔁 0 💬 0 📌 0

Thanks for sharing your perspective! It's always helpful to hear insights from folks who've spent time in industry. There's definitely room for academia to evolve, and I'm hopeful it will :)

10.03.2025 18:49 · 👍 1 🔁 0 💬 0 📌 0

@xiangpeng.systems shared a great post about system researchers. I wrote a comment on it and would like to share some thoughts here and offer complementary ideas.

In short: build papers with open source.

xuanwo.io/links/2025/0...

10.03.2025 07:26 · 👍 8 🔁 2 💬 0 📌 0
Where are we now, system researchers? – Xiangpeng's blog

Wrote a blog post reflecting on DeepSeek, NSF funding, and system research communities in general. Apologies for the bold claims; I hope they spark some discussion.
blog.xiangpeng.systems/posts/system...

10.03.2025 04:49 · 👍 11 🔁 2 💬 2 📌 0

Compiling to WASM is a very interesting idea! I think Fray explored this a bit at some point, but I'm not sure about the current status.

22.02.2025 18:52 · 👍 0 🔁 0 💬 0 📌 0
shuttle - Rust Shuttle is a library for testing concurrent Rust code, heavily inspired by Loom.

Current approaches require replacing std locks with framework-provided locks, like the ones in shuttle: docs.rs/shuttle/late...

I think binary instrumentation like the one in this paper is possible, but I'm not an expert on this. www.microsoft.com/en-us/resear...

22.02.2025 18:49 · 👍 0 🔁 0 💬 0 📌 0
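The lock-replacement approach mentioned above can be sketched roughly like this. It is a minimal illustration, not Shuttle's own recipe: it assumes a hypothetical `shuttle` cfg flag and assumes Shuttle's `sync` and `thread` modules mirror std's, as its documentation describes. The code under test imports its primitives through one cfg switch, so a normal build runs on plain std locks while a test build gets the instrumented ones that the framework's scheduler can control.

```rust
// Hypothetical `shuttle` cfg flag (e.g. set via RUSTFLAGS="--cfg shuttle").
// When set, Shuttle's instrumented primitives are used so its scheduler can
// control thread interleavings; otherwise plain std primitives are used.
#[cfg(shuttle)]
use shuttle::{
    sync::{Arc, Mutex},
    thread,
};
#[cfg(not(shuttle))]
use std::{
    sync::{Arc, Mutex},
    thread,
};

/// Spawns `n` threads that each increment a shared counter once,
/// then returns the final count.
pub fn concurrent_increment(n: usize) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind the value first so the MutexGuard is dropped before `counter`.
    let total = *counter.lock().unwrap();
    total
}
```

In a Shuttle test build, a call such as `concurrent_increment(2)` would then be wrapped in something like `shuttle::check_random(|| ..., 100)` so the framework can explore many random schedules; in a normal build the same function simply runs on OS threads.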

I heard from a Fray developer that it is getting a built-in interactive debugger, which visualizes what each thread is doing at a given moment. I can see it being incredibly useful!

22.02.2025 18:44 · 👍 1 🔁 0 💬 0 📌 0
GitHub - awslabs/shuttle: Shuttle is a library for testing concurrent Rust code

Yes, Loom and shuttle: github.com/awslabs/shut...

They are incredibly useful for identifying and reproducing bugs, but I find them quite hard to use with a debugger: lldb frequently has to jump between different stacks, and I soon lose track of what's going on...

22.02.2025 18:42 · 👍 1 🔁 0 💬 0 📌 0

Check out the underlying framework: github.com/cmu-pasta/fray
Looking forward to future Rust support 😉

22.02.2025 16:50 · 👍 9 🔁 0 💬 3 📌 0
Gemini API pricing | Google AI for Developers: The Gemini API for developers offers a robust free tier and flexible pricing as you scale.

It uses the Gemini API's free tier to translate natural language to SQL: ai.google.dev/pricing#1_5f...

24.11.2024 20:01 · 👍 0 🔁 0 💬 0 📌 0

Reading S3 files (through OpenDAL) is planned for next weekend :-)

24.11.2024 19:59 · 👍 2 🔁 0 💬 0 📌 1

My weekend project now comes with AI superpowers! You can now explore Parquet data with natural language: parquet-viewer.haoxp.xyz

24.11.2024 19:58 · 👍 2 🔁 1 💬 2 📌 0

I helped with the string view part, along with many others!

22.11.2024 02:06 · 👍 2 🔁 0 💬 1 📌 0
Apache DataFusion is now the fastest single node engine for querying Apache Parquet files

This is amazing: an open source query engine built on open standards is now the fastest, and it is in Rust! datafusion.apache.org/blog/2024/11...

21.11.2024 23:22 · 👍 32 🔁 4 💬 2 📌 1

New blog post on the fun new hardware advancements which databases can leverage for great gains, and why the cloud means it doesn't matter that they exist. 🫠

transactional.blog/b...

20.11.2024 00:13 · 👍 53 🔁 18 💬 3 📌 3