@bruceritchie.bsky.social

32 Followers  |  67 Following  |  20 Posts  |  Joined: 28.10.2024  |  1.9745

Latest posts by bruceritchie.bsky.social on Bluesky

I wonder if anyone has done a cost analysis of Python code running in the wild compared to a language that actually is performant. I suspect companies are needlessly spending millions because of lazy developers.

17.10.2025 17:54  |  👍 0    🔁 0    💬 0    📌 0

Not me, a different Bruce Ritchie

01.10.2025 17:45  |  👍 0    🔁 0    💬 0    📌 0
F3: The Open-Source Data File Format for the Future
SIGMOD 2025

Our SIGMOD paper with our friends at Tsinghua + @wesmckinney.com + @pateljm.bsky.social on creating a next-generation open-source data file format is out. F3 is a future-proof file format that avoids the mistakes of Parquet.
📄 Paper: db.cs.cmu.edu/papers/2025/...
Code: github.com/future-file-...

01.10.2025 13:49  |  👍 67    🔁 21    💬 4    📌 5
GitHub - future-file-format/F3
Contribute to future-file-format/F3 development by creating an account on GitHub.

github.com/future-file-...

01.10.2025 15:43  |  👍 2    🔁 0    💬 0    📌 0
Optimizing ClickHouse for Intel's ultra-high core count processors
Intel's latest processor generations are pushing the number of cores in a server to unprecedented levels. For analytical databases like ClickHouse, ultra-high core counts represent a huge opportunity ...

Interesting read on what it takes to optimize a database for high core count machines - clickhouse.com/blog/optimiz...

18.09.2025 14:06  |  👍 0    🔁 0    💬 0    📌 0
Vortex | An extensible, SOTA columnar file format
Vortex is an extensible, state-of-the-art columnar file format, with associated tools for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.

I'm tempted to try out the Vortex file format (vortex.dev) in my project to see if it has an appreciable impact on performance.

12.09.2025 15:38  |  👍 0    🔁 0    💬 0    📌 0

ashtom.github.io/developers-r... ... so much absurdity in this it's crazy. Never trust a damn thing from someone whose job depends on selling you something.

06.08.2025 18:38  |  👍 0    🔁 0    💬 0    📌 0
Apache DataFusion 49.0.0 Released - Apache DataFusion Blog

@apachedatafusion.bsky.social 49.0.0 released. Async UDFs, Parquet modular encryption, WITHIN GROUP support, dynamic filters and TopK pushdown, and much more ... datafusion.apache.org/blog/2025/07...

29.07.2025 21:04  |  👍 0    🔁 0    💬 0    📌 0

Medium has turned into a wasteland of AI-generated or AI-augmented posts. I'd say less than 25% of the daily digest highlights are actual 'real' articles. Sad.

10.07.2025 14:01  |  👍 0    🔁 0    💬 0    📌 0

A 200 OK response from S3 ... isn't always OK. Way to go AWS for making your service horrendous to support. repost.aws/knowledge-ce...

30.05.2025 19:56  |  👍 0    🔁 0    💬 0    📌 0

I am unsure whether Google Summer of Code is a benefit or a hindrance to an open source project. Time will tell, I suppose, by the PRs submitted.

08.04.2025 16:48  |  👍 0    🔁 0    💬 0    📌 0

It's been well over a year since I started rewriting a large, very long-running job from Apache Spark/Scala to Apache DataFusion/Rust. We're now well into doing PoCs to rewrite a few other expensive jobs the same way. It's a very nice feeling.

03.04.2025 21:08  |  👍 0    🔁 0    💬 0    📌 0
Post image

This one was going around the office today and made me chuckle :)

28.03.2025 13:30  |  👍 0    🔁 0    💬 0    📌 0

Not by me, I'm not in Florida, nor in the US.

27.02.2025 15:53  |  👍 0    🔁 0    💬 0    📌 0

Coworker in chat: "... cluster is rebalancing and I'm trying to get the jello to stop shaking". Best explanation of rebalancing I've heard in a long time 🤣

27.02.2025 14:43  |  👍 0    🔁 0    💬 0    📌 0

Thank you Doug Ford for the $200 vote bribe. I'll use it to contribute to another party and vote to get your kind out of office.

09.02.2025 14:20  |  👍 0    🔁 0    💬 0    📌 0

Had a good chuckle this morning. Gemini was enabled on company corporate accounts and lasted all of 2 days before it was disabled.

31.01.2025 13:58  |  👍 0    🔁 0    💬 0    📌 0

The latest paper from the #1 CMU-DB PhD student @samarchdb.bsky.social is wild compilation magic! He automatically makes UDFs run 300x faster on SQL Server and 1.3x faster on DuckDB.
Code: github.com/SamArch27/PR...
Paper: www.vldb.org/pvldb/vol18/...

06.12.2024 14:56  |  👍 53    🔁 10    💬 2    📌 1

Working in Rust for the last year has really made me aware of just how useful some features in other languages really are.

- Variadic functions
- Default values for arguments
- Named arguments
- Enum variants as types

Rust is getting if let chains in the 2024 edition though so that is something.

26.11.2024 15:46  |  👍 3    🔁 0    💬 2    📌 0

Lately there are two things I've been wishing that #Rust had: variadic functions and enum variants as types. Using a builder or macro to work around the first is just that, a workaround. Having the second would make some things much nicer.

19.11.2024 20:32  |  👍 2    🔁 0    💬 0    📌 0
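
For readers unfamiliar with the macro workaround mentioned in the post above, here is a minimal sketch of what it can look like. The names `sum_all` and `sum!` are hypothetical, chosen only for illustration; they do not come from the post or from any library. The macro collects its arguments into an array and forwards them to an ordinary slice-taking function, which also shows the limitation: all arguments have to share one type.

```rust
// Hypothetical helper names, used only to illustrate the workaround.

/// Ordinary function: callers pass any number of values as a slice.
fn sum_all(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Macro wrapper that accepts a variable number of comma-separated
/// arguments (with an optional trailing comma) and forwards them to
/// `sum_all` as an array reference.
macro_rules! sum {
    ($($x:expr),* $(,)?) => {
        sum_all(&[$($x),*])
    };
}

fn main() {
    // Zero, one, or many arguments all expand to a single slice call.
    println!("{}", sum!());
    println!("{}", sum!(10));
    println!("{}", sum!(1, 2, 3));
}
```

The call site reads like a variadic call, but it is still a macro invocation rather than a function, so it cannot be passed around as a value or used as a trait method.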

64 GB of RAM is not enough anymore.

17.11.2024 17:45  |  👍 0    🔁 0    💬 0    📌 0
Post image

DataFusion v43 has seen a lot of performance work, especially around reading Parquet, and the numbers are very nice! From the ClickBench benchmark on the same hardware type:

15.11.2024 16:17  |  👍 0    🔁 0    💬 0    📌 0

@bruceritchie is following 20 prominent accounts