Otávio Carvalho

@otavioccc.bsky.social

Yet another software bricklayer.

814 Followers  |  136 Following  |  14 Posts  |  Joined: 31.08.2024

Latest posts by otavioccc.bsky.social on Bluesky

Same for me, I am finding more vibrant communities on the data lake side of things, Iceberg/DataFusion/Arrow and all the Postgres extensions have been interesting me more these days

08.11.2025 22:34 — 👍 0    🔁 0    💬 1    📌 0

It's true. Every company eventually becomes a database company.

05.08.2025 21:24 — 👍 10    🔁 1    💬 1    📌 0

Intro to @apachedatafusion.bsky.social : Technology, Community and Not Quite Enough Time: www.youtube.com/watch?v=3per...

15.06.2025 09:31 — 👍 6    🔁 4    💬 0    📌 0
Vortex on Ice: Using Vortex to accelerate Apache Iceberg queries up to 4x

Last month I worked on Apache Iceberg integration for Vortex, the results of which we presented earlier this month at Iceberg Summit.

I wrote a post about my experience bridging our Rust-based system to Iceberg and Spark

spiraldb.com/post/vortex-...

30.04.2025 14:51 — 👍 8    🔁 3    💬 0    📌 2
The Deconstructed Database and the Advent of the Open Data Lake (YouTube video by Data Council)

The recording of my keynote at Data Council 2025 is now available.
If you missed my best talk so far, you can now catch up.

The Deconstructed Database and the Advent of the Open Data Lake

www.youtube.com/watch?v=Cqhk...

03.06.2025 23:26 — 👍 24    🔁 6    💬 0    📌 1

It’s not pretty, but it’s been working for a few years now. We added some ugly logical sharding on top: a poor man’s version of consistent hashing that I designed and that I’m definitely not proud of (but that works and "scales").

09.06.2025 08:33 — 👍 1    🔁 0    💬 0    📌 0

Neo4j is arguably the leading product in the graph database ecosystem, but it faces challenges with high-throughput write workloads. For our use case (metadata in an observability company), we ultimately chose Postgres with recursive queries and restricted the set of traversal queries we support.

09.06.2025 08:27 — 👍 1    🔁 0    💬 0    📌 0

Complex traversal queries don’t play well with high-throughput writes, and sharding or partitioning is also non-trivial. In my view (and personal suffering), it’s a difficult domain for a one-size-fits-all database, so we naturally ended up with a universe of purpose-specific graph databases.

09.06.2025 08:20 — 👍 3    🔁 0    💬 2    📌 0

Sunday morning read

jepsen.io/analyses/t...

08.06.2025 04:43 — 👍 4    🔁 1    💬 0    📌 0
A parody of John Waters' "Serial Mom", except it's "Serializable Mom". I'm holding scissors (to partition the network) and trying to channel my best homage to Kathleen Turner.

Systems Distributed. June 19-20, Amsterdam.

https://systemsdistributed.com/

07.06.2025 15:53 — 👍 50    🔁 6    💬 3    📌 4
The Anatomy of a Durable Execution Stack from First Principles: The architecture of Restate, a Durable Execution engine built from the ground up.

This post from @stephanewen.bsky.social is *really* thought-provoking. Their architecture seems to unify database and stream processing architectures into one thing.

I'm not totally sure what the implications of this are yet, but it seems important.

27.02.2025 17:31 — 👍 17    🔁 2    💬 1    📌 0

Is this where we currently are?

13.03.2025 14:40 — 👍 5    🔁 2    💬 0    📌 0

It's widely known that sharing a queue across multiple servers, rather than queue-per-server, often helps reduce latency and improve utilization. But when is one queue better? In my new blog post, I look at one case: different classes of work. Read it here: brooker.co.za/blog/2025/03...

26.03.2025 16:34 — 👍 26    🔁 2    💬 0    📌 4
Deciphering language processing in the human brain through LLM representations

Kinda related: research.google/blog/deciphe...

26.03.2025 17:03 — 👍 0    🔁 0    💬 0    📌 0
The Library of Babel - Wikipedia

The interesting part to me is not how they answer questions. The interesting part is that maybe they will end up proving we are not very complex machines ourselves.

Have you thought about it? Maybe we are just probabilistic dummy generation machines too.

en.m.wikipedia.org/wiki/The_Lib...

25.03.2025 09:43 — 👍 0    🔁 1    💬 2    📌 0

There is a whole field of neuro-symbolic AI that will likely give us more insights in the future. To me, this is like Shannon's work, which was being used before he was able to properly formalize it.

25.03.2025 09:49 — 👍 1    🔁 0    💬 0    📌 0

I was trying to reach out to blog.acolyer.org and realized Adrian retired and it is now offline. Although I am happy for him (and he totally deserves to pass the torch on), what a great piece of content we just lost...

02.02.2025 20:01 — 👍 0    🔁 0    💬 0    📌 0
What are important data systems problems, ignored by research? A blog by and for database architects.

I was recently on a panel on "What are important data systems problems, ignored by research?" with @andypavlo.bsky.social and Allison Lee, moderated by Viktor Leis. Here is the write-up of the discussion: databasearchitects.blogspot.com/2024/12/what...

13.12.2024 07:58 — 👍 64    🔁 12    💬 2    📌 2

Serious question: Are we moving towards a Tuple Spaces revival in Distributed Systems practice?

I see everyone building on top of S3/Object Storage and nobody discussing it at the research level (or mentioning any potential links between the two topics).

inria.hal.science/hal-01631715...

10.12.2024 11:47 — 👍 1    🔁 0    💬 0    📌 0
Object Storage Is All You Need - Justin Cormack, Docker (YouTube video by CNCF [Cloud Native Computing Foundation])

Great talk from Justin Cormack - Object Storage Is All You Need.

It’s widespread at Grafana Labs - Mimir, Loki, Tempo and Pyroscope all follow the pattern.

youtu.be/ei0wwTy6_G4

17.11.2024 18:29 — 👍 49    🔁 12    💬 2    📌 0

While the Lakehouse hype and overall Stream Processing adoption are great to watch (after all those years of doing and discussing it), I'm increasingly concerned about engineering collaboration models and potential overuse.

Anyone else? What are you seeing in your organizations?

27.11.2024 19:23 — 👍 1    🔁 0    💬 0    📌 0
SREcon24 Europe/Middle East/Africa - Are We Really Engineers? (YouTube video by USENIX)

My #SRECon talk is now out! Watch me destroy some earbuds on stage and learn how software engineering compares to other branches of engineering

www.youtube.com/watch?v=xNeS...

27.11.2024 17:41 — 👍 21    🔁 3    💬 0    📌 0
The Rise of Bluesky – Communications of the ACM

A short analysis of what might have contributed to the recent rise of Bluesky. Now in the Blogs section at Communications of the ACM: cacm.acm.org/blogcacm/the...

27.11.2024 18:42 — 👍 20    🔁 6    💬 1    📌 1
Restate: Going Distributed - Community Meeting November 2024 (YouTube video by Restate)

Do you want to learn how @restatedev.bsky.social works under the hood and how its distributed architecture is designed? Then this video is for you 🫵 It also includes a first demo of how it runs distributed and can cope with node failures 💪

🎥 www.youtube.com/watch?v=Lgu3...

21.11.2024 14:46 — 👍 7    🔁 2    💬 0    📌 0
Reply on Bluesky and Decentralization | bryan newbold: This is a reply to Christine Lemmer-Webber's thoughtful (and widely read) "How decentralized is Bluesky really?" blog post. I am so happy and grateful that Christine took the time to write up her tho...

wrote up a reply to @dustyweb.bsky.social's "How decentralized is Bluesky really" blog post

27.11.2024 00:28 — 👍 578    🔁 171    💬 37    📌 35

The story of how SlateDB came to be. Couldn’t have happened without responsive.dev. And now they’re launching RS3, a streaming storage backend, built on SlateDB. 🤩

26.11.2024 17:46 — 👍 16    🔁 4    💬 0    📌 0

I’m fortunate to have a call with the Bluesky backend team for an hour a week, and I’m continually blown away at how good they are at engineering their systems, how thoughtful they are about protocol design, and how smoothly they’ve been handling a huge influx of new users. Seriously impressive

13.11.2024 22:23 — 👍 1251    🔁 127    💬 26    📌 18

New blog post on caching in DataFusion! See how my research is advancing DataFusion’s capabilities and what’s next:
blog.haoxp.xyz/posts/cachin...

28.10.2024 04:07 — 👍 26    🔁 5    💬 1    📌 1

I wrote a short tutorial on using Porcupine to check for linearizability (without needing to deal with the JVM).

notes.eatonphil.com/2024-10-31-c...

01.11.2024 00:20 — 👍 21    🔁 4    💬 0    📌 1
