Alex Miller

@alexmillerdb.bsky.social

Database Papers as a Service

1,880 Followers  |  130 Following  |  309 Posts  |  Joined: 22.10.2024

Latest posts by alexmillerdb.bsky.social on Bluesky

Kuzu just folded/pivoted.
> Kuzu is working on something new! We will no longer be actively supporting KuzuDB.
is on kuzudb.com

Me thinking a startup has promising tech has so far been a kiss of death to the company. An "Inverse Alex's Tech Opinions" fund might be very profitable. 🤔

10.10.2025 18:11 - 👍 1    🔁 0    💬 1    📌 1

Cool!!! I'll give it a try sometime over the next few days when I have a good chunk of time and let you know how it goes!

10.10.2025 18:00 - 👍 1    🔁 0    💬 1    📌 0

There was an accident with the recording where audio wasn't captured, so instead we can offer a recording from one of Jakob's practice runs on twitch: www.twitch.tv/videos/25845...

07.10.2025 17:26 - 👍 7    🔁 2    💬 1    📌 1

Had a fun time at the South Bay Systems meetup last night. Thanks @yugabytedb.bsky.social for hosting!

@codedrift.social gave a great talk on WebAssembly: what it is (and isn't), how it connects to WASI, and promising projects. He cuts through a lot of the hype vs. reality. Recording coming soon.

03.10.2025 22:38 - 👍 25    🔁 8    💬 1    📌 1
SwiftWave: SwiftWave is a self-hosted open source lightweight PaaS solution. It is designed to be easy to use and deploy applications.

swiftwave.org looked interesting for small and simple self hosting

30.09.2025 20:19 - 👍 4    🔁 0    💬 1    📌 0

I will note that “scan sharing” seems specifically inter-query. If you have a single query that scans the same table multiple times and you want to coalesce that to scanning only once, that seems to be classified under subplan reuse instead?
E.g. link.springer.com/content/pdf/...

29.09.2025 00:50 - 👍 0    🔁 0    💬 0    📌 0
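
For readers unfamiliar with the term, here is a minimal sketch of the inter-query idea, assuming a toy in-memory table. `SharedScan`, `register`, and `run` are hypothetical names made up for illustration and are not taken from any of the linked slides or papers: several queries attach to one pass over the table instead of each scanning it separately.

```python
class SharedScan:
    """One pass over a table feeds every registered query (illustrative only)."""

    def __init__(self, table):
        self.table = table
        self.consumers = []  # list of (predicate, sink) pairs, one per query

    def register(self, predicate, sink):
        # Each concurrent query contributes a filter and a place to put its rows.
        self.consumers.append((predicate, sink))

    def run(self):
        rows_read = 0
        for row in self.table:  # the table is read exactly once
            rows_read += 1
            for predicate, sink in self.consumers:
                if predicate(row):
                    sink.append(row)
        return rows_read


# Toy data: two independent queries over the same table.
table = [{"id": i, "price": i * 10} for i in range(5)]
cheap, even = [], []

scan = SharedScan(table)
scan.register(lambda r: r["price"] < 30, cheap)
scan.register(lambda r: r["id"] % 2 == 0, even)
rows_read = scan.run()
```

Both result lists are filled while the table is read once. Real systems also have to handle queries that attach mid-scan, which this sketch ignores.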

[ASPLOS'25] Fusion: An Analytics Object Store Optimized for Query Pushdown
www.cs.princeton.edu...

Tightly integrating an Iceberg catalog with an object store means that one could make file-format aware erasure coding decisions, to permit pushing down filters and aggregations.

28.09.2025 23:42 - 👍 14    🔁 4    💬 0    📌 0

I think you’re talking about scan sharing? 15721.courses.cs.cmu.edu/spring2016/s...

I don’t know the OG citation for this. Andy cites graphs from 15721.courses.cs.cmu.edu/spring2016/p... and ir.cwi.nl/pub/12225/12... also looks pretty reasonable.

28.09.2025 20:28 - 👍 2    🔁 0    💬 1    📌 0

[VLDB] Towards Principled, Practical Document Database Design
www.vldb.org/pvldb/v...

If you've ever wished that there was a document database equivalent for relational databases' 3NF-style schema design guidance, then this is the paper for you.

23.09.2025 17:23 - 👍 10    🔁 0    💬 1    📌 0

Err, I mean, I guess Yugabyte is also not linearizable and snapshot isolation, but just because of HLCs being inaccurate. LeanXcale is very intentionally not linearizable, and they mention that you have to do some extra work to even get session consistency out of it.

23.09.2025 07:17 - 👍 0    🔁 0    💬 0    📌 0
Elastic scalable transaction processing in LeanXcale: Scaling ACID transactions in a cloud database is hard, and providing elastic scalability even harder. In this paper, we present our solution for elast…

There’s a database startup called leanXcale which is the only non-linearizable snapshot isolation system that I know of. www.sciencedirect.com/science/arti... is a pretty decent overview, but there’s a YouTube talk somewhere too.

23.09.2025 07:14 - 👍 0    🔁 0    💬 1    📌 0

Do you think he’s just been “I told you so”-ing people since 1964? 😂

22.09.2025 21:38 - 👍 4    🔁 0    💬 0    📌 0

If you’re not ramped up on WCOJ algorithms, a lot of the papers are complicated to get through, but I thought TreeTracker Join arxiv.org/pdf/2403.01631 was pretty comprehensible and shows the minimal difference for NLJ. Or see justinjaffray.com/a-gentle-ish...

21.09.2025 16:56 - 👍 4    🔁 0    💬 1    📌 0

Even as a disliker of YouTube videos as a way to learn things, I found www.youtube.com/watch?v=-XmJ... easier to understand than the paper www.cs.ox.ac.uk/dan.olteanu/... for factorized database work

Extending SQL to Return a Subdatabase dl.acm.org/doi/pdf/10.1... also seems related?

21.09.2025 16:56 - 👍 5    🔁 0    💬 1    📌 0

S1&2 of www.cidrdb.org/cidr2023/pap... paints graph databases 2.0

Then read Kuzu: www.cidrdb.org/cidr2023/pap...
And their blog posts are great intros to the key new things:
* blog.kuzudb.com/post/what-ev...
* blog.kuzudb.com/post/factori...
* blog.kuzudb.com/post/wcoj/

21.09.2025 16:56 - 👍 8    🔁 0    💬 1    📌 1

[arXiv] On the Theoretical Limitations of Embedding-Based Retrieval
arxiv.org/abs/2508.2...

It's impossible to retrieve all combinations of pairs of documents post-embedding. Thus, there are use cases that vector search won't do well at. Conversely, BM25 excels in these cases.

21.09.2025 03:38 - 👍 11    🔁 2    💬 0    📌 0

Still looking! Would love to try it out

17.09.2025 20:01 - 👍 0    🔁 0    💬 2    📌 0

And, the alternative is that you either get a table as a giant list of numbers read out at you, or graphs and diagrams as nothing at best or random keywords at worst. So, even a bit of an inaccurate summary is still an improvement.

11.09.2025 22:43 - 👍 0    🔁 0    💬 0    📌 0

The diagram summaries from accidentally figure heavy papers I ran through it have turned into text that was sufficiently reasonable that it seemed to fit. I checked the results of the first one when I realized what was happening mid-listen, and they seemed reasonable interpretations.

11.09.2025 22:43 - 👍 1    🔁 0    💬 1    📌 0

I have only used full text + no additional context mode, but for a person without the superhuman blind person powers of being able to listen to a code listing and actually make sense of it, the AI generated summaries are a huge step up.

11.09.2025 21:25 - 👍 1    🔁 0    💬 0    📌 0

I text-to-speech papers often, and www.paper2audio.com finally did the one thing that I was hoping AI would enable: replace tables/figures/diagrams with a summary of what is being shown. It makes table/diagram-heavy papers actually comprehensible. There's iOS and Android apps, and it's free.

11.09.2025 21:21 - 👍 9    🔁 2    💬 3    📌 0

matklad.github.io/2023/10/23/u... and blog.janestreet.com/putting-the-... pitched better code reviewing tooling, and I really hope something polished happens there at some point too. Reviewing PRs in VSCode (local or web) is about the best experience I've had so far.

09.09.2025 19:01 - 👍 1    🔁 0    💬 0    📌 0

A bit tangential now, but I'm also a bit grumpy that there's never really been a project which took off to add issue tracking to git somewhat natively, since you can store non-vcs objects into git objects too. github.com/git-bug/git-... is about the best attempt that I've found so far.

09.09.2025 19:01 - 👍 0    🔁 0    💬 1    📌 0

jj-vcs.github.io is worth a bit of a try. I bounced off as the mental overhead was too much, but I did see the promise and the workflow outlined in ofcr.se/jujutsu-merg... did deliver the better experience it pitched when work cleanly divides into non-overlapping PRs.

08.09.2025 06:40 - 👍 0    🔁 0    💬 1    📌 0

Specifically for go I see github.com/mmcloughlin/.... It’s worth double checking minio’s highwayhash implementation though, because for this sort of stuff specifically, they tend to be the ones who care the most in go about highly efficient data crunching routines.

06.09.2025 08:19 - 👍 1    🔁 0    💬 1    📌 0

Depending on your CPU, the AES derived ones are generally fastest for large chunks of data. Gxhash if you’re in rust, or meowhash is a little slower but more available. Smhasher has performance tests and tons of hash functions, so it’s useful to scrape for a quick answer on best hash function.

06.09.2025 08:14 - 👍 3    🔁 0    💬 1    📌 0

[VLDB] NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance
www.vldb.org/pvldb/v...

It feels like a follow-on/improvement to ACORN. Also interesting to see HNSW built directly on a graph database working well.

05.09.2025 05:11 - 👍 3    🔁 0    💬 0    📌 0

github.com/documentdb/d...
Really advanced stuff going on here :p

26.08.2025 06:58 - 👍 0    🔁 0    💬 1    📌 0

This almost happened, where portage ebuilds were getting wrapped under bazel so ChromeOS could be built via bazel, but the effort got cancelled in google :'(
chromium.googlesource.com/chromiumos/b...

25.08.2025 19:23 - 👍 2    🔁 0    💬 0    📌 0

Do you prefer it as an explicit separate materialized view, a rethinkdb/TanStack subscription API, or do you prefer the ReadySet style where you send the dumb compute everything query against the base table but it’s served back to you via IVM?

23.08.2025 06:37 - 👍 1    🔁 0    💬 1    📌 0
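
As a rough illustration of what IVM (incremental view maintenance) means here, the sketch below keeps a count-per-key view up to date from individual inserts and deletes rather than rescanning the base table. It is a toy under stated assumptions: `CountByKeyView` and `recompute` are hypothetical names for this example only, and this is not how ReadySet, rethinkdb, or TanStack implement it.

```python
from collections import Counter


class CountByKeyView:
    """Toy incrementally maintained view: SELECT key, COUNT(*) ... GROUP BY key."""

    def __init__(self, key):
        self.key = key
        self.counts = Counter()

    def on_insert(self, row):
        # Apply only the delta for the inserted row.
        self.counts[row[self.key]] += 1

    def on_delete(self, row):
        k = row[self.key]
        self.counts[k] -= 1
        if self.counts[k] == 0:
            del self.counts[k]  # drop empty groups, as a fresh query would

    def read(self):
        return dict(self.counts)


def recompute(base, key):
    """The 'dumb compute everything' query: full scan of the base table."""
    return dict(Counter(r[key] for r in base))


base = []
view = CountByKeyView("status")
for row in [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
    {"id": 3, "status": "done"},
]:
    base.append(row)
    view.on_insert(row)

removed = base.pop(0)  # delete id 1 from the base table
view.on_delete(removed)
```

After these changes, `view.read()` and `recompute(base, "status")` agree, but the view got there by applying one small delta per write.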
