@mia.pds.parakeet.at
Hi, I'm Mia! Trans (she/her) • programmer (Rust, ATProto @parakeet.at, sometimes more) • photographer • maths/stats nerd • resident of Normal Island. PFP: https://picrew.me/share?cd=QZKgROU6cC

one day I'll make something that doesn't need at least two different database implementations running simultaneously. today is not that day
31.10.2025 23:32 — 0 likes · 0 reposts · 0 replies · 0 quotes

yeah it's the mass tagging thing again. looks like that account fired off quite a few in the last hour or so pdsls.dev/at://did:plc...
31.10.2025 13:43 — 2 likes · 0 reposts · 0 replies · 0 quotes

[image: screenshot of a bluesky user counter showing 40 million users]
number go up
31.10.2025 08:35 — 21 likes · 1 repost · 0 replies · 0 quotes

my kingdom for literally just any media ID in this goddamn file. it's already over 300MB, you can spare an extra few chars per row for it. I have the playlist ID that was on at the time but that's not bloody useful.
30.10.2025 22:48 — 1 like · 0 reposts · 0 replies · 0 quotes

about ready to throw hands with the utter pillock at apple that decided this was a sensible format for the export.
almost like they don't want you to use this data for anything…
(I'm not relying on lastfm data exclusively bc my phone and tablet don't scrobble)
29.10.2025 22:55 — 0 likes · 0 reposts · 0 replies · 0 quotes

it's currently three steps:
• import from csv into duckdb
• process, get metadata, store into staging table
• create records
this allows me to fudge the metadata, which I will 100% need to do, and I'd rather do it in datagrip before pushing (rough sketch of the first two steps below)
This strat will likely break for plays crossing days, but that's not a thing I tend to do, so I can fudge it with lastfm data if that ever happens.
29.10.2025 22:55 — 0 likes · 0 reposts · 1 reply · 0 quotes
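
A minimal sketch of the first two steps, assuming the duckdb crate (duckdb-rs); the file, table, and column names here are invented for illustration, not the ones in the actual export:

```rust
// Minimal sketch of steps 1 and 2, using the duckdb crate (duckdb-rs).
// File, table, and column names are all invented for illustration.
use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open("listens.duckdb")?;

    // step 1: import from csv into duckdb (read_csv_auto infers the schema)
    conn.execute_batch(
        "CREATE OR REPLACE TABLE raw_activity AS
         SELECT * FROM read_csv_auto('play_activity.csv');",
    )?;

    // step 2: process into a staging table that can be inspected and
    // hand-fudged (e.g. in DataGrip) before any records get created
    conn.execute_batch(
        "CREATE OR REPLACE TABLE staging AS
         SELECT song_name,
                artist_name,
                event_start AS played_at,
                play_duration_ms
         FROM raw_activity
         WHERE event_type = 'PLAY_END';",
    )?;

    // step 3 (actually creating the records) reads back out of staging
    Ok(())
}
```

keeping staging in a DuckDB file means the hand-fudging can happen in DataGrip against that same file before anything gets pushed.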

there's a file with listen history to a daily lvl (kinda hourly but not really*) so I think imma link into that to smooth the data out (and I'll get track ID that way too - for unknown reasons, activity doesn't contain any unique media ID)
*a row per song and date, plus a list of the hours that song was played.
but only on start events?
29.10.2025 21:49 — 0 likes · 0 reposts · 1 reply · 0 quotes
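
If that linking pans out, a hypothetical shape for it (same duckdb crate, with invented names for the daily file's table and columns):

```rust
// Hypothetical sketch of linking play activity to the daily listen-history
// table to pick up track IDs; all table and column names are invented.
use duckdb::{Connection, Result};

fn link_track_ids(conn: &Connection) -> Result<()> {
    conn.execute_batch(
        "CREATE OR REPLACE TABLE staging_with_ids AS
         SELECT s.*, d.track_id
         FROM staging s
         -- the daily file keys on (song, date), so join on those;
         -- rows with no match keep a NULL track_id for later fudging
         LEFT JOIN daily_history d
           ON s.song_name = d.song_name
          AND CAST(s.played_at AS DATE) = d.play_date;",
    )?;
    Ok(())
}
```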

truly a cursed file - at some point it just stops including all but one of the timestamps for 24h or so
29.10.2025 21:48 — 0 likes · 0 reposts · 1 reply · 0 quotes

progressively adding more and more things to the server running parakeet but it seems to be dealing okay.
I would get another but I don't really want many more atm

everyone else seems to be using svelte so maybe I'll finally give it a proper shot instead of just using react like I always do.
28.10.2025 23:54 — 1 like · 0 reposts · 2 replies · 0 quotes

stewing on a fun little idea off the back of me hopefully dumping years of music history into my PDS tomorrow but I fear I may have to write frontend code again
28.10.2025 23:54 — 4 likes · 0 reposts · 2 replies · 0 quotes

this should've been prefixed with "I am not a database engineer" but this is what I've picked up from too many hours of researching how to get this smaller and faster
28.10.2025 23:31 — 1 like · 0 reposts · 1 reply · 0 quotes

but normal columnar stores flat out don't work for this use case imo. you need high write speed too but can't get it, because they want large batched writes, not many single ones.
Tiger Data's PG extensions may provide solutions but I haven't tested them yet (likewise oriole but that broke last time)
tbqh idk - it might be possible to do compression per partition (if you could settle on a good partition - date??) or per page (but I'm not in the weeds enough yet to know the implications of this one). When you start venturing further you get interesting Qs about columnar stores and column compression
28.10.2025 23:31 — 1 like · 0 reposts · 1 reply · 0 quotes
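
For the per-partition idea, a hypothetical sketch of roughly what that looks like with TimescaleDB (one of Tiger Data's PG extensions), untested here and with invented table/column names, driven via the postgres crate:

```rust
// Hypothetical sketch of per-partition compression with TimescaleDB
// (one of Tiger Data's PG extensions), via the postgres crate.
// Untested; table and column names are invented.
use postgres::{Client, NoTls};

fn enable_compression() -> Result<(), postgres::Error> {
    let mut client = Client::connect("host=localhost user=mia dbname=listens", NoTls)?;
    client.batch_execute(
        "
        -- partition the table into time-based chunks (date as the partition key)
        SELECT create_hypertable('plays', 'played_at');

        -- switch old chunks to columnar storage, segmented per track
        ALTER TABLE plays SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'track_id'
        );

        -- recent chunks stay row-oriented (fast single-row writes);
        -- chunks older than a week get compressed in the background
        SELECT add_compression_policy('plays', INTERVAL '7 days');
        ",
    )?;
    Ok(())
}
```

the hybrid layout is how it sidesteps the batched-write problem: recent chunks stay row-oriented and take single-row inserts at full speed, and only cold chunks get the columnar compression treatment.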

something something being locked must feel great for the database
28.10.2025 19:18 — 3 likes · 0 reposts · 0 replies · 0 quotes

I'm 90% sure you could totally run a relay+appview+cdn of the full (Bluesky) network for significantly under £600 and have it be usable (perf wise).
might even be able to do HA/redundancy for that too.
(I know this isn't necessarily apples/apples but alas)
[image: a screenshot of Jaeger, a tracing tool. A trace is open and a very tall waterfall is shown (there are a lot of calls to a DB)]
pictured: niagara falls
26.10.2025 14:02 — 1 like · 0 reposts · 0 replies · 0 quotes

I worked out the seemingly undocumented arcane incantations to get otel+axum working properly and idk why what I did fixed it.
Worked first time with tonic after that tho, just need to get the trace ID passed over.
want to get it plugged into the DB too but idk how. think I'd need support inside diesel?
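
For reference, the usual shape of that wiring, as a sketch assuming the opentelemetry ~0.21 era crates (opentelemetry-otlp, tracing-opentelemetry); these APIs get reshuffled between releases, so treat this as a starting point rather than the exact incantation:

```rust
// Sketch of one common way to wire tracing + OpenTelemetry into axum,
// assuming opentelemetry ~0.21 / opentelemetry-otlp ~0.14 era APIs
// (these crates reshuffle their public APIs often).
use axum::{routing::get, Router};
use opentelemetry_otlp::WithExportConfig;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

#[tokio::main]
async fn main() {
    // export spans over OTLP/gRPC (tonic) to a local collector or Jaeger
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        .install_batch(opentelemetry_sdk::runtime::Tokio)
        .expect("failed to install OTLP tracer");

    // bridge `tracing` spans into OpenTelemetry, alongside normal log output
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    let app = Router::new().route("/", get(|| async { "hello" }));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

the tracing-opentelemetry layer is what bridges axum's tracing spans into OTel; per-request spans (e.g. tower-http's TraceLayer) and the tonic trace-ID propagation sit on top of this.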
I would happily use rust for backend and systems stuff and swift for apps/GUI stuff, that sounds ace tbh.
24.10.2025 23:00 — 1 like · 0 reposts · 0 replies · 0 quotes

half the reason I don't use android is that I refuse to develop for it, because I tried it and hate it (I have lost weeks to JNI), but this is getting closer to nice
(yes I tried flutter and RN and I don't love either)
getting close to being able to write android apps without constantly wanting to yeet the laptop out the window, nice.
if/when this can link into UI stuff, we'll be golden.
it took way too long to get events pushing to jaeger at all - now I (just) need to get all the correct scopes and info recorded.
doing the inter-service request linking is going to be an experience, too.
I stand by my comment from a while ago that opentelemetry is painful. I have it half working, but not in the way it should be, and there's a random warning sometimes. absolutely wonderful stuff.
how much of this is the tracing and Axum integrations? idk.
it'd be good to get the car bug fixed first but I need some test data more than I need *all* test data. would like better metrics first tho*
*I'm hoping jacquard and its zero-copy deserialisation might improve consumer perf but want to test that properly
"Oxford commas are a sign you write with ai" I will find such a unique way to rip out your spine that they'll make a movie about it
23.10.2025 16:50 — 8709 likes · 3350 reposts · 143 replies · 317 quotes

there's a lot of data that can and should be compressed (and I think you could pull strong ratios out), but the hard part is working out how to do that without killing the read performance.
It'll be interesting to see if bluesky's kvdb ends up having any large-scale compression on it too