
Louis Maddox

@permutans.bsky.social

Combinatorially curious https://spin.systems

172 Followers  |  120 Following  |  3,098 Posts  |  Joined: 22.10.2024

Latest posts by permutans.bsky.social on Bluesky

Peyman Milanfar says that the “Drift” models are hyped and not that deep

or not then

10.02.2026 13:16 — 👍 0    🔁 0    💬 0    📌 0
Loss and f1 plots each with a series heading off in the wrong direction

Weird f1 but ok 📉

10.02.2026 09:58 — 👍 0    🔁 0    💬 0    📌 0
Need help with my multi-monitor setup. Is this layout optimal?
System > Display
Select a display to change the settings for it. Drag displays to rearrange them

17 monitors in the best known optimal box packing, which looks like chaos

10.02.2026 03:54 — 👍 13    🔁 1    💬 0    📌 0
Preview
Generative Modeling via Drifting Generative modeling can be formulated as learning a mapping f such that its pushforward distribution matches the data distribution. The pushforward behavior can be carried out iteratively at inference...

new SotA diffusion model usurper just dropped arxiv.org/abs/2602.047...

10.02.2026 03:06 — 👍 0    🔁 0    💬 1    📌 0
Median label F1 score rocketing up above the previous plot series

😮‍💨🩵🚀

10.02.2026 03:02 — 👍 0    🔁 0    💬 0    📌 0

No thoughts only median label F1

10.02.2026 02:59 — 👍 0    🔁 0    💬 1    📌 0

and then a miracle occurred (I remembered the existence of bert-large)

10.02.2026 02:11 — 👍 1    🔁 0    💬 0    📌 0

Problem: W&B plots can only group by a single metric
Solution: …throw yr HP combos in a Pydantic computed field?
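
A minimal sketch of that workaround, assuming Pydantic v2 (the SweepConfig model and its field names here are hypothetical):

```python
# Hypothetical sketch: collapse several hyperparameters into a single
# string so W&B's single-key grouping can split runs on the combination.
from pydantic import BaseModel, computed_field

class SweepConfig(BaseModel):
    lr: float
    batch_size: int
    scheduler: str

    @computed_field
    @property
    def hp_combo(self) -> str:
        # One derived key encoding the whole HP combination
        return f"lr={self.lr}-bs={self.batch_size}-sched={self.scheduler}"

cfg = SweepConfig(lr=3e-5, batch_size=32, scheduler="cosine")
print(cfg.hp_combo)  # lr=3e-05-bs=32-sched=cosine
# wandb.init(config=cfg.model_dump(), group=cfg.hp_combo)  # then group by hp_combo
```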

09.02.2026 18:01 — 👍 0    🔁 0    💬 0    📌 0
Query: “is there a name for the metric of ‘count of micro F1 scores that are zero’?”

Claude response:

Investigated metric terminology and concluded no standard name exists

There isn't a widely canonicalized name for this exact quantity, but several closely related concepts circulate in practice:

"Class coverage" (or "label coverage") is probably the nearest established term - it's usually expressed as the fraction (or count) of classes where the model achieves any correct predictions (F1 > 0). Your metric is the complement: the count of unrecovered / missed classes. You'll see this phrasing a lot in NER and extreme multi-label classification papers.

Some other terms people use for essentially the same idea or close relatives:
• "Zero-F1 classes" or "zero-recall classes" - descriptive rather than a formal name, but common in error analysis sections of papers.
• "Class dropout" — informal practitioner term for when a model simply never predicts certain labels.
• In extreme classification (XML) literature, labels that never get predicted are sometimes called "missing labels", or "tail label recall" is tracked as a proxy, since zero-F1 classes tend to cluster in the long tail.
• "Macro recall at k=0" is a slightly tortured but precise way to frame it: what fraction of true classes have recall exactly equal to zero.

If you're reporting this in a paper or dashboard, "class coverage" (with its complement "zero-F1 class count") is probably the most immediately understood framing. Were you looking for this in the context of a specific task (NER, multi-label, etc.)? The conventions drift a bit across subfields.


Today in poorly named things: “XML” (not the serialisation format), “dropout” (not the Hintonian technique but the multiclass classifier learner capacity metric)

09.02.2026 16:21 — 👍 0    🔁 0    💬 0    📌 0

Developing an elaborate bit where I scale and parallelise my coffee/espresso consumption in line w Modal GPU sweep size, precision, subsampling parameters, things of that nature

09.02.2026 14:42 — 👍 0    🔁 0    💬 0    📌 0
A pruning chart showing hyperparam selection in increments (halving like a ddmin algorithm)

Everything Is ddmin araffin.github.io/post/hyperpa...

Hyperband arxiv.org/abs/1603.06560
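
A toy sketch of the successive-halving loop at the core of Hyperband (the candidate configs and scoring function here are invented for illustration):

```python
import math, random

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta of configs each round, multiplying the budget."""
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        configs = scored[: max(1, len(configs) // eta)]  # prune the worst portion
        budget *= eta                                    # survivors train longer
    return configs[0]

random.seed(0)
candidates = [{"lr": 10 ** random.uniform(-5, -2)} for _ in range(8)]
# Toy objective: reward learning rates near 1e-3.5, scaled by budget
best = successive_halving(candidates, lambda c, b: -abs(math.log10(c["lr"]) + 3.5) * b)
print(best)
```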

09.02.2026 11:50 — 👍 2    🔁 0    💬 0    📌 0
Preview
'Reckless' driver who killed friend in Scunthorpe crash jailed Ben Bailey, 36, crashed in Scunthorpe after driving at high speed while inhaling nitrous oxide.

There've been many viral images/videos of people driving cars while doing the stuff, and deaths from drug driving (www.bbc.co.uk/news/article...). I wouldn't say it's safe and minimise it like this! I've also seen warnings posted on socials that the balloon pushers who descend on venues at closing time steal phones

08.02.2026 17:14 — 👍 0    🔁 0    💬 0    📌 0

Who called them hallucinations and not ÆIslop's Fables

08.02.2026 14:59 — 👍 1    🔁 0    💬 0    📌 0

“The point about formal notation is *not* to output it… it is a tool for thought.

The point about LLM generated [code] is that you ideally don’t [formally notate]. Saying the LLM could output [formal notation] is missing the point”

✍️ Transcript (also in alt text): gist.github.com/lmmx/9838843...

08.02.2026 13:38 — 👍 0    🔁 0    💬 0    📌 0
Tweet from Politico

“The Epstein scandal is taking down Europe's political class. In the US, they're getting a pass.”

Greyscale photomontage of ex prince Andrew, Keir Starmer, Fergie(?) and Peter Mandelson vs Musk, Trump, Bannon, and Larry Ellison(?)

Another darker greyscale photomontage of Thiel, Epstein, Musk, Trump

Big day for ominous greyscale photomontageurs

08.02.2026 02:22 — 👍 0    🔁 1    💬 0    📌 0
Exact duplicate text span annotations being removed

Subset span annotations being removed

Ladies and gentlespans... We got 'em

08.02.2026 01:56 — 👍 0    🔁 0    💬 0    📌 0
Preview
Here are all the phones that support bypass charging, and why you should use it Few phones on the market can draw power directly from the wall, bypassing the battery while charging. But are the benefits tangible enough?

Today I learnt about "bypass charging", a phone feature where it can run directly off the mains without using the battery at all. No battery charging, less heat, less battery wear. Nifty.

www.androidauthority.com/phones-with-...

07.02.2026 20:56 — 👍 1    🔁 2    💬 0    📌 0

Opus 4.6 flubbing a dedup task baddd

07.02.2026 20:06 — 👍 0    🔁 0    💬 1    📌 0

They're turning the AIs into golems now

07.02.2026 17:42 — 👍 0    🔁 0    💬 0    📌 0

hooooollllly....

this is so cool, i did not expect it to actually work.

07.02.2026 07:49 — 👍 62    🔁 7    💬 1    📌 2
Loss curve showing a new Viterbi loss line minimising in transit and coming from comically high up

lmao

07.02.2026 01:28 — 👍 0    🔁 0    💬 0    📌 0

Currently: vibe Viterbi’ing
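
For anyone following along, a minimal log-space Viterbi decode sketch in numpy (a generic HMM formulation; nothing here is from the actual run):

```python
import numpy as np

def viterbi(log_T, log_E):
    """log_T: (n_states, n_states) transition log-probs;
    log_E: (n_steps, n_states) per-step emission log-scores."""
    n_steps, n_states = log_E.shape
    score = log_E[0].copy()
    back = np.zeros((n_steps, n_states), dtype=int)
    for t in range(1, n_steps):
        cand = score[:, None] + log_T      # score of every (prev, cur) pair
        back[t] = cand.argmax(axis=0)      # best predecessor for each state
        score = cand.max(axis=0) + log_E[t]
    path = [int(score.argmax())]           # best final state, then backtrace
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```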

07.02.2026 01:23 — 👍 0    🔁 0    💬 1    📌 0

Electra: underperformed vanilla BERT (and BERT cased seems to learn the task faster, though still reaches the same F1 limit, which I can only interpret as my data limit showing)

I feel like I’m being pranked wdym BERT is SotA

07.02.2026 01:03 — 👍 0    🔁 0    💬 0    📌 0

Ah sweet! I worked w/ SMILES as an intern in ddx ML a decade ago, RF regressors etc predicting pharmacodynamic properties

It reminds me of how ML training shops ‘launder’ user data into datasets by a similar trick

Also surely could be used to amass a lib of code refactoring patterns [probably $$$]

07.02.2026 01:00 — 👍 0    🔁 0    💬 0    📌 0

Fun fact I learnt today: a method for data augmentation in NLP is ‘back-translation’ (translating to another language and back)
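
A hedged sketch of the trick using the MarianMT opus-mt checkpoints on the Hugging Face hub (assuming those model names; any en↔X pair works):

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return [tok.decode(t, skip_special_tokens=True) for t in out]

def back_translate(texts):
    # en -> fr -> en: paraphrases that preserve meaning but vary surface form
    fr = translate(texts, "Helsinki-NLP/opus-mt-en-fr")
    return translate(fr, "Helsinki-NLP/opus-mt-fr-en")

print(back_translate(["The quick brown fox jumps over the lazy dog."]))
```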

07.02.2026 00:38 — 👍 4    🔁 1    💬 1    📌 0

L + Ratio + Cope + Seethe + Send + Sync + 'static

06.02.2026 19:24 — 👍 275    🔁 63    💬 2    📌 1

am I allowed to question minibatching received wisdom or

06.02.2026 18:08 — 👍 0    🔁 0    💬 0    📌 0

🪄🐝! It's cool, I've looked through all the things it stores; one is a mysterious telemetry binary

06.02.2026 16:50 — 👍 0    🔁 0    💬 0    📌 0

becoming a retvrn guy for WordPiece
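
A quick illustration of what WordPiece does, via the bert-base-uncased tokenizer (the exact pieces depend on the checkpoint's learned vocab):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# Rare words get split into '##'-prefixed subword pieces
print(tok.tokenize("combinatorially curious"))
```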

06.02.2026 15:21 — 👍 0    🔁 0    💬 0    📌 1
A bunch of loss/F1/best-F1 plots in Weights & Biases, with the F1 for the green deberta-v3 run (labelled tok for the tokeniser fix and cos for cosine-scheduled warmup) shooting above the previous best epoch F1 for non-BERT architectures [ModernBERT and deberta-v3-base with other settings]

we are so deberta-v3-back 🚀

06.02.2026 15:02 — 👍 1    🔁 0    💬 0    📌 0
