A thing I've been working on for the past year: an LLM benchmark on the dreaded Italian medicine faculty entry exam. I will present it in two weeks at @ailc-nlp.bsky.social CLiC-it in Cagliari!
10.09.2025 12:47
@ruggsea.bsky.social
AI/NLP research at @uni-graz.at, (Data) Journalism for scomodo.org. Before: AI policy & Data science @_interface_eu, geopolitical analysis for @Geopoliticainfo
I think that is a big conclusion to make based on loosely validated LLM simulations (still a big fan of their work, but generative simulation is a relatively young research field)
10.09.2025 10:33
from this awesome blogpost: snats.xyz/pages/articl...
09.09.2025 09:27
Life is going from one cool spot to train AI models to another
In this fancy Berlin library you can listen to vinyls while you do it!
Me looking very dumb while pointing at things at two academic events:
1. Pointing at logits inside @repligate.bsky.social's loom for the "Braive New World" conference at @uni-graz.at
2. Pointing at my poster at @ic2s2.bsky.social last week on improving LLM agentic natural conversation synthesis
Only prisoners have time to read, and if you want to engage in a twenty-year long research project funded by the state, you will have to kill someone.
Sorry for the Fisher posting but it's so good
criticallegalthinking.com/2013/05/14/a...
12.07.2025 09:54
I look really bad/funny in this picture, but I am glad I was given the opportunity to talk about LLM research to an audience of cool researchers!
Fun fact: I also held a hands-on session that involved playing around with Bluesky data!
Really can't hold a candle to Metaculus comments, real alpha there
17.06.2025 23:35
Karpathy on training neural networks: you should go slowly and be paranoid
me, while vibecoding torch code: what if i just increase the paranoid part?
study results partially depend on this hierarchical team coordination ability; I wonder how much the results would change if you gave the agents some less authority-loving names
03.06.2025 19:22
basically, a multi-agent collaborative task with three agents (with military names). The agents see their name in the prompt; the one called Alpha then appears to be biased toward being the team leader
03.06.2025 19:20
how not to name your ai agents
03.06.2025 19:19
gen-z semantic embeddings engineering from my colleague
27.05.2025 10:48
I am aware some of you have seen this on the other site where I stole it from, but it's too interesting not to share here
21.05.2025 10:53
backpropagation was inspired by freud
21.05.2025 10:53
"How about we pull over for a bit and get some rest?" - GPT-4, when it's their turn to drive
20.05.2025 21:23
I wish all textbooks were written like this one from @jurafsky.bsky.social
18.05.2025 23:14
Mark Fisher quoting Bifo in the Time-wars essay
16.05.2025 09:56
was true in 2012 and it is true now
16.05.2025 09:56
Stumbled upon new ammunition for my personal struggle against RLHF: the tendency of models trained this way to give correct-sounding answers makes us believe in wrong things more
arxiv.org/abs/2409.12822
I think this is fine in the currently prevalent reasoning-training paradigm (rule-based verifiable rewards, i.e. "did we get to the objectively right answer?")
As for reasoning getting longer and longer while staying meaningless, I always wonder whether implementing thinking-length penalties would work or not
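The idea above can be sketched in a few lines: a rule-based verifiable reward (exact match against a gold answer) with an optional penalty on thinking tokens beyond a budget. This is a toy illustration, not any framework's actual API; the function name, the whitespace "tokenizer", and the default budget are all made up.

```python
# Toy sketch: rule-based verifiable reward with a thinking-length penalty.
# All names and defaults here are illustrative assumptions.

def verifiable_reward(answer: str, gold: str, thinking: str,
                      length_penalty: float = 0.0,
                      budget_tokens: int = 512) -> float:
    """Reward 1.0 for an exact-match correct answer, 0.0 otherwise,
    minus a penalty per thinking token beyond a fixed budget."""
    base = 1.0 if answer.strip() == gold.strip() else 0.0
    # Crude token count: whitespace split stands in for a real tokenizer.
    n_thinking = len(thinking.split())
    excess = max(0, n_thinking - budget_tokens)
    return base - length_penalty * excess
```

With `length_penalty=0.0` this reduces to plain RLVR; a small positive value makes over-long traces strictly worse than short ones at equal accuracy, which is the open question the post raises.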
I would doubt that; it would be cool if they did (like, fucked up but technically cool), but sometimes the model says it has been "instructed" to talk about that, smells like a system prompt to me
14.05.2025 22:00
"Noo, you were not supposed to be a demagogic fascist, you were supposed to be a monarchist fascist"
14.05.2025 12:07
As an extension of that, models overfitting on the average of human opinion / optimizing for raising dopamine in users above everything else, with very bad consequences
13.05.2025 22:39
About concerns: 1. should be bias
Not just for diversity/inclusion's sake, but to avoid models being wrong because the training data is biased
Looks important for my rec-algorithm people: hard confirmation that Google uses essentially hybrid rankings for search: PageRank, BERT embeddings, and a secret ingredient (user data)
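A hybrid ranking like the one described usually boils down to a weighted combination of the individual signals. A minimal sketch, assuming made-up weights and treating "user data" as an opaque per-document score; this is not Google's actual formula, just the general shape of a hybrid ranker.

```python
# Toy hybrid ranking score: link authority (PageRank-style) +
# embedding similarity + user signal. Weights are illustrative.
import math

def hybrid_score(pagerank: float,
                 query_emb: list[float], doc_emb: list[float],
                 user_signal: float,
                 w: tuple = (0.3, 0.5, 0.2)) -> float:
    # Cosine similarity between query and document embeddings.
    dot = sum(q * d for q, d in zip(query_emb, doc_emb))
    norm = math.sqrt(sum(q * q for q in query_emb)) * \
           math.sqrt(sum(d * d for d in doc_emb))
    cosine = dot / norm if norm else 0.0
    # Weighted blend of the three signals.
    return w[0] * pagerank + w[1] * cosine + w[2] * user_signal
```

Documents are then sorted by this score; in practice each component would itself be a learned model rather than a raw number.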
13.05.2025 21:07
Make the final instruction very ominous and see which model hesitates to call it
13.05.2025 08:32