Valerio Pepe @valeriopepe - Bluesky Profile

Do AI agents ask good questions? We built “Collaborative Battleship” to find out—and discovered that weaker LMs + Bayesian inference can beat GPT-5 at 1% of the cost.

Paper, code & demos: gabegrand.github.io/battleship

Here's what we learned about building rational information-seeking agents... 🧵🔽

27.10.2025 19:17 — 👍 20 🔁 9 💬 1 📌 2

I'm really excited about this work (two years in the making!).

We look at how LLMs seek out and integrate information and find that even GPT-5-tier models are bad at this, meaning we can use Bayesian inference to uplift weak LMs and beat them... at 1% of the cost 👀

28.10.2025 19:39 — 👍 2 🔁 0 💬 0 📌 0

Title page of the paper: WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives, with two figures at the bottom Left: Our figure 1 -- comparing previous work, which usually predicted the connective given the arguments (grounded in the world); our work flips this premise by getting models to use their knowledge of connectives to predict something about the world. Right: Our main results across 7 types of connective senses. Models are especially bad at Concession connectives.

"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LM's can't seem to! 📷

We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.

New paper w/ Daniel, Will, @jessyjli.bsky.social

16.10.2025 15:27 — 👍 33 🔁 10 💬 2 📌 1

Common Descent Machine learning begins with the perceptron

It’s week 4 and probably time to start doing machine learning in machine learning class. We begin with the only nice thing we have: the perceptron.

23.09.2025 14:31 — 👍 40 🔁 3 💬 1 📌 0

Out of curiosity (and my own ignorance), how are teachers aware of students' socioeconomic backgrounds when the students are this young?

I can think of clothing as an immediate signal, and, over time, getting to know parents (and thus their occupations). Are these the main ways this is inferred?

05.09.2025 17:18 — 👍 1 🔁 0 💬 1 📌 0

"for too long has my foot been allowed to carry my body" I say, as I load a shotgun and aim at it.

28.08.2025 16:26 — 👍 16 🔁 1 💬 0 📌 0

Can LLMs learn social skills by playing games? A new open-source framework, TextArena, pits large language models against each other in competitive environments designed to test and improve their communication skills.

Can LLMs learn social skills by playing games?
A blogpost on human-model interaction, games, training and testing LLMs
research.ibm.com/blog/LLM-soc...
🤖📈🧠

24.07.2025 14:16 — 👍 5 🔁 1 💬 0 📌 0

> looking for a coffee
> have to judge if their coffee is burnt or flavorful
> "we have a Cimbali coffee machine"

> buy coffee

> it's burnt

18.07.2025 18:55 — 👍 1 🔁 0 💬 0 📌 0

me: "I want my research to be featured in the NYT"

Genie: (grinning)

23.06.2025 13:30 — 👍 74 🔁 19 💬 0 📌 0

Emergent Misalignment on a Budget — LessWrong TL;DR We reproduce emergent misalignment (Betley et al. 2025) in Qwen2.5-Coder-32B-Instruct using single-layer LoRA finetuning, showing that tweaking…

We take this as evidence that while misalignment directions may exist, the narrative is probably quite nuanced, and EM is not governed by a single vector, as some hypothesized in the aftermath of the original paper.

See it for yourself at:
www.lesswrong.com/posts/qHudHZ...

08.06.2025 20:38 — 👍 1 🔁 0 💬 0 📌 0

However, the steered models often are more incoherent than the finetuned ones, suggesting that emergent misalignment is not entirely guided by a steering vector. The vectors themselves are also not very interpretable, so it is unclear what exactly they are capturing.

08.06.2025 20:38 — 👍 0 🔁 0 💬 1 📌 0

The answer is: yes (sort of).

Though the finetune itself seems to be learning more than a single steering vector, extracting steering vectors and applying them (with sufficient scaling) to the same layer in an un-finetuned version of the model *does* elicit misaligned behavior.

08.06.2025 20:38 — 👍 0 🔁 0 💬 1 📌 0

We finetuned a single layer, and show that on certain layers, this process renders the model nearly as misaligned as a full-layer finetune. This allows us to ask: can we capture this misalignment in a single steering vector taken from the layer?

08.06.2025 20:38 — 👍 0 🔁 0 💬 1 📌 0

An interpretation of the original paper was that EM is mediated by a “misalignment direction” within the model, which the finetuning process changes, rendering the model much more toxic/misaligned.

08.06.2025 20:38 — 👍 0 🔁 0 💬 1 📌 0

Emergent Misalignment on a Budget — LessWrong TL;DR We reproduce emergent misalignment (Betley et al. 2025) in Qwen2.5-Coder-32B-Instruct using single-layer LoRA finetuning, showing that tweaking…

New blog post! www.lesswrong.com/posts/qHudHZ...

Following Emergent Misalignment, we show that finetuning even a single layer via LoRA on insecure code can induce toxic outputs in Qwen2.5-Coder-32B-Instruct, and that you can extract steering vectors to make the base model similarly misaligned 🧵

08.06.2025 20:38 — 👍 1 🔁 0 💬 1 📌 0

Sam is 100% correct on this. Indeed, human babies have essential cognitive priors such as permanence, continuity, and boundary of objects, 3D Euclidean understanding of space, etc.

We spent 2 years to systematically to examine and show the lack of such in MLLMs: arxiv.org/abs/2410.10855

24.05.2025 05:55 — 👍 21 🔁 5 💬 0 📌 0

I think the BabyLM Challenge is really interesting, but also feel that there is something fundamentally ill-posed about how it maps onto the challenge facing human children. It's true that babies only get a relatively limited amount of linguistic experience, but...

19.05.2025 15:24 — 👍 40 🔁 7 💬 3 📌 2

Word learning is usually about what a word does refer to. But can toddlers learn from what it doesn’t?

Our new Cognition paper shows 20-month-olds use negative evidence to infer novel word meanings, reshaping theories of language development.

www.sciencedirect.com/science/arti...

20.05.2025 16:11 — 👍 33 🔁 12 💬 0 📌 0

ai is truly revolutionary -- scientists hadn't previously considered what would happen if sally had simply eaten the marble instead, to know its location at all times

15.05.2025 23:29 — 👍 6 🔁 0 💬 1 📌 0

Let's go Lio!!!

23.04.2025 00:14 — 👍 0 🔁 0 💬 0 📌 0

fair, italy has some incredibly creative offensive slang -- fwiw my favorite usable roman insult ("porco dio" is too offensive for casual use) is "sei 'na pentola de facioli", "you are a pot of beans", i.e. you never stop muttering and talking

07.04.2025 13:55 — 👍 1 🔁 0 💬 0 📌 0

as someone from Rome I'm currently sitting at my laptop like the mentats from Dune trying to figure out what words this could be referring to

we also take pride in preparing gnocchi incorrectly because the rest of italy can't make a decent carbonara to save their lives (no cream and no parmesan!)

07.04.2025 13:30 — 👍 1 🔁 0 💬 1 📌 0

congratulations!!

27.03.2025 14:14 — 👍 1 🔁 0 💬 0 📌 0

the icml keynote will be jensen huang speaking to an empty room

25.03.2025 22:40 — 👍 1 🔁 0 💬 0 📌 0

Photoshop for text. In our #CHI2025 paper “Textoshop”, we explore how interactions inspired by drawing software can help edit text. We consider words as pixels, sentences as regions, and tones as colours. #HCI #NLProc #LLMs #AI Thread 🧵
(1/10)

18.03.2025 21:28 — 👍 55 🔁 16 💬 2 📌 7

Zermelo-Fraenkel memo theory incoming

31.01.2025 15:04 — 👍 3 🔁 0 💬 0 📌 0

(also holy loud sound effects batman)

28.01.2025 15:54 — 👍 0 🔁 0 💬 0 📌 0

saw a new pika model was out on twitter & robot-gasoline-bench does not disappoint

28.01.2025 15:52 — 👍 2 🔁 0 💬 1 📌 0

All the ACL chapters are here now: @aaclmeeting.bsky.social @emnlpmeeting.bsky.social @eaclmeeting.bsky.social @naaclmeeting.bsky.social #NLProc

19.11.2024 03:48 — 👍 107 🔁 37 💬 1 📌 3

(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: mitpress.mit.edu/978026204941...

18.11.2024 16:25 — 👍 522 🔁 119 💬 15 📌 15

Valerio Pepe

Latest posts by valeriopepe.bsky.social on Bluesky

@valeriopepe is following 20 prominent accounts