Sonia Murthy @soniakmurthy - Bluesky Profile

Thanks Hope! I just came across your related work with the CSS team at Microsoft. I'd love to chat about it sometime if you're free 🙂

11.02.2025 23:20 — 👍 1 🔁 0 💬 0 📌 0

Hi Daniel, thanks so much. The preprint is reliable, though it is missing a little additional discussion that made it into the camera-ready version. I can email you the camera-ready and will update arXiv with it shortly. Thank you!

11.02.2025 23:13 — 👍 1 🔁 0 💬 0 📌 0

Alignment reduces conceptual diversity of language models - Kempner Institute As large language models (LLMs) have become more sophisticated, there’s been growing interest in using LLM-generated responses in place of human data for tasks such as polling, user studies, and […]

NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find #alignment reduces conceptual diversity of language models. bit.ly/4hNjtiI

10.02.2025 15:19 — 👍 12 🔁 3 💬 0 📌 0

Many thanks to my collaborators and @kempnerinstitute.bsky.social for helping make this idea come to life, and to @rdhawkins.bsky.social for helping plant the seeds 🌱

10.02.2025 17:20 — 👍 2 🔁 1 💬 0 📌 0

GitHub - skmur/onefish-twofish: One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity (NAACL 2025)

(9/9) Code and data for our experiments can be found at: github.com/skmur/onefis...
Preprint: arxiv.org/abs/2411.04427

Also, check out our feature in the @kempnerinstitute.bsky.social Deeper Learning Blog! bit.ly/417WVDL

10.02.2025 17:20 — 👍 0 🔁 0 💬 1 📌 0

(8/9) We think that better understanding such trade-offs will be important to building LLMs that are aligned to human values: human values are diverse, and our models should be too.

10.02.2025 17:20 — 👍 1 🔁 0 💬 1 📌 0

(7/9) This suggests a trade-off: increasing model safety in terms of value alignment decreases safety in terms of diversity of thought and opinion.

10.02.2025 17:20 — 👍 3 🔁 0 💬 1 📌 0

(6/9) We put a suite of aligned models and their instruction fine-tuned counterparts to the test and found:
* No model reaches human-like diversity of thought.
* Aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts.

10.02.2025 17:20 — 👍 1 🔁 0 💬 1 📌 0

(5/9) Our experiments are inspired by human studies in two domains with rich behavioral data.

10.02.2025 17:20 — 👍 2 🔁 0 💬 1 📌 0

(4/9) We introduce a new way of measuring the conceptual diversity of synthetically generated LLM "populations" by considering how the variability of each population's "individuals" relates to that of the population as a whole.

10.02.2025 17:20 — 👍 0 🔁 0 💬 1 📌 0

(3/9) One key issue is whether LLMs capture conceptual diversity: the variation among individuals’ representations of a particular domain. How do we measure this? And how does alignment affect this?

10.02.2025 17:20 — 👍 2 🔁 0 💬 1 📌 0

(2/9) There's a lot of interest right now in getting LLMs to mimic the response distributions of "populations" (heterogeneous collections of individuals) for the purposes of political polling, opinion surveys, and behavioral research.

10.02.2025 17:20 — 👍 2 🔁 0 💬 1 📌 0

(1/9) Excited to share my recent work on "Alignment reduces LMs' conceptual diversity" with @tomerullman.bsky.social and @jennhu.bsky.social, to appear at #NAACL2025! 🐟

We want models that match our values...but could this hurt their diversity of thought?
Preprint: arxiv.org/abs/2411.04427

10.02.2025 17:20 — 👍 63 🔁 10 💬 2 📌 4
