
Sonia Murthy

@soniakmurthy.bsky.social

cs phd student and kempner institute graduate fellow at harvard. interested in language, cognition, and ai soniamurthy.com

50 Followers  |  19 Following  |  12 Posts  |  Joined: 30.01.2025

Latest posts by soniakmurthy.bsky.social on Bluesky

Thanks Hope! I just came across your related work with the CSS team at Microsoft. I'd love to chat about it sometime if you're free 🙂

11.02.2025 23:20 | 👍 1 | 🔁 0 | 💬 0 | 📌 0

Hi Daniel, thanks so much. The preprint is reliable, though it is missing a little additional discussion that made it into the camera-ready version. I can email you the camera-ready and will update arXiv with it shortly. Thank you!

11.02.2025 23:13 | 👍 1 | 🔁 0 | 💬 0 | 📌 0
Alignment reduces conceptual diversity of language models - Kempner Institute
As large language models (LLMs) have become more sophisticated, there's been growing interest in using LLM-generated responses in place of human data for tasks such as polling, user studies, and […]

NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find #alignment reduces conceptual diversity of language models. bit.ly/4hNjtiI

10.02.2025 15:19 | 👍 12 | 🔁 3 | 💬 0 | 📌 0

Many thanks to my collaborators and @kempnerinstitute.bsky.social for helping make this idea come to life, and to @rdhawkins.bsky.social for helping plant the seeds 🌱

10.02.2025 17:20 | 👍 2 | 🔁 1 | 💬 0 | 📌 0
GitHub - skmur/onefish-twofish: One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity (NAACL 2025)

(9/9) Code and data for our experiments can be found at: github.com/skmur/onefis...
Preprint: arxiv.org/abs/2411.04427

Also, check out our feature in the @kempnerinstitute.bsky.social Deeper Learning Blog! bit.ly/417WVDL

10.02.2025 17:20 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

(8/9) We think that better understanding such tradeoffs will be important to building LLMs that are aligned to human values: human values are diverse, and our models should be too.

10.02.2025 17:20 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

(7/9) This suggests a trade-off: increasing model safety in terms of value alignment decreases safety in terms of diversity of thought and opinion.

10.02.2025 17:20 | 👍 2 | 🔁 0 | 💬 1 | 📌 0

(6/9) We put a suite of aligned models, and their instruction fine-tuned counterparts, to the test and found:
* no model reaches human-like diversity of thought
* aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts

10.02.2025 17:20 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

(5/9) Our experiments are inspired by human studies in two domains with rich behavioral data.

10.02.2025 17:20 | 👍 2 | 🔁 0 | 💬 1 | 📌 0

(4/9) We introduce a new way of measuring the conceptual diversity of synthetically generated LLM "populations" by considering how the variability of a population's "individuals" relates to that of the population as a whole.

10.02.2025 17:20 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

(3/9) One key issue is whether LLMs capture conceptual diversity: the variation among individuals' representations of a particular domain. How do we measure this? And how does alignment affect this?

10.02.2025 17:20 | 👍 2 | 🔁 0 | 💬 1 | 📌 0

(2/9) There's a lot of interest right now in getting LLMs to mimic the response distributions of "populations" (heterogeneous collections of individuals) for the purposes of political polling, opinion surveys, and behavioral research.

10.02.2025 17:20 | 👍 2 | 🔁 0 | 💬 1 | 📌 0

(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @tomerullman.bsky.social and @jennhu.bsky.social, to appear at #NAACL2025! 🐟

We want models that match our values...but could this hurt their diversity of thought?
Preprint: arxiv.org/abs/2411.04427

10.02.2025 17:20 | 👍 63 | 🔁 10 | 💬 3 | 📌 4
