Elinor @elinorpd - Bluesky Profile

Finally, we test this empirically and find some models where the LLM's embedding matrix already provides decently interpretable nearest neighbors

But this was not the full story yet...
@mariusmosbach.bsky.social and @elinorpd.bsky.social nudged me to use contextual embeddings

11.02.2026 15:10 — 👍 1 🔁 1 💬 1 📌 0

Really cool new work with surprising results! Highly recommend checking out the demo 👀

11.02.2026 15:20 — 👍 2 🔁 0 💬 0 📌 0

Grok fact-checks our paper on Grok fact-checking - and it approves!

04.02.2026 13:49 — 👍 28 🔁 7 💬 1 📌 0

🎭 How do LLMs (mis)represent culture?
🧮 How often?
🧠 Misrepresentations = missing knowledge? spoiler: NO!

At #CHI2026 we are bringing ✨TALES✨ a participatory evaluation of cultural (mis)reps & knowledge in multilingual LLM-stories for India

📜 arxiv.org/abs/2511.21322

1/10

02.02.2026 21:38 — 👍 44 🔁 21 💬 1 📌 2

this is amazing! made quick NYC & boston posters

30.01.2026 21:05 — 👍 3 🔁 0 💬 0 📌 0

Potato is a great platform for researchers! Highly recommend (plus a great development team behind it)

30.01.2026 15:41 — 👍 1 🔁 0 💬 0 📌 0

Microsoft Research NYC is hiring a researcher in the space of AI and society!

29.01.2026 23:27 — 👍 62 🔁 40 💬 2 📌 2

I’ve had a similar experience except with knitting / crocheting!

29.01.2026 18:21 — 👍 2 🔁 0 💬 0 📌 0

Federal agents with weapons drawn, moments before murdering American citizens on the streets of Minneapolis at the dawn of 2026.

What should academics be doing right now?

I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.

davidbau.github.io/poetsandnurs...

It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...

26.01.2026 03:27 — 👍 37 🔁 16 💬 0 📌 4

Whoa! That’s a nice view! Or… well, I’m sure it’s nice on a clear day

26.01.2026 05:58 — 👍 1 🔁 0 💬 0 📌 0

I'll be presenting this work on January 25th (Hall 2, poster 41) at #AAAI2026 in Singapore!

Please stop by and reach out if you'd like to chat 😁

23.01.2026 14:49 — 👍 3 🔁 0 💬 0 📌 0

🔗https://arxiv.org/abs/2406.17737

Work done with Deb Roy and Jad Kabbara (@jad-kabbara.bsky.social) at @mit.edu @medialab.bsky.social

23.01.2026 14:42 — 👍 0 🔁 0 💬 0 📌 0

This pattern, which we refer to as targeted underperformance, suggests that models systematically lower information quality for some users.

As LLMs increasingly mediate access to knowledge 🌐🧠, these dynamics risk amplifying epistemic inequity at scale.

6/6

23.01.2026 14:42 — 👍 0 🔁 0 💬 1 📌 0

Here’s one concrete example:

The same factual SciQ question posed to Claude
✅ Answered for a control user (no bio)
❌ Refused for a less-educated Russian user

5/6

23.01.2026 14:42 — 👍 0 🔁 0 💬 1 📌 0

Across models, we observe systematic drops in accuracy and truthfulness for users who are:

• Less educated
• Non-native English speakers
• From outside the U.S.

These effects compound and are largely invisible 🔎 to standard evaluations.

4/6

23.01.2026 14:42 — 👍 0 🔁 0 💬 1 📌 1
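A minimal sketch of how per-group accuracy drops like the ones described in the post above could be computed. This is not the paper's evaluation code: the function names, the record format, and the "control" label are assumptions made for illustration, and any records passed in would be the reader's own.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Map each group name to its accuracy.

    `records` is an iterable of (group, is_correct) pairs. This format is
    an assumption for the sketch, not the paper's data format.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


def gaps_vs_control(records, control="control"):
    """Return each group's accuracy minus the control group's accuracy.

    A negative value means the group scored below the no-bio control.
    """
    accuracy = accuracy_by_group(records)
    baseline = accuracy[control]
    return {
        group: value - baseline
        for group, value in accuracy.items()
        if group != control
    }
```

Calling `gaps_vs_control` on a list of `(group, is_correct)` pairs yields one signed gap per non-control group, which makes differences that an aggregate score hides easier to see.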

We evaluated GPT-4, Claude Opus, and Llama-3-8B in a multiple-choice setup with questions taken from TruthfulQA and SciQ. Each question is conditioned on a user bio in which we vary three user traits:

• Education level 📚
• Country of origin 🌏
• English proficiency 🗣️

3/6

23.01.2026 14:42 — 👍 0 🔁 0 💬 1 📌 0
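The setup in the post above can be sketched roughly as follows: one control prompt with no bio, plus one prompt per combination of the three traits. The trait values, bio wording, and function names here are hypothetical; the post names the traits but not the exact levels or templates the authors used.

```python
from itertools import product

# Hypothetical trait levels; the post lists the traits, not these values.
EDUCATION = ["high", "low"]
COUNTRY = ["the United States", "Russia"]
ENGLISH = ["native", "non-native"]


def make_bio(education, country, english):
    """Render a short illustrative user bio (wording is an assumption)."""
    return (
        f"I am from {country}. I have a {education} level of formal "
        f"education and I am a {english} English speaker."
    )


def make_prompt(question, choices, bio=None):
    """Format one multiple-choice question, optionally preceded by a bio."""
    lines = []
    if bio is not None:
        lines.append(f"User bio: {bio}")
    lines.append(f"Question: {question}")
    for label, choice in zip("ABCD", choices):
        lines.append(f"{label}. {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def build_conditions(question, choices):
    """Return a no-bio control prompt plus one prompt per trait combination."""
    prompts = {"control": make_prompt(question, choices)}
    for edu, country, eng in product(EDUCATION, COUNTRY, ENGLISH):
        bio = make_bio(edu, country, eng)
        prompts[(edu, country, eng)] = make_prompt(question, choices, bio)
    return prompts
```

Because every condition shares the same question and answer options, any difference in a model's answers across the returned prompts can be attributed to the bio alone.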

Spoiler alert: we find the answer is often no! ⚠️

LLM accuracy and truthfulness systematically degrade for some users in ways that standard benchmarks, focused on best-case performance, fail to capture.

2/6

23.01.2026 14:42 — 👍 0 🔁 0 💬 1 📌 0

🎉 Excited to share our new paper which was accepted to #AAAI2026!

As LLMs become increasingly used as sources of factual knowledge, we ask:
Do they perform equitably across users of different backgrounds?

🧵⬇️
1/6

23.01.2026 14:42 — 👍 2 🔁 1 💬 1 📌 1

Yay!

Out of curiosity, what is the process of going from reviewer to AC and then to SAC? Do they just ask you out of the blue one day? Or do you apply?

20.01.2026 09:25 — 👍 0 🔁 0 💬 1 📌 0

Most LLM evals use API calls or offline inference, testing models in a memory-less silo. Our new Patterns paper shows this misses how LLMs actually behave in real user interfaces, where personalization and interaction history shape responses: arxiv.org/abs/2509.19364

12.12.2025 20:42 — 👍 38 🔁 11 💬 1 📌 1

Elinor Poole-Dayan, Jiayi Wu, Taylor Sorensen, Jiaxin Pei, Michiel A. Bakker: Benchmarking Overton Pluralism in LLMs https://arxiv.org/abs/2512.01351 https://arxiv.org/pdf/2512.01351 https://arxiv.org/html/2512.01351

02.12.2025 06:29 — 👍 2 🔁 1 💬 0 📌 0

Thoughtful (as always) blog post from Nicholas Carlini. "Are large language models worth it?" A nice read giving his perspective on risks of ML models.

Post: nicholas.carlini.com/writing/2025...

For those who prefer video, here is the talk from @colmweb.org: www.youtube.com/watch?v=PngH...

19.11.2025 16:56 — 👍 34 🔁 11 💬 1 📌 1

Extremely thrilled to talk about our new paper: "Who Evaluates AI’s Social Impacts? Mapping Coverage And Gaps In First And Third Party Evaluations".

This is the first big project output from the @eval-eval.bsky.social coalition! Thread below:

13.11.2025 14:34 — 👍 18 🔁 7 💬 1 📌 0

Congratulations @sivareddyg.bsky.social ! 🥳 Incredibly well deserved!!

14.11.2025 17:11 — 👍 3 🔁 0 💬 0 📌 0

YouTube video by UVM Office of Research: IC2S2 2026 | Burlington, Vermont

We're excited to announce that the website and registration for IC2S2 2026 (July 28-31) will launch in early December! The Vermont Complex Systems Institute @vcsi.bsky.social at the University of Vermont will be hosting IC2S2 in 2026: youtube.com/watch?v=p412S4GnPkc&feature=youtu.be

13.11.2025 15:36 — 👍 34 🔁 19 💬 0 📌 1

A staircase in the new School of Computer, Data & Information Sciences building at Wisconsin-Madison. Tan wood structures surround tapestry art and a small indoor garden.

A view from above of the staircases in the Wisconsin CDIS building

A shot from below of winding wooden staircases and a glass atrium rooftop. The new School of Computer, Data & Information Sciences building at Wisconsin-Madison.

A bicolor white cat with seal-colored markings, looking upwards with big wide dark eyes.

It's the season for PhD apps!! 🥧 🦃 ☃️ ❄️

Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ←→ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me!

lucy3.github.io/prospective-...

11.11.2025 22:32 — 👍 51 🔁 16 💬 1 📌 1

@bennokrojer.bsky.social didn't your lab have something like this happen

07.11.2025 21:02 — 👍 1 🔁 0 💬 1 📌 0

such a valuable resource! thanks for sharing

07.11.2025 13:51 — 👍 1 🔁 0 💬 0 📌 0

It’s grad school application season, and I wanted to give some public advice.

Caveats:
-*-*-*-*

 > These are my opinions, based on my experiences; they are not secret tricks or guarantees
 > They are general guidelines, not meant to cover a host of idiosyncrasies and special cases

06.11.2025 14:55 — 👍 112 🔁 58 💬 4 📌 7
