Katie Keith

@katakeith.bsky.social

NLP and computational social science (CSS) researcher. Assistant Professor in Computer Science at Williams College. AI2 and UMass Amherst alum. she/her. https://kakeith.github.io/

4,024 Followers  |  288 Following  |  35 Posts  |  Joined: 22.12.2023

Latest posts by katakeith.bsky.social on Bluesky

Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts | Political Analysis | Cambridge Core

Very excited that my paper with @katakeith.bsky.social is now out in @polanalysis.bsky.social. We investigate whether LLMs actually follow the instructions/definitions provided in codebooks, propose some diagnostics, and release a new evaluation dataset.
www.cambridge.org/core/journal...

19.09.2025 13:45 | 👍 30    🔁 14    💬 0    📌 2

Whoa...!! If social-science leaning at all maybe try other preprint servers? SocArXiv for example? We put one of our preprints there: osf.io/preprints/so...

27.08.2025 19:02 | 👍 3    🔁 0    💬 1    📌 0

Yes! I agree. It's so rare these days to see a keynote that is so thorough and full of new conceptualizations.

12.08.2025 02:12 | 👍 4    🔁 0    💬 0    📌 0

5300 attendees in person here at #acl2025 😮

30.07.2025 15:31 | 👍 4    🔁 0    💬 0    📌 0

The #ACL2025 #ACL2025NLP feed is up and running! It matches both hashtags and any posts from or mentions of @aclmeeting.bsky.social

Pin it to your home 📌 and enjoy!

bsky.app/profile/did:...

17.07.2025 11:15 | 👍 48    🔁 14    💬 2    📌 0

Topic @adeldaoud.bsky.social and I were discussing today at lunch at #ic2s2 and want to ask here:

What are the “known facts” in the social sciences? Which relationships between at least two social variables have been empirically found to have large effects and replicated by multiple groups?

24.07.2025 12:57 | 👍 4    🔁 2    💬 2    📌 0

Under review! Happy to share a draft if you email me. Thanks!

23.07.2025 19:14 | 👍 0    🔁 0    💬 0    📌 0

Thanks:)

23.07.2025 14:39 | 👍 0    🔁 0    💬 0    📌 0

Highlighting this thread. Based on what I'm seeing at #ic2s2 this week, this line of work is hot (if a bit crowded), and I predict it will only be more widely adopted by social scientists in the future.

23.07.2025 13:07 | 👍 11    🔁 3    💬 1    📌 0

Not as recent, but still LLM-based

"WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation." GPT-3 composes new examples with similar patterns to challenging examples.

aclanthology.org/2022.finding...

23.07.2025 13:05 | 👍 3    🔁 0    💬 0    📌 0

I thought this was a clever and useful paper from Xiong, ... Hovy, El-Assady, Ash "Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification." Using LLMs to help humans refine their codebooks (before codebooks are fixed for the true annotation stage) arxiv.org/pdf/2507.05010

23.07.2025 13:00 | 👍 7    🔁 0    💬 0    📌 0

We used active learning to create a human-annotated dataset of 1050 instances from FOMC transcripts, labeled for FOMC members’ opinions and directional stance towards monetary policy. The preprint and dataset should be released publicly by the end of the summer, but email me for an advance copy.

23.07.2025 12:52 | 👍 2    🔁 0    💬 0    📌 0

Congrats to Alisa Kanganis (Williams College ’25) for presenting her thesis work at #ic2s2 today!

23.07.2025 12:52 | 👍 11    🔁 0    💬 3    📌 0

Yay! I'm there as well. Let's sync up.

20.07.2025 11:31 | 👍 2    🔁 0    💬 0    📌 0
U.S. college is first to decline federal science grants because of new DEI language
Williams College says NSF and NIH requirement related to discrimination “undermines” academic freedom

This was a top-down decision and Williams faculty have yet to formally discuss it. Unclear whether it is resistance or capitulation.
www.science.org/content/arti...

06.06.2025 21:48 | 👍 2    🔁 0    💬 1    📌 0
A Co-op for Computing
Faculty are diving into the exciting, data-crunching, AI world of GPMoo.

Honored by the feature on my research, grant, and GPU cluster by the Williams magazine. today.williams.edu/magazine/a-c...

28.05.2025 01:41 | 👍 9    🔁 1    💬 0    📌 0

Personally, I find I have to burn a day answering all the questions (particularly for a dataset release). I think it should be condensed to the 5 most important ones.

20.05.2025 18:27 | 👍 2    🔁 0    💬 1    📌 0

A full room for @katakeith.bsky.social's talk on proximal causal inference with text data ✨✨✨

27.01.2025 23:19 | 👍 17    🔁 1    💬 1    📌 0

Mark your calendars for these upcoming events tied to SCI and its One-U Responsible AI Initiative! Visit rai.utah.edu/events for details.

@parasharmanish.bsky.social @katakeith.bsky.social @anamarasovic.bsky.social @freiling.bsky.social

24.01.2025 22:30 | 👍 7    🔁 4    💬 0    📌 0

Our semi-synthetic experiments use MIMIC-III clinical notes and two open-weight LLMs and show that our method produces estimates with low bias.

11.12.2024 01:10 | 👍 2    🔁 0    💬 0    📌 0

For settings with an unobserved (but known) confounding variable, we propose a new causal inference method that uses two instances of pre-treatment text data, infers two proxies using two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula.

11.12.2024 01:10 | 👍 0    🔁 0    💬 1    📌 0

Check out our #NeurIPS2024 poster (presented by my collaborators Jacob Chen and Rohit Bhattacharya) about “Proximal Causal Inference With Text Data” at 5:30pm tomorrow (Weds)!

neurips.cc/virtual/2024...

11.12.2024 01:10 | 👍 12    🔁 4    💬 1    📌 0
Details - Assistant/Associate Professor - Natural Language Processing (NLP) | Human Resources | UMass Amherst

We're hiring new #nlp faculty this year!

Asst or Assoc Professors in NLP at UMass CICS --
careers.umass.edu/amherst/en-u...

19.11.2024 14:33 | 👍 66    🔁 34    💬 1    📌 0

I'm excited to share that we've released v1.0 of our podcast corpus, SPoRC, led by my PhD student Ben Litterer! This first dataset is a slice of time, comprising over one million episodes from May and June 2020, including transcripts, diarization, and extracted audio features.

15.11.2024 15:03 | 👍 52    🔁 16    💬 1    📌 4
All - Bluesky Directory
A curated collection of all things relating to the Blue Sky social media platform.

Starter packs are genius, but I was surprised there wasn't a list of them for people to find.

So I built it:
blueskydirectory.com/starter-pack...

The website monitors the packs being shared and adds the ones it finds to the database.

Missed your starter pack? Message me and I'll get it added.

11.11.2024 16:13 | 👍 6576    🔁 2975    💬 1123    📌 434

New here? Interested in AI/ML? Check out these great starter packs!

AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS

You can also search all starter packs here: blueskydirectory.com/starter-pack...

09.11.2024 09:13 | 👍 557    🔁 213    💬 67    📌 55

🫠🫶

06.11.2024 20:35 | 👍 1    🔁 0    💬 0    📌 0

In my NLP class (www.cs.williams.edu/~kkeith/teac...) next week, we're talking about eval.

I'd like to have a large section of the lecture focus on contamination. Crowd-sourcing: please send me your favorite contamination papers! Thanks! 🙏

06.11.2024 20:27 | 👍 16    🔁 3    💬 6    📌 0

go.bsky.app/PCckf3C

05.11.2024 21:39 | 👍 17    🔁 11    💬 1    📌 0

Yay! thanks

06.11.2024 14:01 | 👍 0    🔁 0    💬 0    📌 0
