🚨 New Position Paper 🚨
Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬
We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 😫
Here's why MCQA evals are broken, and how to fix them 🧵
24.02.2025 21:03
Hello, World!
05.04.2025 03:21
Assistant Professor at Harvard | Fitness enthusiast | (He/Him/His)
A collaborative research community at Princeton that advances computational and data-intensive humanities scholarship to create a more just future. #HumanitiesforAI
Website: https://cdh.princeton.edu/
Newsletter: cdh.princeton.edu/news/newsletter
Interdisciplinary community advancing language science through research & training in science, education, tech, & health • linktr.ee/umd_lsc
PhD student @ Univ of Maryland
NLP, Question Answering, Human AI, LLMs
More at mgor.info
CS PhD Student. Trying to find that dog in me at UMD. Babysitting (aligning) + Bullying (evaluating) LLMs
nbalepur.github.io
We produce research and public resources on democratic attitudes and political behavior. Founded and directed by Sean Westwood (Dartmouth) and Yphtach Lelkes (Penn). www.polarizationresearchlab.org and americaspoliticalpulse.com
Duke Professor directing https://scai.duke.edu, https://sicss.io & https://polarizationlab.com. Author of Breaking the Social Media Prism.
Associate Professor, School of Information, UC Berkeley. NLP, computational social science, digital humanities.
AI @ OpenAI, Tesla, Stanford
Princeton computer science prof. I write about the societal impact of AI, tech ethics, & social media platforms. https://www.cs.princeton.edu/~arvindn/
BOOK: AI Snake Oil. https://www.aisnakeoil.com/
Faculty @Georgetown, Faculty Associate @BKCHarvard
Digital security, "AI", safety, privacy, S*x W*rk
Prev @MSFTresearch @Meta @umdcs @nsfgrfp @datascifellows
Professor, Programmer in NYC.
Cornell, Hugging Face 🤗
Assistant professor at https://si.umich.edu/ working in computational social science, machine learning, and NLP | https://dallascard.github.io
Postdoctoral fellow at ETH AI Center, working on Computational Social Science + NLP. Previously a PhD in CS at UMD, advised by Philip Resnik. Internships at MSR, AI2. he/him.
On the job market this cycle!
alexanderhoyle.com
NLP faculty - University of Sydney
he/him
(this account is for professional topics only)
https://www.jkk.name
NYU professor, Google research scientist. Good at LaTeX.
Asst Prof at Cornell Info Sci and Cornell Tech. Responsible AI
https://angelina-wang.github.io/
computational social scientist