We're thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers.
This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.
ai-scientific-discovery.github.io
25.09.2025 18:28 · 23 likes · 15 reposts · 1 reply · 1 quote
As AI becomes increasingly capable of conducting analyses and following instructions, my prediction is that the role of scientists will increasingly focus on identifying and selecting important problems to work on ("selector"), and effectively evaluating analyses performed by AI ("evaluator").
16.09.2025 15:07 · 10 likes · 8 reposts · 2 replies · 0 quotes
Program Committee Interest for the Second Workshop on AI & Scientific Discovery
We are proposing the second workshop on AI & Scientific Discovery at EACL/ACL. The workshop will explore how AI can advance scientific discovery. Please use this Google form to indicate your interest (corrected link):
forms.gle/MFcdKYnckNno...
More in the 🧵! Please share! #MLSky
29.08.2025 16:00 · 14 likes · 8 reposts · 1 reply · 0 quotes
#ACL2025 Poster Session 1 tomorrow 11:00-12:30 Hall 4/5!
27.07.2025 19:27 · 3 likes · 1 repost · 0 replies · 1 quote
Excited to present our work at #ACL2025!
Come by Poster Session 1 tomorrow, 11:00–12:30 in Hall X4/X5 – would love to chat!
27.07.2025 13:45 · 4 likes · 2 reposts · 0 replies · 0 quotes
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering.
This is holding us back. 🧵 and new paper with @ari-holtzman.bsky.social.
09.07.2025 20:07 · 37 likes · 15 reposts · 2 replies · 0 quotes
When you walk into the ER, you could get a doc:
1. Fresh from a week of not working
2. Tired from working too many shifts
@oziadias.bsky.social has been both and thinks that they're different! But can you tell from their notes? Yes, we can! Paper in @natcomms.nature.com: www.nature.com/articles/s41...
02.07.2025 19:22 · 26 likes · 11 reposts · 1 reply · 0 quotes
🚨 New paper alert 🚨
Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️
1/n 🧵
27.05.2025 13:59 · 28 likes · 17 reposts · 1 reply · 1 quote
HypoEval evaluators (github.com/ChicagoHAI/H...) are now incorporated into judges from QuotientAI – check it out at github.com/quotient-ai/...!
21.05.2025 16:58 · 2 likes · 2 reposts · 0 replies · 0 quotes
11/n Closing thoughts:
This is a sample-efficient method for LLM-as-a-judge, grounded in human judgments – paving the way for personalized evaluators and alignment!
12.05.2025 19:27 · 0 likes · 0 reposts · 1 reply · 0 quotes
9/n Why HypoEval matters:
We push forward LLM-as-a-judge research by showing you can get:
Sample efficiency
Interpretable automated evaluation
Strong human alignment
…without massive fine-tuning.
12.05.2025 19:26 · 0 likes · 0 reposts · 1 reply · 0 quotes
8/n Ablation insights:
Dropping hypothesis generation → performance drops ~7%
Combining all hypotheses into one criterion → performance drops ~8% (better to let LLMs rate one sub-dimension at a time!)
12.05.2025 19:26 · 1 like · 0 reposts · 1 reply · 0 quotes
7/n 💪 What's robust?
✅ Works across out-of-distribution (OOD) tasks
✅ Generated hypotheses can be transferred to different LLMs (e.g., GPT-4o-mini → LLAMA-3.3-70B)
✅ Reduces sensitivity to prompt variations compared to direct scoring
12.05.2025 19:25 · 1 like · 0 reposts · 1 reply · 0 quotes
6/n Where did we test it?
Across summarization (SummEval, NewsRoom) and story generation (HANNA, WritingPrompt).
We show state-of-the-art correlations with human judgments, for both rankings (Spearman correlation) and scores (Pearson correlation)!
12.05.2025 19:25 · 1 like · 0 reposts · 1 reply · 0 quotes
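A quick sketch of the two correlation measures named in the post above, on toy scores (not data from the paper); Spearman is implemented here simply as Pearson over ranks, ignoring ties:

```python
from statistics import mean

def pearson(x, y):
    """Linear agreement between raw scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Rank agreement: Pearson computed on the ranks (no tie handling here)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank + 1)
        return r
    return pearson(ranks(x), ranks(y))

human = [1, 2, 3, 4, 5]            # toy human scores
judge = [1.2, 1.9, 3.3, 3.9, 5.1]  # toy LLM-judge scores
print(round(spearman(human, judge), 6))  # 1.0: the judge orders items exactly like humans
print(round(pearson(human, judge), 3))
```

Spearman only cares whether the judge ranks outputs in the same order as humans; Pearson also penalizes the judge's absolute scores drifting from the human scale, which is why the post reports both.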
5/n Why is this better?
By combining small-scale human data + literature + non-binary checklists, HypoEval:
🔹 Outperforms G-Eval by ~12%
🔹 Beats fine-tuned models that use 3x more human labels
🔹 Adds interpretable evaluation
12.05.2025 19:24 · 1 like · 0 reposts · 1 reply · 0 quotes
4/n These hypotheses break down a complex evaluation rubric (e.g., "Is this summary comprehensive?") into sub-dimensions an LLM can score clearly.
12.05.2025 19:24 · 1 like · 0 reposts · 1 reply · 0 quotes
3/n Our solution: HypoEval
Building upon SOTA hypothesis generation methods, we generate hypotheses – decomposed rubrics (similar to checklists, but more systematic and explainable) – from existing literature and just 30 human annotations (scores) of texts.
12.05.2025 19:24 · 2 likes · 0 reposts · 1 reply · 0 quotes
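The decomposed-rubric idea in the post above can be illustrated with a toy sketch (the function names and the keyword-matching "judge" are hypothetical stand-ins for the real LLM calls): each generated hypothesis becomes one sub-dimension, the judge rates each sub-dimension separately, and the ratings are averaged into a final score.

```python
from statistics import mean

def hypo_score(text, sub_dimensions, rate):
    """Rate each sub-dimension separately, then average into one score."""
    return mean(rate(text, dim) for dim in sub_dimensions)

# Toy stand-in for an LLM judge: 5 if the sub-dimension's keyword appears,
# 1 otherwise. A real judge would prompt an LLM with the decomposed rubric.
def toy_rate(text, dimension):
    return 5 if dimension in text.lower() else 1

# Hypothetical sub-dimensions for "Is this summary comprehensive?"
dims = ["main event", "key actors"]
print(hypo_score("The main event drew all the key actors.", dims, toy_rate))
```

Rating one sub-dimension at a time is exactly what the ablation in post 8/n rewards: collapsing everything into a single criterion loses that decomposition.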
2/n What's the problem?
Most LLM-as-a-judge studies either:
• achieve lower alignment with humans,
• require extensive fine-tuning → expensive data and compute, or
• lack interpretability.
12.05.2025 19:23 · 3 likes · 0 reposts · 1 reply · 0 quotes
1/n Thrilled to share our latest work: HypoEval – Hypothesis-Guided Evaluation for Natural Language Generation!
There's a lot of excitement around using LLMs for automated evaluation, but many methods fall short on alignment or explainability – let's dive in!
12.05.2025 19:23 · 22 likes · 7 reposts · 1 reply · 1 quote
🧑‍⚖️ How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate the summaries?
Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
01.05.2025 19:25 · 23 likes · 13 reposts · 2 replies · 0 quotes
Excited to share our latest work: HypoBench, a systematic benchmark for evaluating LLM-based hypothesis generation methods!
There is much excitement about leveraging LLMs for scientific hypothesis generation, but principled evaluations are missing – let's dive into HypoBench together.
28.04.2025 19:35 · 11 likes · 9 reposts · 1 reply · 0 quotes
1/n
You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
14.04.2025 19:55 · 18 likes · 12 reposts · 1 reply · 1 quote