It predicts pretty well. Not just shifts in the last week, but also:
1. Who's working an overnight shift (in our data + external validation in MIMIC)
2. Who's working on a disruptive circadian schedule
3. How many patients the doc has seen *on the current shift*
02.07.2025 19:24
I'll be presenting this work at 2pm and will be around until Sunday. Please reach out if you're interested in this line of work - would love to connect in person or virtually!
Thank you to my great collaborators @kyle-macmillan.bsky.social, Anup Malani, Hongyuan Mei, and @chenhaotan.bsky.social
01.05.2025 19:25
ChicagoHAI/CaseSumm · Datasets at Hugging Face
We're on a journey to advance and democratize artificial intelligence through open source and open science.
CaseSumm is publicly available on HuggingFace! We hope this dataset enables:
- Better evaluation of long-context summarization
- Research on legal language understanding
- Development of more accurate & reliable legal AI tools
Dataset: huggingface.co/datasets/Chi...
01.05.2025 19:25
Analysis reveals different types of hallucinations:
- Simple factual errors
- Incorrect legal citations
- Misrepresentation of procedural history
- Mischaracterization of Court's reasoning
Fine-tuned smaller models tend to make more egregious errors than GPT-4.
01.05.2025 19:25
CaseSumm is a useful resource for long-context reasoning and legal research:
- Largest legal case summarization dataset
- 200+ years of Supreme Court cases
- "Ground truth" summaries written by Court attorneys and approved by Justices
- Variation in summary styles and compression rates over time
01.05.2025 19:25
Key findings:
1. A smaller fine-tuned LLM scores well on metrics but has more factual errors.
2. Experts prefer GPT-4 summaries, even over the "ground-truth" syllabuses.
3. ROUGE and similar metrics poorly reflect human preferences.
4. Even LLM-based evaluations still misalign with human judgment.
01.05.2025 19:25
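To illustrate finding 3, here is a minimal sketch of how a unigram-overlap metric can reward the wrong summary. It assumes whitespace tokenization and a simplified ROUGE-1 F1; it is not the official ROUGE implementation, and it is not the evaluation code from the paper. The function name `rouge1_f1` and the example sentences are hypothetical and not drawn from CaseSumm.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps, per token, the smaller of the two counts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences for illustration only.
reference = "the court held that the statute was unconstitutional"
faithful = "the justices struck down the law"
wrong = "the court held that the statute was constitutional"

print("faithful paraphrase:", rouge1_f1(reference, faithful))
print("factually wrong copy:", rouge1_f1(reference, wrong))
```

In this toy case the factually inverted summary scores higher than the faithful paraphrase, because the metric counts shared words and does not check meaning.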
Dataset: huggingface.co/datasets/Chi...
Paper: arxiv.org/abs/2501.00097
When evaluating LLM-generated and human-written summaries, we find interesting discrepancies between automatic metrics, LLM-based evaluation, and human expert judgements.
01.05.2025 19:25
We develop CaseSumm, a comprehensive dataset comprising 25K U.S. Supreme Court opinions and their official syllabuses spanning over 200 years, and conduct a rigorous evaluation of long-document summarization using CaseSumm.
01.05.2025 19:25
How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate?
Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
01.05.2025 19:25
1/n
You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
14.04.2025 19:55
Thank you to my excellent collaborators
Qingcheng Zeng, @chenhaotan.bsky.social, @robvoigt.bsky.social, and Alexander Zentefis!
15.11.2024 18:56
Causal Micro-Narratives
Mourad Heddaya, Qingcheng Zeng, Alexander Zentefis, Rob Voigt, Chenhao Tan. Proceedings of the 6th Workshop on Narrative Understanding. 2024.
I'll be presenting this work at the EMNLP 2024 Workshop on Narrative Understanding. If you are in Miami, the presentation will be at 3:30pm!
Paper: aclanthology.org/2024.wnu-1.12/
Dataset (soon): mheddaya.com/research/nar...
15.11.2024 18:56
Please reach out if you are interested in this line of work. I'd love to connect in person or virtually!
15.11.2024 18:56
Our ongoing work aims to discover narratives automatically, investigate their geographic and temporal trends, understand their potential spread, and assess their influence on economic indicators.
15.11.2024 18:56
We're scaling up! Using our fine-tuned models, we're identifying narratives in millions of news articles. Techniques like Design-Based Supervised Learning ensure validity in downstream analyses.
15.11.2024 18:56
Even human annotators sometimes disagree on narrative presence, but fine-tuned LLMs mirror these natural disagreements more closely than larger models.
Our error analysis shows some mistakes arise from genuine interpretative ambiguity. Check out the last three examples here:
15.11.2024 18:56
Fine-tuning shines in teaching models to spot narratives, unlike in-context learning. GPT-4o struggles, often misclassifying non-narratives as narratives.
15.11.2024 18:56
This is a difficult hierarchical classification task with many classes, some of them semantically similar.
We find that smaller fine-tuned LLMs outperform larger models like GPT-4o, while also offering better scalability and cost efficiency. But they also err differently.
15.11.2024 18:56
We define a causal micro-narrative as a sentence-level explanation of a target subject's cause(s) and/or effect(s).
As an application, we propose an ontology for inflation's causes/effects and create a large-scale dataset classifying sentences from U.S. news articles.
15.11.2024 18:56
While the importance of narratives has become well recognized, formulating an operational definition remains challenging, particularly one that is flexible enough to handle informal and ambiguous language.
In our work, we address both the conceptual and technical challenges.
15.11.2024 18:56
How do everyday narratives reveal hidden cause-and-effect patterns that shape our beliefs and behaviors?
In our paper, we propose Causal Micro-Narratives to uncover narratives from real-world data. As a case study, we characterize the narratives about inflation in news.
15.11.2024 18:56
professor of CS @Australian National U. Machine learning, social media, online markets. Directs computational media lab http://cmlab.dev and integrated AI network http://ai.anu.edu.au
JD/PhD Candidate @ UChicago Law/CS
https://people.cs.uchicago.edu/~macmillan/
PhD student @ MIT | Previously PYI @ AI2 | MS'21 BS'19 BA'19 @ UW | zhaofengwu.github.io
Book: https://thecon.ai
Web: https://faculty.washington.edu/ebender
Data janitor and leftover linguist (retired). Tsundoku expert. Language & Cognition. NLP. Japanese literature. Anti-authoritarian. Pro-science.
Researcher trying to shape AI towards positive outcomes. ML & Ethics +birds. Generally trying to do the right thing. TIME 100 | TED speaker | Senate testimony provider | Navigating public life as a recluse.
Former: Google, Microsoft; Current: Hugging Face
I like tokens! Lead for OLMo data at @ai2.bsky.social (Dolma) w @kylelo.bsky.social. Open source is fun. Opinions are sampled from my own stochastic parrot
more at https://soldaini.net
language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him, kyleclo.com
Asst Prof @uwischool.bsky.social; #NLP #healthinformatics #accessibility #scholcomm
in Seattle; llwang.net; she/her
AI, RL, NLP, Games Asst Prof at UCSD
Research Scientist at Nvidia
Lab: http://pearls.ucsd.edu
Personal: prithvirajva.com
jmhessel.com
@Anthropic. Seattle bike lane enjoyer. Opinions my own.
Karaoke enthusiast
en/he/him
Waiting on a robot body. All opinions are universal and held by both employers and family.
Literally a professor. Recruiting students to start my lab.
ML/NLP/they/she.
Postdoc at UW NLP. #NLProc, computational social science, cultural analytics, responsible AI. she/her. Previously at Berkeley, Ai2, MSR, Stanford. Incoming assistant prof at Wisconsin CS. lucy3.github.io/prospective-students.html
Assistant Professor of CS, University of Southern California. NLP / ML.
NLP/AI Research
Assistant Professor @Yale
Climate & AI Lead @HuggingFace, TED speaker, WiML board member, TIME AI 100 (She/her/Dr)
Professor, Programmer in NYC.
Cornell, Hugging Face
Postdoc in the DILL lab at USC. Cornell CS PhD. gyauney.github.io