
@boydgraber.bsky.social

191 Followers  |  353 Following  |  38 Posts  |  Joined: 26.11.2024  |  2.543

Latest posts by boydgraber.bsky.social on Bluesky

I think you mean: Terror has spread. Even in The Loop, hundreds run from gunshots. Traffic has ground to a halt. Unable to do anything to prevent the stampede, hundreds scream from the sidewalks. "Police" stand by, doing nothing.

13.10.2025 14:30 | 👍 1    🔁 0    💬 0    📌 0

We had come at it more from the position of trying to use as few dev examples as possible (to keep them secret). I.e., use the best items you could and every model uses the exact same. But it makes sense to use the adaptive testing scenario if you don't mind potentially exposing more dev.

18.09.2025 20:23 | 👍 1    🔁 0    💬 1    📌 0

Link to paper since I ran out of room:

users.umiacs.umd.edu/~ying/docs/2...

18.09.2025 20:19 | 👍 1    🔁 0    💬 1    📌 0

In 2021, we proposed using IRT to find bad examples and to create more targeted leaderboards (Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?).

From my reading, the big difference seems to be that they're also using the agent's skill, which is super cool!

18.09.2025 20:19 | 👍 2    🔁 0    💬 1    📌 0

We also found that it's helpful for improving uncertainty estimation of models:

arxiv.org/abs/2205.12507

18.09.2025 20:13 | 👍 0    🔁 0    💬 0    📌 0

If it said that 1990 was "about 10 years ago", I would say that it has reached tenured faculty-level intelligence.

02.09.2025 16:09 | 👍 1    🔁 0    💬 0    📌 0

Today's the deadline to apply for an AI-specific teaching track position at UMD:

umd.wd1.myworkdayjobs.com/UMCP/job/Uni...

Please join us!

22.08.2025 15:46 | 👍 2    🔁 0    💬 0    📌 0

A couple of weeks ago I left my family behind at a cable car station to finish climbing to the peak of a mountain because they were too scared to continue. When I reached the top, my phone gave a notification: new podcasts available for download. Apparently LMU has an observatory on Wendelstein.

21.08.2025 14:45 | 👍 3    🔁 0    💬 0    📌 0

Do you mean salary, physical facilities, work environment, or funding ecosystem?

04.08.2025 14:17 | 👍 0    🔁 0    💬 1    📌 0
https://youtu.be/L_hcHQep3fc

At the risk of picking out one of my favorite children, this was the paper with our best traditional video of this cycle (thanks to Jon May for playing along):

t.co/QQlgwzo6jf
t.co/2G6kwAAPMy

28.07.2025 08:35 | 👍 0    🔁 0    💬 0    📌 0
Joy Wongkamjan on X: "Our paper CTRL-D is accepted to ACL Findings and will be presented at ACL 2025! 🗓️ Poster session: 18:00–19:30 (Level 0 Exhibit Halls X4/X5) I’m sad I can’t be there, but Jordan (@boydgraber) will! You’ll enjoy learning about CTRL-D from him. Now… what is CTRL-D? 🔍 https://t.co/ucIPZRHBF1"

Finally, this evening I'll be standing in for
@wwongkamjan.bsky.social
at the Findings poster (18:00, Hall X4/X5): Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL

x.com/joywwong/sta...

28.07.2025 08:35 | 👍 0    🔁 0    💬 1    📌 0
https://www.cs.umd.edu/~jbg//docs/2025_acl_grace.pdf

t.co/j3Iibs9hEn
www.youtube.com/watch?v=NJKd...

28.07.2025 08:35 | 👍 0    🔁 0    💬 1    📌 0
Yoo Yeon Sung@ACL2025 on X: "I’ll be presenting this work in Room 1.62 today! If you're curious about how calibration errors in LLMs can be measured through human calibration, come find me and @enfleisig! 📍Oral Session 3 - HC: Human-centered NLP 📅Monday, July 28@ 2PM"

In the second oral paper (14:22, Room 1.62),
@yysung.bsky.social is presenting: GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration

x.com/YooYeonSung1...

(Short version: quiz bowl, a dumb trivia game, shows humans' calibration > LLMs'.)

28.07.2025 08:35 | 👍 0    🔁 0    💬 1    📌 0
https://youtu.be/wuEIeydhamA

t.co/LagmrMjVgi
t.co/aGM7LC2m0q

28.07.2025 08:35 | 👍 0    🔁 0    💬 1    📌 0
Nishant is ill-prepared for ACL2025 on X: "While personalization is great, it's not perfect. We found our strategy could easily let users jailbreak the model 😭 With some extra safeguards (e.g. refusal training), we think inferred personas could become a promising way to boost personalization in post-training recipes! 🧑‍🍳 https://t.co/Fpu1Bl34fI"

In the first poster session (11AM Monday, Hall X4/X5),
@nbalepur.bsky.social is presenting: Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas

x.com/NishantBalep...

28.07.2025 08:35 | 👍 0    🔁 0    💬 1    📌 0

My students and I are presenting three papers on Monday at #ACL2025 and this thread will recap them (including their videos).

28.07.2025 08:35 | 👍 7    🔁 2    💬 1    📌 0

The precursor to this paper, "The Incoherence of Coherence," had our most-watched paper video ever, so I thought we had to surpass it somehow ... so we decided to do a song parody (of Roxanne, obviously):

youtu.be/87OBxEM8a9E

18.07.2025 18:37 | 👍 7    🔁 2    💬 0    📌 0

Which makes this:
users.umiacs.umd.edu/~ying/docs/n...

"The Hobbit"

14.07.2025 14:17 | 👍 2    🔁 0    💬 0    📌 0
2025 QANTA Player Signup Sign up for the human competition for our 2025 QANTA event. More information: https://sites.google.com/view/qanta/2025-competition/2025-human-teams

And you can sign up for the online mirror (June 21, 12:00 EST) here:
docs.google.com/forms/d/e/1F...

[Signup deadline: June 18 Anywhere on Earth]

17.06.2025 15:35 | 👍 0    🔁 0    💬 0    📌 0
QANTA Project - 2025 Human Teams How this Works It's always difficult to try something new, but we think this will be fun! Short Version: You'll play multiple rounds of tossup-bonus quiz bowl with computer teammates. The computer ca...

If this sounds like fun, we have more information on the setup here:
sites.google.com/view/qanta/2...

17.06.2025 15:35 | 👍 0    🔁 0    💬 1    📌 0
Human-Computer Cooperation Tournament (College Park: June 14, Online: June 21) - The Quizbowl Resource Center Sponsored by the Partnership for Academic Competition Excellence

There are more reflections here:
hsquizbowl.org/forums/viewt...

17.06.2025 15:35 | 👍 0    🔁 0    💬 1    📌 0

Sara’s Crias went 4-2 to win the tournament (and $150). Noah Sheidlower’s music packet was the most difficult for computers, and Jame Carlson’s Spatial Reasoning was the fan favorite. We’ll announce writer and computer prizes after our online mirror. (And also post the packets.)

17.06.2025 15:35 | 👍 0    🔁 0    💬 1    📌 0
Human-Computer AI Collaborative Tournament Gameplay

We had our first human–computer cooperative AI tournament at UMD. Key takeaways: 1) computers are getting better at trivia 2) they still suck at calibration 3) our teaming mechanic kept the games competitive and mostly fun (at least that’s what the players said).

17.06.2025 15:35 | 👍 0    🔁 0    💬 1    📌 0
QANTA Project - 2025 Human Teams How this Works It's always difficult to try something new, but we think this will be fun! Short Version: You'll play multiple rounds of tossup-bonus quiz bowl with computer teammates. The computer ca...

And get more information here:
sites.google.com/view/qanta/2...

10.06.2025 16:23 | 👍 0    🔁 0    💬 0    📌 0
2025 QANTA Player Signup Sign up for the in-person competition for our 2025 QANTA event. More information: https://sites.google.com/view/qanta/2025-competition/2025-human-teams

Sign up to play here:
docs.google.com/forms/d/e/1F...

10.06.2025 16:23 | 👍 0    🔁 0    💬 1    📌 0
QANTA: Question Answering is not a Trivial Activity Logo [Two humans working with a computer to answer a question]

Today is the deadline to sign up for our Human-Computer trivia competition held on June 14, 2025 in College Park, MD. $150 prize for the team who can answer the most questions with the help of an AI.

10.06.2025 16:23 | 👍 1    🔁 0    💬 1    📌 0
QANTA Project - 2025 Competition 🤖 🫱🏻‍🫲🏾 🧠 What is QANTA 2025? Welcome to QANTA 25: the world’s first Human–AI Cooperative Trivia competition! We’re building a fun, interactive battlefield to discover who really reigns supreme in que...

If you want to play in the tournament as a human, go here to sign up or get more information:

sites.google.com/view/qanta/2...

05.06.2025 16:17 | 👍 0    🔁 0    💬 0    📌 0

The questions will be specially designed to sound (mostly) normal to humans but often stump computers (“adversarial”). If you think you can write questions like that, you can submit your questions to the tournament too (but not if you’re playing).

05.06.2025 16:17 | 👍 0    🔁 0    💬 1    📌 0

If you’re into trivia and can figure out when an AI is feeding you good information vs. garbage, sign up as a human team. You’ll be assigned computer “players” to help you answer questions, but watch out: they’re often right, but even when they’re wrong, they can be pretty convincing.

05.06.2025 16:17 | 👍 0    🔁 0    💬 1    📌 0
QANTA Logo: Question Answering is not a Trivial Activity [Humans and computers competing on a buzzer]

Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.

05.06.2025 16:17 | 👍 0    🔁 1    💬 1    📌 0
