I think you mean: Terror has spread. Even in The Loop, hundreds run from gunshots. Traffic has ground to a halt. Unable to do anything to prevent the stampede, hundreds scream from the sidewalks. "Police" stand by, doing nothing.
13.10.2025 14:30
We had come at it more from the position of trying to use as few dev examples as possible (to keep them secret). I.e., use the best items you could and every model uses the exact same. But it makes sense to use the adaptive testing scenario if you don't mind potentially exposing more dev.
18.09.2025 20:23
Link to paper since I ran out of room:
users.umiacs.umd.edu/~ying/docs/2...
18.09.2025 20:19
In 2021, we proposed using IRT to find bad examples and to create more targeted leaderboards ("Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?").
From my reading, the big difference seems to be that they're also using the agent's skill, which is super cool!
18.09.2025 20:19
We also found that it's helpful for improving uncertainty estimation of models:
arxiv.org/abs/2205.12507
18.09.2025 20:13
If it said that 1990 was "about 10 years ago", I would say that it has reached tenured faculty-level intelligence.
02.09.2025 16:09
Today's the deadline to apply for an AI-specific teaching track position at UMD:
umd.wd1.myworkdayjobs.com/UMCP/job/Uni...
Please join us!
22.08.2025 15:46
A couple of weeks ago I left my family behind at a cable car station to finish climbing to the peak of a mountain because they were too scared to continue. When I reached the top, my phone gave a notification: new podcasts available for download. Apparently LMU has an observatory on Wendelstein.
21.08.2025 14:45
Do you mean salary, physical facilities, work environment, or funding ecosystem?
04.08.2025 14:17
https://youtu.be/L_hcHQep3fc
At the risk of picking out one of my favorite children, this was the paper with our best traditional video of this cycle (thanks to Jon May for playing along):
t.co/QQlgwzo6jf
t.co/2G6kwAAPMy
28.07.2025 08:35
https://www.cs.umd.edu/~jbg//docs/2025_acl_grace.pdf
t.co/j3Iibs9hEn
www.youtube.com/watch?v=NJKd...
28.07.2025 08:35
My students and I are presenting three papers on Monday at #ACL2025 and this thread will recap them (including their videos).
28.07.2025 08:35
The precursor to this paper "The Incoherence of Coherence" had our most-watched paper video ever, so I thought we had to surpass it somehow ... so we decided to do a song parody (of Roxanne, obviously):
youtu.be/87OBxEM8a9E
18.07.2025 18:37
Which makes this:
users.umiacs.umd.edu/~ying/docs/n...
"The Hobbit"
14.07.2025 14:17
Sara's Crias went 4-2 to win the tournament (and $150). Noah Sheidlower's music packet was the most difficult for computers, and Jame Carlson's Spatial Reasoning was the fan favorite. We'll announce writer and computer prizes after our online mirror. (And also post the packets.)
17.06.2025 15:35
Human-Computer AI Collaborative Tournament Gameplay
We had our first human-computer cooperative AI tournament at UMD. Key takeaways: 1) computers are getting better at trivia 2) they still suck at calibration 3) our teaming mechanic kept the games competitive and mostly fun (at least that's what the players said).
17.06.2025 15:35
QANTA: Question Answering is not a Trivial Activity Logo [Two humans working with a computer to answer a question]
Today is the deadline to sign up for our Human-Computer trivia competition held on June 14, 2025 in College Park, MD. $150 prize for the team who can answer the most questions with the help of an AI.
10.06.2025 16:23
The questions will be specially designed to sound (mostly) normal to humans but often stump computers ("adversarial"). If you think you can write questions like that, you can submit your questions to the tournament too (but not if you're playing).
05.06.2025 16:17
If you're into trivia and can figure out when an AI is feeding you good information vs. garbage, sign up as a human team. You'll be assigned computer "players" to help you answer questions, but watch out: they're often right, but even when they're wrong, they can be pretty convincing.
05.06.2025 16:17
QANTA Logo: Question Answering is not a Trivial Activity
[Humans and computers competing on a buzzer]
Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.
05.06.2025 16:17
Postdoc @milanlp.bsky.social working on LLM safety and societal impacts. Previously PhD @oii.ox.ac.uk and CTO / co-founder of Rewire (acquired '23)
https://paulrottger.com/
I write about genetics, 'metrics, and demographics.
My Substack is https://www.cremieux.xyz/
Writing with a focus on urbanism, culture, and popular history. I write The Deleted Scenes, a daily, mostly-urbanism newsletter on Substack. a.delmastro2@gmail.com.
Postdoc @ai2.bsky.social & @uwnlp.bsky.social
Ph.D. candidate in Artificial Intelligence at University of Liverpool | MSc in Advanced Computer Science from Swansea University | Lecturer at University of Bisha
Research Lead @parameterlab.bsky.social working on Trustworthy AI
Speaking 🇫🇷, English and 🇨🇱 Spanish | Living in Tübingen 🇩🇪 | he/him
https://gubri.eu
PhD student in CS @ ETHZ / MPI-IS
Theory of ML evaluation https://flodorner.github.io/
feature writer | wrote a book about Jeopardy! | geist that could've been polter along with zeit
I'm that YouTuber who taught you how dishwashers work. Guess I'm tryin' out the whole Bluesky thing now.
he/him
https://www.youtube.com/technologyconnections
Cognitive scientist, linguist, phonetician at the University of Zurich Dept. of Computational Linguistics
Assistant professor at NUS. Scaling cooperative intelligence & infrastructure for an increasingly automated future. PhD @ MIT ProbComp / CoCoSci. Pronouns: 祂/伊
gay, nerdy, really quite something...
nice vibes, mostly well behaved
occasional poet 🤷‍♂️
Washington DC adjacent, he/him
linktr.ee/palmerpink
Information with representation, by and for D.C. residents.
Worker-led nonprofit newsroom, funded by readers like YOU.
https://51st.news
"An expert in power systems" -NYTimes | J.B. Duke Fellow, PhDing @DukeU | Fmr: SPGlobal, USDOE, Cypress Creek Renewables | Stanford alumn
Substack: http://powerpolicy.net
Use 🔌💡 in posts for #energysky
Researcher in Interaction Design, AI, Games & Robots
CEO of Bluesky, steward of AT Protocol.
dec/acc 🌱 🪴 🌳
NLP & ML research @cohereforai.bsky.social 🇨🇦
PhD student at Johns Hopkins CLSP (@jhuclsp.bsky.social).
Researching natural and formal language processing.
williamjurayj.com
The "someone" mentioned in xkcd 386
Fully exnominated
https://hans.gerwitz.com
AI Architect | North Carolina | AI/ML, IoT, science
WARNING: I talk about kids sometimes