Dear media, I can't stand seeing the headline "No Agreement Between the US and Denmark" anymore. When an armed man storms a bank, you don't run the headline: "Robber and Teller Fail to Reach Consensus on Handing Over the Money." Stop framing imperial aggression as normal diplomacy.
Don't forget Snider's!
I agree that Mallet >> Gensim, but VI can also work well. It's just more finicky, and Gensim's implementation doesn't do the things that Mallet does to optimize hyperparameters. So just like we shouldn't let LDA get a bad rap because of Gensim, we shouldn't let VI get a bad rap because of Gensim.
I'm so ahead of the curve on AI, I've been using this strategy for a decade.
Um, actually, that's a Diaeresis:
www.frathwiki.com/Diaeresis_an...
I believe the planet Diaeresis is also the planet where Reva Sevander kidnapped Leia until she was rescued by Obi-Wan.
I think you mean: Terror has spread. Even in The Loop, hundreds run from gunshots. Traffic has ground to a halt. Unable to do anything to prevent the stampede, hundreds scream from the sidewalks. "Police" stand by, doing nothing.
We had come at it more from the position of trying to use as few dev examples as possible (to keep them secret), i.e., use the best items you can, and every model sees exactly the same ones. But it makes sense to use the adaptive testing scenario if you don't mind potentially exposing more dev examples.
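To make the fixed-set versus adaptive-testing trade-off concrete, here is a minimal sketch assuming a standard two-parameter logistic (2PL) IRT model. The item parameters, the item names, the function names, and the grid-search skill estimate are all hypothetical illustrations, not the method from either paper. A fixed set picks the items that are most informative on average, so every model sees the same few; adaptive testing picks whichever item is most informative at the current skill estimate, so different models end up seeing different (and more) items.

```python
import math

# Hypothetical items: name -> (discriminability a, difficulty b)
ITEMS = {"q1": (1.5, -1.0), "q2": (0.8, 0.0), "q3": (2.0, 1.0), "q4": (1.2, 2.5)}

def p_correct(theta, a, b):
    """2PL IRT: chance a subject with skill theta gets the item right."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at skill level theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def fixed_set(items, thetas, k):
    """Fixed test: the k items with the highest average information over a
    grid of plausible skills; every model is scored on these same items."""
    def avg_info(name):
        a, b = items[name]
        return sum(information(t, a, b) for t in thetas) / len(thetas)
    return sorted(items, key=avg_info, reverse=True)[:k]

def next_adaptive_item(items, theta_hat, asked):
    """Adaptive test: the unasked item most informative at the current
    skill estimate (so what gets exposed depends on the model)."""
    candidates = [name for name in items if name not in asked]
    return max(candidates, key=lambda name: information(theta_hat, *items[name]))

def estimate_theta(items, responses, grid):
    """Grid-search maximum-likelihood skill from (item, correct) pairs."""
    def loglik(theta):
        total = 0.0
        for name, correct in responses:
            p = p_correct(theta, *items[name])
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(grid, key=loglik)

if __name__ == "__main__":
    grid = [-2, -1, 0, 1, 2]
    print(fixed_set(ITEMS, grid, 2))
    print(next_adaptive_item(ITEMS, 1.0, {"q3"}))
```

With these made-up parameters, the fixed set is `["q3", "q1"]`, while an adaptive test whose model has already answered `q3` at an estimated skill of 1.0 moves on to `q4`, an item the fixed set never exposes.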
In 2021, we proposed using IRT to find bad examples and to create more targeted leaderboards (Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?).
From my reading, the big difference seems to be that they're also using the agent's skill, which is super cool!
We also found that it's helpful for improving uncertainty estimation of models:
arxiv.org/abs/2205.12507
If it said that 1990 was "about 10 years ago", I would say that it has reached tenured faculty-level intelligence.
Today's the deadline to apply for an AI-specific teaching track position at UMD:
umd.wd1.myworkdayjobs.com/UMCP/job/Uni...
Please join us!
A couple of weeks ago I left my family behind at a cable car station to finish climbing to the peak of a mountain because they were too scared to continue. When I reached the top, my phone gave a notification: new podcasts available for download. Apparently LMU has an observatory on Wendelstein.
Do you mean salary, physical facilities, work environment, or funding ecosystem?
At the risk of picking a favorite child, this was the paper with our best traditional video of this cycle (thanks to Jon May for playing along):
t.co/QQlgwzo6jf
t.co/2G6kwAAPMy
Finally, this evening I'll be standing in for @wwongkamjan.bsky.social at the Findings poster (18:00, Hall X4/X5): Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
x.com/joywwong/sta...
In the second oral paper (14:22, Room 1.62), @yysung.bsky.social is presenting: GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
x.com/YooYeonSung1...
(Short version: quiz bowl, a dumb trivia game, shows humans' calibration > LLMs'.)
In the first poster session (11AM Monday, Hall X4/X5), @nbalepur.bsky.social is presenting: Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
x.com/NishantBalep...
My students and I are presenting three papers on Monday at #ACL2025, and this thread will recap them (including their videos).
The precursor to this paper, "The Incoherence of Coherence," had our most-watched paper video ever, so I thought we had to surpass it somehow... and we decided to do a song parody (of Roxanne, obviously):
youtu.be/87OBxEM8a9E
And you can sign up for the online mirror (June 21, 12:00 EST) here:
docs.google.com/forms/d/e/1F...
[Signup deadline: June 18 Anywhere on Earth]
If this sounds like fun, we have more information on the setup here:
sites.google.com/view/qanta/2...
Sara’s Crias went 4-2 to win the tournament (and $150). Noah Sheidlower’s music packet was the most difficult for computers, and Jame Carlson’s Spatial Reasoning was the fan favorite. We’ll announce the writer and computer prizes (and post the packets) after our online mirror.
We had our first human–computer cooperative AI tournament at UMD. Key takeaways: 1) computers are getting better at trivia, 2) they still suck at calibration, and 3) our teaming mechanic kept the games competitive and mostly fun (at least that’s what the players said).