It looks like the Ai2paperfinder is down? Huge bummer, it has become an integral piece of my workflow, and I'd pay to use it tbh. Any Ai2 people have any insights on this situation?
+Japan/Korea Travel photos tax
@saxon.me.bsky.social
Doctor of NLP/Vision+Language from UCSB. Evals, metrics, multilinguality, multiculturality, multimodality, and (dabbling in) reasoning. https://saxon.me/
Kinda wow: the mystery model "summit" (rumored to be OpenAI) with the prompt "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future" & "make it better"
2,351 lines of code. First time
East Asia friends! I will be in Seoul next week, then Tokyo ~2 weeks, then Taipei for a weekish! Let me know if you're around and want to grab a lunch, coffee, or beer!
23.07.2025 22:38
I wonder if Nature will also want to publish this correction
17.07.2025 15:36
this is one of the worst long-term trends for US political stability forming in media res
10.07.2025 03:29
🧪 BREAKING (good news): Senate subcommittee says NO! to Trump's proposed slashes to NASA & NSF funding.
Today, the subcommittee said to keep NASA + NSF funding at $33.9 billion, the same as in FY24.
See 7:15 below. Full Senate appropriations committee meets tomorrow about it.
🧵 1/3
I'm not going to repost any of the insane antisemitic conspiracy bullshit that grok is spewing today, but it highlights how absolutely essential it is that we not let LLMs become a form of epistemic grounding for our society.
08.07.2025 22:40
Is this fiction? I can't find anything external to this about this Dr. Lisa Park or lawyer Sarah Chen story, or anything about "legal injection attacks" or this prof David Rodriguez...
06.07.2025 23:04
Ah, Star Wars is back to being bad again kinda feel
05.07.2025 03:45
Rogue One must have been a tonal whiplash 🤣
05.07.2025 03:45
If he was named Elon Madiba Musk and he was woke I think I would be more ok with it lmao
04.07.2025 17:34
For the longest time I've been using Google Translate as a gateway to explain machine translation concepts to people as it's a tool that everyone knows. Now I get to contribute over the summer.
If you're near Mountain View, let's talk evaluation.
Screenshot of the RExBench preprint title page.
Can coding agents autonomously implement AI research extensions?
We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code.
Finding: Most agents we tested had a low success rate, but there is promise!
Shadowbanned on openreview is crazy
01.07.2025 05:32
Woah I couldn't not read it as "jimmy" until I saw the kana lol
29.06.2025 16:14
Yeah one downside is it might reward heuristics more. I guess my thought process is that as the reviewing process gets more and more random and overwhelmed, I wonder if a more honest stochastic approach would be better. Then maybe bean counters will stop incentivizing conferenceslop
26.06.2025 01:20
Oooo true
25.06.2025 07:15
Crazy peer reviewing proposal:
1. Decouple longform reviews from acceptance decisions
2. 10 or 20 reviewers give a simple upvote/downvote after quick readthrough
3. 1 reviewer per paper writes a longform critique w/o score
4. AC uses (2)'s score, (3)'s crit
Lighter workload + more eyes on each paper
I still am an ARR believer, but this is more of a social problem than technical
25.06.2025 06:25
ngl the more coerced and transactional reviewing becomes the less eager I am to review when I didn't submit
24.06.2025 21:39
Whatever it is I like it
21.06.2025 02:27
Arnav Yayavaram and Siddharth Yayavaram were the main contributors to this project and built an awesome, clean, easy-to-use codebase that's up on GitHub now! I have found this resource to be enabling for my own work. @simi97k.bsky.social was the main mentor.
Read now!
arxiv.org/abs/2506.09109
On T2I-generated images, it is good at predicting the judgments of human raters from 10 countries of an image's relevance to their own culture compared to a set of simple baselines.
CAIRe can be used to grade the "stylistic aspects" of a fantasy entity, not just match real stuff 4/5
As far as we can tell, CAIRe works quite well. It is very performant at identifying the cultural origins of real, rare entities based on many proxies, including country, region, religion, ethnicity, and even ancient civilizations.
3/5
Our metric CAIRe (Cultural Attribution of Images with Retrieval) scores an input image using image retrieval over a multimodal KG and LM likelihood scores over entry data to assign cultural relevance scores to any set of cultural labels based on any cultural proxy (not just countries!). 2/5
20.06.2025 23:02
Multicultural text-to-image work requires costly, subjective human evaluation. Some of my projects have stalled because no automated, quantified "visual cultural attribution" metric existed.
BITS undergrads Siddharth and Arnav Yayavaram, @simi97k.bsky.social, @gneubig.bsky.social, and I made one. 1/
I make my appearance at 5:14 wearing a rhino mask. Since this has been watched 24M times I probably peaked then. And no, I did not then and do not now know her lmao
youtu.be/SRWatgS077k
in elementary school I happened to be in the same strange school play as future YouTuber Jaiden Animations.
It was such a strange play she made a video about it, including a clip of childhood me singing "you are ugly"
Thanks to YouTube rewind my Bacon number is 4 (Will Smith, Jeff Goldblum, Bacon)
Really hitting home after hearing gunshots at the SLC march.
Apparently the suspected gunman arrested after shooting & critically injuring a man.
The idea of violence against objects is a category error.
And here is the remote pre-recording of my talk for the multimodal meta-evaluation tutorial at CVPR
youtu.be/ymwPz1sioJI