Michael Saxon

@saxon.me.bsky.social

Doctor of NLP/Vision+Language from UCSB. Evals, metrics, multilinguality, multiculturality, multimodality, and (dabbling in) reasoning. https://saxon.me/

2,656 Followers  |  677 Following  |  263 Posts  |  Joined: 08.09.2023  |  1.9487

Latest posts by saxon.me on Bluesky

It looks like the Ai2paperfinder is down? Huge bummer, it has become an integral piece of my workflow, and I'd pay to use it tbh. Any Ai2 people have any insights on this situation?

+Japan/Korea Travel photos tax

07.08.2025 00:30 | 👍 1    🔁 0    💬 2    📌 0

Kinda wow: the mystery model "summit" (rumored to be OpenAI) with the prompt "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future" & "make it better"

2,351 lines of code. First time

27.07.2025 03:10 | 👍 203    🔁 19    💬 7    📌 2

East Asia friends! I will be in Seoul next week, then Tokyo ~2 weeks, then Taipei for a weekish! Let me know if you're around and want to grab a lunch, coffee, or beer!

23.07.2025 22:38 | 👍 2    🔁 0    💬 0    📌 0

I wonder if Nature will also want to publish this correction 🤔

17.07.2025 15:36 | 👍 1    🔁 0    💬 0    📌 0

this is one of the worst long-term trends for US political stability, forming in medias res

10.07.2025 03:29 | 👍 240    🔁 40    💬 7    📌 0
Subcommittee Markup of the Commerce, Justice, Science, and Related Agencies Appropriations Act | United States Senate Committee on Appropriations

🧪 BREAKING (good news): Senate subcommittee says NO! to Trump's proposed slashes to NASA & NSF funding.

Today, the subcommittee said to keep NASA + NSF funding at $33.9 billion, the same as in FY24.

See 7:15 below. Full Senate appropriations committee meets tomorrow about it.

🧵 1/3

10.07.2025 00:23 | 👍 1008    🔁 377    💬 18    📌 59

I'm not going to repost any of the insane antisemitic conspiracy bullshit that grok is spewing today, but it highlights how absolutely essential it is that we not let LLMs become a form of epistemic grounding for our society.

08.07.2025 22:40 | 👍 3410    🔁 751    💬 48    📌 41

Is this fiction? I can't find anything external to this about this Dr. Lisa Park or lawyer Sarah Chen story, or anything about "legal injection attacks" or this prof David Rodriguez...

06.07.2025 23:04 | 👍 0    🔁 0    💬 1    📌 0

Ah, kinda feel like Star Wars is back to being bad again

05.07.2025 03:45 | 👍 1    🔁 0    💬 0    📌 0

Rogue One must have been a tonal whiplash 🤣

05.07.2025 03:45 | 👍 1    🔁 0    💬 2    📌 0

If he was named Elon Madiba Musk and he was woke I think I would be more ok with it lmao

04.07.2025 17:34 | 👍 0    🔁 0    💬 0    📌 0

For the longest time I've been using Google Translate as a gateway to explain machine translation concepts to people as it's a tool that everyone knows. Now I get to contribute over the summer. 🌞

If you're near Mountain View, let's talk evaluation.

03.07.2025 04:14 | 👍 15    🔁 1    💬 0    📌 0
Screenshot of the RExBench preprint title page.

Can coding agents autonomously implement AI research extensions?

We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code.

Finding: Most agents we tested had a low success rate, but there is promise!

02.07.2025 15:39 | 👍 12    🔁 4    💬 1    📌 2

Shadowbanned on openreview is crazy

01.07.2025 05:32 | 👍 0    🔁 0    💬 1    📌 0

Woah I couldn't not read it as "jimmy" until I saw the kana lol

29.06.2025 16:14 | 👍 1    🔁 0    💬 0    📌 0

Yeah, one downside is it might reward heuristics more. I guess my thought process is that as the reviewing process gets more and more random and overwhelmed, I wonder if a more honest stochastic approach would be better. Then maybe bean counters will stop incentivizing conferenceslop

26.06.2025 01:20 | 👍 0    🔁 0    💬 1    📌 0

Oooo true

25.06.2025 07:15 | 👍 1    🔁 0    💬 0    📌 0

Crazy peer reviewing proposal:

1. Decouple longform reviews from acceptance decisions
2. 10 or 20 reviewers give a simple upvote/downvote after quick readthrough
3. 1 reviewer per paper writes a longform critique w/o score
4. AC uses (2)'s score, (3)'s crit

Lighter workload + more eyes on each paper

25.06.2025 06:39 | 👍 1    🔁 0    💬 1    📌 0
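
A minimal sketch of how the four steps of the proposal above might fit together, purely for illustration. The names `Submission`, `upvote_fraction`, `ac_summary` and the `threshold` cutoff are all hypothetical; the post does not say how the AC should weigh the votes against the critique.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    votes: List[bool]         # step 2: True = upvote, False = downvote
    critique: Optional[str]   # step 3: one longform critique, no score attached


def upvote_fraction(votes: List[bool]) -> float:
    """Share of quick-read reviewers who upvoted (0.0 if nobody voted)."""
    return sum(votes) / len(votes) if votes else 0.0


def ac_summary(sub: Submission, threshold: float = 0.5) -> dict:
    """Bundle what the AC would see in step 4: the vote score plus the critique.

    `threshold` is an assumed cutoff for illustration only; the final
    decision stays with the AC (step 1 decouples it from the longform review).
    """
    score = upvote_fraction(sub.votes)
    return {
        "score": score,
        "leaning": "accept" if score >= threshold else "reject",
        "critique": sub.critique,
    }
```

For example, `ac_summary(Submission(votes=[True] * 7 + [False] * 3, critique="Solid idea, thin evals."))` gives the AC a 7-of-10 upvote share alongside the unscored critique.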

I still am an ARR believer, but this is more of a social problem than technical

25.06.2025 06:25 | 👍 0    🔁 0    💬 1    📌 0

ngl the more coerced and transactional reviewing becomes the less eager I am to review when I didn't submit

24.06.2025 21:39 | 👍 3    🔁 0    💬 2    📌 0

Whatever it is I like it

21.06.2025 02:27 | 👍 1    🔁 0    💬 0    📌 0
CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation As text-to-image models become increasingly prevalent, ensuring their equitable performance across diverse cultural contexts is critical. Efforts to mitigate cross-cultural biases have been hampered b...

Arnav Yayavaram and Siddharth Yayavaram were the main contributors to this project and built an awesome, clean, easy-to-use codebase that's up on Github now! I have found this resource to be enabling for my own work. @simi97k.bsky.social was the main mentor.

Read now!

arxiv.org/abs/2506.09109

20.06.2025 23:02 | 👍 4    🔁 2    💬 1    📌 0

On T2I-generated images, it predicts the judgments of human raters from 10 countries about an image's relevance to their own culture better than a set of simple baselines.

CAIRe can be used to grade the "stylistic aspects" of a fantasy entity, not just match real stuff. 4/5

20.06.2025 23:02 | 👍 0    🔁 0    💬 1    📌 0

As far as we can tell, CAIRe works quite well. It is very performant at identifying the cultural origins of real, rare entities based on many proxies, including country, region, religion, ethnicity, and even ancient civilizations.

3/5

20.06.2025 23:02 | 👍 0    🔁 0    💬 1    📌 0

Our metric CAIRe (Cultural Attribution of Images with Retrieval) scores an input image using image retrieval over a multimodal KG and LM likelihood scores over entry data to assign cultural relevance scores to any set of cultural labels based on any cultural proxy (not just countries!). 2/5

20.06.2025 23:02 | 👍 0    🔁 0    💬 1    📌 0

Multicultural text-to-image work requires costly, subjective human evaluation. Some of my projects have stalled because no automated, quantified "visual cultural attribution" metric existed.

BITS undergrads Siddharth and Arnav Yayavaram, @simi97k.bsky.social, @gneubig.bsky.social, and I made one. 1/5

20.06.2025 23:02 | 👍 8    🔁 2    💬 1    📌 0
My Time at "Camp Operetta"
YouTube video by JaidenAnimations

I make my appearance at 5:14 wearing a rhino mask. Since this has been watched 24M times I probably peaked then. And no, I did not then and do not now know her lmao

youtu.be/SRWatgS077k

17.06.2025 03:22 | 👍 0    🔁 0    💬 0    📌 0

In elementary school I happened to be in the same strange school play as future YouTuber Jaiden Animations.

It was such a strange play she made a video about it, including a clip of childhood me singing "you are ugly"

Thanks to YouTube rewind my Bacon number is 4 (Will Smith, Jeff Goldblum, Bacon)

17.06.2025 03:17 | 👍 0    🔁 0    💬 1    📌 0

Really hitting home after hearing gunshots at the SLC march.
Apparently the suspected gunman was arrested after shooting & critically injuring a man.

The idea of violence against objects is a category error.

15.06.2025 04:17 | 👍 7    🔁 3    💬 0    📌 0

And here is the remote pre-recording of my talk for the multimodal meta-evaluation tutorial at CVPR

youtu.be/ymwPz1sioJI

12.06.2025 07:53 | 👍 1    🔁 0    💬 0    📌 0
