John F Wu

@jwuphysics.bsky.social

Tenure-track astronomer at STScI/JHU working on galaxies, machine learning, and AI for scientific discovery. Opinions my own. He/him. Website: https://jwuphysics.github.io/

3,551 Followers  |  625 Following  |  960 Posts  |  Joined: 07.07.2023  |  1.9779

Latest posts by jwuphysics.bsky.social on Bluesky

Catch us in Montreal!

06.10.2025 19:45 - 👍 1    🔁 0    💬 0    📌 0

Here is a genuine one :) CosmicAI’s AstroVisBench, to appear at #NeurIPS

bsky.app/profile/nsfs...

02.10.2025 14:03 - 👍 2    🔁 1    💬 0    📌 0
Answer Matching Outperforms Multiple Choice for Language Model Evaluation
Multiple choice benchmarks have long been the workhorse of language model evaluation because grading multiple choice is objective and easy to automate. However, we show multiple choice questions from ...

Okay I guess I should be more fair. This isn't the worst offender, but I'm still not a fan: it misses loads of relevant citations, doesn't release the benchmark, its example questions are meh (see MIRI question in Fig 1), and multiple choice is known to be bad (see e.g. arxiv.org/abs/2507.02856)

02.10.2025 15:50 - 👍 1    🔁 0    💬 1    📌 0
AstroMMBench: A Benchmark for Evaluating Multimodal Large Language Models Capabilities in Astronomy
Astronomical image interpretation presents a significant challenge for applying multimodal large language models (MLLMs) to specialized scientific tasks. Existing benchmarks focus on general multimoda...

I've never shuddered so hard at reading AI slop.

Please make it stop

arxiv.org/abs/2510.00063

02.10.2025 13:42 - 👍 7    🔁 0    💬 2    📌 0
A sneak preview of Figure 6 in the paper, which shows size (r-band radius) vs stellar mass for SAGA satellites, SAGA background galaxies, and SDSS isolated galaxies. They all obey the same trends but have small offsets, which appears unlikely to be driven by SFR but *does* seem to be driven by environment!

Fantastic work on the size–mass relation for low-mass galaxies, led by Yasmeen (@yasmeenasali.bsky.social)!

arxiv.org/abs/2509.25335

🔭🌌🧪

01.10.2025 13:54 - 👍 11    🔁 3    💬 0    📌 0

Any mutuals going to be in Montréal next week? Give me a shout if so!

I'll be attending COLM and visiting UdeM, Ciela, and Mila, and presenting on various topics spanning ML applications in galaxy evolution to interpretable AI for scientific discovery.

01.10.2025 11:35 - 👍 1    🔁 0    💬 0    📌 0
A merger galaxy system that looks Arp-like.

https://www.legacysurvey.org/viewer?ra=171.8432&dec=-5.4983&layer=ls-dr9&zoom=14

An Arp-like messy merger at z=0.03.

www.legacysurvey.org/viewer?ra=17...

29.09.2025 16:18 - 👍 9    🔁 1    💬 0    📌 0
Screenshot of our abstract from the COLM schedule page, printed below

Thursday, October 9th

Title: From Queries to Criteria: Understanding How Astronomers Evaluate LLMs

11:00 AM – 1:00 PM
710

Authors: Alina Hyk, Kiera McCormick, Mian Zhong, Ioana Ciucă, Sanjib Sharma, John F Wu, J. E. G. Peek, Kartheik G. Iyer, Ziang Xiao, Anjalie Field

Abstract
There is growing interest in leveraging LLMs to aid in astronomy and other scientific research, but benchmarks for LLM evaluation in general have not kept pace with the increasingly diverse ways that real people evaluate and use these models. In this study, we seek to improve evaluation procedures by building an understanding of how users evaluate LLMs. We focus on a particular use case: an LLM-powered retrieval-augmented generation bot for engaging with astronomical literature, which we deployed via Slack. Our inductive coding of 368 queries to the bot over four weeks and our follow-up interviews with 11 astronomers reveal how humans evaluated this system, including the types of questions asked and the criteria for judging responses. We synthesize our findings into concrete recommendations for building better benchmarks, which we then employ in constructing a sample benchmark for evaluating LLMs for astronomy. Overall, our work offers ways to improve LLM evaluation and ultimately usability, particularly for use in scientific research.

Anyone else going to COLM? Give me a shout!

Also, check out our poster on evaluating LLMs for astronomy research. This work came out of our 2024 JSALT research and was jointly led by undergrads Alina Hyk and Kiera McCormick!

27.09.2025 19:54 - 👍 2    🔁 0    💬 0    📌 0

I assume the submission is 9 pp and then the camera ready is 10 pp. Strange that they wrote "submission version" every time...

25.09.2025 00:31 - 👍 1    🔁 0    💬 0    📌 0
Baltimore's new tiered speeding fine structure: $40 (<15 mph over), $70 (16-19 mph), $120 (20-29 mph), $230 (30-39 mph), and $425 (40+ mph)

Thanks Baltimore DOT for penalizing 40+ mph speeders more harshly, but by that point shouldn't you be revoking their driving license?

24.09.2025 19:25 - 👍 8    🔁 0    💬 2    📌 0

*Girl who's excited for @baltimorebeat.bsky.social's FIRST food issue, coming tomorrow* 🍔🌭🌮🍕

23.09.2025 19:11 - 👍 18    🔁 8    💬 0    📌 0

They really don't pay you guys enough to be subjected to the disappointment that is eating at Chipotle in Baltimore

Also 27 points for the one in Mt Vernon?! They haven't once fulfilled my order correctly or had all menu items in stock.

23.09.2025 10:43 - 👍 1    🔁 0    💬 0    📌 0

Tbf Lamar usually doesn't throw it away, and he somehow makes magic out of it. Not this time...

23.09.2025 01:18 - 👍 0    🔁 0    💬 0    📌 0

Didn't seem like the o line had any idea who their blocking assignments were.

23.09.2025 01:17 - 👍 1    🔁 0    💬 0    📌 0

Oops I mean 2022

21.09.2025 16:36 - 👍 1    🔁 0    💬 0    📌 0

While democracy dies in darkness, let me just say that one of my most prized possessions is the re-launch print of the @baltimorebeat.bsky.social from 2020

21.09.2025 00:19 - 👍 5    🔁 0    💬 1    📌 1
Supplementary Fig 1 of the paper, showing the schematic of the heating equipment (left) and image acquisition setup (right)

Supplementary Fig 1 of the paper, showing the actual heating equipment (left) and image acquisition setup (right). The right side shows a camera mounted over an illuminated box. The box is the outside packaging of the popular Bialetti moka pot.

Two things about this paper.

1. This is legitimately useful information
2. The supplementary material shows the experimental setup... and they perform all experiments in a Bialetti Moka pot box, because of course they did

19.09.2025 16:37 - 👍 3    🔁 0    💬 0    📌 0
2025 Ig Physics Nobel Prize goes to perfect pasta sauce
The Ig Nobel Prize honors research that first makes people laugh, then makes them think. Its 35th award ceremony possibly also makes people hungry: ISTA physicist Fabrizio Olmeda and colleagues resear...

Delighted to see that this year's Ig Nobel Physics Prize is about the phase behavior of Cacio e pepe sauce.

Paper: pubs.aip.org/aip/pof/arti...

Pop Sci article: phys.org/news/2025-09...

19.09.2025 16:32 - 👍 23    🔁 8    💬 2    📌 1
A cell finding its way through the matrix, imaged with @joycemeiri.bsky.social on LLS.

19.09.2025 10:38 - 👍 33    🔁 12    💬 1    📌 1

That's how I learned it!

17.09.2025 16:41 - 👍 1    🔁 0    💬 0    📌 0

TIL that @colmweb.org is pronounced like "Collum"!

16.09.2025 14:24 - 👍 3    🔁 0    💬 1    📌 0
A galaxy with long filament structures tailing behind the galaxy, like the tentacles of a jellyfish. The shape of the galaxy is also slightly warped.

The jellyfish #galaxy MACSJ0451-JFG1 in a galaxy cluster with #JWST NIRCam. 🔭

The galaxy is experiencing ram-pressure stripping. It moves through the intracluster medium and is stripped of gas, leaving tails that form stars.

My image processing from today: commons.wikimedia.org/wiki/File:MA...

13.09.2025 10:03 - 👍 153    🔁 39    💬 3    📌 2

Thank you! 💙

12.09.2025 21:55 - 👍 1    🔁 0    💬 0    📌 0

And I didn't even have to pay a billionaire! Wow!

12.09.2025 21:54 - 👍 3    🔁 0    💬 1    📌 0

Impressive sleuthing!

Careful observations 🤝 careful statistical modeling

12.09.2025 11:54 - 👍 2    🔁 0    💬 1    📌 0
www.adl.org/resources/re...

11.09.2025 16:12 - 👍 10243    🔁 4384    💬 300    📌 197
The JWST Emission Line Survey (JELS): The sizes and merger fraction of star-forming galaxies during the Epoch of Reionization
We used observations from the JWST Emission Line Survey (JELS) to measure the half-light radii ($r_{e}$) of 23 H$\alpha$-emitting star-forming (SF) galaxies at $z=6.1$ in the PRIMER/COSMOS field. Ga...

Hello everyone! My first Bluesky posts!

I am very pleased to share that my 3rd first-author paper of my PhD is now available on arXiv! The paper has also been submitted to MNRAS following minor suggested revisions from the anonymous referee!

arxiv.org/abs/2509.08045

Thread on the details 🧵

11.09.2025 17:55 - 👍 20    🔁 3    💬 1    📌 2
A snapshot of the Illustris TNG simulation, showing stellar light, with the text "Is there a better way to quantify environment?" as the slide title. It then shows the environmental "parameterizations": spherical overdensity (a blue circle of fixed radius), DisPerSE (a topological data analysis technique), and Graph Neural Networks (which uses the subhalo catalog as a point cloud and connects them within a fixed linking length). There is also an example of a subvolume shown as a graph neural network.

Thanks to UMaryland Astro for inviting me to give the Center for Theory and Computation Seminar!

10.09.2025 16:42 - 👍 5    🔁 0    💬 0    📌 0
FAQ: The AstroSci Feed - The Astrosky Ecosystem
We're building social media tools for the astronomy & space science communities. From feeds to hosting, we're billionaire-proofing scientific discussion for good.

Astronomy has nice feeds (astrosky.eco/faq/research).

Unfortunately still hard to stay up to date on ML/AI here; lots of researchers stuck with Twitter.

10.09.2025 01:53 - 👍 1    🔁 0    💬 1    📌 0

I just watched the strangest #orioles walk off win

10.09.2025 01:38 - 👍 3    🔁 0    💬 0    📌 0

@jwuphysics is following 20 prominent accounts