
Ahmad Beirami

@abeirami.bsky.social

stealth // Gemini RL+inference @ Google DeepMind // Conversational AI @ Meta // RL Agents @ EA // ML+Information Theory @ MIT+Harvard+Duke // Georgia Tech PhD // Woman, Life, Freedom 📍{NYC, SFO, YYZ} 🔗 https://beirami.github.io/

3,597 Followers  |  1,332 Following  |  322 Posts  |  Joined: 10.11.2024

Latest posts by abeirami.bsky.social on Bluesky

If you’re excited about building agentic systems, let’s chat.

p.s. also on the UniReps panel Saturday on the broken state of reviewing & publishing.

02.12.2025 14:55 — 👍 3    🔁 0    💬 0    📌 0

Hiring researchers & engineers to work on
–building reliable software on top of unreliable LLM primitives
–statistical evaluation of real-world deployments of LLM-based systems

I’m speaking about this on two NeurIPS workshop panels:
🗓️Saturday – Reliable ML Workshop
🗓️Sunday – LLM Evaluation Workshop

02.12.2025 14:55 — 👍 19    🔁 5    💬 2    📌 0

Woke up to this email this morning

- Wow, I won a NeurIPS award?!
- …runner-up, but I’ll take it.
- Wait, I didn’t submit a paper.
- Ah, I’m chairing the session and I’m supposed to give the award.

Huge congratulations to the actual winners and runners-up!

02.12.2025 14:54 — 👍 4    🔁 0    💬 1    📌 0

If you're at @neuripsconf.bsky.social on Dec 6, don’t miss our panel session at @unireps.bsky.social with Ahmad Beirami, Sara Hooker and more to be announced! 🚀

23.11.2025 09:44 — 👍 1    🔁 1    💬 0    📌 0

Will be at NeurIPS Thu Dec 4 to Sun Dec 7, excited to reconnect with old friends and make new ones.

If you are excited about AI engineering (orchestration, evals, and optimizing scaffolds), we are hiring!

On Saturday I’ll be on panels at the Reliable ML & UniReps workshops.

22.11.2025 21:12 — 👍 9    🔁 0    💬 0    📌 1

Once you see a math concept geometrically, it becomes much easier to think about, and it’s hard to go back to any other way of seeing it.

05.11.2025 13:20 — 👍 4    🔁 0    💬 1    📌 0

Whatever you are feeling is a normal response. Give yourself time and space to process, connect with others for support, and begin healing. I am happy to help in any way I can!

24.10.2025 12:25 — 👍 1    🔁 0    💬 0    📌 0

I am sorry for what many of my excellent former colleagues are going through.

Layoffs can be emotionally challenging for everyone, whether you are directly affected or not.

24.10.2025 12:25 — 👍 1    🔁 0    💬 1    📌 0

The math that LLMs can do today is novel enough to be considered publishable, but it's not the kind of math that would be consequential.

24.09.2025 21:42 — 👍 4    🔁 0    💬 0    📌 0

My thoughts on the broken state of AI conference reviewing.

www.linkedin.com/feed/update/...

20.09.2025 18:12 — 👍 8    🔁 0    💬 0    📌 1

Let's regress from here to AGI!

11.09.2025 14:49 — 👍 3    🔁 0    💬 0    📌 0
Slide titled “Takeaways (alignment recipe).”

Step 1: Perform Best-of-n and make sure it works as desired.
– Inspect a few responses and verify the reward-induced ranking makes sense.
– Best-of-n gives the best trade-offs; if it doesn’t work, no fancy method will.
– You can debug best-of-n much faster.

Step 2: Only then train your favorite alignment method.
– Track KL(π‖p) throughout training:
• KL > 100: results are unlikely to be useful.
• KL > 15: inspect outcomes for reward hacking.
• KL < 8: you are probably OK.

Bottom banner in a black box repeats “(1) Look at your data! (2) Look at your data! (3) Look at your data!” in blue, green, and red.
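The two-step recipe on the slide can be sketched in code. This is a minimal, illustrative sketch, not the talk's actual implementation: `reward` and `sample_fn` are toy stand-ins for a real reward model and base policy, and the KL comment uses the known upper bound KL(π_BoN‖p) ≤ log n − (n−1)/n for the best-of-n policy.

```python
import math
import random

random.seed(0)

def reward(response: str) -> float:
    # Toy stand-in for a reward model (here: prefers ~20-char responses).
    return -abs(len(response) - 20)

def best_of_n(prompt: str, sample_fn, n: int = 4):
    """Draw n samples from the base policy and keep the highest-reward one.

    Returning the full ranking makes Step 1 of the recipe possible:
    inspect a few responses and verify the reward-induced ranking
    makes sense before training anything.
    """
    candidates = [sample_fn(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=reward, reverse=True)
    return ranked[0], ranked

# Toy sampler standing in for an LLM base policy p.
def sample_fn(prompt: str) -> str:
    return "x" * random.randint(5, 40)

best, ranked = best_of_n("explain KL divergence", sample_fn, n=8)

# Best-of-n is cheap to debug partly because its KL from the base
# policy is bounded: KL(pi_BoN || p) <= log(n) - (n-1)/n.
# For n=8 that is about 1.2 nats, far below the thresholds on the slide.
kl_upper = math.log(8) - 7 / 8
```

Under this bound, best-of-n can never drift far from the base policy, which is one reason it serves as a sanity check before running a full alignment method whose KL(π‖p) must be tracked explicitly.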


This is the conclusion slide of a talk I gave more than a year ago on RL/Alignment! It still holds true today.

10.09.2025 13:07 — 👍 3    🔁 0    💬 0    📌 0

This also applies to telling your story (e.g., in a CV, bio, interview, etc).

Focus on what you have accomplished and what you are excited about doing next; not just where you did it!

09.09.2025 14:07 — 👍 3    🔁 0    💬 0    📌 0

Haha. The content was:

If a paper is great, the credit goes to the first author.

If a paper has any flaws, the responsibility falls on the last author.

09.09.2025 13:41 — 👍 3    🔁 0    💬 1    📌 0

The actual unpopular opinion is that the notion of senior and junior authors should be abolished. It has completely diluted the notion of scientific authorship and created an entire industry of free-riding, head-in-the-clouds, incompetent PIs/managers. List exact contributions instead. [+]

07.09.2025 18:49 — 👍 10    🔁 2    💬 2    📌 0

Glad you asked.
bsky.app/profile/abei...

09.09.2025 13:31 — 👍 1    🔁 0    💬 1    📌 0

I occasionally get messages asking how to follow my path and get into Meta, DeepMind, or similar places. That is the wrong question. Do not focus on the brand! Focus on what you want to work on, then find the opportunity that fits your goals best.

09.09.2025 13:31 — 👍 2    🔁 0    💬 0    📌 1

Related to this: if a paper turns out to have a major error in it, you're supposed to throw yourself under the bus, not your students.

07.09.2025 17:35 — 👍 24    🔁 1    💬 2    📌 0

I think for every deliverable, there has to be one person who is responsible (gets it done) and one person who is accountable (makes sure it's done correctly).

Middle authors can be responsible or accountable for a subset of tasks.

07.09.2025 16:28 — 👍 4    🔁 1    💬 1    📌 0

The proposed contribution breakdown by Atlas Wang makes a lot of sense, imo:
www.linkedin.com/feed/update/...

07.09.2025 16:28 — 👍 0    🔁 0    💬 2    📌 0

Not really. I've been saying variants of the same thing for a long time:

x.com/abeirami/sta...

06.09.2025 22:44 — 👍 2    🔁 0    💬 0    📌 1

Corollary: If you lack bandwidth or expertise to act as the verifier, then you shouldn't sign up to be the senior author of a paper!

06.09.2025 21:42 — 👍 5    🔁 0    💬 0    📌 1

The junior author is the generator. The senior author is the verifier. The verifier should teach/distill some checks to the generator, but the verifier keeps final responsibility. If a wrong claim gets out, it is on the verifier!

06.09.2025 20:35 — 👍 3    🔁 0    💬 2    📌 0

Unpopular opinion:
When a paper has a senior mentor and a junior mentee, the senior author must make sure the claims are correct and well supported. They must check every claim and gate the submission until it meets that bar.

06.09.2025 20:35 — 👍 19    🔁 3    💬 4    📌 3

This is the recipe for many provable claims:

Make enough assumptions and narrow down the claim, then prove a narrow result with caveats. Present it as broad, hide the caveats, and declare “XYZ is provable!”

06.09.2025 15:27 — 👍 3    🔁 0    💬 0    📌 0

Today, industry research is focused on short-term (3-6 month) bets. Academics have an opportunity to balance their portfolios with medium-term (1-2 year) and long-term (5-10 year) bets. Putting all academic effort in the short-term basket is suboptimal!

05.09.2025 13:24 — 👍 7    🔁 0    💬 0    📌 0

When I worked in corporate, I was often first in the office because that routine worked for me. It was a personal preference, not a benchmark for anyone else.

We should not judge commitment by hours, especially in research. We should look for thoughtful work and steady progress.

02.09.2025 13:39 — 👍 3    🔁 0    💬 0    📌 0

True, this sounds obvious but it is more common than we'd hope, unfortunately.

02.09.2025 02:42 — 👍 0    🔁 0    💬 0    📌 0

Common mistake in LLM prompting projects: jumping into full-scale pipelines (datasets and inference) without testing feasibility. Iterating at scale is expensive and time-consuming.

Start with ONE example to validate the hypothesis, verify context, debug the design, then scale.
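The "one example first" workflow above can be sketched as a tiny harness. Everything here is hypothetical scaffolding: `call_llm` is a placeholder for whatever client the project actually uses, and the dataset and checks are illustrative.

```python
# Minimal sketch of "validate on ONE example before scaling" for an
# LLM prompting pipeline. call_llm is a hypothetical stand-in client.

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real API call once the single example passes.
    return "PARIS"

def build_prompt(record: dict) -> str:
    return f"Answer in uppercase. Question: {record['question']}"

def check(record: dict, output: str) -> bool:
    # Cheap sanity checks: format first, then content.
    return output.isupper() and record["expected"].lower() in output.lower()

dataset = [{"question": "Capital of France?", "expected": "Paris"}]

# Step 1: run exactly ONE example end to end and inspect it by hand.
one = dataset[0]
output = call_llm(build_prompt(one))
assert check(one, output), "Fix the prompt/design before scaling."

# Step 2 (only after Step 1 passes): loop over the full dataset.
results = [check(r, call_llm(build_prompt(r))) for r in dataset]
```

The point of the structure is that the expensive loop in Step 2 is unreachable until the single-example assertion in Step 1 passes, which forces the prompt, context, and checks to be debugged at the cost of one inference call.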

01.09.2025 16:33 — 👍 7    🔁 1    💬 1    📌 0

I have to admit that I embarrassingly didn't know about this history. Nice reading material for the long weekend, thank you :)

31.08.2025 13:34 — 👍 1    🔁 0    💬 1    📌 0
