If you’re excited about building agentic systems, let’s chat.
p.s. also on the UniReps panel Saturday on the broken state of reviewing & publishing.
Hiring researchers & engineers to work on
–building reliable software on top of unreliable LLM primitives
–statistical evaluation of real-world deployments of LLM-based systems
I’m speaking about this on two NeurIPS workshop panels:
🗓️Saturday – Reliable ML Workshop
🗓️Sunday – LLM Evaluation Workshop
Woke up to this email this morning
- Wow, I won a NeurIPS award?!
- …runner-up, but I’ll take it.
- Wait, I didn’t submit a paper.
- Ah, I’m chairing the session and I’m supposed to give the award.
Huge congratulations to the actual winners and runners-up!
If you're at @neuripsconf.bsky.social on Dec 6, don’t miss our panel session at @unireps.bsky.social with Ahmad Beirami, Sara Hooker and more to be announced! 🚀
23.11.2025 09:44 — 👍 1 🔁 1 💬 0 📌 0
Will be at NeurIPS Thu Dec 4 to Sun Dec 7, excited to reconnect with old friends and make new ones.
If you are excited about AI engineering (orchestration, evals, and optimizing scaffolds), we are hiring!
On Saturday I’ll be on panels at the Reliable ML & UniReps workshops.
Once you see a math concept geometrically, it becomes much easier to think about, and it’s hard to go back to any other way of seeing it.
05.11.2025 13:20 — 👍 4 🔁 0 💬 1 📌 0
Whatever you are feeling is a normal response. Give yourself time and space to process, connect with others for support, and begin healing. I am happy to help in any way I can!
24.10.2025 12:25 — 👍 1 🔁 0 💬 0 📌 0
I am sorry for what many of my excellent former colleagues are going through.
Layoffs can be emotionally challenging for everyone, whether you are directly affected or not.
The math that LLMs can do today is novel enough to be considered publishable, but it's not the kind of math that would be consequential.
24.09.2025 21:42 — 👍 4 🔁 0 💬 0 📌 0
My thoughts on the broken state of AI conference reviewing.
www.linkedin.com/feed/update/...
Let's regress from here to AGI!
11.09.2025 14:49 — 👍 3 🔁 0 💬 0 📌 0
Slide titled “Takeaways (alignment recipe).”
Step 1: Perform best-of-n and make sure it works as desired.
– Inspect a few responses and verify the reward-induced ranking makes sense.
– Best-of-n gives the best trade-offs; if it doesn’t work, no fancy method will.
– You can debug best-of-n much faster.
Step 2: Only then train your favorite alignment method.
– Track KL(π‖p) throughout training:
• KL > 100: results are unlikely to be useful.
• KL > 15: inspect outcomes for reward hacking.
• KL < 8: you are probably OK.
Bottom banner in a black box repeats “(1) Look at your data! (2) Look at your data! (3) Look at your data!” in blue, green, and red.
This is the conclusion slide of a talk I gave more than a year ago on RL/Alignment! It still holds true today.
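A minimal sketch of Step 1 of that recipe, assuming a hypothetical sample_fn for the base policy and reward_fn for the reward model (neither is from the slide); the KL number is the commonly cited analytic estimate for best-of-n, log n − (n−1)/n, not a measured value.

```python
import math


def best_of_n(prompt, sample_fn, reward_fn, n=8):
    """Draw n candidates from the base policy and keep the highest-reward one.

    sample_fn(prompt) -> str and reward_fn(prompt, response) -> float are
    hypothetical stand-ins for your base LLM sampler and your reward model.
    """
    candidates = [sample_fn(prompt) for _ in range(n)]
    scored = sorted(candidates, key=lambda r: reward_fn(prompt, r), reverse=True)
    # Step 1 of the slide: eyeball the reward-induced ranking before training anything.
    for rank, response in enumerate(scored, start=1):
        print(f"#{rank}  reward={reward_fn(prompt, response):.3f}  {response[:80]}")
    return scored[0]


def bon_kl_estimate(n):
    """Commonly cited analytic estimate of KL(best-of-n || base policy)."""
    return math.log(n) - (n - 1) / n
```

The point of the sketch is the workflow, not the numbers: if the printed ranking already looks wrong at n=8, a fancier alignment method trained on the same reward will not fix it.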
10.09.2025 13:07 — 👍 3 🔁 0 💬 0 📌 0
This also applies to telling your story (e.g., in a CV, bio, interview, etc.).
Focus on what you have accomplished and what you are excited about doing next, not just where you did it!
Haha. The content was:
If a paper is great, the credit goes to the first author.
If a paper has any flaws, the responsibility falls on the last author.
The actual unpopular opinion is that the notion of senior and junior authors should be abolished. It has completely diluted the notion of scientific authorship and created this entire industry of free-riding, head-in-the-clouds, incompetent PIs/managers. List exact contributions instead. [+]
07.09.2025 18:49 — 👍 10 🔁 2 💬 2 📌 0
Glad you asked.
bsky.app/profile/abei...
I occasionally get messages asking how to follow my path and get into Meta, DeepMind, or similar places. That is the wrong question. Do not focus on the brand! Focus on what you want to work on, then find the opportunity that fits your goals best.
09.09.2025 13:31 — 👍 2 🔁 0 💬 0 📌 1
Related to this: if a paper turns out to have a major error in it, you’re supposed to throw yourself under the bus, not your students.
07.09.2025 17:35 — 👍 24 🔁 1 💬 2 📌 0
I think for every deliverable, there has to be one person who is responsible (gets it done) and one person who is accountable (makes sure it's done correctly).
Middle authors can be responsible or accountable for a subset of tasks.
The contribution breakdown proposed by Atlas Wang makes a lot of sense imo:
www.linkedin.com/feed/update/...
Not really. I've been saying variants of the same thing for a long time:
x.com/abeirami/sta...
Corollary: If you lack the bandwidth or expertise to act as the verifier, then you shouldn't sign up to be the senior author of a paper!
06.09.2025 21:42 — 👍 5 🔁 0 💬 0 📌 1
The junior author is the generator. The senior author is the verifier. The verifier should teach/distill some checks to the generator, but the verifier keeps final responsibility. If a wrong claim gets out, it is on the verifier!
06.09.2025 20:35 — 👍 3 🔁 0 💬 2 📌 0
Unpopular opinion:
When a paper has a senior mentor and a junior mentee, the senior author must make sure the claims are correct and well supported. They must check every claim and gate the submission until it meets that bar.
This is the recipe for many provable claims:
Make enough assumptions and narrow down the claim, then prove a narrow result with caveats. Present it as broad, hide the caveats, and declare “XYZ is provable!”
Today, industry research is focused on short-term (3-6 month) bets. Academics have an opportunity to balance their portfolio with medium-term (1-2 year) and long-term (5-10 year) bets. Putting all academic effort in the short-term basket is suboptimal!
05.09.2025 13:24 — 👍 7 🔁 0 💬 0 📌 0
When I worked in corporate, I was often first in the office because that routine worked for me. It was a personal preference, not a benchmark for anyone else.
We should not judge commitment by hours, especially in research. We should look for thoughtful work and steady progress.
True, this sounds obvious, but it is more common than we'd hope, unfortunately.
02.09.2025 02:42 — 👍 0 🔁 0 💬 0 📌 0
Common mistake in LLM prompting projects: jumping into full-scale pipelines (datasets and inference) without testing feasibility. Iterating at scale is expensive and time-consuming.
Start with ONE example to validate the hypothesis, verify context, debug the design, then scale.
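A minimal sketch of that workflow, assuming a hypothetical PROMPT_TEMPLATE (extraction task purely for illustration) and an injected call_llm client; nothing here is tied to a particular model or API.

```python
from typing import Callable

# Hypothetical task and template, only to make the sketch concrete.
PROMPT_TEMPLATE = "Extract the main claim from the abstract below.\n\nAbstract:\n{abstract}"


def validate_one_example(example: dict, call_llm: Callable[[str], str]) -> str:
    """Run the pipeline end to end on a single example and print everything."""
    prompt = PROMPT_TEMPLATE.format(abstract=example["abstract"])
    print("--- prompt actually sent ---")
    print(prompt)            # verify the context is what you think it is
    response = call_llm(prompt)
    print("--- raw response ---")
    print(response)          # inspect before writing any parsing or eval code
    return response


def run_at_scale(dataset: list[dict], call_llm: Callable[[str], str]) -> list[str]:
    # Only after the single example looks right do you pay for the full pass.
    return [call_llm(PROMPT_TEMPLATE.format(abstract=ex["abstract"])) for ex in dataset]
```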
I have to admit that I embarrassingly didn't know about this history. Nice reading material for the long weekend, thank you :)
31.08.2025 13:34 — 👍 1 🔁 0 💬 1 📌 0