ELLIS Institute PIs supported by UK government to research AI safety
Are AI systems learning to “game” their safety tests?
The UK’s Department for Science, Innovation and Technology has awarded a grant to Principal Investigators Sahar Abdelnabi, @maksym-andr.bsky.social, and @jonasgeiping.bsky.social.
Find out more: institute-tue.ellis.eu/en/news/elli...
12.02.2026 08:22 — 👍 3 🔁 1 💬 0 📌 0
PI Maksym Andriushchenko awarded funding from Coefficient Giving
The ELLIS Institute is proud to announce that Coefficient Giving is supporting our Principal Investigator Maksym Andriushchenko with a grant of $1,000,000 to fund his research on AI safety.
Find out more on our website: institute-tue.ellis.eu/en/news/pi-m...
08.01.2026 10:29 — 👍 5 🔁 1 💬 0 📌 1
👏 Give a big round of applause to our 2025 PhD Award Winners!
The two main winners are: @zhijingjin.bsky.social & @maksym-andr.bsky.social.
Two runners-up were selected additionally: Siwei Zhang & Elias Frantar
Learn even more about each outstanding scientist: https://bit.ly/4pm2Eji
02.12.2025 15:56 — 👍 14 🔁 2 💬 0 📌 0
🙌 It's been a pleasure to participate in this project alongside so many amazing experts!
3/3
15.10.2025 12:24 — 👍 1 🔁 0 💬 1 📌 0
This is the first key update before the release of the full report early next year. Both this update and the upcoming report discuss the critically important topics of AI risks and capabilities in a balanced way, carefully weighing all available scientific evidence.
2/3
15.10.2025 12:24 — 👍 1 🔁 0 💬 1 📌 0
📣 Incredibly excited to participate in writing the International AI Safety Report, chaired by Yoshua Bengio, as chapter lead for the capabilities chapter!
⚖️ AI is progressing so rapidly that yearly updates are no longer sufficient.
1/3
15.10.2025 12:24 — 👍 1 🔁 0 💬 1 📌 0
YouTube video by Friday Talks Tübingen
AI Safety and Alignment - [Maksym Andriushchenko]
A new recording of our FridayTalks@Tübingen series is online!
AI Safety and Alignment
by
@maksym-andr.bsky.social
Watch here: youtu.be/7WRW8MDQ8bk
11.09.2025 13:18 — 👍 6 🔁 1 💬 1 📌 0
Thank you so much, Nicolas! :)
06.08.2025 19:35 — 👍 0 🔁 0 💬 0 📌 0
We believe getting this—some may call it "AGI"—right is one of the most important challenges of our time.
Join us on this journey!
11/11
06.08.2025 15:43 — 👍 0 🔁 0 💬 0 📌 0
Taking this into account, we are only interested in studying methods that are general and scale with intelligence and compute. Everything that helps to advance their safety and alignment with societal values is relevant to us.
10/n
06.08.2025 15:43 — 👍 0 🔁 0 💬 1 📌 0
Broader vision. Current machine learning methods are fundamentally different from what they used to be pre-2022. The Bitter Lesson summarized and predicted this shift very well back in 2019: "general methods that leverage computation are ultimately the most effective".
9/n
06.08.2025 15:43 — 👍 0 🔁 0 💬 1 📌 0
... —literally anything that can be genuinely useful for other researchers and the general public.
8/n
06.08.2025 15:43 — 👍 0 🔁 0 💬 1 📌 0
Research style. We are not necessarily interested in getting X papers accepted at NeurIPS/ICML/ICLR. We are interested in making an impact: this can be papers (and NeurIPS/ICML/ICLR are great venues), but also open-source repositories, benchmarks, blog posts, even social media posts ...
7/n
06.08.2025 15:43 — 👍 1 🔁 0 💬 1 📌 0
For more information about research topics relevant to our group, please check the following documents:
- International AI Safety Report,
- An Approach to Technical AGI Safety and Security by DeepMind,
- Open Philanthropy’s 2025 RFP for Technical AI Safety Research.
6/n
06.08.2025 15:43 — 👍 0 🔁 0 💬 1 📌 0
We're also interested in rigorous AI evaluations and informing the public about the risks and capabilities of frontier AI models. Additionally, we aim to advance our understanding of how AI models generalize, which is crucial for ensuring their steerability and reducing associated risks.
5/n
06.08.2025 15:43 — 👍 0 🔁 0 💬 1 📌 0
Research group. We will focus on developing algorithmic solutions to reduce harms from advanced general-purpose AI models. We're particularly interested in alignment of autonomous LLM agents, which are becoming increasingly capable and pose a variety of emerging risks.
4/n
06.08.2025 15:43 — 👍 2 🔁 0 💬 1 📌 0
Hiring. I'm looking for multiple PhD students: both those able to start in Fall 2025 and through centralized programs like CLS, IMPRS, and ELLIS (the deadlines are in November) to start in Spring–Fall 2026. I'm also searching for postdocs, master's thesis students, and research interns.
2/n
06.08.2025 15:43 — 👍 0 🔁 0 💬 1 📌 0
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨
1/n
06.08.2025 15:43 — 👍 9 🔁 0 💬 2 📌 1
GitHub - tml-epfl/os-harm: OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents - tml-epfl/os-harm
This is joint work with amazing collaborators: Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, and Nicolas Flammarion.
It will be presented as an oral at the WCUA workshop at ICML 2025!
Paper: arxiv.org/abs/2506.14866
Code: github.com/tml-epfl/os-...
19.06.2025 15:28 — 👍 2 🔁 0 💬 0 📌 0
Main findings based on frontier LLMs:
- They directly comply with _many_ deliberate misuse queries
- They are relatively vulnerable even to _static_ prompt injections
- They occasionally perform unsafe actions
19.06.2025 15:28 — 👍 0 🔁 0 💬 1 📌 0
🚨Excited to release OS-Harm! 🚨
The safety of computer use agents has been largely overlooked.
We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
19.06.2025 15:28 — 👍 3 🔁 2 💬 1 📌 0
Paper link: josh-freeman.github.io/resources/ny....
This is joint work with amazing collaborators: Joshua Freeman, Chloe Rippe, and Edoardo Debenedetti.
🧵3/n
09.12.2024 22:00 — 👍 6 🔁 0 💬 0 📌 0
3. However, the memorized articles cited in the NYT lawsuit were clearly cherry-picked—random NYT articles have not been memorized.
4. We also provide a legal analysis of this case in light of our findings.
We will present this work at the Safe Gen AI Workshop at NeurIPS 2024 on Sunday.
🧵2/n
09.12.2024 22:00 — 👍 6 🔁 0 💬 1 📌 0
🚨Excited to share our new work!
1. Not only GPT-4 but also other frontier LLMs have memorized the same set of NYT articles from the lawsuit.
2. Very large models, particularly with >100B parameters, have memorized significantly more.
🧵1/n
09.12.2024 22:00 — 👍 15 🔁 1 💬 1 📌 1
Maksym Andriushchenko
I'm a PhD student in computer science at EPFL advised by Nicolas Flammarion. I'm interested in understanding why machine learning works and why it fails.
📢 I'll be at NeurIPS 🇨🇦 from Tuesday to Sunday!
Let me know if you're also coming and want to meet. Would love to discuss anything related to AI safety/generalization.
Also, I'm on the academic job market, so would be happy to discuss that as well! My application package: andriushchenko.me.
🧵1/4
07.12.2024 19:26 — 👍 5 🔁 0 💬 1 📌 0
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks. First, we demonstrate how to successfully leverage access to logprobs for jailbreaking: we...
Mindblowing: EPFL PhD student @maksym-andr.bsky.social, winner of best CS thesis award, showed that leading hashtag#AI models are not robust to simple adaptive jailbreaking attacks. Indeed, he managed to jailbraik all models with a 100% success rate 🤯
Jailbraking paper: arxiv.org/abs/2404.02151
06.12.2024 07:00 — 👍 12 🔁 1 💬 0 📌 2
Trying to figure out how AI works 🔍🧠
Currently at @ETHZ.ch, prev. EPFL 🇨🇭
LLMs, interpretability, emergence, grokking 🤖
➡️ musat.ai
We are a research institute investigating the trajectory of AI for the benefit of society.
epoch.ai
Working towards the safe development of AI for the benefit of all at Université de Montréal, LawZero and Mila.
A.M. Turing Award Recipient and most-cited AI researcher.
https://lawzero.org/en
https://yoshuabengio.org/profile/
Open source developer building tools to help journalists, archivists, librarians and others analyze, explore and publish their data. https://datasette.io […]
[bridged from https://fedi.simonwillison.net/@simon on the fediverse by https://fed.brid.gy/ ]
Language models and interpretable machine learning. Postdoc @ Uni Tübingen.
https://sbordt.github.io/
Prof. Uni Tübingen, Machine Learning, Robotics, Haptics
PhD Student in Tübingen (MPI-IS & Uni Tü), interested in reinforcement learning. Freedom is a pure idea. https://onnoeberhard.com/
Professor at ISTA (Institute of Science and Technology Austria), heading the Machine Learning and Computer Vision group. We work on Trustworthy ML (robustness, fairness, privacy) and transfer learning (continual, meta, lifelong). 🔗 https://cvml.ist.ac.at
ml safety researcher | visiting phd student @ETHZ | doing phd @ISTA | prev. @phystech | prev. developer @GSOC | love poetry
PhD Student in Machine Learning @unituebingen.bsky.social, @ml4science.bsky.social, @tuebingen-ai.bsky.social, IMPRS-IS; previously intern @vectorinstitute.ai; jzenn.github.io
Medical Doctor & Researcher @eye-tuebingen.bsky.social | Ophthalmology Resident | Disrupting Healthcare with Technology @unituebingen.bsky.social
https://www.dzvm.info
Deep Learning x {Symmetries, Structures, Randomness} 🦄
Researcher at Flatiron Computational Maths in NYC. PhD from EPFL. https://www.bsimsek.com/
Machine Learning Researcher | PhD Candidate @ucsd_cse | @trustworthy_ml
chhaviyadav.org
Associate Professor (Education) at SMU, Singapore. Software engineering, testing, and computing education. https://cposkitt.github.io/
Assistant Prof. @csaudk.bsky.social | Fellow @cphsodas.bsky.social
Previous: @icepfl.bsky.social @americanexpress @Xerox @Intel
Interests: 🥾🏔️🚴♂️🏋️♂️🎸
#NLProc #LLMs #AgenticAI #Causality #GraphML
https://www.cs.au.dk/~clan/people/aarora