This thread is incredible.
20.07.2025 15:01
@gbalint.bsky.social
PhD student in Explainable Agents and Safe AI @ University of Edinburgh • gbalint.me • 🇪🇺🏴󠁧󠁢󠁳󠁣󠁴󠁿
VCs: oh yeah, developers with AI go brrrrr...
METR: Uhm, actually they are 19% slower with AI
metr.org/blog/2025-07...
The Edinburgh RL & Agents reading group is back in action after a hiatus. Previous speakers have come from all across the world, including DeepMind, CMU, Oxford, NUS, and more. Sign up for some great discussions about cutting-edge RL and agents research.
10.07.2025 14:09
The absolute state of peer review...
05.07.2025 21:18
Orban: Budapest Pride is banned.
Budapest Pride: You cannot ban love ❤️🏳️‍🌈
Reportedly more than 500,000 people were there.
Shannon Vallor and Fabio Tollon on stage presenting their landscape study of responsible AI
The @braiduk.bsky.social gathering did an amazing job of presenting artists and researchers who address real-world questions around AI by actually engaging with people and learning from them. After two weeks of technical talks at CHAI and RLDM, this was a most welcome change of pace.
19.06.2025 17:50
Members of the Edinburgh RL group in front of the RLDM poster
I had the most amazing time at RLDM, learning a lot about RL and agent foundations, catching up with old friends, and meeting new ones.
Two things that really stood out to me are:
- Agency is Frame-Dependent by Dave Abel
- Rethinking Foundations of Continual RL by Michael Bowling
#RLDM2025
I am heading to RLDM in Dublin this week to present our work on objective evaluation metrics for explainable RL. Hit me up there or send me a DM to connect if you are around.
09.06.2025 18:37
Flowchart of the AXIS algorithm with 5 parts. The top-left has the memory, the centre-left has the user query, the centre-bottom has the final explanation, the centre has the LLM, and the right has the multi-agent simulator.
Screenshot of the arXiv paper
Preprint alert! Introducing the Agentic eXplanations via Interrogative Simulations (AXIS) algorithm.
AXIS integrates multi-agent simulators with LLMs by having the LLM interrogate the simulator with counterfactual queries over multiple rounds to explain agent behaviour.
arxiv.org/pdf/2505.17801
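To make that loop concrete, here is a minimal sketch of such an interrogation cycle in Python. It is not the paper's implementation: the interfaces (llm.complete, simulator.run) and the prompts are illustrative assumptions.

# Illustrative sketch only -- not the actual AXIS implementation.
# Assumed interfaces: `llm.complete(prompt) -> str` and
# `simulator.run(scenario, intervention) -> outcome`.

def explain_behaviour(llm, simulator, scenario, user_query, max_rounds=5):
    """Let the LLM interrogate a multi-agent simulator with counterfactual
    queries over several rounds, then compose a final explanation."""
    memory = []  # (counterfactual question, simulated outcome) pairs
    for _ in range(max_rounds):
        # The LLM proposes the next counterfactual to test, given the
        # user's question and what it has learned so far.
        question = llm.complete(
            f"User asked: {user_query}\n"
            f"Findings so far: {memory}\n"
            "Propose ONE counterfactual intervention to test, "
            "or reply DONE if you can already answer."
        )
        if question.strip() == "DONE":
            break
        # The simulator replays the scenario under that intervention.
        outcome = simulator.run(scenario, intervention=question)
        memory.append((question, outcome))
    # Compose the final explanation from the accumulated evidence.
    return llm.complete(
        f"User asked: {user_query}\n"
        f"Counterfactual evidence: {memory}\n"
        "Write a concise explanation of the agent's behaviour."
    )

The pieces roughly mirror the flowchart above: the memory, the user query, the LLM in the centre driving the queries, the multi-agent simulator answering them, and the final explanation at the end.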
What can policy makers learn from "AI safety for everyone" (read here: www.nature.com/articles/s42... ; joint work with @gbalint.bsky.social)? I wrote about some policy lessons for Tech Policy Press.
23.05.2025 20:32
Delighted to share our most recent paper: "A Scoresheet for Explainable AI" (with John & Sebastian).
It will be presented at @aamasconf.bsky.social later this month.
Short YouTube summary (5 minutes): www.youtube.com/watch?v=GCpf...
Link to the paper on arXiv: arxiv.org/abs/2502.098...
View of an auditorium stage with slides showing in the centre and Wakamiya-san and a sign-language interpreter showing on the sides.
90-year-old Masako Wakamiya at the final keynote of #CHI2025 shared a cautiously optimistic vision of the future of AI and humanity, especially for the elderly, as we enter the age of 100-year-long lives. Her speech and work are truly inspiring.
01.05.2025 03:29
White body towel in plastic packaging with black text
Oh yeah. In English, because apparently it sounds sophisticated, or at least that is what I have heard on the internet... So it must be true.
30.04.2025 09:39
One of the things I find most unique about Japan is the unnecessary and questionable motivational quotes on just about anything.
"Humans can only put out what has been put into them."
says my pre-packed body towel in the fanciest of fonts.
Inspiring stuff
A white maneki-neko plushie with a CHI2025 scarf looking extra cute
The #CHI2025 plushie is looking too cute:
29.04.2025 02:45
White translucent badge tag on a wooden table that says "rejected author"
#CHI2025 has a badge tag for "rejected author". 🥲 I couldn't resist getting one for future use.
27.04.2025 01:32
Our key takeaways are:
1. Designing causality for explanations from first principles is essential to fully understand what explanations to give to people about autonomous agents;
2. People prefer goal-oriented explanations for AVs, so focusing on those first might be beneficial.
🧵 7/7
We also find that counterfactual explanations were not as effective at calibrating trust, which suggests that in more complex domains, such as AVs, it might be more useful to focus on goal-oriented explanations first.
🧵 6/7
We find the best predictor of both perceived explanation quality and trust calibration is the degree of teleology in the explanations.
In other words, people seem to prefer explanations that are goal-oriented.
This supports the idea that they ascribe beliefs, desires, and intentions to AVs.
🧵 5/7
In the 1st stage, we asked people to write explanations, in their own words, along the four explanatory modes, for 14 unique driving scenarios, giving us more than 1,300 explanations.
In the 2nd stage, different people annotated these explanations along axes of quality and trust calibration.
🧵 4/7
1. Teleological, or goal-oriented;
2. Mechanistic, or mentioning direct causal relationships;
3. Counterfactual;
4. Descriptive, or a rephrasing of the situation (a toy encoding of these modes is sketched after this post).
We test our framework with a two-stage quantitative experiment with people in the domain of autonomous vehicles (AVs).
🧵 3/7
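For illustration only (not code or data from the paper), the four modes and the two-stage annotation could be encoded roughly as below; every name, field, and value here is an assumption, not the study's actual coding scheme.

from dataclasses import dataclass
from enum import Enum

class ExplanatoryMode(Enum):
    # The four modes of the Framework of Explanatory Modes.
    TELEOLOGICAL = "goal-oriented"
    MECHANISTIC = "direct causal relationship"
    COUNTERFACTUAL = "what would have happened otherwise"
    DESCRIPTIVE = "rephrasing of the situation"

@dataclass
class AnnotatedExplanation:
    # One stage-1 free-text explanation plus hypothetical stage-2 ratings.
    scenario_id: int          # one of the 14 driving scenarios
    text: str                 # explanation written by a participant
    mode: ExplanatoryMode     # mode it was written under
    perceived_quality: float  # stage-2 quality rating
    trust_calibration: float  # stage-2 trust-calibration rating

example = AnnotatedExplanation(
    scenario_id=3,
    text="The car slowed down because it wanted to let the pedestrian cross.",
    mode=ExplanatoryMode.TELEOLOGICAL,
    perceived_quality=4.5,   # hypothetical values
    trust_calibration=4.0,
)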
This CHI'25 full paper investigates the explanations people like to give and receive when dealing with autonomous agents.
Based on first principles from cognitive science and philosophy, we derive the Framework of Explanatory Modes, which categorises explanations into four modes:
🧵 2/7
CHI program page for the paper "People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI"
I am going to CHI'25. I am super excited for this, so reach out if you are there!
I will be presenting our work titled "People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI" dl.acm.org/doi/10.1145/...
🧵 1/7 #CHI2025
Our paper "AI safety for everyone" with Balint Gyevnar is out at Nature Machine Intelligence: www.nature.com/articles/s42...
We challenge the narrative that AI safety is primarily about minimizing existential risks from AI. Why does this matter?
Read the paper here: www.nature.com/articles/s42... This work was done with the amazing @atoosakz.bsky.social
#AISafety #AIEthics #AIGovernance
'AI Safety for Everyone' is out now in @natmachintell.nature.com! Through an analysis of 383 papers, we find a rich landscape of methods that cover a much larger domain than mainstream notions of AI safety. Our takeaway: epistemic inclusivity is important; the knowledge is there, we only need to use it.
17.04.2025 14:44
I get so frustrated with the dilution of the term "literacy" - in no world should "AI literacy" mean "prompt engineering classes", but apparently here we are.
12.04.2025 09:51
Ironically, I talk about the issues of exploiting human cognitive biases to create the perfect artificial lover while writing an article that accidentally ended up using some of those biases to generate traffic. You can read the article here: gbalint.me/blog/2024/lo...
03.04.2025 11:17
A while ago, I wrote an essay for the Stanford AI100 Study titled "Love, Sex, and AI", discussing the ethical challenges of AI and love. I put it on my webpage and forgot about it. That is, until last week. Since then, it has had 4K visitors, with <10s average engagement, looking for "ai sex".
03.04.2025 11:17
Out in Philosophical Studies. A 🧵: Read here: link.springer.com/article/10.1...
Most AI x-risk discussions focus on a cataclysmic moment: a decisive superintelligent takeover. But what if existential risk doesn't arrive like a bomb, but seeps in like a leak?