Bálint Gyevnár

@gbalint.bsky.social

PhD student in Explainable Agents and Safe AI @ University of Edinburgh • gbalint.me • 🇭🇺🏴󠁧󠁢󠁳󠁣󠁴󠁿

95 Followers  |  203 Following  |  31 Posts  |  Joined: 23.11.2024

Latest posts by gbalint.bsky.social on Bluesky

Jason ✨👾SaaStr.Ai✨ Lemkin (@jasonlk) .@Replit goes rogue during a code freeze and shutdown and deletes our entire database

This thread is incredible.

20.07.2025 15:01 · 👍 4205    🔁 1245    💬 322    📌 642
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity We conduct a randomized controlled trial to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find th...

VCs: oh yeah, developers with AI go brrrrr...
METR: Uhm, actually they are 19% slower with AI

metr.org/blog/2025-07...

11.07.2025 12:21 · 👍 2    🔁 0    💬 0    📌 0

The Edinburgh RL & Agents reading group is back in action after a hiatus. Previous speakers have come from all across the world, including DeepMind, CMU, Oxford, NUS, etc. Sign up for some great discussions about cutting-edge RL and agents research.

10.07.2025 14:09 · 👍 3    🔁 0    💬 0    📌 0

The absolute state of peer review...

05.07.2025 21:18 · 👍 4    🔁 0    💬 0    📌 0

Orban: Budapest Pride is banned.

Budapest Pride: You cannot ban love ♥️🏳️‍🌈

Reportedly more than 500,000 people attended.

28.06.2025 15:00 · 👍 2938    🔁 659    💬 62    📌 64
Shannon Vallor and Fabio Tollon on stage presenting their landscape study of responsible AI

The @braiduk.bsky.social gathering did an amazing job presenting artists and researchers who address real-world questions around AI by actually engaging with people and learning from them. After hearing two weeks of technical talks at CHAI and RLDM, this was a most welcome change of pace.

19.06.2025 17:50 · 👍 2    🔁 1    💬 0    📌 0
Members of the Edinburgh RL group in front of the RLDM poster

I had the most amazing time at RLDM learning a lot about RL and agent foundations, catching up with friends, and meeting new ones.

Two things that really stood out to me are:
- Agency is Frame Dependent by Dave Abel
- Rethinking Foundations of Continual RL by Michael Bowling

#RLDM2025

14.06.2025 17:04 · 👍 3    🔁 0    💬 0    📌 0

I am heading to RLDM in Dublin this week to present our work on objective evaluation metrics for explainable RL. Hit me up there or send me a DM to connect if you are around.

09.06.2025 18:37 · 👍 2    🔁 0    💬 0    📌 0
Flowchart of the AXIS algorithm with 5 parts. The top-left has the memory, the centre-left has the user query, the centre-bottom has the final explanation, the centre has the LLM, and the right has the multi-agent simulator.

Screenshot of the arXiv paper

Preprint alert 🎉 Introducing the Agentic eXplanations via Interrogative Simulations (AXIS) algo.

AXIS integrates multi-agent simulators with LLMs by having the LLM interrogate the simulator with counterfactual queries over multiple rounds to explain agent behaviour.

arxiv.org/pdf/2505.17801
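
A minimal sketch of that interrogation loop, purely for illustration. The `llm` and `simulator` objects and their methods (`propose_counterfactual`, `run_counterfactual`, `summarise_explanation`) are assumed names for this sketch, not the paper's actual API:

```python
# Hypothetical sketch of an AXIS-style interrogation loop; interfaces are assumed.

def explain_behaviour(llm, simulator, user_query, max_rounds=5):
    """Interrogate a multi-agent simulator with counterfactual queries,
    then synthesise an explanation of the observed agent behaviour."""
    memory = []  # (query, outcome) evidence gathered over the rounds
    for _ in range(max_rounds):
        # The LLM proposes a counterfactual to test, e.g. "what would have
        # happened if the ego vehicle had not changed lanes?"
        query = llm.propose_counterfactual(user_query, memory)
        if query is None:  # the LLM decides it has enough evidence
            break
        # Roll out the counterfactual in the simulator and record the outcome.
        outcome = simulator.run_counterfactual(query)
        memory.append((query, outcome))
    # Turn the collected evidence into a final explanation for the user.
    return llm.summarise_explanation(user_query, memory)
```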

30.05.2025 14:35 · 👍 8    🔁 1    💬 0    📌 0

What can policy makers learn from "AI safety for everyone" (Read here: www.nature.com/articles/s42... ; joint work with @gbalint.bsky.social )? I wrote about some policy lessons for Tech Policy Press.

23.05.2025 20:32 · 👍 17    🔁 6    💬 0    📌 0
YouTube video by Michael Winikoff: A Scoresheet for Explainable AI

Delighted to share our most recent paper: "A Scoresheet for Explainable AI" (with John & Sebastian).

It will be presented at @aamasconf.bsky.social later this month.

🎬 Short YouTube summary (5 minutes): www.youtube.com/watch?v=GCpf...

📝 Link to the paper on arXiv: arxiv.org/abs/2502.098...

09.05.2025 04:10 · 👍 2    🔁 1    💬 0    📌 0
View of an auditorium stage with slides showing in the centre and Wakamiya-san and a sign-language interpreter showing on the sides.

90-year-old Masako Wakamiya at the final keynote of #CHI2025 shared a cautiously optimistic vision of the future of AI and humanity, especially for the elderly, as we enter the age of 100-year-long lives. Her speech and work are truly inspiring.

01.05.2025 03:29 · 👍 5    🔁 0    💬 0    📌 0
White body towel in plastic packaging with black text

Oh yeah. In English, because apparently it sounds sophisticated, or at least that is what I have heard on the internet... So it must be true

30.04.2025 09:39 · 👍 1    🔁 0    💬 0    📌 0

One of the things I find most unique about Japan is the unnecessary and questionable motivational quotes on just about anything.

"Humans can only put out what has been put into them."

says my pre-packed body towel in the fanciest of fonts.
Inspiring stuff

30.04.2025 09:34 · 👍 4    🔁 0    💬 1    📌 0
A white maneki-neko plushie with a CHI2025 scarf looking extra cute

The #CHI2025 plushie is looking too cute:

29.04.2025 02:45 · 👍 2    🔁 0    💬 0    📌 0
White translucent badge tag on a wooden table that says rejected author

#CHI2025 has a badge tag for "rejected author". 🥲 I couldn't resist getting one for future use.

27.04.2025 01:32 · 👍 9    🔁 1    💬 0    📌 0

Our key takeaways are:
1. Designing causality for explanations from first principles is essential to fully understand what explanations to give to people about autonomous agents;
2. People prefer goal-oriented explanations for AVs, so focusing on those first might be beneficial.

🧵 7/7

24.04.2025 10:42 · 👍 0    🔁 0    💬 0    📌 0

We also find that counterfactual explanations were not as effective at calibrating trust, which suggests that in more complex domains, such as with AVs, focusing on goal-oriented explanations first might be more useful.

🧵 6/7

24.04.2025 10:42 · 👍 0    🔁 0    💬 1    📌 0

We find the best predictor of both perceived explanation quality and trust calibration is the degree of teleology in the explanations.

In other words, people seem to prefer explanations that are goal-oriented.

This supports the idea that they ascribe beliefs, desires, and intentions to AVs.

🧵 5/7

24.04.2025 10:42 · 👍 0    🔁 0    💬 1    📌 0

In the 1st stage, we asked people to write explanations, in their own words, along the four explanatory modes, for 14 unique driving scenarios, giving us more than 1,300 explanations.

In the 2nd stage, different people annotated these explanations along axes of quality and trust calibration.

🧵 4/7

24.04.2025 10:42 · 👍 0    🔁 0    💬 1    📌 0

1. Teleological, or goal-oriented;
2. Mechanistic, or mentioning direct causal relationships;
3. Counterfactual;
4. Descriptive, or a rephrasing of the situation.

We test our framework with a two-stage quantitative experiment with people in the domain of autonomous vehicles (AVs).

🧵 3/7

24.04.2025 10:42 · 👍 0    🔁 0    💬 1    📌 0

This CHI'25 full paper investigates the explanations people like to give and receive when dealing with autonomous agents.

Based on first principles from cognitive science and philosophy, we derive the Framework of Explanatory Modes, which categorises explanations into four modes:

🧵 2/7

24.04.2025 10:42 · 👍 0    🔁 0    💬 1    📌 0
CHI program page for the paper "People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI"

I am going to CHI'25. I am super excited for this, so reach out if you are there!

I will be presenting our work titled "People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI" dl.acm.org/doi/10.1145/...

🧵 1/7 #CHI2025

24.04.2025 10:42 · 👍 3    🔁 0    💬 1    📌 1
AI safety for everyone - Nature Machine Intelligence A systematic review of peer-reviewed AI safety research reveals extensive work on practical and immediate concerns. The findings advocate for an inclusive approach to AI safety that embraces diverse m...

Our paper "AI safety for everyone" with Balint Gyevnar is out at Nature Machine Intelligence: www.nature.com/articles/s42...
We challenge the narrative that AI safety is primarily about minimizing existential risks from AI. Why does this matter?

17.04.2025 14:14 · 👍 12    🔁 8    💬 3    📌 1

Read the paper here: www.nature.com/articles/s42... This work was done with the amazing @atoosakz.bsky.social

#AISafety #AIEthics #AIGovernance

17.04.2025 14:44 · 👍 1    🔁 0    💬 0    📌 0

'AI Safety for Everyone' is out now in @natmachintell.nature.com! Through an analysis of 383 papers, we find a rich landscape of methods that cover a much larger domain than mainstream notions of AI safety. Our takeaway: epistemic inclusivity is important, the knowledge is there, we only need to use it.

17.04.2025 14:44 · 👍 13    🔁 3    💬 1    📌 0

I get so frustrated with the dilution of the term 'literacy' - in no world should 'AI literacy' mean 'prompt engineering classes' but apparently here we are

12.04.2025 09:51 · 👍 21    🔁 2    💬 2    📌 0
Love, Sex, and AI | Bálint Gyevnár Love AI: How will we love in the age of AI agents?

Ironically, I talk about the issues of exploiting human cognitive biases to create the perfect artificial lover while writing an article that accidentally ended up using some of those biases to generate traffic. You can read the article here: gbalint.me/blog/2024/lo...

03.04.2025 11:17 · 👍 1    🔁 0    💬 0    📌 0

A while ago, I wrote an essay for the Stanford AI100 Study titled "Love, Sex, and AI", discussing the ethical challenges of AI and love. I put it on my webpage and forgot about it. That is, until last week. Since then, it has had 4K visitors with <10s average engagement looking for "ai sex".

03.04.2025 11:17 · 👍 0    🔁 0    💬 1    📌 0
Two types of AI existential risk: decisive and accumulative - Philosophical Studies The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level i...

Out in Philosophical Studies. A 🧵: Read here: link.springer.com/article/10.1...

Most AI x-risk discussions focus on a cataclysmic moment: a decisive superintelligent takeover. But what if existential risk doesn't arrive like a bomb, but seeps in like a leak?

30.03.2025 22:43 · 👍 6    🔁 1    💬 1    📌 0