André Panisson

@panisson.bsky.social

Principal Researcher @ CENTAI.eu | Leading the Responsible AI Team. Building Responsible AI through Explainable AI, Fairness, and Transparency. Researching Graph Machine Learning, Data Science, and Complex Systems to understand collective human behavior.

728 Followers  |  469 Following  |  11 Posts  |  Joined: 06.09.2023

Latest posts by panisson.bsky.social on Bluesky

Preview
Tracing the thoughts of a large language model
Anthropic's latest interpretability research: a new microscope to understand Claude's internal mechanisms

Anthropic dropped some insights into how AI brains work with their circuit tracing method. Turns out LLMs are bad at math because they’re eyeballing it (“36+59? Eh, 40ish+60ish=95?”). This brings us one step closer to understanding the inner workings of LLMs.
#LLMs #AI #Interpretability

31.03.2025 07:26 — 👍 5    🔁 1    💬 0    📌 0

*Automatically Interpreting Millions of Features in LLMs*
by @norabelrose.bsky.social et al.

An open-source pipeline for finding interpretable features in LLMs with sparse autoencoders and automated explainability methods from @eleutherai.bsky.social.

arxiv.org/abs/2410.13928
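
Not the paper’s code, but a rough sketch of the generic automated-interpretability loop this kind of pipeline builds on: collect the contexts where an SAE latent activates most strongly, then hand them to an explainer model that summarizes them in natural language. All names below (including the trivial explain_fn stub that stands in for an LLM call) are hypothetical illustrations, not the EleutherAI API.

```python
# Toy sketch of an automated feature-explanation loop (illustrative, not the paper's code).
# For one SAE latent: find the contexts where it fires hardest, then hand those
# contexts to an "explainer" -- here a trivial stub standing in for an LLM call.
from collections import Counter

def top_activating_examples(tokens, activations, k=5, window=3):
    """Return the k highest-activating tokens, each with surrounding context."""
    order = sorted(range(len(tokens)), key=lambda i: activations[i], reverse=True)[:k]
    return [
        {"token": tokens[i],
         "context": tokens[max(0, i - window): i + window + 1],
         "activation": activations[i]}
        for i in order
    ]

def explain_fn(examples):
    """Stub explainer: the real pipeline prompts an LLM with these examples;
    here we just report the most common activating tokens."""
    counts = Counter(ex["token"] for ex in examples)
    return "fires on tokens like: " + ", ".join(t for t, _ in counts.most_common(3))

# Tiny fake example: a latent that activates on sentence-final punctuation.
tokens = ["The", "cat", ".", "It", "sat", ".", "Then", "it", "left", "."]
acts   = [0.0, 0.1, 0.9, 0.0, 0.2, 0.8, 0.0, 0.0, 0.1, 0.95]

print(explain_fn(top_activating_examples(tokens, acts, k=3, window=2)))
# -> fires on tokens like: .
```

In the actual pipeline the generated explanation is then scored automatically (for instance, by checking how well it predicts where the latent fires on held-out text); this sketch stops at the explanation step.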

27.11.2024 14:58 — 👍 27    🔁 6    💬 0    📌 2
Preview
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language un...

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

Presents a framework categorizing MLLM explainability across data, model, and training perspectives to enhance transparency and trustworthiness.

📝 arxiv.org/abs/2412.02104

04.12.2024 05:54 — 👍 11    🔁 1    💬 0    📌 0
Preview
ALT: a bald man with a beard is smiling in front of a group of people

I am extremely honoured to receive the @ERC_Research
#ERCCoG award for #RUNES. For the next five years, I will be working on the mathematical, computational, and experimental (!!) sides to understand how higher-order interactions change how we think and coordinate.

03.12.2024 15:58 — 👍 60    🔁 9    💬 7    📌 0

Latest one out! 👇👇👇👇👇

30.11.2024 10:19 — 👍 18    🔁 4    💬 0    📌 0
Preview
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using spa...

The authors of this preprint, recently published on arXiv, include Neel Nanda from Google DeepMind, head of its mechanistic interpretability team.
arxiv.org/abs/2411.14257

30.11.2024 19:57 — 👍 0    🔁 0    💬 0    📌 0
Preview
Scaling and evaluating sparse autoencoders
Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language m...

Yes, and it's not so new
arxiv.org/abs/2406.04093

30.11.2024 19:48 — 👍 2    🔁 0    💬 0    📌 0
A poster with a light blue background, featuring the paper with title: “A True-to-the-Model Axiomatic Benchmark for Graph-based Explainers”.
Authors: Corrado Monti, Paolo Bajardi, Francesco Bonchi, André Panisson, Alan Perotti 

Background
Explainability in GNNs is crucial for enhancing trust and understanding in machine learning models. Current benchmarks focus on the data, ignoring the model’s actual decision logic, which leads to gaps in understanding. Furthermore, existing explanation methods often lack standardized benchmarks for measuring their reliability and effectiveness.

Motivation
Reliable, standardised benchmarks are needed to ensure explainers reflect the internal logic of graph-based models, aiding in fairness, accountability, and regulatory compliance.

Research Question
If a model M is using a protected feature f, for instance using the gender of a user to classify whether their ads should gain more visibility, is a given explainer E able to detect it?

Core Idea
An explainer should detect if a model relies on specific features for node classification.
Implements a “true-to-the-model” rather than a “true-to-the-data” logic.

Key Components
White-Box Classifiers:  Local, Neighborhood, and Two-Hop Models with hardcoded logic for feature importance.
Axioms: an explainer must assign higher scores to truly important features.
Findings:
Explainer Performance
Deconvolution: Perfect fidelity but limited to GNNs.
GraphLIME: Fails with non-local correlations and high sparsity.
LRP/Integrated Gradients: Struggle with zero-valued features.
GNNExplainer: Sensitive to sparsity and edge masking.

Real-World Insights: Facebook Dataset
Fidelity in detecting protected feature use in classification.
Results for different explainers, highlighting strengths and limitations.
Contributions:
Proposed a rigorous framework for benchmarking explainers
Demonstrated practical biases and flaws in popular explainers
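
The core setup above, sketched as toy code: hardcode a white-box node classifier that decides using one known feature, run an explainer on it, and check the axiom that the truly used feature receives the highest importance score. This is only an illustration of the “Local” case with made-up names and a naive perturbation explainer, not the paper’s benchmark code.

```python
# Illustrative sketch (not the paper's code): a white-box "local" classifier with
# hardcoded logic, used to test whether an explainer recovers the feature the
# model actually uses. Feature index, threshold and explainer are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_features = 200, 5
X = rng.random((n_nodes, n_features))

IMPORTANT = 2  # ground truth: the classifier decides using this feature only

def local_classifier(X):
    """Hardcoded decision logic: the label depends only on feature IMPORTANT."""
    return (X[:, IMPORTANT] > 0.5).astype(int)

def perturbation_explainer(model, X, node):
    """Toy explainer: score each feature by whether flipping it changes the prediction."""
    base = model(X)[node]
    scores = np.zeros(n_features)
    for f in range(n_features):
        Xp = X.copy()
        Xp[node, f] = 1.0 - Xp[node, f]  # perturb a single feature of this node
        scores[f] = float(model(Xp)[node] != base)
    return scores

# Axiomatic check: the truly used feature must get the highest importance score.
scores = perturbation_explainer(local_classifier, X, node=0)
assert scores.argmax() == IMPORTANT, "explainer failed to detect the feature the model uses"
print("feature importance scores:", scores)
```

The Neighborhood and Two-Hop white-box models extend the same idea to features aggregated from a node’s one- and two-hop neighbourhoods.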


Check out our poster at #LoG2024, based on our #TMLR paper:
📍 “A True-to-the-Model Axiomatic Benchmark for Graph-based Explainers”
🗓️ Tuesday 4–6 PM CET
📌 Poster Session 2, GatherTown
Join us to discuss graph ML explainability and benchmarks
#ExplainableAI #GraphML
openreview.net/forum?id=HSQTv3R8Iz

26.11.2024 17:27 — 👍 2    🔁 0    💬 0    📌 0
Article information

Title: Boosting human competences with interpretable and explainable artificial intelligence.

Full citation: Herzog, S. M., & Franklin, M. (2024). Boosting human competences with interpretable and explainable artificial intelligence. Decision, 11(4), 493–510. https://doi.org/10.1037/dec0000250

Abstract: Artificial intelligence (AI) is becoming integral to many areas of life, yet many—if not most—AI systems are opaque black boxes. This lack of transparency is a major source of concern, especially in high-stakes settings (e.g., medicine or criminal justice). The field of explainable AI (XAI) addresses this issue by explaining the decisions of opaque AI systems. However, such post hoc explanations are troubling because they cannot be faithful to what the original model computes—otherwise, there would be no need to use that black box model. A promising alternative is simple, inherently interpretable models (e.g., simple decision trees), which can match the performance of opaque AI systems. Because interpretable models represent—by design—faithful explanations of themselves, they empower informed decisions about whether to trust them. We connect research on XAI and inherently interpretable AI with that on behavioral science and boosts for competences. This perspective suggests that both interpretable AI and XAI could boost people’s competences to critically evaluate AI systems and their ability to make accurate judgments (e.g., medical diagnoses) in the absence of any AI support. Furthermore, we propose how to empirically assess whether and how AI support fosters such competences. Our theoretical analysis suggests that interpretable AI models are particularly promising and—because of XAI’s drawbacks—preferable. Finally, we argue that explaining large language models (LLMs) faces similar challenges as XAI for supervised machine learning and that the gist of our conjectures also holds for LLMs.


🌟🤖📝 **Boosting human competences with interpretable and explainable artificial intelligence**

How can AI *boost* human decision-making instead of replacing it? We talk about this in our new paper.

doi.org/10.1037/dec0...

#AI #XAI #InterpretableAI #IAI #boosting #competences
🧵👇

20.11.2024 12:25 — 👍 73    🔁 23    💬 4    📌 3

NeurIPS Conference is now Live on Bluesky!

-NeurIPS2024 Communication Chairs

22.11.2024 01:33 — 👍 278    🔁 68    💬 11    📌 6
Screenshot of the paper.

Even as an interpretable ML researcher, I wasn't sure what to make of Mechanistic Interpretability, which seemed to come out of nowhere not too long ago.

But then I found the paper "Mechanistic?" by
@nsaphra.bsky.social and @sarah-nlp.bsky.social, which clarified things.

20.11.2024 08:00 — 👍 231    🔁 27    💬 7    📌 2

You might like the work from @aliciacurth.bsky.social. Fantastic contributions to understanding this effect.

19.11.2024 07:29 — 👍 4    🔁 0    💬 1    📌 0

👋 I do research on xAI for Graph ML and am starting to explore Mechanistic Interpretability. I'd love to be added!

17.11.2024 21:07 — 👍 0    🔁 0    💬 0    📌 0

18M + 1.
💙, Mar🐫

17.11.2024 02:58 — 👍 82888    🔁 7562    💬 6359    📌 1967

Since LLMs are essentially artefacts of human knowledge, we can use them as a lens to study human biases and behaviour patterns. Exploring their learned representations could unlock new insights. Got ideas or want to collaborate on this? Let’s connect!

16.11.2024 17:46 — 👍 0    🔁 0    💬 0    📌 0
Preview
Do I Know This Entity? Knowledge Awareness and Hallucinations in...
Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using...

In "Do I Know This Entity?", Sparse autoencoders reveal how LLMs recognize entities they ‘know’—and how this self-knowledge impacts hallucinations. These insights could help steer models to refuse or hallucinate less. Fascinating work on interpretability of LLMs!
openreview.net/forum?id=WCR...
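
A generic sketch of the steering idea mentioned here, not the paper’s exact procedure: once an SAE exposes an “unknown entity”-style feature direction, one can nudge hidden activations along it and observe the effect on refusals. The dimensions, names, and coefficient below are toy placeholders.

```python
# Generic activation-steering sketch (illustrative only): add a multiple of a
# feature direction to a hidden activation. With an "unknown entity" direction,
# interventions of this kind are what the post alludes to when it mentions
# steering models to refuse rather than hallucinate.
import numpy as np

d_model = 8
rng = np.random.default_rng(1)

hidden = rng.standard_normal(d_model)          # a residual-stream activation (toy)
feature_dir = rng.standard_normal(d_model)
feature_dir /= np.linalg.norm(feature_dir)     # unit-norm feature direction

def steer(hidden, direction, alpha):
    """Push the activation alpha units along the feature direction."""
    return hidden + alpha * direction

steered = steer(hidden, feature_dir, alpha=4.0)
# The projection onto the direction shows the feature is now strongly "on".
print(hidden @ feature_dir, steered @ feature_dir)
```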

16.11.2024 17:39 — 👍 5    🔁 0    💬 2    📌 0
Preview
Scaling and evaluating sparse autoencoders
Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since...

In Scaling and Evaluating Sparse Autoencoders, they extract 16M concepts (latents) from GPT-4 (guess the authors?).
They simplify tuning with k-sparse autoencoders, and the results show several improvements in explainability. Code, models (not all!) and a visualizer are included.
openreview.net/forum?id=tcs...
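
For anyone unfamiliar with the k-sparse trick the post mentions, here is a minimal TopK autoencoder sketch (illustrative, not OpenAI’s implementation): the activation keeps only the k largest latents per example and zeroes the rest, so sparsity is set directly by k instead of tuning an L1 penalty coefficient.

```python
# Minimal TopK (k-sparse) autoencoder sketch -- illustrative only.
# Sparsity is controlled directly by k, removing the L1-penalty coefficient
# that plain sparse autoencoders need to tune.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model: int, d_latent: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x):
        pre = self.encoder(x)
        # keep only the k largest pre-activations per example, zero the rest
        topk = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)
        return self.decoder(latents), latents

# Toy usage: reconstruct random "activations" with 32 active latents out of 1024.
sae = TopKSAE(d_model=256, d_latent=1024, k=32)
x = torch.randn(8, 256)
recon, latents = sae(x)
loss = ((recon - x) ** 2).mean()                # plain reconstruction loss, no sparsity term
print(loss.item(), (latents != 0).sum(dim=-1))  # each row has at most k nonzeros
```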

16.11.2024 17:37 — 👍 12    🔁 1    💬 1    📌 1

ICLR is a top AI conference, and while the 2025 papers aren’t officially out yet, the reviews are open. I’m diving into the highest-rated submissions in Interpretability and Explainable AI. Interestingly, the top ones focus on Mechanistic Interpretability, a promising topic that our team is starting to explore.

16.11.2024 17:33 — 👍 4    🔁 0    💬 1    📌 0
Preview
The metaphors of artificial intelligence
A few months after ChatGPT was released, the neural network pioneer Terrence Sejnowski wrote about coming to grips with the shock of what large language models (LLMs) could do: “Something is beginning...

For Science Magazine, I wrote about "The Metaphors of Artificial Intelligence".

The way you conceptualize AI systems affects how you interact with them, do science on them, and create policy and apply laws to them.

Hope you will check it out!

www.science.org/doi/full/10....

14.11.2024 22:55 — 👍 440    🔁 151    💬 23    📌 26

Bluesky feels like traveling back to the golden age of Twitter: when the follow button meant something, and your feed wasn’t a dystopian lineup of blue-tagged bots. It’s refreshing to be somewhere I don’t need an AI to explain why I’m seeing a post. Let’s hope we don’t ruin it this time!

16.11.2024 11:22 — 👍 10    🔁 1    💬 0    📌 0
