This work is a collaboration with a team of talented researchers at Area Science Park in Trieste, Italy.
Special thanks to @alexpietroserra.bsky.social, Alessio Ansuini and @albecazzaniga.bsky.social!
If you are at @neuripsconf.bsky.social, don't miss our poster tomorrow, Dec 11, at 11am!
🧵6/6
10.12.2024 21:46
We applied an advanced density-based clustering algorithm, showing its potential as an interpretability tool and in guiding novel strategies for the effective fine-tuning of LLMs.
🧵5/6
10.12.2024 19:54
In fine-tuning, answer-focused modes rapidly emerge midway through the network, just after the intrinsic dimension peak.
Early layers remain largely unchanged.
🧵4/6
10.12.2024 19:52
In few-shot learning, the prompt topic defines the modes of the data distribution early in the network, and the density modes are hierarchically organized according to the similarity of the subjects.
🧵3/6
10.12.2024 19:49
🎯 Key results: few-shot learning and fine-tuning show two distinct processing phases inside LLMs.
These phases are separated by a peak in the intrinsic dimension of the data and a sharp decrease in the separation of the probability modes.
Paper: arxiv.org/abs/2409.03662
🧵2/6
10.12.2024 19:48
Just landed in Vancouver to present the results of our new work at @neuripsconf.bsky.social!
Few-shot learning and fine-tuning change the layers inside LLMs in dramatically different ways, even when they perform equally well on multiple-choice question-answering tasks.
🧵1/6
10.12.2024 19:47
Associate Professor at GroNLP (@gronlp.bsky.social) #NLP | Multilingualism | Interpretability | Language Learning in Humans vs NeuralNets | Mum^2
Head of the InClow research group: https://inclow-lm.github.io/
Language Scientist @ Pompeu Fabra University
https://generalstrikeus.com/
PhD student in computational linguistics at UPF
chengemily1.github.io
Previously: MIT CSAIL, ENS Paris
Barcelona
PhD Student in Colt UPF
https://mahautm.github.io/
https://ellis-jena.eu is developing+applying #AI #ML in #earth system, #climate & #environmental research.
Partner: @uni-jena.de, https://bgc-jena.mpg.de/en, @dlr-spaceagency.bsky.social, @carlzeissstiftung.bsky.social, https://aiforgood.itu.int
Prof (CS @Stanford), Co-Director @StanfordHAI, Cofounder/CEO @theworldlabs, CoFounder @ai4allorg #AI #computervision #robotics #AI-healthcare
Personal Account
Founder: The Distributed AI Research Institute @dairinstitute.bsky.social.
Author: The View from Somewhere, a memoir & manifesto arguing for a technological future that serves our communities (to be published by One Signal / Atria
AI & Physics PhD student at EPFL. Working on understanding AI (generalization, deep generative models, post-training). Former applied scientist intern at Amazon AI.
https://alesfav.github.io/
Researcher in machine learning
Searching for principles of neural representation | Neuro + AI @ enigmaproject.ai | Stanford | sophiasanborn.com
https://unireps.org
Discover why, when and how distinct learning processes yield similar representations, and the degree to which these can be unified.
PhD student in NLP at Sapienza | Prev: Apple MLR, @colt-upf.bsky.social , HF Bigscience, PiSchool, HumanCentricArt #NLProc
www.santilli.xyz
Apple MLR (Barcelona) intern | ELLIS Ph.D. student in representation learning @SapienzaRoma & @ISTAustria | Former NLP Engineer @babelscape
flegyas.github.io
The Thirty-Eighth Annual Conference on Neural Information Processing Systems will be held at the Vancouver Convention Center from Tuesday, Dec 10 through Sunday, Dec 15.
https://neurips.cc/
Molecular simulations and AI at Area Science Park
NLP & Interpretability | PhD Student @ University of Trieste & Laboratory of Data Engineering of Area Science Park | Prev MPI-IS