@tomkempton.bsky.social
Pure mathematician working in Ergodic Theory, Fractal Geometry, and (recently) Large Language Models. Senior Lecturer (= Associate Professor) at the University of Manchester.

Haven't seen this and it sounds interesting; could you post a link to something using it? Thanks!
27.03.2025 18:29

Thanks, I'll take a look!
12.02.2025 16:06

Thanks for the reply! What I meant by 'confidence' here (possibly the wrong word) isn't how concentrated the output probability vector is, but how close we think the output probabilities are to the true next-token distribution (if such a thing existed...).
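(As an aside on the distinction: a minimal numpy sketch with made-up numbers, where entropy measures how concentrated the output is and KL divergence to a hypothetical 'true' distribution measures the kind of closeness meant above.)

```python
import numpy as np

def entropy(p):
    """How concentrated the distribution is (low = peaked)."""
    return -np.sum(p * np.log(p))

def kl(p, q):
    """How far the model's output p is from a reference distribution q."""
    return np.sum(p * np.log(p / q))

# Hypothetical next-token distributions over a 4-word vocabulary.
true_dist = np.array([0.70, 0.20, 0.05, 0.05])  # the (unknowable) 'true' distribution
peaked    = np.array([0.97, 0.01, 0.01, 0.01])  # very concentrated, but far from true_dist
spread    = np.array([0.65, 0.25, 0.05, 0.05])  # less concentrated, but close to true_dist

print(entropy(peaked), kl(peaked, true_dist))  # low entropy, larger KL
print(entropy(spread), kl(spread, true_dist))  # higher entropy, small KL
```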
12.02.2025 11:48

I'm not sure I really believe that there's no information to be gleaned though. Maybe one needs to think more about training dynamics...
12.02.2025 11:44

So one answer to my question, which I'd not thought about until your answer, is that, while softmax is not injective on R^|V|, it is injective when you restrict it to the column space of the output embedding matrix, so there's nothing to think about here.
12.02.2025 11:44

I'd guess X shouldn't be in this column space, otherwise there's a wasted dimension which doesn't make it to the output (although it would be interesting to see whether, if you included it, it contained interesting info).
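(Since the direction softmax ignores is the all-ones vector, injectivity on the column space comes down to whether that vector lies in it. A quick numerical check one could run, with a random matrix standing in for the real output embedding:)

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                      # toy vocabulary size and hidden dimension
W_U = rng.normal(size=(V, d))     # stand-in for the output (un)embedding matrix
ones = np.ones(V)

# Least-squares projection of the all-ones vector onto the column space of W_U.
coeffs, *_ = np.linalg.lstsq(W_U, ones, rcond=None)
residual = ones - W_U @ coeffs

# Residual (numerically) zero: the all-ones vector lies in the column space,
# so softmax is not injective there. Large residual: it does not.
print(np.linalg.norm(residual))
```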
12.02.2025 11:44

Presumably this is well studied, could anyone point me in the direction of references?
12.02.2025 08:29

Let's call a logits vector 'large' if the denominator in the softmax (the sum of exponentiated logits) is large. Might we guess that large logits vectors correspond to confident situations, where the model is satisfied with the possible choices of next token (either many good options, or just one option that looks great)?
12.02.2025 08:29

Since softmax is not injective, many different logits vectors output the same probability distribution. (Precisely, v and w output the same distribution if and only if they differ by a constant multiple of the all-ones vector.) Can we infer anything from the logits vector beyond the probability distribution it outputs?
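(A minimal numerical illustration of both points in this thread: shifting the logits by a multiple of the all-ones vector leaves the softmax output unchanged, but it does change the 'size' of the logits, e.g. the softmax denominator.)

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # subtracting the max is the usual numerically stable implementation
    return e / e.sum()

v = np.array([2.0, 1.0, 0.1, -0.5])
w = v + 3.0 * np.ones_like(v)   # differs from v by a constant multiple of the all-ones vector

print(np.allclose(softmax(v), softmax(w)))  # True: identical probability distributions
print(np.exp(v).sum(), np.exp(w).sum())     # but the raw denominators differ by a factor of e^3
```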
12.02.2025 08:29

Is it just that we initialise the network with small weights and so our prior is that this should persist?
Tips or links would be very welcome!
Theoretically, the later (skipped) layers could permute the hidden dimensions coming out of the earlier layers, or multiply all the activations by -1. So I don't see any reason to expect training a language model to result in a model where naively applying the output embedding to earlier layers is a sensible thing to do.
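(A linear toy of the permutation point: two parameterisations that compute the same end-to-end function, while naive early exit from the hidden layer gives different answers. Purely illustrative, with random matrices in place of trained weights.)

```python
import numpy as np

rng = np.random.default_rng(0)
d, V = 5, 10
x = rng.normal(size=d)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # two 'layers'
W_out = rng.normal(size=(V, d))                             # stand-in for the output embedding

P = np.eye(d)[rng.permutation(d)]   # a permutation of the hidden dimensions

# Original model versus an equivalent one that permutes the hidden space
# after layer 1 and undoes the permutation inside layer 2.
h = W1 @ x
h_perm = (P @ W1) @ x
y = W_out @ (W2 @ h)
y_perm = W_out @ ((W2 @ P.T) @ h_perm)

print(np.allclose(y, y_perm))                   # True: same end-to-end function
print(np.allclose(W_out @ h, W_out @ h_perm))   # False in general: naive early exit disagrees
```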
31.01.2025 08:43

Can anyone point me to a reference saying early exit from a neural network is a reasonable thing to do?
As I understand it, early exit (from, say, a language model) involves taking the output from some early layer and applying the output embedding.
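(For concreteness, a sketch of the naive version of this using GPT-2 through the Hugging Face transformers library; the prompt, the choice of layers, and applying the final layer norm before the output embedding are all illustrative assumptions.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[k] is the residual stream after k transformer blocks
# (hidden_states[0] is the embedding output).
for layer in (3, 6, 9):
    h = out.hidden_states[layer][0, -1]                 # last-token activation at this depth
    logits = model.lm_head(model.transformer.ln_f(h))   # 'early exit': apply the output embedding
    top = logits.topk(3).indices
    print(layer, [tokenizer.decode(t.item()) for t in top])
```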
I'm sure it's been asked a thousand times, but what's everyone's favourite method of making lists of articles they want to read?
27.11.2024 11:34

Today's question from the four year old: if all of the zookeepers in the world suddenly died, would the farmers look after the zoo animals or would that be the job of the vets? Had to admit I didn't know the answer...
23.11.2024 15:38