
Desi R Ivanova

@desirivanova.bsky.social

Research fellow @OxfordStats @OxCSML, spent time at FAIR and MSR Former quant πŸ“ˆ (@GoldmanSachs), former former gymnast πŸ€Έβ€β™€οΈ My opinions are my own πŸ‡§πŸ‡¬-πŸ‡¬πŸ‡§ sh/ssh

4,408 Followers  |  222 Following  |  64 Posts  |  Joined: 17.11.2024

Posts by Desi R Ivanova (@desirivanova.bsky.social)

🏒 Vacancy: Novo Nordisk Postdoctoral Research Fellow (4 posts)
πŸ“ Department of Statistics, University of Oxford
πŸ“ƒ Contract: Full time, fixed-term for 3 years
πŸ’· Salary Range: Β£36,024 – Β£44,263
⏲️ Deadline: 12pm UK, 30 Apr 2025

Full details & how to apply πŸ‘‰ shorturl.at/3l47e

07.04.2025 09:56 β€” πŸ‘ 1    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0
LASR Labs
Join a 13-week research programme and write a technical AI safety paper in a small team with supervision from an experienced researcher. Work full time from the LISA offices in London alongside AI Saf...

Research internships in AI safety www.lasrlabs.org

07.04.2025 12:25 β€” πŸ‘ 2    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0
Lecture 6: Training practicalities
Deep learning's black magic

Last DL lecture open.substack.com/pub/probappr...

17.03.2025 14:33 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Lecture 5: Backprop and Autodiff
Order matters

Lecture 5: Backpropagation and Autodifferentiation

Thank god the days of computing gradients by hand are over! Nevertheless, it’s good to know what backprop is and why we do it.

open.substack.com/pub/probappr...

12.03.2025 12:11 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Lecture 4: Neural network architectures
Attention!

The fourth post in the series: open.substack.com/pub/probappr...

09.03.2025 18:46 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Go read it on arXiv! Thanks to my co-authors @sambowyer.bsky.social and @laurenceai.bsky.social πŸ’₯

06.03.2025 15:00 β€” πŸ‘ 4    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0

Along with the lightweight library, we provide short code snippets in the paper.

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

…and for constructing error bars on more complicated metrics, such as F1 score, that require the flexibility of Bayes.

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
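To illustrate what the "flexibility of Bayes" buys for a metric like F1, here is a minimal sketch: a Dirichlet posterior over the four confusion-matrix cells, with F1 computed per posterior draw. The function name, the Dirichlet(1,1,1,1) prior, and the Monte Carlo approach are my own illustrative assumptions, not the paper's exact model.

```python
import random

def f1_posterior_interval(tp, fp, fn, tn, conf=0.95, draws=50_000, seed=0):
    """Monte Carlo equal-tailed credible interval for F1.

    Places a Dirichlet(1, 1, 1, 1) prior over the (TP, FP, FN, TN) cell
    probabilities; the posterior given the observed counts is again
    Dirichlet, sampled here via normalised Gamma variates.
    """
    rng = random.Random(seed)
    f1s = []
    for _ in range(draws):
        # Dirichlet draw: independent Gamma(count + 1, 1) variates, normalised.
        g = [rng.gammavariate(c + 1, 1.0) for c in (tp, fp, fn, tn)]
        s = sum(g)
        p_tp, p_fp, p_fn, _ = (x / s for x in g)
        f1s.append(2 * p_tp / (2 * p_tp + p_fp + p_fn))
    f1s.sort()
    lo = f1s[int(draws * (1 - conf) / 2)]
    hi = f1s[int(draws * (1 + conf) / 2) - 1]
    return lo, hi

# Hypothetical confusion counts: 40 TP, 5 FP, 5 FN, 50 TN (point F1 = 80/90).
print(f1_posterior_interval(40, 5, 5, 50))
```

The same posterior-sampling recipe works for any metric that is a function of the confusion cells, which is exactly where plug-in CLT intervals become awkward.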

...and treated without an independence assumption (e.g. using the same eval questions on both LLMs)...

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
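A paired analysis can be sketched in a few lines: difference the per-question scores first, then build the interval on the differences, so the correlation from sharing questions is handled automatically. This is a generic paired-interval sketch with names of my choosing, not the paper's exact estimator.

```python
import math
import statistics

def paired_diff_interval(scores_a, scores_b, z=1.96):
    """Interval for the mean accuracy difference between two LLMs
    evaluated on the SAME questions.

    Working with per-question score differences accounts for the
    dependence induced by reusing the questions across both models.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    centre = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return centre - z * se, centre + z * se

# Model A vs model B, correct (1) / incorrect (0), on the same five questions.
print(paired_diff_interval([1, 1, 0, 1, 1], [1, 0, 0, 1, 0]))
```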

...for making comparisons between two LLMs treated independently...

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

We also suggest simple methods for the clustered-question setting (where we don't assume all questions are IID -- instead we have T groups of N/T IID questions)...

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
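One simple way to respect the clustering is to treat the T per-cluster mean scores, rather than the N individual questions, as the IID units. This is a generic cluster-robust sketch under that assumption, not necessarily the paper's exact method.

```python
import math
import statistics

def clustered_interval(cluster_scores, z=1.96):
    """CLT-style interval on per-cluster mean accuracies.

    With T clusters of mutually dependent questions, the T cluster
    means are treated as the IID units, so within-cluster correlation
    no longer shrinks the error bars artificially.
    """
    means = [sum(cluster) / len(cluster) for cluster in cluster_scores]
    centre = statistics.mean(means)
    se = statistics.stdev(means) / math.sqrt(len(means))
    return centre - z * se, centre + z * se

# Three clusters of correct (1) / incorrect (0) scores.
print(clustered_interval([[1, 1, 0], [0, 1, 1], [1, 0, 0]]))
```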

Or, in this IID-question setting, if you want to stay frequentist, you can use Wilson score intervals: en.wikipedia.org/wiki/Binomial_…

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
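The Wilson score interval is easy to compute by hand; this is the textbook formula in a short sketch (the helper name is mine):

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Inverts the score test instead of plugging the point estimate
    into the standard error, so it behaves sensibly near 0 and 1.
    """
    p = correct / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half

# 20/20 correct: non-degenerate and contained in [0, 1],
# unlike the CLT (Wald) interval in the same situation.
print(wilson_interval(20, 20))
```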

We suggest using Bayesian credible intervals for your error bars instead, with a simple Beta-Binomial model. (The aim is for the methods to achieve nominal 1-alpha coverage, i.e. match the dotted line in the top row: a 95% confidence interval should be right 95% of the time.)

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
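As a rough, stdlib-only illustration of the Beta-Binomial idea (this is my own sketch, not the paper's library: the function name and the Monte Carlo quantile approximation are assumptions; in practice one would use the Beta quantile function directly, e.g. `scipy.stats.beta.ppf`):

```python
import random

def bayes_interval(correct, n, conf=0.95, a=1.0, b=1.0, draws=100_000, seed=0):
    """Equal-tailed credible interval for accuracy.

    A Beta(a, b) prior with a Binomial likelihood gives a
    Beta(a + correct, b + n - correct) posterior; its quantiles are
    approximated here by sorting Monte Carlo draws.
    """
    rng = random.Random(seed)
    post = sorted(rng.betavariate(a + correct, b + n - correct)
                  for _ in range(draws))
    lo = post[int(draws * (1 - conf) / 2)]
    hi = post[int(draws * (1 + conf) / 2) - 1]
    return lo, hi

# 20/20 correct: a sensible, non-degenerate interval inside [0, 1].
print(bayes_interval(20, 20))
```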

This, along with the CLT's ignorance of the typically binary nature of eval data (correct/incorrect responses to an eval question), leads to poor error bars that collapse to zero width or extend past [0,1].

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
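Both failure modes show up in a few lines. This is just the standard CLT (Wald) interval written out as a sketch (the helper name is my own):

```python
import math

def clt_interval(correct, n, z=1.96):
    """Standard CLT (Wald) interval for a binomial proportion:
    p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p = correct / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# All 20 answers correct: the interval collapses to zero width.
print(clt_interval(20, 20))  # (1.0, 1.0)

# 19/20 correct: the upper bound extends past 1.
print(clt_interval(19, 20))
```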

As LLMs get better, benchmarks to evaluate their capabilities are getting smaller (and harder). This starts to violate the CLT's large N assumption. Meanwhile, we have lots of eval settings in which questions aren't IID (e.g. questions in a benchmark often aren't independent).

06.03.2025 15:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Our paper on the best way to add error bars to LLM evals is on arXiv! TL;DR: Avoid the Central Limit Theorem -- there are better, simple Bayesian and frequentist methods you should be using instead.

We also provide a super lightweight library: github.com/sambowyer/baye… πŸ§΅πŸ‘‡

06.03.2025 15:00 β€” πŸ‘ 25    πŸ” 8    πŸ’¬ 1    πŸ“Œ 0
Lecture 3: Introduction to Deep Learning
aka neural networks aka differentiable programming

The third in the teaching blogs series: Introduction to deep learning

open.substack.com/pub/probappr...

03.03.2025 17:03 β€” πŸ‘ 9    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

The NHS boss was sacked (well, “resigned”), so there’s some hope for major reforms and improvements in the health system (I hope 🤞)

26.02.2025 11:36 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Lecture 2: Marginal likelihood, non-Gaussian likelihoods and... scalability issues
Things might not be nice and easy even when everything is Gaussian

open.substack.com/pub/probappr...

25.02.2025 16:23 β€” πŸ‘ 9    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
The Allen Institute for AI

Come work with me!
We are looking to bring on more top talent to our language modeling workstream at @ai2.bsky.social building the open ecosystem. We are hiring:
* Research scientists
* Senior research engineers
* Post docs (Young investigators)
* Pre docs

job-boards.greenhouse.io/thealleninst...

25.02.2025 01:07 β€” πŸ‘ 55    πŸ” 15    πŸ’¬ 4    πŸ“Œ 0

Nice. Are the materials publicly available?

21.02.2025 15:54 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

We currently do 2 lectures on GPs πŸ˜… one could certainly do a whole course (bayesopt, automl) - could be fun!

21.02.2025 15:44 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Indeed, the course is already really quite tight, so if DPs are to be covered, something has to be dropped. For next year I’m thinking of potentially dropping constrained optimisation/SVMs (done in the first half) and covering BNP more thoroughly

21.02.2025 15:42 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

It’s a mix - first part was ERM, SVMs and kernels; second part (which is the one I’m teaching) - Bayesian ML (GPs), deep learning and VI

21.02.2025 14:06 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Lecture 1: Gaussian Processes and GP Regression
Nice and easy when everything is Gaussian

Teaching is super undervalued by universities (at least in the UK), so there’s very little incentive to do it well. I think this is wrong: thoughtful pedagogy matters deeply. I hope this “teaching blogs” series will help me get up to speed and improve more quickly

open.substack.com/pub/probappr...

21.02.2025 13:10 β€” πŸ‘ 18    πŸ” 1    πŸ’¬ 3    πŸ“Œ 1

I’m teaching a grad course for the first time (a bit terrifying 😅) and I’ve decided to write a short blog post after each lecture that highlights a key takeaway and reflects on what can be improved.

First one ⬇️

21.02.2025 13:10 β€” πŸ‘ 89    πŸ” 9    πŸ’¬ 3    πŸ“Œ 0

πŸ“£ Jobs alert

We’re hiring a postdoc and a research engineer to work on UQ for LLMs!! Details ⬇️

#ai #llm #uq

12.02.2025 16:26 β€” πŸ‘ 13    πŸ” 11    πŸ’¬ 0    πŸ“Œ 0

πŸ“£ Jobs alert

We’re hiring postdoc and research engineer to work on UQ for LLMs!! Details ⬇️

#ai #llm #uq

12.02.2025 16:26 β€” πŸ‘ 13    πŸ” 11    πŸ’¬ 0    πŸ“Œ 0

A tiny "embers of autoregression" artifact in simple arithmetic

probapproxincorrect.substack.com/p/embers-of-au…

08.02.2025 21:05 β€” πŸ‘ 4    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0

🌟 Research Collaboration Morning 🌟

Our recent Research Collaboration Morning featured talks from Hrushi Loya (SGBE) & @desirivanova.bsky.social (CSML) over tea & coffee, followed by amazing treats from Waste2Taste & Blakes Food. Thanks to Prof Simon Myers and Bev Lane for organising!

06.02.2025 09:47 β€” πŸ‘ 1    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0