Our paper on the best way to add error bars to LLM evals is on arXiv! TL;DR: Avoid the Central Limit Theorem -- there are better, simple Bayesian and frequentist methods you should be using instead.
We also provide a super lightweight library: github.com/sambowyer/baye… 🧵👇
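To give a flavour of the difference: here's a minimal Python sketch (not the paper's code or the library's API; the numbers are toy values) contrasting the CLT/normal-approximation interval with two standard alternatives for a binary-correctness eval, a Beta-posterior credible interval and a Wilson score interval.

```python
# Minimal sketch: error bars on eval accuracy from n questions, k correct.
# Illustrative only -- names and numbers here are assumptions, not the paper's API.
import numpy as np
from scipy import stats

n, k = 100, 73          # toy eval: 100 questions, 73 answered correctly
p_hat = k / n

# 1) CLT / normal approximation (the method the paper argues against):
se = np.sqrt(p_hat * (1 - p_hat) / n)
clt = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# 2) Bayesian: Beta posterior under a uniform Beta(1, 1) prior,
#    95% equal-tailed credible interval.
posterior = stats.beta(1 + k, 1 + n - k)
bayes = posterior.ppf([0.025, 0.975])

# 3) Frequentist: Wilson score interval (better coverage than CLT at small n).
z = 1.96
centre = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
half = (z / (1 + z**2 / n)) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
wilson = (centre - half, centre + half)

print(f"CLT:    [{clt[0]:.3f}, {clt[1]:.3f}]")
print(f"Bayes:  [{bayes[0]:.3f}, {bayes[1]:.3f}]")
print(f"Wilson: [{wilson[0]:.3f}, {wilson[1]:.3f}]")
```

The gap between the CLT interval and the other two grows as n shrinks or as accuracy approaches 0 or 1, which is exactly the regime of most LLM evals.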
Go read it on arXiv! Thanks to my co-authors @sambowyer.bsky.social and @laurenceai.bsky.social 🔥
📣 Jobs alert
We're hiring a postdoc and a research engineer to work on UQ for LLMs!! Details ⬇️
#ai #llm #uq
Do you know what rating you'll give after reading the intro? Are your confidence scores 4 or higher? Do you not respond in rebuttal phases? Are you worried about how it will look if your rating is the only 8 among 3's? This thread is for you.
Would love to be added!
But you can't prove that the *real* asteroid won't hit earth, because the real world isn't your simplified model: e.g. you don't know the exact initial conditions, there might be other bodies you aren't aware of, and so on. (3/3)
The analogy we're working from is "mathematically provable asteroid safety": within a simplified mathematical model, with known initial conditions, you can prove that an asteroid won't hit earth. (2/3)
Does anyone want to collaborate on an ICML position paper on "The impossibility of mathematically proving AI safety"? The basic thesis being that it is a category error to try to prove AI safety in the real world. (1/3)
Can you add me?