@pranav.bsky.social
Research Scientist at Google DeepMind. Kannadiga. Past: Researchoor, Algorithms team at OpenAI & with Juergen Schmidhuber.
Everything about that Luigi guy is just sad. Manifesto written by an LSTM, no big plan or idea, got caught in the dumbest way imaginable, miserable health condition... just sad overall
11.12.2024 01:56 — 👍 9 🔁 0 💬 1 📌 0
what’s the MFU like
11.12.2024 01:50 — 👍 5 🔁 0 💬 0 📌 0
Personally I’m even more primitive and know basic calculus only. So the significance of this is totally lost on me. But at the same time I don’t want to do a depth first search and take 5 years to grok all this either
09.12.2024 00:16 — 👍 6 🔁 0 💬 0 📌 0
mom pick me up they’re putting air quotes on integrals now
08.12.2024 23:18 — 👍 8 🔁 0 💬 1 📌 0
Does exploration
04.12.2024 08:10 — 👍 1 🔁 0 💬 0 📌 0
Falsifiable prediction = respect
03.12.2024 20:23 — 👍 3 🔁 0 💬 0 📌 0
Similar to how “Threads should not be a library”
03.12.2024 20:20 — 👍 1 🔁 0 💬 0 📌 0
Commits should be first class objects. VCS is not some outer loop feature. It is what we do
03.12.2024 20:20 — 👍 3 🔁 0 💬 1 📌 0
That’s not even the first one. Just the first good one that didn’t use Hidden Markov Models
03.12.2024 13:50 — 👍 1 🔁 0 💬 1 📌 0
Ah that explains your knowledge of dosas finally
29.11.2024 13:13 — 👍 0 🔁 0 💬 0 📌 0
Good water supply
29.11.2024 03:34 — 👍 1 🔁 0 💬 0 📌 0
grateful for unga buga loss go down technology
🦃🇺🇸💸
Importance sampling does not work in deep learning
This suggests that SGD is a frequentist process, not a Bayesian one
proceedings.mlr.press/v97/byrd19a/...
hmm what a coincidence this suddenly popped up on the other site
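For context on the post above: the linked Byrd & Lipton paper studies per-example importance weighting of the training loss. A minimal sketch of that setup; the toy data and inverse-frequency weights here are my own illustration, not the paper's experiments:

```python
# Hedged sketch: per-example importance weighting of the loss, the setup
# studied in the linked Byrd & Lipton paper. Toy data, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy binary classification data with a rare positive class.
x = torch.randn(1000, 10)
y = (x[:, 0] > 1.2).long()

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Importance weights: upweight each example by its class's inverse frequency.
class_freq = torch.bincount(y).float() / len(y)
weights = (1.0 / class_freq)[y]

for _ in range(200):
    opt.zero_grad()
    # reduction="none" gives per-example losses so they can be reweighted.
    per_example = nn.functional.cross_entropy(model(x), y, reduction="none")
    loss = (weights * per_example).mean()
    loss.backward()
    opt.step()

# The paper's reported effect: for overparameterized nets trained to
# convergence, the learned decision boundary is largely insensitive to `weights`.
```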
26.11.2024 20:10 — 👍 4 🔁 0 💬 1 📌 0
Everybody gangsta until deepseek r1 starts thinking in Chinese
26.11.2024 18:47 — 👍 8 🔁 0 💬 0 📌 0
There are papers pipelining along the token dimension.
Agree it’s a little too good to be true, too basic to be new
I read it twice and still don’t understand what the insight is. Might have to read the paper
26.11.2024 06:59 — 👍 0 🔁 0 💬 0 📌 0
Looks like Sutton didn’t get the memo. Memory and compute are cheap. It’s called the “Bitter Lesson”
Replay buffers are a-ok!
I now hit cmd + s every breath due to trauma from this
26.11.2024 02:32 — 👍 1 🔁 0 💬 0 📌 0
Eric Schmidt has written a second book with Henry Kissinger on AI. Incredible
25.11.2024 14:57 — 👍 4 🔁 0 💬 1 📌 0
distributed learning for LLMs?
recently, @primeintellect.bsky.social announced finishing their 10B distributed learning run, trained across the world.
what is it exactly?
🧵
delete this
25.11.2024 14:21 — 👍 21 🔁 0 💬 0 📌 0
There’s also BPE dropout
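For readers who haven't seen it: BPE dropout (Provilkov et al., 2020) randomizes subword segmentation by skipping individual merges with some probability during tokenization. A toy sketch with a made-up merge table and word, using a simplified "apply merges in rank order" loop rather than a full BPE implementation:

```python
# Toy BPE dropout sketch. Merge table and word are made up; real tokenizers
# learn the merges and expose dropout as a segmentation-time parameter.
import random

merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]  # in priority order

def bpe_segment(word, p_drop=0.0):
    """Segment `word`; each merge occurrence is skipped with probability p_drop."""
    symbols = list(word)
    for a, b in merges:                        # apply merges in rank order
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b and random.random() >= p_drop:
                symbols[i:i + 2] = [a + b]     # merge this occurrence
            else:
                i += 1
    return symbols

random.seed(0)
print(bpe_segment("lower"))                                   # ['lower'] with no dropout
print([bpe_segment("lower", p_drop=0.3) for _ in range(3)])   # stochastic subword splits
```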
25.11.2024 04:20 — 👍 2 🔁 0 💬 1 📌 0
btw training a 5e25 FLOP model at 50% MFU would take 10k H100s for 100 days. anything more than that is surplus territory.
in any case pretty impressive operation!
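A quick sanity check of the arithmetic in the post above, assuming an H100 dense BF16 peak of roughly 989 TFLOP/s (my assumption; the post doesn't state one):

```python
# Back-of-the-envelope check of the 5e25 FLOP / 50% MFU / 10k H100 claim.
peak_per_gpu = 989e12          # FLOP/s, assumed H100 SXM dense BF16 peak
mfu = 0.50                     # model FLOPs utilization
n_gpus = 10_000
target = 5e25                  # total training FLOPs

sustained = peak_per_gpu * mfu * n_gpus   # fleet-wide sustained FLOP/s
days = target / sustained / 86_400
print(f"{days:.0f} days")                 # ≈ 117 days, i.e. the post's ~100-day ballpark
```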
Wow never would have thought you’d be an options trader. Honestly respect 🫡
The future is proper uncertain now after a while so selling CCs might just be the move
Well if pre-training is over, NVDA is at risk. o1 inference, data selection etc. can be done on AMD
24.11.2024 07:40 — 👍 0 🔁 0 💬 1 📌 0
Lucky man
24.11.2024 07:01 — 👍 0 🔁 0 💬 1 📌 0
Some recent reports such as this one, and all the typical thought leaders and podcasters jumping on them, are creating a narrative
www.bloomberg.com/news/article...
Agree but I’ve never seen positive transfer happen so I’m a bit pessimistic.
Not sure earliness of fusion has been a bottleneck here; do you have reasons why it could be?
It’d be great if we could figure out a pre-training objective for YouTube that transfers to text