@pranav.bsky.social
Research Scientist at Google DeepMind. Kannadiga. Past: Researchoor, Algorithms team at OpenAI & with Juergen Schmidhuber.
Everything about that Luigi guy is just sad. Manifesto written by an LSTM, no big plan or idea, got caught in the dumbest way imaginable, miserable health condition... just sad overall
11.12.2024 01:56 — 👍 9 🔁 0 💬 1 📌 0
what’s the MFU like
11.12.2024 01:50 — 👍 5 🔁 0 💬 0 📌 0
Personally I’m even more primitive and know basic calculus only. So the significance of this is totally lost on me. But at the same time I don’t want to do a depth first search and take 5 years to grok all this either
09.12.2024 00:16 — 👍 6 🔁 0 💬 0 📌 0
mom pick me up they’re putting air quotes on integrals now
08.12.2024 23:18 — 👍 8 🔁 0 💬 1 📌 0
Does exploration
04.12.2024 08:10 — 👍 1 🔁 0 💬 0 📌 0
Falsifiable prediction = respect
03.12.2024 20:23 — 👍 3 🔁 0 💬 0 📌 0
Similar to how “Threads should not be a library”
03.12.2024 20:20 — 👍 1 🔁 0 💬 0 📌 0
Commits should be first class objects. VCS is not some outer loop feature. It is what we do
03.12.2024 20:20 — 👍 3 🔁 0 💬 1 📌 0
That’s not even the first one. Just the first good one that didn’t use Hidden Markov Models
03.12.2024 13:50 — 👍 1 🔁 0 💬 1 📌 0
Ah that explains your knowledge of dosas finally
29.11.2024 13:13 — 👍 0 🔁 0 💬 0 📌 0
Good water supply
29.11.2024 03:34 — 👍 1 🔁 0 💬 0 📌 0
grateful for unga buga loss go down technology
🦃🇺🇸💸
Importance sampling does not work in deep learning
This suggests that SGD is a frequentist process, not a Bayesian one
proceedings.mlr.press/v97/byrd19a/...
hmm what a coincidence this suddenly popped up on the other site
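For context on the post above: the linked Byrd & Lipton paper studies per-example importance weighting of the training loss. A minimal sketch of that setup; the toy data and inverse-frequency weights here are my own illustration, not the paper's experiments:

```python
# Hedged sketch: per-example importance weighting of the loss, the setup
# studied in the linked Byrd & Lipton paper. Toy data, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy binary classification data with a rare positive class.
x = torch.randn(1000, 10)
y = (x[:, 0] > 1.2).long()

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Importance weights: upweight each example by its class's inverse frequency.
class_freq = torch.bincount(y).float() / len(y)
weights = (1.0 / class_freq)[y]

for _ in range(200):
    opt.zero_grad()
    # reduction="none" gives per-example losses so they can be reweighted.
    per_example = nn.functional.cross_entropy(model(x), y, reduction="none")
    loss = (weights * per_example).mean()
    loss.backward()
    opt.step()

# The paper's reported effect: for overparameterized nets trained to
# convergence, the learned decision boundary is largely insensitive to `weights`.
```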
26.11.2024 20:10 — 👍 4 🔁 0 💬 1 📌 0
Everybody gangsta until deepseek r1 starts thinking in Chinese
26.11.2024 18:47 — 👍 8 🔁 0 💬 0 📌 0
There are papers pipelining along the token dimension.
Agree it’s a little too good to be true, too basic to be new
I read it twice and still don’t understand what the insight is. Might have to read the paper
26.11.2024 06:59 — 👍 0 🔁 0 💬 0 📌 0
Looks like Sutton didn’t get the memo. Memory and compute are cheap. It’s called the “Bitter Lesson”
Replay buffers are a-ok!
I now hit cmd + s every breath due to trauma from this
26.11.2024 02:32 — 👍 1 🔁 0 💬 0 📌 0
Eric Schmidt has written a second book with Henry Kissinger on AI. Incredible
25.11.2024 14:57 — 👍 4 🔁 0 💬 1 📌 0
distributed learning for LLMs?
recently, @primeintellect.bsky.social announced finishing their 10B distributed learning run, trained across the world.
what is it exactly?
🧵
delete this
25.11.2024 14:21 — 👍 21 🔁 0 💬 0 📌 0
There’s also BPE dropout
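For readers who haven't seen it: BPE dropout (Provilkov et al., 2020) randomizes subword segmentation by skipping individual merges with some probability during tokenization. A toy sketch with a made-up merge table and word, using a simplified "apply merges in rank order" loop rather than a full BPE implementation:

```python
# Toy BPE dropout sketch. Merge table and word are made up; real tokenizers
# learn the merges and expose dropout as a segmentation-time parameter.
import random

merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]  # in priority order

def bpe_segment(word, p_drop=0.0):
    """Segment `word`; each merge occurrence is skipped with probability p_drop."""
    symbols = list(word)
    for a, b in merges:                        # apply merges in rank order
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b and random.random() >= p_drop:
                symbols[i:i + 2] = [a + b]     # merge this occurrence
            else:
                i += 1
    return symbols

random.seed(0)
print(bpe_segment("lower"))                                   # ['lower'] with no dropout
print([bpe_segment("lower", p_drop=0.3) for _ in range(3)])   # stochastic subword splits
```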
25.11.2024 04:20 — 👍 2 🔁 0 💬 1 📌 0
btw training a 5e25 FLOP model at 50% MFU would take 10k H100s for 100 days. anything more than that is surplus territory.
in any case pretty impressive operation!
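A quick sanity check of the arithmetic in the post above, assuming an H100 dense BF16 peak of roughly 989 TFLOP/s (my assumption; the post doesn't state one):

```python
# Back-of-the-envelope check of the 5e25 FLOP / 50% MFU / 10k H100 claim.
peak_per_gpu = 989e12          # FLOP/s, assumed H100 SXM dense BF16 peak
mfu = 0.50                     # model FLOPs utilization
n_gpus = 10_000
target = 5e25                  # total training FLOPs

sustained = peak_per_gpu * mfu * n_gpus   # fleet-wide sustained FLOP/s
days = target / sustained / 86_400
print(f"{days:.0f} days")                 # ≈ 117 days, i.e. the post's ~100-day ballpark
```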
Wow never would have thought you’d be an options trader. Honestly respect 🫡
The future is proper uncertain now after a while so selling CCs might just be the move
Well if pre-training is over, NVDA is at risk. o1 inference, data selection etc. can be done on AMD
24.11.2024 07:40 — 👍 0 🔁 0 💬 1 📌 0
Lucky man
24.11.2024 07:01 — 👍 0 🔁 0 💬 1 📌 0
Some recent reports such as this one, and all the typical thought leaders and podcasters jumping on them, are creating a narrative
www.bloomberg.com/news/article...
Agree but I’ve never seen positive transfer happen so I’m a bit pessimistic.
Not sure earliness of fusion has been a bottleneck here; do you have reasons why it could be?
It’d be great if we could figure out a pre-training objective for YouTube that transfers to text