lmao instant follow
You can totally see how this mf bankrupted a casino.
Moonshot + Muon
A new 16B model
The Muon optimizer is 2x more data efficient than AdamW, but only for matrix parameters
note: this is a big deal
huggingface.co/moonshotai
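For intuition on why the gain applies only to matrix parameters: Muon's core step orthogonalizes the momentum-averaged gradient with a Newton-Schulz iteration, which is only defined for 2D weights. A rough NumPy sketch, using the quintic coefficients from Keller Jordan's public Muon write-up (the `muon_update` wrapper is a simplified illustration, not the exact Moonshot implementation):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately push G's singular values toward 1 (semi-orthogonalize).

    Quintic Newton-Schulz iteration; coefficients from Keller Jordan's
    Muon write-up. Only meaningful for 2D (matrix) parameters.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)        # scale so singular values <= 1
    transposed = G.shape[0] > G.shape[1]
    if transposed:                           # keep X @ X.T on the small side
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X  # X <- aX + bAX + cA^2 X
    return X.T if transposed else X

def muon_update(W, grad, momentum_buf, lr=0.02, beta=0.95):
    """Simplified Muon step for one weight matrix (no weight decay)."""
    momentum_buf = beta * momentum_buf + grad
    W = W - lr * newton_schulz_orthogonalize(momentum_buf)
    return W, momentum_buf
```

After ~5 iterations the singular values land roughly in [0.7, 1.2] rather than exactly at 1, which is close enough in practice; non-matrix parameters (embeddings, norms, biases) still need a standard optimizer like AdamW.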
super valuable stuff
👀🙏
In case it interests anyone, I managed to set up a demo of GRPO RL training in Colab. It’s an adaptation of Will Brown’s instant classic for math reasoning. I replaced Llama 1B with Qwen 0.5B and run inference with vLLM. Full training takes about 2 hours.
colab.research.google.com/drive/1bfhs1...
yeah, academia moved over here but the engineers and researchers at frontier labs are still on twitter, sadly
I don’t understand this eval. why compare their deep research model with gemini thinking, when gemini deep research exists
at this point, gpt3 and claude sonnet/haiku could easily be open sourced
little disappointed seeing the reactions of researchers from frontier labs to DeepSeek. science is not a zero-sum game. we should really applaud the open weights, reproducibility, MIT license and detailed report, which we hardly ever see these days. some gracefulness, biases aside, would’ve been nice
The inference speed is amazing!
Just saw ScaleAI's front page ad on "America must win the AI war".
I'm afraid in the AI war only Palantir wins.
this is how I prepare for interviews, especially around research. Super helpful
internal search is very interesting, i hope the implementation is easy to read through
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:
Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢
🧵⬇️
The most realistic reason to be pro open source AI is to reduce concentration of power.
Most elaborate game of Chinese whispers
Good thread explaining the general public’s reaction to data scraping
I believe o1 will be replicated soon. First by Meta, and then as a truly open source release with datasets and a training recipe by the @ai2.bsky.social team
Outside tech, I see a lot of AI fear and hatred. Usually the argument is about AI taking jobs and creative work. I don't remember seeing this kind of general consensus of fear and hatred toward a new technology before
Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B — As always, we released our data, code, recipes and more 🎁
So cool!! Someone needs to create a feed for papers and models!
i keep forgetting to include this cause i always assume people do this by default. Any time there is an exponent or a norm, you should be working in the highest practical precision
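To make that concrete, a minimal made-up example: softmax in half precision. `exp(12) ≈ 162754` overflows float16 (max ~65504), so the naive version degenerates, while upcasting to float32 before the exp and reduction gives the right answer:

```python
import numpy as np

# Illustrative only: softmax over float16 logits.
logits = np.array([10.0, 11.0, 12.0], dtype=np.float16)

# exp(12) overflows float16 to inf, so the last entry becomes inf/inf = nan
naive = np.exp(logits) / np.exp(logits).sum()

# Upcast to float32 for the exponent and the sum, then the result is fine
careful = np.exp(logits.astype(np.float32))
careful = careful / careful.sum()            # ~[0.090, 0.245, 0.665]

# Same idea for norms: square-and-sum in the highest practical precision,
# then cast the final scalar back down if needed.
```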
Posting a call for help: does anyone know of a good way to simultaneously treat both POTS and Ménière’s disease? Please contact me if you’re either a clinician with experience doing this or a patient who has found a good solution. Context in thread
There is something similar. Check if your discord channel is covered by swyx's ai news newsletter. If not, you can pay for customization.
📢 Ultimate test of #NLP bluesky:
I need emergency reviewers for NAACL submissions on encoders (one multilingual, one for sentence embeddings). Help a desperate editor abandoned by the ACs! Author response starts tomorrow, so that's a true emergency.
If you're my hero, lmk your openreview profile.