To be clear, I don't mean the resulting loss is lower. I mean that after benchmarking models trained with different optimizers but the same training data, the model I got from schedule_free_adamw was the champ by a big enough margin that I think it's plausible it wasn't random chance.
01.12.2024 04:05
In my limited experience testing it (it was released ~this week), the schedule_free_adamw optimizer that's now in axolotl has outperformed the various AdamW variants for me. The new ADOPT optimizer, on the other hand, hasn't delivered.
01.12.2024 04:05
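For anyone curious what using it looks like outside axolotl, here's a minimal sketch, assuming the `schedulefree` package from facebookresearch; the model, data, and hyperparameters below are placeholders, not the training setup from the post above.

```python
# Minimal sketch of schedule-free AdamW in plain PyTorch, assuming the
# `schedulefree` package (pip install schedulefree). Model, data, and
# hyperparameters are placeholders for illustration only.
import torch
import schedulefree

model = torch.nn.Linear(128, 2)                 # stand-in for a real model
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = schedulefree.AdamWScheduleFree(
    model.parameters(), lr=1e-3, warmup_steps=100
)

optimizer.train()                               # schedule-free optimizers need explicit train mode
for step in range(1000):
    x = torch.randn(32, 128)                    # fake batch
    y = torch.randint(0, 2, (32,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

optimizer.eval()                                # switch modes before eval or checkpointing
```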
Enjoy static.googleusercontent.com/media/resear...
27.11.2024 15:47
Have you read the Google paper about using an optimization algorithm to make the optimal cookie though... because I not only read it, I baked dem cookies and can confirm, algo work real good
27.11.2024 15:25
Yeah, giant pain but I really wanted to know...
27.11.2024 12:46
So while it's a bit task-specific, there's more than enough context provided to the LLMs in the prompt to understand the task + how the output will be evaluated. The rubric is pass/fail on each dimension, with room for the LLM to overweight failures (like leaving out key info) in the final judgement.
27.11.2024 12:45
Call it a summary/instruction-following eval. The judge prompt has a 21-point custom rubric for grading the outputs, and the original prompt for producing the summaries has a similarly lengthy description of the kinds of things I want included vs excluded, style guidelines, what to emphasize, etc.
27.11.2024 12:44
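A rough sketch of the pass/fail-per-dimension idea described above; the rubric items, "critical" flags, and threshold here are invented for illustration, not the actual 21-point rubric.

```python
# Hypothetical illustration of grading a summary against a pass/fail rubric,
# where failures on critical items (e.g. missing key info) outweigh the rest.
# Item names, flags, and the 80% threshold are made up for the example.
from dataclasses import dataclass

@dataclass
class RubricItem:
    name: str
    critical: bool    # a critical miss sinks the summary outright

def judge(verdicts, rubric):
    """verdicts maps rubric item name -> True (pass) / False (fail)."""
    critical_misses = [r.name for r in rubric if r.critical and not verdicts[r.name]]
    if critical_misses:
        return "FAIL (critical: " + ", ".join(critical_misses) + ")"
    passed = sum(verdicts[r.name] for r in rubric)
    return "PASS" if passed >= 0.8 * len(rubric) else "FAIL"

rubric = [
    RubricItem("covers_key_decisions", critical=True),
    RubricItem("follows_style_guide", critical=False),
    RubricItem("excludes_fluff", critical=False),
]
print(judge({"covers_key_decisions": False,
             "follows_style_guide": True,
             "excludes_fluff": True}, rubric))   # -> FAIL (critical: covers_key_decisions)
```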
Nice! All about the custom eval. A lot of work but so so worth it. I recently built my own eval as well (not for code, and primarily to evaluate performance of different fine tuning ablations/ideas).
bsky.app/profile/n0ri...
27.11.2024 12:25
For context, these are o1-preview judgements from a custom LLM-as-a-judge prompt I spent an unreasonable amount of time crafting. I posted this to that other site, but going forward I'll share more here.
27.11.2024 12:22
Sonnet is still King 👑 for summarization:
Sonnet 3.6 vs 4o 11-20 (n=210):
Claude Sonnet 3.6: 54% (113 wins)
GPT-4o (11/20): 44% (92 wins)
Ties: 2% (5)
Sonnet 3.6 vs Gemini Exp 11-21 (n=202):
Claude Sonnet 3.6: 60% (122 wins)
Gemini-exp-1121: 38% (76 wins)
Ties: 2% (4)
27.11.2024 12:22
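The percentages are just wins over total comparisons per pairing; here's a tiny sketch of the tally, with illustrative verdict labels standing in for the real judge outputs.

```python
# Tally pairwise judge verdicts into win rates. Labels "A"/"B"/"tie" and the
# reconstructed verdict list are illustrative; the real verdicts come from the judge prompt.
from collections import Counter

def win_rates(verdicts):
    counts = Counter(verdicts)
    n = len(verdicts)
    return {label: f"{counts[label]} ({counts[label] / n:.0%})"
            for label in ("A", "B", "tie")}

# Sonnet 3.6 vs GPT-4o (11/20): 113 A-wins, 92 B-wins, 5 ties over n=210
verdicts = ["A"] * 113 + ["B"] * 92 + ["tie"] * 5
print(win_rates(verdicts))   # {'A': '113 (54%)', 'B': '92 (44%)', 'tie': '5 (2%)'}
```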
Distillation is the way. Sample efficiency of the larger model during training + inference cost of the smaller distilled model + the option to quantize the smaller model to fp8 for a speed boost on some GPUs + the option for better speculative decoding from an extra-small distilled model = a really good practical option IMO
26.11.2024 13:34
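For reference, a minimal sketch of one common form of that distillation objective: temperature-scaled KL against the teacher's soft targets mixed with cross-entropy on the labels. The temperature, mixing weight, and tensor shapes are placeholders, not a recipe from the post.

```python
# Sketch of a standard knowledge-distillation loss: the student matches the
# teacher's softened distribution (KL at temperature T) plus the usual
# cross-entropy on ground-truth labels. T, alpha, and shapes are placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale gradients for temperature
    ce = F.cross_entropy(student_logits, labels)  # hard-label term
    return alpha * kd + (1 - alpha) * ce

student_logits = torch.randn(8, 32000)            # smaller distilled model
teacher_logits = torch.randn(8, 32000)            # larger teacher model
labels = torch.randint(0, 32000, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```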
@myexplodingpen.bsky.social is there any way to read your article without subscribing to Medium? I haven't been able to find much from Microsoft on the topic, or anyone else discussing this, but I can't read your piece...
24.11.2024 16:38
Instructions unclear, but just in case all I have to do to get into the Stanford NLP PhD program is reply to this thread, I figure I'd better go ahead and reply.
22.11.2024 00:38
· they, them · just here for fun, but maybe I'll post some textiles · DMs are open ·
Llama Farmer
Ex CLO Hugging Face, Xoogler
Machine Learning Librarian at @hf.co
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://amzn.to/4fqvn0D). Blogging about AI research at magazine.sebastianraschka.com.
Researcher trying to shape AI towards positive outcomes. ML & Ethics +birds. Generally trying to do the right thing. TIME 100 | TED speaker | Senate testimony provider | Navigating public life as a recluse.
Former: Google, Microsoft; Current: Hugging Face
Working towards the safe development of AI for the benefit of all at Université de Montréal, LawZero and Mila.
A.M. Turing Award Recipient and most-cited AI researcher.
https://lawzero.org/en
https://yoshuabengio.org/profile/
Research Scientist at DeepMind. Opinions my own. Inventor of GANs. Lead author of http://www.deeplearningbook.org . Founding chairman of www.publichealthactionnetwork.org
Research Scientist at Meta • ex Cohere, Google DeepMind • https://www.ruder.io/
Director of Machine Learning at the Wikimedia Foundation. We host Wikipedia.
Sr. ML Engineer | Keras 3 Collaborator | @GoogleDevExpert in Machine Learning | @TensorFlow addons maintainer | ML is all I do | Views are my own!
Cofounded and lead PyTorch at Meta. Also dabble in robotics at NYU.
AI is delicious when it is accessible and open-source.
http://soumith.ch
✨ Keep it simple, make it scale. AI should be about empowering users and building understanding. 👩‍💻 AI Developer Experience @ Google DeepMind, ex-Github, ex-Google
Multimodal research @huggingface
Breakthrough AI to solve the world's biggest problems.
► Join us: http://allenai.org/careers
► Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm
it's me your annoying friend from the other app
- making b2b saas for work
- training bad deep learning models for fun
- interests: economy, politics, scifi, music that was cool 7 months ago