If you're fine-tuning LLMs, Gemma 3 is the new 🐐 and it's not close. Gemma 3 trounces Qwen/Llama models at every size!
 - Gemma 3 4B beats 7B/8B competition
 - Gemma 3 27B matches 70B competition
Vision benchmarks soon!
I hear cocaine is good but no way it can beat the rush I get from my RL-trained agent suddenly grokking a new skill.
18.03.2025 01:02

Training models with RL subjectively feels much more like gardening than engineering. You do your best to set the right conditions, provide the right inputs... and then wait and watch what grows. Very rewarding/magical feeling when it works!
17.03.2025 17:15

Big news: we've figured out how to train models 80-90% cheaper than before. Cheaper than renting your own GPUs. Cheaper than any other service. And 0 quality regression.
Super proud of the team on this one. New pricing is now live!
This holiday season I am legitimately grateful that my kids are all <8 and not 16+. I have no idea what career prep advice I'd give someone in this moment. We're in for a ride.
20.12.2024 22:46

Helpful intuition that folks new to LLMs may not know: if you have a lot of data, small models are often just as good as much, much larger ones for tasks like classification and information extraction. Here I compare a 1B vs 8B on a hard classification task, and I bet you can't tell which is which!
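If you want to try this yourself, here's a minimal sketch of the small-model side using Hugging Face transformers. The Llama 3.2 1B base, the CSV file, and the label count are placeholder assumptions, not the exact setup behind the comparison above.

```python
# Rough sketch: fine-tune a small model as a classifier with Hugging Face transformers.
# Model name, data file, and label count are placeholder assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "meta-llama/Llama-3.2-1B"             # assumed small base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token    # Llama ships without a pad token

model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=4)
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical CSV with "text" and "label" columns (label = integer class id)
train = load_dataset("csv", data_files="classification_data.csv")["train"]
train = train.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                  batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="small-classifier",
                           per_device_train_batch_size=8,
                           num_train_epochs=2),
    train_dataset=train,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
).train()
```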
13.12.2024 22:30

Btw you can view your training loss across open source models AND Gemini models on OpenPipe!
09.12.2024 15:40

OpenAI's Reinforcement Fine-Tuning (RFT) is far more data efficient than SFT: it can generalize from 10-20 labeled examples.
Huge deal because as compute costs drop to 0, the pain of gathering high-quality training data is the biggest barrier to deploying AI. RFT needs much less of it!
Meta just released Llama 3.3 70B. They claim benchmarks similar to Llama 3 405B, but in a model 20% the size. It's already available as a base model on OpenPipe, and we'll publish benchmarks of it as a fine-tuning base model soon.
huggingface.co/meta-llama/L...
SUPER PUMPED to announce that Gemini fine-tuning is available to all OpenPipe users! Gemini Flash provides the lowest-cost fine-tuning of any model in its quality class. Comparable to gpt-4o-mini, but with 4x cheaper inference and FREE fine-tuning!
05.12.2024 16:38

SGLang is basically vLLM but better. I just tested v0.4 on a real-world task with a Llama 3.2 3B model. It reached a max throughput of 61K tokens per second, 44% higher than our vLLM baseline!
lmsys.org/blog/2024-12...
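For reference, here's a rough way to sanity-check throughput yourself against SGLang's OpenAI-compatible endpoint. This isn't the benchmark harness behind the numbers above, and the launch command, model name, and concurrency settings are assumptions.

```python
# Rough throughput check against a local SGLang server. Assumes the server was
# started with something like:
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.2-3B-Instruct --port 30000
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

def one_request(prompt: str) -> int:
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens

prompts = [f"Summarize ticket #{i} in two sentences." for i in range(64)]  # toy workload
start = time.time()
with ThreadPoolExecutor(max_workers=32) as pool:
    completion_tokens = sum(pool.map(one_request, prompts))
elapsed = time.time() - start
print(f"{completion_tokens / elapsed:.0f} output tokens/sec")
```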
One of the new features I'm most excited about at OpenPipe is "criteria distillation". This lets you distill an expensive LLM-as-judge criterion into a super fast, cheap, low-latency reward model that approximates the judge's outputs. DM for access!
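The general shape of the idea (not our implementation): label logged responses with the expensive judge once, then fit a small regression model on those scores so it can stand in for the judge at inference time. The judge model, criterion text, and file names below are placeholder assumptions.

```python
# Sketch of criteria distillation, step 1: score logged (prompt, response) pairs with an
# expensive LLM-as-judge and save them as regression training data for a small model.
import json
from openai import OpenAI

judge = OpenAI()  # expensive judge; criterion below is a made-up example
CRITERION = "Rate 0-10 how politely and completely the reply resolves the user's issue."

def judge_score(prompt: str, response: str) -> float:
    out = judge.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": CRITERION + " Reply with only the number."},
                  {"role": "user", "content": f"{prompt}\n---\n{response}"}],
        max_tokens=4,
    )
    return float(out.choices[0].message.content.strip()) / 10  # normalize to [0, 1]

# Toy stand-in for real production logs
logged_pairs = [
    ("How do I reset my password?", "Click 'Forgot password' on the login page."),
    ("Cancel my subscription.", "No."),
]

with open("judge_scores.jsonl", "w") as f:
    for prompt, response in logged_pairs:
        row = {"text": f"{prompt}\n{response}", "label": judge_score(prompt, response)}
        f.write(json.dumps(row) + "\n")

# Step 2: fine-tune a small AutoModelForSequenceClassification with num_labels=1
# (regression) on judge_scores.jsonl and serve that model as the fast reward model.
```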
04.12.2024 18:43

Amazon's Nova models have an excellent price/perf ratio. We'd love to support them, but to deploy fine-tuned versions you need to purchase "provisioned throughput", which costs $100/hr/model. 😬 Putting out the bat signal: if you know someone at AWS Bedrock, please put me in contact!
04.12.2024 16:08

Ok, I am terrible at sharing product updates here, but we now support Llama 3.2 1B and 3B (the best small LLMs) as well as Qwen 2.5 72B and Qwen 2.5 Coder 32B (the best open general and code-specific models) on OpenPipe!
04.12.2024 00:35

Wait, was he actually permabanned by Bluesky, or just attacked by an online mob? If he was actually permabanned, that's very concerning.
28.11.2024 17:59

Kinda feels like the product engineer and IC PM roles are quickly converging. A really good AI-enabled SWE can produce the same output as a former team of 5 SWEs + 1 PM.
Are AI-native companies still hiring IC PMs who don't code?
Not an anon but I can bring the Qwen-stanning bsky.app/profile/did:...
24.11.2024 23:36

What is the current SOTA on language autoencoders? Can you run lossy compression on a 20K-word Wikipedia article to give you an archive that's just a few KB in size, but decompresses into text semantically indistinguishable from the original?
22.11.2024 22:43

In your subjective opinion, do the Tulu models feel "better" than Llama's first-party Instruct variants? I realize the benchmarks show improvement on average but that doesn't capture the whole story.
21.11.2024 21:12

You can just use Qwen 2.5 for any task you'd otherwise use Llama 3.1 for. This is a (poorly formatted) chart I made a month or so back based on our own internal evals after the Llama 3.2 release. Big error bars but it shows the trend.
19.11.2024 22:19

This may become an official Qwen-stan account.
✅ Open source SOTA on code
✅ Open source SOTA in general for 14B+
✅ Almost SOTA <14B
✅ Works great for LM, RM and classification tasks
✅ SOTA open source multimodal
OpenPipe now hosts all our docs in plaintext on our docs page at /llms.txt (index links) and /llms-full.txt (full dump of all docs).
Great idea from @jph.bsky.social!
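For example, something like the snippet below pulls the full doc dump into a prompt-ready string; the docs hostname is an assumption, so point it at wherever the docs page actually lives.

```python
# Sketch: grab the full plaintext doc dump for use as LLM context.
# The docs hostname is an assumption; adjust it to the real docs page.
import urllib.request

full_docs = urllib.request.urlopen("https://docs.openpipe.ai/llms-full.txt").read().decode("utf-8")
print(f"{len(full_docs):,} characters of docs ready to drop into a model's context")
```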
Qwen 2.5 Coder 32B is a 🐐
✅ Benchmarks at or above GPT-4 and Claude 3.5
✅ Subjectively feels fantastic for code (been trying it)
✅ Fine-tunable on your own data on OpenPipe!
Last week Hugging Face released "SmolLM v2," several <2B models designed for edge deployment. Interested in how they perform when fine-tuned? You're in luck! We've compared their performance with other edge models. (Spoiler: Qwen remains the champion 👑)
02.11.2024 10:15