
Kyle Corbitt

@corbtt.bsky.social

190 Followers  |  64 Following  |  24 Posts  |  Joined: 02.11.2024

Latest posts by corbtt.bsky.social on Bluesky

Post image

If you're fine-tuning LLMs, Gemma 3 is the new 👑 and it's not close. Gemma 3 trounces Qwen/Llama models at every size!
- Gemma 3 4B beats 7B/8B competition
- Gemma 3 27B matches 70B competition

Vision benchmarks soon!

21.03.2025 16:27 | 👍 1  🔁 0  💬 0  📌 0
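For context, here is a minimal LoRA fine-tuning sketch for Gemma 3 4B with the Hugging Face trl/peft stack. The model id, dataset file, and hyperparameters are illustrative assumptions, not the setup behind the benchmark above.

```python
# Minimal LoRA fine-tune of Gemma 3 4B with trl + peft (illustrative defaults only).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical chat-formatted dataset; swap in your own JSONL.
train_ds = load_dataset("json", data_files="train.jsonl", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="google/gemma-3-4b-it",   # assumed Hugging Face model id
    train_dataset=train_ds,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gemma3-4b-lora",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```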
Post image

I hear cocaine is good but no way it can beat the rush I get from my RL-trained agent suddenly grokking a new skill.

18.03.2025 01:02 | 👍 3  🔁 0  💬 0  📌 0

Training models with RL subjectively feels much more like gardening than engineering. You do your best to set the right conditions, provide the right inputs... and then wait and watch what grows. Very rewarding/magical feeling when it works!

17.03.2025 17:15 | 👍 1  🔁 0  💬 0  📌 0
Post image

Big news: we've figured out how to train models 80-90% cheaper than before. Cheaper than renting your own GPUs. Cheaper than any other service. And 0 quality regression.

Super proud of the team on this one. New pricing is now live!

23.01.2025 17:16 | 👍 0  🔁 0  💬 0  📌 0

This holiday season I am legitimately grateful that my kids are all <8 and not 16+. I have no idea what career prep advice I'd give someone in this moment. We're in for a ride.

20.12.2024 22:46 | 👍 2  🔁 0  💬 1  📌 0
Post image

Helpful intuition that folks new to LLMs may not know: if you have a lot of data, small models are often just as good as much, much larger ones for tasks like classification and information extraction. Here I compare a 1B vs 8B on a hard classification task, and I bet you can't tell which is which!

13.12.2024 22:30 | 👍 0  🔁 0  💬 0  📌 0
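A rough sketch of the underlying point: with enough labeled data, a ~1B base model fine-tuned with a classification head is often competitive with a much larger one. The checkpoint, label count, and data files below are assumptions for illustration.

```python
# Fine-tune a small (~1B) model for sequence classification (illustrative sketch).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "meta-llama/Llama-3.2-1B"   # assumed checkpoint; any small base model works

# Hypothetical dataset with "text" and integer "label" columns.
ds = load_dataset("json", data_files={"train": "train.jsonl", "test": "test.jsonl"})

tok = AutoTokenizer.from_pretrained(MODEL)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512), batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=4)
model.config.pad_token_id = tok.pad_token_id

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf-1b", num_train_epochs=2,
                           per_device_train_batch_size=8),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    tokenizer=tok,   # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())   # eval loss; compare against the same run on an 8B
```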
Post image

Btw you can view your training loss across open source models AND Gemini models on OpenPipe!

09.12.2024 15:40 | 👍 0  🔁 0  💬 0  📌 0

OpenAI's Reinforcement Fine-Tuning (RFT) is far more data efficient than SFT: it can generalize from 10-20 labeled examples.

Huge deal because as compute costs drop toward zero, gathering high-quality training data becomes the biggest barrier to deploying AI. RFT needs much less of it!

06.12.2024 21:46 | 👍 2  🔁 0  💬 0  📌 0
Post image

Meta just released Llama 3.3 70B. They claim benchmarks similar to Llama 3 405B, but in a model 20% the size. It's already available as a base model on OpenPipe, and we'll release benchmarks of it as a fine-tuning base model soon.

huggingface.co/meta-llama/L...

06.12.2024 19:02 | 👍 0  🔁 0  💬 0  📌 0
Post image

SUPER PUMPED to announce that Gemini fine-tuning is available to all OpenPipe users! Gemini Flash provides the lowest cost fine-tuning of any model in its quality class. Comparable to gpt-4o-mini, but 4x cheaper inference and FREE fine-tuning!

05.12.2024 16:38 | 👍 0  🔁 0  💬 0  📌 0
Preview
SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs | LMSYS Org: "We're excited to release SGLang v0.4, featuring significant performance improvements and new features..."

SGLang is basically vLLM but better. I just tested v0.4 on a real-world task with a Llama 3.2 3B model. Reached a max throughput of 61K tokens per second, 44% higher than our vLLM baseline!

lmsys.org/blog/2024-12...

05.12.2024 02:35 | 👍 2  🔁 0  💬 0  📌 0
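A rough sketch of how a throughput number like that can be measured against SGLang's OpenAI-compatible endpoint. The launch command follows SGLang's documented pattern; the model id, port, concurrency, and prompts are assumptions, not the benchmark from the post.

```python
# Launch the server first (command pattern from the SGLang docs):
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.2-3B-Instruct --port 30000
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

prompts = [f"Summarize the plot of made-up novel #{i} in three sentences." for i in range(256)]

def run(prompt):
    resp = client.chat.completions.create(
        model="default",   # assumed: SGLang serves the loaded checkpoint under a default name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens

start = time.time()
with ThreadPoolExecutor(max_workers=64) as pool:
    completion_tokens = sum(pool.map(run, prompts))
elapsed = time.time() - start

print(f"{completion_tokens / elapsed:.0f} output tokens/sec across {len(prompts)} requests")
```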

One of the new features I'm most excited about at OpenPipe is "criteria distillation". This allows you to distill an expensive LLM-as-judge criterion into a super fast, cheap, low-latency reward model that approximates the judge's outputs. DM for access!

04.12.2024 18:43 | 👍 0  🔁 0  💬 0  📌 0
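Roughly, the idea looks like the sketch below: collect scores from an LLM judge offline, then train a small encoder with a regression head to imitate them. This is a generic illustration, not OpenPipe's implementation; the base model and data shape are assumptions.

```python
# Distill LLM-as-judge scores into a small regression "reward" model (illustrative).
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical rows collected offline: model output text plus a 0-1 score from the judge.
rows = [
    {"text": "PROMPT: Summarize...\nRESPONSE: The report says...", "label": 0.85},
    {"text": "PROMPT: Summarize...\nRESPONSE: I don't know.",      "label": 0.10},
]
ds = Dataset.from_list(rows)

BASE = "microsoft/deberta-v3-small"   # assumed small encoder; any compact model works
tok = AutoTokenizer.from_pretrained(BASE)
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512), batched=True)

# num_labels=1 with float labels -> regression head approximating the judge's score.
model = AutoModelForSequenceClassification.from_pretrained(
    BASE, num_labels=1, problem_type="regression")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled-judge", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ds,
    tokenizer=tok,
)
trainer.train()
```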
Post image Post image

Amazon's Nova models have an excellent price/perf ratio. We'd love to support them, but to deploy fine-tuned versions you need to purchase "provisioned throughput", which costs $100/hr/model. 😬 Putting out the bat signal: if you know someone at AWS Bedrock, pls put me in contact!

04.12.2024 16:08 | 👍 0  🔁 0  💬 0  📌 0
Post image

Ok I am terrible at sharing product updates here, but we now support Llama 3.2 1B and 3B (the best small LLMs) as well as Qwen 2.5 72B and 32B Coder (the best open general and code-specific models) on OpenPipe!

04.12.2024 00:35 | 👍 0  🔁 0  💬 0  📌 0

Wait was he actually permabanned by Bluesky, or just attacked by an online mob? If he was actually permabanned that's very concerning.

28.11.2024 17:59 | 👍 12  🔁 0  💬 4  📌 0

Kinda feels like the product engineer and IC PM roles are quickly converging. A really good AI-enabled SWE can produce the same output as what used to be a team of 5 SWEs + 1 PM.

Are AI-native companies still hiring IC PMs who don't code?

25.11.2024 20:03 | 👍 2  🔁 0  💬 0  📌 0

Not an anon but I can bring the Qwen-stanning bsky.app/profile/did:...

24.11.2024 23:36 | 👍 1  🔁 0  💬 0  📌 0

What is the current SOTA on language autoencoders? Can you run lossy compression on a 20K-word Wikipedia article to give you an archive that's just a few KB in size, but decompresses into text semantically indistinguishable from the original?

22.11.2024 22:43 | 👍 4  🔁 2  💬 2  📌 0
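Not an answer to the SOTA question, but a naive baseline for the round trip it describes: summarize hard, then ask a model to expand the summary back. The model name and prompt wording are assumptions for illustration.

```python
# Naive lossy text "autoencoder": compress to a dense summary, then expand it back.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"   # assumed; any capable chat model works

def compress(article: str, budget_words: int = 300) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
                   f"Compress this article into at most {budget_words} words, "
                   f"keeping every fact needed to reconstruct it:\n\n{article}"}],
    )
    return resp.choices[0].message.content

def decompress(summary: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
                   "Expand this compressed summary back into a full article, "
                   f"adding nothing beyond what it states:\n\n{summary}"}],
    )
    return resp.choices[0].message.content

# Round trip: a few-KB summary in, a semantically similar (but not identical) article out.
```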

In your subjective opinion, do the Tulu models feel "better" than Llama's first-party Instruct variants? I realize the benchmarks show improvement on average but that doesn't capture the whole story.

21.11.2024 21:12 | 👍 1  🔁 0  💬 1  📌 0
Post image

You can just use Qwen 2.5 for any task you'd otherwise use Llama 3.1 for. This is a (poorly formatted) chart I made a month or so back based on our own internal evals after the Llama 3.2 release. Big error bars but it shows the trend.

19.11.2024 22:19 | 👍 2  🔁 0  💬 0  📌 0
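A minimal sketch of the drop-in swap the chart is about: the same code path, only the checkpoint id changes. Model ids and the prompt are assumptions.

```python
# Swap Llama 3.1 for Qwen 2.5 by changing only the checkpoint id (illustrative).
from transformers import pipeline

PROMPT = [{"role": "user", "content": "Extract the city from: 'Ship to Berlin by Friday.'"}]

for model_id in ("meta-llama/Llama-3.1-8B-Instruct", "Qwen/Qwen2.5-7B-Instruct"):
    chat = pipeline("text-generation", model=model_id, device_map="auto")
    out = chat(PROMPT, max_new_tokens=32)
    print(model_id, "->", out[0]["generated_text"][-1]["content"])
```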

This may become an official Qwen-stan account.
✅ Open source SOTA on code
✅ Open source SOTA in general for 14B+
✅ Almost SOTA <14B
✅ Works great for LM, RM and classification tasks
✅ SOTA open source multimodal

19.11.2024 17:22 | 👍 8  🔁 0  💬 1  📌 1
Post image

OpenPipe now hosts all our docs in plaintext on our docs page at /llms.txt (index links) and /llms-full.txt (full dump of all docs).
Great idea from @jph.bsky.social!

18.11.2024 20:17 | 👍 2  🔁 0  💬 0  📌 0
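A sketch of how those files can be consumed: fetch the plaintext dump and drop it straight into a model's context. The docs hostname below is an assumption.

```python
# Pull OpenPipe's plaintext docs for use as LLM context (hostname assumed).
import requests

BASE = "https://docs.openpipe.ai"                        # assumed docs host
index = requests.get(f"{BASE}/llms.txt").text            # index of links
full_docs = requests.get(f"{BASE}/llms-full.txt").text   # full dump of every page

prompt = ("Answer using only these docs:\n\n" + full_docs +
          "\n\nQuestion: How do I start a fine-tuning job?")
print(f"{len(full_docs):,} characters of docs loaded for context")
```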
Post image

Qwen 2.5 Coder 32B is a 🐐
✅ Benchmarks at or above GPT-4 and Claude 3.5
✅ Subjectively feels fantastic for code (been trying it)
✅ Fine-tunable on your own data on OpenPipe!

13.11.2024 23:16 | 👍 3  🔁 0  💬 0  📌 0
Post image

Last week Huggingface released "SmolLM v2," several <2B models designed for edge deployment. Interested in how they perform when fine-tuned? You're in luck! We've compared their performance with other edge models. (Spoiler: Qwen remains the champion 👑)

02.11.2024 10:15 | 👍 2  🔁 0  💬 0  📌 0
