kb

@keighbee.bsky.social

Machine Learning Engineer @ HuggingFace

24 Followers  |  150 Following  |  10 Posts  |  Joined: 22.11.2024

Latest posts by keighbee.bsky.social on Bluesky

MLX, Llama.cpp, and Candle are performing about equally on an M3 Max now.

πŸ•―οΈπŸ”₯[Candle](github.com/huggingface/...) is now much faster on macOS thanks to a contribution by @EricLBuehler, which brings major speed improvements to the Metal backend.πŸŽπŸ“ˆ
Try it out by running some of our examples with the `--features metal` flag.

#Candle #RustLang #macOS #Metal #HuggingFace
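
For a quick sanity check that the Metal backend is in use, here is a minimal sketch with candle-core (the device ordinal and shapes are arbitrary; this is illustrative, not code from the contribution itself):

```
// Minimal sketch: run a matmul on the Metal device.
// Build with `--features metal` on macOS; candle-core must
// be a dependency.
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Ordinal 0 selects the default Metal GPU.
    let device = Device::new_metal(0)?;
    let a = Tensor::randn(0f32, 1.0, (512, 512), &device)?;
    let b = Tensor::randn(0f32, 1.0, (512, 512), &device)?;
    // The matmul dispatches to the Metal backend kernels.
    let c = a.matmul(&b)?;
    println!("{:?}", c.shape());
    Ok(())
}
```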

21.07.2025 22:22 — 👍 2    🔁 1    💬 0    📌 0
Building Tensors from Scratch in Rust (Part 2): View Operations
A blog post by Kyle Birnbaum on Hugging Face

I just published part 2 of my article series about creating tensors from scratch in Rust. This one is about view operations.
#tensors #machine-learning #ml #ai

Take a look here:
huggingface.co/blog/KeighBe...
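
For a flavor of what a view operation is, here is a small illustrative sketch (not the article's actual code): transposing a strided tensor only swaps shape and stride metadata, and never touches the underlying buffer.

```
// Illustrative: a "view" is just layout metadata over a
// flat buffer.
struct View {
    shape: Vec<usize>,
    strides: Vec<usize>,
}

// Transpose two axes by swapping their shape and stride
// entries; no data is copied.
fn transpose(mut v: View, a: usize, b: usize) -> View {
    v.shape.swap(a, b);
    v.strides.swap(a, b);
    v
}

fn main() {
    // A contiguous 2x3 row-major tensor has strides [3, 1].
    let v = View { shape: vec![2, 3], strides: vec![3, 1] };
    let t = transpose(v, 0, 1);
    // Element (i, j) of the transpose now reads the same
    // buffer at offset i * 1 + j * 3.
    assert_eq!(t.shape, [3, 2]);
    assert_eq!(t.strides, [1, 3]);
}
```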

18.06.2025 23:18 — 👍 2    🔁 1    💬 1    📌 0
Building Tensors From Scratch in Rust: Part 1, Core Structure and Indexing
A blog post by Kyle Birnbaum on Hugging Face

I'm writing an article series about creating tensors from scratch in Rust. #tensors #machine-learning #ml #ai

huggingface.co/blog/KeighBe...
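
The core idea in part 1 is the classic layout: flat storage plus shape and strides, where an N-d index maps to a flat offset as the dot product of index and strides. A minimal sketch under those assumptions (names are illustrative, not the article's API):

```
// Illustrative: flat storage plus shape/stride metadata.
struct Tensor {
    data: Vec<f32>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl Tensor {
    fn zeros(shape: Vec<usize>) -> Self {
        let numel: usize = shape.iter().product();
        // Row-major strides: the last axis varies fastest.
        let mut strides = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
        Tensor { data: vec![0.0; numel], shape, strides }
    }

    // Flat offset = dot(index, strides).
    fn get(&self, idx: &[usize]) -> f32 {
        assert_eq!(idx.len(), self.shape.len());
        let offset: usize =
            idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
        self.data[offset]
    }
}

fn main() {
    let t = Tensor::zeros(vec![2, 3, 4]);
    assert_eq!(t.strides, [12, 4, 1]);
    assert_eq!(t.get(&[1, 2, 3]), 0.0);
}
```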

12.06.2025 23:56 — 👍 5    🔁 3    💬 0    📌 0

The mixture-of-experts model is also an option:

```
cargo run --example qwen --features metal --release -- --prompt "Write a poem about butterflies. <think></think>." --model "3-moe-a3b"
```

30.05.2025 20:00 — 👍 0    🔁 0    💬 0    📌 0
GitHub - huggingface/candle: Minimalist ML framework for Rust

Qwen 3 is now supported in Candle!
Run the 3-4B model locally with:

```
cargo run --example qwen --release -- --model 3-4b --prompt 'The capital of France is '
```

On macOS, add the `--features metal` flag for faster inference:

```
cargo run --example qwen --features metal --release -- --model 3-4b --prompt 'The capital of France is '
```

Clone the repo and test it out. github.com/huggingface/...

30.05.2025 20:00 — 👍 0    🔁 0    💬 1    📌 0
microsoft/rifts · Datasets at Hugging Face

RIFTS Dataset: Solving Critical LLM Conversation Failures

- LLMs are 3x less likely to clarify than humans
- 16x less likely to make follow-up requests
- Early failures predict later breakdowns
- Includes preliminary intervention strategies

huggingface.co/datasets/mic...

21.03.2025 09:57 — 👍 11    🔁 3    💬 1    📌 0
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Google just released Gemma 3, an open, on-device LLM with vision capabilities and support for over 140 languages. Models range from 1B to 27B parameters.

Zero-day support for multiple frameworks including transformers, MLX, llama.cpp, and more! 💼 🚀

Read more here:
huggingface.co/blog/gemma3

12.03.2025 18:46 — 👍 3    🔁 0    💬 1    📌 0
LLMs Can Easily Learn to Reason from Demonstrations
Structure, not content, is what matters! Large reasoning models (LRMs) tackle complex reasoning problems by following long chain-of-thoughts (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training tec...
13.02.2025 14:33 — 👍 4    🔁 1    💬 0    📌 0

Made some significant updates to the @hf.co semantic datasets search app. If you love falling into a wiki black hole, you might like this...

huggingface.co/spaces/libra...

13.02.2025 17:14 — 👍 9    🔁 5    💬 0    📌 0
How DeepSeek Changes the LLM Story
YouTube video by Sasha Rush 🤗

What to know about DeepSeek

youtu.be/0eMzc-WnBfQ?...

In which we attempt to figure out MoE, o1, scaling, tech reporting, modern semiconductors, microeconomics, and international geopolitics.

04.02.2025 15:41 — 👍 95    🔁 13    💬 1    📌 5

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou

tl;dr: increasing the input vocabulary always helps; increasing the output vocabulary helps for bigger models.
arxiv.org/abs/2501.16975

05.02.2025 15:38 — 👍 6    🔁 2    💬 0    📌 0

It's a green light for the Frugal AI Challenge! 🚀
For the next month, we invite all members of the AI community to participate in one of our 3 AI for Climate tasks, with the goal of developing a highly accurate model while consuming as little energy as possible ⚡

06.01.2025 17:36 — 👍 23    🔁 11    💬 2    📌 1
GitHub - huggingface/coreml-examples: Swift Core ML Examples

We've got great examples of PyTorch to Core ML conversions in the Hugging Face coreml-examples repo. Currently there's one tutorial, but more are coming soon. After converting, you can choose which compute units you want the model to run on!

12.12.2024 19:02 — 👍 0    🔁 0    💬 0    📌 0

Christmas came early! 🎅🏻 Today brings the newest HuggingChat 🤗 update, with some really exciting capabilities! First up: automatic context injection!

1) Open a file in a supported app, summon HFChat, and it pre-populates the context window. No more copy-pasting. /cc @hf.co

09.12.2024 19:11 — 👍 11    🔁 2    💬 1    📌 1

Or: my laptop has a 72.4 Wh battery (~208,512 J, assuming only 80% is usable). Running Llama3.2-1B would drain the battery after processing:

- CPU: 674,249 tokens (~518,653 words, ~7 novels)
- GPU: 2,799,550 tokens (~2,153,500 words, ~30 novels)
- ANE: 11,273,184 tokens (~8,671,679 words, ~123 novels)
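
A back-of-envelope sketch of that arithmetic, using the rounded per-20-token energies from the post below (6 J CPU, 1.4 J GPU, 0.3 J ANE); the rounding is why these totals land near, but not exactly on, the figures above:

```
fn main() {
    let budget_j = 208_512.0_f64; // ~80% of a 72.4 Wh battery
    for (name, j_per_20_tokens) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        // tokens = energy budget / energy per token
        let tokens = budget_j / (j_per_20_tokens / 20.0);
        println!("{name}: ~{tokens:.0} tokens");
    }
}
```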

05.12.2024 20:08 — 👍 2    🔁 0    💬 0    📌 0

To put it in perspective: Llama3.2-1B uses ~280 GFLOPs per 20 tokens. My laptop (~2 kg) running the model would be the energy equivalent of:

- CPU (6 J): dropping it from 1 foot (31 cm)
- GPU (1.4 J): dropping it from 3 inches (7 cm)
- ANE (0.3 J): dropping it by just half an inch (1.5 cm)!
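
Those heights follow from equating each energy to gravitational potential energy, h = E / (m * g); a quick sketch:

```
fn main() {
    // E = m * g * h  =>  h = E / (m * g)
    let (mass_kg, g) = (2.0_f64, 9.81);
    for (name, energy_j) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        let h_cm = energy_j / (mass_kg * g) * 100.0;
        println!("{name}: {energy_j} J ≈ a {h_cm:.1} cm drop");
    }
}
```

This reproduces the figures above: ~30.6 cm for the CPU, ~7.1 cm for the GPU, and ~1.5 cm for the ANE.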

05.12.2024 20:08 — 👍 2    🔁 0    💬 1    📌 0
[Chart: Model Hardware vs Energy per GigaFLOP; vertical axis: mJ/GFLOP (log scale); horizontal axis: hardware type]

| Hardware  | Min | Q1   | Median | Q3   | Max  |
|-----------|-----|------|--------|------|------|
| CPU       | 6.9 | 11.7 | 13.4   | 35.6 | 53.1 |
| CPU + GPU | 4.6 | 4.6  | 4.7    | 6.2  | 9.6  |
| CPU + ANE | 0.9 | 1.0  | 1.1    | 1.4  | 1.8  |

Preliminary data shows the Apple Neural Engine uses ~94% less energy than the CPU and ~75% less than the GPU 🤯

On the On-Device team at Hugging Face, we've been profiling energy usage for Core ML models. Here's some data I collected:
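
A rough cross-check using the medians from the chart above lands close to those headline percentages (which were presumably computed from the full distributions rather than the medians):

```
fn main() {
    // Median mJ/GFLOP values from the chart above.
    let (cpu, gpu, ane) = (13.4_f64, 4.7, 1.1);
    println!("ANE vs CPU: ~{:.0}% less energy", (1.0 - ane / cpu) * 100.0);
    println!("ANE vs GPU: ~{:.0}% less energy", (1.0 - ane / gpu) * 100.0);
}
```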

05.12.2024 20:08 — 👍 4    🔁 1    💬 2    📌 0
