kb

@keighbee.bsky.social

Machine Learning Engineer @ HuggingFace

24 Followers  |  150 Following  |  10 Posts  |  Joined: 22.11.2024

Latest posts by keighbee.bsky.social on Bluesky

MLX, Llama.cpp, and Candle are performing about equally on an M3 Max now.

πŸ•―οΈπŸ”₯[Candle](github.com/huggingface/...) is now much faster on macOS thanks to a contribution by @EricLBuehler, which brings major speed improvements to the Metal backend.πŸŽπŸ“ˆ
Try it out by running some of our examples with the `--features metal` flag.

#Candle #RustLang #macOS #Metal #HuggingFace
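
For a quick sanity check that the Metal backend is in use, here is a minimal sketch with candle-core (the device ordinal and shapes are arbitrary; this is illustrative, not code from the contribution itself):

```
// Minimal sketch: run a matmul on the Metal device.
// Build with `--features metal` on macOS; candle-core must
// be a dependency.
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Ordinal 0 selects the default Metal GPU.
    let device = Device::new_metal(0)?;
    let a = Tensor::randn(0f32, 1.0, (512, 512), &device)?;
    let b = Tensor::randn(0f32, 1.0, (512, 512), &device)?;
    // The matmul dispatches to the Metal backend kernels.
    let c = a.matmul(&b)?;
    println!("{:?}", c.shape());
    Ok(())
}
```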

21.07.2025 22:22 — 👍 2    🔁 1    💬 0    📌 0
Building Tensors from Scratch in Rust (Part 2): View Operations
A blog post by Kyle Birnbaum on Hugging Face

I just published part 2 of my article series about creating tensors from scratch in Rust. This one is about view operations.
#tensors #machine-learning #ml #ai

Take a look here:
huggingface.co/blog/KeighBe...
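
For a flavor of what a view operation is, here is a small illustrative sketch (not the article's actual code): transposing a strided tensor only swaps shape and stride metadata, and never touches the underlying buffer.

```
// Illustrative: a "view" is just layout metadata over a
// flat buffer.
struct View {
    shape: Vec<usize>,
    strides: Vec<usize>,
}

// Transpose two axes by swapping their shape and stride
// entries; no data is copied.
fn transpose(mut v: View, a: usize, b: usize) -> View {
    v.shape.swap(a, b);
    v.strides.swap(a, b);
    v
}

fn main() {
    // A contiguous 2x3 row-major tensor has strides [3, 1].
    let v = View { shape: vec![2, 3], strides: vec![3, 1] };
    let t = transpose(v, 0, 1);
    // Element (i, j) of the transpose now reads the same
    // buffer at offset i * 1 + j * 3.
    assert_eq!(t.shape, [3, 2]);
    assert_eq!(t.strides, [1, 3]);
}
```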

18.06.2025 23:18 — 👍 2    🔁 1    💬 1    📌 0
Building Tensors From Scratch in Rust: Part 1, Core Structure and Indexing
A blog post by Kyle Birnbaum on Hugging Face

I'm writing an article series about creating tensors from scratch in Rust. #tensors #machine-learning #ml #ai

huggingface.co/blog/KeighBe...
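
The core idea in part 1 is the classic layout: flat storage plus shape and strides, where an N-d index maps to a flat offset as the dot product of index and strides. A minimal sketch under those assumptions (names are illustrative, not the article's API):

```
// Illustrative: flat storage plus shape/stride metadata.
struct Tensor {
    data: Vec<f32>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl Tensor {
    fn zeros(shape: Vec<usize>) -> Self {
        let numel: usize = shape.iter().product();
        // Row-major strides: the last axis varies fastest.
        let mut strides = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
        Tensor { data: vec![0.0; numel], shape, strides }
    }

    // Flat offset = dot(index, strides).
    fn get(&self, idx: &[usize]) -> f32 {
        assert_eq!(idx.len(), self.shape.len());
        let offset: usize =
            idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
        self.data[offset]
    }
}

fn main() {
    let t = Tensor::zeros(vec![2, 3, 4]);
    assert_eq!(t.strides, [12, 4, 1]);
    assert_eq!(t.get(&[1, 2, 3]), 0.0);
}
```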

12.06.2025 23:56 — 👍 5    🔁 3    💬 0    📌 0

The mixture-of-experts model is also an option:

```
cargo run --example qwen --features metal --release -- --prompt "Write a poem about butterflies. <think></think>." --model "3-moe-a3b"
```

30.05.2025 20:00 — 👍 0    🔁 0    💬 0    📌 0
GitHub - huggingface/candle: Minimalist ML framework for Rust

Qwen 3 is now supported in Candle!
Run the 3-4B model locally with:

```
cargo run --example qwen --release -- --model 3-4b --prompt 'The capital of France is '
```

On macOS, add the `--features metal` flag for faster inference:

```
cargo run --example qwen --features metal --release -- --model 3-4b --prompt 'The capital of France is '
```

Clone the repo and test it out. github.com/huggingface/...

30.05.2025 20:00 — 👍 0    🔁 0    💬 1    📌 0
microsoft/rifts · Datasets at Hugging Face

RIFTS Dataset: Solving Critical LLM Conversation Failures

- LLMs are 3x less likely to clarify than humans
- 16x less likely to make follow-up requests
- Early failures predict later breakdowns
- Includes preliminary intervention strategies

huggingface.co/datasets/mic...

21.03.2025 09:57 — 👍 11    🔁 3    💬 1    📌 0
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Google just released Gemma 3, an open, on-device LLM with vision capabilities and support for over 140 languages. Models range from 1B to 27B parameters.

Zero-day support for multiple frameworks including transformers, MLX, llama.cpp, and more! 💼 🚀

Read more here:
huggingface.co/blog/gemma3

12.03.2025 18:46 — 👍 3    🔁 0    💬 1    📌 0
LLMs Can Easily Learn to Reason from Demonstrations
Structure, not content, is what matters! Large reasoning models (LRMs) tackle complex reasoning problems by following long chain-of-thoughts (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training tec...
13.02.2025 14:33 — 👍 4    🔁 1    💬 0    📌 0

Made some significant updates to the @hf.co semantic datasets search app. If you love falling into a wiki black hole, you might like this...

huggingface.co/spaces/libra...

13.02.2025 17:14 — 👍 9    🔁 5    💬 0    📌 0
How DeepSeek Changes the LLM Story
YouTube video by Sasha Rush 🤗

What to know about DeepSeek

youtu.be/0eMzc-WnBfQ?...

In which we attempt to figure out MoE, o1, scaling, tech reporting, modern semiconductors, microeconomics, and international geopolitics.

04.02.2025 15:41 — 👍 95    🔁 13    💬 1    📌 5

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou

tl;dr: increasing the input vocabulary always helps; increasing the output vocabulary helps for bigger models.
arxiv.org/abs/2501.16975

05.02.2025 15:38 — 👍 6    🔁 2    💬 0    📌 0

It's a green light for the Frugal AI Challenge! 🚀
For the next month, we invite all members of the AI community to participate in one of our 3 AI for Climate tasks, with the goal of developing a highly accurate model while consuming as little energy as possible ⚡

06.01.2025 17:36 — 👍 23    🔁 11    💬 2    📌 1
GitHub - huggingface/coreml-examples: Swift Core ML Examples

We've got great examples of PyTorch to Core ML conversions in the Hugging Face coreml-examples repo. Currently there's one tutorial, but more are coming soon. After converting, you can choose which compute units you want the model to run on!

12.12.2024 19:02 — 👍 0    🔁 0    💬 0    📌 0

Christmas came early! 🎅🏻 Today brings the newest HuggingChat 🤗 update, with some really exciting capabilities! First up: automatic context injection!

1) Open a file in a supported app, summon HFChat, and it pre-populates the context window. No more copy-pasting. /cc @hf.co

09.12.2024 19:11 — 👍 11    🔁 2    💬 1    📌 1

Or: my laptop has a 72.4 Wh battery (~208,512 J, assuming only 80% is usable). Running Llama3.2-1B would drain the battery after processing:

- CPU: 674,249 tokens (~518,653 words, ~7 novels)
- GPU: 2,799,550 tokens (~2,153,500 words, ~30 novels)
- ANE: 11,273,184 tokens (~8,671,679 words, ~123 novels)
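
A back-of-envelope sketch of that arithmetic, using the rounded per-20-token energies from the post below (6 J CPU, 1.4 J GPU, 0.3 J ANE); the rounding is why these totals land near, but not exactly on, the figures above:

```
fn main() {
    let budget_j = 208_512.0_f64; // ~80% of a 72.4 Wh battery
    for (name, j_per_20_tokens) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        // tokens = energy budget / energy per token
        let tokens = budget_j / (j_per_20_tokens / 20.0);
        println!("{name}: ~{tokens:.0} tokens");
    }
}
```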

05.12.2024 20:08 — 👍 2    🔁 0    💬 0    📌 0

To put it in perspective: Llama3.2-1B uses ~280 GFLOPs per 20 tokens. My laptop (~2 kg) running the model would be the energy equivalent of:

- CPU (6 J): dropping it from 1 foot (31 cm)
- GPU (1.4 J): dropping it from 3 inches (7 cm)
- ANE (0.3 J): dropping it by just half an inch (1.5 cm)!
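
Those heights follow from equating each energy to gravitational potential energy, h = E / (m * g); a quick sketch:

```
fn main() {
    // E = m * g * h  =>  h = E / (m * g)
    let (mass_kg, g) = (2.0_f64, 9.81);
    for (name, energy_j) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        let h_cm = energy_j / (mass_kg * g) * 100.0;
        println!("{name}: {energy_j} J ≈ a {h_cm:.1} cm drop");
    }
}
```

This reproduces the figures above: ~30.6 cm for the CPU, ~7.1 cm for the GPU, and ~1.5 cm for the ANE.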

05.12.2024 20:08 — 👍 2    🔁 0    💬 1    📌 0
[Chart: Model Hardware vs Energy per GigaFLOP; vertical axis: mJ/GFLOP (log scale); horizontal axis: hardware type]

| Hardware  | Min | Q1   | Median | Q3   | Max  |
|-----------|-----|------|--------|------|------|
| CPU       | 6.9 | 11.7 | 13.4   | 35.6 | 53.1 |
| CPU + GPU | 4.6 | 4.6  | 4.7    | 6.2  | 9.6  |
| CPU + ANE | 0.9 | 1.0  | 1.1    | 1.4  | 1.8  |

Preliminary data shows the Apple Neural Engine uses ~94% less energy than the CPU and ~75% less than the GPU 🤯

On the On-Device team at Hugging Face, we've been profiling energy usage for Core ML models. Here's some data I collected:
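
A rough cross-check using the medians from the chart above lands close to those headline percentages (which were presumably computed from the full distributions rather than the medians):

```
fn main() {
    // Median mJ/GFLOP values from the chart above.
    let (cpu, gpu, ane) = (13.4_f64, 4.7, 1.1);
    println!("ANE vs CPU: ~{:.0}% less energy", (1.0 - ane / cpu) * 100.0);
    println!("ANE vs GPU: ~{:.0}% less energy", (1.0 - ane / gpu) * 100.0);
}
```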

05.12.2024 20:08 — 👍 4    🔁 1    💬 2    📌 0
