Ben Burtenshaw

@benburtenshaw.bsky.social

Building tools for AI datasets. 😽 Looking in AI datasets. 🙀 Sharing clean open AI datasets. 😻 at https://bsky.app/profile/hf.co

4,283 Followers  |  206 Following  |  191 Posts  |  Joined: 13.11.2024

Latest posts by benburtenshaw.bsky.social on Bluesky

Building Tensors From Scratch in Rust: Part 1, Core Structure and Indexing A Blog post by Kyle Birnbaum on Hugging Face

I'm writing an article series about creating tensors from scratch in Rust. #tensors #machine-learning #ml #ai

huggingface.co/blog/KeighBe...
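To make "core structure and indexing" concrete, here's a minimal sketch in Python (not the article's Rust code, and the class and method names are hypothetical): a tensor as a flat buffer plus a shape and strides, where a multi-dimensional index maps to a single offset.

```python
from typing import List, Optional, Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Strides (in elements) for a contiguous row-major tensor: the last axis moves fastest."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


class Tensor:
    """Hypothetical minimal strided tensor: a flat buffer viewed through shape + strides."""

    def __init__(self, data: List[float], shape: Sequence[int],
                 strides: Optional[Sequence[int]] = None, offset: int = 0):
        self.data = data
        self.shape = list(shape)
        self.strides = list(strides) if strides is not None else row_major_strides(shape)
        self.offset = offset

    def flat_index(self, index: Tuple[int, ...]) -> int:
        """Map a multi-dimensional index to a position in the flat buffer, with bounds checks."""
        if len(index) != len(self.shape):
            raise IndexError(f"expected {len(self.shape)} indices, got {len(index)}")
        for axis, (i, dim) in enumerate(zip(index, self.shape)):
            if not 0 <= i < dim:
                raise IndexError(f"index {i} out of bounds for axis {axis} with size {dim}")
        return self.offset + sum(i * s for i, s in zip(index, self.strides))

    def __getitem__(self, index: Tuple[int, ...]):
        return self.data[self.flat_index(index)]

    def transpose(self) -> "Tensor":
        """Reverse the axes without copying: only shape and strides change."""
        return Tensor(self.data, self.shape[::-1], self.strides[::-1], self.offset)
```

For `Tensor(list(range(24)), (2, 3, 4))` the strides are `[12, 4, 1]`, so `t[1, 2, 3]` reads offset 1·12 + 2·4 + 3 = 23, and `t.transpose()[3, 2, 1]` reaches the same element without moving any data.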

12.06.2025 23:56 | 👍 5    🔁 3    💬 0    📌 0
Feel - a Hugging Face Space by feel-fl Discover amazing ML apps made by the community

AI doesn't get your culture? ❌ Butchers your language? 😤
With FeeL – you can fix that 🛠️🌍

💬 Talk to AI in your language
✏️ Correct its mistakes
👁‍🗨 Watch it improve
The more we use it, the smarter it gets for everyone!

👉 Try it now: huggingface.co/spaces/feel-...

👶🤖📈
#ai #genAI #llm

26.03.2025 11:23 | 👍 7    🔁 2    💬 1    📌 3
Start-Up Approaches to Responsible AI: Worker-Centric Innovation. Rotterdam School of Management, Erasmus University. Explore how start-ups are reshaping AI development through transparency, worker inclusivity, and ethical approaches that prioritise human augmentation over replacement.

How should AI tools be designed to support rather than replace workers?

At the Reshaping Work conference, I led a roundtable exploring AI's impact on labor. With Franco Bastida, we published a blog post on our key takeaways on responsible AI and the future of work.
🔗 www.rsm.nl/discovery/20...
🧵👇

12.02.2025 15:12 | 👍 4    🔁 3    💬 1    📌 0

I've put together some of the handier tools for building courses and educational material on the @huggingface hub.

These should bootstrap your projects with quizzes, friendly-sized models, useful datasets, and informative Spaces.

Let me know if you use them or need more.

https://buff.ly/42qyanw

28.01.2025 07:32 | 👍 8    🔁 3    💬 0    📌 0
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1 Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.

The science team at Hugging Face is reproducing DeepSeek-R1 fully in the open: https://buff.ly/4jtbp8x

27.01.2025 10:00 | 👍 33    🔁 6    💬 1    📌 3

A manic few days in open-source AI, with game-changing developments all over the place. Here's a thread rounding up the resources:

27.01.2025 10:00 | 👍 8    🔁 1    💬 1    📌 0
Dataset Quiz - a Hugging Face Space by burtenshaw A quiz app for rows of a dataset

quiz app https://buff.ly/4atPzxo
dataset with questions https://buff.ly/3ClY9Sm
agents course we're working on https://buff.ly/4gehzqi

24.01.2025 11:08 | 👍 2    🔁 0    💬 0    📌 0

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the Space and set the dataset repo
- log in and do the quiz
- submit your answers to create a new dataset
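The grading half of that workflow fits in a few lines. A minimal local sketch, assuming a hypothetical row schema with `question`, `choices`, and `answer` columns (the Space's real columns may differ):

```python
from typing import Dict, List, Tuple

# Hypothetical schema: one multiple-choice question per dataset row.
QUESTIONS: List[Dict] = [
    {"question": "What is 2 + 2?", "choices": ["3", "4", "5"], "answer": "4"},
    {"question": "Which planet is closest to the Sun?",
     "choices": ["Venus", "Mercury", "Mars"], "answer": "Mercury"},
]


def grade_quiz(rows: List[Dict], responses: List[str], user: str) -> Tuple[float, List[Dict]]:
    """Score one user's picks and build rows for a new 'answers' dataset."""
    if len(rows) != len(responses):
        raise ValueError("need exactly one response per question")
    if not rows:
        raise ValueError("the quiz has no questions")
    answer_rows = []
    for row, picked in zip(rows, responses):
        if picked not in row["choices"]:
            raise ValueError(f"{picked!r} is not one of the choices for {row['question']!r}")
        answer_rows.append({
            "user": user,
            "question": row["question"],
            "picked": picked,
            "correct": picked == row["answer"],
        })
    # Fraction of questions answered correctly.
    score = sum(r["correct"] for r in answer_rows) / len(answer_rows)
    return score, answer_rows
```

`grade_quiz(QUESTIONS, ["4", "Venus"], user="alice")` scores 0.5. To turn the answer rows into a Hub dataset, one option is the `datasets` library, e.g. `Dataset.from_list(answer_rows).push_to_hub(repo_id)` with your own `repo_id`.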

I made this to get ready for the agents course, but I hope it's useful for your projects too!

24.01.2025 11:08 | 👍 0    🔁 0    💬 1    📌 0

Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TL;DR: it's a quiz that uses a dataset to make questions and save answers.

24.01.2025 11:08 | 👍 2    🔁 1    💬 2    📌 0
Mastering Long Contexts in LLMs with KVPress A Blog post by NVIDIA on Hugging Face

If you need long context for RAG, tool use, agents, or just because, Nvidia released a new library to make it super simple.

TL;DR: you can get 128k context with 50% less memory 🐳
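Where does a saving like that come from? KVPress compresses the KV cache, which grows linearly with context length. A back-of-the-envelope sketch, assuming a Llama-3.1-8B-style config (32 layers, 8 KV heads, head dimension 128, 16-bit values) and reading "50% less memory" as keeping half of the cached key/value pairs:

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2, batch_size: int = 1) -> int:
    """Keys + values (the factor 2), stored per layer, per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value * batch_size


# Assumed config, not taken from the KVPress post.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128 * 1024)

kept_fraction = 0.5  # hypothetical name: share of KV pairs kept after compression
compressed = full * kept_fraction

print(f"KV cache at 128k tokens: {full / GIB:.1f} GiB -> {compressed / GIB:.1f} GiB")
```

Under these assumptions the cache drops from 16 GiB to 8 GiB per sequence; model weights are extra and unaffected.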

Here's a blog post on everything:

23.01.2025 10:00 | 👍 7    🔁 0    💬 0    📌 0

What happened yesterday in the Chinese AI community? 🚀
huggingface.co/posts/AdinaY...

21.01.2025 11:51 | 👍 8    🔁 3    💬 0    📌 0

DeepSeek just dropped a frontier reasoning model on the Hub: 685 billion parameters of bleeding-edge performance on COMPLEX tasks.

Who's considering this for synthetic datasets, distillation, or pruning?

20.01.2025 08:38 | 👍 15    🔁 2    💬 2    📌 0
Gradio spaces are the perfect agent tools! A Blog post by ben burtenshaw on Hugging Face

Here's a blog post I wrote with the details https://buff.ly/4gVpudi

17.01.2025 10:00 | 👍 5    🔁 0    💬 0    📌 0
Gradio and LLM Agents: A Step-by-Step Gradio Tutorial

Playing around with AI agents, and I reckon Gradio spaces on the hub make the perfect tools.

- super easy to connect your agents to a bunch of useful tools and apps.
- find a Space you like on Hugging Face Hub or make your own with Gradio.
- link it up with smolagents.
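Here's roughly what "link it up with smolagents" looks like. A sketch assuming a recent `smolagents` release that provides `Tool.from_space` and `CodeAgent`; the Space ID, tool name, and description are examples, and you pass in whichever model object your installed version provides (model class names have changed between releases).

```python
def build_space_agent(model, space_id: str = "black-forest-labs/FLUX.1-schnell"):
    """Wrap a Gradio Space as a smolagents tool and hand it to a CodeAgent.

    The import is local so this module still loads when smolagents is not installed.
    """
    from smolagents import CodeAgent, Tool

    # Load the Space's Gradio API as a tool the agent can call by name.
    image_tool = Tool.from_space(
        space_id,
        name="image_generator",
        description="Generate an image from a text prompt.",
    )
    return CodeAgent(tools=[image_tool], model=model)
```

Calling `build_space_agent(model).run("Draw a cat reading a dataset card")` needs network access to the Space and to your model provider.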

🧡

17.01.2025 10:00 | 👍 9    🔁 3    💬 1    📌 0

We're launching a FREE course on LLM Agents 🥳

📖 Learn what Agents are
🕵️ Build your own Agents using the latest libraries and tools.
🎓 Earn a certificate of completion to showcase your achievement.

Enroll now 👉 huggingface.us17.list-manage.com/subscribe?u=...

15.01.2025 15:23 | 👍 58    🔁 15    💬 0    📌 0
Tools 4 Agents - a burtenshaw Collection This is a collection of spaces on the hub that are useful for building agents. https://huggingface.co/docs/smolagents/en/tutorials/tools

Here's a collection with tools to:

- create a Plotly visualisation
- get travel durations
- transcribe YouTube videos
- transform images

https://buff.ly/3PAU6od

15.01.2025 10:00 | 👍 7    🔁 0    💬 0    📌 0
Tools We’re on a journey to advance and democratize artificial intelligence through open source and open science.

These should set up a few cool agent applications, but if not, it's easy to build a tool with a Gradio application. Here's a guide:

https://buff.ly/3Wm2ZG1
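If you build your own, the Gradio side can be tiny. A minimal sketch, assuming `gradio` is installed; the travel-duration function is a hypothetical toy, not one of the Spaces in the collection. Once an app like this is deployed as a Space, it can be used as an agent tool like the ones in the collection above.

```python
def travel_time_hours(distance_km: float, speed_kmh: float) -> float:
    """Travel duration in hours for a distance covered at a constant speed."""
    if speed_kmh <= 0:
        raise ValueError("speed_kmh must be positive")
    return distance_km / speed_kmh


def build_demo():
    """Expose travel_time_hours as a Gradio app (import kept local so the function is testable alone)."""
    import gradio as gr

    return gr.Interface(
        fn=travel_time_hours,
        inputs=[gr.Number(label="Distance (km)"), gr.Number(label="Speed (km/h)")],
        outputs=gr.Number(label="Duration (hours)"),
        title="Travel duration",
    )
```

Run it locally with `build_demo().launch()`; a clear function name, type hints, and docstring also make the tool easier for an agent to pick correctly.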

15.01.2025 10:00 | 👍 7    🔁 0    💬 1    📌 1

Agents need tools, and the Hugging Face Hub is full of them: you can use Gradio Spaces on the Hub as agent tools. I put together a short list of Spaces I've tried out or made. Here's an overview:

🧡

15.01.2025 10:00 | 👍 9    🔁 0    💬 1    📌 0
AI Agents Are Here. What Now? We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Great deep-dive blog post on agents, covering all the fundamentals from the ground up.

@mmitchell.bsky.social @sashamtl.bsky.social @giadapistilli.com @evijit.io

huggingface.co/blog/ethics-...

13.01.2025 19:28 | 👍 20    🔁 1    💬 0    📌 0
smol-course/8_agents at main · huggingface/smol-course A course on aligning smol models. Contribute to huggingface/smol-course development by creating an account on GitHub.

Here's the chapter on agents in the smol course: https://buff.ly/3Cf5NOf

13.01.2025 10:00 | 👍 12    🔁 1    💬 0    📌 0

Free course on agents by Hugging Face! We just added a chapter on agents to the smol course, naturally using smolagents. The chapter covers these topics:

- Code agents
- Retrieval agents
- Custom function agents

If you're building agent applications, this course should help.

13.01.2025 10:00 | 👍 32    🔁 9    💬 1    📌 0
CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard We’re on a journey to advance and democratize artificial intelligence through open source and open science.

If you're looking for real talk from experience, check out this blog post on emissions from generative AI models:

huggingface.co/blog/leaderb...

10.01.2025 08:10 | 👍 9    🔁 3    💬 0    📌 0

❓What we need now?
Most of us aren't building systems to solve frontier math problems on a daily basis. Shucks! That means we need reward models and datasets that represent the kinds of problems we're trying to solve. Crucially, in the domains and languages we actually work in!

27.12.2024 11:00 | 👍 2    🔁 0    💬 2    📌 0

⏩ What does it mean for us builders?
As these approaches develop, we can use small models for our use cases and scale up inference-time compute for challenging, domain-specific tasks. Most tasks will need minimal compute, and we'll only scale it up for the complex ones.

27.12.2024 11:00 | 👍 0    🔁 0    💬 1    📌 0
Scaling test-time compute - a Hugging Face Space by HuggingFaceH4 Discover amazing ML apps made by the community

🔑 What happened in the open?
Researchers at @hf.co shared an approach that got a small(ish) 3-billion-parameter model to outperform its 70-billion-parameter variant, using search and a reward model.

@lewtun.bsky.social
Ed Beeching

https://buff.ly/3BAmi7o
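The simplest version of "search plus a reward model" is best-of-N: sample N candidate answers from the small model, score each with a reward model, and keep the highest-scoring one. More samples means more test-time compute. A minimal sketch with stand-in functions (the toy generator and reward are hypothetical, and the linked work explores more elaborate search strategies than this):

```python
import random
from typing import Callable, Tuple


def best_of_n(prompt: str,
              generate: Callable[[str, random.Random], str],
              reward: Callable[[str, str], float],
              n: int, seed: int = 0) -> Tuple[str, float]:
    """Sample n completions, score each with the reward function, return the best one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    scored = [(reward(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_generate(prompt: str, rng: random.Random) -> str:
    # Stand-in for a small model: often right, sometimes off by a bit.
    return str(17 * 24 + rng.choice([-10, -1, 0, 1, 10]))


def toy_reward(prompt: str, completion: str) -> float:
    # Stand-in for a reward model. For illustration it checks the arithmetic directly;
    # a real reward model only estimates how good a solution is.
    return -abs(int(completion) - 17 * 24)
```

With a fixed seed, raising `n` can only keep or improve the best score, which is the whole trade: spend more compute at inference instead of using a bigger model.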

27.12.2024 11:00 | 👍 0    🔁 0    💬 1    📌 0

🔒 What happened in the closed?
You've probably seen that OpenAI announced a new model that aces some of the hardest math and reasoning benchmarks out there, in some cases outperforming the best human minds.

27.12.2024 11:00 | 👍 1    🔁 0    💬 1    📌 0

What does o3 mean for small open source models on real domains? I've been asking myself this question a lot the last week, and I think it sets up an interesting path forward.
OpenAI is calling o3 a new paradigm. If that's true, these two developments illustrate how that paradigm plays out:

🧡

27.12.2024 11:00 | 👍 6    🔁 0    💬 2    📌 0
GitHub - huggingface/smol-course: A course on aligning smol models. A course on aligning smol models. Contribute to huggingface/smol-course development by creating an account on GitHub.

If you want to try this out and learn the core topics, check out the smol course!

Here's the smol course repo: https://buff.ly/3ZCMKX2

23.12.2024 16:00 | 👍 1    🔁 0    💬 0    📌 0
GitHub - huggingface/smol-course: A course on aligning smol models. A course on aligning smol models. Contribute to huggingface/smol-course development by creating an account on GitHub.

- a dataset that represents your use case or language, without costly data collection or annotation teams.
- an evaluation set so you can compare models and APIs, unlocking performance, latency, and cost improvements.
- better versions of existing datasets, filtered down to the highest-quality samples.
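For the filtering point, here's a minimal sketch in plain Python, assuming each synthetic row already carries a quality `score` (for example from an LLM judge or reward model; the field names and the 0.7 threshold are assumptions). It keeps rows above the threshold, drops duplicates that differ only in case or whitespace, and returns the best rows first.

```python
from typing import Dict, List, Optional

# Hypothetical synthetic rows with a precomputed quality score.
SAMPLE_ROWS: List[Dict] = [
    {"text": "How do I fine-tune a small model?", "score": 0.9},
    {"text": "how do I  fine-tune a small model?", "score": 0.8},  # near-duplicate
    {"text": "asdf", "score": 0.1},                                 # low quality
    {"text": "What is a reward model?", "score": 0.75},
]


def filter_top_quality(rows: List[Dict], min_score: float = 0.7,
                       max_rows: Optional[int] = None) -> List[Dict]:
    """Keep rows scoring at least min_score, drop normalised duplicates, best first."""
    seen = set()
    kept = []
    for row in sorted(rows, key=lambda r: r["score"], reverse=True):
        key = " ".join(row["text"].lower().split())  # normalise case and whitespace
        if row["score"] < min_score or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept if max_rows is None else kept[:max_rows]
```

Sorting first means that when two rows collide after normalisation, the higher-scoring one is the one that survives.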

23.12.2024 16:00 | 👍 1    🔁 1    💬 1    📌 0

Synthetic datasets are the focus of the smol course this week! They supercharge applying models to your own use case, because you can do stuff like this:

🧡 > >

23.12.2024 16:00 | 👍 13    🔁 0    💬 1    📌 0
