I'm writing an article series about creating tensors from scratch in Rust. #tensors #machine-learning #ml #ai
huggingface.co/blog/KeighBe...
@benburtenshaw.bsky.social
Building tools for AI datasets. Looking in AI datasets. Sharing clean open AI datasets. At https://bsky.app/profile/hf.co
AI doesn't get your culture? Butchers your language?
With FeeL, you can fix that.
Talk to AI in your language
Correct its mistakes
Watch it improve
The more we use it, the smarter it gets for everyone!
Try it now: huggingface.co/spaces/feel-...
#ai #genAI #llm
How should AI tools be designed to support rather than replace workers?
At the Reshaping Work conference, I led a roundtable exploring AI's impact on labor. We published a blog post with our key takeaways on responsible AI and the future of work w/ Franco Bastida
www.rsm.nl/discovery/20...
I've put together some of the handier tools for building courses and educational material on the @huggingface hub.
These should bootstrap your projects with quizzes, friendly-sized models, useful datasets, and informative Spaces.
Let me know if you use them or need more.
https://buff.ly/42qyanw
The science team at Hugging Face reproduced and open-sourced DeepSeek R1. https://buff.ly/4jtbp8x
27.01.2025 10:00
Manic few days in open source AI, with game-changing developments all over the place. Here's a roundup of the resources:
Here's a thread on it all:
quiz app https://buff.ly/4atPzxo
dataset with questions https://buff.ly/3ClY9Sm
agents course we're working on https://buff.ly/4gehzqi
Here's how it works:
- make a dataset of multiple choice questions
- duplicate the Space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset
I made this to get ready for the agents course, but I hope it's useful for your projects too!
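The steps above can be sketched in plain Python. This is a minimal sketch, not the quiz app's actual code: the `Question` schema, the field names, and the `grade` helper are all assumptions made for illustration, since the post does not give the dataset's real column names.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Question:
    # Hypothetical schema: the real quiz dataset's columns are not given in the post.
    question: str
    choices: list[str]
    answer_index: int


def to_jsonl(questions):
    """Serialise questions as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(q)) for q in questions)


def grade(questions, selected):
    """Return one answer record per question, plus the number of correct answers."""
    records = []
    for q, picked in zip(questions, selected):
        records.append({
            "question": q.question,
            "selected": q.choices[picked],
            "correct": picked == q.answer_index,
        })
    score = sum(r["correct"] for r in records)
    return records, score


# Step 1: make a dataset of multiple choice questions.
questions = [
    Question("What does RAG stand for?",
             ["Retrieval-augmented generation", "Random agent graph"], 0),
    Question("Which library does the agents chapter use?",
             ["smolagents", "plotly"], 0),
]
jsonl = to_jsonl(questions)

# Steps 3 and 4: take the quiz, then keep the answers as a new set of records.
records, score = grade(questions, [0, 1])
```

The `records` list plays the role of the new dataset created on submission; uploading it to a dataset repo is left out because the post does not describe how the app does that.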
Teachers and Students! Here's a handy quiz app if you're preparing your own study material.
TL;DR: it's a quiz that builds questions from a dataset and saves the answers.
If you need long context for RAG, tool use, agents, or just because, Nvidia released a new library to make it super simple.
TL;DR: you can get 128k context with 50% less memory
Here's a blog post on everything:
What happened yesterday in the Chinese AI community?
huggingface.co/posts/AdinaY...
DeepSeek just dropped a frontier reasoning model on the hub. It's 685 billion parameters of bleeding-edge performance on COMPLEX tasks.
Who's considering this for synthetic datasets, distillation, or pruning?
Here's a blog post I wrote with the details https://buff.ly/4gVpudi
17.01.2025 10:00
Playing around with AI agents, and I reckon Gradio Spaces on the hub make the perfect tools.
- super easy to connect your agents to a bunch of useful tools and apps.
- find a Space you like on Hugging Face Hub or make your own with Gradio.
- link it up with smolagents.
We're launching a FREE course on LLM Agents
Learn what Agents are
Build your own Agents using the latest libraries and tools.
Earn a certificate of completion to showcase your achievement.
Enroll now: huggingface.us17.list-manage.com/subscribe?u=...
Here's a collection with tools for:
- creating a Plotly visualisation
- getting travel duration
- transcribing a YouTube video
- transforming an image
https://buff.ly/3PAU6od
These should set up a few cool agent applications, but if not, it's easy to build a tool within a Gradio application. Here's a guide:
https://buff.ly/3Wm2ZG1
Agents need tools, and the Hugging Face hub is full of them. You can use Gradio Spaces on the hub as agent tools. I put together a short list of ones that I tried out or made. Here's an overview:
Great deep dive blog post on Agents, covering all the fundamentals from the ground up.
@mmitchell.bsky.social @sashamtl.bsky.social @giadapistilli.com @evijit.io
huggingface.co/blog/ethics-...
Here's the chapter on agents in smol course: https://buff.ly/3Cf5NOf
13.01.2025 10:00
Free course on Agents by Hugging Face. We just added a chapter on agents to smol course. Naturally, using smolagents! The course covers these topics:
- Code agents
- Retrieval agents
- Custom functional
If you're building agent applications, this course should help.
If you're looking for real talk from experience, check out this blog post on emissions from generative AI models:
huggingface.co/blog/leaderb...
What do we need now?
Most of us aren't building systems to solve frontier math problems on a daily basis. Shucks! That means we need reward models and datasets that represent the kinds of problems we're trying to solve. Crucially, in the domains and languages we're actually working in!
What does it mean for us builders?
As these approaches develop, we can use small models on our use cases and increase inference compute for challenging domain-specific tasks. This means that for most tasks models need minimal compute, but for complex tasks we'll scale up compute.
What happened in the open?
Researchers at @hf.co shared an approach that got a small(ish) model of 3 billion parameters to outperform its 70 billion variant using search and a reward model.
@lewtun.bsky.social
Ed Beeching
https://buff.ly/3BAmi7o
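To make "search and a reward model" concrete, here is a minimal sketch of best-of-N sampling, one simple search strategy. The post does not say which strategy the researchers used, so best-of-N is swapped in purely for illustration. Both `generate_candidates` and `reward` are toy stand-ins (assumptions): a real setup would sample from a small language model and score with a learned reward model, whereas this toy reward simply knows the right answer.

```python
import random


def generate_candidates(prompt, n, rng):
    """Toy stand-in for sampling n answers from a small model (hypothetical)."""
    # The "model" guesses integers near the true sum for a prompt like "17 + 25".
    a, b = (int(x) for x in prompt.split("+"))
    return [a + b + rng.randint(-3, 3) for _ in range(n)]


def reward(prompt, candidate):
    """Toy stand-in for a reward model (hypothetical): higher is better."""
    a, b = (int(x) for x in prompt.split("+"))
    return -abs(candidate - (a + b))


def best_of_n(prompt, n, seed=0):
    """Sample n candidates and return the one the reward function scores highest."""
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n, rng)
    return max(candidates, key=lambda c: reward(prompt, c))


best = best_of_n("17 + 25", n=16)
```

Raising `n` spends more inference compute per question in exchange for a better chance that at least one sampled answer scores well, which is the trade-off the thread describes.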
What happened in the closed?
You've probably seen that OpenAI released a new model that aces some of the hardest math and reasoning benchmarks out there, in some cases outperforming the best human minds.
What does o3 mean for small open source models on real domains? I've been asking myself this question a lot the last week, and I think it sets up an interesting path forward.
OpenAI is calling o3 a new paradigm. If that's true, then these two developments illustrate how that paradigm plays out:
If you want to try this out and learn the core topics, check out the smol course!
Here's the repo on smol course: https://buff.ly/3ZCMKX2
- a dataset that represents your use case or language, without costly data collection or annotation teams.
- an evaluation set so you can compare models and APIs, unlocking performance, latency, and cost improvements.
- improve existing datasets by filtering down to the highest quality samples.
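The last item, filtering a dataset down to its highest quality samples, could look roughly like this. It is a minimal sketch under stated assumptions: `toy_quality` is a made-up heuristic used only so the example runs, and the `text` field name is hypothetical. In practice the score might come from a reward model or an LLM judge.

```python
def filter_top_fraction(samples, score_fn, keep=0.5):
    """Keep the highest-scoring fraction of samples, preserving original order."""
    if not 0 < keep <= 1:
        raise ValueError("keep must be in (0, 1]")
    if not samples:
        return []
    scored = [(score_fn(s), i) for i, s in enumerate(samples)]
    n_keep = max(1, round(len(samples) * keep))
    # Sort by score only; the sort is stable, so ties keep their original order.
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:n_keep]
    keep_idx = sorted(i for _, i in top)
    return [samples[i] for i in keep_idx]


def toy_quality(sample):
    """Hypothetical score: share of unique words, zero for very short text."""
    words = sample["text"].split()
    if len(words) < 3:
        return 0.0
    return len(set(words)) / len(words)


samples = [
    {"text": "ok"},
    {"text": "Explain how a tokenizer splits text into subwords"},
    {"text": "the the the the the"},
    {"text": "Summarise this article in two sentences"},
]
kept = filter_top_fraction(samples, toy_quality, keep=0.5)
```

Here the too-short and the repetitive samples are dropped and the two instruction-like samples are kept.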
Synthetic Datasets are the focus of smol course this week! Synthetic datasets supercharge applying models to your own use case, because you can do stuff like this: