Bespoke Curator: Synthetic Data Curation for Post-Training & Structured Data Extraction
Create synthetic data pipelines with ease!
- Retries and caching included
- inference via LiteLLM, vLLM, and popular batch APIs
- asynchronous operations
URL: buff.ly/ajPRT1l
10.04.2025 12:00
One > token > at > a > time < a < at < token < One
token-explorer is a simple tool that lets you explore different possible paths that an LLM might sample!
- Arrow keys to navigate, pop and append tokens
- View the token probabilities and entropies.
GitHub: buff.ly/FQgsczM
03.04.2025 12:22
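For a feel of what "token probabilities and entropies" means, here is a minimal sketch in plain Python. It is not token-explorer's code; the candidate tokens and logits are made up for illustration, and `softmax` and `entropy_bits` are hypothetical helper names.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token candidates and logits, for illustration only.
candidates = ["cat", "dog", "car", "tree"]
logits = [2.0, 2.0, 0.0, 0.0]

probs = softmax(logits)
for token, p in zip(candidates, probs):
    print(f"{token!r}: {p:.3f}")
print(f"entropy: {entropy_bits(probs):.3f} bits")
```

High entropy means the model is unsure which token comes next; low entropy means one token dominates.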
GitHub - argilla-io/synthetic-data-generator: Build datasets using natural language
Let's dissect the Synthetic Dataset Generator
- Natural language prompt to data
- Ollama ensures secure local LLM inference
- Argilla's data curation capabilities complete the workflow
GitHub: buff.ly/5pX49Xc
07.03.2025 13:00
Text2SQL, explore and share any data analysis!
Hugging Face - Dataset Studio is an amazing new feature.
Try it yourself: buff.ly/pjpOKav
05.03.2025 10:01
GitHub - MinishLab/vicinity: Lightweight Nearest Neighbors with Flexible Backends
Vicinity: SEVEN semantic search BACK-ENDS, ONE single INTERFACE!
New release to push vector search to the Hub and work with any serialisable objects.
KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER.
Library:
04.03.2025 09:30
NEW cool NO-CODE solution for clicking together AI WEB APPS!
Gradio released "gradio sketch"
Really easy way to create web apps with minimal code.
Start with `pip install gradio` & `gradio sketch`
Release: https://buff.ly/41aeLoA
27.02.2025 10:13
vector_search_with_hub_as_backend.ipynb
Run, share, and edit Python notebooks
Vector Search - let's keep it clean and lightweight!
<100K records, no problem!
>100K, some scaling issues
ANN DuckDB index, sub-second response times
Notebook:
27.02.2025 08:31
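Why are small collections "no problem"? Exact search just scores every stored vector against the query, which stays cheap at modest sizes. A minimal sketch in plain Python, assuming toy 2-D vectors; it is not the notebook's code, and `cosine` and `search` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, vectors, k=2):
    """Score every vector against the query and return the top-k (index, score) pairs."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings, for illustration only.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
results = search([1.0, 0.0], vectors, k=2)
for index, score in results:
    print(index, round(score, 3))
```

The cost grows linearly with the number of records, which is where an approximate nearest neighbour (ANN) index starts to pay off.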
The smolagents module has arrived in the agents course!
- Code agents optimised for software development
- Tool calling agents that create modular, function-driven workflows
- Retrieval agents designed to access and synthesise information
Course: https://buff.ly/4kcj6Ai
25.02.2025 15:40
Awesome. My talk for PyCon Italy 2025 got accepted!
Got data problems? Relax. Synthetic data is here to help.
Talk: https://buff.ly/3QzoZKj
25.02.2025 08:54
Announcing Docker support to quickly set up your Synthetic Data Generator (Gradio + Ollama + Argilla)!
- Build genuinely useful datasets using natural language!
- Scale however you need.
- Use them privately or share them with the world!
GitHub: https://buff.ly/49IDSmd
21.02.2025 08:00
smolagents and tools gallery - a Hugging Face Space by davidberenstein1957
Discover amazing ML apps made by the community
With 80K agent builders joining the agents course, it is time to make agents explorable on the Hub!
You can now search and find the perfect agents and tools for your needs!
Powered by @Gradio!
Start searching:
20.02.2025 13:01
Image Generation has landed in Arena form!
1. Describe your desired image
2. Two anonymous models output images
3. Vote for the winner!
Images have been sourced from our Open Image Preference dataset!
Dataset: https://buff.ly/4il0du9
Arena: https://buff.ly/4142NwH
19.02.2025 11:05
Are you the top of the Agents class?!
We just released a bonus unit on function calling (FC).
You will learn:
1. What is FC?
2. Thought → Act → Observe Cycle in FC
3. Lightweight and efficient fine-tuning
Course: https://buff.ly/3Qn1DHB
18.02.2025 16:14
Smol Agents and Hugging Face - Anote AI Day Summit 2025
In case you've missed the hype around smolagents, here is a presentation I gave yesterday at an MLOps community event!
library: https://buff.ly/4hj6PrJ
slides: https://buff.ly/3WUzZ8D
video:
14.02.2025 07:18
from bells and whistles to agents and tools
Slides for my MLOps community talk on smolagents!
Slides: https://buff.ly/3WUzZ8D
12.02.2025 11:02
Find banger tools for your smolagents!
I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with:
- inspiration
- best practices
- finding cool tools
Space: https://buff.ly/41cYctx
12.02.2025 09:15
Come and get those AI agents certificates!
Join the cohort of 66K students: https://buff.ly/4hxb6rK
10.02.2025 14:38
Documents or images to structured data using Vision Language Models
Outlines has an integration with transformers, which facilitates structured generation based on limiting token sampling probabilities.
Blog: https://buff.ly/4jFHMkr
10.02.2025 13:00
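"Limiting token sampling probabilities" can be sketched roughly like this: before sampling, every token the structure does not permit gets its logit set to minus infinity, so its probability becomes zero. This is a toy illustration in plain Python, not the Outlines API; the vocabulary, logits, allowed set and the name `constrained_sample` are all made up.

```python
import math
import random

def constrained_sample(logits, allowed, rng):
    """Sample a token id, restricted to the ids in `allowed`."""
    if not allowed:
        raise ValueError("at least one token must be allowed")
    # Disallowed tokens get -inf, so exp() below turns them into weight 0.0.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)
    weights = [math.exp(x - peak) for x in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical vocabulary and logits; the model "prefers" free text (id 2).
vocab = ["{", "}", "hello", '"', "42"]
logits = [0.1, 0.3, 5.0, 0.2, 1.0]

# Assume the output format only permits "{" or '"' at this position.
allowed = {0, 3}
rng = random.Random(0)
samples = [constrained_sample(logits, allowed, rng) for _ in range(100)]
print(sorted({vocab[i] for i in samples}))
```

Even though "hello" has by far the highest logit, it can never be sampled once it is masked out.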
Local Docker deployments for the synthetic data generator
We would love to hear your thoughts!
PR: https://buff.ly/4hRMny6
10.02.2025 10:13
Curious about the "why", you may wonder?
The effortlessness of smolagents combined with the power of 400,000 AI tools available on the Hub!
library: https://buff.ly/4hj6PrJ
07.02.2025 12:14
WOW, this will rock the world! Hibiki is a model for simultaneous speech2speech translation.
And it actually works.
Available in French-English but super excited to see what the community will do.
Hub: https://buff.ly/3EtmM0f
Paper: https://buff.ly/4jIXNGd
06.02.2025 15:06
Agentic RAG Stack (3/5) - Generate responses using a SmolLM
A Blog post by David Berenstein on Hugging Face
Agentic RAG: Applied, visual, and step-by-step!
Get familiar with the Agents and tools, not the bells and whistles!
Retrieve - Augment and now GENERATE.
Parts:
1: https://buff.ly/40XNIxM
2: https://buff.ly/40HkB0x
3:
06.02.2025 09:47
Bring your own AI data, even if you have none!
Describe your dataset for RAG, LLMs or Text Classification
Bring your own context!
Press play and wait
Space: https://buff.ly/3Y1S99z
GitHub: https://buff.ly/49IDSmd
06.02.2025 08:00
Anyone can create free hosted tools for their AI agents!
Agentic RAG stack part 2 - augment
Augmenting retrieval results by reranking optimises the content without increasing response time too much.
part2: https://buff.ly/40HkB0x
part1: https://buff.ly/40XNIxM
code: https://buff.ly/4hEajpj
05.02.2025 10:11
How to find and install the latest AI apps from the AI app store
1. go to https://buff.ly/42CnUbU
2. search for the app you like
3. go to the bottom settings
4. open the URL
5. press the search bar to install
More info: https://buff.ly/3Csqc2J
05.02.2025 07:40
Fine-tune ModernBERT for RAG with Synthetic Data
A Blog post by Sara Han Díaz on Hugging Face
Retrievers and rankers are a crucial part of optimising RAG.
Easier to fine-tune than LLMs. More predictable than prompts.
Training data is hard to find, so we offer private and free synthetic data on your own documents!
Blog:
04.02.2025 16:11
Index and retrieve documents for vector search using Sentence Transformers and DuckDB
A Blog post by David Berenstein on Hugging Face
Creating an agentic RAG stack on the Hugging Face Hub - part 1 - retrieval (1/5).
Web apps and microservices included!
Chunk, embed and index documents at a huge scale without overhead.
Blog:
04.02.2025 13:00
Shit! 24B is the new small.
Mistral drops their new model on Hugging Face!
Great performance, and low latency.
Model: https://buff.ly/4hwAzBa
Code: https://buff.ly/3CEohrF
30.01.2025 17:10